<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Go on Aeryx</title><link>https://blog.aeryx.ai/tags/go/</link><description>Recent content in Go on Aeryx</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Thu, 25 Jun 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://blog.aeryx.ai/tags/go/index.xml" rel="self" type="application/rss+xml"/><item><title>lmkit-go vs lmkit: pure Go on XLA against PyTorch, same 8GB card</title><link>https://blog.aeryx.ai/posts/lmkit-go-vs-lmkit/</link><pubDate>Thu, 25 Jun 2026 00:00:00 +0000</pubDate><guid>https://blog.aeryx.ai/posts/lmkit-go-vs-lmkit/</guid><description>Built an LLM trainer in pure Go on XLA to see if it could keep up with the PyTorch original. Ran both to a 2B-token Chinchilla budget on the same 8GB 3070 Ti, same 100M model, same config, same tokens, same seed. Go-on-XLA lands at 85% of PyTorch throughput at ~1.7x the memory, and the loss curves track within 0.1 the whole way. Three weeks ago the same stack was 26x slower and couldn&amp;rsquo;t fit the model at all.</description></item></channel></rss>