
Will This Change the Course of Large Language Models?

Probably not, but it could have some impact.

The key measure of what an LLM can do is its number of parameters.

Until now, trying to run anything much larger than 32 billion parameters would bring most computers you might buy in a store to a screeching halt.

But now, someone on Reddit (if it is on Reddit, it must be true :)) claims to be able to run OpenAI’s newest 120 billion parameter model on a budget computer with an 8 GB GPU. Sounds like fiction.

The secret sauce to making this work is something called Mixture of Experts (MoE). Instead of one monolithic network, an MoE model is divided into many smaller “expert” sub-networks, and only a few of them are activated for each token. Because only a fraction of the weights are in use at any moment, far less fast GPU memory is needed, so a modest system with a modest GPU can cope.
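To make the routing idea concrete, here is a minimal toy sketch of an MoE layer. This is illustrative only, not OpenAI’s actual architecture; the dimensions, expert count, and top-k value are all made-up assumptions.

```python
# Toy Mixture-of-Experts routing sketch (hypothetical sizes, not a real model).
import numpy as np

rng = np.random.default_rng(0)

D, NUM_EXPERTS, TOP_K = 64, 8, 2   # assumed: tiny dims for illustration

# A router matrix plus one weight matrix per "expert" (tiny stand-in MLPs).
router = rng.standard_normal((D, NUM_EXPERTS))
experts = [rng.standard_normal((D, D)) for _ in range(NUM_EXPERTS)]

def moe_layer(x):
    """Route a token vector x to its top-k experts only."""
    scores = x @ router                     # one routing score per expert
    top = np.argsort(scores)[-TOP_K:]       # indices of the k highest-scoring experts
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                # softmax over just the chosen experts
    # Only TOP_K of NUM_EXPERTS weight matrices are touched per token,
    # which is why only a fraction of the model needs fast memory at once.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D)
out = moe_layer(token)
print(out.shape)   # a D-dimensional output, computed using 2 of 8 experts
```

In a real deployment, the inactive experts can sit in ordinary system RAM and be pulled in only when the router selects them, which is how a 64 GB RAM / 8 GB GPU box can plausibly host a model this large.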

OpenAI says that a datacenter-grade GPU with at least 80 GB of memory is required.

But this Redditor says it will run on a PC with a budget 8 GB GPU and 64 GB of RAM.

A researcher decided to test it on his home rig. The GPU in his system isn’t optimal for running LLMs, and it only has 20 GB of memory, which is a lot less than OpenAI claims is needed but a lot more than the Redditor says is required.

When he ran a 70 billion parameter model on it before, the best he got was a throughput of around one token per second, which is less than a word per second.

So he downloaded the model and went with the default setup as a test.

He got a warning message that the model was too large to run on his system, so he had to disable all of the safety checks.

He asked the LLM to explain gravity and got 15-20 tokens per second on a (relatively) crappy home PC with no tuning.

Compared with his earlier one-token-per-second run, this MoE technique delivered roughly a 15-20x throughput improvement AND it actually makes useful LLMs run on a home computer.

While this breakthrough doesn’t solve all of the world’s (AI) problems, it is really interesting, and on a modestly bigger computer it might work acceptably in a development environment.

And this is the first time anyone has done this, so what will next month bring?

Definitely very interesting. Credit: Cybernews

