NVIDIA just released a new 120-billion-parameter Nemotron model. 120B is a pretty big model, but it’s also one I can actually run. It’s always exciting when I get to try out a new model.

I installed this via Ollama for my testing. While I could probably get better performance out of llama.cpp or vLLM, Ollama is easy, so I’m going with that.
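For anyone following along, the setup is just a couple of commands. The model tag below is my assumption, not something I’ve verified; check the Ollama library for the exact name and quantization:

```shell
# Pull the model, then fire a quick prompt at it.
# The "nemotron:120b" tag is a guess -- check ollama.com/library
# for the real tag before running this.
ollama pull nemotron:120b
ollama run nemotron:120b "Say hello in five words."
```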

I first tried the test I ran long ago in the post Let’s test some models. This is the table it spat out:

| Word1 | Word2 | Word3 | Word4 | Word5 | Word6 |
|-------|-------|-------|-------|-------|-------|
| apple | bread | chair | eagle | fiver | grape |
| house | ivory | jelly | knave | lemon | mango |

That’s better than anything else has done yet. We could nitpick that it didn’t use a zero-indexed integer for the word position, but I’ll let it go.

I also had it summarize some text and write me a tic tac toe game. It did what I’d say is a reasonable job. I don’t have anything to really nitpick, but I can’t claim it’s amazing either.

I’m seeing between 12 and 19 tokens per second, which is pretty good for a model of this size.
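To put those speeds in context, here’s some quick back-of-the-envelope math. The response lengths are just illustrative assumptions, not measurements from my runs:

```python
# Rough wait times at the speeds I'm seeing, for a few
# illustrative response lengths (the token counts are assumptions).
def seconds_to_generate(tokens: int, tokens_per_second: float) -> float:
    """Time to stream a response of `tokens` length at a given speed."""
    return tokens / tokens_per_second

for tokens in (100, 500, 2000):
    fast = seconds_to_generate(tokens, 19)  # best case I observed
    slow = seconds_to_generate(tokens, 12)  # worst case I observed
    print(f"{tokens:>5} tokens: {fast:5.1f}s - {slow:5.1f}s")
```

So a medium-length answer lands in well under a minute, which is comfortably usable for interactive chat.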

In the past I asked LLMs to “Create a html web page that draws a christmas tree in the middle of the page”, so why not try that again? While it’s not Christmas anymore, who doesn’t wish it was!

The last image looked like this:

*[image: the earlier model’s Christmas tree]*

The new image looks like this:

*[image: Nemotron’s Christmas tree]*

That’s probably worse.

In the near future I’ll see how it does with programming. One of the challenges a lot of models have is tool calling in a way that isn’t broken. I tried this in the post Using Claude Code with Ollama; this time I’ll probably use opencode instead of Claude Code. That’s a project I’ve never gotten any of the local LLMs to build correctly.
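For reference, here’s roughly what a tool-calling request through Ollama’s chat API looks like. This is a sketch only: it assumes Ollama’s OpenAI-style `tools` format, and the `get_weather` function is a made-up example, not anything from my tests:

```python
import json

# A minimal /api/chat request body with one tool defined.
# `get_weather` is a hypothetical tool, and the model tag is a guess.
payload = {
    "model": "nemotron:120b",
    "messages": [
        {"role": "user", "content": "What's the weather in Portland?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "city": {"type": "string"}
                    },
                    "required": ["city"],
                },
            },
        }
    ],
}

# Whether a model reliably emits well-formed calls against a schema
# like this is exactly what separates "works with agents" from "broken".
print(json.dumps(payload, indent=2))
```

When tool calling works, the response message carries a structured `tool_calls` list; when it’s broken, the model mangles the JSON into plain text, which is the failure mode I keep hitting with local models.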

There’s definitely potential here. I’m excited to see what other local models come out in 2026. I’m a gigantic cheapskate, so I love that I can test all of this locally.