MiniMax M2.5 and Qwen3.5: The Open-Weight Models Worth Knowing About
In the last article I showed the SWE-Bench numbers. Open-weight models are basically tied with the proprietary ones now. Two models stood out to me: MiniMax M2.5 and Qwen3.5.
Here's what I found out about them.
MiniMax M2.5
MiniMax is a Chinese startup, founded in 2022. Not part of a big tech company like Alibaba or Google. They built models for text, audio, video, and music. You might know their Hailuo Video product.
The M2.5 is a Mixture-of-Experts model. 230 billion parameters total but only 10 billion active per call. That's why the price works. You get the intelligence of a big model but pay for a small one. Input costs $0.30 per million tokens, output $1.20. You can check their API pricing here.
What's different about it
Two things I noticed.
First, the training. M2.5 was trained with reinforcement learning in over 200,000 real environments. Not static data. Actual code repos, browsers, office apps. The model learned by doing things, not just reading about them. One behavior that came out of this is what MiniMax calls "Architect Mindset": before writing any code, the model breaks down the problem and plans the structure. It thinks about design before it starts coding. This wasn't programmed in; it emerged during training. You can read more about it in their release blog.
Second, speed. M2.5 finished the SWE-Bench evaluation 37% faster than the previous version and matched Claude Opus 4.6 in speed. They also offer two API versions, regular and high-speed, with the same quality but lower latency.
The catch
Independent tests from OpenHands show M2.5 is strong at building apps from scratch and fixing issues. But it sometimes forgets to follow formatting instructions. In one test it pushed to the wrong branch. It's good at coding, but not as precise as Claude at following complex instructions.
The weights are on HuggingFace, MIT license. You can deploy it privately and fine-tune it.
Qwen3.5
Qwen is maintained by the Qwen team at Alibaba Cloud. All models are Apache 2.0 licensed, which means free for commercial use. The family goes from 0.8B to 397B parameters. The full model list is on their GitHub repo.
One thing to note: Lin Junyang, the technical lead who ran Qwen3.5 development, left Alibaba in early 2026. Alibaba says it will keep investing in open source, but it's worth watching.
Model sizes
The lineup covers different use cases:
0.8B and 2B for phones and edge devices
4B for lightweight agents, 262K context
9B is the sweet spot for laptops
27B scored 0.724 on SWE-Bench, needs an A100 or a Mac with lots of RAM (model card)
35B-A3B is MoE with only 3B active, very efficient (model card)
397B-A17B is the flagship, 262K context that can extend to 1M
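To pick a size for your hardware, a back-of-the-envelope memory estimate helps. This sketch uses my own rule of thumb, not an official sizing guide: bits per parameter at a given quantization (Ollama's default 4-bit quants land around 4.5 bits/param) plus roughly 20% overhead for KV cache and activations.

```python
def est_memory_gb(params_b: float, bits_per_param: float = 4.5,
                  overhead: float = 1.2) -> float:
    """Rough RAM/VRAM estimate in GB: parameters * bits per parameter,
    plus ~20% overhead for KV cache and activations. Rule of thumb only."""
    bytes_total = params_b * 1e9 * bits_per_param / 8
    return round(bytes_total * overhead / 1e9, 1)

# Dense models from the lineup above, at ~4.5 bits/param
for size in (4, 9, 27):
    print(f"{size}B -> ~{est_memory_gb(size)} GB")
```

For the 35B-A3B MoE, note that all 35B parameters still have to fit in memory; only the compute scales with the 3B active.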
The architecture is different from standard Transformers. Qwen3.5 combines Gated Delta Networks with MoE, which makes it faster and uses less memory. The models are also natively multimodal, trained on text, images, and video from the start.
Running it on your laptop
Ollama is the easiest way. The 9B model is the right size for consumer hardware:
# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh
# Download and run (~5.4GB)
ollama run qwen3.5:9b
Works on laptops with 16GB of RAM. The quantized version loses less than 1% quality compared to full precision.
Smaller options:
ollama run qwen3.5:4b # ~2.5GB
ollama run qwen3.5:2b # ~1.5GB
Once running, Ollama gives you an OpenAI-compatible API at localhost:11434:
curl http://localhost:11434/api/chat \
-d '{"model": "qwen3.5:9b", "messages": [{"role": "user", "content": "Hello!"}]}'
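From Python, you can hit that same endpoint with nothing but the standard library. A minimal sketch, using the model name and payload shape from the curl example above:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"

def build_chat_request(model: str, prompt: str) -> dict:
    # Same payload shape as the curl example above
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # one JSON response instead of a stream of chunks
    }

def chat(model: str, prompt: str) -> str:
    payload = build_chat_request(model, prompt)
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["message"]["content"]  # Ollama's /api/chat response shape

# chat("qwen3.5:9b", "Hello!")  # requires a running Ollama server
```

Set `"stream": True` (the default) if you want tokens as they're generated instead of one blocking response.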
You can also use LM Studio if you prefer a GUI, or llama.cpp for full control.
Why this matters
Cost. Coding tasks can go to M2.5 at $0.30/$1.20 per million tokens instead of Claude at $5/$25. The price difference absorbs the small quality gap.
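To make that concrete, here's a quick sketch of what one agent task costs at those rates. The token counts are made-up round numbers for illustration; the per-million prices are the ones quoted above.

```python
def task_cost(in_tok: int, out_tok: int,
              in_price: float, out_price: float) -> float:
    """Dollar cost of one task, given per-million-token prices."""
    return (in_tok * in_price + out_tok * out_price) / 1_000_000

# Hypothetical task: 50k input tokens (repo context), 5k output tokens
m25 = task_cost(50_000, 5_000, 0.30, 1.20)      # M2.5 rates from above
claude = task_cost(50_000, 5_000, 5.00, 25.00)  # Claude rates from above

print(f"M2.5:   ${m25:.3f}")
print(f"Claude: ${claude:.2f}")
print(f"Ratio:  {claude / m25:.0f}x")
```

On this made-up token mix it works out to roughly 18x cheaper per task; your ratio shifts with the input/output split.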
Compliance. Running open-weight models inside your VPC means data never leaves your infrastructure. If you deal with PCI, SOC 2, or HIPAA, this solves a real problem.
Speed of testing. Two commands and you have a working model locally. No account, no API key, no cost. You can test ideas before committing to anything.
Next article I'll talk about context windows and why bigger doesn't mean better. This one surprised me when I first looked into it.
Guilherme is a Senior Cloud/DevOps Engineer focused on AI infrastructure, building production pipelines in regulated environments.
