
Inside Alibaba’s Qwen3 AI Models: How They Compare to Claude Opus 4

Posted by Amna Manzoor
9-10 Min Read Time

Almost every week, a new AI model is launched with big promises: better reasoning, faster output, smarter code, or stronger multilingual abilities. But most of them follow a familiar pattern: slightly improved benchmark scores, a different name, and another spot on the leaderboard.

 

Alibaba’s July 2025 release breaks that pattern. In a single launch, Alibaba introduced a new set of open-source language models that quickly made an impact in the global AI space. The most important one is Qwen3-235B-A22B-Instruct-2507, a powerful model that not only competes with but sometimes outperforms Anthropic’s Claude Opus 4. Alongside it came Qwen3-Coder-480B-A35B-Instruct, a large model focused on code and agent-style behavior.

 

Both models are already moving up on the Hugging Face Open LLM Leaderboard and are getting a lot of attention in AI communities. Let’s take a closer look at what they offer, how they compare to Claude, and what people in the field are saying.

 

Alibaba Qwen3-235B: Big, Smart, and Efficient

This model is designed for general-purpose use and excels at understanding and following instructions. It has 235 billion total parameters and uses a Mixture-of-Experts (MoE) system. Out of 128 total experts, only 8 are active during any single run, which means 22 billion active parameters are used at a time.
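The parameter math above follows directly from top-k expert routing. As a minimal sketch (a generic Mixture-of-Experts gate with softmax weighting and random scores as illustrative assumptions, not Qwen3's actual, unpublished routing code), selecting 8 of 128 experts per token is what keeps only ~22B of the 235B parameters active:

```python
import math
import random

def top_k_routing(logits, k=8):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exp_scores = [math.exp(logits[i]) for i in top]
    total = sum(exp_scores)
    return {i: s / total for i, s in zip(top, exp_scores)}

random.seed(0)
gate_logits = [random.gauss(0, 1) for _ in range(128)]  # one gating score per expert
weights = top_k_routing(gate_logits, k=8)

# Only 8 of 128 experts receive any computation for this token;
# the rest of the expert parameters stay idle.
print(len(weights))
print(round(sum(weights.values()), 6))
```

Because each token only touches the selected experts, inference cost scales with the active parameter count rather than the full 235B, which is the efficiency trade MoE models make.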

What really makes this model special is how well it balances size and efficiency. It supports long context windows, up to 262,144 tokens, which is perfect for long-form reasoning, document analysis, and complex multi-step tasks.

Performance-wise, Qwen3-235B has shown solid results across many areas: instruction following, understanding different languages, math reasoning, and even coding. The benchmarks show this clearly:

| Benchmark/Test | Qwen3-235B-A22B-Instruct-2507 | Qwen3-235B-A22B-Thinking-2507 | Claude Opus 4 |
| --- | --- | --- | --- |
| MMLU (General Knowledge) | ~83.0% | Not reported | ~87–89% |
| GPQA (Graduate QA) | 77.5% | 81.1% | ~79.6% → ~83% (thinking mode) |
| AIME25 (Reasoning) | ~70.3% | 92.3% | ~75% (with thinking mode) |
| LiveCodeBench v6 (Coding) | 51.8% | 74.1% | ~72–73% (estimated) |
| Arena-Hard v2 (Alignment) | 79.2% | 79.7% | Not reported |
| Thinking Mode | Not included | Yes | Yes |
| Open Source | Yes | Yes | No |
| Context Length | 262K tokens | 262K tokens | 200K tokens |

 

These results put it very close to Claude Opus 4 and ahead of many open-source models, especially for advanced reasoning and language understanding.

 

Alibaba Qwen3-Coder-480B: A Powerful Model for Code

Just two days later, Alibaba released Qwen3-Coder, a model made for coding tasks, like generating code, fixing bugs, using tools, and completing complex software workflows.

It’s a much bigger model, with 480 billion total parameters and 160 experts. Like the other model, it activates only 8 experts per run, resulting in 35 billion active parameters at a time. This makes it surprisingly efficient, despite its large size.

Qwen3-Coder also handles very long context windows, with built-in support for 256K tokens, and it can scale up to 1 million tokens. This allows it to process entire codebases, long logs, or documentation in one go.
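To give a feel for what a window that large enables, here is a minimal sketch of how one might split a tokenized codebase into overlapping chunks that each fit the context window. The overlap size and the stand-in token list are illustrative assumptions, not part of any actual Qwen3-Coder tooling:

```python
def chunk_tokens(tokens, window=262_144, overlap=2_048):
    """Split a token sequence into overlapping chunks no longer than `window`."""
    step = window - overlap  # advance less than a full window so chunks overlap
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + window])
        if start + window >= len(tokens):
            break  # last chunk reached the end of the sequence
        start += step
    return chunks

tokens = list(range(600_000))  # stand-in for a tokenized codebase
chunks = chunk_tokens(tokens)
print(len(chunks))  # a 600k-token input fits in 3 overlapping 262,144-token chunks
```

With native 256K support, a mid-sized repository fits in one pass; chunking like this only becomes necessary when the input exceeds the window, and the overlap preserves some shared context between adjacent chunks.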

Its performance also stands out. On SWE-bench Verified, a benchmark that tests how well models fix real-world software bugs over 100+ steps, it scored around 67%. That puts it on par with Claude Sonnet and ahead of many other models, such as DeepSeek-V2 and Kimi K2.

 

How Do These Models Compare to Claude and Other Top Models?

Perhaps the standout feature of Alibaba’s new models is that they’re open source while performing as well as, and sometimes better than, Claude.

 

Models like Claude Opus and Claude Sonnet are closed-source and can only be used through APIs. That limits how much developers or researchers can customize or run them independently. In contrast, Qwen3 models are open-weight, which means anyone can host them, modify them, and use them freely.

Here’s how they stack up:

 

  • As discussed above, Qwen3-235B-A22B performs as well as or better than Claude Opus in:
    • MMLU-Redux
    • Long-context tasks
    • Math and reasoning
  • Qwen3-Coder performs almost exactly like Claude Sonnet on real-world programming benchmarks, and is available for full access.

 

These results aren’t just numbers; they matter to developers. For anyone building AI-powered apps or tools, these models offer high capability with fewer restrictions.

 

Why Every New Model Claims to Be the Best

It’s common for every new AI model to say it’s the best. That’s because the race is intense: models are constantly trying to show higher scores, longer context windows, or faster speeds.

But small gains, like 1% on MMLU or a few points on GSM8K, don’t always lead to noticeable improvements in real-world use. What matters more is:

 

  • How well a model aligns with user input
  • How much context it can handle
  • Whether it works well with AI agents
  • How flexible it is across different tasks

 

That’s where models like Qwen3-235B and Qwen3-Coder stand out. They aren’t just growing in size; they’re designed for balanced performance, lower computing costs, and agent-friendly behavior, especially in the case of the coding model.

While Qwen3 is making headlines today, it’s part of a broader wave of next-gen Chinese AI models, from Kimi K1.5 to Manus AI, that is reshaping the global AI race.

 

Is Alibaba Catching Up in the Global AI Race?

Even though Alibaba is releasing some of the best models out there, it's still not a widely recognized name in the AI industry. Here’s why:

 

  1. Geopolitical factors make it harder for Alibaba to form commercial partnerships in the US and Europe, which are the established playgrounds for the biggest names in the space.
  2. Its models are often released first in Chinese-language platforms, which delays attention from English-speaking users.
  3. Media attention tends to focus more on companies like OpenAI, Google DeepMind, or Anthropic who are established players in this space.

 

Still, things are changing. Since the July release, Qwen3 models have gained momentum globally. They’ve seen rising GitHub stars, more downloads on Hugging Face, and increasing mentions in research papers.

So even if Alibaba isn’t dominating headlines, it’s clearly making a real impact in the AI space.

 

The AI Race Is Shifting, and It’s Not Just About Size Anymore

In 2023 and 2024, the AI race was all about size. Bigger models were assumed to be better. But in 2025, the trend has changed. The top-performing models now focus on a balance of power, speed, and flexibility.

 

The best models today:

 

  • Work well across different languages
  • Understand complex tasks deeply
  • Support AI agents
  • Can run locally when needed

 

Alibaba’s latest models meet all of these needs. They might not have the loudest marketing, but they’re earning developer trust and delivering strong technical results. In the end, that’s what really matters, and it’s what the next stage of the AI race will be built on.

 

People Also Ask

1. What is Alibaba’s Qwen3-235B model and how does it work?

Alibaba’s Qwen3-235B is a powerful open-source AI language model with 235 billion parameters and a Mixture-of-Experts (MoE) design. Only 22 billion parameters are active per run, making it efficient and scalable. It supports a context length of 262,144 tokens and performs well on tasks like reasoning, multilingual understanding, and instruction following.

 

2. How does Qwen3-235B perform compared to Claude Opus 4?

Qwen3-235B matches or beats Claude Opus 4 in several benchmarks, including math reasoning (AIME25), graduate-level QA (GPQA), and long-context tasks. Although Claude leads slightly in MMLU, Alibaba’s model offers similar performance with the added advantage of being open source.

 

3. What is Alibaba Qwen3-Coder and what makes it good for developers?

Qwen3-Coder is Alibaba’s AI model built specifically for software development tasks such as code generation, debugging, and tool usage. It has 480 billion total parameters (35 billion active) and supports up to 1 million tokens of context, making it ideal for analyzing full codebases. It performs on par with Claude Sonnet on real-world coding benchmarks like SWE-bench.

 

4. Are Alibaba’s Qwen3 models open source and free to use?

Yes, both Qwen3-235B and Qwen3-Coder are open-source models. They are available on platforms like Hugging Face, allowing developers to download, fine-tune, host, and use them without API restrictions, unlike closed models like Claude, Gemini, or GPT.

 

5. Why are Alibaba’s Qwen3 AI models important in the 2025 AI race?

Alibaba’s Qwen3 models are significant because they challenge top-tier closed models while remaining open source. They offer high performance in reasoning, code, and long-context tasks, making them a strong choice for developers, researchers, and businesses. Their release reflects a shift toward efficient, flexible, and agent-ready AI solutions in 2025.
