AI today isn’t just about building bigger models—it’s about making them accessible. Google just dropped a bombshell with its Gemma 3 Quantization-Aware Trained (QAT) models, and it’s a game-changer for developers, startups, and hobbyists tired of begging for cloud credits or $10,000 GPUs. Let’s break down why this matters in 2025, and how it could reshape how we build, deploy, and interact with AI.
The Memory Math
Quantization isn’t new, but Google’s int4 implementation for Gemma 3 is a leap forward. By compressing model weights from 16-bit floating point (BF16) to 4-bit integers (int4), they’ve slashed memory needs without sacrificing usability. Here’s the breakdown:
27B model - Drops from 54 GB → 14.1 GB (74% reduction)
12B model - Shrinks 24 GB → 6.6 GB (73% lighter)
4B model - Goes from 8 GB → 2.6 GB (ideal for Raspberry Pi projects)
1B model - Shrinks from 2 GB → 0.5 GB (yes, your phone could run this).
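The arithmetic behind these numbers is simple: each weight costs bits/8 bytes. Here is a back-of-the-envelope sketch covering raw weight storage only; the published figures (e.g. 14.1 GB rather than 13.5 GB for the 27B model) run slightly higher because some tensors and runtime buffers stay at higher precision:

```python
def weight_vram_gb(params_billion, bits_per_weight):
    # Raw weight storage: params * (bits / 8) bytes, reported in decimal GB.
    # Ignores embeddings kept at higher precision, KV cache, and activations.
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for size in (27, 12, 4, 1):
    print(f"{size}B model: BF16 ~ {weight_vram_gb(size, 16):.1f} GB, "
          f"int4 ~ {weight_vram_gb(size, 4):.1f} GB")
```

For the 27B model this gives 54.0 GB at BF16 and 13.5 GB at int4, matching the roughly 74% reduction quoted above.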
What’s New in Google’s Gemma 3 QAT Models
The 27B QAT model fits on a consumer RTX 3090 (24 GB VRAM), a GPU that’s now 4-5 years old and widely available secondhand for ~$1,500. This means startups and indie developers can fine-tune enterprise-grade models locally, bypassing costly cloud rentals.
For comparison, Meta’s Llama 3 8B (non-quantized) requires ~16 GB of VRAM. Gemma 3’s 27B QAT has more than three times the parameters yet uses less memory, and runs 2.8x faster on cheaper hardware.
Lee Mager tested the 27B model on an RTX 5090, hitting 56 tokens/sec (faster than most APIs!).
No More “Quantization = Quality Loss”
Quantization often turns models into sluggish, dumbed-down versions of themselves. But Google’s QAT approach flips the script:
5,000-step distillation - The model learns from its full-precision counterpart during training, mimicking its behavior to minimize accuracy loss.
54% lower perplexity drop - Compared to post-training quantization (PTQ), Gemma 3 QAT retains far more of its original “intelligence.” Perplexity (a measure of how “confused” a model is; lower is better) dropped just 0.8 points vs. PTQ’s 1.75-point fall on the llama.cpp benchmark, so the model stays sharp even after compression.
Preserved capabilities - Despite compression, it keeps the original Gemma 3’s instruction-tuning, multi-turn chat skills, and dynamic tool use (e.g., coding, data analysis, API calls).
For instance, developers can integrate multi-modal workflows (text + vision + code) without latency—critical for real-time apps like design tools or robotics.
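The “fake quantization” step that QAT bakes into training can be sketched in a few lines of NumPy. This is purely illustrative (random weights, a single per-tensor scale) rather than Google’s actual pipeline, but it shows the int4 round-trip the model learns to tolerate during those distillation steps:

```python
import numpy as np

def fake_quantize_int4(w):
    """Symmetric per-tensor int4 quantization: map weights onto 16
    integer levels in [-8, 7] via one scale, then dequantize back.
    QAT runs this inside the forward pass so the model learns weights
    that survive the rounding."""
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7)  # 16 representable levels
    return q * scale

rng = np.random.default_rng(0)
w = rng.normal(size=(512, 512)).astype(np.float32)
w_q = fake_quantize_int4(w)
mean_err = float(np.abs(w - w_q).mean())
print(f"mean absolute rounding error: {mean_err:.4f}")
```

Post-training quantization applies this rounding once, after the fact; QAT’s advantage is that gradients flow through training while the rounding is in place, so the final weights sit close to representable int4 levels.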
Why This Matters
But how does this translate from lab benchmarks to real-world impact? Let’s break it down.
Cost Efficiency - Training clusters like NVIDIA’s DGX H100 cost ~$250K. Gemma 3 QAT lets smaller teams compete with corporate giants using consumer GPUs.
Edge AI Explosion - With VRAM demands crushed, expect AI in offline apps, rural healthcare tools, and lightweight IoT devices. MLX compatibility means Apple Silicon Macs can now run 27B models natively.
Tool Integration 2.0 - Improved argument selection means the model doesn’t just use tools—it refines them. For example, in coding tasks, it can adjust API calls mid-conversation based on user feedback.
Stat to note - In 2025, 60% of new AI projects are expected to prioritize local deployment over cloud-based solutions (Gartner, 2024)—Gemma 3 QAT is arriving right on time.
Real-World Use Cases
Want to know how this tech performs when it’s out in the wild? Here’s how developers are already putting Gemma 3 QAT to work.
Run a 12B model on a Jetson Orin (8 GB VRAM) to summarize papers, extract data, and generate hypotheses—no internet needed.
Embed a 4B model into a no-code tool for small businesses, offering ChatGPT-like features without $10K/month AWS bills. Could this spark a revival of low-code SaaS platforms?
Deploy a 1B model on a Raspberry Pi 5 for secure, offline mental health support in areas with spotty connectivity.
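As a sketch of the first use case, here is how an offline summarizer might talk to a locally served model through Ollama’s REST API. The model tag `gemma3:12b-it-qat` and the default local endpoint are assumptions; adjust them to whatever `ollama list` shows on your machine:

```python
import json
import urllib.request

def build_request(text, model="gemma3:12b-it-qat"):
    # Payload for Ollama's /api/generate endpoint; stream=False asks for
    # the whole completion in a single JSON response.
    return {
        "model": model,  # assumed tag; check `ollama list` for yours
        "prompt": f"Summarize this paper in three bullet points:\n\n{text}",
        "stream": False,
    }

def summarize(text, host="http://localhost:11434"):
    # Requires a running Ollama server with the model already pulled.
    data = json.dumps(build_request(text)).encode()
    req = urllib.request.Request(
        f"{host}/api/generate", data=data,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Everything stays on the device: no API keys, no per-token billing, and no paper text leaving the machine.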
For instance, if a telco tested the 12B model for network troubleshooting, it might resolve tickets 40% faster than a previously fine-tuned 7B model.
Kamell praised its practicality: “Finally, a model that doesn’t require a NASA-level setup.”
Some Potential Limitations
While Gemma 3 QAT is groundbreaking, it’s not a magic bullet. Here’s where it falls short:
Struggles with multi-step logic tasks like advanced math proofs or legal analysis. For example, GPT-4 scores ~85% on the MATH benchmark (problems requiring calculus/stats), while Gemma 3 QAT hits ~62% (Google’s internal tests). It’s great for chatbots and coding assistants, not for replacing niche experts.
Quantization reduces memory but adds computational overhead. While Lee Mager hit 56 tokens/sec on an RTX 5090, older GPUs like the RTX 3090 see ~20% slower speeds vs. BF16 models.
Out-of-the-box, Gemma 3 QAT isn’t optimized for ultra-specialized tasks like medical imaging or quantum chemistry. You’ll still need domain-specific data to fine-tune it.
While MLX supports Apple Silicon, older Intel Macs or budget Windows laptops with integrated GPUs may struggle with the 4B+ models.
How to Get Started with Gemma 3 QAT Models
Here’s how to deploy Gemma 3 QAT in a few easy steps.
Grab the GGUF files from Hugging Face or Kaggle—no login walls or paywalls.
Use llama.cpp for CLI lovers, LM Studio for GUI fans, or Ollama for seamless Mac/Linux/Win integration.
Follow Google’s tutorials for Apple Silicon (MLX), Docker, or even Kubernetes clusters.
The Q4_0 format ensures compatibility with older quantized runtimes, making upgrades frictionless for existing projects.
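Before loading a freshly downloaded file into any of these runtimes, a quick header check catches truncated or mislabeled downloads: every GGUF file begins with the 4-byte magic `GGUF` followed by a little-endian uint32 format version. A small sanity-check sketch (not part of any official tooling):

```python
import struct

def check_gguf(path):
    """Return (is_gguf, version) for a file on disk.
    GGUF files start with the magic bytes b'GGUF' and a
    little-endian uint32 format version."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 8 or header[:4] != b"GGUF":
        return False, None
    return True, struct.unpack("<I", header[4:8])[0]
```

Running this before pointing llama.cpp, LM Studio, or Ollama at a multi-gigabyte file is cheaper than waiting for a load error.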
The Bottom Line
Google hasn't just released a model—they’re democratizing AI innovation. By slashing hardware barriers and preserving quality, Gemma 3 QAT lets anyone build, experiment, and deploy without corporate-scale resources. In 2025, when AI is expected to add $15.7 trillion to the global economy (PwC), tools like this could redistribute power from Silicon Valley boardrooms to indie devs in Nairobi or Warsaw.
According to one prediction:
“With Google’s roadmap, expect Gemma 4 to bring int2 quantization—cutting memory needs by another 50% by 2026.” - The Information, 2025
The future is lightweight, local, and open. And with Gemma 3 QAT, it’s already closer than it looks.