Building on the insights into memory and learning explored in AI Model Compression Part V: The LSTM: A New Kind of Memory, we now turn to another cornerstone of intelligence: vision. Understanding how we and machines perceive the world has been pivotal in shaping the design of neural networks like CNNs.
The Ancient Art of Seeing
In 1959, neuroscientists David Hubel and Torsten Wiesel made a landmark discovery while studying how cats see. They found that certain cells in the visual cortex fired only when the cat saw lines at particular angles. This simple insight, that vision starts with recognizing basic patterns, later helped shape the design of convolutional neural networks. But the story actually starts much earlier.
The Hierarchy of Sight
Consider how a Renaissance artist learns to draw a face:
- First, they see basic lines and edges
- These combine into simple shapes
- Shapes form features like eyes and nose
- Features compose into the complete face
This step-by-step approach is similar to how our brain processes what we see and, later, how convolutional networks learn to "see."
The Mathematics of Vision: One Pixel at a Time
The Convolution Operation: Nature's Pattern Detector
Convolution Operation:
(f * g)(x) = ∑_τ f(τ) g(x − τ)
In vision terms:
- f is what we're looking at (the image)
- g is what we're looking for (the kernel/filter)
- τ represents shifting viewpoint
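The formula above can be sketched directly in code. This is a minimal, illustrative 1D version of the sum over τ; optimized library routines such as `numpy.convolve` compute the same thing.

```python
# Discrete 1D convolution, matching (f * g)(x) = Σ_τ f(τ) g(x − τ).
# A minimal sketch for clarity, not an optimized implementation.

def convolve_1d(f, g):
    """Full discrete convolution of two sequences."""
    n = len(f) + len(g) - 1
    out = [0.0] * n
    for x in range(n):
        for tau in range(len(f)):
            if 0 <= x - tau < len(g):   # only sum where g is defined
                out[x] += f[tau] * g[x - tau]
    return out

signal = [1, 2, 3]          # f: what we're looking at
kernel = [0, 1, 0.5]        # g: what we're looking for
print(convolve_1d(signal, kernel))  # → [0.0, 1.0, 2.5, 4.0, 1.5]
```

Note how the kernel is "slid" across the signal: each output position is one placement of g against f.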
Think of it like an ancient tracker reading marks in the sand:
- The tracker has learned patterns (kernels) to recognize
- They slide their gaze across the ground (convolution)
- At each point, they compute how well the pattern matches
The math is similar to this ancient skill:
For a 3x3 kernel scanning an image:
Image Patch        Kernel
[p1 p2 p3]        [1 0 1]
[p4 p5 p6]   ⊙    [0 1 0]   =  Σ(pᵢkᵢ)
[p7 p8 p9]        [1 0 1]
This isn't just multiplication; it's math that shows how pattern recognition works.
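The patch-kernel step above is just an elementwise multiply followed by a sum. Here is that single step with concrete (hypothetical) pixel values p1…p9 = 1…9:

```python
# One convolution step: elementwise multiply the patch by the kernel, then sum.
patch = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0, 1],
          [0, 1, 0],
          [1, 0, 1]]

response = sum(p * k
               for prow, krow in zip(patch, kernel)
               for p, k in zip(prow, krow))
print(response)  # 1 + 3 + 5 + 7 + 9 = 25
```

The kernel's zeros ignore some pixels entirely; the ones "pass through" the corners and center. That selective weighting is the pattern matching.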
Pooling: The Art of Summarizing
After detecting patterns, our visual system (and CNNs) must summarize what it sees. Max pooling tells this story mathematically:
Max Pooling Operation:
2x2 Region:       Summary:
[2.1 1.4]
[0.8 1.9]    →    2.1
Like an artist stepping back from their canvas, seeing the larger composition rather than individual strokes.
This operation embodies a deep truth about perception: sometimes, to understand better, we need to see less. It's the visual equivalent of seeing the forest instead of the trees.
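The summarizing step can be sketched as a small function. This is a minimal 2x2 max pooling with stride 2, applied to a hypothetical 4x4 feature map:

```python
# 2x2 max pooling with stride 2: keep the strongest response per region.
def max_pool_2x2(image):
    pooled = []
    for r in range(0, len(image) - 1, 2):
        row = []
        for c in range(0, len(image[0]) - 1, 2):
            # take the maximum of the 2x2 window
            row.append(max(image[r][c], image[r][c + 1],
                           image[r + 1][c], image[r + 1][c + 1]))
        pooled.append(row)
    return pooled

feature_map = [[2.1, 1.4, 0.3, 0.0],
               [0.8, 1.9, 0.5, 0.2],
               [1.1, 0.4, 3.0, 0.9],
               [0.6, 0.7, 0.1, 2.2]]
print(max_pool_2x2(feature_map))  # → [[2.1, 0.5], [1.1, 3.0]]
```

Each output value answers one question: was the pattern present anywhere in this region? The exact position is discarded, which is what makes the summary robust to small shifts.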
The Deep Architecture of Vision
Layer by Layer: Building Complexity
A CNN's structure mirrors the evolutionary development of vision itself:
Layer 1 (Simple Cells):
Kernel examples (a vertical-edge detector and a horizontal-edge detector):

Vertical edges     Horizontal edges
[-1  0  1]         [ 1  1  1]
[-2  0  2]         [ 0  0  0]
[-1  0  1]         [-1 -1 -1]
These detect edges, just like the basic cells Hubel and Wiesel found in the brain.
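To see these kernels at work, here is a minimal sketch that slides each one over a tiny synthetic image with a vertical edge (dark left half, bright right half). Note that CNN layers actually compute cross-correlation, sliding the kernel without flipping it, which is what this sketch does:

```python
# Slide a kernel over an image (cross-correlation, as CNN layers compute it).
def apply_kernel(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

# A vertical edge: dark left half, bright right half.
image = [[0, 0, 1, 1]] * 4

vertical_edge = [[-1, 0, 1],
                 [-2, 0, 2],
                 [-1, 0, 1]]
horizontal_edge = [[ 1,  1,  1],
                   [ 0,  0,  0],
                   [-1, -1, -1]]

print(apply_kernel(image, vertical_edge))    # → [[4, 4], [4, 4]]  strong response
print(apply_kernel(image, horizontal_edge))  # → [[0, 0], [0, 0]]  no horizontal edge
```

The vertical-edge kernel fires everywhere along the boundary, while the horizontal-edge kernel stays silent: each simple cell responds only to its own pattern.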
As we go deeper, the network begins to recognize more complex patterns. This hierarchy is like how a child learns to see: first edges → then shapes → then objects → finally meaning.
The Backpropagation Story: Learning to See Better
The way CNNs learn is a very human process. When we calculate gradients through the network, we measure how much each weight contributed to the final error, then nudge each one in the direction that reduces that error. This isn't just calculus; it's math that shows how we learn from mistakes. Like an art student improving their work based on feedback, each backpropagation step helps the network get better at understanding what it sees.
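The learn-from-mistakes loop can be sketched with a single weight. This toy example (the input, target, and learning rate are all made up for illustration) runs gradient descent on a squared error, exactly the update rule backpropagation applies to every kernel weight in a CNN:

```python
# A minimal sketch of gradient descent on one weight.
# Loss: (w * x − target)², gradient: 2 · (w * x − target) · x

x, target = 2.0, 6.0   # hypothetical input and desired output
w = 0.5                # hypothetical starting weight
lr = 0.05              # learning rate

for step in range(50):
    prediction = w * x
    error = prediction - target
    grad = 2 * error * x      # how much the loss changes as w changes
    w -= lr * grad            # step against the gradient: the "feedback"

print(round(w, 3))  # → 3.0, since 3.0 * 2.0 == 6.0
```

Each pass through the loop is one round of feedback: predict, compare, adjust. A CNN does the same thing simultaneously for millions of weights.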
The Activation Story: ReLU and the Art of Decision
The choice of ReLU (Rectified Linear Unit) as an activation function tells a surprisingly profound story:
ReLU(x) = max(0, x)
Graphically, ReLU is flat at zero for every negative input, then rises as a straight line with slope 1 for positive inputs.
This simple function shows an important truth about both biological and artificial vision: neurons must decide whether to fire or not. Why is this so effective? Think about how we see:
- Some features in a scene matter (positive activation)
- Others don't (set to zero)
- There's no need for complicated in-between details
This is similar to how we pay attention—either we notice something, or we don’t. The math behind ReLU captures this simple yes/no process of seeing, while still allowing the network to learn.
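The yes/no behavior is visible when ReLU is applied elementwise to a set of (hypothetical) neuron activations:

```python
# ReLU: negative activations are silenced, positive ones pass through unchanged.
def relu(x):
    return max(0.0, x)

activations = [-2.0, -0.5, 0.0, 1.3, 4.2]
print([relu(a) for a in activations])  # → [0.0, 0.0, 0.0, 1.3, 4.2]
```

Everything below zero collapses to the same "didn't notice it" value, while the positive responses keep their full strength, preserving how strongly each feature was detected.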