Liquid Clustering in Databricks: A General Availability Overview

Muhammad Irfan Umar | Posted on July 4, 2025

9-10 Min Read Time

Introduction

Managing how data is stored physically has always been a tricky part of building high-performance data systems. Data engineers and analysts have long used techniques like partitioning and Z-Ordering to speed up queries on large datasets in Delta Lake. These methods certainly help—but they also come with strings attached: manual configurations, regular maintenance, and deep knowledge of how data is queried.

To address these challenges, Databricks rolled out Liquid Clustering, which is now generally available. It brings automation and adaptability to how data is laid out, helping teams get better performance with far less manual tuning.

In this post, we’ll dive into what liquid clustering actually does, how it works, what makes it useful, and where it fits (or doesn’t) in your data strategy.

What is Liquid Clustering?

Liquid Clustering is an intelligent and automated system that organizes data within Delta Lake tables to optimize query performance. Unlike Z-Ordering, which requires users to actively choose clustering keys or carry out routine maintenance, Liquid Clustering continually reorganizes data in the background according to its usage.

So while Z-Ordering is more of a “set it and re-run it” solution, Liquid Clustering is dynamic. It learns from query patterns and adapts the data layout to align with real-world usage automatically, without requiring any manual effort.

How Liquid Clustering Works

Here’s how it works behind the scenes:

Query-Aware Optimization: Databricks watches your queries to see what you do often. For example, it notices which columns you filter a lot or which time ranges you search the most.

Smart File Layouts: Once it picks up on those patterns, it groups related data together in smarter ways. That means it can find what you need faster and won’t have to read as much data.

Hands-Off Operation: You don’t have to set anything up. There’s no need to choose clustering keys or manage anything. It works on its own.

Unity Catalog Support: If you’re using Unity Catalog, it fits right in. You still get all your security and rules, but now with better performance too.

The idea is to make your data lakehouse smarter, automatically.

Benefits of Liquid Clustering

Here’s what makes Liquid Clustering stand out:

No Configuration Hassles: You don’t have to choose any keys or do anything special. Databricks does it all for you.

Faster Queries: If your query has filters or looks through ranges, it will be quicker. It skips files you don’t need and reads less data.

Adaptive Over Time: As your data or queries change, Liquid Clustering changes too. It keeps things fast without you doing anything.

Cost-Efficient Queries: Faster queries use less computer power. That means lower costs, especially if you have a lot of data.

Compatible with Delta Live Tables: It fits into your data pipelines. It helps when data comes in and when you query it later.

Enabling Liquid Clustering: It’s easy to get started. To enable it on an existing Delta table and let Databricks pick the keys, run:

ALTER TABLE my_table CLUSTER BY AUTO;

To create a new table with Liquid Clustering already turned on:

CREATE TABLE my_table (
 id INT,
 name STRING,
 created_at TIMESTAMP
)
CLUSTER BY AUTO;

With CLUSTER BY AUTO you don’t need to specify any clustering keys; the system figures them out. Automatic key selection relies on predictive optimization for Unity Catalog managed tables, so on other tables you name the keys yourself with CLUSTER BY (column, ...).
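Enabling clustering on an existing table does not by itself rewrite the data already stored in it. The sketch below shows how that data could be clustered, reusing the placeholder table my_table from above; it assumes a runtime recent enough to support the FULL option, which is worth checking in your workspace.

```sql
-- Incrementally cluster data that has not been clustered yet.
OPTIMIZE my_table;

-- Force every record to be reclustered, for example right after
-- enabling clustering on a large existing table.
-- Assumption: the runtime supports OPTIMIZE ... FULL.
OPTIMIZE my_table FULL;
```

The plain OPTIMIZE is the cheaper of the two because it only touches data that still needs clustering.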

When You Might Not Need It

Even though Liquid Clustering is helpful, you don’t always need it.

Small Tables: If your tables are small, you probably won’t see much of a speed boost. It works best with big datasets where there's more to organize.

Simple, Predictable Queries: If your queries are super simple and follow the same pattern every time, and you’re already using Z-Ordering with no problems, there’s not much reason to change things.

Write-Heavy Environments: If you're doing a lot of writing to your tables, Liquid Clustering might slow that part down a little because it reorganizes data in the background. But in most cases, the faster reads make up for it.
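If a table falls into one of these cases, clustering can be switched off again. A minimal sketch, assuming the placeholder table my_table used earlier:

```sql
-- Remove the clustering keys. Data that is already clustered is left
-- as it is, but later OPTIMIZE runs stop clustering new data.
ALTER TABLE my_table CLUSTER BY NONE;
```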

Monitoring Liquid Clustering

You can check if Liquid Clustering is doing its job and helping your data run better.

i) One way is to use a simple command: DESCRIBE HISTORY my_table. This shows you when optimizations happened and what changed. It’s like looking at a timeline of updates.

ii) You can also look at SQL dashboards or use Unity Catalog to keep track of performance. These tools give you helpful charts and numbers so you can see what’s going on.

iii) Another way is to inspect the table’s Delta metadata, for example with DESCRIBE DETAIL, which reports the current clustering columns alongside the file count and table size.
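The command-based checks can be run as follows. This is a sketch against the placeholder table my_table; the column names in the comments are the ones these commands normally return.

```sql
-- Operation log: look for OPTIMIZE entries and their operationMetrics.
DESCRIBE HISTORY my_table;

-- Current state: clusteringColumns, numFiles and sizeInBytes.
DESCRIBE DETAIL my_table;
```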

How It Compares: Liquid Clustering vs Z-Ordering

Feature	Z-Ordering	Liquid Clustering
Manual Setup	Yes	No
Adapts over time	No	Yes
Requires Re-Optimization	Yes	No
Query Pattern Awareness	No	Yes
Background Processing	No	Yes

Liquid Clustering is a more modern, scalable alternative to traditional methods, particularly useful when dealing with unpredictable workloads.
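The difference in day-to-day maintenance shows up in the commands themselves. The sketch below uses two hypothetical tables, sales_zorder and sales_liquid, each with a hypothetical sale_date column; neither name comes from a real schema.

```sql
-- Z-Ordering: the columns are named on every maintenance run.
OPTIMIZE sales_zorder ZORDER BY (sale_date);

-- Liquid Clustering: the keys are stored in table metadata once,
-- so the maintenance command carries no column list.
ALTER TABLE sales_liquid CLUSTER BY (sale_date);
OPTIMIZE sales_liquid;
```

The two approaches are shown on separate tables because a table that uses Liquid Clustering cannot also be Z-Ordered or partitioned.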

Under the Hood: What Actually Happens?

1. Clustering Keys Are Hints, Not Partitions

When you set a clustering key yourself (e.g., on customer_id), you’re not restructuring partitions. Instead, you’re letting Databricks know which field matters most for filtering.

ALTER TABLE orders CLUSTER BY (customer_id);

2. Smarter File Organization

Databricks uses space-filling curves, the same family of techniques behind Z-ordering, to place rows with nearby key values in the same files. Tighter value ranges per file improve metadata quality and speed up filtering at query time.

3. Background Optimization

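Unlike Z-Ordering, which you re-run manually with OPTIMIZE ... ZORDER BY, Liquid Clustering is incremental. With predictive optimization enabled, Databricks schedules the work in the background: it watches query patterns, checks for file fragmentation, and rewrites files only when needed. Without predictive optimization, a plain OPTIMIZE on the table triggers the same incremental clustering.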

4. Stream-Friendly

Writers can keep appending data—there’s no need to manage partitions. This makes it ideal for streaming ingestion and highly concurrent environments.

5. Metadata Efficiency

By keeping smart min/max statistics on clustering keys, Liquid Clustering enables faster query planning without bloating your metastore with tiny partitions.
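A query shaped like the one below is where these statistics pay off. It is only a sketch: it assumes the orders table clustered on customer_id from point 1, and the string-typed key and the literal value are made up for illustration.

```sql
-- Filtering on the clustering key lets the engine skip every file
-- whose min/max range for customer_id cannot contain the value.
SELECT *
FROM orders
WHERE customer_id = 'C-1001';
```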

Real-World Example

Traditional setup:

CREATE TABLE events (
 event_id STRING,
 event_type STRING,
 customer_id STRING,
 timestamp TIMESTAMP
)
USING DELTA
PARTITIONED BY (event_type);

You’re locked into event_type even if most filters use customer_id.
Now with Liquid Clustering:

CREATE TABLE events (
 event_id STRING,
 event_type STRING,
 customer_id STRING,
 timestamp TIMESTAMP
)
USING DELTA
CLUSTER BY (customer_id);

You get flexibility and better query performance, and you can change the clustering key later without rebuilding the table around a new partition column.
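To illustrate that flexibility, suppose most filters on events later combine a customer with a time range. The sketch below assumes that scenario; the literal values are invented, and the timestamp column is backtick-quoted only to keep it visually distinct from the type name.

```sql
-- Extend the clustering keys to match the new filter pattern.
ALTER TABLE events CLUSTER BY (customer_id, `timestamp`);

-- Recluster incrementally using the new keys.
OPTIMIZE events;

-- A query shaped like the new keys.
SELECT event_id, event_type
FROM events
WHERE customer_id = 'C-1001'
  AND `timestamp` >= TIMESTAMP '2025-07-01 00:00:00';
```

With the partitioned version of the table, the same change would have meant recreating the table with a different PARTITIONED BY clause.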

Databricks vs Snowflake in Liquid Clustering

Snowflake has no direct equivalent to Databricks' Liquid Clustering.

In Snowflake, the Automatic Clustering Service exists, but:

It's designed for tables with clustering keys defined manually.
It reclusters data in the background to maintain performance, but it’s not adaptive to query patterns over time the way Liquid Clustering is.
Clustering must still be manually defined, and there’s an additional cost for using automatic clustering.

So, Snowflake offers background clustering, but it requires user-defined keys and does not adapt to query patterns. Here again, Databricks wins!
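For contrast, here is a sketch of the Snowflake side, written in Snowflake SQL. The events table and the customer_id key are carried over from the earlier example purely for illustration.

```sql
-- Snowflake: the clustering key is always defined by the user.
ALTER TABLE events CLUSTER BY (customer_id);

-- Automatic Clustering can be paused and resumed per table,
-- which is one way to control its cost.
ALTER TABLE events SUSPEND RECLUSTER;
ALTER TABLE events RESUME RECLUSTER;

-- Inspect how well the table is clustered on that key.
SELECT SYSTEM$CLUSTERING_INFORMATION('events', '(customer_id)');
```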

Final Thoughts

Liquid Clustering marks a turning point for Delta Lake. It merges the flexibility of schema-on-read with the performance of pre-clustered data, without the burden of maintenance.

If your tables handle large volumes of data, are queried in unpredictable ways, or suffer from slow filters and large scans—this feature can help.
The best part? It’s non-intrusive. If Liquid Clustering isn’t needed for a table, Databricks simply won’t apply it.

Try It Yourself

Want to see the benefits firsthand? Just run:

ALTER TABLE your_table CLUSTER BY AUTO;

Sit back, and let Databricks handle the rest.
