Managing how data is stored physically has always been a tricky part of building high-performance data systems. Data engineers and analysts have long used techniques like partitioning and Z-Ordering to speed up queries on large datasets in Delta Lake. These methods certainly help—but they also come with strings attached: manual configurations, regular maintenance, and deep knowledge of how data is queried.
To address these challenges, Databricks recently rolled out a powerful new feature: Liquid Clustering, now generally available. It brings automation and adaptability to how data is laid out, helping teams get better performance with far less manual tuning.
In this post, we’ll dive into what liquid clustering actually does, how it works, what makes it useful, and where it fits (or doesn’t) in your data strategy.
What is Liquid Clustering?
Liquid Clustering is a data layout feature that organizes rows within Delta Lake tables to optimize query performance. It replaces both partitioning and Z-Ordering on a table. You either name the clustering keys yourself with CLUSTER BY or, on Unity Catalog managed tables with predictive optimization enabled, let Databricks choose them with CLUSTER BY AUTO. Keys can be changed later without rewriting existing data, and clustering is incremental: only data that needs clustering is rewritten.
So while Z-Ordering is more of a “set it and re-run it” solution, Liquid Clustering is dynamic. With automatic key selection, it learns from query patterns and adapts the data layout to real-world usage, with little or no manual effort.
How Liquid Clustering Works
Here’s how it works behind the scenes:
Query-Aware Optimization: With automatic key selection (CLUSTER BY AUTO), Databricks analyzes the table's query history to see what you do often. For example, it notices which columns you filter on most or which time ranges you search the most.
Smart File Layouts: Once it picks up on those patterns, it groups rows with similar key values into the same files. That means a query can skip files that cannot match its filters and read less data.
Hands-Off Operation: In automatic mode there's no need to choose clustering keys or schedule maintenance yourself. If you prefer control, you can still name the keys explicitly with CLUSTER BY (col1, col2).
Unity Catalog Support: Automatic key selection works on Unity Catalog managed tables, so you keep all your access controls and governance rules while getting better performance.
The idea is to make your data lakehouse smarter, automatically.
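To make the query-aware idea concrete, here is a minimal Python sketch of one way a system could rank candidate clustering columns from a query log. It is not Databricks' actual algorithm, which this post does not describe. The query log, the column names, and the helpers filtered_columns and suggest_keys are all hypothetical. The default cap of four keys mirrors the maximum number of clustering columns Liquid Clustering accepts.

```python
from collections import Counter
import re

# Hypothetical query log; the table and column names are made up for illustration.
QUERY_LOG = [
    "SELECT * FROM events WHERE customer_id = 'c42'",
    "SELECT count(*) FROM events WHERE customer_id = 'c7' AND timestamp >= '2024-01-01'",
    "SELECT * FROM events WHERE event_type = 'click'",
    "SELECT * FROM events WHERE customer_id = 'c9' AND timestamp < '2024-06-01'",
]

KNOWN_COLUMNS = {"event_id", "event_type", "customer_id", "timestamp"}


def filtered_columns(sql):
    """Return the known columns mentioned after WHERE (a toy parser, not real SQL parsing)."""
    _, _, where_clause = sql.partition(" WHERE ")
    tokens = re.findall(r"[A-Za-z_]+", where_clause)
    return {token for token in tokens if token in KNOWN_COLUMNS}


def suggest_keys(queries, max_keys=4):
    """Rank columns by how many queries filter on them and keep the top max_keys."""
    counts = Counter()
    for query in queries:
        counts.update(filtered_columns(query))
    # Sort by frequency (descending), then by name so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [column for column, _ in ranked[:max_keys]]


print(suggest_keys(QUERY_LOG, max_keys=2))  # ['customer_id', 'timestamp']
```

A real system would also weigh the cost of reclustering against the expected savings from data skipping; this sketch only counts filter frequency.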
Benefits of Liquid Clustering
Here’s what makes Liquid Clustering stand out:
No Configuration Hassles: With CLUSTER BY AUTO, you don't have to choose any keys. Databricks selects them for you.
Faster Queries: If your query filters on the clustering keys or scans ranges of them, it will be quicker. It skips files you don't need and reads less data.
Adaptive Over Time: As your data or queries change, the layout changes too. Keys can be replaced without rewriting the existing data.
Cost-Efficient Queries: Faster queries use less compute. That means lower costs, especially if you have a lot of data.
Compatible with Delta Live Tables: It fits into your data pipelines. It helps when data comes in and when you query it later.
Enabling Liquid Clustering
It’s easy to get started. To enable it on an existing Unity Catalog managed Delta table and let Databricks choose the keys, run:
ALTER TABLE my_table CLUSTER BY AUTO;
To create a new table with Liquid Clustering already turned on:
CREATE TABLE my_table (
  id INT,
  name STRING,
  created_at TIMESTAMP
)
CLUSTER BY AUTO;
With CLUSTER BY AUTO you don’t need to specify any clustering keys; the system picks them from the table's query history. This mode needs Databricks Runtime 15.4 LTS or above and predictive optimization enabled. To choose the keys yourself, list them instead, for example CLUSTER BY (created_at).
When You Might Not Need It
Even though Liquid Clustering is helpful, you don’t always need it.
Small Tables: If your tables are small, you probably won’t see much of a speed boost. It works best with big datasets where there's more to organize.
Simple, Predictable Queries: If your queries are super simple and follow the same pattern every time, and you’re already using Z-Ordering with no problems, there’s not much reason to change things.
Write-Heavy Environments: If you're doing a lot of writing to your tables, Liquid Clustering might slow that part down a little, because data is clustered as it is written and again when the table is optimized. But in most cases, the faster reads make up for it.
Monitoring Liquid Clustering
You can check whether Liquid Clustering is doing its job and helping your queries run faster.
i) One way is to use a simple command: DESCRIBE HISTORY my_table. This shows you when OPTIMIZE operations ran and what changed. It’s like looking at a timeline of updates.
ii) You can also look at the query profile or SQL dashboards to keep track of performance. Comparing how many files and bytes a query reads before and after clustering shows whether data skipping is improving.
iii) Another way is to inspect the table's Delta metadata. For example, DESCRIBE DETAIL my_table reports the current clustering columns along with the number and total size of the table's files.
How It Compares: Liquid Clustering vs Z-Ordering
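| Feature                  | Z-Ordering | Liquid Clustering |
|--------------------------|------------|-------------------|
| Manual Setup             | Yes        | No                |
| Adapts over time         | No         | Yes               |
| Requires Re-Optimization | Yes        | No                |
| Query Pattern Awareness  | No         | Yes               |
| Background Processing    | No         | Yes               |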
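The Liquid Clustering column assumes automatic key selection (CLUSTER BY AUTO) with predictive optimization enabled. If you name the keys yourself, setup is manual and you schedule OPTIMIZE on your own. Either way, Liquid Clustering is a more modern, scalable alternative to traditional methods, particularly useful when dealing with unpredictable workloads.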
Under the Hood: What Actually Happens?
1. Clustering Keys Are Hints, Not Partitions
When you enable clustering (e.g., on customer_id), you’re not restructuring partitions. Instead, you’re letting Databricks know what field matters most for filtering.
ALTER TABLE orders CLUSTER BY (customer_id);
2. Smarter File Organization
Databricks uses space-filling curves, the same family of techniques behind Z-Ordering, to keep rows with similar key values in the same files. This tightens each file's min/max statistics, so more files can be skipped at query time.
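The space-filling-curve idea can be illustrated with a Morton (Z-order) code, which interleaves the bits of two keys into a single sortable number. This is a sketch of the general technique only, not Databricks' internal implementation. The helpers interleave_bits and z_order_sort and the 4x4 grid of sample points are assumptions made for the example.

```python
def interleave_bits(x, y, bits=16):
    """Build a Morton (Z-order) code by interleaving the low `bits` bits of x and y."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)        # bits of x go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)    # bits of y go to odd positions
    return code


def z_order_sort(points):
    """Sort (x, y) pairs by their Morton code so nearby points end up adjacent."""
    return sorted(points, key=lambda point: interleave_bits(point[0], point[1]))


# A 4x4 grid of (x, y) key pairs, e.g. two integer-encoded clustering columns.
grid = [(x, y) for x in range(4) for y in range(4)]
ordered = z_order_sort(grid)

# The first four entries form one 2x2 block: points close in both dimensions.
print(ordered[:4])  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```

If each group of four consecutive points were written to one file, every file would cover a small square of the key space, which is what makes min/max statistics on both columns useful for skipping.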
3. Background Optimization
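Z-Ordering has to be re-run by hand with OPTIMIZE ... ZORDER BY. With Liquid Clustering, a plain OPTIMIZE clusters data incrementally, rewriting only the files that need it. On Unity Catalog managed tables with predictive optimization enabled, Databricks runs that OPTIMIZE for you in the background; otherwise, schedule it as a regular job.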
4. Stream-Friendly
Writers can keep appending data—there’s no need to manage partitions. This makes it ideal for streaming ingestion and highly concurrent environments.
5. Metadata Efficiency
By keeping smart min/max statistics on clustering keys, Liquid Clustering enables faster query planning without bloating your metastore with tiny partitions.
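The effect of min/max statistics on data skipping can be sketched with a small simulation. The DataFile class, the helpers write_files and files_to_scan, the file size, and the key range below are all hypothetical; real Delta tables keep these statistics in the transaction log, and this sketch only models the pruning logic.

```python
from dataclasses import dataclass
import random


@dataclass
class DataFile:
    min_key: int
    max_key: int
    rows: int


def write_files(keys, rows_per_file):
    """Split keys into fixed-size files, recording min/max statistics per file."""
    files = []
    for start in range(0, len(keys), rows_per_file):
        chunk = keys[start:start + rows_per_file]
        files.append(DataFile(min(chunk), max(chunk), len(chunk)))
    return files


def files_to_scan(files, low, high):
    """Keep only files whose [min_key, max_key] range overlaps the filter [low, high]."""
    return [f for f in files if f.max_key >= low and f.min_key <= high]


random.seed(0)
keys = list(range(10_000))
random.shuffle(keys)

unclustered = write_files(keys, 1_000)        # rows land in arrival order
clustered = write_files(sorted(keys), 1_000)  # rows grouped by key before writing

# Equivalent to: WHERE key BETWEEN 2500 AND 2600
print("unclustered files scanned:", len(files_to_scan(unclustered, 2500, 2600)))
print("clustered files scanned:", len(files_to_scan(clustered, 2500, 2600)))
```

In the clustered layout, the filter touches a single file, because only one file's key range overlaps 2500 to 2600. In the unclustered layout, nearly every file spans most of the key space, so almost nothing can be skipped.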
Real-World Example
Traditional setup:
CREATE TABLE events (
event_id STRING,
event_type STRING,
customer_id STRING,
timestamp TIMESTAMP
)
USING DELTA
PARTITIONED BY (event_type);
You’re locked into event_type even if most filters use customer_id. Now with Liquid Clustering:
CREATE TABLE events (
  event_id STRING,
  event_type STRING,
  customer_id STRING,
  timestamp TIMESTAMP
)
USING DELTA
CLUSTER BY (customer_id);
If access patterns shift later, you can change the key without rewriting existing data:
ALTER TABLE events CLUSTER BY (customer_id, timestamp);
You get flexibility, better query performance, and far less manual optimization.
Databricks vs Snowflake in Liquid Clustering
Snowflake has no direct equivalent to Databricks' Liquid Clustering.
In Snowflake, the Automatic Clustering Service exists, but:
It's designed for tables with clustering keys defined manually.
It reclusters data in the background to maintain performance, but it’s not adaptive to query patterns over time the way Liquid Clustering is.
Clustering must still be manually defined, and there’s an additional cost for using automatic clustering.
So, Snowflake offers background clustering, but it requires user-defined keys and does not adapt them to query patterns. Here again, Databricks wins!
Final Thoughts
Liquid Clustering marks a turning point for Delta Lake. It merges the flexibility of schema-on-read with the performance of pre-clustered data, without the burden of maintenance.
If your tables handle large volumes of data, are queried in unpredictable ways, or suffer from slow filters and large scans—this feature can help. The best part? It’s non-intrusive. If Liquid Clustering isn’t needed for a table, Databricks simply won’t apply it.
Try It Yourself
Want to see the benefits firsthand? On a Unity Catalog managed table with predictive optimization enabled, just run:
ALTER TABLE your_table CLUSTER BY AUTO;