
How to Migrate from Snowflake to Databricks: A Zero-Downtime Guide for Data Engineers

Ahmad Ali
18-19 Min Read Time

Recent benchmarks show ETL workloads can cost up to 9x more on Snowflake than Databricks Lakehouse, with ETL typically accounting for 50% or more of total data platform costs.

 

If you're thinking about moving from Snowflake to Databricks, you're not alone. Many data teams are making the shift to take advantage of Databricks' unified data intelligence platform for data engineering, analytics, and machine learning.

 

When we compare the two platforms, Snowflake is a powerful data warehouse. But Databricks gives you more flexibility if you're working with big data, running real-time analytics, or building AI models. It's especially attractive if you want to unify your data, analytics, and machine learning workflows in one place.

 

That said, migrating isn't just about moving data. It's about making sure your systems work just as well, if not better, after the move. In this post, I'll walk you through everything about making this migration smooth and stress-free, with particular attention to maintaining operations throughout the transition.

 

Preparation

Before you touch any data, take a step back and look at your current setup in Snowflake.

 

Here's what I recommend:

Audit your Snowflake environment

Make a comprehensive inventory of all your assets. Run the SHOW TABLES, SHOW VIEWS, SHOW PROCEDURES, and SHOW ROLES commands to document your entire environment. Use the Snowflake account_usage views to identify query patterns, usage metrics, and performance bottlenecks that might need special attention during migration.

SELECT query_text, database_name, schema_name, 
      execution_status, execution_time/1000 as execution_time_sec
FROM snowflake.account_usage.query_history
WHERE execution_time > 60000 -- queries that took more than 60 seconds
ORDER BY execution_time DESC
LIMIT 100;

Identify key workloads

Not everything needs to move on Day 1. Start with high-priority tables or jobs that benefit most from Databricks' processing power. Analyze your warehouse usage patterns to understand peak load times and critical ETL windows:

SELECT warehouse_name, 
      SUM(credits_used) as total_credits,
      COUNT(DISTINCT query_id) as query_count
FROM snowflake.account_usage.warehouse_metering_history
GROUP BY 1
ORDER BY 2 DESC;

Evaluate dependencies

Look at what tools are connected to Snowflake, like BI dashboards, data pipelines, etc. Use the ACCESS_HISTORY view to see who and what is connecting to your data:

SELECT ah.user_name, ah.direct_objects_accessed, qh.query_text
FROM snowflake.account_usage.access_history ah
JOIN snowflake.account_usage.query_history qh
  ON ah.query_id = qh.query_id
WHERE ah.query_start_time > DATEADD(month, -1, CURRENT_TIMESTAMP())
ORDER BY ah.query_start_time DESC;

Choose your migration approach

You can go for a phased migration (move in chunks) or a full cutover. If you ask me, phased is safer for maintaining zero downtime. It lets you test along the way and gradually shift workloads while maintaining operational continuity.
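If you go phased, it helps to put the platform decision behind a single switch so you can move one workload at a time while everything else keeps reading from Snowflake. Here's a minimal sketch of that routing idea; the plan dictionary, helper names, and connection options are illustrative assumptions, not a prescribed design:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Which platform currently "owns" each workload; flip entries wave by wave
CUTOVER_PLAN = {
    "daily_sales_etl": "databricks",   # wave 1: migrated and validated
    "finance_reporting": "snowflake",  # wave 2: still on Snowflake
}

SNOWFLAKE_OPTIONS = {  # placeholder connection details
    "sfUrl": "your_account.snowflakecomputing.com",
    "sfUser": "etl_user",
    "sfPassword": "***",
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "REPORTING_WH",
}

def read_table(workload: str, table: str):
    """Route a read to whichever platform currently owns the workload."""
    if CUTOVER_PLAN.get(workload, "snowflake") == "databricks":
        return spark.read.table(table)  # Delta table already migrated
    return (spark.read.format("snowflake")  # Snowflake Spark connector
            .options(**SNOWFLAKE_OPTIONS)
            .option("dbtable", table)
            .load())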

 

Migration Process

Now that you're prepared, let's dive into the actual steps.

1. Extract Data from Snowflake

The first step is to get your data out of Snowflake. 

 

Here's a more precise approach:

 

Use Snowflake's COPY INTO <location> command to export data to cloud storage like S3 or Azure Blob. For optimal performance with Parquet files, specify the format and compression:

COPY INTO 's3://your-bucket/path/to/data/'
FROM my_database.my_schema.my_table
FILE_FORMAT = (TYPE = PARQUET COMPRESSION = SNAPPY)
HEADER = TRUE
MAX_FILE_SIZE = 128000000; -- 128MB files for optimal performance

While Parquet is generally the preferred format for performance, recent benchmarks show that Snowflake's new vectorized Parquet scanner delivers roughly 80% faster performance than previous Parquet testing. With that enhancement, Parquet becomes the most efficient unload format when the vectorized scanner option is enabled.

 

For large tables, consider partitioning your data exports for more efficient parallel processing:

COPY INTO 's3://your-bucket/path/to/data/'
FROM my_database.my_schema.my_table
PARTITION BY ('date=' || TO_VARCHAR(date_column)) -- the partition expression must evaluate to a string
FILE_FORMAT = (TYPE = PARQUET COMPRESSION = SNAPPY);

For incremental processing, capture the last modified timestamp to enable incremental migration:

COPY INTO 's3://your-bucket/path/to/incremental/'
FROM (SELECT * FROM my_table WHERE last_modified_ts > 
    (SELECT MAX(last_processed_ts) FROM migration_control_table))
FILE_FORMAT = (TYPE = PARQUET);

Remember that when unloading data into multiple files (the default, with SINGLE = FALSE), each file gets a unique name such as data_00_1_0.parquet. You can include a prefix in the target path to keep these files organized.

2. Transform the Data

Before loading the data into Databricks, check if you need to make any changes to ensure compatibility and optimization.

Data type mappings: Snowflake and Databricks don't use identical data types. Here's a quick reference for common mappings:

Snowflake     →  Databricks/Spark
NUMBER        →  DECIMAL
FLOAT         →  FLOAT
VARCHAR       →  STRING
DATE          →  DATE
TIMESTAMP     →  TIMESTAMP
VARIANT       →  STRING (or MAP/STRUCT in specific cases)
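
As a rough sketch of how these mappings might be applied during transformation (the column names here are hypothetical), you can cast columns explicitly and parse VARIANT payloads that were exported as JSON strings:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import DecimalType, MapType, StringType

spark = SparkSession.builder.getOrCreate()
df = spark.read.parquet("s3://your-bucket/path/to/data/")

converted = (df
    # NUMBER -> DECIMAL with explicit precision and scale
    .withColumn("order_amount", col("order_amount").cast(DecimalType(18, 2)))
    # VARCHAR -> STRING
    .withColumn("customer_name", col("customer_name").cast(StringType()))
    # VARIANT exported as a JSON string: parse into a MAP (or a STRUCT if the shape is fixed)
    .withColumn("attributes", from_json(col("attributes"), MapType(StringType(), StringType())))
)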

 

Optimization for Delta Lake: Consider partitioning strategies appropriate for Delta Lake, which might differ from your Snowflake partitioning.

# PySpark example for partitioning in Delta Lake
df.write.format("delta") \
  .partitionBy("year", "month") \
  .option("overwriteSchema", "true") \
  .mode("overwrite") \
  .save("/path/to/delta-table")

Schema evolution handling: If you need to modify schemas during migration, Delta Lake offers more flexible schema evolution than Parquet alone:

# Enable schema evolution when writing
df.write.format("delta") \
  .option("mergeSchema", "true") \
  .mode("append") \
  .save("/path/to/delta-table")

3. Load into Databricks

Now comes the fun part - loading the data into your new system. Based on the latest documentation and real-world practices, here are the most effective approaches:

 

Set up a Databricks workspace and cluster configuration optimized for data loading:

{
  "num_workers": 8,
  "spark_conf": {
    "spark.databricks.delta.optimizeWrite.enabled": "true",
    "spark.databricks.delta.autoCompact.enabled": "true",
    "spark.sql.files.maxPartitionBytes": "134217728",
    "spark.sql.shuffle.partitions": "400"
  },
  "node_type_id": "i3.4xlarge"
}

For direct migration from Parquet, use CONVERT TO DELTA, which is significantly more efficient than rewriting the data:

# This is more efficient than reading and rewriting
spark.sql("CONVERT TO DELTA parquet.`s3://my-bucket/parquet-data`")

For larger datasets, consider using the CLONE feature which offers both shallow and deep options:

# Shallow clone (creates pointers without copying data)
spark.sql("CREATE TABLE target_table SHALLOW CLONE parquet.`/path/to/data`")

# Deep clone (copies all data to new location)
spark.sql("CREATE TABLE target_table CLONE parquet.`/path/to/data`")

For ongoing, incremental data loads, set up Auto Loader for change data capture:

from pyspark.sql.functions import current_timestamp

# Configure Auto Loader for continuous ingestion
df = spark.readStream.format("cloudFiles") \
  .option("cloudFiles.format", "parquet") \
  .option("cloudFiles.schemaLocation", "/tmp/schema") \
  .load("s3://my-bucket/new-data/")
  
# Write to Delta table with tracking
df.withColumn("ingestion_time", current_timestamp()) \
  .writeStream \
  .format("delta") \
  .outputMode("append") \
  .option("checkpointLocation", "/tmp/checkpoint") \
  .start("/path/to/delta-table")

Delta Lake gives you versioning, ACID transactions, and better performance - features essential for enterprise-grade data systems. You will find these features to be game-changers for teams previously struggling with data consistency issues in their pipelines.
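
To see the versioning in practice once a table is loaded, here's a quick sketch (the path is carried over from the examples above, and it assumes an active SparkSession on Databricks):

# Inspect the table's transaction log: version, timestamp, and operation per commit
spark.sql("DESCRIBE HISTORY delta.`/path/to/delta-table`").show(truncate=False)

# Time travel: read an earlier version for validation or rollback checks
previous = spark.read.format("delta") \
    .option("versionAsOf", 0) \
    .load("/path/to/delta-table")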

 

Tools and Resources

You don't have to do this all by hand. Here are tools and real-world benchmarks that can accelerate your migration:

 

Fivetran or Matillion: Great for automated ETL from Snowflake to Databricks. With Fivetran, you can simply change the destination from Snowflake to Databricks and run a new sync, which is particularly useful for maintaining parallel systems during migration.

 

Databricks Partner Connect: Helps you discover connectors and integrations with third-party tools already in use with your Snowflake instance.

 

Databricks CLI: Lets you automate things like cluster creation and job scheduling. Here's an example for automating Delta table optimizations:

databricks jobs create --json '{
  "name": "Optimize Delta Tables",
  "schedule": {
    "quartz_cron_expression": "0 0 * * * ?",
    "timezone_id": "UTC"
  },
  "tasks": [{
    "task_key": "optimize",
    "notebook_task": {
      "notebook_path": "/path/to/optimize_script"
    }
  }]
}'
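
The optimize notebook itself isn't shown in this post; a minimal sketch of what it might run looks like this (the table list and Z-order column are placeholders, and spark is the session provided by the Databricks notebook):

# Hypothetical body of /path/to/optimize_script
tables = ["gold.customers", "silver.raw_events"]  # placeholder table list

for t in tables:
    # Compact small files and co-locate data on a frequently filtered column
    spark.sql(f"OPTIMIZE {t} ZORDER BY (id)")
    # Clean up files no longer referenced by the table (default 7-day retention)
    spark.sql(f"VACUUM {t}")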

dbt: If you're using dbt with Snowflake, adapting to Databricks is straightforward. Update your profiles.yml:

# In profiles.yml (both blocks sit under your profile's outputs section)

# Before (Snowflake)
snowflake:
  type: snowflake
  account: your_account

# After (Databricks)
databricks:
  type: databricks
  host: your-workspace.cloud.databricks.com
  http_path: /sql/1.0/warehouses/your_warehouse_id
  token: dapi123456789
  catalog: main
  schema: default

Performance benchmarks: In recent testing at enterprise scale, the vectorized Parquet scanner in Databricks showed a 3-4x performance improvement over standard Parquet processing. This becomes particularly relevant for large analytical queries spanning terabytes of data.

 

Challenges and Solutions

Let's be honest, no migration is completely smooth. Here are the most common issues people face with zero-downtime migrations and how you can solve them:

Data type mismatches

Snowflake and Databricks don't always use the same types. Beyond simple casting, consider using a schema registry or validation framework to ensure consistency:

# Schema validation pattern
from pyspark.sql.types import StructType, StructField, LongType, StringType

expected_schema = StructType([
    StructField("id", LongType(), False),
    StructField("name", StringType(), True)
])

# Validate against expected schema
if df.schema != expected_schema:
    # Log the difference and apply transformation
    diff = set(df.schema.fields) - set(expected_schema.fields)
    print(f"Schema mismatch detected: {diff}")
    # Apply transformations to align schemas (see the sketch below)
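
One way to finish that last step is to coerce the DataFrame toward the expected schema, casting columns that exist and adding missing nullable columns as NULLs. A minimal sketch, assuming the expected_schema defined above:

from pyspark.sql.functions import col, lit

def align_to_schema(df, expected_schema):
    """Cast existing columns to the expected types; add missing nullable columns as NULL."""
    aligned = df
    for field in expected_schema.fields:
        if field.name in df.columns:
            aligned = aligned.withColumn(field.name, col(field.name).cast(field.dataType))
        elif field.nullable:
            aligned = aligned.withColumn(field.name, lit(None).cast(field.dataType))
        else:
            raise ValueError(f"Required column is missing: {field.name}")
    # Keep only the expected columns, in the expected order
    return aligned.select([f.name for f in expected_schema.fields])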

Large volumes of data

For petabyte-scale migrations, I recommend a combination of parallel extraction from Snowflake with multiple warehouses, partitioned unloading, and then parallel ingestion into Databricks:

-- In Snowflake: Create multiple warehouses for parallel extraction
CREATE WAREHOUSE extract_wh_1 WITH WAREHOUSE_SIZE = 'LARGE';
CREATE WAREHOUSE extract_wh_2 WITH WAREHOUSE_SIZE = 'LARGE';

-- Then run multiple COPY operations in parallel sessions
-- Session 1
USE WAREHOUSE extract_wh_1;
COPY INTO 's3://bucket/part1/' FROM
    (SELECT * FROM huge_table WHERE date_partition BETWEEN '2020-01-01' AND '2020-06-30')
    FILE_FORMAT = (TYPE = PARQUET);

-- Session 2
USE WAREHOUSE extract_wh_2;
COPY INTO 's3://bucket/part2/' FROM
    (SELECT * FROM huge_table WHERE date_partition BETWEEN '2020-07-01' AND '2020-12-31')
    FILE_FORMAT = (TYPE = PARQUET);

Security and permissions

You'll need to recreate your access control rules in Databricks. Use Unity Catalog to manage data governance:

-- In Databricks SQL (Unity Catalog)
GRANT SELECT ON TABLE gold.customers TO role_analysts;
GRANT MODIFY ON TABLE silver.raw_events TO role_eng;

-- For column-level security, expose a restricted view (or use column masks) rather than column lists in GRANT
CREATE VIEW gold.customers_marketing AS
  SELECT id, name, city FROM gold.customers;
GRANT SELECT ON TABLE gold.customers_marketing TO role_marketing;

Zero-downtime migration strategy

Maintain dual-write pipelines during migration to ensure both systems have current data:

# Example dual-write pattern in Python
def save_to_both_platforms(data):
    # Write to Snowflake
    snowflake_conn.write(data)
    
    # Write to Databricks
    spark.createDataFrame(data).write.format("delta").mode("append").save("/path/to/delta")
    
    # Verify consistency between platforms
    snowflake_count = snowflake_conn.query("SELECT COUNT(*) FROM table")
    databricks_count = spark.sql("SELECT COUNT(*) FROM delta.`/path/to/delta`").collect()[0][0]
    
    assert abs(snowflake_count - databricks_count) < 10, "Data drift detected between platforms"

Test everything in a staging environment before going live. Set up automated reconciliation checks between Snowflake and Databricks during the transition to catch any inconsistencies early.
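
As a sketch of what such a reconciliation check could look like, the snippet below compares a row count and a simple checksum per table from a Databricks notebook; the connection details, table list, and the id key column are assumptions you'd replace with your own:

import snowflake.connector

TABLES = ["customers", "orders"]  # placeholder list of migrated tables

sf = snowflake.connector.connect(
    account="your_account", user="etl_user", password="***",
    warehouse="REPORTING_WH", database="ANALYTICS", schema="PUBLIC",
)

def reconcile(table: str) -> bool:
    # Row count plus the sum of a numeric key as a cheap checksum on the Snowflake side
    cur = sf.cursor()
    cur.execute(f"SELECT COUNT(*), SUM(id) FROM {table}")
    sf_count, sf_sum = cur.fetchone()

    # Same aggregates on the Databricks side (catalog and schema names are assumed)
    row = spark.sql(f"SELECT COUNT(*) AS c, SUM(id) AS s FROM main.default.{table}").collect()[0]
    return sf_count == row["c"] and sf_sum == row["s"]

for t in TABLES:
    print(t, "OK" if reconcile(t) else "MISMATCH")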

 

Cost Comparison and Final Thoughts

Switching from Snowflake to Databricks isn't just a technical migration. It's a shift in how your team handles data, and often, a significant cost-saving opportunity.

 

In terms of pricing models, Databricks offers a pay-as-you-go approach with no up-front costs, while Snowflake operates on a credit-based system with different rates for Standard ($2.00/credit), Enterprise ($3.00/credit), and Business Critical ($4.00/credit) editions.
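
As a back-of-the-envelope illustration of the credit model (the credit volume below is hypothetical), monthly Snowflake compute cost is simply credits consumed times the edition's rate:

# Hypothetical example: 8,000 credits per month on the Enterprise edition
credits_per_month = 8_000
rate_per_credit = 3.00  # Enterprise rate quoted above
snowflake_compute = credits_per_month * rate_per_credit
print(f"Estimated Snowflake compute: ${snowflake_compute:,.0f}/month")  # $24,000/month

# Databricks compute is billed per DBU; rates vary by workload type and cloud,
# so plug in your own projected DBU usage and negotiated rate for the comparison.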

 

For a real-world comparison, consider a client with the following profile:

 

  • 5TB of data across 200+ tables
  • 10-15 concurrent ETL jobs
  • 30+ BI dashboard users

 

Their monthly costs dropped from approximately $25,000 on Snowflake to $15,000 on Databricks after optimization, representing a 40% reduction. The savings came primarily from:

 

  1. More efficient compute utilization (Databricks' cluster management)
  2. Reduced storage costs with Delta Lake's compression capabilities
  3. Elimination of separate ML tooling costs by utilizing Databricks' unified platform

 

With the right prep and a careful rollout, you'll end up with a system that's faster, smarter, better suited for modern data workflows, and often significantly less expensive. Start small, test often, document everything, and implement reconciliation checks during the transition period.

 

Whether you're moving for cost savings, performance, or better ML support, Databricks can be a game-changer when implemented with careful planning and technical precision.

 

Since Databricks is a single platform, you only need a small development team with Databricks expertise; all of your data ETL, ML, AI, Gen AI, and BI needs can be handled by that one team. There's no need to hire multiple teams to manage these use cases across multiple platforms.

 

Databricks also helps eliminate data silos through unified governance, rather than storing data and solving different use cases on separate platforms.

 

Looking for expert guidance or a turnkey migration? Explore our Databricks services to see how we can accelerate your journey.
