
Databricks Workflows: How We Automated Our Employee Satisfaction Analytics

Iqra Sarwar
8-9 Min Read Time

We used to manually run reports for our HR team until last month. Every Monday morning, our team would waste 2 hours pulling employee satisfaction data, cleaning it up, and generating dashboards that were already outdated by the time anyone looked at them. HR would get impatient, and we would start the week already behind.

 

The Breaking Point

The final straw came when our CEO asked for ad-hoc reports during an all-hands meeting. There I was, frantically querying databases while everyone waited. I had to admit that our "real-time" employee satisfaction tracking was actually T+3 days at best. That embarrassing moment finally pushed us to properly implement Databricks Workflows.

 

What Exactly Is Databricks Workflows?

Databricks Workflows is an orchestration layer for automating and scheduling data pipelines. Strip away the marketing talk, and it's essentially:

 

  • A task scheduler that can run notebooks, SQL, and Python jobs
  • A dependency manager that ensures tasks run in the correct order
  • A resource optimizer that starts and stops compute as needed
  • A monitoring system that alerts you when things break

 

What I appreciate most is how it handles compute resources. Before, our clusters ran always-on, costing us a fortune. With Workflows, we use job clusters that spin up just for the task and terminate when it's done. Our finance team noticed the savings immediately.

 


 

Our Employee Satisfaction Data Pipeline

Our employee mood tracking system has three main data sources:

 

  1. A daily "mood pulse" where employees record how they're feeling based on different questions (1-5 scale)
  2. Quarterly satisfaction surveys with detailed feedback
  3. Ad-hoc surveys around specific initiatives or events

 

Previously, I manually combined these sources into reports. Now, our Databricks Workflow handles everything automatically.

 

Setting Up the Pipeline

Here's how I built our employee satisfaction workflow:

 

  1. Created a new workflow called "satisfaction_survey_main"
  2. Added the first task to ingest data from various sources.
    This task extracts raw employee mood pulse data by consolidating sources, including historical survey records and Workstream database entries. It captures key fields such as timestamps, employee IDs, departments, and mood scores.
    • The historical survey data was fetched once during initial ingestion.
    • Ongoing ingestion continues from the Workstream database.
    • Runs on our existing cluster (we didn't need a job cluster for this small task)
  3. Created a silver-level processing task that depends on the ingestion task
    • This task cleans and standardizes the mood data
    • Maps employee IDs to departments and teams
    • Handles inconsistencies (like when someone accidentally submits multiple mood responses)
    • Augments with metadata (like company events or holidays that might affect mood)
  4. Added a gold-level task that takes the silver data and creates analytics-ready datasets
    • Calculates aggregations like average mood by department, team, and date
    • Identifies trend changes and statistical anomalies
    • Joins with ad-hoc survey results when available
    • Produces the final tables that power our dashboards

 

The interesting part was chaining these tasks together. In the workflow definition, we use a "depends_on" field to specify that the silver task can only run after ingestion completes, and the gold task only runs after the silver task finishes.
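
For anyone who wants to see the shape of it, here's a minimal sketch of that three-task chain expressed as a Jobs API payload. The workspace URL, token, notebook paths, cluster IDs, node type, and schedule are placeholders, and we actually deploy ours as a bundle rather than calling the API by hand, but the depends_on structure is the same idea:

```python
import requests

HOST = "https://<workspace-url>"           # placeholder
TOKEN = "<personal-access-token>"          # placeholder

workflow = {
    "name": "satisfaction_survey_main",
    "schedule": {
        "quartz_cron_expression": "0 0 6 * * ?",  # example: every day at 06:00
        "timezone_id": "UTC",
    },
    "job_clusters": [
        {
            "job_cluster_key": "pipeline_cluster",
            "new_cluster": {
                "spark_version": "14.3.x-scala2.12",
                "node_type_id": "i3.xlarge",      # example node type
                "num_workers": 2,
            },
        }
    ],
    "tasks": [
        {
            "task_key": "ingest_mood_data",
            "notebook_task": {"notebook_path": "/Pipelines/ingest_mood_data"},
            "existing_cluster_id": "<shared-cluster-id>",  # small task, shared cluster
        },
        {
            "task_key": "silver_processing",
            "depends_on": [{"task_key": "ingest_mood_data"}],
            "notebook_task": {"notebook_path": "/Pipelines/silver_mood_data"},
            "job_cluster_key": "pipeline_cluster",
        },
        {
            "task_key": "gold_processing",
            "depends_on": [{"task_key": "silver_processing"}],
            "notebook_task": {"notebook_path": "/Pipelines/gold_mood_analytics"},
            "job_cluster_key": "pipeline_cluster",
        },
    ],
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=workflow,
)
resp.raise_for_status()
print("Created job:", resp.json()["job_id"])
```

In practice the silver and gold tasks run on their own, beefier compute; more on that below.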

 


 

Managing Different Data Sources

One challenge was handling our different data frequencies:

 

  • Mood pulse data comes in daily
  • Quarterly surveys arrive, well, quarterly
  • Ad-hoc surveys appear unpredictably

Rather than creating separate workflows, we built intelligence into our tasks:

 

  1. The ingestion task always runs and gets the latest mood pulse data
  2. It checks for new survey data (both quarterly and ad-hoc)
  3. If found, it processes those too; if not, it just processes the mood data
  4. The silver and gold tasks adapt their processing based on what data came in.

This approach means our workflow runs the same way every day, but handles different incoming data appropriately.
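
Here's a stripped-down sketch of what that "intelligent" ingestion step can look like inside the notebook. The table names and landing-zone path are illustrative, not our exact code, and it assumes it runs in a Databricks notebook where spark and dbutils are pre-defined:

```python
from pyspark.sql import functions as F

# 1. Always pull yesterday's mood pulse records from the Workstream database
#    (table names here are illustrative).
mood_df = (
    spark.read.table("workstream.mood_pulse")
    .where(F.col("submitted_at") >= F.date_sub(F.current_date(), 1))
)
mood_df.write.mode("append").saveAsTable("bronze.mood_pulse")

# 2. Check whether any new survey files (quarterly or ad-hoc) landed since the last run.
survey_path = "/mnt/surveys/incoming/"          # hypothetical landing zone
new_files = [f for f in dbutils.fs.ls(survey_path) if f.name.endswith(".csv")]

if new_files:
    surveys_df = spark.read.option("header", True).csv(survey_path)
    surveys_df.write.mode("append").saveAsTable("bronze.survey_responses")

# 3. Let the downstream tasks know what arrived in this run.
dbutils.jobs.taskValues.set(key="has_new_surveys", value=bool(new_files))
```

One way to make the silver and gold tasks adapt is to read that flag back with dbutils.jobs.taskValues.get and skip the survey-specific joins when it's False.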

 

Parameters for Flexibility

After running the workflow for a few weeks, our HR team started asking for specific reports. Instead of creating multiple workflows, I added parameters:

 

  • survey_type - selects pulse, annual, or exit surveys
  • date_range - lets us run historical analyses for specific time periods
  • department_filter - allows reports focused on specific parts of the company
  • sentiment_analysis - enables NLP processing on free-text comments
  • anonymity_threshold - ensures enough responses to maintain employee privacy

 

Now, when HR needs something specific, they just click "Run Now" and change the parameters. No code changes needed, no manual data pulls. In my notebooks, I access the parameters like this: dbutils.widgets.get("date_range"). Pretty straightforward.
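
For completeness, here's roughly what that looks like at the top of a notebook. The default values are illustrative; the point is that the workflow passes these parameters in and the notebook branches on them:

```python
# Declare the widgets so the notebook also runs standalone with sensible defaults
# (default values here are illustrative).
dbutils.widgets.text("survey_type", "pulse")
dbutils.widgets.text("date_range", "last_30_days")
dbutils.widgets.text("department_filter", "all")
dbutils.widgets.text("sentiment_analysis", "false")
dbutils.widgets.text("anonymity_threshold", "5")

survey_type = dbutils.widgets.get("survey_type")
date_range = dbutils.widgets.get("date_range")
department = dbutils.widgets.get("department_filter")
run_sentiment = dbutils.widgets.get("sentiment_analysis").lower() == "true"
min_responses = int(dbutils.widgets.get("anonymity_threshold"))

# Downstream logic branches on these values, e.g. only running the NLP step
# on free-text comments when run_sentiment is True.
```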

 


 

Handling Different Processing Needs

One technical challenge was that different parts of our pipeline needed different resources:

 

  1. Data ingestion is lightweight and runs fine on our shared cluster
  2. The silver task needs more memory for complex joins and data cleaning
  3. Our gold task needs heavy compute for statistical anomaly detection

 

Databricks Workflows solved this elegantly. We specified a different cluster for each task in our workflow configuration. The silver and gold tasks run as "run_job_task" types that reference separate job definitions with their own cluster configurations, for example, high-memory or high-compute.
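
Sketched out, the parent workflow looks something like the fragment below. The job IDs, cluster ID, and node type are placeholders; the point is that each run_job_task points at a job that carries its own cluster spec:

```python
# Tasks in the parent workflow (IDs are placeholders).
parent_tasks = [
    {
        "task_key": "ingest_mood_data",
        "notebook_task": {"notebook_path": "/Pipelines/ingest_mood_data"},
        "existing_cluster_id": "<shared-cluster-id>",   # lightweight, shared compute
    },
    {
        "task_key": "silver_processing",
        "depends_on": [{"task_key": "ingest_mood_data"}],
        "run_job_task": {"job_id": 111},   # job defined with a memory-optimized cluster
    },
    {
        "task_key": "gold_processing",
        "depends_on": [{"task_key": "silver_processing"}],
        "run_job_task": {"job_id": 222},   # job defined with a compute-optimized cluster
    },
]

# The silver job's own cluster spec can then lean on memory-optimized nodes, for example:
silver_job_cluster = {
    "job_cluster_key": "high_memory",
    "new_cluster": {
        "spark_version": "14.3.x-scala2.12",
        "node_type_id": "r5.2xlarge",   # memory-optimized instance type (AWS example)
        "num_workers": 4,
    },
}
```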

 

We also optimized cost by choosing the right compute for the right job. For example, running lightweight tasks on serverless compute trims total run time because there is no cluster start-up to wait for. Picking the right compute for each task also cut our overall compute costs significantly.

 

Real-Time Alerting for HR

Beyond scheduled reports, we implemented alerting as part of our data observability strategy to detect concerning patterns in employee sentiment, such as sudden drops in mood scores:

 

  1. Created a monitoring task that runs after the gold task
  2. It looks for:
    • Teams with declining satisfaction trends over two weeks
    • Departments with scores 20% below the company average
    • Sudden drops in individual employee mood scores
  3. When triggers are detected, the workflow:
    • Creates an alert in our HR dashboard
    • Sends notifications to relevant managers
    • Generates a preliminary analysis suggesting potential factors

 

The real value was the speed: HR now gets alerts within hours of a potential issue instead of waiting for weekly reports.
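
To make the checks above concrete, here's a minimal sketch of the kind of logic the monitoring task can run. The table and column names are illustrative; the 20%-below-average rule is the one described above, and the notification mechanics are omitted:

```python
from pyspark.sql import functions as F

# Gold-level table with one row per department per day (names are illustrative).
gold = spark.read.table("gold.daily_mood_by_department")
recent = gold.where(F.col("mood_date") >= F.date_sub(F.current_date(), 14))

# Company-wide average mood over the last two weeks.
company_avg = recent.agg(F.avg("avg_mood")).first()[0]

# Departments running 20% or more below the company average.
low_departments = (
    recent.groupBy("department")
    .agg(F.avg("avg_mood").alias("dept_avg"))
    .where(F.col("dept_avg") < 0.8 * company_avg)
)

if low_departments.count() > 0:
    # Feed the HR dashboard and manager notifications from this table.
    low_departments.withColumn("flagged_at", F.current_timestamp()) \
        .write.mode("append").saveAsTable("alerts.department_mood_flags")
```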

 


 

Technical Setup Details

For those interested in the actual technical implementation, our workflow configuration uses:

 

  • Parameter passing between tasks to maintain context
  • Bundle deployment for version control of our workflow
  • Health rules that alert us if the job runs longer than 30 minutes
  • Queue management to handle backlogged executions properly

 

The configuration specifies notification settings, health metrics, cluster IDs, notebook paths, job dependencies, and parameters all in a structured format that can be version controlled.
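
As a flavour of what that looks like, here is an illustrative fragment of those job settings. In a bundle this lives in YAML; the dictionary below just mirrors the same structure, and the e-mail address and thresholds are placeholders:

```python
# Fragment of the job settings (values are placeholders).
job_settings_fragment = {
    "email_notifications": {
        "on_failure": ["data-team@example.com"],
        # fires when a health rule below is breached
        "on_duration_warning_threshold_exceeded": ["data-team@example.com"],
    },
    "health": {
        "rules": [
            # alert if the run takes longer than 30 minutes
            {"metric": "RUN_DURATION_SECONDS", "op": "GREATER_THAN", "value": 1800}
        ]
    },
    "queue": {"enabled": True},     # queue backlogged runs instead of skipping them
    "max_concurrent_runs": 1,
}
```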

 

Monitoring Our Workflow

The basic Databricks monitoring worked well enough, but we wanted more specific insights. We added:

 

  1. Custom metrics tracking:
    • Number of employees reporting daily (participation rate)
    • Processing time for each task
    • Data quality scores (completeness, consistency)
  2. A monitoring notebook that captures operational metrics (sketched after this list):
    • It pulls run data from the Databricks Jobs API
    • Records duration, status, and row counts for each run
    • Identifies trends in processing time or failure rates
  3. A separate dashboard specifically for data pipeline health
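
Here's a rough sketch of that monitoring notebook's core loop, using the standard Jobs runs-list endpoint. The workspace URL, secret scope, job ID, and output table name are placeholders:

```python
import requests

HOST = "https://<workspace-url>"                          # placeholder
TOKEN = dbutils.secrets.get("ops", "jobs_api_token")      # hypothetical secret scope/key
JOB_ID = 123456789                                        # placeholder job ID

resp = requests.get(
    f"{HOST}/api/2.1/jobs/runs/list",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"job_id": JOB_ID, "limit": 25},
)
resp.raise_for_status()

rows = []
for run in resp.json().get("runs", []):
    rows.append({
        "run_id": run["run_id"],
        "result_state": run.get("state", {}).get("result_state"),
        "duration_sec": (run.get("end_time", 0) - run.get("start_time", 0)) / 1000,
    })

# Append to the table that feeds the pipeline-health dashboard (name is illustrative).
if rows:
    spark.createDataFrame(rows).write.mode("append").saveAsTable("ops.workflow_run_history")
```

Duration and failure-rate trends then fall out of simple aggregations over that history table.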

 

This has been crucial for capacity planning. For example, we noticed our Monday processing took longer because weekend data arrived in batches. We adjusted our cluster sizing accordingly.

 

The Real Results

After 4 months with our automated workflow:

 

  1. HR gets daily updates instead of weekly ones
  2. Our compute costs dropped by 42% (job clusters vs. always-on)
  3. We can detect satisfaction issues 75% faster than before
  4. Department managers are more engaged with the data (because it's current)

 

The difference is hard to overstate. When our CEO now asks for employee satisfaction stats during a meeting, we just point to the real-time dashboard instead of frantically querying data.

 

Pain Points & Lessons Learned

To be honest, it wasn't all smooth sailing. Here are some hard-won lessons:

 

  1. Start with simple flows: My first version tried to process everything in one massive job. Debugging was a nightmare. Break complex workflows into multiple linked workflows.
  2. Watch your cluster configs: Our initial runs would sometimes fail because:
    • Memory settings were too low for our data volume
    • Autoscaling wasn't configured properly
    • Driver node would run out of memory while the workers sat idle
  3. Be careful with Delta tables: We initially had locking issues when multiple workflows tried to write to the same Delta tables, so we switched to a more granular approach with specific write paths.
  4. Set sensible timeouts: Our first health rule triggered alerts if jobs ran longer than 10 minutes. This caused false alarms every Monday when processing was heavier.
  5. Document everything: I created a wiki page explaining the workflow, parameters, and expected outputs. This saved me from endless questions when I went on vacation.

 

What Still Bothers Us

Workflows aren't perfect. Our ongoing frustrations:

 

  • Limited parameter validation (it's easy for users to enter invalid values)
  • No built-in data quality monitoring (we had to build our own)
  • The UI locks get annoying when you need to make quick changes
  • Deployment between dev/test/prod environments is still clunky
  • You can't easily clone a workflow with all its dependencies

 

Despite these issues, it's still better than our old manual process or trying to maintain Airflow ourselves.

 

Conclusion

If you're ready to escape spreadsheet hell and manual data pulls:

 

  1. Map out your current reporting process
  2. Identify logical breakpoints between tasks
  3. Determine dependencies between those tasks
  4. Start with a simple workflow that handles your most frequent data needs
  5. Add parameters to make it flexible
  6. Gradually expand to handle more complex cases

 

Don't try to automate everything on day one. Get one piece working well, then expand. We are still learning and improving, and would love to hear what's working for others.
