Imagine this: it’s a typical afternoon, you’re at your desk with a steaming cup of tea, facing the daunting task of writing a complex SQL query to make sense of your company’s sales numbers. If that scenario sounds all too familiar, you’re not alone. For many professionals, this is a daily reality, filled with frustration and a steep learning curve. Databricks saw the struggle and came up with something pretty exciting: Databricks Genie.
In the paragraphs ahead, we’ll take a closer look at how Databricks Genie works, explore real-world scenarios where it shines, and show why it’s genuinely a game-changer, especially if SQL isn't your forte.
Meet Databricks Genie
Think of Databricks Genie not as just another analytics tool but as your own on-call data expert. Instead of wrestling with SQL syntax, you simply ask a business question in plain English, and Genie instantly crafts the exact query you need. In other words, Genie handles all the technical heavy lifting so you don’t have to.
The Magic Behind Genie: How Does It Work?
Imagine you’re peeking behind the scenes of Databricks Genie’s deceptively simple search bar and discovering a kind of AI orchestra in full swing. Rather than leaning on a single giant language model, Genie quietly enlists a handful of specialized “musicians”, each tuned to a particular task, so that when you type a question in plain English, it can instantly hand you a spot-on, efficient SQL query. Here’s a detailed, step-by-step look at how Genie turns your everyday words into production-ready SQL, complete with real examples to make it all click.
Step 1 – Parsing and Interpretation
When you type something like, “Show me last quarter’s top five products by revenue,” Genie doesn’t just skim those words. Instead, it:
Breaks your sentence into chunks: Think of it like slicing a sentence into bite-sized pieces: action (“show me”), timeframe (“last quarter”), metric (“revenue”), and scope (“top five products”).
Pins down your intent: Is this a request for a summary table? A trend? A ranking? Genie’s job here is to figure out exactly what you want.
Maps each chunk to a business concept:
“Last quarter” becomes a concrete date range, say, January 1 through March 31, 2025, based on your warehouse’s calendar.
“Products” points to whichever table (or tables) hold your product-related data.
By the end of this phase, Genie isn’t just aware of your words; it understands the business question behind them. It knows you want a read-only query (a SELECT), grouped by product, filtered to that first quarter of 2025, and capped at five results.
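To make the “last quarter” mapping concrete, here is a minimal Python sketch of the date arithmetic involved. It is not Genie’s implementation; the function name last_quarter_range is made up for illustration, and it assumes plain calendar quarters rather than a fiscal calendar.

```python
from datetime import date


def last_quarter_range(today: date) -> tuple[date, date]:
    """Return (start, end) of the calendar quarter before the one containing
    `today`. `start` is inclusive and `end` is exclusive."""
    # Index of the current quarter: 0 for Jan-Mar, ..., 3 for Oct-Dec.
    current_q = (today.month - 1) // 3
    if current_q > 0:
        year, prev_q = today.year, current_q - 1
    else:
        # In Q1, the previous quarter is Q4 of the prior year.
        year, prev_q = today.year - 1, 3
    start = date(year, prev_q * 3 + 1, 1)
    # The exclusive end is the first day of the current quarter.
    end = date(today.year, current_q * 3 + 1, 1)
    return start, end


start, end = last_quarter_range(date(2025, 5, 15))
print(start, end)
```

Returning an exclusive end date lines up with the `>=` / `<` filter style used in the generated query later in this walkthrough.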
Step 2 – Leveraging and Interpreting Metadata
Knowing what you want is only half the battle. Next, Genie needs to figure out where to look. It taps into your Unity Catalog and does the following:
Scans Table and Column Names: Which tables might store “product” or “revenue” info? For example, it might find products, sales_transactions, or orders.
Checks Column Descriptions: It confirms that “revenue” maps to a monetary field, perhaps something like order_total or sales_value, rather than, say, quantity_sold.
Examines Primary and Foreign Keys: If there’s a products.product_id that links to sales_transactions.product_id, Genie spots that relationship right away and knows how to join those tables cleanly.
Looks at Any Examples or Annotations You’ve Provided: If you’ve ever left a sample query or a comment in your catalog, say, an example like:
-- Example:
SELECT product_id, SUM(order_total) AS total_rev
FROM sales_transactions
WHERE sale_date BETWEEN '2025-01-01' AND '2025-03-31'
GROUP BY product_id;
Genie can treat that as a template for similar queries.
By scanning all this metadata, Genie builds a mental blueprint: “Okay, I’ll need sales_transactions.order_total for revenue, sales_transactions.sale_date for date filters, and I’ll join that to products.product_name for a friendly label.” It ignores irrelevant tables (there’s no need to drag in customers if you only care about product revenue) and zeroes in on exactly the fields and joins necessary.
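The metadata lookup can be pictured with a small Python sketch. Everything here is an assumption for illustration: the CATALOG dictionary is a hand-written stand-in for catalog metadata (it is not the Unity Catalog API), and the column descriptions are invented.

```python
# Hypothetical, simplified stand-in for catalog metadata.
CATALOG = {
    "sales_transactions": {
        "columns": {
            "product_id": "identifier of the product sold",
            "order_total": "revenue of the order in USD",
            "quantity_sold": "number of units sold",
            "sale_date": "date of the sale",
        },
        # column -> (referenced table, referenced column)
        "foreign_keys": {"product_id": ("products", "product_id")},
    },
    "products": {
        "columns": {
            "product_id": "primary key",
            "product_name": "display name of the product",
        },
        "foreign_keys": {},
    },
}


def find_columns(term):
    """Return (table, column) pairs whose name or description mentions `term`."""
    term = term.lower()
    return [
        (table, col)
        for table, meta in CATALOG.items()
        for col, desc in meta["columns"].items()
        if term in col.lower() or term in desc.lower()
    ]


def join_condition(child, parent):
    """Return the join predicate implied by a declared foreign key, or None."""
    for col, (ref_table, ref_col) in CATALOG[child]["foreign_keys"].items():
        if ref_table == parent:
            return f"{child}.{col} = {parent}.{ref_col}"
    return None


print(find_columns("revenue"))
print(join_condition("sales_transactions", "products"))
```

The sketch also shows why column descriptions matter: “revenue” never appears in a column name here, so only the description lets the lookup land on order_total.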
Step 3 – Generating the SQL Query
Now comes the fun part: composing the actual SQL. With intent and metadata in hand, Genie:
Assembles the SELECT Clause: Since you asked for “top five products by revenue,” it writes something like:
SELECT
p.product_name,
SUM(st.order_total) AS total_revenue
Inserts the JOIN Logic: Because it knows sales_transactions links to products, it tacks on:
FROM sales_transactions st
JOIN products p
ON st.product_id = p.product_id
Applies the Date Filter: “Last quarter” becomes:
WHERE
st.sale_date >= '2025-01-01'
AND st.sale_date < '2025-04-01'
Groups, Orders, and Limits: To get the top five, Genie adds:
GROUP BY p.product_name
ORDER BY total_revenue DESC
LIMIT 5;
Ensures Read-Only Safety: You’ll notice there are no INSERT, UPDATE, or DELETE statements, just a safe, read-only SELECT.
All together, for our example, Genie whips up this query:
SELECT
p.product_name,
SUM(st.order_total) AS total_revenue
FROM
sales_transactions st
JOIN products p
ON st.product_id = p.product_id
WHERE
st.sale_date >= '2025-01-01'
AND st.sale_date < '2025-04-01'
GROUP BY
p.product_name
ORDER BY
total_revenue DESC
LIMIT 5;
Because Genie can optimize for best practices (think partition pruning or avoiding unnecessary SELECT *), you often end up with something cleaner and faster than what you might type by hand.
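The clause-by-clause assembly above can be mimicked in a few lines of Python. This is only a sketch of the idea of rendering SQL from a structured intent, not how Genie builds queries; the TopNIntent class and render_sql function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TopNIntent:
    """Structured form of a 'top N by metric' question (hypothetical)."""
    fact_table: str
    dim_table: str
    join_key: str
    label_col: str
    metric_col: str
    date_col: str
    start: str  # inclusive ISO date
    end: str    # exclusive ISO date
    limit: int


def render_sql(i: TopNIntent) -> str:
    """Render a read-only SELECT, one clause per line.

    Values are interpolated directly, so this is only safe for trusted,
    pre-validated identifiers and dates."""
    return "\n".join([
        "SELECT",
        f"  d.{i.label_col},",
        f"  SUM(f.{i.metric_col}) AS total_{i.metric_col}",
        f"FROM {i.fact_table} f",
        f"JOIN {i.dim_table} d",
        f"  ON f.{i.join_key} = d.{i.join_key}",
        f"WHERE f.{i.date_col} >= '{i.start}'",
        f"  AND f.{i.date_col} < '{i.end}'",
        f"GROUP BY d.{i.label_col}",
        f"ORDER BY total_{i.metric_col} DESC",
        f"LIMIT {int(i.limit)};",
    ])


intent = TopNIntent(
    fact_table="sales_transactions", dim_table="products",
    join_key="product_id", label_col="product_name",
    metric_col="order_total", date_col="sale_date",
    start="2025-01-01", end="2025-04-01", limit=5,
)
sql = render_sql(intent)
print(sql)
```

Because the template contains only a SELECT, there is no path by which a question could turn into a write statement.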
Step 4 – Execution and Result Delivery
Once Genie finishes drafting that SQL, Databricks takes over:
Query Compilation & Optimization: Databricks’ Catalyst optimizer kicks in, rewriting the plan to avoid full table scans and use partition filters where possible, say, scanning only the partitions for January through March 2025.
Cluster Processing: The query runs in parallel across worker nodes. Each node calculates intermediate “partial sums” of order_total, those partial results are shuffled between workers so each product’s sums can be combined in a final aggregation, and the finished rows are returned to the driver.
Result Formatting: When the dust settles, you get a neat table, five rows, one per product, with the total revenue beside each. If you’d also asked for a chart, Genie would have appended the code to render it.
Interactive Refinement: Let’s say you see Product A at the top of your list and want more detail: “Break down Product A’s revenue by region.” You type that follow-up in plain English, and Genie re-parses your intent, checks the metadata again (this time adding the regions table), and adjusts your SQL:
SELECT
r.region_name,
SUM(st.order_total) AS region_revenue
FROM
sales_transactions st
JOIN products p
ON st.product_id = p.product_id
JOIN regions r
ON st.region_id = r.region_id
WHERE
p.product_name = 'Product A'
AND st.sale_date >= '2025-01-01'
AND st.sale_date < '2025-04-01'
GROUP BY
r.region_name
ORDER BY
region_revenue DESC;
Within seconds, you go from a plain-English question to a polished SQL query to a clear table (and optional chart). No more wrestling with field names, join conditions, or date math.
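The “partial sums” idea from the cluster-processing step is easy to picture in plain Python. This is a toy model, not Spark code: each inner list stands in for the rows one worker holds, and the product names and amounts are made up.

```python
from collections import Counter


def partial_sums(rows):
    """Per-partition step: sum order_total per product for one chunk of rows."""
    acc = Counter()
    for product, order_total in rows:
        acc[product] += order_total
    return acc


def merge(partials):
    """Final aggregation: combine the per-partition sums into overall totals."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds values key by key
    return total


# Two hypothetical partitions of (product_name, order_total) rows.
partitions = [
    [("Widget", 100), ("Gadget", 50)],
    [("Widget", 25), ("Gizmo", 70)],
]
totals = merge(partial_sums(p) for p in partitions)
top_two = totals.most_common(2)  # same role as ORDER BY ... DESC LIMIT 2
print(top_two)
```

Only the small per-product sums travel between steps, which is why aggregating before combining is cheaper than moving every raw row.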
Why You’ll Love It
Big Time Savings: Instead of spending half an hour scouring schemas, typing JOINs, and debugging syntax errors, you get your answer in a minute, tops.
Fewer Mistakes: Because Genie leans on your catalog’s metadata, it helps you avoid the classic slip-ups: typos in column names, forgotten join keys, or misinterpreted date filters.
Consistent Best Practices: The queries Genie writes aim to follow good patterns: partition pruning, explicit read-only semantics, and just the columns you need.
Instant Back-and-Forth: Want to tweak your filters, add a new dimension, or change the chart type? You simply type it in English, and Genie picks up your previous context and updates the query on the fly.
The next time you find yourself staring blankly at a SQL editor, remember: with Genie on your side, you’re just a few words away from the insight you need. No fussing over syntax, just plain-English questions that return clear, data-driven answers.
Optimizing Data for Genie (Best Practices for Setup and Guidance)
You’re about to introduce Genie to your data, and you want it to rock: no hiccups, no confusion over “I-don’t-understand-this-column.” Think of Genie as a guest chef in your data kitchen: if the pantry is disorganized, the meal will taste off. But when everything is prepped, organized, and labeled, Genie will whip up insights faster than you can say “SQL shortcuts.” Ready to make your data Genie-ready? Let’s dive in!
1. Curate Your Data for Easy Analysis
Rather than exposing every column in every table, create lean, purpose-built views. For example, if you only need product_name, sale_date, and order_total for Q1 revenue, build a view like:
CREATE VIEW v_q1_revenue AS
SELECT
p.product_name,
st.sale_date,
st.order_total
FROM sales_transactions st
JOIN products p
ON st.product_id = p.product_id
WHERE sale_date BETWEEN '2025-01-01' AND '2025-03-31';
With a tidy view, Genie zeroes in on just what matters, avoiding irrelevant data.
2. Keep Tables and Columns to a Minimum
Large tables with dozens of fields can overwhelm Genie (and users). Ask, “Do I really need user_bio or session_duration for a marketing report?” If not, leave them out of the view or table you expose. Partition timestamped data (e.g., by month) so queries can skip unneeded partitions, like slicing a pizza into neat pieces.
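Here is a tiny Python sketch of what “skipping unneeded partitions” means. It is a simplification with made-up data: each monthly partition is represented by its first day, and the date filter is assumed to start and end on month boundaries.

```python
from datetime import date

# Hypothetical monthly partitions, each identified by the first day of its month.
partitions = [date(2024, m, 1) for m in (11, 12)] + [
    date(2025, m, 1) for m in range(1, 7)
]


def partitions_to_scan(partitions, start, end):
    """Return the partitions whose month starts inside [start, end)."""
    return [p for p in partitions if start <= p < end]


needed = partitions_to_scan(partitions, date(2025, 1, 1), date(2025, 4, 1))
print(f"scanning {len(needed)} of {len(partitions)} partitions")
```

A Q1 question touches three of the eight monthly partitions here; an unpartitioned table would have to be read in full.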
3. Define Relationships Clearly
Genie relies heavily on knowing exactly how your tables fit together. Make sure every primary key (PK) and foreign key (FK) is explicitly defined in Unity Catalog. For example, if you have a customers table with customer_id as the primary key, and an orders table that references customer_id as an FK, Genie will instantly grasp how to join those two tables correctly. Clear relationships mean fewer “unknown column” errors and more accurate insights.
4. Manage Access and Permissions Thoughtfully
Nobody wants Genie to hit a wall because of missing permissions. Use Unity Catalog to set fine-grained access controls; everyone who needs to query data should have SELECT rights on the tables or views they’ll be using. Also, don’t forget to grant CAN USE on the default SQL warehouse (or whatever SQL endpoint your Genie space uses). If a user can’t access the warehouse, Genie can’t run the query, even if everything else is perfectly set up.
5. Start Small, Then Expand
Don’t dump your entire data lake on Genie on day one. Begin with a single dataset, say, “Q1 Sales.” Let your team ask fun questions and watch what happens. Once everyone trusts Genie’s accuracy, gradually add new tables (e.g., product details, customer segments).
6. Treat Metric Views as Special Guests
If you’re using metric views, pre-aggregated tables for things like “monthly revenue” or “average session duration”, remember that a single Genie space can only point at one object type at a time. That means if you create a space specifically for metric views, you won’t be able to mix in regular tables or standard views. Plan ahead: decide whether this space’s sole purpose is to serve up metrics, or if you’ll keep it general to accommodate both metrics and raw tables.
7. Provide Examples and Instructions
Genie learns your lingo from examples. Drop in sample SQL queries like:
-- Example for YoY Growth:
SELECT
year,
(SUM(order_total) - LAG(SUM(order_total))
OVER (ORDER BY year))
/ LAG(SUM(order_total))
OVER (ORDER BY year) AS yoy_growth
FROM sales_data
GROUP BY year;
Write Plain-Text Instructions
Think of these as “contextual hints” that fill in missing business logic or jargon. For instance:
“When users ask for ‘active regions’, filter on regions.active_flag = true.”
“In our reports, ‘customer churn’ means DATEDIFF(day, last_purchase_date, CURRENT_DATE) > 30. Use that as the filter.”
“If someone types ‘total ARR’, they mean SUM(annual_recurring_revenue). That’s our field name.”
Create a brief cheat sheet inside your Genie space that explains terms (“‘active regions’ means regions.active_flag = true,” etc.). With examples and hints, Genie won’t wander blind.
8. Iterate and Improve Based on Feedback
Once your Genie space is live, listen to how people use it. Are users getting weird or irrelevant results? Maybe something in your data model changed, or an example is out of date. Take note of those hiccups, update your instructions, and roll out fresh examples. Over time, you’ll refine Genie’s “understanding” so it delivers sharper, more accurate answers every time.
Here’s a fun idea! Schedule a short “Genie Jam Session” every few weeks:
Ask, “What strange results did you see?”
Ask, “What new term should we teach Genie?”
Ask, “Did any queries run slower than expected?”
Make quick fixes (rename a column, update a join, refine metadata) so Genie learns from real usage and keeps getting sharper.
By treating your data like a well-stocked kitchen, giving Genie clear recipes, and implementing these practices, you can ensure a smoother, more accurate user experience and maximize Genie’s full potential. A well-maintained Genie space not only improves insight quality but also fosters greater adoption among users across technical and non-technical roles. Before you know it, Genie will be serving up insights, effortlessly, accurately, and with minimal cleanup.
Practical Example: Simplifying Sales Analysis
Consider this practical example: your company wants to quickly pinpoint the regions that performed best in the first quarter of last year. Traditionally, this might involve complicated SQL queries. With Genie, the process simplifies drastically.
You Ask Genie: "Show me our highest-performing sales regions in the first quarter of last year."
Genie’s SQL Response:
SELECT
region,
SUM(sales_amount) AS total_sales
FROM sales_data
WHERE sale_date BETWEEN '2024-01-01' AND '2024-03-31'
GROUP BY region
ORDER BY total_sales DESC
LIMIT 3;
The moment Genie runs this query, Databricks gives you a visual report or an easy-to-understand table displaying your top-performing regions.
Advanced Example: Complex Time-Series and Product Revenue Breakdown
Let’s look at a more sophisticated scenario that works with the available dataset sales_data (with columns: sale_id, region, product, sales_amount, sale_date). This example illustrates how Genie can generate a detailed and complex visualization, such as a stacked area chart or a multi-line comparison chart.
You Ask Genie: "Show a month-wise breakdown of total sales for each product category across all regions over the past year, using a stacked area chart."
Genie’s SQL Response:
SELECT
month_year,
product,
total_sales
FROM (
SELECT
DATE_TRUNC('month', sale_date) AS month_year,
product,
SUM(sales_amount) AS total_sales
FROM sales_data
WHERE sale_date >= add_months(CURRENT_DATE, -12)
GROUP BY month_year, product
)
ORDER BY month_year, product;
This query aggregates sales by month and product type over the last 12 months. When visualized as a stacked area chart, this output allows business users to identify:
Seasonal product performance
Sales growth trends
Product contribution to overall revenue
Challenging Example: Multi-Table Join with Complex Analytics
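Imagine your organization has expanded data sources beyond sales_data. You now have two additional tables: product_hierarchy, which maps each product to a category, and region_info, which maps each region to its manager_name.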
You Ask Genie: "Identify, for each category and region, the month-over-month percentage change in total sales for the last 12 months. Then, filter to include only those category-region pairs where any month shows at least a 20% increase. Finally, join with region_info to include the manager’s name, and rank the top 5 category-region combinations by their average monthly growth."
Genie’s SQL Response:
WITH filtered_sales AS (
SELECT * FROM sales_data
WHERE sale_date >= add_months(CURRENT_DATE, -12)
),
joined_data AS (
SELECT
sd.*, ph.category, ri.manager_name
FROM filtered_sales sd
JOIN product_hierarchy ph
ON sd.product = ph.product
JOIN region_info ri
ON sd.region = ri.region
),
monthly_sales AS (
SELECT
category,
region,
DATE_TRUNC('month', sale_date) AS month,
SUM(sales_amount) AS total_sales
FROM joined_data
GROUP BY category, region, DATE_TRUNC('month', sale_date)
),
percentage_change AS (
SELECT
category, region, month, total_sales,
(
try_divide(
total_sales, LAG(total_sales) OVER (
PARTITION BY category, region ORDER BY month
)
) - 1
) * 100 AS percentage_change
FROM monthly_sales
),
filtered_data AS (
SELECT
pc.category, pc.region, ri.manager_name,
AVG(pc.percentage_change) AS avg_monthly_growth
FROM percentage_change pc
JOIN region_info ri ON pc.region = ri.region
GROUP BY pc.category, pc.region, ri.manager_name
HAVING MAX(pc.percentage_change) >= 20
),
ranked_data AS (
SELECT
category, region, manager_name, avg_monthly_growth,
ROW_NUMBER() OVER (ORDER BY avg_monthly_growth DESC) AS rank
FROM filtered_data
)
SELECT * FROM ranked_data WHERE rank <= 5
ORDER BY rank;
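If you want to sanity-check the month-over-month logic without a Databricks workspace, a toy version runs in Python’s built-in sqlite3 module. Treat this as a sketch under assumptions: the data is invented, SQLite stands in for Databricks SQL, NULLIF replaces try_divide (which SQLite lacks), and the query keeps only pairs where at least one month grew by 20% or more before averaging all of their monthly changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE monthly_sales "
    "(category TEXT, region TEXT, month TEXT, total_sales REAL)"
)
conn.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?, ?, ?)",
    [
        ("Toys", "East", "2025-01-01", 100.0),
        ("Toys", "East", "2025-02-01", 130.0),  # +30% vs January
        ("Toys", "East", "2025-03-01", 117.0),  # -10% vs February
        ("Books", "West", "2025-01-01", 200.0),
        ("Books", "West", "2025-02-01", 210.0),  # +5% vs January
    ],
)

query = """
WITH pct AS (
  SELECT category, region, month,
         (total_sales / NULLIF(LAG(total_sales) OVER (
            PARTITION BY category, region ORDER BY month), 0) - 1) * 100
           AS pct_change
  FROM monthly_sales
)
SELECT category, region, ROUND(AVG(pct_change), 2) AS avg_growth
FROM pct
GROUP BY category, region
HAVING MAX(pct_change) >= 20
ORDER BY avg_growth DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The first month of each pair has no previous value, so its change is NULL and is ignored by both AVG and MAX. Books/West never reaches a 20% jump, so it drops out entirely.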
Real-world Impact: Why Your Business Needs Genie
Democratizing Data Access
Imagine a world where anyone on your team, whether they’re in marketing, sales, or even executive leadership, can type a simple question and get real insights without waiting on a data engineer. Genie makes that possible. By removing the need to learn SQL, Genie puts data at everyone’s fingertips. Now, a marketer can ask, “Which campaign drove the most sign-ups last month?” and see the answer in seconds. This kind of access not only speeds up everyday tasks but also helps build a truly data-driven culture, where decisions aren’t based on gut feeling but on clear, up-to-date information.
Boosting Productivity
Writing and debugging SQL queries can easily eat up hours of your team’s day. With Genie handling that heavy lifting, your analysts and engineers can focus on higher-value work, like exploring new hypotheses, fine-tuning models, or developing data-driven strategies. Instead of spending time troubleshooting a malformed JOIN or chasing down a missing column, they can jump straight into interpreting results and making recommendations. In other words, Genie turns the hours you used to spend on syntax headaches into productive, impactful work.
Accelerating Decision-Making
When insights take hours or days to reach decision-makers, opportunities can slip through the cracks. With Genie, answers are available almost instantly. Whether it’s spotting a sudden dip in customer engagement or identifying a top-performing product line, your team can react immediately. Faster insights mean you can pivot marketing tactics, optimize pricing, or adjust operations on the fly, giving you a real edge over competitors who are still wrangling slow, manual processes.
In short, Genie doesn’t just simplify SQL; it empowers your entire organization to move faster, think smarter, and stay ahead in a rapidly changing market.
How to Integrate Databricks Genie: A Step-by-Step Guide
Getting started with Databricks Genie is designed to be straightforward:
Activate Genie: Simply turn on Genie within your Databricks workspace.
Connect your data: Effortlessly connect Genie to your existing data.
Ask Away: Type your questions in plain language, and Genie will handle the rest.
Managing Privileges and Access in Genie
Think of Genie as an assistant that only sees the data you give it permission to access, even down to individual rows. With Unity Catalog’s fine-grained controls, you can, for example, allow each salesperson to query only their own region’s sales figures. So if Alice’s territory is “North America,” you’d grant her a row-level policy such as WHERE region = 'North America'. Meanwhile, Bob in EMEA could have a policy like WHERE region = 'Europe'. At the same time, your finance team might need global visibility, so you’d give them broader access without a row filter.
To set this up:
Grant SELECT on Tables and Views: Make sure each user has SELECT rights on the specific tables or views they’ll need. If someone doesn’t have SELECT privileges, Genie simply won’t return any results for them.
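Apply Row-Level Policies: In Unity Catalog, a row filter is a SQL function that returns true for the rows a user may see, attached to the table with ALTER TABLE ... SET ROW FILTER. A table takes one row filter applied this way, so Alice’s and Bob’s rules live in a single function that branches on group membership. For example (the group names sales_na, sales_emea, and finance are placeholders for your own account groups):
CREATE FUNCTION sales_region_filter(region STRING)
RETURN is_account_group_member('finance')
OR (is_account_group_member('sales_na') AND region = 'North America')
OR (is_account_group_member('sales_emea') AND region = 'Europe');
ALTER TABLE sales_data SET ROW FILTER sales_region_filter ON (region);
With that filter in place, when Alice asks Genie, “What were my team’s total sales last quarter?” Unity Catalog applies region = 'North America' when the query runs, ensuring she sees only her region’s numbers.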
Grant CAN USE on the SQL Warehouse: Genie needs permission to run queries, so each user must have CAN USE on the SQL warehouse tied to the Genie space. Without this, Genie can’t execute queries on behalf of the user.
By combining standard SELECT privileges with row-level filters, you ensure that users get exactly the right slice of data. This approach not only strengthens security but also means every response Genie returns is automatically scoped to what each person is allowed to see, no extra steps needed.
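To see the effect of row-level scoping without any SQL dialect in the way, here is a plain-Python sketch of the same idea. It only models the outcome; the group names (sales_na, sales_emea, finance) and the sales figures are invented for illustration.

```python
# Hypothetical (region, sales_amount) rows.
sales = [("North America", 120), ("Europe", 90), ("Asia", 60)]


def row_is_visible(region, groups):
    """Mirror of a region-based row filter; group names are hypothetical."""
    if "finance" in groups:
        return True  # finance gets global visibility
    if "sales_na" in groups and region == "North America":
        return True
    if "sales_emea" in groups and region == "Europe":
        return True
    return False


def visible_total(groups):
    """Total sales as seen by a user who belongs to `groups`."""
    return sum(
        amount for region, amount in sales if row_is_visible(region, groups)
    )


print(visible_total({"sales_na"}))    # Alice
print(visible_total({"sales_emea"}))  # Bob
print(visible_total({"finance"}))     # finance team
```

The same question (“total sales”) yields a different number per user, and nobody has to remember to add the region condition by hand.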
Limitations
As of mid-2025, Databricks Genie has evolved into a powerful tool for natural language querying and data visualization. However, certain limitations persist that users should be aware of:
1. Data & Compute Limits
a) Max 25 Tables/Views per Space: If you hit that cap, you’ll need to split your data into another Genie space or rethink which tables really need to be exposed.
b) Throughput Cap (20 Questions/Minute): In a busy environment, you might notice a slowdown if multiple people are firing off queries at once.
c) Conversation Size (10,000 Conversations × 10,000 Messages): This is generous, but very long-running, chatty projects could bump up against it.
2. Access & Permissions
a) Unity Catalog Dependency: Genie only understands data registered in Unity Catalog, and users need SELECT rights on tables or views plus CAN USE on the SQL warehouse.
b) Admin-Only Setup Steps: Granting those permissions or adjusting compute settings usually requires administrator access. If you’re not an admin, you’ll need to coordinate with one.
3. Visualization Quirks
a) No Interactive Chart Edits: If you want to split a bar chart further, apply a new filter, or tweak colors on the fly, you’ll have to edit your SQL or adjust your dataset manually.
4. Natural-Language Gotchas
a) Metadata Matters: Genie leans heavily on well-documented tables and clear instructions. If your column names are cryptic or your hints are missing, the answers might be fuzzy.
b) Ambiguity Can Confuse It: Complex questions without enough context can trip Genie up. Providing examples or a short cheat sheet goes a long way.
5. API Integration (Public Preview)
a) Public Preview Only: You can use Genie’s Conversation API today (think Slack or Teams integrations), but the feature set is still growing. Don’t expect every endpoint or capability you might see in the future.
Understanding these limitations is crucial for effectively leveraging Genie in your data strategy. By acknowledging and planning for these constraints, organizations can better integrate Genie into their workflows and maximize its benefits.
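One way to plan for the throughput cap, if you are building your own integration, is to throttle on the client side. The sketch below is a generic rolling-window limiter in Python, not part of any Databricks SDK; the class name QuestionThrottle is hypothetical, and the defaults simply echo the 20-questions-per-minute figure listed above.

```python
import time
from collections import deque


class QuestionThrottle:
    """Allow at most `limit` questions per rolling `window` seconds."""

    def __init__(self, limit=20, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock   # injectable so the behaviour can be tested
        self.sent = deque()  # timestamps of recently sent questions

    def try_send(self):
        """Record a question and return True, or return False if over the cap."""
        now = self.clock()
        # Forget questions that have left the rolling window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) >= self.limit:
            return False
        self.sent.append(now)
        return True


# Usage with a fake clock so the example is deterministic.
fake_now = [0.0]
throttle = QuestionThrottle(limit=2, window=60.0, clock=lambda: fake_now[0])
first, second, third = throttle.try_send(), throttle.try_send(), throttle.try_send()
fake_now[0] = 60.0  # a minute later the window has cleared
fourth = throttle.try_send()
print(first, second, third, fourth)
```

A caller that gets False back can queue the question or ask the user to retry shortly, rather than hitting the service-side limit.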
Ultimately, Databricks Genie is more than just a technology feature; it represents a cultural shift toward accessibility and ease of use in data analytics. It empowers your entire organization, from novices to experts, to seamlessly access and interpret complex data. By reducing the complexity involved in data analysis, Genie ensures your team can focus on what matters most: driving your business forward.