Dear Analyst #131: Key insights and best practices from writing SQL for 15+ years with Ergest Xheblati
Podcast: Play in new window | Download
Subscribe: Spotify | TuneIn | RSS
If you could only learn one programming language for the rest of your career, what would be it be? You could Google the most popular programming languages and just pick the one of the top 3 and off you go (FYI they are Python, C++, and C). Or, you could pick measly #10 and build a thriving career out of it. Ergest Xheblati didn’t just pick SQL and decide to bet his career on it. He started as a software engineer, held various roles in data, and even became a product manager. After trying these different roles, Ergest would still find his way back to data engineering and has dedicated 15+ years to writing SQL for a living. In this episode he talks about why he loves SQL, reducing the number of dashboards in your company, and best practices for writing SQL.
Why Ergest loves writing SQL
The reason why Ergest loves SQL is also the reason how most things get invented: laziness. As that Stack Exchange thread points out, a lazy developer tends to find shortcuts and automations to make repetitive and tedious tasks less onerous. You could also argue that Excel shortcuts are a result of analysts being lazy and not wanting to use their mouse to do mundane formatting tasks.
As it pertains to programming, Ergest saw that a standard framework might require 20-30 lines of code to pull some data from a database. Ergest could handle do that same operation by writing a few lines of SQL with a simple SELECT
statement.
Solving business problems with technology
Ergest was a data analyst, data engineer, and also what we call an analytics engineer. When Ergest was a data analyst, he didn’t have tools like dbt which prevented him from succeeding as an analyst. As with many data roles, Ergest still straddles multiple roles today. He still considers himself a blend between a data analyst and data engineer with SQL being his main tool of choice. At a high level, Ergest talks about “solving business problems with technology.”
I think it’s important to emphasize this point which many other guests on Dear Analyst have pointed out as well. Learning tools like Excel and SQL are great, but if you cannot communicate your findings and solve real business problems with these tools, then what’s the point? I think data professionals get caught up with how to utilize a data tool’s features when time should really be spent on what can be done to solve your customer’s problems.
I recently had a conversation with a technical program manager who had an opportunity to sit on a few customer meetings with her sales team. She was amazed to learn about the actual problems her company’s customers face every day. It gave her a new perspective on the backend infrastructure her team supports.
Mining open source data with SQL
Most of the projects Ergest works on are focused on business intelligence. For instance, he had to work on a project where the company wanted to build robust customer profiles. You typically want to see all these different aspects of a customer so you know how to best market to and retain the customer. From a data perspective, Ergest was writing SQL to transform and merge different data from different sources.
Some data source might have names of the customers while another source might have numbers. You then have to look at the session logs of what these customers are doing on your website and create tables based on this customer activity. Ergest is a proponent of the One Big Table (OBT) approach for this customer activity data to make querying and management easier. This graphic below shows the main structural difference between the standard star schema and OBTs:
How to stop building dashboards and answering ad-hoc questions
Ergest wrote a great blog post a few months ago called Transforming a Data Culture. The blog post talks about how data teams can prevent the deluge of one-off data questions being asked by the business and to shift to being a more strategic partner. Does this sound like a goal or OKR your data team is striving for?
Ergest did an audit at a company that had 17,000+ dashboards! Talk about not knowing which metrics matter. Ergest believes in going back to first principles when it comes to dashboarding. There are 4 questions Ergest believes you need to answer when creating a dashboard:
- What’s happening?
- Why did it happen?
- What are you going to do?
- What’s your prediction?
The blog post goes in-depth on how getting executive buy-in is the most important step in reducing the number of questions coming at the data team.
Best patterns for writing SQL
Ergest reviews a lot of SQL queries and saw mistakes and anti-patterns in how his fellow analysts and data engineers were writing queries. Surely, there must be a book about the best patterns for writing SQL, Ergest thought. There are many books on best patterns for coding and how to debug code. The only books Ergest could find on SQL were anti-patterns. He ended up writing a book called Minimum Viable SQL Patterns based on his experience reviewing other people’s queries. He breaks the patterns down into 4 buckets:
- Query composition patterns – How to make your complex queries shorter, more legible, and more performant
- Query maintainability patterns – Constructing CTEs that can be reused. In software engineering, it’s called the DRY principle (don’t repeat yourself)
- Query robustness patterns – Constructing queries that don’t break when the underlying data changes in unpredictable ways
- Query performance patterns – Make your queries faster (and cheaper) regardless of specific database you’re using
These 4 patterns are pulled directly from this workshop Ergest gave about SQL patterns:
According to Ergest, what separates his book apart from other books about SQL is that the patterns he discusses are based on writing professional/production-ready SQL for cloud environments. He assumes you are writing SQL to query data warehouses in AWS, Azure, or some other public cloud platform.
Other Podcasts & Blog Posts
No other podcasts or blog posts mentioned in this episode!
Is there a technical problem with the Ipodcast? It won’t play or download for me
Just fixed the issue, it should be working now!