Dear Analyst #96: Treating data as code and the new frontier for DBAs with Sean Scott
What did the developer say to the DBA? It doesn’t matter, the answer is “no.” I’ve never worked with a database administrator (DBA) before but know they play an important part in the data lifecycle at a company. Sean Scott stumbled into the DBA world and has been in this field for 25+ years. He started his career working at a consumer electronics manufacturer. He started building his data chops as an inventory analyst and eventually got into the world of Oracle database migrations and application development. This episode explores a perspective on data we don’t normally see: from the DBA. I think it’s important to understand this perspective since data and business analysts ultimately use the data that is transformed and formatted by DBAs.
I remember saying I will never be a DBA
Sean likes to poke fun at the DBA crowd and remembers telling someone at a party he would never become a DBA. Perhaps his tongue-in-cheek attitude towards DBAs is what makes him so successful as a DBA. He currently works at a company called Viscosity where he does database and application development.
In short, I solve puzzles.
Sean explains how his background in data analysis and DevOps has helped him in his career as a DBA. This area of data is beyond my area of expertise, but Sean was able to relate things back to why this area matters for data analysts. Despite being in a “technical” role, Sean discusses the other qualities that make a DBA (or anyone in a technical role) successful:
The best technical people I’ve met have had great people and business skills.
How data analysts can work with DBAs better
What does DBA stand for? “Don’t bother asking.” That’s Sean’s favorite DBA joke. From a data or business analyst perspective, Sean says DBAs are typically seen as people who restrict access to data. DBAs can sometimes be seen as barriers or just standing in the way. Sean’s advice to data analysts and the consumers of data in organizations is to help change the perception of what DBAs do. I love this extremely outdated video explaining what DBAs do:
Sean says many DBAs fail to see the difference between data and databases. Many tend to mix the two together, but Sean believes these two concepts should be thought about and treated differently. Data analysts should seek to work with DBAs to understand where their data comes from. This leads to an important concept I hadn’t heard of until this conversation with Sean: data as code.
Data as code leading to a diversity of ideas
Sean says that DBAs may think of data as being very fragile and brittle. They have this perception that data needs to be restricted or else it might be deleted when it’s in the wrong hands. That’s because DBAs aren’t thinking of data like other parts of the DevOps process.
Infrastructure as code has become a well-known concept as DevOps engineers manage data centers in the cloud. Why can’t this same concept be applied to data? We can apply automation and configuration to the management of data. DevOps is typically concerned with storage and networking. The data lifecycle and pipeline can also be added to this list to “harden” data for the enterprise.
Now the actual nuts and bolts of this stuff are way beyond my pay grade. The benefits to data analysts, according to Sean, are plentiful. Analysts typically just analyze the data but don’t have much experience managing the data on their own. With these configurable data pipeline processes, analysts and other non-traditional infrastructure professionals can build their own data environments.
Whether or not analysts want to own this responsibility is another question for each data organization. I think what’s important is that analysts can be empowered to learn new skills and not rely on a data engineer to get the data they need. We saw this trend with Canva’s data engineering team in episode #58 as well. This trend of treating data as code can lead to more diverse ideas coming from all parts of the data organization.
Creating database artifacts
As I mentioned in the previous section, I’m getting way over my skis here :). Sean does an excellent job of digging into how data as code is important for analysts. I’ll try my best to summarize his thoughts below.
Docker opened up Sean’s eyes to treating data as code. He says database artifacts are like database images. Think of it as nothing more than an application that performs some service you want it to do. The data in the database can be turned into an “artifact” as well. With this artifact, you can store it somewhere, version it, share it with people, etc. The data is an asset that can go through various transformations and you can write code to “fix” and transform the data. This data-lake-as-code repo looks like an all-in-one application that shows how a “data as code” architecture might look on AWS:
When you’re working with data in highly-regulated industries or in an e-commerce environment, adopting this data as code framework is important because having bad data would be costly for the business. In e-commerce, for instance, transactional data is constantly streaming in and potentially changing, so you want to make sure the quality of the data is high. If the data infrastructure results in customers getting incorrect data about their order, that’s obviously a bad customer experience.
Ensuring high quality database upgrades
Another characteristic of this data as code framework is that you can “replay” the code to build the data asset. This is like using packaged code in Terraform or Ansible and applying this code to your infrastructure. You can do the same thing with a dataset and have that dataset “stored” in a repo somewhere. Now you have an asset like a Docker image.
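To make the “replay” idea a bit more concrete, here is a minimal Python sketch of my own (not something Sean walked through). It assumes the dataset is defined by an ordered list of SQL steps that would live in a version-controlled repo, and it uses an in-memory SQLite database as a stand-in for a real one. The step names, the orders table, and the build_dataset and snapshot helpers are all hypothetical.

```python
import sqlite3

# Hypothetical, ordered build steps. In practice these might be files
# in a version-controlled repo rather than strings in a script.
BUILD_STEPS = [
    ("001_create_orders",
     "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total_cents INTEGER)"),
    ("002_seed_orders",
     "INSERT INTO orders (id, customer, total_cents) "
     "VALUES (1, 'ada', 1250), (2, 'grace', 900)"),
    ("003_fix_totals",
     "UPDATE orders SET total_cents = 1000 WHERE id = 2"),
]


def build_dataset(steps):
    """Replay every step, in order, against a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    # Record which steps were applied so the build is auditable.
    conn.execute("CREATE TABLE build_log (step TEXT PRIMARY KEY)")
    for name, sql in steps:
        conn.execute(sql)
        conn.execute("INSERT INTO build_log (step) VALUES (?)", (name,))
    conn.commit()
    return conn


def snapshot(conn):
    """Return the dataset contents in a stable order for comparison."""
    return conn.execute(
        "SELECT id, customer, total_cents FROM orders ORDER BY id"
    ).fetchall()


# Building twice from the same steps should give the same dataset.
first = snapshot(build_dataset(BUILD_STEPS))
second = snapshot(build_dataset(BUILD_STEPS))
print(first == second)
```

The point of the sketch is that the dataset is never edited by hand: anyone with the list of steps can rebuild it from scratch.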
Data analysts can pull this image of a dataset and can do their analysis like they would any other dataset. The key takeaway is that if you QA or run some process against this data, you are guaranteed a specific outcome with the data. The end result is the same no matter how you run the code because you have set up a repeatable process that’s been version-controlled. This ensures that “dirty” data will get fixed the right way each time.
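Here is a rough Python sketch of what “fixed the right way each time” could look like, under my own assumptions. The clean function, the sample rows, and the idea of comparing SHA-256 fingerprints are all illustrative and not taken from Sean’s setup. The cleaning step is written so that the input order doesn’t change the result and running it twice changes nothing.

```python
import hashlib
import json


def clean(rows):
    """Normalize emails and drop duplicates, returning rows in a stable order."""
    seen = {}
    for row in rows:
        email = row["email"].strip().lower()
        # Keep the first record seen for each normalized email.
        seen.setdefault(email, {"email": email, "name": row["name"].strip()})
    return [seen[email] for email in sorted(seen)]


def fingerprint(rows):
    """Hash the cleaned rows so two runs can be compared cheaply."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical "dirty" input: stray whitespace, mixed case, one duplicate.
dirty = [
    {"email": " Ada@Example.com ", "name": "Ada "},
    {"email": "grace@example.com", "name": "Grace"},
    {"email": "ada@example.com", "name": "Ada"},
]

cleaned = clean(dirty)

# Same input, same fingerprint, regardless of row order.
print(fingerprint(cleaned) == fingerprint(clean(list(reversed(dirty)))))

# Cleaning already-clean data is a no-op.
print(clean(cleaned) == cleaned)
```

A fingerprint like this could be stored next to the versioned code, so a QA step can confirm that a rebuild produced the dataset everyone expects.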
How to manage a database through Slack
Sean used to work on an operations team and described a fun project where his team had a direct impact on customers and helped change the perception of DBAs at his company.
Sean was working at an e-commerce company and the website used a legacy application for issuing coupon codes. These codes would get sent every Monday morning at 8AM. There was also an email that got sent to customers reminding them to log in at 8AM on Monday to get their coupon code. It’s 7:59AM on a given Monday and the team is ready to see a bunch of coupon codes get issued. The problem? The legacy application wasn’t issuing coupon codes correctly.
The marketing team started getting nervous as customers started complaining and issuing tickets to the customer support team. The customer support team would then re-assign the tickets to the developers, and the developers would re-assign the tickets to the DBAs. Sean’s team is now on the hook to resolve these customer issues. This process could take a few hours as tickets are getting resolved.
Sean’s team realized it didn’t make sense for the ticket to get passed from one team to another. Precious time is getting wasted and the customer is just sitting there with no coupon code in hand.
The solution was a Slack channel just for the marketing team to keep track of the status of tickets. Sean’s team created a custom slash command for the marketing team to use. The marketing team could simply type /fix in this channel and a script would run in the background to pull up the relevant details about a given customer from the customer database. This prevented the need to contact the developers and DBAs.
Sean’s team realized that these Slack slash commands could do other operations in the database. Maybe you might want to update a customer’s record in the database or pull their latest sales. You could even use these slash commands on your phone!
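I don’t know how Sean’s team actually built their command, but here is a minimal Python sketch of how a slash command handler might work. Slack sends slash commands as a form-encoded POST with fields like command and text, and accepts a JSON reply with response_type and text. Everything else here is an assumption for illustration: the customers table, the idea that the marketer types a customer email after /fix, and the handle_slash_command helper. A real handler would also sit behind a web server and verify that the request really came from Slack, which is left out here.

```python
import json
import sqlite3
from urllib.parse import parse_qs


def make_demo_db():
    """Build a tiny in-memory stand-in for the customer database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (email TEXT PRIMARY KEY, name TEXT, coupon_code TEXT)"
    )
    conn.execute("INSERT INTO customers VALUES ('ada@example.com', 'Ada', NULL)")
    conn.commit()
    return conn


def handle_slash_command(form_body, conn):
    """Turn a form-encoded slash command payload into a JSON reply string."""
    fields = {key: values[0] for key, values in parse_qs(form_body).items()}
    if fields.get("command") != "/fix":
        return json.dumps({"response_type": "ephemeral", "text": "Unknown command."})

    # Assumption: the text after /fix is the customer's email address.
    email = fields.get("text", "").strip().lower()
    row = conn.execute(
        "SELECT name, coupon_code FROM customers WHERE email = ?", (email,)
    ).fetchone()

    if row is None:
        text = f"No customer found for {email or '(empty)'}."
    else:
        name, coupon = row
        text = f"{name} ({email}): coupon {coupon or 'not issued'}"
    # "ephemeral" means only the person who ran the command sees the reply.
    return json.dumps({"response_type": "ephemeral", "text": text})


# Simulate what would arrive if a marketer typed: /fix ada@example.com
body = "command=%2Ffix&text=ada%40example.com&user_name=marketer"
reply = json.loads(handle_slash_command(body, make_demo_db()))
print(reply["text"])
```

The lookup uses a parameterized query, so whatever someone types into Slack is treated as data rather than SQL. That seems like the kind of guardrail a DBA would want before handing a command like this to another team.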
The end result: the DBA team empowered the marketing team to solve real customer issues without needing a DBA’s help. The DBA team also felt like they were able to solve real customer issues instead of being stuck in SQL land all day. This solution also led other teams to look at the DBAs differently since the DBAs created a solution that impacted the front lines of the business.
Go out and love your DBA. We do try to balance the safety and security of the database against change. We are protective over our babies. Be demanding in a nice way.
Other Podcasts & Blog Posts
No other podcasts mentioned in this episode!