Transforming Netlify’s data pipeline one SQL statement at a time. Lauren Adabie started her career analyzing data and answering questions about it at Zynga. As a data analyst at Netlify, she’s doing more than exploratory analysis: she’s also helping build out Netlify’s revenue data pipeline, something she’s never done before. We discuss how her team is transforming data with SQL, how she gets stakeholders to have confidence in the data, and the path that led her to a career in data analytics.
Re-architecting a Revenue Pipeline
Lauren joined the Netlify team near the beginning of this revenue pipeline project. Currently, the pipeline is a combination of a few workflows: hourly processes export the data to CSVs, and Databricks jobs load and aggregate the data to produce topic-specific tables. Lauren is helping migrate this workflow to dbt. With the current pipeline, if there’s a failure downstream, it’s hard to pinpoint when and where it happened.
Lauren’s first task was bringing raw data into the “staging” layer (data lake). She initially tackled it by pulling all the data into the staging layer right away; looking back, now that she knows more about the tools and processes, she would have done it differently. The goal is to help her team monitor and catch issues before they reach the business stakeholders. As we saw with Canva’s data pipeline, the benefit for the data team and the people who rely on the data is saved time and frustration.
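The post doesn’t show Netlify’s actual models, but as a rough sketch of the pattern Lauren describes, a dbt staging model typically does light renaming and typing of a raw source before any business logic is applied (the source, table, and column names below are hypothetical):

```sql
-- models/staging/stg_payments.sql (hypothetical)
-- A dbt staging model: select from the raw source, rename and cast
-- columns, and defer aggregation to downstream models.
with source as (

    select * from {{ source('raw_billing', 'payments') }}

)

select
    id as payment_id,
    customer_id,
    cast(amount as numeric) as amount_usd,
    created_at
from source
```

Keeping staging models this thin is what makes downstream failures easier to locate: each layer does one job, so a bad number can be traced back model by model.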
A good data pipeline is one that doesn’t have many issues. More importantly, when issues do come up, it should be easy for the data team to diagnose them. The impact of this revenue pipeline project is reduced time spent triaging issues, faster and easier access to data, and the ability to analyze data at various levels. Additionally, the team can reduce communication difficulties with a version-controlled dictionary of their metrics (similar to the data dictionary Education Perfect is creating).
Learning the tools of the trade
As a data analyst, you may not be diving into GitHub and the various workflows engineers typically use for reviewing and pushing code. Lauren’s team is a huge proponent of GitHub Issues for managing internal processes (she had an outstanding GitHub issue to work on as we were speaking). If engineers add new products to Netlify’s product line, they open a new GitHub issue for Lauren’s team to address.
I was curious how Lauren gained the skills for some of the tools she uses every day. When you think of the tools a data analyst uses, you might think of Excel, SQL, R, etc. These are not necessarily tools or platforms you take classes for in college, so what was Lauren’s learning path?
Lauren has learned most tools on the job. She learned Python after graduating college.
I learned [python] partially because I was trying to do things in Excel that were frustrating. I was pushing Excel to do too much with VLOOKUPs, references, etc.
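For context (this is not Lauren’s code, just an illustration of the kind of task she describes): the lookups that strain Excel’s VLOOKUP on large sheets are simple dictionary operations in Python:

```python
# Hypothetical example: enriching order rows with customer names,
# the sort of join VLOOKUP handles awkwardly at scale.
customers = {101: "Acme", 102: "Globex"}  # id -> name
orders = [(1, 101, 250.0), (2, 102, 99.5), (3, 103, 10.0)]

# Equivalent of a VLOOKUP with a default for missing keys.
enriched = [
    (order_id, customers.get(customer_id, "UNKNOWN"), amount)
    for order_id, customer_id, amount in orders
]
print(enriched)
```

Unlike a spreadsheet of fragile cell references, the lookup logic here is explicit and easy to rerun on new data.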
Here’s a tool you don’t hear about every day: in college, Lauren learned Fortran 90 because people in her environmental engineering department were still using it. She ended up learning SQL from a book, solely because she wanted to go into analytics. One thing she said about the tools she uses is that the nuance and control you have over a tool are what make you stick with it. It’s the little things that keep you going back to a tool long term.
It’s all about nuance when working with stakeholders
Lauren explained that there is sometimes a mismatch between how we communicate and what we mean when explaining metrics. Sometimes you need to sit down, explain where the data is coming from, and show why the numbers are what they are. Something she’s doing more of now is explaining the specific nuances of the data her team produces.
As analysts, we need to think in the big picture and the nuances.
Stakeholders need confidence in the numbers, but analysts also need to validate the numbers against other data sources the stakeholders are looking at. Sometimes the stakeholder you need to win over is yourself.
When Lauren was doing an experimental analysis at a previous company, she was expecting to see more clicks on a certain report. The hardest part of this experiment was that product managers typically run experiments, and the analytics team just assists with driving the outcomes. The initial hypothesis about the numbers is driven by the PMs, not by analytics.
When you’re working with business stakeholders and trying to get them to have confidence in the numbers, simply being a kind person and a good communicator can help. Lauren likes to remind herself that people typically mean well and that everyone is coming to the table with the same information. These conversations about why the numbers don’t look the way they should (from the perspective of the stakeholder) can be uncomfortable and not always fun. If you’re having trouble communicating, come to it with kindness and transparency.
From wastewater treatment to data analytics
We also talked about how Lauren started her career in data analytics which she discussed at length at a talk with the Society of Women Engineers. During college, Lauren majored in environmental engineering and thought she was going to be a civil engineer after graduation. Specifically, she wanted to go into wastewater treatment.
After working at a wastewater treatment plant, however, Lauren discovered she was more passionate about answering questions about the data in the wastewater treatment space. At the time, she didn’t even realize data analytics existed as a potential career path.
I think it’s difficult to find any job where working with data is not part of the job responsibilities. Lauren’s advice for people who want to get into a data analytics role but may not have the proper experience is to reframe what you learned in school or at a current job for the role you’re interested in. For instance, Lauren took various math courses in college and was able to map the technical language from her studies and her environmental science job onto the skills required for a data analytics role at Zynga.
Lauren also talked about the power of connections and meeting people. If she could change one thing about how she got started in data analytics, it would be participating in and contributing to various analytics communities. In particular, many of the tools she uses have thriving communities where like-minded people hang out and discuss product improvements, questions on how to do things, etc. Lauren plans on being more active in some of these communities, which include conferences like PyCon.
Tools for 2021
Finally, we discussed tools Lauren is excited to try out and use this year. She’s a big fan of dbt because of its ability to implement tests and its various documentation features. She’s also excited to start using transform.io to help with Netlify’s data dictionary, and Mode Analytics is another crowd favorite. One more area she’s excited to learn about in the next year is microservices, and building analyses on top of what she builds there.
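The tests and documentation features Lauren mentions live in YAML files alongside dbt models, which is also how a version-controlled data dictionary takes shape. A minimal sketch (model and column names are hypothetical, not Netlify’s):

```yaml
# models/staging/schema.yml (hypothetical)
version: 2

models:
  - name: stg_payments
    description: "One row per payment, lightly cleaned from the raw export."
    columns:
      - name: payment_id
        description: "Primary key of the payments table."
        tests:
          - unique
          - not_null
      - name: amount_usd
        description: "Payment amount in USD."
```

Because the descriptions and tests are version-controlled next to the SQL, `dbt test` can flag bad data before stakeholders see it, and `dbt docs` renders the descriptions as a browsable dictionary.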
From a personal perspective, Lauren is thinking about starting a data blog. As an amateur blogger myself, I’ll vouch for that :).
Other Podcasts & Blog Posts
No other podcasts!