Dear Analyst, a show made for analysts: data, data analysis, and software. (https://www.thekeycuts.com/category/podcast/) This is a podcast made by a lifelong analyst covering Excel, data analysis, and tools for sharing data, plus the occasional topic on software engineering and building applications.

Dear Analyst #132: How the semantic layer translates your physical data into user-centric business data with Frances O’Rafferty

When you think of your data warehouse, the “semantic layer” may not be the first thing that pops into your mind. Prior to reading Frances O’Rafferty’s blog post on this topic, I didn’t even know this was a concept that mattered in the data stack. To be honest, the concept is still a bit confusing to me since I’m not building data warehouses and data products all day. Frances grew up in northern England studying mathematics during the recession. The decision to jump into data was a function of what jobs happened to be available at the time. Frances worked through a variety of data warehousing, BI, and ETL roles before moving more into the data management space, covering areas like data modeling and cataloguing. This conversation is a deep dive into the world of data warehousing, data catalogues, and of course, the data semantic layer.

Enforcing data warehouse conformity for an insurance company

Imagine an insurance company where the policies are in two different systems. Which database contains the “right” policy for a customer? This is the mess Frances had to deal with when she helped build out the insurance company’s data warehouse. What I thought was interesting is that Frances’ team looked at the source data and then interviewed people in the business to understand how the data is generated and how it’s being used. The questions she was asking were pretty high-level:

  1. What do you do on a day-to-day basis?
  2. What works well and doesn’t work well?
  3. What would you like the data to do?
Source: LinkedIn

Data quality validation checks and global lookups were set up so that if a new piece of data entered the warehouse and didn’t match the expected values, the administrator would get an alert. They would then have to figure out what to do with that rogue piece of data so it fit the rules that had been set up.
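
To make this concrete, here’s a minimal sketch of what one of those validation checks could look like in SQL. The table and column names (stg_policies, lookup_policy_status) are hypothetical, not from the episode.

```sql
-- Find incoming rows whose status code doesn't exist in the global lookup table.
-- These are the "rogue" records that would trigger an alert to the administrator.
SELECT
    s.policy_id,
    s.status_code,
    s.loaded_at
FROM stg_policies AS s
LEFT JOIN lookup_policy_status AS l
    ON s.status_code = l.status_code
WHERE l.status_code IS NULL;
```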

A methodology Frances brought up that I’d never heard of before is the Kimball methodology for setting up a data warehouse or BI system. The main tenets of the methodology are basically how modern data warehouses are set up: add business value, structure data with dimensions, and develop the warehouse iteratively. This is an image of the lifecycle from their website:

Source: Kimball Group

Focusing on different layers of the warehouse “stack”

Frances’ team first focused on the data source layer and tried to figure out where all the data came from. After that came the consolidation layer, which is where the data gets split into facts and dimensions.

I figured even for a data warehouse project, Excel must come into play at some point. Excel was used for all the modeling to figure out what the dimensions and facts were. It wasn’t a core part of the warehouse itself, but it served as a one-time modeling tool during the warehouse’s development.

The final layer is the target layer, where we get more into the business intelligence realm. There are different ways the insurance company wanted to see the data, so Frances’ team had to create different views of the data to answer questions like: What premiums have we received? What transactions have come through? The actuarial team wanted to see what the balance was on an account, so another view was created for them.
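
As a rough illustration (not Frances’ actual implementation), each of those business questions could map to its own view over the fact and dimension tables. The names below are made up.

```sql
-- A "premiums received" view for one audience; the actuarial team would get
-- a separate view aggregating balances by account.
CREATE VIEW v_premiums_received AS
SELECT
    d.region_name,
    d.policy_type,
    SUM(f.transaction_amount) AS total_premiums
FROM fact_transactions AS f
JOIN dim_policy AS d
    ON f.policy_key = d.policy_key
WHERE f.transaction_type = 'PREMIUM'
GROUP BY d.region_name, d.policy_type;
```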

Frances noticed that different regions would call the data different things even though they were all referring to the same concept. There wasn’t a system to translate what a metric like gross revenue, for instance, meant in one region into what it meant in another. This foreshadows the semantic layer concept Frances wrote about.

Source: MyReactionGifs

Cataloging 5,000 data attributes for an investment bank

Data catalog tools can get expensive (and rightly so) if you are managing thousands of data attributes and definitions. As I discussed in episode #129 with Jean-Mathieu, if you only have a handful of attributes, using Excel or Google Sheets is completely doable as a data catalog.

The investment bank Frances was working for had many different source systems, KPIs, and measures that the entire investment bank was trying to get alignment on. Span this across a variety of financial products and the team came back with 5,000 attributes to put into a data catalog. The challenge was understanding the requirements from the finance, risk, and treasury departments to create a catalog that could be shared internally within the entire bank.

Frances’ team looked at the taxonomy first across loans, customers, and risk. They had an original glossary and compared the glossary with the new taxonomy. The main tool they used for the data catalog was Collibra. With this new catalog, new publishers of data had to abide by a strict format dictated by the catalog.

After one year and talking with 150 people, they finally launched the data catalog to the entire investment bank. I asked Frances how her team was able to best understand what different data attributes meant. The answer is just as you would expect: she asked people within the bank to send an example of the data attribute and how it’s being used.

Source: Tumblr

Translating data for enterprise consumers with the semantic layer

Back to Frances’ original post about the semantic layer: historically, the semantic layer has been “trapped” in a BI tool, according to Frances. When Frances first started using SAP, there was a BusinessObjects universe in SAP which allowed you to create joins between tables and define data attributes. But these rules and definitions only existed in SAP.

Today, the semantic layer can show up in places like dbt, Collibra, graph databases, and more. There isn’t a “semantic layer vendor” that does it all (which is the first question I asked Frances about her blog post). The key takeaway is that the raw data in the data warehouse needs to be translated/converted into something usable by consumers within the enterprise. Frances said this translation is usually needed with legacy applications.
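
One way to picture that translation: a thin SQL view (or dbt model) that maps physical column names to the business terms consumers actually use. This is only a sketch with made-up names, not any particular vendor’s semantic layer.

```sql
-- Physical warehouse columns on the left, business-friendly names on the right.
-- Business rules (like excluding test orders) live here once instead of in every report.
CREATE VIEW semantic_orders AS
SELECT
    ord_id         AS order_id,
    cust_key       AS customer_id,
    ord_amt_usd    AS gross_revenue,   -- one agreed-upon definition of "gross revenue"
    ord_created_ts AS order_created_at
FROM raw_orders
WHERE is_test_order = FALSE;
```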

Source: G2 Learn Hub

This is a good diagram of where the semantic layer sits within the data stack:

Source: Modern Data 101

The next question to answer is: who owns the semantic layer? According to Frances, this also depends on your data team. It could be the data governance team, data management team, or even the data visualization team. If you’re looking at ownership from the perspective of product management, it would be the product owner. At the end of the day, it’s the team that is working with the people who are consuming your organization’s data.

Other Podcasts & Blog Posts

No other podcasts or blog posts mentioned in this episode!

Dear Analyst #131: Key insights and best practices from writing SQL for 15+ years with Ergest Xheblati

If you could only learn one programming language for the rest of your career, what would it be? You could Google the most popular programming languages and just pick one of the top 3 and off you go (FYI they are Python, C++, and C). Or, you could pick the measly #10 and build a thriving career out of it. Ergest Xheblati didn’t just pick SQL and decide to bet his career on it. He started as a software engineer, held various roles in data, and even became a product manager. After trying these different roles, Ergest would still find his way back to data engineering and has dedicated 15+ years to writing SQL for a living. In this episode he talks about why he loves SQL, reducing the number of dashboards in your company, and best practices for writing SQL.

Why Ergest loves writing SQL

The reason Ergest loves SQL is the same reason most things get invented: laziness. As that Stack Exchange thread points out, a lazy developer tends to find shortcuts and automations to make repetitive and tedious tasks less onerous. You could also argue that Excel shortcuts are a result of analysts being lazy and not wanting to use their mouse to do mundane formatting tasks.

Source: Invisible Bread

As it pertains to programming, Ergest saw that a standard framework might require 20-30 lines of code to pull some data from a database. Ergest could do that same operation by writing a few lines of SQL with a simple SELECT statement.
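
The kind of query Ergest is describing might be as short as the example below (table and column names are invented for illustration).

```sql
-- Pull one customer's recent orders: a few lines of SQL versus
-- 20-30 lines of framework/ORM boilerplate.
SELECT order_id, order_date, total_amount
FROM orders
WHERE customer_id = 42
  AND order_date >= DATE '2024-07-01'
ORDER BY order_date DESC;
```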

Solving business problems with technology

Ergest was a data analyst, data engineer, and also what we now call an analytics engineer. When Ergest was a data analyst, he didn’t have tools like dbt, which made it harder to succeed as an analyst. As with many data roles, Ergest still straddles multiple roles today. He still considers himself a blend between a data analyst and data engineer with SQL being his main tool of choice. At a high level, Ergest talks about “solving business problems with technology.”

Source: dbt Labs

I think it’s important to emphasize this point which many other guests on Dear Analyst have pointed out as well. Learning tools like Excel and SQL are great, but if you cannot communicate your findings and solve real business problems with these tools, then what’s the point? I think data professionals get caught up with how to utilize a data tool’s features when time should really be spent on what can be done to solve your customer’s problems.

I recently had a conversation with a technical program manager who had an opportunity to sit in on a few customer meetings with her sales team. She was amazed to learn about the actual problems her company’s customers face every day. It gave her a new perspective on the backend infrastructure her team supports.

Mining open source data with SQL

Most of the projects Ergest works on are focused on business intelligence. For instance, he had to work on a project where the company wanted to build robust customer profiles. You typically want to see all these different aspects of a customer so you know how to best market to and retain the customer. From a data perspective, Ergest was writing SQL to transform and merge different data from different sources.

One data source might have the names of the customers while another source might have numbers. You then have to look at the session logs of what these customers are doing on your website and create tables based on this customer activity. Ergest is a proponent of the One Big Table (OBT) approach for this customer activity data to make querying and management easier. This graphic below shows the main structural difference between the standard star schema and OBTs:

Source: Databricks SQL SME
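
To make the OBT idea concrete, here’s a hedged sketch of pre-joining a fact table with its dimensions into one wide table. The schema is illustrative, not Ergest’s actual model.

```sql
-- Denormalize customer activity into "one big table" by pre-joining
-- the fact table with its dimensions.
CREATE TABLE obt_customer_activity AS
SELECT
    f.session_id,
    f.event_timestamp,
    f.page_url,
    c.customer_name,
    c.customer_segment,
    d.device_type,
    g.country
FROM fact_web_sessions AS f
LEFT JOIN dim_customer AS c ON f.customer_key = c.customer_key
LEFT JOIN dim_device   AS d ON f.device_key   = d.device_key
LEFT JOIN dim_geo      AS g ON f.geo_key      = g.geo_key;
```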

How to stop building dashboards and answering ad-hoc questions

Ergest wrote a great blog post a few months ago called Transforming a Data Culture. The blog post talks about how data teams can prevent the deluge of one-off data questions being asked by the business and to shift to being a more strategic partner. Does this sound like a goal or OKR your data team is striving for?

Source: iFunny

Ergest did an audit at a company that had 17,000+ dashboards! Talk about not knowing which metrics matter. Ergest believes in going back to first principles when it comes to dashboarding. There are 4 questions Ergest believes you need to answer when creating a dashboard:

  1. What’s happening?
  2. Why did it happen?
  3. What are you going to do?
  4. What’s your prediction?

The blog post goes in-depth on how getting executive buy-in is the most important step in reducing the number of questions coming at the data team.

Best patterns for writing SQL

Ergest has reviewed a lot of SQL queries and saw mistakes and anti-patterns in how his fellow analysts and data engineers wrote them. Surely, there must be a book about the best patterns for writing SQL, Ergest thought. There are many books on best patterns for coding and how to debug code, but the only SQL books Ergest could find focused on anti-patterns. He ended up writing a book called Minimum Viable SQL Patterns based on his experience reviewing other people’s queries. He breaks the patterns down into 4 buckets:

  1. Query composition patterns – How to make your complex queries shorter, more legible, and more performant
  2. Query maintainability patterns – Constructing CTEs that can be reused (see the sketch after this list). In software engineering, it’s called the DRY principle (don’t repeat yourself)
  3. Query robustness patterns – Constructing queries that don’t break when the underlying data changes in unpredictable ways
  4. Query performance patterns – Make your queries faster (and cheaper) regardless of the specific database you’re using
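
As a rough example of the maintainability bucket (my own sketch, not an excerpt from the book), a reusable CTE keeps a business rule defined once instead of copy-pasted into every query:

```sql
-- Define "active customers" once and reuse it (DRY), instead of
-- repeating the filter logic in every downstream query.
WITH active_customers AS (
    SELECT customer_id
    FROM customers
    WHERE churned_at IS NULL          -- the business rule lives in one place
)
SELECT
    o.customer_id,
    COUNT(*) AS order_count
FROM orders AS o
JOIN active_customers AS a
    ON o.customer_id = a.customer_id
GROUP BY o.customer_id;
```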

These 4 patterns are pulled directly from this workshop Ergest gave about SQL patterns:

According to Ergest, what sets his book apart from other books about SQL is that the patterns he discusses are based on writing professional, production-ready SQL for cloud environments. He assumes you are writing SQL to query data warehouses in AWS, Azure, or some other public cloud platform.

Other Podcasts & Blog Posts

No other podcasts or blog posts mentioned in this episode!

Dear Analyst #130: What happens when we rely too much on Excel spreadsheets and shadow IT takes over?

This is a replay of an episode from the Make Sense podcast with Lindsay Tabas.

In the 1990s, large enterprises typically bought software in a top-down approach. IT teams would get Oracle software or Microsoft Office and get their entire organization to use the software. Since these tools are the default IT “blessed” tools, people start putting everything in these tools. This is why I think most people decide to push everything into Excel even though Excel is primarily meant for financial analysis. When it’s already installed on your computer and everyone knows how to use it, Excel becomes the crutch that we turn to regardless of the use case.

Source: xkcd

Shadow IT and the swinging pendulum of SaaS tools vs. Excel spreadsheets

In this episode, Lindsay Tabas and I talk about why large enterprises rely so much on Excel. This is part of a bigger movement of shadow IT and citizen development where individuals build business-critical workflows without needing an engineer or developer to step in. We talk about the shift from the 1990s of big monolithic software platforms to the explosion of workplace SaaS tools going into the 2000s and 2010s. The pendulum keeps on swinging back and forth as the SaaS tool sprawl gets too wide for IT departments to handle.

Despite the ebb and flow of teams having the freedom to pick their own tools vs. IT shoving software down everyone’s throats, we talk about why Excel will never die. We talk about how to get off the crutch of using Excel, and one of the strategies I mention is to have a curious mind and be willing to learn new tools. Every week we see new tools launched on Product Hunt that are supposed to replace some feature in Excel. These tools were born out of the frustration that comes with trying to do something in Excel that Excel was not meant for. Nevertheless, you need to keep an open mind to see what these new tools are all about. You never know which one of these tools just might replace how you use Excel.

Source: Not Boring by Packy McCormick

Other Podcasts & Blog Posts

No other podcasts or blog posts mentioned in this episode!

Dear Analyst #129: How to scale self-serve analytics tools to thousands of users at Datadog with Jean-Mathieu Saponaro

The post Dear Analyst #129: How to scale self-serve analytics tools to thousands of users at Datadog with Jean-Mathieu Saponaro appeared first on .

]]>
When your organization is small, a centralized data team can take care of all the internal data tooling, reporting, and requests for all departments. As the organization grows from 100 to thousands of people, a centralized data team simply cannot handle the number of requests and doesn’t have the domain knowledge of all the departments. Jean-Mathieu Saponaro (JM) has experienced this transformation at Datadog. He first joined Datadog in 2015 as a research engineer. He was part of the inaugural data analytics team which now supports 6,000+ employees. In this episode, he discusses scaling a self-serve analytics tool, moving from ETL to ELT data pipelines, and structuring the data team in a hybrid data mesh model.

Building a data catalog for data discovery

According to JM, creating a data catalog is not that hard (when your organization is small). I’ve seen data catalogs done in a shared Google Doc where everyone knows what all the tables and columns mean. When the data warehouse grows to hundreds of tables, that’s when you’ll need a proper data cataloging solution to store all the metadata about your data assets. This is when you move to something like Excel (just kidding)! In all seriousness, a shared Google Sheet isn’t a terrible solution if your data warehouse isn’t that large and the data structure isn’t very complicated.

Source: North Shore Data Services

JM discussed a few strategies that helped them scale their internal data discovery tool:

Strong naming conventions

A pretty common pattern for data warehouses containing “business” data is using dim and fact tables. All tables in the data warehouse have to be prepended with dim or fact so that it’s clear what data is stored in the table. There are also consistent naming conventions for the properties in the table. Finally, the “display” name for the table should be closely related to the actual table name itself. For instance, if the table is dim_customers, the display name for the table would just be customers.
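
In query form, the convention might look something like this (illustrative names only):

```sql
-- fact_ tables hold measurable events; dim_ tables hold descriptive attributes.
-- The "display" name users see is the table name without its prefix
-- (dim_customers surfaces as "customers").
SELECT
    c.customer_name,
    COUNT(*) AS signup_count
FROM fact_signups  AS s
JOIN dim_customers AS c
    ON s.customer_key = c.customer_key
GROUP BY c.customer_name;
```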

Snowflake schema

Another common pattern is using a snowflake schema to structure the relationships between tables. This structure makes it easy to do business intelligence (e.g. reports in Excel) later on.

Source: Wikipedia

Customizing the data discovery experience

Datadog switched BI tools a few years ago so that the tool could be used by technical and non-technical users alike. They ended up going with Metabase because it didn’t feel as “advanced” as Tableau.

In terms of their data catalog, one of the key decisions going into picking a tool was being able to quickly answer the question: where do I start? Where do I go to learn about our customer data? Product data? This is where the discovery experience is important. JM said the entry point to their catalog is still just a list of 800+ tables but they are working on a custom home page.

JM’s team thought about the classic build vs. buy decision for their data cataloging tool. Given the size of their organization, they went with building the tool internally. If the number of users was smaller, it would’ve been fine to go with an off-the-shelf SaaS tool. JM’s team set a goal to build the tool in a few months and it took them 3.5 months exactly. Building the tool internally also meant they could design and re-use custom UI components. This resulted in a consistent user experience for every step of the data discovery process.

Should you migrate data pipelines from ETL to ELT?

When JM joined Datadog, he found that all the ETL data pipelines were done in Spark and Scala. If you were to ask me a year ago what “ETL,” “data pipeline,” and tools like Spark and Scala mean I would’ve thought you were speaking a different language. But once you hear the same terms over and over again from various analysts and data engineers, you’ll start to understand how these different data tools and architecture work together. If you are new to Apache Spark, this is a quick intro video that I found useful:

As Datadog grew, so did the number of data pipelines. JM saw the number of data pipelines grow from 50 to hundreds and Spark didn’t make sense as a data processing framework anymore. Every time you wanted to add a new field to a table or change a workflow, it required an engineer to submit a pull request and deploy the change to the data warehouse.

Eventually, tools like dbt came onto the scene, which removed the need to rely on engineers to make changes to the data pipeline. Analysts who are not on the core data engineering team could develop and test data pipelines by writing SQL. One might say dbt is like the no-code/low-code data processing framework democratizing who can create data pipelines. As the data team scaled, their data pipelines migrated from ETL to its cousin ELT. The team uses Airbyte for the “extraction” step and dbt does all the data cataloging.
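
For a sense of what that shift looks like, a dbt model is essentially a SELECT statement saved to a file, which dbt then materializes in the warehouse. The sketch below is a generic example, not Datadog’s actual code, and stg_signups is a made-up upstream model.

```sql
-- models/marts/fact_signups.sql (hypothetical dbt model)
-- An analyst can add a field or change this logic with a SQL-only pull request,
-- without touching Spark or asking an engineer to deploy anything.
SELECT
    signup_id,
    customer_id,
    signup_channel,
    CAST(signup_ts AS DATE) AS signup_date
FROM {{ ref('stg_signups') }}
WHERE is_internal_test = FALSE
```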

Source: Airbyte

Since dbt opened up the data pipeline development process to more people outside the data team, it became even more important to enforce best practices for naming conventions for tables and fields.

Pros and cons of data meshing

Another term I didn’t learn about until a few years ago: data mesh. A definition from AWS:

A data mesh is an architectural framework that solves advanced data security challenges through distributed, decentralized ownership. Organizations have multiple data sources from different lines of business that must be integrated for analytics. A data mesh architecture effectively unites the disparate data sources and links them together through centrally managed data sharing and governance guidelines.

When JM first started working at Datadog, there was a central data team that did everything for every department at Datadog. The data team did the extraction, ingestion, dashboarding, and even recruiting. This is totally reasonable when you’re a small organization.

As the organization grew to thousands of people, it became harder for this centralized data team to cater to all the departments (who were also growing in size and complexity). The data team simply didn’t have the domain knowledge and expertise of these business units.

Source: Trending GIFs

If Datadog were to go full on data mesh, silos would form in each department. This is one of those situations where the data mesh sounds good in theory, but in practice, Datadog hired and structured their data teams to meet the needs of their data consumers. Having each team manage their own data extraction and ingestion would lead to a big mess according to JM.

Start with one team to prove the semi-data mesh model

JM’s team started with the recruiting team to prove this data mesh hybrid model would work. The recruiting team started hiring its own data analysts who understood the business priorities of the team. The analysts would help clean and process the data. An example of domain-specific data for the recruiting team might be engineering interview data. The analysts helped make sure that interviews were properly distributed among engineers so that no engineer was overloaded.

To see JM’s journey in more detail, take a look at this talk he gave in 2023 at the Compass Tech Summit:

Find something you like and have fun

People have given all types of great advice on this podcast in terms of how to switch to a career in data. Sometimes the advice goes beyond data and applies to life in general. Perhaps this is getting too touchy-feely but JM’s advice for aspiring data professionals is to “find a domain that you like and have fun.” Life is finite, after all. This is one of my favorite visualizations when I need to remind myself about the finiteness of life (read the full blog post from Wait But Why):

Source: Wait But Why

JM also talked about the ability to be flexible once you’re in the data world because tooling changes a lot. The definitions of roles change a lot, and new roles pop up every year that redefine what working in “data” means. Case in point: the analytics engineer role. JM’s advice is that you should feel empowered to follow wherever the industry may go.

Other Podcasts & Blog Posts

No other podcasts or blog posts mentioned in this episode!

Dear Analyst #128: What is citizen development and how to build solutions with spreadsheets?

This is a replay of an episode from the Citizen Development Live podcast with Neil Miller.

Citizen development is a relatively new term I learned about a year ago or so. To me, it’s using no-code tools at scale within a large enterprise. It’s a term that covers the population of people who are not developers, programmers, and software engineers by trade but know how to build apps and workflows to accomplish business-critical tasks. This is the definition of a citizen developer from PMI (Project Management Institute):

Low-code or no-code development is the creation of applications software using graphic user interfaces or minimal basic code instead of large strings of complex coding. This term is often used to describe citizen development processes and technology. Low-code and no-code technology provides visual drag-and-drop or point-and-click user interfaces, making them easy for anyone to use.

Source: PMI

In this conversation on the Citizen Development Live podcast, Neil and I discuss various spreadsheets I’ve built in the past, when to move beyond spreadsheets, and why citizen development is a growing trend within the enterprise. I referred to a talk I gave at the 2019 No-Code Conference where I spoke about building tools with spreadsheets (and why the spreadsheet is the real first no-code tool):

https://www.youtube.com/watch?v=M1GAArkYfug

Other Podcasts & Blog Posts

No other podcasts or blog posts mentioned in this episode!

Dear Analyst #127: Spreadsheets vs. Jira: Which one is better for your team?

I wasn’t sure if this topic should be its own episode, but it’s been on my mind ever since I came back from Atlassian Team ’24 (Atlassian’s annual conference). At the conference, I had the opportunity to meet with a few people who are just as interested in spreadsheets as I am. We talked specifically about how Jira can best work with spreadsheets (Excel or Google Sheets) and the different workflows that result from the combination of these two tools. It was fascinating to hear how company culture and old, ingrained ways of doing things lead to the use of spreadsheets even when Jira and its add-ons can accomplish 80-90% of what the business needs. This episode highlights some of the things we discussed at the conference and implications for the future for teams using Jira and spreadsheets.

Source: Atlassian

What is Jira?

Since most people following the newsletter are data analysts, I thought it would be relevant to first share what Jira is. Most would say Jira is issue-tracking software used by engineering and product teams to track software projects. The software aims to mirror agile and scrum methodologies for accomplishing tasks versus traditional waterfall techniques. The rituals behind agile and scrum are codified in Jira’s features, so that’s why the software is loved by thousands of engineering teams around the world. This is a good video from Atlassian on what a scrum project in Jira looks like. Near the end, you’ll see a backlog of tasks. The backlog is one of the most foundational principles of the scrum methodology and will serve as the launching pad for this discussion on Jira and spreadsheets.

Why do teams export Jira issues to Excel spreadsheets?

One theme for why teams would want to export Jira issues into spreadsheets is reporting. We also talked about using other tools like Power BI for reporting purposes, but the intermediary step between Jira and Power BI is still a CSV export.

There are built-in reporting and charting capabilities in Jira. There are also a plethora of add-ons in the Atlassian marketplace for custom charts. The issue with the add-ons is they can get quite costly since you are paying on a per-seat basis. So even if the Jira admin is the one creating the charts, you still have to pay for the other Jira users who are simply viewing the charts. This charting add-on below is one of the most popular add-ons for Jira with 10,000+ downloads. Looks a bit like Excel, no?

Source: eazyBI

Digging a little deeper, we also discussed how the Jira backlog is kind of like a datastore for what the product and eng teams are working on. You can think of this almost like another table of data in your data warehouse. What does this mean for a regular business user who doesn’t work on the eng or product team and still needs the data? Traditionally, they would write a SQL query to get the data they need, do their analysis, and call it a day. With Jira, they would need the Jira admin to export the backlog to a CSV and then they can go off into Excel and do their custom reporting, PivotTables, and dashboarding to show how the product and eng team’s work aligns with the rest of the work of the company.
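
Once that CSV lands somewhere queryable, the downstream analysis looks like any other SQL. A hypothetical example, assuming the export has been loaded into a table called jira_backlog_export:

```sql
-- Count unfinished issues by project and assignee from an exported Jira backlog.
SELECT
    project_key,
    assignee,
    COUNT(*) AS open_issues
FROM jira_backlog_export
WHERE status <> 'Done'
GROUP BY project_key, assignee
ORDER BY open_issues DESC;
```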

Story points, finance, and HRIS systems

Expounding on the above point, being able to merge your Jira backlog with other business data is why teams are exporting from Jira into spreadsheets. During the conference, I brought up the point that other business data might just be other worksheets in your Excel workbook. Perhaps one tab has data from your customer support team and another tab has data from your sales team. Through a series of VLOOKUPs and INDEX/MATCHes, a product owner may be able to get a full P&L for their area of work. Perhaps ERP software can do this but can it get to the level of fidelity that a Jira backlog has? This is why it’s easier to just export all your data (not just Jira) into one Excel file and do the custom analysis in that workbook.

How to export Jira backlog to CSV after writing a JQL query. Source: Quora

Relating to this topic, one use case our group discussed was figuring out how much work was actually completed by the engineering team. To get an accurate picture of this, story points are included in the export. For those new to agile, story points are a unit of measurement for estimating the effort required to complete an item in the backlog.

The CSV export now contains the entire backlog, the engineer assigned to each item, and the story point estimate for the task. You can then combine this Jira data with data from an HRIS system like Workday to understand the output for each engineer, taking into account PTO, holidays, etc. Furthermore, engineers might self-report how much time or capacity they are spending on each project. Perhaps 50% of time is spent on Project A and 50% on Project B. These ratios (probably also tracked in a spreadsheet somewhere) can then be applied to the story points to get an accurate picture of how much effort was actually spent on the project as a whole.
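
As a sketch of that merge (assuming both exports have been loaded into tables; every name here is made up), the join might look like:

```sql
-- Combine completed story points from the Jira export with capacity data
-- from an HRIS export to estimate output per available day.
SELECT
    j.assignee,
    SUM(j.story_points) AS completed_points,
    h.working_days - h.pto_days AS available_days,
    SUM(j.story_points) * 1.0
        / NULLIF(h.working_days - h.pto_days, 0) AS points_per_available_day
FROM jira_backlog_export  AS j
JOIN hris_capacity_export AS h
    ON j.assignee = h.employee_name
WHERE j.status = 'Done'
GROUP BY j.assignee, h.working_days, h.pto_days;
```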

Source: Plaky

You can take this one step even further by combining your Jira backlog data with costs and salaries from your finance system. Then you can start seeing the actual dollar costs for different software projects. This might be important for accounting teams as they may be interested in software capitalization and being able to compare software projects with other assets in the company.

The key takeaway is that these questions and answers start with exporting data from Jira into spreadsheets.

Benefits of exporting Jira backlog into spreadsheets

If you’re a follower of this newsletter and podcast, you already know why spreadsheets are the preferred tool for business users. Stepping outside of Jira for a second, Excel is still one of the best analysis and prototyping tools for businesses of all sizes. Our group talked about why Excel and Google Sheets are still used within companies and why a spreadsheet is the first thing you even think about exporting to. We all already have practice doing this in our personal lives. Think of the first time you were able to download transactions from your bank statement into a spreadsheet. What did that moment feel like? Is magical a stretch?

Source: Amazon

There are other benefits to exporting your Jira backlog into spreadsheets beyond reporting. If other team members don’t have a Jira license, they can still “see” the data in a spreadsheet format (assuming the organization is a Microsoft Office or Google Workspace shop). It’s not ideal, but emailing that spreadsheet around or storing it on SharePoint makes that Jira backlog collaborative. Now others beyond the engineering team can get visibility into what the engineering team is doing.

Jira add-ons for niche use cases

I mentioned the plethora of add-ons for custom reports in Jira. It’s amazing to me how many add-ons exist for very niche use cases in Jira.

One topic that came up during our discussion is how to calculate the time a backlog item spends in different statuses. When the item moves from “Not Started” to “In Progress,” you may want to know how much time has elapsed. This cycle time is important to understand how long it takes to complete tasks once they’ve started. There are add-ons for this in Jira, but there are times when you may want to calculate the time in status according to your own business rules. This means–surprise surprise–exporting to Excel and writing a formula to calculate the time it takes for items to move through statuses.

Snapshot of the Time in Status add-on for Jira Cloud

The issue is that this granularity of data doesn’t exist in the native export in Jira. To get this granular level of data, you would need another export that would have duplicate entries of a task and the timestamp for when that task moved to a certain status. This data is available through the API, but that would require additional work beyond doing a simple export from Jira.
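
If you do pull that granular status-change data (one row per issue per transition, with a timestamp) and load it into a table, the time-in-status calculation becomes a window function. A hypothetical sketch:

```sql
-- Time spent in each status, computed from a status-change export.
-- How you subtract timestamps (or use DATEDIFF) depends on your SQL dialect.
SELECT
    issue_key,
    status,
    changed_at,
    LEAD(changed_at) OVER (
        PARTITION BY issue_key
        ORDER BY changed_at
    ) - changed_at AS time_in_status   -- NULL for the issue's current status
FROM jira_status_changes;
```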

Importing a spreadsheet into Jira

I didn’t consider the “other” direction of importing a spreadsheet into Jira until I met with people at the conference. To be precise, you can only import a CSV into Jira. You would typically import a CSV into Jira when you want to make bulk changes or additions to your Jira backlog, or when you’re migrating from another issue-tracking platform.

Another edge case I had not considered is that data entry into Jira is not straightforward for non-engineering teams. If you’re trying to crowdsource ideas for new projects to tackle for next quarter, do you ask your key stakeholders to simply create projects in Jira? What are the proper settings for the project? From the stakeholder’s point of view, all they care about is being able to add a new project quickly and perhaps a description of the project.

To make the data entry easier, you could use a Google Form to collect ideas from various business stakeholders. The form has standard fields like project name, project description, type of project (using a dropdown of pre-filled values), team, etc. Now you’ll have a Google Sheet of all the projects sourced from various parts of your organization in a standard format that works for submitting to your Jira workspace. Even after people submit projects, however, the Jira admin or DRI would have to clean up the submissions to make sure that only valid projects get imported into Jira.

Maintaining workflows with spreadsheets and Jira

Once your Jira backlog is exported out of Jira into a spreadsheet or the data is prepared in a spreadsheet to be imported into Jira, there is a whole separate set of issues that arise. We know that the spreadsheet is the easiest interface for non-engineering teams to use when it comes to Jira data, but someone still has to maintain that spreadsheet. Usually it’s the person who has the domain or business knowledge for why they need the spreadsheet in the first place and they happen to know enough formulas to get the analysis done.

In the moment, the business user gets the job done and goes on with their day. Inevitably, someone will ask the question: can we get the data for next month? Or next week? Now that business user has to think about a long-term solution for maintaining this spreadsheet or push it off to IT. This is where the spreadsheet can really hinder the progress of non-engineering teams. At the conference, we talked about a few different directions this could go.

Just do it manually each time

Not the most ideal scenario, but it works. After you’ve exported from Jira and your other tools, you brute-force your way through the analysis. Some parts of the file might be automated, or you set it up so that all you need to do is paste in the raw data from the Jira export. Once you get good at the manual work, it becomes muscle memory. Thinking through an automated solution or moving off of spreadsheets entirely feels like an opportunity cost because you’ve become so fast at doing it manually. We talked about how this is usually the preferred method when the analysis is ad hoc or doesn’t need to be done frequently (e.g. quarterly).

Scripting with VBA or Google Apps Script

You decide you have some time to come up with an automated solution, or you’re tired of people asking you for the analysis. So you decide to learn a little VBA for Excel or Google Apps Script for Google Sheets. I recently spoke with someone who figured out a way to write a Google Apps Script that pulls from Jira and dumps the backlog directly into a Google Sheet (using the Jira API).
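
You don’t have to use Apps Script to go this route, either. As a purely illustrative sketch of the same idea, here’s how you might pull backlog issues from the Jira Cloud REST API into R with httr and jsonlite. The site URL, JQL, and the story points custom field ID are placeholders to swap for your own setup; this is not the script the person I spoke with wrote.

```r
# Rough sketch: pull backlog issues from Jira Cloud's REST search endpoint into a data frame.
library(httr)
library(jsonlite)

base_url  <- "https://your-domain.atlassian.net"   # placeholder Jira Cloud site
email     <- Sys.getenv("JIRA_EMAIL")              # Jira Cloud API auth = email + API token
api_token <- Sys.getenv("JIRA_API_TOKEN")

resp <- GET(
  paste0(base_url, "/rest/api/2/search"),
  authenticate(email, api_token),
  query = list(
    jql        = "project = ABC AND statusCategory != Done ORDER BY rank",
    fields     = "summary,status,customfield_10016",  # story points field ID varies by site
    maxResults = 100
  )
)
stop_for_status(resp)

issues <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))$issues

backlog <- data.frame(
  key          = issues$key,
  summary      = issues$fields$summary,
  status       = issues$fields$status$name,
  story_points = issues$fields$customfield_10016
)

# From here you could write_csv(backlog, ...) or push it to a Google Sheet
# with the googlesheets4 package.
```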

While this kind of script removes the manual export from Jira and some of the downstream cleanup, the question remains: who will maintain it? What happens when there are new custom fields in Jira that need to be pulled into the spreadsheet? If you are the only person who knows how the script works, you may find yourself both writing the script and gathering requirements from stakeholders on which Jira data they want pulled.

Isn’t there an add-on for this?

For Jira Cloud and Google Sheets, yes. It’s just called Jira Cloud for Google Sheets. In the Atlassian marketplace, it only has a few reviews but in the Google Workspace Marketplace, it has 2M+ downloads and over 100 reviews. It’s actually supported by Atlassian and it’s free. But according to the Atlassian Marketplace listing, the add-on was released in 2022 and hasn’t been updated since then.

The reviews are mixed, and for those who gave 1-2 stars, the add-on just stopped working altogether. So try it at your own risk; I wouldn’t build it into any business-critical workflows. This review shows what happens when you depend on the add-on and it stops working all of a sudden:

This review speaks to the point I made about embedding a spreadsheet or an add-on in your workflow and then having that spreadsheet or add-on go down. The add-on’s owner (in this case, Atlassian) is in charge of fixing the issue, and downstream stakeholders have to come up with workarounds. This Jira user now needs to do a regular CSV export from Jira and wrangle it into the format he had set up with the Google Sheets add-on.

Build vs. buy: manage the spreadsheet yourself or let Atlassian do it for you?

The classic build vs. buy decision can be applied to the stance your organization takes on the usage of spreadsheets and Jira. You could rely on your TPMs and data analysts to pull data out of Jira and manage the scripts and spreadsheets themselves. This is the “build” scenario.

Atlassian knows that spreadsheets are a problem within organizations because they are brittle, prone to human error, and most importantly, no one typically takes ownership of maintaining them. So Atlassian gives you the opportunity to let them replace that spreadsheet at your organization with Jira Align. The product marketer for Jira Align did a great job of framing the problem on this landing page:

Saying Atlassian is giving you the “opportunity” to let them solve the spreadsheet problem for you is a bit of a euphemism. Atlassian sees a huge revenue opportunity with Jira Align which is why the pricing is on the higher side. It’s meant for large enterprises with 500+ Jira users. For 100 seats, Jira Align costs $155,000 annually (or around $130/mo/user). This includes “integrated” users who are still doing work in Jira Cloud and don’t need access to all Jira Align features.

Is Atlassian’s Jira Align worth it?

With an enterprise license, your organization will also most likely get all kinds of human support. Maybe a few account managers, CSMs, etc. to make sure you’re feeling happy and getting the most out of the tool. The main question is what happens when you want to customize a chart or metric that falls outside of Align’s current feature set. Do you also get an on-call developer who will build that feature for you or prioritize your feature requests on Align’s own product backlog?

Source: I Am Developer

I think you know where I’m going with this. Instead of exporting only from Jira Cloud, some analyst may now also need to export from Jira Align and merge that with other business data. I don’t necessarily agree with this paragraph on the Jira Align landing page:

Spreadsheet solutions are not designed for the rapid collaboration and information sharing that an innovative enterprise needs. A connected enterprise can quickly and easily answer questions like: Will we deliver on time? What are my team’s dependencies? Are we building the right things? How do we know?

I think the spreadsheet is exactly the “rapid collaboration” tool that most enterprises use because it’s not cost-prohibitive and knowledge workers know how to navigate a spreadsheet. However, if no team maintains the spreadsheet or the various workflows surrounding it, then Jira Align is probably worth it given it’s a managed service by Atlassian. Jira Align was actually an Atlassian acquisition back in 2020 and was formerly called AgileCraft. So there may still be some legacy integration issues between Jira Cloud and Align. But if you’re tired of exporting out of Jira into spreadsheets and would rather have this be someone else’s problem, Jira Align might be worth exploring.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

The post Dear Analyst #127: Spreadsheets vs. Jira: Which one is better for your team? appeared first on .

]]>
Dear Analyst #126: How to data storytelling and create amazing data visualizations with Amanda Makulec https://www.thekeycuts.com/dear-analyst-how-to-data-storytelling-and-create-amazing-data-visualizations-with-amanda-makulec/ https://www.thekeycuts.com/dear-analyst-how-to-data-storytelling-and-create-amazing-data-visualizations-with-amanda-makulec/#respond Mon, 15 Apr 2024 06:05:00 +0000 https://www.thekeycuts.com/?p=54336 With an undergraduate degree in zoology and a master’s in public health, you wouldn’t expect Amanda Makulec to lead a successful career in data analytics and data visualization. As we’ve seen with multiple guests on the podcast, the path to a career in data analytics is windy and unexpected. It was the intersection of public […]

The post Dear Analyst #126: How to data storytelling and create amazing data visualizations with Amanda Makulec appeared first on .

]]>
With an undergraduate degree in zoology and a master’s in public health, you wouldn’t expect Amanda Makulec to lead a successful career in data analytics and data visualization. As we’ve seen with multiple guests on the podcast, the path to a career in data analytics is windy and unexpected. It was the intersection of public health and data visualization that got Amanda interested in data visualization as a career. In one of her roles, Amanda was supporting USAID by analyzing open data sets and creating charts and graphs for publishing content. Her team consisted of graphic designers and developers. Designers would basically take her charts from Excel and add color and text to them. Amanda found that large enterprises were facing the same challenges as the organizations she was supporting in public health (and enterprises have more money to throw at this problem). Thus began Amanda’s career in data viz.

How do you tell a data story?

We’ve talked about data storytelling a lot on this podcast. If there is one person who can crisply define what data storytelling is, it would be Amanda. This is Amanda’s definition according to this blog post:

Finding creative ways to weave together numbers, charts, and context in a meaningful narrative to help someone understand or communicate a complex topic. 

We talked a bit about how data storytelling can mean different things to different people (this blog post in Nightingale talks more about this). You might work with a business partner or client who says they want a data story, but all they really want is just an interactive dashboard with a filter. Amanda cites Robert Kosara’s definition of data storytelling in 2014 as one of her favorites:

  • ties facts together: there is a reason why this particular collection of facts is in this story, and the story gives you that reason
  • provides a narrative path through those facts: guides the viewer/reader through the world, rather than just throwing them in there
  • presents a particular interpretation of those facts: a story is always a particular path through a world, so it favors one way of seeing things over all others

Amanda stresses the 3rd bullet point as the most important part of data storytelling. If the audience has to walk away with one analytics fact from the story, what is that fact you want to get across?

Source: Effective Data Storytelling

Getting feedback on your data stories and visualization

One point Amanda brought up during the conversation which I think is worth highlighting is feedback. After you’ve published or launched an analysis, dashboard, or data story, you rarely get feedback on how effective the product was at telling a story. You might get some qualitative feedback, like that the dashboard answers specific questions or that the findings are “interesting.” But was the visualization actually effective at telling a story?

Amanda likes to ask people what they like and don’t like about her data stories and visualizations. Often people will get frustrated because the key takeaway from the data story runs counter to what they believe. This leads them to question the validity of the data source. But you as the storyteller are simply conveying the signal from the noise in all the data.

During the pandemic, Amanda worked with the Johns Hopkins Center for Communications to create charts around COVID. Talk about telling an important data story! Amanda was presenting data about a worldwide pandemic while working with an organization that was at the core of reporting the stats on the pandemic. Needless to say, the data stories and visualizations drew a variety of feedback. Remember seeing stories like this questioning how different entities and organizations were collecting and disseminating data about COVID? Being able to concisely present dense survey data about COVID is probably the toughest data storytelling job I can think of.

Applying principles of user-centered design to data visualization

Before Amanda starts working on a new dashboard or visualization, she asks several questions about the project:

  • Who is going to use the dashboard?
  • When are they going to use it?
  • What are their needs?

Before designing the dashboard, Amanda likes to borrow from the world of user-centered design to make sure her data visualization meets the goals of the end user. She creates mindset maps to make sure the dashboard is serving the right group. Journey maps also help with figuring out how often the target audience will engage with the dashboard.

Source: The Interaction Design Foundation

We chatted about data exploration and data explanation. The explanation step is sometimes overlooked by analysts because so much time is spent on the nuts and bolts of creating the visualization. But data explanation is just as important because this helps lead the end user to the key analytical fact of the data story. This means having clear chart titles and annotations so that you’re guiding the user to the key takeaway of the story.

Data tools for building effective data visualizations

I love talking about tools so we spent some time talking about the tools Amanda uses to build her data stories and data visualizations. Amanda talked about understanding the constraints of your data tools so that you know what you can and cannot build with them. For instance, Power BI didn’t support dot plots at one point, so Amanda didn’t consider it a tool in her toolbelt for telling data stories that involve dot plots. Other tools like Tableau and ggplot are great for adding annotations to different parts of a data visualization.
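
To make the annotation point concrete, here’s a small, purely illustrative ggplot2 sketch. The data and the “pricing change” note are made up; the idea is simply that the title carries the takeaway and an annotation points the reader at the moment that matters.

```r
# Illustrative only: a made-up monthly series with a takeaway-style title and an annotation.
library(ggplot2)

sales <- data.frame(
  month = seq(as.Date("2023-01-01"), as.Date("2023-12-01"), by = "month"),
  value = c(42, 45, 44, 48, 51, 50, 63, 66, 70, 72, 75, 79)
)

ggplot(sales, aes(month, value)) +
  geom_line(linewidth = 1, colour = "#2c7fb8") +
  geom_vline(xintercept = as.Date("2023-07-01"), linetype = "dashed", colour = "grey60") +
  annotate("text", x = as.Date("2023-07-10"), y = max(sales$value),
           label = "New pricing launched in July", hjust = 0, size = 3.5) +
  labs(
    title    = "Sales stepped up after the July pricing change",
    subtitle = "Monthly units sold, 2023 (illustrative data)",
    x = NULL, y = "Units sold"
  ) +
  theme_minimal()
```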

Did you really mean story-finding?

Amanda talks about how some people want more data storytelling but what they really want is “story-finding.” Coupled with data exploration, story-finding is all about finding the trends and outliers in a dataset before doing the actual data storytelling. This graphic from Amanda’s blog post neatly plots some of these common terms we hear in the data viz world and shows how important words are in describing what we want:

Source: Nightingale

I asked Amanda what story-finding projects she’s actively working on and she talked about a project with the Data Visualization Society (where she is the Executive Director). Her team has been trying to learn more about the membership (31,000+ people) so that they can create better programming for members. They partnered with The Data School at The Information Lab to create a survey for members. As Amanda explored the data, she found that members are requesting information about professional development and work in a digital analytics capacity. The Data Visualization Society also issued a challenge to the community to come up with interesting visualizations using the survey data (see results from 2022 here). I really like this one done in Figma which shows which tools are most used by the members (surprise surprise, Excel is on the map):

Source: James Wood

Advancing the profession of data visualization

We talked about how Amanda got involved with the Data Visualization Society and how it keeps her connected with the broader data viz community. It’s a volunteer board, and the organization has its roots in the Tapestry Conference in 2018. Elijah Meeks (then a senior data viz engineer at Netflix) gave the keynote about a “3rd wave” of BI tools like Tableau and Jupyter notebooks becoming popular. The talk is definitely worth a watch if you’re interested in the history of data visualization:

The Data Visualization Society is a space for people working in the data visualization profession to connect. The main goals of the organization are to celebrate the profession, nurture cross-functional connections, and advance the practice of data visualization. Their annual conference is aptly called Outlier and is coming up in June.

Landing your next data visualization role

As with all episodes, I asked Amanda what advice she has for aspiring data visualization professionals. She had a lot to say on the topic. One thing that stood out to me is that all of us have skills that translate well into data analytics and data visualization. Whether you are a writer or an elementary school teacher, the communication and collaboration skills you use to produce a deliverable are the same skills you need to be a data visualization expert.

Source: Global Investigative Journalism Network

Aside from joining organizations like the Data Visualization Society, Amanda suggested mastering fundamental design skills. Understanding how to declutter charts is one important aspect of being a data visualization expert. Of course, the Data Visualization Society’s journal has a bunch of great resources like this article on starting out in the world of data visualization and questions to ask when starting out in data viz. In the starting out article, I really liked this line about making things simple:

Simple is also beautiful in data visualization, and as long as what you’re creating is meeting the needs of your audience, you’re succeeding in making data more accessible to more people, which is an incredible talent in itself.

Source: Nightingale

Other Podcasts & Blog Posts

No other podcasts or blog posts mentioned in this episode!

The post Dear Analyst #126: How to data storytelling and create amazing data visualizations with Amanda Makulec appeared first on .

]]>
Dear Analyst #125: How to identify Taylor Swift’s most underrated songs using data with Andrew Firriolo https://www.thekeycuts.com/dear-analyst-125-how-to-identify-taylor-swifts-most-underrated-songs-using-data-with-andrew-firriolo/ https://www.thekeycuts.com/dear-analyst-125-how-to-identify-taylor-swifts-most-underrated-songs-using-data-with-andrew-firriolo/#respond Mon, 25 Mar 2024 05:34:00 +0000 https://www.thekeycuts.com/?p=54652 Sometimes pop culture and data analysis meet and the result is something interesting, thought-provoking, and of course controversial. How can one use data to prove definitely which Taylor Swift songs are the most underrated? Isn’t this a question for your heart to answer? Andrew Firriolo sought to answer this question over the last few months […]

The post Dear Analyst #125: How to identify Taylor Swift’s most underrated songs using data with Andrew Firriolo appeared first on .

]]>
Sometimes pop culture and data analysis meet and the result is something interesting, thought-provoking, and of course controversial. How can one use data to prove definitively which Taylor Swift songs are the most underrated? Isn’t this a question for your heart to answer? Andrew Firriolo sought to answer this question over the last few months and the results are interesting (if you’re a Taylor Swift fan). As a Swiftie since 2006 (the moniker for Taylor Swift fans), Andrew wanted to find a way to bridge his passions for Taylor Swift and data analysis. He’s currently a senior data analyst at Buzzfeed, and published his findings on Buzzfeed to much reaction from the Swiftie community. In the words of Taylor Swift, Andrew’s methodology and analysis just “hits different.”

From comp sci to data analytics

Andrew studied computer science at New Jersey Institute of Technology but realized he liked the math parts of his degree more than the engineering parts. Like many guests on this podcast, he made a transition to data analytics. Interestingly, it wasn’t a job that propelled him into the world of data analytics, but rather going to graduate school at Georgia Institute of Technology (Georgia Tech). Georgia Tech has some really affordable online technical programs, including data analytics. After getting his master’s degree, he worked at Rolling Stone as a data analyst. This was the beginning of Andrew’s exploration into the Spotify API to see the data behind music. You can see some of the articles Andrew published while at Rolling Stone here.

Source: Pocketmags

After Rolling Stone, Andrew landed his current role at Buzzfeed building internal dashboards and doing internal analysis. In both of his roles, he talks about using a lot of SQL and R. A big part of his job is explaining the analyses he’s doing to his colleagues. This is where the data storytelling aspect of a data analyst’s job comes into play. I call this the “soft” side of analytics, but some would argue that it’s the most important part of a data analyst’s job. In most data analyst roles you aren’t just sitting at your desk writing SQL queries and building Excel models. You’re a business partner with other people in the organization, and communication skills are often more important than technical skills.

Answering a Taylor Swift question with data

Andrew became a Taylor Swift fan through his sister in 2006. They both listened to the world premiere of Taylor’s first album. Given his background in data, Andrew decided to answer a question about Taylor Swift that’s been on his mind for a while: what are Taylor Swift’s most underrated songs?

To read Andrew’s full article, go to this Buzzfeed post.

Andrew’s hypothesis was that there’s a way to use data to prove which songs in Taylor’s discography are most underrated. When I classify something as “underrated,” it’s usually a gut decision. But it’s always interesting to see the data (and the methodology) for determining if something is truly “underrated.”

Multiple iterations in song streaming analysis

As mentioned earlier, Andrew made good use of Spotify’s API. The API gives you a plethora of information about songs such as how “danceable” or “acoustic” a song is. Each characteristic is measured on a scale of 0 to 1.
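
If you want to poke at this data yourself, here’s a minimal sketch of pulling audio features with R’s httr package using Spotify’s client-credentials flow. The track ID below is a placeholder, and you’d need your own client ID and secret from a Spotify developer app; the spotifyr package wraps these calls if you’d rather not do it by hand.

```r
# Minimal sketch: exchange app credentials for a token, then request a track's audio features.
library(httr)
library(jsonlite)

client_id     <- Sys.getenv("SPOTIFY_CLIENT_ID")
client_secret <- Sys.getenv("SPOTIFY_CLIENT_SECRET")

# 1. Client-credentials flow: trade the app's ID/secret for a bearer token
token_resp <- POST(
  "https://accounts.spotify.com/api/token",
  authenticate(client_id, client_secret),
  body   = list(grant_type = "client_credentials"),
  encode = "form"
)
token <- content(token_resp)$access_token

# 2. Audio features for one track (danceability, acousticness, etc., mostly on a 0-1 scale)
track_id <- "1234567890abcdefghijkl"   # placeholder track ID
features_resp <- GET(
  paste0("https://api.spotify.com/v1/audio-features/", track_id),
  add_headers(Authorization = paste("Bearer", token))
)
features <- fromJSON(content(features_resp, as = "text", encoding = "UTF-8"))

features$danceability
features$acousticness
```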

For the first iteration of Andrew’s analysis, he simply compared a given song’s streaming performance to its album’s median streaming performance. The hypothesis here is that the less-streamed songs are the underrated ones. This analysis surfaced a lot of Taylor’s deluxe tracks.

Source: Genius

The second iteration was to look beyond the streaming performance of the album the song is on. Andrew compared the song’s performance relative to the albums released before and after the current album. This surfaced some more underrated songs.

Getting the opinion of Swifties

While Andrew’s analysis so far yielded some interesting songs, he found that these songs weren’t all that loved by other Swifties.

In his final iteration, Andrew added a quality score to his analysis. This is a more subjective number that takes into account the opinion of experts.

At Rolling Stone, they had a rolling list of expert opinions that were published in various places. He had a data set of 1,000 opinions on different Taylor Swift songs that he could use to qualify a song. The big question is, how much weight do you give the quality score? In the end, Andrew decided on a weight of 33% for each metric he tracked:

  1. Percent difference between its lifetime Spotify streams and the median streams of its album
  2. Percent difference between its lifetime Spotify streams and the median streams, including neighboring albums
  3. Average of six rankings of Taylor’s discography from media publications (quality score)

The quality score basically took into account the wisdom of the Swiftie community.

Source: Know Your Meme
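
Here’s a toy version of that scoring idea in R, with made-up numbers and hypothetical column names. The sign conventions and the standardization are my own guesses at how the metrics might be combined, not Andrew’s exact formula, and the neighboring-albums metric is omitted for brevity (so the two remaining metrics get 50% each instead of the 33% weights described above).

```r
# Toy data: streams and an average "quality rank" across publications (1 = best reviewed).
library(dplyr)
library(tibble)

songs <- tribble(
  ~song,         ~album, ~streams, ~quality_rank,
  "Track A",     "Red",  900e6,    60,
  "Holy Ground", "Red",  150e6,    10,
  "Track C",     "Red",  450e6,    80,
  "Track D",     "1989", 1200e6,   25,
  "Track E",     "1989", 300e6,    50,
  "Track F",     "1989", 700e6,    40
)

scored <- songs |>
  group_by(album) |>
  mutate(pct_diff_album = (streams - median(streams)) / median(streams)) |>
  ungroup() |>
  mutate(
    # The neighboring-albums metric would be built the same way with a wider grouping.
    stream_gap = -scale(pct_diff_album)[, 1],  # streaming below its album's median => bigger gap
    quality    = -scale(quality_rank)[, 1],    # better reviews => higher quality
    underrated = 0.5 * stream_gap + 0.5 * quality
  ) |>
  arrange(desc(underrated))

scored   # the low-streaming, well-reviewed songs float to the top
```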

Getting to the #1 most underrated song: Holy Ground (Red)

Andrew was able to use R–a tool he already uses every day on the job–to do this analysis. After dumping all the data from the Spotify API into a CSV, he used the Tidyverse R packages to crunch the numbers. One of the most commonly used packages for data visualization in the Tidyverse is ggplot. But superimposing the images of Taylor Swift’s albums onto the charts created by ggplot required a new script Andrew had to write in R. I asked Andrew if he had to learn any new skills for this Taylor Swift analysis, and the main skill he said he had to learn was data visualization. Here’s an example of a visual from Andrew’s blog post for the #1 most underrated Taylor Swift song:

Source: Republic Records / Tidyverse / Andrew Firriolo / BuzzFeed

To make sure he was on the right track, Andrew asked other Swifties what their #1 most underrated Taylor Swift song was. To Andrew’s delight, two co-workers said Holy Ground. Getting this qualitative feedback let Andrew know he was on the right track.

On the Buzzfeed article, half of the commenters agree that Holy Ground is indeed the most underrated song. The other half talk about other songs that should be on the list. When Andrew posted his analysis on LinkedIn, most people commented on his methodology and thought process (like we did in this episode).

Using science to see which re-releases of Taylor’s songs most resemble the original song

Of course, “science” is used a bit loosely here. But similar to Andrew’s underrated song analysis, this analysis utilized the Spotify API to see which Taylor’s Version song most closely matches the original song. This was Andrew’s first analysis on Taylor Swift published late last year.

Read the Buzzfeed article for the full details on the methodology. Andrew also used R and various packages like the HTTP request package to pull the data from Spotify. To skip right to the results: the #1 song where Taylor’s version is most similar to the original is Welcome to New York.

Source: Republic Records/Big Machine Records/Tidyverse/Andrew Firriolo/BuzzFeed

Euclidean Pythagorean distance scores and Taylor Swift

When Andrew first brought up this concept I just scratched my head. It sounds advanced, and if someone is bringing up Euclid in a Taylor Swift analysis, you trust that it must be thorough and accurate.

In reality, this concept harkens back to your high school geometry/algebra days. The distance formula simply measures the distance between two points on an X-Y plot:

Source: HowStuffWorks

In this analysis, Andrew utilized 7 metrics from the Spotify API for each version of Taylor’s songs. So each version of a song can be treated as a point in space, where one axis might be acousticness, another danceability, and so on; the distance formula then measures how far apart the original song and Taylor’s Version sit. The beauty of this formula is that it can find the distance between two points in any number of dimensions. I definitely went down the rabbit hole on this one to learn more about this formula I originally learned in high school. Here’s an explanation of the distance formula in 3-D space (something we can comprehend visually):

But in this analysis, there are 7 metrics, which means each version of a song is a point in 7 dimensions. How do we even visualize that many dimensions? This explanation discusses how to think about plotting points beyond three dimensions. Math and linear algebra for the win!
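
Here’s a tiny sketch of that calculation in R with made-up feature values; the actual seven metrics Andrew pulled from the Spotify API may differ from the ones named below.

```r
# Each version of a song is a vector of audio features; similarity = Euclidean distance.
original <- c(danceability = 0.62, energy = 0.78, acousticness = 0.02, valence = 0.58,
              liveness = 0.12, speechiness = 0.04, instrumentalness = 0.00)
taylors_version <- c(danceability = 0.60, energy = 0.74, acousticness = 0.03, valence = 0.61,
                     liveness = 0.10, speechiness = 0.04, instrumentalness = 0.00)

# The N-dimensional generalization of the high school distance formula: sqrt(sum((x - y)^2))
sqrt(sum((original - taylors_version)^2))

# Same result using base R's dist()
dist(rbind(original, taylors_version), method = "euclidean")
```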

I asked Andrew what the next Taylor Swift analysis will be. He said once he sees enough people asking a question about Taylor Swift that can potentially be answered by data, he’ll start an exploratory analysis (most likely with the Spotify API).

Getting your big break in data analytics

Andrew’s #1 piece of advice for landing a job in data analytics or transitioning to a career in data is getting your master’s degree. We haven’t heard this advice too much on the podcast, but Andrew is a shining example of how a master’s degree in data can help, especially at a university like Georgia Tech where the cost is quite low relative to a traditional program.

Andrew also discussed the importance of knowing SQL as the key technical skill for a data analytics role. Who knew that a database query language from the 1970s would still be in high demand today?

Source: Medium / Çağatay Kılınç

The final piece of advice Andrew gave regarding skills you need for a career in data analytics is communication. Specifically, knowing how to communicate your analysis to a non-technical audience. At the beginning of his career at Buzzfeed, Andrew received feedback that his explanations were too technical. He realized that not everyone needed to know how the SQL query was constructed; people just cared about the trends and final results.

Other Podcasts & Blog Posts

No other podcasts or blog posts mentioned in this episode!

The post Dear Analyst #125: How to identify Taylor Swift’s most underrated songs using data with Andrew Firriolo appeared first on .

]]>
Dear Analyst #124: Navigating people, politics and analytics solutions at large companies with Alex Kolokolov https://www.thekeycuts.com/dear-analyst-124-navigating-people-politics-and-analytics-solutions-at-large-companies-with-alex-kolokolov/ https://www.thekeycuts.com/dear-analyst-124-navigating-people-politics-and-analytics-solutions-at-large-companies-with-alex-kolokolov/#respond Mon, 05 Feb 2024 06:16:00 +0000 https://www.thekeycuts.com/?p=54019 We sometimes forget that a large organization is composed of groups and divisions. Within these groups, there are teams and individuals looking to advance their careers. Sometimes at the expense of others. When your advancement depends on the success of your project, the benefits of that project to your company may be suspect and the […]

The post Dear Analyst #124: Navigating people, politics and analytics solutions at large companies with Alex Kolokolov appeared first on .

]]>
We sometimes forget that a large organization is composed of groups and divisions. Within these groups, there are teams and individuals looking to advance their careers. Sometimes at the expense of others. When your advancement depends on the success of your project, the benefits of that project to your company may be suspect and the tools you use to complete that project may not be the best tools for the job. Alex Kolokolov started his journey in data like many of us: in Excel. He has since moved on to Power BI, PowerPivot, and PowerQuery, and has been building data visualizations for the last 15 years. In this episode, he talks through consulting with a company as the analytics expert only to find out that the underlying forces at play were company politics. He also discusses strategies to make your line charts tell a better data story.

The state of analytics at companies in traditional industries

Alex consults with large companies in “traditional” industries like oil, gas, and mining companies. The state of analytics and knowledge of analytics is not equal in these companies, according to Alex. You’ll come across data science and AI groups at these companies who are, indeed, working on the cutting edge. But then when you approach other departments like HR or operations, they are still quite far from this digital transformation that everyone’s talking about.

Alex worked with a mining company where there are cameras that can ID employees using facial recognition when they walk through the door. But when you sit down with the folks who are actually doing the work at the plant, they are still humming along on Excel 2010. Excel 2010! What a time…

Source: dummies.com

In terms of creating dashboards, teams at these companies would ask their IT or tech team to create a report. But then the IT team comes back and says it will take three months to create the report given their current backlog. Hence, these companies outsource the analytics training, metrics collection, and dashboarding to people like Alex.

Internal battles for power and platforms

Alex once worked with a government institution that was building an internal SQL data warehouse before Power BI came on the scene. This specific project was driven by IT as a warehouse solution for the finance department. A few years later, the head of this SQL project became the CIO, but started getting pushback from the heads of the finance department. It turns out the finance department heads already had their own platform in mind and claimed Microsoft’s technology was outdated for their purposes (the finance team wanted to go with Tableau to build out pretty dashboards).

Source: reddit.com

The finance department proceeded to roll out their solution in Tableau, the CFO eventually became the Chief Digital Officer, and the CIO who was spearheading the SQL project was pushed out. The project wasn’t about Microsoft vs. Tableau at all. It was all about who was better at playing the game of internal politics and fighting for the resources to get your project across the line.

When digital transformation is 10 years too late

Large companies Alex has worked with claimed they went through “digital transformation,” but this was back in 2012. When Alex started working with these companies over the last few years, he found that individuals were still using SAP and Excel 2010. It’s as if the digital transformation didn’t go past 2012, and whatever tools were brought in at the time were meant to carry the organization for another 20 years. We’ve all seen this story. Large companies and enterprises move slowly, and while digital transformation sounds nice and warm, execution is where organizations lose their way.

Source: marketoonist.com

In my own experience, teaching someone an Excel keyboard shortcut that saves them X number of hours per week of manual work is a pretty awesome feeling. It’s a visceral feeling of knowing you are having a direct impact on the person’s productivity. Alex has done the same thing at these large companies, which, at the heart of it, is explaining somewhat “technical” concepts in an approachable way. If there’s one lesson Alex has learned over the years from helping people stand up dashboards, the one piece of advice he always gives is: don’t insert new columns. Adding new columns may ruin the way the data is laid out (if it’s a time series) or affect the look and feel of a dashboard.

When your line charts look like spaghetti

Alex published a blog post late last year called When Charts Look Like Spaghetti, Try These Saucy Solutions where he provides different strategies for “untangling” your messy line charts. The goal is to have your audience walk away with the key message from the line chart. For instance, if you have a line chart where it’s hard to detect trends (given the number of series on the chart), you can selectively highlight one line (and gray out the rest) to make a point:

Source: nightingaledvs.com

Another option is to simply break out each line into its own mini chart:

Source: nightingaledvs.com
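
Here’s a quick, purely illustrative ggplot2 sketch of both ideas with made-up data. It’s not Alex’s code, just the general pattern: plot every series in a muted gray, then re-draw the one line that carries the story, or split the series into small multiples with facet_wrap().

```r
# Illustrative spaghetti-chart fixes on a made-up dataset of six teams over ten years.
library(ggplot2)
library(dplyr)

set.seed(1)
trend <- expand.grid(year = 2015:2024, team = LETTERS[1:6]) |>
  group_by(team) |>
  mutate(score = 50 + cumsum(rnorm(n()))) |>
  ungroup()

# (1) Mute everything, then highlight the one series that matters
ggplot(trend, aes(year, score, group = team)) +
  geom_line(colour = "grey80") +
  geom_line(data = filter(trend, team == "C"), colour = "#d95f02", linewidth = 1.2) +
  labs(title = "Highlight the series that carries the story", x = NULL, y = "Score") +
  theme_minimal()

# (2) Or give each series its own mini chart
ggplot(trend, aes(year, score)) +
  geom_line(colour = "#1b9e77") +
  facet_wrap(~ team) +
  theme_minimal()
```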

The one skill Alex believes analysts and dashboard creators should learn is compromise. If your visualization is overloaded with elements and colors and your target audience says they want to see all the data, you’ll have to find ways to give them what they need and highlight the story. Imagine this scenario:

An analyst is tasked by their superiors with plotting more data on a chart, so the analyst goes off and makes more charts. Eventually, the analyst realizes there are too many charts and decides to turn the dashboard into an interactive one with filters and Slicers. This allows the target audience to manipulate the data however they see fit. But does the target audience even know how to use the filters in the first place? Do they know it’s something they are supposed to interact with? There’s a mismatch between what the analyst wants the dashboard to do and what the target audience expects (consume vs. interact).

Other Podcasts & Blog Posts

No other podcasts or blog posts mentioned in this episode!

The post Dear Analyst #124: Navigating people, politics and analytics solutions at large companies with Alex Kolokolov appeared first on .

]]>
Dear Analyst #123: Telling data stories about rugby and the NBA with Ben Wylie https://www.thekeycuts.com/dear-analyst-123-telling-data-stories-about-rugby-and-the-nba-with-ben-wylie/ https://www.thekeycuts.com/dear-analyst-123-telling-data-stories-about-rugby-and-the-nba-with-ben-wylie/#respond Mon, 15 Jan 2024 06:55:00 +0000 https://www.thekeycuts.com/?p=54026 When you think of data journalism, you might think of The New York Times’ nifty data visualizations and the Times’ embrace of data literacy for all their journalists. Outside of The New York Times, I haven’t met anyone who does data journalism and data storytelling full-time until I spoke with Ben Wylie. Ben is the […]

The post Dear Analyst #123: Telling data stories about rugby and the NBA with Ben Wylie appeared first on .

]]>
When you think of data journalism, you might think of The New York Times’ nifty data visualizations and the Times’ embrace of data literacy for all their journalists. Outside of The New York Times, I haven’t met anyone who does data journalism and data storytelling full-time until I spoke with Ben Wylie. Ben is the lead financial journalist at a financial publication in London. Like many data analysts, he cut his teeth in Excel, got his equivalent of a CPA in the UK, and received his master’s degree in journalism. In this episode, we discuss how his side passion (sports analytics) led him to pursue a career in data journalism and how he approaches building sports data visualizations.

Playing with rugby data on lunch breaks

When Ben worked for an accounting firm, he would pull rugby data during his lunch breaks and just analyze it for fun. One might say this kicked off Ben’s passion for data storytelling, because he started a blog called The Chase Rugby to share his findings. The blog was a labor of love, and through the end of 2019 he focused only on rugby. After building an audience, he realized data journalism could be a promising career path, so he did some freelance sports journalism at the end of his master’s course. At the end of 2022, he started Plot the Ball (still a side project) where the tagline is “Using data to tell better stories about sport.”

Learning new data skills from writing a newsletter

Ben spoke about how writing Plot the Ball forced him to learn new tools and techniques for cleaning and visualizing data. All the visualizations on the blog are done in R. A specific R package Ben uses to scrape data from websites is rvest. Through the blog, Ben learned how to scrape, import, and clean data before he even started doing any data visualizations. The sports data all came from Wikipedia.
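
The general rvest pattern looks something like the sketch below. The Wikipedia page here is just an arbitrary example with a standard “wikitable,” not necessarily one of Ben’s sources, and real tables usually need some cleanup after html_table().

```r
# Read a page, grab the first wikitable, and turn it into a data frame.
library(rvest)
library(dplyr)

page <- read_html("https://en.wikipedia.org/wiki/List_of_NBA_champions")

champions <- page |>
  html_element("table.wikitable") |>  # first table with the "wikitable" class
  html_table()

glimpse(champions)
```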

I’ve spoken before about how the best way to show an employer you want a job in analytics is to create a portfolio of your data explorations. Nothing is better than starting a blog where you can just showcase stuff you’re interested in.

How the NBA became a global sport

One of my favorite posts from Plot the Ball is this post entitled Wide net. It’s a short post but the visualization tells a captivating story on how the NBA became global over the last 30 years. Here’s the main visualization from the post:

Source: Plot the Ball

Ben first published a post about NBA phenom Victor Wembanyama in June 2023 (see the post for another great visualization). Ben talks about this post being a good data exercise because there is no good NBA data in tabular form. This “waffle” chart was Ben’s preferred visualization since it allows you to better see the change in the subgroups. A stacked bar chart would’ve been fine as well, but since each “row” of data represents a roster of 15 players, the individual squares better convey the team composition each year.
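
For a sense of how simple the waffle form is to build, here’s a bare-bones illustration using geom_tile() with made-up roster data; the real chart covers every roster over 30 years, while this just shows the shape of the idea.

```r
# One made-up 15-player roster per season; "origin" is the subgroup being tracked.
library(ggplot2)
library(dplyr)

set.seed(2)
roster <- expand.grid(season = c("1994", "2004", "2014", "2024"), slot = 1:15) |>
  mutate(origin = sample(c("US-born", "International"), n(), replace = TRUE,
                         prob = c(0.75, 0.25)))

ggplot(roster, aes(x = slot, y = season, fill = origin)) +
  geom_tile(colour = "white", linewidth = 0.8) +
  coord_equal() +
  scale_fill_manual(values = c("International" = "#d95f02", "US-born" = "grey75")) +
  labs(title = "Each row is a 15-player roster; each square is one player",
       x = NULL, y = NULL, fill = NULL) +
  theme_minimal()
```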

Home Nations closing the gap with Tri Nations in rugby

Ben talked about another popular post from his blog entitled Heading South. The post started as a data exploration exercise where Ben was simply trying to find trends instead of telling a story. For some background, rugby has traditionally been dominated by a few teams (e.g. Australia, New Zealand, and South Africa). The most recent final was between New Zealand and South Africa, and these two teams have won the majority of World Cups.

Ben was interested in seeing how these elite teams and other teams were trending over time. Ireland and France have started doing well over the last few years, but there was no bird’s-eye view of how these teams were performing as a whole. So Ben decided to create this visualization:

Source: Plot the Ball

Cognitive overload is a concept many data visualization professionals care about. When a visualization has more information than an individual has the mental capacity to process, the message and story get lost. A few features of the visualization above ease the path to understanding the story:

  1. Gridline color is muted
  2. Data labels only show up at the end of the line charts
  3. The colors of the lines match the series name in the title of the chart

If it’s not clear what the trend is, the main header of the chart even tells you the key takeaway from the chart.

Other Podcasts & Blog Posts

No other podcasts or blog posts mentioned in this episode!

The post Dear Analyst #123: Telling data stories about rugby and the NBA with Ben Wylie appeared first on .

]]>