Data warehouses have come a long way since the days of Oracle and MicroStrategy. A data warehouse should be able to grow and expand with the business it supports. At Canva, the amount of data coming into the data warehouse has exploded in the last few years given the platform’s surging usage. In this episode, Krishna Naidu, a data engineer at Canva, talks about how his team is making it easier for analysts to get the data they need and the tooling to analyze it. The goal of the data warehouse team at Canva is to maintain reliability, improve access and tooling, and oversee compliance with regulations. At the end of the episode, we also discuss our mutual love for keyboard shortcuts.
A design tool for the rest of us
I never considered myself a “good” designer or artist. I still feel lost sometimes in Photoshop and Figma, but Canva makes the design process super seamless for a newbie like myself. I use the tool at least once a week for a variety of use cases. All the thumbnails on my YouTube channel were created on Canva because I can create a decent design in five minutes or less.
Krishna didn’t use Canva before he joined the company. His first foray into Canva was creating a birthday invitation for his daughter. But he quickly saw the power and potential of Canva after watching his family members use the tool and a family friend create marketing brochures with it. Once Krishna joined Canva, the scope of the mission became clear to him. It’s not just about making design easy, but also giving people the ability to get their designs seen by the right people. Like many other SaaS tools, Canva has also added more collaboration features as more teams become distributed.
Structure of Canva’s data team
Given Canva’s size (1,500+ employees according to LinkedIn), the data team is quite mature relative to other SaaS companies. They have data analysts, scientists, and engineers.
The data engineering team (where Krishna works) is broken out into three sub-teams:
- Streaming – Internally this team is known as Canvalytics and they focus on capturing all the clickstream data from the product. This team helps Krishna’s team with getting data into the data warehouse.
- Platforms – They manage the data lake and tooling for data scientists
- Data Warehouse – This is Krishna’s team, and they provide tooling for the users of the data warehouse. They also enforce controls and governance of the data warehouse, and their primary business stakeholders are Canva’s data analysts.
The data coming into the data warehouse is constantly growing, which is a good sign because it means the number of Canva users is growing. On top of that, new product features being added to the platform mean more clickstream data needs to be captured and transformed in the data warehouse. To keep up with the expanding data footprint, Krishna’s team has architected some interesting solutions.
A sandboxed build environment for analysts
When the data team was smaller, it was easier for all analysts to work in the same data warehouse environment. If an analyst made a change to a dataset, then they might work with the data engineering team to roll the change out and that change would be communicated out to the rest of the analysts.
With more analysts, it becomes easier to step on each other’s toes since one analyst might make a change on one dataset (where they are building and testing their models), while another analyst might be doing a separate analysis on that same dataset. Before you know it, collisions occur and the “source of truth” gets lost as the data team tries to figure out which changes need to be applied to all the datasets.
Krishna and team created mini build environments for analysts so that each analyst has their own sandboxed view of the data to experiment with. If an analyst needs to make a change to a dataset, they submit a pull request to the dev environment, and this goes through a bunch of CI/CD checks set up by the data engineering team. This is pretty similar to the software development process (more on this later). Almost 30 analysts will be able to use these new build environments. In a nutshell, these build environments re-create all the schemas and views and clone tables from the data warehouse so that analysts get a quick copy of all the main fact tables.
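To make the idea concrete, here is a rough sketch of how a per-analyst sandbox could be scripted. This is not Canva’s actual tooling: the function name, the SANDBOX_ prefix, and every database, schema, and table name are made up for illustration. The only real piece is Snowflake’s CREATE ... CLONE syntax, which the function assembles as plain SQL strings.

```python
import re

# Only allow simple, unquoted identifiers so names can be safely placed in SQL.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def sandbox_statements(analyst: str, source_db: str, schema: str, tables: list) -> list:
    """Build the SQL needed to give one analyst a cloned copy of some tables.

    All names here are hypothetical; nothing is executed, the SQL is only returned.
    """
    for name in (analyst, source_db, schema, *tables):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")

    sandbox_db = f"SANDBOX_{analyst.upper()}"
    statements = [
        f"CREATE DATABASE IF NOT EXISTS {sandbox_db}",
        f"CREATE SCHEMA IF NOT EXISTS {sandbox_db}.{schema}",
    ]
    for table in tables:
        # CLONE copies metadata only, so this is fast even for large fact tables.
        statements.append(
            f"CREATE OR REPLACE TABLE {sandbox_db}.{schema}.{table} "
            f"CLONE {source_db}.{schema}.{table}"
        )
    return statements


if __name__ == "__main__":
    for stmt in sandbox_statements("alex", "PROD", "ANALYTICS", ["FACT_DESIGNS", "FACT_EVENTS"]):
        print(stmt + ";")
```

Returning the statements instead of running them keeps the sketch testable without a warehouse connection; a CI job could print them for review or hand them to whatever Snowflake client the team already uses.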
Inspiration from GitLab
The inspiration for this project came from a company with a fully distributed team: GitLab. When your entire team is distributed, it’s even more important that any changes you make to the codebase are properly tested and communicated to all your colleagues who are working on the same codebase.
The secret to success: efficient cloning
The Canva data engineering team makes use of Snowflake’s “cloning” feature. As I mentioned above, the build environment makes a quick copy of the tables in the data warehouse without the expensive operations normally associated with copying tables. It’s done entirely in the cloud.
Snowflake and other modern data warehousing platforms are revolutionizing the way analysts can access the large amounts of data being produced within their organizations.
Historically, a data warehouse would slow down if a lot of users were using the warehouse concurrently or if a big batch process was taking place. Snowflake separates where the data is stored from where it is processed (with different compute resources for load and transform). “Cloning” your dataset means creating a pointer to the dataset rather than a physical copy. As changes are made to the clone, only the diffs are stored (just like you would when committing code changes to a repo).
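If the “pointer plus diffs” idea feels abstract, here is a toy model of it in Python. This is a simplified illustration of copy-on-write and not how Snowflake is actually implemented; the Table class and the partition names are invented for the example.

```python
class Table:
    """Toy table: a mapping from partition name to an immutable block of rows."""

    def __init__(self, partitions: dict):
        # Copy the mapping (the "pointers"), not the row data it points to.
        self.partitions = dict(partitions)

    def clone(self) -> "Table":
        # A clone starts out pointing at exactly the same partition objects.
        return Table(self.partitions)

    def write(self, name: str, rows) -> None:
        # Writing replaces one pointer with a new block; other tables
        # that still point at the old block are unaffected.
        self.partitions[name] = tuple(rows)


if __name__ == "__main__":
    prod = Table({"p1": (1, 2), "p2": (3, 4)})
    sandbox = prod.clone()

    # Right after cloning, both tables share the same underlying data.
    print(sandbox.partitions["p1"] is prod.partitions["p1"])

    # An analyst changes one partition in the sandbox only.
    sandbox.write("p2", [30, 40])
    print(prod.partitions["p2"], sandbox.partitions["p2"])
```

The clone costs almost nothing up front because only the mapping is duplicated, and storage grows only with the partitions that actually change.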
Reducing headaches and increasing confidence in the warehouse
It remains to be seen how much time shifting to Snowflake and these new build environments will save Canva. The most important benefit of this new architecture is that the data pipeline has become more reliable.
Krishna mentioned that the amount of time analysts might be spending in the data warehouse might increase because they have to run their own tests now on changes they make to datasets. The bigger picture, however, is that analysts and the data engineers that support them don’t have to worry about explaining why a given report is broken (because something in the pipeline broke). We talked about how you might be preparing for a board meeting and the last thing you want to be faced with is a report that won’t update because something in the pipeline broke.
You can’t underrate peace of mind and confidence in your data warehouse performing as it should :).
Crossing over from analytics to engineering
As Krishna explained the new build environments for analysts, it became clear that the skills analysts will need resemble those of software engineers. Data “development” at Canva is starting to look similar to application development. In addition to core analysis and reporting responsibilities, analysts will need to know how to write proper documentation, write and execute tests, submit pull requests, and do peer reviews. These are all practices common in software engineering, not data analytics.
We’ve seen the blend of data analysis and data engineering in previous episodes (see episode 55 and the FirstMark conversation about what the definition of a data analyst is). The fine folks at dbt coined the phrase “analytics engineering” which encompasses a lot of the skills Canva analysts have:
Analytics engineers provide clean data sets to end users, modeling data in a way that empowers end users to answer their own questions. While a data analyst spends their time analyzing data, an analytics engineer spends their time transforming, testing, deploying, and documenting data. Analytics engineers apply software engineering best practices like version control and continuous integration to the analytics code base. Source: dbt blog
Do new analysts need all these “engineering” skills to succeed as an analyst at Canva? Krishna says no:
We care more about the analyst’s creativity and skills in seeking answers from the data.
During the Canva onboarding process, analysts get training on how to do things like submit pull requests and run tests. These skills can be taught. What’s harder to teach is the curiosity one needs to dig into the data and the creativity to tell a data-driven story.
What’s next for the Canva data eng team?
According to Krishna, the data warehouse team should never settle. Krishna believes the team should continue to focus on increasing productivity for analysts.
Other bottlenecks in the pipeline might include getting access to data (e.g. PII data). The team may also want to know what’s coming into the data warehouse so this means getting observability stats on the data coming in. Perhaps the team wants to let analysts know the data might be 3 days old. Then comes all the automation and testing of these notifications so that the rest of the organization is made aware of these “health” metrics. Sounds like the data eng team is going through their own version of continuous development and improvement :).
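A freshness check like the “3 days old” example could be sketched along these lines. This is only an illustration under my own assumptions: the table names, the one-day threshold, and the idea that you already have a last-loaded timestamp per table are all hypothetical and not something Krishna described.

```python
from datetime import datetime, timedelta, timezone


def freshness_report(last_loaded: dict, now: datetime, max_age: timedelta = timedelta(days=1)) -> dict:
    """Flag tables whose most recent load is older than max_age.

    last_loaded maps a (hypothetical) table name to its last load time.
    """
    report = {}
    for table, loaded_at in last_loaded.items():
        age = now - loaded_at
        report[table] = {"age_days": age.days, "stale": age > max_age}
    return report


if __name__ == "__main__":
    now = datetime(2021, 6, 10, 12, 0, tzinfo=timezone.utc)
    loads = {
        "FACT_EVENTS": now - timedelta(hours=2),
        "FACT_DESIGNS": now - timedelta(days=3),
    }
    for table, status in freshness_report(loads, now).items():
        print(table, status)
```

Passing `now` in explicitly makes the check easy to test, and the resulting report is the kind of “health” metric that could feed a notification to analysts.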
Another interesting project the team may focus on in the future is extending the data warehouse beyond internal reporting purposes. What if you could surface interesting insights back to actual Canva users? What are the access requirements in this case? As a Canva user myself, I think it would be super interesting to see how my designs are being used and viewed by others.
Productivity hack: keyboard shortcuts
If you’ve followed my blog for some time, you’ve probably seen that I’m a huge fan of keyboard shortcuts (particularly in Excel). In fact, I created a whole class on this subject.
Krishna and I spoke about a podcast I recently listened to about Vim (see the “Other Podcasts & Blog Posts” section below). In the episode, Alex Smith, a software engineer at DEV, talks about how he first learned Excel keyboard shortcuts while working in finance. Then he transitioned to a job in software engineering, and saw how fast his colleagues were at using the keyboard to navigate Vim.
Krishna also uses a few keyboard shortcuts to be productive. He spoke about two in detail:
Other Podcasts & Blog Posts
In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:
- DevDiscuss Season #3 Episode #3: Is Vim Worth Your Time