Dear Analyst #101: How to invest in modern data startups with David Yakobovitch

Outside of Excel, you’ve seen and heard about multiple data platforms on this newsletter and podcast. Everything from commercial data platforms to open-source platforms driven by communities. In this episode, you’ll hear the other side of the data platform ecosystem. David Yakobovitch is a general partner at DataPower Ventures, a venture capital firm that invests in early-stage data science, applied AI, and machine learning startups. I don’t normally hear or read about the investor’s perspective in the data space, so this episode was quite the learning opportunity. You’ll also hear about some of the data startups David’s firm has invested in and what their unique value propositions are.

Mainframes, Tableau dashboards, and the modern data stack

David started his career in actuarial science and finance information systems. He originally worked at Aflac on their mainframes. At the time, the “modern” data stack included tools like Qlik, SAP Crystal Reports, and of course, SQL. David eventually moved to Tableau and was building dashboards for his team. After stints in the banking world at Citi and Deutsche, David moved to NYC and started working a lot with Python and R. He was a lead data science instructor at General Assembly and eventually landed at Galvanize as the data science team lead. I’ve been an instructor and have gone to events for both GA and Galvanize and encourage you to check out both organizations if you would like to up-level your data skills.

Galvanize co-working space. Source: ERIC LAIGNEL

David currently works full-time at SingleStore as a senior manager of technical enablement. Saying he works “full-time” for SingleStore is not an accurate characterization of what David does day-to-day since he wears many hats. He also runs a venture capital fund called DataPower Ventures and hosts an artificial intelligence podcast called HumAIn. As a believer in side hustles, I think David shows no side hustle is too small or big to take on!

Evolution of the modern data stack

David shared his perspective on the modern data stack, and the key takeaway is (surprise surprise) Excel is not going anywhere. Old and new platforms still have integrations with Excel. David rattled off a few including Refinitiv, QuickBooks, and Bloomberg. As an analyst, you have so many tools in the data stack that allow you to work with data. With ETL or ELT, you can import/export your data tables and schemas into another tool (like Excel) to do the actual analysis. This is where tools like Fivetran and dbt really shine to help you get your data into the right destination. The data can be in a low-code tool where you drag-and-drop tables and schemas or even in a Jupyter notebook. Once the analysis is done, you have visualization tools like Power BI and Looker to help you communicate your findings.

The above modern data stack diagram comes from David Jayatillake’s Substack newsletter. To hear about other tools in the data stack, I’d recommend listening to David’s episode or this episode with Priyanka Somrah from Work-Bench.
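If ETL still feels abstract, here is a minimal Python sketch of the extract, transform, and load steps mentioned earlier. It is only an illustration under my own assumptions: the `sales` table, its columns, and the sample rows are made up, an in-memory SQLite database stands in for a real source system, and a CSV (which Excel can open) stands in for the destination. Tools like Fivetran and dbt do far more than this, and nothing here reflects how they work internally.

```python
import csv
import io
import sqlite3


def extract(conn):
    # Extract: pull raw rows out of the source system.
    return conn.execute("SELECT region, amount FROM sales").fetchall()


def transform(rows):
    # Transform: total the amounts per region, sorted by region name.
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return sorted(totals.items())


def load(rows, out):
    # Load: write a header and the summarized rows as CSV.
    writer = csv.writer(out)
    writer.writerow(["region", "total_amount"])
    writer.writerows(rows)


# Hypothetical source data, created in memory for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 100), ("West", 250), ("East", 50)],
)

buffer = io.StringIO()
load(transform(extract(conn)), buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

In an ELT setup the order flips: you would load the raw rows into the destination first and run the transformation there, but the three steps are the same.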

David also brought up an interesting observation about how data analysts are viewed at different companies. For instance, McKinsey typically views data analysts as strategists who help solve customer problems. At Lyft, data analysts are treated more like data scientists where you’re scripting and building automations in your data workflows. What is the definition of a data analyst at your company?

Becoming a VC investor

When David first moved to NYC 7 years ago, he attended Founder Fridays. Founder Fridays is a meetup where a founder of a company has an open chat with the Founder Fridays community of other founders. I attended these meetups a few years ago and it’s refreshing to hear candid stories from founders about the ups and downs of running a startup. A lot of startup meetups just focus on how a founder is crushing it without giving air time to the parts of running a startup that suck.

David was meeting founders through Founder Fridays and Techstars and had the technical skills to help these founders on the data and tech side. The next logical step was to start coaching and investing in these startups. The average check size at DataPower Ventures is $250K and the fund helps bring startups from the accelerator phase to their Series A. DataPower also helps its portfolio companies with scaling data pipelines, hiring, and basically whatever is required to help the startups succeed. The portfolio consists of 30 companies in the AI and ML stack that are mostly based in NYC. We then chatted a bit about some of the companies in the portfolio David is excited about.

OpenAxis

OpenAxis is a no-code tool for building data visualizations. The problem data analysts face is that visualizations are challenging to create when you’re building from scratch. This is typically the case when you’re building visualizations for your team in Tableau and Looker. What’s neat is the community that OpenAxis is building. The community can submit visualizations to the platform, so if you need a template to build something great, you can find something pre-built. They’ve started seeing some Substack writers include their visualizations in their newsletters.

Nomad Data

Nomad Data anonymously connects buyers and sellers of datasets. They typically work with quant funds and finance shops who are looking to get an edge in the market through unique datasets. Nomad Data’s value proposition is for use cases where data is “sparse.” Let’s say you need a dataset on telecommunication providers and the dataset has 500 columns with 5 billion rows. Not every cell of this table will have data in it, which means you have incomplete and bad data. Through AI, machine learning, and human recommendations, your table will get filled in with high-quality data.

I’m not sure why, but this reminded me of that episode from season 2 of Billions where Axelrod and Taylor are trying to figure out which microchip company Krakow is investing in. Taylor figures out that the Chinese microchip company is faking trucking activity into and out of their warehouse. That activity is captured by satellite images which hedge funds analyze to see how business is going for the company. Without high-quality data, funds will make bad investment decisions even in this fictional example :).
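Going back to the idea of a “sparse” dataset, here is a rough Python sketch of how you might measure how many cells in a table are empty before deciding whether it needs filling in. The sample rows and column names are hypothetical, and this only counts the gaps. It says nothing about how Nomad Data actually fills them.

```python
# Hypothetical sample table; None marks a cell with no data.
rows = [
    {"provider": "A", "subscribers": 1200, "region": None},
    {"provider": "B", "subscribers": None, "region": "NE"},
    {"provider": "C", "subscribers": None, "region": None},
]


def missing_share(rows):
    # Fraction of all cells in the table that are empty.
    total = sum(len(row) for row in rows)
    missing = sum(1 for row in rows for value in row.values() if value is None)
    return missing / total if total else 0.0


def missing_by_column(rows):
    # Count of empty cells for each column.
    counts = {}
    for row in rows:
        for column, value in row.items():
            counts[column] = counts.get(column, 0) + (1 if value is None else 0)
    return counts


print(f"{missing_share(rows):.0%} of cells are empty")
print(missing_by_column(rows))
```

The per-column counts are usually the more useful view, since they show which fields are dragging down the quality of the dataset.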

What makes a good data startup?

We concluded the episode with what David and his team look for in data startups. Here are a few of the criteria:

At least one of the founders should be technical
The product must be commercially viable (i.e. it has to make money)
They don’t just invest in an algorithm
Tech has to have some real-world application
Founders are relentlessly curious

One could argue that most of these bullet points are what VC funds look for in any startup. I think the big difference is that the data industry is growing faster than other industries and there is a lot of crossover with other industries like the cloud.

Speaking of the cloud, David mentions that data startups should be cloud-first. Since customers are already on AWS, Azure, or GCP, you don’t want to force the customer to move off of their data stack. David believes that in the previous 20 years, people were building software, infrastructure, and developer tool companies. The next 20 years are all about building tools and technology for the data stack. In this new world where data is everywhere like The Minority Report, all that data will have to be stored and analyzed somewhere. And I guarantee in that world, Excel and Google Sheets will still be around.