Data and Analytics: Where VCs invest, Trends and Questions through +150 VC transactions across 2017–2020. Apparent simplicity, underlying complexity?

Emmanuel Cassimatis
Dec 18, 2020

In short: this piece is about the apparent simplicity and underlying complexity of data. Below you will find a visualization of the various components that work together to deliver data, data management, AI and analytics; an analysis and short history of data through +150 VC transactions mapped from 2017–2020; and a summary of some trends that may continue shaping data in the coming years.

Specialist views and inputs were gathered for this piece, and special thanks go to the select CDOs who were interviewed and the renowned VCs who provided input, among others: Anders Ranum at Sapphire Ventures; Alex Ferrara at Bessemer Ventures; Asheque Shams at General Atlantic; Ari Helgason at Index Ventures; Will Sheldon at Accel; Suranga Chandratillake at Balderton; Judith Dada at La Famiglia; Jan-Hendrik Buerk at BtoV; Andre Retterath at Earlybird; and others not named.

Every once in a while, comments made by key decision makers will work their way into long-lasting reflection and analysis. The CEO of a large multinational company recently told me, in an amused and troubled way: “Data, all we hear about is data. It seems all that businesses can do now is try to tackle this new beast…”

It seems indeed that data is everywhere; our lives revolve around it, bathe in it. It is a strange subject which, as one explores it, reveals itself in never-ending patterns and implications, almost like a fractal picture. Indeed, a simple interaction, a simple email, generates a data flow — sure. But it also generates a trail on the data exchange, which itself gets categorized and recorded. And the insights generated on the recording of this data trail may themselves be rearranged and generate new data. Data generates data and metadata (data describing the data), which itself is categorized, generating more metadata, data and insights. Apparent simplicity and underlying complexity? I asked myself how the world had gotten where it is with data, and where this was going. It seemed one would need to:

  • First, visualize the various components of data and how they work together
  • Then, go through a brief history of data — for instance an analysis of relevant VC transactions from 2017–2020
  • And finally isolate some of the underlying trends that may continue shaping data and data management
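As an aside, the metadata cascade described earlier (data generating data about data, which itself gets categorized) can be sketched in a few lines of Python. The record fields and the rule name below are purely hypothetical, for illustration only:

```python
# A single email: the "data" itself (hypothetical fields).
email = {
    "from": "alice@example.com",
    "to": "bob@example.com",
    "body": "Q3 numbers attached",
}

# First-order metadata: data describing the exchange, not its content.
metadata = {
    "sender_domain": email["from"].split("@")[1],
    "length": len(email["body"]),
    "category": "finance",  # assigned by some (hypothetical) categorizer
}

# Second-order metadata: data describing how the metadata itself was produced.
meta_metadata = {
    "categorizer": "keyword-rule-v1",  # hypothetical rule name
    "fields_tracked": sorted(metadata.keys()),
}

print(meta_metadata["fields_tracked"])
# prints: ['category', 'length', 'sender_domain']
```

Each layer is itself data that can be stored, categorized and analyzed — which is precisely the fractal quality described above.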

Data matters. How do the components of data work together? From extraction to delivery: apparent simplicity, underlying complexity?

(A short introduction / please skip to the schematics if needed): some years ago, the phrase ‘data is the new oil’ was coined. At the time, the phrase most probably referred to the economic value that lies in both oil and data. But the analogy can go further.

  • Data production and consumption are booming. A few amazing statistics: every day, 500M tweets are sent, close to 300Bn emails are exchanged, 5Bn searches are made and 2.5 quintillion bytes of data are produced (1 quintillion has 18 zeros!)
  • Data has geopolitical importance. Where the pipelines run, how the underlying is transformed, where the output goes, how it is exchanged, etc. — all that matters to countries, governments, individuals, societies
  • Data comes from different sources and with different flavors. It is heterogeneous, with content variations
  • Data requires complex transportation. From data connectivity to data pipelining, to data streaming, etc.
  • Data needs transformation. Once extracted, it needs to be cleansed, harmonized, refined, etc. Some of this requires complex engineering (AI, etc.)
  • Data is consumed/analyzed in different ways. Some users need to monitor, track and keep making decisions. Oil too is consumed in different ways, from transportation to raw material in chemical industries.
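To put the daily-production statistic in perspective, a quick back-of-the-envelope conversion (a quintillion is 10^18, so one quintillion bytes is one exabyte; the ~7.8Bn world population figure for 2020 is an approximation):

```python
# 2.5 quintillion bytes produced per day (1 quintillion = 10**18).
bytes_per_day = 2.5 * 10**18

exabytes_per_day = bytes_per_day / 10**18      # 2.5 EB per day
petabytes_per_day = bytes_per_day / 10**15     # 2,500 PB per day

# Rough per-capita figure, assuming ~7.8Bn people in 2020.
gb_per_person = bytes_per_day / (7.8 * 10**9) / 10**9

print(exabytes_per_day, petabytes_per_day, round(gb_per_person, 2))
# prints: 2.5 2500.0 0.32
```

In other words, roughly a third of a gigabyte produced per person, every single day.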

All this makes for a very interesting underlying landscape. In short — data matters. It matters, and the whole organization from input to output, from extraction to delivery, has become quite complex.

An interesting observation came in a recent exchange with a CDO of a large retail company customer of SAP: “while data may seem like a continuous homogenous entity, actually it is the result of many components working together. And while some companies attempt to simplify some of these components to help manage the data lifecycle, the overall picture remains quite complex. The job of the data engineer has become, at the same time, easier and harder”.

And indeed, several components in the data journey are at play. Below is a visualization to illustrate how some of these components work together. Side note: yes, of course SAP covers the whole data journey through different products, from SAP HANA and Data Services to Data Warehouse Cloud, SAP Data Intelligence, SAP Analytics Cloud, etc.

The data journey and the components working together to deliver data

Note: there may be differing views on how some of these components pan out on the picture, or that some components may span various categories. Of course, the picture can be streamlined for individual companies or spaces.

In short:

  • Inputs: data comes in various shapes and flavors, structured and unstructured and a first challenge is to make sense of it all. Technologies can now cover most data sources and it is increasingly possible to find structure everywhere, including in unstructured data, e.g. labelling and categorizing pictures, automatically sorting key moments in a movie, etc.
  • Data Management: then comes data management, from data ingestion and connectivity, to workflow management, pipelining to redirect appropriately, preparation, connection with real time data, transformation, etc. Metadata management remains key to ensure right categorization and content management.
  • Data storage and processing: once extracted and transformed, the data needs to be stored. Due to its high importance, storage has needed to evolve too. Datalakes and data warehouses used to compete on different territories and may now be converging in the way they treat and access data. At this stage, AI intelligence and collaboration are already taking place, with specific needs around storage and usage of data. Of particular importance nowadays is the use case behind a particular dataset, which governs storage too: analytics? Insights through AI? Real-time analysis? Etc.
  • AI intelligence: intelligence is then generated, potentially on the back of algorithms sweeping through the data and partially automated insights: once stored, the data is analyzed and insights are derived.
  • Output and Analytics: finally, data is turned into insights, either in a static (dashboard) or augmented manner (automated generation of insights, or in an augmented delivery), for current or future, predictive, analytics.
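The five stages above can be sketched as a minimal pipeline. Every function below is an illustrative placeholder, not any particular product's API:

```python
# A minimal sketch of the data journey:
# inputs -> data management -> storage -> AI intelligence -> output.

def ingest(sources):
    """Inputs: collect records of various shapes from various sources."""
    return [record for source in sources for record in source]

def transform(records):
    """Data management: cleanse and harmonize, attaching simple metadata."""
    return [{"value": r.strip().lower(), "length": len(r)} for r in records]

def store(records, warehouse):
    """Storage: persist the transformed records (a dict stands in for a warehouse)."""
    warehouse["records"] = records
    return warehouse

def analyze(warehouse):
    """AI intelligence: derive a (deliberately trivial) insight from stored data."""
    lengths = [r["length"] for r in warehouse["records"]]
    return {"count": len(lengths), "avg_length": sum(lengths) / len(lengths)}

def report(insight):
    """Output and analytics: deliver the insight, e.g. to a dashboard."""
    return f"{insight['count']} records, avg length {insight['avg_length']:.1f}"

sources = [["  Alpha", "beta "], ["GAMMA"]]
print(report(analyze(store(transform(ingest(sources)), {}))))
# prints: 3 records, avg length 5.7
```

In a real deployment each function is an entire product category — connectivity, pipelining, warehousing, ML platforms, BI tools — which is exactly the "apparent simplicity, underlying complexity" point.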

Now, for greater perspective on the trends, it is useful to have a look at a brief history of time in Data and Analytics through some of the main relevant VC transactions across 2017–2020. How has the world gotten here, where is it going?

A brief history of time in Data and Analytics through +150 VC transactions across 2017–2020

Below is the output of an analysis of some +150 large and representative transactions in the D&A space (Data, Data management and Analytics). Transactions in private companies above $10M were considered relevant. A few select M&A transactions are mentioned, but for information purposes only; the study otherwise excluded M&A and listed companies. Obviously, this is not a representation of the whole universe of relevant transactions, but it is deemed a good representation and overview of the landscape, to visualize some of the shifts and trends that have taken place across 2017–2020:

A brief history of time in Data and Analytics, through +150 representative VC large transactions, 2017–2020

Note: there may be differing views on where some of the companies fit. Also, in some cases companies obviously span several categories: for instance Dremio spans several, and BigID could too. But the categorization was deemed broadly accurate by most VCs.

Some trends that one may visualize: convergence of data platforms, simplification of complexity and the increasing importance of collaboration.

Several trends can be seen changing the landscape, adding improvements or simplification, and were highlighted by the VCs interviewed:

  • Convergence of data platforms. Some of the top VCs are wondering whether the data platforms space is undergoing some convergence. Unified cloud data platforms keep rising, the likes of Snowflake or Databricks providing data warehousing or datalake operations, and some now wonder whether data warehouses and datalakes may be coming together. It seems that while unstructured data can now become somewhat structured (with labelling, categorizing, etc.), structured data can also be treated much like unstructured data. In this new world, the focus would be on the use cases rather than the types of data processed. In addition, the same may be happening with AI and collaboration, which remain top of mind but also seem to be complementing data provisioning capabilities; some are hence wondering whether that space may also converge with the data platforms one (companies including DataRobot, Dataiku, Domino, H2O.ai, etc.). Finally, the same may be happening with real-time and continuous intelligence platforms (companies such as Confluent, C3.ai, Samsara, InfluxDB, etc.), connecting real-time and IoT data with current data platforms. Overall, it may be that the focus will increasingly be on the types of use cases rather than the types of data processed.
  • Connectivity, data integration and workflow management — increasing importance and funding: transactions and feedback from VCs seem to show strongly increased interest in the management of data flows. It seems much has yet to come: customers are experiencing many challenges to connect, integrate, and push data ‘in’ and ‘out’ (the ‘out’ part especially seems to be more difficult). This makes complete sense when one realizes the variety in data connectivity options — things are far from the sometimes-advertised simple API connection to exchange data. Companies there may include Fivetran, Adverity, Postman, Workato, Prefect, Snaplogic, Tray.io, Astronomer, etc. (Note: it is interesting that while business users with few technical skills are getting access to platforms, sometimes low-code, to realize self-serve operations, at the other end of the spectrum lies the acceleration of data operations, which targets technical users/data engineers to help them realize data operations faster — both trends are going together, tackling different users)
  • Special purpose databases — increasing amounts of funding — on the rise? This category remains heterogeneous and sometimes hard to sell. Relational DBs are complemented by graph, distributed, or other types for select usages. VCs are overall excited about this area, as the potential for either complementarity with existing solutions or, better, disruption of existing players is high. This includes companies like Rockset, Scylla, Couchbase, Cockroach Labs, Yugabyte, Neo4j, etc. Open-source innovation is also gaining traction, with technologies including those from dbt, MariaDB, Yugabyte, etc.
  • Automation/Process mining and RPA — have the winners been chosen? Process mining and RPA are large, important areas underlying digitization for many companies. However, funding amounts seem to have decreased in this area. Companies there include UiPath, Automation Anywhere, Celonis, etc. Has this area fallen from grace, or have the winners on the contrary been chosen? Might this be because a wave of IPOs, M&A or innovation is in preparation?
  • Data privacy — best years still ahead of us? To many VCs, privacy management is an exciting strategic space that can change the data landscape. It seems promising. Likely due to the technicality of the field, few players are making the market, with different offerings, including BigID, OneTrust, Privitar, sometimes enabled by governance and data lineage. Overall, VCs agreed this trend is likely to be felt for decades, as regulations (GDPR, HIPAA, etc.) and data structuring requirements will likely continue to become more complex and demanding, creating customer needs for finding and controlling private data.
  • Business analytics and augmentation — increasing numbers and high M&A activity: a very interesting market and field, which would top all categories if M&A were accounted for. Indeed, in 2020 Looker was acquired by Google for $2.6Bn, and in 2018 Datorama was acquired by Salesforce for $0.8Bn. Some VCs think the wave is now in its mature stage and that it is easier to switch between providers now. But others think, on the other hand, that augmentation of analytics may lead the way into a renewed wave, either through various input modes or through automated analysis of data insights. Companies include Looker, ThoughtSpot, Adverity, Sisu, Sisense, etc.

Here are some other trends that also appeared and were mentioned by VCs:

  • Acceleration of data operations: this is an interesting trend that is growing significantly. As data grows exponentially, a trend is emerging to help data operations specialists operate faster. This is a space hyperscalers are attempting to own, but will they be able to do it themselves or through acquisitions? Few companies in this space seem to be able to make it past $40M ARR; this may be because they are tackling specific use cases or markets owned by hyperscalers, or they get acquired. In this category are found companies like Matillion, Fishtown Analytics/dbt, Starburst, Dremio, etc.
  • Data governance and lineage: VCs were excited by this area, which is also concentrated due to the technicality of the subject. Cataloguing, governance and data lineage revolve around the understanding and mapping of the flows of data and metadata, and are often enablers of other fields, such as AI intelligence, systems productivity, etc. One indeed needs to understand the flows of data to imagine technology improvements, apply AI algorithms, or figure out how to scan or relate data. Companies include Alation, Collibra, Manta, Immuta, Okera, etc.
  • Data Prep/reliability: Data preparation and reliability was deemed an important topic. However, the capabilities were said to be often integrated into larger platforms, or the companies just get acquired quite fast. This may explain why the category has so far remained in the bottom tier of transaction sizes, with companies like Tamr, Trifacta or Scale.ai among others reaching interesting round sizes.
  • Enablement of conversational analytics: VCs mentioned that this category was deemed, and may still be, promising. After all, there could be a next generation of data management and analytics solutions fully enabled by bots and voice analysis. But overall, the space has taken time to grow, and some VCs wonder whether the market will be as large as once thought.
  • Monetization of data exchanges: there have been many discussions about this topic, and it seems there is consensus among VCs and corporates that someday this will be a very large market with massive ROI. The question is when, as exchanging semi-anonymized data to monetize datasets is not so easy or widely done. Some large players have an offering (e.g. Snowflake), and it is a question whether smaller specialized players will emerge, and whether corporates will actually want to monetize their anonymized data or see the data as their own or their customers’ property. But the potential value is high, as anonymized data sharing may enable algorithms to find clues and deliver new insights.

That is all, folks. So overall, data matters and is strategically important. From extraction to delivery, there is apparent simplicity but underlying complexity. Innovation is high in trying to simplify complexity at all stages, from inputs to data management, AI insights and analytics, and finally outputs. Analysis of +150 representative VC transactions over 2017–2020 revealed several trends, among others highlighted by VCs: data platforms may be converging (data warehouses and datalakes, AI data platforms, and continuous intelligence are connecting their data assets), with data collaboration a priority; process mining and RPA are attracting less capital, but it may be because the winners have been chosen or a new wave of funding is in preparation; special-purpose databases have seen increased interest for special data use cases; connectivity, data integration and workflow management are seeing increased interest and funding to facilitate data management; and privacy intelligence may have its best years to come and expand into adjacent areas, enabled by governance, cataloguing and data lineage. Finally, more specialized trends are continuing to shape the landscape, including acceleration of data operations for data engineers (and, at the same time, democratization of data access for business users), and data preparation and reliability. On the earlier side of the trends, VCs noted anonymization and synthetic data management, as well as monetization of data exchanges, as interesting coming trends.

The sum of it all: data matters; the market is moving fast, with significant funding, and increasing focus on data platform convergence, simplification of complexity and data collaboration! And under the apparent simplicity lies underlying complexity. And where there is complexity, there is room for improvements, ROI, VC funding and returns! 😊 Thanks to all for the contributions and for reading.


Emmanuel Cassimatis

Investments in early stage software B2B companies for SAP in Europe, former entrepreneur and VC/PE, writer of two books, tech enthusiast, angel