Firehose analytics

A quick rundown on firehose analytics, also known as 'collect everything and let growth sort it out'.

Firehose analytics describes an approach to collecting data for analysis and reporting. Whereas traditionally data, events, and logs are collected at specifically defined points, firehose analytics sends a large volume of data from multiple sources and events, then post-processes it.

Let's look at an example from a hypothetical e-commerce website; we'll call it The Cheese.

As the Head of Marketing, you may want to know how users interact with your site after clicking your ad, so you have the team implement a number of event touchpoints.

Manually defined events

Let's define a few events that easily turn into a funnel:

  1. Users view the homepage
  2. Users add a cheese to their cart
  3. Users checkout
  4. Purchase completed
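
To make this concrete, here's a minimal sketch of what those manually defined events might look like in code. The track() helper, the endpoint, and the event names are all hypothetical, stand-ins for whatever your analytics SDK actually provides:

  // Hypothetical helper that ships one named event to an analytics backend.
  async function track(event: string, properties: Record<string, unknown> = {}): Promise<void> {
    await fetch("https://analytics.example.com/events", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ event, properties, timestamp: Date.now() }),
    });
  }

  // Each funnel step is a manually defined call site somewhere in the codebase.
  void track("homepage_viewed");
  void track("cheese_added_to_cart", { sku: "brie-200g" });
  void track("checkout_started");
  void track("purchase_completed", { total: 12.5, currency: "EUR" });

Every new question means another call site like these, shipped and deployed before any data exists to answer it.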

This gives a good overview of the user journey, but it's limited to the specific events that were defined. If you want to add more events you need to update the codebase, and you have no way to retroactively examine your data for new insights outside the scope of those events.

That is to say, if you want to answer a new question six months after launching your shop, it has to fall within the events you defined; otherwise you won't be able to see it until you start collecting data for it.

Your six months of data from selling cheese basically counts for nothing. The horror.

Firehose analytics

Let's see how we can augment our flow with the concept of firehose analytics. Alongside the funnel events, we now capture:

  1. All click events
  2. All page views
  3. All form submissions
  4. All errors in the console
  5. All scroll events
  6. Responses from our third party payment provider
  7. Popular cheese trends

This "noisy" but voluminous data is a very basic example of a firehose.
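
Wiring up that kind of broad capture can be surprisingly small. Here's a minimal sketch reusing the hypothetical track() helper from earlier, hooking browser events globally instead of instrumenting each feature by hand:

  // Capture every click, form submission, error, and scroll at the document
  // level, once, rather than instrumenting individual features.
  document.addEventListener("click", (e) => {
    const el = e.target as HTMLElement;
    void track("$click", { tag: el.tagName, id: el.id, text: el.textContent?.slice(0, 50) });
  });

  document.addEventListener("submit", (e) => {
    void track("$form_submit", { formId: (e.target as HTMLFormElement).id });
  });

  window.addEventListener("error", (e) => {
    void track("$error", { message: e.message, source: e.filename, line: e.lineno });
  });

  // In practice you'd throttle this; unthrottled scroll events are the
  // firehose at its most literal.
  window.addEventListener("scroll", () => {
    void track("$scroll", { depth: window.scrollY });
  }, { passive: true });

  // performance.now() is a crude stand-in for a real page load metric here.
  void track("$pageview", { path: location.pathname, loadTimeMs: performance.now() });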

The analytics platform now gives us the ability to define new events on data that has already been captured, for example:

  • The page load time directly correlates with cart abandonment (Page views)
  • Scrolling to see reviews increases the likelihood of a purchase (Scroll events)
  • How likely a user is to leave a review post-purchase (Form submissions)
  • An error on the API is a strong indicator of abandonment (Errors)
  • How many cheeses are viewed on average before a purchase (Click events)
  • Users from particular regions have a higher incidence of failed payments (Page views/Payment provider responses)
  • The most popular cheese is not the one with the most views on your site (Popular cheese trends)
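
The key mechanic is that a "new event" is now just a query over raw data you already hold. As a rough sketch of the first bullet (page load time vs. cart abandonment), assuming the raw events were enriched with a sessionId on ingestion, and that the manual funnel events land in the same store as the firehose ones:

  interface RawEvent {
    event: string;
    sessionId: string;
    timestamp: number;
    properties: Record<string, unknown>;
  }

  // Post-hoc definition: a session "abandoned" its cart if it added a cheese
  // but never completed a purchase. Every $pageview already carried a
  // loadTimeMs, so the correlation needs no new instrumentation.
  function loadTimeVsAbandonment(events: RawEvent[]) {
    const bySession = new Map<string, RawEvent[]>();
    for (const e of events) {
      const session = bySession.get(e.sessionId) ?? [];
      session.push(e);
      bySession.set(e.sessionId, session);
    }

    return [...bySession.values()].map((session) => {
      const pageviews = session.filter((e) => e.event === "$pageview");
      const avgLoadMs =
        pageviews.reduce((sum, e) => sum + Number(e.properties.loadTimeMs ?? 0), 0) /
        Math.max(pageviews.length, 1);
      const addedToCart = session.some((e) => e.event === "cheese_added_to_cart");
      const purchased = session.some((e) => e.event === "purchase_completed");
      return { avgLoadMs, abandoned: addedToCart && !purchased };
    });
  }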

Because this data was always being collected, we can apply these post-hoc event definitions to our historical data and surface insights over a longer period of time.

This "firehose" of data is a powerful tool for understanding your users and their interactions with your platform.

Collecting large amounts of data doesn't prevent you from also having focused funnels and well-defined events within your analytics platform.

The above is a very basic example of incorporating a data firehose. In larger and scaling companies, firehose data feeds data warehouses and data lakes, and analyzing that data is a significant part of the business, spanning roles such as data engineers, data scientists, and data analysts.

In startups and smaller companies this is often the responsibility of the engineering team, and the data typically lives in a single external analytics platform such as Amplitude, Mixpanel, or PostHog.

You should, of course, be collecting everything you possibly can.

If you're looking to implement analytics collection in your own platform, I've been using and enjoying PostHog (not sponsored, just a fan).
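
To give a sense of scale, getting a firehose flowing with posthog-js takes very little code, since autocapture (clicks, form interactions, page views) is enabled by default. The API key below is a placeholder for your own project's key:

  import posthog from "posthog-js";

  // Autocapture is on by default, so initialising the client is enough to
  // start collecting clicks, form interactions, and pageviews.
  posthog.init("<your-project-api-key>", {
    api_host: "https://us.i.posthog.com",
  });

  // Well-defined funnel events still work alongside the firehose.
  posthog.capture("purchase_completed", { total: 12.5, currency: "EUR" });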
