CDPs vs. Reverse ETL: Understanding the Differences and Choosing the Right Solution

26/06/2024 

Author: Anthony Botibol, Head of Marketing at Relay42

For over a decade, marketing teams struggling to access their customer data for better personalized, cross-channel marketing campaigns have turned to Customer Data Platforms (CDPs).

While there are various CDP solutions to choose from, they all primarily provide the core capability to unify demographic, behavioral, and transactional data from fragmented systems across the business to create unified customer profiles. These 360° profiles enable the marketing team to leverage the full breadth of data to better understand their customers, improve audience segmentation, and deliver more tailored cross-channel marketing programs.

While these Customer Data Platforms provide business value for the marketing and commercial teams to eliminate issues related to data silos, they are not a trivial investment due to the cost of storing, processing and activating millions or billions of data events. Furthermore, with many businesses turning to cloud data warehouse solutions to store a lot of the same data, it is logical for the business to question whether they should be paying to store, process and query that data in two places. Ironically, the CDP that was designed to eliminate data silos starts to be seen as a data silo in itself!

In recent years, the idea of a ‘Composable CDP’ has therefore gathered pace. Here the cloud data warehouse is used as the single store of raw customer data, and an application sits on top of that data store to transform the raw data into unified customer profiles and send those profiles to end points (e.g. marketing channels, BI tools, CRMs).

The process for extracting data from the data warehouse to accomplish this is called Reverse ETL (Extract, Transform & Load) and, in theory, negates the need to invest in a ‘Packaged’ CDP to achieve the same result. Composable CDPs therefore fulfill the ‘core capability’ of ensuring the marketing team has access to 360° profiles, but with the added advantage of not paying to store customer data twice.


At a high level it sounds like a no-brainer. In this article, however, we will dive deeper into some of the key differences between investing in a Packaged CDP and investing in a Composable CDP that utilizes Reverse ETL, with the goal of helping you determine which solution best fits your business needs today and into the future.

What is Reverse ETL?


Before we get into Reverse ETL, we should first agree on the definition of ETL. ETL is the acronym for ‘Extract, Transform & Load’. This is ultimately a process for extracting data out from one system, transforming the shape of the data to make it ready for another system to accept, and then loading it to that second system. 
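To make the three steps concrete, here is a minimal sketch in Python. It assumes an in-memory list standing in for the source system and a SQLite database standing in for the target system; the function names, the `customers` table and its columns are all hypothetical and only illustrate the shape of the process.

```python
import sqlite3


def extract(source_rows):
    # Extract: pull raw records out of the source system
    # (assumption: a plain list of dicts stands in for that system).
    return list(source_rows)


def transform(rows):
    # Transform: reshape each record into the form the target expects,
    # here a (id, normalized email) tuple.
    return [(r["id"], r["email"].strip().lower()) for r in rows]


def load(conn, rows):
    # Load: write the reshaped records into the target system
    # (assumption: a hypothetical "customers" table in SQLite).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, email TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?)", rows)
    conn.commit()


def run_etl(source_rows, conn):
    load(conn, transform(extract(source_rows)))
```

Calling `run_etl` with a list of raw records and a `sqlite3.connect(":memory:")` connection runs the whole pipeline end to end.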

While ETL is a process that can be defined for getting data from any one operational system to another, over time it has become synonymous with getting data into a data warehouse. Reverse ETL is therefore the same process going the opposite way, getting data out of the data warehouse, and so it was coined ‘Reverse ETL’.

The reason that Reverse ETL has only recently become a talked-about solution is that data warehouses were traditionally not built for activating data into other operational systems like marketing channels. They are primarily built to replicate and store data in one place, which is then used for analysis, reporting and insights. Having all the data in one place prevents different departments from generating conflicting insights and therefore gives the business a single source of information to make decisions from.

Querying data from the data warehouse would normally require some SQL knowledge that marketers traditionally don’t have. And when attempting to use the data warehouse for real-time marketing use cases (e.g. website personalization), the data is not made available fast enough – at best, you may be able to get data out in 10-15 minute intervals, by which time a website visitor may have already left the website.

Composable CDPs utilizing Reverse ETL therefore make the data available to marketing teams without them needing to learn SQL, use identity matching to build unified profiles, and send the profiles to the marketing tools that need them. The 10-15 minute latency issue remains, but for direct marketing channels where that latency is more than acceptable, this provides huge value. It also provides more control for the IT department, which typically owns the data warehouse investments.
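The flow described above can be sketched as follows. This is a simplified illustration, not how any particular Reverse ETL product works: SQLite stands in for the cloud data warehouse, and the `orders` table, the `high_value` segment name and the `send` callback (representing a marketing tool's API) are all hypothetical.

```python
import sqlite3


def build_warehouse():
    # Assumption: an in-memory SQLite database stands in for the warehouse.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (email TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [
            ("ann@example.com", 120.0),
            ("ann@example.com", 80.0),
            ("bob@example.com", 15.0),
        ],
    )
    return conn


def reverse_etl(conn, min_spend, send):
    # Extract: query the warehouse for customers above a spend threshold.
    rows = conn.execute(
        "SELECT email, SUM(amount) FROM orders "
        "GROUP BY email HAVING SUM(amount) >= ? ORDER BY email",
        (min_spend,),
    ).fetchall()
    # Transform: shape each row into the payload the destination expects.
    payloads = [
        {"email": email, "segment": "high_value", "total_spend": total}
        for email, total in rows
    ]
    # Load: hand each payload to the destination (hypothetical callback).
    for payload in payloads:
        send(payload)
    return len(payloads)
```

In practice a job like this would run on a schedule, which is where the 10-15 minute latency mentioned above comes from.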

Packaged CDPs or Reverse ETL – The Advantages and Limitations  

So if a Reverse ETL solution, in combination with a cloud data warehouse, can provide the same core capabilities as a CDP, why go and invest in a CDP at all?

Well, the devil is in the detail, which we’ll now break down here:

Data Storage Costs


The cost benefits of a Composable CDP using Reverse ETL can be misleading. While it may initially appear cheaper by only paying to store data in one location, the reality is that querying the data warehouse frequently, especially for large volumes of website traffic, can incur significant costs.

Indeed, the millions or billions of clickstream events occurring continuously on your website are often not stored in the data warehouse in real time. And because of how data warehouse costs are incurred, repeatedly running live queries on the data to continuously personalize a live experience will inevitably push costs up. So while a Composable CDP might be half the cost of a Packaged CDP investment, the increased cost of data storage and queries on the data warehouse will make the decision either cost-neutral or more expensive in the long run.
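As a rough illustration of why query frequency matters, the sketch below assumes a pricing model in which the warehouse bills by the volume of data scanned per query. That pricing model, the function name and every figure in the example calls are illustrative assumptions, not real vendor prices.

```python
def monthly_query_cost(queries_per_day, tb_scanned_per_query, price_per_tb, days=30):
    # Assumed model: cost scales linearly with the number of queries
    # and with the data each query scans.
    return queries_per_day * tb_scanned_per_query * price_per_tb * days


# Hypothetical comparison: a batch sync every 15 minutes (96 runs a day)
# versus live personalization queries (10,000 a day), same scan size and price.
batch_cost = monthly_query_cost(96, 0.01, 5.0)
live_cost = monthly_query_cost(10_000, 0.01, 5.0)
```

Under these assumptions the only thing that changes between the two scenarios is the query count, so the cost grows in direct proportion to how often the warehouse is queried.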

Real-time CDPs are architected in a very different way (in the case of Relay42 it’s a NoSQL database) to capture, process and activate real-time events in an optimal way that doesn’t incur the associated costs that a data warehouse would.

Zero-Copy Data


Composable CDP/Reverse ETL solutions will typically talk about ‘zero-copy data’ to emphasize that you are not copying data from the data warehouse into another database but instead using the modern speed and security of APIs to get unified customer data from the data warehouse to your engagement channels. 

However, the need for copying or caching data is usually for a necessary purpose, and in the case of digital marketing it provides enormous value for powering optimized real time experiences. 

In the example of a website, caching happens at many levels including an individual’s browser and hard drive with the aim of shortening page load times. This way, the next time the user loads the page, most of the content is already stored locally and the page will load much more quickly.

So while it does not make sense to copy every byte of data from a data warehouse to a CDP, or to replicate data in multiple systems, copying or caching data outside of the data warehouse is necessary wherever real-time digital marketing use cases are important for the marketing team.
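The caching idea can be sketched as a small time-to-live cache placed in front of a slow profile lookup. This is a generic illustration rather than a description of any vendor's implementation; the `ProfileCache` class, the `fetch` callback (standing in for a warehouse query) and the TTL value are hypothetical.

```python
import time


class ProfileCache:
    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch        # slow lookup, e.g. a warehouse query (assumption)
        self._ttl = ttl_seconds    # how long a cached profile stays fresh
        self._clock = clock        # injectable clock, useful for testing
        self._entries = {}         # profile_id -> (stored_at, profile)

    def get(self, profile_id):
        now = self._clock()
        hit = self._entries.get(profile_id)
        if hit is not None and now - hit[0] < self._ttl:
            # Fresh copy available: serve it without touching the slow source.
            return hit[1]
        # Missing or stale: fetch once from the source and keep a copy.
        profile = self._fetch(profile_id)
        self._entries[profile_id] = (now, profile)
        return profile
```

The trade-off is the usual one for caches: a short TTL keeps profiles fresher, while a long TTL sends fewer queries to the underlying store.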

Real-Time Marketing Use Cases and Latency issues

Data Warehouses are SQL data stores that inherently come with latencies for extracting data based on the fact that their primary functions are reporting and data analysis. The latency of cloud data warehouses, although much better than traditional data warehouse solutions, also means Reverse ETL solutions often lack the real-time capabilities needed for immediate customer interactions, which are crucial for effective digital marketing strategies.

While CDPs were originally created for known customer marketing (e.g. cross-sell, upsell and retention programs), around 95% of website visitor traffic is generally unknown, and that traffic can be activated to increase customer acquisition if the data is being utilized.

Many real-time packaged CDPs today create (and activate) unified 360° profiles of anonymous website visitors in under 200 milliseconds, so they can deliver personalized experiences for returning visitors (e.g. personalize the website based on past browsing behaviors) and/or activate into personalized paid and social media, as well as retargeting campaigns. Each time that visitor visits a new web page, abandons the site, returns and takes an action, an anonymous profile is updated, and automated logic continuously uses that information to better tailor the content or offer they see.

And when those anonymous visitors do identify themselves, the clickstream history is then also highly valuable when attached to a known profile or ID. This can deliver huge value for businesses where consumers are increasingly shopping online across a huge number of channels, affiliates and social platforms.
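The mechanics of attaching anonymous history to a known profile can be sketched as below. It is a deliberately simplified model that assumes one cookie ID per anonymous visitor and a single customer ID per known profile; the `ProfileStore` class and its method names are hypothetical, and real identity resolution typically handles many more keys and edge cases.

```python
from collections import defaultdict


class ProfileStore:
    def __init__(self):
        self.anonymous = defaultdict(list)  # cookie_id -> pages viewed
        self.known = defaultdict(list)      # customer_id -> pages viewed

    def track(self, cookie_id, page):
        # Every page view updates the visitor's anonymous profile.
        self.anonymous[cookie_id].append(page)

    def identify(self, cookie_id, customer_id):
        # When the visitor identifies themselves, move their clickstream
        # history onto the known profile and drop the anonymous one.
        history = self.anonymous.pop(cookie_id, [])
        self.known[customer_id].extend(history)
        return self.known[customer_id]
```

After `identify` is called, everything the visitor did before logging in is available on the known profile for segmentation and personalization.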

Anonymous behavioral data traditionally does not flow into the data warehouse, because the sheer volume of data that needs to be processed can lead to eye-watering costs. In that case, the Composable CDP utilizing Reverse ETL will not have access to that data at all. If it does, it will encounter too much latency to enable an in-the-moment personalized experience, which could be the difference between a conversion or not.

Ancillary CDP Capabilities Beyond Unified Customer Profiles

Packaged CDPs can provide a wide range of ancillary capabilities (for example, Journey Orchestration, Predictive AI modeling and a host of real-time marketing use cases) that are designed to be used in the ways that marketers are used to (e.g. a drag & drop UI, or wizards and templates instead of writing SQL code). So while the creation of unified customer profiles is a common benefit of all CDPs AND Reverse ETL solutions, marketers also need the ability to make unified decisions across their many engagement channels.

Journey orchestration enables a business to make customer-centric decisions by considering all channels as part of a campaign or journey, rather than replicating those decisions across many engagement platforms. Similarly, pre-built AI models put data science into the hands of marketers to add real-time AI predictions into always-on journeys and campaigns without needing to rely on bottlenecks in data science or IT teams to build, train and deploy models manually over many months.

Of course, businesses already utilizing multichannel marketing hubs, marketing clouds or journey orchestration applications, and who may already have built and optimized some AI and ML models, may decide that the only missing piece in their technology stack is the delivery of those 360° profiles, and therefore a Reverse ETL approach may suit them better.

These businesses can therefore reduce costs further by not paying for additional functionality in a packaged CDP that they don’t need, ultimately delivering a more composable solution. Businesses may also choose to acquire all these functionalities across multiple vendors and technologies and piece them together.
 

Summary of Pros and Cons for Marketers

Relay42’s Hybrid CDP Approach

Relay42 provides a hybrid approach, combining the benefits of CDPs AND data warehouse native CDPs. This solution manages data cost-effectively across both platforms, keeping data warehouse costs from rising and keeping data copying or caching to a minimum, while also supporting real-time marketing capabilities, unified journey orchestration decisions and easy-to-deploy AI models.

Relay42 uses Change Data Capture (CDC) to identify and capture changes made to data from the data warehouse, and only delivers those changes back to the warehouse if they’re necessary. With a highly flexible data schema and our proprietary multi-key identity resolution, Relay42 takes data from the data warehouse without the need for the data to be transformed (just extract and load, without the transformation process!). In parallel, it powers all the real-time marketing use cases without needing to start putting live clickstream data into the data warehouse either.
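To illustrate the general idea behind Change Data Capture, the sketch below compares two snapshots of a table and emits only the rows that were inserted, updated or deleted. This is a simplified snapshot-diff variant chosen for readability; it is not a description of Relay42's implementation, and production CDC systems commonly read changes from a database log instead. The function name and the row shape are hypothetical.

```python
def capture_changes(previous, current):
    # previous / current: dicts mapping a primary key to a row (assumption).
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif previous[key] != row:
            changes.append(("update", key, row))
    for key in previous:
        if key not in current:
            changes.append(("delete", key, None))
    return changes
```

Only the returned change list needs to move between systems, rather than the full table, which is what keeps the volume of data transferred and queried low.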

It ensures that the marketing team always has access to the data they need to personalize an experience when they need it, while also benefiting the wider business to maintain data warehouse costs at an optimal and predictable level. It puts less strain on the data warehouse to try and achieve use cases it’s not optimal for, while providing all the flexibility and cost control that businesses are looking to achieve.

The built-in journey orchestration engine of Relay42 allows marketers to integrate unified customer profiles across any channel without the extensive costs of running a multichannel marketing hub or marketing cloud solution. While a Reverse ETL solution may be a cheaper alternative in year 1, it does not provide ancillary capabilities and therefore requires the business to invest in additional tools and micro-services to achieve the same level of functionality - inevitably leading to higher costs to the business overall. 

As a packaged CDP that already optimizes how data is stored and activated between Relay42 and a data warehouse, and with overlapping functionality that can provide lower costs and complexity than a marketing cloud, Relay42 is a more cost-saving option in the long run. 

Additionally, Relay42 offers AI propensity models and insights tailored for marketers, a feature often left to data science in composable solutions. This marketer-friendly approach ensures that marketing teams can leverage advanced analytics and models without relying solely on data science teams.


 

As depicted above, and as opposed to a pure Reverse ETL approach, unified profiles (known customers AND anonymous prospects) are still created in the CDP. AI prediction scores are applied to the profiles, profiles are placed into journeys rather than sent to endpoint marketing channels, and profiles are then synced back to either the data warehouse or the data lake, depending on which is most optimal from both cost AND requirements perspectives.

At Relay42, we believe this is the optimal way to store, activate and unify customer data in an enterprise business.

Conclusion

This is not the first time that businesses have battled over whether they can achieve what the marketing team needs through the data warehouse – this is often a marketing vs. IT department decision.

But confusion is rife at the moment, with a lot of buzzwords and deliberate marketing messages that businesses need to understand better in order to make informed decisions that will not lead to unbudgeted data storage costs later down the line.

Reverse ETL and Composable CDP solutions appear attractive at the outset and comparatively cheaper side-by-side, but ultimately the cost will simply be shifted from your CDP to the data warehouse over time. If you want to make use of the 95% of anonymous traffic on your website for revenue uplifts, and maintain costs that can be forecast over time, a Change Data Capture hybrid approach will suit your requirements and budgets in the long run.