Snowflake Data Clean Room

“Half the money I spend on advertising is wasted; the trouble is I don't know which half.” - John Wanamaker (bit.ly/3tz4FOi)

With browser support for 3rd party cookies coming to an end soon, advertisers must find new ways to target their advertising spend. Enter the data clean room: here we will explore how we can use native Snowflake technology to achieve double-blind joins and improve advertising efficacy.

First off…

What is a Data Clean Room?

In simple terms, a data clean room is a secure zone where advertisers and publishers match their respective customer/user data to measure the potential effectiveness of the advertiser’s spend, without divulging any sensitive or private details from either party.

Data Clean Room use case: secure data sharing masking PII content and quasi-identifiers

Snowflake technology for Data Clean rooms

Snowflake’s proprietary technologies can be used to build data clean rooms between two or more Snowflake accounts. What’s more, no data is moved, and the analysis is performed on live data. The Snowflake technologies used in a data clean room are:

  • Secure Data Sharing enables sharing selected tables, views, and user-defined functions in a database in one Snowflake account with other Snowflake accounts. We will use data sharing to expose user tables secured by row access policies, the available column attributes from those secured tables, and clean room request statuses.
  • Streams & Tasks. A stream object records data manipulation language (DML) changes made to tables, and a task object runs SQL code on a time-based schedule. We will use the combination on a shared table from the advertiser called “query_requests” to poll for new entries in that table.
  • Row Access Policy (RAP) is a schema-level object that determines whether a given row in a table or view can be queried by a Snowflake role or account. We will use RAPs to secure publisher user data by way of an approved query list (a mapping table). Only an approved query from the advertiser will return query results from the publisher.
  • A Stored Procedure enables users to create modular code that can include complex business logic by combining multiple SQL statements with procedural logic. We will use stored procedures to:

a.    generate query requests on the advertiser side, and

b.    poll for and validate new requests on the publisher side

Validated requests will be executed on the advertiser side, with results persisted to the advertiser’s Snowflake account.
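As a hedged sketch of how these building blocks fit together (all object, warehouse and procedure names here are illustrative, not from an actual deployment; a production row access policy would typically consult a richer mapping table), the publisher side might declare:

```sql
-- Row access policy: return publisher rows only in an approved query context.
-- approved_query_accounts is an assumed mapping table of permitted accounts.
CREATE OR REPLACE ROW ACCESS POLICY clean_room_rap
  AS (email_hash VARCHAR) RETURNS BOOLEAN ->
    EXISTS (
      SELECT 1 FROM approved_query_accounts a
      WHERE a.account = CURRENT_ACCOUNT()
    );

ALTER TABLE publisher_data
  ADD ROW ACCESS POLICY clean_room_rap ON (email_hash);

-- Stream + task: poll the shared query_requests table roughly every minute
-- and hand any new entries to a validation stored procedure.
CREATE OR REPLACE STREAM query_requests_stream
  ON TABLE shared_db.public.query_requests;

CREATE OR REPLACE TASK validate_requests_task
  WAREHOUSE = clean_room_wh
  SCHEDULE = '1 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('QUERY_REQUESTS_STREAM')
AS
  CALL validate_new_requests();

ALTER TASK validate_requests_task RESUME;
```

Note that creating a stream on a shared table requires the provider to have enabled change tracking on it.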

Data is not moved, only securely shared to allow for approved queries

Snowflake is multi-cloud, meaning participating parties in a Snowflake data clean room can join from a mix of AWS, Azure and GCP platforms! This is achieved using Snowflake data replication between platforms, while still following the same data clean room deployment highlighted below.

Data Clean Room Flow

Now let’s combine the above-mentioned elements, first into the Data Clean Room setup and then into the Data Clean Room flow.

Additional iconography:

Data objects and the shared view of those data objects

The setup

Two-party clean room setup

The publisher shares

  • Available values – a column list of available values that can be used with the approved template
  • Query templates – the approved list of templates
  • Request status – after the completion of a query request from the advertiser, the status is shared back to the advertiser
  • Publisher data (live) - protected by the Row Access Policy, only approved queries can be used against this data
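A minimal sketch of the publisher-side share (the database, schema and table names are assumptions for illustration):

```sql
-- Publisher side: share the four objects listed above with the advertiser.
CREATE SHARE IF NOT EXISTS clean_room_publisher_share;

GRANT USAGE ON DATABASE pub_db TO SHARE clean_room_publisher_share;
GRANT USAGE ON SCHEMA pub_db.clean_room TO SHARE clean_room_publisher_share;

GRANT SELECT ON TABLE pub_db.clean_room.available_values TO SHARE clean_room_publisher_share;
GRANT SELECT ON TABLE pub_db.clean_room.query_templates  TO SHARE clean_room_publisher_share;
GRANT SELECT ON TABLE pub_db.clean_room.request_status   TO SHARE clean_room_publisher_share;
-- publisher_data carries the row access policy, so only approved queries return rows
GRANT SELECT ON TABLE pub_db.clean_room.publisher_data   TO SHARE clean_room_publisher_share;

ALTER SHARE clean_room_publisher_share ADD ACCOUNTS = advertiser_account;
```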

The advertiser shares

  • Query requests – the publisher polls this table for new requests every minute
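The advertiser-side share is the mirror image, exposing only the request table (again, all names are illustrative):

```sql
-- Advertiser side: share only the query_requests table back to the publisher.
CREATE SHARE IF NOT EXISTS clean_room_advertiser_share;

GRANT USAGE ON DATABASE adv_db TO SHARE clean_room_advertiser_share;
GRANT USAGE ON SCHEMA adv_db.clean_room TO SHARE clean_room_advertiser_share;
GRANT SELECT ON TABLE adv_db.clean_room.query_requests TO SHARE clean_room_advertiser_share;

ALTER SHARE clean_room_advertiser_share ADD ACCOUNTS = publisher_account;
```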

Note that although the animation below articulates data movement between a table and the shared instance of that table between Snowflake accounts, that is only to draw your attention to where data flows in a data clean room request. Data sharing in Snowflake is live: the instant data is populated in a shared table, it is viewable by the other Snowflake account to which the share and table have been granted. The Row Access Policy on the shared publisher data table ensures that the live content is only viewable by approved queries.

3 easy steps

1. Make Clean Room Request – the advertiser makes a request via a stored procedure to execute a clean room query. The request is lodged in the “query_requests” table and instantly shared to the publisher.
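Step 1 might look like the following (the procedure name, its parameters and the table layout are illustrative assumptions):

```sql
-- Advertiser side: lodge a clean room request against an approved template.
-- The inserted row is instantly visible to the publisher via the share.
CALL request_clean_room_query(
  'audience_overlap',                    -- approved template name
  'email_hash'                           -- match key from available_values
);

-- Under the hood the procedure writes a row along these lines:
-- INSERT INTO query_requests (request_id, template_name, parameters, requested_at)
-- SELECT UUID_STRING(), 'audience_overlap',
--        PARSE_JSON('{"match_key": "email_hash"}'), CURRENT_TIMESTAMP();
```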


2. Validate new Request – a stored procedure on the publisher’s Snowflake account is polling for new requests and finds one, validates it as an approved query (successfully, in this example), and inserts a fully formed query into the “approved queries” table. It also inserts a record into the “request status” table denoting the status of the request; error messages (if any) are recorded here too, although in this example the query is successful. The request status is instantly shared back to the advertiser.
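Inside the publisher’s polling procedure, the validation could be sketched as below (assuming the request’s parameters column is a VARIANT and templates use a simple placeholder convention; both are illustrative choices):

```sql
-- Publisher side: build the fully formed query by substituting request
-- parameters into the matching approved template.
INSERT INTO approved_queries (request_id, query_text)
SELECT r.request_id,
       REPLACE(t.query_template, '{{match_key}}',
               r.parameters:match_key::STRING)
FROM   query_requests_stream r
JOIN   query_templates t
       ON t.template_name = r.template_name;

-- Record the outcome; request_status is shared back to the advertiser.
INSERT INTO request_status (request_id, status, message, updated_at)
SELECT request_id, 'APPROVED', NULL, CURRENT_TIMESTAMP()
FROM   query_requests;
```

A real implementation would also write a rejection status when a request fails to match any approved template.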


3. Generate and Execute Validated Query – a stored procedure on the advertiser’s Snowflake account is polling for new request statuses; on success it executes the approved query on the advertiser’s Snowflake account, combining advertiser and publisher content to produce a result. The analysis here derives the audience overlap measure between advertiser and publisher.
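An approved audience-overlap query might look like this (table and column names are assumptions). It runs on the advertiser account against the shared, RAP-protected publisher table, and returns only aggregates, never row-level publisher data:

```sql
-- Audience overlap: how many of the advertiser's users the publisher can reach.
SELECT COUNT(DISTINCT a.email_hash) AS matched_users,
       COUNT(DISTINCT a.email_hash)
         / NULLIF((SELECT COUNT(DISTINCT email_hash)
                   FROM adv_db.clean_room.advertiser_data), 0) AS overlap_ratio
FROM   adv_db.clean_room.advertiser_data a
JOIN   pub_shared_db.clean_room.publisher_data p
       ON p.email_hash = a.email_hash;
```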


Voila!

The architecture pattern is repeatable: a consumer of a data clean room can become a provider by implementing the same pattern in reverse and following the data clean room flow depicted above. This gives rise to a self-service capability for your clean room implementation, simply reusing the same deployment pattern with more and more partners while ensuring no plain-text or sensitive data is shared!

Ultimately, the responsibility for ensuring that the approved queries do not reveal sensitive 3rd party data lies with the publisher, in collaboration with the advertiser. The publisher can test these queries and templatize them for use by the respective advertiser.

A repeatable pattern means unlimited implementations

Templates & Applications

Data Clean Rooms can be used for more than just measuring advertising effectiveness! Essentially, you can develop query templates to perform:

  • Audience Overlap: as demonstrated above, the value lies in finding an effective matching strategy without exposing participating party data.
  • 3rd Party Enrichment: the framework above essentially executes approved queries, and a query can in fact be designed to return non-sensitive plain-text data from the publisher (for example) to enrich the advertiser’s own data.
  • Campaign Attribution: measuring campaign effectiveness after the campaign itself has run.
  • Lookalike Analysis: adding features (demography, interests, purchase intentions etc.) needed to feed your machine learning model’s audience.
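For instance, a 3rd party enrichment template could be stored along these lines (the placeholder syntax and all names are assumptions; at validation time the publisher would substitute placeholders only with columns listed in the shared available values):

```sql
-- An entry in query_templates: placeholders are filled in at validation time
-- from the request parameters, restricted to approved columns.
INSERT INTO query_templates (template_name, query_template)
VALUES (
  'enrichment',
  'SELECT a.customer_id, p.{{enrichment_column}}
     FROM adv_db.clean_room.advertiser_data a
     JOIN pub_shared_db.clean_room.publisher_data p
       ON p.email_hash = a.email_hash'
);
```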

Now, 3rd party matching can be performed on attributes such as hashed email addresses or a combination of other attributes; in a cookieless world there are (and will be) specialized solutions for creating a unified, non-identifying identity.

And we will close with Snowflake’s value proposition for Data Clean Rooms: Snowflake contains the building blocks you can use to build your own Data Clean Room, as robust, scalable and flexible as you design it. The above is a reusable framework where you can use Snowflake’s native technology to enhance your Marketing Analytics. To self-start and experiment with your own Snowflake Data Clean Room, see the links to QuickStarts and tutorials below!

References for further reading

The views expressed in this article are my own; you should test implementation performance before committing to this approach. The author provides no guarantees in this regard.
