Blog

Latest news and updates from the PlayFab developers

by ZachStone 2017-07-17

Introducing Data "Sharehousing" with Snowflake

Thanks to our integration with cloud-based data warehouse provider Snowflake, PlayFab can now provide one-click access to all your game's raw data, in near real time, via SQL. No messing around with log files, data export, custom ETL, or balky import scripts -- it just works!

Why should you care? Because data is the lifeblood of a modern game, and the need for high-quality, real-time data goes well beyond reporting simple vanity metrics like DAU, retention, and ARPU. To identify problems, test hypotheses, and ultimately improve your game, there is no substitute for gathering, storing, and then quickly analyzing large quantities of raw data from your game.

Until now, this has been hard because of these three challenges:

  1. Capturing the right data in the first place, and getting it off the client device and into the pipeline.
  2. Keeping up with a data schema (what information you are capturing) that changes constantly as you add new data events.
  3. Handling the sheer volume: games generate a LOT of data, and moving it around is expensive.

Together, PlayFab and Snowflake have solved these problems, which is why we are so excited to announce the Snowflake add-on in PlayFab's marketplace.

You can learn how to set up and use this new feature in our Snowflake tutorial.

For a demo, watch the following video.


What Is Happening Under the Hood

Many of you are familiar with PlayStream. In this context, you can think of it as a stream of raw JSON objects that describe everything happening in your game. It contains PlayFab's common events, which we generate on the backend; for example, when a player logs in, we generate a "player_logged_in" event. It also contains custom events with arbitrary schemas. These events are specific to your game, and PlayFab does not need to know their schema ahead of time.

We already load everyone's PlayStream data into our own Snowflake account. This is where we generate reports, among other things. Snowflake has full SQL and direct JSON support, which means the raw data is loaded directly into Snowflake with no transformation, so we always work with the most faithful representation of the data possible. And because these events come through the backend, the data is highly reliable.
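To make "no transformation" concrete, here is a minimal Python sketch of schema-free loading, assuming events arrive as one JSON object per line. The field names (`EventName`, `TitleId`, `EntityId`, `Timestamp`) and the custom `boss_defeated` event are hypothetical, chosen only for illustration. Each event is kept whole, roughly the way Snowflake holds semi-structured data in a `VARIANT` column, so a custom event with extra fields is accepted without declaring any schema first.

```python
import json
from collections import Counter

# Hypothetical raw PlayStream-style lines. Field names are illustrative
# assumptions, not PlayFab's documented event format.
RAW_LINES = [
    '{"EventName": "player_logged_in", "TitleId": "AB12", "EntityId": "p1", "Timestamp": "2017-07-17T10:00:00Z"}',
    '{"EventName": "boss_defeated", "TitleId": "AB12", "EntityId": "p1", "Timestamp": "2017-07-17T10:05:00Z", "BossId": "dragon", "Attempts": 3}',
    '{"EventName": "player_logged_in", "TitleId": "CD34", "EntityId": "p9", "Timestamp": "2017-07-17T10:06:00Z"}',
]


def load_raw(lines):
    """Keep every event whole: no schema, no column mapping, no transformation.

    The only check is that each line is a JSON object, so custom events
    with any extra fields are stored as-is.
    """
    table = []
    for line in lines:
        event = json.loads(line)
        if not isinstance(event, dict):
            raise ValueError("expected one JSON object per line")
        table.append(event)
    return table


raw_events = load_raw(RAW_LINES)

# Queries can still reach any field later, including ones added by custom events.
print(Counter(e["EventName"] for e in raw_events))
```

The design choice is "schema on read": nothing about the custom event had to be registered before it was stored, and deciding which fields matter is deferred to query time.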

Once Snowflake released their new Data Sharing feature, we were finally able to set up secure views of the data and share those views with you. This avoids ETL entirely: the views you query read exactly the same files we load, with no copies made. At the same time, Snowflake's secure views guarantee that your data is truly isolated. This means you get direct access to your own feed of data, with no custom import needed.
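Conceptually, a shared secure view is a filtered window onto storage that someone else owns. The Python sketch below is a toy model of that idea, not Snowflake's implementation: one shared table holds every title's events, and each consumer receives only a function that returns its own title's rows, computed at read time from the same storage. The `TitleId` field and the `make_title_view` helper are hypothetical. In Python a closure is not a real security boundary; in the setup described above, Snowflake is what enforces the isolation.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]

# Hypothetical shared storage holding every title's raw events.
# "TitleId" is an assumed field name used only for illustration.
shared_table: List[Event] = [
    {"EventName": "player_logged_in", "TitleId": "AB12", "EntityId": "p1"},
    {"EventName": "player_logged_in", "TitleId": "CD34", "EntityId": "p9"},
    {"EventName": "boss_defeated", "TitleId": "AB12", "EntityId": "p1"},
]


def make_title_view(table: List[Event], title_id: str) -> Callable[[], List[Event]]:
    """Toy model of a per-title view over shared storage.

    Each call filters the same underlying table at read time, so nothing is
    exported or copied ahead of time, and other titles' rows never appear.
    """
    def view() -> List[Event]:
        # Return copies so the consumer cannot modify the shared rows.
        return [dict(e) for e in table if e.get("TitleId") == title_id]
    return view


ab12_view = make_title_view(shared_table, "AB12")
print(len(ab12_view()))  # 2

# A new event lands in shared storage; the view sees it on the next read.
shared_table.append({"EventName": "player_logged_in", "TitleId": "AB12", "EntityId": "p2"})
print(len(ab12_view()))  # 3
```

Because the filter runs on every read instead of during an export step, there is no pipeline to keep in sync: new rows show up in the view as soon as they land in the shared storage.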

Why We Love Snowflake

We love Snowflake for more than just its data sharing. We had already chosen Snowflake over competing products as our internal data warehouse, for two main reasons:

  • Snowflake decouples compute and storage, which keeps costs way down. Our query load is heavy for about one hour a day and mostly idle otherwise. Solutions like Redshift run 24 hours a day, but Snowflake specializes in this kind of bursty load. Furthermore, heavy writes don't interfere with heavy reads, yet we still get read-committed, global transactions.
  • Snowflake has direct JSON support. We load the raw data into Snowflake and never change it. We then use views to parse out the pieces of the JSON we know, so the result feels like a normal table. Changing the schema therefore means changing a view, not migrating a table. This has saved me weeks of work over the last six months.
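The "views over raw JSON" approach can be sketched in a few lines of Python. In Snowflake itself this would be a SQL view using path expressions over a `VARIANT` column; here a "view" is modeled as a function that projects known JSON keys into flat rows. The sample events, the `BossId` field, and the `make_view` helper are hypothetical, for illustration only.

```python
import json

# Hypothetical raw events; the older one lacks the BossId field that a
# later client build started sending.
RAW_EVENTS = [
    '{"EventName": "player_logged_in", "EntityId": "p1"}',
    '{"EventName": "boss_defeated", "EntityId": "p1", "BossId": "dragon"}',
]


def make_view(columns):
    """Build a 'view': a function that projects JSON fields into flat rows.

    columns maps output column name -> top-level JSON key. Missing keys
    become None, so older events still fit a newer view.
    """
    def view(raw_lines):
        rows = []
        for line in raw_lines:
            event = json.loads(line)
            rows.append({col: event.get(key) for col, key in columns.items()})
        return rows
    return view


# Version 1 of the "table" as analysts see it.
events_v1 = make_view({"event": "EventName", "player": "EntityId"})

# Schema change: add a column by redefining the view. RAW_EVENTS is untouched,
# so there is no table migration or backfill.
events_v2 = make_view({"event": "EventName", "player": "EntityId", "boss": "BossId"})

for row in events_v2(RAW_EVENTS):
    print(row)
```

Because the raw events never change, the old and new views can coexist, and rolling back a schema change is just redefining the view again.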

We believe Snowflake is the best tool for consuming your data. Its flexibility and pay-as-you-go pricing make it great for small indie teams, and its security focus and efficient handling of large, diverse data sets make it great for larger enterprises as well. PlayFab's simple integration makes it easy to get the right data into Snowflake when you need it. Just enable the Snowflake add-on and see for yourself!