Latest news and updates from the PlayFab developers

by Brent Elmer 2019-04-04

Overhauling the PlayFab Data Backend

Over the past several months, our analytics teams have been working under the hood on an upgrade to the PlayFab data backend. This change moves us onto Azure infrastructure and lays the foundation for an ambitious analytics expansion over the coming months and years. We’re excited to finally share that journey publicly and introduce the benefits it will unlock over time.

This upgrade is divided into three major parts. We’ll talk below about what’s changing in each of these areas and what it means for the future:

  • Data Pipeline
  • Data Tenanting 
  • Analytics & Reports

A Bit of History

Our original analytics stack was built on a variety of technologies, including well-known offerings like S3, Elasticsearch, and Snowflake. But there were also custom components for scheduling jobs, executing queries, and moving data, many of which were architected a piece at a time as the platform expanded.

Keeping this system running meant that periodically we’d lose engineering cycles when components needed maintenance, issues were discovered, or we had to scale up our machines and repartition. Fundamentally, we were managing too much of the stack ourselves, especially as the capabilities of Azure, AWS, and other hosted technologies had grown up around us. 

With this change, we’re deeply leveraging the latest Azure offerings – Azure Data Explorer, Azure Data Factory, and Azure Active Directory, to name just a few – each built and maintained by a dedicated team. This means we’ll see continuous improvement in those capabilities with no direct engineering impact on our teams, and those capabilities can be exposed to our customers at low technical cost.

With that, let’s dive into the specifics beginning with data ingestion. 

Data Pipeline: What’s Changing & Why?

There used to be only one front door for sending events: the PlayStream ingestion service. Today PlayStream parses and processes each event message individually in order to perform real-time segmentation and rule matching. Once read, the events are sent on to our data backend.

This is all well and good if you need real-time processing, but what if you’re just sending descriptive telemetry events for offline analysis? The current approach brings cost inefficiency and more restrictive limits. As our business grows, we need to think at a bigger scale – not just in the number of customers, but in the size and requirements of the biggest titles, many of which have a voracious appetite for custom event data.

Our new data pipeline makes it possible for tagged events to side-step the PlayStream engine, reducing cost while increasing throughput. Over the coming months, our SDKs will handle event batching, taking the burden off the client team to manage call frequency and payload size.
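To make the batching idea concrete, here is a minimal sketch of what client-side event batching might look like. The class name, thresholds, and `send` callback are all illustrative assumptions on our part – the real behavior will live inside the PlayFab SDKs and may differ.

```python
import json
import time


class TelemetryBatcher:
    """Hypothetical sketch of client-side event batching.

    PlayFab's SDKs will handle this internally; the names and
    thresholds here are illustrative, not the real API.
    """

    def __init__(self, send, max_events=50, max_bytes=64_000):
        self.send = send              # callable that ships one batch upstream
        self.max_events = max_events  # flush once this many events accumulate
        self.max_bytes = max_bytes    # flush before a batch grows past this size
        self._buffer = []
        self._size = 0

    def write_event(self, name, payload):
        record = {"name": name, "timestamp": time.time(), "payload": payload}
        encoded = json.dumps(record)
        # Flush first if adding this event would overflow the byte budget.
        if self._buffer and self._size + len(encoded) > self.max_bytes:
            self.flush()
        self._buffer.append(record)
        self._size += len(encoded)
        if len(self._buffer) >= self.max_events:
            self.flush()

    def flush(self):
        if self._buffer:
            self.send(self._buffer)
            self._buffer = []
            self._size = 0
```

With `max_events=3`, writing seven events and flushing delivers them as three batches of 3, 3, and 1 – one upstream call per batch instead of one per event.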

The scale multiplier here is huge. You’ll soon be able to send 10 or 100 times more events, with a larger payload size than before, without fear of bumping up against limits. This is possible through an internal technology called Aria, which grew up supporting Skype and is now used by Office and Windows. In addition to high-throughput ingestion into Azure Data Explorer, Aria supports real-time cubes with sub-minute delays. We use these to aggregate counts and dimensions – think PlayFab’s dashboards – while preserving group-by and sum functionality.
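The cube idea can be illustrated with a toy aggregation. This is not Aria’s implementation – that engine is internal – but it shows the shape of the operation: raw events rolled up into counts keyed by dimension values, which a dashboard can then group and sum without touching the raw stream. The event fields below are made up for the example.

```python
from collections import defaultdict


def aggregate(events, dimensions):
    """Roll raw events up into per-dimension counts, the kind of
    pre-aggregation a real-time cube maintains for dashboards."""
    cube = defaultdict(int)
    for event in events:
        # The cell key is the tuple of this event's dimension values.
        key = tuple(event.get(d) for d in dimensions)
        cube[key] += 1
    return dict(cube)


events = [
    {"name": "login", "platform": "ios"},
    {"name": "login", "platform": "android"},
    {"name": "login", "platform": "ios"},
    {"name": "purchase", "platform": "ios"},
]
counts = aggregate(events, ["name", "platform"])
# counts[("login", "ios")] == 2
```

A dashboard asking “logins by platform” reads the small cube rather than re-scanning every raw event.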

This new solution allows us to offer the best of both worlds, while putting control in your hands. If you need guaranteed delivery and real-time analysis, send to PlayStream. If you want scale and verbosity, use telemetry events. We’ll eventually expose this choice as a service-side configuration setting, so you can switch at a moment’s notice. And of course, by the time these events hit the backend, they’re all stored together, so you can seamlessly use the complete set for analysis. Better still, these events will land in our new tenant-level databases.

Data Tenanting: What’s Changing & Why?

PlayFab offers a few different ways of accessing your data today. From both the Event History and Players pages, you can query across the columns in the player profile and the last 30 days of event data. If you need more, we offer Snowflake to get at the full history of raw events and API calls for your titles at a relatively low cost.

While this is easy to set up, it is implemented as a view against a single wide table with a single index for all events, which means that query performance has never been excellent. In the new model, all PlayFab titles will automatically have a database tenant into which events will flow directly. Each event type will land in its own table, rather than the single wide table of the past, which will bring immediate query performance gains. This is possible because, under the hood, we’re provisioning resources in Azure Data Explorer, an ultra-fast, hyper-scalable solution optimized for exploration of streaming data sets.
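A toy model shows why per-event tables help (the event names below are hypothetical): with a single wide table, every query must filter across all events; with per-event tables, choosing the table replaces the filter, so a query only scans the rows it actually needs.

```python
from collections import defaultdict

# Old model: a single wide table mixes every event type together.
wide_table = [
    {"name": "player_logged_in", "player": "A"},
    {"name": "item_purchased",   "player": "A"},
    {"name": "player_logged_in", "player": "B"},
]

# Every query filters over all rows, regardless of event type.
logins_wide = [e for e in wide_table if e["name"] == "player_logged_in"]

# New model: route each event type into its own table up front.
tables = defaultdict(list)
for event in wide_table:
    tables[event["name"]].append(event)

# A query over logins now touches only the login table.
logins = tables["player_logged_in"]
assert logins == logins_wide  # same answer, smaller scan
```

The same principle applies at scale: partitioning by event type at ingestion time means the common “one event type over a date range” query never pays for the other event types’ rows.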

Now, all the data used for our Game Manager offerings, whether cooked or raw, will be available in the same location, opening it up for deeper integration across the platform, including segmentation, experimentation, and more. In addition, because we manage these resources, we can also replicate any processed data we use out to each customer – e.g. player profiles, trends, reports, and metrics. This can save teams hundreds of hours of data engineering time.

For customers who need more storage, we’ll offer options to scale to dedicated clusters where you can seamlessly bridge into the full capabilities of Azure for any of your most advanced data operations. 

Analytics & Reports: What’s Changing & Why?

Until now, the infrastructure used to power features like Reports and Trends has relied on a good amount of pre-calculation to keep page load times low – in some cases writing out self-contained CSV files, which isn’t the most flexible option. We’ve also bumped up against some impractical upper limits on cardinality.

Azure Data Explorer changes this by introducing two big advantages: 1) ultra-responsive query times, and 2) more efficient data joins. In some cases, including Event History and Player Search, this means the query can now execute against the data in real time using the KQL expression language. These changes are in the process of rolling out to production. In other scenarios, we’ll perform scheduled calculations for metrics but do joins against the full cardinality and date range in real time, rather than baking the final product into a static form.

Thanks to the new architecture, we’ll soon be able to support long-requested features such as custom date range pickers, new dimensions, and using segments as filters in Trends and Reports. Longer term, we’ll also allow you to create custom metrics and dimensions to produce scheduled calculations for you – a critical step towards easy custom reporting workflows. 

Looking Ahead

This is an exciting time in our journey. PlayFab is at an inflection point where the strengths of the Microsoft acquisition are being felt. By leveraging existing tech, we’re seeing a force multiplier on the scale and depth of the services we’re able to deliver.

We’ll follow up soon with product-level announcements that share dates and features as they light up. In the meantime, we’ve taken this opportunity to share this big leap forward in our journey to deliver a comprehensive data platform purpose-built for game developers.