To run the best backend platform, you need the best tools. This is so important to us at PlayFab that we recently decided to create a new API for a third-party tool, simply because we really wanted to use it. The tool is Consul, we wrote the API in C#, and we're now sharing it with the rest of the development community. Here's how and why we did it:
Consul is a distributed, highly available, multi-datacenter-aware system that provides service discovery, health checking, and a consistent key/value store. We originally chose Consul to improve our service discovery. We already had a working service, but needed an easier way to tie it together, and unlike ZooKeeper and similar tools, Consul provides a DNS server as part of the package. This made it an easy choice since it integrated instantly with our existing systems - we didn't need to make any code changes to start using it, because it serves requests over standard protocols.
An agent is required on nodes that participate in a Consul cluster, either as a server or a client. While installing another thing is a configuration burden, in this case it wasn't too bad, because we needed the health check services that Consul provides, and an agent is usually required to make health checks fast and stable anyway.
Since Consul provides health-check-aware service discovery (that is, unhealthy services are not returned by a DNS query), this decreases the chance that user requests will go to a server in a bad state, which increases reliability. Again, this did not need any changes to our existing codebase.
The next need we looked to Consul to serve was distributed locking. A distributed lock is required when multiple nodes of a distributed system must operate on a shared piece of data one at a time, in an agreed order. Usually, this requires some sort of consensus algorithm or specialized datastore.
Our initial use case was account creation. If a user attempts to create an account multiple times in a short time window, they may end up creating multiple users with the same email address, Steam name, Facebook ID, etc., due to interaction between our eventually consistent datastore (DynamoDB) and how the users are keyed in the table. The solution in place before this need was realized was expensive and error-prone - rather than using DynamoDB Secondary Indexes, we maintained lookup tables of the various alternate IDs to a single central user ID. This means that lookup tables would drift from the central user table over time if the writes to the central or lookup table failed for any reason.
Using Consul, stable distributed locks are easily achieved because the key/value store at the heart of Consul is kept consistent across multiple nodes using the Raft protocol. However, this meant the developers at PlayFab needed an easy way to interact with Consul. Since the majority of our services are built on the Microsoft .NET stack and written in C#, and existing libraries either did not cover the functionality we needed or were not up to date, we wrote our own.
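To make the locking idea concrete, here is a minimal sketch in Go of the acquire/release behavior a lock built on a consistent key/value store relies on. It is an in-memory model only, not Consul's implementation: a mutex stands in for the consistency that Raft provides across nodes, and the names LockStore, Acquire, and Release, along with the key and session strings, are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// LockStore is a hypothetical in-memory stand-in for a consistent
// key/value store. The mutex plays the role that consensus plays in a
// real cluster: every caller sees one agreed order of operations.
type LockStore struct {
	mu      sync.Mutex
	holders map[string]string // key -> session currently holding the lock
}

// NewLockStore returns an empty store with no locks held.
func NewLockStore() *LockStore {
	return &LockStore{holders: make(map[string]string)}
}

// Acquire claims key for session. It reports true if session holds the
// key after the call, and false if another session already holds it.
func (s *LockStore) Acquire(key, session string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if holder, held := s.holders[key]; held && holder != session {
		return false
	}
	s.holders[key] = session
	return true
}

// Release frees key, but only if session is the current holder.
func (s *LockStore) Release(key, session string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if holder, held := s.holders[key]; !held || holder != session {
		return false
	}
	delete(s.holders, key)
	return true
}

func main() {
	store := NewLockStore()
	key := "locks/account/user@example.com" // illustrative key name

	fmt.Println(store.Acquire(key, "session-a")) // true: first request wins
	fmt.Println(store.Acquire(key, "session-b")) // false: duplicate request is rejected
	fmt.Println(store.Release(key, "session-b")) // false: only the holder may release
	fmt.Println(store.Release(key, "session-a")) // true
	fmt.Println(store.Acquire(key, "session-b")) // true: the lock is free again
}
```

In the account-creation case described earlier, two near-simultaneous requests for the same email address would both try to acquire the same key, and only one would be allowed to proceed.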
The starting point for the library was Consul's Go API. Consul itself is written in Go, and this API is produced and maintained by the Consul developers. Underneath it is a series of HTTP endpoints, which expose functionality in a RESTful style. Rather than writing large portions of code from scratch, we decided to port the Go API as directly as possible. This meant that the hard decisions about how to implement certain features (e.g. locks and semaphores) had already been made, and if functionality was added in the future, it would be easier to quickly map the Go code to C# code.
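As a small illustration of what a client library does on top of those HTTP endpoints, the Go sketch below decodes the kind of JSON body a key/value read returns: an array of entries whose Value field is base64-encoded. The KVPair type and decodeKV function are hypothetical names covering only a subset of fields, and the sample payload is made up for the example, so treat this as a sketch rather than the library's actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// KVPair is a hypothetical type mirroring a subset of the fields in a
// key/value read response.
type KVPair struct {
	Key         string
	Value       []byte // encoding/json decodes the base64 string into raw bytes
	ModifyIndex uint64
}

// decodeKV parses a JSON array of key/value entries.
func decodeKV(body []byte) ([]KVPair, error) {
	var pairs []KVPair
	if err := json.Unmarshal(body, &pairs); err != nil {
		return nil, err
	}
	return pairs, nil
}

func main() {
	// Made-up sample payload; "YmFy" is the base64 encoding of "bar".
	sample := []byte(`[{"Key":"config/greeting","Value":"YmFy","Flags":0,"LockIndex":0,"CreateIndex":10,"ModifyIndex":12}]`)

	pairs, err := decodeKV(sample)
	if err != nil {
		panic(err)
	}
	for _, p := range pairs {
		// prints "config/greeting = bar (modify index 12)"
		fmt.Printf("%s = %s (modify index %d)\n", p.Key, p.Value, p.ModifyIndex)
	}
}
```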
Porting code is challenging for all but the simplest programs, but the differences between Go and C# make it especially brain-bending. Compared to Go, C# is extremely object-oriented, and while libraries like the Microsoft TPL abstract away raw threads and make concurrency easier, they don't do nearly as much to ensure clean asynchronous operation as Go's goroutines and channels.
Other challenges included stylistic differences between the languages such as constructors, method overloading, and property accessors, but the hardest thing by far to map was the concept of Go's channel - specifically, the meaning of the channel closing. The Go Consul API uses channels for signaling periodic actions like renewing a session. We finally settled on a combination of Tasks and Cancellation Tokens to signal asynchronous methods cleanly in idiomatic C#, and while it's not quite as concise as Go's select statement, it works well.
In addition to porting the code, we ported all the tests as well, which helped ensure we got everything right. As with the API itself, porting the tests rather than rewriting them from scratch means we can easily add new tests as the Consul developers add functionality or tests of their own, and it ensures that behavior is exactly the same across languages.
What we ended up with is a clean, fast way to access Consul throughout our C# codebase. The library maps closely to the actual HTTP API, uses the Microsoft TPL to provide asynchronous operation, and stays closely aligned with the Go API so that it is easy to update.
At PlayFab, we're huge fans of open source software and recognize the importance of giving back to the community, so we've open-sourced our version of the Consul C# API, with the amazingly original name "Consul.NET", under the Apache 2 license. It's available on NuGet and the source is available on GitHub.