Hands-On With Agones and Google Cloud Game Servers

I recently had the pleasure of exploring Agones and Google Cloud Game Servers (GCGS), and I wanted to share my experience.

Agones is an open-source platform for running multiplayer game servers in Kubernetes. Going into this exploration, I had never run any sort of game server, but I had worked with real-time communications workloads in Kubernetes. I saw the potential for similar problem sets between the two experiences, and was curious how Agones would solve issues I’ve seen in the past.

On top of Agones, GCGS is a multi-cluster management layer that makes deploying game servers across multiple clusters easier. Before I go into more detail about GCGS, it's important to have a deeper understanding of Agones, how it's installed, and what it does.

Agones

Installation

The first thing I noticed about Agones is that it is entirely Kubernetes-native. I'm a big fan of this, since I run everything in Kubernetes these days. The documentation includes instructions for installing with Helm, which I jumped to immediately. The initial Helm installation on my GKE cluster went smoothly, and within a few minutes I had a handful of custom resource definitions (CRDs) and a control plane running in the agones-system namespace.

▶ kubectl get deploy -n agones-system -oname
deployment.extensions/agones-allocator
deployment.extensions/agones-controller
deployment.extensions/agones-ping

▶ kubectl get crd -oname | grep agones
fleetautoscalers.autoscaling.agones.dev
fleets.agones.dev
gameserverallocationpolicies.multicluster.agones.dev
gameservers.agones.dev
gameserversets.agones.dev

Create a Gameserver

I have a controller, an allocator, a ping server, and some CRDs that look like they might be useful, but what do I do now? Following along in the documentation, it looks like the next step is to create a gameserver.

So I create a gameserver from the provided yaml manifest by following the documentation. This gameserver uses an example service called simple-udp to show how Agones works, and this is where we start to see what Agones is really doing.

If I take a look at the gameserver specification, I see that it contains a container spec, much like a pod specification would. It also declares ports via what they're calling a portSpecification:

spec:
  ports:
  - name: default
    portPolicy: Dynamic
    containerPort: 7654

This means that it is going to use a Dynamic external port, and that the container is listening on port 7654. Cool, pretty simple so far.

Looking further, I can inspect the gameserver to see which port has been allocated:

▶ kubectl get gameserver                                                                                            
NAME               STATE   ADDRESS         PORT      AGE    
simple-udp-4pgls   Ready   35.224.97.238   7063

Looks like I have a gameserver that is running a pod and listening on port 7063; now I'll try to connect via UDP using netcat.

▶ nc -u 35.224.97.238 7063   
HI                           
ACK: HI                      
EXIT                         
ACK: EXIT

The simple-udp server did its job and responded to me with whatever I sent it, and then I gave the exit command. The way this server is set up, when it receives an EXIT command, it tells Agones that it's done and then Agones shuts down the gameserver. We can see this in action:

▶ kubectl get po
NAME               READY   STATUS        RESTARTS   AGE
simple-udp-4pgls   0/2     Terminating   0          30s

▶ kubectl get gameserver
NAME               STATE      ADDRESS         PORT                                  
simple-udp-4pgls   Shutdown   35.224.97.238   7063

The pod is terminated and the gameserver State is Shutdown.
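The netcat exchange above can also be scripted. The following is a minimal sketch, not part of the Agones examples: the helper name send_udp is my own, and it assumes the server answers each datagram with a single reply, as the simple-udp transcript suggests.

```python
import socket


def send_udp(host: str, port: int, message: str, timeout: float = 2.0) -> str:
    """Send one UDP datagram and return the decoded reply.

    Hypothetical helper for illustration; raises socket.timeout if no
    reply arrives within `timeout` seconds.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(message.encode("utf-8"), (host, port))
        data, _addr = sock.recvfrom(1024)
        return data.decode("utf-8").strip()


# Example call, using the address and port from the kubectl output above
# (yours will differ):
#   send_udp("35.224.97.238", 7063, "HI\n")
```

Because UDP is connectionless, a blocked firewall port shows up here as a timeout rather than a connection error.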

A Quick Note On Networking and Cluster Creation

The first time I tried to connect to the gameserver, nothing happened. In my case, it was because I didn't follow all of the wonderful documentation in the Agones installation guides. I skipped over the cluster creation section because I figured I already knew how to create one. Turns out there's one step in there that's a bit different than normal. Since these gameserver pods all need to listen on a UDP port, we need to open up a set of ports on the GCP firewall. Note that the port range in this case is udp:7000-8000. This port range is controlled by the Agones control plane installation, and can be configured by a value in the Helm chart:

▶ gcloud compute firewall-rules create game-server-firewall \
  --allow udp:7000-8000 \
  --target-tags game-server \
  --description "Firewall to allow game server udp traffic"

After that's done, we need to tag the nodes with game-server in order for this rule to apply. It's much easier to just follow the cluster creation instructions from the start, which will have you add the tags and create the firewall rule in the beginning.

Agones Architecture

You might be asking yourself why this gameserver thing is important, so I'm going to take a step away from the exploration and talk about what Agones is actually doing here.

A gameserver is, in theory, just a process that multiple clients can connect to in order to play a game. There are lots of different implementations of this, but there are a couple of key things that these servers must do which affect the way they run in Kubernetes:

1. Gameservers must remain available during the time that the game is being played.

Because the gameserver pod must run uninterrupted for the duration of a game, we can't go killing the pod because of autoscaling, draining, or any other reason. To handle this, the gameserver resource manages the pod lifecycle in a way that a deployment never could. Agones introduces several states for the gameserver, and the game code itself is able to update that state via the Agones API. Agones runs a sidecar alongside every gameserver to receive these API calls.

2. Gameservers must accept connections on some specified port from multiple clients.

This is also an issue when running many gameservers in Kubernetes due to the possibility of port exhaustion. We need a way to allocate a lot of different ports that can be used by the gameservers. Agones handles this with its Dynamic port allocation policy: each gameserver is assigned an IP address and port combination that clients can use to connect.
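To make the port-exhaustion point concrete, here is a rough sketch of what dynamic port allocation could look like. This is not Agones's actual implementation; the class name PortAllocator and the first-free-port strategy are assumptions for illustration, and the default range mirrors the udp:7000-8000 firewall rule above.

```python
class PortAllocator:
    """Hands out unique host ports per node from a fixed range.

    Illustrative only: a simplified stand-in for a dynamic port policy.
    """

    def __init__(self, min_port: int = 7000, max_port: int = 8000):
        self.min_port = min_port
        self.max_port = max_port
        # node name -> set of ports already in use on that node
        self._used = {}

    def allocate(self, node: str) -> int:
        """Return the lowest free port on `node`, or raise if none are left."""
        used = self._used.setdefault(node, set())
        for port in range(self.min_port, self.max_port + 1):
            if port not in used:
                used.add(port)
                return port
        raise RuntimeError(f"no free ports left on node {node}")

    def release(self, node: str, port: int) -> None:
        """Return a port to the pool once its gameserver has shut down."""
        self._used.get(node, set()).discard(port)
```

Ports only need to be unique per node, since clients connect to a node IP plus port. That is why two gameservers on different nodes can share a port number in this sketch.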

Google Cloud Game Servers (GCGS)

Now that we have a handle on the basics of how Agones is installed and gameservers are run, we can start looking at what GCGS provides on top of that. I started with the quickstart guide in their documentation. The first thing this guide has you do is create a cluster and deploy Agones to it. I had already done that, so I skipped ahead to the part where we create a GCGS Realm.

Realms

Realms in GCGS are an organizational construct. While going through this exploration I was trying to figure out the best way to organize a large group of global clusters into realms. I ended up talking to a Googler in the Agones public Slack. (There's a #google-cloud-game-servers channel in there.) I won't go too in-depth about realm organization here, but the best advice I got was this:

A good rule of thumb is "groups of clusters in which, from a player perspective, latency differences between them don't matter"

Anyway, I digress. Once you have a cluster or set of clusters running Agones, in order to use GCGS, you have to add them to a realm. This is simple to do with a couple of gcloud commands:

▶ gcloud game servers realms create agones-blog --time-zone EST --location us-central1
Create request issued for: [agones-blog]
Waiting for operation [projects/gcp-prime/locations/us-central1/operations/operation-1598980635305-5ae43b0c52589-34a097b6-d0659fc7] to complete...done.
Created realm [agones-blog].

▶ gcloud game servers clusters create agones-blog --realm=agones-blog --gke-cluster locations/us-central1/clusters/agones-blog --namespace=default --location us-central1 --no-dry-run
Create request issued for: [agones-blog]
Waiting for [operation-1598980727087-5ae43b63da1ee-88b19318-22a70d4f] to finish...done.
Created game server cluster: [agones-blog]

The first command creates the realm, and the second one attaches my cluster running Agones to that realm.

Now that we have a cluster in a realm, the next step is to create a deployment. A deployment is basically a container for a set of configurations that will describe a set of gameservers. So we create the deployment, and then we create a configuration inside of it:

▶ gcloud game servers deployments create agones-blog
Create request issued for: [agones-blog]
Waiting for operation [projects/gcp-prime/locations/global/operations/operation-1598980944523-5ae43c3336ffd-4eb7fce4-fd872d08] to complete...done.
Created deployment [agones-blog].

▶ gcloud game servers configs create config-1 --deployment agones-blog --fleet-configs-file fleet_configs.yaml
Create request issued for: [config-1]
Waiting for operation [projects/gcp-prime/locations/global/operations/operation-1598981023478-5ae43c7e83334-58008b9c-8df141c0] to complete...done.
Created game server config [config-1].

Notice that when creating a config, I specified a yaml file containing a fleet specification. The fleet specification looks a lot like the gameserver specification that we deployed earlier, but with template and replicas fields:

- name: fleet-spec-1
  fleetSpec:
    replicas: 2
    template:
      metadata:
        labels:
          foo: bar
      spec:
        ports:
        - name: default
          portPolicy: Dynamic
          containerPort: 7654
        template:
          spec:
            containers:
            - name: simple-udp
              image: gcr.io/agones-images/udp-server:0.17

This specifies that we want to create two gameservers as well as maintain a replica count of two. If the gameserver specification is analogous to a pod specification, then the fleet is a lot like a Kubernetes deployment.
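The "maintain a replica count" behavior can be pictured as a small reconcile step. The sketch below is a simplification under my own assumptions, not the Agones controller's code: the GameServer class and servers_to_create function are hypothetical, and only Shutdown gameservers are treated as no longer counting toward the replica total.

```python
from dataclasses import dataclass


@dataclass
class GameServer:
    name: str
    state: str  # e.g. "Ready", "Allocated", "Shutdown"


def servers_to_create(desired_replicas: int, gameservers: list) -> int:
    """Return how many new gameservers are needed to reach the replica count.

    Assumption for this sketch: every gameserver that is not Shutdown
    still counts toward the fleet's replicas.
    """
    live = [gs for gs in gameservers if gs.state != "Shutdown"]
    return max(0, desired_replicas - len(live))
```

With replicas set to 2, a fleet holding one Ready and one Shutdown gameserver would need one replacement, which matches the intuition of a Kubernetes deployment replacing a terminated pod.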

The last thing to do is to roll out that deployment to the clusters in the realm.

▶ gcloud game servers deployments update-rollout agones-blog --default-config config-1 --no-dry-run
Update rollout request issued for: [agones-blog]
Waiting for [operation-1598981253616-5ae43d59fd30b-b841d131-f1822e0c] to finish...done.
Updated rollout for: [agones-blog]
createTime: '2020-09-01T17:22:24.587136253Z'
defaultGameServerConfig: projects/gcp-prime/locations/global/gameServerDeployments/agones-blog/configs/config-1
etag: fHXlfY2MivvPraKyPJEseJF5SqjaBfUrnaWMGT1aCb8
name: projects/gcp-prime/locations/global/gameServerDeployments/agones-blog/rollout
updateTime: '2020-09-01T17:22:25.547699385Z'

With that done, we see in our cluster a deployed Agones fleet complete with the two gameservers we requested:

▶ kubectl get fleet
NAME                         SCHEDULING   DESIRED   CURRENT
fleet-agones-blog-config-1   Packed       2         2

▶ kubectl get gameserver
NAME                                     STATE   ADDRESS         PORT
fleet-agones-blog-config-1-st55c-8gbd2   Ready   34.123.40.127   7212
fleet-agones-blog-config-1-st55c-rnxjh   Ready   34.123.40.127   7839

Why Use GCGS?

Up to this point it might seem like you could just deploy the fleet to your cluster using the Agones CRD, which is entirely correct. The real power of GCGS is in the multi-cluster management of these fleets.

In further exploration, I spun up another cluster, installed Agones, and added that cluster to the realm. When I added the second cluster, I saw that the fleet was deployed to the new cluster as well. This told me that the clusters are now centrally managed by GCGS. I could add and remove clusters from the realm at will, and my deployment of gameservers remained the same. This is a really powerful concept that will make managing a massive deployment of gameservers much easier.
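A toy model of this behavior, as I observed it, might look like the following. It is only a sketch: the Realm class and its methods are hypothetical and say nothing about how GCGS is implemented. The fleet naming pattern is inferred from the kubectl get fleet output above.

```python
class Realm:
    """Toy model of a realm that keeps every member cluster in sync."""

    def __init__(self, name: str):
        self.name = name
        self.clusters = {}   # cluster name -> set of fleet names deployed there
        self.fleets = set()  # fleets currently rolled out to this realm

    def roll_out(self, deployment: str, config: str) -> str:
        """Roll a config out to every cluster in the realm; return the fleet name."""
        # Naming pattern inferred from the output above: fleet-<deployment>-<config>
        fleet = f"fleet-{deployment}-{config}"
        self.fleets.add(fleet)
        for deployed in self.clusters.values():
            deployed.add(fleet)
        return fleet

    def add_cluster(self, cluster: str) -> None:
        """A newly added cluster receives every fleet already rolled out."""
        self.clusters[cluster] = set(self.fleets)

    def remove_cluster(self, cluster: str) -> None:
        """Removing a cluster leaves the rollout and other clusters untouched."""
        self.clusters.pop(cluster, None)
```

The point of the model is that the rollout is attached to the realm rather than to any one cluster, so membership changes never require redeploying by hand.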

Gameserver Allocation

Up to this point, we've seen how we use Agones + GCGS to deploy gameservers to multiple clusters, but how do we actually use these gameservers? We know that each gameserver is in a Ready state, and that each gameserver can receive UDP traffic on a specified port and IP address. Now let's explore another powerful Agones concept: allocation.

You may have noticed that one of the control plane deployments was called agones-allocator. The Helm chart deploys this and a corresponding service by default, but it requires further configuration before you can use it. The setup of the allocation service is beyond the scope of this article, but it is covered in detail in the advanced Agones documentation.

Once the allocator service is configured, you can make a request to the allocator using mTLS and gRPC. This request can specify selectors that will limit the choice of gameservers by label. The response you get back is an IP address and the port of a single gameserver. The really cool part is that this gameserver's status is now changed to Allocated in the Kubernetes API. This means that the allocator service will not be allowed to allocate that gameserver again and that this gameserver can no longer be shut down by Kubernetes in the event that a node is drained. Effectively this pod is now a long-lived service that Agones and Kubernetes will try to keep alive as long as it is still in use.
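The selection logic described here can be sketched in a few lines. This is an in-memory illustration under my own assumptions, not the allocator's gRPC API: the GameServer class, allocate, and can_evict are hypothetical names, and the real service involves mTLS, the Kubernetes API, and more states than shown.

```python
from dataclasses import dataclass, field


@dataclass
class GameServer:
    name: str
    address: str
    port: int
    state: str = "Ready"
    labels: dict = field(default_factory=dict)


def allocate(gameservers, selector):
    """Pick the first Ready gameserver whose labels match every selector entry.

    Marks it Allocated and returns (address, port), or None if nothing matches.
    """
    for gs in gameservers:
        matches = all(gs.labels.get(k) == v for k, v in selector.items())
        if gs.state == "Ready" and matches:
            gs.state = "Allocated"  # cannot be handed out a second time
            return gs.address, gs.port
    return None


def can_evict(gs) -> bool:
    """In this sketch, an Allocated gameserver is protected from eviction."""
    return gs.state != "Allocated"
```

Using the two gameservers from the fleet above, both labeled foo: bar, two successive calls to allocate with the selector {"foo": "bar"} would return ports 7212 and 7839, and a third call would return None because no Ready gameserver remains.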

Additionally, multi-cluster allocation can be set up by GCGS so that any time you make an allocation request to a cluster in the realm, you can be assigned any available gameserver across that realm, in any cluster. This is covered in the GCGS documentation.

In Summary

Agones is an open source platform that provides CRDs and a controller designed to run gameservers in Kubernetes. It's straightforward to get running, and the documentation is great. Adding Google Cloud Game Servers allows you to manage multiple clusters running Agones from a central point, and makes deploying gameservers across all of the clusters much simpler.

Throughout my exploration of these products I grew more and more excited about the potential they have to make running gameservers easier. The roadblocks that I ran into were small, and the Googlers in the Agones public Slack were consistently helpful and informative. It's rare that I explore a new Google product (one that was in beta at the time I first encountered it) and have such a great experience. A huge shout out to the team over there working on Agones.