PubSub: the effect of ordering on latency

In this test, I intend to show the effect that ordering messages in Google Cloud PubSub has on latency. The test will be carried out by attaching a timestamp to each message upon sending it from a Google Cloud Function. The message will then be sent to a Google Cloud PubSub topic with two subscriptions attached. One of these subscriptions will have ordering turned on, hereafter referred to as the ordered subscription, and the other will have ordering turned off, referred to as the unordered subscription.

Once a message arrives in the topic and is present in each subscription, two identical Google Cloud Dataflow jobs will read from each subscription separately. This is done to ensure there is no cross-contamination. Upon reading a message, the Dataflow pipeline will attach the publishTime to it. The two Dataflow jobs each write to a separate table in Google BigQuery, determined by which PubSub subscription they read from, such that the ordered subscription streams into the ordered table in BigQuery and the unordered subscription streams into the unordered table. In this way we have a baseline from the unordered subscription to compare the ordered subscription against.
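The latency measure itself is just simple arithmetic: the publish time minus the send time stamped by the Cloud Function. A quick sketch of the calculation with made-up example values:

```shell
# Made-up example values: send_time is stamped by the Cloud Function,
# publishTime is attached by Dataflow, both in epoch milliseconds.
SEND_TIME=1700000000000
PUBLISH_TIME=1700000001065
echo $((PUBLISH_TIME - SEND_TIME))  # latency in milliseconds
```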

The full diagram of the architecture is shown below.

Architecture

Set up

Google Cloud Function

In order to test the ordered and unordered subscriptions, let's create a Cloud Function using the new Java API for PubSub. This way it can be called over HTTP and will send messages directly to PubSub. Find the Java code for the Google Cloud Function here and the pom.xml with the Java dependencies here.
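Deploying the function can be done from the command line. A sketch of the deploy command is below; the function name and entry point are assumptions, so substitute your own:

```shell
# Hypothetical function name and entry-point class; replace with your own.
gcloud functions deploy publisher \
  --runtime=java11 \
  --trigger-http \
  --entry-point=functions.Publisher \
  --region=europe-west1
```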

Google Cloud PubSub

Create a topic in Google Cloud PubSub for the tests, call it “test”.
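This can be done through the UI, or with a single gcloud command:

```shell
# Create the topic that both subscriptions will attach to.
gcloud pubsub topics create test
```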

Create the subscriptions that the test will use; here are the gcloud commands. It's also possible to create them through the UI.

gcloud beta pubsub subscriptions create ordered \
  --enable-message-ordering --topic=test

gcloud beta pubsub subscriptions create unordered \
  --topic=test

You should now have something like the below image:

With that Google Cloud PubSub is now set up for the test.

Google BigQuery

Let's create a dataset in Google BigQuery called orderedtest to store our results. This can be done via the console. Once that is done, let's create our two tables, one for each subscription, naming them ordered and unordered.

Create the tables via the console, using the following schema for both tables:
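The dataset and tables can also be created with the bq CLI. The two-column schema below is an assumption, reconstructed from the query used later in the post (send_time as epoch millis from the Cloud Function, event_timestamp as the publish time attached by Dataflow):

```shell
# Assumed schema: send_time (epoch millis) and event_timestamp (publishTime).
bq mk --dataset orderedtest
bq mk --table orderedtest.ordered   send_time:INTEGER,event_timestamp:STRING
bq mk --table orderedtest.unordered send_time:INTEGER,event_timestamp:STRING
```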

Once that is done our dataset should look like this:

Now it's time to create the Google Cloud Dataflow pipelines which will stream the data from our two subscriptions into the tables we just created.

Google Cloud Dataflow

Create a temporary Google Cloud Storage (GCS) bucket for the Dataflow pipelines; see here for the console, or use the gsutil command below:

gsutil mb gs://<yourbucketname>

Once your temporary GCS bucket is created, we can move on to running the two Dataflow pipelines which will read the messages and attach the publish time to them. I had to create a custom Dataflow pipeline in order to do so; find the code for that pipeline here. It should be quite simple to run. Follow the readme and do as follows:

export GOOGLE_APPLICATION_CREDENTIALS="key.json"

export BUCKET_NAME=<your bucket name here>

export PROJECT_ID=<your project name here>

# Build
./gradlew clean shadowJar

# Ordered pipeline
java -jar build/libs/pubsubtobigquery-1.0.0.jar \
 --runner=DataflowRunner --gcpTempLocation=gs://"${BUCKET_NAME}"/ordered \
 --workerZone=europe-west1-b --project="${PROJECT_ID}" \
 --inputSubscription=projects/"${PROJECT_ID}"/subscriptions/ordered \
 --outputTableSpec="${PROJECT_ID}":orderedtest.ordered

# Unordered pipeline
java -jar build/libs/pubsubtobigquery-1.0.0.jar \
 --runner=DataflowRunner --gcpTempLocation=gs://"${BUCKET_NAME}"/unordered \
 --workerZone=europe-west1-b --project="${PROJECT_ID}" \
 --inputSubscription=projects/"${PROJECT_ID}"/subscriptions/unordered \
 --outputTableSpec="${PROJECT_ID}":orderedtest.unordered

That should all run nicely and end up looking like this:

Simulate traffic

Now that there are two Dataflow pipelines listening to the two subscriptions, we can use the Cloud Function to send data to the topic, from which it will arrive in both subscriptions. This can be done by triggering the Cloud Function remotely via curl or from the gcloud console. With that, the test is all set up and we are ready to analyze the results.
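A minimal sketch of driving traffic with curl, assuming the function is deployed as an HTTP endpoint (the URL pattern below is hypothetical; substitute your own region, project and function name):

```shell
# Fire off a batch of requests in parallel; adjust the count to taste.
for i in $(seq 1 1000); do
  curl -s "https://europe-west1-<your project name here>.cloudfunctions.net/<your function name>" > /dev/null &
done
wait
```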

Results

With some SQL we can do a little comparison between the two subscriptions in terms of average latency:

SELECT "ordered", 
       AVG(UNIX_MILLIS(CAST(event_timestamp AS TIMESTAMP)) - send_time) as diff 
FROM `<your project name here>.orderedtest.ordered`
UNION ALL
SELECT "unordered", 
       AVG(UNIX_MILLIS(CAST(event_timestamp AS TIMESTAMP)) - send_time) as diff 
FROM `<your project name here>.orderedtest.unordered`

Which yields:

1	unordered 1065.77
2	ordered 1065.77

As you can see, this clearly demonstrates that there is no discernible effect on latency. This test was carried out at about 1,000 messages (of 1 KB each) per second.

I hope you have enjoyed this post. Please follow our RSS feed for more.