Running CockroachDB in Production with Nomad
Those who know me know I have been a huge fan of HashiCorp Nomad for the past year or so. I have also been open about my view that the tool's sweet spot is not scheduling containers; that ship has sailed in the very fashionable direction of Kubernetes. Instead, I have been using Nomad in a similar vein to how others supervise their services with tooling like systemd.
I needed to deploy CockroachDB to support a number of applications. This could have been achieved with a configuration management tool and systemd, but it seemed like a very good use case for Nomad: it can schedule applications across a number of hosts, it will replace an instance that has died, and it supports rolling updates to help keep things highly available. So what was the worst that could happen?
This post assumes that you already have a Nomad cluster running. If you need help with that, have a read through the Nomad documentation.
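If you just want to experiment locally first, a single development agent (server and client in one process) is enough to follow along, though it is not something to run in production:

nomad agent -dev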
A Nomad job file can be written in either HCL or JSON and is broken into a number of sections. The hierarchy of a job looks as follows:
- job
  - group
    - task
    - task
  - group
The TL;DR is that a job is the globally unique unit of work you submit to Nomad. A job contains one or more groups, where a group defines a set of tasks that must be co-located on the same Nomad client, and a task is an individual unit of work, e.g. running a service or a container.
job "cockroach-db-cluster" {
...
}
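Putting those pieces together, the skeleton of the job described in the rest of this post looks like this:

job "cockroach-db-cluster" {
  group "db-cluster" {
    task "cockroach-cluster" {
      ...
    }
  }
}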
Job
The only parts of my Nomad job that sit outside the group are as follows:
datacenters = ["dc1"]
type = "service"
update {
health_check = "task_states"
max_parallel = 1
stagger = "12s"
}
group "db-cluster" {
...
}
This specifies that the job is a service that will run in the dc1 datacenter. It also describes the update strategy for the job: Nomad will update a single allocation at a time, an allocation is considered healthy (via task_states) when all of its tasks are running, and allocation changes are staggered by 12 seconds, slightly less than the default of 30 seconds.
Group
A group is where you set constraints and the number of allocations for a set of tasks. My group is made up of the following:
ephemeral_disk {
  sticky = true
}

count = 3

constraint {
  distinct_hosts = true
}

task "cockroach-cluster" {
  ...
}
As this is a database, I would rather not wait for Nomad to copy data around the cluster when it needs to reschedule an allocation. Setting the ephemeral_disk stanza with sticky = true means Nomad will make a best-effort attempt to place the replacement allocation on the same machine, so the local/ and alloc/data directories only need to be moved to the new allocation's folder on the same disk.
We then tell Nomad that we want three allocations of the group and constrain them so that no two run on the same host. This spreads the database across the cluster and helps keep it highly available.
Task
The task is where all the detail of how to actually run CockroachDB lives. This is a fairly complex example because it is linked to Vault to get certificates for a TLS-secured CockroachDB cluster. I will try to make it clear which parts are optional and which are not.
driver = "raw_exec"
artifact {
source = "http://mybucket.s3-us-west-2.amazonaws.com/cockroach-linux-amd64.tar.gz"
}
vault {
policies = ["nomad-pki"]
change_mode = "signal"
change_signal = "SIGUSR1"
}
template {
data = <<EOH
{{ with secret "mypki/issue/crdb" "ttl=720h" "common_name=node" "ip_sans=192.168.0.1,192.168.0.2,192.168.0.3,127.0.0.1" "alt_names=db-cluster.service.consul" "format=pem" }}
{{.Data | toJSON }}
{{ end }}
EOH
destination = "local/bundle.json"
change_mode = "signal"
change_signal = "SIGHUP"
splay = "10m"
}
template {
left_delimiter = "(("
right_delimiter = "))"
data = <<EOH
{{- printf "%s\n" (datasource "bundle").private_key -}}
EOH
destination = "local/node.key.tmpl"
perms = "600"
change_mode = "noop"
}
template {
left_delimiter = "(("
right_delimiter = "))"
data = <<EOH
{{- printf "%s\n" (datasource "bundle").certificate -}}
EOH
destination = "local/node.crt.tmpl"
perms = "644"
change_mode = "noop"
}
template {
left_delimiter = "(("
right_delimiter = "))"
data = <<EOH
{{- range $index, $value := (datasource "bundle").ca_chain -}}
{{- printf "%s\n" $value -}}
{{- end -}}
EOH
destination = "local/ca.crt.tmpl"
perms = "644"
change_mode = "noop"
}
config {
command = "gomplate"
args = [
"-d",
"bundle=file://${NOMAD_TASK_DIR}/bundle.json?type=application/json",
"-f",
"local/ca.crt.tmpl",
"-o",
"local/ca.crt",
"-f",
"local/node.crt.tmpl",
"-o",
"local/node.crt",
"-f",
"local/node.key.tmpl",
"-o",
"local/node.key",
"--",
"${NOMAD_TASK_DIR}/cockroach",
"start",
"--certs-dir=${NOMAD_TASK_DIR}",
"--join=192.168.0.1:26257,192.168.0.2:26257,192.168.0.3:26257",
"--cache=.25",
"--max-sql-memory=.25",
"--store=${NOMAD_TASK_DIR}/data,size=90%",
"--logtostderr=INFO",
]
}
service {
name = "${TASKGROUP}"
}
Firstly, I am specifying a driver of raw_exec. This driver is disabled by default in Nomad, so you will need to enable it before this approach will work. The reason I chose raw_exec is that I needed to wrap my command with gomplate to generate the correct certificate files, and gomplate was available on the system PATH, so the task needed to be able to call it directly.
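As a sketch, enabling raw_exec on each client looks roughly like the following in the Nomad client configuration (this is the older client options form; newer Nomad releases enable it through a plugin "raw_exec" block instead):

client {
  options = {
    "driver.raw_exec.enable" = "1"
  }
}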
Next there is an artifact stanza. This tells Nomad where to download the CockroachDB binary from.
Next, I have a vault stanza. This gives the task a Vault token with the nomad-pki policy attached, which is what the templates below use to request the certificates needed to run the cluster with TLS. If you don't need to run CockroachDB with TLS, you can omit this part.
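For reference, the nomad-pki policy attached to that token needs to allow issuing certificates from the PKI role used in the templates. A minimal sketch of such a Vault policy, assuming the same mypki/issue/crdb path, might look like:

# hypothetical contents of the "nomad-pki" policy
path "mypki/issue/crdb" {
  capabilities = ["create", "update"]
}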
The template stanzas let me acquire a certificate bundle from my PKI backend and then split it into its separate parts, because CockroachDB requires a distinct certificate, CA and key; at the time of writing it does not support being passed a single certificate bundle. Again, if you don't need to run CockroachDB with TLS, you can omit this part.
Next is the config stanza. This is the configuration passed to the task driver so that the task can be run. In this case, because I needed to put the correct certificates in the correct place, I execute the gomplate command first. The -- at the end of the args means that, once the templates are rendered, gomplate hands off to the cockroach binary. Notice that I had to give the full path to that binary, because the command being executed (gomplate) was found on the PATH rather than in the task directory.
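Once Nomad has interpolated ${NOMAD_TASK_DIR}, the task is roughly equivalent to running the following: gomplate renders the three certificate files from the bundle and then runs cockroach start:

gomplate \
  -d "bundle=file://${NOMAD_TASK_DIR}/bundle.json?type=application/json" \
  -f local/ca.crt.tmpl -o local/ca.crt \
  -f local/node.crt.tmpl -o local/node.crt \
  -f local/node.key.tmpl -o local/node.key \
  -- "${NOMAD_TASK_DIR}/cockroach" start \
     --certs-dir="${NOMAD_TASK_DIR}" \
     --join=192.168.0.1:26257,192.168.0.2:26257,192.168.0.3:26257 \
     --cache=.25 --max-sql-memory=.25 \
     --store="${NOMAD_TASK_DIR}/data,size=90%" \
     --logtostderr=INFO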
Lastly, I have a service stanza. This instructs Nomad to register a service in HashiCorp Consul, using the ${TASKGROUP} variable as the name; in this case, that value would be “db-cluster”.
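Once the allocations are running, you should be able to find the nodes through Consul. For example, assuming Consul's DNS interface is listening on its default port of 8600:

dig @127.0.0.1 -p 8600 db-cluster.service.consul SRV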
If you wanted to run the same task without talking to Vault for TLS certificates, it would look as follows:
driver = "exec"
artifact {
source = "http://mybucket.s3-us-west-2.amazonaws.com/cockroach-linux-amd64.tar.gz"
}
config {
command = "cockroach"
args = [
"start",
"–-insecure",
"--join=192.168.0.1:26257,192.168.0.2:26257,192.168.0.3:26257",
"--cache=.25",
"--max-sql-memory=.25",
"--store=${NOMAD_TASK_DIR}/data,size=90%",
"--logtostderr=INFO",
]
}
service {
name = "${TASKGROUP}"
}
This version has far fewer moving parts, and because gomplate is no longer needed from the host, the driver can move back to exec.
Overall, the entire template looks as follows:
job "cockroach-db-cluster" {
datacenters = ["dc1"]
type = "service"
update {
health_check = "task_states"
max_parallel = 1
stagger = "12s"
}
group "db-cluster" {
ephemeral_disk {
sticky = true
}
count = 3
constraint {
distinct_hosts = true
}
task "cockroach-cluster" {
driver = "raw_exec"
artifact {
source = "http://mybucket.s3-us-west-2.amazonaws.com/cockroach-linux-amd64.tar.gz"
}
vault {
policies = ["nomad-pki"]
change_mode = "signal"
change_signal = "SIGUSR1"
}
template {
data = <<EOH
{{ with secret "mypki/issue/crdb" "ttl=720h" "common_name=node" "ip_sans=192.168.0.1,192.168.0.2,192.168.0.3,127.0.0.1" "alt_names=db-cluster.service.consul" "format=pem" }}
{{.Data | toJSON }}
{{ end }}
EOH
destination = "local/bundle.json"
change_mode = "signal"
change_signal = "SIGHUP"
splay = "10m"
}
template {
left_delimiter = "(("
right_delimiter = "))"
data = <<EOH
{{- printf "%s\n" (datasource "bundle").private_key -}}
EOH
destination = "local/node.key.tmpl"
perms = "600"
change_mode = "noop"
}
template {
left_delimiter = "(("
right_delimiter = "))"
data = <<EOH
{{- printf "%s\n" (datasource "bundle").certificate -}}
EOH
destination = "local/node.crt.tmpl"
perms = "644"
change_mode = "noop"
}
template {
left_delimiter = "(("
right_delimiter = "))"
data = <<EOH
{{- range $index, $value := (datasource "bundle").ca_chain -}}
{{- printf "%s\n" $value -}}
{{- end -}}
EOH
destination = "local/ca.crt.tmpl"
perms = "644"
change_mode = "noop"
}
config {
command = "gomplate"
args = [
"-d",
"bundle=file://${NOMAD_TASK_DIR}/bundle.json?type=application/json",
"-f",
"local/ca.crt.tmpl",
"-o",
"local/ca.crt",
"-f",
"local/node.crt.tmpl",
"-o",
"local/node.crt",
"-f",
"local/node.key.tmpl",
"-o",
"local/node.key",
"--",
"${NOMAD_TASK_DIR}/cockroach",
"start",
"--certs-dir=${NOMAD_TASK_DIR}",
"--join=192.168.0.1:26257,192.168.0.2:26257,192.168.0.3:26257",
"--cache=.25",
"--max-sql-memory=.25",
"--store=${NOMAD_TASK_DIR}/data,size=90%",
"--logtostderr=INFO",
]
}
service {
name = "${TASKGROUP}"
}
}
}
}
You can then follow HashiCorp's documentation on running a job to submit it, and Nomad will schedule the correct number of instances of the database cluster.
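Assuming the job above is saved as cockroach-db-cluster.nomad, submitting it and checking on its allocations boils down to:

nomad run cockroach-db-cluster.nomad
nomad status cockroach-db-cluster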