CouchbaseCluster Reference Architecture
How to configure a reference production deployment of Couchbase Server.
Couchbase clusters can be configured in many different ways. We advertise features as opt-in, so you have the freedom to configure your cluster as suits your environment. There is, however, a core set of best practices that we recommend.
This page aggregates those best practices into a single architecture. While it may not suit your environment completely, it can form the basis of your clusters.
| The majority of this page can be copied and used verbatim. Elements of the configuration that are specific to this example, and so may need adapting to your environment, are highlighted using admonitions such as this one. | 
Prerequisites
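The reference architecture makes use of third-party resources for ease of management and security compliance. Before continuing, ensure that Jetstack cert-manager, which provides the Issuer and Certificate resources used in the Security Management section, is installed on your Kubernetes cluster.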
RBAC Management
Role-based access control should be managed by the Operator. This allows your Couchbase users and groups to be defined in code. Code can be managed with a change control system, easily audited and reviewed, and, crucially, automated.
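As a sketch of how narrowly a group can be scoped when it is defined in code, the following hypothetical CouchbaseGroup grants only data read and write roles on a single bucket, rather than the broader bucket_full_access role used by group1 on this page. The group name is an assumption made for illustration; it is not part of the reference architecture, and whether these two roles are sufficient depends on what your application actually does.

```yaml
# Hypothetical least-privilege group, for illustration only.
# It carries the same cluster label as the other RBAC resources so that
# it is only picked up by cluster1.
apiVersion: couchbase.com/v2
kind: CouchbaseGroup
metadata:
  name: group1-readwrite
  labels:
    cluster: cluster1
spec:
  roles:
  # Read documents in bucket1.
  - name: data_reader
    bucket: bucket1
  # Write documents in bucket1.
  - name: data_writer
    bucket: bucket1
```

A CouchbaseRoleBinding would then reference this group in its roleRef in the same way that group1 is referenced.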
# Applications and the Operator are able to use this secret.
# For security, normal users within the namespace should be prohibited
# from access to Secrets.  It is up to the administrator to also
# ensure the passwords are not leaked from the application.
apiVersion: v1
kind: Secret
metadata:
  name: application1-authentication
type: Opaque
stringData:
  password: pieWrewn5knyk&
---
# Applications should have a user that they can use, this allows
# strong guarantees of safety when using RBAC.  While strictly not
# necessary, the labels provide filtering so resources are only
# picked up by specific clusters.
apiVersion: couchbase.com/v2
kind: CouchbaseUser
metadata:
  name: application1
  labels:
    cluster: cluster1
spec:
  authDomain: local
  authSecret: application1-authentication
---
# Groups control what the application is allowed to do.  The administrator
# should limit group permissions to only what is absolutely necessary to
# allow the application to function.  It also restricts those permissions
# to a specific bucket that the application needs to access. While strictly not
# necessary, the labels provide filtering so resources are only
# picked up by specific clusters.
apiVersion: couchbase.com/v2
kind: CouchbaseGroup
metadata:
  name: group1
  labels:
    cluster: cluster1
spec:
  roles:
  - bucket: bucket1
    name: bucket_full_access
---
# Role bindings are a bit of a misnomer.  They follow the behaviour of standard
# Kubernetes role bindings by creating a relationship between users and groups.
apiVersion: couchbase.com/v2
kind: CouchbaseRoleBinding
metadata:
  name: group1
  labels:
    cluster: cluster1
spec:
  roleRef:
    kind: CouchbaseGroup
    name: group1
  subjects:
  - kind: CouchbaseUser
    name: application1
Bucket Management
Buckets, like RBAC, should be managed by the Operator. Again, this provides change control, peer review and auditing. This is essential for data containers such as buckets: they have quotas, which should be centrally controlled and managed to prevent resource starvation through over-provisioning.
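As a rough illustration of why quotas need central control, consider adding a second bucket to the same cluster. Bucket memory quotas are allocated per data node, and their sum has to fit within the cluster's dataServiceMemoryQuota, which is 1Gi in this architecture. The bucket name and sizes below are hypothetical and exist only to show the arithmetic.

```yaml
# Hypothetical second bucket, shown only to illustrate quota budgeting.
# With dataServiceMemoryQuota set to 1Gi (1024Mi) per data node:
#   bucket1 (100Mi) + bucket2 (200Mi) = 300Mi allocated
#   1024Mi - 300Mi = 724Mi left for future buckets
apiVersion: couchbase.com/v2
kind: CouchbaseBucket
metadata:
  name: bucket2
  labels:
    cluster: cluster1
spec:
  memoryQuota: 200Mi
  replicas: 1
```

Keeping every bucket definition in the same reviewed repository makes it straightforward to check this sum before a change is applied.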
# Buckets control the amount of data an application can use, and control how that
# data is managed.  We need at least one data replica to ensure applications can
# continue to work when a pod is evicted (usually for the purposes of Kubernetes
# rolling upgrades, which should be carried out once every 3-4 months to ensure that
# you are up to date with the latest security fixes from both Kubernetes and the
# operator).
apiVersion: couchbase.com/v2
kind: CouchbaseBucket
metadata:
  name: bucket1
  labels:
    cluster: cluster1
spec:
  memoryQuota: 100Mi
  replicas: 2
  ioPriority: high
  enableIndexReplica: true
Security Management
Security is extremely important when working in a cloud-based environment such as Kubernetes. To this end, we use TLS management by default to protect data from eavesdroppers.
TLS is managed by third-party tooling rather than manually. This simplifies the process by removing manual steps, and secures it through policy-based certificate rotation.
| The TLS subject alternative names described in this section contain the namespace of the cluster.
In this example, that namespace is default. | 
# Admin password is required by the Operator.  Like user secrets, this should be
# protected from unauthorized reads.
apiVersion: v1
kind: Secret
metadata:
  name: administrator-authentication
type: Opaque
stringData:
  username: Administrator
  password: Berv~fradrics3
---
# Creating the CA used to issue and sign server certificates is still a manual step.
# This should be distributed to all clients who are going to consume this database
# instance.  This CA is fixed for the lifetime of the cluster, so must be kept secure,
# but does mean that clients will continue to function even when the server certificates
# are automatically rotated periodically.
apiVersion: v1
kind: Secret
metadata:
  name: ca
data:
  tls.key: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcEFJQkFBS0NBUUVBODNvSHArVnNDSGs4UEtsQTJGc1FyNk1yQjhoenpMUjZVTjJtTVdtSFpqWDFIem5WCm1ZYkVSQnhHQmVjbVVHRlRQZWo3Z3Y0NDl5QlV2ZXFIVCtDSjhubHRSbWxFZ24vV3NTYzZUYUl1UWRnRHdreVEKVEZ4UjkvR3JlNEY2M1BwcGNLNFpLQ3FtNjZ5Sk5qTktUQ2hBZEJsdFJyT0hqcDB2TTJrY01JTFl2VDFVVkQxUwpLRDRSQTl3R01XajJTOUQ4eDVnRjFOZHljYndPRWJmRzY3bTVZcFkrcmlIdWEyMHpoamJiZk5RbzN1SVdCeCtmCkVhTmNYOE1MTS9Jc2tVYzFEbTdiK3djaE9ZY2FFbWhIUXhYcHRqMDhCRUN1Z3ZXL3FLTmtjaUI2cE9aYzdlNGUKcVFuRVEvN1hJZ0t4TURUVkJTb1RqMWVJby9sQXhGVS8yYWlHWHdJREFRQUJBb0lCQVFDdGE4WDRPTmx5VDZndwpMUDRiSFFJTm1GTVdBQms3UFhIQ0Y1NUFvOEhsYzVsYzNIemdGYlhHTGIxU2h3b3JScWRiK1k3c0J0ZmNiaEx1CkV4YStObGtMZEtINC9SSG5RZGRSNTNjSHhQVGR3VmNzRmd6UjF4QXJZdCtaNE9mNmJnS2NWK1ZqVHI0R0w2YXMKREd4blFtUm1UWllnUGMvWUxPMXAyUHhUTVYvZnFXTzI3TW9jZml4bUs2MUl3V010bXd0QWNXc1FQZmlhWlRpMApSQlR0TzRyVHV5VnZESURjRlh1ZHVUaHlGNW5KTENsTERiazU5eFIxWlFLTTVoQjcvdjN3LzFneW1FbFNoY1hjClU2ME5TRDFBOTBmV1VMUkxzZXE5RXlyT3U3bGdrNDZBWmpxaGt0NDdqZkM4K3JuR0dUNnh1MWtkUjRJa0FBcnoKM1lzZm1JZUJBb0dCQVBxcjBsckV6R1FqYlk2RjVrTjJGeXVvb0txMFMrakNkRUlONFB1NmhjTnpXazFlTzA5agpBTWJFWlBxVFhkYnF6TlVMOTlkLytkL1JqcXR4dTRXbzZPWHhZSnNKNVU1bVROandVcVFKL3Y1am1ndjVvenlvCjY0RFFQdjE2aG1qSXhwcHRJd25EWC9KL2trdnlPZlVuQzY4dnZEQ3hxQkVHbDZ3UVVJWE12aE01QW9HQkFQaW4KRGhUR3hYZmFvOHhkWVRPbTdFUXVZa3l2Q2h0NUp5Tm05MWRRNEhnOTZPMmRYc3BWSDNrUllSYzFjMFdJZVM3MApHT0lSOGcrNXQvNXYvZVQ2R0pWOGMyc3VZRjErS1pTY3Z0bFFPYjJacitnUEN4SEtDVDY0SE83MnNFVGNYRGpaCjZ1cThiN0JCdnkxZXFtR1FsS2VZSUpEb1RKL0lpbGhFdGFFRHVlNVhBb0dBQWdJUVhGUEpRMkFaUjVRQkJUZFQKOWpDU29PdHkxRG1DanVqbmpYeXdCNkhMN21TNzJ1WHpJcVIrSHBmQm43QWYxZkVUbWpGWFFoaStxTmJ2WnFHMAp3K3JNR0ZIYStXYk9aTXFBRHZwWmhaWXNyTDNpTmVFd2ljYWhTb3lKdVJzcXBDQU5zTTFVM205eEw1U1FMRXVVCngyRjlnM0pZNDFJSE13U3FjSGYwYWRrQ2dZQXFTZThoSlhVc0h5bEFkcGt6ZWE0eElscGhoRnVKdEo4dGJET2cKekFhQkxMWlN3ekw5NG1CSjdPVEFWN3pWRkpMWG8zZ2Y2c0ZxWDBHbHFsSmFBUmJ4UllzenJWMkNTUlMxUzd0QgpwbDFMbTduSkU5WGtIcUpYNG1RNVdBYytqdU80WDRlT2lLSE9Ma0JmYlB3NVA2ZW9vVHpZcUVsdjIyRjhCYU9HClVPWHNYUUtCZ1FEV0xHcThpeWRXK2FkK1k4blZxV
1VUZnl4VW5yaW5TVG95ei9nT3dDZ2ZoS2FSdGpYclEvUmEKSVlTczNjR016SldwZXZkWGRDc2JSL0I2L09VSSsyTlRqcEo4c3FVZnhjbHpMeU4xQjlQUFhGeDF5WTV5ZDFJZApVQnRLTG5KRjBSQlRzWnNmVG0vUUpJeGY3Y2ZWUGRlTGcxMEdJeFhzaG5GV0FLUHhveUhSeGc9PQotLS0tLUVORCBSU0EgUFJJVkFURSBLRVktLS0tLQo=
  tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURTekNDQWpPZ0F3SUJBZ0lVQ1ptem1IQjI0cUFwUnY3WnpLUGFJclFHam5Nd0RRWUpLb1pJaHZjTkFRRUwKQlFBd0ZqRVVNQklHQTFVRUF3d0xSV0Z6ZVMxU1UwRWdRMEV3SGhjTk1qQXhNakUzTVRJMU9ESXlXaGNOTXpBeApNakUxTVRJMU9ESXlXakFXTVJRd0VnWURWUVFEREF0RllYTjVMVkpUUVNCRFFUQ0NBU0l3RFFZSktvWklodmNOCkFRRUJCUUFEZ2dFUEFEQ0NBUW9DZ2dFQkFQTjZCNmZsYkFoNVBEeXBRTmhiRUsrakt3ZkljOHkwZWxEZHBqRnAKaDJZMTlSODUxWm1HeEVRY1JnWG5KbEJoVXozbys0TCtPUGNnVkwzcWgwL2dpZko1YlVacFJJSi8xckVuT2syaQpMa0hZQThKTWtFeGNVZmZ4cTN1QmV0ejZhWEN1R1NncXB1dXNpVFl6U2t3b1FIUVpiVWF6aDQ2ZEx6TnBIRENDCjJMMDlWRlE5VWlnK0VRUGNCakZvOWt2US9NZVlCZFRYY25HOERoRzN4dXU1dVdLV1BxNGg3bXR0TTRZMjIzelUKS043aUZnY2ZueEdqWEYvREN6UHlMSkZITlE1dTIvc0hJVG1IR2hKb1IwTVY2Ylk5UEFSQXJvTDF2NmlqWkhJZwplcVRtWE8zdUhxa0p4RVArMXlJQ3NUQTAxUVVxRTQ5WGlLUDVRTVJWUDltb2hsOENBd0VBQWFPQmtEQ0JqVEFkCkJnTlZIUTRFRmdRVWdOdmtvVkY2dW5CSFFrS0p6Mzk4ZlJSUUs2VXdVUVlEVlIwakJFb3dTSUFVZ052a29WRjYKdW5CSFFrS0p6Mzk4ZlJSUUs2V2hHcVFZTUJZeEZEQVNCZ05WQkFNTUMwVmhjM2t0VWxOQklFTkJnaFFKbWJPWQpjSGJpb0NsRy90bk1vOW9pdEFhT2N6QU1CZ05WSFJNRUJUQURBUUgvTUFzR0ExVWREd1FFQXdJQkJqQU5CZ2txCmhraUc5dzBCQVFzRkFBT0NBUUVBeW9xakNxa1lJSmQ3dUF5TFRHSnB3cFRLd2JTSTJPcXBRNkVDRUNpZjBaWkYKNHBTT1ArYjg1bzF3VmptZ2wvTW92VXBPYTRxN3NSekcyM052Z0lKTjFzOXlYTURlRTB4TDE3dmpzWVFGNUlGSwo0bFEzTVArSFVLMGprUWthNTBNeFZQUTRDWldrUmV0V2d6M2l0bk8zcFVLbGc3bWpxV1hVc3dUYkw1S01PQzZ0CnBWZFBsRzRPaWJIa004czRrNzJhb1ovdzRaMStsVUpORXRNQldkMnI4LytlcEpMOUp2dXN0cGlPcnYvVkF5eC8KWW9lcjRTZitxWDEvand0ZWNHWFFmNHFKMkJwL2h6a3F4YzNETFFiSGFxaEZGQ3o2T2pWRGxzc0tqNjEzeEJoTwpYbkJNT1NCbVFEMkxoZWlVcC9aN1E5ZlovdDE2WW9NZ0tHSngwOUl3VlE9PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
---
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: ca
spec:
  ca:
    secretName: ca
---
# Certificates are generated with Jetstack Cert Manager.  This simplifies
# configuration by keeping it as code, avoiding having to use openssl (or
# similar) directly.  By using Cert Manager we can readily demonstrate to
# security auditors that certificates are rotated on a periodic basis and also
# conform to encryption strength constraints.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: cluster1-certificate
spec:
  secretName: cluster1-server-tls
  duration: 720h
  renewBefore: 24h
  commonName: couchbase-server
  isCA: false
  privateKey:
    algorithm: RSA
    encoding: PKCS8
    size: 2048
  usages:
    - server auth
  dnsNames:
    - "*.cluster1"
    - "*.cluster1.default"
    - "*.cluster1.default.svc"
    - "*.cluster1.default.svc.cluster.local"
    - "cluster1-srv"
    - "cluster1-srv.default"
    - "cluster1-srv.default.svc"
    - "*.cluster1-srv.default.svc.cluster.local"
    - localhost
  issuerRef:
    name: ca
    group: cert-manager.io
    kind: Issuer
Cluster Management
Cluster configuration is quite complicated, so it is not discussed at length here. Instead, comments are provided inline, where they are contextually relevant.
In general, the cluster is designed to be stable, fault tolerant and secure.
| Cluster scheduling requires Kubernetes nodes to be manually labeled for exclusive use by the Couchbase cluster. An example of how to perform this is documented in the cluster definition’s comments. | 
| There is no one-size-fits-all cluster topology. This documents an arbitrary selection of services and server class sizes. Consult Couchbase solutions engineering to determine the correct cluster sizing for your workload. | 
apiVersion: couchbase.com/v2
kind: CouchbaseCluster
metadata:
  name: cluster1
spec:
  image: couchbase/server:7.6.0
  # Always enable anti-affinity to limit the "blast radius", and also ensure that any
  # assumptions about data replication hold i.e. a Kubernetes node going down will only
  # affect at most one pod.
  antiAffinity: true
  # Always select RBAC rules based on a label to prevent unexpectedly picking up
  # unlabelled resources created in this namespace.
  security:
    adminSecret: administrator-authentication
    rbac:
      managed: true
      selector:
        matchLabels:
          cluster: cluster1
  # Always select buckets based on a label to prevent unexpectedly picking up
  # unlabelled resources created in this namespace.
  buckets:
    managed: true
    selector:
      matchLabels:
        cluster: cluster1
  cluster:
    # Each service will be on its own pod, and each pod will be on its own node.
    dataServiceMemoryQuota: 1Gi
    indexServiceMemoryQuota: 1Gi
    queryServiceMemoryQuota: 1Gi
    # Fast auto-failover ensures that replica data becomes live quickly and
    # minimises impact for applications in the face of trouble or upgrades.
    autoFailoverTimeout: 5s
    autoFailoverOnDataDiskIssues: true
    autoFailoverOnDataDiskIssuesTimePeriod: 5s
  # Auto resource allocation takes the memory quotas defined in the cluster
  # section and applies them to pods in the various server classes we will
  # define in the servers section.  This manifests itself as Kubernetes
  # resource requests that ensure fair scheduling of pods across your
  # Kubernetes cluster.
  autoResourceAllocation:
    enabled: true
  # Enable managed TLS to protect all data from eavesdropping.
  networking:
    tls:
      secretSource:
        serverSecretName: cluster1-server-tls
  # Each server class will have its own storage template, this allows independent
  # scaling as the need arises, and also minimises the number of pods that are
  # affected by a particular change.  Do not under-provision storage; use the
  # high-performance solid-state variety.
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      storageClassName: premium-rwo
      resources:
        requests:
          storage: 1Gi
  - metadata:
      name: index
    spec:
      storageClassName: premium-rwo
      resources:
        requests:
          storage: 1Gi
  - metadata:
      name: query
    spec:
      storageClassName: premium-rwo
      resources:
        requests:
          storage: 1Gi
  # Each service is hosted on its own set of pods.  This facilitates simple independent
  # scaling of services, simplifies memory allocation and reduces blast radius, affecting
  # only a single service at a time.
  servers:
  - name: data
    size: 3
    services:
    - data
    volumeMounts:
      default: data
    pod:
      spec:
        # By tainting all the nodes we intend to use, we ensure no other pods
        # are running on them, and we get exclusive use (no noisy neighbours).
        # Example:
        #   for i in gke-cluster-default-pool-94815a4b-jkhv \
        #            gke-cluster-default-pool-94815a4b-phl1 \
        #            gke-cluster-default-pool-94815a4b-w4bl \
        #            gke-cluster-default-pool-c9efc654-8krv \
        #            gke-cluster-default-pool-c9efc654-kt07 \
        #            gke-cluster-default-pool-c9efc654-kt5r; do \
        #     kubectl taint nodes $i application-specific=couchbase-server:NoExecute
        #   done
        tolerations:
        - key: application-specific
          value: couchbase-server
          effect: NoExecute
        # By selecting only nodes labeled for our use, we don't run where we
        # should not.
        # Example:
        #   for i in gke-cluster-default-pool-94815a4b-jkhv \
        #            gke-cluster-default-pool-94815a4b-phl1 \
        #            gke-cluster-default-pool-94815a4b-w4bl \
        #            gke-cluster-default-pool-c9efc654-8krv \
        #            gke-cluster-default-pool-c9efc654-kt07 \
        #            gke-cluster-default-pool-c9efc654-kt5r; do \
        #     kubectl label nodes $i application=couchbase-server
        #   done
        nodeSelector:
          application: couchbase-server
  - name: index
    size: 2
    services:
    - index
    volumeMounts:
      default: index
    pod:
      spec:
        tolerations:
        - key: application-specific
          value: couchbase-server
          effect: NoExecute
        nodeSelector:
          application: couchbase-server
  - name: query
    size: 1
    services:
    - query
    volumeMounts:
      default: query
    pod:
      spec:
        tolerations:
        - key: application-specific
          value: couchbase-server
          effect: NoExecute
        nodeSelector:
          application: couchbase-server
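To show how the pieces above fit together from a client's point of view, the following is a minimal sketch of an application workload, under stated assumptions. The Deployment name, container image, environment variable names and mount path are hypothetical and not part of the reference architecture; only the secret names, the user name and the cluster1-srv service name come from the manifests on this page. It also assumes the application's SDK accepts a couchbases:// connection string and can be pointed at a CA certificate file.

```yaml
# Hypothetical client workload, for illustration only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: application1
spec:
  replicas: 1
  selector:
    matchLabels:
      app: application1
  template:
    metadata:
      labels:
        app: application1
    spec:
      containers:
      - name: application1
        image: example.com/application1:latest  # placeholder image
        env:
        # TLS-enabled connection string.  The host name is one of the
        # subject alternative names on the server certificate.
        - name: COUCHBASE_CONNECTION_STRING
          value: couchbases://cluster1-srv.default.svc
        # The CouchbaseUser defined in the RBAC section.
        - name: COUCHBASE_USERNAME
          value: application1
        # The password is read from the same secret the Operator uses.
        - name: COUCHBASE_PASSWORD
          valueFrom:
            secretKeyRef:
              name: application1-authentication
              key: password
        volumeMounts:
        - name: couchbase-ca
          mountPath: /etc/couchbase/ca
          readOnly: true
      volumes:
      # Project only the CA certificate into the pod.  The CA private key
      # held in the same secret is deliberately not mounted.
      - name: couchbase-ca
        secret:
          secretName: ca
          items:
          - key: tls.crt
            path: ca.crt
```

Mounting only tls.crt keeps the CA private key away from clients, in line with the comment on the CA secret. Because this pod carries no toleration for the application-specific taint, it is not scheduled onto the nodes reserved for Couchbase Server.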