CouchbaseCluster Reference Architecture

      How to configure a reference production deployment of Couchbase Server.

      Couchbase clusters can be configured in many different ways. Features are opt-in, so you are free to configure your cluster to suit your environment. There is, however, a core set of best practices that we recommend.

      This page collects those best practices into a single architecture. While it may not suit your environment completely, it can form the basis of your clusters.

      The majority of this page can be copied and used verbatim without modification. Non-dynamic elements of the configuration will be highlighted using admonitions such as this one.

      Prerequisites

      The reference architecture makes use of third-party resources for ease of management and security compliance. Before continuing, ensure you have installed the following on your Kubernetes cluster:

      RBAC Management

      Role-based access control should be managed by the Operator. This allows your Couchbase users and groups to be defined in code. Code can be controlled with a change control system, is easily audited and reviewed, and, crucially, can be automated.

      # Applications and the Operator are able to use this secret.
      # For security, normal users within the namespace should be prohibited
      # from access to Secrets.  It is up to the administrator to also
      # ensure the passwords are not leaked from the application.
      apiVersion: v1
      kind: Secret
      metadata:
        name: application1-authentication
      type: Opaque
      stringData:
        password: pieWrewn5knyk&
      ---
      # Applications should have a user that they can use; this allows
      # strong guarantees of safety when using RBAC.  While not strictly
      # necessary, the labels provide filtering so resources are only
      # picked up by specific clusters.
      apiVersion: couchbase.com/v2
      kind: CouchbaseUser
      metadata:
        name: application1
        labels:
          cluster: cluster1
      spec:
        authDomain: local
        authSecret: application1-authentication
      ---
      # Groups control what the application is allowed to do.  The administrator
      # should limit group permissions to only what is absolutely necessary to
      # allow the application to function.  It also restricts those permissions
      # to a specific bucket that the application needs to access. While strictly not
      # necessary, the labels provide filtering so resources are only
      # picked up by specific clusters.
      apiVersion: couchbase.com/v2
      kind: CouchbaseGroup
      metadata:
        name: group1
        labels:
          cluster: cluster1
      spec:
        roles:
        - bucket: bucket1
          name: bucket_full_access
      ---
      # Role bindings are a bit of a misnomer.  They follow the behaviour of standard
      # Kubernetes role bindings by creating a relationship between users and groups.
      apiVersion: couchbase.com/v2
      kind: CouchbaseRoleBinding
      metadata:
        name: group1
        labels:
          cluster: cluster1
      spec:
        roleRef:
          kind: CouchbaseGroup
          name: group1
        subjects:
        - kind: CouchbaseUser
          name: application1
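
      Where an application only needs to read data, the same pattern can express a narrower grant. The following is a minimal sketch rather than part of the reference architecture: the group name group1-readonly is hypothetical, and it assumes the data_reader role is accepted by the CouchbaseGroup resource in your Operator version.

      ```yaml
      # Hypothetical read-only group (not part of the reference architecture).
      # Assumes the data_reader role name is supported by your Operator version.
      apiVersion: couchbase.com/v2
      kind: CouchbaseGroup
      metadata:
        name: group1-readonly
        labels:
          cluster: cluster1
      spec:
        roles:
        - bucket: bucket1
          name: data_reader
      ```

      A CouchbaseRoleBinding like the one above would then name this group in its roleRef in place of group1.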

      Bucket Management

      Buckets, like RBAC, should be managed by the Operator. Again, this provides change control, peer review and auditing. This is essential for data containers, like buckets, as they have quotas, and these should be centrally controlled and managed to prevent resource starvation by over-provisioning.

      # Buckets control the amount of data an application can use, and control how that
      # data is managed.  We need at least one data replica to ensure applications can
      # continue to work when a pod is evicted (usually for the purposes of Kubernetes
      # rolling upgrades, which should be carried out once every 3-4 months to ensure that
      # you are up to date with the latest security fixes from both Kubernetes and the
      # operator).
      apiVersion: couchbase.com/v2
      kind: CouchbaseBucket
      metadata:
        name: bucket1
        labels:
          cluster: cluster1
      spec:
        memoryQuota: 100Mi
        replicas: 2
        ioPriority: high
        enableIndexReplica: true
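
      Because bucket quotas are drawn from the data service memory quota, adding a bucket means checking that the combined quotas still fit. As a sketch only, a hypothetical second bucket (the name bucket2 and its 256Mi quota are assumptions for illustration) could be budgeted against the 1Gi dataServiceMemoryQuota used in the cluster definition on this page:

      ```yaml
      # Hypothetical second bucket, shown only to illustrate quota budgeting.
      # bucket1 (100Mi) + bucket2 (256Mi) = 356Mi, which fits within the 1Gi
      # dataServiceMemoryQuota set on the cluster.
      apiVersion: couchbase.com/v2
      kind: CouchbaseBucket
      metadata:
        name: bucket2
        labels:
          cluster: cluster1
      spec:
        memoryQuota: 256Mi
        replicas: 2
      ```

      Keeping every bucket definition in the same reviewed repository makes this arithmetic easy to check before a change is applied.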

      Security Management

      Security is extremely important when working in a cloud-based environment such as Kubernetes. To this end, we use TLS management by default to protect data from eavesdroppers.

      TLS is managed by third-party tooling, rather than manually. By doing this we simplify the process by removing manual steps, and also secure it by using policy-based certificate rotation.

      The TLS subject alternative names described in this section contain the namespace of the cluster. In this example, that namespace is default. If you wish to deploy in a different namespace, then this will need to be updated to reflect that change.

      # Admin password is required by the Operator.  Like user secrets, this should be
      # protected from unauthorized reads.
      apiVersion: v1
      kind: Secret
      metadata:
        name: administrator-authentication
      type: Opaque
      stringData:
        username: Administrator
        password: Berv~fradrics3
      ---
      # Creating the CA used to issue and sign server certificates is still a manual step.
      # The CA certificate should be distributed to all clients that will consume this database
      # instance.  The CA is fixed for the lifetime of the cluster, so its key must be kept secure,
      # but this does mean that clients continue to function even when the server certificates
      # are automatically rotated.
      apiVersion: v1
      kind: Secret
      metadata:
        name: ca
      data:
        tls.key: LS0tLS1CRUdJTiBSU0EgUFJJVkFURSBLRVktLS0tLQpNSUlFcEFJQkFBS0NBUUVBODNvSHArVnNDSGs4UEtsQTJGc1FyNk1yQjhoenpMUjZVTjJtTVdtSFpqWDFIem5WCm1ZYkVSQnhHQmVjbVVHRlRQZWo3Z3Y0NDl5QlV2ZXFIVCtDSjhubHRSbWxFZ24vV3NTYzZUYUl1UWRnRHdreVEKVEZ4UjkvR3JlNEY2M1BwcGNLNFpLQ3FtNjZ5Sk5qTktUQ2hBZEJsdFJyT0hqcDB2TTJrY01JTFl2VDFVVkQxUwpLRDRSQTl3R01XajJTOUQ4eDVnRjFOZHljYndPRWJmRzY3bTVZcFkrcmlIdWEyMHpoamJiZk5RbzN1SVdCeCtmCkVhTmNYOE1MTS9Jc2tVYzFEbTdiK3djaE9ZY2FFbWhIUXhYcHRqMDhCRUN1Z3ZXL3FLTmtjaUI2cE9aYzdlNGUKcVFuRVEvN1hJZ0t4TURUVkJTb1RqMWVJby9sQXhGVS8yYWlHWHdJREFRQUJBb0lCQVFDdGE4WDRPTmx5VDZndwpMUDRiSFFJTm1GTVdBQms3UFhIQ0Y1NUFvOEhsYzVsYzNIemdGYlhHTGIxU2h3b3JScWRiK1k3c0J0ZmNiaEx1CkV4YStObGtMZEtINC9SSG5RZGRSNTNjSHhQVGR3VmNzRmd6UjF4QXJZdCtaNE9mNmJnS2NWK1ZqVHI0R0w2YXMKREd4blFtUm1UWllnUGMvWUxPMXAyUHhUTVYvZnFXTzI3TW9jZml4bUs2MUl3V010bXd0QWNXc1FQZmlhWlRpMApSQlR0TzRyVHV5VnZESURjRlh1ZHVUaHlGNW5KTENsTERiazU5eFIxWlFLTTVoQjcvdjN3LzFneW1FbFNoY1hjClU2ME5TRDFBOTBmV1VMUkxzZXE5RXlyT3U3bGdrNDZBWmpxaGt0NDdqZkM4K3JuR0dUNnh1MWtkUjRJa0FBcnoKM1lzZm1JZUJBb0dCQVBxcjBsckV6R1FqYlk2RjVrTjJGeXVvb0txMFMrakNkRUlONFB1NmhjTnpXazFlTzA5agpBTWJFWlBxVFhkYnF6TlVMOTlkLytkL1JqcXR4dTRXbzZPWHhZSnNKNVU1bVROandVcVFKL3Y1am1ndjVvenlvCjY0RFFQdjE2aG1qSXhwcHRJd25EWC9KL2trdnlPZlVuQzY4dnZEQ3hxQkVHbDZ3UVVJWE12aE01QW9HQkFQaW4KRGhUR3hYZmFvOHhkWVRPbTdFUXVZa3l2Q2h0NUp5Tm05MWRRNEhnOTZPMmRYc3BWSDNrUllSYzFjMFdJZVM3MApHT0lSOGcrNXQvNXYvZVQ2R0pWOGMyc3VZRjErS1pTY3Z0bFFPYjJacitnUEN4SEtDVDY0SE83MnNFVGNYRGpaCjZ1cThiN0JCdnkxZXFtR1FsS2VZSUpEb1RKL0lpbGhFdGFFRHVlNVhBb0dBQWdJUVhGUEpRMkFaUjVRQkJUZFQKOWpDU29PdHkxRG1DanVqbmpYeXdCNkhMN21TNzJ1WHpJcVIrSHBmQm43QWYxZkVUbWpGWFFoaStxTmJ2WnFHMAp3K3JNR0ZIYStXYk9aTXFBRHZwWmhaWXNyTDNpTmVFd2ljYWhTb3lKdVJzcXBDQU5zTTFVM205eEw1U1FMRXVVCngyRjlnM0pZNDFJSE13U3FjSGYwYWRrQ2dZQXFTZThoSlhVc0h5bEFkcGt6ZWE0eElscGhoRnVKdEo4dGJET2cKekFhQkxMWlN3ekw5NG1CSjdPVEFWN3pWRkpMWG8zZ2Y2c0ZxWDBHbHFsSmFBUmJ4UllzenJWMkNTUlMxUzd0QgpwbDFMbTduSkU5WGtIcUpYNG1RNVdBYytqdU80WDRlT2lLSE9Ma0JmYlB3NVA2ZW9vVHpZcUVsdjIyRjhCYU9HClVPWHNYUUtCZ1FEV0xHcThpeWRXK2FkK1k4blZxV1VUZnl4VW5yaW5TVG95ei9nT3dDZ2ZoS2FSdGpYclEvUmEKSVlTczNjR016SldwZXZkWGRDc2JSL0I2L09VSSsyTlRqcEo4c3FVZnhjbHpMeU4xQjlQUFhGeDF5WTV5ZDFJZApVQnRLTG5KRjBSQlRzWnNmVG0vUUpJeGY3Y2ZWUGRlTGcxMEdJeFhzaG5GV0FLUHhveUhSeGc9PQotLS0tLUVORCBSU0EgUFJJVkFURSBLRVktLS0tLQo=
        tls.crt: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSURTekNDQWpPZ0F3SUJBZ0lVQ1ptem1IQjI0cUFwUnY3WnpLUGFJclFHam5Nd0RRWUpLb1pJaHZjTkFRRUwKQlFBd0ZqRVVNQklHQTFVRUF3d0xSV0Z6ZVMxU1UwRWdRMEV3SGhjTk1qQXhNakUzTVRJMU9ESXlXaGNOTXpBeApNakUxTVRJMU9ESXlXakFXTVJRd0VnWURWUVFEREF0RllYTjVMVkpUUVNCRFFUQ0NBU0l3RFFZSktvWklodmNOCkFRRUJCUUFEZ2dFUEFEQ0NBUW9DZ2dFQkFQTjZCNmZsYkFoNVBEeXBRTmhiRUsrakt3ZkljOHkwZWxEZHBqRnAKaDJZMTlSODUxWm1HeEVRY1JnWG5KbEJoVXozbys0TCtPUGNnVkwzcWgwL2dpZko1YlVacFJJSi8xckVuT2syaQpMa0hZQThKTWtFeGNVZmZ4cTN1QmV0ejZhWEN1R1NncXB1dXNpVFl6U2t3b1FIUVpiVWF6aDQ2ZEx6TnBIRENDCjJMMDlWRlE5VWlnK0VRUGNCakZvOWt2US9NZVlCZFRYY25HOERoRzN4dXU1dVdLV1BxNGg3bXR0TTRZMjIzelUKS043aUZnY2ZueEdqWEYvREN6UHlMSkZITlE1dTIvc0hJVG1IR2hKb1IwTVY2Ylk5UEFSQXJvTDF2NmlqWkhJZwplcVRtWE8zdUhxa0p4RVArMXlJQ3NUQTAxUVVxRTQ5WGlLUDVRTVJWUDltb2hsOENBd0VBQWFPQmtEQ0JqVEFkCkJnTlZIUTRFRmdRVWdOdmtvVkY2dW5CSFFrS0p6Mzk4ZlJSUUs2VXdVUVlEVlIwakJFb3dTSUFVZ052a29WRjYKdW5CSFFrS0p6Mzk4ZlJSUUs2V2hHcVFZTUJZeEZEQVNCZ05WQkFNTUMwVmhjM2t0VWxOQklFTkJnaFFKbWJPWQpjSGJpb0NsRy90bk1vOW9pdEFhT2N6QU1CZ05WSFJNRUJUQURBUUgvTUFzR0ExVWREd1FFQXdJQkJqQU5CZ2txCmhraUc5dzBCQVFzRkFBT0NBUUVBeW9xakNxa1lJSmQ3dUF5TFRHSnB3cFRLd2JTSTJPcXBRNkVDRUNpZjBaWkYKNHBTT1ArYjg1bzF3VmptZ2wvTW92VXBPYTRxN3NSekcyM052Z0lKTjFzOXlYTURlRTB4TDE3dmpzWVFGNUlGSwo0bFEzTVArSFVLMGprUWthNTBNeFZQUTRDWldrUmV0V2d6M2l0bk8zcFVLbGc3bWpxV1hVc3dUYkw1S01PQzZ0CnBWZFBsRzRPaWJIa004czRrNzJhb1ovdzRaMStsVUpORXRNQldkMnI4LytlcEpMOUp2dXN0cGlPcnYvVkF5eC8KWW9lcjRTZitxWDEvand0ZWNHWFFmNHFKMkJwL2h6a3F4YzNETFFiSGFxaEZGQ3o2T2pWRGxzc0tqNjEzeEJoTwpYbkJNT1NCbVFEMkxoZWlVcC9aN1E5ZlovdDE2WW9NZ0tHSngwOUl3VlE9PQotLS0tLUVORCBDRVJUSUZJQ0FURS0tLS0tCg==
      ---
      apiVersion: cert-manager.io/v1
      kind: Issuer
      metadata:
        name: ca
      spec:
        ca:
          secretName: ca
      ---
      # Certificates are generated with Jetstack Cert Manager.  This simplifies
      # configuration by keeping it as code, avoiding having to use openssl (or
      # similar) directly.  By using Cert Manager we can readily demonstrate to
      # security auditors that certificates are rotated on a periodic basis and also
      # conform to encryption strength constraints.
      apiVersion: cert-manager.io/v1
      kind: Certificate
      metadata:
        name: cluster1-certificate
      spec:
        secretName: cluster1-server-tls
        duration: 720h
        renewBefore: 24h
        commonName: couchbase-server
        isCA: false
        privateKey:
          algorithm: RSA
          encoding: PKCS8
          size: 2048
        usages:
          - server auth
        dnsNames:
          - "*.cluster1"
          - "*.cluster1.default"
          - "*.cluster1.default.svc"
          - "*.cluster1.default.svc.cluster.local"
          - "cluster1-srv"
          - "cluster1-srv.default"
          - "cluster1-srv.default.svc"
          - "*.cluster1-srv.default.svc.cluster.local"
          - localhost
        issuerRef:
          name: ca
          group: cert-manager.io
          kind: Issuer
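
      The namespace-specific subject alternative names noted above would change as follows if the cluster were deployed somewhere other than default. This is a sketch of the dnsNames fragment of the Certificate only, assuming a hypothetical namespace called production; every other field stays as shown in the manifest above.

      ```yaml
      # Fragment of the Certificate spec only.  The namespace "production" is a
      # hypothetical example; substitute the namespace you deploy into.
      dnsNames:
        - "*.cluster1"
        - "*.cluster1.production"
        - "*.cluster1.production.svc"
        - "*.cluster1.production.svc.cluster.local"
        - "cluster1-srv"
        - "cluster1-srv.production"
        - "cluster1-srv.production.svc"
        - "*.cluster1-srv.production.svc.cluster.local"
        - localhost
      ```

      Only the entries that embed the namespace differ; the short names and localhost are unchanged.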

      Cluster Management

      Cluster configuration is quite complicated, so it will not be discussed at length here. Instead, comments are provided inline, where they are contextually relevant.

      In general, the cluster is designed to be stable, fault tolerant and secure.

      Cluster scheduling requires Kubernetes nodes to be manually labeled and tainted for exclusive use by the Couchbase cluster. An example of how to perform this is documented in the cluster definition’s comments.

      There is no one-size-fits-all cluster topology. This page documents an arbitrary selection of services and server class sizes. Consult Couchbase solutions engineering to determine the correct cluster sizing for your workload.

      apiVersion: couchbase.com/v2
      kind: CouchbaseCluster
      metadata:
        name: cluster1
      spec:
        image: couchbase/server:7.6.0
        # Always enable anti-affinity to limit the "blast radius", and also ensure that any
        # assumptions about data replication hold i.e. a Kubernetes node going down will only
        # affect at most one pod.
        antiAffinity: true
        # Always select RBAC rules based on a label to prevent unexpectedly picking up
        # unlabelled resources created in this namespace.
        security:
          adminSecret: administrator-authentication
          rbac:
            managed: true
            selector:
              matchLabels:
                cluster: cluster1
        # Always select buckets based on a label to prevent unexpectedly picking up
        # unlabelled resources created in this namespace.
        buckets:
          managed: true
          selector:
            matchLabels:
              cluster: cluster1
        cluster:
          # Each service will be on its own pod, and each pod will be on its own node.
          dataServiceMemoryQuota: 1Gi
          indexServiceMemoryQuota: 1Gi
          queryServiceMemoryQuota: 1Gi
          # Fast auto-failover ensures that replica data becomes live quickly and
          # minimises impact for applications in the face of trouble or upgrades.
          autoFailoverTimeout: 5s
          autoFailoverOnDataDiskIssues: true
          autoFailoverOnDataDiskIssuesTimePeriod: 5s
        # Auto resource allocation takes the memory quotas defined in the cluster
        # section and applies them to pods in the various server classes we will
        # define in the servers section.  This manifests itself as Kubernetes
        # resource requests that ensure fair scheduling of pods across your
        # Kubernetes cluster.
        autoResourceAllocation:
          enabled: true
        # Enable managed TLS to protect all data from eavesdropping.
        networking:
          tls:
            secretSource:
              serverSecretName: cluster1-server-tls
        # Each server class will have its own storage template.  This allows independent
        # scaling as the need arises, and also minimises the number of pods that are
        # affected by a particular change.  Do not under-provision storage; use the
        # high-performance solid-state variety.
        volumeClaimTemplates:
        - metadata:
            name: data
          spec:
            storageClassName: premium-rwo
            resources:
              requests:
                storage: 1Gi
        - metadata:
            name: index
          spec:
            storageClassName: premium-rwo
            resources:
              requests:
                storage: 1Gi
        - metadata:
            name: query
          spec:
            storageClassName: premium-rwo
            resources:
              requests:
                storage: 1Gi
        # Each service is hosted on its own set of pods.  This facilitates simple independent
        # scaling of services, simplifies memory allocation and reduces blast radius, affecting
        # only a single service at a time.
        servers:
        - name: data
          size: 3
          services:
          - data
          volumeMounts:
            default: data
          pod:
            spec:
              # By tainting all the nodes we intend to use, we ensure no other pods
              # are running on them, and we get exclusive use (avoiding noisy neighbours).
              # Example:
              #   for i in gke-cluster-default-pool-94815a4b-jkhv \
              #            gke-cluster-default-pool-94815a4b-phl1 \
              #            gke-cluster-default-pool-94815a4b-w4bl \
              #            gke-cluster-default-pool-c9efc654-8krv \
              #            gke-cluster-default-pool-c9efc654-kt07 \
              #            gke-cluster-default-pool-c9efc654-kt5r; do \
              #     kubectl taint nodes $i application-specific=couchbase-server:NoExecute
              #   done
              tolerations:
              - key: application-specific
                value: couchbase-server
                effect: NoExecute
              # By selecting only nodes labeled for our use, we don't run where we
              # should not.
              # Example:
              #   for i in gke-cluster-default-pool-94815a4b-jkhv \
              #            gke-cluster-default-pool-94815a4b-phl1 \
              #            gke-cluster-default-pool-94815a4b-w4bl \
              #            gke-cluster-default-pool-c9efc654-8krv \
              #            gke-cluster-default-pool-c9efc654-kt07 \
              #            gke-cluster-default-pool-c9efc654-kt5r; do \
              #     kubectl label nodes $i application=couchbase-server
              #   done
              nodeSelector:
                application: couchbase-server
        - name: index
          size: 2
          services:
          - index
          volumeMounts:
            default: index
          pod:
            spec:
              tolerations:
              - key: application-specific
                value: couchbase-server
                effect: NoExecute
              nodeSelector:
                application: couchbase-server
        - name: query
          size: 1
          services:
          - query
          volumeMounts:
            default: query
          pod:
            spec:
              tolerations:
              - key: application-specific
                value: couchbase-server
                effect: NoExecute
              nodeSelector:
                application: couchbase-server