Configure CMOS for Kubernetes deployment
- Developer Preview
Tutorials are provided to demonstrate how a particular problem may be solved. Tutorials are accurate at the time of writing but rely heavily on third-party software. The third-party software is not directly supported by Couchbase. For further help in the event of a problem, contact the relevant software maintainer.
Overview
Couchbase Monitoring and Observability Stack (also known as CMOS) is a simple, out-of-the-box solution built using industry-standard tooling to observe the state of a running Couchbase cluster. CMOS can be deployed to monitor Couchbase clusters managed by the Couchbase Autonomous Operator (CAO) on Kubernetes.
Deploy CMOS
CMOS is deployed on Kubernetes using a standard set of resources such as Deployments and Services. The following sections describe how to create these standard objects and how to configure the services they provide.
Prometheus Configuration
The Prometheus configuration file is the standard way of specifying the configuration for the Prometheus server.
The Prometheus configuration can be externalized using a Kubernetes ConfigMap, which contains all the details, including credentials and the targets to scrape for metrics.
By externalizing the Prometheus configuration into a ConfigMap, you do not have to rebuild the Prometheus image whenever you need to add or remove configuration.
You simply update the ConfigMap and restart (or reload) the Prometheus pods to apply the new configuration.
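For example, once the ConfigMap and Deployment described later in this guide exist (named prometheus-config-cmos and couchbase-grafana respectively), an update could be applied with something like the following sketch:
# Re-create the ConfigMap manifest from the local files and apply the change
kubectl create configmap prometheus-config-cmos --from-file=./prometheus/custom/ --dry-run=client -o yaml | kubectl apply -f -
# Restart the Prometheus pods (the config-watcher sidecar described later can also trigger a live reload)
kubectl rollout restart deployment/couchbase-grafana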
The following example file contains the default configuration for CMOS Prometheus to work out of the box. Run the command below in the console to create it.
mkdir -p ./prometheus/custom/alerting
cat <<EOF >./prometheus/custom/prometheus-k8s.yml
# This is a template file we use so we can substitute environment variables at launch
global:
  scrape_interval: 30s
  evaluation_interval: 30s
  # scrape_timeout is set to the global default (10s).

  # Attach these labels to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    monitor: couchbase-observability-stack

# Load and evaluate rules in this file every 'evaluation_interval' seconds.
rule_files:
  # All Couchbase default rules go here
  - /etc/prometheus/alerting/couchbase/*.yaml
  - /etc/prometheus/alerting/couchbase/*.yml
  # All custom rules can go here: relative to this file
  - alerting/*.yaml
  - alerting/*.yml

alerting:
  alertmanagers:
    - scheme: http
      # tls_config:
      #   ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      # bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      path_prefix: /alertmanager/
      # Assumption is we always have Alertmanager with Prometheus
      static_configs:
        - targets:
            - localhost:9093
      # Discover Alertmanager instances using Kubernetes service discovery
      # kubernetes_sd_configs:
      #   - role: pod
      # relabel_configs:
      #   - source_labels: [__meta_kubernetes_namespace]
      #     regex: monitoring
      #     action: keep
      #   - source_labels: [__meta_kubernetes_pod_label_app]
      #     regex: prometheus
      #     action: keep
      #   - source_labels: [__meta_kubernetes_pod_label_component]
      #     regex: alertmanager
      #     action: keep
      #   - source_labels: [__meta_kubernetes_pod_container_port_number]
      #     regex:
      #     action: drop

scrape_configs:
  - job_name: prometheus
    metrics_path: /prometheus/metrics
    static_configs:
      - targets: [localhost:9090]

  - job_name: couchbase-grafana
    file_sd_configs:
      - files:
          - /etc/prometheus/couchbase/monitoring/*.json
        refresh_interval: 30s

  # TODO: add unauthenticated endpoint
  - job_name: couchbase-cluster-monitor
    basic_auth:
      username: admin
      password: password
    metrics_path: /api/v1/_prometheus
    # For basic auth we cannot use file_sd
    static_configs:
      - targets: [localhost:7196]

  # Used for Kubernetes deployment as we can discover the endpoints to scrape from the API
  - job_name: couchbase-kubernetes-pods
    # Server 7 requires authentication
    basic_auth:
      username: admin
      password: password
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Scrape pods labelled with app=couchbase and then only port 8091 (server 7), 9091 (exporter) or 2020 (fluent bit)
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: couchbase
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        action: keep
        regex: (8091|9091|2020)
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
      - source_labels: [__meta_kubernetes_pod_label_couchbase_cluster]
        action: replace
        target_label: cluster

  # kube-state-metrics default service to scrape
  - job_name: kube-state-metrics
    static_configs:
      - targets: [kube-state-metrics:8080]
EOF
1 | rule_files: Prometheus is configured to load alerting rules via rule_files.
You can add your own rules by placing rule files under the alerting/ directory; an example rule file is sketched after this list.
Note that alerting/ is a path relative to the Prometheus configuration file, so by default the complete path is /etc/prometheus/custom/alerting/.
Refer to the Observability Stack section for the volume mounts.
2 | alerting: Alertmanager is shipped and enabled by default in CMOS.
This section configures the Alertmanager instances that Prometheus sends alerts to.
3 | scrape_configs: All the targets to scrape for metrics are defined here.
This includes prometheus, couchbase-grafana, couchbase-cluster-monitor, couchbase-kubernetes-pods and kube-state-metrics.
The Couchbase pods are discovered using Kubernetes labels.
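As a sketch, a custom rule file could look like the following; the alert name and expression here are purely illustrative and are not part of the CMOS defaults. Note that the file must end up under /etc/prometheus/custom/alerting/ inside the container, for example via an additional ConfigMap and volume mount, since the keys of a single ConfigMap cannot represent nested directories.
cat <<'EOF' >./prometheus/custom/alerting/custom-rules.yaml
groups:
  - name: custom.rules
    rules:
      # Illustrative alert: fire when any scrape target has been down for 5 minutes
      - alert: TargetDown
        expr: up == 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Target {{ $labels.instance }} has been down for more than 5 minutes"
EOF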
Run the command below in the Kubernetes console to create the Prometheus ConfigMap from the configuration file:
kubectl create configmap prometheus-config-cmos --from-file=./prometheus/custom/
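Optionally, verify that the ConfigMap exists and contains the prometheus-k8s.yml key:
kubectl describe configmap prometheus-config-cmos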
Observability Stack
Kubernetes controls access to its resources using Role-Based Access Control (RBAC). In order to monitor the Couchbase cluster, the CMOS deployment must be able to discover it and communicate with it. The example YAML below handles this for you. Create it by running the following command in the Kubernetes console.
cat <<EOF | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: monitoring-endpoints-role
  labels:
    rbac.couchbase.observability.com/aggregate-to-monitoring: 'true'
rules:
  - apiGroups: [''] (1)
    resources: [services, endpoints, pods, secrets]
    verbs: [get, list, watch]
  - apiGroups: [couchbase.com] (2)
    resources: [couchbaseclusters]
    verbs: [get, list, watch]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: monitoring-role-binding (3)
roleRef:
  kind: ClusterRole
  name: monitoring-endpoints-role
  apiGroup: rbac.authorization.k8s.io
subjects:
  - kind: Group
    name: system:serviceaccounts
    apiGroup: rbac.authorization.k8s.io
EOF
In this configuration file, the cluster role is defined with the following permissions:
1 | Access to standard Kubernetes resources: CMOS requires get, list and watch permissions on the services, endpoints, pods and secrets resources.
2 | Couchbase Custom Resource Definition: CMOS requires get, list and watch permissions on the couchbaseclusters resource.
3 | monitoring-role-binding: This role binding grants the permissions defined in the ClusterRole to the service accounts used by CMOS.
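Optionally, confirm that both RBAC objects were created:
kubectl get clusterrole monitoring-endpoints-role
kubectl get clusterrolebinding monitoring-role-binding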
The CMOS workload itself runs as a Kubernetes Deployment along with other supporting services. Create it by running the following command in the Kubernetes console.
cat <<EOF | kubectl apply -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: couchbase-grafana
spec:
  selector:
    matchLabels:
      run: couchbase-grafana
  replicas: 1
  template:
    metadata:
      labels:
        run: couchbase-grafana
    spec:
      containers:
        - name: couchbase-grafana
          image: couchbase/observability-stack
          ports:
            - name: http
              containerPort: 8080
            - name: loki # So we can push logs to it
              containerPort: 3100
          env:
            - name: KUBERNETES_DEPLOYMENT
              value: 'true'
            - name: ENABLE_LOG_TO_FILE
              value: 'true'
            - name: PROMETHEUS_CONFIG_FILE
              value: /etc/prometheus/custom/prometheus-k8s.yml
            - name: PROMETHEUS_CONFIG_TEMPLATE_FILE
              value: ignore
            # - name: DISABLE_LOKI
            #   value: "true"
          volumeMounts:
            - name: prometheus-config-volume
              mountPath: /etc/prometheus/custom # keep /etc/prometheus for any defaults
        # Watch for changes to the volume and auto-reload the Prometheus configuration when seen
        - name: prometheus-config-watcher
          image: weaveworks/watch:master-9199bf5
          args: [-v, -t, -p=/etc/prometheus/custom, curl, -X, POST, --fail, -o, '-', -sS, http://localhost:8080/prometheus/-/reload]
          volumeMounts:
            - name: prometheus-config-volume
              mountPath: /etc/prometheus/custom
      volumes:
        - name: prometheus-config-volume
          configMap:
            name: prometheus-config-cmos
EOF
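Optionally, wait for the rollout to finish and check that the pod (with its couchbase-grafana and prometheus-config-watcher containers) is running:
kubectl rollout status deployment/couchbase-grafana
kubectl get pods -l run=couchbase-grafana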
After the observability stack is deployed, we need to create a Service to access CMOS. Create it by running the following command in the Kubernetes console.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: couchbase-grafana-http
  labels:
    run: couchbase-grafana
spec:
  ports:
    - port: 8080 (1)
      protocol: TCP
  selector:
    run: couchbase-grafana
EOF
1 | The observability monitoring service runs on port 8080 by default.
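If the Ingress described later is not set up yet, a quick way to check the service is a local port-forward; this is a sketch assuming the Service name above:
kubectl port-forward svc/couchbase-grafana-http 8080:8080
# Then browse to http://localhost:8080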
Create a Service for accessing Loki by running the following command in the Kubernetes console.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: loki
  labels:
    run: couchbase-grafana
spec:
  ports:
    - port: 3100
      protocol: TCP
  selector:
    run: couchbase-grafana
EOF
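Optionally, confirm that both services exist and are backed by the couchbase-grafana pod:
kubectl get svc couchbase-grafana-http loki
kubectl get endpoints couchbase-grafana-http loki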
Deploy Couchbase
The Couchbase Helm chart is used to deploy the Couchbase Autonomous Operator and a default configuration for the Couchbase Server pods.
It is easy to get started with the default configuration values. However, if you want to modify any of the values to meet your specific requirements, refer to the Couchbase Helm chart documentation.
The example below shows how to set up log forwarding to CMOS via Kubernetes annotations on the pod.
Run the command below to create a Helm values file with custom values.
cat <<EOF >custom-values.yaml
cluster:
  logging:
    server:
      enabled: true
      sidecar:
        image: couchbase/fluent-bit:1.2.2
  monitoring:
    prometheus:
      enabled: false # We're using server 7 metrics directly
      image: couchbase/exporter:1.0.6
  security:
    username: admin
    password: password (1)
  servers:
    # We use custom annotations to forward to CMOS Loki
    default:
      size: 3
      pod:
        metadata:
          annotations:
            # Match all logs
            fluentbit.couchbase.com/loki_match: "*"
            # Send to this SVC
            fluentbit.couchbase.com/loki_host: loki.default
      volumeMounts:
        default: couchbase
  volumeClaimTemplates:
    - metadata:
        name: couchbase
      spec:
        resources:
          requests:
            storage: 1Gi
EOF
1 | We recommend specifying a stronger password.
If you already have the Couchbase Operator deployed using Helm, or are planning a new deployment, the command below can be used with the custom values to enable CMOS. If the operator was instead deployed using command-line tools, you have to update the existing resources with kubectl patch using the custom values described above.
The command below upgrades an already deployed Couchbase Operator, or installs it if it is not yet present.
Upgrades to an installed operator should be handled with extreme caution:
an invalid custom-values.yaml can cause issues with the operator installation.
In the command below, the chart's default values are overridden by the user-supplied values file passed with --values, which can in turn be overridden by --set parameters.
helm repo add couchbase https://couchbase-partners.github.io/helm-charts/
helm upgrade --install couchbase couchbase/couchbase-operator --set cluster.image=couchbase/server:7.0.2 --values=custom-values.yaml
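Once the release is installed or upgraded, you can check its status and watch the cluster come up. This sketch assumes the release name couchbase used above; the generated CouchbaseCluster resource name depends on the chart defaults:
helm status couchbase
kubectl get couchbaseclusters
kubectl get pods -l app=couchbase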
Accessing CMOS
Deploying Ingress
In order to access the cluster, we set up a Kubernetes Ingress to forward traffic from our localhost to the appropriate parts of the cluster.
There are two aspects required here:
- Provide an Ingress controller, which is Nginx in this case.
- Set up an Ingress resource to forward traffic to our CMOS service.
For a production system, an Ingress controller will likely already be deployed with appropriate rules.
Follow the Nginx Ingress Controller guide to set one up; for local testing, one possible installation method is sketched below.
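This uses the ingress-nginx Helm chart; it is one possible approach rather than the only supported method, and the chart values you need depend on your environment:
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx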
Once the Ingress controller is installed and ready, the last step is to deploy the Ingress configuration as shown below:
# Ingress to forward to our web server including sub-paths: ideally we would forward only what we need, but for local testing we just send it all.
cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: couchbase-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: couchbase-grafana-http
                port:
                  number: 8080
EOF
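Optionally, confirm that the Ingress was admitted and has an address assigned (this can take a minute with some controllers):
kubectl get ingress couchbase-ingress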
If everything is deployed properly, you should be able to access http://localhost (or whatever the Ingress address is for your deployment).
You should see a landing page with links to the documentation, the cluster management service and the various other CMOS services.
Add Couchbase Cluster
You can register the Couchbase cluster in the CMOS dashboard using the “Add Cluster” option. Enter the Couchbase Server hostname and the username and password credentials, then click “Add Cluster”.
By default the username is “admin” and the password is “password”.
We recommend specifying a stronger password in the custom-values.yaml file during CMOS installation.
Untick the “Add to Prometheus” option, because Prometheus scraping is already configured using Kubernetes service discovery.
As soon as you add a cluster, you will see a Grafana URL where you can view the inventory and metrics of your Couchbase Server clusters.
Prometheus Targets
From the “Prometheus Targets” option, you can see the Prometheus targets and their details. For instance, you can filter the view to show all targets or only unhealthy ones. The state information shows which Prometheus targets are up, and the last scrape value shows how long ago each target's metrics were scraped.
While the pods are coming up, some targets may report as failing, but these resolve once the pods are running.
Grafana Dashboard
Grafana provides multiple dashboards for monitoring a Couchbase cluster out of the box. You can list all the dashboards using the dashboard search option, and you may create additional dashboards to suit your needs. The following are some of the out-of-the-box dashboards.
Couchbase Cluster Overview Metrics
The Couchbase cluster overview metrics dashboard is available in Grafana under the name single-cluster-overview.
This dashboard displays a number of items including Couchbase nodes, buckets available, version information, health check warnings, and which services are running.
Alerts
CMOS comes with pre-installed alerting rules to monitor the Couchbase cluster. Navigate to the Alertmanager and Prometheus UIs to check the alerting rules and active alerts. For more information, see the Prometheus and alerting configuration sections.
Alertmanager
Alertmanager is shipped and enabled by default in CMOS and is accessible via the “Alert Manager” option. You can view all the generated alerts in this dashboard.
Loki
Loki, which is shipped alongside Grafana, provides access to the logs of the various components. You can explore it via Configuration > Data sources > Loki > Explore.
From the Log browser, you can enter a custom Loki query or select the appropriate labels to see the logs, then select “Show logs” to view them.
You can also build custom Grafana dashboards based on these queries to suit your needs.
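As a sketch, a simple LogQL query selects log streams by label and can then filter on content; the label name and value below are illustrative, since the labels actually available depend on how Fluent Bit forwards logs in your deployment and are listed in the Log browser:
{job="fluent-bit"} |= "error"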