Autonomous Operator Troubleshooting
If you run into issues with the Autonomous Operator, you can troubleshoot by examining the logs and events that it generates.
The Autonomous Operator generates logs that can be used for auditing and troubleshooting purposes. This page describes logging that is specific to the Autonomous Operator itself. For information about Couchbase cluster logging, refer to Manage Couchbase Server Logging.
Overview
The Autonomous Operator generates logs that include information about itself and the various other Kubernetes components that make up the Operator deployment. These logs are distinct from the logs that are generated by the Couchbase Server application.
This page provides information about how to collect and scrutinize logging information that is produced by the Autonomous Operator. When troubleshooting the Autonomous Operator, it is important to first rule out Kubernetes itself as the root cause of the problem. The Kubernetes Troubleshooting Guide contains a great deal of helpful information about debugging applications within a Kubernetes cluster.
Familiarity with the Operator’s configuration settings can be helpful when troubleshooting the Autonomous Operator.
Collecting Autonomous Operator Logs
Using kubectl or oc, you can print the Autonomous Operator logs to standard console output.
The steps are shown first for Kubernetes (kubectl), then for OpenShift (oc).
Start by getting the name of the Autonomous Operator pod.
$ kubectl get po -lapp=couchbase-operator
NAME                                  READY     STATUS    RESTARTS   AGE
couchbase-operator-1917615544-h20bm   1/1       Running   0          20h
Use the pod name to get the logs.
$ kubectl logs couchbase-operator-1917615544-h20bm
time="2018-01-23T22:56:34Z" level=info msg="couchbase-operator v1.1.0 (release)" module=main
time="2018-01-23T22:56:34Z" level=info msg="Obtaining resource lock" module=main
time="2018-01-23T22:56:34Z" level=info msg="Starting event recorder" module=main
time="2018-01-23T22:56:34Z" level=info msg="Attempting to be elected the couchbase-operator leader" module=main
time="2018-01-23T22:56:51Z" level=info msg="I'm the leader, attempt to start the operator" module=main
time="2018-01-23T22:56:51Z" level=info msg="Creating the couchbase-operator controller" module=main
Alternatively, you can specify the Autonomous Operator deployment to get the logs.
$ kubectl logs deployment/couchbase-operator
Since there is only one instance of the Autonomous Operator in the deployment, the underlying command automatically selects the correct pod and prints the logs.
On OpenShift, start by getting the name of the Autonomous Operator pod.
$ oc get po -lapp=couchbase-operator
NAME                                  READY     STATUS    RESTARTS   AGE
couchbase-operator-1917615544-h20bm   1/1       Running   0          20h
Use the pod name to get the logs.
$ oc logs couchbase-operator-1917615544-h20bm
time="2018-01-23T22:56:34Z" level=info msg="couchbase-operator v1.1.0 (release)" module=main
time="2018-01-23T22:56:34Z" level=info msg="Obtaining resource lock" module=main
time="2018-01-23T22:56:34Z" level=info msg="Starting event recorder" module=main
time="2018-01-23T22:56:34Z" level=info msg="Attempting to be elected the couchbase-operator leader" module=main
time="2018-01-23T22:56:51Z" level=info msg="I'm the leader, attempt to start the operator" module=main
time="2018-01-23T22:56:51Z" level=info msg="Creating the couchbase-operator controller" module=main
Alternatively, you can specify the Autonomous Operator deployment to get the logs.
$ oc logs deployment/couchbase-operator
Since there is only one instance of the Autonomous Operator in the deployment, the underlying command automatically selects the correct pod and prints the logs.
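The pod lookup and log retrieval steps above can be combined into a small helper. The following is a minimal sketch, not part of the Autonomous Operator tooling: the CLI variable and the operator_pod and operator_logs function names are hypothetical, and it assumes the operator pod carries the app=couchbase-operator label used in the commands above.

```shell
# CLI is an assumed convenience variable: kubectl by default, oc on OpenShift.
CLI="${CLI:-kubectl}"

# Print the name of the first pod labelled app=couchbase-operator.
operator_pod() {
    "$CLI" get po -lapp=couchbase-operator \
        -o jsonpath='{.items[0].metadata.name}'
}

# Print the operator logs; extra arguments are passed through to "logs".
operator_logs() {
    pod=$(operator_pod) || return 1
    if [ -z "$pod" ]; then
        echo "no couchbase-operator pod found" >&2
        return 1
    fi
    "$CLI" logs "$pod" "$@"
}
```

For example, operator_logs --since=1h limits output to the last hour, and operator_logs --previous prints the logs of the previous container instance if the pod has restarted.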
If you’re troubleshooting the Autonomous Operator, watch for the following messages, which indicate that the Operator is unable to reconcile a Couchbase cluster into the desired state:
- Logs with level=error
- Operator is unable to get cluster state after N retries
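One way to pick these messages out of a long log is to filter the output with grep. This is a minimal sketch: the filter_operator_errors name is hypothetical, and the second pattern assumes the message text appears in the log as worded above.

```shell
# Print only the log lines that suggest a reconciliation problem.
# Reads log text on standard input, so it works with kubectl or oc.
# Exits non-zero (grep's behaviour) when no line matches.
filter_operator_errors() {
    grep -i -E 'level=error|unable to get cluster state'
}
```

Typical usage is to pipe the operator logs into the function, for example kubectl logs deployment/couchbase-operator | filter_operator_errors.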
Profiling the Autonomous Operator
For more advanced troubleshooting, the Autonomous Operator supports the Go language pprof feature and serves profiling data on its default listen address, localhost:8080.
You can access this endpoint by running a remote shell or forwarding the port to your local system.
The commands are shown first for Kubernetes (kubectl), then for OpenShift (oc).
To access goroutine stack traces using a shell:
$ kubectl exec -it couchbase-operator-599bcf47f-8wswh -- sh
$ wget -O- 'http://localhost:8080/debug/pprof/goroutine?debug=1' | less
To access Go memory usage using a port forward:
$ kubectl port-forward couchbase-operator-599bcf47f-8wswh 8080:8080
$ go tool pprof localhost:8080/debug/pprof/heap
(pprof) traces
On OpenShift, to access goroutine stack traces using a shell:
$ oc exec -it couchbase-operator-599bcf47f-8wswh -- sh
$ wget -O- 'http://localhost:8080/debug/pprof/goroutine?debug=1' | less
To access Go memory usage using a port forward:
$ oc port-forward couchbase-operator-599bcf47f-8wswh 8080:8080
$ go tool pprof localhost:8080/debug/pprof/heap
(pprof) traces
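The port-forward approach can also be scripted to save a goroutine dump to a file. The sketch below makes several assumptions: the capture_goroutines function name and the CLI variable are hypothetical, curl is available on your local system, local port 8080 is free, and a two-second pause is long enough for the port forward to become ready.

```shell
# CLI is an assumed convenience variable: kubectl by default, oc on OpenShift.
CLI="${CLI:-kubectl}"

# Save goroutine stack traces from the given operator pod to a file.
# Usage: capture_goroutines POD_NAME [OUTPUT_FILE]
capture_goroutines() {
    pod=$1
    out=${2:-goroutines.txt}

    # Forward local port 8080 to the pod in the background.
    "$CLI" port-forward "$pod" 8080:8080 >/dev/null 2>&1 &
    pf_pid=$!
    sleep 2   # assumed to be enough time for the forward to start

    # Fetch the same endpoint used in the shell example above.
    status=0
    curl -sf -o "$out" 'http://localhost:8080/debug/pprof/goroutine?debug=1' \
        || status=$?

    # Stop the port forward whether or not the download succeeded.
    kill "$pf_pid" 2>/dev/null || true
    wait "$pf_pid" 2>/dev/null || true
    return "$status"
}
```

For example, capture_goroutines couchbase-operator-599bcf47f-8wswh dump.txt writes the stack traces to dump.txt.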
Kubernetes Events
Kubernetes Events provide insights into what is happening inside a Kubernetes cluster. They record significant occurrences and changes in the state of resources, such as the creation, deletion, or failure of pods, nodes, services, and other Kubernetes objects.
They can be used to monitor changes that have occurred in the cluster, and can be helpful when troubleshooting issues with the Autonomous Operator. However, they expire after a certain period of time, typically one hour. You can use the Kubernetes Event Collector tool to collect and store events for longer periods of time.
The Kubernetes Event Collector (KEL) watches for Kubernetes events within a namespace and stores them in a buffer, which can be stashed. It can be deployed and configured using Helm:
$ helm install event-collector charts/event-collector
For more details about the tool and how to use it, refer to the repo README: https://github.com/couchbase/couchbase-k8s-event-collector
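Events that have not yet expired can also be inspected directly with kubectl or oc. The following is a minimal sketch that is independent of the Event Collector; the recent_events function name and the CLI variable are hypothetical.

```shell
# CLI is an assumed convenience variable: kubectl by default, oc on OpenShift.
CLI="${CLI:-kubectl}"

# List events in creation order, oldest first.
# Extra arguments are passed through to "get events".
recent_events() {
    "$CLI" get events --sort-by=.metadata.creationTimestamp "$@"
}
```

For example, recent_events -n my-namespace lists events in a specific namespace (my-namespace is a placeholder), and recent_events --field-selector type=Warning shows only warnings.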