Analytics

  • how-to
    +
    Parallel data management for complex queries over many records, using a familiar SQL++ syntax.

    This page covers Couchbase’s traditional Analytics Service, for long-running queries. A separate, as-a-service offering — Capella columnar — offers real-time analytics.

    Capella Columnar SDKs

    SDKs for Capella Columnar — Couchbase’s analytical database (RT-OLAP) for real time apps and operational intelligence — are in development, and will be arriving first for the Java, Node.js, and Python platforms.

    For complex and long-running queries, involving large ad hoc join, set, aggregation, and grouping operations, Couchbase Data Platform offers the Couchbase Analytics Service (CBAS). This is the analytic counterpart to our operational data focussed Query Service.

    The analytics service is available in Capella operational or the Enterprise Edition of self-managed Couchbase Server.

    Getting Started

    After familiarizing yourself with our introductory primer, in particular creating a dataset and linking it to a bucket to shadow the operational data, try Couchbase Analytics using the Node.js SDK. Intentionally, the API for analytics is very similar to that of the query service.

    var result = await cluster.analyticsQuery('SELECT "hello" AS greeting')
    result.rows.forEach((row) => {
      console.log(row)
    })

    Queries

    A query can either be simple or be parameterized. If parameters are used, they can either be positional or named. Here is one example of each:

    var result = await cluster.analyticsQuery(
      'SELECT airportname, country FROM airports WHERE country="France" LIMIT 3'
    )

    The query may be performed with positional parameters:

    var result = await cluster.analyticsQuery(
      'SELECT airportname, country FROM airports WHERE country = ? LIMIT 3',
      { parameters: ['France'] }
    )

    Alternatively, the query may be performed with named parameters:

    var result = await cluster.analyticsQuery(
      'SELECT airportname, country FROM airports WHERE country = $country LIMIT 3',
      { parameters: { country: 'France' } }
    )
    As timeouts are propagated to the server by the client, a timeout set on the client side may be used to stop the processing of a request, in order to save system resources. See example in the next section.

    Fluent API

    Additional parameters may be sent as part of the query, using the options block in the API. There are currently three parameters:

    • Client Context ID, sets a context ID that is returned back as part of the result. Uses the clientContextId option; default is a random UUID

    • Server Side Timeout, customizes the timeout sent to the server. Does not usually have to be set, as the client sets it based on the timeout on the operation. Uses the timeout option, and defaults to the Analytics timeout set on the client (75s). This can be adjusted at the cluster global config level.

    • Priority, set if the request should have priority over others. The priority option, defaults to false.

    Here, we give the request priority over others, and set a custom, server-side timeout value:

    var result = await cluster.analyticsQuery(
      'SELECT airportname, country FROM airports WHERE country="France" LIMIT 3',
      {
        priority: true,
        timeout: 100, // seconds
      }
    )

    Handling the Response

    Assuming that no errors are thrown during the exceution of your query, the return value will be a AnalyticsQueryResult object. You can access the individual rows which were returned through the rows property. These rows may contain various sorts of data and metadata, depending upon the nature of the query, as you will have seen when working through our introductory primer.

      var result = await cluster.analyticsQuery('SELECT "hello" AS greeting')
    
      result.rows.forEach((row) => {
        console.log('Greeting: %s', row.greeting)
      })

    MetaData

    The meta property of AnalyticsQueryResult contains useful metadata, such as metrics, which contains properties such as elapsedTime, and resultCount. Here is a snippet printing out some metrics from a query:

      var result = await cluster.analyticsQuery('SELECT "hello" AS greeting')
    
      console.log('Elapsed time: %d', result.meta.metrics.elapsedTime)
      console.log('Execution time: %d', result.meta.metrics.executionTime)
      console.log('Result count: %d', result.meta.metrics.resultCount)
      console.log('Error count: %d', result.meta.metrics.errorCount)

    For a listing of available metrics in the meta-data, see the Understanding Analytics SDK doc.

    Scoped Queries on Named Collections

    Given a dataset created against a collection, for example:

    ALTER COLLECTION `travel-sample`.inventory.airport ENABLE ANALYTICS;
    
    -- NB: this is more or less equivalent to:
    CREATE DATAVERSE `travel-sample`.inventory;
    CREATE DATASET `travel-sample`.inventory.airport ON `travel-sample`.inventory.airport;

    You can run a query as follows:

      var result = await cluster.analyticsQuery(
        'SELECT airportname, country FROM `travel-sample`.inventory.airport WHERE country="France" LIMIT 3'
      )

    Advanced Analytics Topics

    From Couchbase Data Platform 6.5, KV Ingestion is added to CBAS.