A newer version of this documentation is available.

View Latest

Grouping and Aggregation

  • how-to
    +

    How to calculate aggregates and group the results.
    This guide is for Couchbase Server.

    Introduction

    You can use aggregate functions to perform calculations over multiple values. Grouping enables you to display the results in groups.

    If you want to try out the examples in this section, follow the instructions given in Do a Quick Install to install Couchbase Server, configure a cluster, and load a sample dataset. Read the following for further information about the tools available for editing and executing queries:

    Aggregate Functions

    To take multiple values from documents, perform calculations, and return a single value as the result, use an aggregate function, such as AVG(), COUNT(), MIN(), MAX(), or SUM().

    For example, the following query finds the average altitude of airports in the airport keyspace.

    Context

    Set the query context to the inventory scope in the travel sample dataset. For more information, see Setting the Query Context.

    Query
    SELECT AVG(geo.alt) AS AverageAltitude FROM airport;
    Results
    [
      {
        "AverageAltitude": 870.1651422764228
      }
    ]

    For more information and examples, refer to Aggregate Functions.

    Aggregating Distinct Values

    To aggregate all values, omit the aggregate quantifier, or optionally include the ALL keyword before the function arguments.

    For example, the following query finds the average number of stops per route.

    Context

    Set the query context to the inventory scope in the travel sample dataset. For more information, see Setting the Query Context.

    Query
    SELECT AVG(ALL stops) AS AvgAllStops FROM route;
    Results
    [
      {
        "AvgAllStops": 0.0002
      }
    ]

    Results in 0.0002 since nearly all routes have 0 stops.

    To aggregate distinct values only, include the DISTINCT keyword before the function arguments.

    For example, the following query finds the average of the distinct numbers of stops.

    Context

    Set the query context to the inventory scope in the travel sample dataset. For more information, see Setting the Query Context.

    Query
    SELECT AVG(DISTINCT stops) AS AvgDistinctStops FROM route;
    Results
    [
      {
        "AvgDistinctStops": 0.5
      }
    ]

    Results in 0.5 since the routes contain only 1 or 0 stops.

    For more information, refer to Aggregate Quantifier.

    Filtering the Aggregates

    To filter the values used by an aggregate function, use the FILTER clause after the function.

    For example, the following query finds the minimum value of a string field, only including strings that start with "A" or greater.

    Context

    Set the query context to the inventory scope in the travel sample dataset. For more information, see Setting the Query Context.

    Query
    SELECT MIN(name) FILTER (WHERE SUBSTR(name,0)>="A") AS MinName
    FROM hotel;
    Results
    [
      {
        "MinName": "AIRE NATURELLE LE GROZEAU Aire naturelle"
      }
    ]

    For more information, refer to FILTER Clause.

    Grouping the Results

    By default, an aggregate function returns a single result for all the documents that the query selects. It is often more useful to group the documents (by a different field) and return the aggregate result for each group.

    To group the results of an aggregate query, use the GROUP BY clause.

    For example, the following query groups unique landmarks by city.

    Context

    Set the query context to the inventory scope in the travel sample dataset. For more information, see Setting the Query Context.

    Query
    SELECT city City, COUNT(DISTINCT name) LandmarkCount
    FROM landmark
    GROUP BY city
    ORDER BY LandmarkCount DESC
    LIMIT 4;
    Results
    [
      {
        "City": "San Francisco",
        "LandmarkCount": 797
      },
      {
        "City": "London",
        "LandmarkCount": 443
      },
      {
        "City": "Los Angeles",
        "LandmarkCount": 284
      },
      {
        "City": "San Diego",
        "LandmarkCount": 197
      }
    ]

    For more information and examples, refer to GROUP BY Clause.

    Filtering the Groups

    To filter the groups by an aggregate function, use the HAVING clause within the GROUP BY clause.

    For example, the following query groups unique landmarks by city, and specifies cities that have more than 180 landmarks.

    Context

    Set the query context to the inventory scope in the travel sample dataset. For more information, see Setting the Query Context.

    Query
    SELECT city City, COUNT(DISTINCT name) LandmarkCount
    FROM landmark
    GROUP BY city
    HAVING COUNT(DISTINCT name) > 180;
    Results
    [
      {
        "City": "London",
        "LandmarkCount": 443
      },
      {
        "City": "Los Angeles",
        "LandmarkCount": 284
      },
      {
        "City": "San Francisco",
        "LandmarkCount": 797
      },
      {
        "City": "San Diego",
        "LandmarkCount": 197
      }
    ]

    For more information and examples, refer to HAVING Clause.

    Defining an Expression within the GROUP BY Clause

    To define an expression for use within the GROUP BY clause, use the LETTING clause before the HAVING clause.

    For example, the following clause uses an expression to define the minimum number of landmarks for each city.

    Context

    Set the query context to the inventory scope in the travel sample dataset. For more information, see Setting the Query Context.

    Query
    SELECT city City, COUNT(DISTINCT name) LandmarkCount
    FROM landmark
    GROUP BY city
    LETTING MinimumThingsToSee = 400
    HAVING COUNT(DISTINCT name) > MinimumThingsToSee;
    Results
    [
      {
        "City": "London",
        "LandmarkCount": 443
      },
      {
        "City": "San Francisco",
        "LandmarkCount": 797
      }
    ]

    For more information and examples, refer to LETTING Clause.

    Reference and explanation:

    Tutorials:

    Querying with SDKs: