Grouping and Aggregation

how-to

How to calculate aggregates and group the results.
This guide is for Couchbase Server.

Introduction

You can use aggregate functions to perform calculations over multiple values. Grouping enables you to display the results in groups.

If you want to try out the examples in this section, follow the instructions given in Do a Quick Install to install Couchbase Server, configure a cluster, and load a sample dataset. Read the following for further information about the tools available for editing and executing queries:

Aggregate Functions

To take multiple values from documents, perform calculations, and return a single value as the result, use an aggregate function, such as AVG(), COUNT(), MIN(), MAX(), or SUM().

For example, the following query finds the average altitude of airports in the airport keyspace.

Context

Set the query context to the inventory scope in the travel sample dataset. For more information, see Setting the Query Context.

Query

SELECT AVG(geo.alt) AS AverageAltitude FROM airport;

Results

[
  {
    "AverageAltitude": 870.1651422764228
  }
]

For more information and examples, refer to Aggregate Functions.

Aggregating Distinct Values

To aggregate all values, omit the aggregate quantifier, or optionally include the ALL keyword before the function arguments.

For example, the following query finds the average number of stops per route.

Context

Set the query context to the inventory scope in the travel sample dataset. For more information, see Setting the Query Context.

Query

SELECT AVG(ALL stops) AS AvgAllStops FROM route;

Results

[
  {
    "AvgAllStops": 0.0002
  }
]

Results in 0.0002 since nearly all routes have 0 stops.

To aggregate distinct values only, include the DISTINCT keyword before the function arguments.

For example, the following query finds the average of the distinct numbers of stops.

Context

Set the query context to the inventory scope in the travel sample dataset. For more information, see Setting the Query Context.

Query

SELECT AVG(DISTINCT stops) AS AvgDistinctStops FROM route;

Results

[
  {
    "AvgDistinctStops": 0.5
  }
]

Results in 0.5 since the routes contain only 1 or 0 stops.

For more information, refer to Aggregate Quantifier.

Filtering the Aggregates

To filter the values used by an aggregate function, use the FILTER clause after the function.

For example, the following query finds the minimum value of a string field, only including strings that start with "A" or greater.

Context

Set the query context to the inventory scope in the travel sample dataset. For more information, see Setting the Query Context.

Query

SELECT MIN(name) FILTER (WHERE SUBSTR(name,0)>="A") AS MinName
FROM hotel;

Results

[
  {
    "MinName": "AIRE NATURELLE LE GROZEAU Aire naturelle"
  }
]

For more information, refer to FILTER Clause.

Grouping the Results

By default, an aggregate function returns a single result for all the documents that the query selects. It is often more useful to group the documents (by a different field) and return the aggregate result for each group.

To group the results of an aggregate query, use the GROUP BY clause.

For example, the following query groups unique landmarks by city.

Context

Set the query context to the inventory scope in the travel sample dataset. For more information, see Setting the Query Context.

Query

SELECT city City, COUNT(DISTINCT name) LandmarkCount
FROM landmark
GROUP BY city
ORDER BY LandmarkCount DESC
LIMIT 4;

Results

[
  {
    "City": "San Francisco",
    "LandmarkCount": 797
  },
  {
    "City": "London",
    "LandmarkCount": 443
  },
  {
    "City": "Los Angeles",
    "LandmarkCount": 284
  },
  {
    "City": "San Diego",
    "LandmarkCount": 197
  }
]

For more information and examples, refer to GROUP BY Clause.

Filtering the Groups

To filter the groups by an aggregate function, use the HAVING clause within the GROUP BY clause.

For example, the following query groups unique landmarks by city, and specifies cities that have more than 180 landmarks.

Context

Set the query context to the inventory scope in the travel sample dataset. For more information, see Setting the Query Context.

Query

SELECT city City, COUNT(DISTINCT name) LandmarkCount
FROM landmark
GROUP BY city
HAVING COUNT(DISTINCT name) > 180;

Results

[
  {
    "City": "London",
    "LandmarkCount": 443
  },
  {
    "City": "Los Angeles",
    "LandmarkCount": 284
  },
  {
    "City": "San Francisco",
    "LandmarkCount": 797
  },
  {
    "City": "San Diego",
    "LandmarkCount": 197
  }
]

For more information and examples, refer to HAVING Clause.

Defining an Expression within the GROUP BY Clause

To define an expression for use within the GROUP BY clause, use the LETTING clause before the HAVING clause.

For example, the following clause uses an expression to define the minimum number of landmarks for each city.

Context

Set the query context to the inventory scope in the travel sample dataset. For more information, see Setting the Query Context.

Query

SELECT city City, COUNT(DISTINCT name) LandmarkCount
FROM landmark
GROUP BY city
LETTING MinimumThingsToSee = 400
HAVING COUNT(DISTINCT name) > MinimumThingsToSee;

Results

[
  {
    "City": "London",
    "LandmarkCount": 443
  },
  {
    "City": "San Francisco",
    "LandmarkCount": 797
  }
]

For more information and examples, refer to LETTING Clause.

Reference and explanation:

Tutorials:

SQL++ Query Language Tutorial

Querying with SDKs:

C | .NET | Go | Java | Node.js | PHP | Python | Ruby | Scala

Grouping and Aggregation

Introduction

Aggregate Functions

Aggregating Distinct Values

Filtering the Aggregates

Grouping the Results

Filtering the Groups

Defining an Expression within the GROUP BY Clause

Related Links