Querying with SQL++
You can query for documents in Couchbase using the SQL++ query language, a language based on SQL, but designed for structured and flexible JSON documents.
This page dives straight into using the Query Service API from the Python Columnar SDK. For a deeper look at the concepts behind the Query Service and the SQL++ language, see the links in the Further Information section at the end of this page.
Here we show queries against the Travel Sample collection, at cluster and scope level, and give links to information on adding other collections to your data.
Before You Start
This page assumes that you have installed the Python Columnar SDK, added your IP address to the allowlist, and created a Columnar cluster.
Create a collection to work with by importing the travel-sample dataset into your cluster.
Querying Your Dataset
Most queries return more than one row, so you will usually iterate over the results:
Scope Level Queries
Sync API
# Assumes `cluster` is an already-connected Columnar cluster instance
scope = cluster.database('travel-sample').scope('inventory')
query = """
SELECT airline, COUNT(*) AS route_count, AVG(route.distance) AS avg_route_distance
FROM route
GROUP BY airline
ORDER BY route_count DESC
"""
res = scope.execute_query(query)
print('Rows:')
# rows() yields one result row at a time
for row in res.rows():
    print(row)
print(f'\nMetadata: {res.metadata()}')

Async API
# Assumes `cluster` is an already-connected async Columnar cluster instance
scope = cluster.database('travel-sample').scope('inventory')
query = """
SELECT airline, COUNT(*) AS route_count, AVG(route.distance) AS avg_route_distance
FROM route
GROUP BY airline
ORDER BY route_count DESC
"""
res = await scope.execute_query(query)
print('Rows:')
# rows() is consumed with `async for` in the async API
async for row in res.rows():
    print(row)
print(f'\nMetadata: {res.metadata()}')
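Because rows are consumed one at a time, a result can be aggregated as it is iterated instead of being collected into a list first. The sketch below assumes only that each row is a dict carrying the keys selected in the query above. `summarize_routes` and the sample rows are hypothetical stand-ins, not part of the SDK, so the snippet runs without a cluster.

```python
from typing import Any, Dict, Iterable, Optional, Tuple


def summarize_routes(rows: Iterable[Dict[str, Any]]) -> Tuple[int, int, Optional[str]]:
    """Return (row_count, total_routes, busiest_airline) in a single pass."""
    row_count = 0
    total_routes = 0
    busiest_airline = None
    busiest_count = -1
    for row in rows:
        row_count += 1
        total_routes += row['route_count']
        # Track the airline with the highest route_count seen so far
        if row['route_count'] > busiest_count:
            busiest_count = row['route_count']
            busiest_airline = row['airline']
    return row_count, total_routes, busiest_airline


# Stand-in for res.rows(); with a live cluster you would pass res.rows() instead.
sample_rows = iter([
    {'airline': 'AA', 'route_count': 2354, 'avg_route_distance': 2314.88},
    {'airline': 'UA', 'route_count': 2180, 'avg_route_distance': 2350.37},
])
summary = summarize_routes(sample_rows)
```

The helper never holds more than one row at a time, which matters more as result sets grow.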
Cluster Level Queries
Sync API
query = """
SELECT airline, COUNT(*) AS route_count, AVG(route.distance) AS avg_route_distance
FROM `travel-sample`.inventory.route
GROUP BY airline
ORDER BY route_count DESC
"""
res = cluster.execute_query(query)

Async API
query = """
SELECT airline, COUNT(*) AS route_count, AVG(route.distance) AS avg_route_distance
FROM `travel-sample`.inventory.route
GROUP BY airline
ORDER BY route_count DESC
"""
res = await cluster.execute_query(query)
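At cluster level there is no scope to resolve names against, so the statement spells out the database, scope, and collection. As a minimal sketch, the hypothetical helper below (not part of the SDK) builds that fully qualified name and wraps every part in backticks, which SQL++ accepts for any identifier and needs for names such as travel-sample that contain a hyphen. Names that themselves contain a backtick are rejected rather than escaped, since escaping rules are not covered on this page.

```python
def qualified_collection(database: str, scope: str, collection: str) -> str:
    """Return a backtick-delimited database.scope.collection name."""
    parts = (database, scope, collection)
    for part in parts:
        # Refuse empty names and names we do not know how to escape safely
        if not part or '`' in part:
            raise ValueError(f'unsupported identifier: {part!r}')
    return '.'.join(f'`{part}`' for part in parts)


target = qualified_collection('travel-sample', 'inventory', 'route')
statement = f'SELECT COUNT(*) AS route_count FROM {target}'
```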
Positional and Named Parameters
Supplying parameters as individual arguments to the query allows the query engine to optimize the parsing and planning of the query. You can either supply these parameters by name or by position.
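One way to see why this helps: with parameters, the statement text stays identical from call to call and only the values change, whereas pasting values into the text produces a different statement every time. The sketch below illustrates that with plain strings. The shortened statement and the `inlined` helper are hypothetical, and how the service reuses parsed statements internally is not covered here, so treat this as an illustration of the idea only.

```python
# One statement text, reused for every call; values travel separately.
PARAMETERIZED = 'SELECT airline FROM route WHERE sourceairport=$1 AND distance>=$2'


def inlined(source_airport: str, min_distance: int) -> str:
    """Build the statement by pasting values into the text (for contrast only)."""
    return ('SELECT airline FROM route '
            f"WHERE sourceairport='{source_airport}' AND distance>={min_distance}")


calls = [('SFO', 1000), ('LAX', 500), ('JFK', 2000)]

# Distinct statement texts the engine would see under each approach
parameterized_texts = {PARAMETERIZED for _ in calls}
inlined_texts = {inlined(airport, distance) for airport, distance in calls}
```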
Positional Parameters
Execute a query with positional arguments:
Sync API
from couchbase_columnar.options import QueryOptions
query = """
SELECT airline, COUNT(*) AS route_count, AVG(route.distance) AS avg_route_distance
FROM route
WHERE sourceairport=$1 AND distance>=$2
GROUP BY airline
ORDER BY route_count DESC
"""
res = scope.execute_query(query, QueryOptions(positional_parameters=['SFO', 1000]))

Async API
from acouchbase_columnar.options import QueryOptions
query = """
SELECT airline, COUNT(*) AS route_count, AVG(route.distance) AS avg_route_distance
FROM route
WHERE sourceairport=$1 AND distance>=$2
GROUP BY airline
ORDER BY route_count DESC
"""
res = await scope.execute_query(query, QueryOptions(positional_parameters=['SFO', 1000]))
Named Parameters
Execute a query with named arguments:
Sync API
query = """
SELECT airline, COUNT(*) AS route_count, AVG(route.distance) AS avg_route_distance
FROM route
WHERE sourceairport=$source_airport AND distance>=$min_distance
GROUP BY airline
ORDER BY route_count DESC
"""
res = scope.execute_query(query, QueryOptions(named_parameters={'source_airport': 'SFO', 'min_distance': 1000}))

Async API
query = """
SELECT airline, COUNT(*) AS route_count, AVG(route.distance) AS avg_route_distance
FROM route
WHERE sourceairport=$source_airport AND distance>=$min_distance
GROUP BY airline
ORDER BY route_count DESC
"""
res = await scope.execute_query(query, QueryOptions(named_parameters={'source_airport': 'SFO', 'min_distance': 1000}))
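A mismatch between the `$name` placeholders in a statement and the keys of the parameter dict is easy to introduce when either one is edited. The following sketch is a hypothetical pre-flight check, not an SDK feature: it uses a simple regular expression to find placeholders, so it assumes the statement has no `$name`-like text inside string literals or comments.

```python
import re
from typing import Any, Mapping, Set, Tuple

# Matches $name placeholders; purely numeric ones such as $1 are ignored
_NAMED_PLACEHOLDER = re.compile(r'\$([A-Za-z_][A-Za-z0-9_]*)')


def compare_named_parameters(
    statement: str, params: Mapping[str, Any]
) -> Tuple[Set[str], Set[str]]:
    """Return (missing, unused): placeholders without a value, values without a placeholder."""
    referenced = set(_NAMED_PLACEHOLDER.findall(statement))
    supplied = set(params)
    return referenced - supplied, supplied - referenced


statement = """
SELECT airline, COUNT(*) AS route_count
FROM route
WHERE sourceairport=$source_airport AND distance>=$min_distance
GROUP BY airline
"""
missing, unused = compare_named_parameters(
    statement, {'source_airport': 'SFO', 'min_distance': 1000}
)
```

Both sets are empty when the statement and the dict agree, so a caller can raise or log before sending the query.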
Using the Query Result
Query results from the Couchbase Columnar SDK can be passed to common Python data analytics libraries, including pandas and PyArrow.
import pandas as pd
# Assumes `query` is the parameter-free scope-level query from the start of this page
res = scope.execute_query(query)
df = pd.DataFrame.from_records(res.rows(), index='airline')
print(df.head())
#          route_count  avg_route_distance
# airline
# AA              2354         2314.884359
# UA              2180         2350.365407
# DL              1981         2350.494112
# US              1960         2101.417609
# WN              1146         1397.736500
import pyarrow as pa
res = scope.execute_query(query)
table = pa.Table.from_pylist(res.get_all_rows())
print(table.to_string())
# pyarrow.Table
# route_count: int64
# avg_route_distance: double
# airline: string
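Both libraries can also be built from column-oriented data, that is, one list per field, which `pd.DataFrame` and `pa.Table.from_pydict` accept as a dict of lists. As a minimal standard-library sketch, the hypothetical helper below pivots row dicts into that shape and fills fields missing from a row with `None`. The sample rows are stand-ins for the rows of a query result.

```python
from typing import Any, Dict, Iterable, List


def rows_to_columns(rows: Iterable[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot an iterable of row dicts into a dict of equal-length column lists."""
    columns: Dict[str, List[Any]] = {}
    seen = 0
    for row in rows:
        # A field first seen now gets None for every earlier row
        for key in row:
            if key not in columns:
                columns[key] = [None] * seen
        # Append this row's value to every known column, None if absent
        for key, values in columns.items():
            values.append(row.get(key))
        seen += 1
    return columns


sample_rows = [
    {'airline': 'AA', 'route_count': 2354},
    {'airline': 'UA', 'route_count': 2180, 'avg_route_distance': 2350.37},
]
columns = rows_to_columns(sample_rows)
```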
Further Information
The SQL++ for Analytics Reference offers a complete guide to the SQL++ language for both of our analytics services, including all of the latest additions.