N1QL Queries and Results
- concept
An overview of common concepts that you will need to understand in order to use the Query service.
Queries
A N1QL query is a string parsed by the query service. The N1QL query language is based on SQL, but designed for structured and flexible JSON documents.
As with SQL, you can have nested sub-queries.
N1QL queries run on JSON documents, and you can query over multiple documents by using the JOIN
clause.
Because data in N1QL can be nested, there are operators and functions that let you navigate through nested arrays.
Because data can be irregular, you can specify conditions in the WHERE
clause to retrieve data.
You can use standard GROUP BY
, ORDER BY
, LIMIT
, and OFFSET
clauses as well as a rich set of functions to transform the results as needed.
Results
The result for each query is a set of JSON documents.
The returned document set is not required to be uniform, though it can be.
A SELECT
statement that specifies a fixed set of attribute (column) names results in a uniform set of documents and a SELECT
statement that specifies the wild card (*) results in a non-uniform result set.
The only guarantee is that every returned document meets the query criteria.
Here’s a sample query and the result returned:
SELECT name, brewery_id from `beer-sample` WHERE brewery_id IS NOT MISSING LIMIT 2;
{
"requestID": "fbea9e79-a2e2-4ab8-9fdc-14e098838cc1",
"signature": {
"brewery_id": "json",
"name": "json"
},
"results": [
{
"brewery_id": "big_horn_brewing_the_ram_2",
"name": "Schaumbergfest"
},
{
"brewery_id": "ballast_point_brewing",
"name": "Wahoo Wheat Beer"
}
],
"status": "success",
"metrics": {
"elapsedTime": "131.492184ms",
"executionTime": "131.261322ms",
"resultCount": 2,
"resultSize": 205
}
}
Paths
One of the main differences between JSON and flat rows is that JSON supports a nested structure, allowing documents to contain other documents, also known as sub-documents.
N1QL provides paths to support nested data.
Paths use dot notation syntax to identify the logical location of an attribute within a document.
For example, to get the street from a customer order, use the path orders.billTo.street
.
This path refers to the value for street
in the billTo
object.
A path is used with arrays or nested objects to get to attributes within the data structure.
Array syntax in the path can also be used to get to information.
For example, the path orders.items[0].productId
evaluates to the productId
value for the first array element under the order item, items
.
Paths provide a method for finding data in document structures without having to retrieve the entire document or handle it within an application. Any document data can be requested and returned to an application. When only relevant information is returned to an application, querying bandwidth is reduced.
See Nested Path Expressions for more details.
Parameterized Queries
N1QL allows the use of placeholders to declare dynamic query parameters. For example:
SELECT airportname FROM `travel-sample`.inventory.airport WHERE city=$1
The $1
is a positional placeholder.
When the query is constructed, it may receive arguments, with each argument being used as the placeholder value in the query.
Thus, $1
refers to the first argument,$2
to the second, and so on.
Placeholders may also be named.
This is particularly useful when there are many query parameters and ensuring that they are all in the correct order may be cumbersome.
Name query placeholders take the form of $name
.
SELECT airportname FROM `travel-sample`.inventory.airport
WHERE country=$country
AND geo.alt > $altitude
AND (geo.lat BETWEEN $min_lat AND $max_lat)
AND (geo.lon BETWEEN $min_lon AND $max_lon);
You can set positional and named parameter values using cbq or the N1QL REST API when you run the query. For example, using cbq:
cbq> \SET -$country "United States";
cbq> \SET -$altitude 500;
cbq> \SET -$min_lat -50;
cbq> \SET -$max_lat 50;
cbq> \SET -$min_lon -180;
cbq> \SET -$max_lon 0;
Query Optimization Using Prepared Statements
When a N1QL query string is sent to the server, the server will inspect the string and parse it, planning which indexes to query. Once this is done, it generates a query plan. The computation for the plan adds some additional processing time and overhead for the query.
Often-used queries can be prepared so that its plan is generated only once. Subsequent queries using the same query string will use the pre-generated plan instead, saving on the overhead and processing of the plan each time.
Parameterized queries are considered the same query for caching and planning purposes, even if the supplied parameters are different. |
For more information on how to optimize queries using prepared statements, refer to the PREPARE statement.
Indexes
The Couchbase query service makes use of indexes in order to do its work. Indexes replicate subsets of documents from data nodes over to index nodes, allowing specific data (for example, specific document properties) to be retrieved quickly, (and to distribute load away from data nodes in MDS topologies).
In order to make a keyspace queryable, it must have at least one index defined.
You can define a primary index on a keyspace.
When a primary index is defined you can issue non-covered queries on the keyspace as well.
This includes using the META
function in the queries.
CREATE PRIMARY INDEX ON `users`;
You can also define indexes over given document fields and then use those fields in the query:
CREATE INDEX ix_name ON `users`(name);
CREATE INDEX ix_email ON `users`(email);
Would allow you to query the users keyspace regarding a document’s name
or email
properties, so for example:
SELECT name, email FROM `users` WHERE name="Monty Python" OR email="monty@python.org";
Indexes help improve the performance of a query. When an index includes the actual values of all the fields specified in the query, the index covers the query and eliminates the need to fetch the actual values from the Data Service. An index, in this case, is called a covering index and the query is called a covered query.
For more information, refer to Using Indexes.
Index Building
Index creation happens in two phases: the creation phase and the build phase. During the creation phase, the Index Service validates the user input, decides the host node for the index, and creates the index metadata on the host node. During the build phase, the Index Service reads the documents from the Data Service and builds the index. The build phase cannot start until the creation phase is complete.
Creating and building indexes can take a long time on keyspaces with lots of existing documents. When you create an index, you can choose to defer the build phase, and then build the deferred index later. This allows multiple indexes to be built at once rather than having to re-scan the entire keyspace for each index.
CREATE PRIMARY INDEX ON `users` WITH {"defer_build": true};
CREATE INDEX ix_name ON `users`(name) WITH {"defer_build": true};
CREATE INDEX ix_email ON `users`(email) WITH {"defer_build": true};
BUILD INDEX ON `users`(`#primary`, `ix_name`, `ix_email`);
The indexes are actually built when the BUILD INDEX
statement is executed.
At this point, the server scans all the documents in the users
keyspace and indexes it for all of the applicable indexes (i.e.
if it has a name
or email
) field.
For more information, refer to CREATE PRIMARY INDEX, CREATE INDEX, and BUILD INDEX.
Index Consistency
Because indexes are by design outside the data service, they are eventually consistent with respect to changes to documents and, depending on how you issue the query, may at times not contain the most up-to-date information. This may especially be the case when deployed in a write-heavy environment: changes may take some time to propagate over to the index nodes.
The asynchronous updating nature of global secondary indexes means that they can be very quick to query and do not require the additional overhead of index recalculations at the time documents are modified. N1QL queries are forwarded to the relevant indexes and the queries are done based on indexed information, rather than the documents as they exist in the data service.
With default query options, the query service will rely on the current index state: the most up-to-date document versions are not retrieved, and only the indexed versions are queried. This provides the best performance. Only updates occurring with a small time frame may not yet have been indexed.
The query service can use the latest versions of documents by modifying the consistency
of the query.
This is done by setting the scan_consistency
parameter to REQUEST_PLUS
.
When using this consistency mode, the query service will ensure that the indexes are synchronized with the data service before querying.
For more information, refer to Query Settings.