In this Appendix, we’ll look at how variables are bound and how names are resolved.
Names can appear in every clause of a query.
Sometimes a name consists of just a single identifier, e.g., region
or revenue
.
More often a name will consist of two identifiers separated by a dot, e.g., customer.address
.
Occasionally a name may have more than two identifiers, e.g., policy.owner.address.zipcode
.
Resolving a name means determining exactly what the (possibly multi-part) name refers to.
It is necessary to have well-defined rules for how to resolve a name in cases of ambiguity.
(In the absence of schemas, such cases arise more commonly, and also differently, than they do in SQL.)
The basic job of each clause in a query block is to bind variables. Each clause sees the variables bound by previous clauses and may bind additional variables. Names are always resolved with respect to the variables that are bound ("in scope") at the place where the name use in question occurs. It is possible that the name resolution process will fail, which may lead to an empty result or an error message.
One important bit of background: When the system is reading a query and resolving its names, it has a list of all the available dataverses and datasets.
As a result, it knows whether a.b
is a valid name for dataset b
in dataverse a
.
However, the system does not in general have knowledge of the schemas of the data inside the datasets; remember that this is a much more open world.
As a result, in general the system cannot know whether any object in a particular dataset will have a field named c
.
These assumptions affect how errors are handled.
If you try to access dataset a.b
and no dataset by that name exists, you will get an error and your query will not run.
However, if you try to access a field c
in a collection of objects, your query will run and return missing
for each object that doesn’t have a field named c
- this is because it’s possible that some object (someday) could have such a field.
Binding Variables
Variables can be bound in the following ways:
-
WITH
andLET
clauses bind a variable to the result of an expression in a straightforward wayExamples:
WITH cheap_parts AS (SELECT partno FROM parts WHERE price < 100)
binds the variablecheap_parts
to the result of the subquery.LET pay = salary + bonus
binds the variablepay
to the result of evaluating the expressionsalary + bonus
. -
FROM
,GROUP BY
, andSELECT
clauses have optionalAS
subclauses that contain an expression and a name (called an iteration variable in aFROM
clause, or an alias inGROUP BY
orSELECT
).Examples:
FROM customer AS c, order AS o
GROUP BY salary + bonus AS total_pay
SELECT MAX(price) AS highest_price
An
AS
subclause always binds the name (as a variable) to the result of the expression (or, in the case of aFROM
clause, to the individual members of the collection identified by the expression).It’s always a good practice to use the keyword
AS
when defining an alias or iteration variable. However, as in SQL, the syntax allows the keywordAS
to be omitted. For example, theFROM
clause above could have been written like this:FROM customer c, order o
Omitting the keyword
AS
does not affect the binding of variables. The FROM clause in this example binds variables c and o whether the keyword AS is used or not.In certain cases, a variable is automatically bound even if no alias or variable-name is specified. Whenever an expression could have been followed by an AS subclause, if the expression consists of a simple name or a path expression, that expression binds a variable whose name is the same as the simple name or the last step in the path expression. Here are some examples:
FROM customer, order
binds iteration variables namedcustomer
andorder
GROUP BY address.zipcode
binds a variable namedzipcode
SELECT item[0].price
binds a variable namedprice
Note that a
FROM
clause iterates over a collection (usually a dataset), binding a variable to each member of the collection in turn. The name of the collection remains in scope, but it is not a variable. For example, consider thisFROM
clause used in a self-join:FROM customer AS c1, customer AS c2
This
FROM
clause joins the customer dataset to itself, binding the iteration variablesc1
andc2
to objects in the left-hand-side and right-hand-side of the join, respectively. After theFROM
clause,c1
andc2
are in scope as variables, and customer remains accessible as a dataset name but not as a variable. -
Special rules for
GROUP BY
:-
(3A): If a
GROUP BY
clause specifies an expression that has no explicit alias, it binds a pseudo-variable that is lexicographically identical to the expression itself. For example:GROUP BY salary + bonus
binds a pseudo-variable namedsalary + bonus
.This rule allows subsequent clauses to refer to the grouping expression (salary + bonus) even though its constituent variables (salary and bonus) are no longer in scope. For example, the following query is valid:
FROM employee GROUP BY salary + bonus HAVING salary + bonus > 1000 SELECT salary + bonus, COUNT(*) AS how_many
While it might have been more elegant to explicitly require an alias in cases like this, the pseudo-variable rule is retained for SQL compatibility. Note that the expression
salary + bonus
is not actually evaluated in theHAVING
andSELECT
clauses (and could not be sincesalary
andbonus
are no longer individually in scope). Instead, the expressionsalary + bonus
is treated as a reference to the pseudo-variable defined in theGROUP BY
clause. -
(3B): The
GROUP BY
clause may be followed by aGROUP AS
clause that binds a variable to the group. The purpose of this variable is to make the individual objects inside the group visible to subqueries that may need to iterate over them.The
GROUP AS
variable is bound to a multiset of objects. Each object represents one of the members of the group. Since the group may have been formed from a join, each of the member-objects contains a nested object for each variable bound by the nearestFROM
clause (and itsLET
subclause, if any). These nested objects, in turn, contain the actual fields of the group-member. To understand this process, consider the following query fragment:FROM parts AS p, suppliers AS s WHERE p.suppno = s.suppno GROUP BY p.color GROUP AS g
Suppose that the objects in
parts
have fieldspartno
,color
, andsuppno
. Suppose that the objects in suppliers have fieldssuppno
andlocation
.Then, for each group formed by the
GROUP BY
, the variable g will be bound to a multiset with the following structure:[ { "p": { "partno": "p1", "color": "red", "suppno": "s1" }, "s": { "suppno": "s1", "location": "Denver" } }, { "p": { "partno": "p2", "color": "red", "suppno": "s2" }, "s": { "suppno": "s2", "location": "Atlanta" } }, ... ]
-
Scoping
In general, the variables that are in scope at a particular position are those variables that were bound earlier in the current query block, in outer (enclosing) query blocks, or in a WITH
clause at the beginning of the query.
More specific rules follow.
The clauses in a query block are conceptually processed in the following order:
-
FROM
(followed by LET subclause, if any) -
WHERE
-
GROUP BY
(followed by LET subclause, if any) -
HAVING
-
SELECT
orSELECT VALUE
-
ORDER BY
-
OFFSET
-
LIMIT
During processing of each clause, the variables that are in scope are those variables that are bound in the following places:
-
In earlier clauses of the same query block (as defined by the ordering given above).
Example:
FROM orders AS o SELECT o.date
The variableo
in theSELECT
clause is bound, in turn, to each object in the datasetorders
. -
In outer query blocks in which the current query block is nested. In case of duplication, the innermost binding wins.
-
In the
WITH
clause (if any) at the beginning of the query.
However, in a query block where a GROUP BY
clause is present:
-
In clauses processed before
GROUP BY
, scoping rules are the same as though no GROUP BY were present. -
In clauses processed after
GROUP BY
, the variables bound in the nearestFROM
-clause (and itsLET
subclause, if any) are removed from scope and replaced by the variables bound in theGROUP BY
clause (and itsLET
subclause, if any). However, this replacement does not apply inside the arguments of the five SQL special aggregating functions (MIN
,MAX
,AVG
,SUM
, andCOUNT
). These functions still need to see the individual data items over which they are computing an aggregation. For example, afterFROM employee AS e GROUP BY deptno
, it would not be valid to referencee.salary
, butAVG(e.salary)
would be valid.
Special case: In an expression inside a FROM
clause, a variable is in scope if it was bound in an earlier expression in the same FROM
clause.
Example:
FROM orders AS o, o.items AS i
The reason for this special case is to support iteration over nested collections.
Note that, since the SELECT
clause comes after the WHERE
and GROUP BY
clauses in conceptual processing order, any variables defined in SELECT
are not visible in WHERE
or GROUP BY
.
Therefore the following query will not return what might be the expected result (since in the WHERE clause, pay
will be interpreted as a field in the emp
object rather than as the computed value salary + bonus
):
SELECT name, salary + bonus AS pay FROM emp WHERE pay > 1000 ORDER BY pay
The likely intent of the query above can be accomplished as follows:
FROM emp AS e LET pay = e.salary + e.bonus WHERE pay > 1000 SELECT e.name, pay ORDER BY pay
Note that in the phrase expr1 JOIN
expr2 ON
expr3, variables defined in expr1 are visible in expr3 but not in expr2. Here’s an example that will not work:
FROM orders AS o JOIN o.items AS i ON 1 = 1
The variable o
, defined in the phrase before JOIN
, cannot be used in the phrase immediately following JOIN
. The probable intent of this example could be accomplished in either of the following ways:
FROM orders AS o UNNEST o.items AS i FROM orders AS o, o.items AS i
To summarize this rule: You may not use left-correlation in an explicit JOIN
clause.
Resolving Names
The process of name resolution begins with the leftmost identifier in the name. The rules for resolving the leftmost identifier are:
-
In a
FROM
clause: Names in aFROM
clause identify the collections over which the query block will iterate. These collections may be stored datasets or may be the results of nested query blocks. A stored dataset may be in a named dataverse or in the default dataverse. Thus, if the two-part namea.b
is in aFROM
clause, a might represent a dataverse andb
might represent a dataset in that dataverse. Another example of a two-part name in aFROM
clause isFROM orders AS o, o.items AS i
. Ino.items
,o
represents an order object bound earlier in theFROM
clause, and items represents the items object inside that order.The rules for resolving the leftmost identifier in a
FROM
clause (including aJOIN
subclause), or in the expression followingIN
in a quantified predicate, are as follows:-
(1A): If the identifier matches a variable-name that is in scope, it resolves to the binding of that variable. (Note that in the case of a subquery, an in-scope variable might have been bound in an outer query block; this is called a correlated subquery).
-
(1B): Otherwise, if the identifier is the first part of a two-part name like
a.b
, the name is treated asdataverse.dataset
. If the identifier stands alone as a one-part name, it is treated as the name of a dataset in the default dataverse. If the designated dataset exists then the identifier is resolved to that dataset, otherwise if a synonym with given name exists then the identifier is resolved to the target dataset of that synonym (potentially recursively if this synonym points to another synonym). An error will result if the designated dataset or a synonym with this name does not exist.Datasets take precedence over synonyms, so if both a dataset and a synonym have the same name then the resolution is to the dataset.
-
-
Elsewhere in a query block: In clauses other than
FROM
, a name typically identifies a field of some object. For example, if the expressiona.b
is in aSELECT
orWHERE
clause, it’s likely thata
represents an object andb
represents a field in that object.The rules for resolving the leftmost identifier in clauses other than the ones listed in Rule 1 are:
-
(2A): If the identifier matches a variable-name that is in scope, it resolves to the binding of that variable. (In the case of a correlated subquery, the in-scope variable might have been bound in an outer query block).
-
(2B): (The "Single Variable Rule"): Otherwise, if the
FROM
clause in the current query block binds exactly one variable, the identifier is treated as a field access on the object bound to that variable. For example, in the queryFROM customer SELECT address
, the identifier address is treated as a field in the object bound to the variablecustomer
. At runtime, if the object bound tocustomer
has noaddress
field, theaddress
expression will returnmissing
. If theFROM
clause in the current query block binds multiple variables, name resolution fails with an "ambiguous name" error. If there’s noFROM
clause in the current query block, name resolution fails with an "undefined identifier" error. Note that the Single Variable Rule searches for bound variables only in the current query block, not in outer (containing) blocks. The purpose of this rule is to permit the compiler to resolve field-references unambiguously without relying on any schema information. Also note that variables defined byLET
clauses do not participate in the resolution process performed by this rule.Exception: In a query that has a
GROUP BY
clause, the Single Variable Rule does not apply in any clauses that occur after theGROUP BY
because, in these clauses, the variables bound by theFROM
clause are no longer in scope. In clauses afterGROUP BY
, only Rule (2A) applies.
-
-
In an
ORDER BY
clause following aUNION ALL
expression:The leftmost identifier is treated as a field-access on the objects that are generated by the
UNION ALL
. For example:query-block-1 UNION ALL query-block-2 ORDER BY salary
In the result of this query, objects that have a foo field will be ordered by the value of this field; objects that have no foo field will appear at at the beginning of the query result (in ascending order) or at the end (in descending order.)
-
In a standalone expression: If a query consists of a standalone expression then identifiers inside that expression are resolved according to Rule 1. For example, if the whole query is
ARRAY_COUNT(a.b)
thena.b
will be treated as datasetb
contained in dataversea
. Note that this rule only applies to identifiers which are located directly inside a standalone expression. Identifiers insideSELECT
statements in a standalone expression are still resolved according to Rules 1-3. For example, if the whole query isARRAY_SUM( (FROM employee AS e SELECT VALUE salary) )
thensalary
is resolved ase.salary
following the "Single Variable Rule" (Rule (2B)). -
Once the leftmost identifier has been resolved, the following dots and identifiers in the name (if any) are treated as a path expression that navigates to a field nested inside that object. The name resolves to the field at the end of the path. If this field does not exist, the value
missing
is returned.
Resolving Database Entities
This section explains how Couchbase Analytics resolves database entity references in the FROM clause — that is, when the first identifier is not a variable reference. This explanation also applies to resolving database entity references in DDL statements.
A database entity reference (or qualified name) may consist of an optional Analytics scope name, and the database entity’s local name, separated by a dot.
-
The Analytics scope name may consist of one or two identifiers, separated by a dot.
-
The local name consists of a single identifier.
Analytics uses the following three resources to resolve database entity references:
-
(Optional) The preceding USE statement.
-
(Optional) The
query_context
parameter. -
The database entity reference provided in the query.
The USE Statement
The USE statement enables you to specify the Analytics scope for the statement immediately following.
For more details, refer to Use Statements.
The Query Context Parameter
The query_context
parameter enables you to specify the Analytics scope for a statement.
You can set the query_context
parameter using the cbq shell
or the Analytics Service REST API.
The value of the query_context
parameter must start with default:
,
followed by an Analytics scope name.
The default:
prefix is a dummy and is ignored when resolving a collection name or synonym name.
The default value of the query_context
parameter is default:Default
, meaning the Default
Analytics scope.
For more details, refer to Analytics Query Parameters.
You can also set the query context in the Analytics workbench using the query context drop-down menu. |
The Database Entity Reference
The database entity reference is resolved according to the following rules:
-
If the database entity reference contains multiple identifiers, all identifiers except the last are assumed to refer to the Analytics scope, and the last identifier is assumed to be the database entity’s local name. Analytics attempts to find the database entity using the specified scope and local name. If the database entity is not found, an error is generated.
-
If the database entity reference only contains one identifier, this is assumed to be the database entity’s local name.
-
If the USE statement has been used to specify an Analytics scope, Analytics attempts to find the database entity using the scope provided by the USE statement and the specified local name.
-
Otherwise, if the
query_context
parameter has been set, Analytics attempts to find the database entity using the scope provided by the query context and the specified local name. -
Otherwise, Analytics attempts to find the database entity using the
Default
Analytics scope and the specified local name.
If the database entity is not found, an error is generated.
-
If the database entity is an Analytics synonym, the synonym is then resolved to find the Analytics collection.
Default Scopes and Collections
In the Data service, each bucket may contain a default scope called _default
,
which in turn may contain a default collection also called _default
.
The Query service enables you to refer to the default scope within the default collection using just the bucket name.
However, Analytics does not support this type of reference.
To simulate this behavior in Analytics,
you can create an Analytics synonym in the Default
scope whose name represents the bucket name.
So for example, you could create an Analytics synonym travel-sample
in the Default
Analytics scope,
which points to the _default
Analytics collection in the `travel-sample`._default
Analytics scope.
This is the approach taken by the ALTER COLLECTION statement.
Unlike the Query service, Analytics does not enable you to resolve individual missing elements
in the middle of a database entity reference.
For example, you cannot use analytics..customers
to refer to the the customers
Analytics collection in the
`analytics`._default
Analytics scope.