Access Data

Capella Columnar

tutorial

You can set up different data sources to work with sample data in Capella Columnar.

This Capella Columnar guide includes example statements and queries that refer to sample datasets, including a Commerce example dataset and the Couchbase travel-sample and beer-sample datasets.

Use the procedures on this page to create Capella Columnar collections for these datasets. Each section gives you practice setting up database objects in Capella Columnar so that you can work with data from different sources. For more information about Columnar database objects, see Access and Organize Data in Capella Columnar Services.

Set up standalone collections and populate them by inserting the Commerce dataset.
Directly import the travel-sample data into a new database object in your Columnar cluster.

Set up a remote data source—a link and a set of remote collections—to shadow beer-sample data in a Capella collection.

The beer-sample example requires you to set up a Capella operational cluster as a remote data source for Capella Columnar. The Commerce dataset and travel-sample do not require a Capella operational cluster or other remote data source to set up.

Prerequisites

If you’re just getting started with Capella Columnar, make sure that you created a Couchbase Capella account.
Create a project and Columnar cluster. For more information about how to set up your project and cluster in Columnar, see Create a Cluster.
If you want to work with the beer-sample dataset, you must have a Capella operational cluster deployed. This cluster is the remote data source for the sample data in the example. For more information about how to create a Capella operational cluster, see Create A Paid Cluster.
To follow along with the examples on this page, you must have the Organization Owner role in your organization, or one of the following project roles for the project that contains your cluster:
- Project Owner
- Database Data Reader/Writer

Install the Commerce Dataset in Standalone Collections

The Commerce dataset consists of two collections:

customers, with the primary key custid which has string values
orders, with the primary key orderno which has integer values

To work with this dataset in Capella Columnar you create a standalone collection for each one. Then, you use INSERT INTO statements to populate them with data.

Create a Standalone Collection

In the Capella UI, select the Columnar tab and then click a cluster name. The workbench opens.
In the explorer, go to Create Database.
In the Database Name field, enter sampleAnalytics.
Click Add Scope to New Database to create a scope named Commerce.
Click Create.
In the explorer, point to the Commerce scope and go to More Options (⋮) Add Standalone Collection.
In the Collection Name field, enter customers.
For the Collection Primary Key, in the Field Name, enter custid.
In the Field Type list, select string.
Click Create.

Populate a Standalone Collection

Use the query editor’s Query Context lists to select the sampleAnalytics database and Commerce scope.
In the query editor, begin an INSERT INTO statement as follows:
```
  INSERT INTO customers (
```
If you type in the statement instead of copying and pasting this example, the query editor automatically supplies the closing parenthesis ).
Open the customers data, select the contents of the page and copy it.
To complete the statement, return to the query editor and paste the JSON document in between the parentheses. You’ll need to add the closing parenthesis ) if you used copy and paste to supply the INSERT INTO statement.
Run the query to populate the customers collection.
To verify that the collection contains data now, run the following query:
```
  SELECT * FROM customers LIMIT 1;
```

Create another standalone collection for orders, which uses the Field Name orderno and a Field Type of int as a primary key. Use the orders data to select and copy the data for this collection, then populate it using another INSERT INTO statement.

You can also populate standalone collections by importing data from a file. After you create a standalone collection, point to the collection name, and go to Import Data to Collection to upload a CSV, TSV, JSON, or JSONL file. You can configure import filters or create a new collection for your import.

Import the `travel-sample` Collections

The Couchbase travel-sample dataset is available to import from directly inside the Columnar workbench. The travel-sample consists of 5 collections of JSON documents: airline, airport, landmark, hotel, and route.

To import the travel-sample into your Columnar cluster:

In the explorer, click Import.
Click Sample.
Click Import.

Columnar creates a new database, travel-sample, with the inventory scope and all 5 collections.

After the import finishes, you can choose to import sample queries to work with the travel-sample dataset.

Create Remote Collections for `beer-sample`

You can import the Couchbase beer-sample dataset into a Capella operational cluster or self-managed Couchbase Server cluster. This dataset consists of a single collection, which contains data on beers and breweries.

Set up remote collections to hold shadow copies of the beer-sample data in Capella Columnar. Use WHERE clauses to create multiple collections on Columnar, instead of creating only a single collection to match what’s in your remote data source in your Capella operational cluster or Couchbase Server.

Any Capella Columnar collections that use a WHERE clause apply that clause on an ongoing basis to continuously filter your incoming data event stream. Only documents that meet the WHERE clause criteria are upserted into your Capella Columnar collection.

Prepare to Ingest Data from the Remote System

Make sure the Capella operational cluster you want to use as a remote data source belongs to the same organization as your Capella Columnar cluster.

To prepare your Capella operational cluster for creating a remote collection in Capella Columnar:

Import the beer-sample sample data.

After you have imported the sample data, you can set up your Capella operational cluster as a remote data source in Columnar.

Create a Data Source for Remote Data

In the Capella UI, click the Couchbase Capella icon and from the Columnar tab select your cluster to open the workbench.
In the explorer, go to Create Cluster.
In the Database Name field, enter remoteCapella.
Click Add Scope to New Cluster to create a scope named remoteBeer.
Click Create.
In the explorer, go to Create Data Link.
Click Couchbase Capella.
Click Continue.
In the Link Name field, enter capellaLink.
Select the cluster where you imported the beer-sample dataset and click Save & Continue.

Capella creates the link between your Columnar cluster and your operational cluster.
Click Create Linked Collection.
In the lists, select your remoteCapella cluster and remoteBeer scope.
In the Collection Name field, enter the name for the first collection, brewBelgium.
In the Configure Data Details section, select the beer-sample bucket, and _default scope and collection.
In the provided field, enter a WHERE clause for the documents in the collection to shadow:
```
country = "Belgium"
```
Click Create Collection.
(Optional) Click Create Another Collection and use the following WHERE clause to shadow data for French beers and breweries:
```
country = "France"
```
To start shadowing data from your Capella operational cluster in Capella Columnar, click Connect Link.

You’ll incur charges for usage after you connect this link.
Click Yes, Continue.
Close the link creation pane.
Verify that your brewBelgium collection now contains a shadow copy of the data sourced from Capella by running the following query, with your query context set to remoteCapella and remoteBeer:
```
  SELECT * FROM brewBelgium LIMIT 1;
```

Next Steps

You can create more collections using the same remote link to your Capella operational cluster.

In the explorer, expand Links and go to More Options (⋮) Create Linked Collection to add another collection. Use WHERE clauses to shadow different subsets of the data: brewGermany for country = "Germany", brewUS for country = "United States" and so on.

You can also use the example DDL statements for creating remote Couchbase collections as guidelines for using SQL++ queries instead of the user interface controls. See the examples for Create a Remote Couchbase Collection.