CREATE a Remote Collection

  • Capella Columnar
  • reference
    +
    This topic describes how you use the CREATE statement to create a collection that shadows OLTP data from a remote data source.
     CreateRemoteCollection | CreateExternalCollection | CreateStandaloneCollection

    You use different statement syntax to create collections that shadow data located on a Couchbase data service than you do for collections that shadow data from a data service that uses a Kafka pipeline.

    Create a Remote Couchbase Collection

    To create a link to a remote data source, you use the Capella Columnar UI. See Stream Data from Remote Sources.

    Syntax

    CreateRemoteCouchbaseCollection EBNF
    CreateRemoteCouchbaseCollection ::= "CREATE" "ANALYTICS"? "COLLECTION" ("IF" "NOT" "EXISTS")?
                                     QualifiedName
                                     ( "WITH" ObjectConstructor )?
                                     "ON" KeyspaceName
                                     "AT" LinkName
                                     ( "WHERE" Expr )?
    CreateRemoteCouchbaseCollection Diagram
    "CREATE" "COLLECTION" ("IF" "NOT" "EXISTS")? QualifiedName ( "WITH" ObjectConstructor )? "ON" KeyspaceName "AT" LinkName ( "WHERE" Expr )?

    For all collections, the QualifiedName is the fully qualified name of the collection to create.

    Show QualifiedName Diagram
    (DatabaseAndScopeName ".")? Identifier
    QualifiedName
    Show ObjectConstructor Diagram
    "{" ( Expr ( ":" Expr )? ( "," Expr ( ":" Expr )? )* )? "}"
    ObjectConstructor
    Show KeyspaceName Diagram
    Identifier ( "." Identifier "." Identifier )?
    KeyspaceName

    Example

    This example adds a collection for French breweries and associates it with the remote link to Capella. As a prerequisite for running this example yourself, follow the add sample data procedures to add a remote link and collections to Capella Columnar.

      CREATE COLLECTION remoteCapella.remoteBeer.beerFrance
        ON `beer-sample`._default._default
        AT capellaLink
        WHERE country = "France";

    Arguments

    WITH

    The optional WITH clause enables you to specify parameters for the remote Couchbase collection. Its ObjectConstructor represents a JSON object containing key-value pairs.

    The optional storage-block-compression parameter determines the storage compression used for this collection. This parameter takes a nested object value, scheme. Possible values for scheme are snappy or none. The default storage compression scheme is snappy.

    ON

    The ON clause specifies the data source for the collection. The `Identifier`s in this clause represent a bucket name, followed by an optional scope name and collection name, on the remote Couchbase Server or Capella cluster.

    AT

    The AT clause specifies the name of the link that contains credentials for the bucket. The specified link must have a type of Couchbase.

    WHERE

    The optional WHERE clause provides the option to filter the documents in the collection. The Expr in this clause must be deterministic, and it cannot contain a user-defined function. For example, you could filter content by specifying a predicate on the values for one or more of the fields in the objects.

    To create a link to a remote data source, you use the Capella Columnar UI. See Stream Data from Couchbase Capella.

    Create a Remote Kafka Collection

    To create a link to a remote data source, you use the Capella Columnar UI. See Stream Data from Remote Sources.

    Syntax

    CreateRemoteKafkaCollection EBNF
    CreateRemoteKafkaCollection ::= "CREATE" "ANALYTICS"? "COLLECTION"
                                    ("IF" "NOT" "EXISTS")?
                                     QualifiedName
                                     "PRIMARY" "KEY" "(" FieldList ")"
                                     "ON" TopicName ("," TopicName)*
                                     "AT" LinkName
                                     "WITH" KafkaObjectConstructor
                                     ("WHERE" Expr )?
    CreateRemoteKafkaCollection Diagram
    "CREATE" "COLLECTION" ("IF" "NOT" "EXISTS")? QualifiedName "PRIMARY" "KEY" "(" FieldList ")" "ON" TopicName(,TopicName)* "AT" LinkName ( "WITH" KafkaObjectConstructor )? ( "WHERE" Expr )?

    For all collections, the QualifiedName is the fully qualified name of the collection to create.

    Show QualifiedName Diagram
    (DatabaseAndScopeName ".")? Identifier
    QualifiedName
    Show FieldList Diagram
    FieldDef ("," FieldDef)*
    FieldList
    Show FieldDef Diagram
    FieldName ":" FieldType
    FieldDef
    Show FieldName Diagram
    Identifier ("." Identifier)*
    FieldName
    Show FieldType Diagram
    Identifier
    FieldType

    Examples

    This statement uses the JSON Commerce customers dataset as an example. It assumes that this data is in a Kafka topic named remote-topic-1 for a MongoDB dataset that uses CDC. The primary key for the MongoDB dataset is _id, with string values. In addition, this example assumes that a link named confluentLink has the credentials for accessing the MongoDB dataset.

      CREATE COLLECTION sampleAnalytics.Commerce.customers
      PRIMARY KEY (_id:string)
      ON `remote-topic-1`
      AT confluentLink
      WITH {
          "keySerializationType":"JSON",
          "valueSerializationType":"JSON",
          "cdcEnabled": "true"
          "cdcDetails": {
            "cdcSource": "MONGODB"
            "cdcSourceConnector":"DEBEZIUM"
            },
          "deadLetterQueue":"dlq_topic-1"
      };
    Show additional example

    This statement uses the JSON Commerce orders dataset as an example. It assumes that the data is in a Kafka topic named non_cdc_5m_json.mongo_database.mongo_collection for a MongoDB dataset that does not use CDC. In addition, you have a link set up named mskLink with the credentials for accessing the dataset.

      CREATE COLLECTION sampleAnalytics.Commerce.orders
      PRIMARY KEY (orderno:int)
      ON `non_cdc_5m_json.mongo_database.mongo_collection`
      AT mskLink
      WITH {
          "keySerializationType":"JSON",
          "valueSerializationType":"JSON"
      };

    Arguments

    PRIMARY KEY

    The PRIMARY KEY clause indicates the field or fields to use as the primary key for the collection. The FieldType can be any of the primitive data types, where INT is an alias for BIGINT.

    A common datatype for the MongoDB _id primary key is objectId. To specify a primary key with this datatype, you use the following syntax:

      PRIMARY KEY (`_id`.`$oid`:String)

    If CDC is enabled for the collection, you should specify the primary key of the source collection as the primary key for your Capella Columnar collection.

    ON

    You specify one or more topics from the Kafka cluster in a comma-separated list.

    WITH

    The WITH clause enables you to specify parameters for the collection. Its ObjectConstructor represents an object containing key-value pairs, one for each parameter. The configuration that you supply applies to all of the topics listed by the ON clause. You can define the following parameters.

    Name Description Schema

    keySerializationType
    Required

    Specifies the format of the keys in the remote data. Accepts one of the following string values:

    JSON — JSON
    PROTOBUF — Protocol Buffers
    AVRO — Apache Avro

    For information about the mapping that Capella Columnar performs for Avro data types, see Data Type Mapping: Parquet and Avro.

    enum: json, protobuf, avro

    valueSerializationType
    Required

    Specifies the format of the values in the remote data. Accepts one of the following string values:

    JSON — JSON
    PROTOBUF — Protocol Buffers
    AVRO — Apache Avro

    For information about the mapping that Capella Columnar performs for Avro data types, see Data Type Mapping: Parquet and Avro.

    enum: json, protobuf, avro

    cdcEnabled
    Required

    Identifies whether the Kafka pipeline uses Change Data Capture (CDC) processing.

    When true, you also specify a cdcDetails object with the cdcSource and cdcSourceConnector parameters.

    When true, you should specify the primary key of the source collection as the primary key for your Capella Columnar collection. The Debezium source connector sends the primary key of the source collection as the key for records in the Kafka topic.

    Boolean

    cdcSource
    Optional

    Only used if cdcEnabled is true.

    Identifies the data source that uses CDC in the Kafka pipeline.

    enum: MONGODB, MYSQLDB, POSTGRESQL

    cdcSourceConnector
    Optional

    Only used if cdcEnabled is true.

    Identifies the type of source data connector used in the pipeline.

    enum: DEBEZIUM

    deadLetterQueue
    Optional

    Specifies a remote Kafka topic as the destination for failed messages. If you do not define a topic, failed messages are dropped.

    string

    To create a link to a remote data source, you use the Capella Columnar UI. See Create a Kafka Pipeline Link.