Create a Kafka Pipeline Collection

    To receive a data event stream from a remote data source that uses an Apache Kafka pipeline, you create a remote collection.

    You can create collections to associate with a Kafka pipeline link when you create the link or at any time afterwards.

    You can also use a SQL++ statement to create a remote Kafka collection. See CREATE a Remote Collection.
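    For example, a minimal sketch of the statement's general shape, using placeholder database, scope, collection, key, topic, and link names, looks like this (see CREATE a Remote Collection for the exact syntax and options):

        CREATE COLLECTION myDatabase.myScope.myCollection
          PRIMARY KEY (id: string)
          ON myTopic AT myKafkaLink;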

    Requirements

    Primary Key

    When you set up a remote collection to receive data from a Kafka pipeline, you supply the primary key and its data type in KEY_NAME:DATA_TYPE format. For example, id:string.

    • To use a key name that includes a space or any character other than an underscore (_), enclose the name in backtick (`) characters.

    • For source data that uses an object ID, append a period (.) followed by `$oid` to the KEY_NAME, in the following format:

        KEY_NAME.`$oid`:DATA_TYPE

      For example:

         _id.`$oid`:string
    • For a composite key, enter a comma-separated list of the key names and their data types.
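      For example, a composite key made up of hypothetical customer_id and order_date fields might look like:

         customer_id:string, order_date:string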

    Topic

    The Kafka topic or set of topics that contains the data you want to stream into the collection. You can stream data from one or more topics to multiple collections using the same link. However, the collections that stream the same topics must have the same data serialization and change data capture settings. Otherwise, you receive an inconsistent details config error.

    Similarly, when streaming data from multiple topics into a collection, the data serialization and change data capture settings must apply to all of the topics that you provide.

    Data Serialization

    The type of data serialization used for keys and values. You select the serialization type for keys and for values separately (see steps 9 and 10 of the procedure that follows).

    Dead Letter Queue

    You can have Capella Columnar report any messages that it fails to load by writing them to a Kafka topic known as the dead letter queue. The credentials you supply for the link's connection to Kafka must have permission to produce messages on this topic.

    Change Data Capture

    Whether Change Data Capture (CDC) applies, and if so, the source.
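    Putting these requirements together, the following sketch shows how such a configuration might be expressed in a single SQL++ statement. The database, scope, collection, topic, and link names are placeholders, and the WITH parameter names are illustrative assumptions rather than confirmed syntax; see CREATE a Remote Collection for the supported options.

        -- Sketch only: the WITH parameter names below are assumptions, not confirmed syntax.
        CREATE COLLECTION myDatabase.myScope.orders
          PRIMARY KEY (`_id`.`$oid`: string)     -- MongoDB-style object ID key
          ON ordersTopic AT myKafkaLink
          WITH {
            "keySerializationType": "JSON",      -- serialization used for keys (assumed name)
            "valueSerializationType": "JSON",    -- serialization used for values (assumed name)
            "deadLetterQueue": "orders-dlq",     -- topic that receives messages that fail to load (assumed name)
            "cdcEnabled": true,                  -- the topic carries change data capture events (assumed name)
            "cdcSource": "MONGODB",              -- one of MONGODB, MYSQLDB, POSTGRESQL (assumed name)
            "cdcSourceConnector": "DEBEZIUM"     -- Capella Columnar supplies DEBEZIUM
          };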

    If you have just saved a new Kafka pipeline link and are ready to add one or more collections to it, begin with step 5 of the following procedure.

    To create a remote collection that is associated with a Kafka pipeline link:

    1. In the Capella UI, select the Columnar tab.

    2. Click a cluster name. The workbench opens.

    3. Use the explorer to locate the link.

    4. Move your cursor over the name of the link and then choose ⋮ (More) > Create Linked Collection. The Create Collection dialog opens.

    5. Use the lists to select the Database and Scope for the collection.

    6. In the Collection Name field, enter a name for the collection.

      The name must start with a letter (A-Z, a-z) and contain only upper- and lowercase letters, numbers (0-9), and underscore (_) or dash (-) characters.

    7. In the Primary Key field, enter the name of the primary key and its data type in the format KEY_NAME:DATA_TYPE. See the requirements for examples.

    8. Supply one or more Kafka Topics in a comma-separated list. If you supply multiple topics, the choices you make in the remaining fields must apply to all of them. Also enter the name of the dead letter queue topic (if any).

    9. Select the data serialization type used for keys.

    10. Select the data serialization type used for values.

    11. Specify whether the topics use CDC (Change Data Capture). If you select CDC Enabled, Capella Columnar supplies a CDC Connector of DEBEZIUM, and you then select the CDC Source: MONGODB, MYSQLDB, or POSTGRESQL.

    12. Choose Create Collection. Your collection appears under the specified database and scope in the explorer.

      If the link is connected, the data stream from the specified topic or topics into this Capella Columnar collection begins immediately. If the link is not connected, see Connect or Disconnect a Remote Link.
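      Once the link is connected and data begins to flow, you can run a simple SQL++ query from the workbench to confirm that documents are arriving in the new collection. The collection path here is a placeholder:

        SELECT COUNT(*) AS docCount
        FROM myDatabase.myScope.myCollection;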