Create a Custom Tokenizer
Create a custom tokenizer with the Couchbase Capella UI to change how the Search Service creates tokens for matching Search index content to a Search query.
Prerequisites
- You have the Search Service enabled on a node in your cluster. For more information about how to change Services on your cluster, see Modify a Paid Cluster.
- You have logged in to the Couchbase Capella UI.
- You have started to create or already created an index in Advanced Mode.
Procedure
You can create two types of custom tokenizers:
Tokenizer Type | Description
---|---
regexp | The tokenizer uses any input that matches the regular expression to create new tokens.
exception | The tokenizer removes any input that matches the regular expression, and creates tokens from the remaining input. You can choose another tokenizer to apply to the remaining input.
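In the underlying Search index definition, custom tokenizers appear in the JSON under `analysis.tokenizers`. The following is a sketch of how the two types might look; the tokenizer names `letters_only` and `skip_urls`, the patterns, and the `unicode` fallback tokenizer are illustrative assumptions, not values the UI requires:

```json
"analysis": {
  "tokenizers": {
    "letters_only": {
      "type": "regexp",
      "regexp": "[A-Za-z]+"
    },
    "skip_urls": {
      "type": "exception",
      "exceptions": ["https?://\\S+"],
      "tokenizer": "unicode"
    }
  }
}
```

The `regexp` tokenizer keeps only the text its pattern matches, while the `exception` tokenizer strips its patterns out and hands the rest to the tokenizer named in its `tokenizer` field.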
Create a Regular Expression Tokenizer
To create a regular expression tokenizer with the Capella UI:
- On the Operational Clusters page, select the cluster that has the Search index you want to edit.
- Go to Data Tools > Search.
- Click the index where you want to create a custom tokenizer.
- Under Advanced Settings, expand Custom Filters.
  Make sure you use Advanced Mode.
- Click Add Tokenizer.
- In the Name field, enter a name for the custom tokenizer.
- In the Type list, select regexp.
- In the Regular Expression field, enter the regular expression to use to split input into tokens.
- Click Submit.
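To see the effect of a regexp tokenizer before you submit one, the behavior can be sketched in a few lines of Python. This is a minimal illustration of the matching rule described above, not the Search Service's implementation; the example pattern `[A-Za-z0-9]+` is an assumption:

```python
import re

def regexp_tokenize(text, pattern):
    """Emit one token per substring that matches the pattern;
    any input between matches is discarded."""
    return re.findall(pattern, text)

# Hyphens and underscores fall outside the character class,
# so they act as token boundaries here.
print(regexp_tokenize("user-123_test", r"[A-Za-z0-9]+"))
# → ['user', '123', 'test']
```

Because only the matches become tokens, a narrow pattern silently drops everything else in the field, so test your expression against representative input.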
Create an Exception Custom Tokenizer
To create an exception custom tokenizer with the Capella UI in Advanced Mode:
- On the Operational Clusters page, select the cluster that has the Search index you want to edit.
- Go to Data Tools > Search.
- Click the index where you want to create a custom tokenizer.
- Expand Custom Filters.
- Click Add Tokenizer.
- In the Name field, enter a name for the custom tokenizer.
- In the Type list, select exception.
- In the New Word field, enter a regular expression to use to remove content from input.
- To add the regular expression to the list of exception patterns, click Add.
- (Optional) To add additional regular expressions to the list of exception patterns, repeat the previous steps.
- In the Tokenizer for Remaining Input list, select a tokenizer to apply to input after removing any content that matches the regular expression.
  For more information about the available tokenizers, see Default Tokenizers.
- Click Submit.
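The steps above can also be sketched in Python: exception patterns are removed first, then the chosen tokenizer runs on what remains. The URL pattern and the simple word tokenizer standing in for a default tokenizer are illustrative assumptions:

```python
import re

def exception_tokenize(text, exception_patterns, tokenizer):
    """Remove any input matching the exception patterns, then apply
    the chosen tokenizer to the remaining input."""
    remaining = re.sub("|".join(exception_patterns), " ", text)
    return tokenizer(remaining)

# Stand-in for the tokenizer selected for remaining input (an assumption).
word_tokenizer = lambda s: re.findall(r"\w+", s)

# The URL is stripped before tokenization, so none of its parts
# appear as tokens.
print(exception_tokenize("visit https://example.com for info",
                         [r"https?://\S+"], word_tokenizer))
# → ['visit', 'for', 'info']
```

This ordering is the useful property of the exception type: patterns such as URLs or email addresses never reach the downstream tokenizer, so they cannot be split into misleading fragments.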
Next Steps
After you create a custom tokenizer, you can use it with a custom analyzer.
To run a search and test the contents of your Search index, see Run A Simple Search with the Capella UI.