Create a Custom Tokenizer
Create a custom tokenizer with the Couchbase Server Web Console to change how the Search Service creates tokens for matching Search index content to a Search query.
Prerequisites
- You have the Search Service enabled on a node in your database. For more information about how to deploy a new node and Services on your database, see Manage Nodes and Clusters.
- You have created an index. For more information, see Create a Basic Search Index with the Web Console.
- You have logged in to the Couchbase Server Web Console.
Procedure
You can create 2 types of custom tokenizers:
Tokenizer Type | Description |
---|---|
Regular expression | The tokenizer uses any input that matches the regular expression to create new tokens. |
Exception | The tokenizer removes any input that matches the regular expression and creates tokens from the remaining input. You can choose another tokenizer to apply to the remaining input. |
Create a Regular Expression Tokenizer
To create a regular expression tokenizer with the Couchbase Server Web Console:
- Go to Search.
- Click the Search index where you want to create a custom tokenizer.
- Click Edit.
- Expand .
- Click Add Tokenizer.
- In the Name field, enter a name for the custom tokenizer.
- In the Type field, select regexp.
- In the Regular Expression field, enter the regular expression that matches the input you want to turn into tokens.
- Click Save.
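To reason about a pattern before you save it, the following minimal Python sketch mimics the behavior described for the regexp type: each non-overlapping match becomes a token, and any input that doesn't match is dropped. The function name `regexp_tokenize` and the sample pattern are hypothetical, and Python's `re` syntax may differ from the regular expression dialect the Search Service accepts, so confirm the final pattern in the Web Console.

```python
import re


def regexp_tokenize(text: str, pattern: str) -> list[str]:
    """Hypothetical sketch: return each non-overlapping match of pattern as a token."""
    # Every match becomes one token, in order; input that doesn't match is discarded.
    return [m.group(0) for m in re.finditer(pattern, text)]


# Hypothetical pattern: runs of ASCII letters and digits.
tokens = regexp_tokenize("Couchbase Server 7.6, Search Service", r"[A-Za-z0-9]+")
print(tokens)  # ['Couchbase', 'Server', '7', '6', 'Search', 'Service']
```

In this sketch, a pattern that can match an empty string, such as `[A-Za-z]*`, produces empty matches, so prefer patterns that require at least one character.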
Create an Exception Custom Tokenizer
To create an exception custom tokenizer with the Couchbase Server Web Console:
- Go to Search.
- Click the Search index where you want to create a custom tokenizer.
- Click Edit.
- Expand .
- Click Add Tokenizer.
- In the Name field, enter a name for the custom tokenizer.
- In the Type field, select exception.
- In the Exception Patterns field, enter a regular expression that matches the content to remove from the input.
- To add the regular expression to the list of exception patterns, click Add.
- (Optional) To add more regular expressions to the list of exception patterns, repeat the previous 2 steps.
- In the Tokenizer for Remaining Input field, select the tokenizer to apply to the input after the Search Service removes any content that matches your exception patterns.
  For more information about the available tokenizers, see Default Tokenizers.
- Click Save.
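The following minimal Python sketch illustrates the exception flow described above: content that matches any exception pattern is pulled out of the input, and only the remaining input goes to a second tokenizer. It returns the matched spans separately from the resulting tokens rather than deciding how the Search Service indexes them. The names `exception_tokenize` and `whitespace_tokenizer`, the sample patterns, and the choice to replace each match with a space are all hypothetical; `whitespace_tokenizer` stands in for whatever you select in Tokenizer for Remaining Input.

```python
import re
from typing import Callable


def exception_tokenize(
    text: str,
    exception_patterns: list[str],
    remaining_tokenizer: Callable[[str], list[str]],
) -> tuple[list[str], list[str]]:
    """Hypothetical sketch: extract exception matches, then tokenize what's left."""
    if not exception_patterns:
        # No exceptions: the whole input goes to the remaining tokenizer.
        return [], remaining_tokenizer(text)
    # Combine every exception pattern into a single alternation.
    combined = re.compile("|".join(f"(?:{p})" for p in exception_patterns))
    matched = [m.group(0) for m in combined.finditer(text)]
    # Assumption: replace each match with a space so neighboring text isn't joined.
    remaining = combined.sub(" ", text)
    return matched, remaining_tokenizer(remaining)


def whitespace_tokenizer(text: str) -> list[str]:
    # Hypothetical stand-in for the tokenizer chosen for the remaining input.
    return text.split()


matched, tokens = exception_tokenize(
    "Email support@example.com or visit https://example.com/help today",
    [r"\S+@\S+", r"https?://\S+"],  # hypothetical email and URL patterns
    whitespace_tokenizer,
)
print(matched)  # ['support@example.com', 'https://example.com/help']
print(tokens)   # ['Email', 'or', 'visit', 'today']
```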
Next Steps
After you create a custom tokenizer, you can use it with a custom analyzer.
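A custom analyzer typically runs character filters over the raw text, then a tokenizer, then token filters over the resulting tokens. The minimal Python sketch below shows where a custom tokenizer sits in that chain. The function names and the sample filters are hypothetical stand-ins, not Search Service APIs.

```python
import re
from typing import Callable


def analyze(
    text: str,
    char_filters: list[Callable[[str], str]],
    tokenizer: Callable[[str], list[str]],
    token_filters: list[Callable[[list[str]], list[str]]],
) -> list[str]:
    """Hypothetical sketch of an analyzer chain."""
    # 1. Character filters rewrite the raw text.
    for char_filter in char_filters:
        text = char_filter(text)
    # 2. The tokenizer (for example, your custom tokenizer) splits text into tokens.
    tokens = tokenizer(text)
    # 3. Token filters transform or drop tokens.
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


def strip_tags(text: str) -> str:
    # Hypothetical character filter: replace HTML-like tags with spaces.
    return re.sub(r"<[^>]+>", " ", text)


def letters_digits(text: str) -> list[str]:
    # Hypothetical regexp-style tokenizer: each match is a token.
    return re.findall(r"[A-Za-z0-9]+", text)


def lowercase(tokens: list[str]) -> list[str]:
    # Hypothetical token filter: lowercase every token.
    return [t.lower() for t in tokens]


result = analyze("<b>Couchbase</b> Search", [strip_tags], letters_digits, [lowercase])
print(result)  # ['couchbase', 'search']
```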
To continue customizing your Search index, you can also:
To run a search and test the contents of your Search index, see Run A Simple Search with the Web Console or Run a Simple Search with the REST API and curl/HTTP.