Default Tokenizers
- Capella Operational
- reference
Tokenizers control how the Search Service splits input strings into individual tokens.
You can use a tokenizer when you create a custom analyzer. Choose a default tokenizer or create your own.
The following default tokenizers are available:
Tokenizer | Description |
---|---|
hebrew |
Separates an input string into tokens that contain only Hebrew alphabet characters. Punctuation marks and numbers are excluded. |
letter |
Separates an input string into tokens that contain only Latin alphabet characters. Punctuation marks and numbers are excluded. |
single |
Creates a single token from the input string. Special characters and whitespace are preserved. |
Separates input strings into tokens based on Unicode Word Boundaries. |
|
web |
Creates tokens from an input string that match email address, URL, Twitter username, and hashtag patterns. |
Separates an input string into tokens based on the location of whitespace characters. |