Default Tokenizers

Tokenizers control how the Search Service splits input strings into individual tokens.

You can use a tokenizer when you create a custom analyzer. Choose a default tokenizer or create your own.
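The Search Service's text analysis is built on the open-source bleve library, so one way to sketch a custom analyzer is through bleve's Go mapping API. The following is a minimal sketch, assuming bleve v2; the analyzer name my_analyzer is hypothetical, and the to_lower token filter is included only to show where filters attach around the chosen tokenizer:

```go
package main

import (
	"fmt"

	"github.com/blevesearch/bleve/v2"
	_ "github.com/blevesearch/bleve/v2/analysis/token/lowercase"
	_ "github.com/blevesearch/bleve/v2/analysis/tokenizer/unicode"
)

func main() {
	mapping := bleve.NewIndexMapping()

	// A "custom" analyzer pairs exactly one tokenizer (here the default
	// "unicode" tokenizer) with optional character and token filters.
	err := mapping.AddCustomAnalyzer("my_analyzer", map[string]interface{}{
		"type":          "custom",
		"tokenizer":     "unicode",
		"token_filters": []string{"to_lower"},
	})
	if err != nil {
		panic(err)
	}
	mapping.DefaultAnalyzer = "my_analyzer"

	// An in-memory index is enough to confirm the analyzer resolves.
	index, err := bleve.NewMemOnly(mapping)
	if err != nil {
		panic(err)
	}
	defer index.Close()
	fmt.Println("custom analyzer registered")
}
```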

The following default tokenizers are available:

Tokenizer  | Description
-----------|------------
hebrew     | Separates an input string into tokens that contain only Hebrew alphabet characters. Punctuation marks and numbers are excluded.
letter     | Separates an input string into tokens that contain only letter characters, as defined by Unicode. Punctuation marks and numbers are excluded.
single     | Creates a single token from the entire input string. Special characters and whitespace are preserved.
unicode    | Separates an input string into tokens at Unicode word boundaries (UAX #29 text segmentation).
web        | Creates tokens from an input string that match email address, URL, Twitter username, and hashtag patterns.
whitespace | Separates an input string into tokens based on the location of whitespace characters.
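To compare how these tokenizers split the same input, you can run them directly through bleve's tokenizer registry. The following is a minimal sketch, again assuming bleve v2, exercising four of the six default tokenizers against an arbitrary sample string:

```go
package main

import (
	"fmt"

	_ "github.com/blevesearch/bleve/v2/analysis/tokenizer/letter"
	_ "github.com/blevesearch/bleve/v2/analysis/tokenizer/single"
	_ "github.com/blevesearch/bleve/v2/analysis/tokenizer/unicode"
	_ "github.com/blevesearch/bleve/v2/analysis/tokenizer/whitespace"
	"github.com/blevesearch/bleve/v2/registry"
)

func main() {
	// The registry cache resolves tokenizers by the same names listed in
	// the table above; the blank imports register them.
	cache := registry.NewCache()
	input := []byte("Full-text search, since 2014!")

	for _, name := range []string{"letter", "single", "unicode", "whitespace"} {
		tokenizer, err := cache.TokenizerNamed(name)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-10s:", name)
		for _, token := range tokenizer.Tokenize(input) {
			fmt.Printf(" [%s]", token.Term)
		}
		fmt.Println()
	}
}
```

With this input you should see, roughly, that single returns the whole string as one token, whitespace keeps punctuation attached to adjacent words, letter drops the digits and splits at every non-letter character, and unicode splits at word boundaries, so the hyphenated word becomes two tokens.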