Default Analyzers

    Use an analyzer to filter and modify search strings to improve the matches in your search results.

    Analyzers contain character filters, tokenizers, and token filters.

    When you create a type mapping, you can choose one of the default analyzers for it, or create your own custom analyzer.
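
    In the JSON definition of a Search index, the default analyzer and any per-field overrides live in the mapping section. The following sketch, written as a Python dict, is illustrative only: the type and field names are made up, and only the analyzer-related keys are shown.

        index_mapping = {
            # Used wherever a type mapping or field does not set its own analyzer.
            "default_analyzer": "standard",
            "types": {
                # Hypothetical scope.collection type mapping.
                "inventory.hotel": {
                    "enabled": True,
                    "properties": {
                        "description": {
                            "fields": [
                                {
                                    "name": "description",
                                    "type": "text",
                                    # Overrides the index default for this field.
                                    "analyzer": "en",
                                }
                            ]
                        },
                        "name": {
                            "fields": [
                                # No analyzer set here, so this field inherits
                                # the index default (the inherit option).
                                {"name": "name", "type": "text"}
                            ]
                        },
                    },
                }
            },
        }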

    The following default analyzer options are available:


    inherit

    If you set an analyzer to inherit, the type mapping or field inherits the default analyzer set for the index.

    ar

    The ar analyzer uses character filters, tokenizers, and token filters designed for Arabic language searches.

    cjk

    The cjk analyzer uses character filters, tokenizers, and token filters designed for Chinese, Japanese, and Korean language searches.

    ckb

    The ckb analyzer uses character filters, tokenizers, and token filters designed for Kurdish language searches.

    da

    The da analyzer uses character filters, tokenizers, and token filters designed for Danish language searches.

    de

    The de analyzer uses character filters, tokenizers, and token filters designed for German language searches.

    en

    The en analyzer uses character filters, tokenizers, and token filters designed for English language searches.

    es

    The es analyzer uses character filters, tokenizers, and token filters designed for Castilian Spanish language searches.

    fa

    The fa analyzer uses character filters, tokenizers, and token filters designed for Persian language searches.

    fi

    The fi analyzer uses character filters, tokenizers, and token filters designed for Finnish language searches.

    fr

    The fr analyzer uses character filters, tokenizers, and token filters designed for French language searches.

    he

    The he analyzer uses character filters, tokenizers, and token filters designed for Hebrew language searches.

    hi

    The hi analyzer uses character filters, tokenizers, and token filters designed for Hindi language searches.

    hr

    The hr analyzer uses character filters, tokenizers, and token filters designed for Croatian language searches.

    hu

    The hu analyzer uses character filters, tokenizers, and token filters designed for Hungarian language searches.

    it

    The it analyzer uses character filters, tokenizers, and token filters designed for Italian language searches.

    keyword

    The keyword analyzer turns input into a single token. It forces exact matches and preserves whitespace characters like spaces.

    For example, the keyword analyzer turns an input of Couchbase Server into a single token: Couchbase Server.
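
    A rough Python sketch of this behavior (not the Search Service's actual implementation): the whole input string comes back as one token, with case and whitespace intact.

        def keyword_tokens(text: str) -> list[str]:
            # The keyword analyzer's effect, approximated: one token, unchanged.
            return [text]

        print(keyword_tokens("Couchbase Server"))  # ['Couchbase Server']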

    nl

    The nl analyzer uses character filters, tokenizers, and token filters designed for Dutch language searches.

    no

    The no analyzer uses character filters, tokenizers, and token filters designed for Norwegian language searches.

    pt

    The pt analyzer uses character filters, tokenizers, and token filters designed for Portuguese language searches.

    ro

    The ro analyzer uses character filters, tokenizers, and token filters designed for Romanian language searches.

    ru

    The ru analyzer uses character filters, tokenizers, and token filters designed for Russian language searches.

    simple

    The simple analyzer turns input into tokens based on letter characters. It removes characters like punctuation and numbers, and uses these characters as the boundaries for tokens.

    For example, the simple analyzer turns an input of Couchbase Server into two tokens: Couchbase and Server.
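
    A rough Python approximation of that rule, treating runs of letters as tokens and everything else as a boundary; this illustrates the behavior described above rather than the analyzer's actual filters.

        import re

        def simple_tokens(text: str) -> list[str]:
            # Keep runs of letters; digits, punctuation, and whitespace split tokens.
            return re.findall(r"[^\W\d_]+", text)

        print(simple_tokens("Couchbase Server"))         # ['Couchbase', 'Server']
        print(simple_tokens("3 nights, Couchbase-HQ!"))  # ['nights', 'Couchbase', 'HQ']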

    standard

    The standard analyzer uses the unicode tokenizer with the to_lower and stop_en token filters.

    For example, the standard analyzer turns an input of The name is Couchbase Server into three tokens: name, couchbase, and server.
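
    A rough Python approximation of that pipeline: split on word boundaries (a stand-in for the unicode tokenizer), lowercase (to_lower), and drop common English stop words (stop_en). The stop word set here is a tiny illustrative subset, not the real list.

        import re

        STOP_EN = {"the", "is", "a", "an", "and", "of", "to", "in"}

        def standard_tokens(text: str) -> list[str]:
            words = re.findall(r"\w+", text.lower())
            return [w for w in words if w not in STOP_EN]

        print(standard_tokens("The name is Couchbase Server"))
        # ['name', 'couchbase', 'server']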

    sv

    The sv analyzer uses character filters, tokenizers, and token filters designed for Swedish language searches.

    tr

    The tr analyzer uses character filters, tokenizers, and token filters designed for Turkish language searches.

    web

    The web analyzer finds email addresses, URLs, Twitter usernames, and hashtags in its input and turns them into tokens.

    For example, the web analyzer turns an input of Send #Couchbase to example@gmail.com into four tokens: send, #Couchbase, to, and example@gmail.com.
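
    A rough Python approximation of that behavior: email addresses, URLs, hashtags, and @usernames are kept whole as single tokens, while the remaining words are lowercased. The regular expressions are simplified stand-ins for the real rules.

        import re

        # Simplified patterns for URLs, email addresses, hashtags, and @usernames.
        SPECIAL = re.compile(r"https?://\S+|[\w.+-]+@[\w-]+\.[\w.-]+|#\w+|@\w+")

        def web_tokens(text: str) -> list[str]:
            tokens, pos = [], 0
            for m in SPECIAL.finditer(text):
                # Ordinary words before the match are lowercased.
                tokens += [w.lower() for w in re.findall(r"\w+", text[pos:m.start()])]
                # The URL, email, hashtag, or username is kept intact.
                tokens.append(m.group())
                pos = m.end()
            tokens += [w.lower() for w in re.findall(r"\w+", text[pos:])]
            return tokens

        print(web_tokens("Send #Couchbase to example@gmail.com"))
        # ['send', '#Couchbase', 'to', 'example@gmail.com']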