Default Analyzers

    Use an analyzer to filter and modify search strings to improve matches for search results.

    An analyzer combines a tokenizer with optional character filters and token filters.

    When you create a type mapping, you can choose one of the default analyzers for it or create your own.

    The following default analyzer options are available:

    Analyzer Description

    inherit

    If you set an analyzer to inherit, the Search index component inherits the default analyzer set for the index.
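
The inherit rule can be pictured with a minimal Python sketch. The function name `effective_analyzer` and its two string parameters are hypothetical and exist only for illustration. This is not the Search service's API or index definition format.

```python
def effective_analyzer(component_analyzer: str, index_default: str) -> str:
    """Return the analyzer that applies to a component of a Search index.

    Hypothetical helper: "inherit" defers to the index-level default,
    any other value is used as given.
    """
    if component_analyzer == "inherit":
        return index_default
    return component_analyzer


# A component set to inherit picks up the index default.
print(effective_analyzer("inherit", "standard"))
# A component with its own analyzer keeps it.
print(effective_analyzer("fr", "standard"))
```

Under these assumptions, the first call yields standard and the second yields fr.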

    Arabic - ar

    An Arabic language analyzer.

    Chinese, Japanese, and Korean - cjk

    An analyzer designed for the Chinese, Japanese, and Korean languages.

    Kurdish - ckb

    A Kurdish language analyzer.

    Danish - da

    A Danish language analyzer.

    German - de

    A German language analyzer.

    English - en

    An English language analyzer.

    Castilian Spanish - es

    A Castilian Spanish language analyzer.

    Persian - fa

    A Persian language analyzer.

    Finnish - fi

    A Finnish language analyzer.

    French - fr

    A French language analyzer.

    Hebrew - he

    A Hebrew language analyzer.

    Hindi - hi

    A Hindi language analyzer.

    Croatian - hr

    A Croatian language analyzer.

    Hungarian - hu

    A Hungarian language analyzer.

    Italian - it

    An Italian language analyzer.

    keyword

    The keyword analyzer turns input into a single token. It forces exact matches and preserves whitespace characters like spaces.

    For example, the keyword analyzer turns an input of Couchbase Server into a single token: Couchbase Server.
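
As a rough illustration of why a single token forces exact matches, the following Python sketch mimics the behavior described above. `keyword_analyze` is a hypothetical stand-in, not the analyzer's actual implementation.

```python
from typing import List


def keyword_analyze(text: str) -> List[str]:
    """Hypothetical stand-in for the keyword analyzer.

    The whole input, including spaces, becomes one token.
    """
    return [text]


indexed = keyword_analyze("Couchbase Server")
print(indexed)

# Only a query that analyzes to the identical token can match.
print(keyword_analyze("Couchbase Server") == indexed)  # same token
print(keyword_analyze("Couchbase") == indexed)         # different token
```

In this sketch, a query for Couchbase alone does not match, because its single token differs from the indexed token Couchbase Server.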

    Dutch - nl

    A Dutch language analyzer.

    Norwegian - no

    A Norwegian language analyzer.

    Portuguese - pt

    A Portuguese language analyzer.

    Romanian - ro

    A Romanian language analyzer.

    Russian - ru

    A Russian language analyzer.

    simple

    The simple analyzer turns input into tokens based on letter characters. It discards non-letter characters, such as punctuation and digits, and treats them as the boundaries between tokens.

    For example, the simple analyzer turns an input of Couchbase Server into two tokens: Couchbase and Server.
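
The letter-based splitting described above can be approximated in Python with a regular expression. This is a sketch under assumptions: `simple_analyze` is a hypothetical name, the regex only approximates "letter characters", and case is left untouched to match the example above, so any case normalization the real analyzer applies is not modeled.

```python
import re
from typing import List

# Runs of characters that are word characters but not digits or underscores,
# which approximates "letters" for most text.
LETTER_RUN = re.compile(r"[^\W\d_]+")


def simple_analyze(text: str) -> List[str]:
    """Hypothetical stand-in: every run of letters becomes a token."""
    return LETTER_RUN.findall(text)


print(simple_analyze("Couchbase Server"))
print(simple_analyze("v7.6, build-42!"))
```

With this sketch, the second input produces only v and build, because the digits and punctuation act as boundaries and are dropped.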

    standard

    The standard analyzer uses the unicode tokenizer with the to_lower and stop_en token filters.

    For example, the standard analyzer turns an input of The name is Couchbase Server into three tokens: name, couchbase, and server.
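
The three stages named above (tokenize, lowercase, remove stop words) can be sketched in Python as follows. The names are hypothetical, the `\w+` regex is a simplification standing in for the unicode tokenizer, and `STOP_WORDS` is a small illustrative subset rather than the actual stop_en list.

```python
import re
from typing import List

# Illustrative subset only; the real stop_en filter uses a longer list.
STOP_WORDS = {"a", "an", "and", "is", "of", "the", "to"}

# Simplified word matcher standing in for the unicode tokenizer.
WORD = re.compile(r"\w+")


def standard_analyze(text: str) -> List[str]:
    """Hypothetical stand-in for the standard analyzer pipeline."""
    tokens = WORD.findall(text)                         # tokenizer
    lowered = [t.lower() for t in tokens]               # to_lower filter
    return [t for t in lowered if t not in STOP_WORDS]  # stop word filter


print(standard_analyze("The name is Couchbase Server"))
```

Lowercasing runs before the stop word check, so The is removed even though it is capitalized in the input.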

    Swedish - sv

    A Swedish language analyzer.

    Turkish - tr

    A Turkish language analyzer.

    web

    The web analyzer finds email addresses, URLs, Twitter usernames, and hashtags in its input and turns them into tokens.

    For example, the web analyzer turns an input of Send #Couchbase to example@gmail.com into four tokens: send, #Couchbase, to, and example@gmail.com.
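
A minimal Python sketch of this behavior follows. It is an approximation under assumptions: the regular expressions are simplified, `web_analyze` is a hypothetical name, and lowercasing only the plain words is inferred from the example above rather than taken from the analyzer's implementation.

```python
import re
from typing import List

# Alternatives are tried left to right, so the special patterns win over
# plain words. The patterns are deliberately simple.
TOKEN = re.compile(
    r"""
    (?P<url>https?://[^\s]+)
  | (?P<email>[\w.+-]+@[\w-]+(?:\.[\w-]+)+)
  | (?P<hashtag>\#\w+)
  | (?P<mention>@\w+)
  | (?P<word>\w+)
    """,
    re.VERBOSE,
)


def web_analyze(text: str) -> List[str]:
    """Hypothetical stand-in for the web analyzer."""
    tokens = []
    for match in TOKEN.finditer(text):
        token = match.group(0)
        # Assumption: only plain words are lowercased; URLs, email
        # addresses, hashtags, and usernames are kept as written.
        tokens.append(token.lower() if match.lastgroup == "word" else token)
    return tokens


print(web_analyze("Send #Couchbase to example@gmail.com"))
```

Under these assumptions, the sketch reproduces the four tokens from the example: send, #Couchbase, to, and example@gmail.com.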