Search Index Features

    Search indexes in Couchbase Capella have multiple features that you can configure to improve performance and fine-tune your search results.

    Some features are only available in Advanced Mode editing.

    You can add the following components and configure the following options for a Search index:

    Analyzers

    Use analyzers to improve and customize the search results in your index.

    Analyzers transform input text into tokens, which give you greater control over your index’s text matching. The Default Analyzer sets the analyzer that’s used by default for new type mappings across your Search index.

    You can use one of Couchbase’s built-in analyzers as the Default Analyzer or the analyzer for a specific type mapping. If you use Advanced Mode, you can create your own analyzer.

    Analyzers have different components that control how text is transformed for search. When you create a custom analyzer, you can choose these components. For more information about Search analyzer components, see Custom Analyzers and Other Filters.

    For more information about how to create a custom analyzer, see Create a Custom Analyzer.
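
    As an illustrative sketch, a custom analyzer combines these components in the analysis settings of an index definition's JSON. The analyzer name my_analyzer is a placeholder; the component names shown (html, unicode, to_lower, stop_en) follow the names the Search Service uses for its built-in components:

    ```json
    "analysis": {
      "analyzers": {
        "my_analyzer": {
          "type": "custom",
          "char_filters": ["html"],
          "tokenizer": "unicode",
          "token_filters": ["to_lower", "stop_en"]
        }
      }
    }
    ```

    The Search index editor generates this JSON for you; editing it directly requires Advanced Mode.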

    Default Date/Time Parser

    Set the default format that the Search Service uses to interpret date and time data in your Search index.

    If the documents in your index contain date and time data in a format that the default date/time parser doesn't recognize, you need to create a custom date/time parser. You can only create a custom date/time parser in Advanced Mode. For more information about how to add a custom date/time parser, see Create a Custom Date/Time Parser.
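
    For illustration, a custom date/time parser in an index definition's analysis settings generally lists Go-style layout strings, which use the Go reference time Mon Jan 2 15:04:05 2006. The parser name my_datetime_parser is a placeholder:

    ```json
    "analysis": {
      "date_time_parsers": {
        "my_datetime_parser": {
          "type": "flexiblego",
          "layouts": ["2006-01-02 15:04:05"]
        }
      }
    },
    "default_datetime_parser": "my_datetime_parser"
    ```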

    Document Filters

    In Advanced Mode, you can also configure a document filter to include or exclude documents in your Search index based on conditions you set:

    • JSON Type Field: Selects only documents that contain a specific field with a specified string value.

    • Doc ID up to Separator: Selects only documents with an ID or key up to a specific substring.

    • Doc ID with Regex: Selects only documents with an ID or key that matches a regular expression.

    For more information about how to configure a document filter, see Set a Document Filter.
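
    For illustration, a JSON Type Field filter appears in the doc_config section of an index definition's JSON, where the field name type shown here is an example:

    ```json
    "doc_config": {
      "mode": "type_field",
      "type_field": "type"
    }
    ```

    The other filter types follow the same pattern: Doc ID up to Separator uses a docid_prefix mode with a delimiter setting, and Doc ID with Regex uses a docid_regexp mode with a regular expression.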

    Type Mappings and Mappings

    Use a type mapping to include or exclude specific documents in a collection from an index.

    Type mappings can also set a field’s data type and other settings.

    Type mappings start at the collection level. Create additional mappings for document fields or JSON objects under a collection’s type mapping to restrict the documents added to your index. This can improve Search index performance over indexing entire collections.

    If your operational cluster is running Couchbase Server version 7.6.2 and later, you can also choose to include document metadata inside your Search index by creating an XATTRs mapping. For more information about how to configure settings for the different types of mappings and type mappings, see Collection, Object, XATTRs, and Field Mapping Options.

    For more information about how to configure a type mapping in the Search index editor, see Create a New Mapping or Type Mapping.

    You can create two types of type mappings with the Search Service:

    Dynamic Type Mappings

    When you do not know the structure of your data fields ahead of time, use a dynamic type mapping to add all available fields from a matching document type to an index. For example, you could create a dynamic type mapping to include all documents from the hotel collection in your Search index, or include all fields under a JSON object from your document schema.

    Configure this type of mapping by selecting a collection or JSON object in the Search index editor when you Create a New Mapping or Type Mapping.
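
    For example, a dynamic type mapping that indexes every field in the hotel collection might look like this in the index definition's JSON (the inventory.hotel keyspace is illustrative):

    ```json
    "types": {
      "inventory.hotel": {
        "enabled": true,
        "dynamic": true
      }
    }
    ```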

    Static Type Mappings

    When your data fields are stable and unlikely to change, use a static type mapping to add and define only specific fields from a matching document type to an index. For example, you could create a static type mapping to only include the contents of the city field from the hotel collection in your Search index, as a text field with an en analyzer.

    Configure this type of mapping by selecting a field in your document schema in the Search index editor when you Create a New Mapping or Type Mapping.
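
    For example, a static type mapping that indexes only the city field from the hotel collection, as a text field with the en analyzer, might look like this in the index definition's JSON (the inventory.hotel keyspace is illustrative):

    ```json
    "types": {
      "inventory.hotel": {
        "enabled": true,
        "dynamic": false,
        "properties": {
          "city": {
            "enabled": true,
            "dynamic": false,
            "fields": [
              {
                "name": "city",
                "type": "text",
                "analyzer": "en",
                "index": true
              }
            ]
          }
        }
      }
    }
    ```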

    Replica and Partition Settings

    Use replicas and partitions to add high availability, fault tolerance, and scalability to your Search index.

    Number of Replicas

    Add Search index replicas to create copies of your Search index on other nodes. If one of the nodes running the Search Service in your cluster goes offline, you can still use your indexes if they exist on another node.

    Adding more replicas increases the storage used by the Search Service for your indexes. You cannot add more replicas if your cluster configuration does not have the nodes to support those replicas.

    Number of Partitions

    Add Search index partitions to distribute the contents of a Search index over multiple Search Service nodes in your cluster.

    Partitions improve Search index performance, but increase the complexity of a Search index and its resource usage.

    For the most efficient resource usage, set the number of Search index partitions to match the number of nodes running the Search Service in your operational cluster.
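
    In an index definition's JSON, replicas and partitions appear under planParams. As a sketch, the following sets one replica and three partitions, which suits a cluster with three Search Service nodes:

    ```json
    "planParams": {
      "numReplicas": 1,
      "indexPartitions": 3
    }
    ```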

    Custom Analyzers and Other Filters

    Custom filters are components of a Search index analyzer.

    Create and add custom filters to a custom analyzer in Advanced Mode to improve search results and performance for an index. You cannot create custom analyzers or custom filters outside of Advanced Mode.

    You can create the following custom filters:

    Character Filters

    Character filters remove unwanted characters from the input for a search. For example, the default html character filter removes HTML tags from your search content.

    You can use a default character filter in an analyzer or create your own. When you create a custom character filter, you can choose whether your analyzer replaces any removed characters with your own configured string.

    For more information about the available default character filters, see Default Character Filters.

    For more information about how to create your own custom character filter, see Create a Custom Character Filter.
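
    As an illustrative sketch, a custom character filter that strips digits before tokenization could be defined as a regexp-type filter in the index definition's analysis settings (the name strip_digits is a placeholder):

    ```json
    "char_filters": {
      "strip_digits": {
        "type": "regexp",
        "regexp": "[0-9]+",
        "replace": ""
      }
    }
    ```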

    Tokenizers

    Tokenizers separate input strings into individual tokens, which are combined into token streams. The Search Service compares the token streams generated from a search query against the token streams in your index to find matches.

    You can use a default tokenizer in an analyzer or create your own.

    For more information about the available default tokenizers, see Default Tokenizers.

    For more information about how to create your own tokenizer, see Create a Custom Tokenizer.
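
    For illustration, a custom tokenizer that splits input on runs of letters and digits could be defined as a regexp-type tokenizer in the analysis settings (the name alnum_tokenizer is a placeholder):

    ```json
    "tokenizers": {
      "alnum_tokenizer": {
        "type": "regexp",
        "regexp": "[a-zA-Z0-9]+"
      }
    }
    ```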

    Token Filters

    Token filters take the token stream from a tokenizer and modify the tokens.

    A token filter can create stems from tokens to increase the matches for a search term. For example, if a token filter creates the stem play, a search can return matches for player, playing, and playable.

    The Search Service has default token filters available. For a list of all available token filters, see Default Token Filters.

    You can also create your own token filters. Custom token filters can use Word Lists to modify their tokens. For more information about how to create your own token filter, see Create a Custom Token Filter.
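
    As a sketch, a custom token filter that drops tokens outside a length range could be defined as a length-type filter in the analysis settings (the name my_length_filter and its bounds are illustrative):

    ```json
    "token_filters": {
      "my_length_filter": {
        "type": "length",
        "min": 3,
        "max": 255
      }
    }
    ```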

    Word Lists

    Word lists define a list of words that you can use with a token filter to create tokens.

    You can use a word list to find words and create tokens, or remove words from a tokenizer’s token stream.

    When you create a custom token filter, you can use a default word list or create your own. Only specific custom token filter types use word lists in their configuration.

    For more information about the available default word lists, see Default Wordlists.

    For more information about how to create your own word list, see Create a Custom Token Filter.
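
    For illustration, a custom word list appears in the index definition's analysis settings as a token map, which a stop-token filter can then reference (the names my_word_list and my_stop_filter are placeholders):

    ```json
    "token_maps": {
      "my_word_list": {
        "type": "custom",
        "tokens": ["and", "or", "the"]
      }
    },
    "token_filters": {
      "my_stop_filter": {
        "type": "stop_tokens",
        "stop_token_map": "my_word_list"
      }
    }
    ```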