Search

You can use the Full Text Search service (FTS) to create queryable, full-text indexes in Couchbase Server.

Full Text Search (FTS) — or Search for short — allows you to create, manage, and query full text indexes on JSON documents stored in Couchbase buckets. It uses natural language processing for querying documents, provides relevance scoring on the results of your queries, and has fast indexes for querying a wide range of possible text searches.

Some of the supported query types include simple queries like Match and Term queries; range queries like Date Range and Numeric Range; and compound queries for conjunctions, disjunctions, and/or boolean queries.
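To make these query families more concrete, the following sketch builds the kind of JSON bodies that such queries correspond to in the Search service's request format, using plain Ruby hashes rather than the SDK. The helper names (match_query, numeric_range_query, conjunction, disjunction) are hypothetical and exist only for this illustration; application code would normally use the SDK's Cluster::SearchQuery factory methods instead.

```ruby
require "json"

# Hypothetical helpers (not part of the SDK) that build query bodies as hashes.

# A simple match query, optionally restricted to one field.
def match_query(text, field: nil)
  body = {"match" => text}
  body["field"] = field if field
  body
end

# A numeric range query; either bound may be omitted.
def numeric_range_query(field, min: nil, max: nil)
  body = {"field" => field}
  body["min"] = min unless min.nil?
  body["max"] = max unless max.nil?
  body
end

# Compound query: every child query must match.
def conjunction(*queries)
  {"conjuncts" => queries}
end

# Compound query: at least `min` child queries must match.
def disjunction(*queries, min: 1)
  {"disjuncts" => queries, "min" => min}
end

# "hop" in the description AND an abv of at least 7.0.
strong_hoppy = conjunction(
  match_query("hop", field: "description"),
  numeric_range_query("abv", min: 7.0)
)
puts JSON.pretty_generate(strong_hoppy)
```

Keeping the compound queries as thin wrappers around arrays of child queries mirrors how conjunctions and disjunctions nest: any query, including another compound one, can be passed as a child.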

There are two APIs for querying Search: cluster.search_query() and cluster.search(). Both are also available at the Scope level.

The former API supports FTS queries (SearchQuery), while the latter additionally supports the VectorSearch added in 7.6. Most of this documentation will focus on the former API, as the latter is in @Stability.Volatile status.

Index Creation

For the examples below we will use the Beer Sample sample bucket. Full Text Search indexes can be created through the UI or through the REST API, or created programmatically as follows:

search_indexes = cluster.search_indexes.get_all_indexes
unless search_indexes.any? {|idx| idx.name == "my-index-name"}
  index = Management::SearchIndex.new
  index.type = "fulltext-index"
  index.name = "my-index-name"
  index.source_type = "couchbase"
  index.source_name = "beer-sample"
  index.params = {
      mapping: {
          default_datetime_parser: "dateTimeOptional",
          types: {
              "beer" => {
                  properties: {
                      "abv" => {
                          fields: [
                              {
                                  name: "abv",
                                  type: "number",
                                  include_in_all: true,
                                  index: true,
                                  store: true,
                                  docvalues: true,
                              }
                          ]
                      },
                      "category" => {
                          fields: [
                              {
                                  name: "category",
                                  type: "text",
                                  include_in_all: true,
                                  include_term_vectors: true,
                                  index: true,
                                  store: true,
                                  docvalues: true,
                              }
                          ]
                      },
                      "description" => {
                          fields: [
                              {
                                  name: "description",
                                  type: "text",
                                  include_in_all: true,
                                  include_term_vectors: true,
                                  index: true,
                                  store: true,
                                  docvalues: true,
                              }
                          ]
                      },
                      "name" => {
                          fields: [
                              {
                                  name: "name",
                                  type: "text",
                                  include_in_all: true,
                                  include_term_vectors: true,
                                  index: true,
                                  store: true,
                                  docvalues: true,
                              }
                          ]
                      },
                      "style" => {
                          fields: [
                              {
                                  name: "style",
                                  type: "text",
                                  include_in_all: true,
                                  include_term_vectors: true,
                                  index: true,
                                  store: true,
                                  docvalues: true,
                              }
                          ]
                      },
                      "updated" => {
                          fields: [
                              {
                                  name: "updated",
                                  type: "datetime",
                                  include_in_all: true,
                                  index: true,
                                  store: true,
                                  docvalues: true,
                              }
                          ]
                      },
                  }
              }
          }
      }
  }
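  cluster.search_indexes.upsert_index(index)
  # Wait for indexing to settle: poll once per second until the indexed
  # document count stops changing between two consecutive polls.
  num_indexed = 0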
  loop do
    sleep(1)
    num = cluster.search_indexes.get_indexed_documents_count(index.name)
    break if num_indexed == num
    num_indexed = num
    puts "#{index.name.inspect} indexed #{num_indexed}"
  end
end
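The polling loop at the end of the example above runs until the indexed document count stops changing, with no upper bound on the wait. A minimal sketch of the same idea with a timeout follows. Here wait_until_stable is a hypothetical helper, and the block passed to it stands in for a call such as cluster.search_indexes.get_indexed_documents_count(index.name); a canned list of readings is used so the sketch runs without a cluster.

```ruby
# Hypothetical helper: polls the given block until two consecutive readings
# are equal, or raises once more than `timeout` seconds have elapsed.
def wait_until_stable(timeout: 60, interval: 1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  previous = nil
  loop do
    current = yield
    return current if current == previous
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) > deadline
      raise "count still changing after #{timeout} seconds"
    end
    previous = current
    sleep(interval)
  end
end

# Stand-in for the real count call: a fixed series of readings.
readings = [100, 2500, 5891, 5891]
count = wait_until_stable(timeout: 5, interval: 0) { readings.shift }
puts "indexed #{count} documents"
```

A stable count is only a heuristic for "indexing has caught up", so the timeout mainly guards against waiting forever when documents keep arriving.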

Examples

In Couchbase Server 7.6 and later, Search queries are executed at either the Scope or the Cluster level; in earlier versions, they are performed at the Cluster level only. They are never executed at the Bucket or Collection level.

We will perform an FTS query here; see the Vector Search section below for examples of vector queries. Here is a simple query that looks for the text "hop beer" using the defined index:

result = cluster.search_query(
    "my-index-name",
    Cluster::SearchQuery.query_string("hop beer")
  )
result.rows.each do |row|
  puts "id: #{row.id}, score: #{row.score}"
end
#=>
# id: great_divide_brewing-fresh_hop_pale_ale, score: 0.8361701974709099
# id: left_coast_brewing-hop_juice_double_ipa, score: 0.7902867513072585
# ...

puts "Reported total rows: #{result.meta_data.metrics.total_rows}"
#=> Reported total rows: 6043

match_phrase() builds a phrase query from the results of an analysis of the terms in the query phrase; in this example, the fields option also requests that the stored name field be returned with each row.

options = Cluster::SearchOptions.new
options.fields = ["name"]
result = cluster.search_query(
    "my-index-name",
    Cluster::SearchQuery.match_phrase("hop beer"),
    options
  )
result.rows.each do |row|
  puts "id: #{row.id}, score: #{row.score}\n  fields: #{row.fields}"
end
#=>
# id: deschutes_brewery-hop_henge_imperial_ipa, score: 0.7752384807123055
#   fields: {"name"=>"Hop Henge Imperial IPA"}
# id: harpoon_brewery_boston-glacier_harvest_09_wet_hop_100_barrel_series_28, score: 0.6862594775775723
#   fields: {"name"=>"Glacier Harvest '09 Wet Hop (100 Barrel Series #28)"}

puts "Reported total rows: #{result.meta_data.metrics.total_rows}"
# Reported total rows: 2

Working with Results

The result of a Search query has three components: rows, facets, and metadata. Rows are the documents that match the query. Facets allow the aggregation of information collected on a particular result set. Metadata holds additional information not directly related to your query, such as success, total hits, and how long the query took to execute in the cluster.

Iterating Rows

Here we are iterating over the rows that were returned in the results. Highlighting has been selected for the description field in each row, and the total number of rows is taken from the metrics returned in the metadata:

options = Cluster::SearchOptions.new
options.highlight_style = :html
options.highlight_fields = ["description"]
result = cluster.search_query(
    "my-index-name",
    Cluster::SearchQuery.match_phrase("banana"),
    options
  )
result.rows.each do |row|
  puts "id: #{row.id}, score: #{row.score}"
  row.fragments.each do |field, excerpts|
    puts "  #{field}: "
    excerpts.each do |excerpt|
      puts "  * #{excerpt}"
    end
  end
end
#=>
# id: wells_and_youngs_brewing_company_ltd-wells_banana_bread_beer, score: 0.8269933841266812
#   description:
#   * A silky, crisp, and rich amber-colored ale with a fluffy head and strong <mark>banana</mark> note on the nose.
# ...

puts "Reported total rows: #{result.meta_data.metrics.total_rows}"
# Reported total rows: 41
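With the :html highlight style, matched terms in each fragment are wrapped in <mark> tags, as the output above shows. For terminal output it can be convenient to convert those tags into something a console can render. The sketch below does this with a hypothetical helper, mark_to_ansi, which is not part of the SDK and assumes that fragments contain only plain <mark> and </mark> pairs.

```ruby
# ANSI escape sequences for switching bold text on and off.
BOLD = "\e[1m"
RESET = "\e[0m"

# Hypothetical helper: replaces <mark>...</mark> pairs with ANSI bold.
def mark_to_ansi(fragment)
  fragment.gsub("<mark>", BOLD).gsub("</mark>", RESET)
end

fragment = "strong <mark>banana</mark> note on the nose."
puts mark_to_ansi(fragment)
```

Inside the row loop shown earlier, the helper could be applied to each excerpt before printing it.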

With skip and limit a slice of the returned data may be selected:

options = Cluster::SearchOptions.new
options.skip = 4
options.limit = 3
result = cluster.search_query(
    "my-index-name",
    Cluster::SearchQuery.query_string("hop beer"),
    options
  )
result.rows.each do |row|
  puts "id: #{row.id}, score: #{row.score}"
end
#=>
# id: harpoon_brewery_boston-glacier_harvest_09_wet_hop_100_barrel_series_28, score: 0.6862594775775723
# id: lift_bridge_brewery-harvestor_fresh_hop_ale, score: 0.6674211556164669
# id: southern_tier_brewing_co-hop_sun, score: 0.6630296619927506

puts "Reported total rows: #{result.meta_data.metrics.total_rows}"
# Reported total rows: 6043
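Because skip and limit select a window over the ranked results, they map naturally onto page-based navigation. The following sketch computes the two values for a given page; page_window is a hypothetical helper, and pages are assumed to be numbered from 1.

```ruby
# Hypothetical helper: converts a 1-based page number into skip/limit values.
def page_window(page, page_size)
  raise ArgumentError, "page must be >= 1" if page < 1
  raise ArgumentError, "page_size must be >= 1" if page_size < 1
  {skip: (page - 1) * page_size, limit: page_size}
end

window = page_window(3, 10)
puts "skip=#{window[:skip]} limit=#{window[:limit]}"
#=> skip=20 limit=10
```

The resulting values could then be assigned to options.skip and options.limit before running the query.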

Ordering rules can be applied via sort and SearchSort:

options = Cluster::SearchOptions.new
options.sort = [
    Cluster::SearchSort.score,
    Cluster::SearchSort.field("name"),
]
cluster.search_query(
    "my-index-name",
    Cluster::SearchQuery.match_phrase("hop beer"),
    options
  )

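Facets

A term facet is requested through the facets option; in this example it returns up to five of the most frequent terms in the category field: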

options = Cluster::SearchOptions.new
categories_facet = Cluster::SearchFacet.term("category")
categories_facet.size = 5
options.facets = {"categories" => categories_facet}
cluster.search_query(
    "my-index-name",
    Cluster::SearchQuery.query_string("hop beer"),
    options
  )
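Conceptually, a term facet reports the most frequent terms for the chosen field among the matching documents, together with their counts, limited to size entries. The plain-Ruby sketch below imitates that aggregation over a small hand-written array so the idea is visible without a cluster. The sample rows and the term_facet helper are made up for illustration; the real service computes its counts on the server from the indexed terms.

```ruby
# Illustrative data only: stands in for the category field of matching documents.
rows = [
  {"category" => "North American Ale"},
  {"category" => "North American Ale"},
  {"category" => "British Ale"},
  {"category" => "German Lager"},
  {"category" => "North American Ale"},
  {"category" => "British Ale"},
]

# Hypothetical helper: counts the values of `field` and keeps the `size`
# most frequent ones, breaking ties alphabetically.
def term_facet(rows, field, size)
  counts = rows.map { |row| row[field] }.compact.tally
  counts.sort_by { |term, count| [-count, term] }.first(size).to_h
end

top = term_facet(rows, "category", 2)
top.each { |term, count| puts "#{term}: #{count}" }
```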

Scoped vs Global Indexes

The FTS APIs exist at both the Cluster and Scope levels.

This is because FTS supports, as of Couchbase Server 7.6, a new form of "scoped index" in addition to the traditional "global index".

It’s important to use Cluster#search_query / Cluster#search for global indexes, and Scope#search for scoped indexes.

Vector Search

As of Couchbase Server 7.6, the FTS service supports vector search in addition to traditional full text search queries.

Examples

Single vector query

In this first example we are performing a single vector query:

request = SearchRequest.new(
  VectorSearch.new(VectorQuery.new('vector_field', vector_query))
)
result = scope.search('vector-index', request)

result.rows.each do |row|
  puts "Document ID: #{row.id}, search score: #{row.score}"
end

Let’s break this down. We create a SearchRequest, which can contain a traditional FTS query (SearchQuery), the new VectorSearch, or both. Here we are just using the latter.

The VectorSearch allows us to perform one or more VectorQuery objects.

The VectorQuery itself takes the name of the document field that contains embedded vectors ("vector_field" here), plus the actual vector query in the form of an Array<Float>.

(Note that Couchbase itself is not involved in generating the vectors, and these will come from an external source such as an embeddings API.)

Finally we execute the SearchRequest against the FTS index "vector-index", which has previously been set up to vector index the "vector_field" field.

This happens to be a scoped index so we are using Scope#search. If it were a global index we would use Cluster#search instead; see Scoped vs Global Indexes.

It returns the same SearchResult detailed earlier.
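Since the query vector comes from outside Couchbase, it can be worth checking it before building the VectorQuery. The sketch below shows one way to do that in plain Ruby. Here to_query_vector is a hypothetical helper, and the dimension of 4 is an assumed value chosen to keep the example short; in practice it would have to match the dimension configured for the field in the vector index.

```ruby
# Hypothetical helper: checks the length of an externally produced embedding
# and returns it as an Array of finite Floats.
def to_query_vector(values, dimension:)
  unless values.is_a?(Array) && values.length == dimension
    raise ArgumentError, "expected an Array of #{dimension} numbers"
  end
  values.map do |v|
    unless v.is_a?(Integer) || v.is_a?(Float)
      raise ArgumentError, "non-numeric component: #{v.inspect}"
    end
    f = v.to_f
    raise ArgumentError, "non-finite component: #{v.inspect}" unless f.finite?
    f
  end
end

# Made-up embedding values; a real vector would come from an embeddings API.
vector_query = to_query_vector([0.12, -0.5, 1, 0.33], dimension: 4)
p vector_query
```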

Multiple vector queries

You can run multiple vector queries together:

request = SearchRequest.new(
  VectorSearch.new(
    [
      VectorQuery.new('vector_field', vector_query) do |q|
        q.num_candidates = 2
        q.boost = 0.3
      end,
      VectorQuery.new('vector_field', another_vector_query) do |q|
        q.num_candidates = 5
        q.boost = 0.7
      end,
    ],
    Options::VectorSearch.new(vector_query_combination: :and)
  )
)
result = scope.search('vector-index', request)

result.rows.each do |row|
  puts "Document ID: #{row.id}, search score: #{row.score}"
end

How the results are combined (ANDed or ORed) can be controlled with the vector_query_combination attribute of Options::VectorSearch.
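As a rough mental model of the two combination modes, the sketch below treats the hits of each vector query as a list of document IDs and combines them with Ruby's array intersection and union. The IDs are made up, and the model deliberately ignores scores, boosts, and num_candidates, which the service also takes into account.

```ruby
# Made-up document IDs standing in for the hits of two vector queries.
first_hits  = ["doc-1", "doc-2", "doc-3"]
second_hits = ["doc-2", "doc-3", "doc-4"]

# :and keeps only documents returned by every vector query.
anded = first_hits & second_hits

# :or keeps documents returned by at least one vector query.
ored = first_hits | second_hits

puts "and: #{anded.inspect}"
puts "or:  #{ored.inspect}"
```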

Combining FTS and vector queries

You can combine a traditional FTS query with vector queries:

request = SearchRequest.new(
  VectorSearch.new(VectorQuery.new('vector_field', vector_query))
)
request.search_query(SearchQuery.match_all)
result = scope.search('vector-and-fts-index', request)

result.rows.each do |row|
  puts "Document ID: #{row.id}, search score: #{row.score}"
end

FTS queries

Note that traditional FTS queries, without vector search, are also supported by the new Cluster#search / Scope#search APIs:

request = SearchRequest.new(
  SearchQuery.match('swanky')
)
result = scope.search('travel-sample-index', request)

result.rows.each do |row|
  puts "Document ID: #{row.id}, search score: #{row.score}"
end

The SearchQuery is created in the same way as detailed earlier.

Consistency

Like the Couchbase Query Service, FTS allows consistent_with() queries. These provide Read-Your-Own-Writes (RYOW) consistency, ensuring that results reflect the given mutations once the index has processed them:

random_value = rand
result = collection.upsert("cool-beer-#{random_value}", {
    "type" => "beer",
    "name" => "Random Beer ##{random_value}",
    "description" => "The beer full of randomness"
})
mutation_state = MutationState.new(result.mutation_token)

options = Cluster::SearchOptions.new
options.fields = ["name"]
options.consistent_with(mutation_state)
result = cluster.search_query(
    "my-index-name",
    Cluster::SearchQuery.match_phrase("randomness"),
    options
  )
result.rows.each do |row|
  puts "id: #{row.id}, score: #{row.score}\n  fields: #{row.fields}"
end
#=>
# id: cool-beer-0.4332638785378332, score: 2.6573492057051666
#   fields: {"name"=>"Random Beer #0.4332638785378332"}

puts "Reported total rows: #{result.meta_data.metrics.total_rows}"
# Reported total rows: 1