Elasticsearch Keyword vs. Text

Opster Expert Team - Saskia

Last updated: Feb 12, 2023

2 min read

Elasticsearch keyword vs. text vs. wildcard: Elasticsearch strings explained

Overview

String fields in Elasticsearch come in different flavors. The keyword, wildcard and text field types each have different features and suit different use cases. Below is an explanation of the differences between them and of when to use each type for your string fields.

Text vs. Keyword

By default, when a recent version of Elasticsearch (5.0 or later) maps a new string field dynamically, it indexes the field both as text and as keyword, using a keyword sub-field.

The difference between text and keyword

In early Elasticsearch versions there was a field type called “string”, which was used to enable full-text search. Values in these fields went through an analysis pipeline that performs operations such as lowercasing, removing punctuation, splitting the text into individual tokens and filtering those tokens further, for example by removing stopwords.

This process works well for searching longer texts, but it isn’t always the desired behavior. When you want to filter by exact values or list all values using aggregations, you need a different type, because the input should not go through an analysis pipeline. It should stay not analyzed, exactly as it was sent.
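To make the analysis step concrete, here is a small sketch using the _analyze API. The same sample sentence (an arbitrary illustration) is passed once through the standard analyzer, which text fields use by default, and once through the keyword analyzer, which emits the whole input as a single token, roughly the way a keyword field stores a value. The expected tokens in the comments assume default analyzer settings.

# Analyzed, as in a text field: lowercased and split on word boundaries.
# Expected tokens: the, quick, brown, fox
POST _analyze
{
  "analyzer": "standard",
  "text": "The QUICK brown-fox!"
}

# Not analyzed, as in a keyword field: one unchanged token.
# Expected token: The QUICK brown-fox!
POST _analyze
{
  "analyzer": "keyword",
  "text": "The QUICK brown-fox!"
}

The standard analyzer does not remove stopwords unless it is configured to, which is why "the" is kept in the first result.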

So if you wanted to use a field for exact filtering or terms aggregations, you had to configure a field of type “string” with "index": "not_analyzed", typically as a sub-field next to the analyzed one:

"old_string_field" : {
  "type" : "string",
  "fields" : {
    "keyword" : {
      "type" : "string",
      "index" : "not_analyzed"
    }
  }
}

This is exactly how the distinction between text and keyword used to be expressed. Because this was not very intuitive for users unfamiliar with information retrieval, two new types were created: text and keyword.

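As of Elasticsearch version 5.0, the default dynamic mapping for string fields is:

"new_string_field" : {
  "type" : "text",
  "fields" : {
    "keyword" : {
      "type" : "keyword",
      "ignore_above" : 256
    }
  }
}

The ignore_above setting means that values longer than 256 characters are not indexed in the keyword sub-field.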
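With this mapping, the same value can be searched in two ways. The following is a minimal sketch; the index name my-index is hypothetical and assumes documents whose new_string_field is mapped as above. A match query analyzes the search input and matches individual words in the text field, whereas a term query on the keyword sub-field matches only the complete, original value.

# Hypothetical index name; new_string_field is mapped as shown above.
# Full-text search: "quick fox" is analyzed, so documents whose
# new_string_field contains the word "quick" or "fox" match.
GET my-index/_search
{
  "query": {
    "match": {
      "new_string_field": "quick fox"
    }
  }
}

# Exact match: only documents whose whole value is exactly
# "The QUICK brown-fox!" (same case, same punctuation) match.
GET my-index/_search
{
  "query": {
    "term": {
      "new_string_field.keyword": "The QUICK brown-fox!"
    }
  }
}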

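So the differences are:

  • Text: the value is analyzed into individual tokens (lowercased, split into words and so on), which makes it suitable for full-text search for words or phrases. Text fields are not available for aggregations or sorting by default.
  • Keyword: the value is indexed as a single, untouched token, which makes it suitable for exact-value filtering, sorting and terms aggregations.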

Keyword vs. Wildcard

If you’re planning to run many wildcard queries, use the wildcard type. It works well for machine-generated content, such as log messages, that you would typically grep through in a terminal.

Wildcard queries on regular text or keyword fields usually perform poorly, especially when the pattern starts with a wildcard, because Elasticsearch then has to check a large number of indexed terms. If you already know your users will run wildcard queries, use the wildcard field type to keep your cluster stable.

The wildcard type was introduced in Elasticsearch version 7.9. 
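A minimal sketch of the wildcard type, assuming a cluster on version 7.9 or later; the index name logs-example and the search pattern are hypothetical. The field is mapped as wildcard and then searched with a grep-like pattern that starts and ends with *, the kind of query that tends to be expensive on keyword fields.

# Hypothetical index name; message is mapped only as wildcard.
PUT logs-example
{
  "mappings": {
    "properties": {
      "message": {
        "type": "wildcard"
      }
    }
  }
}

# Grep-like search for any message containing "connection refused".
GET logs-example/_search
{
  "query": {
    "wildcard": {
      "message": {
        "value": "*connection refused*"
      }
    }
  }
}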

Text vs. Match Only Text

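The type “match_only_text” is very similar to “text”, but it saves disk space by sacrificing granular scoring. It does not index term frequencies or positions, so all matching documents receive the same constant score, and phrase queries are slower because they have to check the original _source.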
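A minimal sketch of a match_only_text mapping; the index name match-only-example and the query text are hypothetical, and the cluster must run a version that supports this type. Queries are written exactly as for text fields; the difference shows up in the scores, since every hit gets the same constant score.

# Hypothetical index name; message is mapped as match_only_text.
PUT match-only-example
{
  "mappings": {
    "properties": {
      "message": {
        "type": "match_only_text"
      }
    }
  }
}

# Regular full-text query; every hit receives the same constant score.
GET match-only-example/_search
{
  "query": {
    "match": {
      "message": "disk full"
    }
  }
}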

Code samples

Create a multi-field mapping that indexes the field message as text, keyword and wildcard:

PUT string-types
{
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "keyword": {
            "type": "keyword"
          },
          "wildcard_field": {
            "type": "wildcard"
          }
        }
      }
    }
  }
}
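To try the mapping above, the sketch below indexes one sample document (its content is a made-up log line) and queries each representation of message: the text field with match, the keyword sub-field with term and the wildcard sub-field with wildcard. The refresh=true parameter only makes the document searchable immediately for this demo.

# Index one made-up sample document.
POST string-types/_doc?refresh=true
{
  "message": "ERROR: connection refused by db-01"
}

# Full-text search on the analyzed text field.
GET string-types/_search
{
  "query": { "match": { "message": "connection" } }
}

# Exact-value filter on the keyword sub-field (whole string, case-sensitive).
GET string-types/_search
{
  "query": { "term": { "message.keyword": "ERROR: connection refused by db-01" } }
}

# Grep-like pattern on the wildcard sub-field.
GET string-types/_search
{
  "query": { "wildcard": { "message.wildcard_field": { "value": "*db-0*" } } }
}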

Which Elasticsearch string type should I use?

Use the text field if:

  • You’re planning to perform regular full-text search, i.e. search for a specific word or phrase
  • The content is regular, written text that a person could easily read

Use the keyword type if: 

  • You’re planning to filter exact values 
  • You’re planning to filter on prefix character sequences
  • You’re planning to perform terms aggregations, such as for faceted navigation on a website
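The keyword use cases above can be sketched against the string-types index from the code sample: a prefix filter on message.keyword combined with a terms aggregation that counts documents per distinct value, the building block of a facet. The aggregation name messages_facet and the prefix ERROR are hypothetical; "size": 0 skips the hits and returns only the aggregation.

# Uses the string-types index from the code sample above.
# messages_facet and the prefix "ERROR" are hypothetical choices.
GET string-types/_search
{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "prefix": { "message.keyword": "ERROR" } }
      ]
    }
  },
  "aggs": {
    "messages_facet": {
      "terms": {
        "field": "message.keyword",
        "size": 10
      }
    }
  }
}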

Use the wildcard type if:

  • You’re trying to find a needle in a haystack of poorly tokenized or machine-generated text
  • You do not intend to use queries that rely on word positions

Use match_only_text if:

  • You intend to run full-text search, but granular scores are not very important to you