Elasticsearch: how to update mapping for existing fields?

March 19, 2022

Intro

Elasticsearch uses a mapping to define an index’s fields, their types, and the way they should be indexed. Similar to SQL’s CREATE TABLE, you can specify the fields, their types, and indexing options when you create the index. Later on, similar to SQL’s ALTER TABLE, you can add new fields and change some properties of existing ones, but not in every case…

Changing the datatype of an existing field in Elasticsearch, or getting a new .keyword sub-field to work for documents that are already indexed, is not as easy as it may appear, and for a good reason: the engine has already built the index for that field, and changing it in place could break the existing data.

Fortunately, there is a sequence of commands that we can run to introduce any mapping change we want. In this article, I will guide you through them.

The problem

Before diving into the steps and to understand the problem better, let’s create a new index named products:

PUT products

and assume we defined the following mapping for its properties:

PUT products/_mapping
{
  "properties": {
    "title": {
      "type": "text"
    },
    "description": {
      "type": "text"
    },
    "price": {
      "type": "float"
    },
    "category": {
      "type": "text"
    }
  }
}

Then, you start indexing (adding) documents:


POST products/_doc
{
  "title": "Sausages",
  "description": "The beautiful range of Apple Naturalé that has an excitin…gredients. With the Goodness of 100% Natural Ingredients",
  "price": "72.00",
  "category": "Food"
}

POST products/_doc
{
  "title": "Ball",
  "description": "Boston's most advanced compression wear technology increases muscle oxygenation, stabilizes active muscles",
  "price": "83.00",
  "category": "Sport"
}

POST products/_doc
{
  "title": "Shoes",
  "description": "The Football Is Good For Training And Recreational Purposes",
  "price": "93.00",
  "category": "Clothes"
}

POST products/_doc
{
  "title": "Salad",
  "description": "A salad is a dish consisting of mixed, mostly natural ingredients with at least one raw ingredient. They are often dressed, and typically served at room temperature or chilled, though some can be served warm.",
  "price": "93.00",
  "category": "Food"
}

POST products/_doc
{
  "title": "Mouse",
  "description": "The Apollotech B340 is an affordable wireless mouse with …e connectivity, 12 months battery life and modern design",
  "price": "48.00",
  "category": "Computers"
}

POST products/_doc
{
  "title": "Gloves",
  "description": "Carbonite web goalkeeper gloves are ergonomically designed to give easy fit",
  "price": "63.00",
  "category": "Outdoors"
}

Let’s say a customer on our website is looking for an affordable mouse in our store. In the search bar, they may type affordable mouse, mouse affordable, affordable mice, etc. We can use the Search API to run a full-text search:

GET products/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "type": "most_fields", 
            "query": "mouse affordable",
            "fields": ["title", "description"]
          }
        }
      ]
    }
  }
}

and any of the customer’s potential queries above yields the same result (which is fantastic 🚀):

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 4.5613437,
    "hits" : [
      {
        "_index" : "products",
        "_id" : "gRqrpn8BdmTmb3ML_2e3",
        "_score" : 4.5613437,
        "_source" : {
          "title" : "Mouse",
          "description" : "The Apollotech B340 is an affordable wireless mouse with …e connectivity, 12 months battery life and modern design",
          "price" : "48.00",
          "category" : "Computers"
        }
      }
    ]
  }
}

Later, your Product owner asks you to provide APIs that support filtering, sorting, and aggregation. Cool, let’s try!

Full-text vs keyword

Filtering, sorting, and aggregation are keyword operations in Elasticsearch (as opposed to the full-text search that we performed above), which means that they work on exact matches against the whole value of a field. For example, let’s say you have products with "category": "computer mouse". When you aggregate data, you naturally want the number of products whose category is exactly "computer mouse", not separate counts for "computer" and "mouse". Keyword fields help in this situation.
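
To make the difference concrete, here is a small sketch in plain Python (it does not talk to Elasticsearch). It counts categories once on the whole value, the way a keyword field would be bucketed, and once on individual lowercase words, which only roughly approximates what an analyzed text field stores. The sample categories and the simple_tokens helper are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical sample data, not taken from the products index.
categories = ["computer mouse", "computer mouse", "computer keyboard"]

def simple_tokens(value):
    """Rough stand-in for text analysis: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]

# Keyword-style buckets: one bucket per exact value.
keyword_buckets = Counter(categories)

# Text-style buckets: one bucket per token, each document counted once per token.
token_buckets = Counter(t for c in categories for t in set(simple_tokens(c)))

print(keyword_buckets)  # buckets for "computer mouse" and "computer keyboard"
print(token_buckets)    # buckets for "computer", "mouse" and "keyboard"
```

Only the first counter answers the question "how many products are in the computer mouse category?".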

Failing operations

But first, let’s see what happens if we try to sort, aggregate, or filter data with our current setup:

Sorting by product title:

GET products/_search
{
  "sort": [
    {
      "title": {
        "order": "desc"
      }
    }
  ]
}

results in:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead."
      }
    ]
    // ...shortened for readability
  }
}

Running aggregations:

GET products/_search
{
  "size": 0, 
  "aggs": {
    "CATEGORY": {
      "terms": {
        "field": "category",
        "size": 100
      }
    }
  }
}

results in:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "illegal_argument_exception",
        "reason" : "Text fields are not optimised for operations that require per-document field data like aggregations and sorting, so these operations are disabled by default. Please use a keyword field instead."
      }
    ]
    // ...shortened for readability
  }
}

Filtering by the category field:

GET products/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "category": "Food"
          }
        }
      ]
    }
  }
}

does not fail this time, but returns no hits:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}

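which is wrong, because we know we have products with the "Food" category. The reason is that a text field is analyzed at index time, so "Food" is stored as the lowercase token "food", while a term query looks up the exact, unanalyzed value "Food".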
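
The empty result can be reproduced with a toy inverted index in plain Python. This is only a sketch: simple_analyze is a simplified stand-in for the default analyzer (lowercase, split on non-alphanumerics), the document ids are arbitrary, and none of the helper names come from Elasticsearch.

```python
import re

def simple_analyze(value):
    """Simplified stand-in for the default analyzer: lowercase and split."""
    return [t for t in re.split(r"[^0-9a-z]+", value.lower()) if t]

def build_index(docs, analyze):
    """Map each indexed term to the set of document ids containing it."""
    index = {}
    for doc_id, value in docs.items():
        for term in analyze(value):
            index.setdefault(term, set()).add(doc_id)
    return index

def term_lookup(index, term):
    """Like a term query: the search term is used as-is, without analysis."""
    return sorted(index.get(term, set()))

# Arbitrary ids with categories from the sample products.
docs = {1: "Food", 2: "Sport", 3: "Food"}

text_index = build_index(docs, simple_analyze)       # behaves like "category"
keyword_index = build_index(docs, lambda v: [v])     # behaves like "category.keyword"

print(term_lookup(text_index, "Food"))     # prints []
print(term_lookup(text_index, "food"))     # prints [1, 3]
print(term_lookup(keyword_index, "Food"))  # prints [1, 3]
```

The keyword-style index keeps the original value untouched, which is why an exact lookup for "Food" succeeds there.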

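As described above, we need to be able to do these operations on .keyword fields. However, the Update Mapping API alone does not get us there: it cannot change the type of an existing field, and a sub-field added to an existing field is not populated for documents that were indexed before the change.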

Solution

Now, finally let’s see the actual steps for updating our existing fields, which is the main purpose of this article.

DISCLAIMER: Be careful when running the commands to avoid potential data loss!

  1. Create another index:
PUT products_reindex
  2. Define the new/updated mapping with all the changes you need. It can be easier to copy the mapping from the source index and then add the necessary changes:
PUT products_reindex/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "fields": { 
        "keyword": { // add .keyword field
          "type": "keyword",
          "ignore_above": 256
        }
      }
    },
    "description": {
      "type": "text"
    },
    "price": {
      "type": "float"
    },
    "category": {
      "type": "text",
      "fields": {
        "keyword": { // add .keyword field
          "type": "keyword",
          "ignore_above": 256
        }
      }
    }
  }
}
  3. Use the Reindex API. It copies only the “raw” documents from the source index to the destination index, where they are indexed according to the destination index’s mapping.
POST _reindex
{
  "source": {
    "index": "products"
  },
  "dest": {
    "index": "products_reindex"
  }
}
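
If the source mapping is large, editing it by hand is error-prone. The following Python sketch shows one way to derive the new mapping from the old one. It assumes you pass in the inner mapping object (the part that contains "properties"; the response of GET products/_mapping wraps it under the index name and "mappings"). The add_keyword_subfield helper is hypothetical, not part of any Elasticsearch client.

```python
import copy

def add_keyword_subfield(mapping, field_names, ignore_above=256):
    """Return a copy of `mapping` where each named text field gets a
    `keyword` multi-field. The input mapping is left untouched."""
    new_mapping = copy.deepcopy(mapping)
    for name in field_names:
        field = new_mapping["properties"][name]
        if field.get("type") != "text":
            raise ValueError(f"{name!r} is not a text field")
        field.setdefault("fields", {})["keyword"] = {
            "type": "keyword",
            "ignore_above": ignore_above,
        }
    return new_mapping

# The mapping of the original products index from this article.
old_mapping = {
    "properties": {
        "title": {"type": "text"},
        "description": {"type": "text"},
        "price": {"type": "float"},
        "category": {"type": "text"},
    }
}

# Body for PUT products_reindex/_mapping
new_mapping = add_keyword_subfield(old_mapping, ["title", "category"])
print(new_mapping["properties"]["title"])
```

Working on a deep copy keeps the original mapping available for comparison if something goes wrong.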

In most cases, it is not acceptable to change the index name every time the mapping changes (as we just did). If you want to keep the original index name, continue:

  4. Make sure that your data has been successfully copied to the new index <your_index_name>_reindex, in our case products_reindex. Then delete the old index:
DELETE /products
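
Before running the DELETE above, it helps to make the "successfully copied" check explicit. A minimal Python sketch, assuming you have the JSON response of POST _reindex and the responses of GET products/_count and GET products_reindex/_count as dictionaries. The copy_problems helper and the trimmed sample responses are hypothetical. Keep in mind that the destination count only includes documents after the index has been refreshed.

```python
def copy_problems(reindex_response, source_count, dest_count):
    """Return a list of problems; an empty list means the copy looks complete."""
    problems = []
    if reindex_response.get("timed_out"):
        problems.append("reindex timed out")
    failures = reindex_response.get("failures", [])
    if failures:
        problems.append(f"reindex reported {len(failures)} failure(s)")
    if source_count["count"] != dest_count["count"]:
        problems.append(
            f"document counts differ: source={source_count['count']}, "
            f"dest={dest_count['count']}"
        )
    return problems

# Hypothetical, trimmed responses for the six sample products.
reindex_response = {"timed_out": False, "total": 6, "created": 6, "failures": []}
source_count = {"count": 6}  # GET products/_count
dest_count = {"count": 6}    # GET products_reindex/_count

print(copy_problems(reindex_response, source_count, dest_count))  # prints []
```

Only delete the old index when the list is empty.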

Next, we need to clone our products_reindex back to products.

  5. Block writes on products_reindex (the Clone API requires the source index to be read-only):
PUT /products_reindex/_settings
{
  "index.blocks.write": "true"
}
  6. Use the Clone API. If you are wondering about the difference between the two: the Reindex API copies only the data, while the Clone API copies the data, the mapping, and almost all settings of an index to a new index (which must not exist before you run the command).
POST /products_reindex/_clone/products
{
  "settings": {
    "index.blocks.write": null 
  }
}

Note that we are also removing the write lock with this API call.

  7. It is important to check the index health status after this operation, using the Cluster Health API:
GET /_cluster/health/products?wait_for_status=green&timeout=15s
  8. That’s it! Now we can check whether the operations that were failing work now.

In each call, we need to use the .keyword sub-field that we added in the new mapping:

Sorting by product title:

GET products/_search
{
  "sort": [
    {
      "title.keyword": { // note the .keyword
        "order": "desc"
      }
    }
  ]
}

results in:

// ...shortened for readability
"hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "products",
        "_id" : "fxqrpn8BdmTmb3ML42fM",
        "_score" : null,
        "_source" : {
          "title" : "Shoes",
          "description" : "The Football Is Good For Training And Recreational Purposes",
          "price" : "93.00",
          "category" : "Clothes"
        },
        "sort" : [
          "Shoes"
        ]
      },
      {
        "_index" : "products",
        "_id" : "fRqrpn8BdmTmb3MLz2dp",
        "_score" : null,
        "_source" : {
          "title" : "Sausages",
          "description" : "The beautiful range of Apple Naturalé that has an excitin…gredients. With the Goodness of 100% Natural Ingredients",
          "price" : "72.00",
          "category" : "Food"
        },
        "sort" : [
          "Sausages"
        ]
      },
      {
        "_index" : "products",
        "_id" : "gBqrpn8BdmTmb3ML72dA",
        "_score" : null,
        "_source" : {
          "title" : "Salad",
          "description" : "A salad is a dish consisting of mixed, mostly natural ingredients with at least one raw ingredient. They are often dressed, and typically served at room temperature or chilled, though some can be served warm.",
          "price" : "93.00",
          "category" : "Food"
        },
        "sort" : [
          "Salad"
        ]
      },
      {
        "_index" : "products",
        "_id" : "gRqrpn8BdmTmb3ML_2e3",
        "_score" : null,
        "_source" : {
          "title" : "Mouse",
          "description" : "The Apollotech B340 is an affordable wireless mouse with …e connectivity, 12 months battery life and modern design",
          "price" : "48.00",
          "category" : "Computers"
        },
        "sort" : [
          "Mouse"
        ]
      },
      {
        "_index" : "products",
        "_id" : "ghqspn8BdmTmb3MLC2ef",
        "_score" : null,
        "_source" : {
          "title" : "Gloves",
          "description" : "Carbonite web goalkeeper gloves are ergonomically designed to give easy fit",
          "price" : "63.00",
          "category" : "Outdoors"
        },
        "sort" : [
          "Gloves"
        ]
      },
      {
        "_index" : "products",
        "_id" : "fhqrpn8BdmTmb3ML2meM",
        "_score" : null,
        "_source" : {
          "title" : "Ball",
          "description" : "Boston's most advanced compression wear technology increases muscle oxygenation, stabilizes active muscles",
          "price" : "83.00",
          "category" : "Sport"
        },
        "sort" : [
          "Ball"
        ]
      }
    ]
  }

Running aggregations:

GET products/_search
{
  "size": 0, 
  "aggs": {
    "CATEGORY": {
      "terms": {
        "field": "category.keyword",
        "size": 100
      }
    }
  }
}

results in:

// ...shortened for readability
"hits" : {
    "total" : {
      "value" : 6,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "CATEGORY" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "Food",
          "doc_count" : 2
        },
        {
          "key" : "Clothes",
          "doc_count" : 1
        },
        {
          "key" : "Computers",
          "doc_count" : 1
        },
        {
          "key" : "Outdoors",
          "doc_count" : 1
        },
        {
          "key" : "Sport",
          "doc_count" : 1
        }
      ]
    }
  }

Filtering by the category field:

GET products/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "category.keyword": "Food"
          }
        }
      ]
    }
  }
}

results in:

"hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "products",
        "_id" : "fRqrpn8BdmTmb3MLz2dp",
        "_score" : 0.0,
        "_source" : {
          "title" : "Sausages",
          "description" : "The beautiful range of Apple Naturalé that has an excitin…gredients. With the Goodness of 100% Natural Ingredients",
          "price" : "72.00",
          "category" : "Food"
        }
      },
      {
        "_index" : "products",
        "_id" : "gBqrpn8BdmTmb3ML72dA",
        "_score" : 0.0,
        "_source" : {
          "title" : "Salad",
          "description" : "A salad is a dish consisting of mixed, mostly natural ingredients with at least one raw ingredient. They are often dressed, and typically served at room temperature or chilled, though some can be served warm.",
          "price" : "93.00",
          "category" : "Food"
        }
      }
    ]
  }

We can see that all the operations succeeded with the expected result, and we can proceed with implementing the features requested by the Product owner 💪🏻.

  9. If you could successfully run the filtering, aggregation and sorting on the products index, you can now safely delete the temporary products_reindex index:
DELETE products_reindex
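
As a recap, the requests from the steps above can be written down as an ordered plan. The Python sketch below only builds and prints the list; it sends nothing to a cluster. The migration_plan helper and its temp_suffix parameter are hypothetical, and the manual checks (comparing the data before deleting the old index, testing the queries before deleting the temporary one) are deliberately left out of the list and still have to be done by hand.

```python
import json

def migration_plan(index, new_mapping, temp_suffix="_reindex"):
    """Return the requests from this article as (method, path, body) tuples."""
    temp = index + temp_suffix
    return [
        ("PUT", f"/{temp}", None),
        ("PUT", f"/{temp}/_mapping", new_mapping),
        ("POST", "/_reindex", {"source": {"index": index}, "dest": {"index": temp}}),
        ("DELETE", f"/{index}", None),
        ("PUT", f"/{temp}/_settings", {"index.blocks.write": True}),
        ("POST", f"/{temp}/_clone/{index}", {"settings": {"index.blocks.write": None}}),
        ("GET", f"/_cluster/health/{index}?wait_for_status=green&timeout=15s", None),
        ("DELETE", f"/{temp}", None),
    ]

# Shortened mapping for the example; use your full new mapping in practice.
new_mapping = {
    "properties": {
        "category": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
        }
    }
}

plan = migration_plan("products", new_mapping)
for method, path, body in plan:
    print(method, path)
    if body is not None:
        print(json.dumps(body, indent=2))
```

Keeping the plan as data makes it easy to review the order of the destructive steps before anything is executed.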

Conclusion

We have seen how to update the mapping for existing fields in an index. Although this article mostly focused on adding the .keyword field, these steps apply to any kind of change to the index mapping.


abdullaev.dev
Engineering blog by Azamat Abdullaev.

I write my <discoveries />.

All opinions are my own.