Scaling Elasticsearch – Drain Strategy for Scaling Down Resources

Opster Expert Team - Gustavo

Last updated: Mar 6, 2023

| 3 min read

In addition to reading this guide, we recommend you run the Elasticsearch Health Check-Up. It will detect issues and improve your Elasticsearch performance by analyzing your shard sizes, threadpools, memory, snapshots, disk watermarks and more.

The Elasticsearch Check-Up is free and requires no installation.

To easily orchestrate and manage Elasticsearch, we recommend you try Opster’s Management Console (OMC). By using the OMC you can deploy multiple clusters, configure node roles, scale cluster resources, manage certificates and more – all from a single interface, for free. Check it out.

This guide will cover how to plan for scaling down resources in Elasticsearch. If you’d like to perform this automatically, you can run self-hosted with OpenSearch and use Opster’s Kubernetes Operator.

The instructions in this guide are for manual processes in Elasticsearch.

Introduction

When scaling down resources in an Elasticsearch cluster, you need to be aware of what happens when you decrease the number of nodes, and of the correct procedure for removing them while maintaining cluster stability.

The main considerations are:

  • Performance issues
  • Preserving the quorum of master-eligible nodes
  • Data integrity issues

Performance issues that could be caused by draining

Disk availability 

Removing a data node will transfer all of its data to the other data nodes. Make sure that the remaining data nodes have sufficient disk capacity to receive the extra data without exceeding the low disk watermark.

Be aware that if you are running a Hot-Warm-Cold architecture or zone awareness, these features create restrictions on which shards can be allocated to which nodes, which must also be taken into consideration when evaluating the available disk space on the remaining nodes. For a full explanation of how Elasticsearch manages disk space, read here.
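Before draining a node, it can help to compare each node’s current disk usage with the configured watermarks. Below is a minimal check using the _cat/allocation API (the column list is just one reasonable selection) and the cluster settings API; in the settings output, look for the cluster.routing.allocation.disk.watermark.low, high and flood_stage values.

GET /_cat/allocation?v=true&h=node,shards,disk.indices,disk.used,disk.avail,disk.percent

GET /_cluster/settings?include_defaults=true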

Index and search rate 

If you remove a data node, then the indexing and search activity of that node will need to be shared across other data nodes. Check your monitoring data to evaluate whether the remaining nodes have sufficient resources to handle the extra indexing and search activity. 

As a rough guide, expect the peak CPU usage on the remaining nodes to increase by the ratio of the original node count to the new node count. For example, if you have 8 data nodes with a peak CPU of around 50%, then after removing 1 node you would expect a peak CPU of approximately 50% * 8/7 ≈ 57%. This is a very rough approximation, as true usage will depend on how shards are distributed across your cluster.
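To sanity-check this estimate against real usage, you can look at the CPU and load figures that each node currently reports; the columns below are standard _cat/nodes fields.

GET /_cat/nodes?v=true&h=name,node.role,cpu,load_1m,load_5m,heap.percent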

Considerations when removing an Elasticsearch node 

  • Master eligible node 
  • Non-master eligible node

Removing master eligible nodes – Preserving quorum of master nodes

Important! A master-eligible node is any node whose roles include master, including data nodes that also carry the master role. Be particularly careful when removing master-eligible nodes, because under some circumstances the remaining master-eligible nodes may not be able to elect a new master, resulting in cluster downtime.

For a high availability Elasticsearch cluster, always keep at least 3 master-eligible nodes. This ensures that if one node is lost, the remaining master-eligible nodes are still able to elect a new master. Furthermore, if you intend to remove half or more of the master-eligible nodes in a short timeframe, you must first remove the nodes you want to shut down from the voting configuration using the command below, because the cluster may not have had sufficient time to automatically adjust the voting configuration (the quorum required to elect a new master).

POST /_cluster/voting_config_exclusions?node_names=node_name1,node_name2
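Once the excluded nodes have been shut down and permanently removed from the cluster, clear the voting configuration exclusions so that they do not linger in the cluster state:

DELETE /_cluster/voting_config_exclusions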

Ensuring data integrity when removing data nodes

Removing a data node requires transferring its data to the other data nodes, which will be resource intensive. In order to minimize the resources required and ensure data integrity, carry out the shard migration in an orderly way, following the steps below.

How to migrate Elasticsearch shards correctly:

  • Ensure there’s a recent snapshot backup of the Elasticsearch cluster(s).

  • Delete any unnecessary indices.

  • Stop any unnecessary indexing.

  • Ensure each cluster is green before continuing the process.

  • Migrate shards from the node(s) you want to shut down.

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "192.168.1.150"
  }
}
  • Check that there are no shards left on the node(s) you want to shut down:
GET /_cat/allocation?v=true

If the result of the above command shows 0 shards on the excluded node(s), you can safely shut down the node(s) to permanently remove them from the cluster.
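While the shards are draining, you can also follow the relocation progress. For example, the cluster health API reports the number of relocating shards, and _cat/recovery with active_only=true lists the shard movements that are still in flight:

GET /_cluster/health?filter_path=status,relocating_shards

GET /_cat/recovery?v=true&active_only=true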

How to remove a node from the elasticsearch.yml configuration

For master-eligible nodes, remove the node’s address or name from the elasticsearch.yml configuration of the remaining nodes, in particular from discovery.seed_hosts and cluster.initial_master_nodes:

discovery.seed_hosts:
   - 192.168.1.10:9300
   - 192.168.1.11
   - seeds.mydomain.com
cluster.initial_master_nodes: 
   - master-node-a
   - master-node-b
   - master-node-c
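After updating elasticsearch.yml on the remaining nodes, you can check which nodes are still master eligible (the node.role column contains m) and which node is the currently elected master (marked with * in the master column):

GET /_cat/nodes?v=true&h=name,node.role,master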

How to rejoin excluded nodes 

If a data node is temporarily removed, and users later wish to return it to the cluster, then ensure the allocation exclusion setting created earlier is removed.

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": null
  }
}
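To confirm that the exclusion has been cleared, you can inspect the cluster settings; the cluster.routing.allocation.exclude._ip key should no longer appear among the transient settings:

GET /_cluster/settings?flat_settings=true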
