Failing shard – How to solve related issues

Opster Team

Jan-20, Version: 1.7-8.0

Before you begin reading this guide, we recommend running the Elasticsearch Error Check-Up, which analyzes 2 JSON files to detect many errors.

Briefly, this log message indicates that Elasticsearch was unable to perform a query or indexing operation on a specific shard and is marking that shard as failed. Possible causes include a corrupted shard or a lack of disk space. To resolve the issue, use the _cat/shards API to identify the problematic shard, then take action based on the root cause.
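As a rough illustration of the first step, the sketch below filters the text output of the _cat/shards API for shards in the UNASSIGNED state. It assumes the request was made as GET _cat/shards?h=index,shard,prirep,state,unassigned.reason without the v (header) parameter, so each row holds four or five whitespace-separated columns. The class name ShardStateFilter, its helper types and the sample index name are hypothetical and only serve the example; fetching the output from the cluster is left out.

```java
import java.util.ArrayList;
import java.util.List;

public class ShardStateFilter {

    /** One parsed row of _cat/shards output (hypothetical helper type). */
    public static final class ShardRow {
        public final String index;
        public final String shard;
        public final String prirep;  // "p" for primary, "r" for replica
        public final String state;   // e.g. STARTED, INITIALIZING, RELOCATING, UNASSIGNED
        public final String reason;  // unassigned.reason column, empty when absent

        ShardRow(String index, String shard, String prirep, String state, String reason) {
            this.index = index;
            this.shard = shard;
            this.prirep = prirep;
            this.state = state;
            this.reason = reason;
        }
    }

    /** Returns the rows whose state column is UNASSIGNED. */
    public static List<ShardRow> findUnassigned(String catOutput) {
        List<ShardRow> result = new ArrayList<>();
        for (String line : catOutput.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] cols = trimmed.split("\\s+");
            if (cols.length < 4) {
                continue; // not a row in the assumed column layout
            }
            if ("UNASSIGNED".equals(cols[3])) {
                String reason = cols.length > 4 ? cols[4] : "";
                result.add(new ShardRow(cols[0], cols[1], cols[2], cols[3], reason));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample rows in the assumed column layout.
        String sample = String.join("\n",
                "logs-2020.01 0 p STARTED",
                "logs-2020.01 0 r UNASSIGNED ALLOCATION_FAILED",
                "logs-2020.01 1 p STARTED");
        for (ShardRow row : findUnassigned(sample)) {
            System.out.println(row.index + " shard " + row.shard
                    + " (" + row.prirep + "): " + row.state + " " + row.reason);
        }
    }
}
```

Once an unassigned shard is identified, the cluster allocation explain API (GET _cluster/allocation/explain) can report why it is not being allocated.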

This guide will help you check for common problems that cause the log "Failing shard" to appear. To understand the issues related to this log, read the explanation below about the following Elasticsearch concepts: allocation, cluster, routing and shard.

Log Context

The log "failing shard [{}]" is emitted by the class AllocationService.java.
For those seeking more in-depth context, we extracted the following from the Elasticsearch source code:

                    failedShardEntry.getFailure(), failedAllocations + 1, currentNanoTime, System.currentTimeMillis(), false,
                    AllocationStatus.NO_ATTEMPT);
                if (failedShardEntry.markAsStale()) {
                    allocation.removeAllocationId(failedShard);
                }
                logger.warn(new ParameterizedMessage("failing shard [{}]", failedShardEntry), failedShardEntry.getFailure());
                routingNodes.failShard(logger, failedShard, unassignedInfo, indexMetaData, allocation.changes());
            } else {
                logger.trace("{} shard routing failed in an earlier iteration (routing: {})", shardToFail.shardId(), shardToFail);
            }
        }




 
