When the following scenario occurs:
1. A long-running merge task is in progress on Node1#Index1#Shard1.
2. While the merge is running, the shard starts relocating from Node1#Index1#Shard1 to Node2#Index1#Shard1.
3. At the finalize step, the source node has to close the shard (closeShard), but closing waits for the long-running merge task, as the stack trace below shows.
4. The clusterApplierService thread waits for about N minutes, the node is marked stale, and the master removes Node1 from the cluster because it has not responded for a long time.
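The last step, where Node1 is removed because its applier thread is stuck, can be modelled as follower-lag detection: the master publishes a cluster-state version, and any node that has not applied it within a timeout is removed. We assume this is the mechanism involved here (OpenSearch has a `cluster.follower_lag.timeout` setting, which we believe defaults to 90s). The class below is a hypothetical simplification, not OpenSearch's code; `LagDetectorSketch` and its methods are invented names.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical sketch of master-side follower-lag detection (not OpenSearch's implementation).
 * Time is passed in explicitly, in milliseconds, so the logic can run without sleeping.
 */
class LagDetectorSketch {
    /** A check scheduled when a cluster-state version was published. */
    private static final class PendingCheck {
        final long version;
        final long deadlineMillis;

        PendingCheck(long version, long deadlineMillis) {
            this.version = version;
            this.deadlineMillis = deadlineMillis;
        }
    }

    private final long lagTimeoutMillis;
    private final Map<String, Long> appliedVersion = new HashMap<>();
    private final List<PendingCheck> pending = new ArrayList<>();

    LagDetectorSketch(long lagTimeoutMillis) {
        this.lagTimeoutMillis = lagTimeoutMillis;
    }

    /** The master published a new version: every node must apply it before the deadline. */
    void onPublished(long version, long nowMillis) {
        pending.add(new PendingCheck(version, nowMillis + lagTimeoutMillis));
    }

    /** A node reports that its cluster applier finished applying {@code version}. */
    void onApplied(String node, long version) {
        appliedVersion.merge(node, version, Math::max);
    }

    /** Runs every overdue check and returns the nodes still behind that check's version. */
    Set<String> nodesToRemove(Collection<String> nodes, long nowMillis) {
        Set<String> toRemove = new TreeSet<>();
        Iterator<PendingCheck> it = pending.iterator();
        while (it.hasNext()) {
            PendingCheck check = it.next();
            if (check.deadlineMillis > nowMillis) {
                continue; // not due yet
            }
            it.remove(); // each check fires once
            for (String node : nodes) {
                if (appliedVersion.getOrDefault(node, 0L) < check.version) {
                    toRemove.add(node);
                }
            }
        }
        return toRemove;
    }

    public static void main(String[] args) {
        LagDetectorSketch detector = new LagDetectorSketch(90_000); // assumed 90 s timeout
        detector.onPublished(42, 0);
        detector.onApplied("node2", 42); // node2 applies the new state promptly
        // node1's applier thread is stuck in closeShard, so it never reports version 42
        System.out.println(detector.nodesToRemove(List.of("node1", "node2"), 30_000)); // []
        System.out.println(detector.nodesToRemove(List.of("node1", "node2"), 90_000)); // [node1]
    }
}
```

Each publication gets its own deadline, so a node whose applier thread stays blocked is caught by the first overdue check even while newer versions keep being published.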
"opensearch[datanode1][clusterApplierService#updateTask][T#1]" #41 daemon prio=5 os_prio=0 cpu=5183.70ms elapsed=93132.85s tid=0x00007f3f392509d0 nid=0x101 in Object.wait() [0x00007f3f6ddfb000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(java.base@…/Native Method)
- waiting on <no object reference available>
at org.apache.lucene.index.IndexWriter.doWait(IndexWriter.java:5410)
- locked <0x0000001022b0abe8> (a org.apache.lucene.index.IndexWriter)
at org.apache.lucene.index.IndexWriter.abortMerges(IndexWriter.java:2721)
- locked <0x0000001022b0abe8> (a org.apache.lucene.index.IndexWriter)
at org.apache.lucene.index.IndexWriter.rollbackInternalNoCommit(IndexWriter.java:2469)
- locked <0x0000001022b0abe8> (a org.apache.lucene.index.IndexWriter)
at org.apache.lucene.index.IndexWriter.rollbackInternal(IndexWriter.java:2449)
- locked <0x0000001022bae6d0> (a java.lang.Object)
at org.apache.lucene.index.IndexWriter.rollback(IndexWriter.java:2441)
at org.opensearch.index.engine.InternalEngine.closeNoLock(InternalEngine.java:2370)
at org.opensearch.index.engine.Engine.close(Engine.java:2000)
at org.opensearch.index.engine.Engine.flushAndClose(Engine.java:1987)
at org.opensearch.index.shard.IndexShard.close(IndexShard.java:1907)
- locked <0x0000001022b07ea0> (a java.lang.Object)
at org.opensearch.index.IndexService.closeShard(IndexService.java:623)
at org.opensearch.index.IndexService.removeShard(IndexService.java:599)
- locked <0x0000001022a976a8> (a org.opensearch.index.IndexService)
at org.opensearch.index.IndexService.close(IndexService.java:374)
- locked <0x0000001022a976a8> (a org.opensearch.index.IndexService)
at org.opensearch.indices.IndicesService.removeIndex(IndicesService.java:993)
at org.opensearch.indices.cluster.IndicesClusterStateService.removeIndices(IndicesClusterStateService.java:446)
at org.opensearch.indices.cluster.IndicesClusterStateService.applyClusterState(IndicesClusterStateService.java:287)
- locked <0x000000100b7da520> (a org.opensearch.indices.cluster.IndicesClusterStateService)
at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:606)
at org.opensearch.cluster.service.ClusterApplierService.callClusterStateAppliers(ClusterApplierService.java:593)
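The frames above show the cluster applier thread parked in `IndexWriter.doWait` under `abortMerges`: rolling back the writer asks running merges to abort and then waits for them to finish, so the applier thread cannot return until the merge notices the abort. The sketch below is a hypothetical, heavily simplified Java model of that wait pattern, not Lucene's implementation; `AbortWaitSketch`, `startMerge` and `blockedMillis` are invented names, and it assumes the merge checks the abort flag only between fixed-size units of work.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

/**
 * Hypothetical model of rollback -> abortMerges -> doWait -> Object.wait (NOT Lucene code).
 * The "writer" only flags running merges as aborted and then waits on its own monitor,
 * so the caller stays blocked until the merge thread reaches its next abort check.
 */
class AbortWaitSketch {
    private int runningMerges = 0; // guarded by this
    private final AtomicBoolean aborted = new AtomicBoolean(false);

    /** Starts a fake merge of {@code chunks} work units; the abort flag is only checked between units. */
    synchronized void startMerge(int chunks, long chunkMillis, CountDownLatch started) {
        runningMerges++;
        new Thread(() -> {
            started.countDown();
            try {
                for (int i = 0; i < chunks; i++) {
                    Thread.sleep(chunkMillis); // a stretch of merge work with no abort check
                    if (aborted.get()) {
                        break; // the abort is only noticed here
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                synchronized (AbortWaitSketch.this) {
                    runningMerges--;
                    AbortWaitSketch.this.notifyAll(); // wake the thread waiting in abortMerges()
                }
            }
        }, "fake-merge").start();
    }

    /** Flags running merges as aborted, then waits on this monitor until none are left. */
    synchronized void abortMerges() throws InterruptedException {
        aborted.set(true);
        while (runningMerges != 0) {
            wait(1000); // timed wait + re-check, hence TIMED_WAITING in a thread dump
        }
    }

    /** Measures how long a caller (the applier thread, in this issue) is blocked in abortMerges(). */
    static long blockedMillis(long chunkMillis) {
        AbortWaitSketch writer = new AbortWaitSketch();
        CountDownLatch started = new CountDownLatch(1);
        writer.startMerge(1_000, chunkMillis, started);
        try {
            started.await(); // the merge is now inside its first work unit
            long start = System.nanoTime();
            writer.abortMerges();
            return (System.nanoTime() - start) / 1_000_000;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for merges", e);
        }
    }

    public static void main(String[] args) {
        // With 2 s work units the caller is blocked for roughly 2 s; longer units block longer.
        System.out.println("abortMerges blocked for ~" + blockedMillis(2_000) + " ms");
    }
}
```

`wait` releases the monitor while parked, which is why the dump can list the IndexWriter as "locked" by the applier thread while the merge thread still gets in to decrement the counter. The time the caller stays blocked is set by how long the merge runs before it next checks for the abort, not by the close itself.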
PR #2529