After we upgraded to elasticsearch-1.5.2 we are getting java.io.EOFException - elasticsearch

After upgrading Elasticsearch to 1.5.2 we are repeatedly getting:
java.io.EOFException: read past EOF:
MMapIndexInput(path="/iqs/ESData/elasticsearch/nodes/0/indices/ids_1/1/index/segments_7")
Even if we restart the cluster, the same exception keeps recurring. The only option left seems to be deleting the corrupted segment, but that is not an acceptable solution for our busy cluster. Can anyone suggest a fix?

Related

elasticsearch node not alive

I need some help, and I will try to offer as much information as I can, as I am unfamiliar with Elasticsearch.
I have been given access to a server that has Elasticsearch installed and, I am guessing, uses a single node to run it.
When running docker ps -a I can see the name of the container and its ID, and I can also log into it.
However, in a certain part of the application I am getting this error message:
production.INFO: Exception at search page No alive nodes found in your cluster
When digging in a little more I can see the following:
production.ERROR: No alive nodes found in your cluster {"userId":1639,"exception":"[object] (Elasticsearch\Common\Exceptions\NoNodesAvailableException(code: 0): No alive nodes found in your cluster at /var/www/vendor/elasticsearch/elasticsearch/src/Elasticsearch/ConnectionPool/StaticNoPingConnectionPool.php:50)
I am assuming the problem is that there is no connection to the node, but the answers I found on the web do not explain how to fix the issue, or when I try the fixes I get other errors on my side (systemctl not installed and such).
Can anyone explain how I can restart the nodes through the CLI? I know for certain the code was not changed, so it has to be something to do with the server.
If anyone can help me out that would be great! Thanks for your time.
So my issue was that I needed to run:
sysctl -w vm.max_map_count=262144
I understand this increases the virtual memory map count for the container (I found this in a document that was left on the system).
But I would really appreciate it if someone could explain why this issue suddenly appeared and whether there is a better solution I can use.
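One likely reason the issue "suddenly appeared" is that sysctl -w only changes the setting in memory, so it reverts to the kernel default after a host reboot. A sketch of making it permanent on the Docker host (assuming a standard Linux host with /etc/sysctl.conf; the value 262144 is the minimum Elasticsearch requires):

```shell
# Persist the setting so it survives host reboots; `sysctl -w` alone
# only changes the running kernel and is lost on restart.
echo "vm.max_map_count=262144" | sudo tee -a /etc/sysctl.conf
sudo sysctl -p            # reload settings from /etc/sysctl.conf now
sysctl vm.max_map_count   # verify the active value
```

After this, the containerized node should come up without the "no alive nodes" error even after the host is rebooted.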

corrupted/unassigned elasticsearch index

I have been running the Elasticsearch service for quite a long time, but suddenly encountered the following:
Caused by: org.elasticsearch.index.translog.TranslogCorruptedException: translog from source [d:\elasticsearch-7.1.0\data\nodes\0\indices\A2CcAAE-R3KkQh6jSoaEUA\2\translog\translog-1.tlog] is corrupted, expected shard UUID [.......] but got: [...........] this translog file belongs to a different translog.
I executed GET /_cat/shards?v and most of the indices are in an UNASSIGNED state.
Please help!
I went through the log files and saw the error message "Failed to update shard information for ClusterInfoUpdateJob within 15s timeout"; could this error cause most of the shards to turn UNASSIGNED?
You can try to recover using the elasticsearch-translog tool as explained in the documentation.
Elasticsearch should be stopped while running this tool.
If you don't have a replica from which the data can be recovered, you may lose some data by using this tool.
The documentation mentions a drive error or user error as the likely cause.
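A sketch of a recovery session. Note that on Elasticsearch 7.x (the asker is on 7.1.0) the standalone elasticsearch-translog tool was folded into the elasticsearch-shard tool; the index name below is a placeholder, and you should verify the exact flags against your version's documentation before running anything:

```shell
# Run only while the node is stopped; this discards corrupted translog
# data, so take a backup of the data directory first.
bin\elasticsearch-shard remove-corrupted-data --index <index-name> --shard-id 2

# After restarting the node, the tool prints the exact cluster reroute
# command to accept the truncated shard copy, along the lines of:
# POST /_cluster/reroute with an "allocate_stale_primary" command
```

The shard id 2 comes from the path in the error message (...\A2CcAAE-R3KkQh6jSoaEUA\2\translog\...).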

Greenplum error requested WAL segment has already been removed

I have a Greenplum cluster. I am monitoring it through GPmon.
I am getting the error:
requested WAL segment 00000001000000080000000F has already been removed.
How do I resolve this?
You can try removing the standby master and then reinitializing it to resolve this issue.
Thanks!
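A sketch of the remove-and-reinitialize procedure, run as the gpadmin user on the master host; the standby hostname "smdw" is a placeholder for your own standby host:

```shell
# Remove the existing standby master, which has fallen too far behind
# (the WAL segment it needs has already been recycled on the primary).
gpinitstandby -r

# Reinitialize the standby master on its host; this takes a fresh copy
# of the master data directory, so no old WAL segments are needed.
gpinitstandby -s smdw

# Verify the standby master is back in sync
gpstate -f
```

Once the standby is re-synced from a fresh copy, the "requested WAL segment ... has already been removed" error should stop appearing in GPmon.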

Storm: How to resubmit topology automatically when it occurs exception?

I have a topology running on a Storm cluster with 3 supervisor nodes (32 GB RAM each). For the first several days the topology ran well and everything was OK, but after several days of running, the following error kept occurring and the topology went down:
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /brokers/topics/TOPICNAME/partitions at storm.kafka.ZkCoordinator.refresh
The topology uses a spout to consume messages from a remote Kafka service, which sits on a remote server that also hosts the ZooKeeper service.
I guess the reason for this exception is that the ZooKeeper server is unstable, or that the network connection is unstable.
I have no permission to do anything with the remote Kafka/ZooKeeper server, so I need a solution on my side to keep the topology running stably. Is there any way to make the topology run stably, or to skip the exception when it occurs?
Or is there anyway to resubmit topology automatically?
Thank you very much!
The first thing you should do is search for what causes the connection-loss error.
Then go to Storm's log files and see which line of code is triggering it.
The right way to do things is to find out what is causing the error and fix that.
However, if you want a quicker temporary workaround, use Storm's REST API to kill the topology, then use a normal Java program or a script in any language to re-launch it from the command line.
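A sketch of that kill-and-resubmit workaround, assuming the Storm UI is reachable on localhost:8080; the topology id, jar name, and main class are all placeholders for your own values:

```shell
# Placeholder topology id - look it up via GET /api/v1/topology/summary
TOPOLOGY_ID="mytopology-1-1234567890"

# Kill the topology through the Storm UI REST API, giving spouts
# 30 seconds to drain in-flight tuples before shutdown.
curl -X POST "http://localhost:8080/api/v1/topology/$TOPOLOGY_ID/kill/30"

# Wait for the kill to complete, then resubmit from the command line.
sleep 60
storm jar mytopology.jar com.example.MyTopology mytopology
```

Wrapping this in a cron job or a monitoring script that checks the topology's status via GET /api/v1/topology/summary gives you a crude automatic resubmit, but it treats the symptom rather than the ZooKeeper instability itself.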

failed to send join request to master

I am using Elasticsearch version 1.3.0. After restarting a data node, it is unable to detect the master node. I am getting the error "failed to send join request to master".
In the error log:
[[app101][dGRBqTFTQfae76IFCjsMmQ][app101][inet[/127.0.0.2:9300]]{master=true, river=_none_}],
reason [org.elasticsearch.transport.RemoteTransportException:
[app101][inet[/127.0.0.2:9300]][discovery/zen/join];
org.elasticsearch.transport.RemoteTransportException:
[app102][inet[/127.0.0.1:9300]][discovery/zen/join/validate];
org.elasticsearch.ElasticsearchIllegalArgumentException:
No custom index metadata factory registered for type [rivers]]
Edit:
Reinstalling the jdbc river fixed the problem. But I am still looking for a way to remove the jdbc river and restart the ES cluster so that the node can still detect the master.
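The "No custom index metadata factory registered for type [rivers]" error suggests the cluster state still carries rivers metadata that a node without the river plugin cannot deserialize. A sketch of removing the river cleanly before uninstalling the plugin (the river name "my_jdbc_river" and the plugin name are placeholders; on ES 1.x, rivers live in the _river index):

```shell
# While the jdbc-river plugin is still installed on all nodes, delete the
# river definition, then the whole _river index, so no rivers metadata
# remains in the cluster state.
curl -XDELETE 'http://localhost:9200/_river/my_jdbc_river/'
curl -XDELETE 'http://localhost:9200/_river/'

# Only then uninstall the plugin on every node (plugin name may differ
# depending on how it was installed) and do a rolling restart.
bin/plugin --remove jdbc
```

With the rivers metadata gone before the plugin is removed, a restarted node should no longer fail join validation against the master.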
