I have a Greenplum cluster that I am monitoring through GPmon.
I am getting the following error:
requested WAL segment 00000001000000080000000F has already been removed.
How can I resolve this?
This error usually means the standby master has fallen behind and the WAL segments it needs have already been recycled on the primary. You can try removing the standby master and then reinitializing it to resolve the issue.
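A rough sketch of the steps, run as the gpadmin user on the master host (the standby hostname below is a placeholder for your own):
# Remove the current standby master
gpinitstandby -r
# Re-initialize the standby on its host
gpinitstandby -s <standby_hostname>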
Thanks!
Yarn Timeline Server produces logs at the /var/log/hadoop-yarn location. We see two types of log files:
hadoop-yarn-timelineserver-<host_ip>*.log*
hadoop-yarn-timelineserver-<host_ip>*.out*
The disk is getting filled because the .out file grows indefinitely, which results in disk-full errors.
Is there any solution to rotate the .out file or add a size check on it?
Can you check your DataNode and NameNode logs for any "IllegalAccessException" errors? If they are present, the issue you are facing is related to a known Hadoop bug involving Jersey 1.9. There are temporary workarounds, such as upgrading to the latest version of Jersey.
https://issues.apache.org/jira/browse/HADOOP-11461
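For a quick check, a grep along these lines should show whether that exception is what is filling the files (the log paths below are only examples and vary by distribution):
# Count occurrences in the Timeline Server .out file (path from the question)
grep -c "IllegalAccessException" /var/log/hadoop-yarn/hadoop-yarn-timelineserver-*.out
# And in the NameNode/DataNode logs (adjust the directory for your install)
grep -c "IllegalAccessException" /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log
grep -c "IllegalAccessException" /var/log/hadoop/hdfs/hadoop-hdfs-datanode-*.log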
I am trying to move my non-HA NameNode to HA. After setting up all the configurations for the JournalNodes by following the Apache Hadoop documentation, I was able to bring the NameNodes up. However, the NameNodes are crashing immediately and throwing the following error.
ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.io.IOException: There appears to be a gap in the edit log. We expected txid 43891997, but got txid 45321534.
I tried to recover the edit logs, initialize the shared edits, etc., but nothing works. I am not sure how to fix this problem without formatting the NameNode, since I do not want to lose any data.
Any help is greatly appreciated. Thanks in advance.
The problem was with the open-file limit on the Linux machine. I increased the limit of open files, and then the initialization of shared edits worked.
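For anyone hitting the same thing, raising the limit looks roughly like this (the user name and values below are only examples, not the exact settings used here):
# Check the current open-file limit for the user running the NameNode/JournalNode
ulimit -n
# Raise it persistently in /etc/security/limits.conf (example user and values)
hdfs  soft  nofile  65536
hdfs  hard  nofile  65536
# Log out and back in (or restart the services) for the new limit to take effect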
I have run the Elasticsearch service for quite a long time, but suddenly encountered the following error:
Caused by: org.elasticsearch.index.translog.TranslogCorruptedException: translog from source [d:\elasticsearch-7.1.0\data\nodes\0]indices\A2CcAAE-R3KkQh6jSoaEUA\2\translog\translog-1.tlog] is corrupted, expected shard UUID [.......] but got: [...........] this translog file belongs to a different translog.
I executed GET /_cat/shards?v and most of the indices are in UNASSIGNED state.
Please help!
I went through the log files and saw the error message "Failed to update shard information for ClusterInfoUpdateJob within 15s timeout". Could this error message cause most of the shards to turn UNASSIGNED?
You can try to recover using the elasticsearch-translog tool, as explained in the documentation.
Elasticsearch should be stopped while running this tool.
If you don't have a replica from which the data can be recovered, you may lose some data by using the tool.
The documentation mentions that the usual cause is a drive error or user error.
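The invocation looks roughly like this, with the node stopped and the shard path taken from the error message above (on 7.x the same functionality has moved into the elasticsearch-shard tool, so check the documentation for your exact version):
# Truncate the corrupted translog for the affected shard (run while Elasticsearch is stopped)
bin\elasticsearch-translog.bat truncate -d "d:\elasticsearch-7.1.0\data\nodes\0\indices\A2CcAAE-R3KkQh6jSoaEUA\2\translog"
# After restarting the node, re-check shard allocation
GET /_cat/shards?v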
I have an existing replication from Couchbase to Elasticsearch. I found out that there are now errors in replicating:
I tried to create the replication again, but it also gave the same error:
I already checked my Elasticsearch head plugin; I can see data in there and I can query it with results. I also restarted my Elasticsearch batch file, but the error still persists.
Can anyone help me with what else I need to check to further investigate the issue? Thank you in advance.
You may have a connectivity problem, which can happen due to networking issues like an IP address change since you initially set up the replication.
You might try the troubleshooting steps outlined here if you haven't already:
http://developer.couchbase.com/documentation/server/4.1/connectors/elasticsearch-2.1/trouble-intro.html
You should also check the goxdcr logs, which you can find here depending on the OS you're using:
http://developer.couchbase.com/documentation/server/4.0/troubleshooting/troubleshooting-logs.html
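As a quick sanity check of connectivity from the Couchbase node (the host below is a placeholder for your Elasticsearch node):
# Can the Couchbase node reach the Elasticsearch HTTP endpoint?
curl -s http://<elasticsearch-host>:9200/
# Is the cluster reporting a reasonable status?
curl -s "http://<elasticsearch-host>:9200/_cluster/health?pretty"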
After upgrading Elasticsearch to 1.5.2, we are repeatedly getting:
java.io.EOFException: read past EOF:
MMapIndexInput(path="/iqs/ESData/elasticsearch/nodes/0/indices/ids_1/1/index/segments_7")
Even if we restart the cluster, the same exception keeps occurring. The one option we have left is to delete the corrupted segment, but that is not the right solution for our busy cluster. Can anyone please suggest an alternative?