Elasticsearch ran out of disk space on a couple of nodes. The error we're getting is:
[WARN][indices.store ] [odr-es-md15] Can't open file to read checksums
java.io.FileNotFoundException: No such file [_2rlb_es090_0.pos]
at org.elasticsearch.index.store.DistributorDirectory.getDirectory(DistributorDirectory.java:176)
at org.elasticsearch.index.store.DistributorDirectory.getDirectory(DistributorDirectory.java:144)
at org.elasticsearch.index.store.Store$MetadataSnapshot.buildMetadata(Store.java:482)
at org.elasticsearch.index.store.Store$MetadataSnapshot.<init>(Store.java:456)
at org.elasticsearch.index.store.Store.readMetadataSnapshot(Store.java:281)
...
...
...
The translog may be corrupt. I followed the directions from here:
http://unpunctualprogrammer.com/2014/05/13/corrupt-elasticsearch-translogs/
To fix this you need to look in the ES logs,
/var/log/elasticsearch/elasticsearch.log on CentOS, and find the
error lines above. On those lines you'll see something like
[<timestamp>][WARN ][cluster.action.shard] [<weird name>] [logstash-2014.05.13][X]
where X is the shard number, likely 0, 1, 2, 3 or 4, and the block
before that, logstash-date for me (and for you, if you're doing centralized
logging like we are), is the index name. You then need to go to the
index location, /var/lib/elasticsearch/elasticsearch/nodes/0/indices/
on CentOS. In that directory you'll find the following
structure: logstash-date/X/translog/translog-<suffix>.
That's the file you'll need to delete, so:
sudo service elasticsearch stop
sudo rm /var/lib/elasticsearch/elasticsearch/nodes/0/indices/logstash-date/X/translog/translog-blalblabla
repeat the rm step for every index and shard mentioned in the error log
sudo service elasticsearch start
Watch the logs and repeat that process as needed until the ES logs
stop spitting out stack traces.
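As a rough sketch, the whole loop looks something like this (the index name and shard number are just the ones from the example log line above; take yours from your own error log):
# find the affected index/shard pairs in the log
sudo grep WARN /var/log/elasticsearch/elasticsearch.log | grep 'action.shard'
# stop Elasticsearch before touching anything on disk
sudo service elasticsearch stop
# delete the translog for one affected shard, e.g. index logstash-2014.05.13, shard 0
sudo rm /var/lib/elasticsearch/elasticsearch/nodes/0/indices/logstash-2014.05.13/0/translog/translog-*
# repeat the rm for every index/shard from the log, then start Elasticsearch again
sudo service elasticsearch start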
Related
Hi, I have installed Kamailio. It starts the first time, but when I stop and start it again it gives sctp_core_destroy(): SCTP API not initialized. I have already installed the SCTP module.
yyerror_at(): parse error in config file /etc/kamailio/kamailio.cfg
load_module(): could not find module <db_mysql> in </usr/lib/kamailio/modules>
[sctp_core.c:53]: sctp_core_destroy(): SCTP API not initialized
From the log it is obvious that you have successfully compiled and installed the SCTP module; however, it could NOT be initialized.
Note that this error is, more often than not, a result of other errors in your cfg file.
A few tips:
Run kamailio -c to be sure there are NO errors in your cfg.
Found an error? Use this to monitor what the exact issue is: from a separate terminal (terminal 1), run tail -fn200 /var/log/syslog
In a second terminal, try restarting your Kamailio server: sudo service kamailio restart
Revisit terminal 1 and look out for the first line with CRITICAL output, like the one below: CRITICAL: <core> [core/cfg.y:3413]: yyerror_at(): parse error in config file /usr/local/etc/kamailio/kamailio.cfg, line 366, column 41: syntax error
Line 366 is most likely where the issue is, so open the file at that line (366) to fix the problem:
sudo nano +366 /usr/local/etc/kamailio/kamailio.cfg
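Putting the first few steps together, a rough sketch of the two-terminal workflow (the log path and service name are just the ones from this example; adjust for your setup):
# terminal 1: watch the log for CRITICAL lines
tail -fn200 /var/log/syslog
# terminal 2: check the config syntax first, then restart the service
kamailio -c
sudo service kamailio restart
# the first CRITICAL line in terminal 1 gives you the file, line and column to fix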
Let me know if this helps.
I am trying to run some basic queries in the Hive editor in the Hue browser, but it returns the following error, whereas my Hive CLI works fine and is able to execute queries. Could someone help me?
Fetching results ran into the following error(s):
Bad status for request TFetchResultsReq(fetchType=1,
operationHandle=TOperationHandle(hasResultSet=True,
modifiedRowCount=None, operationType=0,
operationId=THandleIdentifier(secret='r\t\x80\xac\x1a\xa0K\xf8\xa4\xa0\x85?\x03!\x88\xa9',
guid='\x852\x0c\x87b\x7fJ\xe2\x9f\xee\x00\xc9\xeeo\x06\xbc')),
orientation=4, maxRows=-1):
TFetchResultsResp(status=TStatus(errorCode=0, errorMessage="Couldn't
find log associated with operation handle: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=85320c87-627f-4ae2-9fee-00c9ee6f06bc]",
sqlState=None,
infoMessages=["*org.apache.hive.service.cli.HiveSQLException:Couldn't
find log associated with operation handle: OperationHandle
[opType=EXECUTE_STATEMENT,
getHandleIdentifier()=85320c87-627f-4ae2-9fee-00c9ee6f06bc]:24:23",
'org.apache.hive.service.cli.operation.OperationManager:getOperationLogRowSet:OperationManager.java:229',
'org.apache.hive.service.cli.session.HiveSessionImpl:fetchResults:HiveSessionImpl.java:687',
'sun.reflect.GeneratedMethodAccessor14:invoke::-1',
'sun.reflect.DelegatingMethodAccessorImpl:invoke:DelegatingMethodAccessorImpl.java:43',
'java.lang.reflect.Method:invoke:Method.java:606',
'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:78',
'org.apache.hive.service.cli.session.HiveSessionProxy:access$000:HiveSessionProxy.java:36',
'org.apache.hive.service.cli.session.HiveSessionProxy$1:run:HiveSessionProxy.java:63',
'java.security.AccessController:doPrivileged:AccessController.java:-2',
'javax.security.auth.Subject:doAs:Subject.java:415',
'org.apache.hadoop.security.UserGroupInformation:doAs:UserGroupInformation.java:1657',
'org.apache.hive.service.cli.session.HiveSessionProxy:invoke:HiveSessionProxy.java:59',
'com.sun.proxy.$Proxy19:fetchResults::-1',
'org.apache.hive.service.cli.CLIService:fetchResults:CLIService.java:454',
'org.apache.hive.service.cli.thrift.ThriftCLIService:FetchResults:ThriftCLIService.java:672',
'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1553',
'org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults:getResult:TCLIService.java:1538',
'org.apache.thrift.ProcessFunction:process:ProcessFunction.java:39',
'org.apache.thrift.TBaseProcessor:process:TBaseProcessor.java:39',
'org.apache.hive.service.auth.TSetIpAddressProcessor:process:TSetIpAddressProcessor.java:56',
'org.apache.thrift.server.TThreadPoolServer$WorkerProcess:run:TThreadPoolServer.java:285',
'java.util.concurrent.ThreadPoolExecutor:runWorker:ThreadPoolExecutor.java:1145',
'java.util.concurrent.ThreadPoolExecutor$Worker:run:ThreadPoolExecutor.java:615',
'java.lang.Thread:run:Thread.java:745'], statusCode=3), results=None,
hasMoreRows=None)
This error could be due either to HiveServer2 not running, or to Hue not having access to hive_conf_dir.
Check whether HiveServer2 has been started and is running. It uses port 10000 by default.
netstat -ntpl | grep 10000
If it is not running, start HiveServer2:
$HIVE_HOME/bin/hiveserver2
Also check the Hue configuration file hue.ini. The hive_conf_dir property must be set under the [beeswax] section. If it is not set, add this property under [beeswax]:
hive_conf_dir=$HIVE_HOME/conf
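For reference, a minimal [beeswax] section might look something like this; hive_server_host and hive_server_port are the usual companion settings, shown here with placeholder values for a local HiveServer2 on the default port:
[beeswax]
  hive_server_host=localhost
  hive_server_port=10000
  hive_conf_dir=$HIVE_HOME/conf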
Restart the Hue supervisor after making these changes.
When I use the logstash-input-jdbc plugin to sync MySQL with my local Elasticsearch,
the errors below appear. I have searched for a long time, but I have not found a solution yet.
./logstash -f ./logstash_jdbc_test/jdbc.conf
Pipeline aborted due to error {:exception=>#,
:backtrace=>["/usr/local/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-jdbc-3.0.2/lib/logstash/plugin_mixins/jdbc.rb:156:in
prepare_jdbc_connection'",
"/usr/local/logstash/vendor/bundle/jruby/1.9/gems/logstash-input-jdbc-3.0.2/lib/logstash/inputs/jdbc.rb:167:in
register'",
"/usr/local/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/pipeline.rb:330:in
start_inputs'", "org/jruby/RubyArray.java:1613:ineach'",
"/usr/local/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/pipeline.rb:329:in
start_inputs'",
"/usr/local/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/pipeline.rb:180:in
start_workers'",
"/usr/local/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/pipeline.rb:136:in
run'",
"/usr/local/logstash/vendor/bundle/jruby/1.9/gems/logstash-core-2.3.2-java/lib/logstash/agent.rb:465:in
start_pipeline'"], :level=>:error}
Yesterday I found the reason.
The reason is:
In my install path /elasticsearch-jdbc-2.3.2.0/lib, the size of mysql-connector-java-5.1.38.jar was zero.
So I downloaded a new mysql-connector-java-5.1.38.jar and copied it to /elasticsearch-jdbc-2.3.2.0/lib.
And then my problem was resolved.
Now I can sync data between MySQL and Elasticsearch quickly.
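For anyone hitting the same error: check that the connector jar is actually non-empty and that the jdbc_driver_library setting in your jdbc.conf points at it. A minimal sketch, where the database name, credentials, query and index are placeholders for your own setup:
ls -l /elasticsearch-jdbc-2.3.2.0/lib/mysql-connector-java-5.1.38.jar

input {
  jdbc {
    # must point at a valid (non-zero-size) MySQL connector jar
    jdbc_driver_library => "/elasticsearch-jdbc-2.3.2.0/lib/mysql-connector-java-5.1.38.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "myuser"
    jdbc_password => "mypassword"
    statement => "SELECT * FROM mytable"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "mytable"
  }
}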
If I issue this WebHCat REST call in my Cloudera 5.4.1 environment
curl -s 'http://mywebhcat:50111/templeton/v1/ddl/database/default/table/person?user.name=admin&format=extended'; echo; echo;
everything works fine and I see the metadata for the table Person.
But there is another table called foo_bar. If I change the REST call above to
curl -s 'http://mywebhcat:50111/templeton/v1/ddl/database/default/table/foo_bar?user.name=admin&format=extended'; echo; echo;
then I get an error:
{"statement":"use default; show table extended like
foo_bar;","error":"unable to show table:
foo_bar","exec":{"stdout":"","stderr":"which: no /opt/cloudera/parcels/
CDH-5.4.1-1.cdh5.4.1.p0.6/lib/hadoop/bin/hadoop in ((null))\ndirname: missing operand\nTry
`dirname --help' for more information.\nlog4j:ERROR setFile(null,true) call failed.\njava.io
.FileNotFoundException: /opt/cloudera/parcels/CDH-5.4.1-1.cdh5.4.1.p0.6/lib/hive/logs/hcat.
log (No such file or directory)\n\tat java.io.FileOutputStream.open0(Native Method)\n\tat
java.io.FileOutputStream.open(FileOutputStream.java:270)\n\tat java.io.FileOutputStream.<
init>(FileOutputStream.java:213)\n\tat java.io.FileOutputStream.<init>(FileOutputStream.
java:133)\n\tat org.apache.log4j.FileAppender.setFile(FileAppender.java:294)\n\tat org.
apache.log4j.FileAppender.activateOptions(FileAppender.java:165)\n\tat org.apache.log4j.
DailyRollingFileAppender.activateOptions(DailyRollingFileAppender.java:223)\n\tat org.apache
.log4j.config.PropertySetter.activate(PropertySetter.java:307)\n\tat org.apache.log4j.config
.PropertySetter.setProperties(PropertySetter.java:172)\n\tat org.apache.log4j.config.
PropertySetter.setProperties(PropertySetter.java:104)\n\tat org.apache.log4j.
PropertyConfigurator.parseAppender(PropertyConfigurator.java:842)\n\tat org.apache.log4j.
PropertyConfigurator.parseCategory(PropertyConfigurator.java:768)\n\tat org.apache.log4j.
PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:648)\n\tat org.apache.
log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:514)\n\tat org.apache.log4j
.PropertyConfigurator.doConfigure(PropertyConfigurator.java:580)\n\tat org.apache.log4j.
PropertyConfigurator.configure(PropertyConfigurator.java:415)\n\tat org.apache.hadoop.hive.
common.LogUtils.initHiveLog4jDefault(LogUtils.java:127)\n\tat org.apache.hadoop.hive.common.
LogUtils.initHiveLog4jCommon(LogUtils.java:77)\n\tat org.apache.hadoop.hive.common.LogUtils.
initHiveLog4j(LogUtils.java:58)\n\tat org.apache.hive.hcatalog.cli.HCatCli.main(HCatCli.
java:65)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect
.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.
DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.
reflect.Method.invoke(Method.java:497)\n\tat org.apache.hadoop.util.RunJar.run(RunJar.
java:221)\n\tat org.apache.hadoop.util.RunJar.main(RunJar.java:136)\nlog4j:ERROR Either
File or DatePattern options are not set for appender [DRFA].\nOK\nTime taken: 1.035
seconds\n Command was terminated due to timeout(10000ms). See templeton.exec.timeout
property","exitcode":143}}
I don't know why it complains about a missing log directory only for the foo_bar table but successfully returns the metadata for Person.
BTW, I can go into the Hive console and do a select count(*) query on both Person and foo_bar.
EDIT:
Upon reading the error message again, it seems the core problem is
Command was terminated due to timeout(10000ms). See templeton.exec.timeout property","exitcode":143
But Cloudera Manager is not aware of this property "templeton.exec.timeout"... What do I do? I don't want to edit files manually, as there are many, many nodes in the cluster.
EDIT 2:
I went onto each Hadoop node and ran
sudo vi /opt/cloudera/parcels/CDH-5.4.1-1.cdh5.4.1.p0.6/etc/hive-webhcat/conf.dist/webhcat-default.xml
I found the timeout value and increased it to 1000000. I did this on each node and then restarted Hive and the WebHCat server using Cloudera Manager, but I got exactly the same error message.
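For reference, the entry I edited in webhcat-default.xml looks roughly like this (1000000 is the value from my change above; the timeout is in milliseconds, per the "timeout(10000ms)" in the error):
<property>
  <name>templeton.exec.timeout</name>
  <value>1000000</value>
</property>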
I added a resource and then deleted it afterwards.
However, when I issue the following command:
crm_resource -l
It is still listed! I try to remove it:
crm configure delete <resource_name>
I get the following error:
ERROR: object <resource_name> does not exist
Moreover:
crm configure show | grep <resource_name>
doesn't match any resource of that name! CIB also doesn't have it listed under LRM...
Any idea how to get rid of this resource?
Thanks,
D.
This seems to be "normal" in Pacemaker 1.1.6: if you tell it to delete something, it will delete it and then complain to you that it no longer exists.
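If you want to double-check that the resource really is gone from the configuration, a direct query of the CIB should come back empty; cibadmin ships with Pacemaker, and the resource name below is just a placeholder:
cibadmin --query | grep <resource_name>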