We are running an HBase cluster (Hadoop 3.3.1 + HBase 2.2.6). When clients write a lot of data into this cluster, we see "Cache flush failed for region" error logs, followed by a large number of "responseTooSlow" logs. Below are sample "Cache flush failed for region" messages:
"2022-07-29 07:31:58,133 ERROR [MemStoreFlusher.0] regionserver.MemStoreFlusher: Cache flush failed for region ecitem:IM_ItemPrice,421fa9e78dc076b621dbf7e47353ad1ab44c337619a5bb646f7743781946d73b,1478471026510.e5733f1bfe6fa351f5f9c678e20a8c94.",
2022-07-29 07:36:44,075 ERROR [MemStoreFlusher.1] regionserver.MemStoreFlusher: Cache flush failed for region namespace1:ItableC,37f6deefb28c4ce53693f3e84b3b49f5b8f5cd2ea5641e4ff9cf3153903068c7,1506597652433.af03abfd3916d10e62f592d4cd26dede.",
2022-07-29 07:36:58,175 ERROR [MemStoreFlusher.0] regionserver.MemStoreFlusher: Cache flush failed for region ecitem:IM_ItemBase,4af080051cb8c01bad53f06fac3677ae0788c92f0b099f00f95e1f2d77f90654,1506577129446.4310321b38d9b88744bc3f020893b72c.",
"2022-07-29 07:41:44,116 ERROR [MemStoreFlusher.1] regionserver.MemStoreFlusher: Cache flush failed for region namespace1:ItableA,0323|NE_US,1506655847792.375ce18534de9f7b2ba465c5a2940759.",
"2022-07-29 07:41:58,218 ERROR [MemStoreFlusher.0] regionserver.MemStoreFlusher: Cache flush failed for region namespace1:ItableB,c697e42447100554fcb0be17084b2d9407fb46700a598cde0ea0047498d1841f,1462557756302.ef9b748ff4f99d11b33dfa81f7cdeb56."
This issue causes our clients to hit a lot of timeout exceptions.
Does anyone have any ideas? Thanks for your help!
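In case it helps with diagnosis, the flush failure is normally logged together with a causing exception, which can be pulled out of the region server log with something like the following (the log path is an assumption based on a typical packaged install):
# Assumption: region server logs live under /var/log/hbase on the affected node
grep -A 20 "Cache flush failed for region" /var/log/hbase/hbase-*-regionserver-*.log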
We have an active and a standby NameNode in our Hadoop cluster. Due to a hardware issue our standby went down. After the server issue (disk replacement) was rectified, I cleared all the data on the server and started the NameNode. It is failing with the error "Encountered exception loading fsimage java.io.IOException: NameNode is not formatted".
Can you help with how to bring the standby node back up and running?
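For reference, a rough sketch of the recovery I am considering (based on the HDFS HA docs; it assumes the active NameNode is healthy, so please correct me if this is wrong for our setup):
# Run on the standby host after the disk replacement, so it copies the
# current fsimage from the active instead of starting with an empty directory
hdfs namenode -bootstrapStandby
# then start the NameNode service as usual for your distribution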
I have a problem with NiFi: at each step of my flow I get this error:
11:44:07 CEST ERROR
Failed to index Provenance Events. See logs for more information.
I looked in nifi-app.log, but there is no error there and my NiFi flow is working well.
There is no error on any processor, yet the error is displayed at the top right.
The nifi.properties looks fine.
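For completeness, this is how I double-checked the provenance repository settings (the install path below is just my layout, so treat it as an assumption):
# Assumption: NiFi installed under /opt/nifi
grep -i provenance /opt/nifi/conf/nifi.properties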
I am running a Spark job via PySpark, which consistently fails with this error:
Diagnostics: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-908041201-10.122.103.38-1485808002236:blk_1073741831_1007 file=/hdp/apps/2.5.3.0-37/spark/spark-hdp-assembly.jar
The error is always on the same block, namely BP-908041201-10.122.103.38-1485808002236:blk_1073741831_1007.
When I look at the Hadoop tracking URL, the message reads:
Application application_1505726128034_2371 failed 2 times due to AM Container for appattempt_1505726128034_2371_000002 exited with exitCode: -1000
I can only assume from this that there is some corrupted data. How can I view the data/block via the Hadoop command line and see exactly which data is on this potentially corrupted block?
Unfortunately there doesn't appear to be any more detailed logging on the specific failing nodes when looking in the web-based logs.
Also: is there a way in PySpark to ignore any 'corrupted' blocks and simply skip any files/blocks it cannot fully read?
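For what it's worth, this is the kind of command-line check I had in mind to see which datanodes hold that block and whether HDFS reports it corrupt (assuming fsck is permitted on the cluster):
# Show block IDs and datanode locations for the assembly jar from the error
hdfs fsck /hdp/apps/2.5.3.0-37/spark/spark-hdp-assembly.jar -files -blocks -locations
# List any corrupt file blocks cluster-wide
hdfs fsck / -list-corruptfileblocks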
Thanks
I am new to Kylo.
I manually deployed Kylo on a test Hortonworks HDP 2.5 cluster and reused the NiFi instance I had prior to Kylo.
I created a sample feed by following the ingestion tutorial (User Signups) and it worked.
However, when I drop the sample data file in /var/dropzone/ the file is removed (I assume it is fetched and read by NiFi), but the operations dashboard does not show any job running, and no status is populated against the feed job.
I looked at the generated NiFi process flow and there are two red processors, both of them ReleaseHighWaterMark processors.
Also, upon checking nifi-app.log I found the following exception:
2017-05-25 16:42:51,939 ERROR [Timer-Driven Process Thread-1] c.t.n.p.ProvenanceEventCollector ERROR PROCESSING EVENT! ProvenanceEventRecordDTO{eventId=759716, processorName=null, componentId=01a84157-0b38-14f2-d63d-c41fbd9c38a3, flowFile=ab93de46-e659-4c41-9812-94bbe2f90cfc, previous=null, eventType=CREATE, eventDetails=null, isEndOfJob=false, isBatch=true, isStream=false, feed=null}. ERROR: null
java.lang.NullPointerException: null
It seems there is a configuration issue, and there is hardly any good troubleshooting guide available.
Any ideas?
Please check that the KyloProvenanceEventReportingTask is running in NiFi: http://kylo.readthedocs.io/en/latest/how-to-guides/NiFiKyloProvenanceReportingTask.html
If that doesn't resolve the issue, please post the stack trace that accompanies the error message.
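If it helps, the full trace for that NullPointerException can usually be pulled straight out of nifi-app.log, for example (the log path assumes a default NiFi layout):
# Print the "ERROR PROCESSING EVENT" line plus the stack trace that follows it
grep -A 40 "ERROR PROCESSING EVENT" /opt/nifi/logs/nifi-app.log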
My production environment has started constantly throwing this error:
Error fetching message: ERR Error running script (call to f_0ab965f5b2899a2dec38dec41fff8c97f7a33ee9): #user_script:56: #user_script: 56: -OOM command not allowed when used memory > 'maxmemory'.
I am using the Heroku Redis addon with a worker dyno running Sidekiq.
Both Redis and the Worker Dyno have plenty of memory right now and the logs don't show them running out.
What is causing this error to be thrown and how can I fix it?
I had a job that required more memory than I had available in order to run.
Run "config get maxmemory" on your redis server. Maybe that config is limiting the amount of memory Redis is using.