NiFi FlowFile Repository failed to update

I'm using Apache NiFi to ingest and preprocess some CSV files, but when it runs for a long time, it always fails. The error is always the same:
FlowFile Repository failed to update
Searching the logs, I always see this error:
2018-07-11 22:42:49,913 ERROR [Timer-Driven Process Thread-10] o.a.n.p.attributes.UpdateAttribute UpdateAttribute[id=c7f45dc9-ee12-31b0-8dee-6f1746b3c544] Failed to process session due to org.apache.nifi.processor.exception.ProcessException: FlowFile Repository failed to update: org.apache.nifi.processor.exception.ProcessException: FlowFile Repository failed to update
org.apache.nifi.processor.exception.ProcessException: FlowFile Repository failed to update
at org.apache.nifi.controller.repository.StandardProcessSession.commit(StandardProcessSession.java:405)
at org.apache.nifi.controller.repository.StandardProcessSession.commit(StandardProcessSession.java:336)
at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:28)
at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1165)
at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:203)
at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Cannot update journal file ./flowfile_repository/journals/8772495.journal because this journal has already been closed
at org.apache.nifi.wali.LengthDelimitedJournal.checkState(LengthDelimitedJournal.java:223)
at org.apache.nifi.wali.LengthDelimitedJournal.update(LengthDelimitedJournal.java:178)
at org.apache.nifi.wali.SequentialAccessWriteAheadLog.update(SequentialAccessWriteAheadLog.java:121)
at org.apache.nifi.controller.repository.WriteAheadFlowFileRepository.updateRepository(WriteAheadFlowFileRepository.java:300)
at org.apache.nifi.controller.repository.WriteAheadFlowFileRepository.updateRepository(WriteAheadFlowFileRepository.java:257)
This makes me believe that the root cause is that NiFi cannot update the journal file ./flowfile_repository/journals/8772495.journal because this journal has already been closed, as seen in the log file.
How can I solve this issue?
Thanks!

I ran into the same issue the other day.
When I inspected the disk space on the volume where the "flowfile_repository" is located, I saw this:
/dev/sdc1 447G 447G 24K 100% /var/proj/data2
It's 100% full.
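If you want to confirm whether you're in the same situation, a quick check looks like this (the repository directory names below assume the default nifi.properties layout; adjust the paths to your install):

# See how full the volume holding the repositories is
df -h /var/proj/data2
# See which NiFi repository is actually consuming the space
du -sh ./flowfile_repository ./content_repository ./provenance_repository ./logs

If the content repository is the culprit, review the archiving settings in nifi.properties (for example nifi.content.repository.archive.enabled, nifi.content.repository.archive.max.usage.percentage, and nifi.content.repository.archive.max.retention.period) so archived content is purged before the disk fills up.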

If NiFi is having issues writing to the journal file, there are a few things to check.
Are you reading in fields from your CSV that are large (greater than 64 KB) and attempting to assign them to attributes? You may want to consider processing that particular field in your CSV as a separate FlowFile and matching it with attributes later. See this mailing list discussion for more information.
Have you checked NiFi's configuration against the best practices listed in the administration guide? I also recommend understanding each of the FlowFile repository settings; it will allow you to ask more targeted questions.
It is likely worth updating your JVM settings to allow for larger file processing (a hedged example is sketched below). Check out this post on Hortonworks detailing best practices for high-performance systems.
To solve the problem you may need to tweak a couple of things. Does the flow handle the CSV in an efficient manner? Does NiFi have enough memory to do what it needs to with the data? Would it be more appropriate to handle the CSV files as records? If that concept is unfamiliar, check out this post that introduces record processing in NiFi. I hope some of these resources help you get a little closer to a solution. If you have a follow-up question, let me know.
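To illustrate the JVM point above: NiFi's heap settings live in conf/bootstrap.conf. This is only a hedged sketch; the argument numbering matches a stock install, and the 4 GB values are placeholders you would size to your own data:

# conf/bootstrap.conf (a stock install ships with 512 MB; raise for heavy flows)
java.arg.2=-Xms4g
java.arg.3=-Xmx4g

NiFi needs a restart to pick up the new heap size.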

I encountered the same problem after running NiFi for two days on my Ubuntu system.
First, I ran du -sh ./* under the NiFi folder, and it turned out there were a lot of application logs there. Each log file is 101 MB, which I think is the default retention setting.
Since I don't need to keep that much log history for now, I updated the logback.xml file in NiFi's conf folder to set the application log to daily rollover.
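For anyone wanting to make the same change, it goes in the APP_FILE appender of conf/logback.xml. This is a hedged sketch based on a stock 1.x install (the appender name, log-dir variable, and 30-day history may differ in your version):

<appender name="APP_FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
    <file>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app.log</file>
    <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
        <!-- roll over once per day and keep 30 days of history -->
        <fileNamePattern>${org.apache.nifi.bootstrap.config.log.dir}/nifi-app_%d{yyyy-MM-dd}.log</fileNamePattern>
        <maxHistory>30</maxHistory>
    </rollingPolicy>
    <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
        <pattern>%date %level [%thread] %logger{40} %msg%n</pattern>
    </encoder>
</appender>

The stock logback.xml is set to rescan itself periodically, so a restart usually isn't required for this change.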

Related

Apache Nifi put file on iSeries FTP

At this time, it isn't possible to interact with an iSeries IFS using NiFi. If this is something that you need, I recommend heading over and voting/commenting.
I opened an enhancement request with the ASF.
Edit:
I created a dynamic property pre.cmd._____ per a suggestion below and still seem to be having the same issue. Perhaps I'm not creating it correctly? I've tried quoting and a few other variations with no success.
I'm looking to use Apache NiFi (1.11.4) for some near-real-time file movements, and one of my requirements is to FTP a file to an FTP server hosted on an iSeries.
An iSeries has two formats when utilizing FTP: NAMEFMT 0 when accessing libraries, physical files, and members, and NAMEFMT 1 when accessing the IFS. More on that if you're interested...
I set up what I thought was a very simple process to get acquainted: detect when a file is placed on an SFTP server and simply move it over to our iSeries.
The problem I'm having is that I'm simply trying to place the file on the IFS in my home directory, /home/dkorinke. However, PutFTP is erroring because it seems to be using NAMEFMT 0 when changing directory to /home/dkorinke.
From the logs:
2020-06-10 21:09:36,924 ERROR [Timer-Driven Process Thread-6] o.apache.nifi.processors.standard.PutFTP PutFTP[id=a084128f-0172-1000-49e1-fdd7b9d8d129] Unable to transfer StandardFlowFileRecord[uuid=72600f74-2678-4c33-870a-f2648aac034b,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1591830969408-1, container=default, section=1], offset=463787, length=104801],offset=0,name=My_File.csv,size=104801] to remote host DEVQA01 due to org.apache.nifi.processor.exception.ProcessException: Unable to change working directory to /HOME/DKORINKE/: null; routing to failure
I began playing around with the Remote Path property on the PutFTP and was able to get a different error when using "/HOME/DKORINKE/". I noticed QGPL/ in front of "/HOME/DKORINKE":
2020-06-10 21:53:35,710 ERROR [Timer-Driven Process Thread-5] o.apache.nifi.processors.standard.PutFTP PutFTP[id=a084128f-0172-1000-49e1-fdd7b9d8d129] Unable to transfer StandardFlowFileRecord[uuid=14303c7c-babb-4cbf-9728-346e0afaafba,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1591830969408-1, container=default, section=1], offset=882991, length=104801],offset=0,name=My_File.csv,size=104801] to remote host DEVQA01 due to org.apache.nifi.processor.exception.ProcessException: Unable to change working directory to QGPL/"/HOME/DKORINKE/": null; routing to failure
So far, the only solution I can think of is to update my user profile to default to NAMEFMT 1, which I'm not entirely sure is possible. I've hit Google pretty hard this evening, but I haven't been able to find anyone with a similar use case or issue.
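For what it's worth, the thing that has to reach the server before the CWD is just a SITE command switching the naming format. From a plain scripted or interactive FTP client (shown only to illustrate the command sequence; the file and path names are taken from the logs above) that would be:

quote SITE NAMEFMT 1
cd /home/dkorinke
put My_File.csv

The open question is how to get PutFTP to issue that SITE command, which is what the pre.cmd._____ dynamic property mentioned in the edit is meant for.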

[Elasticsearch]: Unable to Recover Primary Shard

I'm using Elasticsearch version 2.3.5. I have to recover the complete data set from backup disks. Everything was recovered except two shards. While checking the logs, I found the following error.
ERROR:
Caused by: java.nio.file.NoSuchFileException: /data/<cluster_name>/nodes/0/indices/index_name/shard_no/index/_c4_49.liv
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
at java.nio.channels.FileChannel.open(FileChannel.java:287)
at java.nio.channels.FileChannel.open(FileChannel.java:335)
at org.apache.lucene.store.NIOFSDirectory.openInput(NIOFSDirectory.java:81)
at org.apache.lucene.store.FileSwitchDirectory.openInput(FileSwitchDirectory.java:186)
at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:89)
at org.apache.lucene.store.FilterDirectory.openInput(FilterDirectory.java:89)
at org.apache.lucene.store.Directory.openChecksumInput(Directory.java:109)
at org.apache.lucene.codecs.lucene50.Lucene50LiveDocsFormat.readLiveDocs(Lucene50LiveDocsFormat.java:83)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:73)
at org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145)
at org.apache.lucene.index.ReadersAndUpdates.getReadOnlyClone(ReadersAndUpdates.java:197)
at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:99)
at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:435)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:100)
at org.elasticsearch.index.engine.InternalEngine.createSearcherManager(InternalEngine.java:283)
... 12 more
Can anyone suggest why this is happening, or how I can skip this particular file?
Thanks in advance.
Unfortunately restoring Elasticsearch from a filesystem backup is not a reliable way to recover your data, and is expected to fail like this sometimes. You should always use snapshot and restore instead. Your version is rather old, but more recent versions include this warning in the docs (which also applies to your version):
WARNING: You cannot back up an Elasticsearch cluster by simply copying the data directories of all of its nodes. Elasticsearch may be making changes to the contents of its data directories while it is running; copying its data directories cannot be expected to capture a consistent picture of their contents. If you try to restore a cluster from such a backup, it may fail and report corruption and/or missing files. Alternatively, it may appear to have succeeded though it silently lost some of its data. The only reliable way to back up a cluster is by using the snapshot and restore functionality.
It is possible that the restore has silently lost data in other shards too, there's no way to tell. Assuming you don't also have a snapshot of the data held in the lost shards, the only way to recover it is to reindex it from its source.
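For anyone setting this up afterwards, a minimal snapshot/restore round trip on 2.x looks roughly like this (the repository name, snapshot name, and location are placeholders; the location must also be whitelisted via path.repo in elasticsearch.yml):

# Register a shared-filesystem snapshot repository
curl -XPUT 'localhost:9200/_snapshot/my_backup' -d '{
  "type": "fs",
  "settings": { "location": "/mount/backups/my_backup" }
}'

# Take a snapshot and wait for it to complete
curl -XPUT 'localhost:9200/_snapshot/my_backup/snapshot_1?wait_for_completion=true'

# Restore it later
curl -XPOST 'localhost:9200/_snapshot/my_backup/snapshot_1/_restore'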

tHiveCreateTable component gives "org.apache.hive.service.cli.HiveSQLException" exception

I have a Talend Big Data job where I am trying to connect to Hive and create a table. The Hive connection works fine, but tHiveCreateTable gives the exception below.
Exception in component tHiveCreateTable_1 (Test)
org.apache.hive.service.cli.HiveSQLException: Error while processing statement: Cannot modify mapred.job.name at runtime. It is not in list of params that are allowed to be modified at runtime
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:258)
at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:244)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:247)
at local_project.test_0_1.Test.tHiveCreateTable_1Process(Test.java:643)
at local_project.test_0_1.Test.tHiveConnection_1Process(Test.java:498)
at local_project.test_0_1.Test.runJobInTOS(Test.java:948)
at local_project.test_0_1.Test.main(Test.java:799)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while processing statement: Cannot modify mapred.job.name at runtime. It is not in list of params that are allowed to be modified at runtime
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:324)
at org.apache.hive.service.cli.operation.HiveCommandOperation.runInternal(HiveCommandOperation.java:108)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:264)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:479)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:466)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:315)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:509)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1377)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1362)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Earlier, the tHiveConnection was failing with the same error. As per one of the older posts, I unchecked the Hadoop properties in the tHiveConnection component and it worked fine. Similar properties are not available in the tHiveCreateTable component, since I am using tHiveConnection to provide connection details to the tHiveCreateTable component.
Any help will be appreciated. Thanks,
Anil
This is a similar problem to "Talend (7.0.1) - Cannot modify mapred.job.name at runtime". Try fixing the property hive.security.authorization.sqlstd.confwhitelist.
I was able to fix the issue by adding a property to the custom hive-site in Ambari:
hive.security.authorization.sqlstd.confwhitelist.append with the value
mapred.job.name|mapred.child.env|query.invoker|hive.query.name
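For reference, the same setting expressed directly in hive-site.xml would look roughly like this (the value simply mirrors the list above; the property is interpreted as a regex of additional parameter names, and HiveServer2 typically needs a restart to pick it up):

<property>
  <name>hive.security.authorization.sqlstd.confwhitelist.append</name>
  <value>mapred.job.name|mapred.child.env|query.invoker|hive.query.name</value>
</property>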

How to access WSO2 BAM's hadoop job tracker?

I am quite new to BAM, and one of my Hive queries is broken.
However, I can't find out what's wrong, since the only error it gives me is:
ERROR: Error while executing Hive script.Query returned non-zero code: 9, cause: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MapRedTask
I've looked around and found out that BAM is only capable of displaying that much information; for more, I need to look in Hadoop's job tracker. However, I can't find any info on how to turn it on or access it on the BAM server.
So how do I access it / turn it on?
Don't be misled by the exception; most probably this is a problem with the Hive query. To get a proper idea of the problem, you should share the back-end console log.
It seems the problem is most probably with your Hive query and not with the Hadoop job tracker. To make sure, please run one of the samples [1] and check whether the Hive queries execute properly. If the Hive queries execute without a problem and summarized results are displayed in the dashboards, the problem is likely in your Hive query.
[1] - http://docs.wso2.org/display/BAM240/HTTPD+Logs+Analysis+Sample
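If you do want more detail than the dashboard shows, the back-end log mentioned above is the Carbon server log; on a default install it can be tailed while the Hive script runs (path relative to the BAM home, assuming a standard WSO2 Carbon layout):

tail -f repository/logs/wso2carbon.log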

Skipping bad input files in hadoop

I'm using Amazon Elastic MapReduce to process some log files uploaded to S3.
The log files are uploaded daily from servers using S3, but it seems that some get corrupted during the transfer. This results in a java.io.IOException: IO error in map input file exception.
Is there any way to have Hadoop skip over the bad files?
There's a whole bunch of record-skipping configuration properties you can use to do this; see the mapred.skip.-prefixed properties at http://hadoop.apache.org/docs/r1.2.1/mapred-default.html (a hedged sketch of enabling them follows below).
There's also a nice blog post about this subject and these config properties:
http://devblog.factual.com/practical-hadoop-streaming-dealing-with-brittle-code
That said, if your file is completely corrupt (i.e. broken before the first record), you might still have issues even with these properties.
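As a hedged sketch of the first suggestion (using the old org.apache.hadoop.mapred API that the r1.2.1 docs describe; the numbers and output path are placeholders), enabling record skipping on a job looks something like this. Note that this skips bad records, not whole files:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SkipBadRecords;

public class SkipConfigExample {
    public static void configureSkipping(JobConf conf) {
        // Start skipping once a task attempt has failed twice
        SkipBadRecords.setAttemptsToStartSkipping(conf, 2);
        // Tolerate skipping up to 1000 input records around a bad record
        SkipBadRecords.setMapperMaxSkipRecords(conf, 1000L);
        // Keep a copy of the skipped records here for later inspection
        SkipBadRecords.setSkipOutputPath(conf, new Path("/tmp/skipped-records"));
    }
}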
Chris White's comment suggesting writing your own RecordReader and InputFormat is exactly right. I recently faced this issue and was able to solve it by catching the file exceptions in those classes, logging them, and then moving on to the next file.
I've written up some details (including full Java source code) here: http://daynebatten.com/2016/03/dealing-with-corrupt-or-blank-files-in-hadoop/
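In the same spirit, here's a rough, self-contained sketch of that approach (this is not the code from the linked post; it wraps the stock LineRecordReader from the new mapreduce API and treats read errors as end-of-input for the current file instead of failing the job):

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.LineRecordReader;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

public class FaultTolerantTextInputFormat extends TextInputFormat {

    @Override
    public RecordReader<LongWritable, Text> createRecordReader(InputSplit split, TaskAttemptContext context) {
        return new FaultTolerantLineRecordReader();
    }

    // Delegates to LineRecordReader but logs and swallows IO errors,
    // so a corrupt or unreadable file just yields no (further) records.
    public static class FaultTolerantLineRecordReader extends RecordReader<LongWritable, Text> {
        private final LineRecordReader delegate = new LineRecordReader();
        private boolean broken = false;

        @Override
        public void initialize(InputSplit split, TaskAttemptContext context) {
            try {
                delegate.initialize(split, context);
            } catch (IOException e) {
                System.err.println("Skipping unreadable split " + split + ": " + e);
                broken = true;
            }
        }

        @Override
        public boolean nextKeyValue() {
            if (broken) {
                return false;
            }
            try {
                return delegate.nextKeyValue();
            } catch (IOException e) {
                System.err.println("Skipping rest of corrupt split: " + e);
                return false;
            }
        }

        @Override
        public LongWritable getCurrentKey() {
            return delegate.getCurrentKey();
        }

        @Override
        public Text getCurrentValue() {
            return delegate.getCurrentValue();
        }

        @Override
        public float getProgress() {
            try {
                return delegate.getProgress();
            } catch (IOException e) {
                return 1.0f;
            }
        }

        @Override
        public void close() throws IOException {
            delegate.close();
        }
    }
}

Wiring it in is just job.setInputFormatClass(FaultTolerantTextInputFormat.class); the trade-off is that corrupt files are silently truncated rather than surfaced as failures, so the logging matters.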
