I have created a workflow for reading all gzip files from a folder. Once that is done, I want to unzip them. I am getting the following error:
Starting job MyFirst at 08:55 22/03/2017.
[statistics] connecting to socket on port 3675
[statistics] connected
Exception in component tFileUnarchive_1
java.lang.NullPointerException
at local_project.myfirst_0_1.MyFirst.tFileUnarchive_1Process(MyFirst.java:535)
at local_project.myfirst_0_1.MyFirst.tFileList_1Process(MyFirst.java:451)
at local_project.myfirst_0_1.MyFirst.runJobInTOS(MyFirst.java:933)
at local_project.myfirst_0_1.MyFirst.main(MyFirst.java:790)
[statistics] disconnected
Job MyFirst ended at 08:55 22/03/2017. [exit code=1]
Screenshot 1
Screenshot 2
Try using a "flow -> Iterate" link instead of "OnSubjobOk".
NiFi version 1.5.
I use FetchFTP, configured as shown below:
Hostname: x.x.x.x
port: 21
username: yyy
password: zzz
Remote File: ${path}/${file_name}
Completion Strategy: Delete File
Run Schedule: 5 sec
FetchFTP processed 50 files, but only 46 of them were successfully deleted from the FTP server. For the others, the processor immediately showed the error message below (it also appears in the log file):
2019-11-20 23:33:25,542 WARN [Timer-Driven Process Thread-6] o.a.nifi.processors.standard.FetchFTP FetchFTP[id=53ee29a5] Successfully fetched the content for StandardFlowFileRecord[uuid=8ec8219e,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=15742604, container=default, section=175], offset=49, length=598026],offset=0,name=20191118145221190.pdf,size=598026] from x.x.x.x:21/<folder>/20191118145221190.pdf but failed to remove the remote file due to java.io.IOException: Failed to remove file /<folder>/20191118145221190.pdf due to 550 The process cannot access the file because it is being used by another process.
I'd appreciate any help with this.
My objective is simple: just take a backup and restore it on another machine that has no relation to the running cluster.
My steps (a rough shell sketch follows the list):
1. Remotely run pg_basebackup onto the new machine.
2. rm -fr ../../main/
3. mv backup/main/ ../../main/
4. Start the postgres service.
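A rough shell sketch of these steps (hostnames, user names and paths are placeholders, not the exact commands used; the data directory location depends on the PostgreSQL version and distribution):
# 1. take a base backup of the remote primary onto the new machine
pg_basebackup -h primary.example.com -U replication_user -D /tmp/backup/main -P
# 2. remove the existing data directory on the new machine
rm -fr /var/lib/postgresql/<version>/main
# 3. move the backup into place
mv /tmp/backup/main /var/lib/postgresql/<version>/main
# 4. start the service
sudo service postgresql start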
**During the backup, no errors occurred.**
But when starting PostgreSQL I get this error:
2018-12-13 10:05:12.437 IST [834] LOG: database system was shut down in recovery at 2018-12-12 23:01:58 IST
2018-12-13 10:05:12.437 IST [834] LOG: invalid primary checkpoint record
2018-12-13 10:05:12.437 IST [834] LOG: invalid secondary checkpoint record
2018-12-13 10:05:12.437 IST [834] PANIC: could not locate a valid checkpoint record
2018-12-13 10:05:12.556 IST [833] LOG: startup process (PID 834) was terminated by signal 6: Aborted
2018-12-13 10:05:12.556 IST [833] LOG: aborting startup due to startup process failure
2018-12-13 10:05:12.557 IST [833] LOG: database system is shut down
Based on the answer to a very similar question (How to mount a pg_basebackup on a stand alone server to retrieve accidently deleted data) and on the fact that that answer helped me get this working glitch-free, the steps are (a shell sketch follows the list):
1. Do the base backup, or copy/untar a previously made one, to the right location /var/lib/postgresql/9.5/main
2. Remove the file backup_label
3. Run /usr/lib/postgresql/9.5/bin/pg_resetxlog -f /var/lib/postgresql/9.5/main
4. Start the postgres service
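Put together, on the 9.5 layout used above, the commands look roughly like this (/path/to/basebackup is a placeholder, and pg_resetxlog -f discards WAL, so only run it on this standalone copy):
# copy or untar the base backup into the data directory, as the postgres user
sudo -u postgres cp -a /path/to/basebackup/. /var/lib/postgresql/9.5/main/
# remove the backup_label file left by pg_basebackup
sudo -u postgres rm /var/lib/postgresql/9.5/main/backup_label
# reset the transaction log so the server has a valid checkpoint to start from
sudo -u postgres /usr/lib/postgresql/9.5/bin/pg_resetxlog -f /var/lib/postgresql/9.5/main
# start the service again
sudo service postgresql start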
(I'm replying to this old question because it is the first one I found when looking for a solution to the same problem.)
I want to 1) read multiple gzip files from a path, 2) un-archive them, and 3) create a calculated field. So far I have been successful with 1 and 2. For 3, I thought tMap would do the job, but I don't know why I am unable to connect the un-archive component to tMap.
Edit 1:
I don't know why tFileInputDelimited and tMap are showing an error marker.
Below is the message I got:
Starting job Migration_1 at 09:36 04/04/2017.
[statistics] connecting to socket on port 3336
[statistics] connected
[statistics] disconnected
Job Migration_1 ended at 09:36 04/04/2017. [exit code=0]
Edit 2: I tried all the suggested steps, yet it does not give me the required output, and to my surprise there is no error message in the log to help debug anything.
Starting job Migration_1 at 12:36 04/04/2017.
[statistics] connecting to socket on port 3463
[statistics] connected
[statistics] disconnected
Job Migration_1 ended at 12:36 04/04/2017. [exit code=0]
tFileUnarchive will just unarchive the zip files, but you will still have to read the files contained in those archives. The tFileUnarchive component does not provide this reading part.
After the tFileList-->tFileUnarchive subjob, you should have a file-reading subjob, such as:
tFileList--iterate-->tFileInput*-->tMap
This tFileList should be set to read the directory where you extracted the gzip files.
I have an Accumulo setup with one master and 2 tablet servers holding a bunch of tables that store millions of records. The problem is that whenever I scan the tables to get a few records out, the tablet server logs keep throwing this error:
2015-11-12 04:38:56,107 [hdfs.DFSClient] WARN : Failed to connect to /192.168.250.12:50010 for block, add to deadNodes and continue. java.io.IOException: Got error, status message opReadBlock BP-1881591466-192.168.1.111-1438767154643:blk_1073773956_33167 received exception java.io.IOException: Offset 16320 and length 20 don't match block BP-1881591466-192.168.1.111-1438767154643:blk_1073773956_33167 ( blockLen 0 ), for OP_READ_BLOCK, self=/192.168.250.202:55915, remote=/192.168.250.12:50010, for file /accumulo/tables/1/default_tablet/F0000gne.rf, for pool BP-1881591466-192.168.1.111-1438767154643 block 1073773956_33167
java.io.IOException: Got error, status message opReadBlock BP-1881591466-192.168.1.111-1438767154643:blk_1073773956_33167 received exception java.io.IOException: Offset 16320 and length 20 don't match block BP-1881591466-192.168.1.111-1438767154643:blk_1073773956_33167 ( blockLen 0 ), for OP_READ_BLOCK, self=/192.168.250.202:55915, remote=/192.168.250.12:50010, for file /accumulo/tables/1/default_tablet/F0000gne.rf, for pool BP-1881591466-192.168.1.111-1438767154643 block 1073773956_33167
at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)
at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:456)
at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:424)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:818)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:697)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:697)
at java.io.DataInputStream.readShort(DataInputStream.java:312)
at org.apache.accumulo.core.file.rfile.bcfile.Utils$Version.<init>(Utils.java:264)
at org.apache.accumulo.core.file.rfile.bcfile.BCFile$Reader.<init>(BCFile.java:823)
at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.init(CachableBlockFile.java:246)
at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBCFile(CachableBlockFile.java:257)
at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.access$100(CachableBlockFile.java:137)
at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader$MetaBlockLoader.get(CachableBlockFile.java:209)
at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getBlock(CachableBlockFile.java:313)
at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:368)
at org.apache.accumulo.core.file.blockfile.impl.CachableBlockFile$Reader.getMetaBlock(CachableBlockFile.java:137)
at org.apache.accumulo.core.file.rfile.RFile$Reader.<init>(RFile.java:843)
at org.apache.accumulo.core.file.rfile.RFileOperations.openReader(RFileOperations.java:79)
at org.apache.accumulo.core.file.DispatchingFileFactory.openReader(DispatchingFileFactory.java:69)
at org.apache.accumulo.tserver.tablet.Compactor.openMapDataFiles(Compactor.java:279)
at org.apache.accumulo.tserver.tablet.Compactor.compactLocalityGroup(Compactor.java:322)
at org.apache.accumulo.tserver.tablet.Compactor.call(Compactor.java:214)
at org.apache.accumulo.tserver.tablet.Tablet._majorCompact(Tablet.java:1976)
at org.apache.accumulo.tserver.tablet.Tablet.majorCompact(Tablet.java:2093)
at org.apache.accumulo.tserver.tablet.CompactionRunner.run(CompactionRunner.java:44)
at org.apache.htrace.wrappers.TraceRunnable.run(TraceRunnable.java:57)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at org.apache.accumulo.fate.util.LoggingRunnable.run(LoggingRunnable.java:35)
at java.lang.Thread.run(Thread.java:745)
I think it is more of an HDFS-related issue as opposed to an Accumulo one, so I checked the logs of the datanode and found the same message:
Offset 16320 and length 20 don't match block BP-1881591466-192.168.1.111-1438767154643:blk_1073773956_33167 ( blockLen 0 ), for OP_READ_BLOCK, self=/192.168.250.202:55915, remote=/192.168.250.12:50010, for file /accumulo/tables/1/default_tablet/F0000gne.rf, for pool BP-1881591466-192.168.1.111-1438767154643 block 1073773956_33167
This one, however, was logged at the INFO level. What I don't understand is why I am getting this error at all.
I can see that the pool name of the file (BP-1881591466-192.168.1.111-1438767154643) that I am trying to access contains an IP address (192.168.1.111) which does not match the IP address of any of the servers (self or remote). Actually, 192.168.1.111 was the old IP address of the Hadoop master server, but I changed it. I use domain names instead of IP addresses, so the only place where I made the change was in the hosts files of the machines in the cluster. None of the Hadoop/Accumulo configurations use IP addresses. Does anyone know what the issue is here? I have spent days on it and still am not able to figure it out.
The error you are receiving indicates that Accumulo cannot read part of one of its files from HDFS. The NameNode is reporting that a block is located on a particular DataNode (in your case, 192.168.250.12). However, when Accumulo attempts to read from that DataNode, it fails.
This likely indicates a corrupt block in HDFS, or a temporary network issue. You can try to run hadoop fsck / (the exact command may vary, depending on version) to perform a health check of HDFS.
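For example (exact commands and flags vary a bit by Hadoop version; the /accumulo path below is taken from the log above):
# overall HDFS health check
hadoop fsck /
# narrow the check to the Accumulo table files and print block locations
hdfs fsck /accumulo/tables/1 -files -blocks -locations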
Also, the IP address mismatch in the DataNode appears to indicate that the DataNode is confused about the HDFS pool it is a part of. You should restart that DataNode after double-checking its configuration, DNS, and /etc/hosts for any anomalies.
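A rough sketch of that restart (daemon script names and locations differ between Hadoop versions and packaging):
# on the affected DataNode (192.168.250.12), after fixing config, DNS and /etc/hosts
$HADOOP_HOME/sbin/hadoop-daemon.sh stop datanode
$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode
# from any node, confirm the DataNode re-registered with the NameNode
hdfs dfsadmin -report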
I have a script in Pentaho that gets values such as the process id and a few variables from the successful execution of the previous job and writes a file to another location, with the file name based on the process id and the variable. While the shell script is being executed, it throws the runtime error below, but only occasionally. Please help.
ERROR 21-07 03:27:26,604 - Shell_Create_Trigger_File - (stderr) java.io.IOException: Stream closed
at java.io.BufferedInputStream.getBufIfOpen(BufferedInputStream.java:162)
at java.io.BufferedInputStream.read(BufferedInputStream.java:325)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:283)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:325)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:177)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:154)
at java.io.BufferedReader.readLine(BufferedReader.java:317)
at java.io.BufferedReader.readLine(BufferedReader.java:382)
at org.pentaho.di.core.util.StreamLogger.run(StreamLogger.java:57)
at java.lang.Thread.run(Thread.java:745)
"(stderr) java.io.IOException: Stream closed" usually happens if there is an abrupt closing of the connection or data connection is getting lost. You can check this question.
There could also be other causes, such as a slow network or the PDI process getting killed in the middle, so the answer to your question can be broad. Since you said the error happens only a few times, I suggest you look into the server you are running the code on.
Hope it helps :)