Logging Status of Files Read by Pentaho Text File Input Step - pentaho-data-integration

I'm trying to use a Text File Input step with wildcards (regular expressions) to read a bunch of files (say, 100) from a directory. This is working fine. But what I'd like to know is whether there is a way to log which files were processed successfully and which failed in the step.
Say, for example, in the directory /home/usr/ I have 10 .txt files: A.txt, B.txt, C.txt, D.txt, E.txt, F.txt, G.txt, H.txt, I.txt, J.txt.
What I would like to know is whether there is a way I can log something like,
file A.txt processed successfully
file B.txt processed successfully
file C.txt processed successfully
file D.txt failed //(due to some reason which doesn't matter)
file E.txt processed successfully
file F.txt processed successfully
file G.txt processed successfully
file H.txt failed
file I.txt processed successfully
file J.txt processed successfully
Simply put, I would just like to log the status of the individual files that the Text File Input step is reading.
Is this possible?
I already tried running the transformation with the Detailed, Debug and Row Level log levels, but to no avail.
Would appreciate some help, thanks!

If you are running everything (reading all files) in the same KTR and any of them causes an error, your KTR will stop there.
What you need is to run this KTR once for each file found in the directory. For that you need a Job, with a KTR that lists those files and passes each filename to your main KTR through variables. You can then use an error flow in the Job so it keeps running for the next files.
Your Job should look something like this: a KTR that gets the file names, a loop that executes your per-file KTR for each of them, and an error hop from that entry so the Job carries on with the next file when one fails.
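If you want to see the same per-file idea outside of Spoon, here is a minimal command-line sketch using Pan (PDI's transformation runner). It assumes your per-file KTR accepts a parameter named FILENAME; the transformation name and paths are hypothetical:

#!/bin/bash
# Run the per-file transformation once per .txt file and log the outcome.
# process_one_file.ktr and the FILENAME parameter are assumptions for this sketch.
for f in /home/usr/*.txt; do
  if ./pan.sh -file=/path/to/process_one_file.ktr -param:FILENAME="$f" > /dev/null 2>&1; then
    echo "file $(basename "$f") processed successfully" >> file_status.log
  else
    echo "file $(basename "$f") failed" >> file_status.log
  fi
done

Pan returns a non-zero exit code when the transformation fails, which is what drives the "processed successfully"/"failed" lines in the log.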

Related

Viewing output contained in slurm.out file

I have a C++ program that when run on my local machine runs some simulations and saves the results in a .csv file.
I am now running the same program on a cluster. Jobs are scheduled with SLURM, queued, and then run to completion. Rather than a .csv file output, the output is a slurmid.out file. How can I access this file to see the results of my simulation?
I typically use the cat command to view slurm output files:
cat slurmid.out
You could also use vim, or any other text editor/viewer. The script should still produce the .csv file as well; if it isn't, because the job is failing, the .out file will tell you about it.
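For context, here is a minimal sketch of a SLURM batch script (the program name and its --out flag are hypothetical) showing where the .out file comes from and that the .csv is still written by the program itself:

#!/bin/bash
#SBATCH --job-name=sim
#SBATCH --output=sim_%j.out    # stdout and stderr land here (%j = job id)

# The simulation writes results.csv itself; anything it prints to stdout
# ends up in sim_<jobid>.out instead of the terminal.
./my_simulation --out results.csv

Then cat sim_<jobid>.out (or less, vim, etc.) shows exactly what the program printed, including any error messages if the .csv was never produced.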

Modify the source code of a hadoop command to add text during command execution

I'd like to see the source code for certain hadoop commands like -put and -ls. I want to be able to add additional information to the log output associated with running these commands. For example, I want to show the message "Hii user, your file is copying from local file system to hdfs" during the execution of the -get or copyFromLocal command.
I want to change the core files, not API files like CopyCommands.java (http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/shell/CopyCommands.java?view=markup).
This type of message should be printed on execution of the command.
Can anyone tell me which file I should change?
How can I find these files?

Bash script behaving differently for different files

I have a bash script that uses awk to process some files that I have downloaded. If I run the script on any of the downloaded files it does not work properly. However, if I transfer the contents of a file into a newly created one, it seems to work as expected. Could it have anything to do with the settings of the files?
I have two files, hotel_12313.dat and hotel_99999.dat. The first one is downloaded and the second one is created by me. If I copy the data from the first file into the second one and execute the script on both of them, the output is different.
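One quick way to check whether the two files really differ only in invisible "settings" such as line endings (the filenames are the ones from the question; cat -A assumes GNU coreutils):

# Compare the downloaded and hand-created files byte by byte.
cmp hotel_12313.dat hotel_99999.dat

# Show non-printing characters; Windows line endings appear as ^M before each $.
cat -A hotel_12313.dat | head
cat -A hotel_99999.dat | head

# Report the detected encoding/line-ending style of each file.
file hotel_12313.dat hotel_99999.dat

If cmp reports a difference even though the visible text looks identical, the downloaded file most likely carries different line endings or a byte-order mark, which is a common reason an awk script behaves differently on it.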

concatenate fastq files in a directory

I have a file uploader, resumable.js, which takes a file, breaks it into 1MB 'chunks', and then sends over the file 1MB at a time. So after an upload I have a directory with thousands, sometimes millions, of individual fastq files. I can concatenate all of these 'chunks' back into the file's original state with this line of code:
cat file_name.* > merged.fastq
How would I go about concatenating the files back into their original state without manually running this script in the command line? Should I set up some bash script to handle this issue, maybe a cron job? Any ideas to solve this issue are greatly appreciated.
ANSWER: For what it's worth, I used this npm module and it works great.
https://www.npmjs.com/package/joiner
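For anyone who wants the plain bash route mentioned in the question instead, here is a minimal sketch (the chunk naming and paths are assumptions, with chunks named like sample.fastq.1, sample.fastq.2, ...):

#!/bin/bash
# Merge the numbered chunks of one upload back into a single fastq file.
base="sample.fastq"

# sort -V gives natural numeric order, so chunk .2 comes before chunk .10;
# a bare glob would sort lexicographically and scramble the order.
ls "$base".* | sort -V | xargs cat > "merged_$base"

This could then be triggered from a cron job, or kicked off by the upload handler once the last chunk arrives.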

MeshLab: processing multiple files in meshlabserver

I'm new to using meshlabserver and meshlab in general.
I created the .mlx file and tried to run a command in meshlabserver for one file, and it worked. I would like to know how to write a command for hundreds of files.
Thanks in advance.
I just created a batch file with the necessary loops that calls the .mlx file and runs the meshlabserver command. However, one should know that the resulting files will be saved in the same directory where meshlabserver.exe is.
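As a minimal sketch of that loop in shell form (the original answer used a Windows batch file; the mesh directory and filter.mlx name are placeholders):

#!/bin/bash
# Apply the same MeshLab filter script to every .ply file in a directory.
for f in /path/to/meshes/*.ply; do
  meshlabserver -i "$f" -o "processed_$(basename "$f")" -s filter.mlx
done

The -i/-o/-s flags are the same ones used for a single file; the loop just repeats the call per mesh.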
