Add attribute in NiFi from fixed-length text without any delimiter - apache-nifi

I'm trying to read a file with fixed-length records of 30 characters, like the one below:
093562010705000031505000002542
I need to convert each record to the format below, splitting it into 6 columns:
0,935,62,010705,0000315050,00002542
How can I do that with NiFi processors?
Thanks in advance for your help.
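One possible approach (a sketch, not the only option): since every field has a fixed width, a single ReplaceText processor with the Regex Replace strategy can insert the commas. The capture-group widths below are read off the sample row, so treat them as an assumption about your layout:

    Search Value:          ^(.{1})(.{3})(.{2})(.{6})(.{10})(.{8})$
    Replacement Value:     $1,$2,$3,$4,$5,$6
    Replacement Strategy:  Regex Replace
    Evaluation Mode:       Line-by-Line

If you need the fields as flow file attributes rather than as content, the same regex works in ExtractText: a dynamic property named, say, field produces attributes field.1 through field.6.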

Related

How to split large files in Apache NiFi

I have a requirement to split millions of records (CSV format) into single rows in Apache NiFi. Currently I am using multiple SplitText processors to achieve this. Is there any other way to do this instead of multiple SplitText processors?
You can use the SplitRecord processor.
You need to create a Record Reader and a Record Writer controller service first.
Then you can set a value for Records Per Split to split every n records.
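For example (a minimal sketch; the reader and writer here are assumptions about your data being CSV):

    Record Reader:      CSVReader (Treat First Line as Header = true)
    Record Writer:      CSVRecordSetWriter
    Records Per Split:  1

Records Per Split = 1 yields one row per flow file; a larger value produces chunks of n records, which is usually gentler on the flow file repository than millions of single-row flow files.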

Split an XML file using the SplitRecord processor in NiFi

Hi all, I am new to NiFi. I want to split a large XML file into multiple chunks using the SplitRecord processor. I am unable to split the records: I get my original file as the output, not multiple chunks. Can anyone help me with this?
To use SplitRecord, you're going to need to create an Avro schema that defines your record. If you have that, you should be able to use the XMLReader to turn it into a record set.
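As an illustration (the record and field names here are hypothetical; substitute the elements of your own XML), a record like <book><title>...</title><price>...</price></book> could be described by an Avro schema such as:

    {
      "type": "record",
      "name": "book",
      "fields": [
        { "name": "title", "type": "string" },
        { "name": "price", "type": "double" }
      ]
    }

Point the XMLReader at this schema (for example via the Schema Text property) and SplitRecord can then break the resulting record set into chunks.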

NiFi: how to get maximum timestamp from first column?

NiFi version 1.5
I have a CSV file that arrives the first time like:
datetime,a.DLG,b.DLG,c.DLG
2019/02/04 00:00,86667,98.5,0
2019/02/04 01:00,86567,96.5,0
I used ListFile -> FetchFile to get the CSV file.
Ten minutes later, I get the appended CSV file:
datetime,a.DLG,b.DLG,c.DLG
2019/02/04 00:00,86667,98.5,0
2019/02/04 01:00,86567,96.5,0
2019/02/04 02:00,86787,99.5,0
2019/02/04 03:00,86117,91.5,0
Here, how do I get only the new records (the last two rows)? I do not want to process the first two records that have already been processed.
My thought process is: get the maximum datetime, store it in an attribute, and use it in QueryRecord. But I do not know which processor to use to get the maximum datetime.
Is there a better solution?
This is currently an open issue (NIFI-6047), but there has been a community contribution to address it, so you may see a DetectDuplicateRecord processor in an upcoming release of NiFi.
In the meantime, a workaround may be to split up the CSV rows, create a compound key using ExtractText, and then use DetectDuplicate.
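A sketch of that workaround, assuming the datetime column alone identifies a row (otherwise capture more columns into the key): after the file has been split into one row per flow file, an ExtractText dynamic property such as

    key:  ^([^,]+),

yields an attribute key.1 holding the datetime, and DetectDuplicate can then use ${key.1} as its Cache Entry Identifier (backed by a DistributedMapCacheClientService) to route already-seen rows to the duplicate relationship.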
This doesn't seem like a problem best solved in NiFi, as you need to keep state about what you have already processed. An alternative would be to delete what you have already processed; then you can assume that whatever is in the file has not been processed yet.
how do I get only the new records (the last two rows)? I do not want to process the first two records that have already been processed.
From my understanding, the actual question is 'how do I process/ingest CSV rows as they are written to the file?'.
The description of the TailFile processor from the NiFi documentation:
"Tails" a file, or a list of files, ingesting data from the file as it
is written to the file. The file is expected to be textual. Data is
ingested only when a new line is encountered (carriage return or
new-line character or combination)
This solution is appropriate when you don't want to move or delete the actual file.
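A minimal TailFile configuration for this case (the path is a placeholder):

    Tailing mode:     Single file
    File(s) to Tail:  /data/input/metrics.csv

TailFile keeps state about how far into the file it has already read, so each run emits only the newly appended lines; the two rows that were ingested earlier are not re-read.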

Combining different data flows and creating a .txt file by sorting output

I have a requirement: I am trying to combine several data flows with Talend in order to create a .txt file. In my case the input flows are DB tables. I am able to create the output file "prova.txt", but in this file some fields of the 2nd and 3rd tables are missing and I don't know why. I checked with tLogRow and the problem seems to be in tHashInput_1. In the 3 tHashOutput components, rows are logged correctly with all fields.
Below, my job:
Components tHashOutput_2, tHashOutput_3, tHashInput_1 are linked to tHashOutput_1.
Am I doing something wrong? Could anyone help me?
Thank you in advance!
Assuming the schema is the same for all tHashOutput components, I attached an image for your problem.
Here, give all tFileOutputDelimited components the same file name and the same schema, and use the append option. It will then append the data from all 3 tables to the same file.
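For instance (a sketch; the path is illustrative), each of the three tFileOutputDelimited components would be configured with:

    File Name:  "/tmp/prova.txt"
    Append:     checked
    (identical schema in all three)

so each subjob appends its rows to the same output file instead of overwriting it.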
An alternative is to use tUnite, again assuming the schema is the same for all tHashOutput components.
Example: using tUnite
Regards!

Read CSV file data and store it in a database using the Spring Framework

I need help: I want code that reads the data in a CSV file and stores that data in a database. I have tried reading a CSV file with a known number of rows and columns. But the challenge here is that I want to create a utility where I don't know the number of columns and rows in the CSV file, so how would I do it? Please help.
Have you explored Spring Batch? You can write your own implementation of LineTokenizer for columns that change dynamically.
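A minimal sketch of that idea, assuming a comma-delimited file whose first line is a header (the class and names here are illustrative, not a fixed recipe): read the header in a skipped-lines callback and hand its values to a DelimitedLineTokenizer, so the column set is discovered at runtime:

    import org.springframework.batch.item.file.FlatFileItemReader;
    import org.springframework.batch.item.file.mapping.DefaultLineMapper;
    import org.springframework.batch.item.file.mapping.PassThroughFieldSetMapper;
    import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
    import org.springframework.batch.item.file.transform.FieldSet;
    import org.springframework.core.io.FileSystemResource;

    public class DynamicCsvReaderFactory {

        public static FlatFileItemReader<FieldSet> create(String path) {
            // Tokenizer whose column names are unknown until the header is read.
            DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer();

            DefaultLineMapper<FieldSet> lineMapper = new DefaultLineMapper<>();
            lineMapper.setLineTokenizer(tokenizer);
            // Pass the raw FieldSet through; a writer can map it to SQL later.
            lineMapper.setFieldSetMapper(new PassThroughFieldSetMapper());

            FlatFileItemReader<FieldSet> reader = new FlatFileItemReader<>();
            reader.setResource(new FileSystemResource(path));
            reader.setLineMapper(lineMapper);
            // Skip the header line, but use it to name the columns dynamically.
            reader.setLinesToSkip(1);
            reader.setSkippedLinesCallback(header -> tokenizer.setNames(header.split(",")));
            return reader;
        }
    }

Each read() then returns a FieldSet whose names match the file's actual header, so a downstream ItemWriter (for example a JdbcBatchItemWriter with dynamically built SQL) can insert whatever columns arrive.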
