How to split large files in Apache NiFi

I have a requirement to split millions of records (CSV format) into single rows in Apache NiFi. Currently I am using multiple SplitText processors to achieve this. Is there another way to do this instead of chaining multiple SplitText processors?

You can use the SplitRecord processor.
You need to create a Record Reader and a Record Writer controller service first.
Then you can set a value for Records Per Split to split every n records.
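A minimal SplitRecord configuration for the CSV-to-single-rows case might look like this (SplitRecord, CSVReader, and CSVRecordSetWriter are standard NiFi components; the exact value of 1 is just the setting that produces one row per flowfile):

```
SplitRecord
  Record Reader        CSVReader           (controller service)
  Record Writer        CSVRecordSetWriter  (controller service)
  Records Per Split    1
```

With Records Per Split = 1, each outgoing flowfile contains exactly one record, replacing the chain of SplitText processors with a single processor.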

Related

Split an XML file using the SplitRecord processor in NiFi

Hi all, I am new to NiFi. I want to split a large XML file into multiple chunks using the SplitRecord processor. I am unable to split the records; I get my original file as the output, not multiple chunks. Can anyone help me with this?
To use SplitRecord, you're going to need to create an Avro schema that defines your record. If you have that, you should be able to use the XMLReader to turn it into a record set.
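For example, if each record in the XML looks like `<person><id>1</id><name>Ann</name></person>` (a hypothetical record shape, not from the original question), a matching Avro schema could be:

```json
{
  "type": "record",
  "name": "person",
  "fields": [
    { "name": "id",   "type": "int" },
    { "name": "name", "type": "string" }
  ]
}
```

Set this as the Schema Text of the XMLReader (with Schema Access Strategy set to Use 'Schema Text' Property), and SplitRecord can then treat the file as a record set and split it by Records Per Split.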

Apache NiFi - Split a large JSON file into multiple files with a specified number of records

I am a newbie to NiFi and would like some guidance, please.
We want to split a large JSON file into multiple files with a specified number of records. I am able to split a file into individual records using SplitJson with the JsonPath expression set to $..*. I have also added an UpdateAttribute processor with filename set to $(unknown)_${fragment.index} so that we keep the sequence of the files, as order is important.
However, we might want, say, 100,000 records split into 100 files of 1,000 records each. What is the easiest way to do this?
Thanks very much in advance
There is a SplitRecord processor. You can define the number of records per output file, for example:
Record Reader CSVReader
Record Writer CSVRecordSetWriter
Records Per Split 3
I tested with these records:
id
1
...
8
and they were split into 3 files with ids (1,2,3), (4,5,6), and (7,8).

How to split a large file into smaller files for parallel processing in Spring Batch?

We have a large file which can be split logically (not by range, but by the occurrence of the next header record).
For example
HeaderRecord1
...large number of detail records
HeaderRecord2
...large number of detail records
and so on...
We want to split the file into multiple small files at the HeaderRecord level and process them in parallel.
How can this be achieved in Spring Batch? When I googled, I came across SystemCommandTasklet and using the Linux/Unix split command to split the file.
Is that the best approach? Are there any partition options within Spring Batch?
Thanks and Regards
You need to create a custom Partitioner that calculates the index range of each logical partition (begin/end index). Then use a custom item reader (which could extend FlatFileItemReader) that reads only the lines of the given partition (and ignores other lines).
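The boundary calculation such a Partitioner would perform can be sketched in plain Java, without the Spring Batch wiring (the `HeaderRecord` prefix check is an assumption based on the sample file above; `HeaderPartitioner` and `partitionIndexes` are hypothetical names for illustration):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the partition-boundary calculation a custom Spring Batch
// Partitioner would perform: scan the file once, note the line index of
// every header record, and derive an inclusive [begin, end] line range
// per logical partition.
public class HeaderPartitioner {
    // Returns {begin, end} inclusive line-index pairs, one per header block.
    // Detecting headers via startsWith("HeaderRecord") is an assumption
    // matching the sample layout in the question.
    static List<int[]> partitionIndexes(List<String> lines) {
        List<int[]> parts = new ArrayList<>();
        int begin = -1;
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).startsWith("HeaderRecord")) {
                if (begin >= 0) parts.add(new int[]{begin, i - 1});
                begin = i;
            }
        }
        if (begin >= 0) parts.add(new int[]{begin, lines.size() - 1});
        return parts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "HeaderRecord1", "detail1", "detail2",
            "HeaderRecord2", "detail3");
        for (int[] p : partitionIndexes(lines)) {
            System.out.println(p[0] + "-" + p[1]);
        }
    }
}
```

In Spring Batch you would return each begin/end pair in an ExecutionContext from `Partitioner.partition(gridSize)`, and a step-scoped reader would skip to `begin` and stop reading after `end`, so each partition is processed in parallel without physically splitting the file.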

How to split an XML file using Apache NiFi?

I have a 20 GB XML file on my local system. I want to split the data into multiple chunks, and I also want to remove specific attributes from the file. How can I achieve this using NiFi?
Use the SplitRecord processor and define XML Reader/Writer controller services to read the XML data and write only the required attributes into your resulting XML.
Also set the Records Per Split property to the number of records you need in each split.

Running record count from the SplitRecord processor in NiFi

Is there a way to get the fragment index from the SplitRecord processor in NiFi? I am splitting a very big XLS file (4 million records) with "Records Per Split" = 100000.
Now I want to process just the first 2 splits, to check the quality of the file, and reject the rest of the file.
I can see that the fragment index exists for other split processors (e.g. SplitJson), but not for SplitRecord. Any other hack?
Method 1:
Use the ControlRate processor, configured to release 2 flowfiles per minute.
Then set the upstream connection's FlowFile Expiration to something like 10 sec (or a lower value if you need), so that the remaining flowfiles expire in the queue while the first 2 flowfiles are released.
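A ControlRate configuration matching this method might be (ControlRate and its Rate Control Criteria, Maximum Rate, and Time Duration properties are standard NiFi; the specific values are an example, not from the original answer):

```
ControlRate
  Rate Control Criteria    flowfile count
  Maximum Rate             2
  Time Duration            1 min
```

Combined with a short FlowFile Expiration (e.g. 10 sec) on the incoming connection, only the first 2 splits survive long enough in the queue to be released downstream.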
Method 2:
Use the SplitText processor, then a RouteOnAttribute processor with a new property set to
${fragment.index:le(2)}
This expression language allows only the first 2 fragment indexes through.
