How to only read one line (its head) from a file? - spring

I want to get the head of a file. How can I read only one line (its head) from a file? (spring-batch)

I suppose you use FlatFileItemReader.
In that case you can call setLinesToSkip(1) to skip the header and use setSkippedLinesCallback to get its value. If you want to ignore all the other lines, just always return null from the LineMapper.
But if you really need only that one line, implement your own Tasklet or ItemReader for it.
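For illustration, a minimal sketch of that setup (the resource path, class name and the headerHolder variable are just placeholders, assuming Spring Batch's FlatFileItemReader API):
import java.util.concurrent.atomic.AtomicReference;

import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
import org.springframework.core.io.FileSystemResource;

public class HeaderAwareReaderFactory {

    // Builds a reader that skips the header line but stores its value in headerHolder.
    public static FlatFileItemReader<String> build(AtomicReference<String> headerHolder) {
        FlatFileItemReader<String> reader = new FlatFileItemReader<>();
        reader.setResource(new FileSystemResource("input.txt")); // hypothetical input file
        reader.setLinesToSkip(1);                                 // skip the header line...
        reader.setSkippedLinesCallback(headerHolder::set);        // ...but capture its value
        reader.setLineMapper(new PassThroughLineMapper());        // map remaining lines as-is
        return reader;
    }
}
After the step that uses this reader has run, headerHolder.get() holds the file's first line.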

Related

Processing a huge text file in laravel

I got a huge file, 500 MB or even greater, and I need to parse the text and insert the data from the file into a DB. I know how to accomplish it using plain PHP (reading line by line, or in small chunks). The question is: is it possible to accomplish this using the Laravel filesystem abstraction (Laravel uses the "thephpleague flysystem" library)? Thanks!
It's in phpleague's documentation; you can use the "chunk" method:
https://csv.thephpleague.com/9.0/connections/output/#outputting-the-document-into-chunks
Instead of outputting, you could insert the data into the database.
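For illustration, a rough sketch with league/csv 9.x and Laravel's DB facade (the file path and table name are invented; note that chunk() yields raw byte strings, so for row-by-row inserts the reader's getRecords() iterator is usually the better fit):
use League\Csv\Reader;
use Illuminate\Support\Facades\DB;

// Stream the CSV instead of loading the whole 500 MB into memory.
$csv = Reader::createFromPath(storage_path('app/huge.csv'), 'r');
$csv->setHeaderOffset(0); // treat the first row as the header

foreach ($csv->getRecords() as $record) {
    // $record is an associative array keyed by the header row.
    DB::table('imported_rows')->insert($record);
}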
To read files line by line in Laravel you can use File::lines('file_test.txt'), which uses a LazyCollection, or use LazyCollection directly.
Example:
use Illuminate\Support\Facades\File;

foreach (File::lines('file_test.txt') as $line) {
    // modifications to the current line
}
References:
https://laravel.com/api/9.x/Illuminate/Support/Facades/File.html#method_lines
https://laravel.com/docs/9.x/collections#lazy-collections

NiFi: making one flowfile from split log flowfiles

I want to make log files for NiFi processors. I get them from TailFile and split the text line by line, then check whether each line is an ERROR, INFO, or WARN log and route it to ExecuteScript processors. But at this point I have 5 flowfiles, and I want to unify these split flowfiles and write them into one flowfile. I tried to use MergeContent, but I think it doesn't fit my task.
I also want to know whether NiFi custom logging returns log files for all
the processors I have added in my workflow, and whether it is necessary to
add appenders inside logback.xml.
I want to know if it is possible to unify the split log data?
(P.S. I tried RouteOnAttribute as well, but it doesn't work for me.)
My workflow looks like this:
After splitting into lines you can use RouteOnContent to check whether a line matches a regexp.
Then, if you want to join the lines back together, you can use the following script.
It's just an example:
// get a list of flowfiles from the incoming queue (at most 1000)
def ffList = session.get(1000)
if (!ffList) return

ffList.each { ff ->
    session.read(ff, { rawIn ->
        // you could write to a new output flowfile here,
        // but in this example the content is simply appended to a plain file on disk
        new File('./logs/warn.log') << rawIn << '\n'
    } as InputStreamCallback)
    session.remove(ff)
}

How to extract lines from text file "block" in loop

I have a huge text file and I want to use grep to check whether some "blocks" in my text file exist in another file. So, I need to extract these blocks first.
This is my file:
>gi|60117238|gb|AY897435.1| Wolbachia endosymbiont of Drosophila mojavensis, genomic survey sequence
TCTGTTGCGAGTGTGCTGATAACTACTGAATCTATGATAGTTGATGTACCAAGCAAAGAAAATGCTTCATCTCCTATGGG
TGCAGGAGAAATGAGTGGCATGGGTGGATTCTAAGTAGAATGAAACCGTGGAGCAATTGCTCCACGGTAGTTCCAAAAAA
TCTCACATTTTACTATTCGTTAAAGGTAATACGTTTGGTGCAGAAATGCACTACTGTTTGCATCCGTTTCGCTCCTTTAT
ATTGTGGTTGTCTAATAACAAAAAGGCAGCATAAGAAAACTATAACACCTAGTATATTTATACTATAGCTGACCCAAGCA
ACACGTCATACCGCGATTCATTCCACAACTGTACGAACATTACAATATGGCACATAGTAAACGATGTCATGAAAGTAGCT
GACACTGGAATTCAGAAAAAAGGATTATGTCATTCCAGTGCTTGACACTGGAATCCAGCATTTCCATAATCATCAAAACA
TTGTATTTTAACAAAAAACATGTATTTTTATGCTTGCCAACTTAATAAAATTCCTGGATCCCAGTGTCAAGCACTGGGAT
GACAC
>gi|60117239|gb|AY897436.1| Wolbachia endosymbiont of Drosophila mojavensis, genomic survey sequence
TTTTCATCGCTCATGTCCTTAGTTTACCCCCTGTTTCACCATTACATTAATATCTACAGAACCTCCCACTGGGGAGTAGT
AATCTAGGATAGTTTCTATCACTAAAACGCGTGGTATTCCTTTATTTTTTACCAATTTTAAATAAGACAATACCTTATTA
TCATCATAATGCTGCAGAAAGCGGCAAAAGACACCTAATTCATAATTTGTAGCTGATAATTCTTCTTGAGTTATGAGTTT
AATTTTTAAATCTTCTACTGCCTGCCTAGGCACTTTATGTTCGTTGTAATAATATAAGCCTATAGAACCTTTATTGTGTA
TATCAGAATAAGCAAGAAATAAAGAGTGTACGCCAAATAGCAATATATTTTTAGCACCATCTATATTAACCCTAGAATTA
AACTCTTTAGTGTCAAACCTGGAATATCCTAGCAATGCTTGGTAAAACGCTATTTTCCTGTCTTCTGATGTTTCTTTCTC
CTTAAAAAGAATCAAATGAAAATATTGACTCCTGCCTTAAAATATCCGGCATTTTTAACCAATTCTTTTCAGCGGCAACC
CTTGCCCACATTGCTGCTGCTTTAGGAAAAATGGTATTTCTTTAAACACTTACCTTTTGATGAAAGTTGCCCAAAATCCT
TTGTTCTATCCGAATCCAAAACCCCTATTTCCCAAACGCCCCTTAAAACCTTTTTTAAAATTGGAACAAAAAATATTTAA
TTTTTAAAAAAAAACG
>gi|60117240|gb|AY897437.1| Wolbachia endosymbiont of Drosophila mojavensis, genomic survey sequence
TTGNCCATCAATTGGCCACCAGAAAAGTTGCGTCCGTTTACTTCTACACCATGTATAAATGCACCTAAAATCATGCCTTG
GCAAAATGCAGCACCAAGTGACCCAAAATGAAAGGCATAATCCCATAATCGCCTGTATTTTCCTTCTGCCTTAAAACGAA
ACTCAAAGGATACTCCGCGCACTATAAGGCCAAGCAGCATAATAATGATTGGAATATAAAAAGCAGGCATTAATATTGAA
TATGCAAGAGGAAAAGCAGCAAACAACCCTCCACCACCTAGTACCAACCATGTTTCGTTTCCATCCCAAAATGGTGCAAT
TGAGCTTATCATGTGATCACGGCATTTATCTGACGGTGCAAAAGGAAGTAAAATACCAATACCTAAATCAAACCCATCCA
TTAAAATATACAGTAAAACAGCTATGGCAATTAGTAATCCCCAGATTAGGGGTAAATTAATTAAGGAAGAAAAATCAAAC
ATGATTGTTGTCCTTTCCAGATGTACCAGCATCAATCACTGAAGCTCCAATACCGTGTTTATAAAATTGCTCTTCTTCTT
TAATGACAGGAATTCCTTTGTATATAAGTTTCAGAATATAGTATCTACCTGCTCCAAATATAAGGGTATACATAAACGAT
AAATGCAATCAAAGACCATGCAACCTGAGGACCGGTAATCGCAGATGAAAATGATTCAATTGTGCCGTTAATTCCATATA
CAGTGTAAAGTTGACGGCCAATTTCATAGTAAACCAAACTGCAAGTAACGCTATGGACCCCGACGGCATCTTTGAAATCC
ACAATCCTTTGAAAACACAACTTTGGAATAATTTGCCCCGAAAAATACTGAAAAAAAATTTACTGGACCCATTTTGGATT
ATTAAAATTTCAACTCCAACCATTTATACGGG
A block starts at a > and runs up to the letter before the next >.
So, 1st block is:
TCTGTTGCGAGTGTGCTGATAACTACTGAATCTATGATAGTTGATGTACCAAGCAAAGAAAATGCTTCATCTCCTATGGG
TGCAGGAGAAATGAGTGGCATGGGTGGATTCTAAGTAGAATGAAACCGTGGAGCAATTGCTCCACGGTAGTTCCAAAAAA
TCTCACATTTTACTATTCGTTAAAGGTAATACGTTTGGTGCAGAAATGCACTACTGTTTGCATCCGTTTCGCTCCTTTAT
ATTGTGGTTGTCTAATAACAAAAAGGCAGCATAAGAAAACTATAACACCTAGTATATTTATACTATAGCTGACCCAAGCA
ACACGTCATACCGCGATTCATTCCACAACTGTACGAACATTACAATATGGCACATAGTAAACGATGTCATGAAAGTAGCT
GACACTGGAATTCAGAAAAAAGGATTATGTCATTCCAGTGCTTGACACTGGAATCCAGCATTTCCATAATCATCAAAACA
TTGTATTTTAACAAAAAACATGTATTTTTATGCTTGCCAACTTAATAAAATTCCTGGATCCCAGTGTCAAGCACTGGGAT
GACAC
2nd block is:
TTTTCATCGCTCATGTCCTTAGTTTACCCCCTGTTTCACCATTACATTAATATCTACAGAACCTCCCACTGGGGAGTAGT
AATCTAGGATAGTTTCTATCACTAAAACGCGTGGTATTCCTTTATTTTTTACCAATTTTAAATAAGACAATACCTTATTA
TCATCATAATGCTGCAGAAAGCGGCAAAAGACACCTAATTCATAATTTGTAGCTGATAATTCTTCTTGAGTTATGAGTTT
AATTTTTAAATCTTCTACTGCCTGCCTAGGCACTTTATGTTCGTTGTAATAATATAAGCCTATAGAACCTTTATTGTGTA
TATCAGAATAAGCAAGAAATAAAGAGTGTACGCCAAATAGCAATATATTTTTAGCACCATCTATATTAACCCTAGAATTA
AACTCTTTAGTGTCAAACCTGGAATATCCTAGCAATGCTTGGTAAAACGCTATTTTCCTGTCTTCTGATGTTTCTTTCTC
CTTAAAAAGAATCAAATGAAAATATTGACTCCTGCCTTAAAATATCCGGCATTTTTAACCAATTCTTTTCAGCGGCAACC
CTTGCCCACATTGCTGCTGCTTTAGGAAAAATGGTATTTCTTTAAACACTTACCTTTTGATGAAAGTTGCCCAAAATCCT
TTGTTCTATCCGAATCCAAAACCCCTATTTCCCAAACGCCCCTTAAAACCTTTTTTAAAATTGGAACAAAAAATATTTAA
TTTTTAAAAAAAAACG
Third block:
TTGNCCATCAATTGGCCACCAGAAAAGTTGCGTCCGTTTACTTCTACACCATGTATAAATGCACCTAAAATCATGCCTTG
GCAAAATGCAGCACCAAGTGACCCAAAATGAAAGGCATAATCCCATAATCGCCTGTATTTTCCTTCTGCCTTAAAACGAA
ACTCAAAGGATACTCCGCGCACTATAAGGCCAAGCAGCATAATAATGATTGGAATATAAAAAGCAGGCATTAATATTGAA
TATGCAAGAGGAAAAGCAGCAAACAACCCTCCACCACCTAGTACCAACCATGTTTCGTTTCCATCCCAAAATGGTGCAAT
TGAGCTTATCATGTGATCACGGCATTTATCTGACGGTGCAAAAGGAAGTAAAATACCAATACCTAAATCAAACCCATCCA
TTAAAATATACAGTAAAACAGCTATGGCAATTAGTAATCCCCAGATTAGGGGTAAATTAATTAAGGAAGAAAAATCAAAC
ATGATTGTTGTCCTTTCCAGATGTACCAGCATCAATCACTGAAGCTCCAATACCGTGTTTATAAAATTGCTCTTCTTCTT
TAATGACAGGAATTCCTTTGTATATAAGTTTCAGAATATAGTATCTACCTGCTCCAAATATAAGGGTATACATAAACGAT
AAATGCAATCAAAGACCATGCAACCTGAGGACCGGTAATCGCAGATGAAAATGATTCAATTGTGCCGTTAATTCCATATA
CAGTGTAAAGTTGACGGCCAATTTCATAGTAAACCAAACTGCAAGTAACGCTATGGACCCCGACGGCATCTTTGAAATCC
ACAATCCTTTGAAAACACAACTTTGGAATAATTTGCCCCGAAAAATACTGAAAAAAAATTTACTGGACCCATTTTGGATT
ATTAAAATTTCAACTCCAACCATTTATACGGG
How can I loop over my file and extract one block in each iteration, so that I can grep it against the other file?
Edit 1:
For more clarification:
I want to do some operation on each block. First, I perform a diff between two files and put the result in a new file. For the new file, which contains the blocks, I want to check whether each block is included in the first file or in the second file. If it is included in the first file, I want to extract it to another new file. If it is included in the second file, I want to skip it and go to the next block.
Hope you get my point.
Thanks,
Do you want to create a separate file for each block and then do some operation on those files? Or do you just want to do some operation (say, search/grep) for each block in each loop iteration? Please clarify your requirement.
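In case it helps to make the per-block loop concrete, here is one rough bash sketch (assuming GNU awk, and the hypothetical file names blocks.fa, other.txt and matched_blocks.txt; it compares blocks by their > header line rather than by the full sequence):
# Split the input into one file per ">" block.
awk -v RS='>' 'NR > 1 { print ">" $0 > ("block_" (NR - 1) ".fa") }' blocks.fa

# For each block, keep it only if its header line also appears in other.txt.
for block in block_*.fa; do
    header=$(head -n 1 "$block")
    if grep -qF "$header" other.txt; then
        cat "$block" >> matched_blocks.txt
    fi
done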

Deleting contents of file after a specific line in ruby

Probably a simple question, but I need to delete the contents of a file after a specific line number. So I want to keep the first e.g. 5 lines and delete the rest of the contents of the file. I have been searching for a while and can't find a way to do this; I am an iOS developer, so Ruby is not a language I am very familiar with.
That is called truncate. The truncate method needs the byte position after which everything gets cut off - and the File.pos method delivers just that:
File.open("test.csv", "r+") do |f|
f.each_line.take(5)
f.truncate( f.pos )
end
The "r+" mode from File.open is read and write, without truncating existing files to zero size, like "w+" would.
The block form of File.open ensures that the file is closed when the block ends.
I'm not aware of any methods to delete from a file so my first thought was to read the file and then write back to it. Something like this:
path = '/path/to/thefile'
start_line = 0
end_line = 4
File.write(path, File.readlines(path)[start_line..end_line].join)
File.readlines reads the file and returns an array of strings, where each element is one line of the file. You can then use the subscript operator with a range to select the lines you want.
This isn't going to be very memory efficient for large files, so you may want to optimise if that's something you'll be doing.

Get line count before looping over data in ruby

I need to get the total number of lines that an IO object contains before looping through each line in the IO object. How can I do this in ruby?
You can't really, unless you want to shell out to wc and parse the result of that; otherwise you'll need to do two passes: one to get the line count, and another to do your actual work.
(Assuming we're talking about a File IO instance; neither of those approaches works for network sockets, etc.)
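For illustration, a rough sketch of the wc route (the path is made up, and this only works for file-backed IO, not sockets):
require 'shellwords'

path = 'file_test.txt'
# First pass delegated to wc: get the total line count up front.
line_count = `wc -l < #{Shellwords.escape(path)}`.to_i

# Second pass: the actual work, with line_count already known.
File.foreach(path).with_index do |line, index|
  # ...
end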
In Rails (the only difference is how I generate the file object instance):
file = File.open(File.join(Rails.root, 'lib', 'assets', 'file.json'))
linecount = file.readlines.size
io.lines.count would give you the number of lines.
io.lines.each_with_index {|line, index|} would give you each line and which line number it is (starting at 0).
But I don't know if it's possible to count the number of lines without reading a file.
You may want to read a file, and then use io.rewind to read it again.
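A minimal sketch of that count-then-rewind idea (the file name is invented):
File.open('file_test.txt') do |io|
  line_count = io.each_line.count   # first pass consumes the IO...
  io.rewind                         # ...so jump back to the start
  io.each_line.with_index do |line, index|
    # second pass: process each line, knowing line_count up front
  end
end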
If your file is not humongous, slurp it into memory (an array) and count the number of items (i.e. lines).

Resources