Initialize an object from multiple input file data - spring

I have 2 fixed-length flat files F1 and F2 containing data (id, A, B) and (id, C, D) respectively.
I am trying to instantiate an object Foo f = new Foo (id, A, B, C, D).
How can I achieve this with Spring Batch? I don't have access to any DB, so I can't insert Foo(id, A, B) into a staging table and then update the missing values.
Thank you so much :)

Thank you Mahmoud Ben Hassine, using an awk script solved my issue. Here's my script:
awk 'NR==FNR {dossier[$3] = $0; next}
{
printf("<client>\n");
printf "<rev>%s</rev>\n<pdd>%s</pdd>\n",$0,dossier[$9];
print("</client>")
}' pdddos.txt revass.txt > output.txt
sed -i '1s/^/<clients>\n/' output.txt
sed -i -e '$a</clients>' output.txt
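For anyone landing here with the same problem, the NR==FNR idiom used in that script can be sketched on toy data. This is only an illustration under assumptions: the file names F1.txt and F2.txt, the whitespace-separated layout and the sample values are made up, and real fixed-length records would probably need substr() instead of $1, $2, $3.

```shell
#!/bin/sh
# Hypothetical sample data: whitespace-separated stand-ins for the fixed-length files.
dir=$(mktemp -d)
printf '%s\n' '1 a1 b1' '2 a2 b2' > "$dir/F1.txt"
printf '%s\n' '1 c1 d1' '2 c2 d2' > "$dir/F2.txt"

# First file (NR==FNR): remember A and B, keyed by id.
# Second file: for each id already seen, print id, A, B, C, D on one line.
merged=$(awk '
NR == FNR { ab[$1] = $2 " " $3; next }
$1 in ab  { print $1, ab[$1], $2, $3 }
' "$dir/F1.txt" "$dir/F2.txt")

printf '%s\n' "$merged"
rm -rf "$dir"
```

The merged output holds one record per id with all five values, so a single flat-file reader could then map each line to a Foo.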

Related

Sorting the contents within a column using Shell Script Line by Line in a File

I am sorting a file by a column using the command:
cat myFile | sort -u -k3
Now I want to sort the data within a column of the file. Can anyone tell me how I can achieve this?
My data looks like this, in a file named Student.csv:
Name,Age,Marks,Grades
Sam,21,"34,56,21,67","C,B,D,A"
Josh,25,"90,89,78,45","A,A,B,C"
Output-
Name,Age,Marks,Grades
Sam,21,"21,34,56,67","A,B,C,D"
Josh,25,"45,78,89,90","A,A,B,C"
I would appreciate the help, thanks.
You should export your CSV with a field separator that does not exist within the texts. Otherwise it becomes hugely cumbersome to deal with this.
Afterwards you can easily sort by specifying the separator and the field.
Example if you would use | as separator:
Name|Age|Marks|Grades
Sam|21|"34,56,21,67"|"C,B,D,A"
Josh|25|"90,89,78,45"|"A,A,B,C"
Then execute:
cat myFile | sort -u -k3 -t\|
or:
sort -u -k3 -t\| <myFile
Afterwards you can turn the | separators into semicolons:
sort -u -k3 -t\| <myFile | sed 's/|/;/g'
Did it, but I'm too tired to explain how; brain's hitting a brick wall. There's a lot to unpack there, and it'll take half-a-day to explain. I'll write all the steps in a couple hours after I get a nap in, otherwise there's gonna be 50 typos in that description.
cat Student.csv | head -n1 && cat Student.csv | tail -n+2 | awk -F \" '{split($2,a,",");asort(a);b="";for(i in a)b=b a[i] ",";split($4,c,",");asort(c);d="";for(i in c)d=d c[i] ",";printf "%s\"%s\",\"%s\"\n",$1,substr(b,1,length(b)-1),substr(d,1,length(d)-1)}'
Alternatively:
cat Student.csv | tee >(head -n1) >(tail -n+2 | awk -F \" '{split($2,a,",");asort(a);b="";for(i in a)b=b a[i] ",";split($4,c,",");asort(c);d="";for(i in c)d=d c[i] ",";printf "%s\"%s\",\"%s\"\n",$1,substr(b,1,length(b)-1),substr(d,1,length(d)-1)}') >/dev/null ; sleep 0.1
Output:
Name,Age,Marks,Grades
Sam,21,"21,34,56,67","A,B,C,D"
Josh,25,"45,78,89,90","A,A,B,C"
https://www.tutorialspoint.com/awk/index.htm
Edit -- 'kay, the explanation:
cat concatenates (glues) files together, but when you just give it one arg, then that's what it prints out.
You can do the next part in one or two steps, I'll explain the first method. | pipe directs the output to another command. We all know this, or we wouldn't be here right now... however someday, someone will come across this post, and wonder what it does.
head prints out the first few lines of what you give it. Here, I specified -n1 number of lines = one, so it would print out the header:
Name,Age,Marks,Grades
&& continues to the next command, so long as that initial instruction was a success.
cat Student.csv again, but this time piped into tail, which prints the last few lines, of whatever you give it. -n+2 specifies to spit out everything from line number 2, and beyond.
We then pipe those contents into AWK https://en.wikipedia.org/wiki/AWK ...I'm sure you could do it with sed https://en.wikipedia.org/wiki/Sed, and I started with that, but sed is a simpler tool than awk, so you'd need far more chained commands to achieve the same thing. Lisp might be able to do it more concisely, but it sounded like you were asking for standard shell tools. Python's also decent with strings, but again, sh.
-F \" sets a literal " as the field separator, so each data line splits into these pieces:
Sam,21, " 34,56,21,67 " , "C,B,D,A"
$1 = Sam,21,
$2 = 34,56,21,67
$3 = ,
$4 = C,B,D,A
That's 4 fields, but only 3 get used: I'm throwing out the comma in the third position. It's easy enough to put it back in.
We now need to sort those numbers, so split($2,a,",") fills an array, in this case named a, with the contents of $2 split on the , symbol (the return value is the number of pieces).
a = [ 34, 56, 21, 67 ]
; separates AWK commands, you can mostly ignore those. If there were simply a space, awk would try to concatenate items together, and we don't want that yet.
Next, asort( a ) sorts the contents of a in place. Note that asort is a GNU awk (gawk) extension -- https://www.tutorialspoint.com/awk/awk_string_functions.htm
a = [ 21, 34, 56, 67 ]
Here would be a perfect time for Python's string .join() method https://www.w3schools.com/python/ref_string_join.asp
However, we don't have that available to us, and AWK doesn't seem to have it, as far as I know, so we have to roll our own here. So construct a string, b, to which each item in a gets appended. The whole awk program is already wrapped in single quotes on the command line, and awk string literals take double quotes anyway, so you'll see double-quotes.
b=""
for( i in a ) b=b a[i] ","
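b begins empty. The for-loop walks over a's contents and appends each item followed by a comma. Leave the trailing comma for now, it'll get trimmed off in a bit. (Strictly speaking, awk doesn't promise the order in which for-in visits the elements; a counted loop from 1 up to the number of elements is the safer form.)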
21,34,56,67,
Exact same procedure for $4, but we name the array c this time, and the string in which those contents are concatenated with commas, d -- split( $4, c, "," ) ; asort( c ) ; d="" ; for( i in c ) d=d c[i] "," You can name them anything you like; I just happened to have ABCD staring me in the face from those grade listings, so that's what I went with.
OK, now we have everything we need.
$1 = Sam,21,
b = 21,34,56,67,
d = A,B,C,D,
Let's format a string so they're all together.
printf "%s\"%s\",\"%s\"\n"
This will print $1 in the first %s string position, then a literal double-quote,
b into the second %s string position, next ",",
followed by d in the third %s position,
all wrapped up with a final double-quote and a newline.
However, b and d both have trailing commas, so we trim those off with AWK's substr() function -- https://www.tutorialspoint.com/awk/awk_string_functions.htm Knowing where to begin is easy enough, but we need to chop each string one character from the end.
substr( b, 1, length(b) -1 )
substr( d, 1, length(d) -1 )
It'd be nice if you could just specify -2, and have it count backwards, like you can in Lua, Python, et al... but that doesn't seem to do in AWK, so whatevs. Ya live, ya learn. And there you have it, all your ducks in a row.
Sam,21,"21,34,56,67","A,B,C,D"
This does the job, maybe not elegantly, but it's within the required guidelines. I'm sure there are possibilities for code-golfing in there somewhere, but it's solid logic you can follow.
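If gawk isn't available (asort() is a GNU extension), roughly the same result can be had with a small hand-written sort and a counted loop. A sketch, not a drop-in replacement: the function name sort_list and the scratch directory are my own, and it assumes exactly the quoting layout of Student.csv shown above.

```shell
#!/bin/sh
# Recreate the sample file from the question in a scratch directory.
dir=$(mktemp -d)
cat > "$dir/Student.csv" <<'EOF'
Name,Age,Marks,Grades
Sam,21,"34,56,21,67","C,B,D,A"
Josh,25,"90,89,78,45","A,A,B,C"
EOF

sorted=$(awk -F'"' '
# Sort a comma-separated list with an insertion sort and re-join it with commas.
# Pieces from split() that look like numbers compare numerically, others as strings.
function sort_list(s,    arr, n, i, j, tmp, out) {
    n = split(s, arr, ",")
    for (i = 2; i <= n; i++)
        for (j = i; j > 1 && arr[j-1] > arr[j]; j--) {
            tmp = arr[j]; arr[j] = arr[j-1]; arr[j-1] = tmp
        }
    out = arr[1]
    for (i = 2; i <= n; i++) out = out "," arr[i]
    return out
}
NR == 1 { print; next }   # the header passes through unchanged
{ printf "%s\"%s\",\"%s\"\n", $1, sort_list($2), sort_list($4) }
' "$dir/Student.csv")

printf '%s\n' "$sorted"
rm -rf "$dir"
```

Joining inside the function also means there is no trailing comma to trim, so the substr() step goes away.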

bash: separate blocks of lines between pattern x and y

I have a similar question to this one Sed/Awk - pull lines between pattern x and y, however, in my case I want to output each block-of-lines to individual files (named after the first pattern).
Input example:
-- filename: query1.sql
-- sql comments goes here or else where
select * from table1
where id=123;
-- eof
-- filename: query2.sql
insert into table1
(id, date) values (1, sysdate);
-- eof
I want the bash script to generate 2 files: query1.sql and query2.sql with the following content:
query1.sql:
-- sql comments goes here or else where
select * from table1
where id=123;
query2.sql:
insert into table1
(id, date) values (1, sysdate);
Thank you
awk '/-- filename/{if(f)close(f); f=$3;next} !/eof/&&/./{print $0 >> f}' input
Brief explanation,
/-- filename/{if(f)close(f); f=$3;next}: find the record that contains "-- filename", close the previously opened file (if any), and assign the new file name ($3) to f.
!/eof/&&/./{print $0 >> f}: if a following line neither contains 'eof' nor is empty, append it to the corresponding file.
This might work for you (GNU sed):
sed -r '/-- filename: (\S+)/!d;s##/&/,/-- eof/{//d;w \1#p;s/.*/}/p;d' file |
sed -nf - file
Create a sed script from the input file and run it against the input file
N.B. Two lines are needed for each query as the program for the query must be surrounded by braces and the w command must end in a newline.
Using GNU awk to handle multiple open files for you:
awk '/^-- eof/{f=0} f{print > out} /^-- filename/{out=$3; f=1}' file
or with any awk:
awk '/^-- eof/{f=0} f{print > out} /^-- filename/{close(out); out=$3; f=1}' file
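To see the approach end to end, here is a sketch that runs on the question's sample inside a scratch directory. It is a variation on the commands above rather than any of them verbatim; the variable name out, the input file name input.txt and the temp directory are only for illustration.

```shell
#!/bin/sh
dir=$(mktemp -d)
cat > "$dir/input.txt" <<'EOF'
-- filename: query1.sql
-- sql comments goes here or else where
select * from table1
where id=123;
-- eof
-- filename: query2.sql
insert into table1
(id, date) values (1, sysdate);
-- eof
EOF

# Run awk inside the scratch directory so the generated files land there.
(
    cd "$dir" || exit 1
    awk '
    /^-- eof/      { if (out != "") close(out); out = ""; next }  # block ends
    /^-- filename/ { out = $3; next }                             # block starts
    out != ""      { print > out }                                # inside a block
    ' input.txt
)

q1=$(cat "$dir/query1.sql")
q2=$(cat "$dir/query2.sql")
printf '== query1.sql ==\n%s\n== query2.sql ==\n%s\n' "$q1" "$q2"
rm -rf "$dir"
```

Closing each file at its "-- eof" line keeps only one output file open at a time, which matters on awks with a low open-file limit.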

Grabbing values from one file (via awk) and using them in another (via sed)

I am using gawk to grab some, but not all, values from a file. I have another file, a template, in which I want to replace certain pieces to generate a file specific to the values I grabbed. I would like to use sed to substitute the fields of interest in the template.
the dog NAME , likes to ACTION in water when he's bored
Another file, f1, would have the name of the dog and the action:
Maxs,swim
StoneCold,digs
Thor,leaps
So I can grab these values and store them in an associative array... what I can't do, or see, is how to get them into my sed script.
so a simple sed script could be like this
s/NAME/ value from f1
s/ACTION/ value from f1
so my output for the template would be
the dog Maxs , likes to swim in water when he's bored
So if I ran a bash file, the command would look something like this, or what I have attempted
gawk -f f1 animalNameAction | sed -f (is there a way to put something here) template | cat
gawk -f f1 animalNameAction > PulledValues| sed -f PulledValues template | cat
but none of this has worked. So I am left wondering how this could be done.
You can do this using awk itself.
I assume the template may span multiple lines,
so in the FNR==NR{} block I save the entire contents of the template file in the variable t,
and in the other block I replace NAME and ACTION with the first and second fields of the comma-separated file.
Here is an example:
$ cat template
the dog NAME , likes to ACTION in water when he's bored
$ cat file
Maxs,swim
StoneCold,digs
Thor,leaps
$ awk 'FNR==NR{ t = (t ? t RS :"") $0; next}{ s=t; gsub(/NAME/,$1,s); gsub(/ACTION/,$2,s); print s}' template FS=',' file
the dog Maxs , likes to swim in water when he's bored
the dog StoneCold , likes to digs in water when he's bored
the dog Thor , likes to leaps in water when he's bored
More readable:
awk 'FNR==NR{
t = (t ? t RS :"") $0;
next
}
{
s=t;
gsub(/NAME/,$1,s);
gsub(/ACTION/,$2,s);
print s
}
' template FS=',' file
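Since the question asked about sed specifically: a plain shell loop can hand each record to sed, with no generated script needed. A sketch only, assuming the values never contain /, & or a backslash (sed would treat those specially in the replacement); the scratch directory is just for the demo.

```shell
#!/bin/sh
dir=$(mktemp -d)
printf '%s\n' "the dog NAME , likes to ACTION in water when he's bored" > "$dir/template"
printf '%s\n' 'Maxs,swim' 'StoneCold,digs' 'Thor,leaps' > "$dir/f1"

# Read each "name,action" record and run sed over the template once per record.
result=$(while IFS=, read -r name action; do
    sed -e "s/NAME/$name/g" -e "s/ACTION/$action/g" "$dir/template"
done < "$dir/f1")

printf '%s\n' "$result"
rm -rf "$dir"
```

This starts one sed process per record, so for large inputs the single awk pass above is likely the better fit.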

read column from csv file in terminal ignoring the header

I'm writing a simple .ksh file to read a single column from a .csv file and then print the output to the screen:
fname=($(cut -d, -f2 "myfile.csv"))
# loop through these names
for i in ${fname[@]};
do echo "$i"
done
This works fine, but I don't want to return the header row, that is, the first row of the file. How would I alter the cut command so that it ignores the first value or string? In this case the header is called 'NAME'. I want to print all of the other rows of this file.
That being said, is it easier to loop through from 2:fname as the code is currently written or is it best to alter the cut command?
You could do
fname=($(sed 1d myfile.csv | cut -d, -f2))
Alternatively, since the index of the first element of the array is 0, start the loop at index 1:
for i in "${fname[@]:1}"; do
Demo:
$ a=(a b c d e f)
$ echo "${a[@]:1}"
b c d e f
Note, you should always put the array expansion in double quotes.
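If the array isn't needed for anything else, the header can also be dropped in the pipeline and the names read line by line. A minimal sketch, assuming a made-up three-column myfile.csv; unlike the array version, it keeps names that contain spaces intact.

```shell
#!/bin/sh
dir=$(mktemp -d)
# Hypothetical sample file; the second column is the one of interest.
printf '%s\n' 'ID,NAME,CITY' '1,Ann,Oslo' '2,Bob,Rome' > "$dir/myfile.csv"

# tail -n +2 starts output at line 2, which skips the header row.
names=$(tail -n +2 "$dir/myfile.csv" | cut -d, -f2 | while IFS= read -r name; do
    echo "$name"
done)

printf '%s\n' "$names"
rm -rf "$dir"
```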

comparing csv files

I want to write a shell script to compare two .csv files. The first one contains filename,path and the second contains filename,path,target. I want to compare the two files and output the target name wherever a file from the first .csv exists in the second .csv.
Ex.
a.csv
build.xml,/home/build/NUOP/project1
eesX.java,/home/build/adm/acl
b.csv
build.xml,/home/build/NUOP/project1,M1
eesX.java,/home/build/adm/acl,M2
ddexse3.htm,/home/class/adm/33eFg
I want the output to be something like this.
M1 and M2
Please help
Thanks,
If you don't necessarily need a shell script, you can easily do it in Python like this:
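import csv

# Collect every (filename, path) pair listed in a.csv.
seen = set()
with open('a.csv', newline='') as f:
    for row in csv.reader(f):
        seen.add(tuple(row))

# Print the target (third column) of each b.csv row whose pair appears in a.csv.
with open('b.csv', newline='') as f:
    for row in csv.reader(f):
        if len(row) > 2 and tuple(row[:2]) in seen:
            print(row[2])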
If those M1 and M2 values are always in fields 3 and 5, you can try this:
awk -F"," 'FNR==NR{
split($3,b," ")
split($5,c," ")
a[$1]=b[1]" "c[1]
next
}
($1 in a){
print "found: " $1" "a[$1]
}' file2.txt file1.txt
output
# cat file2.txt
build.xml,/home/build/NUOP/project1,M1 eesX.java,/home/build/adm/acl,M2 ddexse3.htm,/home/class/adm/33eFg
filename, blah,M1 blah, blah, M2 blah , end
$ cat file1.txt
build.xml,/home/build/NUOP/project1 eesX.java,/home/build/adm/acl
$ ./shell.sh
found: build.xml M1 M2
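For the one-record-per-line layout shown in the question (a.csv with filename,path and b.csv with filename,path,target), the same two-pass idea can be sketched as follows. It assumes no field contains a comma or quotes; the array name seen and the scratch directory are just for illustration.

```shell
#!/bin/sh
dir=$(mktemp -d)
cat > "$dir/a.csv" <<'EOF'
build.xml,/home/build/NUOP/project1
eesX.java,/home/build/adm/acl
EOF
cat > "$dir/b.csv" <<'EOF'
build.xml,/home/build/NUOP/project1,M1
eesX.java,/home/build/adm/acl,M2
ddexse3.htm,/home/class/adm/33eFg
EOF

# First file: remember each filename,path pair.
# Second file: print the target of every row whose pair was seen.
targets=$(awk -F, '
NR == FNR { seen[$1 FS $2]; next }
{ key = $1 FS $2; if ((key in seen) && NF >= 3) print $3 }
' "$dir/a.csv" "$dir/b.csv")

printf '%s\n' "$targets"
rm -rf "$dir"
```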
try http://sourceforge.net/projects/csvdiff/
Quote:
csvdiff is a Perl script to diff/compare two csv files with the possibility to select the separator. Differences will be shown like: "Column XYZ in record 999" is different. After this, the actual and the expected result for this column will be shown.
