Take the first column from 1.csv and find it in 2.csv - bash

I want to read two CSV files (A.csv and B.csv) and write a new CSV file, new.csv, with a status for each row. I want to do that with a shell script.
A.csv:
Inputfile_name,Date
abc.csv,2018/11/26 16.38.54
bbc.csv,2018/11/26 15.28.11
B.csv:
Outputfile_name,Date
abc_SUCCESS.csv,2018/11/26 17.20.11
bbc_FAIL.csv,2018/11/26 16.28.11
new.csv:
Inputfile_name,Date,Outputfile_name,Date,Status
abc.csv,2018/11/26 16.38.54,abc_SUCCESS.csv,2018/11/26 17.20.11,SUCCESS
bbc.csv,2018/11/26 15.28.11,bbc_FAIL.csv,2018/11/26 16.28.11,FAIL

Like so?
$ paste -d, A.csv B.csv | sed -e 's/\(SUCCESS\|FAIL\).*/&,\1/'
Inputfile_name,Date,Outputfile_name,Date
abc.csv,2018/11/26 16.38.54,abc_SUCCESS.csv,2018/11/26 17.20.11,SUCCESS
bbc.csv,2018/11/26 15.28.11,bbc_FAIL.csv,2018/11/26 16.28.11,FAIL
paste joins the contents of the two files line by line. sed then does a search-and-replace that appends the SUCCESS or FAIL found in each line to the end of that line. The header line contains neither word, so it is left without a Status column.
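If the Status header is needed as well, one possible variation is to let awk handle the header separately. This is only a sketch: it recreates the sample files in a scratch directory, assumes both files list their rows in the same order, and assumes the status is always the part of the output file name between the last underscore and ".csv".

```shell
# Scratch directory with the sample data from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'Inputfile_name,Date' 'abc.csv,2018/11/26 16.38.54' 'bbc.csv,2018/11/26 15.28.11' > A.csv
printf '%s\n' 'Outputfile_name,Date' 'abc_SUCCESS.csv,2018/11/26 17.20.11' 'bbc_FAIL.csv,2018/11/26 16.28.11' > B.csv

# Join the files line by line, then append a fifth column:
# the header gets the literal "Status", data rows get the suffix of field 3.
paste -d, A.csv B.csv | awk -F, -v OFS=, '
    NR == 1 { print $0, "Status"; next }
    {
        status = $3
        sub(/\.csv$/, "", status)   # abc_SUCCESS.csv -> abc_SUCCESS
        sub(/^.*_/, "", status)     # abc_SUCCESS     -> SUCCESS
        print $0, status
    }' > new.csv

cat new.csv
```

Note that paste pairs rows purely by position; if the two files could be ordered differently, the rows would have to be matched on the file name instead.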

Related

Unmask data from matrix linux shell

I have 2 files.
analizeddata.txt:
A001->A002->A003->A004
A001->A005->A007
A022->A033
[...]
and
matrix.txt:
A001|Scott
A002|Bob
A003|Mark
A004|Jane
A005|Elion
A007|Brooke
A022|Meggie
A023|Tif
[..]
How can I replace the IDs in analizeddata.txt (or produce a new file) with the names from the second column of matrix.txt?
The expected output file would be:
Scott->Bob->Mark->Jane
Scott->Elion->Brooke
Meggie->Tif
[...]
Thanks
Just use sed to do the replacements.
sed 's/|/\//g' matrix.txt turns each line into a pattern such as A001/Scott, which is then used as the regexp/replacement part of a second sed s/regexp/replacement/ command.
The sed -i option edits analizeddata.txt in place, so back the file up before running this command.
for replace_mode in $(sed 's/|/\//g' matrix.txt); do sed -i "s/$replace_mode/g" analizeddata.txt; done
Suggesting an awk script:
awk -F"|" 'FNR==NR{arr[$1]=$2;next}{for(i in arr)gsub(i,arr[i])}1' matrix.txt analizeddata.txt
With the provided sample data, the result is:
Scott->Bob->Mark->Jane
Scott->Elion->Brooke
Meggie->A033
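Both approaches above treat each ID as a pattern to substitute anywhere in the line. A possible alternative, shown here only as a sketch, is to split every line on "->" and look each ID up exactly, so that an ID that happens to be a substring of another one is never touched. The output file name unmasked.txt is made up for the example, and IDs missing from matrix.txt are assumed to pass through unchanged.

```shell
# Scratch directory with the sample data from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'A001|Scott' 'A002|Bob' 'A003|Mark' 'A004|Jane' \
              'A005|Elion' 'A007|Brooke' 'A022|Meggie' 'A023|Tif' > matrix.txt
printf '%s\n' 'A001->A002->A003->A004' 'A001->A005->A007' 'A022->A033' > analizeddata.txt

awk '
    # First file: build the ID -> name lookup table.
    FNR == NR { split($0, kv, "|"); name[kv[1]] = kv[2]; next }
    # Second file: translate each ID on the line individually.
    {
        n = split($0, id, "->")
        line = ""
        for (i = 1; i <= n; i++) {
            token = (id[i] in name) ? name[id[i]] : id[i]   # unknown IDs stay as they are
            line = line (i > 1 ? "->" : "") token
        }
        print line
    }' matrix.txt analizeddata.txt > unmasked.txt

cat unmasked.txt
```

Writing to a separate file leaves analizeddata.txt untouched, so no backup is needed.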

replace string with exact match in bash script

I have a file with many repeated lines like the ones below. These are the only unique values.
CHECKSUM="Y"
CHECKSUM="N"
CHECKSUM="U"
CHECKSUM="
I want to replace the empty field with "Null" and need the output to be:
CHECKSUM="Y"
CHECKSUM="N"
CHECKSUM="U"
CHECKSUM="Null"
What I can think of is:
# First find the matching content
cat file.txt | egrep 'CHECKSUM="Y"|CHECKSUM="N"|CHECKSUM="U"' > file_contain.txt
# Find the content where given string are not there
cat file.txt | egrep -v 'CHECKSUM="Y"|CHECKSUM="N"|CHECKSUM="U"' > file_donot_contain.txt
# Replace the string in content not found file
sed -i 's/CHECKSUM="/CHECKSUM="Null"/g' file_donot_contain.txt
# Merge the files
cat file_contain.txt file_donot_contain.txt > output.txt
But I find this is not an efficient way of doing it. Any other suggestions?
To achieve this, anchor the pattern to the end of the line with $ so that it matches only when nothing follows the opening quote (and optionally use ^ to anchor the start of the line too):
sed -i 's/^CHECKSUM="$/CHECKSUM="Null"/' file.txt
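To try the anchored pattern without modifying anything in place, a small sketch like the following can be used. It recreates the four sample lines in a scratch directory and writes the result to output.txt instead of using -i.

```shell
# Scratch directory with the four sample lines from the question.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'CHECKSUM="Y"' 'CHECKSUM="N"' 'CHECKSUM="U"' 'CHECKSUM="' > file.txt

# Only a line that is exactly CHECKSUM=" matches, because of the ^ and $ anchors.
sed 's/^CHECKSUM="$/CHECKSUM="Null"/' file.txt > output.txt

cat output.txt
```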

Remove duplicates from the same line in a file

How do I remove the duplicates below from each line in a file? I need the duplicates removed, including the semicolon.
For example, from the file contents below I need only "dg01.server.wmq.host=jms1001-01-ri5.ri5.dc2.responsys.com", and similarly for the other lines of the file.
dg01.server.wmq.host=jms1001-01-ri5.ri5.dc2.responsys.com;jms1001-02-ri5.ri5.dc2.responsys.com dg02.server.wmq.host=jms1002-01-ri5.ri5.dc2.responsys.com;jms1002-02-ri5.ri5.dc2.responsys.com dg03.server.wmq.host=jms1003-01-ri5.ri5.dc2.responsys.com;jms1003-02-ri5.ri5.dc2.responsys.com dg04.server.wmq.host=jms1004-01-ri5.ri5.dc2.responsys.com;jms1004-02-ri5.ri5.dc2.responsys.com dg05.server.wmq.host=jms1005-01-ri5.ri5.dc2.responsys.com;jms1005-02-ri5.ri5.dc2.responsys.com dg06.server.wmq.host=jms1006-01-ri5.ri5.dc2.responsys.com;jms1006-02-ri5.ri5.dc2.responsys.com dg07.server.wmq.host=jms1007-01-ri5.ri5.dc2.responsys.com;jms1007-02-ri5.ri5.dc2.responsys.com dg08.server.wmq.host=jms1008-01-ri5.ri5.dc2.responsys.com;jms1008-02-ri5.ri5.dc2.responsys.com dg09.server.wmq.host=jms1009-01-ri5.ri5.dc2.responsys.com;jms1009-02-ri5.ri5.dc2.responsys.com dg10.server.wmq.host=jms1010-01-ri5.ri5.dc2.responsys.com;jms1010-02-ri5.ri5.dc2.responsys.com dg11.server.wmq.host=jms1011-01-ri5.ri5.dc2.responsys.com;jms1011-02-ri5.ri5.dc2.responsys.com dg12.server.wmq.host=jms1012-01-ri5.ri5.dc2.responsys.com;jms1012-02-ri5.ri5.dc2.responsys.com dg13.server.wmq.host=jms1013-01-ri5.ri5.dc2.responsys.com;jms1013-02-ri5.ri5.dc2.responsys.com dg14.server.wmq.host=jms1014-01-ri5.ri5.dc2.responsys.com;jms1014-02-ri5.ri5.dc2.responsys.com dg15.server.wmq.host=jms1015-01-ri5.ri5.dc2.responsys.com;jms1015-02-ri5.ri5.dc2.responsys.com dg16.server.wmq.host=jms1001-01-ri5.ri5.dc2.responsys.com;jms1001-02-ri5.ri5.dc2.responsys.com dg17.server.wmq.host=jms1002-01-ri5.ri5.dc2.responsys.com;jms1002-02-ri5.ri5.dc2.responsys.com dg18.server.wmq.host=jms1003-01-ri5.ri5.dc2.responsys.com;jms1003-02-ri5.ri5.dc2.responsys.com dg19.server.wmq.host=jms1004-01-ri5.ri5.dc2.responsys.com;jms1004-02-ri5.ri5.dc2.responsys.com dg20.server.wmq.host=jms1005-01-ri5.ri5.dc2.responsys.com;jms1005-02-ri5.ri5.dc2.responsys.com dg21.server.wmq.host=jms1006-01-ri5.ri5.dc2.responsys.com;jms1006-02-ri5.ri5.dc2.responsys.com 
dg22.server.wmq.host=jms1007-01-ri5.ri5.dc2.responsys.com;jms1007-02-ri5.ri5.dc2.responsys.com dg23.server.wmq.host=jms1008-01-ri5.ri5.dc2.responsys.com;jms1008-02-ri5.ri5.dc2.responsys.com dg24.server.wmq.host=jms1009-01-ri5.ri5.dc2.responsys.com;jms1009-02-ri5.ri5.dc2.responsys.com dg25.server.wmq.host=jms1010-01-ri5.ri5.dc2.responsys.com;jms1010-02-ri5.ri5.dc2.responsys.com dg26.server.wmq.host=jms1011-01-ri5.ri5.dc2.responsys.com;jms1011-02-ri5.ri5.dc2.responsys.com dg27.server.wmq.host=jms1012-01-ri5.ri5.dc2.responsys.com;jms1012-02-ri5.ri5.dc2.responsys.com dg28.server.wmq.host=jms1013-01-ri5.ri5.dc2.responsys.com;jms1013-02-ri5.ri5.dc2.responsys.com dg29.server.wmq.host=jms1014-01-ri5.ri5.dc2.responsys.com;jms1014-02-ri5.ri5.dc2.responsys.com dg30.server.wmq.host=jms1015-01-ri5.ri5.dc2.responsys.com;jms1015-02-ri5.ri5.dc2.responsys.com dg31.server.wmq.host=jms1001-01-ri5.ri5.dc2.responsys.com;jms1001-02-ri5.ri5.dc2.responsys.com dg32.server.wmq.host=jms1002-01-ri5.ri5.dc2.responsys.com;jms1002-02-ri5.ri5.dc2.responsys.com dg33.server.wmq.host=jms1003-01-ri5.ri5.dc2.responsys.com;jms1003-02-ri5.ri5.dc2.responsys.com dg34.server.wmq.host=jms1004-01-ri5.ri5.dc2.responsys.com;jms1004-02-ri5.ri5.dc2.responsys.com dg35.server.wmq.host=jms1009-01-ri5.ri5.dc2.responsys.com;jms1009-02-ri5.ri5.dc2.responsys.com dg36.server.wmq.host=jms1010-01-ri5.ri5.dc2.responsys.com;jms1010-02-ri5.ri5.dc2.responsys.com dg37.server.wmq.host=jms1011-01-ri5.ri5.dc2.responsys.com;jms1011-02-ri5.ri5.dc2.responsys.com dg38.server.wmq.host=jms1012-01-ri5.ri5.dc2.responsys.com;jms1012-02-ri5.ri5.dc2.responsys.com dg39.server.wmq.host=jms1007-01-ri5.ri5.dc2.responsys.com;jms1007-02-ri5.ri5.dc2.responsys.com dg40.server.wmq.host=jms1008-01-ri5.ri5.dc2.responsys.com;jms1008-02-ri5.ri5.dc2.responsys.com
Assuming dg01.server.wmq.host=jms1001-01-ri5.ri5.dc2.responsys.com;jms1001-02-ri5.ri5.dc2.responsys.com is a line in your input file and you're only interested in the dg01.server.wmq.host=jms1001-01-ri5.ri5.dc2.responsys.com part (up to, but not including, the semicolon), you can obtain the desired output by running:
awk -F ';' '{print $1}' inputfile
Another way to obtain the same output, as pointed out by @Shawn, would be:
cut -d ';' -f1 inputfile
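Both commands assume one entry per line. If several entries really do sit on the same line separated by spaces, as the pasted sample suggests, a possible sketch is to put each entry on its own line first and then cut at the semicolon. The two-entry input and the output name hosts.txt below are made up for the illustration.

```shell
# Scratch directory with a shortened, two-entry version of the sample line.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'dg01.server.wmq.host=jms1001-01-ri5.ri5.dc2.responsys.com;jms1001-02-ri5.ri5.dc2.responsys.com dg02.server.wmq.host=jms1002-01-ri5.ri5.dc2.responsys.com;jms1002-02-ri5.ri5.dc2.responsys.com ' > inputfile

# tr turns every space into a newline (-s squeezes runs of newlines, so a
# trailing space does not produce an empty line); cut keeps the part before ';'.
tr -s ' ' '\n' < inputfile | cut -d ';' -f1 > hosts.txt

cat hosts.txt
```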

How to split a CSV file into multiple files based on column value

I have a CSV file which could look like this:
name1;1;11880
name2;1;260.483
name3;1;3355.82
name4;1;4179.48
name1;2;10740.4
name2;2;1868.69
name3;2;341.375
name4;2;4783.9
There could be more or fewer rows, and I need to split it into multiple .dat files, each containing the rows with the same value in the second column. (Then I will make a bar chart for each .dat file.) For this case there should be two files:
data1.dat
name1;1;11880
name2;1;260.483
name3;1;3355.82
name4;1;4179.48
data2.dat
name1;2;10740.4
name2;2;1868.69
name3;2;341.375
name4;2;4783.9
Is there any simple way of doing it with bash?
You can use awk to generate a file containing only a particular value of the second column:
awk -F ';' '($2==1){print}' data.dat > data1.dat
Just change the value in the $2== condition.
Or, if you want to do this automatically, just use:
awk -F ';' '{print > ("data"$2".dat")}' data.dat
which will output to files containing the value of the second column in the name.
Try this:
while IFS=";" read -r a b c; do echo "$a;$b;$c" >> "data${b}.dat"; done < file
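When the second column can take many distinct values, awk may run into the limit on simultaneously open files. One way around that, sketched here under the assumption that the input is called data.csv, is to append to each output file and close it again after every line. This is slower, but it keeps only one output file open at a time.

```shell
# Scratch directory with a shortened version of the sample data.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'name1;1;11880' 'name2;1;260.483' 'name1;2;10740.4' 'name2;2;1868.69' > data.csv

awk -F ';' '{
    out = "data" $2 ".dat"   # file name derived from the second column
    print >> out             # append, because the file is reopened for every line
    close(out)               # release the file handle straight away
}' data.csv

for f in data*.dat; do echo "== $f"; cat "$f"; done
```

Because the files are opened in append mode, running the command twice in the same directory would duplicate the rows; remove old .dat files first.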

search a pattern in file and output each pattern result in its own file using awk, sed

I have a file with one number per line:
$ cat test
700320947
700509217
701113187
701435748
701435889
701667717
701668467
702119126
702306577
702914910
whose details I want to look up in another, larger file with several comma-separated fields, writing the results to
700320947.csv
700509217.csv
701113187.csv
701435748.csv
701435889.csv
701667717.csv
701668467.csv
702119126.csv
702306577.csv
702914910.csv
Logic:
while read -r line; do zgrep "$line" *large*file*gz >> "$line.csv"; done < test
Please assist.
Thanks
Since nothing is said about the structure of the large file, I'll just assume that the numbers in test are to be found in the second column of the large file; generalize as needed.
This can be done in a single pass through each of the files by using output redirection in awk:
awk -F"," 'FILENAME == "test" { num[$1]=1; next }
num[$2] { print > ($2 ".csv") }' test bigfile
Unzip the large file first; using zgrep means decompressing it on the fly once for every line of the number file, which is very inefficient. After unzipping the big file, this will do it:
for number in $(cat test); do grep "$number" bigfile > "$number.csv"; done
Edited:
To limit hits to whole words only (e.g. 702119126 won't match 1702119126), add word boundaries to the regex:
for number in $(cat test); do grep "\\b$number\\b" bigfile > "$number.csv"; done
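The two ideas can also be combined: decompress the large file once and stream it into a single awk pass, without keeping an unzipped copy on disk. The following is only a sketch; the file name bigfile.gz, its contents, and the assumption that the number sits in the second field are all made up for the illustration.

```shell
# Scratch directory with a tiny stand-in for the number list and the large file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 700320947 700509217 > test
printf '%s\n' 'a,700320947,x' 'b,1700320947,y' 'c,700509217,z' 'd,700320947,w' | gzip -c > bigfile.gz

# "test" is read first to collect the wanted numbers; "-" then reads the
# decompressed stream from standard input.
gzip -dc bigfile.gz | awk -F, '
    FNR == NR { wanted[$1]; next }
    $2 in wanted {
        out = $2 ".csv"
        print >> out     # append, because the file is closed after each line
        close(out)
    }' test -

for f in 7*.csv; do echo "== $f"; cat "$f"; done
```

Since the whole second field is compared rather than searched as a pattern, 1700320947 is not reported as a hit for 700320947.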
