Shell script: Check if data in columns `X` from two CSV files are matched - bash

For example I have
a.txt:
1 21 34
1 22 21
2 32 76
2 12 76
...
b.txt:
1 99 73
1 32 27
2 55 76
2 76 12
...
Expected output:
$ ./some_script 1 a.txt b.txt
0 # matched
# compare data in #1 column of a.txt to data in #1 column of b.txt
# data: a.txt b.txt
# 1 1
# 1 1
# 2 2
# 2 2
$ ./some_script 2 a.txt b.txt
1 # not matched
$ ./some_script 3 a.txt b.txt
1 # not matched
where parameters 1, 2, and 3 are column numbers.
That is, some_script simply compares the data in the given column of a.txt with the same column of b.txt.
I need a program written in bash, sed, or awk (or another suitable tool) to do this job.

I would use a combination of paste and awk to achieve that
#!/bin/bash
[ -z "$1" -o -z "$2" -o -z "$3" ] && echo "Not enough arguments" && exit 1
[ ! -f "$2" -o ! -f "$3" ] && echo "input file(s) don't exist" && exit 1
awk -v var="$1" '$var != $(NF/2 + var){flag=1;exit}
END{print flag + 0;}' <(paste "$2" "$3")
Save the file as, say, compare.sh, make it an executable and then run it like
./compare.sh 3 a.txt b.txt
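To see what the paste | awk combination operates on: paste joins line N of both files, so with 3-column inputs awk sees 6 fields per row and $(NF/2 + var) is the second file's copy of column var. A quick check with the question's sample data (recreated here for the demo):

```shell
# Sample data from the question
printf '1 21 34\n1 22 21\n2 32 76\n2 12 76\n' > a.txt
printf '1 99 73\n1 32 27\n2 55 76\n2 76 12\n' > b.txt

# paste joins corresponding lines: fields 1-3 come from a.txt,
# fields 4-6 from b.txt, so $(NF/2 + var) is b.txt's column var.
paste a.txt b.txt | awk -v var=1 '$var != $(NF/2 + var){flag=1; exit}
END{print flag + 0}'                  # column 1 matches -> prints 0
```

The flag + 0 in END forces a numeric 0 when flag was never set; a bare print flag would print an empty line instead.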

[ "$(cut -d' ' -f1 a.txt)" = "$(cut -d' ' -f1 b.txt)" ]; echo $?
Explanation:
[ "string1" = "string2" ] - The test command. It returns 0 if string1 equals string2, and 1 otherwise. See man test for more information.
cut -d' ' -f1 a.txt - cut the first column from the file a.txt.
-d' ' - set the field delimiter to the space.
-f1 - select only field number 1. You can use a variable instead of the literal 1 here, e.g. num=1; [ "$(cut -d' ' -f$num a.txt)" = "$(cut -d' ' -f$num b.txt)" ]; echo $?.
echo $? - print the exit status of the last executed program.
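The same test can be wrapped in a small function so the column number becomes a parameter (the function name compare_col is my invention, not from the answer above):

```shell
compare_col() {
  # $1 = column number, $2 and $3 = the two files to compare
  [ "$(cut -d' ' -f"$1" "$2")" = "$(cut -d' ' -f"$1" "$3")" ]
}

# Sample data from the question
printf '1 21 34\n1 22 21\n2 32 76\n2 12 76\n' > a.txt
printf '1 99 73\n1 32 27\n2 55 76\n2 76 12\n' > b.txt

compare_col 1 a.txt b.txt; echo $?   # 0 - column 1 is identical
compare_col 2 a.txt b.txt; echo $?   # 1 - column 2 differs
```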

Simple one line solution with bash and awk
#!/bin/bash
[ "$(awk -F' ' "{print \$$1}" "$2")" == "$(awk -F' ' "{print \$$1}" "$3")" ] && echo 0 || echo 1
Output
./script 1 a.txt b.txt
0
./script 2 a.txt b.txt
1
./script 3 a.txt b.txt
1

Here's a bash version using custom file descriptors and arrays:
#!/bin/bash
# $1 = column number, $2 and $3 = the files to compare
exec 3< "$2"    # open each file on its own descriptor
exec 4< "$3"
# read one line from each file into the arrays a and b
while read -ru3 -a a && read -ru4 -a b; do
# bash arrays are zero-indexed, hence $1 - 1
[ "${a[$(($1 - 1))]}" != "${b[$(($1 - 1))]}" ] && exit 1
done
exit 0
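For quick testing, the same loop can be packaged as a function (the function form and the name col_match are my adaptation):

```shell
col_match() {
  # $1 = column number (1-based), $2 and $3 = files to compare
  local col=$1 status=0
  exec 3< "$2" 4< "$3"
  while read -ru3 -a a && read -ru4 -a b; do
    if [ "${a[col-1]}" != "${b[col-1]}" ]; then
      status=1
      break
    fi
  done
  exec 3<&- 4<&-    # close the descriptors again
  return $status
}

printf '1 21 34\n1 22 21\n2 32 76\n2 12 76\n' > a.txt
printf '1 99 73\n1 32 27\n2 55 76\n2 76 12\n' > b.txt
col_match 1 a.txt b.txt; echo $?   # 0
col_match 2 a.txt b.txt; echo $?   # 1
```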

Related

In a shell script, how to print a line if the previous and the next line are blank

I have a file like
abc
1234567890
0987654321

cde

fgh

ijk
1234567890
0987654321
I need to write a script that extracts the lines with a blank line before and after; in the example the result should be:
cde
fgh
I guess that awk or sed could do the work but I wasn't able to make them work. Any help?
Here is a solution:
#!/bin/bash
amt=$(sed -n '$=' path-to-your-file)    # number of lines in the file
i=0
while :
do
((i++))
if [ "$i" -gt "$amt" ]; then
break
fi
# only interior lines can have a neighbour on both sides
if [ "$i" -ne 1 ] && [ "$i" -ne "$amt" ]; then
j=$((i - 1))
emp=$(sed "${j}!d" path-to-your-file)   # the line before
if [ -z "$emp" ]; then
j=$((i + 1))
emp=$(sed "${j}!d" path-to-your-file)   # the line after
if [ -z "$emp" ]; then
emp=$(sed "${i}!d" path-to-your-file)
echo "$emp" >> extracted
fi
fi
fi
done
With awk:
awk '
BEGIN{
RS=""
FS="\n"
}
NF==1' file
Prints:
cde
fgh
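Assuming the input file has blank lines between the blocks (as the question describes), the paragraph-mode trick can be checked like this (the file name blocks.txt is an assumption):

```shell
# Recreate the question's file, blank separator lines included
printf 'abc\n1234567890\n0987654321\n\ncde\n\nfgh\n\nijk\n1234567890\n0987654321\n' > blocks.txt

# RS="" makes awk read blank-line-separated paragraphs; FS="\n" makes
# each line a field, so NF==1 selects paragraphs that are a single line.
awk 'BEGIN{RS=""; FS="\n"} NF==1' blocks.txt   # prints cde and fgh
```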
A very simple solution:
cat "myfile.txt" | grep -A 1 '^$' | grep -B 1 '^$' | grep -v -e '^--$' | grep -v '^$'
assuming "--" is the default group separator.
You can get rid of the group separator by other means, such as the --group-separator="" or --no-group-separator options, but that depends on the grep variant (BSD, GNU, OSX...).

How to run grep with while loop in shell script?

I'm trying to make a shell script that counts 9-letter words consisting of A, G, C, T in B.txt.
First,
9bp_cases.txt contains
AAAAAAAAA
AAAAAAAAG
AAAAAAAAC
AAAAAAAAT
AAAAAAAGA
AAAAAAAGG
AAAAAAAGC
...
#!/bin/bash
file=/Dataset/4.synTF/2.Sequence/9bp_cases.txt
# Exit if the CSV file does not exist
if [ ! -f "$file" ]; then
echo "CSV file does not exist: $file" >&2
exit 1
fi
cat 9bp_cases.txt | while read line;
do
echo $line
grep -i -o $line B.txt | wc -w
done
The result is like this:
AAAAAAAAA
0
AAAAAAAAG
0
AAAAAAAAC
0
AAAAAAAAT
0
AAAAAAAGA
0
None of the words is counted correctly.
However, when I run the simple code, it returns result well.
grep -i -o AAAAAAAA B.txt | wc -w
33410
I guess $line after grep is not recognized by the command grep.
Can you please help me?
Thank you.
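One possible cause, offered here only as a guess since the actual files aren't shown: if 9bp_cases.txt was created on Windows, every $line read from it ends in a carriage return, so grep searches for a pattern that never occurs in B.txt. A sketch that strips the \r before grepping (the file contents below are made up for the demo):

```shell
# Hypothetical reproduction: CRLF line endings in 9bp_cases.txt
printf 'AAAAAAAAA\r\nAAAAAAAAG\r\n' > 9bp_cases.txt
printf 'AAAAAAAAA\n' > B.txt

while read -r line; do
  line=${line%$'\r'}                  # drop a trailing carriage return, if any
  echo "$line"
  grep -i -o "$line" B.txt | wc -w    # now AAAAAAAAA counts 1, AAAAAAAAG 0
done < 9bp_cases.txt
```

Alternatively the input file can be converted once, e.g. with tr -d '\r' or dos2unix, before the loop.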

How to copy only that data to a file which is not present in that file in shell script / bash?

I have to put some data in a file which should be unique.
suppose in
file1 I have following data.
ABC
XYZ
PQR
and now I want to add MNO, DES, and ABC; it should copy only "MNO" and "DES", as "ABC" is already present.
file1 should look like
ABC
XYZ
PQR
MNO
DES
(ABC should be there for only once.)
Easiest way: this should add the non-matching lines to f1
diff -c f1 f2|grep ^+|awk -F '+ ' '{print $NF}' >> f1
or if '+ ' is going to be a part of actual text:
diff -c f1 f2|grep ^+|awk -F '+ ' '{ for(i=2;i<=NF;i++)print $i}' >> f1
shell script way:
I have a compare script that compares line counts/lengths etc., but for your requirement I think the part below should do the job:
input:
$ cat f1
ABC
XYZ
PQR
$ cat f2
MNO
DES
ABC
output after running the script:
$ ./compareCopy f1 f2
-----------------------------------------------------
comparing f1 f2
-----------------------------------------------------
Lines check - DONE
$ cat f1
ABC
XYZ
PQR
DES
MNO
#!/bin/sh
if [ $# != "2" ]; then
echo
echo "Requires arguments from command prompt"
echo "Usage: compare <file1> <file2>"
echo
exit
fi
proc="compareCopy"
# sort files for a line-by-line compare
sort "$1" > file1.tmp
sort "$2" > file2.tmp
echo "-----------------------------------------------------"
echo " comparing $1 $2" | tee ${proc}_compare.result
echo "-----------------------------------------------------"
file1_lines=$(wc -l < "$1")
# check each line
x=1
while [ "${x}" -le "${file1_lines}" ]
do
f1_line=$(sed -n "${x}p" file1.tmp)
f2_line=$(sed -n "${x}p" file2.tmp)
if [ "${f1_line}" != "${f2_line}" ]; then
echo "line number ${x} doesn't match in $1 and $2" >> ${proc}_compare.result
echo "$1 line: ${f1_line}" >> ${proc}_compare.result
echo "$2 line: ${f2_line}" >> ${proc}_compare.result
# so add this line to file 1
echo "$f2_line" >> "$1"
fi
x=$((x + 1))
done
rm -f file1.tmp file2.tmp
echo "Lines check - DONE" | tee -a ${proc}_compare.result
Use fgrep:
fgrep -vf file1 file2 > file2.tmp && cat file2.tmp >> file1 && rm file2.tmp
which fetches all lines of file2 that are not in file1 and appends the result to file1.
You may want to take a look at this post: grep -f maximum number of patterns?
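With the question's data, the fgrep approach (written grep -F here, the modern spelling of the same command) behaves like this:

```shell
# Question's sample data
printf 'ABC\nXYZ\nPQR\n' > file1
printf 'MNO\nDES\nABC\n' > file2

# Keep only the file2 lines not matched by any file1 line, append to file1
grep -F -vf file1 file2 > file2.tmp && cat file2.tmp >> file1 && rm file2.tmp
cat file1   # ABC XYZ PQR MNO DES, one per line
```

Note that -f treats file1's lines as substring patterns; adding -x would restrict matching to whole lines.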
Perl one liner
file one:
1
2
3
file two:
1
4
3
Print only lines not seen before:
perl -lne 'print if ++$n{ $_ } == 1 ' file_one.txt file_two.txt
Or
perl -lne 'print unless $n{ $_ }++ ' file_one.txt file_two.txt
output
1
2
3
4
The natural way:
sort -u File1 File2 >Temp && mv Temp File1
The tricky way if the files are already sorted:
comm File1 File2 | awk '{$1=$1};1' >Temp && mv Temp File1
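Using the file one/file two example above, the sort -u merge looks like this (note that, unlike the append-based approaches, it reorders the file):

```shell
printf '1\n2\n3\n' > File1
printf '1\n4\n3\n' > File2

# Merge both files and drop duplicate lines in one step
sort -u File1 File2   # prints 1 2 3 4, one per line
```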

Best way to merge two lines with same pattern

I have a text file like below
Input:
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,REDIRECTED_CALLS,0
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,CALLS_TREATED,0
I am wondering the best way to merge two lines into:
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,REDIRECTED_CALLS,0,CALLS_TREATED,0
With this as the input file:
$ cat file
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,REDIRECTED_CALLS,0
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,CALLS_TREATED,0
We can get the output you want with:
$ awk -F, -v OFS=, 'NR==1{first=$0;next;} {print first,$6,$7;}' file
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,REDIRECTED_CALLS,0,CALLS_TREATED,0
This is a more general solution that reads both files, item by item, where items are separated by comma. After the first mismatch, remaining items from the first line are appended to the output, followed by remaining items from the second line.
The most complicated tool this uses is sed. Looking at it again, even sed can be replaced.
#!/bin/bash
inFile="$1"
tmp=$(mktemp -d)
sed -n '1p' <"$inFile" | tr "," "\n" > "$tmp/in1"
sed -n '2p' <"$inFile" | tr "," "\n" > "$tmp/in2"
{ while true; do
read -r f1 <&3; r1=$?
read -r f2 <&4; r2=$?
[ $r1 -ne 0 ] && [ $r2 -ne 0 ] && break
[ $r1 -ne 0 ] && { echo "$f2"; continue; }
[ $r2 -ne 0 ] && { echo "$f1"; continue; }
if [ "$f1" == "$f2" ]; then
echo "$f1"
else
while echo "$f1"; do
read -r f1 <&3 || break
done
while echo "$f2"; do
read -r f2 <&4 || break
done
fi
done; } 3<"$tmp/in1" 4<"$tmp/in2" | tr '\n' ',' | sed 's/.$/\n/'
rm -rf "$tmp"
Assuming your input file looks like this:
$ cat in.txt
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,REDIRECTED_CALLS,0
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,CALLS_TREATED,0
You can then run the script as:
$ ./merge.sh in.txt
05-29-2015,03:15:00,SESM1_0,ABC,interSesm,REDIRECTED_CALLS,0,CALLS_TREATED,0

Is there a command that works for command line arguments like the sort command does for files?

I am trying to write a script in BASH that will take between 1 and 5 command line arguments from the user and report them back in reverse numerical order to standard output. The only command I know that would work similarly to this is the sort command, but this only works for files. Is there a similar command for sorting command line arguments? Here is what I have so far.
#!/bin/bash
if [ $# -lt 1 ] || [ $# -gt 5 ];
then echo "Incorrect number of arguments!"
else
sorted=sort -rn $*
echo "SORTED: $sorted"
fi
Try:
sorted=$( printf '%s\n' "$@" | sort -rn )
printf '%s\n' "${sorted//$'\n'/ }"
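The printf | sort -rn pipeline can also be wrapped in a function (the name reverse_sort is an assumption) so it works on any argument list:

```shell
reverse_sort() {
  # print each argument on its own line, sort numerically in reverse,
  # then join the result back into one space-delimited line
  printf '%s\n' "$@" | sort -rn | tr '\n' ' '
}

reverse_sort 3 1 5 2 4   # prints "5 4 3 2 1 " (trailing space from tr)
```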
You can give the sort command values from standard input. It expects every value on its own line, which you can achieve by combining echo and tr:
sorted=$(echo $* | tr ' ' '\n' | sort -rn - | tr '\n' ' ')
The last invocation of tr is only necessary if you want the result to be space-delimited again and not newline-delimited.
#!/bin/bash
if [ $# -lt 1 ] || [ $# -gt 5 ];
then echo "Incorrect number of arguments!"
else
sorted=$(echo $* | tr ' ' '\n' | sort -rn | tr '\n' ' ')
echo "SORTED: $sorted"
fi
echo $* | tr ' ' '\n' | sort -rn | tr '\n' ' '
You need to use command substitution $(...) to capture the output of a command like that.
#!/bin/bash
if [ $# -lt 1 ] || [ $# -gt 5 ]; then
echo "Incorrect number of arguments!"
else
sorted=$(for var in "$@"; do echo "$var"; done | sort -rn | tr '\n' ' ')
echo "SORTED: $sorted"
fi
$ ./test 1 2 3 4 5
SORTED: 5 4 3 2 1
$ ./test 5 4 3 2 1
SORTED: 5 4 3 2 1
