How to copy only that data to a file which is not present in that file in shell script / bash?

I have to add some data to a file, and every line in that file should be unique.
Suppose file1 contains the following data:
ABC
XYZ
PQR
Now I want to add MNO, DES and ABC. Only "MNO" and "DES" should be copied, because "ABC" is already present.
file1 should then look like:
ABC
XYZ
PQR
MNO
DES
(ABC should appear only once.)

Easiest way: this should add the non-matching lines to f1
diff -c f1 f2|grep ^+|awk -F '+ ' '{print $NF}' >> f1
or if '+ ' is going to be a part of actual text:
diff -c f1 f2|grep ^+|awk -F '+ ' '{ for(i=2;i<=NF;i++)print $i}' >> f1
shell script way:
I have a compare script that compares line counts, lengths, etc., but for your requirement I think the part below should do the job.
input:
$ cat f1
ABC
XYZ
PQR
$ cat f2
MNO
DES
ABC
output after running the script:
$ ./compareCopy f1 f2
-----------------------------------------------------
comparing f1 f2
-----------------------------------------------------
Lines check - DONE
$ cat f1
ABC
XYZ
PQR
DES
MNO
The compareCopy script:
#!/bin/sh
if [ $# -ne 2 ]; then
echo
echo "Requires arguments from command prompt"
echo "Usage: compareCopy <file1> <file2>"
echo
exit 1
fi
proc="compareCopy"
# sort files for line-by-line compare
sort "$1" > file1.tmp
sort "$2" > file2.tmp
echo "-----------------------------------------------------"
echo " comparing $1 $2" | tee "${proc}_compare.result"
echo "-----------------------------------------------------"
file1_lines=$(wc -l < "$1")
file2_lines=$(wc -l < "$2")
# Check each line. The comparison is positional, so it only catches
# differences that fall on the same line number in both sorted files.
x=1
while [ "${x}" -le "${file1_lines}" ]
do
f1_line=$(sed -n "${x}p" file1.tmp)
f2_line=$(sed -n "${x}p" file2.tmp)
if [ "${f1_line}" != "${f2_line}" ]; then
echo "line number ${x} don't match in both $1 and $2 files" >> "${proc}_compare.result"
echo "$1 line: ${f1_line}" >> "${proc}_compare.result"
echo "$2 line: ${f2_line}" >> "${proc}_compare.result"
# so add this line in file 1
echo "$f2_line" >> "$1"
fi
x=$((x + 1))
done
rm -f file1.tmp file2.tmp
echo "Lines check - DONE" | tee -a "${proc}_compare.result"

Use fgrep:
fgrep -vxf file1 file2 > file2.tmp && cat file2.tmp >> file1 && rm file2.tmp
which fetches all lines of file2 that are not in file1 (-x makes fgrep compare whole lines rather than substrings) and appends the result to file1.
You may want to take a look at this post: grep -f maximum number of patterns?
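For what it's worth, here is a minimal sketch of the same grep idea wrapped in a function. The name append_unique and the scratch directory are only placeholders of mine, and it assumes mktemp is available (not strictly POSIX, but present on most systems). It also treats grep's exit status 1 ("nothing selected") as a normal outcome rather than an error.

```shell
#!/bin/sh
# append_unique TARGET SOURCE   (hypothetical helper name)
# Appends to TARGET every line of SOURCE that is not already a whole line of TARGET.
append_unique() {
    target=$1
    source_file=$2
    tmp=$(mktemp) || return 1
    # -F fixed strings, -x whole-line match, -v invert, -f read patterns from file
    grep -Fxvf "$target" "$source_file" > "$tmp"
    rc=$?
    # grep returns 1 when no line was selected; that just means nothing to add
    if [ "$rc" -le 1 ]; then
        cat "$tmp" >> "$target"
        rc=0
    fi
    rm -f "$tmp"
    return "$rc"
}

# Demo with the data from the question, inside a scratch directory
demo_dir=$(mktemp -d)
printf 'ABC\nXYZ\nPQR\n' > "$demo_dir/file1"
printf 'MNO\nDES\nABC\n' > "$demo_dir/file2"
append_unique "$demo_dir/file1" "$demo_dir/file2"
cat "$demo_dir/file1"
```

Note that duplicates inside the second file itself are not removed by this sketch; only lines already present in the target are skipped.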

Perl one liner
file one:
1
2
3
file two:
1
4
3
Print each distinct line once
perl -lne 'print if ++$n{ $_ } == 1 ' file_one.txt file_two.txt
Or
perl -lne 'print unless $n{ $_ }++ ' file_one.txt file_two.txt
output
1
2
3
4

The natural way:
sort -u File1 File2 >Temp && mv Temp File1
The tricky way if the files are already sorted:
comm File1 File2 | awk '{$1=$1};1' >Temp && mv Temp File1
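Since sort -u reorders the file, here is a small sketch of an order-preserving alternative with awk, in case the original order of File1 matters. The function name merge_keep_order is made up for the example, and mktemp is assumed to exist.

```shell
#!/bin/sh
# merge_keep_order TARGET SOURCE   (hypothetical helper name)
# Rewrites TARGET as: its own lines, then the new lines of SOURCE, first occurrences only.
merge_keep_order() {
    tmp=$(mktemp) || return 1
    # seen[$0]++ evaluates to 0 the first time a line shows up,
    # so !seen[$0]++ is true only for first occurrences
    awk '!seen[$0]++' "$1" "$2" > "$tmp" && cat "$tmp" > "$1"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Demo with the data from the question
demo_dir=$(mktemp -d)
printf 'ABC\nXYZ\nPQR\n' > "$demo_dir/File1"
printf 'MNO\nDES\nABC\n' > "$demo_dir/File2"
merge_keep_order "$demo_dir/File1" "$demo_dir/File2"
cat "$demo_dir/File1"
```

The result is written to a temporary file first because reading and truncating File1 in the same pipeline would lose data.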

Related

in shell script how to print a line if the previous and the next line has a blank and the

I have a file like
abc
1234567890
0987654321

cde

fgh

ijk
1234567890
0987654321
I need to write a script that extracts the lines that have a blank line before and after them. For the example, the output should be:
cde
fgh
I guess that awk or sed could do the work, but I wasn't able to make them work. Any help?
Here is the solution.
#!/bin/bash
amt=$(sed -n '$=' path-to-your-file)
i=0
while :
do
((i++))
if [ $i == $amt ]; then
break
fi
if ! [ $i == 1 ]; then
j=$(expr $i - 1)
emp=$(sed $j'!d' path-to-your-file)
if [ h$emp == h ]; then
j=$(expr $i + 1)
emp=$(sed $j'!d' path-to-your-file)
if [ h$emp == h ]; then
emp=$(sed $i'!d' path-to-your-file)
echo >> extracted $emp
fi
fi
fi
done
With awk:
awk '
BEGIN{
RS=""
FS="\n"
}
NF==1' file
Prints:
cde
fgh
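If you want the literal rule "blank line before and blank line after", here is a sketch that remembers the two previous lines instead of using paragraph mode. Unlike RS="", it does not print a single-line paragraph at the very start or end of the file, and it treats only completely empty lines as blank. The function name isolated_lines and the sample file are just for illustration.

```shell
#!/bin/sh
# isolated_lines FILE   (hypothetical helper name)
# Prints each non-empty line whose previous and next lines are both empty.
isolated_lines() {
    awk '
        # At line NR, prev is line NR-1 and prev2 is line NR-2
        NR > 2 && prev2 == "" && prev != "" && $0 == "" { print prev }
        { prev2 = prev; prev = $0 }
    ' "$1"
}

# Demo: sample data with blank lines around cde and fgh
demo_file=$(mktemp)
printf 'abc\n1234567890\n0987654321\n\ncde\n\nfgh\n\nijk\n1234567890\n0987654321\n' > "$demo_file"
isolated_lines "$demo_file"
```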
A very simple solution:
cat "myfile.txt" | grep -A 1 '^$' | grep -B 1 '^$' | grep -v -e '^--$' | grep -v '^$'
assuming "--" is the default group separator.
You may get rid of the group separator by other means, like the
--group-separator="" or --no-group-separator options,
but that depends on the grep variant (BSD, GNU, OSX...).

Shell script: Check if data in columns `X` from two CSV files are matched

For example I have
a.txt:
1 21 34
1 22 21
2 32 76
2 12 76
...
b.txt:
1 99 73
1 32 27
2 55 76
2 76 12
...
Expected output:
$ ./some_script 1 a.txt b.txt
0 # matched
# compare data in #1 column of a.txt to data in #1 column of b.txt
# data: a.txt b.txt
# 1 1
# 1 1
# 2 2
# 2 2
$ ./some_script 2 a.txt b.txt
1 # not matched
$ ./some_script 3 a.txt b.txt
1 # not matched
where parameters 1, 2, and 3 are column numbers.
Say some_script simply compares the data in the same column of a.txt and b.txt.
I need a program written in bash, sed, or awk (or any other suitable tool) to do this job.
I would use a combination of paste and awk to achieve that
#!/bin/bash
[ -z "$1" -o -z "$2" -o -z "$3" ] && echo "Not enough arguments" && exit 1
[ ! -f "$2" -o ! -f "$3" ] && echo "input file(s) don't exist" && exit 1
awk -v var="$1" '$var!=$(NF/2+var){flag=1;exit}
END{print flag+0;}' <(paste "$2" "$3")
Save the file as, say, compare.sh, make it executable and then run it like
./compare.sh 3 a.txt b.txt
[ "$(cut -d' ' -f1 a.txt)" = "$(cut -d' ' -f1 b.txt)" ]; echo $?
Explanation:
[ "string1" = "string2" ] - The test command. If string1 equals string2, it returns 0, otherwise 1. See man test for more information.
cut -d' ' -f1 a.txt - cuts the first column from the file a.txt.
-d' ' - sets the field delimiter to a space.
-f1 - selects only field number 1. You can use a variable instead of the number 1, like num=1; [ "$(cut -d' ' -f$num a.txt)" = "$(cut -d' ' -f$num b.txt)" ]; echo $?.
echo $? - prints the exit status of the last executed command.
Simple one line solution with bash and awk
#!/bin/bash
[ "$(awk -F' ' "{print \$$1}" "$2")" == "$(awk -F' ' "{print \$$1}" "$3")" ] && echo 0 || echo 1
Output
./script 1 a.txt b.txt
0
./script 2 a.txt b.txt
1
./script 3 a.txt b.txt
1
Here's a bash version using custom file descriptors and arrays:
#!/bin/bash
exec 3< "$2"
exec 4< "$3"
while read -ru3 -a a && read -ru4 -a b; do
[ "${a[$(($1 - 1))]}" != "${b[$(($1 - 1))]}" ] && exit 1
done
exit 0
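One more variant, only as a sketch: extract the column from each file into a temporary file and let cmp decide. This also reports a mismatch when the files have different numbers of lines. The name cols_match is hypothetical, mktemp is assumed to be available, and columns are split on whitespace the way awk does by default.

```shell
#!/bin/sh
# cols_match COLUMN FILE1 FILE2   (hypothetical helper name)
# Returns 0 if the given column is identical in both files, non-zero otherwise.
cols_match() {
    col=$1
    t1=$(mktemp) || return 2
    t2=$(mktemp) || { rm -f "$t1"; return 2; }
    awk -v c="$col" '{ print $c }' "$2" > "$t1"
    awk -v c="$col" '{ print $c }' "$3" > "$t2"
    cmp -s "$t1" "$t2"      # silent byte-for-byte comparison
    rc=$?
    rm -f "$t1" "$t2"
    return "$rc"
}

# Demo with the four sample rows shown in the question
demo_dir=$(mktemp -d)
printf '1 21 34\n1 22 21\n2 32 76\n2 12 76\n' > "$demo_dir/a.txt"
printf '1 99 73\n1 32 27\n2 55 76\n2 76 12\n' > "$demo_dir/b.txt"
for n in 1 2 3; do
    cols_match "$n" "$demo_dir/a.txt" "$demo_dir/b.txt"
    echo "column $n: $?"
done
```

With that sample data the loop should report 0 for column 1 and 1 for columns 2 and 3.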

How to pass a variable string to a file txt at the beginning of test?

I have a problem.
I have a general program, gene.sh, that for each file (e.g. geneX.csv) makes a directory with the name of the gene (e.g. geneX/geneX.csv). It then runs other programs from inside gene.sh, but one of those programs needs a variable and I don't know how to pass it.
This is the program gene.sh:
#!/bin/bash
# Create a dictory for each file *.xls and *.csv
for fname in *.xlsx *csv
do
dname=${fname%.*}
[[ -d $dname ]] || mkdir "$dname"
mv "$fname" "$dname"
done
# For each gene go inside the directory and compile the programs getChromosomicPositions.sh to have the positions, and getHapolotipeStings.sh to have the variants
for geni in */; do
cd $geni
z=$(tail -n 1 *.csv | tr ';' "\n" | wc -l)
cd ..
cp getChromosomicPositions.sh $geni --->
cp getHaplotypeStrings.sh $geni
cd $geni
export z
./getChromosomicPositions.sh *.csv
export z
./getHaplotypeStrings.sh *.csv
cd ..
done
This is the program getChromosomicPositions.sh:
rm chrPosRs.txt
grep '^Haplotype\ ID' $1 | cut -d ";" -f 4-61 | tr ";" "\n" | awk '{print "select chrom,chromStart,chromEnd,name from snp147 where name=\""$1"\";"}' > listOfQuery.txt
while read l; do
echo $l > query.txt
mysql -h genome-mysql.cse.ucsc.edu -u genome -A -D hg38 --skip-column-names < query.txt > queryResult.txt
if [[ "$(cat queryResult.txt)" == "" ]];
then
cat query.txt |
while read line; do
echo $line | awk '$6 ~/rs/ {print $6}' > temp.txt;
if [[ "$(cat temp.txt)" != "" ]];
then cat temp.txt | awk -F'name="' '{print $2}' | sed -e 's/";//g' > temp.txt;
./getHGSVposHG19.sh temp.txt # <--- here is the problem
else
echo $line | awk '{num=sub(/.*:g\./,"");num+=sub(/\".*/,"");if(num==2){print};num=""}' > temp2.txt
fi
done
cat query.txt >> varianti.txt
echo "Missing Data" >> chrPosRs.txt
else
cat queryResult.txt >> chrPosRs.txt
fi
done < listOfQuery.txt
rm query*
Here is the problem:
I need to open the file temp.txt and automatically put the variable $geni from gene.sh at the beginning of the file.
How can I do that?
Why not pass "$geni" as, say, the first argument when invoking your script, and treat the rest of the arguments as your expected .csv files?
./getChromosomicPositions.sh "$geni" *.csv
Alternatively, you can set it as environment variable for the script, so that it can be used there (or just export it).
geni="$geni" ./getChromosomicPositions.sh *.csv
In any case, once you have it available in the second script, you can do
if passed as the first argument:
echo "${1}:$(cat temp.txt | awk -F'name="' '{print $2}' | sed -e 's/";//g')"
or if passed as environment variable:
echo "${geni}:$(cat temp.txt | awk -F'name="' '{print $2}' | sed -e 's/";//g')"
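If the goal is literally to put the value on its own first line of temp.txt, a small sketch could look like the following. The function name prepend_line and the sample contents are assumptions for illustration. The new content is built in a temporary file first, because redirecting into a file that the same command is still reading would truncate it.

```shell
#!/bin/sh
# prepend_line TEXT FILE   (hypothetical helper name)
# Puts TEXT as a new first line of FILE.
prepend_line() {
    tmp=$(mktemp) || return 1
    # Write the new first line, then the old contents, then copy the result back
    { printf '%s\n' "$1"; cat "$2"; } > "$tmp" && cat "$tmp" > "$2"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Demo: geni would normally come from gene.sh (argument or exported variable)
geni='geneX/'
demo_file=$(mktemp)
printf 'rs0000001\n' > "$demo_file"      # made-up placeholder content
prepend_line "$geni" "$demo_file"
cat "$demo_file"
```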

bash, adding string after a line

I'm trying to put together a bash script that will search a bunch of files and if it finds a particular string in a file, it will add a new line on the line after that string and then move on to the next file.
#! /bin/bash
echo "Creating variables"
SEARCHDIR=testfile
LINENUM=1
find $SEARCHDIR* -type f -name *.xml | while read i; do
echo "Checking $i"
ISBE=`cat $i | grep STRING_TO_SEARCH_FOR`
if [[ $ISBE =~ "STRING_TO_SEARCH_FOR" ]] ; then
echo "found $i"
cat $i | while read LINE; do
((LINENUM=LINENUM+1))
if [[ $LINE == "<STRING_TO_SEARCH_FOR>" ]] ; then
echo "editing $i"
awk -v "n=$LINENUM" -v "s=new line to insert" '(NR==n) { print s } 1' $i
fi
done
fi
LINENUM=1
done
the bit I'm having trouble with is
awk -v "n=$LINENUM" -v "s=new line to insert" '(NR==n) { print s } 1' $i
if I just use $i at the end, it will output the content to the screen, if I use $i > $i then it will just erase the file and if I use $i >> $i it will get stuck in a loop until the disk fills up.
any suggestions?
Unfortunately awk doesn't have an in-place replacement option similar to sed's -i, so you can write to a temp file and then move it over the original:
awk '{commands}' file > tmpfile && mv tmpfile file
or, if you have GNU awk 4.1.0 or newer, the -i inplace extension is available, so you can do:
awk -i inplace '{commands}' file
to modify the original file directly.
#cat $i | while read LINE; do
# ((LINENUM=LINENUM+1))
# if [[ $LINE == "<STRING_TO_SEARCH_FOR>" ]] ; then
# echo "editing $i"
# awk -v "n=$LINENUM" -v "s=new line to insert" '(NR==n) { print s } 1' $i
# fi
# done
# replaced by
sed -i 's/STRING_TO_SEARCH_FOR/&\n/g' ${i}
or use awk in place of sed
also
# ISBE=`cat $i | grep STRING_TO_SEARCH_FOR`
# if [[ $ISBE =~ "STRING_TO_SEARCH_FOR" ]] ; then
#by
if [ $( grep -c 'STRING_TO_SEARCH_FOR' ${i} ) -gt 0 ]; then
# useful if the files are huge; otherwise running sed directly on every file is faster (but then there is no echo about which file matched)
If you can, maybe use a temporary file?
~$ awk ... $i > tmpfile
~$ mv tmpfile $i
Or simply awk ... $i > tmpfile && mv tmpfile $i
Note that, you can use mktemp to create this temporary file.
Otherwise, with sed you can insert a line right after a match:
~$ cat f
auie
nrst
abcd
efgh
1234
~$ sed '/abcd/{a\
new_line
}' f
auie
nrst
abcd
new_line
efgh
1234
The command checks whether the line matches /abcd/ and, if so, appends (a\) the line new_line after it.
And since sed has the -i option to edit in place, you can do:
if [[ $ISBE =~ "STRING_TO_SEARCH_FOR" ]] ; then
echo "found $i"
echo "editing $i"
sed -i '/STRING_TO_SEARCH_FOR/{a\
new line to insert
}' "$i"
fi
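For completeness, the temporary-file approach with awk and mktemp mentioned above could be sketched like this. The function name insert_after_match is made up, the search string is matched as a plain substring with index() rather than as a regex, and awk's -v will interpret backslash escapes in the values, so this assumes neither argument contains backslashes.

```shell
#!/bin/sh
# insert_after_match STRING NEW_LINE FILE   (hypothetical helper name)
# Adds NEW_LINE after every line of FILE that contains STRING.
insert_after_match() {
    tmp=$(mktemp) || return 1
    awk -v pat="$1" -v s="$2" '
        { print }                    # always keep the original line
        index($0, pat) { print s }   # then add the new line after a match
    ' "$3" > "$tmp" && cat "$tmp" > "$3"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}

# Demo with the sample file f used above
demo_file=$(mktemp)
printf 'auie\nnrst\nabcd\nefgh\n1234\n' > "$demo_file"
insert_after_match 'abcd' 'new_line' "$demo_file"
cat "$demo_file"
```

In the original loop this would replace the inner while read block: call it once per file found, with no line counting needed.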

how to map one csv file content to second csv file and write it another csv using unix

After writing some unix scripts I am able to manage to get data from different xml files to csv format and now I got stuck with the following problem
file1.csv : contains
1,5,6,7,8
2,3,4,5,9
1,6,10,11,12
1,5,11,12
file2.csv : contains
1,Mango,Tuna,Webby,Through,Franky,Sam,Sumo
2,Franky
3,Sam
4,Sumo
5,Mango,Tuna,Webby
6,Tuna,Webby,Through
7,Through,Sam,Sumo
8,Nothing
9,Sam,Sumo
10,Sumo,Mango,Tuna
11,Mango,Tuna,Webby,Through
12,Mango,Tuna,Webby,Through,Franky
output I want is
1,5,6,7,8
Mango,Tuna,Webby,Through,Franky,Sam,Sumo
Mango,Tuna,Webby
Tuna,Webby,Through
Through,Sam,Sumo
Nothing
Common word:None
2,3,4,5,9
Franky
Sam
Sumo
Mango,Tuna,Webby
Sam, Sumo
Common Word:None
1,6,10,11,12
Mango,Tuna,Webby,Through,Franky,Sam,Sumo
Tuna,Webby,Through
Sumo,Mango,Tuna
Mango,Tuna,Webby,Through
Mango,Tuna,Webby,Through,Franky
Common word: Tuna
1,5,11,12
Mango,Tuna,Webby,Through,Franky,Sam,Sumo
Mango,Tuna,Webby
Mango,Tuna,Webby,Through
Mango,Tuna,Webby,Through,Franky
Common word: Mango,Tuna,Webby
I appreciate any help.
Thanks
I got some solution but not complete
#!/bin/bash
count=1
count_2=1
for i in `cat file1.csv`
do
echo $i > $count.txt
cat $count.txt | tr "," "\n" > $count_2.txt
count=`expr $count + 1`
count_2=`expr $count_2 + 1`
done;
# this code will create separate files for each line in file1.csv
bash file3_search.sh
##########################
file3_search.sh
================
#!/bin/bash
cat file2.csv | sed '/^$/d' | sed 's/[ ]*$//' > trim.txt
dos2unix -q 1.txt 1.txt
dos2unix 2.txt 2.txt
dos2unix 3.txt 3.txt
echo "1st Combination results"
for i in `cat 1.txt`
do
cat trim.txt | egrep -w $i
done > Combination1.txt;
echo "2nd Combination results"
for i in `cat 2.txt`
do
cat trim.txt | egrep -w $i
done > Combination2.txt;
echo "3rd Combination results"
for i in `cat 3.txt`
do
cat trim.txt | egrep -w $i
done > Combination3.txt;
Guys, I am not good at programming (I am a software tester). Could someone please refactor my code and also tell me how to get the common word in those Combination.txt files?
IMHO it works:
for line in $(cat 1.csv) ; do
echo $line ;
grepline=`echo $line | sed 's/ \+//g;s/,/,|/g;s/^\(.*\)$/^(\1,)/'`;
egrep $grepline 2.csv
egrep $grepline 2.csv | \
awk -F "," '
{ for (i=2;i<=NF;i++)
{s[$i]+=1}
}
END { for (key in s)
{if (s[key]==NR) { tp = tp key "," }
}
if (tp!="") {sub(/,$/,"",tp); print "Common word(s): " tp}
else {print "Common word: None"}}'
echo
done
HTH
Here's an answer for you. It depends on associative array capabilities of bash version 4:
IFS=,
declare -a words
# read and store the words in file2
while read line; do
set -- $line
n=$1
shift
words[$n]="$*"
done < file2.csv
# read file1 and process
while read line; do
echo "$line"
set -- $line
indexes=( "$@" )
NF=${#indexes[@]}
declare -A common
for (( i=0; i<$NF; i++)); do
echo "${words[${indexes[$i]}]}"
set -- ${words[${indexes[$i]}]}
for word; do
common[$word]=$(( ${common[$word]} + 1))
done
done
printf "Common words: "
n=0
for word in "${!common[@]}"; do
if [[ ${common[$word]} -eq $NF ]]; then
printf "%s " $word
(( n++ ))
fi
done
[[ $n -eq 0 ]] && printf "None"
unset common
printf "\n\n"
done < file1.csv
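If bash 4 is not available, the same idea can be sketched in a single awk program that should run under a plain sh. This is only an illustration: the function name map_ids is made up, it assumes the two files are clean (no trailing spaces, no CRLF line endings), and it lists the common words in the order they appear for the first id of each line.

```shell
#!/bin/sh
# map_ids MAPFILE IDFILE   (hypothetical helper name)
# MAPFILE: "id,word,word,..."   IDFILE: "id,id,id,..."
map_ids() {
    awk -F, '
    # First file: remember the word list of every id
    NR == FNR { id = $1; sub(/^[^,]*,/, ""); words[id] = $0; next }
    # Second file: print the id line, the word list of each id, then the common words
    {
        print
        split("", count)                      # portable way to empty an array
        for (i = 1; i <= NF; i++) {
            print words[$i]
            n = split(words[$i], w, ",")
            split("", seen)
            for (j = 1; j <= n; j++)
                if (!(w[j] in seen)) { seen[w[j]] = 1; count[w[j]]++ }
        }
        common = ""
        n = split(words[$1], w, ",")
        for (j = 1; j <= n; j++)
            if (count[w[j]] == NF)            # the word occurs for every id on the line
                common = common (common == "" ? "" : ",") w[j]
        print "Common word: " (common == "" ? "None" : common)
        print ""
    }' "$1" "$2"
}

# Demo with a shortened version of the sample data
demo_dir=$(mktemp -d)
printf '5,Mango,Tuna,Webby\n6,Tuna,Webby,Through\n8,Nothing\n' > "$demo_dir/file2.csv"
printf '5,6\n5,8\n' > "$demo_dir/file1.csv"
map_ids "$demo_dir/file2.csv" "$demo_dir/file1.csv"
```

For the shortened demo data the first block should end with "Common word: Tuna,Webby" and the second with "Common word: None".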
