I am looking to repeat the same function for each gene in my genelist; this is what the while loop does. For each gene, it extracts the matching lines from the master file into a new bed file.
The number_of_lines variable is the number of rows in that bed file, and I want to create a second file with number_of_lines rows, each containing the value of number_of_lines.
For example:
number_of_lines=1
output
1
number_of_lines=5
output
5
5
5
5
5
My code is below:
while read gene
do
grep -w $gene $masterfile | awk '{print $1"\t"$2"\t"$3"\t"$5"\t"$6"\t"$4}' > $gene.bed
number_of_lines=$(grep "^.*$" -c $gene.bed)
echo $number_of_lines
cat "" > $gene.1.bed
for i in 'eval echo {1..$number_of_lines}'
do
echo $number_of_lines >> $gene.1.bed
done
done < $genelist
If I run this part by itself:
cat "" > $gene.1.bed
for i in 'eval echo {1..$number_of_lines}'
do
echo $number_of_lines >> $gene.1.bed
done
it works?
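You need to put eval echo {1..$number_of_lines} inside $() so that its output is expanded; inside single quotes it is just a literal string, so the loop runs only once.
cat "" fails, because cat tries to open a file with an empty name; to truncate the file you would use : > "$gene.1.bed" instead. But it is simpler to put the output redirection around the entire loop instead of after each echo statement: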
while IFS= read -r gene
do
grep -w "$gene" "$masterfile" | awk '{print $1"\t"$2"\t"$3"\t"$5"\t"$6"\t"$4}' > "$gene.bed"
number_of_lines=$(grep -c "^.*$" "$gene.bed")
echo "$number_of_lines"
for i in $(eval echo {1..$number_of_lines})
do
echo "$number_of_lines"
done > "$gene.1.bed"
done < "$genelist"
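A minimal sketch of why the braces need eval (or another tool): brace expansion runs before parameter expansion, so {1..$n} is never seen as a range. The function names literal_range and seq_range are illustrative, and seq is swapped in here as a common eval-free alternative; it is not part of the answer above.

```shell
#!/bin/bash
# Brace expansion happens before parameter expansion: when the braces are
# examined, "{1..$n}" is not a valid range, so it is left alone and $n is
# substituted afterwards, giving the literal word "{1..3}".
literal_range() {
  local n=$1
  echo {1..$n}
}

# seq receives the bound as an ordinary argument, so no eval is needed;
# paste -s joins its one-number-per-line output with spaces.
seq_range() {
  local n=$1
  seq 1 "$n" | paste -sd' ' -
}

literal_range 3   # prints {1..3}
seq_range 3       # prints 1 2 3
```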
When you see eval, you "know" your code is wrong. @Barmar already pointed out the normal construction, for ((i=0; i<number_of_lines; i++)), which is what should be used here. Since all lines have the same content, you have another option: yes. I made some other changes too.
while IFS= read -r gene
do
grep -w "${gene}" "${masterfile}" |
awk 'BEGIN {OFS="\t";} {print $1, $2, $3, $5, $6, $4}' > "${gene}.bed"
number_of_lines=$(( $(wc -l < "${gene}.bed") ))
echo "${number_of_lines}"
yes "${number_of_lines}" | head -n "${number_of_lines}" > "${gene}.1.bed"
done < "${genelist}"
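For comparison, a sketch of the C-style arithmetic loop mentioned above, producing N copies of N without eval or yes. repeat_count is a hypothetical helper that writes to standard output, so the caller chooses the destination file.

```shell
#!/bin/bash
# Prints the number n on n separate lines using bash's arithmetic for loop;
# no brace expansion or eval is involved, so n can come from a variable.
repeat_count() {
  local n=$1 i
  for ((i = 0; i < n; i++)); do
    echo "$n"
  done
}

repeat_count 5    # five lines, each containing 5
```

Inside the gene loop this would be used as repeat_count "$number_of_lines" > "$gene.1.bed".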
For example, I have:
a.txt:
1 21 34
1 22 21
2 32 76
2 12 76
...
b.txt:
1 99 73
1 32 27
2 55 76
2 76 12
...
Expected output:
$ ./some_script 1 a.txt b.txt
0 # matched
# compare data in #1 column of a.txt to data in #1 column of b.txt
# data: a.txt b.txt
# 1 1
# 1 1
# 2 2
# 2 2
$ ./some_script 2 a.txt b.txt
1 # not matched
$ ./some_script 3 a.txt b.txt
1 # not matched
where parameters 1, 2, and 3 are column numbers.
In other words, some_script just compares the data in the same column of a.txt and b.txt.
I need a program written in bash, sed, or awk (or any other suitable tool) to do this job.
I would use a combination of paste and awk to achieve that:
#!/bin/bash
[ -z "$1" -o -z "$2" -o -z "$3" ] && echo "Not enough arguments" && exit 1
[ ! -f "$2" -o ! -f "$3" ] && echo "input file(s) don't exist" && exit 1
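# paste puts the files side by side, so column var of the second file is field NF/2+var
# flag+0 prints 0 (match) instead of an empty line when flag was never set
awk -v var="$1" '$var != $(NF/2 + var) {flag = 1; exit}
END {print flag + 0}' <(paste "$2" "$3")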
Save the file as, say, compare.sh, make it executable (chmod +x compare.sh), and then run it like this:
./compare.sh 3 a.txt b.txt
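A quick way to try the paste + awk idea on the four sample rows shown in the question (the elided "..." rows are left out). compare_column is a hypothetical wrapper; it prints flag + 0 so that a match shows 0 instead of an empty line, and it assumes both files have the same number of columns, because the pasted line is split in half at NF/2.

```shell
#!/bin/bash
# Prints 0 if column $1 is identical in files $2 and $3, else 1.
# paste joins the files line by line, so column N of the second file
# sits at field NF/2 + N of the combined line.
compare_column() {
  awk -v var="$1" '$var != $(NF/2 + var) {flag = 1; exit}
  END {print flag + 0}' <(paste "$2" "$3")
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '1 21 34\n1 22 21\n2 32 76\n2 12 76\n' > "$dir/a.txt"
printf '1 99 73\n1 32 27\n2 55 76\n2 76 12\n' > "$dir/b.txt"

for col in 1 2 3; do
  echo "column $col: $(compare_column "$col" "$dir/a.txt" "$dir/b.txt")"
done
```

This prints 0 for column 1 and 1 for columns 2 and 3, in line with the expected results in the question.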
[ "$(cut -d' ' -f1 a.txt)" = "$(cut -d' ' -f1 b.txt)" ]; echo $?
Explanation:
[ "string1" = "string2" ] - the test command. If string1 equals string2, it returns 0, else 1. See man test for more information.
cut -d' ' -f1 a.txt - cut the first column from the file a.txt.
-d' ' - set the field delimiter to a space.
-f1 - select only field number 1. You can use a variable instead of the number 1, e.g. num=1; [ "$(cut -d' ' -f$num a.txt)" = "$(cut -d' ' -f$num b.txt)" ]; echo $?.
echo $? - print the exit status of the last executed program.
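One caveat worth checking with the cut approach: -d' ' splits on every single space, so the columns must be separated by exactly one space. A small illustration with a made-up line containing a double space; awk's default field splitting collapses runs of blanks instead.

```shell
#!/bin/bash
# A sample line with two spaces between the first and second values.
line='1  21 34'

# cut sees the fields "1", "", "21", "34", so field 2 is empty.
echo "$line" | cut -d' ' -f2     # prints an empty line

# awk's default field splitting treats any run of blanks as one separator.
echo "$line" | awk '{print $2}'  # prints 21
```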
A simple one-line solution with bash and awk:
#!/bin/bash
[ "$(awk -F' ' "{print \$$1}" "$2")" == "$(awk -F' ' "{print \$$1}" "$3")" ] && echo 0 || echo 1
Output
./script 1 a.txt b.txt
0
./script 2 a.txt b.txt
1
./script 3 a.txt b.txt
1
Here's a bash version using custom file descriptors and arrays:
#!/bin/bash
exec 3< "$2"
exec 4< "$3"
while read -ru3 -a a && read -ru4 -a b; do
[ "${a[$(($1 - 1))]}" != "${b[$(($1 - 1))]}" ] && exit 1
done
exit 0
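The descriptor loop above stops at the end of the shorter file and reports only through its exit status. A sketch of a variant that prints 0/1 as the question asks and also treats a different number of lines as a mismatch; column_matches and the sample files are hypothetical, and it assumes both files end with a newline (read fails on an unterminated last line).

```shell
#!/bin/bash
# Prints 0 if column $1 matches line by line in files $2 and $3, else 1.
column_matches() {
  local idx=$(( $1 - 1 )) status=0
  local -a a b
  exec 3< "$2" 4< "$3"
  while read -r -u3 -a a; do
    # mismatch if file $3 has run out of lines or the column differs
    if ! read -r -u4 -a b || [ "${a[idx]}" != "${b[idx]}" ]; then
      status=1
      break
    fi
  done
  # file $2 is exhausted; file $3 must be exhausted as well
  if [ "$status" -eq 0 ] && read -r -u4 -a b; then
    status=1
  fi
  exec 3<&- 4<&-
  echo "$status"
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '1 21 34\n1 22 21\n2 32 76\n2 12 76\n' > "$dir/a.txt"
printf '1 99 73\n1 32 27\n2 55 76\n2 76 12\n' > "$dir/b.txt"
printf '1 99 73\n1 32 27\n2 55 76\n2 76 12\n3 0 0\n' > "$dir/longer.txt"

column_matches 1 "$dir/a.txt" "$dir/b.txt"        # prints 0
column_matches 2 "$dir/a.txt" "$dir/b.txt"        # prints 1
column_matches 1 "$dir/a.txt" "$dir/longer.txt"   # prints 1
```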
I need to add data to a file in which every line must be unique.
Suppose file1 contains the following data:
ABC
XYZ
PQR
Now I want to add MNO, DES and ABC. Only "MNO" and "DES" should be copied, since "ABC" is already present.
file1 should then look like this:
ABC
XYZ
PQR
MNO
DES
(ABC should be there for only once.)
Easiest way: this should append the non-matching lines to f1:
diff -c f1 f2 | grep '^+ ' | awk -F '[+] ' '{print $NF}' >> f1
or, if '+ ' can also be part of the actual text, strip only the two-character marker:
diff -c f1 f2 | grep '^+ ' | awk '{print substr($0, 3)}' >> f1
Shell script way:
I have a compare script that checks line counts, lengths, etc., but for your requirement I think the part below should do the job.
input:
$ cat f1
ABC
XYZ
PQR
$ cat f2
MNO
DES
ABC
Output after running the script:
$ ./compareCopy f1 f2
-----------------------------------------------------
comparing f1 f2
-----------------------------------------------------
Lines check - DONE
$ cat f1
ABC
XYZ
PQR
DES
MNO
#!/bin/sh
if [ "$#" -ne 2 ]; then
echo
echo "Requires arguments from command prompt"
echo "Usage: compareCopy <file1> <file2>"
echo
exit 1
fi
proc="compareCopy"
# sort files for line by line compare
sort "$1" > file1.tmp
sort "$2" > file2.tmp
echo "-----------------------------------------------------"
echo " comparing $1 $2" | tee "${proc}_compare.result"
echo "-----------------------------------------------------"
# wc -l < file prints only the count; tr removes the padding some wc versions add
file1_lines=$(wc -l < "$1" | tr -d ' ')
# Check each line
x=1
while [ "${x}" -le "${file1_lines}" ]
do
f1_line=$(sed -n "${x}p" file1.tmp)
f2_line=$(sed -n "${x}p" file2.tmp)
if [ "${f1_line}" != "${f2_line}" ]; then
echo "line number ${x} doesn't match in $1 and $2" >> "${proc}_compare.result"
echo "$1 line: ${f1_line}" >> "${proc}_compare.result"
echo "$2 line: ${f2_line}" >> "${proc}_compare.result"
# so add this line to file 1
echo "${f2_line}" >> "$1"
fi
x=$((x + 1))
done
rm -f file1.tmp file2.tmp
echo "Lines check - DONE" | tee -a "${proc}_compare.result"
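The script above compares the sorted files position by position, so its result depends on how the lines happen to line up. A sketch of a different technique: check each line of the second file for an exact whole-line match in the first with grep -qxF, and append only the missing ones in their original order. append_missing is a hypothetical name.

```shell
#!/bin/bash
# Appends to file $1 every line of file $2 that $1 does not already contain.
append_missing() {
  local target=$1 source=$2 line
  # "|| [ -n "$line" ]" also handles a last line without a trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    # -q quiet, -x whole-line match, -F fixed string; -- protects lines starting with "-"
    if ! grep -qxF -- "$line" "$target"; then
      printf '%s\n' "$line" >> "$target"
    fi
  done < "$source"
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'ABC\nXYZ\nPQR\n' > "$dir/f1"
printf 'MNO\nDES\nABC\n' > "$dir/f2"
append_missing "$dir/f1" "$dir/f2"
cat "$dir/f1"    # ABC, XYZ, PQR, MNO, DES on separate lines
```

Because the target file is re-checked after every append, a line repeated within the second file is still added only once.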
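Use grep -F (fgrep is the older, deprecated name for it); -x restricts matches to whole lines, so a line such as ABCD is not treated as already present just because file1 contains ABC:
grep -Fxvf file1 file2 > file2.tmp; cat file2.tmp >> file1; rm -f file2.tmp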
which fetches all lines of file2 that are not in file1 and appends the result to file1.
You may want to take a look at this post: grep -f maximum number of patterns?
Perl one-liner:
file one:
1
2
3
file two:
1
4
3
Print only unique lines:
perl -lne 'print if ++$n{ $_ } == 1 ' file_one.txt file_two.txt
Or
perl -lne 'print unless $n{ $_ }++ ' file_one.txt file_two.txt
Output:
1
2
3
4
The natural way:
sort -u File1 File2 >Temp && mv Temp File1
The tricky way if the files are already sorted:
comm File1 File2 | awk '{$1=$1};1' >Temp && mv Temp File1
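If rewriting File1 in sorted order is not wanted, comm can instead select only the lines unique to File2 and append them. A sketch assuming bash (for the <( ) process substitution); LC_ALL=C makes sort and comm agree on the ordering, and append_new is a hypothetical name. The appended lines come out sorted, not in File2's original order.

```shell
#!/bin/bash
# Appends the lines of $2 that are not in $1 to the end of $1.
append_new() {
  local new
  # comm -13 hides lines unique to the first input and lines common to both
  new=$(LC_ALL=C comm -13 <(LC_ALL=C sort -u "$1") <(LC_ALL=C sort -u "$2"))
  if [ -n "$new" ]; then
    printf '%s\n' "$new" >> "$1"
  fi
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'ABC\nXYZ\nPQR\n' > "$dir/File1"
printf 'MNO\nDES\nABC\n' > "$dir/File2"
append_new "$dir/File1" "$dir/File2"
cat "$dir/File1"    # ABC, XYZ, PQR, DES, MNO on separate lines
```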
I'm using comm in an infinite loop to see whether a new file has arrived in a folder, but I don't get the difference between the two files. For example, if a file "a" arrives, I see this output:
a a.out a.txt b.txt test.cpp testshell.sh
a.out a.txt b.txt test.cpp testshell.sh
My code is this:
#! /bin/ksh
ls1=$(ls);
echo $ls1 > a.txt;
while [[ 1 > 0 ]] ; do
ls2=$(ls);
echo $ls2 > b.txt;
#cat b.txt;
#sort b.txt > b.txt;
#diff -u a.txt b.txt;
#diff -a --suppress-common-lines -y a.txt b.txt
comm -3 a.txt b.txt;
printf "\n";
ls1=$ls2;
echo $ls1 > a.txt;
#cat a.txt;
#sleep 2;
#sort a.txt > a.txt;
done
THANKS
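#! /bin/ksh
# set -vx   # uncomment to trace execution
PreCycle="$( ls -1 )"
while true
do
ThisCycle="$( ls -1 )"
# names present in only one of the two listings = added or removed entries
printf '%s\n%s\n' "${PreCycle}" "${ThisCycle}" | sort | uniq -u
PreCycle="${ThisCycle}"
sleep 10
done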
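This gives the added and removed entries without using a file. It could directly show only the new files in the same way, but uniq -f 1 failed when used on a list prefixed by + and - depending on the source. (Probably because uniq -f 1 skips the first blank-separated field: with the prefix attached directly to the name, as in +name, the whole name is skipped and every line compares equal; a separating space, as in "+ name", avoids this.)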
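A sketch of the new-files-only case with comm, which needs sorted input with one name per line; the question's echo $ls1 > a.txt joins all names onto a single line, so comm can only report the whole listing as different. It is written for bash rather than ksh, and new_entries, watch_dir and the 10-second interval are illustrative choices.

```shell
#!/bin/bash
# Prints names that appear in listing $2 but not in listing $1
# (each listing is a newline-separated string, e.g. the output of ls -1).
new_entries() {
  LC_ALL=C comm -13 <(printf '%s\n' "$1" | LC_ALL=C sort) \
                    <(printf '%s\n' "$2" | LC_ALL=C sort)
}

# Polls the current directory and prints newly created names (runs until
# interrupted; defined here but not called).
watch_dir() {
  local prev cur
  prev=$(ls -1)
  while sleep 10; do
    cur=$(ls -1)
    new_entries "$prev" "$cur"
    prev=$cur
  done
}

prev=$(printf '%s\n' a.out a.txt b.txt test.cpp testshell.sh)
cur=$(printf '%s\n' a a.out a.txt b.txt test.cpp testshell.sh)
new_entries "$prev" "$cur"    # prints a
```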
After writing some Unix scripts, I managed to convert data from different XML files to CSV format, and now I am stuck on the following problem.
file1.csv contains:
1,5,6,7,8
2,3,4,5,9
1,6,10,11,12
1,5,11,12
file2.csv contains:
1,Mango,Tuna,Webby,Through,Franky,Sam,Sumo
2,Franky
3,Sam
4,Sumo
5,Mango,Tuna,Webby
6,Tuna,Webby,Through
7,Through,Sam,Sumo
8,Nothing
9,Sam,Sumo
10,Sumo,Mango,Tuna
11,Mango,Tuna,Webby,Through
12,Mango,Tuna,Webby,Through,Franky
The output I want is:
1,5,6,7,8
Mango,Tuna,Webby,Through,Franky,Sam,Sumo
Mango,Tuna,Webby
Tuna,Webby,Through
Through,Sam,Sumo
Nothing
Common word: None
2,3,4,5,9
Franky
Sam
Sumo
Mango,Tuna,Webby
Sam,Sumo
Common word: None
1,6,10,11,12
Mango,Tuna,Webby,Through,Franky,Sam,Sumo
Tuna,Webby,Through
Sumo,Mango,Tuna
Mango,Tuna,Webby,Through
Mango,Tuna,Webby,Through,Franky
Common word: Tuna
1,5,11,12
Mango,Tuna,Webby,Through,Franky,Sam,Sumo
Mango,Tuna,Webby
Mango,Tuna,Webby,Through
Mango,Tuna,Webby,Through,Franky
Common word: Mango,Tuna,Webby
I appreciate any help.
Thanks
I have a partial solution, but it is not complete:
#!/bin/bash
count=1
count_2=1
for i in `cat file1.csv`
do
echo $i > $count.txt
cat $count.txt | tr "," "\n" > $count_2.txt
count=`expr $count + 1`
count_2=`expr $count_2 + 1`
done;
# this code creates a separate file for each line in file1.csv
bash file3_search.sh
##########################
file3_search.sh
================
#!/bin/bash
cat file2.csv | sed '/^$/d' | sed 's/[ ]*$//' > trim.txt
dos2unix -q 1.txt 1.txt
dos2unix 2.txt 2.txt
dos2unix 3.txt 3.txt
echo "1st Combination results"
for i in `cat 1.txt`
do
cat trim.txt | egrep -w $i
done > Combination1.txt;
echo "2nd Combination results"
for i in `cat 2.txt`
do
cat trim.txt | egrep -w $i
done > Combination2.txt;
echo "3rd Combination results"
for i in `cat 3.txt`
do
cat trim.txt | egrep -w $i
done > Combination3.txt;
I am not good at programming (I am a software tester). Could someone please refactor my code and also tell me how to get the common words in those Combination.txt files?
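For the last part (finding the common words in a Combination file), one possible sketch: drop the id column, count how many rows each word appears in, and keep the words whose count equals the number of rows. It assumes each row looks like id,word,word,... as in trim.txt and that a word is not repeated within a row; common_words is a hypothetical name, and the sample file holds the rows for the third line of file1.csv (1,6,10,11,12).

```shell
#!/bin/bash
# Prints the words that occur in every row of a Combination file.
common_words() {
  local file=$1 rows
  rows=$(grep -c . "$file")             # number of non-empty rows
  cut -d, -f2- "$file" |                # drop the id column
    tr ',' '\n' |                       # one word per line
    sort | uniq -c |                    # count occurrences of each word
    awk -v n="$rows" '$1 == n {print $2}'
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' '1,Mango,Tuna,Webby,Through,Franky,Sam,Sumo' \
  '6,Tuna,Webby,Through' '10,Sumo,Mango,Tuna' \
  '11,Mango,Tuna,Webby,Through' '12,Mango,Tuna,Webby,Through,Franky' \
  > "$dir/Combination3.txt"
common_words "$dir/Combination3.txt"    # prints Tuna
```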
IMHO it works:
while IFS= read -r line ; do
echo "$line"
# turn "1,5,6" into the anchored regex "^(1,|5,|6,)"
grepline=$(echo "$line" | sed 's/ //g;s/,/,|/g;s/^\(.*\)$/^(\1,)/')
grep -E "$grepline" file2.csv | cut -d, -f2-
grep -E "$grepline" file2.csv |
awk -F "," '
{ for (i=2;i<=NF;i++)
{s[$i]+=1}
}
END { for (key in s)
{if (s[key]==NR) { tp = tp key "," }
}
if (tp!="") {sub(/,$/, "", tp); print "Common word(s): " tp}
else {print "Common word: None"}}'
echo
done < file1.csv
HTH
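A sketch of the same idea in a single awk pass, which avoids re-running grep for every line: the first file read (file2.csv) is stored by id, then each line of file1.csv is expanded and its common words counted. group_words is a hypothetical wrapper; delete on a whole array is supported by gawk, mawk and BSD awk, and for (k in count) visits words in no particular order, so several common words may print in a different order than in the question.

```shell
#!/bin/bash
# Usage: group_words file2.csv file1.csv
group_words() {
  awk -F, '
    # first file: remember the word list that follows each id
    NR == FNR { row[$1] = substr($0, length($1) + 2); next }
    # second file: print the id list, each referenced row, then the common words
    {
      print
      delete count
      for (i = 1; i <= NF; i++) {
        print row[$i]
        n = split(row[$i], w, ",")
        delete seen
        for (j = 1; j <= n; j++) if (!seen[w[j]]++) count[w[j]]++
      }
      common = ""
      for (k in count) if (count[k] == NF) common = common (common == "" ? "" : ",") k
      print "Common word: " (common == "" ? "None" : common)
      print ""
    }' "$1" "$2"
}

dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf '%s\n' 1,5,6,7,8 2,3,4,5,9 1,6,10,11,12 1,5,11,12 > "$dir/file1.csv"
printf '%s\n' 1,Mango,Tuna,Webby,Through,Franky,Sam,Sumo 2,Franky 3,Sam 4,Sumo \
  5,Mango,Tuna,Webby 6,Tuna,Webby,Through 7,Through,Sam,Sumo 8,Nothing \
  9,Sam,Sumo 10,Sumo,Mango,Tuna 11,Mango,Tuna,Webby,Through \
  12,Mango,Tuna,Webby,Through,Franky > "$dir/file2.csv"
group_words "$dir/file2.csv" "$dir/file1.csv"
```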
Here's an answer for you. It depends on the associative array support in bash version 4:
IFS=,
declare -a words
# read and store the words in file2
while read -r line; do
set -- $line
n=$1
shift
words[$n]="$*"
done < file2.csv
# read file1 and process
while read -r line; do
echo "$line"
set -- $line
indexes=( "$@" )
NF=${#indexes[@]}
declare -A common
for (( i=0; i<$NF; i++)); do
echo "${words[${indexes[$i]}]}"
set -- ${words[${indexes[$i]}]}
for word; do
common[$word]=$(( ${common[$word]} + 1))
done
done
printf "Common words: "
n=0
for word in "${!common[@]}"; do
if [[ ${common[$word]} -eq $NF ]]; then
printf "%s " $word
(( n++ ))
fi
done
[[ $n -eq 0 ]] && printf "None"
unset common
printf "\n\n"
done < file1.csv