For loop function doesn't work within a while loop - bash

I am looking to run the same steps for each gene in my genelist; that is what the while loop does. For each gene, it extracts the matching lines from the master file into a new bed file.
The number_of_lines variable is the number of rows in that file, and I want to create a second file whose number of rows equals number_of_lines, with each row containing that number.
For example:
number_of_lines=1
output:
1
number_of_lines=5
output:
5
5
5
5
5
My code:
while read gene
do
grep -w $gene $masterfile | awk '{print $1"\t"$2"\t"$3"\t"$5"\t"$6"\t"$4}' > $gene.bed
number_of_lines=$(grep "^.*$" -c $gene.bed)
echo $number_of_lines
cat "" > $gene.1.bed
for i in 'eval echo {1..$number_of_lines}'
do
echo $number_of_lines >> $gene.1.bed
done
done < $genelist
If I run this part by itself:
cat "" > $gene.1.bed
for i in 'eval echo {1..$number_of_lines}'
do
echo $number_of_lines >> $gene.1.bed
done
it seems to work. Why?

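The single quotes make 'eval echo {1..$number_of_lines}' one literal string, so the for loop runs exactly once, no matter what number_of_lines is. You need to put eval echo {1..$number_of_lines} inside $() so that the loop iterates over the command's output.
cat "" will give an error (there is no file with an empty name); to create or empty a file, use : > "$gene.1.bed" rather than echo "", which would write an empty line. Simpler still is to put the output redirection after the entire loop instead of after each echo statement.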
while read gene
do
grep -w "$gene" "$masterfile" | awk '{print $1"\t"$2"\t"$3"\t"$5"\t"$6"\t"$4}' > "$gene.bed"
number_of_lines=$(grep "^.*$" -c "$gene.bed")
echo $number_of_lines
for i in $(eval echo {1..$number_of_lines})
do
echo $number_of_lines
done > "$gene.1.bed"
done < "$genelist"
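To see why the quoting matters, here is a small bash sketch. count_iterations is a hypothetical helper that only counts how many words the for loop receives, and number_of_lines=3 is a made-up value: with the quotes the loop body runs once, while with $( ) eval expands {1..3} and the loop runs three times.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count how many times a for loop body runs
# for the words passed as arguments.
count_iterations() {
  local n=0 i
  for i in "$@"; do
    n=$((n + 1))
  done
  echo "$n"
}

number_of_lines=3

# Single quotes: the whole text is one literal word, so one iteration.
quoted=$(count_iterations 'eval echo {1..$number_of_lines}')

# $( ): eval turns {1..3} into "1 2 3", which splits into three words.
expanded=$(count_iterations $(eval echo {1..$number_of_lines}))

echo "quoted: $quoted, expanded: $expanded"
```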

When you see eval, you "know" your code is wrong. Barmar already pointed out in a comment the normal construction, for ((i=0; i<$number_of_lines; i++)), which is what should be used here. Since all the lines have the same content, there is another possibility: yes. I made some other changes too.
while read gene
do
grep -w "${gene}" "${masterfile}" |
awk 'BEGIN {OFS="\t";} {print $1, $2, $3, $5, $6, $4}' > "${gene}.bed"
number_of_lines=$(wc -l < "${gene}.bed")
echo "${number_of_lines}"
yes "${number_of_lines}" | head -n "${number_of_lines}" > "${gene}.1.bed"
done < "${genelist}"
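For comparison, here is a minimal bash sketch of the for (( )) form mentioned above next to the yes version. The function names repeat_count_for and repeat_count_yes are hypothetical; both print the number n on n lines.

```shell
#!/usr/bin/env bash
# Write the number n on n lines with a C-style arithmetic for loop,
# instead of eval + brace expansion.
repeat_count_for() {
  local n=$1 i
  for ((i = 0; i < n; i++)); do
    echo "$n"
  done
}

# The same output with yes + head, as in the answer above.
repeat_count_yes() {
  local n=$1
  yes "$n" | head -n "$n"
}

repeat_count_for 5
```

Either call, redirected as in repeat_count_for "$number_of_lines" > "${gene}.1.bed", could stand in for the eval loop; the arithmetic loop avoids eval entirely, while yes + head is shorter when every line is identical.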

Related

copy text in another file and append different strings shell script

file=$2
isHeader=$true
while read -r line;
do
if [ $isHeader ]
then
sed "1i$line",\"BATCH_ID\"\n >> $file
else
sed "$line,1"\a >> $file
fi
isHeader=$false
done < $1
echo $file
To the first line I want to append one string, and to each of the other lines I want to append a different string (the same one for all of them). I tried this but it doesn't work, and I'm out of ideas. Can somebody help me, please?
It's not entirely clear to me what you want to do, but if you simply want to append text to the end of each line, use echo in place of sed:
file=$2
isHeader=1
while IFS= read -r line;
do
if [ "$isHeader" -eq 1 ]
then
#sed "1i$line",\"BATCH_ID\"\n >> $file
echo "${line},\"BATCH_ID\"" > "$file"
else
#sed "$line,1"\a >> $file
echo "${line},1\a" >> "$file"
fi
isHeader=0
done < "$1"
cat "$file"
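The accepted answer is slightly wrong because, with shells whose echo interprets backslash escapes, echo "...\a" outputs a bell character (elsewhere you get a literal \a). Also, awk and sed support regular expressions and are roughly 10x faster than a shell read loop at line-by-line processing. Here it is in awk: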
#! /bin/sh
script='NR == 1 { print $0 ",\"BATCH_ID\"" }
NR > 1 { print $0 ",1" }'
awk "$script" "$1" > "$2"
In sed it's even simpler:
sed '1 s/$/,"BATCH_ID"/; 2,$ s/$/,1/' "$1" > "$2"
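A quick way to check that the sed and awk versions agree is to run both on a small sample. This is a sketch: add_batch_sed and add_batch_awk are hypothetical wrapper names, and the CSV lines are made up.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers around the sed and awk one-liners above:
# append ,"BATCH_ID" to the first line and ,1 to every other line.
add_batch_sed() {
  sed '1 s/$/,"BATCH_ID"/; 2,$ s/$/,1/' "$@"
}

add_batch_awk() {
  awk 'NR == 1 { print $0 ",\"BATCH_ID\"" }
       NR > 1  { print $0 ",1" }' "$@"
}

# Made-up sample CSV on standard input.
printf '%s\n' '"name","value"' '"a",10' '"b",20' | add_batch_sed
```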
To convince yourself of the speed difference, try this:
$ time seq 100000 | while read f; do echo ${f}foo; done > /dev/null
real 0m2.068s
user 0m1.708s
sys 0m0.364s
$ time seq 100000 | sed 's/$/foo/' > /dev/null
real 0m0.166s
user 0m0.156s
sys 0m0.017s

Simple bash script to split csv file by week number

I'm trying to split a large pipe-delimited file based on a week number field. The file contains data for a full year and therefore has 53 weeks. I am hoping to create a loop that does the following:
1) check whether the week number is less than 10 and, if it is, put a '0' in front
2) use grep to send the matching rows to a file (e.g. `grep '|01|' bigFile.txt > smallFile.txt`)
3) gzip the smaller file (e.g. `gzip smallFile.txt`)
4) repeat for the next week
Is there a resource that shows how to do this?
EDIT :
Data looks like this:
1|#gmail|1|0|0|0|1|01|com
1|#yahoo|0|1|0|0|0|27|com
The column I care about is the 2nd from the right.
EDIT 2:
Here's the script I'm using, but it doesn't work:
for (( i = 1; i <= 12; i++ )); do
#statements
echo 'i :'$i
q=$i
# echo $q
# $q==10
if [[ q -lt 10 ]]; then
#statements
k='0'$q
echo $k
grep '|$k|' 20150226_train.txt > 'weeks_files/week'$k
gzip weeks_files/week $k
fi
if [[ q -gt 9 ]]; then
#statements
echo $q
grep \'|$q|\' 20150226_train.txt > 'weeks_files/week'$q
gzip 'weeks_files/week'$q
fi
done
Very simple in awk ...
awk -F'|' '{ print > ("smallfile-" $(NF-1) ".txt") }' bigfile.txt
Edit: parentheses added around the file name for "original-awk".
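Here is a minimal sketch that runs the awk approach on the two sample rows from the question plus one made-up row, in a temporary directory, so you can see the files it creates (smallfile-01.txt and smallfile-27.txt). It assumes, as above, that the week number is always the second-to-last field and is already zero-padded.

```shell
#!/usr/bin/env bash
# Run the awk split on sample data inside a throwaway directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Two rows from the question plus one made-up row for week 01.
printf '%s\n' \
  '1|#gmail|1|0|0|0|1|01|com' \
  '1|#yahoo|0|1|0|0|0|27|com' \
  '1|#gmail|0|0|1|0|0|01|com' > bigfile.txt

# One output file per value of the second-to-last field.
awk -F'|' '{ print > ("smallfile-" $(NF-1) ".txt") }' bigfile.txt

ls smallfile-*.txt
```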
You're almost there.
#!/bin/bash
for (( i = 1; i <= 12; i++ )); do
#statements
echo 'i :'$i
q=$i
# echo $q
# $q==10
#OLD if [[ q -lt 10 ]]; then
if [[ $q -lt 10 ]]; then
#statements
k='0'$q
echo $k
#OLD grep '|$k|' 20150226_train.txt > 'weeks_files/week'$k
grep "|$k|" 20150226_train.txt > 'weeks_files/week'$k
#OLD gzip weeks_files/week $k
gzip weeks_files/week$k
#OLD fi
#OLD if [[ q -gt 9 ]]; then
elif [[ $q -gt 9 ]] ; then
#statements
echo $q
#OLD grep \'|$q|\' 20150226_train.txt > 'weeks_files/week'$q
grep "|$q|" 20150226_train.txt > 'weeks_files/week'$q
gzip 'weeks_files/week'$q
fi
done
You didn't always use $ in front of your variable names. You can get away with writing k or q without a $ only in arithmetic contexts, such as the shell's arithmetic substitution, i.e. z=$(( x+k )), or when operating on a variable, as in (( k++ )). There are others, including the operands of -lt and -gt inside [[ ]], which is why [[ q -lt 10 ]] happens to work in bash; everywhere else, such as in grep patterns and file names, you need the $.
You need to learn the difference between single quoting and double quoting. Use double quotes when you want a variable's value substituted, as in your line
grep "|$q|" 20150226_train.txt > 'weeks_files/week'$q
and others.
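I'm guessing that your use of grep \'|$q|\' 20150226_train.txt was an attempt to get the value of $q. Because the | characters there are not quoted, the shell treats them as pipes, so that line actually runs grep \' | $q | \' 20150226_train.txt as a three-command pipeline.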
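A small sketch of the difference between the two kinds of quoting, using a made-up week value k=05 and two sample rows:

```shell
#!/usr/bin/env bash
# Single quotes keep $k literal; double quotes substitute its value.
k=05
single='|$k|'
double="|$k|"
echo "single: $single"   # prints: single: |$k|
echo "double: $double"   # prints: double: |05|

# With double quotes, grep looks for the actual week value.
# (| is an ordinary character in a basic grep pattern.)
printf '%s\n' '1|#gmail|1|0|0|0|1|05|com' '1|#yahoo|0|1|0|0|0|27|com' |
  grep "|$k|"
```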
The way to get comfortable debugging this sort of situation is to turn on the shell's debugging option with set -x (turn it off with set +x). You'll see each line as it is executed, with the values substituted for the variables. For more targeted debugging, add print statements such as echo "var of interest now = $var". You can also use set -vx (and set +vx) to see each line or block of code as it is read, before it is executed; the -x output then shows which lines were actually executed. For your script, you'd see the whole if ... elif ... fi block printed, and then just the -x lines with values for the variables. It can be confusing, even after years of looking at it. ;-)
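As a minimal sketch of what set -x shows (trace_demo is a hypothetical function), this captures only the trace, which bash writes to standard error, prefixing each executed command with + and showing the variables already substituted:

```shell
#!/usr/bin/env bash
# Turn on tracing for a few commands; bash writes the trace to stderr,
# each traced command prefixed with "+ " and its variables expanded.
trace_demo() {
  local k
  set -x
  k=05
  echo "|$k|"
  set +x
}

# Keep only the trace: send stdout to /dev/null, capture stderr.
trace=$(trace_demo 2>&1 >/dev/null)
printf '%s\n' "$trace"
```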
So you can go through and remove all lines with the prefix #OLD, and I hope your code will then work for you.
I hope this helps.
mkdir -p weeks_files &&
awk -F'|' '
{ file=sprintf("weeks_files/week%02d",$(NF-1)); print > file }
!seen[file]++ { print file }
' 20150226_train.txt |
xargs gzip
If your data is ordered so that all of the rows for a given week number are contiguous, you can make it simpler and more efficient (closing each file once its week is done):
mkdir -p weeks_files &&
awk -F'|' '
$(NF-1) != prev { if (file) close(file); file=sprintf("weeks_files/week%02d",$(NF-1)); print file }
{ print > file; prev=$(NF-1) }
' 20150226_train.txt |
xargs gzip
There are certainly a number of approaches. The awk line below reformats your data so that every week number has two digits; if you take a sequential approach, then:
1) awk to reformat
awk -F '|' '{printf "%s|%s|%s|%s|%s|%s|%s|%02d|%s\n", $1, $2, $3, $4, $5, $6, $7, $8, $9}' SOURCE_FILE > bigFile.txt
2) loop through the weeks, create a small file for each week and gzip it
for N in {01..53}
do
grep "|${N}|" bigFile.txt > smallFile.${N}.txt
gzip smallFile.${N}.txt
done
3) test script showing reformat step
#!/bin/bash
function show_data {
# Data set w/9 'fields'
# 1| 2 |3|4|5|6|7| 8|9
cat << EOM
1|#gmail|1|0|0|0|1|01|com
1|#gmail|1|0|0|0|1|2|com
1|#gmail|1|0|0|0|1|5|com
1|#yahoo|0|1|0|0|0|27|com
EOM
}
###
function stars {
echo "## $* ##"
}
###
stars "Raw data"
show_data
stars "Modified data"
# 1| 2| 3| 4| 5| 6| 7| 8|9 ##
show_data | awk -F '|' '{printf "%s|%s|%s|%s|%s|%s|%s|%02d|%s\n", $1, $2, $3, $4, $5, $6, $7, $8, $9}'
Sample run:
$ bash test.sh
## Raw data ##
1|#gmail|1|0|0|0|1|01|com
1|#gmail|1|0|0|0|1|2|com
1|#gmail|1|0|0|0|1|5|com
1|#yahoo|0|1|0|0|0|27|com
## Modified data ##
1|#gmail|1|0|0|0|1|01|com
1|#gmail|1|0|0|0|1|02|com
1|#gmail|1|0|0|0|1|05|com
1|#yahoo|0|1|0|0|0|27|com
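Step 2 relies on {01..53} keeping the leading zero, which bash does from version 4 on. As a sketch (the function names are hypothetical), this compares it with a printf '%02d' loop that produces the same list without relying on that brace feature:

```shell
#!/usr/bin/env bash
# bash 4+ keeps the leading zero in {01..53}.
weeks_brace() {
  printf '%s\n' {01..53}
}

# printf '%02d' pads the same way without relying on that brace feature.
weeks_printf() {
  local n
  for ((n = 1; n <= 53; n++)); do
    printf '%02d\n' "$n"
  done
}

weeks_brace | head -n 3
```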

How to pass filename through variable to be read it by awk

Good day,
I was wondering how to pass a filename to awk in a variable, so that awk reads that file.
So far I have done:
echo file1 > Aenumerar
echo file2 >> Aenumerar
echo file3 >> Aenumerar
AE=`grep -c '' Aenumerar`
r=1
while [ $r -le $AE ]; do
lista=`awk "NR==$r {print $0}" Aenumerar`
AEList=`grep -c '' $lista`
s=1
while [ $s -le $AEList ]; do
word=`awk -v var=$s 'NR==var {print $1}' $lista`
echo $word
let "s = s + 1"
done
let "r = r + 1"
done
Thanks so much in advance for any clue, or for any simpler way to do this on the bash command line.
Instead of:
awk "NR==$r {print $0}" Aenumerar
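You need to use the version below. Inside double quotes, the shell expands $0 (to the name of your script or shell) before awk ever sees the program, so pass r in with -v and keep the program in single quotes: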
awk -v r="$r" 'NR==r' Aenumerar
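A minimal sketch of the -v approach, with nth_line as a hypothetical helper and a made-up three-line list file standing in for Aenumerar:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print line r of a file, passing r to awk with -v
# so the program itself can stay in single quotes.
nth_line() {
  local r=$1 file=$2
  awk -v r="$r" 'NR == r' "$file"
}

# Made-up list file like Aenumerar.
list=$(mktemp)
trap 'rm -f "$list"' EXIT
printf '%s\n' file1 file2 file3 > "$list"

nth_line 2 "$list"
```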
Judging by what you've posted, you don't actually need all the NR stuff; you can replace your whole script with this:
while IFS= read -r lista ; do
awk '{print $1}' "$lista"
done < Aenumerar
(This will print the first field of each line in each of file1, file2, file3. I think that's what you're trying to do?)
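To try the loop without touching real data, here is a sketch that builds made-up files in a temporary directory; first_fields is a hypothetical wrapper around the loop above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first field of every line of every file
# whose name is listed (one per line) in $1.
first_fields() {
  local listfile=$1 lista
  while IFS= read -r lista; do
    awk '{ print $1 }' "$lista"
  done < "$listfile"
}

# Made-up sample files in a temporary directory.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'alpha 1\nbeta 2\n' > "$dir/file1"
printf 'gamma 3\n' > "$dir/file2"
printf '%s\n' "$dir/file1" "$dir/file2" > "$dir/Aenumerar"

first_fields "$dir/Aenumerar"
```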

Bash script read specific value from files of an entire folder

I have a problem creating a script that reads specific values from all the files in a folder.
I have a number of email files in a directory, and I need to extract two specific values from each file.
After that I have to put them into a new file that looks like this:
--------------
To: value1
value2
--------------
This is what I want to do, but I don't know how to create the script:
# I am putting the name of the files into a temp file
`ls -l | awk '{print $9 }' >tmpfile`
# use for the name of a file
`date=`date +"%T"
# The first specific value from file (phone number)
var1=`cat tmpfile | grep "To: 0" | awk '{print $2 }' | cut -b -10 `
# The second specific value from file(subject)
var2=cat file | grep Subject | awk '{print $2$3$4$5$6$7$8$9$10 }'
# Put the first value in a new file on the first row
echo "To: 4"$var1"" > sms-$date
# Put the second value in the same file on the second row
echo ""$var2"" >>sms-$date
.......
and do the same for every file in the directory.
I tried using while and for loops, but I couldn't finish the script.
Thank You
I've made a few changes to your script, hopefully they will be useful to you:
#!/bin/bash
for file in *; do
var1=$(awk '/To: 0/ {print substr($2,1,10)}' "$file")
var2=$(awk '/Subject/ {for (i=2; i<=10; ++i) s=s$i; print s}' "$file")
date=$(date +"%T")
outfile="sms-$date"
i=0
while [ -f "$outfile" ]; do outfile="sms-$date-"$((i++)); done
echo "To: 4$var1" > "$outfile"
echo "$var2" >> "$outfile"
done
The for loop just goes through every file in the folder that you run the script from.
I have added an additional suffix $i to the end of the file name. If no file with the same date already exists, the file is created without the suffix; otherwise the value of $i keeps increasing until there is no file with that name.
I'm using $( ) rather than backticks. This is mostly personal preference, but I find it clearer, especially when there are other quotes about.
There's not usually any need to pipe the output of grep to awk. You can do the search in awk using the / / syntax.
I have removed the cut -b -10 and replaced it with substr($2, 1, 10), which prints the first 10 characters of column 2 (awk string positions start at 1).
It's not much shorter, but I used a loop rather than $2$3...; I think it looks a bit neater.
There's no need for all the extra " in the two output lines.
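To test the two awk extractions on a single message before running the whole loop, here is a minimal sketch. sms_lines is a hypothetical function, the email file is made up, and it assumes, like the script above, that the phone number is the second field of a line containing "To: 0" and that the subject words follow "Subject" on its line. Unlike the script above, it resets s for each Subject line.

```shell
#!/usr/bin/env bash
# Hypothetical function: build the two-line SMS body for one email file,
# using the same awk extractions as the script above.
sms_lines() {
  local file=$1 var1 var2
  var1=$(awk '/To: 0/ { print substr($2, 1, 10) }' "$file")
  var2=$(awk '/Subject/ { s = ""; for (i = 2; i <= 10; ++i) s = s $i; print s }' "$file")
  printf 'To: 4%s\n%s\n' "$var1" "$var2"
}

# Made-up email file.
mail=$(mktemp)
trap 'rm -f "$mail"' EXIT
printf '%s\n' 'To: 0123456789012' 'Subject: Meeting moved to 3pm' > "$mail"

sms_lines "$mail"
```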
I suggest trying the following:
#!/bin/sh
RESULT_FILE=sms-`date +"%T"`
DIR=.
grep -lF 'To: 0' "$DIR"/* | while IFS= read -r FILE; do
var1=`grep -F 'To: 0' "$FILE" | awk '{print $2 }' | cut -b -10`
var2=`grep -F 'Subject' "$FILE" | awk '{print $2$3$4$5$6$7$8$9$10 }'`
echo "To: 4$var1" >>"$RESULT_FILE"
echo "$var2" >>"$RESULT_FILE"
done

Dynamic Patch Counter for Shell Script

I am developing a script on a Solaris 10 SPARC machine to calculate how many patches got installed successfully during a patch delivery. I would like to display to the user:
(X) of 33 patches were successfully installed
I would like my script to update the "X" in place, so the user knows there is activity occurring, sort of like a counter. I am able to show counts, but only on a new line each time. How can I make the number in the brackets update dynamically as the script performs its checks? Don't worry about the "pass/fail" ... I am mainly concerned with making my output update in the brackets.
for x in `cat ${PATCHLIST}`
do
if ( showrev -p $x | grep $x > /dev/null 2>&1 ); then
touch /tmp/patchcheck/* | echo "pass" >> /tmp/patchcheck/$x
wc /tmp/patchcheck/* | tail -1 | awk '{print $1}'
else
touch /tmp/patchcheck/* | echo "fail" >> /tmp/patchcheck/$x
wc /tmp/patchcheck/* | tail -1 | awk '{print $1}'
fi
done
The usual way to do that is to emit a \r carriage return (CR) at some point and to omit the \n newline or line feed (LF) at the end of the line. Since you're using awk, you can try:
awk '{printf "\r%s", $1} END {print ""}'
For each line, it outputs a carriage return and the data in field 1 (with no newline at the end). At the end of the input, it prints an empty string followed by a newline.
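Here is a minimal sketch of the carriage-return technique in the format from the question. show_progress is a hypothetical function, the total of 33 is taken from the question, and the counts are simply read from standard input. Depending on the awk implementation, output without a trailing newline may be buffered, so the display can lag behind the real work.

```shell
#!/usr/bin/env bash
# Rewrite one status line in place: each count starts with a carriage
# return (\r) and has no newline; a single newline is printed at the end.
show_progress() {
  local total=$1
  awk -v total="$total" '
    { printf "\r(%s) of %s patches were successfully installed", $1, total }
    END { print "" }'
}

for n in 1 2 3; do echo "$n"; done | show_progress 33
```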
Another possibility is to place the awk script outside your for loop, so that a single awk process handles all of the counts:
for x in `cat ${PATCHLIST}`
do
if ( showrev -p $x | grep $x > /dev/null 2>&1 ); then
touch /tmp/patchcheck/* | echo "pass" >> /tmp/patchcheck/$x
wc /tmp/patchcheck/* | tail -1
else
touch /tmp/patchcheck/* | echo "fail" >> /tmp/patchcheck/$x
wc /tmp/patchcheck/* | tail -1
fi
done | awk '{ printf "\r%s", $1} END { print "" }'
I'm not sure, but I think you can apply similar streamlining to the rest of the repetitive code in the script:
for x in `cat ${PATCHLIST}`
do
if showrev -p $x | grep -s $x
then echo "pass"
else echo "fail"
fi >> /tmp/patchcheck/$x
wc /tmp/patchcheck/* | tail -1
done | awk '{ printf "\r%s", $1} END { print "" }'
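This eliminates the touch, which doesn't seem to do much, especially since its empty output is piped to echo, which ignores its standard input. It also eliminates the sub-shell in the if line, and it uses the -s option of grep to keep it quiet. Note that with a POSIX grep, -s only suppresses error messages about nonexistent or unreadable files; because the >> redirection after fi also captures grep's output, use -q (or > /dev/null) if matching lines show up in the /tmp/patchcheck files.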
I'm still a bit dubious about the wc line. I think you're in effect looking to count the number of files, since each file should contain one line (pass or fail), unless you listed some patch twice in the file identified by ${PATCHLIST}. So I'd probably use:
for x in `cat ${PATCHLIST}`
do
if showrev -p $x | grep -s $x
then echo "pass"
else echo "fail"
fi >> /tmp/patchcheck/$x
ls /tmp/patchcheck | wc -l
done | awk '{ printf "\r%s", $1} END { print "" }'
This lists the files in /tmp/patchcheck and counts the lines of output. It means you could simply print $0 in the awk script, since $0 and $1 are essentially the same (apart from any leading blanks wc adds). To the extent efficiency matters (not a lot), this is more efficient, because ls only scans a directory rather than having wc open each file. More importantly, it is a more accurate description of what you are trying to do. If you later want to count the passes, you can use:
for x in `cat ${PATCHLIST}`
do
if showrev -p $x | grep -s $x
then echo "pass"
else echo "fail"
fi >> /tmp/patchcheck/$x
grep '^pass$' /tmp/patchcheck/* | wc -l
done | awk '{ printf "\r%s", $1} END { print "" }'
Of course, this goes back to reading each file, but you're getting more refined information out of it now (and that's the penalty for the more refined information).
Here is how I got my patch installation script working the way I wanted:
while read pkgline
do
patchadd -d ${pkgline} >> /var/log/patch_install.log 2>&1
# Create audit file for progress indicator
for x in ${pkgline}
do
if ( showrev -p ${x} | grep -i ${x} > /dev/null 2>&1 ); then
echo "${x}" >> /tmp/pass
else
echo "${x}" >> /tmp/fail
fi
done
# Progress indicator
for y in `wc -l /tmp/pass | awk '{print $1}'`
do
printf "\r${y} out of `wc -l /patchdir/master | awk '{print $1}'` packages installed for `hostname`. Last patch installed: (${pkgline})"
done
done < /patchdir/master

Resources