Can anyone please explain this unix script to me? - bash

I recently had to debug some old scripts and got stuck at this code. Please explain to me what the awk is doing here.
#!/bin/ksh
set -x on
ls -1 ../Rejectfiles/*.csv 2>/dev/null | while read file
do
filename=${file##*/}
if [ -f ../Processed/$filename ]
then
awk '{ if (NR > 1){ print $0;}}' $file >> ../Processed/$filename
else
cp $file ../Processed/
fi
done

awk '{ if (NR > 1){ print $0;}}' $file >> ../Processed/$filename
Writes all lines from $file, except the first line, to ../Processed/$filename.
man awk | grep -i " NR "
NR current record number in the total input stream.
You can also use sed:
sed -n '1!p' $file >> ../Processed/$filename
Usually sed is faster.
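If all you need is to skip the header line, tail can do the same job:
tail -n +2 "$file" >> ../Processed/"$filename"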

man awk states clearly:
NR - ordinal number of the current record
As @Sundeep noted in the comments on @RichardS's answer:
awk '{ if (NR > 1){ print $0;}}' $file thus skips the first record of the file. Given that the input is a CSV file, the first record means the first line in the file. As @RichardS has mentioned, that makes perfect sense for a CSV file, since the first line of a CSV usually contains the description of the underlying values (the header).
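For illustration, with a hypothetical reject file (header and values made up), the command keeps everything except the header line:
$ cat reject.csv
id,name,reason
101,alice,duplicate
102,bob,missing field
$ awk 'NR > 1' reject.csv        # same effect as '{ if (NR > 1){ print $0;}}'
101,alice,duplicate
102,bob,missing field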

Related

For loop in an awk command

I have a file which has rows; now I want to read its values with an awk command in Unix. I am able to read the file, but I have added a for loop to traverse all the data in the file, and the for loop is not ending; it is going into an infinite loop.
Below is the code I am using to read the file and get the data at positions $1, $2 and $3:
file=$1;
nbrClients=`wc -l $file | cut -d' ' -f1`;
echo $nbrClients;
awk '{
for(i=0; i<=$nbrClients; ++i)
{print $1 $2 $3}
}' $file
The file I am reading has the format below:
abc 12 test.txt
abc 12 test.txt
abc 12 test.txt
abc 12 test.txt
abc 12 test.txt
abc 12 test.txt
So for this the nbrClients value will be 6 and it should loop 6 times, but it is not doing so. Please suggest what I am doing wrong here.
Here is the full code I am trying:
file=$1;
nbrClients=`wc -l $file | cut -d' ' -f1`;
echo $nbrClients;
file=$1;
cat | awk '{
fileName=$1
tnxCount=$2
for i in `seq 1 $tnxCount`
do
echo "Starting thread number $i"
nohup perl /home/user/abc.pl -i $fileName >>/home/user/test_load_${today}.out 2>&1 &
done
}' $file;
I think the problem here is that you're under the impression that the for loop is what will cause awk to step through your input file, whereas it's awk's nature to do that already.
Awk works by taking a set of condition { statement } pairs, and then FOR EACH LINE OF INPUT, evaluating the condition, and if it rings true, executing the statement. Note that conditions can be statements (since functions and other commands have a return value) and statements can include if constructs, so there's a lot of flexibility here.
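As a small made-up illustration, awk tries each rule below against every input line on its own; there is no explicit loop anywhere:
awk '
NR == 1   { print "header:", $0 }     # fires only on the first input line
NF >= 3   { print $1, $2, $3 }        # fires on any line with at least 3 fields
' "$file"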
Note that awk can also reduce or simplify stuff you'd do in a shell script. Consider the following:
#!/bin/sh
file="$1"
awk '
NR==FNR {
ClientCount++
next
}
FNR==1 {
printf "%s: %d\n", FILENAME, ClientCount
}
{
print $1, $2, $3
}
' "$file" "$file"
This script reads your input file twice -- once to count the lines (so that the line count can be placed at the top of the output), and once to process the lines, printing the first three fields. The script is composed of three condition { statement } groupings:
The first one is the counter. It only operates on the first instance of the file, and the next command ensures that no other commands will be run on that file.
The second one operates on the first line of the file. But since the first condition captured all of the first file, this statement will only be executed once, when the first line of the second file is in play.
The third one is what prints the bulk of your output. With awk, when no condition is included, the condition is assumed to be "true", so this statement runs for each line of the second file.
The awk script could of course be compressed onto a single line; I've spaced it out for easier reading.
Note also that this method of keeping or showing a line count might be a little heavy handed. If you know that you're just showing a line count, you can use the internal awk variable NR. At the point in your script where the second condition is evaluated, NR-1 is the line count of the previous file, so you could use:
#!/bin/sh
file="$1"
awk '
NR==FNR {
next
}
FNR==1 {
printf "%s: %d\n", FILENAME, NR-1
}
{
print $1, $2, $3
}
' "$file" "$file"
Updating the answer based on the comment and the latest version of the question:
file=$1
nbrClients=$(wc -l < "$file")
echo "$nbrClients"

awk -v today="$today" '{
    fileName = $1                 # first field: file to process
    tnxCount = $2                 # second field: number of threads to start
    for (i = 1; i <= tnxCount; i++) {
        print "Starting thread number " i
        system("nohup perl /home/user/abc.pl -i " fileName " >> /home/user/test_load_" today ".out 2>&1 &")
    }
}' "$file"
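With a hypothetical input line such as:
list1.txt 3
the loop prints "Starting thread number 1" through 3 and launches abc.pl three times against list1.txt.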

Script returned '/usr/bin/awk: Argument list too long' when using -v in awk command

Here is the part of my script that uses awk.
ids=`cut -d ',' -f1 $file | sed ':a;N;$!ba;s/\n/,/g'`
awk -vdata="$ids" -F',' 'NR > 1 {if(index(data,$2)>0){print $0",true"}else{print $0",false"}}' $input_file >> $output_file
This works perfectly, but when I tried to get the data from two or more files like this:
ids=`cut -d ',' -f1 $file1 $file2 $file3 | sed ':a;N;$!ba;s/\n/,/g'`
It returned this error.
/usr/bin/awk: Argument list too long
From what I researched, it was not caused by the number of files, but by the number of ids fetched.
Does anybody have an idea on how to solve this? Thanks.
You could use an environment variable to pass the data to awk. In awk, environment variables are accessible via the ENVIRON array.
So try something like this:
export ids=`cut -d ',' -f1 $file | sed ':a;N;$!ba;s/\n/,/g'`
awk -F',' 'NR > 1 {if(index(ENVIRON["ids"],$2)>0){print $0",true"}else{print $0",false"}}' $input_file >> $output_file
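A quick check that an exported variable really is visible inside awk (the variable name here is just an example):
$ export ids="2,3,9"
$ awk 'BEGIN { print ENVIRON["ids"] }'
2,3,9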
Change the way you generate your ids so they come out one per line, like this, which I use as a very simple way to generate ids 2,3 and 9:
echo 2; echo 3; echo 9
2
3
9
Now pass that as the first file to awk and your $input_file as the second file to awk:
awk '...' <(echo 2; echo 3; echo 9) "$input_file"
In bash you can generate a pseudo-file with the output of a process using <(some commands), and that is what I am using.
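A quick illustration that a process substitution behaves like a filename (the exact /dev/fd path varies by system):
$ wc -l <(echo 2; echo 3; echo 9)
3 /dev/fd/63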
Now, in your awk, pick up the ids from the first file like this:
awk 'FNR==NR{ids[$1]++;next}' <(echo 2; echo 3; echo 9)
which will set ids[2]=1, ids[3]=1 and ids[9]=1.
Then pass both your files and add in your original processing:
awk 'FNR==NR{ids[$1]++;next} {if($2 in ids) print $0",true"; else print $0",false"}' <(echo 2; echo 3; echo 9) "$input_file"
So, for my final answer, your entire code will look like:
awk 'FNR==NR{ids[$1]++;next} {if($2 in ids) print $0",true"; else print $0",false"}' <(cut ... file1 file2 file3 | sed ...) "$input_file"
As @hek2mgl alludes to in the comments, you can likely just pass the files which include the ids to awk "as is" and let awk find the ids itself, rather than using cut and sed. If there are many, you can make them all come to awk as the first file with:
awk '...' <(cat file1 file2 file3) "$input_file"
There are 2 problems in your script:
awk -vdata="$ids" -F',' 'NR > 1 {if(index(data,$2)>0){print $0",true"}else{print $0",false"}}' $input_file >> $output_file
that could be causing that error:
-vdata=.. - that is gawk-specific; in other awks you need to leave a space between -v and data=. So if you aren't running gawk, I don't know what your awk will make of that statement, but it might treat it as multiple args.
$input_file - you MUST quote shell variables unless you have a specific purpose in mind by leaving them unquoted. If $input_file contains globbing chars or spaces, then leaving it unquoted will cause them to be expanded into potentially multiple files/args.
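For example, with a hypothetical filename containing a space:
input_file='report 2024.csv'
awk -F',' '{print $1}' $input_file      # awk is handed two file args: "report" and "2024.csv"
awk -F',' '{print $1}' "$input_file"    # awk is handed the single file "report 2024.csv"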
So try this:
awk -v data="$ids" -F',' 'NR > 1 {if(index(data,$2)>0){print $0",true"}else{print $0",false"}}' "$input_file" >> "$output_file"
and see if you still have the problem. Your script does have other unrelated issues of course, some of which have already been pointed out, and you can post a followup question if you want help with those, but just FYI that awk script could be written more concisely as:
awk -v data="$ids" 'BEGIN{FS=OFS=","} NR > 1{print $0, (index(data,$2) ? "true" : "false")}'

Inserting prefix before output of lines printed by awk command

My program prints the even lines of every file in the current directory:
for file in . *
do
awk 'NR % 2 == 0' "$file"
done
I would like it to print the name of the file followed by a colon before every line in the output. I can't find a way to insert anything while the awk command is doing its job. Is it impossible to do this using awk? Thank you in advance for any suggestions.
awk supports the FILENAME variable which contains, guess what?, the filename. You don't even need the shell loop. Simply:
awk 'NR % 2 == 0 {printf "%s:%s\n", FILENAME, $0}' *
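With hypothetical files a.txt and b.txt, the output would look something like:
a.txt:second line of a.txt
a.txt:fourth line of a.txt
b.txt:second line of b.txt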
how about this?
IFS=$'\r\n'
for file in . *
do
for line in `awk 'NR % 2 == 0' "$file"`
do
echo $file: $line
done
done

Bash script to read a specific value from the files of an entire folder

I have a problem creating a script that reads specific values from all the files in a folder.
I have a number of email files in a directory and I need to extract 2 specific values from each file.
After that I have to put them into a new file that looks like this:
--------------
To: value1
value2
--------------
This is what I want to do, but I don't know how to create the script:
# I am putting the name of the files into a temp file
`ls -l | awk '{print $9 }' >tmpfile`
# use for the name of a file
date=`date +"%T"`
# The first specific value from file (phone number)
var1=`cat tmpfile | grep "To: 0" | awk '{print $2 }' | cut -b -10 `
# The second specific value from file(subject)
var2=cat file | grep Subject | awk '{print $2$3$4$5$6$7$8$9$10 }'
# Put the first value in a new file on the first row
echo "To: 4"$var1"" > sms-$date
# Put the second value in the same file on the second row
echo ""$var2"" >>sms-$date
.......
and do the same for every file in the directory.
I tried using while and for loops but I couldn't finalize the script.
Thank You
I've made a few changes to your script, hopefully they will be useful to you:
#!/bin/bash
for file in *; do
    var1=$(awk '/To: 0/ {print substr($2,1,10)}' "$file")
    var2=$(awk '/Subject/ {for (i=2; i<=10; ++i) s=s$i; print s}' "$file")
    base="sms-"$(date +"%T")
    outfile="$base"
    i=0
    while [ -f "$outfile" ]; do outfile="$base-"$((i++)); done
    echo "To: 4$var1" > "$outfile"
    echo "$var2" >> "$outfile"
done
The for loop just goes through every file in the folder that you run the script from.
I have added an additional suffix $i to the end of the file name. If no file with the same timestamp already exists, the file will be created without the suffix. Otherwise the value of $i will keep increasing until there is no file with the same name.
I'm using $( ) rather than backticks, this is just a personal preference but it can be clearer in my opinion, especially when there are other quotes about.
There's not usually any need to pipe the output of grep to awk. You can do the search in awk using the / / syntax (see the short example after these notes).
I have removed the cut -b -10 and replaced it with substr($2, 1, 10), which prints the first 10 characters of column 2.
It's not much shorter, but I used a loop rather than $2$3...; I think it looks a bit neater.
There's no need for all the extra " in the two output lines.
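For example, these two commands pick out the same field, reusing a pattern from the script above:
grep 'To: 0' "$file" | awk '{print $2}'
awk '/To: 0/ {print $2}' "$file"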
I suggest trying the following:
#!/bin/sh
RESULT_FILE=sms-`date +"%T"`
DIR=.
fgrep -l 'To: 0' "$DIR"/* | while read FILE; do
var1=`fgrep 'To: 0' "$FILE" | awk '{print $2 }' | cut -b -10`
var2=`fgrep 'Subject' "$FILE" | awk '{print $2$3$4$5$6$7$8$9$10 }'`
echo "To: 4$var1" >>"$RESULT_FIL"
echo "$var2" >>"$RESULT_FIL"
done

unix command to get the lines between the first and last occurrence of a word and write them to a file

I want a unix command to find the lines between the first and last occurrence of a word.
For example:
Let's imagine we have 1000 lines. The tenth line contains the word "stackoverflow", and the thirty-fifth line also contains the word "stackoverflow".
I want to print the lines between 10 and 35 and write them to a new file.
You can make it in two steps. The basic idea is to:
1) get the line number of the first and last match.
2) print the lines that fall between those two line numbers.
$ read first last <<< $(grep -n stackoverflow your_file | awk -F: 'NR==1 {printf "%d ", $1}; END{print $1}')
$ awk -v f=$first -v l=$last 'NR>=f && NR<=l' your_file
Explanation
read first last reads two values and stores them in $first and $last.
grep -n stackoverflow your_file greps and shows the output like this: number_of_line:output
awk -F: 'NR==1 {printf "%d ", $1}; END{print $1}' prints the line numbers of the first and last match of stackoverflow in the file.
And
awk -v f=$first -v l=$last 'NR>=f && NR<=l' your_file prints all lines from line number $first through line number $last.
Test
$ cat a
here we
have some text
stackoverflow
and other things
bla
bla
bla bla
stackoverflow
and whatever else
stackoverflow
to make more fun
blablabla
$ read first last <<< $(grep -n stackoverflow a | awk -F: 'NR==1 {printf "%d ", $1}; END{print $1}')
$ awk -v f=$first -v l=$last 'NR>=f && NR<=l' a
stackoverflow
and other things
bla
bla
bla bla
stackoverflow
and whatever else
stackoverflow
By steps:
$ grep -n stackoverflow a
3:stackoverflow
9:stackoverflow
11:stackoverflow
$ grep -n stackoverflow a | awk -F: 'NR==1 {printf "%d ", $1}; END{print $1}'
3 11
$ read first last <<< $(grep -n stackoverflow a | awk -F: 'NR==1 {printf "%d ", $1}; END{print $1}')
$ echo "first=$first, last=$last"
first=3, last=11
If you know an upper bound on how many lines there can be (say, a million), then you can use this simple abusive script:
(grep -A 1000000 stackoverflow | grep -B 1000000 stackoverflow) < file
You can append | tail -n +2 | head -n -1 to strip the border lines as well:
(grep -A 1000000 stackoverflow | grep -B 1000000 stackoverflow
| tail -n +2 | head -n -1) < file
I'm not 100% sure from the question whether the output should be inclusive of the first and last matching lines, so I'm assuming it is. But this can be easily changed if we want exclusive instead.
This pure-bash solution does it all in one step - i.e. the file (or pipe) is only read once:
#!/bin/bash
function midgrep {
while IFS= read -r ln; do
[ "$saveline" ] && linea[$((i++))]=$ln
if [[ $ln =~ $1 ]]; then
if [ "$saveline" ]; then
for ((j=0; j<i; j++)); do echo "${linea[$j]}"; done
i=0
else
saveline=1
linea[$((i++))]=$ln
fi
fi
done
}
midgrep "$1"
Save this as a script (e.g. midgrep.sh) and pipe whatever output you like to it as follows:
$ cat input.txt | ./midgrep.sh stackoverflow
This works as follows:
find the first matching line and buffer in the first element of an array
continue reading lines until the next match, buffering to the array as we go
on each subsequent match, flush the buffer array to the output
continue reading the file to the end. If there are no more matches, the last buffer is simply discarded.
The advantage of this approach is that we read through the input only once. The disadvantage is that we buffer everything between each pair of matches; if there are many lines between matches, they are all held in memory until we hit the next match.
Also this uses the bash =~ regular expression operator to keep this pure bash. But you could replace this with a grep instead, if you are more comfortable with that.
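A sketch of that substitution, keeping the rest of the loop unchanged (grep -q only reports whether the pattern matched):
# instead of:  if [[ $ln =~ $1 ]]; then
if printf '%s\n' "$ln" | grep -q -- "$1"; then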
Using perl :
perl -00 -lne '
chomp(my @arr = split /stackoverflow/);
print join "\nstackoverflow", @arr[1 .. $#arr - 1]
' file.txt | tee newfile.txt
The idea behind this is to read the whole input in chunks, splitting it into an array on the string "stackoverflow". Next, we print from the 2nd chunk through the last-but-one, joining them back together with "stackoverflow".
