For loop in an awk command - shell

I have a file which has rows, and I want to read its values with an awk command in Unix. I am able to read the file, but I added a for loop to traverse all the data in the file, and that for loop never ends; it goes into an infinite loop.
Below is the code I am using to read the file and get the data at positions $1, $2 and $3:
file=$1;
nbrClients=`wc -l $file | cut -d' ' -f1`;
echo $nbrClients;
awk '{
for(i=0; i<=$nbrClients; ++i)
{print $1 $2 $3}
}' $file
The file I am reading has the below format:
abc 12 test.txt
abc 12 test.txt
abc 12 test.txt
abc 12 test.txt
abc 12 test.txt
abc 12 test.txt
So for this file the nbrClients value will be 6 and it should loop 6 times, but it is not doing so. Please suggest what I am doing wrong here.
Here is the full code which I am trying to run:
file=$1;
nbrClients=`wc -l $file | cut -d' ' -f1`;
echo $nbrClients;
file=$1;
cat | awk '{
fileName=$1
tnxCount=$2
for i in `seq 1 $tnxCount`
do
echo "Starting thread number $i"
nohup perl /home/user/abc.pl -i $fileName >>/home/user/test_load_${today}.out 2>&1 &
done
}' $file;

I think the problem here is that you're under the impression that the for loop is what will cause awk to step through your input file, whereas it's awk's nature to do that already.
Awk works by taking a set of condition { statement } pairs, and then FOR EACH LINE OF INPUT, evaluating the condition, and if it rings true, executing the statement. Note that conditions can be statements (since functions and other commands have a return value) and statements can include if constructs, so there's a lot of flexibility here.
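For instance, here's a minimal sketch of that model (the data file name is hypothetical):
# NF > 2 is the condition, evaluated once per input line;
# the statement prints the first three fields of each line that passes
awk 'NF > 2 { print $1, $2, $3 }' clients.txt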
Note that awk can also reduce or simplify stuff you'd do in a shell script. Consider the following:
#!/bin/sh
file="$1"
awk '
    NR==FNR {
        ClientCount++
        next
    }
    FNR==1 {
        printf "%s: %d\n", FILENAME, ClientCount
    }
    {
        print $1, $2, $3
    }
' "$file" "$file"
This script reads your input file twice -- once to count the lines (so that the line count can be placed at the top of the output), and once to process the lines, printing the first three fields. The script is composed of three condition { statement } groupings:
The first one is the counter. It only operates on the first instance of the file, and the next command ensures that no other commands will be run on that file.
The second one operates on the first line of the file. But since the first condition captured all of the first file, this statement will only be executed once, when the first line of the second file is in play.
The third one is what prints the bulk of your output. With awk, when no condition is included, the condition is assumed to be "true", so this statement runs for each line of the second file.
The awk script could of course be compressed onto a single line; I've spaced it out for easier reading.
Note also that this method of keeping or showing a line count might be a little heavy-handed. If you know that you're just showing a line count, you can use awk's internal variable NR. At the point in your script where the second condition is evaluated, NR-1 is the line count of the previous file, so you could use:
#!/bin/sh
file="$1"
awk '
    NR==FNR {
        next
    }
    FNR==1 {
        printf "%s: %d\n", FILENAME, NR-1
    }
    {
        print $1, $2, $3
    }
' "$file" "$file"

Updating the answer based on the comments and the latest version of the question:
file=$1
nbrClients=`wc -l $file | cut -d' ' -f1`
echo $nbrClients
# A shell-style for/do/done loop and backtick substitution are not valid
# inside an awk program; use awk's own for loop, and pass shell variables
# in with -v instead.
awk -v today="$today" '{
    fileName = $1
    tnxCount = $2
    for (i = 1; i <= tnxCount; i++) {
        print "Starting thread number " i
        system("nohup perl /home/user/abc.pl -i " fileName " >> /home/user/test_load_" today ".out 2>&1 &")
    }
}' "$file"
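With the sample input shown earlier (abc 12 test.txt ...), each input line would then launch 12 background copies of abc.pl with -i abc. Note that today is assumed to be set in the calling shell, as it was in the original snippet.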

Search equality in a certain field with AWK [duplicate]

I am trying to get the name out of /etc/passwd using awk to search only in the 5th field of every row, and then to cut some part of that line and print it out.
This is what I wrote but it doesn't seem to work:
for iter in "$#";
do cat /etc/passwd | awk -F ":" '$5==$iter' | cut -d":" -f6;
done;
Concerning the delimiter syntax, everything should be fine, I guess?
So my problem is in the $5==$iter, I assume.
How can I change that $5==$iter so that if the 5th field of the row contains my $iter var, then cut and so on?
Sorry for the ignorance, I am a beginner :)
Thanks in advance.
See How do I use shell variables in an awk script?
-v should be used to pass shell variables into awk. Also, there's no reason to use either cat or cut here:
for iter in "$#"; do
awk -F: -v iter="$iter" '$5==iter { print $6 }' </etc/passwd
done
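For example, on a typical Linux system, where root's GECOS (5th) field is simply "root", saving the loop above as a hypothetical getname.sh would give:
$ ./getname.sh root
/root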
As Charles Duffy commented, your code would be more efficient if it didn't need to read /etc/passwd every pass. And while this particular loop probably doesn't need to be optimized (after all, /etc/passwd is typically not that long and most OS's would cache the file anyway after the first read), it would be interesting to see an awk script read the file only once.
That said, here's another implementation where awk is only invoked once:
printf "%s\n" "$#" | awk -F: '
NR == FNR { etc_passwd[ $5 ] = $6; next }
{ print $0 , etc_passwd[ $0 ] }
' /etc/passwd /dev/stdin
The NR == FNR condition is an idiom that causes its associated command only to be executed for the first file in the list of files that follows the awk script (that is, for the reading of /etc/passwd).
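As a toy illustration of the idiom (both file names here are hypothetical):
# first pass remembers the ids listed in allow.txt;
# second pass prints the lines of data.txt whose first field was seen
awk 'NR==FNR { allow[$1]; next } $1 in allow' allow.txt data.txt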
You can also do everything in bash, for example:
#!/bin/bash
declare -A passwd # declare an associative array
# build the associative array "passwd" with the
# 5th field as a "key" and 6th field as "value"
while IFS=$':\n' read -r -a line; do  # emulate awk to extract fields
    [[ -n "${line[4]}" ]] || continue # avoid blank "keys"
    passwd["${line[4]}"]=${line[5]}   # bash arrays start at index 0
done < /etc/passwd
for iter in "$#"; do
if [ ${passwd[$iter] + 'x'} ]; then
echo ${passwd[$iter]}
fi
done
(This version doesn't take into account multiple values for the 5th field.)
Here is a better version that can handle blank values as well, like ./script.sh '':
while IFS=$':\n' read -r -a line; do
    for iter in "$@"; do
        if [ "$iter" == "${line[4]}" ]; then
            echo "${line[5]}"
            continue
        fi
    done
done < /etc/passwd
A pure awk solution could be:
#!/usr/bin/awk -f
BEGIN {
    FS = ":"
    # stash the command-line parameters as lookup keys, then delete
    # them so awk does not try to open them as input files
    for ( i = 1; i < ARGC; i++ ) {
        args[ARGV[i]] = 1
        delete ARGV[i]
    }
    ARGV[1] = "/etc/passwd"
}
($5 in args) { print $6 }
and you could call it as ./script.awk 'param1' 'param2' (the -f flag is already supplied by the shebang line).

Can anyone please explain the following Unix script?

I recently had to debug some old scripts and got stuck at this code. Please explain to me what the awk is doing here.
#!/bin/ksh
set -x on
ls -1 ../Rejectfiles/*.csv 2>/dev/null | while read file
do
    filename=${file##*/}
    if [ -f ../Processed/$filename ]
    then
        awk '{ if (NR > 1){ print $0;}}' $file >> ../Processed/$filename
    else
        cp $file ../Processed/
    fi
done
awk '{ if (NR > 1){ print $0;}}' $file >> ../Processed/$filename
This writes all lines from $file, except the first line, to ../Processed/$filename.
man awk | grep -i " NR "
NR current record number in the total input stream.
You can also use sed:
sed -n '1!p' $file >> ../Processed/$filename
sed is usually a bit faster for a simple filter like this.
man awk states clearly:
NR - ordinal number of the current record
As @Sundeep noted in the comments on @RichardS's answer:
awk '{ if (NR > 1){ print $0;}}' $file thus skips the first record of the file. Since the input is a CSV file, the first record is its first line, and skipping it makes perfect sense (the first line of a CSV usually contains the description of the underlying values).
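For what it's worth, the same skip-the-first-line effect is also available without awk or sed, e.g.:
tail -n +2 $file >> ../Processed/$filename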

Bash Shell: Infinite Loop

The problem is the following: I have a file in which each line has this form:
id|lastName|firstName|gender|birthday|joinDate|IP|browser
I want to sort alphabetically all the first names in that file and print them, one per line, each name only once.
I have created the following program, but for some reason it creates an infinite loop:
array1=()
while read LINE
do
if [ ${LINE:0:1} != '#' ]
then
IFS="|"
array=($LINE)
if [[ "${array1[#]}" != "${array[2]}" ]]
then
array1+=("${array[2]}")
fi
fi
done < $3
echo ${array1[@]} | awk 'BEGIN{RS=" ";} {print $1}' | sort
NOTES
if [ ${LINE:0:1} != '#' ]: this check is used because there are comments in the file that I don't want to print
$3: the filename
array1: used to collect all the distinct names
Wow, there's a MUCH simpler and cleaner way to achieve this, without having to mess with the IFS variable or using arrays. You can use "for" to do this:
First I created a file with the same structure as yours:
$ cat file
id|lastName|Douglas|gender|birthday|joinDate|IP|browser
id|lastName|Tim|gender|birthday|joinDate|IP|browser
id|lastName|Andrew|gender|birthday|joinDate|IP|browser
id|lastName|Sasha|gender|birthday|joinDate|IP|browser
#id|lastName|Carly|gender|birthday|joinDate|IP|browser
id|lastName|Madson|gender|birthday|joinDate|IP|browser
Here's the script I wrote using "for":
#!/bin/bash
for LINE in `cat file | grep -v "^#" | awk -F'|' '{print$3}' | sort -u`
do
echo $LINE
done
And here's the output of this script:
$ ./script.sh
Andrew
Douglas
Madson
Sasha
Tim
Explanation:
for LINE in `cat file`
Creates a loop over each whitespace-separated word of the command's output (one name per line here, since the names contain no spaces). The commands between backticks are executed by the shell; for example, if you wanted to store the date in a variable you could use VARDATE=`date`.
grep -v "^#"
The option -v is used to exclude results matching the pattern, in this case the pattern is "^#". The "^" character means "line begins with". So grep -v "^#" means "exclude lines beginning with #".
awk -F'|' '{print$3}'
The -F option switches the field delimiter from the default (whitespace) to whatever you put between the ' after it, in this case the "|" character.
The '{print$3}' prints the 3rd column.
sort -u
And the "sort -u" command to sort the names alphabetically.

Unix command to get lines from in between the first and last occurrence of a word and write to a file

I want a Unix command to find the lines between the first and last occurrence of a word.
For example:
let's imagine we have 1000 lines. The tenth line contains the word "stackoverflow", and the thirty-fifth line also contains "stackoverflow".
I want to print the lines between 10 and 35 and write them to a new file.
You can do it in two steps. The basic idea is to:
1) get the line numbers of the first and last match.
2) print the range of lines between those two numbers.
$ read first last <<< $(grep -n stackoverflow your_file | awk -F: 'NR==1 {printf "%d ", $1}; END{print $1}')
$ awk -v f=$first -v l=$last 'NR>=f && NR<=l' your_file
Explanation
read first last reads two values and stores them in $first and $last.
grep -n stackoverflow your_file greps and shows the output like this: number_of_line:output
awk -F: 'NR==1 {printf "%d ", $1}; END{print $1}' prints the line numbers of the first and last matches of stackoverflow in the file.
And
awk -v f=$first -v l=$last 'NR>=f && NR<=l' your_file prints all lines from line number $first through line number $last.
Test
$ cat a
here we
have some text
stackoverflow
and other things
bla
bla
bla bla
stackoverflow
and whatever else
stackoverflow
to make more fun
blablabla
$ read first last <<< $(grep -n stackoverflow a | awk -F: 'NR==1 {printf "%d ", $1}; END{print $1}')
$ awk -v f=$first -v l=$last 'NR>=f && NR<=l' a
stackoverflow
and other things
bla
bla
bla bla
stackoverflow
and whatever else
stackoverflow
Step by step:
$ grep -n stackoverflow a
3:stackoverflow
9:stackoverflow
11:stackoverflow
$ grep -n stackoverflow a | awk -F: 'NR==1 {printf "%d ", $1}; END{print $1}'
3 11
$ read first last <<< $(grep -n stackoverflow a | awk -F: 'NR==1 {printf "%d ", $1}; END{print $1}')
$ echo "first=$first, last=$last"
first=3, last=11
If you know an upper bound on how many lines there can be (say, a million), then you can use this simple abusive script:
(grep -A 1000000 stackoverflow | grep -B 1000000 stackoverflow) < file
You can append | tail -n +2 | head -n -1 to strip the border lines as well:
(grep -A 1000000 stackoverflow | grep -B 1000000 stackoverflow |
  tail -n +2 | head -n -1) < file
I'm not 100% sure from the question whether the output should be inclusive of the first and last matching lines, so I'm assuming it is. But this can be easily changed if we want exclusive instead.
This pure-bash solution does it all in one step - i.e. the file (or pipe) is only read once:
#!/bin/bash
function midgrep {
    while IFS= read -r ln; do # IFS=/-r preserve each line verbatim
        [ "$saveline" ] && linea[$((i++))]="$ln"
        if [[ $ln =~ $1 ]]; then
            if [ "$saveline" ]; then
                # flush the buffer, up to and including this match
                for ((j=0; j<i; j++)); do echo "${linea[$j]}"; done
                i=0
            else
                saveline=1
                linea[$((i++))]="$ln"
            fi
        fi
    done
}
midgrep "$1"
Save this as a script (e.g. midgrep.sh) and pipe whatever output you like to it as follows:
$ cat input.txt | ./midgrep.sh stackoverflow
This works as follows:
find the first matching line and buffer it in the first element of an array
continue reading lines until the next match, buffering to the array as we go
on each subsequent match, flush the buffer array to output
continue reading the file to the end; if there are no more matches, the last buffer is simply discarded.
The advantage of this approach is that we read through the input only once. The disadvantage is that we buffer everything between matches: if there are many lines between matches, they are all held in memory until we hit the next match.
Also this uses the bash =~ regular expression operator to keep this pure bash. But you could replace this with a grep instead, if you are more comfortable with that.
Using perl:
perl -00 -lne '
chomp(my @arr = split /stackoverflow/);
print join "\nstackoverflow", @arr[1 .. $#arr - 1]
' file.txt | tee newfile.txt
The idea behind this is to split the whole input into chunks on the string "stackoverflow", then print the chunks from the 2nd through the next-to-last, rejoined with "\nstackoverflow".

Check if a particular string is in a file bash

I want to write a script to check for duplicates.
For example, I have a text file with information in the format of /etc/passwd:
alice:x:1008:555:William Williams:/home/bill:/bin/bash
bob:x:1018:588:Bobs Boos:/home/bob:/bin/bash
bob:x:1019:528:Robt Ross:/home/bob:/bin/bash
james:x:1012:518:Tilly James:/home/bob:/bin/bash
I want to simply check if there are duplicate users, and if there are, output the lines to standard error. So in the example above, since bob appears twice, my output would simply be something like:
Error duplicate user
bob:x:1018:588:Bobs Boos:/home/bob:/bin/bash
bob:x:1019:528:Robt Ross:/home/bob:/bin/bash
Right now I have a while loop that reads each line and stores each piece of information in a variable, using awk -F with ":" as the delimiter. After storing the username I am not too sure of the best approach to check whether it already exists.
Some parts of my code:
while read line; do
    echo $line
    user=`echo $line | awk -F : '{print $1}'`
    match=`grep $user $1`    # $1 is the txt file
    if [ $? -ne 0 ]; then
        echo "Unique user"
    else
        echo "Not unique user"
        # then somehow grep those lines and output them
    fi
done
The matching does not produce the right results.
Suggestions?
Instead of re-inventing the wheel, use the following tools:
cut to extract the first field;
sort and uniq to keep duplicated lines only.
cut -d: -f1 "$1" | sort | uniq -d | while read -r i ; do
    echo "error: duplicate user $i"
done
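If you also want the offending lines on standard error, as in the question's sample output, a sketch along the same lines (still assuming the file is passed as $1):
cut -d: -f1 "$1" | sort | uniq -d | while read -r user; do
    echo "Error duplicate user" >&2
    grep "^${user}:" "$1" >&2
done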
Sounds like a job for awk to me:
% awk -F':' '
/:/ {
    count[$1] += 1
}
END {
    for (user in count) {
        if (count[user] > 1) {
            print user " appears in the file " count[user] " times."
        }
    }
}
' /etc/passwd
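And if you want the duplicated lines themselves, as in the question's expected output, a two-pass variant of the same counting idea (the file is read twice):
awk -F: 'NR==FNR { count[$1]++; next } count[$1] > 1' file file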
A perl-proposal:
perl -F: -lanE 'push @{$h{$F[0]}},$_; END{for $k (keys %h){if(@{$h{$k}}>1){say "Error";say for @{$h{$k}}}}}' file
