Inserting prefix before output of lines printed by awk command - bash

My program prints the even lines of every file in the current directory:
for file in . *
do
awk 'NR % 2 == 0' "$file"
done
I would like it to print the name of the file followed by a colon before every line in the output. I can't find a way to insert anything while the awk command is doing it's job. Is it impossible to do this using awk? Thank you in advance for any suggestions.

awk supports the FILENAME variable which contains, guess what?, the filename. You don't even need the shell loop. Simply:
awk 'NR % 2 == 0 {printf "%s:%s\n", FILENAME, $0}' *

how about this?
IFS=$'\r\n'
for file in . *
do
for line in `awk 'NR % 2 == 0' "$file"`
do
echo $file: $line
done
done

Related

print lines that first column not in the list

I have a list of numbers in a file
cat to_delete.txt
2
3
6
9
11
and many txt files in one folder. Each file has tab delimited lines (can be more lines than this).
3 0.55667 0.66778 0.54321 0.12345
6 0.99999 0.44444 0.55555 0.66666
7 0.33333 0.34567 0.56789 0.34543
I want to remove the lines that the first number ($1 for awk) is in to_delete.txt and print only the lines that the first number is not in to_delete.txt. The change should be replacing the old file.
Expected output
7 0.33333 0.34567 0.56789 0.34543
This is what I got so far, which doesn't remove anything;
for file in *.txt; do awk '$1 != /2|3|6|9|11/' "$file" > "$tmp" && mv "$tmp" "$file"; done
I've looked through so many similar questions here but still cannot make it work. I also tried grep -v -f to_delete.txt and sed -n -i '/$to_delete/!p'
Any help is appreciated. Thanks!
In awk:
$ awk 'NR==FNR{a[$1];next}!($1 in a)' delete file
Output:
7 0.33333 0.34567 0.56789 0.34543
Explained:
$ awk '
NR==FNR { # hash records in delete file to a hash
a[$1]
next
}
!($1 in a) # if $1 not found in record in files after the first, output
' delete files* # mind the file order
My first idea was this:
printf "%s\n" *.txt | xargs -n1 sed -i "$(sed 's!.*!/& /d!' to_delete.txt)"
printf "%s\n" *.txt - outputs the *.txt files each on separate lines
| xargs -n1 execute the following command for each line passing the line content as the input
sed -i - edit file in place
$( ... ) - command substitution
sed 's!.*!/^& /d!' to_delete.txt - for each line in to_delete.txt, append the line with /^ and suffix with /d. That way from the list of numbers I get a list of regexes to delete, like:
/^2 /d
/^3 /d
/^6 /d
and so on. Which tells sed to delete lines matching the regex - line starting with the number followed by a space.
But I think awk would be simpler. You could do:
awk '$1 != 2 && $1 != 3 && $1 != 6 ... and so on ...`
but that would be longish, unreadable. It's easier to read the map from the file and then check if the number is in the array:
awk 'FNR==NR{ map[$1] } FNR!=NR && !($1 in map)' to_delete.txt "$file"
The FNR==NR is true only for the first file. So when we read it, we set the map[$1] (we "set" it, just so such element exists). Then FNR!=NR is true for the second file, for which we check if the first element is the key in the map. If it is not, the expression is true and the line gets printed out.
all together:
for file in *.txt; do awk 'FNR==NR{ map[$1] } FNR!=NR && !($1 in map)' to_delete.txt "$file" > "$tmp"; mv "$tmp" "$file"; done

Script returned '/usr/bin/awk: Argument list too long' in using -v in awk command

Here is the part of my script that uses awk.
ids=`cut -d ',' -f1 $file | sed ':a;N;$!ba;s/\n/,/g'`
awk -vdata="$ids" -F',' 'NR > 1 {if(index(data,$2)>0){print $0",true"}else{print $0",false"}}' $input_file >> $output_file
This works perfectly, but when I tried to get data to two or more files like this.
ids=`cut -d ',' -f1 $file1 $file2 $file3 | sed ':a;N;$!ba;s/\n/,/g'`
It returned this error.
/usr/bin/awk: Argument list too long
As I researched, it was not caused by the number of files, but the number of ids fetched.
Does anybody have an idea on how to solve this? Thanks.
You could use an environment variable to pass the data to awk. In awk the environment variables are accessible via an array ENVIRON.
So try something like this:
export ids=`cut -d ',' -f1 $file | sed ':a;N;$!ba;s/\n/,/g'`
awk -F',' 'NR > 1 {if(index(ENVIRON["ids"],$2)>0){print $0",true"}else{print $0",false"}}' $input_file >> $output_file
Change the way you generate your ids so they come out one per line, like this, which I use as a very simple way to generate ids 2,3 and 9:
echo 2; echo 3; echo 9
2
3
9
Now pass that as the first file to awk and your $input_file as the second file to awk:
awk '...' <(echo 2; echo 3; echo 9) "$input_file"
In bash you can generate a pseudo-file with the output of a process using <(some commands), and that is what I am using.
Now, in your awk, pick up the ids from the first file like this:
awk 'FNR==NR{ids[$1]++;next}' <(echo 2; echo 3; echo 9)
which will set ids[2]=1, ids[3]=1 and ids[9]=1.
Then pass both your files and add in your original processing:
awk 'FNR==NR{ids[$1]++;next} {if($2 in ids) print $0",true"; else print $0",false"}' <(echo 2; echo 3; echo 9) "$input_file"
So, for my final answer, your entire code will look like:
awk 'FNR==NR{ids[$1]++;next} {if($2 in ids) print $0",true"; else print $0",false"}' <(cut ... file1 file2 file3 | sed ...) "$input_file"
As #hek2mgl alludes in the comments, you can likely just pass the files which include the ids to awk "as is" and let awk find the ids itself rather than using cut and sed. If there are many, you can make them all come to awk as the first file with:
awk '...' <(cat file1 file2 file3) "$input_file"
There's 2 problems in your script:
awk -vdata="$ids" -F',' 'NR > 1 {if(index(data,$2)>0){print $0",true"}else{print $0",false"}}' $input_file >> $output_file
that could be causing that error:
-vdata=.. - that is gawk-specific, in other awks you need to leave a space between -v and data=. So if you aren't running gawk then idk what your awk will make of that statement but it might treat it as multiple args.
$input_file - you MUST quote shell variables unless you have a specific purpose in mind by leaving them unquoted. If $input_file contains globbing chars or spaces then you leaving it unquoted will cause them to be expanded into potentially multiple files/args.
So try this:
awk -v data="$ids" -F',' 'NR > 1 {if(index(data,$2)>0){print $0",true"}else{print $0",false"}}' "$input_file" >> "$output_file"
and see if you still have the problem. Your script does have other unrelated issues of course, some of which have already been pointed out, and you can post a followup question if you want help with those, but just FYI that awk script could be written more concisely as:
awk -v data="$ids" 'BEGIN{FS=OFS=","} NR > 1{print $0, (index(data,$2) ? "true" : "false")}'

Can anyone please explain me the following unix script?

I recently had to debug some old scripts and struck at this code. Please explain me what the awk is doing here.
#!/bin/ksh
set -x on
ls -1 ../Rejectfiles/*.csv 2>/dev/null | while read file
do
filename=${file##*/}
if [ -f ../Processed/$filename ]
then
awk '{ if (NR > 1){ print $0;}}' $file >> ../Processed/$filename
else
cp $file ../Processed/
fi
done
awk '{ if (NR > 1){ print $0;}}' $file >> ../Processed/$filename
Write all lines from $file without 1 line to ../Processed/$filename
man awk | grep -i " NR "
NR current record number in the total input stream.
also you can use sed
sed -n '1!p' $file >> ../Processed/$filename
Usually sed is more fast.
man awk states clearly:
NR - ordinal number of the current record
As #Sundeep noted in the comments of #RichardS his answer:
awk '{ if (NR > 1){ print $0;}}' $file thus removes the first ordinal number from the file. Given the input file is a CSV file the first ordinal number means the first line in the file. As #RichardS has mentioned that makes perfect sense in a CSV file (given the fact the first line in a CSV usually contains the description of the underlying values).

How to read a file for special lines in bash script

I just want to read even line number from a file in bash shell, how to do it?
Also I just want to read the fifth line of a file, then how do it?
awk 'NR % 2 == 1' <filename>
For the second one:
awk 'NR == 5' <filename>
You can also use sed to get numbers in a specified range:
sed -ne '5,5p' <filename>
You could use the tail command. Put it in a for loop for the first case and the second is totally trivial if you get the first.
Or maybe you could even use awk:
awk NR==5 file_name
To read even number files using gnu-sed:
sed -n "2~2 p" file
To print specific line # from a file using sed:
sed '5q;d' file
Awk is often the answer (or, nowadays, Perl, Python etc. too)
If for some reason you must do it with only bash and the basic shell utilities:
cat file | \
while read line; do
i=$(( (i + 1) % 2 ))
if [[ $i -eq 0 ]]; then
echo $line // or whatever else you wanted to do with it
fi
done
And to get a specific line:
cat file | head -5 | tail -1
try this:
for example lines between 3 and 6
awk 'NR>=3 && NR<=6'`
These is a help to improve it(but not completed)
#!/bin/bash
test=`cat input.txt | awk 'NR>=3 && NR<=6'`
while read line; do
#do stuff
done <input.txt

Save variable from txt using awk

I have a txt in my folder named parameters.txt which contains
PP1 20 30 40 60
PP2 0 0 0 0
I'd like to use awk to read the different parameters depending on the value of the first text field in each line. At the moment, if I run
src_dir='/PP1/'
awk "$src_dir" '{ print $2 }' parameters.txt
I correctly get
20
I would simply like to store that 20 into a variable and to export the variable itself.
Thanks in advance!
If you want to save the output, do var=$(awk expression):
result=$(awk -v value=$src_dir '($1==value) { print $2 }' parameters.txt)
You can make your command more general giving awk the variable with the -v syntax:
$ var="PP1"
$ awk -v v=$var '($1==v) { print $2 }' a
20
$ var="PP2"
$ awk -v v=$var '($1==v) { print $2 }' a
0
You don't really need awk for that. You can do it in bash.
$ src_dir="PP1"
$ while read -r pattern columns ; do
set - $columns
if [[ $pattern =~ $src_dir ]]; then
variable=$2
fi
done < parameters.txt
shell_pattern=PP1
output_var=$(awk -v patt=$shell_pattern '$0 ~ patt {print $2}' file)
Note that $output_var may contain more than one value if the pattern matches more than one line. If you're only interested in the first value, then have the awk program exit after printing .

Resources