setting awk variables through inlining - shell

I've got this:
./awktest -v fields=`cat testfile`
which ought to set fields variable to '1 2 3 4 5' which is all that testfile contains
It returns:
gawk: ./awktest:9: fatal: cannot open file `2' for reading (No such file or directory)
When I do this it works fine.
./awktest -v fields='1 2 3 4 5'
printing fields at the time of error yields:
1
printing fields in the second instance yields:
1 2 3 4 5
When I try it with 12345 instead of 1 2 3 4 5 it works fine for both, so the whitespace is the problem. What is going wrong, and how do I fix it?

This is most likely not an awk question. Most likely, it is your shell that is the culprit.
For example, if awktest is:
#!/bin/bash
i=1
for arg in "$@"; do
    printf "%d\t%s\n" "$i" "$arg"
    ((i++))
done
Then you get:
$ ./awktest -v fields=`cat testfile`
1 -v
2 fields=1
3 2
4 3
5 4
6 5
You see that the file contents are not being handled as a single word.
Simple solution: use double quotes on the command line:
$ ./awktest -v fields="$(< testfile)"
1 -v
2 fields=1 2 3 4 5
The $(< file) construct is a bash shortcut for `cat file` that does not need to spawn an external process.
Or read the first line of the file in the awk BEGIN block:
awk '
BEGIN {getline fields < "testfile"}
rest of awk program ...
'
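A runnable version of the getline approach, as a sketch only: the file name is passed in through a hypothetical awk variable named src, and a temporary file stands in for testfile.

```shell
#!/bin/bash
tmp=$(mktemp)                 # stand-in for testfile
echo '1 2 3 4 5' > "$tmp"

# getline var < file reads one line into var and leaves $0 and NR alone.
# It returns 1 when a line was read, 0 at end of file and -1 on error.
result=$(awk -v src="$tmp" '
BEGIN {
    if ((getline fields < src) > 0)
        print "fields=" fields
    else
        print "could not read " src
}')
echo "$result"
rm -f "$tmp"
```

Checking the return value of getline avoids silently running the rest of the program with an empty fields variable when the file is missing.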

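Alternatively, keep the backticks but wrap them in double quotes:
./awktest -v fields="`cat testfile`"
# Note that single quotes do not work, because they suppress the command substitution:
# ./awktest -v fields='`cat testfile`'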
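The three quoting variants can be compared side by side. This is a minimal sketch, assuming a POSIX-style shell with default IFS; count_args is a hypothetical helper, and a temporary file stands in for testfile.

```shell
#!/bin/bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

tmp=$(mktemp)                 # stand-in for testfile
echo '1 2 3 4 5' > "$tmp"

# Unquoted: the shell splits the file contents on whitespace.
unquoted=$(count_args -v fields=$(cat "$tmp"))
# Double-quoted: the substitution runs and its result stays one word.
double=$(count_args -v fields="$(cat "$tmp")")
# Single-quoted: no substitution happens, so the text is passed literally.
single=$(printf '%s' 'fields=$(cat testfile)')

echo "unquoted: $unquoted args, double-quoted: $double args"
echo "single-quoted value: $single"
rm -f "$tmp"
```

The unquoted call sees six arguments (-v, fields=1, 2, 3, 4, 5), which is why awk then tries to open a file named 2.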


Cannot print in awk command in bash script

I am trying to read values from a file and print specific items into a variable which I will use later.
cat /dir1/file1 | while read blmbline2
do
BLMBFILE2=`print $blmbline2 | awk '{$1=""; print $0}'`
echo $BLMBFILE2
done
When I run that same code at the command line, it runs as expected, but, when I run it in a bash script called testme.sh, I get this error:
./testme.sh: line 3: print: command not found
If I run print by itself at the command prompt, I don't get an error (just a blank line).
If I run "bash" and then print at the command prompt, I get command not found.
I can't figure out what I'm doing wrong. Can someone suggest?
updated: I see some other posts that say to use echo or printf? Is there a difference I need to be concerned with in using one of those in bash?
Since awk can read files, you may be able to do away with the cat | while read and just use awk. Using a sample file containing:
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
Declare your bash array variable and populate with the output from awk:
arr=() ; arr=($(awk '{$1=""; print $0}' /dir1/file1))
Use the following to display array size and contents:
printf "array length: %d\narray contents: %s\n" "${#arr[@]}" "${arr[*]}"
Output:
array length: 30
array contents: 2 3 4 5 6 2 3 4 5 6 2 3 4 5 6 2 3 4 5 6 2 3 4 5 6 2 3 4 5 6
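Change print to echo in your shell script. print is a builtin in ksh and zsh, but bash does not have it, which would explain why it works at your prompt (presumably a different shell) and fails under bash. echo writes its arguments followed by a newline, while printf lets you control the output format. Also, create an array so you can store multiple items: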
BLMBFILE2=()
while IFS= read -r line
do
    # Drop the first field; the unquoted substitution splits the rest into array items.
    BLMBFILE2+=( $(echo "$line" | awk '{$1=""; print $0}') )
done < /dir1/file1
echo "Items found:"
for value in "${BLMBFILE2[@]}"
do
    echo "$value"
done
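On the echo versus printf part of the question, here is a small sketch of where the two differ in bash. The sample strings are made up for illustration.

```shell
#!/bin/bash
line='-n'

# echo treats a leading -n as an option, so nothing is printed here.
with_echo=$(echo $line)
# printf only interprets its format string; the data is printed as-is.
with_printf=$(printf '%s\n' "$line")

echo "echo gave: '$with_echo'"
echo "printf gave: '$with_printf'"

# printf can also lay out fields in fixed widths, which echo cannot do.
row=$(printf '%-5s|%3d' abc 7)
echo "$row"
```

For data read from a file, printf '%s\n' "$var" is the safer choice because the content can never be mistaken for an option.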

SED to spit out nth and (n+1)th lines

EDITS: For reference, "stuff" is a general variable, as is "KEEP".
KEEP could be "Hi, my name is Dave" on line 2 and "I love pie" on line 7. The numbers I've put here are for illustration only and DO NOT show up in the data.
I had a file that needed to be parsed, keeping every 4th line, starting at the 3rd line. In other words, it looked like this:
1 stuff
2 stuff
3 KEEP
4
5 stuff
6 stuff
7 KEEP
8 stuff etc...
Great, sed solved that easily with:
sed -n -e 3~4p myfile
giving me
3 KEEP
7 KEEP
11 KEEP
Now I have a different file format and a different take on the pattern:
1 stuff
2 KEEP
3 KEEP
4
5 stuff
6 KEEP
7 KEEP etc...
and I still want the output of
2 KEEP
3 KEEP
6 KEEP
7 KEEP
10 KEEP
11 KEEP
Here's the problem - this is a multi-pattern "pattern" for sed. It's "every 4th line, spit out 2 lines, but start at line 2".
Do I need to have some sort of DO/FOR loop in my sed, or do I need a different command like awk or grep? Thus far, I have tried formats like:
sed -n -e '3~4p;4~4p' myfile
and
awk 'NR % 3 == 0 || NR % 4 ==0' myfile
and
sed -n -e '3~1p;4~4p' myfile
and
awk 'NR % 1 == 0 || NR % 4 ==0' myfile
source: https://superuser.com/questions/396536/how-to-keep-only-every-nth-line-of-a-file
If your intent is to print lines 2,3 then every fourth line after those two, you can do:
$ seq 20 | awk 'BEGIN{e[2];e[3]} (NR%4) in e'
2
3
6
7
10
11
14
15
18
19
You were pretty close with your sed:
$ printf '%s\n' {1..12} | sed -n '2~4p;3~4p'
2
3
6
7
10
11
The idiomatic way to write this in awk is
$ awk 'NR%4==2 || NR%4==3' file
However, this special case can be shortened to
$ awk 'NR%4>1' file
This might work for you (GNU sed):
sed '2~4,+1p;d' file
Use a range: the first address is the starting line and step (here, line 2 and every 4th line after it). The second address is how many lines to include after the start of the range (here, one more). Print these lines and delete all others.
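In the generic case, you want to keep lines p to p+q, p+n to p+q+n, p+2n to p+q+2n, and so on. With p, n and q passed in as awk variables (2, 4 and 1 for the pattern in the question), you can write:
awk -v p=2 -v n=4 -v q=1 'NR >= p && (NR - p) % n <= q' file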
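A worked instance of the generic form, as a sketch: p, n and q are supplied with -v, and the NR >= p guard is there because awk's % gives a negative result for lines before p, which would otherwise pass the <= q test. keep_runs is a hypothetical wrapper name.

```shell
#!/bin/bash
# keep_runs is a hypothetical wrapper: from line p on, keep q+1 lines out of every n.
keep_runs() {
    awk -v p="$1" -v n="$2" -v q="$3" 'NR >= p && (NR - p) % n <= q'
}

# Start at line 2, period 4, keep 2 lines per period (q = 1).
kept=$(seq 12 | keep_runs 2 4 1 | paste -sd' ' -)
echo "$kept"
```

With these values the kept lines are 2, 3, 6, 7, 10 and 11, matching the output requested in the question.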

Continuously-updated (running-count) output from a program reading from a pipeline

How can I get continuously-updated output from a program that's reading from a pipeline? For example, let's say that this program were a version of wc:
$ ls | running_wc
So I'd like this to output instantly, e.g.
0 0 0
and then every time a new output line is received, it'd update again, e.g.
1 2 12
2 4 24
etc.
Of course my command isn't really ls, it's a process that slowly outputs data... I'd actually love to dynamically have it count matches and non matches, and sum this info up on a single line, e.g,
$ my_process | count_matches error
This would constantly update a single line of output with the matching and non matching counts, e.g.
$ my_process | count_matches error
0 5
then later on it might look like so, since it's found 2 matches and 10 non matching lines.
$ my_process | count_matches error
2 10
dd will print out statistics if it receives a SIGUSR1 signal, but neither wc nor grep does that. You'll need to re-implement them, more or less.
count_matches() {
    local pattern=$1
    local matches=0 nonmatches=0
    local line
    while IFS= read -r line; do
        if [[ $line == *$pattern* ]]; then ((++matches)); else ((++nonmatches)); fi
        printf '\r%s %s' "$matches" "$nonmatches"
    done
    printf '\n'
}
Printing a carriage return \r each time causes the printouts to overwrite each other.
Most programs will switch from line buffering to full buffering when used in a pipeline. Your slow-running program should flush its output after each line to ensure the results are available immediately. Or if you can't modify it, you can often use stdbuf -oL to force programs that use C stdio to line buffer stdout.
stdbuf -oL my_process | count_matches error
Using awk. First we create the "my_process":
$ for i in {1..10} ; do echo $i ; sleep 1 ; done # slowly prints lines
The match counter:
$ awk 'BEGIN {
print "match","miss" # print header
m=0 # reset match count
}
{
if($1~/(3|6)/) # match is a 3 or 6 (for this output)
m++ # increment match count
print m,NR-m # for each record output match / miss counts
}'
Running it:
$ for i in {1..10} ; do echo $i ; sleep 1 ; done | awk 'BEGIN{print "match","miss";m=0}{if($1~/(3|6)/)m++;print m,NR-m}'
match miss
0 1
0 2
1 2
1 3
1 4
2 4
2 5
2 6
2 7
2 8
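To get the single, constantly rewritten line the question asks for, the awk counter can print a carriage return instead of a newline. This is a minimal sketch, assuming an awk that supports fflush() (gawk and mawk do); count_matches_awk is a hypothetical name, and the pattern is treated as a regular expression.

```shell
#!/bin/bash
count_matches_awk() {
    awk -v pat="$1" '
    BEGIN { printf "\r%d %d", 0, 0; fflush() }   # show 0 0 before any input arrives
    {
        if ($0 ~ pat) m++                        # m counts matching lines
        printf "\r%d %d", m, NR - m              # overwrite the same terminal line
        fflush()                                 # do not wait for the buffer to fill
    }
    END { printf "\n" }'
}

# Fast stand-in for a slow producer; tr turns each overwrite into its own line
# so the final state can be inspected.
final=$(printf '%s\n' ok error ok error ok | count_matches_awk error | tr '\r' '\n' | tail -n 1)
echo "$final"
```

On a terminal, run it as my_process | count_matches_awk error without the tr and tail stages, so the carriage returns rewrite the line in place.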

Paste files conditionally with bash if and awk loop

I have a list of files that I want to paste to a master file (bar) if some awk condition is fulfilled.
for foo in *;
do
if awk '*condition* {exit 1}' $foo
then
:
else
paste $foo > bar
fi
done
However, it looks like only the last pasted file is in bar. Shouldn't paste add new columns to bar every time, without overwriting all the data completely?
File1    File2    Expected_Output    Actual_Output
1 4      1 NaN    1 4 1 NaN          1 NaN
2 5      2 7      2 5 2 7            2 7
3 6      3 8      3 6 3 8            3 8
Your paste command overwrites file bar at each iteration in the loop, which explains that at the end you only have the last file.
declare -a FILES=()
for foo in *;
do
    if awk '*condition* {exit 1}' "$foo"
    then
        :
    else
        FILES+=("$foo")
    fi
done
paste "${FILES[@]}" > bar
This code accumulates all filenames that match your condition in an array named FILES, and calls paste only once, expanding all filenames into individual, quoted arguments (this is what "${FILES[@]}" does) and redirecting output to the bar file.
Additionally, you can replace the whole if/then/else block with:
awk '*condition* {exit 1}' "$foo" || FILES+=("$foo")
The || is the logical OR operator. Because Bash short-circuits logical operators, the command on the right runs only if awk returns a non-zero exit status.
Please note that I quoted "$foo" when passing it to awk, in case your file names contain spaces or other special characters.
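The accumulate-then-paste approach can be tried end to end with throwaway files. This is a sketch under stated assumptions: the condition $1 > 100 is a made-up stand-in for the real awk condition, and the file names and temporary paths are hypothetical.

```shell
#!/bin/bash
dir=$(mktemp -d)                       # scratch directory for the sample inputs
printf '%s\n' 1 200 3 > "$dir/a.txt"
printf '%s\n' 4 5 6   > "$dir/b.txt"
printf '%s\n' 7 800 9 > "$dir/c.txt"
bar=$(mktemp)                          # output lives outside the globbed directory

FILES=()
for foo in "$dir"/*.txt; do
    # awk exits 1 as soon as the stand-in condition is met, so || adds the file.
    awk '$1 > 100 {exit 1}' "$foo" || FILES+=("$foo")
done

# One paste call builds all columns at once; skip it if nothing matched.
if [ "${#FILES[@]}" -gt 0 ]; then
    paste "${FILES[@]}" > "$bar"
fi
cat "$bar"
rm -rf "$dir"
```

Writing bar outside the directory being globbed matters: with for foo in * in the same directory, a bar left over from an earlier run would itself be picked up as an input file.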

Bash script cut at specific ranges

I have a log file with plenty of collected logs, and I already have a grep command with a regex that outputs the line numbers of the lines that match it.
This is the grep command I'm using to output the matched lines:
grep -n -E 'START_REGEX|END_REGEX' Example.log | cut -d ':' -f 1 > ranges.txt
The regex is an alternation: it can match either the beginning of a specific log or its end, so the output is something like:
12
45
128
136
...
The idea is to use this as a source of ranges to cut specific sections from the log file, from the first number to the second, and save them in another file.
The ranges are made by couples of the output, according to the example the first range is 12,45 and the second 128,136.
I expect to see in the final file all the text from line 12 to 45 and then from 128 to 136.
The problem I'm facing is that the sed command seems to work with only one range at a time.
sed -E -iTMP "$START_RANGE,$END_RANGE! d;$END_RANGEq" $FILE_NAME
Is there any way (maybe with awk) to do that just in one "cycle"?
Constraints: I can only use supported bash command.
You can use an awk statement, too
awk '(NR>=12 && NR<=45) || (NR>=128 && NR<=136)' file
where NR is a special variable in Awk which keeps track of the line number as it processes the file.
An example,
seq 1 10 > file
cat file
1
2
3
4
5
6
7
8
9
10
awk '(NR>=1 && NR<=3) || (NR>=8 && NR<=10)' file
1
2
3
8
9
10
You can also avoid hard-coding the line numbers by passing them in with the -v option:
awk -v start1=1 -v end1=3 -v start2=8 -v end2=10 '(NR>=start1 && NR<=end1) || (NR>=start2 && NR<=end2)' file
1
2
3
8
9
10
With sed you can do multiple ranges of lines like so:
sed -n '12,45p;128,136p' file
This would output lines 12-45, then 128-136.
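Neither answer reads the pairs from ranges.txt itself. One way to do that in a single awk pass is sketched below, assuming the file holds one line number per line in start, end, start, end order; extract_ranges is a hypothetical helper, and temporary files stand in for ranges.txt and Example.log.

```shell
#!/bin/bash
# extract_ranges is a hypothetical helper: $1 holds one line number per line
# (start, end, start, end, ...), $2 is the file to cut from.
extract_ranges() {
    awk '
    NR == FNR {                        # first file: collect the pairs
        if (FNR % 2) { start[++n] = $1 } else { stop[n] = $1 }
        next
    }
    {                                  # second file: print lines inside any range
        for (i = 1; i <= n; i++)
            if (FNR >= start[i] && FNR <= stop[i]) { print; break }
    }' "$1" "$2"
}

ranges=$(mktemp); log=$(mktemp)        # stand-ins for ranges.txt and Example.log
printf '%s\n' 1 3 8 10 > "$ranges"
seq 1 10 > "$log"
picked=$(extract_ranges "$ranges" "$log" | paste -sd' ' -)
echo "$picked"
rm -f "$ranges" "$log"
```

Against the real files the call would be extract_ranges ranges.txt Example.log > output.log, which reads the log only once regardless of how many ranges there are.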
