Before starting to explain my issue I have to say that it's the first time I'm using bash and the awk command.
I have a file containing a lot of lines and I am interested in printing some of these lines if certain characters of the line satisfy a condition. I already have a simple method which is working but I intend to try with awk to see if it can be faster. The command I'm trying was inspired by a colleague at work but I don't fully understand it.
My file looks like this:
# 15247.479
1 23775U 96005A 18088.90328565 -.00000293 +00000-0 +00000-0 0 9992
2 23775 014.2616 019.1859 0018427 174.9850 255.8427 00.99889926081074
# 15250.479
1 23775U 96005A 18088.35358271 -.00000295 +00000-0 +00000-0 0 9990
2 23775 014.2614 019.1913 0018425 174.9634 058.1812 00.99890136081067
The 4th field of the line starting with 1 is a date, and I want to print the lines starting with 1 and 2 if that date is greater than startDate and less than endDate.
I am trying with :
< $file awk ' BEGIN {ok=0}
{date=substring($0,19,10) if ($date>='$firstTime' && $date<= '$lastTime' ) {print; ok=1} else ok=0;next}
{if (ok) print}'
This returns a syntax error but I fear it is not the only problem. I don't really understand what the $0 in substring refers to.
Thanks everyone for the help !
Per the question about $0:
Awk is a language built for processing tables and has language features specific to both filtering and manipulating tabular data. One language feature is automatic field splitting.
If you see a $ in front of a variable or constant, it is referring to a "field." When awk sees $field_number being used in a variable context, awk splits the current record buffer based upon what is in the FS variable and allows you to work on that just as you would any other variable -- just that the backing store for that variable is the record buffer.
$0 is a special field referring to the whole of the record buffer. The awk documentation has some interesting notes, worth an in-depth read, about the side effects on $0 of assigning to $field_number variables, FS and OFS.
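To make the relationship between $0 and the numbered fields concrete, here is a small sketch. The record text is borrowed from the question's sample; the shell variable names whole, fourth and rebuilt are made up for illustration.

```shell
# Sample record, shortened from the question's data.
record='1 23775U 96005A 18088.90328565'

# $0 is the whole record; $4 is the fourth whitespace-separated field.
whole=$(printf '%s\n' "$record" | awk '{ print $0 }')
fourth=$(printf '%s\n' "$record" | awk '{ print $4 }')

# Assigning to a field makes awk rebuild $0, joining the fields with OFS.
rebuilt=$(printf '%s\n' "$record" | awk 'BEGIN { OFS = "," } { $2 = "X"; print $0 }')

printf '%s\n' "$whole" "$fourth" "$rebuilt"
```

The last line printed is 1,X,96005A,18088.90328565: changing $2 caused the whole record to be re-joined with the comma from OFS, which is the kind of side effect the documentation describes.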
Here is my answer to your application:
(1) First, setting LC_ALL=C may help with speed. I'm using ll/ul for lower and upper limits -- the reason for which will be apparent later. Passing them in as awk variables with -v, outside the script, helps readability. It is also good practice to quote shell variables properly.
(2) It is good practice to use BEGIN { ... }, as you did in your attempt, to formally initialize variables. If using gawk, we can use LINT = 1 to test things like this.
(3) /^#/ is probably the simplest (and fastest) pattern for our reset. We use next because we never want to apply the limits to this line and we never want to see this line in our output (even if ll = ul = "").
(4) It is surprisingly easy to make a mistake on limits. Implement limits consistently one way, and our readers will thank us. We remember to check corner cases where ll and/or ul are blank. One corner case is where we have already triggered our limits and we are waiting for /^#/ -- we don't want to rescan the limits again while ok.
(5) The default action of a pattern is to print.
(6) Remembering to quote our filename variable will save us someday when we inevitably encounter the stray "$file" with spaces in the name.
LC_ALL=C awk -v ll="$firstTime" -v ul="$lastTime" ' # (1)
BEGIN { ok = 0 } # (2)
/^#/ { ok = 0; next } # (3)
!ok { ok = (ll == "" || ll <= $4) && (ul == "" || $4 <= ul) } # (4)
ok # <- print if ok # (5)
' "$file" # (6)
You're missing a ; between the variable assignment and if. And instead of concatenating shell variables, assign them to awk variables. There's no need to initialize ok=0, uninitialized variables are automatically treated as falsey. And if you want to access a field of the input, use $n where n is the field number, rather than substr().
You need to set ok=0 when you get to the next line beginning with #, otherwise you'll just keep printing the rest of the file.
awk -v firstTime="$firstTime" -v lastTime="$lastTime" '
NF > 3 && $4 > firstTime && $4 <= lastTime { ok=1 }
$1 == "#" { ok = 0 }
ok { print }' "$file"
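As a rough check of the flag-based idea on the question's sample data, something like the following can be run. The limits 18088.5 and 18089 are made-up values chosen so that only the first block falls in range, and in this sketch the range rule only sets the flag, leaving the printing to the last rule so that each line comes out once.

```shell
# Sample data copied from the question, written to a temporary file.
file=$(mktemp)
cat > "$file" <<'EOF'
# 15247.479
1 23775U 96005A 18088.90328565 -.00000293 +00000-0 +00000-0 0 9992
2 23775 014.2616 019.1859 0018427 174.9850 255.8427 00.99889926081074
# 15250.479
1 23775U 96005A 18088.35358271 -.00000295 +00000-0 +00000-0 0 9990
2 23775 014.2614 019.1913 0018425 174.9634 058.1812 00.99890136081067
EOF

# Hypothetical limits: only the first block has a date between them.
firstTime=18088.5
lastTime=18089

selected=$(awk -v firstTime="$firstTime" -v lastTime="$lastTime" '
$1 == "#" { ok = 0 }                                   # marker line: reset the flag
NF > 3 && $4 > firstTime && $4 <= lastTime { ok = 1 }  # date in range: set the flag
ok                                                     # print while the flag is set
' "$file")

printf '%s\n' "$selected"
rm -f "$file"
```

With these limits only the "1" and "2" lines of the first block are printed; the marker lines never are, because the flag is cleared before the print rule is reached.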
This answer is based on my original one, but takes into account new information that @clem gave us in a comment: we now know that the line we need to test always immediately follows the line matching /^#/. Therefore, when we match /^#/ in this new solution, we immediately call getline to grab the next line and set ok based on that line's data. We now check the limits only on the line after our match, and not on lines where we shouldn't.
LC_ALL=C awk -v ll="$firstTime" -v ul="$lastTime" '
BEGIN { ok = 0 }
/^#/ {
getline
ok = (ll == "" || ll <= $4) && (ul == "" || $4 <= ul)
}
ok # <- print if ok
' "$file"
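Here is a sketch of the getline variant run on the question's sample data with a blank lower limit, which is one of the corner cases mentioned earlier. The upper limit 18088.5 is a made-up value, and the check on getline's return value is my own addition, there only in case a # line happens to be the last line of the file.

```shell
# Sample data copied from the question, written to a temporary file.
file=$(mktemp)
cat > "$file" <<'EOF'
# 15247.479
1 23775U 96005A 18088.90328565 -.00000293 +00000-0 +00000-0 0 9992
2 23775 014.2616 019.1859 0018427 174.9850 255.8427 00.99889926081074
# 15250.479
1 23775U 96005A 18088.35358271 -.00000295 +00000-0 +00000-0 0 9990
2 23775 014.2614 019.1913 0018425 174.9634 058.1812 00.99890136081067
EOF

# Hypothetical limits: no lower bound, upper bound between the two dates.
firstTime=''
lastTime=18088.5

selected=$(LC_ALL=C awk -v ll="$firstTime" -v ul="$lastTime" '
/^#/ {
    # Read the line after the marker; keep ok at 0 if there is none.
    ok = 0
    if ((getline) > 0)
        ok = (ll == "" || ll <= $4) && (ul == "" || $4 <= ul)
}
ok    # print the current line while the flag is set
' "$file")

printf '%s\n' "$selected"
rm -f "$file"
```

Only the second block is printed here, since 18088.35358271 is below the upper limit and 18088.90328565 is not. The "2" lines are never compared against the limits; they just inherit the flag set on the line before them.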
So I have a file that contains some lines of text separated by ','. I want to create a script that counts how many parts a line has, and if the line contains 16 parts I want to add a new one. So far it's working great. The only thing that is not working is appending the ',xx' at the end. See my example below:
Original file:
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
Expected result:
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
This is my code:
while read p; do
if [[ $p == "HEA"* ]]
then
IFS=',' read -ra ADDR <<< "$p"
echo ${#ADDR[@]}
arrayCount=${#ADDR[@]}
if [ "${arrayCount}" -eq 16 ];
then
sed -i "/$p/ s/\$/,xx/g" $f
fi
fi
done <$f
Result:
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
,xx
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
,xx
What am I doing wrong? I'm sure it's something small but I can't find it.
It can be done using awk:
awk -F, 'NF==16{$0 = $0 FS "xx"} 1' file
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
-F, sets input field separator as comma
NF==16 is the condition that says execute block inside { and } if # of fields is 16
$0 = $0 FS "xx" appends xx at end of line
1 is an always-true pattern with no action block, so awk applies its default action and prints the (possibly modified) line
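Since the question edits the file in place with sed -i, it may be worth noting that the awk command above only writes to standard output. A minimal sketch of one common workaround, assuming it is acceptable to go through a temporary file (f is the question's variable; tmp and updated are names I made up):

```shell
# Sample input from the question, in a temporary file.
f=$(mktemp)
cat > "$f" <<'EOF'
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
EOF

# Write the edited copy to a second temporary file, then replace the
# original only if awk succeeded.
tmp=$(mktemp)
awk -F, 'NF == 16 { $0 = $0 FS "xx" } 1' "$f" > "$tmp" && mv "$tmp" "$f"

updated=$(cat "$f")
printf '%s\n' "$updated"
rm -f "$f"
```

Redirecting straight back onto "$f" would truncate the file before awk reads it, which is why the copy goes elsewhere first.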
To do this with sed, the approach is as follows:
Use ${line_number} s/..../..../ format - to target a specific line, you need to find out the line number first.
Use the special char & to denote the matched string
The sed statement should look like the following:
sed -i "${line_number}s/.*/&,xx/" "$f"
I would prefer to leave it to you to play around with it, but if you would prefer I can give you a full working sample.
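For reference, one way such a sample might look. This is only a sketch: it uses awk to find the line numbers, builds one sed command per matching line, and prints the result to standard output rather than editing in place. The names sed_script and result are made up for illustration.

```shell
# Sample input from the question, in a temporary file.
f=$(mktemp)
cat > "$f" <<'EOF'
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
EOF

# Step 1: for every line with exactly 16 fields, emit a sed command that
# targets that line number, e.g. 1s/.*/&,xx/
sed_script=$(awk -F, 'NF == 16 { print NR "s/.*/&,xx/" }' "$f")

# Step 2: run all generated commands in one sed pass. In the replacement,
# & stands for the matched text, here the whole line.
result=$(sed "$sed_script" "$f")

printf '%s\n' "$result"
rm -f "$f"
```

Addressing lines by number avoids using the line's own content as a regular expression, which is what the loop in the question does with /$p/.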
Let's say I have a file looking somewhat like this:
X NeedThis1 KEYWORD
.
.
NeedThis2 X KEYWORD
And I need to combine the two lines into one like this:
NeedThis2 NeedThis1 KEYWORD
It needs to be done for every line in that file that contains the same KEYWORD but it can't combine two lines that look like this (two X's at the first|second position)
X NeedThis1 KEYWORD
X NeedThis2 KEYWORD
I consider myself a bash noob, so any advice on whether it can be done with something like awk or sed would be appreciated.
awk '
{if ($1 == "X") end[$3] = $2; else start[$3] = $1}
END {for (kw in start) if (kw in end) print start[kw], end[kw], kw}
' file
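A quick check of this array-based approach on made-up data, as a sketch. KEY_A and KEY_B are hypothetical keywords: KEY_A has one line of each shape, while KEY_B only has two lines starting with X and so should not be combined.

```shell
# Hypothetical input modelled on the question's layout.
file=$(mktemp)
cat > "$file" <<'EOF'
X NeedThis1 KEY_A
.
NeedThis2 X KEY_A
X Other1 KEY_B
X Other2 KEY_B
EOF

combined=$(awk '
# Lines starting with X supply the second output column; others supply the first.
{ if ($1 == "X") end[$3] = $2; else start[$3] = $1 }
# Print only keywords that were seen in both shapes.
END { for (kw in start) if (kw in end) print start[kw], end[kw], kw }
' "$file")

printf '%s\n' "$combined"
rm -f "$file"
```

This prints NeedThis2 NeedThis1 KEY_A and nothing for KEY_B. Because everything is collected before the END block runs, the two lines for a keyword can appear in either order, but the order of the output lines across keywords is not guaranteed.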
Try this:
awk '
$1=="X" {key = $NF; value = $2; next}
$2=="X" && $NF==key {print $1, value, key}' file
Explanation:
When the first field of a line is X, store the last field as key and the second field as value.
Look for the next line where the second field is X and the last field matches the key stored by the previous action.
When found, print the first field of the current line, followed by the stored value and the key, which gives the order shown in the question.
This will most definitely break if your data does not match the sample you have shown (if it has more spaces or fields in between), so feel free to adjust as per your needs.
I won't give you the full answer, but if you have some way to identify "KEYWORD" (not in your problem statement), then use a BASH associative array:
declare -A keys
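while IFS= read -u3 -r line
do
set -- $line                          # split the line into positional parameters
keyword=${!#}                         # last word on the line
keys[$keyword]+=${line%$keyword}      # collect the rest of the line under that keyword
done 3< file                          # feed the file to read on descriptor 3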
You'll certainly have to do some more fiddling, but your problem statement is incomplete and some of the work needs to be an exercise for the reader.