match a pattern and print nth line if condition matches - shell

My requirement is something like this:
Read File:
If ( line contains /String1/)
{
Increment cursor by 10 lines;
If (line contains /String2/ )
{ print line; }
}
so far I have got:
awk '/String1/{nr[NR]; nr[NR+10]}; NR in nr' file1.log
Result of this should input to:
awk 'match ($0 , /String2/) {print $0}' file1.log
How can I achieve it? Is there a better way?
Thanks.

You are close. Record only the line number 10 lines ahead, then test for String2 when you reach that line.
awk '/String1/ { linematch[NR+10]=1; } /String2/ && NR in linematch;' file1.log
Each time a line matches String1, you save the record (line) number plus 10. Any time you match String2, check if the current line number is one we are expecting, and if so, print the line.

Here's another way to describe your algorithm. Instead of:
If ( line contains /String1/)
{
Increment cursor by 10 lines;
If (line contains /String2/ )
{ print line; }
}
which would require jumping ahead in your input stream somehow, think of it as:
If ( line contains /String2/)
{
If (line 10 lines previously contained /String1/ )
{ print line; }
}
which just requires you to re-visit what you already read in:
awk '/String1/{f[NR]} /String2/ && (NR-10) in f' file
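To check that the look-ahead and look-back commands agree, here is a minimal sketch on generated input. The function names (make_demo, look_ahead, look_back) and the line contents are made up for illustration. String1 sits on line 3, so only the String2 on line 13 should print, not the one on line 8.

```shell
# Hypothetical test input: 20 lines, String1 on line 3, String2 on lines 8 and 13.
make_demo() {
    awk 'BEGIN {
        for (i = 1; i <= 20; i++) {
            s = "line " i
            if (i == 3) s = s " String1"
            if (i == 8 || i == 13) s = s " String2"
            print s
        }
    }'
}

# Look-ahead: when String1 is seen, remember the line number 10 lines later.
look_ahead() { awk '/String1/ { want[NR+10] = 1 } /String2/ && NR in want'; }

# Look-back: remember where String1 was seen, then test NR-10 on String2 lines.
look_back() { awk '/String1/ { seen[NR] } /String2/ && (NR-10) in seen'; }

make_demo | look_ahead
make_demo | look_back
```

Both pipelines print the same single line, "line 13 String2".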


How can I calculate the number of occurrences that is followed by a specific value? (add if statement)

How can I count the occurrences of the second value ONLY on lines whose first value starts with E (E*, e.g. 'EXXXX')?
file.txt:
E2dd,Rv0761,Rv1408
2s32,Rv0761,Rv1862,Rv3086
6r87,Rv0761
Rv2fd90c,Rv1408
Esf62,Rv0761
Evsf62,Rv3086
I tried:
awk -F, '{map[$2]++} END { for (key in map) { print key, map[key] } }' file.txt
and added:
if [[ $line2 == `E*` ]];then
but it does not work; I get a syntax error.
Expected Output:
total no of occurrences:
Rv0761: 2
Rv3086:1
Right now I can only count all occurrences of the second value.
if [[ $line2 == `E*` ]];then
This is definitely not a legal GNU AWK if statement; consult the If Statement documentation to see what is allowed. An if statement is not required in this case, though, as you can do the following. Let the content of file.txt be
E2dd,Rv0761,Rv1408
2s32,Rv0761,Rv1862,Rv3086
6r87,Rv0761
Rv2fd90c,Rv1408
Esf62,Rv0761
Evsf62,Rv3086
then
awk 'BEGIN{FS=","}($1~/^E/){map[$2]++} END { for (key in map) { print key, map[key] } }' file.txt
gives output
Rv3086 1
Rv0761 2
Explanation: an action (enclosed in {...}) can be preceded by a pattern, which restricts its execution to lines that match the pattern (in other words, lines for which the condition holds). In the above example the pattern is $1~/^E/, which means the 1st column starts with E.
(tested in gawk 4.2.1)
You are so close. You are only missing the REGEX to identify records beginning with 'E' and then a ":" concatenated on output to produce your desired results (not in sort-order). For example you can do:
awk -F, '/^E/{map[$2]++} END { for (key in map) { print key ":", map[key] } }' file.txt
Example Output
With your data in file.txt you would get:
Rv3086: 1
Rv0761: 2
If you need the output sorted in some way, just pipe the output of the awk command to sort with whatever option you need.
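Because `for (key in map)` visits keys in an unspecified order, piping through sort is what makes the output stable. A minimal sketch, assuming a plain lexical sort is what you want (count_e is a hypothetical wrapper name, not part of the original commands):

```shell
# count_e is a hypothetical name; it reads the comma-separated data on stdin,
# counts field 2 on records whose first field starts with E, and sorts the result.
count_e() {
    awk -F, '$1 ~ /^E/ { map[$2]++ } END { for (key in map) print key ":", map[key] }' | sort
}

count_e <<'EOF'
E2dd,Rv0761,Rv1408
2s32,Rv0761,Rv1862,Rv3086
6r87,Rv0761
Rv2fd90c,Rv1408
Esf62,Rv0761
Evsf62,Rv3086
EOF
```

With the sample data this prints "Rv0761: 2" followed by "Rv3086: 1". Use sort options such as -t and -k if you would rather order by the count.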

GNU sed 4.2.1 matching second occurrence

With this command I am trying to insert M09 between two patterns of M502 and the second line since M502 that looks like X41.5Y251.5T201.
Specifically I am trying to insert M09 before the last line.
sed -i "/M502/{:a;N;/\(.*\)T\([0-9]\+\)/!ba;N;s/\(.*\)\n\(.*\)T\([0-9]\+\)/\1\nM09\n\2T\3/}" *.nc
Result:
M502
M09
X1287.2Y353.28T324
..........
G27X-310.27
X41.5Y251.5T201
Should be:
M502
X1287.2Y353.28T324
..........
G27X-310.27
M09
X41.5Y251.5T201
I am trying to match the last line with \(.*\)T\([0-9]\+\), but that picks up the second line. How do I make sed ignore the first match of this expression?
I stress that I could have an undetermined number of such text blocks and I need to insert M09 for every one.
With your shown samples, please try following awk code.
awk '
/.*T[0-9][0-9]*/{ found=1 }
FNR==1 { prev=$0; next }
{
print prev
prev=$0
}
END{
if(found) { print "M09" }
print prev
}
' Input_file
The above code only prints the result to the terminal. If you want to save the output into Input_file itself, append > temp && mv temp Input_file to the above command.
How do I match the second instance of \(.*\)T\([0-9]\+\) ? It must be an expression. Thanks for the help.
Assuming that you would like to insert some text just before the line that matches the above regex for the 2nd time, you should consider using awk, as then you could do something like this:
$ awk '/.*T[0-9][0-9]*/ { n++ } !p && n == 2 { print "M09"; p = 1 } 1' input.txt
M502
X1287.2Y353.28T324
..........
G27X-310.27
M09
X41.5Y251.5T201
The breakdown of the above command is:
/.*T[0-9][0-9]*/ { n++ } # Every time the regex matches, add one to `n`
!p && n == 2 { print "M09"; p = 1 } # If n == 2 and we haven't printed "M09" yet,
# then print "M09"
1 # Print every line
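The command above inserts M09 only once per file. For a file that holds several such blocks, one possible sketch is to restart the count at every M502 line. This assumes each block starts with a line containing M502 and that M09 belongs before the second T<digits> line after it. The function name insert_m09 and the second sample block are made up for illustration.

```shell
# insert_m09 is a hypothetical name; it reads the .nc text on stdin.
insert_m09() {
    awk '
    # A new block starts: restart the count of T<digits> lines.
    /M502/ { seen = 0; inblock = 1 }
    # On the second T<digits> line of the block, emit M09 before the line itself.
    inblock && /T[0-9]+/ { if (++seen == 2) { print "M09"; inblock = 0 } }
    # Print every input line unchanged.
    { print }
    '
}

insert_m09 <<'EOF'
M502
X1287.2Y353.28T324
G27X-310.27
X41.5Y251.5T201
M502
X10.0Y20.0T101
G27X-5.0
X30.0Y40.0T102
EOF
```

Here M09 appears before X41.5Y251.5T201 and again before X30.0Y40.0T102. To change files in place, write to a temporary file and move it over the original, as with the earlier awk answer.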

Turning multi-line string into single comma-separated list in Bash

I have this format:
host1,app1
host1,app2
host1,app3
host2,app4
host2,app5
host2,app6
host3,app1
host4... and so on.
I need it like this format:
host1;app1,app2,app3
host2;app4,app5,app6
I have tried this: awk -vORS=, '{ print $2 }' data | sed 's/,$/\n/'
and it gives me this:
app1,app2,app3 without the host in front.
I do not want to show duplicates.
I do not want this:
host1;app1,app1,app1,app1...
host2;app1,app1,app1,app1...
I want this format:
host1;app1,app2,app3
host2;app2,app3,app4
host3;app2,app3
With input sorted on the first column (as in your example; otherwise just pipe it to sort), you can use the following awk command:
awk -F, 'NR == 1 { currentHost=$1; currentApps=$2 }
NR > 1 && currentHost == $1 { currentApps=currentApps "," $2 }
NR > 1 && currentHost != $1 { print currentHost ";" currentApps; currentHost=$1; currentApps=$2 }
END { print currentHost ";" currentApps }'
Compared with the other solutions posted as of this edit, it has the advantage of not holding the whole data set in memory. The cost is that the input must be sorted, and sorting unsorted input would itself need to hold a lot of data in memory.
Explanation:
the first line initializes the currentHost and currentApps variables to the values of the first line of the input
the second line handles a line with the same host as the previous one: the app mentioned in the line is appended to the currentApps variable
the third line handles a line with a different host than the previous one: the information for the previous host is printed, then the variables are reinitialized to the values of the current line of input
the last line prints the information for the current host once the end of the input has been reached
It can probably be refined (there is a lot of redundancy), but I'll leave that to someone more experienced with awk.
$ awk '
BEGIN { FS=","; ORS="" }
$1!=prev { print ors $1; prev=$1; ors=RS; OFS=";" }
{ print OFS $2; OFS=FS }
END { print ors }
' file
host1;app1,app2,app3
host2;app4,app5,app6
host3;app1
Maybe something like this:
#!/bin/bash
declare -A hosts
while IFS=, read host app
do
[ -z "${hosts["$host"]}" ] && hosts["$host"]="$host;"
hosts["$host"]+=$app,
done < testfile
printf "%s\n" "${hosts[@]%,}" | sort
The script reads the sample data from testfile and outputs to stdout.
You could try this awk script:
awk -F, '{a[$1]=($1 in a?a[$1]",":"")$2}END{for(i in a) printf "%s;%s\n",i,a[i]}' file
The script creates an entry in the array a for each unique value in the first column and appends to that entry every element from the second column.
When the file is parsed, the content of the array is printed.

awk get the nextline

I'm trying to use awk to format a file that contains multiple lines.
Contents of the file:
ABC;0;1
ABC;0;0;10
ABC;0;2
EFG;0;1;15
HIJ;2;8;00
KLM;4;18;12
KLM;6;18;1200
KLM;10;18;14
KLM;1;18;15
result desired:
ABC;0;1;10;2
EFG;0;1;15
HIJ;2;8;00
KLM;4;18;12;1200;14;15
I am using the code below :
awk -F ";" '{
ligne= ligne $0
ma_var = $1
{getline
if($1 != ma_var){
ligne= ligne "\n" $0
}
else {
ligne= ligne";"NF
}
}
}
END {
print ligne
} ' ${FILE_IN} > ${FILE_OUT}
The objective is to compare the first column of the next line to the first column of the current line: if they match, add the last column of the next line to the current line and delete the next line; otherwise print the next line.
Kind regards,
As with life, it's a lot easier to make decisions based on what has happened (the previous line) than on what will happen (the next line). Restate your requirements as "the objective is to compare the first column of the current line to the first column of the previous line; if it matches, add the last column of the current line to the previous line and delete the current line, else print the current line", and the code to implement it becomes relatively straightforward:
$ cat tst.awk
BEGIN { FS=OFS=";" }
$1 == p1 { prev = prev OFS $NF; next }
{ if (NR>1) print prev; prev=$0; p1=$1 }
END { print prev }
$ awk -f tst.awk file
ABC;0;1;10;2
EFG;0;1;15
HIJ;2;8;00
KLM;4;18;12;1200;14;15
If you're ever tempted to use getline again, be sure you fully understand everything discussed at http://awk.freeshell.org/AllAboutGetline before making a decision.
I would take a slightly different approach than Ed:
$ awk '$1 == p { printf ";%s", $NF; next } NR > 1 { print "" } {p=$1;
printf "%s" , $0} END{print ""}' FS=\; input
At each line, check if the first column matches the previous. If it does, just print the last field. If it doesn't, print the whole line with no trailing newline.

Remove empty lines followed by a pattern

I'm trying to find a way to remove empty lines which are found in my asciidoc file before a marker string, such as:
//Empty line
[source,shell]
I'd need:
[source,shell]
I'm trying with:
sed '/^\s*$\n[source,shell]/d' file
however it doesn't produce the expected effect (even when escaping the brackets). Any help?
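You can use this awk script to delete an empty line that directly precedes the marker:
awk -v desired_val="[source,shell]" '
BEGIN { first_time = 1 }
{
# Print the previous line unless it is empty and the current line is the marker
if ( first_time != 1 && !(prev == "" && $0 == desired_val) ) { print prev }
prev = $0
first_time = 0
}
END { if ( first_time != 1 ) print prev }' your_file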
The next script is a little longer than the previous one, but it deletes all empty lines before the desired value.
# AWK script file
# Provides clearing all empty lines in front of desired value
# Usage: awk -v DESIRED_VAL="your_value" -f "awk_script_fname" input_fname
BEGIN { i=0 }
{
# If line is empty - save counts of empty strings
if ( length($0) == 0 ) { i++; }
# If line is not empty and is DESIRED_VAL, print it
if ( length ($0) != 0 && $0 == DESIRED_VAL )
{
print $0; i=0;
}
# If line is not empty and not DESIRED_VAL, print all empty str and current
if ( length ($0) != 0 && $0 != DESIRED_VAL )
{
for (m=0;m<i;m++) { print ""; } i=0; print $0;
}
}
# If the input ends with empty lines, print them
END { for (m=0;m<i;m++) { print ""; } }
This awk script is run with the following command:
awk -v DESIRED_VAL="your_value" -f "awk_script_fname" input_fname
Your sed line doesn't work because sed processes one line at a time, so it will not match a pattern that includes \n unless you manipulate the pattern space.
If you still want to do it with sed:
sed '/^$/{N;s/\n\(\[source,shell]\)/\1/}' file
How it works: When matching an empty line, read the next line into the pattern space and remove the empty line if a marker is found. Note that this won't work correctly if you have two empty lines before the marker, as the first empty line will consume the second one and there will be no matching with the marker.
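The limitation described above is easy to see on two tiny inputs. This is only a sketch, assuming GNU sed (other sed implementations may want a `;` before the closing brace); strip_blank is a hypothetical wrapper name and the sample text is made up.

```shell
# strip_blank is a hypothetical wrapper around the sed command above.
strip_blank() { sed '/^$/{N;s/\n\(\[source,shell]\)/\1/}'; }

# One empty line before the marker: the empty line is removed.
printf 'text\n\n[source,shell]\ncode\n' | strip_blank

# Two empty lines before the marker: both survive, because the first empty
# line pulls in the second one and the marker never joins the pattern space.
printf 'text\n\n\n[source,shell]\ncode\n' | strip_blank
```

For runs of several empty lines, the counting awk script shown earlier in this thread is the more robust choice.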
