Print all lines between line containing a string and first blank line, starting with the line containing that string - bash

I've tried awk:
awk -v RS="zuzu_mumu" '{print RS $0}' input_file > output_file
The obtained file is the exact input_file but now the first line in file is zuzu_mumu.
How could be corrected my command?
After solved this, I've found the same string/patern in another arrangement; so I need to save all those records that match too, in an output file, following this rule:
if pattern match on a line, then look at previous lines and print the first line that follows an empty line, and print also the pattern match line and an empty line.
record 1
record 2
This is record 3 first line
info 1
info 2
This is one matched zuzu_mumu line
info 3
info 4
info 5
record 4
record 5
...
This is record n-1 first line
info a
This is one matched zuzu_mumu line
info b
info c
record n
...
I should obtain:
This is record 3 first line
This is one matched zuzu_mumu line
This is record n-1 first line
This is one matched zuzu_mumu line

Print all lines between line containing a string and first blank line,
starting with the line containing that string
I would use GNU AWK for this task. Let file.txt content be
Able
Baker
Charlie
Dog
Easy
Fox
then
awk 'index($0,"aker"){p=1}p{if(/^$/){exit};print}' file.txt
output
Baker
Charlie
Explanation: use index String function which gives either position of aker in whole line ($0) or 0 and treat this as condition, so this is used like is aker inside line? Note that using index rather than regular expression means we do not have to care about characters with special meaning, like for example .. If it does set p value to 1. If p then if it is empty line (it matches start of line followed by end of line) terminate processing (exit); print whole line as is.
(tested in gawk 4.2.1)

If you don't want to match the same line again, you can record all lines in an array and print the valid lines in the END block.
awk '
f && /zuzu_mumu/ { # If already found and found again
delete ary; entries=1; next; # Delete the array, reset entries and go to the next record
}
f || /zuzu_mumu/ { # If already found or match the word or interest
if(/^[[:blank:]]*$/){exit} # If only spaces, exit
f=1 # Mark as found
ary[entries++]=$0 # Add the current line to the array and increment the entry number
}
END {
for (j=1; j<entries; j++) # Loop and print the array values
print ary[j]
}
' file

Related

In bash how to transform multimap<K,V> to a map of <K, {V1,V2}>

I am processing output from a file in bash and need to group values by their keys.
For example, I have the
13,47099
13,54024
13,1
13,39956
13,0
17,126223
17,52782
17,4
17,62617
17,0
23,1022724
23,79958
23,80590
23,230
23,1
23,118224
23,0
23,1049
42,72470
42,80185
42,2
42,89199
42,0
54,70344
54,72824
54,1
54,62969
54,1
in a file and group all values from a particular key into a single line as in
13,47099,54024,1,39956,0
17,126223,52782,4,62617,0
23,1022724,79958,80590,230,1,118224,0,1049
42,72470,80185,2,89199,0
54,70344,72824,1,62969,1
There are about 10000 entries in my input file. How do I transform this data in shell ?
awk to the rescue!
assuming keys are contiguous...
$ awk -F, 'p!=$1 {if(a) print a; a=p=$1}
{a=a FS $2}
END {print a}' file
13,47099,54024,1,39956,0
17,126223,52782,4,62617,0
23,1022724,79958,80590,230,1,118224,0,1049
42,72470,80185,2,89199,0
54,70344,72824,1,62969,1
Here is a breakdown of what #karakfa's code is doing, for us awk beginners. I've written this based on a toy dataset file:
1,X
1,Y
3,Z
p!=$1: check if the pattern p!=$1 is true
checks if variable p is equal to the first field of the current (first) line of file (1 in this case)
since p is undefined at this point it cannot be equal to 1, so p!=$1 is true and we continue with this line of code
if(a) print a: check if variable a exists and print a if it does exists
since a is undefined at this point the print a command is not executed
a=p=$1: set variables a and p equal to the value of the first field of the current (first) line (1 in this case)
a=a FS $2: set variable a equal to a combined with the value of the second field of the current (first) line separated by the field separator (1,X in this case)
END: since we haven't reached the end of file yet, we skip the the rest of this line of code
move to the next (second) line of file and restart the awk code on that line
p!=$1: check if the pattern p!=$1 is true
since p is 1 and the first field of the current (second) line is 1, p!=$1 is false and we skip the the rest of this line of code
a=a FS $2: set a equal to the value of a and the value of the second field of the current (second) line separated by the filed separator (1,X,Y in this case)
END: since we haven't reached the end of file yet, we skip the the rest of this line of code
move to the next (third) line of file and restart the awk code
p!=$1: check if the pattern p!=$1 is true
since p is 1 and $1 of the third line is 3, p!=$1 is true and we continue with this line of code
if(a) print a: check if variable a exists and print a if it does exists
since a is 1,X,Y at this point, 1,X,Y is printed to the output
a=p=$1: set variables a and p equal to the value of the first field of the current (third) line (3 in this case)
a=a FS $2: set variable a equal to a combined with the value of the second field of the current (third) line separated by the field separator (3,Z in this case)
END {print a}: since we have reached the end of file, execute this code
print a: print the last group a (3,Z in this case)
The resulting output is
1,X,Y
3,Z
Please let me know if there are any errors in this description.
Slight tweak to #karakfa's answer. If you want the separator between the key and the values to be different than the separator between the values, you can use this code:
awk -F, 'p==$1 {a=a "; " $2} p!=$1 {if(a) print a; a=$0; p=$1} END {print a}'

How can I retrieve the matching records from mentioned file format in bash

XYZNA0000778800Z
16123000012300321000000008000000000000000
16124000012300322000000007000000000000000
17234000012300323000000005000000000000000
17345000012300324000000004000000000000000
17456000012300325000000003000000000000000
9
XYZNA0000778900Z
16123000012300321000000008000000000000000
16124000012300322000000007000000000000000
17234000012300323000000005000000000000000
17345000012300324000000004000000000000000
17456000012300325000000003000000000000000
9
I have above file format from which I want to find a matching record. For example, match a number(7789) on line starting with XYZ and once matched look for a matching number (7345) in lines below starting with 1 until it reaches to line starting with 9. retrieve the entire line record. How can I accomplish this using shell script, awk, sed or any combination.
Expected Output:
XYZNA0000778900Z
17345000012300324000000004000000000000000
With sed one can do:
$ sed -n '/^XYZ.*7789/,/^9$/{/^1.*7345/p}' file
17345000012300324000000004000000000000000
Breakdown:
sed -n ' ' # -n disabled automatic printing
/^XYZ.*7789/, # Match line starting with XYZ, and
# containing 7789
/^1.*7345/p # Print line starting with 1 and
# containing 7345, which is coming
# after the previous match
/^9$/ { } # Match line that is 9
range { stuff } will execute stuff when it's inside range, in this case the range is starting at /^XYZ.*7789/ and ending with /^9$/.
.* will match anything but newlines zero or more times.
If you want to print the whole block matching the conditions, one can use:
$ sed -n '/^XYZ.*7789/{:s;N;/\n9$/!bs;/\n1.*7345/p}' file
XYZNA0000778900Z
16123000012300321000000008000000000000000
16124000012300322000000007000000000000000
17234000012300323000000005000000000000000
17345000012300324000000004000000000000000
17456000012300325000000003000000000000000
9
This works by reading lines between ^XYZ.*7779 and ^9$ into the pattern
space. And then printing the whole thing if ^1.*7345 can be matches:
sed -n ' ' # -n disables printing
/^XYZ.*7789/{ } # Match line starting
# with XYZ that also contains 7789
:s; # Define label s
N; # Append next line to pattern space
/\n9$/!bs; # Goto s unless \n9$ matches
/\n1.*7345/p # Print whole pattern space
# if \n1.*7345 matches
I'd use awk:
awk -v rid=7789 -v fid=7345 -v RS='\n9\n' -F '\n' 'index($1, rid) { for(i = 2; i < $NF; ++i) { if(index($i, fid)) { print $i; next } } }' filename
This works as follows:
-v RS='\n9\n' is the meat of the whole thing. Awk separates its input into records (by default lines). This sets the record separator to \n9\n, which means that records are separated by lines with a single 9 on them. These records are further separated into fields, and
-F '\n' tells awk that fields in a record are separated by newlines, so that each line in a record becomes a field.
-v rid=7789 -v fid=7345 sets two awk variables rid and fid (meant by me as record identifier and field identifier, respectively. The names are arbitrary.) to your search strings. You could encode these in the awk script directly, but this way makes it easier and safer to replace the values with those of a shell variables (which I expect you'll want to do).
Then the code:
index($1, rid) { # In records whose first field contains rid
for(i = 2; i < $NF; ++i) { # Walk through the fields from the second
if(index($i, fid)) { # When you find one that contains fid
print $i # Print it,
next # and continue with the next record.
} # Remove the "next" line if you want all matching
} # fields.
}
Note that multi-character record separators are not strictly required by POSIX awk, and I'm not certain if BSD awk accepts it. Both GNU awk and mawk do, though.
EDIT: Misread question the first time around.
an extendable awk script can be
$ awk '/^9$/{s=0} s&&/7345/; /^XYZ/&&/7789/{s=1} ' file
set flag s when line starts with XYZ and contains 7789; reset when line is just 9, and print when flag is set and contains pattern 7345.
This might work for you (GNU sed):
sed -n '/^XYZ/h;//!H;/^9/!b;x;/^XYZ[^\n]*7789/!b;/7345/p' file
Use the option -n for the grep-like nature of sed. Gather up records beginning with XYZ and ending in 9. Reject any records which do not have 7789 in the header. Print any remaining records that contain 7345.
If the 7345 will always follow the header,this could be shortened to:
sed -n '/^XYZ/h;//!H;/^9/!b;x;/^XYZ[^\n]*7789.*7345/p' file
If all records are well-formed (begin XYZ and end in 9) then use:
sed -n '/^XYZ/h;//!H;/^9/!b;x;/^[^\n]*7789.*7345/p' file

extract each line followed by a line with a different value in column two

Given the following file structure,
9.975 1.49000000 0.295 0 0.4880 0.4929 0.5113 0.5245 2.016726 1.0472 -30.7449 1
9.975 1.49000000 0.295 1 0.4870 0.5056 0.5188 0.5045 2.015859 1.0442 -30.7653 1
9.975 1.50000000 0.295 0 0.5145 0.4984 0.4873 0.5019 2.002143 1.0854 -30.3044 2
is there a way to extract each line in which the value in column two is not equal to the value in column two in the following line?
I.e. from these three lines I would like to extract the second one, since 1.49 is not equal to 1.50.
Maybe with sed or awk?
This is how I do this in MATLAB:
myline = 1;
mynewline = 1;
while myline < length(myfile)
if myfile(myline,2) ~= myfile(myline+1,2)
mynewfile(mynewline,:) = myfile(myline,:);
mynewline = mynewline+1;
myline = myline+1;
else
myline = myline+1;
end
end
However, my files are so large now that I would prefer to carry out this extraction in terminal before transferring them to my laptop.
Awk should do.
<data awk '($2 != prev) {print line} {line = $0; prev = $2}'
A brief intro to awk: awk program consists of a set of condition {code} blocks. It operates line by line. When no condition is given, the block is executed for each line. BEGIN condition is executed before the first line. Each line is split to fields, which are accessible with $_number_. The full line is in $0.
Here I compare the second field to the previous value, if it does not match I print the whole previous line. In all cases I store the current line into line and the second field into prev.
And if you really want it right, careful with the float comparisons - something like abs($2 - prev) < eps (there is no abs in awk, you need to define it yourself, and eps is some small enough number). I'm actually not sure if awk converts to number for equality testing, if not you're safe with the string comparisons.
This might work for you (GNU sed):
sed -r 'N;/^((\S+)\s+){2}.*\n\S+\s+\2/!P;D' file
Read two lines at a time. Pattern match on the first two columns and only print the first line when the second column does not match.
Try following command:
awk '$2 != field && field { print line } { field = $2; line = $0 }' infile
It saves previous line and second field, comparing in next loop with current line values. The && field check is useful to avoid a blank line at the beginning of file, when $2 != field would match because variable is empty.
It yields:
9.975 1.49000000 0.295 1 0.4870 0.5056 0.5188 0.5045 2.015859 1.0442 -30.7653 1

AWK between 2 patterns - first occurence

I am having this example of ini file. I need to extract the names between 2 patterns Name_Z1 and OBJ=Name_Z1 and put them each on a line.
The problem is that there are more than one occurences with Name_Z1 and OBJ=Name_Z1 and i only need first occurence.
[Name_Z5]
random;text
Names;Jesus;Tom;Miguel
random;text
OBJ=Name_Z5
[Name_Z1]
random;text
Names;Jhon;Alex;Smith
random;text
OBJ=Name_Z1
[Name_Z2]
random;text
Names;Chris;Mara;Iordana
random;text
OBJ=Name_Z2
[Name_Z1_Phone]
random;text
Names;Bill;Stan;Mike
random;text
OBJ=Name_Z1_Phone
My desired output would be:
Jhon
Alex
Smith
I am currently writing a more ample script in bash and i am stuck on this. I prefer awk to do the job.
My greatly appreciation for who can help me. Thank you!
For Wintermute solution: The [Name_Z1] part looks like this:
[CAB_Z1]
READ_ONLY=false
FilterAttr=CeaseTime;blank|ObjectOfReference;contains;511047;512044;513008;593026;598326;CL5518;CL5521;CL5538;CL5612;CL5620|PerceivedSeverity;=;Critical;Major;Minor|ProbableCause;!=;HOUSE ALARM;IO DEVICE|ProblemText;contains;AIRE;ALIMENTA;BATER;CONVERTIDOR;DISTRIBUCION;FUEGO;HURTO;MAINS;MALLO;MAYOR;MENOR;PANEL;TEMP
NAME=CAB_Z1
And the [Name_Z1_Phone] part looks like this:
[CAB_Z1_FUEGO]
READ_ONLY=false
FilterAttr=CeaseTime;blank|ObjectOfReference;contains;511047;512044;513008;593026;598326;CL5518;CL5521;CL5538;CL5612;CL5620|PerceivedSeverity;=;Critical;Major;Minor|ProbableCause;!=;HOUSE ALARM;IO DEVICE|ProblemText;contains;FUEGO
NAME=CAB_Z1_FUEGO
The fix should be somewhere around the "|PerceivedSeverity"
Expected Output:
511047
512044
513008
593026
598326
CL5518
CL5521
CL5538
CL5612
CL5620
This should work:
sed -n '/^\[Name_Z1/,/^OBJ=Name_Z1/ { /^Names/ { s/^Names;//; s/;/\n/g; p; q } }' foo.txt
Explanation: Written readably, the code is
/^\[Name_Z1/,/^OBJ=Name_Z1/ {
/^Names/ {
s/^Names;//
s/;/\n/g
p
q
}
}
This means: In the pattern range /^\[Name_Z1/,/^OBJ=Name_Z1/, for all lines that match the pattern /^Names/, remove the Names; in the beginning, then replace all remaining ; with newlines, print the whole thing, and then quit. Since it immediately quits, it will only handle the first such line in the first such pattern range.
EDIT: The update made things a bit more complicated. I suggest
sed -n '/^\[CAB_Z1/,/^NAME=CAB_Z1/ { /^FilterAttr=/ { s/^.*contains;\(.*\)|PerceivedSeverity.*$/\1/; s/;/\n/g; p; q } }' foo.txt
The main difference is that instead of removing ^Names from a line, the substitution
s/^.*contains;\(.*\)|PerceivedSeverity.*$/\1/;
is applied. This isolates the part between contains; and |PerceivedSeverity before continuing as before. It assumes that there is only one such part in the line. If the match is ambiguous, it will pick the one that appears last in the line.
An (g)awk way that doesn't need a set number of fields(although i have assumed that contains; will always be on the line you need the names from.
(g)awk '(x+=/Z1/)&&match($0,/contains;([^|]+)/,a)&&gsub(";","\n",a[1]){print a[1];exit}' f
Explanation
(x+=/Z1/) - Increments x when Z1 is found. Also part of a
condition so x must exist to continue.
match($0,/contains;([^|]+)/,a) - Matches contains; and then captures everything after
up to the |. Stores the capture in a. Again a
condition so must succeed to continue.
gsub(";","\n",a[1]) - Substitutes all the ; for newlines in the capture
group a[1].
{print a[1];exit}' - If all conditions are met then print a[1] and exit.
This way should work in (m)awk
awk '(x+=/Z1/)&&/contains/{split($0,a,"|");y=split(a[2],b,";");for(i=3;i<=y;i++)
print b[i];exit}' file
sed -n '/\[Name_Z1\]/,/OBJ=Name_Z1$/ s/Names;//p' file.txt | tr ';' '\n'
That is sed -n to avoid printing anything not explicitly requested. Start from Name_Z1 and finish at OBJ=Name_Z1. Remove Names; and print the rest of the line where it occurs. Finally, replace semicolons with newlines.
Awk solution would be
$ awk -F";" '/Name_Z1/{f=1} f && /Names/{print $2,$3,$4} /OBJ=Name_Z1/{exit}' OFS="\n" input
Jhon
Alex
Smith
OR
$ awk -F";" '/Name_Z1/{f++} f==1 && /Names/{print $2,$3,$4}' OFS="\n" input
Jhon
Alex
Smith
-F";" sets the field seperator as ;
/Name_Z1/{f++} matches the line with pattern /Name_Z1/ If matched increment {f++}
f==1 && /Names/{print $2,$3,$4} is same as if f == 1 and maches pattern Name with line if true, then print the the columns 2 3 and 4 (delimted by ;)
OFS="\n" sets the output filed seperator as \n new line
EDIT
$ awk -F"[;|]" '/Z1/{f++} f==1 && NF>1{for (i=5; i<15; i++)print $i}' input
511047
512044
513008
593026
598326
CL5518
CL5521
CL5538
CL5612
CL5620
Here is a more generic solution for data in group of blocks.
This awk does not need the end tag, just the start.
awk -vRS= -F"\n" '/^\[Name_Z1\]/ {n=split($3,a,";");for (i=2;i<=n;i++) print a[i];exit}' file
Jhon
Alex
Smith
How it works:
awk -vRS= -F"\n" ' # By setting RS to nothing, one record equals one block. Then FS is set to one line as a field
/^\[Name_Z1\]/ { # Search for block with [Name_Z1]
n=split($3,a,";") # Split field 3, the names and store number of fields in variable n
for (i=2;i<=n;i++) # Loop from second to last field
print a[i] # Print the fields
exit # Exits after first find
' file
With updated data
cat file
data
[CAB_Z1_FUEGO]
READ_ONLY=false
FilterAttr=CeaseTime;blank|ObjectOfReference;contains;511047;512044;513008;593026;598326;CL5518;CL5521;CL5538;CL5612;CL5620|PerceivedSeverity;=;Critical;Major;Minor|ProbableCause;!=;HOUSE ALARM;IO DEVICE|ProblemText;contains;FUEGO
NAME=CAB_Z1_FUEGO
data
awk -vRS= -F"\n" '/^\[CAB_Z1_FUEGO\]/ {split($3,a,"|");n=split(a[2],b,";");for (i=3;i<=n;i++) print b[i]}' file
511047
512044
513008
593026
598326
CL5518
CL5521
CL5538
CL5612
CL5620
The following awk script will do what you want:
awk 's==1&&/^Names/{gsub("Names;","",$0);gsub(";","\n",$0);print}/^\[Name_Z1\]$/||/^OBJ=Name_Z1$/{s++}' inputFileName
In more detail:
s==1 && /^Names;/ {
gsub ("Names;","",$0);
gsub(";","\n",$0);
print
}
/^\[Name_Z1\]$/ || /^OBJ=Name_Z1$/ {
s++
}
The state s starts with a value of zero and is incremented whenever you find one of the two lines:
[Name_Z1]
OBJ=Name_Z1
That means, between the first set of those lines, s will be equal to one. That's where the other condition comes in. When s is one and you find a line starting with Names;, you do two substitutions.
The first is to get rid of the Names; at the front, the second is to replace all ; semi-colon characters with a newline. Then you print it out.
The output for your given test data is, as expected:
Jhon
Alex
Smith

Replace text with sed

A program creates HTML files from a database. There are headings and stuff in between the headings.
There are not a set amount of headings.
After each heading the program places the text:
$WHITE*("5")$
$WHITE*("20")$
$HRULE$
I need every occurrence of these 4 lines to be replaced with:
$WHITE*("20")$
$HRULE$
$WHITE*("10")$
I am not fussed what program is used :)
I have tried:
sed 's:\$WHITE\*(\"5\")\$\n\n\$WHITE\*(\"20\")\$\n\$HRULE\$:\$WHITE\*(\"20\")\$\
\$HRULE$\
\$WHITE*("10")$:g'
and various other permutations
If that'S your input file, and this is the spec, you can do:
sed -n '3,$p;$a$WHITE*("10")$' INPUTFILE
But I assume that's not the case, so you might want to rephrase your question and/or giving some more detailes.
More specific solution with sed:
sed '/^\$WHITE\*("5")\$$/,/^$/d;/\$HRULE\$/ a$WHITE*("10")$' INPUTFILE
(Searches for the $WHITE*("5")$ line and deletes it till (including!) the next empty line. Then searches for the next $HRULE$ line and appends an $WHITE*("10")$ line.
awk solution:
awk '/\$WHITE\*\("5"\)\$/ { getline ; next }
/\$WHITE\*\("20"\)\$/ { print ;
getline ;
if ($0 ~ /\$HRULE\$/) { print ;
print "$WHITE*(\"10\")$" ;
}
else { print }
}
1 ' INPUTFILE
This reads the file and prints every line - that's why the 1 is there, except if it finds the $WHITE*("5") pattern it drops it, reads the next line and drops that too. if it finds the $WHITE*("20") prints it. Reads the next line and if its $HRULE$ then prints that and the appended $WHITE*("10") line. Else just prints the line.
HTH
UPDATE #2
From the sed faq, section 4.23.3
If you need to match a static block of text (which may occur any number of times throughout a file), where the contents of the block are known in advance, then this script is easy to use
UPDATE #1
Python?
$ cat input
first line
second line
3rd line
$WHITE*("5")$
$WHITE*("20")$
$HRULE$
some more lines
yet another
$WHITE*("5")$
$WHITE*("20")$
$HRULE$
THE END
the script:
#!/usr/bin/env python
## Use these 3 lines for python version < 2.5
#fd=open('input')
#text=fd.read()
#fd.close()
## Use these 2 lines for python version >= 2.5
with open('input') as fd:
text=fd.read()
old="""$WHITE*("5")$
$WHITE*("20")$
$HRULE$
"""
new="""$WHITE*("20")$
$HRULE$
$WHITE*("10")$
"""
print text.replace(old,new)
output:
first line
second line
3rd line
$WHITE*("20")$
$HRULE$
$WHITE*("10")$
some more lines
yet another
$WHITE*("20")$
$HRULE$
$WHITE*("10")$
THE END
Try something like
sed -e '${p;};/$WHITE\*("5")\$/,/$HRULE\$/{H;/$HRULE\$/{g;s/$HRULE\$//;s/20/10/;s/5/20/;s/\n/&$HRULE$/2p;s/.*//p;x;d;};d;};' white.txt
Crude, but it should work.
This might work for you:
sed '/^\$WHITE\*(\"5\")\$/{N;N;N;s/.*\n\n\(\(\$WHITE\*(\"\)20\(\")\$\s*\)\n\$HRULE\$\s*$\)/\1\n\210\3/}' file
Explanation:
Match on first string $WHITE*("5")$, read the next 3 lines and match on remainder. Use grouping and back references to formulate output lines.

Resources