SED incorrectly replaces only the first instance of a pattern on a line - bash

Hello: I have tab separated data of the form
customer-item description-purchase price-category
e.g. a.out contains:
1\t400 Bananas\t3.00\tfruit
2\t60 Oranges\t0.00\tfruit
3\tNULL\t3.0\tfruit
4\tCarrots\tNULL\tfruit
5\tNULL\tNULL\tfruit
I'm attempting to get rid of all the NULL fields. I can't rely on the simple replacement of the string "NULL" as it may be a substring; so I am attempting
sed -i 's:\tNULL\t:\t\t:g' a.out
when I do this, I end up with
1\t400 Bananas\t3.00\tfruit
2\t60 Oranges\t0.00\tfruit
3\t\t3.0\tfruit
4\tCarrots\t\tfruit
5.\t\tNULL\tfruit
what's wrong here is that #5 has only suffered a replacement of the first instance of the search string on each line.
If I run my sed command twice, I end up with the result I want:
1\t400 Bananas\t3.00\tfruit
2\t60 Oranges\t0.00\tfruit
3\t\t3.0\tfruit
4\tCarrots\t\tfruit
5.\t\t\tfruit
where you can see that line 5 has both of the NULLs removed
But I don't understand why I'm suffering this?

awk -F'\t' -v OFS='\t' '{
for (i = 1; i <= NF; ++i) {
if ($i == "NULL") {
$i = "";
}
}
print
}' test.txt
The straightforward solution is to use \t as a field separator and then loop over all of the fields looking for an exact match of "NULL". No substringing.
Here's the same thing as a one liner:
awk -F'\t' -v OFS='\t' '{for(i=1;i<=NF;++i) if($i=="NULL") $i=""} 1' test.txt

Since tabs can't be inside strings in your case since that would imply a new field you might be able to do what you want simply by doing this;
sed ':start ; s/\tNULL\(\t\|$\)/\t\1/ ; t start' a.out
First the inner part s/\tNULL\(\t\|$\)/\t\1/ searches for tab NULL followed by a tab or end of line $ and replace with a tab followed by the character that did appear after NULL (this last part is done using \1). We'll call that expression
We now have:
sed ':start ; expression ; t start' a.out
This is effectively a loop (like goto). :start is a label. ; acts as a statement delimiter. I have described what expression does above. t start says that IF the expression did any substitution that a jump will be made to label start. The buffer will contain the substituted text. This loop occurs until no substitution can be done on the line and then processing continues.
Information on sed flow control and other useful tidbits can be found here

awk makes it simpler:
awk -F '\tNULL\\>' -v OFS='\t' '{$1=$1}1' file
1\t400 Bananas\t3.00\tfruit
2\t60 Oranges\t0.00\tfruit
3\t\t3.0\tfruit
4\tCarrots\t\tfruit
5\t\t\tfruit

From grep(1) on a recent Linux:
The Backslash Character and Special Expressions
The symbols \< and > respectively match the empty string at the
beginning and end of a word. The symbol \b matches the empty string at
the edge of a word [...]
--
So, how about:
sed -i 's:\<NULL\>::g' a.out

Related

Unix sed command - global replacement is not working

I have scenario where we want to replace multiple double quotes to single quotes between the data, but as the input data is separated with "comma" delimiter and all column data is enclosed with double quotes "" got an issue and the same explained below:
The sample data looks like this:
"int","","123","abd"""sf123","top"
So, the output would be:
"int","","123","abd"sf123","top"
tried below approach to get the resolution, but only first occurrence is working, not sure what is the issue??
sed -ie 's/,"",/,"NULL",/g;s/""/"/g;s/,"NULL",/,"",/g' inputfile.txt
replacing all ---> from ,"", to ,"NULL",
replacing all multiple occurrences of ---> from """ or "" or """" to " (single occurrence)
replacing 1 step changes back to original ---> from ,"NULL", to ,"",
But, only first occurrence is getting changed and remaining looks same as below:
If input is :
"int","","","123","abd"""sf123","top"
the output is coming as:
"int","","NULL","123","abd"sf123","top"
But, the output should be:
"int","","","123","abd"sf123","top"
You may try this perl with a lookahead:
perl -pe 's/("")+(?=")//g' file
"int","","123","abd"sf123","top"
"int","","","123","abd"sf123","top"
"123"abcs"
Where input is:
cat file
"int","","123","abd"""sf123","top"
"int","","","123","abd"""sf123","top"
"123"""""abcs"
Breakup:
("")+: Match 1+ pairs of double quotes
(?="): If those pairs are followed by a single "
Using sed
$ sed -E 's/(,"",)?"+(",)?/\1"\2/g' input_file
"int","","123","abd"sf123","top"
"int","","NULL","123","abd"sf123","top"
"int","","","123","abd"sf123","top"
In awk with your shown samples please try following awk code. Written and tested in GNU awk, should work in any version of awk.
awk '
BEGIN{ FS=OFS="," }
{
for(i=1;i<=NF;i++){
if($i!~/^""$/){
gsub(/"+/,"\"",$i)
}
}
}
1
' Input_file
Explanation: Simple explanation would be, setting field separator and output field separator as , for all the lines of Input_file. Then traversing through each field of line, if a field is NOT NULL then Globally replacing all 1 or more occurrences of " with single occurrence of ". Then printing the line.
With sed you could repeat 1 or more times sets of "" using a group followed by matching a single "
Then in the replacement use a single "
sed -E 's/("")+"/"/g' file
For this content
$ cat file
"int","","123","abd"""sf123","top"
"int","","","123","abd"""sf123","top"
"123"""""abcs"
The output is
"int","","123","abd"sf123","top"
"int","","","123","abd"sf123","top"
"123"abcs"
sed s'#"""#"#' file
That works. I will demonstrate another method though, which you may also find useful in other situations.
#!/bin/sh -x
cat > ed1 <<EOF
3s/"""/"/
wq
EOF
cp file stack
cat stack | tr ',' '\n' > f2
ed -s f2 < ed1
cat f2 | tr '\n' ',' > stack
rm -v ./f2
rm -v ./ed1
The point of this is that if you have a big csv record all on one line, and you want to edit a specific field, then if you know the field number, you can convert all the commas to carriage returns, and use the field number as a line number to either substitute, append after it, or insert before it with Ed; and then re-convert back to csv.

How to select text in a file until a certain string using grep, sed or awk?

I have a huge file (this is just a sample) and I would like to select all lines with "Ph_gUFAC1083" and all after until reach one that doesn't have the code (in this example Ph_gUFAC1139)
>uce_353_Ph_gUFAC1083 |uce_353
TTTAGCCATAGAAATGCAGAAATAATTAGAAGTGCCATTGTGTACAGTGCCTTCTGGACT
GGGCTGAAGGTGAAGGAGAAAGTATCATACTATCCTTGTCAGCTGCAAGGGTAATTACTG
CTGGCTGAAATTACTCAACATTTGTTTATAAGCTCCCCAGAGCATGCTGTAAATAGATTG
TCTGTTATAGTCCAATCACATTAAAACGCTGCTCCTTGCAAACTGCTACCTCCTGTTTTC
TGTAAGCTAGACAGAGAAAGCCTGCTGCTCACTTACTGAGCACCAAGCACTGAAGAGCTA
TGTTTAATGTGATTGTTTTCATTAGCTCTTCTCTGTCTGATATTACATTTATAATTTGCT
GGGCTTGAAGACTGGCATGTTGCATTGCTTTCATTTACTGTAGTAAGAGTGAATAGCTCT
AT
>uce_101_Ph_gUFAC1083 |uce_101
TTGGGCTTTATTTCCACCTTAAAATCTTTACCTGGCCGTGATCTGTTGTTCCATTACTGG
AGGGCAAAAATGGGAGGAATTGTCTGGGCTAAATTGCAATTAGGCAGCCCTGAGAGAGGC
TGGCACCAGTTAACTTGGGATATTGGAGTGAAAAGGCCCGTAATCAGCCTTCGGTCATGT
AGAACAATGCATAAAATTAAATTGACATTAATGAATAATTGTGTAATGAAAATGGAAGAG
GAGAGTTAATTGCATGTTACAGTGAGTGTAATGCCTAGATAACCTTGCATTTAATGCTAT
TCTTAGCCCTGCTGCCAAGACTTCTACAGAGCCTCTCTCTGCAGGAAGTCATTAAAGCTG
TGAGTAGATAATGCAGGCTCAGTGAAACCTAAGTGGCAACAATATA
>uce_171_Ph_gUFAC1083 |uce_171
CATGGAAAACGAGGAAAAGCCATATCTTCCAGGCCATTAATATTACTACGGAGACGTCTT
CATATCGCCGTAATTACAGCAGATCTCAAAGTGGCACAACCAAGACCAGCACCAAAGCTA
AAATAACTCGCAGGAGCAGGCGAGCTGCTTTTGCAGCCCTCAGTCCCAGAAATGCTCGGT
AGCTTTTCTTAAAATAGACAGCCTGTAAATAAGGTCTGTGAACTCAATTGAAGGTGGCTG
TTTCTGAATTAGTCAGCCCTCACAAGGCTCTCGGCCTACATGCTAGTACATAAATTGTCC
ACTTTACCACCAGACAAGAAAGATTAGAGTAATAAACACGGGGCATTAGCTCAGCTAGAG
AAACACACCAGCCGTTACGCACACGCGGGATTGCCAAGAACTGTTAACCCCACTCTCCAG
AAACGCACACAAAAAAACAAGTTAAAGCCATGACATCATGGGAA
>uce_4300_Ph_gUFAC1139 |uce_4300
ATTAAAAATACAATCCTCATGTTTGCATTTTGCAGTCGTCAACAAGAAATTGAAGAGAAA
CTCATAGAGGAAGAAACTGCTCGAAGGGTGGAAGAACTTGTAGCTAAACGCGTGGAAGAA
GAGCTGGAGAAAAGAAAGGATGAGATTGAGCGAGAGGTTCTCCGCAGGGTGGAGGAGGCT
AAGCGCATCATGGAAAAACAGTTGCTCGAAGAACTCGAGCGACAGCGACAAGCTGAACTT
GCAGCACAAAAAGCCAGAGAGGTAACGCTCGGTCGTTTGGAAAGTAGAGACAGTCCATGG
CAAAACTTTCAGTGTCGGTTTGTGCCTCCTGTTCGGTTCAGAAAGAGATGGAATACAGCA
AATCTAATTCCCTTCTCATATAAACTTGCATTGCTGCGAAACTTAATTTCTAGCCTATTC
AGAGGAGCTCACTGATATTTAAACAGTTACTCTCCTAAAACCTGAACAAGGATACTTGAT
TCTTAATGGAACTGACCTACATATTTCAGAATTGTTTGAAACTTTTGCCATGGCTGCAGG
ATTATTCAGCAGTCCTTTCATTTT
>uce_1039_Ph_gUFAC1139 |uce_1039
ATTAGTGGAATACAAATATGCAAAAACCAAACAGTTTGGTGCTATAATGTGAAAAGAAAT
TTACACCAATCTTATTTTTAATTTGTATGGGAACATTTTTACCACAAATTCCATATTTTA
ATAATACTATCCCAACTCTATTTTTTAGACTCATTTTGTCACTGTTTTGTAACAGAAACA
CTGTAAATATTATAGATGTGGTAAACTATTATACTTGTTTTCTTATAAATGAAATGATCT
GTGCCAACACTGACAAAATGAATTAATGTGTTACTAAGGCAACAGTCACATTATATGCTT
TCTCTTTCACAGTATGCGGTAGAGCATATGGTTTACTCTTAATGGAACACTAGCTTCTCA
TTAACATACCAGTAGCAATGTCAGAACTTACAAACCAGCATAACAGAGAAATGGAAAAAC
TTATAAATTAGACCCTTTCAGTATTATTGAGTAGAAAATGACTGATGTTCCAAGGTACAA
TATTTAGCTAATACAGTGCCCTTTTCTGCATCTTTCTTCTCAAAGGAAAAAAAAATCCTC
AAAAAAAACCAGAGCAAGAAACCTAACTTTTTCTTGT
I already tried several alternatives without success, the closest I reached was
sed -n '/Ph_gUFAC1083/, />/p' file.txt
that gave me that:
>uce_2347_Ph_gUFAC1083 |uce_2347
GCTTTTCTATGCAGATTTTTTCTAATTCTCTCCCTCCCCTTGCTTCTGTCAGTGTGAAGC
CCACACTAAGCATTAACAGTATTAAAAAGAGTGTTATCTATTAGTTCAATTAGACATCAG
ACATTTACTTTCCAATGTATTTGAAGACTGATTTGATTTGGGTCCAATCATTTAAAAATA
AGAGAGCAGAACTGTGTACAGAGCTGTGTACAGATATCTGTAGCTCTGAAGTCTTAATTG
CAAATTCAGATAAGGATTAGAAGGGGCTGTATCTCTGTAGACCAAAGGTATTTGCTAATA
CCTGAGATATAAAAGTGGTTAAATTCAATATTTACTAATTTAGGATTTCCACTTTGGATT
TTGATTAAGCTTTTTGGTTGAAAACCCCACATTATTAAGCTGTGATGAGGGAAAAAGCAA
CTCTTTCATAAGCCTCACTTTAACGCTTTATTTCAAATAATTTATTTTGGACCTTCTAAA
G
>uce_353_Ph_gUFAC1083 |uce_353
>uce_101_Ph_gUFAC1083 |uce_101
TTGGGCTTTATTTCCACCTTAAAATCTTTACCTGGCCGTGATCTGTTGTTCCATTACTGG
AGGGCAAAAATGGGAGGAATTGTCTGGGCTAAATTGCAATTAGGCAGCCCTGAGAGAGGC
TGGCACCAGTTAACTTGGGATATTGGAGTGAAAAGGCCCGTAATCAGCCTTCGGTCATGT
AGAACAATGCATAAAATTAAATTGACATTAATGAATAATTGTGTAATGAAAATGGAAGAG
GAGAGTTAATTGCATGTTACAGTGAGTGTAATGCCTAGATAACCTTGCATTTAATGCTAT
TCTTAGCCCTGCTGCCAAGACTTCTACAGAGCCTCTCTCTGCAGGAAGTCATTAAAGCTG
TGAGTAGATAATGCAGGCTCAGTGAAACCTAAGTGGCAACAATATA
>uce_171_Ph_gUFAC1083 |uce_171
Do you know how to do it using grep, sed or awk?
Thx
$ awk '/^>/{if(match($0,"Ph_gUFAC1083")){s=1} else s=0}s' file
I made a simple criteria for your request,
If the the start of the line is >, we're going to judge if "Ph_gUFAC1083" existed, if yes, set s=1, set s=0 otherwise.
For the line that doesn't start with >, the value of s would be retained.
The final s in the awk command decide if the line to be printed (s=1) or not (s=0).
If what you want is every line with Ph_gUFAC1139 plus block of lines after that line until the next line starting with >, then the following awk snippet might do:
$ awk 'BEGIN {RS=ORS=">"} /Ph_gUFAC1139/' file.txt
This uses the > character as a record separator, then simply displays records that contain the text you're interested in.
If you wanted to be able to provide the search string using a variable, you'd do it something like this:
$ val="Ph_gUFAC1139"
$ awk -v s="$val" 'BEGIN {RS=ORS=">"} $0 ~ s' file.txt
UPDATE
A comment mentions that the solution above shows trailing record separators rather than leading ones. You can adapt your output to match your input by reversing this order manually:
awk 'BEGIN { RS=ORS=">" } /Ph_gUFAC1139/ { printf "%s%s",ORS,$0 }' file.txt
Note that in the initial examples, a "match" of the regex would invoke awk's default "action", which is to print the line. The default action is invoked if no action is specified within the script. The code (immediately) above includes an action .. which prints the record, preceded by the separator.
This might work for you (GNU sed):
sed '/^>/h;G;/Ph_gUFAC1083/P;d' file
Store each line beginning with > in the hold space (HS) and then append the HS to every line. If any line contains the string Ph_gUFAC1083 print the first line in the pattern space (PS) and discard the everything else.
N.B. the regexp for the match may be amended to /\n.*Ph_gUFAC1083/ if the string match may occur in any line.
This program is used to find the block which starts with Ph_gUFAC1083 and ends with any statement other than Ph_gUFAC1139
cat inp.txt |
awk '
BEGIN{begin=0}
{
# Ignore blank lines
if( $0 ~ /^$/ )
{
print $0
next
}
# mark the line that contains Ph_gUFAC1083 and print it
if( $0 ~ /Ph_gUFAC1083/ )
{
begin=1
print $0
}
else
{
# if the line contains Ph_gUFAC1083 and Ph_gUFAC1139 was found before it, print it
if( begin == 1 && ( $0 ~ /Ph_gUFAC1139/ ) )
{
print $0
}
else
{
# found a line which doesnt contain Ph_gUFAC1139 , mark the end of the block.
begin = 0
}
}
}'

How can I retrieve the matching records from mentioned file format in bash

XYZNA0000778800Z
16123000012300321000000008000000000000000
16124000012300322000000007000000000000000
17234000012300323000000005000000000000000
17345000012300324000000004000000000000000
17456000012300325000000003000000000000000
9
XYZNA0000778900Z
16123000012300321000000008000000000000000
16124000012300322000000007000000000000000
17234000012300323000000005000000000000000
17345000012300324000000004000000000000000
17456000012300325000000003000000000000000
9
I have above file format from which I want to find a matching record. For example, match a number(7789) on line starting with XYZ and once matched look for a matching number (7345) in lines below starting with 1 until it reaches to line starting with 9. retrieve the entire line record. How can I accomplish this using shell script, awk, sed or any combination.
Expected Output:
XYZNA0000778900Z
17345000012300324000000004000000000000000
With sed one can do:
$ sed -n '/^XYZ.*7789/,/^9$/{/^1.*7345/p}' file
17345000012300324000000004000000000000000
Breakdown:
sed -n ' ' # -n disabled automatic printing
/^XYZ.*7789/, # Match line starting with XYZ, and
# containing 7789
/^1.*7345/p # Print line starting with 1 and
# containing 7345, which is coming
# after the previous match
/^9$/ { } # Match line that is 9
range { stuff } will execute stuff when it's inside range, in this case the range is starting at /^XYZ.*7789/ and ending with /^9$/.
.* will match anything but newlines zero or more times.
If you want to print the whole block matching the conditions, one can use:
$ sed -n '/^XYZ.*7789/{:s;N;/\n9$/!bs;/\n1.*7345/p}' file
XYZNA0000778900Z
16123000012300321000000008000000000000000
16124000012300322000000007000000000000000
17234000012300323000000005000000000000000
17345000012300324000000004000000000000000
17456000012300325000000003000000000000000
9
This works by reading lines between ^XYZ.*7779 and ^9$ into the pattern
space. And then printing the whole thing if ^1.*7345 can be matches:
sed -n ' ' # -n disables printing
/^XYZ.*7789/{ } # Match line starting
# with XYZ that also contains 7789
:s; # Define label s
N; # Append next line to pattern space
/\n9$/!bs; # Goto s unless \n9$ matches
/\n1.*7345/p # Print whole pattern space
# if \n1.*7345 matches
I'd use awk:
awk -v rid=7789 -v fid=7345 -v RS='\n9\n' -F '\n' 'index($1, rid) { for(i = 2; i < $NF; ++i) { if(index($i, fid)) { print $i; next } } }' filename
This works as follows:
-v RS='\n9\n' is the meat of the whole thing. Awk separates its input into records (by default lines). This sets the record separator to \n9\n, which means that records are separated by lines with a single 9 on them. These records are further separated into fields, and
-F '\n' tells awk that fields in a record are separated by newlines, so that each line in a record becomes a field.
-v rid=7789 -v fid=7345 sets two awk variables rid and fid (meant by me as record identifier and field identifier, respectively. The names are arbitrary.) to your search strings. You could encode these in the awk script directly, but this way makes it easier and safer to replace the values with those of a shell variables (which I expect you'll want to do).
Then the code:
index($1, rid) { # In records whose first field contains rid
for(i = 2; i < $NF; ++i) { # Walk through the fields from the second
if(index($i, fid)) { # When you find one that contains fid
print $i # Print it,
next # and continue with the next record.
} # Remove the "next" line if you want all matching
} # fields.
}
Note that multi-character record separators are not strictly required by POSIX awk, and I'm not certain if BSD awk accepts it. Both GNU awk and mawk do, though.
EDIT: Misread question the first time around.
an extendable awk script can be
$ awk '/^9$/{s=0} s&&/7345/; /^XYZ/&&/7789/{s=1} ' file
set flag s when line starts with XYZ and contains 7789; reset when line is just 9, and print when flag is set and contains pattern 7345.
This might work for you (GNU sed):
sed -n '/^XYZ/h;//!H;/^9/!b;x;/^XYZ[^\n]*7789/!b;/7345/p' file
Use the option -n for the grep-like nature of sed. Gather up records beginning with XYZ and ending in 9. Reject any records which do not have 7789 in the header. Print any remaining records that contain 7345.
If the 7345 will always follow the header,this could be shortened to:
sed -n '/^XYZ/h;//!H;/^9/!b;x;/^XYZ[^\n]*7789.*7345/p' file
If all records are well-formed (begin XYZ and end in 9) then use:
sed -n '/^XYZ/h;//!H;/^9/!b;x;/^[^\n]*7789.*7345/p' file

AWK between 2 patterns - first occurence

I am having this example of ini file. I need to extract the names between 2 patterns Name_Z1 and OBJ=Name_Z1 and put them each on a line.
The problem is that there are more than one occurences with Name_Z1 and OBJ=Name_Z1 and i only need first occurence.
[Name_Z5]
random;text
Names;Jesus;Tom;Miguel
random;text
OBJ=Name_Z5
[Name_Z1]
random;text
Names;Jhon;Alex;Smith
random;text
OBJ=Name_Z1
[Name_Z2]
random;text
Names;Chris;Mara;Iordana
random;text
OBJ=Name_Z2
[Name_Z1_Phone]
random;text
Names;Bill;Stan;Mike
random;text
OBJ=Name_Z1_Phone
My desired output would be:
Jhon
Alex
Smith
I am currently writing a more ample script in bash and i am stuck on this. I prefer awk to do the job.
My greatly appreciation for who can help me. Thank you!
For Wintermute solution: The [Name_Z1] part looks like this:
[CAB_Z1]
READ_ONLY=false
FilterAttr=CeaseTime;blank|ObjectOfReference;contains;511047;512044;513008;593026;598326;CL5518;CL5521;CL5538;CL5612;CL5620|PerceivedSeverity;=;Critical;Major;Minor|ProbableCause;!=;HOUSE ALARM;IO DEVICE|ProblemText;contains;AIRE;ALIMENTA;BATER;CONVERTIDOR;DISTRIBUCION;FUEGO;HURTO;MAINS;MALLO;MAYOR;MENOR;PANEL;TEMP
NAME=CAB_Z1
And the [Name_Z1_Phone] part looks like this:
[CAB_Z1_FUEGO]
READ_ONLY=false
FilterAttr=CeaseTime;blank|ObjectOfReference;contains;511047;512044;513008;593026;598326;CL5518;CL5521;CL5538;CL5612;CL5620|PerceivedSeverity;=;Critical;Major;Minor|ProbableCause;!=;HOUSE ALARM;IO DEVICE|ProblemText;contains;FUEGO
NAME=CAB_Z1_FUEGO
The fix should be somewhere around the "|PerceivedSeverity"
Expected Output:
511047
512044
513008
593026
598326
CL5518
CL5521
CL5538
CL5612
CL5620
This should work:
sed -n '/^\[Name_Z1/,/^OBJ=Name_Z1/ { /^Names/ { s/^Names;//; s/;/\n/g; p; q } }' foo.txt
Explanation: Written readably, the code is
/^\[Name_Z1/,/^OBJ=Name_Z1/ {
/^Names/ {
s/^Names;//
s/;/\n/g
p
q
}
}
This means: In the pattern range /^\[Name_Z1/,/^OBJ=Name_Z1/, for all lines that match the pattern /^Names/, remove the Names; in the beginning, then replace all remaining ; with newlines, print the whole thing, and then quit. Since it immediately quits, it will only handle the first such line in the first such pattern range.
EDIT: The update made things a bit more complicated. I suggest
sed -n '/^\[CAB_Z1/,/^NAME=CAB_Z1/ { /^FilterAttr=/ { s/^.*contains;\(.*\)|PerceivedSeverity.*$/\1/; s/;/\n/g; p; q } }' foo.txt
The main difference is that instead of removing ^Names from a line, the substitution
s/^.*contains;\(.*\)|PerceivedSeverity.*$/\1/;
is applied. This isolates the part between contains; and |PerceivedSeverity before continuing as before. It assumes that there is only one such part in the line. If the match is ambiguous, it will pick the one that appears last in the line.
An (g)awk way that doesn't need a set number of fields(although i have assumed that contains; will always be on the line you need the names from.
(g)awk '(x+=/Z1/)&&match($0,/contains;([^|]+)/,a)&&gsub(";","\n",a[1]){print a[1];exit}' f
Explanation
(x+=/Z1/) - Increments x when Z1 is found. Also part of a
condition so x must exist to continue.
match($0,/contains;([^|]+)/,a) - Matches contains; and then captures everything after
up to the |. Stores the capture in a. Again a
condition so must succeed to continue.
gsub(";","\n",a[1]) - Substitutes all the ; for newlines in the capture
group a[1].
{print a[1];exit}' - If all conditions are met then print a[1] and exit.
This way should work in (m)awk
awk '(x+=/Z1/)&&/contains/{split($0,a,"|");y=split(a[2],b,";");for(i=3;i<=y;i++)
print b[i];exit}' file
sed -n '/\[Name_Z1\]/,/OBJ=Name_Z1$/ s/Names;//p' file.txt | tr ';' '\n'
That is sed -n to avoid printing anything not explicitly requested. Start from Name_Z1 and finish at OBJ=Name_Z1. Remove Names; and print the rest of the line where it occurs. Finally, replace semicolons with newlines.
Awk solution would be
$ awk -F";" '/Name_Z1/{f=1} f && /Names/{print $2,$3,$4} /OBJ=Name_Z1/{exit}' OFS="\n" input
Jhon
Alex
Smith
OR
$ awk -F";" '/Name_Z1/{f++} f==1 && /Names/{print $2,$3,$4}' OFS="\n" input
Jhon
Alex
Smith
-F";" sets the field seperator as ;
/Name_Z1/{f++} matches the line with pattern /Name_Z1/ If matched increment {f++}
f==1 && /Names/{print $2,$3,$4} is same as if f == 1 and maches pattern Name with line if true, then print the the columns 2 3 and 4 (delimted by ;)
OFS="\n" sets the output filed seperator as \n new line
EDIT
$ awk -F"[;|]" '/Z1/{f++} f==1 && NF>1{for (i=5; i<15; i++)print $i}' input
511047
512044
513008
593026
598326
CL5518
CL5521
CL5538
CL5612
CL5620
Here is a more generic solution for data in group of blocks.
This awk does not need the end tag, just the start.
awk -vRS= -F"\n" '/^\[Name_Z1\]/ {n=split($3,a,";");for (i=2;i<=n;i++) print a[i];exit}' file
Jhon
Alex
Smith
How it works:
awk -vRS= -F"\n" ' # By setting RS to nothing, one record equals one block. Then FS is set to one line as a field
/^\[Name_Z1\]/ { # Search for block with [Name_Z1]
n=split($3,a,";") # Split field 3, the names and store number of fields in variable n
for (i=2;i<=n;i++) # Loop from second to last field
print a[i] # Print the fields
exit # Exits after first find
' file
With updated data
cat file
data
[CAB_Z1_FUEGO]
READ_ONLY=false
FilterAttr=CeaseTime;blank|ObjectOfReference;contains;511047;512044;513008;593026;598326;CL5518;CL5521;CL5538;CL5612;CL5620|PerceivedSeverity;=;Critical;Major;Minor|ProbableCause;!=;HOUSE ALARM;IO DEVICE|ProblemText;contains;FUEGO
NAME=CAB_Z1_FUEGO
data
awk -vRS= -F"\n" '/^\[CAB_Z1_FUEGO\]/ {split($3,a,"|");n=split(a[2],b,";");for (i=3;i<=n;i++) print b[i]}' file
511047
512044
513008
593026
598326
CL5518
CL5521
CL5538
CL5612
CL5620
The following awk script will do what you want:
awk 's==1&&/^Names/{gsub("Names;","",$0);gsub(";","\n",$0);print}/^\[Name_Z1\]$/||/^OBJ=Name_Z1$/{s++}' inputFileName
In more detail:
s==1 && /^Names;/ {
gsub ("Names;","",$0);
gsub(";","\n",$0);
print
}
/^\[Name_Z1\]$/ || /^OBJ=Name_Z1$/ {
s++
}
The state s starts with a value of zero and is incremented whenever you find one of the two lines:
[Name_Z1]
OBJ=Name_Z1
That means, between the first set of those lines, s will be equal to one. That's where the other condition comes in. When s is one and you find a line starting with Names;, you do two substitutions.
The first is to get rid of the Names; at the front, the second is to replace all ; semi-colon characters with a newline. Then you print it out.
The output for your given test data is, as expected:
Jhon
Alex
Smith

insert a string at specific position in a file by SED awk

I have a string which i need to insert at a specific position in a file :
The file contains multiple semicolons(;) i need to insert the string just before the last ";"
Is this possible with SED ?
Please do post the explanation with the command as I am new to shell scripting
before :
adad;sfs;sdfsf;fsdfs
string = jjjjj
after
adad;sfs;sdfsf jjjjj;fsdfs
Thanks in advance
This might work for you:
echo 'adad;sfs;sdfsf;fsdfs'| sed 's/\(.*\);/\1 jjjjj;/'
adad;sfs;sdfsf jjjjj;fsdfs
The \(.*\) is greedy and swallows the whole line, the ; makes the regexp backtrack to the last ;. The \(.*\) make s a back reference \1. Put all together in the RHS of the s command means insert jjjjj before the last ;.
sed 's/\([^;]*\)\(;[^;]*;$\)/\1jjjjj\2/' filename
(substitute jjjjj with what you need to insert).
Example:
$ echo 'adad;sfs;sdfsf;fsdfs;' | sed 's/\([^;]*\)\(;[^;]*;$\)/\1jjjjj\2/'
adad;sfs;sdfsfjjjjj;fsdfs;
Explanation:
sed finds the following pattern: \([^;]*\)\(;[^;]*;$\). Escaped round brackets (\(, \)) form numbered groups so we can refer to them later as \1 and \2.
[^;]* is "everything but ;, repeated any number of times.
$ means end of the line.
Then it changes it to \1jjjjj\2.
\1 and \2 are groups matched in first and second round brackets.
For now, the shorter solution using sed : =)
sed -r 's#;([^;]+);$#; jjjjj;\1#' <<< 'adad;sfs;sdfsf;fsdfs;'
-r option stands for extented Regexp
# is the delimiter, the known / separator can be substituted to any other character
we match what's finishing by anything that's not a ; with the ; final one, $ mean end of the line
the last part from my explanation is captured with ()
finally, we substitute the matching part by adding "; jjjj" ans concatenate it with the captured part
Edit: POSIX version (more portable) :
echo 'adad;sfs;sdfsf;fsdfs;' | sed 's#;\([^;]\+\);$#; jjjjj;\1#'
echo 'adad;sfs;sdfsf;fsdfs;' | sed -r 's/(.*);(.*);/\1 jjjj;\2;/'
You don't need the negation of ; because sed is by default greedy, and will pick as much characters as it can.
sed -e 's/\(;[^;]*\)$/ jjjj\1/'
Inserts jjjj before the part where a semicolon is followed by any number of non-semicolons ([^;]*) at the end of the line $. \1 is called a backreference and contains the characters matched between \( and \).
UPDATE: Since the sample input has no longer a ";" at the end.
Something like this may work for you:
echo "adad;sfs;sdfsf;fsdfs"| awk 'BEGIN{FS=OFS=";"} {$(NF-1)=$(NF-1) " jjjjj"; print}'
OUTPUT:
adad;sfs;sdfsf jjjjj;fsdfs
Explanation: awk starts with setting FS (field separator) and OFS (output field separator) as semi colon ;. NF in awk stands for number of fields. $(NF-1) thus means last-1 field. In this awk command {$(NF-1)=$(NF-1) " jjjjj" I am just appending jjjjj to last-1 field.

Resources