How to replace duplicated rows with "." in awk? - bash

I need to substitute duplicates in my first column with just ".".
For example:
name1
name1
name1
name2
name2
name3
name3
And I need this output:
name1
.
.
name2
.
name3
.
I have a solution like this:
awk '{c=$1} c==p{gsub(/./,".",$1)} {p=c} 1' in.file
But the output is:
name1
.....
.....
name2
.....
name3
.....
Is there any solution without any other piping?

Use an array to check if a line has already been seen!
$ awk 'seen[$0]++ {$0="."}1' file
name1
.
.
name2
.
name3
.
The typical way to skip repeated lines is to say awk '!seen[$0]++' file. Here we use the same logic with a small twist: the array seen[] records whether a line has appeared before. If it has, seen[$0]++ is greater than 0, so {$0="."} runs. The trailing 1 then prints the record, replaced or not.
If you need to check a specific column rather than the full line, replace $0 (the full record) with $n, where n is the number of the field.
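For instance, keying on the first column only (a minimal sketch of that suggestion; note that assigning to $1 makes awk rebuild the record with the output field separator):
awk 'seen[$1]++ {$1="."} 1' file
For the single-column sample above, this gives the same output as before.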

This function call:
gsub(/./,".",$1)
replaces each match of the pattern /./ with the string ".". The regex given matches any single character, so you are requesting exactly the behavior you observe: each character in the duplicate names is replaced with a ".".
There are many ways to fix it; among them would be to perform the substitution you really mean:
sub(/.*/, ".", $1)
That's not the best implementation, but it demonstrates the issue in your original code.
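Applied to the original one-liner, that fix would look something like this (a sketch that keeps the rest of the logic unchanged):
awk '{c=$1} c==p {sub(/.*/, ".", $1)} {p=c} 1' in.file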

You could just add a * to the pattern inside gsub so that it matches the entire field:
awk '{c=$1} c==p{gsub(/.*/,".",$1)} {p=c} 1'

Related

bash script: how to insert text between two specific characters

For example, I have a file containing a line as below:
"abc":"def"
I need to insert 123 between "abc":" and def" so that the line will become: "abc":"123def".
As "abc" appears only once so I think I can just search it and do the insertion.
How can I do this in a bash script, using sed or awk?
AMD$ sed 's/"abc":"/&123/' File
"abc":"123def"
Match "abc":", then append this match with 123 (& will contain the matched string "abc":")
If you want to take care of space before and after :, you can use:
sed 's/"abc" *: *"/&123/'
For replacing all such patterns, use g with sed.
sed 's/"abc" *: *"/&123/g' File
sed:
$ sed -E 's/(:")(.*)/\1123\2/' <<<'"abc":"def"'
"abc":"123def"
(:") gets :" and put in captured group 1
(.*) gets the remaining portion and put in captured group 2
in the replacement, \1123\2 puts 123 between the groups
awk:
$ awk -F: 'sub(".", "&123", $2)' <<<'"abc":"def"'
"abc" "123def"
In the sub() function, the second field ($2) is operated on; the pattern . matches its first character (the "), and in the replacement the matched portion (&) is followed by 123. Note that modifying a field makes awk rebuild the record with the default output field separator, which is why the : shows up as a space in the output.
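To keep the : in the output, one option (a sketch on my part, not from the original answer) is to set OFS to a colon as well, so the rebuilt record uses the same delimiter:
$ awk -F: -v OFS=: 'sub(".", "&123", $2)' <<<'"abc":"def"'
"abc":"123def"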
echo '"abc":"def"'| awk '{sub(/def/,"123def")}1'
"abc":"123def"

How to delete the first matched pattern in unix

I want to delete the characters from the start of the line up to the first matched pattern.
Ex :
xyz#abc:Hello:1234 xyz
I want to delete everything from the first character up to the first match of the pattern : (I don't want to delete up to every match, just the first one).
I need result like :
Hello:1234 xyz
Can anyone please help me with this?
You can use this awk:
awk '!p && p=index($0, ":"){$0=substr($0, p+1)} p' file

AWK between 2 patterns - first occurrence

I have this example of an ini file. I need to extract the names between the two patterns Name_Z1 and OBJ=Name_Z1 and put each of them on its own line.
The problem is that there is more than one occurrence of Name_Z1 and OBJ=Name_Z1, and I only need the first one.
[Name_Z5]
random;text
Names;Jesus;Tom;Miguel
random;text
OBJ=Name_Z5
[Name_Z1]
random;text
Names;Jhon;Alex;Smith
random;text
OBJ=Name_Z1
[Name_Z2]
random;text
Names;Chris;Mara;Iordana
random;text
OBJ=Name_Z2
[Name_Z1_Phone]
random;text
Names;Bill;Stan;Mike
random;text
OBJ=Name_Z1_Phone
My desired output would be:
Jhon
Alex
Smith
I am currently writing a larger bash script and I am stuck on this. I would prefer awk for the job.
I would greatly appreciate any help. Thank you!
For Wintermute's solution: the [Name_Z1] part actually looks like this:
[CAB_Z1]
READ_ONLY=false
FilterAttr=CeaseTime;blank|ObjectOfReference;contains;511047;512044;513008;593026;598326;CL5518;CL5521;CL5538;CL5612;CL5620|PerceivedSeverity;=;Critical;Major;Minor|ProbableCause;!=;HOUSE ALARM;IO DEVICE|ProblemText;contains;AIRE;ALIMENTA;BATER;CONVERTIDOR;DISTRIBUCION;FUEGO;HURTO;MAINS;MALLO;MAYOR;MENOR;PANEL;TEMP
NAME=CAB_Z1
And the [Name_Z1_Phone] part looks like this:
[CAB_Z1_FUEGO]
READ_ONLY=false
FilterAttr=CeaseTime;blank|ObjectOfReference;contains;511047;512044;513008;593026;598326;CL5518;CL5521;CL5538;CL5612;CL5620|PerceivedSeverity;=;Critical;Major;Minor|ProbableCause;!=;HOUSE ALARM;IO DEVICE|ProblemText;contains;FUEGO
NAME=CAB_Z1_FUEGO
The fix should be somewhere around the "|PerceivedSeverity"
Expected Output:
511047
512044
513008
593026
598326
CL5518
CL5521
CL5538
CL5612
CL5620
This should work:
sed -n '/^\[Name_Z1/,/^OBJ=Name_Z1/ { /^Names/ { s/^Names;//; s/;/\n/g; p; q } }' foo.txt
Explanation: Written readably, the code is
/^\[Name_Z1/,/^OBJ=Name_Z1/ {
/^Names/ {
s/^Names;//
s/;/\n/g
p
q
}
}
This means: In the pattern range /^\[Name_Z1/,/^OBJ=Name_Z1/, for all lines that match the pattern /^Names/, remove the Names; in the beginning, then replace all remaining ; with newlines, print the whole thing, and then quit. Since it immediately quits, it will only handle the first such line in the first such pattern range.
EDIT: The update made things a bit more complicated. I suggest
sed -n '/^\[CAB_Z1/,/^NAME=CAB_Z1/ { /^FilterAttr=/ { s/^.*contains;\(.*\)|PerceivedSeverity.*$/\1/; s/;/\n/g; p; q } }' foo.txt
The main difference is that instead of removing ^Names from a line, the substitution
s/^.*contains;\(.*\)|PerceivedSeverity.*$/\1/;
is applied. This isolates the part between contains; and |PerceivedSeverity before continuing as before. It assumes that there is only one such part in the line. If the match is ambiguous, it will pick the one that appears last in the line.
A (g)awk way that doesn't need a set number of fields (although I have assumed that contains; will always be on the line you need the names from):
(g)awk '(x+=/Z1/)&&match($0,/contains;([^|]+)/,a)&&gsub(";","\n",a[1]){print a[1];exit}' f
Explanation
(x+=/Z1/) - Increments x when Z1 is found. It is also part of a condition, so x must be set (non-zero) to continue.
match($0,/contains;([^|]+)/,a) - Matches contains; and then captures everything after it up to the |, storing the capture in a. Again a condition, so it must succeed to continue.
gsub(";","\n",a[1]) - Substitutes all the ; for newlines in the capture group a[1].
{print a[1];exit} - If all conditions are met, then print a[1] and exit.
This way should work in (m)awk
awk '(x+=/Z1/)&&/contains/{split($0,a,"|");y=split(a[2],b,";");for(i=3;i<=y;i++)
print b[i];exit}' file
sed -n '/\[Name_Z1\]/,/OBJ=Name_Z1$/ s/Names;//p' file.txt | tr ';' '\n'
That is sed -n to avoid printing anything not explicitly requested. Start from Name_Z1 and finish at OBJ=Name_Z1. Remove Names; and print the rest of the line where it occurs. Finally, replace semicolons with newlines.
An awk solution would be
$ awk -F";" '/Name_Z1/{f=1} f && /Names/{print $2,$3,$4} /OBJ=Name_Z1/{exit}' OFS="\n" input
Jhon
Alex
Smith
OR
$ awk -F";" '/Name_Z1/{f++} f==1 && /Names/{print $2,$3,$4}' OFS="\n" input
Jhon
Alex
Smith
-F";" sets the field seperator as ;
/Name_Z1/{f++} matches the line with pattern /Name_Z1/ If matched increment {f++}
f==1 && /Names/{print $2,$3,$4} is same as if f == 1 and maches pattern Name with line if true, then print the the columns 2 3 and 4 (delimted by ;)
OFS="\n" sets the output filed seperator as \n new line
EDIT
$ awk -F"[;|]" '/Z1/{f++} f==1 && NF>1{for (i=5; i<15; i++)print $i}' input
511047
512044
513008
593026
598326
CL5518
CL5521
CL5538
CL5612
CL5620
Here is a more generic solution for data grouped in blocks.
This awk does not need the end tag, just the start.
awk -vRS= -F"\n" '/^\[Name_Z1\]/ {n=split($3,a,";");for (i=2;i<=n;i++) print a[i];exit}' file
Jhon
Alex
Smith
How it works:
awk -vRS= -F"\n" ' # By setting RS to nothing, one record equals one block. Then FS is set to one line as a field
/^\[Name_Z1\]/ { # Search for block with [Name_Z1]
n=split($3,a,";") # Split field 3, the names and store number of fields in variable n
for (i=2;i<=n;i++) # Loop from second to last field
print a[i] # Print the fields
exit # Exits after first find
' file
With updated data
cat file
data
[CAB_Z1_FUEGO]
READ_ONLY=false
FilterAttr=CeaseTime;blank|ObjectOfReference;contains;511047;512044;513008;593026;598326;CL5518;CL5521;CL5538;CL5612;CL5620|PerceivedSeverity;=;Critical;Major;Minor|ProbableCause;!=;HOUSE ALARM;IO DEVICE|ProblemText;contains;FUEGO
NAME=CAB_Z1_FUEGO
data
awk -vRS= -F"\n" '/^\[CAB_Z1_FUEGO\]/ {split($3,a,"|");n=split(a[2],b,";");for (i=3;i<=n;i++) print b[i]}' file
511047
512044
513008
593026
598326
CL5518
CL5521
CL5538
CL5612
CL5620
The following awk script will do what you want:
awk 's==1&&/^Names/{gsub("Names;","",$0);gsub(";","\n",$0);print}/^\[Name_Z1\]$/||/^OBJ=Name_Z1$/{s++}' inputFileName
In more detail:
s==1 && /^Names;/ {
gsub ("Names;","",$0);
gsub(";","\n",$0);
print
}
/^\[Name_Z1\]$/ || /^OBJ=Name_Z1$/ {
s++
}
The state s starts with a value of zero and is incremented whenever you find one of the two lines:
[Name_Z1]
OBJ=Name_Z1
That means, between the first set of those lines, s will be equal to one. That's where the other condition comes in. When s is one and you find a line starting with Names;, you do two substitutions.
The first is to get rid of the Names; at the front, the second is to replace all ; semi-colon characters with a newline. Then you print it out.
The output for your given test data is, as expected:
Jhon
Alex
Smith

command to print all the lines until the first match

I am trying to print all the lines from a file before the first match. The same entries appear again later in the file, but I don't need those lines. I tried
awk "{print} /${pattern}/ {exit}" and sed "/$pattern/q" (my search is based on a variable), but both these commands print all the lines before the last match.
ex: my file is like
abc
bcd
def
xyz
def
lmno
def
xvd
When my pattern is 'def', I just need abc and bcd, but the above commands print all the lines before the last 'def'. Could you please suggest something?
This should work:
awk '!'"/${pattern}/{print} /${pattern}/ {exit}" input_file.txt
With pattern=def this expands to the awk program !/def/{print} /def/ {exit}: every line that does not match is printed, and the script exits on the first match before printing it.

Bash scripting: Get value before a count of lines between 2 words

The problem is as follows.
File:
Name1
command
data1
data2
data3
done
Name2
command
data4
data5
done
Name1
command
data6
done
In the file above, I want to count the lines between "command" and "done" and give a list of names where this count is more than 1.
The output here should be:
Name1
Name2
I've experimented with:
sed -n "/command/,/done/p" | count
Any ideas?
This can be a way:
$ awk '/done/ {if (t>1) print name} {t++} /^Name/ {name=$0} /command/ {t=0}' a
Name1
Name2
Explanation
/done/ {if (t>1) print name} if the line contains done, print the title of the block only if the counter is > 1.
{t++} increment the counter in any case.
/^Name/ {name=$0} store the line value.
/command/ {t=0} if the line contains command, reset the counter.
How about this awk:
awk '/^[A-Z]/{name=$0;count=-1;next}/done/&&count>1{print name;next}{count++}' file
Output:
Name1
Name2
That says: if the line starts with a left-aligned capital letter, save it as the name and set the counter to -1, so that it will be zero after the next line, where "command" appears. Then move to the next line without further ado. If the line matches the string "done" and the counter is greater than one, print the name we saved earlier and move to the next line. Increment the counter for all other lines.
If the names are not fixed strings, then you can use:
awk '/command/{cmd=1; n=0;next} !cmd{p=$1;next} cmd{n++} /done/{if (n>2) print p; cmd=0; next}' dat
Name1
Name2
