Bash scripting: Get value before a count of lines between 2 words - bash

The problem is as follows.
File:
Name1
command
data1
data2
data3
done
Name2
command
data4
data5
done
Name1
command
data6
done
In the file above, I want to count the lines between "command" and "done" and list the names of the blocks where this count is more than 1.
The output here should be:
Name1
Name2
I've experimented with:
sed -n "/command/,/done/p" | count
Any ideas?

This can be a way:
$ awk '/done/ {if (t>1) print name} {t++} /^Name/ {name=$0} /command/ {t=0}' a
Name1
Name2
Explanation
/done/ {if (t>1) print name} if the line contains done, print the name of the block, but only if the counter is greater than 1.
{t++} increment the counter on every line.
/^Name/ {name=$0} store the line as the current block name.
/command/ {t=0} if the line contains command, reset the counter.
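A minimal sketch, assuming you want the threshold as a parameter rather than hard-coded: the same awk program wrapped in a shell function that receives the limit through -v. The function name long_blocks and its min argument are hypothetical, and, like the original, it assumes that block names start with "Name".

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of every block whose
# command...done section has more than MIN lines (default 1).
long_blocks() {
  local min=${1:-1}
  awk -v min="$min" '
    /done/    { if (t > min) print name }  # test before counting the done line
              { t++ }                      # count every line
    /^Name/   { name = $0 }                # remember the block name
    /command/ { t = 0 }                    # restart counting after command
  '
}

printf '%s\n' Name1 command data1 data2 data3 done \
              Name2 command data4 data5 done \
              Name1 command data6 done | long_blocks 1
```

With long_blocks 2 only Name1 is printed, because the Name2 block has exactly two data lines.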

How about this awk:
awk '/^[A-Z]/{name=$0;count=-1;next}/done/&&count>1{print name;next}{count++}' file
Output:
Name1
Name2
That says: if the line starts with a capital letter, save it as the block name and set the counter to -1, so that it becomes 0 once the following "command" line is counted; then move to the next line without further processing. If the line contains "done" and the counter is greater than 1, print the saved name and move to the next line. For all other lines, increment the counter. This assumes that only the name lines start with a capital letter.
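A sketch of a variant that does not rely on capital letters, assuming instead that the block name is always the line directly before an exact "command" line and that every block ends with an exact "done" line. The function name blocks_by_prev and the sample names alpha, beta and gamma are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical variant: take the block name from the line directly
# before "command", so names need not start with a capital letter.
blocks_by_prev() {
  awk '
    /^command$/ { name = prev; n = 0; inblk = 1; prev = $0; next }
    /^done$/    { if (inblk && n > 1) print name; inblk = 0; prev = $0; next }
    inblk       { n++ }                    # count lines inside the block
                { prev = $0 }              # remember this line for the next one
  '
}

printf '%s\n' alpha command data1 data2 data3 done \
              beta command data4 data5 done \
              gamma command data6 done | blocks_by_prev
```

The piped sample prints alpha and beta; gamma is skipped because its block holds a single line.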

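If the names are not fixed strings, you can use the following. Note that n also counts the done line itself, so n>2 means more than one line between command and done: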
awk '/command/{cmd=1; n=0;next} !cmd{p=$1;next} cmd{n++} /done/{if (n>2) print p; cmd=0; next}' dat
Name1
Name2

Related

Print all lines between line containing a string and first blank line, starting with the line containing that string

I've tried awk:
awk -v RS="zuzu_mumu" '{print RS $0}' input_file > output_file
The resulting file is identical to input_file, except that zuzu_mumu is now its first line.
How can I correct my command?
After solving this, I found the same string/pattern in a different arrangement, so I also need to save all the matching records to an output file, following this rule:
if the pattern matches a line, look back through the previous lines and print the first line that follows an empty line; then print the matching line and an empty line.
record 1
record 2
This is record 3 first line
info 1
info 2
This is one matched zuzu_mumu line
info 3
info 4
info 5
record 4
record 5
...
This is record n-1 first line
info a
This is one matched zuzu_mumu line
info b
info c
record n
...
I should obtain:
This is record 3 first line
This is one matched zuzu_mumu line
This is record n-1 first line
This is one matched zuzu_mumu line
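The rule above can be sketched with awk's paragraph mode, where an empty RS makes each blank-line-separated record one awk record and -F '\n' makes each of its lines a field. This assumes the records really are separated by empty lines, as the rule implies, even though the sample above does not show them; the function name first_and_match is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each blank-line-separated record that
# contains zuzu_mumu, print its first line, the matching line(s)
# and an empty line.
first_and_match() {
  awk -v RS= -F '\n' '
    /zuzu_mumu/ {
      print $1                       # first line after the empty line
      for (i = 2; i <= NF; i++)      # later lines of the same record
        if ($i ~ /zuzu_mumu/) print $i
      print ""                       # empty separator line
    }
  '
}

printf '%s\n' 'record 1' 'record 2' '' \
  'This is record 3 first line' 'info 1' 'info 2' \
  'This is one matched zuzu_mumu line' 'info 3' '' \
  'record 4' 'record 5' | first_and_match
```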
Print all lines between line containing a string and first blank line, starting with the line containing that string
I would use GNU AWK for this task. Let file.txt content be (with an empty line between Charlie and Dog)
Able
Baker
Charlie
Dog
Easy
Fox
then
awk 'index($0,"aker"){p=1}p{if(/^$/){exit};print}' file.txt
output
Baker
Charlie
Explanation: the index string function returns the position of aker in the whole line ($0), or 0 if it does not occur; used as a condition, it asks "is aker inside the line?". Using index rather than a regular expression means we do not have to care about characters with special meaning, such as the dot (.). If aker is found, set p to 1. While p is set: if the line is empty (/^$/ matches the start of the line immediately followed by its end), stop processing (exit); otherwise print the whole line as is.
(tested in gawk 4.2.1)
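To see why index matters, here is a sketch that takes the search text as an argument (the function name print_block is hypothetical). It passes the text through the environment and reads it with ENVIRON, because awk -v would process backslash escapes in the value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print from the first line containing the
# literal text $1 up to (not including) the next empty line.
print_block() {
  # The environment avoids the escape processing that -v applies.
  needle=$1 awk '
    index($0, ENVIRON["needle"]) { p = 1 }   # literal substring test
    p { if (/^$/) exit; print }
  '
}

printf '%s\n' Able Baker B.ker Charlie '' Dog | print_block 'B.ker'
```

This prints B.ker and Charlie. With the regular expression /B.ker/ the block would start at Baker instead, because . matches any character.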
If the pattern can occur again before the blank line and you want to restart from its latest occurrence, you can collect the lines in an array and print them in the END block.
awk '
f && /zuzu_mumu/ { # Already found and found again
delete ary; entries=0 # Discard the lines collected so far and restart
}
f || /zuzu_mumu/ { # Already found, or this line matches the word of interest
if(/^[[:blank:]]*$/){exit} # Blank line (only spaces or tabs): stop; END still runs
f=1 # Mark as found
ary[entries++]=$0 # Store the current line and increment the entry count
}
END {
for (j=0; j<entries; j++) # Print the collected lines in order
print ary[j]
}
' file

How to replace duplicated rows with "." in awk?

I need to replace the duplicates in my first column with just "."
For example:
name1
name1
name1
name2
name2
name3
name3
And I need Output:
name1
.
.
name2
.
name3
.
I have a solution like this:
awk '{c=$1} c==p{gsub(/./,".",$1)} {p=c} 1' in.file
But the output is:
name1
.....
.....
name2
.....
name3
.....
Is there any solution without any other piping?
Use an array to check if a line has already been seen!
$ awk 'seen[$0]++ {$0="."}1' file
name1
.
.
name2
.
name3
.
The usual way to skip repeated lines is awk '!seen[$0]++' file. Here we use the same logic with a small twist: the array seen[] records which lines have appeared so far. seen[$0]++ returns the count before incrementing, so it is nonzero (true) if the line has appeared before, and then {$0="."} runs. Finally, 1 prints the line, either replaced or unchanged.
If you need to check a specific column rather than the full line, replace $0 (the full record) with $n, where n is the field number.
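A small sketch of the column variant (the function name dot_first_col is hypothetical). Assigning to $1 makes awk rebuild the line with OFS, a single space by default, so a modified line loses its original tab; pass -v OFS='\t' if the input is tab-separated and you want to keep it.

```shell
#!/usr/bin/env bash
# Replace the first column with "." when its value was already seen
# earlier in the input (not necessarily on the previous line).
dot_first_col() {
  awk 'seen[$1]++ { $1 = "." } 1'
}

printf 'name1\ta\nname1\tb\nname2\tc\n' | dot_first_col
```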
This function call:
gsub(/./,".",$1)
replaces each match of the pattern /./ with the string ".". The regex given matches any single character, so you are requesting exactly the behavior you observe: each character in the duplicate names is replaced with a ".".
There are many ways to fix it; among them would be to perform the substitution you really mean:
sub(/.*/, ".", $1)
That's not the best implementation, but it demonstrates the issue in your original code.
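The seen[] approach replaces a value whenever it appeared anywhere earlier, whereas the original c==p idea only looks at the line directly above. If that adjacent-only behavior is what you want, a sketch that simply assigns "." to the field (the function name dot_adjacent is hypothetical) is:

```shell
#!/usr/bin/env bash
# Replace the first column with "." only when it repeats the value
# on the line directly above.
dot_adjacent() {
  awk '{ key = $1 } key == prev { $1 = "." } { prev = key } 1'
}

printf '%s\n' name1 name1 name2 name1 | dot_adjacent
```

This prints name1, ., name2, name1; the seen[] version would turn the last name1 into . as well.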
You could just add a * to your pattern inside gsub, so that it matches the entire field:
awk '{c=$1} c==p{gsub(/.*/,".",$1)} {p=c} 1'

How to iterate over text file having multiple-words-per-line using shell script?

I know how to iterate over lines of text when the text file has contents as below:
abc
pqr
xyz
However, what if the contents of my text file are as below,
abc xyz
cdf pqr
lmn rst
and I need "abc" stored in one variable and "xyz" in another. How would I do that?
read splits the line on $IFS into as many fields as you pass variables to it:
while read -r var1 var2 ; do
echo "var1: ${var1} var2: ${var2}"
done < file
If you pass var1 and var2, the two columns go into separate variables. Note, however, that if a line contains more columns, var2 receives the whole remainder of the line, not just column 2.
Type help read for more info.
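A sketch of two consequences of this: giving read an extra catch-all variable keeps surplus words out of the second column, and because read splits on $IFS, another delimiter only needs a different IFS for that read command. The function names two_cols and two_cols_csv are hypothetical.

```shell
#!/usr/bin/env bash
# One variable per expected column plus a catch-all (_rest),
# so extra words do not end up glued onto the second column.
two_cols() {
  local first second _rest
  while read -r first second _rest; do
    printf 'first=%s second=%s\n' "$first" "$second"
  done
}

# Same idea for comma-separated lines: IFS is set for read only.
two_cols_csv() {
  local first second _rest
  while IFS=, read -r first second _rest; do
    printf 'first=%s second=%s\n' "$first" "$second"
  done
}

printf 'abc xyz extra\ncdf pqr\n' | two_cols
printf 'abc,xyz\ncdf,pqr\n' | two_cols_csv
```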
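If the delimiter is a space then you can do:
#!/bin/bash
ALLVALUES=()
while read -r line
do
ALLVALUES+=( $line ) # unquoted on purpose so the line is split on $IFS
done < "/path/to/your/file"
Afterwards, you can reference an element with ${ALLVALUES[0]}, ${ALLVALUES[1]}, etc. Because $line is unquoted, words such as * are also expanded as filename globs.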
If you want to read every word in a file into a single array you can do it like this:
arr=()
while read -r -a _a; do
arr+=("${_a[@]}")
done < infile
This uses -r to stop read from interpreting backslashes in the input and -a to split the words (on $IFS) into an array. It then appends all the elements of that array to the accumulating array, which is safe against globbing and other metacharacters.
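This awk command reads the input word by word (a regular expression as RS works in GNU awk and mawk, but POSIX leaves a multi-character RS unspecified):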
awk -v RS='[[:space:]]+' '1' file
abc
xyz
cdf
pqr
lmn
rst
To populate a shell array, run the awk command in a process substitution:
arr=()
while read -r w; do
arr+=("$w")
done < <(awk -v RS='[[:space:]]+' '1' file)
And print the array content:
declare -p arr
declare -a arr='([0]="abc" [1]="xyz" [2]="cdf" [3]="pqr" [4]="lmn" [5]="rst")'

AWK between 2 patterns - first occurrence

I have this example ini file. I need to extract the names between the 2 patterns Name_Z1 and OBJ=Name_Z1 and put each of them on its own line.
The problem is that there is more than one occurrence of Name_Z1 and OBJ=Name_Z1, and I only need the first one.
[Name_Z5]
random;text
Names;Jesus;Tom;Miguel
random;text
OBJ=Name_Z5
[Name_Z1]
random;text
Names;Jhon;Alex;Smith
random;text
OBJ=Name_Z1
[Name_Z2]
random;text
Names;Chris;Mara;Iordana
random;text
OBJ=Name_Z2
[Name_Z1_Phone]
random;text
Names;Bill;Stan;Mike
random;text
OBJ=Name_Z1_Phone
My desired output would be:
Jhon
Alex
Smith
I am currently writing a larger script in bash and I am stuck on this. I would prefer awk to do the job.
I would greatly appreciate any help. Thank you!
Regarding Wintermute's solution: the [Name_Z1] part actually looks like this:
[CAB_Z1]
READ_ONLY=false
FilterAttr=CeaseTime;blank|ObjectOfReference;contains;511047;512044;513008;593026;598326;CL5518;CL5521;CL5538;CL5612;CL5620|PerceivedSeverity;=;Critical;Major;Minor|ProbableCause;!=;HOUSE ALARM;IO DEVICE|ProblemText;contains;AIRE;ALIMENTA;BATER;CONVERTIDOR;DISTRIBUCION;FUEGO;HURTO;MAINS;MALLO;MAYOR;MENOR;PANEL;TEMP
NAME=CAB_Z1
And the [Name_Z1_Phone] part looks like this:
[CAB_Z1_FUEGO]
READ_ONLY=false
FilterAttr=CeaseTime;blank|ObjectOfReference;contains;511047;512044;513008;593026;598326;CL5518;CL5521;CL5538;CL5612;CL5620|PerceivedSeverity;=;Critical;Major;Minor|ProbableCause;!=;HOUSE ALARM;IO DEVICE|ProblemText;contains;FUEGO
NAME=CAB_Z1_FUEGO
The fix should be somewhere around the "|PerceivedSeverity"
Expected Output:
511047
512044
513008
593026
598326
CL5518
CL5521
CL5538
CL5612
CL5620
This should work:
sed -n '/^\[Name_Z1/,/^OBJ=Name_Z1/ { /^Names/ { s/^Names;//; s/;/\n/g; p; q } }' foo.txt
Explanation: Written readably, the code is
/^\[Name_Z1/,/^OBJ=Name_Z1/ {
/^Names/ {
s/^Names;//
s/;/\n/g
p
q
}
}
This means: in the pattern range /^\[Name_Z1/,/^OBJ=Name_Z1/, for every line that matches /^Names/, remove the leading Names;, replace all remaining ; with newlines, print the result, and then quit. Because it quits immediately, it only handles the first such line in the first such range. (The \n in the replacement is a GNU sed extension; other seds may need a backslash followed by a literal newline.)
EDIT: The update made things a bit more complicated. I suggest
sed -n '/^\[CAB_Z1/,/^NAME=CAB_Z1/ { /^FilterAttr=/ { s/^.*contains;\(.*\)|PerceivedSeverity.*$/\1/; s/;/\n/g; p; q } }' foo.txt
The main difference is that instead of removing ^Names from a line, the substitution
s/^.*contains;\(.*\)|PerceivedSeverity.*$/\1/;
is applied. This isolates the part between contains; and |PerceivedSeverity before continuing as before. It assumes that there is only one such part in the line. If the match is ambiguous, it will pick the one that appears last in the line.
A gawk way that doesn't need a fixed number of fields (although I have assumed that contains; will always be on the line you need the names from). The three-argument form of match used here is a GNU awk extension.
gawk '(x+=/Z1/)&&match($0,/contains;([^|]+)/,a)&&gsub(";","\n",a[1]){print a[1];exit}' f
Explanation
(x+=/Z1/) - increments x when Z1 is found. It is also part of the condition, so x must be nonzero to continue.
match($0,/contains;([^|]+)/,a) - matches contains; and captures everything after it up to the next |, storing the capture in a. Again a condition, so it must succeed to continue.
gsub(";","\n",a[1]) - replaces every ; in the capture group a[1] with a newline.
{print a[1];exit} - if all conditions are met, print a[1] and exit.
This version should also work in mawk, which lacks gawk's three-argument match:
awk '(x+=/Z1/)&&/contains/{split($0,a,"|");y=split(a[2],b,";");for(i=3;i<=y;i++)
print b[i];exit}' file
sed -n '/\[Name_Z1\]/,/OBJ=Name_Z1$/ s/Names;//p' file.txt | tr ';' '\n'
sed -n suppresses printing of anything not explicitly requested. The range starts at [Name_Z1] and ends at OBJ=Name_Z1; within it, Names; is removed and the rest of that line is printed. Finally, tr replaces the semicolons with newlines.
An awk solution:
$ awk -F";" '/Name_Z1/{f=1} f && /Names/{print $2,$3,$4} /OBJ=Name_Z1/{exit}' OFS="\n" input
Jhon
Alex
Smith
OR
$ awk -F";" '/Name_Z1/{f++} f==1 && /Names/{print $2,$3,$4}' OFS="\n" input
Jhon
Alex
Smith
-F";" sets the field separator to ;
/Name_Z1/{f++} if the line matches /Name_Z1/, increment f.
f==1 && /Names/{print $2,$3,$4} if f is 1 and the line matches Names, print columns 2, 3 and 4 (delimited by ;).
OFS="\n" sets the output field separator to a newline.
EDIT
$ awk -F"[;|]" '/Z1/{f++} f==1 && NF>1{for (i=5; i<15; i++)print $i}' input
511047
512044
513008
593026
598326
CL5518
CL5521
CL5538
CL5612
CL5620
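A sketch that avoids the prefix problem (Name_Z1 also matching Name_Z1_Phone) by comparing the whole header line as a plain string, which also means the brackets need no regex escaping. It assumes every section starts with a [header] line and that the names sit on a line of the form Names;a;b;c; the function name section_names is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the ;-separated values of the first
# "Names" line inside the section whose header is exactly [$1].
section_names() {
  sec="[$1]" awk -F';' '
    $0 == ENVIRON["sec"] { insec = 1; next }  # exact header match only
    /^\[/                { insec = 0 }        # any other header ends it
    insec && $1 == "Names" {
      for (i = 2; i <= NF; i++) print $i
      exit                                    # first occurrence only
    }
  '
}

printf '%s\n' '[Name_Z5]' 'Names;Jesus;Tom;Miguel' 'OBJ=Name_Z5' \
  '[Name_Z1]' 'random;text' 'Names;Jhon;Alex;Smith' 'OBJ=Name_Z1' \
  '[Name_Z1_Phone]' 'Names;Bill;Stan;Mike' | section_names Name_Z1
```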
Here is a more generic solution for data grouped into blocks separated by blank lines.
This awk does not need the end tag, just the start.
awk -vRS= -F"\n" '/^\[Name_Z1\]/ {n=split($3,a,";");for (i=2;i<=n;i++) print a[i];exit}' file
Jhon
Alex
Smith
How it works:
awk -vRS= -F"\n" ' # An empty RS makes each blank-line-separated block one record; FS is a newline, so each line is a field
/^\[Name_Z1\]/ { # Search for the block starting with [Name_Z1]
n=split($3,a,";") # Split field 3 (the Names line) and store the number of parts in n
for (i=2;i<=n;i++) # Loop from the second to the last part
print a[i] # Print the names
exit # Exit after the first match
}
' file
With updated data
cat file
data
[CAB_Z1_FUEGO]
READ_ONLY=false
FilterAttr=CeaseTime;blank|ObjectOfReference;contains;511047;512044;513008;593026;598326;CL5518;CL5521;CL5538;CL5612;CL5620|PerceivedSeverity;=;Critical;Major;Minor|ProbableCause;!=;HOUSE ALARM;IO DEVICE|ProblemText;contains;FUEGO
NAME=CAB_Z1_FUEGO
data
awk -vRS= -F"\n" '/^\[CAB_Z1_FUEGO\]/ {split($3,a,"|");n=split(a[2],b,";");for (i=3;i<=n;i++) print b[i]}' file
511047
512044
513008
593026
598326
CL5518
CL5521
CL5538
CL5612
CL5620
The following awk script will do what you want:
awk 's==1&&/^Names/{gsub("Names;","",$0);gsub(";","\n",$0);print}/^\[Name_Z1\]$/||/^OBJ=Name_Z1$/{s++}' inputFileName
In more detail:
s==1 && /^Names;/ {
gsub ("Names;","",$0);
gsub(";","\n",$0);
print
}
/^\[Name_Z1\]$/ || /^OBJ=Name_Z1$/ {
s++
}
The state s starts with a value of zero and is incremented whenever you find one of the two lines:
[Name_Z1]
OBJ=Name_Z1
That means, between the first set of those lines, s will be equal to one. That's where the other condition comes in. When s is one and you find a line starting with Names;, you do two substitutions.
The first removes the leading Names;, the second replaces every ; (semicolon) with a newline. Then the line is printed.
The output for your given test data is, as expected:
Jhon
Alex
Smith

grep the input file with keyword, then generate new report

cat infile
abc 123 678
sda 234 345 321
xyz 234 456 678
I need to grep the file for the keyword sda and report the first and last columns, like this:
sda has the value of 321
I need a Ruby function equivalent to this bash (awk) script:
awk '/sda/{print $1 " has the value of " $NF}' infile
How about something like this?
File.foreach("infile") do |line| # foreach closes the file when done
next unless line =~ /^sda/ # don't process the line unless it starts with "sda"
entries = line.split(" ")
var1 = entries.first
var2 = entries.last
puts "#{var1} has the value of #{var2}"
end
I don't know where you are defining the "sda" matcher. If it's fixed, you can just put it in there.
If not, you might try grabbing it from commandline arguments.
File.foreach('infile') do |line|
key, *_, value = line.split
next unless key == 'sda' # or "next if key != 'sda'"
puts "#{key} has the value of #{value}"
end
Alternatively, you could use a regexp matcher in the beginning to see if the line starts with 'sda' or not.
