grep a string from a specific block of text - bash

Some help required please...
I have a block of text in a file on my Linux machine like this:
Block.1:\
:Value1=something:\
:Value2=something_else:\
:Value3=something_other:
Block.2:\
:Value1=something:\
:Value2=something_else:\
:Value3=something_other:
Block.n:\
:Value1=something:\
:Value2=something_else:\
:Value3=something_other:
How can I use grep (and/or possibly awk?) to pluck out e.g. Value2 from Block.2 only?
Blocks won't always be ordered sequentially (they have arbitrary names) but will always be unique.
Colon and backslash positions are fixed.
TIA, Rob.

The following awk may help:
awk -F"=" '/^Block\.2/{flag=1} flag && /Value2/{print $2;flag=""}' Input_file
Output will be as follows.
something_else:\
If you want to print the full line for Value2 in Block.2, change print $2 to just print in the code above.
Explanation:
awk -F"=" ' ##Set = as the field separator for each line of Input_file.
/Block\.2/{ ##If a line contains the string Block.2 (the . is escaped to remove its special meaning), then:
flag=1 ##Set variable flag to 1 (TRUE).
}
flag && /Value2/{ ##If flag is TRUE and the line contains the string Value2, then:
print $2; ##Print the 2nd field of the current line.
flag="" ##Nullify the variable flag.
}
' Input_file ##The input file name.
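As a quick sanity check, the one-liner can be run against a copy of the sample data (Input_file is just the placeholder name used above):

```shell
# Recreate the sample blocks from the question.
cat > Input_file <<'EOF'
Block.1:\
:Value1=something:\
:Value2=something_else:\
:Value3=something_other:
Block.2:\
:Value1=something:\
:Value2=something_else:\
:Value3=something_other:
EOF

# Pull Value2 from Block.2 only; flag turns matching on at the
# Block.2 header and is cleared after the first Value2 line.
awk -F"=" '/^Block\.2/{flag=1} flag && /Value2/{print $2;flag=""}' Input_file
```

Even though both blocks hold the same Value2 here, the single output line comes from Block.2: flag is only set once /^Block\.2/ has matched.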

$ cat tst.awk
BEGIN { FS="[:=]" }
NF==2 { f = ($1 == "Block.2" ? 1 : 0) }
f && ($2 == "Value2") { print $3 }
$ awk -f tst.awk file
something_else

grep -A 2 "Block.2" file | tail -1 | cut -d= -f2
explanation:
grep -A 2 looks for the pattern and prints the 2 following lines (down to Value2)
tail -1 keeps the last of those lines (the one with Value2)
cut uses = as the field separator and prints the second field
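Note the extracted field still carries the trailing :\ from the sample layout; one possible cleanup (a sketch, not part of the original answer) pipes through sed:

```shell
cat > file <<'EOF'
Block.1:\
:Value1=something:\
:Value2=something_else:\
:Value3=something_other:
Block.2:\
:Value1=something:\
:Value2=something_else:\
:Value3=something_other:
EOF

# Same pipeline, plus a sed step that strips the trailing ":\".
grep -A 2 "Block.2" file | tail -1 | cut -d= -f2 | sed 's/:\\$//'
```

Escaping the dot (grep 'Block\.2') would be safer, since an unescaped . matches any character.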

Related

Add Extra Strings Based on count of fields - Sed/Awk

I have data in below format in a text file.
null,"ABC:MNO"
"hjgy","ABC:PQR"
"mn","qwe","ABC:WER"
"mn","qwe","mno","ABC:WER"
All rows should have 4 fields like row 4. I want the data in the below format.
"","","","ABC:MNO"
"hjgy","","","ABC:PQR"
"mn","qwe","","ABC:WER"
"mn","qwe","mno","ABC:WER"
If the row starts with null, then null should be replaced by "","","",.
If there are only 2 fields, then "","", should be added after the 1st string.
If there are 3 fields, then "", should be added after the 2nd string.
If there are 4 fields, do nothing.
I am able to handle the 1st scenario by using sed 's/null/\"\",\"\",\"\"/' test.txt
but I don't know how to handle the next 2 scenarios.
Regards.
With perl:
$ perl -pe 's/^null,/"","","",/; s/.*,\K/q("",) x (3 - tr|,||)/e' ip.txt
"","","","ABC:MNO"
"hjgy","","","ABC:PQR"
"mn","qwe","","ABC:WER"
"mn","qwe","mno","ABC:WER"
s/^null,/"","","",/ takes care of the null field first
.*,\K matches up to the last , in the line
\K avoids having to put the matched portion back in the replacement
3 - tr|,|| gives the number of missing fields (tr in this form returns the count of , occurrences)
q("",) — q() creates a single-quoted string, so " doesn't need escaping
x is the string repetition operator
the e flag allows Perl code in the replacement section
If rows starting with null, will always have two fields, then you can also use:
perl -pe 's/.*,\K/q("",) x (3 - tr|,||)/e; s/^null,/"",/'
Similar logic with awk:
awk -v q='"",' 'BEGIN{FS=OFS=","} {sub(/^null,/, q q q);
c=4-NF; while (c--) $NF = q $NF} 1'
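Running that awk version against the sample (a quick check; ip.txt is the file name used in the perl answer above):

```shell
cat > ip.txt <<'EOF'
null,"ABC:MNO"
"hjgy","ABC:PQR"
"mn","qwe","ABC:WER"
"mn","qwe","mno","ABC:WER"
EOF

# Pad every row out to 4 fields: a leading null becomes three empty
# quoted fields, then 4-NF copies of "", are prepended to the last field.
awk -v q='"",' 'BEGIN{FS=OFS=","} {sub(/^null,/, q q q);
c=4-NF; while (c--) $NF = q $NF} 1' ip.txt
```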
With the samples you have shown, please try the following:
awk '
BEGIN{
FS=OFS=","
}
{
sub(/^null/,"\"\",\"\",\"\"")
}
NF==2{
$1=$1",\"\",\"\""
}
NF==3{
$2=$2",\"\""
}
1' Input_file
Or, putting "" into a variable, one could also try:
awk -v s1="\"\"" '
BEGIN{
FS=OFS=","
}
{
sub(/^null/,s1 "," s1","s1)
}
NF==2{
$1=$1"," s1 "," s1
}
NF==3{
$2=$2"," s1
}
1' Input_file
Explanation: a detailed explanation of the first version above.
awk ' ##Start the awk program.
BEGIN{ ##Start the BEGIN section of the program.
FS=OFS="," ##Set FS and OFS to comma.
}
{
sub(/^null/,"\"\",\"\",\"\"") ##Substitute a leading null with "","","" in the current line.
}
NF==2{ ##If the number of fields is 2, then:
$1=$1",\"\",\"\"" ##Append ,"","" after the 1st field's value.
}
NF==3{ ##If the number of fields is 3, then:
$2=$2",\"\"" ##Append ,"" after the 2nd field's value.
}
1 ##Print the current line.
' Input_file ##The input file name.
A solution using awk:
awk -F "," 'BEGIN{ OFS=FS }
{ gsub(/^ /,"",$1)
if($1=="null") print "\x22\x22","\x22\x22","\x22\x22", $2
else if(NF==2) print $1,"\x22\x22","\x22\x22",$2
else if(NF==3) print $1,$2,"\x22\x22",$3
else print $0 }' input
This might work for you (GNU sed):
sed 's/^\s*null,/"",/;:a;ta;s/,/&/3;t;s/.*,/&"",/;ta' file
If the line begins with null, replace that field with an empty one, i.e. "",.
Reset the substitution success flag by going back to :a using ta (this only branches when the first field was null and has been substituted).
If the 3rd field separator exists, all done.
Otherwise, insert an empty field before the last field separator and repeat.
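The sed loop can be verified against the sample like this (GNU sed assumed, as noted above):

```shell
cat > file <<'EOF'
null,"ABC:MNO"
"hjgy","ABC:PQR"
"mn","qwe","ABC:WER"
"mn","qwe","mno","ABC:WER"
EOF

# Replace a leading null, then keep inserting "", before the last
# field until the line has three commas.
sed 's/^\s*null,/"",/;:a;ta;s/,/&/3;t;s/.*,/&"",/;ta' file
```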

Regex pattern as variable in AWK

Let's say I have a file with multiple fields, and field 1 needs to be filtered on 2 conditions. I was thinking of turning those conditions into a regex pattern and passing them as variables to the awk statement. For some reason, they don't filter out the records at all. Here is my attempt; it runs fine but doesn't filter the results per the conditions, except when the patterns are fed directly into awk without variable assignment.
regex1="/abc|def/"; # match first field for abc or def;
regex2="/123|567/"; # and also match the first field for 123 or 567;
cat file_name \
| awk -v pat1="${regex1}" -v pat2="${regex2}" 'BEGIN{FS=OFS="\t"} {if ( ($1~pat1) && ($1~pat2) ) print $0}'
Update: Fixed a syntax error related to missing parentheses for the if conditions in the awk. (I had it fixed in the code I ran.)
Sample data
abc:567 1
egf:888 2
Expected output
abc:567 1
The problem is that I am getting all the results instead of only the ones where field 1 satisfies both regexes.
Note that the match needs to be a wildcard match instead of an exact match, meaning 567 as defined in the regex pattern should also match 567_1 if present.
It seems like the way to implement what you want to do would be:
awk -F'\t' '
($1 ~ /abc|def/) &&
($1 ~ /123|567/)
' file
or probably more robustly:
awk -F'\t' '
{ split($1,a,/:/) }
(a[1] ~ /abc|def/) &&
(a[2] ~ /123|567/)
' file
What's wrong with that?
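For reference, the more robust version runs like this against the sample (assuming, as in the OP's FS, that the two columns are tab-separated):

```shell
# Tab-separated sample from the question (hypothetical file name).
printf 'abc:567\t1\negf:888\t2\n' > file_name

# Split field 1 on ":" and test each half against its own regexp.
awk -F'\t' '
{ split($1,a,/:/) }
(a[1] ~ /abc|def/) &&
(a[2] ~ /123|567/)
' file_name
```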
EDIT: here is me running the OP's code before and after removing the regexp delimiters (/) from the dynamic regexp strings:
$ cat tst.sh
#!/usr/bin/env bash
regex1="/abc|def/"; #--match first field for abc or def;
regex2="/123|567/"; #--and also match the first field for 123 or 567;
cat file_name \
| awk -v pat1="${regex1}" -v pat2="${regex2}" 'BEGIN{FS=OFS="\t"} $1~pat1 && $1~pat2'
echo "###################"
regex1="abc|def"; #--match first field for abc or def;
regex2="123|567"; #--and also match the first field for 123 or 567;
cat file_name \
| awk -v pat1="${regex1}" -v pat2="${regex2}" 'BEGIN{FS=OFS="\t"} $1~pat1 && $1~pat2'
$
$ ./tst.sh
###################
abc:567 1
EDIT: Since the OP has changed the samples, adding this solution, which works for partial matches too; again written and tested with the shown samples in GNU awk.
awk -F':|[[:space:]]+' -v var1="abc|def" -v var2="123|567" '
BEGIN{
num=split(var1,arr1,"|")
split(var2,arr2,"|")
for(i=1;i<=num;i++){
reg1[arr1[i]]
reg2[arr2[i]]
}
}
{
for(i in reg1){
if(index($1,i)){
for(j in reg2){
if(index($2,j)){ print; next }
}
}
}
}
' Input_file
Let's say the following is the Input_file:
cat Input_file
abc_2:567_3 1
egf:888 2
After running the above code we get abc_2:567_3 1 as output.
With the samples you have shown, please try the following, written and tested in GNU awk. Put the values you want to look for in the 1st column in var1 and those for the 2nd field in var2, pipe-delimited.
awk -F':|[[:space:]]+' -v var1="abc|def" -v var2="123|567" '
BEGIN{
num=split(var1,arr1,"|")
split(var2,arr2,"|")
for(i=1;i<=num;i++){
reg1[arr1[i]]
reg2[arr2[i]]
}
}
($1 in reg1) && ($2 in reg2)
' Input_file
Explanation: a detailed explanation of the above.
awk -F':|[[:space:]]+' -v var1="abc|def" -v var2="123|567" ' ##Start the awk program.
##Set the field separator to colon or spaces; set the var1 and var2 values.
BEGIN{ ##Start the BEGIN section of the program.
num=split(var1,arr1,"|") ##Split var1 into arr1.
split(var2,arr2,"|") ##Split var2 into arr2.
for(i=1;i<=num;i++){ ##Loop from 1 to num.
reg1[arr1[i]] ##Create reg1 with arr1's value as the index.
reg2[arr2[i]] ##Create reg2 with arr2's value as the index.
}
}
($1 in reg1) && ($2 in reg2) ##If the 1st field is present in reg1 AND the 2nd field is present in reg2, print the line.
' Input_file ##The input file name.
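A quick run of this exact-match version against the original sample (unlike the index()-based variant above, this one matches fields exactly, not partially):

```shell
cat > Input_file <<'EOF'
abc:567 1
egf:888 2
EOF

# Build lookup sets reg1/reg2 from the pipe-delimited variables,
# then print lines whose split fields are members of both sets.
awk -F':|[[:space:]]+' -v var1="abc|def" -v var2="123|567" '
BEGIN{
  num=split(var1,arr1,"|")
  split(var2,arr2,"|")
  for(i=1;i<=num;i++){
    reg1[arr1[i]]
    reg2[arr2[i]]
  }
}
($1 in reg1) && ($2 in reg2)
' Input_file
```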

issue for condition on unique rows in bash

I want to print rows of a table in a file; the issue is that when I use read in a loop, it reprints the result several times. Here is my input file:
aa ,DEC ,file1.txt
aa ,CHAR ,file1.txt
cc ,CHAR ,file1.txt
dd ,DEC ,file2.txt
bb ,DEC ,file3.txt
bb ,CHAR ,file3.txt
cc ,DEC ,file1.txt
Here is the result I want to have:
printed in file1.txt
aa#DEC,CHAR
cc#CHAR,DEC
printed in file2.txt
dd#DEC
printed in file3.txt
bb#DEC,CHAR
here is my attempt:
(cat input.txt | while read line
do
table=`echo $line | cut -d"," -f1`
variable=`echo $line | cut -d"," -f2`
file=`echo $line | cut -d"," -f3`
echo ${table}#${variable},
done) > ${file}
This can be done in a single pass with GNU awk like this:
awk -F ' *, *' '{
map[$3][$1] = (map[$3][$1] == "" ? "" : map[$3][$1] ",") $2
}
END {
for (f in map)
for (d in map[f])
print d "#" map[f][d] > f
}' file
This will produce these files:
=== file1.txt ===
aa#DEC,CHAR
cc#CHAR,DEC
=== file2.txt ===
dd#DEC
=== file3.txt ===
bb#DEC,CHAR
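Arrays of arrays (map[$3][$1]) are a GNU awk extension; with a POSIX awk the same idea can be sketched using a composite SUBSEP key (an adaptation, not part of the answer above):

```shell
cat > input.txt <<'EOF'
aa ,DEC ,file1.txt
aa ,CHAR ,file1.txt
cc ,CHAR ,file1.txt
dd ,DEC ,file2.txt
bb ,DEC ,file3.txt
bb ,CHAR ,file3.txt
cc ,DEC ,file1.txt
EOF

# One pass: accumulate types per (file,name) pair, remembering
# first-seen order, then write each pair to its own output file.
awk -F ' *, *' '
{
  key = $3 SUBSEP $1
  if (!(key in map)) order[++n] = key
  map[key] = (map[key] == "" ? "" : map[key] ",") $2
}
END {
  for (i = 1; i <= n; i++) {
    split(order[i], p, SUBSEP)
    print p[2] "#" map[order[i]] > p[1]
  }
}' input.txt
```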
With the samples you have shown, please try the following, written and tested in GNU awk.
awk '
{
sub(/^,/,"",$3)
}
FNR==NR{
sub(/^,/,"",$2)
arr[$1,$3]=(arr[$1,$3]?arr[$1,$3]",":"")$2
next
}
(($1,$3) in arr){
close(outputFile)
outputFile=$3
print $1"#"arr[$1,$3] >> (outputFile)
delete arr[$1,$3]
}
' Input_file Input_file
Explanation: a detailed explanation of the above.
awk ' ##Start the awk program.
{
sub(/^,/,"",$3) ##Substitute a leading comma in the 3rd field with null.
}
FNR==NR{ ##FNR==NR is true while the first pass over Input_file is read.
sub(/^,/,"",$2) ##Substitute a leading comma in the 2nd field with null.
arr[$1,$3]=(arr[$1,$3]?arr[$1,$3]",":"")$2
##Create arr indexed by the 1st and 3rd fields, accumulating the 2nd field as its value.
next ##next skips all further statements.
}
(($1,$3) in arr){ ##On the second pass: if the 1st and 3rd fields are in arr, then:
close(outputFile) ##Close the previous output file, to avoid a "too many open files" error.
outputFile=$3 ##Set outputFile to the 3rd field's value.
print $1"#"arr[$1,$3] >> (outputFile)
##Print 1st field # arr value, appending it to outputFile.
delete arr[$1,$3] ##Delete the array element so each pair is printed only once.
}
' Input_file Input_file ##The input file, named twice for the two passes.
You have several errors in your code. You can use the built-in read to split on a comma, and the parentheses are completely unnecessary.
while IFS=, read -r table variable file
do
echo "${table}#${variable}," >>"$file"
done< input.txt
Using $file in a redirect after done is an error; the shell wants to open the file handle to redirect to before file is defined. And as per your requirements, each line should go to a different file anyway.
Notice also quoting fixes and the omission of the useless cat.
Wrapping fields with the same value onto the same line would be comfortably easy with an Awk postprocessor, but then you might as well do all of this in Awk, as in the other answer you already received.

Condition on the Nth character of a string in the Mth column in bash

I have a sample
$ cat c.csv
a,1234543,c
b,1231456,d
c,1230654,e
I need to keep only the rows where the 4th character of the 2nd column is not 0 or 1.
Output must be
a,1234543,c
I only know this:
awk -F, 'BEGIN { OFS = FS } $2 ~/^[2-9]/' c.csv
Is it possible to put a condition on the 4th character?
Could you please try the following:
awk 'BEGIN{FS=","} substr($2,4,1)!=0 && substr($2,4,1)!=1' Input_file
OR as per Ed's suggestion:
awk 'BEGIN{FS=","} substr($2,4,1)!~/[01]/' Input_file
Explanation: a detailed explanation of the above code.
awk ' ##Start the awk program.
BEGIN{ ##Start the BEGIN section.
FS="," ##Set the field separator to comma.
} ##Close the BEGIN section of this program.
substr($2,4,1)!=0 && substr($2,4,1)!=1 ##If the 4th character of the 2nd field is NOT 0 and NOT 1, print the current line.
' Input_file ##The input file name.
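The substr() check can be confirmed against the sample:

```shell
cat > c.csv <<'EOF'
a,1234543,c
b,1231456,d
c,1230654,e
EOF

# Keep lines whose 2nd column's 4th character is neither 0 nor 1.
awk 'BEGIN{FS=","} substr($2,4,1)!=0 && substr($2,4,1)!=1' c.csv
```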
This might work for you (GNU sed or grep):
grep -vE '^([^,]*,){1}[^,]{3}[01]' file
or:
sed -E '/^([^,]*,){1}[^,]{3}[01]/d' file
Replace the 1 with m-1 for the mth column and the 3 with n-1 for the nth character in that column.
Grep is the answer.
But here is another way using an array and variable substitution:
test=( $(cat c.csv) ) # load c.csv data into an array
echo ${test[@]//*,???[0-1]*/} # print all items from the array,
# but remove the ones that match the pattern *,???[0-1]*
# so 'b,1231456,d' and 'c,1230654,e' from the example will be removed
# and only 'a,1234543,c' will be printed
There are many ways to do this with awk. The most literal form would be:
4th character of 2nd column is not 0 or 1
$ awk -F, '($2 !~ /^...[01]/)' file
$ awk -F, '($2 ~ /^...[^01]/)' file
These will also match a line like a,abcdefg,b
2nd column is an integer and 4th character is not 0 or 1
$ awk -F, '($2+0==$2) && ($2 !~ /[.]/) && ($2 !~ /^...[01]/)'
$ awk -F, '($2 ~ /^[0-9][0-9][0-9][^01][0-9]*$/)'
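Run against the sample, the stricter all-digits form keeps only the first row:

```shell
cat > c.csv <<'EOF'
a,1234543,c
b,1231456,d
c,1230654,e
EOF

# 2nd column must be all digits with a non-0/1 4th character.
awk -F, '($2 ~ /^[0-9][0-9][0-9][^01][0-9]*$/)' c.csv
```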

Ignore delimiters in quotes and excluding columns dynamically in csv file

I have an awk command to read a csv file with | as the separator. I am using this command as part of my shell script, where the columns to exclude will be removed from the output. The list of columns is input as 1 2 3.
Command Reference: http://wiki.bash-hackers.org/snipplets/awkcsv
awk -v FS='"| "|^"|"$' '{for i in $test; do $(echo $i=""); done print }' test.csv
$test is 1 2 3
I want to print $1="" $2="" $3="" before printing all the columns. I am getting this error:
awk: {for i in $test; do $(echo $i=""); done {print }
awk: ^ syntax error
This command works properly and prints all the columns:
awk -v FS='"| "|^"|"$' '{print }' test.csv
File 1
"first"| "second"| "last"
"fir|st"| "second"| "last"
"firtst one"| "sec|ond field"| "final|ly"
Expected output if I want to exclude columns 2 and 3 dynamically:
first
fir|st
firtst one
I need help getting the for loop right.
With GNU awk for FPAT:
$ awk -v FPAT='"[^"]+"' '{print $1}' file
"first"
"fir|st"
"firtst one"
$ awk -v flds='1' -v FPAT='"[^"]+"' 'BEGIN{n=split(flds,f,/ /)} {for (i=1;i<=n;i++) printf "%s%s", $(f[i]), (i<n?OFS:ORS)}' file
"first"
"fir|st"
"firtst one"
$ awk -v flds='2 3' -v FPAT='"[^"]+"' 'BEGIN{n=split(flds,f,/ /)} {for (i=1;i<=n;i++) printf "%s%s", $(f[i]), (i<n?OFS:ORS)}' file
"second" "last"
"second" "last"
"sec|ond field" "final|ly"
$ awk -v flds='3 1' -v FPAT='"[^"]+"' 'BEGIN{n=split(flds,f,/ /)} {for (i=1;i<=n;i++) printf "%s%s", $(f[i]), (i<n?OFS:ORS)}' file
"last" "first"
"last" "fir|st"
"final|ly" "firtst one"
If you don't want your output fields separated by a blank char then set OFS to whatever you do want with -v OFS='whatever'. If you want to get rid of the surrounding quotes you can use gensub() (since we're using gawk anyway) or substr() on every field, e.g.:
$ awk -v OFS=';' -v flds='1 3' -v FPAT='"[^"]+"' 'BEGIN{n=split(flds,f,/ /)} {for (i=1;i<=n;i++) printf "%s%s", substr($(f[i]),2,length($(f[i]))-2), (i<n?OFS:ORS)}' file
first;last
fir|st;last
firtst one;final|ly
$ awk -v OFS=';' -v flds='1 3' -v FPAT='"[^"]+"' 'BEGIN{n=split(flds,f,/ /)} {for (i=1;i<=n;i++) printf "%s%s", gensub(/"/,"","g",$(f[i])), (i<n?OFS:ORS)}' file
first;last
fir|st;last
firtst one;final|ly
In GNU awk (for FPAT):
$ test="2 3" # fields to exclude in bash var $test
$ awk -v t="$test" ' # taken to awk var t
BEGIN { # first
FPAT="([^|]+)|( *\"[^\"]+\")" # instead of FS, use FPAT
split(t,a," ") # process t to e:
for(i in a) # a[1]=2 -> e[2], etc.
e[a[i]]
}
{
for(i=1;i<=NF;i++) # for each field
if((i in e)==0) { # if field # not in e
gsub(/^\"|\"$/,"",$i) # remove leading and trailing "
b=b (b==""?"":OFS) $i # put to buffer b
}
print b; b=""                      # output and reset buffer
}' file
first
fir|st
firtst one
FPAT is used since FS can't handle a separator inside quotes.
Vikram, if your actual Input_file is exactly the same as the shown sample, then the following may help. (Tested with GNU awk 3.1.7, a little old version of awk.)
awk -v num="2,3" 'BEGIN{
len=split(num, val,",")
}
{while($0){
match($0,/.[^"]*/);
if(substr($0,RSTART,RLENGTH+1) && substr($0,RSTART,RLENGTH+1) !~ /\"\| \"/ && substr($0,RSTART,RLENGTH+1) !~ /^\"$/ && substr($0,RSTART,RLENGTH+1) !~ /^\" \"$/){
array[++i]=substr($0,RSTART,RLENGTH+1)
};
$0=substr($0,RLENGTH+1);
};
for(l=1;l<=len;l++){
delete array[val[l]]
};
for(j=1;j<=length(array);j++){
if(array[j]){
gsub(/^\"|\"$/,"",array[j]);
printf("%s%s",array[j],j==length(array)?"":" ")
}
};
print "";
i="";
delete array
}' Input_file
EDIT1: the same code with an explanation.
awk -v num="2,3" 'BEGIN{ ##Create a variable named num holding the comma-separated field numbers you want to nullify; start the BEGIN section.
len=split(num, val,",") ##Create an array named val by splitting num on commas, and a variable len holding the length of val.
}
{while($0){ ##Start a while loop which runs on the current line until that line becomes null.
match($0,/.[^"]*/);##match() matches from the current start of the line up to the next ".
if(substr($0,RSTART,RLENGTH+1) && substr($0,RSTART,RLENGTH+1) !~ /\"\| \"/ && substr($0,RSTART,RLENGTH+1) !~ /^\"$/ && substr($0,RSTART,RLENGTH+1) !~ /^\" \"$/){##RSTART and RLENGTH are set whenever the regex passed to match() finds a match. This if checks 1st: the substring from RSTART of length RLENGTH+1 is not null; 2nd: it does not contain " pipe space "; 3rd: it is not just a single "; 4th: it is not " space ". If all conditions are TRUE, then:
array[++i]=substr($0,RSTART,RLENGTH+1) ##Store the substring from RSTART of length RLENGTH+1 in array, at the next index i.
};
$0=substr($0,RLENGTH+1);##Remove the matched part from the current line, shrinking it so the while loop cannot become infinite.
};
for(l=1;l<=len;l++){##Once the while loop is done, run l from 1 to len.
delete array[val[l]] ##Delete the fields the OP wants removed.
};
for(j=1;j<=length(array);j++){##Run j from 1 to the length of array.
if(array[j]){ ##If the array value at index j is not null, then:
gsub(/^\"|\"$/,"",array[j]); ##Globally substitute leading and trailing " with null in the array value.
printf("%s%s",array[j],j==length(array)?"":" ") ##Print the array value followed by a space, or by nothing for the last element (we do not want a trailing space at the end of the line).
}
};
print ""; ##Because the printf above does not print a newline, print one.
i=""; ##Nullify variable i.
delete array ##Delete array.
}' Input_file ##The input file name.
