Get substring of regex from string - bash

I am using a bash script to parse logs and need to extract a substring.
My string looks something like this:
TestingmkJHSBD,MFV from testing:2.6.1.566-978.7 testing
How can I extract this string using bash?

Would you please try the following:
awk '
/FROM image/ {
if (match($0, /image[^[:space:]]+/))
print(substr($0, RSTART, RLENGTH))
}
' logfile
The regex image[^[:space:]]+ matches a substring that starts with
image and is followed by one or more non-space characters.
When match() succeeds, awk sets the variables RSTART and RLENGTH to the
position and the length of the matched substring.
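To make the roles of RSTART and RLENGTH concrete, here is a small sketch that prints both next to the extracted text. The function name extract_image is made up for illustration, and [^ \t] stands in for [^[:space:]] only so that the snippet also runs on older awks that lack POSIX character classes.

```shell
# extract_image is a hypothetical helper, not part of the answer above.
# It reads log lines on stdin and, for each "FROM image" line, prints
# RSTART, RLENGTH and the matched substring.
extract_image() {
    awk '
    /FROM image/ {
        # match() returns 0 when nothing matches; otherwise it sets
        # RSTART (1-based start position) and RLENGTH (match length).
        if (match($0, /image[^ \t]+/))
            print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
    }'
}

printf '%s\n' 'TestingmkJHSBD,MFV FROM image/something/docker:2.6.1.566-978.7 testing' |
    extract_image
```

For the sample line this reports a match starting at character 25 with a length of 38, followed by image/something/docker:2.6.1.566-978.7.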

Another awk option:
awk '{if ($0 ~ /FROM image/) {for (i=1; i<=NF; i++) if ($i ~ /^image/) {print $i} }}' <<<"TestingmkJHSBD,MFV FROM image/something/docker:2.6.1.566-978.7 testing"
Output:
image/something/docker:2.6.1.566-978.7
Or using different log string:
awk '{if ($0 ~ /FROM image/) {for (i=1; i<=NF; i++) if ($i ~ /^image/) {print $i} }}' <<<"bdkjf asfjkklsdfsg FROM image/something/docker:2.6.1.566-978.7 testing"
Output:
image/something/docker:2.6.1.566-978.7

Related

BASH How to get minimum value from each row

I have csv file like this:
-0.106992, -0.106992, -0.059528, -0.059528, -0.028184, -0.028184, 0.017793, 0.017793, 0.0, 0.220367
-0.094557, -0.094557, -0.063707, -0.063707, -0.020796, -0.020796, 0.003707, 0.003707, 0.200767, 0.200767
-0.106038, -0.106038, -0.056540, -0.056540, -0.015119, -0.015119, 0.032954, 0.032954, 0.237774, 0.237774
-0.049499, -0.049499, -0.006934, -0.006934, 0.026562, 0.026562, 0.067442, 0.067442, 0.260149, 0.260149
-0.081001, -0.081001, -0.039581, -0.039581, -0.008817, -0.008817, 0.029912, 0.029912, 0.222084, 0.222084
-0.046782, -0.046782, -0.000180, -0.000180, 0.030788, 0.030788, 0.075928, 0.075928, 0.266452, 0.266452
-0.082107, -0.082107, -0.026791, -0.026791, 0.001874, 0.001874, 0.052341, 0.052341, 0.249779, 0.249779
I want to get the minimum value from each row.
Expected output must be:
-0.106992
-0.094557
-0.106038
-0.049499
-0.081001
-0.046782
-0.082107
I tried to get it with awk, but awk doesn't give the minimum values:
awk command:
awk '{m=$1; for (i=2; i<=NF; i++) if ($i < m) m = $i; print m}' file_name
output:
-0.028184,
-0.020796,
-0.015119,
-0.006934,
-0.008817,
-0.000180,
-0.026791,
Perl makes short work of this:
perl -MList::Util=min -F', ' -E 'say min @F' file.csv
-0.106992
-0.094557
-0.106038
-0.049499
-0.081001
-0.046782
-0.082107
Using any awk in any shell on every Unix box whether you have blanks after each comma or not:
$ awk -F', *' '{min=$1; for (i=2;i<=NF;i++) if ($i<min) min=$i; print min}' file
-0.106992
-0.094557
-0.106038
-0.049499
-0.081001
-0.046782
-0.082107
with ruby :-D
ruby -F', ' -ane 'puts $F.map(&:to_f).min' file.csv
Your code is correct:
awk '{m=$1; for (i=2; i<=NF; i++) if ($i < m) m = $i; print m}' file_name
Except that you must add the comma to the field separator:
awk -F '[[:blank:],]+' '{m=$1; for (i=2; i<=NF; i++) if ($i < m) m = $i; print m}' file_name
[[:blank:],]+ matches one or more spaces, tabs, or commas, so the ", " between two values counts as a single separator instead of producing empty fields.
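The reason the original attempt printed the wrong values is that, with the default separator, every field keeps its trailing comma, so awk no longer sees numbers and compares the fields as strings. A minimal sketch of that difference follows; the helper name smaller_of_first_two and the three-value sample line are made up for illustration.

```shell
# smaller_of_first_two is a hypothetical demo helper: it prints whichever
# of the first two fields awk considers smaller.
# $1 (shell argument) is the field separator; a single space selects
# the default whitespace splitting of awk.
smaller_of_first_two() {
    awk -F"$1" '{ print (($2 < $1) ? $2 : $1) }'
}

line='-0.106992, -0.028184, 0.017793'

# Default splitting: fields are "-0.106992," and "-0.028184," (strings),
# so "-0.0..." sorts before "-0.1..." character by character.
printf '%s\n' "$line" | smaller_of_first_two ' '

# Splitting on ", *": fields look like numbers and are compared numerically.
printf '%s\n' "$line" | smaller_of_first_two ', *'
```

The first call prints -0.028184, (comma included) and the second prints -0.106992, which is the behaviour seen in the question and in the corrected answers respectively.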

Error with awk (newline or end of string)

I am having an issue with the following command:
awk ‘{if ($1 ~ /^##contig/) {next}else if ($1 ~ /^#/) {print $0; next}else {print $0 | “sort -k1,1V -k2,2n”}’ file.vcf > out.vcf
It gives the following error:
^ unexpected newline or end of string
Your command contains "fancy quotes" instead of normal ones, in addition to a missing }.
awk '{if ($1 ~ /^##contig/) {next} else if ($1 ~ /^#/) {print $0; next} else {print $0 | "sort -k1,1V -k2,2n"} }' file.vcf > out.vcf
Changing your command to the above should work as expected.
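If a command was pasted from a word processor or a web page, the typographic quotes can be normalized before running it. The following is only a sketch under that assumption: straighten_quotes is a hypothetical helper name, and it fixes the quotes only, not the missing }.

```shell
# straighten_quotes is a hypothetical helper: it reads text on stdin and
# replaces typographic single and double quotes with plain ASCII ones.
# Each quote gets its own substitution so the multi-byte characters are
# matched as whole sequences regardless of locale.
straighten_quotes() {
    sed -e "s/‘/'/g" -e "s/’/'/g" -e 's/“/"/g' -e 's/”/"/g'
}

printf '%s\n' 'awk ‘{print “hi”}’ file' | straighten_quotes
```

The sample line comes out as awk '{print "hi"}' file, which the shell and awk can parse.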

print multiple fields if multiple pattern matches

I have a comma delimited file like below
0,category=a,type=b,value=1
1,category=c,type=b,.....,original_value=0
2,category=b,type=c,....,original_value=1,....,corrected_value=3
A line in the file can contain
(1)only 'value'
(2)only 'original_value'
(3)both 'original_value' and 'corrected_value'
The values can be in any column.
The following awk command I wrote can only print one field after pattern match.
cat file | awk -F, 'BEGIN{OFS=","} /value/ { for (x=1;x<=NF;x++) if ($x~"value") {print $2,$3,$(x)} }' | sort -u
Current Output:
category=a,type=b,value=1
category=b,type=c,corrected_value=3
category=b,type=c,original_value=1
category=c,type=b,original_value=0
How do I print two fields (columns) of a line if two pattern matches occur? In this case, if both original_value and corrected_value exist.
Expected Output:
category=a,type=b,value=1
category=b,type=c,original_value=1,corrected_value=3
category=c,type=b,original_value=0
Bash Version: 4.3.11
You can use this awk command:
awk 'BEGIN{FS=OFS=","} {printf "%s%s%s", $2,OFS,$3; for(i=4; i<=NF; i++)
if ($i ~ /value/) printf "%s%s", OFS,$i; print ""}' file
category=a,type=b,value=1
category=c,type=b,original_value=0
category=b,type=c,original_value=1,corrected_value=3
Similar to @anubhava's answer, but does not rely on the category or type being in a particular column:
awk -F, '
    BEGIN { pattern = "^(category|type|value|original_value|corrected_value)" }
    {
        sep = ""
        for (i=1; i<=NF; i++) {
            if ($i ~ pattern) {
                printf "%s%s", sep, $i
                sep = ","
            }
        }
        print ""
    }
' file

Bash: remove words from string containing numbers

In bash, how can I rename a string by deleting every word that contains a number?
name_befor_proc="art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061.jpg"
result:
name_after_proc="art-of-medusa.jpg"
With sed, remove every piece between hyphens that contains a number:
sed 's/[^-]*[0-9][^-\.]*-\{0,1\}//g;s/-\././' test
art-of-medusa.jpg
There may be no generic solution, but the following Python script handles this particular case:
name = "art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061.jpg"
ext = name.split(".")[1]

def has_no_digit(word):
    # True when the word contains no digit, i.e. the word should be kept
    return not any(ch.isdigit() for ch in word)

final = "-".join(word for word in name.split("-") if has_no_digit(word))
# The last word carries the extension; re-attach it if that word was dropped
if not final.endswith("." + ext):
    final += "." + ext
print(final)
output:
art-of-medusa.jpg
It is not trivial!
awk -F"." -v sep="-" '
{n=split($1,a,sep)
for (i=1; i<=n; i++)
{if (a[i] ~ /[0-9]/) delete a[i]}
n=length(a)
for (i in a)
printf "%s%s", a[i], (++c<n?sep:"")
printf "%s%s\n", FS, $2}'
Split the string (up to the dot) and loop through the pieces. If one contains a digit, remove it. Then, rejoin the array and print accordingly.
Test
$ awk -F"." -v sep="-" '{n=split($1,a,sep); for (i=1; i<=n; i++) {if (a[i] ~ /[0-9]/) delete a[i]}; n=length(a); for (i in a) printf "%s%s", a[i], (++c<n?sep:""); printf "%s%s\n", FS, $2}' <<< "art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061.jpg"
art-of-medusa.jpg
Testing with "art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061-a-23-b.jpg" to make sure other words are also matched:
$ awk -F"." -v sep="-" '{n=split($1,a,sep); for (i=1; i<=n; i++) {if (a[i] ~ /[0-9]/) delete a[i]}; n=length(a); for (i in a) printf "%s%s", a[i], (++c<n?sep:""); printf "%s%s\n", FS, $2}' <<< "art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061-a-23-b.jpg"
art-of-medusa-a-b.jpg
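The awk answer above rejoins the pieces with for (i in a), whose traversal order awk leaves unspecified, and it never resets the counter c between input lines. A sketch of the same split, filter and rejoin idea that walks the pieces by index instead is shown below. The function name strip_numbered_words is made up, and like the original it assumes the name contains exactly one dot.

```shell
# strip_numbered_words is a hypothetical helper: it reads file names on
# stdin and drops every hyphen-separated piece that contains a digit.
# Assumption: each name has exactly one dot, before the extension.
strip_numbered_words() {
    awk -F'[.]' -v sep="-" '{
        n = split($1, a, sep)      # pieces of the part before the dot
        out = ""
        kept = 0                   # reset for every input line
        for (i = 1; i <= n; i++)   # numeric loop keeps the original order
            if (a[i] !~ /[0-9]/)   # keep only pieces without a digit
                out = out (kept++ ? sep : "") a[i]
        print out "." $2
    }'
}

printf '%s\n' 'art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061-a-23-b.jpg' |
    strip_numbered_words
```

For the sample name this prints art-of-medusa-a-b.jpg, matching the second test above.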
You can use gnu-awk for this:
s="art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061.jpg"
name_after_proc=$(awk -v RS='[.-]' '!/[[:digit:]]/{printf "%s%s", r, $1} {r=RT}' <<< "$s")
echo "$name_after_proc"
art-of-medusa.jpg
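Two other possibilities, which work on the words in a file and leave the separators to a later step:
Using sed (replaces every word that contains a digit with a space):
sed 's/[a-zA-Z0-9]*[0-9][a-zA-Z0-9]*/ /g' filename
Using grep (prints only the purely alphabetic words, which xargs then joins onto one line):
grep -woE '[a-zA-Z]+' filename | xargs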

How to print a pattern using AWK?

I need to find the words in a file that match a regex pattern.
So if a line contains:
00:10:20,918 I [AbstractAction.java] - register | 0.0.0.0 | {GW_CHANNEL=AA, PWD=********, ID=777777, GW_USER=BB, NUM=3996, SYSTEM_USER=OS, LOGIC_ID=0}
awk -F' ' '{for(i=1;i<=NF;i++){ if($i ~ /GW_USER/ && /GW_CHANNEL/){print $5 " " $i} } }'
It prints only:
register GW_USER=BB
I want to get:
register GW_USER=BB GW_CHANNEL=AA
How can I print both the GW_USER and GW_CHANNEL columns?
Your if condition isn't right; you can use regex alternation instead:
awk '{for(i=1;i<=NF;i++){ if($i ~ /GW_USER|GW_CHANNEL/) print $5, $i } }' file
There is no need for -F' ' or for " " in print: whitespace is the default field separator, and a comma in print inserts the default output separator, a single space.
Your condition:
if($i ~ /GW_USER/ && /GW_CHANNEL/)
matches GW_USER against $i but matches GW_CHANNEL against the whole line ($0).
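Because the line is split on whitespace, each matching field still carries its surrounding punctuation (for example {GW_CHANNEL=AA,), and printing inside the loop puts every match on its own line. A sketch of a variation that collects the matches on one line and strips the braces and commas is shown below. The function name print_gw_fields is made up, and the pairs come out in the order they appear in the line, not in a chosen order.

```shell
# print_gw_fields is a hypothetical helper: for each line on stdin it
# prints field 5 followed by every GW_USER=... or GW_CHANNEL=... field.
print_gw_fields() {
    awk '{
        out = $5
        for (i = 1; i <= NF; i++)
            if ($i ~ /GW_USER=|GW_CHANNEL=/) {
                f = $i
                gsub(/[{},]/, "", f)   # drop braces and commas around the pair
                out = out " " f
            }
        print out
    }'
}

printf '%s\n' '00:10:20,918 I [AbstractAction.java] - register | 0.0.0.0 | {GW_CHANNEL=AA, PWD=********, ID=777777, GW_USER=BB, NUM=3996, SYSTEM_USER=OS, LOGIC_ID=0}' |
    print_gw_fields
```

For the sample line this prints register GW_CHANNEL=AA GW_USER=BB. Getting GW_USER first requires looking the values up by name.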
Whenever you have name=value pairs in your input, it's a good idea to create an array that maps the names to the values and then print by name:
$ cat tst.awk
match($0,/{[^}]+/) {
    str = substr($0,RSTART+1,RLENGTH-1)
    split(str,arr,/[ ,=]+/)
    delete n2v
    for (i=1; i in arr; i+=2) {
        n2v[arr[i]] = arr[i+1]
    }
    print $5, fmt("GW_USER"), fmt("GW_CHANNEL")
}
function fmt(name) { return (name "=" n2v[name]) }
$
$ awk -f tst.awk file
register GW_USER=BB GW_CHANNEL=AA
That way you can trivially print, or do anything else you want with, any other field in the future.
