Get substring of regex from string

Get substring of regex from string - bash

I am using a bash script to parse logs and need to extract a substring.
My string is something like this -
TestingmkJHSBD,MFV from testing:2.6.1.566-978.7 testing
How can I extract this string using bash

Would you please try the following:
awk '
/FROM image/ {
if (match($0, /image[^[:space:]]+/))
print(substr($0, RSTART, RLENGTH))
}
' logfile
The regex image[^[:space:]]+ matches a substring which starts with
image and followed by non-space character(s).
Then the awk variables RSTART and RLENGTH are assigned to the position
and the length of the matched substring.

Another awk option:
awk '{if ($0 ~ /FROM image/) {for (i=1; i<=NF; i++) if ($i ~ /^image/) {print $i} }}' <<<"TestingmkJHSBD,MFV FROM image/something/docker:2.6.1.566-978.7 testing"
Output:
image/something/docker:2.6.1.566-978.7
Or using different log string:
awk '{if ($0 ~ /FROM image/) {for (i=1; i<=NF; i++) if ($i ~ /^image/) {print $i} }}' <<<"bdkjf asfjkklsdfsg FROM image/something/docker:2.6.1.566-978.7 testing"
Output:
image/something/docker:2.6.1.566-978.7

Related

BASH How to get minimum value from each row

I have csv file like this:
-0.106992, -0.106992, -0.059528, -0.059528, -0.028184, -0.028184, 0.017793, 0.017793, 0.0, 0.220367
-0.094557, -0.094557, -0.063707, -0.063707, -0.020796, -0.020796, 0.003707, 0.003707, 0.200767, 0.200767
-0.106038, -0.106038, -0.056540, -0.056540, -0.015119, -0.015119, 0.032954, 0.032954, 0.237774, 0.237774
-0.049499, -0.049499, -0.006934, -0.006934, 0.026562, 0.026562, 0.067442, 0.067442, 0.260149, 0.260149
-0.081001, -0.081001, -0.039581, -0.039581, -0.008817, -0.008817, 0.029912, 0.029912, 0.222084, 0.222084
-0.046782, -0.046782, -0.000180, -0.000180, 0.030788, 0.030788, 0.075928, 0.075928, 0.266452, 0.266452
-0.082107, -0.082107, -0.026791, -0.026791, 0.001874, 0.001874, 0.052341, 0.052341, 0.249779, 0.249779
enter image description here
I want to get the minimum value from each row.
Expected output must be:
-0.106992
-0.094557
-0.106038
-0.049499
-0.08100
-0.046782
-0.082107
I tried get it by awk but awk doesn't give minimum values:
awk command:
awk '{m=$1; for (i=2; i<=NF; i++) if ($i < m) m = $i; print m}' file_name
output:
-0.028184,
-0.020796,
-0.015119,
-0.006934,
-0.008817,
-0.000180,
-0.026791,

Perl makes short work of this:
perl -MList::Util=min -F', ' -E 'say min #F' file.csv
-0.106992
-0.094557
-0.106038
-0.049499
-0.081001
-0.046782
-0.082107

Using any awk in any shell on every Unix box whether you have blanks after each comma or not:
$ awk -F', *' '{min=$1; for (i=2;i<=NF;i++) if ($i<min) min=$i; print min}' file
-0.106992
-0.094557
-0.106038
-0.049499
-0.081001
-0.046782
-0.082107

with ruby :-D
ruby -F', ' -ane 'puts $F.map(&:to_f).min' file.csv

Your code is correct:
awk '{m=$1; for (i=2; i<=NF; i++) if ($i < m) m = $i; print m}' file_name
Except that you must add a comma to the field separator:
awk -F '[[:blank:],]' '{m=$1; for (i=2; i<=NF; i++) if ($i < m) m = $i; print m}' file_name
[[:blank:],] is spaces, tabs, and commas.

Error with awk (newline or end of string)

I am having an issue with the following command:
awk ‘{if ($1 ~ /^##contig/) {next}else if ($1 ~ /^#/) {print $0; next}else {print $0 | “sort -k1,1V -k2,2n”}’ file.vcf > out.vcf
It gives the following error:
^ unexpected newline or end of string

Your command contains "fancy quotes" instead of normal ones, in addition to a missing }.
awk '{if ($1 ~ /^##contig/) {next} else if ($1 ~ /^#/) {print $0; next} else {print $0 | "sort -k1,1V -k2,2n"} }' file.vcf > out.vcf
Changing your command to the above should work as expected.

print multiple fields if multiple pattern matches

I have a comma delimited file like below
0,category=a,type=b,value=1
1,category=c,type=b,.....,original_value=0
2,category=b,type=c,....,original_value=1,....,corrected_value=3
A line in the file can contain
(1)only 'value'
(2)only 'original_value'
(3)both 'original value' and 'corrected_value'
The values can be in any column.
The following awk command I wrote can only print one field after pattern match.
cat file | awk -F, 'BEGIN{OFS=","} /value/ { for (x=1;x<=NF;x++) if ($x~"value") {print $2,$3,$(x)} }' | sort -u
Current Output:
category=a,type=b,value=1
category=b,type=c,corrected_value=3
category=b,type=c,original_value=1
category=c,type=b,original_value=0
How do I print two fields (columns) of a line if two pattern matches occur? In this case, if both original_value and corrected_value exist.
Expected Output:
category=a,type=b,value=1
category=b,type=c,original_value=1,corrected_value=3
category=c,type=b,original_value=0
Bash Version: 4.3.11

You can use this awk command:
awk 'BEGIN{FS=OFS=","} {printf "%s%s%s", $2,OFS,$3; for(i=4; i<=NF; i++)
if ($i ~ /value/) printf "%s%s", OFS,$i; print ""}' file
category=a,type=b,value=1
category=c,type=b,original_value=0
category=b,type=c,original_value=1,corrected_value=3

Similar to #anubhava's answer, but does not rely on the category or type being in a particular column:
awk -F, '
BEGIN { pattern = "^(category|type|value|original_value|corrected_value)" }
{
sep = ""
for (i=1; i<=NF; i++) {
if ($i ~ pattern) {
printf "%s%s", sep, $i
sep = ","
}
}
print ""
}
' file

Bash: remove words from string containing numbers

In bash how to perform a string rename deleting all words that contains a number:
name_befor_proc="art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061.jpg"
result:
name_after_proc="art-of-medusa.jpg"

In sed, remove everything between - that contains a number.
sed 's/[^-]*[0-9][^-\.]*-\{0,1\}//g;s/-\././' test
art-of-medusa.jpg

I guess there is no generic solution, also you can use the following python script for your particular use case
name = "art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061.jpg"
ext = name.split(".")[1]
def contains_number(word):
for i in "0123456789":
if i in word:
return False
return True
final = '-'.join([word for word in name.split('-') if contains_number(word)])
if ext not in final:
final += "."+ext
print final
output:
art-of-medusa.jpg

It is not trivial!
awk -F"." -v sep="-" '
{n=split($1,a,sep)
for (i=1; i<=n; i++)
{if (a[i] ~ /[0-9]/) delete a[i]}
n=length(a)
for (i in a)
printf "%s%s", a[i], (++c<n?sep:"")
printf "%s%s\n", FS, $2}'
Split the string (up to the dot) and loop through the pieces. If one contains a digit, remove it. Then, rejoin the array and print accordingly.
Test
$ awk -F"." -v sep="-" '{n=split($1,a,sep); for (i=1; i<=n; i++) {if (a[i] ~ /[0-9]/) delete a[i]}; n=length(a); for (i in a) printf "%s%s", a[i], (++c<n?sep:""); printf "%s%s\n", FS, $2}' <<< "art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061.jpg"
art-of-medusa.jpg
Testing with "art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061-a-23-b.jpg" to make sure other words are also matched:
$ awk -F"." -v sep="-" '{n=split($1,a,sep); for (i=1; i<=n; i++) {if (a[i] ~ /[0-9]/) delete a[i]}; n=length(a); for (i in a) printf "%s%s", a[i], (++c<n?sep:""); printf "%s%s\n", FS, $2}' <<< "art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061-a-23-b.jpg"
art-of-medusa-a-b.jpg

You can use gnu-awk for this:
s="art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061.jpg"
name_after_proc=$(awk -v RS='[.-]' '!/[[:digit:]]/{printf r $1} {r=RT}' <<< "$s")
echo "$name_after_proc"
art-of-medusa.jpg

Two possible solutions:
Using Sed:
sed 's/[a-zA-Z0-9]*[0-9][a-zA-Z0-9]*/ /g' filename
Using grep:
grep -wo -E [a-zA-Z]+ foo | xargs filename

How to print a pattern using AWK?

I need to find in file word that matches regex pattern.
So if in line, i have:
00:10:20,918 I [AbstractAction.java] - register | 0.0.0.0 | {GW_CHANNEL=AA, PWD=********, ID=777777, GW_USER=BB, NUM=3996, SYSTEM_USER=OS, LOGIC_ID=0}
awk -F' ' '{for(i=1;i<=NF;i++){ if($i ~ /GW_USER/ && /GW_CHANNEL/){print $5 " " $i} } }'
Print only:
register GW_USER=BB
I wonna get:
register GW_USER=BB GW_CHANNEL=AA
How to print GW_USER and GW_CHANNEL columns?

Your if condition isn't looking right, you can use regex alternation:
awk '{for(i=1;i<=NF;i++){ if($i ~ /GW_USER|GW_CHANNEL/) print $5, $i } }' file
There is no need to use -F" " and " " in print as that is default field separator.
Your condition:
if($i ~ /GW_USER/ && /GW_CHANNEL/)
Will match FW_USER against $i but will match GW_CHANNEL in whole line.

Whenever you have name=value pairs in your input, it's a good idea to create an array that maps the names to the values and then print by name:
$ cat tst.awk
match($0,/{[^}]+/) {
str = substr($0,RSTART+1,RLENGTH-1)
split(str,arr,/[ ,=]+/)
delete n2v
for (i=1; i in arr; i+=2) {
n2v[arr[i]] = arr[i+1]
}
print $5, fmt("GW_USER"), fmt("GW_CHANNEL")
}
function fmt(name) { return (name "=" n2v[name]) }
$
$ awk -f tst.awk file
register GW_USER=BB GW_CHANNEL=AA
that way you trivially print or do anything else you want with any other field in future.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

Get substring of regex from string - bash

I am using a bash script to parse logs and need to extract a substring. My string is something like this - TestingmkJHSBD,MFV from testing:2.6.1.566-978.7 testing How can I extract this string using bash

Related

BASH How to get minimum value from each row

Error with awk (newline or end of string)

print multiple fields if multiple pattern matches

Bash: remove words from string containing numbers

How to print a pattern using AWK?

Categories

Resources