Bash: remove words from string containing numbers - bash

In bash how to perform a string rename deleting all words that contains a number:
name_befor_proc="art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061.jpg"
result:
name_after_proc="art-of-medusa.jpg"

In sed, remove everything between - that contains a number.
sed 's/[^-]*[0-9][^-\.]*-\{0,1\}//g;s/-\././' test
art-of-medusa.jpg

I guess there is no generic solution, also you can use the following python script for your particular use case
name = "art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061.jpg"
ext = name.split(".")[1]
def contains_number(word):
for i in "0123456789":
if i in word:
return False
return True
final = '-'.join([word for word in name.split('-') if contains_number(word)])
if ext not in final:
final += "."+ext
print final
output:
art-of-medusa.jpg

It is not trivial!
awk -F"." -v sep="-" '
{n=split($1,a,sep)
for (i=1; i<=n; i++)
{if (a[i] ~ /[0-9]/) delete a[i]}
n=length(a)
for (i in a)
printf "%s%s", a[i], (++c<n?sep:"")
printf "%s%s\n", FS, $2}'
Split the string (up to the dot) and loop through the pieces. If one contains a digit, remove it. Then, rejoin the array and print accordingly.
Test
$ awk -F"." -v sep="-" '{n=split($1,a,sep); for (i=1; i<=n; i++) {if (a[i] ~ /[0-9]/) delete a[i]}; n=length(a); for (i in a) printf "%s%s", a[i], (++c<n?sep:""); printf "%s%s\n", FS, $2}' <<< "art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061.jpg"
art-of-medusa.jpg
Testing with "art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061-a-23-b.jpg" to make sure other words are also matched:
$ awk -F"." -v sep="-" '{n=split($1,a,sep); for (i=1; i<=n; i++) {if (a[i] ~ /[0-9]/) delete a[i]}; n=length(a); for (i in a) printf "%s%s", a[i], (++c<n?sep:""); printf "%s%s\n", FS, $2}' <<< "art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061-a-23-b.jpg"
art-of-medusa-a-b.jpg

You can use gnu-awk for this:
s="art-of-medusa-feefacc0-c75e-4846-9ccf-7463d5944061.jpg"
name_after_proc=$(awk -v RS='[.-]' '!/[[:digit:]]/{printf r $1} {r=RT}' <<< "$s")
echo "$name_after_proc"
art-of-medusa.jpg

Two possible solutions:
Using Sed:
sed 's/[a-zA-Z0-9]*[0-9][a-zA-Z0-9]*/ /g' filename
Using grep:
grep -wo -E [a-zA-Z]+ foo | xargs filename

Related

Get substring of regex from string

I am using a bash script to parse logs and need to extract a substring.
My string is something like this -
TestingmkJHSBD,MFV from testing:2.6.1.566-978.7 testing
How can I extract this string using bash
Would you please try the following:
awk '
/FROM image/ {
if (match($0, /image[^[:space:]]+/))
print(substr($0, RSTART, RLENGTH))
}
' logfile
The regex image[^[:space:]]+ matches a substring which starts with
image and followed by non-space character(s).
Then the awk variables RSTART and RLENGTH are assigned to the position
and the length of the matched substring.
Another awk option:
awk '{if ($0 ~ /FROM image/) {for (i=1; i<=NF; i++) if ($i ~ /^image/) {print $i} }}' <<<"TestingmkJHSBD,MFV FROM image/something/docker:2.6.1.566-978.7 testing"
Output:
image/something/docker:2.6.1.566-978.7
Or using different log string:
awk '{if ($0 ~ /FROM image/) {for (i=1; i<=NF; i++) if ($i ~ /^image/) {print $i} }}' <<<"bdkjf asfjkklsdfsg FROM image/something/docker:2.6.1.566-978.7 testing"
Output:
image/something/docker:2.6.1.566-978.7

BASH How to get minimum value from each row

I have csv file like this:
-0.106992, -0.106992, -0.059528, -0.059528, -0.028184, -0.028184, 0.017793, 0.017793, 0.0, 0.220367
-0.094557, -0.094557, -0.063707, -0.063707, -0.020796, -0.020796, 0.003707, 0.003707, 0.200767, 0.200767
-0.106038, -0.106038, -0.056540, -0.056540, -0.015119, -0.015119, 0.032954, 0.032954, 0.237774, 0.237774
-0.049499, -0.049499, -0.006934, -0.006934, 0.026562, 0.026562, 0.067442, 0.067442, 0.260149, 0.260149
-0.081001, -0.081001, -0.039581, -0.039581, -0.008817, -0.008817, 0.029912, 0.029912, 0.222084, 0.222084
-0.046782, -0.046782, -0.000180, -0.000180, 0.030788, 0.030788, 0.075928, 0.075928, 0.266452, 0.266452
-0.082107, -0.082107, -0.026791, -0.026791, 0.001874, 0.001874, 0.052341, 0.052341, 0.249779, 0.249779
enter image description here
I want to get the minimum value from each row.
Expected output must be:
-0.106992
-0.094557
-0.106038
-0.049499
-0.08100
-0.046782
-0.082107
I tried get it by awk but awk doesn't give minimum values:
awk command:
awk '{m=$1; for (i=2; i<=NF; i++) if ($i < m) m = $i; print m}' file_name
output:
-0.028184,
-0.020796,
-0.015119,
-0.006934,
-0.008817,
-0.000180,
-0.026791,
Perl makes short work of this:
perl -MList::Util=min -F', ' -E 'say min #F' file.csv
-0.106992
-0.094557
-0.106038
-0.049499
-0.081001
-0.046782
-0.082107
Using any awk in any shell on every Unix box whether you have blanks after each comma or not:
$ awk -F', *' '{min=$1; for (i=2;i<=NF;i++) if ($i<min) min=$i; print min}' file
-0.106992
-0.094557
-0.106038
-0.049499
-0.081001
-0.046782
-0.082107
with ruby :-D
ruby -F', ' -ane 'puts $F.map(&:to_f).min' file.csv
Your code is correct:
awk '{m=$1; for (i=2; i<=NF; i++) if ($i < m) m = $i; print m}' file_name
Except that you must add a comma to the field separator:
awk -F '[[:blank:],]' '{m=$1; for (i=2; i<=NF; i++) if ($i < m) m = $i; print m}' file_name
[[:blank:],] is spaces, tabs, and commas.

How to grep for multiple word occurrences from multiple files and list them grouped as rows and columns

Hello: Need your help to count word occurrences from multiple files and output them as row and columns. I searched the site for a similar reference but could not locate, hence posting it here.
Setup:
I have 2 files with the following
[a.log]
id,status
1,new
2,old
3,old
4,old
5,old
[b.log]
id,status
1,new
2,old
3,new
4,old
5,new
Results required
The result i require using the command line only is (preferably):
file count(new) count(old)
a.log 1 4
b.log 3 2
Script
The script below provides me the count for a single word across multiple.
I am stuck trying to get results for multiple words. Please help.
grep -cw "old" *.log
You can get this output using gnu-awk that accepts comma separated word to be searched in a command line argument:
awk -v OFS='\t' -F, -v wrds='new,old' 'BEGIN{n=split(wrds, a, /,/); for(i=1; i<=n; i++) b[a[i]]=a[i]} FNR==1{next} $2 in b{freq[FILENAME][$2]++} END{printf "%s", "file" OFS; for(i=1; i<=n; i++) printf "count(%s)%s", a[i], (i==n?ORS:OFS); for(f in freq) {printf "%s", f OFS; for(i=1; i<=n; i++) printf "%s%s", freq[f][a[i]], (i==n?ORS:OFS)}}' a.log b.log | column -t
Output:
file count(new) count(old)
a.log 1 4
b.log 3 2
PS: column -t was only used for formatting the output in tabular format.
Readable awk:
awk -v OFS='\t' -F, -v wrds='new,old' 'BEGIN {
n = split(wrds, a, /,/) # split input words list by comma with int index
for(i=1; i<=n; i++) # store words in another array with key as words
b[a[i]]=a[i]
}
FNR==1 {
next # skip first row from all the files
}
$2 in b {
freq[FILENAME][$2]++ # store filename and word frequency in 2-dimesional array
}
END { # print formatted result
printf "%s", "file" OFS
for(i=1; i<=n; i++)
printf "count(%s)%s", a[i], (i==n?ORS:OFS)
for(f in freq) {
printf "%s", f OFS
for(i=1; i<=n; i++)
printf "%s%s", freq[f][a[i]], (i==n?ORS:OFS)
}
}' a.log b.log
I think you're looking for something like this, but it's not overly clear what your objectives are (if you're going for efficiency for example, this isn't overly efficient)...
for file in *.log; do
echo -n "${file}\t"
for word in "new" "old"; do
grep -cw $word $file;
echo -n "\t";
done
echo;
done
(for readability, I simplified the first line, but this doesn't work if there's spaces in the filenames -- the proper solution is to change the first line to read find . -iname "*.log" -maxdepth=1 | while read file; do)
for c in a b ; do egrep -o "new|old" $c.log | sort | uniq -c > $c.luc; done
Get rid of the headlines with grep, then sort and count.
join -1 2 -2 2 a.luc b.luc
> new 1 3
> old 4 2
Placing a new header is left as an exercise for the reader. Is there a flip command for unix/linux/bash to flip a table, or how would you say?
Handling empty cells is left as an exercise too, but possible with join.
without real multi-dim array support, this will count all values in field 2, not just "new/old". The header and number of columns are dynamic with number of distinct values as well.
$ awk -F, 'NR==1 {fs["file"]}
FNR>1 {c[FILENAME,$2]++; fs[FILENAME]; ks[$2];
c["file",$2]="count("$2")"}
END {for(f in fs)
{printf "%s", f;
for(k in ks) printf "%s", OFS c[f,k];
printf "\n"}}' file{1,2} | column -t
file count(new) count(old)
file1 1 4
file2 3 2
Awk solution:
awk 'BEGIN{
FS=","; OFS="\t"; print "file","count(new)","count(old)";
f1=ARGV[1]; f2=ARGV[2] # get filenames
}
FNR==1{ next } # skip the 1st header line
NR==FNR{ c1[$2]++; next } # accumulate occurrences of the 2nd field in 1st file
{ c2[$2]++ } # accumulate occurrences of the 2nd field in 2nd file
END{
print f1, c1["new"], c1["old"];
print f2, c2["new"], c2["old"]
}' a.log b.log
The output:
file count(new) count(old)
a.log 1 4
b.log 3 2

print multiple fields if multiple pattern matches

I have a comma delimited file like below
0,category=a,type=b,value=1
1,category=c,type=b,.....,original_value=0
2,category=b,type=c,....,original_value=1,....,corrected_value=3
A line in the file can contain
(1)only 'value'
(2)only 'original_value'
(3)both 'original value' and 'corrected_value'
The values can be in any column.
The following awk command I wrote can only print one field after pattern match.
cat file | awk -F, 'BEGIN{OFS=","} /value/ { for (x=1;x<=NF;x++) if ($x~"value") {print $2,$3,$(x)} }' | sort -u
Current Output:
category=a,type=b,value=1
category=b,type=c,corrected_value=3
category=b,type=c,original_value=1
category=c,type=b,original_value=0
How do I print two fields (columns) of a line if two pattern matches occur? In this case, if both original_value and corrected_value exist.
Expected Output:
category=a,type=b,value=1
category=b,type=c,original_value=1,corrected_value=3
category=c,type=b,original_value=0
Bash Version: 4.3.11
You can use this awk command:
awk 'BEGIN{FS=OFS=","} {printf "%s%s%s", $2,OFS,$3; for(i=4; i<=NF; i++)
if ($i ~ /value/) printf "%s%s", OFS,$i; print ""}' file
category=a,type=b,value=1
category=c,type=b,original_value=0
category=b,type=c,original_value=1,corrected_value=3
Similar to #anubhava's answer, but does not rely on the category or type being in a particular column:
awk -F, '
BEGIN { pattern = "^(category|type|value|original_value|corrected_value)" }
{
sep = ""
for (i=1; i<=NF; i++) {
if ($i ~ pattern) {
printf "%s%s", sep, $i
sep = ","
}
}
print ""
}
' file

Edit text format with shell script

I am trying to make a script for text editing. In this case I have a text file named text.csv, which reads:
first;48548a;48954a,48594B
second;58757a;5875b
third;58756a;58576b;5867d;56894d;45864a
I want to make text format to like this:
first;48548a
first;48954a
first;48594B
second;58757a
second;5875b
third;58756a
third;58576b
third;5867d
third;56894d
third;45864a
What is command should I use to make this happen?
I'd do this in awk.
Assuming your first line should have a ; instead of a ,:
$ awk -F\; '{for(n=2; n<=NF; n++) { printf("%s;%s\n",$1,$n); }}' input.txt
Untested.
Here is a pure bash solution that handles both , and ;.
while IFS=';,' read -a data; do
id="${data[0]}"
data=("${data[#]:1}")
for item in "${data[#]}"; do
printf '%s;%s\n' "$id" "$item"
done
done < input.txt
UPDATED - alternate printing method based on chepner's suggestion:
while IFS=';,' read -a data; do
id="${data[0]}"
data=("${data[#]:1}")
printf "$id;%s\n" "${data[#]}"
done < input.txt
awk -v FS=';' -v OFS=';' '{for (i = 2; i <= NF; ++i) { print $1, $i }}'
Explanation: awk implicitly splits data into records(by default separeted by newline, i.e. line == record) which then are split into numbered fields by given field separator(FS for input field separator and OFS for output separator).
For each record this script prints first field(which is record name), along with i-th field, and that's exactly what you need.
while IFS=';,' read -a data; do
id="${data[0]}"
data=("${data[#]:1}")
printf "$id;%s\n" "${data[#]}"
done < input.txt
or
awk -v FS=';' -v OFS=';' '{for (i = 2; i <= NF; ++i) { print $1, $i }}'
And
$ awk -F\; '{for(n=2; n<=NF; n++) { printf("%s;%s\n",$1,$n); }}' input.txt
thanks all for your suggestions, :d. It's really give me a new knowledge..

Resources