Get numbers from line and separate them in bash - bash

I have this line:
io=9839.1MB, bw=4012.3KB/s, iops=250, runt=2511369msec
and I need to extract the numbers, to get an output like this:
9839.1 4012.3 250 2511369
I tried with sed 's/[^0-9.]*//g' but numbers are not separated from each other.
How can I do that?

Remove everything but digits, dots and blanks:
echo 'io=9839.1MB, bw=4012.3KB/s, iops=250, runt=2511369msec' | tr -cd '0-9. '
Output:
9839.1 4012.3 250 2511369

Your attempt with sed is almost right, just need a add a single whitespace character for excluding from replacement,
sed 's/[^0-9. ]*//g' file
9839.1 4012.3 250 2511369
You can also use GNU grep with a -E regular expression syntax and -o for matching only words which include digits and dots.
grep -o -E '[0-9.]+' file
9839.1
4012.3
250
2511369

The Fish
// a group containing a digit and the literal character "." at least once
const regex = /([\d.]+)/g
// your string
const haystack = `io=9839.1MB, bw=4012.3KB/s, iops=250, runt=2511369msec`
console.log(haystack.match(regex))
How to fish:
Here is the fantastic regex101 featuring a wysiwyg regexp editor including a large knowledge base.
https://regex101.com/r/a2scnd/1

With shell parameter expansion:
$ var='io=9839.1MB, bw=4012.3KB/s, iops=250, runt=2511369msec'
$ echo "${var//[![:digit:]. ]}"
9839.1 4012.3 250 2511369
Specifically, this uses the ${parameter//pattern/string} with an empty string and a negated bracket expression for pattern: [![:digit:]. ] – everything other than digits, periods and spaces.

Related

Regular expression to capture alphanumeric string only in shell

Trying to write the regex to capture the given alphanumeric values but its also capturing other numeric values. What should be the correct way to get the desire output?
code
grep -Eo '(\[[[:alnum:]]\)\w+' file > output
$ cat file
2022-04-29 08:45:11,754 [14] [Y23467] [546] This is a single line
2022-04-29 08:45:11,764 [15] [fpes] [547] This is a single line
2022-04-29 08:46:12,454 [143] [mwalkc] [548] This is a single line
2022-04-29 08:49:12,554 [143] [skhat2] [549] This is a single line
2022-04-29 09:40:13,852 [5] [narl12] [550] This is a single line
2022-04-29 09:45:14,754 [1426] [Y23467] [550] This is a single line
current output -
[14
[Y23467
[546
[15
[fpes
[547
[143
[mwalkc
[548
[143
[skhat2
[549
[5
[narl12
[550
[1426
[Y23467
[550
expected output -
Y23467
fpes
mwalkc
skhat2
narl12
Y23467
1st solution: With your shown samples, please try following awk code. Simple explanation would be, using gsub function to substitute [ and ] in 4th field, printing 4th field after that.
awk '{gsub(/\[|\]/,"",$4);print $4}' Input_file
2nd solution: With GNU grep please try following solution.
grep -oP '^[0-9]{4}(-[0-9]{2}){2} [0-9]{2}(:[0-9]{2}){2},[0-9]{1,3} \[[0-9]+\] \[\K[^]]*' Input_file
Explanation: Adding detailed explanation for above regex used in GNU grep.
^[0-9]{4}(-[0-9]{2}){2} ##From starting of value matching 4 digits followed by dash 2 digits combination of 2 times.
[0-9]{2}(:[0-9]{2}){2} ##Matching space followed by 2 digits followed by : 2 digits combination of 2 times.
,[0-9]{1,3} ##Matching comma followed by digits from 1 to 3 number.
\[[0-9]+\] \[\K ##Matching space followed by [ digits(1 or more occurrences of digits) followed by space [ and
##then using \K to forget all the previously matched values.
[^]]* ##Matching everything just before 1st occurrence of ] to get actual values.
Using [[:alnum:]] or \w means that it can possibly match alphanumeric or word characters.
If there can be numbers, but there should be a character a-z and using -P for a perl compatible regex is supported:
grep -oP '\[\K\d*[A-Za-z][\dA-Za-z]*(?=])' file
Explanation
\[ Match [
\K Forget what is matched so far
\d*[A-Za-z] Match optional digits and at least a single char a-zA-Z
[\dA-Za-z]* Match optional chars a-zA-Z and digits
(?=]) Assert ] to the right
Output
Y23467
fpes
mwalkc
skhat2
narl12
Y23467
If there can be only 1 occurrence, you might also use sed with a capture group \(...\) and use the group in the replacement using \1
sed 's/.*\[\([[:digit:]]*[[:alpha:]][[:alnum:]]*\)].*/\1/' file
There are several parts to your problem. First I'll try to help you with your regex (but it will probably unlock more problems); next I'll show you an alternative.
The Regex
The thing to understand about [[:alnum:]] is that it captures anything that contains an alphanumeric character. So it will capture "123", and it will capture "abc", as all of those characters are alphanumeric. It judges each character individually and cannot capture "only sections that have both numbers and letters" like what you want.
However, by chaining several greps together, we could filter out lines which only contain numbers.
grep -Eo '(\[[[:alnum:]]\)\w+' file | grep -v -Eo '\[[[:digit:]]+(\w+|$)' > output
To refine this further, there look to be a couple of bugs in your regex. First, you have included \[ inside the captured part, which is why it's capturing the [ in your results, so you should change (\[ to \[( to move the [ outside of the captured part in parantheses ( ... ).
Next, your combination of [[:alnum:]] with \w+ probably doesn't do what you expect. It looks for a single alphanumeric character, followed by one or more "word" characters (which is all the alphanumerics, plus some extra ones). You probably want ([[:alnum:]]+) instead of ([[:alnum:]])\w+
Alternative
Why not use cut instead? cut -d' ' -f4 will take the 4th field (with "space" as the delimiter between fields)
$ cut -d' ' -f 4 file
[Y23467]
[fpes]
[mwalkc]
[skhat2]
[narl12]
[Y23467]
If you also want to remove the square brackets, try
$ cut -d' ' -f 4 file | grep -Eo '\w+'
Y23467
fpes
mwalkc
skhat2
narl12
Y23467
Using sed
$ sed 's/\([^[]*\[\)\{2\}\([^]]*\).*/\2/' input_file
Y23467
fpes
mwalkc
skhat2
narl12
Y23467
Using FPAT with GNU awk:
awk -v FPAT='[[[:alnum:]]*]' '{gsub(/^\[|\]$/, "",$(NF-1));print $(NF-1)}' file
Y23467
fpes
mwalkc
skhat2
narl12
Y23467
setting FPAT as '[[[:alnum:]]*]' we match [ char followed by zero o more alphanumeric chars followed by ] char.
with gsub() function we remove initial [ and final ] chars.
we print the field previous to the last field, i.e. $(NF-1) field, without [ and ] characters.

sed replace string with pipe and stars

I have the following string:
|**barak**.version|2001.0132012031539|
in file text.txt.
I would like to replace it with the following:
|**barak**.version|2001.01.2012031541|
So I run:
sed -i "s/\|\*\*$module\*\*.version\|2001.0132012031539/|**$module**.version|$version/" text.txt
but the result is a duplicate instead of replacing:
|**barak**.version|2001.01.2012031541|**barak**.version|2001.0132012031539|
What am I doing wrong?
Here is the value for module and version:
$ echo $module
barak
$ echo $version
2001.01.2012031541
Assumptions:
lines of interest start and end with a pipe (|) and have one more pipe somewhere in the middle of the data
search is based solely on the value of ${module} existing between the 1st/2nd pipes in the data
we don't know what else may be between the 1st/2nd pipes
the version number is the only thing between the 2nd/3rd pipes
we don't know the version number that we'll be replacing
Sample data:
$ module='barak'
$ version='2001.01.2012031541'
$ cat text.txt
**barak**.version|2001.0132012031539| <<<=== leave this one alone
|**apple**.version|2001.0132012031539|
|**barak**.version|2001.0132012031539| <<<=== replace this one
|**chuck**.version|2001.0132012031539|
|**barak**.peanuts|2001.0132012031539| <<<=== replace this one
One sed solution with -Extended regex support enabled and making use of a capture group:
$ sed -E "s/^(\|[^|]*${module}[^|]*).*/\1|${version}|/" text.txt
Where:
\| - first occurrence (escaped pipe) tells sed we're dealing with a literal pipe; follow-on pipes will be treated as literal strings
^(\|[^|]*${module}[^|]*) - first capture group that starts at the beginning of the line, starts with a pipe, then some number of non-pipe characters, then the search pattern (${module}), then more non-pipe characters (continues up to next pipe character)
.* - matches rest of the line (which we're going to discard)
\1|${version}| - replace line with our first capture group, then a pipe, then the new replacement value (${version}), then the final pipe
The above generates:
**barak**.version|2001.0132012031539|
|**apple**.version|2001.0132012031539|
|**barak**.version|2001.01.2012031541| <<<=== replaced
|**chuck**.version|2001.0132012031539|
|**barak**.peanuts|2001.01.2012031541| <<<=== replaced
An awk alternative using GNU awk:
awk -v mod="$module" -v vers="$version" -F \| '{ OFS=FS;split($2,map,".");inmod=substr(map[1],3,length(map[1])-4);if (inmod==mod) { $3=vers } }1' file
Pass two variables mod and vers to awk using $module and $version. Set the field delimiter to |. Split the second field into array map using the split function and using . as the delimiter. Then strip the leading and ending "**" from the first index of the array to expose the module name as inmod using the substr function. Compare this to the mod variable and if there is a match, change the 3rd delimited field to the variable vers. Print the lines with short hand 1
Pipe is only special when you're using extended regular expressions: sed -E
There's no reason why you need extended here, stick with basic regex:
sed "
# for lines matching module.version
/|\*\*$module\*\*.version|/ {
# replace the version
s/|2001.0132012031539|/|$version|/
}
" text.txt
or as an unreadable one-liner
sed "/|\*\*$module\*\*.version|/ s/|2001.0132012031539|/|$version|/" text.txt

how to remove all whitespaces in front and beind 3 consecutive periods

I'm trying to remove all white spaces before and after 3 consecutive periods and replace it with the actual ellipse symbol.
I've tried the following code:
sed 's/[[:space:]]*\.\.\.[[:space:]]*/…/g'
It replaces the 3 periods with the ellipse symbol, but the spaces before and after remain.
Sample Input.
hello ... world
Desired output
hello…world
Expression you are using is ERE(extended regular expressions) you have to add -E option to sed as follows to allow it, since you are using character classes in your code [[:space:]].
sed -E 's/[[:space:]]*\.\.\.[[:space:]]*/.../g' Input_file
Without -E try:
sed 's/ *\.\.\. */.../g' Input_file
Here is another sed
echo "hello ... world" | sed -E 's/ +(\.\.\.) +/\1/g'
hello...world
4 dots, do nothing?
echo "hello .... world" | sed -E 's/ +(\.\.\.) +/\1/g'
hello .... world
In bash, just use parameter substitution...
foo="hello ... world"
foo="${foo//+( )...+( )/...}"
Now, echo "$foo", outputs:
hello...world
The syntax for BaSH regex variable substitution are as follows:
${var-name/search/replace}
A single /replaces only the first occurrence from the left, while a double //replaces every occurrence.
One of ?*+#! followed by (pattern-list) replaces a specified number of occurrences of the patterns in pattern-list as follows:
? Zero or one occurrence
* Zero or more occurrences
+ One or more occurrences
# A single occurence
! Anything that *doesn't* match one of the occurrences
Pattern list can be any combination of literal strings, or character classes, separated by the pipe character |

Reverse four length of letters with sed in unix

How can I reverse a four length of letters with sed?
For example:
the year was 1815.
Reverse to:
the raey was 5181.
This is my attempt:
cat filename | sed's/\([a-z]*\) *\([a-z]*\)/\2, \1/'
But it does not work as I intended.
not sure it is possible to do it with GNU sed for all cases. If _ doesn't occur immediately before/after four letter words, you can use
sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
\b is word boundary, word definition being any alphabet or digit or underscore character. So \b will ensure to match only whole words not part of words
$ echo 'the year was 1815.' | sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
the raey was 5181.
$ echo 'two time five three six good' | sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
two emit evif three six doog
$ # but won't work if there are underscores around the words
$ echo '_good food' | sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
_good doof
tool with lookaround support would work for all cases
$ echo '_good food' | perl -pe 's/(?<![a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])(?!=[a-z0-9])/$4$3$2$1/gi'
_doog doof
(?<![a-z0-9]) and (?!=[a-z0-9]) are negative lookbehind and negative lookahead respectively
Can be shortened to
perl -pe 's/(?<![a-z0-9])[a-z0-9]{4}(?!=[a-z0-9])/reverse $&/gie'
which uses the e modifier to place Perl code in substitution section. This form is suitable to easily change length of words to be reversed
Possible shortest sed solution even if a four length of letters contains _s.
sed -r 's/\<(.)(.)(.)(.)\>/\4\3\2\1/g'
Following awk may help you in same. Tested this in GNU awk and only with provided sample Input_file
echo "the year was 1815." |
awk '
function reverse(val){
num=split(val, array,"");
i=array[num]=="."?num-1:num;
for(;i>q;i--){
var=var?var array[i]:array[i]
};
printf (array[num]=="."?var".":var);
var=""
}
{
for(j=1;j<=NF;j++){
printf("%s%s",j==NF||j==2?reverse($j):$j,j==NF?RS:FS)
}}'
This might work for you (GNU sed):
sed -r '/\<\w{4}\>/!b;s//\n&\n/g;s/^[^\n]/\n&/;:a;/\n\n/!s/(.*\n)([^\n])(.*\n)/\2\1\3/;ta;s/^([^\n]*)(.*)\n\n/\2\1/;ta;s/\n//' file
If there are no strings of the length required to reverse, bail out.
Prepend and append newlines to all required strings.
Insert a newline at the start of the pattern space (PS). The PS is divided into two parts, the first line will contain the current word being reversed. The remainder will contain the original line.
Each character of the word to be reversed is inserted at the front of the first line and removed from the original line. When all the characters in the word have been processed, the original word will have gone and only the bordering newlines will exist. These double newlines are then replaced by the word in the first line and the process is repeated until all words have been processed. Finally the newline introduced to separate the working line and the original is removed and the PS is printed.
N.B. This method may be used to reverse strings of varying string length i.e. by changing the first regexp strings of any number can be reversed. Also strings between two lengths may also be reversed e.g. /\<w{2,4}\>/ will change all words between 2 and 4 character length.
It's a recurrent problem so somebody created a bash command called "rev".
echo "$(echo the | rev) $(echo year | rev) $(echo was | rev) $(echo 1815 | rev)".
OR
echo "the year was 1815." | rev | tr ' ' '\n' | tac | tr '\n' ' '

how to chop last n bytes of a string in bash string choping?

for example qa_sharutils-2009-04-22-15-20-39, want chop last 20 bytes, and get 'qa_sharutils'.
I know how to do it in sed, but why $A=${A/.\{20\}$/} does not work?
Thanks!
If your string is stored in a variable called $str, then this will get you give you the substring without the last 20 digits in bash
${str:0:${#str} - 20}
basically, string slicing can be done using
${[variableName]:[startIndex]:[length]}
and the length of a string is
${#[variableName]}
EDIT:
solution using sed that works on files:
sed 's/.\{20\}$//' < inputFile
similar to substr('abcdefg', 2-1, 3) in php:
echo 'abcdefg'|tail -c +2|head -c 3
using awk:
echo $str | awk '{print substr($0,1,length($0)-20)}'
or using strings manipulation - echo ${string:position:length}:
echo ${str:0:$((${#str}-20))}
In the ${parameter/pattern/string} syntax in bash, pattern is a path wildcard-style pattern, not a regular expression. In wildcard syntax a dot . is just a literal dot and curly braces are used to match a choice of options (like the pipe | in regular expressions), so that line will simply erase the literal string ".20".
There are several ways to accomplish the basic task.
$ str="qa_sharutils-2009-04-22-15-20-39"
If you want to strip the last 20 characters. This substring selection is zero based:
$ echo ${str::${#str}-20}
qa_sharutils
The "%" and "%%" to strip from the right hand side of the string. For instance, if you want the basename, minus anything that follows the first "-":
$ echo ${str%%-*}
qa_sharutils
only if your last 20 bytes is always date.
$ str="qa_sharutils-2009-04-22-15-20-39"
$ IFS="-"
$ set -- $str
$ echo $1
qa_sharutils
$ unset IFS
or when first dash and beyond are not needed.
$ echo ${str%%-*}
qa_sharutils

Resources