Search and replace with multiple occurrences of same pattern - bash

I want to replace the following
word-3.4.4-r0-20170804_101145
second-example-3.4.4-r0-20170804_101145
and-third-example-3.4.4-r0-20170804_101145
I'm looking to replace the hyphen that occurs before the first 3 with a colon e.g.
word:3.4.4-r0-20170804_101145
second-example:3.4.4-r0-20170804_101145
and-third-example:3.4.4-r0-20170804_101145
So far, the closest I can get is
newvar=$(echo "$var" | sed 's/-[0-9]/:/')
but this replaces -3 with a bare colon, so the digit is lost:
word:.4.4-r0-20170804_101145
second-example:.4.4-r0-20170804_101145
and-third-example:.4.4-r0-20170804_101145

You are almost there. Just use a capture group to retain the digit:
newvar=$(sed 's/-\([0-9]\)/:\1/' <<< "$var")
Result:
echo "$newvar"
word:3.4.4-r0-20170804_101145
second-example:3.4.4-r0-20170804_101145
and-third-example:3.4.4-r0-20170804_101145

You may use
sed 's/^\([^0-9-]*\(-[^0-9-]*\)*\)-\([0-9]\)/\1:\3/'
Details
^ - start of a line
\([^0-9-]*\(-[^0-9-]*\)*\) - Group 1:
[^0-9-]* - any 0+ chars other than digits and -
\(-[^0-9-]*\)* - (Group 2) 0+ sequences of - and any 0+ chars other than digits and -
- - a hyphen
\([0-9]\) - Group 3
The \1 is the backreference to Group 1 contents and \3 is the backreference to Group 3 contents.
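As a quick check of the anchored pattern, here is a minimal sketch that runs it over the three sample strings from the question. The function name to_colon is hypothetical and only used for illustration.

```shell
# to_colon: hypothetical helper that replaces the hyphen before the first
# version digit with a colon, using the anchored BRE shown above.
to_colon() {
  printf '%s\n' "$1" |
    sed 's/^\([^0-9-]*\(-[^0-9-]*\)*\)-\([0-9]\)/\1:\3/'
}

to_colon 'word-3.4.4-r0-20170804_101145'
to_colon 'second-example-3.4.4-r0-20170804_101145'
to_colon 'and-third-example-3.4.4-r0-20170804_101145'
```

Because the s command has no g flag, at most one hyphen per line is replaced.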

newvar=$(echo "$var" | sed -r 's/-([0-9])/:\1/')
This is very similar to the other answers, but I prefer the -r option to sed (also accepted as -E by GNU and BSD sed). It enables extended regular expressions, which are easier to read because they need fewer backslashes. As in the other answers, the digit is captured by ([0-9]) and restored in the replacement with \1.
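To make the difference concrete, here is a small sketch that runs the basic and the extended form side by side. The variable names bre and ere are made up for this example.

```shell
var='word-3.4.4-r0-20170804_101145'

# BRE: the capture group needs escaped parentheses.
bre=$(printf '%s\n' "$var" | sed 's/-\([0-9]\)/:\1/')

# ERE: with -E the parentheses group without backslashes.
ere=$(printf '%s\n' "$var" | sed -E 's/-([0-9])/:\1/')

printf '%s\n%s\n' "$bre" "$ere"
```

Both commands should print the same line; only the amount of escaping differs.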

Related

Regular expression to capture alphanumeric string only in shell

I'm trying to write a regex that captures only the alphanumeric values shown below, but it also captures the purely numeric ones. What is the correct way to get the desired output?
code
grep -Eo '(\[[[:alnum:]]\)\w+' file > output
$ cat file
2022-04-29 08:45:11,754 [14] [Y23467] [546] This is a single line
2022-04-29 08:45:11,764 [15] [fpes] [547] This is a single line
2022-04-29 08:46:12,454 [143] [mwalkc] [548] This is a single line
2022-04-29 08:49:12,554 [143] [skhat2] [549] This is a single line
2022-04-29 09:40:13,852 [5] [narl12] [550] This is a single line
2022-04-29 09:45:14,754 [1426] [Y23467] [550] This is a single line
current output -
[14
[Y23467
[546
[15
[fpes
[547
[143
[mwalkc
[548
[143
[skhat2
[549
[5
[narl12
[550
[1426
[Y23467
[550
expected output -
Y23467
fpes
mwalkc
skhat2
narl12
Y23467
1st solution: With your shown samples, please try the following awk code. In short, it uses the gsub function to remove [ and ] from the 4th field and then prints that field.
awk '{gsub(/\[|\]/,"",$4);print $4}' Input_file
2nd solution: With GNU grep please try following solution.
grep -oP '^[0-9]{4}(-[0-9]{2}){2} [0-9]{2}(:[0-9]{2}){2},[0-9]{1,3} \[[0-9]+\] \[\K[^]]*' Input_file
Explanation: Adding detailed explanation for above regex used in GNU grep.
^[0-9]{4}(-[0-9]{2}){2} ##From starting of value matching 4 digits followed by dash 2 digits combination of 2 times.
[0-9]{2}(:[0-9]{2}){2} ##Matching space followed by 2 digits followed by : 2 digits combination of 2 times.
,[0-9]{1,3} ##Matching comma followed by digits from 1 to 3 number.
\[[0-9]+\] \[\K ##Matching space followed by [ digits(1 or more occurrences of digits) followed by space [ and
##then using \K to forget all the previously matched values.
[^]]* ##Matching everything just before 1st occurrence of ] to get actual values.
Using [[:alnum:]] or \w means the pattern can match any alphanumeric or word character, including digits, so purely numeric values are matched as well.
If the value may contain digits but must contain at least one letter a-z or A-Z, and your grep supports -P for Perl-compatible regular expressions:
grep -oP '\[\K\d*[A-Za-z][\dA-Za-z]*(?=])' file
Explanation
\[ Match [
\K Forget what is matched so far
\d*[A-Za-z] Match optional digits and at least a single char a-zA-Z
[\dA-Za-z]* Match optional chars a-zA-Z and digits
(?=]) Assert ] to the right
Output
Y23467
fpes
mwalkc
skhat2
narl12
Y23467
If there can be only 1 occurrence, you might also use sed with a capture group \(...\) and use the group in the replacement using \1
sed 's/.*\[\([[:digit:]]*[[:alpha:]][[:alnum:]]*\)].*/\1/' file
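As a rough check of the sed variant, the sketch below feeds it three of the sample lines from the question. The variable names sample and ids are assumptions made for this example.

```shell
# Sample lines copied from the question.
sample='2022-04-29 08:45:11,754 [14] [Y23467] [546] This is a single line
2022-04-29 08:45:11,764 [15] [fpes] [547] This is a single line
2022-04-29 09:40:13,852 [5] [narl12] [550] This is a single line'

# Keep only the bracketed token that contains at least one letter.
ids=$(printf '%s\n' "$sample" |
  sed 's/.*\[\([[:digit:]]*[[:alpha:]][[:alnum:]]*\)].*/\1/')

printf '%s\n' "$ids"
```

The greedy .* first tries the last bracketed token on the line; since that one is purely numeric, the match backtracks to the previous token, which contains a letter.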
There are several parts to your problem. First I'll try to help you with your regex (but it will probably unlock more problems); next I'll show you an alternative.
The Regex
The thing to understand about [[:alnum:]] is that it matches a single alphanumeric character, whether that is a letter or a digit. A run of them will therefore match "123" just as readily as "abc", since all of those characters are alphanumeric. It judges each character individually and cannot pick out "only sections that have both numbers and letters" like what you want.
However, by chaining two greps together, we can filter out the matches that contain only digits. The second grep must not use -o: combined with -v it prints nothing.
grep -Eo '(\[[[:alnum:]]\)\w+' file | grep -Ev '\[[[:digit:]]+(\w+|$)' > output
To refine this further, there look to be a couple of bugs in your regex. First, you have included \[ inside the captured part, which is why it's capturing the [ in your results, so you should change (\[ to \[( to move the [ outside of the captured part in parentheses ( ... ).
Next, your combination of [[:alnum:]] with \w+ probably doesn't do what you expect. It looks for a single alphanumeric character, followed by one or more "word" characters (which is all the alphanumerics, plus some extra ones). You probably want ([[:alnum:]]+) instead of ([[:alnum:]])\w+
Alternative
Why not use cut instead? cut -d' ' -f4 will take the 4th field (with "space" as the delimiter between fields)
$ cut -d' ' -f 4 file
[Y23467]
[fpes]
[mwalkc]
[skhat2]
[narl12]
[Y23467]
If you also want to remove the square brackets, try
$ cut -d' ' -f 4 file | grep -Eo '\w+'
Y23467
fpes
mwalkc
skhat2
narl12
Y23467
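If \w is not available in your grep, a similar result can be sketched with tr, which simply deletes the bracket characters. The variable names line and id are made up for this example, and the sketch assumes the ID is always the 4th space-separated field, as in the sample.

```shell
line='2022-04-29 08:46:12,454 [143] [mwalkc] [548] This is a single line'

# Field 4 (space-delimited) is the bracketed ID; tr -d deletes the brackets.
id=$(printf '%s\n' "$line" | cut -d' ' -f4 | tr -d '[]')

printf '%s\n' "$id"
```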
Using sed
$ sed 's/\([^[]*\[\)\{2\}\([^]]*\).*/\2/' input_file
Y23467
fpes
mwalkc
skhat2
narl12
Y23467
Using FPAT with GNU awk:
awk -v FPAT='[[[:alnum:]]*]' '{gsub(/^\[|\]$/, "",$(NF-1));print $(NF-1)}' file
Y23467
fpes
mwalkc
skhat2
narl12
Y23467
Setting FPAT to '[[[:alnum:]]*]' makes each field a run of zero or more characters that are either [ or alphanumeric, followed by a ] character; on these lines that yields the three bracketed tokens.
with gsub() function we remove initial [ and final ] chars.
we print the field previous to the last field, i.e. $(NF-1) field, without [ and ] characters.

How to convert a line into camel case?

This picks all the text on a single line after a pattern match and converts it to camel case, using non-alphanumeric characters as separators and removing the spaces at the beginning and end of the resulting string. Two questions: (1) it doesn't replace when there are 2 consecutive non-alphanumeric chars, e.g. "2, " in the example below; (2) is there a way to do everything with a single sed command instead of using grep, cut, sed and tr?
$ echo " hello
world
title: this is-the_test string with number 2, to-test CAMEL String
end! " | grep -o 'title:.*' | cut -f2 -d: | sed -r 's/([^[:alnum:]])([0-9a-zA-Z])/\U\2/g' | tr -d ' '
ThisIsTheTestStringWithNumber2,ToTestCAMELString
To answer your first question, change [^[:alnum:]] to [^[:alnum:]]+ to match one or more non-alnum chars.
You may combine all the commands into a GNU sed solution like
sed -En '/.*title: *(.*[[:alnum:]]).*/{s//\1/;s/([^[:alnum:]]+|^)([0-9a-zA-Z])/\U\2/gp}'
Details
-En - POSIX ERE syntax is on (E) and default line output is suppressed with n
/.*title: *(.*[[:alnum:]]).*/ - matches a line having title: capturing all after it up to the last alnum char into Group 1 and matching the rest of the line
{s//\1/;s/([^[:alnum:]]+|^)([0-9a-zA-Z])/\U\2/gp} - if the line is matched,
s//\1/ - remove all but Group 1 pattern (received above)
s/([^[:alnum:]]+|^)([0-9a-zA-Z])/\U\2/ - match and capture start of string or 1+ non-alnum chars into Group 1 (with ([^[:alnum:]]+|^)) and then capture an alnum char into Group 2 (with ([0-9a-zA-Z])) and replace with uppercased Group 2 contents (with \U\2).
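The combined command can be tried on the input from the question with a sketch like the following. It assumes GNU sed, since \U is a GNU extension, and the wrapper name to_camel is hypothetical.

```shell
# GNU sed is assumed: \U (uppercase conversion) is a GNU extension.
# to_camel: hypothetical wrapper around the one-liner explained above.
to_camel() {
  sed -En '/.*title: *(.*[[:alnum:]]).*/{s//\1/;s/([^[:alnum:]]+|^)([0-9a-zA-Z])/\U\2/gp}'
}

printf '%s\n' ' hello' 'world' \
  'title: this is-the_test string with number 2, to-test CAMEL String' \
  'end! ' | to_camel
```

Only the title line is printed, and because the separator group now matches one or more non-alphanumeric characters, the ", " after the 2 is removed as a unit.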

How to remove all whitespace in front of and behind 3 consecutive periods

I'm trying to remove all whitespace before and after 3 consecutive periods and replace the whole match with the actual ellipsis symbol.
I've tried the following code:
sed 's/[[:space:]]*\.\.\.[[:space:]]*/…/g'
It replaces the 3 periods with the ellipsis symbol, but the spaces before and after remain.
Sample Input.
hello ... world
Desired output
hello…world
The -E option switches sed to ERE (extended regular expressions). POSIX character classes such as [[:space:]] are valid in both basic and extended syntax, so -E is not strictly required for this expression, but you can add it as follows:
sed -E 's/[[:space:]]*\.\.\.[[:space:]]*/.../g' Input_file
Without -E try:
sed 's/ *\.\.\. */.../g' Input_file
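To compare the character-class form and the literal-space form on the sample input, a minimal sketch could look like this. It substitutes the single ellipsis character the question asks for, and the variable names are made up for the example.

```shell
input='hello ... world'

# Character-class form: [[:space:]] also covers tabs and similar whitespace.
with_class=$(printf '%s\n' "$input" | sed 's/[[:space:]]*\.\.\.[[:space:]]*/…/g')

# Literal-space form: only plain spaces are removed.
with_space=$(printf '%s\n' "$input" | sed 's/ *\.\.\. */…/g')

printf '%s\n%s\n' "$with_class" "$with_space"
```

If the real input still keeps its spaces, it may contain characters that only look like spaces (for example non-breaking spaces), which neither pattern is guaranteed to match.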
Here is another sed
echo "hello ... world" | sed -E 's/ +(\.\.\.) +/\1/g'
hello...world
4 dots, do nothing?
echo "hello .... world" | sed -E 's/ +(\.\.\.) +/\1/g'
hello .... world
In bash, just use parameter substitution (the +( ) pattern needs the extglob shell option):
shopt -s extglob
foo="hello ... world"
foo="${foo//+( )...+( )/...}"
Now, echo "$foo", outputs:
hello...world
The syntax for Bash pattern substitution in parameter expansion is as follows (the patterns are shell globs, not regular expressions):
${var-name/search/replace}
A single / replaces only the first occurrence from the left, while a double // replaces every occurrence.
With the extglob shell option enabled, one of ?*+@! followed by (pattern-list) matches a specified number of occurrences of the patterns in pattern-list as follows:
? Zero or one occurrence
* Zero or more occurrences
+ One or more occurrences
@ Exactly one occurrence
! Anything that *doesn't* match one of the patterns
The pattern list can be any combination of literal strings or character classes, separated by the pipe character |

Replacing one space with two spaces in Unix

I am trying to replace every single space with two spaces in Unix. We are just reading from standard input and writing to standard output. I also have to avoid using awk and perl. For example, if I read in something like San Diego (one space), it should print San  Diego (two spaces). If there are already multiple spaces, it should just leave them alone.
How about bash only? First test file:
$ cat file
1
2 3
4 5
San Diego NO
Then:
$ cat file |
while IFS= read line
do
while [[ "$line" =~ (^|.+[^ ])\ ([^ ].*) ]]
do
line="${BASH_REMATCH[1]}  ${BASH_REMATCH[2]}"
done
echo "$line"
done
1
2 3
4 5
San Diego NO
You have to be a bit careful here not to forget spaces at the beginning or end.
I present three solutions for educational purposes:
sed 's/\(^\|[^ ]\) \($\|[^ ]\)/\1  \2/g' # solution 1
sed 's/\( \+\)/ \1/g;s/ \(  \+\)/\1/g' # solution 2
sed 's/ \( \+\)/\1/g;s/\( \+\)/ \1/g' # solution 3
All three solutions make use of subexpressions:
9.3.6 BREs Matching Multiple Characters
A subexpression can be defined within a BRE by enclosing it between
the character pairs \( and \). Such a subexpression shall match
whatever it would have matched without the \( and \), except that
anchoring within subexpressions is optional behavior; see BRE
Expression Anchoring. Subexpressions can be arbitrarily nested.
The back-reference expression '\n' shall match the same (possibly
empty) string of characters as was matched by a subexpression enclosed
between "\(" and "\)" preceding the '\n'. The character n shall be a
digit from 1 through 9, specifying the nth subexpression (the one that
begins with the nth \( from the beginning of the pattern and ends
with the corresponding paired \) ). The expression is invalid if
less than n subexpressions precede the \n. For example, the
expression "^\(.*\)\1$" matches a line consisting of two adjacent
appearances of the same string, and the expression "\(a\)*\1" fails to
match a. When the referenced subexpression matched more than one
string, the back-referenced expression shall refer to the last matched
string. If the subexpression referenced by the back-reference matches
more than one string because of an asterisk (*) or an interval
expression (see item (5)), the back-reference shall match the last
(rightmost) of these strings.
Solution 1: sed 's/\(^\|[^ ]\) \($\|[^ ]\)/\1  \2/g'
Here there are two subexpressions. The first subexpression \(^\|[^ ]\) matches the beginning of the line (^) or (\|) a non-space character ([^ ]). The second subexpression \($\|[^ ]\) is similar but with the end-of-line ($). The replacement puts two spaces between whatever the two subexpressions matched.
Solution 2: sed 's/\( \+\)/ \1/g;s/ \(  \+\)/\1/g'
This replaces every run of one or more spaces with the same number of spaces plus an extra one. Afterwards we correct the runs that now have 3 spaces or more by removing a single space from them.
Solution 3: sed 's/ \( \+\)/\1/g;s/\( \+\)/ \1/g'
This does the same thing as solution 2 but inverts the logic. First remove a space from all sequences that have more than one space, and afterwards add a space to every sequence. This one-liner is just one character shorter than solution 2.
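The two-pass idea behind solutions 2 and 3 can be checked with a short sketch like the one below. It assumes GNU sed, because \+ is a GNU extension to basic regular expressions, and the variable names are made up for the example.

```shell
# GNU sed is assumed: \+ is a GNU extension to basic regular expressions.
input='a b  c   d'   # runs of one, two and three spaces

# Solution 2: add a space to every run, then trim runs of three or more.
out2=$(printf '%s\n' "$input" | sed 's/\( \+\)/ \1/g;s/ \(  \+\)/\1/g')

# Solution 3: trim runs of two or more, then add a space to every run.
out3=$(printf '%s\n' "$input" | sed 's/ \( \+\)/\1/g;s/\( \+\)/ \1/g')

printf '|%s|\n|%s|\n' "$out2" "$out3"
```

In both cases only the single space between a and b should be doubled, while the longer runs come out with their original length.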
Example: based on solution 1
The following commands are nothing more than echo "string" | sed ..., but wrapped in a printf statement to make the spaces visible.
# default string
$ printf "|%s|" " foo bar car "
| foo bar car |
# spaces replaced
$ printf "|%s|" "$(echo " foo bar car " | sed 's/\(^\|[^ ]\) \($\|[^ ]\)/\1  \2/g')"
|  foo  bar  car  |
# 3 spaces in front and back
$ printf "|%s|" "$(echo "   foo bar car   " | sed 's/\(^\|[^ ]\) \($\|[^ ]\)/\1  \2/g')"
|   foo  bar  car   |
note: If you want to replace any form of blanks (spaces and tabs in any encoding) by the same doubled blank, you could use :
sed 's/\(^\|[^[:blank:]]\)\([[:blank:]]\)\($\|[^[:blank:]]\)/\1\2\2\3/g'
sed 's/\(^\|[[:graph:]]\)\([[:blank:]]\)\($\|[[:graph:]]\)/\1\2\2\3/g'
Something along the lines of
cat input.txt | sed 's,\([[:alnum:]]\) \([[:alnum:]]\),\1  \2,'
should work for that purpose.
Replace only a single space between 2 chars that are not whitespace with 2 spaces:
`sed 's/\([^ ]\) \([^ ]\)/\1  \2/g' file`
1) [^ ] - any char that is not a space
2) \1  \2 - the first parenthesized expression, 2 spaces, then the second parenthesized expression
3) sed used with s///g is replacing the regex in the first // with the value in the second //
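One thing worth checking with this pattern is how it behaves when matches would have to overlap. The sketch below wraps the command in a hypothetical helper named double_single and tries it on two inputs.

```shell
# double_single: hypothetical helper wrapping the command above.
double_single() {
  sed 's/\([^ ]\) \([^ ]\)/\1  \2/g'
}

printf 'San Diego\n' | double_single

# Adjacent matches cannot share a character, so with one-letter words a
# single pass skips every other gap; a second pass picks up the rest.
printf 'a b c\n' | double_single
printf 'a b c\n' | double_single | double_single
```

For ordinary words a single pass is enough; the second pass only matters when single spaces are separated by exactly one character.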

sed - Replacing brackets with characters?

I have a string with brackets that enclose a single character, like so:
[a]
I want to take the character within the bracket and replace the bracket with the character, so the end result would look like:
aaa
This is what I came up with, but it doesn't work:
sed 's/\[ \([a-z]\) \]/\2/g' < testfile
Can someone please help me, and explain why my command isn't working?
Try the following code:
$ echo "[a]" | sed 's/\[\([a-zA-Z]\)\]/\1\1\1/g'
or
$ echo "[a]" | sed -r 's/\[([a-zA-Z])\]/\1\1\1/g'
Output:
aaa
I think you missed some basic concepts. First let's duplicate a single char
$ echo a | sed -r 's/(.)/\1\1/'
aa
parenthesis indicates the groups and \1 refers to the first group
Now, to match a char in square brackets and triple it.
$ echo [a]b | sed -r 's/\[(.)\]/\1\1\1/'
aaab
You need to escape the square bracket chars since they have special meaning in regex. The key is to wrap the parts of the regex you're interested in in parentheses and refer to them in the same order with the \{number} notation (\1, \2, ...).
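Putting the escaping and the group reference together, a minimal sketch with the basic-regex form might look like this. The helper name triple_bracketed is hypothetical.

```shell
# triple_bracketed: hypothetical helper; replaces each [x] with xxx,
# where x is a single ASCII letter.
triple_bracketed() {
  sed 's/\[\([a-zA-Z]\)\]/\1\1\1/g'
}

printf '[a]\n' | triple_bracketed
printf 'x[a]y[b]\n' | triple_bracketed
```

The g flag makes the substitution apply to every bracketed letter on the line, and text outside the brackets is left untouched.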
The issues with your pattern sed 's/\[ \([a-z]\) \]/\2/g' < testfile:
1) The pattern has only one group \([a-z]\), so \2 is invalid;
2) The pattern contains spaces that are not in the input, so no match is found;
3) The brackets are removed only if they are part of the match; capturing them in groups is one way to do that.
My idea is to capture every part of the pattern in its own group and replace the match with \2\2\2:
echo "[a]" | sed 's/\(\[\)\([a-z]\)\(\]\)/\2\2\2/g'
Or
echo "[a]" | sed 's/\(.\)\(.\)\(.\)/\2\2\2/g'
The output is:
aaa

Resources