How to convert a line into camel case? - bash

This picks all the text on single line after a pattern match, and converts it to camel case using non-alphanumeric as separator, remove the spaces at the beginning and at the end of the resulting string, (1) this don't replace if it has 2 consecutive non-alphanumeric chars, e.g "2, " in the below example, (2) is there a way to do everything using sed command instead of using grep, cut, sed and tr?
$ echo " hello
world
title: this is-the_test string with number 2, to-test CAMEL String
end! " | grep -o 'title:.*' | cut -f2 -d: | sed -r 's/([^[:alnum:]])([0-9a-zA-Z])/\U\2/g' | tr -d ' '
ThisIsTheTestStringWithNumber2,ToTestCAMELString

To answer your first question, change [^[:alnum:]] to [^[:alnum:]]+ to mach one ore more non-alnum chars.
You may combine all the commands into a GNU sed solution like
sed -En '/.*title: *(.*[[:alnum:]]).*/{s//\1/;s/([^[:alnum:]]+|^)([0-9a-zA-Z])/\U\2/gp}'
See the online demo
Details
-En - POSIX ERE syntax is on (E) and default line output supressed with n
/.*title: *(.*[[:alnum:]]).*/ - matches a line having title: capturing all after it up to the last alnum char into Group 1 and matching the rest of the line
{s//\1/;s/([^[:alnum:]]+|^)([0-9a-zA-Z])/\U\2/gp} - if the line is matched,
s//\1/ - remove all but Group 1 pattern (received above)
s/([^[:alnum:]]+|^)([0-9a-zA-Z])/\U\2/ - match and capture start of string or 1+ non-alnum chars into Group 1 (with ([^[:alnum:]]+|^)) and then capture an alnum char into Group 2 (with ([0-9a-zA-Z])) and replace with uppercased Group 2 contents (with \U\2).

Related

sed replace string with pipe and stars

I have the following string:
|**barak**.version|2001.0132012031539|
in file text.txt.
I would like to replace it with the following:
|**barak**.version|2001.01.2012031541|
So I run:
sed -i "s/\|\*\*$module\*\*.version\|2001.0132012031539/|**$module**.version|$version/" text.txt
but the result is a duplicate instead of replacing:
|**barak**.version|2001.01.2012031541|**barak**.version|2001.0132012031539|
What am I doing wrong?
Here is the value for module and version:
$ echo $module
barak
$ echo $version
2001.01.2012031541
Assumptions:
lines of interest start and end with a pipe (|) and have one more pipe somewhere in the middle of the data
search is based solely on the value of ${module} existing between the 1st/2nd pipes in the data
we don't know what else may be between the 1st/2nd pipes
the version number is the only thing between the 2nd/3rd pipes
we don't know the version number that we'll be replacing
Sample data:
$ module='barak'
$ version='2001.01.2012031541'
$ cat text.txt
**barak**.version|2001.0132012031539| <<<=== leave this one alone
|**apple**.version|2001.0132012031539|
|**barak**.version|2001.0132012031539| <<<=== replace this one
|**chuck**.version|2001.0132012031539|
|**barak**.peanuts|2001.0132012031539| <<<=== replace this one
One sed solution with -Extended regex support enabled and making use of a capture group:
$ sed -E "s/^(\|[^|]*${module}[^|]*).*/\1|${version}|/" text.txt
Where:
\| - first occurrence (escaped pipe) tells sed we're dealing with a literal pipe; follow-on pipes will be treated as literal strings
^(\|[^|]*${module}[^|]*) - first capture group that starts at the beginning of the line, starts with a pipe, then some number of non-pipe characters, then the search pattern (${module}), then more non-pipe characters (continues up to next pipe character)
.* - matches rest of the line (which we're going to discard)
\1|${version}| - replace line with our first capture group, then a pipe, then the new replacement value (${version}), then the final pipe
The above generates:
**barak**.version|2001.0132012031539|
|**apple**.version|2001.0132012031539|
|**barak**.version|2001.01.2012031541| <<<=== replaced
|**chuck**.version|2001.0132012031539|
|**barak**.peanuts|2001.01.2012031541| <<<=== replaced
An awk alternative using GNU awk:
awk -v mod="$module" -v vers="$version" -F \| '{ OFS=FS;split($2,map,".");inmod=substr(map[1],3,length(map[1])-4);if (inmod==mod) { $3=vers } }1' file
Pass two variables mod and vers to awk using $module and $version. Set the field delimiter to |. Split the second field into array map using the split function and using . as the delimiter. Then strip the leading and ending "**" from the first index of the array to expose the module name as inmod using the substr function. Compare this to the mod variable and if there is a match, change the 3rd delimited field to the variable vers. Print the lines with short hand 1
Pipe is only special when you're using extended regular expressions: sed -E
There's no reason why you need extended here, stick with basic regex:
sed "
# for lines matching module.version
/|\*\*$module\*\*.version|/ {
# replace the version
s/|2001.0132012031539|/|$version|/
}
" text.txt
or as an unreadable one-liner
sed "/|\*\*$module\*\*.version|/ s/|2001.0132012031539|/|$version|/" text.txt

how to remove all whitespaces in front and beind 3 consecutive periods

I'm trying to remove all white spaces before and after 3 consecutive periods and replace it with the actual ellipse symbol.
I've tried the following code:
sed 's/[[:space:]]*\.\.\.[[:space:]]*/…/g'
It replaces the 3 periods with the ellipse symbol, but the spaces before and after remain.
Sample Input.
hello ... world
Desired output
hello…world
Expression you are using is ERE(extended regular expressions) you have to add -E option to sed as follows to allow it, since you are using character classes in your code [[:space:]].
sed -E 's/[[:space:]]*\.\.\.[[:space:]]*/.../g' Input_file
Without -E try:
sed 's/ *\.\.\. */.../g' Input_file
Here is another sed
echo "hello ... world" | sed -E 's/ +(\.\.\.) +/\1/g'
hello...world
4 dots, do nothing?
echo "hello .... world" | sed -E 's/ +(\.\.\.) +/\1/g'
hello .... world
In bash, just use parameter substitution...
foo="hello ... world"
foo="${foo//+( )...+( )/...}"
Now, echo "$foo", outputs:
hello...world
The syntax for BaSH regex variable substitution are as follows:
${var-name/search/replace}
A single /replaces only the first occurrence from the left, while a double //replaces every occurrence.
One of ?*+#! followed by (pattern-list) replaces a specified number of occurrences of the patterns in pattern-list as follows:
? Zero or one occurrence
* Zero or more occurrences
+ One or more occurrences
# A single occurence
! Anything that *doesn't* match one of the occurrences
Pattern list can be any combination of literal strings, or character classes, separated by the pipe character |

Replacing one space with two spaces in Unix

I am trying to replace every time there is one space with two spaces in Unix. We are just reading from standard input and writing to standard ouput. I also have to avoid using the functions awk and perl. For example if I read in something like San Diego it should print San Diego. If there are already multiple spaces, it should just leave them alone.
How about bash only? First test file:
$ cat file
1
2 3
4 5
San Diego NO
Then:
$ cat file |
while IFS= read line
do
while [[ "$line" =~ (^|.+[^ ])\ ([^ ].*) ]]
do
line="${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
done
echo "$line"
done
1
2 3
4 5
San Diego NO
You have to a bit careful here not to forget spaces at the beginning or end.
I present three solutions for educational purpose:
sed 's/\(^\|[^ ]\) \($\|[^ ]\)/\1 \2/g' # solution 1
sed 's/\( \+\)/ \1/g;s/ \( \+\)/\1/g' # solution 2
sed 's/ \( \+\)/\1/g;s/\( \+\)/ \1/g' # solution 3
All three solutions make use of subexpressions:
9.3.6 BREs Matching Multiple Characters
A subexpression can be defined within a BRE by enclosing it between
the character pairs \( and \). Such a subexpression shall match
whatever it would have matched without the \( and \), except that
anchoring within subexpressions is optional behavior; see BRE
Expression Anchoring. Subexpressions can be arbitrarily nested.
The back-reference expression '\n' shall match the same (possibly
empty) string of characters as was matched by a subexpression enclosed
between "\(" and "\)" preceding the '\n'. The character n shall be a
digit from 1 through 9, specifying the nth subexpression (the one that
begins with the nth \( from the beginning of the pattern and ends
with the corresponding paired \) ). The expression is invalid if
less than n subexpressions precede the \n. For example, the
expression ".∗\1$" matches a line consisting of two adjacent
appearances of the same string, and the expression a*\1 fails to
match a. When the referenced subexpression matched more than one
string, the back-referenced expression shall refer to the last matched
string. If the subexpression referenced by the back-reference matches
more than one string because of an asterisk (*) or an interval
expression (see item (5)), the back-reference shall match the last
(rightmost) of these strings.
Solution 1: sed 's/\(^\|[^ ]\) \($\|[^ ]\)/\1 \2/g'
Here there are two subexpressions. The first subexpression \(^\|[^ ]\) matches the beginning of the line (^) or (\|) a non-space character ([^ ]). The second subexpression \($\|[^ ]\) is similar but with the end-of-line ($).
Solution 2: sed 's/\( \+\)/ \1/g;s/ \( \+\)/\1/g'
This replaces one-or more spaces by the same amount of spaces and an extra one. Afterwards we correct the ones with 3 spaces or more by removing a single space from those.
Solution 3: sed 's/ \( \+\)/\1/g;s/\( \+\)/ \1/g'
This does the same thing as solution 2 but inverts the logic. First remove a space from all sequences that have more then one space, and afterwards add a space. This one-liner is just one-character shorter then solution 2.
Example: based on solution 1
The following commands are nothing more then echo "string" | sed ..., but to show the spaces, wrapped into a printf statement.
# default string
$ printf "|%s|" " foo bar car "
| foo bar car |
# spaces replaced
$ printf "|%s|" "$(echo " foo bar car " | sed 's/\(^\|[^ ]\) \($\|[^ ]\)/\1 \2/g')"
| foo bar car |
# 3 spaces in front and back
$ printf "|%s|" "$(echo " foo bar car " | sed 's/\(^\|[^ ]\) \($\|[^ ]\)/\1 \2/g')"
| foo bar car |
note: If you want to replace any form of blanks (spaces and tabs in any encoding) by the same doubled blank, you could use :
sed 's/\(^\|[^[:blank:]]\)\([[:blank:]]\)\($\|[^[:blank:]]\)/\1\2\2\3/g'
sed 's/\(^\|[[:graph:]]\)\([[:blank:]]\)\($\|[[:graph:]]\)/\1\2\2\3/g
Something along the lines of
cat input.txt | sed 's,\([[:alnum:]]\) \([[:alnum:]]\),\1 \2,'
should work for that purpose.
replace only occurrence of 1 space between 2 chars hat are not white space with 2 spaces
`sed 's/\([^ ]\) \([^ ]\)/\1 \2/g' file`
1) [^ ] - not space char
2) \1 \2 - first expression found in Parenthesis, 2 spaces, second Parentheses expiration
3) sed used with s///g is replacing the regex in the first // with the value in the second //

Reverse four length of letters with sed in unix

How can I reverse a four length of letters with sed?
For example:
the year was 1815.
Reverse to:
the raey was 5181.
This is my attempt:
cat filename | sed's/\([a-z]*\) *\([a-z]*\)/\2, \1/'
But it does not work as I intended.
not sure it is possible to do it with GNU sed for all cases. If _ doesn't occur immediately before/after four letter words, you can use
sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
\b is word boundary, word definition being any alphabet or digit or underscore character. So \b will ensure to match only whole words not part of words
$ echo 'the year was 1815.' | sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
the raey was 5181.
$ echo 'two time five three six good' | sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
two emit evif three six doog
$ # but won't work if there are underscores around the words
$ echo '_good food' | sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
_good doof
tool with lookaround support would work for all cases
$ echo '_good food' | perl -pe 's/(?<![a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])(?!=[a-z0-9])/$4$3$2$1/gi'
_doog doof
(?<![a-z0-9]) and (?!=[a-z0-9]) are negative lookbehind and negative lookahead respectively
Can be shortened to
perl -pe 's/(?<![a-z0-9])[a-z0-9]{4}(?!=[a-z0-9])/reverse $&/gie'
which uses the e modifier to place Perl code in substitution section. This form is suitable to easily change length of words to be reversed
Possible shortest sed solution even if a four length of letters contains _s.
sed -r 's/\<(.)(.)(.)(.)\>/\4\3\2\1/g'
Following awk may help you in same. Tested this in GNU awk and only with provided sample Input_file
echo "the year was 1815." |
awk '
function reverse(val){
num=split(val, array,"");
i=array[num]=="."?num-1:num;
for(;i>q;i--){
var=var?var array[i]:array[i]
};
printf (array[num]=="."?var".":var);
var=""
}
{
for(j=1;j<=NF;j++){
printf("%s%s",j==NF||j==2?reverse($j):$j,j==NF?RS:FS)
}}'
This might work for you (GNU sed):
sed -r '/\<\w{4}\>/!b;s//\n&\n/g;s/^[^\n]/\n&/;:a;/\n\n/!s/(.*\n)([^\n])(.*\n)/\2\1\3/;ta;s/^([^\n]*)(.*)\n\n/\2\1/;ta;s/\n//' file
If there are no strings of the length required to reverse, bail out.
Prepend and append newlines to all required strings.
Insert a newline at the start of the pattern space (PS). The PS is divided into two parts, the first line will contain the current word being reversed. The remainder will contain the original line.
Each character of the word to be reversed is inserted at the front of the first line and removed from the original line. When all the characters in the word have been processed, the original word will have gone and only the bordering newlines will exist. These double newlines are then replaced by the word in the first line and the process is repeated until all words have been processed. Finally the newline introduced to separate the working line and the original is removed and the PS is printed.
N.B. This method may be used to reverse strings of varying string length i.e. by changing the first regexp strings of any number can be reversed. Also strings between two lengths may also be reversed e.g. /\<w{2,4}\>/ will change all words between 2 and 4 character length.
It's a recurrent problem so somebody created a bash command called "rev".
echo "$(echo the | rev) $(echo year | rev) $(echo was | rev) $(echo 1815 | rev)".
OR
echo "the year was 1815." | rev | tr ' ' '\n' | tac | tr '\n' ' '

Search and replace with multiple occurences of same pattern

I want to replace the following
word-3.4.4-r0-20170804_101145
second-example-3.4.4-r0-20170804_101145
and-third-example-3.4.4-r0-20170804_101145
I'm looking to replace the hyphen that occurs before the first 3 with a colon e.g.
word:3.4.4-r0-20170804_101145
second-example:3.4.4-r0-20170804_101145
and-third-example:3.4.4-r0-20170804_101145
So far, the closest I can get is
newvar=$(echo "$var" | sed 's/-[0-9]/:/')
but this solution replaces -3 with:
word:.4.4-r0-20170804_101145
second-example:.4.4-r0-20170804_101145
and-third-example:.4.4-r0-20170804_101145
You are almost there. Just use captured group to retain the digit:
newvar=$(sed 's/-\([0-9]\)/:\1/' <<< "$var")
Result:
echo "$newvar"
word:3.4.4-r0-20170804_101145
second-example:3.4.4-r0-20170804_101145
and-third-example:3.4.4-r0-20170804_101145
You may use
sed 's/^\([^0-9-]*\(-[^0-9-]*\)*\)-\([0-9]\)/\1:\3/'
See the online demo
Details
^ - start of a line
\([^0-9-]*\(-[^0-9-]*\)*\) - Group 1:
[^0-9-]* - any 0+ chars other than digits and -
\(-[^0-9-]*\)* - (Group 2) 0+ sequences of - and any 0+ chars other than digits and -
- - a hyphen
\([0-9]\) - Group 3
The \1 is the backreference to Group 1 contents and \3 is the backreference to Group 3 contents.
newvar=$(echo "$var" | sed -r 's/-([0-9])/:\1/')
Very similar to other answers, but I prefer to use the -r option to sed, this allows constructing a sed String that is easier to read as it has fewer \ to parse. Like the other answers, the digit is capture by ([0-9]) and then replaced by \1.

Resources