Parenthesize first character of each word - shell

I want to parenthesize the first character of each word:
$ echo "Welcome To The Geek Stuff" | sed 's/\(\b[A-Z]\)/\(\1\)/g'
Can anyone explain how this works? I don't understand it.

sed 's/pattern1/pattern2/' --- replaces the first occurrence of pattern1 on each line with pattern2.
sed 's/pattern1/pattern2/g' --- a (g)lobal replacement: replaces every occurrence of pattern1 with pattern2.
sed 's/\b\(pattern1\)/pattern2/g' --- \b anchors the match at a word boundary, and \( \) captures whatever pattern1 matched.
sed 's/\b\([A-Z]\)/pattern2/g' --- [A-Z] matches a single uppercase letter.
sed 's/\b\([A-Z]\)/(\1)/g' --- replaces that letter with the same letter wrapped in parentheses.
\1 is a back-reference to the text captured by the first \( \) group. See the GNU sed manual: https://www.gnu.org/software/sed/manual/html_node/Back_002dreferences-and-Subexpressions.html
In short: it does a global replacement (all occurrences) of every uppercase letter that starts a word with the same letter in parentheses.
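As a minimal sketch of the explanation above, the same substitution can be written in both regex dialects. This assumes GNU sed, since \b is a GNU extension; the variable names are only illustrative.

```shell
# Assumes GNU sed: \b (word boundary) is a GNU extension, not POSIX.
input="Welcome To The Geek Stuff"

# BRE (default): grouping parentheses are escaped as \( \);
# in the replacement, plain ( and ) are literal characters.
bre_out=$(printf '%s\n' "$input" | sed 's/\b\([A-Z]\)/(\1)/g')

# ERE (-E): grouping parentheses are written bare in the pattern.
ere_out=$(printf '%s\n' "$input" | sed -E 's/\b([A-Z])/(\1)/g')

printf '%s\n%s\n' "$bre_out" "$ere_out"
```

Both commands should print (W)elcome (T)o (T)he (G)eek (S)tuff; the only difference is how the capture group is spelled.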

With unescaped parentheses, I need to use sed -E to get that working:
$ echo "Welcome To The Geek Stuff" | sed 's/(\b[A-Z])/(\1)/g'
sed: -e expression #1, char 18: invalid reference \1 on `s' command's RHS
$ echo "Welcome To The Geek Stuff" | sed -E 's/(\<.)/(\1)/g'
(W)elcome (T)o (T)he (G)eek (S)tuff
You could also use the \< anchor, which means "start of word", whereas \b is any word boundary. With the start-of-word anchor you can simplify the regex to match the first character of any word, and & inserts the whole match, so no capture group is needed:
$ echo "Welcome To The Geek Stuff 123" | sed -E 's/\<./(&)/g'
(W)elcome (T)o (T)he (G)eek (S)tuff (1)23

You could do this:
echo "Welcome To The Our Class" | sed 's/\([A-Z]\)/\(\1\)/g'
(with the "\b" removed; note that this parenthesizes every capital letter, not only the first letter of a word, so it works here only because each word contains a single capital)
The expression between the first "/" and the second is replaced by the expression between the second "/" and the third.
You search the sentence for a capital letter (a character between A and Z) and add "(" before it and ")" after it. \1 refers to the text captured by the first \( \) group, which here is that letter.
The output will be:
(W)elcome (T)o (T)he (O)ur (C)lass

To parenthesize the first 3 letters of each word, you can use
$ echo "the quick brown fox jumps over a lazy dog" | sed -E 's/\b[a-zA-Z]{1,3}/(&)/g'
(the) (qui)ck (bro)wn (fox) (jum)ps (ove)r (a) (laz)y (dog)

Related

How to convert a line into camel case?

This picks all the text on a single line after a pattern match and converts it to camel case, using non-alphanumeric characters as separators and removing the spaces at the beginning and end of the resulting string. Two questions: (1) it doesn't replace correctly when there are 2 consecutive non-alphanumeric characters, e.g. "2, " in the example below; (2) is there a way to do everything with sed alone instead of using grep, cut, sed and tr?
$ echo " hello
world
title: this is-the_test string with number 2, to-test CAMEL String
end! " | grep -o 'title:.*' | cut -f2 -d: | sed -r 's/([^[:alnum:]])([0-9a-zA-Z])/\U\2/g' | tr -d ' '
ThisIsTheTestStringWithNumber2,ToTestCAMELString
To answer your first question, change [^[:alnum:]] to [^[:alnum:]]+ to match one or more non-alnum chars.
You may combine all the commands into a GNU sed solution like
sed -En '/.*title: *(.*[[:alnum:]]).*/{s//\1/;s/([^[:alnum:]]+|^)([0-9a-zA-Z])/\U\2/gp}'
Details
-En - POSIX ERE syntax is on (E) and default line output is suppressed with n
/.*title: *(.*[[:alnum:]]).*/ - matches a line having title: capturing all after it up to the last alnum char into Group 1 and matching the rest of the line
{s//\1/;s/([^[:alnum:]]+|^)([0-9a-zA-Z])/\U\2/gp} - if the line is matched,
s//\1/ - remove all but Group 1 pattern (received above)
s/([^[:alnum:]]+|^)([0-9a-zA-Z])/\U\2/ - match and capture start of string or 1+ non-alnum chars into Group 1 (with ([^[:alnum:]]+|^)) and then capture an alnum char into Group 2 (with ([0-9a-zA-Z])) and replace with uppercased Group 2 contents (with \U\2).
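A minimal sketch of the combined command applied to the sample input from the question. It assumes GNU sed (\U in the replacement is a GNU extension), and the function name to_camel is hypothetical.

```shell
# Assumes GNU sed: \U (uppercase the rest of the replacement) is a GNU extension.
to_camel() {
  sed -En '/.*title: *(.*[[:alnum:]]).*/{s//\1/;s/([^[:alnum:]]+|^)([0-9a-zA-Z])/\U\2/gp}'
}

# Same four input lines as in the question; only the title: line is printed.
camel=$(printf '%s\n' \
  ' hello' \
  'world' \
  'title: this is-the_test string with number 2, to-test CAMEL String' \
  'end! ' | to_camel)

printf '%s\n' "$camel"
```

With the + quantifier in place, the ", " after the 2 is consumed as a single separator, so the comma no longer survives in the output.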

How to remove all whitespace in front of and behind 3 consecutive periods

I'm trying to remove all whitespace before and after 3 consecutive periods and replace it with the actual ellipsis symbol.
I've tried the following code:
sed 's/[[:space:]]*\.\.\.[[:space:]]*/…/g'
It replaces the 3 periods with the ellipsis symbol, but the spaces before and after remain.
Sample Input.
hello ... world
Desired output
hello…world
Your expression is valid both as a BRE and as an ERE (bracket expressions with POSIX classes such as [[:space:]] work in either), so the -E option is optional here. With it, the command looks like this:
sed -E 's/[[:space:]]*\.\.\.[[:space:]]*/.../g' Input_file
Without -E try:
sed 's/ *\.\.\. */.../g' Input_file
Here is another sed
echo "hello ... world" | sed -E 's/ +(\.\.\.) +/\1/g'
hello...world
With 4 dots it does nothing:
echo "hello .... world" | sed -E 's/ +(\.\.\.) +/\1/g'
hello .... world
In bash, just use parameter substitution with extended globs (enable them first with shopt -s extglob):
shopt -s extglob
foo="hello ... world"
foo="${foo//+( )...+( )/...}"
Now echo "$foo" outputs:
hello...world
The syntax for bash pattern substitution (these are glob patterns, not regular expressions) is as follows:
${var-name/search/replace}
A single / replaces only the first occurrence from the left, while a double // replaces every occurrence.
With extglob enabled, one of ?*+@! followed by (pattern-list) matches a specified number of occurrences of the patterns in pattern-list, as follows:
? Zero or one occurrence
* Zero or more occurrences
+ One or more occurrences
@ Exactly one occurrence
! Anything that *doesn't* match one of the patterns
The pattern list can be any combination of literal strings or character classes, separated by the pipe character |
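A short sketch of the difference between the +( ) and *( ) operators, assuming bash with extglob enabled; the variable names are illustrative only.

```shell
# bash only: extglob must be enabled before the patterns below are used.
shopt -s extglob

spaced="hello   ...   world"
half="hello... world"

# +( ) requires at least one space on each side of the three dots.
tight=$(printf '%s' "${spaced//+( )...+( )/...}")

# *( ) also accepts zero spaces, so a missing space on one side still matches.
tight_half=$(printf '%s' "${half//*( )...*( )/...}")

printf '%s\n%s\n' "$tight" "$tight_half"
```

In a glob pattern the dot is a literal character, so the three dots need no escaping here, unlike in the sed versions.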

Replacing one space with two spaces in Unix

I am trying to replace every single space with two spaces in Unix. We just read from standard input and write to standard output, and I have to avoid using awk and perl. For example, if I read in something like San Diego (one space), it should print San  Diego (two spaces). If there are already multiple spaces, it should just leave them alone.
How about bash only? First test file:
$ cat file
1
2 3
4 5
San Diego NO
Then:
$ cat file |
while IFS= read -r line
do
    while [[ "$line" =~ (^|.*[^ ])\ ([^ ].*) ]]
    do
        line="${BASH_REMATCH[1]}  ${BASH_REMATCH[2]}"
    done
    echo "$line"
done
1
2 3
4 5
San Diego NO
You have to be a bit careful here not to forget spaces at the beginning or end.
I present three solutions for educational purpose:
sed 's/\(^\|[^ ]\) \($\|[^ ]\)/\1  \2/g' # solution 1
sed 's/\( \+\)/ \1/g;s/ \(  \+\)/\1/g' # solution 2
sed 's/ \( \+\)/\1/g;s/\( \+\)/ \1/g' # solution 3
All three solutions make use of subexpressions:
9.3.6 BREs Matching Multiple Characters
A subexpression can be defined within a BRE by enclosing it between
the character pairs \( and \). Such a subexpression shall match
whatever it would have matched without the \( and \), except that
anchoring within subexpressions is optional behavior; see BRE
Expression Anchoring. Subexpressions can be arbitrarily nested.
The back-reference expression '\n' shall match the same (possibly
empty) string of characters as was matched by a subexpression enclosed
between "\(" and "\)" preceding the '\n'. The character n shall be a
digit from 1 through 9, specifying the nth subexpression (the one that
begins with the nth \( from the beginning of the pattern and ends
with the corresponding paired \) ). The expression is invalid if
less than n subexpressions precede the \n. For example, the
expression "^\(.*\)\1$" matches a line consisting of two adjacent
appearances of the same string, and the expression "\(a\)*\1" fails to
match 'a'. When the referenced subexpression matched more than one
string, the back-referenced expression shall refer to the last matched
string. If the subexpression referenced by the back-reference matches
more than one string because of an asterisk (*) or an interval
expression (see item (5)), the back-reference shall match the last
(rightmost) of these strings.
Solution 1: sed 's/\(^\|[^ ]\) \($\|[^ ]\)/\1  \2/g'
Here there are two subexpressions. The first subexpression \(^\|[^ ]\) matches the beginning of the line (^) or (\|) a non-space character ([^ ]). The second subexpression \($\|[^ ]\) is similar but with the end-of-line ($).
Solution 2: sed 's/\( \+\)/ \1/g;s/ \(  \+\)/\1/g'
This replaces one or more spaces by the same number of spaces plus an extra one. Afterwards we correct the runs of 3 spaces or more by removing a single space from those.
Solution 3: sed 's/ \( \+\)/\1/g;s/\( \+\)/ \1/g'
This does the same thing as solution 2 but inverts the logic. First remove a space from all sequences that have more than one space, and afterwards add a space. This one-liner is just one character shorter than solution 2.
Example: based on solution 1
The following commands are nothing more than echo "string" | sed ..., wrapped in a printf statement to make the spaces visible.
# default string
$ printf "|%s|" " foo bar car "
| foo bar car |
# spaces replaced
$ printf "|%s|" "$(echo " foo bar car " | sed 's/\(^\|[^ ]\) \($\|[^ ]\)/\1  \2/g')"
|  foo  bar  car  |
# 3 spaces in front and back
$ printf "|%s|" "$(echo "   foo bar car   " | sed 's/\(^\|[^ ]\) \($\|[^ ]\)/\1  \2/g')"
|   foo  bar  car   |
note: If you want to replace any form of blanks (spaces and tabs in any encoding) by the same doubled blank, you could use :
sed 's/\(^\|[^[:blank:]]\)\([[:blank:]]\)\($\|[^[:blank:]]\)/\1\2\2\3/g'
sed 's/\(^\|[[:graph:]]\)\([[:blank:]]\)\($\|[[:graph:]]\)/\1\2\2\3/g'
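A minimal sketch of solution 3 wrapped in a function, assuming GNU sed (\+ in a BRE is a GNU extension); the function name double_single_spaces is hypothetical. The second input uses one-letter words, a case where a single-pass pattern that consumes the neighbouring characters (as solution 1 does) cannot match two adjacent gaps in one go.

```shell
# Assumes GNU sed: \+ in a basic regular expression is a GNU extension.
double_single_spaces() {
  # Pass 1: shrink every run of 2+ spaces by one space.
  # Pass 2: grow every run of spaces by one space.
  # Net effect: single spaces become double, longer runs are unchanged.
  sed 's/ \( \+\)/\1/g;s/\( \+\)/ \1/g'
}

out1=$(printf '%s\n' 'San Diego' | double_single_spaces)
out2=$(printf '%s\n' 'a b c  d' | double_single_spaces)

printf '|%s|\n|%s|\n' "$out1" "$out2"
```

Because neither pass needs to look at the non-space characters around a run, each gap is handled independently of its neighbours.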
Something along the lines of
cat input.txt | sed 's,\([[:alnum:]]\) \([[:alnum:]]\),\1  \2,g'
should work for that purpose.
Replace only a single space between two non-whitespace characters with two spaces:
sed 's/\([^ ]\) \([^ ]\)/\1  \2/g' file
1) [^ ] - any character that is not a space
2) \1  \2 - the first parenthesized expression, two spaces, then the second parenthesized expression
3) sed with s///g replaces every match of the regex between the first pair of slashes with the text between the second pair

Adding zero to part of string using sed

I have SNMP outputs like:
IP-MIB::ipNetToMediaPhysAddress.5122.192.19.3.25 = STRING: 34:8:4:56:f4:70
As you can see, the MAC address output is incorrect, and I fix it with sed:
echo IP-MIB::ipNetToMediaPhysAddress.5122.192.19.3.25 = STRING: 34:8:4:56:f4:70 |
sed -e 's/\b\(\w\)\b/0\1/g'
Output:
IP-MIB::ipNetToMediaPhysAddress.5122.192.19.03.25 = STRING: 34:08:04:56:f4:70
It fixes address but changes IP as well from 192.19.3.25 to 192.19.03.25. How can I avoid it and force to perform sed only after STRING: or only after last space in the string ?
The MAC address is colon-separated. You can use that to limit the substitutions. This will perform the substitutions that you are interested in but only if the word character is next to a colon:
sed -e 's/\b\w:/0&/g; s/:\(\w\)\b/:0\1/g'
For example:
$ echo IP-MIB::ipNetToMediaPhysAddress.5122.192.19.3.25 = STRING: 34:8:4:56:f4:70 | sed -e 's/\b\w:/0&/g; s/:\(\w\)\b/:0\1/g'
IP-MIB::ipNetToMediaPhysAddress.5122.192.19.3.25 = STRING: 34:08:04:56:f4:70
How it works
s/\b\w:/0&/g
This performs the substitution if the word character is preceded by a word break, \b, and followed by a colon. Since we just need to put a zero in front of the entire matched text, not just some section of it, we can omit the parens and just use & to copy the matched text.
s/:\(\w\)\b/:0\1/g
If there are any remaining substitutions that need to be done where the word character is preceded by a colon and followed by a word break, this does them.
Note: We are using GNU extensions that may not be portable.
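As a small sketch, the two substitutions above can be wrapped in a function and checked against both sample lines from this thread. This assumes GNU sed (\b and \w are GNU extensions); the name pad_mac is hypothetical.

```shell
# Assumes GNU sed: \b and \w are GNU extensions.
pad_mac() {
  # 1st command: pad a lone character that is followed by a colon.
  # 2nd command: pad a lone character that is preceded by a colon.
  sed -e 's/\b\w:/0&/g; s/:\(\w\)\b/:0\1/g'
}

line1='IP-MIB::ipNetToMediaPhysAddress.5122.192.19.3.25 = STRING: 34:8:4:56:f4:70'
line2='IP-MIB::ipNetToMediaPhysAddress.5122.192.19.3.25 = STRING: 4:8:d:56:f4:7'

fixed1=$(printf '%s\n' "$line1" | pad_mac)
fixed2=$(printf '%s\n' "$line2" | pad_mac)

printf '%s\n%s\n' "$fixed1" "$fixed2"
```

The dotted IP address is left alone in both cases because none of its single digits sits next to a colon.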
Another way with sed, if the MAC address is at the end of the line:
echo IP-MIB::ipNetToMediaPhysAddress.5122.192.19.3.25 = STRING: 4:8:d:56:f4:7 |
sed -E '
s/$/:/
:A
s/([ :])([[:xdigit:]]:)/\10\2/
tA
s/:$//'

I'm puzzled here about awk, sed, etc

I've been trying for a while to work this out, with no success so far.
I have a command output that I need to massage to make it suitable for further processing.
The text I have is:
1/2 [3] (27/03/2012 19:32:54) word word word word 4/5
What I need is to extract only the numbers 1/2 [3] 4/5 so it will look:
1 2 3 4 5
So, basically I was trying to exclude all characters that are not digits, like "/", "[", "]", etc.
I tried awk with FS, tried using regexp, but none of my tries were successful.
I would then add something to it like
first:1 second:2 third:3 .... etc
Please keep in mind that I'm talking about a file that contains a lot of lines with the same structure, but I already thought about using awk to sum every column with
awk '{sum1+=$1 ; sum2+=$2 ;......etc} END {print "first:"sum1 " second:"sum2.....etc}'
But first I need to extract only the relevant numbers.
The date between the "( )" can be omitted completely, but it consists of numbers too, so filtering merely by digits won't be enough, as it would match them as well.
I hope you can help me out.
Thanks in advance!
This: sed -r 's/[(][^)]*[)]/ /g; s/[^0-9]+/ /g' should work. It makes two passes, removing parenthesized expressions first and then replacing all runs of non-digits with single spaces.
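A minimal sketch that feeds the two sed passes above into the per-column awk sum described in the question. The function name sum_columns is hypothetical, the second input line is the extra sample used elsewhere in this thread, and -E is used as the equivalent of -r.

```shell
sum_columns() {
  # Pass 1 drops the parenthesized date/time; pass 2 turns every run of
  # non-digits into one space, leaving five whitespace-separated numbers.
  sed -E 's/[(][^)]*[)]/ /g; s/[^0-9]+/ /g' |
    awk '{ for (i = 1; i <= NF; i++) sum[i] += $i }
         END { printf "first:%d second:%d third:%d fourth:%d fifth:%d\n",
                      sum[1], sum[2], sum[3], sum[4], sum[5] }'
}

totals=$(printf '%s\n' \
  '1/2 [3] (27/03/2012 19:32:54) word word word word 4/5' \
  '10/20 [30] (27/03/2012 19:32:54) word word 40/50' | sum_columns)

printf '%s\n' "$totals"
```

Removing the parenthesized part first matters: otherwise the digits of the date and time would show up as extra columns.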
You can do something like sed -e 's/(.*)//' -e 's/[^0-9]/ /g'. It deletes everything inside the round brackets, then substitutes every non-digit character with a space. To get rid of the extra spaces you can feed it to column -t:
$ echo '1/2 [3] (27/03/2012 19:32:54) word word word word 4/5' | sed -e 's/(.*)//' -e 's/[^0-9]/ /g' | column -t
1 2 3 4 5
TXR:
@(collect)
@one/@two [@three] (@date @time) @(skip :greedy) @four/@five
@(filter :tonumber one two three four five)
@(end)
@(bind (first second third fourth fifth)
@(mapcar (op apply +) (list one two three four five)))
@(output)
first:@first second:@second third:@third fourth:@fourth fifth:@fifth
@(end)
data:
1/2 [3] (27/03/2012 19:32:54) word word word word 4/5
10/20 [30] (27/03/2012 19:32:54) word word 40/50
run:
$ txr data.txr data.txt
first:11 second:22 third:33 fourth:44 fifth:55
Easy to add some error checking:
@(collect)
@ (cases)
@one/@two [@three] (@date @time) @(skip :greedy) @four/@five
@ (or)
@line
@ (throw error `badly formatted line: @line`)
@ (end)
@ (filter :tonumber one two three four five)
@(end)
@(bind (first second third fourth fifth)
@(mapcar (op apply +) (list one two three four five)))
@(output)
first:@first second:@second third:@third fourth:@fourth fifth:@fifth
@(end)
$ txr data.txr -
foo bar junk
txr: unhandled exception of type error:
txr: ("badly formatted line: foo bar junk")
Aborted
TXR is for robust programming. There is strong typing, so you can't treat strings as numbers just because they contain digits. Variables have to be bound before use, and so misspelled variables do not silently default to zero or blank, but rather produce an unbound variable <name> in <file>:<line> type error. Text extraction is performed with lots of specific context to guard against misinterpreting input in one format as being in another format.
see below, if it is what you want:
kent$ echo "1/2 [3] (27/03/2012 19:32:54) word word word word 4/5"|sed -r 's/\([^)]*\)//g; s/[^0-9]/ /g'
1 2 3 4 5
if you want it to look better:
kent$ echo "1/2 [3] (27/03/2012 19:32:54) word word word word 4/5"|sed -r 's/\([^)]*\)//g; s/[^0-9]/ /g;s/  */ /g'
1 2 3 4 5
awk '{ first+=gensub("^([0-9]+)/.*","\\1","g",$0)
second+=gensub("^[0-9]+/([0-9]+) .*","\\1","g",$0)
third+=gensub("^[0-9]+/[0-9]+ \\[([0-9]+)\\].*","\\1","g",$0)
fourth+=gensub("^.* ([0-9]+)/[0-9]+ *$","\\1","g",$0)
fifth+=gensub("^.* [0-9]+/([0-9]+) *$","\\1","g",$0)
}
END { print "first: " first " second: " second " third: " third " fourth: " fourth " fifth: " fifth
}' file
Might work for you (gensub requires GNU awk).
This will give you digits extracted out excluding text in parenthesis:
digits=$(echo '1/2 [3] (27/03/2012 19:32:54) word word word word 4/5' |\
sed 's/(.*)//' | grep -o '[0-9][0-9]*')
echo $digits
or pure sed solution:
echo '1/2 [3] (27/03/2012 19:32:54) word word word word 4/5' |\
sed -e 's/(.*)//' -e 's/[^0-9]/ /g' -e 's/[ \t][ \t]*/ /g'
OUTPUT:
1 2 3 4 5
one pass with awk is sufficient if you set a fancy field separator: any one of slash, space, open bracket or close bracket separates a field:
awk -F '[][/ ]' '
{s1+=$1; s2+=$2; s3+=$4; s4+=$(NF-1); s5+=$NF}
END {printf("first:%d second:%d third:%d fourth:%d fifth:%d\n", s1, s2, s3, s4, s5)}
'

Resources