Given a string and its substring, how can I add whitespace before the substring using a Unix shell?

Assume the substring is unique, for example, given string,
"123 main streetHuntington, WV"
How can I locate "Huntington" and add a space before it?
"123 main street Huntington, WV"

If you are parsing arbitrary addresses, this is a very hard problem.
Assuming you're not just trying to put a space in that specific string, you might want to add a space before an uppercase letter that is preceded by a lower case letter.
sed -r 's/([[:lower:]])([[:upper:]])/\1 \2/g'
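A quick check of that expression, as a sketch assuming a sed that supports -E (GNU and BSD sed both do). split_case is a hypothetical helper name, and the second sample address is made up to show a false positive:

```shell
#!/usr/bin/env bash
# split_case is a hypothetical helper, not part of any tool.
# It inserts a space wherever a lowercase letter is directly followed by an uppercase one.
split_case() {
  sed -E 's/([[:lower:]])([[:upper:]])/\1 \2/g'
}

echo "123 main streetHuntington, WV" | split_case   # 123 main street Huntington, WV
echo "45 McDonald StreetBoston, MA" | split_case    # 45 Mc Donald Street Boston, MA
```

Names with internal capitals are split too, so this heuristic needs care on real address data.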

# perl
echo "123 main streetHuntington, WV" | perl -ne 's/(Huntington)/ $1/;print'
# sed
echo "123 main streetHuntington, WV" | sed 's/\(Huntington\)/ \1/'
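If the substring is held in a variable, bash parameter expansion can do the same job without an external tool. This is only a sketch assuming bash; add_space_before is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# add_space_before is a hypothetical helper: prints $1 with a space
# inserted before the first occurrence of the substring $2.
add_space_before() {
  local str=$1 sub=$2 out
  # Quoting "$sub" in the pattern makes bash match it literally, not as a glob.
  out=${str/"$sub"/ $sub}
  printf '%s\n' "$out"
}

add_space_before "123 main streetHuntington, WV" "Huntington"
```

Only the first occurrence is changed, which fits the stated assumption that the substring is unique.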

Related

How to remove all whitespace before and after 3 consecutive periods

I'm trying to remove all whitespace before and after 3 consecutive periods and replace the periods with the actual ellipsis symbol.
I've tried the following code:
sed 's/[[:space:]]*\.\.\.[[:space:]]*/…/g'
It replaces the 3 periods with the ellipsis symbol, but the spaces before and after remain.
Sample Input.
hello ... world
Desired output
hello…world
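Character classes such as [[:space:]] are valid in both basic (BRE) and extended (ERE) regular expressions, so this expression does not need the -E option, although adding it is harmless. Note that the replacement here is three plain periods rather than the ellipsis character:
sed -E 's/[[:space:]]*\.\.\.[[:space:]]*/.../g' Input_file
To match plain spaces only, without the character class, try: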
sed 's/ *\.\.\. */.../g' Input_file
Here is another sed approach, which requires at least one space on each side of the periods:
echo "hello ... world" | sed -E 's/ +(\.\.\.) +/\1/g'
hello...world
With 4 dots, it does nothing:
echo "hello .... world" | sed -E 's/ +(\.\.\.) +/\1/g'
hello .... world
In bash, just use parameter expansion with an extended glob pattern (this needs the extglob shell option):
shopt -s extglob
foo="hello ... world"
foo="${foo//+( )...+( )/...}"
Now, echo "$foo", outputs:
hello...world
The syntax for bash pattern substitution (it uses glob patterns, not regular expressions) is as follows:
${var-name/search/replace}
A single / replaces only the first occurrence from the left, while a double // replaces every occurrence.
With extglob enabled, one of ?*+@! followed by (pattern-list) matches the patterns in pattern-list as follows:
? Zero or one occurrence
* Zero or more occurrences
+ One or more occurrences
@ Exactly one occurrence
! Anything that *doesn't* match one of the patterns
The pattern list can be any combination of literal strings, glob patterns, or character classes, separated by the pipe character |
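Putting those rules together for the original question, here is a minimal sketch assuming bash with extglob. tighten_ellipsis is a hypothetical helper name, and *( ) is used instead of +( ) so that periods with no surrounding whitespace are converted as well:

```shell
#!/usr/bin/env bash
# extglob must be enabled for *( ) and +( ) to act as pattern operators.
shopt -s extglob

# tighten_ellipsis is a hypothetical helper name.
tighten_ellipsis() {
  local s=$1
  # *([[:space:]]) matches zero or more whitespace characters;
  # the three dots are literal because "." is not special in a glob.
  s=${s//*([[:space:]])...*([[:space:]])/…}
  printf '%s\n' "$s"
}

tighten_ellipsis "hello ... world"
tighten_ellipsis "wait...   what"
```

Because the pattern is a glob and not a regex, the periods need no escaping.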

Word after a particular word in a string

I have a string, e.g. ab_abc_bbb_ccc_ssss_pppp, and I want the word after ccc, i.e. ssss. How can I get it with a Unix command?
Do you mean something like this?
echo "ab_abc_bbb_ccc_ssss_pppp" | sed 's/.*ccc_\([^_]*\).*/\1/'
Explanation:
s/ # substitute
.*ccc_ # match everything up to and including the last "ccc_"
\([^_]*\) # capture the following run of non-'_' characters as \1
.*/ # match the rest of the line
\1/ # replace the whole line with \1
output
ssss
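The same extraction can be sketched with plain parameter expansion, assuming a POSIX-style shell such as bash. The variable names rest and word are just illustrative:

```shell
#!/usr/bin/env bash
s="ab_abc_bbb_ccc_ssss_pppp"

rest=${s#*ccc_}    # drop everything through the first "ccc_"  -> ssss_pppp
word=${rest%%_*}   # drop everything from the next "_" onward   -> ssss

printf '%s\n' "$word"
```

Unlike the greedy .* in the sed version, ${s#*ccc_} stops at the first "ccc_". If "ccc_" does not occur at all, rest is the unchanged string, so check for that case when the input is not guaranteed.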

Adding zero to part of string using sed

I have SNMP outputs like:
IP-MIB::ipNetToMediaPhysAddress.5122.192.19.3.25 = STRING: 34:8:4:56:f4:70
As you can see, the MAC address output is incorrect (single-digit octets are not zero-padded), so I fix it with sed:
echo IP-MIB::ipNetToMediaPhysAddress.5122.192.19.3.25 = STRING: 34:8:4:56:f4:70 |
sed -e 's/\b\(\w\)\b/0\1/g'
Output:
IP-MIB::ipNetToMediaPhysAddress.5122.192.19.03.25 = STRING: 34:08:04:56:f4:70
It fixes the address but also changes the IP from 192.19.3.25 to 192.19.03.25. How can I avoid that and make sed act only after STRING: or only after the last space in the string?
The MAC address is colon-separated. You can use that to limit the substitutions. This will perform the substitutions that you are interested in but only if the word character is next to a colon:
sed -e 's/\b\w:/0&/g; s/:\(\w\)\b/:0\1/g'
For example:
$ echo IP-MIB::ipNetToMediaPhysAddress.5122.192.19.3.25 = STRING: 34:8:4:56:f4:70 | sed -e 's/\b\w:/0&/g; s/:\(\w\)\b/:0\1/g'
IP-MIB::ipNetToMediaPhysAddress.5122.192.19.3.25 = STRING: 34:08:04:56:f4:70
How it works
s/\b\w:/0&/g
This performs the substitution if the word character is preceded by a word break, \b, and followed by a colon. Since we just need to put a zero in front of the entire matched text, not just some section of it, we can omit the parens and just use & to copy the matched text.
s/:\(\w\)\b/:0\1/g
If there are any remaining substitutions that need to be done where the word character is preceded by a colon and followed by a word break, this does them.
Note: \b and \w are GNU extensions that may not be portable.
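Another way with sed, if the MAC address is at the end of the line. The character before the single digit is restricted to a space or a colon so that the hex letter B in IP-MIB:: is not padded as well:
echo IP-MIB::ipNetToMediaPhysAddress.5122.192.19.3.25 = STRING: 4:8:d:56:f4:7 |
sed -E '
s/$/:/
:A
s/([ :])([[:xdigit:]]:)/\10\2/
tA
s/:$//'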
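To act strictly on the part after "STRING: ", as the question asks, the line can also be split first. The following is a sketch assuming bash; pad_mac is a hypothetical helper, and it relies on printf '%02x', so uppercase hex digits come out in lowercase:

```shell
#!/usr/bin/env bash
# pad_mac is a hypothetical helper: zero-pads every colon-separated hex field.
pad_mac() {
  local IFS=: parts padded=() p
  read -r -a parts <<< "$1"               # split on ":" because IFS is ":"
  for p in "${parts[@]}"; do
    padded+=("$(printf '%02x' "0x$p")")   # 8 -> 08, f4 -> f4
  done
  printf '%s\n' "${padded[*]}"            # "${arr[*]}" joins on the first IFS character
}

line='IP-MIB::ipNetToMediaPhysAddress.5122.192.19.3.25 = STRING: 34:8:4:56:f4:70'
marker='STRING: '
head=${line%%"$marker"*}   # everything before the marker
mac=${line#*"$marker"}     # everything after the marker

printf '%s%s%s\n' "$head" "$marker" "$(pad_mac "$mac")"
```

The IP address in the first half is never passed through the padding step, so it cannot be altered.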

Reverse four-character words with sed in Unix

How can I reverse every word that is four characters long with sed?
For example:
the year was 1815.
Reverse to:
the raey was 5181.
This is my attempt:
cat filename | sed 's/\([a-z]*\) *\([a-z]*\)/\2, \1/'
But it does not work as I intended.
I am not sure it is possible to do this with GNU sed for all cases. If _ doesn't occur immediately before or after the four-character words, you can use
sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
\b is a word boundary, where a word character is any letter, digit, or underscore. So \b ensures that only whole words are matched, not parts of longer words.
$ echo 'the year was 1815.' | sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
the raey was 5181.
$ echo 'two time five three six good' | sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
two emit evif three six doog
$ # but won't work if there are underscores around the words
$ echo '_good food' | sed -E 's/\b([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])\b/\4\3\2\1/gi'
_good doof
A tool with lookaround support would work for all cases:
$ echo '_good food' | perl -pe 's/(?<![a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])([a-z0-9])(?![a-z0-9])/$4$3$2$1/gi'
_doog doof
(?<![a-z0-9]) and (?![a-z0-9]) are negative lookbehind and negative lookahead respectively.
This can be shortened to
perl -pe 's/(?<![a-z0-9])[a-z0-9]{4}(?![a-z0-9])/reverse $&/gie'
which uses the e modifier to run Perl code in the replacement section. This form makes it easy to change the length of the words to be reversed.
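The idea of making the word length a parameter can also be sketched with sed by generating the capture groups in a loop. reverse_words_of_len is a hypothetical helper. It assumes bash and GNU sed (for \b), works for lengths 1 to 9 because sed has only nine back-references, and shares the underscore limitation of the \b-based sed answer above:

```shell
#!/usr/bin/env bash
# reverse_words_of_len is a hypothetical helper: reads stdin and reverses
# every alphanumeric word that is exactly $1 characters long (1 <= $1 <= 9).
reverse_words_of_len() {
  local n=$1 pat='' rep='' i
  for ((i = 1; i <= n; i++)); do
    pat+='([[:alnum:]])'   # one capture group per character
    rep="\\$i$rep"         # prepend, so the groups come out in reverse order
  done
  sed -E "s/\\b${pat}\\b/${rep}/g"
}

echo 'the year was 1815.' | reverse_words_of_len 4   # the raey was 5181.
echo 'the year was 1815.' | reverse_words_of_len 3   # eht year saw 1815.
```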
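Possibly the shortest sed solution, which also works when a four-character word contains _ characters. \w is used rather than . so that a match cannot span spaces, as in "I am":
sed -r 's/\<(\w)(\w)(\w)(\w)\>/\4\3\2\1/g'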
The following awk script may help. It was tested only with GNU awk and only on the sample input shown; note that it hard-codes the second and the last field as the words to reverse.
echo "the year was 1815." |
awk '
function reverse(val){
num=split(val, array,"");
i=array[num]=="."?num-1:num;
for(;i>q;i--){
var=var?var array[i]:array[i]
};
printf (array[num]=="."?var".":var);
var=""
}
{
for(j=1;j<=NF;j++){
printf("%s%s",j==NF||j==2?reverse($j):$j,j==NF?RS:FS)
}}'
This might work for you (GNU sed):
sed -r '/\<\w{4}\>/!b;s//\n&\n/g;s/^[^\n]/\n&/;:a;/\n\n/!s/(.*\n)([^\n])(.*\n)/\2\1\3/;ta;s/^([^\n]*)(.*)\n\n/\2\1/;ta;s/\n//' file
If there are no strings of the length required to reverse, bail out.
Prepend and append newlines to all required strings.
Insert a newline at the start of the pattern space (PS). The PS is divided into two parts, the first line will contain the current word being reversed. The remainder will contain the original line.
Each character of the word to be reversed is inserted at the front of the first line and removed from the original line. When all the characters in the word have been processed, the original word will have gone and only the bordering newlines will exist. These double newlines are then replaced by the word in the first line and the process is repeated until all words have been processed. Finally the newline introduced to separate the working line and the original is removed and the PS is printed.
N.B. This method may be used to reverse strings of any length: by changing the first regexp, strings of any number of characters can be reversed. Strings between two lengths may also be reversed, e.g. /\<\w{2,4}\>/ will change all words between 2 and 4 characters long.
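Reversing text is a recurring problem, so there is a standard utility for it called "rev" (a separate program from util-linux, not a bash builtin). Note that both commands below reverse every word, not only the four-character ones.
echo "$(echo the | rev) $(echo year | rev) $(echo was | rev) $(echo 1815 | rev)"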
OR
echo "the year was 1815." | rev | tr ' ' '\n' | tac | tr '\n' ' '

Why does one less space in the regex make my sed go weird?

Here is an example of some regex I am trying to figure out. The goal is to strip out extra spaces and make it only one space between words via sed. The sample given has three spaces between sdf and sdk:
test#ubuntu:~/addr_book_script$ echo "est sdf   sdk" | sed 's/  */ /g'
est sdf sdk
test#ubuntu:~/addr_book_script$ echo "est sdf   sdk" | sed 's/ */ /g'
e s t s d f s d k
You will notice that the two sed statements differ only in the number of spaces before the *. The first statement has two spaces and behaves exactly as I wanted.
The second statement has one space before the * and sticks a space between every letter and word.
I know the * means any number of occurrences of whatever-it-is-that-I-am-looking-for. What I don't understand is why the one space sed replace behaves the way it does.
Thanks
sed 's/ */ /g'
The regex " *" (a space followed by *) matches 0 or more occurrences of a space.
At the start of the string, a zero-length match is found and replaced by a single space.
After the first letter, another zero-length match is found and replaced by a single space, and so forth.
After est, a run of one or more spaces is matched and replaced by a single space.
And so forth.
Another example:
~ >>> echo "est sdf sdk" | sed 's/a*/ /g'
e s t s d f s d k
The replacements occur because of zero-length matches.
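A visible replacement character makes the zero-length matches easier to see. This is a small sketch with a made-up input, using x as a character that never occurs in it:

```shell
#!/usr/bin/env bash
# x* can match the empty string, so it matches at every position.
printf 'abc\n' | sed 's/x*/-/g'    # -a-b-c-

# xx* needs at least one x, so nothing matches and the line is unchanged.
printf 'abc\n' | sed 's/xx*/-/g'   # abc
```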
" *" (space-star) in a regex means 0 or more occurrences of a space, so it also matches the empty string between any two characters and replaces each such match with a space
"  *" (space-space-star) forces there to be at least one space
" +" (space-plus) would accomplish the same thing in some regular expression flavors, such as ERE with sed -E, but not in BRE, where the POSIX spelling is " \{1,\}" (GNU sed also accepts " \+")
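The equivalent spellings can be compared side by side. This is a sketch using the sample string with three spaces between sdf and sdk; the tr -s line is an extra alternative that the answers above do not mention:

```shell
#!/usr/bin/env bash
s='est sdf   sdk'

printf '%s\n' "$s" | sed 's/  */ /g'       # BRE: a space followed by zero or more spaces
printf '%s\n' "$s" | sed 's/ \{1,\}/ /g'   # BRE: interval expression, one or more spaces
printf '%s\n' "$s" | sed -E 's/ +/ /g'     # ERE: one or more spaces
printf '%s\n' "$s" | tr -s ' '             # tr: squeeze repeated spaces
```

Each command prints est sdf sdk.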
