Using sed, how to put spaces around the numbers in a big string - bash

This question was asked in an interview and I could not answer it, so I'm asking here to understand the logic, i.e. how to put a space between a number string and a character string.
Given the string "1abc2abcd3efghi10z11jkl100pqrs", what command would you use to get the following result?
"1 abc 2 abcd 3 efghi 10 z 11 jkl 100 pqrs"
Thanks in advance.

Here is another, simple way to think about it:
echo "1abc2abcd3efghi10z11jkl100pqrs" | \
sed -r 's/([0-9])([a-zA-Z])/\1 \2/g; s/([a-zA-Z])([0-9])/\1 \2/g'
Add a space inside every digit-letter pair and every letter-digit pair.
() captures a group, and \1 and \2 insert the first and second captured groups into the replacement.
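The same two-substitution idea also works without -r, using POSIX basic regular expressions where the capture groups are written \( \). This is a minimal sketch; the function name space_digits_bre is just an illustrative wrapper, not part of the answer above.

```sh
# Hypothetical wrapper: split digit-letter and letter-digit boundaries
# with POSIX BRE capture groups (no -r / -E needed).
space_digits_bre() {
  printf '%s\n' "$1" |
    sed 's/\([0-9]\)\([a-zA-Z]\)/\1 \2/g; s/\([a-zA-Z]\)\([0-9]\)/\1 \2/g'
}

space_digits_bre "1abc2abcd3efghi10z11jkl100pqrs"
# prints: 1 abc 2 abcd 3 efghi 10 z 11 jkl 100 pqrs
```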

With GNU sed:
$ echo "1abc2abcd3efghi10z11jkl100pqrs" | sed -e 's/[0-9]\+/ & /g' -e 's/^ \| $//g'
1 abc 2 abcd 3 efghi 10 z 11 jkl 100 pqrs
With awk:
$ echo "1abc2abcd3efghi10z11jkl100pqrs" | awk '{gsub(/[0-9]+/," & ",$0); $1=$1}1'
1 abc 2 abcd 3 efghi 10 z 11 jkl 100 pqrs
gsub substitutes every run of digits with itself surrounded by spaces.
$1=$1 forces awk to rebuild the whole record, joining the fields with OFS (a single space by default); this also drops the leading and trailing spaces.
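A small demonstration of the $1=$1 trick on its own, using a made-up input with extra blanks, shows what "rebuilding the record" means:

```sh
# Without an assignment, $0 is printed exactly as read.
printf '  1  abc 2 \n' | awk '{ print "[" $0 "]" }'
# prints: [  1  abc 2 ]

# Assigning $1 = $1 rebuilds $0 from the fields, joined with OFS.
printf '  1  abc 2 \n' | awk '{ $1 = $1; print "[" $0 "]" }'
# prints: [1 abc 2]

# With a different OFS, the rebuilt record uses that separator instead.
printf '  1  abc 2 \n' | awk -v OFS='-' '{ $1 = $1; print }'
# prints: 1-abc-2
```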

I would have chosen sed over awk:
echo "1abc2abcd3efghi10z11jkl100pqrs" | sed 's/[0-9]\+/ & /g; s/^[ ]//; s/[ ]$//'
It surrounds each run of digits with spaces and afterwards removes the (possibly) leading and trailing ones.
It yields:
1 abc 2 abcd 3 efghi 10 z 11 jkl 100 pqrs

echo 1abc2abcd3efghi10z11jkl100pqrs | \
sed -r -e 's/([[:digit:]]+)/ \1 /g' -e 's/^ *//g' -e 's/ *$//g'
Take the expression -e 's/([[:digit:]]+)/ \1 /g' first.
The parentheses around [[:digit:]]+ 'capture' each sequence of one or more digits. Since it's the first capture group, it's referenced in the substitution by \1 (then there's the space before and after:  \1 ).
The g tells sed to perform this substitution 'globally' on the input.
The -r before the expression tells sed to use extended regular expressions.
The other two 'expressions' (each expression has -e before it to show that it's an expression):
-e 's/^ *//g' will remove leading whitespace, and -e 's/ *$//g' will remove trailing whitespace.
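If -r/-E or \+ are not available, the surround-and-trim approach can be written in plain POSIX BRE, where [0-9][0-9]* stands in for "one or more digits". A sketch, with space_digits_trim as a hypothetical function name:

```sh
# Hypothetical wrapper: surround each run of digits with spaces,
# then trim leading and trailing blanks (POSIX BRE only).
space_digits_trim() {
  printf '%s\n' "$1" | sed -e 's/[0-9][0-9]*/ & /g' -e 's/^ *//' -e 's/ *$//'
}

space_digits_trim "1abc2abcd3efghi10z11jkl100pqrs"
# prints: 1 abc 2 abcd 3 efghi 10 z 11 jkl 100 pqrs
space_digits_trim "abc12"
# prints: abc 12
```

The g flag is dropped from the two trimming expressions because an anchored pattern can match at most once per line anyway.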

Using perl:
echo 1abc2abcd3efghi10z11jkl100pqrs | perl -F'(\d+)' -ane \
'$F[0] and print "@F" or print "@F[1..$#F]"'
Some explanation:
-an together tells Perl to split each line of input and put the resulting fields into the array @F.
-F specifies a delimiter of one or more digits to use with -an to split the input. The parentheses cause the delimiters themselves to be stored in the array, not just the strings they separate.
-e specifies the code to run for each line read. We simply want to print the contents of @F; interpolating an array into a double-quoted string joins its elements with the list separator $" (a space by default). The line keeps its trailing newline, so the last field already ends the output line. The and...or combination is used to skip the first field if it is empty, as it will be if the input line starts with a delimiter.

Related

POSIX: abcdef to ab bc cd de ef

Using POSIX sed or awk, I would like to duplicate every second character in every pair of neighboring characters and list every newly-formed pair on a new line.
example.txt:
abcd 10001.
Expected result:
ab
bc
cd
d
1
10
00
00
01
1.
So far, this is what I have (N.B. omit "--posix" if on macOS). For some reason, adding a literal newline character before \2 does not produce the expected result. Removing the first group and using \1 has the same effect. What am I missing?
sed --posix -E -e 's/(.)(.)/&\2\
/g' example.txt
abb
cdd
100
000
1..
Try:
$ echo "abcd 10001." | awk '{for(i=1;i<length($0);i++) print substr($0,i,2)}'
ab
bc
cd
d
1
10
00
00
01
1.
You may use
sed --posix -e 's/./&\
&/g' example.txt | sed '1d;$d'
The first sed command finds every char in the string and replaces it with the same char, then a newline, then the same char again. Since the first and last chars are doubled too, the first and last resulting lines contain a single char and must be removed, which is achieved with sed '1d;$d' (this assumes a one-line file, since 1d and $d act on the whole stream).
Had sed supported lookarounds, one could have used (?!^).(?!$) (any char that is not at the start or end of the string) and the second sed command would not have been necessary, but that is not possible with sed. You could use it in perl though: perl -pe 's/(?!^).(?!$)/$&\n$&/g' example.txt ($& in the RHS is the same as the & placeholder in sed, the whole match value).
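To see why the second sed is needed, here is the first command from the answer above run on its own against the shorter string abc:

```sh
# Each char c becomes "c<newline>c", so the stream starts and ends
# with a line holding a single char.
printf 'abc\n' | sed 's/./&\
&/g'
# prints four lines: a, ab, bc, c

# Dropping the first and last line leaves only the pairs.
printf 'abc\n' | sed 's/./&\
&/g' | sed '1d;$d'
# prints two lines: ab, bc
```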
With GNU awk, could you please try the following. Written and tested with the shown samples; you can also try it at:
https://ideone.com/qahp0S
awk '
BEGIN{
FS=""
}
{
for(i=1;i<=(NF-1);i++){
print $i$(i+1)
}
}
' Input_file
Explanation: the field separator is set to the empty string in the BEGIN section, so every character becomes its own field (GNU awk supports this, while POSIX leaves an empty FS unspecified). Then, in the main program, a for loop runs from the 1st field to the second-to-last field, and each iteration prints the current and the next field.
Using the same routine, it can be done in bash itself:
s='abcd 10001.'
for((i=0; i<${#s}-1; i++)); do echo "${s:i:2}"; done
ab
bc
cd
d
1
10
00
00
01
1.
Just for fun, a single sed consisting of 3 substitutions:
$ echo "abcd 10001." | sed 's/./&&/g;s/\(^.\|.$\)//g;s/../&\n/g'
The first part duplicates all characters, the second part removes the first and last character, the third part adds a newline character after each character-pair.
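To watch the three substitutions of the one-liner above at work, here they are applied cumulatively to the shorter string abc (GNU sed, since \| and \n are GNU extensions):

```sh
printf 'abc\n' | sed 's/./&&/g'                      # aabbcc
printf 'abc\n' | sed 's/./&&/g;s/\(^.\|.$\)//g'      # abbc
printf 'abc\n' | sed 's/./&&/g;s/\(^.\|.$\)//g;s/../&\n/g'
# prints ab and bc on separate lines, followed by an empty line
```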
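If you want to be closer to POSIX compliance, you have to split the second substitution in two:
$ echo "abcd 10001." | sed -e 's/./&&/g' -e 's/^.//' -e 's/.$//' -e 's/../&\n/g'
Here we had to add an extra expression because \(^.\|.$\) relies on \|, which is a GNU extension: POSIX BREs have no alternation operator. Strictly speaking, \n in the replacement is also a GNU extension; a fully portable script writes a backslash followed by a literal newline instead.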
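A sketch of the fully portable form, with the newline in the last replacement written as backslash plus a literal newline; pairs_posix is a hypothetical function name used only for illustration:

```sh
# Hypothetical wrapper: duplicate every char, drop the first and last,
# then break the result into pairs (POSIX sed only).
pairs_posix() {
  printf '%s\n' "$1" | sed -e 's/./&&/g' -e 's/^.//' -e 's/.$//' -e 's/../&\
/g'
}

pairs_posix 'abcd'
# prints ab, bc, cd on separate lines, plus a trailing empty line
```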
This might work for you (GNU sed):
sed 's/.\(.\)/&\n\1/;/../P;D' file
Replace the first two characters by the first two characters, a newline and the second character.
Print the first line if it is two characters long, delete the first line and repeat.
An alternative, more long-winded:
sed -E ':a;s/^(([^\n]{2}\n)*[^\n])([^\n])([^\n])/\1\3\n\3\4/;ta' file
Or, with no hardcoded new line:
sed -E '/.../{G;s/^(.(.))(.*)(.)/\1\4\2\3/;P;D}' file
Lastly:
sed 's/./&\n&/g;s/^..\|..$//g' file
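The first one-liner of that answer (s/.\(.\)/&\n\1/;/../P;D) can also be written for POSIX sed by replacing \n with a backslash and a literal newline and putting the commands on separate lines. A sketch, with pairs_pd as a hypothetical function name:

```sh
# Hypothetical wrapper around the P/D loop:
#  s   - after the first two chars, insert a newline and repeat the second char
#  P   - print up to the first newline, if at least two chars remain
#  D   - delete up to that newline and restart without reading new input
pairs_pd() {
  printf '%s\n' "$1" | sed 's/.\(.\)/&\
\1/
/../P
D'
}

pairs_pd 'abcd'
# prints: ab, bc, cd (one per line)
```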
Process substitution isn't specified by POSIX, but the POSIX requirement was only stated for awk and sed, so maybe the next solution is acceptable:
paste -d '\0' <(echo; fold -w1 example.txt) <(fold -w1 example.txt) | grep ..
or
while IFS= read -r -n1 ch; do
printf "%s\n%s" "${ch}" "${ch}"
done < example.txt | grep ..
or
sed 's/./&&/g;s/.//' example.txt | grep -o ..

unix sed substitute nth occurrence misfunction?

Let's say I have a string which contains multiple occurrences of the letter Z.
For example: aaZbbZccZ.
I want to print parts of that string, each time up to the next occurrence of Z:
aaZ
aaZbbZ
aaZbbZccZ
So I tried using unix sed for this, with the command sed s/Z.*/Z/i where i is an index that I have running from 1 to the number of Z's in the string. As far as my sed understanding goes: this should delete everything that comes after the i'th Z, But in practice this only works when I have i=1 as in sed s/Z.*/Z/, but not as I increment i, as in sed s/Z.*/Z/2 for example, where it just prints the entire original string. It feels as if there's something I am missing about the functioning of sed, since according to multiple manuals, it should work.
edit: for example, when applying sed s/Z.*/Z/2 to the string aaZbbZccZ, I expect to get aaZbbZ, as everything after the 2nd occurrence of Z gets deleted.
The sed below comes close to what you are looking for: it deletes everything from the i-th Z onwards (including that Z) and then appends a Z. Combining a number with the g flag, as in 1g, is a GNU sed extension.
$ echo aaZbbZccZdd | sed -e 's/Z[^Z]*//1g;s/$/Z/'
aaZ
$ echo aaZbbZccZdd | sed -e 's/Z[^Z]*//2g;s/$/Z/'
aaZbbZ
$ echo aaZbbZccZdd | sed -e 's/Z[^Z]*//3g;s/$/Z/'
aaZbbZccZ
$ echo aaZbbZccZdd | sed -e 's/Z[^Z]*//4g;s/$/Z/'
aaZbbZccZddZ
Edit:
Modified according to Aaron's suggestion.
Edit2:
If you don't know how many Zs there are in the string, it's safer to use the command below; otherwise an additional Z is appended at the end when the index exceeds the number of Zs.
-r - enables extended regular expressions
-e - separates sed operations, the same as ; but easier to read in my opinion.
$ echo aaZbbZccZddZ | sed -r -e 's/Z[^Z]*//1g' -e 's/([^Z])$/\1Z/'
aaZ
$ echo aaZbbZccZddZ | sed -r -e 's/Z[^Z]*//2g' -e 's/([^Z])$/\1Z/'
aaZbbZ
$ echo aaZbbZccZddZ | sed -r -e 's/Z[^Z]*//3g' -e 's/([^Z])$/\1Z/'
aaZbbZccZ
$ echo aaZbbZccZddZ | sed -r -e 's/Z[^Z]*//4g' -e 's/([^Z])$/\1Z/'
aaZbbZccZddZ
$ echo aaZbbZccZddZ | sed -r -e 's/Z[^Z]*//5g' -e 's/([^Z])$/\1Z/'
aaZbbZccZddZ
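As for why s/Z.*/Z/2 prints the whole string in the question: the first match of Z.* already runs from the first Z to the end of the line, so there is never a second match to replace. A minimal sketch of a different technique, which marks the i-th Z with a newline and then deletes everything from that newline on (upto_nth_z is a hypothetical function name):

```sh
# Hypothetical helper: print $1 up to and including its $2-th "Z".
# If there are fewer than $2 Zs, nothing is marked and the input is
# printed unchanged.
upto_nth_z() {
  printf '%s\n' "$1" | sed "s/Z/&\\
/$2
s/\\n.*//"
}

for i in 1 2 3; do
  upto_nth_z 'aaZbbZccZ' "$i"
done
# prints: aaZ, aaZbbZ, aaZbbZccZ (one per line)
```

Because the newline is written as backslash plus a literal newline and \n is only used in the regex, this should also run on non-GNU seds.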
This should do what you expect (see comments) unless your string can contain line breaks:
# -n will prevent default printing
echo 'aaZbbZccZ' | sed -n '{
# Add a line break after each 'Z'
s/Z/Z\
/g
# Print it and consume it in the next sed command
p
}' | sed -n '{
# Put only the first line in the hold buffer (you can remove this if you don't mind seeing a blank first line)
1 {
h
}
# As for the rest of the lines
2,$ {
# Swap the pattern space with the hold buffer (the lines collected so far)
x
# Remove all the line breaks
s/\n//g
# Print the result
p
# Swap back, so the current line is in the pattern space again
x
# And append it, after a newline, to the hold buffer
H
}'
The idea is to break the string into multiple lines (each terminated with Z), which are then processed one by one by the second sed command.
On the second sed we use the Hold Buffer to remember previous lines, print the aggregated result, append new lines and each time remove the line breaks we previously added.
And the output is
aaZ
aaZbbZ
aaZbbZccZ
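The same "grow a prefix one Z at a time" idea can also be sketched in awk instead of sed (a different tool, and z_prefixes is a hypothetical function name): split on Z and print the accumulated prefix after each complete field.

```sh
# Hypothetical helper: print every prefix of $1 that ends in a Z.
z_prefixes() {
  printf '%s\n' "$1" | awk -F'Z' '{
    s = ""
    for (i = 1; i < NF; i++) {   # the last field is what follows the final Z
      s = s $i "Z"
      print s
    }
  }'
}

z_prefixes 'aaZbbZccZ'
# prints: aaZ, aaZbbZ, aaZbbZccZ (one per line)
```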
This might work for you (GNU sed):
sed -n 's/Z/&\n/g;:a;/\n/P;s/\n\(.*Z\)/\1/;ta' file
Use sed's grep-like -n option so that output is only printed explicitly. Append a newline after each Z; if there were no substitutions, there is nothing to be done. Print up to the first newline, remove that newline if the following characters contain a Z, and repeat.

Search first occurrence and print until next delimiter, but match whole word only

I have a file with multiple lines of text similar to:
foo
1
2
3
bar
fool
1
2
3
bar
food
1
2
3
bar
So far, the following comes closest:
sed -n '/foo/,/bar/ p' file.txt | sed -e '$d'
...but it fails by introducing duplicates if it encounters words like "food" or "fool". I want to make the code above do a whole word match only (i.e. grep -w), but inserting the \b switch doesn't seem to work:
sed -n '/foo/\b,/bar/ p' file.txt | sed -e '$d'
I would like to print anything after "foo" (including the first foo) up until "bar", but matching only "foo", and not "foo1".
Use the Regex tokens ^ and $ to indicate the start and end of a line respectively:
sed -n '/^foo$/,/^bar$/ p' file.txt
Or, with GNU sed's word-boundary anchors \< and \>:
sed -n '/\<foo\>/,/\<bar\>/ p' file.txt
Or maybe this, if foo and bar have to be the first word of a line:
sed -n '/^\<foo\>/,/^\<bar\>/ p' file
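Running the anchored range on the sample data from the question shows that the fool and food blocks are skipped:

```sh
# Only the block that starts with a line consisting of exactly "foo" is printed.
printf '%s\n' foo 1 2 3 bar fool 1 2 3 bar food 1 2 3 bar |
  sed -n '/^foo$/,/^bar$/p'
# prints: foo, 1, 2, 3, bar (one per line)
```

Appending | sed '$d', as in the question, drops the closing bar line.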

Replace the last six spaces with comma

How would I replace the last six spaces on each line of a text file with commas, using bash?
I have:
$ cat myfile
foo bar foo 6 1 3 23 1 20
foo bar 6 1 2 18 1 15
foo 5 5 0 15 1 21
What I want is:
$ cat myfile
foo bar foo,6,1,3,23,1,20
foo bar,6,1,2,18,1,15
foo,5,5,0,15,1,21
Any help is appreciated! Thanks!
It looks like the rule could be to replace any space that precedes a digit with a comma:
sed 's/ \([0-9]\)/,\1/g' file
Alternatively, following your specification (replace the last six spaces), you could go for something like this:
awk '{for(i=1; i<=NF; ++i)printf "%s%s", $i, (i<NF-6?FS:(i<NF?",":RS))}' file
This loops through the fields in the input, printing each one followed by either a space (FS), a comma or a newline (RS), depending on how close it is to the end of the line.
A more complete sed solution, with rev (reverse) added, might be:
rev myfile | sed 's/ /,/; s/ /,/; s/ /,/; s/ /,/; s/ /,/; s/ /,/' | rev
The sed part, which replaces the first six spaces of the reversed line, can of course be simplified if needed!
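The same awk loop can take the count as a variable instead of hard-coding 6. A sketch, where last_n_to_commas and the awk variable n are illustrative names; it relies on FS being the default single space so that printing FS outputs a space:

```sh
# Hypothetical helper: turn the last $1 field separators of each line into commas.
last_n_to_commas() {
  awk -v n="$1" '{
    for (i = 1; i <= NF; ++i)
      printf "%s%s", $i, (i < NF - n ? FS : (i < NF ? "," : RS))
  }'
}

printf '%s\n' 'foo bar foo 6 1 3 23 1 20' 'foo bar 6 1 2 18 1 15' 'foo 5 5 0 15 1 21' |
  last_n_to_commas 6
# prints:
# foo bar foo,6,1,3,23,1,20
# foo bar,6,1,2,18,1,15
# foo,5,5,0,15,1,21
```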
This might work for you (GNU sed):
sed -r ':a;s/(.*) /\1,/;x;s/^/x/;/^x{6}/{z;x;b};x;ba' file
This uses greedy matching to find the last space on a line, and keeps track of the number of spaces replaced with a counter in the hold space (the z command, which empties the pattern space, is a GNU extension).
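Without GNU's z or a counter, the same greedy trick can simply be repeated: \(.*\) always grabs as much as possible, so each substitution converts the last remaining space, and six copies convert the last six. A portable sketch (last_six_to_commas_bre is a hypothetical name):

```sh
# Hypothetical helper: six identical greedy substitutions, each turning
# the last remaining space on the line into a comma (POSIX BRE).
last_six_to_commas_bre() {
  sed 's/\(.*\) /\1,/;s/\(.*\) /\1,/;s/\(.*\) /\1,/;s/\(.*\) /\1,/;s/\(.*\) /\1,/;s/\(.*\) /\1,/'
}

printf '%s\n' 'foo bar foo 6 1 3 23 1 20' 'foo bar 6 1 2 18 1 15' 'foo 5 5 0 15 1 21' |
  last_six_to_commas_bre
# prints:
# foo bar foo,6,1,3,23,1,20
# foo bar,6,1,2,18,1,15
# foo,5,5,0,15,1,21
```

A line with fewer than six spaces simply gets all of its spaces converted.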

sed help - convert a string of form ABC_DEF_GHI to AbcDefGhi

How can I convert a string of the form ABC_DEF_GHI to AbcDefGhi using a command-line tool such as sed?
Here's a one-liner using gawk:
echo ABC_DEF_GHI | gawk 'function cap(s){return toupper(substr(s,1,1))tolower(substr(s,2))}{n=split($0,x,"_");for(i=1;i<=n;i++)o=o cap(x[i]); print o}'
AbcDefGhi
Optimized awk 1-liner
awk -v RS=_ '{printf "%s%s", substr($0,1,1), tolower(substr($0,2))}'
Optimized sed 1-liner (GNU sed; it assumes exactly three segments of three letters each):
sed 's/\(.\)\(..\)_\(.\)\(..\)_\(.\)\(..\)/\1\L\2\U\3\L\4\U\5\L\6/'
Edit:
Here's a gawk version:
gawk -F_ '{for (i=1;i<=NF;i++) printf "%s%s",substr($i,1,1),tolower(substr($i,2)); printf "\n"}'
Original:
Using sed for this is pretty scary:
sed -r 'h;s/(^|_)./\n/g;y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/;x;s/((^|_)(.))[^_]*/\3\n/g;G;:a;s/(^.*)([^\n])\n\n(.*)\n([^\n]*)$/\1\n\2\4\3/;ta;s/\n//g'
Here it is broken down:
# make a copy in hold space
h;
# replace all the characters which will remain upper case with newlines
s/(^|_)./\n/g;
# lowercase all the remaining characters
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/;
# swap the copy into pattern space and the lowercase characters into hold space
x;
# discard all but the characters which will remain upper case
s/((^|_)(.))[^_]*/\3\n/g;
# append the lower case characters to the end of pattern space
G;
# top of the loop
:a;
# shuffle the lower case characters back into their proper positions (see below)
s/(^.*)([^\n])\n\n(.*)\n([^\n]*)$/\1\n\2\4\3/;
# if a replacement was made, branch to the top of the loop
ta;
# remove all the newlines
s/\n//g
Here's how the shuffle works:
At the time it starts, this is what pattern space looks like:
A
D
G
bc
ef
hi
The shuffle loop picks up the string that's between the last newline and the end and moves it to the position before the two consecutive newlines (actually three) and moves the extra newline so it's before the character that it previously followed.
After the first step through the loop, this is what pattern space looks like:
A
D
Ghi
bc
ef
And processing proceeds similarly until there's nothing before the extra newline at which point the match fails and the loop branch is not taken.
If you want to title case a sequence of words separated by spaces, the script would be similar:
$ echo 'BEST MOVIE THIS YEAR' | sed -r 'h;s/(^| )./\n/g;y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/;x;s/((^| ).)[^ ]*/\1\n/g;G;:a;s/(^.*)( [^\n]*)\n\n(.*)\n([^\n]*)$/\1\n\2\4\3/;ta;s/^([^\n]*)(.*)\n([^\n]*)$/\1\3\2/;s/\n//g'
Best Movie This Year
One liner using perl:
$ echo 'ABC_DEF_GHI' | perl -pe 's/([A-Z])([^_]+)_?/$1\L$2\E/g;'
AbcDefGhi
This might work for you:
echo "ABC_DEF_GHI" |
sed 'h;s/\(.\)[^_]*\(_\|$\)/\1/g;x;y/'$(printf "%s" {A..Z} / {a..z})'/;G;:a;s/\(\(^[a-z]\)\|_\([a-z]\)\)\([^\n]*\n\)\(.\)/\5\4/;ta;s/\n//'
AbcDefGhi
Or using GNU sed:
echo "ABC_DEF_GHI" | sed 's/\([A-Z]\)\([^_]*\)\(_\|$\)/\1\L\2/g'
AbcDefGhi
Less scary sed version with tr (note that it lowercases every letter, printing abcdefghi rather than AbcDefGhi):
echo ABC_DEF_GHI | sed -e 's/_//g' - | tr 'A-Z' 'a-z'

Resources