Shell scripting: replace characters using sed

In shell scripting, I want to replace 1bc1034gf45dna22 (16 characters total) with 1b:c1:03:4g:f4:5d:na:22 (pairs separated by colons) using sed.
I have tried
sed 's/\w{2}/\w{2}:/g' a.txt > b.txt
where a.txt has
1bc1034gf45dna22

Too Many Edge Cases
Sed is not an ideal tool for this job, because there are just too many edge cases. If you have very regular data and a known corpus then that's fine, but in some cases you need something more powerful that can act on pieces of a line independently after matching/capturing the text of interest. For example, given an input file containing:
foo 1bc1034gf45dna22 bar
you might need to extract the second column, transform it, and then substitute it in place. You might do this with Ruby as follows:
$ echo 'foo 1bc1034gf45dna22 bar' |
ruby -pe '
if /(?<str>\p{Alnum}{16})/ =~ $_
$_.sub!(str, str.scan(/../).join(?:))
end'
This correctly yields:
foo 1b:c1:03:4g:f4:5d:na:22 bar

Does
sed -e 's/../&:/g' -e 's/:$//' a.txt > b.txt
or
sed -e 's/\(..\)/\1:/g' -e 's/:$//' a.txt > b.txt
work for you?
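As a sketch of the two-expression approach above, here it is wrapped in a shell function; `colonize` is a hypothetical name. It uses only POSIX sed features (no `\w`, no `-r`), so it should also run with non-GNU seds:

```shell
# colonize: hypothetical helper. The first expression appends ':' after every
# pair of characters; the second removes the trailing ':' that an even-length
# line ends up with.
colonize() {
    printf '%s\n' "$1" | sed -e 's/../&:/g' -e 's/:$//'
}

colonize 1bc1034gf45dna22    # prints 1b:c1:03:4g:f4:5d:na:22
```

For an odd-length string the last character is left on its own (abc becomes ab:c).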

This might work for you (GNU sed):
sed 's/\w\w\B/&:/g' file

Why not sed 's/1bc1034gf45dna22/1b:c1:03:4g:f4:5d:na:22/g' file?

You could:
sed -i 's/\(\w\{2\}\)/\1:/g' file
This would also append a : to the end of the string. Since we cannot use positive lookahead in our sed regex syntax, we would then need to:
sed -i 's/:$//g' file
Notice that we used in-place editing (-i), so the file itself will be modified.
Alternatively, Perl supports lookahead, so a single substitution is enough: perl -pe 's/(\w{2})(?=\w{2})/$1:/g' file

With GNU sed on any GNU/Linux distro you can use extended regular expressions (-r). This converts a string of any length into one where each pair of characters is separated by a colon:
echo 1bc1034gf45dna22 | sed -re 's/(\w{2})/\1:/g' -e 's/:$//'
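On your AIX platform, you can use awk instead:
echo 1bc1034gf45dna22 | awk '
BEGIN { FS = ""; ORS = "" }      # one field per character; print adds no newline
{
    for (i = 1; i <= NF; i++) {
        if (i > 1 && i % 2 == 1)     # before the 3rd, 5th, 7th, ... character
            print ":"
        print $i
    }
    print "\n"                       # ORS is empty, so end each line explicitly
}'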
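An empty FS is not required by POSIX, so the character splitting above may not work in every awk. A sketch that relies only on `length()` and `substr()` (both POSIX) builds the result pair by pair; `pairs_awk` is a hypothetical name:

```shell
# pairs_awk: hypothetical helper. Takes the string as $1 and prints it with a
# ':' between every pair of characters, without needing FS="".
pairs_awk() {
    printf '%s\n' "$1" | awk '{
        out = substr($0, 1, 2)                  # first pair (or a single char)
        for (i = 3; i <= length($0); i += 2)    # every following pair
            out = out ":" substr($0, i, 2)
        print out
    }'
}

pairs_awk 1bc1034gf45dna22    # prints 1b:c1:03:4g:f4:5d:na:22
```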

Well, there's always the brute-force method:
sed 's/\<'\
'\([0-9a-z][0-9a-z]\)'\
'\([0-9a-z][0-9a-z]\)'\
'\([0-9a-z][0-9a-z]\)'\
'\([0-9a-z][0-9a-z]\)'\
'\([0-9a-z][0-9a-z]\)'\
'\([0-9a-z][0-9a-z]\)'\
'\([0-9a-z][0-9a-z]\)'\
'\([0-9a-z][0-9a-z]\)'\
'\>/\1:\2:\3:\4:\5:\6:\7:\8/g'
With no leading spaces on the continuation lines the shell pastes the strings together properly.

Related

Find the pattern (YYYY-MM-DD) and replace it with the same value concatenating with apostrophes

I have this kind of data:
1,1990-01-01,2,A,2015-02-09
1,NULL,2,A,2015-02-09
1,1990-01-01,2,A,NULL
I am looking for a solution that replaces each date in the file with the same value wrapped in apostrophes. The expected result for the example is:
1,'1990-01-01',2,A,'2015-02-09'
1,NULL,2,A,'2015-02-09'
1,'1990-01-01',2,A,NULL
I have found a pattern that matches my dates, but I still can't work out what to replace it with:
sed 's/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/????/' a.txt > b.txt
Capture the date in a group by surrounding the pattern with parentheses, which must be escaped as \( \) in sed's basic regular expressions. You can then refer to the captured group with \1 (a second group would be \2, and so on).
sed "s/\([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\)/'\1'/g"
Note the g at the end, which ensures that all matches are replaced (if there are more than one in one line).
If you add the -r switch (-E in BSD sed), the awkward backslashes before ( and ) can be omitted:
sed -r "s/([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9])/'\1'/g"
This can be further simplified using quantifiers:
sed -r "s/([0-9]{4}-[0-9]{2}-[0-9]{2})/'\1'/g"
Or even:
sed -r "s/([0-9]{4}(-[0-9]{2}){2})/'\1'/g"
As mentioned in the comments, in this particular case you can also use &, which stands for the whole matched text, and omit the ():
sed -r "s/[0-9]{4}(-[0-9]{2}){2}/'&'/g"
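POSIX basic regular expressions also support \{n\} intervals, so a sketch that avoids -r/-E entirely (for seds that lack them) could look like this; `quote_dates` is a hypothetical name:

```shell
# quote_dates: hypothetical filter (stdin to stdout). The \{n\} intervals are
# POSIX BRE syntax, so no -r/-E switch is needed; & is the whole match.
quote_dates() {
    sed "s/[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}/'&'/g"
}

printf '%s\n' '1,1990-01-01,2,A,2015-02-09' '1,NULL,2,A,2015-02-09' '1,1990-01-01,2,A,NULL' | quote_dates
# 1,'1990-01-01',2,A,'2015-02-09'
# 1,NULL,2,A,'2015-02-09'
# 1,'1990-01-01',2,A,NULL
```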
You need to use a capture group, as well as replace all matching occurrences with the g flag.
sed 's/\([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\)/'"'"'\1'"'"'/g' a.txt > b.txt
The replacement text is a bit confusing because a single-quoted string in shell cannot contain a single quote, so you have to close the single-quoted string, add a double-quoted single quote, and then reopen the single-quoted string. Using $'...'-style quoting in bash simplifies it a bit, at the cost of needing to escape the backslashes (including the one in \1, which $'...' would otherwise read as an octal escape):
sed $'s/\\([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\\)/\'\\1\'/g' a.txt > b.txt
Or, you can simply double-quote the script, since there's nothing currently in it that is subject to expansion:
sed "s/\([0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]\)/'\1'/g" a.txt > b.txt
There is also the special & replacement text, which expands to whatever the regular expression matched, so you can avoid an explicit capture group:
sed "s/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/'&'/g" a.txt > b.txt
With GNU sed:
sed -E 's/([0-9]{2,4}-?){3}/'\''&'\''/g' file
Depending on your file content, the dates may also be described as a 1 or 2 followed by any nine dashes or digits:
sed -E 's/[12][-0-9]{9}/'\''&'\''/g' file
Here is one in awk:
$ awk -v q="'" '
BEGIN { FS=OFS="," } # set delimiters
{
for(i=1;i<=NF;i++) # loop all fields
if($i~/[0-9]{4}-[0-9]{2}-[0-9]{2}/) # if field has a date looking string
$i=q $i q # quote it
}1' file
Output:
1,'1990-01-01',2,A,'2015-02-09'
1,NULL,2,A,'2015-02-09'
1,'1990-01-01',2,A,NULL
Could you please try the following? (The regex inside match() could also be written as [0-9]{4}-[0-9]{2}-[0-9]{2}, but my awk is an old version, so I couldn't test that; you could try it.)
awk -v s1="'" '
{
  while(match($0,/[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/)){
    val=val substr($0,1,RSTART-1) s1 substr($0,RSTART,RLENGTH) s1   # text before the date, then the quoted date
    $0=substr($0,RSTART+RLENGTH)                                     # continue after the date
  }
  print val $0                                                       # append whatever follows the last date
  val=""
}' Input_file
The output is:
1,'1990-01-01',2,A,'2015-02-09'
1,NULL,2,A,'2015-02-09'
1,'1990-01-01',2,A,NULL
With Perl, it is simple:
perl -pe ' s/(\d{4}-\d\d-\d\d)/\x27$1\x27/g '
Here \x27 is the hex escape for a single quote. With your input:
$ cat liubo.txt
1,1990-01-01,2,A,2015-02-09
1,NULL,2,A,2015-02-09
1,1990-01-01,2,A,NULL
$ perl -pe ' s/(\d{4}-\d\d-\d\d)/\x27$1\x27/g ' liubo.txt
1,'1990-01-01',2,A,'2015-02-09'
1,NULL,2,A,'2015-02-09'
1,'1990-01-01',2,A,NULL
$
If you would rather write the single quotes literally, wrap the command in double quotes and escape the $:
$ perl -pe " s/(\d{4}-\d\d-\d\d)/\'\$1\'/g " liubo.txt
1,'1990-01-01',2,A,'2015-02-09'
1,NULL,2,A,'2015-02-09'
1,'1990-01-01',2,A,NULL
$

Replacing/removing excess white space between columns in a file

I am trying to parse a file with similar contents:
I am a string 12831928
I am another string 41327318
A set of strings 39842938
Another string 3242342
I want the out file to be tab delimited:
I am a string\t12831928
I am another string\t41327318
A set of strings\t39842938
Another string\t3242342
I have tried the following:
sed 's/\s+/\t/g' filename > outfile
I have also tried cut, and awk.
Just use awk:
$ awk -F'  +' -v OFS='\t' '{sub(/ +$/,""); $1=$1}1' file
I am a string 12831928
I am another string 41327318
A set of strings 39842938
Another string 3242342
Breakdown:
-F'  +' # tell awk that input fields (FS) are separated by 2 or more blanks
-v OFS='\t' # tell awk that output fields are separated by tabs
'{sub(/ +$/,""); # remove all trailing blank spaces from the current record (line)
$1=$1} # rebuild the current record (line), replacing FSs by OFSs
1' # idiomatic: any true condition invokes the default action of "print"
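Because the exact spacing of the input is lost above, here is a sketch that reproduces it with printf under an assumption (single spaces between words, two or more before the number, plus trailing blanks) and runs the same awk program; `tabify` is a hypothetical name:

```shell
# tabify: hypothetical wrapper around the awk program above (stdin to stdout).
# FS is "two or more spaces"; trailing spaces are removed before rebuilding.
tabify() {
    awk -F'  +' -v OFS='\t' '{ sub(/ +$/, ""); $1 = $1 } 1'
}

# Assumed input layout: single spaces inside the text, 2+ spaces before the number.
printf '%s\n' 'I am a string     12831928   ' 'Another string    3242342  ' | tabify
```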
I highly recommend the book Effective Awk Programming, 4th Edition, by Arnold Robbins.
The difficulty comes in the varying number of words per line. While you can handle this with awk, a simple script that reads each word in a line into an array and then tab-delimits the last word in each line will work as well:
#!/bin/bash
fn="${1:-/dev/stdin}"                        # read a file argument, or stdin
while read -r line || test -n "$line"; do    # also handle a last line with no newline
    read -r -a arr <<< "$line"               # split the line into words
    nword=${#arr[@]}
    for ((i = 0; i < nword - 1; i++)); do    # print all but the last word, space-separated
        test "$i" -eq 0 && word="${arr[i]}" || word=" ${arr[i]}"
        printf "%s" "$word"
    done
    printf "\t%s\n" "${arr[i]}"              # tab, then the last word
done < "$fn"
Example Use/Output
(using your input file)
$ bash rfmttab.sh < dat/tabfile.txt
I am a string 12831928
I am another string 41327318
A set of strings 39842938
Another string 3242342
Each number is tab-delimited from the rest of the string. Look it over and let me know if you have any questions.
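sed -E 's/[ ][ ]+/\t/g' filename > outfile
Note: [ ] is an opening bracket, a space, and a closing bracket.
-E enables extended regular expression support.
[ ][ ]+ ensures that only runs of two or more consecutive spaces are replaced by a tab.
\t in the replacement works in GNU sed (e.g. on Ubuntu); with BSD sed on macOS, write the script as $'s/[ ][ ]+/\t/g' in bash so that the shell inserts a literal tab.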
Your input has spaces at the end of each line, which makes things a little more difficult than without. This sed command would replace the spaces before that last column with a tab:
$ sed 's/[[:blank:]]*\([^[:blank:]]*[[:blank:]]*\)$/\t\1/' infile | cat -A
I am a string^I12831928 $
I am another string^I41327318 $
A set of strings^I39842938 $
Another string^I3242342 $
This matches – anchored at the end of the line – blanks, non-blanks and again blanks, zero or more of each. The last column and the optional blanks after it are captured.
The blanks before the last column are then replaced by a single tab, and the rest stays the same – see output piped to cat -A to show explicit line endings and ^I for tab characters.
If there are no blanks at the end of each line, this simplifies to
sed 's/[[:blank:]]*\([^[:blank:]]*\)$/\t\1/' infile
Notice that some seds, notably BSD sed as found in MacOS, can't use \t for tab in a substitution. In that case, you have to use either '$'\t'' or '"$(printf '\t')"' instead.
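A sketch of the printf approach that stores the tab in a shell variable once, so the same command should work with GNU and BSD sed; `last_col_tab` is a hypothetical name, and the input is assumed to have no trailing blanks:

```shell
# Store a literal tab once; command substitution strips only trailing
# newlines, so the tab survives.
tab=$(printf '\t')

# last_col_tab: hypothetical filter. Replaces the blanks before the last
# column with a single tab (assumes no trailing blanks on the line).
last_col_tab() {
    sed "s/[[:blank:]]*\([^[:blank:]]*\)\$/$tab\1/"
}

printf '%s\n' 'A set of strings   39842938' | last_col_tab
```

Inside the double quotes, \$ reaches sed as the end-of-line anchor while $tab is expanded by the shell.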
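Another approach, with GNU sed and rev (this assumes there are no trailing spaces; otherwise remove them first, since after rev they form the first run of spaces):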
$ rev file | sed -r 's/ +/\t/1' | rev
You have trailing spaces on each line. So you can do two sed expressions in one go like so:
$ sed -E -e 's/ +$//' -e $'s/ +/\t/' /tmp/file
I am a string 12831928
I am another string 41327318
A set of strings 39842938
Another string 3242342
Note the $'s/ +/\t/': This tells bash to replace \t with an actual tab character prior to invoking sed.
To show that these deletions and \t insertions are in the right place you can do:
$ sed -E -e 's/ +$/X/' -e $'s/ +/Y/' /tmp/file
I am a stringY12831928X
I am another stringY41327318X
A set of stringsY39842938X
Another stringY3242342X
Simple and without invisible semantic characters in the code:
perl -lpe 's/\s+$//; s/\s\s+/\t/' filename
Explanation:
Options:
-l: remove LF during processing (in this case)
-p: loop over records (like awk) and print
-e: code follows
Code:
remove trailing whitespace
change two or more whitespace to tab
Tested on OP data. The trailing spaces are removed for consistency.

Replace string by regex

I have a bunch of strings like "{one}two", where "{one}" can vary and "two" is always the same. I need to replace the original string with "three{one}", where "three" is also constant. It could easily be done with Python, for example, but I need to do it with shell tools like sed or awk.
If I understand correctly, you want:
{one}two --> three{one}
{two}two --> three{two}
{n}two --> three{n}
SED with a backreference will do that:
echo "{one}two" | sed 's/\(.*\)two$/three\1/'
The search stores all text up to your fixed string, and the replacement prepends your new string to the stored text. sed regexes are greedy, so \(.*\) grabs all text up to the final fixed string even if the variable part repeats it (e.g., {two}two still maps to three{two}).
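A quick way to check this behaviour, as a sketch (`three_prefix` is a hypothetical name):

```shell
# three_prefix: hypothetical helper. \(.*\) captures everything before the
# "two" that ends the line, so a "two" inside the braces is kept.
three_prefix() {
    printf '%s\n' "$1" | sed 's/\(.*\)two$/three\1/'
}

three_prefix '{one}two'    # prints three{one}
three_prefix '{two}two'    # prints three{two}
```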
Using sed:
s="{one}two"
sed 's/^\(.*\)two/three\1/' <<< "$s"
three{one}
echo "XXXtwo" | sed -E 's/(.*)two/three\1/'
Here's a Bash-only solution (${string%two} strips "two" only from the end of the string):
string="{one}two"
echo "three${string%two}"
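A slightly more defensive sketch in plain POSIX sh: ${1%two} removes "two" only from the end, and a case guard (an addition, not part of the answer above) leaves strings that do not end in "two" untouched; `move_suffix` is a hypothetical name:

```shell
# move_suffix: hypothetical helper. ${1%two} removes "two" only from the end
# of the value; the case guard leaves other strings unchanged.
move_suffix() {
    case $1 in
        *two) printf 'three%s\n' "${1%two}" ;;
        *)    printf '%s\n' "$1" ;;
    esac
}

move_suffix '{one}two'    # prints three{one}
move_suffix '{two}two'    # prints three{two}
move_suffix 'no match'    # prints no match
```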
With GNU awk (gensub() is a gawk extension):
awk '{a=gensub(/(.*)two/,"three\\1","g"); print a}' <<< "{one}two"
Output:
three{one}
awk '/{.*}two/ { split($0,s,"}"); print "three"s[1]"}" }' <<< "{one}two"
does also output
three{one}
Here, we use awk to find the matching lines and then split on "}" (which means each line should contain only the one } that closes the variable part).
With GNU sed:
$ echo 'foo {one}two bar' | sed -r 's/(\{[^}]*\})two/three\1/g'
foo three{one} bar
With basic sed:
$ echo 'foo {one}two bar' | sed 's/\({[^}]*}\)two/three\1/g'
foo three{one} bar

How to get the text between tags?

<li><b> Some Text:</b></li><li><b> Some Text:</b></li>
<pg>something else</pg> <li><b> Some Text:</b> </li>
<li><b> Some Text:</b></li>
<li><b> Some Text:</b> More Text </li> <li><b> Some Text:</b> More Text </li>
Given this input string, this is the output I want:
Some Text:
Some Text:
Some Text:
Some Text: More Text
Some Text: More Text
But all I got was:
Some Text:
Some Text:
Some Text: More Text
This is my shell script function on Linux:
#!/bin/sh
sed -n -e 's/.*<li>\(.*\)<\/li>.*/\1/p' $1 > temp
sed -e 's/<[<\/b]*>//g' temp >out
Please give me some ideas about where it went wrong.
Here is one way with GNU awk (the first line is a blank line):
$ gawk '
RT=="</b>"||RT=="</li>" && NF {
gsub(/^ *| *$/,"")
printf "%s%s",(ORS=!(NR%2)?"":"\n"),$0
}
END { print "\n" }' RS='</?b>|</?li>' file
Some Text:
Some Text:
Some Text:
Some Text:
Some Text:More Text
Some Text:More Text
If you don't mind using a third-party tool - the multi-platform web-scraping utility xidel - it gets as simple as:
xidel file.html -e '/li'
This extracts the text-only content of all (top-level) li elements and prints each on a separate line to produce the desired output.
First things first: generally speaking, use a tool that understands HTML (see my other answer) rather than awk or sed for HTML parsing; as @chepner succinctly puts it:
Do not parse HTML with sed or awk; sed is designed for line-based editing, and awk for field-based tasks. Neither is suitable for general structured text whose elements may span more than one line.
Thus, the solutions below work in limited circumstances, but do not generalize well.
@jaypal has already provided a GNU awk (gawk)-specific answer.
Here's one that should work with all awk flavors that accept regexes as input record separators (RS) (such as gawk, mawk, and nawk):
awk -v RS='</?li>\n*' '
/^<b>/ { t=$0; gsub(/<\/?b>/, "", t); gsub(/^ +| +$/, "", t); print t}
' file
Older and POSIX-compliant awk flavors, such as the BSD-based one in OS X, only accept a single literal character as RS, so the above won't work; on OS X, the following sed command achieves the same result (it works on Linux, too):
sed -E 's/<\/?li>/\'$'\n''/g' file |
sed -En '/^<pg>/! { /[^ ]/ { s/<\/?b>//g; s/^ +| +$//gp; }; }'
Both solutions trim leading and trailing spaces from the output lines.
Your first sed line does not do what you want it to do: it can match only ONE occurrence per line.
sed -n -e 's/.*<li>\(.*\)<\/li>.*/\1/p' $1 > temp
this...........................^^
which matches the rest of the line (obviously not what you expected).
One quick workaround is to change every </li> into </li> plus linefeed before any other processing.
#!/bin/sh
sed -e 's/<\/li>/<\/li>\n/g' "$1" |\
sed -n -e 's/.*<li>\(.*\)<\/li>/\1/p' |\
sed -e 's/<[\/b]*>//g' >out
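\n in the replacement of the first command is a GNU sed extension. A sketch of the same pipeline with a portable line break (a backslash followed by an actual newline in the replacement), wrapped in a hypothetical `li_text` function that reads HTML on stdin:

```shell
# li_text: hypothetical helper; reads HTML on stdin.
# Step 1 puts a line break after every </li>; the backslash-newline in the
# replacement is POSIX, unlike \n. Steps 2 and 3 are the commands above.
li_text() {
    sed 's/<\/li>/&\
/g' |
    sed -n 's/.*<li>\(.*\)<\/li>/\1/p' |
    sed 's/<[\/b]*>//g'
}

printf '%s\n' '<li><b> Some Text:</b> More Text </li> <li><b> Some Text:</b> More Text </li>' | li_text
```

The /g' line must start in the first column; any leading spaces would end up in the output.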
I am no sed expert; somebody else may have a more elegant solution.

SED script to remove whitespace from a region

I have a file with a bunch of strings that looks like this:
new Tab("Hello World")
I want to turn those particular lines into something like:
new Tab("helloWorld")
Is this possible using SED and if so, how can I accomplish this? I figure I have to use grouping and regions but I can't figure out the replacement string.
This is what I have so far
sed -n 's/new Tab("\(.*\)"/new Tab("\1")'
This solution is not perfect: it assumes the line contains just new Tab("some string blah blah blah") and nothing else on that line. Here is my remove_space.sed:
/new Tab/ {
s/ *//g
s/newTab/new Tab/
}
To invoke:
sed -f remove_space.sed data.txt
The first substitution blindly removes all spaces; the second puts back the space between new and Tab. Note that this does not lowercase the first letter, so the result is new Tab("HelloWorld").
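You don't have to put this in a file; the script works on the command line as well (the braces keep both substitutions restricted to the matching lines, as in the file version):
sed '/new Tab/{s/ *//g;s/newTab/new Tab/;}' data.txt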
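Neither sed command above lowercases the first letter. A sketch with POSIX awk, whose tolower() makes that step easy, assuming at most one new Tab("...") per line; `tab_name` is a hypothetical name:

```shell
# tab_name: hypothetical filter. For a line containing new Tab("..."), remove
# the spaces between the quotes and lowercase the first character; other
# lines are printed unchanged. Assumes at most one new Tab("...") per line.
tab_name() {
    awk '
    match($0, /new Tab\("[^"]*"/) {
        pre  = substr($0, 1, RSTART + 8)               # up to and including: new Tab("
        arg  = substr($0, RSTART + 9, RLENGTH - 10)    # text between the quotes
        post = substr($0, RSTART + RLENGTH - 1)        # closing quote and the rest
        gsub(/ /, "", arg)                             # remove the spaces
        arg  = tolower(substr(arg, 1, 1)) substr(arg, 2)
        $0   = pre arg post
    }
    { print }'
}

printf '%s\n' 'new Tab("Hello World")' | tab_name    # prints new Tab("helloWorld")
```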
I'm not enough of a sed guru, but here's a piece of Perl:
perl -pe 's/(?<=new Tab\(")[^"]+/ lcfirst(join("", split(" ", $&))) /e'
My first thought was using awk but I came up with something I don't really like:
echo "new Tab(\"Hello World\")" | gawk 'match($0, /new Tab\("(.*)"\)/, r) {print r[1]}' | sed -e 's/ *//g'
