Extracting string from Substring in shell script

I have a line like this:
<option value="bo">Tibetan Standard, Tibetan, Central</option>
I want output like this:
bo Tibetan Standard, Tibetan, Central
When I try this with sed:
sed -r 's/.*value="(\S+).*">(\S+)<.*/\1 \2/'
it gives only:
bo Tibetan
Can anyone help me?
Thanks in advance

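In your expression, (\S+) matches only non-whitespace characters, so the second group stops at the first space. The following modification to your original sed command should work: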
sed -r 's_.*value="(.*)">(.*)</option>_\1 \2_'
The following example:
sed -r 's_.*value="(.*)">(.*)</option>_\1 \2_' <<< '<option value="bo">Tibetan Standard, Tibetan, Central</option>'
Prints the desired output:
bo Tibetan Standard, Tibetan, Central
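
A slightly tighter variant is sketched below. It assumes one option element per line and no embedded double quotes in the value attribute; the variable names line and result are only for illustration. Negated bracket expressions keep each group from running past its closing delimiter, which the greedy (.*) could do if a line carried further attributes or tags.

```shell
line='<option value="bo">Tibetan Standard, Tibetan, Central</option>'

# [^"]* stops at the closing quote of the attribute,
# [^<]* stops at the start of the closing tag.
result=$(printf '%s\n' "$line" |
  sed -E 's_.*value="([^"]*)"[^>]*>([^<]*)</option>.*_\1 \2_')

printf '%s\n' "$result"
```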

Related

How to change date format with Sed command?

I am trying to change the date format from YYYY-MM-DD to DD-MM-YYYY in all lines of a file using the sed command.
For example:
2021-04-01 00:15,69,0,38,1,1,0,0,0,32
should be:
01-04-2021 00:15,69,0,38,1,1,0,0,0,32
I have tried the following using regular expressions:
sed -E 's,[0-9]{4}-[0-9]{2}-[0-9]{2},\3-\2-\1,g'
Unfortunately this does not work and gives me an error (not defined in the RE).
It would be really great if someone could help me solve this.

You forgot the capturing groups, ( ), which the back-references \1, \2 and \3 refer to:
$ sed -E 's,^([0-9]{4})-([0-9]{2})-([0-9]{2}),\3-\2-\1,g' <<< '2021-04-01 00:15,69,0,38,1,1,0,0,0,32'
01-04-2021 00:15,69,0,38,1,1,0,0,0,32
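
Should the output need slashes rather than dashes, the same substitution still works, because the comma delimiter means a / in the replacement needs no escaping. A minimal sketch, with line and result as illustrative variable names:

```shell
line='2021-04-01 00:15,69,0,38,1,1,0,0,0,32'

# Same three capture groups; only the separators in the replacement change.
result=$(printf '%s\n' "$line" |
  sed -E 's,^([0-9]{4})-([0-9]{2})-([0-9]{2}),\3/\2/\1,')

printf '%s\n' "$result"
```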

sunOS's sed in bash script - replace pattern: nothing changes

ANSWER:
As mlv and sorontar mentioned, my sed only supports BRE, which has no | (pipe) alternation. So in my case it is possible to use something like:
sed "s/\( [namevlu]*=\"\)BASE\.$str1/\1BASE\.$str2/g"
which matches name=" and value=" but not other=" or values=". The bracket expression [namevlu] contains only the characters I need. It is not as safe as it could be, but I don't expect anything like valuenm=" to exist. If someone needs an exact match, it takes two (or more) sed expressions.
ORIGINAL QUESTION:
I need to replace one string with another, but I need to check for a specific context to be sure.
For example:
blah val1="BASE.OLD_TEXT_OR_SO" blabla
blah val2="BASE.OLD_SOMETHING" ...
I want to change this to
blah val1="BASE.NEW_TEXT_OR_SO" blabla
blah val2="BASE.NEW_SOMETHING" ...
This script doesn't change anything:
#!/bin/bash
...
str1="OLD_"
str2="NEW_"
sed "s/\(name=\"|value=\"\)BASE\.$str1/\1BASE\.$str2/g" input.file > output.file
but a similar sed later in the script works fine:
sed "s/\(<Tag>\)[A-Z0-9\-\._|]*\(<\/Tag>\)/\1$otherStr\2/g" input.file > output.file
The output file still has BASE.OLD_ :/
When I try it on the console, I get the same result (no change). I think there is something wrong in the search pattern, but I have no idea what.
$ str0='blah name="BASE.OLD_TEXT_OR_SO" blabla
> blah value="BASE.OLD_SOMETHING" ...
> blah other="BASE.OLD_SOMETHING" ...
> blah values="BASE.OLD_SOMETHING" ...'
$ echo $str0 | sed "s/\(val1=\"|val2=\"\)BASE\.$str1/\1BASE\.$str2/g"
The regex was tested in an online tester, where it works fine.
(name="|value=")BASE\.OLD_
\1BASE\.NEW_
system:
SunOS 5.11
GNU bash 4.1.11(1)-release
Sed 4.2.1
Thanks in advance.

I'm not sure SunOS sed supports |. It would be enabled with -r, which I don't think SunOS sed supports.
But in this case, you should be able to do:
sed "s/\(val[12]=\"\)BASE.$str1/\1BASE.$str2/"
If you can't use val[12], then I don't think it can be done in a single sed expression. But it's not too hard in perl:
perl -pe "if (/val1=\"BASE\.$str1/ || /val2=\"BASE\.$str1/) {s/BASE\.$str1/BASE.$str2/;}"

A basic sed supports only BRE (POSIX basic regular expressions), in which alternation (|) does not exist.
Your regex must be written as:
sed "s/\(val[12]=\"BASE\.\)${str1}/\1${str2}/g" input.file
If the version of sed you use supports ERE (extended regular expressions, which accept |), then you may use:
sed -E "s/(val(1|2)=\"BASE\.)${str1}/\1${str2}/g" input.file
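
When the attribute names share no convenient pattern, BRE can still match each one exactly by giving sed one -e expression per name. The sketch below assumes the name= and value= attributes from the question's console test and that each attribute is preceded by a space; the variables input and result are only for illustration.

```shell
str1='OLD_'
str2='NEW_'
input='blah name="BASE.OLD_TEXT_OR_SO" blabla
blah value="BASE.OLD_SOMETHING" ...
blah other="BASE.OLD_SOMETHING" ...
blah values="BASE.OLD_SOMETHING" ...'

# One BRE expression per attribute; no alternation is needed.
# The leading space keeps longer names ending in "name" or "value" from matching.
result=$(printf '%s\n' "$input" | sed \
  -e "s/\( name=\"BASE\.\)$str1/\1$str2/g" \
  -e "s/\( value=\"BASE\.\)$str1/\1$str2/g")

printf '%s\n' "$result"
```

Only the name= and value= lines are rewritten; the other= and values= lines pass through unchanged.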

Using sed to find text BEFORE a string

I'm trying to extract URLs from url files, and I need some kind of descriptor for each file.
If I have something like
param5x57389dwordhereparam3904
and I want to extract wordhere, I know I can use
$(sed -n 's/param5x57389d//p' "$f")
to get wordhereparam3904
So how do I go about taking off that last part?
Thanks in advance!

You can use the following sed command:
sed '/param5x57389d/ s/param5x57389d\(.*\)param.*/\1/' "$f"
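
If the line is already in a shell variable, the same trimming can be sketched with parameter expansion instead of sed. This assumes, as the command above does, that the fixed prefix is param5x57389d and that the unwanted tail starts at the next occurrence of param; the variable names s, rest and word are only for illustration.

```shell
s='param5x57389dwordhereparam3904'

rest=${s#param5x57389d}   # strip the known prefix
word=${rest%%param*}      # strip everything from the first "param" onward

printf '%s\n' "$word"
```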

Trimming pathnames beyond a keyword (awk, sed, ?)

I want to trim a pathname beyond a certain point after finding a keyword. I'm drawing a blank this morning.
/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java
I want to find the keyword Java, save the pathname beyond that (tsupdater), then cut everything off after the Java portion.

I don't know if this is what you want, but you can split the pathname into two with:
echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java" | sed 'h;s/.*Java//p;g;s/Java.*/Java/'
Which outputs:
/tsupdater/src/tsupdater.java
/home/quikq/1.0/dev/Java
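If you would like to save the second part into a file part2.txt and print the first part, you could do the following (the w command treats the rest of its line as the filename, so it has to end its own -e expression):
echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java" | sed -e 'h;s/.*Java//;w part2.txt' -e 'g;s/Java.*/Java/'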
If you're writing a shell script:
myvar="/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java"
part1="${myvar%Java*}Java"
part2="${myvar#*Java/}"
Hope this helps =)

Take whichever one you need:
kent$ echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java"|sed -r 's#(.*Java/[^/]*).*#\1#g'
/home/quikq/1.0/dev/Java/tsupdater
kent$ echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java"|sed -r 's#(.*Java).*#\1#g'
/home/quikq/1.0/dev/Java
kent$ echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java"|sed -r 's#.*Java/([^/]*).*#\1#g'
tsupdater
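
The same three results can also be sketched with shell parameter expansion, which avoids starting sed at all. This assumes the keyword appears as a full path component (/Java/); the variable names path, rest, project and base are only for illustration.

```shell
path='/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java'

rest=${path#*/Java/}        # everything after the Java component
project=${rest%%/*}         # first component after Java
base=${path%%/Java/*}/Java  # path up to and including Java

printf '%s\n' "$project" "$base"
```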

I'm not entirely sure what you want as output (please specify more clearly), but this command:
echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java" | sed 's/.*Java//'
results in:
/tsupdater/src/tsupdater.java
If you want the preceding part then this command:
echo "/home/quikq/1.0/dev/Java/tsupdater/src/tsupdater.java" | sed 's/Java.*//'
results in:
/home/quikq/1.0/dev/

Like I said, I was having a weird morning, but it dawned on me.
echo /home/quikq/1.0/dev/Java/TSUpdater/src/TSUpdater.java | sed 's/Java.*//'
Yields
/home/quikq/1.0/dev/
Lots of great tips here for chopping it up different ways though. Thanks a bunch!

Remove nth character from middle of string using Shell

I've been searching Google forever, and I cannot find an example of how to do this. I also do not grasp the concept of how to construct a regular expression for sed, so I was hoping someone could explain this to me.
I'm running a bash script against a file full of lines of text that look like this: 2222,H,73.82,04,07,2012
and I need to make them all look like this: 2222,H,73.82,04072012
I need to remove the last two commas, which are the 16th and 19th characters in the line.
Can someone tell me how to do that? I was going to use colrm, which is blessedly simple, but I can't seem to get that installed in Cygwin. Please and thank you!

I'd use awk for this:
awk -F',' -v OFS=',' '{ print $1, $2, $3, $4$5$6 }' inputfile
This takes a CSV file and prints the first, second and third fields, each followed by the output field separator (",") and then the fourth, fifth and sixth fields concatenated.
Personally I find this easier to read and maintain than regular expression-based solutions in sed and it will cope well if any of your columns get wider (or narrower!).

This will work on any string and will remove only the last 2 commas:
sed -e 's/\(.*\),\([^,]*\),\([^,]*\)$/\1\2\3/' infile.txt
Note that in my sed variant I have to escape parentheses, YMMV.

You wrote: "I also do not grasp the concept of how to construct a regular expression for SED, so I was hoping someone could explain this to me."
The basic notation that people are using here is: s/PATTERN/REPLACEMENT/
Your PATTERN is a regular expression, which may contain parts enclosed in parentheses. Those parts can then be referred to in the REPLACEMENT part of the command as \1, \2 and so on. For example:
> echo "aabbcc" | sed 's/\(..\)\(..\)\(..\)/\2\3\1/'
bbccaa
Note that the version of sed I'm using defaults to the "basic" RE dialect, where the parentheses in expressions need to be escaped. You can do the same thing in the "extended" dialect:
> echo "aabbcc" | sed -E 's/(..)(..)(..)/\2\3\1/'
bbccaa
(In GNU sed (which you'd find on Linux), you can get the same results with the -r option instead of -E. I'm using OS X.)
I should say that for your task, I would definitely follow Johnsyweb's advice and use awk instead of sed. Much easier to understand. :)

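This should work:
sed -e 's~,~~4g' file.txt
It removes the 4th comma and every comma after it. Combining a number with the g flag behaves this way in GNU sed; other sed implementations may differ.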

echo "2222,H,73.82,04,07,2012" | sed -r 's/(.{15}).(..)./\1\2/'
Take 15 chars, drop one, take 2, drop one.
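
Since the original plan was colrm, a purely positional alternative can be sketched with cut, which selects character ranges and so skips positions 16 and 19. Like the sed command above, it assumes every line has the same fixed layout; the variable names line and result are only for illustration.

```shell
line='2222,H,73.82,04,07,2012'

# Keep characters 1-15, 17-18 and 20 to end of line,
# which leaves out the commas at positions 16 and 19.
result=$(printf '%s\n' "$line" | cut -c 1-15,17-18,20-)

printf '%s\n' "$result"
```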

sed -E 's/(..),(..),(....)$/\1\2\3/' myfile.txt
