How to combine multiple sed commands into one [duplicate] - bash

This question already has answers here:
Combining two sed commands
(2 answers)
Closed 1 year ago.
I have four different sed commands that I run on a file. To improve performance, I want to combine them into one.
Each command is a complex command that uses the -E switch. I searched many forums but could not find an answer to my specific case.
sed -i -E ':a; s/('"$search_str"'X*)[^X&]/\1X/; ta' "$newfile"
sed -i -E '/[<]ExtData[>?" "]/{:a; /Name=/{/Name="'"$nvp_list_ORed"'"/!b}; /Value=/bb; n; ba; :b; s/(Value="X*)[^X"]/\1X/; tb; }' "$newfile"
sed -i -E ':a; s/('"$search_str1"'X*)[^X\<]/\1X/; ta' "$newfile"
sed -i -E ':a; s/('"$search_str2"'X*)[^X\/]/\1X/; ta' "$newfile"
I want to combine them into something like this:
sed -i -E 'command1' -e 'command2' -e 'command3' -e 'command4' "$newfile"
But it does not work, maybe because -E and -e can't be combined.
Please let me know.
Thanks! Puneet

-E means "extended regex" and is a standalone flag; -e means "expression" and must be followed by a sed expression.
You can combine them, but if you pass several expressions, each one must be preceded by -e, which your first one isn't:
sed -i -E -e 'command1' -e 'command2' -e 'command3' -e 'command4' "$newfile"
A second option is to put all the commands in the same expression, separated by semicolons:
sed -i -E 'command1;command2;command3;command4' "$newfile"
However, since you're using labels, I wouldn't rely on this option; as John1024 pointed out, some implementations may not support it.
Lastly, as mentioned by Mad Physicist, you can write your sed commands to a file and reference it with the -f option.
Put one command per line in the file. A command can be continued onto the next line by ending each line except the last with a \, which escapes the newline.
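Here is a minimal sketch of the -f approach, assuming GNU sed for -E. mask.sed is a hypothetical file name, and the pw= pattern stands in for the question's search strings. Shell variables are not expanded inside a script file, so this form only works when the search strings are fixed. Each label and branch sits on its own line, which is the layout POSIX specifies for labels. That avoids the question of whether a ; may follow a label.

```shell
# mask.sed is a hypothetical file name; each label/branch is on its own line.
cat > mask.sed <<'EOF'
:a
s/(pw=X*)[^X ]/\1X/
ta
EOF

# Loops until every character after "pw=" up to the next space is an X.
printf 'user=bob pw=secret end\n' | sed -E -f mask.sed
```

With GNU sed this should print user=bob pw=XXXXXX end.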

Simply pipe them (without -i), writing to a temporary file and moving it into place at the end:
sed -E 'A' file | sed -E 'B' | ... >file.tmp && mv file.tmp file

As @Aaron observed, if you want to give multiple separate expressions to sed, you must designate them with -e options; sed combines them. You can also merge several expressions into one by separating the pieces with semicolons.
Your case is a bit special, however: your expressions use labels and branch instructions, and one label name (a) is repeated in each expression. To combine them, each label should be distinct, and each branch (whether conditional or unconditional) should name the correct label. That would look something like this:
sed -i -E \
-e ':a1; s/('"$search_str"'X*)[^X&]/\1X/; ta1' \
-e '/[<]ExtData[>?" "]/ {:a2; /Name=/ {/Name="'"$nvp_list_ORed"'"/ !b}; /Value=/ bb2; n; ba2; :b2; s/(Value="X*)[^X"]/\1X/; tb2; }' \
-e ':a3; s/('"$search_str1"'X*)[^X\<]/\1X/; ta3' \
-e ':a4; s/('"$search_str2"'X*)[^X\/]/\1X/; ta4' \
"$newfile"
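You can see the distinct-label pattern on a small input. This is a sketch assuming GNU sed. The a=/b= patterns are hypothetical stand-ins for the question's search strings. Each -e carries its own loop, a1 and a2, so no branch can jump into the other expression's loop.

```shell
# Two looping expressions, each with its own label, passed as separate -e options.
printf 'a=123 b=45\n' | sed -E \
  -e ':a1; s/(a=X*)[^X ]/\1X/; ta1' \
  -e ':a2; s/(b=X*)[^X ]/\1X/; ta2'
```

This should print a=XXX b=XX.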
Do note that even with proper quoting from a shell perspective, which you appear to have, your approach will not do what you expect if the value of any of the interpolated shell variables contains a regex metacharacter.
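One common workaround is to backslash-escape every ERE metacharacter in the variable before interpolating it. Below is a minimal sketch assuming GNU sed. The helper name escape_ere, the sample value, and the [^X;] stop set are hypothetical. A comma serves as the s delimiter so that / can be listed inside the bracket expression.

```shell
# Hypothetical helper: prefix every ERE metacharacter (and /) with a backslash.
escape_ere() {
  printf '%s\n' "$1" | sed 's,[][\.^$*+?(){}|/],\\&,g'
}

search_str='price=$1.50'          # contains $ and ., both special in ERE
safe=$(escape_ere "$search_str")  # becomes price=\$1\.50
printf '%s\n' 'price=$1.50 abc;' | sed -E ':a; s/('"$safe"'X*)[^X;]/\1X/; ta'
```

This should print price=$1.50XXXX;. Without escape_ere, the $ acts as an anchor and nothing would be replaced.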

Warning: it is not always possible to combine multiple sed scripts into a single one without changes. Sometimes you have to redesign your algorithm.
sed has two kinds of memory: the pattern space and the hold space. Concatenation only works if these two spaces look the same to every command in both setups. Below is an example where the pattern space differs:
$ echo aa | sed -e 's/./&\n/' | sed -e '1s/a/b/g'
b
a
$ echo aa | sed -e 's/./&\n/' -e '1s/a/b/g'
b
b
$ echo aa | gsed -e 's/./&\n/;1s/a/b/g'
b
b
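In the pipeline, the first sed command works on the pattern space aa and outputs two lines, so the second sed's pattern space for line 1 is only a. In the combined script, both commands share the single pattern space a\na, so 1s/a/b/g replaces both characters.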

Related

What is the usage of -e flag in sed?

From some online reading, it seems that sed's -e flag is used to mark a sed script,
e.g.:
sed -i -e 's/default_language/language/g' "$CONF_FILE"
but from my own testing and some searching online, this line seems to work as well:
sed -i 's/default_language/language/g' "$CONF_FILE"
So what do I need -e for? Is it only useful for cases I'd like to write several scripts in a row? That can also be managed with ;.
According to the manual:
If no -e, --expression, -f, or --file option is given, then the first non-option argument is taken as the sed script to interpret. All remaining arguments are names of input files; if no input files are specified, then the standard input is read.
As you already mentioned, -e may be used for multiple commands.
sed 'cmd1; cmd2'
sed -e 'cmd1; cmd2'
sed -e 'cmd1' -e 'cmd2'
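Below is a quick check that the three spellings behave the same, with cmd1 and cmd2 filled in with two arbitrary substitutions. Once -e (or -f) is present, every non-option argument is treated as an input file, as the manual excerpt says.

```shell
# All three run the same two substitutions and print the same line.
printf 'hello world\n' | sed 's/hello/goodbye/; s/world/moon/'
printf 'hello world\n' | sed -e 's/hello/goodbye/; s/world/moon/'
printf 'hello world\n' | sed -e 's/hello/goodbye/' -e 's/world/moon/'
```

Each line should print goodbye moon.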

Error on sed script - extra characters after command

I've been trying to create a sed script that reads a list of phone numbers and only prints ones that match the following schemes:
+1(212)xxx-xxxx
1(212)xxx-xxxx
I'm an absolute beginner, but I tried to write a sed script, run with the -n -r flags, that would print these for me. Its contents are as follows:
/\+1\(212\)[0-9]{3}-[0-9]{4}/p
/1\(212\)[0-9]{3}-[0-9]{4}/p
If I run this with sed directly, it works fine (i.e. sed -n -r '/\+1\(212\)[0-9]{3}-[0-9]{4}/p' sample.txt prints matching lines as expected). This does NOT work in the sed script I wrote; instead, sed says:
sed: -e expression #1, char 2: extra characters after command
I could not find a good solution; this error seems to have many causes, and none of the answers I found apply easily here.
EDIT: I ran it with sed -n -r script.sed sample.txt
sed cannot automatically determine whether you intended a parameter to be a script file or a script string.
To run a sed script from a file, you have to use -f:
$ echo 's/hello/goodbye/g' > demo.sed
$ echo "hello world" | sed -f demo.sed
goodbye world
If you omit -f, sed treats the file name itself as the script: d is the delete command, and it does not accept the trailing characters emo.sed:
$ echo "hello world" | sed demo.sed
sed: -e expression #1, char 2: extra characters after command
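Applied to the question, a minimal sketch looks like this. It assumes GNU sed, and the sample phone numbers are made up. The asker's two unanchored patterns would print +1(212)... twice, because the second pattern also matches inside it. This version therefore uses one anchored pattern with an optional [+]. -E is used here; GNU sed also accepts -r.

```shell
# script.sed and sample.txt are created here only for illustration.
cat > script.sed <<'EOF'
/^[+]?1\(212\)[0-9]{3}-[0-9]{4}$/p
EOF
printf '%s\n' '+1(212)555-1234' '1(212)555-1234' '212-555-1234' > sample.txt

# -f tells sed that script.sed is a file to read, not the script text itself.
sed -n -E -f script.sed sample.txt
```

This should print the first two sample lines once each.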
Two common Unix tools that use BRE as their default regex dialect are sed and grep.
In most operating systems, you can use egrep or grep -E to tell that tool to use ERE as its dialect. A smaller (but still significant) number of sed implementations will accept a -E option to use ERE.
In BRE mode, however, you can still create atoms (groups) with parentheses; you do it by escaping them. That's why your initial expression is failing: the parentheses are NOT special by default in BRE, but you're MAKING THEM SPECIAL by preceding them with backslashes.
The other thing to keep in mind is that if you want sed to execute a script from a command line argument, you should use the -e option.
So:
$ cat ph.txt
+1(212)xxx-xxxx
1(212)xxx-xxxx
212-xxx-xxxx
$ grep '^+\{0,1\}1([0-9]\{3\})' ph.txt
+1(212)xxx-xxxx
1(212)xxx-xxxx
$ egrep '^[+]?1\([0-9]{3}\)' ph.txt
+1(212)xxx-xxxx
1(212)xxx-xxxx
$ sed -n -e '/^+\{0,1\}1([0-9]\{3\})/p' ph.txt
+1(212)xxx-xxxx
1(212)xxx-xxxx
$ sed -E -n -e '/^[+]?1\([0-9]{3}\)/p' ph.txt
+1(212)xxx-xxxx
1(212)xxx-xxxx
Depending on your OS, you may be able to get a full list of how this works from man re_format.

match multiple conditions with GNU sed

I'm using sed to replace values in other bash scripts, such as:
somedata="$(<somefile.sh)"
somedata=`sed 's/ ==/==/g' <<< $somedata` # [space]== becomes ==
somedata=`sed 's/== /==/g' <<< $somedata` # ==[space] becomes ==
The same goes for ||, &&, !=, etc. I think the number of steps could be reduced with the right regex. The operator does not need surrounding spaces, but it may have a space before and after, only before, or only after. Is there a way to handle all of these with one sed command?
There are also many other conditions not mentioned here. The script takes longer to execute than desired.
The goal is to reduce the overall execution time so I am hoping to reduce the number of commands used with clever regex to match multiple conditions.
I'm also considering tr, awk or perl - whichever is fastest?
With GNU sed, you can use the | (or) operator:
$ sed -r 's/ *(&&|\|\|) */\1/g' <<< "foo && bar || baz"
foo&&bar||baz
In the regex ' *(&&|\|\|) *': zero or more spaces, followed by either of the |-separated strings, followed by zero or more spaces
the matched operator is captured and written back with the backreference \1
Edit:
As pointed out in the comments, you can use the -E flag with GNU sed in place of -r, which makes your command more portable:
sed -E 's/ *(\&\&|\|\|) */\1/g'
Since GNU sed also supports the \| alternation operator in Basic Regular Expressions, you can use it for better readability:
sed 's/ *\(&&\|||\) */\1/g'
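The same alternation can cover more operators in a single pass, which is the reduction the question asks for. Below is a sketch assuming GNU sed; squeeze_ops is a hypothetical function name. It rewrites every match, including matches inside strings, comments, and [ ] tests, where bash needs the spaces around the operator.

```shell
# Hypothetical helper: one pass removes optional spaces around ==, !=, && and ||.
squeeze_ops() {
  sed -E 's/ *(==|!=|&&|\|\|) */\1/g'
}

printf '%s\n' 'x == y && z != w || v' | squeeze_ops
```

This should print x==y&&z!=w||v.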
You can chain multiple sed substitutions with the -e flag:
$ echo "test data here" | sed -e 's/test/TEST/' \
-e 's/data/HERE/' \
-e 's/here/DATA/'
TEST HERE DATA
You can use a sed script file (-f option) together with the -i option (in-place editing, so there's no need to store the content in a variable):
sed -i -f mysedfile somefile.sh
mysedfile can contain one expression per line:
s/ *&& */\&\&/g
s/ *== */==/g
(or use the -e option to pass several expressions, but if you have a lot of them, it will quickly become unreadable)
BTW: the -i option creates a temporary file in the directory of the processed file; if the operation succeeds, the original file is removed and the temporary file is renamed to the original file name. From the sed manual:
When the end of the file is reached, the temporary file is renamed to the output file's original name. The extension, if supplied, is used to modify the name of the old file before renaming the temporary file, thereby making a backup copy.
So the only I/O cost of -i is writing the file once, and there's no need at all to store the content in a variable.
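Below is a minimal end-to-end sketch of the -i -f combination. It assumes GNU sed; BSD/macOS sed needs -i '' instead of a bare -i. mysedfile and somefile.sh are throwaway demo files.

```shell
# Throwaway files for the demo.
cat > mysedfile <<'EOF'
s/ *&& */\&\&/g
s/ *== */==/g
EOF
printf '%s\n' 'a == b && c == d' > somefile.sh

# Edit in place using the script file, then show the result.
sed -i -f mysedfile somefile.sh
cat somefile.sh
```

This should print a==b&&c==d.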

How to delete the string which is present in parameter from file in unix

I have stored a string in a variable, for example: ab=jyoti,priya, pranit
I have a file, Name.txt, which contains:
jyoti
prathmesh
John
Kelvin
pranit
I want to delete the lines from Name.txt that are contained in the ab variable.
Can this be done?
If ab is a shell variable, you can easily turn it into an extended regular expression, and use it with grep -E:
grep -E -x -v "${ab//,/|}" Name.txt
The string substitution ${ab//,/|} returns the value of $ab with every , substituted with a | which turns it into an extended regular expression, suitable for passing as an argument to grep -E.
The -v option says to remove matching lines.
The -x option specifies that the match needs to cover the whole input line, so that a short substring will not cause an entire longer line to be removed. Without it, ab=prat would cause prathmesh to be removed.
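Here is the grep approach run on the question's sample data. The value in the question has a space before pranit, and with -x that space would stop pranit from matching. This sketch therefore strips spaces first. It uses tr instead of ${ab//,/|}, which is a portable substitute rather than the answer's exact form.

```shell
# Sample data from the question.
ab='jyoti,priya, pranit'
printf '%s\n' jyoti prathmesh John Kelvin pranit > Name.txt

# Drop spaces, then turn commas into | to build an ERE alternation.
pattern=$(printf '%s' "$ab" | tr -d ' ' | tr ',' '|')
grep -E -x -v "$pattern" Name.txt
```

This should print prathmesh, John and Kelvin.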
If you really require a sed solution, the transformation should be fairly trivial. grep -E -v -x 'aaa|bbb|ccc' is equivalent to sed '/^\(aaa\|bbb\|ccc\)$/d' (with some dialects disliking the backslashes, and others requiring them).
To do an in-place edit (modify Name.txt without managing a temporary file yourself), try this:
sed -i "/^\(${ab//,/\|}\)\$/d" Name.txt
This is not entirely robust against strings containing whitespace or other shell metacharacters, but if you just need
Try with
sed -e 's/\bjyoti\b//g;s/\bpriya\b//g' < Name.txt
(using \b, assuming you need word boundaries; note that this blanks the matching words rather than deleting the lines)
This will do it:
for param in $(echo "$ab" | sed -e 's/ //g' -e 's/,/ /g'); do res=$(sed -e "/^$param\$/d" < Name.txt); printf '%s\n' "$res" > Name.txt; done
cat Name.txt

SED bad substitution error

Here's my problem: I have written the following line of code to properly format a list of files found recursively in a directory.
find * | sed -e '/\(.*\..*\)/ !d' | sed -e "s/^.*/\${File} \${INST\_FILES} &/" | sed -e "s/\( \)\([a-zA-Z0-9]*\/\)/\/\2/" | sed -e "s/\(\/\)\([a-zA-Z0-9\_\-\(\)\{\}\$]*\.[a-zA-Z0-9]*\)/ \2/"
The second step is to write the output of this command into a script. The command above behaves as expected, but when I try to store its output in a variable, I get a bad substitution error from the first sed command in the line.
#!/bin/bash
nsisscript=myscript.sh
FILES=*
for f in $(find $FILES); do
v=`echo $f | sed -e '/\(.*\..*\)/ !d' | sed -e "s/^.*/\${File} \${INST\_FILES} &/" | sed -e "s/\( \)\([a-zA-Z0-9]*\/\)/\/\2/" | sed -e "s/\(\/\)\([a-zA-Z0-9\_\-\(\)\{\}\$]*\.[a-zA-Z0-9]*\)/ \2/"`
sed -i.backup -e "s/\;Insert files here/$v\\n&/" $nsisscript
done
Could you please help me understand the difference between the two cases and why I get this error?
Thanks in advance!
My guess is that your escaping of the underscore in INST_FILES is the culprit, since the underscore is not a special character in either the shell or sed. The error disappears when you delete the \ before the _.
my 2 cents
Parsing inside backquote-style command substitution is a bit weird: it requires an extra level of escaping (i.e. backslashes) to control when expansions take place. Ugly solution: add more backslashes. Better solution: use $() instead of backquotes; it does the same thing, but without the weird parsing and escaping issues.
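The difference shows up with a tiny example in bash; name is a throwaway variable. Inside backquotes, a backslash before $ is consumed before the command runs. As a result, \${INST\_FILES} in the question reaches the shell as ${INST\_FILES}, which is an invalid parameter name and triggers "bad substitution".

```shell
name=world
a=`echo "\${name}"`    # backquotes turn \$ into $ first, so ${name} is expanded
b=$(echo "\${name}")   # $() keeps \$, so the double quotes yield a literal $
printf '%s\n' "$a" "$b"
```

This should print world, then the literal text ${name}.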
BTW, your script seems to have some other issues. First, I don't know about the sed on your system, but the versions I'm familiar with don't interpret \n in the substitution as a newline (which I presume you want), but as a literal n character. One solution is to include a literal newline in the substitution (preceded by a backslash).
Also, the loop executes for each found file, but for files that don't have a period in the name, the first sed command removes them, $v is empty, and you add a blank line to myscript.sh. You should either put the filtering sed call in the for statement, or add it as a filter to the find command.
#!/bin/bash
nsisscript=myscript.sh
nl=$'\n'
FILES=*
for f in $(find $FILES -name "*.*"); do
v=$(echo $f | sed -e "s/^.*/\${File} \${INST\_FILES} &/" | sed -e "s/\( \)\([a-zA-Z0-9]*\/\)/\/\2/" | sed -e "s/\(\/\)\([a-zA-Z0-9\_\-\(\)\{\}\$]*\.[a-zA-Z0-9]*\)/ \2/")
sed -i.backup -e "s/\;Insert files here/$v\\$nl&/" $nsisscript
done
