invoking sed with a shell variable - bash

Why doesn't this work?
$ s="-e 's/^ *//' -e 's/ *$//'"
$ ls | sed $s
sed: 1: "'s/^
": invalid command code '
$ ls | gsed $s
gsed: -e expression #1, char 1: unknown command: `''
But this does:
$ ls | eval sed $s
... prints stuff ...
$ ls | eval gsed $s
... prints stuff ...
Tried removing single quotes from $s but it only works for patterns without spaces:
$ s="-e s/a/b/"
$ ls | sed $s
... prints stuff ...
$ s="-e s/^ *//"
$ ls | sed $s
sed: 1: "s/^
": unterminated substitute pattern
or
$ s="-e s/^\ *//"
$ ls | sed $s
sed: 1: "s/^\
": unterminated substitute pattern
Mac OS 10.8, bash 4.2, default sed and gsed 4.2.2 from Mac Ports

Simple looking question with a complicated answer. Most of the issue is with the shell; it is only partly a problem with sed. (In other words, you could use a number of different commands instead of sed and would run into similar issues.)
Note that most commands documented with an option letter and a separate argument string will also work when the argument string is attached to the option. For example:
sort -t :
sort -t:
Both of these give the value : to the -t option. Similarly with sed and the -e option. That is, you can write either of these:
sed -n -e /match/p
sed -n -e/match/p
Let's look at one of the working sed commands you wrote:
$ s="-e s/a/b/"
$ ls | sed $s
What the sed command is passed here is two arguments (after its command name):
-e
s/a/b/
This is a perfectly fine set of arguments for sed. What went wrong with the first one, then?
$ s="-e 's/^ *//' -e 's/ *$//'"
$ ls | sed $s
Well, this time, the sed command was passed 6 arguments:
-e
's/^
*//'
-e
's/
*$//'
You can use the al command (argument list — print each argument on its own line; it is described and implemented at the bottom of this answer) to see how arguments are presented to sed. Simply type al in place of sed in the examples.
Now, the -e option should be followed by a valid sed command, but 's/^ is not a valid command; the quote ' is not a valid sed command. When you type the command at the shell prompt, the shell processes the single quote and removes it, so sed does not normally see it, but that happens before shell variables are expanded.
Why, then, does the eval work:
$ s="-e 's/^ *//' -e 's/ *$//'"
$ ls | eval sed $s
The eval re-evaluates the command line. It sees:
eval sed -e 's/^ *//' -e 's/ *$//'
and goes through the full evaluation process. It removes the single quotes after grouping the characters, so sed sees:
-e
s/^ *//
-e
s/ *$//
which is all completely valid sed scripting.
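To see the difference for yourself without compiling anything, here is a minimal sketch. The show_args helper is my own name (a stand-in for al), and the set -f is only there so that stray files in the current directory cannot match the * in the patterns.

```shell
# show_args is a hypothetical stand-in for al: one bracketed argument per line.
show_args() {
    for arg in "$@"; do
        printf '[%s]\n' "$arg"
    done
}

s="-e 's/^ *//' -e 's/ *$//'"

set -f                               # no pathname expansion during the demo
split_output=$(show_args $s)         # plain expansion: split on spaces, quotes kept
eval_output=$(eval show_args $s)     # eval: the shell parses the quotes again
set +f

printf 'unquoted:\n%s\n\neval:\n%s\n' "$split_output" "$eval_output"
```

The unquoted call reports six arguments with the quote characters still attached; the eval call reports the four arguments that sed actually wants.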
One of your tests was:
$ s="-e s/^ *//"
$ ls | sed $s
And this failed because sed was given the arguments:
-e
s/^
*//
Here s/^ is not a valid substitute command, and *// is unlikely to be a valid file name. Interestingly, you could rescue this by putting double quotes around the $s, as in:
$ s="-e s/^ *//"
$ ls | sed "$s"
Now sed gets a single argument:
-e s/^ *//
but the -e can have the command attached, and leading spaces on commands are ignored, so this is all valid. You can't do that with your first attempt, though:
$ s="-e 's/^ *//' -e 's/ *$//'"
$ ls | sed "$s"
Now you get told about the ' not being recognized. You could, however, have used:
$ s="-e s/^ *//; s/ *$//"
$ ls | sed "$s"
Again, sed sees a single argument, and there are two semicolon-separated sed commands in the argument to the -e option.
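A quick check of that last form, plus a variant I find easier to live with: keep only the script in the variable and write the -e yourself. The sample input and the variable names are made up for the example.

```shell
# The whole option, script attached, travels as a single quoted argument.
s="-e s/^ *//; s/ *$//"
trimmed=$(printf '   padded text   \n' | sed "$s")

# Variant: store only the sed script and spell out -e at the call site.
script='s/^ *//; s/ *$//'
trimmed_again=$(printf '   padded text   \n' | sed -e "$script")

printf '[%s]\n[%s]\n' "$trimmed" "$trimmed_again"
```

Both calls hand sed exactly one script argument, so no word splitting is involved at all.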
You can ring the variations from here. I find the al command very useful; it quite often helps me understand where something is going wrong.
Source for al — argument list
#include <stdio.h>

int main(int argc, char **argv)
{
    while (*++argv)
        puts(*argv);
    return 0;
}
This is one of the smallest useful C programs you can write ('hello world' is one line shorter, but it isn't useful for much beyond demonstrating how to compile and run a program). It lists each of its arguments on a line on its own. You can also simulate it in bash and other related shells with the printf command:
printf "%s\n" "$@"
Wrap it as a function:
al()
{
    printf "%s\n" "$@"
}

The sed worked for your normal replace pattern because it did not have any metacharacters. You had just a and b. When there are metacharacters involved, you need single quotes.
I think the only way to make sed work properly with your variable assignment is to use eval.

Related

Shell: Filter list by array of sed expressions

I have a list like this:
> echo $candidates
ENV-NONPROD-SANDBOX
ENV-NONPROD-SANDBOX-SECRETS
ENV-NONPROD-DEMO
ENV-NONPROD-DEMO-SECRETS
ENV-PROD-EU
ENV-PROD-EU-SECRETS
ENV-PROD-US
ENV-PROD-US-SECRETS
I also have a dynamically created list of expressions which I want to apply as filters (AND) to narrow that list to possible candidates:
$ filters=('/-SECRETS/!d' '/-NONPROD/!d') # static example
Then I concatenate this and try to apply, but that does not work:
$ filterParam=$(printf "-e '%s' " "${filters[@]}")
$ echo $filterParam
-e '/-SECRETS/!d' -e '/-NONPROD/!d'
$ echo "$candidates" | sed $filterParam
sed: 1: " '/-SECRETS/\!d' ...": invalid command code '
The strange thing: If I execute it manually, it works!
> echo "$candidates" | sed -e "/-SECRETS/!d" -e "/-NONPROD/!d"
ENV-NONPROD-SANDBOX-SECRETS
ENV-NONPROD-DEMO-SECRETS
I execute this on macOS and zsh 5.8.1 (x86_64-apple-darwin21.0)
filterParam=$(printf "-e '%s' "
No, you can't store command line arguments in variables. Read https://mywiki.wooledge.org/BashFAQ/050 .
You can use bash arrays, which you already use to store filters, so just use them:
sedargs=()
for i in "${filters[@]}"; do
    sedargs+=(-e "$i")
done
sed "${sedargs[@]}"
But sed is sed, so you can also just join the array elements with a newline or a semicolon, either of which delimits sed expressions:
sed "$(printf "%s\n" "${filters[@]}")"
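Wrapped up as a function, the newline-joined version might look like the sketch below. filter_lines is a name I made up; it takes the sed expressions as arguments, so it can be called with "${filters[@]}" in bash or zsh, or with literal expressions as shown here.

```shell
# filter_lines (hypothetical helper): apply every sed expression given as an
# argument to stdin, joined into one newline-separated script.
filter_lines() {
    script=$(printf '%s\n' "$@")
    sed -e "$script"
}

candidates='ENV-NONPROD-SANDBOX
ENV-NONPROD-SANDBOX-SECRETS
ENV-NONPROD-DEMO
ENV-NONPROD-DEMO-SECRETS
ENV-PROD-EU
ENV-PROD-EU-SECRETS
ENV-PROD-US
ENV-PROD-US-SECRETS'

filtered=$(printf '%s\n' "$candidates" | filter_lines '/-SECRETS/!d' '/-NONPROD/!d')
printf '%s\n' "$filtered"
```

Because the expressions reach sed as one quoted argument, nothing depends on how the shell splits words.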
When you do a
sed $filterParam
in zsh, sed is invoked with one single argument, which is the content of the variable filterParam. sed does not know how to handle this.
If you would type the parameters explicitly, i.e.
sed -e "/-SECRETS/!d" -e "/-NONPROD/!d"
sed is invoked with four arguments, and this is what sed understands.
In bash, in the command
sed $filterParam
the value of filterParam would be split at the spaces and each "word" would be passed as a separate argument. In your concrete setting, sed would receive 4 arguments, though the quote characters stored in the variable would still be passed along literally.

Portable sed way to find longest common prefix of strings

The sed solutions in Longest common prefix of two strings in bash only work with GNU sed. I'd like a more portable sed solution (e.g. for BSD/macOS sed, Busybox sed).
The following solutions are tested with GNU sed, macOS (10.15) sed and busybox (v1.29) sed.
$ printf '%s\n' a ab abc | sed -e '$q;N;s/^\(.*\).*\n\1.*$/\1/;h;G;D'
a
$ printf '%s\n' a b c | sed -e '$q;N;s/^\(.*\).*\n\1.*$/\1/;h;G;D'
$
To be more efficient when there are many strings especially when there's no common prefix at all (note the ..* part which is different from the previous solution):
$ printf '%s\n' a ab abc | sed -ne :L -e '$p;N;s/^\(..*\).*\n\1.*/\1/;tL' -e q
a
$ printf '%s\n' a b c | sed -ne :L -e '$p;N;s/^\(..*\).*\n\1.*/\1/;tL' -e q
$
Regarding $q in the first solution
According to GNU sed manual (info sed):
N command on the last line
Most versions of sed exit without printing anything when the N command is issued on the last line of a file. GNU sed prints pattern space before exiting unless of course the -n command switch has been specified.
Note that I did not use sed -E because macOS sed's -E does not support \N back-reference in s/pattern/replace/ command's pattern string.
$ # with GNU sed:
$ echo foofoo | gsed -E 's/(foo)\1/bar/'
bar
$
$ # with macOS's own sed:
$ echo foofoo | sed -E 's/(foo)\1/bar/'
foofoo
$
UPDATE (2021-04-26):
Found this in another answer :
sed -e '1{h;d;}' -e 'G;s/\(.*\).*\n\1.*/\1/;h;$!d'
Note that it does not work when there's only one line. That is easily fixed by dropping the d from the 1{h;d;} block:
sed -e '1h;G;s/^\(.*\).*\n\1.*/\1/;h;$!d'
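For convenience, the last command can be wrapped in a function. common_prefix is just a name I picked and the sample words are made up; it reads the strings from stdin, one per line.

```shell
# common_prefix (hypothetical name): print the longest common prefix of the
# lines on stdin, using the BRE-only sed script shown above.
common_prefix() {
    sed -e '1h;G;s/^\(.*\).*\n\1.*/\1/;h;$!d'
}

prefix=$(printf '%s\n' interstellar internet interval | common_prefix)
single=$(printf '%s\n' lonely | common_prefix)
printf '%s\n%s\n' "$prefix" "$single"
```

The second call covers the single-line case that the 1{h;d;} version got wrong.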

Error on sed script - extra characters after command

I've been trying to create a sed script that reads a list of phone numbers and only prints ones that match the following schemes:
+1(212)xxx-xxxx
1(212)xxx-xxxx
I'm an absolute beginner, but I tried to write a sed script that would print this for me using the -n -r flags (the contents of which are as follows):
/\+1\(212\)[0-9]{3}-[0-9]{4}/p
/1\(212\)[0-9]{3}-[0-9]{4}/p
If I run this in sed directly, it works fine (i.e. sed -n -r '/\+1\(212\)[0-9]{3}-[0-9]{4}/p' sample.txt prints matching lines as expected). This does NOT work in the sed script I wrote; instead sed says:
sed: -e expression #1, char 2: extra characters after command
I could not find a good solution, this error seems to have so many causes and none of the answers I found apply easily here.
EDIT: I ran it with sed -n -r script.sed sample.txt
sed cannot automatically determine whether you intended a parameter to be a script file or a script string.
To run a sed script from a file, you have to use -f:
$ echo 's/hello/goodbye/g' > demo.sed
$ echo "hello world" | sed -f demo.sed
goodbye world
If you neglect the -f, sed will try to run the filename as a command, and the delete command is not happy to have emo.sed after it:
$ echo "hello world" | sed demo.sed
sed: -e expression #1, char 2: extra characters after command
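Applied to the script from the question, a minimal end-to-end sketch could look like this. The file names and sample numbers are invented, I folded the two patterns into one with an optional +, and I use -E rather than -r on the assumption that your sed accepts it (current GNU and BSD versions do).

```shell
workdir=$(mktemp -d)

# The sed script lives in a file, so it must be passed with -f.
cat > "$workdir/phones.sed" <<'EOF'
/^\+?1\(212\)[0-9]{3}-[0-9]{4}$/p
EOF

printf '%s\n' '+1(212)555-0100' '1(212)555-0199' \
    '212-555-0123' '1(646)555-0100' > "$workdir/sample.txt"

matches=$(sed -n -E -f "$workdir/phones.sed" "$workdir/sample.txt")
rm -rf "$workdir"
printf '%s\n' "$matches"
```

Only the two numbers with a leading 1 and the (212) area code are printed.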
Of the various unix tools out there, two use BRE as their default regex dialect. Those two tools are sed and grep.
In most operating systems, you can use egrep or grep -E to tell that tool to use ERE as its dialect. A smaller (but still significant) number of sed implementations will accept a -E option to use ERE.
In BRE mode, however, you can still create atoms with brackets. And you do it by escaping parentheses. That's why your initial expression is failing -- the parentheses are NOT special by default in BRE, but you're MAKING THEM SPECIAL by preceding the characters with backslashes.
The other thing to keep in mind is that if you want sed to execute a script from a command line argument, you should use the -e option.
So:
$ cat ph.txt
+1(212)xxx-xxxx
1(212)xxx-xxxx
212-xxx-xxxx
$ grep '^+\{0,1\}1([0-9]\{3\})' ph.txt
+1(212)xxx-xxxx
1(212)xxx-xxxx
$ egrep '^[+]?1\([0-9]{3}\)' ph.txt
+1(212)xxx-xxxx
1(212)xxx-xxxx
$ sed -n -e '/^+\{0,1\}1([0-9]\{3\})/p' ph.txt
+1(212)xxx-xxxx
1(212)xxx-xxxx
$ sed -E -n -e '/^[+]?1\([0-9]{3}\)/p' ph.txt
+1(212)xxx-xxxx
1(212)xxx-xxxx
Depending on your OS, you may be able to get a full list of how this works from man re_format.

How to compose custom command-line argument from file lines?

I know about the xargs utility, which allows me to convert lines into multiple arguments, like this:
echo -e "a\nb\nc\n" | xargs
Results in:
a b c
But I want to get:
a:b:c
The character : is used for an example. I want to be able to insert any separator between lines to get a single argument. How can I do it?
If you have a file with multiple lines that you want to turn into a single argument, replacing each newline with a single character, the paste command is what you need:
$ echo -en "a\nb\nc\n" | paste -s -d ":"
a:b:c
Then, your command becomes:
your_command "$(paste -s -d ":" your_file)"
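As a small reusable sketch (join_lines is a made-up name), the paste call can be put in a function that takes the separator as its argument. Note that paste treats the -d value as a list of single characters used in rotation, so this only covers one-character separators.

```shell
# join_lines (hypothetical helper): join stdin lines with a one-character
# separator. The trailing "-" makes paste read stdin on BSD systems as well.
join_lines() {
    paste -s -d "$1" -
}

joined=$(printf '%s\n' a b c | join_lines ':')
printf '%s\n' "$joined"
```

The result can then be passed on as a single argument, for example your_command "$joined".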
EDIT:
If you want to insert more than a single character as a separator, you could use sed before paste:
your_command "$(sed -e '2,$s/^/<your_separator>/' your_file | paste -s -d "")"
Or use a single more complicated sed:
your_command "$(sed -n -e '1h;2,$H;${x;s/\n/<your_separator>/gp}' your_file)"
The example you gave is not working for me. You would need:
echo -e "a\nb\nc\n" | xargs
to get a b c.
Coming back to your need, you could do this:
echo "a b c" | awk 'OFS=":" {print $1, $2, $3}'
it will change the separator from space to : or whatever you want it to be.
You can also use sed:
echo "a b c" | sed -e 's/ /:/g'
that will output a:b:c.
After all this data processing, you can use xargs to perform the command you want. Just add | xargs and do whatever you want.
Hope it helps.
You can join the lines using xargs and then replace the spaces (' ') using sed.
echo -e "a\nb\nc"|xargs| sed -e 's/ /:/g'
will result in
a:b:c
Obviously you can use this output as an argument for another command using another xargs.
echo -e "a\nb\nc"|xargs| sed -e 's/ /:/g'|xargs

SED bad substitution error

Here's my problem, I have written the following line of code to format properly a list of files found recursively in a directory.
find * | sed -e '/\(.*\..*\)/ !d' | sed -e "s/^.*/\${File} \${INST\_FILES} &/" | sed -e "s/\( \)\([a-zA-Z0-9]*\/\)/\/\2/" | sed -e "s/\(\/\)\([a-zA-Z0-9\_\-\(\)\{\}\$]*\.[a-zA-Z0-9]*\)/ \2/"
The second step is to write the output of this command in a script. While the code above has the expected behavior, the problem occurs when I try to store its output to a variable, I get a bad substitution error from the first sed command in the line.
#!/bin/bash
nsisscript=myscript.sh
FILES=*
for f in $(find $FILES); do
    v=`echo $f | sed -e '/\(.*\..*\)/ !d' | sed -e "s/^.*/\${File} \${INST\_FILES} &/" | sed -e "s/\( \)\([a-zA-Z0-9]*\/\)/\/\2/" | sed -e "s/\(\/\)\([a-zA-Z0-9\_\-\(\)\{\}\$]*\.[a-zA-Z0-9]*\)/ \2/"`
    sed -i.backup -e "s/\;Insert files here/$v\\n&/" $nsisscript
done
Could you please help me understand what the difference is between the two cases and why I get this error ?
Thanks in advance!
Well, my guess is that your escaping of the underscore in INST_FILES is the problem: underscore is not a special character in the shell or in sed. The error disappears when you delete the '\' before '_'.
my 2 cents
Parsing inside of backquote-style command substitution is a bit weird -- it requires an extra level of escaping (i.e. backslashes) to control when expansions take place. Ugly solution: add more backslashes. Better solution: use $() instead of backquotes -- it does the same thing, but without the weird parsing and escaping issues.
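A small sketch of the parsing difference, using a throwaway variable name of my own. Inside backquotes, a backslash followed by $, a backquote, or another backslash is consumed before the inner command is parsed; inside $() the text is parsed as written.

```shell
name=world

# Backquotes strip one level of backslashes first, so the inner command
# becomes: printf '%s' "\$name"  and the $ ends up escaped.
with_backquotes=`printf '%s' "\\$name"`

# $() parses the text as written: a literal backslash, then the expansion.
with_dollar_paren=$(printf '%s' "\\$name")

printf '%s\n%s\n' "$with_backquotes" "$with_dollar_paren"
```

The same source text produces two different strings, which is why a pipeline that works at the prompt can break once it is wrapped in backquotes.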
BTW, your script seems to have some other issues. First, I don't know about the sed on your system, but the versions I'm familiar with don't interpret \n in the substitution as a newline (which I presume you want), but as a literal n character. One solution is to include a literal newline in the substitution (preceded by a backslash).
Also, the loop executes for each found file, but for files that don't have a period in the name, the first sed command removes them, $v is empty, and you add a blank line to myscript.sh. You should either put the filtering sed call in the for statement, or add it as a filter to the find command.
#!/bin/bash
nsisscript=myscript.sh
nl=$'\n'
FILES=*
for f in $(find $FILES -name "*.*"); do
    v=$(echo $f | sed -e "s/^.*/\${File} \${INST\_FILES} &/" | sed -e "s/\( \)\([a-zA-Z0-9]*\/\)/\/\2/" | sed -e "s/\(\/\)\([a-zA-Z0-9\_\-\(\)\{\}\$]*\.[a-zA-Z0-9]*\)/ \2/")
    sed -i.backup -e "s/\;Insert files here/$v\\$nl&/" $nsisscript
done
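The newline trick in isolation, as a minimal sketch with made-up replacement text. I spell the newline variable as a quoted line break here, which should also work in shells that lack the $'\n' syntax.

```shell
# A literal newline; equivalent to nl=$'\n' in bash.
nl='
'

# Inside double quotes, \\ becomes one backslash, so sed sees the
# replacement text, a backslash, a real newline, and then & (the match).
result=$(printf '%s\n' ';Insert files here' |
    sed -e "s/;Insert files here/new line of text\\${nl}&/")

printf '%s\n' "$result"
```

The new text lands on its own line directly above the marker, and the marker itself is kept for the next insertion.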
