Error on sed script - extra characters after command - bash

I've been trying to create a sed script that reads a list of phone numbers and only prints ones that match the following schemes:
+1(212)xxx-xxxx
1(212)xxx-xxxx
I'm an absolute beginner, but I tried to write a sed script that would print this for me using the -n -r flags (the contents of which are as follows):
/\+1\(212\)[0-9]{3}-[0-9]{4}/p
/1\(212\)[0-9]{3}-[0-9]{4}/p
If I run this in sed directly, it works fine (i.e. sed -n -r '/\+1\(212\)[0-9]{3}-[0-9]{4}/p' sample.txt prints matching lines as expected). This does NOT work in the sed script I wrote; instead sed says:
sed: -e expression #1, char 2: extra characters after command
I could not find a good solution; this error seems to have many possible causes, and none of the answers I found apply easily here.
EDIT: I ran it with sed -n -r script.sed sample.txt

sed cannot automatically determine whether you intended a parameter to be a script file or a script string.
To run a sed script from a file, you have to use -f:
$ echo 's/hello/goodbye/g' > demo.sed
$ echo "hello world" | sed -f demo.sed
goodbye world
If you neglect the -f, sed will treat the filename itself as the script: the leading d is parsed as the delete command, which is not happy to have emo.sed after it:
$ echo "hello world" | sed demo.sed
sed: -e expression #1, char 2: extra characters after command
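Applied to the question, a minimal sketch might look like the following. The file names and sample numbers are made up for illustration, and the two patterns are merged into one anchored expression (an assumption about the intent), because an unanchored /1\(212\).../ also matches the +1(212) lines and would print them twice. -E is used here as the portable spelling of GNU's -r.

```shell
# Hypothetical file names; a temp directory avoids clobbering real files.
workdir=$(mktemp -d)

# The script file: one ERE that accepts an optional leading "+".
cat > "$workdir/script.sed" <<'EOF'
/^\+?1\(212\)[0-9]{3}-[0-9]{4}$/p
EOF

# Made-up sample input.
printf '%s\n' '+1(212)555-0100' '1(212)555-0101' \
              '212-555-0102' '1(646)555-0103' > "$workdir/sample.txt"

# -f tells sed that the next argument is a script FILE, not a script string.
matches=$(sed -n -E -f "$workdir/script.sed" "$workdir/sample.txt")
printf '%s\n' "$matches"

rm -rf "$workdir"
```

Only the first two sample lines should be printed.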

Of the various unix tools out there, two use BRE as their default regex dialect. Those two tools are sed and grep.
In most operating systems, you can use egrep or grep -E to tell that tool to use ERE as its dialect. A smaller (but still significant) number of sed implementations will accept a -E option to use ERE.
In BRE mode, however, you can still create atoms with brackets. And you do it by escaping parentheses. That's why your initial expression is failing -- the parentheses are NOT special by default in BRE, but you're MAKING THEM SPECIAL by preceding the characters with backslashes.
The other thing to keep in mind is that if you want sed to execute a script from a command line argument, you should use the -e option.
So:
$ cat ph.txt
+1(212)xxx-xxxx
1(212)xxx-xxxx
212-xxx-xxxx
$ grep '^+\{0,1\}1([0-9]\{3\})' ph.txt
+1(212)xxx-xxxx
1(212)xxx-xxxx
$ egrep '^[+]?1\([0-9]{3}\)' ph.txt
+1(212)xxx-xxxx
1(212)xxx-xxxx
$ sed -n -e '/^+\{0,1\}1([0-9]\{3\})/p' ph.txt
+1(212)xxx-xxxx
1(212)xxx-xxxx
$ sed -E -n -e '/^[+]?1\([0-9]{3}\)/p' ph.txt
+1(212)xxx-xxxx
1(212)xxx-xxxx
Depending on your OS, you may be able to get a full list of how this works from man re_format.

Related

What is the usage of -e flag in sed?

From some online reading, it seems that sed's -e flag usage is to note a sed script
e.g:
sed -i -e 's/default_language/language/g' "$CONF_FILE"
but from self-testing and some online search, it seems that this line should also work:
sed -i 's/default_language/language/g' "$CONF_FILE"
So what do I need -e for? Is it only useful when I'd like to write several scripts in a row? That can also be managed with ;.
According to the manual:
If no -e, --expression, -f, or --file option is given, then the first non-option argument is taken as the sed
script to interpret. All remaining arguments are names of input files; if no input files are specified, then
the standard input is read.
As you already mentioned, -e may be used for multiple commands.
sed 'cmd1; cmd2'
sed -e 'cmd1; cmd2'
sed -e 'cmd1' -e 'cmd2'
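As a quick check that the three forms behave the same, here is a small sketch. The input string and the two substitutions are invented for the example.

```shell
# Made-up input line for the comparison.
input='default_language = en'

# 1) Script given as the first non-option argument.
one=$(printf '%s\n' "$input" | sed 's/default_//; s/en$/fr/')

# 2) Same script, passed explicitly with -e.
two=$(printf '%s\n' "$input" | sed -e 's/default_//; s/en$/fr/')

# 3) One -e per command; sed joins them as if separated by newlines.
three=$(printf '%s\n' "$input" | sed -e 's/default_//' -e 's/en$/fr/')

printf '%s\n' "$one" "$two" "$three"
```

All three variables should hold the same text, language = fr.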

Replace all unquoted characters from a file bash

Using bash, how would one replace all unquoted characters from a file?
I have a system that I can't modify that spits out CSV files such as:
code;prop1;prop2;prop3;prop4;prop5;prop6
0,1000,89,"a1,a2,a3",33,,
1,,,"a55,a10",1,1 L,87
2,25,1001,a4,,"1,5 L",
I need this to become, for a new system being added
code;prop1;prop2;prop3;prop4;prop5;prop6
0;1000;89;a1,a2,a3;33;;
1;;;a55,a10;1;1 L;87
2;25;1001;a4;;1,5 L;
If the quotes can be removed after this substitution happens in one command it would be nice :) But I prefer clarity to complicated one-liners for future maintenance.
Thank you
With sed:
sed -e 's/,/;/g' -e ':loop; s/\("\)\([^;]*\);\([^"]*"\)/\1\2,\3/; t loop'
Test:
$ sed -e 's/,/;/g' -e ':loop; s/\("\)\([^;]*\);\([^"]*"\)/\1\2,\3/; t loop' yourfile
code;prop1;prop2;prop3;prop4;prop5;prop6
0;1000;89;"a1,a2,a3";33;;
1;;;"a55,a10";1;1 L;87
2;25;1001;a4;;"1,5 L";
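The question also asks for the quotes to be removed. A different sketch that does both steps is shown below: it converts only the commas that sit outside double quotes, then deletes the quotes. The function name csv_to_semicolons is made up, and the approach assumes fields never contain escaped or doubled quotes or embedded newlines; for anything beyond that, a real CSV parser is the safer route.

```shell
# Hypothetical helper: reads CSV on stdin (or from files given as arguments).
csv_to_semicolons() {
    # The prefix group matches any run of non-quote, non-comma characters
    # and complete "quoted strings", so the comma that follows it is always
    # outside quotes. Loop until no such comma remains, then drop the quotes.
    sed -E \
        -e ':next' \
        -e 's/^(([^",]|"[^"]*")*),/\1;/' \
        -e 't next' \
        -e 's/"//g' "$@"
}

converted=$(printf '%s\n' \
    '0,1000,89,"a1,a2,a3",33,,' \
    '1,,,"a55,a10",1,1 L,87' \
    '2,25,1001,a4,,"1,5 L",' | csv_to_semicolons)
printf '%s\n' "$converted"
```

Each sed command is passed in its own -e so that the label is terminated the same way on GNU and BSD sed.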
You want to use a csv parser. Parsing csv with shell tools is hard (you will encounter regular expressions soon, and they rarely get all cases).
There is one in almost every language. I recommend python.
You can also do this using excel/openoffice variants by opening the file and then saving with ; as the separator.
You can use sed:
echo '0,1000,89,"a1,a2,a3",33,,' | sed -e "s|\"||g"
This will replace " with the empty string (deletes it), and you can pipe another sed to replace the , with ;:
sed -e "s|,|;|g"
$ echo '0,1000,89,"a1,a2,a3",33,,' | sed -e "s|\"||g" | sed -e "s|,|;|g"
>> 0;1000;89;a1;a2;a3;33;;
Note that you can use any separator you want instead of | inside the sed command. For example, you can rewrite the first sed as:
sed -e "s-\"--g"

Replacing special characters in a shell script using sed

I am trying to write a shell script that will replace whatever characters/strings I choose using sed. My first attempt worked with the exception of special characters. I have been trying to use sed to fix the special characters so that they too will be searched for or replaced. I decided to simplify the script for testing purposes, and just deal with a single offending character. However, I am still having problems.
Edited Script
#! /bin/sh
oldString=$1
newString=$2
file=$3
oldStringFixed=$(echo "$oldString" | sed 's/\\/\\\\/g')
oldStringFixed=$(echo "$oldStringFixed" | sed 's/\[/\\\[/g')
oldStringFixed=$(echo "$oldStringFixed" | sed 's/\]/\\\]/g')
oldStringFixed=$(echo "$oldStringFixed" | sed 's/\^/\\\^/g')
oldStringFixed=$(echo "$oldStringFixed" | sed 's/\*/\\\*/g')
oldStringFixed=$(echo "$oldStringFixed" | sed 's/\+/\\\+/g')
oldStringFixed=$(echo "$oldStringFixed" | sed 's/\./\\\./g')
oldStringFixed=$(echo "$oldStringFixed" | sed 's/\$/\\\$/g')
oldStringFixed=$(echo "$oldStringFixed" | sed 's/\-/\\\-/g')
sed -e "s/$oldStringFixed/$newString/g" "$file" > newfile.updated
mv newfile.updated "$file"
In case it is not clear, I am trying to search through oldString for the [ character, and replace it with an escaped version and assign the results to oldStringFixed (do I need the backticks for this?). The bottom two lines are slightly modified versions of my original script that I believe works correctly.
When I echo the fixed string, nothing is displayed, and sed outputs an error
sed: can't read [: No such file or directory
Can anyone explain what is wrong with my first sed line?
EDIT:
Thanks to Jite, the script is working better. However, I am still having a problem with replacing single quoted characters with spaces, i.e. ' *'. The new version is above.
I suggest two improvements:
Do not stack calls to sed as you do, instead pack all of them in a single function, as escape_string below.
You can use a fancy delimiter for the sed substitute command to avoid issues linked to / being part of the strings involved.
With these changes, your script looks like:
#! /bin/sh
oldString="$1"
newString="$2"
file="$3"
escape_string()
{
# & stands for the matched character; + and - are ordinary in basic regexes and must stay unescaped
printf '%s' "$1" | sed -e 's/[][\\^*.$]/\\&/g'
}
fancyDelim=$(printf '\001')
oldStringFixed=$(escape_string "$oldString")
sed -e "s$fancyDelim$oldStringFixed$fancyDelim$newString${fancyDelim}g" "$file" \
> newfile.updated
mv newfile.updated "$file"
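The replacement text has special characters of its own (&, backslash and the delimiter), so it may need escaping too. The sketch below handles both sides for a / delimiter. The function names are made up, and it assumes single-line strings and a sed that uses basic regular expressions by default.

```shell
# Escape characters that are special in a basic regular expression,
# plus the "/" delimiter. "&" in the sed replacement is the matched char.
escape_bre() {
    printf '%s\n' "$1" | sed -e 's/[[\\.*^$/]/\\&/g'
}

# Escape characters that are special in the replacement half of s///.
escape_replacement() {
    printf '%s\n' "$1" | sed -e 's/[\\&/]/\\&/g'
}

# $1 = literal text to find, $2 = literal replacement, stdin = input.
replace_literal() {
    sed -e "s/$(escape_bre "$1")/$(escape_replacement "$2")/g"
}

result=$(printf '%s\n' 'price [a*b] = $5.00' | replace_literal '[a*b]' 'a&b/c')
printf '%s\n' "$result"
```

With this input, the bracketed text is replaced literally, so the output should read price a&b/c = $5.00.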
To replace values containing special characters try using sed with "|" instead of "/"
Eg: sed -i 's|'$original_value'|'$new_value'|g'
where original_value="comprising_special_char_/"
new_value="comprising_new_special_char:"
Change:
oldStringFixed= `sed 's/\[/\[/g' "$oldString"`
to:
oldStringFixed=$(echo "$oldString" | sed 's/\[/\\\[/g')
Problem 1: Space after =, it's not allowed when assigning shell variables.
Problem 2: sed expects a file as input, not a string. You may pipe it as my solution does though.
Problem 3: You need to escape the backslash first \\, then you need to escape your char \[, totalling \\\[ :)
Side note: I changed `` to $() since the latter is the recommended practice (due to nesting, another topic).
For me it was just a nightmare trying to get sed to do this for the general case. I gave up and wrote a short Python code to replace sed:
#!/usr/bin/env python3
# replace.py
import sys
# Replace a literal string in a file (in place)
match = sys.argv[1]
replace = sys.argv[2]
filename = sys.argv[3]
print("Replacing strings in", filename)
with open(filename, "r") as f:
    # str.replace treats match literally, so no regex escaping is needed
    data = f.read().replace(match, replace)
with open(filename, "w") as f:
    f.write(data)
Which can then be used like:
#!/bin/bash
orig='<somethinghorrible>'
out='<replacement>'
python replace.py "$orig" "$out" myfile.txt
You can use this to replace " with \": sed 's/\"/\\\"/g' filename

invoking sed with a shell variable

Why doesn't this work?
$ s="-e 's/^ *//' -e 's/ *$//'"
$ ls | sed $s
sed: 1: "'s/^
": invalid command code '
$ ls | gsed $s
gsed: -e expression #1, char 1: unknown command: `''
But this does:
$ ls | eval sed $s
... prints stuff ...
$ ls | eval gsed $s
... prints stuff ...
Tried removing single quotes from $s but it only works for patterns without spaces:
$ s="-e s/a/b/"
$ ls | sed $s
... prints stuff ...
$ s="-e s/^ *//"
$ ls | sed $s
sed: 1: "s/^
": unterminated substitute pattern
or
$ s="-e s/^\ *//"
$ ls | sed $s
sed: 1: "s/^\
": unterminated substitute pattern
Mac OS 10.8, bash 4.2, default sed and gsed 4.2.2 from Mac Ports
Simple looking question with a complicated answer. Most of the issue is with the shell; it is only partly a problem with sed. (In other words, you could use a number of different commands instead of sed and would run into similar issues.)
Note that most commands documented with an option letter and a separate argument string will also work when the argument string is attached to the option. For example:
sort -t :
sort -t:
Both of these give the value : to the -t option. Similarly with sed and the -e option. That is, you can write either of these:
sed -n -e /match/p
sed -n -e/match/p
Let's look at one of the working sed commands you wrote:
$ s="-e s/a/b/"
$ ls | sed $s
What the sed command is passed here is two arguments (after its command name):
-e
s/a/b/
This is a perfectly fine set of arguments for sed. What went wrong with the first one, then?
$ s="-e 's/^ *//' -e 's/ *$//'"
$ ls | sed $s
Well, this time, the sed command was passed 6 arguments:
-e
's/^
*//'
-e
's/
*$//'
You can use the al command (argument list — print each argument on its own line; it is described and implemented at the bottom of this answer) to see how arguments are presented to sed. Simply type al in place of sed in the examples.
Now, the -e option should be followed by a valid sed command, but 's/^ is not a valid command; the quote ' is not a valid sed command. When you type the command at the shell prompt, the shell processes the single quote and removes it, so sed does not normally see it, but that happens before shell variables are expanded.
Why, then, does the eval work:
$ s="-e 's/^ *//' -e 's/ *$//'"
$ ls | eval sed $s
The eval re-evaluates the command line. It sees:
eval sed -e 's/^ *//' -e 's/ *$//'
and goes through the full evaluation process. It removes the single quotes after grouping the characters, so sed sees:
-e
s/^ *//
-e
s/ *$//
which is all completely valid sed scripting.
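The argument counts described above can be checked without any extra tooling. In this sketch, count_args is a made-up helper that simply reports how many arguments it received; it assumes a Bourne-style shell with default IFS.

```shell
# Hypothetical helper: print the number of arguments received.
count_args() {
    echo "$#"
}

s="-e 's/^ *//' -e 's/ *$//'"

# Unquoted: the value is split on whitespace; the single quotes inside it
# are ordinary characters by then, so they do not group anything.
unquoted=$(count_args $s)

# Quoted: the whole value arrives as one argument.
quoted=$(count_args "$s")

# eval: the text is parsed again as shell code, so the quotes group words.
evaled=$(eval count_args $s)

echo "unquoted=$unquoted quoted=$quoted evaled=$evaled"
```

This should report 6, 1 and 4 arguments respectively, matching the three cases discussed.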
One of your tests was:
$ s="-e s/^ *//"
$ ls | sed $s
And this failed because sed was given the arguments:
-e
s/^
*//
The first is not a valid substitute command, and the second is unlikely to be a valid file name. Interestingly, you could rescue this by putting double quotes around the $s, as in:
$ s="-e s/^ *//"
$ ls | sed "$s"
Now sed gets a single argument:
-e s/^ *//
but the -e can have the command attached, and leading spaces on commands are ignored, so this is all valid. You can't do that with your first attempt, though:
$ s="-e 's/^ *//' -e 's/ *$//'"
$ ls | sed "$s"
Now you get told about the ' not being recognized. You could, however, have used:
$ s="-e s/^ *//; s/ *$//"
$ ls | sed "$s"
Again, sed sees a single argument, and there are two semicolon-separated sed commands in the argument to the -e option.
You can ring the variations from here. I find the al command very useful; it quite often helps me understand where something is going wrong.
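One further variation, not mentioned above, avoids both eval and the single-argument trick: keep the options as separate words rather than as one string. The sketch below uses the positional parameters of a function, which works in any POSIX shell; in bash an array such as s=(-e 's/^ *//' -e 's/ *$//') expanded as "${s[@]}" should serve the same purpose. The function name trim is made up.

```shell
# Hypothetical wrapper: strip leading and trailing spaces from each line.
trim() {
    # set -- replaces this function's own positional parameters;
    # "$@" later expands to exactly these four words, quotes already removed.
    set -- -e 's/^ *//' -e 's/ *$//'
    sed "$@"
}

trimmed=$(printf '%s\n' '   hello world   ' | trim)
printf '[%s]\n' "$trimmed"
```

Because each script is stored as its own word, embedded spaces survive and sed never sees a stray quote character.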
Source for al — argument list
#include <stdio.h>
int main(int argc, char **argv)
{
while (*++argv)
puts(*argv);
return 0;
}
This is one of the smallest useful C programs you can write ('hello world' is one line shorter, but it isn't useful for much beyond demonstrating how to compile and run a program). It lists each of its arguments on a line on its own. You can also simulate it in bash and other related shells with the printf command:
printf "%s\n" "$@"
Wrap it as a function:
al()
{
printf "%s\n" "$@"
}
The sed worked for your normal replace pattern because it did not have any metacharacters. You had just a and b. When there are metacharacters involved, you need single quotes.
I think the only way sed would work properly for your variable assignment case is only by using eval.

using sed to find and replace in bash for loop

I have a large number of words in a text file to replace.
This script is working up until the sed command where I get:
sed: 1: "*.js": invalid command code *
PS... Bash isn't one of my strong points - this doesn't need to be pretty or efficient
cd '/Users/xxxxxx/Sites/xxxxxx'
echo `pwd`;
for line in `cat myFile.txt`
do
export IFS=":"
i=0
list=()
for word in $line; do
list[$i]=$word
i=$[i+1]
done
echo ${list[0]}
echo ${list[1]}
sed -i "s/{$list[0]}/{$list[1]}/g" *.js
done
You're running BSD sed (under OS X), therefore the -i flag requires an argument specifying what you want the suffix to be.
Also, no files match the glob *.js.
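For in-place edits that should run on both BSD (OS X) and GNU sed, one common approach is to attach a backup suffix directly to -i, with no space in between. The sketch below uses a made-up file in a temporary directory.

```shell
# Hypothetical sample file in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'var foo = 1;' > "$workdir/app.js"

# -i.bak edits the file in place and keeps the original as app.js.bak.
# The attached-suffix form is accepted by both GNU and BSD sed.
sed -i.bak 's/foo/bar/g' "$workdir/app.js"

edited=$(cat "$workdir/app.js")
backup=$(cat "$workdir/app.js.bak")
printf '%s\n' "$edited" "$backup"

rm -rf "$workdir"
```

The backup files can be deleted afterwards once the result has been checked.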
This looks like a simple typo:
sed -i "s/{$list[0]}/{$list[1]}/g" *.js
Should be:
sed -i "s/${list[0]}/${list[1]}/g" *.js
(just like the echo lines above)
So myFile.txt contains a list of from:to substitutions, and you are looping over each of those. Why don't you create a sed script from this file instead?
cd '/Users/xxxxxx/Sites/xxxxxx'
sed -e 's/^/s:/' -e 's/$/:/' myFile.txt |
# Output from first sed script is a sed script!
# It contains substitutions like this:
# s:from:to:
# s:other:substitute:
sed -f - -i~ *.js
Your sed might not like the -f - which means sed should read its script from standard input. If that is the case, perhaps you can create a temporary script like this instead;
sed -e 's/^/s:/' -e 's/$/:/' myFile.txt >script.sed
sed -f script.sed -i~ *.js
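To see what the generated script looks like before letting it touch real files, a dry run along these lines may help. The file contents are invented, and a trailing g flag is added to each generated command (an addition to the answer's version) so that every occurrence on a line is replaced, as in the question's original command.

```shell
# Made-up inputs in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'foo:bar' 'old:new' > "$workdir/myFile.txt"
printf '%s\n' 'var foo = old;' > "$workdir/app.js"

# Turn each "from:to" line into "s:from:to:g".
sed -e 's/^/s:/' -e 's/$/:g/' "$workdir/myFile.txt" > "$workdir/script.sed"
generated=$(cat "$workdir/script.sed")

# Dry run: no -i, so the result goes to stdout and app.js is untouched.
result=$(sed -f "$workdir/script.sed" "$workdir/app.js")

printf '%s\n' "$generated" "$result"
rm -rf "$workdir"
```

Note that the from and to values are used as regex and replacement text, so any special characters in myFile.txt would still need escaping.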
Another approach, if you don't feel very confident with sed and think that in a week you will have forgotten what those voodoo symbols mean, is to let read split the fields using IFS:
# IFS=: applies only to read, which splits each line into the two ":"-separated fields
while IFS=: read -r PATTERN REPLACEMENT
do
    sed -i~ "s/${PATTERN}/${REPLACEMENT}/g" *.js
done < myFile.txt
The only pitfall I can see (there may be more) is that if either PATTERN or REPLACEMENT contains a slash (/), it will break the sed expression.
You can use a non-printable character as the sed delimiter instead, and you should be safe.
Anyway, if you know what is in myFile.txt, you can just pick any delimiter that does not appear in it.

Resources