Newlines at the end "$( .... )" get removed in shell scripts. Why? - shell

Can someone explain why the output of these two commands is different?
$ echo "NewLine1\nNewLine2\n"
NewLine1
NewLine2
<-- Note 2nd newline here
$ echo "$(echo "NewLine1\nNewLine2\n")"
NewLine1
NewLine2
$ <-- No second newline
Is there any good way to keep the newlines at the end of the output of "$( .... )"? I've thought about just adding a dummy letter and removing it, but I'd quite like to understand why those newlines are going away.

Because that's what POSIX specifies, and it has always been like that in Bourne-style shells:
2.6.3 Command Substitution
Command substitution allows the output of a command to be substituted
in place of the command name itself. Command substitution shall occur
when the command is enclosed as follows:
$(command)
or (backquoted version):
`command`
The shell shall expand the command substitution by executing command
in a subshell environment (see Shell Execution Environment) and
replacing the command substitution (the text of command plus the
enclosing "$()" or backquotes) with the standard output of the
command, removing sequences of one or more <newline> characters at the
end of the substitution. Embedded <newline> characters before the end
of the output shall not be removed; however, they may be treated as
field delimiters and eliminated during field splitting, depending on
the value of IFS and quoting that is in effect. If the output contains
any null bytes, the behavior is unspecified.
One way to keep the final newline(s) would be
VAR="$(command; echo x)" # Append x to keep newline(s).
VAR=${VAR%x} # Chop x.
Vis.:
$ x="$(whoami; echo x)" ; printf '<%s>\n' "$x" "${x%x}"
<user
x>
<user
>
But why remove trailing newlines? Because more often than not you want it that way. I also program in Perl, and I can't count the number of times I read a line or variable and then need to chop the newline:
while (defined ($string = <>)) {
    chop $string;
    frobnitz($string);
}

Command substitution removes every trailing newline.
It makes sense to remove one. For instance:
basename foo/bar
outputs bar\n. In:
var=$(basename foo/bar)
you want $var to contain bar, not bar\n.
However in
var=$(basename $'foo/bar\n')
you would like $var to contain bar\n (after all, a newline is as valid a character as any in a file name on Unix). But all shells remove every trailing newline character. That misfeature was in the original Bourne shell, and even rc, which fixed most of Bourne's flaws, did not fix that one (though rc has the ``(){cmd} syntax to avoid stripping any newline character).
In POSIX shells, to work around the issue, you can do:
var=$(basename -- "$file"; echo .)
var=${var%??}
Though you're then losing the exit status of basename. Which you can fix with:
var=$(basename -- "$file" && echo .) && var=${var%??}
${var%??} removes the last two characters: the . we added above and the single newline character appended by basename. We don't strip any more than that (the way command substitution would), because any other trailing newline characters are part of the filename whose base name we want, so we need to keep them.
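For instance, a quick hypothetical demonstration with a file name that actually ends in a newline (the path and the bash-specific $'...' quoting are just for illustration):
$ file=$'some/dir/name\n'      # base name is "name" plus a newline
$ var=$(basename -- "$file" && echo .) && var=${var%??}
$ printf '<%s>\n' "$var"
<name
>
The blank-looking line before > shows that the newline belonging to the name survived.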
In the Bourne shell, which doesn't have the ${var%x} operator, you had to go a long and convoluted way to work around it.

If the newlines were not removed, then constructs like:
x="$(pwd)/filename"
would not work usefully, but the people who wrote Unix preferred useful behaviour.
Once, briefly, a very long time ago (like 1983, maybe 1984), I suffered from a shell update on a particular variant of Unix that didn't remove the trailing newline. It broke scripts all over the place. It was fixed very quickly.

Related

How to escape bash parameter expansion (when a "!" followed by letters is a parameter) [duplicate]

This question already has answers here:
echo "#!" fails -- "event not found"
I am attempting to parse the output of a VNC server startup event and have run into a problem in parsing using sed in a command substitution. Specifically, the remote VNC server is started in a manner such as the following:
address1="user1#lxplus.cern.ch"
VNCServerResponse="$(ssh "${address1}" 'vncserver' 2>&1)"
The standard error output produced in this startup event is then to be parsed in order to extract the server and display information. At this point the content of the variable VNCServerResponse is something such as the following:
New 'lxplus0186.cern.ch:1 (user1)' desktop is lxplus0186.cern.ch:1
Starting applications specified in /afs/cern.ch/user/u/user1/.vnc/xstartup
Log file is /afs/cern.ch/user/u/user1/.vnc/lxplus0186.cern.ch:1.log
This output can be parsed in the following way in order to extract the server and display information:
echo "${VNCServerResponse}" | sed '/New.*desktop.*is/!d' \
| awk -F" desktop is " '{print $2}'
The result is something such as the following:
lxplus0186.cern.ch:1
What I want to do is use this parsing in a command substitution something like the following:
VNCServerAndDisplayNumber="$(echo "${VNCServerResponse}" \
| sed '/New.*desktop.*is/!d' | awk -F" desktop is " '{print $2}')"
On attempting to do this, I am presented with the following error:
bash: !d': event not found
I am not sure how to address this. It appears to be a problem in the way sed is being used in the command substitution. I would appreciate guidance.
Bash history expansion is a very odd corner in the bash command line parser, and you are clearly running into an unexpected history expansion, which is explained below. However, any sort of history expansion in a script is unexpected, because normally history expansion is not enabled in scripts; not even scripts run with the source (or .) builtin.
How history expansion is enabled (or disabled)
There are two shell options which control history expansion:
set -o history: Required for the history to be recorded.
set -H (or set -o histexpand): Additionally required for history expansion to be enabled.
Both of these options must be set for history expansion to be recognized. (I found the manual unclear on this interaction, but it's logical enough.)
According to the bash manual, these options are unset for non-interactive shells, so if you want to enable history expansion in a script (and I cannot imagine a reason you would want this), you would need to set both of them:
set -o history -o histexpand
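If you want to check whether these options are actually in effect at some point (say, at the top of a script you suspect), here is a small sketch; it only inspects standard bash state ($- and set -o), nothing else:
# H appears in $- when histexpand is on
case $- in
  *H*) echo "histexpand is on" ;;
  *)   echo "histexpand is off" ;;
esac
# both long options must report "on" for history expansion to happen
set -o | grep -E '^(history|histexpand)'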
The situation for scripts run with source is more complicated (and what I'm about to say only applies to bash v4; since it's undocumented, it might change in the future). [Note 3]
History recording (and consequently expansion) is turned off in source'd scripts, but through an internal flag which, as far as I know, is not made visible. It certainly does not appear in $SHELLOPTS. Since a sourced script runs in the current bash context, it shares the current execution environment, including shell options. So in the execution of a sourced script initiated from an interactive session, you'll see both history and histexpand in $SHELLOPTS, but no history expansion will take place. In order to enable it, you need to:
set -o history
which is not a no-op because it has the side-effect of resetting the internal flag which suppresses history recording. Setting the histexpand shell option does not have this side-effect.
In short, I'm not sure how you managed to enable history expansion in a script (if, indeed, the misbehaving command was in a script and not in an interactive shell), but you might want to consider not doing so, unless you have a really good reason.
How history expansion is parsed
The bash implementation of history expansion is designed to work with readline, so that it can be performed during command input. (By default this function is bound to Meta-^; generally Meta is ESC, but you can customize that as well.) However, it is also performed immediately after each line is input, before any bash parsing is performed.
By default, the history expansion character is !, and -- as mostly documented -- that will trigger history expansion except:
when it is followed by whitespace or =
if the shell option extglob is set, and it is followed by ( [Note 1]
if it appears in a single-quoted string
if it is preceded by a \ [Note 2 and see below]
if it is preceded by $ or ${ [Note 1]
if it is preceded by [ [Note 1]
(As of bash v4.3) if it is the last character in a double-quoted string.
The immediate issue here is the precise interpretation of the third case, an ! appearing inside of a single-quoted string. Normally, bash starts a new quoting context for a command substitution ($(...) or the deprecated backtick notation). For example:
$ s=SUBSTITUTED
$ # The interior single quotes are just characters
$ echo "'Echoing $s'"
'Echoing SUBSTITUTED'
$ # The interior single quotes are single quotes
$ echo "$(echo 'Echoing $s')"
Echoing $s
However, the history expansion scanner isn't that intelligent. It keeps track of quotes, but not of command substitution. So as far as it is concerned, both of the single quotes in the above example are double-quoted single quotes, which is to say ordinary characters. So history expansion occurs in both of them:
# A no-op to indicate history expansion
$ HIST() { :; }
# Single-quoted strings inhibit history expansion
$ HIST
$ echo '!!'
!!
# Double-quoted strings allow history expansion
$ HIST
$ echo "'!!'"
echo "'HIST'"
'HIST'
# ... and it applies also to interior command substitution.
$ HIST
$ echo "$(echo '!!')"
echo "$(echo 'HIST')"
HIST
So if you have a perfectly normal command like sed '/foo/!d' file, where you would expect the single-quotes to protect you from history-expansion, and you put it inside a double-quoted command substitution:
result="$(sed '/foo/!d' file)"
you suddenly find that the ! is a history expansion character. Worse, you can't fix this by backslash escaping the exclamation point, because although "\!" inhibits history expansion, it doesn't remove the backslash:
$ echo "\!"
\!
In this particular example -- and the one in the OP -- the double quotes are completely unnecessary, because the right-hand side of a variable assignment does not undergo either filename expansion or word splitting. However, there are other contexts in which removing the double quotes would change the semantics:
# Undesired history expansion
printf "The answer is '%s'\n" "$(sed '/foo/!d' file)"
# Undesired word splitting
printf "The answer is '%s'\n" $(sed '/foo/!d' file)
In this case, the best solution is probably to put the sed argument in a variable:
# Works
sed_prog='/foo/!d'
printf "The answer is '%s'\n" "$(sed "$sed_prog" file)"
(The quotes around $sed_prog were not necessary in this case but usually they would be, and they do no harm.)
Notes:
The inhibition of history expansion when the following character is some form of open parenthesis only works if there is a corresponding close parenthesis in the rest of the string. However, it doesn't have to really match the open parenthesis. For example:
# No matching close parenthesis
$ echo "!("
bash: !: event not found
# The matching close parenthesis has nothing to do with the open
$ echo "!(" ")"
!( )
# An actual extended glob: files whose names don't start with a
$ echo "!(a*)"
b
As indicated in the bash manual, a history-expansion character is treated as an ordinary character if immediately preceded by a backslash. This is literally true; it doesn't matter whether the backslash will later be considered an escape character or not:
$ echo \!
!
$ echo \\!
\!
$ echo \\\!
\!
\ also inhibits history expansion inside double quotes, but \! is not a valid escape sequence inside the double quoted string, so the backslash is not removed:
$ echo "\!"
\!
$ echo "\\!"
\!
$ echo "\\\!"
\\!
I'm referring to the source code for bash v4.2 as I write this, so any undocumented behaviour may be completely different as of v4.3.
The problem is that within double quotes, bash is trying to expand !d before passing it to the subshell. You can get around this problem by removing the double quotes but I would also propose a simplification to your script:
VNCServerAndDisplayNumber=$(echo "$VNCServerResponse" | awk '/desktop/ {print $NF}')
This simply prints the last field on the line containing the word "desktop".
On a newer bash, you can use a herestring rather than piping an echo:
VNCServerAndDisplayNumber=$(awk '/desktop/ {print $NF}' <<<"$VNCServerResponse")
Don't wrap the $(...) command substitution in double quotes. You are asking the shell to perform evaluation on the contents of the quotes and are hitting the history expansion feature. Drop the quotes and you stop telling the shell to do that, so you won't hit that problem.
And yes, dropping those quotes is safe on that assignment line even if the output contains spaces or newlines or whatever. Assignments of that sort do not split on those the way command substitution or variable expansion would on a normal shell command line.
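For example, a small sketch with made-up strings showing that the right-hand side of an assignment keeps whitespace (and globs) intact even without quotes:
out=$(printf 'two words\nsecond line *\n')   # no quotes needed on an assignment
printf '<%s>\n' "$out"
# prints both lines, and the * untouched, between < and >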
Alternatively, disable history expansion in your shell/script before you run that. (It should be off when running a script by default I believe anyway.)
This only happens when history expansion is enabled, which it normally isn't and definitely shouldn't be for scripts.
Rather than trying to work around it, figure out why history expansion is enabled and what to do so it isn't.
If you're executing your script with . foo or source foo, use ./foo instead.
If you're writing this as a function in .bashrc or similar, consider making it a separate script.
If your script (or BASH_ENV) explicitly does set -H, don't.
Quote it with '' or \ or disable history expansion with set +H or shopt -u -o histexpand. See History Expansion.
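A minimal sketch of both approaches (file and pattern are placeholders):
# protect the ! by leaving the command substitution unquoted,
# so the single quotes are real single quotes to the history scanner:
result=$(sed '/foo/!d' file)
# or switch history expansion off first, then quote however you like:
set +H        # or: shopt -u -o histexpand
result="$(sed '/foo/!d' file)"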

ls -1 is not working as intended in csh

I want to list all the files in a directory one line after another, for which I am using a sample shell script as follows
#!/bin/sh
MY_VAR="$(ls -1)"
echo "$MY_VAR"
This works out as expected, however if the same is executed using csh as follows:
#!/bin/csh
set MY_VAR = `ls -1`
echo $MY_VAR
it outputs all files in one single line, than printing one file per line.
Can anyone explain why ls -1 is not working as expected in csh?
From man csh, emphasis mine:
Command substitution
Command substitution is indicated by a command enclosed in ``'. The output from such a command is broken into separate words at blanks, tabs and newlines, and null words are discarded. The output is variable and command substituted and put in place of the original string. Command substitutions inside double quotes (`"') retain blanks and tabs; only newlines force new words. The single final newline does not force a new word in any case. It is thus possible for a command substitution to yield only part of a word, even if the command outputs a complete line.
By default, the shell since version 6.12 replaces all newline and carriage return characters in the command by spaces. If this is switched off by unsetting csubstnonl, newlines separate commands as usual.
The entries are assigned in a list; you can access a single entry with e.g. echo $MY_VAR[2].
This is different from the Bourne shell, which doesn't have the concept of a "list" and variables are always strings.
To print each entry on its own line, use a foreach loop:
#!/bin/csh
set my_var = "`ls -1`"
foreach e ($my_var)
echo "$e"
end
Adding double quotes around the `ls -1` is recommended, as this will make sure it works correctly when you have filenames with a space (otherwise such files would show up as two words/list entries).
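A hypothetical check of that difference, in an otherwise empty directory with made-up file names:
#!/bin/csh
touch single "two words"
set quoted = "`ls -1`"
echo $#quoted      # 2 entries: "single" and "two words"
set unquoted = `ls -1`
echo $#unquoted    # 3 entries: the space in "two words" splits it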

bash print words on multiple lines in a single line

I am writing a shell script for which I write a header that has 30 (and growing) column names. Right now, I have an echo statement that works and looks like this:
echo "Colum_Name1, Column_Name2,Column_Name30"
While this works, the readability sucks for me. If I want to add a column, it's a bit of a nightmare to look at the screen and work out whether it is already in there; of course, I can search my way out. Is it possible to do something like this with echo or printf and get the CSV on one line?
echo " Column_Name1,
Column_Name2,
Column_Name30"
and get the output as
Column_Name1,Column_Name2,Column_Name30
You can add a backslash as a line continuation:
echo " Column_Name1,"\
"Column_Name2,"\
"Column_Name30"
From the bash manual:
The backslash character ‘\’ may be used to remove any special meaning
for the next character read and for line continuation.
Decouple the definition of the header and printing it, and use an array to store the column names.
headers=(
    Column_Name1
    Column_Name2
    Column_Name30
)
(IFS=","; printf '%s\n' "${headers[*]}")
The elements of the array are joined by the first character of IFS when ${headers[*]} is expanded. The subshell is used so you don't have to worry about restoring the previous value of IFS.
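If you want the joined string in a variable rather than printed, the same IFS trick works inside a command substitution (header is just an example name), since the substitution already runs in a subshell:
header=$(IFS=,; printf '%s' "${headers[*]}")
echo "$header"    # Column_Name1,Column_Name2,Column_Name30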
Convenience solution, using paste:
If you don't mind the (probably negligible) overhead of invoking an external utility (paste) to build your string, you can combine it with a (literal, in this case) here-doc:
paste -s -d, - <<'EOF'
Column_Name1
Column_Name2
Column_Name30
EOF
yields
Column_Name1,Column_Name2,Column_Name30
The above acts like a single-quoted string, due to the opening delimiter, 'EOF', being quoted.
Omit the enclosing '...' to treat the string like a double-quoted string, i.e., with expansions being performed (allowing the inclusion of variable references, command substitutions, and arithmetic expansions).
If you take care to use actual leading tabs (\t) in your here-doc (multiple spaces do not work), you can even introduce indentation, by prepending - to the opening delimiter:
# !! Only works with actual *tabs* as the leading whitespace.
paste -s -d, - <<-'EOF'
	Column_Name1
	Column_Name2
	Column_Name30
EOF
More efficient solution, using line continuation:
POSIX-compatible shells support line continuation even inside double-quoted strings, "..." (but not inside single-quoted ones, '...').
That means that any \<newline> sequence inside a double-quoted string is removed:
echo "\
Column_Name1,\
Column_Name2,\
Column_Name30\
"
Given that a here-document with an unquoted opening delimiter is treated like a double-quoted string, you can do the following:
cat <<EOF
Column_Name1,\
Column_Name2,\
Column_Name30
EOF
Note:
Using <<-EOF with to-be-stripped leading tabs (\t) for readability is not an option here, because the line continuations will still include them.
To take advantage of line continuation, it is invariably the interpolating (expanding) here-doc variety that must be used; therefore, you may need to \-escape $ instances to ensure their literal use.
Both commands again yield the desired single-line string:
Column_Name1,Column_Name2,Column_Name30
echo "foo bar" | (IFS=" "; xargs -n 1 echo)
yields
foo
bar

Why does echo "$out" split output onto multiple lines, if quotes suppress word-splitting?

I have a very simple directory with "directory1" and "file2" in it.
After
out=`ls`
I want to print my variable: echo $out gives:
directory1 file2
but echo "$out" gives:
directory1
file2
so using quotes gives me output with each record on a separate line. As we know, ls prints all files/dirs on a single line (if the line is big enough to contain the output), so I expected that using double quotes would prevent my shell from splitting words onto separate lines, while omitting the quotes would split them.
Please tell me: why does using quotes (which are meant to prevent word-splitting) suddenly split the output?
On Behavior Of ls
ls only prints multiple filenames on a single line by default when output is to a TTY. When output is to a pipeline, a file, or similar, the default is to print one entry per line.
Quoting from the POSIX standard for ls, with emphasis added:
The default format shall be to list one entry per line to standard output; the exceptions are to terminals or when one of the -C, -m, or -x options is specified. If the output is to a terminal, the format is implementation-defined.
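You can see this for yourself by making ls write to something other than a terminal; a quick sketch:
ls          # to the terminal: names arranged in columns
ls | cat    # to a pipe: one name per line, which is what your variable captured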
Literal Question (Re: Quoting)
It's the very act of splitting your variable's value into separate arguments that causes it to be put on one line! Natively, your value spans multiple lines, so echoing it unmodified (without any splitting) prints it in precisely that manner.
The result of your command is something like:
out='directory1
file2'
When you run echo "$out", that exact content is printed. When you run echo $out, by contrast, the behavior is akin to:
echo "directory1" "file2"
...in that the string is split into two elements, each passed as a separate argument to echo, for echo to deal with as it sees fit -- in this case, printing both arguments on the same line.
On Side Effects Of Word Splitting
Word-splitting may look like it does what you want here, but that's often not the case! Consider some particular issues:
Word-splitting expands glob expressions: If a filename contains a * surrounded by whitespace, that * will be replaced with a list of files in the current directory, leading to duplicate results (see the sketch just after this list).
Word-splitting doesn't honor quotes or escaping: If a filename contains whitespace, that internal whitespace can't be distinguished from whitespace separating multiple names. This is closely related to the issues described in BashFAQ #50.
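A hypothetical demonstration of the glob problem from the first bullet (file names invented for the example):
touch 'a * b' plain1 plain2    # one name literally contains a *
out=$(ls)
echo $out     # unquoted: the * is glob-expanded, so names come out repeated and mixed
echo "$out"   # quoted: printed exactly as captured, one name per line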
On Reading Directories
See Why you shouldn't parse the output of ls. In short -- in your example of out=`ls`, the out variable (being a string) isn't able to store all possible filenames in a useful, parsable manner.
Consider, for instance, a file created as such:
touch $'hello\nworld"three words here"'
...that filename contains spaces and newlines, and word-splitting won't correctly detect it as a single name in the output from ls. However, you can store and process it in an array:
# create an array of filenames
names=( * )
if ! [[ -e $names || -L $names ]]; then  # this tests only the FIRST name
    echo "No names matched" >&2          # ...but that's good enough.
else
    echo "Found ${#names[@]} files"      # print number of filenames
    printf '- %q\n' "${names[@]}"
fi
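If you then want to act on each of those names, looping over the quoted array expansion keeps even awkward names (spaces, newlines) intact; a small follow-up sketch:
for f in "${names[@]}"; do
    printf 'processing: %q\n' "$f"    # %q prints each name unambiguously
done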

Delete all comments in a file using sed

How would you use sed to delete all comments (marked with #) from a file, while taking into account a '#' that appears inside a string?
This helped out a lot except for the string portion.
If # always means comment, and can appear anywhere on a line (like after some code):
sed 's:#.*$::g' <file-name>
If you want to change it in place, add the -i switch:
sed -i 's:#.*$::g' <file-name>
This will delete from any # to the end of the line, ignoring any context. If you use # anywhere where it's not a comment (like in a string), it will delete that too.
If comments can only start at the beginning of a line, do something like this:
sed 's:^#.*$::g' <file-name>
If they may be preceded by whitespace, but nothing else, do:
sed 's:^\s*#.*$::g' <file-name>
These two will be a little safer because they likely won't delete valid usage of # in your code, such as in strings.
Edit:
There's not really a nice way of detecting whether something is in a string. I'd use the last two if that would satisfy the constraints of your language.
The problem with detecting whether you're in a string is that regular expressions can't do everything. There are a few problems:
Strings can likely span lines
A regular expression can't tell the difference between apostrophes and single quotes
A regular expression can't match nested quotes (these cases will confuse the regex):
# "hello there"
# hello there"
"# hello there"
If double quotes are the only way strings are defined, double quotes will never appear in a comment, and strings cannot span multiple lines, try something like this:
sed 's:#[^"]*$::g' <file-name>
That's a lot of pre-conditions, but if they all hold, you're in business. Otherwise, I'm afraid you're SOL, and you'd be better off writing it in something like Python, where you can do more advanced logic.
This might work for you (GNU sed):
sed '/#/!b;s/^/\n/;ta;:a;s/\n$//;t;s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta;s/\n\([^#]\)/\1\n/;ta;s/\n.*//' file
/#/!b if the line does not contain a # bail out
s/^/\n/ insert a unique marker (\n)
ta;:a jump to a loop label (resets the substitute true/false flag)
s/\n$//;t if marker at the end of the line, remove and bail out
s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta if the string following the marker is a quoted one, bump the marker forward of it and loop.
s/\n\([^#]\)/\1\n/;ta if the character following the marker is not a #, bump the marker forward of it and loop.
s/\n.*// the remainder of the line is comment, remove the marker and the rest of line.
Since the asker provided no sample input, I will assume a couple of cases, and that the input file is a Bash script (bash being the tag of the question).
Case 1: entire line is the comment
The following should be sufficient in most cases:
sed '/^\s*#/d' file
It matches any line that has zero or more leading white-space characters (space, tab, or a few others; see man isspace) followed by a #, and deletes the line with the d command.
Any lines like:
# comment started from beginning.
# any number of white-space character before
# or 'quote' in "here"
They will be deleted.
But
a="foobar in #comment"
will not be deleted, which is the desired result.
Case 2: comment after actual code
For example:
if [[ $foo == "#bar" ]]; then # comment here
The comment part can be removed by
sed "s/\s*#*[^\"']*$//" file
[^\"'] is used to prevent quoted string confusion, however, it also means that comments with quotations ' or " will not to be removed.
Final sed
sed "/^\s*#/d;s/\s*#[^\"']*$//" file
To remove comment lines (lines whose first non-whitespace character is #) but not shebang lines (lines whose first characters are #!):
sed '/^[[:space:]]*#[^!]/d; /#$/d' file
The first argument to sed is a string containing a sed program consisting of two delete-line commands of the form /regex/d. Commands are separated by ;. The first command deletes comment lines but not shebang lines. The second command deletes any remaining empty comment lines. It does not handle trailing comments.
The last argument to sed is a file to use as input. In Bash, you can also operate on a string variable like this:
sed '/^[[:space:]]*#[^!]/d; /#$/d' <<< "${MYSTRING}"
Example:
# test.sh
S0=$(cat << HERE
#!/usr/bin/env bash
# comment
# indented comment
echo 'FOO' # trailing comment
# last line is an empty, indented comment
#
HERE
)
printf "\nBEFORE removal:\n\n${S0}\n\n"
S1=$(sed '/^[[:space:]]*#[^!]/d; /#$/d' <<< "${S0}")
printf "\nAFTER removal:\n\n${S1}\n\n"
Output:
$ bash test.sh
BEFORE removal:
#!/usr/bin/env bash
# comment
# indented comment
echo 'FOO' # trailing comment
# last line is an empty, indented comment
#
AFTER removal:
#!/usr/bin/env bash
echo 'FOO' # trailing comment
Supposing "being in a string" means "occurs between a pair of quotes, either single or double", the question can be rephrased as "remove everything after the first unquoted #". You can define the quoted strings, in turn, as anything between two quotes, excepting backslashed quotes. As a minor refinement, replace the entire line with everything up through just before the first unquoted #.
So we get something like [^\"'#] for the trivial case -- a piece of string which is neither a comment sign, nor a backslash, nor an opening quote. Then we can accept a backslash followed by anything: \\. -- that's not a literal dot, that's a literal backslash, followed by a dot metacharacter which matches any character.
Then we can allow zero or more repetitions of a quoted string. In order to accept either single or double quotes, allow zero or more of each. A quoted string shall be defined as an opening quote, followed by zero or more of either a backslashed arbitrary character, or any character except the closing quote: "\(\\.\|[^\"]\)*" or similarly for single-quoted strings '\(\\.\|[^\']\)*'.
Piecing all of this together, your sed script could look something like this:
s/^\([^\"'#]*\|\\.\|"\(\\.\|[^\"]\)*"\|'\(\\.\|[^\']\)*'\)*\)#.*/\1/
But because the script needs to be quoted, and both single and double quotes occur inside it, we need one more complication. Recall that the shell lets you glue strings together: "foo"'bar' becomes foobar -- foo in double quotes and bar in single quotes. Thus you can include a single quote by putting it in double quotes adjacent to your single-quoted string: '"foo"'"'" is "foo" in single quotes next to ' in double quotes, yielding "foo"'; similarly, "' can be expressed as '"' adjacent to "'". And so a single-quoted string containing both a double quote and a single quote, foo"'bar, can be written as 'foo"' adjacent to "'bar", or, perhaps more readably for this case, 'foo"' adjacent to "'" adjacent to another single-quoted string 'bar', yielding 'foo"'"'"'bar'.
sed 's/^\(\(\\.\|[^\#"'"'"']*\|"\(\\.\|[^\"]\)*"\|'"'"'\(\\.\|[^\'"'"']\)*'"'"'\)*\)#.*/\1/p' file
This was tested on Linux; on other platforms, the sed dialect may be slightly different. For example, you may need to omit the backslashes before the grouping and alternation operators.
Alas, if you may have multi-line quoted strings, this will not work; sed, by design, only examines one input line at a time. You could build a complex script which collects multiple lines into memory, but by then, switching to e.g. Perl starts to make a lot of sense.
As you have pointed out, sed won't work well if any parts of a script look like comments but actually aren't. For example, you could find a # inside a string, or the rather common $# and ${#param}.
I wrote a shell formatter called shfmt, which has a feature to minify code. That includes removing comments, among other things:
$ cat foo.sh
echo $# # inline comment
# lone comment
echo '# this is not a comment'
[mvdan@carbon:12] [0] [/home/mvdan]
$ shfmt -mn foo.sh
echo $#
echo '# this is not a comment'
The parser and printer are Go packages, so if you'd like a custom solution, it should be fairly easy to write a 20-line Go program to remove comments in the exact way that you want.
sed 's:^#.*$::' filename
Supposing comments start with a single # at the beginning of the line, the above command removes all such comments from the file (leaving the lines themselves empty).
