I am writing a shell script that prints a header with 30 (and growing) column names. Right now I have an echo statement that works and looks like this:
echo "Column_Name1,Column_Name2,Column_Name30"
While this works, the readability is poor. If I want to add a column, it is a nightmare to look at the screen and tell whether it is already in there. Of course, I can search my way out. Is it possible to do something like this with echo or printf and still get the CSV on one line?
echo " Column_Name1,
Column_Name2,
Column_Name30"
and get the output as
Column_Name1,Column_Name2,Column_Name30
You can add backslash as the line continuation:
echo "Column_Name1,"\
"Column_Name2,"\
"Column_Name30"
From the bash manual:
The backslash character ‘\’ may be used to remove any special meaning
for the next character read and for line continuation.
Decouple the definition of the header and printing it, and use an array to store the column names.
headers=(
Column_Name1
Column_Name2
Column_Name30
)
(IFS=","; printf '%s\n' "${headers[*]}")
The elements of the array are joined by the first character of IFS when ${headers[*]} is expanded. The subshell is used so you don't have to worry about restoring the previous value of IFS.
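The same join also works on the positional parameters, because "$*" is joined by the first character of IFS just like "${headers[*]}". A minimal sketch, assuming a hypothetical helper named join_csv (not part of the answer above) whose body runs in a subshell so the IFS change never leaks:

```shell
# join_csv is a hypothetical helper name used only for illustration.
# The body is wrapped in ( ... ) so the IFS change stays in a subshell.
join_csv() (
  IFS=','
  printf '%s\n' "$*"
)

# One column per line keeps the list easy to scan and to extend.
join_csv \
  Column_Name1 \
  Column_Name2 \
  Column_Name30
```

With a bash array, join_csv "${headers[@]}" should give the same single-line result.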
Convenience solution, using paste:
If you don't mind the (probably negligible) overhead of invoking an external utility (paste) to build your string, you can combine it with a (literal, in this case) here-doc:
paste -s -d, - <<'EOF'
Column_Name1
Column_Name2
Column_Name30
EOF
yields
Column_Name1,Column_Name2,Column_Name30
The above acts like a single-quoted string, due to the opening delimiter, 'EOF', being quoted.
Omit the enclosing '...' to treat the string like a double-quoted string, i.e., with expansions being performed (allowing the inclusion of variable references, command substitutions, and arithmetic expansions).
If you take care to use actual leading tabs (\t) in your here-doc (multiple spaces do not work), you can even introduce indentation, by prepending - to the opening delimiter:
# !! Only works with actual *tabs* as the leading whitespace.
paste -s -d, - <<-'EOF'
Column_Name1
Column_Name2
Column_Name30
EOF
More efficient solution, using line continuation:
POSIX-compatible shells support line continuation even inside double-quoted strings, "..." (but not inside single-quoted ones, '...').
That means that any \<newline> sequence inside a double-quoted string is removed:
echo "\
Column_Name1,\
Column_Name2,\
Column_Name3\
"
Given that a here-document with an unquoted opening delimiter is treated like a double-quoted string, you can do the following:
cat <<EOF
Column_Name1,\
Column_Name2,\
Column_Name30
EOF
Note:
Using <<-EOF with to-be-stripped leading tabs (\t) for readability is not an option here, because the line continuations will still include them.
To take advantage of line continuation, it is invariably the interpolating (expanding) here-doc variety that must be used; therefore, you may need to \-escape $ instances to ensure their literal use.
Both commands again yield the desired single-line string:
Column_Name1,Column_Name2,Column_Name30
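Because the line-continuation here-doc is the expanding kind, variable references are substituted and a literal $ has to be escaped. A small sketch of both effects, assuming the hypothetical names extra_col and print_header:

```shell
# extra_col and print_header are hypothetical names for illustration.
extra_col='Column_Name2'

print_header() {
  # Unquoted delimiter: \<newline> is removed, ${extra_col} is expanded,
  # and \$ is needed to keep a literal dollar sign.
  cat <<EOF
Column_Name1,\
${extra_col},\
Cost_in_\$
EOF
}

print_header
```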
echo "foo bar" | (IFS=" "; xargs -n 1 echo)
yields
foo
bar
To escape characters in bash: why is the syntax so confusing when nesting commands deeply? I know that there is an alternative approach, $(), for nesting commands; I am just curious why it works this way when nesting commands with backticks.
For example:
echo `echo \`echo \\\`echo inside\\\`\``
Gives output: inside
But
echo `echo \`echo \\`echo inside\\`\``
Fails with,
bash: command substitution: line 1: unexpected EOF while looking for matching ``'
bash: command substitution: line 2: syntax error: unexpected end of file
bash: command substitution: line 1: unexpected EOF while looking for matching ``'
bash: command substitution: line 2: syntax error: unexpected end of file
echo inside\
My question is why the number of backslashes required for second-level nesting is 3 and not 2. In the example above, one backslash is used for the first level of nesting and three are used for the second level to preserve the literal meaning of the backtick.
The basic problem is that there's no distinction between an open-backtick and a close-backtick. So if the shell sees something like this:
somecommand ` something1 ` something2 ` something3 `
...there's no intrinsic way to tell if that's two separate backticked commands (something1 and something3), with a literal string ("something2") in between; or a nested backtick expression, with something2 being run first and its output passed to something1 as an argument (along with the literal string "something3"). In order to avoid ambiguity, the shell syntax picks the first interpretation, and requires that if you want the second interpretation you need to escape the inner level of backticks:
somecommand ` something1 ` something2 ` something3 ` # Two separate expansions
somecommand ` something1 \` something2 \` something3 ` # Nested expansions
And that means adding another level of parsing-and-removing escapes, which means you need to escape any escapes you didn't want parsed at that point, and the whole thing gets quickly out of hand.
The $( ) syntax, on the other hand, is not ambiguous, because the opening and closing markers are not the same. Compare the two possibilities:
somecommand $( something1 ) something2 $( something3 ) # Two separate expansions
somecommand $( something1 $( something2 ) something3 ) # Nested expansions
There's no ambiguity there, so no need for escapes or other syntactic weirdness.
The reason the number of escapes grows so fast with the number of levels is again to avoid ambiguity. And it's not something specific to command expansions with backticks; this escape inflation shows up anytime you have a string going through multiple levels of parsing, each of which applies (and removes) escapes.
Suppose the shell runs across two escapes and a backtick (\\`) as it parses a line. Should it parse that as a doubly-escaped backtick, or a singly-escaped escape (backslash) character followed by a not-escaped-at-all backtick? If it runs across three escapes and a backtick (\\\`), is that a triply-escaped backtick, a doubly-escaped escape followed by a not-escaped-at-all backtick, or a singly-escaped escape followed by a singly-escaped backtick?
The shell (like most things that deal with escapes) avoids the ambiguity by not treating stacked escapes as a special thing. When it runs into an escape character, that applies only to the thing immediately after it; if the thing immediately after it is another escape, then it escapes that one character and has no effect on whatever's after it. Thus \\` is an escaped escape, followed by a not-escaped-at-all backtick. That means you can't just add another escape to the front, you have to add an escape in front of each and every escape-worthy character in the string (including escapes from lower levels).
So, let's start with a simple backtick, and work through escaping it to various levels:
First level is easy, just escape it: \`.
For the second level, we have to escape that escape (\\) and then separately escape the backtick itself (\`), giving a total of three escapes: \\\`.
For the third level, we have to individually escape each of those three escapes (so 3x\\) and once again escape the backtick itself (\`), giving a total of seven escapes: \\\\\\\`.
It continues like that, more than doubling the number of escapes for each level. From 7 it goes to 15, then 31, then 63, then... There's a good reason people try to avoid situations with deeply nested escapes.
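The growth follows a simple rule: each extra level doubles the existing escapes and adds one more for the backtick itself. A quick sketch of that arithmetic, using a hypothetical helper named escapes_for_level:

```shell
# escapes_for_level is a hypothetical helper: each extra level escapes every
# existing backslash (doubling them) and then escapes the backtick once more.
escapes_for_level() {
  level=$1
  count=0
  while [ "$level" -gt 0 ]; do
    count=$((2 * count + 1))
    level=$((level - 1))
  done
  printf '%s\n' "$count"
}

for n in 1 2 3 4 5 6; do
  printf 'level %s: %s backslashes\n' "$n" "$(escapes_for_level "$n")"
done
```

This prints 1, 3, 7, 15, 31 and 63 for levels 1 through 6, matching the sequence described above.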
Oh, and as I mentioned, the shell isn't the only thing that does this, and that can complicate matters because different levels can have different escaping syntaxes, and some things may not need escaping at some of the levels. For example, suppose the thing being escaped is the regular expression \s. To add a level to that, you'd only need one additional escape (\\s) because the "s" doesn't need to be escaped by itself. Additional levels of escaping on that would give 4, 8, 16, 32 etc escapes.
TLDR; Yo, dawg, I heard you like escapes...
P.S. You can use the shell's -v option to make it print commands before executing them. With nested commands like this, it'll print each of the commands as it un-nests them, so you can watch the stack of escaped escapes collapse as the layers get stripped off:
$ set -v
$ echo "this is `echo "a literal \`echo "backtick: \\\\\\\`" \`" `"
echo "this is `echo "a literal \`echo "backtick: \\\\\\\`" \`" `"
echo "a literal `echo "backtick: \\\`" `"
echo "backtick: \`"
this is a literal backtick: `
(For even more fun, try this after set -vx -- the -x option will print the commands after parsing, so after you see it drill into the nested commands, you'll then see what happens as it unwinds back out to the final top-level command.)
There is nothing confusing per se in the syntax that you have shown. You just need to break down each of the levels one by one.
The GNU bash man page says
When the old-style backquote form of substitution is used, backslash retains its literal meaning except when followed by $, `, or \.
Command substitutions may be nested. To nest when using the backquoted form, escape the inner backquotes with backslashes.
So with that in context, the nested substitution has one \ to escape the back-quote and one more to escape the escape character (now read the above quote that \ loses its special meaning except when followed by another \). So that's the reason the second level of escaping needs two additional backslashes to escape the original character
echo `echo \`echo \\\`echo inside\\\`\``
#                 ^^^^           ^^^^
becomes
echo `echo \`echo inside\``
#          ^^           ^^
which in turn becomes
echo `echo inside`
#    ^           ^
which eventually becomes
echo inside
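For comparison, here is the same three-level nesting written both ways. This is only an illustrative sketch (the variable names are made up); the $( ) form needs no escapes at any depth:

```shell
# Same three-level nesting written both ways; both variables end up
# holding the string "inside".
with_backticks=`echo \`echo \\\`echo inside\\\`\``
with_dollar_paren=$(echo $(echo $(echo inside)))

printf '%s\n' "$with_backticks" "$with_dollar_paren"
```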
I have two variables in a bash script:
$string = "The cat is green.\n"
$line = "Sunny day today.\n"
Each of those variables contains the "\n" sequence. How can I use sed to search and replace:
sed 's/$string/$line/g' file.txt
This doesn't seem to work, if I erase the "\n" from the strings sed works properly.
If I had only the text I could escape "\n" by adding a backslash:
sed 's/"The cat is green.\\n"/"Sunny day today.\\n"/g' file.txt
How can I manage to do search/replace when variables contain "\n" in them.
Thank you for the help.
It looks like you are trying to match the two-character sequence \n, as opposed to the single newline character that together they represent in some contexts. There is a tremendous difference between these.
As part of your example, you presented
sed 's/$string/$line/g' file.txt
, but that won't work at all, because variable references are not expanded within single-quoted strings. That has nothing whatever to do with the values of shell variables string and line.
But let's consider those values:
string="The cat is green.\n"
line="Sunny day today.\n"
[Extra spaces and the leading $ removed, so that these are valid assignments.]
Of course, the problem you're focusing on is that sed recognizes \n as a code for a newline character, but you also have the problem that in a regular expression, the . character matches any character, so if you want it to be treated as a literal then it, too, needs to be escaped (in the pattern, but not in the replacement). If you're trying to support search and replace for arbitrary text, then there are other characters you'll need to escape, too.
Answering the question as posed (escaping only \n sequences) you might do this:
sed "s/${string//\\n/\\\\n}/${line//\\n/\\\\n}/g"
The ${foo//pat/repl} form of parameter expansion performs pattern substitution on the expanded value, but note well that the pattern (pat) is interpreted according to shell globbing rules, not as a regular expression. That specific form replaces every appearance of the pattern; read the bash manual for alternatives that match only the first appearance and/or that match only at the beginning or the end of the parameter's value. Note, too, the extra doubling of the \ characters in the pattern substitution -- they need to be escaped for the shell, too.
Given your variable definitions, that command would be equivalent to this:
sed 's/The cat is green.\\n/Sunny day today.\\n/g'
In other words, exactly what you wanted. Again, however, be warned: that is not a general solution for arbitrary search & replace. If you want that, then you'll want to study the sed manual to determine which characters need to be escaped in the regex, and which need to be escaped in the replacement. Moreover, I don't see a way to do it with just one pattern substitution for each variable.
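For the more general case, one possible approach is a separate sed pass per variable that backslash-escapes every character with special meaning. The following is a rough sketch rather than a complete solution: the helper names are hypothetical, it assumes basic regular expressions, "/" as the s/// delimiter, and values that contain no real newline characters.

```shell
# Hypothetical helpers, assuming single-line values and "/" as the s/// delimiter.

# Escape the characters that are special in a basic regular expression.
escape_sed_pattern() {
  printf '%s' "$1" | sed 's|[][\.*^$/]|\\&|g'
}

# Escape the characters that are special in the replacement text.
escape_sed_replacement() {
  printf '%s' "$1" | sed 's|[\&/]|\\&|g'
}

string='The cat is green.\n'
line='Sunny day today.\n'

# The input contains the literal two-character sequence \n, as in the question.
printf '%s\n' 'The cat is green.\n' |
  sed "s/$(escape_sed_pattern "$string")/$(escape_sed_replacement "$line")/g"
```

With the values from the question, the pipeline should print Sunny day today.\n with the backslash and n kept as two literal characters.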
I have a large shell script file. At times, while modifying it, I want to comment out part of it. But commenting out lines as shown in the example below gives me an error.
Script:
#!/bin/bash
<<COMMENT1
read build_label
read build_branch_tag
build_date_tag=$(echo $build_label | sed "s/$build_branch_tag//g")
echo $build_path
COMMENT1
echo "HELLO WORLD"
Error Message:
sed: first RE may not be empty
I just want to understand what's wrong with the above script and why comment section is not working properly.
First, using here docs to comment code is really dirty! Use the # instead. If you want to comment multiple lines, use your editor. In vim (commenting lines from 10 to 15 for example):
:10,15s/^/#
However, to solve your current problem you need to enclose the starting here-doc delimiter in single quotes, like this:
<<'COMMENT'
...
COMMENT
By using single quotes, you tell bash that it should not attempt to expand variables or expressions inside the here-doc body.
Traditional UNIX shells don't have multiline comment support. What you're doing here is using a so-called "HERE document" without using its value, a common hack to get multiline-comment-like behaviour.
However, expansions inside the HERE document are still evaluated, which means that your $(…) is executed. But since build_branch_tag has not been defined before, it evaluates to an empty string, and the shell thus executes sed s///g.
You can use a different hack:
: '
Bla bla, no $expansion is taking place here.
'
What this is doing: the : is a no-op command, it simply does nothing. And you're passing it an argument which is a string '…'. Inside the single quotes, no expansion/evaluation is taking place. Beware of ' inside the "commented out" region, though.
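To see the difference between an unquoted and a quoted delimiter in isolation, here is a small sketch (the function names are hypothetical). The command substitution inside the unquoted here-document still runs, while the quoted one is left alone:

```shell
# unquoted_comment and quoted_comment are hypothetical names for illustration.

unquoted_comment() {
  # Unquoted delimiter: the body is expanded, so the $( ) below is executed.
  : <<COMMENT
$(echo "this ran anyway" >&2)
COMMENT
}

quoted_comment() {
  # Quoted delimiter: the body is taken literally and nothing is executed.
  : <<'COMMENT'
$(echo "this never runs" >&2)
COMMENT
}

unquoted_comment   # writes a message to stderr
quoted_comment     # writes nothing
```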
You can turn parameter substitution off inside a here document like this:
<<"Endofmessage"
or
<<'Endofmessage'
or
<<\Endofmessage
Here Documents
This type of redirection instructs the shell to read input from the
current source until a line containing only delimiter (with no
trailing blanks) is seen. All of the lines read up to that point are
then used as the standard input for a command. The format of
here-documents is:
<<[-]word
        here-document
delimiter
No parameter expansion, command substitution, arithmetic expansion, or pathname expansion is performed
on word. If any characters in word are quoted, the delimiter is the
result of quote removal on word, and the lines in the here-document
are not expanded. If word is unquoted, all lines of the here-document
are subjected to parameter expansion, command substitution, and
arithmetic expansion. In the latter case, the character sequence
\<newline> is ignored, and \ must be used to quote the characters \,
$, and `. If the redirection operator is <<-, then all leading tab
characters are stripped from input lines and the line containing
delimiter. This allows here-documents within shell scripts to be
indented in a natural fashion.
And something you may also like: I prefer to do multiline comments in my bash scripts with the Notepad++ shortcut Ctrl+Q (toggle comment).
If the disabled section contains no syntax error (such as an unterminated string), you can wrap it in an if false block:
#!/bin/bash
if false;then
read build_label
read build_branch_tag
build_date_tag=$(echo $build_label | sed "s/$build_branch_tag//g")
echo $build_path
fi
echo "HELLO WORLD"
If there is a syntax error or something equivalent (for example, when you are hunting for an error by deactivating parts of the failing code), comment the lines out with # instead:
#!/bin/bash
#read build_label
#read build_branch_tag
#build_date_tag=$(echo $build_label | sed "s/$build_branch_tag//g")
#echo $build_path
echo "HELLO WORLD"
for this you can use:
- editor if find/replace with regex is available like vi(m)
- a sed (sed '14,45 s/^/#/' YourFile > YourFile.Debug where 14 and 45 are first and last lines to comment)
Using here docs to comment code is safe and elegant like this:
: <<'EOT'
Example usage of the null command ':' and the here-document syntax for a
multi-line comment. If the delimiter word ('EOT' here) is quoted, the
here-document will not be expanded in any way. This is important, as
an unquoted delimiter will result in problems with unintended potential
expansions. All of this here-doc text is redirected to the standard input
of :, which does nothing but return true.
EOT
Say I have this command:
printf $text | perl program.pl
How do I guarantee that everything in the $text variable is taken literally? For example, if $text contains hello"\n, how do I make sure that's exactly what gets passed to program.pl, without the newline or quotation mark (or any conceivable character) being interpreted as a special character?
Quotes!
printf '%s' "$text" | ...
Don't ever expand variables unquoted if you care about preserving their contents precisely. Also, don't ever pass a dynamic string as a format string when you want it to be treated as literal data.
If you want backslash sequences to be interpreted -- for instance, the two-character sequence \n to be changed to a single newline -- and your shell is bash, use printf '%b' "$text" instead. If you want byte-for-byte accuracy, %s is the Right Thing (and works on any POSIX-compliant shell). If you want escaping for interpretation by another shell (which would be appropriate if, say, you were passing content as part of a ssh command line), then the appropriate format string (for bash only) is %q.
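A quick way to convince yourself is to dump the bytes that come out of the pipe. In this sketch, emit_literal is a hypothetical wrapper name and the sample value is made up to include a quote, a backslash sequence and a percent sign:

```shell
# emit_literal is a hypothetical wrapper name; the variable is passed as an
# argument to %s, never as the format string itself.
emit_literal() {
  printf '%s' "$1"
}

text='hello"\n 100%s'

# od -c shows every byte, so any unwanted interpretation would be visible.
emit_literal "$text" | od -c
```

The dump should show the backslash and the n as two separate characters and the trailing %s untouched.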
How would you delete all comments using sed from a file(defined with #) with respect to '#' being in a string?
This helped out a lot except for the string portion.
If # always means comment, and can appear anywhere on a line (like after some code):
sed 's:#.*$::g' <file-name>
If you want to change it in place, add the -i switch:
sed -i 's:#.*$::g' <file-name>
This will delete from any # to the end of the line, ignoring any context. If you use # anywhere where it's not a comment (like in a string), it will delete that too.
If comments can only start at the beginning of a line, do something like this:
sed 's:^#.*$::g' <file-name>
If they may be preceded by whitespace, but nothing else, do:
sed 's:^\s*#.*$::g' <file-name>
These two will be a little safer because they likely won't delete valid usage of # in your code, such as in strings.
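The trade-off between the variants can be seen on a tiny sample. This is only a sketch: the function names are hypothetical, and [[:space:]] is used in place of \s on the assumption that it is the more portable spelling.

```shell
# sample, strip_any_hash and strip_line_comments are hypothetical names.

sample() {
  printf '%s\n' '# full-line comment' 'name="a#b"  # trailing' 'echo done'
}

# Deletes from the first # to the end of the line, even inside a string.
strip_any_hash() {
  sed 's:#.*$::'
}

# Only blanks lines whose first non-whitespace character is #.
strip_line_comments() {
  sed 's:^[[:space:]]*#.*$::'
}

sample | strip_any_hash
sample | strip_line_comments
```

The first variant cuts the second sample line down to name="a, while the second variant leaves that line intact (including its trailing comment).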
Edit:
There's not really a nice way of detecting whether something is in a string. I'd use the last two if that would satisfy the constraints of your language.
The problem with detecting whether you're in a string is that regular expressions can't do everything. There are a few problems:
Strings can likely span lines
A regular expression can't tell the difference between apostrophes and single quotes
A regular expression can't match nested quotes (these cases will confuse the regex):
# "hello there"
# hello there"
"# hello there"
If double quotes are the only way strings are defined, double quotes will never appear in a comment, and strings cannot span multiple lines, try something like this:
sed 's:#[^"]*$::g' <file-name>
That's a lot of pre-conditions, but if they all hold, you're in business. Otherwise, I'm afraid you're SOL, and you'd be better off writing it in something like Python, where you can do more advanced logic.
This might work for you (GNU sed):
sed '/#/!b;s/^/\n/;ta;:a;s/\n$//;t;s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta;s/\n\([^#]\)/\1\n/;ta;s/\n.*//' file
/#/!b if the line does not contain a # bail out
s/^/\n/ insert a unique marker (\n)
ta;:a jump to a loop label (resets the substitute true/false flag)
s/\n$//;t if marker at the end of the line, remove and bail out
s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta if the string following the marker is a quoted one, bump the marker forward of it and loop.
s/\n\([^#]\)/\1\n/;ta if the character following the marker is not a #, bump the marker forward of it and loop.
s/\n.*// the remainder of the line is comment, remove the marker and the rest of line.
Since there is no sample input provided by asker, I will assume a couple of cases and Bash is the input file because bash is used as the tag of the question.
Case 1: entire line is the comment
The following should be sufficient enough in most case:
sed '/^\s*#/d' file
It matches any line that has zero or more leading white-space characters (space, tab, or a few others, see man isspace) followed by a #, and then deletes the line with the d command.
Any lines like:
# comment started from beginning.
# any number of white-space character before
# or 'quote' in "here"
They will be deleted.
But
a="foobar in #comment"
will not be deleted, which is the desired result.
Case 2: comment after actual code
For example:
if [[ $foo == "#bar" ]]; then # comment here
The comment part can be removed by
sed "s/\s*#[^\"']*$//" file
[^\"'] is used to prevent quoted string confusion; however, it also means that comments containing quotation marks ' or " will not be removed.
Final sed
sed "/^\s*#/d;s/\s*#[^\"']*$//" file
To remove comment lines (lines whose first non-whitespace character is #) but not shebang lines (lines whose first characters are #!):
sed '/^[[:space:]]*#[^!]/d; /#$/d' file
The first argument to sed is a string containing a sed program consisting of two delete-line commands of the form /regex/d. Commands are separated by ;. The first command deletes comment lines but not shebang lines. The second command deletes any remaining empty comment lines. It does not handle trailing comments.
The last argument to sed is a file to use as input. In Bash, you can also operate on a string variable like this:
sed '/^[[:space:]]*#[^!]/d; /#$/d' <<< "${MYSTRING}"
Example:
# test.sh
S0=$(cat << HERE
#!/usr/bin/env bash
# comment
# indented comment
echo 'FOO' # trailing comment
# last line is an empty, indented comment
#
HERE
)
printf '\nBEFORE removal:\n\n%s\n\n' "${S0}"
S1=$(sed '/^[[:space:]]*#[^!]/d; /#$/d' <<< "${S0}")
printf '\nAFTER removal:\n\n%s\n\n' "${S1}"
Output:
$ bash test.sh
BEFORE removal:
#!/usr/bin/env bash
# comment
# indented comment
echo 'FOO' # trailing comment
# last line is an empty, indented comment
#
AFTER removal:
#!/usr/bin/env bash
echo 'FOO' # trailing comment
Supposing "being in a string" means "occurs between a pair of quotes, either single or double", the question can be rephrased as "remove everything after the first unquoted #". You can define the quoted strings, in turn, as anything between two quotes, excepting backslashed quotes. As a minor refinement, replace the entire line with everything up through just before the first unquoted #.
So we get something like [^\"'#] for the trivial case -- a piece of string which is neither a comment sign, nor a backslash, nor an opening quote. Then we can accept a backslash followed by anything: \\. -- that's not a literal dot, that's a literal backslash, followed by a dot metacharacter which matches any character.
Then we can allow zero or more repetitions of a quoted string. In order to accept either single or double quotes, allow zero or more of each. A quoted string shall be defined as an opening quote, followed by zero or more of either a backslashed arbitrary character, or any character except the closing quote: "\(\\.\|[^\"]\)*" or similarly for single-quoted strings '\(\\.\|[^\']\)*'.
Piecing all of this together, your sed script could look something like this:
s/^\(\([^\"'#]*\|\\.\|"\(\\.\|[^\"]\)*"\|'\(\\.\|[^\']\)*'\)*\)#.*/\1/
But because the script needs to be quoted for the shell, and it contains both single and double quotes, there is one more complication. Recall that the shell lets you glue adjacent strings together: "foo"'bar' becomes foobar (foo in double quotes, bar in single quotes). Thus you can include single quotes by putting them in double quotes adjacent to your single-quoted string: '"foo"'"'" is "foo" in single quotes next to ' in double quotes, thus "foo"'; and "' can be expressed as '"' adjacent to "'". And so a string containing both kinds of quotes, foo"'bar, can be quoted as 'foo"' adjacent to "'bar" or, perhaps more realistically for this case, as 'foo"' adjacent to "'" adjacent to another single-quoted string 'bar', yielding 'foo"'"'"'bar'.
sed 's/^\(\(\\.\|[^\#"'"'"']*\|"\(\\.\|[^\"]\)*"\|'"'"'\(\\.\|[^\'"'"']\)*'"'"'\)*\)#.*/\1/' file
This was tested on Linux; on other platforms, the sed dialect may be slightly different. For example, you may need to omit the backslashes before the grouping and alternation operators.
Alas, if you may have multi-line quoted strings, this will not work; sed, by design, only examines one input line at a time. You could build a complex script which collects multiple lines into memory, but by then, switching to e.g. Perl starts to make a lot of sense.
As you have pointed out, sed won't work well if any parts of a script look like comments but actually aren't. For example, you could find a # inside a string, or the rather common $# and ${#param}.
I wrote a shell formatter called shfmt, which has a feature to minify code. That includes removing comments, among other things:
$ cat foo.sh
echo $# # inline comment
# lone comment
echo '# this is not a comment'
[mvdan#carbon:12] [0] [/home/mvdan]
$ shfmt -mn foo.sh
echo $#
echo '# this is not a comment'
The parser and printer are Go packages, so if you'd like a custom solution, it should be fairly easy to write a 20-line Go program to remove comments in the exact way that you want.
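sed 's:^#\(.*\)$:\1:g' filename
Supposing each comment line starts with a single #, the above command strips the leading # from those lines (that is, it uncomments them). To delete the comment lines entirely, use sed '/^#/d' filename instead.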