How do I properly escape data for a Makefile?


I'm dynamically generating config.mk with a bash script which will be used by a Makefile. The file is constructed with:
cat > config.mk <<CFG
SOMEVAR := $value_from_bash1
ANOTHER := $value_from_bash2
CFG
How do I ensure that the generated file really contains the contents of $value_from_bash*, and not something expanded / interpreted? I probably need to escape $ to $$ and \ to \\, but are there other characters that need to be escaped? Perhaps there is a special literal assignment I've not heard of?
Spaces seem to be troublesome too:
$ ls -1
a b
a
$ cat Makefile
f := a b
default_target:
	echo "$(firstword $(wildcard ${f}))"
$ make
a
If I use f := a\ b it works. Using quotes like f := 'a b' did not work either; make just treats the quotes as regular characters.

Okay, it turned out that a Makefile needs little escaping for itself, but the commands which are executed by the shell interpreter need to be escaped.
Characters which have a special meaning in Makefile and that need to be escaped are:
sharp (#, comment) becomes \#
dollar ($, begin of variable) becomes $$
Newlines cannot be inserted in a variable, but to avoid breaking the rest of the Makefile, prepend each one with a backslash so the line break will be ignored.
Too bad a backslash itself cannot be escaped (\\ will still be \\ and not \ as you might expect). This makes it impossible to put a literal backslash at the end of a string, as it will either eat the newline or escape the hash of a following comment. A space can be put at the end of the line, but that will also end up in the variable itself.
The recipe itself is interpreted as a shell command, without any fancy escaping, so you have to escape data yourself. Just imagine that you're writing a shell script and inserting the variables from other files. The strategy here would be to put the variables between single quotes and escape only ' with '\'' (close the string, insert a literal ' and start a new string). Example: mornin' all becomes 'mornin'\'' all', which is equivalent to "mornin' all".
The firstword+wildcard issue is caused by the fact that filenames with spaces in them are treated as separate filenames by firstword. Furthermore, wildcard expands escapes using \, so x\ y is matched as one word, x y, and not as two words.
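For the generating script in the question, the two make-level escapes listed above could be applied by a small helper before the value goes into the here-document. This is only a sketch: make_escape is a made-up name, it covers just $ and #, and it assumes the value holds no newlines or trailing backslash (the cases described above as not escapable).

```bash
#!/usr/bin/env bash
# make_escape (hypothetical helper): escape a value for the right-hand
# side of a Makefile variable assignment.
# Only $ and # are handled; newlines and a trailing backslash are not.
make_escape() {
    local s=$1
    s=${s//\$/\$\$}    # $ becomes $$
    s=${s//\#/\\#}     # # becomes \#
    printf '%s' "$s"
}

value_from_bash1='price is $5 #1'

# Print what would be written to config.mk; command substitution is
# expanded inside an unquoted here-document, and its result is not
# expanded a second time.
cat <<CFG
SOMEVAR := $(make_escape "$value_from_bash1")
CFG
# prints: SOMEVAR := price is $$5 \#1
```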

It seems that the full answer to this question is found nowhere on the internet, so I finally sat down and figured it out for the Windows case.
Specifically, the "Windows case" refers to file names that are valid in Windows, meaning that they do not contain the characters \, /, *, ?, ", ^, <, >, |, or line breaks. It also means \ and / are both considered valid directory separators for the purposes of Make.
An example will clear it up better than I can explain. Basically, if you are trying to match this file path:
Child\a$b {'}(a.o#$#,&+=~`),[].c
Then you have to write these rules:
all: Child\\a$$b\\\ \\\ {'}(a.o\#$$#,&+=~`),[].o
%.o: %.c
	$(CC) '$(subst ','"'"',$(subst \,,$(subst \\,/,$+)))'
Stare at it for a long time and it'll sort of start making some remote sense.
This works in my MSYS2 environment, so I presume it is correct.

I don't see how that makefile can work as you say. A pattern rule cannot be the default.
You're missing a `$` in `$(wildcard ...)`, so I think you haven't posted what you're really testing.
You should escape newlines too.

Related

Why 2 backslashes are needed to escape a backslash in vim -c argument?

In Windows 10, I used following command in Git bash 1.9.5,
$ vim -c "%s/^/\=line(".").'. '/g" -c "%s/\//\\/g" -c "%y+" -c "wq" ~/Desktop/sample.txt
in order to,
-c "%s/^/\=line(".").'. '/g": add a sequence of numbers before every line.
-c "%s/\//\\/g": replace all slashes with backslashes.
-c "%y+" -c "wq": copy all text to clipboard, save & exit Vim.
Everything works well except 2, it seems that -c "%s/\//\\/g" argument cannot be dealt correctly by Vim. None of slashes was replaced, and a g was inserted after every first / of each line. For example,
Sample.txt before
A/P/A/T/H
B/P/A/T/H
Sample.txt after
A/gP/A/T/H
B/gP/A/T/H
However, if I execute :%s/\//\\/g inside Vim, it works as I expected.
Besides, I've tried these,
-c "%s/\//#/g" can replace all / to # as expected.
-c "%s,/,\\,g" will replace the first / in each line to ,g.
So, I wonder if it is a limit or a known issue of -c argument, or I made mistakes somewhere?
Edit: I just found by chance that -c "%s/\//\\\/g" works as expected.
So could anyone please explain why another \ is needed to escape \?

In Vim, when using regular expressions, the backslash is used to escape characters so it can't be used on its own to represent a literal backslash. For that, you must escape the literal backslash with an escape backslash:
:%s/\//\\/g
So you start with two backslashes no matter what: the escape one and the literal one. That's what Vim expects.
In your shell, backslashes also have a special meaning. When inside double quotes, two consecutive backslashes "collapse" into a single one so, when you think you are telling Vim to do:
:%s/\//\\/g
with:
-c "%s/\//\\/g"
what it actually receives is:
%s/\//\/g
which means: "substitute each slash with an escaped slash (so a simple slash) followed by letter g". Not exactly what you had in mind.
To make sure Vim actually receives the proper command you need to add a third backslash:
-c "%s/\//\\\/g"
Two backslashes are collapsed into a single one and the other one is left intact so you end up with two backslashes, which is what Vim expects:
%s/\//\\/g
Another, better, approach would be to use single quotes, where backslashes are always literal, instead of double quotes:
-c '%s/\//\\/g'
From this page about shell quoting, emphasis mine:
The backslash retains its meaning only when followed by dollar, backtick, double quote, backslash or newline. Within double quotes, the backslashes are removed from the input stream when followed by one of these characters.
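To see what Vim actually receives, you can hand the same arguments to a function that just prints its first argument. A minimal sketch (show_arg is a made-up name; Vim itself is not involved):

```bash
#!/usr/bin/env bash
# show_arg (hypothetical helper): print the argument exactly as the
# shell passes it on after quote processing.
show_arg() {
    printf '%s\n' "$1"
}

show_arg "%s/\//\\/g"     # prints %s/\//\/g   (one backslash was lost)
show_arg "%s/\//\\\/g"    # prints %s/\//\\/g  (what Vim needs)
show_arg '%s/\//\\/g'     # prints %s/\//\\/g  (single quotes keep everything)
```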

Do Not interpret as Make variable in Make command

I have this code in a Makefile:
$(TCLNAME).batch.tcl: $(TCLNAME).tcl
	echo source $::env(TOOLS_DIR)/my.tcl > $@
What I want to be printed in $(TCLNAME) is:
source $::env(TOOLS_DIR)/my.tcl
But I get an error because $::env(TOOLS_DIR) is being interpreted as a Make variable and it is expecting ( after the $.
How do I make it print that line as is and not interpret it as a Make variable?
I tried to use escape character such as \$::env(TOOLS_DIR) but that also did not work.

Escape the $ with another $, and the parentheses with backslashes:
$(TCLNAME).batch.tcl: $(TCLNAME).tcl
	echo source $$::env\(TOOLS_DIR\)/my.tcl > $@

A more universal escape method, as suggested by MadScientist:
Replace each $ with $$. The $ is the only symbol that Make has special treatment for.
Enclose the whole string in single quotes. Make doesn't care about them, they are for the shell.
If you need to print single quotes, replace each ' with '\''.
The end result:
echo 'source $$::env(TOOLS_DIR)/my.tcl' > $@
This way, only $ needs to be doubled (to prevent Make from interpreting it as a variable). Everything else is handled by the single quotes (which have no special meaning for Make, and are there for the shell).
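If the recipe text is itself produced by a script, the same steps can be automated. A rough sketch, assuming the string contains no newlines (recipe_literal is a made-up name): quote for the shell first, then double every $ for Make.

```bash
#!/usr/bin/env bash
# recipe_literal (hypothetical helper): turn an arbitrary one-line string
# into text that can be pasted into a Makefile recipe as a single literal
# shell word.
recipe_literal() {
    local s=$1 q="'"
    s=${s//$q/$q\\$q$q}    # each ' becomes '\''
    s="'$s'"               # wrap the whole string in single quotes
    s=${s//\$/\$\$}        # each $ becomes $$ so Make leaves it alone
    printf '%s\n' "$s"
}

recipe_literal 'source $::env(TOOLS_DIR)/my.tcl'
# prints: 'source $$::env(TOOLS_DIR)/my.tcl'
```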

How to write fancy-indented multi-line brace expansion in Bash?

I'm dealing with a line such as:
mkdir -p "$DEST_ROOT_PATH/"{"$DEST_DIR1","$DEST_DIR2", ..., "$DEST_DIRN"}
This line is quite long. I want to break it so that it fits within 80 columns. I tried to escape the end of line with a backslash, but the indentation used for alignment breaks the expansion:
$ echo "ha"{a,b,\
>           c}
ha{a,b, c}

You could use this disgusting hack.
echo "ha"{a,b,\
> ` `c}
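The backticks are an empty command substitution. Because the indentation sits inside the backticks, the shell still reads the whole thing as a single word, so brace expansion applies; the empty command substitution then expands to nothing.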

This is the normal behaviour. From the Bash reference manual:
3.5.1 Brace expansion
Brace expansion is a mechanism by which arbitrary strings may be
generated. This mechanism is similar to filename expansion (see
Filename Expansion), but the filenames generated need not exist.
Patterns to be brace expanded take the form of an optional preamble, followed by either a series of comma-separated strings or a sequence
expression between a pair of braces, followed by an optional
postscript. The preamble is prefixed to each string contained within
the braces, and the postscript is then appended to each resulting
string, expanding left to right.
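Brace expansion does not tolerate unquoted whitespace between elements, and that includes any indentation placed between the \ and the next element on the following line.
Why? Because only the backslash-newline pair is removed during processing, while the indentation that follows it stays: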
3.1.2.1 Escape Character
A non-quoted backslash ‘\’ is the Bash escape character. It preserves
the literal value of the next character that follows, with the
exception of newline. If a \newline pair appears, and the backslash
itself is not quoted, the \newline is treated as a line continuation
(that is, it is removed from the input stream and effectively
ignored).
So when you say
something + \ + <new line> + another_thing
Bash converts it into
something + another_thing
What can you do, then?
Add a backslash and then start writing from the very beginning on the next line:
mkdir -p "$DEST_ROOT_PATH/"{"$DEST_DIR1",\
"$DEST_DIR2",\
...,\
"$DEST_DIRN"}
Some examples
When you say:
$ echo "ha"{a,b\
>    c}
ha{a,b c}
And then press the up arrow, you'll see this is the command that was performed:
$ echo "ha"{a,b c}
So just say:
$ echo "ha"{a,b\
> c}
haa habc
And you'll see this when moving up:
$ echo "ha"{a,bc}
Another example:
$ cat touch_files.sh
touch X{1,\
2,3}
$ bash touch_files.sh
$ ls X*
X1 X2 X3
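The same two cases can be reproduced in a script. A small sketch (the variable names flush and indented are just for illustration):

```bash
#!/usr/bin/env bash
# Continuation line starts in column 0: the backslash-newline vanishes
# and the braces are seen as one word.
flush=$(echo "ha"{a,b,\
c})

# Continuation line is indented: the indentation survives, splits the
# word in two, and no brace expansion takes place.
indented=$(echo "ha"{a,b,\
    c})

printf '%s\n' "$flush"       # prints: haa hab hac
printf '%s\n' "$indented"    # prints: ha{a,b, c}
```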

Although I accepted @123's answer, here's the one I chose:
mkdir -p "$DEST_ROOT_PATH/"{"$DEST_DIR1","$DEST_DIR2"}
mkdir -p "$DEST_ROOT_PATH/"{"$DEST_DIR3","$DEST_DIR4"}
There are not a lot of destination directories here, so I think it's a good balance between the fancy-and-disgusting hack and the frustrating backslash which breaks the indentation.

I would do it as follows (though it only addresses your particular task of creating multiple directories and doesn't answer the question as stated in the title):
for d in \
"$DEST_DIR1" \
"$DEST_DIR2" \
... \
"$DEST_DIRn" \
;
do
mkdir -p "$DEST_ROOT_PATH/$d"
done
The advantage of this approach is that maintaining the list is a little easier.
In general, you should stop sticking to syntactic sugar when you notice that it starts causing inconveniences.
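Along the same lines, a Bash array keeps the list readable with free indentation and no backslashes at all. This is only a sketch: the directory names and the /tmp/dest root are placeholders, and the mkdir call is left commented out.

```bash
#!/usr/bin/env bash
DEST_ROOT_PATH=/tmp/dest    # placeholder value for illustration

# Inside an array literal, newlines and indentation are plain separators.
dest_dirs=(
    "first dir"
    "second"
    "third"
)

# ${array[@]/#/prefix} puts the prefix in front of every element.
prefixed=("${dest_dirs[@]/#/$DEST_ROOT_PATH/}")

printf '%s\n' "${prefixed[@]}"
# mkdir -p -- "${prefixed[@]}"
```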

Delete all comments in a file using sed

How would you delete all comments (introduced by #) from a file using sed, while leaving alone any '#' that appears inside a string?
This helped out a lot except for the string portion.

If # always means comment, and can appear anywhere on a line (like after some code):
sed 's:#.*$::g' <file-name>
If you want to change it in place, add the -i switch:
sed -i 's:#.*$::g' <file-name>
This will delete from any # to the end of the line, ignoring any context. If you use # anywhere where it's not a comment (like in a string), it will delete that too.
If comments can only start at the beginning of a line, do something like this:
sed 's:^#.*$::g' <file-name>
If they may be preceded by whitespace, but nothing else, do:
sed 's:^\s*#.*$::g' <file-name>
These two will be a little safer because they likely won't delete valid usage of # in your code, such as in strings.
Edit:
There's not really a nice way of detecting whether something is in a string. I'd use the last two if that would satisfy the constraints of your language.
The problem with detecting whether you're in a string is that regular expressions can't do everything. There are a few problems:
Strings can likely span lines
A regular expression can't tell the difference between apostrophes and single quotes
A regular expression can't match nested quotes (these cases will confuse the regex):
# "hello there"
# hello there"
"# hello there"
If double quotes are the only way strings are defined, double quotes will never appear in a comment, and strings cannot span multiple lines, try something like this:
sed 's:#[^"]*$::g' <file-name>
That's a lot of pre-conditions, but if they all hold, you're in business. Otherwise, I'm afraid you're SOL, and you'd be better off writing it in something like Python, where you can do more advanced logic.

This might work for you (GNU sed):
sed '/#/!b;s/^/\n/;ta;:a;s/\n$//;t;s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta;s/\n\([^#]\)/\1\n/;ta;s/\n.*//' file
/#/!b if the line does not contain a # bail out
s/^/\n/ insert a unique marker (\n)
ta;:a jump to a loop label (resets the substitute true/false flag)
s/\n$//;t if marker at the end of the line, remove and bail out
s/\n\(\("[^"]*"\)\|\('\''[^'\'']*'\''\)\)/\1\n/;ta if the string following the marker is a quoted one, bump the marker forward of it and loop.
s/\n\([^#]\)/\1\n/;ta if the character following the marker is not a #, bump the marker forward of it and loop.
s/\n.*// the remainder of the line is comment, remove the marker and the rest of line.

Since the asker provided no sample input, I will assume a couple of cases, and that the input file is a Bash script, because the question is tagged bash.
Case 1: entire line is the comment
The following should be sufficient in most cases:
sed '/^\s*#/d' file
It matches any line that has zero or more leading white-space characters (space, tab, or a few others, see man isspace) followed by a #, and then deletes the line with the d command.
Any lines like:
# comment started from beginning.
# any number of white-space character before
# or 'quote' in "here"
They will be deleted.
But
a="foobar in #comment"
will not be deleted, which is the desired result.
Case 2: comment after actual code
For example:
if [[ $foo == "#bar" ]]; then # comment here
The comment part can be removed by
sed "s/\s*#[^\"']*$//" file
[^\"'] is used to prevent confusion with quoted strings; however, it also means that comments containing a quotation mark ' or " will not be removed.
Final sed
sed "/^\s*#/d;s/\s*#[^\"']*$//" file
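A quick check of the final command on the sample lines from both cases. This sketch swaps \s (a GNU extension) for the POSIX class [[:space:]] so it should also run with other sed implementations; strip_comments is a made-up wrapper name.

```bash
#!/usr/bin/env bash
# strip_comments (hypothetical wrapper): read stdin, delete full-line
# comments, then trim trailing comments that contain no quote characters.
strip_comments() {
    sed "/^[[:space:]]*#/d;s/[[:space:]]*#[^\"']*\$//"
}

strip_comments <<'EOF'
# comment started from beginning.
a="foobar in #comment"
if [[ $foo == "#bar" ]]; then # comment here
EOF
# prints:
# a="foobar in #comment"
# if [[ $foo == "#bar" ]]; then
```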

To remove comment lines (lines whose first non-whitespace character is #) but not shebang lines (lines whose first characters are #!):
sed '/^[[:space:]]*#[^!]/d; /#$/d' file
The first argument to sed is a string containing a sed program consisting of two delete-line commands of the form /regex/d. Commands are separated by ;. The first command deletes comment lines but not shebang lines. The second command deletes any remaining empty comment lines. It does not handle trailing comments.
The last argument to sed is a file to use as input. In Bash, you can also operate on a string variable like this:
sed '/^[[:space:]]*#[^!]/d; /#$/d' <<< "${MYSTRING}"
Example:
# test.sh
S0=$(cat << HERE
#!/usr/bin/env bash
# comment
# indented comment
echo 'FOO' # trailing comment
# last line is an empty, indented comment
#
HERE
)
printf "\nBEFORE removal:\n\n${S0}\n\n"
S1=$(sed '/^[[:space:]]*#[^!]/d; /#$/d' <<< "${S0}")
printf "\nAFTER removal:\n\n${S1}\n\n"
Output:
$ bash test.sh
BEFORE removal:
#!/usr/bin/env bash
# comment
# indented comment
echo 'FOO' # trailing comment
# last line is an empty, indented comment
#
AFTER removal:
#!/usr/bin/env bash
echo 'FOO' # trailing comment

Supposing "being in a string" means "occurs between a pair of quotes, either single or double", the question can be rephrased as "remove everything after the first unquoted #". You can define the quoted strings, in turn, as anything between two quotes, excepting backslashed quotes. As a minor refinement, replace the entire line with everything up through just before the first unquoted #.
So we get something like [^\"'#] for the trivial case -- a piece of string which is neither a comment sign, nor a backslash, nor an opening quote. Then we can accept a backslash followed by anything: \\. -- that's not a literal dot, that's a literal backslash, followed by a dot metacharacter which matches any character.
Then we can allow zero or more repetitions of a quoted string. In order to accept either single or double quotes, allow zero or more of each. A quoted string shall be defined as an opening quote, followed by zero or more of either a backslashed arbitrary character, or any character except the closing quote: "\(\\.\|[^\"]\)*" or similarly for single-quoted strings '\(\\.\|[^\']\)*'.
Piecing all of this together, your sed script could look something like this:
s/^\(\([^\"'#]*\|\\.\|"\(\\.\|[^\"]\)*"\|'\(\\.\|[^\']\)*'\)*\)#.*/\1/
But because the script needs to be quoted for the shell, and it contains both single and double quotes, we need one more complication. Recall that the shell lets you glue adjacent strings together: "foo"'bar' becomes foobar (foo in double quotes, bar in single quotes). Thus you can include a single quote by putting it in double quotes adjacent to your single-quoted string: '"foo"'"'" is "foo" in single quotes next to ' in double quotes, thus "foo"'; and "' can be expressed as '"' adjacent to "'". And so a string containing both kinds of quotes, foo"'bar, can be quoted as 'foo"' adjacent to "'bar" or, perhaps more realistically for this case, as 'foo"' adjacent to "'" adjacent to another single-quoted string 'bar', yielding 'foo"'"'"'bar'.
sed 's/^\(\(\\.\|[^\#"'"'"']*\|"\(\\.\|[^\"]\)*"\|'"'"'\(\\.\|[^\'"'"']\)*'"'"'\)*\)#.*/\1/' file
This was tested on Linux; on other platforms, the sed dialect may be slightly different. For example, you may need to omit the backslashes before the grouping and alternation operators.
Alas, if you may have multi-line quoted strings, this will not work; sed, by design, only examines one input line at a time. You could build a complex script which collects multiple lines into memory, but by then, switching to e.g. Perl starts to make a lot of sense.
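One way to sidestep the quote gluing is to read the sed script from a quoted here-document. The sketch below also switches to extended regular expressions (sed -E), so the grouping and alternation operators need no backslashes; strip_unquoted_comments is a made-up name, and the same single-line limitation applies.

```bash
#!/usr/bin/env bash
# Read the sed script verbatim; the quoted delimiter and read -r keep
# every quote and backslash exactly as written.
IFS= read -r script <<'EOF'
s/^(([^\#"']|\\.|"(\\.|[^\"])*"|'(\\.|[^\'])*')*)#.*/\1/
EOF

# strip_unquoted_comments (hypothetical wrapper): remove everything from
# the first unquoted, unescaped # to the end of each line read on stdin.
strip_unquoted_comments() {
    sed -E "$script"
}

strip_unquoted_comments <<'EOF'
x="a # b" # real comment
echo '# this is not a comment'
EOF
```

On the first line only the part starting at the second # is removed (a trailing space is left behind); the second line passes through unchanged.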

As you have pointed out, sed won't work well if any parts of a script look like comments but actually aren't. For example, you could find a # inside a string, or the rather common $# and ${#param}.
I wrote a shell formatter called shfmt, which has a feature to minify code. That includes removing comments, among other things:
$ cat foo.sh
echo $# # inline comment
# lone comment
echo '# this is not a comment'
[mvdan@carbon:12] [0] [/home/mvdan]
$ shfmt -mn foo.sh
echo $#
echo '# this is not a comment'
The parser and printer are Go packages, so if you'd like a custom solution, it should be fairly easy to write a 20-line Go program to remove comments in the exact way that you want.

sed 's:^#\(.*\)$:\1:g' filename
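Assuming each comment line starts with a single #, the command above strips that leading # from every such line, so the line is uncommented rather than deleted. To delete those lines outright, use sed '/^#/d' filename instead.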

how to force to expand a concatenation of $path/*

I'm writing a script to remove some files, but I don't understand how asterisk expansions work. These are my attempts to solve my problem:
rm "$path"*.txt
rm "$path"/*.txt
rm "$path"{*}.txt
rm "$path"'*'
rm "/folder/folder\ with\ spaces/*.txt"
I also tried replacing the double quotes (") with single quotes (') and backticks (`). Every time I run the script, I get an error because the * is not substituted. So now I have two questions:
Why is the asterisk not expanded?
What's the difference between the different quoting character (` " ' ...) ?

Inside single quotes nothing interesting happens, not even $-variable expansion. Some of the commands you've tried should work (some depending on the variable's content). And, really, * is most likely left unexpanded when there are no matches. Are you sure you got your names right?
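You can check this yourself by counting how many words the shell produces in each case. A small sketch using a throwaway directory (count_words and the file names are made up; it assumes the default settings, i.e. nullglob and failglob are off):

```bash
#!/usr/bin/env bash
# count_words (hypothetical helper): print how many arguments it received.
count_words() { printf '%s\n' "$#"; }

base=$(mktemp -d)
path="$base/folder with spaces"
mkdir -- "$path"
touch -- "$path/one.txt" "$path/two words.txt"

unquoted_glob=$(count_words "$path"/*.txt)   # variable quoted, * unquoted
quoted_glob=$(count_words "$path/*.txt")     # * inside quotes stays literal
no_match=$(count_words "$path"/*.md)         # nothing matches, pattern kept

printf '%s %s %s\n' "$unquoted_glob" "$quoted_glob" "$no_match"
# prints: 2 1 1

rm -rf -- "$base"
```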
