Allowing punctuation characters in directory and file names in bash - bash

What techniques or principles should I use in a bash script to handle directories and filenames that are allowed to contain as many as possible of
!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
and space?
I guess / is not a valid filename or directory name character in most linux/unix systems?
So far I have had problems with !, ;, |, (a space character) and ' in filenames.

You are right, / is not valid, and neither is the null byte \0. There is no way around that limitation (besides file system hacking).
All other characters can be used in file names, including such surprising characters as a newline \n or a tab \t. There are many ways to enter them so that the shell does not understand them as special characters. I will give just a pragmatic approach.
You can enter most of the printable characters by using the singlequote ' to quote them:
date > 'foo!bar["#$%&()*+,-.:;<=>?@[\]^_`{|}~'
Of course, you cannot enter a singlequote this way, but for this you can use the doublequote ":
date > "foo'bar"
If you need to have both, you can end one quotation and start another:
date > "foo'bar"'"bloh'
Alternatively you also can use the backslash \ to escape the special character directly:
date > foo\"bar
The backslash also works as an escape character within doublequotes; it does not work that way within singlequotes (there it is a simple character without special meaning).
If you need to enter non-printable characters like a newline, you can use the dollar-singlequote notation:
date > $'foo\nbar'
This is valid in bash, but not necessarily in all other shells. So take care!
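As a quick check of the quoting forms above, here is a minimal sketch that creates one file per form in a throwaway directory and prints the names back with bash's printf %q. The scratch directory from mktemp and the variable names are assumptions made for illustration only.

```shell
#!/usr/bin/env bash
# Sketch: exercise each quoting form in a throwaway directory.
tmpdir=$(mktemp -d) || exit 1
cd "$tmpdir" || exit 1

date > 'foo!bar&baz'        # single quotes: every character is literal
date > "foo'bar"            # double quotes protect a single quote
date > "foo'bar"'"bloh'     # close one quotation, open the other
date > foo\"bar             # backslash escapes a single character
date > $'foo\nbar'          # ANSI-C quoting (bash): embedded newline

# printf %q prints each name in a form bash could read back in.
names=(*)
for name in "${names[@]}"; do
    printf '%q\n' "$name"
done
count=${#names[@]}
printf 'created %d files\n' "$count"

cd - > /dev/null || exit 1
rm -rf -- "$tmpdir"
```

The %q output is handy when you are unsure how a strange name would have to be typed at the prompt.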
Finally, it can make sense to use a variable to keep your strange name (in order not to have to spell it out directly):
strangeName=$(xxd -r <<< "00 41 42 43 ff 45 46")
date > "$strangeName"
This way you can keep the shell code readable.
BUT in general it is not a good idea to have such characters in file names because a lot of scripts cannot handle such files properly.
Writing fool-proof scripts is not easy. The most basic rule is to put every variable expansion in doublequotes:
for i in *
do
cat "$i" | wc -l
done
This will solve 99% of the issues you are likely to encounter.
If you are using find to find directory entries which can contain special characters, you should use -print0 to separate the output not by newlines but by null-bytes. Other programs like xargs (with its -0 option) often can understand a list of null-byte separated file names.
If your file name can start with a dash - it often can be mistaken for an option. Some programs allow giving the special option -- to state that all following arguments are not options. The more general approach is to use a name which does not start with a dash:
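A minimal sketch of the null-byte approach, assuming a find that supports -print0 and an xargs that supports -0 (both GNU and BSD versions do). The scratch directory and file names are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: pass awkward file names between programs as a NUL-separated stream.
tmpdir=$(mktemp -d) || exit 1
touch -- "$tmpdir/plain.txt" "$tmpdir/with space.txt" "$tmpdir/"$'new\nline.txt'

# read -d '' splits on NUL bytes, so spaces and newlines stay inside the name.
found=0
while IFS= read -r -d '' path; do
    found=$((found + 1))
    printf 'found: %q\n' "$path"
done < <(find "$tmpdir" -type f -print0)

# The same stream can be handed to xargs -0, here to count lines per file.
find "$tmpdir" -type f -print0 | xargs -0 wc -l

printf 'files seen: %d\n' "$found"
rm -rf -- "$tmpdir"
```

The loop reads from a process substitution rather than a pipe so that the counter survives after the loop ends.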
for i in *
do
cat ./"$i" | wc -l
done
This way, a file named -n will not run cat -n but cat ./-n which will not be understood as the option -n given to cat (which would mean "number lines").

Always quote your variable substitutions. I.e. not cp $source $target, but cp "$source" "$target". This way they won't be subject to word splitting and pathname expansion.
Specify "--" before positional arguments to file operation commands. I.e. not cp "$source" "$target", but cp -- "$source" "$target". This prevents interpreting file names starting with dash as options.
And yes, "/" is not a valid character for file/directory names.
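Both rules can be tried out safely in a scratch directory. The following is only a sketch; the file named -n, the backup directory and the variable names are assumptions for the example.

```shell
#!/usr/bin/env bash
# Sketch: copy a file whose name looks like an option.
tmpdir=$(mktemp -d) || exit 1
cd "$tmpdir" || exit 1

printf 'hello\n' > ./-n             # a file literally named "-n"
mkdir backup
source_file=-n

cp -- "$source_file" backup/         # "--" ends option parsing
cp "./$source_file" backup/prefixed  # a "./" prefix works without "--"

first=$(cat backup/-n)
second=$(cat backup/prefixed)
printf '%s %s\n' "$first" "$second"

cd - > /dev/null || exit 1
rm -rf -- "$tmpdir"
```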

Related

replace pattern with newline in shell variable

A script save.sh uses 'cp' and writes its cp errors to an errors file. Mostly these errors are due to the origin filesystem being EXT4 and the destination filesystem being NTFS or FAT, which doesn't accept some special characters.
Another script onerrors.sh reads the error file so as to best manage files that could not be copied: it copies them to a crafted filename that's OK for FAT and NTFS, i.e. one where "bad" characters have been replaced with '_'.
That works fine for almost all of the errors, but not 100%.
In this error file, the special characters in the filenames seem to be multiple times escaped :
simple quotes ' appear as '\''.
\n (real newline in filenames!) appear as '$'\n'' (7 glyphs !)
I want to unescape these so as to get the filename.
I convert quotes back to ' with line=${line//\'\\\'\'/\'}. That's OK.
But how can i convert the escaped newline back to a real unescaped \n in $line variable = how can i replace the '$'\n'' to unescaped \n in variable ?
The issue is not in recognising the pattern but in inserting a real newline. I've not been able to do it using same variable expansion syntax.
What other tool is advised or other way of doing it ?
The question is:
how can i replace the '$'\n'' to unescaped \n in variable ?
That's simple:
var="def'$'\n''abc"
echo "${var//\'$\'\\n\'\'/$'\n'}"
I think I remember, that using ANSI C quoting inside variable expansion happened to be buggy in some version of bash. Use a temporary variable in such cases.
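A minimal sketch of the temporary-variable approach: both the pattern and the replacement are stored in variables first, so no ANSI C quoting is needed inside the expansion itself. The variable names are only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: replace the escaped form '$'\n'' with a real newline.
line="def'\$'\\n''abc"        # literal text: def'$'\n''abc
pattern="'\$'\\n''"           # the seven characters '$'\n''
newline=$'\n'

# Quoting "$pattern" makes bash match it literally instead of as a glob.
decoded=${line//"$pattern"/$newline}
printf '%s\n' "$decoded"
```

This prints def and abc on two separate lines.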
What other tool is advised or other way of doing it ?
For string replacement in shell, the most popular tools are sed (whose name comes from "stream editor") and awk. Writing a parser is better done in full-blown programming languages, like Python, C, C++ and similar.
The only way to decode cp output correctly, is to see cp source code, see how it quotes the filenames, and decode it in the same way. Note that it may change between cp flavors and versions, so such a tool may need to query cp version and will be not portable.
Note that parsing cp output is a very very very very bad idea. cp output is in no way standardized, may change anytime and is meant for humans to read. Instead, strongly consider rewriting save.sh to copy file by file and in case of cp returning non-zero exit status, write the filename yourself in an "errors file" as a zero separated stream.
# save.sh
: > errorsfile   # start with an empty error list
find .... -print0 |
while IFS= read -d '' -r file; do
    if ! cp -- "$file" "$dst"; then
        # append, so that earlier failures are not overwritten
        printf '%s\0' "$file" >> errorsfile
    fi
done
# onerrors.sh
while IFS= read -d '' -r file; do
    echo "do something with $file"
done < errorsfile

Shell : Display a folder's files content (ls) , search for a file name and display it

I have a folder which may contain several files. Among those files I have files like these:
test.xml
test.jar
test.jarGENERATED
dev.project.jar
...
and many other files. To get only the "dev.project.jar" I have executed:
ls | grep ^{{dev}}.*.jar$
This displays the file with its properties for me. However, I only want the file name (only the file name string)
How to rectify it??
ls and grep are both unnecessary here. The shell will show you any file name matches for a wildcard:
echo dev.*.jar
(ls dev.*.jar without options will do something similar per se; if you see anything more than the filename, perhaps you have stupidly defined alias ls='ls -l' or something like that?)
The argument to grep should be a regular expression; what you specified would match {{dev}} and not dev, though in the absence of quoting, your shell might have expanded the braces. The proper regex would be grep '^dev\..*\.jar$' where the single quotes protect the regex from any shell expansions, and . matches any character, and * repeats that character as many times as possible. To match a literal dot, we backslash-escape it.
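To see the corrected regex at work without touching the file system, here is a small sketch that feeds a list of names to grep. The list, including the extra name devXproject.jar, is made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: the regex matches "dev.", anything, then a literal ".jar" at the end.
names='test.xml
test.jar
test.jarGENERATED
dev.project.jar
devXproject.jar'

matches=$(printf '%s\n' "$names" | grep '^dev\..*\.jar$')
printf '%s\n' "$matches"
```

Only dev.project.jar is printed; devXproject.jar is rejected because the escaped dot must match a literal dot.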
Just printing a file name is rarely very useful; often times, you actually want something like
for file in ./dev.*.jar; do
echo "$file"
: probably do more things with "$file"
done
though if that's all you want, maybe prefer printf over echo, which also lets you avoid the loop:
printf '%s\n' dev.*.jar
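One caveat with bare wildcards: when nothing matches, bash by default leaves the pattern itself in place. A minimal sketch of guarding against that with the nullglob shell option follows; the scratch directory and the qa.*.jar pattern are assumptions for the example.

```shell
#!/usr/bin/env bash
# Sketch: collect wildcard matches in arrays, with nullglob so that
# an unmatched pattern expands to nothing instead of to itself.
tmpdir=$(mktemp -d) || exit 1
cd "$tmpdir" || exit 1
touch test.xml test.jar test.jarGENERATED dev.project.jar

shopt -s nullglob
jars=(dev.*.jar)
none=(qa.*.jar)
shopt -u nullglob

printf 'matched %d: %s\n' "${#jars[@]}" "${jars[*]}"
printf 'unmatched count: %d\n' "${#none[@]}"

cd - > /dev/null || exit 1
rm -rf -- "$tmpdir"
```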

Replacing a parameter in bash using sed

Trying to clean up several dozen redundant nagios config files, but sed isn't working for me (yes I'm fairly new to bash), here's the string I want to replace:
use store-service
host_name myhost
service_description HTTP_JVM_SYM_DS
check_command check_http!'-p 8080 -N -u /SymmetricDS/app'
check_interval 1
with this:
use my-template-service
host_name myhost
just the host_name should stay unchanged since it'll be different for each file. Any help will be greatly appreciated. Tried escaping the ' and !, but get this error -bash: !'-p: event not found
Thanks
Disclaimer: This question is somewhat light on info and rings a bit like "write my code for me". In good faith I'm assuming that it's not that, so I am answering in hopes that this can be used to learn more about text processing/regex substitutions in general, and not just to be copy-pasted somewhere and forgotten.
I suggest using perl instead of sed. While sed is often the right tool for the job, in this case I think Perl's better, for the following reasons:
Perl lets you easily do multi-line matches on a regex. This is possible with sed, but difficult (see this question for more info).
With multiple lines and complex delimiters and quote characters, sed starts to display different behavior depending on what platform you're using it on. For example, trying to do this with sed in "sorta multiline" mode gave me different results on OSX versus Linux (really GNU sed vs BSD sed). When using semi-advanced functionality like that, I'd stick with a tool that behaves consistently across platforms, which Perl does in this case.
Perl lets you deal with ASCII values and other special characters without a ton of "toothpick tower" escaping or subshelling. Since it's convenient to use ASCII values to match the single quotes in your pattern (we could use mixed double and single quotes instead, but that makes it harder to copy/paste this command into, say, a subshell or an eval'd part of a script), it's better to use a tool that supports this without extra hassle. It's possible with sed, but tricky; see this article for more info.
In sed/BRE, doing something as simple as a "one or more" match usually requires escaping special characters, aka [[:space:]]\{1,\}, which gets tedious. Since it's convenient to use a lot of repetition/grouping characters in this pattern, I prefer Perl for conciseness in this case, since it improves clarity of the matching code.
Perl lets you write comments in regex statements in one-liner mode via the x modifier. For big, multiline patterns like this one, having the pattern broken up and commented for readability really helps if you ever need to go back and change it. sed has comments too, but using them in single-pasteable-command mode (as opposed to a file of sed script code) can be tricky, and can result in less readable commands.
Anyway, following is the matcher I came up with. It's commented inline as much as I can make it, but the non-commented parts are explained here:
The -0777 switch tells perl to consume input files whole before processing them, rather than operating line-by-line. See perlrun for more info on this and the other flags. Thanks to @glennjackman for pointing this out in the comments on the original question!
The -p switch tells Perl to read its input one record at a time (with -0777 the whole input is a single record), run the supplied program on each record, and print the resulting contents of $_ afterwards. Since our "program" is just a string substitution statement operating on $_, what gets printed is the substituted text.
The -e switch tells perl to evaluate the next string argument for a program to run, rather than finding a script file or similar.
Input is piped from mytext.txt, which could be a file containing your pattern. You could also pipe input to Perl e.g. via cat mytext.txt | perl ... and it would work exactly the same way.
The regex modifiers work as follows: I use the multiline m modifier to match more than one \n-delimited statement, and the extended x modifier so we can have comments and turn off matching of literal whitespace, for clarity. You could get rid of comments and literal whitespace and splat it all into one line if you wanted, but good luck making any changes after you've forgotten what it does. See perlre for more info on these modifiers.
This command will replace the literal string you supplied, in a file that contains it (it can have more than just that string before/after it; only that block of text will be manipulated). It is less than literal in one minor way: it allows any number (one or more) of space characters between the first and second words in each line. If I remember Nagios configs, the number of spaces doesn't particularly matter anyway.
This command will not change the contents of a file it is supplied. If a file does not match the pattern, its contents will be printed out unchanged by this command. If it contains that pattern, the replaced contents will be printed out. You can write those contents to a new file, or do anything you like with them.
perl -0777pe '
# Use the pipe "|" character as an expression delimiter, since
# the pattern contains slashes.
s|
# 'use', one or more space-equivalent characters, and then 'store-service',
# on one line.
use \s+ store-service \n
# Open a capturing group.
(
# Capture the host name line in its entirety, then close the group.
host_name \s+ \S+
# Close the group and end the line.
) \n
service_description \s+ HTTP_JVM_SYM_DS \n
# Look for check_command, spaces, and check_http!, but keep matching on the
# same line.
check_command \s+ check_http!
# Look for a single quote character by ASCII value, since shell
# escaping these can be ugly/tricky, and makes your code less copy-
# pasteable in/out of scripts/subcommands.
\047
# Look for the arguments to check_http, delimited by explicit \s
# spaces, since we are in "extended" mode in order to be able to write
# these comments and the expression on multiple lines.
-p \s 8080 \s -N \s -u \s /SymmetricDS/app
# Look for another single quote and the end of the line.
\047 \n
check_interval \s+ 1\n
# Replace all of the matched text with the "use my-template-service" line,
# followed by the contents of the first matching group (the host_name line).
# You could capture the "use" statement in another group, or use e.g.
# sprintf() to align fields here instead of a big literal space line, but
# this is the simplest, most obvious way to get the replacement done.
|use my-template-service\n$1|mx
' < mytext.txt
Assuming you can write a glob that selects the config files of interest, I would first restrict the replacement to files that are exactly five lines long.
You can do that with Bash and awk:
for fn in *; do # make that glob apply to your files...
[[ -e "$fn" && -f "$fn" && -s "$fn" ]] || continue
line_cnt=$(awk 'FNR==NR{next}
END {print NR}' "$fn")
(( line_cnt == 5 )) || continue
# at this point you only have files with 5 lines of text...
done
Once you have done that, you can add another awk to the loop to make the replacements:
for fn in *; do
[[ -e "$fn" && -f "$fn" && -s "$fn" ]] || continue
line_cnt=$(awk -v l=5 'FNR==NR{next}
END {print NR}' "$fn")
(( line_cnt == 5 )) || continue
awk 'BEGIN{tgt["use"]="my-template-service"
tgt["host_name"]=""}
$1 in tgt { if (tgt[$1]=="") s=$2
else s=tgt[$1]
printf "%-33s%s\n", $1, s
}
' "$fn"
done
This is a GNU sed solution; check it. Back up your files before testing.
#!/bin/bash
# Escape with a backslash every special character in this string (like $, ^, /, {, }, etc.)
# that should be interpreted literally rather than as regex syntax.
# Of those, the original string contains only slashes. Instead of
# escaping them, the sed command below uses s|pattern|replacement|
# rather than s/pattern/replacement/. Any other suitable delimiter would do.
needle="use\s{1,}store-service\n\
host_name\s{1,}myhost\n\
service_description\s{1,}HTTP_JVM_SYM_DS\n\
check_command\s{1,}check_http!'-p 8080 -N -u /SymmetricDS/app'\n\
check_interval\s{1,}1"
replacement="use my-template-service\n\
host_name myhost"
# This echo command displays the generated substitute command,
# which will be used by sed
# uncomment it for viewing
# echo "s/$needle/$replacement/"
# for changing the file in place add the -i option.
sed -r "
/use\s{1,}store-service/ {
N;N;N;N;
s|$needle|$replacement|
}" input.txt
Input
one
two
use store-service
host_name myhost
service_description HTTP_JVM_SYM_DS
check_command check_http!'-p 8080 -N -u /SymmetricDS/app'
check_interval 1
three
four
Output
one
two
use my-template-service
host_name myhost
three
four

Shell: rsync parsing spaces incorrectly in file name/path

I'm trying to pull a list of files over ssh with rsync, but I can't get it to work with filenames that have spaces on it! One example file is this:
/home/pi/Transmission_Downloads/FUNDAMENTOS_JAVA_E_ORIENTAÇÃO_A_OBJETOS/2. Fundamentos da linguagem/estruturas-de-controle-if-else-if-e-else-v1.mp4
and I'm trying to transfer it using this shell code.
cat $file_name | while read LINE
do
echo $LINE
rsync -aP "$user@$server:$LINE" $local_folder
done
and the error I'm getting is this:
receiving incremental file list
rsync: link_stat "/home/pi/Transmission_Downloads/FUNDAMENTOS_JAVA_E_ORIENTAÇÃO_A_OBJETOS/2." failed: No such file or directory (2)
rsync: link_stat "/home/pi/Fundamentos" failed: No such file or directory (2)
rsync: link_stat "/home/pi/da" failed: No such file or directory (2)
rsync: change_dir "/home/pi//linguagem" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1655) [Receiver=3.1.0]
I don't get it why does it print OK on the screen, but parses the file name/path incorrectly! I know spaces are actually backslash with spaces, but don't know how to solve this. Sed (find/replace) didn't help either, and I also tried this code without success
while IFS='' read -r line || [[ -n "$line" ]]; do
echo "Text read from file: $line"
rsync -aP "$user@$server:$line" $local_folder
done < $file_name
What should I do to fix this, and why this is happening?
I read the list of files from a .txt file (each file and path on one line), and I'm using ubuntu 14.04. Thanks!
rsync does space splitting by default.
You can disable this using the -s (or --protect-args) flag, or you can escape the spaces within the filename.
The shell is correctly passing the filename to rsync, but rsync interprets spaces as separating multiple paths on the same server. So in addition to double-quoting the variable expansion to make sure rsync sees the string as a single argument, you also need to quote the spaces within the filename.
If your filenames don't have apostrophes in them, you can do that with single quotes inside the double quotes:
rsync -aP "$user@$server:'$LINE'" "$local_folder"
If your filenames might have apostrophes in them, then you need to quote those (whether or not the filenames also have spaces). You can use bash's built-in parameter substitution to do that (as long as you're on bash 4; older versions, such as the /bin/bash that ships on OS X, have issues with backslashes and apostrophes in such expressions). Here's what it looks like:
rsync -aP "$user@$server:'${LINE//\'/\'\\\'\'}'" "$local_folder"
Ugly, I know, but effective. Explanation follows after the other options.
If you're using an older bash or a different shell, you can use sed instead:
rsync -aP "$user@$server:'$(sed "s/'/'\\\\''/g" <<<"$LINE")'" "$local_folder"
... or if your shell also doesn't support <<< here-strings:
rsync -aP "$user@$server:'$(echo "$LINE" | sed "s/'/'\\\\''/g")'" "$local_folder"
Explanation: we want to replace all apostrophes with.. something that becomes a literal apostrophe in the middle of a single-quoted string. Since there's no way to escape anything inside single quotes, we have to first close the quotes, then add a literal apostrophe, and then re-open the quotes for the rest of the string. Effectively, that means we want to replace all occurrences of an apostrophe (') with the sequence (apostrophe, backslash, apostrophe, apostrophe): '\''. We can do that with either bash parameter expansion or sed.
In bash, ${varname/old/new} expands to the value of the variable $varname with the first occurrence of the string old replaced by the string new. Doubling the first slash ( ${varname//old/new} ) replaces all occurrences instead of just the first one. That's what we want here. But since both apostrophe and backslash are special to the shell, we have to put a(nother) backslash in front of every one of those characters in both expressions. That turns our old value into \', and our new one into \'\\\'\'.
The sed version is a little simpler, since apostrophes aren't special. Backslashes still are, so we have to put a \\ in the string to get a \ back. Since we want apostrophes in the string, it's easier to use a double-quoted string instead of a single-quoted one, but that means we need to double all the backslashes again to make sure the shell passes them on to sed unmolested. That's why the shell command has \\\\: that gets handed to sed as \\, which it outputs as \.
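The apostrophe replacement described above can also be wrapped in a small helper, which keeps the backslash counting in one place. This is only a sketch: sq_escape is a hypothetical function name, and storing the apostrophe and its replacement in variables is a swapped-in way of avoiding the escaped form inside the expansion.

```shell
#!/usr/bin/env bash
# Sketch: wrap a string in single quotes, turning each ' into '\''
# (close the quotes, add an escaped apostrophe, reopen the quotes).
sq_escape() {
    local s=$1
    local sq="'"
    local repl="'\\''"      # the four characters '\''
    printf "'%s'" "${s//$sq/$repl}"
}

quoted=$(sq_escape "it's a file")
printf '%s\n' "$quoted"

# Round trip: a shell that evaluates the quoted form gets the original back.
eval "roundtrip=$quoted"
printf '%s\n' "$roundtrip"

# Hypothetical use with the variables from the question (not run here):
# rsync -aP "$user@$server:$(sq_escape "$LINE")" "$local_folder"
```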

How to escape a previously unknown string in regular expression?

I need to egrep a string that isn't known before runtime and that I'll get via shell variable (shell is bash, if that matters). Problem is, that string will contain special characters like braces, spaces, dots, slashes, and so on.
If I know the string I can escape the special characters one at a time, but how can I do that for the whole string?
Running the string through a sed script to prefix each special character with \ could be an idea, I still need to rtfm how such a script should be written. I don't know if there are other, better, options.
I did read re_format(7) but it seems there is no such thing like "take the whole next string as literal"...
EDIT: to avoid false positives, I should also add newline detection to the pattern, eg. egrep '^myunknownstring'
If you need to embed the string into a larger expression, sed is how I would do it.
s_esc="$(echo "$s" | sed 's/[^-A-Za-z0-9_]/\\&/g')" # backslash special characters
inv_ent="$(egrep "^item [0-9]+ desc $s_esc loc .+$" inventory_list)"
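A variation on the same idea, as a sketch: escape only the characters that are special in an extended regular expression, instead of every non-alphanumeric character. One reason to be selective is that GNU grep gives some backslash sequences, such as \< and \>, a meaning of their own. The function name ere_escape and the sample data are assumptions for the example.

```shell
#!/usr/bin/env bash
# Sketch: backslash-escape only the ERE special characters . [ * + ? ( ) { | ^ $ \
ere_escape() {
    printf '%s\n' "$1" | sed 's/[.[*+?(){|^$\\]/\\&/g'
}

s='(.*+[a-z]){3}'
s_esc=$(ere_escape "$s")
printf '%s\n' "$s_esc"

# Embed the escaped string in a larger, anchored pattern.
line='item 7 desc (.*+[a-z]){3} loc shelf'
hit=$(printf '%s\n' "$line" | grep -E "^item [0-9]+ desc $s_esc loc .+\$")
printf '%s\n' "$hit"
```

Closing ] and } are left alone because they are ordinary characters once their opening counterparts have been escaped.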
Use the -F flag to make the PATTERN a fixed literal string
$ var="(.*+[a-z]){3}"
$ echo 'foo bar (.*+[a-z]){3} baz' | grep -F "$var" -o
(.*+[a-z]){3}
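With -F there is no regex syntax left to anchor with, but the -x option restricts matches to whole lines, which may be enough to avoid false positives. A minimal sketch with made-up sample lines; -e is used so that a pattern starting with a dash is not mistaken for an option.

```shell
#!/usr/bin/env bash
# Sketch: fixed-string matching, anywhere in the line versus whole line only.
var='(.*+[a-z]){3}'
list=$'(.*+[a-z]){3}\nprefix (.*+[a-z]){3}\nabc'

any=$(printf '%s\n' "$list" | grep -c -F -e "$var")        # substring matches
whole=$(printf '%s\n' "$list" | grep -c -F -x -e "$var")   # whole-line matches
printf 'any=%s whole=%s\n' "$any" "$whole"
```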
Are you trying to protect the string from being incorrectly interpreted as bash syntax or are you trying to protect parts of the string from being interpreted as regular expression syntax?
For bash protection:
grep supports the -f switch:
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file contains zero patterns, and therefore matches nothing.
No escaping is necessary inside the file. Just make it a file containing a single line (and thus one pattern) which can be produced from your shell variable if that's what you need to do.
# example trivial regex
var='^r[^{]*$'
pattern=/tmp/pattern.$$
rm -f "$pattern"
echo "$var" > "$pattern"
egrep -f "$pattern" /etc/passwd
rm -f "$pattern"
Just to illustrate the point.
Try it with -F instead as another poster suggested for regex protection.

Resources