How to create new names for files with problematic characters for use in an existing bash scripted environment? - bash

The goal is to get rid of filenames that give scripts headaches by translating them into something else. The reason is that this nearly 30-year-old Unix / Linux environment has a lot of existing scripts that may not be "written correctly", and a new, large and important cache of files has arrived that has to be managed. So a colleague has asked me to write a script that finds "problematic filenames" and translates them. They've got a list of characters to turn into dots, such as the comma, and another list to turn into underscores, such as whitespace, to give but two examples. I ran into problems, which I asked about in another question.
I was using tr to do it, but commenters there said I should perhaps ask about the underlying goal instead of how to get tr to work. So, I have!

Parameter expansion can do this for you.
Note that unlike when using tr (as requested on your other question), when using parameter expansion you don't need to use backslashes inside your character class definitions: put the expansion in double quotes and bash will treat the results of that expansion as literal.
#!/usr/bin/env bash
toDots='\,;:|+##$%^&*~'
toUnderscores='}{]['"'"'="()`!'
# requires bash 5+: if debug=1, then print what we would do instead of doing it
runOrDebug() {
  if (( debug )); then
    printf '%s\n' "${*@Q}"
  else
    "$@"
  fi
}
renameFiles() {
  local name subDots subBoth
  for name; do
    subDots=${name//["$toDots"]/.}
    subBoth=${subDots//["$toUnderscores"]/_}
    if [[ $subBoth != "$name" ]]; then
      runOrDebug mv -- "$name" "$subBoth"
    fi
  done
}
debug=1 renameFiles '[/a],/;[p:r|o\b+lem#a#t$i%c]/#(%$^!/(e^n&t*ry)~='
Note that toUnderscores is (except for the single quote in the middle) in single quotes, so all the backslashes in it are part of the variable's data rather than being syntax; because globs use character class syntax from REs, they're parsed as POSIX regular expression character class syntax.
See a demonstration of the technique running at https://ideone.com/kKE7IJ
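As a small self-contained illustration of the same technique, the sketch below computes the translated name without touching the filesystem. The function name sanitizeName and the two short character sets are made up for this example; substitute the real lists.

```bash
#!/usr/bin/env bash
# Hypothetical character sets, for illustration only.
toDots=',;:'
toUnderscores=$' \t'

# Print the translated form of a single name; nothing is renamed.
sanitizeName() {
  local name=$1
  name=${name//["$toDots"]/.}          # each listed character becomes a dot
  name=${name//["$toUnderscores"]/_}   # spaces and tabs become underscores
  printf '%s\n' "$name"
}

sanitizeName 'a,b;c d'   # prints: a.b.c_d
```

Keeping the translation in its own function makes it easy to check the result for collisions before calling mv.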

Related

Can a bash function be used to fully escape any string (including nested single quotes)?

I know that there are a number of ways to manually escape nested quotes of the same kind.
Example
echo 'this single quote '"'"' is escaped.'
There are several more ways to do this that are documented well on StackOverflow already, but I'm trying to take that concept and apply it as a function to escape everything. I have been unsuccessful several times to find an all in one escaping solution for Bash (ex. I've looked for a string library that includes escaping with no luck).
Here's one of my attempts:
function quote() {
printf '%s' "'${1//\'/\'"\'"\'}'"
}
The idea is that all single quotes will be replaced with the aforementioned escape style '"'"'. I've also tried doing the same thing using the escape style like so: \'. No luck.
Every response on this topic in my research that I've come across has made it sound like this isn't possible, but I'm not the type to accept that for an answer. To me, if you can echo a string like this: 'test \' test', then it should be expressible in a function too where the backslashes or otherwise are automatically added to escape the characters.
echo '~!@#$%^&*()_+`1234567890-=qwertyuiop[]\QWERTYUIOP{}|ASDFGHJKL:"ZXCVBNM<>?zxcvbnm,./'
As shown above, none of the basic keys on a traditional English keyboard need to be escaped, but nesting the same type of quotes within itself requires it.
Can anyone shed some light on this? Am I missing something obvious or is this really that difficult?
You don't need to mimic the shell quoting, properly quoting the variables should be enough.
#! /bin/bash
tag () {
echo Setting tags to "$2".
}
while read tags ; do
tag --set "$tags"
done <<EOF
tag1,tag2,tag3
Tom's_Shoes
The_"best"
EOF
If the tags are in variables or an array and you use proper quoting, you shouldn't need to do any extra escaping. See @choroba's answer.
But there are times when you need it, and this will do the trick:
printf -v my_var '%q' "$my_var"
which replaces the contents of my_var with a shell-escaped version.
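To see what %q buys you, here is a minimal round-trip sketch: the string is escaped with printf '%q' and then handed back to the shell with eval, which should reproduce the original text. The helper name roundTrip is hypothetical, and eval is acceptable here only because the input was escaped first.

```bash
#!/usr/bin/env bash
# Hypothetical helper: escape a string with %q, then let the shell parse it back.
roundTrip() {
  local quoted restored
  printf -v quoted '%q' "$1"   # shell-escaped form of the argument
  eval "restored=$quoted"      # re-parse the escaped form as shell syntax
  printf '%s\n' "$restored"
}

roundTrip "Tom's \"best\" shoes & \$HOME"   # prints the string unchanged
```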

Renaming the file Directory which contains Space based on CSV in Shell

I need to rename the files inside the folder that has a space in it eg(Deco/main library/file1.txt )
code:
while IFS="," read orig new pat
do
mv -v $pat$new $pat$orig
done < new.csv
csv file:
newname,file1.txt,Deco/main\\\ library/
error:
mv: invalid option -- '\'
Welcome to Stackoverflow!
First: Put quotes around variable expansions. Except on very rare occasions, you should always write "$foo" instead of $foo, because with the latter the shell is supposed to (and will) interpret spaces in the variable's value as word delimiters, which you rarely want. In your case you especially do not want it.
Second: Your CSV file seems to contain backslashes to quote the spaces. Some additional step seems to have added another level of quotation, so that you now end up with three backslashes and a space for each original space. If this really is the case (please double check that what you wrote in your question is correct, otherwise my answer doesn't fit), you need to unquote this before you can use it.
There are security issues involved in using eval, so do not use it lightly (this disclaimer is necessary whenever proposing to use eval), but if you have trust in the input you are handling to not contain any nastinesses, then you can do this using this code:
while IFS="," read orig new pat
do
eval eval mv -v "$pat$new" "$pat$orig"
done < new.csv
Using this, two levels of quotation are evaluated (that's what eval does) before the mv command is executed.
I strongly suggest doing a dry run first by adding echo before the mv. Then the commands are merely printed instead of being executed.
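If you control how the CSV is produced, a simpler route is to store the path without any backslash escaping and avoid eval altogether. The sketch below assumes exactly that (a line such as newname,file1.txt,Deco/main library/) and uses a hypothetical function name; read -r keeps any backslashes literal, and the double quotes keep the space inside a single argument.

```bash
# Hypothetical helper; assumes CSV lines of the form  newname,oldname,directory/
# containing plain, unescaped paths.
rename_from_csv() {
  while IFS=, read -r newname oldname dir; do
    # Quoted expansions are passed to mv as one argument each, spaces included.
    mv -- "$dir$oldname" "$dir$newname"
  done < "$1"
}
```

It would be called as rename_from_csv new.csv. Putting echo in front of mv gives the same kind of dry run as suggested above.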

Printf splits a string at spaces using Bash [duplicate]

I'm having some troubles with the printf function in bash.
I wrote a little script on which I pass a name and two letters (such as "sh", "py", "ht") and it creates a file in the current working directory named "name.extension".
For instance, if I execute seed test py a file named test.py is created in the current working dir with the shebang #!/usr/bin/python3.
So far, so good, nothing fancy: I'm learning shell scripting and I thought this could be a simple exercise to test the knowledge gained so far.
The problem is when I want to create an HTML file. This is the function that I use:
creaHtml(){
head='<!--DOCTYPE html-->\n<html>\n\t<head>\n\t\t<meta charset=\"UTF-8\">\n\t</head>\n\t<body>\n\t</body>\n</html>'
percorso=$CARTELLA_CORRENTE/$NOME_FILE.html
printf $head>>$percorso
chmod 755 $percorso
}
If I run, for instance, seed test ht the correct function (creaHtml) is called, test.html is created but if I try to look into it I only see:
<!--DOCTYPE
And nothing else.
This is the trace for that function:
[sviluppo:~/bin]$ seed test ht
+ creaHtml
+ head='<!--DOCTYPE html-->\n<html>\n\t<head>\n\t\t<meta charset=\"UTF-8\">\n\t</head>\n\t<body>\n\t</body>\n</html>'
+ percorso=/home/sviluppo/bin/test.html
+ printf '<!--DOCTYPE' 'html-->\n<html>\n\t<head>\n\t\t<meta' 'charset=\"UTF-8\">\n\t</head>\n\t<body>\n\t</body>\n</html>'
+ chmod 755 /home/sviluppo/bin/test.html
+ set +x
However, if I try to run printf '<!--DOCTYPE html-->\n<html>\n\t<head>\n\t\t<meta charset=\"UTF-8\">\n\t</head>\n\t<body>\n\t</body>\n</html>' from the terminal, I see the correct output: the "skeleton" of an HTML file neatly displayed with indentation and everything. What am I missing here?
Try echo -e instead of printf. printf is for printing formatted strings. Since you didn't protect $head with quotes, bash splits the string to form the command. The first word (before first white space) forms the format string. The rest are just arguments for things you didn't specify to print.
echo -e "$head" > "$percorso"
The -e evaluates your \n into newlines. I changed your >> to > since it looks like you want this to be the whole file, rather than append to any existing file you might have.
You have to be careful with quotes in bash. One thing can become many things. This actually makes it more powerful, but it can be confusing for people learning. Notice that I also put the file name "$percorso" in double quotes too. This evaluates the variable and makes sure that it ends up as one thing. If you use single quotes, it will be one word, but not evaluated. Unlike Python, there is a big difference between single and double quotes.
If you want to use printf for compatibility, as @chepner pointed out, just be sure to quote it:
printf "$head" > "$percorso"
Actually that is much simpler anyway.
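One caveat with passing data as the format string: any % in it would be read as a conversion specifier. A variant that keeps the format fixed is sketched below; %b expands backslash escapes such as \n and \t in the argument, much like echo -e. The sample text and the function name render are made up for the example.

```bash
# Hypothetical helper: print text with \n and \t expanded, treating % literally.
render() {
  printf '%b\n' "$1"
}

# Sample data for illustration; note the literal percent sign.
page='<html>\n\t<body>100% plain</body>\n</html>'
render "$page"
```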

how to escape paths to be executed with $( )?

I have a program whose textual output I want to execute directly in a shell. How should I format the output of this program so that paths with spaces are accepted by the shell?
$(echo ls /folderA/folder\ with\ spaces/)
Some more info: the program that generates the output is coded in Haskell (source). It's a simple program that keeps a list of my favorite commands. It prints the commands with 'cmdl -l'. I can then choose one command to execute with 'cmdl -g12' for command number 12. Thanks for pointing out that instead of $( ) use 'cmdl -g12 | bash', I wasn't aware of that...
How shall I format the output of this program such that the paths with
spaces are accepted by the shell ?
The shell cannot distinguish between spaces that are part of a path and spaces that separate arguments unless the former are properly quoted. Moreover, you actually need proper quoting using single quotes ('...') in order to "shield" all the character combinations that might otherwise have special meaning for the shell (\, &, |, ||, ...).
Depending on the language used for your external tool, there might be a library available for that purpose. As an example, Python has pipes.quote (shlex.quote on Python 3) and Perl has String::ShellQuote::shell_quote.
I'm not quite sure I understand, but don't you just want to pipe through the shell?
For a program called foo
$ foo | sh
To format the output of your program so that Bash won't split it into separate arguments at the spaces, it's probably easiest to just double-quote each argument, e.g.
mkdir "/tmp/Joey \"The Lips\" Fagan"
As you saw, you can alternatively backslash-escape the spaces, but I usually find that less readable.
EDIT:
If you may have special shell characters (&, |, `, (, ), [, ], $, etc.), you'll have to do it the hard/proper way, with a specific escaper for your language and target, as others have mentioned.
It's not just spaces you need to worry about, but other characters such as [ and ] (glob a.k.a pathname-expansion characters) and metacharacters such as ;, &, (, ...
You can use the following approach:
Enclose the string in single quotes.
Replace existing single quotes in the string with '\'' (which effectively breaks the string into multiple parts with spliced in \-escaped single quotes; the shell then reassembles the parts into a single string).
Example:
I'm good (& well[1];) would encode to 'I'\''m good (& well[1];)'
Note how single-quoting allows literal use of the glob characters and metacharacters.
Since single quotes themselves can never be used within single-quoted strings (there's not even an escape), the splicing-in approach described above is needed.
As described by @mklement0, a safe algorithm is to wrap every argument in a pair of single quotes, and quote single quotes inside arguments as '\''. Here is a shell function that does it:
function quote {
typeset cmd="" escaped
for arg; do
escaped=${arg//\'/\'\\\'\'}
cmd="$cmd '$escaped'"
done
printf %s "$cmd"
}
$ quote foo "bar baz" "don't do it"
'foo' 'bar baz' 'don'\''t do it'
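To check that such output really survives a second pass through the shell, the sketch below re-parses it with eval and counts the resulting words. It is self-contained, so it repeats the same substitution in a hypothetical function named shellQuote; the sample arguments are invented.

```bash
#!/usr/bin/env bash
# Hypothetical re-statement of the single-quote splicing technique.
shellQuote() {
  local arg escaped out=""
  for arg; do
    escaped=${arg//\'/\'\\\'\'}   # ' becomes '\''
    out+="${out:+ }'$escaped'"    # space-separate the quoted words
  done
  printf '%s\n' "$out"
}

# Re-parse the quoted text: each original argument comes back as one word.
eval "set -- $(shellQuote 'folder with spaces' "don't" '$HOME; *')"
printf '%s\n' "$#"   # prints: 3
```

The same quoted text could be piped to sh or bash instead of being passed to eval, which is what the program's output would need for the original use case.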

variable substitution removing quotes

I seem to have some difficulty getting what I want to work. Basically, I have a series of variables that are assigned strings with some quotes and \ characters. I want to remove the quotes to embed them inside a json doc, since json hates quotes using python dump methods.
I figured it would be easy. Just determine how to remove the characters easy and then write a simple for loop for the variable substitution, well it didn't work that way.
Here is what I want to do.
There is a variable called "MESSAGE23", it contains the following "com.centrify.tokend.cac", I want to strip out the quotes, which to me is easy, a simple echo $opt | sed "s/\"//g". When I do this from the command line:
$> MESSAGE23="com."apple".cacng.tokend is present"
$> MESSAGE23=`echo $MESSAGE23 | sed "s/\"//g"`
$> com.apple.cacng.tokend is present
This works. I get the properly formatted string.
When I then try to throw this into a loop, all hell breaks loose.
for i to {1..25}; do
MESSAGE$i=`echo $MESSAGE$i | sed "s/\"//g"`
done
This doesn't work (either it throws a bunch of indexes out or nothing), and I'm pretty sure I just don't know enough about arg or eval or other bash substitution variables.
But basically I want to do this for another set of variables with the same problems, where I strip out the quotes and incidentally the "\" too.
Any help would be greatly appreciated.
You can't do that. You could make it work using eval, but that introduces another level of quoting you have to worry about. Is there some reason you can't use an array?
MESSAGE=("this is MESSAGE[0]" "this is MESSAGE[1]")
MESSAGE[2]="I can add more, too!"
for (( i=0; i<${#MESSAGE[@]}; ++i )); do
echo "${MESSAGE[i]}"
done
Otherwise you need something like this:
eval 'echo "$MESSAGE'"$i"'"'
and it just gets worse from there.
First, a couple of preliminary problems: MESSAGE23="com."apple".cacng.tokend is present" will not embed double-quotes in the variable value, use MESSAGE23="com.\"apple\".cacng.tokend is present" or MESSAGE23='com."apple".cacng.tokend is present' instead. Second, you should almost always put double-quotes around variable expansions (e.g. echo "$MESSAGE23") to prevent parsing oddities.
Now, the real problems: the shell doesn't allow variable substitution on the left side of an assignment (i.e. MESSAGE$i=something won't work). Fortunately, it does allow this in a declare statement, so you can use that instead. Also, when the shell sees $MESSAGE$i, it replaces it with the value of $MESSAGE followed by the value of $i; to get the value of the variable whose name is built that way, you need to use indirect expansion (${!metavariable}).
for i in {1..25}; do
varname="MESSAGE$i"
declare $varname="$(echo "${!varname}" | tr -d '"')"
done
(Note that I also used tr instead of sed, but that's just my personal preference.)
(Also, note that @Mark Reed's suggestion of an array is really the better way to do this sort of thing.)
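A minimal sketch of that array-based approach, stripping the double quotes and backslashes in place with parameter expansion rather than sed or tr. The array name messages and its sample values are invented for the example.

```bash
#!/usr/bin/env bash
# Hypothetical sample data standing in for MESSAGE1..MESSAGE25.
messages=('com."apple".cacng.tokend is present' 'C:\temp\"log"')
strip='"\'   # characters to delete: double quote and backslash

for i in "${!messages[@]}"; do
  messages[i]=${messages[i]//["$strip"]/}   # delete every listed character
done

printf '%s\n' "${messages[@]}"
```

Because the expansion of strip is double-quoted inside the brackets, both characters are matched literally and need no extra escaping.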

Resources