Open file with special characters in file name in Perl - macos

I am trying to open a file whose name contains special characters "(" and ")", parentheses, in a Perl script and I am unable to do so.
If I have a file name like abc_5(rt)_dgj.csv, is there a way to open this file in the script?
$inputfile = <STDIN>;
chomp($inputfile);
unless(-e $inputfile) {
usage("Input file not present\n");
exit 2;
}
system("cat $inputfile | tr '\r' '\n' > temp");
$inputfile = "temp";
# do something here, like copy the data from inputfile and write to another file
system("rm temp");

Counter-example
There shouldn't be any problem. For example:
rd.pl
#!/usr/bin/env perl
use strict;
use warnings;
my $file = "abc_5(rt)_dgj.csv";
open my $fh, "<", $file or die "A horrible death";
my $line = <$fh>;
print $line;
close $fh;
Test
$ cat > "abc_5(rt)_dgj.csv"
Record 1,2014-12-30,19:43:21
$ perl rd.pl
Record 1,2014-12-30,19:43:21
$
Tested with Perl 5.22.0 on Mac OS X 10.10.5.
The real problem
Now that the question has sample code, the real problem is obvious: the line
system("cat $inputfile | tr '\r' '\n' > temp");
runs
cat abc_5(rt)_dgj.csv | …
and that will generate (shell) syntax errors. If you try it for yourself at the command line, you'll see them too. You should enclose the file name in (single or) double quotes:
system(qq{cat "$inputfile" | tr '\r' '\n' > temp});
so that the parentheses are not exposed in the shell.
When you hide what's going on — not quoting the code, not quoting the error message — you make it hard for people to help you even though the problem is easy to resolve. Creating an MCVE (How to create a Minimal, Complete, and Verifiable Example?), as you did in the end, helps enormously. It wastes less of your time, and it wastes less of our time.
Even weirder characters in file names — there's a module for that!
Having a pair of parentheses (or even one parenthesis) in a file name of itself doesn't cause any trouble if the name is quoted sufficiently. As I noted, double quotes are adequate for this specific file name (and a good many others, even ones containing special characters); as ikegami noted, there are other names that will cause problems. Specifically, names containing dollar signs, back quotes or double quotes will cause grief with double quotes; names containing single quotes will cause grief with single quotes.
Ikegami also notes that there is a module, String::ShellQuote, which is designed to deal with these problems. It is not a Perl core module so you would have to install it yourself, but using it would be sensible if you have to deal with any perverse name that the user might throw your way.
The documentation for the shell_quote function from the module says:
If any string can't be safely quoted shell_quote will croak.
I'm not clear what strings can't be safely quoted; the documentation doesn't elaborate on the issue. (croak is a function from the core module Carp.)
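To make the single-quote case above concrete, here is a minimal bash sketch of the usual quoting trick: wrap the name in single quotes and rewrite every embedded single quote as '\''. The helper name sq_quote is made up for this illustration, and this is not a claim about how String::ShellQuote works internally.

```shell
#!/usr/bin/env bash
# sq_quote is a hypothetical helper, not part of any module or library.
# It prints its argument wrapped in single quotes so that a shell will
# read it back as exactly one word.
sq_quote() {
    local s=$1
    # Rewrite each ' as '\'' (close the quote, add an escaped quote, reopen).
    s=${s//\'/\'\\\'\'}
    printf "'%s'" "$s"
}

# Example: parentheses, a space, a dollar sign and a single quote in one name.
name="abc_5(rt)_d'gj \$HOME.csv"
printf 'quoted form: %s\n' "$(sq_quote "$name")"
```

The quoted form can be pasted into a command line such as cat ... | tr ... without the shell interpreting the parentheses or the dollar sign.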

Related

replace pattern with newline in shell variable

A script save.sh uses 'cp' and writes its cp errors to an errors file. Mostly these errors occur because the origin filesystem is EXT4 and the destination filesystem is NTFS or FAT, which doesn't accept some special characters.
Another script onerrors.sh reads the error file so as to best manage files that could not be copied: it copies them to a crafted filename that's OK for FAT and NTFS, i.e. one where "bad" characters have been replaced with '_'.
That works fine for almost all of the errors, but not 100%.
In this error file, the special characters in the filenames seem to be multiple times escaped :
simple quotes ' appear as '\''.
\n (real newline in filenames!) appear as '$'\n'' (7 glyphs !)
I want to unescape these so as to get the filename.
I convert quotes back to ' with line=${line//\'\\\'\'/\'}. That's OK.
But how can I convert the escaped newline back to a real unescaped \n in the $line variable, i.e. how can I replace the '$'\n'' with an unescaped \n in the variable?
The issue is not in recognising the pattern but in inserting a real newline. I've not been able to do it using same variable expansion syntax.
What other tool is advised or other way of doing it ?
The question is:
how can i replace the '$'\n'' to unescaped \n in variable ?
That's simple:
var="def'$'\n''abc"
echo "${var//\'$\'\\n\'\'/$'\n'}"
I seem to remember that using ANSI C quoting inside a variable expansion was buggy in some versions of bash. Use a temporary variable in such cases.
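The temporary-variable approach could look like the sketch below. The function name unescape_cp_name is invented for this example, and it only undoes the two escapes mentioned in the question ('\'' and '$'\n''); it is an assumption that nothing else in the error file needs decoding.

```shell
#!/usr/bin/env bash
# unescape_cp_name is a hypothetical helper: it undoes only the two
# escapes described in the question, nothing more.
unescape_cp_name() {
    local line=$1
    local nl=$'\n'              # a real newline, held in a temporary variable
    local esc_nl="'\$'\\n''"    # the 7 literal characters '$'\n''
    local esc_sq="'\\''"        # the 4 literal characters '\''
    # Quoting the pattern variables makes bash match them literally.
    line=${line//"$esc_nl"/$nl}
    line=${line//"$esc_sq"/\'}
    printf '%s' "$line"
}

# Example with the string from the answer above.
var="def'\$'\\n''abc"
unescape_cp_name "$var"; echo
```

Keeping the pattern and the newline in separate variables avoids putting $'...' inside the expansion at all.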
What other tool is advised or other way of doing it ?
For string replacement in shell, the most popular tools are sed (whose name is short for "stream editor") and awk. Writing a parser is better done in a full-blown programming language, like Python, C, C++ and similar.
The only way to decode cp output correctly, is to see cp source code, see how it quotes the filenames, and decode it in the same way. Note that it may change between cp flavors and versions, so such a tool may need to query cp version and will be not portable.
Note that parsing cp output is a very very very very bad idea. cp output is in no way standardized, may change anytime and is meant for humans to read. Instead, strongly consider rewriting save.sh to copy file by file and in case of cp returning non-zero exit status, write the filename yourself in an "errors file" as a zero separated stream.
# save.sh
find .... -print0 |
while IFS= read -d '' -r file; do
if ! cp "$file" "$dst"; then
printf "%s\0" "$file" >> errorsfile  # append; a plain > would keep only the last failure
fi
done
# onerrors.sh
while IFS= read -d '' -r file; do
echo "do something with $file"
done < errorsfile

sed substitution: substitute string is a variable needing expansion AND contains slashes

I am fighting with sed to do a substitution where the substitute string contains slashes. This general topic has been discussed on Stack Overflow before. But, AFAICT, I have a new wrinkle that hasn't been addressed in previous questions.
Let's say I have a file, ENVIRO.tmpl, which has several lines, one of which is
Loaded modules: SUPPLY_MODULES_HERE
I want to replace SUPPLY_MODULES_HERE in an automated fashion with a list of loaded modules. (At this point, if anyone has a better way to do this than sed, please let me know!) My first effort here is to define an environment variable and use sed to put it into the file:
> modules=$(module list 2>&1)
> sed "s/SUPPLY_MODULES_HERE/${modules}/" ENVIRO.tmpl > ENVIRO.txt
(The 2>&1 being needed because module list sends its output to STDERR, for reasons I can't begin to understand.) However, as is often the case, the modules have slashes in them. For example
> echo ${modules}
gcc/9.2.0 mpt/2.20
The slashes kill my command because sed can't understand the expression and thinks my substitution command is "unterminated".
So I do the usual thing and use some other character for the command delimiter:
> modules=$(module list 2>&1)
> sed "s|SUPPLY_MODULES_HERE|${modules}|" ENVIRO.tmpl > ENVIRO.txt
and I still get an "unterminated 's'" error.
So I replace double quotes with single quotes:
> sed 's|SUPPLY_MODULES_HERE|${modules}|' ENVIRO.tmpl > ENVIRO.txt
and now I get no error, but the line in ENVIRO.txt looks like
Loaded modules: ${modules}
Not what I was hoping for.
So, AFAICT, I need double quotes to expand the variable, but I need single quotes to make the alternative delimiters work. But I need both at the same time. How do I get this?
UPDATE: Gordon Davisson's comment below got to the root of the matter: "echo ${modules} can be highly misleading". Examining $modules with declare -p shows that it actually has a newline (or, more generally, some kind of line break) in it. What I did was add an extra step to extract newlines out of the variable. With that change, everything worked fine. An alternative would be to convince sed to expand the variable with line breaks and substitute it as such into the text, but I haven't been able to make that work. Any takers?
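The "extra step" from the update might look something like this in bash. It is only a sketch: the function name fill_modules is made up, and it assumes the module list contains no |, & or backslash characters, since those would still be special to sed in the replacement.

```shell
#!/usr/bin/env bash
# fill_modules is a hypothetical helper: $1 is the raw module list,
# $2 is the template file; the filled-in template goes to stdout.
fill_modules() {
    local modules=$1 template=$2
    # Turn every line break into a space so the sed command stays on one line.
    modules=${modules//$'\n'/ }
    # | as delimiter, so slashes in module names need no escaping.
    sed "s|SUPPLY_MODULES_HERE|${modules}|" "$template"
}

# Example usage (assumes a `module` command and ENVIRO.tmpl exist):
#   fill_modules "$(module list 2>&1)" ENVIRO.tmpl > ENVIRO.txt
```

Running declare -p modules before and after the substitution step shows whether a line break was really there.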
sed is not the best tool here, because both the regex syntax and the delimiters get in the way.
It is better to use awk with plain string functions, which doesn't require any regular expression.
awk -v kw='SUPPLY_MODULES_HERE' -v repl="$(module list 2>&1)" '
n = index($0, kw) {
$0 = substr($0, 1, n-1) repl substr($0, n+length(kw))
} 1
' file
index function uses plain string search in awk.
substr function is used to get substring before and after the search keyword.

removing backslash with tr

So I'm removing special characters from filenames and replacing them with spaces. I have it all working apart from files whose names contain single backslashes.
Note these files are created in the Finder on OS X
old_name="testing\this\folder"
new_name=$(echo $old_name | tr '<>:\\#%|?*' ' ');
This results in new_name being "testing hisolder"
How can I remove just the backslashes and not the character that follows them?
This results in new_name being "testing hisolder"
This string looks like the result of echo -e "testing\this\folder", because \t and \f are actually replaced with the tabulation and form feed control characters.
Maybe you have an alias like alias echo='echo -e', or maybe the implementation of echo in your version of the shell interprets backslash escapes:
POSIX does not require support for any options, and says that the
behavior of ‘echo’ is implementation-defined if any STRING contains a
backslash or if the first argument is ‘-n’. Portable programs can use
the ‘printf’ command if they need to omit trailing newlines or output
control characters or backslashes.
(from the info page)
So you should use printf instead of echo in new software. In particular, echo $old_name should be replaced with printf %s "$old_name".
There is a good explanation in this discussion, for instance.
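Put together, the printf version of the original command could be wrapped up like this. The function name sanitize_name is only for illustration; the tr character set is the one from the question.

```shell
#!/usr/bin/env bash
# sanitize_name is a hypothetical wrapper around the command from the question.
# printf '%s' prints the name byte for byte, so \t and \f are never interpreted.
sanitize_name() {
    printf '%s' "$1" | tr '<>:\\#%|?*' ' '
}

old_name='testing\this\folder'
new_name=$(sanitize_name "$old_name")
printf '%s\n' "$new_name"
```

Because printf '%s' never interprets its argument, the result does not depend on how echo is configured.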
No need for printf
As @mklement0 suggested, you can avoid the pipe by means of the Bash here string:
tr '<>:\\#%|?*' ' ' <<<"$old_name"
Ruslan's excellent answer explains why your command may not be working for you and offers a robust, portable solution.
tl;dr:
You probably ran your code with sh rather than bash (even though on macOS sh is Bash in disguise), or you had shell option xpg_echo explicitly turned on.
Use printf instead of echo for portability.
In Bash, with the default options and using the echo builtin, your command should work as-is (except that you should double-quote $old_name for robustness), because echo by default does not expand escape sequences such as \t in its operands.
However, Bash's echo can be made to expand control-character escape sequences:
explicitly, by executing shopt -s xpg_echo
implicitly, if you run Bash as sh or with the --posix option (which, among other options and behavior changes, activates xpg_echo)
Thus, your symptom may have been caused by running your code from a script with shebang line #!/bin/sh, for instance.
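A quick way to see the difference for yourself, assuming Bash: the sketch below runs echo in a subshell with xpg_echo switched on or off. The helper name show_echo is made up for the demonstration.

```shell
#!/usr/bin/env bash
# show_echo is a hypothetical helper: $1 is "on" or "off", $2 is the string.
# The parentheses start a subshell, so the option change does not leak out.
show_echo() {
    (
        if [ "$1" = on ]; then
            shopt -s xpg_echo   # echo expands \t, \f, ... like echo -e
        else
            shopt -u xpg_echo   # Bash default: backslashes printed as-is
        fi
        echo "$2"
    )
}

show_echo off 'testing\this\folder'
show_echo on  'testing\this\folder' | od -c | head -n 2
```

With the option off the backslashes survive; with it on, od shows the tab and form feed characters that replaced \t and \f.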
However, if you're targeting sh, i.e., if you're writing a portable script, then echo should be avoided altogether for the very reason that its behavior differs across shells and platforms - see Ruslan's printf solution.
As an aside: perhaps a more robust approach to your tr command is a whitelisting approach: stating only the characters that are explicitly allowed in your result, and excluding all others with the -C option:
old_name='testing\this\folder'
new_name=$(printf '%s' "$old_name" | tr -C '[:alnum:]_-' ' ')
That way, any characters that aren't either letters, numbers, _, or - are replaced with a space.
With Bash, you can use parameter expansion:
$ old_name="testing\this\folder"
$ new_name=${old_name//[<>:\\#%|?*]/ }
$ echo $new_name
testing this folder
For more, please refer to the Bash manual on shell parameter expansion.
I think your test case is missing proper escaping for \, so you're not really testing the case of a backslash contained in a string.
This worked for me:
old_name='testing\\this\\folder'
new_name=$(echo $old_name | tr '<>:\\#%|?*' ' ');
echo $new_name
# testing this folder

Convert function arguments from upper to lowercase in bash

I'm trying to make a convenience function to fix issues when I accidentally have my caps locks on and am trying to run some case-sensitive tools.
e.g. I occasionally find myself typing MAKE DEBUG instead of make debug.
What I have now is pretty straightforward: alias MAKE="make" and editing the makefiles to duplicate the rules, e.g. DEBUG: debug.
I'd prefer a solution that works on the arguments, without having to modify the tools involved.
Using GNU sed
If you just want everything in your makefile to be in lowercase, you can use GNU sed to lowercase the whole thing:
sed -i 's/.*/\L&/' Makefile
You could also build a sed script that's a little more discriminating, but the \L replacement escape is your friend.
Using tr
Since you tagged your question with tr, you might also want a tr solution. It's a little more cumbersome, since tr won't do in-place translations, but you could shuffle temp files or use sponge from moreutils. For example:
tr '[:upper:]' '[:lower:]' < Makefile | sponge Makefile
This involves a script, but avoids the Ctrl-D issue of my earlier attempt:
For each command, an alias like
alias MAKE="casefixer make"
And then the following file, which I've created at /usr/local/bin/casefixer:
#!/bin/bash
command=`echo $1 | tr '[:upper:]' '[:lower:]'` # convert 1st arg to lowercase, it's the command to invoke
shift # remove 1st arg from $*
$command `echo "$*" | tr '[:upper:]' '[:lower:]'` # convert arguments to lowercase, and invoke the command with them
Playing on @Clayton Hughes' casefixer solution, here's a solution that'll handle funny things like spaces in arguments (which $* messes up):
casefixer() { eval "$(printf "%q " "$@" | tr '[:upper:]' '[:lower:]')"; }
alias MAKE='casefixer make'
Note: eval is a fairly dangerous thing, with a well-deserved reputation for causing really bizarre bugs. In this case, however, the combination of double-quoting and encoding the command and its arguments with %q should prevent trouble. At least, I couldn't find a case where it did anything unexpected.
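If eval makes you nervous, a variant without it is possible, assuming Bash 4 or later (the ${parameter,,} case-modification expansion is not available in the Bash 3.2 that ships with macOS). This is a sketch along the same lines, not a drop-in tested against every tool:

```shell
#!/usr/bin/env bash
# Requires Bash 4+ for the ,, (lowercase) expansion.
casefixer() {
    # Lowercase every argument individually; spaces inside arguments survive
    # because each one stays a separate array element.
    local args=("${@,,}")
    "${args[@]}"
}

alias MAKE='casefixer make'

# Example: runs `echo "hello  world"`.
casefixer ECHO "HELLO  WORLD"
```

The first array element is the command itself, so it gets lowercased along with its arguments.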
Here's one solution, though it's not perfect:
alias MAKE="make `tr '[:upper:]' '[:lower:]'`"
it works, but has the unfortunate problem that I need to press Ctrl-D to send an EOF before anything starts executing.
The readline command "downcase-word" (bound to M-l by default; M-u is "upcase-word") is worth mentioning here. Suppose you typed "MAKE DEBUG". If you catch it before hitting return, you can move the cursor to the beginning of the line with C-a. (Otherwise, bring the command back first, using the up arrow.) Then, each time you hit M-l, the word immediately after the cursor will be changed to lowercase, and the cursor will move to the end of that word, ready for the next one.
It's a little laborious, and I don't see a way to lowercase the entire line at once. Perhaps someone can improve on this.

How do you escape a user-provided search term that you don't want evaluated for sed?

I'm trying to escape a user-provided search string that can contain any arbitrary character and give it to sed, but can't figure out how to make it safe for sed to use. In sed, we do s/search/replace/, and I want to search for exactly the characters in the search string without sed interpreting them (e.g., the '/' in 'my/path' would not close the sed expression).
I read this related question concerning how to escape the replace term. I would have thought you'd do the same thing to the search, but apparently not because sed complains.
Here's a sample program that creates a file called "my_searches". Then it reads each line of that file and performs a search and replace using sed.
#!/bin/bash
# The contents of this heredoc will be the lines of our file.
read -d '' SAMPLES << 'EOF'
/usr/include
P@$$W0RD$?
"I didn't", said Jane O'Brien.
`ls -l`
~!@#$%^&*()_+-=:'}{[]/.,`"\|
EOF
echo "$SAMPLES" > my_searches
# Now for each line in the file, do some search and replace
while read line
do
echo "------===[ BEGIN $line ]===------"
# Escape every character in $line (e.g., ab/c becomes \a\b\/\c). I got
# this solution from the accepted answer in the linked SO question.
ES=$(echo "$line" | awk '{gsub(".", "\\\\&");print}')
# Search for the line we read from the file and replace it with
# the text "replaced"
sed 's/'"$ES"'/replaced/' < my_searches # Does not work
# Search for the text "Jane" and replace it with the line we read.
sed 's/Jane/'"$ES"'/' < my_searches # Works
# Search for the line we read and replace it with itself.
sed 's/'"$ES"'/'"$ES"'/' < my_searches # Does not work
echo "------===[ END ]===------"
echo
done < my_searches
When you run the program, you get sed: xregcomp: Invalid content of \{\} for the last line of the file when it's used as the 'search' term, but not the 'replace' term. I've marked the lines that give this error with # Does not work above.
------===[ BEGIN ~!@#$%^&*()_+-=:'}{[]/.,`"| ]===------
sed: xregcomp: Invalid content of \{\}
------===[ END ]===------
If you don't escape the characters in $line (i.e., sed 's/'"$line"'/replaced/' < my_searches), you get this error instead because sed tries to interpret various characters:
------===[ BEGIN ~!@#$%^&*()_+-=:'}{[]/.,`"| ]===------
sed: bad format in substitution expression
sed: No previous regexp.
------===[ END ]===------
So how do I escape the search term for sed so that the user can provide any arbitrary text to search for? Or more precisely, what can I replace the ES= line in my code with so that the sed command works for arbitrary text from a file?
I'm using sed because I'm limited to a subset of utilities included in busybox. Although I can use another method (like a C program), it'd be nice to know for sure whether or not there's a solution to this problem.
This is a relatively famous problem—given a string, produce a pattern that matches only that string. It is easier in some languages than others, and sed is one of the annoying ones. My advice would be to avoid sed and to write a custom program in some other language.
You could write a custom C program, using the standard library function strstr. If this is not fast enough, you could use any of the Boyer-Moore string matchers you can find with Google—they will make search extremely fast (sublinear time).
You could write this easily enough in Lua:
local function quote(s) return (s:gsub('%W', '%%%1')) end
local function replace(first, second, s)
return (s:gsub(quote(first), second))
end
for l in io.lines() do io.write(replace(arg[1], arg[2], l), '\n') end
If that is not fast enough, speed things up by applying quote to arg[1] only once and inlining the function replace.
As ghostdog mentioned, awk '{gsub(".", "\\\\&");print}' is incorrect because it escapes out non-special characters. What you really want to do is perhaps something like:
awk 'gsub(/[^[:alpha:]]/, "\\\\&")'
This will escape out non-alpha characters. For some reason I have yet to determine, I still can't replace "I didn't", said Jane O'Brien. even though my code above correctly escapes it to
\"I\ didn\'t\"\,\ said\ Jane\ O\'Brien\.
It's quite odd because this works perfectly fine
$ echo "\"I didn't\", said Jane O'Brien." | sed s/\"I\ didn\'t\"\,\ said\ Jane\ O\'Brien\./replaced/
replaced
This command, echo "$line" | awk '{gsub(".", "\\\\&");print}', escapes every character in $line, which is wrong. Run echo $ES after it and $ES turns out to be \/\u\s\r\/\i\n\c\l\u\d\e. Then, when you pass that to the next sed (below)
sed 's/'"$ES"'/replaced/' my_searches
, it will not work because there is no line that has pattern \/\u\s\r\/\i\n\c\l\u\d\e. The correct way is something like:
$ sed 's|\([#$@^&*!~+-={}/]\)|\\\1|g' file
\/usr\/include
P\@\$\$W0RD\$?
"I didn't", said Jane O'Brien.
\`ls -l\`
\~\!\@\#\$%\^\&\*()_\+-\=:'\}\{[]\/.,\`"\|
You put all the characters you want escaped inside [], and choose a delimiter for sed that is not in your character class; I chose "|". Then use the "g" (global) flag.
It would also help to tell us what you are actually trying to do, i.e. the actual problem you are trying to solve.
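Following the "escape only what is special" idea, here is a sketch for basic regular expressions (sed without -E or -r). The function names are invented for this example, and it assumes the search and replacement strings are single lines. Note that { } ( ) + ? | are deliberately left alone: in a basic regex they are literal, and putting a backslash in front of them is what makes them special.

```shell
#!/usr/bin/env bash
# All three helpers are hypothetical names for this sketch.

# Escape the characters that are special in a BRE, plus the / delimiter.
escape_bre() {
    printf '%s\n' "$1" | sed 's|[\\/.*^$[]|\\&|g'
}

# Escape the characters that are special in the replacement part of s///.
escape_repl() {
    printf '%s\n' "$1" | sed 's|[\\/&]|\\&|g'
}

# Read stdin, replace the first literal occurrence of $1 on each line with $2.
replace_literal() {
    sed "s/$(escape_bre "$1")/$(escape_repl "$2")/"
}

# Example using the question's file name:
#   replace_literal '/usr/include' 'replaced' < my_searches
```

Using | as the delimiter inside the two escaping commands keeps / as an ordinary member of the bracket expression. Reading the lines with while IFS= read -r line also matters, since a plain read eats backslashes before sed ever sees them.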
This seems to work for FreeBSD sed:
# using FreeBSD & Mac OS X sed
ES="$(printf "%q" "${line}")"
ES="${ES//+/\\+}"
sed -E s$'\777'"${ES}"$'\777'replaced$'\777' < my_searches
sed -E s$'\777'Jane$'\777'"${line}"$'\777' < my_searches
sed -E s$'\777'"${ES}"$'\777'"${line}"$'\777' < my_searches
The -E option of FreeBSD sed is used to turn on extended regular expressions.
The same is available for GNU sed via the -r or --regexp-extended options respectively.
For the differences between basic and extended regular expressions see, for example:
http://www.gnu.org/software/sed/manual/sed.html#Extended-regexps
Maybe you can use FreeBSD-compatible minised instead of GNU sed?
# example using FreeBSD-compatible minised,
# http://www.exactcode.de/site/open_source/minised/
# escape some punctuation characters with printf
help printf
printf "%s\n" '!"#$%&'"'"'()*+,-./:;<=>?@[\]^_`{|}~'
printf "%q\n" '!"#$%&'"'"'()*+,-./:;<=>?@[\]^_`{|}~'
# example line
line='!"#$%&'"'"'()*+,-./:;<=>?@[\]^_`{|}~ ... and Jane ...'
# escapes in regular expression
ES="$(printf "%q" "${line}")" # escape some punctuation characters
ES="${ES//./\\.}" # . -> \.
ES="${ES//\\\\(/(}" # \( -> (
ES="${ES//\\\\)/)}" # \) -> )
# escapes in replacement string
lineEscaped="${line//&/\&}" # & -> \&
minised s$'\777'"${ES}"$'\777'REPLACED$'\777' <<< "${line}"
minised s$'\777'Jane$'\777'"${lineEscaped}"$'\777' <<< "${line}"
minised s$'\777'"${ES}"$'\777'"${lineEscaped}"$'\777' <<< "${line}"
To avoid potential backslash confusion, we could (or rather should) use a backslash variable like so:
backSlash='\\'
ES="${ES//${backSlash}(/(}" # \( -> (
ES="${ES//${backSlash})/)}" # \) -> )
(By the way using variables in such a way seems like a good approach for tackling parameter expansion issues ...)
... or to complete the backslash confusion ...
backSlash='\\'
lineEscaped="${line//${backSlash}/${backSlash}}" # double backslashes
lineEscaped="${lineEscaped//&/\&}" # & -> \&
If you have bash, and you're just doing a pattern replacement, just do it natively in bash. The ${parameter/pattern/string} expansion in Bash will work very well for you, since you can just use a variable in place of the "pattern" and replacement "string" and the variable's contents will be safe from word expansion. And it's that word expansion which makes piping to sed such a hassle. :)
It'll be faster than forking a child process and piping to sed anyway. You already know how to do the whole while read line thing, so creatively applying the capabilities in Bash's existing parameter expansion documentation can help you reproduce pretty much anything you can do with sed. Check out the bash man page to start...
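One caveat worth showing: a bare variable used as the pattern is still treated as a glob pattern, so it should be double-quoted inside the expansion to be matched literally. A minimal sketch, with a made-up function name:

```shell
#!/usr/bin/env bash
# replace_in_bash is a hypothetical helper: replace every literal
# occurrence of $2 in $1 with $3 and print the result.
replace_in_bash() {
    local text=$1 search=$2 repl=$3 out
    # "$search" quoted: *, ?, [ ] in it are matched literally, not as globs.
    # "$repl" quoted: keeps & literal in Bash 5.2+, where an unquoted &
    # stands for the matched text.
    out=${text//"$search"/"$repl"}
    printf '%s\n' "$out"
}

# Applied line by line to the question's file it would look like:
#   while IFS= read -r l; do replace_in_bash "$l" "$search" replaced; done < my_searches
replace_in_bash 'cp /usr/include/*.h /tmp' '/usr/include/*' '/opt/inc/&'
```

Whether Bash is available at all on a busybox-only system is a separate question, so treat this as an option for full Bash environments.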

Resources