replace pattern with newline in shell variable - shell

A script save.sh uses 'cp' and writes its cp errors to an errors file. Most of these errors occur because the origin filesystem is EXT4 while the destination filesystem is NTFS or FAT, which doesn't accept some special characters.
Another script onerrors.sh reads the error file so as to handle the files that could not be copied: it copies each of them to a crafted filename that is acceptable on FAT and NTFS, i.e. where the "bad" characters have been replaced with '_'.
That works fine for almost all of the errors, but not 100%.
In this error file, the special characters in the filenames seem to be escaped multiple times:
single quotes ' appear as '\''.
real newlines in filenames (!) appear as '$'\n'' (7 glyphs!)
I want to unescape these so as to get the filename.
I convert the quotes back to ' with line=${line//\'\\\'\'/\'}. That works.
But how can I convert the escaped newline back to a real newline in the $line variable, i.e. how can I replace the '$'\n'' sequence with an unescaped \n in the variable?
The issue is not recognising the pattern but inserting a real newline; I've not been able to do it with the same variable-expansion syntax.
What other tool or approach is advised?

The question is:
how can I replace the '$'\n'' sequence with an unescaped \n in a variable?
That's simple:
var="def'$'\n''abc"
echo "${var//\'$\'\\n\'\'/$'\n'}"
I seem to remember that ANSI-C quoting inside a variable expansion was buggy in some versions of bash. Use a temporary variable in such cases.
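A sketch of that temporary-variable workaround, assuming the error file really contains the 7-glyph sequence literally (the variable names here are just illustrative):

```shell
# Keep pattern and replacement in plain variables, so no ANSI-C quoting
# appears inside the ${var//...} expansion itself.
var="def'\$'\\n''abc"   # holds the literal glyphs: def'$'\n''abc
nl=$'\n'                # a real newline character
pat="'\$'\\\\n''"       # glob pattern for '$'\n'' (backslash doubled for matching)
fixed=${var//$pat/$nl}  # def<newline>abc
```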
What other tool is advised or other way of doing it ?
For string replacement in shell, the most popular tools are sed (whose name literally comes from "Stream EDitor") and awk. Writing a parser is better done in a full-blown programming language like Python, C, C++ or similar.
The only way to decode cp output correctly is to read the cp source code, see how it quotes filenames, and decode them the same way. Note that the quoting may change between cp flavors and versions, so such a tool may need to query the cp version and will not be portable.
Note that parsing cp output is a very bad idea anyway: cp output is in no way standardized, may change at any time, and is meant for humans to read. Instead, strongly consider rewriting save.sh to copy file by file and, whenever cp returns a non-zero exit status, to write the filename yourself to an "errors file" as a zero-separated stream.
# save.sh
find .... -print0 |
while IFS= read -d '' -r file; do
    if ! cp "$file" "$dst"; then
        printf "%s\0" "$file" >> errorsfile   # append, so earlier errors are kept
    fi
done

# onerrors.sh
while IFS= read -d '' -r file; do
    echo "do something with $file"
done < errorsfile

Related

Using bash, how to pass filename arguments to a command sorted by date and dealing with spaces and other special characters?

I am using the bash shell and want to execute a command that takes filenames as arguments; say the cat command. I need to provide the arguments sorted by modification time (oldest first) and unfortunately the filenames can contain spaces and a few other difficult characters such as "-", "[", "]". The files to be provided as arguments are all the *.txt files in my directory. I cannot find the right syntax. Here are my efforts.
Of course, cat *.txt fails; it does not give the desired order of the arguments.
cat `ls -rt *.txt`
The `ls -rt *.txt` gives the desired order, but now the blanks in the filenames cause confusion; they are seen as filename separators by the cat command.
cat `ls -brt *.txt`
I tried -b to escape non-graphic characters, but the blanks are still seen as filename separators by cat.
cat `ls -Qrt *.txt`
I tried -Q to put entry names in double quotes.
cat `ls -rt --quoting-style=escape *.txt`
I tried this and other variants of the quoting style.
Nothing that I've tried works. Either the blanks are treated as filename separators by cat, or the entire list of filenames is treated as one (invalid) argument.
Please advise!
Using --quoting-style is a good start. The trick is in parsing the quoted file names. Backticks are simply not up to the job. We're going to have to be super explicit about parsing the escape sequences.
First, we need to pick a quoting style. Let's see how the various algorithms handle a crazy file name like "foo 'bar'\tbaz\nquux". That's a file name containing actual single and double quotes, plus a space, tab, and newline to boot. If you're wondering: yes, these are all legal, albeit unusual.
$ for style in literal shell shell-always shell-escape shell-escape-always c c-maybe escape locale clocale; do printf '%-20s <%s>\n' "$style" "$(ls --quoting-style="$style" '"foo '\''bar'\'''$'\t''baz '$'\n''quux"')"; done
literal <"foo 'bar' baz
quux">
shell <'"foo '\''bar'\'' baz
quux"'>
shell-always <'"foo '\''bar'\'' baz
quux"'>
shell-escape <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
shell-escape-always <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
c <"\"foo 'bar'\tbaz \nquux\"">
c-maybe <"\"foo 'bar'\tbaz \nquux\"">
escape <"foo\ 'bar'\tbaz\ \nquux">
locale <‘"foo 'bar'\tbaz \nquux"’>
clocale <‘"foo 'bar'\tbaz \nquux"’>
The ones that actually span two lines are no good, so literal, shell, and shell-always are out. Smart quotes aren't helpful, so locale and clocale are out. Here's what's left:
shell-escape <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
shell-escape-always <'"foo '\''bar'\'''$'\t''baz '$'\n''quux"'>
c <"\"foo 'bar'\tbaz \nquux\"">
c-maybe <"\"foo 'bar'\tbaz \nquux\"">
escape <"foo\ 'bar'\tbaz\ \nquux">
Which of these can we work with? Well, we're in a shell script. Let's use shell-escape.
There will be one file name per line. We can use a while read loop to read a line at a time. We'll also need IFS= and -r to disable any special character handling. A standard line processing loop looks like this:
while IFS= read -r line; do ... done < file
That "file" at the end is supposed to be a file name, but we don't want to read from a file, we want to read from the ls command. Let's use <(...) process substitution to swap in a command where a file name is expected.
while IFS= read -r line; do
    # process each line
done < <(ls -rt --quoting-style=shell-escape *.txt)
Now we need to convert each line with all the quoted characters into a usable file name. We can use eval to have the shell interpret all the escape sequences. (I almost always warn against using eval but this is a rare situation where it's okay.)
while IFS= read -r line; do
    eval "file=$line"
done < <(ls -rt --quoting-style=shell-escape *.txt)
If you wanted to work one file at a time we'd be done. But you want to pass all the file names at once to another command. To get to the finish line, the last step is to build an array with all the file names.
files=()
while IFS= read -r line; do
    eval "files+=($line)"
done < <(ls -rt --quoting-style=shell-escape *.txt)
cat "${files[@]}"
There we go. It's not pretty. It's not elegant. But it's safe.
Does this do what you want?
for i in $(ls -rt *.txt); do echo "FILE: $i"; cat "$i"; done

removing backslash with tr

So I'm removing special characters from filenames and replacing them with spaces. I have it all working apart from files with single backslashes contained therein.
Note these files are created in the Finder on OS X
old_name="testing\this\folder"
new_name=$(echo $old_name | tr '<>:\\#%|?*' ' ');
This results in new_name being "testing hisolder"
How can I just remove the backslashes and not the preceding character?
This results in new_name being "testing hisolder"
This string looks like the result of echo -e "testing\this\folder", because \t and \f are actually replaced with the tabulation and form feed control characters.
Maybe you have an alias like alias echo='echo -e', or maybe the implementation of echo in your version of the shell interprets backslash escapes:
POSIX does not require support for any options, and says that the
behavior of ‘echo’ is implementation-defined if any STRING contains a
backslash or if the first argument is ‘-n’. Portable programs can use
the ‘printf’ command if they need to omit trailing newlines or output
control characters or backslashes.
(from the info page)
So you should use printf instead of echo in new software. In particular, echo $old_name should be replaced with printf %s "$old_name".
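A sketch of that fix applied to the question's example; with printf feeding tr, the backslashes survive intact and get translated:

```shell
old_name='testing\this\folder'
printf '%s\n' "$old_name"    # prints the string verbatim, backslashes included
new_name=$(printf '%s' "$old_name" | tr '<>:\\#%|?*' ' ')
# new_name is now "testing this folder"
```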
There is a good explanation in this discussion, for instance.
No need for printf
As @mklement0 suggested, you can avoid the pipe by means of the Bash here string:
tr '<>:\\#%|?*' ' ' <<<"$old_name"
Ruslan's excellent answer explains why your command may not be working for you and offers a robust, portable solution.
tl;dr:
You probably ran your code with sh rather than bash (even though on macOS sh is Bash in disguise), or you had shell option xpg_echo explicitly turned on.
Use printf instead of echo for portability.
In Bash, with the default options and using the echo builtin, your command should work as-is (except that you should double-quote $old_name for robustness), because echo by default does not expand escape sequences such as \t in its operands.
However, Bash's echo can be made to expand control-character escape sequences:
explicitly, by executing shopt -s xpg_echo
implicitly, if you run Bash as sh or with the --posix option (which, among other options and behavior changes, activates xpg_echo)
Thus, your symptom may have been caused by running your code from a script with shebang line #!/bin/sh, for instance.
However, if you're targeting sh, i.e., if you're writing a portable script, then echo should be avoided altogether for the very reason that its behavior differs across shells and platforms - see Ruslan's printf solution.
As an aside: perhaps a more robust approach to your tr command is a whitelisting approach: state only the characters that are explicitly allowed in your result, and exclude all others with the -C option:
old_name='testing\this\folder'
new_name=$(printf '%s' "$old_name" | tr -C '[:alnum:]_-' ' ')
That way, any characters that aren't either letters, numbers, _, or - are replaced with a space.
With Bash, you can use parameter expansion:
$ old_name="testing\this\folder"
$ new_name=${old_name//[<>:\\#%|?*]/ }
$ echo $new_name
testing this folder
For more, please refer to the Bash manual on shell parameter expansion.
I think your test case is missing proper escaping for \, so you're not really testing the case of a backslash contained in a string.
This worked for me:
old_name='testing\\this\\folder'
new_name=$(echo $old_name | tr '<>:\\#%|?*' ' ');
echo $new_name
# testing this folder

Shell: rsync parsing spaces incorrectly in file name/path

I'm trying to pull a list of files over ssh with rsync, but I can't get it to work with filenames that have spaces on it! One example file is this:
/home/pi/Transmission_Downloads/FUNDAMENTOS_JAVA_E_ORIENTAÇÃO_A_OBJETOS/2. Fundamentos da linguagem/estruturas-de-controle-if-else-if-e-else-v1.mp4
and I'm trying to transfer it using this shell code.
cat $file_name | while read LINE
do
    echo $LINE
    rsync -aP "$user@$server:$LINE" $local_folder
done
and the error I'm getting is this:
receiving incremental file list
rsync: link_stat "/home/pi/Transmission_Downloads/FUNDAMENTOS_JAVA_E_ORIENTAÇÃO_A_OBJETOS/2." failed: No such file or directory (2)
rsync: link_stat "/home/pi/Fundamentos" failed: No such file or directory (2)
rsync: link_stat "/home/pi/da" failed: No such file or directory (2)
rsync: change_dir "/home/pi//linguagem" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1655) [Receiver=3.1.0]
I don't get why it prints the path correctly on the screen but parses the file name/path incorrectly! I know spaces need to be escaped with backslashes, but I don't know how to solve this. sed (find/replace) didn't help either, and I also tried this code without success:
while IFS='' read -r line || [[ -n "$line" ]]; do
    echo "Text read from file: $line"
    rsync -aP "$user@$server:$line" $local_folder
done < $file_name
What should I do to fix this, and why this is happening?
I read the list of files from a .txt file (each file and path on one line), and I'm using ubuntu 14.04. Thanks!
rsync does space splitting by default.
You can disable this using the -s (or --protect-args) flag, or you can escape the spaces within the filename
The shell is correctly passing the filename to rsync, but rsync interprets spaces as separating multiple paths on the same server. So in addition to double-quoting the variable expansion to make sure rsync sees the string as a single argument, you also need to quote the spaces within the filename.
If your filenames don't have apostrophes in them, you can do that with single quotes inside the double quotes:
rsync -aP "$user@$server:'$LINE'" "$local_folder"
If your filenames might have apostrophes in them, then you need to quote those (whether or not the filenames also have spaces). You can use bash's built-in parameter substitution to do that (as long as you're on bash 4; older versions, such as the /bin/bash that ships on OS X, have issues with backslashes and apostrophes in such expressions). Here's what it looks like:
rsync -aP "$user@$server:'${LINE//\'/\'\\\'\'}'" "$local_folder"
Ugly, I know, but effective. Explanation follows after the other options.
If you're using an older bash or a different shell, you can use sed instead:
rsync -aP "$user@$server:'$(sed "s/'/'\\\\''/g" <<<"$LINE")'" "$local_folder"
... or if your shell also doesn't support <<< here-strings:
rsync -aP "$user@$server:'$(echo "$LINE" | sed "s/'/'\\\\''/g")'" "$local_folder"
Explanation: we want to replace all apostrophes with.. something that becomes a literal apostrophe in the middle of a single-quoted string. Since there's no way to escape anything inside single quotes, we have to first close the quotes, then add a literal apostrophe, and then re-open the quotes for the rest of the string. Effectively, that means we want to replace all occurrences of an apostrophe (') with the sequence (apostrophe, backslash, apostrophe, apostrophe): '\''. We can do that with either bash parameter expansion or sed.
In bash, ${varname/old/new} expands to the value of the variable $varname with the first occurrence of the string old replaced by the string new. Doubling the first slash ( ${varname//old/new} ) replaces all occurrences instead of just the first one. That's what we want here. But since both apostrophe and backslash are special to the shell, we have to put a(nother) backslash in front of every one of those characters in both expressions. That turns our old value into \', and our new one into \'\\\'\'.
The sed version is a little simpler, since apostrophes aren't special. Backslashes still are, so we have to put a \\ in the string to get a \ back. Since we want apostrophes in the string, it's easier to use a double-quoted string instead of a single-quoted one, but that means we need to double all the backslashes again to make sure the shell passes them on to sed unmolested. That's why the shell command has \\\\: that gets handed to sed as \\, which it outputs as \.
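For reuse, the bash replacement described above can be wrapped in a tiny helper (the function name shquote is just illustrative, not a standard utility):

```shell
# Print $1 as a single-quoted shell word, turning each embedded ' into '\''
shquote() {
    local s=${1//\'/\'\\\'\'}
    printf "'%s'" "$s"
}
```

It could then be used as, e.g., rsync -aP "$user@$server:$(shquote "$LINE")" "$local_folder".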

Open file with special characters in file name in Perl

I am trying to open a file whose name contains special characters "(" and ")", parentheses, in a Perl script and I am unable to do so.
If I have a file name like abc_5(rt)_dgj.csv, is there a way to open this file in the script?
$inputfile = <STDIN>;
chomp($inputfile);
unless (-e $inputfile) {
    usage("Input file not present\n");
    exit 2;
}
system("cat $inputfile | tr '\r' '\n' > temp");
$inputfile = "temp";
# do something here, like copy the data from inputfile and write to another file
system("rm temp");
Counter-example
There shouldn't be any problem. For example:
rd.pl
#!/usr/bin/env perl
use strict;
use warnings;
my $file = "abc_5(rt)_dgj.csv";
open my $fh, "<", $file or die "A horrible death";
my $line = <$fh>;
print $line;
close $fh;
Test
$ cat > "abc_5(rt)_dgj.csv"
Record 1,2014-12-30,19:43:21
$ perl rd.pl
Record 1,2014-12-30,19:43:21
$
Tested with Perl 5.22.0 on Mac OS X 10.10.5.
The real problem
Now the question has sample code, the real problem is obvious: the line
system("cat $inputfile | tr '\r' '\n' > temp");
runs
cat abc_5(rt)_dgj.csv | …
and that will generate (shell) syntax errors. If you try it for yourself at the command line, you'll see them too. You should enclose the file name in (single or) double quotes:
system(qq{cat "$inputfile" | tr '\r' '\n' > temp});
so that the parentheses are not exposed in the shell.
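A shell-level sketch of the difference (the sample file is created here purely for the demonstration):

```shell
printf 'a\rb\r' > 'abc_5(rt)_dgj.csv'   # sample CR-terminated data
# Unquoted, the ( would be a shell syntax error; quoted, it is just a file name:
cat 'abc_5(rt)_dgj.csv' | tr '\r' '\n' > temp
```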
When you hide what's going on — not quoting the code, not quoting the error message — you make it hard for people to help you even though the problem is easy to resolve. Creating an MCVE (How to create a Minimal, Complete, and Verifiable Example?), as you did in the end, helps enormously. It wastes less of your time, and it wastes less of our time.
Even weirder characters in file names — there's a module for that!
Having a pair of parentheses (or even one parenthesis) in a file name of itself doesn't cause any trouble if the name is quoted sufficiently. As I noted, double quotes are adequate for this specific file name (and a good many others, even ones containing special characters); as ikegami noted, there are other names that will cause problems. Specifically, names containing dollar signs, back quotes or double quotes will cause grief with double quotes; names containing single quotes will cause grief with single quotes.
Ikegami also notes that there is a module, String::ShellQuote, which is designed to deal with these problems. It is not a Perl core module so you would have to install it yourself, but using it would be sensible if you have to deal with any perverse name that the user might throw your way.
The documentation for the shell_quote function from the module says:
If any string can't be safely quoted shell_quote will croak.
I'm not clear which strings can't be safely quoted; the documentation doesn't elaborate on the issue. (croak is a function from the core module called Carp.)

Allowing punctuation characters in directory and file names in bash

What techniques or principles should I use in a bash script to handle directories and filenames that are allowed to contain as many as possible of
!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
and space?
I guess / is not a valid filename or directory name character in most linux/unix systems?
So far I have had problems with !, ;, |, (a space character) and ' in filenames.
You are right: / is not valid, and neither is the null byte \0. There is no way around that limitation (short of filesystem hacking).
All other characters can be used in file names, including such surprising characters as a newline \n or a tab \t. There are many ways to enter them so that the shell does not understand them as special characters. I will give just a pragmatic approach.
You can enter most of the printable characters by using the single quote ' to quote them:
date > 'foo!bar["#$%&()*+,-.:;<=>?@[\]^_`{|}~'
Of course, you cannot enter a singlequote this way, but for this you can use the doublequote ":
date > "foo'bar"
If you need to have both, you can end one quotation and start another:
date > "foo'bar"'"bloh'
Alternatively you also can use the backslash \ to escape the special character directly:
date > foo\"bar
The backslash also works as an escape character within double quotes; it does not work that way within single quotes (there it is an ordinary character with no special meaning).
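For example:

```shell
printf '%s\n' "a\$b"   # backslash escapes the $ inside double quotes: prints a$b
printf '%s\n' 'a\$b'   # inside single quotes it stays literal: prints a\$b
```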
If you need to enter non-printable characters like a newline, you can use the dollar-singlequote notation:
date > $'foo\nbar'
This is valid in bash, but not necessarily in all other shells. So take care!
Finally, it can make sense to use a variable to hold your strange name (in order not to have to spell it out directly):
strangeName=$(xxd -r -p <<< "41 42 43 ff 45 46")
date > "$strangeName"
This way you can keep the shell code readable.
BUT in general it is not a good idea to have such characters in file names, because a lot of scripts cannot handle such files properly.
Writing fool-proof scripts is not easy. The most basic rule is to quote variable usages in double quotes:
for i in *
do
    cat "$i" | wc -l
done
This will solve 99% of the issues you are likely to encounter.
If you are using find to find directory entries which can contain special characters, you should use -print0 to separate the output not by newlines but by null bytes. Other programs like xargs can often consume a list of null-byte-separated file names (e.g. xargs -0).
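A sketch of that combination (it creates a demo directory with a deliberately nasty file name):

```shell
mkdir -p demo
printf 'hello\n' > $'demo/a b\nc.txt'   # name contains a space and a newline
# -print0 emits NUL-separated names; xargs -0 splits only on NUL bytes
find demo -type f -print0 | xargs -0 cat
```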
If your file name starts with a dash -, it can often be mistaken for an option. Some programs accept the special argument -- to state that all following arguments are not options. The more general approach is to use a name which does not start with a dash:
for i in *
do
    cat ./"$i" | wc -l
done
This way, a file named -n will not run cat -n but cat ./-n which will not be understood as the option -n given to cat (which would mean "number lines").
Always quote your variable substitutions. I.e. not cp $source $target, but cp "$source" "$target". This way they won't be subject to word splitting and pathname expansion.
Specify "--" before positional arguments to file operation commands. I.e. not cp "$source" "$target", but cp -- "$source" "$target". This prevents interpreting file names starting with dash as options.
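For example, with cat (most GNU tools understand --):

```shell
printf 'hi\n' > ./-n     # a file literally named -n
contents=$(cat -- -n)    # -- ends option parsing, so -n is read as a file name
```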
And yes, "/" is not a valid character for file/directory names.
