Testing "framework" for scripts with nonstandard filenames - bash

There are many comments on some questions (especially shell ones) that say basically one or more of the following:
This will fail on file names that contain spaces, newlines, etc,
This will fail if the file is a symbolic link (or not),
This will fail if $filename is a directory and not a regular file,
and so on.
While I understand that every script needs its own testing environment,
these are some common things that a script should be immune to.
So, my intention is to write a script that will create some directory hierarchy
with "specially crafted" file names for testing purposes.
The question is: what "special" file names are good for this test?
Currently I have (the script creates files and directories) with:
space in the file name
newline in the file name
file name that starts with one of:
- (like command argument)
# (comment char)
! (command history)
file name that contains one of:
| char (pipe)
() chars
* and ? (wildcards)
file name with unicode characters
all above for the directories
symbolic link to the directory
symbolic link to the file
Any other ideas about what I shouldn't miss?

What comes to my mind:
quotes in the filename, both single and double
the $ character at the start
several redirection characters like > < << <<<
the ~ char ($HOME)
the ';' (as command delimiter)
backslash in the filename \
basically, go through the ASCII table and test all chars, if you think you need this :)
Some other comments:
If you want to test scripts for Stack Overflow questions, you should create one file with the OP's content (call it the "basic file").
All of the "special files" above should then be symlinks to that basic file. With this method you can easily modify the content of the files (you need to change only one, the basic file).
Or, if symlinks are not a solution for you, use hard links.
Not directly about special characters in filenames, but it is also good to take care of:
filenames that differ only in case, especially for images like image.jpg and image.JPG; the same filename with only a different extension
EDIT: Ideas from the comments:
Very long filenames, lots and lots of files, and very deep directory hierarchies (tripleee)
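As a rough illustration of the ideas above, here is a minimal bash sketch that builds such a tree. The function name make_test_tree, the file name "basic" and the ".d" suffix for directories are made up for this example; it assumes a Unix filesystem and a directory that does not exist yet.

```bash
#!/usr/bin/env bash
# Sketch only: populate a scratch directory with awkwardly named entries.
# make_test_tree, "basic" and the ".d" suffix are hypothetical names.
make_test_tree() {
  local root=$1 name
  mkdir -p -- "$root" || return
  # The single "basic file" that every special name points at.
  printf 'sample content\n' > "$root/basic"

  local names=(
    'with space'
    $'with\nnewline'
    '-leading-dash'
    '#leading-hash'
    '!leading-bang'
    'pipe|char'
    'paren(s)'
    'star*and?mark'
    "single'quote"
    'double"quote'
    '$dollar'
    'redirect><'
    '~tilde'
    'semi;colon'
    'back\slash'
    'ünïcödé'
  )

  for name in "${names[@]}"; do
    ln -s -- basic "$root/$name"   # symlink to the basic file (relative target)
    mkdir -- "$root/$name.d"       # directory carrying the same odd name
  done
  # One symlink that points at a directory.
  ln -s -- 'with space.d' "$root/link to dir"
}
```

Calling make_test_tree "$(mktemp -d)/tree" gives a throwaway hierarchy to run a script against. Changing the content of "basic" changes what every special name shows.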

Related

Running command on windows does not allow quotations

When I run a command on the Windows 10 command line that requires a path as one of its parameters, it works if the path is NOT inside quotation marks. But if a path has a space in it, I need to wrap it in quotes so that it is treated as one single path, and then it complains that the file at that path does not exist.
For example:
C:/PROJECTS/desktopfiles/public/libs/cpdf/win64/cpdf.exe C:/Users/john/Documents/cat.pdf C:/Users/john/Documents/my_dog.pdf -o C:/Users/john/Documents/cat_dog_Merged.pdf
The above works,
the below doesn't (because there is a space in my dog.pdf)
C:/PROJECTS/desktopfiles/public/libs/cpdf/win64/cpdf.exe C:/Users/john/Documents/cat.pdf C:/Users/john/Documents/my dog.pdf -o C:/Users/john/Documents/cat_dog_Merged.pdf
You could try to replace spaces with a question mark. The question mark is a wildcard to match "any single character", which would be a space in your case. Like this: my?dog.pdf. Just make sure that there is no other file matching this pattern. But the system should give you some error message then (which might or might not point to the root of the problem).
Another solution that comes to my mind is a batch file that renames the files in question automatically (replacing spaces with underscores) and renames them back after the pdf merge.

Sed replace unusual file extension arising from gmv

As a result of using gmv on a large nested directory to flatten it, I have a number of duplicate files separated out and with the extensions "._1_" "._2_" etc ( .... ._n_ )
eg "a.pdf.\_1\_"
ie its
a(dot)pdf(dot)(back slash)1(back slash)
as opposed to
a(dot)pdf(dot)1
which I want to reduce it back to "a.pdf"
I tried something like
sed -i .bak "s|.\_1\_||" *
which is usually reliable and doesn't require escape characters. However, it's giving me
"error: illegal byte sequence"
Grateful for help to fix. This is on Mac OSX terminal. Ideally I'd like a generic solution to fix ._*_ forms where the * varies 1 to 9
There are two challenges here.
How to deal with the duplicate basenames. The suffixes '1', '2', ... were most likely added to designate different sections of a single file (maybe different pages of a PDF, etc.). Performing a rename that strips the suffixes may cause some important files to disappear.
How to deal with the "error: illegal byte sequence", which indicates that some special (non-ASCII) characters are part of the file name. Usually these are bytes with value >= 0xc0 that cannot be decoded according to the current locale. The fact that the file names are escaped (as per the OP, "a.pdf.\_1\_") may hint at additional characters that are not displayed (assuming the escaping was not added by the OP).
The proposed solution is to rename each file, placing the 'sequence' part that makes the file unique BEFORE the extension, so that the extension can still be used to determine the file type.
a.pdf.1 => a.1.pdf
The rename command to perform this task is:
rename 's/(.*).pdf.(_.*_)/$1$2.pdf/' *.pdf._*_
Adjust the file name list as needed, and use -n to verify before running.
rename -n s/.\_1\_// *.*_1_
works (remove the -n once tested).
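For the generic ._1_ to ._9_ case without the rename utility, a plain bash loop is one option. This is only a sketch: the function name strip_dup_suffix is made up here, and the choice to skip a file when the shorter name already exists is an assumption about what is wanted, made so that no duplicate gets overwritten.

```bash
# Sketch: strip a trailing "._N_" (N = 1..9) from names in one directory.
# strip_dup_suffix is a hypothetical helper, not an existing tool.
strip_dup_suffix() {
  local dir=$1 f target
  for f in "$dir"/*._[1-9]_; do
    [ -e "$f" ] || continue            # the glob matched nothing
    target=${f%._[1-9]_}               # drop the suffix with parameter expansion
    if [ -e "$target" ]; then
      # Never clobber an existing file; report and move on.
      printf 'skip (target exists): %s\n' "$f" >&2
    else
      mv -- "$f" "$target"
    fi
  done
  return 0
}
```

Run it as strip_dup_suffix . in the flattened directory. Files whose shorter name is already taken are left in place and listed on stderr for manual review.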

How to write a script to fetch the address of the links to .rar files on a webpage?

Have a pile of 50 .rar files on a web server and I want to download them all.
And, the names of the files have nothing in common other than .rar.
I wanted to try aria2 to download all of them altogether, but I think I need to write a script to fetch the addresses of all the .rar files.
I have no idea how to start writing the script. Any hint will be appreciated.
You can try to play with wget with -A parameter in your shell script:
wget -r "https://foo/" -P /tmp -A "*.rar"
Here is an explanation of what -A does
Specify comma-separated lists of file name suffixes or patterns to accept or reject (see Types of Files). Note that if any of the wildcard characters, ‘*’, ‘?’, ‘[’ or ‘]’, appear in an element of acclist or rejlist, it will be treated as a pattern, rather than a suffix. In this case, you have to enclose the pattern into quotes to prevent your shell from expanding it, like in ‘-A "*.mp3"’ or ‘-A '*.mp3'’.
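If you would rather collect the addresses yourself and hand them to aria2, something along these lines might do. It is a sketch under assumptions: extract_rar_links is a made-up helper, it only recognises double-quoted href attributes, and it prints links exactly as they appear in the page, so relative links would still need the site prefix added.

```bash
# Sketch: read HTML on stdin, print every href value that ends in .rar.
# extract_rar_links is a hypothetical helper name.
extract_rar_links() {
  grep -oE 'href="[^"]+\.rar"' | sed -e 's/^href="//' -e 's/"$//'
}

# Possible usage (not run here):
#   curl -s "https://foo/" | extract_rar_links > urls.txt
#   aria2c -i urls.txt
```

This is a plain text match rather than a real HTML parser, so it is only suitable for simple index pages.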

Replace incorrectly displayed special chars in bash

I've uploaded a large number of files including their folder structure to my Ubuntu 12.04 LTS Server using WinSCP.
The goal is to access these files in Owncloud.
However, all files that contain special characters like German umlauts cause problems. In Owncloud's view, their names are cut off at the special character, and trying to view that folder or file will send you back to the folder root.
Using ls, the special character is always displayed as a question mark, e.g. "Motorschwei?en1.jpg"
What works is manually renaming them through "mv" in the shell. Inserting the special char properly, e.g. "Motorschweißen1.jpg" for this example, does work, but doing this for all of them would take ages.
Using find . -name "?" will not yield any hits.
Is there any way to replace all of those special characters, e.g. with an underscore?
Try the command rename:
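rename 's/[^A-Za-z0-9._-]/_/g' *
The above command will replace every character other than ASCII letters, digits, dots, underscores and hyphens with _ (the dot is kept so that file extensions survive; run it with -n first to preview the changes). See http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators and http://perldoc.perl.org/perlre.html#Special-Backtracking-Control-Verbs for the documentation of Perl regular expressions.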
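If the Perl rename utility is not available, roughly the same replacement can be sketched in plain bash. The function name sanitize_names is made up for this example, it handles a single directory only (no recursion), and it assumes that leaving a file alone is preferable to overwriting an existing one.

```bash
# Sketch: replace every byte outside [A-Za-z0-9._-] in file names with "_".
# sanitize_names is a hypothetical helper; one directory level only.
sanitize_names() {
  local dir=$1 path name clean
  local LC_ALL=C                       # match byte by byte, whatever the encoding
  for path in "$dir"/*; do
    [ -e "$path" ] || continue         # the glob matched nothing
    name=${path##*/}
    clean=${name//[!A-Za-z0-9._-]/_}
    [ "$name" = "$clean" ] && continue # nothing to change
    if [ -e "$dir/$clean" ]; then
      printf 'skip (target exists): %s\n' "$name" >&2
    else
      mv -- "$path" "$dir/$clean"
    fi
  done
  return 0
}
```

Because the match is done per byte, a character stored as several bytes (for example a UTF-8 umlaut) becomes several underscores.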

Makefile problem with files beginning with "#"

I have a directory "FS2" that contains the following files:
#ARGH#
this
that
I have a makefile with the following contents.
Template:sh= ls ./FS2/*
#all: $(Template)
echo "Template is: $(Template)"
touch all
When I run "clearmake -C sun" and the file "all" does not exist, I get the following output:
"Template is: ./FS2/#ARGH# ./FS2/that ./FS2/this"
Modifying either "this" or "that" does not cause "all" to be regenerated. When run with "-d" for debug, the "all" target is only dependent on the directory "./FS2", not the three files in the directory. I determined that when it expands "Template", the "#" gets treated as the beginning of a comment and the rest of the line is ignored!
The problem is caused by an editor that when killed leaves around files that begin with "#". If one of those files exists, then no modifications to files in the directory causes "all" to be regenerated.
Although, I do not want to make compilation dependent on whether a temporary file has been modified or not and will remove the file from the "Template" variable, I am still curious as to how to get this to work if I did want to treat the "#ARGH#" as a filename that the rule "all" is dependent on. Is this even possible?
I have a directory "FS2" that contains the following files: #ARGH# ...
Therein lies your problem. In my opinion, it is unwise to use "funny" characters in filenames. Now I know that those characters are allowed but that doesn't make them a good idea (ASCII control characters like backspace are also allowed, with similarly annoying results).
I don't even like spaces in filenames, preferring instead SomethingLikeThis to show independent words in a file name, but at least the techniques for handling spaces in many UNIX tools are reasonably well known.
My advice would be to rename the file if it was one of yours and save yourself some angst. But, since they're temporary files left around by an editor crash, delete them before your rules start running in the makefile. You probably shouldn't be rebuilding based on an editor temporary file anyway.
Or use a more targeted template like: Template:sh= ls ./FS2/[A-Za-z0-9]* to bypass those files altogether (that's an example only, you should ensure it doesn't falsely exclude files that should be included).
'#' is a valid Makefile comment char, so the second line is ignored by the make program.
Can you filter out (with grep) the files that start with # and process them separately?
I'm not familiar with clearmake, but try replacing your template definition with
Template:sh= ls ./FS2/* | grep -v '#'
so that filenames containing # are not included in $(Template).
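For comparison, the same kind of filtering can be sketched as a small shell function instead of an ls | grep pipeline. The name list_sources is made up for this example, and unlike grep -v '#' it only skips names that begin with #, which is an assumption about what editor leftovers look like.

```bash
# Sketch: print regular files in a directory, skipping names that start with "#".
# list_sources is a hypothetical helper name.
list_sources() {
  local dir=$1 f
  for f in "$dir"/*; do
    case ${f##*/} in
      '#'*) continue ;;                # editor leftovers such as #ARGH#
    esac
    [ -f "$f" ] && printf '%s\n' "$f"
  done
  return 0
}
```

With the FS2 directory from the question, list_sources ./FS2 would list ./FS2/that and ./FS2/this but not ./FS2/#ARGH#.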
If clearmake follows the same rules as GNU make, then you can also re-write your target using something like Template := $(wildcard *.c) which will be a little more intelligent about files with oddball names.
If I really want the file #ARGH# to contribute to whether the target all should be rebuilt as well as be included in the artifacts produced by the rule, the Makefile should be modified so that the line
Template:sh= ls ./FS2/*
is changed to
Template=./FS2/*
Template_files:sh= ls $(Template)
This works because $(Template) will be replaced by the literal string ./FS2/* both in the dependency list of "all" and in the expansion of $(Template_files).
Clearmake (and GNU make) then use ./FS2/* as a pathname containing a wildcard when evaluating the dependencies, which expands into the filenames ./FS2/#ARGH# ./FS2/that ./FS2/this, and $(Template_files) can be used in the rules where a list of filenames is needed.

Resources