boost::filesystem normalize filename

I need to normalize file names so that they don't contain any non-portable characters. There is portable_file_name, but that only checks the name and returns a bool. I need to convert any given string into a portable name so that I can create files with it.
Is there anything reusable for this?

I assume you mean that some characters (*:;"?<>/\|) are acceptable in file or path names on some operating systems (Mac OS 9, for instance) but not on others (such as Windows XP). Is that correct?
If so, you should probably do the character conversion yourself. I've done this in the past by using a regex to find the unacceptable file name characters and replace them with a dash or another character that works on all target operating systems. Then you can safely use the files on all of them.
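A minimal C++11 sketch of that regex approach, using `<regex>`. The function name `replace_unsafe_chars` is hypothetical, and the character set is just the one listed above; it is not guaranteed to be exhaustive for every target platform (control characters, for example, are not covered).

```cpp
#include <regex>
#include <string>

// Replace every character from the set *:;"?<>/\| with '-'.
// The set comes from the answer above; extend it for your own targets.
std::string replace_unsafe_chars(const std::string& name)
{
    // Inside a character class these are literals; "\\" is a backslash.
    static const std::regex unsafe(R"([*:;"?<>/\\|])");
    return std::regex_replace(name, unsafe, "-");
}
```

For example, `replace_unsafe_chars("a*b:c?.txt")` yields `"a-b-c-.txt"`.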

Try this:
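#include <boost/filesystem.hpp>
boost::filesystem::path file_name("dir/./sub/../my file.txt");
file_name.normalize(); // collapses "." and ".." elements; other characters are left as they are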
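For comparison, the C++17 counterpart of this kind of normalization is `std::filesystem::path::lexically_normal()`. The sketch below (an illustration, not part of the answer; `normalized` is a hypothetical helper) suggests that lexical normalization only collapses `.`/`..` elements and redundant separators. Characters such as `:` or `*` are left untouched, so it does not by itself produce a portable name.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Purely lexical: no file system access, the path need not exist.
std::string normalized(const std::string& p)
{
    return fs::path(p).lexically_normal().generic_string();
}
```

On a POSIX system, `normalized("dir/./sub/../my:file*.txt")` gives `"dir/my:file*.txt"`.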

The best I could come up with so far is:
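#include <string>
#include <boost/filesystem.hpp>

// Replace every character that boost::filesystem::portable_file_name()
// rejects when it is tested on its own with '_'.
std::string make_portable(std::string name)
{
    for (auto &c : name)
    {
        const char test[] = { c, '\0' };
        if (!boost::filesystem::portable_file_name(test))
            c = '_';
    }
    return name;
}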

One of the important steps in doing this is converting file names like ./file or ones pointing to symbolic links into filenames that work on other platforms which might not have these concepts. Boost v1.48.0+ actually has the following functions to do so:
path canonical(const path& p, const path& base = current_path());
path canonical(const path& p, system::error_code& ec);
path canonical(const path& p, const path& base, system::error_code& ec);
This usually involves converting relative paths into absolute ones. These functions are also often called before performing security checks (e.g. is the requested file within the web-root directory of a web server?).
Note that canonical() requires the file to exist.
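A sketch of the security check mentioned above, written with the C++17 `std::filesystem::canonical` (which behaves like the Boost function: it resolves symlinks and `.`/`..`, and fails if the path does not exist). `is_inside` is a hypothetical helper; treating a failed `canonical()` as "not inside" is an assumption you may want to handle differently.

```cpp
#include <algorithm>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// True if `requested` resolves to `root` itself or to something below it.
// Both paths must exist; if canonical() fails we answer "not inside".
bool is_inside(const fs::path& root, const fs::path& requested)
{
    std::error_code ec;
    const fs::path r = fs::canonical(root, ec);
    if (ec) return false;
    const fs::path p = fs::canonical(requested, ec);
    if (ec) return false;
    // Compare element by element so "/www2/x" is not inside "/www".
    const auto diff = std::mismatch(r.begin(), r.end(), p.begin(), p.end());
    return diff.first == r.end();
}
```

A request such as `root / "sub" / ".." / ".." / "etc"` is resolved first, so the `..` elements cannot be used to escape the root.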

Related

How do I determine whether two file paths (or file URLs) identify the same file or directory on macOS?

Imagine this simple example of two paths on macOS:
/etc/hosts
/private/etc/hosts
Both point to the same file. But how do you determine that?
Another example:
~/Desktop
/Users/yourname/Desktop
Or what about mixed upper and lower case on a case-insensitive file system:
/Volumes/external/my file
/Volumes/External/My File
And even this:
/Applications/Über.app
Here, the "Ü" can be encoded in two Unicode normalization forms: NFC (one precomposed character) and NFD (a "U" followed by a combining diaeresis). For an example of where this can happen when you use the (NS)URL API, see this gist of mine.
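To make the NFC/NFD difference concrete, here are the two UTF-8 byte sequences for "Ü" in C++ (the constant names are illustrative). A plain byte-wise string comparison treats them as different names even though they render identically.

```cpp
#include <string>

// NFC: the single precomposed code point U+00DC, UTF-8 bytes C3 9C.
const std::string u_umlaut_nfc = "\xC3\x9C";

// NFD: "U" (U+0055) followed by COMBINING DIAERESIS (U+0308),
// UTF-8 bytes 55 CC 88.
const std::string u_umlaut_nfd = "U\xCC\x88";
```

So `u_umlaut_nfc == u_umlaut_nfd` is `false`, which is why paths must be prepared before they are compared as strings.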
Since macOS 10.15 (Catalina) there are additionally firmlinks that link from one volume to another in a volume group. Paths for the same FS object could be written as:
/Applications/Find Any File.app
/System/Volumes/Data/Applications/Find Any File.app
I'd like to document ways that reliably deal with all these intricacies, with the goal of being efficient (i.e. fast).
There are two ways to check if two paths (or their file URLs) point to the same file system item:
Compare their paths. This requires that the paths get prepared first.
Compare their IDs (inodes). This is safer overall, as it avoids all the complications with Unicode intricacies and wrong case.
Comparing file IDs
In ObjC this is fairly easy (note: according to a knowledgeable Apple developer, one should not rely on [NSURL fileReferenceURL], so this code uses a cleaner way):
NSString *p1 = @"/etc/hosts";
NSString *p2 = @"/private/etc/hosts";
NSURL *url1 = [NSURL fileURLWithPath:p1];
NSURL *url2 = [NSURL fileURLWithPath:p2];
id ref1 = nil, ref2 = nil;
[url1 getResourceValue:&ref1 forKey:NSURLFileResourceIdentifierKey error:nil];
[url2 getResourceValue:&ref2 forKey:NSURLFileResourceIdentifierKey error:nil];
BOOL equal = [ref1 isEqual:ref2];
The equivalent in Swift (note: do not use fileReferenceURL, see this bug report):
let p1 = "/etc/hosts"
let p2 = "/private/etc/hosts"
let url1 = URL(fileURLWithPath: p1)
let url2 = URL(fileURLWithPath: p2)
let ref1 = try url1.resourceValues(forKeys: [.fileResourceIdentifierKey])
.fileResourceIdentifier
let ref2 = try url2.resourceValues(forKeys: [.fileResourceIdentifierKey])
.fileResourceIdentifier
let equal = ref1?.isEqual(ref2) ?? false
Both solutions use the BSD function lstat under the hood, so you could also write this in plain C:
#include <stdbool.h>
#include <sys/stat.h>
static bool paths_are_equal (const char *p1, const char *p2) {
struct stat stat1, stat2;
int res1 = lstat (p1, &stat1);
int res2 = lstat (p2, &stat2);
return (res1 == 0 && res2 == 0) &&
(stat1.st_dev == stat2.st_dev) && (stat1.st_ino == stat2.st_ino);
}
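In C++17, `std::filesystem::equivalent()` performs essentially the same device/inode comparison. One difference from the `lstat()` version above: `equivalent()` resolves symbolic links, so a symlink and its target compare equal. A minimal sketch using the non-throwing overload; `same_file` is a hypothetical name.

```cpp
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// True if both paths resolve to the same file system entity.
// Any error (e.g. a path that does not exist) is reported as "not the same".
bool same_file(const fs::path& a, const fs::path& b)
{
    std::error_code ec;
    const bool eq = fs::equivalent(a, b, ec);
    return !ec && eq;
}
```

The same caveat applies as for the resource identifiers: device and inode numbers identify the file only for the lifetime of the mount, so they should not be persisted.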
However, heed the warning about using these kinds of file references:
The value of this identifier is not persistent across system restarts.
This is mainly meant for the volume ID, but may also affect the file ID on file systems that do not support persistent file IDs.
Comparing paths
To compare the paths you must get their canonical path first.
If you do not do this, you cannot be sure that the case is correct, which in turn leads to very complex comparison code. (See using NSURLCanonicalPathKey for details.)
There are different ways in which the case can end up wrong:
The user may have entered the name manually, with the wrong case.
You have previously stored the path, but the user has since changed the case of the file's name. Your path will still identify the same file, but now the case is wrong, and a comparison for equal paths could fail depending on how you got the other path you compare it with.
Only if you got the path from a file system operation where you could not specify any part of the path incorrectly (i.e. with the wrong case), you do not need to get the canonical path but can just call standardizingPath and then compare their paths for equality (no case-insensitive option necessary).
Otherwise, and to be on the safe side, get the canonical path from a URL like this:
import Foundation
let uncleanPath = "/applications"
let url = URL(fileURLWithPath: uncleanPath)
if let resourceValues = try? url.resourceValues(forKeys: [.canonicalPathKey]),
let resolvedPath = resourceValues.canonicalPath {
print(resolvedPath) // gives "/Applications"
}
If your path is stored in a String instead of a URL object, you could call stringByStandardizingPath (Apple Docs). But that would neither fix incorrect case nor decompose the characters, which may cause problems as shown in the aforementioned gist.
Therefore, it's safer to create a file URL from the String and then use the above method to get the canonical path or, even better, use the lstat() solution to compare the file IDs as shown above.
There's also a BSD function to get the canonical path from a C string: realpath(). However, it is not safe for this purpose, because it does not resolve different paths to the same item in a volume group (such as the firmlink example in the question) to the same string. Therefore, this function should be avoided here.

My exists function says the file exists, but winapi functions say it does not

I copied code that's supposed to change the desktop wallpaper. I have this constant in my program:
const char * image_name = "button_out.gif";
Later, I write the image on disk using Magick++:
image.write(image_name);
The image appears in program's working directory. If I run the program directly from explorer the working directory equals the program location.
Because the code prints the 0x80070002 (File not found) error, I added an exists() function at the beginning:
#include <sys/stat.h>
bool exists(const char* name) {
struct stat buffer;
return (stat (name, &buffer) == 0);
}
void SetWallpaper(LPCWSTR file){
if(!exists((const char* )file)) {
wcout << "The file "<<file<<" does not exist!" << endl;
return;
}
// ... actually try to set a wallpaper ...
}
The error is not printed however and the code proceeds.
Now the question is:
Does my exists function work properly?
Where does windows look for that image?
Full code to set a Magick++ generated image as background in case I have missed something relevant in this question.
Problem 1: String Conversions
Your primary problem is that you are attempting to use LPCWSTR (a const wchar_t *) and const char * interchangeably. I see a number of issues in your source, in particular:
You start with const char * image_name.
You then cast it to a LPCWSTR to pass to SetWallpaper. This basically guarantees that SetWallpaper will fail, as desktop->SetWallpaper is not able to handle non wide-character strings.
You then cast it back to a const char * to pass to stat() via exists(). This should work in your situation (since the original string really is a char *) but isn't correct because your string parameter to SetWallpaper is supposedly a proper LPCWSTR.
You need to pick a string format (wide-character vs. what Windows terms "ANSI") and stick to that format, using consistent APIs throughout.
The easiest option is probably just to leave most of your code untouched, but modify SetWallpaper to take a const char * and convert to a wide-character string when needed (for this you can use mbstowcs). So, for example:
void SetWallpaper(const char * file){ // <- Use a const char* parameter.
...
// Convert to a wide-character string to pass to COM:
wchar_t wcfile[MAX_PATH + 1];
mbstowcs(wcfile, file, MAX_PATH + 1);
wcfile[MAX_PATH] = L'\0'; // mbstowcs does not null-terminate if the buffer fills up
// Pass the converted wide-character string:
desktop->SetWallpaper(wcfile, 0);
...
}
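If you would rather not rely on a fixed `MAX_PATH` buffer, a conversion helper can size its buffer from the input, because a multibyte string never yields more wide characters than it has bytes. A sketch, assuming the narrow string is in the encoding of the current C locale (`mbstowcs` converts according to `LC_CTYPE`); `to_wide` is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <string>

// Convert a narrow (multibyte) string to a wide string with std::mbstowcs.
// Each wide character consumes at least one byte, so strlen(src) + 1
// wide characters are always enough room, including the terminator.
std::wstring to_wide(const char* src)
{
    std::wstring buf(std::strlen(src) + 1, L'\0');
    const std::size_t n = std::mbstowcs(&buf[0], src, buf.size());
    if (n == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid multibyte sequence");
    buf.resize(n); // drop the terminator and any unused space
    return buf;
}
```

Inside SetWallpaper, the call could then become something like `desktop->SetWallpaper(to_wide(file).c_str(), 0);`.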
The other option would be to use wide-character strings throughout, i.e.:
LPCWSTR image_name = L"button_out.gif";
Modify exists() to take a LPCWSTR and use _wstat() instead.
Use wide-character versions of all other API functions.
However, I am unsure how that would interact with the ImageMagick API, which may not have wide-character support. So it's up to you. Choose whatever approach is the easiest to implement but make sure you are consistent. The general rule is do not cast between LPCWSTR and const char *; if you are ever in a situation where you need to change one to the other, you cannot cast, you must convert (via mbstowcs or wcstombs).
Problem 2: SetWallpaper default directory is not current working directory
At this point, your string usage will be consistent. Now that you have that problem ironed out, if SetWallpaper fails while exists() does not, then SetWallpaper is not looking where you think it is. As you discovered in your comment, SetWallpaper looks in the desktop by default. In this case, while I have not tested it, you may be able to work around this by passing an absolute path to SetWallpaper. For this, you can use GetFullPathName to determine the absolute file name given your relative path. Remember to be consistent with your string types, though.
Also, if stat() continues to fail, then the problem is either that your working directory is not what you think it is, or that your filename is not what you think it is. To that end, you will want to perform the following tests:
Print the current working directory at the point where you check for the file's existence, and verify it is correct.
Print the filename when you check for its existence, verify it is correct.
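Those checks can be sketched portably with C++17 `std::filesystem`. This is not the Win32 `GetFullPathName` mentioned above, although `std::filesystem::absolute` serves a similar purpose of turning a relative name into an absolute one. `report_file` is a hypothetical helper.

```cpp
#include <filesystem>
#include <iostream>
#include <string>

namespace fs = std::filesystem;

// Print where a relative file name actually points and whether it exists,
// then return the absolute path so it can be passed on (e.g. to an API
// that does not resolve names against the current working directory).
fs::path report_file(const std::string& name, std::ostream& out = std::cout)
{
    const fs::path full = fs::absolute(name);
    out << "working directory: " << fs::current_path().string() << '\n'
        << "file name:         " << name << '\n'
        << "absolute path:     " << full.string() << '\n'
        << "exists:            " << std::boolalpha << fs::exists(full) << '\n';
    return full;
}
```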
You should be good to go once you work all the above issues out.

How can I check if a string is a valid file name for windows using R?

I've been writing a program in R that outputs randomization schemes for a research project I'm working on with a few other people this summer, and I'm done with the majority of it, except for one feature. Part of what I've been doing is making it really user friendly, so that the program will prompt the user for certain pieces of information, and therefore know what needs to be randomized. I have it set up to check every piece of user input to make sure it's a valid input, and give an error message/prompt the user again if it's not. The only thing I can't quite figure out is how to get it to check whether or not the file name for the .csv output is valid. Does anyone know if there is a way to get R to check if a string makes a valid windows file name? Thanks!
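These characters aren't allowed: /\:*?"<>|, and neither are control characters (codes 0 to 31). So warn the user if the name contains any of those.
Some names are also reserved: CON, PRN, AUX, NUL, COM1 to COM9 and LPT1 to LPT9, even when followed by an extension (e.g. NUL.txt).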
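For readers outside R, here is a rough C++ sketch of those two checks. `is_valid_windows_name` is a hypothetical name; it inspects a single file-name component (not a full path) and does not check length or other rules, so treat it as a first-pass filter rather than a complete validator.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Checks one file-name component against the Windows rules listed above:
// no reserved characters, no control characters, and no reserved device
// name (case-insensitive, also when followed by an extension).
bool is_valid_windows_name(const std::string& name)
{
    static const std::string forbidden = "/\\:*?\"<>|";
    if (name.empty())
        return false;
    for (unsigned char c : name)
        if (c < 32 || forbidden.find(static_cast<char>(c)) != std::string::npos)
            return false;
    // Reserved names also apply with an extension ("nul.csv"), so compare
    // only the part before the first '.', upper-cased.
    std::string stem = name.substr(0, name.find('.'));
    std::transform(stem.begin(), stem.end(), stem.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    if (stem == "CON" || stem == "PRN" || stem == "AUX" || stem == "NUL")
        return false;
    const bool com_or_lpt = stem.size() == 4 &&
        (stem.compare(0, 3, "COM") == 0 || stem.compare(0, 3, "LPT") == 0);
    if (com_or_lpt && stem[3] >= '1' && stem[3] <= '9')
        return false;
    return true;
}
```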
You probably want to check that the filename is valid using a regular expression. See this other answer for a Java example that should take minimal tweaking to work in R.
https://stackoverflow.com/a/6804755/134830
You may also want to check the filename length (260 characters for maximum portability, though longer names are allowed on some systems).
Finally, in R, if you try to create a file in a directory that doesn't exist, it will still fail, so you need to split the name up into the filename and directory name (using basename and dirname) and try to create the directory first, if necessary.
That said, David Heffernan gives good advice in his comment: let Windows do the work of deciding whether or not it can create the file. You don't want to erroneously tell the user that a filename is invalid.
You want something a little like this:
nice_file_create <- function(filename)
{
  directory_name <- dirname(filename)
  if (!file.exists(directory_name))
  {
    ok <- dir.create(directory_name, recursive = TRUE)
    if (!ok)
    {
      warning("The directory of that path could not be created.")
      return(invisible(FALSE))
    }
  }
  # file.create() signals failure by returning FALSE with a warning,
  # not by throwing an error, so check its return value.
  ok <- suppressWarnings(file.create(filename))
  if (!ok)
  {
    warning("The file could not be created.")
  }
  invisible(ok)
}
But test it thoroughly first! There are all sorts of edge cases where things can fall over: try UNC network path names, "~", and paths with "." and ".." in them.
I'd suggest that the easiest way to make sure a filename is valid is to use fs::path_sanitize().
It removes control characters, reserved characters, and Windows-reserved filenames, truncating the string at 255 bytes in length.

Filesystem object symbology

What is the most standard Ruby symbology for naming variables containing file names, file names with path and file instances? Completely clear way of doing this would be:
file_name = "bar.txt"
file_name_with_path = File.join( "foo", file_name )
file = File.open( file_name_with_path )
But it's too long. It is out of the question to use :file_name_with_path in a method definition:
def quux( file_name_with_path: "foo/bar.txt" )
# ...
end
Having encountered this for umpteenth time, I realized that shortening conventions are needed. I started making personal shortening conventions: :file_name => :fn, :file_name_with_path => :fnwp, :file always refers to a File instance, :fn never includes path, :fnwap means :file_name_with_absolute_path etc. But everyone must be facing this, so I am asking: Is there a public convention for this? More particularly, does Rails code have a convention for this?
But everyone must be facing this...
No, not really, because you're really over-thinking this.
Just use file:, or filename:. It doesn't matter whether your filename contains a relative or absolute path, or whether the path contains directories, and your code should reflect this. A path to a file is just a path to a file, and all paths should be treated identically by your code: It just opens the file, and raises an error if it can't.
You can use filesystem utilities to extract directories and base names from a path, and they'll work just fine on any path, regardless of the presence of directories and regardless of whether the path is absolute or relative. It just doesn't matter.
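To illustrate that the same decomposition works on any path, here is a C++17 `std::filesystem` sketch. The answer is about Ruby, where `File.dirname` and `File.basename` play the same role; `split_path` is a hypothetical helper.

```cpp
#include <filesystem>
#include <string>
#include <utility>

namespace fs = std::filesystem;

// Split a path into (directory, file name). The code is the same whether
// the path is a bare name, relative, or absolute; the directory part is
// simply empty when there is none.
std::pair<std::string, std::string> split_path(const std::string& p)
{
    const fs::path path(p);
    return { path.parent_path().generic_string(),
             path.filename().generic_string() };
}
```

For example, `split_path("bar.txt")` gives `("", "bar.txt")` and `split_path("/foo/bar.txt")` gives `("/foo", "bar.txt")`.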

SHFileOperation FOF_ALLOWUNDO fails on long filenames

I'm using the following function to delete a file to the recycle bin: (C++, MFC, Unicode)
bool DeleteFileToPaperbasket (CString filename)
{
TCHAR Buffer[2048+4];
_tcsncpy_s (Buffer, 2048+4, filename, 2048);
Buffer[_tcslen(Buffer)+1]=0; //Double-Null-Termination
SHFILEOPSTRUCT s;
s.hwnd = NULL;
s.wFunc = FO_DELETE;
s.pFrom = Buffer;
s.pTo = NULL;
s.fFlags = FOF_ALLOWUNDO | FOF_SILENT | FOF_NOERRORUI;
s.fAnyOperationsAborted = false;
s.hNameMappings = NULL;
s.lpszProgressTitle = NULL;
int rc = SHFileOperation(&s);
return (rc==0);
}
This works nicely for most files. But if path+filename exceeds 255 characters (while still being much shorter than 2048 characters), SHFileOperation returns 124, which is DE_INVALIDFILES.
But what's wrong? I checked everything a million times. The path is double-null terminated, I'm not using \\?\ and it works for short filenames.
I'm totally out of ideas...
I think backwards compatibility is biting you in the --- in several ways, and I'd need to actually see the paths you're using and implement some error-checking code to help. But here are some hints.
You would not get a DE_INVALIDFILES 0x7C "The path in the source or destination or both was invalid." for a max path violation, you'd get a DE_PATHTOODEEP 0x79 "The source or destination path exceeded or would exceed MAX_PATH."
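These error codes (return values) can and have changed over time, so don't rely too much on a specific value. Note that the SHFileOperation documentation (MSDN) says not to use GetLastError with its return values; it recommends simply checking for zero or nonzero, because several of the values are pre-Win32 codes that overlap Winerror.h values without matching their meaning.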
Also, taken from the SHFileOperation function documentation: "If you do not check fAnyOperationsAborted as well as the return value, you cannot know that the function accomplished the full task you asked of it and you might proceed under incorrect assumptions."
You should not be using this API for extremely long path names; it has been replaced in Windows Vista and later by the IFileOperation interface.
The explanation for why it may work in Explorer but not through this LEGACY API is given on the MSDN page Naming Files, Paths, and Namespaces:
The shell and the file system have different requirements. It is possible to create a path with the Windows API that the shell user interface is not able to interpret properly.
Hope this was helpful
The recycle bin doesn't support files whose paths exceed MAX_PATH in length. You can verify this for yourself by trying to recycle such a file in Explorer - you will get an error message about the path being too long.
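Given that limitation, one option is to check the length yourself before calling SHFileOperation and fall back to something else (a permanent delete, or asking the user). The portable sketch below only builds the double-null-terminated pFrom string and performs the length check; it does not call any Windows API. `kMaxPath` mirrors the Windows MAX_PATH value of 260 (which counts the terminating null), and both helper names are hypothetical.

```cpp
#include <cstddef>
#include <string>

// Windows' MAX_PATH is 260 characters including the terminating null,
// so a usable path has at most 259 characters.
constexpr std::size_t kMaxPath = 260;

bool fits_max_path(const std::wstring& path)
{
    return path.size() < kMaxPath;
}

// SHFILEOPSTRUCT::pFrom expects a list of null-terminated names ending
// with an extra null. c_str() supplies one terminator; the other is
// stored as part of the string's contents.
std::wstring make_double_null(const std::wstring& path)
{
    std::wstring s = path;
    s.push_back(L'\0');
    return s; // s.c_str() now ends with two L'\0' characters
}
```

The returned string must stay alive while SHFileOperation runs, since pFrom would point into it.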
