Windows directory that will never contain non-ASCII characters for temp file? - windows

Using MinGW 7.3.0 on Windows, Hunspell can't load the dictionary files from locations that have non-ASCII characters because of Windows limitations. I've tried everything[1] and I'm now resorting to copying the file to a path without non-ASCII characters before giving it to Hunspell. What is a good location to copy it to?
[1]
Windows needs a wchar_t overload of std::fstream::open() for such paths to work, which MinGW does not implement
std::filesystem can solve this, but it is only available from GCC 8
Hunspell insists on loading files on its own; it is not possible to pass the file contents to it as strings

The "natural" fit would be to use the user's chosen temporary directory (or a subdirectory thereof) (see %temp% or GetTempPath()). However, that defaults to something that contains the user name (which can contain "non-ASCII" characters; e.g. c:\users\Ø¥Ć¼\AppData\LocalLow\Temp) or something arbitrary (regarding character set) altogether.
So you're most likely best off with one of the following:
a) Choose a directory that does not contain off-limits characters from the get-go. For example, a directory underneath C:\ProgramData that you name yourself (e.g. after the application), so you know it does not contain non-ASCII characters.
b) Let the user decide where to put these files, and reject any path that contains characters other than the allowed ones.
c) Pass the "short path name" to Hunspell, which should not contain non-ASCII characters for compatibility with FAT file system traits. For example, the short path name for c:\temp\Ø¥Ć¼ is c:\temp\571D~1.
You can see the short names for directories using cmd.exe /c dir /x:
C:\temp>dir /x
...
19.07.2019 15:30 <DIR> .
19.07.2019 15:30 <DIR> ..
19.07.2019 15:30 <DIR> 571D~1 Ø¥Ć¼
How you can invoke the GetShortPathName Win32 API from MinGW I don't know, but I would assume that it is possible.
Also make sure to review the MSDN page for the above function for trade-offs, e.g. short names are not supported everywhere (e.g. SMB + see comments below).
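For what it's worth, options b) and c) can be sketched in plain C++ with the Win32 API, which MinGW exposes through windows.h. This is a minimal sketch, not a tested MinGW 7.3.0 build: the helper names isPureAscii and shortPathName are made up for illustration, the Windows-only part is guarded by _WIN32, and the short-name lookup only works for paths that already exist on a volume with 8.3 names enabled.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: true if every character of the path is 7-bit ASCII.
// Useful for validating a user-chosen directory (option b).
bool isPureAscii(const std::wstring& path) {
    for (wchar_t ch : path) {
        if (static_cast<unsigned long>(ch) > 0x7F) {
            return false;
        }
    }
    return true;
}

#ifdef _WIN32
// Hypothetical helper: returns the 8.3 short form of an existing path
// (option c), or an empty string if the call fails.
std::wstring shortPathName(const std::wstring& longPath) {
    // First call asks for the required buffer size (including the NUL).
    DWORD needed = GetShortPathNameW(longPath.c_str(), nullptr, 0);
    if (needed == 0) {
        return std::wstring();
    }
    std::wstring buffer(needed, L'\0');
    // Second call fills the buffer and returns the length without the NUL.
    DWORD written = GetShortPathNameW(longPath.c_str(), &buffer[0], needed);
    if (written == 0 || written >= needed) {
        return std::wstring();
    }
    buffer.resize(written);
    return buffer;
}
#endif
```

A caller would still want to run isPureAscii on the result of shortPathName, since a volume with short names disabled can hand back the long name unchanged.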

From this bug tracker:
In WIN32 environment, use UTF-8 encoded paths started with the long
path prefix \\?\ to handle system-independent character encoding
and very long path names (without the long path prefix Hunspell will
use fopen() with system-dependent character encoding instead of
_wfopen()).
So the actual solution seems to be:
Call GetFullPathNameW to normalize the path. Required because paths with long path prefix \\?\ are passed to the NT API unchanged.
Prepend L"\\\\?\\" to the normalized path (backslashes doubled because of C string literal requirements).
For a UNC path, you have to use the "UNC" device directly (i.e. L"\\\\server\\share" → L"\\\\?\\UNC\\server\\share") (thanks eryksun).
Encode the path in UTF-8, e. g. using WideCharToMultiByte() with CP_UTF8.
Pass the final UTF-8 encoded path to Hunspell.
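The steps above could look roughly like this in C++. It is a sketch under assumptions: addLongPathPrefix and hunspellPath are hypothetical helper names, device paths such as \\.\ are not handled, and the Win32 part is only compiled on Windows.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: prepends the long path prefix to an already
// normalized absolute path, using the UNC device for \\server\share paths.
std::wstring addLongPathPrefix(const std::wstring& fullPath) {
    const std::wstring prefix = L"\\\\?\\";
    if (fullPath.compare(0, prefix.size(), prefix) == 0) {
        return fullPath;  // already prefixed, leave untouched
    }
    if (fullPath.compare(0, 2, L"\\\\") == 0) {
        // \\server\share -> \\?\UNC\server\share
        return L"\\\\?\\UNC\\" + fullPath.substr(2);
    }
    return prefix + fullPath;
}

#ifdef _WIN32
// Hypothetical helper: normalize, prefix, and UTF-8 encode a path so it
// can be handed to Hunspell. Returns an empty string on failure.
std::string hunspellPath(const std::wstring& path) {
    // Step 1: normalize with GetFullPathNameW (size query, then fill).
    DWORD needed = GetFullPathNameW(path.c_str(), 0, nullptr, nullptr);
    if (needed == 0) return {};
    std::wstring full(needed, L'\0');
    DWORD written = GetFullPathNameW(path.c_str(), needed, &full[0], nullptr);
    if (written == 0 || written >= needed) return {};
    full.resize(written);

    // Step 2: prepend \\?\ (or \\?\UNC\ for UNC paths).
    const std::wstring prefixed = addLongPathPrefix(full);

    // Step 3: encode as UTF-8.
    const int len = static_cast<int>(prefixed.size());
    int bytes = WideCharToMultiByte(CP_UTF8, 0, prefixed.c_str(), len,
                                    nullptr, 0, nullptr, nullptr);
    if (bytes <= 0) return {};
    std::string utf8(static_cast<size_t>(bytes), '\0');
    WideCharToMultiByte(CP_UTF8, 0, prefixed.c_str(), len,
                        &utf8[0], bytes, nullptr, nullptr);
    return utf8;
}
#endif
```

The result of hunspellPath(...).c_str() would then be what goes into Hunspell_create or the Hunspell constructor.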

It looks like C:\Windows\Temp is still a valid path you can write to yourself.

Related

Where is the use of "\\?\" defined?

This command is to delete all files and sub-folders in a folder
rd /s "\\?\D:\TestFolder"
I got this command snippet from a YouTube video.
Could someone explain what this, \\?\, does?
It's the prefix to bypass Windows path normalization. With it you'll be able to access paths that are not valid in the Win32 namespace, such as names ending with a dot or a space: D:\TestFolder\folder ending with space \file name ending with dot., or paths longer than MAX_PATH (260 characters in older Windows versions).
For file I/O, the "\\?\" prefix to a path string tells the Windows APIs to disable all string parsing and to send the string that follows it straight to the file system. For example, if the file system supports large paths and file names, you can exceed the MAX_PATH limits that are otherwise enforced by the Windows APIs. For more information about the normal maximum path limitation, see the previous section Maximum Path Length Limitation.
Naming Files, Paths, and Namespaces - Win32 File Namespaces
See
Dots at the end of file name?
How to copy files that have too long of a filepath in Windows?

How to load hunspell dictionary in Windows path with non-ASCII characters?

Hunspell manual suggests:
In WIN32
environment, use UTF-8 encoded paths started with the long path prefix \\?\ to handle
system-independent character encoding and very long path names, too.
So I have code to do the following:
QString spell_aff = QStringLiteral(R"(\\?\%1%2.aff)").arg(path, newDict);
QString spell_dic = QStringLiteral(R"(\\?\%1%2.dic)").arg(path, newDict);
// while normally not an issue, you can't mix forward and back slashes with the prefix
spell_aff = spell_aff.replace(QChar('/'), QStringLiteral("\\"));
spell_dic = spell_dic.replace(QChar('/'), QStringLiteral("\\"));
qDebug() << "right before Hunspell_create";
mpHunspell_system = Hunspell_create(spell_aff.toUtf8().constData(), spell_dic.toUtf8().constData());
qDebug() << "right after Hunspell_create";
This prefixes \\?\ to the path, uses a consistent directory separator as documented by the note in the Microsoft documentation, and converts it to UTF-8 encoding with .toUtf8().
Yet running the code on Windows 10 Pro fails.
How to fix?
Using Qt5, MinGW 7.3.0.
I've also done due research and as far as I can see, LibreOffice does the same thing and it seemingly works for them: sspellimp.cxx, lingutil.hxx, and lingutil.cxx.
You can use GetShortPathNameW to obtain a pure-ASCII path that Hunspell will understand. See QTIFW-175 for an example.
(thanks to Windows directory that will never contain non-ASCII characters for temp file?)

Haskell Directory creates invalid symlink on Windows

This year System.Directory was updated to include createFileLink and createDirectoryLink actions, and for me on Windows 10 both work fine for relative paths.
When I use either on an absolute path (of about 50 character length, so I suppose in unicode it exceeds 260) it prepends \\?\ (i.e. "\\\\?\\") to the paths, which can be seen from DIR as follows
<SYMLINKD> source [\\?\T:\Code\hLink\binaries\dest]
<SYMLINK> source.txt [\\?\T:\Code\hLink\binaries\dest\source.txt]
The directory link works fine, but the file link doesn't do anything, it doesn't even say that the target file is missing.
When I create a file link using MKLINK without \\?\ in the absolute path it works fine as well, and when I create either link using MKLINK with \\?\ it has the same result.
Is this a Windows problem? Can I make Haskell use short path format instead? (Using Win10 so apparently I can enable long paths via registry)
Should the Windows API be passing the \\?\ header to symlinks at all?
References:
MaxPath and the meaning of \\?\, plus disabling path limitations on Win10
https://msdn.microsoft.com/en-us/library/aa365247(VS.85).aspx#maxpath
Changelog reporting the addition of \\?\ to win32 calls https://hackage.haskell.org/package/directory-1.3.1.1/changelog

What does slash dot refer to in a file path?

I'm trying to install a grunt template on my computer but I'm having issues. I realized that perhaps something different is happening because of the path given by the Grunt docs, which is
%USERPROFILE%\.grunt-init\
What does that . mean before grunt-init?
I've tried to do the whole import manually but it also isn't working
git clone https://github.com/gruntjs/grunt-init-gruntfile.git "C:\Users\Imray\AppData\Roaming\npm\grunt-init\"
I get a message:
fatal: could not create work tree dir 'C:\Users\Imray\AppData\Roaming\npm\.grunt-init"'.: Invalid argument
Does it have to do with this /.? What does it mean?
The \ (that's a backslash, not a slash) is a directory delimiter. The . is simply part of the directory name.
.grunt-init and grunt-init are two distinct names, both perfectly valid.
On Unix-like systems, file and directory names starting with . are hidden by default, which is why you'll often see such names for things like configuration files.
The . is part of a directory name. Filenames can contain . . The \ is a separator between directory names.
Typically, files or directories starting with . are considered "hidden" and/or used for storing metadata. In particular, shell wildcard expansion skips over files that start with a dot.
For example if you wrote ls -d * then it would not show any files or directories beginning with . (including . and .., the current and parent directories).
Linux hides files and directories whose names begin with a dot, unless you use the -a (for "all") option when listing directory contents. If this convention is not followed on Windows, your example is probably just a carryover.
It may well be that something behind the scenes later expects that name to match exactly. While I like things (installers, for example) to just do what I say, I realize that keeping the default value is the most tested path.
Directories starting with a dot are invisible by default on *nix systems. They are typically used for configuration files and similar in a user's home directory.
\ before " has a special meaning on Windows (it escapes the quote, so the " becomes part of the argument); the error is because Windows won't let you create a file containing " as part of its name.

Windows program files path names?

This may be a silly question, but I can't figure out how to search Google for why some code I read writes paths this way: \\progra~1
What does ~ and 1 mean?
I tried executing in Windows Run the same path but changing numbers and these are the results:
C:\progra~1 -> Opens Program Files
C:\progra~2 -> Opens Program Files (x86)
C:\progra~3 -> Opens ProgramData
C:\progra~4 -> Opens ProgramDevices, a folder I created in C:\
Why? Is this like a Match or something in the Folder names list?
For example a regex like "progra" and then to show the ~1 (First) match in some X order or ~2 (Second) ... etc?
It's a compatibility mode with the old (really old) Windows 8.3 naming convention. The ~n represents the instance of the name that has the same root characters.
In your example:
Program Files and Program Files (x86) have the same root characters Progra.
Hence one gets progra~1, the next progra~2 etc.
8.3 compatibility can be turned off for a disk partition.
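To make the "same root characters plus counter" idea concrete, here is a deliberately simplified sketch. It is not the real Windows algorithm: shortNameCandidate is a hypothetical function that only handles ASCII names without an extension and a single-digit counter, and real Windows switches to a hash-based name after a few collisions and treats invalid characters differently.

```cpp
#include <cctype>
#include <string>

// Hypothetical, simplified model of 8.3 name generation for a name
// without extension: drop spaces and dots, uppercase, keep the first
// six characters, then append ~n where n is the collision counter.
std::string shortNameCandidate(const std::string& longName, int n) {
    std::string base;
    for (unsigned char ch : longName) {
        if (ch == ' ' || ch == '.') {
            continue;  // spaces and dots are not carried into the base
        }
        if (base.size() == 6) {
            break;     // the base is truncated to six characters
        }
        base += static_cast<char>(std::toupper(ch));
    }
    return base + "~" + std::to_string(n);
}
```

With this model, "Program Files", "Program Files (x86)" and "ProgramData" all reduce to the base PROGRA, so they can only be told apart by the counter, which matches the results in the question.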
Exactly, it's a pattern counter.
Check out also this answer: What does %~d0 mean in a Windows batch file?
You can find more examples of different variables with modifiers here:
https://technet.microsoft.com/en-us/library/bb490909.aspx
(ctrl-f for "Variable substitution")

Resources