This is sample code I wrote to check whether I can create a folder whose name is longer than MAX_PATH:
wstring s = L"D:\\Test";
wstring s2 = L"\\?\D:\\datafffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffr700000000000000datafffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffr700000000000000datafffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffr700000000000000datafffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffr700000000000000datafffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffr700000000000000";
int ret = CreateDirectoryEx(s.c_str(), s2.c_str(), NULL);
int error = GetLastError();
It did not work; the returned error is ERROR_PATH_NOT_FOUND. Can anyone tell me what the problem in the code is?
Note: "D:\Test" folder is an existing folder. I am using Windows 7.
Do not confuse the maximum file name length (a single path component) with the maximum path length; see Limits.
The maximum file name length is 255 Unicode characters on all file systems,
and the maximum path length is
32,760 Unicode characters, with each path component no more than 255 characters.
The initial error was the L"\\?\" prefix. It must be written L"\\\\?\\", because C/C++ translates "\\" in a string literal to a single \. That is only a language-specific mistake, though.
Once that is fixed, you should get ERROR_INVALID_NAME (converted from NTSTATUS STATUS_OBJECT_NAME_INVALID), because the path component you use is longer than 255 characters.
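To make the per-component limit concrete, here is a minimal Python sketch (plain string handling, not Windows API code) that splits an extended-length path and reports any component over the 255-character limit quoted above. The helper name oversized_components and the constants are made up for illustration.

```python
MAX_COMPONENT = 255  # per-component limit quoted above

EXTENDED_PREFIX = "\\\\?\\"  # the four characters \\?\ once escaped correctly


def oversized_components(path):
    """Return the path components longer than MAX_COMPONENT characters."""
    # Drop the extended-length prefix so it is not treated as a component.
    if path.startswith(EXTENDED_PREFIX):
        path = path[len(EXTENDED_PREFIX):]
    return [part for part in path.split("\\") if len(part) > MAX_COMPONENT]


long_name = "data" + "f" * 300
# One component (the folder name) is too long, so the list has one entry.
print(len(oversized_components(EXTENDED_PREFIX + "D:\\" + long_name)))
# A short path has no oversized components.
print(oversized_components(EXTENDED_PREFIX + "D:\\Test"))
```

Splitting the long name into several nested folders of at most 255 characters each would pass this check, which is the distinction between component length and total path length.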
Because the syntax is simply wrong. You have to escape the backslash. So the prefix should be L"\\\\?\\".
wstring s2 = L"\\\\?\\D:\\dataff...";
Because path lengths are limited (MAX_PATH is 260 characters on Windows 7 unless the extended-length prefix is used).
We have SMB shares that are used by Windows and Mac clients. We want to move some data to Sharepoint, but need to validate the filenames against characters that are not allowed in Windows. Although Windows users wouldn't be able to create files with illegal characters anyway, Mac users are still able to create files with characters that are illegal in Windows.
The problem is that for files with illegal characters in their names, Windows/PowerShell substitutes those characters with code points from the Unicode Private Use Area. The substitute varies by input character.
$testfolder = "\\server\test\test*dir" # created from a Mac
$item = get-item -path $testfolder
$item.Name # testdir
$char = $($item.Name)[4] #
$bytes = [System.Text.Encoding]::BigEndianUnicode.GetBytes($char) # 240:33
$unicode = [System.BitConverter]::toString($bytes) # F0-21
For a file with name pipe|, the above code produces the output F0-27, so it's not simply a generic "invalid" character.
How can I check filenames for invalid values when I can't actually get the values?
As often happens, in trying to formulate my question as precisely as possible, I came upon a solution. I would still love any other answers for how this could be tackled more elegantly, but since I didn't find any other resources with this information, I'm providing my solution here in hopes it might help others with this same problem.
Invalid Characters Map to Specific Codepoints
Note: I'm extrapolating all of this from observations I've made. I'm happy for someone to comment or provide an alternative answer that is more complete or correct.
There is a certain set of characters that are invalid for Windows file names, but this is a restriction of the OS, NOT the filesystem. This means that it's possible to set a filename on an SMB share that is valid on another OS (e.g. macOS) but not on Windows. When Windows encounters such a file, the invalid characters are shadowed by a set of proxy Unicode code points, which allows Windows to interact with the files without renaming them. These code points are in the Unicode Private Use Area, which covers 0xE000-0xF8FF. Since these code points are not mapped to printable characters, PowerShell displays them all as ▯ (U+25AF). In my specific use case, I need to run a report of what invalid characters are present in a filename, so this generic placeholder character is not helpful.
Through experimentation, I was able to determine the proxy codepoints for each of the printable restricted characters. I've included them below for reference (note: YMMV on this, I haven't tested it on multiple systems, but I suspect it's consistent between versions).
Character          Unicode
"                  0xF020
*                  0xF021
/                  0xF022
<                  0xF023
>                  0xF024
?                  0xF025
\                  0xF026
|                  0xF027
(trailing space)   0xF028
: is not allowed in filenames on any system I have easy access to, so I wasn't able to test that one.
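For readers who want the same lookup outside PowerShell, here is a minimal Python sketch of the table above. It relies only on the code points I observed (they are not documented anywhere I know of), and the names PROXY_TO_REAL and restore_name are made up for illustration.

```python
# Proxy code points as observed in the table above (not from official documentation).
RESTRICTED = '"*/<>?\\|'  # maps to 0xF020 .. 0xF027, in table order
PROXY_TO_REAL = {0xF020 + i: ch for i, ch in enumerate(RESTRICTED)}
PROXY_TO_REAL[0xF028] = " "  # trailing space


def restore_name(name):
    """Replace proxy code points with the characters they stand for."""
    return name.translate(PROXY_TO_REAL)


# The two names from the question: test*dir (F0-21) and pipe| (F0-27).
print(restore_name("test\uf021dir"))
print(restore_name("pipe\uf027"))
```

The restored string can then be checked against a list of restricted characters for reporting.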
Testing names in PowerShell
Now that we know this, it's pretty simple to tackle in PowerShell. I created a hashtable with all of the proxy Unicode code points as keys and the "real" characters as values, which we can then use as a lookup table. I chose to replace the characters in the filename string before testing the name. This makes debugging easier.
# Set up regex for invalid characters
$invalid = [Regex]::new('^\s|[\"\*\:<>?\/\\\|]|\s$')
# Create lookup table for unicode values
$charmap = @{
    [char]0xF020 = '"'
    [char]0xF021 = '*'
    [char]0xF022 = '/'
    [char]0xF023 = '<'
    [char]0xF024 = '>'
    [char]0xF025 = '?'
    [char]0xF026 = '\'
    [char]0xF027 = '|'
    [char]0xF028 = ' '
}
Get-ChildItem -Path "\\path\to\folder" -Recurse | Foreach-Object {
    # Get the filename
    $fixedname = split-path -path $_.FullName -leaf
    # Iterate through the hashtable and replace all the proxy characters with printable versions
    foreach($key in $charmap.getEnumerator()){
        $fixedname = $fixedname.Replace($key.Name,$key.Value)
    }
    # Build a list of invalid characters to include in report (not shown here)
    $invalidmatches = $invalid.Matches($fixedname)
    if ($invalidmatches.count -gt 0) {
        $invalidchars = $($invalidmatches | foreach-object {
            if ($_.value -eq ' '){"Leading or trailing space"} else {$_.value}}) -join ", "
    }
}
Extending the solution
In theory, you could also extend this to cover other prohibited characters, such as the ASCII control characters. Since these proxy unicode points are in the PUA, and there is no documentation on how this is handled (as far as I know), discovering these associations is down to experimentation. I'm content to stop here, as I have run through all of the characters that are easily put in filenames by users on MacOS systems.
I created a file with square brackets called [id].go but I am unable to build it.
When I run go build "[id].go", I see the following:
can't load package: package main: invalid input file name "[id].go"
Are there restrictions on Go file names? Specifically, what is not allowed? Please provide documentation if any.
At the time of writing, Go files must begin with one of the following:
0 through 9
a through z
A through Z
. (period)
_ (underscore)
/ (forward slash)
>= utf8.RuneSelf (char 0x80 or higher)
In addition, two or more files in the same folder can't have names that are equal under a case-insensitive match.
https://github.com/golang/go/blob/993ec7f6cdaeb38b88091f42d6369d408dcb894b/src/cmd/go/internal/load/pkg.go#L1826-L1835
To be conservative, we reject almost any arg beginning with non-alphanumeric ASCII.
For example, if you use a[id].go as the file name, you should be good to go.
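The first-character rule listed above can be sketched in a few lines. This is a Python illustration of the list, not the Go source itself; the function name is_safe_first_char is made up, and rejecting an empty name is my assumption.

```python
def is_safe_first_char(name):
    """Check the first character of a file argument against the list above."""
    if not name:
        return False  # assumption: an empty name is rejected
    c = name[0]
    if ord(c) >= 0x80:  # corresponds to utf8.RuneSelf: any non-ASCII character
        return True
    # For ASCII, isalnum() covers 0-9, a-z and A-Z.
    return c.isalnum() or c in "._/"


print(is_safe_first_char("[id].go"))   # starts with '[', so rejected
print(is_safe_first_char("a[id].go"))  # starts with a letter, so accepted
```

Only the first character is checked here, which is why moving the bracket away from the start of the name is enough.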
As the title says, my current working directory contains a directory "a", which contains another directory "b". The correct path to directory "b" is "a\b" (on Windows). Since "/" is used as the "switch" character, I expected GetFileAttributesA() to return an error for the path "a/b". The documentation says nothing about any internal path separator conversion.
The question is: why does GetFileAttributesA() work with Unix path separators?
The C++ code is (C++14):
#include <windows.h>

int main()
{
    DWORD gfa1 = GetFileAttributesA("a\\b");
    DWORD gfa2 = GetFileAttributesA("a/b");
    // Both gfa1 and gfa2 are equal to FILE_ATTRIBUTE_DIRECTORY
    // I expect the gfa2 to be INVALID_FILE_ATTRIBUTES
    return 0;
}
The reason I would expect the function to fail with "a/b" is simple. I have one function that tells whether a given path is a directory on both Linux and Windows. Since it treats slashes and backslashes the same way on Windows, I am forced either to add the same behaviour on Linux (separator conversion) or to do the reverse (disallow creating directories with "/" on Windows, which this function does not support).
Many parts of Windows accept both forward and backward slashes, including nearly all the file APIs. Both slashes are reserved characters and cannot appear within a file or directory name.
I am not sure this is detailed in a central place, but for the file APIs, the Naming Files, Paths, and Namespaces document has this to say:
File I/O functions in the Windows API convert "/" to "\" as part of converting the name to an NT-style name, except when using the "\\?\" prefix as detailed in the following sections.
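As a rough illustration of the quoted rule only (the real conversion to an NT-style name does considerably more), here is a small Python sketch. The function name convert_separators is hypothetical.

```python
EXTENDED_PREFIX = "\\\\?\\"  # the \\?\ prefix, written with escaped backslashes


def convert_separators(path):
    """Mimic the documented rule: "/" becomes "\\" unless the path starts with \\\\?\\."""
    if path.startswith(EXTENDED_PREFIX):
        return path  # passed through untouched, forward slashes included
    return path.replace("/", "\\")


print(convert_separators("a/b"))
print(convert_separators(EXTENDED_PREFIX + "C:\\a/b"))
```

A cross-platform "is this a directory" helper could apply the same replacement on Windows only, or reject "/" explicitly before calling the API if stricter behaviour is wanted.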
As for:
Assuming that "/" is used as "switch" character
On the command line, any file or directory path containing a space must be quoted, so you can safely separate a path that uses forward slashes from any switches/parameters by splitting on spaces and applying the quoting rules. This is similar to how - causes no problems in file and directory names even though many programs also use it for command-line switches.
I'm reading about the NTFS attribute types and came across the $FILE_NAME attribute structure. Here it is:
Offset  Size  Description
~       ~     Standard Attribute Header
0x00    8     File reference to the parent directory.
0x08    8     C Time - File Creation
0x10    8     A Time - File Altered
0x18    8     M Time - MFT Changed
0x20    8     R Time - File Read
0x28    8     Allocated size of the file
0x30    8     Real size of the file
0x38    4     Flags, e.g. Directory, compressed, hidden
0x3c    4     Used by EAs and Reparse
0x40    1     Filename length in characters (L)
0x41    1     Filename namespace
0x42    2L    File name in Unicode (not null terminated)
What is "Filename namespace" at offset 0x41? I know a little about namespaces, I think. How can one be stored in just 1 byte? Can anyone clear this up for me? Thank you.
It describes the "traits" of a filename, i.e. length, allowable characters, etc. It is not a "string" in itself (like a C++/C#/etc. namespace).
I found a document here, though I frankly have no idea how reliable it is.
But anyway, it describes the namespaces as such (which makes it quite obvious, see chapter 13.2.):
0: POSIX
This is the largest namespace. It is case sensitive and allows all Unicode characters except for NULL (0) and Forward Slash '/'. The maximum name length is 255 characters. N.B. There are some characters, e.g. Colon ':', which are valid in NTFS, but Windows will not allow you to use.
1: Win32
Win32 is a subset of the POSIX namespace and is case insensitive. It uses all the Unicode characters, except: '"' '*' '/' ':' '<' '>' '?' '\' '|' N.B. Names cannot end with Dot '.', or Space ' '.
2: DOS
DOS is a subset of the Win32 namespace, allowing only 8 bit upper case characters, greater than Space ' ', and excluding: '"' '*' '+' ',' '/' ':' ';' '<' '=' '>' '?' '\'. N.B. Names must match the following pattern: 1 to 8 characters, then '.', then 1 to 3 characters.
3: Win32 & DOS
This namespace means that both the Win32 and the DOS filenames are identical and hence have been saved in this single filename record.
So the field can be one byte, because it just contains a number identifying the respective namespace in use.
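To show how that single byte is read, here is a minimal Python sketch that pulls the length, namespace and name out of a $FILE_NAME attribute body using the offsets from the table in the question. The function name parse_file_name is made up, the sample bytes are synthetic (not taken from a real volume), and UTF-16 little-endian encoding for the name is my assumption.

```python
import struct

# Namespace numbers as listed in the document quoted above.
NAMESPACES = {0: "POSIX", 1: "Win32", 2: "DOS", 3: "Win32 & DOS"}


def parse_file_name(body):
    """Parse the attribute content that follows the standard attribute header."""
    # Offset 0x40: name length in characters; offset 0x41: namespace number.
    length, namespace = struct.unpack_from("<BB", body, 0x40)
    # Offset 0x42: the name itself, 2 bytes per character, not null terminated.
    name = body[0x42:0x42 + 2 * length].decode("utf-16-le")
    return name, NAMESPACES.get(namespace, "unknown")


# Synthetic body: 0x40 zero bytes stand in for the parent reference,
# timestamps, sizes and flags; then length 5, namespace 1, and the name.
sample = bytes(0x40) + bytes([5, 1]) + "a.txt".encode("utf-16-le")
print(parse_file_name(sample))
```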
I am told that I can add images to a label, but when I run the following code I get an error message:
(unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape
My code is as simple as possible
from tkinter import *
root = Tk()
x = PhotoImage(file="C:\Users\user\Pictures\bee.gif")
w1 = Label(root, image=x).pack()
root.mainloop()
All the examples I've seen don't include the file path to the image but in that case Python can't find the image.
What am I doing wrong?
Python treats \U in a normal string literal as the start of a Unicode escape sequence (\U followed by eight hexadecimal digits). Since "\Users" is not a valid escape, you get the error.
You can either use forward slashes ("C:/Users/user/Pictures/bee.gif"), a raw string (r"C:\Users\user\Pictures\bee.gif"), or escape the backslashes ("C:\\Users\\user\\Pictures\\bee.gif")
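A quick check, using the path from the question, that the three spellings describe the same file. The variable names are just for illustration.

```python
forward = "C:/Users/user/Pictures/bee.gif"
raw = r"C:\Users\user\Pictures\bee.gif"
escaped = "C:\\Users\\user\\Pictures\\bee.gif"

# The raw string and the escaped string are the same sequence of characters.
print(raw == escaped)
# The forward-slash version differs only in the separator character.
print(forward.replace("/", "\\") == raw)
```

Note that without the r prefix or the doubled backslashes, "\b" in "\bee.gif" would also be read as an escape (backspace), so fixing only the "\U" part would not be enough.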