I'm working on a personal project in which I tried to solve a problem by canonicalizing a relative path in Rust. However, whenever I do so, the new path gets prefixed with a strange \\?\ sequence. For example, something as simple as:
use std::fs;
let p = fs::canonicalize(".").unwrap();
println!("{}", p.display());
will result in something like the following output:
\\?\C:\Users\[...]\rustprojects\projectname
This isn't a particular problem because I can accomplish what I'm attempting in other ways. However, it seems like odd behavior, especially if you are going to use the string form of the path in some way that requires accuracy. Why is this sequence of characters prepended to the result, and how can I avoid it?
The \\?\ prefix tells Windows to treat the path as is: it disables the special meaning of . and .., special device names like CON are not interpreted, and the path is assumed to be absolute. It also enables paths of up to 32,767 characters (UTF-16 code units), whereas otherwise the limit is 260 (unless you're on Windows 10, version 1607 or later, and your application opts in to longer paths).
Therefore, the \\?\ prefix ensures that you'll get a usable path; removing that prefix may yield a path that is unusable or that resolves to a different file! As such, I would recommend that you keep that prefix in your paths.
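If you still need the short form for display or for a tool that rejects the prefix, one cautious approach is to drop it only when the shortened path should mean the same thing. The following is a minimal sketch under that assumption, not a definitive implementation: `simplify_verbatim`, `is_plain_component` and `is_reserved_device` are hypothetical helper names, the checks are deliberately conservative, and the function works on strings only, so it never touches the filesystem.

```rust
/// True for names such as CON, NUL, COM1 or LPT9, which Windows treats as devices.
fn is_reserved_device(stem: &str) -> bool {
    let upper = stem.to_ascii_uppercase();
    match upper.as_str() {
        "CON" | "PRN" | "AUX" | "NUL" => true,
        _ => {
            let b = upper.as_bytes();
            b.len() == 4
                && (upper.starts_with("COM") || upper.starts_with("LPT"))
                && (b'1'..=b'9').contains(&b[3])
        }
    }
}

/// True when a component should mean the same thing with and without `\\?\`.
/// Rejects empty components, "." and "..", trailing dots or spaces, and device names.
fn is_plain_component(component: &str) -> bool {
    let stem = component.split('.').next().unwrap_or("").trim_end();
    !component.is_empty()
        && !component.ends_with('.')
        && !component.ends_with(' ')
        && !is_reserved_device(stem)
}

/// Returns `path` without its `\\?\` prefix when that looks safe,
/// otherwise returns `path` unchanged.
fn simplify_verbatim(path: &str) -> &str {
    let rest = match path.strip_prefix(r"\\?\") {
        Some(rest) => rest,
        None => return path, // no prefix, nothing to do
    };
    let b = rest.as_bytes();
    // Only plain drive paths such as C:\dir qualify; \\?\UNC\... is left alone.
    let is_drive_path =
        b.len() >= 3 && b[0].is_ascii_alphabetic() && b[1] == b':' && b[2] == b'\\';
    // Without the prefix the path must fit in MAX_PATH (260, including the NUL).
    if !is_drive_path || rest.contains('/') || rest.encode_utf16().count() >= 260 {
        return path;
    }
    let tail = &rest[3..];
    if tail.is_empty() || tail.split('\\').all(is_plain_component) {
        rest
    } else {
        path
    }
}
```

Calling `simplify_verbatim(r"\\?\C:\Users\me\project")` yields `C:\Users\me\project`, while a path ending in `NUL`, containing `..`, or starting with `\\?\UNC\` comes back unchanged.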
My regexp behaves just like I want it to on http://regexr.com, but not like I want it in irb.
I'm trying to make a regular expression that will match the following:
A forward slash,
then 2 * any number of random characters (i.e. `.*`),
up to but not including another /
OR the end of the string (whichever comes first)
I'm sorry as that was probably unclear, but it's my best attempt at an English translation.
Here's my current attempt and hopefully that will give you a better idea of what I'm trying to do:
/(\/.*?(?=\/|$)){2}/
The usage scenario is I want to be able to take a path like /foo/bar/baz/bin/bash and shorten it to the level I'm at in the filesystem, in this case the second level (/foo/bar). I'm trying to do this using the command path.scan(-regex-).shift.
The usage scenario is I want to be able to take a path like /foo/bar/baz/bin/bash and shorten it to the level I'm at in the filesystem, in this case the second level (/foo/bar)
Ruby already has a class for handling paths, Pathname. You can use Pathname#relative_path_from to do what you want.
require 'pathname'
path = Pathname.new("/foo/bar/baz/bin/bash")
# Normally you'd use Pathname.getwd
cwd = Pathname.new("/foo/bar")
# baz/bin/bash
puts path.relative_path_from(cwd)
Regexes just invite problems, like assuming the path separator is /, not honoring escapes, and not dealing with extra /. For example, "//foo/bar//b\\/az/bin/bash". // is particularly common in code which joins together directories using paths.join("/") or "#{dir}/#{file}".
For completeness, the general way you match a single piece of a path is this.
%r{^(/[^/]+)}
That's the beginning of the string, a /, then 1 or more characters which are not /. Using [^/]+ means you don't have to try and match an optional / or end of string, a very useful technique. Using %r{} means less leaning toothpicks.
But this is only applicable to a canonicalized path. It will fail on //foo//b\\/ar/. You can try to fix up the regex to deal with that, or do your own canonicalization, but just use Pathname.
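For the original goal of cutting a path down to its first two levels, the same "let a path library do the splitting" idea can be sketched in Rust, the language of the first question in this thread. This is only an illustration under some assumptions: `truncate_to_depth` is a hypothetical helper, the paths are Unix-style, and nothing is resolved against the filesystem, so `..` components are passed through without being counted.

```rust
use std::path::{Component, Path, PathBuf};

/// Keeps the root (if any) plus the first `depth` named components of `path`.
/// `Path::components` already skips repeated separators such as `//`.
fn truncate_to_depth(path: &Path, depth: usize) -> PathBuf {
    let mut out = PathBuf::new();
    let mut kept = 0;
    for component in path.components() {
        // Only ordinary names count toward the depth; the root does not.
        if let Component::Normal(_) = component {
            if kept == depth {
                break;
            }
            kept += 1;
        }
        out.push(component.as_os_str());
    }
    out
}
```

With this sketch, `truncate_to_depth(Path::new("/foo/bar/baz/bin/bash"), 2)` gives `/foo/bar`, and so does the messier input `//foo//bar//baz`.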
File.directory?(ENV["HOME"])
returns false because ENV["HOME"] contains a path with Russian words, like:
c:/Users/Администратор
How do I solve it?
You seem to be on Windows. Windows has the oddity that upper- and lowercase file names are not distinguished (i.e. Windows remembers the case of the letters when the entry is created, but it ignores case when a filename is used). Ruby tries to mimic this, but I don't know whether it is able to do this "case-insensitivity" with Cyrillic characters. Could it be that the directory was created with a different spelling (with regard to upper/lower case) than the one stored in the ENV hash?
I would proceed as follows: From an irb shell, do a
Dir['c:/Users/*']
You should see the entries in the "correct" spelling. Does it exactly match the content of ENV['HOME']? If you use copy and paste from this output, and ask (again in the irb shell) for File.directory?('....'), does it evaluate to true then?
I am trying to use Windows API functions compatible with Windows XP and up to find the target of a junction or symbolic link. I am using CreateFile to get a handle to the reparse point, then DeviceIoControl with the FSCTL_GET_REPARSE_POINT control code to read the reparse data into a REPARSE_DATA_BUFFER. Then, I use the offsets and lengths in the buffer to extract the SubstituteName and PrintName strings.
In Windows 8, extracting the PrintName works perfectly, giving me a normal path (i.e. c:\filename.ext), but in XP the PrintName section of the REPARSE_DATA_BUFFER seems to always have a length of 0, leaving me with an empty string.
Using the SubstituteName seems to work in both, but I always end up with a prefix of \??\ at the beginning of the file path (i.e. \??\c:\filename.ext). (As a side note, fsutil reparsepoint query shows the \??\ prefix as well.)
I've read through much of the documentation on MSDN, but I can't find any explanation of this prefix. If the prefix is guaranteed to begin every SubstituteName, then I can just exclude the first four characters when I copy the file path from the buffer, but I'm not sure that this is the case. I would love to know if the "\??\" prefix appears in the SubstituteName for all Microsoft reparse points and why.
The Windows kernel has a "DOS Devices namespace" \DosDevices\ which is basically where anything you can open with CreateFile resides. (QueryDosDevice is a function which gives you all the members of that namespace.)
Because it's such a commonly used path, \??\ also redirects to that namespace. So, to the kernel, the path C:\Windows is invalid -- it should really be written as something like \??\C:\Windows. That's where this notation comes from.
The \??\ prefix means the path is not parsed. It is not guaranteed on every name, so you will have to look for the prefix on a per-name basis and skip it if present.
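The "check per name and skip if present" advice can be sketched as follows. This is a minimal illustration in Rust rather than the asker's language: `substitute_name_to_path` is a hypothetical helper, and it assumes the SubstituteName has already been copied out of the REPARSE_DATA_BUFFER as a slice of UTF-16 code units.

```rust
/// Decodes a SubstituteName and drops a leading `\??\` only if it is there.
/// Names without the prefix are returned as decoded.
fn substitute_name_to_path(raw: &[u16]) -> String {
    // Reparse data stores names as UTF-16; invalid units become U+FFFD.
    let name = String::from_utf16_lossy(raw);
    let stripped = name.strip_prefix(r"\??\").map(str::to_owned);
    stripped.unwrap_or(name)
}
```

Checking for the prefix instead of blindly skipping four characters means a name that lacks it is left intact.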
Update: I could not find any definitive documentation explaining exactly what \??\ actually represents, but here are some links that mention the \??\ prefix in action:
http://www.flexhex.com/docs/articles/hard-links.phtml
Note that szTarget string must contain the path prefixed with the "non-parsed" prefix "\??\", and terminated with the backslash character, for example "\??\C:\Some Dir\".
http://social.msdn.microsoft.com/Forums/en-US/vbgeneral/thread/908b3927-1ee9-4e03-9922-b4fd49fc51a6
http://mjunction.googlecode.com/svn-history/r5/trunk/MJunction/MJunction/JunctionPoint.cs
This prefix indicates to NTFS that the path is to be treated as a non-interpreted path in the virtual file system.
Private Const NonInterpretedPathPrefix As String = "\??\"
I deployed a CGI DLL built with Delphi 2007 on a Windows 2008 server. Internally I need to use the current DLL path.
Normally I can use GetModuleFileName or GetModuleName, but on the server they both return:
\\?\c:\my\correct\path
Why the first 4 characters? It looks like a network path? Is there any way to exclude those first 4 characters?
The pertinent documentation is this:
Maximum Path Length Limitation
In the Windows API (with some exceptions discussed in the following
paragraphs), the maximum length for a path is MAX_PATH, which is
defined as 260 characters. A local path is structured in the following
order: drive letter, colon, backslash, name components separated by
backslashes, and a terminating null character. For example, the
maximum path on drive D is "D:\some 256-character path string<NUL>"
where "<NUL>" represents the invisible terminating null character for
the current system codepage. (The characters < > are used here for
visual clarity and cannot be part of a valid path string.)
Note: File I/O functions in the Windows API convert "/" to "\" as part
of converting the name to an NT-style name, except when using the
"\\?\" prefix as detailed in the following sections.
The Windows API has many functions that also have Unicode versions to
permit an extended-length path for a maximum total path length of
32,767 characters. This type of path is composed of components
separated by backslashes, each up to the value returned in the
lpMaximumComponentLength parameter of the GetVolumeInformation
function (this value is commonly 255 characters). To specify an
extended-length path, use the "\\?\" prefix. For example, "\\?\D:\very
long path".
Note: The maximum path of 32,767 characters is approximate, because
the "\\?\" prefix may be expanded to a longer string by the system at
run time, and this expansion applies to the total length.
The "\\?\" prefix can also be used with paths constructed according to
the universal naming convention (UNC). To specify such a path using
UNC, use the "\\?\UNC\" prefix. For example, "\\?\UNC\server\share",
where "server" is the name of the computer and "share" is the name of
the shared folder. These prefixes are not used as part of the path
itself. They indicate that the path should be passed to the system
with minimal modification, which means that you cannot use forward
slashes to represent path separators, or a period to represent the
current directory, or double dots to represent the parent directory.
Because you cannot use the "\\?\" prefix with a relative path,
relative paths are always limited to a total of MAX_PATH characters.
As long as you are calling Unicode versions of Windows API functions, there's no need to strip the "\\?\" prefix, because the path that you have been handed is a valid path.
As we discovered in the comments, you were calling an ANSI version of an API function. And when you do that, the "\\?\" prefix is not valid. So, stick to Unicode API functions and it's all good!
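The prefix rules in the quoted documentation can be sketched as a small conversion routine. This is an illustration in Rust rather than Delphi, working on strings only: `to_extended_length` is a hypothetical helper, and returning `None` for relative paths and for paths containing `.` or `..` is a design choice of this sketch, since a prefixed path is passed on without normalization.

```rust
/// Builds the extended-length form of an absolute Windows path, or returns
/// `None` when the path is relative or contains "." / ".." components.
fn to_extended_length(path: &str) -> Option<String> {
    // Already prefixed: hand it back untouched.
    if path.starts_with(r"\\?\") {
        return Some(path.to_string());
    }
    // Forward slashes are not separators once the prefix is present.
    let path = path.replace('/', r"\");
    if path.split('\\').any(|c| c == "." || c == "..") {
        return None;
    }
    // UNC path: \\server\share becomes \\?\UNC\server\share
    if let Some(unc) = path.strip_prefix(r"\\") {
        return Some(format!(r"\\?\UNC\{}", unc));
    }
    // Drive path: D:\dir becomes \\?\D:\dir
    let b = path.as_bytes();
    if b.len() >= 3 && b[0].is_ascii_alphabetic() && b[1] == b':' && b[2] == b'\\' {
        return Some(format!(r"\\?\{}", path));
    }
    None
}
```

For example, `to_extended_length(r"\\server\share")` gives `\\?\UNC\server\share`, and a relative path such as `relative\path` gives `None`. The result is only meaningful when it is passed to the Unicode versions of the API functions.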
Is there an API in Windows that retrieves the server name from a UNC path (\\server\share)?
Or do I need to write my own?
I found PathStripToRoot but it doesn't do the trick.
I don't know of a Win32 API for parsing a UNC path; however you should check for:
\\computername\share
\\?\UNC\computername\share (people use this to access long paths > 260 chars)
You can optionally also handle this case: smb://computername/share and this case hostname:/directorypath/resource
Read here for more information
This is untested, but maybe a combination of PathIsUNC() and PathFindNextComponent() would do the trick.
I don't know if there is a specific API for this. I would just implement the simple string handling on my own (skip past "\\" or return null, then look for the next \ or the end of the string and return that substring), possibly calling PathIsUNC() first.
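The string handling described above can be sketched in a few lines. This is a hedged illustration in Rust: `unc_server` is a hypothetical helper, it covers only the `\\computername\share` and `\\?\UNC\computername\share` forms mentioned earlier, and it matches the `UNC` marker case-sensitively.

```rust
/// Returns the server part of a UNC path, or `None` if `path` is not UNC.
fn unc_server(path: &str) -> Option<&str> {
    // Try the long-path form first, then the plain `\\` form.
    let rest = path
        .strip_prefix(r"\\?\UNC\")
        .or_else(|| path.strip_prefix(r"\\"))?;
    // The server name runs up to the next backslash or the end of the string.
    let server = rest.split('\\').next()?;
    // "?" and "." mean `\\?\C:\...` or `\\.\device`, which are not UNC paths.
    if server.is_empty() || server == "?" || server == "." {
        None
    } else {
        Some(server)
    }
}
```

With this sketch, both `\\server\share` and `\\?\UNC\server\share` yield `server`, while `\\?\C:\local\path` and `C:\local\path` yield `None`.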
If you'll be receiving the data as plain text, you should be able to parse it with a simple regex. I'm not sure what language you use, but I tend to use Perl for quick searches like this. Supposing you have a large document with one path per line, you can search on the \\'s, i.e.
m/\\\\([0-9][0-9][0-9]\. (repeat three times; I don't recall the exact IP address requirements, so you will probably need to modify the first one), then \\)? to make it optional and include the trailing slash, and finally (.*)\\/ig. It's rough but should do the trick, and the path name should be in $2 for use!
I hope that was clear enough!