IOKit and the DiskArbitration framework can tell me a lot of things about mounted volumes on a mac, but they don't seem to be able to differentiate between HFS+ and HFS Standard volumes.
The IOKit/DA keys Content, DAVolumeKind and DAMediaContent are always Apple_HFS and hfs for both HFS Standard and HFS+ volumes.
diskutil and DiskUtility.app can tell the difference, but I they don't seem to have been open sourced by Apple.
p.s. statfs (2) does not differentiate
There are two ways to do this:
Use getattrlist() to retrieve the ATTR_VOL_SIGNATURE attribute for the mount path of the volume.
Use the Carbon call FSGetVolumeInfo() and look in the signature field of the returned struct.
The signature of a volume is a 16 bit value, usually interpreted as two ASCII characters. The signature for HFS is 'BD', HFS+ is 'H+', and case sensitive HFS+ is 'HX'.
The man page for getattrlist says the field is a u_int32, but the equivalent field in the FSVolumeInfo struct is only 16 bits, so I'm not sure which 16 bits of the 32 are filled in with the signature when using getattrlist, you'll probably have to just experiment a bit if you want to go the non-Carbon route.
getattrlist man page
HFS Plus Volume Format reference
FSGetVolumeInfo
In addition to Carbon FSGetVolumeInfo() which returns a FSVolumeInfo containing signature and filesystemID fields, there is the Cocoa -getFileSystemInfoForPath: method of class NSWorkspace which returns a string representation of filesystem type: e.g., hfs for HFS+ and msdos for DOS FAT.
The other problem you can run into should you ever try your hand at reading partition maps directly is that, historically, HFS+ volumes were nested inside HFS wrappers. This was done so that anyone trying to use an HFS+ disk with an older OS would see a single file on the drive explaining where all the rest of their data was.
Related
I have been closely following the Open ZFS development scene for OS X for the last few years. Things have progressed significantly over the last several months since the sad problems that occurred with Greenbytes, etc., but I have been pleased to see that we're finally close to getting real Spotlight support in place. I noticed this email passing by the other day from Jorgen Lundman (who has put a great deal of personal time into getting this going and contributing to the community) and thought perhaps others here might be interested in chiming in on this, his, topic regarding implementing Spotlight support for ZFS on OS X:
To summarize, I think the crux of this question boils down to this:
So then, what does "mds" use to iterate the mounted file systems? I do not
think the sources for "Spotlight-800.28" was ever released so we can't just
go look and learn, like we did for xnu, and IOkit.
It doesn't use the BSD getfsstat(), more likely it asks IOKit, and for some
reason rejects the lower mounts.
And the body of the email for convenience:
Hey guys,
So one of our long-term issues in OpenZFSonOSX is to play nice with Spotlight.
We have reached the point where everything sometimes pretends to work.
For example;
# mdfind helloworld4
/Volumes/hfs1/helloworld4.jpg
/Volumes/hfs2/helloworld4.jpg
/Volumes/zfs1/helloworld4.jpg
/Volumes/zfs2/helloworld4.jpg
Great, picks it up in our regular (control group) HFS mounted filesystems,
as well as the 2 ZFS mounts.
Mounted as:
/dev/disk2 on /Volumes/zfs1 (zfs, local, journaled)
/dev/disk2s1 on /Volumes/zfs2 (zfs, local, journaled)
# diskutil list
/dev/disk1
#: TYPE NAME SIZE IDENTIFIER
0: GUID_partition_scheme *42.9 GB disk1
1: ZFS 42.9 GB disk1s1
2: 6A945A3B-1DD2-11B2-99A6-080020736631 8.4 MB disk1s9
/dev/disk2
#: TYPE NAME SIZE IDENTIFIER
0: zfs_pool_proxy FEST *64.5 MB disk2
1: zfs_filesystem_proxy ssss 64.5 MB disk2s1
So you can see, the actual pool disk is /dev/disk1, and the fake nodes we
create for mounting as /dev/disk2*, as it appears to be required by
Spotlight to work at all. We internally also let the volumes auto-mount,
from issuing "diskutil mount -mountPoint %s %s".
We are not a VOLFS, so there is no ".vol/" directory, nor will mdutil -t
work. But these two points are true for MS-DOS as well, and that does work
with Spotlight.
We correctly reply to zfs.fsbundle's zfs.util for "-p" (volume name) and
"-k" (get uuid), done pre-flight to mounting by DA.
Using FSMegaInfo tool, we can confirm that stat, statfs, readdir, and
similar tests appear to match that of HFS.
So then, the problem.
The problem comes from mounting zfs inside zfs. Ie,
When we mount
/Volumes/hfs1/
/Volumes/hfs1/hfs2/
/Volumes/zfs1/
/Volumes/zfs1/zfs2/
# mdfind helloworld4
/Volumes/hfs1/helloworld4.jpg
/Volumes/hfs1/hfs2/helloworld4.jpg
/Volumes/zfs1/helloworld4.jpg
Absent is of course, "/Volumes/zfs1/zfs2/helloworld4.jpg".
Interestingly, this works
# mdfind -onlyin /Volumes/zfs1/zfs2/ helloworld4
/Volumes/zfs1/zfs2/helloworld4.jpg
And additionally, mounting in reverse:
/Volumes/hfs2/
/Volumes/hfs2/hfs1/
/Volumes/zfs2/
/Volumes/zfs2/zfs1/
# mdfind helloworld4
/Volumes/hfs2/helloworld4.jpg
/Volumes/hfs2/hfs1/helloworld4.jpg
/Volumes/zfs2/helloworld4.jpg
So whichever ZFS filesystem was mounted first, works, but not the second.
So the individual ZFS filesystems are both equal. It is as if it doesn't
realise the lower mount is its own device.
So then, what does "mds" use to iterate the mounted fileystems? I do not
think the sources for "Spotlight-800.28" was ever released so we can't just
go look and learn, like we did for xnu, and IOkit.
It doesn't use the BSD getfsstat(), more likely it asks IOKit, and for some
reason rejects the lower mounts.
Some observations:
# /System/Library/Filesystems/zfs.fs/zfs.util -k disk2
87F06909-B1F6-742F-7355-F0D597849138
# /System/Library/Filesystems/zfs.fs/zfs.util -k disk2s1
8F60C810-2D29-FCD5-2516-2D02EED4566B
# grep uu /Volumes/zfs1/.Spotlight-V100/VolumeConfiguration.plist
<key>uuid.87f06909-b1f6-742f-7355-f0d597849138</key>
# grep uu /Volumes/zfs1/zfs2/.Spotlight-V100/VolumeConfiguration.plist
<key>uuid.8f60c810-2d29-fcd5-2516-2d02eed4566b</key>
Any assistance is appreciated, the main issue tracking Spotlight is;
https://github.com/openzfsonosx/zfs/issues/116
The branch for it;
https://github.com/openzfsonosx/zfs/tree/issue116
vfs_getattr;
https://github.com/openzfsonosx/zfs/blob/issue116/module/zfs/zfs_vfsops.c#L2307
It appears to be down to some undocumented expectations in the vfs_vget method, to lookup entries based entirely on the inode number. Ie, stat /.vol/16777222/1102011
It is expected that vfs_vget sets the vnode_name correctly here, using a call like vnode_update_identity() or similar.
The documentation on IOCTL_MOUNTDEV_QUERY_UNIQUE_ID is a bit confusing... exactly what kind of ID should be returned in the MOUNTDEV_UNIQUE_ID structure?
The documentation for
typedef struct _MOUNTDEV_UNIQUE_ID {
USHORT UniqueIdLength;
UCHAR UniqueId[1];
} MOUNTDEV_UNIQUE_ID, *PMOUNTDEV_UNIQUE_ID;
says:
UniqueIdLength
Contains the length of unique volume ID.
UniqueId
Contains the unique volume ID. The format for unique volume names is "\??\Volume{GUID}\", where GUID is a globally unique identifier that identifies the volume.
However, there's something weird here: What should be the exact format of UniqueId? If it's meant to be in the \??\Volume{GUID}\ format, then what's the point of the UniqueIdLength field -- aren't they all the same size? Otherwise, what format does the device ID need to be in?
Furthermore, is this a device ID or a volume ID? In other words, is this supposed to be unique per medium (e.g. CD) or per device (CD drive)?
This kind of struct is pretty common in MS APIs - the UniqueID[1] variable is just a placeholder, in reality it's used as a UniqueId[UniqueIdLength] variable.
The ID is unique both per medium and per device - it depends on whether you're talking to a volume driver or a device class driver. The ID is intended to identify "something that can be mounted" - so e.g. a CD-ROM device, a fixed disk partition or an unpartitioned removable disk. The mount manager uses the ID a.o. to lookup where this particular volume was mounted before, and remount it at the same point.
From MSDN
Maybe there is misunderstanding about this structure.
I called DeviceIoControl(IOCTL_MOUNTDEV_QUERY_UNIQUE_ID) and got a string as the similar format to Device Interface Path, but it is just different of the prefix 4 characters, and then it saved in registry \HKLM\SYSTEM\MountedDevices.
MOUNTDEV_UNIQUE_ID is acquired upon a volume arrival notification where mountmgr!MountMgrMountedDeviceArrival invokes mountmgr!QueryDeviceInformation, which sends a IOCTL_MOUNTDEV_QUERY_UNIQUE_ID IRP to the volume PDO stack, which volmgr picks up, and I'm not sure what routine it is but in XP's ftdisk it was ftdisk!FtpQueryUniqueIdBuffer that determined whether to set the UniqueID member to a GPT partition GUID, a MBR signature + offset, or the symbolic link like STORAGE#RemovableMedia.... The symbolic link is based on the name of devnode that the volume PDO is part of, and the symbolic link was generated by IoRegisterDeviceInterface, which was then stored in the volume extension before alerting mountmgr of the volume arrival in the first place (alerting is done by IoSetDeviceInterfaceState Enable). On Windows 7 the volume PDO devnode name is STORAGE\Volume\_??_USBSTOR#Disk&Ven_SanDisk&Prod_Cruzer_B lade&Rev_1.27#4C530399920812105355&0#{53f5630d-b6bf-11d0-94f2-00a0c91efb8b}, the symbolic link is STORAGE#Volume#_??_USBSTOR#Disk&Ven_SanDisk&Prod_Cruzer_Blade&Rev_1.27#4C530399920812105355&0#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}#{53f5630d-b6bf-11d0-94f2-00a0c91efb8b} but the MountedDevices data is _??_USBSTOR#Disk&Ven_SanDisk&Prod_Cruzer_Blade&Rev_1.27#4C530399920812105355&0#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}#{53f5630d-b6bf-11d0-94f2-00a0c91efb8b}. The symbolic link that is created is always to the volume PDO name, because the volume PDO is supplied to the call, which is \Device\HarddiskVolumeX.
FtpQueryUniqueIdBuffer uses the MBR signature and partition offset if it's an MBR disk, and uses the GPT partition GUID if it's a GPT disk, and uses the symbolic link if it's neither, which tends to be a regular USB drive mass storage volume that doesn't have a boot sector, and ftdisk considers a disk without a boot sector to be a 'superfloppy', so it looks for that flag on the volume extension. So that's how unique it is, MBR signature and GPT GUID uniqueness speak for themselves, but the symlink doesn't so I'll elaborate: it contains the DIID of the USBSTOR device, which includes the USB serial number, or if it doesn't have one, a system wide unique number determine according to the following scheme.
Mountmgr creates further symbolic links between the drive letter and volume device name, and the volume GUID and the volume device name, and then puts them in the MountedDevices database but uses the unique ID instead of the volume device name. The volume GUID \??\Volume{GUID}\ it generates using ExUuidCreate. IOCTL_MOUNTMGR_QUERY_POINTS shows each symbolic link for a mounted device, so it will show \DosDevices\C: -> \Device\HarddiskVolumeX and \??\Volume{GUID}\ -> \Device\HarddiskVolumeX and the unique ID of the mounted device and the name of the mounted device. It does not however show the symbolic link to \Device\HarddiskVolumeX created by IoRegisterDeviceInterface because the symbolic link wasn't created by mount manager, so it doesn't know about it.
I needed to recover the partition table I deleted accidentally. I used an application named TestDisk. Its simply mind blowing. I reads each cylinder from the disk. I've seen similar such applications which work with MBR & partitioning.
I'm curious.
How do they read
clusters/cylinders/sectors from the
disk? Is there some kind of API for this?
Is it again OS dependent? If so whats the way to for Linux & for windows?
EDIT:
Well, I'm not just curious I want a hands on experience. I want to write a simple application which displays each LBA.
Cylinders and sectors (wiki explanation) are largely obsoleted by the newer LBA (logical block addressing) scheme for addressing drives.
If you're curious about the history, use the Wikipedia article as a starting point. If you're just wondering how it works now, code is expected to simply use the LBA address (which works largely the same way as a file does - a linear array of bytes arranged in blocks)
It's easy due to the magic of *nix special device files. You can open and read /dev/sda the same way you'd read any other file.
Just use open, lseek, read, write (or pread, pwrite). If you want to make sure you're physically fetching data from a drive and not from kernel buffers you can open with the flag O_DIRECT (though you must perform aligned reads/writes of 512 byte chunks for this to work).
For *nix, there have been already answers (/dev directory); for Windows, there are the special objects \\.\PhisicalDriveX, with X as the number of the drive, which can be opened using the normal CreateFile API. To actually perform reads or writes you have then to use the DeviceIoControl function.
More info can be found in "Physical Disks and Volumes" section of the CreateFile API documentation.
I'm the OP. I'm combining Eric Seppanen's & Matteo Italia's answers to make it complete.
*NIX Platforms:
It's easy due to the magic of *nix special device files. You can open and read /dev/sda the same way you'd read any other file.
Just use open, lseek, read, write (or pread, pwrite). If you want to make sure you're physically fetching data from a drive and not from kernel buffers you can open with the flag O_DIRECT (though you must perform aligned reads/writes of 512 byte chunks for this to work).
Windows Platform
For Windows, there are the special objects \\.\PhisicalDriveX, with X as the number of the drive, which can be opened using the normal CreateFile API. To perform reads or writes simply call ReadFile and WriteFile (buffer must be aligned on sector size).
More info can be found in "Physical Disks and Volumes" section of the CreateFile API documentation.
Alternatively you can also you DeviceIoControl function which sends a control code directly to a specified device driver, causing the corresponding device to perform the corresponding operation.
On linux, as root, you can save your MBR like this (Assuming you drive is /dev/sda):
dd if=/dev/sda of=mbr bs=512 count=1
If you wanted to read 1Mb from you drive, starting at the 10th MB:
dd if=/dev/sda of=1Mb bs=1Mb count=1 skip=10
I'm trying to figure out the available disk space programmatically in windows. For this, I need to first get a list of the available drives, then check which of those are local drives and then query the available bytes on each local drive.
I'm a bit stuck on the first part, where the API presents two functions:
GetLogicalDrives (http://msdn.microsoft.com/en-us/library/aa364972(VS.85).aspx) which gives you a DWORD with the bits set (bit 0 if drive A is present, bit 1 if drive B etc)
GetLogicalDriveStrings (http://msdn.microsoft.com/en-us/library/aa364975(VS.85).aspx) which gives you the actual strings.
Now, although I'll be using strings later on, I'd prefer using the first option for querying. However, on my system a DWORD is typedef-ed to "unsigned long", which is 4 bytes, whereas drive letters only range A-Z (26 - i think - characters). Obviously, one can define more than 26 drives on their system (however unlikely they are to do so) - so I was wondering if there was any convention for those drives. Can someone point me to a resource on this?
Thanks.
DWORD is always 4 bytes, regardless of the system (it's a Win32 type).
The maximum for drive letters in Windows is 26. Because English alphabet has only 26 letters :). However, Windows allows two ways to mount a volume:
to a drive letter
to a directory (on an NTFS volume).
You can mount one volume to multiple locations (but no more than one drive letter, IIRC). A GUI for this task is presented by Control Panel -> Administrative Tools -> Computer Management -> Disk Management.
If you want to have more than 26 drives with the additional drives being redirects to already active drives and are okay with them not working properly in most programs, then you can assign more with the following method (be warned they won't even show up in the file explorer):
subst ♪: C:\Temp\
cd /D ♪:\
and to delete them (also they aren't preserved through restarts):
subst /D ♪:
You can enumerate all volumes and their mount points as described in this article.
You could use WMI. The following WMI query should list all drives:
SELECT * FROM Win32_DiskDrive
It it not sufficient to enumerate MS-DOS drives (there can be at most 26 of them, by the way, although each can be bound twice, once globally and once locally in your session), a volume can, for example, be mounted to a directory. What you want is probably to enumerate all volumes in the system, using FindFirstVolume et al. Take a look at the associated MSDN example.
I have come up against this problem a few times at inopportune moments:
Trying to work on open source Java projects with deep paths
Storing deep Fitnesse wiki trees in source control
An error trying to use Bazaar to import my source control tree
Why does this limit exist?
Why hasn't it been removed yet?
How do you cope with the path limit?
And no, switching to Linux or Mac OS X is not a valid answer to this question ;)
Quoting this article https://learn.microsoft.com/en-us/windows/desktop/FileIO/naming-a-file#maximum-path-length-limitation
Maximum Path Length Limitation
In the Windows API (with some exceptions discussed in the following paragraphs), the maximum length for a path is MAX_PATH, which is defined as 260 characters. A local path is structured in the following order: drive letter, colon, backslash, name components separated by backslashes, and a terminating null character. For example, the maximum path on drive D is "D:\some 256-character path string<NUL>" where "<NUL>" represents the invisible terminating null character for the current system codepage. (The characters < > are used here for visual clarity and cannot be part of a valid path string.)
Now we see that it is 1+2+256+1 or [drive][:\][path][null] = 260. One could assume that 256 is a reasonable fixed string length from the DOS days. And going back to the DOS APIs we realize that the system tracked the current path per drive, and we have 26 (32 with symbols) maximum drives (and current directories).
The INT 0x21 AH=0x47 says “This function returns the path description without the drive letter and the initial backslash.” So we see that the system stores the CWD as a pair (drive, path) and you ask for the path by specifying the drive (1=A, 2=B, …), if you specify a 0 then it assumes the path for the drive returned by INT 0x21 AH=0x15 AL=0x19. So now we know why it is 260 and not 256, because those 4 bytes are not stored in the path string.
Why a 256 byte path string, because 640K is enough RAM.
This is not strictly true as the NTFS filesystem supports paths up to 32k characters. You can use the win32 api and "\\?\" prefix the path to use greater than 260 characters.
A detailed explanation of long path from the .Net BCL team blog.
A small excerpt highlights the issue with long paths
Another concern is inconsistent behavior that would result by exposing long path support. Long paths with the \\?\ prefix can be used in most of the file-related Windows APIs, but not all Windows APIs. For example, LoadLibrary, which maps a module into the address of the calling process, fails if the file name is longer than MAX_PATH. So this means MoveFile will let you move a DLL to a location such that its path is longer than 260 characters, but when you try to load the DLL, it would fail. There are similar examples throughout the Windows APIs; some workarounds exist, but they are on a case-by-case basis.
The question is why does the limitation still exist. Surely modern Windows can increase the side of MAX_PATH to allow longer paths. Why has the limitation not been removed?
The reason it cannot be removed is that Windows promised it would never change.
Through API contract, Windows has guaranteed all applications that the standard file APIs will never return a path longer than 260 characters.
Consider the following correct code:
WIN32_FIND_DATA findData;
FindFirstFile("C:\Contoso\*", ref findData);
Windows guaranteed my program that it would populate my WIN32_FIND_DATA structure:
WIN32_FIND_DATA {
DWORD dwFileAttributes;
FILETIME ftCreationTime;
FILETIME ftLastAccessTime;
FILETIME ftLastWriteTime;
//...
TCHAR cFileName[MAX_PATH];
//..
}
My application didn't declare the value of the constant MAX_PATH, the Windows API did. My application used that defined value.
My structure is correctly defined, and only allocates 592 bytes total. That means that i am only able to receive a filename that is less than 260 characters. Windows promised me that if i wrote my application correctly, my application would continue to work in the future.
If Windows were to allow filenames longer than 260 characters then my existing application (which used the correct API correctly) would fail.
For anyone calling for Microsoft to change the MAX_PATH constant, they first need to ensure that no existing application fails. For example, i still own and use a Windows application that was written to run on Windows 3.11. It still runs on 64-bit Windows 10. That is what backwards compatibility gets you.
Microsoft did create a way to use the full 32,768 path names; but they had to create a new API contract to do it. For one, you should use the Shell API to enumerate files (as not all files exist on a hard drive or network share).
But they also have to not break existing user applications. The vast majority of applications do not use the shell api for file work. Everyone just calls FindFirstFile/FindNextFile and calls it a day.
From Windows 10. you can remove the limitation by modifying a registry key.
Tip Starting in Windows 10, version 1607, MAX_PATH limitations have been removed from common Win32 file and directory functions. However, you must opt-in to the new behavior.
A registry key allows you to enable or disable the new long path behavior. To enable long path behavior set the registry key at HKLM\SYSTEM\CurrentControlSet\Control\FileSystem LongPathsEnabled (Type: REG_DWORD). The key's value will be cached by the system (per process) after the first call to an affected Win32 file or directory function (list follows). The registry key will not be reloaded during the lifetime of the process. In order for all apps on the system to recognize the value of the key, a reboot might be required because some processes may have started before the key was set.
The registry key can also be controlled via Group Policy at Computer Configuration > Administrative Templates > System > Filesystem > Enable NTFS long paths.
You can also enable the new long path behavior per app via the manifest:
<application xmlns="urn:schemas-microsoft-com:asm.v3">
<windowsSettings xmlns:ws2="http://schemas.microsoft.com/SMI/2016/WindowsSettings">
<ws2:longPathAware>true</ws2:longPathAware>
</windowsSettings>
</application>
You can mount a folder as a drive. From the command line, if you have a path C:\path\to\long\folder you can map it to drive letter X: using:
subst x: \path\to\long\folder
One way to cope with the path limit is to shorten path entries with symbolic links.
For example:
create a C:\p directory to keep short links to long paths
mklink /J C:\p\foo C:\Some\Crazy\Long\Path\foo
add C:\p\foo to your path instead of the long path
You can enable long path names using PowerShell:
Set-ItemProperty -Path 'HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem' -Name LongPathsEnabled -Type DWord -Value 1
Another version is to use a Group Policy in Computer Configuration/Administrative Templates/System/Filesystem:
As to why this still exists - MS doesn't consider it a priority, and values backwards compatibility over advancing their OS (at least in this instance).
A workaround I use is to use the "short names" for the directories in the path, instead of their standard, human-readable versions. So e.g. for C:\Program Files\ I would use C:\PROGRA~1\ You can find the short name equivalents using dir /x.
As to how to cope with the path size limitation on Windows - using 7zip to pack (and unpack) your path-length sensitive files seems like a viable workaround. I've used it to transport several IDE installations (those Eclipse plugin paths, yikes!) and piles of autogenerated documentation and haven't had a single problem so far.
Not really sure how it evades the 260 char limit set by Windows (from a technical PoV), but hey, it works!
More details on their SourceForge page here:
"NTFS can actually support pathnames up to 32,000 characters in
length."
7-zip also support such long names.
But it's disabled in SFX code. Some users don't like long paths, since
they don't understand how to work with them. That is why I have
disabled it in SFX code.
and release notes:
9.32 alpha 2013-12-01
Improved support for file pathnames longer than 260 characters.
4.44 beta 2007-01-20
7-Zip now supports file pathnames longer than 260 characters.
IMPORTANT NOTE: For this to work properly, you'll need to specify the destination path in the 7zip "Extract" dialog directly, rather than dragging & dropping the files into the intended folder. Otherwise the "Temp" folder will be used as an interim cache and you'll bounce into the same 260 char limitation once Windows Explorer starts moving the files to their "final resting place". See the replies to this question for more information.
It does, and it is a default for some reason, but you could easily override it with this registry key:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem]
"LongPathsEnabled"=dword:00000001
See: https://blogs.msdn.microsoft.com/jeremykuhne/2016/07/30/net-4-6-2-and-long-paths-on-windows-10/
Another way to cope with it is to use Cygwin, depending on what do you want to do with the files (i.e. if Cygwin commands suit your needs)
For example it allows to copy, move or rename files that even Windows Explorer can't. Or of course deal with the contents of them like md5sum, grep, gzip, etc.
Also for programs that you are coding, you could link them to the Cygwin DLL and it would enable them to use long paths (I haven't tested this though)