How do they read clusters/cylinders/sectors from the disk? - windows

I needed to recover a partition table I had deleted accidentally, so I used an application named TestDisk. It's simply mind-blowing: it reads each cylinder from the disk. I've seen similar applications that work with the MBR and partitioning.
I'm curious.
How do they read clusters/cylinders/sectors from the disk? Is there some kind of API for this?
Is it OS dependent? If so, what is the way to do it on Linux and on Windows?
EDIT:
Well, I'm not just curious; I want hands-on experience. I want to write a simple application that displays each LBA.

Cylinders and sectors (wiki explanation) are largely obsoleted by the newer LBA (logical block addressing) scheme for addressing drives.
If you're curious about the history, use the Wikipedia article as a starting point. If you're just wondering how it works now, code is expected to simply use the LBA address (which works largely the same way as a file does - a linear array of bytes arranged in blocks)
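To make the "linear array of blocks" picture concrete, here is a minimal Python sketch of the standard CHS-to-LBA conversion and of the byte offset an LBA corresponds to. The function names are made up for illustration, and the geometry (16 heads, 63 sectors per track) and the 512-byte sector size are assumptions; a real drive reports its own values.

```python
SECTOR_SIZE = 512  # assumed logical sector size; some drives use 4096


def chs_to_lba(cylinder, head, sector, heads_per_cylinder=16, sectors_per_track=63):
    """Convert a legacy CHS address to an LBA (CHS sector numbers are 1-based)."""
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)


def lba_to_byte_offset(lba, sector_size=SECTOR_SIZE):
    """Byte offset of a block when the whole disk is viewed as one flat file."""
    return lba * sector_size
```

With these assumptions, the very first sector (cylinder 0, head 0, sector 1) is LBA 0, and LBA 2048 starts exactly 1 MiB into the disk.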

It's easy due to the magic of *nix special device files. You can open and read /dev/sda the same way you'd read any other file.
Just use open, lseek, read, write (or pread, pwrite). If you want to make sure you're fetching data from the drive and not from kernel buffers, you can open with the flag O_DIRECT. For this to work, the buffer address, the file offset and the transfer length must all be aligned to the device's logical block size (typically 512 bytes; 4096 on some drives).
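As a rough illustration, the same calls are exposed by Python's os module on *nix. The sketch below reads whole sectors by LBA with pread. The helper name read_lba and the 512-byte sector size are assumptions, and it uses ordinary buffered access rather than O_DIRECT, which would additionally need an aligned buffer.

```python
import os

SECTOR_SIZE = 512  # assumed logical block size


def read_lba(path, lba, count=1, sector_size=SECTOR_SIZE):
    """Read `count` sectors starting at block `lba` from a device or image file."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # pread takes an absolute byte offset, so no separate lseek is needed.
        return os.pread(fd, count * sector_size, lba * sector_size)
    finally:
        os.close(fd)
```

Run as root, read_lba("/dev/sda", 0) would return the first sector of the drive; the same function works unchanged on a disk image file, which is a safer place to experiment.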

For *nix, the other answers already cover it (the /dev directory). For Windows, there are the special objects \\.\PhysicalDriveX, where X is the number of the drive, which can be opened using the normal CreateFile API. Reads and writes then go through ReadFile and WriteFile, using sector-aligned offsets and sizes; DeviceIoControl is only needed for control operations such as querying the drive geometry.
More info can be found in the "Physical Disks and Volumes" section of the CreateFile API documentation.

I'm the OP. I'm combining Eric Seppanen's & Matteo Italia's answers to make it complete.
*NIX Platforms:
It's easy due to the magic of *nix special device files. You can open and read /dev/sda the same way you'd read any other file.
Just use open, lseek, read, write (or pread, pwrite). If you want to make sure you're fetching data from the drive and not from kernel buffers, you can open with the flag O_DIRECT. For this to work, the buffer address, the file offset and the transfer length must all be aligned to the device's logical block size (typically 512 bytes; 4096 on some drives).
Windows Platform:
For Windows, there are the special objects \\.\PhysicalDriveX, where X is the number of the drive, which can be opened using the normal CreateFile API. To perform reads or writes, simply call ReadFile and WriteFile (the buffer must be aligned on the sector size).
More info can be found in the "Physical Disks and Volumes" section of the CreateFile API documentation.
Alternatively, you can also use the DeviceIoControl function, which sends a control code directly to a specified device driver, causing the corresponding device to perform the corresponding operation.
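A small sketch of the Windows naming scheme, written in Python rather than against the Win32 API directly. On Windows, Python's built-in open() can be pointed at a \\.\PhysicalDriveX name when run with administrator rights. The helper names and the 512-byte sector size are assumptions for illustration.

```python
SECTOR_SIZE = 512  # assumed; query the real value before relying on it


def physical_drive_path(index):
    r"""Windows device name for a whole disk, e.g. \\.\PhysicalDrive0."""
    return r"\\.\PhysicalDrive%d" % index


def read_sectors(path, lba, count=1, sector_size=SECTOR_SIZE):
    """Read `count` whole sectors starting at block `lba`."""
    # buffering=0 gives unbuffered I/O, so the offset and length are kept
    # sector-aligned explicitly, as raw devices require.
    with open(path, "rb", buffering=0) as disk:
        disk.seek(lba * sector_size)
        return disk.read(count * sector_size)
```

For example, read_sectors(physical_drive_path(0), 0) would return the first sector of the first disk. Nothing in read_sectors is Windows-specific, so it can be tried on an ordinary image file first.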

On Linux, as root, you can save your MBR like this (assuming your drive is /dev/sda):
dd if=/dev/sda of=mbr bs=512 count=1
If you wanted to read 1 MiB from your drive, starting 10 MiB into it:
dd if=/dev/sda of=1Mb bs=1M count=1 skip=10
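For the application the question asks about, what these dd invocations do can be sketched in a few lines of Python. This is only an approximation of dd's if/of/bs/count/skip behaviour, not a reimplementation, and the function name copy_blocks is hypothetical.

```python
def copy_blocks(src, dst, bs=512, count=1, skip=0):
    """Copy `count` blocks of `bs` bytes from src to dst, skipping `skip` input blocks."""
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fin.seek(skip * bs)          # skip is counted in blocks, like dd's skip=
        for _ in range(count):
            block = fin.read(bs)
            if not block:            # end of input reached early
                break
            fout.write(block)
            copied += len(block)
    return copied
```

Saving the MBR would then be copy_blocks("/dev/sda", "mbr", bs=512, count=1), again run as root.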

Related

Intel Pin Tool: Get instruction from address

I'm using Intel's Pin Tool to do some binary instrumentation, and was wondering if there is an API to get the instruction byte code at a given address.
Something like:
instruction = getInstructionatAddr(addr);
where addr is the desired address.
I know the function Instruction (used in many of the simple/manual examples) given by Pin gets the instruction, but I need to know the instructions at other addresses. I searched the web to no avail. Any help would be appreciated!
CHEERS
"wondering if there is an API to get the instruction byte code at a given address"
Yes, it's possible but in a somewhat contrived way: with PIN you are usually interested in what is executed (or manipulated through the executed instructions), so everything outside the code / data flow is not of any interest for PIN.
PIN is using (and thus ships with) Intel XED which is an instruction encoder / decoder.
In your PIN installation you should have an \extra folder with two sub-directories: xed-ia32 and xed-intel64 (choose the one that suits your architecture). The main include file for XED is xed-interface.h, located in the \include folder of the aforementioned directories.
In your Pintool, given any address in the virtual space of your pintooled program, use the PIN_SafeCopy function to read the program memory (and thus the bytes at the given address). The advantage of PIN_SafeCopy is that it fails gracefully even if it can't read the memory, and can read "shadowed" parts of the memory.
Use XED to decode the instruction bytes for you.
For an example of how to decode an instruction with XED, see the first example program.
As the small example uses a hardcoded buffer (namely itext in the example program), replace this hardcoded buffer with the destination buffer you used in PIN_SafeCopy.
Obviously, you should make sure that the memory you are reading really contains code.
AFAIK, it is not possible to get an INS type (the usual type describing an instruction in PIN) from an arbitrary address as only addresses in the code flow will "generate" an INS type.
As a side note:
"I know the function Instruction (used in many of the simple/manual examples) given by Pin gets the instruction"
The Instruction routine used in many PIN examples is called an "instrumentation routine": its name is not relevant in itself.
PIN_SafeCopy may help you. This API copies memory from the address space of the target process into a buffer you specify.

For strict education purposes, what exact format of bytes/bits do modern BIOS understand?

BIOS will look in the first 512 bytes of the first sector(at least on PC BIOS, AmeriTrend, PhoenixBIOS, etc.), and any .bin file binary formatted block of bytes will be understood by BIOS, am I correct here?
I just want to ask this to be certain, and because I want to assure that I don't make mistakes when writing my operating system carefully.
The BIOS will be executing under the processor and native-architecture obviously, so once I instruct BIOS with the binary to have the processor move the bytes in to memory I can then transfer control to my software which will then instruct the processor on what it does next, right?
I just want to know if I have this right, and I assure you this isn't spam, as I'm a curious hobbyist who has C/C++, Java, C#, x86 Assembly, and some hardware-design experience as well.
EDIT PEOPLE: I also would like to know if there's a modernized format, file, or block of bytes the BIOS must be assembled/compiled to to be executed, such as a .bin.
As pst's comment says, the boot sector is treated as x86 machine code; the BIOS runs it in 16-bit real mode.
The last 2 bytes need to match a special signature (0x55 followed by 0xAA, at offsets 510 and 511), but I think that is it as far as hard requirements go.
The code just gets loaded and executed as is.
If you are trying to conform to MBR or GPT partition specs (so that other OS's can see your disk partitions) there is more to it, but that is another thing altogether.
There is no specific "file format" for a boot sector. The BIOS simply reads the raw bytes from the boot sector into memory (at address 0x7C00) and jumps to the first instruction. It is literally just a "block of bytes"; the file extension (you keep mentioning .bin) is not relevant at all.
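To show how little structure is involved, here is a minimal Python sketch that builds such a block of bytes: raw machine code padded to 512 bytes, ending with the 0x55, 0xAA signature at offsets 510 and 511. The helper names are made up for illustration, and the two-byte "program" is just an infinite loop.

```python
BOOT_SECTOR_SIZE = 512
BOOT_SIGNATURE = b"\x55\xaa"  # bytes at offsets 510 and 511


def make_boot_sector(code):
    """Pad raw machine code to 512 bytes and append the boot signature."""
    room = BOOT_SECTOR_SIZE - len(BOOT_SIGNATURE)
    if len(code) > room:
        raise ValueError("code does not fit in a single boot sector")
    return code.ljust(room, b"\x00") + BOOT_SIGNATURE


def is_bootable(sector):
    """Check the only hard requirement: the size and the trailing signature."""
    return len(sector) == BOOT_SECTOR_SIZE and sector[510:512] == BOOT_SIGNATURE


# 0xEB 0xFE encodes `jmp $` (jump to itself): about the smallest valid boot code.
image = make_boot_sector(b"\xeb\xfe")
```

Writing image to a file gives something an emulator or dd can use as a boot sector, whatever the file is called.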

Read a chunk of a file using WINAPI's ReadFile or something similar?

Well, I'm working on a project in which I'm handling potentially big files that I can't load into RAM all at once, so I'm going to treat them like a CHS hard drive and grab the data one 0x800-byte chunk at a time.
My problem is, I cannot find any functions in the WINAPI that allow me to read the data from a file I've opened with CreateFile, starting at an offset.
And yes, it must be a WINAPI function, and no, I do not want to map the whole file into memory.
Thanks much, Bradley.
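Use ReadFile with SetFilePointer. For big files, SetFilePointerEx is the simpler choice because it takes the full 64-bit offset in a single argument; alternatively, pass ReadFile an OVERLAPPED structure whose Offset and OffsetHigh fields hold the starting position.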
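The pattern itself (position the file pointer, then read a fixed-size chunk) is sketched below in Python rather than with the Win32 calls, so treat it as an illustration of the access pattern only. Here seek plays the role of SetFilePointer and read the role of ReadFile; the generator name iter_chunks is hypothetical, and the 0x800 chunk size comes from the question.

```python
CHUNK_SIZE = 0x800  # chunk size taken from the question


def iter_chunks(path, start_chunk=0, chunk_size=CHUNK_SIZE):
    """Yield the file one chunk at a time, starting at chunk `start_chunk`."""
    with open(path, "rb") as f:
        f.seek(start_chunk * chunk_size)   # position the file pointer
        while True:
            chunk = f.read(chunk_size)     # read at most one chunk
            if not chunk:                  # empty result means end of file
                return
            yield chunk
```

Only one chunk is held in memory at a time, and the final chunk may be shorter than chunk_size if the file length is not a multiple of it.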

(windows) raw write to file without involving win32api

I would like to know if there is an option, and if so - how exactly, to be able to write raw bytes to a file without using WIN32API file handling calls, while in Windows.
I tried a straightforward approach using direct file calls in x86 assembly, but without success so far.
You can try using the native API from ntdll or even direct syscalls (int 2eh or the sysenter instruction), but it's quite tricky: you need to use kernel-style filenames, for one.
Before answering your question, let me mention that writing to a file using the API in Windows consists of the following (simplified) stages:
1. You call WriteFile (kernel32.dll)
2. WriteFile calls NtWriteFile (ntdll.dll)
3. NtWriteFile calls SYSENTER and operation proceeds to kernel mode
4. In kernel mode NtWriteFile function of Ntoskrnl.exe is called
5. This sends IRP_MJ_WRITE to file system driver
6. File system driver determines which sectors should be written and passes to storage driver
7. Storage driver sends a command to the hard drive to actually write data to specified sectors
8. Hard drive writes the data
Operations 1 to 7 are all very fast compared to 8 (unless you are working with a RAM drive or an extremely fast SSD).
Method 1 - You can skip Step 1 easily (by calling NtWriteFile), and Step 2 (by calling SYSENTER, which is not easy). However, you will not gain any performance improvement, so there is no point in doing it. Consider WriteFile just a wrapper for those (I don't think you are after eliminating one extra function call).
Method 2 - You can find out which sectors the file occupies and write to them directly (effectively skipping all steps down to Step 7). To do that you will need to open and lock the volume, find the clusters that the target file occupies with an FSCTL_GET_RETRIEVAL_POINTERS call, and call WriteFile on the volume handle.
But that would be an unfair comparison, because the file system driver not only writes to the data sectors but also updates file system metadata when you call WriteFile.
Bottom line: "testing efficiency over win32 API" doesn't make much sense. You can skip some of the work the OS does, but either it won't make any difference in speed (method 1), or the comparison will be unfair (method 2).

Doing a zero-copy move of data from a Linux kernel buffer to hard disk

I am trying to move data from a buffer in kernel space to the hard disk without incurring any additional copies from the kernel buffer to user buffers or any other kernel buffers. Any ideas/suggestions would be most helpful.
The use case is basically a demux driver which collects data into a demux buffer in kernel space, and this buffer has to be emptied periodically by copying the contents into a FUSE-based partition on the disk. As the buffer gets full, a user process is signalled, which then determines the sector numbers on the disk the contents need to be copied to.
I was hoping to mmap the above demux kernel buffer into user address space and issue a write system call to the raw partition device. But from what I can see, this data is being cached by the kernel on its way to the hard disk driver, and so I am assuming that involves additional copies by the Linux kernel.
At this point I am wondering if there is any other mechanism to do this without involving additional copies by the kernel. I realize this is an unusual usage scenario for non-embedded environments, but I would appreciate any feedback on possible options.
BTW - I have tried using O_DIRECT when opening the raw partition, but the subsequent write call fails if the buffer being passed is the mmapped buffer.
Thanx!
You need to expose your demux buffer as a file descriptor (presumably, if you're using mmap() then you're already doing this - great!).
On the kernel side, you then need to implement the splice_read member of struct file_operations.
On the userspace side, create a pipe(), then use splice() twice - once to move the data from the demux file descriptor into the pipe, and a second time to move the data from the pipe to the disk file. Use the SPLICE_F_MOVE flag.
As documented in the splice() man page, it will avoid actual copies where it can, by copying references to pages of kernel memory rather than the pages themselves.
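The userspace half of this (a pipe plus two splice() calls) can be sketched with Python's os.splice, which wraps the same system call and is available on Linux from Python 3.10. The function name splice_copy and the 64 KiB chunk size are assumptions. In the scenario above, src_fd would be the demux file descriptor, which only works once the driver implements splice_read.

```python
import os


def splice_copy(src_fd, dst_fd, length, chunk=64 * 1024):
    """Move up to `length` bytes from src_fd to dst_fd through a pipe using splice()."""
    pipe_r, pipe_w = os.pipe()
    flags = os.SPLICE_F_MOVE
    moved = 0
    try:
        while moved < length:
            # First splice: source descriptor -> write end of the pipe.
            n = os.splice(src_fd, pipe_w, min(chunk, length - moved), flags=flags)
            if n == 0:          # source is exhausted
                break
            # Second splice: read end of the pipe -> destination descriptor.
            pending = n
            while pending:
                pending -= os.splice(pipe_r, dst_fd, pending, flags=flags)
            moved += n
    finally:
        os.close(pipe_r)
        os.close(pipe_w)
    return moved
```

The destination must not be opened with O_APPEND, since splice() rejects such descriptors. Whether pages are really moved rather than copied is up to the kernel and the drivers involved.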

Resources