Where to store the bootloader on a floppy image? - bootloader

I'm going to write and test a bootloader. In order to do this, I am planning to copy the bootloader onto a floppy image file and mount it in a VM.
However, I'm not sure where to put the bootloader's machine code. Does it just get dumped into the first few bytes of the file?

The boot sector of the floppy was the first sector. If you're talking about a raw floppy image (1440K), it should be the first 512 bytes of the image file.
From memory, this gets loaded by the BIOS into 7c00:0000 (real mode) and then jumps to that address.
The DOS boot floppies had a 3-byte JMP instruction there to jump over the Disk Parameter Block (DPB), which detailed the attributes of the disk. But, if you're in total control of the disk and your boot code, I don't think you need to follow that convention. I don't recall any BIOS' checking what was loaded for validity (though admittedly it was a long time ago).

its been a VERY long time but if i recall in DOS it was stored in the MBR. i believe its still the same today
http://en.wikipedia.org/wiki/Master_boot_record

Related

Memory mapping in Virtual Address Space(VAS)

This [wiki article] about Virtual memory says:
The process then starts executing bytes in the exe file. However, the
only way the process can use or set '-' values in its VAS is to ask
the OS to map them to bytes from a file. A common way to use VAS
memory in this way is to map it to the page file.
A diagram follows :
0 4GB
VAS |---vvvvvvv----vvvvvv---vvvv----vv---v----vvv--|
mapping ||||||| |||||| |||| || | |||
file bytes app.exe kernel user system_page_file
I didn't understand the part values in its VAS is to ask the OS to map them to bytes from a file.
What is the system page file here?
First off, I can't imagine such a badly written article to exist in Wikipedia. One has to be an expert already familiar with the topic before being able to understand what was described.
Assuming you understand the rest of the article, the '-' part represents unallocated virtual address within the 4GB address space available to a process. So the sentence "the only way the process can use or set '-' values in its VAS is to ask the OS to map them to bytes from a file" means to allocate virtual memory address e.g. in a Windows native program calling VirtualAlloc(), or a C program calling malloc() to allocate some memory to store program data while those memory were not already existing in the current process's virtual address space.
When Windows allocates memory to a process address space, it normally associate those memory with the paging file in the hard disk. The c:\pagefile.sys is this paging file which is the system_page_file mentioned in the article. Memory page is swapped out to that file when there is not enough physical page to accommodate the demand.
Hope that clarifies

Size constraints of initramfs on ARM?

I'm creating a bootable Linux system on a PicoZed board (ARM CortexA9 core), and I've run into a "limitation", which I don't think really is a limitation (I get the feeling it's another problem masquerading as a limitation).
I boot by starting the system in JTAG boot mode; after powering on the board I use the xmd debugger to place u-boot into the system's RAM and then I run it.
Next I place the kernel (uImage), the gzip'd initramfs image and the device tree into memory. Finally I tell u-boot to boot the system using bootm, and using three arguments to point out the memory locations of all the images.
All of this works, and I manage to boot up Linux + userland. However, I need to grow the initramfs, and this is where I run into problems.
The working image is 16MiB exactly. I tried to make it 24MiB (completely regenerated from scratch each time I try to boot), but just after the kernel has loaded and it tries to find init the kernel reports file system faults and fails. There shouldn't be any overlaps, but just in case I tried moving things around a little, but the same problem occurred.
After searching for some tips, I saw someone on a forum say that the image needs to be placed at a 16MiB alignment (which I don't think is true, but I tried it none the less, but it didn't get a working system). Another post claimed that the images must be aligned with their sizes (which I again don't think is true, but I tried that as well, but with no change). Yet another post claimed that this happens if the initramfs image crosses the __init end boundary, and that placing the initramfs image firmly inside will allow the memory to be reclaimed after the image has been uncompressed by the kernel, and placing it beyond the __init section will work but then that memory is forever lost after boot. I know way too little about Linux in order to know if any of this is in any way true/accurate, and I have no idea where "__init"'s - if such a thing exists - end-boundary is, but if the issue is that I was crossing it I tried moving the initramfs image way beyond anywhere were I was previously using it, but that didn't change anything.
I also tried the original location (which works with a 16MiB image) and create a 16MiB + 1K sized image, but this didn't work either. (Checking that it didn't overlay any of the over images, obviously).
This originally led me to think there's a 16MiB initramfs size limit lurking somewhere. But on searching for it, it got me thinking that doesn't make sense -- as far as I can gather the bootm command in u-boot shoud set up the tag list for the system (which includes the location and size of the initramfs), and I haven't come across any note about a 16MiB limit in relation to the tagged lists for the initramfs so far.
I have found a page which claims that the initramfs size is in practical terms limited to roughly half the size of physical ram in the system, and the PicoZed board has 1G RAM, so we're in orders of magnitude away from what "should" be a limitation.
To clarify: The 16MiB image is the size of the raw image; compressed it's just under 6MiB. However, if I populate it so it's not compressed as tightly it doesn't make any difference -- the problem doesn't appear to be related to the size of the compressed image, only the uncompressed image.
Main question:
Where is this apparent 16MiB initramfs size limit coming from?
Side-question:
Is there such a thing as an "kernel __init section" which is reclaimed by the kernel after it has uncompressed/loaded images? If so, how do I see/configure the location/size of it?
Size constraints of initramfs on ARM?
I have not encountered a 16 MiB size constraint as you allege.
At one time I thought there was a size limit too, but that turned out to be a memory footprint issue during boot. Once that was sorted out I've been using large initramfs of 30MiB (e.g. with glibc, gtstreamer and qt5 libraries).
Where is this apparent 16MiB initramfs size limit coming from?
There isn't one. A ramfs is only constrained by available RAM.
There is a definition for the "Default RAM disk size", but this would not affect the size of an initramfs.
Your method of booting with the U-Boot bootm command is suspect, i.e. you're passing the memory address of the initramfs as the second argument.
The U-Boot documentation clearly describes the second argument as "the address of an initrd image" (emphasis added).
There is no mention of initramfs as an argument.
Linux kernel documentation states that the initramfs archive can be "linked into the linux kernel image". There's a kernel menuconfig entry for specifying the path to the initramfs cpio file. The make script will append this cpio file to the kernel image so that there is a single image file for booting.
Or (like an initrd) "a separate file" can be passed to the kernel at boot to populate the initramfs.
U-Boot passes the location (and length) of this archive either as an ATAG entry or as a reserved memory region in the Device Tree blob.
The kernel expects either a cpio archive for the initramfs or a tar archive for an initrd.
You neglect to mention (besides its compression) what kind of archive this "initramfs" or "separate file" actually is .
So it's not clear if you're booting the kernel with an initrd (tar archive) or an initramfs (cpio archive).
Your repeated reference to the initramfs as an "image" file rather than a cpio archive suggests that you really are using an initrd.
An initrd would definitely have a size constraint.
Is there such a thing as an "kernel __init section" which is reclaimed by the kernel after it has uncompressed/loaded images?
Yes, there is an __init section of memory, which is released after kernel initialization is complete.
If so, how do I see/configure the location/size of it?
Usually routines and data that have no use after initialization can be declared with the __init macro. See this.
The location and size of this memory section would be under the control of the linker script, rather than explicit user control. The kernel System.map file should have the info for review.
I think you need to pass ramdisk_size in uboot bootargs command
ramdisk_size needs to be set if the ram-disk uncompress file
size is bigger than default setting. It should be more than
ramdisk uncompress file size.

x86 segmentation, DOS, MZ file format, and disassembling

I'm disassembling "Test Drive III". It's a 1990 DOS game. The *.EXE has MZ format.
I've never dealt with segmentation or DOS, so I would be grateful if you answered some of my questions.
1) The game's system requirements mention 286 CPU, which has protected mode. As far as I know, DOS was 90% real mode software, yet some applications could enter protected mode. Can I be sure that the app uses the CPU in real mode only? IOW, is it guaranteed that the segment registers contain actual offset of the segment instead of an index to segment descriptor?
2) Said system requirements mention 1 MB of RAM. How is this amount of RAM even meant to be accessed if the uppermost 384 KB of the address space are reserved for stuff like MMIO and ROM? I've heard about UMBs (using holes in UMA to access RAM) and about HMA, but it still doesn't allow to access the whole 1 MB of physical RAM. So, was precious RAM just wasted because its physical address happened to be reserved for UMA? Or maybe the game uses some crutches like LIM EMS or XMS?
3) Is CS incremented automatically when the code crosses segment boundaries? Say, the IP reaches 0xFFFF, and what then? Does CS switch to the next segment before next instruction is executed? Same goes for SS. What happens when SP goes all the way down to 0x0000?
4) The MZ header of the executable looks like this:
signature 23117 "0x5a4d"
bytes_in_last_block 117
blocks_in_file 270
num_relocs 0
header_paragraphs 32
min_extra_paragraphs 3349
max_extra_paragraphs 65535
ss 11422
sp 128
checksum 0
ip 16
cs 8385
reloc_table_offset 30
overlay_number 0
Why does it have no relocation information? How is it even meant to run without address fixups? Or is it built as completely position-independent code consisting from program-counter-relative instructions? The game comes with a cheat utility which is also an MZ executable. Despite being much smaller (8448 bytes - so small that it fits into a single segment), it still has relocation information:
offset 1
segment 0
offset 222
segment 0
offset 272
segment 0
This allows IDA to properly disassemble the cheat's code. But the game EXE has nothing, even though it clearly has lots of far pointers.
5) Is there even such thing as 'sections' in DOS? I mean, data section, code (text) section etc? The MZ header points to the stack section, but it has no information about data section. Is data and code completely mixed in DOS programs?
6) Why even having a stack section in EXE file at all? It has nothing but zeroes. Why wasting disk space instead of just saying, "start stack from here"? Like it is done with BSS section?
7) MZ header contains information about initial values of SS and CS. What about DS? What's its initial value?
8) What does an MZ executable have after the exe data? The cheat utility has whole 3507 bytes in the end of the executable file which look like
__exitclean.__exit.__restorezero._abort.DGROUP#.__MMODEL._main._access.
_atexit._close._exit._fclose._fflush._flushall._fopen._freopen._fdopen
._fseek._ftell._printf.__fputc._fputc._fputchar.__FPUTN.__setupio._setvbuf
._tell.__MKNAME._tmpnam._write.__xfclose.__xfflush.___brk.___sbrk._brk._sbrk
.__chmod.__close._ioctl.__IOERROR._isatty._lseek.__LONGTOA._itoa._ultoa.
_ltoa._memcpy._open.__open._strcat._unlink.__VPRINTER.__write._free._malloc
._realloc.__REALCVT.DATASEG#.__Int0Vector.__Int4Vector.__Int5Vector.
__Int6Vector.__C0argc.__C0argv.__C0environ.__envLng.__envseg.__envSize
Is this some kind of debugging symbol information?
Thank you in advance for your help.
Re. 1. No, you can't be sure until you prove otherwise to yourself. One giveaway would be the presence of MOV CR0, ... in the code.
Re. 2. While marketing materials aren't to be confused with an engineering specification, there's a technical reason for this. A 286 CPU could address more than 1M of physical address space. The RAM was only "wasted" in real mode, and only if an EMM (or EMS) driver wasn't used. On 286 systems, the RAM past 640kb was usually "pushed up" to start at the 1088kb mark. The ISA and on-board peripherals' memory address space was mapped 1:1 into the 640-1024kb window. To use the RAM from the real mode needed an EMM or EMS driver. From protected mode, it was simply "there" as soon as you set up the segment descriptor correctly.
If the game actually needed the extra 384kb of RAM over the 640kb available in the real mode, it's a strong indication that it either switched to protected mode or required the services or an EMM or EMS driver.
Re. 3. I wish I remembered that. On reflection, I wish not :) Someone else please edit or answer separately. Hah, I did know it at some point in time :)
Re. 4. You say "[the code] has lots of instructions like call far ptr 18DCh:78Ch". This implies one of three things:
Protected mode is used and the segment part of the address is a selector into the segment descriptor table.
There is code there that relocates those instructions without DOS having to do it.
There is code there that forcibly relocates the game to a constant position in the address space. If the game doesn't use DOS to access on-disk files, it can remove DOS completely and take over, gaining lots of memory in the process. I don't recall whether you could exit from the game back to the command prompt. Some games where "play until you reboot".
Re. 5. The .EXE header does not "point" to any stack, there is no stack section you imply, the concept of sections doesn't exist as far as the .EXE file is concerned. The SS register value is obtained by adding the segment the executable was loaded at with the SS value from the header.
It's true that the linker can arrange sections contiguously in the .EXE file, but such sections' properties are not included in the .EXE header. They often can be reverse-engineered by inspecting the executable.
Re. 6. The SS and SP values in the .EXE header are not file pointers. The EXE file might have a part that maps to the stack, but that's entirely optional.
Re. 7. This is already asked and answered here.
Re. 8. This looks like a debug symbol list. The cheat utility was linked with the debugging information left in. You can have completely arbitrary data there - often it'd various resources (graphics, music, etc.).

For strict education purposes, what exact format of bytes/bits do modern BIOS understand?

BIOS will look in the first 512 bytes of the first sector(at least on PC BIOS, AmeriTrend, PhoenixBIOS, etc.), and any .bin file binary formatted block of bytes will be understood by BIOS, am I correct here?
I just want to ask this to be certain, and because I want to assure that I don't make mistakes when writing my operating system carefully.
The BIOS will be executing under the processor and native-architecture obviously, so once I instruct BIOS with the binary to have the processor move the bytes in to memory I can then transfer control to my software which will then instruct the processor on what it does next, right?
I just want to know if I have this right, and I assure you this isn't spam, as I'm a curious hobbyist who has C/C++, Java, C#, x86 Assembly, and some hardware-design experience as well.
EDIT PEOPLE: I also would like to know if there's a modernized format, file, or block of bytes the BIOS must be assembled/compiled to to be executed, such as a .bin.
As pst comment says, the boot sector is treated as i386 machine code.
The last 2 bytes need to match a special signature (0x55AA), but I think that is it as far as hard requirements.
The code just gets loaded and executed as is.
If you are trying to conform to MBR or GPT partition specs (so that other OS's can see your disk partitions) there is more to it, but that is another thing altogether.
There is no specific "file format" for a boot sector. The BIOS simply reads the raw bytes from the boot sector, and jumps to the first instruction. It is literally just a "block of bytes", the file extension (you keep mentioning .bin) is not relevant at all.

How do they read clusters/cylinders/sectors from the disk?

I needed to recover the partition table I deleted accidentally. I used an application named TestDisk. Its simply mind blowing. I reads each cylinder from the disk. I've seen similar such applications which work with MBR & partitioning.
I'm curious.
How do they read
clusters/cylinders/sectors from the
disk? Is there some kind of API for this?
Is it again OS dependent? If so whats the way to for Linux & for windows?
EDIT:
Well, I'm not just curious I want a hands on experience. I want to write a simple application which displays each LBA.
Cylinders and sectors (wiki explanation) are largely obsoleted by the newer LBA (logical block addressing) scheme for addressing drives.
If you're curious about the history, use the Wikipedia article as a starting point. If you're just wondering how it works now, code is expected to simply use the LBA address (which works largely the same way as a file does - a linear array of bytes arranged in blocks)
It's easy due to the magic of *nix special device files. You can open and read /dev/sda the same way you'd read any other file.
Just use open, lseek, read, write (or pread, pwrite). If you want to make sure you're physically fetching data from a drive and not from kernel buffers you can open with the flag O_DIRECT (though you must perform aligned reads/writes of 512 byte chunks for this to work).
For *nix, there have been already answers (/dev directory); for Windows, there are the special objects \\.\PhisicalDriveX, with X as the number of the drive, which can be opened using the normal CreateFile API. To actually perform reads or writes you have then to use the DeviceIoControl function.
More info can be found in "Physical Disks and Volumes" section of the CreateFile API documentation.
I'm the OP. I'm combining Eric Seppanen's & Matteo Italia's answers to make it complete.
*NIX Platforms:
It's easy due to the magic of *nix special device files. You can open and read /dev/sda the same way you'd read any other file.
Just use open, lseek, read, write (or pread, pwrite). If you want to make sure you're physically fetching data from a drive and not from kernel buffers you can open with the flag O_DIRECT (though you must perform aligned reads/writes of 512 byte chunks for this to work).
Windows Platform
For Windows, there are the special objects \\.\PhisicalDriveX, with X as the number of the drive, which can be opened using the normal CreateFile API. To perform reads or writes simply call ReadFile and WriteFile (buffer must be aligned on sector size).
More info can be found in "Physical Disks and Volumes" section of the CreateFile API documentation.
Alternatively you can also you DeviceIoControl function which sends a control code directly to a specified device driver, causing the corresponding device to perform the corresponding operation.
On linux, as root, you can save your MBR like this (Assuming you drive is /dev/sda):
dd if=/dev/sda of=mbr bs=512 count=1
If you wanted to read 1Mb from you drive, starting at the 10th MB:
dd if=/dev/sda of=1Mb bs=1Mb count=1 skip=10

Resources