How does the Linux kernel know how to execute a binary format? - linux-kernel

I'm reading about binary formats such as ELF. Suppose I have two binary files, one compiled as an ELF file and another as COFF (or some other binary format). How does the kernel handle this? I mean, when you execute a program, how does Linux know how to handle each format? Does the kernel have some interface that selects, based on the binary's header, the correct code to handle each kind of binary?

As you said, the kernel detects the type of binary based on the header.
Different binary formats are registered using register_binfmt(). Take a look at the fs/binfmt_* files for the different implementations.
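The selection is done by exec_binprm() in fs/exec.c, which is basically the meat of the execve syscall. It calls search_binary_handler(), which searches the registered format handlers to find one willing to handle the file. Each handler's load_binary() callback inspects the start of the file and returns -ENOEXEC if it does not recognize the format, in which case the next handler is tried.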
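To make the dispatch idea concrete, here is a minimal user-space sketch in Python. It is only an analogy, not kernel code: the names register_handler, load_elf, load_script and search_handler are made up for illustration, and the real handlers do far more than compare a few header bytes.

```python
import errno
from typing import Callable, List, Optional

# A hypothetical handler inspects the header bytes and returns a description,
# or None to decline (the kernel's handlers return -ENOEXEC instead).
Handler = Callable[[bytes], Optional[str]]

_handlers: List[Handler] = []


def register_handler(handler: Handler) -> None:
    """User-space stand-in for register_binfmt()."""
    _handlers.append(handler)


def load_elf(header: bytes) -> Optional[str]:
    # ELF files start with the byte 0x7F followed by "ELF".
    return "ELF executable" if header.startswith(b"\x7fELF") else None


def load_script(header: bytes) -> Optional[str]:
    # Scripts start with "#!" followed by the interpreter path.
    return "interpreter script" if header.startswith(b"#!") else None


def search_handler(header: bytes) -> str:
    """Try each registered handler in turn; the first one willing wins."""
    for handler in _handlers:
        result = handler(header)
        if result is not None:
            return result
    raise OSError(errno.ENOEXEC, "Exec format error")


register_handler(load_elf)
register_handler(load_script)

print(search_handler(b"\x7fELF\x02\x01\x01"))  # ELF executable
print(search_handler(b"#!/bin/sh\n"))          # interpreter script
```

A header that no handler accepts raises OSError with errno set to ENOEXEC, which mirrors the "Exec format error" you get from the shell.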

Related

Checksum inside Altera FPGA .jic file

I'm modifying a firmware file (.jic, JTAG Indirect Configuration File) with a small algorithm, but changing data inside the file makes it unusable because there is a checksum somewhere in the file that has to be updated.
I need to find where the checksum is inside the .jic file and work out which algorithm is used (CRC32, etc.).
The bits in each byte are reversed, and I inspected both the normal and the bit-reversed file with no success.
Does someone know where the checksum data is inside the .jic file, or is there a way to find out?
You need to generate a .rpd file.
It contains the raw data that will be loaded into the FPGA at power-up.
It is also what you will see if you read the flash memory byte-by-byte after loading the .jic.
If you have access to the software that creates .jic files (e.g. Quartus), you can create two .jic files from inputs that differ by one bit and compare the two outputs. That should give you a hint about where the check is located (if there is one).
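The comparison step can be done with any binary diff tool. As a rough sketch in Python (the function name diff_ranges is made up here, and nothing in it is specific to the .jic format), you could list the offset ranges where the two files differ:

```python
from typing import List, Tuple


def diff_ranges(a: bytes, b: bytes) -> List[Tuple[int, int]]:
    """Return (start, end) offset ranges, end exclusive, where a and b differ."""
    ranges = []
    start = None
    common = min(len(a), len(b))
    for i in range(common):
        if a[i] != b[i]:
            if start is None:
                start = i           # a new run of differing bytes begins
        elif start is not None:
            ranges.append((start, i))
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        # Whatever extends past the shorter input counts as a difference too.
        ranges.append((common, max(len(a), len(b))))
    return ranges


# Tiny in-memory example; for real files you would read both with
# open(path, "rb").read() and pass the results in.
old = bytes([0, 1, 2, 3, 4, 5, 6, 7])
new = bytes([0, 1, 9, 3, 4, 5, 8, 8])
print(diff_ranges(old, new))  # [(2, 3), (6, 8)]
```

If a one-bit change in the input produces one small difference where the data lives and a second difference somewhere else, that second location is a good candidate for the checksum.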
Not by starting from a .jic file. But if the data you're trying to update is initialized from a .hex or .mif file, you can use quartus_cdb --update_mif to perform a partial recompilation of your project. (This is also available in the IDE as "Update Memory Initialization File".)

Binary files format (ARM GCC)

What does a binary file produced by ARM GCC for ARM devices contain?
Does it include any information about the destination address it should be written to?
Or is it just the raw content of the program, without any information about memory location?
If I have a bootloader, or flash through a programmer, can I write a binary file anywhere in flash, or is it placed according to internal information about a specific memory address?
If I set up my linker script to place the program at a specific memory address, does that influence the .bin file?
There are several types of files that get called "binary" (at least among my colleagues):
.bin file extension. Contains only the data that would/could be written to a single contiguous partition. It doesn't contain any addresses or offsets inside. When flashing this file to a microcontroller you have to specify the destination address explicitly (often this is 0x0, the beginning of flash). If you need to write to different partitions you need a separate .bin for each of them (or one merged file if the partitions are consecutive). So this file type is like a memory image.
Pros: minimum overhead if you have a single contiguous partition and the destination address is always the same (so it can be hardcoded).
.hex is the Intel HEX file format. It contains the destination address for each line in it. Can be opened in any text editor.
.s19 or .srec is the Motorola S-record format. Very similar to .hex, just another format. It can also include some metadata that won't be flashed.
Pros of the last two types: best choice if you have several non-contiguous partitions. The gaps between them don't have to be stored, so the file stays compact.
For VSCode there are several plugins that can highlight .s19 and .hex files.
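To see how a .hex file carries its own addresses, here is a small Python sketch that decodes a single Intel HEX record. The names HexRecord and parse_ihex_record are made up for illustration, and the sketch only covers the basic record layout (byte count, 16-bit address, record type, data, checksum), not the extended address record types.

```python
from typing import NamedTuple


class HexRecord(NamedTuple):
    address: int      # 16-bit load offset of the first data byte
    record_type: int  # 0x00 = data, 0x01 = end of file
    data: bytes


def parse_ihex_record(line: str) -> HexRecord:
    """Decode one ':LLAAAATT<data>CC' line of an Intel HEX file."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("Intel HEX records start with ':'")
    raw = bytes.fromhex(line[1:])
    if len(raw) < 5:
        raise ValueError("record too short")
    # All bytes of a record, checksum included, sum to 0 modulo 256.
    if sum(raw) & 0xFF != 0:
        raise ValueError("checksum mismatch")
    count = raw[0]
    if len(raw) != count + 5:
        raise ValueError("length field does not match record size")
    address = int.from_bytes(raw[1:3], "big")
    return HexRecord(address, raw[3], raw[4:4 + count])


rec = parse_ihex_record(":10010000214601360121470136007EFE09D2190140")
print(hex(rec.address), rec.record_type, len(rec.data))  # 0x100 0 16
```

A .bin file has nothing comparable: it is just the data bytes, so the address has to come from whoever does the flashing.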

Generating a PE format executable

I'm trying to generate a PE format executable; I'm at the stage where I have something that dumpbin is happy with, and as far as I can tell is not materially different from an empty program linked with Microsoft's linker, but Windows still rejects it: PE file - what's missing?
If I had some algorithm for generating a valid PE file, maybe I could hill climb from there. Here's what I've found so far:
There's plenty of documentation, sample code and tools for reading PE files, as opposed to generating them from scratch.
PE Bliss lists generation among its features, but won't compile.
Sample assembly language templates for PE file generation concentrate on minimizing size. The most promising looking one generates a file that Windows rejects even though as far as I can see it should be accepted; the one I found that did work, ironically, generates a file that Windows accepts even though as far as I can see it should be rejected, since almost every nominally essential component is missing or malformed.
Is there any sample code available that generates a correct PE file?
Here's the classic page about generating PE from scratch:
http://www.phreedom.org/research/tinype
As for the generic list of required/optional parts, see corkami page on the PE format:
http://code.google.com/p/corkami/wiki/PE
See also the code tree for many examples of small PE files, generated completely from scratch.

In-depth understanding of binary files

I am learning C++, specifically binary file structure and manipulation, and since I am totally new to the subject of binary files, bits, bytes and hexadecimal numbers, I decided to take one step back and establish a solid understanding of these subjects.
In the picture I have included below, I wrote two words (blue thief) in a .txt file.
The reason for this is that when I open the file in a hex editor, I want to understand how the information is really stored in hex format. Now, don't get me wrong, I am not trying to make a living out of reading hex dumps all day, only to have a minimum level of understanding of the basics of a binary file's composition. I also know that different files have different structures, but just for the sake of understanding, I wanted to know how exactly the words "blue thief" and a single ' ' (space) were converted into those characters.
One more thing: I have heard that binary files contain three types of information:
header, fmt and the data. Is that only the case for multimedia files like audio and video? I can't seem to see anything in this file other than what looks like the data chunk.
The characters in your text file are encoded in a Windows extension of ASCII--one byte for each character that you see in Notepad. What you see is what you get.
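You can reproduce what the hex editor shows with a few lines of Python. This is just a sketch that assumes the file contains exactly the text "blue thief" with no trailing newline; for these characters ASCII and the Windows code pages give the same bytes.

```python
text = "blue thief"

# One byte per character; the space is an ordinary character with code 0x20.
encoded = text.encode("ascii")
hex_dump = encoded.hex(" ")
print(hex_dump)  # 62 6c 75 65 20 74 68 69 65 66

# Show each character next to its byte value.
for ch, byte in zip(text, encoded):
    print(repr(ch), format(byte, "02x"))
```

So 'b' is stored as 0x62, 'l' as 0x6c, and so on; the hex editor is simply printing each byte of the file as two hexadecimal digits.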
Generally, a hard distinction is made between text and binary files on Windows systems. On Unix/Linux systems, the distinction is fuzzier... you could argue that there is no distinction, in fact.
On Windows systems, the distinction is enforced by file extensions. All files with the extension ".TXT" are assumed to be text files (i.e., to contain only hex codes that represent visible onscreen characters, where "visible" includes whitespace).
Binary files are a whole different kettle of fish. Most, as you mention, include some sort of header describing how the data that follows is encoded. These headers can vary tremendously in size depending on the type of data (again, assumed to be indicated by the extension on Windows systems as well as Unix). A simple example is the WAV format for uncompressed audio. If you open a WAV file in your hex editor, you'll see that the first four bytes are "RIFF". This is a marker, often called a "magic number" even though it is readable as text, identifying the file as a RIFF container; the bytes "WAVE" a little further on indicate that the contents are audio. Newer versions of the WAV specification have complicated this somewhat, but the classic header for uncompressed PCM is just 44 bytes: the "RIFF" and "WAVE" tags, a "fmt " chunk giving the channel count, sample rate and bits per sample, and a "data" tag with the length of the samples that follow. (You can see this by comparing the raw data in a track on an audio CD to the WAV file created by ripping an uncompressed copy of that track at 44.1 kHz: the data should be the same, with just a header section added at the start of the WAV file.)
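As an illustration of "header followed by data", the following Python sketch builds and then parses the classic 44-byte header of an uncompressed PCM WAV file. The helper names build_wav and parse_wav_header are made up, and the sketch assumes the simplest layout ("RIFF", "WAVE", a 16-byte "fmt " chunk, then "data"); real-world files may carry extra chunks that it does not handle.

```python
import struct

# Classic PCM WAV header, all fields little-endian, 44 bytes in total.
WAV_HEADER = struct.Struct("<4sI4s4sIHHIIHH4sI")


def build_wav(samples: bytes, rate: int = 44100, channels: int = 2, bits: int = 16) -> bytes:
    """Prefix raw PCM samples with a minimal WAV header."""
    block_align = channels * bits // 8
    header = WAV_HEADER.pack(
        b"RIFF", 36 + len(samples), b"WAVE",
        b"fmt ", 16, 1, channels, rate, rate * block_align, block_align, bits,
        b"data", len(samples),
    )
    return header + samples


def parse_wav_header(blob: bytes) -> dict:
    """Read the fields back out of the first 44 bytes."""
    (riff, _size, wave, fmt, _fmt_len, audio_format, channels, rate,
     _byte_rate, _block_align, bits, data_tag, data_len) = WAV_HEADER.unpack_from(blob)
    if (riff, wave, fmt, data_tag) != (b"RIFF", b"WAVE", b"fmt ", b"data"):
        raise ValueError("not a canonical PCM WAV header")
    return {
        "audio_format": audio_format,  # 1 means uncompressed PCM
        "channels": channels,
        "sample_rate": rate,
        "bits_per_sample": bits,
        "data_bytes": data_len,
    }


wav = build_wav(b"\x00" * 8)
print(WAV_HEADER.size)  # 44
print(parse_wav_header(wav))
```

Everything after those 44 bytes is the sample data itself, which is why the ripped track and the raw CD audio match apart from the header.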
Executable files (compiled programs) are a special type of binary file, but they follow roughly the same scheme of a header followed by data in a prescribed format. In this case, though, the "data" is executable machine code, and the header indicates, among other things, what operating system the file runs on. (For example, Linux executables normally begin with the byte 0x7F followed by the characters "ELF".)

Appending data to a file from the Linux Kernel

I'm trying to gather measurements of cycle counts for a particular syscall (sys_clone) in the Linux kernel. That said, my process won't be the only one calling it and I can't know my pid ahead of time, so I'll have to record every invocation of it for every pid.
The problem that I've got is that the only ways I can figure out how to output this data (debugfs, sysfs, procfs) involve statically sized buffers, which will be quickly overwritten with irrelevant data from other processes calling sys_clone.
So, does anyone know how to append an arbitrary number of lines to a user-space-accessible file in Linux?
You can take the printk()/klogd approach and use a circular buffer that is exported via /proc. A user-space process blocks on reading your /proc file, and whatever it reads is removed from the buffer. In fact, you could check whether klogd/syslogd can be modified to also read your /proc file, so that you wouldn't need to implement the user-space part.
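The consume-on-read behaviour of such a buffer can be sketched in user-space Python. This is only a model of the idea, not kernel code: the class name RingLog is made up, and it leaves out the blocking read and the locking a real kernel implementation would need.

```python
from collections import deque
from typing import List


class RingLog:
    """Fixed-capacity log: the writer never blocks, the reader consumes entries."""

    def __init__(self, capacity: int) -> None:
        # With maxlen set, appending to a full deque drops the oldest entry.
        self._entries = deque(maxlen=capacity)

    def write(self, line: str) -> None:
        self._entries.append(line)

    def read(self) -> List[str]:
        """Return everything currently buffered and remove it from the buffer."""
        lines = list(self._entries)
        self._entries.clear()
        return lines


log = RingLog(capacity=3)
for i in range(5):
    log.write(f"event {i}")

print(log.read())  # ['event 2', 'event 3', 'event 4']
print(log.read())  # []
```

As long as the reader drains the buffer faster than it fills, nothing is lost; if it falls behind, the oldest entries are overwritten.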
If you are fine with something simpler, just printk() your information in a normalized form with some prefix, and then filter it out of your syslog using this prefix.
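The filtering side could look roughly like this in Python. The prefix "CLONE_CYCLES:" and the "pid=... cycles=..." layout are assumptions made up for this sketch (you would pick whatever format your printk() call emits), and the sample log lines are invented.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

PREFIX = "CLONE_CYCLES:"  # hypothetical prefix chosen for the printk() output
PATTERN = re.compile(re.escape(PREFIX) + r"\s*pid=(\d+)\s+cycles=(\d+)")


def collect_cycles(lines: Iterable[str]) -> Dict[int, List[int]]:
    """Group the cycle counts found in log lines by pid, ignoring other lines."""
    per_pid = defaultdict(list)
    for line in lines:
        match = PATTERN.search(line)
        if match:
            per_pid[int(match.group(1))].append(int(match.group(2)))
    return dict(per_pid)


# Invented sample lines standing in for syslog or dmesg output.
sample = [
    "[  102.331] CLONE_CYCLES: pid=412 cycles=18234",
    "[  102.340] usb 1-1: new high-speed USB device",
    "[  102.912] CLONE_CYCLES: pid=977 cycles=20110",
    "[  103.004] CLONE_CYCLES: pid=412 cycles=17990",
]
print(collect_cycles(sample))  # {412: [18234, 17990], 977: [20110]}
```

Because every record carries its pid, you can sort out your own process afterwards even though you could not know the pid in advance.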
There are a few more possibilities (e.g. using netlink to send messages to userspace), but writing to a file from the kernel is not something I'd recommend.
You could stash the counts in the right task_struct, and make it visible through a per-process file in /proc/<pid>/.
