The size of a PE Header - windows

Is there a way to find out the size of a PE Header without reading all of it or the entire file?

You can calculate the total size of the PE header like this:
sizeof(Signature) + sizeof(FileHeader) + sizeof(OptionalHeader) + sizeof(SectionTable)
The file header always has the same size but the OptionalHeader's size can differ, as can the section table size.
The OptionalHeader's size is stored in FileHeader.SizeOfOptionalHeader, and the section table size equals FileHeader.NumberOfSections * sizeof(IMAGE_SECTION_HEADER)
And some C code:
DWORD SizeOfPEHeader(const IMAGE_NT_HEADERS * pNTH)
{
    return (offsetof(IMAGE_NT_HEADERS, OptionalHeader)
            + pNTH->FileHeader.SizeOfOptionalHeader
            + (pNTH->FileHeader.NumberOfSections * sizeof(IMAGE_SECTION_HEADER)));
}
All you have to do is read the DOS header, get the PE offset (e_lfanew) and read PE.Signature + PE.FileHeader into memory. That's two reading operations of fixed size and you have all the info you need.
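The two fixed-size reads described above can be sketched in Python as follows. This is a minimal illustration, not part of the original answer: the function name size_of_pe_header and the constants are hypothetical, and it assumes a well-formed file whose DOS header is at least 64 bytes long.

```python
import io
import struct

SECTION_HEADER_SIZE = 40                 # sizeof(IMAGE_SECTION_HEADER)
SIGNATURE_AND_FILE_HEADER_SIZE = 4 + 20  # "PE\0\0" + IMAGE_FILE_HEADER


def size_of_pe_header(stream):
    """Return the PE header size (signature through section table) in bytes."""
    # Read 1: the 64-byte DOS header; e_lfanew is the 4-byte field at 0x3C.
    dos_header = stream.read(64)
    if len(dos_header) < 64 or dos_header[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (e_lfanew,) = struct.unpack_from("<I", dos_header, 0x3C)

    # Read 2: the PE signature plus the fixed-size file header.
    stream.seek(e_lfanew)
    head = stream.read(SIGNATURE_AND_FILE_HEADER_SIZE)
    if len(head) < SIGNATURE_AND_FILE_HEADER_SIZE or head[:4] != b"PE\0\0":
        raise ValueError("PE signature not found")
    (number_of_sections,) = struct.unpack_from("<H", head, 4 + 2)
    (size_of_optional_header,) = struct.unpack_from("<H", head, 4 + 16)

    return (SIGNATURE_AND_FILE_HEADER_SIZE
            + size_of_optional_header
            + number_of_sections * SECTION_HEADER_SIZE)
```

It takes any seekable binary stream, for example a file opened with open(path, "rb"), and never reads past the file header.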


what does it mean files overflow_xxxx.bin while training glove

I'm training a word embedding model using the GloVe method. While it runs, the program prints log output like:
$ build/cooccur -memory 4.0 -vocab-file vocab.txt -verbose 2 -window-size 8 < /home/ignacio/data/GUsDany/corpus/GUs_regulon_pubMed.txt > cooccurrence.bin
COUNTING COOCCURRENCES
window size: 8
context: symmetric
max product: 13752509
overflow length: 38028356
Reading vocab from file "vocab.txt"...loaded 145223095 words.
Building lookup table...table contains 228170143 elements.
Processing token: 5478600000
The home directory of GloVe is filling up with files named like overflow_0534.bin. Can someone tell me whether all is going well?
Thanks
Everything is OK.
You can view the source code of the GloVe cooccur program on GitHub.
At line 57 of the file:
long long overflow_length; // Number of cooccurrence records whose product exceeds max_product to store in memory before writing to disk
If your corpus produces more co-occurrence records than fit in the in-memory overflow buffer, the buffer is sorted and written out to temporary .bin files on disk:
while (1) {
    if (ind >= overflow_length - window_size) { // If overflow buffer is (almost) full, sort it and write it to temporary file
        qsort(cr, ind, sizeof(CREC), compare_crec);
        write_chunk(cr,ind,foverflow);
        fclose(foverflow);
        fidcounter++;
        sprintf(filename,"%s_%04d.bin",file_head,fidcounter);
        foverflow = fopen(filename,"w");
        ind = 0;
    }
The variable overflow_length depends on your memory settings.
Line 463:
if ((i = find_arg((char *)"-memory", argc, argv)) > 0) memory_limit = atof(argv[i + 1]);
Line 467:
rlimit = 0.85 * (real)memory_limit * 1073741824/(sizeof(CREC));
Line 470:
overflow_length = (long long) rlimit/6; // 0.85 + 1/6 ~= 1
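The arithmetic in those three lines can be checked against the log in the question. The sketch below assumes sizeof(CREC) is 16 bytes (two 4-byte ints plus one 8-byte double); that size is an assumption about the build, not something stated above. The function name is hypothetical.

```python
GIB = 1073741824  # the constant used in the source line for rlimit
CREC_SIZE = 16    # assumed sizeof(CREC): two 4-byte ints + one 8-byte double


def overflow_length(memory_limit_gb, crec_size=CREC_SIZE):
    """Mirror the rlimit / overflow_length computation from the C source."""
    rlimit = 0.85 * memory_limit_gb * GIB / crec_size
    # In C the cast binds before the division: ((long long) rlimit) / 6.
    return int(rlimit) // 6


print(overflow_length(4.0))  # 38028356
```

With -memory 4.0 this gives 38028356, which matches the "overflow length" line in the log, so each overflow file is written once roughly that many records have accumulated.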

How to determine the size of an PE executable file from headers and or footers

Assuming you have a stream of data or a block of bytes you want to carve, how can you determine the size of the executables?
There are numerous headers inside the PE executable format, but what header sections do I use to determine (if possible) the total length of the executable?
Here is a picture of the file format.
If the PE file is well formed, the calculation can be simplified as (pseudo-code):
size = IMAGE_NT_HEADERS.OptionalHeader.SizeOfHeaders
foreach section_header in section_headers:
size += section_header.SizeOfRawData
Where:
SizeOfHeaders is a member of IMAGE_OPTIONAL_HEADER structure.
(IMAGE_OPTIONAL_HEADER structure is part of IMAGE_NT_HEADERS)
SizeOfHeaders field gives the length of all the headers (note: including the 16-bit stub).
Each section header is an IMAGE_SECTION_HEADER structure
SizeOfRawData field gives the length of each section on disk.
Example with notepad (Windows 10):
SizeOfHeaders : 0x400
SizeOfRawData of each section:
.text: 0x15400
.data: 0x800
.idata: 0x1A00
.rsrc: 0x19C00
.reloc: 0x1600
(note: SizeOfRawData is called Raw Size in the below picture):
Sum everything:
>>> size_of_headers = 0x400
>>> sec_sizes = [0x15400, 0x800, 0x1a00, 0x19c00, 0x1600]
>>> size_of_headers + sum(sec_sizes)
207872
>>>
Total size: 207872 bytes.
Note: the above calculation doesn't take into account if the PE is badly formed or if there is an overlay.
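For carving from a block of bytes, the same idea can be sketched in Python. Note that this is a variation on the pseudo-code above, not the answer's exact method: instead of summing SizeOfRawData, it takes the furthest end of any section (PointerToRawData + SizeOfRawData). For a well-formed file with contiguous sections the two should agree. The function name is hypothetical, and overlays are still not accounted for.

```python
import struct


def pe_size_from_headers(data):
    """Estimate the on-disk size of a PE image from its headers only."""
    if data[:2] != b"MZ":
        raise ValueError("not an MZ executable")
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")

    file_header = e_lfanew + 4
    (number_of_sections,) = struct.unpack_from("<H", data, file_header + 2)
    (size_of_optional_header,) = struct.unpack_from("<H", data, file_header + 16)

    optional_header = file_header + 20
    # SizeOfHeaders sits at offset 60 in both the PE32 and PE32+ optional header.
    (size_of_headers,) = struct.unpack_from("<I", data, optional_header + 60)

    section_table = optional_header + size_of_optional_header
    end = size_of_headers
    for i in range(number_of_sections):
        entry = section_table + i * 40  # sizeof(IMAGE_SECTION_HEADER)
        size_of_raw_data, pointer_to_raw_data = struct.unpack_from(
            "<II", data, entry + 16)
        if size_of_raw_data:  # sections without raw data take no file space
            end = max(end, pointer_to_raw_data + size_of_raw_data)
    return end
```

Only the headers are read, so the function can be run on the first few kilobytes of a candidate region before deciding how many bytes to carve.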

Splitting file into parts by bits

Ok, so this is a unique question.
We are getting files (daily) from a company. These files are downloaded from their servers to ours (SFTP). The company that we deal with deals with a third party provider that creates the files (and reduces their size) to make downloads faster and also reduce file-size on their servers.
We download 9 files daily from the server, 3 groups of 3 files
Each group of files consists of 2 XML files and one "image" file.
One of these XML files gives us information on the 'image' file.
Information in the XML file we need:
offset: Gives us where a section of data starts
length: Used with offset, gives us the end of that section
count: Gives us the number of elements held in the file
The 'image' file itself is unusable until we split the file into pieces based on the offset and length of each image in the file. The images are basically concatenated together. We need to extract these images to be able to view them.
An example of offset, length and count values are as follows:
offset: 0
length: 2670
offset: 2670
length: 2670
offset: 5340
length: 2670
offset: 8010
length: 2670
count: 4
This means that there are 4 (count) items. The first count item begins at offset[0] and is length[0] in length. The second item begins at offset[1] and is length[1] in length, etc.
I need to split the images at these points and these points PRECISELY without room for error. The third party provider will not provide us with the code and we are to figure this out ourselves. The image file is not readable until it has been split, so it is essentially useless until then.
My question: Does anyone have a way of splitting files at a specific byte?
P.S. I do not have any code yet. I don't even know where to begin with this one. I am not new to coding, but I have never done file splitting by the byte.
I don't care which language this uses. I just need to make it work.
EDIT
The OS is Windows
You hooked me. Here's a rough Java method that can split a file based on offset and length. This requires at least Java 7 (for java.nio.file and try-with-resources).
A few of the classes used:
SeekableByteChannel
ByteBuffer
And an article I found helpful in producing this example.
import java.io.File;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/**
 * Method that splits the data provided in fileToSplit into outputDirectory based on the
 * collection of offsets and lengths provided in offsetAndLength.
 *
 * Example of input offsetAndLength:
 * Long[][] data = new Long[][]{
 *     {0L, 2670L},
 *     {2670L, 2670L},
 *     {5340L, 2670L},
 *     {8010L, 2670L}
 * };
 *
 * Output files will be placed in outputDirectory and named img0, img1... imgN
 *
 * @param fileToSplit
 * @param outputDirectory
 * @param offsetAndLength
 * @throws IOException
 */
public static void split(Path fileToSplit, Path outputDirectory, Long[][] offsetAndLength) throws IOException {
    try (SeekableByteChannel sbc = Files.newByteChannel(fileToSplit, StandardOpenOption.READ)) {
        for (int x = 0; x < offsetAndLength.length; x++) {
            // Index 0 of each pair is the offset, index 1 is the length.
            ByteBuffer buffer = ByteBuffer.allocate(offsetAndLength[x][1].intValue());
            sbc.position(offsetAndLength[x][0]);
            // A single read() may return fewer bytes than requested, so loop until full or EOF.
            while (buffer.hasRemaining() && sbc.read(buffer) != -1) { }
            buffer.flip();
            File img = new File(outputDirectory.toFile(), "img" + x);
            img.createNewFile();
            try (FileChannel output = FileChannel.open(img.toPath(), StandardOpenOption.WRITE)) {
                output.write(buffer);
            }
            buffer.clear();
        }
    }
}
I leave parsing the XML file to you.
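Since the question says any language will do, here is the same approach as a short Python sketch. The function name split_file and the img0, img1... naming simply mirror the Java method above. It assumes the offsets and lengths have already been extracted from the XML file.

```python
from pathlib import Path


def split_file(file_to_split, output_directory, offsets_and_lengths):
    """Write each (offset, length) byte range of file_to_split to img0, img1, ..."""
    output_directory = Path(output_directory)
    output_directory.mkdir(parents=True, exist_ok=True)
    written = []
    # Binary mode matters on Windows: text mode would translate line endings.
    with open(file_to_split, "rb") as source:
        for index, (offset, length) in enumerate(offsets_and_lengths):
            source.seek(offset)
            chunk = source.read(length)
            if len(chunk) != length:
                raise ValueError(f"range {index} runs past the end of the file")
            target = output_directory / f"img{index}"
            target.write_bytes(chunk)
            written.append(target)
    return written
```

For the example values in the question, the call would look like split_file("image.bin", "out", [(0, 2670), (2670, 2670), (5340, 2670), (8010, 2670)]), where image.bin and out are placeholder paths.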

reading file with UPC

I'm starting to learn UPC, and I have the following piece of code to read a file:
upc_file_t *fileIn;
int n;
fileIn = upc_all_fopen("input_small", UPC_RDONLY | UPC_INDIVIDUAL_FP , 0, NULL);
upc_all_fread_local(fileIn, &n, sizeof(int), 1, UPC_IN_ALLSYNC | UPC_OUT_ALLSYNC);
upc_barrier;
printf("%d\n", n);
upc_all_fclose(fileIn);
However, the output (value of n) is always 808651319, which means something is wrong, and I can't find what is it. The first line of the file I'm giving as input is '7', so the result of the printf should be 7...
Any idea why this happens?
Thanks in advance!
The UPC Parallel I/O library performs unformatted (binary) input/output, not formatted I/O like that of (f)printf(3)/(f)scanf(3) from the standard C library. Parallel I/O cannot handle text files because of their intrinsic properties, such as variable-length records.
upc_all_fread_local(fileIn, &n, sizeof(int), 1, UPC_IN_ALLSYNC | UPC_OUT_ALLSYNC)
behaves like the following call to the standard C library function for unformatted read from a file:
fread(&n, sizeof(int), 1, fh)
You are just reading 1 element of sizeof(int) bytes from the file (4 bytes on most platforms) into the address of n. The number you got 808651319 in hexadecimal is 0x30330A37. On little endian systems like x86/x64 this is stored in memory and on disk as 0x37 0x0A 0x33 0x30 (reversed byte order). These are the ASCII codes of the first 4 bytes of the string 7\n30 (\n or LF is the line feed/new line symbol) so I'd guess your input_small file looked like:
7
30...
...
You should prepare your input data in binary format using fwrite(3) instead of using (f)printf(3) or your text editor of choice.
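The byte arithmetic above is easy to check outside of UPC. The following Python sketch is only an illustration of the encoding, not UPC code. It reinterprets the text bytes as a little-endian integer and shows what a binary 4-byte 7 looks like instead.

```python
import struct

# First four bytes of a text file whose first lines are "7" and "30...".
raw = b"7\n30"
as_int = int.from_bytes(raw, "little")
print(as_int)       # 808651319, the value seen in the question
print(hex(as_int))  # 0x30330a37

# The same count stored as a little-endian 4-byte integer instead of text.
binary = struct.pack("<i", 7)
print(binary)                          # b'\x07\x00\x00\x00'
print(struct.unpack("<i", binary)[0])  # 7
```

A file that starts with those four binary bytes is what a 4-byte read into n expects on a little-endian machine.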

Interpreting valgrind error Invalid write of size 4

I was recently trying to track down some bugs in a program I am working on using valgrind, and one of the errors I got was:
==6866== Invalid write of size 4
==6866== at 0x40C9E2: superneuron::read(_IO_FILE*) (superneuron.cc:414)
the offending line # 414 reads
amplitudes__[points_read] = 0x0;
and amplitudes__ is defined earlier as
uint32_t * amplitudes__ = (uint32_t* ) amplitudes;
Now obviously a uint32_t is 4 bytes long, so this is the write size, but could someone tell me why it's invalid ?
points_read is most likely out of bounds: you're writing past (or before) the memory you allocated for amplitudes.
A typical mistake new programmers make that triggers this warning is:
struct a *many_a;
many_a = malloc(sizeof *many_a * size + 1);
and then trying to read or write the element at index size:
many_a[size] = ...;
Here the allocation should be:
many_a = malloc(sizeof *many_a * (size + 1));
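To see why the misplaced parentheses matter, the byte counts can be worked through in a few lines of Python. The element size of 4 and the count of 10 are hypothetical values chosen for illustration, as are the function names.

```python
def bytes_allocated_buggy(elem_size, size):
    return elem_size * size + 1    # sizeof *many_a * size + 1


def bytes_allocated_fixed(elem_size, size):
    return elem_size * (size + 1)  # sizeof *many_a * (size + 1)


def write_fits(allocated, elem_size, index):
    """True if element `index` lies entirely inside an `allocated`-byte block."""
    return index >= 0 and (index + 1) * elem_size <= allocated


elem_size, size = 4, 10  # hypothetical: 4-byte elements, 10 of them
print(bytes_allocated_buggy(elem_size, size))  # 41
print(bytes_allocated_fixed(elem_size, size))  # 44
print(write_fits(41, elem_size, size))         # False: bytes 40..43 overrun a 41-byte block
print(write_fits(44, elem_size, size))         # True
```

With the buggy expression only one extra byte is allocated, so a 4-byte write to index size spills three bytes past the block, which is the kind of access valgrind reports as an invalid write of size 4.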
