finding the section of a PE's entrypoint - portable-executable

I'm trying to find what section the PE entrypoint points to.
I have two questions:
Is it correct to say that this section is the one such that section.PointerToRawData < AddressOfEntryPoint < section.PointerToRawData + section.SizeOfRawData ?
I see some PE's that have AddressOfEntryPoint > total size of file. How is this possible? Is the AddressOfEntryPoint value wrong?
thanks

Is it correct to say that this section is the one such that
section.PointerToRawData < AddressOfEntryPoint <
section.PointerToRawData + section.SizeOfRawData ?
Not quite, the section you want should be the one such that: section.VirtualAddress < AddressOfEntryPoint < section.VirtualAddress+ section.VirtualSize
Then to find the position in the file, use: AddressOfEntryPoint - section.VirtualAddress + section.PointerToRawData

Related

what does it mean files overflow_xxxx.bin while training glove

I'm training a word embedding model based on Glove method. While the algorith shows a logger like:
$ build/cooccur -memory 4.0 -vocab-file vocab.txt -verbose 2 -window-size 8 < /home/ignacio/data/GUsDany/corpus/GUs_regulon_pubMed.txt > cooccurrence.bin
COUNTING COOCCURRENCES
window size: 8
context: symmetric
max product: 13752509
overflow length: 38028356
Reading vocab from file "vocab.txt"...loaded 145223095 words.
Building lookup table...table contains 228170143 elements.
Processing token: 5478600000
The home directory of Glove is filled with files caled overflow_0534.bin. Can someone tell whether all is going well?
Thanks
Everything is OK.
You can view the source code of Glove cooccur program at Github.
At the line 57 of the file:
long long overflow_length; // Number of cooccurrence records whose product exceeds max_product to store in memory before writing to disk
If your corpus has too many co-occurrence records, then there will be some data to be written into some temp bin disk files.
while (1) {
if (ind >= overflow_length - window_size) { // If overflow buffer is (almost) full, sort it and write it to temporary file
qsort(cr, ind, sizeof(CREC), compare_crec);
write_chunk(cr,ind,foverflow);
fclose(foverflow);
fidcounter++;
sprintf(filename,"%s_%04d.bin",file_head,fidcounter);
foverflow = fopen(filename,"w");
ind = 0;
}
The variable overflow_length depends on your memory settings.
Line 463:
if ((i = find_arg((char *)"-memory", argc, argv)) > 0) memory_limit = atof(argv[i + 1]);
Line 467:
rlimit = 0.85 * (real)memory_limit * 1073741824/(sizeof(CREC));
Line 470:
overflow_length = (long long) rlimit/6; // 0.85 + 1/6 ~= 1

Fast and furious random reading of huge files

I know that the question isn't new but I haven't found anything useful. In my case I have a 20 GB file and I need to read random lines from it. Now I have simple file index which contains line numbers and corresponding seek offsets. Also I disabled buffering when reading to read only the needed line.
And this is my code:
def create_random_file_gen(file_path, batch_size=0, dtype=np.float32, delimiter=','):
index = load_file_index(file_path)
if (batch_size > len(index)) or (batch_size == 0):
batch_size = len(index)
lines_indices = np.random.random_integers(0, len(index), batch_size)
with io.open(file_path, 'rb', buffering=0) as f:
for line_index in lines_indices:
f.seek(index[line_index])
line = f.readline(2048)
yield __get_features_from_line(line, delimiter, dtype)
The problem is that it's extremely slow: reading of 5000 lines takes 89 seconds on my Mac(here I point to ssd drive). There is code I used for testing:
features_gen = tedlium_random_speech_gen(5000) # just a wrapper for function given above
i = 0
for feature, cls in features_gen:
if i % 1000 == 0:
print("Got %d features" % i)
i += 1
print("Total %d features" % i)
I've read something about files memory mapping but I don't really understand how it works: how the mapping works in essence and will it speed up the process or no.
So the main question what are the possible ways to speed up the process? The only way I see now is to read randomly not every line but blocks of lines.

How to determine the size of an PE executable file from headers and or footers

Assuming you have a stream of data or a block of bytes you want to carve, how can you determine the size of the executables?
There are numerous headers inside the PE executable format, but what header sections do I use to determine (if possible) the total length of the executable?
Here is a picture of the file format.
If the PE file is well formed, the calculation can be simplified as (pseudo-code):
size = IMAGE_NT_HEADERS.OptionalHeader.SizeOfHeaders
foreach section_header in section_headers:
size += section_header.SizeOfRawData
Where:
SizeOfHeaders is a member of IMAGE_OPTIONAL_HEADER structure.
(IMAGE_OPTIONAL_HEADER structure is part of IMAGE_NT_HEADERS)
SizeOfHeaders field gives the length of all the headers (note: including the 16-bit stub).
Each section header is an IMAGE_SECTION_HEADER structure
SizeOfRawData field gives the length of each section on disk.
Example with notepad (Windows 10):
SizeOfHeaders : 0x400
SizeOfRawDataof each sections :
.text: 0x15400
.data: 0x800
.idata: 0x1A00
.rsrc: 0x19C00
.reloc: 0x1600
(note: SizeOfRawData is called Raw Size in the below picture):
Sum everything:
>>> size_of_headers = 0x400
>>> sec_sizes = [0x15400, 0x800, 0x1a00, 0x19c00, 0x1600]
>>> size_of_headers + sum(sec_sizes)
207872
>>>
Total size: 207872 bytes.
Verification:
Note: the above calculation doesn't take into account if the PE is badly formed or if there is an overlay.

Getting fortran runtime error: end of file

I have recently learned how to work with basic files in Fortran
and I assumed it was as simple as:
open(unit=10,file="data.dat")
read(10,*) some_variable, somevar2
close(10)
So I can't understand why this function I wrote is not working.
It compiles fine but when I run it it prints:
fortran runtime error:end of file
Code:
Function Load_Names()
character(len=30) :: Staff_Name(65)
integer :: i = 1
open(unit=10, file="Staff_Names.txt")
do while(i < 65)
read(10,*) Staff_Name(i)
print*, Staff_Name(i)
i = i + 1
end do
close(10)
end Function Load_Names
I am using Fortran 2008 with gfortran.
A common reason for the error you report is that the program doesn't find the file it is trying to open. Sometimes your assumptions about the directory in which the program looks for files at run-time will be wrong.
Try:
using the err= option in the open statement to write code to deal gracefully with a missing file; without this the program crashes, as you have observed;
or
using the inquire statement to figure out whether the file exists where your program is looking for it.
You can check when a file has ended. It is done with the option IOSTAT for read statement.
Try:
Function Load_Names()
character(len=30) :: Staff_Name(65)
integer :: i = 1
integer :: iostat
open(unit=10, file="Staff_Names.txt")
do while(i < 65)
read(10,*, IOSTAT=iostat) Staff_Name(i)
if( iostat < 0 )then
write(6,'(A)') 'Warning: File containts less than 65 entries'
exit
else if( iostat > 0 )then
write(6,'(A)') 'Error: error reading file'
stop
end if
print*, Staff_Name(i)
i = i + 1
end do
close(10)
end Function Load_Names
Using Fortran 2003 standard, one can do the following to check if the end of file is reached:
use :: iso_fortran_env
character(len=1024) :: line
integer :: u1,stat
open (newunit=u1,action='read',file='input.dat',status='old')
ef: do
read(u1,'A',iostat=stat) line
if (stat == iostat_end) exit ef ! end of file
...
end do ef
close(u1)
Thanks for all your help i did fix the code:
Function Load_Names(Staff_Name(65))!Loads Staff Names
character(len=30) :: Staff_Name(65)
integer :: i = 1
open(unit=10, file="Staff_Names.txt", status='old', action='read')!opens file for reading
do while(i < 66)!Sets Set_Name() equal to the file one string at a time
read(10,*,end=100) Staff_Name(i)
i = i + 1
end do
100 close(10)!closes file
return!returns Value
end Function Load_Names
I needed to change read(10,*) to read(10,*,END=100)
so it knew what to do when it came to the end the file
as it was in a loop I assume.
Then your problem was that your file was a row vector, and it was likely
giving you this error immediately after reading the first element, as #M.S.B. was suggesting.
If you have a file with a NxM matrix and you read it in this way (F77):
DO i=1,N
DO j=1,M
READ(UNIT,*) Matrix(i,j)
ENDDO
ENDDO
it will load the first column of your file in the first row of your matrix and will give you an error as soon as it reaches the end of the file's first column, because the loop enforces it to read further lines and there are no more lines (if N<M when j=N+1 for example). To read the different columns you should use an implicit loop, which is why your solution worked:
DO i=1,N
READ(UNIT,*) (Matrix(i,j), j=1,M)
ENDDO
I am using GNU Fortran 5.4.0 on the Ubuntu system 16.04. Please check your file if it is the right one you are looking for, because sometimes files of the same name are confusing, and maybe one of them is blank. As you may check the file path if it is in the same working directory.

The size of a PE Header

Is there a way to find out the size of a PE Header without reading all of it or the entire file?
You can calculate the total size of the PE header like this:
sizeof(Signature) + sizeof(FileHeader) + sizeof(OptionalHeader) + sizeof(SectionTable)
The file header always has the same size but the OptionalHeader's size can differ, as can the section table size.
The OptionalHeader's size is stored in FileHeader.SizeOfOptionalHeader, and the section table size equals FileHeader.NumberOfSections * sizeof(IMAGE_SECTION_HEADER)
And some C code:
DWORD SizeOfPEHeader(const IMAGE_NT_HEADERS * pNTH)
{
return (offsetof(IMAGE_NT_HEADERS, OptionalHeader) + pNTH->FileHeader.SizeOfOptionalHeader + (pNTH->FileHeader.NumberOfSections * sizeof(IMAGE_SECTION_HEADER)));
}
All you have to do is read the DOS header, get the PE offset (e_lfanew) and read PE.Signature + PE.FileHeader into memory. That's two reading operations of fixed size and you have all the info you need.

Resources