I got PE file having having two ".data" sections Bytes of section name is different. What type of this file can be? - portable-executable

i got two PE files having same sections named as ".data". These name contains different bytes when we see in hex dump. This sections is having 00 bytes in contents. What is this file type can be?

https://www.curlybrace.com/archive/PE%20File%20Structure.pdf
You can get all the details about section names here [PE file Structure]
And then decide yourself if the file is malicious or not.
Happy Overflowing :D

Normal compilers shouldn't produce two sections with identical names, so the likely explanation is that the binary was modified post-compilation. Such obvious modifications are typical (but not conclusive) of malware. Without further information, it's not possible to say much else.

Related

What is the rva and base address in the portable executable format?

I need help to understand these concepts.
I understand that the rva is an offset from the base address. But Its relative to what in a file? I understood it was from where the image will be loaded in memory, but in the executable file itself, an rva is relative to what? The beggining of the file, so the file Id at the start?
Thanks for reading :)
Yes, usually from the start of the file. There are probably a couple of exceptions when you get deeper into specific parts of a file. You will generally find them when reading the documentation:
MESSAGE_RESOURCE_BLOCK.OffsetToEntries:
The offset, in bytes, from the beginning of the MESSAGE_RESOURCE_DATA structure to the MESSAGE_RESOURCE_ENTRY structures in this MESSAGE_RESOURCE_BLOCK. The MESSAGE_RESOURCE_ENTRY structures contain the message strings.

What does a gcc output file look like and what exactly does it contain?

While compiling a c file, gcc by default compiles it to a file called "a.out". My professor said that the output file contains the binaries, but I when I open it I usually encounter unreadable text (VS Code says something like "This file contains unsupported text encoding").
I assumed that by 'binaries', I would be able to see literal zeroes and ones in the file but that does not seem to be the case. So what exactly does it output file look like or what exactly does it contain and what is 'text encoding'? Why can I not read it? What special characters might it contain? I'm aware of the fact that gcc first pre-processes, which means it removes all comments, expands all macros and copies the contents of any header files that might be included. You get the header file by running gcc -E <file_name>.c, then the this processed file is complied into assembly. Up to this point, the output files are readable, i.e., I can open them with VS Code, but after this the assembled code and the object file thereafter are human-unreadable.
For reference, I have no prior experience with programming or any language for that matter and this is my first CS related course in my first sem of college, and I apologize if this is too trivial of a question to ask.
I actually had the same confusion early on. Not about that file type specifically, but about binary vs text files.
After all aren't all files, even text ones binary? In the sense that all information is 1s and 0s? Well, yes, all information can be stored/transmitted as 1s and 0s, but that's not what binary/text files refer to.
It refers to what that information, the content of the file, those 1s and 0s represent.
In a text file the bytes encode characters. In a binary file the bits encode some information that is not text. The format and semantics of that information is completely free, it can mean anything and use whatever encoding scheme. It's up to the application that writes/reads the file to properly understand the bit patterns.
Most text editors (like VS Code) when open a file they treat it as a text file. I.e. they try to interpret the bit patterns as a text encoding scheme (e.g. ASCII or UTF-8) But not all bit patterns are valid ASCII/UTF-8 so that's why you get "unsupported text encoding".
If you want to inspect the actual 1s and 0 for both text and binary files you need to use a utility that shows you that, e.g. hex viewers/editors.

resolving pointer inside PE structure

I am analyzing PE structure.
some article in MSDN(http://msdn.microsoft.com/en-us/magazine/bb985997.aspx) says
"IMAGE_DIRECTORY_ENTRY_IMPORT" points to the imports(an array of IMAGE_IMPORT_DESCRIPTOR structures).
I checked the actual value with 010 Editor PE template.
however the value seemed to be encoded somehow and I don't know how to interpret.
pictures below clearly explains this situation problem.
some advice would be appreciated...!
I looked through the template, and it would appear that the "FOA" comments are generated by passing an RVA to the "RVA2FOA" function, which looks like it's converting the RVA to a file offset.
That makes sense, the file offset is something you often want to know (especially in a HEX editor, where you have to navigate by file offset), and FOA looks like it can be short for File Offset Something-or-other.

PE File Format - What is between the section table and the first section?

When looking at PE files in a hex editor, I often encountered some bytes between the section table and the first section, which doesn't really make sense to me. As far as I am concerned, there should be a 00-byte padding in order to fit the alignment. However, here is a screenshot which demonstrates the opposite:
As it turned out the highlighted block is pretty much the Bound Import Table. But I am still confused. Why is this table not located in a section? Is this always the case or is it just the specification of a certain compiler/linker? I did not find any documentation on this specific issue. Everything one can find on this topic basically says:
DOS MZ Header
DOS Stub
PE Header
Section Table
Section 1
Section 2
Section 3
... and so on
Before I encountered this issue I was not even aware of the fact, that there can be things outside of the sections (besides the ones i listed above, of course).
[EDIT]
Proof of concept (Since Mox did not believe me):
Data directories such as the IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT can exist outside of sections. Another example of a data directory existing outside of any known section would be the IMAGE_DIRECTORY_ENTRY_CERTIFICATE data directory which is the data directory used to store the certificate information when an executable is signed.
Data directories can point to data outside of a section, with-in a section, or they can point to the entire section. The IMAGE_DIRECTORY_ENTRY_RESOURCE data directory points to the entire ".rsrc" section. Certain data directories point to known sections and these are documented in the PE format specification by Microsoft.
Items like the bound import table can be written wherever the linker wants to put them in the raw image. It just overwrites the zero bytes with the table and makes the pointer correct in the data directory. You could probably even overwrite the middle of the DOS header or stub with the import table and it would work as long as the pointer in the directory was correct.
As far as I can see with LordPe, the IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT entry of iexplore.exe is empty.
both 32bit and 64bit versions of IEXPLORE.EXE don't have IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT entries.
Here a snaphot of LordPE, showing the 64bit version of IEXPLORE.EXE on a Windows 7 machine and (in green) the missing IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT entry:
It looks like you don't look at the right directory entry.

Is there a way to infer what image format a file is, without reading the entire file?

Is there a good way to see what format an image is, without having to read the entire file into memory?
Obviously this would vary from format to format (I'm particularly interested in TIFF files) but what sort of procedure would be useful to determine what kind of image format a file is without having to read through the entire file?
BONUS: What if the image is a Base64-encoded string? Any reliable way to infer it before decoding it?
Most image file formats have unique bytes at the start. The unix file command looks at the start of the file to see what type of data it contains. See the Wikipedia article on Magic numbers in files and magicdb.org.
Sure there is. Like the others have mentioned, most images start with some sort of 'Magic', which will always translate to some sort of Base64 data. The following are a couple examples:
A Bitmap will start with Qk3
A Jpeg will start with /9j/
A GIF will start with R0l (That's a zero as the second char).
And so on. It's not hard to take the different image types and figure out what they encode to. Just be careful, as some have more than one piece of magic, so you need to account for them in your B64 'translation code'.
Either file on the *nix command-line or reading the initial bytes of the file. Most files come with a unique header in the first few bytes. For example, TIFF's header looks something like this: 0x00000000: 4949 2a00 0800 0000
For more information on the TIFF file format specifically if you'd like to know what those bytes stand for, go here.
TIFFs will begin with either II or MM (Intel byte ordering or Motorolla).
The TIFF 6 specification can be downloaded here and isn't too hard to follow

Resources