How can an executable be this small in file size? (Windows)

I've been generating payloads with Metasploit and experimenting with the different templates; one of the templates you can output your payload as is exe-small. The payload I've been generating is windows/meterpreter/reverse_tcp. With the normal exe template the file size is around 72 KB, but exe-small outputs a payload of about 2.4 KB. Why is this? And how could I apply this to my own programming?

The smallest possible PE file is just 97 bytes - and it does nothing (it just returns).
The smallest runnable executable today is 133 bytes, because Windows requires kernel32 to be loaded; executing a PE file with no imports is not possible.
At that size it can already download a payload from the Internet by specifying a UNC path in the import table.
To achieve such a small executable, you have to:
implement it in assembler, mainly to get rid of the C runtime
decrease the file alignment, which is 1024 by default
remove the DOS stub that prints the message "This program cannot be run in DOS mode"
merge some of the PE parts into the MZ header
remove the data directory
The full description is available in a larger research blog post called TinyPE.
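To make the "get rid of the C runtime" step concrete, here is a minimal sketch of a CRT-free Windows program built with the MSVC toolchain. The entry-point name and the exact flags are my own illustrative choices (an x64 build is assumed, and alignment values this low will draw a linker warning); the hand-edited header tricks that TinyPE describes go well beyond what a stock linker will do for you.

// tiny.c - minimal sketch of a CRT-free program (illustrative, not TinyPE itself).
// Assumed build, x64 MSVC:
//   cl /c /O1 /GS- tiny.c
//   link tiny.obj kernel32.lib /NODEFAULTLIB /ENTRY:TinyMain /SUBSYSTEM:CONSOLE
//        /MERGE:.rdata=.text /FILEALIGN:16 /ALIGN:16
#include <windows.h>

// No C runtime: TinyMain is the raw entry point, so there is no argc/argv
// parsing, no global constructors and no atexit handling.
void TinyMain(void)
{
    static const char msg[] = "hello\n";
    DWORD written;
    WriteFile(GetStdHandle(STD_OUTPUT_HANDLE), msg, sizeof(msg) - 1, &written, NULL);
    ExitProcess(0);          // exit explicitly; there is no CRT to return into
}

Even without touching the headers by hand, dropping the CRT and the default alignment already shrinks the file dramatically; everything below that is manual header surgery.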

For EXEs this small, most of the space is typically used by the icon. An icon resource usually contains several sizes and color depths, which you could strip out if you don't mind an "old, rusty" icon, or no icon at all.
Signing the EXE also adds about 4 KB.
As an example of a small EXE, see Never10 by GRC. There is a details page which highlights the above points:
https://www.grc.com/never10/details.htm
In the last paragraph:
A final note: I'm a bit annoyed that “Never10” is as large as it is at 85 kbyte. The digital signature increases the application's size by 4k, but the high-resolution and high-color icons Microsoft now requires takes up 56k! So without all that annoying overhead, the app would be a respectable 25k. And, yes, of course I wrote it in assembly language.
Disclaimer: I am not affiliated with grc in any way.

There is little need for an executable to be big, except when it contains what I call code spam: code not actually critical to the functionality of the program/exe. This is valid for other files too. Look at a manually written HTML page compared to one written in FrontPage. That's spam code.
I remember my good old DOS files that were all a few KB in size and could perform practically any needed task in the OS. One of my .exes (actually a .com) was only 20 bytes in size.
Just think of it this way: just as a large majority of the files contained in a Windows OS can in some situations be removed and the OS will still function perfectly, it's the same with .exe files: large parts of the code are either useless, serve some purpose other than the program's actual objective, or are intentionally added (see below).
The peak of this aberration is the code added nowadays to the .exe files of some games that use advanced copy protection, which can make the files as large as dozens of MB. The code actually needed to run the game is practically under 10% of the total.
A file size of 72 KB, as in your example, can be quite sufficient to do practically anything to a Windows OS.
To apply this to your programming, i.e. to make very small .exes, keep things simple. Don't add unnecessary code just for the looks of it, or because you think you might use that part of the program/code at some point.

Related

Visual Studio embed large resource file (almost 4gb)

I am trying to embed a large resource file (almost 4 GB) - it's a .dat file. However, I am running into an issue where it throws an error:
"Error reading resource 'Sx64.x-none.dat' -- 'Specified argument was out of the range of valid values.'"
It appears there is a limitation on the size of an embedded resource in Visual Studio. Is there a way to increase the maximum size, or some other workaround for this? I am trying not to use a linked resource or have another file being copied around with the exe.
While in the PE format specification the SizeOfImage value is a 32-bit unsigned integer and can theoretically handle up to 4 GiB, in practice the limit for an executable file is lower. Another user here on Stack Overflow has tested this behavior. However, it is still possible to make a bigger executable work (on 64-bit Windows only), but the data must be kept outside of the image sections, at the end of the file, so the loader won't attempt to allocate it. This is bad practice, and I suggest, as others did in the comments, shipping it as a separate file along with your executable.
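If you do end up going the appended-data route despite that advice, the reading side can be fairly simple. Here is a sketch; the pack layout (payload bytes followed by an 8-byte little-endian length as the very last bytes of the file) is just a convention assumed for this example, not anything Visual Studio or the loader provides.

// Sketch: read a payload that was appended after the PE image, outside any section.
// Assumed layout: <original exe> <payload> <8-byte payload length>.
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>

unsigned char *LoadAppendedPayload(size_t *outSize)
{
    char path[MAX_PATH];
    GetModuleFileNameA(NULL, path, MAX_PATH);        // path of the running exe

    FILE *f = fopen(path, "rb");
    if (!f) return NULL;

    unsigned long long size = 0;
    _fseeki64(f, -8, SEEK_END);                      // the last 8 bytes hold the length
    fread(&size, sizeof(size), 1, f);

    unsigned char *buf = (unsigned char *)malloc((size_t)size);
    _fseeki64(f, -(long long)(size + 8), SEEK_END);  // rewind to the start of the payload
    fread(buf, 1, (size_t)size, f);
    fclose(f);

    *outSize = (size_t)size;
    return buf;
}

For data approaching 4 GB you would want a 64-bit build, and probably memory mapping rather than reading the whole payload into one allocation.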

Is dual mode executable possible?

A bit of history... I have 3 systems that I spend time on, a DOS 6.22 system, a Windows 95 system, and a modern Windows 7 (64-bit) system. When I upgraded to Win7-64, some of my favorite command line utilities stopped working, so I decided to re-write them myself. The only 2 compilers I have are Borland Turbo C++ 3.0 and Visual Studio 2008, and they worked fine for building 2 versions, a DOS 16-bit, and a Windows 7 32-bit (could have built 64-bit too, I guess.) The problem came with my Win95 system. The DOS version works fine there, but since I spent the time to support LFNs in the Win7 build, I wanted it with my Win95 system. So, after a lot of research, I found and purchased Visual Studio 6 (last one with Win95 support according to what I researched,) copied the code over (had to rewrite sections, of course,) and it compiled just fine, and works :)
The problem occurred the next time I had to boot my Win95 system in DOS mode. The program stopped working (of course,) because Win95 wasn't loaded. I don't really want to have 2 copies of the program installed (needing 2 different file names,) so I was hoping there was a way to link the 2 versions together into one file. If I execute it in DOS, instead of it saying it requires windows, it would just jump to the DOS section of the program. That way, it would be a single program, with LFN support if Win95 is loaded, and without if Win95 isn't loaded. Since the Win95 version also works fine in Win7-64, it would probably also produce a single version that works on all 3 systems (which would be an added bonus.)
I did some web searches, and couldn't find anything germane to what I'm looking for. So I have no idea if it is even possible. I may have to get yet another compiler, but considering how old it would have to be, I could probably afford it. My web searches did result in information that leads me to believe that it "should" be possible, though. It would just require a different exe header than the one Windows compilers put in. It may require that I re-write the DOS version for 32-bit and use a DOS extender (for protected mode, assuming I can't find a way to include it in the file itself.) That would be acceptable (though not ideal.) I would much rather have 16-bit code in the DOS section, and 32-bit code in the Windows section (for the most compatibility.)
Does anyone have any information about something like this? If you could just point me in the right direction it would be greatly appreciated.
I don't know if it has been continued in Windows 7 executables, but back in Win95 the executable (EXE) actually had two entry points -- one "normal" one that DOS would find, and a second one that Windows would use. The DOS entry point was usually a very simple default that would just print "This is a Windows program" and exit. You can actually override this default, and have the linker use your own code, however it is very limited.
What I'd recommend doing is add logic to your DOS 6.22 version (e.g. "sed") that would check the OS level & if it meets the right criteria, pass the parameters along to a second executable (e.g. "sedx") that uses features from the "newer" OS.
The documentation for Visual Studio 6 describes the /STUB linker option; simply point it at the DOS version of your program.
I don't have VS6 handy, so I can't be too specific, but in the project settings GUI, there should be an "additional options" setting in the linker section.
Well, the answer is the /STUB option in the linker you are using for your Windows code. Some additional information for anyone who finds this question later: I had to do several days of web searches to find that there doesn't appear to be another answer to my particular problem.
/STUB requires that the DOS-mode executable have a header of at least 40 bytes. After fighting with multiple compilers that do give you a header of the right size (Borland Turbo C++ won't), and not being able to convert my code, I had to get sneaky/fancy. BTW - Visual C 1.52c (the last Visual C that supports DOS) will make a correct header, as will Open Watcom.
If you are faced with the same issue I was - the compiler you used won't make the correct size header, and your code is too compiler-specific to convert easily - you can do what I ended up doing. I used Open Watcom to write a tiny ("Hello World") Windows program using my exe with the short (Borland-created) header as the stub. Open Watcom will adjust the header automatically. I then used a hex editor to read the header information to get the ending address of the stub, and a partial file copier to copy only that part of the program to a file I named "stub.exe" (stripping off the Windows code). Using the same hex editor I zeroed out the PE pointer in the header. I now had a working DOS exe that would also work as a stub. Took my stub to my Windows compiler, and linked it in. It works great, all features fully realized :)
FYI - information needed to strip the Windows portion and zero the PE pointer:
The first byte is offset 0 (of course, but some people may not realize that and think it's byte 1). Also remember that most hex editors (by their very name) are giving you numbers in hexadecimal format.
Offsets 2 & 3: the number of bytes in the last block of the DOS portion of the file, in low byte / high byte format. That is, offset 2 is low, 3 is high. So take them, reverse them, and you will get a number from 0 - 511 (0 - 1FF in hex). 0 means the entire block of 512 (200 in hex) bytes is used.
Offsets 4 & 5 (again in low/high format): the number of 512 (200 in hex) byte blocks in the DOS portion. Remember to reverse the number, and that the last block may only be a partial block. So, subtract one, multiply by 512 (200 hex), add the number from offsets 2-3, and you have how many bytes are in the DOS portion. Since you are starting from 0, subtract 1, and you now know to only copy bytes 0 - "whatever the total is" to your stub exe.
Offsets 60-63 (hex 3C-3F) hold the pointer to the start of the PE (Portable Executable) portion of the code, the part that Windows jumps to; in a small file only the first two of those bytes will be non-zero. It should be just past the end of the DOS portion of the code (mine was padded with a few zeroes). This isn't important at this time, as we are just turning those bytes into 0's anyway (the PE portion has been stripped). You can use this as confirmation that you have the correct "end of DOS" offset selected, though.
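If you would rather script this than do it by hand in a hex editor, here is a small C sketch of the same arithmetic (my own illustration, not part of the workflow above):

// Print the size of the DOS portion and the PE header offset of an EXE,
// using the MZ header fields described above.
#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc < 2) { fprintf(stderr, "usage: %s file.exe\n", argv[0]); return 1; }

    FILE *f = fopen(argv[1], "rb");
    if (!f) return 1;
    unsigned char hdr[64];
    fread(hdr, 1, sizeof(hdr), f);
    fclose(f);

    // Offsets 2-3: bytes used in the last 512-byte block (0 means the whole block is used).
    unsigned lastBlockBytes = hdr[2] | (hdr[3] << 8);
    // Offsets 4-5: number of 512-byte blocks, counting the (possibly partial) last one.
    unsigned blockCount = hdr[4] | (hdr[5] << 8);
    // Offset 0x3C: pointer to the PE header (zero this out when stripping the Windows part).
    unsigned peOffset = hdr[0x3C] | (hdr[0x3D] << 8) |
                        ((unsigned)hdr[0x3E] << 16) | ((unsigned)hdr[0x3F] << 24);

    unsigned dosSize = (blockCount - 1) * 512 + (lastBlockBytes ? lastBlockBytes : 512);
    printf("DOS portion: %u bytes (copy bytes 0 to %u)\n", dosSize, dosSize - 1);
    printf("PE header offset (e_lfanew): 0x%X\n", peOffset);
    return 0;
}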
The tools I used are:
Open WatCom at http://www.openwatcom.org/index.php/Main_Page
and
Part Copy at http://www.virtualobjectives.com.au/utilitiesprogs/partcopy.htm
I have no idea where to find the Hex Editor I used. I used CEdit, a DOS program I really like, but have been unable to find on the net. Have to use DOSBox with it as Win7 won't run it, though. There are probably other compilers that do the same thing, and probably tons of partial file copiers available. These are the tools I used.

Generating a PE format executable

I'm trying to generate a PE format executable; I'm at the stage where I have something that dumpbin is happy with, and that as far as I can tell is not materially different from an empty program linked with Microsoft's linker, but Windows still rejects it (see "PE file - what's missing?").
If I had some algorithm for generating a valid PE file, maybe I could hill climb from there. Here's what I've found so far:
There's plenty of documentation, sample code and tools for reading PE files, as opposed to generating them from scratch.
PE Bliss lists generation among its features, but won't compile.
Sample assembly language templates for PE file generation concentrate on minimizing size. The most promising looking one generates a file that Windows rejects even though as far as I can see it should be accepted; the one I found that did work, ironically, generates a file that Windows accepts even though as far as I can see it should be rejected, since almost every nominally essential component is missing or malformed.
Is there any sample code available that generates a correct PE file?
Here's the classic page about generating PE from scratch:
http://www.phreedom.org/research/tinype
As for the generic list of required/optional parts, see the corkami page on the PE format:
http://code.google.com/p/corkami/wiki/PE
See also the code tree for many examples of small PE files, generated completely from scratch.

Should I deal with files longer than MAX_PATH?

Just had an interesting case.
My software reported back a failure caused by a path being longer than MAX_PATH.
The path was just a plain old document in My Documents, e.g.:
C:\Documents and Settings\Bill\Some Stupid FOlder Name\A really ridiculously long file thats really very very very..........very long.pdf
Total length 269 characters (MAX_PATH==260).
The user wasn't using an external hard drive or anything like that. This was a file on a Windows-managed drive.
So my question is this. Should I care?
I'm not saying can I deal with the long paths, I'm asking should I. Yes, I'm aware of the "\\?\" Unicode hack on some Win32 APIs, but it seems this hack is not without risk (as it changes the way the APIs parse paths) and also isn't supported by all APIs.
So anyway, let me just state my position/assertions:
First, presumably the only way the user was able to break this limit is if the app she used uses the special Unicode hack. It's a PDF file, so maybe the PDF tool she used uses this hack.
I tried to reproduce this (by using the Unicode hack) and experimented. What I found was that although the file appears in Explorer, I can do nothing with it. I can't open it, I can't choose "Properties" (Windows 7). Other common apps can't open the file (e.g. IE, Firefox, Notepad). Explorer will also not let me create files/dirs which are too long - it just refuses. Ditto for the command line tool cmd.exe.
So basically, one could look at it this way: a rogue tool has allowed the user to create a file which is not accessible by a lot of Windows (e.g. Explorer). I could take the view that I shouldn't have to deal with this.
(As an aside, this isn't a vote of approval for a short max path length: I think 260 chars is a joke, I'm just saying that if the Windows shell and some APIs can't handle > 260, then why should I?)
So, is this a fair view? Should I say "Not my problem"?
UPDATE: Just had another user with the same problem. This time an mp3 file. Am I missing something? How can these users be creating files that violate the MAX_PATH rule?
It's not a real problem. NTFS supports filenames up to 32K (32,767 wide characters). You only need to use the correct API and the correct filename syntax. The basic rule is: the filename should start with "\\?\" (see http://msdn.microsoft.com/en-us/library/aa365247(v=VS.85).aspx), like \\?\C:\Temp. You can use the same syntax with UNC paths: \\?\UNC\Server\share\Path. It is important to understand that you can use only a subset of the API functions this way. For example, look at the MSDN descriptions of functions such as
CreateFile
CreateDirectory
MoveFile
and so on
you will find text like:
In the ANSI version of this function, the name is limited to MAX_PATH characters. To extend this limit to 32,767 wide characters, call the Unicode version of the function and prepend "\\?\" to the path. For more information, see Naming a File.
These functions you can use safely. If you have a file handle from CreateFile, you can use all the other functions that take an hFile (ReadFile, WriteFile, etc.) without any restriction.
If you write a program like a virus scanner or backup software, or other software running on a server, you should write your program so that all file operations support filenames up to 32K characters, not MAX_PATH characters.
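As a concrete illustration of the above, a minimal sketch using the Unicode API with the "\\?\" prefix might look like this (the path is of course made up):

// Open a file through the \\?\ prefix, which lifts the MAX_PATH limit
// (up to roughly 32,767 wide characters). Note that the prefix also turns
// off path normalization, so pass fully qualified paths only.
#include <windows.h>
#include <stdio.h>

int main(void)
{
    const wchar_t *path = L"\\\\?\\C:\\Some\\Very\\Deep\\Folder\\file.txt";   // example path

    HANDLE h = CreateFileW(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        wprintf(L"CreateFileW failed: %lu\n", GetLastError());
        return 1;
    }
    // Once you have the handle, ReadFile/WriteFile etc. work as usual.
    CloseHandle(h);
    return 0;
}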
This limitation is baked into a lot of software written in C or C++, including MSFT code, although they've been chipping away at it. It is only partly a Win32 limitation: there is still a hard upper limit on the length of a file name (not the path) through WIN32_FIND_DATA, for example. That is one reason that even .NET has length restrictions. This is not going away any time soon; Win32 is still going strong, and the stone-age C string won't disappear.
Your customer will have little sympathy with it, no doubt, probably until you can show them another program that fails the same way. Do, however, make sure that your code can reliably detect the potential string buffer overflow and produce a reasonable diagnostic. No sympathy for programs bombing on heap corruption.
As you mentioned, many of the Windows Shell functions only work on paths up to MAX_PATH. Windows XP and, I believe, Vista both have problems in Explorer when nesting directories with long file names. I've not checked Windows 7 - perhaps they have fixed that. This unfortunately means that users have a hard time browsing these files.
If you really wish to support long paths, you'll need to check any functions you are using in Shell32.dll that take paths to ensure they support long paths. For those that don't, you'll have to write them yourself using Kernel32 functions.
If you decide to use Shell32 and be limited to MAX_PATH, writing your code to support long file paths would still be advisable. If Microsoft later changes Shell32 (or creates an alternative), you will be better positioned to add support for it.
Just to add another couple of dimensions to the problem, remember that filenames are UTF-16, and you may encounter non-NTFS or non-FAT filesystems that may be case-sensitive!
Your own APIs should not hard-code a fixed limit on the path length (or any other hard limits); however, you shouldn't violate the preconditions of the system APIs in order to accomplish some task. IMHO, the fact that Windows limits the length of path names is absurd and should be considered a bug. That said, no, I would suggest you not attempt to use the various system APIs other than as documented, even if that results in certain undesirable behavior such as limits on the maximum path length. So, in short, your view is completely fair: if the OS doesn't support it, then the OS doesn't support it. That said, you may want to make it clear to users that this is a limitation of Windows and not of your own code.
One easy way these files with long paths can be created, even by software that does not support paths longer than MAX_PATH, is through a file share.
Example:
"C:\My veeeeeeeeeeeeeeeeeeeeery looooooooooooooooooong folder" could be shared as "data". Users could then access that folder through the UNC path \\computer\data or (even shorter) through a drive letter (M:\) assuming that M: is mapped to \\computer\data.
This often happens on file servers.
Paths can often be longer than 260; one example would be when symlinks get nested and repeat over and over, sometimes even on purpose. I think programmers should think about whether they want their program to handle these insanely large paths or not. IMO, 260 is PLENTY of space, but that's just me. My answer to this is:
if you have to ask yourself so deeply about breaking the 260-char limit, then that's probably what you should do. We often look for confirmation when we are about to do something that we are unsure about...
I think the maximum path anywhere in the API is about 32k long, but that's up to you. Back in the day that was a pretty big chunk of change (half of an entire memory segment!! sheesh!) but nowadays, in the segment-transparent addressing environment in which we live, where all memory is heaped together on the flat, 32k is nothin'... AFAIK paths wouldn't need to be that long unless you are using some fancy Unicode language that requires lots of other characters, etc., etc. We could blab about this all day but you get the idea. I hope this helps..... or hurts?
I am doing some C programming and I was searching for a way to get the maximum length of a given filename. After a search for MAX_PATH I stumbled onto this thread, and after some thought on the matter and after reading the comments here, I have come to the following conclusion.
I understand that NTFS supports filenames up to 32,767 characters in length. However, to my knowledge FAT16 only supports 11-character filenames, 8 + 3, so in reality operating systems should have a function which our program can call to determine the maximum filename size, simply because all filesystems have different limitations, including the length of the filename.
So the end conclusion must be that since we, the developers, don't know which filesystem the data is going to be stored on, the only solution must be a trial-and-error method.
Not strictly an answer to your specific question, but it might help those who do need to handle long file names.
The Delimon library is a .NET Framework 4 based library on Microsoft TechNet for overcoming the long filenames problem:
Delimon.Win32.IO Library (V4.0).
It has its own versions of key methods from System.IO. For example, you would replace:
System.IO.Directory.GetFiles
with
Delimon.Win32.IO.Directory.GetFiles
which will let you handle long files and folders.
From the website:
Delimon.Win32.IO replaces basic file functions of System.IO and supports File & Folder names up to 32,767 characters.
This Library is written on .NET Framework 4.0 and can be used either on x86 & x64 systems. The File & Folder limitations of the standard System.IO namespace can work with files that have 260 characters in a filename and 240 characters in a folder name (MAX_PATH is usually configured as 260 characters). Typically you run into the System.IO.PathTooLongException Error with the Standard .NET Library.

Are there alternatives for creating large container files that are cross platform?

Previously, I asked this question.
The problem is that the demands of our file structure are very high.
For instance, we're trying to create a container with up to 4,500 files and 500 MB of data.
The file structure of this container consists of
SQLite DB (under 1mb)
A text-based, XML-like file
Images inside a dynamic folder structure that make up the rest of the 4,500ish files
After the initial creation, the image files are read-only, with the exception of deletion.
The small db is used regularly when the container is accessed.
Tar, Zip and the like are all too slow (even with 0 compression). Slow is subjective, I know, but untarring a container of this size takes over 20 seconds.
Any thoughts?
As you seem to be doing arbitrary file system operations on your container (say, creation, deletion of new files in the container, overwriting existing files, appending), I think you should go for some kind of file system. Allocate a large file, then create a file system structure in it.
There are several options for the file system available: for both Berkeley UFS and Linux ext2/ext3, there are user-mode libraries available. It might also be possible that you find a FAT implementation somewhere. Make sure you understand the structure of the file system, and pick one that allows for extending - I know that ext2 is fairly easy to extend (by another block group), and FAT is difficult to extend (need to append to the FAT).
Alternatively, you can put a virtual disk format yet below the file system, allowing arbitrary remapping of blocks. Then "free" blocks of the file system don't need to appear on disk, and you can allocate the virtual disk much larger than the real container file will be.
Three things.
1) What Timothy Walters said is right on; I'll go into more detail.
2) 4,500 files and 500 MB of data is simply a lot of data and disk writes. If you're operating on the entire dataset, it's going to be slow. Just I/O truth.
3) As others have mentioned, there's no detail on the use case.
If we assume a read only, random access scenario, then what Timothy says is pretty much dead on, and implementation is straightforward.
In a nutshell, here is what you do.
You concatenate all of the files into a single blob. While you are concatenating them, you track their filename, the file length, and the offset at which the file starts within the blob. You write that information out into a block of data, sorted by name. We'll call this the Table of Contents, or TOC block.
Next, you concatenate the two blocks together. In the simple case, you have the TOC block first, then the data block.
When you wish to get data from this format, search the TOC for the file name, grab the offset from the beginning of the data block, add the TOC block size, and read FILE_LENGTH bytes of data. Simple.
If you want to be clever, you can put the TOC at the END of the blob file. Then, append at the very end, the offset to the start of the TOC. Then you lseek to the end of the file, back up 4 or 8 bytes (depending on your number size), take THAT value and lseek even farther back to the start of your TOC. Then you're back to square one. You do this so you don't have to rebuild the archive twice at the beginning.
If you lay out your TOC in blocks (say 1K byte in size), then you can easily perform a binary search on the TOC. Simply fill each block with the File information entries, and when you run out of room, write a marker, pad with zeroes and advance to the next block. To do the binary search, you already know the size of the TOC, start in the middle, read the first file name, and go from there. Soon, you'll find the block, and then you read in the block and scan it for the file. This makes it efficient for reading without having the entire TOC in RAM. The other benefit is that the blocking requires less disk activity than a chained scheme like TAR (where you have to crawl the archive to find something).
I suggest you pad the files to block sizes as well; disks like to work with regularly sized blocks of data. This isn't difficult either.
Updating this without rebuilding the entire thing is difficult. If you want an updatable container system, then you may as well look in to some of the simpler file system designs, because that's what you're really looking for in that case.
As for portability, I suggest you store your binary numbers in network order, as most standard libraries have routines to handle those details for you.
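To make the TOC idea a little more concrete, here is a simplified sketch of the lookup side, with the TOC stored at the front of the blob. The entry layout, the 256-byte name field and the function name are assumptions for illustration, not a defined format; a portable version would also store the numeric fields in network order, as noted above.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char     name[256];   // file name, NUL-padded; entries are sorted by name
    uint64_t offset;      // offset of the file data from the start of the data block
    uint64_t length;      // file length in bytes
} TocEntry;

// Read one file out of the container. In a real container the entry count would
// live in a small fixed header; here it is simply passed in.
unsigned char *ReadEntry(FILE *blob, uint64_t tocCount, const char *wanted, uint64_t *outLen)
{
    uint64_t dataBase = tocCount * sizeof(TocEntry);   // data block starts right after the TOC

    // Binary search over the sorted, fixed-size TOC entries.
    uint64_t lo = 0, hi = tocCount;
    while (lo < hi) {
        uint64_t mid = lo + (hi - lo) / 2;
        TocEntry e;
        fseek(blob, (long)(mid * sizeof(TocEntry)), SEEK_SET);
        fread(&e, sizeof(e), 1, blob);

        int cmp = strcmp(wanted, e.name);
        if (cmp == 0) {
            unsigned char *buf = (unsigned char *)malloc((size_t)e.length);
            fseek(blob, (long)(dataBase + e.offset), SEEK_SET);
            fread(buf, 1, (size_t)e.length, blob);
            *outLen = e.length;
            return buf;
        }
        if (cmp < 0) hi = mid; else lo = mid + 1;
    }
    return NULL;   // not found
}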
Working on the assumption that you're only going to need read-only access to the files, why not just merge them all together and have a second "index" file (or an index in the header) that tells you the file name, start position and length? All you need to do is seek to the start point and read the correct number of bytes. The method will vary depending on your language, but it's pretty straightforward in most of them.
The hardest part then becomes creating your data file + index, and even that is pretty basic!
An ISO disk image might do the trick. It should be able to hold that many files easily, and is supported by many pieces of software on all the major operating systems.
First, thank-you for expanding your question, it helps a lot in providing better answers.
Given that you're going to need a SQLite database anyway, have you looked at the performance of putting it all into the database? My experience is based around SQL Server 2000/2005/2008 so I'm not positive of the capabilities of SQLite but I'm sure it's going to be a pretty fast option for looking up records and getting the data, while still allowing for delete and/or update options.
Usually I would not recommend putting files inside the database, but given that the total size of all images is around 500 MB for 4,500 images, you're looking at a little over 100 KB per image, right? If you're using a dynamic path to store the images, then in a slightly more normalized database you could have an "ImagePaths" table that maps each path to an ID; then you can look for images with that PathID and load the data from the BLOB column as needed.
The XML file(s) could also be in the SQLite database, which gives you a single 'data file' for your app that can move between Windows and OS X without issue. You can simply rely on your SQLite engine to provide the performance and compatibility you need.
How you optimize it depends on your usage, for example if you're frequently needing to get all images at a certain path then having a PathID (as an integer for performance) would be fast, but if you're showing all images that start with "A" and simply show the path as a property then an index on the ImageName column would be of more use.
I am a little concerned, though, that this sounds like premature optimization. You really need to find a solution that works 'fast enough', abstract the mechanics of it so your application (or both apps, if you have both Mac and PC versions) uses a simple repository or similar, and then you can change the storage/retrieval method at will without any impact on your application.
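For what it's worth, pulling a single image back out of such a SQLite container with the C API is straightforward. A sketch might look like this; the Images(PathID, ImageName, Data) schema is a hypothetical example, not something from the question.

#include <sqlite3.h>
#include <stdlib.h>
#include <string.h>

// Fetch one image BLOB by path ID and name; the caller frees the returned buffer.
unsigned char *LoadImage(sqlite3 *db, int pathId, const char *name, int *outLen)
{
    sqlite3_stmt *stmt = NULL;
    unsigned char *copy = NULL;

    if (sqlite3_prepare_v2(db,
            "SELECT Data FROM Images WHERE PathID = ? AND ImageName = ?",
            -1, &stmt, NULL) != SQLITE_OK)
        return NULL;

    sqlite3_bind_int(stmt, 1, pathId);
    sqlite3_bind_text(stmt, 2, name, -1, SQLITE_STATIC);

    if (sqlite3_step(stmt) == SQLITE_ROW) {
        const void *blob = sqlite3_column_blob(stmt, 0);
        int len = sqlite3_column_bytes(stmt, 0);
        copy = (unsigned char *)malloc(len);   // copy out before finalize invalidates the pointer
        memcpy(copy, blob, len);
        *outLen = len;
    }
    sqlite3_finalize(stmt);
    return copy;
}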
Check Solid File System - it seems to be what you need.
