I am in the process of learning things in reverse order for fun, and I have decided to dissect Windows 10, bit by bit, and learn what makes a great OS function. I also suppose that my question will be relevant in other ways as well.
My question is: how do I properly look at something like the Windows bootmgr source code? I have opened the file (whose file type is, redundantly, shown as "File"), and even though it is in Assembly language, it is completely impossible to read. My guess is that whoever wrote the File did something to encrypt it so that it is unreadable, and thus unchangeable/unable to be edited.
Let me be perfectly clear: my purpose is not to change the bootmgr File to change windows, but rather to get a better understanding of how an OS works via reading, and also through trial and error.
Any help that anyone can give would be greatly appreciated. I love to learn about these things, and I have been completely unable to find the answer I am looking for on any site thus far, including this one... I don't know if I need to refine my searches or what.
Thanks in advance for your help. :)
PS: I shall include a picture of what I am seeing in Notepad++ so you can get a better understanding of what I need here.
I think you may be confusing assembly language with machine code. Machine code is the language that your computer's processor understands. Assembly language is a series of symbols that are used to represent machine code. Compiled executables are stored in machine code.
That said, the standard way to view the machine code for a compiled binary is through the use of a program called a hex editor. A hex editor will display the binary code in a numerical format, rather than attempting to interpret the binary as text, like your editor is trying to do in the screenshot you supplied. Frhed is a popular hex editor, but there are many good ones to choose from.
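To make the hex-editor idea concrete, here is a minimal Python sketch (the `hex_dump` name is just for illustration, not part of any tool) that prints an offset, the hex value of each byte, and a printable-ASCII column, the way most hex editors lay things out. Bytes outside printable ASCII are shown as dots, which is why a binary file that looks like noise in Notepad++ turns into a regular, readable grid here.

```python
import sys


def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes like a hex editor: offset, hex values, then printable ASCII."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        # Non-printable bytes become '.', as in most hex editors
        ascii_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08X}  {hex_part:<{width * 3 - 1}}  {ascii_part}")
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Show the first 256 bytes of the file named on the command line
    with open(sys.argv[1], "rb") as f:
        print(hex_dump(f.read(256)))
```

Running it as `python hexdump.py somefile` on a Windows executable should show the `MZ` signature (bytes `4D 5A`) at offset 0.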
So I've recently been looking into the Control Panel to see how I might create a custom applet for it, like the custom ones you sometimes get with a printer, and I just can't figure out how to make one. I've tried opening one of them in a code editor, and I assume they are compiled, since all I get is a bunch of random characters, but I'm not quite sure. I've looked for anything related to it, but the closest thing to an answer I have found is about making the applet show up; it doesn't say how the applet is made, so it's not really useful.
Thanks in advance.
There are two types of control panel applets:
.Exe files. These are normal applications and can be written in any language.
.Cpl files. These are actually normal .DLL files and can be written in anything that can produce a PE DLL with a named exported function, CPlApplet (C/C++, Delphi or, if you must, C#).
Support for .Exe applets started in Vista and is now the preferred method according to Microsoft.
For programs to run on a computer, does the code have to be converted into machine code so the CPU can run it?
How does this happen?
Wow! That takes a lot of explaining. :D
First, machines, like humans, have their own languages, so we can simply say that if you want a computer to do what you say, you have to say it in its language :)
But you have probably heard about compiling and interpreting:
Compile: convert (a program) into a machine-code or lower-level form in which the program can be executed.
So basically, the code is converted into something else, like an executable file, once the programmer(s) decide they are done programming. That is why, if you open an .exe file in Notepad, you cannot understand anything, and why code that has been compiled for Windows cannot be executed on a Mac.
Interpreting: the code is converted by another program at run time, so it stays human-readable until the last second. For example, if you right-click this page and select "View page source", you can see the HTML code that has been generated for this page. This makes the code flexible: it can work on different machines, so you can see the same page on a Mac or on Windows, and in different browsers like Chrome, Firefox or IE. The downside is that it is a lot slower than compiled code.
What do we do in practice?
We compile our code to an intermediate language that is understood by a virtual machine, with a separate virtual machine for each kind of machine.
Let me explain it with an example. Let's say someone wants to give a speech at the UN in Chinese.
If he translates his whole speech into different languages and hands it out to people, that is compiling.
If he speaks and some people translate his words live into French, English, etc., then that is interpreting. But that is painful, and you probably won't find anyone to do it for many languages.
If he gives a translated version (say, an English version) to the translators before the speech, so they can read it and translate it into different languages while he speaks, then that is what we do now :D
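Python itself is a handy, concrete example of this middle approach: CPython first compiles source text to bytecode (its intermediate language), and its virtual machine then executes that bytecode. The sketch below uses the standard `compile`, `dis` and `eval` functions; the expression and variable names are made up for illustration. The bytecode listing differs between Python versions, so it is not reproduced here.

```python
import dis

source = "price * quantity + shipping"

# Step 1: compile human-readable source into a code object containing bytecode
code = compile(source, "<example>", "eval")

# Step 2: look at the intermediate language the virtual machine will run
dis.dis(code)

# Step 3: the virtual machine executes the bytecode with concrete values
result = eval(code, {"price": 3, "quantity": 4, "shipping": 5})
print(result)  # 17
```

The same bytecode runs on any machine that has a CPython virtual machine of the same version, which is the "give the translators an English version" idea from the analogy.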
You can read more here: Runtime vs Compile time
All texts on how to create a compiler stop after explaining lexers and parsers. They don't explain how to create the machine code. I want to understand the end-to-end process.
Currently, what I understand is that the Windows .exe file format is called Portable Executable. I have read about the headers it has, but I have yet to find a resource that explains them clearly.
My next issue is, I don't see any resource which explains how machine code is stored in the file. Is it like 32-bit fixed length instructions stored one after another in the .text section?
Is there any place which at least explains how to create an exe file which does nothing (it has a No Op instruction). My next step then would be linking to dll files to print to console.
Nice question! I don't have much expertise on this specific question, but this is how I would start:
A PE or ELF file does not contain only pure machine code; it also contains header info, etc. Read more: Writing custom data to executable files in Windows and Linux
I assume you are looking for how an ELF/PE file holds the machine code; you can get that from this question (using objdump): How do you extract only contents of an ELF section
Now, if you want to know how the content part is generated in the first place, i.e. how is the machine code generated, then that's the task of the compiler's code generation.
Try out a resource editor like ResourceEditor to understand the exe, or simply ildasm (for .NET assemblies).
PS: These are mostly Unix solutions, but I am sure PE does something fundamentally similar.
I think the best way to approach this is to first analyze how existing PE/ELF files work, basically by reverse engineering them. For that, a Unix machine is a good place to start. And then do your magic :)
Not the same, but a similar question here.
Update:
I generated an object dump from a sample C program. I assume that's what you are targeting, right? You need to know how to generate this file (a.out)?
https://gist.github.com/1329947
Take a look at this image of the lifetime of a C program.
Source
Now, just to be clear, you are looking to implement the final step, i.e. conversion of object code to executable code?
As with many of his articles, I'd say Matt Pietrek's piece about PE internals remains the best introduction to the matter more than a decade after being written.
I've used "Wotsit's File Format" for years... all the way back to the days of MS-DOS :-) and back to when it was just a collection of text files, called "The Game programmers file type encyclopaedia", that you could download from most BBS systems.
It's now owned by the people that run Gamedev.Net, and probably one of the best kept secrets on the internet.
You'll find the EXE format on this page : http://www.wotsit.org/list.asp?fc=5
Enjoy.
UPDATE June 2020 - The link above seems to be now dead, I've found the "EXE" page listed on this web archive page of the wotsit site: https://web.archive.org/web/20121019145432/http://www.wotsit.org/list.asp?al=E
UPDATE 2 - I'm keeping the edit as it was when I added the update earlier. Thanks to those who suggested edits, but I'm rejecting them for a good reason:
1) Wotsit.org may come back online at some point in the future. If you actually try visiting the URL, you'll find that it's not gone: it still responds, just with an error message. This tells me that someone is keeping the domain alive for whatever reason.
2) The archive links seem a bit jittery: some work, some don't, and sometimes they work, then fail after a refresh, then work again. I remember from when wotsit was still online that they had some very strange download/linking detection code, and this probably caused archive.org to get some very weird results. I remember them taking this stance because of the huge number of third-party sites trying to cash in on their success by pretending to be affiliates and then linking directly to wotsit from ad-infested sites.
Once the wotsit domain is removed from the internet entirely and not even the DNS responds, it will be time to wrap everything up into single archive links. Until then, this is the best way to maintain the link.
Not surprisingly the best sites for information about writing PE format files are all about creating viruses.
A search of VX Heavens for "PE" gives a whole bunch of tutorials for modifying PE files
Some information about making PE files as small as possible: Tiny PE.
The minimalistic way to mess around with code generation, if you're just looking to try a few simple things out, is to output MS-DOS .COM files, which have no header or metadata. Sadly, you'd be restricted to 16-bit code. This format is still somewhat popular for demos.
As for the instruction format, from what I recall the x86 instruction set is variable-length, including 1-byte instructions. RISC CPUs would probably have fixed-length instructions.
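As a minimal sketch of the .COM idea, the Python function below (`build_com` is a hypothetical name) emits a DOS program that prints a message using the DOS `int 21h` services 09h (print a `$`-terminated string) and 4Ch (terminate). It also illustrates the variable-length point: its five 16-bit x86 instructions are 2, 3, 2, 3 and 2 bytes long. The bytes are hand-encoded rather than produced by an assembler, so treat it as an illustration.

```python
# A .COM file has no header: DOS copies the raw bytes to offset 0x100 of a
# segment and starts executing at the first byte.
LOAD_ADDRESS = 0x100


def build_com(message: bytes) -> bytes:
    """Return a tiny 16-bit DOS program that prints `message` (which must end in '$')."""
    if not message.endswith(b"$"):
        raise ValueError("DOS function 09h prints up to a '$' terminator")
    code_length = 2 + 3 + 2 + 3 + 2          # sizes of the five instructions below
    msg_addr = LOAD_ADDRESS + code_length     # the message is stored right after the code
    code = bytes([
        0xB4, 0x09,                               # mov ah, 09h    (2 bytes)
        0xBA, msg_addr & 0xFF, msg_addr >> 8,     # mov dx, msg    (3 bytes, little-endian)
        0xCD, 0x21,                               # int 21h        (2 bytes)
        0xB8, 0x00, 0x4C,                         # mov ax, 4C00h  (3 bytes, exit code 0)
        0xCD, 0x21,                               # int 21h        (2 bytes)
    ])
    assert len(code) == code_length
    return code + message
```

Writing `build_com(b"Hello, world!$")` to a file such as `hello.com` (opened with mode `"wb"`) should give a program you can run in DOSBox or a similar DOS environment.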
For Linux, one may read and run the examples from "Programming from the Ground Up" by Jonathan Bartlett:
http://www.cs.princeton.edu/courses/archive/spr08/cos217/reading/ProgrammingGroundUp-1-0-lettersize.pdf
Then of course one may prefer to hack Windows programs, but perhaps the former gives a better way to understand what really goes on.
The executable file format depends on the OS. For Windows it is PE32 (32-bit) or PE32+ (64-bit).
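As a rough illustration of how the format differs per OS, the first few bytes of a file (its "magic number") usually identify its container format. This Python sketch (`identify_executable` is a hypothetical helper) checks a handful of well-known magics. Note that `MZ` alone only identifies the DOS header; a real tool would go on to confirm the `PE\0\0` signature.

```python
def identify_executable(header: bytes) -> str:
    """Guess an executable's container format from its leading magic bytes."""
    if header.startswith(b"MZ"):
        # DOS header; Windows PE files start with this stub before the PE header
        return "MZ/PE (DOS or Windows)"
    if header.startswith(b"\x7fELF"):
        return "ELF (Linux and most Unix systems)"
    # 32- and 64-bit Mach-O magics, in both byte orders
    if header[:4] in (b"\xfe\xed\xfa\xce", b"\xce\xfa\xed\xfe",
                      b"\xfe\xed\xfa\xcf", b"\xcf\xfa\xed\xfe"):
        return "Mach-O (macOS)"
    if header.startswith(b"\xca\xfe\xba\xbe"):
        # Shared by Mach-O universal binaries and Java class files
        return "Mach-O universal binary or Java class file"
    return "unknown"
```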
What the final executable looks like depends on the ABI (application binary interface) of the OS. The ABI specifies how the OS loader should load the executable and how it should relocate it, whether it is a DLL or a plain executable, etc.
Every object file (executable, DLL or driver) is divided into parts called sections. This is where all of our code, data, jump tables, etc. are located.
Now, to create an object file, which is what a compiler does, you must create not just the executable machine code but also the headers, symbol table, relocation records, import/export tables, etc.
The pure machine-code generation part depends entirely on how optimized you want your code to be. But to actually run the code on the PC, you must create a file with all of the headers and related data (check MSDN for the precise PE32+ format) and then put all of the executable machine code (which your compiler generated) into one of the sections (code usually resides in a section called .text). If the file conforms to the PE32+ format, you have successfully created an executable for Windows.
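Here is a minimal Python sketch (`parse_pe` is an illustrative name) of how a tool locates these pieces: it follows the `e_lfanew` pointer at offset 0x3C of the DOS header to the `PE\0\0` signature, reads the 20-byte COFF file header, checks the optional-header magic (0x10B for PE32, 0x20B for PE32+), and walks the 40-byte section headers. It only reads headers; it does not validate the file or touch imports, exports or relocations.

```python
import struct


def parse_pe(data: bytes) -> dict:
    """Read the PE signature, COFF header, optional-header magic and section table."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ (DOS) header")
    # e_lfanew: 32-bit offset of the "PE\0\0" signature, stored at 0x3C
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    coff = pe_offset + 4
    # COFF header: Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics (20 bytes total)
    machine, num_sections, _, _, _, opt_size, _ = struct.unpack_from("<HHIIIHH", data, coff)
    opt = coff + 20
    (magic,) = struct.unpack_from("<H", data, opt)
    kind = {0x10B: "PE32", 0x20B: "PE32+"}.get(magic, hex(magic))
    sections = []
    table = opt + opt_size  # the section table follows the optional header
    for i in range(num_sections):
        # Each 40-byte header starts with Name[8], VirtualSize, VirtualAddress,
        # SizeOfRawData, PointerToRawData
        name, vsize, vaddr, raw_size, raw_ptr = struct.unpack_from(
            "<8sIIII", data, table + 40 * i)
        sections.append({
            "name": name.rstrip(b"\0").decode("ascii", "replace"),
            "virtual_address": vaddr,
            "virtual_size": vsize,
            "raw_offset": raw_ptr,
            "raw_size": raw_size,
        })
    # machine is 0x14C for 32-bit x86 and 0x8664 for x64
    return {"machine": machine, "format": kind, "sections": sections}
```

For the `.text` entry `s`, `data[s["raw_offset"]:s["raw_offset"] + s["raw_size"]]` gives the raw bytes of the code section (possibly with alignment padding at the end), i.e. where the compiler's machine code ends up in the file.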
I'm searching for a tool that will take a source directory and produce a single PDF containing the source code, preferably with syntax highlighting.
I would like to read the PDF on my phone, in order to get familiar with a code-base, or just to see what I can learn by reading a lot of code. I will most often be reading Ruby.
I would prefer if the tool ran on Linux. I don't mind paying for a tool if it is particularly good.
Any suggestions?
You could whip something up yourself with Prawn and Ultraviolet.
PDF is no good for reflowing. You might like a html based solution better.
And when reading existing code, a linear model is no good. You need to jump from one file to another. A hypertext model with history would probably work best on the limited screen real estate of a phone. It should borrow some features of the Smalltalk IDEs (jump to senders, implementors).
For the UI, take a look at clamato
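A minimal Python sketch of the HTML idea, assuming Ruby files under one source directory: it collects every matching file into a single page with a linked index, so you can jump between files on a phone. There is no syntax highlighting here (a tool like GNU source-highlight or Pygments would have to be plugged in), and `build_source_book` is a hypothetical name.

```python
import html
from pathlib import Path


def build_source_book(root: str, pattern: str = "*.rb") -> str:
    """Collect matching files under `root` into one HTML page with a linked index."""
    files = sorted(p for p in Path(root).rglob(pattern) if p.is_file())
    index, bodies = [], []
    for n, path in enumerate(files):
        rel = path.relative_to(root).as_posix()
        anchor = f"file{n}"
        index.append(f'<li><a href="#{anchor}">{html.escape(rel)}</a></li>')
        text = path.read_text(encoding="utf-8", errors="replace")
        # Escape the source so '<' and '&' show literally instead of being parsed as markup
        bodies.append(f'<h2 id="{anchor}">{html.escape(rel)}</h2>\n'
                      f"<pre>{html.escape(text)}</pre>")
    head = ('<!DOCTYPE html>\n<meta charset="utf-8">\n'
            '<meta name="viewport" content="width=device-width, initial-scale=1">\n')
    return head + "<ul>\n" + "\n".join(index) + "\n</ul>\n" + "\n".join(bodies)
```

Saving the result with `Path("book.html").write_text(build_source_book("path/to/project"), encoding="utf-8")` gives one file that a phone browser can reflow, unlike a fixed-layout PDF.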
GNU source-highlight supports many languages and can, in particular, output LaTeX, which can be converted to PDF.
The SciTE editor can export the currently edited file (with syntax highlighting) to PDF (and HTML, RTF, LaTeX and XML).
Alas, it doesn't have batch conversion capability, but IIRC somebody made a batch tool out of this code base.
I realize this is very late, but I wanted to do the same thing, except I wanted it for my tablet, which is a Galaxy Note 10.1 with a Wacom digitizer that I can use to annotate code. I found that one good solution is to use Doxygen to generate a PDF, which will have hyperlinks and everything you would want in a PDF. For my use case, I would pair it with EzPDF on Android to annotate the code. This was also for the purpose of learning a new codebase. In the end I didn't use the generated PDF, but it was pretty usable.
A friend of mine downloaded some malware from Facebook, and I'm curious to see what it does without infecting myself. I know that you can't really decompile an .exe, but can I at least view it in Assembly or attach a debugger?
Edit: it is not a .NET executable; there is no CLI header.
With a debugger you can step through the program assembly interactively.
With a disassembler, you can view the program assembly in more detail.
With a decompiler, you can turn a program back into partial source code, assuming you know what it was written in. You can find that out with free tools such as PEiD, or Detect-it-Easy (DIE) if you can't find PEiD anywhere; DIE currently has a strong developer community on GitHub. If the program is packed, you'll have to unpack it first.
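One rough heuristic for spotting packed or encrypted code (it is not what PEiD's signature matching does, though tools such as Detect-It-Easy do offer an entropy view) is Shannon entropy: compressed or encrypted bytes look nearly random, so their entropy approaches 8 bits per byte, while ordinary code and text score noticeably lower. A minimal Python sketch:

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Bits of entropy per byte: 0.0 for constant data, up to 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())
```

Applied per section (for example to the raw bytes of each section in the PE file), a section whose value sits very close to 8 is a hint, not proof, that it is packed.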
Debuggers:
OllyDbg, free, a fine 32-bit debugger, for which you can find numerous user-made plugins and scripts to make it all the more useful.
WinDbg, free, a quite capable debugger by Microsoft. WinDbg is especially useful for looking at the Windows internals, since it knows more about the data structures than other debuggers.
SoftICE, SICE to friends. Commercial; development stopped in 2006. SoftICE is kind of a hardcore tool that runs beneath the operating system (and halts the whole system when invoked). SoftICE is still used by many professionals, although it might be hard to obtain and might not work on some hardware or software (namely, it will not work on Vista or with NVIDIA graphics cards).
Disassemblers:
IDA Pro (commercial) - top-of-the-line disassembler/debugger, used by most professionals, such as malware analysts. It costs quite a few bucks, though (there is a free version, but it is quite limited).
W32Dasm (free) - a bit dated, but gets the job done. I believe W32Dasm is abandonware these days, and there are numerous user-created hacks that add some very useful functionality. You'll have to look around to find the best version.
Decompilers:
Visual Basic: VB Decompiler, commercial, produces somewhat identifiable bytecode.
Delphi: DeDe, free, produces good quality source code.
C: HexRays, commercial, a plugin for IDA Pro by the same company. Produces great results but costs a big buck, and won't be sold to just anyone (or so I hear).
.NET(C#): dotPeek, free, decompiles .NET 1.0-4.5 assemblies to C#. Support for .dll, .exe, .zip, .vsix, .nupkg, and .winmd files.
Some related tools that might come handy in whatever it is you're doing are resource editors such as ResourceHacker (free) and a good hex editor such as Hex Workshop (commercial).
Additionally, if you are doing malware analysis (or use SICE), I wholeheartedly suggest running everything inside a virtual machine, namely VMware Workstation. In the case of SICE, it will protect your actual system from BSODs, and in the case of malware, it will protect your actual system from the target program. You can read about malware analysis with VMware here.
Personally, I roll with Olly, WinDbg & W32Dasm, and some smaller utility tools.
Also, remember that disassembling or even debugging other people's software is usually against the EULA in the very least :)
psoul's excellent post answers your question, so I won't replicate his good work, but I feel it would help to explain why this is at once a perfectly valid but also terribly silly question. After all, this is a place to learn, right?
Modern computer programs are produced through a series of conversions, starting with the input of a human-readable body of text instructions (called "source code") and ending with a computer-readable body of instructions (called alternatively "binary" or "machine code").
The way that a computer runs a set of machine code instructions is ultimately very simple. Each action a processor can take (e.g., read from memory, add two values) is represented by a numeric code. If I told you that the number 1 meant scream and the number 2 meant giggle, and then held up cards with either 1 or 2 on them expecting you to scream or giggle accordingly, I would be using what is essentially the same system a computer uses to operate.
A binary file is just a set of those codes (usually called "op codes") and the information ("arguments") that the op codes act on.
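To make the card analogy concrete, here is a toy fetch-decode-execute loop in Python for a made-up instruction set: the op codes 0 to 3 and their meanings are invented for illustration and belong to no real CPU. Some op codes take an argument that follows them in the program, mirroring the op code/argument split described above; a real processor does this decoding in hardware.

```python
# Invented instruction set:
#   0 = HALT, 1 n = LOAD n into the accumulator, 2 n = ADD n, 3 = OUTPUT the accumulator
ARG_COUNT = {0: 0, 1: 1, 2: 1, 3: 0}


def run(program: list[int]) -> list[int]:
    acc, pc, output = 0, 0, []
    while True:
        opcode = program[pc]                                  # fetch the op code
        args = program[pc + 1:pc + 1 + ARG_COUNT[opcode]]     # decode: fetch its arguments
        pc += 1 + len(args)                                   # advance past the instruction
        if opcode == 0:                                       # execute
            return output
        elif opcode == 1:
            acc = args[0]
        elif opcode == 2:
            acc += args[0]
        elif opcode == 3:
            output.append(acc)
```

For example, `run([1, 40, 2, 2, 3, 0])` loads 40, adds 2, outputs the accumulator and halts, returning `[42]`. An unknown op code raises `KeyError`, this toy's version of an "illegal instruction".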
Now, assembly language is a computer language where each command word in the language represents exactly one op code on the processor. There is a direct 1:1 translation between an assembly language command and a processor op code. This is why coding assembly for an x86 processor is different from coding assembly for an ARM processor.
Disassembly is simply this: a program reads through the binary (the machine code), replacing the op-codes with their equivalent assembly language commands, and outputs the result as a text file. It's important to understand this; if your computer can read the binary, then you can read the binary too, either manually with an op-code table in your hand (ick) or through a disassembler.
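The lookup idea can be sketched as a tiny linear-sweep disassembler in Python. It knows only a handful of real 32-bit x86 encodings (`nop`, `ret`, `push`/`pop` of a register, and `mov` of a 32-bit constant into a register); a real disassembler also handles hundreds of other opcodes, prefixes and operand bytes, and has to cope with data mixed into code. The function name is illustrative.

```python
# 32-bit register numbers, as encoded in the low 3 bits of these opcodes
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def disassemble(code: bytes) -> list[str]:
    """Linear-sweep disassembly of a tiny subset of 32-bit x86 instructions."""
    out, i = [], 0
    while i < len(code):
        op = code[i]
        size = 1
        if op == 0x90:
            out.append("nop")
        elif op == 0xC3:
            out.append("ret")
        elif 0x50 <= op <= 0x57:                          # push r32
            out.append(f"push {REGS[op - 0x50]}")
        elif 0x58 <= op <= 0x5F:                          # pop r32
            out.append(f"pop {REGS[op - 0x58]}")
        elif 0xB8 <= op <= 0xBF and i + 5 <= len(code):   # mov r32, imm32
            imm = int.from_bytes(code[i + 1:i + 5], "little")
            out.append(f"mov {REGS[op - 0xB8]}, 0x{imm:x}")
            size = 5
        else:
            out.append(f"db 0x{op:02x}")                 # unknown: show the raw byte
        i += size
    return out
```

For instance, the bytes `55 B8 2A 00 00 00 5D C3` come out as `push ebp`, `mov eax, 0x2a`, `pop ebp`, `ret`.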
Disassemblers have some new tricks and all, but it's important to understand that a disassembler is ultimately a search and replace mechanism. Which is why any EULA which forbids it is ultimately blowing hot air. You can't at once permit the computer reading the program data and also forbid the computer reading the program data.
(Don't get me wrong, there have been attempts to do so. They work as well as DRM on song files.)
However, there are caveats to the disassembly approach. Variable names are non-existent; such a thing doesn't exist to your CPU. Library calls are confusing as hell and often require disassembling further binaries. And assembly is hard as hell to read in the best of conditions.
Most professional programmers can't sit and read assembly language without getting a headache. For an amateur it's just not going to happen.
Anyway, this is a somewhat glossed-over explanation, but I hope it helps. Everyone can feel free to correct any misstatements on my part; it's been a while. ;)
Good news. IDA Pro is actually free for its older versions now:
http://www.hex-rays.com/idapro/idadownfreeware.htm
x64dbg is a good and open source debugger that is actively maintained.
Any decent debugger can do this. Try OllyDbg. (edit: which has a great disassembler that even decodes the parameters to WinAPI calls!)
If you are just trying to figure out what a malware does, it might be much easier to run it under something like the free tool Process Monitor which will report whenever it tries to access the filesystem, registry, ports, etc...
Also, using a virtual machine like the free VMWare server is very helpful for this kind of work. You can make a "clean" image, and then just go back to that every time you run the malware.
I'd say in 2019 (and even more so in 2022), Ghidra (https://ghidra-sre.org/) is worth checking out. It's open source (and free), and has phenomenal code analysis capabilities, including the ability to decompile all the way back to fairly readable C code.
Sure, have a look at IDA Pro. They offer an eval version so you can try it out.
You may get some information viewing it in assembly, but I think the easiest thing to do is fire up a virtual machine and see what it does. Make sure you have no open shares or anything like that that it can jump through though ;)
Boomerang may also be worth checking out.
I can't believe nobody has mentioned Immunity Debugger yet.
Immunity Debugger is a powerful tool to write exploits, analyze malware, and reverse engineer binary files. It was initially based on the OllyDbg 1.0 source code, but with the name resolution bug fixed. It has a well-supported Python API for easy extensibility, so you can write Python scripts to help you with the analysis.
Also, there's a good script called mona.py, written by Peter from the Corelan team; excellent tool, by the way.
If you want to run the program to see what it does without infecting your computer, use a virtual machine like VMware or Microsoft VPC, or a program that can sandbox it, like Sandboxie.
You can use dotPeek; it is very good for decompiling .NET executables, and it is free.
https://www.jetbrains.com/decompiler/
What you want is a type of software called a "Disassembler".
A quick Google search yields this: Link
If you have no time, submit the malware to cwsandbox:
http://www.cwsandbox.org/
http://jon.oberheide.org/blog/2008/01/15/detecting-and-evading-cwsandbox/
HTH
The explorer suite can do what you want.