Finding the syscalls associated with a number - ptrace

I am tracing a program using ptrace. After stopping on a syscall, I use PTRACE_PEEKUSER to read the value of ORIG_EAX (actually ORIG_RAX, since I'm on 64-bit).
What is a good way of translating this number into the corresponding syscall name?
For example, 2 -> "open" (IIRC).

You have to make your own table mapping the numbers to the names. I don't think there's any other way. I've often wished otherwise, but... Also, note that the mappings are arch-dependent.
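
One way to build such a table without typing it by hand, as a rough sketch: on Linux the kernel headers define each number as a line of the form "#define __NR_<name> <number>", so the table can be parsed out of the header for your architecture. The header location varies by distribution (something like /usr/include/asm/unistd_64.h is an assumption, not a guarantee), so the example below parses an inlined sample with a few x86-64 entries instead. The function names are made up for illustration.

```python
import re

# Matches lines such as "#define __NR_open 2" in a kernel unistd header.
_NR_DEFINE = re.compile(r"^\s*#\s*define\s+__NR_(\w+)\s+(\d+)\s*$")

def parse_syscall_table(header_text):
    """Return {number: name} built from '#define __NR_<name> <number>' lines."""
    table = {}
    for line in header_text.splitlines():
        match = _NR_DEFINE.match(line)
        if match:
            name, number = match.group(1), int(match.group(2))
            # Keep the first name seen for each number.
            table.setdefault(number, name)
    return table

def syscall_name(table, number):
    """Look up a syscall number, falling back to a placeholder name."""
    return table.get(number, "unknown_%d" % number)

# A few x86-64 entries, inlined so the sketch runs without the header file.
# On a real system you would read the header text from disk instead.
SAMPLE_HEADER = """\
#define __NR_read 0
#define __NR_write 1
#define __NR_open 2
#define __NR_close 3
#define __NR_execve 59
#define __NR_exit 60
"""

table = parse_syscall_table(SAMPLE_HEADER)
print(syscall_name(table, 2))
```

Because the mapping is arch-dependent, the table has to be built from the header that matches the traced process (a 32-bit tracee uses different numbers than a 64-bit one).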

Follow register changes with gdb

How can I track changes to a specific register using GDB?
I want to log the address of each instruction that changes the value of this register.
Quoting the question: "I want to log the address of each instruction that changes the value of this register."
The only way to do this is to single-step the program, compare values of registers to previously-saved values, and print previous value of instruction pointer if the value of the register of interest has changed.
You can automate this by using GDB embedded Python, but even with automation this will be impractically slow for any non-trivial program (as would single-stepping without actually doing anything between the steps).
P.S. Depending on what actual problem you are trying to solve (see http://xyproblem.info), more practical solutions may exist.
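
The single-step-and-compare loop described above could look roughly like this with GDB's embedded Python. This is a sketch, not a tuned tool: the function names, the default register and the step limit are all made up for illustration, and it will be as slow as the answer warns. The comparison logic is kept in a plain function so it can be tried outside GDB.

```python
try:
    import gdb  # only available inside GDB's embedded Python interpreter
except ImportError:
    gdb = None  # lets the pure helper below be used and tested outside GDB

def changed_at(samples):
    """Given (pc, register_value) pairs sampled before each instruction runs,
    return [(pc, old, new)] for every instruction that changed the register."""
    changes = []
    for (pc, old), (_next_pc, new) in zip(samples, samples[1:]):
        if new != old:
            # The instruction at the *previous* pc produced the new value.
            changes.append((pc, old, new))
    return changes

def trace_register(reg="rax", max_steps=10000):
    """Single-step the inferior and log each instruction that changes `reg`."""
    samples = []
    try:
        for _ in range(max_steps):
            frame = gdb.selected_frame()
            samples.append((frame.pc(), int(frame.read_register(reg))))
            gdb.execute("stepi", to_string=True)
    except gdb.error:
        pass  # raised once the program has exited and no frame is selected
    for pc, old, new in changed_at(samples):
        print("0x%x: %s 0x%x -> 0x%x" % (pc, reg, old, new))
```

Assuming the script is saved as trace_reg.py (a hypothetical name), you would start the program in GDB, stop at a breakpoint, then run "source trace_reg.py" followed by "python trace_register('rax')".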

Compiling Binary

I would like to get a feel for how computers originally worked. I know that initially, with computers such as ENIAC, people physically had to plug in wires in the correct order to make their programs execute. Later they used punch cards, and finally came up with assembly language(s). It just built upward from there with FORTRAN, COBOL, etc. Is there any way I can compile 0s and 1s on my computer? If I open TextEdit and type in a specific sequence of zeroes and ones, how can I make that a binary file and not a text file containing a sequence of ASCII characters? I am open to any method. (Disclaimer: I know doing things in binary takes forever, I just want to learn how to do very basic things.)
The easiest way to do this is to start with an assembler of your choice, in an IDE if you like. Use some sort of debugger (such as the one built into an IDE) so you can see the effect of your code without also having to write to the console or a file.
Rather than writing only binary as text digits, write a complete assembler source using data elements instead of instructions.
So, instead of
.code
main proc
mov eax,5
add eax,6
main endp
end main
you could write:
.code
main proc
db 10111000b, 00000101b, 00000000b, 00000000b, 00000000b
db 10000011b, 11000000b, 00000110b
main endp
end main
db means define byte and the b suffix means binary.
And, with this, you'd be all set up to cheat, but I won't tell you how until you ask so I don't spoil the fun for you.
Here is a good tutorial for getting started on Windows with MASM and Visual Studio 2015.
The way to create a binary data stream depends heavily on its purpose.
There is nothing magic about binary data itself. You can take any hex editor and start typing the desired binary input.
But this is not how computers are programmed nowadays. If you really want to go to the lowest level, you can have a look at assembly programming, which basically lets you tell your machine the exact instructions it should execute, in a more convenient form.
But even here you won't have much fun. If you want to be able to actually execute your programs and see some results on your display or perhaps even things like keyboard input, the code would grow really large and hard to write and understand for humans.
This is why we use compilers. Compilers generate such code from a high level language and eliminate the need to write the smallest instruction blocks over and over again.
If you really just want to understand how computers work in principle, download an emulator for a simple CPU (perhaps with a nice GUI) and play around with it. Edumips is one of those emulators for educational purposes.
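
For the original question of turning typed zeroes and ones into a real binary file, a small script is enough. The sketch below is one possible way to do it: it takes the same bit patterns as the db lines above (mov eax,5 followed by add eax,6), packs every 8 bits into one byte, and writes the raw bytes to a file. The function name and the output file name are made up for illustration. Note that the result is just raw machine code, not an executable the operating system can load, since it has no executable file header.

```python
import os
import tempfile

def bits_to_bytes(text):
    """Convert a string of '0'/'1' characters (whitespace ignored) to raw bytes."""
    bits = "".join(text.split())
    if set(bits) - {"0", "1"}:
        raise ValueError("input may contain only 0, 1 and whitespace")
    if len(bits) % 8 != 0:
        raise ValueError("number of bits must be a multiple of 8")
    # Each group of 8 characters becomes one byte, most significant bit first.
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

machine_code = bits_to_bytes("""
    10111000 00000101 00000000 00000000 00000000
    10000011 11000000 00000110
""")

# "wb" writes the bytes as-is instead of encoding text characters.
path = os.path.join(tempfile.gettempdir(), "code.bin")
with open(path, "wb") as f:
    f.write(machine_code)

print(machine_code.hex())
```

Opening the resulting file in a hex editor should show 8 bytes, not 64 ASCII characters.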

Is there a way to watch all registers for a specific value in GDB?

I'm reverse engineering a C program that has no debugging symbols in GDB. It asks for a specific 1-15 digit PIN and tells you whether or not it is correct. My goal is to find out what this PIN is.
I am having trouble finding where my guess is compared to the correct PIN. One method I think would help is to find any place where my guess is loaded into a register.
So, on to my question: is it possible to check whether a specific value is loaded into any register?
For instance I can do this with individual registers by using watch $rax == 1234, but I'd like to do this for every register.
GDB does not have this functionality.
This also sounds like a bad approach: if the digits are handled as individual numbers from 0-9 you will get a lot of false positives, and you cannot even be sure that this is how they are represented.
An easier approach should be to look for code closely related to entering the PIN or to the failure message, and to track the data from there:
If there is console output when a wrong pin is entered, look for this string in the binary and find where it is referenced.
Look for references to scanf / printf if this is a console application.
If it is not using scanf (for example, the input comes from an external keypad), you could find the number of entered digits in memory with a tool like scanmem.
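
The first suggestion, looking for the failure message inside the binary, can be sketched in a few lines. The message text and the byte string standing in for the program file are made up for illustration; with a real target you would read the file with open(..., "rb") and search for whatever the program actually prints.

```python
def find_all(data, needle):
    """Return every offset at which `needle` occurs in `data`."""
    offsets = []
    start = data.find(needle)
    while start != -1:
        offsets.append(start)
        start = data.find(needle, start + 1)
    return offsets

# Stand-in for the contents of the program file (hypothetical bytes).
image = b"\x7fELF\x00\x00Enter PIN: \x00Wrong PIN!\x00Correct!\x00"

for offset in find_all(image, b"Wrong PIN!"):
    print("found at file offset 0x%x" % offset)
```

The result is a file offset, not the address the string has at run time. You would still need to translate it using the section layout of the binary and then look for instructions that reference that address in the disassembly.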

Why does "garbage" data appear to not be meaningful?

I have always wondered why garbage data appears to not be meaningful. For clarity, what I mean by "garbage" is data that is just whatever happens to be at a particular memory address, that you have access to because of something like forgetting to initialize a variable.
For example, printing out an unused array gave me this:
#°õN)0ÿÿl¯ÿ¯ÿ ``¯ÿ¯ÿ #`¯ÿø+))0 wy¿[d
Obviously, this is useless for my application, but it also seems like it is not anything useful for any application. Why is this? Is there some sort of data protection going on here perhaps?
As you state in your question:
... "garbage" is data that is just whatever happens to be at a particular memory address, that you have access to because of something like forgetting to initialize a variable.
This implies that something else used to be in that memory before you got to use it for your variable. Whatever used to be there may or may not have any relation to how you wish to use the variable. That is, most languages do not force memory used for one type of object to be reused for the exact same type.
This means, if memory was used to store a pointer, and then released, that same memory may be used to store a string. If the pointer value was read out as if it was a string, something that looks like garbage may appear. This is because the bytes used to represent a pointer value are not restricted to the values that correspond to printable ASCII values.
A common way to detect that a buffer overrun has occurred in a program is to examine a pointer value and see whether it consists of printable ASCII values. In that case, the code using the memory as a pointer sees junk, but this time the junk is "printable".
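
Both directions can be illustrated with a short sketch. The addresses are invented for the example: the first is merely shaped like a typical 64-bit stack address, the second is what eight 'A' characters look like when read back as a little-endian pointer. The helper names are made up as well.

```python
import struct

def pointer_bytes(value):
    """The 8 bytes a little-endian 64-bit machine stores for this pointer."""
    return struct.pack("<Q", value)

def looks_like_text(value):
    """True if every byte of the pointer is a printable ASCII character."""
    return all(0x20 <= b <= 0x7E for b in pointer_bytes(value))

stack_ptr = 0x00007FFD4F2B9A10  # hypothetical, shaped like a stack address
smashed = 0x4141414141414141    # a pointer overwritten by the text "AAAAAAAA"

# A real pointer read as text: mostly unprintable bytes, i.e. "garbage".
print(repr(pointer_bytes(stack_ptr)))

# Text read as a pointer: suspiciously printable, a hint of an overrun.
print(looks_like_text(stack_ptr), looks_like_text(smashed))
```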
Of course memory is never garbage, unless you make a conscious effort. After all, you are on a deterministic machine, even if it doesn't always seem like it. (Of course, if you interpret arbitrary bytes as text then it's unlikely that you will see yourself as ASCII art, although you would deserve it.)
Leftover memory contents being readable was the reason for one of the worst bugs in history, quite recently (Heartbleed), cf. https://xkcd.com/1354/. Where do you live to have missed it?

Is 'handle' synonymous to pointer in WinAPI?

I've been reading some books on Windows programming in C++ lately, and I am confused about some of the recurring concepts in WinAPI. For example, there are tons of data types that start with the handle prefix 'H'. Are these supposed to be used like pointers? But then there are other data types that start with the pointer prefix 'P', so I guess not. Then what is a handle exactly? And why were pointers to some data types given separate type names in the first place? For example, couldn't PCHAR have simply been written as CHAR*?
Handles used to be pointers in early versions of Windows but are not anymore. Think of them as a "cookie", a unique value that allows Windows to find a resource again that was allocated earlier. For example, CreateFile() returns a new handle, which you later pass to SetFilePointer() and ReadFile() to read data from that same file, and to CloseHandle() to clean up the internal data structure, closing the file as well. That is the general pattern: one API function to create the resource, one or more to use it and one to destroy it.
Yes, the types that start with P are pointer types. And yes, they are superfluous, it works just as well if you use the * yourself. Not actually sure why C programmers like to declare them, I personally think it reduces code readability and I always avoid them. But do note the compound types, like LPCWSTR, a "long pointer to a constant wide string". The L doesn't mean anything anymore, that dates back to the 16-bit version of Windows. But pointer, const and wide are important. I do use that typedef, not doing so will risk future portability problems. Which is the core reason these typedefs exist.
A handle is the same as a pointer only insofar as both identify a particular item. Obviously a pointer is the address of the item, so if you know its structure you can start getting at the fields in the item. A handle may or may not be a pointer. Basically, even if it is a pointer, you don't know what it is pointing to, so you can't get at the fields.
Best way to think of a handle is that it is a unique ID for something in the system. When you pass it to something in the system the system will know what to cast it to (if it is a pointer) or how to treat it (if it is just some id or index).
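
The "unique ID" idea and the create/use/destroy pattern can be mimicked with a toy table. This is only an illustration of the concept, not how Windows implements handles internally; the class and method names are invented for the example.

```python
class HandleTable:
    """Toy resource manager: callers only ever see opaque integer handles."""

    def __init__(self):
        self._resources = {}   # private mapping: handle -> real object
        self._next_handle = 1

    def create(self, resource):
        """Register a resource and return an opaque handle for it."""
        handle = self._next_handle
        self._next_handle += 1
        self._resources[handle] = resource
        return handle

    def lookup(self, handle):
        """Only the owner of the table can turn a handle into the object."""
        try:
            return self._resources[handle]
        except KeyError:
            raise ValueError("invalid handle: %r" % (handle,)) from None

    def close(self, handle):
        """Destroy the resource; the handle is invalid afterwards."""
        self.lookup(handle)  # reject handles that are not open
        del self._resources[handle]

files = HandleTable()
h = files.create({"path": "example.txt", "position": 0})  # like CreateFile()
files.lookup(h)["position"] = 128                          # like SetFilePointer()
files.close(h)                                             # like CloseHandle()
```

The caller holds nothing but the integer h. It cannot reach the fields of the underlying object except through the functions the table's owner provides, which is the difference from a pointer described above.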
