Heap Information in Windows 7 - memory-management

I ran a simple experiment on Windows 7 to examine how it organizes heap allocations, using the following code:
char *pointer[50];
for (int i = 0; i < 50; i++) pointer[i] = new char[64];
for (int i = 0; i < 50; i++) printf("0x%X\n", (unsigned)pointer[i]); // 32-bit build
The output was:
0x572F00
0x572F48
0x572F90
......
Clearly, the spacing between two adjacent pointers is 72 bytes rather than 64 bytes, so some bookkeeping information must be kept in the first few bytes of every heap chunk. I printed the values of the 8 extra bytes and found them to be:
71 39 19 36 B3 9F 00 08
Can anyone please tell me how to tell the size of the heap chunk from these values? Thanks!

The very idea that you want to do this is pretty scary. This is undocumented information, liable to change without notice, liable to vary between debug and non-debug builds, and so on. I strongly suggest you find another way, such as storing the length using your own allocator.
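For illustration only, here is a minimal sketch of the "own allocator" idea: stash the length in a hidden header in front of the block you hand out, instead of poking at the heap's private bookkeeping. The tracked_* names are hypothetical, and real code would also need to keep the payload suitably aligned.

#include <cstdlib>
#include <cstring>

// Allocate size bytes plus a hidden size_t header.
void *tracked_alloc(std::size_t size) {
    unsigned char *raw = static_cast<unsigned char *>(std::malloc(size + sizeof(std::size_t)));
    if (!raw) return nullptr;
    std::memcpy(raw, &size, sizeof size);   // header holds the length
    return raw + sizeof(std::size_t);       // caller sees only the payload
}

// Recover the stored length of a tracked block.
std::size_t tracked_size(const void *p) {
    std::size_t size;
    std::memcpy(&size, static_cast<const unsigned char *>(p) - sizeof size, sizeof size);
    return size;
}

// Free a tracked block by undoing the header offset.
void tracked_free(void *p) {
    std::free(static_cast<unsigned char *>(p) - sizeof(std::size_t));
}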
To answer your question: as far as I know, what is stored there is a forward and a backward link plus some flags. The links are probably packed into a single pointer using an XOR scheme, and there is probably a sentinel value as well.
If you really have to know the answer to this question, it's very easy to find. Simply compile and run your program in Visual Studio and step into the C run-time library code for new. All the declarations and code are there for you to read. Fully commented, very straightforward stuff.
Please note: this has nothing to do with the Windows 7 API. This is the runtime library that ships with your C++ compiler (which I assume is Visual Studio's).
There are several memory allocators internal to Windows 7, but that's an entirely different story.

Related

Machine Code: How many Read and/or Write cycles are involved in the Fetch and Execute cycles of the following instructions?

Okay, I'm going through past exam questions for a Computer Architecture module, and I've come across the following question with no idea how to approach it. If anyone can tell or show me how to answer it, or point me to somewhere I can learn how to answer this type of question, that would be ideal. Thanks.
Q: How many Read and/or Write cycles are involved in the Fetch and Execute cycles of the execution of the following instructions:
a) LDA B $10EF corresponding machine code A6 10 EF, Extended addressing.
b) LDA B #$3B corresponding machine code C6 3B, Immediate addressing.
c) STA B $6020 corresponding machine code 57 60 20, Extended addressing.
Without the information regarding which CPU it is, all we can give is general advice on how to work it out.
Those opcodes look like rather simple ones from the early days of the PC industry but they don't match the more popular chips of that time frame.
The basic approach would be to look up the instructions in the CPU reference/guide and it would tell you what read and write cycles would occur for a given instruction/addressing-mode combination.
For example, immediate addressing is usually just the extraction of a value at or near the program counter (PC), so it would involve a simple read.
Extended addressing depends, of course, on what they mean by extended. It may be a single de-reference which would involve reading a word at or near the PC followed by the use of that value to read another. Or it may be two levels of indirection. Or their definition of extended could be some bizarre combination of indexed, based and indirect addressing combined, which would result in even more cycles.
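As a rough worked example, assuming a 6800-style CPU where every bus transaction moves one byte (which is only a guess here), the counts would come out like this:
LDA B #$3B (immediate): 1 opcode read + 1 operand read = 2 reads, 0 writes.
LDA B $10EF (extended): 1 opcode read + 2 address-byte reads + 1 data read from $10EF = 4 reads, 0 writes.
STA B $6020 (extended): 1 opcode read + 2 address-byte reads + 1 data write to $6020 = 3 reads, 1 write.
(Some CPUs also insert internal cycles that don't touch the bus at all; the reference manual is the only authority.)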
Without the chip specs, it's difficult to be certain.
My advice is to comb through the course material (if available) to try and discern which CPU is being used, then look it up with your favourite search engine. It doesn't appear to be any of the usual suspects such as the MOS Technology 6502 and its derivatives, the Motorola 680x series, or the TI chips.
The other thing you could try is to post all of the questions (or a link to them) here; the extra information may provide a clue to the architecture in use.

Can Visual Studio tell me the SSE2 register spill count of compiled code?

I do not have any real compiler knowledge; I used to hand-code SSE2 functions for selected pieces of code. I know how to read the generated machine code, but I am largely unaware of the crazy optimizations compilers can perform. All of my work is done in Visual Studio.
Is there a way for Visual Studio to tell me the SSE2 register spill count of a function? The reason is that we will soon be mass-producing SSE2-like (templated) code, and we would like each instance to compile into machine code of decent quality. We can't realistically check every one by hand. What I hope for is some sort of guarantee that the compiled code is acceptable and concise; I don't need to squeeze out the last bit of juice.
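One low-tech way to approximate such a check: have the compiler emit an assembly listing and scan it for XMM stores into the stack frame, which is what a spill looks like in a listing. A sketch (the file name is made up; /FA and findstr are standard MSVC and Windows tools):

cl /c /O2 /FA thresh.cpp
findstr /N "XMMWORD" thresh.asm

Stores of xmm registers to stack-frame operands such as [esp+NN] or [ebp-NN] (rsp/rbp in x64 listings) are the spill candidates; 128-bit accesses to your actual buffers will match too, so the output still needs a human glance, but the counting becomes scriptable.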
Alternatively, is there a keyword that works like __forceinline but forces the compiler not to spill any SSE2 registers, say a hypothetical "__forcenospill"? (If a spill had to happen, compilation would fail, so I would be aware of the problem and could refactor my SSE2 code.)
Using an existing vector library or blitter is out of the question, because some of the calculations need to be highly registerized (6 or more operands in one step of a "simple operation" (Note #1); intermediate values promoted to 16-bit or 32-bit on the fly and converted back; etc.). Rephrasing it with a generic vector library would mean doubling or tripling the runtime (been there, done that).
Commercial tools are okay too, I can certainly afford it given the project's nature.
If there is no such tool, I will resort to profiling. You may downvote this post to let me know that such things don't exist.
Thanks!
(Note #1) it's an adaptive thresholding algorithm.

Downloading the binary code from an AT89S52 chip

I have an AT89S52, and I want to read the program burned on it.
Is there a way to do it with the programming interface?
(I am well aware it will be assembly code, but I think I can handle it, since I'm looking for a specific string in that code)
Thanks
You may not be able to (at least not easily): that chip has three lock bits that are intended to prevent exactly that. If you're dealing with some sort of commercial product, chances are pretty good that those bits are set.
Reference: Datasheet, page 20, section 17.

Windows API calls from assembly while minimizing program size

I'm trying to write a program in assembly and make the resulting executable as small as possible. Some of what I'm doing requires Windows API calls to functions such as WriteProcessMemory. I've had some success calling these functions, but after assembling and linking, my program comes out in the range of 14-15 KB (from a source of less than 1 KB). I was hoping for much, much less than that.
I'm very new to low-level work like this, so I don't really know what it would take to make the program smaller. I understand that the EXE format itself takes up quite a bit of space. Can anything be done to minimize that?
I should mention that I'm using NASM and GCC but I can easily change if that would help.
See Tiny PE for a bunch of tips and tricks you can use to reduce the final size of your executable. Be warned that some of the later techniques in that article are extremely fragile.
The default section alignment for most PE files is 4 KB, to match the system's natural memory page size. If you have a .data, a .text and a .resource section, that's 12 KB already, most of it zeroes and wasted space.
There are a few things you can do to minimize this waste. First, reduce the section alignment to 512 bytes (I don't know the options needed for NASM/GCC offhand; see the sketch below). Second, merge the sections so that you only have a single .text section. This can be a problem, though, on modern machines with the NX bit turned on: that security feature refuses to execute pages not marked as code, so a merged section holding both data and code has to be marked executable and writable, which defeats part of the protection.
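For example, with GNU ld targeting PE, something along these lines should shrink the alignment (a sketch: flag support varies by binutils version, and the entry-point symbol depends on how you wrote the source):

nasm -f win32 tiny.asm -o tiny.obj
ld -e _start --file-alignment 512 --section-alignment 512 -o tiny.exe tiny.obj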
There is also a slew of PE compression tools out there that will compact your PE and decompress it when it is executed.
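UPX is a well-known open-source example; a typical invocation is just:

upx --best tiny.exe

Keep in mind that some antivirus heuristics are suspicious of packed executables.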
I suggest using the DumpBin utility (or GNU objdump) to determine what takes up the most space. It may be resource data, huge global variables or something like that.
FWIW, the smallest programs I can assemble using ML or ML64 are on the order of 3 KB. (That's just saying hello world and exiting.)
Give me a small C program (not C++), and I'll show you how to make a 1 KB .exe from it. 1 KB is also the smallest executable size I recommend, because an .exe smaller than that will fail to run on some versions of Windows.
You merely have to play with linker switches to make it happen!
A good linker to do this is polink.
And if you do everything in Assembly, it's even easier. Just go to the MASM32 forum and you'll see plenty of programs like this.

What is your favourite anti-debugging trick?

At my previous employer we used a third-party component which was basically just a DLL and a header file. That particular module handled printing in Win32. However, the company that made the component went bankrupt, so I couldn't report a bug I'd found.
So I decided to fix the bug myself and launched the debugger. I was surprised to find anti-debugging code almost everywhere, the usual IsDebuggerPresent, but the thing that caught my attention was this:
; some twiddling with xor
; and data, result in eax
jmp eax
mov eax, 0x310fac09
; rest of code here
At first glance I just stepped over the routine, which was called twice, and then things went bananas. After a while I realized that the bit-twiddling result was always the same, i.e. the jmp eax always jumped right into the mov eax, 0x310fac09 instruction.
I dissected the bytes and there it was: 0F 31, the rdtsc instruction, which was being used to measure the time spent between some calls in the DLL.
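You can see why from the encoding alone; the byte layout below is the whole trick (the values match the snippet above):

// mov eax, 0x310fac09 assembles to these five bytes; the 32-bit
// immediate 0x310FAC09 is stored little-endian.
unsigned char code[] = { 0xB8, 0x09, 0xAC, 0x0F, 0x31 };
// Entering at code[0] executes the mov as written.
// Entering at code[3] executes the bytes 0F 31 instead, i.e. rdtsc.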
So my question to SO is: What is your favourite anti-debugging trick?
My favorite trick is to write a simple instruction emulator for an obscure microprocessor.
The copy protection and some of the core functionality are then compiled for that microprocessor (GCC is a great help here) and linked into the program as a binary blob.
The idea behind this is that the copy protection does not exist as ordinary x86 code and as such cannot be disassembled. You cannot remove the entire emulator either, because that would remove core functionality from the program.
The only chance to hack the program is to reverse engineer what the microprocessor emulator does.
I used MIPS32 for emulation because it was so easy to emulate (the emulator took just 500 lines of simple C code). To make things even more obscure, I didn't use the raw MIPS32 opcodes; instead, each opcode was XORed with its own address.
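A sketch of that obfuscation step (the helper name and signature are made up; the real build step was of course the author's own):

#include <cstdint>
#include <cstddef>

// XOR each 32-bit opcode word with the address it will occupy, so the
// stored blob looks like noise; the emulator undoes the XOR on fetch.
void xor_with_address(std::uint32_t *words, std::size_t count, std::uint32_t base) {
    for (std::size_t i = 0; i < count; ++i)
        words[i] ^= base + static_cast<std::uint32_t>(i) * 4;
}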
The binary of the copy protection looked like garbage data.
Highly recommended! It took more than 6 months before a crack came out (it was for a game project).
I've been a member of many RCE communities and have had my fair share of hacking & cracking. From my time I've realized that such flimsy tricks are usually volatile and rather futile. Most of the generic anti-debugging tricks are OS specific and not 'portable' at all.
In the aforementioned example, you're presumably using inline assembly and a __declspec(naked) function, neither of which MSVC supports when targeting x64. There are of course still ways to implement that trick, but anybody who has been reversing long enough will be able to spot and defeat it in a matter of minutes.
So in general I'd advise against anti-debugging tricks beyond using the IsDebuggerPresent API for detection. Instead, I'd suggest you code a stub and/or a virtual machine. I coded my own virtual machine and have been improving it for many years now, and I can honestly say that it has been by far the best decision I've made with regard to protecting my code.
Spin off a child process that attaches to the parent as a debugger and modifies key variables. Bonus points for keeping the child process resident and using the debugger memory operations as a kind of IPC for certain key operations.
On my system, you can't attach two debuggers to the same process.
The nice thing about this one is that nothing breaks unless they try to tamper with things.
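A minimal sketch of the attach step (error handling omitted, and how the parent's PID reaches the child is left to your launcher):

#include <windows.h>

// Attach to the parent as its debugger so nobody else can, then keep
// the parent running by swallowing every debug event.
void occupy_debug_slot(DWORD parentPid) {
    if (!DebugActiveProcess(parentPid))
        return; // someone else is already attached, which is suspicious
    DEBUG_EVENT ev;
    while (WaitForDebugEvent(&ev, INFINITE))
        ContinueDebugEvent(ev.dwProcessId, ev.dwThreadId, DBG_CONTINUE);
}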
Reference uninitialized memory! (And other black magic/voodoo...)
This is a very cool read:
http://spareclockcycles.org/2012/02/14/stack-necromancy-defeating-debuggers-by-raising-the-dead/
The most modern obfuscation method seems to be the virtual machine.
You basically take some part of your object code and convert it to your own bytecode format, then add a small virtual machine to run it. The only way to properly debug the protected code is to write an emulator or disassembler for your VM's instruction format. Of course, you need to think about performance too: too much bytecode will make your program run slower than native code.
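A toy illustration of the dispatch-loop shape (nothing here is obfuscated yet; a real protector would also encrypt or mutate the bytecode):

#include <cstdint>
#include <cstdio>

enum : std::uint8_t { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

// A four-opcode stack machine; real VMs carry dozens of opcodes.
void run(const std::uint8_t *bc) {
    std::int32_t stack[64];
    int sp = 0;
    for (;;) {
        switch (*bc++) {
        case OP_PUSH:  stack[sp++] = *bc++;               break;
        case OP_ADD:   --sp; stack[sp - 1] += stack[sp];  break;
        case OP_PRINT: std::printf("%d\n", stack[--sp]);  break;
        case OP_HALT:  return;
        }
    }
}

// const std::uint8_t prog[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT };
// run(prog); // prints 5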
Most old tricks are useless now:
IsDebuggerPresent: very lame and easy to patch
Other debugger/breakpoint detections
Ring 0 stuff: users don't like to install drivers, and you might actually break something on their systems
Other trivial stuff that everybody knows, or that makes your software unstable. Remember that even if a crack leaves your program unstable but still working, that instability will be blamed on you.
If you really want to code the VM solution yourself (there are good ones for sale), don't use just one instruction format. Make it polymorphic, so that different parts of the code use different formats. That way all of your code can't be broken by writing just one emulator/disassembler. For example, the MIPS solution some people offered seems easy to break, because the MIPS instruction format is well documented and analysis tools like IDA can already disassemble the code.
List of instruction formats supported by the IDA Pro disassembler
I would prefer that people write software that is solid, reliable and does what it is advertised to do. That they also sell it for a reasonable price with a reasonable license.
I know that I have wasted way too much time dealing with vendors that have complicated licensing schemes that only cause problems for the customers and the vendors. It is always my recommendation to avoid those vendors. Working at a nuclear power plant we are forced to use certain vendors products and thus are forced to have to deal with their licensing schemes. I wish there was a way to get back the time that I have personally wasted dealing with their failed attempts to give us a working licensed product. It seems like a small thing to ask, but yet it seems to be a difficult thing for people that get too tricky for their own good.
I second the virtual machine suggestion. I implemented a MIPS I simulator that can (now) execute binaries generated with mipsel-elf-gcc. Add code/data encryption capabilities (AES or any other algorithm of your choice) and the ability to self-simulate (so you can have nested simulators), and you have a pretty good code obfuscator.
The nice thing about choosing MIPS I is that 1) it's easy to implement, and 2) I can write code in C, debug it on my desktop, and just cross-compile it for MIPS when it's done. No need to debug custom opcodes or manually write code for a custom VM.
My personal favourite was on the Amiga, which has a coprocessor (the Blitter) that performs large data transfers independently of the CPU. This chip would be instructed to clear all memory, with the operation re-triggered from a timer IRQ.
When you attached an Action Replay cartridge, stopping the CPU meant that the Blitter would carry on clearing the memory.
Calculated jumps into the middle of instructions that look legitimate but really hide a different instruction are my favourite. They are pretty easy for humans to detect, but automated tools often mess them up.
Also, replacing a return address on the stack makes a good time-waster.
Using nop to remove assembly via the debugger is a useful trick. Of course, putting the code back is a lot harder!!!
