I have a question about how to store the assembly language in memory, when I compile the C-code in assembly, and run by "step", I can see the address of each instruction, but is there a way to change the start address of the code in the memory?
Second question is, can I break the assembly code into two?
That is, how to store the two parts in separate memory sections?
is there a way to do that?
I am curious about how the machine store the assembly code.
I am working on a MACBOOK Pro, duo core.
For the first question, can we change the offset value? or the linker and loader can not be controlled by the user? I am a litter confuse with your answer, it seems that we can not change it?
For the second question, I think what you are talking about is "input section", even if your have many ".text" input sections in your codes, after being assembled, they will become one ".text" "output section".
And the output section is the actual code stored in memory.
And I am wondering if I can control its position.
I am using the knowledge of DSP assembly, I think the mechanisms are same.
I'm not completely following either of your questions, but I'll guess.
For the first one, you're asking how to change where the executable is positioned in memory? ELF files have a preferred offset that the linker will try to use first, but the loader is usually free to position the sections anywhere if the base offset isn't available. If the image is non-relocatable and the preferred offset is unavailable, the loader will fail and the program won't run
As for your second question, you want to modify the assembly so code will be in different sections? How to do that depends on the assembler you're using; in gas you use the section pseudo-op:
.section new-section-name
The code following that directive will be in the specified section
Related
Is it possible to build a position independant code for windows ?
I am not talking about a dll, i am talking about an executable (PE).
What i want to do is that functions (main for example) which are in the program should be mapped at a different memory address between 2 executions.
Thanks
You should read up on the image base relocation directory in a PE file. I was wondering the same thing you are asking now a while back. From the research I had done I couldn't find any compiler that generates completely base independent code. I ended up writing my program entirely in x86 asm with some tricks to load strings etc independent form the address. I can tell you its not worth the trouble. Just remap all the addresses to your target location.
I suggest you start reading this
https://msdn.microsoft.com/en-us/library/ms809762.aspx
You should also check out Icezilon's (not sure how you spell his name) tutorials. They're very helpful
I am suspecting a double kfree in my kernel code. Basically, I have a data structure that is kzalloced and kfreed in a module. I notice that the same address is allocated and then allocated again without being freed in the module.
I would like to know what technique should I employ in finding where the wrong kfree is issued.
1.
Yes, kmemleak is an excellent tool, especially suitable for system-wide analysis.
Note that if you are going to use it to analyze a kernel module, you may need to save the addresses of the ELF sections containing the code of the module (.text, .init.text, ...) when the module is loaded. This may help you decipher the call stacks in the kmemleak's report. It usually makes sense to ask kmemleak to produce a report after the module has been unloaded but kmemleak cannot resolve the addresses at that time.
While a module is loaded, the addresses fo its sections can be found in the files in /sys/module/<module_name>/sections/.
After you have found the section each code address in the report belongs to and the corresponding offset into that section, you can use objdump, gdb, addr2line or a similar tool to obtain more detailed information about where the event of interest occurred.
2.
Besides that, if you are working on an x86 system and you would like to analyze a single kernel module, you can also use KEDR LeakCheck tool.
Unlike kmemleak, most of the time, it is not required to rebuild the kernel to be able to use KEDR.
The instructions on how to build and use KEDR are here. A simple example of how LeakCheck can be used is described in "Detecting Memory Leaks" section.
Have you tried enabling the kmemleak detection code?
See Documentation/kmemleak.txt for details.
actually i want to make a long code but i have some trouble on some subroutines that are at the end of my code like the space i have is not enough, so i found something like breaking my code in sessions like .data .stack .model small etc. so breaking my code in sessions like these will give me a solution? how linker translate these sessions so that can work on long codes?
what kind of model types exist?
i am working with 8088, so if you know any 16 bit editor,compiler,debugger that i can use in windows i'd appreciate. thanx
Well, you have asked a lot of questions, but I suppose that the main question is how memory models work when it comes to compiling&linking of assembly code. Actually, I've no experience with 8088 processor, but I guess it's not very different than 8086, as it has to work in real mode.
OK, when you're writing an assembly code you've to choose memory model that your program is going to be in. It's done In your assembly by .model directive (AFAIR both TASM and MASM used it).
This directive can have following parameters: TINY, SMALL, MEDIUM, COMACT, LARGE and HUGE. Each of them defines memory model of your program. TINY memory model means that your code and your data are going to be one segment. Other models allow you to declare code & data in separate segments. Or even declare a few segments that contain code (in case they exceed 64KB limit). That'd be for a starter.
Now, having all that said, we need to be note that different memory models change the way we're addressing data. If you chose TINY memory model your code and data are in the same segment. But in SMALL model they are not. So all your ASM instructions that access DATA segment are to have proper segment register set (DS).
I'm attempting to edit a library in hex editor, insert mode. The main point is to rename a few entries in it. If I make it in "Otherwrite" mode, everything works fine, but every time I try to add a few symbols to the end of string in "Insert" mode, the library fails to load. Anything I'm missing here?
Yes, you're missing plenty. A library follows the PE/COFF format, which is quite heavy on pointers throughout the file. (Eg, towards the beginning of the file is a table which points to the locations of each section in the file).
In the case that you are editing resources, there's the potential to do it without breaking things if you make sure you correct any pointers and sizes for anything pointing to after your edits, but I doubt it'll be easy. In the case that you are editing the .text section (ie, the code), then I doubt you'll get it done, since the operands of function calls and jumps are relative locations to their position in code - you would need to update the entire code to account for edits.
One technique to overcome this is a "code cave", where you replace a piece of the existing code with an explicit JMP instruction to some empty location (You can do this at runtime, where you have the ability to create new memory) - where you define some new code which can be of arbitrary length - then you explicitly JMP back to where you called from (+5 bytes say for the JMP opcode + operand).
Are the names you're changing them to the same length as the old names? If not, then the offsets of everything is shifted. And do any of the functions call one another? That could be another problem point. It'd be easier to obtain the source code (from the project's website if it's not in-house, or from the vendor if it's closed) and change them in that, and then recompile it. I'm curious as to why you are changing the names anyway.
DLLs are a complex binary format (ie compiled code). The compiling process turns named function calls into hard-wired references to specific positions in the file ("offsets"). Therefore if you insert characters into the middle of the file, the offsets after that point will no longer match what is actually at the position they reference, meaning that the function calls in your library will run the wrong code (if they manage to run anything at all).
Basically, the bottom line is what you're doing is always going to break stuff. If you're unlucky, it might even break it really badly and cause serious damage.
Sure - a detailed knowledge of the format, and what has to change. If you're wondering why some of your edits cause loading to fail, you are missing that knowledge.
Libraries are intended to be written by the linker for the use of the linker. They follow a well-defined format that is intended to be easy for the linker to write and read. They don't need tolerance for human input like a compiler does.
Very simply, libraries aren't intended to be modified by hex editors. It may be possible to change entries by overwriting them with names of the same length, or that may screw up an index somewhere. If you change the length of anything, you're likely breaking pointers and metadata.
You don't give any reason for wanting to do this. If it's for fun, well, it's harder than you expected. If you have another reason, you're better off getting the source, or getting somebody who has the source to rename and rebuild.
Say there is a buggy program that contains a sprintf() and i want to change it to a snprintf so it doesn't have a buffer overflow.. how do I do that in IDA??
You really don't want to make that kind of change using information from IDA pro.
Although IDA's disassembly is relatively high quality, it's not high quality enough to support executable rewriting. Converting a call to sprintf to a call to snprintf requires pushing a new argument on to the stack. That requires the introduction of a new instruction, which impacts the EA of everything that follows it in the executable image. Updating those effective addresses requires extremely high quality disassembly. In particular, you need to be able to:
Identify which addresses in the executable are data, and which ones are code
Identify which instruction operands are symbolic (address references) and which instruction operands are numeric.
Ida can't (reliably) give you that information. Also, if the executable is statically linked against the crt, it may not contain snpritnf, which would make performing the rewriting by hand VERY difficult.
There are a few potential workarounds. If there is sufficient padding available in (or after) the function making the call, you might be able to get away with only rewriting a single function. Alternatively, if you have access to object files, and those object files were compiled with the /GY switch (assuming you are using Visual Studio) then you may be able to edit the object file. However, editing the object file may still require substantial fix ups.
Presumably, however, if you have access to the object files you probably also have access to the source. Changing the source is probably your best bet.