For my LC3 assignment, I need to set x3100 as the starting address of the program. How would I do that? Which opcode would I need? I'm not sure which of the ones we've studied so far applies.
You would use
.ORIG x3100
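at the beginning of your code. It's called a pseudo-op or assembler directive rather than an opcode: it isn't translated into a machine instruction, and only the assembler uses it, to know the address at which your program starts.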
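To show why `.ORIG` is a directive and not an opcode, here is a minimal sketch of an assembler's first pass. It assumes simplified LC-3 source in which every non-directive line (optionally preceded by a label) occupies exactly one 16-bit word. Multi-word directives such as `.BLKW` and `.STRINGZ` are deliberately not handled, and `assign_addresses` is a hypothetical helper. `.ORIG` only sets the location counter and emits no word of its own.

```python
# Opcodes/trap aliases this sketch recognises; anything else in the first column is a label.
OPCODES = {
    "ADD", "AND", "NOT", "BR", "BRN", "BRZ", "BRP", "BRNZ", "BRNP", "BRZP", "BRNZP",
    "JMP", "JSR", "JSRR", "RET", "RTI", "LD", "LDI", "LDR", "LEA", "ST", "STI", "STR",
    "TRAP", "GETC", "OUT", "PUTS", "IN", "HALT", ".FILL",
}


def assign_addresses(lines):
    """First pass: return a dict mapping each label to its LC-3 address."""
    symbols = {}
    location = None  # location counter, unset until .ORIG is seen
    for line in lines:
        tokens = line.split(";", 1)[0].split()  # drop comments, split on whitespace
        if not tokens:
            continue
        head = tokens[0].upper()
        if head == ".ORIG":
            # .ORIG x3100 -> start counting at 0x3100; no machine word is emitted
            location = int(tokens[1].lstrip("xX"), 16)
            continue
        if head == ".END":
            break
        if location is None:
            raise ValueError("instruction before .ORIG")
        if head not in OPCODES:
            symbols[tokens[0]] = location  # first token is a label
        location += 1  # simplification: every remaining line is one word
    return symbols


program = [
    ".ORIG x3100",
    "START LD R0, CHAR   ; R0 <- the character stored at CHAR",
    "      OUT",
    "      HALT",
    "CHAR  .FILL x0041",
    ".END",
]
print({name: hex(addr) for name, addr in assign_addresses(program).items()})
```

Changing `.ORIG x3100` to another origin shifts every label by the same amount. That shift is all the directive does.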
I am just starting to learn Assembly (x86 NASM) and I am currently going over function calls. Wherever I looked on the internet I saw everyone calling functions like this:
call power
Where power is the label where the function starts. But when I try to print something in assembly, calling a function as above doesn't seem to work. We'll use the printf function from C. Say I have already written extern printf and import printf msvcrt.dll in my program (so I can actually use printf), and I have defined msg db "Hello World", 0 in my data segment. Now I am trying to print this message. If I do this:
push dword msg
call printf
Nothing happens; it doesn't work, and I have no idea why. However, if I do this:
push dword msg
call [printf]
The message is printed just as expected.
This doesn't make much sense to me, since all the articles I read used just the label, without brackets. Using just the label also made sense to me: the call instruction performs a jump to that label, so it needs the label's address. But here I don't understand why the brackets are needed or what exactly happens. What is [printf], and what would [power] be in the example from the start of my question? Despite my confusion, this is what works, and the method I used initially doesn't.
Can you please tell me exactly what is going on? (PS: I am using Olly Debugger if that makes any difference)
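It depends on what "printf" is in your assembly. If it is a function pointer (that is, the address of some function is stored at the address named "printf"), then you need the brackets []. That is the situation with import printf msvcrt.dll: printf names the import-table slot that the Windows loader fills with the address of the real function, so call [printf] reads that address and jumps to it. If "printf" is the function itself (that is, the machine code is stored at the address your assembler calls "printf"), then you must not use brackets. Otherwise you will probably get a segmentation fault (an access violation on Windows), because the first 32 or 64 bits of printf's machine code are unlikely to contain the address of executable code by accident. The same applies to [power]: it would be the first bytes of power's machine code reinterpreted as an address.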
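The difference can be modelled in a few lines. This toy sketch uses made-up addresses. `memory` stands in for the process address space, the `printf` symbol is assumed to name a 4-byte import slot that the loader has already filled with the real function's address, and `power` names ordinary machine code. `call label` jumps to the label's own address. `call [label]` first loads a 32-bit little-endian pointer from that address and then jumps to it.

```python
import struct


def call_target(memory, symbols, name, indirect):
    """Return the address a 32-bit x86 `call` would transfer control to.

    indirect=False models `call name`   (target = address of the label)
    indirect=True  models `call [name]` (target = dword stored at the label)
    """
    address = symbols[name]
    if not indirect:
        return address
    return struct.unpack_from("<I", memory, address)[0]


memory = bytearray(0x100)                      # toy 256-byte address space
struct.pack_into("<I", memory, 0x10, 0x80)     # import slot: printf's code lives at 0x80
memory[0x40:0x43] = bytes([0x55, 0x89, 0xE5])  # power: push ebp; mov ebp, esp
symbols = {"printf": 0x10, "power": 0x40}

print(hex(call_target(memory, symbols, "printf", indirect=True)))   # 0x80: the real code
print(hex(call_target(memory, symbols, "printf", indirect=False)))  # 0x10: the slot (data)
print(hex(call_target(memory, symbols, "power", indirect=True)))    # 0xe58955: opcodes read as a pointer
```

Jumping to 0x10 would execute the pointer bytes as if they were instructions. Jumping to 0xe58955 lands at an arbitrary address.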
I've tried declaring variables in .text segment using e.g. file_handle: dd 0.
However, trying to store something in this variable like mov [file_handle], eax results in a write error.
I know I could declare writable variables in the .data segment, but to make the code more compact I'd like to try it as above.
Is the only option to use the stack for storing these values (e.g. the file handle), or could I somehow write to my variable above?
Executable code segments are not writable by default. This is a basic security precaution. No, it's not a good idea. But if you insist, as this is a toy project anyway, go ahead.
You can make yours writable by telling the linker to mark it so, e.g. by giving the following argument to the MS linker:
link /SECTION:.text,EWR ....
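The linker switch changes the section's Characteristics field in the PE header. Here is a small sketch of that bit arithmetic, using the `IMAGE_SCN_*` values from the PE/COFF specification. `with_ewr` is a hypothetical helper that only approximates `/SECTION:.text,EWR` by OR-ing in the execute, write and read bits. The linker's exact handling of the other flags is not modelled.

```python
IMAGE_SCN_CNT_CODE = 0x00000020     # section contains executable code
IMAGE_SCN_MEM_EXECUTE = 0x20000000  # can be executed
IMAGE_SCN_MEM_READ = 0x40000000     # can be read
IMAGE_SCN_MEM_WRITE = 0x80000000    # can be written


def describe(characteristics):
    """Render the memory-permission bits as an rwx-style string, e.g. 'R-E'."""
    return "".join(
        letter if characteristics & flag else "-"
        for letter, flag in (("R", IMAGE_SCN_MEM_READ),
                             ("W", IMAGE_SCN_MEM_WRITE),
                             ("E", IMAGE_SCN_MEM_EXECUTE))
    )


def with_ewr(characteristics):
    """Approximate effect of /SECTION:.text,EWR: add execute, write and read."""
    return characteristics | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_WRITE | IMAGE_SCN_MEM_READ


usual_text = IMAGE_SCN_CNT_CODE | IMAGE_SCN_MEM_EXECUTE | IMAGE_SCN_MEM_READ
print(hex(usual_text), describe(usual_text))                      # 0x60000020 R-E
print(hex(with_ewr(usual_text)), describe(with_ewr(usual_text)))  # 0xe0000020 RWE
```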
You can actually arrange for the text segment of your Windows process to be mapped read+write+execute; see Kuba's answer. This might also be possible on Linux with ELF binaries; I think ELF has similar flags for segments.
I think you could also call a Windows function (VirtualProtect) to change the mapping of your text segment to read+write+execute from inside your process.
Overall this sounds like a terrible idea, and you should definitely keep temporaries on the stack like a C compiler would, if you want to avoid having a data page.
Static storage for things you only use in part of the program is wasteful.
No, it's not possible to have a writable "variable" in the .text section of an assembly program.
When you write file_handle: dd 0 in the .text section and then assemble, your label file_handle refers to an address located in the text section of your binary. However, the text section is read-only.
If the text section weren't read-only, a program could modify itself while executing.
Let's say I have a small block of machine code at address X, that's part of an EXE (PE). It contains all kind of instructions including those with relative addresses. How could I move this code to address Y without changing its behaviour? I don't want to write my own decompiler. Some existing library or a trick, maybe?
In the 'old school' days, the standard approach was to leave your code at 'x' and replace its first few bytes (say 3) with a jump to the new code. At the end of the new code, you execute the instructions you overwrote at 'x', then jump back to 'x'+3.
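This sketch computes the two jumps of that scheme. It assumes 32-bit x86 code, where a near `jmp rel32` (opcode E9) is 5 bytes long, so five bytes are overwritten instead of three. The displacement is measured from the end of the jump instruction. `encode_jmp_rel32` and the addresses are illustrative. The hard part is not shown: making sure the overwritten bytes hold whole instructions and fixing any relative ones among them.

```python
import struct

JMP_REL32_LEN = 5  # E9 + 32-bit signed displacement


def encode_jmp_rel32(src, dst):
    """Bytes for a `jmp rel32` located at `src` that transfers control to `dst`."""
    disp = dst - (src + JMP_REL32_LEN)  # relative to the next instruction
    return b"\xE9" + struct.pack("<i", disp)  # struct.error if beyond +/-2 GiB


x = 0x00401000         # original code; its first 5 bytes get overwritten
new_code = 0x00500000  # replacement code placed elsewhere
stolen_len = 5         # whole instructions covering at least JMP_REL32_LEN bytes

patch_at_x = encode_jmp_rel32(x, new_code)
# The new code ends by re-running the stolen instructions, then jumping back:
back_from = new_code + 0x40  # assume the jump back sits at offset 0x40 in the new code
jump_back = encode_jmp_rel32(back_from, x + stolen_len)
print(patch_at_x.hex(), jump_back.hex())
```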
Generally impossible, but easy if the code is carefully written. See: position-independent code.
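This short sketch shows why relative addressing matters when the block moves. It assumes the instruction lengths and displacements are already known, because decoding them is exactly the disassembler work the question wants to avoid. If a relative `jmp`/`call` target moves together with the block, its displacement stays the same. If the target stays behind, the instruction must be re-encoded. `new_displacement` is a hypothetical helper, and absolute addresses (which need relocation entries) are ignored.

```python
def new_displacement(insn_addr, disp, length, delta, target_inside_block):
    """Displacement a relative jmp/call needs after its block moves by `delta` bytes.

    insn_addr: original address of the instruction
    disp:      original signed displacement (relative to the next instruction)
    length:    instruction length in bytes
    """
    target = insn_addr + length + disp
    if target_inside_block:
        target += delta  # the target moved together with the instruction
    new_next = insn_addr + delta + length  # address of the following instruction after the move
    return target - new_next


# A 5-byte call at 0x401000 with displacement 0x20, block moved up by 0x1000 bytes:
print(new_displacement(0x401000, 0x20, 5, 0x1000, target_inside_block=True))   # 32: unchanged
print(new_displacement(0x401000, 0x20, 5, 0x1000, target_inside_block=False))  # -4064: must be patched
```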
I'm building a static binary out of several source files and libraries, and I want to control the order in which the functions are put into the resulting binary.
The background is, I have external code which is linked against offsets in this binary. Now if I change the source, all the offsets change because gcc may decide to order the functions differently, so I want to put the referenced functions at the beginning in a fixed order so their offsets stay unchanged...
I looked through ld's documentation but couldn't find anything about order of functions.
The only thing I found was -fno-toplevel-reorder, which doesn't really help me.
There is really no clean and reliable way of forcing a function to a particular address (except for the entry function) or even forcing functions having a particular order (and if you could enforce the order that would still not mean that the addresses stay the same when the source is changed!).
The biggest problem I see is that even if it may be possible to fix a function to some address, it will be practically impossible to fix all of them to exactly the addresses that the existing external program expects (assuming you cannot modify this program). If that actually worked, it would be total coincidence and sheer luck.
It might be easiest to place trampolines at the addresses the other program expects, each jumping to the real function (wherever it may be). That would require your code to use a different base address, so the actual program code doesn't collide with the trampolines.
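This is a minimal sketch of building such a trampoline table for 32-bit x86. Each expected address gets a 5-byte `jmp rel32` to wherever the real function landed. The function names, addresses and `build_trampolines` are hypothetical. The overlap check exists because two expected addresses closer than 5 bytes cannot both hold a full jump.

```python
import struct

JMP_LEN = 5  # x86 jmp rel32: E9 + signed 32-bit displacement


def build_trampolines(expected, actual):
    """Map each address the external program calls to a 5-byte jump to the real function.

    expected: function name -> fixed address the external code was linked against
    actual:   function name -> address where the function landed in this build
    """
    slots = sorted(expected.items(), key=lambda item: item[1])
    for (_, a), (_, b) in zip(slots, slots[1:]):
        if b - a < JMP_LEN:
            raise ValueError(f"trampolines at {a:#x} and {b:#x} would overlap")
    table = {}
    for name, at in slots:
        disp = actual[name] - (at + JMP_LEN)  # relative to the end of the jmp
        table[at] = b"\xE9" + struct.pack("<i", disp)
    return table


expected = {"init": 0x401000, "update": 0x401010}  # hypothetical fixed entry points
actual = {"init": 0x402345, "update": 0x402100}    # wherever this build put them
for address, stub in build_trampolines(expected, actual).items():
    print(hex(address), stub.hex())
```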
There are three things that almost work for giving functions fixed addresses:
You can place each function that isn't allowed to move in its own section using __attribute__ ((section ("some name"))). Unfortunately, .text always appears as the first section, so if .text changes enough that its size crosses a 512-byte boundary, your offsets will change. By default (but see below) you can't get a section to start before .text.
The -falign-functions=n command-line option lets you align functions to a boundary. The usual alignment is around 16 bytes. You could choose a large value instead, for example 1024. That wastes an immense amount of space, but it also ensures that, as long as functions change only moderately, the addresses of the following functions remain the same. It still does not prevent the compiler/linker from reordering entire blocks when it feels like it (though -fno-toplevel-reorder will prevent this at least partially).
If you are willing to write a custom linker script, you can assign a start address to each section. These are virtual memory addresses, not positions in the executable, but I assume the external code's hard-coded addresses are VMAs (based on the default image base) too. So that could kind of work, although with much trouble and not in a pretty way.
When writing your own linker script, you could also consider putting the functions that must not move into their own sections and moving these sections at the beginning of the executable (in front of .text), so changes in .text won't move your functions around.
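The arithmetic behind the first two ideas is rounding up to a boundary. This sketch assumes functions are laid out back to back in source order, which the toolchain does not guarantee. The function sizes are made up. The output shows that a large `-falign-functions` value absorbs moderate size changes, and that a section placed after `.text` moves only when `.text` crosses a 512-byte boundary.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return -(-value // alignment) * alignment


def function_offsets(sizes, alignment):
    """Offsets of consecutive functions when each one starts on an `alignment` boundary."""
    offsets, offset = [], 0
    for size in sizes:
        offset = align_up(offset, alignment)
        offsets.append(offset)
        offset += size
    return offsets


print(function_offsets([100, 300, 50], 1024))   # [0, 1024, 2048]
print(function_offsets([700, 300, 50], 1024))   # first function grew: still [0, 1024, 2048]
print(function_offsets([1100, 300, 50], 1024))  # grew past 1024 bytes: [0, 2048, 3072]

# A section that follows .text starts at the next 512-byte boundary:
print(align_up(1000, 512), align_up(1030, 512))  # 1024 1536
```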
Update:
The "gcc" tag suggests that you probably target *NIX, so again this is probably not going to help you, but... if you have the option to use COFF, dollar-sign sections might work (the info might be interesting for others, in any case).
I just stumbled across this today (emphasis mine):
The "$" character (dollar sign) has a special interpretation in section names in object files. When determining the image section that will contain the contents of an object section, the linker discards the "$" and all characters that follow it. Thus, an object section named .text$X actually contributes to the .text section in the image. However, the characters following the "$" determine the ordering of the contributions to the image section. All contributions with the same object-section name are allocated contiguously in the image, and the blocks of contributions are sorted in lexical order by object-section name. Therefore, everything in object files with section name .text$X ends up together, after the .text$W contributions and before the .text$Y contributions.
If the documentation does not lie (and if I'm not misreading it), this means you should be able to pack all the functions you want located at the front into one section, .text$A, and everything else into .text$B, and the linker should do just that.
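The quoted rule can be modelled directly. This sketch follows the documented behaviour, not the MS linker itself. Contributions are sorted by their full object-section name, using a stable sort so that contributions with the same name keep their input order. Each one is then merged into the image section named by the part before `$`. The section contents are placeholder strings.

```python
def image_sections(contributions):
    """contributions: list of (object_section_name, content) in input (object file) order.

    Returns image section name -> list of contents in final layout order.
    """
    image = {}
    for name, content in sorted(contributions, key=lambda c: c[0]):  # stable, lexical
        image.setdefault(name.split("$", 1)[0], []).append(content)
    return image


objs = [
    (".text$B", "main"),
    (".text$A", "hook_one"),
    (".text", "crt_startup"),
    (".text$A", "hook_two"),
]
print(image_sections(objs))
# {'.text': ['crt_startup', 'hook_one', 'hook_two', 'main']}
```

Under this rule, a contribution named plain `.text` sorts ahead of `.text$A` because the shorter name compares lower. Anything left in plain `.text` would therefore still land before the pinned functions.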
Build your code with -ffunction-sections -- this will place each function into its own section.
If you are using GNU-ld, the linker script gives you absolute control, but is a very platform-specific and somewhat painful solution.
A better solution might be to use the gold linker: its --section-ordering-file option allows exactly the function ordering you are seeking.
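Here is a small generator for both approaches. It assumes C functions (C++ names would appear mangled) and relies on GCC's naming: with `-ffunction-sections`, a function `foo` goes into a section named `.text.foo`. The gold output is meant for `--section-ordering-file` (one section name per line). The GNU ld fragment pins the listed sections at the start of `.text`. Both are minimal sketches rather than complete linker scripts, and the function names are hypothetical.

```python
def function_sections(functions):
    """Section names GCC uses for C functions compiled with -ffunction-sections."""
    return [".text." + name for name in functions]


def gold_ordering_file(functions):
    """Contents for gold's --section-ordering-file: one section name per line."""
    return "".join(section + "\n" for section in function_sections(functions))


def ld_text_rule(functions):
    """GNU ld output-section rule placing the pinned sections first, then everything else.

    An input section is assigned by the first rule that matches it, so the trailing
    wildcard does not pick up the pinned sections a second time.
    """
    pinned = " ".join(f"*({section})" for section in function_sections(functions))
    return f".text : {{ {pinned} *(.text .text.*) }}"


pinned = ["api_init", "api_step"]  # hypothetical functions the external code depends on
print(gold_ordering_file(pinned), end="")
print(ld_text_rule(pinned))
```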
A lot of it comes from the order the functions are in the file and the order the files are on the command line when you link.
Embed something in the code that your external code can find, perhaps a const structure with an ASCII signature and the addresses of the functions. Then, no matter where the compiler puts the functions, you can find them.
That, or use the normal .dll or .so mechanisms and avoid messing with this at all.
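On the external side, finding such a table is a byte search. This sketch assumes a hypothetical packed layout: an 8-byte signature `FNTABLE1`, then a little-endian 32-bit count, then that many 32-bit addresses. The binary would have to store the table so that the linker fills in the real addresses, e.g. as a const array of function pointers placed right after the signature.

```python
import struct

SIGNATURE = b"FNTABLE1"  # hypothetical marker embedded in the binary


def read_function_table(image):
    """Return the function addresses stored right after SIGNATURE in `image` (bytes)."""
    start = image.find(SIGNATURE)
    if start < 0:
        raise ValueError("signature not found")
    pos = start + len(SIGNATURE)
    (count,) = struct.unpack_from("<I", image, pos)
    return list(struct.unpack_from(f"<{count}I", image, pos + 4))


# Fake image: some code bytes, then the table (count=2, two addresses), then more bytes.
image = b"\x90" * 37 + SIGNATURE + struct.pack("<3I", 2, 0x401000, 0x401230) + b"\xCC" * 5
print([hex(a) for a in read_function_table(image)])  # ['0x401000', '0x401230']
```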
In my experience, gcc -O0 will fix the binary order of functions to match the order in the source code.
However as others have mentioned, even if the order is fixed, the offsets can change as you modify the source code or upgrade your toolchain.
I would like to know: is there any disassembler for Windows that can generate assembly source code that an assembler can also build?
Since a disassembler can generate assembly code from an EXE file, can that assembly code be used directly as source code and then assembled by an assembler like NASM?
IDA can generate the source code. But in most cases you can't edit it. Assume the following code:
loc_401020:
ret
; ...
dd 0FFFFFFFFh, 0, 1, 401020h, 0
; ^^^^^^^ can you find it in big real program?
To insert any new bytes, you must either be sure that every sub_XXXX or loc_XXXX stays at the same offset, or replace all references to them with labels.
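A toy layout shows the problem. Once bytes are inserted, every label after the insertion point moves. A bare number in a data table does not move, and a bare number is what IDA emits when it cannot prove that a value is a pointer. `layout` and the sizes are illustrative. Only `loc_401020: ret` and the `401020h` value come from the snippet above.

```python
def layout(items, base):
    """Assign addresses to (label, size_in_bytes) items placed back to back from `base`."""
    addresses, address = {}, base
    for label, size in items:
        if label is not None:
            addresses[label] = address
        address += size
    return addresses


original = [("sub_401000", 0x20), ("loc_401020", 1)]  # loc_401020: ret (1 byte)
table_value = 0x401020                                 # the raw 401020h in the dd line

patched = [(None, 4)] + original  # insert 4 new bytes at the start
before = layout(original, 0x401000)["loc_401020"]
after = layout(patched, 0x401000)["loc_401020"]
print(hex(before), hex(after), table_value == after)  # 0x401020 0x401024 False
```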
If you don't move any code, you don't need to recompile it - just patch and maybe extend the code section.
I think IDA is quite good at this.
Anyway, the main problem would be that the generated assembly code would be quite unreadable and very hard to maintain (no variable names, function names or signatures). So although it would technically be ASM code, it's still better to use IDA as a clever editor to deduce this information.
I'd patch the executable directly in a debugger and then save the modified executable.
Disassembling and then reassembling is a fragile process, since any change in the position of the reassembled code can break the program. I also think some asm instructions have multiple binary encodings, which complicates matters even further.
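The last point can be illustrated concretely. The same instruction can have several valid encodings, sometimes of different lengths. An assembler that re-encodes the disassembly may therefore not reproduce the original bytes, or the original addresses. This tiny decoder handles only the register forms of 32-bit `ADD` used below. It is a sketch, not a general x86 decoder.

```python
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_add(code):
    """Decode a few 32-bit x86 ADD forms; return (assembly text, length in bytes)."""
    op = code[0]
    if op in (0x01, 0x03) and code[1] >> 6 == 0b11:
        # 01 /r: ADD r/m32, r32    03 /r: ADD r32, r/m32   (register-direct ModRM only)
        reg, rm = REGS[(code[1] >> 3) & 7], REGS[code[1] & 7]
        dst, src = (rm, reg) if op == 0x01 else (reg, rm)
        return f"add {dst}, {src}", 2
    if op == 0x83 and code[1] >> 3 == 0b11000:
        # 83 /0 ib: ADD r/m32, imm8 (sign-extended), register-direct
        imm = int.from_bytes(code[2:3], "little", signed=True)
        return f"add {REGS[code[1] & 7]}, {imm}", 3
    if op == 0x05:
        # 05 id: ADD EAX, imm32
        imm = int.from_bytes(code[1:5], "little", signed=True)
        return f"add eax, {imm}", 5
    raise NotImplementedError(f"opcode {op:#04x} not handled in this sketch")


# Two encodings of "add eax, ebx" (same length) and two of "add eax, 1" (3 vs 5 bytes):
for encoding in ("01d8", "03c3", "83c001", "0501000000"):
    print(encoding, decode_add(bytes.fromhex(encoding)))
```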
You are losing some information when disassembling an executable so you are unlikely to get a fully working executable when assembling the disassembly. If you are clever, you can extract single functions from an executable, but not the whole program.
The objconv disassembler can produce assembly code in masm, nasm, yasm and gas syntax.