Call a shellcode without using pointer to function? - gcc

Is there a way to get the return value of a function that is in the shellcode, without using pointer to function?
#include <stdio.h>
unsigned char code[] = "\x55\x48\x89\xe5"
"\xb8\x05\x00\x00"
"\x00\x5d\xc3";
int main(void) {
int (*p)(void) = (int(*)(void))code;
printf("%d", p());
return 0;
}

Shellcode (see the Wikipedia article Shellcode as well as the presentation Introduction to Shellcode Development) is machine code that is injected into an application in order to take over that application and run your own code within its process.
How the shellcode is injected into the application and starts running will vary depending on how the penetration is being done.
However, for testing the shellcode itself, as opposed to the method of injecting it, the usual approach is a simple program that lets you (1) define the shellcode to be injected as an array of bytes and (2) start the shellcode executing.
The simplest approach for this is the source code you have posted.
You have an array of unsigned char which contains the machine code to be executed.
You have a main() which creates a function pointer to the array of unsigned char bytes and then calls the shellcode through the function pointer.
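On current systems a plain array such as code usually lives in memory that is not executable, so the call through the pointer may crash. Below is a minimal sketch of a more robust test harness, assuming Linux on x86-64 (the bytes in the question are x86-64 instructions). It copies the bytes into a page obtained with mmap(), switches the page to read+execute with mprotect(), and then calls it. The helper name run_code is hypothetical, and it still goes through a function pointer.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Bytes from the question: push rbp; mov rbp,rsp; mov eax,5; pop rbp; ret */
static const unsigned char code[] =
    "\x55\x48\x89\xe5"
    "\xb8\x05\x00\x00"
    "\x00\x5d\xc3";

/* Hypothetical helper: runs `len` bytes of machine code and returns the
 * value the code leaves in eax, or -1 if the harness could not run it. */
int run_code(const unsigned char *bytes, size_t len)
{
    assert(bytes != NULL && len > 0);
#if defined(__x86_64__) && defined(__linux__)
    /* Get a writable page first, so the bytes can be copied in. */
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    memcpy(mem, bytes, len);

    /* Drop write permission and add execute permission. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return -1;
    }

    int (*fn)(void) = (int (*)(void))mem;
    int result = fn();
    munmap(mem, len);
    return result;
#else
    /* The bytes are x86-64 specific; refuse to run them elsewhere. */
    return -1;
#endif
}
```

Calling run_code(code, sizeof code - 1) should yield the value the shellcode loads into eax when the platform assumptions hold, and -1 otherwise.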
In a real-world penetration, however, you would normally take over an application by injecting your shellcode into its process space and then triggering the execution of that shellcode. One such approach is a buffer overflow attack. See for example COEN 152 Computer Forensics Buffer Overflow Attack as well as the Wikipedia article Buffer overflow.
See also
Shellcode in C program
Re-writing a small execve shellcode
Also note that the approaches for shellcode attacks vary depending on the operating system being attacked. For instance, see the article Basics of Windows shellcode writing, which explains some of the intricacies of writing shellcode that accesses system calls in Windows. Compare it with the article How to write a Linux x86 shellcode.

Related

How do I disable ASLR for heap addresses for a program compiled and linked with mingw-w64 GCC? [duplicate]

For debugging purposes, I would like malloc to return the same addresses every time the program is executed, however in MSVC this is not the case.
For example:
#include <stdlib.h>
#include <stdio.h>
int main() {
int test = 5;
printf("Stack: %p\n", &test);
printf("Heap: %p\n", malloc(4));
return 0;
}
Compiling with cygwin's gcc, I get the same Stack address and Heap address every time, while compiling with MSVC with ASLR off...
cl t.c /link /DYNAMICBASE:NO /NXCOMPAT:NO
...I get the same Stack address every time, but the Heap address changes.
I have already tried adding the registry value HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\MoveImages but it does not work.
Both the stack address and the pointer returned by malloc() may be different every time. As a matter of fact, both differ when the program is compiled and run on macOS multiple times.
The compiler and/or the OS may cause this behavior to try and make it more difficult to exploit software flaws. There might be a way to prevent this in some cases, but if your goal is to replay the same series of malloc() addresses, other factors may change the addresses, such as time sensitive behaviors, file system side effects, not to mention non-deterministic thread behavior. You should try and avoid relying on this for your tests.
Note also that &test should be cast as (void *) as %p expects a void pointer, which is not guaranteed to have the same representation as int *.
It turns out that you may not be able to obtain deterministic behaviour from the MSVC runtime libraries. Both the debug and the production versions of the C/C++ runtime libraries end up calling a function named _malloc_base(), which in turn calls the Win32 API function HeapAlloc(). Unfortunately, neither HeapAlloc() nor the function that provides its heap, HeapCreate(), document a flag or other way to obtain deterministic behaviour.
You could roll your own allocation scheme on top of VirtualAlloc(), as suggested by @Enosh_Cohen, but then you'd lose the debug functionality offered by the MSVC allocation functions.
Diomidis' answer suggests making a new malloc on top of VirtualAlloc, so I did that. It turned out to be somewhat challenging because VirtualAlloc itself is not deterministic, so I'm documenting the procedure I used.
First, grab Doug Lea's malloc. (The ftp link to the source is broken; use this http alternative.)
Then, replace the win32mmap function with this (hereby placed into the public domain, just like Doug Lea's malloc itself):
static void* win32mmap(size_t size) {
/* Where to ask for the next address from VirtualAlloc. */
static char *next_address = (char*)(0x1000000);
/* Return value from VirtualAlloc. */
void *ptr = 0;
/* Number of calls to VirtualAlloc we have made. */
int tries = 0;
while (!ptr && tries < 100) {
ptr = VirtualAlloc(next_address, size,
MEM_RESERVE|MEM_COMMIT, PAGE_READWRITE);
if (!ptr) {
/* Perhaps the requested address is already in use. Try again
* after moving the pointer. */
next_address += 0x1000000;
tries++;
}
else {
/* Advance the request boundary. */
next_address += size;
}
}
/* Either we got a non-NULL result, or we exceeded the retry limit
* and are going to return MFAIL. */
return (ptr != 0)? ptr: MFAIL;
}
Now compile and link the resulting malloc.c with your program, thereby overriding the MSVCRT allocator.
With this, I now get consistent malloc addresses.
But beware:
The exact address I used, 0x1000000, was chosen by enumerating my address space using VirtualQuery to look for a large, consistently available hole. The address space layout appears to have some unavoidable non-determinism even with ASLR disabled. You may have to adjust the value.
I confirmed this works, in my particular circumstances, to get the same addresses during 100 sequential runs. That's good enough for the debugging I want to do, but the values might change after enough iterations, or after rebooting, etc.
This modification should not be used in production code, only for debugging. The retry limit is a hack, and I've done nothing to track when the heap shrinks.
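If the goal is only a repeatable allocation pattern for debugging, a different and simpler technique than the VirtualAlloc-based malloc above is a bump allocator over a static buffer: offsets within the buffer are the same on every run, regardless of where the buffer itself is mapped. The following is a minimal portable C11 sketch. The names arena_alloc and arena_offset and the 4096-byte size are illustrative assumptions, and nothing is ever freed.

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_SIZE 4096u

/* Fixed backing store; its offsets are stable across runs. */
static _Alignas(16) unsigned char arena[ARENA_SIZE];
static size_t arena_used = 0;

/* Hypothetical bump allocator: hands out 16-byte-aligned chunks in order. */
void *arena_alloc(size_t size)
{
    /* Round the request up to a multiple of 16 bytes. */
    size_t rounded = (size + 15u) & ~(size_t)15u;

    /* Reject overflowed or oversized requests. */
    if (rounded < size || rounded > ARENA_SIZE - arena_used)
        return NULL;

    void *p = arena + arena_used;
    arena_used += rounded;
    return p;
}

/* Offset of a chunk from the start of the arena: the value to log
 * or compare between runs instead of the raw pointer. */
size_t arena_offset(const void *p)
{
    assert(p != NULL);
    return (size_t)((const unsigned char *)p - arena);
}
```

Logging arena_offset(p) rather than p gives values that can be compared between runs even when the raw addresses move.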

Successfully de-referenced userspace pointer in kernel space without using copy_from_user()

There's a bug in a driver within our company's codebase that's been there for years.
Basically we make calls to the driver via ioctls. The data passed between userspace and the driver is stored in a struct, and a pointer to that struct is passed to the ioctl. The driver is responsible for copying the data in with copy_from_user() before using it. But this code hasn't been doing that for years; instead it just dereferences the userspace pointer. And so far (that I know of) it hasn't caused any issues until now.
I'm wondering how this code hasn't caused any problems for so long? In what case will dereferencing a pointer in kernel space straight from userspace not cause a problem?
In userspace
struct InfoToDriver_t data;
data.cmd = DRV_SET_THE_CLOCK;
data.speed = 1000;
ioctl(driverFd, DEVICE_XX_DRIVER_MODIFY, &data);
In the driver
device_xx_driver_ioctl_handler (struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
{
struct InfoToDriver_t *user_data;
switch(cmd)
{
case DEVICE_XX_DRIVER_MODIFY:
// what we've been doing for years, BAD
// But somehow never caused a kernel oops until now
user_data = (InfoToDriver_t *)arg;
if (user_data->cmd == DRV_SET_THE_CLOCK)
{ .... }
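// what we're supposed to do: copy the struct into kernel memory first
// (assumes "struct InfoToDriver_t kernel_data;" is declared at the top of the handler)
if (copy_from_user(&kernel_data, (void __user *)arg, sizeof(kernel_data)))
return -EFAULT;
if (kernel_data.cmd == DRV_SET_THE_CLOCK)
{ ... }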
One possible answer is that this depends on the architecture. As you have seen, on a sane architecture (such as x86 or x86-64) simply dereferencing __user pointers just works. But Linux aims to support every possible architecture, and there are architectures where a simple dereference does not work. Otherwise copy_to/from_user would not exist.
Another reason for copy_to/from_user is the possibility that the usermode side modifies its memory at the same time as the kernel side accesses it (from another thread). You cannot assume that the content of usermode memory is frozen while the kernel accesses it. Rogue usermode code can use this to attack the kernel. For example, you can probe the pointer to the output data before doing the work, but by the time you copy the result back to usermode, that pointer may already be invalid. Oops. The copy_to_user API ensures (or should ensure) that the kernel won't crash during the copy; instead the guilty application will be killed.
A safer approach is to copy the whole usermode data structure into the kernel (aka 'capture') and check this copy for consistency.
The bottom line: if this driver is proven to work well on a certain architecture, and there are no plans to port it, there's no urgency to change it. But check the robustness of the kernel code carefully: whether the usermode data needs to be captured, and whether problems may arise while copying from usermode.
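The "capture, then check the copy" pattern can be sketched in ordinary userspace C. This is only an analogy, not driver code: fake_copy_from_user is a hypothetical stand-in that mimics the return convention of the kernel's copy_from_user() (number of bytes not copied), and the value of DRV_SET_THE_CLOCK and the speed limit are made-up assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

enum { DRV_SET_THE_CLOCK = 1 };          /* hypothetical command value */

struct InfoToDriver_t {
    int cmd;
    int speed;
};

/* Stand-in for copy_from_user(): returns the number of bytes that
 * could NOT be copied, so 0 means success. */
static unsigned long fake_copy_from_user(void *to, const void *from,
                                         unsigned long n)
{
    if (from == NULL)
        return n;                        /* pretend the address is bad */
    memcpy(to, from, n);
    return 0;
}

/* Handler sketch: capture the caller's struct once, then validate and
 * use only the private copy, never the caller's memory again. */
long handle_modify(const void *user_arg, int *clock_out)
{
    struct InfoToDriver_t kdata;

    assert(clock_out != NULL);
    if (fake_copy_from_user(&kdata, user_arg, sizeof kdata) != 0)
        return -EFAULT;

    if (kdata.cmd != DRV_SET_THE_CLOCK)
        return -EINVAL;
    if (kdata.speed <= 0 || kdata.speed > 100000)  /* made-up limit */
        return -EINVAL;

    *clock_out = kdata.speed;
    return 0;
}
```

Because every check and every later use reads kdata, a caller that changes its own struct after the copy cannot invalidate the checks.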

Linux Syscalls with > 6 parameters

Is it possible to write a (Linux kernel) syscall function that has more than 6 input parameters? Looking at the header I see that the defined syscall macros have a maximum of 6 parameters. I'm tempted to try to define SYSCALL7 and SYSCALL8 to allow for 7 and 8 parameters but I'm not quite sure if that will actually work.
For x86, the following function (from x86...syscall.h) copies the arguments over:
static inline void syscall_get_arguments(struct task_struct *task,
struct pt_regs *regs,
unsigned int i, unsigned int n,
unsigned long *args)
{
BUG_ON(i + n > 6);
memcpy(args, &regs->bx + i, n * sizeof(args[0]));
}
This function is described well in the comments in asm_generic/syscall.h. It copies the arguments into the syscall, and there is a limit of 6 arguments. It may be implemented in a number of ways depending on architecture. For x86 (from the snippet above) it looks like the arguments are all passed by register.
So, if you want to pass more than 6 arguments, use a struct. If you must have a SYSCALL7, then you are going to have to create a custom kernel and likely modify almost every step of the syscall process. x86_64 would likely accommodate this change more easily, since it has more registers than x86.
What if one day you need 20 parameters? I think the best way around your syscall problem is to pass a void pointer (void *).
This way you can pass a struct containing an unlimited amount of parameters.
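A minimal sketch of the struct-pointer idea, written as plain userspace C so it can be compiled and run. The struct many_args layout, the size field, and the function sum_many are all hypothetical; in a real syscall the kernel would first copy the struct in from userspace rather than dereference the pointer directly.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical argument block carrying more values than fit in six
 * registers. Only one pointer has to cross the call boundary. */
struct many_args {
    size_t size;       /* sizeof(struct many_args), set by the caller */
    long   values[8];  /* the eight "parameters" */
};

/* Hypothetical handler: checks the block, then uses all eight values. */
long sum_many(const struct many_args *args)
{
    long total = 0;

    if (args == NULL)
        return -EFAULT;
    /* The size field lets the callee reject a mismatched layout. */
    if (args->size != sizeof *args)
        return -EINVAL;

    for (size_t i = 0; i < sizeof args->values / sizeof args->values[0]; i++)
        total += args->values[i];
    return total;
}
```

The caller fills in the struct, sets size to sizeof(struct many_args), and passes its address as the single argument.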
Generally there is no limit to the number of parameters. But all these things need a standard: kernel module writers and users or callers need to agree on a standard way to pass information from caller to callee (and vice versa), whether by stack or by register. This is called the "ABI" or calling convention. There are different standards for x86 and AMD64, and generally each is the same across all UNIX systems on x86: Linux, FreeBSD etc.
http://www.x86-64.org/documentation/abi.pdf
Eg, x86 syscall ABI:
http://lwn.net/Articles/456731/
http://esec-lab.sogeti.com/post/2011/07/05/Linux-syscall-ABI
For more details, please see (to avoid repetition):
What are the calling conventions for UNIX & Linux system calls on x86-64
Why does Windows64 use a different calling convention from all other OSes on x86-64?
And userspace will have its own ABI as well:
https://www.kernel.org/doc/Documentation/ABI/README
https://lwn.net/Articles/234133/
http://lwn.net/Articles/456731/

To understand the concept of the loader in LINUX by simple example?

As I understand it, the core of a boot loader is a loader program. By loader, I mean a program that loads another program. Or, to be more specific, first it loads itself and then the high-level image, for example the kernel. Instead of writing a bootloader, I thought I would clear up my doubts about loaders by writing, on a running OS, a program that loads another program. I do understand that every process's memory map is entirely independent of the others. So, what I am trying to do is make a simple program, hello_world.c, that prints the great "hello world". Now, I want to make a loader program that loads this hello world program. As I understand it, the crux is in two steps:
Load the hello world program on the RAM - loader address.
JMP to the Entry Address.
Since this is only to understand the concept, I am using the ready-made utility readelf to read the addresses in the hello world binary. The intention here is not to write an ELF parser.
Since all processes are independent and use virtual memory, this will fail if I use the virtual memory addresses. Now I am stuck here: how can I achieve this?
#include "stdio.h"
#include <sys/mman.h>
int main( int argc, char **argv)
{
char *mem_ptr;
FILE *fp;
char *val;
char *exec;
mem_ptr = (char*) malloc(10*1024);
fp = fopen("./hello_world.out","rb");
fread(mem_ptr, 10240, 1, fp);
//val = mem_ptr + 0x8048300;
printf("The mem_ptr is %p\r\n",mem_ptr);
exec = mmap(NULL, 10240, PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_PRIVATE | MAP_ANONYMOUS, 0x9c65008, 0);
memcpy(mem_ptr,exec,10240);
__asm__("jmp 0x9c65008");
fclose(fp);
return 0;
}
my rep is not enough to let me add comments.
As Chris Stratton said, your problem sounds ambiguous (still, after editing!). Do you want to
Write a bootloader that will load "Hello, World" instead of a real OS? <-- The actual question says this, OR
Write a program that runs on an OS (so a full-fledged OS will be there) and loads another executable? <-- The comments say this
The answers will vary a lot depending on this.
In the first case, the bootloader lives in the BIOS, which fetches a predefined block into RAM. So all you need to do is place your Hello, World in that block. There is much more to this, such as chain loading, but I am not sure this is what you want to achieve. If this is NOT what you wanted, why is the bootstrap tag used?
In the second case, fork() + exec() will do it for you. But be aware that this way there will be two different address spaces. If you want them in the same address space, I doubt that is possible on an everyday OS (for normal users). Most of your question sounds like this is what you want to do.
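For the second case, here is a minimal sketch of a "loader" built on fork() and exec(), assuming a POSIX system. The helper name run_program is hypothetical. The kernel does the actual loading of the ELF image, and the loaded program runs in the child's own address space.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] (searched on PATH) with the given
 * arguments and returns its exit status, or -1 on failure. */
int run_program(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;                 /* fork failed */

    if (pid == 0) {
        /* Child: replace this process image with the new program. */
        execvp(argv[0], argv);
        _exit(127);                /* only reached if exec failed */
    }

    /* Parent: wait for the child and report how it ended. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the question's example, argv would be something like { "./hello_world.out", NULL }.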
If you want to ask something different from this, please edit almost the entire question and ask ONLY that part. (Avoid explaining why you are trying to do something, what you think you already understand, etc.)

Why does Unix have fork() but not CreateProcess()?

I do not get why Unix has fork() for creating a new process. In Win32 API we have CreateProcess() which creates a new process and loads an executable into its address space, then starts executing from the entry point. However Unix offers fork for creating a new process, and I don't get why would I duplicate my process if I'd like to run another process.
So let me ask these two questions:
If fork() and then exec() is more efficient, why isn't there a function forkexec(const char *newProc) since we will call exec() after fork() almost in every case?
If it is not more efficient, why does fork() exist at all?
The fork() call is sufficient. It is also more flexible; it allows you to do things like adjust the I/O redirection in the child process, rather than complicating the system call that creates the process. With SUID or SGID programs, it allows the child to drop its elevated privileges before executing the other program.
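As an illustration of that flexibility, the sketch below redirects the child's standard output into a pipe between fork() and exec(), so the parent can read what the program prints. It assumes a POSIX system; the helper name capture_output is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs argv[0] and stores up to cap-1 bytes of its
 * standard output in buf (NUL-terminated). Returns the number of bytes
 * stored, or -1 on failure. */
long capture_output(char *const argv[], char *buf, size_t cap)
{
    int fds[2];

    assert(argv != NULL && argv[0] != NULL && buf != NULL && cap > 0);
    if (pipe(fds) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: rewire stdout to the pipe, then exec. The program
         * being run needs no knowledge of the redirection. */
        close(fds[0]);
        if (dup2(fds[1], STDOUT_FILENO) < 0)
            _exit(127);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);                /* only reached if exec failed */
    }

    /* Parent: read the child's output until EOF or the buffer is full. */
    close(fds[1]);
    size_t total = 0;
    ssize_t n;
    while (total < cap - 1 &&
           (n = read(fds[0], buf + total, cap - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';
    close(fds[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return (long)total;
}
```

All of the redirection logic sits in the caller, in the few lines between fork() and execvp().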
If you want a complex way to create a process, look up the posix_spawn() function.
#include <spawn.h>
int posix_spawn(pid_t *restrict pid, const char *restrict path,
const posix_spawn_file_actions_t *file_actions,
const posix_spawnattr_t *restrict attrp,
char *const argv[restrict], char *const envp[restrict]);
int posix_spawnp(pid_t *restrict pid, const char *restrict file,
const posix_spawn_file_actions_t *file_actions,
const posix_spawnattr_t *restrict attrp,
char *const argv[restrict], char *const envp[restrict]);
The difference is that posix_spawnp() searches PATH for the executable.
There is a whole set of other functions for handling posix_spawn_file_actions_t and posix_spawnattr_t types (follow the 'See Also' links at the bottom of the referenced man page).
This is quite a bit more like CreateProcess() on Windows. For the most part, though, using fork() followed shortly by exec() is simpler.
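A minimal usage sketch of posix_spawnp(), assuming a POSIX system. The helper name spawn_and_wait is hypothetical, and both the file actions and the attributes are left as NULL, which requests the default behaviour.

```c
#include <assert.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;             /* the caller's environment */

/* Hypothetical helper: spawns argv[0] (searched on PATH), waits for it,
 * and returns its exit status, or -1 on failure. */
int spawn_and_wait(char *const argv[])
{
    pid_t pid;
    int status = 0;

    assert(argv != NULL && argv[0] != NULL);

    /* posix_spawnp() returns 0 on success and an error number otherwise. */
    if (posix_spawnp(&pid, argv[0], NULL, NULL, argv, environ) != 0)
        return -1;

    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Redirections or signal settings for the child would go through the posix_spawn_file_actions_t and posix_spawnattr_t arguments instead of code run between fork() and exec().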
I don't understand what you mean. The child process code will be written by me, so what is the difference between writing if (fork() == 0) and putting this code in the beginning of child's main()?
Very often, the code you execute is not written by you, so you can't modify what happens in the beginning of the child's process. Think of a shell; if the only programs you run from the shell are those you've written, life is going to be very impoverished.
Quite often, the code you execute will be called from many different places. In particular, think of a shell and a program that will sometimes be executed in a pipeline and sometimes executed without pipes. The called program cannot tell what I/O redirections and fixups it should do; the calling program knows.
If the calling program is running with elevated privileges (SUID or SGID privileges), it is normal to want to turn those 'off' before running another program. Relying on the other program to know what to do is ... foolish.
UNIX-like operating systems (at least newer Linux and BSD kernels) generally have a very efficient fork implementation -- it is "so cheap" that there are "threaded" implementations based upon it in some languages.
In the end the forkexec function is ~n -- for some small value of n -- lines of application code.
I sure wish windows had such a useful ForkProcess :(
Happy coding.
As cnicutar mentioned, Copy-On-Write (COW) is one strategy used.
There is a function that is roughly equivalent to forkexec: system(). It runs the command through the shell and waits for it to finish.
http://www.tutorialspoint.com/c_standard_library/c_function_system.htm
#include <stdio.h>
#include <string.h>
int main ()
{
char command[50];
strcpy( command, "ls -l" );
system(command);
return(0);
}
