To understand the concept of the loader in LINUX by simple example? - embedded-linux

As I understand, the core of a boot loader is a loader program. By loader, I mean the program that will load another program. Or to be more specific first it will load itself then the high level image - for example kernel. Instead of making a bootloader, I thought to clear my doubts on loader by running on an OS that will load another program. I do understand that every process map is entirely independent to another. So, what I am trying to do is make a simple program hello_world.c this will print the great "hello world". Now, I want to make a loader program that will load this program hello world. As I understand the crux is in two steps
Load the hello world program on the RAM - loader address.
JMP to the Entry Address.
Since, this is to understand the concept, I am using the readymade utility readelf to read the address of the hello world binary. The intention here is not to make a ELF parser.
As all the process are independent and use virtual memory. This will fail, If I use the virtual memory addresses. Now, I am stuck over here, how can I achieve this?
#include "stdio.h"
#include <sys/mman.h>
int main( int argc, char **argv)
{
char *mem_ptr;
FILE *fp;
char *val;
char *exec;
mem_ptr = (char*) malloc(10*1024);
fp = fopen("./hello_world.out","rb");
fread(mem_ptr, 10240, 1, fp);
//val = mem_ptr + 0x8048300;
printf("The mem_ptr is %p\r\n",mem_ptr);
exec = mmap(NULL, 10240, PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_PRIVATE | MAP_ANONYMOUS, 0x9c65008, 0);
memcpy(mem_ptr,exec,10240);
__asm__("jmp 0x9c65008");
fclose(fp);
return 0;
}

my rep is not enough to let me add comments.
As Chris Stratton said, your problem sounds ambiguous(still after editing!). Do you want to
Write a bootloader, that will load "Hello, World" instead of real OS? <--Actual Problem is saying this OR
Write a program, that will be running on OS(so full fledged OS will be there), and load another executable using this program?<--Comments are saying this
Answers will vary a lot depending on this.
In first case, bootloader is present on BIOS, that will fetch some predefined memory block to RAM. So what u need to do is just place your Hello, World at this place. There are many things regarding this, such as chain loading and all, but not sure if this is what you want achieve. If this is NOT something you wanted, why is bootstrap tag used?
In second case, fork() + exec() will do it for you. But be sure that this way, there will be two different address spaces. If you want them in the same address space, I am doubtful about daily used OS(for normal guys). Most of the your part sounds like this is what you want to do.
If you want to ask something different than this, please edit almost entire question and ask ONLY that part.(Avoid telling why you are trying to do something, what you think you already understand etc)

Related

Call a shellcode without using pointer to function?

Is there a way to get the return value of a function that is in the shellcode, without using pointer to function?
#include <stdio.h>
unsigned char code[] = "\x55\x48\x89\xe5"
"\xb8\x05\x00\x00"
"\x00\x5d\xc3";
int main(void) {
int (*p)(void) = (int(*)(void))code;
printf("%d", p());
return 0;
}
Shellcode (see Wikipedia article Shellcode as well as this presentation Introduction to Shellcode Development) is machine code that is injected into an application in order to take over the application and run your own application within that application's process.
How the shellcode is injected into the application and starts running will vary depending on how the penetration is being done.
However for testing approaches for the actual shellcode, as opposed to approaches for injecting the shellcode in the first place, the testing is typically done with a simple program that allows you to (1) create the shellcode program that is to be injected as an array of bytes and (2) start the shellcode executing.
The simplest approach for this is the source code you have posted.
You have an array of unsigned char which contains the machine code to be executed.
You have a main() which creates a function pointer to the array of unsigned char bytes and then calls the shellcode through the function pointer.
However in a real world penetration what you would normally do is to use a technique whereby you would take over an application by injecting your shellcode into its process space and then triggering the execution of that shellcode. One such approach is a buffer overflow attack. See for example COEN 152 Computer Forensics Buffer Overflow Attack as well as Wikipedia article Buffer overflow.
See also
Shellcode in C program
Re-writing a small execve shellcode
Also note that the approaches for shellcode attacks will vary depending on the operating system that is being attacked. For instance see this article Basics of Windows shellcode writing which explains some of the intricacies of writing a shellcode for accessing system calls in Windows. Compare to this article providing a way for How to write a Linux x86 shellcode.

Test if the program uses MPI (distributed) correctly?

How do I check that a program is using MPI when it runs? Specifically, how can I verify the program is running on multiple processors? Also, how can I figure out if my program is correctly running across multiple nodes?
I am assuming you're trying to figure out which processor/host is the MPI process running on.
You can use the MPI_Get_processor_name function to print the processor name.
Here is what your code will look like.
#include <mpi.h>
#include <stdio.h>
int main(int argc, char **argv)
{
int rank, max_len;
char processorname[MPI_MAX_PROCESSOR_NAME];
MPI_Init(&argc,&argv);
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Get_processor_name(processorname,&max_len);
printf("Hello world! I am process number: %d on processor %s\n", rank, processorname);
MPI_Finalize();
return 0;
}
So now to compile the program use mpicc -o hello_world hello_world.c.
To run the program use mpirun -np 4 -f machinefile ./hello_world.
This will run the program in 4 different processors mentioned in your machinefile.
You didn't tell us, what you are actually looking for. Your question is unclear and ambiguous, it would be great if you could improve it. That being said, I guess you would like to know wether your processes are actually executed by distinct CPU cores.
First of all, Pooja Nilangekar explained a method to verify the distribution across a network. Now within a single node, it most likely depends on the systems you are running on. If it is a Linux, you could for example make use of the /proc filesystem, and check the status of the current process in /proc/self/. This pseudo filesystem offers a file stat, which contains a field processor showing the cpu_id, this process was last run on. Maybe, also check /proc/self/status for the cpus, the process is allowed to run on. It might be that MPI or your scheduler puts restrictions on this for each process. Together with the node information from the answer of Pooja Nilangekar, you can thereby obtain the running information for each process.
If you can not modify the sources, to have each process reporting where it is running, I think, the easiest way to see which cores are utilized would be top, maybe also have look at this blog on How do I find out Linux CPU utilization?, which also mentions mpstat and sar.

Make a system call to get list of processes

I'm new on modules programming and I need to make a system call to retrieve the system processes and show how much CPU they are consuming.
How can I make this call?
Why would you implement a system call for this? You don't want to add a syscall to the existing Linux API. This is the primary Linux interface to userspace and nobody touches syscalls except top kernel developers who know what they do.
If you want to get a list of processes and their parameters and real-time statuses, use /proc. Every directory that's an integer in there is an existing process ID and contains a bunch of useful dynamic files which ps, top and others use to print their output.
If you want to get a list of processes within the kernel (e.g. within a module), you should know that the processes are kept internally as a doubly linked list that starts with the init process (symbol init_task in the kernel). You should use macros defined in include/linux/sched.h to get processes. Here's an example:
#include <linux/module.h>
#include <linux/printk.h>
#include <linux/sched.h>
static int __init ex_init(void)
{
struct task_struct *task;
for_each_process(task)
pr_info("%s [%d]\n", task->comm, task->pid);
return 0;
}
static void __exit ex_fini(void)
{
}
module_init(ex_init);
module_exit(ex_fini);
This should be okay to gather information. However, don't change anything in there unless you really know what you're doing (which will require a bit more reading).
There are syscalls for that, called open, and read. The information of all processes are all kept in /proc/{pid} directories. You can gather process information by reading corresponding files.
More explained here: http://www.tldp.org/LDP/Linux-Filesystem-Hierarchy/html/proc.html

Fixed Memory I don't need to allocate?

I just need a fixed address in any win32 process, where I can store 8 bytes without using any winapi function. I also cannot use assembler prefixes like fs:. and I have no stack pointer.
What I need:
-8 bytes of memory
-constant address and present in any process
-read and write access (via pointer, from the same process)
-should not crash the application (at least not instantly) if modified.
Don't even ask, why I need it.
The only way I'm aware of to do this is to use a DLL with a shared section...
// This goes in a DLL loaded by all apps that want to share the data
#pragma data_seg (".sharedseg")
long long myShared8Bytes = 0; // has to be initialized or this fails
#pragma data_seg()
Then, you add the following to the link command for the dll:
/SECTION:sharedseg,RWS
I am also curious why you want this...
Not that I recommend this, but the PEB probably has some unused or inconsequential fields in it that you could overwrite. I still think this is a terrible idea, though.
constant address and present in any
process
You won't be able to achieve that. Win32 uses paged memory so different processes can access the same memory addresses even though it is different memory.

how can i override malloc(), calloc(), free() etc under OS X?

Assuming the latest XCode and GCC, what is the proper way to override the memory allocation functions (I guess operator new/delete as well). The debugging memory allocators are too slow for a game, I just need some basic stats I can do myself with minimal impact.
I know its easy in Linux due to the hooks, and this was trivial under codewarrior ten years ago when I wrote HeapManager.
Sadly smartheap no longer has a mac version.
I would use library preloading for this task, because it does not require modification of the running program. If you're familiar with the usual Unix way to do this, it's almost a matter of replacing LD_PRELOAD with DYLD_INSERT_LIBRARIES.
First step is to create a library with code such as this, then build it using regular shared library linking options (gcc -dynamiclib):
void *malloc(size_t size)
{
void * (*real_malloc)(size_t);
real_malloc = dlsym(RTLD_NEXT, "malloc");
fprintf(stderr, "allocating %lu bytes\n", (unsigned long)size);
/* Do your stuff here */
return real_malloc(size);
}
Note that if you also divert calloc() and its implementation calls malloc(), you may need additional code to check how you're being called. C++ programs should be pretty safe because the new operator calls malloc() anyway, but be aware that no standard enforces that. I have never encountered an implementation that didn't use malloc(), though.
Finally, set up the running environment for your program and launch it (might require adjustments depending on how your shell handles environment variables):
export DYLD_INSERT_LIBRARIES=./yourlibrary.dylib
export DYLD_FORCE_FLAT_NAMESPACE=1
yourprogram --yourargs
See the dyld manual page for more information about the dynamic linker environment variables.
This method is pretty generic. There are limitations, however:
You won't be able to divert direct system calls
If the application itself tricks you by using dlsym() to load malloc's address, the call won't be diverted. Unless, however, you trick it back by also diverting dlsym!
The malloc_default_zone technique mentioned at http://lists.apple.com/archives/darwin-dev/2005/Apr/msg00050.html appears to still work, see e.g. http://code.google.com/p/fileview/source/browse/trunk/fileview/fv_zone.cpp?spec=svn354&r=354 for an example use that seems to be similar to what you intend.
After much searching (here included) and issues with 10.7 I decided to write a blog post about this topic: How to set malloc hooks in OSX Lion
You'll find a few good links at the end of the post with more information on this topic.
The basic solution:
malloc_zone_t *dz=malloc_default_zone();
if(dz->version>=8)
{
vm_protect(mach_task_self(), (uintptr_t)malloc_zones, protect_size, 0, VM_PROT_READ | VM_PROT_WRITE);//remove the write protection
}
original_free=dz->free;
dz->free=&my_free; //this line is throwing a bad ptr exception without calling vm_protect first
if(dz->version==8)
{
vm_protect(mach_task_self(), (uintptr_t)malloc_zones, protect_size, 0, VM_PROT_READ);//put the write protection back
}
This is an old question, but I came across it while trying to do this myself. I got curious about this topic for a personal project I was working on, mainly to make sure that what I thought was automatically deallocated was being properly deallocated. I ended up writing a C++ implementation to allow me to track the amount of allocated heap and report it out if I so chose.
https://gist.github.com/monitorjbl/3dc6d62cf5514892d5ab22a59ff34861
As the name notes, this is OSX-specific. However, I was able to do this on Linux environments using the malloc_usable_size
Example
#define MALLOC_DEBUG_OUTPUT
#include "malloc_override_osx.hpp"
int main(){
int* ip = (int*)malloc(sizeof(int));
double* dp = (double*)malloc(sizeof(double));
free(ip);
free(dp);
}
Building
$ clang++ -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.sdk \
-pipe -stdlib=libc++ -std=gnu++11 -g -o test test.cpp
$ ./test
0x7fa28a403230 -> malloc(16) -> 16
0x7fa28a403240 -> malloc(16) -> 32
0x7fa28a403230 -> free(16) -> 16
0x7fa28a403240 -> free(16) -> 0
Hope this helps someone else out in the future!
If the basic stats you need can be collected in a simple wrapper, a quick (and kinda dirty) trick is just using some #define macro replacement.
void* _mymalloc(size_t size)
{
void* ptr = malloc(size);
/* do your stat work? */
return ptr;
}
and
#define malloc(sz_) _mymalloc(sz_)
Note: if the macro is defined before the _mymalloc definition it will end up replacing the malloc call inside that function leaving you with infinite recursion... so ensure this isn't the case. You might want to explicitly #undef it before that function definition and simply (re)define it afterward depending on where you end up including it to hopefully avoid this situation.
I think if you define a malloc() and free() in your own .c file included in the project the linker will resolve that version.
Now then, how do you intend to implement malloc?
Check out Emery Berger's -- the author of the Hoard memory allocator's -- approach for replacing the allocator on OSX at https://github.com/emeryberger/Heap-Layers/blob/master/wrappers/macwrapper.cpp (and a few other files you can trace yourself by following the includes).
This is complementary to Alex's answer, but I thought this example was more to-the-point of replacing the system provided allocator.

Resources