Trap memory accesses inside a standard executable built with MinGW - gcc

So my problem sounds like this.
I have some platform dependent code (embedded system) which writes to some MMIO locations that are hardcoded at specific addresses.
I compile this code with some management code inside a standard executable (mainly for testing) but also for simulation (because it takes longer to find basic bugs inside the actual HW platform).
To alleviate the hardcoded pointers, i just redefine them to some variables inside the memory pool. And this works really well.
The problem is that there is specific hardware behavior on some of the MMIO locations (w1c for example) which makes "correct" testing hard to impossible.
These are the solutions i thought of:
1 - Somehow redefine the accesses to those registers and try to insert some immediate function to simulate the dynamic behavior. This is not really usable since there are various ways to write to the MMIO locations (pointers and stuff).
2 - Somehow leave the addresses hardcoded and trap the illegal access through a seg fault, find the location that triggered, extract exactly where the access was made, handle and return. I am not really sure how this would work (and even if it's possible).
3 - Use some sort of emulation. This will surely work, but it will void the whole purpose of running fast and native on a standard computer.
4 - Virtualization ?? Probably will take a lot of time to implement. Not really sure if the gain is justifiable.
Does anyone have any idea if this can be accomplished without going too deep? Maybe is there a way to manipulate the compiler in some way to define a memory area for which every access will generate a callback. Not really an expert in x86/gcc stuff.
Edit: It seems that it's not really possible to do this in a platform independent way, and since it will be only windows, i will use the available API (which seems to work as expected). Found this Q here:
Is set single step trap available on win 7?
I will put the whole "simulated" register file inside a number of pages, guard them, and trigger a callback from which i will extract all the necessary info, do my stuff then continue execution.
Thanks all for responding.

I think #2 is the best approach. I routinely use approach #4, but I use it to test code that is running in the kernel, so I need a layer below the kernel to trap and emulate the accesses. Since you have already put your code into a user-mode application, #2 should be simpler.
The answers to this question may provide help in implementing #2. How to write a signal handler to catch SIGSEGV?
What you really want to do, though, is to emulate the memory access and then have the segv handler return to the instruction after the access. This sample code works on Linux. I'm not sure if the behavior it is taking advantage of is undefined, though.
#include <stdint.h>
#include <stdio.h>
#include <signal.h>
#define REG_ADDR ((volatile uint32_t *)0x12340000f000ULL)
static uint32_t read_reg(volatile uint32_t *reg_addr)
{
uint32_t r;
asm("mov (%1), %0" : "=a"(r) : "r"(reg_addr));
return r;
}
static void segv_handler(int, siginfo_t *, void *);
int main()
{
struct sigaction action = { 0, };
action.sa_sigaction = segv_handler;
action.sa_flags = SA_SIGINFO;
sigaction(SIGSEGV, &action, NULL);
// force sigsegv
uint32_t a = read_reg(REG_ADDR);
printf("after segv, a = %d\n", a);
return 0;
}
static void segv_handler(int, siginfo_t *info, void *ucontext_arg)
{
ucontext_t *ucontext = static_cast<ucontext_t *>(ucontext_arg);
ucontext->uc_mcontext.gregs[REG_RAX] = 1234;
ucontext->uc_mcontext.gregs[REG_RIP] += 2;
}
The code to read the register is written in assembly to ensure that both the destination register and the length of the instruction are known.

This is how the Windows version of prl's answer could look like:
#include <stdint.h>
#include <stdio.h>
#include <windows.h>
#define REG_ADDR ((volatile uint32_t *)0x12340000f000ULL)
static uint32_t read_reg(volatile uint32_t *reg_addr)
{
uint32_t r;
asm("mov (%1), %0" : "=a"(r) : "r"(reg_addr));
return r;
}
static LONG WINAPI segv_handler(EXCEPTION_POINTERS *);
int main()
{
SetUnhandledExceptionFilter(segv_handler);
// force sigsegv
uint32_t a = read_reg(REG_ADDR);
printf("after segv, a = %d\n", a);
return 0;
}
static LONG WINAPI segv_handler(EXCEPTION_POINTERS *ep)
{
// only handle read access violation of REG_ADDR
if (ep->ExceptionRecord->ExceptionCode != EXCEPTION_ACCESS_VIOLATION ||
ep->ExceptionRecord->ExceptionInformation[0] != 0 ||
ep->ExceptionRecord->ExceptionInformation[1] != (ULONG_PTR)REG_ADDR)
return EXCEPTION_CONTINUE_SEARCH;
ep->ContextRecord->Rax = 1234;
ep->ContextRecord->Rip += 2;
return EXCEPTION_CONTINUE_EXECUTION;
}

So, the solution (code snippet) is as follows:
First of all, i have a variable:
__attribute__ ((aligned (4096))) int g_test;
Second, inside my main function, i do the following:
AddVectoredExceptionHandler(1, VectoredHandler);
DWORD old;
VirtualProtect(&g_test, 4096, PAGE_READWRITE | PAGE_GUARD, &old);
The handler looks like this:
LONG WINAPI VectoredHandler(struct _EXCEPTION_POINTERS *ExceptionInfo)
{
static DWORD last_addr;
if (ExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_GUARD_PAGE_VIOLATION) {
last_addr = ExceptionInfo->ExceptionRecord->ExceptionInformation[1];
ExceptionInfo->ContextRecord->EFlags |= 0x100; /* Single step to trigger the next one */
return EXCEPTION_CONTINUE_EXECUTION;
}
if (ExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_SINGLE_STEP) {
DWORD old;
VirtualProtect((PVOID)(last_addr & ~PAGE_MASK), 4096, PAGE_READWRITE | PAGE_GUARD, &old);
return EXCEPTION_CONTINUE_EXECUTION;
}
return EXCEPTION_CONTINUE_SEARCH;
}
This is only a basic skeleton for the functionality. Basically I guard the page on which the variable resides, i have some linked lists in which i hold pointers to the function and values for the address in question. I check that the fault generating address is inside my list then i trigger the callback.
On first guard hit, the page protection will be disabled by the system, but i can call my PRE_WRITE callback where i can save the variable state. Because a single step is issued through the EFlags, it will be followed immediately by a single step exception (which means that the variable was written), and i can trigger a WRITE callback. All the data required for the operation is contained inside the ExceptionInformation array.
When someone tries to write to that variable:
*(int *)&g_test = 1;
A PRE_WRITE followed by a WRITE will be triggered,
When i do:
int x = *(int *)&g_test;
A READ will be issued.
In this way i can manipulate the data flow in a way that does not require modifications of the original source code.
Note: This is intended to be used as part of a test framework and any penalty hit is deemed acceptable.
For example, W1C (Write 1 to clear) operation can be accomplished:
void MYREG_hook(reg_cbk_t type)
{
/** We need to save the pre-write state
* This is safe since we are assured to be called with
* both PRE_WRITE and WRITE in the correct order
*/
static int pre;
switch (type) {
case REG_READ: /* Called pre-read */
break;
case REG_PRE_WRITE: /* Called pre-write */
pre = g_test;
break;
case REG_WRITE: /* Called after write */
g_test = pre & ~g_test; /* W1C */
break;
default:
break;
}
}
This was possible also with seg-faults on illegal addresses, but i had to issue one for each R/W, and keep track of a "virtual register file" so a bigger penalty hit. In this way i can only guard specific areas of memory or none, depending on the registered monitors.

Related

How to calculate required stack size for tree of called functions in GCC

Is it possible to determine the demand a non recursive function on the stack without external computation, right in the text of the program? I need this to allocate a memory resource for the thread in very small micro-controllers, such as AVR. And I need know this before function calling. Directive --stack-usage is very non informative, unfortunately. Or I something do not understand?
Getting the address of a passed argument yields it's place on the stack. Therefore running this:
#include <stdio.h>
void my_fun(int dummy);
int get_stack_space(int dummy);
int main(void)
{
int dummy = 0;
my_fun(dummy);
return 0;
}
void my_fun(int dummy)
{
// do stuff
printf("%d\n", get_stack_space((int)&dummy));
return;
}
int get_stack_space(int dummy)
{
return dummy - (int)&dummy;
}
should get you the distance in bytes on the stack between the point of calling my_fun() and calling get_stack_space(). Hope it helps.
Edit: on x86 you get the distance + a machine word for the push of the return address when calling my_fun() + a machine word for the push of ebp at the start of my_fun()

how to transfer string(char*) in kernel into user process using copy_to_user

I'm making code to transfer string in kernel to usermode using systemcall and copy_to_user
here is my code
kernel
#include<linux/kernel.h>
#include<linux/syscalls.h>
#include<linux/sched.h>
#include<linux/slab.h>
#include<linux/errno.h>
asmlinkage int sys_getProcTagSysCall(pid_t pid, char **tag){
printk("getProcTag system call \n\n");
struct task_struct *task= (struct task_struct*) kmalloc(sizeof(struct task_struct),GFP_KERNEL);
read_lock(&tasklist_lock);
task = find_task_by_vpid(pid);
if(task == NULL )
{
printk("corresponding pid task does not exist\n");
read_unlock(&tasklist_lock);
return -EFAULT;
}
read_unlock(&tasklist_lock);
printk("Corresponding pid task exist \n");
printk("tag is %s\n" , task->tag);
/*
task -> tag : string is stored in task->tag (ex : "abcde")
this part is well worked
*/
if(copy_to_user(*tag, task->tag, sizeof(char) * task->tag_length) !=0)
;
return 1;
}
and this is user
#include<stdio.h>
#include<stdlib.h>
int main()
{
char *ret=NULL;
int pid = 0;
printf("PID : ");
scanf("%4d", &pid);
if(syscall(339, pid, &ret)!=1) // syscall 339 is getProcTagSysCall
printf("pid %d does not exist\n", pid);
else
printf("Corresponding pid tag is %s \n",ret); //my output is %s = null
return 0;
}
actually i don't know about copy_to_user well. but I think copy_to_user(*tag, task->tag, sizeof(char) * task->tag_length) is operated like this code
so i use copy_to_user like above
#include<stdio.h>
int re();
void main(){
char *b = NULL;
if (re(&b))
printf("success");
printf("%s", b);
}
int re(char **str){
char *temp = "Gdg";
*str = temp;
return 1;
}
Is this a college assignment of some sort?
asmlinkage int sys_getProcTagSysCall(pid_t pid, char **tag){
What is this, Linux 2.6? What's up with ** instead of *?
printk("getProcTag system call \n\n");
Somewhat bad. All strings are supposed to be prefixed.
struct task_struct *task= (struct task_struct*) kmalloc(sizeof(struct task_struct),GFP_KERNEL);
What is going on here? Casting malloc makes no sense whatsoever, if you malloc you should have used sizeof(*task) instead, but you should not malloc in the first place. You want to find a task and in fact you just overwrite this pointer's value few lines later anyway.
read_lock(&tasklist_lock);
task = find_task_by_vpid(pid);
find_task_by_vpid requires RCU. The kernel would have told you that if you had debug enabled.
if(task == NULL )
{
printk("corresponding pid task does not exist\n");
read_unlock(&tasklist_lock);
return -EFAULT;
}
read_unlock(&tasklist_lock);
So... you unlock... but you did not get any kind of reference to the task.
printk("Corresponding pid task exist \n");
printk("tag is %s\n" , task->tag);
... in other words by the time you do task->tag, the task may already be gone. What requirements are there to access ->tag itself?
if(copy_to_user(*tag, task->tag, sizeof(char) * task->tag_length) !=0)
;
What's up with this? sizeof(char) is guaranteed to be 1.
I'm really confused by this entire business.
When you have a syscall which copies data to userspace where amount of data is not known prior to the call, teh syscall accepts both buffer AND its size. Then you can return appropriate error if the thingy you are trying to copy would not fit.
However, having a syscall in the first place looks incorrect. In linux per-task data is exposed to userspace in /proc/pid/. Figuring out how to add a file to proc is easy and left as an exercise for the reader.
It's quite obvious from the way you fixed it. copy_to_user() will only copy data between two memory regions - one accessible only to kernel and the other accessible also to user. It will not, however, handle any memory allocation. Userspace buffer has to be already allocated and you should pass address of this buffer to the kernel.
One more thing you can change is to change your syscall to use normal pointer to char instead of pointer to pointer which is useless.
Also note that you are leaking memory in your kernel code. You allocate memory for task_struct using kmalloc and then you override the only pointer you have to this memory when calling find_task_by_vpid() and this memory is never freed. find_task_by_vpid() will return a pointer to a task_struct which already exists in memory so there is no need to allocate any buffer for this.
i solved my problem by making malloc in user
I changed
char *b = NULL;
to
char *b = (char*)malloc(sizeof(char) * 100)
I don't know why this work properly. but as i guess copy_to_user get count of bytes as third argument so I should malloc before assigning a value
I don't know. anyone who knows why adding malloc is work properly tell me

Is a spinlock necessary in this Linux device driver code?

Is the following Linux device driver code safe, or do I need to protect access to interrupt_flag with a spinlock?
static DECLARE_WAIT_QUEUE_HEAD(wq_head);
static int interrupt_flag = 0;
static ssize_t my_write(struct file* filp, const char* __user buffer, size_t length, loff_t* offset)
{
interrupt_flag = 0;
wait_event_interruptible(wq_head, interrupt_flag != 0);
}
static irqreturn_t handler(int irq, void* dev_id)
{
interrupt_flag = 1;
wake_up_interruptible(&wq_head);
return IRQ_HANDLED;
}
Basically, I kick off some event in my_write() and wait for the interrupt to indicate that it completes.
If so, which form of spin_lock() do I need to use? I thought spin_lock_irq() was appropriate, but when I tried that I got a warning about the IRQ handler enabling interrupts.
Doesn't wait_event_interruptible evaluate the interrupt_flag != 0 condition? That would imply that the lock should be held while it reads the flag, right?
No lock is needed in the example given. Memory barriers are needed after the store of the flag, and before the load -- to ensure visibility to the flag -- but the wait_event_* and wake_up_* functions provide those. See the section entitled "Sleep and wake-up functions" in this document: https://www.kernel.org/doc/Documentation/memory-barriers.txt
Before adding a lock, consider what is being protected. Generally locks are needed if you're setting two or more separate pieces of data and you need to ensure that another cpu/core doesn't see an incomplete intermediate state (after you started but before you finished). In this case, there's no point in protecting the storing / loading of the flag value because stores and loads of a properly aligned integer are always atomic.
So, depending on what else your driver is doing, it's quite possible you do need a lock, but it isn't needed for the snippet you've provided.
Yes you need a lock. With the given example (that uses int and no specific arch is mentioned), the process context may be interrupted while accessing the interrupt_flag. Upon return from the IRQ, it may continue and interrupt_flag may be left in inconsistent state.
Try this:
static DECLARE_WAIT_QUEUE_HEAD(wq_head);
static int interrupt_flag = 0;
DEFINE_SPINLOCK(lock);
static ssize_t my_write(struct file* filp, const char* __user buffer, size_t length, loff_t* offset)
{
/* spin_lock_irq() or spin_lock_irqsave() is OK here */
spin_lock_irq(&lock);
interrupt_flag = 0;
spin_unlock_irq(&lock);
wait_event_interruptible(wq_head, interrupt_flag != 0);
}
static irqreturn_t handler(int irq, void* dev_id)
{
unsigned long flags;
spin_lock_irqsave(&lock, flags);
interrupt_flag = 1;
spin_unlock_irqrestore(&lock, flags);
wake_up_interruptible(&wq_head);
return IRQ_HANDLED;
}
IMHO, the code has to be written without making any arch or compiler-related assumptions (like the 'properly aligned integer' in Gil Hamilton answer).
Now if we can change the code and use atomic_t instead of the int flag, then no locks should be needed.

Using an existent TCP connection to send packets

I'm using WPE PRO, and I can capture packets and send it back. I tried do it using WinSock 2(The same lib which WPE PRO use), but I don't know how to send packet to a existent TCP connection like WPE PRO does.
http://wpepro.net/index.php?categoryid=2
How can I do it ?
Are you asking how to make someone else's program send data over its existing Winsock connection?
I've done exactly this but unfortunately do not have the code on-hand at the moment. If you give me an hour or two I can put up a working example using C; if you need one let me know and I will.
Edit: sample DLL to test at the bottom of the page if you or anyone else wants to; I can't. All I know is that it compiles. You just need to download (or write!) a freeware DLL injector program to test it; there are tons out there.
In the meantime, what you need to research is:
The very basics of how EXEs are executed.
DLL injection
API hooking
Windows Sockets API
1. The very basics of how EXEs are executed:
The whole entire process of what I'm about to explain to you boils down to this very principal. When you double-click an executable, Windows parses it and loads its code, etc. into memory. This is the key. The compiled code is all being put into RAM. What does this imply? Well, if the application's code is all in RAM, can we change the application's code while it's running by just changing some of its memory? After all, it's just a bunch of instructions.
The answer is yes and will provide us the means of messing with another application - in this case, telling it to send some data over its open socket.
(This principal is also the reason you have to be careful writing programs in low-level languages like C since if you put bad stuff in bad parts of RAM, it can crash the program or open you up to shell code exploits).
2. DLL injection:
The problem is, how do we know which memory to overwrite? Do we have access to that program's memory, especially the parts containing the instructions we want to change? You can write to another process' memory but it's more complicated. The easiest way to change their memory (again, when I say memory, we're talking about the machine code instructions being executed) is by having a DLL loaded and running within that process. Think of your DLL as a .c file you can add to another program and write your own code: you can access the program's variables, call its functions, anything; because it's running within the process.
DLL injection can be done through numerous methods. The usual is by calling the CreateRemoteThread() API function. Do a Google search on that.
3. API Hooking
What is API hooking? To put it more generally, it's "function hooking", we just happen to be interested in hooking API calls; in this case, the ones used for Sockets (socket(), send(), etc.).
Let's use an example. A target application written in C using Winsock. Let's see what they are doing and then show an example of what we WANT to make it do:
Their original source code creating a socket:
SOCKET ConnectSocket = INVALID_SOCKET;
ConnectSocket = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
Now, that's the original program's source code. Our DLL won't have access to that because it's loaded within an EXE and an EXE is compiled (duh!). So let's say their call to socket() looked something like this after being compiled to machine code (assembly). I don't know assembly at all but this is just for illustration:
The assembly/machine code:
PUSH 06 ; IPPROTO_TCP
PUSH 01 ; SOCK_STREAM
PUSH 02 ; AF_INET
CALL WS2_32.socket ; This is one of the parts our DLL will need to intercept ("hook").
In order for us to make that program send data (using our DLL), we need to know the socket's handle. So we need to intercept their call to the socket function. Here are some considerations:
The last instruction there would need to be changed to: CALL OurOwnDLL.socket. That CALL instruction is just a value in memory somewhere (remember?) so we can do that with WriteProcessMemory. We'll get to that.
We want to take control of the target program, not crash it or make it behave strangely. So our code needs to be transparent. Our DLL which we will inject needs to have a socket function identical to the original, return the same value, etc. The only difference is, we will be logging the return value (SocketHandle) so that we can use it later when we want to send data.
We also need to know if/when the socket connects since we can't send data unless it is (assuming we're using TCP like most applications do). This means we need to also hook the Winsock connect API function and also duplicate that in our DLL.
DLL to inject and monitor the socket and connect functions (untested):
This C DLL will have everything in place to hook and unhook functions. I can't test it at the moment and I'm not even much of a C programmer so let me know if you come across any problems.
Compile this as a Windows DLL not using Unicode and inject it into a process that you know uses WS2_32's socket() and connect() functions and let me know if it works. I have no means to test, sorry. If you need further help or fixes, let me know.
/*
SocketHookDLL.c
Author: Daniel Elkins
License: Public Domain
Version: 1.0.0
Created: May 14th, 2014 at 12:23 AM
Updated: [Never]
Summary:
1. Link to the Winsock library so we can use its functions.
2. Export our own `socket` and `connect` functions so that
they can be called by the target application instead of
the original ones from WS2_32.
3. "Hook" the socket APIs by writing over the target's memory,
causing `CALL WS2_32.socket` to `CALL SocketHookDLL.socket`, using
WriteProcessMemory.
4. Make sure to keep a copy of the original memory for when we no
no longer want to hook those socket functions (i.e. DLL detaching).
*/
#pragma comment(lib, "WS2_32.lib")
#include <WinSock2.h>
/* These functions hook and un-hook an API function. */
unsigned long hookFunction (const char * dllModule, const char * apiFunction, unsigned char * memoryBackup);
unsigned int unHookFunction (const char * dllModule, const char * apiFunction, unsigned char * memoryBackup);
/*
These functions (the ones we want to hook) are copies of the original Winsock's functions from Winsock2.h.
1. Calls OurDLL.hooked_socket() (unknowingly).
2. OurDLL.hooked_socket() calls the original Winsock.socket() function.
3. We take note of the returned SOCKET handle so we can use it later to send data.
4. OurDLL.hooked_socket() returns the SOCKET back to the target app so everthing works as it should (hopefully!).
Note: You can change return values, parameters (like data being sent/received like WPE does), just be aware it will
also (hopefully, intendingly) change the behavior of the target application.
*/
SOCKET WSAAPI hooked_socket (int af, int type, int protocol);
int WSAAPI hooked_connect (SOCKET s, const struct sockaddr FAR * name, int namelen);
/* Backups of the original memory; need one for each API function you hook (if you want to unhook it later). */
unsigned char backupSocket[6];
unsigned char backupConnect[6];
/* Our SOCKET handle used by the target application. */
SOCKET targetsSocket = INVALID_SOCKET;
/* This is the very first code that gets executed once our DLL is injected: */
BOOL APIENTRY DllMain (HMODULE moduleHandle, DWORD reason, LPVOID reserved)
{
/*
We will hook the desired Socket APIs when attaching
to target EXE and UN-hook them when being detached.
*/
switch (reason)
{
case DLL_PROCESS_ATTACH:
/* Here goes nothing! */
hookFunction ("WS2_32.DLL", "socket", backupSocket);
hookFunction ("WS2_32.DLL", "connect", backupConnect);
break;
case DLL_THREAD_ATTACH:
break;
case DLL_PROCESS_DETACH:
unHookFunction ("WS2_32.DLL", "socket", backupSocket);
unHookFunction ("WS2_32.DLL", "connect", backupConnect);
break;
case DLL_THREAD_DETACH:
break;
}
return TRUE;
}
unsigned long hookFunction (const char * dllModule, const char * apiFunction, unsigned char * memoryBackup)
{
/*
Hook an API function:
=====================
1. Build the necessary assembly (machine code) opcodes to get our DLL called!
2. Get a handle to the API we're hooking.
3. Use ReadProcessMemory() to backup the original memory to un-hook the function later.
4. Use WriteProcessMemory to make changes to the instructions in memory.
*/
HANDLE thisTargetProcess;
HMODULE dllModuleHandle;
unsigned long apiAddress;
unsigned long memoryWritePosition;
unsigned char newOpcodes[6] = {
0xE9, 0x00, 0x00, 0x00, 0x00, 0xC3 // Step #1.
};
thisTargetProcess = GetCurrentProcess ();
// Step #2.
dllModuleHandle = GetModuleHandle (dllModule);
if (!dllModuleHandle)
return 0;
apiAddress = (unsigned long) GetProcAddress (dllModuleHandle, apiFunction);
if (!apiAddress)
return 0;
// Step #3.
ReadProcessMemory (thisTargetProcess, (void *) apiAddress, memoryBackup, 6, 0);
memoryWritePosition = ((unsigned long) apiFunction - apiAddress - 5);
memcpy (&newOpcodes[1], &apiAddress, 4);
// Step #4.
WriteProcessMemory (thisTargetProcess, (void *) apiAddress, newOpcodes, 6, 0);
return apiAddress;
}
unsigned int unHookFunction (const char * dllModule, const char * apiFunction, unsigned char * memoryBackup)
{
HANDLE thisTargetProcess;
HMODULE dllModuleHandle;
unsigned long apiAddress;
unsigned long memoryWritePosition;
thisTargetProcess = GetCurrentProcess ();
dllModuleHandle = GetModuleHandleA (dllModule);
if (!dllModuleHandle)
return 0;
apiAddress = (unsigned long) GetProcAddress (dllModuleHandle, apiFunction);
if (!apiAddress)
return 0;
if (WriteProcessMemory (thisTargetProcess, (void *) apiAddress, memoryBackup, 6, 0))
return 1;
return 0;
}
/* You may want to use a log file instead of a MessageBox due to time-outs, etc. */
SOCKET WSAAPI hooked_socket (int af, int type, int protocol)
{
targetsSocket = socket (af, type, protocol);
MessageBox (NULL, "(Close this quickly)\r\n\r\nThe target's socket was hooked successfully!", "Hooked SOCKET", MB_OK);
return targetsSocket;
}
int WSAAPI hooked_connect (SOCKET s, const struct sockaddr FAR * name, int namelen)
{
MessageBox (NULL, "(Close this quickly)\r\n\r\nThe target just connected to a remote address.", "Target Connected", MB_OK);
return connect (s, name, namelen);
}

atomic_inc and atomic_xchg in gcc assembly

I have written the following user-level code snippet to test two sub functions, atomic inc and xchg (refer to Linux code).
What I need is just try to perform operations on 32-bit integer, and that's why I explicitly use int32_t.
I assume global_counter will be raced by different threads, while tmp_counter is fine.
#include <stdio.h>
#include <stdint.h>
int32_t global_counter = 10;
/* Increment the value pointed by ptr */
void atomic_inc(int32_t *ptr)
{
__asm__("incl %0;\n"
: "+m"(*ptr));
}
/*
* Atomically exchange the val with *ptr.
* Return the value previously stored in *ptr before the exchange
*/
int32_t atomic_xchg(uint32_t *ptr, uint32_t val)
{
uint32_t tmp = val;
__asm__(
"xchgl %0, %1;\n"
: "=r"(tmp), "+m"(*ptr)
: "0"(tmp)
:"memory");
return tmp;
}
int main()
{
int32_t tmp_counter = 0;
printf("Init global=%d, tmp=%d\n", global_counter, tmp_counter);
atomic_inc(&tmp_counter);
atomic_inc(&global_counter);
printf("After inc, global=%d, tmp=%d\n", global_counter, tmp_counter);
tmp_counter = atomic_xchg(&global_counter, tmp_counter);
printf("After xchg, global=%d, tmp=%d\n", global_counter, tmp_counter);
return 0;
}
My 2 questions are:
Are these two subfunctions written properly?
Will this behave the same when I compile this on 32-bit or
64-bit platform? For example, could the pointer address have a different
length. or could incl and xchgl will conflict with the operand?
My understanding of this question is below, please correct me if I'm wrong.
All the read-modify-write instructions (ex: incl, add, xchg) need a lock prefix. The lock instruction is to lock the memory accessed by other CPUs by asserting LOCK# signal on the memory bus.
The __xchg function in Linux kernel implies no "lock" prefix because xchg always implies lock anyway. http://lxr.linux.no/linux+v2.6.38/arch/x86/include/asm/cmpxchg_64.h#L15
However, the incl used in atomic_inc does not have this assumption so a lock_prefix is needed.
http://lxr.linux.no/linux+v2.6.38/arch/x86/include/asm/atomic.h#L105
btw, I think you need to copy the *ptr to a volatile variable to avoid gcc optimization.
William

Resources