SIGSEGV handler and mprotect and looping effect when injecting instructions at runtime. Handler can't get info->si_addr - loader

I have looked at the various topics relating to this, but couldn't find this specific issue I am having.
Things I looked at:
Injecting code into executable at runtime
C SIGSEGV Handler & Mprotect
Can I write-protect every page in the address space of a Linux process?
How to write a signal handler to catch SIGSEGV?
I am able to handle SIGSEGV gracefully when the protection needs to be set to either PROT_READ or PROT_WRITE in the handler. However, when I inject instructions into an mmap'd region, use mprotect to set it to PROT_READ only, and then execute the instructions via inline assembly, it causes a SIGSEGV as intended, but the handler is unable to get the address that caused the signal. As a result, I am unable to mprotect the page to PROT_READ | PROT_EXEC.
Example:
void sigHandler(int signum, siginfo_t *info, void *ptr) {
printf("Received signal number: %d\n", signum);
printf("Signal originates from process %lu\n",
(unsigned long)info->si_pid);
printf("SIGSEGV caused by this address: ? %p\n", info->si_addr);
char * alignedbaseAddr = (((unsigned int)(info->si_addr)) >> 12) * getPageSize();
printf("Aligning to %p\n", alignedbaseAddr);
//flip this page to be r+x
mprotect(alignedbaseAddr, getPageSize(), PROT_READ | PROT_EXEC);
}
void setupSignalHandler() {
action.sa_sigaction = sigHandler;
action.sa_flags = SA_SIGINFO;
sigemptyset(&action.sa_mask);
sigaction(SIGSEGV, &action, NULL);
}
int main(int argc, char *argv[]) {
char * baseAddr = (char*)mmap(NULL, getDiskSize(), PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if(baseAddr == MAP_FAILED) {
perror("Unable to mmap.");
}
printf("Process address space is %d\n", getDiskSize());
//no-op filler
int i; //declared outside the loop because it is used after it
for(i = 0; i < (getDiskSize()) - 1; i++) {
baseAddr[i] = 0x90;
}
//ret instruction
baseAddr[i] = 0xc3;
if( mprotect(baseAddr, getDiskSize(), PROT_READ) == -1) {
perror("mprotect");
exit(1);
}
printf("Protecting addresses: %p to %p for READ_ONLY\n", baseAddr, baseAddr + getDiskSize() - 1);
setupSignalHandler();
__asm__
(
"call %%eax;"
: "=a" (output)
: "a" (baseAddr)
);
printf("Will this ever print?");
//close fd, and unmap memory
cleanUp();
return EXIT_SUCCESS;
}
Here is the resulting output:
Received signal number: 11
Signal originates from process 0
SIGSEGV caused by this address: ? (nil)
//the above output repeatedly loops, since it fails to "re mprotect" that page.
Architecture:
x86 32 bit
OS:
Ubuntu 11.04 - Linux version 2.6.38-12-generic (buildd@vernadsky) (gcc version 4.5.2 (Ubuntu/Linaro 4.5.2-8ubuntu4) )
Any ideas? The above logic works fine for simply reading and writing memory. Is there
a better way to execute instructions at runtime than inline assembly?
Thanks in advance!

In that case, the faulting address is the instruction pointer. Cast the third argument ptr of your signal handler (installed with SA_SIGINFO) to a ucontext_t *, and retrieve the appropriate register, perhaps as (untested code!)
ucontext_t *uc = ptr;
void *faultyip = (void *)uc->uc_mcontext.gregs[REG_EIP]; /* REG_RIP on x86-64; glibc needs _GNU_SOURCE for these names */
Read /usr/include/sys/ucontext.h carefully for more.
I'm interested to know why you are asking!!
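The suggestion above can be sketched as follows. This is a minimal sketch, assuming Linux with glibc on x86 and a CPU/kernel that enforces no-execute pages; the names segv_handler, install_segv_handler, run_demo and DEMO_REG_IP are made up for illustration and are not from the original post. The handler reads the instruction pointer from the ucontext_t, rounds it down to a page boundary with a mask, and flips that page to r+x so the interrupted call can resume.

```c
#define _GNU_SOURCE              /* glibc: exposes the REG_EIP / REG_RIP names */
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <ucontext.h>
#include <unistd.h>

/* Index of the instruction pointer in gregs[]. The numeric fallbacks are the
 * values glibc assigns to REG_RIP / REG_EIP; they only matter if _GNU_SOURCE
 * was not in effect before the first system header was included. */
#if defined(__x86_64__)
#  ifdef REG_RIP
#    define DEMO_REG_IP REG_RIP
#  else
#    define DEMO_REG_IP 16
#  endif
#elif defined(__i386__)
#  ifdef REG_EIP
#    define DEMO_REG_IP REG_EIP
#  else
#    define DEMO_REG_IP 14
#  endif
#else
#  error "this sketch only covers x86 Linux"
#endif

static unsigned char *region;            /* page holding the injected code */
static long page_size;                   /* cached: sysconf() is avoided in the handler */
static volatile uintptr_t last_fault_ip; /* recorded for inspection after the call */

static void segv_handler(int signum, siginfo_t *info, void *ptr)
{
    ucontext_t *uc = ptr;
    uintptr_t ip = (uintptr_t)uc->uc_mcontext.gregs[DEMO_REG_IP];
    (void)info;

    last_fault_ip = ip;
    if (ip < (uintptr_t)region || ip >= (uintptr_t)region + (uintptr_t)page_size) {
        signal(signum, SIG_DFL);         /* not our page: let the default action run */
        return;
    }
    /* Round down to the page boundary and make the page r+x. */
    void *base = (void *)(ip & ~((uintptr_t)page_size - 1));
    if (mprotect(base, (size_t)page_size, PROT_READ | PROT_EXEC) != 0)
        _exit(1);
}

int install_segv_handler(void)
{
    struct sigaction action;
    memset(&action, 0, sizeof action);
    action.sa_sigaction = segv_handler;
    action.sa_flags = SA_SIGINFO;
    sigemptyset(&action.sa_mask);
    page_size = sysconf(_SC_PAGESIZE);
    return sigaction(SIGSEGV, &action, NULL);
}

/* Returns 0 if the call faulted at the start of the page, the handler fixed
 * the protection, and the injected code then ran to its ret; -1 otherwise. */
int run_demo(void)
{
    if (install_segv_handler() != 0)
        return -1;
    region = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;
    memset(region, 0x90, (size_t)page_size - 1);   /* nop filler */
    region[page_size - 1] = 0xc3;                  /* ret */
    if (mprotect(region, (size_t)page_size, PROT_READ) != 0)
        return -1;

    void (*fn)(void) = (void (*)(void))region;     /* plain call, no inline asm */
    fn();                                          /* faults once, then resumes */

    int ok = (last_fault_ip == (uintptr_t)region);
    munmap(region, (size_t)page_size);
    return ok ? 0 : -1;
}
```

Calling run_demo() from main() is enough to try it. Calling through a function pointer also sidesteps the inline assembly from the question. If the PROT_READ page happens to be executable anyway (no NX enforcement), no fault occurs and run_demo() returns -1.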

Related

SEH Handlers using RtlAddFunctionTable

I've been trying to set up SEH on x64 Windows using gcc by calling RtlAddFunctionTable. Unfortunately, the API call returns success but my handler never seems to be called, and I can't find out what's wrong. My small example is:
EXCEPTION_DISPOSITION catchDivZero( struct _EXCEPTION_RECORD* rec
, void* arg1 __attribute__((unused))
, struct _CONTEXT* ctxt __attribute__((unused))
, void* arg2 __attribute__((unused))
)
{
printf("Exception will be handled!\n");
return ExceptionContinueSearch;
}
HMODULE GetCurrentModule()
{ // NB: XP+ solution!
HMODULE hModule = NULL;
GetModuleHandleEx(
GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS,
(LPCTSTR)GetCurrentModule,
&hModule);
return hModule;
}
typedef struct {
UINT8 Version : 3;
UINT8 Flags : 5;
UINT8 SizeOfProlog;
UINT8 CountOfUnwindCodes;
UINT8 FrameRegister : 4;
UINT8 FrameRegisterOffset : 4;
ULONG ExceptionHandler;
} UNWIND_INFO;
/* Hack, for bug in ld. Will be removed soon. */
#if defined(__GNUC__)
#define __ImageBase __MINGW_LSYMBOL(_image_base__)
#endif
/* Get the end of the text section. */
extern char etext[] asm("etext");
/* Get the base of the module. */
/* This symbol is defined by ld. */
extern IMAGE_DOS_HEADER __ImageBase;
static UNWIND_INFO info[1];
static RUNTIME_FUNCTION handlers[1];
#define base (ULONG)((HINSTANCE)&__ImageBase)
int main()
{
HANDLE hProcess = GetCurrentProcess();
HMODULE hModule = GetCurrentModule();
MODULEINFO mi;
GetModuleInformation(hProcess, hModule, &mi, sizeof(mi));
printf( "Module: 0x%.8X (0x%.8X) 0x%.8X |0x%.8X| [0x%.8X] {0x%.8X}\n\n"
, mi.lpBaseOfDll
, base
, (char*)etext
, mi.SizeOfImage
, &catchDivZero
, (ULONG)(&catchDivZero - base)
);
printf("Building UNWIND_INFO..\n");
info[0].Version = 1;
info[0].Flags = UNW_FLAG_EHANDLER;
info[0].SizeOfProlog = 0;
info[0].CountOfUnwindCodes = 0;
info[0].FrameRegister = 0;
info[0].FrameRegisterOffset = 0;
info[0].ExceptionHandler = (ULONG)(&catchDivZero - base);
printf("Created UNWIND_INFO at {0x%.8X}\n", info[0].ExceptionHandler);
printf("Building SEH handlers...\n");
handlers[0].BeginAddress = 0;
handlers[0].EndAddress = (ULONG)(etext - base);
handlers[0].UnwindData = (ULONG)((char*)info - base);
printf("Adding SEH handlers to .pdata..\n");
printf("Handler Unwind: 0x%.8X\n", &info);
printf( "Handler Info:: s: 0x%.8X, e: 0x%.8X, u: 0x%.8X\n"
, handlers[0].BeginAddress
, handlers[0].EndAddress
, handlers[0].UnwindData
);
if (RtlAddFunctionTable(handlers, 1, (DWORD64)base))
{
printf("Hook succeeded.\nTesting..\n");
printf("Things to do: %i\n", 12 / 0);
}
else
{
printf("Hook failed\n");
DWORD result = GetLastError();
printf("Error code: 0x%.8X\n", result);
}
}
However when called the output I get is:
> .\a.exe
Module: 0x00400000 (0x00400000) 0x00402FF0 |0x00022000| [0x00401530] {0x00001530}
Building UNWIND_INFO..
Created UNWIND_INFO at {0x00001530}
Building SEH handlers...
Adding SEH handlers to .pdata..
Handler Unwind: 0x00407030
Handler Info:: s: 0x00000000, e: 0x00002FF0, u: 0x00007030
Hook succeeded.
Testing..
The message in my handler is never printed.
Any help/pointers would be greatly appreciated.
RtlAddFunctionTable() adds a dynamic function table; if there already is a static function table (.pdata section) for the base address, the RtlAddFunctionTable() call succeeds, but the static function table still takes precedence.
You need to allocate memory outside the image range, e.g. using VirtualAlloc(), and have your code and runtime table and unwind info there. The address of allocated memory is the base address for all the RVAs in the tables, and needs to be passed to RtlAddFunctionTable().
You can experiment with RtlLookupFunctionEntry() to see if the function table entry is found for a given address.
Sample code showing RtlAddFunctionTable() in action is at https://pmeerw.net/blog/programming/RtlAddFunctionTable.html.
Didn't you forget to register your handler with a call to SetUnhandledExceptionFilter (if you use SEH, as stated in your post) or AddVectoredExceptionHandler (if you decide to switch to VEH)? In your code you add information about the handler but do not register it.
I have tested your sample with the change of the handler itself:
LONG WINAPI catchDivZero(EXCEPTION_POINTERS * ExceptionInfo)
{
printf("Exception will be handled!\n");
return EXCEPTION_CONTINUE_SEARCH; /* vectored handlers return EXCEPTION_CONTINUE_SEARCH or EXCEPTION_CONTINUE_EXECUTION */
}
and adding the code:
if (::AddVectoredExceptionHandler(TRUE, catchDivZero))
{
printf("Set exception handler.\nContinuing..\n");
}
else
{
printf("Setting exception handler failed\n");
DWORD result = GetLastError();
printf("Error code: 0x%.8X\n", result);
return 1;
}
just before the call to RtlAddFunctionTable.
Now the message from the handler is printed.
To remove the handler use:
::RemoveVectoredExceptionHandler(catchDivZero);
Hope it helps.
Note: as an alternative you may use SetUnhandledExceptionFilter(catchDivZero). Keep in mind that it's not that useful for debugging:
After calling this function, if an exception occurs in a process that
is not being debugged, and the exception makes it to the unhandled
exception filter, that filter will call the exception filter function
specified by the lpTopLevelExceptionFilter parameter.
With the VEH approach we can debug the handler function right in the IDE; with SEH we cannot (there is probably a solution to this, but I do not know of one), so I've proposed VEH as the main solution.

CUDA constant memory issue: invalid device symbol with cudaGetSymbolAddress

I am trying to set constant values on my GPU's constant memory before launching a kernel which needs these values.
My code (simplified):
__constant__ size_t con_N;
int main()
{
size_t N;
size_t* dev_N = NULL;
cudaError_t cudaStatus;
//[...]
cudaStatus = cudaGetSymbolAddress((void **)&dev_N, &con_N);
if (cudaStatus != cudaSuccess) {
cout<<"cudaGetSymbolAddress (dev_N) failed: "<<cudaGetErrorString(cudaStatus)<<endl;
}
I planned to cudaMemcpy my N to dev_N afterwards.
However, all I get at this point in the code is:
cudaGetSymbolAddress (dev_N) failed: invalid device symbol
I'm working with CUDA 6.5 so it's not a quoted symbol issue, as it is in most of the Q&A I've been checking so far.
I tried to replace con_N with con_N[1] (and remove the & before con_N in cudaGetSymbolAddress parameters): same result.
As the prototype of this function is cudaGetSymbolAddress(void **devPtr , const void* symbol ), I guessed it wanted to be given my symbol's address. However, I tried with cudaStatus = cudaGetSymbolAddress((void **)&dev_N, (const void*) con_N); and I got the same message.
I'm also getting the very same error message when I remove cudaGetSymbolAddress((void **)&dev_N, &con_N) and go directly with cudaMemcpyToSymbol(&con_N, &N, sizeof(size_t)) instead.
I'm afraid I missed something essential. Any help will be greatly appreciated.
The correct usage of cudaGetSymbolAddress is
cudaGetSymbolAddress((void **)&dev_N, con_N)
I'm showing this with the simple example below.
As the documentation explains, the symbol should physically reside on the device. Accordingly, using &con_N in the API call appears to be meaningless: cudaGetSymbolAddress is a host API, and taking the address of something residing on the device directly from host code should not be possible. I'm not sure whether the prototype appearing in the CUDA Runtime API document should better read as
template<class T>
cudaError_t cudaGetSymbolAddress (void **devPtr, const T symbol)
with device symbol reference instead of device symbol address.
#include <stdio.h>
__constant__ int const_symbol;
/********************/
/* CUDA ERROR CHECK */
/********************/
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
if (code != cudaSuccess)
{
fprintf(stderr,"GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
if (abort) exit(code);
}
}
/***************/
/* TEST KERNEL */
/***************/
__global__ void kernel() {
printf("Address of symbol from device = %p\n", &const_symbol);
}
/********/
/* MAIN */
/********/
int main()
{
const int N = 16;
int *pointer = NULL;
gpuErrchk(cudaGetSymbolAddress((void**)&pointer, const_symbol));
kernel<<<1,1>>>();
gpuErrchk(cudaDeviceSynchronize()); /* wait for the kernel so its printf output is flushed */
printf("Address of symbol from host = %p\n", pointer);
return 0;
}
In my opinion, one line of your code should be fixed as shown below.
cudaStatus = cudaGetSymbolAddress((void **)&dev_N, con_N);
Hope this helps you.

turn on LED using C

I want to turn on an LED using C, meaning that I want to write to the parallel port,
but the code doesn't work.
I use char ledStatus instead of BYTE ledStatus. Is there any difference?
What is the problem in this code?
#include <windows.h>
#include <conio.h>
#include <stdio.h>
#define LED_ON 1
int main()
{
HANDLE h;
unsigned long dwSize=1;
int success;
h = CreateFile(
L"LPT1",
GENERIC_WRITE, // access (read-write) mode
0, // share mode
NULL, // pointer to security attributes
OPEN_EXISTING, // how to create
FILE_ATTRIBUTE_NORMAL, // file attributes
NULL // handle to file with attributes to copy
);
if (INVALID_HANDLE_VALUE == h)
{
//Handle Error
printf("CreateFile failed with error %d\n", GetLastError());
exit(1);
}
else
{
printf("CreateFile1 Successful\n");
}
char ledStatus;
// turn on LED
ledStatus = LED_ON;
success = WriteFile(h, &ledStatus, 1, &dwSize, NULL);
if (success)
{
printf("File Write Successful - %i bytes\n", dwSize);
}
else
{
printf("File Write Failed\n");
}
// close port
CloseHandle(h);
return 0;
}
Your question is poorly documented: you didn't describe which signal you used or how you wired the LED, and there are lots of ways to get that wrong. But you have no hope of making it work with the standard Windows parallel driver. It was written to interface with parallel devices like printers, which requires handshaking to clock a byte to the device: the driver turns on the STROBE signal, and the device must turn on the ACK signal to acknowledge that it copied the byte. That of course doesn't happen here, so the WriteFile() calls only fill a buffer in the driver.
You'll need another driver to directly control the output lines; Inpout32 is a common choice. Find essential advice in Jan Axelson's book, which also includes a link to Inpout32.

How can I prevent semaphore lockup when thread is terminated with bus error

I am developing a Linux device driver running on an embedded CPU. This device driver controls some external hardware. The external hardware has its own DDR controller and external DDR. The hardware's DDR is visible on the embedded CPU via a movable memory window (so I have paged access to the external DDR from the Linux driver). I'm using Linux kernel version 2.6.33.
My driver uses sysfs to allow control of the external hardware from userspace. As an example, the external hardware generates a heartbeat counter which increments a specific address in external DDR. The driver reads this to detect if the external hardware is still running.
If the external DDR is not working correctly then an access to the external DDR produces a bus error on the embedded CPU. To protect against simultaneous multi-thread access, the driver uses a semaphore.
Now to the problem. If a thread grabs the semaphore, then terminates with a bus error, the semaphore is still locked. All subsequent calls to grab the semaphore block indefinitely. What techniques can I use to avoid this hanging the driver forever?
An example sysfs function (simplified):
static ssize_t running_attr_show(struct device *dev, struct device_attribute *attr, char *buffer)
{
struct my_device * const my_dev = container_of(dev, struct my_device, dev);
int ret;
if(down_interruptible(&my_dev->sem))
{
ret = -ERESTARTSYS;
}
else
{
u32 heartbeat;
int running;
// Following line could cause bus error
heartbeat = mwindow_get_reg(&my_dev->mwindow, HEARTBEAT_COUNTER_ADDR);
running = (heartbeat != my_dev->last_heartbeat) ? 1 : 0;
my_dev->last_heartbeat = heartbeat;
ret = sprintf(buffer, "%d\n", running);
/* unlock */
up(&my_dev->sem);
}
return ret;
}
You'll need to modify mwindow_get_reg() and possibly the architecture fault handler that's invoked on a bus error so that mwindow_get_reg() can return an error, rather than terminating the process.
You can then handle that error gracefully, by releasing the semaphore and returning an error to userspace.
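The control flow described above can be sketched in user space. This is only an illustration under stated assumptions: demo_sem, demo_down, demo_up, demo_device, demo_get_reg and demo_running_show are hypothetical stand-ins for the kernel semaphore calls, the device structure, an error-returning mwindow_get_reg() and the sysfs show function, and -EINTR stands in for -ERESTARTSYS, which user-space headers do not define. The point is that the semaphore is released on every path once it has been taken.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for down_interruptible() / up(). Unlike the real
 * calls, demo_down() never blocks: it fails if the semaphore is taken. */
struct demo_sem { int count; };

static int demo_down(struct demo_sem *s)
{
    if (s->count <= 0)
        return -EINTR;
    s->count--;
    return 0;
}

static void demo_up(struct demo_sem *s)
{
    s->count++;
}

struct demo_device {
    struct demo_sem sem;
    uint32_t last_heartbeat;
    int window_ok;       /* 0 simulates an external DDR that faults on access */
    uint32_t counter;    /* simulated heartbeat register */
};

/* Hypothetical error-returning variant of mwindow_get_reg(): reports the
 * failed access to the caller instead of terminating it. */
static int demo_get_reg(struct demo_device *d, uint32_t *value)
{
    if (!d->window_ok)
        return -EIO;
    *value = ++d->counter;
    return 0;
}

int demo_running_show(struct demo_device *d, char *buffer, size_t len)
{
    uint32_t heartbeat;
    int ret;

    if (demo_down(&d->sem))
        return -EINTR;                   /* the driver would return -ERESTARTSYS */

    ret = demo_get_reg(d, &heartbeat);
    if (ret == 0) {
        int running = (heartbeat != d->last_heartbeat) ? 1 : 0;
        d->last_heartbeat = heartbeat;
        ret = snprintf(buffer, len, "%d\n", running);
    }

    demo_up(&d->sem);                    /* released on success and on failure */
    return ret;
}
```

With window_ok set to 0 the function returns -EIO and leaves the semaphore available for the next caller, which is the behaviour the driver needs.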
Thanks to @caf, here is the solution I've implemented.
I've converted part of mwindow_get_reg to assembly. For the possible faulting read I've added an entry into the ex_table section with the faulting address and fixup address. This causes the exception handler to jump to the fixup code instead of terminating the thread if an exception occurs at this address. The fixup assembler sets a 'faulted' flag, which I can then test for in my c code:
unsigned long ret = 0;
int faulted;
asm volatile(
" 1: lwi %0, %2, 0; " // ret = *window_addr
" 2: addik %1, r0, 0; " // faulted = 0
" 3: "
" .section .fixup, \"ax\"; " // fixup code executed if exception occurs
" 4: brid 3b; " // jump to next line of c code
" addik %1, r0, 1; " // faulted = 1 (in delay slot)
" .previous; "
" .section __ex_table,\"a\"; "
" .word 1b,4b; " // ex_table entry. Gives fault address and jump address if fault occurs
" .previous; "
: "=r" (ret), "=r" (faulted) // output registers
: "r" (window_addr) // input registers
);
if (faulted)
{
printk(KERN_ERR "%s: %s: FAULTED!\n", MODNAME, __FUNCTION__);
ret = 0xdeadbeef;
}
I also had to modify my DBUS exception handler by adding the following:
const struct exception_table_entry *fixup;
fixup = search_exception_tables(regs->pc);
if (fixup) {
printk(KERN_ERR "DBUS exception: calling fixup\n");
regs->pc = fixup->fixup;
return;
}

how to use CryptoAPI in the linux kernel 2.6

I have been looking for some time but have not found anywhere near sufficient documentation / examples on how to use the CryptoAPI that comes with linux in the creation of syscalls / in kernel land.
If anyone knows of a good source please let me know, I would like to know how to do SHA1 / MD5 and Blowfish / AES within the kernel space only.
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/crypto.h>
#include <linux/err.h>
#include <linux/scatterlist.h>
#define SHA1_LENGTH 20
static int __init sha1_init(void)
{
struct scatterlist sg;
struct crypto_hash *tfm;
struct hash_desc desc;
unsigned char output[SHA1_LENGTH];
unsigned char buf[10];
int i;
printk(KERN_INFO "sha1: %s\n", __FUNCTION__);
memset(buf, 'A', 10);
memset(output, 0x00, SHA1_LENGTH);
tfm = crypto_alloc_hash("sha1", 0, CRYPTO_ALG_ASYNC);
desc.tfm = tfm;
desc.flags = 0;
sg_init_one(&sg, buf, 10);
crypto_hash_init(&desc);
crypto_hash_update(&desc, &sg, 10);
crypto_hash_final(&desc, output);
for (i = 0; i < 20; i++) {
printk(KERN_ERR "%d-%d\n", output[i], i);
}
crypto_free_hash(tfm);
return 0;
}
static void __exit sha1_exit(void)
{
printk(KERN_INFO "sha1: %s\n", __FUNCTION__);
}
module_init(sha1_init);
module_exit(sha1_exit);
MODULE_LICENSE("Dual MIT/GPL");
MODULE_AUTHOR("Me");
There are a couple of places in the kernel which use the crypto module: the eCryptfs file system (linux/fs/ecryptfs/) and the 802.11 wireless stack (linux/drivers/staging/rtl8187se/ieee80211/). Both of these use AES, but you may be able to extrapolate what you find there to MD5.
Another good example is from the 2.6.18 kernel source in security/seclvl.c
Note: You can change CRYPTO_TFM_REQ_MAY_SLEEP if needed
static int
plaintext_to_sha1(unsigned char *hash, const char *plaintext, unsigned int len)
{
struct crypto_tfm *tfm;
struct scatterlist sg;
if (len > PAGE_SIZE) {
seclvl_printk(0, KERN_ERR, "Plaintext password too large (%d "
"characters). Largest possible is %lu "
"bytes.\n", len, PAGE_SIZE);
return -EINVAL;
}
tfm = crypto_alloc_tfm("sha1", CRYPTO_TFM_REQ_MAY_SLEEP);
if (tfm == NULL) {
seclvl_printk(0, KERN_ERR,
"Failed to load transform for SHA1\n");
return -EINVAL;
}
sg_init_one(&sg, (u8 *)plaintext, len);
crypto_digest_init(tfm);
crypto_digest_update(tfm, &sg, 1);
crypto_digest_final(tfm, hash);
crypto_free_tfm(tfm);
return 0;
}
Cryptodev-linux
https://github.com/cryptodev-linux/cryptodev-linux
It is a kernel module that exposes the kernel crypto API to userspace through /dev/crypto .
SHA calculation example: https://github.com/cryptodev-linux/cryptodev-linux/blob/da730106c2558c8e0c8e1b1b1812d32ef9574ab7/examples/sha.c
As others have mentioned, the kernel does not seem to expose the crypto API to userspace itself, which is a shame since the kernel can already use native hardware accelerated crypto functions internally.
Crypto operations cryptodev supports: https://github.com/nmav/cryptodev-linux/blob/383922cabeea7dca354415e8c590f8e932f4d7a8/crypto/cryptodev.h
Crypto operations Linux x86 supports: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/x86/crypto?id=refs/tags/v4.0
The best place to start is Documentation/crypto in the kernel sources. dm-crypt is one of the many components that probably use the kernel crypto API, and you can refer to it to get an idea about usage.
how to do SHA1 / MD5 and Blowfish / AES within the kernel space only.
Example of hashing data using a two-element scatterlist:
struct crypto_hash *tfm = crypto_alloc_hash("sha1", 0, CRYPTO_ALG_ASYNC);
if (IS_ERR(tfm)) /* crypto_alloc_hash() returns an ERR_PTR on failure, not NULL */
fail;
char *output_buf = kmalloc(crypto_hash_digestsize(tfm), GFP_KERNEL);
if (output_buf == NULL)
fail;
struct scatterlist sg[2];
struct hash_desc desc = {.tfm = tfm};
ret = crypto_hash_init(&desc);
if (ret != 0)
fail;
sg_init_table(sg, ARRAY_SIZE(sg));
sg_set_buf(&sg[0], "Hello", 5);
sg_set_buf(&sg[1], " World", 6);
ret = crypto_hash_digest(&desc, sg, 11, output_buf);
if (ret != 0)
fail;
One critical note:
Never compare the return value of the crypto_alloc_hash function to NULL to detect failure.
Steps:
Always use the IS_ERR function for this purpose. Comparing to NULL does not catch the error, so you get segmentation faults later on.
If IS_ERR reports an error, you are possibly missing the crypto algorithm in your kernel image (or as a module). Make sure you have selected the appropriate crypto algorithm in make menuconfig.
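To see why the NULL comparison misses the failure, the error-pointer convention can be reproduced in user space. The macros below mirror the definitions in the kernel's include/linux/err.h (minus the unlikely() hint); demo_alloc_hash and dummy_tfm are hypothetical stand-ins for an allocator that fails the way crypto_alloc_hash does, and -ENOENT is just one plausible error code. A minimal sketch:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Mirrors include/linux/err.h: the top 4095 addresses encode -errno values. */
#define MAX_ERRNO 4095
#define IS_ERR_VALUE(x) ((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)

static inline void *ERR_PTR(long error)
{
    return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    return IS_ERR_VALUE((unsigned long)ptr);
}

static int dummy_tfm;   /* hypothetical placeholder for a real transform object */

/* Hypothetical allocator: returns an encoded error instead of NULL when the
 * requested algorithm is unknown. */
void *demo_alloc_hash(const char *name)
{
    if (name == NULL || strcmp(name, "sha1") != 0)
        return ERR_PTR(-ENOENT);
    return &dummy_tfm;
}
```

For an unknown algorithm, demo_alloc_hash() returns a non-NULL pointer near the top of the address space, so a check against NULL passes and the first dereference crashes; IS_ERR() catches it and PTR_ERR() recovers the error code.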
