How to use the CryptoAPI in the Linux 2.6 kernel - linux-kernel

I have been looking for some time but have not found anywhere near sufficient documentation or examples of how to use the CryptoAPI that ships with Linux from syscalls or other kernel-space code.
If anyone knows of a good source, please let me know; I would like to do SHA1 / MD5 and Blowfish / AES entirely within kernel space.

#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/crypto.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

#define SHA1_LENGTH 20

static int __init sha1_init(void)
{
    struct scatterlist sg;
    struct crypto_hash *tfm;
    struct hash_desc desc;
    unsigned char output[SHA1_LENGTH];
    unsigned char buf[10];
    int i;

    printk(KERN_INFO "sha1: %s\n", __func__);

    memset(buf, 'A', 10);
    memset(output, 0x00, SHA1_LENGTH);

    tfm = crypto_alloc_hash("sha1", 0, CRYPTO_ALG_ASYNC);
    if (IS_ERR(tfm))
        return PTR_ERR(tfm);

    desc.tfm = tfm;
    desc.flags = 0;

    sg_init_one(&sg, buf, 10);

    crypto_hash_init(&desc);
    crypto_hash_update(&desc, &sg, 10);
    crypto_hash_final(&desc, output);

    for (i = 0; i < SHA1_LENGTH; i++)
        printk(KERN_ERR "%d-%d\n", output[i], i);

    crypto_free_hash(tfm);
    return 0;
}

static void __exit sha1_exit(void)
{
    printk(KERN_INFO "sha1: %s\n", __func__);
}

module_init(sha1_init);
module_exit(sha1_exit);

MODULE_LICENSE("Dual MIT/GPL");
MODULE_AUTHOR("Me");

There are a couple of places in the kernel which use the crypto module: the eCryptfs file system (linux/fs/ecryptfs/) and the 802.11 wireless stack (linux/drivers/staging/rtl8187se/ieee80211/). Both of these use AES, but you may be able to extrapolate what you find there to MD5.
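For MD5 the pattern is the same as the SHA-1 module above, using the old (2.6-era) crypto_hash interface; only the algorithm name and digest size change. A minimal sketch, assuming data and len are whatever buffer you want to hash:

struct hash_desc desc;
struct scatterlist sg;
unsigned char digest[16];   /* MD5 digest is 16 bytes */

desc.tfm = crypto_alloc_hash("md5", 0, CRYPTO_ALG_ASYNC);
if (IS_ERR(desc.tfm))
    return PTR_ERR(desc.tfm);
desc.flags = 0;

sg_init_one(&sg, data, len);
crypto_hash_digest(&desc, &sg, len, digest);

crypto_free_hash(desc.tfm);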

Another good example is from the 2.6.18 kernel source in security/seclvl.c
Note: You can change CRYPTO_TFM_REQ_MAY_SLEEP if needed
static int
plaintext_to_sha1(unsigned char *hash, const char *plaintext, unsigned int len)
{
    struct crypto_tfm *tfm;
    struct scatterlist sg;

    if (len > PAGE_SIZE) {
        seclvl_printk(0, KERN_ERR, "Plaintext password too large (%d "
                      "characters). Largest possible is %lu "
                      "bytes.\n", len, PAGE_SIZE);
        return -EINVAL;
    }
    tfm = crypto_alloc_tfm("sha1", CRYPTO_TFM_REQ_MAY_SLEEP);
    if (tfm == NULL) {
        seclvl_printk(0, KERN_ERR,
                      "Failed to load transform for SHA1\n");
        return -EINVAL;
    }
    sg_init_one(&sg, (u8 *)plaintext, len);
    crypto_digest_init(tfm);
    crypto_digest_update(tfm, &sg, 1);
    crypto_digest_final(tfm, hash);
    crypto_free_tfm(tfm);
    return 0;
}

Cryptodev-linux
https://github.com/cryptodev-linux/cryptodev-linux
It is a kernel module that exposes the kernel crypto API to userspace through /dev/crypto.
SHA calculation example: https://github.com/cryptodev-linux/cryptodev-linux/blob/da730106c2558c8e0c8e1b1b1812d32ef9574ab7/examples/sha.c
As others have mentioned, the kernel does not seem to expose the crypto API to userspace itself, which is a shame since the kernel can already use native hardware-accelerated crypto functions internally.
Crypto operations cryptodev supports: https://github.com/nmav/cryptodev-linux/blob/383922cabeea7dca354415e8c590f8e932f4d7a8/crypto/cryptodev.h
Crypto operations Linux x86 supports: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/x86/crypto?id=refs/tags/v4.0
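For completeness, here is a rough sketch of a SHA1 calculation through /dev/crypto. The ioctl names and the session_op / crypt_op fields are written from memory of cryptodev.h, so treat the linked sha.c example as the authoritative reference:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <crypto/cryptodev.h>

/* Hash `len` bytes at `data` into `digest` (20 bytes for SHA1). */
static int sha1_via_cryptodev(const unsigned char *data, size_t len,
                              unsigned char digest[20])
{
    struct session_op sess;
    struct crypt_op cryp;
    int cfd = open("/dev/crypto", O_RDWR, 0);

    if (cfd < 0)
        return -1;

    memset(&sess, 0, sizeof(sess));
    sess.mac = CRYPTO_SHA1;                 /* hash-only session */
    if (ioctl(cfd, CIOCGSESSION, &sess) < 0)
        goto err;

    memset(&cryp, 0, sizeof(cryp));
    cryp.ses = sess.ses;
    cryp.len = len;
    cryp.src = (unsigned char *)data;
    cryp.mac = digest;
    cryp.op  = COP_ENCRYPT;                 /* direction is irrelevant for a plain hash */
    if (ioctl(cfd, CIOCCRYPT, &cryp) < 0)
        goto err;

    ioctl(cfd, CIOCFSESSION, &sess.ses);
    close(cfd);
    return 0;
err:
    close(cfd);
    return -1;
}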

The best place to start is Documentation/crypto in the kernel sources. dm-crypt is one of the many components that use the kernel crypto API, and you can refer to it to get an idea of its usage.

how to do SHA1 / MD5 and Blowfish / AES within the kernel space only.
Example of hashing data using a two-element scatterlist:
int ret;
struct scatterlist sg[2];
struct crypto_hash *tfm = crypto_alloc_hash("sha1", 0, CRYPTO_ALG_ASYNC);
if (IS_ERR(tfm))
    fail;                           /* placeholder for your error handling */

char *output_buf = kmalloc(crypto_hash_digestsize(tfm), GFP_KERNEL);
if (output_buf == NULL)
    fail;

struct hash_desc desc = { .tfm = tfm, .flags = 0 };

ret = crypto_hash_init(&desc);
if (ret != 0)
    fail;

sg_init_table(sg, ARRAY_SIZE(sg));
sg_set_buf(&sg[0], "Hello", 5);
sg_set_buf(&sg[1], " World", 6);

ret = crypto_hash_digest(&desc, sg, 11, output_buf);
if (ret != 0)
    fail;

One critical note:
Never compare the return value of crypto_alloc_hash() to NULL to detect failure.
Steps:
Always use IS_ERR() for this purpose. Comparing against NULL does not catch the error, so you end up dereferencing an error pointer and crashing later on.
If IS_ERR() reports a failure, you possibly have the crypto algorithm missing from your kernel image (or not built as a module). Make sure you have selected the appropriate crypto algorithm from make menuconfig.
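In code, the check looks like this (same allocation call as in the examples above):

tfm = crypto_alloc_hash("sha1", 0, CRYPTO_ALG_ASYNC);
if (IS_ERR(tfm)) {
    /* tfm holds an error pointer here, not a usable transform */
    printk(KERN_ERR "failed to allocate sha1 transform: %ld\n", PTR_ERR(tfm));
    return PTR_ERR(tfm);
}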

Related

CUDA constant memory issue: invalid device symbol with cudaGetSymbolAddress

I am trying to set constant values on my GPU's constant memory before launching a kernel which needs these values.
My code (simplified):
__constant__ size_t con_N;

int main()
{
    size_t N;
    size_t* dev_N = NULL;
    cudaError_t cudaStatus;
    //[...]
    cudaStatus = cudaGetSymbolAddress((void **)&dev_N, &con_N);
    if (cudaStatus != cudaSuccess) {
        cout << "cudaGetSymbolAddress (dev_N) failed: " << cudaGetErrorString(cudaStatus) << endl;
    }
I planned to cudaMemcpy my N to dev_N afterwards.
However, all I get at this point in the code is:
cudaGetSymbolAddress (dev_N) failed: invalid device symbol
I'm working with CUDA 6.5 so it's not a quoted symbol issue, as it is in most of the Q&A I've been checking so far.
I tried to replace con_N with con_N[1] (and remove the & before con_N in cudaGetSymbolAddress parameters): same result.
As the prototype of this function is cudaGetSymbolAddress(void **devPtr, const void *symbol), I guessed it wanted to be given my symbol's address. However, I tried with cudaStatus = cudaGetSymbolAddress((void **)&dev_N, (const void*) con_N); and I got the same message.
I'm also getting the very same error message when I remove cudaGetSymbolAddress((void **)&dev_N, &con_N) and go directly with cudaMemcpyToSymbol(&con_N, &N, sizeof(size_t)) instead.
I'm afraid I missed something essential. Any help will be greatly appreciated.
The correct usage of cudaGetSymbolAddress is
cudaGetSymbolAddress((void **)&dev_N, con_N)
I'm showing this with the simple example below.
As the documentation explains, the symbol must physically reside on the device. Accordingly, using &con_N in the API call is meaningless: since cudaGetSymbolAddress is a host API, taking the address of something residing on the device directly from host code is not possible. Arguably, the prototype appearing in the CUDA Runtime API documentation would read better as

template<class T>
cudaError_t cudaGetSymbolAddress(void **devPtr, const T symbol)

that is, with a device symbol reference instead of a device symbol address.
#include <stdio.h>
#include <stdlib.h>

__constant__ int const_symbol;

/********************/
/* CUDA ERROR CHECK */
/********************/
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line, bool abort=true)
{
    if (code != cudaSuccess)
    {
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) exit(code);
    }
}

/***************/
/* TEST KERNEL */
/***************/
__global__ void kernel() {
    printf("Address of symbol from device = %p\n", &const_symbol);
}

/********/
/* MAIN */
/********/
int main()
{
    int *pointer = NULL;

    gpuErrchk(cudaGetSymbolAddress((void**)&pointer, const_symbol));
    kernel<<<1,1>>>();
    gpuErrchk(cudaDeviceSynchronize());   /* flush the in-kernel printf */

    printf("Address of symbol from host = %p\n", pointer);

    return 0;
}
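The same convention applies to the cudaMemcpyToSymbol call mentioned in the question: pass the symbol itself, not its address. A minimal sketch using the names from your code:

// Copy N into the __constant__ variable con_N; the symbol is passed directly,
// not as &con_N.
cudaStatus = cudaMemcpyToSymbol(con_N, &N, sizeof(size_t));
if (cudaStatus != cudaSuccess) {
    printf("cudaMemcpyToSymbol (con_N) failed: %s\n", cudaGetErrorString(cudaStatus));
}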
In my opinion, the offending line of your code should be fixed as below:

cudaStatus = cudaGetSymbolAddress((void **)&dev_N, con_N);

Hope this helps you.

WinAPI USB device list in C - libusb not any good

I hope someone can help me with this issue. I made an application that reads some data from a smartphone and displays it. It worked fine at my house, but when I took it to my friend's house to show him, it didn't work. After the initial panic, I realized that the device path had changed slightly because the phone was connected to a new PC; surely there is a simple WinAPI solution for this.
\\?\usb#vid_045e&pid_0040#6&ff454f2&0&3#{a5dcbf10-6530-11d2-901f-00c04fb951ed}\
I have only found code for C++, and my app is in C, so it is no use. I also found libusb via Google, but it does not return full paths like the one in my example above.
Is there a simple fix, like searching by GUID? Hope you can help.
BR
This is the libusb code I used:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <windows.h>
#include <libusb.h>

static void print_devs(libusb_device **devs)
{
    libusb_device *dev;
    int i = 0;

    while ((dev = devs[i++]) != NULL) {
        struct libusb_device_descriptor desc;
        int r = libusb_get_device_descriptor(dev, &desc);
        if (r < 0) {
            fprintf(stderr, "failed to get device descriptor");
            return;
        }
        printf("%04x:%04x (bus %d, device %d)\n",
               desc.idVendor, desc.idProduct,
               libusb_get_bus_number(dev), libusb_get_device_address(dev));
    }
}

int main(void)
{
    libusb_device **devs;
    int r;
    ssize_t cnt;

    r = libusb_init(NULL);
    if (r < 0)
        return r;

    cnt = libusb_get_device_list(NULL, &devs);
    if (cnt < 0)
        return (int) cnt;

    print_devs(devs);
    libusb_free_device_list(devs, 1);

    libusb_exit(NULL);
    system("pause");
    return 0;
}
This just returns, for example:
1033:0194 (bus 1, device 255)
Yes, you can get a list of all the device identifiers on your computer, but it's not really all that simple, especially if you need to filter it for a particular kind of device.
You start with SetupDiGetClassDevs, passing the device interface GUID you are interested in. After enumerating the matching interfaces with SetupDiEnumDeviceInterfaces, SetupDiGetDeviceInterfaceDetail gives you the device path, like the one shown in your question.
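A rough C sketch of that approach (untested; assumes an ANSI build and linking against setupapi.lib):

#include <windows.h>
#include <initguid.h>
#include <setupapi.h>
#include <stdio.h>
#include <stdlib.h>

/* Device interface GUID for USB devices; it matches the {a5dcbf10-...} suffix
   in the path above. Normally it comes from <usbiodef.h>. */
DEFINE_GUID(GUID_DEVINTERFACE_USB_DEVICE,
            0xA5DCBF10, 0x6530, 0x11D2, 0x90, 0x1F, 0x00, 0xC0, 0x4F, 0xB9, 0x51, 0xED);

int main(void)
{
    HDEVINFO devs;
    SP_DEVICE_INTERFACE_DATA ifdata;
    DWORD i;

    devs = SetupDiGetClassDevsA(&GUID_DEVINTERFACE_USB_DEVICE, NULL, NULL,
                                DIGCF_PRESENT | DIGCF_DEVICEINTERFACE);
    if (devs == INVALID_HANDLE_VALUE)
        return 1;

    ifdata.cbSize = sizeof(ifdata);
    for (i = 0; SetupDiEnumDeviceInterfaces(devs, NULL, &GUID_DEVINTERFACE_USB_DEVICE,
                                            i, &ifdata); i++) {
        DWORD needed = 0;
        PSP_DEVICE_INTERFACE_DETAIL_DATA_A detail;

        /* First call just reports the required buffer size. */
        SetupDiGetDeviceInterfaceDetailA(devs, &ifdata, NULL, 0, &needed, NULL);
        detail = (PSP_DEVICE_INTERFACE_DETAIL_DATA_A)malloc(needed);
        if (detail == NULL)
            break;
        detail->cbSize = sizeof(SP_DEVICE_INTERFACE_DETAIL_DATA_A);
        if (SetupDiGetDeviceInterfaceDetailA(devs, &ifdata, detail, needed, NULL, NULL))
            printf("%s\n", detail->DevicePath);   /* \\?\usb#vid_...#{a5dcbf10-...} */
        free(detail);
    }

    SetupDiDestroyDeviceInfoList(devs);
    return 0;
}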

Problems with 64-bit POSIX write in Mac OS X? (2 GB+ dataset in HDF5)

I'm having some issues with HDF5 on Mac OS X (10.7). After some testing, I've confirmed that POSIX write seems to have issues with buffer sizes exceeding 2 GB. I've written a test program to demonstrate the issue:
#define _FILE_OFFSET_BITS 64
#include <cstdint>
#include <cstdio>
#include <iostream>
#include <unistd.h>
#include <fcntl.h>

void writePosix(const int64_t arraySize, const char* name) {
    int fd = open(name, O_WRONLY | O_CREAT, 0644);
    if (fd != -1) {
        double *array = new double[arraySize];
        double start = 0.0;
        for (int64_t i = 0; i < arraySize; ++i) {
            array[i] = start;
            start += 0.001;
        }
        ssize_t result = write(fd, array, (int64_t)(sizeof(double)) * arraySize);
        printf("results for array size %lld = %ld\n", arraySize, result);
        delete[] array;
        close(fd);
    } else {
        printf("file error\n");
    }
}

int main(int argc, char *argv[]) {
    writePosix(268435455, "/Users/tpav/testfolder/lessthan2gb");
    writePosix(268435456, "/Users/tpav/testfolder/equal2gb");
}
Output:
results for array size 268435455 = 2147483640
results for array size 268435456 = -1
As you can see, I've even tried defining the file offsets. Is there anything I can do about this or should I start looking for a workaround in the way I write 2gb+ chunks?
In the HDF5 virtual file drivers, we break I/O operations that are too large for the call into multiple smaller I/O calls. The Mac implementation of POSIX I/O takes a size_t argument so our code assumed that the max I/O size would be the max value that can fit in a variable of type ssize_t (the return type of read/write). Sadly, this is not the case.
Note that this only applies to single I/O operations. You can create files that go above the 2GB/4GB barrier, you just can't write >2GB in a single call.
This should be fixed in HDF5 1.8.10 patch 1, due out in late January 2013.
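If you need a workaround before the patched release, the same idea works in your own code: never pass more than 2 GB to a single write() call. A minimal sketch (the helper name and the 1 GiB chunk size are my own choices):

#include <unistd.h>

/* Write the whole buffer, splitting it into chunks of at most 1 GiB so that
 * no single write() call exceeds what the OS X implementation can handle. */
ssize_t write_all(int fd, const void *buf, size_t count) {
    const size_t max_chunk = (size_t)1 << 30;      /* 1 GiB per call */
    const char *p = static_cast<const char *>(buf);
    size_t total = 0;
    while (count > 0) {
        size_t chunk = count > max_chunk ? max_chunk : count;
        ssize_t n = write(fd, p, chunk);
        if (n < 0)
            return -1;                             /* caller checks errno */
        p += n;
        total += (size_t)n;
        count -= (size_t)n;
    }
    return (ssize_t)total;
}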

Getting process base address in Mac OSX

I'm trying to read the memory of a process using task_for_pid / vm_read.
uint32_t sz;
pointer_t buf;
task_t task;
pid_t pid = 9484;
kern_return_t error = task_for_pid(current_task(), pid, &task);
vm_read(task, 0x10e448000, 2048, &buf, &sz);
In this case I read the first 2048 bytes.
This works when I know the base address of the process (which I can find out using gdb "info shared" - in this case 0x10e448000), but how do I find out the base address at runtime (without looking at it with gdb)?
Answering my own question. I was able to get the base address using mach_vm_region_recurse like below. The offset lands in vmoffset. If there is another way that is more "right" - don't hesitate to comment!
#include <stdio.h>
#include <mach/mach_init.h>
#include <sys/sysctl.h>
#include <mach/mach_vm.h>
...
mach_port_name_t task;
vm_map_offset_t vmoffset;
vm_map_size_t vmsize;
uint32_t nesting_depth = 0;
struct vm_region_submap_info_64 vbr;
mach_msg_type_number_t vbrcount = 16;
kern_return_t kr;

if ((kr = mach_vm_region_recurse(task, &vmoffset, &vmsize,
                                 &nesting_depth,
                                 (vm_region_recurse_info_t)&vbr,
                                 &vbrcount)) != KERN_SUCCESS)
{
    printf("FAIL");
}
Since you're calling current_task(), I assume you're aiming at your own process at runtime. So the base address you mentioned should be the dynamic base address, i.e. static base address + image slide caused by ASLR, right? Based on this assumption, you can use "Section and Segment Accessors" to get the static base address of your process, and then use the dyld functions to get the image slide. Here's a snippet:
#import <Foundation/Foundation.h>
#include <mach-o/getsect.h>
#include <mach-o/dyld.h>
#include <stdio.h>
#include <string.h>

uint64_t StaticBaseAddress(void)
{
    const struct segment_command_64 *command = getsegbyname("__TEXT");
    uint64_t addr = command->vmaddr;
    return addr;
}

intptr_t ImageSlide(void)
{
    char path[1024];
    uint32_t size = sizeof(path);
    if (_NSGetExecutablePath(path, &size) != 0)
        return -1;
    for (uint32_t i = 0; i < _dyld_image_count(); i++)
    {
        if (strcmp(_dyld_get_image_name(i), path) == 0)
            return _dyld_get_image_vmaddr_slide(i);
    }
    return 0;
}

uint64_t DynamicBaseAddress(void)
{
    return StaticBaseAddress() + ImageSlide();
}

int main (int argc, const char *argv[])
{
    printf("dynamic base address (%0llx) = static base address (%0llx) + image slide (%0lx)\n",
           DynamicBaseAddress(), StaticBaseAddress(), ImageSlide());
    while (1) {}; // you can attach to this process via gdb/lldb to view the base address now :)
    return 0;
}
Hope it helps!

OpenSSL and multi-threads

I've been reading about the requirement that if OpenSSL is used in a multi-threaded application, you have to register a thread identification function (and also a mutex creation function) with OpenSSL.
On Linux, according to the example provided by OpenSSL, a thread is normally identified by registering a function like this:
static unsigned long id_function(void){
return (unsigned long)pthread_self();
}
pthread_self() returns a pthread_t, and this works on Linux since pthread_t is just a typedef of unsigned long.
On Windows pthreads, FreeBSD, and other operating systems, pthread_t is a struct, with the following structure:
struct {
    void * p;         /* Pointer to actual object */
    unsigned int x;   /* Extra information - reuse count etc */
}
This can't be simply cast to an unsigned long, and when I try to do so, it throws a compile error. I tried taking the void *p and casting that to an unsigned long, on the theory that the memory pointer should be consistent and unique across threads, but this just causes my program to crash a lot.
What can I register with OpenSSL as the thread identification function when using Windows pthreads or FreeBSD or any of the other operating systems like this?
Also, as an additional question:
Does anyone know if this also needs to be done if OpenSSL is compiled into and used with QT, and if so how to register QThreads with OpenSSL? Surprisingly, I can't seem to find the answer in QT's documentation.
I will just put this code here. It is not a panacea, as it doesn't deal with FreeBSD, but it is helpful in most cases when all you need is to support Windows and, say, Debian. Of course, the clean solution assumes usage of the CRYPTO_THREADID_* family introduced more recently (to give an idea, it has a CRYPTO_THREADID_cmp callback, which can be mapped to pthread_equal).
#include <stdlib.h>
#include <pthread.h>
#include <openssl/crypto.h>
#include <openssl/err.h>

#if defined(WIN32)
    #define MUTEX_TYPE       HANDLE
    #define MUTEX_SETUP(x)   (x) = CreateMutex(NULL, FALSE, NULL)
    #define MUTEX_CLEANUP(x) CloseHandle(x)
    #define MUTEX_LOCK(x)    WaitForSingleObject((x), INFINITE)
    #define MUTEX_UNLOCK(x)  ReleaseMutex(x)
    #define THREAD_ID        GetCurrentThreadId()
#else
    #define MUTEX_TYPE       pthread_mutex_t
    #define MUTEX_SETUP(x)   pthread_mutex_init(&(x), NULL)
    #define MUTEX_CLEANUP(x) pthread_mutex_destroy(&(x))
    #define MUTEX_LOCK(x)    pthread_mutex_lock(&(x))
    #define MUTEX_UNLOCK(x)  pthread_mutex_unlock(&(x))
    #define THREAD_ID        pthread_self()
#endif

/* This array will store all of the mutexes available to OpenSSL. */
static MUTEX_TYPE *mutex_buf = NULL;

static void locking_function(int mode, int n, const char *file, int line)
{
    if (mode & CRYPTO_LOCK)
        MUTEX_LOCK(mutex_buf[n]);
    else
        MUTEX_UNLOCK(mutex_buf[n]);
}

static unsigned long id_function(void)
{
    return ((unsigned long)THREAD_ID);
}

int thread_setup(void)
{
    int i;

    mutex_buf = malloc(CRYPTO_num_locks() * sizeof(MUTEX_TYPE));
    if (!mutex_buf)
        return 0;
    for (i = 0; i < CRYPTO_num_locks(); i++)
        MUTEX_SETUP(mutex_buf[i]);
    CRYPTO_set_id_callback(id_function);
    CRYPTO_set_locking_callback(locking_function);
    return 1;
}

int thread_cleanup(void)
{
    int i;

    if (!mutex_buf)
        return 0;
    CRYPTO_set_id_callback(NULL);
    CRYPTO_set_locking_callback(NULL);
    for (i = 0; i < CRYPTO_num_locks(); i++)
        MUTEX_CLEANUP(mutex_buf[i]);
    free(mutex_buf);
    mutex_buf = NULL;
    return 1;
}
I can only answer the Qt part. Use QThread::currentThreadId(), or even QThread::currentThread(), as the pointer value should be unique.
From the OpenSSL doc you linked:
threadid_func(CRYPTO_THREADID *id) is needed to record the currently-executing thread's identifier into id. The implementation of this callback should not fill in id directly, but should use CRYPTO_THREADID_set_numeric() if thread IDs are numeric, or CRYPTO_THREADID_set_pointer() if they are pointer-based. If the application does not register such a callback using CRYPTO_THREADID_set_callback(), then a default implementation is used - on Windows and BeOS this uses the system's default thread identifying APIs, and on all other platforms it uses the address of errno. The latter is satisfactory for thread-safety if and only if the platform has a thread-local error number facility.
As shown, providing your own ID is really only useful if you can provide a better ID than OpenSSL's default implementation.
The only fail-safe way to provide IDs, when you don't know whether pthread_t is a pointer or an integer, is to maintain your own per-thread IDs stored as a thread-local value.
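A minimal sketch of that last approach, using a pthread key to hand out our own numeric per-thread IDs and reporting them through the CRYPTO_THREADID callback (helper names are mine; assumes the CRYPTO_THREADID interface of OpenSSL 1.0.x):

#include <stdlib.h>
#include <pthread.h>
#include <openssl/crypto.h>

static pthread_key_t   thread_id_key;
static pthread_once_t  thread_id_once = PTHREAD_ONCE_INIT;
static unsigned long   next_thread_id = 1;
static pthread_mutex_t next_id_lock = PTHREAD_MUTEX_INITIALIZER;

static void make_thread_id_key(void)
{
    pthread_key_create(&thread_id_key, free);
}

static void threadid_func(CRYPTO_THREADID *id)
{
    unsigned long *my_id;

    pthread_once(&thread_id_once, make_thread_id_key);
    my_id = pthread_getspecific(thread_id_key);
    if (my_id == NULL) {
        /* First call on this thread: allocate and assign a fresh ID. */
        my_id = malloc(sizeof(*my_id));
        pthread_mutex_lock(&next_id_lock);
        *my_id = next_thread_id++;
        pthread_mutex_unlock(&next_id_lock);
        pthread_setspecific(thread_id_key, my_id);
    }
    CRYPTO_THREADID_set_numeric(id, *my_id);
}

/* Register once at startup, instead of CRYPTO_set_id_callback():
 *     CRYPTO_THREADID_set_callback(threadid_func);
 */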
