How to sleep() from kernel init? - debugging

I'm debugging some kernel init code with an oscilloscope by setting values on a GPIO. What is the best way to sleep() for a given time very early, e.g. in ddr3_init()?
Thank you

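You could use a busy loop that stops after a given time interval. This should busy-wait for one second (untested; I put it together by looking at the time.h header). Note that it relies on the kernel's timekeeping already being up and running, which may not be the case that early in boot: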
#include <linux/time.h>
struct timespec start_ts = current_kernel_time();
s64 start = timespec_to_ns(&start_ts);
s64 now; /* declared outside the loop body so the while condition can see it */
do {
	struct timespec now_ts = current_kernel_time();
	now = timespec_to_ns(&now_ts);
} while (now - start < 1000000000LL);

Related

Using <chrono> as a timer in bare-metal microcontroller?

Can chrono be used as a timer/counter in a bare-metal microcontroller (e.g. MSP432 running an RTOS)? Can the high_resolution_clock (and other APIs in chrono) be configured so that it increments based on the given microcontroller's actual timer tick/register?
The Real-Time C++ book (section 16.5) seems to suggest this is possible, but I haven't found any examples of this being applied, especially within bare-metal microcontrollers.
How could this be implemented? Would this be even recommended? If not, where can chrono aid in RTOS-based embedded software?
I would create a clock that implements now by reading from your timer register:
#include <chrono>
#include <cstdint>
struct clock
{
using rep = std::int64_t;
using period = std::milli;
using duration = std::chrono::duration<rep, period>;
using time_point = std::chrono::time_point<clock>;
static constexpr bool is_steady = true;
static time_point now() noexcept
{
return time_point{duration{"asm to read timer register"}};
}
};
Adjust period to whatever speed your processor ticks at (but it does have to be a compile-time constant). Above I've set it for 1 tick/ms. Here is how it should read for 1 tick == 2ns:
using period = std::ratio<1, 500'000'000>;
Now you can say things like:
auto t = clock::now(); // a chrono::time_point
and
auto d = clock::now() - t; // a chrono::duration
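To make the idea concrete, here is a minimal sketch of such a clock that can be compiled on a desktop. The timer register is simulated: simulated_ticks, read_timer_register, tick_clock and busy_wait are all hypothetical names invented for this illustration, and on real hardware read_timer_register would be a volatile read of the timer peripheral instead. The sketch also ignores counter wrap-around, which a real 32-bit timer would have to handle.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for a hardware timer register (assumption, not a real API).
static std::uint32_t simulated_ticks = 0;

// Pretend that 500 ticks elapse between two consecutive reads.
inline std::uint32_t read_timer_register() noexcept
{
    simulated_ticks += 500;
    return simulated_ticks;
}

struct tick_clock
{
    using rep        = std::int64_t;
    using period     = std::ratio<1, 500'000'000>;  // 1 tick == 2 ns
    using duration   = std::chrono::duration<rep, period>;
    using time_point = std::chrono::time_point<tick_clock>;
    static constexpr bool is_steady = true;

    static time_point now() noexcept
    {
        return time_point{duration{read_timer_register()}};
    }
};

// Busy-wait until at least d has elapsed on tick_clock.
// Returns how many times the loop body ran before the deadline passed.
template <class Rep, class Period>
std::int64_t busy_wait(std::chrono::duration<Rep, Period> d)
{
    const auto start = tick_clock::now();
    std::int64_t polls = 0;
    while (tick_clock::now() - start < d)
        ++polls;
    return polls;
}
```

Because the period is part of the type, a call such as busy_wait(std::chrono::microseconds(10)) compiles without any manual tick arithmetic; chrono converts between microseconds and 2 ns ticks at compile time.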

std::condition_variable why does it need a std::mutex

I am not sure I really understand why std::condition_variable needs an additional std::mutex as a parameter. Should it not do the locking by itself?
#include <iostream>
#include <condition_variable>
#include <thread>
#include <chrono>
std::condition_variable cv;
std::mutex cv_m;
int i = 0;
bool done = false;
void waits()
{
std::unique_lock<std::mutex> lk(cv_m);
std::cout << "Waiting... \n";
cv.wait(lk, []{return i == 1;});
std::cout << "...finished waiting. i == 1\n";
done = true;
}
void signals()
{
std::this_thread::sleep_for(std::chrono::seconds(1));
std::cout << "Notifying falsely...\n";
cv.notify_one(); // waiting thread is notified with i == 0.
// cv.wait wakes up, checks i, and goes back to waiting
std::unique_lock<std::mutex> lk(cv_m);
i = 1;
while (!done)
{
std::cout << "Notifying true change...\n";
lk.unlock();
cv.notify_one(); // waiting thread is notified with i == 1, cv.wait returns
std::this_thread::sleep_for(std::chrono::seconds(1));
lk.lock();
}
}
int main()
{
std::thread t1(waits), t2(signals);
t1.join();
t2.join();
}
Secondly, in the example they unlock the mutex first (in the signals function). Why are they doing this? Shouldn't they lock first and unlock only after the notify?
The mutex protects the predicate, that is, the thing that you are waiting for. Since the thing you are waiting for is, necessarily, shared between threads, it must be protected somehow.
In your example above, i == 1 is the predicate. The mutex protects i.
It may be helpful to take a step back and think about why we need condition variables. One thread detects some state that prevents it from making forward progress and needs to wait for some other thread to change that state. This detection of state has to take place under a mutex because the state must be shared (otherwise, how could another thread change that state?).
But the thread can't release the mutex and then wait. What if the state changed after the mutex was released but before the thread managed to wait? So you need an atomic "unlock and wait" operation. That's specifically what condition variables provide.
With no mutex, what would they unlock?
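The choice of whether to signal the condition variable before or after releasing the lock is a complex one with advantages on both sides. Signaling while holding the lock is always safe. Whether it also performs better depends on the implementation: some wake the waiter only for it to block again on the still-held mutex, while others (many pthreads implementations) avoid that by moving the waiter directly onto the mutex's wait queue.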
A good rule of thumb to remember when working with multiple threads is that, when you ask a question, the result may be a lie. That is, the answer may have changed since it was given to you. The only way to reliably ask a question is to make it effectively single-threaded. Enter mutexes.
A condition variable waits for a trigger so that it can check its condition. To check its condition, it needs to ask a question.
If you do not lock before waiting, then it is possible that you ask the question and are told that the condition is false. That answer becomes a lie as soon as the trigger occurs and the condition becomes true. But you don't know this, since there's no mutex making the check and the wait effectively single-threaded.
Instead, you wait on the condition variable for a trigger that will never fire, because it already did. This deadlocks.
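The points above can be sketched in a few lines. This is only an illustration under assumed names (Gate, wait_until_ready, open and run_gate_demo are not from the question): the predicate ready is only ever read or written while holding the mutex, and wait() performs the atomic "unlock and wait" step.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Illustrative type: a one-shot gate guarded by a mutex.
struct Gate
{
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;  // the predicate; only touched while holding m

    void wait_until_ready()
    {
        std::unique_lock<std::mutex> lk(m);
        // wait() atomically releases m and blocks; it re-acquires m
        // before each predicate check and before returning.
        cv.wait(lk, [this] { return ready; });
    }

    void open()
    {
        {
            std::lock_guard<std::mutex> lk(m);
            ready = true;  // change the shared state under the mutex
        }
        cv.notify_one();   // here the notification is sent after unlocking
    }
};

// Returns true once the waiting thread has seen ready == true.
bool run_gate_demo()
{
    Gate g;
    bool observed = false;
    std::thread waiter([&] {
        g.wait_until_ready();
        observed = true;
    });
    g.open();
    waiter.join();  // join() makes the write to observed visible here
    return observed;
}
```

If open() happens to run before the waiter reaches wait(), nothing is lost: the waiter checks ready under the mutex, finds it already true, and never blocks.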

How to send signal from kernel to user space

My kernel module code needs to send a signal to a userland program, to transfer its execution to a registered signal handler.
I know how to send a signal between two userland processes, but I cannot find any example online for this task.
To be specific, my intended task might require an interface like the one below (once error is nonzero, the line int a=10 should not be executed):
void __init m_start(){
...
if(error){
send_signal_to_userland_process(SIGILL)
}
int a = 10;
...
}
module_init(m_start())
Here is an example I used in the past to send a signal to user space from a hardware interrupt in kernel space:
KERNEL SPACE
#include <asm/siginfo.h> //siginfo
#include <linux/rcupdate.h> //rcu_read_lock
#include <linux/sched.h> //find_task_by_pid_type
static int pid; // Stores application PID in user space
#define SIG_TEST 44
Some "includes" and definitions are needed. Basically, you need the PID of the application in user space.
struct siginfo info;
struct task_struct *t;
memset(&info, 0, sizeof(struct siginfo));
info.si_signo = SIG_TEST;
// This is a bit of trickery: SI_QUEUE is normally used by sigqueue from user space, and kernel space should use SI_KERNEL.
// But if SI_KERNEL is used, the real-time data is not delivered to the user space signal handler function.
info.si_code = SI_QUEUE;
// real time signals may have 32 bits of data.
info.si_int = 1234; // Any value you want to send
rcu_read_lock();
// find the task with that pid
t = pid_task(find_pid_ns(pid, &init_pid_ns), PIDTYPE_PID);
if (t != NULL) {
	// keep the RCU read lock held while t is in use, so the task cannot be freed underneath us
	if (send_sig_info(SIG_TEST, &info, t) < 0) // send signal
		printk("send_sig_info error\n");
} else {
	printk("pid_task error\n");
	//return -ENODEV;
}
rcu_read_unlock();
The previous code prepares the signal structure and sends it. Bear in mind that you need the application's PID. In my case, the user-space application sends its PID through the driver's ioctl procedure:
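static long dev_ioctl(struct file *file, unsigned int cmd, unsigned long arg) {
	ioctl_arg_t args;
	switch (cmd) {
	case IOCTL_SET_VARIABLES:
		// copy_from_user returns the number of bytes it could not copy
		if (copy_from_user(&args, (ioctl_arg_t __user *)arg, sizeof(ioctl_arg_t)))
			return -EFAULT;
		pid = args.pid;
		break;
	default:
		return -ENOTTY; // unknown command
	}
	return 0;
}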
USER SPACE
Define and implement the callback function:
#define SIG_TEST 44
void signalFunction(int n, siginfo_t *info, void *unused) {
printf("received value %d\n", info->si_int);
}
In main procedure:
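struct sigaction sig;
memset(&sig, 0, sizeof(sig)); // do not leave sa_mask or the other fields uninitialized
sig.sa_sigaction = signalFunction; // Callback function
sig.sa_flags = SA_SIGINFO;
sigaction(SIG_TEST, &sig, NULL); // install the handler before the driver learns our PID
int fd = open("/dev/YourModule", O_RDWR);
if (fd < 0) return -1;
ioctl_arg_t args;
args.pid = getpid();
ioctl(fd, IOCTL_SET_VARIABLES, &args); // send our PID as argument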
I hope it helps, despite the fact the answer is a bit long, but it is easy to understand.
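The user-space half can be tried without any kernel module. In the sketch below, sigqueue() from the same process stands in for the kernel's send_sig_info(), so it only demonstrates how an SA_SIGINFO handler receives the integer payload. The names on_signal and send_and_receive are made up for this illustration, and SIGRTMIN + 10 is used instead of a hard-coded 44 (on Linux with glibc the two are typically the same number, but that is an assumption about the platform).

```cpp
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t got_signal = 0;
static volatile sig_atomic_t received_value = 0;

// SA_SIGINFO handler: the payload travels in si_value.
static void on_signal(int, siginfo_t* info, void*)
{
    received_value = info->si_value.sival_int;
    got_signal = 1;
}

// Installs the handler, queues a real-time signal carrying payload to this
// process, and returns the value the handler saw (or -1 on error).
int send_and_receive(int payload)
{
    const int sig_test = SIGRTMIN + 10;  // assumed to be free for application use

    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig_test, &sa, nullptr) != 0)
        return -1;

    got_signal = 0;
    union sigval value;
    value.sival_int = payload;
    if (sigqueue(getpid(), sig_test, value) != 0)  // stand-in for the kernel sender
        return -1;

    while (!got_signal)   // normally already set when sigqueue returns
        usleep(1000);
    return received_value;
}
```

Only async-signal-safe work is done inside the handler here (two stores to sig_atomic_t variables); printing is left to the caller.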
You can use, e.g., kill_pid (declared in <linux/sched.h>) to send a signal to the specified process. To see how to form its parameters, look at the implementation of sys_kill (defined as SYSCALL_DEFINE2(kill) in kernel/signal.c).
Note that it is almost useless to send a signal from the kernel to the current process: the kernel code has to return before the user-space program ever sees the signal fire.
Your interface violates the spirit of Linux. Don't do that. A system call (in particular those related to your driver) should only fail with errno (see syscalls(2)...); consider eventfd(2) or netlink(7) for such asynchronous kernel <-> userland communications (and expect user code to be able to poll(2) them).
A kernel module could fail to be loaded. I'm not familiar with the details (never coded any kernel modules) but this hello2.c example suggests that the module init function can return a non zero error code on failure.
People really expect signals (a difficult and painful concept) to behave as documented in signal(7), and what you want to do does not fit in that picture. So a well-behaved kernel module should never asynchronously send any signal to processes.
If your kernel module does not behave nicely, your users will be pissed off and won't use it.
If you want to fork your experimental kernel (e.g. for research purposes), don't expect it to be used a lot; only then could you realistically break signal behavior like you intend to do, and you could code things which don't fit into the kernel module picture (e.g. add a new syscall). See also kernelnewbies.

OpenACC 2.0 routine: data locality

Take the following code, which illustrates the calling of a simple routine on the accelerator, compiled on the device using OpenACC 2.0's routine directive:
#include <iostream>
#pragma acc routine
int function(int *ARRAY,int multiplier){
int sum=0;
#pragma acc loop reduction(+:sum)
for(int i=0; i<10; ++i){
sum+=multiplier*ARRAY[i];
}
return sum;
}
int main(){
int *ARRAY = new int[10];
int multiplier = 5;
int out;
for(int i=0; i<10; i++){
ARRAY[i] = 1;
}
#pragma acc enter data create(out) copyin(ARRAY[0:10],multiplier)
#pragma acc parallel present(out,ARRAY[0:10],multiplier)
if (function(ARRAY,multiplier) == 50){
out = 1;
}else{
out = 0;
}
#pragma acc exit data copyout(out) delete(ARRAY[0:10],multiplier)
std::cout << out << std::endl;
}
How does function know to use the device copies of ARRAY[0:10] and multiplier when it is called from within a parallel region? How can we enforce the use of the device copies?
When your routine is called within a device region (the parallel in your code), it is being called by the threads on the device, which means those threads will only have access to arrays on the device. The compiler may actually choose to inline that function, or it may be a device-side function call. That means that you can know that when the function is called from the device it will be receiving device copies of the data because the function is essentially inheriting the present data clause from the parallel region. If you still want to convince yourself that you're running on the device once inside the function, you could call acc_on_device, but that only tells you that you're running on the accelerator, not that you received a device pointer.
If you want to enforce the use of device copies more than that, you could make the routine nohost so that it would technically not be valid to call from the host, but that doesn't really do what you're asking, which is to do a check on the GPU that the array really is a device array.
Keep in mind though that any code inside a parallel region that is not inside a loop will be run gang-redundantly, so the write to out is likely a race condition, unless you happen to be running with one gang or you write to it using an atomic.
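As a rough sketch of how the concerns above could be addressed (not verified on an accelerator), the variant below gives the routine an explicit vector level and launches the parallel region with a single gang, so the one write to out is not repeated across gangs. The names weighted_sum and check_on_device are invented for this illustration. A compiler without OpenACC support ignores the pragmas and runs the code serially on the host.

```cpp
// Device routine with an explicit level of parallelism (vector).
#pragma acc routine vector
int weighted_sum(const int* array, int multiplier)
{
    int sum = 0;
    #pragma acc loop vector reduction(+:sum)
    for (int i = 0; i < 10; ++i)
        sum += multiplier * array[i];
    return sum;
}

// Returns 1 if the routine computed the expected value, else 0.
int check_on_device()
{
    int array[10];
    for (int i = 0; i < 10; ++i)
        array[i] = 1;
    const int multiplier = 5;
    int out = 0;

    // Inside the region, array refers to the device copy created by copyin,
    // and that is the pointer the routine receives.
    // num_gangs(1): only one gang executes the write to out.
    #pragma acc parallel num_gangs(1) copyin(array[0:10]) copyout(out)
    {
        out = (weighted_sum(array, multiplier) == 50) ? 1 : 0;
    }
    return out;
}
```

Restricting the region to one gang is the simplest way to avoid the redundant writes; an atomic write would be the alternative when more gangs are needed.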
Basically, when you use a data clause, the data is created in or copied to device memory, and the block of code marked with "acc routine" is then executed on the device. Note that, unlike multi-threading with OpenMP, the host and the device do not share memory. So yes, "function" will be using the device copies of ARRAY and multiplier as long as the call is inside the data region. Hope this helps! :)
You should give the routine an explicit level of parallelism, such as gang, worker, or vector. That is the more precise way to declare it.
The routine will use the data in device memory.

Pthreads in Mac OS X - Mutexes issue

I'm trying to learn how to program parallel algorithms in C using POSIX threads. My environment is a Mac OS X 10.5.5 with gcc 4.
Compiling:
gcc -Wall -D_REENTRANT -lpthread source.c -o test.o
So, my problem is: if I compile this on an Ubuntu 9.04 box, it runs smoothly in thread order, but on the Mac it looks like the mutexes don't work and the threads don't wait to get the shared information.
Mac:
#1
#0
#2
#5
#3
#4
ubuntu
#0
#1
#2
#3
#4
#5
Any ideas?
The source code follows:
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <pthread.h>
#define NUM_THREADS 6
pthread_mutex_t mutexsum;
pthread_t threads[NUM_THREADS];
long Sum;
void *SumThreads(void *threadid){
int tmp;
int i,x[10],y[10];
// For each x and y in the vector, we store the value of i, just for didactic purposes
for (i=0; i<10 ; i++){
x[i] = i;
y[i] = i;
}
tmp = Sum;
for (i=0; i<10 ; i++){
tmp += (x[i] * y[i]);
}
pthread_mutex_lock (&mutexsum);
Sum += tmp;
printf("Im thread #%ld sum until now is: %ld\n",threadid,Sum);
pthread_mutex_unlock (&mutexsum);
return 0;
}
int main(int argc, char *argv[]){
int i;
Sum = 0;
pthread_mutex_init(&mutexsum, NULL);
for(i=0; i<NUM_THREADS; i++){
pthread_create(&threads[i], NULL, SumThreads, (void *)i);
}
pthread_exit(NULL);
}
There is nothing in your code that makes your threads run in ANY particular order. If they run in some order on Ubuntu, it might be because you are just lucky. Try running it 1000 times on Ubuntu and see if you get the same results over and over again.
The thing is that you can't control the way the scheduler makes your threads access the processor(s). So, when the for loop iterates and creates your threads, you can't assume that the first call to pthread_create will get to run first, or will get to lock the mutex first. It's up to the scheduler, which is at the OS level, and you can't control it, unless you write your own kernel :-).
If you want serial behavior, why would you run your code in separate threads in the first place? If it is just for experimentation, then one solution I can think of is using pthread_cond_signal to wake a specific thread up and make it run. Then the woken-up thread can wake up the second one, and so on.
Hope it helps.
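For experimentation, the wake-up chain described above can be sketched with the C++ standard library equivalents of pthread_mutex_t and pthread_cond_t. This is a variation rather than the exact scheme: instead of signaling one specific thread, every thread waits on a shared turn counter and notify_all lets the right one proceed. The function name run_in_id_order is made up for this example.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Starts num_threads threads and forces them through the critical
// section in id order. Returns the order in which they actually ran.
std::vector<int> run_in_id_order(int num_threads)
{
    std::mutex m;
    std::condition_variable cv;
    int turn = 0;            // id of the thread allowed to proceed next
    std::vector<int> order;  // filled in under the mutex

    std::vector<std::thread> threads;
    for (int id = 0; id < num_threads; ++id) {
        threads.emplace_back([&, id] {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return turn == id; });  // block until it is our turn
            order.push_back(id);
            ++turn;           // hand over to the next id
            cv.notify_all();  // wake the others so the next one can check its turn
        });
    }
    for (auto& t : threads)
        t.join();
    return order;
}
```

The mutex alone only guarantees that one thread at a time is inside the critical section; it is the turn predicate that imposes the order, at the cost of serializing the threads completely.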
The variable you have protected, Sum, is a global, so it is shared among the threads, and the mutex does serialize the updates to it; only tmp, i, x and y exist separately inside each thread. (Note that tmp = Sum reads Sum outside the lock, which is a data race.) But a mutex says nothing about which thread acquires it first, so it's really just a matter of when each thread gets scheduled that determines what will print.
I don't think one simple mutex will allow you to guarantee correctness, if correctness is defined as printing 0, 1, 2, 3 ...
What your code is doing is creating multiple execution contexts, using the code in your sum function as their execution code. The local variables in that function are unique to each call; only the global Sum is shared between the threads.
In the end, it is coincidence that you are getting one system to print out correctly, because you have no logical method of blocking threads until it is their proper turn.
I don't do pthreads in C or any other language (but I do thread programming on high-performance computers), so this 'answer' might be useless to you.
What in your code requires the threads to pass the mutex in thread id order? I see that the threads are created in id order, but what requires them to execute in that order?
If you do require your threads to execute in id order, why? It seems a bit as if you are creating threads, then serialising them. To what end?
When I program in threads and worry about execution order, I often try creating a very large number of threads and seeing what happens to the execution order.
As I say, ignore this if my understanding of C and pthreads is too poor.
