I would like to walk the page table, so I accessed current->mm, but it is NULL.
I'm working on Linux kernel 3.9 and I don't understand how current->mm can be NULL.
Is there something I am missing here?
It means you are in a kernel thread.
In Linux, kernel threads have no mm struct. A kernel thread borrows the mm from the previous user thread and records it in active_mm. So you should use active_mm instead.
More details:
In kernel/sched/core.c you can find the following code:
static inline void
context_switch(struct rq *rq, struct task_struct *prev,
struct task_struct *next)
{
...
if (!mm) {
next->active_mm = oldmm;
atomic_inc(&oldmm->mm_count);
enter_lazy_tlb(oldmm, next);
} else
switch_mm(oldmm, mm, next);
...
}
If the next thread has no mm (a kernel thread), the scheduler does not switch the mm and simply reuses the mm of the previous thread.
Why active_mm is assigned: the call to switch_mm(), which results in a TLB flush, is avoided by "borrowing" the mm_struct used by the previous task and placing it in task_struct->active_mm. This technique has made large improvements to context switch times.
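The borrowing logic can be modelled in user space. The following is only a toy sketch in C++, not kernel code: Mm, Task, context_switch and mm_for_walk are hypothetical names that mirror the roles of mm_struct, task_struct and the scheduler snippet above.

```cpp
// Toy user-space model of lazy mm borrowing; all names are hypothetical.
struct Mm {
    int users = 0;  // stands in for mm_count
};

struct Task {
    Mm* mm = nullptr;         // nullptr models a kernel thread
    Mm* active_mm = nullptr;  // address space actually loaded while running
};

// Returns true if the loaded address space changed (a real switch_mm case).
bool context_switch(Task& prev, Task& next) {
    Mm* oldmm = prev.active_mm;
    if (next.mm == nullptr) {
        // Kernel thread: borrow the previous address space and pin it.
        next.active_mm = oldmm;
        if (oldmm != nullptr) {
            ++oldmm->users;
        }
        return false;
    }
    next.active_mm = next.mm;
    return next.mm != oldmm;
}

// What code that wants to walk page tables would look at in this model.
Mm* mm_for_walk(const Task& t) {
    return t.mm != nullptr ? t.mm : t.active_mm;
}
```

Keep in mind that the borrowed mm belongs to whichever user task ran last, so from a kernel thread it may not be the address space you intended to inspect.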
In the code snippet from cppreference, the memory orders std::memory_order_release and std::memory_order_relaxed are used for the success and failure cases respectively. When is it OK to use std::memory_order_release for both or std::memory_order_relaxed for both?
template<class T>
struct node
{
T data;
node* next;
node(const T& data) : data(data), next(nullptr) {}
};
template<class T>
class stack
{
std::atomic<node<T>*> head;
public:
void push(const T& data)
{
node<T>* new_node = new node<T>(data);
// put the current value of head into new_node->next
new_node->next = head.load(std::memory_order_relaxed);
// now make new_node the new head, but if the head
// is no longer what's stored in new_node->next
// (some other thread must have inserted a node just now)
// then put that new head into new_node->next and try again
while(!std::atomic_compare_exchange_weak_explicit(
&head,
&new_node->next,
new_node,
std::memory_order_release,
std::memory_order_relaxed))
; // the body of the loop is empty
// note: the above loop is not thread-safe in at least
// GCC prior to 4.8.3 (bug 60272), clang prior to 2014-05-05 (bug 18899)
// MSVC prior to 2014-03-17 (bug 819819). See member function version for workaround
}
};
Using relaxed for both would not be safe. If the compare_exchange succeeds, then head is updated with the value of new_node, and other threads reading head will get that pointer. However, without release ordering, the value written to new_node->next (now head->next) may not be globally visible yet, so if the other thread tries to read head->next it may see garbage, or misbehave in other ways.
Formally, the write to new_node->next needs to happen before any other thread tries to read it, which can only be ensured by having release ordering on the store that signals other threads that the value is ready. (Likewise, the thread that reads head needs to use acquire ordering.) With relaxed ordering on the success store, the happens-before relationship is not there, so the code has a data race and its behavior is undefined.
Using release for both would not make sense, because release ordering only makes sense for stores, and in the failure case, no store is performed. In fact, for this reason, passing std::memory_order_release for the failure ordering is actually illegal; this is stated on the page where you got the sample code from. Using acquire or seq_cst would be safe (stronger ordering is always safe) but unnecessary, and might cause a needless performance hit.
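To make the pairing concrete, here is a minimal sketch using the member function compare_exchange_weak, with release on success, relaxed on failure, and an acquire load on the reading side. The Stack class and its snapshot() method are my own illustrative names, and nodes are never popped, which sidesteps reclamation and ABA issues that a real lock-free stack must handle.

```cpp
#include <atomic>
#include <vector>

template <class T>
class Stack {
    struct Node {
        T data;
        Node* next;
    };
    std::atomic<Node*> head{nullptr};

public:
    void push(const T& data) {
        Node* n = new Node{data, head.load(std::memory_order_relaxed)};
        // Success: release publishes n->data and n->next to acquiring readers.
        // Failure: nothing is stored, so relaxed is enough; the call only
        // reloads the current head into n->next before retrying.
        while (!head.compare_exchange_weak(n->next, n,
                                           std::memory_order_release,
                                           std::memory_order_relaxed)) {
        }
    }

    // Reader side: the acquire load pairs with the release CAS in push().
    std::vector<T> snapshot() const {
        std::vector<T> out;
        for (Node* p = head.load(std::memory_order_acquire); p != nullptr; p = p->next) {
            out.push_back(p->data);
        }
        return out;
    }

    // Assumes no other thread is using the stack any more.
    ~Stack() {
        Node* p = head.load(std::memory_order_relaxed);
        while (p != nullptr) {
            Node* next = p->next;
            delete p;
            p = next;
        }
    }
};
```

If the acquire in snapshot() were weakened to relaxed, the reader would no longer be guaranteed to see the node contents written before the CAS.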
I am writing a Linux PHY driver that handles packet timestamping. The bottom half calculates timestamps and sends this information to the kernel networking stack and then to user space. The bottom half needs some information from the skb (packet) which the caller of the tasklet has. I am having difficulty passing this skb to the tasklet. The tasklet handler function does not take any input other than an unsigned long. I am stuck here. Below is a code snippet for your understanding:
static void tx_ts_task(unsigned long val)
{
struct phyts *phyts = container_of(&val, struct phyts, int_flags);
//skb_copy(skb); ///want to access skb in this tasklet but I am unable to do this.
.
.
}
int tx_timestamp(struct phyts *phyts, struct sk_buff *skb, int len)
{
.
.
tasklet_schedule(&tx_ts_tasklet);
}
Appreciate your inputs. Thanks
The tasklet function receives the same data parameter that was specified in DECLARE_TASKLET/tasklet_init. Usually this is a pointer to some (large) driver struct.
So basically, you can't pass runtime data from the ISR to the tasklet directly; you should use some sort of shared variable (possibly a field in the above-mentioned struct) with proper locking.
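The shared-struct idea can be sketched as a user-space model in C++. This is not kernel code: Packet, the tx_queue field and the std::mutex are hypothetical stand-ins. In a driver, the queue could plausibly be a struct sk_buff_head filled with skb_queue_tail() and drained with skb_dequeue(), and the pointer to the driver struct would be the data value given to tasklet_init.

```cpp
#include <deque>
#include <mutex>

// User-space model only; every name here is hypothetical.
struct Packet {
    int id;
};

struct Phyts {
    std::mutex lock;               // kernel analogue: a spinlock
    std::deque<Packet*> tx_queue;  // kernel analogue: struct sk_buff_head
    int last_stamped = -1;         // id of the last packet processed
};

// Same shape as a tasklet handler: a single unsigned long fixed at init time.
// Assumes unsigned long is pointer-sized, as it is on Linux.
void tx_ts_task(unsigned long data) {
    Phyts* phyts = reinterpret_cast<Phyts*>(data);
    for (;;) {
        Packet* pkt = nullptr;
        {
            std::lock_guard<std::mutex> guard(phyts->lock);
            if (phyts->tx_queue.empty()) {
                break;  // queue drained, nothing left to timestamp
            }
            pkt = phyts->tx_queue.front();
            phyts->tx_queue.pop_front();
        }
        phyts->last_stamped = pkt->id;  // timestamp processing would go here
    }
}

// Caller side: hand the packet over through the shared struct.
void tx_timestamp(Phyts* phyts, Packet* pkt) {
    std::lock_guard<std::mutex> guard(phyts->lock);
    phyts->tx_queue.push_back(pkt);
    // A real driver would call tasklet_schedule() after queueing.
}
```

A queue rather than a single pointer is used because several packets may be submitted before the bottom half gets to run.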
I am working on refactoring some legacy code that suffers from deadlocks. There are two main root causes:
1) the same thread locking the same mutex multiple times, which should not be difficult to resolve, and
2) the code occasionally calls into user defined functions which can enter the same code at the top level. I need to lock the mutex before calling user defined functions, but I might end up executing the same code again which will result in a deadlock situation. So, I need some mechanism to tell me that the mutex has already been locked and I should not lock it again. Any suggestions?
Here is a (very) brief summary of what the code does:
class TreeNode {
public:
// Assign a new value to this tree node
void set(const boost::any& value, boost::function<void(const TreeNode&)> validator) {
boost::upgrade_lock<boost::shared_mutex> lock(mutexToRoot_);
// call validator here
boost::upgrade_to_unique_lock<boost::shared_mutex> ulock(lock);
// set this TreeNode to value
}
// Retrieve the value of this tree node
boost::any get() {
boost::shared_lock<boost::shared_mutex> lock(mutexToRoot_);
// get value for this tree node
}
private:
static boost::shared_mutex mutexToRoot_;
};
The problem is that the validator function can call into get(), which locks mutexToRoot_ on the same thread. I could modify mutexToRoot_ to be a recursive mutex but that would prevent other threads from reading the tree during get() operation, which is unwanted behavior.
Since C++11 you can use std::recursive_mutex, which allows the owning thread to call lock or try_lock without blocking/reporting failure, whereas the other threads will block on lock/receive false on try_lock until the owning thread calls unlock as many times as it called lock/try_lock before.
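A minimal sketch of that suggestion, assuming an int payload and a bool-returning validator for brevity (both are simplifications of the boost::any and boost::function types in the question):

```cpp
#include <functional>
#include <mutex>

class TreeNode {
public:
    // The validator may call get() on the same thread without deadlocking,
    // because std::recursive_mutex allows nested locking by its owner.
    void set(int value, const std::function<bool(const TreeNode&)>& validator) {
        std::lock_guard<std::recursive_mutex> lock(mutex_);
        if (validator(*this)) {
            value_ = value;
        }
    }

    int get() const {
        std::lock_guard<std::recursive_mutex> lock(mutex_);
        return value_;
    }

private:
    mutable std::recursive_mutex mutex_;
    int value_ = 0;
};
```

The trade-off is the one raised in the question: a recursive mutex is exclusive, so concurrent readers serialize instead of sharing the lock.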
I am migrating a project that ran on bare metal to Linux, and need to eliminate some {disable,enable}_scheduler calls. :)
So I need a lock-free sync solution in a single writer, multiple readers scenario, where the writer thread cannot be blocked. I came up with the following solution, which does not fit the usual acquire-release ordering:
class RWSync {
std::atomic<int> version; // incremented after every modification
std::atomic_bool invalid; // true during write
public:
RWSync() : version(0), invalid(0) {}
template<typename F> void sync(F lambda) {
int currentVersion;
do {
do { // wait until the object is valid
currentVersion = version.load(std::memory_order_acquire);
} while (invalid.load(std::memory_order_acquire));
lambda();
std::atomic_thread_fence(std::memory_order_seq_cst);
// check if something changed
} while (version.load(std::memory_order_acquire) != currentVersion
|| invalid.load(std::memory_order_acquire));
}
void beginWrite() {
invalid.store(true, std::memory_order_relaxed);
std::atomic_thread_fence(std::memory_order_seq_cst);
}
void endWrite() {
std::atomic_thread_fence(std::memory_order_seq_cst);
version.fetch_add(1, std::memory_order_release);
invalid.store(false, std::memory_order_release);
}
};
I hope the intent is clear: I wrap the modification of a (non-atomic) payload between beginWrite/endWrite, and read the payload only inside the lambda function passed to sync().
As you can see, here I have an atomic store in beginWrite() where no writes after the store operation can be reordered before the store. I did not find suitable examples, and I am not experienced in this field at all, so I'd like some confirmation that it is OK (verification through testing is not easy either).
Is this code race-free, and does it work as I expect?
If I use std::memory_order_seq_cst in every atomic operation, can I omit the fences? (Even if yes, I guess the performance would be worse)
Can I drop the fence in endWrite()?
Can I use memory_order_acq_rel in the fences? I don't really get the difference -- the single total order concept is not clear to me.
Is there any simplification / optimization opportunity?
+1. I happily accept any better idea as the name of this class :)
The code is basically correct.
Instead of having two atomic variables (version and invalid), you may use a single version variable with the semantics "odd values are invalid". This is known as a "sequence lock" (seqlock) mechanism.
Reducing the number of atomic variables simplifies things a lot:
class RWSync {
// Incremented before and after every modification.
// Odd values mean that the object is in an invalid state.
std::atomic<int> version;
public:
RWSync() : version(0) {}
template<typename F> void sync(F lambda) {
int currentVersion;
do {
currentVersion = version.load(std::memory_order_seq_cst);
// This may reduce calls to lambda(), nothing more
if(currentVersion | 1) continue;
lambda();
// Repeat while the object is in an invalid state or something changed.
} while ((currentVersion | 1) ||
version.load(std::memory_order_seq_cst) != currentVersion);
}
void beginWrite() {
// Writer may read version with relaxed memory order
int currentVersion = version.load(std::memory_order_relaxed);
// Invalidation requires sequential order
version.store(currentVersion + 1, std::memory_order_seq_cst);
}
void endWrite() {
// Writer may read version with relaxed memory order
int currentVersion = version.load(std::memory_order_relaxed);
// Release order is sufficient for marking the object as valid
version.store(currentVersion + 1, std::memory_order_release);
}
};
Note the difference in memory orders in beginWrite() and endWrite():
endWrite() makes sure that all previous modifications of the object have been completed. It is sufficient to use release memory order for that.
beginWrite() makes sure that a reader will detect the object being in an invalid state before any further modification of the object is started. Such a guarantee requires seq_cst memory order. Because of that, the reader uses seq_cst memory order too.
As for fences, it is better to incorporate them into the preceding or following atomic operation: the compiler knows how to make the result fast.
Explanations of some modifications of original code:
1) An atomic read-modify-write such as fetch_add() is intended for cases where concurrent modifications (like another fetch_add()) are possible. For correctness, such modifications use memory locking or other very time-costly architecture-specific mechanisms.
Atomic assignment (store()) does not use memory locking, so it is cheaper than fetch_add(). You may use such an assignment because concurrent modifications are not possible in your case (the reader does not modify version).
2) Unlike release-acquire semantics, which differentiate between load and store operations, sequential consistency (memory_order_seq_cst) is applicable to every atomic access and provides a total order between these accesses.
The accepted answer is not correct. I guess the code should be something like "currentVersion & 1" instead of "currentVersion | 1". A subtler mistake is that a reader thread can enter lambda() and, after that, the writer thread can run beginWrite() and write to the non-atomic payload. In this situation, the write to the payload and the read of the payload have no happens-before relationship. Concurrent access to a non-atomic variable without a happens-before relationship is a data race. Note that the single total order of memory_order_seq_cst does not imply a happens-before relationship; the two are consistent with each other, but they are different things.
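One way to address both points is to test the low bit with "& 1" and to make the payload itself atomic, accessed with relaxed operations, so that overlapping reads and writes are no longer a data race. The sketch below is neither the asker's nor the answerer's code: SeqLockPair and its two int fields are hypothetical, and it assumes exactly one writer thread.

```cpp
#include <atomic>
#include <utility>

class SeqLockPair {
    std::atomic<unsigned> seq{0};  // odd while a write is in progress
    std::atomic<int> a{0};         // payload fields are atomic, so racing
    std::atomic<int> b{0};         // reads are well-defined and get retried

public:
    // Single writer only.
    void write(int x, int y) {
        unsigned s = seq.load(std::memory_order_relaxed);
        seq.store(s + 1, std::memory_order_relaxed);  // mark invalid (odd)
        // Keeps the payload stores below from becoming visible before
        // the odd sequence value.
        std::atomic_thread_fence(std::memory_order_release);
        a.store(x, std::memory_order_relaxed);
        b.store(y, std::memory_order_relaxed);
        seq.store(s + 2, std::memory_order_release);  // mark valid (even)
    }

    std::pair<int, int> read() const {
        for (;;) {
            unsigned s0 = seq.load(std::memory_order_acquire);
            int x = a.load(std::memory_order_relaxed);
            int y = b.load(std::memory_order_relaxed);
            // Orders the payload loads before the second sequence load.
            std::atomic_thread_fence(std::memory_order_acquire);
            unsigned s1 = seq.load(std::memory_order_relaxed);
            if (s0 == s1 && (s0 & 1u) == 0) {
                return std::make_pair(x, y);  // consistent snapshot
            }
        }
    }
};
```

The reader never blocks the writer; it simply retries whenever the sequence number was odd or changed during the read.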
Is SDL_GetMouseState function thread safe?
Also, the SDL_GetMouseState example uses SDL_PumpEvents, which is known to be thread-unsafe. If SDL_GetMouseState is thread-safe, do I still have to call the thread-unsafe SDL_PumpEvents with it to make it work properly?
The code of this function is:
Uint32
SDL_GetMouseState(int *x, int *y)
{
SDL_Mouse *mouse = SDL_GetMouse();
if (x) {
*x = mouse->x;
}
if (y) {
*y = mouse->y;
}
return mouse->buttonstate;
}
And SDL_GetMouse just returns the address of a static global variable. Hence, there is nothing unsafe about it, but there is no atomicity.
However, events are processed separately. If you don't process events, the mouse structure won't be updated and SDL_GetMouseState will give you outdated values. The documentation explicitly states that you should call SDL_PumpEvents only in the graphics thread (the one that initialised the graphics system).
The worst-case scenario is that you read values from SDL_GetMouseState while another thread updates them. You could read the old value, the new value, or even a mix of the two (e.g. x from the new state but y from the old one).
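If other threads need a consistent (x, y, buttons) triple, one option is to have the event thread copy the state into a mutex-protected snapshot. The sketch below deliberately contains no SDL calls so that it stands alone; MouseSnapshot and SharedMouse are hypothetical names, and in a real program publish() would be fed from SDL_GetMouseState right after SDL_PumpEvents on the graphics thread.

```cpp
#include <mutex>

// Hypothetical snapshot type; not part of SDL.
struct MouseSnapshot {
    int x = 0;
    int y = 0;
    unsigned buttons = 0;
};

class SharedMouse {
    mutable std::mutex mutex_;
    MouseSnapshot state_;

public:
    // Called only from the event/graphics thread.
    void publish(int x, int y, unsigned buttons) {
        std::lock_guard<std::mutex> guard(mutex_);
        state_.x = x;
        state_.y = y;
        state_.buttons = buttons;
    }

    // Safe from any thread: all three fields come from the same update.
    MouseSnapshot read() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return state_;
    }
};
```

Readers may still see a slightly stale position, but never x from one update combined with y from another.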