I am debugging a kernel crash dump. There seems to be a problem where one process was trying to memory-map a new region but was not able to acquire the memory map semaphore.
When I looked into the process's mm_struct and printed its contents, I saw that its struct rw_semaphore mmap_sem looked as shown below. Does the value of count seem suspicious? It is negative, as if there were a race condition where two different threads decremented it after checking for zero.
mmap_sem = {
  count = -4294967295,
  wait_lock = {
    {
      rlock = {
        raw_lock = {
          slock = 262148
        }
      }
    }
  },
  wait_list = {
    next = 0xffff8801f0113e48,
    prev = 0xffff8801f0113e48
  }
},
Sorry for the confusion. I thought crash pulled the correct data types and used them properly when printing out all the values...
It looks like the crash utility did not read the count member as an int...
When I print it as an int, I get the correct value:
crash> p (int) (((struct mm_struct *) 0xffff8801f15fa540)->mmap_sem).count
$13 = 1
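To see why the two readings differ, note that -4294967295 as a 64-bit two's-complement value is 0xFFFFFFFF00000001, whose low 32 bits are 1. A minimal standalone C sketch of that truncation (my illustration, not crash itself):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* The 64-bit value that crash printed for count. */
    int64_t count = -4294967295LL;  /* 0xFFFFFFFF00000001 */
    printf("as 64-bit: %lld (0x%016llx)\n",
           (long long)count, (unsigned long long)count);
    /* Casting to a 32-bit int keeps only the low word, which is 1. */
    printf("as 32-bit int: %d\n", (int)count);
    return 0;
}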
So, I'm trying to do an exhaustive search of a hash. The hash itself is not important here. As I want to use all the processing power of my CPU, I'm using Rayon to get a thread pool and lots of tasks. The search algorithm is the following:
let (tx, rx) = mpsc::channel();
let original_hash = String::from(original_hash);
rayon::spawn(move || {
    let mut i = 0;
    let mut iter = SenhaIterator::from(initial_pwd);
    while i < max_iteracoes {
        let pwd = iter.next().unwrap();
        let clone_tx = tx.clone();
        rayon::spawn(move || {
            let hash = calcula_hash(&pwd);
            clone_tx.send((pwd, hash)).unwrap();
        });
        i += 1;
    }
});

let mut last_pwd = None;
let bar = ProgressBar::new(max_iteracoes as u64);
while let Ok((pwd, hash)) = rx.recv() {
    last_pwd = Some(pwd);
    if hash == original_hash {
        bar.finish();
        return last_pwd.map_or(ResultadoSenha::SenhaNaoEncontrada(None), |s| {
            ResultadoSenha::SenhaEncontrada(s)
        });
    }
    bar.inc(1);
}
bar.finish();
ResultadoSenha::SenhaNaoEncontrada(last_pwd)
Just a high-level explanation: as the tasks complete their work, they send a (password, hash) pair to the main thread, which compares the hash with the original hash (the one I'm trying to find a password for). If they match, great: I return to main with an enum value that indicates success, along with the password that produces the original hash. After all iterations end, I return to main with an enum value that indicates no hash was found, but with the last password tried, so I can resume from that point in some future run.
I'm trying to use Indicatif to show a progress bar, so I can get a glimpse of the progress.
But my problem is that the program grows its memory usage without any clear reason why. If I run it with, say, 1 billion iterations, it slowly adds memory until it fills all available system memory.
But when I comment out the line bar.inc(1);, the program behaves as expected, with normal memory usage.
I've created a test program with Rayon and Indicatif, but without the hash calculation, and it works correctly with no memory misbehavior.
That makes me think that I'm doing something wrong with memory management in my code, but I can't see anything obvious.
I found a solution, but I'm still not sure why it solves the original problem.
What I did to solve it was move the progress-bar code into the first spawn closure. Note the ProgressBar::new and bar.inc(1) calls, which now live inside it:
let (tx, rx) = mpsc::channel();
let original_hash = String::from(original_hash);
rayon::spawn(move || {
    let bar = ProgressBar::new(max_iteracoes as u64);
    let mut i = 0;
    let mut iter = SenhaIterator::from(initial_pwd);
    while i < max_iteracoes {
        let pwd = iter.next().unwrap();
        let clone_tx = tx.clone();
        rayon::spawn(move || {
            let hash = calcula_hash(&pwd);
            clone_tx.send((pwd, hash)).unwrap();
        });
        i += 1;
        bar.inc(1);
    }
    bar.finish();
});

let mut latest_pwd = None;
while let Ok((pwd, hash)) = rx.recv() {
    latest_pwd = Some(pwd);
    if hash == original_hash {
        return latest_pwd.map_or(PasswordOutcome::PasswordNotFound(None), |s| {
            PasswordOutcome::PasswordFound(s)
        });
    }
}
PasswordOutcome::PasswordNotFound(latest_pwd)
That first spawn closure has the role of fetching the next password to try and passing it to a worker task, which calculates the corresponding hash and sends the (password, hash) pair to the main thread. The main thread waits for the pairs on the rx channel and compares them with the expected hash.
What I still don't understand is why tracking the progress on the outer thread leaks memory. I couldn't identify what was really leaking. But it's working now and I'm happy with the result.
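One plausible explanation (an assumption on my part, not verified against this exact code): std's mpsc::channel is unbounded, so if drawing the progress bar slows the receiving loop down even slightly, the worker tasks produce (password, hash) pairs faster than they are consumed and the queue grows without limit. A bounded channel applies backpressure instead. A minimal standalone sketch of that difference, using plain threads rather than Rayon (a blocking send inside a fixed-size Rayon pool could stall its workers):

use std::sync::mpsc;
use std::thread;

fn main() {
    // sync_channel caps the queue at 1024 pending pairs; send() blocks
    // when the queue is full instead of allocating without bound.
    let (tx, rx) = mpsc::sync_channel::<(u64, u64)>(1024);

    thread::spawn(move || {
        for i in 0..1_000_000u64 {
            // A stand-in for (pwd, hash); blocks when the consumer lags.
            tx.send((i, i.wrapping_mul(31))).unwrap();
        }
        // tx is dropped here, which ends the receiving loop below.
    });

    let mut received = 0u64;
    while let Ok((_pwd, _hash)) = rx.recv() {
        received += 1; // imagine the hash comparison and bar.inc(1) here
    }
    println!("received {} pairs", received);
}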
I have declared a map below using the STL and inserted some values into it.
#include <bits/stdc++.h>
using namespace std;

int main()
{
    map<int, int> m;
    m[1] = 1;
    m[2] = 1;
    m[3] = 1;
    m[4] = 1;
    m[5] = 1;
    m[6] = 1;
    for (auto it = m.begin(); it != m.end();)
    {
        cout << it->first << " " << it->second << endl;
        it = it++;
    }
    return 0;
}
When I executed the code above, it ended up in an infinite loop. Can someone tell me why it does so?
I am incrementing the iterator it and storing the result back in it, which should be incremented again the next time the loop runs, so the loop should eventually terminate normally. Am I wrong?
The bad line is it = it++;. With built-in types it is undefined behavior! In your case the iterator is incremented first, but the assignment then stores the value it had before the increment (the copy returned by the post-increment) back into it, so it stays at the first position. The correct line would be it = ++it;, or simply ++it; / it++; on its own, since the increment already modifies it.
Edit
That is only undefined with the built-in types; here the behavior is defined by the implementation of the map iterator in the STL, since the overloaded operators are ordinary function calls.
If you try doing something similar with an int, you'll get a warning:
int nums[] = { 1, 2, 3, 4, 5 };
for (int i = 0; i < sizeof nums / sizeof *nums; ) {
    cout << nums[i] << '\n';
    i = i++;
}
warning: operation on 'i' may be undefined [-Wsequence-point]
However, when you're using a class (std::map::iterator) with operator overloading, the compiler probably isn't smart enough to detect this.
In other words, what you're doing with the int is a sequence point violation, so the behavior is undefined.
The post-increment operation would behave like this:
iterator operator++(int) {
    auto copy = *this;
    ++*this;
    return copy;
}
So what happens in your increment step is that the iterator it gets overwritten by the copy of its original value. As long as the map isn't empty, your loop remains stuck on the first element.
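For completeness, a minimal corrected version of the questioner's loop (the increment moves into the for statement, so there is no self-assignment):

#include <iostream>
#include <map>
using namespace std;

int main()
{
    map<int, int> m;
    for (int k = 1; k <= 6; ++k)
        m[k] = 1;
    // Pre-increment advances the iterator in place; the loop terminates normally.
    for (auto it = m.begin(); it != m.end(); ++it)
        cout << it->first << " " << it->second << endl;
    return 0;
}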
Can anyone help shed some light on why the code below consumes well over 100 MB of RAM at runtime?
public struct Trie<Element : Hashable> {
    private var children: [Element:Trie<Element>]
    private var endHere : Bool

    public init() {
        children = [:]
        endHere = false
    }
    public init<S : SequenceType where S.Generator.Element == Element>(_ seq: S) {
        self.init(gen: seq.generate())
    }
    private init<G : GeneratorType where G.Element == Element>(var gen: G) {
        if let head = gen.next() {
            (children, endHere) = ([head:Trie(gen:gen)], false)
        } else {
            (children, endHere) = ([:], true)
        }
    }
    private mutating func insert<G : GeneratorType where G.Element == Element>(var gen: G) {
        if let head = gen.next() {
            let _ = children[head]?.insert(gen) ?? { children[head] = Trie(gen: gen) }()
        } else {
            endHere = true
        }
    }
    public mutating func insert<S : SequenceType where S.Generator.Element == Element>(seq: S) {
        insert(seq.generate())
    }
}
var trie = Trie<UInt32>()
for i in 0..<300000 {
    trie.insert([UInt32(i), UInt32(i+1), UInt32(i+2)])
}
Based on my calculations, total memory consumption for the above data structure should be somewhere around:
3 * count * sizeof(Trie<UInt32>)
or:
3 * 300,000 * 9 bytes = 8,100,000 bytes = ~8 MB
How is it that this data structure consumes well over 100 MB during runtime?
sizeof reports only the static footprint on the stack: a Dictionary is essentially a wrapper holding a reference to its internal reference-type implementation, plus copy-on-write support. In other words, the key-value pairs and the hash table of your dictionary are allocated on the heap, which sizeof does not cover. The same applies to all the other Swift collection types.
In your case, you are creating three Trie values - and indirectly three dictionaries - on every one of the 300,000 iterations. I wouldn't be surprised if the 96-byte allocations mentioned by @Macmade are the minimum overhead of a dictionary (e.g. its hash bucket).
There may also be costs related to growing storage, so you could try whether setting a minimumCapacity on the dictionary helps. On the other hand, if you do not need a divergent path generated per iteration, you may consider an indirect enum as an alternative, e.g.
public enum Trie<Element> {
    indirect case Next(Element, Trie<Element>)
    case End
}
which should use less memory.
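As an illustration only (my sketch, reusing the enum above in the Swift 2-era syntax of the question, and assuming the non-branching case the caveat above describes), each element then costs one boxed enum payload instead of a whole dictionary:

// Build one chain of the indirect enum, back to front, so the first
// value ends up at the head. Each Next link is a single heap box.
func makeChain(values: [UInt32]) -> Trie<UInt32> {
    var node = Trie<UInt32>.End
    for v in values.reverse() {
        node = .Next(v, node)
    }
    return node
}

let chain = makeChain([1, 2, 3])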
The size of your struct is 9 bytes, not 5.
You can check it with sizeof:
let size = sizeof( Trie< UInt32 > )
Also, you iterate 300,000 times but insert 3 values each time (of course - it's a trie), so that's 900,000 insertions.
Anyway, that does not by itself explain the memory consumption you are observing.
I'm not really fluent in Swift, and I don't understand your code. Maybe there's also some error in it, making it allocate more memory than needed.
But in order to understand what's happening, you need to run your code in Instruments (Cmd-I).
On my machine, I can see 900,000 allocations of 96 bytes each by swift_slowAlloc.
That's more like it...
Why 96 bytes, assuming there's no error in your code?
Well, it might be because of the way memory is allocated for your elements. When satisfying a request, the memory allocator may allocate more than requested - because it needs some internal metadata, because of paging, because of alignment, ...
Even so, it seems rather excessive, so use Instruments and double-check what your code is doing.
I have a pretty complicated OpenCL app. It fires up 5 different contexts on 5 different GPUs and executes the same kernel on all of them, splitting the work into 1024 "chunks" to be processed.
Each time a kernel finishes, a result is checked for and the GPU is given a new chunk. This is done in a loop using callbacks and clGetEventInfo calls to ensure something is finished before moving on to the next step.
Sometimes, usually as the app is starting (very rarely mid-run), it will immediately segfault on the clGetEventInfo call.
GDB output:
(gdb) back
#0 0x00007fdc686ab525 in clGetEventInfo () from /usr/lib/libOpenCL.so.1
#1 0x00000000004018c1 in ready (event=0x26a00000267) at gputest.c:165
#2 0x0000000000404b5a in main (argc=9, argv=0x7fffdfe3b268) at gputest.c:544
The ready function:
int ready(cl_event event) {
    int rdy;
    if (!event)
        return 0;
    clGetEventInfo(event, CL_EVENT_COMMAND_EXECUTION_STATUS, sizeof(cl_int), &rdy, NULL);
    if (rdy == CL_COMPLETE)
        return 1;
    return 0;
}
How the kernel is run, the event set, and checked. Some pseudocode inserted for brevity:
while (test if loop is complete) {
    for (j = 0; j < GPUS; j++) {
        if (gpu[j].waiting && loops < 9999) {
            gpu[j].waiting = 0;
            offset[j] = loops * 1024 * 1024;
            loops++;
            EC("kernel init", clEnqueueNDRangeKernel(queues[j], kernel_init[j], 1, &(offset[j]), &global_work_size, &work128, 0, NULL, &events[j]));
            gpu[j].readsearch = events[j];
            gpu[j].reading = 1;
        }
    }
    for (j = 0; j < GPUS; j++) {
        if (gpu[j].reading && ready(gpu[j].readsearch)) {
            gpu[j].reading = 0;
            gpu[j].waiting = 1;
            // unrelated reporting and other code here
        }
    }
}
It's pretty simple. There is more to the code, but it's unrelated. The ready/checking function is very simple. I even added debugging to the ready function to printf the event to see what was happening when it crashed - nothing really, no pattern that I could see.
What could be causing this?
Ugh. Found the problem. Since you cannot initialize values when you declare a struct, I was using some values uninitialized. I malloc'ed the gpu structs and then just started using them, so the if(gpu[x].reading && ...) check was reading random, completely uninitialized data. Sometimes it was non-zero, which allowed the ready() function to fire. Since gpu[x].readsearch had never been set in the first place, clGetEventInfo bombed trying to use whatever was at that memory location.
This would be time number 482,847 that accidentally using uninitialized variables has burned me.
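A minimal sketch of the fix (the struct fields are borrowed from the pseudocode above; the real struct surely has more members): allocate with calloc so the flags start at 0 and the event handle starts at NULL, which the if (!event) guard in ready() already handles safely.

#include <stdlib.h>
#include <CL/cl.h>

struct gpu_state {
    int waiting;
    int reading;
    cl_event readsearch;
    /* ... other per-GPU fields ... */
};

int main(void)
{
    enum { GPUS = 5 };
    /* calloc zeroes the whole block: waiting/reading are 0 and
       readsearch is NULL until an event is actually enqueued. */
    struct gpu_state *gpu = calloc(GPUS, sizeof *gpu);
    if (!gpu)
        return 1;
    /* ... the scheduling loop from the question runs here ... */
    free(gpu);
    return 0;
}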
The following code, which maps simple value holders to booleans, runs over 20x faster in Java than in Swift 2 (Xcode 7 beta 3, with "Fastest, Aggressive Optimizations [-Ofast]" and "Fast, Whole Module Optimizations" turned on). I can get over 280M lookups/sec in Java but only about 10M in Swift.
When I look at it in Instruments, I see that most of the time is going into a pair of retain/release calls associated with the map lookup. Any suggestions on why this is happening, or a workaround, would be appreciated.
The structure of the code is a simplified version of my real code, which has a more complex key class and also stores other types (though Boolean is an actual case for me). Also, note that I am using a single mutable key instance for the retrieval to avoid allocating objects inside the loop; according to my tests this is faster in Swift than an immutable key.
EDIT: I have also tried switching to NSMutableDictionary, but when used with Swift objects as keys it seems to be terribly slow.
EDIT 2: I have tried implementing the test in Objective-C (which wouldn't have the Optional unwrapping overhead) and it is faster, but still over an order of magnitude slower than Java... I'm going to post that example as another question to see if anyone has ideas.
EDIT 3 - Answer: I have posted my conclusions and my workaround in an answer below.
public final class MyKey : Hashable {
    var xi : Int = 0
    init( _ xi : Int ) { set( xi ) }
    final func set( xi : Int) { self.xi = xi }
    public final var hashValue: Int { return xi }
}

public func == (lhs: MyKey, rhs: MyKey) -> Bool {
    if ( lhs === rhs ) { return true }
    return lhs.xi == rhs.xi
}
...
var map = Dictionary<MyKey,Bool>()
let range = 2500
for x in 0...range { map[ MyKey(x) ] = true }

let runs = 10
for _ in 0...runs
{
    let time = Time()
    let reps = 10000
    let key = MyKey(0)
    for _ in 0...reps {
        for x in 0...range {
            key.set(x)
            if ( map[ key ] == nil ) { XCTAssertTrue(false) }
        }
    }
    print("rate=\(time.rate( reps*range )) lookups/s")
}
and here is the corresponding Java code:
public class MyKey {
    public int xi;
    public MyKey( int xi ) { set( xi ); }
    public void set( int xi ) { this.xi = xi; }
    @Override public int hashCode() { return xi; }
    @Override
    public boolean equals( Object o ) {
        if ( o == this ) { return true; }
        MyKey mk = (MyKey)o;
        return mk.xi == this.xi;
    }
}
...
Map<MyKey,Boolean> map = new HashMap<>();
int range = 2500;
for (int x = 0; x < range; x++) { map.put( new MyKey(x), true ); }

int runs = 10;
for (int run = 0; run < runs; run++)
{
    Time time = new Time();
    int reps = 10000;
    MyKey buffer = new MyKey( 0 );
    for (int it = 0; it < reps; it++) {
        for (int x = 0; x < range; x++) {
            buffer.set( x );
            if ( map.get( buffer ) == null ) { Assert.assertTrue( false ); }
        }
    }
    float rate = reps*range/time.s();
    System.out.println( "rate = " + rate );
}
After much experimentation I have come to some conclusions and found a workaround (albeit a somewhat extreme one).
First, let me say that I recognize this kind of very fine-grained data structure access within a tight loop is not representative of general performance, but it does affect my application, and I imagine it affects others like games and heavily numeric applications. Let me also say that I know Swift is a moving target and I'm sure it will improve; perhaps my workaround (hacks) below will not be necessary by the time you read this. But if you are trying to do something like this today, and you are looking at Instruments and seeing the majority of your application time spent in retain/release, and you don't want to rewrite your entire app in Objective-C, read on.
What I have found is that almost anything you do in Swift that touches an object reference incurs an ARC retain/release penalty. Additionally, Optional values - even optional primitives - also incur this cost. This pretty much rules out using Dictionary or NSDictionary.
Here are some things that are fast and that you can include in a workaround:
a) Arrays of primitive types.
b) Arrays of final objects, as long as the array is on the stack and not on the heap; e.g. declare an array within the method body (but outside your loop, of course) and iteratively copy the values to it. Do not Array(array) copy it.
Putting this together, you can construct a data structure based on arrays that stores e.g. Ints, and then store array indexes to your objects in that data structure. Within your loop you look objects up by their index in the fast local array. Before you ask "couldn't the data structure store the array for me" - no, because that would incur two of the penalties I mentioned above :(
All things considered this workaround is not too bad - if you can enumerate the entities that you want to store in the Dictionary / data structure, you should be able to host them in an array as described. Using the technique above I was able to exceed the Java performance by a factor of 2x in Swift in my case.
If anyone is still reading and interested at this point, I will consider updating my example code and posting it.
EDIT: I'd add an option: c) It is also possible to use UnsafeMutablePointer<> or Unmanaged<> in Swift to create a reference that will not be retained when passed around. I was not aware of this when I started, and I would hesitate to recommend it in general because it's a hack, but I've used it in a few cases to wrap a heavily used array that was incurring a retain/release every time it was referenced.
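To make the array-based idea in (a) and (b) concrete, here is a minimal sketch (my illustration in the Swift 2-era syntax of the question, not the author's actual benchmark): the keys are enumerated as plain Ints, the hot data lives in a primitive Bool array, and the inner loop indexes it directly, with no hashing, no Optional boxing, and no retain/release traffic.

// Replace Dictionary<MyKey,Bool> with a flat Bool array indexed by
// the key's integer value (possible because the keys are enumerable).
let range = 2500
var present = [Bool](count: range + 1, repeatedValue: false)
for x in 0...range { present[x] = true }

// The hot loop: a plain array subscript on a primitive element type.
var hits = 0
for _ in 0...10000 {
    for x in 0...range {
        if present[x] { hits += 1 }
    }
}
print("hits = \(hits)")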