NOTE: We have a lot of segfault questions, with largely the same
answers, so I'm trying to collapse them into a canonical question like
we have for undefined reference.
Although we have a question covering what a segmentation fault
is, it covers the what but doesn't list many reasons. The top answer says "there are many reasons" yet lists only one, and most of the other answers don't list any reasons at all.
All in all, I believe we need a well-organized community wiki on this topic, which lists all the common causes (and then some) of segfaults. The purpose is to aid in debugging, as mentioned in the answer's disclaimer.
I know what a segmentation fault is, but it can be hard to spot in the code without knowing what they often look like. Although there are, no doubt, far too many to list exhaustively, what are the most common causes of segmentation faults in C and C++?
WARNING!
The following are potential reasons for a segmentation fault. It is virtually impossible to list all reasons. The purpose of this list is to help diagnose an existing segfault.
The relationship between segmentation faults and undefined behavior cannot be stressed enough! All of the below situations that can create a segmentation fault are technically undefined behavior. That means that they can do anything, not just segfault -- as someone once said on USENET, "it is legal for the compiler to make demons fly out of your nose". Don't count on a segfault happening whenever you have undefined behavior. You should learn which undefined behaviors exist in C and/or C++, and avoid writing code that has them!
More information on Undefined Behavior:
What is the simplest standard conform way to produce a Segfault in C?
Undefined, unspecified and implementation-defined behavior
How undefined is undefined behavior?
What Is a Segfault?
In short, a segmentation fault is caused when the code attempts to access memory that it doesn't have permission to access. Every program is given a piece of memory (RAM) to work with, and for security reasons, it is only allowed to access memory in that chunk.
For a more thorough technical explanation about what a segmentation fault is, see What is a segmentation fault?.
Here are the most common reasons for a segmentation fault error. Again, these should be used in diagnosing an existing segfault. To learn how to avoid them, learn your language's undefined behaviors.
This list is also no replacement for doing your own debugging work. (See that section at the bottom of the answer.) These are things you can look for, but your debugging tools are the only reliable way to zero in on the problem.
Accessing a NULL or uninitialized pointer
If you have a pointer that is NULL (ptr = 0) or that is completely uninitialized (it isn't set to anything at all yet), attempting to access or modify memory through that pointer is undefined behavior.
int* ptr = 0;
*ptr += 5;
Since a failed malloc returns a null pointer (as does new (std::nothrow), whereas plain new throws std::bad_alloc instead), you should always check that such a pointer is not NULL before working with it.
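For instance, a minimal sketch of that check (the allocation size and error handling here are only for illustration):
#include <cstdio>
#include <cstdlib>

int main()
{
    int* data = (int*)std::malloc(100 * sizeof(int));
    if (data == NULL)
    {
        std::fprintf(stderr, "allocation failed\n");
        return 1;                // bail out instead of dereferencing a null pointer
    }
    data[0] = 42;                // safe to use now
    std::free(data);
}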
Note also that even reading values (without dereferencing) of uninitialized pointers (and variables in general) is undefined behavior.
Sometimes this access of an uninitialized pointer can be quite subtle, such as in trying to interpret such a pointer as a string in a C print statement.
char* ptr;                 // never initialized
char id[32];               // some destination buffer for the formatted output
sprintf(id, "%s", ptr);    // reads through the uninitialized pointer -- undefined behavior
See also:
How to detect if variable uninitialized/catch segfault in C
Concatenation of string and int results in seg fault C
Accessing a dangling pointer
If you use malloc or new to allocate memory, and then later free or delete that memory through the pointer, that pointer is now considered a dangling pointer. Dereferencing it (as well as simply reading its value, unless you assigned it some new value such as NULL) is undefined behavior, and can result in a segmentation fault.
Something* ptr = new Something(123, 456);
delete ptr;
std::cout << ptr->foo << std::endl;
See also:
What is a dangling pointer?
Why my dangling pointer doesn't cause a segmentation fault?
Stack overflow
[No, not the site you're on now, but what it was named for.] Oversimplified, the "stack" is like that spike you stick your order papers on in some diners. This problem can occur when you put too many orders on that spike, so to speak. In the computer, any variable that is not dynamically allocated, along with the bookkeeping for every function call that has not yet finished, goes on the stack.
One cause of this might be deep or infinite recursion, such as when a function calls itself with no way to stop. Because that stack has overflowed, the order papers start "falling off" and taking up other space not meant for them. Thus, we can get a segmentation fault.
int stupidFunction(int n)
{
    return stupidFunction(n);
}
Another cause of a stack overflow would be having too many (non-dynamically allocated) variables at once, or attempting to create a very large array: it's only a single order, so to speak, but one already large enough by itself to overflow the stack.
int stupidArray[600851475143];
One case of a stack overflow in the wild came from a simple omission of a return statement in a conditional intended to prevent infinite recursion in a function. The moral of that story: always ensure your error checks work!
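As a hedged illustration (this is not the actual code from that story, just a sketch of the shape of the bug):
int countdown(int n)
{
    if (n == 0)
        n;                       // oops: meant "return n;", so the base case falls through
    return countdown(n - 1);     // the recursion never stops and the stack overflows
}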
See also:
Segmentation Fault While Creating Large Arrays in C
Seg Fault when initializing array
Wild pointers
Creating a pointer to some random location in memory is like playing Russian roulette with your code: you could easily end up with a pointer to a location you don't have access rights to.
int n = 123;
int* ptr = (&n + 0xDEADBEEF); //This is just stupid, people.
As a general rule, don't create pointers to literal memory locations. Even if they work one time, the next time they might not. You can't predict where your program's memory will be at any given execution.
See also:
What is the meaning of "wild pointer" in C?
Attempting to read past the end of an array
An array is a contiguous region of memory, where each successive element is located at the next address in memory. However, most arrays don't have an innate sense of how large they are, or what the last element is. Thus, it is easy to blow past the end of the array and never know it, especially if you're using pointer arithmetic.
If you read past the end of the array, you may wind up going into memory that is uninitialized or belongs to something else. This is technically undefined behavior. A segfault is just one of those many potential undefined behaviors. [Frankly, if you get a segfault here, you're lucky. Others are harder to diagnose.]
// like most UB, this code is a total crapshoot.
int arr[3] {5, 151, 478};
int i = 0;
while(arr[i] != 16)
{
    std::cout << arr[i] << std::endl;
    i++;
}
Or the frequently seen one using for with <= instead of < (reads one element past the end):
char arr[10];
for (int i = 0; i<=10; i++)
{
    std::cout << arr[i] << std::endl;
}
Or even an unlucky typo which compiles fine (seen here) and, instead of allocating dim elements, allocates only one element initialized to dim:
int* my_array = new int(dim);
Additionally, it should be noted that you are not even allowed to create (let alone dereference) a pointer which points outside the array; you can create such a pointer only if it points to an element within the array, or to one past the end. Otherwise, you are triggering undefined behaviour.
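A small sketch of that rule:
int arr[3] = {1, 2, 3};
int* ok  = arr + 3;   // fine: points one past the end (but must not be dereferenced)
int* bad = arr + 4;   // undefined behaviour already, even without dereferencing it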
See also:
I have segfaults!
Forgetting a NUL terminator on a C string.
C strings are, themselves, arrays with some additional behaviors. They must be null terminated, meaning they have a '\0' at the end, to be reliably used as strings. This is done automatically in some cases, and not in others.
If this is forgotten, some functions that handle C strings never know when to stop, and you can get the same problems as with reading past the end of an array.
char str[3] = {'f', 'o', 'o'};
int i = 0;
while(str[i] != '\0')
{
    std::cout << str[i] << std::endl;
    i++;
}
With C strings, it really is hit-and-miss whether the missing \0 will make any difference. To avoid undefined behavior, you should assume it will: so better write char str[4] = {'f', 'o', 'o', '\0'};
Attempting to modify a string literal
If you assign a string literal to a char*, it cannot be modified. For example...
char* foo = "Hello, world!";
foo[7] = 'W';
...triggers undefined behavior, and a segmentation fault is one possible outcome.
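If you need to modify the text, one common fix (a sketch, not the only option) is to use an array instead, so the literal's characters are copied into writable memory:
char foo[] = "Hello, world!";   // the characters are copied into a modifiable array
foo[7] = 'W';                   // fine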
See also:
Why is this string reversal C code causing a segmentation fault?
Mismatching Allocation and Deallocation methods
You must use malloc and free together, new and delete together, and new[] and delete[] together. If you mix 'em up, you can get segfaults and other weird behavior.
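A minimal sketch of the correct pairings:
#include <cstdlib>

int main()
{
    int* a = (int*)std::malloc(sizeof(int));
    std::free(a);        // malloc pairs with free

    int* b = new int(5);
    delete b;            // new pairs with delete

    int* c = new int[5];
    delete[] c;          // new[] pairs with delete[]

    // Pairing them any other way (e.g. delete on a, or free on b) is a mismatch.
}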
See also:
Behaviour of malloc with delete in C++
Segmentation fault (core dumped) when I delete pointer
Errors in the toolchain.
A bug in the machine code backend of a compiler is quite capable of turning valid code into an executable that segfaults. A bug in the linker can definitely do this too.
Particularly scary in that this is not UB invoked by your own code.
That said, you should always assume the problem is you until proven otherwise.
Other Causes
The possible causes of Segmentation Faults are about as numerous as the number of undefined behaviors, and there are far too many for even the standard documentation to list.
A few less common causes to check:
UD2 generated on some platforms due to other UB
c++ STL map::operator[] done on an entry being deleted
DEBUGGING
Firstly, read through the code carefully. Most errors are caused simply by typos or mistakes. Check the code against the potential causes of segmentation faults listed above. If this fails, you may need to use dedicated debugging tools to find the underlying issue.
Debugging tools are instrumental in diagnosing the causes of a segfault. Compile your program with the debugging flag (-g), and then run it with your debugger to find where the segfault is likely occurring.
Recent compilers support building with -fsanitize=address, which typically results in a program that runs about 2x slower but can detect address errors more accurately. However, other errors (such as reading from uninitialized memory or leaking non-memory resources such as file descriptors) are not supported by this method, and it is impossible to use many debugging tools and ASan at the same time.
Some Memory Debuggers
GDB | Mac, Linux
valgrind (memcheck) | Linux
Dr. Memory | Windows
Additionally, it is recommended to use static analysis tools to detect undefined behaviour - but again, they are merely a tool to help you find undefined behaviour, and they don't guarantee to find all occurrences of it.
If you are really unlucky, however, using a debugger (or, more rarely, just recompiling with debug information) may influence the program's code and memory sufficiently that the segfault no longer occurs, a phenomenon known as a heisenbug.
In such cases, what you may want to do is to obtain a core dump, and get a backtrace using your debugger.
How to generate a core dump in Linux on a segmentation fault?
How do I analyse a program's core dump file with GDB when it has command-line parameters?
Related
I was doing a graded programming assignment: an implementation of the Rope data structure. The grader fed it an initial string and a series of edit operations. I did my development in C++ on a Linux machine. After testing my solution locally with small inputs (a string of ca. 10 characters) I submitted it to the grader, but got a Segmentation Fault on one of the test cases.
I generated random input data with the maximum size given in the assignment specs (a string of 300k characters) and got the Segmentation Fault locally as well. After a short debugging session I found out that the leaves of my tree had random left and right pointers instead of NULL. After replacing the new Vertex calls with new Vertex() (the latter value-initializes the members, zeroing the pointers, while the former leaves them uninitialized), the code worked fine and got accepted by the grader.
This, however, makes me wonder: why did my code work correctly with a small input, both locally and on the grader's machine? Is some amount of heap guaranteed to be zeroed when I run a process? Is this an artifact of some previously run program? What exactly is happening here?
Uninitialised objects can have any value. Uninitialised pointers can contain null, they can contain valid pointers by coincidence, or contain invalid pointers. It is completely undefined. Your program will behave accordingly. And it’s quite possible that memory is filled with some amount of zeroes followed by some amount of rubbish.
There may be a compiler option that will fill uninitialised variables with data that is likely to lead to a crash. More likely, there may be compiler options warning you when you use an uninitialised variable.
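A minimal sketch of the difference the asker ran into, using a hypothetical Vertex-like struct with raw pointer members and no user-written constructor:
struct Vertex
{
    Vertex* left;
    Vertex* right;
};

Vertex* a = new Vertex;     // default-initialized: left and right hold indeterminate garbage
Vertex* b = new Vertex();   // value-initialized: left and right are null pointers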
I wonder what assumptions compilers make about the relative locations of memory objects.
For example, if we allocate two stack variables of size 1 byte each, one right after the other, and initialize them both with zero, can a compiler optimize this case by emitting one single instruction that overwrites both bytes in memory with zeros, because it knows the relative position of both variables?
I am interested specifically in the more well known compilers like gcc, g++, clang, the Windows C/C++ compiler etc.
A compiler can optimize multiple assignments into one.
a = 0;
b = 0;
might become something like
*(short*)&a = 0;
The subtle part is "if we allocate two stack variables of size 1 byte each, right after another" since you cannot really do that. A compiler can shuffle stack positions around at will. Also, simply declaring variables will not necessarily mean any stack allocation. Variables might just be in registers. In C you would have to use alloca and even that does not provide "right after another".
Even more generally, the C standard does not allow you to compare (with <, <=, > or >=) the positions in memory of distinct objects; doing so is undefined behavior.
Everyone knows about overflow in programming languages: if it happens, the program may crash. However, it is not clear to me what actually happens to the data that goes out of bounds. Could you explain, say, with an example in C++ or Java? For example, an Integer can hold at most 4 bytes; what will happen if one puts more than 4 bytes of data into an Integer? How will the compiler identify this undefined behaviour?
what will happen if one puts more than 4 bytes of data into an Integer?
Typically the value will roll over¹, meaning it will jump from one end of its range to the other.
This can be seen even in the Windows calculator (in programmer mode): start with the highest possible signed 32-bit value, 2147483647, and add one to it; the display wraps around to -2147483648. We overflowed the maximum value of a signed Dword (2^31 - 1).
1 - This is a typical result. Some architectures might actually generate an exception on integer overflow, so you shouldn't count on this behavior.
How will the compiler identify this undefined behaviour?
The compiler won't identify it. That's the problem. C# can mitigate this with the checked keyword, which checks to make sure that any arithmetic done on an integer will not cause overflow/underflow.
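As a small illustration in C++ (a sketch; it assumes a typical 32-bit int): unsigned wraparound is well-defined, while signed overflow is undefined behaviour.
#include <climits>
#include <cstdio>

int main()
{
    unsigned int u = UINT_MAX;
    u = u + 1;                   // well-defined: wraps around to 0
    std::printf("%u\n", u);      // prints 0

    int s = INT_MAX;
    // s = s + 1;                // undefined behaviour: do not rely on it wrapping
    std::printf("%d\n", s);      // prints 2147483647 with a 32-bit int
}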
I'm just curious and can't find the answer anywhere. Usually, we use an integer for a counter in a loop, e.g. in C/C++:
for (int i=0; i<100; ++i)
But we can also use a short integer or even a char. My question is: does it change the performance? It's a few bytes less, so the memory savings are negligible. It just intrigues me whether I do any harm by using a char if I know that the counter won't exceed 100.
Probably using the "natural" integer size for the platform will provide the best performance. In C++ this is usually int. However, the difference is likely to be small and you are unlikely to find that this is the performance bottleneck.
Depends on the architecture. On the PowerPC, there's usually a massive performance penalty involved in using anything other than int (or whatever the native word size is) -- eg, don't use short or char. Float is right out, too.
You should time this on your particular architecture because it varies, but in my test cases there was ~20% slowdown from using short instead of int.
I can't provide a citation, but I've heard that you often do incur a little performance overhead by using a short or char.
The memory savings are nonexistent since it's a temporary stack variable. The memory it lives in will almost certainly already be allocated, and you probably won't save anything by using something shorter because the next variable will likely want to be aligned to a larger boundary anyway.
You can use whatever legal type you want in a for; it doesn't have to be integral or even built in. For example, you can use iterators as well:
for( std::vector<std::string>::iterator s = myStrings.begin(); myStrings.end() != s; ++s )
{
...
}
Whether or not it will have an impact on performance comes down to a question of how the operators you use are implemented. So in the above example that means end(), operator!=() and operator++().
This is not really an answer. I'm just exploring what Crashworks said about the PowerPC. As others have pointed out already, using a type that maps to the native word size should yield the shortest code and the best performance.
$ cat loop.c
extern void bar();
void foo()
{
    int i;
    for (i = 0; i < 42; ++i)
        bar();
}
$ powerpc-eabi-gcc -S -O3 -o - loop.c
.
.
.L5:
        bl bar
        addic. 31,31,-1
        bge+ 0,.L5
It is quite different with short i instead of int i, and it looks like it won't perform as well either.
.L5:
        bl bar
        addi 3,31,1
        extsh 31,3
        cmpwi 7,31,41
        ble+ 7,.L5
No, it really shouldn't impact performance.
It probably would have been quicker to type up a quick program (you did the most complex line already) and profile it than to ask this question here. :-)
FWIW, in languages that use bignums by default (Python, Lisp, etc.), I've never seen a profile where a loop counter was the bottleneck. Checking the type tag is not that expensive -- a couple instructions at most -- but probably bigger than the difference between a (fix)int and a short int.
Probably not as long as you don't do it with a float or a double. Since memory is cheap you would probably be best off just using an int.
An unsigned or size_t should, in theory, give you better results (easy, people, we are merely trying to optimise here, never mind those shouting "premature" nonsense; it's the new trend).
However, it does have its drawbacks, primarily the classic one: it's easy to screw up.
Google devs seem to avoid it too, but it is a pain to fight against std or boost.
If you compile your program with optimization (e.g., gcc -O), it doesn't matter. The compiler will allocate an integer register to the value and never store it in memory or on the stack. If your loop calls a routine, gcc will allocate one of the registers r14-r31, which any called routine will save and restore. So use int, because that causes the least surprise to whoever reads your code.
Is the following construct thread-safe, assuming that the elements of foo are aligned and sized properly so that there is no word tearing? If not, why not?
Note: The code below is a toy example of what I want to do, not my actual real world scenario. Obviously, there are better ways of coding the observable behavior in my example.
uint[] foo;
// Fill foo with data.
// In thread one:
for(uint i = 0; i < foo.length; i++) {
    if(foo[i] < SOME_NUMBER) {
        foo[i] = MAGIC_VAL;
    }
}
// In thread two:
for(uint i = 0; i < foo.length; i++) {
    if(foo[i] < SOME_OTHER_NUMBER) {
        foo[i] = MAGIC_VAL;
    }
}
This obviously looks unsafe at first glance, so I'll highlight why I think it could be safe:
The only two options are for an element of foo to be unchanged or to be set to MAGIC_VAL.
If thread two sees foo[i] in an intermediate state while it's being updated, only two things can happen: The intermediate state is < SOME_OTHER_NUMBER or it's not. If it is < SOME_OTHER_NUMBER, thread two will also try to set it to MAGIC_VAL. If not, thread two will do nothing.
Edit: Also, what if foo is a long or a double or something, so that updating it can't be done atomically? You may still assume that alignment, etc. is such that updating one element of foo will not affect any other element. Also, the whole point of multithreading in this case is performance, so any type of locking would defeat the purpose.
On a modern multicore processor your code is NOT threadsafe (at least in most languages) without a memory barrier. Simply put, without explicit barriers each thread can see an entirely different copy of foo from its caches.
Say that your two threads run at some point in time, and then at some later point a third thread reads foo. That third thread could see a foo that is completely uninitialized, the foo of either of the other two threads, or some mix of both, depending on what has happened with CPU memory caching.
My advice - don't try to be "smart" about concurrency, always try to be "safe". Smart will bite you every time. The broken double-checked locking article has some eye-opening insights into what can happen with memory access and instruction reordering in the absence of memory barriers (though specifically about Java and its (changing) memory model, it's insightful for any language).
You have to be really on top of your language's specified memory model to shortcut barriers. For example, Java allows a variable to be tagged volatile, which combined with a type which is documented as having atomic assignment, can allow unsynchronized assignment and fetch by forcing them through to main memory (so the thread is not observing/updating cached copies).
You can do this safely and locklessly with a compare-and-swap operation. What you've got looks thread safe but the compiler might create a writeback of the unchanged value under some circumstances, which will cause one thread to step on the other.
Also you're probably not getting as much performance as you think out of doing this, because having both threads writing to the same contiguous memory like this will cause a storm of MESI transitions inside the CPU's cache, each of which is quite slow. For more details on multithread memory coherence you can look at section 3.3.4 of Ulrich Drepper's "What Every Programmer Should Know About Memory".
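For what it's worth, here is a sketch of what a compare-and-swap version might look like in C++11 with std::atomic (my rendering under the question's assumptions, not code from the original answer):
#include <atomic>
#include <vector>

// Each element is updated with compare-and-swap, so neither thread can
// accidentally write back a stale value over the other thread's update.
void mark_small(std::vector<std::atomic<unsigned>>& foo,
                unsigned threshold, unsigned magic_val)
{
    for (auto& elem : foo)
    {
        unsigned current = elem.load();
        while (current < threshold &&
               !elem.compare_exchange_weak(current, magic_val))
        {
            // CAS failed: current now holds the freshly observed value,
            // and the loop condition re-checks it against the threshold.
        }
    }
}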
If reads and writes to each array element are atomic (i.e. they're aligned properly with no word tearing as you mentioned), then there shouldn't be any problems in this code. If foo[i] is less than either of SOME_NUMBER or SOME_OTHER_NUMBER, then at least one thread (possibly both) will set it to MAGIC_VAL at some point; otherwise, it will be untouched. With atomic reads and writes, there are no other possibilities.
However, since your situation is more complicated, be very, very careful -- make sure that foo[i] is truly only read once per loop iteration and stored in a local variable. If you read it more than once during the same iteration, you could get inconsistent results. Even the slightest change you make to your code could immediately make it unsafe with race conditions, so comment the code heavily with big red warning signs.
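A sketch of that "read the element once" pattern (rendered here in C++, assuming foo is a std::vector<unsigned> and SOME_NUMBER/MAGIC_VAL are constants as in the question):
for (std::size_t i = 0; i < foo.size(); ++i) {
    unsigned value = foo[i];      // read the element exactly once
    if (value < SOME_NUMBER) {
        foo[i] = MAGIC_VAL;       // single write, based on the value read above
    }
}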
It's bad practice; you should never be in a state where two threads are accessing the same variable at the same time, regardless of the consequences. The example you give is oversimplified; any more complex sample will almost always have problems associated with it.
Remember: Semaphores are your friend!
That particular example is thread-safe.
There are no intermediate states really involved here.
That particular program would not get confused.
I would suggest a Mutex on the array, though.