OpenCL bitonic sort with 64-bit keys

I used the NVIDIA SDK bitonic sort and it works great for me.
But it is for 32-bit keys (uint), and I really need ulong keys.
I typically have only 2^14 keys at a time, always a power of 2.
I searched all over but could not find any kernel designed for ulong.
I tried to modify the NVIDIA SDK bitonic sort to use ulong keys, but it does not work. The kernel does not crash, but after the call to clEnqueueNDRangeKernel I get the error CL_INVALID_COMMAND_QUEUE.
Can anybody tell me how to modify the bitonic sort, or another algorithm like radix sort, or anything else that can sort ulong keys?
I am running an NVIDIA Quadro 4000, OpenCL 1.1, CUDA 6.5.20, FULL_PROFILE.
I used the original NVIDIA SDK BitonicSort.cl.
I just used "ulong" for the key and value input and output instead of "uint".
Thanks for helping.

I finally made a workaround that suits my needs.
I use 30 bits of the key and the 18 high bits of the value for the sorting.
I only changed the comparators (2 lines):
#define MASK 0xFFFFC000 // to suit your need
from this: if( (*keyA > *keyB) == dir )
to this: if(((*keyA > *keyB)||((*keyA == *keyB)&&((*valA & MASK)> (*valB & MASK)))) == dir )
It works fine.
But it is still a workaround, and the question remains: what if we need 64-bit keys plus a 32-bit (or 64-bit) value?
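For what it's worth, here is a minimal serial C sketch of the bitonic sorting network itself, using 64-bit keys and a 32-bit payload. This is my own illustration rather than the NVIDIA kernel; the function name bitonic_sort_u64 is made up, and n is assumed to be a power of 2 as in the question. The comparator is the only place where the key type matters, which is why in principle switching the kernel's key type to ulong should be enough on the device side, provided host-side buffer sizes and local memory usage are doubled to match.

#include <stdint.h>
#include <stddef.h>

/* Serial bitonic sorting network for n = power of 2.
 * Sorts keys ascending and carries a 32-bit value along with each key. */
static void bitonic_sort_u64(uint64_t *key, uint32_t *val, size_t n)
{
    for (size_t k = 2; k <= n; k <<= 1) {         /* size of the bitonic sequences */
        for (size_t j = k >> 1; j > 0; j >>= 1) { /* comparison distance */
            for (size_t i = 0; i < n; i++) {
                size_t partner = i ^ j;
                if (partner > i) {
                    int ascending = ((i & k) == 0);   /* direction of this block */
                    if ((key[i] > key[partner]) == ascending) {
                        uint64_t tk = key[i]; key[i] = key[partner]; key[partner] = tk;
                        uint32_t tv = val[i]; val[i] = val[partner]; val[partner] = tv;
                    }
                }
            }
        }
    }
}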

Related

Detect a 32- or 64-bit machine (understanding `32 << (^uint(0) >> 63)`)

It is said that the 32 << (^uint(0) >> 63) expression can be used to detect whether the machine is 32 or 64 bits.
How so?
UPDATE:
The question was closed as a duplicate of
How can I determine the size of words in bits (32 or 64) on the architecture?
However, that answer has two problems:
on a 32-bit architecture, the result of the first step yields 0, and no matter how you shift it, the result of the second step will always be 0 (so the given answer is wrong);
moreover, as per The Go Programming Language, constants are evaluated at compile time, so const BitsPerWord will be a fixed value, just like runtime.GOARCH, which reports the architecture of the compiled program and cannot be used to detect the OS architecture it actually runs on.
UPDATE:
I found that the most reliable and portable way is to check this under Linux:
$ getconf LONG_BIT
64
It does not depend on any particular programming language's implementation, and it can be used in shell scripts too.
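As a point of comparison, here is a small C sketch of the same kind of check (entirely my own illustration, not from the linked answers). It carries the same caveat as runtime.GOARCH: it reports the word size of the build target, not of the OS it happens to run on, so a 32-bit binary on a 64-bit OS still reports 32.

#include <stdio.h>
#include <stdint.h>
#include <limits.h>

int main(void)
{
    /* Width of a pointer-sized unsigned integer:
     * 32 in a 32-bit build, 64 in a 64-bit build. */
    unsigned bits = (unsigned)(sizeof(uintptr_t) * CHAR_BIT);

    /* The shift trick from the question, translated to C. The cast to
     * uint64_t keeps the 63-bit shift well defined even in a 32-bit build:
     * all-ones >> 63 is 1 for a 64-bit word and 0 for a 32-bit word. */
    unsigned trick = 32u << ((uint64_t)~(uintptr_t)0 >> 63);

    printf("%u-bit build (%u via the shift trick)\n", bits, trick);
    return 0;
}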

Is there a reason why arbitrary precision arithmetic (such as BigInt in JavaScript) is implemented in binary?

From this question, it seems Google Chrome and Node.js both chose to implement arbitrary precision arithmetic in binary. Is there a good reason to do that?
If we can add, subtract, multiply, or divide in decimal, doing 7 + 8 = 15 and carrying to the next digit, isn't that faster than doing it bit by bit, where adding 7 + 8 means adding pairs of bits 4 times?
V8 developer here. Binary is a good choice because hardware is binary [*]. That doesn't mean that operations happen one bit at a time. In V8, a BigInt's "digits" are uintptr_t values, i.e. register-sized (32 bit on a 32-bit machine, 64 bit on a 64-bit machine) unsigned integers. See our blog post for an overview, and the source for all the gory details. FWIW, many other implementations (e.g. GMP, OpenJDK, Go, Dart) have made the same basic choice.
[*] Some hardware architectures have instructions for "binary coded decimal" arithmetic, which is similar to what you're describing, but this approach is (1) generally considered less efficient, and (2) not available on all architectures that we want V8 to run on.
One possible answer: addition is done on two 32- or 64-bit integers at a time, so it is faster than doing it one decimal digit at a time.
For multiplication, two 64-bit integers can be multiplied, probably in a single machine instruction, and all digits of the result obtained at once.
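To make the word-sized-digit idea concrete, here is a minimal C sketch of schoolbook addition (my own illustration, not V8's code; it uses 32-bit digits so a plain 64-bit intermediate can hold the carry, whereas V8 uses full register-width digits):

#include <stdint.h>
#include <stddef.h>

/* Adds two little-endian big integers a and b, each made of n 32-bit
 * "digits", into out, which needs room for n + 1 digits (the last one
 * holds the final carry). Purely illustrative. */
static void bignum_add(const uint32_t *a, const uint32_t *b,
                       uint32_t *out, size_t n)
{
    uint64_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t sum = (uint64_t)a[i] + b[i] + carry; /* never overflows 64 bits */
        out[i] = (uint32_t)sum;  /* low 32 bits are the result digit */
        carry  = sum >> 32;      /* high bits are the carry (0 or 1) */
    }
    out[n] = (uint32_t)carry;
}

Each iteration consumes 32 bits of both operands, so the work scales with the number of machine words rather than with the number of bits or decimal digits.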

Probability of a collision using 32 bit CRC of a unique 32 byte array

I am trying to figure out whether using a 32-bit CRC will produce collisions on 32-byte arrays.
Background
My system reads some configuration from an external flash whenever it boots up. I store the SHA-256 hash of the last known configuration, and whenever I read the configuration I calculate its SHA-256 hash and compare the two. If the two hashes are different, then the data is different.
I need to take that SHA-256 hash and turn it into a 32-bit hash for another part of the system (due to some legacy code restrictions).
Questions
Will there be a high number of collisions if I compute the 32-bit CRC of the 32-byte hash from SHA-256?
I calculate the probability of collision to be 0. Can you let me know if this is correct?
The number of samples K is always 2 in my problem (I think), because I am calculating the 32-bit CRC of two 32-byte arrays (the SHA-256 byte arrays).
see calculation here
That's correct, if by "0" you mean that very small number. That small number is the probability that you would get a 32-bit CRC from random data that accidentally matches what you were expecting. It is simply 2^-32.
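To put a number on "very small" (my arithmetic, using the usual birthday-bound approximation):

P(two different inputs share one 32-bit CRC) = 2^-32 ≈ 2.3 x 10^-10
P(any collision among K inputs) ≈ K(K-1)/2 x 2^-32, which for K = 2 is again 2^-32

So with only two hashes to compare at a time the risk is negligible; it only becomes significant when the number of distinct values approaches 2^16 (around 65,000), where the birthday bound reaches roughly 50%.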

Octave - out of memory or dimension too large for Octave's index type

I'm aware of the fact that there are three questions with a similar exception message. Unfortunately none of them is answered, and the comments could not solve my problem.
I use Octave 4.2.1, 64-bit version, on a Windows 10 system with 16 GB RAM in total and ~11 GB free during runtime.
When I try to multiply a 60000 x 10 matrix with a 10 x 60000 matrix, Octave comes up with the following exception:
error: out of memory or dimension too large for Octave's index type
This multiplication would result in a 60000 x 60000 matrix and therefore should not be a problem for a 64-bit index.
I can't even do zeros(60000,60000);
I don't get what I am doing wrong. Could someone point me in the right direction?
As is often the case, this error is misinterpreted (maybe we should address this as a bug on the Octave tracker already ;) )
>> 60000*60000
ans = 3.6000e+09
>> intmax
ans = 2147483647
>> 60000*60000 > intmax
ans = 1
I.e. the number of elements of the resulting 60000x60000 matrix is larger than the maximum value of the default (32-bit) index type, therefore there is no way to linearly index such a matrix using that integer index.
Also, in order to use actual 64-bit indexing, you need to compile Octave with it enabled, as this tends not to be the default. Unfortunately that's not as straightforward as you might wish, since you'll have to build against 64-bit-capable versions of the supporting libraries as well. More on that here.
Having said that, it may well be possible to use sparse matrices instead, if your matrices are indeed sparse in nature. If not, you're essentially working with 'big data', and you need to find workarounds such as block processing or mapping large arrays to files. It's worth reading up on common 'big data' techniques. Unfortunately Octave does not seem to support Matlab's memmapfile command just yet, but you can simulate it using fwrite / fread / fseek to read appropriate ranges from a file.
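As a rough sketch of the block-processing idea in plain C (Octave's fread / fseek mirror the C stdio calls; the file name matrix.bin, the sizes, and everything else here are made up for illustration):

#include <stdio.h>
#include <stdlib.h>

/* Processes a large row-major matrix of doubles stored in "matrix.bin"
 * one block of rows at a time, so it never has to fit in memory at once. */
int main(void)
{
    const size_t cols = 60000;          /* columns per row */
    const size_t rows_per_block = 100;  /* rows held in memory at a time */

    double *block = malloc(rows_per_block * cols * sizeof *block);
    FILE *f = fopen("matrix.bin", "rb");
    if (!block || !f) { free(block); if (f) fclose(f); return 1; }

    /* fseek(f, (long)(start_row * cols * sizeof(double)), SEEK_SET)
     * would jump straight to an arbitrary block of rows. */
    size_t rows_read;
    while ((rows_read = fread(block, cols * sizeof *block, rows_per_block, f)) > 0) {
        /* ... operate on rows_read rows of the matrix here ... */
    }

    fclose(f);
    free(block);
    return 0;
}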

Difference in integer size on a 64-bit system (confusion compared with my old 32-bit PC system)

A few months ago I got myself a laptop with an Intel i7-2630QM CPU and 64-bit Windows. While practising my programming skills on this system, I encountered some differences in integer size which make me think they are probably due to my new 64-bit system.
Let's take a look at a code.
The C Code :
#include <stdio.h>
int main(void)
{
    int num = 20;
    printf("%d %lld\n", num, num);
    return 0;
}
The Question :
1.) I remember that before getting this new laptop, i.e. when I was still using my old 32-bit system, running this code would print the integer 20 with some random number next to it, due to the %lld specifier.
2.) But this no longer happens on my new laptop; it instead prints both integers correctly, even if I change the variable num to type short.
3.) Is it that on a 64-bit system there is a new integer promotion which promotes int to long long when it is used as an argument? Or can a short integer be promoted to long long, which is also 64 bits, when passed as an argument?
4.) Besides that, I'm quite confused about one thing: on a 16-bit system int would be 16 bits, and it is 32 bits on a 32-bit system. So why doesn't it become 64 bits on a 64-bit system?
==================================================================================
Addendum:
1.) I chose "console program (64-bit)" as my project type in the IDE on my new laptop, but "console program" on my old 32-bit PC.
2.) I've checked the size of int under the "console program (64-bit)" project using the sizeof operator and it returns 32 bits, while short remains 16 bits. The only change is the long type: it is 64 bits, and long long remains its usual 64-bit size.
You are seeing this side-effect because the calling convention is different for x64 code. The function arguments in 32-bit x86 code are passed on the stack. The printf() function will read a word from the stack that isn't part of the activation frame. The odds that it contains a value of 0 are extremely low.
In x64 code, the first 4 arguments for a function are passed through cpu registers, not the stack. The odds that the high word of the 64-bit register is zero by chance are quite good. Left there by a previous 64-bit operation that worked with small numbers. But certainly not guaranteed.
Trying to reason out the behavior of code whose behavior is undefined is otherwise not useful, beyond guessing at how the language is implemented for the processor in your machine. There are better resources for that: learning the machine code your compiler generates is an excellent shortcut, together with a decent debugger that shows you how your C code got translated into machine code. Machine code has no undefined behavior.
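For completeness, here is a small sketch (mine, not from the question) of how to make the printf call well defined, by matching the format specifier to the argument type:

#include <stdio.h>

int main(void)
{
    int num = 20;

    /* Well defined: the argument passed for %lld really is a long long. */
    printf("%d %lld\n", num, (long long)num);

    /* Also well defined: a short argument is promoted to int by the
     * default argument promotions, and %d expects an int. */
    short s = 20;
    printf("%d\n", s);

    return 0;
}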
I do not have access to a Windows 64-bit compiler right now, but my guess is the following.
Your question is not about integer promotion, but about how parameters are passed from the caller to the called function. This is beyond the C specification, but it is interesting to know.
In 32-bit, all parameters are divided into 32-bit blocks as all registers can hold 32 bits. So in this case we have the following stack layout:
[ 32-bit format string pointer ][ num as 32-bit ][ num as 32-bit ] junk...
In 64-bit, all parameters are divided into 64-bit blocks as all registers can hold 64 bits. So the stack will contain the following:
[ 64-bit format string pointer ][ num as 64-bit ][ num as 64-bit ] junk...
The upper 32 bits of the 64-bit registers holding 32-bit values are conveniently set to zero.
So when printf is reading a 64-bit number, it will load the equivalent of two 32-bit registers on a 32-bit platform but only one 64-bit register, with high bits cleared, on a 64-bit platform.
(1 and 2) As already stated, the behaviour in this situation is undefined, so the compiler is allowed to behave differently for any reason or indeed no reason at all.
(3) The compiler is allowed to define int as 64-bit, in which case no promotion would be necessary because all the variables in question would be the same size. But it almost certainly doesn't.
(4) On most or all 64-bit compilers, int is 32-bits. This is because int has been 32 bits for so long that programmers have come to expect it and changing it would break existing code. As far as I know this isn't officially part of the standard, but it's one of those de-facto standards that are even harder to change. :-)
Everything you are describing is specific to whatever spec your compiler is using and the platform you are on (with the exception that long is guaranteed to be at least the same size as int):
Wikipedia entries:
long long
int
The C99 standard seeks to end this ambiguity by adding fixed-width types: int32_t, uint64_t, etc. There is also a POSIX spec that defines u_int32_t, etc.
Edit: I missed the question about printf(), sorry. As #nos points out in the comments on your question, passing something other than a long long to %lld results in undefined behavior. This means there is no rhyme or reason as to what it will do; unicorns spontaneously appearing would not be out of the question.
Oh - and on every compiler and OS I know of, int is 32 bits. Changing that has the potential to break things that depend on it being 32 bits.
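As a small illustration of the fixed-width types mentioned above (my own sketch): <stdint.h> provides exact-width integer types and <inttypes.h> provides matching printf macros, so the format string cannot silently disagree with the argument type.

#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <limits.h>

int main(void)
{
    int32_t a = 20;   /* exactly 32 bits on every platform */
    uint64_t b = 20;  /* exactly 64 bits on every platform */

    /* PRId32 / PRIu64 expand to the correct specifier for this platform. */
    printf("%" PRId32 " %" PRIu64 "\n", a, b);

    /* Sizes of the "plain" types still vary by platform and compiler. */
    printf("int: %zu bits, long: %zu bits, long long: %zu bits\n",
           sizeof(int) * CHAR_BIT, sizeof(long) * CHAR_BIT,
           sizeof(long long) * CHAR_BIT);

    return 0;
}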
