Bit-wise alternative - random

I'm trying to write a shader that needs pseudo-random number generation per pixel - fetching from a texture is just too expensive.
All of the generators I've found use ^, <<, & operators, but the shader model I'm working on doesn't support these. Is there a mathematical equivalent of these operators I can use instead?
For reference, I'm valuing speed over precision.
Thanks!

Of those, the only one I know a direct mathematical equivalent for is the << operator. Namely:
N << X = N * 2^X, i.e. N * (2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, etc.)
e.g.
N << 5 = N * 32
Simply create a lookup table for the value 2^X, and multiply by that value.
The others are going to be more complicated, and will probably require that you write an algorithm to solve them. I don't think they have any direct mathematical equivalents.
The source code for a C runtime implementation might be useful for that. Or simply search for algorithms to implement each, such as: Fast implementation/approximation of pow() function in C/C++
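For what it's worth, here is a minimal sketch (in Python, just to show the arithmetic; a shader would use floor() and a mod()/fmod() intrinsic in place of // and %) of how each operator can be emulated with plain arithmetic, assuming non-negative integers:

def shift_left(n, x):            # n << x
    return n * 2**x

def shift_right(n, x):           # n >> x
    return n // 2**x             # floor(n / 2^x)

def low_bits(n, k):              # n & (2^k - 1): keep the low k bits
    return n % 2**k

def bit_xor(a, b, bits=16):      # a ^ b, reconstructed one bit at a time
    result = 0
    for k in range(bits):
        abit = (a // 2**k) % 2                  # k-th bit of a
        bbit = (b // 2**k) % 2                  # k-th bit of b
        result += ((abit + bbit) % 2) * 2**k    # XOR is addition mod 2
    return result

assert bit_xor(0b1100, 0b1010) == 0b0110

The per-bit loop makes xor far more expensive than a native operator, which is why hash-style generators are painful on shader models without integer ops.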

Related

Optimize rank computation for high dimension tensor

My program spends a lot of time on the code below, even though it runs on a GPU. How can I optimise it, please? The tensors can be of this size: y_ul.shape = [8, 512, 128, 128]
ranks_topleft = []
for i, m in enumerate(y_ul):
    for j, l in enumerate(m):
        ranks_topleft.append(torch.matrix_rank(l))
mean_rank_topleft = torch.mean(torch.stack(ranks_topleft).float())
Unlike torch.matrix_rank, torch.linalg.matrix_rank accepts batched inputs (see the documentation). You can try:
ranks = torch.linalg.matrix_rank(y_ul)          # shape (8, 512)
mean_rank_topleft = torch.mean(ranks.float())   # scalar
Note that you can adjust the tolerance of the computation (the default should be good). Also, if your matrices are symmetric, you can pass hermitian=True to speed up the calculation.
Note that torch.matrix_rank is deprecated and will be removed in a future release!
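For completeness, a self-contained version of the above, with random data standing in for y_ul (shapes taken from the question):

import torch

y_ul = torch.randn(8, 512, 128, 128)

ranks = torch.linalg.matrix_rank(y_ul)          # shape (8, 512)
mean_rank_topleft = ranks.float().mean()        # scalar

# If the 128x128 matrices are symmetric, this variant is faster:
# ranks = torch.linalg.matrix_rank(y_ul, hermitian=True)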

compression algorithm for sorted integers

I have a large sequence of random integers sorted from lowest to highest. The numbers range from 1 bit up to about 45 bits. At the beginning of the list the numbers are very close to each other: 4, 20, 23, 40, 66. But as the numbers get higher, the distances between them grow too (in fact the distance between them is random). There are no duplicate numbers.
I'm using bit packing to save some space. Nonetheless, this file can get really big.
I would like to know what kind of compression algorithm can be used in this situation, or any other technique to save as much space as possible.
Thank you.
You can compress optimally if you know the true distribution of the data. If you can provide a probability distribution for each integer you can use arithmetic coding or other entropy coding techniques to compress to theoretical minimal size.
The trick is in predicting accurately.
First, you should probably compress the distances between the numbers because that allows you to make statistical statements. If you were to compress the numbers directly you'd have a hard time modelling them because they occur only once.
Next, you could try to build a very simple model to predict the next distance. Keep a histogram of all previously seen distances and calculate the probabilities from the frequencies.
You probably need to account for unseen values (you clearly can't assign them probability 0 because that is not expressible), but you can use heuristics for that, like encoding the next distance bit by bit and predicting each bit individually. You will pay almost nothing for the high-order bits because they are almost always 0, and entropy coding optimizes them away.
All of this is much simpler if you know the distribution. Example: if you are compressing a list of all prime numbers, you know the theoretical distribution of distances because there are formulae for that, so you already have a perfect model.
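As a rough illustration of the adaptive-histogram idea, here is a Python sketch that delta-encodes the values and estimates the coded size from the model's probabilities (a real implementation would feed these into an arithmetic coder; the Laplace smoothing used for unseen gaps is just one crude heuristic):

import math
from collections import Counter

def estimated_bits(sorted_values):
    freq = Counter()    # histogram of previously seen gaps
    total = 0
    bits = 0.0
    prev = 0
    for v in sorted_values:
        gap = v - prev
        prev = v
        # Laplace smoothing gives unseen gaps a small nonzero probability.
        p = (freq[gap] + 1) / (total + len(freq) + 1)
        bits -= math.log2(p)
        freq[gap] += 1
        total += 1
    return bits

print(estimated_bits([4, 20, 23, 40, 66]))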
There's a very simple and fairly effective compression technique which can be used for sorted integers in a known range. Like most compression schemes, it is optimized for serial access, although you can build an index to speed up random access if needed.
It's a type of delta encoding (i.e. each number is represented by the distance from the previous one), consisting of a vector of codes which are either
a single 1-bit, representing a delta of 2^k which is added to the delta in the following code, or
a 0-bit followed by a k-bit delta, indicating that the next number is the specified delta from the previous one.
For example, if k is 4, the sequence:
00011 1 1 00000 1 00001
codes three numbers. The first four-bit encoding (3) is the first delta, taken from an initial value of 0, so the first number is 3. The next two solitary 1's accumulate to a delta of 2·2^4, or 32, which is added to the following delta of 0000, for a total of 32. So the second number is 3+32=35. Finally, the last delta is a single 2^4 plus 1, total 17, and the third number is 35+17=52.
The 1-bit indicates that the next delta should be incremented by 2^k (or, more generally, each delta is incremented by 2^k times the number of immediately preceding 1-bits).
Another, possibly better, way of thinking of this is that each delta is coded as a variable-length bit sequence: 1^i 0 (0|1)^k, representing a delta of i·2^k + (the k-bit suffix). But the first presentation aligns better with the optimality proof.
Since each "1" code represents an increment of 2^k, there cannot be more than m/2^k of them, where m is the largest number in the set to be compressed. The remaining codes all correspond to numbers, and have a total length of n·(k + 1) where n is the size of the set. The optimal value of k is roughly log2(m/n), which in your case would be 7 or 8.
I did a quick proof of concept of the algorithm, without worrying about optimizations. It's still plenty fast; sorting the random sample takes a lot longer than compressing/decompressing it. I tried it with a few different seeds and vector sizes from 16,400,000 to 31,000,000 with a value range of [0, 4,000,000,000). The bits used per data value ranged from 8.59 (n=31000000) to 9.45 (n=16400000). All of the tests were done with 7-bit suffixes; log2(m/n) varies from 7.01 (n=31000000) to 7.93 (n=16400000). I also tried 6-bit and 8-bit suffixes; except in the case of n=31000000, where the 6-bit suffixes were slightly smaller, the 7-bit suffix was always the best. So I guess the optimal k is not exactly floor(log2(m/n)), but it's not far off.
Compression code:
void Compress(std::ostream& os,
              const std::vector<unsigned long>& v,
              unsigned long k = 0) {
  BitOut out(os);              // BitOut/BitIn: simple bit-stream helpers, not shown
  out.put(v.size(), 64);
  if (v.size()) {
    unsigned long twok;
    if (k == 0) {              // no suffix size given: pick k close to log2(m/n)
      unsigned long ratio = v.back() / v.size();
      for (twok = 1; twok <= ratio / 2; ++k, twok *= 2) { }
    } else {
      twok = 1UL << k;
    }
    out.put(k, 32);
    unsigned long prev = 0;
    for (unsigned long val : v) {
      // Each solitary 1-bit adds 2^k to the delta; then a 0-bit
      // introduces the k-bit remainder.
      while (val - prev >= twok) { out.put(1); prev += twok; }
      out.put(0);
      out.put(val - prev, k);
      prev = val;
    }
  }
  out.flush(1);
}
Decompression:
std::vector<unsigned long> Decompress(std::istream& is) {
  BitIn in(is);
  std::vector<unsigned long> v;   // declared here so it is in scope for the return
  unsigned long size = in.get(64);
  if (size) {
    unsigned long k = in.get(32);
    unsigned long twok = 1UL << k;
    v.reserve(size);
    unsigned long prev = 0;
    for (; size; --size) {
      while (in.get()) prev += twok;   // consume 1-bits: each adds 2^k
      prev += in.get(k);               // the k-bit suffix completes the delta
      v.push_back(prev);
    }
  }
  return v;
}
It can be a bit awkward to use variable-length encodings; an alternative is to store the first bit of each code (1 or 0) in a bit vector, and the k-bit suffixes in a separate vector. This would be particularly convenient if k is 8.
A variant, which results in slightly longer files but is a bit easier to build indexes for, is to only use the 1-bits as deltas. Then the deltas are always a·2^k for some a, possibly 0, where a is the number of consecutive 1-bits preceding the suffix code. The index then consists of the locations of every Nth 1-bit in the bit vector, and the corresponding index into the suffix vector (i.e. the index of the suffix corresponding with the next 0 in the bit vector).
One option that worked well for me in the past was to store a list of 64-bit integers as 8 different lists of 8-bit values. You store the high 8 bits of the numbers, then the next 8 bits, etc. For example, say you have the following 32-bit numbers:
0x12345678
0x12349785
0x13111111
0x13444444
The data stored would be (in hex):
12,12,13,13
34,34,11,44
56,97,11,44
78,85,11,44
I then ran that through the deflate compressor.
I don't recall what compression ratios I was able to achieve with this, but it was significantly better than compressing the numbers themselves.
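A Python sketch of that byte-plane transposition, using zlib as the deflate stage (the values and the 4-byte width are just the example above):

import zlib

def byte_planes(values, width_bytes=4):
    planes = [bytearray() for _ in range(width_bytes)]
    for v in values:
        for i in range(width_bytes):
            shift = 8 * (width_bytes - 1 - i)   # plane 0 gets the high byte
            planes[i].append((v >> shift) & 0xFF)
    return planes

values = [0x12345678, 0x12349785, 0x13111111, 0x13444444]
packed = b"".join(bytes(p) for p in byte_planes(values))
print(packed.hex())   # planes: 12121313, 34341144, 56971144, 78851144
print(len(zlib.compress(packed)))   # deflate the transposed bytes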
I want to add another answer with the simplest possible solution:
Convert the numbers to deltas, as discussed previously.
Run the result through the 7-zip LZMA2 algorithm. It is even multi-core ready.
I think this will give almost perfect results in your case, because the distances have a simple distribution; 7-zip will be able to pick it up.
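Something like this Python sketch (the lzma module's xz output uses the LZMA2 filter, standing in for 7-zip itself; the fixed 8-byte packing is deliberately naive):

import lzma
import struct

values = [4, 20, 23, 40, 66]
deltas = [values[0]] + [b - a for a, b in zip(values, values[1:])]
raw = struct.pack(f"<{len(deltas)}Q", *deltas)   # naive 8-byte little-endian packing
# Compressed size in bytes; tiny inputs are dominated by container overhead.
print(len(lzma.compress(raw)))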
You can simply use delta encoding plus Protocol Buffers.
Take your example: 4, 20, 23, 40, 66.
Delta encoded, it becomes: 4, 16, 3, 17, 26.
Then you store all the numbers directly as varints, as Protocol Buffers does. You only need 1 byte for a number between 0-127, 2 bytes for a number between 128-16383, and so on. This is enough for most scenarios.
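A minimal varint encoder in Python, following the protobuf convention (7 data bits per byte, with the high bit set while more bytes follow):

def encode_varint(n):
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)   # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

deltas = [4, 16, 3, 17, 26]
encoded = b"".join(encode_varint(d) for d in deltas)
print(len(encoded))   # 5 bytes: every delta here fits in 0-127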
Furthermore, you can use entropy coding (Huffman) to achieve a better compression rate than varints, even fewer than 8 bits per number.
Split each number into two parts. For example, 17 = 10001 (binary) = (5)0001: the first part (5) is the number of significant bits, and the suffix part (0001) is the number without its leading 1.
For the example above: 4, 16, 3, 17, 26 = (3)00 (5)0000 (2)1 (5)0001 (5)1010.
The first parts will all lie between 0 and 45, even when there are a lot of numbers, so they can be compressed effectively by entropy coding such as Huffman.
If your sequence is made up of pseudo-random numbers, such as might be generated by a typical digital computer, then I don't think that any compression scheme will beat, for brevity of representation, simply storing the code for the generator and whatever parameters you need to define its initial state.
If your sequence is made up of truly random numbers generated in some non-deterministic way then the other answers already posted offer a variety of good advice.

finding the first run of consecutive prime factors and their maximum in Mathematica

Let
2|n, 3|n, ..., p_i|n, p_j|n, ..., p_k|n,   with   p_i < p_j < ... < p_k,
where all primes up to p_i divide n, and j > i+1 (so the next prime after p_i, namely p_{i+1}, does not divide n).
I want to write code in Mathematica to find p_i and determine {2, 3, 5, ..., p_i}.
Thanks.
B = {};
n = 2^6 * 3^8 * 5^3 * 7^2 * 11 * 23 * 29;
k = 100;  (* any bound past the first gap works; the loop breaks early *)
For[i = 1, i <= k, i++,
 If[Mod[n, Prime[i]] == 0, AppendTo[B, Prime[i]];
  If[Mod[n, Prime[i + 1]] > 0, Break[]]]];
mep1 = Max[B];
B
mep1
result is
{2,3,5,7,11}
11
I would like to turn this into a function B[n], rather than the fixed list B, since I need to plot mep1[n] for given n.
If I understand your question and code correctly, you want a list of the prime factors of the integer n, but only the initial part of that list that matches the initial part of the list of all prime numbers.
I'll first observe that what you've posted looks much more like C or one of its relatives than like Mathematica. In fact, you don't seem to have used any of the power of Mathematica's built-in functions at all. If you want to really use Mathematica, you need to start familiarising yourself with these functions; if that doesn't appeal, stick to C and its ilk; it's a fairly useful programming language.
The first step I'd take is to get the prime factors of n like this:
listOfFactors = Transpose[FactorInteger[n]][[1]]
Look at the documentation for the details of what FactorInteger returns; here I'm using Transpose and Part to get only the list of prime factors and to drop their exponents. You may not notice the use of the Part function; the doubled square brackets are the usual notation. Note also that I don't have Mathematica on this machine, so my syntax may be a bit awry.
Next, you want only those elements of listOfFactors which match the corresponding elements in the list of all prime numbers. Do this in two steps. First, get the integers from 1 to k at which the two lists match:
matches = TakeWhile[Range[Length[listOfFactors]],(listOfFactors[[#]]==Prime[#])&]
and then
listOfFactors[[matches]]
I'll leave it to you to:
assemble these fragments into the function you want;
correct the syntactical errors I have made; and
figure out exactly what is going on in each (sub-)expression.
I make no warranty that this approach is the best approach in any general sense, but it makes much better use of Mathematica's intrinsic functionality than your own first try and will, I hope, point you towards better use of the system in future.
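If it helps, the same approach can be cross-checked in Python with sympy (factorint standing in for FactorInteger; initial_prime_run is just a name I made up):

from sympy import factorint, prime

def initial_prime_run(n):
    factors = sorted(factorint(n))       # distinct prime factors of n
    run = []
    for idx, p in enumerate(factors, start=1):
        if p != prime(idx):              # prime(1)=2, prime(2)=3, ...
            break
        run.append(p)
    return run

n = 2**6 * 3**8 * 5**3 * 7**2 * 11 * 23 * 29
run = initial_prime_run(n)
print(run, max(run))                     # [2, 3, 5, 7, 11] 11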

a very strange question in mathematica

I am doing this in mma v7.0:
r[x_] := Rationalize[x, 0];
N[Nest[Sqrt, 10., 53] // r, 500]
It gave me
1.000000000000000222044604925031308084726333618164062500000000000000000
However, if I go one step further
N[Nest[Sqrt, 10., 54] // r, 500]
I got all zeros. Does anybody know an explanation, or is it a bug?
Besides, it looks like this way of producing more digits from the result of Nest[Sqrt, 10., 53] is not working very well. How can I obtain more significant digits for this calculation?
Many thanks.
Edit
If I did Nest[Sqrt, 10., 50], I still got a lot of significant digits.
You have no significant digits other than zeros if you do this 54 times. Hence rationalizing as you do (which simply preserves the bit pattern) gives what you saw.
InputForm[n53 = Nest[Sqrt, 10., 53]]
Out[180]//InputForm=
1.0000000000000002
InputForm[n54 = Nest[Sqrt, 10., 54]]
Out[181]//InputForm=
1.
Rationalize[n53, 0]
4503599627370497/4503599627370496
Rationalize[n54, 0]
Out[183]= 1
For the curious: the issue is not loss of precision in the sense of degradation as the iterations proceed. Indeed, iterating these square roots actually increases precision. We can see this with bignum input.
InputForm[n54 = Nest[Sqrt, 10.`20, 54]]
Out[188]//InputForm=
1.0000000000000001278191493200323453724568038240908339267044`36.25561976585499
Here is the actual problem: when we use machine numbers, then after 54 iterations there are no significant digits other than zeros in the resulting machine double. That is to say, our size restriction on the numbers is the cause.
The reason is not too mysterious. Call the resulting value 1+eps. Then we have (1+eps)^(2^54) equal (to a close approximation) to 10. A second-order expansion then shows eps must be smaller than machine epsilon.
InputForm[epsval =
  First[Select[
    eps /. N[Solve[Sum[eps^j*Binomial[2^54, j], {j, 2}] == 9, eps]],
    Head[#] === Real && # > 0 &]]]
Out[237]//InputForm=
1.864563472253985*^-16
$MachineEpsilon
Out[235]= 2.22045*10^-16
Daniel Lichtblau
Wolfram Research
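(A quick cross-check of that epsilon estimate, in Python: solve 2^54·eps + C(2^54, 2)·eps^2 = 9 with the quadratic formula and compare against machine epsilon.)

import math
import sys

N = 2**54
a, b, c = math.comb(N, 2), N, -9.0
eps = (-b + math.sqrt(b*b - 4*a*c)) / (2*a)   # positive root
print(eps)                      # ~1.8645634722539853e-16
print(sys.float_info.epsilon)   # ~2.220446049250313e-16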
InputForm /@ NestList[Sqrt, 10., 54]
10.
3.1622776601683795
1.7782794100389228
1.333521432163324
1.1547819846894583
1.0746078283213176
1.036632928437698
1.018151721718182
1.0090350448414476
1.0045073642544626
1.002251148292913
1.00112494139988
1.0005623126022087
1.00028111678778
1.0001405485169472
1.0000702717894114
1.000035135277462
1.0000175674844227
1.0000087837036347
1.0000043918421733
1.0000021959186756
1.000001097958735
1.0000005489792168
1.0000002744895706
1.000000137244776
1.0000000686223856
1.000000034311192
1.0000000171555958
1.0000000085777978
1.0000000042888988
1.0000000021444493
1.0000000010722245
1.0000000005361123
1.0000000002680562
1.0000000001340281
1.000000000067014
1.000000000033507
1.0000000000167535
1.0000000000083769
1.0000000000041884
1.0000000000020943
1.0000000000010472
1.0000000000005236
1.0000000000002618
1.000000000000131
1.0000000000000655
1.0000000000000329
1.0000000000000164
1.0000000000000082
1.000000000000004
1.000000000000002
1.0000000000000009
1.0000000000000004
1.0000000000000002
1.
Throwing N[x, 500] on this is like trying to squeeze water from a rock.
The calculations above are done in machine precision, which is very fast. If you are willing to give up speed, you can use Mathematica's arbitrary-precision arithmetic by specifying a non-machine precision on the input values. The "backtick" can be used to do this (as in the example below), or you can use SetPrecision or SetAccuracy. Here I will specify that the input is the number 10 to 20 digits of precision.
NestList[Sqrt, 10`20, 54]
10.000000000000000000
3.1622776601683793320
1.77827941003892280123
.
.
.
1.00000000000000051127659728012947952
1.00000000000000025563829864006470708
1.000000000000000127819149320032345372
As you can see, you do not need to use InputForm, as Mathematica will automatically print arbitrary-precision numbers to as many places as it accurately can.
If you do use InputForm or FullForm you will see a backtick and then a number, which is the current precision of that number.

Lists Hash function

I'm trying to make a hash function so I can tell whether two lists of the same size contain the same elements.
For example, this is what I want:
f((1 2 3)) = f((1 3 2)) = f((2 1 3)) = f((2 3 1)) = f((3 1 2)) = f((3 2 1)).
Any idea how I can approach this problem? I've tried summing the squares of all the elements, but it turned out that there are collisions; for example f((2 2 5)) = 33 = f((1 4 4)), which is wrong because the lists are not the same.
I'm looking for a simple approach if there is any.
Sort the list, and then:
hash = 0
list.each do |current_element|
  hash = (37 * hash + current_element) % MAX_HASH_VALUE  # MAX_HASH_VALUE: some fixed large value
end
You're probably out of luck if you really want no collisions. There are N choose k sets of size k with elements in 1..N (and worse, if you allow repeats). So imagine you have N=256, k=8, then N choose k is ~4 x 10^14. You'd need a very large integer to distinctly hash all of these sets.
Possibly you have N, k such that you could still make this work. Good luck.
If you allow occasional collisions, you have lots of options: from simple things like your suggestion (adding the squares of the elements) or xoring the elements, to complicated things like sorting them, printing them to a string, and computing MD5 on that. But since collisions are still possible, you have to verify any hash match by comparing the original lists (if you keep them sorted, this is easy).
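A sketch of that last option in Python (the comma separator is arbitrary; any unambiguous serialization of the sorted list works):

import hashlib

def order_insensitive_digest(elements):
    data = ",".join(map(str, sorted(elements))).encode()
    return hashlib.md5(data).hexdigest()

# Same digest for every ordering, but matches still need a final
# element-by-element comparison to rule out collisions.
print(order_insensitive_digest([2, 1, 3]) == order_insensitive_digest([3, 2, 1]))  # True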
So you are looking for something that provides these properties:
1. If h(x1) == y1, then there is an inverse function h_inverse(y1) == x1.
2. Because the inverse function exists, there cannot be a value x2 such that x1 != x2 and h(x2) == y1.
Knuth's Multiplicative Method
In Knuth's "The Art of Computer Programming", section 6.4, a multiplicative hashing scheme is introduced as a way to write a hash function. The key is multiplied by the golden ratio of 2^32 (2654435761) to produce a hash result.
hash(i)=i*2654435761 mod 2^32
Since 2654435761 and 2^32 have no common factors, the multiplication produces a complete mapping of keys to hash results with no overlap. This method works pretty well if the keys have small values. Bad hash results are produced if the keys vary mostly in the upper bits. As is true of all multiplications, variations in the upper digits do not influence the lower digits of the result.
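A tiny Python demo of the scheme (the constant is the one quoted above; any odd multiplier keeps the mod-2^32 mapping one-to-one):

KNUTH_MULTIPLIER = 2654435761

def knuth_hash(i):
    return (i * KNUTH_MULTIPLIER) % 2**32

for key in (1, 2, 3, 2**20):
    print(key, hex(knuth_hash(key)))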
Robert Jenkins' 96 bit Mix Function
Robert Jenkins has developed a hash function based on a sequence of subtraction, exclusive-or, and bit shift.
All the sources in this article are written as Java methods, where the operator '>>>' represents the concept of unsigned right shift. If the source were to be translated to C, then the Java 'int' data type should be replaced with C 'uint32_t' data type, and the Java 'long' data type should be replaced with C 'uint64_t' data type.
The following source is the mixing part of the hash function.
int mix(int a, int b, int c)
{
    a = a - b;  a = a - c;  a = a ^ (c >>> 13);
    b = b - c;  b = b - a;  b = b ^ (a << 8);
    c = c - a;  c = c - b;  c = c ^ (b >>> 13);
    a = a - b;  a = a - c;  a = a ^ (c >>> 12);
    b = b - c;  b = b - a;  b = b ^ (a << 16);
    c = c - a;  c = c - b;  c = c ^ (b >>> 5);
    a = a - b;  a = a - c;  a = a ^ (c >>> 3);
    b = b - c;  b = b - a;  b = b ^ (a << 10);
    c = c - a;  c = c - b;  c = c ^ (b >>> 15);
    return c;
}
You can read details from here
If all the elements are numbers and they have a known maximum, this is not too complicated: you sort the elements and then concatenate them, one after the other, as digits in base (maximum+1).
Hard to describe in words...
For example, if your maximum is 9 (which makes it easy to understand), you'd have:
f(2 3 9 8) = f(3 8 9 2) = 2389
If your maximum was 99, you'd have:
f(16 2 76 8) = (0)2081676
In your example with 2, 2 and 5, if you know you will never get anything higher than 5, you could "compose" the result in base 6, so that would be:
f(2 2 5) = 2*6^2 + 2*6 + 5 = 89
f(1 4 4) = 1*6^2 + 4*6 + 4 = 64
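A short Python version of this positional scheme (positional_hash is a made-up name; for fixed-length lists with a known maximum it is collision-free):

def positional_hash(elements, max_value):
    base = max_value + 1
    h = 0
    for e in sorted(elements):   # order-insensitive: sort first
        h = h * base + e
    return h

print(positional_hash([2, 2, 5], 5))   # 89
print(positional_hash([1, 4, 4], 5))   # 64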
Combining hash values is hard. I've found this way (no explanation given, though perhaps someone will recognize it) within Boost:
template <class T>
void hash_combine(size_t& seed, T const& v)
{
    seed ^= hash_value(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}
It should be fast, since there are only shifts, additions, and xors taking place (apart from the actual hashing).
However, the requirement that the order of the list not influence the end result means that you first have to sort it, which is an O(N log N) operation, so it may not fit.
Also, since it's impossible to provide a collision-free hash function without more stringent bounds, you'll still have to actually compare the sorted lists whenever the hashes are equal...
I'm trying to make a hash function so I can tell whether two lists of the same size contain the same elements.
[...] but it turned out that there are collisions
These two sentences suggest you are using the wrong tool for the job. The point of a hash (unless it is a 'perfect hash', which doesn't seem appropriate to this problem) is not to guarantee equality, or to provide a unique output for every given input. In the usual case, it cannot, because there are more potential inputs than potential outputs.
Whatever hash function you choose, your hashing system is always going to have to deal with the possibility of collisions. And while different hashes imply inequality, it does not follow that equal hashes imply equality.
As regards your actual problem: a start might be to sort the list in ascending order, then use the sorted values as if they were the prime powers in the prime decomposition of an integer. Reconstruct this integer (modulo the maximum hash value) and there is a hash value.
For example:
2 1 3
sorted becomes
1 2 3
Treating this as prime powers gives
2^1 · 3^2 · 5^3
which constructs
2 · 9 · 125 = 2250
giving 2250 as your hash value, which will be the same hash value as for any other ordering of 1 2 3, and also different from the hash value for any other sequence of three numbers that do not overflow the maximum hash value when computed.
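The same idea in a short Python sketch (sympy's prime(i) returns the i-th prime; without a modulus the value grows quickly, so a real implementation would reduce modulo the maximum hash value):

from sympy import prime

def prime_power_hash(elements):
    h = 1
    for idx, e in enumerate(sorted(elements), start=1):
        h *= prime(idx) ** e      # prime(1)=2, prime(2)=3, prime(3)=5, ...
    return h

print(prime_power_hash([2, 1, 3]))   # 2^1 * 3^2 * 5^3 = 2250
print(prime_power_hash([1, 2, 3]))   # same value: order does not matter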
A naïve approach to solving your essential problem (comparing lists in an order-insensitive manner) is to convert every list being compared to a set (set in Python or HashSet in Java). This is more effective than devising a hash function, since a perfect hash seems essential to your problem; with almost any other approach, collisions are inevitable for some inputs.