Boost Version Numbers

I don't understand why the version numbers of the Boost library are incremented only by 1/100 (e.g. 1.33, 1.34, and so on) even when major additions, such as huge libraries, are made. Is there any strong motivation behind this?

The Boost FAQ says:
What do the Boost version numbers mean? The scheme is x.y.z, where x is incremented only for massive changes, such as a reorganization of many libraries, y is incremented whenever a new library is added, and z is incremented for maintenance releases. y and z are reset to 0 if the value to the left changes.
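The same x.y.z scheme is exposed programmatically through the BOOST_VERSION macro in <boost/version.hpp>, which encodes the version as x * 100000 + y * 100 + z. A minimal C++ sketch that decodes it:

#include <boost/version.hpp>
#include <iostream>

int main() {
    std::cout << "Boost "
              << BOOST_VERSION / 100000     << "."   // x: massive changes
              << BOOST_VERSION / 100 % 1000 << "."   // y: a new library was added
              << BOOST_VERSION % 100                 // z: maintenance release
              << std::endl;                          // e.g. 1.34.1 is encoded as 103401
}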

Related

Gradle dependency resolution with ranges

How does Gradle resolve dependencies when version ranges are involved? Unfortunately, I couldn't find a sufficient explanation in the official docs.
Suppose we have a project A that has two declared dependencies B and C. Those dependencies specify a range of versions, i.e. for B it is [2.0, 3.0) and for C it is [1.0, 1.5].
B itself does not have a dependency on C until version 2.8, which introduces a strict dependency on C in the range [1.1, 1.2].
Looking at this example, we might determine that the resolved versions are:
B: 2.8 (because it is the highest version in the range)
C: 1.2 (because given B at 2.8, this is the highest version that satisfies both required ranges)
In general, it is not clear to me how this whole algorithm is carried out exactly. In particular, when ranges are involved, every possible choice of a concrete version inside a range might introduce different transitive dependencies (as in my example, where B introduces a dependency on C only at version 2.8), which can themselves declare dependencies with ranges, and so on, making the number of possibilities explode quickly.
Does it apply some sort of greedy strategy, in that it tries to settle on a version as early as possible, and if a new dependency encountered later conflicts with the already chosen one, it tries to backtrack and choose another version?
Any help in understanding this is very much appreciated.
EDIT: I've read here that the problem in general is NP-hard. So does Gradle actually simplify the process somehow, to make it solvable in polynomial time? And if so, how?
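Purely to make the arithmetic of the example above concrete (this is an illustrative C++ sketch, not Gradle's resolution code, and versions are modeled as plain doubles), the step of picking C's version amounts to taking the highest published version inside the intersection of every required range:

#include <algorithm>
#include <iostream>
#include <optional>
#include <vector>

struct Range { double lo, hi; };   // inclusive bounds, for simplicity

// Highest candidate version that lies inside every required range, if any.
std::optional<double> highestSatisfying(const std::vector<double>& candidates,
                                        const std::vector<Range>& required) {
    std::optional<double> best;
    for (double v : candidates) {
        bool ok = std::all_of(required.begin(), required.end(),
                              [v](const Range& r) { return r.lo <= v && v <= r.hi; });
        if (ok && (!best || v > *best)) best = v;
    }
    return best;
}

int main() {
    // C's published versions, the range A asks for, and the range B at 2.8 adds.
    std::vector<double> versionsOfC{1.0, 1.1, 1.2, 1.3, 1.4, 1.5};
    auto c = highestSatisfying(versionsOfC, {{1.0, 1.5}, {1.1, 1.2}});
    if (c) std::cout << "C resolves to " << *c << "\n";   // prints "C resolves to 1.2"
}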

How to use a vector as an index in Eigen

In Matlab, we can use a vector B to get a sub-list of a vector A by writing A(B). I am wondering if there is any way to do that in Eigen?
With the current development branch (since mid-2018), and starting with the 3.4 release (which will likely be released at the start of 2019), you can simply write A(B) if B is either an Eigen object or a std::vector with integer coefficients.
See the new features of the 3.4 release page for details.
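A minimal sketch, assuming Eigen 3.4 (or the development branch mentioned above); note that indices are 0-based, unlike Matlab:

#include <Eigen/Dense>
#include <iostream>
#include <vector>

int main() {
    Eigen::VectorXd A(6);
    A << 10, 20, 30, 40, 50, 60;

    Eigen::VectorXi B(3);
    B << 4, 0, 2;                    // indices into A

    Eigen::VectorXd sub1 = A(B);     // picks A(4), A(0), A(2)
    std::cout << sub1.transpose() << "\n";   // prints: 50 10 30

    std::vector<int> idx{1, 3, 5};   // a std::vector of integers works as well
    Eigen::VectorXd sub2 = A(idx);
    std::cout << sub2.transpose() << "\n";   // prints: 20 40 60
}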

Is this a bug in GSL's minimum finding routine?

I have run into a very strange problem using GSL recently. I'm trying to find the maximum of an ugly function by asking GSL to find the minimum of the function's negation. This has been working fine with most of the functions I've been using it on up to now, but with certain functions, I get an error.
Specifically, the Ruby GSL gem throws an exception when I feed my function to the minimum finder, stating that the given endpoints do not enclose a minimum. However, the given endpoints DO enclose a minimum; furthermore, the results seem to depend on the initial estimate given.
When I ask GSL to find the minimum starting with a guess of 0.0, I get the error. When I ask it to find the minimum starting with a guess of 2.0, it finds a minimum. It seems strange that GSL would complain that there isn't a minimum in a given interval just depending on the initial guess.
Below is a script I've written to reproduce this error. I'm using GSL version 1.15 and the latest version of the Ruby/GSL gem wrapper, with Ruby version 1.9.3p392.
Will someone please try this script and see if they can reproduce these results? I would like to report this as a bug to the GSL maintainers, but I'd feel better doing that if I could get the error to occur on somebody else's computer too.
There are three values for the initial minimum guess provided in the script; starting with 0.0 causes GSL to throw the aforementioned error. Starting with 1.0 causes GSL to report a minimum, which happens to only be a local minimum. Starting with 2.0 causes GSL to report another minimum, which seems to be the global minimum I'm looking for.
Here is my test script:
require("gsl")
include GSL::Min
function = Function.alloc { |beta| -(((6160558822864*(Math.exp(4*beta))+523830424923*(Math.exp(3*beta))+1415357447750*(Math.exp(5*beta))+7106224104*(Math.exp(6*beta)))/(385034926429*(Math.exp(4*beta))+58203380547*(Math.exp(3*beta))+56614297910*(Math.exp(5*beta))+197395114*(Math.exp(6*beta))))-((1540139705716*(Math.exp(4*beta))+174610141641*(Math.exp(3*beta))+283071489550*(Math.exp(5*beta))+1184370684*(Math.exp(6*beta)))*(1540139705716*(Math.exp(4*beta))+174610141641*(Math.exp(3*beta))+283071489550*(Math.exp(5*beta))+1184370684*(Math.exp(6*beta)))/(385034926429*(Math.exp(4*beta))+58203380547*(Math.exp(3*beta))+56614297910*(Math.exp(5*beta))+197395114*(Math.exp(6*beta)))**2)) }
def find_maximum(fn1)
  iter = 0; max_iter = 500
  minimum = 0.0 # reasonable initial guess; causes GSL to crash!
  #minimum = 1.0 # another initial guess, gets a local min
  #minimum = 2.0 # this guess gets what appears to be the global min
  a = -6.0
  b = 6.0
  # pretty wide interval
  gmf = FMinimizer.alloc(FMinimizer::BRENT)
  # THIS line is failing (sometimes), complaining that the interval given doesn't contain a minimum. Which it DOES.
  gmf.set(fn1, minimum, a, b)
  begin
    iter += 1
    status = gmf.iterate
    status = gmf.test_interval(0.001, 0.0)
    # puts("Converged:") if status == GSL::SUCCESS
    a = gmf.x_lower
    b = gmf.x_upper
    minimum = gmf.x_minimum
    # printf("%5d [%.7f, %.7f] %.7f %.7f\n",
    #        iter, a, b, minimum, b - a);
  end while status == GSL::CONTINUE and iter < max_iter
  minimum
end
puts find_maximum(function)
Please let me know what happens when you try this code, commenting out the different initial values for minimum. If you can see a reason why this is actually the intended behavior of GSL, I would appreciate that as well.
Thank you for your help!
I think this Mathematica notebook perfectly summarizes my answer.
Whenever you need to numerically minimize a particular function, you should understand it qualitatively (make plots) to avoid failures due to numerical errors. The plot shows a very important fact: your function seems to require numerically difficult cancellations between very large numbers at beta ~ 6. Numerical issues can also arise at beta ~ -6.
So when you set the interval to [-6, 6], numerical errors can prevent GSL from finding the minimum. A simple plot and a few evaluations of the potentially dangerous terms give a better understanding of the appropriate limits you must set in the GSL routines.
Conclusion: if the minimum is around zero (always make plots of the functions you want to minimize!), then you must set a small interval around zero to avoid numerical problems when your function depends on precise cancellations between exponentials.
Edit: In a comment you asked me not to consider the factor Abs[beta]. In that case, GSL will find multiple local minima (so you must shrink the interval). I also can't understand how you get the minimum around 1.93.
GSL's one-dimensional minimizing routines report this error whenever your initial guess has a function value greater than or equal to the function value at either end of the interval. So, indeed, whether you find one local minimum, another, or none at all depends on the initial guess. The error message is certainly misleading.
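That condition is easy to demonstrate directly against GSL's C API (shown here as a small C++ sketch rather than through the Ruby gem, with cos(x) standing in for the ugly function): gsl_min_fminimizer_set() only accepts the interval when f at the guess is strictly below f at both endpoints.

#include <gsl/gsl_errno.h>
#include <gsl/gsl_math.h>
#include <gsl/gsl_min.h>
#include <cmath>
#include <cstdio>

static double f(double x, void *) { return std::cos(x); }

int main() {
    gsl_set_error_handler_off();   // return error codes instead of aborting
    gsl_function F{f, nullptr};
    gsl_min_fminimizer *s = gsl_min_fminimizer_alloc(gsl_min_fminimizer_brent);

    // f(0) = 1 is NOT below f(±6) ≈ 0.96, so set() reports that the endpoints
    // do not enclose a minimum, even though [-6, 6] contains minima at ±pi.
    int status = gsl_min_fminimizer_set(s, &F, 0.0, -6.0, 6.0);
    std::printf("guess 0.0: %s\n", gsl_strerror(status));

    // f(3) ≈ -0.99 is below both endpoint values, so the same interval is accepted.
    status = gsl_min_fminimizer_set(s, &F, 3.0, -6.0, 6.0);
    std::printf("guess 3.0: %s\n", gsl_strerror(status));

    gsl_min_fminimizer_free(s);
}

Link with -lgsl -lgslcblas when compiling.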

How to make my Haskell program faster? Comparison with C

I'm working on an implementation of one of the SHA-3 candidates, JH. I'm at the point where the algorithm passes all KATs (Known Answer Tests) provided by NIST, and I have also made it an instance of the Crypto-API. Thus I have begun looking into its performance. But I'm quite new to Haskell and don't really know what to look for when profiling.
At the moment my code is consistently slower than the reference implementation written in C, by a factor of 10 for all input lengths (C code found here: http://www3.ntu.edu.sg/home/wuhj/research/jh/jh_bitslice_ref64.h).
My Haskell code is found here: https://github.com/hakoja/SHA3/blob/master/Data/Digest/JHInternal.hs.
Now I don't expect you to wade through all my code; rather, I would just like some tips on a couple of functions. I have run some performance tests, and this is (part of) the profiling report generated by GHC:
Tue Oct 25 19:01 2011 Time and Allocation Profiling Report (Final)
main +RTS -sstderr -p -hc -RTS jh e False
total time = 6.56 secs (328 ticks @ 20 ms)
total alloc = 4,086,951,472 bytes (excludes profiling overheads)
COST CENTRE        MODULE                    %time  %alloc
roundFunction      Data.Digest.JHInternal     28.4    37.4
word128Shift       Data.BigWord.Word128       14.9    19.7
blockMap           Data.Digest.JHInternal     11.9    12.9
getBytes           Data.Serialize.Get          6.7     2.4
unGet              Data.Serialize.Get          5.5     1.3
sbox               Data.Digest.JHInternal      4.0     7.4
getWord64be        Data.Serialize.Get          3.7     1.6
e8                 Data.Digest.JHInternal      3.7     0.0
swap4              Data.Digest.JHInternal      3.0     0.7
swap16             Data.Digest.JHInternal      3.0     0.7
swap8              Data.Digest.JHInternal      1.8     0.7
swap32             Data.Digest.JHInternal      1.8     0.7
parseBlock         Data.Digest.JHInternal      1.8     1.2
swap2              Data.Digest.JHInternal      1.5     0.7
swap1              Data.Digest.JHInternal      1.5     0.7
linearTransform    Data.Digest.JHInternal      1.5     8.6
shiftl_w64         Data.Serialize.Get          1.2     1.1
Detailed breakdown omitted ...
Now quickly about the JH algorithm:
It's a hash algorithm consisting of a compression function, F8, which is repeated as long as there are input blocks (of length 512 bits); this is just how the SHA functions operate. The F8 function consists of the E8 function, which applies a round function 42 times. The round function itself consists of three parts: an sbox, a linear transformation, and a permutation (called swap in my code).
Thus it's reasonable that most of the time is spent in the round function. Still, I would like to know how those parts could be improved. For instance: the blockMap function is just a utility function, mapping a function over the elements of a 4-tuple, so why is it performing so badly? Any suggestions would be welcome, and not just on single functions; i.e. are there structural changes you would have made in order to improve the performance?
I have tried looking at the Core output, but unfortunately that's way over my head.
I attach some of the heap profiles at the end as well in case that could be of interest.
EDIT:
I forgot to mention my setup and build. I run it on an x86_64 Arch Linux machine, GHC 7.0.3-2 (I think), with compile options:
ghc --make -O2 -funbox-strict-fields
Unfortunately there seems to be a bug on the Linux platform when compiling via C or LLVM, giving me the error:
Error: .size expression for XXXX does not evaluate to a constant
so I have not been able to see the effect of that.
Switch to unboxed Vectors (from Array, used for constants)
Use unsafeIndex instead of incurring the bounds check and data dependency from safe indexing (i.e. !)
Unpack Block1024 as you did with Block512 (or at least use UnboxedTuples)
Use unsafeShift{R,L} so you don't incur the check on the shift value (coming in GHC 7.4)
Unfold the roundFunction so you have one rather ugly and verbose e8 function. This was significant in pureMD5 (the rolled version was prettier but massively slower than the unrolled version). You might be able to use TH to do this and keep the code smallish. If you do this then you'll have no need for constants, as these values will be explicit in the code and will result in a more cache-friendly binary.
Unpack your Word128 values.
Define your own addition for Word128, don't lift Integer. See LargeWord for an example of how this can be done.
Use rem, not mod
Compile with optimization (-O2) and try llvm (-fllvm)
EDIT: And cabalize your git repo along with a benchmark so we can help you more easily ;-). Good work on including a crypto-api instance.
The lower graph shows that a lot of memory is occupied by lists. Unless there are more lurking in other modules, they can only come from e8. Maybe you'll have to bite the bullet and make that a loop instead of a fold, but for starters, since Block1024 is a pair, the foldl' doesn't do much evaluation on the fly (unless the strictness analyser has become significantly better). Try making that stricter, data Block1024 = B1024 !Block512 !Block512, perhaps it also needs {-# UNPACK #-} pragmas. In roundFunction, use rem instead of mod (this will only have minor impact, but it's a bit faster) and make the let bindings strict. In the swapN functions, you might get better performance giving the constants in the form W x y rather than as 128-bit hex numbers.
I can't guarantee those changes will help, but that's what looks most promising after a short glance.
Ok, so I thought I would chime in with an update of what I have done and the results obtained thus far. Changes made:
Switched from Array to UnboxedArray (made Word128 an instance type)
Used UnboxedArray + fold in e8 instead of lists and (prelude) fold
Used unsafeIndex instead of !
Changed the type of Block1024 to a real datatype (similar to Block512), and unpacked its arguments
Updated GHC to version 7.2.1 on Arch Linux, thus fixing the problem with compiling via C or LLVM
Switched mod to rem in some places, but NOT in roundFunction. When I do it there, compilation suddenly takes an awfully long time, and the run time becomes 10 times slower! Does anyone know why that might be? It only happens with GHC 7.2.1, not GHC 7.0.3.
I compile with the following options:
ghc-7.2.1 --make -O2 -funbox-strict-fields main.hs ./Tests/testframe.hs -fvia-C -optc-O2
And the results? Roughly a 50% reduction in time. On an input of ~107 MB, the code now takes 3 minutes, compared to the previous 6-7 minutes. The C version takes 42 seconds.
Things I tried, but which didn't result in better performance:
Unrolled the e8 function like this:
e8 !h = go h 0
  where
    go !x !n
      | n == 42   = x
      | otherwise = go h' (n + 1)
      where !h' = roundFunction x n
Tried breaking up the swapN functions to use the underlying Word64s directly:
swap1 (W xh xl) =
      shiftL (W (xh .&. 0x5555555555555555) (xl .&. 0x5555555555555555)) 1
  .|. shiftR (W (xh .&. 0xaaaaaaaaaaaaaaaa) (xl .&. 0xaaaaaaaaaaaaaaaa)) 1
Tried using the LLVM backend
All of these attempts gave worse performance than what I currently have. I don't know if that's because I'm doing it wrong (especially the unrolling of e8), or because they are just worse options.
Still, I have some new questions about these new tweaks.
Suddenly I have gotten a peculiar bump in memory usage. Take a look at the following heap profiles:
Why has this happened? Is it because of the UnboxedArray? And what does SYSTEM mean?
When I compile via C I get the following warning:
Warning: The -fvia-C flag does nothing; it will be removed in a future GHC release
Is this true? Why, then, do I see better performance when using it rather than not?
It looks like you did a fair amount of tweaking already; I'm curious what the performance is like without explicit strictness annotations (BangPatterns) and the various compiler pragmas (UNPACK, INLINE)... Also, a dumb question: what optimization flags are you using?
Anyway, two suggestions which may be completely awful:
Use unboxed primitive types where you can (e.g. replace Data.Word.Word64 with GHC.Word.Word64#, make sure word128Shift is using Int#, etc.) to avoid heap allocation. This is, of course, non-portable.
Try Data.Sequence instead of []
At any rate, rather than looking at the Core output, try looking at the intermediate C files (*.hc) instead. It can be hard to wade through, but sometimes makes it obvious where the compiler wasn't quite as sharp as you'd hoped.

gcc implementation of rand()

I've tried for hours to find the implementation of the rand() function used in gcc...
It would be much appreciated if someone could point me to the file containing its implementation, or to a website with the implementation.
By the way, which directory (I'm using Ubuntu if that matters) contains the c standard library implementations for the gcc compiler?
rand consists of a call to a function __random, which mostly just calls another function called __random_r in random_r.c.
Note that the function names above are hyperlinks to the glibc source repository, at version 2.28.
The glibc random library supports two kinds of generator: a simple linear congruential one, and a more sophisticated linear feedback shift register one. It is possible to construct instances of either, but the default global generator, used when you call rand, uses the linear feedback shift register generator (see the definition of unsafe_state.rand_type).
You will find the C library implementation used by GCC in the GNU glibc project.
You can download its sources, and you should find the rand() implementation there. Sources with function definitions are usually not installed on a Linux distribution; only the header files, which I guess you already know, are usually stored in the /usr/include directory.
If you are familiar with Git source code management, you can do:
$ git clone git://sourceware.org/git/glibc.git
to get the glibc source code.
The files are also available via FTP. I found that there is more to the rand() used in stdlib, which is from glibc. From the 2.32 version (glibc-2.32.tar.gz) obtained from here, the stdlib folder contains a random.c file whose header comment (quoted below) explains which random number generation technique is used. The folder also has rand.c and rand_r.c, which show you more of the source code, and stdlib.h in the same folder shows the values used for macros like RAND_MAX.
/* An improved random number generation package. In addition to the
standard rand()/srand() like interface, this package also has a
special state info interface. The initstate() routine is called
with a seed, an array of bytes, and a count of how many bytes are
being passed in; this array is then initialized to contain
information for random number generation with that much state
information. Good sizes for the amount of state information are
32, 64, 128, and 256 bytes. The state can be switched by calling
the setstate() function with the same array as was initialized with
initstate(). By default, the package runs with 128 bytes of state
information and generates far better random numbers than a linear
congruential generator. If the amount of state information is less
than 32 bytes, a simple linear congruential R.N.G. is used.
Internally, the state information is treated as an array of longs;
the zeroth element of the array is the type of R.N.G. being used
(small integer); the remainder of the array is the state
information for the R.N.G. Thus, 32 bytes of state information
will give 7 longs worth of state information, which will allow a
degree seven polynomial. (Note: The zeroth word of state
information also has some other information stored in it; see setstate
for details). The random number generation technique is a linear
feedback shift register approach, employing trinomials (since there
are fewer terms to sum up that way). In this approach, the least
significant bit of all the numbers in the state table will act as a
linear feedback shift register, and will have period 2^deg - 1
(where deg is the degree of the polynomial being used, assuming
that the polynomial is irreducible and primitive). The higher order
bits will have longer periods, since their values are also
influenced by pseudo-random carries out of the lower bits. The
total period of the generator is approximately deg*(2**deg - 1); thus
doubling the amount of state information has a vast influence on the
period of the generator. Note: The deg*(2**deg - 1) is an
approximation only good for large deg, when the period of the shift
register is the dominant factor. With deg equal to seven, the
period is actually much longer than the 7*(2**7 - 1) predicted by
this formula. */
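A minimal sketch of the initstate()/setstate() interface described in that comment (assuming glibc on Linux; these are POSIX functions declared in <stdlib.h>): with 32 bytes or more of state the additive/LFSR generator is used, while a smaller state buffer selects the simple linear congruential generator.

#include <cstdio>
#include <cstdlib>

int main() {
    static char lfsr_state[128];   // 128 bytes: the default, "far better" generator
    static char lcg_state[8];      // fewer than 32 bytes: simple linear congruential R.N.G.

    initstate(12345u, lfsr_state, sizeof lfsr_state);
    std::printf("LFSR-type generator: %ld\n", random());

    initstate(12345u, lcg_state, sizeof lcg_state);
    std::printf("LCG-type generator:  %ld\n", random());

    setstate(lfsr_state);          // switch back to the first state array
    std::printf("back to LFSR state:  %ld\n", random());
}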
