Fastest math programming language? - performance

I have an application that requires millions of subtractions and remainders, i originally programmed this algorithm inside of C#.Net but it takes five minutes to process this information and i need it faster than that.
I have considered perl and that seems to be the best alternative now. Vb.net was slower in testing. C++ may be better also. Any advice would be greatly appreciated.

You need a compiled language like Fortran, C, or C++. Other languages are designed to give you flexibility, object-orientation, or other advantages, and assume absolutely fastest performance is not your highest priority.
Know how to get maximum performance out of a single thread, and after you have done so investigate sharing the work across multiple cores, for example with MPI. To get maximum performance in a single thread, one thing I do is single-step it at the machine instruction level, to make sure it's not dawdling about in stuff that could be removed.

Some calculations are regular enough to take profit of GPGPUs: recent graphic cards are essentially specialized massively parallel numerical co-processors. For instance, you could code your numerical kernels in OpenCL. Otherwise, learn C++11 (not some earlier version of the C++ standard) or C. And in many cases Ocaml could be nearly as fast as C++ but much easier to code with.
Perhaps your problem can be handled by scilab or R, I did not understand it enough to help more.
And you might take advantage of your multi-core processor by e.g. using Pthreads or MPI
At last, the Linux operating system is perhaps better to deal with massive calculations. It is significant that most super computers use it today.

If execution speed is the highest priority, that usually means Fortran.

Try Julia: its killing feature is being easy to code in a high level concise way, while keeping performances at the same order of magnitude of Fortran/C.

PARI/GP is the best I have used so far. It's written in C.

Try to look at DMelt mathematical program. The program calls Java libraries. Java virtual machine can optimize long mathematical calculations for you.

The standard tool for mathmatic numerical operations in engineering is often Matlab (or as free alternatives octave or the already mentioned scilab).

Related

Performance of Google's Go?

So has anyone used Google's Go? I was wondering how the mathematical performance (e.g. flops) is compared to other languages with a garbage collector... like Java or .NET?
Has anyone investigated this?
Theoretical performance: The theoretical performance of pure Go programs is somewhere between C/C++ and Java. This assumes an advanced optimizing compiler and it also assumes the programmer takes advantage of all features of the language (be it C, C++, Java or Go) and refactors the code to fit the programming language.
Practical performance (as of July 2011): The standard Go compiler (5g/6g/8g) is currently unable to generate efficient instruction streams for high-performance numerical codes, so the performance will be lower than C/C++ or Java. There are multiple reasons for this: each function call has an overhead of a couple of additional instructions (compared to C/C++ or Java), no function inlining, average-quality register allocation, average-quality garbage collector, limited ability to erase bound checks, no access to vector instructions from Go, compiler has no support for SSE2 on 32-bit x86 CPUs, etc.
Bottom line: As a rule of thumb, expect the performance of numerical codes implemented in pure Go, compiled by 5g/6g/8g, to be 2 times lower than C/C++ or Java. Expect the performance to get better in the future.
Practical performance (September 2013): Compared to older Go from July 2011, Go 1.1.2 is capable of generating more efficient numerical codes but they remain to run slightly slower than C/C++ and Java. The compiler utilizes SSE2 instructions even on 32-bit x86 CPUs which causes 32-bit numerical codes to run much faster, most likely thanks to better register allocation. The compiler now implements function inlining and escape analysis. The garbage collector has also been improved but it remains to be less advanced than Java's garbage collector. There is still no support for accessing vector instructions from Go.
Bottom line: The performance gap seems sufficiently small for Go to be an alternative to C/C++ and Java in numerical computing, unless the competing implementation is using vector instructions.
The Go math package is largely written in assembler for performance.
Benchmarks are often unreliable and are subject to interpretation. For example, Robert Hundt's paper Loop Recognition in C++/Java/Go/Scala looks flawed. The Go blog post on Profiling Go Programs dissects Hundt's claims.
You're actually asking several different questions. First of all, Go's math performance is going to be about as fast as anything else. Any language that compiles down to native code (which arguably includes even JIT languages like .NET) is going to perform extremely well at raw math -- as fast as the machine can go. Simple math operations are very easy to compile into a zero-overhead form. This is the area where compiled (including JIT) languages have a advantage over interpreted ones.
The other question you asked was about garbage collection. This is, to a certain extent, a bit of a side issue if you're talking about heavy math. That's not to say that GC doesn't impact performance -- actually it impacts quite a bit. But the common solution for tight loops is to avoid or minimize GC sweeps. This is often quite simple if you're doing a tight loop -- you just re-use your old variables instead of constantly allocating and discarding them. This can speed your code by several orders of magnitude.
As for the GC implementations themselves -- Go and .NET both use mark-and-sweep garbage collection. Microsoft has put a lot of focus and engineering into their GC engine, and I'm obliged to think that it's quite good all things considered. Go's GC engine is a work in progress, and while it doesn't feel any slower than .NET's architecture, the Golang folks insist that it needs some work. The fact that Go's specification disallows destructors goes a long way in speeding things up, which may be why it doesn't seem that slow.
Finally, in my own anecdotal experience, I've found Go to be extremely fast. I've written very simple and easy programs that have stood up in my own benchmarks against highly-optimized C code from some long-standing and well-respected open source projects that pride themselves on performance.
The catch is that not all Go code is going to be efficient, just like not all C code is efficient. You've got to build it correctly, which often means doing things differently than what you're used to from other languages. The profiling blog post mentioned here several times is a good example of that.
Google did a study comparing Go to some other popular languages (C++, Java, Scala). They concluded it was not as strong performance-wise:
https://days2011.scala-lang.org/sites/days2011/files/ws3-1-Hundt.pdf
Quote from the Conclusion, about Go:
Go offers interesting language features, which also allow for a concise and standardized notation. The compilers for this language are still immature, which reflects in both performance and binary sizes.

most suitable language for computationally and memory expensive algorithms

Let's say you have to implement a tool to efficiently solve an NP-hard problem, with unavoidable possible explosion of memory usage (the output size in some cases exponential to the input size) and you are particularly concerned about the performances of this tool at running time. The source code has also to be readable and understandable once the underlying theory is known, and this requirement is as important as the efficiency of the tool itself.
I personally think that 3 languages could be suitable for these three requirements: c++, scala, java.
They all provide the right abstraction on data types that makes it possible to compare different structures or apply the same algorithms (which is also important) to different data types.
C++ has the advantage of being statically compiled and optimized, and with function inlining (if the data structures and algorithms are designed carefully) and other optimisation techniques it's possible to achieve a performance close to that of pure C while maintaining a fairly good readability.
If you also put a lot of care in data representation you can optimise the cache performance, which can gain orders of magnitude in speed when the cache miss rate is low.
Java is instead JIT compiled, which allows to apply optimisations during runtime, and in this category of algorithms that could have different behaviours between different runs, that may be a plus. I fear instead that such an approach could suffer from garbage collector, however in the case of this algorithm it's common to continuously allocate memory and java heap performance is notoriously better than C/C++ and if you implement your own memory manager inside the language you could even achieve good efficiency.
This approach instead is not able to inline method invocation (which induces a huge performance penalty) and doesn't give you control over the cache performance. Among the pros there's a better and cleaner syntax than C++.
My concerns about scala are more or less the same as Java, plus the fact that I can't control how the language is optimised unless I have a deep knowledge on the compiler and the standard library. But well: I get a very clean syntax :)
What's your take on the subject? Have you had to deal with this already? Would you implement an algorithm with such properties and requirements in any of these languages or would you suggest something else? How would you compare them?
Usually I’d say “C++” in a heartbeat. The secret being that C++ simply produces less (memory) garbage that needs managing.
On the other hand, your observation that
however in the case of this algorithm it's common to continuously allocate memory
is a hint that Java / Scala may actually be more suited. But then you could use a small object heap in C++ as well. Boost has one that uses the standard allocator interface, if memory serves.
Another advantage of C++ is obviously the use of abstraction without penalty through templates – i.e. that you can easily create generic algorithmic components that can interact without incurring a runtime overhead due to abstraction. In fact, you noted that
it's possible to achieve a performance close to that of pure C while maintaining a fairly good readability
– this is looking at things the wrong way: Templates allow C++ to achieve performance superior to that of C while still maintaining high abstraction.
D might be worth a look, seeing as how it tries to be a better C++.
From a superficial glance, it has better source code readability than C++ does, so that's one of your points covered.
It also has memory management, which makes playing with algorithms a bit easier.
And templates
Here is a stackoverflow discussion comparing the performance of C++ and D
The languages you noticed were my first guesses as well.
Each language has a different take on how to handle specific issues like compilation, memory management and source code, but in theory, any of them should be fitting to your problem.
It is impossible to tell which is best, and there is likely no major difference if you are familiar enough with all of them to work around their respective quirks.
And obviously, if you actually find the need to optimize (I'm not sure if that's a given), that's possible in each language. Lower level languages obviously offer more options, but are also (far) more complex to actually improve.
A single note about C++ vs Java: This is really a holy war, and if you've followed the recent development you'll probably have your own opinion. I, for one, think Java offers enough good aspects to make up for its flaws, usually.
And a final note on C++ vs C: According to my knowledge, the difference usually amounts to a sufficiently low percentage to ignore this. It it doesn't make a difference for the source code, it's fine to go with C, if C++ could make for easier-to-read source code, go with C++. In any case, the choice is kind of negligible.
In the end, remember that money spent on a few hours of programming/optimizing this could as well go into slightly superior hardware to make up for missed tiny details.
It all boils down to: Any of your options is fine as long as you do it right (domain knowledge).
I would use a language which makes it very easy to work on the algorithm. Get the algorithm right and it could very easily outweigh any advantage from fine-tuning the wrong algorithm. Don't be scared to play around in a language normally thought of as slow in execution speed if that language makes it easier to express algorithmic ideas. It is usually much easier to transcribe the right algorithm into another language than it is to eek-out the last dregs of speed from the wrong algorithm in the fastest executing language.
So do it in a language you are comfortable with and which is expressive. You might surprise yourself and find that what is produced is fast enough!

Is it possible to design a dynamic language without significant performance loss?

Is it possible to design something like Ruby or Clojure without the significant performance loss in many situations compared with C/Java? Does hardware design play a role?
Edit: With significant I mean in an order of magnitudes, not just ten procent
Edit: I suspect that delnan is correct with me meaning dynamic languages so I changed the title
Performance depends on many things. Of course the semantics of the language have to be preserved even if we are compiling it - you can't remove dynamic dispatch from Ruby, it would speed things up drmatically but it would totally break 95% of the all Ruby code in the world. But still, much of the performance depends on how smart the implementation is.
I assume, by "high-level", you mean "dynamic"? Haskell and OCaml are extremely high-level, yet are is compiled natively and can outperform C# or Java, even C and C++ in some corner cases - especially if parallelism comes into play. And they certainly weren't designed with performance as #1 goal. But compiler writers, especially those focused onfunctional languages, are a very clever folk. If you or I started a high-level language, even if we used e.g. LLVM as backend for native compilation, we wouldn't get anywhere near this performance.
Making dynamic languages run fast is harder - they delay many decisions (types, members of a class/an object, ...) to runtime instead of compiletime, and while static code analysis can sometimes prove it's not possible in lines n and m, you still have to carry an advanced runtime around and do quite a few things a static language's compiler can do at compiletime. Even dynamic dispatch can be optimized with a smarter VM (Inline Cache anyone?), but it's a lot of work. More than a small new-fangeled language could do, that is.
Also see Steve Yegge's Dynamic Languages Strike Back.
And of course, what is a significant peformance loss? 100 times slower than C reads like a lot, but as we all know, 80% of execution time is spent in 20% of the code = 80% of the code won't have notable impact on the percieved performance of the whole program. For the remaining 20%, you can always rewrite it in C or C++ and call it from the dynamic language. For many applications, this suffices (for some, you don't even need to optimize). For the rest... well, if performance is that critical, you should propably write it in a language designed for performance.
Don't confuse the language design with the platform that it runs on.
For instance, Java is a high-level language. It runs on the JVM (as does Clojure - identified above, and JRuby - a Java version of Ruby). The JVM will perform byte-code analysis and optimise how the code runs (making use of escape analysis, just-in-time compilation etc.). So the platform has an effect on the performance that is largely independent of the language itself (see here for more info on Java performance and comparisons to C/C++)
Loss compared to what? If you need a garbage collector or closures then you need them, and you're going to pay the price regardless. If a language makes them easy for you to get at, that doesn't mean you have to use them when you don't need them.
If a language is interpreted instead of compiled, that's going to introduce an order of magnitude slowdown. But such a language may have compensating advantages, like ease of use, platform independence, and not having to compile. And, the programs you write in them may not run long enough for speed to be an issue.
There may be language implementations that introduce slowness for no good reason, but those don't have to be used.
You might want to look at what the DARPA HPCS initiative has come up with. There were 3 programming languages proposed: Sun's Fortress, IBM's X10 and Cray's Chapel. The latter two are still under development. Whether any of these meet your definition of high-level I don't know.
And yes, hardware design certainly does play a part. All 3 of these languages are targeted at supercomputers with very many processors and exhibit features appropriate to that domain.
It's certainly possible. For example, Objective-C is a dynamically-typed language that has performance comparable to C++ (although a wee bit slower, generally speaking, but still roughly equivalent).

Fortran's performance

Fortran's performances on Computer Language Benchmark Game are surprisingly bad. Today's result puts Fortran 14th and 11th on the two quad-core tests, 7th and 10th on the single cores.
Now, I know benchmarks are never perfect, but still, Fortran was (is?) often considered THE language for high performance computing and it seems like the type of problems used in this benchmark should be to Fortran's advantage. In an recent article on computational physics, Landau (2008) wrote:
However, [Java] is not as efficient or
as well supported for HPC and parallel
processing as are FORTRAN and C, the
latter two having highly developed
compilers and many more scientific
subroutine libraries available.
FORTRAN, in turn, is still the
dominant language for HPC, with
FORTRAN 90/95 being a surprisingly
nice, modern, and effective language;
but alas, it is hardly taught by any
CS departments, and compilers can be
expensive.
Is it only because of the compiler used by the language shootout (Intel's free compiler for Linux) ?
No, this isn't just because of the compiler.
What benchmarks like this -- where the program differs from benchmark to benchmark -- is largely the amount of effort (and quality of effort) that the programmer put into writing any given program. I suspect that Fortran is at a significant disadvantage in that particular metric -- unlike C and C++, the pool of programmers who'd want to try their hand at making the benchmark program better is pretty small, and unlike most anything else, they likely don't feel like they have something to prove either. So, there's no motivation for someone to spend a few days poring over generated assembly code and profiling the program to make it go faster.
This is fairly clear from the results that were obtained. In general, with sufficient programming effort and a decent compiler, neither C, C++, nor Fortran will be significantly slower than assembly code -- certainly not more than 5-10%, at worst, except for pathological cases. The fact that the actual results obtained here are more variant than that indicates to me that "sufficient programming effort" has not been expended.
There are exceptions when you allow the assembly to use vector instructions, but don't allow the C/C++/Fortran to use corresponding compiler intrinsics -- automatic vectorization is not even a close approximation of perfect and probably never will be. I don't know how much those are likely to apply here.
Similarly, an exception is in things like string handling, where you depend heavily on the runtime library (which may be of varying quality; Fortran is rarely a case where a fast string library will make money for the compiler vendor!), and on the basic definition of a "string" and how that's represented in memory.
Some random thoughts:
Fortran used to do very well because it was easier to identify loop invariants which made some optimizations easier for the compiler. Since then
Compilers have gotten much more sophisticated. Enormous effort has been put into c and c++ compilers in particular. Have the fortran compilers kept up? I suppose the gfortran uses the same back end of gcc and g++, but what of the intel compiler? It used to be good, but is it still?
Some languages have gotten a lot specialized keywords and syntax to help the compiler (restricted and const int const *p in c, and inline in c++). Not knowing fortran 90 or 95 I can't say if these have kept pace.
I've looked at these tests. It's not like the compiler is wrong or something. In most tests Fortran is comparable to C++ except some where it gets beaten by a factor of 10. These tests just reflect what one should know from the beggining - that Fortran is simply NOT an all-around interoperable programming language - it is suited for efficient computation, has good list operations & stuff but for example IO sucks unless you are doing it with specific Fortran-like methods - like e.g. 'unformatted' IO.
Let me give you an example - the 'reverse-complement' program that is supposed to read a large (of order of 10^8 B) file from stdin line-by-line, does something with it & prints the resulting large file to stdout. The pretty straighforward Fortran program is about 10 times slower on a single core (~10s) than a HEAVILY optimized C++ (~1s). When you try to play with the program, you'll see that only simple formatted read & write take more than 8 seconds. In a Fortran way, if you care for efficiency, you'd just write an unformatted structure to a file & read it back in no time (which is totally non-portable & stuff but who cares anyway - an efficient code is supposed to be fast & optimized for a specific machine, not able to run everywhere).
So the short answer is - don't worry, just do your job - and if you want to write a super-efficient operating system, than sorry - Fortran is just not the way for that kind of performance.
This benchmark is stupid at all.
For example, they measure CPU-time for the whole program to run. As mcmint stated (and it might be actually true) Fortran I/O sucks*. But who cares? In real-world tasks one read input for some seconds than do calculations for hours/days/months and finally write output for the seconds. Thats why in most benchmarks I/O operations are excluded from time measurements (if you of course do not benchmark I/O by itself).
Norber Wiener in his book God & Golem, Inc. wrote
Render unto man the things which are man’s and unto the computer the things which are the computer’s.
In my opinion the usage of this principle while implementing algorithm in any programming language means:
Write as readable and simple code as you can and let compiler do the optimizations.
Especially it is important in real-world (huge) applications. Dirty tricks (so heavily used in many benchmarks) even if they might improve the efficiency to some extent (5%, maybe 10%) are not for the real-world projects.
/* C/C++ uses stream I/O, but Fortran traditionally uses record-based I/O. Further reading. Anyway I/O in that benchmarks are so surprising. The usage of stdin/stdout redirection might also be the source of problem. Why not simply use the ability of reading/writing files provided by the language or standard library? Once again this woud be more real-world situation.
I would like to say that even if the benchmark do not bring up the best results for FORTRAN, this language will still be used and for a long time. Reasons of use are not just performance but also some kind of thing called easyness of programmability. Lots of people that learnt to use it in the 60's and 70's are now too old for getting into new stuff and they know how to use FORTRAN pretty well. I mean, there are a lot of human factors for a language to be used. The programmer also matters.
Considering they did not publish the exact compiler options they used for the Intel Fortran Compiler, I have little faith in their benchmark.
I would also remark that both Intel's math library, MKL, and AMD's math library, ACML, use the Intel Fortran Compiler.
Edit:
I did find the compilation options when you click on the benchmark's name. The result is surprising since the optimization level seems reasonable. It may come down to the efficiency of the algorithm.

Image Recognition

I'd like to do some work with the nitty-gritties of computer imaging. I'm looking for a way to read single pixels of data, analyze them programatically, and change them. What is the best language to use for this (Python, c++, Java...)? What is the best fileformat?
I don't want any super fancy software/APIs... I'm looking for the bare basics.
If you need speed (you'll probably always want speed with image processing) you definitely have to work with raw pixel data.
Java has some real disadvantages as you cannot access memory directly which makes pixel access quite slow compared to accessing the memory directly.
C++ is definitely the language of choice for production use image processing. But you can, for example, also use C# as it allows for unsafe code in specific areas. (Take a look at the scan0 pointer property of the bitmapdata class.)
I've used C# successfully for image processing applications and they are definitely much faster than their java counterparts.
I would not use any scripting language or java for such a purpose.
It's very east to manipulate the large multi-dimensional or complex arrays of pixel information that are pictures using high-level languages such as Python. There's a library called PIL (the Python Imaging Library) that is quite useful and will let you do general filters and transformations (change the brightness, soften, desaturate, crop, etc) as well as manipulate the raw pixel data.
It is the easiest and simplest image library I've used to date and can be extended to do whatever it is you're interested in (edge detection in very little code, for example).
I studied Artificial Intelligence and Computer Vision, thus I know pretty well the kind of tools that are used in this field.
Basically: you can use whatever you want as long as you know how it works behind the scene.
Now depending on what you want to achieve, you can either use:
C language, but you will lose a lot of time in bugs checking and memory management when implementing your algorithms. So theoretically, this is the fastest language to do that kind of job, but if your algorithms are not computationnally efficient (in terms of complexity) or if you lose too much time in bugs checking, this is clearly not worth it. So I would advise to first implement your application in another language, and then later you can always optimize small parts of your code with C bindings.
Octave/MatLab: very efficient language, almost as much as C, and you can make very elegant and succinct algorithms. If you are into vectorization, matrix and linear operations, you should go with that. However, you won't be able to develop a whole application with this language, it's more focused on algorithms, but then you can always develop an interface using another language later.
Python: all-in-one elegant and accessible language, used in gigantically large scale applications such as Google and Facebook. You can do pretty much everything you want with Python, any kind of application. It will be perfectly adapted if you want to make a full application (with client interaction and all, not only algorithms), or if you want to quickly draft a prototype using existent libraries since Python has a very large set of high quality libraries, like OpenCV. However if you only want to make algorithms, you should better use Octave/MatLab.
The answer that was selected as a solution is very biaised, and you should be careful about this kind of archaic comment.
Nowadays, hardware is cheaper than wetware (humans), and thus, you should use languages where you will be able to produce results faster, even if it's at the cost of a few CPU cycles or memory space.
Also, a lot of people tends to think that as long as you implement your software in C/C++, you are making the Saint Graal of speedness: this is just not true. First, because algorithms complexity matters a lot more than the language you are using (a bad algorithm will never beat a better algorithm, even if implemented in the slowest language in the universe), and secondly because high-level languages are nowadays doing a lot of caching and speed optimization for you, and this can make your program run even faster than in C/C++.
Of course, you can always do everything of the above in C/C++, but how much of your time are you willing to waste to reinvent the wheel?
Not only will C/C++ be faster, but most of the image processing sample code you find out there will be in C as well, so it will be easier to incorporate things you find.
if you are looking to numerical work on your images (think matrix) and you into Python check out http://www.scipy.org/PyLab - this is basically the ability to do matlab in python, buddy of mine swears by it.
(This might not apply for the OP who only wanted the bare basics -- but now that the speed issue was brought up, I do need to write this, just for the record.)
If you really need speed, it's better to forget about working on the pixel-by-pixel level, and rather see whether the operations that you need to perform could be vectorized. For example, for your C/C++ code you could use the excellent Intel IPP library (no, I don't work for Intel).
It depends a little on what you're trying to do.
If runtime speed is your issue then c++ is the best way to go.
If speed of development is an issue, though, I would suggest looking at java. You said that you wanted low level manipulation of pixels, which java will do for you. But the other thing that might be an issue is the handling of the various file formats. Java does have some very nice APIs to deal with the reading and writing of various image formats to file (in particular the java2d library. You choose to ignore the higher levels of the API)
If you do go for the c++ option (or python come to think of it) I would again suggest the use of a library to get you over the startup issues of reading and writing files. I've previously had success with libgd
What language do you know the best? To me, this is the real question.
If you're going to spend months and months learning one particular language, then there's no real advantage in using Python or Java just for their (to be proven) development speed.
I'm particularly proficient in C++ and I think that for this particular task I can be as speedy as a Java programmer, for example. With the aid of some good library (OpenCV comes to mind) you can create anything you need in a matter of a couple of lines of C++ code, really.
Short answer: C++ and OpenCV
Short answer? I'd say C++, you have far more flexibility in manipulating raw chunks of memory than Python or Java.

Resources