How do computers calculate logarithms? - algorithm

I wanted to know: how do computers calculate logarithms?
I don't mean the library functions themselves. For example, Python provides the math.log() function, but what exactly does that function do? And can it be reimplemented, perhaps more accurately?
Is there a formula for it? Or an algorithm? (I don't think the computer keeps a log table!)
Thanks

The GNU C library, for example, uses a call to the fyl2x() assembler instruction (an x87 instruction), which means the logarithm is computed directly in hardware.
Hence one should ask: what algorithm does the hardware use to calculate logarithms?
It depends on the CPU. For Intel IA-64 (Itanium), the implementation uses a Taylor-series expansion combined with a lookup table.
More info can be found here: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.24.5177
and here: http://www.computer.org/csdl/proceedings/arith/1999/0116/00/01160004.pdf
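To make the general idea concrete, here is a rough sketch in C (my own illustration, not the actual glibc or IA-64 code): split the argument into mantissa and exponent, then sum a quickly converging series for the mantissa part. Real libm implementations replace the plain series with carefully tuned polynomials plus lookup tables and take great care over rounding.

    #include <math.h>
    #include <stdio.h>

    static double my_log(double x)
    {
        if (x <= 0.0)
            return NAN;                       /* keep the domain handling simple */

        const double LN2     = 0.69314718055994530942;
        const double SQRT1_2 = 0.70710678118654752440;

        int e;
        double m = frexp(x, &e);              /* x = m * 2^e, with m in [0.5, 1) */

        if (m < SQRT1_2) {                    /* move m into [1/sqrt(2), sqrt(2)) */
            m *= 2.0;
            e -= 1;
        }

        /* ln(m) = 2 * (t + t^3/3 + t^5/5 + ...), with t = (m-1)/(m+1) */
        double t    = (m - 1.0) / (m + 1.0);
        double t2   = t * t;
        double term = t;
        double sum  = 0.0;
        for (int k = 1; k <= 29; k += 2) {    /* a handful of terms is enough here */
            sum  += term / k;
            term *= t2;
        }
        return 2.0 * sum + e * LN2;           /* ln(x) = ln(m) + e * ln(2) */
    }

    int main(void)
    {
        printf("my_log:   %.17g\n", my_log(123.456));
        printf("libm log: %.17g\n", log(123.456));
        return 0;
    }

Compile with something like gcc log_sketch.c -lm; the result should agree with the library log() to within a few units in the last place, which is decent but not the correctly rounded answer a production library aims for.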

This is a hugely open, broad, "it depends" question.
For every programming language, every core library, and every system, different algorithms/mechanisms and machine-code instructions may be used to perform mathematical (and any other kind of) calculations.
Furthermore, even if every programming language in the world used the same algorithm "X", that still would not mean the computer calculates logarithms in way X, because the machine-level work will (most likely) still be done differently in different circumstances, even when the algorithms are the same (which is itself unlikely).
Bear in mind that computer architectures differ, operating systems differ, and assembler instructions can be very different from CPU to CPU.
I really think you should ask a more specific and concrete question on this website.

Related

How to accurately measure performance of sorting algorithms

I have a bunch of sorting algorithms in C I wish to benchmark. I am concerned regarding good methodology for doing so. Things that could affect benchmark performance include (but are not limited to): specific coding of the implementation, programming language, compiler (and compiler options), benchmarking machine and critically the input data and time measuring method. How do I minimize the effect of said variables on the benchmark's results?
To give you a few examples, I've considered multiple implementations on two different languages to adjust for the first two variables. Moreover I could compile the code with different compilers on fairly mundane (and specified) arguments. Now I'm going to be running the test on my machine, which features turbo boost and whatnot and often boosts a core running stuff to the moon. Of course I will be disabling that and doing multiple runs and likely taking their mean completion time to adjust for that as well. Regarding the input data, I will be taking different array sizes, from very small to relatively large. I do not know what the increments should ideally be like, and what the range of the elements should be as well. Also I presume duplicate elements should be allowed.
I know that theoretical analysis of algorithms accounts for all of these methods, but it is crucial that I complement my study with actual benchmarks. How would you go about resolving the mentioned issues, and adjust for these variables once the data is collected? I'm comfortable with the technologies I'm working with, less so with strict methodology for studying a topic. Thank you.
You can't benchmark abstract algorithms, only specific implementations of them, compiled with specific compilers running on specific machines.
Choose a couple different relevant compilers and machines (e.g. a Haswell, Ice Lake, and/or Zen2, and an Apple M1 if you can get your hands on one, and/or an AArch64 cloud server) and measure your real implementations. If you care about in-order CPUs like ARM Cortex-A53, measure on one of those, too. (Simulation with GEM5 or similar performance simulators might be worth trying. Also maybe relevant are low-power implementations like Intel Silvermont whose out-of-order window is much smaller, but also have a shorter pipeline so smaller branch mispredict penalty.)
If some algorithm allows a useful micro-optimization in the source, or one that a compiler can find, that's a real advantage of that algorithm.
Compile with options you'd use in practice for the use-cases you care about, like clang -O3 -march=native, or just -O2.
Benchmarking on cloud servers makes it hard / impossible to get an idle system, unless you pay a lot for a huge instance, but modern AArch64 servers are relevant and may have different ratios of memory bandwidth vs. branch mispredict costs vs. cache sizes and bandwidths.
(You might well find that the same code is the fastest sorting implementation on all or most of the systems you test on.)
Re: sizes: yes, a variety of sizes would be good.
You'll normally want to test with random data, perhaps always generated from the same PRNG seed so you're sorting the same data every time.
You may also want to test some unusual cases like already-sorted or almost-sorted, because algorithms that are extra fast for those cases are useful.
If you care about sorting things other than integers, you might want to test with structs of different sizes, with an int key as a member. Or a comparison function that does some amount of work, if you want to explore how sorts do with a compare function that isn't as simple as just one compare machine instruction.
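For instance, here is a small sketch in plain C (my own, using rand() for brevity) of generating the input patterns mentioned above, all from a fixed seed so every run sorts the same data:

    #include <stdlib.h>

    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    void make_random(int *a, size_t n, unsigned seed)
    {
        srand(seed);                          /* fixed seed => reproducible data */
        for (size_t i = 0; i < n; i++)
            a[i] = rand();
    }

    void make_sorted(int *a, size_t n, unsigned seed)
    {
        make_random(a, n, seed);
        qsort(a, n, sizeof *a, cmp_int);
    }

    void make_almost_sorted(int *a, size_t n, unsigned seed)
    {
        make_sorted(a, n, seed);
        for (size_t k = 0; k < n / 100 + 1; k++) {   /* disturb roughly 1% of positions */
            size_t i = rand() % n, j = rand() % n;
            int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
        }
    }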
As always with microbenchmarking, there are many pitfalls around warm-up of arrays (page faults), CPU frequency, and more. See also: Idiomatic way of performance evaluation?
taking their mean completion time
You might want to discard high outliers, or take the median which will have that effect for you. Usually that means "something happened" during that run to disturb it. If you're running the same code on the same data, often you can expect the same performance. (Randomization of code / stack addresses with page granularity usually doesn't affect branches aliasing each other in predictors or not, or data-cache conflict misses, but tiny changes in one part of the code can change performance of other code via effects like that if you're re-compiling.)
If you're trying to see how it would run when it has the machine to itself, you don't want to consider runs where something else interfered. If you're trying to benchmark under "real world" cloud server conditions, or with other threads doing other work in a real program, that's different and you'd need to come up with realistic other loads that use some amount of shared resources like L3 footprint and memory bandwidth.
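Tying the fixed-seed, repeated-runs, and median points together, a minimal harness sketch in C might look like this (sort_under_test() is a hypothetical stand-in, here just wrapping qsort(); swap in the implementation you actually want to measure; error checking omitted for brevity):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    static double now_sec(void)                       /* monotonic wall-clock time */
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    static int cmp_int(const void *a, const void *b)
    {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    /* Stand-in for the implementation you actually want to measure. */
    static void sort_under_test(int *a, size_t n)
    {
        qsort(a, n, sizeof *a, cmp_int);
    }

    static int cmp_double(const void *a, const void *b)
    {
        double x = *(const double *)a, y = *(const double *)b;
        return (x > y) - (x < y);
    }

    int main(void)
    {
        enum { N = 1000000, REPS = 11 };
        int *ref  = malloc(N * sizeof *ref);
        int *work = malloc(N * sizeof *work);
        double times[REPS];

        srand(12345);                                 /* fixed seed: same data every run */
        for (size_t i = 0; i < N; i++)
            ref[i] = rand();

        for (int r = 0; r < REPS; r++) {
            memcpy(work, ref, N * sizeof *work);      /* also warms the pages up */
            double t0 = now_sec();
            sort_under_test(work, N);
            times[r] = now_sec() - t0;
        }

        qsort(times, REPS, sizeof times[0], cmp_double);
        printf("median of %d runs: %.6f s\n", REPS, times[REPS / 2]);

        free(ref);
        free(work);
        return 0;
    }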
Things that could affect benchmark performance include (but are not limited to): specific coding of the implementation, programming language, compiler (and compiler options), benchmarking machine and critically the input data and time measuring method.
Let's look at this from a very different perspective - how to present information to humans.
With 2 variables you get a nice 2-dimensional grid of results, maybe like this:
            A = 1        A = 2
    B = 1   4 seconds    2 seconds
    B = 2   6 seconds    3 seconds
This is easy to display and easy for humans to understand and draw conclusions from (e.g. from my silly example table it's trivial to make 2 very different observations - "A=1 is twice as fast as A=2 (regardless of B)" and "B=1 is faster than B=2 (regardless of A)").
With 3 variables you get a 3-dimensional grid of results, and with N variables you get an N-dimensional grid of results. Humans struggle with "3-dimensional data on a 2-dimensional screen", and more dimensions become a disaster. You can mitigate this a little by "peeling off" a dimension (e.g. instead of trying to present a 3D grid of results you could show multiple 2D grids); but that doesn't help humans much.
Your primary goal is to reduce the number of variables.
To reduce the number of variables:
a) Determine how important each variable is for what you intend to observe (e.g. "which algorithm" will be extremely important and "which language" will be less important).
b) Merge variables based on importance and "logical grouping". For example, you might get three "lower importance" variables (language, compiler, compiler options) and merge them into a single "language+compiler+options" variable.
Note that it's very easy to overlook a variable. For example, you might benchmark "algorithm 1" on one computer and benchmark "algorithm 2" on an almost identical computer, but overlook the fact that (even though both benchmarks used identical languages, compilers, compiler options and CPUs) one computer has faster RAM chips, and overlook "RAM speed" as a possible variable.
Your secondary goal is to reduce the number of values each variable can have.
You don't want massive table/s with 12345678 million rows; and you don't want to spend the rest of your life benchmarking to generate such a large table.
To reduce the number of values each variable can have:
a) Figure out which values matter most
b) Select the right number of values in order of importance (and ignore/skip all other values)
For example, if you merged three "lower importance" variables (language, compiler, compiler options) into a single variable; then you might decide that 2 possibilities ("C compiled by GCC with -O3" and "C++ compiled by MSVC with -Ox") are important enough to worry about (for what you're intending to observe) and all of the other possibilities get ignored.
How do I minimize the effect of said variables on the benchmark's results?
How would you go about resolving the mentioned issues, and adjust for these variables once the data is collected?
By identifying the variables (as part of the primary goal) and explicitly deciding which values the variables may have (as part of the secondary goal).
You've already been doing this. What I've described is a formal method of doing what people would unconsciously/instinctively do anyway. For one example, you have identified that "turbo boost" is a variable, and you've decided that "turbo boost disabled" is the only value for that variable you care about (but do note that this may have consequences - e.g. consider "single-threaded merge sort without the turbo boost it'd likely get in practice" vs. "parallel merge sort that isn't as influenced by turning turbo boost off").
My hope is that by describing the formal method you gain confidence in the unconscious/instinctive decisions you're already making, and realize that you were very much on the right path before you asked the question.

Comparing algorithmic performance to old methods

I have written a new algorithm for something. Now I need to compare it with existing methods, some of which are about 10 years old.
The idea I had is to look at benchmarks of different processors over the years in order to establish how much faster my processor (i7-920) is than an average processor from 2003. Then I would simply divide the old methods' execution times by the speedup factor and use those numbers to compare against my own algorithm.
Has something like this been done before, so that I don't redo existing work?
Can such a comparison be done some other way?
Are there some scientific papers written about such comparisons which I can reference?
I don't know which of these are possible for you, but here's a list of options I can think of:
1. Run their implementation side-by-side on your machine against yours.
   This is the best option.
2. Rewrite their implementations and do (1).
   You should preferably compare against their reported results to ensure you get vaguely similar numbers.
3. Find a library that implements their algorithm (or multiple libraries) and do (1).
   I suggest multiple libraries, if possible, since a single one may not have implemented the algorithm efficiently. You may also want to compare these against their reported results.
4. Compare the algorithms mathematically.
   This may be difficult, but it's not impossible.
5. Do what you presented.
   (a) I would not recommend this, as there are determining factors in your computer other than processor speed that affect the speed of an algorithm. Getting an equation that perfectly balances these will likely be very difficult.
   (b) There is a massive difference between top-of-the-line and bottom-of-the-line computers, so using the average is not a particularly good idea. If the authors didn't provide details about their hardware, I'm afraid your benchmark is not likely to be very accurate.
6. Go out and buy a machine with specs similar to the one used in the original tests and benchmark on that.
   A 10-year-old machine should be pretty cheap, if you can find one. Also, see (5.b).
7. Contact the authors to enable any of the other options.
   Papers often provide contact details of the authors, or you should be able to find them elsewhere if they have any sort of online presence and you're half-decent at using Google.
If I were reviewing your results, I would be annoyed if you attempted to demonstrate less than an order of magnitude speedup this way. There are a lot of variables determining algorithm performance, and I would be skeptical that a generic benchmark could capture the right ones. My gold standard is old and new algorithms implemented by the same programmer, with similar effort made to optimize, running on the same hardware. Using the previous authors' implementation instead of making a new one is commonplace in the experimental algorithms literature, but using different hardware isn't.
Algorithmic performance is usually measured in big-O terms, for which it is better to count basic operations, like comparisons, and do it for a range of input sizes.
If you must measure overall time, at least eliminate other sources of difference.
As @larsmans said, do it on the same processor.
Also, if there is existing work, there's no harm in repeating it.
Generally, in science, that's a good thing.
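As a concrete version of the "count basic operations" suggestion, here is a hedged sketch in C that instruments the comparison function and sweeps a range of input sizes, using the standard qsort() as a stand-in for the algorithm under test (compile with -lm):

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    static unsigned long long comparisons;

    static int counting_cmp(const void *a, const void *b)
    {
        comparisons++;                            /* count the basic operation */
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);
    }

    int main(void)
    {
        srand(42);                                /* fixed seed for repeatability */
        for (size_t n = 1000; n <= 1000000; n *= 10) {
            int *a = malloc(n * sizeof *a);
            for (size_t i = 0; i < n; i++)
                a[i] = rand();

            comparisons = 0;
            qsort(a, n, sizeof *a, counting_cmp);
            printf("n = %7zu  comparisons = %10llu  (%.2f per n*log2(n))\n",
                   n, comparisons, comparisons / (n * log2((double)n)));
            free(a);
        }
        return 0;
    }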
You should attempt to reduce the number of differing factors between the two runs. I think timing the two algorithms side by side on the same machine and comparing their big-O complexities are both equally valid and important. You should also attempt to use up-to-date libraries and other external functions; using outdated ones may also skew the timing results.

Fastest math programming language?

I have an application that requires millions of subtractions and remainders. I originally programmed this algorithm in C#.NET, but it takes five minutes to process the data and I need it to be faster than that.
I have considered Perl, and that seems to be the best alternative right now. VB.NET was slower in testing. C++ may be better as well. Any advice would be greatly appreciated.
You need a compiled language like Fortran, C, or C++. Other languages are designed to give you flexibility, object-orientation, or other advantages, and assume absolutely fastest performance is not your highest priority.
Know how to get maximum performance out of a single thread, and after you have done so investigate sharing the work across multiple cores, for example with MPI. To get maximum performance in a single thread, one thing I do is single-step it at the machine instruction level, to make sure it's not dawdling about in stuff that could be removed.
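As a rough illustration of what a compiled language buys you here (assuming the workload really is dominated by subtractions and remainders, which is all we know from the question), this is the kind of tight C loop an optimizer can keep entirely in registers; compile with something like gcc -O3 and time it against the C# version:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        const int64_t n = 100000000;          /* 100 million iterations */
        int64_t acc = 0;

        for (int64_t i = 1; i <= n; i++)
            acc += 1000003 - (i % 97);        /* one subtraction, one remainder */

        printf("checksum: %lld\n", (long long)acc);
        return 0;
    }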
Some calculations are regular enough to benefit from GPGPUs: recent graphics cards are essentially specialized, massively parallel numerical co-processors. For instance, you could code your numerical kernels in OpenCL. Otherwise, learn C++11 (not some earlier version of the C++ standard) or C. In many cases OCaml can be nearly as fast as C++ but much easier to code in.
Perhaps your problem can be handled by Scilab or R; I did not understand it well enough to help more.
You might also take advantage of your multi-core processor, e.g. by using Pthreads or MPI.
Finally, the Linux operating system is perhaps better suited to massive calculations. It is telling that most supercomputers use it today.
If execution speed is the highest priority, that usually means Fortran.
Try Julia: its killer feature is that it is easy to code in a high-level, concise way while keeping performance in the same order of magnitude as Fortran/C.
PARI/GP is the best I have used so far. It's written in C.
Take a look at the DMelt mathematical program. The program calls Java libraries, and the Java virtual machine can optimize long mathematical calculations for you.
The standard tool for mathematical numerical operations in engineering is often MATLAB (or, as free alternatives, Octave or the already mentioned Scilab).

Minimal instruction set to solve any problem with a computer program

Years ago, I heard that someone was about to demonstrate that every computer program could be written with just three instructions:
Assignment
Conditional
Loop
I would like to hear your opinion. I mean representing any algorithm as a computer program. Do you agree with this?
No need. The minimal theoretical computer needs just one instruction. They are called One Instruction Set Computers (OISC for short, kinda like the ultimate RISC).
There are two types. The first is a theoretically "pure" one instruction machine in which the instruction really works like a regular instruction in normal CPUs. The instruction is usually:
subtract and branch if less than zero
or variations thereof. The Wikipedia article has examples of how this single instruction can be used to write code that emulates other instructions.
The second type is not theoretically pure. It is the transfer-triggered architecture (Wikipedia again, sorry). This family of architectures is also known as move machines, and I have designed and built some myself.
Some consider move machines cheating, since the machine actually has all the regular instructions, only they are memory-mapped instead of being part of the opcode. But move machines are not merely theoretical, they are practical (like I said, I've built some myself). There is even a commercially available family of CPUs built by Maxim: the MAXQ. If you look at the MAXQ instruction set (they call it a transfer set since there is really only one instruction; I usually call it a register set) you will see that MAXQ assembly looks rather like a standard accumulator-based architecture.
This is a consequence of Turing Completeness, which is something that was established many decades ago.
Alan Turing, the famous computer scientist, proved that any computable function could be computed using a Turing Machine. A Turing machine is a very simple theoretical device which can do only a few things. It can read and write to a tape (i.e. memory), maintain an internal state which is altered by the contents read from memory, and use the internal state and the last read memory cell to determine which direction to move the tape before reading the next memory cell.
The operations of assignment, conditional, and loop are sufficient to simulate a Turing Machine. Reading and writing memory and maintaining state requires assignment. Changing the direction of the tape based on state and memory contents requires conditionals and loops. "Loops" in fact are a bit more high-level than what is actually required. All that is really required is that program flow can jump backwards somehow. This implies that you can create loops if you want to, but the language does not need to have an explicit loop construct.
Since these three operations allow simulation of a Turing Machine, and a Turing Machine has been proven to be able to compute any computable function, it follows that any language which provides these operations is also able to compute any computable function.
Edit: And, as other answerers pointed out, these operations do not need to be discrete. You can craft a single instruction which does all three of these things (assign, compare, and branch) in such a way that it can simulate a Turing machine all by itself.
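As a small toy sketch of that claim (my own example, not anything from the literature), here is a "Turing machine" written with nothing but assignment, conditionals, and a single loop; the hard-coded machine simply flips every bit on its tape and halts at the first blank:

    #include <stdio.h>

    int main(void)
    {
        char tape[32] = "1011001";   /* '\0' acts as the blank symbol here */
        int  head  = 0;
        int  state = 0;              /* state 0 = scanning, state 1 = halted */

        while (state != 1) {                 /* loop */
            if (tape[head] == '0') {         /* conditional */
                tape[head] = '1';            /* assignment */
                head = head + 1;
            } else if (tape[head] == '1') {
                tape[head] = '0';
                head = head + 1;
            } else {
                state = 1;                   /* blank: halt */
            }
        }

        printf("result: %s\n", tape);        /* prints 0100110 */
        return 0;
    }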
The minimal set is a single command, but you have to choose a fitting one, for example - One instruction set computer
When I studied, we used such a "computer" to calculate factorial, using just a single instruction:
SBN - Subtract and Branch if Negative:
SBN A, B, C
Meaning:
if((Memory[A] -= Memory[B]) < 0) goto C
// (Wikipedia has a slightly different definition)
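To make this concrete, here is a toy SBN interpreter in C (my own sketch, not any particular real OISC), running a short straight-line program that computes X + Y using nothing but that one instruction:

    #include <stdio.h>

    /* One SBN instruction, per the definition above:
     * if ((mem[a] -= mem[b]) < 0) goto c; otherwise fall through. */
    struct sbn { int a, b, c; };

    int main(void)
    {
        /* Data cells: mem[10] = X, mem[11] = Y, mem[12] = result, mem[13] = temp. */
        int mem[16] = { 0 };
        mem[10] = 7;
        mem[11] = 5;

        /* Straight-line program computing mem[12] = X + Y.  Every branch target
         * is simply the next instruction, so negative results fall through too;
         * running off the end of the program halts the machine. */
        struct sbn prog[] = {
            { 13, 13, 1 },   /* temp = 0                        */
            { 13, 10, 2 },   /* temp = -X                       */
            { 12, 12, 3 },   /* result = 0                      */
            { 12, 13, 4 },   /* result = result - temp = X      */
            { 13, 13, 5 },   /* temp = 0                        */
            { 13, 11, 6 },   /* temp = -Y                       */
            { 12, 13, 7 },   /* result = result - temp = X + Y  */
        };
        const int nprog = sizeof prog / sizeof prog[0];

        int pc = 0;
        while (pc >= 0 && pc < nprog) {       /* the whole "CPU" */
            struct sbn in = prog[pc];
            mem[in.a] -= mem[in.b];
            pc = (mem[in.a] < 0) ? in.c : pc + 1;
        }

        printf("X + Y = %d\n", mem[12]);      /* prints 12 */
        return 0;
    }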
Notable one instruction set computer (OISC) implementations
This answer will focus on interesting implementations of single instruction set CPUs, compilers and assemblers.
movfuscator
https://github.com/xoreaxeaxeax/movfuscator
Compiles C code using only mov x86 instructions, showing in a very concrete way that a single instruction suffices.
The Turing completeness seems to have been proven in a paper: https://www.cl.cam.ac.uk/~sd601/papers/mov.pdf
subleq
https://esolangs.org/wiki/Subleq:
https://github.com/hasithvm/subleq-verilog Verilog, Xilinx ISE.
https://github.com/purisc-group/purisc Verilog and VHDL, Altera. Maybe that project has a clang backend, but I can't use it: https://github.com/purisc-group/purisc/issues/5
http://mazonka.com/subleq/sqasm.cpp | http://mazonka.com/subleq/sqrun.cpp C++-based assembler and emulator.
See also
What is the minimum instruction set required for any Assembly language to be considered useful?
https://softwareengineering.stackexchange.com/questions/230538/what-is-the-absolute-minimum-set-of-instructions-required-to-build-a-turing-comp/325501
In 1966, Böhm and Jacopini published a paper in which they demonstrated that all programs could be written in terms of only three control structures:
the sequence structure,
the selection structure,
and the repetition structure.
Programmers using Haskell might argue that you only need the conditional and the loop, because assignments and mutable state don't exist in Haskell.

Is every language eventually compiled into low-level machine language?

Isn't every language eventually compiled into low-level machine language?
If so, shouldn't all languages have the same performance?
Just wondering...
As pointed out by others, not every language is translated into machine language; some are translated into some form (bytecode, reverse Polish, AST) that is interpreted.
But even among languages that are translated to machine code,
Some translators are better than others
Some language features are easier to translate to high-performance code than others
An example of a translator that is better than some others is the GCC C compiler. It has had many years' work invested in producing good code, and its translations outperform those of the simpler compilers lcc and tcc, for example.
An example of a feature that is hard to translate to high-performance code is C's ability to do pointer arithmetic and to dereference pointers: when a program stores through a pointer, it is very difficult for the compiler to know what memory locations are affected. Similarly, when an unknown function is called, the compiler must make very pessimistic assumptions about what might happen to the contents of objects allocated on the heap. In a language like Java, the compiler can do a better job translating because the type system enforces greater separation between pointers of different types. In a language like ML or Haskell, the compiler can do better still, because in these languages, most data allocated in memory cannot be changed by a function call. But of course object-oriented languages and functional languages present their own translation challenges.
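A small, self-contained illustration of that aliasing point (my own example, not from the answer): compile both functions below with -O2 and compare the generated assembly. In the first, the compiler typically has to reload *len on every iteration, because the store through dst might change it; the restrict-qualified version promises no overlap, so the load can be hoisted and the loop optimized more aggressively.

    /* Without more information the compiler must assume dst[i] = 0 could change
     * *len, so *len is typically re-loaded from memory on every iteration. */
    void fill_may_alias(int *dst, const int *len)
    {
        for (int i = 0; i < *len; i++)
            dst[i] = 0;
    }

    /* 'restrict' promises the two pointers don't overlap, so the compiler may
     * hoist the load of *len out of the loop and vectorize more aggressively. */
    void fill_no_alias(int *restrict dst, const int *restrict len)
    {
        for (int i = 0; i < *len; i++)
            dst[i] = 0;
    }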
Finally, translation of a Turing-complete language is itself a hard problem: in general, finding the best translation of a program is an NP-hard problem, which means that the only solutions known potentially take time exponential in the size of the program. This would be unacceptable in a compiler (can't wait forever to compile a mere few thousand lines), and so compilers use heuristics. There is always room for improvement in these heuristics.
It is easier and more efficient to map some languages into machine language than others. There is no easy analogy that I can think of for this. The closest I can come is translating Italian to Spanish vs. translating a Khoisan language into Hawaiian.
Another analogy is saying "Well, the laws of physics are what govern how every animal moves, so why do some animals move so much faster than others? Shouldn't they all just move at the same speed?".
No, some languages are simply interpreted. They never actually get turned into machine code. So those languages will generally run slower than low-level languages like C.
Even for the languages which are compiled into machine code, sometimes what comes out of the compiler is not the most efficient possible way to write that given program. So it's often possible to write programs in, say, assembly language that run faster than their C equivalents, and C programs that run faster than their JIT-compiled Java equivalents, etc. (Modern compilers are pretty good, though, so that's not so much of an issue these days)
Yes, all programs eventually get translated into machine code. BUT:
Some programs get translated during compilation, while others are translated on the fly by an interpreter (e.g. Perl) or a virtual machine (e.g. original Java).
Obviously, the latter is MUCH slower, as you spend time on translation while the program is running.
Different languages can be translated into DIFFERENT machine code, even when the same programming task is being done. So that machine code might be faster or slower depending on the language.
You should understand the difference between compiling (which is translating) and interpreting (which is simulating). You should also understand the concept of a universal basis for computation.
A language or instruction set is universal if it can be used to write an interpreter (or simulator) for any other language or instruction set. Most computers are electronic, but they can be made in many other ways, such as by fluidics, or mechanical parts, or even by people following directions. A good teaching exercise is to write a small program in BASIC and then have a classroom of students "execute" the program by following its steps. Since BASIC is universal (to a first approximation) you can use it to write a program that simulates the instruction set for any other computer.
So you could take a program in your favorite language, compile (translate) it into machine language for your favorite machine, have an interpreter for that machine written in BASIC, and then (in principle) have a class full of students "execute" it. In this way, it is first being reduced to an instruction set for a "fast" machine, and then being executed by a very very very slow "computer". It will still get the same answer, only about a trillion times slower.
Point being, the concept of universality makes all computers equivalent to each other, even though some are very fast and others are very slow.
No, some languages are run by a 'software interpreter' as byte code.
Also, it depends on what the language does behind the scenes, so two identically functioning programs in different languages may have different mechanics under the hood and hence actually run different instructions, resulting in different performance.
