Knowledge required to build your own integer class?

Having hit a brick wall with the .NET Framework's lack of a BigInteger class (so far), I've decided I'd like to develop my own as an exercise (I realize open-source alternatives exist). What hoops do I need to jump through to develop this? Are there any particular pieces of knowledge I probably wouldn't already have?
Edit: side question: which data type would you use to represent the numbers inside your new big integer class?

Arbitrary precision arithmetic?
Edit: To represent your numbers you will probably want a resizable array of integers.
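A minimal sketch of that representation (my own illustration, not part of the original answer): an unsigned big integer stored as a resizable list of 32-bit limbs, least significant limb first, with schoolbook addition.

    # Minimal sketch: an unsigned bignum as a list of 32-bit limbs,
    # least significant limb first.
    BASE = 1 << 32  # each limb holds a value in [0, 2**32)

    def add(a, b):
        """Add two limb lists and return the sum as a new limb list."""
        result, carry = [], 0
        for i in range(max(len(a), len(b))):
            s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
            result.append(s % BASE)
            carry = s // BASE
        if carry:
            result.append(carry)
        return result

    print(add([0xFFFFFFFF], [1]))  # [0, 1], i.e. 2**32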

I would brush up on your basic math skills. When I wrote a big-integer class I had to remember how to add, multiply, and divide by hand, the way we did in elementary school (a sketch of that appears after this answer).
Next, if you are going to create a new class, I would try to follow the conventions that have been established for the Framework, so that it looks like any other .NET class.
I would also follow TDD so you know your class works the way it is designed.
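A hedged sketch of the "by hand" arithmetic (my own illustration, not the answerer's code): schoolbook long multiplication over the limb representation, exactly as taught in elementary school, one limb at a time.

    # Illustrative sketch: schoolbook long multiplication over 32-bit limbs,
    # least significant limb first. Costs O(n*m) limb operations.
    BASE = 1 << 32

    def multiply(a, b):
        """Multiply two limb lists and return the product as a limb list."""
        result = [0] * (len(a) + len(b))
        for i, ai in enumerate(a):
            carry = 0
            for j, bj in enumerate(b):
                cur = result[i + j] + ai * bj + carry
                result[i + j] = cur % BASE
                carry = cur // BASE
            result[i + len(b)] += carry
        while len(result) > 1 and result[-1] == 0:  # trim leading zero limbs
            result.pop()
        return result

    print(multiply([0xFFFFFFFF], [0xFFFFFFFF]))  # [1, 4294967294] == (2**32 - 1)**2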

You need to have a very good understanding of number systems. You could choose to represent the bignum in base 10, base 2, or any base x, and this choice will affect your class's performance a lot. You also have to choose the algorithms you want to implement. In general, great libraries such as GMP choose the algorithm based on the size of the operands. There are a lot of topics you have to be aware of, but in the end you should accept that you are unlikely to produce something that competes with the existing libraries. As a learning exercise it is very valuable, but if you need something for production use, consider NOT reinventing the wheel!

If you want to dive really deep into the math of it, you need to read Donald Knuth (The Art of Computer Programming, Vol. 2: Seminumerical Algorithms covers multiple-precision arithmetic).

.NET 3.5 has a BigInteger class, but it's scoped internal to the CLR. You can see the code using Reflector: open System.Core.dll and look in the System.Numeric namespace; BigInteger is the only class in that namespace.
If you want to see the code for the F# BigInteger class, look in the [F# install folder]\source\fsharp\FSharp.Core\math\z.fs file.

Perhaps studying an already-implemented BigInteger class would help?

If you use elementary algorithms, your BigInts will be unusable for very large integers, for example for Mersenne primes.
If this is OK for you, use a simple 32-bit int as the basic data type; you then need to handle 64-bit intermediate results. If you don't care about speed at all, use some radix-10 base instead, say 10000, which is very easy to implement (see the sketch below).
The speed concern is mainly because multiplication in a naive implementation has O(n^2) runtime, whereas advanced algorithms based on Fourier transforms run in O(n log n).
Those require some mathematical skill and knowledge.
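A hedged illustration of the radix-10 suggestion (my own sketch, not the answerer's code): with base 10000, converting to and from decimal strings is trivial because each limb maps to exactly four decimal digits.

    # Illustrative sketch: bignum in base 10000, limbs least significant first.
    BASE = 10000

    def from_decimal(s):
        """Parse a decimal string into a list of base-10000 limbs."""
        limbs = []
        while s:
            limbs.append(int(s[-4:]))
            s = s[:-4]
        return limbs or [0]

    def to_decimal(limbs):
        """Render base-10000 limbs back into a decimal string."""
        return str(limbs[-1]) + "".join(f"{limb:04d}" for limb in reversed(limbs[:-1]))

    def add(a, b):
        result, carry = [], 0
        for i in range(max(len(a), len(b))):
            s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
            result.append(s % BASE)
            carry = s // BASE
        if carry:
            result.append(carry)
        return result

    x = from_decimal("99999999999999999999")
    print(to_decimal(add(x, from_decimal("1"))))  # 100000000000000000000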

Related

How to use data structures in interviews

This question is about how to best approach a coding interview from a data structures point of view.
The way I see it, there are two different approaches: I could implement a specific DS from scratch, initialise it, and then use it to solve my problem, or I could simply use a library (I'm talking about Node.js here, but I guess this applies to other languages as well, at least those with some built-in support for DS) without worrying about the implementation and focus only on how to use it to solve the problem.
In the first case I'm also demonstrating that I can implement a specific DS from scratch, but I would need more time and there is some additional complexity. Using a library instead would leave me more time to solve the actual problem, but some companies might take a dim view of this approach.
I know there's no silver bullet, and different companies will have different views, but what approach would you take if you could only pick one, and why?
Well, it is generally best to use the library, but it is also important to know how the common library functions work, at least the basic ones.
For example, in many interviews you are asked to implement binary search yourself instead of just calling the library function. That is because knowing the implementation reinforces concepts that carry over to general problem solving, such as other divide-and-conquer algorithms (see the sketch after this answer).
In production-level code we always look for fail-safe, properly tested library code.
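As a hedged example of the kind of from-scratch implementation interviewers tend to ask for (my own sketch, not part of the original answer):

    def binary_search(items, target):
        """Return the index of target in the sorted list items, or -1 if absent."""
        lo, hi = 0, len(items) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            if items[mid] == target:
                return mid
            if items[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1

    print(binary_search([1, 3, 5, 7, 9], 7))  # 3
    print(binary_search([1, 3, 5, 7, 9], 4))  # -1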
You should pick available libraries first. If needed, customize the behavior of the libraries that are already available.

Recommendations for Fast Multipole Method implementation?

I'm interested in implementing the Fast Multipole Method to efficiently simulate a system of repulsive particles.
I've found a large collection of references discussing FMM, but none seem very approachable for non-mathematicians who want to fully understand the algorithm.
Can you recommend a ground-up reference that clearly explains the mathematics behind the process, and includes pseudocode exemplifying a proper implementation?
I am by no means an expert in FMM, but this Java implementation and introduction is the best source I've found so far for explaining it carefully and slowly. The paper is good at defining terms before using them, and the code is at least useful as a reference point. The math still gets hairy very quickly, but it is what it is :)
A pedestrian introduction to fast multipole methods is a close second. It doesn't explain the actual details of a working FMM implementation, but it's a good introduction to the basic ideas.
I like the short course on FMM. It begins with FMM in 1D, then it uses the theory of complex variables to do FMM in 2D. And then there is the crazy 3D version, which uses the theory of spherical harmonic functions and which I guess can be very difficult for a non-mathematician. But if you need FMM only in 2D, you should be fine.
Unfortunately, no pseudocode is given there.
But do you really need the accuracy of FMM? You might be fine with the Barnes-Hut algorithm (see the baseline sketch below).
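For context, a hedged baseline sketch (my own illustration, not from the answer): both FMM and Barnes-Hut approximate the direct O(n^2) pairwise summation below, which is the simple reference implementation to compare against.

    # Illustrative baseline only: direct O(n^2) evaluation of pairwise repulsive
    # forces under a simple inverse-square law. FMM and Barnes-Hut exist to
    # approximate this sum much faster for large n.
    import math

    def direct_forces(positions, strength=1.0):
        """positions: list of (x, y) tuples; returns a list of (fx, fy) forces."""
        forces = []
        for i, (xi, yi) in enumerate(positions):
            fx = fy = 0.0
            for j, (xj, yj) in enumerate(positions):
                if i == j:
                    continue
                dx, dy = xi - xj, yi - yj
                r = math.hypot(dx, dy)
                fx += strength * dx / r**3  # 1/r^2 magnitude, directed away from j
                fy += strength * dy / r**3
            forces.append((fx, fy))
        return forces

    print(direct_forces([(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]))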
After running into a similar issue to you, I ended up writing a fully-documented Python fast multipole method implementation, pybbfmm. I've also written a short, mathematics-free tutorial on how the method works. Together, I think they're substantially more accessible than any of the other presentations I could find.

How to calculate indefinite integral programmatically

I remember solving a lot of indefinite integration problems. There are certain standard methods of solving them, but nevertheless there are problems that take a combination of approaches to arrive at a solution.
But how can we achieve the solution programmatically?
For instance, look at Mathematica's online Integrator app. How do we approach writing such a program, one that accepts a function as an argument and returns the indefinite integral of that function?
P.S. The input function can be assumed to be continuous (i.e. it is not, for instance, sin(x)/x).
There is Risch's algorithm, which is subtly undecidable (since you must decide whether two expressions are equal, a problem akin to the ubiquitous halting problem) and really long to implement.
If you're into complicated stuff, solving an ordinary differential equation is actually no harder (and computing an indefinite integral is equivalent to solving y' = f(x)). There is a differential Galois theory which mimics Galois theory for polynomial equations (but with Lie groups of symmetries of solutions instead of finite groups of permutations of roots). Risch's algorithm is based on it.
The algorithm you are looking for is the Risch algorithm:
http://en.wikipedia.org/wiki/Risch_algorithm
I believe it is a bit tricky to use. This book:
http://www.amazon.com/Algorithms-Computer-Algebra-Keith-Geddes/dp/0792392590
has a description of it. A 100-page description.
You keep a set of basic forms you know the integrals of (polynomials, elementary trigonometric functions, etc.) and match them against the form of the input. This is doable if you don't need much generality: it's very easy to write a program that integrates polynomials, for example.
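A hedged illustration of how easy the polynomial case is (my own sketch, not the answerer's code): with a polynomial stored as a list of coefficients, term-by-term integration is a few lines.

    from fractions import Fraction

    def integrate_poly(coeffs):
        """Indefinite integral of a0 + a1*x + a2*x**2 + ..., given as [a0, a1, a2, ...].
        The constant of integration is omitted."""
        return [Fraction(0)] + [Fraction(a, k + 1) for k, a in enumerate(coeffs)]

    # Integral of 1 + 2x + 3x^2 is x + x^2 + x^3 (+ C)
    print(integrate_poly([1, 2, 3]))  # [Fraction(0, 1), Fraction(1, 1), Fraction(1, 1), Fraction(1, 1)]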
If you want to do it in the most general case possible, you'll have to do much of the work that computer algebra systems do. It is a lifetime's work for some people: if you look at Risch's "algorithm" mentioned in other answers, or at symbolic integration in general, you can see that entire multi-volume books have been written on the topic (e.g. Manuel Bronstein, Symbolic Integration I, Springer), and very few existing computer algebra systems implement it in full generality.
If you really want to code it yourself, you can look at the source code of Sage or of the several projects listed among its components. Of course, it's easier to use one of these programs or, if you're writing something bigger, to use one of them as a library.
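As an example of the library route (my own addition; SymPy is not mentioned in the answer, but it is one such open-source symbolic library):

    import sympy as sp

    x = sp.Symbol('x')
    print(sp.integrate(sp.sin(x) * sp.exp(x), x))  # exp(x)*sin(x)/2 - exp(x)*cos(x)/2
    print(sp.integrate(1 / (x**2 + 1), x))         # atan(x)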
These expert systems usually have a huge collection of techniques and simply try one after another.
I'm not sure about Wolfram Mathematica, but in Maple there's a command that enables displaying all intermediate steps. If you use it, you get as output all the techniques that were tried.
Edit:
Transforming the input should not be the really tricky part: you need to write a lexer and a parser that turn the textual input into an internal representation.
Good luck. Mathematica is a very complex piece of software, and symbolic manipulation is what it does best. If you are interested in the topic, take a look at these books:
http://www.amazon.com/Computer-Algebra-Symbolic-Computation-Elementary/dp/1568811586/ref=sr_1_3?ie=UTF8&s=books&qid=1279039619&sr=8-3-spell
Also, going to the source wouldn't hurt either. This book actually explains the inner workings of Mathematica:
http://www.amazon.com/Mathematica-Book-Fourth-Stephen-Wolfram/dp/0521643147/ref=sr_1_7?ie=UTF8&s=books&qid=1279039687&sr=1-7

Algebraic logic

Both Wolfram Alpha and Bing now provide the ability to solve complex algebraic logic problems (i.e. "solve for x, given this equation"), not just evaluate simple arithmetic expressions (e.g. "what's 5+5?"). How is this done?
I can read most types of code that might get thrown at me, so it doesn't really make a difference what you use to explain and represent the algorithm. I find that Bash makes really good pseudocode, not to mention it's actually functional, so that would be ideal. Also, I'm fairly familiar with its ins and outs. Sorry to go ranting on a tangent, but it really irritates me to see people spend effort crunching out "pseudocode" when they could produce something 100% functional for just slightly more effort. Anyway, thanks so much in advance.
There are two main methods of solving such equations:
Numerical methods. These mean, basically, that the solver keeps changing the value of x until the equation is satisfied (see the bisection sketch after this answer). More info on numerical methods.
Symbolic math. The solver manipulates the equation as a string of symbols according to a number of formal rules. It's not that different from the algebra we learn in school; the solver just knows a lot of different rules. More info on computer algebra.
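A hedged sketch of the numerical approach (my own illustration, not from the answer): bisection repeatedly narrows an interval on which f(x) = 0 changes sign.

    def bisect(f, lo, hi, tol=1e-12):
        """Find a root of f in [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
        if f(lo) * f(hi) > 0:
            raise ValueError("f must change sign on [lo, hi]")
        while hi - lo > tol:
            mid = (lo + hi) / 2
            if f(lo) * f(mid) <= 0:
                hi = mid
            else:
                lo = mid
        return (lo + hi) / 2

    # "Solve for x": x**2 - 2 == 0 on [0, 2] gives sqrt(2)
    print(bisect(lambda x: x**2 - 2, 0.0, 2.0))  # ~1.4142135623730951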
Wolfram|Alpha (W|A) is based on the Mathematica kernel, combined with a natural language parser (which is itself built primarily with Mathematica). They have a whole heap of curated data and associated formulas that can be used once the question has been interpreted.
There's a blog post describing some of this, which came out at the same time as W|A.
Finally, Bing simply uses the (non-free) W|A API to answer questions.

Choosing a multiplier for a (string) hash function

Do you have any advice/rules for selecting a multiplier to use in a (multiplicative) hash function? The function computes the hash value of a string.
You want to use something that is relatively prime to the size of your set. That way, when you loop around, you won't end up on the same numbers you just tried.
I had an interesting discussion with a coworker about hash functions recently. Our conclusions were as follows:
If you really need to write a hash function that minimizes collisions better than the default implementations available in the standard languages, you need an advanced degree in mathematics.
If you're writing applications where a custom hash function will noticeably improve the performance of your application, you're Google and you've got plenty of math PhDs to do the work.
Sorry not to answer your question directly, but the bottom line is that there's really no need to write your own hash function for strings. What language are you working with? I'd imagine there's an easy way to compute a "good enough" hash code.
Historically, 33 seems to be a popular choice, and it tends to work pretty well. No one really knows why, though. For more details, look here.
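A hedged illustration of the multiplier-33 scheme (my own sketch; this is the classic djb2-style multiplicative hash, not code from the answer):

    def hash_string(s, multiplier=33, table_size=1 << 16):
        """Multiplicative string hash: h = h*multiplier + byte, reduced modulo the table size.
        Keeping the multiplier and table size relatively prime helps spread keys."""
        h = 5381  # traditional djb2 seed
        for byte in s.encode("utf-8"):
            h = (h * multiplier + byte) & 0xFFFFFFFF  # keep the state in 32 bits
        return h % table_size

    print(hash_string("hello"))
    print(hash_string("world"))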
