Choosing a multiplier for a (string) hash function - performance

Do you have any advice/rules on selecting a multiplier to use in a (multiplicative) hash function. The function is computing the hash value of a string.

You want to use something that is relatively prime to the size of your set. That way, when you loop around, you won't end up on the same numbers you just tried.
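To see why coprimality matters, here is a tiny Python sketch (the table size of 8 and the probe steps are just illustrative values, not anything from the question):

# A step that shares a factor with the table size keeps revisiting the same
# slots; a step that is relatively prime to it reaches every slot before
# wrapping around.
table_size = 8
for step in (2, 3):  # 2 shares a factor with 8; 3 is coprime to 8
    visited = sorted({(i * step) % table_size for i in range(table_size)})
    print(step, visited)
# 2 [0, 2, 4, 6]               <- only half the slots are ever reached
# 3 [0, 1, 2, 3, 4, 5, 6, 7]   <- every slot is reached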

I had an interesting discussion with a coworker about hash functions recently. Our conclusions were as follows:
If you really need to write a good hash function that minimizes collisions more than the default implementations available in the standard languages you need an advanced degree in mathematics.
If you're writing applications where a custom hash function will noticeably improve the performance of your application, you're Google and you've got plenty of Math PhDs to do the work.
Sorry to not directly answer your question, but the bottom line is that there's really no need to write your own hash function for String. What language are you working with? I'd imagine there's an easy way to compute a "good enough" hash code.

Historically, 33 seems to be a popular choice, and it tends to work pretty well, though no one really knows why. For more details, look here
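For concreteness, here is a minimal Python sketch of a multiplicative string hash with 33 as the multiplier (the well-known "times 33" / djb2 scheme; the 5381 seed and the 32-bit mask are conventional choices, not something from the question):

def hash33(s):
    # Multiplicative string hash: h = h * 33 + next character code.
    h = 5381                                   # conventional djb2 seed
    for ch in s:
        h = (h * 33 + ord(ch)) & 0xFFFFFFFF    # keep the value in 32 bits
    return h

print(hash33("hello") % 1024)                  # slot index in a 1024-bucket table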

Related

How to avoid exponential notation using Scratch?

In my program, I have a large string of numbers that have been compiled together, and I'm switching it back and forth between different base values. But when I switch back to decimal, the computer directly switches to a number using exponential notation. The program I'm using is Scratch, but as long as any algorithms that are given are readable, I should be able to translate.
Essentially, I just need a way to go from like 1.0e13 to 10000000000000. Any ideas?
This script is the best I could muster:
And a sample output:
As well as a project containing the custom block for your convenience: https://scratch.mit.edu/projects/150067538/
Unfortunately, Scratch still rounds numbers, so your answers won't always be 100% exact, but at least they won't be in scientific (e) notation. If somebody else has an even better solution, I'd love to see it.
Like PullJosh said (hey again, PullJosh!), Scratch rounds numbers off in scientific notation, so it won't be exactly accurate, but there is always a solution to a problem!
My theory is that you can put each digit of the scientific notation into a list, which will make the conversion much easier. I won't take a picture of my code, but I'll send you the link to it, as the code is massive, mostly because I added some code that detects whether your scientific notation is a valid number; it can also convert numbers like 1.123e2 (see the sketch below for the same idea outside of Scratch).
https://scratch.mit.edu/projects/341550388/editor
You can use the code without credit, yay! Just put it in your backpack and you're good to go.
Edit: Also, if you need more help with Scratch and stuff, feel free to follow me # endermite334 (you don't have to) and I will be happy to help you!
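For anyone translating this approach into a text-based language, here is a minimal Python sketch of the same digit-by-digit idea (the function name and the sign handling are my own assumptions, not taken from either Scratch project):

def expand_scientific(s):
    # Expand a string like "1.0e13" or "1.123e2" into plain decimal notation
    # using only digit/string operations, so the steps map onto Scratch blocks.
    mantissa, _, exp_part = s.lower().partition("e")
    exponent = int(exp_part or "0")
    sign = ""
    if mantissa[0] in "+-":
        sign = "-" if mantissa[0] == "-" else ""
        mantissa = mantissa[1:]
    whole, _, frac = mantissa.partition(".")
    digits = whole + frac                 # all digits with the point removed
    point = len(whole) + exponent         # where the decimal point now belongs
    if point <= 0:                        # e.g. 1.2e-3 -> 0.0012
        return sign + "0." + "0" * (-point) + digits
    if point >= len(digits):              # e.g. 1.0e13 -> 10000000000000
        return sign + digits + "0" * (point - len(digits))
    return sign + digits[:point] + "." + digits[point:]

print(expand_scientific("1.0e13"))    # 10000000000000
print(expand_scientific("1.123e2"))   # 112.3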

Should you always document functions, even if redundant (specifically python)?

I try to use function names that are active and descriptive, which I then document with active and descriptive text (!). This generates redundant-looking code.
Simplified (but not so unrealistic) example in python, following numpy docstring style:
import scipy.linalg

def calculate_inverse(matrix):
    """Calculate the inverse of a matrix.

    Parameters
    ----------
    matrix : ndarray
        The matrix to be inverted.

    Returns
    -------
    matrix_inv : ndarray
        The inverse of the matrix.
    """
    matrix_inv = scipy.linalg.inv(matrix)
    return matrix_inv
Specifically for python, I have read PEP-257 and the sphinx/napoleon example numpy and Google style docstrings. I like that I can automatically generate documentation for my functions, but what is the "best practice" for redundant examples like above? Should one simply not document "obvious" classes, functions, etc? The degree of "obviousness" then of course becomes subjective ...
I have in mind open-source, distributed code. Multiple authors suggests that the code itself should be readable (calculate_inverse(A) better than dgetri(A)), but multiple end-users would benefit from sphinx-style documentation.
I've always followed the guideline that the code tells you what it does, the comments are added to explain why it does something.
If you can't read the code, you have no business looking at it, so having (in the extreme):
index += 1 # move to next item
is a total waste of time. So is a comment on a function called calculate_inverse(matrix) which states that it calculates the inverse of the matrix.
Whereas something like:
# Use Pythagoras theorem to find hypotenuse length.
hypo = sqrt (side1 * side1 + side2 * side2)
might be more suitable since it adds the information on where the equation came from, in case you need to investigate it further.
Comments should really be reserved for added information, such as the algorithm you use for calculating the inverse. In this case, since your algorithm is simply handing off the work to scipy, it's totally unnecessary.
If you must have a docstring here for auto-generated documentation, I certainly wouldn't be going beyond the one-liner variant for this very simple case:
"""Return the inverse of a matrix"""
"Always"? Definitively not. Comment as little as possible. Comments lie. They always lie, and if they don't, then they will be lying tomorrow. The same applies to many docs.
The only times (imo) that you should be writing comments/documentation for your code is when you are shipping a library to clients/customers or if you're in an open source project. In these cases you should also have a rigorous standard so there is never any ambiguity what should and should not be documented, and how.
In these cases you also need to have an established workflow regarding who is responsible for updating the docs, since they will get out of sync with the code all the time.
So in summary, never ever comment/document if you can help it. If you have to (because of shipping libs/doing open source), do it Properly(tm).
Clear, concise, well written, and properly placed comments are often useful. In your example, however, I think the code stands alone without the comments. It can go both ways. Comments range from needed and excellent to completely useless.
This is an important topic. You should read the chapter on comments in “Clean Code: A Handbook of Agile Software Craftsmanship,” by Robert Martin and others (2008). Chapter 4, “Comments,” starts with this assertion, “Clear and expressive code with few comments is far superior to cluttered and complex code with lots of comments. Rather than spend your time writing the comments that explain the mess you’ve made, spend it cleaning the mess.” The chapter continues with an excellent discussion on comments.
Yes, you should always document functions.
Many answers here talk about commenting your code, which is a very different thing. I'm talking about docstrings, which document your interface.
Docstrings are useful, because you can get interactive help in python interpreter. For example,
import math
help(math)
shows you the following help:
    ...
    cos(...)
        cos(x)

        Return the cosine of x (measured in radians).

    cosh(...)
        cosh(x)

        Return the hyperbolic cosine of x.
    ...
Note that even though cos and cosh are very familiar (and exactly repeat functions from C math.h), they are documented. For cos it is stated explicitly that its argument should be in radians. For your example it would be useful to know what a matrix could be. Is it an array of arrays? A tuple of tuples, or an ndarray, as you correctly wrote in its proper documentation? Will a rectangular or zero matrix suit?
Another 'familiar' function is chdir from os, which is documented like this:
    chdir(...)
        chdir(path)

        Change the current working directory to the specified path.
Frankly speaking, not all functions in standard library modules are documented. I found an undocumented method of the statvfs_result class in os:
| __reduce__(...)
Maybe it is still a good example of why you should document. I admit that I forgot what reduce does, so I've no idea about this method. The more familiar __eq__ and __ne__ are documented in that class (like x.__eq__(y) <==> x==y).
If you don't document your function, the help for your module will look like this:
calculate_inverse(matrix)
Functions will also clump together more, because it is the docstrings that take up the additional vertical space.
Write a docstring for a person who doesn't see your code. If the function is really simple, the docstring should be simple as well. It will give confidence that the function really is that simple, and that nothing unexpected will come out of that undocumented function (if they didn't bother to write documentation, were they competent and responsible enough to produce good code?).
The spirit of PEPs and other guidelines is that code should be good for all.
I'm pretty sure that somebody will someday have difficulty with something that is obvious to you.
I'm (currently) writing from a laptop with a not-very-large screen and only one window open in vim, but I write in conformance with PEP 8, which says: "Limiting the required editor window width makes it possible to have several files open side-by-side, and works well when using code review tools that present the two versions in adjacent columns". PEP 257 recommends docstrings that work well with Emacs' fill-paragraph.
So, I don't know of any good example where omitting a docstring is worthwhile. But since PEPs and guidelines are only recommendations, you can omit a docstring if your function won't be used by many people, if you won't use it in the future, and if you don't care about writing good code (at least there).

How to calculate indefinite integral programmatically

I remember solving a lot of indefinite integration problems. There are certain standard methods of solving them, but nevertheless there are problems which take a combination of approaches to arrive at a solution.
But how can we arrive at the solution programmatically?
For instance, look at the online integrator app of Mathematica. So how do we approach writing such a program, which accepts a function as an argument and returns the indefinite integral of that function?
PS: The input function can be assumed to be continuous (i.e., it is not, for instance, sin(x)/x).
You have Risch's algorithm, which is subtly undecidable (since you must decide whether two expressions are equal, akin to the ubiquitous halting problem) and takes a really long time to implement.
If you're into complicated stuff, solving an ordinary differential equation is actually not harder (and computing an indefinite integral is equivalent to solving y' = f(x)). There exists a Galois differential theory which mimics Galois theory for polynomial equations (but with Lie groups of symmetries of solutions instead of finite groups of permutations of roots). Risch's algorithm is based on it.
The algorithm you are looking for is Risch's algorithm:
http://en.wikipedia.org/wiki/Risch_algorithm
I believe it is a bit tricky to use. This book:
http://www.amazon.com/Algorithms-Computer-Algebra-Keith-Geddes/dp/0792392590
has a description of it: a 100-page description.
You keep a set of basic forms you know the integrals of (polynomials, elementary trigonometric functions, etc.) and you use them on the form of the input. This is doable if you don't need much generality: it's very easy to write a program that integrates polynomials, for example.
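As an illustration of how easy the polynomial case is, here is a minimal Python sketch (the coefficient-list representation is just one convenient assumption):

from fractions import Fraction

def integrate_poly(coeffs):
    # coeffs[i] is the coefficient of x**i; integrate term by term using
    #   integral of a*x**n dx = a/(n+1) * x**(n+1), with the constant set to 0.
    return [Fraction(0)] + [Fraction(a, n + 1) for n, a in enumerate(coeffs)]

# integral of (1 + 2x + 3x^2) dx = x + x^2 + x^3
print(integrate_poly([1, 2, 3]))   # [Fraction(0, 1), Fraction(1, 1), Fraction(1, 1), Fraction(1, 1)]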
If you want to do it in the most general case possible, you'll have to do much of the work that computer algebra systems do. It is a lifetime's work for some people, e.g. if you look at Risch's "algorithm" posted in other answers, or symbolic integration, you can see that there are entire multi-volume books ("Manuel Bronstein, Symbolic Integration Volume I: Springer") that have been written on the topic, and very few existing computer algebra systems implement it in maximum generality.
If you really want to code it yourself, you can look at the source code of Sage or the several projects listed among its components. Of course, it's easier to use one of these programs, or, if you're writing something bigger, use one of these as libraries.
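As a concrete example of leaning on such a library, SymPy (a Python computer-algebra system that is also shipped as a Sage component) handles integrals well beyond simple table lookups; the specific integrands below are just illustrations:

import sympy as sp

x = sp.symbols("x")
print(sp.integrate(x * sp.sin(x), x))   # -x*cos(x) + sin(x)
print(sp.integrate(sp.exp(-x**2), x))   # sqrt(pi)*erf(x)/2  (a non-elementary result)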
These expert systems usually have a huge collection of techniques and simply try one after another.
I'm not sure about Wolfram Mathematica, but in Maple there's a command that enables displaying all intermediate steps. If you use it, you get all of the tried techniques as output.
Edit:
Transforming the input should not be the really tricky part: you need to write a lexer and a parser that transform the textual input into an internal representation.
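For a sense of what that internal representation looks like, here is a small SymPy example (sympify parses a textual expression, srepr prints the resulting expression tree; the chosen expression is arbitrary):

import sympy as sp

expr = sp.sympify("x**2 + sin(x)")          # text -> expression tree
print(sp.srepr(expr))                       # e.g. Add(Pow(Symbol('x'), Integer(2)), sin(Symbol('x')))
print(sp.integrate(expr, sp.Symbol("x")))   # x**3/3 - cos(x)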
Good luck. Mathematica is a very complex piece of software, and symbolic manipulation is something that it does best. If you are interested in the topic, take a look at these books:
http://www.amazon.com/Computer-Algebra-Symbolic-Computation-Elementary/dp/1568811586/ref=sr_1_3?ie=UTF8&s=books&qid=1279039619&sr=8-3-spell
Going to the source wouldn't hurt either; this book actually explains the inner workings of Mathematica:
http://www.amazon.com/Mathematica-Book-Fourth-Stephen-Wolfram/dp/0521643147/ref=sr_1_7?ie=UTF8&s=books&qid=1279039687&sr=1-7

Knowledge required to build your own integer class?

Upon hitting a brick wall with the .NET Framework's lack (so far) of a BigInteger class, I've decided I'd like to develop my own as an exercise (I realize open-source alternatives exist). What hoops do I need to jump through to develop this? Are there any particular pieces of knowledge that I probably wouldn't have?
edit: side question. Which data type would you use to represent the numbers inside of your new big integer class?
Arbitrary precision arithmetic?
Edit: To represent your numbers you will probably want a resizeable array of integers.
I would brush up on your basic math skills. When I wrote a BigInt class, I had to remember how to add, multiply, and divide by hand like in elementary school.
Next, if you are going to create a new class, I would try to follow the conventions established for the Framework, so it looks like any other .NET class.
I would follow TDD so you know your class works the way it is designed.
You need to have a very good understanding of number systems. You could choose to represent the bignum in base 10, base 2, or any base x; this choice will affect your class's performance a lot. You also have to choose the algorithms you want to implement. In general, great libraries like GMP choose the algorithm based on the size of the operands. There are a lot of topics you have to be aware of, and in the end you should accept that you're unlikely to produce something that competes with existing libraries. As a learning exercise it is very valuable, but for producing something useful, consider NOT reinventing the wheel!
If you want to dive really deep into the math of it, you need to read Donald Knuth.
.Net 3.5 has a BigInteger class, but it's scoped internal to the CLR. You can see the code using Reflector. Open System.Core.dll and look in the System.Numeric namespace. BigInteger is the only class in that namespace.
If you want to see the code for the F# BigInteger class, look in the [F# install folder]\source\fsharp\FSharp.Core\math\z.fs file.
Maybe studying an already implemented BigInteger class might help?
If you use elementary algorithms, your BigInts will be unusable for very large integers, for example numbers on the scale of large Mersenne primes.
If this is OK for you, use a simple 32-bit int as the basic data type; you then need to handle 64-bit intermediate results. If you don't care about speed at all, use some power of ten as the base, let's say 10000, which is very easy to implement (a small sketch of this limb-based approach follows below).
This is mainly because multiplication in a naive implementation has O(n^2) runtime, while advanced algorithms based on Fourier transforms have O(n log n) runtime.
This requires some mathematical skills and knowledge.
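A minimal Python sketch of that limb representation (base 10000, least-significant limb first; the helper names are my own, not from any particular library):

BASE = 10000   # each limb holds four decimal digits

def to_limbs(n):
    limbs = []
    while n:
        n, r = divmod(n, BASE)
        limbs.append(r)
    return limbs or [0]

def add(a, b):
    # Schoolbook addition: limb by limb, propagating the carry.
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        carry, limb = divmod(s, BASE)
        result.append(limb)
    if carry:
        result.append(carry)
    return result

def mul(a, b):
    # Schoolbook multiplication: the O(n^2) algorithm mentioned above.
    result = [0] * (len(a) + len(b))
    for i, ai in enumerate(a):
        carry = 0
        for j, bj in enumerate(b):
            carry, result[i + j] = divmod(result[i + j] + ai * bj + carry, BASE)
        result[i + len(b)] += carry
    while len(result) > 1 and result[-1] == 0:   # trim leading zero limbs
        result.pop()
    return result

print(add(to_limbs(123456789), to_limbs(987654321)))   # [1110, 1111, 11], i.e. 1111111110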

Algebraic logic

Both Wolfram Alpha and Bing are now providing the ability to solve complex algebraic logic problems (i.e., "solve for x, given this equation"), and not just evaluate simple arithmetic expressions (e.g., "what's 5+5?"). How is this done?
I can read most types of code that might get thrown at me, so it doesn't really make a difference what you use to explain and represent the algorithm. I find that bash makes really good pseudo-code, not to mention it's actually functional, so that would be ideal; I'm also fairly familiar with its ins and outs. Sorry to go ranting on a tangent, but it really irritates me to see people spend effort on crunching out "pseudocode" when they could be producing something 100% functional for just slightly more effort. Anyway, thanks so much in advance.
There are two main methods of solving such equations (a small sketch of both follows below):
Numerical methods: the solver basically keeps adjusting the value of x until the equation is satisfied. More info on numerical methods.
Symbolic math: the solver manipulates the equation as a string of symbols, according to a number of formal rules. It's not that different from the algebra we learn in school; the solver just knows a lot of different rules. More info on computer algebra.
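A minimal Python sketch of both approaches (the example equations, the bisection bracket, and the tolerance are all arbitrary assumptions):

# Numeric: keep adjusting x until f(x) is (nearly) zero, here by bisection.
def bisect(f, lo, hi, tol=1e-12):
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:   # the sign change stays in the left half
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

print(bisect(lambda x: x**3 - x - 2, 1, 2))   # ~1.52138, the real root

# Symbolic: a computer-algebra system rearranges the expression itself.
import sympy as sp
x = sp.symbols("x")
print(sp.solve(sp.Eq(x**2 - 5*x + 6, 0), x))  # [2, 3]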
Wolfram|Alpha (W|A) is based on the Mathematica kernel, combined with a natural language parser (which is also built primarily with Mathematica). They have a whole heap of curated data and associated formula that can be used once the question has been interpreted.
There's a blog post describing some of this which came out at the same time as W|A.
Finally, Bing simply uses W|A's (non-free) API to answer these questions.

Resources