Is a lookup in a hash table O(1)? - complexity-theory

If a hash table holds N distinct items, and is not overloaded, then the hashes for the N items must have approximately lg(N) bits; otherwise too many items will get the same hash value.
But a hash table lookup is usually said to take O(1) time on average.
It's not possible to generate lg(N) bits in O(1) time, so the standard results for the complexity of hash tables are wrong.
What's wrong with my reasoning?

What is wrong with your reasoning is the use of conflicting definitions of "time".
When one says that lookup in a hash table takes O(1) time, one usually means that it takes O(1) comparisons, that is, the number of comparisons required to find an item is bounded above by a constant. Under this idea of "time", the actual time (as in the thing you would measure in seconds) used to compute the hash causes no variation.
Measuring time in comparisons is an approximation that, while it may not reflect reality in the same way that measuring it in seconds would, still provides useful information about the behaviour of the hash table.
This sort of thing is true for most asymptotic complexity descriptions of algorithms: people often use "time" with a very abstract meaning that isn't the informal meaning of "time", but more often than not is some variation of "number of operations" (with the kind of operation often left unstated, expected to be obvious, or clear from context).
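To make "O(1) comparisons" concrete, here is a minimal sketch that builds a chained table and counts equality comparisons per successful lookup. It is not taken from the answer above: the helper names (build_table, mean_comparisons, experiment), the seeded random integer keys, and the choice of one bucket per key are all assumptions made for illustration.

```python
import random

def build_table(keys, num_buckets):
    """Chained hash table: a list of buckets, each a list of keys."""
    table = [[] for _ in range(num_buckets)]
    for key in keys:
        table[hash(key) % num_buckets].append(key)
    return table

def mean_comparisons(table, keys):
    """Average number of equality comparisons per successful lookup."""
    total = 0
    for key in keys:
        bucket = table[hash(key) % len(table)]
        total += bucket.index(key) + 1  # comparisons made up to and including the match
    return total / len(keys)

def experiment(n, seed=0):
    rng = random.Random(seed)
    keys = rng.sample(range(10**9), n)  # n distinct integer keys
    table = build_table(keys, n)        # load factor 1.0
    return mean_comparisons(table, keys)

for n in (1_000, 10_000):
    print(n, round(experiment(n), 3))
```

Under these assumptions the average should stay close to 1.5 comparisons as n grows, even though nothing here measures the seconds spent computing each hash.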

The analysis is based on the assumption that the hash function is fixed and not related to the actual number of elements stored in the table. Rather than saying that the hash function returns a lg N-bit value if there are N elements in the hash table, the analysis is based on a hash function that returns, say, a k-bit value, where k is independent of N. Typical values of k (such as 32 or 64) provide for a hash table far larger than anything you need in practice.
So in one sense, yes, a table holding N elements requires a hash function that returns O(lg N) bits; but in practice, a constant that is far larger than the anticipated maximum value of lg N is used.
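As one concrete instance of a fixed k, CPython reports the width of its hash values in sys.hash_info.width. The sketch below only checks that built-in hashes fit in that width regardless of how large the key is; the helper names fits_in_hash_width and bucket_index are made up for this example.

```python
import sys

# Width in bits of the values returned by hash(); fixed for a given interpreter build.
HASH_BITS = sys.hash_info.width

def fits_in_hash_width(value):
    """True if hash(value) fits in a signed HASH_BITS-bit integer."""
    h = hash(value)
    return -(2 ** (HASH_BITS - 1)) <= h < 2 ** (HASH_BITS - 1)

def bucket_index(value, num_buckets):
    """Reduce the fixed-width hash to a slot in a table of any size."""
    return hash(value) % num_buckets

samples = [7, 2 ** 200, "short", "x" * 10_000, (1, 2, 3)]
print(HASH_BITS, all(fits_in_hash_width(s) for s in samples))
```

The hash width does not change with the number of stored items; only the final reduction modulo the table size does.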

Hashtable search is O(1).
I think you are mixing up insertion (which is O(n)) and search.

Related

Time complexity for hash tables when inserting and searching

Looking at Wikipedia's article on hash tables, it says that inserting and searching are O(1). My concern is that my teacher told me that only the lookup is O(1) and that hashing is O(s), where s is the length of the string. Shouldn't inserting and searching be O(s) instead, since hashing(s) + lookup(s) = O(hashing(s) + lookup(s)) = O(s)?
Could anyone explain to me the correct way of writing the time complexity of hash tables in big O notation, and why? Assume perfect hashing, so that no collisions occur.
Hash tables are used for more than just strings. The O(1) complexities for insert and lookup are for hash tables in general and only count the known operations.
Hashing and comparison are counted as O(1), because something must always be done for those, even if you're just storing integers, but we don't know what that is.
If you use a hash table for some data type (like strings) that multiplies the cost of those operations then it will multiply the complexity.
It is actually very important to consider this when measuring the complexity of a concrete algorithm that uses hash tables. Many of the string-based algorithms on this site, for example, are given complexities based on the assumption that the length of input strings is bounded by some constant. Thankfully that is usually the case.
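To see where the O(s) term comes from, here is a small sketch of a byte-at-a-time string hash. FNV-1a is used purely as a familiar example, not because any particular hash table uses it; the function name fnv1a_64 and the extra step counter are assumptions added for illustration.

```python
FNV_OFFSET_BASIS = 0xCBF29CE484222325  # 64-bit FNV offset basis
FNV_PRIME = 0x100000001B3              # 64-bit FNV prime

def fnv1a_64(data: bytes):
    """Return (hash, steps): the 64-bit FNV-1a hash and the loop iterations used."""
    h = FNV_OFFSET_BASIS
    steps = 0
    for byte in data:
        h ^= byte                       # mix in one byte of the key
        h = (h * FNV_PRIME) % 2**64     # keep the state at 64 bits
        steps += 1
    return h, steps

for key in (b"", b"hello", b"x" * 1000):
    value, steps = fnv1a_64(key)
    print(len(key), steps, hex(value))
```

The loop runs once per byte, so hashing costs O(s) for a key of s bytes, while the result always fits in 64 bits. If key length is bounded by a constant, that cost folds into the O(1).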
This question is very similar to a question I asked: Is a lookup in a hash table O(1)?
The accepted answer was that for hashtables, "time" is measured in comparisons, and not operations. Here's the full answer, quoted:
What is wrong with your reasoning is the use of conflicting definitions of "time".
When one says that lookup in a hash table takes O(1) time, one usually means that it takes O(1) comparisons, that is, the number of comparisons required to find an item is bounded above by a constant. Under this idea of "time", the actual time (as in the thing you would measure in seconds) used to compute the hash causes no variation.
Measuring time in comparisons is an approximation that, while it may not reflect reality in the same way that measuring it in seconds would, still provides useful information about the behaviour of the hash table.
This sort of thing is true for most asymptotic complexity descriptions of algorithms: people often use "time" with a very abstract meaning that isn't the informal meaning of "time", but more often than not is some variation of "number of operations" (with the kind of operation often left unstated, expected to be obvious, or clear from context).

Why doesn't the complexity of hashing and BSTs factor in the time required to process the bytes of the keys?

I have a basic question on the time complexity of basic operations when using hash tables as opposed to binary search trees (or balanced ones).
In basic algorithm courses, which are unfortunately the only type I have studied, I learned that ideally, the time complexity of look-up/insert using hash tables is O(1). For binary (search) trees, it is O(log(n)) where "n" is the "number" of input objects. So far, the hash table is the winner (I guess) in terms of asymptotic access time.
Now take "n" as the size of the data structure array, and "m" as the number of distinct input objects (values) to be stored in the DS.
For me, there is an ambiguity in the time complexity of data structure operations (e.g., lookup). Is it really possible to do Hashing with a "calculation/evaluation" complexity constant time in "n"? Specifically, if we know we have "m" distinct values for the objects which are being stored, then can the hash function still run faster than "Omega (log(m))"?
If not, then I would claim that the complexity for nontrivial applications has to be O( log ( n ) ) since in practice "n" and "m" are not drastically different.
I can't see a way such a function can be found. For example, let m = 2^O(k) be the total number of distinct strings of length "k" bytes. A hash function has to go over all "k" bytes, and even if it takes only constant time to do the calculations for each byte, the overall time needed to hash the input is Omega(k) = Omega(log(m)).
Having said that, for cases where the number of potential inputs is comparable to the size of the table, e.g., "m" is almost equal to "n", the hashing complexity does not look like constant time to me.
Your concern is valid, though I think there's a secondary point you're missing. If you factor in the time required to look through all the bytes of the input into the calculation of the time complexity of a BST, you would take the existing O(log n) time and multiply it by the time required for each comparison, which would be O(log m). You'd then get O(log n log m) time for searches in a BST.
Typically, the time complexities stated for BSTs and hash tables are not the real time complexities, but the number of "elementary operations" on the underlying data types. For example, a hash table does, in expectation, O(1) hashes and comparisons of the underlying data types. A BST will do O(log n) comparisons of the underlying data types. If those comparisons or hashes don't take time O(1), then the time required to do the lookups won't be O(1) (for hash tables) or O(log n) (for BSTs).
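The gap between "number of comparisons" and real work can be made visible by counting both. In the sketch below a binary search over a sorted list stands in for a balanced BST, and the keys share a long prefix so that each comparison is expensive. The names compare and binary_search, the counter dictionary, and the key layout are all assumptions chosen for the example.

```python
def compare(a, b, counter):
    """Three-way string comparison that counts the characters examined."""
    for ca, cb in zip(a, b):
        counter["chars"] += 1
        if ca != cb:
            return -1 if ca < cb else 1
    return (len(a) > len(b)) - (len(a) < len(b))

def binary_search(sorted_keys, target, counter):
    """Return the index of target, counting key and character comparisons."""
    lo, hi = 0, len(sorted_keys) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        counter["keys"] += 1
        c = compare(target, sorted_keys[mid], counter)
        if c == 0:
            return mid
        if c < 0:
            hi = mid - 1
        else:
            lo = mid + 1
    return -1

prefix = "a" * 1000                                 # long shared prefix makes comparisons costly
keys = [prefix + f"{i:04d}" for i in range(1024)]   # zero-padded, so already sorted
counter = {"keys": 0, "chars": 0}
index = binary_search(keys, keys[777], counter)
print(index, counter)
```

The number of key comparisons stays around log2(1024), but every one of them walks more than 1000 characters, so the total work is roughly (number of comparisons) x (key length).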
In some cases, we make assumptions about how the machine works that let us conveniently ignore the time required to process the bits of the input. For example, suppose that we're hashing numbers between 0 and 2^k. If we assume that we have a transdichotomous machine model, then by assumption each machine word will be at least Ω(k) bits and we can perform operations on machine words in time O(1). This means that we can perform hashes on k bits in time O(1) rather than time O(k), since we're assuming that the word size grows as a function of the problem size.
Hope this helps!
That's a fair point. If your container's keys are arbitrarily large objects, you need a different analysis. However, in the end the result will be roughly the same.
In classic algorithmic analysis, we usually just assume that certain operations (like incrementing a counter, or comparing two values) take constant time, and that certain objects (like integers) occupy constant space. These two assumptions go hand in hand, because when we say that an algorithm is O(f(N)), the N refers to "the 'size' of the problem", and if individual components of the problem have non-constant size, then the total size of the problem will have an additional non-constant multiplier.
More importantly, we generally make the assumption that we can index a contiguous array in constant time; this is the so-called "RAM" or "von Neumann" model, and it underlies most computational analysis in the last four decades or so (see here for a potted history).
For simple problems, like binary addition, it really doesn't matter whether we count the size of the objects as 1 object or k bits. In either case, the cost of doing a set of additions of size n is O(n), whether we're counting objects-of-a-constant-size or bits in variable-size-objects. By the same token, the cost of a hash-table lookup consists of:
Compute the hash (time proportional to key size)
Find the hash bucket (assumed to be constant time since the hash is a fixed size)
Compare the target with each object in the bucket (time proportional to key size, assuming that the bucket length is constant)
Similarly, we usually analyze the cost of a binary search by counting comparisons. If each object takes constant space, and each comparison takes constant time, then we can say that a problem of size N (which is n objects multiplied by some constant) can be solved with a binary search tree in log n comparisons. Again, the comparisons might take non-constant time, but then the problem size will also be multiplied by the same constant.
There is a lengthy discussion on a similar issue (sorting) in the comments in this blog post, also from the Computational Complexity blog, which you might well enjoy if you're looking for something beyond the basics.

Hash table runtime complexity (insert, search and delete)

Why do I keep seeing different runtime complexities for these functions on a hash table?
On wiki, search and delete are O(n) (I thought the point of hash tables was to have constant lookup so what's the point if search is O(n)).
In some course notes from a while ago, I see a wide range of complexities depending on certain details including one with all O(1). Why would any other implementation be used if I can get all O(1)?
If I'm using standard hash tables in a language like C++ or Java, what can I expect the time complexity to be?
Hash tables have O(1) average and amortized case complexity; however, they suffer from O(n) worst-case time complexity. [And I think this is where your confusion is.]
Hash tables suffer from O(n) worst-case time complexity for two reasons:
If too many elements are hashed to the same bucket, looking inside that bucket may take O(n) time.
Once a hash table has passed its load factor threshold, it has to rehash [create a new, bigger table and re-insert each element into it].
However, it is said to be O(1) average and amortized case because:
It is very rare for many items to be hashed to the same bucket [if you choose a good hash function and your load factor is not too high].
The rehash operation, which is O(n), can happen only after n/2 ops at the earliest, which are all assumed to be O(1). Thus, when you sum the average time per op, you get (n*O(1) + O(n)) / n = O(1).
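The amortized argument can be checked by counting how many element moves the rehashes cause in total. This is a minimal sketch assuming a table that doubles its capacity whenever it is full; the function name count_rehash_moves and the starting capacity of 1 are assumptions, and real implementations resize at other load factors.

```python
def count_rehash_moves(n, initial_capacity=1):
    """Total elements re-inserted by rehashes while inserting n items."""
    capacity = initial_capacity
    size = 0
    moves = 0
    for _ in range(n):
        if size == capacity:
            moves += size      # a rehash re-inserts every existing element
            capacity *= 2      # the new table is twice as large
        size += 1
    return moves

for n in (10, 1_000, 100_000):
    moves = count_rehash_moves(n)
    print(n, moves, moves / n)
```

With doubling, the total number of moves stays below 2n, so the rehash cost spread over all n inserts is a constant per insert.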
Note that because of the rehashing issue, real-time applications and applications that need low latency should not use a hash table as their data structure.
EDIT: Another issue with hash tables: cache
Another place where you might see a performance loss in large hash tables is cache behaviour. Hash tables suffer from poor cache performance, so for large collections access may take longer, since the relevant part of the table has to be reloaded from memory into the cache.
Ideally, a hashtable is O(1). The problem arises when two keys are not equal but still produce the same hash.
For example, imagine the strings "it was the best of times it was the worst of times" and "Green Eggs and Ham" both resulted in a hash value of 123.
When the first string is inserted, it's put in bucket 123. When the second string is inserted, it would see that a value already exists for bucket 123. It would then compare the new value to the existing value, and see they are not equal. In this case, an array or linked list is created for that key. At this point, retrieving this value becomes O(n) as the hashtable needs to iterate through each value in that bucket to find the desired one.
For this reason, when using a hash table, it's important to use a key with a really good hash function that's both fast and doesn't often result in duplicate values for different objects.
Make sense?
Some hash tables (cuckoo hashing) have guaranteed O(1) lookup
Perhaps you were looking at the space complexity? That is O(n). The other complexities are as expected on the hash table entry. The search complexity approaches O(1) as the number of buckets increases. If at the worst case you have only one bucket in the hash table, then the search complexity is O(n).
Edit in response to comment: I don't think it is correct to say O(1) is the average case. It really is (as the Wikipedia page says) O(1 + N/K), where N is the number of entries and K is the hash table size. If K is large enough, then the result is effectively O(1). But suppose K is 10 and N is 100. In that case each bucket will have on average 10 entries, so the search time is definitely not O(1); it is a linear search through up to 10 entries.
It depends on how you implement hashing: in the worst case it can go to O(n); in the best case it is O(1) (which you can generally achieve easily if your data structure is not that big).
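The N/K term can be read straight off a table: an unsuccessful search has to walk one whole chain, and the chains hold N entries spread over K buckets. A tiny sketch of that, with the function name mean_unsuccessful_probes and the use of consecutive integer keys as assumptions:

```python
def mean_unsuccessful_probes(num_keys, num_buckets):
    """Average chain length scanned when searching for an absent key."""
    buckets = [[] for _ in range(num_buckets)]
    for key in range(num_keys):
        buckets[key % num_buckets].append(key)
    # An unsuccessful search walks one whole chain; average over all chains.
    return sum(len(b) for b in buckets) / num_buckets

print(mean_unsuccessful_probes(100, 10))    # N = 100, K = 10   -> 10.0
print(mean_unsuccessful_probes(100, 1000))  # same N, K = 1000  -> 0.1
```

Keeping K proportional to N is what turns O(1 + N/K) into O(1).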

Can hash tables really be O(1)?

It seems to be common knowledge that hash tables can achieve O(1), but that has never made sense to me. Can someone please explain it? Here are two situations that come to mind:
A. The value is an int smaller than the size of the hash table. Therefore, the value is its own hash, so there is no hash table. But if there was, it would be O(1) and still be inefficient.
B. You have to calculate a hash of the value. In this situation, the order is O(n) for the size of the data being looked up. The lookup might be O(1) after you do O(n) work, but that still comes out to O(n) in my eyes.
And unless you have a perfect hash or a large hash table, there are probably several items per bucket. So, it devolves into a small linear search at some point anyway.
I think hash tables are awesome, but I do not get the O(1) designation unless it is just supposed to be theoretical.
Wikipedia's article for hash tables consistently references constant lookup time and totally ignores the cost of the hash function. Is that really a fair measure?
Edit: To summarize what I learned:
It is technically true because the hash function is not required to use all the information in the key and so could be constant time, and because a large enough table can bring collisions down to near constant time.
It is true in practice because over time it just works out as long as the hash function and table size are chosen to minimize collisions, even though that often means not using a constant time hash function.
You have two variables here, m and n, where m is the length of the input and n is the number of items in the hash.
The O(1) lookup performance claim makes at least two assumptions:
Your objects can be equality compared in O(1) time.
There will be few hash collisions.
If your objects are variable size and an equality check requires looking at all bits then performance will become O(m). The hash function however does not have to be O(m) - it can be O(1). Unlike a cryptographic hash, a hash function for use in a dictionary does not have to look at every bit in the input in order to calculate the hash. Implementations are free to look at only a fixed number of bits.
For sufficiently many items, the number of items will become greater than the number of possible hashes, and then you will get collisions that cause the cost to rise above O(1), for example to O(n) for a simple linked list traversal (or O(n*m) if both assumptions are false).
In practice, though, the O(1) claim, while technically false, is approximately true for many real-world situations, and in particular those situations where the above assumptions hold.
You have to calculate the hash, so the order is O(n) for the size of the data being looked up. The lookup might be O(1) after you do O(n) work, but that still comes out to O(n) in my eyes.
What? To hash a single element takes constant time. Why would it be anything else? If you're inserting n elements, then yes, you have to compute n hashes, and that takes linear time... to look an element up, you compute a single hash of what you're looking for, then find the appropriate bucket with that. You don't re-compute the hashes of everything that's already in the hash table.
And unless you have a perfect hash or a large hash table there are probably several items per bucket so it devolves into a small linear search at some point anyway.
Not necessarily. The buckets don't necessarily have to be lists or arrays, they can be any container type, such as a balanced BST. That means O(log n) worst case. But this is why it's important to choose a good hashing function to avoid putting too many elements into one bucket. As KennyTM pointed out, on average, you will still get O(1) time, even if occasionally you have to dig through a bucket.
The trade off of hash tables is of course the space complexity. You're trading space for time, which seems to be the usual case in computing science.
You mention using strings as keys in one of your other comments. You're concerned about the amount of time it takes to compute the hash of a string, because it consists of several chars? As someone else pointed out again, you don't necessarily need to look at all the chars to compute the hash, although it might produce a better hash if you did. In that case, if there are on average m chars in your key, and you used all of them to compute your hash, then I suppose you're right, that lookups would take O(m). If m >> n then you might have a problem. You'd probably be better off with a BST in that case. Or choose a cheaper hashing function.
The hash is fixed size - looking up the appropriate hash bucket is a fixed cost operation. This means that it is O(1).
Calculating the hash does not have to be a particularly expensive operation - we're not talking cryptographic hash functions here. But that's by the by. The hash function calculation itself does not depend on the number n of elements; while it might depend on the size of the data in an element, this is not what n refers to. So the calculation of the hash does not depend on n and is also O(1).
Hashing is O(1) only if there is only a constant number of keys in the table and some other assumptions are made. But in such cases it has an advantage.
If your key has an n-bit representation, your hash function can use 1, 2, ... n of these bits. Think about a hash function that uses 1 bit. Evaluation is O(1) for sure. But you are only partitioning the key space into 2, so you are mapping as many as 2^(n-1) keys into the same bin. Using BST search, this takes up to n-1 steps to locate a particular key if the bin is nearly full.
You can extend this to see that if your hash function uses K bits, your bin size is 2^(n-K).
So a K-bit hash function ==> no more than 2^K effective bins ==> up to 2^(n-K) n-bit keys per bin ==> (n-K) steps (BST) to resolve collisions. Actually, most hash functions are much less "effective" and need/use more than K bits to produce 2^K bins. So even this is optimistic.
You can view it this way -- you will need ~n steps to be able to uniquely distinguish a pair of keys of n bits in the worst case. There is really no way to get around this information theory limit, hash table or not.
However, this is NOT how/when you use hash table!
The complexity analysis assumes that for n-bit keys, you could have O(2^n) keys in the table (e.g. 1/4 of all possible keys). But most if not all of the time we use a hash table, we only have a constant number of the n-bit keys in the table. If you only want a constant number of keys in the table, say C is your maximum number, then you could form a hash table of O(C) bins, which guarantees an expected constant number of collisions (with a good hash function), and a hash function using ~logC of the n bits in the key. Then every query is O(logC) = O(1). This is how people claim "hash table access is O(1)".
There are a couple of catches here. First, saying you don't need all the bits may only be a billing trick. You cannot really pass the key value to the hash function, because that would be moving n bits in memory, which is O(n). So you need to do e.g. reference passing. But you still had to store the key somewhere, which was an O(n) operation; you just don't bill it to the hashing, and your overall computation task cannot avoid it. Second, you do the hashing, find the bin, and find more than one key; your cost depends on your resolution method. If you use a comparison-based method (BST or list), you will have an O(n) operation (recall the key is n-bit); if you use a second hash, well, you have the same issue if the second hash has a collision. So O(1) is not 100% guaranteed unless you have no collisions (you can improve the chance by having a table with more bins than keys, but still).
Consider the alternative, e.g. a BST, in this case. There are C keys, so a balanced BST will be O(logC) in depth, so a search takes O(logC) steps. However, each comparison in this case would be an O(n) operation ... so it appears hashing is the better choice in this case.
TL;DR: Hash tables guarantee O(1) expected worst case time if you pick your hash function uniformly at random from a universal family of hash functions. Expected worst case is not the same as average case.
Disclaimer: I don't formally prove hash tables are O(1), for that have a look at this video from coursera [1]. I also don't discuss the amortized aspects of hash tables. That is orthogonal to the discussion about hashing and collisions.
I see a surprisingly great deal of confusion around this topic in other answers and comments, and will try to rectify some of them in this long answer.
Reasoning about worst case
There are different types of worst case analysis. The analysis that most answers have made here so far is not worst case, but rather average case [2]. Average case analysis tends to be more practical. Maybe your algorithm has one bad worst case input, but actually works well for all other possible inputs. The bottom line is that your runtime depends on the dataset you're running on.
Consider the following pseudocode of the get method of a hash table. Here I'm assuming we handle collision by chaining, so each entry of the table is a linked list of (key,value) pairs. We also assume the number of buckets m is fixed but is O(n), where n is the number of elements in the input.
function get(a: Table with m buckets, k: Key being looked up)
    bucket <- compute hash(k) modulo m
    for each (key,value) in a[bucket]
        return value if k == key
    return not_found
As other answers have pointed out, this runs in average O(1) and worst case O(n). We can make a little sketch of a proof by challenge here. The challenge goes as follows:
(1) You give your hash table algorithm to an adversary.
(2) The adversary can study it and prepare as long as he wants.
(3) Finally the adversary gives you an input of size n for you to insert in your table.
The question is: how fast is your hash table on the adversary input?
From step (1) the adversary knows your hash function; during step (2) the adversary can craft a list of n elements with the same hash modulo m, by e.g. randomly computing the hash of a bunch of elements; and then in (3) they can give you that list. But lo and behold, since all n elements hash to the same bucket, your algorithm will take O(n) time to traverse the linked list in that bucket. No matter how many times we retry the challenge, the adversary always wins, and that's how bad your algorithm is, worst case O(n).
How come hashing is O(1)?
What threw us off in the previous challenge was that the adversary knew our hash function very well, and could use that knowledge to craft the worst possible input.
What if instead of always using one fixed hash function, we actually had a set of hash functions, H, that the algorithm can randomly choose from at runtime? In case you're curious, H is called a universal family of hash functions [3]. Alright, let's try adding some randomness to this.
First suppose our hash table also includes a seed r, and r is assigned to a random number at construction time. We assign it once and then it's fixed for that hash table instance. Now let's revisit our pseudocode.
function get(a: Table with m buckets and seed r, k: Key being looked up)
    rHash <- H[r]
    bucket <- compute rHash(k) modulo m
    for each (key,value) in a[bucket]
        return value if k == key
    return not_found
If we try the challenge one more time: from step (1) the adversary can know all the hash functions we have in H, but now the specific hash function we use depends on r. The value of r is private to our structure, the adversary cannot inspect it at runtime, nor predict it ahead of time, so he can't concoct a list that's always bad for us. Let's assume that in step (2) the adversary chooses one function hash in H at random, he then crafts a list of n collisions under hash modulo m, and sends that for step (3), crossing fingers that at runtime H[r] will be the same hash they chose.
This is a serious bet for the adversary, the list he crafted collides under hash, but will just be a random input under any other hash function in H. If he wins this bet our run time will be worst case O(n) like before, but if he loses then well we're just being given a random input which takes the average O(1) time. And indeed most times the adversary will lose, he wins only once every |H| challenges, and we can make |H| be very large.
Contrast this result to the previous algorithm where the adversary always won the challenge. Handwaving here a bit, but since most times the adversary will fail, and this is true for all possible strategies the adversary can try, it follows that although the worst case is O(n), the expected worst case is in fact O(1).
Again, this is not a formal proof. The guarantee we get from this expected worst case analysis is that our run time is now independent of any specific input. This is a truly random guarantee, as opposed to the average case analysis where we showed a motivated adversary could easily craft bad inputs.
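The challenge can be played out in a few lines. The sketch below assumes integer keys and uses the textbook family h(x) = ((a*x + b) mod P) mod m as H. The prime P, the function names, the table size, and the seeded generator standing in for the private seed r are all choices made for this illustration, not part of the answer above.

```python
import random

P = 2_147_483_647  # the Mersenne prime 2**31 - 1; keys are assumed to lie in [0, P)

def make_hash(a, b, m):
    """One member of the family h(x) = ((a*x + b) mod P) mod m."""
    return lambda x: ((a * x + b) % P) % m

def random_hash(rng, m):
    """Pick a member of the family at random (the role of the seed r)."""
    return make_hash(rng.randrange(1, P), rng.randrange(0, P), m)

def longest_chain(keys, h, m):
    """Length of the fullest bucket when keys are placed using h."""
    counts = [0] * m
    for k in keys:
        counts[h(k)] += 1
    return max(counts)

def adversarial_keys(h, n):
    """Scan the integers for n keys that all land in bucket 0 under h."""
    keys, x = [], 0
    while len(keys) < n:
        if h(x) == 0:
            keys.append(x)
        x += 1
    return keys

def demo(n=64, m=64, trials=200, seed=1):
    rng = random.Random(seed)
    known = random_hash(rng, m)           # the function the adversary studied
    keys = adversarial_keys(known, n)
    worst_fixed = longest_chain(keys, known, m)
    # Same keys, but each table draws a function the adversary never saw.
    chains = [longest_chain(keys, random_hash(rng, m), m) for _ in range(trials)]
    return worst_fixed, sum(chains) / trials

print(demo())
```

Against the known function every crafted key lands in one bucket; against freshly drawn functions the same keys spread out, and the average longest chain is a small fraction of n.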
TL;DR: usually hash() is O(m), where m is the length of a key.
My three cents.
24 years ago, when Sun released JDK 1.2, they fixed a bug in String.hashCode(): instead of computing the hash from only some portion of the string, since JDK 1.2 it reads every single character. This change was intentional and, IMHO, very wise.
In most languages the built-in hash works similarly. It processes the whole object to compute the hash, because keys are usually small while collisions can cause serious issues.
There are a lot of theoretical arguments confirming and denying the O(1) hash lookup cost. A lot of them are reasonable and educational.
Let us skip the theory and run an experiment instead:
import timeit
samples = [tuple("LetsHaveSomeFun!")]  # the effect is easiest to see with tuples
# samples = ["LetsHaveSomeFun!"]  # hashing a string is much faster; increase the sample size to see it
for _ in range(25 if isinstance(samples[0], str) else 20):
    samples.append(samples[-1] * 2)  # double the key length each round
empty = {}
for i, s in enumerate(samples):
    t = timeit.timeit(lambda: s in empty, number=2000)  # total seconds for 2000 lookups
    print(f"{i}. For element of length {len(s)} it took {t:0.3f} time to lookup in empty hashmap")
When I run it I get:
0. For element of length 16 it took 0.000 time to lookup in empty hashmap
1. For element of length 32 it took 0.000 time to lookup in empty hashmap
2. For element of length 64 it took 0.001 time to lookup in empty hashmap
3. For element of length 128 it took 0.001 time to lookup in empty hashmap
4. For element of length 256 it took 0.002 time to lookup in empty hashmap
5. For element of length 512 it took 0.003 time to lookup in empty hashmap
6. For element of length 1024 it took 0.006 time to lookup in empty hashmap
7. For element of length 2048 it took 0.012 time to lookup in empty hashmap
8. For element of length 4096 it took 0.025 time to lookup in empty hashmap
9. For element of length 8192 it took 0.048 time to lookup in empty hashmap
10. For element of length 16384 it took 0.094 time to lookup in empty hashmap
11. For element of length 32768 it took 0.184 time to lookup in empty hashmap
12. For element of length 65536 it took 0.368 time to lookup in empty hashmap
13. For element of length 131072 it took 0.743 time to lookup in empty hashmap
14. For element of length 262144 it took 1.490 time to lookup in empty hashmap
15. For element of length 524288 it took 2.900 time to lookup in empty hashmap
16. For element of length 1048576 it took 5.872 time to lookup in empty hashmap
17. For element of length 2097152 it took 12.003 time to lookup in empty hashmap
18. For element of length 4194304 it took 25.176 time to lookup in empty hashmap
19. For element of length 8388608 it took 50.399 time to lookup in empty hashmap
20. For element of length 16777216 it took 99.281 time to lookup in empty hashmap
Clearly the hash is O(m) where m is the length of a key.
You can run similar experiments in other mainstream languages, and I expect you will get similar results.
It seems based on discussion here, that if X is the ceiling of (# of elements in table/# of bins), then a better answer is O(log(X)) assuming an efficient implementation of bin lookup.
There are two settings under which you can get O(1) worst-case times.
If your setup is static, then FKS hashing will get you worst-case O(1) guarantees. But as you indicated, your setting isn't static.
If you use Cuckoo hashing, then queries and deletes are O(1) worst-case, but insertion is only O(1) expected. Cuckoo hashing works quite well if you have an upper bound on the total number of inserts, and set the table size to be roughly 25% larger.
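For readers who have not seen it, here is a rough sketch of why cuckoo lookups are worst-case O(1): every key can live in only two places. The class name CuckooTable, the two toy hash functions, the table size of 11, and the max_kicks limit are all assumptions for illustration; a real implementation would use stronger hash functions and rehash instead of raising.

```python
class CuckooTable:
    """Two-table cuckoo hash set for non-negative ints (illustrative only)."""

    def __init__(self, size=11, max_kicks=32):
        self.size = size
        self.max_kicks = max_kicks
        self.tables = [[None] * size, [None] * size]

    def _slot(self, which, key):
        # Two simple, hypothetical hash functions, one per table.
        return key % self.size if which == 0 else (key // self.size) % self.size

    def __contains__(self, key):
        # At most two slots are inspected, whatever the table holds.
        return any(self.tables[t][self._slot(t, key)] == key for t in (0, 1))

    def insert(self, key):
        if key in self:
            return
        which = 0
        for _ in range(self.max_kicks):
            i = self._slot(which, key)
            # Place the key in hand and pick up whatever was in the slot.
            key, self.tables[which][i] = self.tables[which][i], key
            if key is None:
                return
            which = 1 - which  # the evicted key moves to its slot in the other table
        # The key still in hand is dropped in this sketch.
        raise RuntimeError("eviction cycle; a real table would rehash here")

table = CuckooTable()
for k in (20, 50, 53, 75, 100):
    table.insert(k)
print(53 in table, 999 in table)
```

Inserting 53 and 75 each evicts an earlier key into the second table, yet a lookup still checks only one slot per table.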
Copied from here
A. The value is an int smaller than the size of the hash table. Therefore, the value is its own hash, so there is no hash table. But if there was, it would be O(1) and still be inefficient.
This is a case where you could trivially map the keys to distinct buckets, so an array seems a better choice of data structure than a hash table. Still, the inefficiencies don't grow with the table size.
(You might still use a hash table because you don't trust the ints to remain smaller than the table size as the program evolves, you want to make the code potentially reusable when that relationship doesn't hold, or you just don't want people reading/maintaining the code to have to waste mental effort understanding and maintaining the relationship).
B. You have to calculate a hash of the value. In this situation, the order is O(n) for the size of the data being looked up. The lookup might be O(1) after you do O(n) work, but that still comes out to O(n) in my eyes.
We need to distinguish between the size of the key (e.g. in bytes) and the number of keys being stored in the hash table. Claims that hash tables provide O(1) operations mean that operations (insert/erase/find) don't tend to slow down further as the number of keys increases from hundreds to thousands to millions to billions (at least not if all the data is accessed/updated in equally fast storage, be that RAM or disk - cache effects may come into play, but even the cost of a worst-case cache miss tends to be some constant multiple of a best-case hit).
Consider a telephone book: you may have names in there that are quite long, but whether the book has 100 names, or 10 million, the average name length is going to be pretty consistent, and the worst case in history...
Guinness world record for the Longest name used by anyone ever was set by Adolph Blaine Charles David Earl Frederick Gerald Hubert Irvin John Kenneth Lloyd Martin Nero Oliver Paul Quincy Randolph Sherman Thomas Uncas Victor William Xerxes Yancy Wolfeschlegelsteinhausenbergerdorff, Senior
...wc tells me that's 215 characters - that's not a hard upper-bound to the key length, but we don't need to worry about there being massively more.
That holds for most real world hash tables: the average key length doesn't tend to grow with the number of keys in use. There are exceptions, for example a key creation routine might return strings embedding incrementing integers, but even then every time you increase the number of keys by an order of magnitude you only increase the key length by 1 character: it's not significant.
It's also possible to create a hash from a fixed-size amount of key data. For example, Microsoft's Visual C++ ships with a Standard Library implementation of std::hash<std::string> that creates a hash incorporating just ten bytes evenly spaced along the string, so if the strings only vary at other indices you get collisions (and hence in practice non O(1) behaviours on the post-collision searching side), but the time to create the hash has a hard upper bound.
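The trade-off can be sketched with a toy sampling hash. This is not Microsoft's actual algorithm: the function name `sampled_hash` and the choice of FNV-1a as the mixing step are assumptions made purely for illustration.

```python
FNV_OFFSET = 0xCBF29CE484222325  # FNV-1a 64-bit offset basis
FNV_PRIME = 0x100000001B3        # FNV-1a 64-bit prime


def sampled_hash(data: bytes, samples: int = 10) -> int:
    """Hash at most `samples` evenly spaced bytes of `data` (illustrative)."""
    step = max(1, len(data) // samples)
    h = FNV_OFFSET
    # Work is bounded by `samples`, no matter how long `data` is.
    for i in range(0, len(data), step)[:samples]:
        h ^= data[i]
        h = (h * FNV_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h


base = b"x" * 1000
unsampled = bytearray(base)
unsampled[1] = ord("y")   # index 1 is never sampled (step is 100)
sampled = bytearray(base)
sampled[0] = ord("y")     # index 0 is always sampled

collides = sampled_hash(base) == sampled_hash(bytes(unsampled))
differs = sampled_hash(base) != sampled_hash(bytes(sampled))
```

Here `collides` is true because the two strings differ only at a position the hash never reads, which is exactly the failure mode described above.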
And unless you have a perfect hash or a large hash table, there are probably several items per bucket. So, it devolves into a small linear search at some point anyway.
Generally true, but the awesome thing about hash tables is that the number of keys visited during those "small linear searches" is - for the separate chaining approach to collisions - a function of the hash table load factor (ratio of keys to buckets).
For example, with a load factor of 1.0 there's an average of ~1.58 to the length of those linear searches, regardless of the number of keys (see my answer here). For closed hashing it's a bit more complicated, but not much worse when the load factor isn't too high.
It is technically true because the hash function is not required to use all the information in the key and so could be constant time, and because a large enough table can bring collisions down to near constant time.
This kind of misses the point. Any kind of associative data structure ultimately has to do operations across every part of the key sometimes (inequality may sometimes be determined from just a part of the key, but equality generally requires every bit be considered). At a minimum, it can hash the key once and store the hash value, and if it uses a strong enough hash function - e.g. MD5 truncated to 64 bits - it might practically ignore even the possibility of two keys hashing to the same value (a company I worked for did exactly that for the distributed database: hash-generation time was still insignificant compared to WAN-wide network transmissions). So, there's not too much point obsessing about the cost to process the key: that's inherent in storing keys regardless of the data structure, and, as said above, doesn't tend to grow worse on average as the number of keys grows.
As for large enough hash tables bringing collisions down, that's missing the point too. For separate chaining, you still have a constant average collision chain length at any given load factor - it's just higher when the load factor is higher, and that relationship is non-linear. SO user Hans commented on my answer (also linked above) that:
average bucket length conditioned on nonempty buckets is a better measure of efficiency. It is a/(1-e^{-a}) [where a is the load factor, e is 2.71828...]
So, the load factor alone determines the average number of colliding keys you have to search through during insert/erase/find operations. For separate chaining, it doesn't just approach being constant when the load factor is low - it's always constant. For open addressing though your claim has some validity: some colliding elements are redirected to alternative buckets and can then interfere with operations on other keys, so at higher load factors (especially > .8 or .9) collision chain length gets more dramatically worse.
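A quick way to check that the load factor alone sets the average chain length is to simulate it. The sketch below assumes an idealized hash function that sends each key to a uniformly random bucket; the function names are made up for this example.

```python
import math
import random
from collections import Counter


def expected_nonempty_chain(a: float) -> float:
    """The quoted formula a / (1 - e^-a) for load factor a."""
    return a / (1.0 - math.exp(-a))


def simulated_nonempty_chain(num_keys: int, load_factor: float, seed: int = 0) -> float:
    """Average chain length over non-empty buckets, separate chaining."""
    rng = random.Random(seed)
    num_buckets = int(num_keys / load_factor)
    # Idealized hashing: every key lands in a uniformly random bucket.
    chains = Counter(rng.randrange(num_buckets) for _ in range(num_keys))
    return num_keys / len(chains)  # len(chains) counts non-empty buckets


for n in (1_000, 100_000):
    print(n, round(simulated_nonempty_chain(n, 1.0), 3),
          round(expected_nonempty_chain(1.0), 3))
```

At a load factor of 1.0 the formula gives about 1.58, and the simulated averages should stay close to that whether the table holds a thousand keys or a hundred thousand.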
It is true in practice because over time it just works out as long as the hash function and table size are chosen to minimize collisions, even though that often means not using a constant time hash function.
Well, the table size should result in a sane load factor given the choice of closed hashing or separate chaining, but also if the hash function is a bit weak and the keys aren't very random, having a prime number of buckets often helps reduce collisions too (hash-value % table-size then wraps around such that changes only to a high order bit or two in the hash-value still resolve to buckets spread pseudo-randomly across different parts of the hash table).
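The effect of a prime bucket count is easy to see with a deliberately weak hash. In this sketch the hash is assumed to be the identity function on integers, and the keys differ only in bits above the low ten; both choices are contrived for illustration.

```python
def buckets_used(hash_values, table_size):
    """Number of distinct buckets hit by hash_value % table_size."""
    return len({h % table_size for h in hash_values})


# Hash values that vary only in their high-order bits.
hash_values = [k * 1024 for k in range(1000)]

power_of_two_buckets = buckets_used(hash_values, 1024)  # low bits are all zero
prime_buckets = buckets_used(hash_values, 1021)         # 1021 is prime

print(power_of_two_buckets, prime_buckets)  # 1 1000
```

With 1024 buckets every key lands in bucket 0, because the modulo keeps only the low ten bits. With 1021 buckets the same keys occupy 1000 distinct buckets.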
Leaving other considerations aside, the O(1) claim hinges on a constant time access model of memory, which is a good enough approximation for most practical computer science but not strictly justifiable from a theoretical point of view.
For starters, any memory addressing scheme necessarily requires multiplexing at the circuit level, which in turn requires a circuit depth at least proportional to O(log N). Since clock frequency is inversely proportional to the longest path (in number of traversed gates) of a circuit, this implies no general memory access scheme can run in less than O(log N) for fast enough CPUs or large enough memories.
Then, at a more fundamental level, you can only pack so many bits of memory within a finite distance D from the processor: at most on the order of D^3 bits. Given the finite speed of light, this means your worst-case time for a random access into N bits of memory is at least proportional to N^(1/3), and more likely N^(1/2) if we take into account that integrated circuits are two-dimensional.
But of course in practice computers operate far from reaching these limits... or do they? This is when cache hierarchies enter the game, and why no good implementation of an algorithm or data structure can afford to ignore the actual details of the use case or the hardware implementation.
Either way the absolute worst case for a random memory access timing is given by the ping latency between your computer and some server at the opposite side of the planet, which can be in the 100s of ms and is, for the record, a lot worse than the best case scenario of having the data cached in L1 or -even better- already loaded in the registers.
As for the cost of hashing, you are correct in that it cannot be truly constant or even bounded by a set number of operations when applied to a potentially unbounded set of arbitrary-size keys such as strings, which can only be dealt with efficiently for the randomized case, but often do share arbitrarily long common prefixes that require reading and processing a number of bits larger than the size of the prefix.
For such cases it may be advisable to use a specialized data structure such as a z-fast trie or similar, which can simultaneously disambiguate prefixes and perform random memory access in amortized O(lg lg lg N).

Run time to insert n elements into an empty hash table

People say it takes amortized O(1) to put into a hash table. Therefore, putting n elements must be O(n). That's not true for large n, however, since as an answerer said, "All you need to satisfy expected amortized O(1) is to expand the table and rehash everything with a new random hash function any time there is a collision."
So: what is the average running-time of inserting n elements into a hash table? I realize this is probably implementation-dependent, so mention what type of implementation you're talking about.
For example, if there are (log n) equally spaced collisions, and each collision takes O(k) to resolve, where k is the current size of the hashtable, then you'd have this recurrence relation:
T(n) = T(n/2) + n/2 + n/2
(that is, you take the time to insert n/2 elements, then you have a collision, taking n/2 to resolve, then you do the remaining n/2 inserts without a collision). This still ends up being O(n), so yay. But is this reasonable?
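For completeness, the recurrence above can be unrolled directly. Assuming n is a power of two and T(1) is a constant:

```latex
\begin{aligned}
T(n) &= T(n/2) + \tfrac{n}{2} + \tfrac{n}{2} = T(n/2) + n \\
     &= n + \tfrac{n}{2} + \tfrac{n}{4} + \dots + 2 + T(1) \\
     &= (2n - 2) + T(1) \\
     &= O(n).
\end{aligned}
```

The geometric series is what keeps the total linear: each collision costs half as much as the one after it.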
It completely depends on how inefficient your rehashing is. Specifically, if you can properly estimate the expected size of your hashtable the second time, your runtime still approaches O(n). Effectively, you have to specify how inefficient your rehash size calculation is before you can determine the expected order.
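How much the resize policy matters can be checked by counting how many elements get moved during rehashes. The sketch below is a simplified model, not a real hash table: it assumes a rehash happens only when the table is full and that each rehash moves every stored element once. The name `total_moves` and the `grow` parameter are invented for this example.

```python
def total_moves(n, grow):
    """Count elements moved by rehashes while inserting n items.

    `grow` maps the current capacity to the new capacity.
    """
    capacity, size, moves = 1, 0, 0
    for _ in range(n):
        if size == capacity:        # table full: rehash into a bigger one
            capacity = grow(capacity)
            moves += size           # every stored element is moved once
        size += 1
    return moves


doubling = total_moves(1000, lambda c: 2 * c)   # 1023 moves, under 2n
one_more = total_moves(1000, lambda c: c + 1)   # 499500 moves, about n^2 / 2
print(doubling, one_more)
```

Doubling keeps the total rehash work linear in n, while growing by a single slot each time makes it quadratic.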
People say it takes amortized O(1) to put into a hash table.
From a theoretical standpoint, it is expected amortized O(1).
Hash tables are fundamentally a randomized data structure, in the same sense that quicksort is a randomized algorithm. You need to generate your hash functions with some randomness, or else there exist pathological inputs which are not O(1).
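As a sketch of what "generating a hash function with some randomness" can look like for integer keys, the example below draws one function from the classic family h(x) = ((a*x + b) mod p) mod m. The helper name `make_random_hash`, the choice of p, and the seed are assumptions of this example.

```python
import random

P = (1 << 61) - 1  # a Mersenne prime, larger than any key hashed below


def make_random_hash(num_buckets, rng):
    """Draw h(x) = ((a*x + b) mod P) mod num_buckets with random a, b."""
    a = rng.randrange(1, P)
    b = rng.randrange(0, P)
    return lambda x: ((a * x + b) % P) % num_buckets


m = 1024
# Pathological input for the fixed function x % m: every key hits bucket 0.
keys = [i * m for i in range(1000)]

fixed_buckets = len({x % m for x in keys})

h = make_random_hash(m, random.Random(42))
random_buckets = len({h(x) for x in keys})

print(fixed_buckets, random_buckets)
```

An adversary who knows the fixed function can always build a key set like this one. With a and b chosen at random after the keys are fixed, no single key set is bad for more than a small fraction of the possible functions.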
You can achieve expected amortized O(1) using dynamic perfect hashing:
The naive idea I originally posted was to rehash with a new random hash function on every collision. (See also perfect hash functions) The problem with this is that this requires O(n^2) space, from birthday paradox.
The solution is to have two hash tables, with the second table for collisions; resolve collisions on that second table by rebuilding it. That table will have O(\sqrt{n}) elements, so would grow to O(n) size.
In practice you often just use a fixed hash function because you can assume your input is not pathological (or don't care if it is), much like you often quicksort without prerandomizing the input.
All O(1) is saying is that the operation is performed in constant time, and it's not dependent on the number of elements in your data structure.
In simple words, this means that you'll have to pay the same cost no matter how big your data structure is.
In practical terms this means that simple data structures such as trees are generally more effective when you don't have to store a lot of data. In my experience I find trees faster up to ~1k elements (32-bit integers), then hash tables take over. But as usual, YMMV.
Why not just run a few tests on your system? Maybe if you'll post the source, we can go back and test them on our systems and we could really shape this into a very useful discussion.
It is not just the implementation but also the environment that decides how much time the algorithm actually takes. You can, however, look for any benchmarking samples that are available. Posting my own results would be of little use, since people have no idea what else is running on my system, how much RAM is free right now, and so on. You can only ever have a broad idea, and that is about as good as what big-O gives you.

Resources