Contraction operation | Table expansion and contraction | Dynamic tables - algorithm

It's me again with an algorithm question from my reading of Introduction to Algorithms, 3rd Edition.
When reading the expansion section, I think I fully understood why a new table is needed when an expansion occurs: the old table is too small to hold one more item.
However, the next section covers table contraction, where a new table is needed again, and this is what confuses me. A contraction happens when an item is being removed, so the old table definitely has sufficient space, and allocating a new table for the contraction seems meaningless.
Can we simply drop the step of creating a new table and then copying the items over from a table contraction?
I know my thinking must be wrong somewhere, so I'm raising the question here in the hope of some further discussion with deeper thinkers.
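For reference, here is a minimal sketch of the usual dynamic-table policy discussed in CLRS: double the array when it is full, and halve it when the load factor falls below one quarter. The class and method names are illustrative, not from the book. The copy during contraction is not about making room; it is about shrinking the allocation, because a contiguous array cannot be resized in place, and without that step a table that once held millions of items would keep its peak-size memory forever.

    class DynamicTable:
        def __init__(self):
            self.size = 1                      # allocated slots
            self.num = 0                       # slots actually in use
            self.slots = [None] * self.size

        def _reallocate(self, new_size):
            # Allocate a new array of the requested size and copy the live
            # items over; this is the step the question asks about.
            new_slots = [None] * new_size
            new_slots[:self.num] = self.slots[:self.num]
            self.size, self.slots = new_size, new_slots

        def insert(self, item):
            if self.num == self.size:          # full: expand by doubling
                self._reallocate(2 * self.size)
            self.slots[self.num] = item
            self.num += 1

        def delete_last(self):
            if self.num == 0:
                return
            self.num -= 1
            self.slots[self.num] = None
            # Contract (halve) only when the table falls below one-quarter
            # full, so alternating inserts and deletes cannot thrash.
            if self.size > 1 and self.num < self.size // 4:
                self._reallocate(self.size // 2)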

Related

Chess: Extracting the principal variation from the transposition table

Earlier, I was having an issue involving my principal variation becoming truncated by an alpha-beta search. Indeed, this appears to be a common issue. From the authors of Crafty:
Another solution with even worse properties is to extract the full PV
from the transposition table, and avoid using the triangular array
completely. If the transposition table is large enough so that
nothing gets overwritten at all, this would almost work. But there is
a hidden “gotcha”. Once you search the PV, you continue to search
other parts of the tree, and you will frequently encounter some of
those same positions again, but the hash draft will not be deep enough
to allow the entry to be used. You continue searching and now have to
overwrite that entry (exact hash signature match requires this) and
you now leave a transposition “trail” to a different endpoint, one
which does not match the original score. Or any one of the trans/ref
entries between the root and the actual PV endpoint get overwritten by
a different position completely, and you now have no way to search the
transposition table to find the actual endpoint you want to display.
Several use this approach to produce the PV, and the complaints are
generally frequent and loud, because an evaluation paired with an
unrelated PV is not very useful, either for debugging or for analyzing
your own games.
This makes a lot of sense.
Consider that the principal variation is ABCDEF, but AB returns the board to its original position. Then, an alternative line examined later might be CDEFGH, which results in a different evaluation than the earlier search of just CDEF. Thus, the transposition table entry for the board state after AB is overwritten, potentially with a node that will be cut off by alpha-beta (!!), and the PV of ABCDEF is destroyed forever.
Is there any way around this problem, or do I have to use an external data structure to save the PV?
Specifically, what is wrong with replacing if and only if the new entry is deeper and exact? This doesn't seem to work, but I'm not sure why.
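For comparison, here is a minimal sketch of the "external data structure" route: a negamax search that returns the line alongside the score, so the PV is assembled as the recursion unwinds rather than recovered from the transposition table afterwards. The position object and its legal_moves()/make()/unmake()/evaluate() methods are hypothetical placeholders, not any particular engine's API. This is essentially what the triangular PV array achieves; note that at cut nodes the returned list is only a refutation fragment, while at PV nodes it is the full line.

    INF = float("inf")

    def negamax(position, depth, alpha, beta):
        # Returns (score, line) from the point of view of the side to move.
        moves = position.legal_moves()
        if depth == 0 or not moves:
            return position.evaluate(), []

        best_score, best_line = -INF, []
        for move in moves:
            position.make(move)
            score, reply_line = negamax(position, depth - 1, -beta, -alpha)
            score = -score
            position.unmake(move)

            if score > best_score:
                best_score, best_line = score, [move] + reply_line
            alpha = max(alpha, score)
            if alpha >= beta:        # cutoff: this branch only refutes, it is not a PV
                break

        return best_score, best_line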

INSERT: when should I not be using the APPEND hint?

I'm trying to insert batches of data into an Oracle table with an INSERT statement, namely:
INSERT INTO t1 SELECT * FROM all_objects;
I've come across the APPEND hint, which seems to increase performance in some cases.
Are there situations where it might decrease performance and it should not be used?
Thanks
The APPEND hint does a direct-path insert, the same direct-path mechanism that SQL*Loader uses when you request it. For large datasets, you should see dramatic improvements.
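To make the syntax concrete, here is a minimal sketch using the python-oracledb driver. The table t1 and the all_objects source come from the question; the connection details are placeholders. One point worth showing: after a direct-path insert, the same transaction cannot read the table again (ORA-12838), so the commit is not optional.

    import oracledb  # python-oracledb driver

    # Placeholder credentials; only the statement text matters here.
    conn = oracledb.connect(user="scott", password="tiger", dsn="localhost/XEPDB1")
    cur = conn.cursor()

    # The /*+ APPEND */ hint requests a direct-path insert: rows are written
    # into fresh blocks above the high water mark instead of into existing
    # free space.
    cur.execute("INSERT /*+ APPEND */ INTO t1 SELECT * FROM all_objects")

    # The same transaction cannot query t1 again until it commits
    # (ORA-12838), so commit before reading.
    conn.commit()

    cur.execute("SELECT COUNT(*) FROM t1")
    print(cur.fetchone()[0])
    conn.close()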
One of the main caveats to be aware of is that part of what makes it so fast is that it inserts all new rows above the high water mark (HWM). This means that if you are frequently deleting rows and re-inserting, a conventional insert could actually be better than a direct-path insert, because it will reuse the free space left by the deleted rows.
If, for example, you had a table with 5 million rows where you did a DELETE followed by a direct-path insert, after a few iterations you would notice things slow to a crawl. The insert itself would continue to be nice and speedy, but your queries against the table would gradually get worse.
The only way I know of to reset the HWM is to truncate the table. If you plan to use direct path on a table with minimal dead rows, or if you are going to somehow reset the HWM, then I think in most cases it will be fine -- preferable, in fact, if you are inserting large amounts of data.
Here's a nice article that explains the details of the differences:
https://sshailesh.wordpress.com/2009/05/03/conventional-path-load-and-direct-path-load-simple-to-use-in-complex-situations/
A final parting shot -- with all Oracle hints, know everything you can before you use them. Using them haphazardly can be hazardous to your health.
I think performance may actually decrease in the special case where your SELECT retrieves only one row or a small number of rows.
In that case I would not use the APPEND hint. The OracleBase article describes the impact of the APPEND hint very well, and it also provides a link to the manual page.
There are three different situations:
The APPEND hint will have no effect because it is silently ignored. This happens if a trigger or a referential integrity constraint is defined on the table, or under some other circumstances.
The APPEND hint will raise an error message, or a statement following the statement with the APPEND hint will raise an error message. Here you have two possibilities: either you remove the APPEND hint or you split the transaction into two or more separate transactions.
The APPEND hint will work. Here you will get better performance if you use it (except when you have only a small number of rows to insert, as stated at the beginning), but you will also need more space: the insert uses new extents for the data rather than filling the free space in the existing extents. If you do a parallel insert, each process uses its own extents. This can result in a lot of unused space and be a drawback in some situations.
It might negatively affect performance if you are using it for inserting small sets of data.
That's because it will allocate new space every time instead of reusing free space, so using it with multiple small sets can fragment your table, which may result in performance issues.
That hint is a good idea for large inserts scheduled for times where usage is low.

Problems with a primary key sequence

When I add new data through a form, my primary key sequence increases by 1.
However, if I delete a row and replace it with new data, the sequence just carries on.
So, for example, my primary keys go 1, 2, 3, 4, 5, 6, 10 because of previously deleted rows.
I hope that makes sense.
SEQUENCE values in Oracle are guaranteed to be unique, but you cannot expect the values to form a contiguous sequence without any gaps.
Even if you would never delete any rows from the table, you're likely to see gaps at some point, because sequence values are cached (pre-reserved) between different transactions.
It is a SEQUENCE of numbers; it doesn't care whether you have used the "current value" or not.
As opposed to MySQL, in Oracle a sequence is not tied to a column; it is a separate object that you ask for a value (through your_sequence.nextval). To guarantee uniqueness, it does not take values back and offer them again.
If you always want a dense sequence of IDs even after deletions, you would have to either
rearrange the IDs (read: change the IDs of the rows newer than the deleted one), or
keep the stored IDs as they are and, without knowing your task in detail, use the DENSE_RANK analytic function when querying your dataset, separating the real (in-table) IDs from the ranking of the rows.
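If all you need is a gap-free numbering in query results, a sketch along these lines (python-oracledb; the connection details, table name my_table, and column name id are placeholders) keeps the stored, gap-prone sequence values untouched and computes a dense rank on the fly:

    import oracledb  # python-oracledb driver

    conn = oracledb.connect(user="scott", password="tiger", dsn="localhost/XEPDB1")
    cur = conn.cursor()

    # The stored IDs (1,2,3,4,5,6,10 in the example) stay as they are;
    # DENSE_RANK just numbers the surviving rows 1..7 at query time.
    cur.execute("""
        SELECT id,
               DENSE_RANK() OVER (ORDER BY id) AS display_rank
        FROM my_table
        ORDER BY id
    """)
    for stored_id, display_rank in cur:
        print(stored_id, display_rank)
    conn.close()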

Deleting from a HashTable

I'm using the tombstone method to delete elements from a hash table.
That is, instead of deallocating the node and reorganizing the hash table, I simply put a DELETED mark on the deleted index, making the slot available for further INSERT operations while keeping it from breaking the SEARCH operation.
However, after the number of those markers exceeds a certain threshold, I actually want to deallocate those nodes and reorganize my table.
I've thought of allocating a new table whose size is the old table size minus the number of DELETED marks, and inserting every node that is not EMPTY and not marked DELETED into this new table using the regular INSERT, but this seemed like overkill to me. Is there a better method to do what I want?
My table uses open addressing with probing schemes such as linear probing, double hashing, etc.
The algorithm you describe is essentially rehashing, and that's an entirely reasonable approach to the problem. It has the virtue of being exactly the same code as you would use when your hash table occupancy becomes too large.
Estimating an appropriate size for the new table is tricky. It's usually a good idea to not shrink hash tables aggressively after deletions, in case the table is about to start growing again.
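As a concrete illustration, here is a minimal sketch of that rehashing pass for a linear-probing table. The EMPTY/DELETED sentinels and the hash_fn parameter are illustrative names, and new_size would normally be chosen from the live count and a target load factor rather than passed in blindly.

    # Sentinels for slot states; illustrative, not from any library.
    EMPTY, DELETED = object(), object()

    def rehash(old_slots, new_size, hash_fn):
        # Build a fresh slot array containing only the live (key, value)
        # entries; tombstones simply do not get carried over.
        new_slots = [EMPTY] * new_size
        for entry in old_slots:
            if entry is EMPTY or entry is DELETED:
                continue
            key, value = entry
            i = hash_fn(key) % new_size
            while new_slots[i] is not EMPTY:       # plain linear probing
                i = (i + 1) % new_size
            new_slots[i] = (key, value)
        return new_slots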

Best way to remove an entry from a hash table

What is the best way to remove an entry from a hash table that uses linear probing? One way to do this would be to use a flag to indicate deleted elements. Are there any better ways?
An easy technique is to:
1. Find and remove the desired element.
2. Go to the next bucket.
3. If the bucket is empty, quit.
4. If the bucket is full, delete the element in that bucket and re-add it to the hash table using the normal means. The item must be removed before re-adding, because it is likely that the item could be added back into its original spot.
5. Repeat from step 2.
This technique keeps your table tidy at the expense of slightly slower deletions.
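Here is a minimal sketch of that technique for linear probing. It assumes the table is a list of (key, value) pairs with None for empty buckets, and that the table always keeps at least one empty bucket so probes terminate; the function names are illustrative.

    def remove(slots, key, hash_fn):
        n = len(slots)
        i = hash_fn(key) % n
        while slots[i] is not None and slots[i][0] != key:
            i = (i + 1) % n                       # step 1: find the element
        if slots[i] is None:
            return                                # key was not in the table
        slots[i] = None                           # ...and remove it

        j = (i + 1) % n
        while slots[j] is not None:               # steps 2-5: walk the cluster
            k, v = slots[j]
            slots[j] = None                       # take it out first, so it can
            insert(slots, k, v, hash_fn)          # land back in its old spot
            j = (j + 1) % n

    def insert(slots, key, value, hash_fn):
        n = len(slots)
        i = hash_fn(key) % n
        while slots[i] is not None:               # ordinary linear-probe insert
            i = (i + 1) % n
        slots[i] = (key, value)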
It depends on how you handle overflow and whether (1) the item being removed is in an overflow slot or not, and (2) if there are overflow items beyond the item being removed, whether they have the hash key of the item being removed or possibly some other hash key. [Overlooking that double condition is a common source of bugs in deletion implementations.]
If collisions overflow into a linked list, it is pretty easy. You're either popping the head of the list (which may leave it empty) or deleting a member from the middle or end of the linked list. Those are fun and not particularly difficult. There can be other optimizations to avoid excessive memory allocation and freeing to make this even more efficient.
For linear probing, Knuth suggests that a simple approach is to have a way to mark a slot as empty, deleted, or occupied. Mark a removed occupant slot as deleted so that overflow by linear probing will skip past it, but if an insertion is needed, you can fill the first deleted slot that you passed over [The Art of Computer Programming, vol.3: Sorting and Searching, section 6.4 Hashing, p. 533 (ed.2)]. This assumes that deletions are rather rare.
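A minimal sketch of that empty/deleted/occupied scheme, with illustrative names, assuming slots hold (key, value) pairs and the table always keeps at least one truly EMPTY slot so probes terminate:

    EMPTY, DELETED = object(), object()

    def search(slots, key, hash_fn):
        n = len(slots)
        i = hash_fn(key) % n
        while slots[i] is not EMPTY:              # probe past DELETED slots
            if slots[i] is not DELETED and slots[i][0] == key:
                return slots[i][1]
            i = (i + 1) % n
        return None                               # only a truly empty slot ends the probe

    def insert(slots, key, value, hash_fn):
        n = len(slots)
        i = hash_fn(key) % n
        reuse = None                              # first DELETED slot passed over
        while slots[i] is not EMPTY:
            if slots[i] is DELETED:
                if reuse is None:
                    reuse = i
            elif slots[i][0] == key:
                slots[i] = (key, value)           # key already present: overwrite
                return
            i = (i + 1) % n
        slots[reuse if reuse is not None else i] = (key, value)

    def delete(slots, key, hash_fn):
        n = len(slots)
        i = hash_fn(key) % n
        while slots[i] is not EMPTY:
            if slots[i] is not DELETED and slots[i][0] == key:
                slots[i] = DELETED                # mark, don't reorganize
                return
            i = (i + 1) % n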
Knuth gives a nice refinement as Algorithm R6.4 [pp. 533-534] that instead marks the cell as empty rather than deleted, and then finds ways to move table entries back closer to their initial-probe location by moving the hole that was just made until it ends up next to another hole.
Knuth cautions that this will move existing still-occupied slot entries and is not a good idea if pointers to the slots are being held onto outside of the hash table. [If you have garbage-collected or other managed references in the slots, it is all right to move the slot, since it is the reference that is being used outside of the table and it doesn't matter where the slot that references the same object sits in the table.]
The Python hash table implementation (arguably very fast) uses dummy elements to mark deletions. As you grow or shrink your table (assuming you're not doing a fixed-size table), you can drop the dummies at the same time.
If you have access to a copy, have a look at the article in Beautiful Code about the implementation.
The best general solutions I can think of include:
If you can use a non-const iterator (à la the C++ STL or Java), you should be able to remove the items as you encounter them. Presumably, though, you wouldn't be asking this question unless you're using a const iterator or an enumerator that would be invalidated if the underlying collection is modified.
As you said, you could mark a deleted flag within the contained object. This doesn't release any memory or reduce collisions on the key, though, so it's not the best solution. It also requires adding a property to the class that probably doesn't really belong there. If this bothers you as much as it would bother me, or if you simply can't add a flag to the stored object (perhaps you don't control the class), you could store these flags in a separate hash table. This requires the most long-term memory use.
Push the keys of the to-be-removed items into a vector or array list while traversing the hash table. After releasing the enumerator, loop through this secondary list and remove the keys from the hash table. If you have a lot of items to remove and/or the keys are large (which they shouldn't be), this may not be the best solution (see the sketch after this list, which also covers the rebuild approach in the next item).
If you're going to end up removing more items from the hash table than you're leaving in there, it may be better to create a new hash table, and as you traverse your original one, add to the new hash table only the items you're going to keep. Then replace your reference(s) to the old hash table with the new one. This saves a secondary list iteration, but it's probably only efficient if the new hash table will have significantly fewer items than the original one, and it definitely only works if you can change all the references to the original hash table, of course.
If your hash table gives you access to its collection of keys, you may be able to iterate through those and remove items from the hash table in one pass.
If your hash table or some helper in your library provides you with predicate-based collection modifiers, you may have a Remove() function to which you can pass a lambda expression or function pointer to identify the items to remove.
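For the collect-then-delete and rebuild-and-rebind patterns mentioned above, here is a minimal sketch using Python's built-in dict; should_remove is a hypothetical predicate supplied by the caller, not part of any library.

    def remove_in_place(table, should_remove):
        # Collect the doomed keys first, then delete after iteration,
        # so the dict is never modified while it is being traversed.
        doomed = [k for k, v in table.items() if should_remove(k, v)]
        for k in doomed:
            del table[k]

    def rebuild(table, should_remove):
        # Copy the survivors into a fresh dict (better when most items are
        # going away) and let the caller rebind the reference.
        return {k: v for k, v in table.items() if not should_remove(k, v)}

    # Example usage with a placeholder rule:
    stale = lambda key, value: value is None
    cache = {"a": 1, "b": None, "c": 3}
    remove_in_place(cache, stale)      # cache is now {"a": 1, "c": 3}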
A common technique when time is a factor is to have a second table of deleted items, and clean up the main table when you have time. Commonly used in search engines.
How about enhancing the hash table to contain pointers like a linked list?
When you insert, if the bucket is full, create a pointer from this bucket to the bucket where the new item is stored.
When deleting something from the hash table, the solution is then equivalent to writing a function that deletes a node from a linked list.

Resources