I've been reading, on different oracle.com sites, that
FOR i IN nested_table.FIRST .. nested_table.LAST
is what you normally do in PL/SQL when iterating over all elements of a nested table type (as long as no elements have been deleted).
The way my nested table comes into existence is by doing
type nested_table_type is table of varchar2(20);
and in a different package
nested_table other_package.nested_table_type := other_package.nested_table_type();
then later, in a loop
nested_table.extend;
nested_table(nested_table.last) := something;
for any number of times.
Then I want to do something with each value, kind of like a for-each in other languages. Can I use the FOR loop here? Somebody told me to watch out, because in Oracle the indices may not be contiguous, so some elements might not be visited by the FOR loop. He said I should definitely use this instead:
index := nested_table.first;           -- lowest populated index (NULL if empty)
while (index is not null)
loop
    -- do things with nested_table(index) ...
    index := nested_table.next(index); -- next populated index, skipping any gaps
end loop;
Is this true? How could the indices not be in order, or the FOR loop not visit them all?
Thanks for helping :)
Edit:
Most likely this was some kind of miscommunication. I left the code the way it is. Still, thanks for reading / answering, and hopefully this helps somebody in the future or something :)
Indices are in order; it's just that you can create a sparse table, meaning that some indices might be missing. That happens, for example, after DELETEing individual elements: FIRST .. LAST would then run over the gaps and referencing a deleted element raises an exception, whereas a FIRST/NEXT loop like the one you quoted skips the gaps.
However, for your case, where you only ever extend and assign, using i IN t.FIRST .. t.LAST is perfectly fine.
I have this situation: starting from a table, I have to check all the records that match a key. If records are found, I have to check another table using a key from the first table, and so on, more or less on five levels. Is there a way to do this recursively, or do I have to write all the code "by hand"? The language I am using is Visual FoxPro. If this is not possible, is it at least possible to use recursion to populate a treeview?
You can set a relation between tables. For example:
USE table_1.dbf IN 0 SHARED
USE table_2.dbf IN 0 SHARED
SET ORDER TO TAG key_field OF table_2.cdx IN table_2
SET RELATION TO key_field INTO table_2 ADDITIVE IN table_1
The first two commands open table_1 and table_2. Then you have to set the order/index of table_2; if you don't have an index for the key field, this will not work. The final command sets the relation between the two tables on the key field.
From here you can browse both tables and table_2's records will be filtered based on table_1's key field. Hope this helps.
If the tables have similar structure, or you only need to look at a few fields, you could write a recursive routine that receives the name of the table, the key to check, and perhaps the fields you need to check as parameters. The tricky part, I guess, is knowing what to pass down to the next call; see the sketch below.
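Since I don't know your table structures, here is only a rough sketch of the shape such a routine could take, written in Java purely for illustration (Level, findRecords, and the field names are all hypothetical placeholders; in VFP the lookup itself would be a SEEK/SCAN against each table's index):

import java.util.List;
import java.util.Map;

public class ChainSearch {

    // Hypothetical description of one level: the table to query plus the
    // field whose value becomes the key for the next level down.
    record Level(String table, String linkField) {}

    // Placeholder for "find all records in `table` matching `key`".
    static List<Map<String, Object>> findRecords(String table, Object key) {
        throw new UnsupportedOperationException("data access goes here");
    }

    // Recurse through the levels: for every match at this level,
    // descend into the next table using the link field's value.
    static void search(List<Level> levels, int depth, Object key) {
        if (depth >= levels.size()) return;  // past the last level
        Level level = levels.get(depth);
        for (Map<String, Object> rec : findRecords(level.table(), key)) {
            // ... process `rec`, e.g. add a node to the treeview ...
            search(levels, depth + 1, rec.get(level.linkField()));
        }
    }
}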
I don't think I can offer any more advice without at least seeing some table structures.
Sorry for answering so late, but the problem was of course that recursion wasn't a viable solution, since I had to search inside multiple tables. So I settled for a simple two-level search in the tables that I needed.
Thank you very much for the help, and sorry again for answering so late.
I want to read from a table, change a couple column values for a few lines in a query, then update those lines on the same table.
I'm using SAP BODS, and that's what I tried:
I was about to insert images but just found out I can't insert images until 10 rep.
Anyway, I created a DataFlow where I have the same table as source and target.
A query to filter (using a where clause) and change values (using the mapping), and then a Table Comparison, where I expected those lines to be set to update in this particular case: the table name in the first entry, the PK in 'Input primary key', and the two columns I want to change in 'Compare columns'. No other changes from the defaults that I can recall.
Got no warnings on 'validate all', and on execution I receive an ORA-00001 for the PK.
So ... I thought the Table Comparison would try to update, but seems like it's trying to insert instead. I want to know what I'm doing wrong and how could I get the job to do those updates. Thanks in advance.
Ps. I did search SO before asking and didn't find anything relevant.
OK, so it turns out I found what was going on a few minutes after posting the question. I wasn't sure if I should answer my own question, took a look at Etiquette for answering your own question, and decided to come back here and answer it.
For some reason I got stuck thinking that the Table Comparison was trying to insert a line with a PK that was already there, instead of doing the update I wanted. But after going back to the job to take another look at the issue, it occurred to me that the problem could be a duplicate in the incoming data set. I made a few adjustments to filter those out, and voilà.
I would like to know if somebody has already made the effort to integrate the following into GlazedLists:
I want a separate Comparator for each sort direction of a column. A practical example of something like this is a file browser where the directories are always sorted to the front, and only as a secondary sorting requirement do I want to sort by the file/directory name. With only one Comparator, this is of course not possible.
I also just realized that a workaround might exist if I use sorting by multiple columns and put the main sorting priority on the column that says whether an element is a file or a directory.
Does anyone have experience with this issue?
Thanks.
OK, it seems I had not yet unleashed the full power of GlazedLists when I asked the question. The answer is very simple: I can just add another SortedList, sorting by file type, as a decorator of my initial SortedList that is sorted based on the table.
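A minimal sketch of that decorator chain (the file-browser example and variable names are mine; SortedList and BasicEventList come from ca.odell.glazedlists):

import java.io.File;
import java.util.Comparator;
import ca.odell.glazedlists.BasicEventList;
import ca.odell.glazedlists.EventList;
import ca.odell.glazedlists.SortedList;

public class DirsFirstDemo {
    public static void main(String[] args) {
        EventList<File> files = new BasicEventList<>();

        // Inner list: whatever sort the table drives (here simply by name).
        SortedList<File> byName =
                new SortedList<>(files, Comparator.comparing(File::getName));

        // Outer decorator: directories before files. Where this comparator
        // ties, the inner by-name ordering shows through.
        SortedList<File> dirsFirst = new SortedList<>(
                byName, Comparator.comparing((File f) -> !f.isDirectory()));
    }
}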
To elaborate:
a) A table (BIGTABLE) has the capacity to hold a million rows, with a primary key ID that is random and unique.
b) What algorithm can be used to arrive at an ID that has not been used so far? This number will be used to insert another row into BIGTABLE.
Updated the question with more details:
c) This table already has about 100K rows, and the primary key is not set as an identity.
d) Currently, a random number is generated as the primary key and a row is inserted into this table; if the insert fails, another random number is generated. The problem is that this sometimes goes into a loop: the numbers generated are pretty random, but unfortunately they already exist in the table, and only after retrying for some time does it work.
e) The Sybase rand() function is used to generate the random number.
Hope this addition to the question helps clarify some points.
The question is of course: why do you want a random ID?
One case where I encountered a similar requirement was for the client IDs of a webapp: the client identifies himself with his client ID (stored in a cookie), so it has to be hard to guess another client's ID by brute force (because that would allow hijacking his data).
The solution I went with was to combine a sequential int32 with a random int32 to obtain an int64 that I used as the client ID. In PostgreSQL:
-- pack two ints into one bigint: the sequential part goes into the
-- high bits, the random part into the low 31 bits
CREATE FUNCTION lift(integer, integer) RETURNS bigint AS $$
    SELECT ($1::bigint << 31) + $2
$$ LANGUAGE SQL;

-- a random integer in [0, 2^31 - 1); lift(1,0) - 1 is just 2^31 - 1
CREATE FUNCTION random_pos_int() RETURNS integer AS $$
    SELECT floor((lift(1,0) - 1) * random())::integer
$$ LANGUAGE SQL;

ALTER TABLE client ALTER COLUMN id SET DEFAULT
    lift((nextval('client_id_seq'::regclass))::integer, random_pos_int());
The generated IDs are 'half' random, while the other 'half' guarantees you cannot obtain the same ID twice:
select lift(1, random_pos_int()); => 3108167398
select lift(2, random_pos_int()); => 4673906795
select lift(3, random_pos_int()); => 7414644984
...
Why is the unique ID random? Why not use IDENTITY?
How was the ID chosen for the existing rows?
The simplest thing to do is probably (Select Max(ID) from BIGTABLE) and then make sure your new "Random" ID is larger than that...
EDIT: Based on the added information I'd suggest that you're screwed.
If it's an option: Copy the table, then redefine it and use an Identity Column.
If, as another answer speculated, you do need a truly random identifier: make your PK two fields, an identity field and then a random number.
If you simply can't change the table's structure, checking whether the id exists before trying the insert is probably your only recourse.
There isn't really a good algorithm for this. You can use this basic construct to find an unused id:
int id;
do {
    id = generateRandomId();          // propose a candidate
} while (doesIdAlreadyExist(id));     // collision: draw again
doSomethingWithNewId(id);             // id was free at check time
Your best bet is to make your key space big enough that the probability of collisions is extremely low, then don't worry about it. As mentioned, GUIDs will do this for you. Or, you can use a pure random number as long as it has enough bits.
This page has the formula for calculating the collision probability.
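To get a feel for the numbers, here is a back-of-the-envelope version of that calculation; the approximation 1 - exp(-n(n-1)/2N) is the standard birthday bound, and the 63-bit key space is just an example I picked:

public class CollisionOdds {
    public static void main(String[] args) {
        double n = 1_000_000;        // rows in BIGTABLE
        double N = Math.pow(2, 63);  // size of a 63-bit positive key space
        // probability of at least one collision among n random draws:
        double p = 1 - Math.exp(-n * (n - 1) / (2 * N));
        System.out.printf("~%.2e chance of any collision%n", p);
        // prints roughly 5.4e-08, i.e. negligible
    }
}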
A bit outside of the box.
Why not pre-generate your random numbers ahead of time? That way, when you insert a new row into bigtable, the check has already been made. That would make inserts into bigtable a constant time operation.
You will have to perform the checks eventually, but that could be offloaded to a second process that doesn’t involve the sensitive process of inserting into bigtable.
Or go generate a few billion random numbers, and delete the duplicates, then you won't have to worry for quite some time.
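A rough sketch of what that pre-generation could look like (everything here is hypothetical: the pool size, the isFreeInBigtable check, and you would still keep the unique constraint as a backstop, since an ID vetted earlier could in principle be taken before it is consumed):

import java.security.SecureRandom;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class IdPool {
    private final BlockingQueue<Long> pool = new ArrayBlockingQueue<>(1000);
    private final SecureRandom rng = new SecureRandom();

    public IdPool() {
        // Second process/thread does the slow existence checks up front.
        Thread refiller = new Thread(() -> {
            while (true) {
                long candidate = rng.nextLong() & Long.MAX_VALUE; // non-negative
                try {
                    if (isFreeInBigtable(candidate)) pool.put(candidate);
                } catch (InterruptedException e) {
                    return;
                }
            }
        });
        refiller.setDaemon(true);
        refiller.start();
    }

    // The insert path just takes a pre-vetted ID: constant time.
    public long nextId() throws InterruptedException { return pool.take(); }

    private boolean isFreeInBigtable(long id) { return true; /* stub */ }
}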
Make the key field UNIQUE and IDENTITY and you won't have to worry about it.
If this is something you'll need to do often you will probably want to maintain a live (non-db) data structure to help you quickly answer this question. A 10-way tree would be good. When the app starts it populates the tree by reading the keys from the db, and then keeps it in sync with the various inserts and deletes made in the db. So long as your app is the only one updating the db the tree can be consulted very quickly when verifying that the next large random key is not already in use.
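As a sketch, with the JDK's ordered set standing in for the suggested 10-way tree (the class and the startup loading are made up; as noted above, this only works while your app is the sole writer):

import java.util.concurrent.ConcurrentSkipListSet;

public class KeyCache {
    private final ConcurrentSkipListSet<Long> usedIds = new ConcurrentSkipListSet<>();

    public KeyCache(Iterable<Long> allIdsFromDb) {      // populate at startup
        for (long id : allIdsFromDb) usedIds.add(id);
    }

    public boolean isFree(long candidate) {             // O(log n), no db hit
        return !usedIds.contains(candidate);
    }

    // Mirror every insert/delete the app makes against the db:
    public void noteInsert(long id) { usedIds.add(id); }
    public void noteDelete(long id) { usedIds.remove(id); }
}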
Pick a random number, check if it already exists, if so then keep trying until you hit one that doesn't.
Edit: or better yet, skip the check and just try to insert the row with different IDs until it works.
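That "just insert and retry" loop could look roughly like this in JDBC (table and column names are invented; SQLState class 23 is the standard integrity-constraint-violation class, but exact duplicate-key reporting varies per driver):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.concurrent.ThreadLocalRandom;

public class InsertRetry {
    // The PK constraint itself is the existence check.
    static long insertWithRandomId(Connection con, String payload) throws SQLException {
        String sql = "INSERT INTO BIGTABLE (id, payload) VALUES (?, ?)";
        while (true) {
            long id = ThreadLocalRandom.current().nextLong(Long.MAX_VALUE);
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                ps.setLong(1, id);
                ps.setString(2, payload);
                ps.executeUpdate();
                return id;                 // success: the id was free
            } catch (SQLException e) {
                // Duplicate key: draw again. Anything else: rethrow.
                if (e.getSQLState() == null || !e.getSQLState().startsWith("23")) throw e;
            }
        }
    }
}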
First question: is this a planned database or an already functional one? If it already has data inside, then the answer by bmdhacks is correct. If it is a planned database, here is the second question:
Does your primary key really need to be random? If the answer is yes, then use a function that creates a random ID from a known seed, plus a counter to know how many IDs have been created. Each ID created increments the counter.
If you keep the seed secret (i.e., declare it private), then no one else should be able to predict the next ID.
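A minimal sketch of that seed-plus-counter idea (the seed value and class are made up; note that java.util.Random is reproducible from its seed, which is exactly the point here, but it also means the seed must truly stay secret):

import java.util.Random;

public class SeededIdSource {
    private static final long SECRET_SEED = 0x5EEDCAFEL; // hypothetical; keep private
    private final Random rng = new Random(SECRET_SEED);
    private long counter;  // how many IDs were ever handed out; persist it

    // On restart, skip `savedCounter` draws to resume the same sequence.
    public SeededIdSource(long savedCounter) {
        for (long i = 0; i < savedCounter; i++) rng.nextLong();
        counter = savedCounter;
    }

    public synchronized long nextId() {
        counter++;                             // each ID bumps the counter
        return rng.nextLong() & Long.MAX_VALUE; // non-negative
    }
}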
If ID is purely random, there is no algorithm to find an unused ID in a similarly random fashion without brute forcing. However, as long as the bit-depth of your random unique id is reasonably large (say 64 bits), you're pretty safe from collisions with only a million rows. If it collides on insert, just try again.
Depending on your database, you might have the option of using either a sequence (Oracle) or an auto-increment column (MySQL, MS SQL, etc.). As a last resort, do a SELECT MAX(id) + 1 as the new id; just be careful of concurrent requests so you don't end up with the same max id twice: wrap it in a lock together with the upcoming insert statement.
I've seen this done so many times before via brute force, using random number generators, and it's always a bad idea. Generating a random number outside of the db and attempting to see if it exists will put a lot of strain on your app and database. And it could lead to two processes picking the same id.
Your best option is to use MySQL's autoincrement ability. Other databases have similar functionality. You are guaranteed a unique id and won't have issues with concurrency.
It is probably a bad idea to scan every value in that table every time, looking for a unique value. I think the way to do this would be to keep a value in another table: lock that table, read the value, calculate the next id, write the next id back, and release the lock. You can then use the id you read, confident that your current process is the only one holding that value. Not sure how well it scales.
Alternatively use a GUID for your ids, since each newly generated GUID is supposed to be unique.
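In Java, for example, GUIDs are one line; java.util.UUID generates RFC 4122 version-4 (random) UUIDs:

import java.util.UUID;

public class GuidDemo {
    public static void main(String[] args) {
        // Collisions are possible in theory but negligible in practice.
        UUID id = UUID.randomUUID();
        System.out.println(id);  // e.g. 8f14e45f-ceea-467f-9ff0-1f4b6a2f8d3c
    }
}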
Is it a requirement that the new ID also be random? If so, the best answer is just to loop over (randomize, test for existence) until you find one that doesn't exist.
If the data just happens to be random, but that isn't a strong constraint, you can just use SELECT MAX(idcolumn), increment in a way appropriate to the data, and use that as the primary key for your next record.
You need to do this atomically, so either lock the table or use some other concurrency control appropriate to your DB configuration and schema. Stored procs, table locks, row locks, SELECT...FOR UPDATE, whatever.
Note that in either approach you may need to handle failed transactions. You may theoretically get duplicate key issues in the first (though that's unlikely if your key space is sparsely populated), and you are likely to get deadlocks on some DBs with approaches like SELECT...FOR UPDATE. So be sure to check and restart the transaction on error.
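The shape of that atomic MAX+1 approach, sketched in JDBC. The locking statement is deliberately left out since the syntax varies per DB (Sybase has HOLDLOCK, others SELECT ... FOR UPDATE), so treat this as the transaction-and-retry skeleton only:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class MaxPlusOne {
    static long allocateId(Connection con) throws SQLException {
        con.setAutoCommit(false);
        try (Statement st = con.createStatement();
             ResultSet rs = st.executeQuery("SELECT MAX(id) FROM BIGTABLE")) {
            rs.next();
            long id = rs.getLong(1) + 1; // increment as appropriate for your data
            // ... INSERT the new row with `id` inside this same transaction ...
            con.commit();
            return id;
        } catch (SQLException e) {
            con.rollback();  // duplicate key or deadlock: caller should retry
            throw e;
        }
    }
}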
First check whether MAX(ID) + 1 is taken; if not, use it.
If MAX(ID) + 1 exceeds the maximum, select an ordered chunk at the top and loop backwards looking for a hole. Repeat with further chunks until you run out of numbers (in which case throw a big error).
If a hole is found, save the ID in another table; you can then use that as the starting point for the next case, to save looping.
Skipping the reasoning of the task itself, the only algorithm that:
- will give you an ID not in the table,
- that will be used to insert a new line in the table,
- and will result in a table still having random unique IDs
is generating a random number and then checking if it's already used.
The best algorithm in that case is to generate a random number and do a select to see if it exists, or just try to insert it if your database errors out sanely. Depending on the range of your key versus how many records there are, this could take a small amount of time; but the time can also spike, and isn't consistent at all.
Would it be possible to run some queries on BIGTABLE and see if there are any ranges that could be exploited? I.e., between 100,000 and 234,000 there are no IDs yet, so we could add IDs there?
Why not append the current time in seconds to your randomly generated number? That way the only way to get an identical ID is if two rows are created in the same second and are given the same random number by your generator.
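One possible encoding of that combination (the 20-bit split is just an example I picked; any split that leaves enough random bits works):

import java.time.Instant;
import java.util.concurrent.ThreadLocalRandom;

public class TimePlusRandom {
    public static void main(String[] args) {
        // Seconds since the epoch in the high bits, 20 random bits below:
        // a clash needs the same second AND the same random draw.
        long seconds = Instant.now().getEpochSecond();
        long id = (seconds << 20) | ThreadLocalRandom.current().nextLong(1 << 20);
        System.out.println(id);
    }
}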