I am new to HBase.
Is it possible to auto-increment the row key in HBase, and if so, how? (That is, for each insert the row key should increment itself.)
Alternatively, is it possible to auto-increment any other column? (That is, for each insert that column should be incremented by 1.)
Monotonically increasing row keys are not recommended in HBase; see section 6.3.2 of http://hbase.apache.org/book/rowkey.design.html for reference. In fact, using globally ordered row keys would cause all instances of your distributed application to write to the same region, which would become a bottleneck.
If you can avoid auto-increment IDs and only need unique IDs in a distributed system, you can use something like "hostname" + "PID" + "TIMESTAMP" as the key (e.g. web01-12345-1451305685300). This way it would be unique for each row.
If you are sure you need a global auto-increment in a table (whether for the key or for some column value), you can use the incrementColumnValue call: keep a separate row in your table (or create a dedicated table for this) that stores the current value, and have the process call incrementColumnValue to get the next value before inserting the new row. Note that this way you cannot guarantee there will be no gaps: if the client fails after calling incrementColumnValue, the counter will have been incremented but the row will never be inserted.
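As a minimal sketch of that pattern, here it is using the HBase shell's incr command, which performs the same atomic server-side increment as incrementColumnValue in the Java client; the table, row, and column names are invented for illustration:
create 'counters', 'c'                        # dedicated counter table
incr 'counters', 'mytable_id', 'c:next', 1    # atomically increments and prints the next value
The value returned (COUNTER VALUE) is then used as the row key, or column value, of the new row you insert.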
In short, all of the proposed solutions are client-side; there is no server-side implementation of this feature in HBase.
Related
I have a table where I can choose to delete a row. E.g. I have a table with 5 records numbered 1-5; after deleting, say, record 3, I am looking for a way to have the remaining records numbered 1, 2, 3, 4 and not 1, 2, 4, 5.
If you want to reorder by IDs, please consider whether it is really necessary; here is an explanation: https://laracasts.com/discuss/channels/eloquent/how-do-you-handle-with-reordering-items. Also, if you are using soft deletes, it can cause problems with unique IDs.
If you are reordering by some other column, you can iterate through each entity after deleting and set that column to the iterator value.
Warning: if you use this approach on primary key columns, it might introduce inconsistency and mess up your relations if you don't handle them properly. Also, most of the time it is unnecessary to reorder the primary key column.
The process can also be applied in the database itself. The general procedure is as follows:
Make sure that wherever the column is used as a foreign key in other tables, the definitions have ON UPDATE CASCADE.
(On a production server) put a lock on the table to reduce the risk of inconsistency.
Apply the reordering.
SQL:
For example, you can run the following commands for MySQL (inspired by this answer):
-- if on production, we lock the table for writes
LOCK TABLES `my_reordering_table` WRITE;
SET @count = 0;
UPDATE `my_reordering_table` SET `id` = (@count := @count + 1) ORDER BY `id`;
ALTER TABLE `my_reordering_table` AUTO_INCREMENT = 1;
UNLOCK TABLES;
Warning: again, make sure you do not run this carelessly on a production server, and if you do run it, make sure all the referencing foreign keys have ON UPDATE CASCADE.
I need to update some tables in my application from warehouse tables that are updated weekly or biweekly, and I should update my tables based on those. My tables are referenced by foreign keys in other tables, so I cannot just truncate them and reinsert all the data every time. Instead, I have to take the delta and update accordingly, based on a few primary key columns which don't change. I need some input on how to implement this approach.
My approach:
Check the last updated time of those tables/views.
If it is more recent, compare each row in my table and the warehouse table based on the primary key.
Update each column if it is different.
Do nothing if there is no change in the columns.
Insert if there is a new record.
My Question:
How do I implement this? Is writing PL/SQL code a good and efficient way to do it, given that the expected number of records is around 800K?
Please provide any sample code or links.
I would go for PL/SQL with the BULK COLLECT / FORALL method. You can use MINUS in your cursor to reduce the data size by computing only the difference.
You can check this article for more information about BULK COLLECT, FORALL, and the SQL and PL/SQL engines: http://www.oracle.com/technetwork/issue-archive/2012/12-sep/o52plsql-1709862.html
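As a rough illustration, here is a minimal sketch of that approach; the table and column names (my_table, wh_table, pk_id, col1) are invented for this example, and your real tables will have more columns:
DECLARE
  CURSOR delta_cur IS
    SELECT pk_id, col1 FROM wh_table
    MINUS
    SELECT pk_id, col1 FROM my_table;  -- only new or changed rows survive the MINUS
  TYPE id_tab  IS TABLE OF wh_table.pk_id%TYPE;
  TYPE col_tab IS TABLE OF wh_table.col1%TYPE;
  l_ids  id_tab;
  l_cols col_tab;
BEGIN
  OPEN delta_cur;
  LOOP
    FETCH delta_cur BULK COLLECT INTO l_ids, l_cols LIMIT 10000;  -- fetch in batches
    EXIT WHEN l_ids.COUNT = 0;
    FORALL i IN 1 .. l_ids.COUNT
      MERGE INTO my_table t
      USING (SELECT 1 FROM dual) d
      ON (t.pk_id = l_ids(i))
      WHEN MATCHED THEN UPDATE SET t.col1 = l_cols(i)
      WHEN NOT MATCHED THEN INSERT (pk_id, col1) VALUES (l_ids(i), l_cols(i));
  END LOOP;
  CLOSE delta_cur;
  COMMIT;
END;
/
Depending on your data, a single set-based MERGE straight from the warehouse table may be even simpler, and at 800K rows it is quite feasible; it is worth benchmarking both before committing to the PL/SQL loop.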
There are many parts to your question above and I will answer as best I can:
While it is possible to disable the referencing foreign keys, truncate the table, repopulate it with the updated data, and then re-enable the foreign keys, given your requirements described above I don't believe truncating the table each time is optimal.
Yes, in principle PL/SQL is a good way to achieve this, as the logic is too complex to handle in plain SQL, and PL/SQL is an efficient alternative.
Conceptually, the approach I would take is something like as follows:
Initial set up:
Create a sequence called activity_seq.
Add an "activity_id" column of type NUMBER, with a unique constraint, to your source table/s.
Add a trigger to the source table/s setting activity_id = activity_seq.nextval for each insert/update of a table row.
Create some kind of master table to hold the "last processed activity id" value.
Then bi/weekly:
Retrieve the value of "last processed activity id" from the master table.
Select all rows in the source table/s having an activity_id value greater than the "last processed activity id" value.
Iterate through the selected source rows and update the target if a match is found based on whatever your match criterion is, or insert a new row into the target if no match is found (I assume there is no delete, as you do not mention it).
On completion, update the master table's "last processed activity id" to the greatest activity_id value among the source rows processed in step 3 above.
(please note that, depending on your environment and the number of rows processed, the above process may need to be split and repeated over a number of transactions)
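For illustration, the initial set-up might look like the following; all object names are invented, and the direct sequence assignment in the trigger requires 11g (on older versions use SELECT activity_seq.NEXTVAL INTO :NEW.activity_id FROM dual):
CREATE SEQUENCE activity_seq;

ALTER TABLE source_table ADD (activity_id NUMBER);
ALTER TABLE source_table ADD CONSTRAINT source_table_activity_uq UNIQUE (activity_id);

CREATE OR REPLACE TRIGGER source_table_activity_trg
  BEFORE INSERT OR UPDATE ON source_table
  FOR EACH ROW
BEGIN
  :NEW.activity_id := activity_seq.NEXTVAL;  -- stamp every inserted/updated row
END;
/

CREATE TABLE etl_control (
  table_name              VARCHAR2(30) PRIMARY KEY,
  last_processed_activity NUMBER NOT NULL
);
The bi/weekly run then reduces to selecting the source rows WHERE activity_id > (SELECT last_processed_activity FROM etl_control WHERE table_name = 'SOURCE_TABLE'), applying the updates/inserts, and updating etl_control in the same transaction.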
I hope this proves helpful
When I execute an UPSERT command in Apache Phoenix, I always see that Phoenix adds an extra column (named _0) with an empty value in HBase. This column (_0) is auto-generated by Phoenix, but I don't need it, like this:
ROW    COLUMN+CELL
abc    column=F:A, timestamp=1451305685300, value=123
abc    column=F:_0, timestamp=1451305685300, value=    # I want to avoid generating this cell
Could you tell me how to avoid that? Thank you very much!
"At create time, to improve query performance, an empty key value is
added to the first column family of any existing rows or the default
column family if no column families are explicitly defined. Upserts will also add this empty key value. This improves query performance by having a key value column we can guarantee always being there and thus minimizing the amount of data that must be projected and subsequently returned back to the client."
Apache Phoenix Documentation
Regarding your question whether that is avoidable:
You could work around the problem by adding the following statements at the end of your SQL:
ALTER TABLE "<your-table>" ADD "<your-cf>"."_0" VARCHAR(1);
ALTER TABLE "<your-table>" DROP COLUMN "<your-cf>"."_0";
You should only do this if you query the table with Phoenix but then access it with another system that is not aware of this Phoenix-specific dummy value.
I'm using Oracle 11gR2, and for the product table I need to assign each newly inserted product an auto-increment ID going from 1 to 65535. Products can then be deleted.
When I reach the 65535th ID, I need to scan the table to find a free hole for assigning the next ID.
Because of this requirement an Oracle sequence cannot be used, so I am using a function (I also tried a trigger on insert) to generate a free ID...
The problem is that I cannot handle batch inserts, for example, and I have concurrency problems...
How can I solve this? By using some sort of external ID generator?
This sounds like an arbitrary design. Is there a good reason for having a 16-bit maximum product ID, or for reusing IDs? Both constraints are bad practice.
I doubt any external generator will provide anything that Oracle doesn't already provide. I recommend using sequences for the batch inserts; the problem you have is how to recycle the IDs. Plain Oracle sequences don't track the primary key, so you need a way to find recycled keys first and then fall back to the sequence.
Product ID Recycling
Batch inserts - use a sequence for the keys the first time you load them. For this small range, set NOCACHE on the sequence to eliminate gaps.
Deletes - when a product is deleted, instead of actually deleting the row, set a DELETED = 'Y' flag on it.
Inserts - select the minimum ID from the product table where DELETED = 'Y', update that record with the new product info (keeping the same ID), and set DELETED = 'N'; only if no such row exists, take a new ID from the sequence.
This ensures you always recycle IDs before you consume new sequence values.
If you want to implement the logic in the database, you can create a view (VIEW$PRODUCTS) restricted to DELETED = 'N' and an INSTEAD OF INSERT trigger to do the insert, as sketched below.
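For illustration, a rough sketch of that view/trigger combination follows; the column names (name, price) are invented, and it deliberately ignores two sessions racing for the same recycled ID (you would need SELECT ... FOR UPDATE or a serializing lock to close that gap, which is exactly the concurrency problem mentioned in the question):
CREATE SEQUENCE product_seq MAXVALUE 65535 NOCACHE;

CREATE OR REPLACE VIEW view$products AS
  SELECT id, name, price FROM products WHERE deleted = 'N';

CREATE OR REPLACE TRIGGER view$products_ins
  INSTEAD OF INSERT ON view$products
  FOR EACH ROW
DECLARE
  l_recycled products.id%TYPE;
BEGIN
  -- MIN returns NULL (not NO_DATA_FOUND) when there is no soft-deleted row
  SELECT MIN(id) INTO l_recycled FROM products WHERE deleted = 'Y';
  IF l_recycled IS NOT NULL THEN
    -- recycle a soft-deleted row, keeping its ID
    UPDATE products
       SET name = :NEW.name, price = :NEW.price, deleted = 'N'
     WHERE id = l_recycled;
  ELSE
    -- no hole to fill: take the next ID from the sequence
    INSERT INTO products (id, name, price, deleted)
    VALUES (product_seq.NEXTVAL, :NEW.name, :NEW.price, 'N');
  END IF;
END;
/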
In any scenario, when you run out of sequence values (or the sequence wraps), you are out of luck for batch inserts. I'd reconsider that part of the design if I were you.
I have a table A (3 columns) in production with around 10 million records. I want to add one more column to that table with a default value of 1. Is it going to impact production DB performance if I add a column with default value 1 (or something else)? What would be the best approach to avoid any performance impact on the DB? Your thoughts are much appreciated!
In Oracle 11g the process of adding a new column with a default value has been considerably optimized. If the newly added column is specified as NOT NULL, its default value is maintained in the data dictionary, and it is no longer necessary to store the default value in every record of the table, so each record no longer has to be updated with the default. This optimization considerably reduces the amount of time the table is exclusively locked during the operation.
alter table <tab_name> add(<col_name> <data_type> default <def_val> not null)
Moreover, a column added that way will not consume space until you deliberately start updating it or inserting records with a non-default value for it. So the operation of adding a new column with a default value and a NOT NULL constraint completes pretty quickly.
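Applied to the table in the question, that could look like the following (the column name new_col and its NUMBER type are assumptions):
alter table a add (new_col number default 1 not null);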
I think it is better to first create a backup table with this syntax:
create table BackUpTable as SELECT * FROM YourTable;
alter table BackUpTable add (newColumn number(5,0) default 1);