I'm trying to understand what will happen if my application attempts to add 2 columns simultaneously to a ClickHouse table based on the ReplicatedMergeTree engine, using 2 different nodes. Will ClickHouse reject one of the ALTERs, or will it fail to apply?
So I have 2 nodes A and B and table alter_test. And then I run on node A
ALTER TABLE alter_test ADD COLUMN Added1 UInt32 FIRST;
and at the same time on node B
ALTER TABLE alter_test ADD COLUMN Added1 String FIRST;
Will one of the statements always fail? I checked the docs, and they say the ALTERs are executed asynchronously after being registered in ZooKeeper. I guess my question is whether ClickHouse will detect the conflict at the ZooKeeper stage.
Better use:
ALTER TABLE alter_test ON CLUSTER 'cluster_name' ADD COLUMN IF NOT EXISTS Added1 UInt32
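For example, a minimal sketch (the cluster name 'my_cluster' is a placeholder; IF NOT EXISTS simply skips the ADD when a column of that name already exists, so a concurrent duplicate ALTER becomes a no-op rather than an error):
-- 'my_cluster' stands in for whatever your cluster is called
ALTER TABLE alter_test ON CLUSTER 'my_cluster'
    ADD COLUMN IF NOT EXISTS Added1 UInt32 FIRST;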
Related
I have a table where I can choose to delete a row. E.g. I have a table with 5 records numbered 1-5; after deleting, let's say, record 3, I am looking for a way for the remaining records to be numbered 1, 2, 3, 4 and not 1, 2, 4, 5.
If you want to reorder by IDs, please consider whether it is necessary. Here is an explanation: https://laracasts.com/discuss/channels/eloquent/how-do-you-handle-with-reordering-items Also, if you are using soft deletes, it can cause a problem concerning unique IDs.
If you are using another column to reorder by, you can iterate through each entity and set its value to the iterator value after deleting.
Warning: If you use this approach on primary key columns, it might introduce inconsistency and mess up your relations if you don't handle it properly. Also, most of the time it is unnecessary to reorder the primary key column.
The process can also be applied from the database itself. The general procedure is as follows:
Make sure that if the column is referenced by foreign keys in other tables, those definitions include ON UPDATE CASCADE (see the foreign-key sketch at the end of this answer)
(On a production server) lock the table to reduce inconsistency
Apply the reordering
SQL:
For example, you can run the following commands for MySQL (inspired by this answer):
-- if on a production server, lock the table for writes
LOCK TABLES my_reordering_table WRITE;
SET @count = 0;
UPDATE `my_reordering_table` SET `my_reordering_table`.`id` = @count := @count + 1;
ALTER TABLE `my_reordering_table` AUTO_INCREMENT = 1;
-- release the write lock taken above
UNLOCK TABLES;
Warning: Again, make sure you do not run this on a production server, and if you do, just make sure all the foreign keys are ON UPDATE CASCADE.
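As noted in step 1, any referencing foreign keys need to cascade updates so the renumbered parent ids propagate automatically. A minimal sketch, assuming a hypothetical child table my_child_table that references my_reordering_table:
-- illustrative names; the point is the ON UPDATE CASCADE clause
ALTER TABLE `my_child_table`
  ADD CONSTRAINT `fk_child_reordering`
  FOREIGN KEY (`parent_id`) REFERENCES `my_reordering_table` (`id`)
  ON UPDATE CASCADE;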
I need to update some tables in my application from warehouse tables which are updated weekly or biweekly, and I should update my tables based on those. These tables have foreign keys referenced from other tables, so I cannot just truncate the table and re-insert the whole data every time. So I have to take the delta and update accordingly, based on a few primary key columns which don't change. I need some input on how to implement this approach.
My approach:
Check the last updated time of those tables/views.
If it is more recent, compare each row in my table and the warehouse table based on the primary key.
Update each column if it is different.
Do nothing if there is no change in the columns.
Insert if there is a new record.
My Question:
How do I implement this? Is writing PL/SQL code a good and efficient way, given that the expected number of records is around 800K?
Please provide any sample code or links.
I would go for PL/SQL and the BULK COLLECT / FORALL method. You can use MINUS in your cursor in order to reduce the data size and calculate the difference.
You can check this site for more information about bulk collect, forall and engines: http://www.oracle.com/technetwork/issue-archive/2012/12-sep/o52plsql-1709862.html
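A minimal sketch of that idea, assuming hypothetical tables my_table and warehouse_table that share an unchanging primary key id and a single data column col1 (extend the column lists to match your schema):
DECLARE
  -- collections for the delta rows; MINUS keeps only rows that differ
  TYPE t_ids  IS TABLE OF my_table.id%TYPE;
  TYPE t_col1 IS TABLE OF my_table.col1%TYPE;
  l_ids  t_ids;
  l_col1 t_col1;
  CURSOR c_delta IS
    SELECT id, col1 FROM warehouse_table
    MINUS
    SELECT id, col1 FROM my_table;
BEGIN
  OPEN c_delta;
  LOOP
    FETCH c_delta BULK COLLECT INTO l_ids, l_col1 LIMIT 1000;
    EXIT WHEN l_ids.COUNT = 0;
    -- apply the batch: update matching rows, insert new ones
    FORALL i IN 1 .. l_ids.COUNT
      MERGE INTO my_table t
      USING (SELECT l_ids(i) AS id, l_col1(i) AS col1 FROM dual) s
      ON (t.id = s.id)
      WHEN MATCHED THEN UPDATE SET t.col1 = s.col1
      WHEN NOT MATCHED THEN INSERT (id, col1) VALUES (s.id, s.col1);
    COMMIT;
  END LOOP;
  CLOSE c_delta;
END;
/
MERGE handles both the update-if-different and insert-if-new branches in one statement; the "do nothing" case falls out naturally, because unchanged rows never appear in the MINUS result.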
There are many parts to your question above and I will answer as best I can:
While it is possible to disable the referencing foreign keys, truncate the table, repopulate it with the updated data and then re-enable the foreign keys, given your requirements described above I don't believe truncating the table each time is optimal.
Yes, in principle PL/SQL is a good way to achieve this, as the logic is too complex to handle in plain SQL and PL/SQL is an efficient alternative.
Conceptually, the approach I would take is something like the following (a SQL sketch of the set-up appears after the steps):
Initial set-up:
Create a sequence called activity_seq
Add an "activity_id" column of type NUMBER to your source tables with a unique constraint
Add a trigger to the source table/s setting activity_id = activity_seq.nextval for each insert/update of a table row
Create some kind of master table to hold the "last processed activity id" value
Then bi/weekly:
retrieve the value of "last processed activity id" from the master table
select all rows in the source table/s having an activity_id value > the "last processed activity id" value
iterate through the selected source rows and update the target if a match is found based on whatever your match criterion is, or if no match is found then insert a new row into the target (I assume there is no delete as you do not mention it)
on completion, update the master table's "last processed activity id" to the greatest activity_id value of the source rows processed in step 3 above.
(please note that, depending on your environment and the number of rows processed, the above process may need to be split and repeated over a number of transactions)
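A hedged sketch of the initial set-up described above, with illustrative names (source_table and replication_control stand in for your actual tables):
CREATE SEQUENCE activity_seq;

ALTER TABLE source_table ADD (activity_id NUMBER);
ALTER TABLE source_table ADD CONSTRAINT source_table_activity_uq UNIQUE (activity_id);

-- stamp every inserted or updated row with the next activity id
CREATE OR REPLACE TRIGGER source_table_activity_trg
  BEFORE INSERT OR UPDATE ON source_table
  FOR EACH ROW
BEGIN
  :NEW.activity_id := activity_seq.NEXTVAL;
END;
/

-- master table holding the "last processed activity id" per source table
CREATE TABLE replication_control (
  table_name                 VARCHAR2(30) PRIMARY KEY,
  last_processed_activity_id NUMBER
);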
I hope this proves helpful
When I execute the UPSERT command in Apache Phoenix, I always see that Phoenix adds an extra column (named _0) with an empty value in HBase. This column (_0) is auto-generated by Phoenix, but I don't need it, like this:
ROW    COLUMN+CELL
abc    column=F:A, timestamp=1451305685300, value=123
abc    column=F:_0, timestamp=1451305685300, value=    # I want to avoid generating this row
Could you tell me how to avoid that? Thank you very much!
"At create time, to improve query performance, an empty key value is
added to the first column family of any existing rows or the default
column family if no column families are explicitly defined. Upserts will also add this empty key value. This improves query performance by having a key value column we can guarantee always being there and thus minimizing the amount of data that must be projected and subsequently returned back to the client."
Apache Phoenix Documentation
Regarding your question whether that is avoidable:
You could work around the problem by adding the following statements at the end of your SQL:
ALTER TABLE "<your-table>" ADD "<your-cf>"."_0" VARCHAR(1);
ALTER TABLE "<your-table>" DROP COLUMN "<your-cf>"."_0";
You should only do this if you query the table with Phoenix but then access it with another system that is not aware of this Phoenix-specific dummy value.
I have a database with about 125 000 rows, each row with a primary key, a couple of int columns and a couple of varchars.
I've added an int column and I'm trying to populate it before adding a NOT NULL constraint.
The db is persisted in a script file. I've read somewhere that all the affected rows get loaded into memory before the actual update, which means there won't be a disk write for every row. The whole db is about 20 MB, which would mean loading it and doing the update should be reasonably fast, right?
So, no joins, no nested queries, basic update.
I've tried multiple db managers including the one bundled with hsql jar.
update tbl1 set col1 = 1
Query never finishes executing.
It is probably running out of memory.
The easier way to do this operation is to define the column with DEFAULT 1, which does not use much memory regardless of the size of the table. You can even add the NOT NULL constraint at the same time:
ALTER TABLE T ADD COLUMN C INT DEFAULT 1 NOT NULL
Suppose the following scenario:
I have a master database that contains lots of data. In this database I have a key table that I'm going to call DataOwners for this example; the DataOwners table has 4 records, and each record of each of the other tables in the database "belongs" directly or indirectly to a record of DataOwners. By "belongs" I mean it is linked to it with foreign keys.
I also have 2 other slave databases with the exact same structure as my master database, which are only updated through replication from the master database. However, SlaveDatabase1 should only have records from DataOwner 2 and SlaveDatabase2 should only have records from DataOwners 1 and 3, whereas MasterDatabase has records of DataOwners 1, 2, 3 and 4.
Is there any tool for Oracle that allows me to do this kind of selective record replication?
If not, is there any way to improve my replication method? Which is:
add to each table a trigger that inserts the record changes into a group of replication tables
execute the commands from the replication tables on the selected slaves
The simplest option would be to define materialized views in the various slave databases that replicate just the data that you want. So, for example, if there is a table A in the master database, then in slave database 1, you'd create a materialized view
CREATE MATERIALIZED VIEW a
<<refresh criteria>>
AS
SELECT a.*
FROM a@to_master a,
dataOwners@to_master dm
WHERE a.dataOwnerID = dm.dataOwnerID
AND dm.some_column = <<some criteria that selects DataOwner2>>
while slave database 2 has a very similar materialized view
CREATE MATERIALIZED VIEW a
<<refresh criteria>>
AS
SELECT a.*
FROM a@to_master a,
dataOwners@to_master dm
WHERE a.dataOwnerID = dm.dataOwnerID
AND dm.some_column = <<some criteria that selects DataOwner1 & 3>>
Of course, if the dataOwnerID can be hard-coded, you could simplify things and avoid doing the join. I'm guessing, though, that there is some column in the DataOwners table that identifies which slave a particular owner should be replicated to.
Assuming that you want only incremental changes to be replicated, you'd need to create some materialized view logs on the base tables in the master database. And you would probably want to configure refresh groups on the slave databases so that all the materialized views would refresh at the same time and would be transactionally consistent with each other.
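A hedged sketch of that extra plumbing, with illustrative names (the exact fast-refresh requirements depend on how your materialized views are defined):
-- on the master: materialized view logs so the slaves can fast-refresh
CREATE MATERIALIZED VIEW LOG ON a WITH PRIMARY KEY, ROWID;
CREATE MATERIALIZED VIEW LOG ON dataOwners WITH PRIMARY KEY, ROWID;

-- on each slave: group the materialized views so they refresh together
-- and stay transactionally consistent with each other
BEGIN
  DBMS_REFRESH.MAKE(
    name      => 'owner_refresh_group',
    list      => 'A',                 -- comma-separated list of the slave's MVs
    next_date => SYSDATE,
    interval  => 'SYSDATE + 1/24'     -- e.g. hourly; adjust as needed
  );
END;
/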
Oracle GoldenGate software can do all of these tasks. Inserts/updates/deletes are applied in the same order as on the master DB, so it avoids foreign key and other constraint issues.
The MasterDatabase Extract generates a trail file, and the data is then split out to DBs 1, 2, 3 and 4.
It can also do multi-directional replication, e.g. DB 1 sending data back to the master DB.
Besides GoldenGate, triggers may be your other option, but that requires some programming.
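If you stay with the trigger approach from the question, a minimal hedged sketch of the capture side might look like this (table a, its id column, dataOwnerID and the log table are all illustrative names):
CREATE SEQUENCE a_repl_log_seq;

CREATE TABLE a_replication_log (
  log_id        NUMBER PRIMARY KEY,
  operation     VARCHAR2(1) NOT NULL,        -- 'I', 'U' or 'D'
  a_id          NUMBER NOT NULL,
  data_owner_id NUMBER NOT NULL,
  logged_at     TIMESTAMP DEFAULT SYSTIMESTAMP
);

-- capture every change on the master, keyed by DataOwner, so a slave can
-- later replay only the rows belonging to the DataOwners it is allowed to hold
CREATE OR REPLACE TRIGGER a_replication_trg
  AFTER INSERT OR UPDATE OR DELETE ON a
  FOR EACH ROW
BEGIN
  INSERT INTO a_replication_log (log_id, operation, a_id, data_owner_id)
  VALUES (a_repl_log_seq.NEXTVAL,
          CASE WHEN INSERTING THEN 'I' WHEN UPDATING THEN 'U' ELSE 'D' END,
          COALESCE(:NEW.id, :OLD.id),
          COALESCE(:NEW.dataOwnerID, :OLD.dataOwnerID));
END;
/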