Sync ALTER TABLE ... DELETE on all replicas of a ClickHouse cluster

TL;DR
There are two questions:
1. How do I correctly run a synchronous ALTER TABLE ... DELETE on a ClickHouse cluster?
2. Will data be deleted from all replicas when running ALTER TABLE ... DELETE with mutations_sync = 2 and without ON CLUSTER? And how can this be verified?
Long
There are two ClickHouse installations: a single server (H1) and a cluster (3 nodes, H2). I created a table foo with the following engines: ReplacingMergeTree on H1 and ReplicatedReplacingMergeTree on H2 (created with ON CLUSTER bar). Then I run the following queries.
Before each test, I generated 1 million rows (roughly 200 MB).
1. Request to H1 (single-server):
clickhouse-client -h $H1 --queries-file=queries.sql
queries.sql:
ALTER TABLE foo DELETE WHERE 1 SETTINGS mutations_sync = 0;
SELECT * FROM foo LIMIT 1
The SELECT returns a record that hasn't been deleted yet, which stands to reason with mutations_sync = 0.
2. Do the same, but with mutations_sync = 1: the SELECT returns 0 rows. Same with mutations_sync = 2. So far, everything is as expected.
3. Request to H2 (cluster):
clickhouse-client -h $H2 --queries-file=queries.sql
queries.sql:
ALTER TABLE foo ON CLUSTER bar DELETE WHERE 1 SETTINGS mutations_sync = 2;
SELECT * FROM foo LIMIT 1
The SELECT returns a record, although it seems it shouldn't, since mutations_sync = 2 means the mutation must complete on all replicas before the query returns (or am I misunderstanding something?).
4. Do the same, but remove ON CLUSTER bar from the ALTER TABLE. In this case, the SELECT returns 0 rows.
I assume the behavior in case 3 is because, with ON CLUSTER, the request goes through ZooKeeper and returns almost immediately: ZooKeeper only receives the query in order to distribute it to all replicas, but does not wait for its completion. Is that right?
I want to check whether data is deleted from all replicas in case 4. I've tried requests like:
#!/bin/bash
clickhouse-client -h $H2_REPLIC1 --query="ALTER TABLE topics ON CLUSTER dc2_test DELETE WHERE 1 SETTINGS mutations_sync = 0";
clickhouse-client -h $H2_REPLIC2 --query="SELECT * FROM topics LIMIT 1 FORMAT TabSeparated";
But with both mutations_sync = 0 and mutations_sync = 2, the SELECT returns 0 rows (even if I increase the number of generated rows in foo to 30 million). I don't understand this behavior, so I can't answer my 2nd question (in the TL;DR).

No way.
No.
Mutations were implemented as ADMIN operations to solve the GDPR problem, not for day-to-day business (USER) tasks.
That's why mutations do not provide consistency / atomicity.
And that's why mutations are very unreliable if you try to use them to solve business-logic (USER) tasks.
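That said, mutation progress can be checked per replica via the system.mutations table. A minimal sketch, run against every replica (the table name follows the example above):
-- pending (unfinished) mutations for table foo on THIS replica
SELECT mutation_id, command, parts_to_do, is_done
FROM system.mutations
WHERE database = currentDatabase() AND table = 'foo' AND is_done = 0;
An empty result on every replica means the DELETE mutation has finished cluster-wide; a non-empty result shows which mutation is still being applied on that replica.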

Related

Is there any way to limit the size of the flow table in Open vSwitch and to verify that it is working?

I used the following command to restrict flow table 0 to only 5 entries:
$ sudo ovs-vsctl -- --id=@ft create Flow_Table flow_limit=5 overflow_policy=refuse -- set Bridge s1 flow_tables=0=@ft
When I dump the tables, the limit of 5 is applied to flow table 0, as follows:
sudo ovs-ofctl dump-tables s1
OFPST_TABLE reply (xid=0x2):
table 0 ("classifier"):
active=1, lookup=26, matched=0
max_entries=5
But when I ping from h1 to h2, it keeps accepting more than 5 pings.
A flow entry is defined by a unique tuple.
There is only one entry here, h1 to h2. Although there are multiple pings, they all match the same flow entry, which is why it says "active=1".
If you were to pingall, the response would read, "active=2":
h1 to h2
h2 to h1

How to tell when a mutation is done? (ReplicatedMergeTree)

I have limited experience with ClickHouse clusters; I currently have two nodes using ReplicatedMergeTree, with 1 shard and 2 replicas. I'm running into a problem synchronizing data from MySQL.
To update the table, I first delete the recent data (data_date > :days_ago), wait until the count of records in that range reaches zero, and then load the data from MySQL. The code looks like this:
delete from ods.my_table where data_date>:days_ago;
# here to check if record count is zero
select count(*) from ods.my_table where data_date>:days_ago;
# if count(*) =0 ,load data ; else wait
insert into ods.my_table select * from mysql('xxx'......) where data_date>:days_ago;
But sometimes I end up with zero records in ods.my_table for data_date > :days_ago.
If I run it again, there is data; run it again, and it is zero again... The pattern is: when the result is zero, a rerun fixes it; when it is not zero, a rerun breaks it.
I analyzed the logs and found that the insert statement was executed before the mutation had finished, so the freshly inserted data was deleted as well and went missing.
I tried to find a way to check whether the mutation on the table has finished, but I could not find a solution. Can anybody help me? Thanks in advance.
Just add a table TTL definition on the ClickHouse side and forget about manual deletes:
https://clickhouse.tech/docs/en/engines/table-engines/mergetree-family/mergetree/#mergetree-table-ttl
You can also add a TTL to an existing ClickHouse MergeTree table:
https://clickhouse.tech/docs/en/sql-reference/statements/alter/ttl/
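For example, a minimal sketch of adding a TTL to the existing table (assuming data_date is a Date or DateTime column; the 7-day interval is only illustrative):
-- expire rows 7 days after data_date; expired rows are removed by background merges
ALTER TABLE ods.my_table MODIFY TTL data_date + INTERVAL 7 DAY;
-- on a replicated/clustered setup you would typically add ON CLUSTER <cluster_name>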

Cassandra timing out when queried for a key that has over 10,000 rows, even with a 10-second timeout

I'm using DataStax Community v2.1.2-1 (AMI v2.5) with the preinstalled default settings.
And I have a table:
CREATE TABLE notificationstore.note (
user_id text,
real_time timestamp,
insert_time timeuuid,
read boolean,
PRIMARY KEY (user_id, real_time, insert_time))
WITH CLUSTERING ORDER BY (real_time DESC, insert_time ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
AND default_time_to_live = 20160
The other configurations are:
I have 2 nodes on m3.large instances, each with 1 x 32 GB SSD.
I'm facing timeouts on this particular table even with consistency set to ONE.
I increased the heap space to 3 GB (the machine has 8 GB of RAM).
I increased the read timeout to 10 seconds.
select count (*) from note where user_id = 'xxx' limit 2; // errors={}, last_host=127.0.0.1.
I am wondering if the problem could be with the time-to-live, or whether there is any other configuration or tuning that matters here.
The data in the database is pretty small.
Also, the problem does not occur right after inserting; it happens after some time (more than 6 hours).
Thanks.
[Copying my answer from here because it's the same environment/problem: amazon ec2 - Cassandra Timing out because of TTL expiration.]
You're running into a problem where the number of tombstones (deleted values) is passing a threshold, and then timing out.
You can see this if you turn on tracing and then try your select statement, for example:
cqlsh> tracing on;
cqlsh> select count(*) from test.simple;
activity | timestamp | source | source_elapsed
---------------------------------------------------------------------------------+--------------+--------------+----------------
...snip...
Scanned over 100000 tombstones; query aborted (see tombstone_failure_threshold) | 23:36:59,324 | 172.31.0.85 | 123932
Scanned 1 rows and matched 1 | 23:36:59,325 | 172.31.0.85 | 124575
Timed out; received 0 of 1 responses for range 2 of 4 | 23:37:09,200 | 172.31.13.33 | 10002216
You're kind of running into an anti-pattern for Cassandra where data is stored for just a short time before being deleted. There are a few options for handling this better, including revisiting your data model if needed. Here are some resources:
The cassandra.yaml configuration file - See section on tombstone settings
Cassandra anti-patterns: Queues and queue-like datasets
About deletes
For your sample problem, I tried lowering the gc_grace_seconds setting to 300 (5 minutes). That causes the tombstones to be cleaned up more frequently than the default 10 days, but that may or may not be appropriate for your application. Read up on the implications of deletes and adjust as needed for your application.
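For example, a minimal CQL sketch of lowering the setting on the table in question (the 300-second value mirrors what I tried; pick a value consistent with how often you run repairs):
-- shorten how long tombstones are kept before they become eligible for collection
ALTER TABLE notificationstore.note WITH gc_grace_seconds = 300;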

Best SQL DB design for temp storage of millions of records

I have a database table that collects records at the rate of about 4 records per/sec/device. This table gets pretty big pretty fast. When a device completes its task another process will loop through all the records, perform some operations, combine them into 5 minute chunks, compress them and store them for later use. Then it deletes all the records in that table for that device.
Right now there are nearly 1 million records for several devices. I can loop through them just fine to perform the processing, it appears, but when I try to delete them I time out. Is there a way to delete these records more quickly? Perhaps by turning off object tracking temporarily? Using some lock hint? Would the design be better to simply create a separate table for each device when it begins its task and then just drop it once processing of the data is complete? The timeout is set to 10 minutes. I would really like to get that process to complete within that 10 minute period if possible.
CREATE TABLE [dbo].[case_waveform_data] (
[case_id] INT NOT NULL,
[channel_index] INT NOT NULL,
[seconds_between_points] REAL NOT NULL,
[last_time_stamp] DATETIME NOT NULL,
[value_array] VARBINARY (8000) NULL,
[first_time_stamp] DATETIME NULL
);
CREATE CLUSTERED INDEX [ClusteredIndex-caseis-channelindex] ON [dbo].[case_waveform_data]
(
[case_id] ASC,
[channel_index] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
CREATE NONCLUSTERED INDEX [NonClusteredIndex-all-fields] ON [dbo].[case_waveform_data]
(
[case_id] ASC,
[channel_index] ASC,
[last_time_stamp] ASC
)
INCLUDE ( [seconds_between_points],
[value_array]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
SQL Server 2008+ standard is the DB platform
UPDATE 3/31/2014:
I have started going down a path that seems to be problematic. Is this really all that bad?
I am creating a stored proc that takes a table-value parameter containing the data I want to append and a varchar parameter that contains a unique table name for the device. This stored proc is going to check for the existence of the table and, if it does not exist, create it with a specific structure. Then it will insert the data from the TVP. The problem I see is that I have to use dynamic sql in the SP as there seems to be no way to pass in a table name as a variable to either a CREATE or INSERT. Plus, every article I read on how to do this says not to...
Unfortunately, if I have a single table which is getting all the inserts at a frequency of 4/sec/device, just doing a count on the table for a specific case_id takes 17 minutes even with a clustered index on case_id and channel_index. So trying to delete them takes around 25 - 30 minutes. This also causes locking to occur and therefore the inserts start taking longer and longer which causes the service to get way behind. This even occurs when there is no deleting happening as well.
The described stored proc is designed to reduce the inserts from 4/sec/device to 1/sec/device as well as making it possible to just drop the table when done rather than deleting each record individually. Thoughts?
UPDATE 2 3/31/2014
I am not using cursors or any looping in the way you are thinking. Here is the code I use to loop through the records; this part runs at an acceptable speed, however:
using (SqlConnection liveconn = new SqlConnection(LiveORDataManager.ConnectionString))
{
using (SqlCommand command = liveconn.CreateCommand())
{
command.CommandText = channelQueryString;
command.Parameters.AddWithValue("channelIndex", channel);
command.CommandTimeout = 600;
liveconn.Open();
SqlDataReader reader = command.ExecuteReader();
// Call Read before accessing data.
while (reader.Read())
{
var item = new
{
//case_id = reader.GetInt32(0),
channel_index = reader.GetInt32(0),
last_time_stamp = reader.GetDateTime(1),
seconds_between_points = reader.GetFloat(2),
value_array = (byte[])reader.GetSqlBinary(3)
};
// Perform processing on item
}
}
}
The SQL I use to delete is trivial:
DELETE FROM case_waveform_data where case_id = @CaseId
This line takes 25+ minutes to delete 1 million rows
Sample data (value_array is truncated):
case_id channel_index seconds_between_points last_time_stamp value_array first_time_stamp
7823 0 0.002 2014-03-31 15:00:40.660 0x1F8B0800000000000400636060 NULL
7823 0 0.002 2014-03-31 15:00:41.673 0x1F8B08000000000004006360646060F80F04201A04F8418C3F4082DBFBA2F29E5 NULL
7823 0 0.002 2014-03-31 15:00:42.690 0x1F8B08000000000004006360646060F80F04201A04F8418C3F4082DBFB NULL
When deleting a large amount of data from a table, SQL Server internally marks the rows as deleted, and a background process physically removes them from the pages when it gets idle time. Also, unlike TRUNCATE, DELETE is fully logged.
If you had Enterprise Edition, partitioning would be a possible approach, as other developers have suggested, but you are on Standard Edition.
Option 1: "Longer, Tedious, Subjective for 100% Performance"
Let's say you keep the single-table approach. You can add a new column, IsProcessed, to indicate which records have already been processed. Newly inserted data gets a default value of 0, so the processes consuming this data now filter on that column as well. After processing, you need an additional update on the table to mark those rows as IsProcessed = 1. Now you can create a SQL Server Agent job to delete the TOP N rows where IsProcessed = 1 and schedule it as frequently as you can in an idle time slot. "TOP N" because you have to find out by trial and error what the best number is for your environment; it may be 100, 1,000, or 10,000. In my experience a smaller number works best; just increase the frequency of job execution. Let's say "DELETE TOP 1000 FROM Table" takes 2 minutes and you have a 4-hour clean window overnight when this table is not being used: you can schedule the job to run every 5 minutes (3 minutes is just buffer), hence 12 executions per hour, and at 1,000 rows per execution you delete 48k rows over the 4 hours. Then over the weekend you have a larger window to catch up with the remaining rows. (A sketch of such a batch delete is shown at the end of this option.)
You can see this approach involves a lot of back and forth and many minute details, and it is still not certain it will meet your needs in the future: if the input volume of data suddenly doubles, all your calculations fall apart. Another downside is that the consumer queries of the data now have to rely on the IsProcessed column value. In your specific case the consumer always reads all data for a device, so indexing the table doesn't help you; instead it hurts insert performance.
I have personally used this solution, and it lasted for 2 years in one of our client environments.
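A minimal T-SQL sketch of the scheduled batch delete described above (IsProcessed is the suggested new flag column; the batch size of 1000 is just a starting point):
-- delete processed rows in small batches so locks and log growth stay manageable
DECLARE @BatchSize INT = 1000;
WHILE 1 = 1
BEGIN
    DELETE TOP (@BatchSize)
    FROM dbo.case_waveform_data
    WHERE IsProcessed = 1;

    IF @@ROWCOUNT < @BatchSize
        BREAK;   -- nothing (or very little) left to delete

    WAITFOR DELAY '00:00:01';   -- brief pause so concurrent inserts are not starved
END
Scheduling this from a SQL Server Agent job during the quiet window gives the same effect as the "TOP N" job described above.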
Option 2: "Quick, Efficient, Makes Sense to Me, May Work for You"
Create one table per device and, as you mentioned, use a stored procedure to create the table on the fly if it does not exist. This is my most recent experience, where we have a metadata-driven ETL and all the ETL target objects and APIs are created at run time based on user configuration. Yes, it is dynamic SQL, but used wisely, and tested once for performance, it is not bad. The downside of this approach is debugging during the initial phase if something isn't working; but in your case you know the table structure and it is fixed, so you are not dealing with daily changes to the table structure. That is why I think this is more suitable for your situation. Another thing: you will now also have to make sure tempdb is configured properly, because using TVPs and temp tables increases tempdb usage drastically, so the initial and increment sizes assigned to tempdb, and the disk on which tempdb is located, are the two main things to look at. As I said for Option 1, since the consumer processes always use ALL the data, I do not think you need any extra indexing in place; in fact, I would test the performance without any index as well. It is like processing all staging data.
Look at the sample code for this approach below. If you feel positive about it but have doubts about the indexing or any other aspect, let us know.
Prepare the Schema objects
IF OBJECT_ID('pr_DeviceSpecificInsert','P') IS NOT NULL
DROP PROCEDURE pr_DeviceSpecificInsert
GO
IF EXISTS (
SELECT TOP 1 *
FROM sys.table_types
WHERE name = N'udt_DeviceSpecificData'
)
DROP TYPE dbo.udt_DeviceSpecificData
GO
CREATE TYPE dbo.udt_DeviceSpecificData
AS TABLE
(
testDeviceData sysname NULL
)
GO
CREATE PROCEDURE pr_DeviceSpecificInsert
(
@DeviceData dbo.udt_DeviceSpecificData READONLY
,@DeviceName NVARCHAR(200)
)
AS
BEGIN
SET NOCOUNT ON
BEGIN TRY
BEGIN TRAN
DECLARE @SQL NVARCHAR(MAX)=N''
,@ParaDef NVARCHAR(1000)=N''
,@TableName NVARCHAR(200)=ISNULL(@DeviceName,N'')
--get the UDT data into temp table
--because we can not use UDT/Table Variable in dynamic SQL
SELECT * INTO #Temp_DeviceData FROM @DeviceData
--Drop and Recreate the Table for Device.
BEGIN
SET @SQL ='
if object_id('''+@TableName+''',''u'') IS NOT NULL
drop table dbo.'+@TableName+'
CREATE TABLE dbo.'+@TableName+'
(
RowID INT IDENTITY NOT NULL
,testDeviceData sysname NULL
)
'
PRINT @SQL
EXECUTE sp_executesql @SQL
END
--Insert the UDT data in to actual table
SET @SQL ='
Insert INTO '+@TableName+N' (testDeviceData)
Select testDeviceData From #Temp_DeviceData
'
PRINT @SQL
EXECUTE sp_executesql @SQL
COMMIT TRAN
END TRY
BEGIN CATCH
ROLLBACK TRAN
SELECT ERROR_MESSAGE()
END CATCH
SET NOCOUNT OFF
END
Execute The sample Code
DECLARE @DeviceData dbo.udt_DeviceSpecificData
INSERT @DeviceData (testDeviceData)
SELECT 'abc'
UNION ALL SELECT 'xyz'
EXECUTE dbo.pr_DeviceSpecificInsert
@DeviceData = @DeviceData, -- udt_DeviceSpecificData
@DeviceName = N'tbl2' -- nvarchar(200)

Why are these queries deadlocking?

I've got two Oracle queries running in different sessions, which are deadlocking, and I'm having trouble seeing why that's happening.
The query in session 1 is this:
UPDATE REFS R SET R.REFS_NAME = :B2 WHERE R.REFS_CODE = :B1
The query in session 2 is this:
UPDATE REFS R SET R.STATUS_CODE = :B3, R.STATUS_TYPE = :B2 WHERE R.REFS_CODE = :B1
Each is surrounded by a cursor that loops through a selection of primary key values. When these queries are run at the same time, they deadlock. REFS_CODE is the primary key, and the Oracle trace shows that they're updating different rowids. The primary key is indexed, obviously, and there are some foreign key constraints, which are supported by indexes as this has been a problem for us in the past.
Moving into the realm of desperation, I've tried disabling triggers on the table, and that didn't help. Also tried using autonomous transactions, but that made things much worse.
Is there something I'm missing? Thanks for any help!
If a commit happens only after the entire cursor batch is updated, then it may just be a straightforward deadlocking scenario where the two cursors are operating on the same rows but in a different order.
Assume session 1 has cursor set 1 and is updating refs_code 1 and refs_code 2 in that order before attempting a commit.
Assume session 2 has cursor set 2 and is updating refs_code 2 and refs_code 1 in that order before attempting a commit.
Then, interleaving the updates:
time cursor set 1 cursor set 2
==== ============ ============
t1 refs_code 1 -
t2 - refs_code 2
t3 refs_code 2 -
t4 - refs_code 1
at t3, cursor set 1 is waiting on cursor set 2 to commit refs_code 2
at t4, cursor set 2 is waiting on cursor set 1 to commit refs_code 1
The two transactions are waiting on different rowids. If that is the case, you may be able to add an order by (in the same direction) to both cursors to help avoid this.
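For example, a minimal PL/SQL sketch of ordering both cursors the same way (the WHERE clause is a hypothetical placeholder for each session's own selection; the key point is the identical ORDER BY so both sessions lock rows in the same sequence):
DECLARE
  CURSOR c_refs IS
    SELECT refs_code
      FROM refs
     WHERE status_type = 'X'    -- hypothetical selection criteria; use each session's own
     ORDER BY refs_code;        -- same column, same direction, in BOTH sessions
BEGIN
  FOR r IN c_refs LOOP
    UPDATE refs
       SET refs_name = 'updated'   -- session 2 would set status_code / status_type instead
     WHERE refs_code = r.refs_code;
  END LOOP;
  COMMIT;                          -- commit once per batch, as in the original code
END;
/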
