'allow_concurrent_memtable_write' on a column family level - leveldb

RocksDB supports concurrent writes to a memtable via the option allow_concurrent_memtable_write, which is part of RocksDB's immutable DBOptions. Since it is a DBOption, the setting applies to all CFs created in the DB.
But I have a requirement where I want to enable concurrent writes in certain CFs and disable them in others, treating it more like a ColumnFamilyOptions setting.
I understand that I can keep two database instances and split the column families between them based on the concurrent-write setting, but I would still like to know if it can be done within the same DB.
Thanks in advance

No, it is not possible; it is a DB-level option, not a column-family option.
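For reference, a minimal RocksJava sketch of where the option lives, assuming the setAllowConcurrentMemtableWrite setter and with the CF name and DB path invented for illustration: it is set once on DBOptions, and every column family opened in that DB inherits it.

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.rocksdb.*;

public class ConcurrentMemtableWriteExample {
    public static void main(String[] args) throws RocksDBException {
        RocksDB.loadLibrary();
        // allow_concurrent_memtable_write is a DB-wide, immutable option:
        // it is set on DBOptions, and ColumnFamilyOptions has no equivalent.
        try (DBOptions dbOptions = new DBOptions()
                .setCreateIfMissing(true)
                .setCreateMissingColumnFamilies(true)
                .setAllowConcurrentMemtableWrite(true)) {

            List<ColumnFamilyDescriptor> cfDescriptors = Arrays.asList(
                new ColumnFamilyDescriptor(RocksDB.DEFAULT_COLUMN_FAMILY),
                new ColumnFamilyDescriptor("orders".getBytes()));   // hypothetical CF
            List<ColumnFamilyHandle> cfHandles = new ArrayList<>();

            // Every CF opened here gets the same concurrent-memtable-write behaviour.
            try (RocksDB db = RocksDB.open(dbOptions, "/tmp/rocksdb-demo",
                                           cfDescriptors, cfHandles)) {
                db.put(cfHandles.get(1), "key".getBytes(), "value".getBytes());
            } finally {
                cfHandles.forEach(ColumnFamilyHandle::close);
            }
        }
    }
}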

Related

Hive Managed vs External tables maintainability

Which one is better (performance-wise and operationally in the long run) for maintaining the loaded data: managed or external?
And by maintaining, I mean that these tables will frequently undergo the following operations on a daily basis:
Selects using partitions most of the time, though for some queries the partitions are not used.
Deletes of specific records, not the whole partition (for example, a problem is found in some columns and the rows need to be deleted and inserted again). I am not sure this is supported for normal tables unless transactional tables are used.
Most important, the need to merge files frequently, maybe twice a day, to combine small files and end up with fewer mappers. I know CONCATENATE is available on managed tables and INSERT OVERWRITE on external ones; which one costs less?
It depends on your use case. External tables are recommended when the data is shared across multiple applications, for example when Pig or some other tool processes the same data alongside Hive; they are mainly used when you are mostly reading the data.
With managed tables, on the other hand, Hive has complete control over the data. You can convert any external table to managed and vice versa; setting 'EXTERNAL'='TRUE' makes a table external, while 'FALSE' makes it managed:
alter table table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
Since in your case you are modifying the data frequently, it is better for Hive to have full control over it, so in this scenario managed tables are recommended.
Apart from that, managed tables are more secure than external tables, because external table data can be accessed by anything with access to the files. With managed tables you can implement Hive-level security, which gives better control, whereas with external tables you have to implement HDFS-level security.
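On the file-merging point from the question, a rough sketch over Hive JDBC (table names, partition values and the connection string are made up): ALTER TABLE ... CONCATENATE merges the small ORC/RCFile files of a managed table's partition in place, while an external table is typically compacted by rewriting the partition with INSERT OVERWRITE, which goes through a full query and is generally the heavier of the two.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class MergeSmallFiles {
    public static void main(String[] args) throws Exception {
        // Placeholder HiveServer2 connection details; the Hive JDBC driver
        // must be on the classpath.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {

            // Managed ORC/RCFile table: merge the small files of one partition in place.
            stmt.execute("ALTER TABLE sales PARTITION (dt='2019-01-01') CONCATENATE");

            // External table: rewrite the partition onto itself to compact it.
            stmt.execute("INSERT OVERWRITE TABLE sales_ext PARTITION (dt='2019-01-01') "
                       + "SELECT id, amount FROM sales_ext WHERE dt='2019-01-01'");
        }
    }
}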
You can refer to the link below, which can give you a few more pointers to consider:
External Vs Managed tables comparison

How to know when data has been inserted in clickhouse

I understand that ClickHouse is eventually consistent, so when an insert call returns, that does not mean the data will appear in a select query.
Does that apply to a stand-alone ClickHouse (no distribution, no replication)?
I understand the concept of eventual consistency for data replication, but does it apply with distribution but no replication?
Using a distributed + replicated ClickHouse, what is a recommended way to know that some insert(s) can be safely looked up?
Basically I didn't find much information on this topic, so maybe I am not asking the best questions. Feel free to enlighten me.
No, but a single-node setup shouldn't be considered reliable either.
By default, yes: you insert into the node the client is connected to (probably via some load balancer), and the Distributed table asynchronously forwards each piece of data to the node where it belongs. The insert_distributed_sync=1 setting makes the client wait until that forwarding has completed.
For inserts, write to the Replicated*MergeTree shard tables directly (not the Distributed table) with the insert_quorum=2 setting (if there are 3 replicas), and retry indefinitely with exactly the same batch if there are errors (you can use different replicas on retry, since there is deduplication based on the batch hash). Then, on reads, use the select_sequential_consistency=1 setting.
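A rough sketch of that write/read pattern over the ClickHouse JDBC driver; the table, column values and host are invented, and the inline SETTINGS clauses assume a reasonably recent ClickHouse version (otherwise the same settings can be applied at the session or profile level).

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class QuorumInsertExample {
    public static void main(String[] args) throws Exception {
        // Connect to one shard's replica directly; events_local is assumed to be a
        // Replicated*MergeTree table with (ts DateTime, user_id UInt32) columns.
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:clickhouse://replica1:8123/default");
             Statement stmt = conn.createStatement()) {

            // Write into the local replicated table and wait until the batch is
            // acknowledged by 2 replicas. On failure, retry the exact same batch
            // (possibly against another replica); deduplication by batch hash
            // keeps the retry from double-inserting.
            stmt.execute("INSERT INTO events_local SETTINGS insert_quorum = 2 "
                       + "VALUES (now(), 42)");

            // Reads that must observe all quorum-acknowledged inserts.
            try (ResultSet rs = stmt.executeQuery(
                     "SELECT count() FROM events_local "
                   + "SETTINGS select_sequential_consistency = 1")) {
                while (rs.next()) {
                    System.out.println("rows: " + rs.getLong(1));
                }
            }
        }
    }
}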

Dynamically list the contents of a table in a database that continuously updates

It's kind of a real-world problem, and I believe a solution exists, but I couldn't find one.
We have a database called Transactions that contains tables such as Positions, Securities, Bogies, Accounts, Commodities and so on, which are updated continuously, every second, whenever a new transaction happens. For the time being, we have replicated the master database Transactions to a new database named TRN, on which we do all the querying and updating.
We want a sort of monitoring system (like the htop process viewer in Linux) for the database that dynamically lists the updated rows in its tables at any time.
TL;DR: Is there any way to get a continuously updating list of changed rows in any table in the database?
Currently we are working with Sybase and Oracle DBMSs on the Linux (Ubuntu) platform, but we would like generic answers that apply to most platforms and DBMSs (including MySQL), plus any tools, utilities or scripts that can do this, so that it is easy to migrate to other platforms and/or DBMSs in the future.
To list updated rows, you conceptually need one of two things:
The updating statement's effect on the table.
A previous version of the table to compare with.
How you get them and in what form is completely up to you.
The first option lets you list updates with statement granularity, while the second is more suitable for time-based granularity.
Some options off the top of my head (a sketch of the second one follows below):
Write to a temporary table
Add a field with transaction id/timestamp
Make clones of the table regularly
AFAICS, Oracle doesn't have built-in facilities to get the affected rows, only their count.
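As an illustration of the second option in the list above, a sketch of a poller over plain JDBC; the positions table, its id and last_updated columns, and the connection details are hypothetical, and something DBMS-specific (a trigger or a column default) has to keep last_updated current.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.Timestamp;

public class ChangePoller {
    public static void main(String[] args) throws Exception {
        Timestamp lastSeen = new Timestamp(0L);   // start from the epoch
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//dbhost:1521/TRN", "monitor", "secret")) {
            while (true) {
                // Fetch only the rows modified since the last poll.
                try (PreparedStatement ps = conn.prepareStatement(
                         "SELECT id, last_updated FROM positions "
                       + "WHERE last_updated > ? ORDER BY last_updated")) {
                    ps.setTimestamp(1, lastSeen);
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            lastSeen = rs.getTimestamp("last_updated");
                            System.out.println("row " + rs.getLong("id")
                                             + " changed at " + lastSeen);
                        }
                    }
                }
                Thread.sleep(1000L);   // htop-style refresh, roughly once a second
            }
        }
    }
}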
Not a lot of details in the question so not sure how much of this will be of use ...
'Sybase' is mentioned but nothing is said about which Sybase RDBMS product (ASE? SQLAnywhere? IQ? Advantage?)
by 'replicated master database transaction' I'm assuming this means the primary database is being replicated (as opposed to the database called 'master' in a Sybase ASE instance)
no mention is made of what products/tools are being used to 'replicate' the transactions to the 'new database' named 'TRN'
So, assuming part of your environment includes Sybase(SAP) ASE ...
MDA tables can be used to capture counters of DML operations (eg, insert/update/delete) over a given time period
MDA tables can capture some SQL text, though the volume/quality could be in doubt if a) MDA is not configured properly and/or b) the DML operations are wrapped up in prepared statements, stored procs and triggers
auditing could be enabled to capture some commands but again, volume/quality could be in doubt based on how the DML commands are executed
also keep in mind that there's a performance hit for using MDA tables and/or auditing, with the level of performance degradation based on individual config settings and the volume of DML activity
Assuming you're using the Sybase(SAP) Replication Server product, those replicated transactions sent through repserver likely have all the info you need to know which tables/rows are being affected; so you have a couple options:
route a copy of the transactions to another database where you can capture the transactions in whatever format you need [you'll need to design the database and/or any customized repserver function strings]
consider using the Sybase(SAP) Real Time Data Streaming product (yeah, additional li$ence is required) which is specifically designed for scenarios like yours, ie, pull transactions off the repserver queues and format for use in downstream systems (eg, tibco/mqs, custom apps)
I'm not aware of any 'generic' products that work, out of the box, as per your (limited) requirements. You're likely looking at some different solutions and/or customized code to cover your particular situation.

Create a new table vs modify existing table in DB

The question may sound vague, but I am in a situation where I have to decide between the two options.
Say we have multiple requirements (modules) that each need some configuration.
Is it preferable to have one configuration table per module, or to maintain a single table for all the configurations?
The driver (or attributes) for each requirement's configuration might be different, so if I opt for a single table, I will have to make sure each requirement's driver is available or maintained in that single table. Also, we may have to extend it if future requirements bring more driver columns.
Note: the data to be configured per module won't be more than 20 rows.
I have to analyse this, so I am just listing the pros and cons.
Please advise.
Also, from a DB point of view, is there any disadvantage to having so many tables with this little data?
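Purely for illustration, a sketch of what the single shared table could look like; all names are made up, the H2 in-memory database is used only to make the snippet self-contained, and the alternative would be one such table per module (without the module column, and with module-specific driver columns instead of a generic key).

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ConfigSchemaSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder connection; the DDL is what matters here.
        try (Connection conn = DriverManager.getConnection("jdbc:h2:mem:configdb");
             Statement stmt = conn.createStatement()) {
            // Single shared table: one row per module/driver/value combination.
            stmt.execute("CREATE TABLE module_config ("
                       + "  module       VARCHAR(64)  NOT NULL,"
                       + "  config_key   VARCHAR(128) NOT NULL,"   // the 'driver' / attribute
                       + "  config_value VARCHAR(512),"
                       + "  PRIMARY KEY (module, config_key))");

            stmt.execute("INSERT INTO module_config VALUES ('billing', 'retry_count', '3')");
            stmt.execute("INSERT INTO module_config VALUES ('reports', 'output_dir', '/tmp/reports')");
        }
    }
}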

TTL Behavior - HBase

We have a lot of data in an HBase table. I am new to this NoSQL world. We are looking to keep the data only for a fixed time. Should we write a separate clean-up script, or can we rely on the TTL configuration?
I went through the available docs but do not understand the exact behaviour.
The HBase documentation clearly says that data older than the TTL will be automatically deleted by HBase.
Remember that data is never physically deleted by HBase until it does a compaction, where it rewrites its data files. Once the data passes its TTL it becomes invisible to reads, but it stays on disk until a major compaction happens.
It behaves the way it says, i.e. all the values whose timestamps are older than the configured TTL will be deleted at the next major compaction. TTL is an attribute of the column family; if you want it to apply to the entire table, simply set it to the same value for each column family in the table. This way you will get rid of the data once you are done with it.
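A minimal sketch of setting that column-family TTL with the HBase 2.x Java admin API; the table name, family name and the 7-day value are just examples, and the same change can also be made from the HBase shell.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class SetColumnFamilyTtl {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Admin admin = connection.getAdmin()) {
            // TTL is a per-column-family attribute, expressed in seconds.
            ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder
                .newBuilder("cf1".getBytes())
                .setTimeToLive(7 * 24 * 60 * 60)   // keep cells for 7 days
                .build();
            // Expired cells stop showing up in reads right away, but the files
            // are only physically cleaned up at the next major compaction.
            admin.modifyColumnFamily(TableName.valueOf("events"), cf);
        }
    }
}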

Resources