How to create multiple logical tables in a single LevelDB instance?

I am working on a distributed key-value system (or data store) that uses LevelDB as its embedded database library in the back-end.
I want one node/machine to host multiple tables (for the purpose of replication and load balancing). I understand LevelDB has no notion of tables, so I cannot logically partition my data into tables (and hence cannot use tables as my basic unit of distribution).
My question is: is there a way to have multiple 'logical tables' in a single instance of LevelDB?
From what I know, I could run multiple instances of LevelDB on my node, each handling one table. But I do not want to do that, since there would be serious contention (at the disk, I believe) when these multiple DB instances are accessed simultaneously. Having multiple logical tables in a single DB instance, on the other hand, would let me benefit from LevelDB's optimizations for minimizing disk accesses.

If you want to have multiple "logical tables" in LevelDB, then you have to partition your key space by adding a prefix to the keys. Create a different prefix for each table, e.g.:
0x0001 is for table 1
0x0002 is for table 2
0x0003 is for table 3
and so on...
So a key consists of the table prefix followed by the key itself: [0x0001, 0xFF11] addresses key 0xFF11 in table 1. You can then use a single LevelDB instance and have multiple "key spaces" that correspond to "tables".
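Sketched with the C++ LevelDB API, the scheme might look like this (the two-byte prefix width, the MakeKey helper, and all names are my own illustration, not part of LevelDB):

// Key-prefixing sketch; the prefix layout and names are illustrative.
#include <cassert>
#include <cstdint>
#include <iostream>
#include <string>
#include "leveldb/db.h"

// Prepend a fixed-width, big-endian table id to every key.
std::string MakeKey(uint16_t table, const std::string& key) {
  std::string k;
  k.push_back(static_cast<char>(table >> 8));
  k.push_back(static_cast<char>(table & 0xFF));
  k.append(key);
  return k;
}

int main() {
  leveldb::DB* db;
  leveldb::Options options;
  options.create_if_missing = true;
  assert(leveldb::DB::Open(options, "/tmp/multitable", &db).ok());

  // The same user key lands in two different "logical tables".
  db->Put(leveldb::WriteOptions(), MakeKey(1, "alice"), "row in table 1");
  db->Put(leveldb::WriteOptions(), MakeKey(2, "alice"), "row in table 2");

  // Scan table 1 only: seek to its prefix, stop when the prefix changes.
  const std::string prefix = MakeKey(1, "");
  leveldb::Iterator* it = db->NewIterator(leveldb::ReadOptions());
  for (it->Seek(prefix); it->Valid() && it->key().starts_with(prefix); it->Next()) {
    std::cout << it->value().ToString() << std::endl;
  }
  delete it;
  delete db;
}

Because LevelDB keeps keys sorted, all rows of one table are contiguous, so scanning a table is a cheap range scan over its prefix.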

Your best option is partitioning the key space using a key prefix as suggested by Lirik. Though opening multiple databases is possible, I would not recommend it for your use case, since the databases will not share any buffers and caches. Working with multiple open databases may negatively impact performance, and it will make optimizing resource use (mostly memory) a lot harder.

Related

Hive Managed vs External tables maintainability

Which one is better (performance-wise and for operations in the long run) for maintaining loaded data: managed or external?
And by maintaining, I mean that these tables will frequently have the following operations on a daily basis:
Select using partitions most of the time, though for some queries partitions are not used.
Delete specific records, not a whole partition (for example, after finding a problem in some columns, delete those rows and insert them again). I am not sure this is supported for normal tables unless they are transactional.
Most important, the need to merge files frequently, maybe twice a day, so that fewer small files mean fewer mappers. I know CONCATENATE is available on managed tables and INSERT OVERWRITE on external ones; which one costs less?
It depends on your use case. External tables are recommended when the data is used across multiple applications, for example when Pig or some other tool processes the data alongside Hive. They are mainly used when you are mostly reading data.
With managed tables, on the other hand, Hive has complete control over the data. You can convert any managed table to external and vice versa:
alter table table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');   -- managed -> external
alter table table_name SET TBLPROPERTIES('EXTERNAL'='FALSE');  -- external -> managed
Since in your case you are doing frequent modifications to the data, it is better that Hive has total control over it. In this scenario it is recommended to use managed tables.
Apart from that, managed tables are more secure than external tables, because external tables can be accessed by anyone. With managed tables you can implement Hive-level security, which provides better control; with external tables you have to implement HDFS-level security.
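For concreteness, a hedged sketch in HiveQL (the table names, schema, and path are illustrative, and CONCATENATE assumes an ORC or RCFile table):

-- External table: Hive manages only the metadata; data stays at LOCATION.
CREATE EXTERNAL TABLE logs_ext (id INT, msg STRING)
PARTITIONED BY (dt STRING)
STORED AS ORC
LOCATION '/data/logs';

-- Managed ORC table: merge the small files of one partition in place.
ALTER TABLE logs PARTITION (dt='2020-01-01') CONCATENATE;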
You can refer to the link below, which gives a few more pointers to consider:
External Vs Managed tables comparison

What is more efficient in HBase: multiple tables of the same structure, or a single table containing a large set of data?

I had earlier created a project that stored daily data for a particular entity in an RDBMS by creating a separate table for each day and then storing that day's data in it.
But now I want to move my database from the RDBMS to HBase. So my question is whether I should create a single table and store the data of all days in it, or use my earlier approach of creating an individual table for each day. I want to compare both cases in terms of HBase performance.
Sorry if this question seems foolish to you. Thank you.
As you mentioned, there are two options:
Option 1: a single table with all days' data
Option 2: multiple tables
I would prefer namespaces (a very important feature, introduced in HBase 0.96) with option 2 if you have huge data for a single day. This will also support multi-tenancy requirements.
See the HBase Book:
A namespace is a logical grouping of tables analogous to a database in relational database systems. This abstraction lays the groundwork for upcoming multi-tenancy related features:
Quota Management (HBASE-8410) - Restrict the amount of resources (i.e. regions, tables) a namespace can consume.
Namespace Security Administration (HBASE-9206) - Provide another level of security administration for tenants.
Region server groups (HBASE-6721) - A namespace/table can be pinned onto a subset of RegionServers, thus guaranteeing a coarse level of isolation.
Below are the shell commands related to namespaces:
alter_namespace, create_namespace, describe_namespace,
drop_namespace, list_namespace, list_namespace_tables
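For example (the namespace and table names are illustrative), per-day tables can be grouped under one namespace in the HBase shell:

create_namespace 'daily'
create 'daily:events_20200101', 'cf'   # one table per day, one column family
list_namespace_tables 'daily'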
Advantages:
Even if you use column filters, since each table holds less data (one day's worth), retrieval via a full table scan will be fast compared to the single-table approach (a full scan on a big table is costly).
If you want authentication and authorization on a specific table, that can also be achieved.
Limitation: you will end up with multiple scripts to manage the tables rather than a single script (option 1).
Note: with either of the aforementioned options, your rowkey design is very important for good performance and to prevent hotspotting, as sketched below.
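For instance, pre-splitting a table in the HBase shell spreads the initial write load across region servers (the split points are illustrative and assume rowkeys that start with a salt digit):

create 'events', 'cf', SPLITS => ['1', '2', '3', '4']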
For more details look at hbase-series

Advantages of temporary tables in Oracle

I've tried to figure out what performance impact the use of temporary tables has on an Oracle database. We want to use these tables in our ETL process to save intermediate results. At the moment we are using physical tables for this purpose and truncating them at the beginning of the ETL process. I know that the truncate process is very expensive, and therefore I wondered whether it would be better to use temporary tables instead.
Does anyone have experience with whether there is a performance boost from using temporary tables in this scenario?
There were only some answers to this question regarding SQL Server, as in this question. But I don't know whether those recommendations also apply to an Oracle DB.
It would be nice if someone could list the advantages and disadvantages of this feature and also point out the scenarios in which it is applicable.
Thanks in advance.
First of all: truncate is not expensive; a delete with no condition is what is very expensive.
Second: does your temporary table have indexes? What about foreign keys?
Those can affect performance.
Temporary tables in Oracle work more or less like in SQL Server (of course the syntax is different, e.g. GLOBAL TEMPORARY TABLE), and both are just tables.
You won't get any performance gain with temporary tables over normal tables; they are much the same: they have a definition in the DB, can have indexes, and are logged.
The only difference is that a temporary table's contents are exclusive to your session (except for global tables), which means that if multiple scripts from multiple sessions refer to the same table, each one is reading/writing different data and they cannot lock each other (in that case you could gain performance, but I think it's rarely the case).
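As a hedged sketch (the table and column names are illustrative), an Oracle global temporary table for staging ETL results could look like this:

-- Rows are private to each session and vanish when the session ends.
CREATE GLOBAL TEMPORARY TABLE etl_stage (
  id  NUMBER,
  val VARCHAR2(100)
) ON COMMIT PRESERVE ROWS;
-- ON COMMIT DELETE ROWS would instead clear the rows at every commit,
-- removing the need for an explicit cleanup step entirely.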

Can a single Oracle block have rows for two different tables?

Oracle uses logical blocks as its basic unit for storing data. My question is: can a single block hold rows of two different tables?
Yes, it can. Tables belonging to the same cluster can have rows within the same data block. That is the basic idea of a cluster: to keep related data as close together as possible. So when you perform the corresponding logical join, no extra work is needed; the data is already joined, and both logical and physical I/Os are reduced.
See https://docs.oracle.com/database/121/CNCPT/tablecls.htm#CNCPT608.
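A hedged sketch of such a cluster (all names are illustrative, following the classic EMP/DEPT example):

-- Tables clustered on deptno share data blocks for matching rows.
CREATE CLUSTER emp_dept_cluster (deptno NUMBER(2));
CREATE INDEX idx_emp_dept_cluster ON CLUSTER emp_dept_cluster;

CREATE TABLE dept (deptno NUMBER(2), dname VARCHAR2(14))
  CLUSTER emp_dept_cluster (deptno);
CREATE TABLE emp (empno NUMBER(4), ename VARCHAR2(10), deptno NUMBER(2))
  CLUSTER emp_dept_cluster (deptno);
-- Rows of emp and dept with the same deptno now live in the same blocks.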

does Firebird defrag? If so, like a clustered index?

I've seen a few (literally, only a few) links, and nothing in the documentation, suggesting that clustering can be done with Firebird.
Then I shot for the moon with this question, CLUSTER command for Firebird?, but the answerer told me that Firebird doesn't have clustered indexes at all, so now I'm really confused.
Does Firebird physically order data at all? If so, can it be ordered by any key, not just the primary one, and can the clustering/defragging be turned on and off so that it only happens during downtime?
If not, isn't this a performance hit, since the disk will take longer to put together disparate rows that naturally should sit right next to each other?
(DB noob)
MVCC
I found out that Firebird is based upon MVCC, so old data actually isn't overwritten until a "sweep". I like that a lot!
Again, I can't find much, but it seems like a real shame that data wouldn't be defragged according to a key.
This says that database pages are defragmented but provides no further explanation.
Firebird does not cluster records. It was designed to avoid the problems that require clustering and the fragmentation problems that come with clustered indexes. Indexes and data are stored separately, on different types of pages. Each data page contains data from only one table. Records are stored in the order they were inserted, give or take concurrent inserts, which generally go on separate pages. When old records are removed, new records will be stored in their place, so new records sometimes appear on the same page as older ones.
Many tables use an artificial primary key, generally ascending, which might be a database-generated sequence or a timestamp. That practice causes records to be stored in key order, but that order is by no means guaranteed. Nor is it very interesting. When the primary key is artificial, most queries that return groups of related records use secondary indexes. For clustered records that is a performance hit: a look-up on a secondary index traverses two indexes, because the secondary index provides only the key into the primary index, which must then be traversed to find the data.
On the larger issue of defragmentation and space usage, Firebird tracks the free space on pages so that new records will be inserted on pages that have had records removed. If a page becomes completely empty, it will be reallocated. This space management happens as the database runs.
As you know, Firebird uses multi-version concurrency control, so when a record is updated or deleted, Firebird creates a new record version but keeps the old version around. When all transactions that were running before the change was committed have ended, the old record version no longer serves any purpose, and Firebird will remove it. In many applications, old versions are removed in the normal course of running the database: when a transaction touches a record with old versions, Firebird checks the state of those versions and removes them if no running transaction can read them. There is also a function called "sweep" that systematically removes unneeded old record versions. Sweep can run concurrently with other database activity, though it's better to schedule it when the database load is low. So no, it's not true that nothing is removed until you run a sweep.
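For reference, a manual sweep can be kicked off with the gfix utility (the database path and credentials here are illustrative):

gfix -sweep -user SYSDBA -password masterkey /data/employee.fdb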
Best regards,
Ann Harrison
who's worked with Firebird and its predecessors for an embarrassingly long time
BTW, as the first person to answer mentioned, Firebird does leave space on pages so that the old version of a record stays on the same page as the newer version. It's not a fixed percentage of the space, but 16 bytes per record stored on the page, so pages of tables with very short records have proportionally more free space and tables with long records have less.
On restore, database pages are created ~70% full (as I recall, unless you specify gbak's -use_all_space switch), and the restore is done one table at a time, writing pages to the end of the database file as needed. You can imagine how pages get condensed down considerably in the process, hence bringing the data together and "defragging" it.
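A hedged sketch of that backup/restore cycle with gbak (the file names and credentials are illustrative):

gbak -backup -user SYSDBA -password masterkey /data/employee.fdb /data/employee.fbk
gbak -create -user SYSDBA -password masterkey /data/employee.fbk /data/employee_new.fdb

Adding -use_all_space to the restore fills pages completely instead of leaving the ~30% headroom.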
As far as controlling the physical grouping on disk or doing an online defrag -- in Firebird there is none. Remember that just because you need to access a page does not mean your disk does a read -- file system and database cache can avoid it!
