The question may sound vague, but I am in a situation where I have to decide between two options.
Say we have multiple requirements (modules) which require some configuration.
Is it preferable to have one configuration table per module, or to maintain a single table for all the configurations?
The drivers (or attributes) for each requirement's configuration might differ, so if I opt for a single table, I will have to make sure the driver of every individual requirement is available or maintained in this single table. We may also have to extend it if future requirements introduce more driver columns.
Note: the data to be configured per module won't be more than 20 rows.
I have to analyse this, so I am just listing the pros and cons of each approach.
Please advise.
Also, from the database's point of view, is there any disadvantage to having so many tables, each holding so little data?
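For what it's worth, here is a hypothetical sketch of the single-table option (all names are invented) that shows its main trade-off: driver columns used by only some modules sit NULL for the rest, and new requirements mean new columns.

    -- One row per (module, key) pair; driver columns are module-specific.
    CREATE TABLE module_config (
      module_name   VARCHAR(50)   NOT NULL,
      config_key    VARCHAR(100)  NOT NULL,
      config_value  VARCHAR(500),
      driver_attr1  VARCHAR(100),  -- used by some modules, NULL for the others
      driver_attr2  VARCHAR(100),  -- added later when a new requirement needs it
      CONSTRAINT module_config_pk PRIMARY KEY (module_name, config_key)
    );

With one table per module, each table instead carries exactly the driver columns its module needs, at the cost of more objects to manage.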
Which is better (performance-wise, and operationally in the long run) for maintaining loaded data: managed or external tables?
By maintaining, I mean that these tables will undergo the following operations frequently, on a daily basis:
Selects using partitions most of the time, though some queries don't use them.
Deletes of specific records, not a whole partition (for example, a problem is found in some columns and the rows must be deleted and inserted again). I am not sure this is supported for normal tables unless transactional tables are used.
Most importantly, the need to merge files frequently, maybe twice a day, compacting small files to get fewer mappers. I know CONCATENATE is available on managed tables and INSERT OVERWRITE on external ones; which one costs less?
It depends on your use case. External tables are recommended when the data is used across multiple applications, for example when Pig or another application processes the data alongside Hive; in that kind of scenario external tables are mainly recommended. They are used when you are mainly reading data.
With managed tables, on the other hand, Hive has complete control over the data. Note that you can convert any managed table to external, and vice versa:
alter table table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');   -- managed -> external
alter table table_name SET TBLPROPERTIES('EXTERNAL'='FALSE');  -- external -> managed
Since in your case you are doing frequent modifications to the data, it is better that Hive has total control over it. In this scenario it is recommended to use managed tables.
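As a hedged illustration of the merge operation asked about (table and partition names are hypothetical; CONCATENATE requires an ORC- or RCFile-backed table):

    -- Managed table: merge a partition's small files in place.
    ALTER TABLE events PARTITION (ds = '2017-01-01') CONCATENATE;

    -- External table: rewrite the partition, which costs a full pass over the data.
    INSERT OVERWRITE TABLE events PARTITION (ds = '2017-01-01')
    SELECT col1, col2 FROM events WHERE ds = '2017-01-01';

CONCATENATE is generally the cheaper of the two, since it merges files at the storage level instead of re-reading and re-writing every row.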
Apart from that, managed tables are more secure than external tables, because an external table's files can be accessed by anyone. With managed tables you can implement Hive-level security, which provides better control; with external tables you have to implement security at the HDFS level.
You can refer to the link below, which gives a few more points to consider:
External Vs Managed tables comparison
This is a real-world problem, and I believe a solution exists, but I couldn't find one.
We have a database called Transactions that contains tables such as Positions, Securities, Bogies, Accounts, Commodities and so on, updated continuously, every second, whenever a new transaction happens. For the time being, we have replicated the master Transactions database to a new database named TRN, on which we do all the querying and updating.
We want a monitoring system for the database (like the htop process viewer on Linux) that dynamically lists the rows updated in its tables at any time.
TL;DR Is there any way to get a continuously updating list of rows in any table in the database?
Currently we are working with Sybase and Oracle DBMSs on the Linux (Ubuntu) platform, but we would like generic answers that cover most platforms and DBMSs (including MySQL), and any tools, utilities or scripts that can do this, so that in future we can easily migrate to other platforms and/or DBMSs as well.
To list updated rows, you conceptually need one of two things:
The updating statement's effect on the table.
A previous version of the table to compare with.
How you get them and in what form is completely up to you.
The 1st option allows you to list updates with statement granularity while the 2nd is more suitable for time-based granularity.
Some options off the top of my head:
Write to a temporary table
Add a field with a transaction id/timestamp (sketched below)
Make clones of the table regularly
AFAICS, Oracle doesn't have built-in facilities to get the affected rows, only their count.
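As a hedged illustration of the second option in Oracle (table and column names are hypothetical):

    -- Add a change-tracking column and keep it current with a trigger.
    ALTER TABLE positions ADD (last_updated TIMESTAMP);

    CREATE OR REPLACE TRIGGER positions_touch
    BEFORE INSERT OR UPDATE ON positions
    FOR EACH ROW
    BEGIN
      :NEW.last_updated := SYSTIMESTAMP;
    END;
    /

    -- List the rows changed in the last minute.
    SELECT * FROM positions
    WHERE last_updated > SYSTIMESTAMP - INTERVAL '1' MINUTE;

Note the trigger adds a small cost to every write, which matters on a table updated every second.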
There are not a lot of details in the question, so I'm not sure how much of this will be of use ...
'Sybase' is mentioned, but nothing is said about which Sybase RDBMS product (ASE? SQL Anywhere? IQ? Advantage?)
by 'replicated master database transaction' I'm assuming this means the primary database is being replicated (as opposed to the database called 'master' in a Sybase ASE instance)
no mention is made of what products/tools are being used to 'replicate' the transactions to the 'new database' named 'TRN'
So, assuming part of your environment includes Sybase(SAP) ASE ...
MDA tables can be used to capture counters of DML operations (eg, insert/update/delete) over a given time period; see the sketch after this list
MDA tables can capture some SQL text, though the volume/quality could be in doubt if a) MDA is not configured properly and/or b) the DML operations are wrapped up in prepared statements, stored procs and triggers
auditing could be enabled to capture some commands but again, volume/quality could be in doubt based on how the DML commands are executed
also keep in mind that there's a performance hit for using MDA tables and/or auditing, with the level of performance degradation based on individual config settings and the volume of DML activity
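A hedged sketch of pulling DML counters from an MDA table; exact column names vary by ASE version, so verify them against master..monTableColumns first:

    -- Tables in the TRN database ranked by cumulative DML row activity.
    select DBName    = db_name(DBID),
           TableName = object_name(ObjectID, DBID),
           RowsInsUpdDel        -- combined insert/update/delete row counter
    from   master..monOpenObjectActivity
    where  DBID = db_id('TRN')
    order  by RowsInsUpdDel desc

This gives counters only (which tables are hot), not the affected rows themselves.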
Assuming you're using the Sybase(SAP) Replication Server product, those replicated transactions sent through repserver likely have all the info you need to know which tables/rows are being affected; so you have a couple options:
route a copy of the transactions to another database where you can capture the transactions in whatever format you need [you'll need to design the database and/or any customized repserver function strings]
consider using the Sybase(SAP) Real Time Data Streaming product (yeah, additional li$ence is required) which is specifically designed for scenarios like yours, ie, pull transactions off the repserver queues and format for use in downstream systems (eg, tibco/mqs, custom apps)
I'm not aware of any 'generic' products that work, out of the box, as per your (limited) requirements. You're likely looking at some different solutions and/or customized code to cover your particular situation.
I had earlier created a project storing daily data for a particular entity in an RDBMS by creating a single table for each day and then storing that day's data in that table.
But now I want to move my database from the RDBMS to HBase. So my question is whether I should create a single table and store the data of all days in it, or use my earlier approach of creating an individual table for each day. I want to compare both cases on the basis of HBase performance.
Sorry if this question seems foolish to you. Thank you.
As you mentioned, there are two options:
Option 1: a single table holding all days' data
Option 2: multiple tables, one per day
I would prefer namespaces (introduced in version 0.96, a very important feature) with option 2 if you have huge data for a single day. This also supports multi-tenancy requirements.
See the HBase book:
A namespace is a logical grouping of tables analogous to a database in relational database systems. This abstraction lays the groundwork for upcoming multi-tenancy related features:
Quota Management (HBASE-8410): restrict the amount of resources (i.e. regions, tables) a namespace can consume.
Namespace Security Administration (HBASE-9206): provide another level of security administration for tenants.
Region server groups (HBASE-6721): a namespace/table can be pinned onto a subset of RegionServers, thus guaranteeing a coarse level of isolation.
Below are the shell commands relating to namespaces:
alter_namespace, create_namespace, describe_namespace,
drop_namespace, list_namespace, list_namespace_tables
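A hypothetical usage sketch (namespace and table names are examples only):

    create_namespace 'daily'
    create 'daily:events_20170101', 'cf'     # one day's table inside the namespace
    list_namespace_tables 'daily'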
Advantages:
Even if you use column filters, since each table holds less data (one day's worth), retrieval will be fast even for a full table scan, compared to the single-table approach (a full scan of a big table is costly).
If you want authentication and authorization on a specific table, that can also be achieved.
Limitation: you will end up with multiple scripts to manage the tables, rather than the single script of option 1.
Note: with either of the aforementioned options, your rowkey design is very important for good performance and for preventing hotspotting.
For more details look at hbase-series
In our project we have a requirement to create dynamic notifications that pop up in our site when a relevant rule applies.
We are based on Oracle Exadata as our main database.
This feature is supposed to allow the users to create dynamic rules that will be checked occasionally.
These rules may check specific fields of certain types, and may also check those fields against data in fields of other types.
For example, if our program has a table of cars with a location column, and another table of streets with a location column (no direct relation between the two tables), we might need to notify the users when a car is in a certain street.
Is there a good platform that can help us evaluate the kind of rules that we want to check?
We started looking at Elasticsearch and Neo4j (we have a specific module that involves graph-like relations), but we aren't sure they would be the right solution.
Any idea would be appreciated :)
Neo4j could help you express your rules, but it sounds as if your disconnected data is really queried by SQL-style joins?
So if you want to express and manage your rules as predicates in the graph, you can do that easily, and then get a list of applicable rules to trigger queries in other databases.
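A minimal hypothetical sketch in Cypher (labels and properties are invented), where each rule is a node the application periodically fetches and evaluates against the other databases:

    // Fetch all currently active rules with the query each one should trigger.
    MATCH (r:Rule {active: true})
    RETURN r.name, r.targetQuery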
I am building an information service that manages Suppliers.
The suppliers are used by our billing system, tender system and sales system.
Though 60% of a supplier's attributes are unique to each system, the remaining 40% are shared across the systems.
My objective is to build a flexible system, so that a change to one individual system's data does not impact the other systems. For example, if I need to take certain tables offline to upgrade them, it should not impact the rest of the systems that need supplier information.
What is the best way of achieving this? Should all the different context-specific attributes live in one schema but be deployed on different tablespaces?
Also, reads and updates may happen more for one set of attributes than another. How should I represent them logically in one model, yet deploy them in such a fashion that they can evolve independently?
Thank you.
First, tablespaces are a means of controlling the storage characteristics of segments; they won't help with avoiding impact from changes.
I recommend you create separate child tables for each set of attributes, each with a 1:1 referential integrity constraint to a parent table. e.g.
SUPPLIERS (supplier_id PK, common attributes...)
SUPPLIER_BILLING_INFO (supplier_id PK, billing attributes...) + FK to SUPPLIERS
SUPPLIER_TENDER_INFO (supplier_id PK, tender attributes...) + FK to SUPPLIERS
SUPPLIER_SALES_INFO (supplier_id PK, sales attributes...) + FK to SUPPLIERS
Obviously they'll need to live in one instance. Whether you put them in one schema or in separate schemas is up to you.
Changes to one system should have no impact on other systems, as long as they don't all refer to all the tables (i.e. the Billing system should never access SUPPLIER_TENDER_INFO).
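A minimal concrete sketch of that layout (the non-key columns are invented for illustration):

    CREATE TABLE suppliers (
      supplier_id   NUMBER        PRIMARY KEY,
      supplier_name VARCHAR2(200) NOT NULL      -- plus the other shared attributes
    );

    CREATE TABLE supplier_billing_info (
      supplier_id     NUMBER PRIMARY KEY,
      billing_account VARCHAR2(50),             -- hypothetical billing attribute
      CONSTRAINT fk_billing_supplier FOREIGN KEY (supplier_id)
        REFERENCES suppliers (supplier_id)
    );

SUPPLIER_TENDER_INFO and SUPPLIER_SALES_INFO follow the same pattern. The shared primary key makes the relationship 1:1, and each system joins in only the child table it owns.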
This sounds like a very difficult question that can't be easily answered here. But I can think of a few tricks that might help you with some of your issues. It is possible to make huge changes to your data and still keep the system online.
DBMS_REDEFINITION allows you to change your table structure while other people are still using the table (although it looks very complicated).
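A hedged outline of the basic call sequence (schema and table names are hypothetical, and the real procedures take more options than shown):

    DECLARE
      num_errors PLS_INTEGER;
    BEGIN
      -- Raises an exception if the table cannot be redefined online.
      DBMS_REDEFINITION.CAN_REDEF_TABLE('APP', 'SUPPLIERS');
      DBMS_REDEFINITION.START_REDEF_TABLE('APP', 'SUPPLIERS', 'SUPPLIERS_INTERIM');
      DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS('APP', 'SUPPLIERS', 'SUPPLIERS_INTERIM',
                                              num_errors => num_errors);
      DBMS_REDEFINITION.FINISH_REDEF_TABLE('APP', 'SUPPLIERS', 'SUPPLIERS_INTERIM');
    END;
    /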
Partitioning also allows you to change part of your table without affecting other users. For example, you can truncate just one of the partitions of a table. Partitioning also allows you to use different physical structures for the same table: one partition could use a tablespace with a small block size (good for writing), and another partition could use a tablespace with a larger block size (good for reading).
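For example (all names hypothetical):

    -- Clear one partition without touching the rest of the table.
    ALTER TABLE supplier_events TRUNCATE PARTITION p_2017_01;

    -- Move one partition to a tablespace with different physical characteristics.
    ALTER TABLE supplier_events MOVE PARTITION p_2017_01 TABLESPACE ts_large_block;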