Greenplum distribution key that depends on the distribution of another key

There is a table A that is distributed by column x:
CREATE table A (x integer, y integer) distributed by (x);
And there is another table B which has column "y" but not column "x".
CREATE TABLE B (y integer);
Is there any way we can distribute B according to the distribution of "y" on table A? That way we could join both tables using column "y" without needing to redistribute data between the segments. Basically, the distribution of column "y" on table A would drive the distribution of table B.
SELECT
*
FROM A
JOIN B using (y);
The problem being that table A is often joined to other tables distributed by "x" and that's why we don't want to change table A's distribution.
Column "y" is unique
Thanks.

No, you can't, but I wouldn't be that worried about it. Do you filter table b or a when you join these two tables together? If so, Greenplum will use that to only motion the data it needs to complete the join.
Secondly, if you are that worried about it, you could de-normalize table a by adding the columns from table b.
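For reference, here is a rough sketch of both points against the question's schema; the filter predicate and the a_denorm table name are assumptions, not part of the original setup:
-- 1) Check how much data the planner actually motions for the join: look for
--    "Redistribute Motion" or "Broadcast Motion" nodes in the plan. A selective
--    filter on B shrinks the set of rows that gets motioned.
EXPLAIN
SELECT *
FROM a
JOIN b USING (y)
WHERE b.y < 1000;   -- hypothetical filter
-- 2) The denormalization option: fold B's columns into a copy of A that keeps
--    the distribution on x, so the join on y disappears at query time.
CREATE TABLE a_denorm AS
SELECT a.x, a.y     -- plus whatever other columns B carries in the real schema
FROM a
JOIN b USING (y)
DISTRIBUTED BY (x);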

Related

In Oracle I want to create a "routing interface" which inserts into separate tables based on a parameter

I need to find a solution to the following problem: there should be a common and single "interface" that I can use in an insert into statement, something like this: insert into INTERFACE (fields) select ...
But there are many tables with the same structure behind the interface which should decide based on list of values (coming in a field) where to put the data. The tables are partitioned by range interval (daily) right now.
I was thinking about having a composite partitioned table which cannot be SELECT-ed, to avoid mixing different types of data in a single select query, but creating views on top of it. In this case the table should be partitioned like this: partition by list FIELD, subpartition by range interval. But Oracle 12 does not support this.
Any idea how to solve this? (There is a reason why I need a single interface and why I have to store data separately.)
Thank you in advance!
The INSERT ALL syntax can help easily route data to specific tables based on conditions:
create table interface1(a number, b number);
create table interface2(a number, b number);
insert all
  when a <= 1 then
    into interface1
  else
    into interface2
select 1 a, 2 b from dual;
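Applied to the question's setup, the same pattern can route on the list-of-values field; the names below (staging_feed, route_code, payload, target_a, target_b, target_other) are assumptions standing in for the real tables and columns:
insert all
  when route_code = 'A' then
    into target_a (route_code, payload) values (route_code, payload)
  when route_code = 'B' then
    into target_b (route_code, payload) values (route_code, payload)
  else
    into target_other (route_code, payload) values (route_code, payload)
select route_code, payload
from staging_feed;
Each WHEN clause is evaluated per source row, so a single SELECT over the staging data fans out to the per-type tables in one statement.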

check for data transformation oracle etl

I am new to Oracle and I would like to know how we validate parent-child relationships, compare ranges of values and validate
data types as part of the ETL testing process. (The two tables could be T1 and T2.) Please let me know a sample query.
Example: T1 is the temporary loading table, and T2 is the new table. We need to make sure that all the data between T1 and T2 is valid with respect to the ranges of the variables' values, the relationships, and the data types.
Thanks, Santosh
In order to validate data between two Oracle tables, the following scenarios should be considered (a combined sketch follows the list):
1) Data comparison - compare the data between the two tables using minus queries.
Select [column names] from TableA
Minus
Select [column names] from TableB
2) Business rules - verify that the data complies with the business rules, for example that age falls within some range.
You can write negative queries to test such scenarios, like
Select * from table where (age < x or age > y)
3) Data truncation - make sure data in the target database is not truncated. Check that the length of the target column is not less than that of the source column, or at least not less than the maximum length of the data on the source side.
4) Data correctness - verify that data is not inaccurately recorded; check default values, field boundaries, unique keys, primary keys, etc.
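Putting 1) to 4) together, a hedged sketch against the question's T1 (staging) and T2 (target); the column names id, age and parent_id are assumptions:
-- 1) Data comparison in both directions
select id, age from t1
minus
select id, age from t2;
select id, age from t2
minus
select id, age from t1;
-- 2) Business rules: negative query, rows whose age falls outside the allowed range
select * from t2 where age < 0 or age > 120;
-- 3) Data truncation: target columns declared shorter than the matching source columns
select s.column_name, s.data_length as source_length, t.data_length as target_length
from user_tab_columns s
join user_tab_columns t on t.column_name = s.column_name
where s.table_name = 'T1'
  and t.table_name = 'T2'
  and t.data_length < s.data_length;
-- 4) Data correctness: duplicate keys, and child rows whose parent is missing
select id, count(*) from t2 group by id having count(*) > 1;
select c.*
from t2 c
left join t1 p on p.id = c.parent_id
where p.id is null;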

RethinkDB insert data with relationship

I have a table A and a table B.
Table B has a relationship with A with the key a_id.
I already created the document in table A.
I'm wondering how to insert a document into table B, carrying the foreign key a_id to that document in table A, in a single query.
r.db('DB').table('B').insert([{
'b_data': ...,
'a_id': r.db('DB').table('a').filter(r.row['name'] == 'some_name')
} for p in a]).run(conn)
You are on the right track, but ReQL differs from the logic of SQL in that it is usually more like a flow. So your query needs to start with the source of the data so it can flow into the insert portion. So here is a version of what I think you want (in Python format):
r.db('DB').table('a').filter({'name': 'some_name'}).for_each(
    r.db('DB').table('b').insert(
        {'name': r.row['name'], 'b_data': 'something'}
    )
).run(conn)

ORACLE - Use Trigger to Update Data in another DB

I am new to creating triggers in ORACLE. My scenario is as follows:
I have two Databases on the same server - A & B
When the value of Column X is updated in Table AA of Database A, I need to update the same value(s) of Column Y in Table BB of Database B.
These tables are linked into an Access front-end and set up as one-to-many (one Column X to many Column Y's), but I cannot set up cascading updates in Access when the tables are linked in. In ORACLE, these 2 tables come from 2 separate databases.
How can I create this trigger so that when Column X of Table AA in Database A is updated, the trigger goes to Column Y in Table BB of Database B, checks for the OLD value of Column X and updates found instances with the NEW value of Column X?
Any help provided would be much appreciated.
Please clarify a couple of questions:
Is the update bi-directional or uni-directional in your environment?
A trigger will become heavy once the number of transactions and the amount of data increase, so what is your exact purpose in replicating the data to the target system?
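If a trigger is still the chosen approach for a one-way sync, a minimal sketch could look like the following; the database link name db_b_link is an assumption, and a row-level trigger doing distributed DML will add latency and failure modes to every update on AA:
CREATE OR REPLACE TRIGGER trg_aa_sync_bb
AFTER UPDATE OF x ON aa
FOR EACH ROW
BEGIN
  -- Propagate the new value of X to every matching Y in the remote table
  UPDATE bb@db_b_link
     SET y = :NEW.x
   WHERE y = :OLD.x;
END;
/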

comparing data in two tables taking time

I need to query table1 to find all orders and their created date (the key is order number and date).
In table2 (the key is also order number and date), check if the order exists for a date.
For this I am scanning table1 and, for each record, checking if it exists in table2. Is there a better way to do this?
In this situation, in which your key is identical for both tables, it makes sense to have a single table in which you store the data for both Table 1 and Table 2. That way you can do a single scan of your data and know straight away whether the data exists for both criteria.
Even more so, if you want to use this data in MapReduce, you would simply scan that single table. If you only want the relevant rows, you could define a filter on the Scan; for example, in the case where Table 2's columns are not populated at all for a row, you could simply use a ColumnPrefixFilter.
If, however, you do need to keep this data separately in 2 tables, you could pre-split the tables with the same region boundaries for both tables. This helps with the query you are aiming for - load all rows in Table 1 where a row exists in Table 2 - which is essentially a map-side join. You could define multiple inputs in your MapReduce job, and since the region borders are the same, the splits will be such that each mapper gets the corresponding rows from both tables. You would probably need to implement your own MultipleInput format for that (the MultiTableInputFormat class recently introduced in 0.96 does not seem to do that map-side join).
