I would like to compare the data in two tables, say a source and a destination, and output the differences.
The problem is that there is a mapping table which stores the columns of the source table and the corresponding columns of the destination table.
For example:
Table: T_MAP
SourceTableName  SourceTableColumns  DestinationTable  DestinationTableColumn
s_t1             s_t1_col1           d_t1              d_t1_col1
s_t1             s_t1_col2           d_t               d_t1_col2
s_t2             s_t2_col1           d_t2              d_t2_col1
....
So the question is how to compare the data between the two tables using the map table.
My current idea is to use a dynamic cursor to generate a dynamic SQL statement, then use MINUS + UNION ALL to compare the data. But performance may be a big problem.
Are there any thoughts?
Please help.
Thanks in advance.
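The idea described above (build the comparison statement from T_MAP, then combine two set differences with UNION ALL) can be sketched as follows. This is only an illustration under assumptions: it uses Python's sqlite3 module instead of Oracle, so MINUS is spelled EXCEPT; the helper name build_diff_sql and the sample rows are hypothetical; and both s_t1 columns are assumed to map to d_t1.

```python
import sqlite3

def build_diff_sql(conn, src, dst):
    """Build one comparison statement for a table pair from T_MAP (hypothetical helper)."""
    rows = conn.execute(
        "SELECT SourceTableColumns, DestinationTableColumn FROM T_MAP "
        "WHERE SourceTableName = ? AND DestinationTable = ? "
        "ORDER BY SourceTableColumns",
        (src, dst),
    ).fetchall()
    src_cols = ", ".join(r[0] for r in rows)
    dst_cols = ", ".join(r[1] for r in rows)
    # Identifiers are interpolated as text, so T_MAP must be trusted content.
    # EXCEPT is SQLite's spelling of Oracle's MINUS.
    return (
        f"SELECT 'source only' AS side, * FROM "
        f"(SELECT {src_cols} FROM {src} EXCEPT SELECT {dst_cols} FROM {dst}) "
        f"UNION ALL "
        f"SELECT 'destination only' AS side, * FROM "
        f"(SELECT {dst_cols} FROM {dst} EXCEPT SELECT {src_cols} FROM {src})"
    )

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T_MAP (SourceTableName, SourceTableColumns,
                    DestinationTable, DestinationTableColumn);
INSERT INTO T_MAP VALUES ('s_t1', 's_t1_col1', 'd_t1', 'd_t1_col1');
INSERT INTO T_MAP VALUES ('s_t1', 's_t1_col2', 'd_t1', 'd_t1_col2');
CREATE TABLE s_t1 (s_t1_col1, s_t1_col2);
CREATE TABLE d_t1 (d_t1_col1, d_t1_col2);
-- Made-up sample data: one shared row, one row unique to each side.
INSERT INTO s_t1 VALUES (1, 'a');
INSERT INTO s_t1 VALUES (2, 'b');
INSERT INTO d_t1 VALUES (1, 'a');
INSERT INTO d_t1 VALUES (3, 'c');
""")

# One generated statement per table pair, executed once.
diff = conn.execute(build_diff_sql(conn, "s_t1", "d_t1")).fetchall()
for row in diff:
    print(row)
```

Generating one set-based statement per table pair, rather than comparing row by row inside the cursor loop, keeps the dynamic part limited to building the column lists.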
I created this view
SELECT p1.*, p2.*, p3.*, p4.*, p5.*
FROM proizvodi_1_nivo p1,proizvodi_2_nivo p2,proizvodi_3_nivo p3,proizvodi_4_nivo p4,proizvodi_5_nivo p5
WHERE p2.sif_proizvod_1=p1.sifra_proizvoda_1
AND p3.sif_proiz_2=p2.proizvod_nivo_2
AND p4.sif_pro_3=p3.proiz_nivo_3
AND p5.p_sifra_4=nivo_4_proizvod
When it's created I can see all the column names, but it doesn't return any data. I have created many other views this way, and I only have a problem with this one. Could it be related to constraints?
Thank you for the help.
You know your result will be a Cartesian product, right? You need to change the query for it to work efficiently.
Change the last condition to p5.p_sifra_4=p4.nivo_4_proizvod. You forgot the p4 qualifier.
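As a rough check of the corrected join, here is a small sketch using Python's sqlite3 module. The table and join column names come from the question; the column nivo_5, the sample values, and the use of explicit JOIN ... ON syntax are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE proizvodi_1_nivo (sifra_proizvoda_1);
CREATE TABLE proizvodi_2_nivo (proizvod_nivo_2, sif_proizvod_1);
CREATE TABLE proizvodi_3_nivo (proiz_nivo_3, sif_proiz_2);
CREATE TABLE proizvodi_4_nivo (nivo_4_proizvod, sif_pro_3);
CREATE TABLE proizvodi_5_nivo (nivo_5, p_sifra_4);  -- nivo_5 is hypothetical
-- One made-up chain of products linking all five levels.
INSERT INTO proizvodi_1_nivo VALUES (1);
INSERT INTO proizvodi_2_nivo VALUES (10, 1);
INSERT INTO proizvodi_3_nivo VALUES (100, 10);
INSERT INTO proizvodi_4_nivo VALUES (1000, 100);
INSERT INTO proizvodi_5_nivo VALUES (10000, 1000);
""")

# Every join condition names both tables explicitly, including p4 in the last one.
rows = conn.execute("""
SELECT p1.sifra_proizvoda_1, p5.p_sifra_4
FROM proizvodi_1_nivo p1
JOIN proizvodi_2_nivo p2 ON p2.sif_proizvod_1 = p1.sifra_proizvoda_1
JOIN proizvodi_3_nivo p3 ON p3.sif_proiz_2 = p2.proizvod_nivo_2
JOIN proizvodi_4_nivo p4 ON p4.sif_pro_3 = p3.proiz_nivo_3
JOIN proizvodi_5_nivo p5 ON p5.p_sifra_4 = p4.nivo_4_proizvod
""").fetchall()
print(rows)
```

Writing the conditions in ON clauses next to the table they join makes a missing qualifier easier to spot than in a long WHERE list.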
I'm trying to build a star schema in Oracle 12c. In my case the data source is not a relational database but a single Excel/CSV file populated via a Google Form, which means I don't have any sort of reference from a source system, such as auto-incrementing keys/IDs. What would be the best approach to building a star schema given this condition?
File row sample:
<submitted timestamp>,<submitted by user>,<region>,<country>,<branch>,<branch location>,<branch area>,<branch type>,<branch name>,<branch private? yes/no value>,<the following would be all "fact" values (measurements),...,...,...
If I wanted to build a "branch" dimension, how would I handle updates/inserts after the first load into the dimension table?
Thought solution so far:
I had thought of making a concatenated string "key" from the branch values, which would make it unique (an underscore would be the "glue" to concatenate the values), e.g.:
<region>_<country>_<branch>_<branch location> as branch_key
I would insert all the distinct branches into a staging table, including the branch_key column for each of them. Then, when loading into the dimension, I could check which keys do not yet exist in my dimension table and insert those. As for updates, I'm a bit stuck on how to handle them. I had thought of having another file that maps which branches are active, with an expiration date column. Basically I'm trying to simulate what I could do if the data were in a database instead of CSV files.
This is all I can think of so far. Do you have any other recommendations/ideas on how to implement this? Take into consideration that the data source cannot be changed, as in I have to read these CSV files, since the data is not stored anywhere else.
Thank you.
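The staging-table approach described above can be sketched like this. It is a minimal illustration, not a tested Oracle solution: it uses Python's sqlite3 module, the table names stg_branch and dim_branch, the helper load_branches, and the sample rows are all hypothetical, and it covers only the insert of new keys, not updates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stg_branch (region, country, branch, branch_location);
CREATE TABLE dim_branch (
    branch_id  INTEGER PRIMARY KEY,  -- surrogate key assigned by the warehouse
    branch_key TEXT UNIQUE,          -- concatenated natural key from the file
    region, country, branch, branch_location
);
""")

# Concatenated key with an underscore as the glue, as proposed in the question.
KEY = "region || '_' || country || '_' || branch || '_' || branch_location"

def load_branches(conn, csv_rows):
    """Reload staging from one file's rows, then insert only unseen branch keys."""
    conn.execute("DELETE FROM stg_branch")
    conn.executemany("INSERT INTO stg_branch VALUES (?, ?, ?, ?)", csv_rows)
    conn.execute(f"""
        INSERT INTO dim_branch (branch_key, region, country, branch, branch_location)
        SELECT DISTINCT {KEY}, region, country, branch, branch_location
        FROM stg_branch
        WHERE {KEY} NOT IN (SELECT branch_key FROM dim_branch)
    """)

# First load: two form submissions for the same branch plus one other branch.
load_branches(conn, [
    ("EMEA", "FR", "Paris-1", "Paris"),
    ("EMEA", "FR", "Paris-1", "Paris"),
    ("APAC", "JP", "Tokyo-1", "Tokyo"),
])
# Later load: one known branch and one new branch.
load_branches(conn, [
    ("EMEA", "FR", "Paris-1", "Paris"),
    ("AMER", "US", "NYC-1", "New York"),
])

keys = [r[0] for r in conn.execute(
    "SELECT branch_key FROM dim_branch ORDER BY branch_key")]
print(keys)
```

The surrogate branch_id is generated inside the warehouse, so the fact table never depends on the source file supplying an ID; the concatenated key is only used to recognise a branch that has been seen before.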
I have two datasets in my BIRT report:
Lesson (date)
Student (name)
and I would like to know how to create a cross table using the date (red) as the column names and name (blue) as the row names as shown below :
The cells will stay empty.
I have tried to use the Cross Tab, but it seems that I can only use one dataset.
For information, I am stuck with version 2.5.2. I say this in case someone mentions a practical feature available only in a later version of BIRT... :-)
Where both datasets are coming from the same relational data source, the simplest way to achieve this would normally be:
Replace the existing two datasets with a single dataset, in which the two original datasets are cross-joined to each other;
create a crosstab from the new dataset, with the new dataset columns as the data cube groups.
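The cross-joined dataset suggested above would return one row per (student, lesson date) pair. A small sketch of that query, run through Python's sqlite3 module; the table names, the column name lesson_date, and the sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lesson (lesson_date TEXT);
CREATE TABLE student (name TEXT);
-- Made-up sample data.
INSERT INTO lesson VALUES ('2013-01-07');
INSERT INTO lesson VALUES ('2013-01-14');
INSERT INTO student VALUES ('Alice');
INSERT INTO student VALUES ('Bob');
INSERT INTO student VALUES ('Carol');
""")

# Single dataset for the crosstab: every student paired with every lesson date.
pairs = conn.execute("""
SELECT s.name, l.lesson_date
FROM student s
CROSS JOIN lesson l
ORDER BY s.name, l.lesson_date
""").fetchall()

for name, lesson_date in pairs:
    print(name, lesson_date)
```

In the crosstab, name would then be the row group and the lesson date the column group, with no measure needed if the cells stay empty.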
Here's the scenario:
Say you have a Hive Table that stores twitter data.
Say it has 5 columns. One column being the Text Data.
Now, how do you add a 6th column that stores the sentiment value from a sentiment analysis of the Twitter text data? I plan to use a sentiment analysis API like Sentiment140 or viralheat.
I would appreciate any tips on how to implement the "derived" column in Hive.
Thanks.
Unfortunately, while the Hive API lets you add a new column to your table (using ALTER TABLE foo ADD COLUMNS (bar binary)), those new columns will be NULL and cannot be populated. The only way to add data to these columns is to clear the table's rows and load data from a new file, this new file having that new column's data.
To answer your question: You can't, in Hive. To do what you propose, you would have to have a file with 6 columns, the 6th already containing the sentiment analysis data. This could then be loaded into your HDFS, and queried using Hive.
EDIT: Just tried an example where I exported the table as a .csv after adding the new column (see above), and popped that into M$ Excel where I was able to perform functions on the table values. After adding functions, I just saved and uploaded the .csv, and rebuilt the table from it. Not sure if this is helpful to you specifically (since it's not likely that sentiment analysis can be done in Excel), but may be of use to anyone else just wanting to have computed columns in Hive.
References:
https://cwiki.apache.org/Hive/gettingstarted.html#GettingStarted-DDLOperations
http://comments.gmane.org/gmane.comp.java.hadoop.hive.user/6665
You can do this in two steps without a separate table:
Alter the original table to add the required column.
Run an INSERT OVERWRITE TABLE ... SELECT that selects all columns plus your computed column from the original table and writes them back into the original table.
Caveat: This has not been tested on a clustered installation.
I have an assignment where I have two tables. Both tables have multiple records that can be grouped by a certain ID, creating record sets within those two tables.
Those record sets can have varying numbers of records. The trick is that I have to compare the two tables by those record sets. If a record set, ordered by update date (one of the record fields), doesn't find an identical record set in the other table, I have to output that record set.
What is the best way to do it? How do I compare two different tables by record groups/record sets/record blocks?
Should I use sub-query factoring? Should I use temporary tables? Should I use something else?
Thank you very much for your generous responses and please let me know if I made my question unclear
I guess you just need a MINUS query to show the differences.
If you use Toad, there is a specific function for this. Otherwise you can use the MINUS operator or read this other post.
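A MINUS query compares individual rows. If whole record sets have to match, one possible approach is to build an ordered "signature" per ID and compare those. The sketch below does this in Python with sqlite3 under several assumptions: the tables t_a and t_b, the columns id, update_date and val, and the helper names are hypothetical, and a set counts as matched when any group in the other table has the same ordered records (the ID itself is not compared).

```python
import sqlite3
from itertools import groupby

def record_sets(conn, table):
    """Map each group ID to a tuple of its records, ordered by update date."""
    rows = conn.execute(
        f"SELECT id, update_date, val FROM {table} ORDER BY id, update_date, val"
    ).fetchall()
    return {
        gid: tuple(r[1:] for r in grp)
        for gid, grp in groupby(rows, key=lambda r: r[0])
    }

def unmatched_sets(conn, left, right):
    """Return the record sets of `left` that have no identical set in `right`."""
    right_sets = set(record_sets(conn, right).values())
    return {
        gid: recs
        for gid, recs in record_sets(conn, left).items()
        if recs not in right_sets
    }

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t_a (id, update_date, val);
CREATE TABLE t_b (id, update_date, val);
-- Made-up data: set 1 of t_a has a twin in t_b (under ID 7), set 2 does not.
INSERT INTO t_a VALUES (1, '2020-01-02', 'y');
INSERT INTO t_a VALUES (1, '2020-01-01', 'x');
INSERT INTO t_a VALUES (2, '2020-01-01', 'z');
INSERT INTO t_b VALUES (7, '2020-01-01', 'x');
INSERT INTO t_b VALUES (7, '2020-01-02', 'y');
INSERT INTO t_b VALUES (8, '2020-01-01', 'q');
""")

missing = unmatched_sets(conn, "t_a", "t_b")
print(missing)
```

If sets should only be compared between groups that share the same ID, the lookup would instead compare the two dictionaries entry by entry for each ID.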