When using Upsolver SQLake, if my source table has hundreds of columns and I want to include most of them in a transformation but exclude a few, can I do that without explicitly mapping every column in the transformation SQL?
For example, if my source table has 5 columns (col1, col2, col3, col4, col5) and I do not want to include col3 in my transformation, I could use the following SQL:
SELECT col1, col2, col4, col5 FROM sourcetable
However, if my source table has 1000 columns, I'd rather not have to type out 999 columns if I don't have to.
I was looking for an option to generate SQL, or some option to exclude certain columns from a transformation.
SQLake supports an EXCEPT parameter in the transformation job definition. The transformation SQL is evaluated as written, but the columns listed in the EXCEPT clause are excluded from the target table.
CREATE JOB insert_all_columns_except_col3
START_FROM = NOW
ADD_MISSING_COLUMNS = TRUE
RUN_INTERVAL = 1 MINUTE
AS INSERT INTO target_table MAP_COLUMNS_BY_NAME EXCEPT col3
SELECT *
FROM source_table
WHERE $commit_time BETWEEN RUN_START_TIME() and RUN_END_TIME();
In this case, all columns from "source_table" will be written into "target_table" except for col3.
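For engines that have no EXCEPT option, the "generate the SQL" route mentioned in the question is one fallback. The following is a minimal Python sketch, assuming the column names have already been read from the catalog; the helper name `select_except` is hypothetical and is not part of SQLake.

```python
def select_except(table, columns, excluded):
    """Build a SELECT that lists every column except the excluded ones."""
    excluded_set = set(excluded)
    unknown = excluded_set - set(columns)
    if unknown:
        # Fail loudly on typos instead of silently keeping the column.
        raise ValueError(f"not in source table: {sorted(unknown)}")
    kept = [c for c in columns if c not in excluded_set]
    if not kept:
        raise ValueError("every column was excluded")
    return f"SELECT {', '.join(kept)} FROM {table}"


source_columns = ["col1", "col2", "col3", "col4", "col5"]
print(select_except("sourcetable", source_columns, ["col3"]))
# prints: SELECT col1, col2, col4, col5 FROM sourcetable
```

The generated statement has to be regenerated whenever the source schema changes, which is the maintenance cost the EXCEPT parameter avoids.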
As the question states, I'm trying to save calculated data, that is, the result of a select statement, to another table. In this image, the column with the green outline is a database column and the columns with the red outline are calculated from that column. I want to save the red-outlined columns to another table where the column names would be the same.
This looks like a classic report. Is it? If so, it is the result of a select statement. As it calculates all the values you're interested in, you'd use it in an insert statement. For example, you could create a button and create a process that fires when that button is pressed. It would then
insert into target_table (emp_id, salary, house_rent, ...)
select emp_id, ... whatever you select in report's query
from ...
However: data changes. What will you do when something - that is used to calculate those values - is changed? Will you delete those rows and insert new ones? Update existing values? Add yet another row?
If you'd update existing values, consider using merge, as it is capable of inserting rows (in the when not matched clause, in your case) as well as updating rows (in the when matched clause). That would look like this:
merge into target_table t
using (select emp_id, ... whatever you select in report's query
from ...
) x
on (t.emp_id = x.emp_id)
when matched then update set
t.salary = x.salary,
t.house_rent = x.house_rent,
...
when not matched then insert (emp_id, salary, house_rent, ...)
values (x.emp_id, x.salary, x.house_rent, ...);
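The insert-or-update behaviour described above can be tried locally. This is a sketch only: it uses Python's built-in `sqlite3`, and because SQLite has no MERGE statement it swaps in `INSERT ... ON CONFLICT DO UPDATE` (available from SQLite 3.24), which is a different statement with a similar effect. The `employees` table and the "house rent is one tenth of salary" rule are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (emp_id INTEGER PRIMARY KEY, salary INTEGER);
    CREATE TABLE target_table (
        emp_id INTEGER PRIMARY KEY, salary INTEGER, house_rent INTEGER
    );
    INSERT INTO employees VALUES (1, 1000), (2, 2000);
    INSERT INTO target_table VALUES (1, 900, 90);  -- stale row to be updated
""")

# Stand-in for MERGE: rows that collide on emp_id are updated, others inserted.
# SQLite needs "WHERE true" here so the parser can tell the upsert clause
# apart from a join constraint when the source is a SELECT.
UPSERT = """
    INSERT INTO target_table (emp_id, salary, house_rent)
    SELECT emp_id, salary, salary / 10 FROM employees WHERE true
    ON CONFLICT (emp_id) DO UPDATE SET
        salary = excluded.salary,
        house_rent = excluded.house_rent
"""
conn.execute(UPSERT)
rows = conn.execute(
    "SELECT emp_id, salary, house_rent FROM target_table ORDER BY emp_id"
).fetchall()
print(rows)  # [(1, 1000, 100), (2, 2000, 200)]
```

Running the upsert a second time leaves the table unchanged, which is the property that makes this approach safe to attach to a button press.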
You can use the INSERT INTO SELECT statement - plenty of examples available on google
INSERT INTO another_table (
emp_id,
col1,
col2
)
SELECT emp_id,
calculated_col1,
calculated_col2
FROM first_table
In my Laravel project, in the database table ads, I have the following structure :
id | col1 | col2
col2 has values like topad, bump, and urgent, along with empty values. I want to take all the rows from the ads table and sort them alphabetically by col2 in descending order.
So I used:
Ads::orderBy('col2','DESC')->get()
Now I have 2 conditions to be applied on the query.
1st condition: Suppose there are 4 rows with topad in col2, 5 rows with urgent in col2, 6 rows with bump in col2, and 7 rows with an empty value in col2. Rows with urgent in col2 will appear 1st, rows with topad 2nd, rows with bump 3rd, and rows with an empty value 4th. Now I need to randomize the order of the rows within each set. For example, the rows with topad in col2 may have the ids 1,2,3,4. I want to randomize these rows (which may result in, for example, 4,2,1,3), but they must still appear after the urgent rows and before the bump rows. The same is true for the urgent and bump row sets and for the rows with an empty value in col2.
So the query becomes :
Ads::orderBy('col2','DESC')->inRandomOrder()->get();
2nd condition: Suppose the rows are ordered by col2 value. From each set of rows that share the same non-empty value in col2, I need n random rows, i.e. n rows from the urgent rows, n from the topad rows, n from the bump rows, and all of the rows with an empty value.
How do I write the query then?
You could do this with subqueries, but in my experience they take more time to execute than a few smaller queries (if the tables are indexed correctly). With separate queries you also have more control over the limits, and debugging is easier.
$top_ads = Ads::whereCol2('topad')->inRandomOrder()->limit(5)->get();
$urgent_ads = Ads::whereCol2('urgent')->inRandomOrder()->limit(10)->get();
$bump_ads = Ads::whereCol2('bump')->inRandomOrder()->limit(2)->get();
This will create your queries and after that you can do whatever you want with their collections. Combine them, reorder them, etc.
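The combining step can be illustrated independently of Laravel. Below is a small Python sketch of the same idea (up to n random rows per non-empty col2 value, then every empty row), assuming rows are plain dictionaries and that an empty col2 is stored as an empty string rather than NULL; `pick_ads` and `PRIORITY` are hypothetical names.

```python
import random
from collections import defaultdict

# col2 values in descending alphabetical order; the empty value sorts last.
PRIORITY = ["urgent", "topad", "bump", ""]


def pick_ads(rows, n, rng=random):
    """Return up to n random rows per non-empty col2 value, then all empty ones."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["col2"]].append(row)
    result = []
    for label in PRIORITY:
        group = groups.get(label, [])
        if label == "":
            picked = list(group)  # keep every row, only shuffle the order
            rng.shuffle(picked)
        else:
            picked = rng.sample(group, min(n, len(group)))
        result.extend(picked)
    return result


labels = ["topad"] * 4 + ["urgent"] * 5 + ["bump"] * 6 + [""] * 7
ads = [{"id": i, "col2": label} for i, label in enumerate(labels, start=1)]
selected = pick_ads(ads, n=2)
print([ad["id"] for ad in selected])
```

Rows whose col2 value is not listed in `PRIORITY` are dropped in this sketch, so the list would need to match the values actually present in the table.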
Data is migrated from legacy Oracle DB into Amazon S3 buckets. Schemas are exactly the same in both. I can write individual queries on both systems, like row count and Date diff to make sure data is the same, but is there an approach to test at a larger scale on both platforms, across all columns at once in a specific table?
I’ve done data reconciliation in the past via SQL alone; in simple cases (e.g. 2 datasets) it’s very easy with SQL set-based operations (using a combination of union and minus), as per the simple 2-way reconciliation pattern below:
/* Pattern query for data reconciliation test:
Data differences between the two sets would be the output if there are differences; …
these rows would be the ones to deep dive on to understand/explain the differences.
No rows selected means there are no differences
N.B. the data source connection credentials would be embedded in the database link
*/
(
/* all rows in app1.tableX#link1 that are not found in app2.tableX#link2 */
select col1, col2, col3 /* … */ from app1.tableX#link1
minus
select col1, col2, col3 /* … */ from app2.tableX#link2
)
union
(
/* all rows in app2.tableX#link2 that are not found in app1.tableX#link1 */
select col1, col2, col3 /* … */ from app2.tableX#link2
minus
select col1, col2, col3 /* … */ from app1.tableX#link1
)
;
The results of these kinds of queries can easily be saved off via
“create table as select …”
or
“insert into as select …”
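To see the 2-way pattern produce output, here is a self-contained sketch using Python's built-in `sqlite3`. It is an assumption-laden stand-in rather than the Oracle setup from the answer: SQLite spells `minus` as `EXCEPT`, does not accept parenthesised compound selects (hence the subqueries), and the `legacy` and `migrated` tables with their rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE legacy (col1 INTEGER, col2 TEXT, col3 TEXT);
    CREATE TABLE migrated (col1 INTEGER, col2 TEXT, col3 TEXT);
    INSERT INTO legacy VALUES (1, 'a', 'x'), (2, 'b', 'y'), (3, 'c', 'z');
    INSERT INTO migrated VALUES (1, 'a', 'x'), (2, 'b', 'CHANGED'), (4, 'd', 'w');
""")

# Rows only in legacy, plus rows only in migrated; an empty result means
# the two tables hold the same set of rows.
RECONCILE = """
    SELECT 'legacy only' AS side, * FROM (
        SELECT col1, col2, col3 FROM legacy
        EXCEPT
        SELECT col1, col2, col3 FROM migrated
    )
    UNION ALL
    SELECT 'migrated only' AS side, * FROM (
        SELECT col1, col2, col3 FROM migrated
        EXCEPT
        SELECT col1, col2, col3 FROM legacy
    )
    ORDER BY 1, 2
"""
differences = conn.execute(RECONCILE).fetchall()
for row in differences:
    print(row)
```

The unchanged row (1, 'a', 'x') does not appear, the modified row appears once per side, and the rows with col1 = 3 and col1 = 4 appear on one side each. Because set operations discard duplicates, this pattern does not detect a row that exists twice on one side and once on the other, so a row count comparison is still worth running alongside it.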
When I try to use a built-in UDF or my own UDF on the GROUP BY columns in Hive, as below, I get an error:
select col1, col2 from xyz group by my_func(col1), col2
It keeps complaining that column col1 is not found in the GROUP BY expression.
When you apply a function to a column, the result is no longer called the same thing. Select the same expression that you group by, and name it explicitly using the as keyword in the select list.
select my_func(col1) as group1, col2 as group2 from xyz group by my_func(col1), col2;
Also, if you're only selecting the columns that you're grouping by, not the actual grouped data, maybe distinct would be more appropriate than group by?
The call to the aggregate function is in the wrong place. It should be made as follows:
Select my_func(col1),col2 from xyz group by col1,col2
select col1, col2 from xyz group by my_func(col1) as col1, col2
The basic rule is that your GROUP BY needs to contain every non-aggregated column or expression that you have mentioned in the SELECT clause.
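The valid shape of such a query, with the grouped expression repeated in the select list, can be tried with Python's built-in `sqlite3`. This is only a sketch: the built-in `upper` stands in for `my_func`, the sample rows are invented, and SQLite is more lenient than Hive about bare columns, so it demonstrates the accepted form rather than reproducing the Hive error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE xyz (col1 TEXT, col2 TEXT);
    INSERT INTO xyz VALUES ('a', 'p'), ('A', 'p'), ('b', 'q');
""")

# The expression in GROUP BY also appears in the select list, where it
# gets its alias; col1 on its own is never selected.
QUERY = """
    SELECT upper(col1) AS group1, col2 AS group2, count(*) AS n
    FROM xyz
    GROUP BY upper(col1), col2
    ORDER BY group1, group2
"""
grouped = conn.execute(QUERY).fetchall()
print(grouped)  # [('A', 'p', 2), ('B', 'q', 1)]
```

The rows 'a' and 'A' collapse into one group because they are grouped by the function result, not by the raw column.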
Question 1
Can anyone tell me if there is any difference between the following 2 update statements:
UPDATE TABA SET COL1 = '123', COL2 = '456' WHERE TABA.PK = 1
UPDATE TABA SET COL1 = '123' WHERE TABA.PK = 1
where the original value of COL2 is '456'?
How does this affect the UNDO?
Question 2
What if I update a record in table TABA using ROWTYPE, as in the following snippet?
How is the performance, and how does it affect the UNDO?
DECLARE
  SampleRT TABA%rowtype;
BEGIN
  SELECT * INTO SampleRT FROM TABA WHERE PK = 1;
  SampleRT.COL2 := '111';
  UPDATE TABA SET ROW = SampleRT WHERE PK = SampleRT.PK;
END;
thanks
Is your question 1 asking whether UNDO (and REDO) is generated when you're running an UPDATE against a row but not actually changing the value?
Something like?
update taba set col2='456' where col2='456';
If this is the question, then the answer is that UNDO (and REDO) is generated even if you're updating a column to the same value.
(An exception is when you're updating a NULL column to NULL - this doesn't generate any REDO).
For Question 1:
The outcome of the two UPDATEs for rows in your table where PK=1 and COL2='456' is identical. (That is, each such row will have its COL1 value set to '123'.)
Note: there may be rows in your table with PK=1 and COL2 <> '456'. The outcome of the two statements for these rows will be different. Both statements will alter COL1, but only the first will alter the value in COL2, the second will leave it unchanged.
For question 1:
There can be a difference as triggers can fire depending on which columns are updated. Even if you are updating column_a to the same value, the trigger will fire.
The UNDO shouldn't be different as, if you expand or shrink the length of a variable length column (eg VARCHAR or NUMBER), all the rest of the bytes of the record need to be shuffled along too.
If the columns don't change size, then you MAY get a benefit in not specifying the column. You can probably test it using v$transaction queries to look at undo generated.
For question 2:
I'd be more concerned about memory (especially if you are bulk collecting SELECT * ) and triggers firing than UNDO.
If you don't need SELECT *, specify the columns (eg as follows)
DECLARE
  cursor c_1 is select pk, col1, col2 from taba;
  SampleRT c_1%rowtype;
BEGIN
  SELECT pk, col1, col2 INTO SampleRT FROM TABA WHERE PK = 1;
  SampleRT.COL2 := '111';
  UPDATE (select pk, col1, col2 from taba)
  SET ROW = SampleRT WHERE PK = SampleRT.PK;
END;