Select all data from multiple tables non-relationally - oracle

Put simply, I want to select all data in two columns from each of six tables using one query. All six tables have these same two columns. They aren't relational tables and so there is no need to relationally join them.
The obvious (but apparently wrong) way to do it would be:
select col1, col2 from table1, table2, (... etc)
However, this gives an "ORA-00918: column ambiguously defined" error. I've tried a variety of other things, including some rather poor sub-querying, but haven't managed to get any workable results.
Any suggestions for how to do this? Thanks.

My guess is that you're looking for something like
SELECT col1, col2 FROM table1
UNION ALL
SELECT col1, col2 FROM table2
UNION ALL
...
SELECT col1, col2 FROM table6
If that's not what you want, it would be helpful if you could post some sample data and the expected output.
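If you also need to know which table each row came from, one common variant (the src label column name and the literals are just illustrative) is to tag each branch with a literal:
SELECT col1, col2, 'table1' AS src FROM table1
UNION ALL
SELECT col1, col2, 'table2' AS src FROM table2
UNION ALL
...
SELECT col1, col2, 'table6' AS src FROM table6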

Related

How to combine 2 Google Sheets Query tables chronologically by Date?

I currently have 2 Google Sheets with their own respective tables, Table1 and Table2.
Table1 is located in the example below from Column A to Column C. This was Query 1.
Table2 is located from E to G. This was Query 2.
My desired outcome is Table3 located in Column I to K.
Both of these tables have been imported from separate sheets into my newly created sheet using 2 different Queries.
Here is an example of the code that I used:
=Query(IMPORTRANGE("url_example1", "Sheet1!A1:Z"), "select Col1, Col2, Col7", 1)
And another:
=Query(IMPORTRANGE("url_example2", "Sheet2!A1:Z"), "select Col1, Col2, Col7", 1)
Is there a way to combine both tables chronologically with 1 Query?
If not, what is the best way to accomplish this goal? Any help would be greatly appreciated.
Again my hope is to combine 2 Google Sheets Query tables into one table and have the data sorted chronologically. Thanks for your time.
use:
=QUERY({IMPORTRANGE("url_example1", "Sheet1!A1:Z");
IMPORTRANGE("url_example2", "Sheet1!A2:Z")},
"select Col1,Col2,Col7
order by Col1 asc", 1)
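Because both imported ranges run to the end of their sheets, the stacked array can contain blank rows; a variant that filters them out (assuming Col1 holds the dates) would be something like:
=QUERY({IMPORTRANGE("url_example1", "Sheet1!A1:Z");
IMPORTRANGE("url_example2", "Sheet2!A2:Z")},
"select Col1,Col2,Col7
where Col1 is not null
order by Col1 asc", 1)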

Why is reading from one table in Oracle slower than from another table in the same database?

I am running a simple Select col1, col2, col22 from Table1 order by col1, and the same Select statement against Table2: Select col1, col2, col22 from Table2 order by col1.
I use the Pentaho ETL tool to replicate data from Oracle 19c to SQL Server. Reading from Table1 is much, much slower than reading from Table2. Both have almost the same number of columns and almost the same number of rows, and both exist in the same schema. Table1 is being read at 10 rows per second while Table2 is being read at 1000 rows per second.
What can cause this slowness?
Are the indexes the same on the two tables? It's possible Oracle is using a fast full index scan (like a skinny version of the table) if an index covers all the relevant columns in one table, or may be using a full index scan to pre-sort by COL1. Check the execution plans to make sure the statements are using the same access methods:
explain plan for select ...;
select * from table(dbms_xplan.display);
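For example, a covering index like the one below (a sketch; the index name is illustrative) is the kind of structure that can enable a fast full index scan instead of a full table scan:
CREATE INDEX table2_cover_idx ON table2 (col1, col2, col22);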
Are the table segment sizes the same? Although the data could be the same, occasionally a table can have a lot of wasted space; for example, the table may once have contained a billion rows, then 99.9% of them were deleted, but the table was never rebuilt. Compare the segment sizes with a query like this:
select segment_name, sum(bytes)/1024/1024 mb
from dba_segments
where segment_name in ('TABLE1', 'TABLE2')
group by segment_name;
It depends on many factors.
The first things I would check are the table indexes:
select
    uic.table_name,
    uic.index_name,
    utc.column_name
from USER_TAB_COLUMNS UTC,
     USER_IND_COLUMNS UIC
where utc.table_name = uic.table_name
  and utc.column_name = uic.column_name
  and utc.table_name in ('TABLE1', 'TABLE2')
order by 1, 2, 3;

Dump large SQL table as .csv and split it into multiple separate CSVs for Redshift

I have a table, let's call it tableA, in an Oracle database that I need to upload to Redshift. I've looked up several methods online; most suggest using DataGrip or some other IDE to dump it. The size of the table as reported by Oracle is 89 GB, so I cannot use DataGrip to dump it as a single .csv file. How do I dump it as partitioned .csv files so I can use Redshift's COPY command for faster uploads? Please ask for any further info that might be required.
Try to use this utility. It executes the SQL files sequentially and stores each result in a separate CSV file. It uses SQL*Plus in console mode; SQL*Plus gives the best performance here, but it is not very user-friendly.
Try to split the SQL by primary key: set ranges on the primary key in separate SQL files, for example:
sql1.sql
select col1, col2, col3, col4 from tableA where col1 > 0 and col1 <= 1000000;
sql2.sql
select col1, col2, col3, col4 from tableA where col1 > 1000000 and col1 <= 2000000;
sql3.sql
select col1, col2, col3, col4 from tableA where col1 > 2000000 and col1 <= 3000000;
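As a rough sketch of how each of these files could then be spooled to CSV (assuming SQL*Plus 12.2 or later for SET MARKUP CSV; file names and credentials are illustrative):
-- spool_part1.sql: run as  sqlplus -s user/password@db @spool_part1.sql
-- SET MARKUP CSV requires SQL*Plus 12.2 or later
SET MARKUP CSV ON
SET FEEDBACK OFF
SET TERMOUT OFF
SPOOL tableA_part1.csv
@sql1.sql
SPOOL OFF
EXIT
Repeating this per chunk produces one CSV per key range, which can then be compressed and loaded with Redshift's COPY.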

How to make insert statement re-runnable?

I need to add the following two insert statements:
insert into table1(schema, table_name, table_alias)
values ('ref_owner','test_table_1','tb1');
insert into table1(schema, table_name, table_alias)
values ('dba_owner','test_table_2','tb2');
The question is: how can I make those two insert statements re-runnable, meaning that if they are run again, they should throw a "row exists" error or something along those lines?
Additional notes:
1. I've seen examples of MERGE in Oracle; however, that's only when you're using two tables to match records. In this case I'm only using a single table.
2. The table does not have any primary, unique or foreign keys - only check constraints on one of the columns.
Any help is highly appreciated.
You can use a MERGE statement, as follows:
MERGE INTO table1 t1
USING (SELECT 'ref_owner' AS SCHEMA_NAME, 'test_table_1' AS TABLE_NAME, 'tb1' AS ALIAS_NAME FROM DUAL
       UNION ALL
       SELECT 'dba_owner', 'test_table_2', 'tb2' FROM DUAL) d
ON (t1.SCHEMA = d.SCHEMA_NAME AND
    t1.TABLE_NAME = d.TABLE_NAME)
WHEN NOT MATCHED THEN
    INSERT (SCHEMA, TABLE_NAME, TABLE_ALIAS)
    VALUES (d.SCHEMA_NAME, d.TABLE_NAME, d.ALIAS_NAME);
Best of luck.
You should have a primary key, especially when you want to check for duplicate records and preserve data integrity.
Provide a primary key for your table, or, if you somehow do not want to do that, create a unique constraint for all of the columns in the table, so no duplicate rows are possible.
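A minimal sketch of that approach, assuming duplicates should be detected on the schema/table_name pair (the constraint name is illustrative):
-- after this, re-running the inserts raises ORA-00001 (unique constraint violated)
ALTER TABLE table1
    ADD CONSTRAINT table1_uk UNIQUE (schema, table_name);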

Dynamically load partitions in Hive with predicate pushdown

I have a very large table in Hive, from which we need to load a subset of partitions. It looks something like this:
CREATE EXTERNAL TABLE table1 (
    col1 STRING
) PARTITIONED BY (p_key STRING);
I can load specific partitions like this:
SELECT * FROM table1 WHERE p_key = 'x';
with p_key being the key on which table1 is partitioned. If I hardcode it directly in the WHERE clause, it's all good. However, I have another query which calculates which partitions I need. It's more complicated than this, but let's define it simply as:
SELECT DISTINCT p_key FROM table2;
So now I should be able to construct a dirty query like this:
SELECT * FROM table1
WHERE p_key IN (SELECT DISTINCT p_key FROM table2);
Or written as an inner join:
SELECT t1.* FROM table1 t1
JOIN (SELECT DISTINCT p_key FROM table2) t2 ON t1.p_key = t2.p_key
However, when I run this, it takes enough time to make me believe it's doing a full table scan. In the explain output for the above queries, I can also see that the result of the DISTINCT operation is used in the reducer, not the mapper, meaning it would be impossible for the mapper to know which partitions should be loaded. Granted, I'm not fully familiar with Hive explain output, so I may be overlooking something.
I found the page MapJoin and Partition Pruning on the Hive wiki, and the corresponding ticket indicates it was released in version 0.11.0, so I should have it.
Is it possible to do this? If so, how?
I'm not sure how to help with MapJoin, but in the worst case you could dynamically create a second query with something like:
SELECT concat('SELECT * FROM table1 WHERE p_key IN (',
              concat_ws(',', collect_set(concat("'", p_key, "'"))),
              ')')
FROM table2;
then execute the obtained result (each p_key value is wrapped in single quotes above because the partition column is a STRING). With the partition keys appearing as literals in the WHERE clause, the query processor should be able to prune the unneeded partitions.
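For illustration, if table2 held the keys 'x' and 'y' (made-up values), the generated statement would look like the following, and Hive can prune the partitions at compile time because the keys are now literals:
SELECT * FROM table1 WHERE p_key IN ('x','y')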
