Insert Overwrite for multiple inserts in hive which have the same partition with same parameter value

Hi guys,
I am trying to do multiple inserts in one statement, and that works. But if two of the insert clauses target the same partition with the same static value, I get the following error:
15:02:22 [EXPLAIN - 0 row(s), 0.000 secs] [Error Code: 10087, SQL State: 42000] Error while compiling statement: FAILED: SemanticException [Error 10087]: The same output cannot be present multiple times: table_name#id=0
The first insert is accepted, but because the second insert assigns the same value to id (which is 0), it gives the above error. Please let me know a workaround. Thanks :)
FROM (
SELECT * FROM Table_Name
) Query
INSERT OVERWRITE TABLE Table_Name PARTITION(id=0)
SELECT column1, column2, column3
GROUP BY column1, column2, column3
INSERT OVERWRITE TABLE Table_Name PARTITION(id=0)
SELECT column1, count(*) AS column2
GROUP BY column1

Instead of multiple inserts, you can just do one insert with a union of the two queries.
FROM (
Select * from Table_Name
)Query
INSERT OVERWRITE TABLE Table_Name PARTITION(id=0)
select <query 1>
UNION ALL
select <query 2>
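For the two selects in the question, the union only works if both branches return the same number of columns with compatible types. The following is a minimal sketch, not a tested statement. It assumes column2 and column3 are strings (the question does not state the column types) and that the second query was meant to group by column1. The NULL padding for column3 is purely illustrative.

```sql
-- Sketch only: column types and the NULL padding are assumptions.
INSERT OVERWRITE TABLE Table_Name PARTITION (id = 0)
SELECT column1, column2, column3
FROM (
  -- first query from the question
  SELECT column1, column2, column3
  FROM Table_Name
  GROUP BY column1, column2, column3
  UNION ALL
  -- second query, padded to three columns so both branches line up
  SELECT column1,
         CAST(COUNT(*) AS STRING) AS column2,
         CAST(NULL AS STRING)     AS column3
  FROM Table_Name
  GROUP BY column1
) unioned;
```

Wrapping the UNION ALL in a subquery keeps the statement valid on older Hive releases that only accept unions inside a FROM clause.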

Related

Query to check the full schema scan for tables in Oracle DB

Hi, I have a requirement to scan a schema and identify tables that are redundant (candidates for dropping). I ran a select against DBA_DEPENDENCIES to check whether the tables are used in any database object types (procedures, package bodies, views, materialized views, ...). That let me find the referenced tables and exclude them. I also need to capture the total row counts and when each table was last loaded or used. Is there an automated way to select only the tables not found in the dependencies list and capture their counts and when they were last used or loaded?
Difficulty: there are many tables (500+).
I have used the queries below.
Query 1
select table_name,
to_number(extractvalue(xmltype(dbms_xmlgen.getxml('select count(*) c from '||owner||'.'||table_name)),'/ROWSET/ROW/C')) as count
from all_tables
where owner = 'SCHEMA_NAME'
Query 2
select owner, table_name, num_rows, sample_size, last_analyzed from all_tables;
Query 1 Result
Filter Table_name=CUST_ORDER
OWNER  TABLE_NAME  COUNT  SAMPLE_SIZE  LAST_ANALYZED
ABCD   CUST_ORDER  1083   1023         01.01.2020
Query 2 Result
Filter Table_name=CUST_ORDER
OWNER  TABLE_NAME  NUM_ROWS  SAMPLE_SIZE  LAST_ANALYZED
ABCD   CUST_ORDER  1023      1023         01.01.2020
Question
The count from Query 1 does not match num_rows from Query 2, even though both queries look at the same table with the same filter. Why do the results differ?
When I spot-checked other tables, the values matched. Does anyone know the reason?
Upon further testing I also encountered the error below. What does it signify? Is it a permissions problem?
ORA-29913: error in executing ODCIEXTTABLEOPEN callout
ORA-29400: data cartridge error
KUP-04040: file **-**.csv in ****_***_***_***** not found
29913. 00000 - "error in executing %s callout"
*Cause: The execution of the specified callout caused an error.
*Action: Examine the error messages take appropriate action.

The num_rows value you see in all_tables is a point-in-time capture of the number of rows. It is only updated when the statistics are gathered again for that table.
Here is an example:
CREATE TABLE t1 AS
SELECT *
FROM all_objects;
SELECT t.num_rows
FROM all_tables t
WHERE t.table_name = 'T1';
-- 78570
SELECT COUNT(*)
FROM t1;
-- 78570
The stats and the physical number of rows match!
INSERT INTO t1
SELECT *
FROM all_objects ao
WHERE rownum <= 5;
-- 5 rows inserted
SELECT t.num_rows
FROM all_tables t
WHERE t.table_name = 'T1';
-- 78570
SELECT COUNT(*)
FROM t1;
-- 78575
Here we have the mismatch: rows were inserted (they could just as well have been deleted), but the stats for the table have not been updated. Let's update them:
BEGIN
dbms_stats.gather_table_stats(ownname => 'SCHEMA',
tabname => 'T1');
END;
/
SELECT t.num_rows
FROM all_tables t
WHERE t.table_name = 'T1';
-- 78575
Now you can see the rows match. Using the value from all_tables may be good enough for your research (and will certainly be faster to query than counting every table).
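If the statistics are good enough, the question's filter (only tables not found in the dependencies list) could be combined with them in one query. This is a sketch rather than a tested statement. 'SCHEMA_NAME' is the placeholder owner from the question, and it assumes you have SELECT access to dba_dependencies.

```sql
-- Sketch only: lists tables in the schema that no stored object references,
-- together with the row count and date recorded by the last stats gathering.
SELECT t.owner,
       t.table_name,
       t.num_rows,
       t.last_analyzed
FROM   all_tables t
WHERE  t.owner = 'SCHEMA_NAME'
AND    NOT EXISTS (
         SELECT 1
         FROM   dba_dependencies d
         WHERE  d.referenced_owner = t.owner
         AND    d.referenced_name  = t.table_name
         AND    d.referenced_type  = 'TABLE'
       )
ORDER  BY t.table_name;
```

Keep in mind that last_analyzed is the time statistics were gathered, not the time the table was last loaded. If exact counts from Query 1 are needed instead, it may help to skip external tables (listed in all_external_tables), since counting one whose file is missing raises errors such as KUP-04040.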

Query 1 counts the actual rows in the table, so its output is accurate and can be relied on.
Query 2 does not read the actual data. It returns the row count captured when the table was last analyzed, so you should not depend on it for the current number of records in the table.
If you gather stats on the table and then run Query 2 again, it will return the same count as Query 1.
As long as no records are inserted or deleted after the stats are gathered, Query 1 and Query 2 will match for that table.

Create table in Hue after many with statements

I am having an issue creating a table in Hue after a series of temporary tables defined in WITH clauses. A very high-level example is below. I am trying to create a table once the temporary tables have been defined.
Basically, I want to create a table from the last select statement, but I am running into errors both with the create table line and with working out what the result of the last select * is called.
With TABLEA as (Select * from TEST1.FILEA),
TableB as (Select * from tableA)
Select * from tableB
where TableB.Curr = 'TYPEE'
CREATE TABLE TEST
row format delimited
fields terminated by '|'
STORED AS RCFile
as Select * from TableB

In your query, please follow the syntax and example below:
create table <table_name> as <your_with_clause_select_query>
Example:
create table test as
with tableA as ( select * from test1.fileA)
select * from tableA;
You can also use nested select statements with CTAS. Note that the row format and storage clauses go before the AS keyword, not after the select.
CREATE TABLE TEST
row format delimited fields terminated by '|'
STORED AS RCFile
AS
select * from (
select
*
from
test1.fileA
) b;
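Applied to the query from the question, the pieces could be combined as follows. This is an untested sketch. The table and column names and the storage clauses are copied unchanged from the question. The WITH chain sits directly after AS, and the final SELECT is what populates the new table, so there is no need to know a name for the "last" result.

```sql
-- Sketch only: names and storage clauses are taken from the question.
CREATE TABLE TEST
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '|'
STORED AS RCFile
AS
WITH TableA AS (SELECT * FROM TEST1.FILEA),
     TableB AS (SELECT * FROM TableA)
SELECT *
FROM TableB
WHERE TableB.Curr = 'TYPEE';
```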

global temporary table hibernate

I'm trying to replace a giant IN clause (hundreds of values) with a JOIN for performance reasons, so I created a global temp table (Oracle) hoping that may be a viable alternative:
CREATE GLOBAL TEMPORARY TABLE TMP_USER_GUID (
user_guid varchar(20)
)
ON COMMIT DELETE ROWS
When I run my sql manually, it works fine:
INSERT ALL
INTO ent.tmp_usr_guid VALUES ('00JD49W7IJ93ZU5MBWBQ')
-- as many INTO statements as I would have IN parameters
SELECT * FROM DUAL;
SELECT u.guid, u.first_name, u.last_name, ...
FROM usr u
JOIN ...
JOIN ...
JOIN tmp_usr_guid tug ON u.guid = tug.usr_guid
When I try running it as a native sql statement using Hibernate (5.2.12.FINAL) it throws:
org.hibernate.QueryException: unexpected char: ';' [INSERT ALL INTO
ent.tmp_usr_guid VALUES ('00JD49W7IJ93ZU5MBWBQ') SELECT * FROM DUAL;
SELECT u.guid,
Any thoughts on the correct approach to take?

add partition in hive table based on a sub query

I am trying to add a partition to a Hive table (partitioned by date).
My problem is that the date needs to be fetched from another table.
My query looks like this:
ALTER TABLE my_table ADD IF NOT EXISTS PARTITION(server_date = (SELECT max(server_date) FROM processed_table));
When I run the query, Hive throws the following error:
Error: Error while compiling statement: FAILED: ParseException line 1:84 cannot recognize input near '(' 'SELECT' 'max' in constant (state=42000,code=40000)

Hive does not allow functions, UDFs, or subqueries in a partition specification. The partition value must be a constant.
Approach 1:
To achieve this, you can run the subquery first, store its result in a shell variable, and then pass that variable to the script that adds the partition.
server_date=$(hive -e "set hive.cli.print.header=false; select max(server_date) from processed_table;")
hive -hiveconf "server_date"="$server_date" -f your_hive_script.hql
Inside your script you can use the following statement:
ALTER TABLE my_table ADD IF NOT EXISTS PARTITION(server_date='${hiveconf:server_date}');
For more information on Hive variable substitution, see the Hive documentation.
Approach 2:
In this approach, you will need to create a temporary table if the partition data you are expecting is not already loaded in another partitioned table.
This assumes your data doesn't have the server_date column.
Load the data into the temporary table, then enable dynamic partitioning:
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
Execute the below query:
INSERT OVERWRITE TABLE my_table PARTITION (server_date)
SELECT b.column1, b.column2,........,a.server_date as server_date FROM (select max(server_date) as server_date from ) a, my_table b;

How to fix partition predicate error without using (set hive.mapred.mode=nonstrict)

Is there any way to run a Hive query against partitioned tables when strict mode is enabled, without using set hive.mapred.mode=nonstrict?
I have two tables, both partitioned by dte. When I union the two tables and filter on two conditions in the outer query, I get a partition predicate error.
Query
with a as (select "table1" as table_name,column1,column2,column3 from table_one
union all
select "table2" as table_name,column1,column2,column3 from table_two)
select * from a where dte='2017-08-01' and table_name='table1';
Error
Error: Error while compiling statement: FAILED: SemanticException
[Error 10041]: No partition predicate found for Alias "table_one"
Table "table_one" (state=42000,code=10041)
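One possible way to satisfy strict mode here, offered as a sketch rather than a confirmed answer, is to put the dte filter inside each branch of the union, so that every partitioned table gets its own partition predicate. The CTE in the question also does not select dte, so the sketch adds it to both branches. Table and column names are the ones from the question.

```sql
-- Sketch only: each branch filters on its own partition column.
WITH a AS (
  SELECT 'table1' AS table_name, column1, column2, column3, dte
  FROM table_one
  WHERE dte = '2017-08-01'
  UNION ALL
  SELECT 'table2' AS table_name, column1, column2, column3, dte
  FROM table_two
  WHERE dte = '2017-08-01'
)
SELECT *
FROM a
WHERE table_name = 'table1';
```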
