How to drop hive partitions with hivevar passed as partition variable? - hadoop

I have been trying to run this piece of code to drop current day's partition from hive a table and for some reason it does not drop the partition from the hive table. Not sure what's worng.
Table Name : prod_db.products
desc:
+----------------------------+-----------------------+-----------------------+--+
| col_name | data_type | comment |
+----------------------------+-----------------------+-----------------------+--+
| name | string | |
| cost | double | |
| load_date | string | |
| | NULL | NULL |
| # Partition Information | NULL | NULL |
| # col_name | data_type | comment |
| | NULL | NULL |
| load_date | string | |
+----------------------------+-----------------------+-----------------------+--+
## I am using the following code
SET hivevar:current_date=current_date();
ALTER TABLE prod_db.products DROP PARTITION(load_date='${current_date}');
Before and After picture of partitions:
+-----------------------+--+
| partition |
+-----------------------+--+
| load_date=2022-04-07 |
| load_date=2022-04-11 |
| load_date=2022-04-18 |
| load_date=2022-04-25 |
+-----------------------+--+
It runs without any error but doesn't work but won't drop the partition. Table is internal/managed.
I tried different ways mentioned on stack but it is just not working for me.
Help.

You dont need to set a variable. You can directly drop using direct sql.
Alter table prod_db.products
drop partition (load_date= current_date());

Related

Unable to query/select data those inserted through Spark SQL

I am trying to insert data into a Hive Managed table that has a partition.
Show create table output for reference.
+--------------------------------------------------------------------------------------------------+--+
| createtab_stmt |
+--------------------------------------------------------------------------------------------------+--+
| CREATE TABLE `part_test08`( |
| `id` string, |
| `name` string, |
| `baseamount` double, |
| `billtoaccid` string, |
| `extendedamount` double, |
| `netamount` decimal(19,5), |
| `netunitamount` decimal(19,5), |
| `pricingdate` timestamp, |
| `quantity` int, |
| `invoiceid` string, |
| `shiptoaccid` string, |
| `soldtoaccid` string, |
| `ingested_on` timestamp, |
| `external_id` string) |
| PARTITIONED BY ( |
| `productid` string) |
| ROW FORMAT SERDE |
| 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' |
| LOCATION |
| 'wasb://blobrootpath/hive/warehouse/db_103.db/part_test08' |
| TBLPROPERTIES ( |
| 'bucketing_version'='2', |
| 'transactional'='true', |
| 'transactional_properties'='default', |
| 'transient_lastDdlTime'='1549962363') |
+--------------------------------------------------------------------------------------------------+--+
Trying to execute SQL statement to insert records into the part table like below
sparkSession.sql("INSERT INTO TABLE db_103.part_test08 PARTITION(ProductId) SELECT reflect('java.util.UUID', 'randomUUID'),stg_name,stg_baseamount,stg_billtoaccid,stg_extendedamount,stg_netamount,stg_netunitamount,stg_pricingdate,stg_quantity,stg_invoiceid,stg_shiptoaccid,stg_soldtoaccid,'2019-02-12 09:06:07.566',stg_id,stg_ProductId FROM tmp_table WHERE part_id IS NULL");
Without insert statement, if we run select query then getting below data.
+-----------------------------------+--------+--------------+--------------------+------------------+-------------+-----------------+-------------------+------------+-------------+--------------------+--------------------+-----------------------+------+-------------+
|reflect(java.util.UUID, randomUUID)|stg_name|stg_baseamount| stg_billtoaccid|stg_extendedamount|stg_netamount|stg_netunitamount| stg_pricingdate|stg_quantity|stg_invoiceid| stg_shiptoaccid| stg_soldtoaccid|2019-02-12 09:06:07.566|stg_id|stg_ProductId|
+-----------------------------------+--------+--------------+--------------------+------------------+-------------+-----------------+-------------------+------------+-------------+--------------------+--------------------+-----------------------+------+-------------+
| 4e0b4331-b551-42d...| OLI6| 16.0|2DD4E682-6B4F-E81...| 34.567| 1166.74380| 916.78000|2018-10-18 05:06:22| 13| I1|2DD4E682-6B4F-E81...|2DD4E682-6B4F-E81...| 2019-02-12 09:06:...| 6| P3|
| 8b327a8e-dd3c-445...| OLI7| 16.0|2DD4E682-6B4F-E81...| 34.567| 766.74380| 1016.78000|2018-10-18 05:06:22| 13| I6|2DD4E682-6B4F-E81...|2DD4E682-6B4F-E81...| 2019-02-12 09:06:...| 7| P4|
| c0e14b9a-8d1a-426...| OLI5| 14.6555| null| 34.56| 500.87000| 814.65000|2018-10-11 05:06:22| 45| I4|29B73C4E-846B-E71...|29B73C4E-846B-E71...| 2019-02-12 09:06:...| 5| P1|
+-----------------------------------+--------+--------------+--------------------+------------------+-------------+-----------------+-------------------+------------+-------------+--------------------+--------------------+-----------------------+------+-------------+
Earlier I was getting error while inserting into Managed table. But after restarting Hive & Thrift services now there is no error in execution of job but not able to see those inserted data while doing select query through beeline/program. I can see partition with delta files also got inserted into hive/warehouse, see below screenshot.
Also, I can see some warnings as below not sure its related to the error or not.
Cannot get ACID state for db_103.part_test08 from null
One more note: If I use External Table then it is working fine can able to view data as well.
We are using Azure HDInsight Spark 2.3 (HDI 4.0 Preview) cluster with below Service Stacks.
HDFS: 3.1.1
Hive: 3.1.0
Spark2: 2.3.1
Have you added below set commands while trying to insert data.
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
I have faced similar issue where i was not allowed to do any read / write operation, after adding the above properties I was able to query the table.
Since you have not faced any issue with external table, not sure if this is going to solve your problem.

Querying HIVE Metadata

I need to query the following table and view information from my Apache HIVE cluster:
Each row needs to contain the following:
TABLE SCHEMA
TABLE NAME
TABLE DESCRIPTION
COLUMN NAME
COLUMN DATA TYPE
COLUMN LENGTH
COLUMN PRECISION
COLUMN SCALE
NULL OR NOT NULL
PRIMARY KEY INDICATOR
This can be easily queried from most RDBMS (metadata tables/views), but I am struggling to find much information about the equivalent metadata tables/views in HIVE.
Please help :)
This information is available from the Hive metastore. The below example query is for a MySQL-backed metastore (Hive version 1.2).
SELECT
DBS.NAME AS TABLE_SCHEMA,
TBLS.TBL_NAME AS TABLE_NAME,
TBL_COMMENTS.TBL_COMMENT AS TABLE_DESCRIPTION,
COLUMNS_V2.COLUMN_NAME AS COLUMN_NAME,
COLUMNS_V2.TYPE_NAME AS COLUMN_DATA_TYPE_DETAILS
FROM DBS
JOIN TBLS ON DBS.DB_ID = TBLS.DB_ID
JOIN SDS ON TBLS.SD_ID = SDS.SD_ID
JOIN COLUMNS_V2 ON COLUMNS_V2.CD_ID = SDS.CD_ID
JOIN
(
SELECT DISTINCT TBL_ID, TBL_COMMENT
FROM
(
SELECT TBLS.TBL_ID TBL_ID, TABLE_PARAMS.PARAM_KEY, TABLE_PARAMS.PARAM_VALUE, CASE WHEN TABLE_PARAMS.PARAM_KEY = 'comment' THEN TABLE_PARAMS.PARAM_VALUE ELSE '' END TBL_COMMENT
FROM TBLS JOIN TABLE_PARAMS
ON TBLS.TBL_ID = TABLE_PARAMS.TBL_ID
) TBL_COMMENTS_INTERNAL
) TBL_COMMENTS
ON TBLS.TBL_ID = TBL_COMMENTS.TBL_ID;
Sample output:
+--------------+----------------------+-----------------------+-------------------+------------------------------+
| TABLE_SCHEMA | TABLE_NAME | TABLE_DESCRIPTION | COLUMN_NAME | COLUMN_DATA_TYPE_DETAILS |
+--------------+----------------------+-----------------------+-------------------+------------------------------+
| default | temp003 | This is temp003 table | col1 | string |
| default | temp003 | This is temp003 table | col2 | array<string> |
| default | temp003 | This is temp003 table | col3 | array<string> |
| default | temp003 | This is temp003 table | col4 | int |
| default | temp003 | This is temp003 table | col5 | decimal(10,2) |
| default | temp004 | | col11 | string |
| default | temp004 | | col21 | array<string> |
| default | temp004 | | col31 | array<string> |
| default | temp004 | | col41 | int |
| default | temp004 | | col51 | decimal(10,2) |
+--------------+----------------------+-----------------------+-------------------+------------------------------+
Metastore tables referred in query:
DBS: Details of databases/schemas.
TBLS: Details of tables.
COLUMNS_V2: Details about columns.
SDS: Details about storage.
TABLE_PARAMS: Details about table parameters (key-value pairs)

Return filtered records from a returned set of data from two tables

I have three tables:
- Venue
- Space (belongs to Venue)
- Included Space (belongs to Space)
I receive the id of a Venue in the route and return all the related spaces that I know have Included Spaces(a field called num_included_spaces__c on the Space record that maintains a count of its children). Now that I have all the related parent Spaces for that Venue, I need to find all of the Included Spaces for them.
An Included Space is still a Space, it just happens to have a parent that resides in the same table. I'm trying to turn this:
Venue = Rockdog
- Space = Upstairs
- Space = Media Room
- Space = Courtyard
- Space = Downstairs
- Space = Front Patio
- Space = Indoor Bar
Into this:
Venue = Rockdog
- Space = Upstairs
-- Included Space = Media Room
-- Included Space = Courtyard
- Space = Downstairs
-- Included Space = Front Patio
-- Included Space = Indoor Bar
The Included Spaces table has belongs_to__c and space__c as fields, where belongs_to__c is the id of the parent space and space__c is the id of the child. So i'm looking to find all the Included Spaces where belongs_to_c matches the id of any #spaces returned below
#sub_spaces = Space.where("venue__c = ? AND num_included_spaces__c = ?", params[:venue],0)
#spaces = Space.where("venue__c = ? AND num_included_spaces__c > ?", params[:venue],0)
How would I write this Active Record Query for #included_spaces?
my database schema.
mysql> describe venues;
+------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(255) | YES | | NULL | |
| created_at | datetime | NO | | NULL | |
| updated_at | datetime | NO | | NULL | |
+------------+--------------+------+-----+---------+----------------+
4 rows in set (0,00 sec)
mysql> describe spaces;;
+------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(255) | YES | | NULL | |
| venue_id | int(11) | YES | | NULL | |
| created_at | datetime | NO | | NULL | |
| updated_at | datetime | NO | | NULL | |
+------------+--------------+------+-----+---------+----------------+
5 rows in set (0,00 sec)
ERROR:
No query specified
mysql> describe included_spaces;;
+------------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(255) | YES | | NULL | |
| space_id | int(11) | YES | | NULL | |
| created_at | datetime | NO | | NULL | |
| updated_at | datetime | NO | | NULL | |
+------------+--------------+------+-----+---------+----------------+
5 rows in set (0,00 sec)
ERROR:
No query specified
Below function will somehow give you the result you need (in console ofcourse ) however it's not a good solution - it queries database more than needed. However it is the easy -))
def foo id
v = Venue.find(id)
puts v.name
v.spaces.each do |space|
puts space.name
space.included_spaces.each do |spis|
puts spis.name
end
end
end
You can also try a more complex query sth like,
mysql> SELECT spaces.name, included_spaces.name FROM `spaces` INNER JOIN `venues` ON `venues`.`id` = `spaces`.`venue_id` INNER JOIN `included_spaces` ON `included_spaces`.`space_id` = `spaces`.`id` WHERE `spaces`.`venue_id` = 1
-> ;
+------------+-----------+
| name | name |
+------------+-----------+
| Upstairs, | Front |
| Upstairs, | Patio, |
| Upstairs, | Indoor |
| Upstairs, | Bar |
| Downstairs | Media |
| Downstairs | Room, |
| Downstairs | Courtyard |
+------------+-----------+
7 rows in set (0,00 sec)
which should be translated to active record as
Space.joins(:venue)
.joins(:included_spaces)
.where(venue_id: 1)
.select('spaces.name, included_spaces.name')

Automatically generating documentation about the structure of the database

There is a database that contains several views and tables.
I need create a report (documentation of database) with a list of all the fields in these tables indicating the type and, if possible, an indication of the minimum/maximum values and values from first row. For example:
.------------.--------.--------.--------------.--------------.--------------.
| Table name | Column | Type | MinValue | MaxValue | FirstRow |
:------------+--------+--------+--------------+--------------+--------------:
| Table1 | day | date | ‘2010-09-17’ | ‘2016-12-10’ | ‘2016-12-10’ |
:------------+--------+--------+--------------+--------------+--------------:
| Table1 | price | double | 1030.8 | 29485.7 | 6023.8 |
:------------+--------+--------+--------------+--------------+--------------:
| … | | | | | |
:------------+--------+--------+--------------+--------------+--------------:
| TableN | day | date | ‘2014-06-20’ | ‘2016-11-28’ | ‘2016-11-16’ |
:------------+--------+--------+--------------+--------------+--------------:
| TableN | owner | string | NULL | NULL | ‘Joe’ |
'------------'--------'--------'--------------'--------------'--------------'
I think the execution of many queries
SELECT MAX(column_name) as max_value, MIN(column_name) as min_value
FROM table_name
Will be ineffective on the huge tables that are stored in Hadoop.
After reading documentation found an article about "Statistics in Hive"
It seems I must use request like this:
ANALYZE TABLE tablename COMPUTE STATISTICS FOR COLUMNS;
But this command ended with error:
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ColumnStatsTask
Do I understand correctly that this request add information to the description of the table and not display the result? Will this request work with view?
Please suggest how to effectively and automatically create documentation for the database in HIVE?

Materialized View having UNKNOWN staleness - Oracle 11G

I am working on Oracle 11G.
One of my Materialized view has become UNKNOWN (MY_MAT_VW1). You can check the output of the ALL_MVIEWS below.
OWNER | MVIEW_NAME | CONTAINER_NAME | QUERY | QUERY_LEN | UPDATABLE | UPDATE_LOG | MASTER_ROLLBACK_SEG | MASTER_LINK | REWRITE_ENABLED | REWRITE_CAPABILITY | REFRESH_MODE | REFRESH_METHOD | BUILD_MODE | FAST_REFRESHABLE | LAST_REFRESH_TYPE | LAST_REFRESH_DATE | STALENESS | AFTER_FAST_REFRESH | UNKNOWN_PREBUILT | UNKNOWN_PLSQL_FUNC | UNKNOWN_EXTERNAL_TABLE | UNKNOWN_CONSIDER_FRESH | UNKNOWN_IMPORT | UNKNOWN_TRUSTED_FD | COMPILE_STATE | USE_NO_INDEX | STALE_SINCE | NUM_PCT_TABLES | NUM_FRESH_PCT_REGIONS | NUM_STALE_PCT_REGIONS
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
MY_DB | MY_MAT_VW1 | MY_MAT_VW1 | select.. | 6728 | N | | | | N | GENERAL | DEMAND | COMPLETE | IMMEDIATE | NO | COMPLETE | 14-Nov-16 | UNKNOWN | NA | N | Y | N | N | N | N | VALID | N | 0 | | |
MY_DB | MY_MAT_VW2 | MY_MAT_VW2 | select.. | 7074 | N | | | | N | TEXTMATCH | DEMAND | COMPLETE | IMMEDIATE | NO | COMPLETE | 13-Nov-16 | FRESH | NA | N | N | N | N | N | N | FRESH | N | 0 | 0 | |
The queries for the materialized view contain complex joins between multiple tables, inline views and unions.
As per my understanding (UNKNOWN_PLSQL_FUNC column) I guess there is a PLSQL Function which is causing the staleness to become UNKNOWN. However I am not sure which one.
I tried re-compiling and refreshing it but no luck.
Can anyone provide me some information on how to detect the root cause and make sure it does not become UNKNOWN again.
Also is there any implication of it on the data stored within it?
Below is just a sample I've created to replicate the scenario.
SELECT * FROM ENTITY_T;
ID | ENTITY_TYPE | FIRST_NAME | LAST_NAME | LEGAL_NAME
--------------------------------------------------
1 | INDIVIDUAL | JOHN | LESSEN |
2 | INDIVIDUAL | ROSAN | MEL |
3 | CORP | SIGMA | | SIGMA CORPORATION
--Function to get name base upon type
CREATE OR REPLACE FUNCTION GET_NAME (P_ID IN NUMBER)
RETURN VARCHAR2
DETERMINISTIC
AS
LV_NAME VARCHAR2(200);
BEGIN
SELECT CASE ENTITY_TYPE WHEN 'INDIVIDUAL' THEN FIRST_NAME ||' '|| LAST_NAME
WHEN 'CORP' THEN LEGAL_NAME
ELSE 'NONE'
END INTO LV_NAME
FROM ENTITY_T
WHERE ID=P_ID;
RETURN LV_NAME;
EXCEPTION
WHEN NO_DATA_FOUND THEN
RETURN 'NO ID FOUND';
WHEN OTHERS THEN
RETURN 'OTHER ERROR';
END;
--Materialized view creation
CREATE MATERIALIZED VIEW TEST_MV
AS
SELECT ID,ENTITY_TYPE,GET_NAME(ID) NAME
FROM ENTITY_T;
SELECT MVIEW_NAME,STALENESS,AFTER_FAST_REFRESH,UNKNOWN_PLSQL_FUNC,COMPILE_STATE,STALE_SINCE
FROM ALL_MVIEWS WHERE MVIEW_NAME='TEST_MV';
MVIEW_NAME | STALENESS | AFTER_FAST_REFRESH | UNKNOWN_PLSQL_FUNC | COMPILE_STATE | STALE_SINCE
----------------------------------------------------------------------------------------------
TEST_MV | UNKNOWN | NA | Y | VALID |
The Oracle Issue/Doc ID 757537.1 mentioned by JSapkota states clearly, that this is not a bug, but correct/expected behaviour:
STALENESS of the mview, refering to PL/SQL function is set to UNKOWN
as one cannot determine PL/SQL function changes. Current behaviour is
correct as per the design & code.
I guess using DETERMINISTIC functions instead of the default scope could prevent it.
As per the My Oracle Support this could be a bug(7582462).
As there is no solution to this bug, you have to deal with fact that staleness will show unknown, or not use functions on Materialized View definition.
Reference:DBA_MVIEWS Shows STALENESS Value of UNKNOWN After Refresh (Doc ID 757537.1)

Resources