I have been trying to run this piece of code to drop the current day's partition from a Hive table, but for some reason the partition is not dropped. I am not sure what's wrong.
Table Name : prod_db.products
desc:
+----------------------------+-----------------------+-----------------------+--+
| col_name | data_type | comment |
+----------------------------+-----------------------+-----------------------+--+
| name | string | |
| cost | double | |
| load_date | string | |
| | NULL | NULL |
| # Partition Information | NULL | NULL |
| # col_name | data_type | comment |
| | NULL | NULL |
| load_date | string | |
+----------------------------+-----------------------+-----------------------+--+
I am using the following code:
SET hivevar:current_date=current_date();
ALTER TABLE prod_db.products DROP PARTITION(load_date='${current_date}');
The partition list is the same before and after running it:
+-----------------------+--+
| partition |
+-----------------------+--+
| load_date=2022-04-07 |
| load_date=2022-04-11 |
| load_date=2022-04-18 |
| load_date=2022-04-25 |
+-----------------------+--+
It runs without any error, but it does not drop the partition. The table is internal (managed).
I tried the different approaches mentioned on Stack Overflow, but none of them works for me.
Help.
You don't need to set a variable. You can drop the partition directly in SQL:
ALTER TABLE prod_db.products
DROP PARTITION (load_date = current_date());
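A possible reason the original attempt ran without error yet dropped nothing: Hive variable substitution is purely textual, so `SET hivevar:current_date=current_date();` stores the literal text `current_date()` rather than today's date, and the statement then targets a partition that does not exist. The sketch below imitates that substitution with Python's `string.Template`, which happens to use the same `${name}` placeholder syntax. It is only an illustration and not Hive itself; `render` is a hypothetical helper.

```python
from datetime import date
from string import Template

# The statement from the question, with Hive's ${...} placeholder.
STATEMENT = Template(
    "ALTER TABLE prod_db.products DROP PARTITION(load_date='${current_date}');"
)


def render(value: str) -> str:
    """Substitute the variable as plain text (hypothetical helper)."""
    return STATEMENT.substitute(current_date=value)


# What the question's SET effectively stored: the unevaluated text.
broken = render("current_date()")

# Computing the date outside Hive and passing the result instead.
# A fixed date stands in for date.today() so the output is reproducible.
fixed = render(date(2022, 4, 25).isoformat())

print(broken)
print(fixed)
```

Under this reading, one option is to compute the date in the calling shell or script and hand it to Hive with `--hivevar current_date=2022-04-25`, so that the placeholder expands to an actual partition value.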
My Hive table is a managed table, and I can see its files in HDFS.
However, querying it through Hive returns no results.
hive> describe formatted emp
Result -
| Table Type: | MANAGED_TABLE
| Table Parameters: | NULL
| 2 | bucketing_version
| 1376 | numFiles
| 43 | numPartitions
| 0 | numRows
| gzip | parquet.compression
| 0 | rawDataSize
| 4770821594 | totalSize
| true | transactional
| insert_only | transactional_properties
| 1612857428 | transient_lastDdlTime
Selecting data from the table with
select * from emp;
returns no results.
Why is there a difference between what is in HDFS and what the SELECT returns?
This command worked for me:
ANALYZE TABLE table_name COMPUTE STATISTICS FOR COLUMNS;
I am using MonetDB v11.29.7 "Mar2018-SP1" on 64-bit Windows 10. When I perform a full outer join of two tables on their respective varchar columns with lengths > 0 (type_digits > 0), the resulting column in the target table is a varchar column with type_digits=0, although the column data appears to contain the proper, non-null varchar records.
I am not sure how to interpret column information of type=varchar and type_digits=0. This state is causing issues in the subsequent handling/extraction of data via Python interfaces (UDFs), as the expected Python dtype for the data of this column is ambiguous for Python numpy conversion.
I have provided a simple example in which I created two small tables (dummy4 and dummy5) with two columns each and then created a third table (dummy6) using a full outer join.
For column "key" of table dummy6, I would have expected type_digits=32 (as for the "key" columns in the two source tables dummy4 and dummy5). Additionally, how should I interpret the state type=varchar with type_digits=0? What would be the proper handling when allocating a Python/numpy array to extract the "key" column of table dummy6 (via Python UDFs) in this case?
create table dummy4(key varchar(32), val int);
insert into dummy4 values('AAAAAAAA',1);
insert into dummy4 values('BBBBBBBBB',2);
select * from dummy4;
+-----------+------+
| key | val |
+===========+======+
| AAAAAAAA | 1 |
| BBBBBBBBB | 2 |
+-----------+------+
create table dummy5(key varchar(32), val int);
insert into dummy5 values('CCCCCCCC',3);
insert into dummy5 values('DDDDDDDD',4);
select * from dummy5;
+----------+------+
| key | val |
+==========+======+
| CCCCCCCC | 3 |
| DDDDDDDD | 4 |
+----------+------+
create table dummy6 as select key, dummy4.val as "val4", dummy5.val as "val5" from dummy4 full outer join dummy5 using (key);
select * from dummy6;
+-----------+------+------+
| key | val4 | val5 |
+===========+======+======+
| AAAAAAAA | 1 | null |
| BBBBBBBBB | 2 | null |
| CCCCCCCC | null | 3 |
| DDDDDDDD | null | 4 |
+-----------+------+------+
select t.name as "table_name", t.id as "table_id", c.id as "column_id", c.name as "column_name", c.type, c.type_digits from sys.tables t JOIN sys.columns c ON c.table_id = t.id where t.name = 'dummy4';
+------------+----------+-----------+-------------+---------+-------------+
| table_name | table_id | column_id | column_name | type | type_digits |
+============+==========+===========+=============+=========+=============+
| dummy4 | 78445 | 78443 | key | varchar | 32 |
| dummy4 | 78445 | 78444 | val | int | 32 |
+------------+----------+-----------+-------------+---------+-------------+
select t.name as "table_name", t.id as "table_id", c.id as "column_id", c.name as "column_name", c.type, c.type_digits from sys.tables t JOIN sys.columns c ON c.table_id = t.id where t.name = 'dummy5';
+------------+----------+-----------+-------------+---------+-------------+
| table_name | table_id | column_id | column_name | type | type_digits |
+============+==========+===========+=============+=========+=============+
| dummy5 | 78449 | 78447 | key | varchar | 32 |
| dummy5 | 78449 | 78448 | val | int | 32 |
+------------+----------+-----------+-------------+---------+-------------+
select t.name as "table_name", t.id as "table_id", c.id as "column_id", c.name as "column_name", c.type, c.type_digits from sys.tables t JOIN sys.columns c ON c.table_id = t.id where t.name = 'dummy6';
+------------+----------+-----------+-------------+---------+-------------+
| table_name | table_id | column_id | column_name | type | type_digits |
+============+==========+===========+=============+=========+=============+
| dummy6 | 78457 | 78454 | key | varchar | 0 |
| dummy6 | 78457 | 78455 | val4 | int | 32 |
| dummy6 | 78457 | 78456 | val5 | int | 32 |
+------------+----------+-----------+-------------+---------+-------------+
In fact, this was a bug in MonetDB, and it was fixed today. The fix will be included in the upcoming Nov2019 release.
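Until a release containing the fix is installed, code on the Python side could avoid trusting type_digits for varchar columns. The following is a minimal sketch under stated assumptions, not MonetDB's own conversion logic: `numpy_dtype_for` is a hypothetical helper, the returned strings are names that NumPy accepts as dtype specifiers, and a type_digits of 0 is assumed to mean "no usable declared length".

```python
def numpy_dtype_for(sql_type: str, type_digits: int) -> str:
    """Pick a NumPy dtype name for a column described in sys.columns.

    Hypothetical helper: only the two types from the example are handled.
    """
    if sql_type == "int":
        return "int32"
    if sql_type == "varchar":
        # Assumption: 0 means the declared length is unknown, so fall back
        # to generic Python objects instead of a fixed-width unicode dtype.
        return f"U{type_digits}" if type_digits > 0 else "object"
    raise ValueError(f"unhandled SQL type: {sql_type}")


# Column descriptions copied from the sys.columns listings above.
dummy4_key = numpy_dtype_for("varchar", 32)
dummy6_key = numpy_dtype_for("varchar", 0)
dummy6_val4 = numpy_dtype_for("int", 32)
```

Falling back to an object array sidesteps the ambiguity at the cost of a less compact representation; a fixed width could instead be derived from the longest value actually present in the data.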
There is a database that contains several views and tables.
I need to create a report (documentation of the database) listing all the fields in these tables with their types and, if possible, the minimum and maximum values and the values from the first row. For example:
.------------.--------.--------.--------------.--------------.--------------.
| Table name | Column | Type | MinValue | MaxValue | FirstRow |
:------------+--------+--------+--------------+--------------+--------------:
| Table1 | day | date | ‘2010-09-17’ | ‘2016-12-10’ | ‘2016-12-10’ |
:------------+--------+--------+--------------+--------------+--------------:
| Table1 | price | double | 1030.8 | 29485.7 | 6023.8 |
:------------+--------+--------+--------------+--------------+--------------:
| … | | | | | |
:------------+--------+--------+--------------+--------------+--------------:
| TableN | day | date | ‘2014-06-20’ | ‘2016-11-28’ | ‘2016-11-16’ |
:------------+--------+--------+--------------+--------------+--------------:
| TableN | owner | string | NULL | NULL | ‘Joe’ |
'------------'--------'--------'--------------'--------------'--------------'
I think that running many queries like
SELECT MAX(column_name) as max_value, MIN(column_name) as min_value
FROM table_name
will be inefficient on the huge tables that are stored in Hadoop.
After reading the documentation, I found an article about "Statistics in Hive".
It seems I must use a statement like this:
ANALYZE TABLE tablename COMPUTE STATISTICS FOR COLUMNS;
But this command ended with an error:
Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.ColumnStatsTask
Do I understand correctly that this statement adds information to the table's description rather than displaying a result? Will it work with a view?
Please suggest how to create documentation for a Hive database efficiently and automatically.
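One way to reduce the cost of the MIN/MAX approach, independent of the statistics route, is to generate a single query per table that aggregates all columns at once, so each table is scanned only once. The sketch below only builds the query text; `min_max_query` is a hypothetical helper, and the table and column names are taken from the example report above.

```python
from typing import Sequence


def min_max_query(table: str, columns: Sequence[str]) -> str:
    """Build one SELECT that returns MIN and MAX for every listed column."""
    parts = []
    for col in columns:
        # Backticks quote the identifiers; aliases keep the result readable.
        parts.append(f"MIN(`{col}`) AS `{col}_min`")
        parts.append(f"MAX(`{col}`) AS `{col}_max`")
    return f"SELECT {', '.join(parts)} FROM {table}"


query = min_max_query("Table1", ["day", "price"])
print(query)
```

The column list for each table could be collected from the output of DESCRIBE, and the generated statements then run through whichever Hive client is already in use.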
I have a table named table_listnames whose structure is given below:
mysql> desc table_listnames;
+-------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+----------------+
| id | int(11) | NO | PRI | NULL | auto_increment |
| name | varchar(255) | NO | | NULL | |
+-------+--------------+------+-----+---------+----------------+
2 rows in set (0.04 sec)
It contains the following sample data:
mysql> select * from table_listnames;
+----+------------+
| id | name |
+----+------------+
| 6 | WWW |
| 7 | WWWwww |
| 8 | WWWwwws |
| 9 | WWWwwwsSSS |
| 10 | asdsda |
+----+------------+
5 rows in set (0.00 sec)
My requirement is to insert a name only if it is not already in the table, and to do nothing otherwise.
I am achieving it this way:
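// The derived table aliases the parameter as "name" so the outer SELECT can reference it.
String sql = "INSERT INTO table_listnames (name) SELECT tmp.name FROM (SELECT ? AS name) AS tmp WHERE NOT EXISTS (SELECT name FROM table_listnames WHERE name = ?) LIMIT 1";
pst = dbConnection.prepareStatement(sql);
pst.setString(1, salesName); // value to insert
pst.setString(2, salesName); // value checked for existence
pst.executeUpdate();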
Is it possible to get the id of the record with the given name in this case, whether or not it was newly inserted?
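Because the conditional insert creates a row only when the name is missing, a generated key is not available when the name already exists. A simple approach that covers both cases is to follow the insert with a lookup by name. The sketch below shows the idea with Python's built-in `sqlite3` module and an in-memory database standing in for MySQL; `get_or_create_id` is a hypothetical helper and the SQL is written for SQLite, so it is an illustration of the pattern rather than a drop-in replacement for the JDBC code.

```python
import sqlite3


def get_or_create_id(conn: sqlite3.Connection, name: str) -> int:
    """Insert `name` if absent, then return the id of the row holding it."""
    conn.execute(
        "INSERT INTO table_listnames (name) "
        "SELECT ? WHERE NOT EXISTS "
        "(SELECT 1 FROM table_listnames WHERE name = ?)",
        (name, name),
    )
    # The lookup works whether or not the insert above added a row.
    row = conn.execute(
        "SELECT id FROM table_listnames WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_listnames ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, name VARCHAR(255) NOT NULL)"
)
# Sample rows from the question.
conn.executemany(
    "INSERT INTO table_listnames (id, name) VALUES (?, ?)",
    [(6, "WWW"), (7, "WWWwww"), (8, "WWWwwws"), (9, "WWWwwwsSSS"), (10, "asdsda")],
)

existing_id = get_or_create_id(conn, "WWW")        # name already present
new_id = get_or_create_id(conn, "brand_new")       # name inserted now
repeat_id = get_or_create_id(conn, "brand_new")    # second call inserts nothing
row_count = conn.execute("SELECT COUNT(*) FROM table_listnames").fetchone()[0]
```

In the JDBC version, the same follow-up SELECT could be issued through a second PreparedStatement after `executeUpdate()`.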