Impala via JDBC: retrieve number of dropped partitions

I am dropping multiple partitions of an Impala table via
ALTER TABLE foobar DROP IF EXISTS PARTITION (pkey='foo' OR pkey='bar');
When using impala-shell, I am presented with a result telling me how many partitions were actually dropped:
Starting Impala Shell without Kerberos authentication
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v3.2.0-cdh6.3.2 (1bb9836) built on Fri Nov 8 07:22:06 PST 2019)
The SET command shows the current value of all shell and query options.
***********************************************************************************
Opened TCP connection to impala:21000
Connected to impala:21000
Server version: impalad version 3.2.0-cdh6.3.2 RELEASE (build 1bb9836227301b839a32c6bc230e35439d5984ac)
[impala:21000] default> use my_schema;
Query: use my_schema
[impala:21000] my_schema> ALTER TABLE FOOBAR DROP IF EXISTS PARTITION (pkey='foo' OR pkey='bar');
Query: ALTER TABLE FOOBAR DROP IF EXISTS PARTITION (pkey='foo' OR pkey='bar')
+-------------------------+
| summary |
+-------------------------+
| Dropped 1 partition(s). |
+-------------------------+
Fetched 1 row(s) in 0.13s
Now, in our production code, we are stuck using only JDBC. When executing the same DDL statement via JDBC, for my Statement st I get st.getResultSet() == null and st.getUpdateCount() == -1.
Is there a way to retrieve the number of dropped partitions via JDBC only?

Related

How to retain last N partitions for a hive external table?

I need to retain, say, the last 7 partitions (and their data) of a given Hive external table.
This can be done either via a shell script or a Hive HQL script.
The table is partitioned by intgestion_date=YYYY-MM-DD.
What would be the best way to find the cutoff date (that of the 7th partition), which I can then use in the WHERE clause of the drop-partitions statement to drop everything older than that?
Since it's an external table, I will have to change the table properties to make it internal before the drop and then revert it.
There are different possible approaches. Dropping all partitions older than 7 days is easy (shell):
hive -e "ALTER TABLE mytable DROP IF EXISTS PARTITION(intgestion_date < '$(date -d "7 days ago" '+%Y-%m-%d')')"
But it seems this is not exactly what you want. You need to get the 7th partition first and use it in the previous statement. Execute show partitions, then use sort, head and tail to get the 7th partition:
seventh_partition=$(hive -S -e "show partitions table_name" | sort -r | head -n 7 | tail -n 1)
# Extract the value after "="
part_value=${seventh_partition#*=}
# Drop everything older than the 7th partition. Replace hive -e with echo first to check what it prints
hive -e "ALTER TABLE table_name DROP IF EXISTS PARTITION(intgestion_date < '$part_value')"
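The same steps can be wrapped in two small functions, which makes it possible to dry-run the logic on sample partition names without touching Hive. This is only a sketch: the function names are made up for illustration, table_name is a placeholder, and it assumes the partition values are ISO dates (so that a plain string sort orders them chronologically).

```shell
# Print the value of the Nth newest partition.
# Reads lines such as "intgestion_date=2020-04-10" on stdin.
nth_newest_partition_value() {
  local n="$1"
  local line
  # ISO dates sort correctly as plain strings; -r puts the newest first.
  line=$(sort -r | sed -n "${n}p")
  # Strip everything up to and including the first "=".
  printf '%s\n' "${line#*=}"
}

# Build the DROP statement for everything older than the cutoff.
build_drop_statement() {
  local table="$1" cutoff="$2"
  printf "ALTER TABLE %s DROP IF EXISTS PARTITION(intgestion_date < '%s')\n" "$table" "$cutoff"
}

# Example usage (needs a working hive CLI; table_name is a placeholder):
# cutoff=$(hive -S -e "show partitions table_name" | nth_newest_partition_value 7)
# [ -n "$cutoff" ] && hive -e "$(build_drop_statement table_name "$cutoff")"
```

If the table has fewer than 7 partitions the cutoff comes back empty, which is why the usage example checks it before running the drop.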

Hive "insert into" doesn't add values

I'm new to Hadoop etc.
I connect via Beeline to HiveServer2. Then I create a table:
create table test02(id int, name string);
The table is created and I try to insert values:
insert into test02(id, name) values (1, "user1");
And nothing happens. test02 and values__tmp__table__1 are created, but they are both empty.
The Hadoop directory "/user/$username/warehouse/test01" is empty too.
0: jdbc:hive2://localhost:10000> insert into test02 values (1,"user1");
No rows affected (2.284 seconds)
0: jdbc:hive2://localhost:10000> select * from test02;
+------------+--------------+
| test02.id | test02.name |
+------------+--------------+
+------------+--------------+
No rows selected (0.326 seconds)
0: jdbc:hive2://localhost:10000> show tables;
+------------------------+
| tab_name |
+------------------------+
| test02 |
| values__tmp__table__1 |
+------------------------+
2 rows selected (0.137 seconds)
Temp tables like these are created when Hive needs to manage intermediate data during an operation. Hive automatically deletes all temporary tables at the end of the Hive session in which they were created. If you close the session and open it again, you won't find the temp table.
https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.5.0/bk_data-access/content/temp-tables.html
Insert data like this:
insert into test02 values (999, "user_new");
Data will be inserted into test02 and into a temp table like values__tmp__table__1 (the temp table will be gone after the Hive session ends).
I found a solution. I'm new to Hadoop&co, so the answer was not obvious to me.
First, I turned Hive logging to level ERROR to see the problem:
Find hive-exec-log4j2.properties ({your hive directory}/conf/)
Find property.hive.log.level and set the value to ERROR (..log.level = ERROR)
Then, while executing the command insert into via Beeline, I saw all of the errors. The main error was:
There are 0 datanode(s) running and no node(s) are excluded in this operation
I found the same question elsewhere. The top answer helped me, which was to delete all /tmp/* files (which stored all of my local HDFS data).
Then, like the first time, I initialized namenode (-format) and Hive (ran my metahive script).
The problem was solved—though it did expose another issue, which I'll need to look into: the insert into executes in 25+ seconds.
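The log-level change described above can also be scripted. The following is a rough sketch, not part of the original answer: the function name is made up, and $HIVE_HOME stands in for "{your hive directory}". It rewrites only the property.hive.log.level line and leaves a .bak copy of the original file next to it.

```shell
# Set property.hive.log.level in a Hive log4j2 properties file.
# $1 = path to the properties file, $2 = level (e.g. ERROR).
set_hive_log_level() {
  local file="$1" level="$2"
  # Keep a .bak backup, then replace only the value of the one property.
  sed -i.bak -E \
    "s/^(property\.hive\.log\.level[[:space:]]*=[[:space:]]*).*/\1${level}/" \
    "$file"
}

# Example usage ($HIVE_HOME is an assumption about where Hive is installed):
# set_hive_log_level "$HIVE_HOME/conf/hive-exec-log4j2.properties" ERROR
```

Restoring the previous level afterwards is a matter of calling the function again with the old value, or copying the .bak file back.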

unixODBC isql set hive config variable

I have a unixODBC connection with hive:
isql -v Hive
+---------------------------------------+
| Connected! |
| |
| sql-statement |
| help [tablename] |
| quit |
| |
+---------------------------------------+
SQL>
E.g. select install_dt, count(1) from device_metrics.sometable where install_dt >= '2020-04-10' group by install_dt;
Returns expected results.
I would like to run this query but with some Hive variable settings. For example, I would like to set the execution engine to mr instead of the default tez. When connected to Hive directly, outside of ODBC, I can just do:
set hive.execution.engine=mr;
select ... [my query to run with mr here...
With isql I tried this:
SQL> set hive.execution.engine=mr;
SQLRowCount returns -1
I'm not really sure what "SQLRowCount returns -1" means, but I guess it means either that there was an error or that no rows were affected?
Either way, I tried running my select query again after trying to configure this setting:
SQL> set hive.execution.engine=mr;
SQLRowCount returns -1
select install_dt, count(1) from device_metrics.sometable where install_dt >= '2020-04-10' group by install_dt;
When I then look at our Hadoop running-applications page, I can see my second attempt at the query, but it's still running with tez. The expected and desired behavior was that it would run with mr.
Is it possible to configure Hive settings over a unixODBC connection? If so, how can I tell Hive to use the mr engine and not tez?

Presto Query HIVE Table Exception: Failed to list directory

I'm new to Presto. I have two machines running Presto 0.160: one is the coordinator, the other is a worker. I want to query a table in Hive. I can run "show tables" and "desc tablename", but when I run "select * from tablename", an exception occurs: "Query 20170728_123013_00011_q4s3a failed: Failed to list directory: hdfs://cdh-test/user/hive/warehouse/employee_hive"
presto> desc hive.default.employee_hive;
Column | Type | Comment
-------------+---------+---------
eid | integer |
name | varchar |
salary | varchar |
destination | varchar |
(4 rows)
Query 20170728_123001_00010_q4s3a, FINISHED, 2 nodes
Splits: 2 total, 2 done (100.00%)
0:00 [4 rows, 268B] [40 rows/s, 2.68KB/s]
presto> select * from hive.default.employee_hive;
Query 20170728_123013_00011_q4s3a, FAILED, 1 node
Splits: 1 total, 0 done (0.00%)
0:00 [0 rows, 0B] [0 rows/s, 0B/s]
Query 20170728_123013_00011_q4s3a failed: Failed to list directory: hdfs://cdh-test/user/hive/warehouse/employee_hive
Here is my configuration for hive catalog:
connector.name=hive-cdh4
hive.metastore.uri=thrift://***:9083
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
Where am I going wrong?
The path where the table is stored needs to exist on HDFS for Presto to open it successfully. From the path, it appears your table is an "internal" Hive table, meaning Hive should have created the path itself. Since it hasn't, you could create it yourself using a command similar to hdfs dfs -mkdir hdfs://cdh-test/user/hive/warehouse/employee_hive, although the exact command depends on your HDFS setup.
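A small helper can check for the directory before creating it. This is a sketch under the assumption that the hdfs CLI is on the PATH and that the current user is allowed to write under the warehouse directory; the function name is made up for illustration.

```shell
# Create an HDFS directory only if it does not exist yet.
ensure_hdfs_dir() {
  local path="$1"
  # "hdfs dfs -test -d" exits with 0 when the path is an existing directory.
  if hdfs dfs -test -d "$path"; then
    echo "exists: $path"
  else
    # -p also creates missing parent directories.
    hdfs dfs -mkdir -p "$path" && echo "created: $path"
  fi
}

# Example usage with the path from the error message:
# ensure_hdfs_dir hdfs://cdh-test/user/hive/warehouse/employee_hive
```

If the helper reports that the directory already exists, the listing failure more likely comes from permissions or from the HDFS configuration Presto is using than from a missing path.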
You can't access the Hadoop directory directly. I assume you created the table as a text file and that it is stored in the internal directory of the respective user.
Just create the table as an external table and you will be able to access it via Presto:
Create External Table tablename (columnames datatypes) row format delimited fields terminated by '\t' stored as textfile;
load data inpath 'Your_hadoop_directory' into table tablename;
Otherwise, create an internal table, load it into an external ORC table, and access that via Presto:
Create Table internal_tablename (columnames datatypes) row format delimited fields terminated by '\t' stored as textfile;
load data inpath 'Your_hadoop_directory' into table internal_tablename;
Create external Table orc_tablename (columnames datatypes) STORED AS ORC;
insert into orc_tablename select * from internal_tablename;
I solved the above issue by creating an ORC table.

Cannot query a Hive table partition after deleting the HDFS files related to the partition

My Hadoop cluster runs a batch job every day at 11:00.
The job creates a Hive table partition (e.g. p_date=201702,p_domain=0) and imports RDBMS data into that partition, like an ETL job. (The Hive table is not an external table.)
But the job failed, and I removed some HDFS files (the partition location => p_date=20170228,p_domain=0) in order to reprocess the data.
That was my mistake; I should have just run a drop-partition query in Beeline.
Now the query hangs when I run "select * from table_name where p_date=20170228,p_domain=0", but "select * from table_name where p_date=20170228,p_domain=6" succeeds.
I cannot find an error log, and no message appears on the console.
How can I solve this problem?
Please excuse my English.
You should not delete the partitions of a Hive table that way. There is a dedicated command for this:
ALTER TABLE table_name DROP IF EXISTS PARTITION(partitioncolumn= 'somevalue');
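For a table with several partition columns, as in the question, the partition spec lists every key. The helper below is only an illustrative sketch with a made-up function name: it turns a partition path such as p_date=20170228/p_domain=0 into the corresponding statement, and it assumes that quoting every value as a string is acceptable for the partition column types.

```shell
# Build a DROP PARTITION statement from a partition path.
# $1 = table name, $2 = path like "p_date=20170228/p_domain=0".
build_drop_partition() {
  local table="$1" spec="$2"
  local clause="" pair key value
  # Split the path on "/" into key=value pairs.
  local IFS='/'
  for pair in $spec; do
    key="${pair%%=*}"
    value="${pair#*=}"
    # Join the pairs with ", " and quote each value (assumption: string quoting is fine).
    clause="${clause:+$clause, }${key}='${value}'"
  done
  printf 'ALTER TABLE %s DROP IF EXISTS PARTITION (%s);\n' "$table" "$clause"
}

# Example usage (table_name is a placeholder):
# build_drop_partition table_name "p_date=20170228/p_domain=0"
```

Printing the statement first and running it in Beeline by hand keeps the destructive step under manual control.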
Deleting the files from HDFS is not sufficient. You also need to clean the data out of the metastore. To do this, connect to your relational database and remove the rows from the partition-related tables in the metastore database.
mysql
mysql> use hive;
mysql> SELECT * FROM PARTITIONS WHERE PART_NAME like '%p_date=20170228/p_domain=0%';
+---------+-------------+------------------+--------------------+-------+--------+
| PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME | SD_ID | TBL_ID |
+---------+-------------+------------------+--------------------+-------+--------+
| 7 | 1487237959 | 0 | partition name | 336 | 329 |
+---------+-------------+------------------+--------------------+-------+--------+
mysql> DELETE FROM PARTITION_KEY_VALS WHERE PART_ID=7;
mysql> DELETE FROM PARTITION_PARAMS WHERE PART_ID=7;
mysql> DELETE FROM PARTITIONS WHERE PART_ID=7;
After this, Hive should stop using this partition in your queries.
