unixODBC isql set hive config variable - hadoop

I have a unixODBC connection to Hive:
isql -v Hive
+---------------------------------------+
| Connected!                            |
|                                       |
| sql-statement                         |
| help [tablename]                      |
| quit                                  |
|                                       |
+---------------------------------------+
SQL>
E.g. select install_dt, count(1) from device_metrics.sometable where install_dt >= '2020-04-10' group by install_dt;
This returns the expected results.
I would like to run this query but with some Hive variable settings. For example, I would like to set the execution engine to mr instead of the default tez. When connected to Hive directly, outside of ODBC, I can just do:
set hive.execution.engine=mr;
select ... [my query to run with mr here]
With isql I tried this:
SQL> set hive.execution.engine=mr;
SQLRowCount returns -1
I'm not really sure what SQLRowCount returns -1 means, but I guess it indicates either an error or that no rows were affected?
Either way, I tried running my select query again after trying to configure this setting:
SQL> set hive.execution.engine=mr;
SQLRowCount returns -1
select install_dt, count(1) from device_metrics.sometable where install_dt >= '2020-04-10' group by install_dt;
When I then look at our Hadoop running-applications page, I can see my second attempt at the query, but it is still running with tez. The expected and desired behavior was that it would run with mr.
Is it possible to configure Hive settings over a unixODBC connection? If so, how can I tell Hive to use the mr engine and not tez?
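Depending on the ODBC driver, this may be something you configure in the DSN rather than per-session. The Cloudera/Hortonworks Hive ODBC drivers, for instance, support server-side properties set in odbc.ini by prefixing the Hive config key with SSP_. A minimal sketch (the driver path, host, and port below are assumptions, not taken from the question):
[Hive]
# assumed install path for the Cloudera Hive ODBC driver
Driver=/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
Host=my-hiveserver2-host
Port=10000
# server-side property: the SSP_ prefix tells the driver to apply
# hive.execution.engine=mr to the HiveServer2 session it opens
SSP_hive.execution.engine=mr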

Related

Hive "insert into" doesnt add values

I'm new to Hadoop etc.
I connect via Beeline to HiveServer2 and create a table:
create table test02(id int, name string);
The table is created, and I try to insert values:
insert into test02(id, name) values (1, "user1");
And nothing happens. test02 and values__tmp__table__1 are created, but they are both empty.
The Hadoop directory "/user/$username/warehouse/test01" is empty too.
0: jdbc:hive2://localhost:10000> insert into test02 values (1,"user1");
No rows affected (2.284 seconds)
0: jdbc:hive2://localhost:10000> select * from test02;
+------------+--------------+
| test02.id  | test02.name  |
+------------+--------------+
+------------+--------------+
No rows selected (0.326 seconds)
0: jdbc:hive2://localhost:10000> show tables;
+------------------------+
| tab_name               |
+------------------------+
| test02                 |
| values__tmp__table__1  |
+------------------------+
2 rows selected (0.137 seconds)
Temp tables like these are created when hive needs to manage intermediate data during an operation. Hive automatically deletes all temporary tables at the end of the Hive session in which they are created. If you close the session and open it again, you won't find the temp table.
https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.5.0/bk_data-access/content/temp-tables.html
Insert data like this:
insert into test02 values (999, "user_new");
The data will be inserted into test02, and a temp table like values__tmp__table__1 will be created (the temp table will be gone once the Hive session ends).
I found a solution. I'm new to Hadoop&co, so the answer was not obvious to me.
First, I turned Hive logging to level ERROR to see the problem:
Find hive-exec-log4j2.properties ({your hive directory}/conf/)
Find property.hive.log.level and set the value to ERROR (..log.level = ERROR)
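After the change, the relevant line in the properties file looks like this (a sketch; the file's other properties are omitted):
# {your hive directory}/conf/hive-exec-log4j2.properties
property.hive.log.level = ERROR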
Then, while executing the command insert into via Beeline, I saw all of the errors. The main error was:
There are 0 datanode(s) running and no node(s) are excluded in this operation
I found the same question elsewhere. The top answer helped me, which was to delete all /tmp/* files (which stored all of my local HDFS data).
Then, like the first time, I initialized namenode (-format) and Hive (ran my metahive script).
The problem was solved—though it did expose another issue, which I'll need to look into: the insert into executes in 25+ seconds.
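For anyone hitting the same datanode error, the recovery looked roughly like this as shell commands (a sketch assuming a default single-node setup that keeps HDFS data under /tmp; this destroys all HDFS data, and the last step assumes a standard schematool-managed metastore rather than my custom script):
# stop HDFS, then clear the local storage directories
stop-dfs.sh
rm -rf /tmp/hadoop-*
# re-initialize the namenode and restart HDFS
hdfs namenode -format
start-dfs.sh
# re-initialize the Hive metastore (assuming an embedded Derby metastore)
schematool -dbType derby -initSchema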

Impala via JDBC: retrieve number of dropped partitions

I am dropping multiple partitions of an Impala table via
ALTER TABLE foobar DROP IF EXISTS PARTITION (pkey='foo' OR pkey='bar');
When using impala-shell I am presented with a result telling me how many partitions were actually dropped:
Starting Impala Shell without Kerberos authentication
***********************************************************************************
Welcome to the Impala shell.
(Impala Shell v3.2.0-cdh6.3.2 (1bb9836) built on Fri Nov 8 07:22:06 PST 2019)
The SET command shows the current value of all shell and query options.
***********************************************************************************
Opened TCP connection to impala:21000
Connected to impala:21000
Server version: impalad version 3.2.0-cdh6.3.2 RELEASE (build 1bb9836227301b839a32c6bc230e35439d5984ac)
[impala:21000] default> use my_schema;
Query: use my_schema
[impala:21000] my_schema> ALTER TABLE FOOBAR DROP IF EXISTS PARTITION (pkey='foo' OR pkey='bar');
Query: ALTER TABLE FOOBAR DROP IF EXISTS PARTITION (pkey='foo' OR pkey='bar')
+-------------------------+
| summary                 |
+-------------------------+
| Dropped 1 partition(s). |
+-------------------------+
Fetched 1 row(s) in 0.13s
Now, in our production code, we are stuck with using only JDBC. When executing the same DDL statement via JDBC, for my Statement st I get st.getResultSet() == null and st.getUpdateCount() == -1.
Is there a way to retrieve the number of dropped partitions via JDBC only?
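One workaround, if nothing better turns up: count the partitions before and after the statement and report the difference. A minimal sketch (the JDBC URL, schema, and table are placeholders; any summary row that SHOW PARTITIONS may emit appears in both counts and cancels out):

import java.sql.*;

public class DropPartitionsCount {

    // Count the rows returned by SHOW PARTITIONS for the given table.
    static int countPartitions(Statement st, String table) throws SQLException {
        int n = 0;
        try (ResultSet rs = st.executeQuery("SHOW PARTITIONS " + table)) {
            while (rs.next()) {
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) throws SQLException {
        // Placeholder URL; host, port, and schema depend on your cluster.
        String url = "jdbc:impala://impala:21050/my_schema";
        try (Connection con = DriverManager.getConnection(url);
             Statement st = con.createStatement()) {
            int before = countPartitions(st, "foobar");
            st.execute("ALTER TABLE foobar DROP IF EXISTS PARTITION (pkey='foo' OR pkey='bar')");
            int after = countPartitions(st, "foobar");
            System.out.println("Dropped " + (before - after) + " partition(s).");
        }
    }
}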

Show runtime of a query on monetdb

I am testing MonetDB for columnar storage.
I have already installed and started the server, but when I connect with the client and run a query, the response does not show the time it took to execute the query.
I am connecting as:
mclient -u monetdb -d voc
I also tried connecting in interactive mode:
mclient -u monetdb -d voc -i
Output example:
sql>select count(*) from voc.regions;
+---------+
| L3      |
+=========+
| 5570699 |
+---------+
1 tuple
As mkersten mentioned, I would read through the options of the mclient utility first.
To get server and client timing measurements I used --timer=performance option when starting mclient.
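With the connection from the question, that would be, for example:
mclient -u monetdb -d voc --timer=performance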
Inside mclient I would then disable the result output by setting \f trash to ignore the results when measuring only.
Prepend trace to your query and you get your results like this:
sql>\f trash
sql>trace select count(*) from categories;
sql:0.000 opt:0.266 run:1.713 clk:5.244 ms
sql:0.000 opt:0.266 run:2.002 clk:5.309 ms
The first of the two lines shows you the server timings, the second one the overall timing including passing the results back to the client.
If you use the latest version, MonetDB-Mar2018, you have good control over the performance timers, which cover parsing, optimization, and run time at the server. See mclient --help.

ERROR 1064 (42000) at line 38: check the manual that corresponds to your MySQL server version for the right syntax to use near '//

I am trying to run a MySQL script on CentOS. I have the following MySQL version installed:
mysql> SHOW VARIABLES LIKE "%version%";
+-------------------------+------------------------------------------------------+
| Variable_name           | Value                                                |
+-------------------------+------------------------------------------------------+
| innodb_version          | 5.6.25-73.1                                          |
| protocol_version        | 10                                                   |
| slave_type_conversions  |                                                      |
| version                 | 5.6.25-73.1                                          |
| version_comment         | Percona Server (GPL)                                 |
| version_compile_machine | x86_64                                               |
| version_compile_os      | Linux                                                |
+-------------------------+------------------------------------------------------+
My sample script looks like:
DELIMITER //
DROP TRIGGER IF EXISTS trg_table1_category_insert;
DROP TRIGGER IF EXISTS trg_table1_category_update;
CREATE TRIGGER trg_table1_category_insert
AFTER INSERT
ON
table1_category
FOR EACH ROW
BEGIN
insert into table1_category_history (
table1_category_history_id,
table1_id,
transaction_start_date
) values (
new.table1_category_id,
new.table1_id,
new.create_date
);
END;
//
CREATE TRIGGER trg_table1_category_update AFTER UPDATE on table1_category FOR EACH ROW
BEGIN
insert into table1_category_history (
table1_category_history_id,
table1_id,
transaction_start_date
) values (
new.table1_category_id,
new.table1_id,
new.create_date
);
END;
//
DELIMITER ;
My database uses utf8 encoding. While importing this file into the database via the mysql client, it keeps throwing:
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '//
CREATE TRIGGER trg_table1_category_update AFTER UPDATE on ta' at line 1
I do not see any syntax error in the delimiter usage, and the script works absolutely fine on some machines. I have googled hundreds of links and tried everything, downgrading/upgrading the MySQL server and client, my.cnf, charset, etc., but nothing helps. Can anyone please help me with this? Is there any setting at the client level to make it interpret the delimiter correctly? I am using the same client version that comes with the MySQL server installation.
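One thing worth checking, offered as an assumption since the failing environments aren't described in detail: DELIMITER is interpreted by the mysql command-line client, not by the server, so the script has to be fed through a client that understands it, and the DELIMITER lines must not carry stray characters such as a BOM or Windows line endings. For example (file name and credentials are placeholders):
mysql --default-character-set=utf8 -u myuser -p mydb < triggers.sql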

Is it possible to count the number of partitions?

I am working on a test in which I must find out the number of partitions of a table and check that it is right. If I use show partitions TableName I get all the partitions by name, but I would like to get just the number of partitions, something along the lines of show count(partitions) TableName (which merely returns OK, so it's no good) that yields 12 (for example).
Is there any way to achieve this?
Using Hive CLI
$ hive --silent -e "show partitions <dbName>.<tableName>;" | wc -l
--silent enables silent mode
-e tells hive to execute the quoted query string
You could use:
select count(distinct <partition key>) from <TableName>;
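This counts one row per partition value when the table has a single partition column; with multiple partition columns, count distinct combinations instead. A sketch, assuming hypothetical partition columns year and month:
select count(distinct year, month) from <TableName>;
Note that, depending on optimizer settings, this may scan the table rather than just the metastore.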
The command below returns all the partitions, and at the end it shows the number of fetched rows; that row count is the number of partitions:
SHOW PARTITIONS [db_name.]table_name [PARTITION(partition_spec)];
You can use the WebHCat interface to get information like this. This has the benefit that you can run the command from anywhere that the server is accessible. The result is JSON - use a JSON parser of your choice to process the results.
In this example, which pipes the WebHCat results to Python, only the number 24 is returned, representing the number of partitions for this table. (The server name is the name node.)
curl -s 'http://*myservername*:50111/templeton/v1/ddl/database/*mydatabasename*/table/*mytablename*/partition?user.name=*myusername*' | python -c 'import sys, json; print len(json.load(sys.stdin)["partitions"])'
24
In Scala you can do the following:
sql("show partitions <table_name>").count()
I used the following:
beeline -silent --showHeader=false --outputformat=csv2 -e 'show partitions <dbname>.<tablename>' | wc -l
Use the following syntax:
show create table <table name>;
