Hive Transactions + Remote Metastore Error

I'm running Hive 2.1.1 on EMR 5.5.0 with a remote MySQL metastore DB. I need to enable transactions on Hive, but when I follow the configuration here and run any query, I get the following error:
FAILED: Error in acquiring locks: Error communicating with the metastore
Settings on the metastore:
hive.compactor.worker.threads = 0
hive.compactor.initiator.on = true
Settings in the hive client:
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
This only happens when I set hive.txn.manager, so my Hive metastore is definitely online.
I've tried some of the older suggestions of turning on Hive's test features, which didn't work, but I don't think this is a test feature anymore. I can't turn off concurrency, as a similar post on SO suggests, because I need concurrency. It seems like the problem is that either DbTxnManager isn't getting the remote metastore connection info properly from the Hive config, or the MySQL DB is missing some tables required by DbTxnManager. I have datanucleus.autoCreateTables=true.

It looks like Hive wasn't properly creating the tables needed for the transaction manager. I'm not sure where it was getting its schema, but it was definitely wrong.
So we just ran the hive-txn-schema script to set up the schema manually. We'll do this at the start of any of our clusters from now on.
https://github.com/apache/hive/blob/master/metastore/scripts/upgrade/mysql/hive-txn-schema-2.1.0.mysql.sql
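For reference, a minimal sketch of applying that script from the MySQL client; the metastore database name ("hive") and the script path below are assumptions, so adjust them to your environment:
-- run inside the mysql client, connected to the metastore database
-- (database name and file path are placeholders, not from the original setup)
USE hive;
SOURCE /path/to/hive-txn-schema-2.1.0.mysql.sql;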

The error
FAILED: Error in acquiring locks: Error communicating with the metastore
can sometimes occur because the table has no data yet; you need to initialize some data in your tables first. For example:
create table t1(id int, name string)
clustered by (id) into 8 buckets
stored as orc TBLPROPERTIES ('transactional'='true');
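To rule out the empty-table case, you could then insert a row or two through the transactional path; the values below are arbitrary:
INSERT INTO t1 VALUES (1, 'alice'), (2, 'bob');
SELECT * FROM t1;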

Related

Kafka Connect sink to Redshift table not in public schema

I am unable to make a Kafka Connect sink work for a table that is not in the public schema.
I am using Kafka Connect to send records to a Redshift database via a sink operation using JdbcSinkConnector.
I have created my destination table in Redshift, but it is not in the public schema (my_schema.test_table). Note: auto.create and auto.evolve are off in the connector configuration.
When I attempt to specify the table's location in the connector config, like so...
"table.name.format": "my_schema.test_table",
...the sink connector's task encounters this error when it starts up:
"Table my_schema.test_table is missing and auto-creation is disabled"
from the stack trace:
Caused by: org.apache.kafka.connect.errors.ConnectException: Table my_schema.test_table is missing and auto-creation is disabled
at io.confluent.connect.jdbc.sink.DbStructure.create(DbStructure.java:86)
at io.confluent.connect.jdbc.sink.DbStructure.createOrAmendIfNecessary(DbStructure.java:63)
at io.confluent.connect.jdbc.sink.BufferedRecords.add(BufferedRecords.java:78)
...
I have tried the following formats for supplying table name:
my_schema.test_table
dev.my_schema.test_table
test_table <-- in this case I get past the existence check that stops the others, but then run into this error every time Kafka Connect attempts to write a row:
"org.apache.kafka.connect.errors.RetriableException: java.sql.SQLException: java.sql.SQLException: Amazon Invalid operation: relation "test_table" does not exist;"
Likely because test_table is not in the public schema. : (
And it seems like the code is attempting to parse this table name correctly, but unfortunately it doesn't log its results.
This is my connection string: "connection.url": "jdbc:redshift://xxx.xxx.xxx.xxx:5439/dev"
I have mucked around with attempting to specify currentSchema=my_schema in the connection string, both for the Redshift JDBC driver as well as PostgreSQL. No luck.
I'm using Kafka Connect version 1.1.0
Redshift JDBC JAR: RedshiftJDBC42-1.2.16.1027.jar
I am able to get data flowing by putting the table in the public schema and specifying table name with no schema: "table.name.format": "test_table".
Unfortunately, that's not where we need the data to be.
Any help much appreciated.
I noticed that the source code seemed to be trying to do the right thing, and then realized that the version of the JDBC sink connector we were using did not have those modifications, which are relatively recent. I moved from version 4.1.0 of the JDBC sink connector JAR to version 5.0.0, and voilà, data is flowing into a table in the schema I specified. 🙃
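For anyone hitting the same thing, a minimal sketch of what the relevant sink configuration might look like once the connector is on 5.0.0; the connector name and topic below are placeholders (credentials omitted), not taken from the original setup:
{
  "name": "redshift-sink",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "connection.url": "jdbc:redshift://xxx.xxx.xxx.xxx:5439/dev",
    "table.name.format": "my_schema.test_table",
    "auto.create": "false",
    "auto.evolve": "false",
    "topics": "test_topic"
  }
}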

After restarting the services, the impala tables are not coming up

After restarting the Impala server, we are not able to see the tables (i.e. the tables are not coming up). Can anyone tell me what order we have to follow to avoid this issue?
Thanks,
Srinivas
You should try running "invalidate metadata;" from impala-shell. This usually clears up tables not being visible, as Impala caches metadata.
From:
https://www.cloudera.com/documentation/enterprise/5-8-x/topics/impala_invalidate_metadata.html
The following example shows how you might use the INVALIDATE METADATA statement after creating new tables (such as SequenceFile or HBase tables) through the Hive shell. Before the INVALIDATE METADATA statement was issued, Impala would give a "table not found" error if you tried to refer to those table names.
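As a sketch, from impala-shell you can invalidate the whole catalog or just one table (the table name below is hypothetical):
INVALIDATE METADATA;
INVALIDATE METADATA my_db.my_table;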

Apache Sentry: SemanticException No valid privileges Required privileges for this query

I have an unsecured cluster (CDH 5.4), and as I want to provide access to data for more users, I would like to turn on Sentry, so far without Kerberos (which will come after a successful launch of Sentry).
As some other people might need Impala at the moment, I decided to set Sentry up for Hive in the first stage.
Steps I have taken:
1) I have set up 2 users: hive and tuser
tuser - group test
hive - group hive, zookeeper
group test
indexer.access, about.access, beeswax.access, filebrowser.access, hbase.write, hbase.access, help.access, impala.access, jobbrowser.access,
jobsub.access, metastore.write, metastore.access, oozie.dashboard_jobs_access, oozie.access, pig.access, proxy.access, rdbms.access,
search.access, security.impersonate, security.access, spark.access, sqoop.access, useradmin.access_view:useradmin:edit_user, useradmin.access, zookeeper.access
group hive
beeswax.access
group hive has role admin (the first one with an unlocked lock):
SERVER
server=server1 action=ALL
group test has role neco
SERVER
server=server1 action=ALL
URI
server=server1 hdfs://...:8020/user/hive/warehouse action=ALL
DATABASE
server=server1 db=default action=ALL
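For reference, roughly the same grants expressed as Sentry SQL statements in Beeline would look like this; it's a sketch using the role and group names above, and the exact privileges created through Hue may differ:
CREATE ROLE neco;
GRANT ROLE neco TO GROUP test;
GRANT ALL ON DATABASE default TO ROLE neco;
GRANT ALL ON URI 'hdfs://...:8020/user/hive/warehouse' TO ROLE neco;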
Moreover, the user hive is in both the sentry.service.admin.group and sentry.service.allow.connect settings.
2) I have turned on Sentry
- in Hive checked the Sentry Service from "none" to "Sentry"
- in Hive Service Advanced Configuration Snippet (Safety Valve) for sentry-site.xml inserted <property> <name>sentry.hive.testing.mode</name><value>true</value></property>
+ restarted Sentry
Result:
User hive can access anything in Hive. That's what I was expecting.
User tuser can't access anything in Hive: Error while compiling statement: FAILED: SemanticException No valid privileges Required privileges for this query: Server=server1->Db=*->Table=+->action=insert;Server=server1->Db=*->Table=+->action=select;
What am I missing?
Finally I was advised what was wrong: the Hue groups must be the same as the Linux groups on the NameNode (as the HDFS org.apache.hadoop.security.ShellBasedUnixGroupsMapping is checked). In the case of Impala, all nodes with Impala daemons have to have the same groups.
However, I am going to pull the groups from LDAP instead (the org.apache.hadoop.security.LdapGroupsMapping option).

Cloudera/Hive - Can't access tables after hostname change

I created a Cloudera cluster and imported some sample test files from an Oracle DB. But after a while I had to change the hostnames of the nodes. I followed the guide on the Cloudera site and everything worked fine. But when I try to access the tables I created earlier (using both Hive and Impala), I get the following error:
Fetching results ran into the following error(s):
java.io.IOException: java.lang.IllegalArgumentException: java.net.UnknownHostException: [Old Host Name]
Then I created another table under the same DB (using Hue > Metastore Tables), and I can access these new tables created under the new hostname with no issue.
Can someone explain how I can access my old tables without reverting my hostnames? Can I access the metastore DB and change the table pointers to the new hostname?
Never mind, I found the answer.
You can confirm that Hive/Impala is looking at the wrong location by executing
describe formatted [tablename];
Output (the Location row):
Location: hdfs://[oldhostname]:8020/user/hive/warehouse/sample_07
Then you can change "Location" property using :
ALTER TABLE sample_07 SET LOCATION "hdfs://[newhostname]:8020/user/hive/warehouse/sample_07";
P.S. sample_07 is the table in question.
Sometimes this doesn't work!
The above workaround works for the sample table, which is available by default, but I had another table that I had Sqooped from an external DB into a custom metastore DB, and that gave me a similar error again.
Solution :
Go to the host where you've installed Hive.
Temporarily add the old hostname of the Hive server to /etc/hosts (if you don't have external DNS, both the new and old hostnames should exist in the same hosts file).
Execute the 'ALTER TABLE ...' statement at the Hive shell (or web interface).
Remove the old hostname entry from /etc/hosts.
Try this
hive --service metatool -updateLocation <newfsDefaultFSValue> <old_fsDefaultFSValue>
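For example (the hostnames below are placeholders for your own old and new NameNode addresses):
hive --service metatool -updateLocation hdfs://newhostname:8020 hdfs://oldhostname:8020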
You can refer to https://www-01.ibm.com/support/knowledgecenter/SSPT3X_3.0.0/com.ibm.swg.im.infosphere.biginsights.trb.doc/doc/trb_inst_hive_hostnames.html

Tables not found when hive cli called from different directory

I am facing a weird problem with Hive tables. I have HIVE_HOME set in my environment, and it is also on my search path, so I can invoke hive directly.
Now I invoke hive from a directory, let's say /a/b/c, and create some tables. I can see the tables.
Now I change to a directory, e.g. /a/b, and invoke hive from there. Here is the problem: either I am unable to see the tables or I get this error
hive> show tables;
FAILED: Error in metadata: javax.jdo.JDOFatalDataStoreException: Failed to start
database 'metastore_db', see the next exception for details.
NestedThrowables:
java.sql.SQLException: Failed to start database 'metastore_db', see the next exception
for details.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
Why are tables tied to the directory from which the Hive CLI was invoked? Any pointers?
I think you are using the embedded Derby database, which Hive uses by default for storing the metadata. Derby creates its metastore_db folder in whatever directory you launch the Hive CLI from, which is why each directory appears to have its own set of tables. You can delete everything inside the metastore_db folder and then try restarting Hadoop and see, but I think the best advice would be to use MySQL as the metastore.
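If you do stay on Derby for a while, one way to stop the metastore from depending on your current directory is to point the connection URL at an absolute path in hive-site.xml. A minimal sketch, where the path is a placeholder:
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=/home/hiveuser/metastore_db;create=true</value>
</property>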
