I am planning to change the Sqoop metastore to a MySQL database (I am using Hadoop 2.65, MySQL 5.7, and Sqoop 1.4.6).
Where is the Sqoop metastore (for example, Sqoop job definitions) stored by default? For comparison, Hive stores its metadata in a Derby database by default.
I have created Sqoop jobs, and I can list them with sqoop job --list and execute them as well. How do I confirm that all of the metadata is being stored in MySQL?
I searched Google but did not find anything useful. Can anyone please point me to good documentation or a helpful link?
Thanks in advance.
Check your sqoop-site.xml for the sqoop.metastore.server.location parameter; it shows where the shared metastore's files are kept on disk.
You can set sqoop.metastore.client.autoconnect.url to point to your metastore, and then create and execute saved jobs against it.
Generally, there are two options for the metastore:
Internal metastore - maintained by Sqoop and built on HSQLDB
External metastore - similar to the Hive metastore
It would be great if you could post your observations (along with code) here for others to refer to.
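As a minimal sketch of the sqoop-site.xml check described above, the function below lists every sqoop.metastore.* property in a given file together with its value. The function name is made up for illustration, and it assumes each <name> and <value> element sits on its own line (or both on the same line). It does not skip properties inside XML comments, so entries that are commented out in the stock template will show up too.

```shell
# List sqoop.metastore.* properties and their values from a sqoop-site.xml.
# Hypothetical helper; assumes one <name>/<value> element per line and
# does not skip properties that are inside XML comments.
sqoop_metastore_props() {
    awk '
        /<name>sqoop\.metastore\.[^<]*<\/name>/ {
            name = $0
            sub(/.*<name>/, "", name)
            sub(/<\/name>.*/, "", name)
        }
        name != "" && /<value>/ {
            value = $0
            sub(/.*<value>/, "", value)
            sub(/<\/value>.*/, "", value)
            print name "=" value
            name = ""
        }
    ' "$1"
}

# Example call (the path is an assumption about your installation):
#   sqoop_metastore_props "$SQOOP_HOME/conf/sqoop-site.xml"
```

If no autoconnect URL is configured, sqoop job also accepts --meta-connect with a JDBC URL, so running sqoop job --list against a specific metastore is one way to see which store actually holds your saved jobs.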
Related
I am trying to connect Apache Zeppelin to my Hive metastore. I use Zeppelin 0.7.3, so there is no Hive interpreter, only JDBC. I have copied my hive-site.xml to the Zeppelin conf folder, but I don't know how to create a new Hive interpreter.
I also tried to access Hive tables through Spark's HiveContext, but that way I cannot see my Hive databases; only the default database is shown.
Can someone explain either how to create a hive interpreter or how to access my hive metastore through spark correctly?
Any answer is appreciated.
I solved it by following this documentation. After adding these parameters to the JDBC connector, you should be able to run the Hive interpreter with
%jdbc(hive)
In my case it was a little trickier because I use Cloudera Hadoop, so the standard Hive JDBC connector did not work. I replaced the external hive-jdbc.jar with the one suitable for my CDH version (for cdh 5.9.-, for example, it is located here).
I also found out that you can change hive.url to the one for the Impala port and connect to Impala over JDBC if you prefer.
I'm looking for a way to make HBase data available and queryable in Vertica. I have seen that Vertica integrates well with Hive's metastore through the HCatalog Connector.
The connector can read a table definition out of Hive Metastore and use the description to read the data directly.
The question is whether the connector supports reading Hive external tables configured with a non-standard StorageHandler, HBaseStorageHandler in particular.
I tried this a long time ago and was able to read Hive external tables using the HiveHBaseStorageHandler (I think the name of the jar is hive-hbase-handler.jar). Please give it a try and let us know. You need to place this jar in /opt/vertica/packages/hcat/lib/.
I found a post that discusses connecting Tableau to Elasticsearch via Hive SQL. I was wondering if there is a way to connect to Elasticsearch via Spark SQL, as I am not very familiar with Hive.
Thanks.
@busybug91,
The right driver is here; please try this one. It could solve your issue.
@NicholasY I got it resolved after a couple of trials. I took two steps:
I wasn't using the right driver for the connection. I was using the DataStax Enterprise driver; however, they have a driver for Spark SQL as well. I used the Windows 64-bit version of the driver. The MapR Hadoop Hive and Hortonworks Hadoop Hive drivers didn't work because I have Apache Hive.
When I used the right driver (from DataStax), I realized that my Hive metastore and spark-thrift-server were running on the same port. I changed spark-thrift-server's port to 10001, and a successful connection was established.
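For reference, a rough sketch of how the port change in the second step can be made when starting the Thrift server from the command line. The function name is hypothetical, and it assumes SPARK_HOME points at a Spark build that includes the Thrift server; hive.server2.thrift.port is the property the Spark SQL documentation names for the listening port.

```shell
# Start the Spark Thrift server on an explicit port.
# Hypothetical wrapper; assumes SPARK_HOME points at a Spark installation
# built with the Thrift server (-Phive-thriftserver).
start_thriftserver_on_port() {
    "${SPARK_HOME:?SPARK_HOME is not set}/sbin/start-thriftserver.sh" \
        --hiveconf "hive.server2.thrift.port=$1"
}

# Example call:
#   start_thriftserver_on_port 10001
```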
A new problem: I've created an external table in Hive, and I can query the data. I start hive-metastore as a service. However, as mentioned in this link, I am not able to see my Hive tables in Spark SQL. My connection from Tableau to Spark SQL is of no use unless I can see the tables from the Hive metastore! When I run show tables; in Spark SQL (via the spark-sql shell, with the Hive metastore running as a service at the same time), it runs a job that reports a completion time but no table names. I monitored it via localhost:4040 and saw that the input and output sizes are 0.0. I believe Spark SQL is not getting the tables from Hive, which is why I don't see any tables after the connection from Tableau to Spark SQL is established.
EDIT
I changed the metastore from Derby to MySQL for both Hive and Spark SQL.
I'm trying to do the same thing, so maybe I can warn you about a few things.
First, compile a Spark SQL version with Hive and the Thrift server (ver 0.13):
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -DskipTests clean package
You need a hive-site.xml that is properly configured to work with Hive, and you need to copy it to the spark/conf folder.
Then, you have to set $CLASSPATH to the elasticsearch-hadoop jar path.
Be careful! Spark SQL 1.2.0 does not work with elasticsearch-hadoop-2.0.x. You have to use elasticsearch-hadoop-2.1.0-Beta4 or the BUILD-SNAPSHOT available here.
Finally, you have to run the Thrift server with something like this:
./start-thriftserver.sh --master spark://master:7077 --driver-class-path $CLASSPATH --jars /root/spark-sql/spark-1.2.0/lib/elasticsearch-hadoop-2.1.0.Beta4.jar --hiveconf hive.server2.thrift.bind.host=0.0.0.0 --hiveconf hive.server2.thrift.port=10000
It works for me, but only on a small docType (5000 rows); data co-location does not seem to work. I am looking for a way to put elasticsearch-hadoop.jar on each Spark worker, as ryrobes did for Hadoop.
If you find a way to localize access to Elasticsearch, let me know ;)
HTH,
We are running CDH 4.1.1. From HUE/Beeswax, Hive is running fine, and /beeswax/tables shows all tables.
I want to use the hive CLI to list all tables:
overlord@overlord-datanode1:~$ hive
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
Hive history file=/tmp/overlord/hive_job_log_overlord_201211280646_1426149164.txt
hive> SHOW TABLES;
OK
Time taken: 0.071 seconds
This appears to be empty, which leads me to believe that I may be connecting to the wrong Hive metastore.
How can I access the same hive data as from HUE/beeswax?
One possible reason is that the Hive CLI and Beeswax are using two different users (with different privileges), so when you switch users the metastore switches automatically (and a new one is created if it does not already exist).
If you are using Derby as your metastore, I suggest migrating it to MySQL or PostgreSQL, as Derby is not suitable for production.
To migrate, follow these guides:
http://www.mazsoft.com/blog/post/2010/02/01/Setting-up-HadoopHive-to-use-MySQL-as-metastore.aspx
https://ccp.cloudera.com/display/CDHDOC/Hive+Installation
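One way to check whether the CLI and HUE really point at the same metastore is to compare the javax.jdo.option.ConnectionURL value in the hive-site.xml that each of them loads. The sketch below is only an illustration: the function name is made up, the file paths in the example calls are assumptions about your layout, and the helper does not skip properties inside XML comments.

```shell
# Print the value of javax.jdo.option.ConnectionURL from a hive-site.xml.
# Hypothetical helper; works whether <value> is on the same line as <name>
# or on a following line.
hive_metastore_url() {
    awk '
        /<name>javax\.jdo\.option\.ConnectionURL<\/name>/ { found = 1 }
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
    ' "$1"
}

# Example calls (both paths are assumptions; use the files your services load):
#   hive_metastore_url /etc/hive/conf.dist/hive-site.xml
#   hive_metastore_url /path/to/hue/hive-site.xml
```

If the two URLs differ, or one of them is a relative jdbc:derby: URL, the two clients are reading different metastores.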
I'm trying to create tables programmatically using JDBC. However, I can't see the table I created from the Hive shell. What's worse, when I access the Hive shell from different directories, I see different contents in the database.
Is there any setting I need to configure?
Thanks in advance.
Make sure you run hive from the same directory every time, because when you launch the Hive CLI for the first time, it creates a Derby metastore database in the current directory. This Derby database contains the metadata for your Hive tables. If you change directories, each directory ends up with its own metastore, so your table metadata is scattered across them. Also, the Derby DB cannot handle multiple sessions. To allow concurrent Hive access, you need to use a real database to manage the metastore rather than the wimpy little Derby DB that comes with it. You can download MySQL for this and change the Hive JDBC connection properties to use MySQL's Type 4 pure-Java driver.
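To see whether this has already happened, you can look for the metastore_db directories that the embedded Derby metastore leaves behind (metastore_db is the database name in Hive's default Derby connection URL). A small sketch, with a made-up function name:

```shell
# List embedded-Derby metastore directories below a starting directory
# (defaults to the current directory). Hypothetical helper.
find_derby_metastores() {
    find "${1:-.}" -type d -name metastore_db 2>/dev/null
}

# Example call:
#   find_derby_metastores "$HOME"
```

More than one hit usually means the CLI was started from several directories, each with its own separate set of table definitions.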
Try emailing the Hive user list or asking on the IRC channel.
You probably need to set up the central Hive metastore (by default Derby, but it can be MySQL/Oracle/Postgres). The metastore is the "glue" between Hive and HDFS. It tells Hive where your data files live in HDFS, what type of data they contain, what tables they belong to, etc.
For more information, see http://wiki.apache.org/hadoop/HiveDerbyServerMode
Examine your Hadoop logs. For me this happened when my Hadoop system was not set up properly; for example, the namenode was not able to contact the datanodes on other machines.
Yeah, it's due to the metastore not being set up properly. Metastore stores the metadata associated with your Hive table (e.g. the table name, table location, column names, column types, bucketing/sorting information, partitioning information, SerDe information, etc.).
The default metastore is an embedded Derby database which can only be used by one client at any given time. This is obviously not good enough for most practical purposes. You, like most users, should configure your Hive installation to use a different metastore. MySQL seems to be a popular choice. I have used this link from Cloudera's website to successfully configure my MySQL metastore.
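For illustration only, here is a sketch of the hive-site.xml properties involved in pointing Hive at a MySQL metastore, written as a shell function that prints the fragment. The function name is hypothetical, and the host, database, user and password are placeholders you supply. The driver class shown is the Connector/J 5.x name, which is an assumption about the connector version you have installed.

```shell
# Print hive-site.xml properties for a MySQL-backed metastore.
# Arguments (all placeholders chosen by the caller): host, database, user, password.
# com.mysql.jdbc.Driver is the Connector/J 5.x class name; newer connector
# versions use a different class name.
mysql_metastore_props() {
    cat <<EOF
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://$1/$2?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>$3</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>$4</value>
</property>
EOF
}

# Example call (all four values are made up):
#   mysql_metastore_props dbhost metastore hiveuser hivepass
```

The printed properties go inside the <configuration> element of hive-site.xml, and the MySQL JDBC connector jar also has to be on Hive's classpath for the driver class to load.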