When I run a Sqoop script from the CLI, it runs fine without any issues. But when I run it through Oozie, it fails with a "Sqoop command not found" error. It seems Sqoop is not installed on the other data nodes. To run a Sqoop script with Oozie, does Sqoop need to be installed on all data nodes, or is there an alternative? Currently we have one master and two data nodes.
I am new to Hadoop/Hive and struggling to fix this. In a distributed Hadoop environment, where should Hive and Pig be installed: on the edge node, or where Hadoop is installed?
Hadoop is installed on a separate server (say hadoopVM), with two separate data nodes, DN1 and DN2, and an edge node from which I can submit jobs to Hadoop and load files into HDFS.
Up to this point I have no issues. I am trying to install Hive on the edge node and am getting the error below.
The error I am getting on the edge node server is attached.
It seems the metastore service is not started. Start the service by issuing the following command in one session and keep that session open; then, in parallel, start another session and try to use Hive.
Active session mode:
sudo hive --service metastore
Background service mode:
If you add "&" at the end, the service starts and keeps running as a background process.
sudo hive --service metastore &
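A minimal sketch of the background approach, assuming bash and the default metastore port 9083 (check hive.metastore.port in your hive-site.xml if you changed it). It uses nohup so the service survives closing the terminal, then polls the port before you open the Hive shell. The log path /tmp/hive-metastore.log and the function names are only illustrative; prefix the hive command with sudo if your setup needs it.

```shell
#!/usr/bin/env bash
# Sketch: run the Hive metastore in the background and wait until it listens.
# Assumes the default metastore port 9083; change it if hive-site.xml differs.

# Succeeds once host:port accepts a TCP connection; fails after $3 seconds.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-30} i
  for ((i = 0; i < timeout; i++)); do
    # /dev/tcp/HOST/PORT is a bash-only pseudo-path that opens a TCP socket.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 1
  done
  return 1
}

# Start the metastore detached from this terminal; output goes to a log file
# (hypothetical path).
start_metastore() {
  nohup hive --service metastore > /tmp/hive-metastore.log 2>&1 &
  echo "metastore started with PID $!"
  wait_for_port localhost 9083 60
}
```

Usage: `start_metastore && hive`. If the wait times out, the log file is the first place to look.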
Alternative:
If you are still facing the problem, then the problem is the new version of MySQL; you can refer to my answer at the link below.
SemanticException in Hive Shell Mode
Is there any way to connect Sqoop to different Hadoop clusters, so that multiple Sqoop jobs can be created to export data to multiple Hadoop clusters?
Regarding "to export data to multiple hadoop clusters": if data is going into Hadoop, that's technically a Sqoop import.
It's not clear how you currently manage different clusters from one machine, but you would need the conf folder of every environment available for Sqoop to read.
The sqoop command-line program is a wrapper which runs the bin/hadoop script shipped with Hadoop. If you have multiple installations of Hadoop present on your machine, you can select the Hadoop installation by setting the $HADOOP_HOME environment variable.
For example:
$ HADOOP_HOME=/path/to/some/hadoop sqoop import --arguments...
or:
$ export HADOOP_HOME=/some/path/to/hadoop
$ sqoop import --arguments...
If $HADOOP_HOME is not set, Sqoop will use the default installation location for Cloudera’s Distribution for Hadoop, /usr/lib/hadoop.
The active Hadoop configuration is loaded from $HADOOP_HOME/conf/, unless the $HADOOP_CONF_DIR environment variable is set.
https://sqoop.apache.org/docs/1.4.6/SqoopUserGuide.html#_controlling_the_hadoop_installation
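Building on that, a small wrapper can choose the cluster per command. This is a bash sketch assuming a hypothetical layout where each cluster's client configuration (its *-site.xml files) has been copied into its own folder, e.g. /etc/hadoop/conf.clusterA. The names sqoop_on, CONF_ROOT and SQOOP_BIN are illustrative and not part of Sqoop.

```shell
#!/usr/bin/env bash
# Sketch: run one Sqoop command against a chosen cluster by pointing
# HADOOP_CONF_DIR at that cluster's copied configuration folder.
# Hypothetical layout: ${CONF_ROOT}/conf.<cluster-name>
CONF_ROOT=${CONF_ROOT:-/etc/hadoop}

sqoop_on() {
  local cluster=$1
  shift
  local conf="${CONF_ROOT}/conf.${cluster}"
  if [[ ! -d $conf ]]; then
    echo "no configuration folder for cluster '${cluster}': ${conf}" >&2
    return 1
  fi
  # The assignment applies to this one command only, so jobs for different
  # clusters can be launched from the same shell. SQOOP_BIN (hypothetical)
  # lets you point at a specific sqoop binary; it defaults to "sqoop".
  HADOOP_CONF_DIR="$conf" "${SQOOP_BIN:-sqoop}" "$@"
}
```

Usage: `sqoop_on clusterA import --arguments...` and `sqoop_on clusterB import --arguments...`. Only the configuration changes between calls; the Hadoop libraries come from whichever installation Sqoop normally uses, so the clusters need compatible Hadoop versions.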
Depending on how you set up Hadoop: Hortonworks only ships Sqoop 1, while Cloudera (and maybe MapR) has Sqoop 2, and those instructions are probably different because the Sqoop 2 architecture is different.
I'm wondering: can Sqoop run without a Hadoop cluster, in a sort of standalone mode? Has anyone tried running Sqoop on Spark? Please share your experiences.
To run Sqoop commands (both sqoop1 and sqoop2), Hadoop is a mandatory prerequisite. You cannot run sqoop commands without the Hadoop libraries.
Sqoop works in local mode too, so the Hadoop daemons do not need to be running. To run Sqoop in local mode:
sqoop [tool-name] -fs local -jt local [tool-arguments]
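Per the Sqoop user guide, generic Hadoop options such as -fs, -jt, -D and -conf must come after the tool name but before tool-specific options like --connect. A minimal bash wrapper that keeps that order; the names sqoop_local and SQOOP_BIN are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: run a Sqoop tool in Hadoop local mode (local filesystem, local job runner).
# Generic options go right after the tool name, before tool-specific options.
sqoop_local() {
  local tool=$1
  shift
  "${SQOOP_BIN:-sqoop}" "$tool" -fs local -jt local "$@"
}
```

Usage (hypothetical host, database and table names): `sqoop_local import --connect jdbc:mysql://dbhost/mydb --table mytable --target-dir /tmp/mytable`. The Hadoop libraries still have to be installed, as noted above; only the daemons are unnecessary.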
Sqoop on Spark is still in progress; see SQOOP-1532.
I am trying to run the Sqoop demo from DataStax Enterprise 4.8. I set up an Analytics cluster of 4 nodes, set up MySQL on another node, and populated the data as in the demo example. I followed all the steps of the demo, and everything seems to work fine until I actually run the Sqoop data migration command. All databases are created correctly, and the cluster is running fine (I can see it with nodetool status and in OpsCenter), but when I run the sqoop command, I get an exception:
host# /bin/dse sqoop --options-file /usr/share/dse/demos/sqoop/import.options
/usr/share/dse/bin/dse.in.sh: line 4: /bin/dse-client-tool: No such file or directory
Unable to start sqoop: jobtracker not found
The import.options file:
cql-import
--table
npa_nxx
--cassandra-keyspace
npa_nxx
--cassandra-table
npa_nxx_data
--cassandra-column-mapping
npa:npa,nxx:nxx,latitude:lat,longitude:lon,state:state,city:city
--connect
jdbc:mysql://10.xxx.xxx.xxx/npa_nxx_demo
--username
root
--password
xxxxx
--cassandra-host
10.xxx.xxx.xxx,10.xxx.xxx.xxx
Does anyone have an idea what causes this error? I reinstalled DSE and still got the same result. Thanks.
I found the reason: I needed to create a symlink to dse-client-tool in the /bin directory:
# ln -s /usr/share/dse/bin/dse-client-tool /bin/dse-client-tool
Then it works. I'm not sure why the link was not created during installation.
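If you script this fix (for example across all analytics nodes), a guarded version avoids failing when the link already exists. A bash sketch; the source path in the usage comment is inferred from the dse.in.sh location in the error output, and link_if_missing is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: create a symlink only when the target exists and the link does not.
link_if_missing() {
  local src=$1 dst=$2
  if [[ ! -e $src ]]; then
    echo "source not found: $src" >&2
    return 1
  fi
  # -e follows links; -L also catches an existing (possibly dangling) link.
  if [[ ! -e $dst && ! -L $dst ]]; then
    ln -s "$src" "$dst"
  fi
}

# Usage (as root):
#   link_if_missing /usr/share/dse/bin/dse-client-tool /bin/dse-client-tool
```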
Start DSE as an analytics node.
For a package install, edit /etc/default/dse, set HADOOP_ENABLED=1, and start the DSE service. For a tarball install, start DSE with Hadoop enabled:
bin/dse cassandra -t
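For the package-install route, the edit can be scripted. A bash sketch assuming GNU sed (for -i) and that /etc/default/dse contains a plain HADOOP_ENABLED=... line; set_hadoop_enabled is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: turn on Hadoop (analytics) mode in a DSE defaults file.
# Rewrites an existing HADOOP_ENABLED= line, or appends one if none exists.
set_hadoop_enabled() {
  local file=${1:-/etc/default/dse}
  if grep -q '^HADOOP_ENABLED=' "$file"; then
    sed -i 's/^HADOOP_ENABLED=.*/HADOOP_ENABLED=1/' "$file"   # GNU sed in-place edit
  else
    echo 'HADOOP_ENABLED=1' >> "$file"
  fi
}

# Usage (as root), then start the service:
#   set_hadoop_enabled /etc/default/dse
#   service dse start
```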
I've just installed Hadoop on Windows using Cygwin, which works fine, and now I am installing Hive. I am running it as:
bin/hive -hiveconf java.io.tmpdir=/cygdrive/c/cygwin/tmp
OR
bin/hive -hiveconf java.io.tmpdir=/tmp
(Both give the same problem.) I have found out there is a bug with the Windows naming convention (https://issues.apache.org/jira/browse/HIVE-2388...).
When I run the above command, Hive seems to load fine, but when I enter "show tables;" I get no response. The same happens for all queries (CREATE TABLE, etc.): there is no response.
It's the same problem as this person's:
http://mail-archives.apache.org/mod_mbox...
Any ideas?
I resolved a similar issue and ran Hive successfully after starting all the Hadoop daemons:
namenode
datanode
jobtracker
tasktracker
Also, run queries from files with hive -f <filename> instead of typing them directly at the Hive prompt. You can also use bin/hive -e 'SHOW TABLES'.
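Before retrying the queries, you can check that the four daemons listed above are up. A bash sketch that inspects jps output (jps ships with the JDK and prints one "<pid> <main class>" line per JVM). It assumes all four daemons run on the machine where you run jps, i.e. a single-node setup; missing_daemons is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: print the Hadoop 1.x daemons missing from a jps listing.
# Returns 0 if all are running, 1 otherwise.
missing_daemons() {
  local classes d
  local -a missing=()
  # Keep only the class-name column and match whole lines,
  # so SecondaryNameNode cannot count as NameNode.
  classes=$(awk '{print $2}' <<<"$1")
  for d in NameNode DataNode JobTracker TaskTracker; do
    grep -qx "$d" <<<"$classes" || missing+=("$d")
  done
  if ((${#missing[@]} > 0)); then
    echo "${missing[*]}"
    return 1
  fi
  return 0
}

# Usage:
#   missing_daemons "$(jps)" || echo "start the daemons above before running hive"
```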