Issue with Impala Build - hadoop

I am trying to build Impala on Ubuntu 20.04 following https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala .
I can see that the HDFS, Hive, HBase, and Ranger components are downloaded, but the build fails with "Could not get one or more nodes".
For the build I used the command below:
./buildall.sh -noclean -notests -format
During the build it successfully started the HDFS, Hive, HBase, and ZooKeeper services.
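A quick way to see which minicluster daemons actually came up (a hedged sketch; jps ships with the JDK, and the log path is an assumption about the standard Impala dev layout after sourcing bin/impala-config.sh):
jps                      # lists the running Java daemons (NameNode, DataNode, HMaster, QuorumPeerMain, ...)
ls "$IMPALA_HOME"/logs   # the failing component's log usually names the node that could not start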

Related

Is there a way to load the install-interpreter.sh file in EMR in order to load 3rd party interpreters?

I have an Apache Zeppelin notebook running and I'm trying to load the jdbc and/or postgres interpreter into my notebook in order to write to a Postgres DB from Zeppelin.
The main resource on loading new interpreters here tells me to run the command below to get other interpreters:
./bin/install-interpreter.sh --all
However, when I run this command in the EMR terminal, I find that the EMR cluster does not come with an install-interpreter.sh executable file.
What is the recommended path?
1. Should I find the install-interpreter.sh file and load that to the EMR cluster under ./bin/?
2. Is there an EMR configuration at cluster start time that would enable the install-interpreter.sh file?
Currently all tutorials and documentation assume that you can run the install-interpreter.sh file.
The solution is not to run the command below from the root directory (i.e., ./):
./bin/install-interpreter.sh --all
Instead, on EMR, run the command above from the Zeppelin installation directory, which on the EMR cluster is /usr/lib/zeppelin.
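A minimal sketch of what that looks like on the EMR master node (the interpreter name, sudo usage, and the restart command are assumptions about a typical EMR Zeppelin install):
cd /usr/lib/zeppelin
sudo ./bin/install-interpreter.sh --name jdbc    # or --all, as in the Zeppelin docs
sudo systemctl restart zeppelin                  # restart so Zeppelin picks up the new interpreter (service name varies by EMR release)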

Error while working on Hive which is installed on the edge node

I am new to Hadoop/Hive and struggling to fix this. In a distributed Hadoop environment, where should Hive and Pig be installed: on the edge node, or where Hadoop is installed?
Hadoop is installed on a separate server, say hadoopVM, with two separate data nodes (DN1, DN2) and an edge node from which I can submit jobs to Hadoop and load files into HDFS.
Up to here I have no issues. I am trying to install Hive on the edge node and am getting the error below.
Attached is the error I am getting on the edge node server.
It seems that the metastore service is not started. Start the service by issuing the following command in one session and leave that session open; then, in parallel, start another session and try to use Hive.
Active session mode:
sudo hive --service metastore
Background service mode:
If you append "&", the service will start and keep running as a background process.
sudo hive --service metastore &
Alternative:
If you are still facing the problem, it is likely caused by the newer version of MySQL; you can refer to my answer at the link below.
SemanticException in Hive Shell Mode
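As a quick sanity check (assuming a stock configuration, where the Thrift metastore listens on port 9083), you can confirm the metastore is up before opening the Hive shell:
ps -ef | grep -i metastore    # is the metastore process running?
ss -tlnp | grep 9083          # 9083 is the default Thrift metastore port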

H2O: unable to connect to h2o cluster through python

I have a 5 node Hadoop cluster running HDP 2.3.0. I set up an H2O cluster on YARN as described here.
On running the following command
hadoop jar h2odriver_hdp2.2.jar water.hadoop.h2odriver -libjars ../h2o.jar -mapperXmx 512m -nodes 3 -output /user/hdfs/H2OTestClusterOutput
I get the following output:
H2O cluster (3 nodes) is up
(Note: Use the -disown option to exit the driver after cluster formation)
(Press Ctrl-C to kill the cluster)
Blocking until the H2O cluster shuts down...
When I try to execute the command
h2o.init(ip="10.113.57.98", port=54321)
The process remains stuck at this stage. On trying to connect to the web UI at ip:54321, the browser endlessly tries to load the H2O admin page, but nothing ever displays.
On forcefully terminating the init process I get the following error:
No instance found at ip and port: 10.113.57.98:54321. Trying to start local jar...
However, if I use H2O from Python without setting up an H2O cluster, everything runs fine.
I executed all commands as the root user. The root user has permission to read and write the /user/hdfs HDFS directory.
I'm not sure if this is a permissions error or that the port is not accessible.
Any help would be greatly appreciated.
It looks like you are using H2O2 (H2O Classic). I recommend upgrading your H2O to the latest (H2O 3). There is a build specifically for HDP2.3 here: http://www.h2o.ai/download/h2o/hadoop
Running H2O3 is a little cleaner too:
hadoop jar h2odriver.jar -nodes 1 -mapperXmx 6g -output hdfsOutputDirName
Also, 512 MB per node is tiny - what is your use case? I would give the nodes some more memory.
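If h2o.init still hangs after the upgrade, it is worth checking from the client machine that the cluster's REST port is actually reachable (the IP and port below are the ones from the question):
curl -sI http://10.113.57.98:54321    # should return an HTTP response header if the H2O node is reachable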

How to run Mahout jobs on Spark Engine?

Currently I'm doing some document similarity analysis using the Mahout RowSimilarity job. This can easily be done by running the command 'mahout rowsimilarity…' from the console. However, I noticed that this job can also be run on the Spark engine. I would like to know how I can run this job on Spark.
You can use MLlib as an alternative to Mahout on Spark. All MLlib libraries process data in distributed mode (like MapReduce in Hadoop).
Mahout 0.10 provides job execution on Spark.
More detail at this link:
http://mahout.apache.org/users/sparkbindings/play-with-shell.html
Steps to set up Spark with Mahout:
1. Go to the directory where you unpacked Spark and type sbin/start-all.sh to start Spark locally.
2. Open a browser and point it to http://localhost:8080/ to check whether Spark started successfully. Copy the URL of the Spark master at the top of the page (it starts with spark://).
3. Define the following environment variables:
export MAHOUT_HOME=[directory into which you checked out Mahout]
export SPARK_HOME=[directory where you unpacked Spark]
export MASTER=[url of the Spark master]
4. Finally, change to the directory where you unpacked Mahout and type bin/mahout spark-shell; you should see the shell start and get the prompt mahout>. Check the FAQ for further troubleshooting.
Please visit the link above; it uses the new Mahout 0.10 and runs against a Spark server.
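Putting the steps together, a hedged sketch with placeholder paths and the 0.10+ Spark driver for row similarity (check bin/mahout spark-rowsimilarity --help for the exact option names):
export MAHOUT_HOME=/opt/mahout            # placeholder path
export SPARK_HOME=/opt/spark              # placeholder path
export MASTER=spark://localhost:7077      # the URL copied from the Spark web UI
cd "$MAHOUT_HOME"
bin/mahout spark-rowsimilarity -i /path/to/input-matrix -o /path/to/output --master "$MASTER"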

configure hive with hadoop

I have configured Hadoop 2.2.0 as a single-node cluster (and was able to run the example jar).
Now I need to make Hive run queries using this Hadoop installation.
Should I set the mapred.job.tracker property to the value of yarn.resourcemanager.resource-tracker.address?
I tried that, but I can't see the data loaded into Hive tables in HDFS.
I don't have enough reputation points to add a comment, so I'm trying to help via an answer.
What are the daemons currently running for Hadoop? Use ps -eaf | grep "java" to check.
Do you see the JobTracker running or the ResourceManager?
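For example (jps ships with the JDK; the daemon names are what a Hadoop 2.x/YARN setup would normally show):
jps
# On Hadoop 2.x with YARN you should see ResourceManager and NodeManager (plus NameNode/DataNode);
# JobTracker and TaskTracker only exist on Hadoop 1.x.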
Also, can you elaborate on the steps you performed to install Hive?
I have a screencast, Installing Apache Hive, that walks you through installing Hive. Next, you can follow my blog post Apache Hive - Getting Started. Hope this helps.
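For what it's worth, on Hadoop 2.x you normally do not set mapred.job.tracker at all; Hive only needs MapReduce pointed at YARN. A minimal sketch, assuming the stock Hadoop 2.2.0 layout (the config path is the default one, and a real cluster will usually carry more properties in this file):
cat > $HADOOP_HOME/etc/hadoop/mapred-site.xml <<'EOF'
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF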
