Unable to install Hadoop on macOS

I am unable to run Hadoop on my macOS machine.
When I run hadoop version -> nothing happens.
When I run sudo hadoop version -> it shows my Hadoop version. I read somewhere that I shouldn't be using sudo to run Hadoop, but doesn't this at least tell me that Hadoop is installed?
Because hadoop version returns nothing, I am unable to start any nodes. Every time I try to start a node with start-dfs.sh, nothing happens either. Does anyone know what's happening here? I've looked through all the configuration files multiple times to ensure that I have set them up correctly. Not sure where the problem lies.
What I did:
Followed the instructions from here
In my config files I edited:
etc/hadoop/core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
etc/hadoop/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
etc/hadoop/hadoop-env.sh
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_181.jdk/Contents/Home
In fact I managed to run everything before.
I am not sure what I did after that, but I am now unable to start the nodes again. I think it might be because I failed to stop my nodes before shutting down my computer?
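A few commands that may help narrow down why hadoop only responds under sudo (a diagnostic sketch; nothing here is specific to the setup above):
which hadoop                  # confirm which hadoop wrapper is picked up from PATH
ls -l $(which hadoop)         # check ownership and permissions; needing sudo often points here
hadoop version 2>&1 | head    # surface any error message that is otherwise swallowed
jps                           # list any Hadoop/Java daemons that are already running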

Related

How to configure the actual setting for localhost?

My core-site.xml is configured like this.
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
Now, when I enter 'start-all.cmd' in the command prompt, I see the services start up. When I enter 'localhost:9000' into my web browser, I get an error message, but when I enter 'localhost:8088', I see the Hadoop cluster, which is up and running just fine. It seems like core-site.xml is ignored and 'localhost:8088' is picked up from somewhere else, but I can't find where. Can someone give me a quick and dirty description of how this actually works? I already Googled for an answer, but I didn't see anything useful about this.
Format the name node using:
hdfs namenode -format
For more information:
Follow the installation steps from this site; they work perfectly fine.
http://pingax.com/install-hadoop2-6-0-on-ubuntu/
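As for the ports: 9000 is the HDFS RPC endpoint configured by fs.default.name/fs.defaultFS, not a web page, while 8088 is the YARN ResourceManager web UI; the NameNode web UI normally lives on 50070 (Hadoop 2.x) or 9870 (Hadoop 3.x). A quick way to check, sketched here with the standard tools (assuming hdfs and curl are on the PATH):
hdfs getconf -confKey fs.defaultFS                                  # should echo hdfs://localhost:9000 from core-site.xml
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070    # NameNode web UI on Hadoop 2.x
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:9870     # NameNode web UI on Hadoop 3.x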

Getting error when trying to run Hadoop 2.4.0 (-bash: bin/start-all.sh: No such file or directory)

I am doing the following to install and run Hadoop on my Mac:
First I install Homebrew as the package manager:
ruby -e "$(curl -fsSL https://raw.github.com/mxcl/homebrew/go)"
Then I install Hadoop using the Brew command:
brew install hadoop
Then the following:
cd /usr/local/Cellar/hadoop/1.1.2/libexec
export HADOOP_OPTS="-Djava.security.krb5.realm= -Djava.security.krb5.kdc="
Then I configure Hadoop by adding the following to the proper .xml files:
core-site.xml
<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
</property>
I then enable SSH to localhost:
System Preferences > Sharing > “Remote Login” is checked.
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
I then format the Hadoop filesystem:
bin/hadoop namenode -format
And then I start Hadoop (or at least try to... this is where I get the error):
bin/start-all.sh
I get the error -bash: bin/start-all.sh: No such file or directory.
The one "odd" thing I did during setup was, since there is no longer a mapred-site.xml file in 2.4.0, I simply copied the mapred-site.xml.template file to my desktop, renamed it to mapred-site.xml, and put that new copy in the folder. I also tried running without any mapred-site.xml configuration but I still get this error.
AFAIK, brew installs hadoop-2.4.0 by default. See here: https://github.com/Homebrew/homebrew/blob/master/Library/Formula/hadoop.rb
And in Hadoop 2.x there is no start-all.sh file in the bin folder; it has moved to sbin. You also need some more configuration. These links may be useful: http://codesfusion.blogspot.in/2013/10/setup-hadoop-2x-220-on-ubuntu.html and https://hadoop.apache.org/docs/r2.2.0/
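For example, a minimal sketch under the Homebrew layout (the 2.4.0 path below is an assumption; check brew info hadoop for the exact version installed):
cd /usr/local/Cellar/hadoop/2.4.0/libexec   # assumed Homebrew prefix for hadoop 2.4.0
sbin/start-dfs.sh                           # HDFS daemons: NameNode, DataNode, SecondaryNameNode
sbin/start-yarn.sh                          # YARN daemons: ResourceManager, NodeManager
jps                                         # verify the daemons actually came up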

How to use Hive without hadoop

I am new to NoSQL solutions and want to play with Hive, but installing HDFS/Hadoop takes a lot of resources and time (maybe because I lack experience, but I don't have the time for it).
Are there ways to install and use Hive on a local machine without HDFS/Hadoop?
Yes, you can run Hive without Hadoop:
1. Create your warehouse on your local system.
2. Set the default fs to file:///
Then you can run Hive in local mode without a Hadoop installation.
In hive-site.xml:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <property>
    <!-- this should eventually be deprecated since the metastore should supply this -->
    <name>hive.metastore.warehouse.dir</name>
    <value>file:///tmp</value>
    <description></description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>file:///tmp</value>
  </property>
</configuration>
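A possible follow-up once hive-site.xml is in place (a sketch, assuming HIVE_HOME points at the unpacked Hive directory and HADOOP_HOME at an unpacked Hadoop distribution, since Hive still needs the Hadoop client jars):
$HIVE_HOME/bin/schematool -dbType derby -initSchema   # one-time: creates metastore_db in the current directory
$HIVE_HOME/bin/hive                                   # with fs.default.name=file:/// Hive reads and writes the local filesystem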
If you are just talking about trying out Hive before making a decision, you can just use a preconfigured VM as @Maltram suggested (Hortonworks, Cloudera, IBM and others all offer such VMs).
What you should keep in mind is that you will not be able to use Hive in production without Hadoop and HDFS, so if that is a problem for you, you should consider alternatives to Hive.
You can't. If you just download Hive and run:
./bin/hiveserver2
you will get:
Cannot find hadoop installation: $HADOOP_HOME or $HADOOP_PREFIX must be set or hadoop must be in the path
Hadoop is like a core, and Hive needs some libraries from it.
Update: this answer is out of date. With Hive on Spark it is no longer necessary to have HDFS support.
Hive requires HDFS and MapReduce, so you will need them. The other answer has some merit in recommending a simple, pre-configured means of getting all of the components there for you.
But the gist of it is: Hive needs Hadoop and MapReduce, so to some degree you will need to deal with them.
The top answer works for me, but it needs a few more setup steps. I spent quite some time searching around to fix multiple problems until I finally got it set up. Here I summarize the steps from scratch:
Download Hive and decompress it.
Download Hadoop, decompress it, and put it in the same parent folder as Hive.
Setup hive-env.sh
$ cd hive/conf
$ cp hive-env.sh.template hive-env.sh
Add the following environment variables in hive-env.sh (change the paths according to your actual Java/Hadoop versions):
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_281.jdk/Contents/Home
export PATH=$JAVA_HOME/bin:$PATH
export HADOOP_HOME=${bin}/../../hadoop-3.3.1
Setup hive-site.xml
$ cd hive/conf
$ cp hive-default.xml.template hive-site.xml
Replace all the ${system:***} variables with constant paths (not sure why these are not recognized on my system).
Set the database path to local with the following properties (copied from the top answer):
<configuration>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <property>
    <!-- this should eventually be deprecated since the metastore should supply this -->
    <name>hive.metastore.warehouse.dir</name>
    <value>file:///tmp</value>
    <description></description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>file:///tmp</value>
  </property>
</configuration>
Setup hive-log4j2.properties (optional, good for troubleshooting)
cp hive-log4j2.properties.template hive-log4j2.properties
Replace all the ${sys:***} variables with constant paths.
Setup metastore_db
If you run Hive directly, any DDL statement will fail with the error:
FAILED: HiveException org.apache.hadoop.hive.ql.metadata.HiveException:MetaException(message:Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ? createDatabaseIfNotExist=true for mysql))
In that case we need to recreate metastore_db with the following commands:
$ cd hive/bin
$ rm -rf metastore_db
$ ./schematool -initSchema -dbType derby
Start hive
$ cd hive/bin
$ ./hive
Now you should be able to run Hive on your local file system. One thing to note: metastore_db will always be created in your current directory. If you start Hive from a different directory, you need to recreate it again.
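As a quick sanity check (a sketch; the table name is made up), run from the same directory where metastore_db was created:
cd hive/bin
./hive -e "CREATE TABLE smoke_test (id INT); SHOW TABLES; DROP TABLE smoke_test;"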
Although there are some details that you have to keep in mind, it's completely normal to use Hive without HDFS. A few of them:
As a few commenters mentioned above, you'll still need some .jar files from hadoop-common.
As of today (December 2020) it's difficult to run the Hive/Hadoop 3 pair. Use stable Hadoop 2 with Hive 2.
Make sure POSIX permissions are set correctly, so your local Hive can access the warehouse and, eventually, the Derby database location.
Initialize your database with a manual call to schematool.
You can use a site.xml file pointing to the local POSIX filesystem, but you can also set those options in the HIVE_OPTS environment variable (sketched just below).
I covered that, with examples of errors I've seen, in my blog post.
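For instance, a minimal sketch of the HIVE_OPTS route, reusing the property values from the hive-site.xml above:
export HIVE_OPTS="--hiveconf hive.metastore.schema.verification=false \
  --hiveconf hive.metastore.warehouse.dir=file:///tmp \
  --hiveconf fs.default.name=file:///tmp"
hive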

Why do we need to format HDFS after every time we restart machine?

I have installed Hadoop in pseudo-distributed mode on my laptop; the OS is Ubuntu.
I have changed the paths where Hadoop stores its data (by default Hadoop stores data in the /tmp folder).
The hdfs-site.xml file looks like this:
<property>
  <name>dfs.data.dir</name>
  <value>/HADOOP_CLUSTER_DATA/data</value>
</property>
Now, whenever I restart the machine and try to start the Hadoop cluster using the start-all.sh script, the data node never starts. I confirmed that the data node does not start by checking the logs and by using the jps command.
Then I:
1. Stopped the cluster using the stop-all.sh script.
2. Formatted HDFS using the hadoop namenode -format command.
3. Started the cluster using the start-all.sh script.
Now everything works fine, even if I stop and start the cluster again. The problem occurs only when I restart the machine and try to start the cluster.
Has anyone encountered a similar problem?
Why is this happening, and how can we solve it?
By changing dfs.datanode.data.dir away from /tmp you indeed made the data (the blocks) survive across a reboot. However, there is more to HDFS than just blocks. You need to make sure all the relevant dirs point away from /tmp, most notably dfs.namenode.name.dir (I can't tell what other dirs you have to change, it depends on your config, but the namenode dir is mandatory and could also be sufficient).
I would also recommend using a more recent Hadoop distribution. BTW, the 1.1 namenode dir setting is dfs.name.dir.
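To double-check where things currently resolve, a sketch using the standard getconf tool (these are the Hadoop 2.x key names; on 1.x the keys are dfs.name.dir and dfs.data.dir):
hdfs getconf -confKey dfs.namenode.name.dir   # should not live under /tmp
hdfs getconf -confKey dfs.datanode.data.dir
hdfs getconf -confKey hadoop.tmp.dir          # many defaults hang off this value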
For those who use Hadoop 2.0 or above, the config file names may be different.
As this answer points out, go to the /etc/hadoop directory of your Hadoop installation.
Open the file hdfs-site.xml. This user configuration will override the default Hadoop configuration that is loaded by the Java classloader beforehand.
Add the dfs.namenode.name.dir property and set a new namenode dir (the default is file://${hadoop.tmp.dir}/dfs/name).
Do the same for the dfs.datanode.data.dir property (the default is file://${hadoop.tmp.dir}/dfs/data).
For example:
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/Users/samuel/Documents/hadoop_data/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/Users/samuel/Documents/hadoop_data/data</value>
</property>
Another property where a tmp dir appears is dfs.namenode.checkpoint.dir. Its default value is file://${hadoop.tmp.dir}/dfs/namesecondary.
If you want, you can easily add this property as well:
<property>
  <name>dfs.namenode.checkpoint.dir</name>
  <value>/Users/samuel/Documents/hadoop_data/namesecondary</value>
</property>
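After pointing these dirs away from /tmp, a one-time reformat is needed, sketched below (note that formatting erases any existing HDFS data):
stop-dfs.sh
hdfs namenode -format
start-dfs.sh
jps          # NameNode and DataNode should now also survive a machine reboot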

Bigtop Hbase tables disappeared after PC restart

I installed Bigtop 0.7.0 on Ubuntu 12.04 and started the master server without any problem with:
sudo hbase master start
I was able to connect with the hbase shell and create a table.
After I restarted the PC, I saw that the table was not there anymore.
I read that the problem is that it stores tables in /tmp, which is cleared after a restart, so I tried to change the configuration in hbase-site.xml to set another folder.
The default hbase-site.xml was:
<configuration/>
(No properties defined)
When I wrote my changes into hbase-site.xml and then tried to start the HBase master again, I received a ZooKeeper client exception saying it was not possible to connect to the server.
Can you please give me some advice on how to configure this correctly, or point out some other problem that I may not be aware of?
EDIT (from the comments):
My hbase-site.xml is:
<configuration>
<!--property>
<name>hbase.rootdir</name>
<value>file://app/hadoop/tmp/hbase</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/app/hadoop/tmp</value>
<property-->
</configuration>
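For what it's worth, a hedged sketch of what the uncommented config could look like; the /etc/hbase/conf location is an assumption based on a typical Bigtop package layout, and note that hbase.rootdir is a URI, so a local path needs three slashes (file:///app/..., not file://app/...):
sudo tee /etc/hbase/conf/hbase-site.xml <<'EOF'   # assumed Bigtop config path, not confirmed by the question
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>file:///app/hadoop/tmp/hbase</value>
  </property>
  <property>
    <name>hbase.tmp.dir</name>
    <value>/app/hadoop/tmp</value>
  </property>
</configuration>
EOF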
