It's possible only install Hadoop HDFS? - hadoop

I'm new on Hadoop world, and I need install mesos with Hadoop HDFS to make a fault-tolerant distributed file system, but all installation references include necessary components for my scenario as for example: MapReduce.
Do you have any idea or references about this?

Absolutely possible. Don't think Hadoop as an installable program, it's just composed by a bunch of java processes running on different nodes inside a cluster.
If you use hadoop tar ball, you can just run NameNode and DataNodes processes if you only want HDFS.
If you use other hadoop distros (HDP for instance), I think HDFS and mapreduce come from different rpm packages, but it does harm to install both rpm packages. Again just run NameNode and DataNodes if you only need HDFS.

Related

Create hdfs when using integrated spark build

I'm working with Windows and trying to set up Spark.
Previously I installed Hadoop in addition to Spark, edited the config files, run the hadoop namenode -format and away we went.
I'm now trying to achieve the same by using the bundled version of Spark that is pre built with hadoop - spark-1.6.1-bin-hadoop2.6.tgz
So far it's been a much cleaner, simpler process however I no longer have access to the command that creates the hdfs, the config files for the hdfs are no longer present and I've no 'hadoop' in any of the bin folders.
There wasn't an Hadoop folder in the spark install, I created one for the purpose of winutils.exe.
It feels like I've missed something. Do the pre-built versions of spark not include hadoop? Is this functionality missing from this variant or is there something else that I'm overlooking?
Thanks for any help.
By saying that Spark is built with Hadoop, it is meant that Spark is built with the dependencies of Hadoop, i.e. with the clients for accessing Hadoop (or HDFS, to be more precise).
Thus, if you use a version of Spark which is built for Hadoop 2.6 you will be able to access HDFS filesystem of a cluster with the version 2.6 of Hadoop via Spark.
It doesn't mean that Hadoop is part of the pakage and downloading it Hadoop is installed as well. You have to install Hadoop separately.
If you download a Spark release without Hadoop support, you'll need to include the Hadoop client libraries in all the applications you write wiƬhich are supposed to access HDFS (by a textFile for instance).
I am also using same spark in my windows 10. What I have done create C:\winutils\bin directory and put winutils.exe there. Than create HADOOP_HOME=C:\winutils variable. If you have set all
env variables and PATH like SPARK_HOME,HADOOP_HOME etc than it should work.

How to bring down your namenode?

How to bring down your Namenode in Hadoop 1.2.1 on CentOs and swap your namenode with a Datanode instance, also I have to make sure no data is lost during the process.
I am using Hadoop 1.2.1 with master, slave 1 and slave 2 nodes.
I am looking for the Unix commands or the changes I need to make in the configuration files.
Please ask for any particular details if needed!
You can take a back up of namenode metadata and kill namenode. Install namenode packages on other node of interest and put the backup copy of metadata in namenode data dir. Now start namenode this should pick up your old metadata. Remember to change namenode details in all config files.

How to install impala on an already running hadoop cluster

I have an already up and running Hadoop, 5-node cluster. I want to install Impala on the HDFS cluster without the Cloudera Manager. Can anyone supposedly guide me through the process or a link of the same.
Thanks.

hadoop: different datanodes configuration in shared directory

I try to run hadoop in clustered server machines.
But, problem is server machines uses shared directories, but file directories are not physically in one disk. So, I guess if I configure different datanode direcotry in each machine (slave), I can run hadoop without disk/storage bottleneck.
How do I configure datanode differently in each slave or
How do I configure setup for master node to find hadoop that are installed in different directory in slave node when starting namenode and datanodes using "start-dfs.sh" ?
Or, is there some fancy way for this environment?
Thanks!

CDH4 installation using tarball

I have been struggling to install CDH via tarball, there is no document that describes the steps or guides through. I do have root access on the server & wish to install CDH4 via tarball in Pseudo mode. Can anyone help?. On the same server apache hadoop is also installed, i want to install this CDH, without effecting the existing apache hadoop.
It will not work..because in the end CDH4 will use the same ports which your existing apache hadoop is using..It will work ..if you shutdown your existing hadoop cluster and then start your CDH4 cluster. Or else change all the port numbers for namenode,secondary namenode,jobtracker, tasktracker and datanode and their respective web UI's port..which is kind of tedious.. It would be also helpful if you provide some error logs..So I can highlight what exactly is the problem.

Resources