I was looking for a more convenient way to monitor Hadoop clusters, and then I came across something called Ambari.
I want to apply Apache Ambari to my running Hadoop cluster.
Is it possible to apply Apache Ambari to an already-running Hadoop cluster?
If this is not possible, are there any future patches planned?
@Coldbrew No. Ambari should be installed on a fresh cluster. If you really need to run Hadoop under Ambari, I would recommend building a new Ambari-managed cluster with Hadoop configured as closely as possible to your existing one, and then migrating the data from the native Hadoop cluster to the new platform.
I need to bring files (zip, csv, xml, etc.) from a Windows share location to HDFS. Which is the best approach? I have Kafka -> Flume -> HDFS in mind. Please suggest an efficient way.
I tried producing the files to Kafka so a consumer could pick them up:

    // one record per file payload; large files exceed the broker's default message size limit
    producer.send(new ProducerRecord<>(topicName, key, value));

I expect an efficient approach.
Kafka is not designed to send files, only individual messages of up to 1 MB by default.
You can install the NFS Gateway in Hadoop; then you should be able to copy directly from the Windows share to HDFS without any streaming technology, using only a scheduled script on the Windows machine, or one run externally.
Or you can mount the Windows share on some Hadoop node and schedule a cron job if you need continuous file delivery (a sketch of the copy step follows these options) - https://superuser.com/a/1439984/475508
Other solutions I've seen use tools like NiFi / StreamSets, which can be used to read/move files:
https://community.hortonworks.com/articles/26089/windows-share-nifi-hdfs-a-practical-guide.html
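If you go the mounted-share route, the copy itself can be a few lines against the Hadoop FileSystem API. This is a minimal sketch, assuming a hypothetical mount point /mnt/winshare and a placeholder NameNode address hdfs://namenode:8020; adjust both to your cluster:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ShareToHdfs {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Placeholder NameNode address; use your cluster's fs.defaultFS
            conf.set("fs.defaultFS", "hdfs://namenode:8020");

            try (FileSystem fs = FileSystem.get(conf)) {
                // Local path where the Windows share is mounted on this node (assumption)
                Path src = new Path("/mnt/winshare/input.csv");
                // Destination path in HDFS
                Path dst = new Path("/data/landing/input.csv");
                // copyFromLocalFile(delSrc, overwrite, src, dst)
                fs.copyFromLocalFile(false, true, src, dst);
            }
        }
    }

Run it from cron (or trigger it from the Windows Task Scheduler on the share side) and you avoid introducing Kafka or Flume just to move files.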
I have used Spark on my local machine using Python for analytical purposes.
Recently I've heard the term "Spark cluster" and I was wondering what it is exactly.
Is it just Spark running on some cluster of machines?
And how can it be used on a cluster without a Hadoop system? Is that possible? Can you please describe it?
Apache Spark is a distributed computing system. While it can run on a single machine, it is meant to run on a cluster and take advantage of the parallelism the cluster makes possible. Spark can utilize much of the Hadoop stack, such as the HDFS file system, but it also overlaps considerably with Hadoop's distributed computing chain: Hadoop centers on the MapReduce programming pattern, while Spark is more general with regard to program design. Spark also has features that help increase performance.
For more information, see https://www.xplenty.com/blog/2014/11/apache-spark-vs-hadoop-mapreduce/
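To make the distinction concrete, here is a minimal sketch in Java: the application code is identical whether Spark runs locally or on a cluster; only the master URL changes, and a standalone Spark cluster needs no Hadoop at all. The master hostname below is a placeholder:

    import org.apache.spark.sql.SparkSession;

    public class ClusterVsLocal {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("cluster-vs-local")
                    // "local[*]" uses all cores of this machine;
                    // "spark://master-host:7077" would instead submit the same
                    // job to a standalone Spark cluster (placeholder hostname)
                    .master("local[*]")
                    .getOrCreate();

            // A trivial distributed job spread across the available executors
            long count = spark.range(0, 1_000_000).count();
            System.out.println("count = " + count);

            spark.stop();
        }
    }

In practice the master URL is usually passed with spark-submit --master rather than hard-coded, so the same jar runs unmodified on a laptop or on a cluster.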
I'm using the Cloudera QuickStart 5.5.0 VirtualBox VM.
I'm trying to run this in the terminal. As you can see below, there is an exception. I've searched the internet for a solution and found something:
1) Configuring the core-site.xml file: https://datashine.wordpress.com/2014/09/06/java-net-connectexception-connection-refused-for-more-details-see-httpwiki-apache-orghadoopconnectionrefused/
But I can only open this file read-only and haven't been able to change it. It seems I need to be root or the hdfs user (su - hdfs), but it asks me for a password which I don't know.
Network configuration is not your problem. You don't need to touch any configuration in the VM; you need to start the services. For example, with the HDFS service disabled in Cloudera Manager (as in the screenshot this answer originally included), I get the same error on that last command.
You have to start Cloudera Manager and then start ZooKeeper, HDFS, and YARN (in that order).
To open Cloudera Manager, go to http://quickstart.cloudera:7180 in Firefox on the VM.
Then start the services mentioned above. Once they are running, you can use HDFS commands.
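If you want to verify from code that HDFS is actually reachable, here is a minimal sketch against the Hadoop FileSystem API; it fails with the same java.net.ConnectException while the service is still down. The NameNode address below is the quickstart VM's conventional one and is an assumption about your setup:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Assumed NameNode address for the quickstart VM
            conf.set("fs.defaultFS", "hdfs://quickstart.cloudera:8020");

            try (FileSystem fs = FileSystem.get(conf)) {
                // Throws java.net.ConnectException while HDFS is down
                for (FileStatus status : fs.listStatus(new Path("/"))) {
                    System.out.println(status.getPath());
                }
            }
        }
    }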
Which one should be started first, in sequential order, when we have Oozie, HDFS, Hive, ZooKeeper, and other tools in the Hadoop ecosystem?
It's an administrative question posed by my superiors.
There are multiple possibilities depending on the deployment scenario. For example, if you have highly available Hadoop (NameNode high availability, ResourceManager/JobTracker high availability) or HBase in the cluster, then the order would be something like this:
ZooKeeper
HDFS
HBase (if used)
YARN/MapReduce
Other ecosystem tools (Hive, Pig, Sqoop, Oozie)
The start order among the ecosystem tools themselves doesn't matter.
Need some help here, guys. I am new to Hadoop and I need to set up a Hadoop cluster quickly using Windows machines.
I am aware that I can use Cloudera for this, but I was wondering: instead of downloading VirtualBox first, configuring it with Ubuntu, and then installing CDH4 on it, can I not just download the pre-configured VM that Cloudera provides onto the different machines and then network them?
Is there any step-by-step tutorial available to do this using the VMs provided by Cloudera?
Any help would be very appreciated.
Thanks,
Kumar
EDIT: I have VMPlayer, ISOs of Ubuntu 12.04 LTS and CentOS 6.2, VirtualBox, and fast internet. Now, can someone tell me the fastest way of setting up a CDH4 cluster on the 4-5 Windows laptops I have on a LAN?
The fastest way to set up a Cloudera Hadoop cluster is to install Cloudera Manager and leave all the work to it.
First, install the Cloudera Manager server on one node and start the server service.
Second, install the Cloudera Manager agent on the other nodes, set the server's hostname in /etc/cloudera-scm-agent/config.ini (see the excerpt below), then start all the agents.
Third, use a browser to visit http://cloudera-scm-server:7180, then follow the wizard; Cloudera Manager will take care of the remaining work.
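For reference, a sketch of the agent setting mentioned in the second step; the hostname is a placeholder and the port is the stock default, so treat this as an assumption about your install:

    # /etc/cloudera-scm-agent/config.ini (excerpt)
    [General]
    # Hostname of the node running the Cloudera Manager server (placeholder)
    server_host=cloudera-scm-server
    # Default port agents use to reach the server
    server_port=7182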