How to start with Hadoop in Oracle VirtualBox - hadoop

I have configured Hadoop with a Hortonworks Sandbox and imported it into Oracle VirtualBox. When I start the virtual machine, the Linux system for Hadoop boots up and offers an Alt+F5 option to start. But when I press Alt+F5, it asks me for a username and password.
I didn't specify any username/password during installation, and when VirtualBox starts, the Hortonworks Sandbox runs locally on my machine, so I am confident that my Hadoop is installed successfully.
How do I proceed?

You can log in / SSH to the sandbox with the credentials below:
hue/hadoop
root/hadoop
I've listed these and other handy URLs that can help you get started in my blog (full disclosure: it's mine). Go through it if you are interested.
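For example, a minimal way to test the login from your host, assuming the sandbox's default VirtualBox port forwarding of guest port 22 to host port 2222 (your mapping may differ):
ssh root@127.0.0.1 -p 2222
# when prompted, enter the password: hadoop
On newer sandbox images the first login may also ask you to change the root password.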

For the Hortonworks Sandbox, the default username and password, if I remember correctly, are root/hadoop. Hope this helps.

Related

Ambari UI not loading, but Wekan dashboard loads, when using VirtualBox and Hortonworks HDP

I am a newbie in the Hadoop environment. I have installed VirtualBox and downloaded the Hortonworks Sandbox HDP 2.6.5 in order to open the Ambari UI. The VM shows https://localhost:1080 and https://localhost:4200. When I open these links in a browser, they show a Wekan dashboard, not the Ambari UI.
How do I open the Ambari UI? Need help on this matter.
First, Hortonworks Sandbox is no longer supported or maintained, so you should find another solution.
If you see a different service on either port, then something else is running on your host (or in another VM) and occupying those addresses; it is not a VirtualBox/Hadoop/Ambari problem. Ambari is showing local addresses within the VM, not addresses for your host. You can edit the VM's port mappings to use some other host port.
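If you do need to remap a port, a minimal sketch with VBoxManage, run on the host while the VM is powered off (the VM name "Hortonworks Sandbox HDP 2.6.5" and host port 8081 are placeholders; adjust them to your setup):
VBoxManage modifyvm "Hortonworks Sandbox HDP 2.6.5" --natpf1 "ambari,tcp,,8081,,8080"
This forwards host port 8081 to port 8080 (Ambari's default) inside the guest, so you would then browse to http://localhost:8081.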

Is it necessary to be able to SSH from one node to another in order to do a Cloudera cluster installation?

I am creating a Cloudera cluster, and when I SSH from one node to another I get a message saying "Public Key". Login to the machines happens using a PEM file and passphrase. Is it necessary to be able to SSH from one node to another in order to do the Cloudera cluster installation?
Yes, it is necessary. Cloudera Manager needs a way to access the other machines over SSH. This can be authenticated using public keys (recommended) or passwords.
Cloudera Manager requires an SSH key to communicate with the Cloudera SCM Agents and do the installation.
Cloudera Manager installs the cluster for you, but you will initially need to add your key to the authorized_keys SSH file so that the remote manager can access the agent machines.
How does Cloudera Manager Work? (doesn't mention SSH, though)
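As a rough sketch of the key setup described above, run from the Cloudera Manager host (the node hostnames below are placeholders):
ssh-keygen -t rsa -b 4096                # generate a key pair if you don't already have one
ssh-copy-id root@node1.example.com       # appends the public key to authorized_keys on each agent host
ssh-copy-id root@node2.example.com
ssh root@node1.example.com hostname      # verify passwordless SSH works
You can then point the Cloudera Manager installation wizard at the same private key (or let it use a password).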

Is it possible to connect to Ubuntu HDFS using a C# application?

I have HDFS in an Ubuntu environment. Is it possible to connect to this HDFS from a C# application (on Windows OS)?
All the systems are connected via LAN.
I want to read a simple CSV file from HDFS.
I want to know whether it is possible or not.
If you are using Azure HDInsight (which is Hortonworks-based), you can use C# directly to access HDFS. In your case you are trying to read from a Windows OS, so try using WebHDFS; it needs some configuration. Please check the URL below for details.
URL: http://hadoop.apache.org/docs/r2.4.1/hadoop-hdfs-httpfs/
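Before writing any C#, it can help to confirm WebHDFS is reachable from the Windows machine. A minimal sketch with curl (the namenode host, port, and file path are placeholders for your setup):
curl -i "http://namenode-host:50070/webhdfs/v1/user/hdfs?op=LISTSTATUS"
curl -L "http://namenode-host:50070/webhdfs/v1/user/hdfs/data.csv?op=OPEN" -o data.csv
The same two HTTP calls can then be issued from C# with HttpClient, since WebHDFS is just a REST API over HTTP.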

Downloading Hadoop Data from other PC

I have Hadoop v2.6 installed on one PC running Ubuntu 14.04. I have added lots of unstructured data into HDFS using the hadoop fs -put command.
Can someone tell me how to download this data from another PC, which is not in the Hadoop cluster, using the web user interface provided by Hadoop?
I can access the data from the other PC by typing (the IP address of the HDFS server):(port number) into the browser's address bar,
like this: 192.168.x.x:50070
The problem is that I am not able to download the data; it gives the error "Webpage Not Available". I also tried other browsers, but still no luck.
Port 50070 is the default NameNode web UI port. You should try port 14000, which is the default HttpFS port. If it still doesn't work, try using the example from the manual:
http://192.168.x.x:14000?user.name=babu&op=homedir
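If HttpFS is running, a minimal sketch for actually pulling a file down with curl (the user name "babu" comes from the example above; the file path is a placeholder):
curl "http://192.168.x.x:14000/webhdfs/v1?op=GETHOMEDIRECTORY&user.name=babu"
curl -L "http://192.168.x.x:14000/webhdfs/v1/user/babu/data.csv?op=OPEN&user.name=babu" -o data.csv
Note that HttpFS is a separate service from the NameNode, so it has to be installed and started before port 14000 will answer.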

Use spark-submit to submit an application to an EC2 cluster

I am new to Spark and I am trying to run it on EC2. I followed the tutorial on the Spark webpage, using spark-ec2 to launch a Spark cluster. Then I tried to use spark-submit to submit the application to the cluster. The command looks like this:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://ec2-54-88-9-74.compute-1.amazonaws.com:7077 --executor-memory 2G --total-executor-cores 1 ./examples/target/scala-2.10/spark-examples_2.10-1.0.0.jar 100
However, I got the following error:
ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
Please let me know how to fix it. Thanks.
You're seeing this issue because the master node of your Spark standalone cluster can't open a TCP connection back to the driver (on your machine). The default deploy mode of spark-submit is client, which runs the driver on the machine that submitted the job.
A new cluster deploy mode was added that submits the job to the master, where the driver is then run inside the cluster, removing the need for a direct connection. Unfortunately, this mode is not supported in standalone mode.
You can vote for the JIRA issue here: https://issues.apache.org/jira/browse/SPARK-2260
Tunneling your connection via SSH is possible but latency would be a big issue since the driver would be running locally on your machine.
I'm curious whether you are still having this issue... but in case anyone is asking, here is a brief answer. As clarified by jhappoldt, the master node of your Spark standalone cluster can't open a TCP connection back to the driver (on your local machine). Two workarounds are possible; both have been tested and work.
(1) From the EC2 Management Console, create a new security group and add rules to enable TCP traffic back and forth from your PC's public IP (what I did was add TCP rules both inbound and outbound). Then add this security group to your master instance (right click --> Networking --> Change security groups). Note: add it; don't remove the already established security groups.
This solution works well, but in your specific scenario (deploying your application from a local machine to an EC2 cluster) you will face further, resource-related problems, so the next option is the better one:
(2) Copy your .jar file (or .egg) to the master node using scp. You can check this link http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AccessingInstancesLinux.html for information about how to do that, and deploy your application from the master node. Note: Spark is already pre-installed, so you will do nothing but run the exact same command you would run on your local machine from ~/spark/bin. This should work perfectly.
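A minimal sketch of workaround (2), reusing the master hostname from the question (the key file name and jar path are placeholders):
scp -i mykey.pem ./examples/target/scala-2.10/spark-examples_2.10-1.0.0.jar root@ec2-54-88-9-74.compute-1.amazonaws.com:~/
ssh -i mykey.pem root@ec2-54-88-9-74.compute-1.amazonaws.com
~/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://ec2-54-88-9-74.compute-1.amazonaws.com:7077 --executor-memory 2G --total-executor-cores 1 ~/spark-examples_2.10-1.0.0.jar 100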
Are you executing the command on your local machine, or on the created EC2 node? If you're doing it locally, make sure port 7077 is open in the security settings, as it's closed to the outside by default.
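If you do open it, a hedged sketch of adding the rule with the AWS CLI (the security group ID and source IP are placeholders; restricting the source to your own public IP is safer than 0.0.0.0/0):
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 7077 --cidr 203.0.113.10/32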
