Cannot access hdfs file system running in mapr sandbox VM - hadoop

I have just installed the MapR sandbox virtual machine running in Virtualbox. The VM is set up using "NAT" network mode and ports are forwarded to my Mac. Since the ports are forwarded I am guessing that I should be able to access the hdfs on "localhost".
Now I am trying to list the contents of the HDFS on the VM:
$ hadoop fs -fs maprfs://localhost -ls /
15/03/25 15:16:11 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-03-25 15:16:11,6646 ERROR Cidcache fs/client/fileclient/cc/cidcache.cc:1586 Thread: 4548153344 MoveToNextCldb: No CLDB entries, cannot run, sleeping 5 seconds!
2015-03-25 15:16:16,6683 ERROR Client fs/client/fileclient/cc/client.cc:813 Thread: 4548153344 Failed to initialize client for cluster localhost:7222, error Connection refused(61)
ls: Could not create FileClient
I also tried 127.0.0.1, sudo, and appending the port :5660, all without success.
Any ideas?

Changing from NAT network mode to host-only fixed the problem. Then, of course, I have to use the IP of the VM to access maprfs.
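For example, assuming the host-only adapter gives the VM an address like 192.168.56.101 (illustrative only; check the actual address inside the VM with ifconfig or ip addr), the listing command becomes:
$ hadoop fs -fs maprfs://192.168.56.101 -ls /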

If you are just running plain Spark on a local/single node, then you don't need HDFS; you can simply point your input and output files at the local file system, like below:
file:///pathtoinput
file:///pathtooutput
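As a rough sketch of what that looks like in practice (the script name and paths here are just placeholders), a local-mode submission could be:
$ spark-submit --master "local[*]" wordcount.py \
    file:///home/me/data/input.txt \
    file:///home/me/data/output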

Related

hadoop on macOS initiating secondary namenode fails due to ssh connection refused

I've successfully gone through starting a single node in pseudo-distributed mode as described in https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation, under Windows' WSL2 environment.
After that, I tried to repeat it on my MacBook Pro, but somehow start-dfs.sh fails. The terminal throws this error:
Stopping namenodes on [localhost]
Stopping datanodes
Stopping secondary namenodes [kakaoui-MacBookPro.local]
kakaoui-MacBookPro.local: ssh: connect to host kakaoui-macbookpro.local port 22: Connection refused
2021-06-26 23:01:23,377 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Okay. There are answers saying I should enable the SSH connection in the system settings, but it is already enabled, and ssh localhost also works fine.
And then things get worse; sometimes the secondary namenode fails like this:
Starting secondary namenodes [kakaoui-MacBookPro.local]
kakaoui-MacBookPro.local: ssh: connect to host kakaoui-macbookpro.local port 22: Operation timed out
Then, when I leave the Mac for a while and run start-dfs.sh again, once in a while it succeeds. But when I do stop-dfs.sh and start-dfs.sh again to check, it fails.
Even when start-dfs.sh succeeds, a lot of problems follow, like not being able to start the datanode, resourcemanager, nodemanager, etc. I couldn't get the Hadoop environment running even once.
It feels like everything is mixed up and nothing is stable at all. I have tried reinstalling this and that several times already. Unfortunately, most of the startup failures are not even recorded in the /logs folder.
Currently I'm using:
macOS: Catalina 10.15.6
java: 1.8.0_291
hadoop: 3.3.1
I've spent two whole days just trying. Please help!
Okay, I found a solution that I don't understand. I turned off the Wi-Fi connection during the startup process and all the processes started up. I can't understand how the Wi-Fi connection interferes with ssh localhost, though.
Provide passwordless SSH-key access to all the worker nodes in your hosts file, including localhost as well as kakaoui-macbookpro.local. Read the instructions in Creating an SSH Public Key on OSX.
Finally, test passwordless access with ssh localhost and ssh [yourworkernode] (probably ssh kakaoui-macbookpro.local).
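If Remote Login (the macOS SSH server) is not switched on yet, it can be enabled and verified from the terminal with macOS's built-in systemsetup tool before running those tests:
$ sudo systemsetup -setremotelogin on    # turn on Remote Login (the SSH server)
$ sudo systemsetup -getremotelogin       # should report "Remote Login: On"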

Secondary namenode connection timed out

I'm trying to set up Hadoop on my Mac running Mojave 10.14.6. The Hadoop version I'm using is 3.0.3.
I followed this tutorial to set up the config: https://dbmstutorials.com/hive/hdfs-setup-on-mac.html
While running hdfs namenode -format I get the following error for the secondary namenode:
Starting secondary namenodes [xp]
xp: ssh: connect to host xp port 22: Operation timed out
2019-12-09 09:26:03,796 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
I allowed remote login, created SSH keys without a password, and deactivated the firewall to see if it would help, but the problem remains. Any help would be greatly appreciated :)
Yes, I tried to ssh xp and it didn't work. After investigating a bit more, I managed to make it work...
I changed the IP in /etc/hosts from 127.0.1.1 to another one that was responding to ping. I don't know why the IP 127.0.1.1 didn't work, but at least the problem seems to be fixed for now.
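For illustration, the relevant /etc/hosts line would change from something like the first entry below to the second (192.168.1.20 is just an example; use whichever address of your machine actually answers ping):
# before
127.0.1.1     xp
# after
192.168.1.20  xp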

copy a file from wsl to hdfs running on docker

I'm trying to copy a file from my local drive to hdfs.
I'm running Hadoop on Docker as an image. I'm trying to work through some MapReduce exercises, so I want to copy a data file from a local drive (let's say my D: drive) to HDFS.
I tried the command below, but it fails with ssh: connect to host localhost port 22: Connection refused:
scp -P 50070 /mnt/d/project/recreate.out root@localhost:/root
Since I'm new to Hadoop and big data, my explanation may be terrible. Please bear with me.
I'm trying to do the above from the Windows Subsystem for Linux (WSL).
Regards,
crf
SCP won't move data into Hadoop, and port 50070 does not accept connections over that protocol (SSH).
You need to set up and use a command similar to hdfs dfs -copyFromLocal. You can install the HDFS CLI on the Windows host command prompt, too, so you don't need WSL to upload files...
When using Docker, I would suggest doing this (a sketch follows below):
Add a volume mount from your host into some Hadoop container, outside of the datanode and namenode directories (in other words, don't override the data that is there; mounting files here will not by itself "upload to HDFS").
docker exec into this running container.
Run the above hdfs command, uploading from the mounted volume.
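A rough sketch of those steps, assuming the Hadoop container is called namenode, the image name is hypothetical, and /data is an unused path inside the container:
$ docker run -d --name namenode -v /mnt/d/project:/data my-hadoop-image   # volume mount from the host
$ docker exec -it namenode bash                                           # shell into the container
$ hdfs dfs -mkdir -p /user/root                                           # inside the container
$ hdfs dfs -copyFromLocal /data/recreate.out /user/root/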

Hadoop standalone mode not starting on the local machine due to permission issues

I am not able to figure out what the problem is. I have checked all the links available for this problem and tried them, but I still have the same issue.
Please help, as the available sandbox needs a higher configuration, such as more RAM.
hstart
WARNING: Attempting to start all Apache Hadoop daemons as adityaverma in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [localhost]
localhost: adityaverma@localhost: Permission denied (publickey,password,keyboard-interactive).
Starting datanodes
localhost: adityaverma@localhost: Permission denied (publickey,password,keyboard-interactive).
Starting secondary namenodes [Adityas-MacBook-Pro.local]
Adityas-MacBook-Pro.local: adityaverma@adityas-macbook-pro.local: Permission denied (publickey,password,keyboard-interactive).
2018-05-30 11:07:03,084 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting resourcemanager
Starting nodemanagers
localhost: adityaverma@localhost: Permission denied (publickey,password,keyboard-interactive).
This error typically means you have failed to set up passwordless SSH. For example, the same error should happen with ssh localhost; it should not prompt for a password.
Check the Hadoop documentation again on SSH key generation and add the key to your authorized_keys file.
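The key setup described in the Hadoop single-node guide boils down to roughly this (a sketch; adjust the key type or path if you already have a key):
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
$ ssh localhost    # should now log in without a password prompt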
I might suggest setting up a virtual machine anyway (for example, using Vagrant) if the sandbox requires too many resources. The Hortonworks and Cloudera installation docs are fairly detailed for installing a cluster from scratch.
This way, Hadoop isn't cluttering your Mac's hard drive, and a Linux server more closely matches the Hadoop installations running in production environments.
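A minimal Vagrant workflow, for instance (the box name is only an example):
$ vagrant init ubuntu/focal64   # or any 64-bit Linux box
$ vagrant up
$ vagrant ssh                   # then install Hadoop inside the guest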

Hadoop 2.2 Add new Datanode to an existing hadoop installation

I first installed Hadoop 2.2 on my machine (called Abhishek-PC) and everything worked fine. I am able to run the entire system successfully (both namenode and datanode).
Now I have created one VM, hdclient1, and I want to add it as a datanode.
Here are the steps I have followed:
I set up SSH successfully; I can ssh into hdclient1 without a password, and I can log in from hdclient1 to my main machine without a password.
I set up Hadoop 2.2 on this VM and modified the configuration files as per many tutorials on the web. Here are my configuration files:
Name Node configuration
https://drive.google.com/file/d/0B0dV2NMSGYPXdEM1WmRqVG5uYlU/edit?usp=sharing
Data Node configuration
https://drive.google.com/file/d/0B0dV2NMSGYPXRnh3YUo1X2Frams/edit?usp=sharing
Now when I run start-dfs.sh on my first machine, I can see that the DataNode starts successfully on hdclient1. Here is a screenshot from my Hadoop console:
https://drive.google.com/file/d/0B0dV2NMSGYPXOEJ3UV9SV1d5bjQ/edit?usp=sharing
As you can see, both machines appear in my cluster (main machine and datanode).
Although both are called "localhost" for some strange reason.
I can see that the logs are being created on hdclient1, and in those logs there are no exceptions.
Here are the logs from the namenode:
https://drive.google.com/file/d/0B0dV2NMSGYPXM0dZTWVRUWlGaDg/edit?usp=sharing
Here are the logs from the data node
https://drive.google.com/file/d/0B0dV2NMSGYPXNV9wVmZEcUtKVXc/edit?usp=sharing
I can log in to the namenode UI successfully at http://Abhishek-PC:50070,
but in the UI, under Live Nodes, it says there is only 1 live node and there is no mention of hdclient1.
https://drive.google.com/file/d/0B0dV2NMSGYPXZmMwM09YQlI4RzQ/edit?usp=sharing
I can create a directory in HDFS successfully: hadoop fs -mkdir /small
From the datanode, I can see that this directory has been created by using the command hadoop fs -ls /.
Now when I try to add a file to HDFS with
hadoop fs -copyFromLocal ~/Downloads/book/war_and_peace.txt /small
I get an error message:
abhishek@Abhishek-PC:~$ hadoop fs -copyFromLocal ~/Downloads/book/war_and_peace.txt /small
14/01/04 20:07:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/01/04 20:07:41 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /small/war_and_peace.txt.COPYING could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and no node(s) are excluded in this operation.
    at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget(BlockManager.java:1384)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2477)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:555)
So my question is: what am I doing wrong here? Why do I get this exception when I try to copy the file into HDFS?
We have a 3-node cluster (all physical boxes) that's been working great for a couple of months. This article helped me the most with the setup.
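One quick check for the "only 1 live node" symptom is to ask the namenode which datanodes have actually registered (run this on the namenode host):
$ hdfs dfsadmin -report
If hdclient1 does not show up in that output, the new datanode is most likely registering under the wrong hostname, which would be consistent with both nodes appearing as "localhost" above.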
