How to copy files from a remote server to an HDFS location - shell

I want to copy files from a remote server over SFTP directly to an HDFS location, without first copying them to the local filesystem. The HDFS destination is on a secured cluster. Is this feasible, and if so, how should I proceed?
I would also like to know whether there is any way to connect and copy other than SFTP.

I think the most convenient way (given that your remote machine can connect to the Hadoop cluster) is to make that remote machine act as an HDFS client. SSH to that machine, install the Hadoop distribution, configure it properly, then run:
hadoop fs -put /local/path /hdfs/path
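For the direct, no-local-copy route the question asks about, you can also stream the remote file straight into HDFS over a pipe. A sketch, assuming the machine running the hadoop client can reach the remote server over SSH (the user, hostname and paths below are placeholders):

```shell
# Stream a remote file into HDFS without staging it locally.
# "hadoop fs -put -" reads the file body from stdin.
# user, remote.example.com and both paths are placeholders.
ssh user@remote.example.com "cat /data/source.csv" | hadoop fs -put - /user/me/source.csv

# If your curl build includes SFTP support, the pull side can use sftp://
# instead of ssh + cat:
curl -s -u user: "sftp://remote.example.com/data/source.csv" | hadoop fs -put - /user/me/source.csv
```

On a secured (Kerberized) cluster you would additionally need a valid ticket (kinit) before the hadoop fs command will be accepted.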

Related

How to copy a file from HDFS to a Windows machine?

I want to copy a .csv file from our Hadoop cluster to my local desktop, so I can edit the file and upload it back (replacing the original).
I tried:
hadoop fs -copyToLocal /c_transaction_label.csv C:/Users/E_SJIRAK/Desktop
which yielded:
copyToLocal: '/Users/E_SJIRAK/Desktop': No such file or directory:
file:////Users/E_SJIRAK/Desktop
Help would be appreciated.
If you have SSH'd into the Hadoop cluster, then you cannot copyToLocal into Windows.
You need a two-step process: download from HDFS to the Linux environment, then use SFTP (WinSCP, FileZilla, etc.) or PuTTY's pscp command from the Windows host to get the files onto your Windows machine.
Otherwise, you need to set up the hadoop CLI command on Windows itself.
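Concretely, the two-step process might look like this (the edge-node hostname and home directory are placeholders for your environment):

```shell
# Step 1: on the cluster's Linux side, copy the file from HDFS
# to the Linux filesystem:
hadoop fs -copyToLocal /c_transaction_label.csv /home/e_sjirak/

# Step 2: on the Windows machine, pull the file down with PuTTY's pscp
# (WinSCP or FileZilla work too). "edge.example.com" is a placeholder.
pscp e_sjirak@edge.example.com:/home/e_sjirak/c_transaction_label.csv C:\Users\E_SJIRAK\Desktop\
```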

Copying a directory from a remote HDFS cluster to my local machine

I have a directory in a remote HDFS environment that I want to copy to my local computer. I am accessing HDFS using ssh (with a password).
I tried many of the suggested copy commands, but none worked.
What I tried:
scp 'username@hn0-sc-had:Downloads/*' ~/Downloads
as mentioned in this link.
What am I doing wrong?
SCP copies from the remote Linux server's filesystem.
HDFS does not live on a single server and is not a "local filesystem", so SCP is not the right tool to copy from it directly.
Your options include
SSH to remote server
Use hdfs dfs -copyToLocal in order to pull files from HDFS
Use SCP from your computer to get the files you just downloaded on the remote server
Or
Configure a local Hadoop CLI using XML files from remote server
Use hdfs dfs -copyToLocal directly against HDFS from your own computer
Or
Install HDFS NFS Gateway
Mount an NFS volume on your local computer, and copy files from it
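The first option, spelled out as commands (the hostname is the placeholder from the question; adjust the HDFS and local paths for your environment):

```shell
# 1. SSH to a node of the remote cluster
ssh username@hn0-sc-had

# 2. On that node: pull the directory out of HDFS onto the node's
#    Linux filesystem
hdfs dfs -copyToLocal /user/username/Downloads ~/Downloads
exit

# 3. Back on your own computer: scp the directory down (-r for recursive)
scp -r username@hn0-sc-had:~/Downloads ~/Downloads
```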

Retrieve files from remote HDFS

My local machine does not have an HDFS installation. I want to retrieve files from a remote HDFS cluster. What's the best way to achieve this? Do I need to get the files from HDFS onto one of the cluster machines' local filesystem and then use ssh to retrieve them? I want to be able to do this programmatically, say through a bash script.
Here are the steps:
Make sure there is connectivity between your host and the target cluster
Configure your host as a client: you need to install compatible Hadoop binaries, and your host needs to be running the same operating system.
Make sure you have the same configuration files (core-site.xml, hdfs-site.xml)
You can run hadoop fs -get command to get the files directly
There are also alternatives:
If WebHDFS/HttpFS is configured, you can download files using curl or even your browser, and you can write bash scripts around those requests.
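For example, a WebHDFS download with curl might look like this (the namenode host is a placeholder; the default WebHDFS port is 9870 on Hadoop 3, 50070 on Hadoop 2):

```shell
# OPEN makes the namenode redirect to a datanode that serves
# the file body, hence -L to follow the redirect.
curl -L "http://namenode.example.com:9870/webhdfs/v1/user/me/file.csv?op=OPEN" -o file.csv
```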
If your host cannot have Hadoop binaries installed to act as a client, then you can use the following approach:
enable passwordless login from your host to one of the nodes on the cluster
run ssh <user>@<host> "hadoop fs -get <hdfs_path> <os_path>"
then use scp to copy the files over
You can put the above 2 commands in one script
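Those two commands could be wrapped in one script, for instance like this (the gateway host, HDFS path and staging directory are all placeholders, and passwordless SSH is assumed):

```shell
#!/bin/bash
# Sketch: pull an HDFS file via an SSH-reachable cluster node, then scp it here.
pull_from_hdfs() {
  local gateway=$1 hdfs_path=$2 local_dir=$3
  # step 1: materialize the HDFS file on the gateway's local filesystem
  ssh "$gateway" "hadoop fs -get '$hdfs_path' /tmp/"
  # step 2: copy it from the gateway down to this machine
  scp "$gateway:/tmp/$(basename "$hdfs_path")" "$local_dir"
}

# usage (all arguments are placeholders):
# pull_from_hdfs user@edge.example.com /user/me/file.csv ~/Downloads
```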

Copy files from Remote Unix and Windows servers into HDFS without intermediate staging

How can I copy files from remote Unix and Windows servers into HDFS without intermediate staging from the command line?
You can use the following command:
hadoop fs -cp /user/myuser/copyTestFolder/* hdfs://remoteServer:8020/user/remoteuser/copyTestFolder/
or the reverse, to copy from the server to the local machine.
You can also read the hadoop documentation.
You can use WebHDFS and cURL to upload files. This does not require any Hadoop binaries on your client, just cURL or a cURL-like client. The BigInsights Knowledge Center has information on how to administer the file system using the HttpFS REST APIs.
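A WebHDFS upload is a two-step protocol; a curl sketch (the namenode host, port and user are placeholders, and the exact step-2 URL comes from the step-1 response):

```shell
# Step 1: ask the namenode for a write location; it answers with a
# 307 redirect whose Location header points at a datanode.
curl -i -X PUT "http://namenode.example.com:9870/webhdfs/v1/user/me/file.csv?op=CREATE&user.name=me"

# Step 2: send the file body to the datanode URL returned in the
# Location header of step 1.
curl -i -X PUT -T file.csv "<Location header URL from step 1>"
```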

HDFS file FTP from cluster to another machine

I want to create an Oozie workflow to transfer an HDFS file from an HDFS cluster to another server.
Since Oozie can run commands or scripts on any node in the system, is it possible to run a shell script or SFTP on one of the nodes and transfer the file to the destination server?
I think this task can easily be done by performing, from the remote server, an HTTP GET (the WebHDFS OPEN operation) on the HDFS file (you can use curl for that).
Anyway, if you want to do it through Oozie, you can create a script in charge of moving the desired file from HDFS to the local file system, and then perform an scp to move the file from the local file system to the remote server.
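As a sketch, the script run by such an Oozie shell action could look like this (user, host and paths are placeholders, and the node is assumed to reach the destination server via pre-installed SSH keys):

```shell
#!/bin/bash
# HDFS -> the node's local filesystem
hadoop fs -get /user/me/export/report.csv /tmp/report.csv
# node's local filesystem -> destination server
scp /tmp/report.csv backup@target.example.com:/data/incoming/
# clean up the staged copy
rm /tmp/report.csv
```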
