How to access .pem file from amazon emr - hadoop

I'm new to Amazon EMR and I want to use a .pem file on EMR.
The .pem file is in a folder on my local machine. When I create the same file with the .pem contents on the EMR instance, it doesn't work.
It would really help if anyone could provide steps to copy the file to EMR from my local machine, or to access the file from S3.
Thanks in advance.

Create a bootstrap script to copy the .pem file to the EMR boxes.
Use the command below in the bootstrap script to download the file to any location on the EMR nodes (I am downloading the file to /mnt/):
#!/bin/bash
hadoop fs -copyToLocal s3://mybucket/myfolder/my.pem /mnt/my.pem
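You then register that script as a bootstrap action when the cluster is created. A minimal sketch with the AWS CLI (the cluster name, release label, instance settings, and the S3 path of the script are placeholders to replace with your own values):
aws emr create-cluster \
  --name "my-cluster" \
  --release-label emr-6.10.0 \
  --applications Name=Hadoop \
  --instance-type m5.xlarge \
  --instance-count 3 \
  --use-default-roles \
  --bootstrap-actions Path=s3://mybucket/scripts/copy-pem.sh,Name=CopyPemFile
Note that bootstrap actions run before EMR finishes installing applications, so if the hadoop command is not yet available at that point, aws s3 cp s3://mybucket/myfolder/my.pem /mnt/my.pem inside the script is a common alternative.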

If you are trying to copy keypair.pem onto the master node (for example so you can then SSH into the core nodes), you can copy the file from your computer using WinSCP, or by running the command
sftp -i key.pem hadoop@master
put keypair.pem
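Alternatively, plain scp from a terminal does the same thing (the master node's public DNS name below is a placeholder):
scp -i key.pem keypair.pem hadoop@<master-public-dns>:/home/hadoop/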

If I understand your question, you want to transfer the .pem file to one of the EMR instances. You can use the WinSCP application for that operation. If you are not familiar with it, have a look at a WinSCP tutorial. Let me know if you need any help.

Related

How to copy a file from HDFS to a Windows machine?

I want to copy a .csv file from our Hadoop cluster to my local desktop, so I can edit the file and upload it back (replace).
I tried:
hadoop fs -copyToLocal /c_transaction_label.csv C:/Users/E_SJIRAK/Desktop
which yielded:
copyToLocal: '/Users/E_SJIRAK/Desktop': No such file or directory:
file:////Users/E_SJIRAK/Desktop
Help would be appreciated.
If you have SSH'd into the Hadoop cluster, then you cannot copyToLocal into Windows.
You need a two-step process: download from HDFS into the Linux environment, then use SFTP (WinSCP, FileZilla, etc.) or PuTTY's pscp command from the Windows host to get the files onto your Windows machine, as sketched below.
Otherwise, you need to set up the hadoop CLI on Windows itself.
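A rough sketch of those two steps (the user name and host name below are placeholders; pscp ships with PuTTY):
# Step 1: on the Linux host of the cluster, copy the file out of HDFS
hadoop fs -copyToLocal /c_transaction_label.csv /home/<your-user>/
# Step 2: from the Windows machine (cmd or PowerShell), pull it over SSH
pscp <your-user>@<cluster-host>:/home/<your-user>/c_transaction_label.csv C:\Users\E_SJIRAK\Desktop\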

How can I transfer a zip file directly from a URL to HDFS by using Java?

How can I transfer a zip file from a URL to HDFS by using Java? I am not supposed to download it; I can only transfer the file directly from the URL to HDFS. Does anyone have an idea that would work?
You can use simple shell commands (for example over SSH) like:
wget http://domain/file.zip
and then
hadoop fs -put file.zip /path/in/hdfs/file.zip
In Java, you would download the file and then put it into HDFS.
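If keeping a local copy of the zip is the problem, one shell-level workaround (a sketch, assuming the URL is reachable from a machine that has the hadoop client configured) is to pipe the download straight into HDFS; hadoop fs -put accepts - to read from standard input:
curl -L http://domain/file.zip | hadoop fs -put - /path/in/hdfs/file.zip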

Running oozie job using a modified hadoop config file to support S3 to HDFS

Hello, I am trying to copy a file from my S3 bucket into HDFS using the cp command.
I do something like
hadoop --config config fs -cp s3a://path hadooppath
This works well when my config directory is on my local machine.
However, I am now trying to set this up as an Oozie job, so I am unable to pass the configuration files that sit in the config directory on my local system. Even if the config directory is in HDFS, it still doesn't seem to work. Any suggestions?
I tried the -D option in Hadoop and passed name/value pairs, but it still throws an error. It only works from my local system.
Did you try DistCp in Oozie? Hadoop 2.7.2 supports S3 as a data source, and you can schedule it with coordinators. Just pass the credentials to the coordinator either via the REST API or in the properties file. It is an easy way to copy data periodically (in a scheduled manner).
${HADOOP_HOME}/bin/hadoop distcp s3://<source>/ hdfs://<destination>/
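If the failure is about missing S3 credentials when the job runs from Oozie, one option (a sketch; the keys, bucket, and paths are placeholders) is to pass them as -D properties understood by the s3a connector:
${HADOOP_HOME}/bin/hadoop distcp \
  -Dfs.s3a.access.key=<your-access-key> \
  -Dfs.s3a.secret.key=<your-secret-key> \
  s3a://<source-bucket>/<path>/ hdfs://<namenode>/<destination>/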

Retrieve files from remote HDFS

My local machine does not have an HDFS installation. I want to retrieve files from a remote HDFS cluster. What's the best way to achieve this? Do I need to get the files from HDFS onto one of the cluster machines' local file system and then use ssh to retrieve them? I want to be able to do this programmatically, through say a bash script.
Here are the steps:
Make sure there is connectivity between your host and the target cluster
Configure your host as a client: you need to install compatible Hadoop binaries, and your host needs to be running the same operating system.
Make sure you have the same configuration files (core-site.xml, hdfs-site.xml)
You can then run the hadoop fs -get command to get the files directly
There are also alternatives.
If WebHDFS/HttpFS is configured, you can actually download files using curl or even your browser, and you can write bash scripts around it.
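For example, a download via WebHDFS with curl might look like this (host, port, path, and user below are placeholders; the WebHDFS port is commonly 50070, or 9870 on newer Hadoop releases):
curl -L "http://<namenode-host>:50070/webhdfs/v1/<hdfs/path/to/file>?op=OPEN&user.name=<hdfs-user>" -o file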
If your host cannot have Hadoop binaries installed to act as a client, then you can use the following approach:
enable passwordless login from your host to one of the nodes on the cluster
run the command ssh <user>@<host> "hadoop fs -get <hdfs_path> <os_path>"
then use the scp command to copy the files to your host
You can combine the above two commands in one script, for example:
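A minimal sketch (the user, host, and file paths below are placeholders for your environment):
#!/bin/bash
# Sketch only: adjust user, host, and paths for your cluster.
REMOTE_USER=hadoopuser
REMOTE_HOST=cluster-node.example.com
HDFS_PATH=/data/myfile.csv
REMOTE_TMP=/tmp/myfile.csv
# Step 1: on the cluster node, copy the file from HDFS to its local disk
ssh "${REMOTE_USER}@${REMOTE_HOST}" "hadoop fs -get ${HDFS_PATH} ${REMOTE_TMP}"
# Step 2: copy it from the cluster node to the local machine
scp "${REMOTE_USER}@${REMOTE_HOST}:${REMOTE_TMP}" .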

Copy files from Remote Unix and Windows servers into HDFS without intermediate staging

How can I copy files from remote Unix and Windows servers into HDFS without intermediate staging from the command line?
You can use the following command:
hadoop fs -cp /user/myuser/copyTestFolder/* hdfs://remoteServer:8020/user/remoteuser/copyTestFolder/
or vice versa to copy from server to local machine.
You can also read the hadoop documentation.
You can use WebHDFS and cURL to upload files. This does not require any Hadoop binaries on your client, just cURL or a cURL-like client. The BigInsights Knowledge Center has information on how to administer the file system using the HttpFS REST APIs.
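A rough sketch of such an upload (host, port, path, and user below are placeholders; WebHDFS answers the first request with a 307 redirect to a DataNode, and the file contents go in the second request):
# Step 1: ask the NameNode where to write; the response carries a Location header
curl -i -X PUT "http://<namenode-host>:50070/webhdfs/v1/<hdfs/path/file.txt>?op=CREATE&user.name=<hdfs-user>"
# Step 2: send the file body to the Location URL returned by step 1
curl -i -X PUT -T local-file.txt "<Location-URL-from-step-1>"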
