I am new to Hadoop HDFS. I am trying to learn how to write data read from a local file to HDFS. I want to know how to do this in an efficient way. Please help.
You can try it like this:
hadoop fs -put localpath hdfspath
Example
hadoop fs -put /user/sample.txt /sample.txt
You can find more HDFS shell commands in the FileSystem Shell documentation.
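If you have many files to load, -put also accepts multiple local sources or a whole directory in one call; for example (the paths below are just placeholders):
hadoop fs -put /local/data/file1.txt /local/data/file2.txt /data
hadoop fs -put /local/data /data
When multiple sources are given, the destination must be a directory.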
Please excuse me for this basic question.
But I wonder why a MapReduce job doesn't get launched when we try to load a file whose size is larger than the block size.
Somewhere I learnt that MapReduce takes care of loading datasets from the local file system (LFS) into HDFS. Then why am I not able to see MapReduce logs on the console when I run the hadoop fs -put command?
Thanks in advance.
You're thinking of hadoop distcp, which will spawn a MapReduce job.
https://hadoop.apache.org/docs/stable/hadoop-distcp/DistCp.html
DistCp Version 2 (distributed copy) is a tool used for large inter/intra cluster copying. It uses MapReduce to effect its distribution, error handling and recovery, and reporting. It expands a list of files and directories into input to map tasks, each of which will copy a partition of the files specified in the source list.
hadoop fs -put and hdfs dfs -put, on the other hand, are plain client-side copies into HDFS and don't require MapReduce.
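For comparison, a basic DistCp run looks like the command below (the namenode addresses and paths are placeholders in the style of the DistCp docs); this is the kind of copy that actually submits a MapReduce job:
hadoop distcp hdfs://nn1:8020/source/dir hdfs://nn2:8020/target/dir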
How do you use getmerge on Dataproc for part files that are dumped to a Google Cloud Storage bucket?
If I try this:
hadoop fs -getmerge gs://my-bucket/temp/part-* gs://my-bucket/temp_merged
I get an error:
getmerge: /temp_merged (Permission denied)
It works fine for hadoop fs -getmerge gs://my-bucket/temp/part-* temp_merged, but that of course writes the merged file to the cluster machine and not to GCS.
According to the FsShell documentation, the getmerge command fundamentally treats the destination path as a local path, so in gs://my-bucket/temp_merged it ignores the "scheme" and "authority" components and tries to write directly to the local filesystem path /temp_merged. This is not specific to the GCS connector; you'll see the same thing if you try hadoop fs -getmerge gs://my-bucket/temp/part-* hdfs:///temp_merged. Even worse, if you try something like hadoop fs -getmerge gs://my-bucket/temp/part-* hdfs:///tmp/temp_merged, you may think it succeeded when in fact the file did not appear inside hdfs:///tmp/temp_merged, but instead appeared on your local filesystem as file:///tmp/temp_merged.
You can instead pipe through stdout/stdin to make it happen; unfortunately -getmerge doesn't play well with /dev/stdout due to permissions and its use of .crc files, but you can achieve the same effect using the feature of hadoop fs -put that supports reading from stdin:
hadoop fs -cat gs://my-bucket/temp/part-* | \
hadoop fs -put - gs://my-bucket/temp_merged
I have some Snappy-compressed files in a directory in HDFS. I need to decompress each file and load it into a text file. Are any Hadoop DFS commands available for this? I am new here. Kindly help.
Thanks,
Praveen.
One way you can achieve it is via the hadoop fs -text command:
hadoop fs -text /hdfs_path/hdfs_file.snappy > some_unix_file.txt
hadoop fs -put some_unix_file.txt /hdfs_path
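If you want to avoid the intermediate local file, you can pipe -text straight into -put, since hadoop fs -put - reads from stdin (the output filename below is just an example):
hadoop fs -text /hdfs_path/hdfs_file.snappy | hadoop fs -put - /hdfs_path/hdfs_file.txt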
I have a directory structure with data on a local filesystem. I need to replicate it to a Hadoop cluster.
So far I have found three ways to do it:
using "hdfs dfs -put" command
using hdfs nfs gateway
mounting my local dir via nfs on each datanode and using distcp
Am I missing any other tools? Which one of these would be the fastest way to make a copy?
I think hdfs dfs -put or hdfs dfs -copyFromLocal would be the simplest way of doing it.
If you have a lot of data (many files), you can copy them programmatically:
// Copy the whole local directory into HDFS using the FileSystem API
Configuration conf = new Configuration();   // loads core-site.xml etc. from the classpath
FileSystem fs = FileSystem.get(conf);
fs.copyFromLocalFile(new Path("/home/me/localdirectory/"), new Path("/me/hadoop/hdfsdir"));
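For the shell route mentioned first, copying the whole local directory tree in one command would look like this (reusing the placeholder paths from the snippet above):
hdfs dfs -copyFromLocal /home/me/localdirectory /me/hadoop/hdfsdir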
I'm a beginner in Hadoop. When I use
Hadoop fs -ls /
And
Hadoop fs -mkdir /pathname
Everything is OK, but I want to use my CSV file in Hadoop. My file is on the C drive. I used the -put, wget, and copyFromLocal commands like these:
Hadoop fs -put c:/ path / myhadoopdir
Hadoop fs copyFromLoacl c:/...
Wget ftp://c:/...
But the first two of the above fail with an error: no such file or directory /myfilepathinc:
And for the third:
Unable to resolve host address "c"
Thanks for your help
Looking at your commands, it seems that there could be a couple of reasons for this issue.
Hadoop fs -put c:/ path / myhadoopdir
Hadoop fs copyFromLoacl c:/...
Use hadoop fs -copyFromLocal correctly (note the hyphen and the spelling; your command had copyFromLoacl with no hyphen).
Check your local file permissions. You have to give full access to that file.
You have to give the absolute path location both for the local file and for the destination in HDFS.
Hope it will work for you.
salmanbw's answer is correct. To be more clear:
Suppose your file is "c:\testfile.txt"; then use the command below.
Also make sure you have write permission to your directory in HDFS.
hadoop fs -copyFromLocal c:\testfile.txt /HDFSdir/testfile.txt
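You can then confirm the copy with a listing, for example:
hadoop fs -ls /HDFSdir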