Why do files need time to synchronize after writing with writeType THROUGH in Alluxio?

When I write a file into a directory mounted by alluxio-fuse using writeType THROUGH, I find that it takes 2-3 minutes for the file to synchronize. Why do files need time to synchronize?
Below is the listing of the mounted directory: the write happened at 15:40, and the file showed as fully synced at 15:43.

When writing to Alluxio with the THROUGH writeType, Alluxio first updates its metadata to show the file with zero bytes (like the 9.txt in your first image). Once the file has been successfully written to the Alluxio under file system (UFS), Alluxio updates its metadata to show the actual size of the file (9.txt then shows its actual size of 209715200 bytes).
The 2 to 3 minutes is the time Alluxio takes to write the data to the under file system.
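For reference, here is a rough sketch of the same write done through the Alluxio Java client rather than FUSE; the path and payload are hypothetical, and the exact factory/option classes can differ between Alluxio releases. The point is that close() only returns once the data has been persisted to the UFS, which is where the 2-3 minutes go:

import alluxio.AlluxioURI;
import alluxio.client.file.FileOutStream;
import alluxio.client.file.FileSystem;
import alluxio.grpc.CreateFilePOptions;
import alluxio.grpc.WritePType;

public class ThroughWriteExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.Factory.create();
        CreateFilePOptions options = CreateFilePOptions.newBuilder()
                .setWriteType(WritePType.THROUGH)   // persist directly to the UFS
                .build();
        byte[] data = new byte[200 * 1024 * 1024];  // hypothetical 200 MB payload
        // Metadata shows 0 bytes until the UFS write completes; close() blocks until then.
        try (FileOutStream out = fs.createFile(new AlluxioURI("/mnt/test/9.txt"), options)) {
            out.write(data);
        }
    }
}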
Thanks,

Related

How to find the slowest datanodes?

I have installed an HDFS cluster with 15 datanodes. Sometimes the write performance of the entire HDFS cluster is slow.
How do I find the slowest datanode, i.e., the node causing this problem?
The most common cause of a slow datanode is a bad disk. Disk timeout (EIO) defaults range from 30 to 90 seconds, so any activity on that disk will take a long time.
You can check this by looking at dfs.datanode.data.dir in hdfs-site.xml on every datanode and verifying that each of the directories listed there actually works.
For example:
run ls on the directory
cd into the directory
create a file under the directory
write into a file under the directory
read contents of a file under the directory
If any of these activities don't work or take a long time, then that's your problem.
You can also run dmesg on each host and look for disk errors.
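If you want to script those checks, a small probe along these lines (the data directory paths are hypothetical; substitute the ones from your hdfs-site.xml) times a write and a read in each directory, so a slow or failing disk stands out:

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class DataDirProbe {
    public static void main(String[] args) throws Exception {
        // Hypothetical dfs.datanode.data.dir entries; replace with your own.
        List<Path> dataDirs = List.of(Paths.get("/data/1/dfs/dn"), Paths.get("/data/2/dfs/dn"));
        byte[] payload = new byte[1024 * 1024]; // 1 MB test write
        for (Path dir : dataDirs) {
            Path probe = dir.resolve("disk-probe.tmp");
            long start = System.nanoTime();
            Files.write(probe, payload);   // create and write a file under the directory
            Files.readAllBytes(probe);     // read it back
            Files.delete(probe);           // clean up
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(dir + " -> " + ms + " ms");
        }
    }
}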
Additional Information
HDFS DataNode Scanners and Disk Checker Explained
HDFS-7430 - Rewrite the BlockScanner to use O(1) memory and use multiple threads
https://superuser.com/questions/171195/how-to-check-the-health-of-a-hard-drive

Hadoop HDFS file recovery elapsed time (start time and end time)

I need to measure the speed of recovery for files with different file sizes stored with different storage policies (replication and erasure codes) in HDFS.
My question is: Is there a way to see the start time and end time (or simply the elapsed time in seconds) for a file recovery process in HDFS? For a specific file?
I mean the time from when the system detects a node failure (and starts the recovery process) until HDFS has recovered the data (and possibly reallocated it to other nodes) and the file is "stable" again.
Maybe I can look into some metadata or log files for the particular file to see timestamps, etc.? Or is there a file where I can see all the activity of an HDFS file?
I would really appreciate some terminal commands to get this info.
Thank you so much in advance!

How to make Hadoop Map Reduce process multiple files in a single run ?

We run a Hadoop MapReduce program by executing the command $hadoop jar my.jar DriverClass input1.txt hdfsDirectory. How can MapReduce process multiple files (input1.txt & input2.txt) in a single run?
Like this:
hadoop jar my.jar DriverClass hdfsInputDir hdfsOutputDir
where
hdfsInputDir is the path on HDFS where your input files are stored (i.e., the parent directory of input1.txt and input2.txt)
hdfsOutputDir is the path on HDFS where the output will be stored (it should not exist before running this command).
Note that your input should be copied to HDFS before running this command.
To copy it to HDFS, you can run:
hadoop dfs -copyFromLocal localPath hdfsInputDir
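If it helps, this is roughly what the driver would look like (a sketch only; class and job names are placeholders): pointing FileInputFormat at a directory makes the job process every file inside it.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class DriverClass {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "multi-file job");
        job.setJarByClass(DriverClass.class);
        // job.setMapperClass(...) / job.setReducerClass(...) and the output key/value
        // classes would go here; they are left out so this sketch stays self-contained
        // (Hadoop then falls back to the identity Mapper and Reducer).
        FileInputFormat.addInputPath(job, new Path(args[0]));    // hdfsInputDir: a directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // hdfsOutputDir: must not exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}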
This is the small files problem: a separate mapper will run for every file.
A small file is one which is significantly smaller than the HDFS block size (default 64MB). If you're storing small files, then you probably have lots of them (otherwise you wouldn't turn to Hadoop), and the problem is that HDFS can't handle lots of files.
Every file, directory and block in HDFS is represented as an object in the namenode's memory, each of which occupies about 150 bytes, as a rule of thumb. So 10 million files, each using a block, means roughly 20 million objects (a file object plus a block object each), which works out to about 3 gigabytes of memory. Scaling up much beyond this level is a problem with current hardware. Certainly a billion files is not feasible.
Solutions
HAR files
Hadoop Archives (HAR files) were introduced to HDFS in 0.18.0 to alleviate the problem of lots of files putting pressure on the namenode’s memory. HAR files work by building a layered filesystem on top of HDFS. A HAR file is created using the hadoop archive command, which runs a MapReduce job to pack the files being archived into a small number of HDFS files. To a client using the HAR filesystem nothing has changed: all of the original files are visible and accessible (albeit using a har:// URL). However, the number of files in HDFS has been reduced.
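For example, something like the following (paths are hypothetical) archives a directory and then lists it back through the har:// scheme:
hadoop archive -archiveName files.har -p /user/hadoop/input /user/hadoop/archives
hadoop fs -ls har:///user/hadoop/archives/files.har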
Sequence Files
The usual response to questions about “the small files problem” is: use a SequenceFile. The idea here is that you use the filename as the key and the file contents as the value. This works very well in practice. Taking 10,000 100KB files as an example, you can write a program to put them into a single SequenceFile, and then you can process them in a streaming fashion (directly or using MapReduce) operating on the SequenceFile. There are a couple of bonuses too. SequenceFiles are splittable, so MapReduce can break them into chunks and operate on each chunk independently. They support compression as well, unlike HARs. Block compression is the best option in most cases, since it compresses blocks of several records (rather than per record).
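A rough sketch of that idea (the local directory and output path are hypothetical): each small file becomes one record in a block-compressed SequenceFile, keyed by its file name.

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class PackSmallFiles {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(new Path("packed.seq")),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class),
                SequenceFile.Writer.compression(SequenceFile.CompressionType.BLOCK));
             DirectoryStream<java.nio.file.Path> dir = Files.newDirectoryStream(Paths.get("smallfiles"))) {
            for (java.nio.file.Path p : dir) {
                byte[] contents = Files.readAllBytes(p);
                // key = original file name, value = raw file contents
                writer.append(new Text(p.getFileName().toString()), new BytesWritable(contents));
            }
        }
    }
}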

Does Hadoop not show incomplete files?

I'm using the command fs -put to copy a huge 100GB file into HDFS. My HDFS block size is 128MB. The file copy takes a long time. My question is: while the file copy is in progress, other users are not able to see the file. Is this by design? How can another user access this partial file so that they too can monitor the copy progress?
The size is shown block by block. So if your block size is 128MB, you'll see the file size as 128MB when the first block is done, then after some time you'll see it as 256MB, and so on until the entire file is copied. You can use the regular HDFS UI or the command line (hadoop fs -ls) to monitor this block-by-block progress, and you can read the part that has already been copied with hadoop fs -cat even while the copy is in progress.
According to Hadoop: The Definitive Guide:
Once more than a block’s worth of data has been written, the first block will be visible to new readers. This is true of subsequent blocks, too: it is always the current block being written that is not visible to other readers.
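As a quick way to watch that from another client, a sketch like the following (the path and polling interval are made up) polls the length HDFS currently reports for the file; it grows a block at a time while the copy is still running:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WatchCopyProgress {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/user/hadoop/bigfile.dat"); // hypothetical file being copied
        while (true) {
            long visibleBytes = fs.getFileStatus(file).getLen(); // only completed blocks count
            System.out.println("visible so far: " + visibleBytes + " bytes");
            Thread.sleep(10_000); // check again every 10 seconds
        }
    }
}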

How does HDFS append work?

Let's assume one is using the default block size (128 MB), and there is a 130 MB file; so it uses one full-size block and one block with 2 MB. Then 20 MB needs to be appended to the file (the total should now be 150 MB). What happens?
Does HDFS actually resize the last block from 2 MB to 22 MB, or create a new block?
How does appending to a file in HDFS deal with concurrency?
Is there a risk of data loss?
Does HDFS create a third block, put the 20+2 MB in it, and delete the block with 2 MB? If yes, how does this work concurrently?
According to the latest design document in the Jira issue mentioned before, we find the following answers to your questions:
HDFS will append to the last block, not create a new block and copy the data from the old last block. This is not difficult because HDFS just uses a normal filesystem to write these block-files as normal files. Normal file systems have mechanisms for appending new data. Of course, if you fill up the last block, you will create a new block.
Only one single write or append to any file is allowed at the same time in HDFS, so there is no concurrency to handle. This is managed by the namenode. You need to close a file if you want someone else to begin writing to it.
If the last block in a file is not replicated, the append will fail. The append is written to a single replica, which pipelines it to the other replicas, similar to a normal write. It seems to me that there is no extra risk of data loss compared to a normal write.
Here is a very comprehensive design document about append, and it covers the concurrency issues.
The current HDFS docs give a link to that document, so we can assume it is the relevant one. (The document is dated 2009.)
And the related issue.
Hadoop Distributed File System supports appends to files, and in this case it should add the 20 MB to the 2nd block in your example (the one with 2 MB in it initially). That way you will end up with two blocks, one with 128 MB and one with 22 MB.
This is the reference to the append java docs for HDFS.
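For completeness, here is a minimal sketch of that append API (the file path is hypothetical); HDFS keeps filling the partially written last block before it starts a new one, and only one writer may hold the file open at a time:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/user/hadoop/data.bin"); // assumed existing 130 MB file
        byte[] chunk = new byte[1024 * 1024];          // 1 MB buffer
        try (FSDataOutputStream out = fs.append(file)) {
            for (int i = 0; i < 20; i++) {             // append 20 MB in total
                out.write(chunk);
            }
        } // closing the stream releases the lease so another client can write
    }
}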
