I am using Hadoop 2.6.4 and I have files in the 'openforwrite' status. The solution I found is to run 'hdfs debug recoverLease' to recover the lease of the HDFS files (a feature added in Hadoop 2.7.0).
But in my Hadoop version (2.6.4) I can't execute the recoverLease command. Are there any ideas on how to fix that?
Thanks a lot.
In some rare cases, files can be stuck in the OPENFORWRITE state in HDFS for longer than the default lease expiration time. If this happens, the data needs to be moved to a new inode to clear up the OPENFORWRITE status.
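You can first confirm which files are stuck by listing open-for-write files with fsck (a quick check; narrow / down to the affected directory if the cluster is large):
$ hdfs fsck / -openforwrite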
Solution
1) Stop all applications writing to HDFS.
2) Move the file temporarily to some other location.
$ hdfs dfs -mv /Path_to_file /tmp/
3) Copy the file back to its original location. This will force a new inode to be created and will clear up the OPENFORWRITE state.
$ hdfs dfs -cp /tmp/Path_to_file /Original/destination
4) Once you have confirmed that the file is working correctly, remove the copy from the temporary location.
$ hdfs dfs -rm /tmp/Path_to_file
Related
I was asked the below question.
Interviewer: How do you recover a deleted file in HDFS?
Me: We can copy/move it back from the trash directory to the original directory.
Interviewer: Is there any other way, apart from trash recovery?
Me: I said no.
So my question is: is there really any way to recover deleted files, or was the interviewer just testing my confidence?
I have found the below way to recover, which is different from hdfs -cp/mv, but it also gets the file from the trash.
hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true -D dfs.checksum.type=CRC32C -m 10 -pb -update /users/vijay/.Trash/ /application/data/vijay;
Hadoop has provided the HDFS snapshot (SnapShot) feature since version 2.1.0.
You can try to use it.
First, create a snapshot:
hdfs dfsadmin -allowSnapshot /user/hdfs/important
hdfs dfs -createSnapshot /user/hdfs/important important-snapshot
Next, try to delete one file:
hdfs dfs -rm -r /user/hdfs/important/important-file.txt
Finally, restore it:
hdfs dfs -ls /user/hdfs/important/.snapshot/
hdfs dfs -cp /user/hdfs/important/.snapshot/important-snapshot/important-file.txt /user/hdfs/important/
hdfs dfs -cat /user/hdfs/important/important-file.txt
P.S.: You have to use the cp command (not the mv command) to recover a deleted file in this way, because the deleted file inside the snapshot is read-only.
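Once the restored file is confirmed, the snapshot itself can be cleaned up if it is no longer needed (directory and snapshot names are the ones from the example above; disallowSnapshot only succeeds after all snapshots on the directory are deleted):
hdfs dfs -deleteSnapshot /user/hdfs/important important-snapshot
hdfs dfsadmin -disallowSnapshot /user/hdfs/important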
I hope my answer can help you.
I use Windows 8 with a cloudera-quickstart-vm-5.4.2-0 virtual box.
I downloaded a text file as words.txt into the Downloads folder.
I changed directory to Downloads and used hadoop fs -copyFromLocal words.txt.
I get the no such file or directory error.
Can anyone explain to me why this is happening and how to solve this issue?
Someone told me this error occurs when Hadoop is in safe mode, but I have made sure that the safe mode is OFF.
It's happening because hdfs:///user/cloudera doesn't exist.
Running hdfs dfs -ls probably gives you a similar error.
Without a specified destination folder, it looks for ., the current HDFS directory for the UNIX account running the command.
You must run hdfs dfs -mkdir "/user/$(whoami)" before your current UNIX account can use HDFS, or you can specify an otherwise existing HDFS location to copy to.
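On the Cloudera quickstart VM, creating that home directory typically has to be done as the HDFS superuser; a minimal sketch, assuming the superuser account is 'hdfs' and your login is 'cloudera':
sudo -u hdfs hdfs dfs -mkdir -p /user/cloudera                   # create the HDFS home directory
sudo -u hdfs hdfs dfs -chown cloudera:cloudera /user/cloudera    # hand it over to your account
hdfs dfs -copyFromLocal words.txt /user/cloudera/                # the copy should now succeed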
I have created a file in hdfs using below command
hdfs dfs -touchz /hadoop/dir1/file1.txt
I could see the created file by using below command
hdfs dfs -ls /hadoop/dir1/
But I could not find the location itself by using Linux commands (using find or locate). I searched on the internet and found the following link.
How to access files in Hadoop HDFS?. It says HDFS is virtual storage. In that case, how does it decide which partition to use and how much of it, and where is the metadata stored?
Is it using the datanode location for the virtual storage, which I have mentioned in hdfs-site.xml, to store all the data?
I looked into the datanode location and there are files available. But I could not find anything related to the file or folder which I have created.
(I am using hadoop 2.6.0)
HDFS file system is a distributed storage system wherein the storage location is virtual and created using the disk space from all the DataNodes. While installing hadoop, you must have specified paths for dfs.namenode.name.dir and dfs.datanode.data.dir. These are the locations at which all the HDFS related files are stored on individual nodes.
While storing data onto HDFS, it is stored as blocks of a specified size (128 MB by default in Hadoop 2.x). When you use hdfs dfs commands you will see the complete files, but internally HDFS stores these files as blocks. If you check the above-mentioned paths on your local file system, you will see a bunch of files which correspond to the files on your HDFS. But again, you will not see them as actual files, as they are split into blocks.
Check the output of the commands below to get more details on how much space from each DataNode is used to create the virtual HDFS storage.
hdfs dfsadmin -report #Or
sudo -u hdfs hdfs dfsadmin -report
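To see exactly which blocks make up a particular file and on which DataNodes they live, fsck can also be run against the file itself (the path is the one created in the question):
hdfs fsck /hadoop/dir1/file1.txt -files -blocks -locations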
HTH
As an example of creating a file in the local file system (LFS): create a directory, e.g. $ mkdir MITHUN90, enter that (LFS) directory with cd MITHUN90, and in it create a new file with $ nano file1.log.
Now create a directory in HDFS, for example: hdfs dfs -mkdir /mike90. Here "mike90" refers to the directory name.
After creating that directory, send files from the LFS to HDFS by using this command: $ hdfs dfs -copyFromLocal /home/gopalkrishna/file1.log /mike90
Here '/home/gopalkrishna/file1.log' refers to the file in the present working directory and '/mike90' refers to the directory in HDFS. Running $ hdfs dfs -ls /mike90 lists the files in it.
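Put together, the sequence looks something like this (names taken from the example above; the local file is copied with a relative path since we are still inside MITHUN90):
mkdir MITHUN90
cd MITHUN90
nano file1.log
hdfs dfs -mkdir /mike90
hdfs dfs -copyFromLocal file1.log /mike90
hdfs dfs -ls /mike90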
I am a beginner in Hadoop. I have two doubts.
1) How do I access files stored in HDFS? Is it the same as using a FileReader from java.io and giving the local path, or is it something else?
2) I have created a folder where I have copied the file to be stored in HDFS and the jar file of the MapReduce program. When I run the command in any directory
${HADOOP_HOME}/bin/hadoop dfs -ls
it just shows me all the files in the current dir. So does that mean all the files got added without me explicitly adding them?
Yes, it's pretty much the same. Read this post to read files from HDFS.
You should keep in mind that HDFS is different than your local file system. With hadoop dfs you access the HDFS, not the local file system. So, hadoop dfs -ls /path/in/HDFS shows you the contents of the /path/in/HDFS directory, not the local one. That's why it's the same, no matter where you run it from.
If you want to "upload" / "download" files to/from HDFS you should use the commands:
hadoop dfs -copyFromLocal /local/path /path/in/HDFS and
hadoop dfs -copyToLocal /path/in/HDFS /local/path, respectively.
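If you only want to inspect a file's contents from the shell rather than through the Java API, the same style of command covers that too (the path here is a placeholder):
hadoop dfs -cat /path/in/HDFS/file.txt     # print the whole file
hadoop dfs -tail /path/in/HDFS/file.txt    # print the last kilobyte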
In one of my folders on HDFS, I have about 37 gigabytes of data:
hadoop fs -dus my-folder-name
When I execute a
hadoop fs -rmr my-folder-name
the command executes in a flash. However, on non-distributed file systems, an rm -rf would take much longer for a similarly sized directory.
Why is there so much of a difference? I have a 2-node cluster.
The fact is that when you issue hadoop fs -rmr, Hadoop moves the files to the .Trash folder under your home directory on HDFS. Under the hood, I believe it's just a record change in the NameNode to update the files' location in HDFS. This is the reason why it's very fast.
Usually in an OS, a delete command removes the associated metadata and not the actual data, which is why it is fast. The same is the case with HDFS: the blocks might still be on the DataNodes, but all the references to them are removed. Note that the delete command does free up the space, though.
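To see where the deleted data ends up, or to delete immediately without going through the trash at all, something along these lines should work (my-folder-name is the placeholder from the question, and trash must be enabled for the first command to show anything):
hadoop fs -ls .Trash/Current/                # deleted files sit here until the trash interval expires
hadoop fs -rmr -skipTrash my-folder-name     # permanent delete, bypassing .Trash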