How to restore a deleted HDFS file - Hadoop

I was asked the below question.
Interviewer: How do you recover a deleted file in HDFS?
Me: We can copy/move it back from the trash directory to the original directory.
Interviewer: Is there any other way, apart from trash recovery?
Me: I said no.
So my question is: is there really any way to recover deleted files, or was the interviewer just testing my confidence?
I have found the below way to recover, which is different from hdfs -cp/mv, but it also gets the file from trash:
hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true -D dfs.checksum.type=CRC32C -m 10 -pb -update /users/vijay/.Trash/ /application/data/vijay;
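For reference, the plain trash restore mentioned at the start is just a copy (or move) out of the .Trash directory; a minimal sketch, assuming trash is enabled and reusing the paths above (the file name is only an example):
# Deleted files keep their original path under the Current/ subdirectory of the trash
hdfs dfs -ls /users/vijay/.Trash/Current/
# Copy the deleted file back to its original directory
hdfs dfs -cp /users/vijay/.Trash/Current/application/data/vijay/somefile.txt /application/data/vijay/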

Hadoop has provided the HDFS snapshot feature since version 2.1.0.
You can try to use it.
First, allow snapshots on the directory and create one:
hdfs dfsadmin -allowSnapshot /user/hdfs/important
hdfs dfs -createSnapshot /user/hdfs/important important-snapshot
Next, try to delete one file:
hdfs dfs -rm -r /user/hdfs/important/important-file.txt
Finally, restore it:
hdfs dfs -ls /user/hdfs/important/.snapshot/
hdfs dfs -cp /user/hdfs/important/.snapshot/important-snapshot/important-file.txt /user/hdfs/important/
hdfs dfs -cat /user/hdfs/important/important-file.txt
P.S.: You have to use the cp command (not mv) to recover a deleted file this way, because files inside a snapshot are read-only.
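Once the restored file is verified, the snapshot itself can optionally be cleaned up; a minimal sketch, assuming the same directory and snapshot name as above:
# Remove the snapshot (all snapshots of a directory must be deleted before snapshots can be disallowed)
hdfs dfs -deleteSnapshot /user/hdfs/important important-snapshot
# Optionally disallow further snapshots on the directory
hdfs dfsadmin -disallowSnapshot /user/hdfs/important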
Hope this answer helps you.

Related

HDFS Directory with '.' in the name

I accidentally created a directory in HDFS named 'again.' and I am trying to delete it. I have tried everything I can think of, but have been unsuccessful. I tried 'hdfs dfs -rm -r /user/[username]/*' and 'hdfs dfs -rm -r /user/[username]/again.', but neither worked; the first even deleted every directory except the one I actually wanted to delete.
Hadoop 2.7.3
Any thoughts?
You could try the ? wildcard, which matches any single character:
hdfs dfs -rm -r /user/[username]/again?
That could in theory match other entries too, but if only one entry matches it should work well; it's worth listing the matches first, as in the sketch below.
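A quick way to check what the glob will hit before deleting anything (a sketch, keeping the [username] placeholder from the question):
# Preview which entries match the glob
hdfs dfs -ls '/user/[username]/again?'
# If only the unwanted 'again.' directory shows up, remove it
hdfs dfs -rm -r '/user/[username]/again?'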
Try using
hdfs dfs -rm -r "/user/[username]/again\."
or
hdfs dfs -rm -r ".\ /user/[username]/again\."
Note: if you have Hue, you can do the deletion through Hue, which makes this easier.
None of the responses worked, but thank you all for responding. I ended up dropping the entire directory structure and refreshing the environment from an existing instance.

How to execute a command like 'recoverLease' in Hadoop 2.6.4

I am using Hadoop 2.6.4 and I have files stuck in the 'openforwrite' status. The solution I found is to run 'hdfs debug recoverLease' to recover the lease on the HDFS files, but that command is only available from Hadoop 2.7.0 onwards.
In my Hadoop version (2.6.4) I can't execute the recoverLease command. Are there any ideas on how to fix this?
Thanks a lot.
In some rare cases, files can be stuck in the OPENFORWRITE state in HDFS for longer than the default lease expiration time. If this happens, the data needs to be moved to a new inode to clear the OPENFORWRITE status.
Solution
1) Stop all applications writing to HDFS.
2) Move the file temporarily to some other location.
$ hdfs dfs -mv /Path_to_file /tmp/
3) Copy the file back to its original location. This will force a new inode to be created and will clear the OPENFORWRITE state:
$ hdfs dfs -cp /tmp/Path_to_file /Original/destination
4) Once you have confirmed that the file is working correctly, remove the copied file from the temporary location:
$ hdfs dfs -rm /tmp/Path_to_file
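If you first need to find which files are actually stuck open, hdfs fsck can report them; a minimal sketch (the path is just an example, point it at the directory you care about):
# List files under the given path that are still open for write
hdfs fsck /path/to/check -files -openforwrite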

Copied file from HDFS doesn't show in local machine

I copied a folder from HDFS to my local machine using the following command:
hdfs dfs -copyToLocal hdfs:///user/myname/output-64-32/ ~/Documents/fromHDFS
But I cannot see any file in the fromHDFS folder, and when I try to run the command again, it says "File exists".
Any help is really appreciated.
Thanks.
Try these
rm -r ~/Documents/fromHDFS/*
hdfs dfs -get /user/myname/output-64-32/ ~/Documents/fromHDFS/
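Before re-running the copy, it can also help to check what actually landed locally; the "File exists" error usually means the first copyToLocal already created a local copy. A quick check, using the paths from the question:
# See whether the first copy already created the directory locally
ls -la ~/Documents/fromHDFS/
ls -la ~/Documents/fromHDFS/output-64-32/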

Bash unable to create directory

In Docker, I want to copy a file README.md from an existing directory /opt/ibm/labfiles to a new one, /input/tmp. I try this:
hdfs dfs -put /opt/ibm/labfiles/README.md input/tmp
to no effect, because there seems to be no /input folder in the root. So I try to create it:
hdfs dfs -mkdir /input
mkdir: '/input': File exists
However, when I run ls, there is no input file or directory.
How can I create the folder and copy the file? Thank you!
Try hdfs dfs -ls / if you want to see whether an input folder already exists in HDFS at the root.
You cannot cd into an HDFS directory from your local shell.
It's also worth mentioning that the leading slash is important. In other words:
This will try to put the file in HDFS at /user/<name>/input/tmp
hdfs dfs -put /opt/ibm/labfiles/README.md input/tmp
While this puts the file at the root of HDFS
hdfs dfs -put /opt/ibm/labfiles/README.md /input/tmp
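Putting it together, a minimal sketch of the whole sequence with absolute paths, assuming the target really should live at /input/tmp in HDFS:
# Check what already exists at the HDFS root
hdfs dfs -ls /
# Create the nested target directory (-p creates missing parent directories)
hdfs dfs -mkdir -p /input/tmp
# Put the local file into HDFS using an absolute destination path
hdfs dfs -put /opt/ibm/labfiles/README.md /input/tmp/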

Reading files from HDFS vs a local directory

I am a beginner in Hadoop. I have two questions:
1) How do I access files stored in HDFS? Is it the same as using a FileReader from java.io and giving the local path, or is it something else?
2) I have created a folder where I copied the file to be stored in HDFS and the jar file of the MapReduce program. When I run the following command from any directory:
${HADOOP_HOME}/bin/hadoop dfs -ls
it just shows me all the files in the current directory. So does that mean all the files got added to HDFS without me explicitly adding them?
Yes, it's pretty much the same. Read this post for how to read files from HDFS.
You should keep in mind that HDFS is different from your local file system. With hadoop dfs you access HDFS, not the local file system. So hadoop dfs -ls /path/in/HDFS shows you the contents of the /path/in/HDFS directory in HDFS, not the local one. That's why the output is the same no matter where you run the command from.
If you want to "upload" / "download" files to/from HDFS, you should use the commands:
hadoop dfs -copyFromLocal /local/path /path/in/HDFS and
hadoop dfs -copyToLocal /path/in/HDFS /local/path, respectively.
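For completeness, a file's contents can also be read directly from HDFS at the shell, without copying it to the local file system first; a small sketch (the path and file name are just examples):
# Print an HDFS file's contents to stdout, analogous to cat for a local file
hadoop dfs -cat /path/in/HDFS/somefile.txt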
