FsImage offline viewer - hadoop

I have an fsimage stored in my local directory. Following the offline image viewer instructions at 'https://archive.cloudera.com/cdh/3/hadoop/hdfs_imageviewer.html', I executed the command below:
hadoop oiv -i fsimage -o fsimage.txt
The output is:
16/06/24 08:09:18 INFO offlineImageViewer.FSImageHandler: Loading 24 strings
16/06/24 08:09:18 INFO offlineImageViewer.FSImageHandler: Loading 3027842 inodes.
16/06/24 08:09:32 INFO offlineImageViewer.FSImageHandler: Loading inode references
16/06/24 08:09:32 INFO offlineImageViewer.FSImageHandler: Loaded 0 inode references
16/06/24 08:09:32 INFO offlineImageViewer.FSImageHandler: Loading inode directory section
16/06/24 08:09:35 INFO offlineImageViewer.FSImageHandler: Loaded 1446245 directories
16/06/24 08:09:35 INFO offlineImageViewer.WebImageViewer: WebImageViewer started. Listening on /127.0.0.1:5978. Press Ctrl+C to stop the viewer.
But the fsimage.txt file is of zero size. I then executed the command for XML format:
hdfs oiv -p XML -i fsimage -o fsimage.xml
This gives me fsimage.xml, but I want the output in '.txt' format. Also, the output says:
WebImageViewer started. Listening on /127.0.0.1:5978. Press Ctrl+C to stop the viewer.
Is there any UI available for accessing it? If yes, how can we access it?

You can use the Indented format, which delineates the sections of the fsimage into separate levels of indentation. This output is also saved as a text file.
You can refer to http://hadooptutorial.info/oiv-hdfs-offline-image-viewer/
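A minimal sketch of that, assuming your release's oiv accepts the Indented processor (it applies to the older, pre-protobuf fsimage format; on newer releases the Delimited processor is the closest plain-text option). Incidentally, the zero-size fsimage.txt in the question is expected: without -p, the newer viewer defaults to the Web processor, which only starts the WebImageViewer listener and writes nothing to the output file.
hadoop oiv -p Indented -i fsimage -o fsimage.txt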

Related

Can't exit or forceExit from Hadoop safe mode

I am trying to delete a folder in Hadoop:
hadoop fs -rm -R /seqr-reference-data/GRCh37/all_reference_data
But it is giving me an error:
INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
rm: Cannot delete /seqr-reference-data/GRCh37/all_reference_data. Name node is in safe mode.
I tried the solutions specified here:
Name node is in safe mode. Not able to leave
None of them works. The command reports "Safe mode is OFF", but then I see the same error. What else could I try?
Update
I just found a duplicate thread, but it has no answer:
Not able to delete the data from hdfs, even after leaving safemode?
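For reference, the solutions in those threads mostly come down to the standard dfsadmin safe-mode commands; a minimal sketch of what that looks like, assuming the client is pointed at the active namenode:
hdfs dfsadmin -safemode get        # report the current safe-mode state
hdfs dfsadmin -safemode leave      # ask the namenode to leave safe mode
hdfs dfsadmin -safemode forceExit  # force an exit even if the block threshold has not been reached
hadoop fs -rm -R -skipTrash /seqr-reference-data/GRCh37/all_reference_data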

Piping bzip2 output into tdbloader2 (apache-jena) gives "File does not exist"

I want to pipe the output from bzip2 and use it as input to fill a TDB database using tdbloader2 from apache-jena-3.9.0.
I already found
Generating TDB Dataset from archive containing N-TRIPLES files
but the proposed solution there did not work for me.
bzip2 -dc test.ttl.bz2 | tdbloader2 --loc=/pathto/TDBdatabase_test -- -
produces
20:08:01 INFO -- TDB Bulk Loader Start
20:08:01 INFO Data Load Phase
20:08:01 INFO Got 1 data files to load
20:08:01 INFO Data file 1: /home/user/-
File does not exist: /home/user/-
20:08:01 ERROR Failed during data phase
I got similar results with the following (inspired by https://unix.stackexchange.com/questions/16990/using-data-read-from-a-pipe-instead-than-from-a-file-in-command-options):
bzip2 -dc test.ttl.bz2 | tdbloader2 --loc=/pathto/TDBdatabase_test /dev/stdin
20:34:45 INFO -- TDB Bulk Loader Start
20:34:45 INFO Data Load Phase
20:34:45 INFO Got 1 data files to load
20:34:45 INFO Data file 1: /proc/16256/fd/pipe:[92062]
File does not exist: /proc/16256/fd/pipe:[92062]
20:34:45 ERROR Failed during data phase
and
bzip2 -dc test.ttl.bz2 | tdbloader2 --loc=/pathto/TDBdatabase_test /dev/fd/0
20:34:52 INFO -- TDB Bulk Loader Start
20:34:52 INFO Data Load Phase
20:34:52 INFO Got 1 data files to load
20:34:52 INFO Data file 1: /proc/16312/fd/pipe:[97432]
File does not exist: /proc/16312/fd/pipe:[97432]
20:34:52 ERROR Failed during data phase
Unpacking the bz2 file manually and then loading it works fine:
bzip2 -d test.ttl.bz2
tdbloader2 --loc=/pathto/TDBdatabase_test test.ttl
It would be great if someone could point me in the right direction.
tdbloader2 accepts bz2 compressed files on the command line:
tdbloader2 --loc=/pathto/TDBdatabase_test test.ttl.bz2
It doesn't accept input from a pipe, and even if it did, it would not know that the syntax is Turtle, which it determines from the file extension.

how to find a file in memory using volatility

There is an IMViewer.exe process in memory, and it has the file IMMAIL.IMM open.
vol.py -f d:\dump\dump\CRM-20180416-165859.dmp --profile=Win2012R2x64_18340 --kdbg=0xf80173c3f8e0 dlllist -p 8256 > dlllist.txt
IMViewer.EXE pid: 8256
Command line : "C:\Program Files (x86)\Inbit\Inbit Messenger Server\IMViewer.exe" "C:\Program Files (x86)\Inbit\Inbit Messenger Server\USER_ACCT\00001\IMMAIL.IMM"
Note: use ldrmodules for listing DLLs in Wow64 processes
Base Size LoadCount Path
------------------ ------------------ ------------------ ----
0x0000000000400000 0x208000 0x0 C:\Program Files (x86)\Inbit\Inbit Messenger Server\IMViewer.exe
0x00007ffca1a20000 0x1ad000 0x0 C:\Windows\SYSTEM32\ntdll.dll
0x0000000077850000 0x4b000 0x0 C:\Windows\SYSTEM32\wow64.dll
0x00000000777e0000 0x68000 0x0 C:\Windows\system32\wow64win.dll
0x00000000777d0000 0x9000 0x0 C:\Windows\system32\wow64cpu.dll
Executing
vol.py -f d:\dump\dump\CRM-20180416-165859.dmp --profile=Win2012R2x64_18340 --kdbg=0xf80173c3f8e0 dumpfiles -r IMM$ -i --name -D FileHandles/
does not find the .IMM file in memory.
The file IMMAIL.IMM is open and I can use it, but it was deleted from the disk and could not be restored. IMViewer.EXE is only a viewer, so I cannot save IMMAIL.IMM from it. I wanted to find IMMAIL.IMM in memory and save it using dumpfiles, but the file cannot be found. What can I do to find the file IMMAIL.IMM in memory?
Kinda new to this, but this may help: run vol.py -f {file} --profile={profile} filescan | grep .IMM (or grep for the absolute name of the file instead) and then extract the file.
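A sketch of that approach with Volatility 2, assuming filescan turns up a _FILE_OBJECT for IMMAIL.IMM; its physical offset (the first column of filescan output) is then passed to dumpfiles with -Q. The 0x<offset> value below is a placeholder:
vol.py -f d:\dump\dump\CRM-20180416-165859.dmp --profile=Win2012R2x64_18340 filescan | grep -i "IMMAIL.IMM"
vol.py -f d:\dump\dump\CRM-20180416-165859.dmp --profile=Win2012R2x64_18340 dumpfiles -Q 0x<offset> -D FileHandles/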

How is the fsimage stored in Hadoop?

How is the fsimage stored in Hadoop (the secondary namenode fsimage format): in a table format or a file format? If it is a file, is it compressed or uncompressed, and is it in a readable format?
Thanks
venkatbala
The fsimage is an "image" file and is not in a human-readable format. You have to use the HDFS Offline Image Viewer in Hadoop to convert it to a readable format.
The contents of the fsimage are just an "image" and cannot be read with cat. Basically, the fsimage holds metadata such as the directory structure, transactions, etc. There is a tool, oiv, with which you can convert the fsimage into a text file.
Download the fsimage using
hdfs dfsadmin -fetchImage /tmp
Then execute the command below (-i for input, -o for output):
hdfs oiv -i fsimage_0000000000000001382 -o /tmp/fsimage.txt
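If you specifically want a line-oriented .txt file on a newer release, a sketch using the Delimited processor (assuming your version ships it; it writes one tab-separated line of metadata per path):
hdfs oiv -p Delimited -i fsimage_0000000000000001382 -o /tmp/fsimage.txt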

Hadoop, Mapreduce - Cannot obtain block length for LocatedBlock

I have a file on HDFS at the path 'test/test.txt' which is 1.3 GB.
The output of the ls and du commands is:
hadoop fs -du test/test.txt -> 1379081672 test/test.txt
hadoop fs -ls test/test.txt ->
Found 1 items
-rw-r--r-- 3 testuser supergroup 1379081672 2014-05-06 20:27 test/test.txt
I want to run a MapReduce job on this file, but when I start the job it fails with the following error:
hadoop jar myjar.jar test.TestMapReduceDriver test output
14/05/29 16:42:03 WARN mapred.JobClient: Use GenericOptionsParser for parsing the
arguments. Applications should implement Tool for the same.
14/05/29 16:42:03 INFO input.FileInputFormat: Total input paths to process : 1
14/05/29 16:42:03 INFO mapred.JobClient: Running job: job_201405271131_9661
14/05/29 16:42:04 INFO mapred.JobClient: map 0% reduce 0%
14/05/29 16:42:17 INFO mapred.JobClient: Task Id : attempt_201405271131_9661_m_000004_0, Status : FAILED
java.io.IOException: Cannot obtain block length for LocatedBlock{BP-428948818-namenode-1392736828725:blk_-6790192659948575136_8493225; getBlockSize()=36904392; corrupt=false; offset=1342177280; locs=[datanode4:50010, datanode3:50010, datanode1:50010]}
at org.apache.hadoop.hdfs.DFSInputStream.readBlockLength(DFSInputStream.java:319)
at org.apache.hadoop.hdfs.DFSInputStream.fetchLocatedBlocksAndGetLastBlockLength(DFSInputStream.java:263)
at org.apache.hadoop.hdfs.DFSInputStream.openInfo(DFSInputStream.java:205)
at org.apache.hadoop.hdfs.DFSInputStream.<init>(DFSInputStream.java:198)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1117)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:249)
at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:82)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:746)
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:83)
at org.apache.hadoop.mapred.Ma...
I tried the following commands:
hadoop fs -cat test/test.txt gives the following error:
cat: Cannot obtain block length for LocatedBlock{BP-428948818-10.17.56.16-1392736828725:blk_-6790192659948575136_8493225; getBlockSize()=36904392; corrupt=false; offset=1342177280; locs=[datanode3:50010, datanode1:50010, datanode4:50010]}
Additionally, I can't copy the file; hadoop fs -cp test/test.txt tmp gives the same error:
cp: Cannot obtain block length for LocatedBlock{BP-428948818-10.17.56.16-1392736828725:blk_-6790192659948575136_8493225; getBlockSize()=36904392; corrupt=false; offset=1342177280; locs=[datanode1:50010, datanode3:50010, datanode4:50010]}
Output of the hdfs fsck /user/testuser/test/test.txt command:
Connecting to namenode via http://namenode:50070
FSCK started by testuser (auth:SIMPLE) from /10.17.56.16 for path
/user/testuser/test/test.txt at Thu May 29 17:00:44 EEST 2014
Status: HEALTHY
Total size: 0 B (Total open files size: 1379081672 B)
Total dirs: 0
Total files: 0 (Files currently being written: 1)
Total blocks (validated): 0 (Total open file blocks (not validated): 21)
Minimally replicated blocks: 0
Over-replicated blocks: 0
Under-replicated blocks: 0
Mis-replicated blocks: 0
Default replication factor: 3
Average block replication: 0.0
Corrupt blocks: 0
Missing replicas: 0
Number of data-nodes: 5
Number of racks: 1
FSCK ended at Thu May 29 17:00:44 EEST 2014 in 0 milliseconds
The filesystem under path /user/testuser/test/test.txt is HEALTHY
By the way, I can see the content of the test.txt file from the web browser.
The Hadoop version is: Hadoop 2.0.0-cdh4.5.0
I had the same issue as you, and I fixed it with the following steps.
In my case there were files that had been opened by Flume but never closed (I am not sure what the cause is in your case).
You need to find the names of the open files with the command:
hdfs fsck /directory/of/locked/files/ -files -openforwrite
You can try to recover the files with the command:
hdfs debug recoverLease -path <path-of-the-file> -retries 3
Or remove them with the command:
hdfs dfs -rmr <path-of-the-file>
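A minimal sketch tying those three steps together, assuming you want to attempt lease recovery on everything fsck flags as open for write before falling back to deletion (the directory and file names below are placeholders):
hdfs fsck /directory/of/locked/files/ -files -openforwrite | grep -i openforwrite   # find the stuck files
hdfs debug recoverLease -path /directory/of/locked/files/stuck.file -retries 3      # try to close each one
hdfs dfs -rm -r /directory/of/locked/files/stuck.file                               # last resort: remove it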
I had the same error, but it was not due to a full-disk problem; I think it was the inverse: there were files and blocks referenced in the namenode that did not exist on any datanode.
Thus, hdfs dfs -ls shows the files, but any operation on them fails, e.g. hdfs dfs -copyToLocal.
In my case, the hard part was isolating which files were listed but corrupted, as they existed in a tree having thousands of files. Oddly, hdfs fsck /path/to/files/ did not report any problems.
My solution was:
Isolate the location using copyToLocal which resulted in copyToLocal: Cannot obtain block length for LocatedBlock{BP-1918381527-10.74.2.77-1420822494740:blk_1120909039_47667041; getBlockSize()=1231; corrupt=false; offset=0; locs=[10.74.2.168:50010, 10.74.2.166:50010, 10.74.2.164:50010]} for several files
Get a list of the local directories using ls -1 > baddirs.out
Get rid of the local files from the first copyToLocal.
Use for files in `cat baddirs.out`; do echo $files; hdfs dfs -copyToLocal $files; done. This will produce a list of directories as it checks them, and errors where bad files are found.
Get rid of the local files again, and now get lists of files from each affected subdirectory. Use those as input to a file-by-file copyToLocal, at which point you can echo each file as it's copied and see where the error occurs.
Use hdfs dfs -rm <file> for each bad file.
Confirm you got 'em all by removing all local files again and using the original copyToLocal on the top-level directory where you had problems (see the consolidated sketch below).
A simple two hour process!
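A rough consolidation of that procedure into one script; this is only a sketch, assuming the problem tree is /path/to/files and that every file which fails copyToLocal with the block-length error should be removed (review the echoed names before trusting it with -rm):
#!/bin/bash
# probe every HDFS file under the problem tree; remove the ones that cannot be read
for f in $(hdfs dfs -ls -R /path/to/files | awk '$1 ~ /^-/ {print $NF}'); do
  if ! hdfs dfs -copyToLocal "$f" /tmp/probe.$$ 2>/dev/null; then
    echo "unreadable, removing: $f"
    hdfs dfs -rm "$f"
  fi
  rm -f /tmp/probe.$$
done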
You have some corrupted files that have an entry in the namenode but no blocks on the datanodes. It is best to follow this:
https://stackoverflow.com/a/19216037/812906
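For completeness, a brief sketch of the usual fsck-based cleanup for that situation, assuming you want to inspect the affected paths before deleting anything (the path below is a placeholder):
hdfs fsck /path/with/problems -files -blocks -locations   # see which files have missing or corrupt blocks
hdfs fsck /path/with/problems -delete                     # delete the corrupted files once you are sure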
According to this, the error may be produced by a full-disk problem. I came across the same problem recently with an old file, and checking my server's metrics it was indeed a full-disk problem during the creation of that file. Most solutions just tell you to delete the file and pray it doesn't happen again.
