Recover data after formatting an f2fs partition - data-recovery

I've accidentally formatted a partition that contained PostgreSQL data. I realized quickly what I had done and haven't done anything more with the drive, so the data should be untouched (I hope). The partition was f2fs.
I have made an image of the entire drive and tried to recover the data with TestDisk and PhotoRec (ver. 7.1-WIP), but without any success.
Can someone tell me how to deal with this problem?
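For reference, a hedged command-line sketch of the image-and-scan workflow described above; /dev/sdX and drive.img are placeholder names, and PhotoRec carves files by signature, so PostgreSQL files may come back without their original names or paths.

```sh
# Image the whole drive first (source is only read); dd or ddrescue both work.
sudo dd if=/dev/sdX of=drive.img bs=4M conv=noerror,sync status=progress

# Let TestDisk look for recoverable partition/filesystem structures in the image.
testdisk drive.img

# PhotoRec carves files by signature, independent of the (formatted) f2fs metadata.
photorec drive.img
```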

Related

Checksum doesn't match: corrupted data.: while reading column `cid` at /opt/clickhouse//data/click

I am using ClickHouse to store data, and I'm getting the following error while querying the column cid from the click table.
Checksum doesn't match: corrupted data.
I don't have any replica for now; any suggestions for recovery?
The error comes down to the fact that the CityHash128 checksum and the compressed data don't match, and this exception is thrown in the readCompressedData function.
You can try to disable this check using disable_checksum via the disableChecksumming method.
It could work, but a corrupted checksum most probably means that something is wrong with your raw data, and there is little chance of recovery unless you have backups.
Usually, you will get the data part name and the column name in the exception message.
You could locate that specific data part, remove the files related to that single column, and restart the server. You will lose the (already corrupted) data for one column in one data part (it will be filled with default values on read), but all other data will remain.
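As a hedged illustration of that last step only: the database, table, and part names below are made up, and on-disk file layouts vary by ClickHouse version, so treat this as a sketch rather than an exact recipe.

```sh
# Stop the server before touching part files on disk.
sudo systemctl stop clickhouse-server

# Placeholder path: database "default", table "click", made-up part name;
# the real part name comes from the exception message.
PART=/opt/clickhouse/data/default/click/20180101_20180131_1_10_2

# Remove only the files belonging to the corrupted column (cid): its data file
# and its marks file (.mrk or .mrk2 depending on version).
rm "$PART"/cid.bin "$PART"/cid.mrk*

sudo systemctl start clickhouse-server
```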

IBM BigSheets Issue

I am getting errors loading my files into BigSheets, both directly from HDFS (files that are the output of Pig scripts) and from raw data on the local hard disk.
I have observed that whenever I load the files and issue a row count to check whether all the data has been loaded into BigSheets, I see fewer rows than expected.
I have checked that the files are consistent and have proper delimiters (tab or comma-separated fields).
The size of my file is around 2 GB, and I have used either of the formats *.csv / *.tsv.
Also, in some cases when I have tried to load a file directly from Windows, the file sometimes loads successfully with the row count matching the actual number of lines in the data, and sometimes with a lower row count.
Sometimes, when a fresh file is used for the first time, it gives the correct result, but if I do the same operation again some rows are missing.
Kindly share your experience with BigSheets and solutions to any such problems where the entire data is not being loaded. Thanks in advance.
The data that you originally load into BigSheets is only a subset. You have to run the sheet to apply it to the full dataset.
http://www-01.ibm.com/support/knowledgecenter/SSPT3X_3.0.0/com.ibm.swg.im.infosphere.biginsights.analyze.doc/doc/t0057547.html?lang=en

Incrementally updating data on HDFS

We are copying data from various sources such as Oracle and Teradata to HDFS using Sqoop. We use the incremental update feature to 'import' new data and then 'merge' it with the existing data. The data first gets populated in a temporary directory, and then we 'remove' the old directory and 'rename' the new one.
The problem is that if a user is running a query against the data on HDFS, using a tool such as Hive, while we swap the directories, the query terminates abnormally.
Is there a better way to handle the updates on HDFS?
(Please note that even though HBase keeps different versions, it doesn't work for us because we want to query by any column. HBase is very slow in cases where you don't search by the primary key.)
Hadoop is not designed to work like that. It is good for storing data but not for editing it. I would just add the new data beside the old data, and while adding it (copying or any other import) you could add the suffix .tmp to the filename. I did not use Hive that much (Pig user here), but in Pig I could write A = LOAD '/some/path/to/hdfs/*.log' and that would load all the files except the .tmp ones that are still being imported. With that there are no problems.
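A small sketch of that .tmp convention using plain HDFS shell commands (the paths and file names are made up for illustration); readers that match only *.log never see the half-written file:

```sh
# Write the new import under a temporary name first.
hdfs dfs -put /local/staging/part-00000 /data/clicks/part-00000.log.tmp

# Rename inside HDFS once the copy has finished; the file only becomes
# visible to a '*.log' glob (e.g. Pig's LOAD '/data/clicks/*.log') at this point.
hdfs dfs -mv /data/clicks/part-00000.log.tmp /data/clicks/part-00000.log
```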

Hive external table with wild card

I am trying to create an external Hive table with the location given as a wildcard, expecting that it will be able to traverse the subdirectories. But somehow it does not work; instead it creates a directory named after the wildcard at the target location.
The location I am specifying is /hadfsfolder1/hdfsfolder2/weblog/2013/*
No data gets exposed to the external table. Also, is there any way to prevent the creation of the '*' directory? It causes a lot of inconvenience in terms of cleanup and restoring the directory.
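Nothing in the thread confirms a fix, but one common hedged alternative to a wildcard LOCATION is to point the table at the parent directory and register each subdirectory explicitly as a partition; the table name, column, and partition values below are made up for illustration:

```sh
hive -e "
CREATE EXTERNAL TABLE weblog_raw (line STRING)
PARTITIONED BY (subdir STRING)
LOCATION '/hadfsfolder1/hdfsfolder2/weblog/2013';

-- One ADD PARTITION per subdirectory instead of a '*' in the LOCATION.
ALTER TABLE weblog_raw ADD PARTITION (subdir='01')
  LOCATION '/hadfsfolder1/hdfsfolder2/weblog/2013/01';
"
```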

Making an image of a SD with a partition smaller than the SD size

I have a 16 GB SD card with a 2 GB partition on it which I use with a Raspberry Pi. I want to make an image of the SD card, but only of the partitioned space. I tried to do it with "Win32 Disk Imager", but the result is a 16 GB file, so the unallocated space is copied into the file as well.
Is there any way to copy only the partitioned space into the image, maybe using Linux utilities? Or is it possible to edit the image file and delete the extra "empty" space?
Thanks!
If you by any chance have an external hard drive, you could copy ALL the contents of the Raspberry Pi partition, including hidden and operating system files. Then you could delete the partition and merge it with your current partition using a tool like MiniTool Partition Wizard.
Then you could create a system image of your 14 GB HDD and copy it to the external HDD, and finally remake the partitions using a partition tool and copy the contents of your Raspberry Pi partition back into the new partition you would have made. I do not know what effect this has on registry files, so I would try this with caution. I do not know whether there are registry files tied to the partition that you cannot copy or see, so I would also make a registry backup. I don't know if you want to try this, but maybe it will help you out.
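Since the question explicitly asks about Linux utilities, a different, hedged approach is to image only up to the end of the last partition with dd; /dev/sdX and the sector numbers below are placeholders you would read from the fdisk output.

```sh
# Note the "End" sector of the last partition on the card.
sudo fdisk -l /dev/sdX

# Placeholder numbers: if the partition ends at sector 4202495 and the card
# uses 512-byte sectors, copy End + 1 sectors from the start of the card.
sudo dd if=/dev/sdX of=rpi-2gb.img bs=512 count=4202496 status=progress
```

The resulting image contains the partition table plus the 2 GB partition and nothing beyond it, so it can be written back to any card at least that large.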
