Move Table Storage to Blob Storage - azure-blob-storage

Is there any way of moving a table from Table Storage into Blob Storage?
I thought of writing each row into a CSV file, but is that really the fastest way?
Cheers,
Joe

The only supported way would be to download the data from Azure Table Storage locally through Query Entities, then write the data back against Blob Storage in whatever form you need; that could be CSV, some binary format, JSON, etc.
Azure Storage does not provide any copy or backup functionality from Azure Table to Azure Blob. It is a feature that has already been requested, but we don't have a timeline to share.
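For what it's worth, here is a minimal sketch of that approach using the current azure-data-tables and azure-storage-blob Python packages; the connection string, table name, container name, and blob name are placeholders, and the whole table is buffered in memory, so a very large table would need to be written out in batches instead.

import csv
import io

from azure.data.tables import TableClient
from azure.storage.blob import BlobClient

CONN_STR = "<storage-account-connection-string>"  # placeholder

# Download every entity from the table (Query Entities under the hood).
table = TableClient.from_connection_string(CONN_STR, table_name="mytable")
entities = list(table.list_entities())

# Serialize the entities to CSV in memory; the column set is the union of all keys.
columns = sorted({key for entity in entities for key in entity.keys()})
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=columns)
writer.writeheader()
for entity in entities:
    writer.writerow({col: entity.get(col, "") for col in columns})

# Write the CSV back as a block blob.
blob = BlobClient.from_connection_string(
    CONN_STR, container_name="backups", blob_name="mytable.csv")
blob.upload_blob(buffer.getvalue(), overwrite=True)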
Thanks,
Jean

Related

Partition Parquet files on Azure Blob (pyarrow)

I have been manually partitioning files with pandas (creating an index or multi-index and then writing a separate parquet file for each index in a loop) to Azure Blob.
However, when reading the docs for pyarrow, I see that it is possible to create a 'dataset' which includes a folder structure for partitioned data. https://arrow.apache.org/docs/python/parquet.html
The example for the Monthly / daily folder is exactly what I am trying to achieve.
dataset_name/
    year=2007/
        month=01/
            0.parq
            1.parq
            ...
        month=02/
            0.parq
            1.parq
            ...
        month=03/
            ...
    year=2008/
        month=01/
            ...

fs = pa.hdfs.connect(host, port, user=user, kerb_ticket=ticket_cache_path)
pq.write_to_dataset(table, root_path='dataset_name',
                    partition_cols=['one', 'two'], filesystem=fs)
Can I do this with Azure Blob (or Minio which uses S3 and wraps over my Azure Blob storage)? My ultimate goal is to only read files which make sense for my 'query'.
Just per my experience, and based on your current environment (Linux on an Azure VM), I think there are two solutions for reading partitioned parquet files from Azure Storage.
1. Follow the section Reading a Parquet File from Azure Blob storage of the pyarrow document Reading and Writing the Apache Parquet Format: manually list the blob names with a prefix like dataset_name using the list_blob_names(container_name, prefix=None, num_results=None, include=None, delimiter=None, marker=None, timeout=None) API of the Azure Storage SDK for Python, then read those blobs one by one into dataframes as in the sample code, and finally concat the dataframes into a single one. A sketch of this is shown below.
2. Try Azure/azure-storage-fuse to mount a container of Azure Blob Storage onto your Linux filesystem; then you only need to follow the document section Reading from Partitioned Datasets to read the partitioned dataset locally from Azure Blob Storage.
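For reference, a rough sketch of the first approach, assuming the legacy azure-storage-blob 2.x SDK (the one that exposes BlockBlobService and list_blob_names); the account credentials, container name, and prefix are placeholders:

import io

import pandas as pd
import pyarrow.parquet as pq
from azure.storage.blob import BlockBlobService

service = BlockBlobService(account_name="<account>", account_key="<key>")
container = "<container>"

# List every blob under the dataset prefix, read each parquet blob into a
# DataFrame, then concatenate everything into a single DataFrame.
frames = []
for name in service.list_blob_names(container, prefix="dataset_name/"):
    if not name.endswith(".parq"):
        continue  # skip anything that is not a parquet part file
    blob = service.get_blob_to_bytes(container, name)
    frames.append(pq.read_table(io.BytesIO(blob.content)).to_pandas())

df = pd.concat(frames, ignore_index=True)

With the second approach, once the container is mounted (say at /mnt/blob), something like pq.ParquetDataset('/mnt/blob/dataset_name').read().to_pandas() should pick up the year=/month= partition columns directly.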

Difference between Database and File Storage in Parse.com

Based on the FAQ at Parse.com:
What is the difference between database storage and file storage?
Database storage refers to data stored as Parse Objects, which are limited to 128 KB in size. File storage refers to static assets that are stored using the Parse File APIs, typically images, documents, and other types of binary data.
Just want some clarification here:
So the Strings, Arrays, etc. that are created are considered Parse Objects and fall under database storage, and the URL of a file also counts against database storage since it is stored on a Parse Object, but the actual files themselves fall under File Storage?
Thanks.
Yes. Any file that you upload to Parse goes to File storage; everything else, including the URLs of such files, is stored in the database.

Mass Export of BLOB data to CSV

I am working with an older Oracle database (I don't know which version of Oracle, sorry), and I need to do a mass export of 200,000+ files' worth of HTML data stored in BLOBs. I have downloaded and used both Toad and SQL Developer (Oracle's own DB GUI tool), and at best I am able to properly extract the HTML for a single row at a time.
Is there a way (query, tool, other GUI, etc...) that I can reliably do a mass export of all the BLOB data on this table to a CSV format?
Thank You.
You can use the utl_file built-in package; with it you can write BLOB data to a file.
Refer here.
I found this tool.
It works incredibly well for extracting content of any type out of any sort of LOB to a file (HTML in this case). It takes about an hour to do 200,000 records, though.

Reading BLOB data from Oracle database using python

This is not a question about code. I need to extract some BLOB data from an Oracle database using a Python script. My question is: what are the steps in dealing with BLOB data, and how do I read it as images, videos, and text? Since I have no access to the database itself, is it possible to know whether the stored BLOBs are pictures, videos, or text? Do I need encoding or decoding in order to transfer these BLOBs into .jpg, .avi, or .txt files? These are very basic questions, but I am new to programming, so I need some help finding a starting point :)
If you have a pure BLOB in the database, as opposed to, say, an ORDImage that happens to be stored in a BLOB under the covers, the BLOB itself has no idea what sort of binary data it contains. Normally, when the table was designed, a column would be added that would store the data type and/or the file name.
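As a starting point, here is a small sketch with the python-oracledb driver (the successor to cx_Oracle); the documents table and its file_name/content columns are hypothetical stand-ins for whatever metadata columns your schema actually keeps. Binary data such as .jpg or .avi just needs to be written back in binary mode; only text BLOBs would need decoding with the right character set.

import oracledb

# Placeholders: replace with your real credentials and DSN.
conn = oracledb.connect(user="myuser", password="mypassword", dsn="dbhost/orclpdb1")
cur = conn.cursor()

# Hypothetical table: file_name holds the original name, content holds the BLOB.
cur.execute("SELECT file_name, content FROM documents")
for file_name, blob in cur:
    data = blob.read()  # BLOB columns come back as LOB objects; read() returns bytes
    with open(file_name, "wb") as out:
        out.write(data)  # raw bytes, no decoding needed for binary formats

cur.close()
conn.close()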

How to store Partitioned data using pig in RC Format?

I was wondering if there is a UDF or something that can store my data in a partitioned fashion in RC format. I know there is org.apache.pig.piggybank.storage.MultiStorage, but it only does this for certain compression formats. I want to store my data in RC format but with the same partitioned storage structure that MultiStorage provides.
Thanks,
imtiaz
There is no such solution available, either in piggybank or elsewhere. I faced a similar issue but dropped the implementation due to some other requirements. The only solution available is to extend the MultiStorage UDF to provide the RC storage format.
Twitter has open sourced its RC file storage, which you may find helpful:
http://grepcode.com/file/repo1.maven.org/maven2/com.twitter.elephantbird/elephant-bird-rcfile/3.0.8/com/twitter/elephantbird/pig/store/RCFilePigStorage.java
