Does Mosaic support ingesting compressed data?

We have a scenario where we upload compressed files to a Blob container in Microsoft Azure and then read them.
Is this possible in Mosaic, and if so, how can we achieve it?
Our files are in .gz format.

Yes, you can upload compressed files and read them in Mosaic through the Azure Reader node.
Currently, Mosaic supports two compression types: .ZIP and .GZ.
To read compressed files in Mosaic's Azure Reader node, follow these steps:
1. In the Path field, provide the path of the compressed file, as shown in the screenshot below.
2. Set the Is Compressed toggle to True.
3. Select the compression type (either .ZIP or .GZ).
4. In Compressed Path, provide the file name without the compression extension.
   e.g. if the compressed file is ‘ABC.csv.gz’, the compressed path would be ‘ABC.csv’.
   For files compressed in .zip format, the compressed path is instead the path of the file within the compressed folder.
   e.g. if the compressed folder is ‘ABC.zip’, the compressed path would be ‘ABC/file.csv’.
5. Select the file format and Validate.
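Mosaic is configured through its UI, so the sketch below does not call any Mosaic API. It only illustrates, with Python's standard gzip and zipfile modules, what the two Compressed Path conventions refer to: a .gz file wraps exactly one file (so the inner name is the archive name minus ".gz"), while a .zip file can hold many files at nested paths. The function names are hypothetical.

```python
import gzip
import io
import zipfile

def gz_compressed_path(archive_name: str) -> str:
    """For a .gz file, the inner file name is the archive name minus '.gz'.

    gzip wraps exactly one file, so 'ABC.csv.gz' holds 'ABC.csv'.
    """
    if not archive_name.lower().endswith(".gz"):
        raise ValueError(f"not a .gz file name: {archive_name}")
    return archive_name[:-3]

def zip_compressed_paths(zip_bytes: bytes) -> list[str]:
    """For a .zip file, list the paths of the files stored inside it."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return [n for n in zf.namelist() if not n.endswith("/")]  # skip directory entries

# Build small in-memory archives that mirror the examples in the steps.
csv_data = b"id,name\n1,alice\n"
gz_bytes = gzip.compress(csv_data)
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("ABC/file.csv", csv_data)
zip_bytes = zip_buf.getvalue()

print(gz_compressed_path("ABC.csv.gz"))        # ABC.csv
print(zip_compressed_paths(zip_bytes))         # ['ABC/file.csv']
print(gzip.decompress(gz_bytes) == csv_data)   # True
```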

Related

How to copy metadata exported from an mp3 file (with ffmpeg) into another mp3 file

I have two mp3 files with the same content but different speeds and volume levels. I would like to copy the metadata, and only the metadata, from one to the other. I have seen this post and was able to extract the metadata to a file. But when I try to copy the metadata to the second file as described in that post, it copies the whole file as well, which overwrites the audio itself in addition to the metadata.
How can I copy only the metadata into the second file without copying the actual audio?
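The question has no answer on this page. A minimal sketch, assuming the FFMETADATA workflow described in ffmpeg's formats documentation (dump with `-f ffmetadata`, reinsert with `-map_metadata 1 -codec copy`): the audio streams are taken only from the target file (`-map 0`), the global metadata only from the metadata file, and `-c copy` avoids re-encoding. The helper names and file names are hypothetical, and running `copy_metadata` requires ffmpeg on the PATH.

```python
import subprocess

def extract_metadata_cmd(source_mp3: str, metadata_txt: str) -> list[str]:
    # Dump only the metadata of source_mp3 into an FFMETADATA text file.
    return ["ffmpeg", "-y", "-i", source_mp3, "-f", "ffmetadata", metadata_txt]

def apply_metadata_cmd(target_mp3: str, metadata_txt: str, output_mp3: str) -> list[str]:
    return ["ffmpeg", "-y", "-i", target_mp3, "-i", metadata_txt,
            "-map", "0",           # streams (the audio) come only from the target file
            "-map_metadata", "1",  # global metadata comes only from the metadata file
            "-c", "copy",          # stream copy: the audio is not re-encoded
            output_mp3]

def copy_metadata(source_mp3: str, target_mp3: str, output_mp3: str,
                  metadata_txt: str = "metadata.txt") -> None:
    # output_mp3 must differ from target_mp3: ffmpeg cannot edit a file in place.
    subprocess.run(extract_metadata_cmd(source_mp3, metadata_txt), check=True)
    subprocess.run(apply_metadata_cmd(target_mp3, metadata_txt, output_mp3), check=True)
```

Usage, with hypothetical file names: `copy_metadata("original.mp3", "fast_quiet.mp3", "fast_quiet_tagged.mp3")`.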

Find the compression codec used for a Hadoop file

Given a compressed file, written on the Hadoop platform, in one of the following formats:
Avro
Parquet
SequenceFile
How can I find which compression codec was used? Assume that one of the following compression codecs is used (and there is no file extension in the file name):
Snappy
Gzip (not supported on Avro)
Deflate (not supported on Parquet)

The Java implementation of Parquet includes the parquet-tools utility, which provides several commands. See its documentation page for build and getting-started instructions; more detailed descriptions of the individual commands are printed by parquet-tools itself. The command you are looking for is meta, which shows all kinds of metadata, including the compression codecs. You can find an example output here, showing SNAPPY compression.
Please note that the compression algorithm does not have to be the same across the whole file. Different column chunks can use different compressions, so there is no single field for the compression codec, but one for each column chunk instead. (A column chunk is the part of a column that belongs to one row group.) In practice, however, you will probably find the same compression codec used for all column chunks.
A similar utility exists for Avro, called avro-tools. I'm not that familiar with it, but it has a getmeta command, which should show you the compression codec used.
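For files without an extension, the container can also be identified from its leading magic bytes. A minimal Python sketch, assuming the file (or a large enough prefix of it) has been read in binary mode and follows the Avro object container layout or the Hadoop SequenceFile header layout (version 5 or later). For Avro it reads the `avro.codec` header entry, which is what getmeta reports; for SequenceFile it reads the codec class name from the header. Parquet keeps a codec per column chunk in its Thrift footer, which this sketch does not parse; parquet-tools meta covers that case. All function names are hypothetical.

```python
import struct
from typing import Optional

def detect_container(buf: bytes) -> str:
    """Identify the file type from its leading magic bytes."""
    if buf[:4] == b"Obj\x01":
        return "avro"
    if buf[:4] == b"PAR1":  # Parquet files also end with PAR1
        return "parquet"
    if buf[:3] == b"SEQ":
        return "sequencefile"
    return "unknown"

def _avro_long(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode an Avro zig-zag varint at pos; return (value, next_pos)."""
    raw, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        raw |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            return (raw >> 1) ^ -(raw & 1), pos

def avro_codec(buf: bytes) -> str:
    """Read the avro.codec entry from an Avro container file header."""
    if detect_container(buf) != "avro":
        raise ValueError("not an Avro container file")
    pos, meta = 4, {}
    while True:
        count, pos = _avro_long(buf, pos)
        if count == 0:  # end of the metadata map
            break
        if count < 0:  # negative count: a block byte size follows
            count = -count
            _, pos = _avro_long(buf, pos)
        for _ in range(count):
            klen, pos = _avro_long(buf, pos)
            key = buf[pos:pos + klen].decode("utf-8")
            pos += klen
            vlen, pos = _avro_long(buf, pos)
            meta[key] = buf[pos:pos + vlen]
            pos += vlen
    # Per the Avro spec, a missing avro.codec means "null" (no compression).
    return meta.get("avro.codec", b"null").decode("utf-8")

def _hadoop_vlong(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode a Hadoop WritableUtils variable-length long at pos."""
    first = struct.unpack_from("b", buf, pos)[0]  # signed first byte
    pos += 1
    if first >= -112:
        return first, pos
    negative = first < -120
    size = (-119 - first) if negative else (-111 - first)
    value = 0
    for _ in range(size - 1):
        value = (value << 8) | buf[pos]
        pos += 1
    return (~value if negative else value), pos

def _hadoop_text(buf: bytes, pos: int) -> tuple[str, int]:
    """Read a Hadoop Text string: a vlong length followed by UTF-8 bytes."""
    n, pos = _hadoop_vlong(buf, pos)
    return buf[pos:pos + n].decode("utf-8"), pos + n

def sequencefile_codec(buf: bytes) -> Optional[str]:
    """Return the codec class name of a SequenceFile, or None if uncompressed."""
    if detect_container(buf) != "sequencefile" or buf[3] < 5:
        raise ValueError("expected a SequenceFile header, version 5 or later")
    pos = 4
    _, pos = _hadoop_text(buf, pos)  # key class name
    _, pos = _hadoop_text(buf, pos)  # value class name
    compressed = buf[pos] != 0
    pos += 2  # skip the 'compressed' and 'block compressed' booleans
    if not compressed:
        return None
    codec, _ = _hadoop_text(buf, pos)
    return codec
```

Usage: `avro_codec(open(path, "rb").read())` returns a codec name such as "snappy" or "deflate"; `sequencefile_codec` returns a class name such as `org.apache.hadoop.io.compress.SnappyCodec`.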

Gzip compression using the Boost library

I want to write a program that can compress a directory and all its files into a .gz file. I have tried using the gzip filter, but I don't know how to add a directory and multiple files. I would also like to decompress it again.

gzip by itself only compresses a single stream of data, with no assumed structure. To archive directories, gzip is most commonly combined with tar, which has built-in support for gzip compression. You have probably seen such files; they end in .tar.gz. You can probably find a library that processes them.
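The question asks about Boost in C++; the sketch below instead uses Python's standard tarfile module to show the tar-plus-gzip approach from the answer: the directory tree goes into a tar archive that is gzip-compressed on the fly (mode "w:gz"), and the same archive is unpacked with mode "r:gz". It is an illustration of the technique, not a Boost equivalent; the function names are hypothetical.

```python
import os
import tarfile
import tempfile

def compress_dir(src_dir: str, archive_path: str) -> None:
    """Pack src_dir and everything below it into a gzip-compressed tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:  # tar stream, gzip-compressed
        tar.add(src_dir, arcname=os.path.basename(os.path.normpath(src_dir)))

def decompress_archive(archive_path: str, dest_dir: str) -> None:
    """Unpack a .tar.gz archive into dest_dir."""
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):  # extraction filters exist on newer Pythons
            tar.extractall(dest_dir, filter="data")
        else:
            tar.extractall(dest_dir)

# Round trip: a directory with a nested file -> project.tar.gz -> restored copy.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "project")
    os.makedirs(os.path.join(src, "sub"))
    with open(os.path.join(src, "a.txt"), "w") as f:
        f.write("alpha")
    with open(os.path.join(src, "sub", "b.txt"), "w") as f:
        f.write("beta")
    archive = os.path.join(tmp, "project.tar.gz")
    compress_dir(src, archive)
    out_dir = os.path.join(tmp, "restored")
    decompress_archive(archive, out_dir)
    with open(os.path.join(out_dir, "project", "sub", "b.txt")) as f:
        restored = f.read()

print(restored)  # beta
```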

Best image format for/in CUDA image processing

I am new to image processing in CUDA and am currently learning whatever I can about it.
Can anyone tell me the appropriate format (image file extension) for storing and accessing image files so that CUDA processing is as efficient as possible?
And why do all the sample CUDA programs for image processing use the .ppm file format?
Can I convert images in other formats to that format?
And how can I access those files from CUDA code?

Most image formats are designed for efficient exchange of images, i.e. on storage media (hard disks), over the internet, etc.
For computation, the most useful representation of an image is usually in some raw, uncompressed format.
CUDA doesn't have any intrinsic functions for manipulating an image in one of the interchange formats (e.g. .jpg, .png, .ppm, etc.). You should use some other library to convert an image from one of the interchange formats to a raw uncompressed format; then you can operate on it directly in host code or in CUDA device code. Since CUDA doesn't recognize any interchange format, there is no single format that is correct or best to use. It will depend on other requirements you may have.
The sample programs that have used the .ppm format have simply done so for convenience. There are plenty of sample codes out there that use other formats such as .jpg or .bmp to store an image used by a CUDA program.
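One reason .ppm is convenient: a binary PPM (P6) file is just a short text header (magic, width, height, maximum value, with optional # comments) followed by raw, uncompressed RGB bytes, which is already the layout image code operates on. A minimal Python sketch, assuming 8-bit P6 files only; the function names are hypothetical. The returned buffer is what one would then copy to device memory (for example with cudaMemcpy), a step not shown here.

```python
def read_ppm_p6(data: bytes) -> tuple[int, int, bytes]:
    """Parse a binary PPM (P6, maxval <= 255) into (width, height, raw RGB bytes)."""
    tokens: list[bytes] = []
    pos, n = 0, len(data)
    while len(tokens) < 4:  # magic, width, height, maxval
        if pos >= n:
            raise ValueError("truncated PPM header")
        c = data[pos:pos + 1]
        if c.isspace():
            pos += 1
        elif c == b"#":  # comment: skip to end of line
            while pos < n and data[pos:pos + 1] != b"\n":
                pos += 1
        else:
            start = pos
            while pos < n and not data[pos:pos + 1].isspace():
                pos += 1
            tokens.append(data[start:pos])
    if tokens[0] != b"P6":
        raise ValueError("not a binary PPM (P6) file")
    width, height, maxval = (int(t) for t in tokens[1:])
    if maxval > 255:
        raise ValueError("16-bit PPM is not handled in this sketch")
    pos += 1  # a single whitespace byte separates the header from the pixels
    size = width * height * 3
    pixels = data[pos:pos + size]
    if len(pixels) != size:
        raise ValueError("truncated pixel data")
    return width, height, pixels

def pixel_at(width: int, pixels: bytes, x: int, y: int) -> tuple[int, ...]:
    """Row-major RGB indexing, the same arithmetic a per-pixel kernel would use."""
    i = (y * width + x) * 3
    return tuple(pixels[i:i + 3])

# A 2x1 image: one red pixel, one blue pixel.
sample = b"P6\n# demo\n2 1\n255\n" + bytes([255, 0, 0, 0, 0, 255])
w, h, px = read_ppm_p6(sample)
print(w, h, pixel_at(w, px, 1, 0))  # 2 1 (0, 0, 255)
```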

How to compress or zip a whole folder using GZipStream

Any idea how I can do this? I am able to compress a single file.

You cannot GZip an entire folder directly, since GZip operates on a single stream of data. You will first have to turn the folder into such a stream.
One way to do this would be to create a Tar archive from the directory. This will give you a single stream to work on, and since the Tar format is not compressed, GZip will usually achieve good compression ratios on Tar files.
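The two steps described above (folder to one tar stream, then gzip that stream) can be sketched with Python's standard tarfile and gzip modules standing in for a C# tar library and GZipStream. This is an illustration under that substitution, not C# code; the function names are hypothetical. The tar is written uncompressed (mode "w") so that the gzip step is visibly separate.

```python
import gzip
import io
import os
import tarfile
import tempfile

def folder_to_tar_bytes(folder: str) -> bytes:
    """Step 1: turn the folder into a single, uncompressed tar stream."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:  # "w" = plain tar, no compression
        tar.add(folder, arcname=os.path.basename(os.path.normpath(folder)))
    return buf.getvalue()

def gzip_folder(folder: str, out_path: str) -> None:
    """Step 2: gzip that one stream, just as you would gzip a single file."""
    with gzip.open(out_path, "wb") as gz:
        gz.write(folder_to_tar_bytes(folder))

with tempfile.TemporaryDirectory() as tmp:
    docs = os.path.join(tmp, "docs")
    os.makedirs(docs)
    with open(os.path.join(docs, "readme.txt"), "w") as f:
        f.write("hello " * 100)  # repetitive text compresses well
    out_path = os.path.join(tmp, "docs.tar.gz")
    gzip_folder(docs, out_path)
    # The result is an ordinary .tar.gz, readable in one go with mode "r:gz".
    with tarfile.open(out_path, "r:gz") as tar:
        names = tar.getnames()

print(names)  # ['docs', 'docs/readme.txt']
```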

GZip doesn't support multiple files. They have to be combined into another container, such as Tar, first. If you need full Zip support in C#, use this library:
http://www.icsharpcode.net/opensource/sharpziplib/
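What full Zip support buys is a container that holds many files, each compressed, with no separate tar step. The sketch below uses Python's standard zipfile module rather than SharpZipLib, so it shows the format's behavior, not that library's API; `zip_folder` is a hypothetical helper. Empty subdirectories are not stored by this version.

```python
import os
import tempfile
import zipfile

def zip_folder(folder: str, zip_path: str) -> None:
    """Store every file under folder in one deflate-compressed .zip archive."""
    parent = os.path.dirname(os.path.normpath(folder))
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(folder):
            for name in files:
                full_path = os.path.join(root, name)
                # Archive paths are relative to the folder's parent, e.g. docs/sub/b.txt
                zf.write(full_path, arcname=os.path.relpath(full_path, parent))

with tempfile.TemporaryDirectory() as tmp:
    docs = os.path.join(tmp, "docs")
    os.makedirs(os.path.join(docs, "sub"))
    for rel, text in [("a.txt", "alpha"), (os.path.join("sub", "b.txt"), "beta")]:
        with open(os.path.join(docs, rel), "w") as f:
            f.write(text)
    zip_path = os.path.join(tmp, "docs.zip")
    zip_folder(docs, zip_path)
    with zipfile.ZipFile(zip_path) as zf:
        entries = sorted(zf.namelist())

print(entries)  # ['docs/a.txt', 'docs/sub/b.txt']
```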
