I'd like to determine how much disk space my Azure blobs are using via Azure's Java API. Basically, I'd like something similar to Unix's df command:
>df
Filesystem 1K-blocks Used Available Use% Mounted on
C:/Tools/cygwin64 248717308 217102536 31614772 88% /
I've tried a variety of things, hoping that the information I want is in the CloudBlobContainer's metadata or properties, but apparently not. I've run the following code and examined the various variables in the debugger, but haven't seen anything close to what I'm looking for.
CloudBlobContainer container = ...
try {
    // None of these surface a "bytes used" figure for the container:
    AccountInformation accountInfo = container.downloadAccountInfo();
    container.downloadAttributes();
    HashMap<String, String> metadata = container.getMetadata();
    BlobContainerProperties properties = container.getProperties();
    String string = metadata.toString();
} catch (StorageException e) { // ...
I'm hoping I don't have to recursively process all of the blobs in the container. Is there a better way?
For an individual blob container, recursively listing all of the blobs and summing their sizes is the only approach available today, though I must say it is not very efficient. It will also only account for base blobs, not blob snapshots and versions. Furthermore, if you're using page blobs, it will report the total provisioned capacity of those page blobs instead of the occupied bytes.
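Here's a minimal sketch of that listing approach, using the same legacy Azure Storage Java SDK as in the question; per the caveats above it only counts base blobs:

import com.microsoft.azure.storage.blob.CloudBlob;
import com.microsoft.azure.storage.blob.CloudBlobContainer;
import com.microsoft.azure.storage.blob.ListBlobItem;

// Sum the sizes of all base blobs in the container. The flat listing
// walks virtual "directories" for us.
static long containerBytesUsed(CloudBlobContainer container) throws Exception {
    long total = 0;
    for (ListBlobItem item : container.listBlobs(null, true)) {
        if (item instanceof CloudBlob) {
            total += ((CloudBlob) item).getProperties().getLength();
        }
    }
    return total;
}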
However, if you want to get the storage size for the entire storage account, one approach you can take a look at is Azure Storage Metrics. One of the metrics available there is BlobCapacity, which will tell you the total number of bytes occupied by all blobs in your storage account. You can learn more about the available metrics here: https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/metrics-supported#microsoftstoragestorageaccountsblobservices.
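If you'd rather pull that metric programmatically, a rough sketch against the Azure Monitor metrics REST endpoint looks like this; the resource IDs and the Azure AD access token are placeholders you must supply, and the api-version may have moved on since this was written:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Fetch the BlobCapacity metric for one storage account's blob service.
static String blobCapacityJson(String accessToken) throws Exception {
    String url = "https://management.azure.com/subscriptions/<subscription-id>"
            + "/resourceGroups/<resource-group>"
            + "/providers/Microsoft.Storage/storageAccounts/<account-name>"
            + "/blobServices/default/providers/Microsoft.Insights/metrics"
            + "?api-version=2018-01-01&metricnames=BlobCapacity";

    HttpRequest request = HttpRequest.newBuilder(URI.create(url))
            .header("Authorization", "Bearer " + accessToken) // Azure AD token
            .GET()
            .build();

    // The JSON body contains a timeseries; the latest "average" value is
    // the blob capacity in bytes.
    return HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString())
            .body();
}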
Another option would be to look at consumption data through the billing API. It's not as straightforward, but it will give you the most accurate numbers, because those numbers ultimately translate to your Azure bill. The REST API operation you would want to call is Usage Details - List.
I just started learning MicroStream. After going through the examples published to the MicroStream GitHub repository, I wanted to test its performance with an application that deals with more data.
Application source code is available here.
Instructions to run the application and the problems I faced are available here.
To summarize, below are my observations:
While loading a file with 2.8+ million records, processing takes 5 minutes
While calculating statistics based on loaded data, application fails with an OutOfMemoryError
Why is MicroStream trying to load all the data (4 GB) into memory? Am I doing something wrong?
MicroStream is not like a traditional database; it starts from the concept that all data live in memory. An object graph can be stored to disk (or other media) when you store it through the StorageManager.
In your case, all the data are in one list, so accessing that list reads all the records from disk. The Lazy reference isn't useful the way you have used it, since it just guards access to the single list holding all the data.
Some optimizations that you can introduce:
Split the data based on vendorId or day, using a Map<String, Lazy<List>> (see the sketch after this list).
When a Map value has been processed, remove it from memory again by clearing the Lazy reference: https://docs.microstream.one/manual/5.0/storage/loading-data/lazy-loading/clearing-lazy-references.html
Increase the number of channels to optimize reading and writing the data; see https://docs.microstream.one/manual/5.0/storage/configuration/using-channels.html
Don't store the object graph every 10,000 lines; store it just once at the end of the load.
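As a rough sketch of the partitioning idea in the first two points (class and field names here, like TradeRecord, are illustrative, not taken from your code):

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import one.microstream.reference.Lazy;

public class DataRoot {
    // One lazily loaded partition per vendorId instead of one giant list.
    private final Map<String, Lazy<List<TradeRecord>>> byVendor = new HashMap<>();

    public void addPartition(String vendorId, List<TradeRecord> records) {
        byVendor.put(vendorId, Lazy.Reference(records));
    }

    public List<TradeRecord> recordsFor(String vendorId) {
        // Only this partition is loaded from storage, not the whole dataset.
        return Lazy.get(byVendor.get(vendorId));
    }

    public void releasePartition(String vendorId) {
        // Clearing the Lazy reference lets the GC reclaim the partition;
        // the data stays on disk and is reloaded on the next access.
        Lazy<List<TradeRecord>> lazy = byVendor.get(vendorId);
        if (lazy != null) {
            lazy.clear();
        }
    }
}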
Hope this helps you solve the issues you have at the moment.
I want to save a Kedro memory dataset in Azure as a file and still have it in memory, as my pipeline will be using it later in the pipeline. Is this possible in Kedro? I tried to look at transcoding datasets, but it looks like that's not possible. Is there any other way to achieve this?
This may be a good opportunity to use CachedDataSet. It allows you to wrap any other dataset and, once that dataset has been read into memory, makes it available to downstream nodes without re-performing the IO operations.
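A minimal catalog sketch of that idea, assuming a pandas CSV dataset written to Azure Blob Storage; the entry name, container, and credentials key are illustrative:

# conf/base/catalog.yml
my_dataset:
  type: CachedDataSet
  dataset:
    type: pandas.CSVDataSet
    filepath: abfs://my-container/my_dataset.csv
    credentials: my_azure_creds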
I would try explicitly saving the dataset to Azure as part of your node logic, i.e. with catalog.save(). Then you can feed the dataset to downstream nodes in memory using the standard node inputs and outputs.
I have an app to create reports with some data and images (min 1 img, max 6). These reports stay saved in my app until the user sends them to the API (which can be done the same day the report is registered, or a week later).
But my question is: what's the proper way to store these images (I'm using Realm), saving the path (URI) or a base64 string? My current version keeps the base64 for these images (500 to 800 KB per image), and after my users send their reports to the API, I delete the base64 string.
I was developing a way to save the path to the image and then display it. But the URI returned by image-picker is temporary, so to do this I need to copy the file somewhere else and save that path. Doing that, though, I end up (for two or three days or so) with each image stored twice on the phone, using storage.
So before I develop all this stuff, I was wondering: will copying the image to another path and saving the path be more performant than storing a base64 string on the phone, or shouldn't it make much difference?
I try to avoid text-only answers; including code is best practice, but the question about storing images comes up frequently and it's not really covered in the documentation, so I thought it should be addressed at a high level.
Generally speaking, Realm is not a solution for storing blob-type data: images, PDFs, etc. There are a number of technical reasons for that, but most importantly, an image can go well beyond the capacity of a Realm field. Additionally, it can significantly impact performance (especially in a syncing use case).
If this is a local-only app, store the images on disk on the device and keep a reference to where they are stored (their path) in Realm. That will enable the app to be fast and responsive with a minimal footprint.
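As a minimal sketch of that pattern in Realm Java (the same idea applies in any Realm SDK; model and method names are illustrative):

import java.io.File;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import io.realm.Realm;
import io.realm.RealmObject;

// The model stores only the path, never the image bytes.
public class ReportImage extends RealmObject {
    private String filePath;

    public String getFilePath() { return filePath; }
    public void setFilePath(String filePath) { this.filePath = filePath; }
}

// Elsewhere, e.g. in a helper class: copy the picker's temporary file
// into app-owned storage, then record the permanent path in Realm.
static void saveImage(Realm realm, File tempFile, File appImagesDir) throws Exception {
    File permanent = new File(appImagesDir, tempFile.getName());
    Files.copy(tempFile.toPath(), permanent.toPath(), StandardCopyOption.REPLACE_EXISTING);

    realm.executeTransaction(r -> {
        ReportImage image = r.createObject(ReportImage.class);
        image.setFilePath(permanent.getAbsolutePath());
    });
}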
If this is a synced solution where you want to share images across devices or with other users, there are several cloud-based solutions to accommodate image storage; you then store a URL to the image in Realm.
One option, part of the MongoDB family of products (which also includes MongoDB Realm), is called GridFS. Another option, a solid product we've leveraged for years, is Firebase Cloud Storage.
Now that I've made those statements, I'll backtrack just a bit and refer you to the article Realm Data and Partitioning Strategy Behind the WildAid O-FISH Mobile Apps, which is a fantastic piece about implementing Realm in a real-world application, and in particular about how to deal with images.
In that article, note that they do store the images in Realm for a short time. However, one thing they left out (which was revealed in a forum post) is that the images are compressed to ensure they don't go above the Realm field size limit.
I am not totally on board with general use of that technique but it works for that specific use case.
One more note: the image sizes mentioned in the question are pretty small (500 to 800 KB), and that's a tiny amount of data which would really not have an impact, so storing them in Realm as a data object would work fine. The caveat is future expansion: if you decide later to store larger images, it would require a complete rewrite of the code, so why not plan for that up front?
It might be really simple, but I cannot figure it out. I made a little script that is able to upload some photos/videos using the REST API.
Since they all count towards the storage quota (15 GB free), I would like to query the remaining free space.
How do I do this?
You can retrieve your storage quota with the method About:get, setting the query parameter fields to storageQuota.
This will give you the following information:
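The response contains a storageQuota object shaped like the following (the byte values here are illustrative placeholders, not my real numbers):

{
  "storageQuota": {
    "limit": "16106127360",
    "usage": "14495514624",
    "usageInDrive": "10737418240",
    "usageInDriveTrash": "0"
  }
}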
Your remaining quota can be calculated as "limit" - "usage"; in my case that's around 1.5 GB, which agrees with the information I can see in the user interface in Google Drive and Google Photos.
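For completeness, the same call through the Drive v3 Java client looks roughly like this; driveService is assumed to be an already-authorized Drive instance:

import com.google.api.services.drive.Drive;
import com.google.api.services.drive.model.About;

About about = driveService.about().get()
        .setFields("storageQuota")
        .execute();

// Note: limit can be absent for accounts with unlimited storage.
About.StorageQuota quota = about.getStorageQuota();
long remaining = quota.getLimit() - quota.getUsage();
System.out.println("Remaining bytes: " + remaining);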
I would like to be able to display a disk space usage breakdown chart similar to the one used in the System Information app built into Mac OS X (see image below). I've searched but have been unable to find an API which returns any detailed breakdown. The best I can find is the total disk space used.
As far as I can tell, the data in the screenshot (which actually looks incorrect in this example) is not calculated by sizing the default Music, Movies, Photos and Application folders. It does seem to add up the data used by specific file types.
Perhaps they are using the Metadata APIs and customizing the search a bit?
That's what I've used in the past to get a breakdown of certain file types:
https://developer.apple.com/library/mac/documentation/Carbon/Conceptual/SpotlightQuery/Concepts/Introduction.html
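If rough numbers are enough, one hedged way to tap Spotlight from Java is to shell out to its command-line front end, mdfind, and total file sizes per content type; the query string and the grouping by kMDItemContentTypeTree are assumptions based on the Spotlight docs above:

import java.io.BufferedReader;
import java.io.File;
import java.io.InputStreamReader;

// Total the on-disk size of all files matching one content-type tree,
// e.g. "public.movie" for the Movies slice of the chart.
static long bytesForContentType(String uti) throws Exception {
    Process find = new ProcessBuilder(
            "mdfind", "kMDItemContentTypeTree == '" + uti + "'").start();
    long total = 0;
    try (BufferedReader paths = new BufferedReader(
            new InputStreamReader(find.getInputStream()))) {
        String path;
        while ((path = paths.readLine()) != null) {
            total += new File(path).length();
        }
    }
    return total;
}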