I have an app that creates reports containing some data and images (min. 1 image, max. 6). The reports stay saved in my app until the user sends them to the API (which can happen on the same day the report was registered, or a week later).
But my question is: what's the proper way to store these images (I'm using Realm)? Should I save the path (URI) or a base64 string? My current version keeps a base64 string for each image (500 to 800 KB per image), and after the user sends the reports to the API, I delete the base64 strings.
I was developing a way to save the path to the image and then display it. But the URI returned by image-picker is temporary, so I need to copy the file to another place and then save that path. Doing so means that, for about 2 or 3 days, each image is stored twice on the phone (using up storage).
So before I develop all this, I was wondering: will copying the image to another path and saving that path be more performant than saving the base64 string on the phone, or shouldn't it make much difference?
I try to avoid text only answers; including code is best practice but the question about storing images comes up frequently and it's not really covered in the documentation so I thought it should be addressed at a high level.
Generally speaking, Realm is not a solution for storing blob-type data such as images, PDFs, etc. There are a number of technical reasons for that, but most importantly, an image can go well beyond the capacity of a Realm field. Additionally, it can significantly impact performance (especially in a syncing use case).
If this is a local-only app, store the images on disk on the device and keep a reference to where they are stored (their path) in Realm. That will enable the app to be fast and responsive with a minimal footprint.
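As a rough, language-agnostic illustration of the "file on disk, path in the database" idea, here is a minimal Python sketch. The function names `persist_image` and `delete_image` and the directory layout are assumptions made for this example; they are not part of Realm or of any image-picker library. The returned path is what would be stored in the database record.

```python
import shutil
import uuid
from pathlib import Path


def persist_image(temp_path: str, storage_dir: str) -> str:
    """Copy a temporary image into storage_dir and return the new path."""
    source = Path(temp_path)
    target_dir = Path(storage_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # A random name avoids collisions between picked files that share a name.
    target = target_dir / f"{uuid.uuid4().hex}{source.suffix}"
    shutil.copy2(source, target)
    return str(target)


def delete_image(stored_path: str) -> None:
    """Remove the stored copy, e.g. once the report has been uploaded."""
    Path(stored_path).unlink(missing_ok=True)
```

Deleting the copy after a successful upload keeps the extra storage use limited to the period during which the report is still pending.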
If this is a sync'd solution where you want to share images across devices or with other users, there are several cloud based solutions to accommodate image storage and then store a URL to the image in Realm.
One option, GridFS, is part of the MongoDB family of products (which also includes MongoDB Realm). Another option is Firebase Cloud Storage, a solid product we've leveraged for years.
Now that I've made those statements, I'll backtrack just a bit and refer you to the article "Realm Data and Partitioning Strategy Behind the WildAid O-FISH Mobile Apps", which is a fantastic article about implementing Realm in a real-world application and, in particular, how to deal with images.
In that article, note they do store the images in Realm for a short time. However, one thing they left out of that (which was revealed in a forum post) is that the images are compressed to ensure they don't go above the Realm field size limit.
I am not totally on board with general use of that technique but it works for that specific use case.
One more note: the image sizes mentioned in the question are pretty small (500 to 800 KB). That's a tiny amount of data which would really not have an impact, so storing them in Realm as a data object would work fine. The caveat is future expansion: if you later decide to store larger images, it would require a complete rewrite of the code, so why not plan for that up front?
I have created a Custom Image Recognition collection on IBM Cloud and am using it in my Django website to do the processing. However, I noticed that the response time ranges from 6 to 14 seconds.
I want to reduce this turnaround time. I am already zipping the image file that I send. While going through the API reference document here on IBM Cloud, I noticed that there is a method called "get_model_file" which downloads the collection file to a local space.
But there is no documentation on how this can be used. Has anyone successfully implemented this? Or am I missing something here?
However, I noticed that the response time ranges from 6 to 14 seconds.
I want to reduce this turnaround time. I am already zipping the image file that I send.
How many images at a time are you sending in the zip file to the /analyze endpoint? If you are just sending one image at a time, you should not bother zipping it. Also, if you can, you should parallelize your code so that you make 1 request per image, rather than sending, say, 6 images in a single zip file. This will reduce the latency.
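A minimal Python sketch of the "one request per image, in parallel" idea follows. The `classify_one` callable is a placeholder for whatever function performs a single request to the service; it is an assumption of this example, not a call from the IBM SDK.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def classify_in_parallel(
    image_paths: List[str],
    classify_one: Callable[[str], dict],
    max_workers: int = 6,
) -> Dict[str, dict]:
    """Send one request per image concurrently and collect results by path."""
    if not image_paths:
        return {}
    workers = min(max_workers, len(image_paths))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves input order, so results line up with image_paths.
        results = list(pool.map(classify_one, image_paths))
    return dict(zip(image_paths, results))
```

Threads fit here because the work is network-bound: each worker spends most of its time waiting for a response.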
Using the v4 API, by the way, you should resize your images to no more than 300 pixels in either width or height. In fact, you can "squash" the aspect ratio to square and it will not affect the outcome. The service will do this resizing internally anyhow, but if you do it on the client side, you save network transmission and decoding time.
With a single image at a time, if your resolution is under 300x300 pixels, you should have latency under 1.5 seconds on a typical call, including your network transmission time.
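To make the 300-pixel guideline concrete, here is a small Python sketch that only computes target dimensions. It keeps the aspect ratio, although the answer notes that squashing to a square also works. The function name `fit_within` is made up for this example, and the actual resizing would be done by an imaging library of your choice.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_side: int = 300) -> Tuple[int, int]:
    """Return dimensions scaled down so that neither side exceeds max_side."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # Never scale up: images that are already small enough are left as they are.
    scale = min(1.0, max_side / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))
```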
As the documentation states
Currently, the model format is specific to Android apps.
So unless you are creating an Android App then this is not going to work for you.
You probably have two areas of latency. First will be from the browser to your Django app. Second will be from your Django app to the Visual Recognition service. I am not sure where you have hosted the Django app, but if you locate it in the same region (data centre would be even better) you might be able to reduce part of the latency.
I have built a Solr index that holds image thumbnail URLs, and I want to render an image alongside each search result. The problem is that those images can run into the millions, and I think storing them in the index as binary data would make the index humongous.
I am seeking guidance on how to efficiently store those images after rendering them from the URLs: should I use the plain file system and have them served by Tomcat, or should I use a JCR repository like Apache Jackrabbit?
Any guidance would be greatly appreciated.
Thank You.
I would evaluate the actual requirements before finally deciding how to persist the images.
Do you require versioning?
Are you planning to store only the images or additional metadata?
Do you have any requirements in horizontal scaling?
Do you require any image processing or scaling?
Do you need access to the image metadata?
Do you require additional tooling for managing the images?
Are you willing to invest time in learning an additional technology?
Storing the images on the file system and making them available through an image spooler implementation is the simplest way to persist them.
But if you identify some of the above-mentioned requirements (which are typical for a content repository or a DAM system), then you would end up reinventing the wheel with the filesystem approach.
The other option is to use a content repository. A JCR repository such as Jackrabbit, or its commercial implementation CRX, is one option. Alfresco (which supports CMIS) would be another valid one.
Features like versioning, post-processing (scaling ...), and metadata extraction and management are supported by both of the mentioned repository solutions. But this requires you to learn a new technology, which can be time consuming. Both repository technologies can get complex.
If horizontal scaling is a requirement, I would consider a commercially supported repository implementation (CRX or Alfresco Enterprise), because the community releases lack this functionality.
Personally, I would base any decision on the above-mentioned requirements.
I have worked extensively with Jackrabbit, CRX, and Alfresco CE and EE, and personally I would go for Alfresco, as in my experience it scales better with larger amounts of data.
I'm not aware of an image pooling solution that fits your needs exactly, but it shouldn't be too difficult to implement one, apart from the fact that recurring scaling operations may be very resource intensive.
I would go for the following approach if FS is enough for you:
Separate images and thumbnails into two locations. The images root folder will remain; the thumbnails folder is temporary.
Create a temporary thumbnail folder for each indexing run. All thumbnails for that run are stored under that location; scaling can be achieved with e.g. ImageMagick.
The temporary thumbnail folder can then easily be dropped as soon as the next run has been completed.
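The folder handling described in these steps could be sketched in Python roughly as follows. The names `start_run` and `drop_old_runs` and the idea of naming each folder after a run id are assumptions for illustration; the thumbnail generation itself (for example with ImageMagick) is left out.

```python
import shutil
from pathlib import Path
from typing import List


def start_run(thumb_root: str, run_id: str) -> Path:
    """Create a fresh thumbnail folder for one indexing run."""
    run_dir = Path(thumb_root) / run_id
    # exist_ok=False makes an accidental reuse of a run id fail loudly.
    run_dir.mkdir(parents=True, exist_ok=False)
    return run_dir


def drop_old_runs(thumb_root: str, keep_run_id: str) -> List[str]:
    """Delete every run folder except the one that just completed."""
    removed = []
    for entry in sorted(Path(thumb_root).iterdir()):
        if entry.is_dir() and entry.name != keep_run_id:
            shutil.rmtree(entry)
            removed.append(entry.name)
    return removed
```

Calling `drop_old_runs` only after the new run has finished means there is always one complete set of thumbnails available to serve.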
If you are planning to store millions of images, then avoid putting all files in the same directory. Browsing flat hierarchies with too many entries will be a nightmare.
Better to create a tree structure, e.g. from the current datetime, most significant part first (year/month/day/hour/minute ... 2013/06/01/08/45).
This makes sure that the number of files inside the last folder does not get too big (Alfresco uses the same pattern for storing binary objects on the FS, and it has proven to work nicely).
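A minimal Python sketch of such a datetime-based layout follows. The function name `dated_path` and the `images` root used in the usage line are hypothetical.

```python
from datetime import datetime
from pathlib import PurePosixPath


def dated_path(root: str, filename: str, when: datetime) -> str:
    """Build root/year/month/day/hour/minute/filename for the given time."""
    # Zero-padded components keep the folders sorted chronologically.
    return str(PurePosixPath(root) / when.strftime("%Y/%m/%d/%H/%M") / filename)
```

For example, `dated_path("images", "a.jpg", datetime(2013, 6, 1, 8, 45))` yields `images/2013/06/01/08/45/a.jpg`.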
I'm new to Core Data and I'm working on my first personal iOS app.
I have an entity, let's call it Car, which has a thumbnail as well as a gallery of other images associated with it. The data is synced to an online service using ASIHTTPRequest and JSONKit. The app doesn't need to create new Cars, just display them.
The thumbnail could be around 100kB so I may store that as blob data within the Car entity.
However I'm not sure how I should store the other multiple images?
The images would be around 800kB to 1MB each, so storing them in the Core Data store doesn't seem to be recommended.
The only options I can think of are:
Store the url of each photo within another entity CarImage and rely on ASIHTTPRequest's cache.
Create a folder structure, save each image into its corresponding Car's folder, and keep references to the file path in the CarImage entity.
Because the data is synced, there is the potential for Cars to be deleted, so images in folders would have to be deleted as well. I can see this getting out of hand pretty quickly.
I would appreciate any advice. Thanks.
I'd take your first option.
Regarding the images that would have to be deleted: isn't that taken care of automatically by ASIHTTPRequest's cache, once they expire? At least that's what I'd expect from a cache...
I'd go with the first option. I've done something similar in the past, though I actually did store the image binary data in Core Data as well. I wouldn't recommend storing the data, though, as this caused problems for me - just rely on ASIHTTPRequest's cache.
I'm building a CMS-type webapp that allows users to enter arbitrary-sized blocks of HTML. These blocks are entered by the user in their admin area and inserted into their template of choice when a page is delivered.
I'm guessing a user is not going to add more than 50-100 blocks and I'm not going to be getting more than 1000 users any time soon.
I was planning on using mySQL's LONGTEXT type to store these but I'm wondering if storing files in a directory will be more performant as the Linux OS will cache them? Given that I'm building for at most (1000 * 100) text blocks is there any reasonable performance worry with using mySQL?
Obviously I will be caching the HTML before delivery so I won't be reading these blocks on every delivery - reads will only occur when someone updates/creates new content.
I could use memcached/other cache/noSQL implementation or some other storage mechanism but I'm focusing on keeping it simple and delivering ASAP so don't want to introduce other stuff that I don't have experience with unless there's a significant performance worry.
Are the blocks of HTML content the only thing you are saving? If so, a file may be easiest.
However, it seems likely that you may want to save other bits of information along with the HTML and be able to query based on those bits of data. For example: date created, date last modified, name of the block, the user(s) who have edited the block.
If this is the case, then a database may be the best way to go. Since you said you do not expect to have many users (at least not at first), I would concentrate on finding the solution that is the fastest and most flexible to program, and focus on performance and caching after your website begins to grow in size.
I advise you to use a flat file rather than Mysql to store this kind of data.
HTML is more a "file" than a "value", so it doesn't have to be in a DB.
Moreover, you will most likely get better performance.
You can also read this post.
I am currently building a CMS that needs to save a lot of pictures per article. I have a lot of questions :-)
I need to show the pictures in a few sizes, with or without a watermark. In addition, I need to keep the original picture too, for archive and admin purposes. What I plan to do right now is save the pictures in the database in two versions: 1. the original picture, 2. a web-optimized version.
It is really convenient to save all the images in a table. But is it really a good idea? Let's say the database will contain a hundred thousand pictures, with an original picture size of probably around 3MB, so the db can easily be 100TB in size.... Is this really a good strategy?
On the other hand, I save a smaller version of each picture. This version needs to be shown in a few sizes, with and without a watermark. Currently I plan to do this on each request: the request will carry a width parameter, and according to it I can decide the size and the watermark (I'll cache this work, of course). Again, is this a good strategy? Is it really going to work, or is this very expensive extra work?
Is it really better to save this in the db? I mean, each request for an article will need around 50 further requests for its images, and each request requires opening and closing a connection to the database.
Technologies that I am going to use: .NET, SQL Server 2008, NHibernate.
The best approach would be to store the images in the filesystem and their ids in the database, for performance and maintenance reasons. Backing up and restoring would be much easier on the filesystem, and pushing the DBMS to do such work is not the best idea: you would need to transfer the images from the db to the application and then push them to the client. I just believe that's not its job. Put a lighttpd daemon or something similar in place for image hosting and let it do its job.
But if you like the idea, since you are going with SQL Server 2008, you can use FILESTREAM to store your images in your tables. It will create files in a storage location that you choose and store the binary data in the filesystem while providing transactional features and data integrity, which is a big bonus. Take a look at that option. As I remember, it performs well and the actual database will be much more compact.
About the dynamic resizing: I say avoid that. Storage is cheaper than CPU time, so just create a variety of thumbnails and watermarked versions at upload time, store them once somewhere, and use them when required. Do not perform the same operations again and again. Alternatively, you may do that on the first request for a resized version; this way it will be easier to add new versions or to purge the cache periodically to remove unused files. You will also be able to back up just the original versions.
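The "generate on first request, then reuse" variant could look roughly like the following Python sketch. The file naming scheme and the `render` callback (which stands in for the real resize and watermark code) are assumptions for illustration only.

```python
from pathlib import Path
from typing import Callable


def variant_path(cache_dir: str, image_id: str, width: int, watermark: bool) -> Path:
    """Derive a stable file name for one size/watermark combination."""
    suffix = "wm" if watermark else "plain"
    return Path(cache_dir) / f"{image_id}_{width}_{suffix}.jpg"


def get_variant(
    cache_dir: str,
    image_id: str,
    width: int,
    watermark: bool,
    render: Callable[[str, int, bool], bytes],
) -> bytes:
    """Return the cached variant, rendering and storing it on first use."""
    path = variant_path(cache_dir, image_id, width, watermark)
    if path.exists():
        return path.read_bytes()
    # The expensive work happens at most once per variant.
    data = render(image_id, width, watermark)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return data
```

Because the cache directory holds only derived files, it can be purged at any time and left out of backups.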
Putting the images in the database has a couple of advantages. ACID transactions and backup consistency come to mind. If you absolutely need that, then put the images in the database. As you pointed out, this comes at a price: you'll need a huge database infrastructure (machines, licenses, an operations team). Each image retrieval is a huge DB I/O effort.
A lot of things will be much easier with only storing metadata in the DB and putting the image blobs on a filesystem.
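A minimal sketch of "metadata in the DB, blobs on the filesystem" follows, written in Python with SQLite purely as a stand-in for the .NET and SQL Server stack from the question. The table layout and the function names are assumptions for illustration.

```python
import sqlite3
import uuid
from pathlib import Path


def open_catalog(db_path: str) -> sqlite3.Connection:
    """Open the metadata database and make sure the images table exists."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS images ("
        "id TEXT PRIMARY KEY, article_id INTEGER NOT NULL, file_path TEXT NOT NULL)"
    )
    return conn


def store_image(conn: sqlite3.Connection, image_root: str, article_id: int, data: bytes) -> str:
    """Write the bytes to disk and record only the id and path in the database."""
    image_id = uuid.uuid4().hex
    path = Path(image_root) / f"{image_id}.bin"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO images (id, article_id, file_path) VALUES (?, ?, ?)",
            (image_id, article_id, str(path)),
        )
    return image_id


def load_image(conn: sqlite3.Connection, image_id: str) -> bytes:
    """Look up the path in the database and read the bytes from disk."""
    row = conn.execute(
        "SELECT file_path FROM images WHERE id = ?", (image_id,)
    ).fetchone()
    if row is None:
        raise KeyError(image_id)
    return Path(row[0]).read_bytes()
```

Note that the file write and the database insert are not one atomic operation here, which is exactly the consistency trade-off mentioned above.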
Two approaches to come to a decision:
What is the killer feature you absolutely (absolutely as in "if I don't have that, the whole thing will not work at all") need from the image-in-database approach? If there is one, go for it.
Do a back-of-the-napkin business case, calculating the total cost of the image-in-database approach (project efforts, infrastructure, machine, license, operation) and compare that with an image-in-filesystem approach. That should give some hints on how to proceed.