Should images come from the DB or the Content\Images folder? - asp.net-mvc-3

I am developing an eCommerce website in ASP.NET MVC 3 in C#, using SQL Server 2008 R2. My question is: if I have 5 images that I want to show in a grid view with thumbnails (e.g. something like the Amazon website, which gives customers a couple of pictures to look at), would it be advisable for the images to come from the database, or should they reside in the Content\Images folder? There are quite a few sub-categories within sub-categories in my db design. What is the most common route for a professional developer to follow? I know there are a few options for third-party tools like jQuery & Telerik Extensions, so I will use them.
Thanks

From my experience and research, it is better to put them in a folder/content structure. Yes, there are security concerns with opening directories to the public, but if you instead upload the files via FTP dynamically, those problems are solved. I have heard horror stories about storing files in a database, and have seen the issues come up, but have also resolved them. Basically, it is easier to write to the database, and it avoids the security issue of opening up a directory to the public; just make sure to regularly check your backups to confirm the files are not corrupt, or make sure the data is on a failover cluster where that will never be a problem.
So, in summary: the database is fine as long as you regularly verify backups (by restoring them) to confirm the files are not corrupt, or you run on a failover cluster. Otherwise, just go with the typical folder/content structure, but use FTP to upload the files so there are no directories open to the public.
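For what it's worth, both options are only a few lines in ASP.NET MVC. Here is a minimal sketch, assuming a hypothetical ProductImages table with a varbinary(max) column for the database case; the names, connection string, and MIME type are all illustrative:

```
using System.Web.Mvc;
using System.Data.SqlClient;

public class ProductImagesController : Controller
{
    // Option 1: image bytes stored in the database (e.g. a varbinary(max) column).
    public ActionResult FromDatabase(int id)
    {
        using (var conn = new SqlConnection("...connection string..."))
        using (var cmd = new SqlCommand(
            "SELECT ImageData FROM ProductImages WHERE Id = @id", conn))
        {
            cmd.Parameters.AddWithValue("@id", id);
            conn.Open();
            var bytes = (byte[])cmd.ExecuteScalar();
            return File(bytes, "image/jpeg");   // streamed to the browser
        }
    }

    // Option 2: image lives under ~/Content/Images and is referenced by filename.
    public ActionResult FromContentFolder(string fileName)
    {
        // NOTE: validate fileName against a whitelist to avoid path traversal.
        var path = Server.MapPath("~/Content/Images/" + fileName);
        return File(path, "image/jpeg");        // served straight from disk
    }
}
```

In practice, for the Content\Images case you would usually just render an `<img src="/Content/Images/...">` tag and let IIS serve the file directly, without going through MVC at all.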

For me, the best answer to this question is this: To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem
Summary: Application designers often face the question of whether to store large objects in a filesystem or in a database. Often this decision is made for application design simplicity. Sometimes, performance measurements are also used. This paper looks at the question of fragmentation – one of the operational issues that can affect the performance and/or manageability of the system as deployed long term. As expected from the common wisdom, objects smaller than 256K are best stored in a database while objects larger than 1M are best stored in the filesystem. Between 256K and 1M, the read:write ratio and rate of object overwrite or replacement are important factors. We used the notion of “storage age” or number of object overwrites as way of normalizing wall clock time. Storage age allows our results or similar such results to be applied across a number of read:write ratios and object replacement rates.
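If you want to encode the paper's rule of thumb directly, it is a one-liner; the thresholds below come from the abstract above, and the middle band is deliberately left as a judgment call based on your read:write ratio:

```
enum ImageStore { Database, FileSystem, DependsOnWorkload }

static class BlobRule
{
    // Thresholds from the paper's abstract; the middle band depends on workload.
    public static ImageStore Choose(long sizeInBytes)
    {
        if (sizeInBytes < 256 * 1024) return ImageStore.Database;     // objects < 256 KB
        if (sizeInBytes > 1024 * 1024) return ImageStore.FileSystem;  // objects > 1 MB
        return ImageStore.DependsOnWorkload;                          // 256 KB .. 1 MB: measure
    }
}
```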


Store images locally vs Cloudinary vs S3

The setup:
A blog with posts, built with Laravel, where:
Every post can have a max of 1 image (nullable).
The max number of posts in the blog is 1000. Let's assume there are 1000 posts for the purposes of this discussion.
Every post has a comment section, where registered users can comment and include an image in their comment. Let's assume every post has 2 images in the comments.
So in total that comes to 3000 images* that need to be stored (and resized, I guess), presented, etc.
This is the expected amount in the long run; I'm not looking for a "scalable" solution, since there is not going to be crazy exponential growth.
*In reality it is currently less, and I assume that at these amounts of media files it doesn't really matter whether it's 1000/1500/2000 or 3000. Correct me if that's wrong.
A few extra things to note:
I'm hosting it in shared hosting (I can store up to 300k files).
I want it to be secure, so no malicious file gets uploaded under the cover of an image file.
I'm looking for a budget solution (so if S3 starts charging heavily after 12 months, that makes it irrelevant), preferably free.
So the dilemma is between storing all the images locally in the Storage folder (manipulating them with some Laravel package), or using Cloudinary, which I don't know much about, just that it lets me store/manipulate/back up the images and use their API to present the images I stored there.
If I choose to do it locally, is it safe to store a user-uploaded image locally? How do I make sure it's not malware in disguise of an image file?
With this amount of images/content, can storing locally cause performance issues on the shared hosting?
What would be the advantages of using cloudinary for me?
Thanks.
Cloudinary can actually help a lot in this case.
Instead of storing the resources locally and writing something up to manipulate them, you could integrate Cloudinary in the project.
This would free up server space. Storing images locally may or may not impact performance, depending on the architecture, but freeing server resources is always a good practice.
Also, manipulation and delivery of images can be done on the fly when first requested (or eagerly, before they are requested, if you want) with a simple API call. So you don't need to build something new; you can leverage an already existing API.
Cloudinary also has a fully-featured free tier that you could use. If you don't expect exponential growth at the moment, that tier would be more than enough for the project.
Full disclosure: I currently work at Cloudinary (but the above still holds :) ).

When using ASP.NET Core MVC + EF Core + the ability to encrypt files, what are the differences between Blob, FileStream & File System for managing files?

I am working on an ASP.NET Core MVC web application; the application is a document management workflow, where inside each of the workflow steps users can upload documents, as follows:
Users can upload documents with the following restrictions: a file cannot exceed 5 MB, and all the documents inside a workflow cannot exceed 50 MB unless an admin approves it. They can upload as many documents as they want.
We will have a lot of views which show a step and all the documents attached to it, and users can choose to download the documents.
We can have an unlimited number of workflows; the more users register with our application, the more workflows will be created.
Certain files can be marked as confidential, so they should be encrypted when stored, either inside the database or inside the file system.
We are planning to use EF Core as the data access layer for our web application, plus SQL Server 2016 or 2017.
Now my question is how we should manage our files. I found these 3 approaches:
Blob
FileStream
File system
Now, the first approach will allow us to encrypt the files inside the database and will work with EF, but it has a huge drawback on performance, since opening a file or querying files from the database means they are loaded into the hosting server's memory. Since we are seeking an extensible approach, I think this approach will not work for us, as it is less scalable.
The second approach will have better performance compared to the first approach (Blob), but FILESTREAM is not supported by EF and does not allow encryption, so we have to exclude this as well.
The third approach, storing the files inside a folder named after the workflow ID and storing the link to the file/folder inside the DB, will allow us to encrypt the files, will work with EF, and has better performance compared to Blob (not sure if this also holds against FILESTREAM). The only drawback is that we cannot achieve atomicity between the files and their related records inside the database, but with some extra code we can handle this ourselves: for example, deleting a database record will delete all its documents inside the folder, and we can add background jobs to make sure all the documents have database records, and otherwise delete the documents.
So based on the above, I find that the third approach is the best fit for our needs. Can anyone advise on this, please? Are my assumptions correct? And is there a fourth approach, or a hybrid approach, that would be a better fit for us?
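To make the third approach concrete, here is a rough sketch of what the upload path could look like with EF Core. The entity, folder layout, and the use of AES here are illustrative assumptions, not a prescription:

```
using System.IO;
using System.Security.Cryptography;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Http;
using Microsoft.EntityFrameworkCore;

// Hypothetical entity: the DB stores only metadata and the path, never the bytes.
public class WorkflowDocument
{
    public int Id { get; set; }
    public int WorkflowId { get; set; }
    public string FileName { get; set; }
    public string StoragePath { get; set; }
    public bool IsConfidential { get; set; }
}

public class DocumentStore
{
    private readonly DbContext _db;      // your real context type goes here
    private readonly byte[] _key, _iv;   // in practice: fresh IV per file, stored on the record
    private readonly string _root;       // a folder outside wwwroot

    public DocumentStore(DbContext db, byte[] key, byte[] iv, string root)
    {
        _db = db; _key = key; _iv = iv; _root = root;
    }

    public async Task SaveAsync(int workflowId, IFormFile upload, bool confidential)
    {
        var folder = Path.Combine(_root, workflowId.ToString());
        Directory.CreateDirectory(folder);
        var path = Path.Combine(folder, Path.GetRandomFileName());

        using (var target = File.Create(path))
        {
            if (confidential)
            {
                // Encrypt confidential files on the way to disk.
                using (var aes = Aes.Create())
                using (var crypto = new CryptoStream(
                    target, aes.CreateEncryptor(_key, _iv), CryptoStreamMode.Write))
                {
                    await upload.CopyToAsync(crypto);
                }
            }
            else
            {
                await upload.CopyToAsync(target);
            }
        }

        _db.Set<WorkflowDocument>().Add(new WorkflowDocument
        {
            WorkflowId = workflowId,
            FileName = upload.FileName,
            StoragePath = path,
            IsConfidential = confidential
        });
        await _db.SaveChangesAsync(); // if this fails, delete the file to stay consistent
    }
}
```

The last comment is exactly the atomicity issue described above: if SaveChangesAsync fails, the file on disk has to be cleaned up, either immediately or by the background job mentioned earlier.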
Although modern RDBMSs have been optimised for data storage with the perks of integrity and atomicity, databases should be considered the last alternative for file storage (Stack Overflow posts like this and this corroborate the above), and therefore the third option mentioned, or an improvement thereof, gets my vote.
For instance, a potential improvement would be to store the files renamed to a hash of their content and store the hash in the database, which eliminates the OS restrictions on subdirectories/files, filenames, and paths. Moreover, with a well-structured directory layout, duplicates could be filtered out.
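As a small illustration of the hash-naming idea (the hash choice and directory fan-out are assumptions): hash the uploaded bytes, use the hash as the filename, and derive subdirectories from its prefix so no single folder grows too large. The database then stores the hash or the relative path alongside the original, user-visible filename.

```
using System;
using System.IO;
using System.Security.Cryptography;

public static class HashedFileStore
{
    // Returns the relative path the file was stored under, e.g. "ab/cd/abcd1234....bin".
    public static string Save(string rootFolder, byte[] content)
    {
        string hash;
        using (var sha = SHA256.Create())
            hash = BitConverter.ToString(sha.ComputeHash(content))
                               .Replace("-", "").ToLowerInvariant();

        // Two-level fan-out keeps directory listings small.
        var folder = Path.Combine(rootFolder, hash.Substring(0, 2), hash.Substring(2, 2));
        Directory.CreateDirectory(folder);

        var fullPath = Path.Combine(folder, hash + ".bin");
        if (!File.Exists(fullPath))   // identical content = identical hash = free de-duplication
            File.WriteAllBytes(fullPath, content);

        return Path.Combine(hash.Substring(0, 2), hash.Substring(2, 2), hash + ".bin");
    }
}
```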
User-defined database functions can aid in achieving atomicity, which removes the need for background jobs. An excellent guide on UDFs, particularly for accessing the filesystem and invoking an executable, can be found here.

Core Data cloud sync - need help with logic

I'm in the middle of brainstorming a cloud sync solution for a Core Data app that I am currently developing. I'm planning to open source the code for this once it's done, for anyone to use with their Core Data apps, so input from the community on how this system should work is much appreciated :-) Here's what I'm thinking:
Server Side
Storage Provider
As with all cloud sync systems, storage is a major piece of the puzzle. There are many ways to handle this. I could set up my own server for storage, or use a service like Amazon S3, but because I'm starting out with $0 capital, at this moment, a paid storage solution isn't a viable option. After some thought, I decided to settle with Dropbox (an already well established cloud sync application and storage provider). The pros of using Dropbox are:
It's free (for a limited amount of space)
In addition to being a storage service, it also handles cloud sync
They recently released an Objective-C SDK which makes it much easier to interface with it in Mac and iPhone apps
In case I decide to switch to a different storage provider in the future, I intend to add "services" to this cloud sync framework, basically allowing anyone to create a service class to interface with their choice of storage provider, which can then simply be plugged into the framework.
Storage Structure
This is a really difficult part to figure out, so I need as much input as I can here. I've been thinking about a structure like this:
CloudSyncFramework
======> [app name]
==========> devices
=============> (device id)
================> deviceinfo
================> changeset
==========> entities
=============> (entity name)
================> (object id)
A quick explanation of this structure:
The master "CloudSyncFramework" (name undecided) folder will contain separate folders for each app that uses the framework
Each app folder contains a devices folder and an entities folder
The devices folder will contain a folder for each device that is registered with the account. The device folder will be named according to the device ID, obtained using something like [[UIDevice currentDevice] uniqueIdentifier] (on iOS) or a serial number (on Mac OS).
Each device folder contains two files: deviceinfo and changeset. deviceinfo contains information about the device (e.g. OS version, last sync date, model, etc.) and the changeset file contains information about objects that have changed since the device last synchronized. Both files will just be simple NSDictionaries archived into files using NSKeyedArchiver.
Each Core Data entity has a subfolder under the entities folder
Under each entity folder, every object that belongs to that entity will have a separate file. This file will contain a JSON dictionary with the key-value pairs.
Simultaneous Sync
This is one of the areas where I am almost completely clueless. How would I handle 2 devices connecting and syncing with the cloud at the same time? There seems to be a high risk of things getting out of sync here, or even data corruption.
Handling migrations
Once again, another clueless area here. How would I handle migrations of the Core Data managed object model? The easiest thing to do here seems to be just to wipe the cloud data store clean and upload a new copy of the data from a device which has undergone the migration process, but this seems somewhat risky, and there may be a better way.
Client Side
Converting NSManagedObjects into JSON
Converting attributes into JSON isn't a very hard task (there's lots of code for it floating around the web). Relationships are the key problem here. In this Stack Overflow post, Marcus Zarra posts code in which the relationship objects themselves are added to the JSON dictionary. However, he mentions that this can cause an infinite loop depending on the structure of the model, and I'm not sure if this would work with my method, because I store each object as an individual file.
I've been trying to find a way to get an ID as a string for an NSManagedObject. Then I could save relationships in JSON as an array of IDs. The closest thing I found was [[managedObject objectID] URIRepresentation], but this isn't really an ID for an object, it's more of a location for the object in the persistent store, and I don't know if it's concrete enough to use as a reference for an object.
I suppose I could generate a UUID string for each object and save it as an attribute, but I'm open for suggestions.
Syncing changes to the cloud
The first (and still best) solution that popped into my head for this was to listen for the NSManagedObjectContextObjectsDidChangeNotification to get a list of changed objects, then update/delete/insert those objects in the cloud data store. After the changes have been saved, I would need to update the changeset file for every other registered device to reflect the newly changed objects.
One problem that comes up here is: how would I handle a failed or interrupted sync? One idea I have is to first push changes to a temporary directory on the cloud, then, once that has been confirmed as successful, merge it with the master data on the cloud, so that an interruption in the middle of the sync won't corrupt data. Then I would save records of the objects that need to be updated in the cloud into a plist file or something, to be pushed the next time the app is connected to the internet.
Retrieving changed objects
This is fairly simple, the device downloads its changeset file, figures out which objects need to be updated/inserted/deleted, then acts accordingly.
And that sums up my thoughts for the logic that this system will use :-) Any insight, suggestions, answers to problems, etc. is greatly appreciated.
UPDATE
After lots of thinking, and reading TechZen's suggestions, I have come up with some modifications to my concept.
The largest change I've thought up is to make each device have a separate data store in the cloud. Basically, every time the managed object context saves (thanks TechZen), it will upload the changes to that device's data store. After those changes are updated, it will create a "changeset" file with change details, and save it into the changeset folders of the OTHER devices that are using the application. When the other devices connect to sync, they will go through the changeset folder and apply each changeset to the local data store, then update their respective data stores in the cloud as well.
Now, if a new device is registered with the account, it will find the newest copy of the data out of all the devices and download that for use as its local storage. This solves the problem of simultaneous sync and reduces the chances of data corruption, because there is no "central" data store; each device touches only its own data and just applies changes, rather than every device accessing and modifying the same data at the same time.
There are some obvious conflict situations to deal with, mainly in relation to deleting objects. If a downloaded changeset instructs the app to delete an object that is currently being edited, etc., there need to be ways to deal with this.
You want to look at this pessimistic take on cloud sync: Why Cloud Sync Will Never Work.
It covers a lot of the issues that you are wrestling with. Many of them are largely intractable.
It is very, very, very difficult to synchronize information, period. Adding in different devices, different operating systems, different data structures, etc. snowballs the complexity, often fatally. People have been working on variants of this problem since the 70s and things really haven't improved much.
The fundamental problem is that if you leave the system flexible and customizable, then the complexity of synchronizing all the variations explodes exponentially as a function of the number of customizations. If you make it rigid, you can sync, but you are limited in what you can sync.
How would I handle 2 devices connecting and syncing with the cloud at the same time?
If you figure that out, you will be rich. It's a big issue for current cloud sync providers. The real problem here is that you're not "syncing", you're merging. Software sucks at merging because it's very hard to establish a predefined rule set to describe all the possible merges.
The simplest system is to establish either a canonical device or a device hierarchy such that the system always knows which input to choose. This, however, destroys flexibility.
How would I handle migrations of the Core Data managed object model?
The migration of the Core Data model is largely irrelevant to the server. That's something that Core Data manages internally to itself. Model migration updates the model i.e. the entity graph, not the actual data.
Converting NSManagedObjects into JSON
Modeling relationships is hard, especially with tools that don't support it as easily as Core Data does. However, the URI of a permanent managed object ID is supposed to serve as a UUID that nails the object down to a specific location in a specific store on a specific device. It's not technically guaranteed to be universally unique, but it's close enough for all practical purposes.
Syncing changes to the cloud
I think you're confusing implementation details of Core Data with the cloud itself. If you use NSManagedObjectContextObjectsDidChangeNotification, you will trigger network traffic every time the observed context changes, regardless of whether those changes are persisted or not. Depending on the app, this could drive connections thousands of times in a few minutes. Instead, you want to sync only when the context is saved, at most.
One problem that comes up here is, how would I handle a failed or interrupted sync?
You don't commit changes until the sync completes. Interrupted syncs are a big problem and lead to corrupt data. Again, you can have flexibility, complexity and fragility, or inflexibility, simplicity and robustness.
Retrieving changed objects: This is fairly simple, the device downloads its changeset file, figures out which objects need to be updated/inserted/deleted, then acts accordingly.
It's only simple if you have an inflexible data structure. Describing changes to a flexible data structure is a nightmare.
Not sure if I have helped any. None of the problems have elegant solutions. Most designers end up with rigidity and/or slow, brute-force iterative merging.
Take a serious look at RestKit.
It is an open source project that aims to help with integrating iOS apps with cloud data, including but not limited to the scenario where there is a core-data model for that data on the client.
I have recently started to use it in one of my projects and found it to be quite useful. In the core-data scenario, you implement declarative mappings between your data model and the content you GET from and POST to the server, and it takes care of things like injecting objects from the cloud into your client model, posting new objects to the server and incorporating server-generated object IDs into your client-side model, doing all of this in a background thread and taking care of all the core-data context threading issues, and so on.
RestKit is by no means a mature product, but it has a fairly good foundation and quite a few things that could use help from other contributors. Especially if your goal is to create an open source solution, it would be great to contribute to and improve something like this rather than re-invent a new solution. Unless, of course, you see serious differences between what you have in mind and the existing solutions :-)
Since this post was written, several new options have become available. It is possible to develop a solution, and there are apps shipping with these solutions.
Here is a short list of the main Core Data sync options:
Apple's native Core Data/iCloud sync. (Had a rocky start. Seems better now.)
TICDS
Wasabi Sync, a paid service.
Simperium (Seems abandoned.)
ParcelKit with Dropbox Datastore API
Ensembles, the most recent. (Disclosure: I am the founder of the project)
It's like Apple answered my question for me with the announcement of the iCloud SDKs, which come complete with Core Data integration. Win!

Store user profile pictures on disk or in the database?

I'm building an ASP.NET MVC application where users can attach a picture to their profile, but the picture also shows up in other areas of the system, like a messaging gadget on the dashboard that displays recent messages, etc.
When the user uploads these I am wondering whether it would be better to store them in the database or on disk.
Database advantages
Easy to back up the entire database and keep profile content/images with the associated profile/user tables
When I build web services later down the track, they can just pull all the profile-related data from one spot (the database)
Filesystem advantages
loading files from disk is probably faster
any other advantages?
Where do other sites store this sort of information? Am I right to be a little concerned about database performance for something like this?
Maybe there would be a way to cache images pulled out from the database for a period of time?
Alternatively, what about the idea of storing these images in the database, but shadow copying them to disk so the web server can load them from there? This would seem to give both the backup and convenience of a Db, whilst giving the speed advantages of files on disk.
Infrastructure in question
The website will be deployed to IIS on Windows Server 2003 running the NTFS file system.
The database will be SQL Server 2008
Summary
Reading around a lot of related threads here on SO, many people are now trending towards the SQL Server FILESTREAM type. From what I could gather, however (I may be wrong), there isn't much benefit when the files are quite small. FILESTREAM, however, looks to greatly improve performance when files are multiple MBs or larger.
As my profile pictures tend to sit at around ~5 KB, I decided to just leave them stored in the database as varbinary(max).
In ASP.NET MVC I did see a bit of a performance issue returning FileContentResults for images pulled out of the database like this, so I ended up caching the file to disk when it is read, if the location of the file is not found in my application cache.
So I guess I went for a hybrid:
Database storage to make backing up of data easier, with files linked directly to profiles
Shadow copying to disk to allow better caching
At any point I can delete the cache folder on disk, and as the images are re-requested they will be re-copied on first hit and served from the cache thereafter.
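A stripped-down version of that hybrid might look like the following; the controller, table, and cache folder names are hypothetical, and error handling (missing user, null image) is omitted:

```
using System.IO;
using System.Web.Mvc;
using System.Data.SqlClient;

public class AvatarController : Controller
{
    private static readonly string CacheFolder =
        Path.Combine(Path.GetTempPath(), "avatar-cache");   // safe to wipe at any time

    public ActionResult Show(int userId)
    {
        Directory.CreateDirectory(CacheFolder);
        var cachedPath = Path.Combine(CacheFolder, userId + ".jpg");

        // 1. Serve from the disk cache if we have already copied it out.
        if (System.IO.File.Exists(cachedPath))
            return File(cachedPath, "image/jpeg");

        // 2. Otherwise pull the varbinary(max) out of the database...
        byte[] bytes;
        using (var conn = new SqlConnection("...connection string..."))
        using (var cmd = new SqlCommand(
            "SELECT Picture FROM UserProfiles WHERE UserId = @id", conn))
        {
            cmd.Parameters.AddWithValue("@id", userId);
            conn.Open();
            bytes = (byte[])cmd.ExecuteScalar();
        }

        // 3. ...shadow copy it to disk for next time, then return it.
        System.IO.File.WriteAllBytes(cachedPath, bytes);
        return File(bytes, "image/jpeg");
    }
}
```

Deleting the cache folder is always safe: the next request simply re-copies the image out of the database, which is exactly the behaviour described above.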
You should store a reference to the files on a database and store the actual files on disk.
This approach is more flexible and easier to scale.
You can have a single database and several servers serving static content. It will be much trickier to have several databases doing that work.
Flickr works this way.
I gave a more detailed answer here; you may find it useful.
Actually, your datastore lookup in the database may be faster, depending on the number of images you have, unless you are using a highly optimized filesystem engine. Databases are designed for fast lookups and use a LOT more interesting techniques than a file system does.
ReiserFS (obsolete) is really awesome for lookups; ZFS, XFS and NTFS all have fantastic hashing algorithms, and Linux ext4 looks promising too.
The hit on the system is not going to be any different in terms of block reads. The question is what is faster: a query lookup that returns the filename (maybe a hash?), which in turn is accessed using a separate open, file send, close? Or just dumping the blob out?
There are several things to consider, including network hit, processing hit, distributability, etc. If you store stuff in the database, then you can move it. Then again, if you store images on a content delivery service, that may be WAY faster since you are not taking any network hits yourself.
Think about it, and remember a bit of benchmarking never hurt anybody :-) so test it out with your typical dataset size and take into account things like simultaneous queries, etc.
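If you do take the benchmarking advice, even a crude sketch like the one below gives an order-of-magnitude feel for blob-vs-file reads on your own hardware; the table, column, and path are placeholders, and connection pooling keeps the repeated opens cheap:

```
using System;
using System.Diagnostics;
using System.IO;
using System.Data.SqlClient;

class BlobVsFileBenchmark
{
    static void Main()
    {
        const int iterations = 500;
        var sw = Stopwatch.StartNew();

        // Reading the blob straight out of the database.
        for (int i = 0; i < iterations; i++)
        {
            using (var conn = new SqlConnection("...connection string..."))
            using (var cmd = new SqlCommand(
                "SELECT ImageData FROM Images WHERE Id = @id", conn))
            {
                cmd.Parameters.AddWithValue("@id", 1);
                conn.Open();
                var bytes = (byte[])cmd.ExecuteScalar();
            }
        }
        Console.WriteLine("database:   {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();

        // Reading the same content from the filesystem.
        for (int i = 0; i < iterations; i++)
        {
            var bytes = File.ReadAllBytes(@"C:\images\sample.jpg");
        }
        Console.WriteLine("filesystem: {0} ms", sw.ElapsedMilliseconds);
    }
}
```

Run it with your typical image sizes and, ideally, under concurrent load, since that is where the two approaches tend to diverge.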

What is the best way to manage user photos for a website?

My question is about displaying thumbnails and storage.
Let's say I have a website where users can upload photos and view them in albums.
How are the photos usually stored in this scenario? Are the images themselves usually stored in the database, or just the file paths?
If the photos are large and you want to display thumbnails, is it better to:
Save the original image plus a reduced-size copy, only displaying the larger one if requested?
Use HTML to scale the image down?
It's almost always a bad idea to store images in a database. BLOBs can really slow down a database something fierce. It also limits your ability to spread storage around different drives. When the files are separate, you can even have one or more separate image servers to reduce the load on the main dynamic server. My recommendations are:
In your database table, have columns for both the directory the image resides in and the image name. That way you are free to change where images are stored, round-robin drives, add more storage later and put new images in the new storage, or whatever you want. Storing the path and the filename in separate fields makes it trivial to move images from one directory to another.
You definitely want to generate thumbnail images to reduce your network bandwidth and make your application run faster. However, you can generate the thumbnails on demand, or when the system load is low. If you're on Linux, ImageMagick is wonderful at automated batch resizing of images. It can even resize by a percentage instead of an absolute amount.
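ImageMagick is a good fit for batch work; for on-demand generation on the .NET stack that several of the questions above assume, even plain System.Drawing will do. A minimal sketch of lazy thumbnail generation (the path convention and sizing policy are assumptions):

```
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;

public static class Thumbnailer
{
    // Generates the thumbnail on demand and caches it next to the original.
    public static string GetOrCreate(string imagePath, int maxWidth)
    {
        var thumbPath = Path.ChangeExtension(imagePath, ".thumb.jpg");
        if (File.Exists(thumbPath))
            return thumbPath;

        using (var original = Image.FromFile(imagePath))
        {
            // Scale relative to the original, preserving the aspect ratio.
            var scale = (double)maxWidth / original.Width;
            var height = (int)(original.Height * scale);

            using (var thumb = new Bitmap(original, maxWidth, height))
            {
                thumb.Save(thumbPath, ImageFormat.Jpeg);
            }
        }
        return thumbPath;
    }
}
```

The first request for a photo pays the resize cost; every later request hits the cached thumbnail, which lines up with the "on demand, or when the system load is low" suggestion above.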
Some software such as TikiWiki stores the photos in a database. It then also caches thumbnail sized photos in the database.
Other software stores it in a directory. This is the way Gallery2 operates. I find the directory approach more scalable. If a size different from the original is requested, typically the app will use ImageMagick to resize the photo and then store a copy of the resized photo.
Another alternative is to re-upload the photo to a service like S3, and not store the photo locally at all.
This is a common question, and the basic answer is that it depends. You need to give more information. What database are you planning on using? SQL Server 2008 has some good new features for handling this scenario with its FILESTREAM feature. Generally I prefer to put them in the database, but if you just stuff them in there without thinking about design and access requirements, you could have poor performance as the number of photos increases.
IF you are absolutely positively sure that your web server will always have access to the file system hosting the images, then go that route. Maybe.
However, if at any time you think you might need to, I don't know, create an image server because the hard drive on your web server is running out of space, OR you need to run multiple web servers, then save yourself the trouble and store them in a database. The hard part of storing on a file system is the security requirement of crossing the network.
Also, bear in mind that not all database servers are created equal in this regard. SQL 2008 introduced a FILESTREAM data type which actually stores the images on the local file system while allowing all read / write access through the db server. This has the added benefit of allowing you to run virus scanners on the incoming files while in storage.
Oracle has had some nice file storage facilities for a while now. MySQL? I don't think I'd want to try, but you might be okay.
As to the second question: save a thumbnail along with the image. This process occurs only once per image and saves on presentation bandwidth. Using HTML to size an image down really does nothing for the client.
