Store user profile pictures on disk or in the database?

I'm building an ASP.NET MVC application where users can attach a picture to their profile; these pictures also appear in other areas of the system, such as a messaging gadget on the dashboard that displays recent messages.
When users upload these pictures, I am wondering whether it is better to store them in the database or on disk.
Database advantages
Easy to back up the entire database and keep profile content/images together with the associated profile/user tables
When I build web services later down the track, they can pull all the profile-related data from one spot (the database)
Filesystem advantages
Loading files from disk is probably faster
Any other advantages?
Where do other sites store this sort of information? Am I right to be a little concerned about database performance for something like this?
Maybe there would be a way to cache images pulled out of the database for a period of time?
Alternatively, what about the idea of storing these images in the database, but shadow copying them to disk so the web server can load them from there? This would seem to give the backup convenience of a database whilst keeping the speed advantage of files on disk.
Infrastructure in question
The website will be deployed to IIS on Windows Server 2003 running the NTFS file system.
The database will be SQL Server 2008.
Summary
Reading around a lot of related threads here on SO, many people are now trending towards the SQL Server FILESTREAM type. From what I could gather, however (I may be wrong), there isn't much benefit when the files are quite small. FILESTREAM does, however, look to greatly improve performance once files reach several megabytes.
As my profile pictures tend to sit around ~5 KB, I decided to just leave them stored in the database as varbinary(max).
In ASP.NET MVC I did see a bit of a performance issue returning FileContentResults for images pulled out of the database like this. So I ended up caching the file on disk when it is read, if the location of this file is not found in my application cache.
So I guess I went for a hybrid:
Database storage to make backing up of data easier, with files linked directly to profiles
Shadow copying to disk to allow better caching
At any point I can delete the cache folder on disk; as the images are re-requested, they will be re-copied on the first hit and served from the cache thereafter.
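A minimal sketch of that read-through shadow copy, written here in Python for brevity (the real site is ASP.NET MVC, and all names below are illustrative): the image is pulled from the database only on a cache miss, written to disk, and served from disk on every later request.

```python
import os

CACHE_DIR = "image_cache"  # hypothetical shadow-copy folder

def get_profile_image(user_id, fetch_image_from_db):
    """Return image bytes, shadow-copying from the DB to disk on first hit."""
    os.makedirs(CACHE_DIR, exist_ok=True)
    path = os.path.join(CACHE_DIR, f"{user_id}.jpg")
    if not os.path.exists(path):
        # Cache miss: pull the varbinary(max) blob out of the database once...
        data = fetch_image_from_db(user_id)
        with open(path, "wb") as f:
            f.write(data)  # ...and shadow-copy it to disk.
    with open(path, "rb") as f:
        return f.read()  # All later hits are served straight from disk.
```

Deleting CACHE_DIR at any point is safe: the next request simply repopulates it.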

You should store a reference to the files on a database and store the actual files on disk.
This approach is more flexible and easier to scale.
You can have a single database and several servers serving static content. It will be much trickier to have several databases doing that work.
Flickr works this way.
I gave a more detailed answer here; you may find it useful.

Actually, a lookup in the database may be faster depending on the number of images you have, unless you are using a highly optimized filesystem engine. Databases are designed for fast lookups and use a LOT more interesting techniques than a file system does.
ReiserFS (obsolete) was really awesome for lookups; ZFS, XFS, and NTFS all have fantastic hashing algorithms, and Linux ext4 looks promising too.
The hit on the system is not going to be any different in terms of block reads. The question is: which is faster, a query that returns the filename (maybe a hash?), which is then accessed using a separate open/send/close sequence, or just dumping the blob straight out?
There are several things to consider, including the network hit, the processing hit, distributability, etc. If you store stuff in the database, then you can move it. Then again, if you store images on a content delivery service, that may be WAY faster, since you are not taking any network hits yourself.
Think about it, and remember, a bit of benchmarking never hurt nobody :-) So test it out with your typical dataset size, and take into account things like simultaneous queries.
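In that spirit, here is a toy benchmark sketch (all file and table names are made up) that times reading ~5 KB blobs back from a SQLite table against reading the same bytes from individual files; swap in your real store and dataset size before drawing conclusions.

```python
import os
import sqlite3
import time

N = 1000
data = os.urandom(5 * 1024)  # ~5 KB, like the profile pictures above

# Load N copies of the blob into a throwaway SQLite table...
con = sqlite3.connect("bench.db")
con.execute("DROP TABLE IF EXISTS imgs")
con.execute("CREATE TABLE imgs (id INTEGER PRIMARY KEY, blob BLOB)")
con.executemany("INSERT INTO imgs (blob) VALUES (?)", [(data,)] * N)
con.commit()

# ...and the same bytes into N individual files.
os.makedirs("bench_files", exist_ok=True)
for i in range(N):
    with open(f"bench_files/{i}.bin", "wb") as f:
        f.write(data)

t0 = time.perf_counter()
for (blob,) in con.execute("SELECT blob FROM imgs"):
    pass
print("db reads   :", time.perf_counter() - t0)

t0 = time.perf_counter()
for i in range(N):
    with open(f"bench_files/{i}.bin", "rb") as f:
        f.read()
print("file reads :", time.perf_counter() - t0)
```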


How to speed up the TYPO3 Backend?

Given: each call to a BE module takes several seconds, even with an SSD drive. (A well-configured setup runs below one second for general BE tasks.)
What are the likely bottlenecks?
How do I check for them?
What are the options to speed things up?
On purpose I don't give a specific configuration, but ask for a general checklist, so that the answer is suitable for many people as a first entry point.
General tips on performance tuning for TYPO3 can be found here: https://wiki.typo3.org/Performance_tuning
However, in my experience most general performance problems are due to one of a few reasons:
1. Bad/no caching. Usually this is a problem with one or more extensions (partly) disabling the cache. Try disabling all third-party extensions and enabling them one by one to see which causes the site to slow down the most. $GLOBALS['TSFE']->set_no_cache() will disable all caching, so you could search for that (see the scan sketch after this list). USER_INT and COA_INT objects in TypoScript also disable the cache for anything configured inside them.
2. A lot of data. Check the database for tables containing a lot of data. How much constitutes "a lot" depends on many factors, but generally anything below a million records shouldn't be too much of a problem, unless, for example, you run queries with things like LIKE '%...%' on fields containing a lot of data.
3. Not enough resources on the server. To fix this, add more memory and/or CPU cores to the server. Or, if it's a shared server, reduce the number of sites running on it.
4. Heavy traffic. No matter how many resources a server has, there is always a limit to the number of requests it can process in a given time. If this is your problem, you will have to look into load balancing and caching servers. If you don't (normally) have a lot of visitors, high traffic can still be caused by robots crawling your site too quickly. These are usually easy to block by IP address in your firewall or web server configuration.
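For point 1, a quick-and-dirty scan like the following can help find cache-disabling code. It is only a sketch, not an official TYPO3 tool; the extension path below is the classic typo3conf/ext and may differ in your setup.

```python
import os
import re

# Patterns from point 1 that disable the TYPO3 cache.
PATTERNS = re.compile(r"set_no_cache\s*\(|USER_INT|COA_INT")
EXT_DIR = "typo3conf/ext"  # classic extension path; adjust for your install

for root, _, files in os.walk(EXT_DIR):
    for name in files:
        if name.endswith((".php", ".ts", ".typoscript")):
            path = os.path.join(root, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                for lineno, line in enumerate(f, 1):
                    if PATTERNS.search(line):
                        print(f"{path}:{lineno}: {line.strip()}")
```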
A slow backend on a server without any other traffic (you're the only one who can access it) rules out 1 (which can only cause a slow backend if users are accessing the frontend and causing a high server load) and 4 (no other traffic).
One further aspect you could inspect: a lot of things are stored in the user record, for example the settings you used in the log module.
One setting which could consume a lot of memory (and time to serialize and deserialize) is the state of the page tree (which pages are expanded and which are not).
Cleaning the user settings could make the backend faster for this user.
If you have a large page tree and the user has to navigate through many pages, the effect will soon wear off. Another drawback: you lose all settings, as there is still no selective cleaning.
Cannot comment here, but need to say: the TSFE object does absolutely nothing in the TYPO3 backend. The backend is always uncached. The TYPO3 backend is a standalone module for editing and maintaining the frontend output. There are tons of Google search results that ignore this fact.
Possible performance bottlenecks are poorly written extensions that do rendering or data processing. Hooks into core functions are usually no big deal, but rendering many elements in edit forms (especially with TYPO3's Fluid template engine) can cause performance problems.
The Extbase DBAL layer can also cause massive performance problems. The reason is that the database model does not know about indexes. It's simple but stupid. A SQL join on a big table of 2000+ records will delay the output perceptibly, depending on the data model.
Also, the TYPO3 backend does not really depend on the TypoScript configuration, but to control some output, or when it is loaded by extensions, the full parsing of the *.ts files is needed. And this parser is very slow.
If you want to speed things up, you need to know what goes wrong. The only way to debug this behaviour is to inspect the runtime with a PHP profiling tool like Xdebug, because the TYPO3 framework is very complex. It uses some kind of Doctrine framework and loads tons of files on every request. Thus a well-configured OPcache is a must.
The main reason the whole thing is slow is that it is poorly written. You can confirm that fact by inspecting the runtime.
In addition to what has already been said, put the runtime environment on your checklist:
Memory:
If a heavy IDE and other tools are open at the same time, available memory can become an issue. To check the memory profile, you can start a tool that monitors the memory usage of the machine.
If virtualization is used, check the memory assigned to the box, and try whether assigning more memory improves behaviour.
If required and possible, give your machine more memory. This should not be a bugfix for poorly written code, though; bad code can blow through any amount of memory.
File access:
TYPO3 reads and writes thousands of files. If you work with a contemporary SSD, this is surprisingly fast; I measured this, and loading all class files of TYPO3 takes just a fraction of a second.
However, this may look different if you do not work with a standard setup. Many factors may slow you down:
USB-Sticks as storage.
Memory cards as storage.
All kinds of external storage may be limited due to slow drivers.
Virtualization can become an issue; again, it's a question of drivers.
If in doubt, test and store your files and DB on a different drive to compare the behaviour.
Routing:
The database itself may be fast, but bad routing of your requests may still slow you down. Think of firewalls, proxies, etc., even on your local machine, and especially if virtualization is used.
Database connection:
A fast database connection is crucial. If the database access is slow, TYPO3 can't be fast.
Especially due to Extbase, TYPO3 often queries much more data than is really required, and more often than really required, because a lot of relations are resolved in the PHP layer instead of in the DB layer itself. Loading data structures like the root line may cause a lot of ping-pong between the PHP and DB layers.
I can't give advice on how to measure your DB connection; you have to ask your admin for that. What you can always do is test and compare with another DB from a completely different environment.
The speed of the database may depend on the type of database itself. Typically you use MySQL/MariaDB, which should be fast. It also depends on the factors mentioned above: memory, file access, and routing.
Strategy:
Even without being an admin and knowing all the performance tools, you can always exchange parts of your system and check whether matters improve. With this approach you can localize the culprit without being an expert. Once you have spotted the culprit, Google may help you find more information.
When it comes to a clean and performant setup of routing or virtualization, it's still best to ask an experienced admin.
Summary
This is all in addition to what others have already pointed out.
What would be really helpful would be a BE plugin that analyses and measures the environment. Maybe there are some out there that I don't know of.

What are the size limits for Laravel's file-based caching?

I am a new developer and am trying to implement Laravel's (5.1) caching facility to improve the speed of my app. I started out by caching a large DB table that my app constantly references, but it got too large, so I have backed away from that and am now 'forever' caching smaller chunks of data - for example, for each page, only the portions of that large DB table that are relevant.
I have watched 'Caching Essentials' on Laracasts, done some Googling and had a search in this forum (and Laracasts') but I still have a couple of questions:
I am not totally clear on how the cache size limits work when you are using Laravel's file-based system: is there an overall in-app size limit for the cache, or is one limited size-wise only per key and by your server size?
What are the signs that you should switch from file-based caching to something like Memcached or Redis, and what are the benefits of using one of those services? Is it the fact that your caching is handled on a different server (thereby lightening the load on your own)? Do you switch over to one of these services when your local, file-based cache gets too big for your server?
My app utilizes several tables that have 3,000-4,000 rows - the data in these tables is constantly referenced and will remain static unless I decide to add new options. I am basically looking for the best way to speed up queries to the data in these tables.
Thanks!
I don't think Laravel imposes any limitations on its file I/O at all; the limitations will be in how much PHP can read/write to a file at once, or hold in its memory/process at any one time.
It does serialise the data that you cache and unserialise it when you reload it, so your PHP environment has to be able to process the entire cache file (which corresponds to the top-level cache key) at once. So, if you are getting cacheduser.firstname, it would have to load the whole cacheduser key from the file, unserialise it, then get the firstname key from that.
I would take the PHP memory limit (classic, I know!) as the first point to investigate if you want to keep going down this road.
Caching services like Redis or memcached are bespoke, optimised caching solutions. They take some of the logic and responsibility out of your PHP environment.
They can, for example, retrieve sub-keys from items without having to process the whole thing, and so can retrieve part of some cached data in a memory-efficient way. So, when you request cacheduser.firstname from Redis, it just returns the firstname attribute.
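To illustrate with the example key above (a hedged sketch using the redis-py client in Python, rather than Laravel's own PHP API): storing the user as a Redis hash lets a single field come back without touching the rest.

```python
import redis  # redis-py client; assumes a Redis server on localhost:6379

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Store the user as a hash so fields can be fetched individually...
r.hset("cacheduser", mapping={"firstname": "Ada", "lastname": "Lovelace"})

# ...then ask for just one field; nothing else is read or unserialised.
print(r.hget("cacheduser", "firstname"))  # -> "Ada"

# A file cache would instead load and unserialise the whole "cacheduser"
# entry before it could hand back firstname.
```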
These services also have other advantages regarding tagging / clearing out subsets of caches (see the cache tags Laravel docs: https://laravel.com/docs/5.4/cache#cache-tags).
Another thing to think about is scaling. If your site is large enough and is load-balanced across multiple servers, the filesystem cache may differ across those servers, as each server can only check its local filesystem for cache files. A caching service can live on a different server (many hosts offer separate Redis / Memcached services), so it isn't victim to this issue.
Also, as I understand it (and this might be the most important thing), the file cache driver in Laravel is mainly meant for local development and testing. Although it can work fine for simple applications with basic caching needs, it's not intended for large, scalable production environments.
Personally, I develop locally and test with file caching, as I'm only dealing with small amounts of data then, and use Redis to cache in production environments.
It doesn't necessarily need to be on a separate server to get the benefits, though. If you are never going to scale to multiple application servers, then using a caching service on the same server will already be a large improvement for caching large documents.

Is SQLite suitable for use as a read only cache on a web server?

I am currently building a high-traffic GIS system which uses Python on the web front end. The system is 99% read-only. In the interest of performance, I am considering using an externally generated cache of pre-generated, read-optimised GIS information and storing it in an SQLite database on each individual web server. In short, it's going to be used as a distributed read-only cache which doesn't have to hop over the network. The back-end OLTP store will be PostgreSQL, but that will handle less than 1% of the requests.
I have considered using Redis, but the dataset is quite large, so it would push up the administrative and memory costs of the virtual machines this is being hosted on. Memcached is not suitable, as it cannot do range queries.
Am I going to hit read-concurrency problems with SQLite doing this?
Is this a sensible approach?
OK, after much research and performance testing: SQLite is suitable for this. It has good request concurrency on static data. SQLite only becomes an issue if you are doing writes as well as heavy reads.
More information here:
http://www.sqlite.org/lockingv3.html
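A minimal sketch of the read-only pattern with Python's built-in sqlite3 module (the database and table names are hypothetical): opening the file with mode=ro guarantees the web workers never take a write lock.

```python
import sqlite3

# Open the pre-generated cache read-only via an SQLite URI; this fails fast
# if anything tries to write. gis_cache.db and the tiles table are made up.
con = sqlite3.connect("file:gis_cache.db?mode=ro", uri=True,
                      check_same_thread=False)

# Range queries work as usual -- the thing memcached cannot do.
rows = con.execute(
    "SELECT x, y, payload FROM tiles WHERE zoom = ? AND x BETWEEN ? AND ?",
    (12, 100, 200),
).fetchall()
```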
If the use case is just a cache, why don't you use something like
http://memcached.org/?
You can find memcached bindings for Python in the PyPI repository.
Another option is to use materialized views in Postgres; this way you keep things simple and have everything in one place.
http://tech.jonathangardner.net/wiki/PostgreSQL/Materialized_Views
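A short sketch of that approach using psycopg2 (the connection string, view, and table names are all hypothetical). Note that native materialized views need PostgreSQL 9.3+ (the IF NOT EXISTS clause needs 9.5+); the linked article shows how to emulate them on older versions.

```python
import psycopg2  # assumes psycopg2 is installed and the DSN below is valid

con = psycopg2.connect("dbname=gis user=web")  # hypothetical DSN
cur = con.cursor()

# Precompute the read-optimised data once (names are made up)...
cur.execute("""
    CREATE MATERIALIZED VIEW IF NOT EXISTS region_summary AS
    SELECT region_id, count(*) AS feature_count
    FROM features
    GROUP BY region_id
""")
con.commit()

# ...then refresh it whenever the underlying OLTP data changes.
cur.execute("REFRESH MATERIALIZED VIEW region_summary")
con.commit()
```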

Should images come from the DB or the Content\Images folder?

I am developing an eCommerce website in ASP.NET MVC 3 in C#, using SQL Server 2008 R2. My question is: if I have five images that I want to show in a grid view with thumbnails (e.g. something like the Amazon website, which gives customers a couple of pictures to view), would it be advisable for the images to come from the database, or should they reside in the Content\Images folder? There are quite a few sub-categories within sub-categories in my DB design. What is the most common approach for a professional developer to follow? I know there are a few options for third-party tools like jQuery and the Telerik Extensions, so I will use them.
Thanks
From my experience and research, it is better to put them in a folder/content structure. Yes, there are security concerns with opening directories to the public, but if you instead upload the files dynamically via FTP, those problems are solved. I have heard horror stories about storing files in a database and have seen the issues come up, but I have also resolved them. Basically, it is easier to write to the database, and it avoids the security issues of opening up a directory to the public; just make sure to regularly check that your backups are not corrupt, or keep the data on a failover cluster where that will never be a problem.
So, in summary: the database is fine; just regularly check your backups by restoring them to confirm they are not corrupt, or run a failover cluster. Otherwise, go with the typical folder/content structure, but use FTP to upload the files so there are no directories open to the public.
For me, the best answer to this question is this paper: To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem.
Summary: Application designers often face the question of whether to store large objects in a filesystem or in a database. Often this decision is made for application design simplicity. Sometimes, performance measurements are also used. This paper looks at the question of fragmentation – one of the operational issues that can affect the performance and/or manageability of the system as deployed long term. As expected from the common wisdom, objects smaller than 256K are best stored in a database while objects larger than 1M are best stored in the filesystem. Between 256K and 1M, the read:write ratio and rate of object overwrite or replacement are important factors. We used the notion of “storage age”, or number of object overwrites, as a way of normalizing wall clock time. Storage age allows our results, or similar such results, to be applied across a number of read:write ratios and object replacement rates.

What is the best way to manage user photos for a website?

My question is about displaying thumbnails and storage.
Let's say I have a website where users can upload photos and view them in albums.
How are the photos usually stored in this scenario? Are the images themselves usually stored in the database, or just the file paths?
If the photos are large and you want to display thumbnails, is it better to:
save a copy of the image plus a reduced-size version, only serving the larger one if requested?
use HTML to reduce the displayed size?
It's almost always a bad idea to store images in a database. BLOBs can really slow down a database something fierce. It also limits your ability to spread storage around different drives. When the files are separate, you can even have one or more separate image servers to reduce the load on the main dynamic server. My recommendations are:
In your database table, have columns for both the directory the image resides in and the image name. That way you are free to change where images are stored, round-robin drives, add more storage later and put new images in the new storage, or whatever you want. Storing the path and the filename in separate fields makes it trivial to move images from one directory to another.
You definitely want to generate thumbnail images to reduce your network bandwidth and make your application run faster. However, you can generate the thumbnails on demand, or when the system load is low. If you're on Linux, ImageMagick is wonderful for automated batch resizing of images; it can even resize by a percentage instead of an absolute amount. A small sketch follows.
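For instance, a hedged on-demand sketch that shells out to ImageMagick's convert tool (the file names and the 200x200 size are placeholders):

```python
import subprocess

def make_thumbnail(src, dest, size="200x200"):
    # ImageMagick's -thumbnail operator resizes the image to fit inside
    # `size` and strips most embedded metadata, keeping the output small.
    subprocess.run(["convert", src, "-thumbnail", size, dest], check=True)

# Generate on first request, then serve the cached file from then on.
make_thumbnail("uploads/photo123.jpg", "thumbs/photo123.jpg")
```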
Some software, such as TikiWiki, stores the photos in a database and then also caches thumbnail-sized photos in the database.
Other software stores them in a directory; this is the way Gallery2 operates. I find the directory approach more scalable. If a size other than the original is requested, the app will typically use ImageMagick to resize the photo and then store a copy of the resized photo.
Another alternative is to re-upload the photo to a service like S3, and not store the photo locally at all.
This is a common question, and the basic answer is that it depends. You need to give more information. What database are you planning to use? SQL Server 2008 has some good new features for handling this scenario with its FILESTREAM functionality. Generally I prefer to put them in the database, but if you just stuff them in there without thinking about design and access requirements, you could see poor performance as the number of photos increases.
IF you are absolutely positively sure that your web server will always have access to the file system hosting the images, then go that route. Maybe.
However, if at any time you think you might need to, I don't know, create an image server because the hard drive on your web server is running out of space, OR you need to run multiple web servers, then save yourself the trouble and store them in a database. The hard part of storing on a file system is meeting the security requirements of crossing the network.
Also, bear in mind that not all database servers are created equal in this regard. SQL Server 2008 introduced a FILESTREAM data type which actually stores the images on the local file system while allowing all read/write access through the DB server. This has the added benefit of allowing you to run virus scanners on the incoming files while in storage.
Oracle has had some nice file storage facilities for a while now. MySQL? I don't think I'd want to try, but you might be okay.
As to the second question: save a thumbnail along with the image. This process occurs only once per image and saves on presentation bandwidth. Using HTML to size an image down really does nothing for the client.
