Performance in tables with binary data/images

I made a people maintenance screen. The client wants me to store a photo of every person in the database, and I implemented it without problems. I have a separate table for the images with two fields: Id_person and Image.
I'm a little worried because it's the first time I've worked with images in a database. Will I have performance problems when the table grows beyond 1000-5000 images? I suppose the size of each image will make a difference, and I'm sure I will need to prevent users from saving very large images in the database.
What would be a good size limit? The client only needs pictures of the face, but I'm sure someone will try to take the pictures with a "latest model" camera at full quality ;)
Thanks.

It's usually preferred to keep a folder of images and the DB just references that folder. Ideally, each person has a unique ID and the files in the "images" folder match that ID.
If you really want to store the binary data directly, you can get a reasonable-quality photo in about 8KB as a JPEG (approx 250x250 px @ 25% quality). Of course, this would be unacceptable for printing, but it is fine for identification.
Only you will know if you can accept an additional 8KB per row in your database server.
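A minimal server-side guard along those lines might look like this (the 8KB cap comes from the estimate above; the JPEG magic-number check and the function name are illustrative assumptions, not part of any particular framework):

```python
MAX_PHOTO_BYTES = 8 * 1024       # cap suggested above; adjust to taste
JPEG_MAGIC = b"\xff\xd8\xff"     # every JPEG file starts with these bytes

def validate_photo(data: bytes, max_bytes: int = MAX_PHOTO_BYTES) -> bytes:
    """Reject non-JPEG or oversized uploads before they reach the database."""
    if not data.startswith(JPEG_MAGIC):
        raise ValueError("not a JPEG file")
    if len(data) > max_bytes:
        raise ValueError(f"photo is {len(data)} bytes; limit is {max_bytes}")
    return data
```

Checking the size before the INSERT keeps the "full quality camera" uploads out of the table entirely, rather than discovering them later when the database file has ballooned.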

If you absolutely MUST do it this way, I would say limit it to just a few kilobytes each. However, just about every database admin will tell you that blobbing images into a database field is a very, very bad idea. Most noticeably, you will see performance decrease drastically once the database file grows beyond two gigabytes in size.
I would prefer to do as jheddings said: have a folder where each person's ID is the file name, with a standard .jpg extension or similar, on a network share so all computers using the app can access the images.
Some find that simply using the ID isn't good enough, in case the photo needs to be deleted or archived; in that case they put an NVARCHAR(MAX) field into the database and store the network file path to the image instead of the actual image.
I would only blob the image if your customer absolutely cannot have a network share path.

As long as it is in a separate table with only ID|BLOB columns, there shouldn't be any performance issues fetching that photo. On the other hand, I prefer keeping only references to files on disk in the DB (or, even better, if it's only a user photo you don't really need a reference at all, because the user with ID 1 maps to /images/1.jpg).
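That ID-to-filename convention can be sketched in a couple of lines (the /images base directory and the .jpg extension are assumptions):

```python
from pathlib import Path

IMAGE_ROOT = Path("/images")  # assumed base directory for all user photos

def photo_path(user_id: int) -> Path:
    """Derive a user's photo location from the ID alone - no DB column needed."""
    return IMAGE_ROOT / f"{user_id}.jpg"
```

Because the path is derived, there is nothing in the database to keep in sync with the file system.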

Related

Create static content with images and videos and show it in my spring-boot application

I wrote a basic blog system based on Spring Boot.
I'm trying to figure out how I can create posts with videos and images, without having to edit everything using HTML.
Right now, I am saving my blog posts in the DB as plain text.
Is it possible to create content that combines text, images and videos, and save this "content" as one row in my DB table, without creating relationships between different tables?
Many thanks in advance.
Images and videos are heavy content, and storing them in the database can be costly, unless you are building the application for research purposes only. Querying them from the database and serving them over the network can also impact your application's performance.
If you want to store everything in a single row, that can be done using a database BLOB object. But I would suggest having two different tables: one containing the BLOB objects for the images and videos, and the other your usual table containing the blog text plus the primary key of the BLOB table.
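A minimal sketch of that two-table layout, shown here with SQLite purely for illustration (the table and column names are invented; a production setup would use your actual database and real media bytes):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE media (
        id   INTEGER PRIMARY KEY,
        data BLOB NOT NULL               -- the image/video bytes
    );
    CREATE TABLE post (
        id       INTEGER PRIMARY KEY,
        body     TEXT NOT NULL,          -- the blog text
        media_id INTEGER REFERENCES media(id)  -- link to the BLOB table
    );
""")

media_id = conn.execute(
    "INSERT INTO media (data) VALUES (?)", (b"\x89PNG fake image bytes",)
).lastrowid
conn.execute(
    "INSERT INTO post (body, media_id) VALUES (?, ?)", ("Hello blog", media_id)
)

body, blob = conn.execute(
    "SELECT p.body, m.data FROM post p JOIN media m ON m.id = p.media_id"
).fetchone()
```

Keeping the heavy bytes in their own table means the frequent text-only queries never drag BLOB pages through the buffer cache.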
If you want to take your solution live, it is better to use image/video hosting servers, for the following reasons:
They save on your database costs
They ensure 24x7 availability
Application performance is better, as the media are hosted independently of the application
Videos can be iframed directly, i.e. you do not need to query megabytes of record data and serve them over the network
A strict answer to your question, yes, you can use BLOBs to store the video/images in the database. Think about them as a column that contains bytes of video or image.
For school projects, or cases where you have a very small number of videos/images, it's probably OK. However, if you're building a real application, don't do it :)
Every DBA will raise a bunch of concerns about why not to use BLOBs.
So a more realistic approach would be to find some "file-system"-like (but distributed) storage, such as S3 on AWS, or a hard drive on the server if you're not on the cloud, etc.
Then store the big image/video there, get an identifier (like the path to it, if we're talking about a hard drive), and store that identifier in the database along with the metadata you're probably already storing (like blogPostId, type of file, etc.).
Once the application becomes more "mature", you can switch the "provider" - grow as you go. There are even cloud storages designed especially for images (like Cloudinary).

Transform data before adding to cache system or after when reading it

I have a situation where I don't know which approach fits better.
I have a solution where users search for soccer players. I only have their names and teams, but when a user comes to my website and clicks on a player, I show detailed information about that player that I get from various external providers (usually chosen by country).
I know which external provider to use when a call is made, and I pay the external providers each time I fetch data. To mitigate this, I try to fetch as few times as possible: I fetch once when a user first clicks on the player info, and the next time, if it's in my database cache, I show the cached info. After 10 days I fetch again for that specific player from the external provider, since I want the info to be somewhat up to date.
I need to transform the data coming from the different providers, which usually arrives as JSON, into my own structure so I can handle it consistently: I have my own object structure, and the fields coming from the external providers are mapped into my code with the same naming and structure.
So, my problem is deciding when I should map/transform the data coming from the providers:
Option 1: I fetch data from the provider, transform it to my JSON structure once, and store it in the database cache in my main structure. Then, every time a user clicks on the soccer player details, I read this JSON field from the database cache and convert it directly to an object I know how to use.
Option 2: I fetch data from the provider and keep it as-is in my database cache. Then, every time someone clicks to get the soccer player's detailed info, I read the JSON record from the database cache, transform it to my naming and structure, and convert it to an object.
Notes:
- this is a cache database; records won't be kept forever. If, on a call, I see the record is more than 10 days old, I fetch new data from the appropriate external provider
Deciding the layer to cache data is an art form all its own. The higher the layer you cache data, the more performant it will be (less reprocessing needed), but the lower the re-use potential will be (different parts of the application may use the same cache, and find value if it hasn’t been transformed too much).
Yours is another case of this. If you store the data as the provider delivers it and later need to change the way you transform it, you won't have to pay to re-retrieve it. If, on the other hand, you store it as you need it now, you may have to discard it all if you decide to change the transformation method.
Like all architectural design decisions, it's all about trade-offs. You have to decide what is more important to you and your application.
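To make the trade-off concrete, here is a minimal sketch of the second option (store the provider payload raw, re-fetch after 10 days, transform on every read). The field names, the provider's JSON shape, and the dict-based cache are all illustrative assumptions:

```python
import json
from datetime import datetime, timedelta, timezone

TTL = timedelta(days=10)  # refresh window from the question

def transform(raw: dict) -> dict:
    """Map one (hypothetical) provider's field names onto our own structure."""
    return {"name": raw["player_name"], "team": raw["club"]}

def get_player(cache: dict, player_id: int, fetch_from_provider) -> dict:
    """Return player details, refreshing the raw cache entry after the TTL."""
    entry = cache.get(player_id)
    now = datetime.now(timezone.utc)
    if entry is None or now - entry["fetched_at"] > TTL:
        cache[player_id] = entry = {
            "raw": fetch_from_provider(player_id),  # the paid call, stored as-is
            "fetched_at": now,
        }
    # Transform on every read: cheap, and the mapping can change later
    # without discarding (and re-paying for) the cached raw data.
    return transform(json.loads(entry["raw"]))
```

With option 1 the `transform` call would move to the write path instead; the point of the sketch is that only option 2 survives a change to `transform` without new provider calls.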

Is it still best to store images and files on a file system or non-RDBMS

I am working on a system which will store users' pictures and, in the future, some soft documents as well.
Number of users: 4000+
Transcripts and other documents per user: 10 MB
Total system requirement in first year: 40 GB
Additional Increment Each year: 10%
Reduction due to archiving Each year: 10%
Saving locally on an Ubuntu Linux system without any fancy RAID.
Using MySQL community edition for application.
Simultaneous Users: 10 to 20
Documents are for historical purposes and will not be accessed frequently.
I always thought it was cumbersome to store these in an RDBMS due to the multiple layers of access etc. However, since non-RDBMS databases use key/value pairs, is it still better to store the documents in the file system, or in the DB? Thanks for any pointers.
A similar question was asked about 7 years ago (storing uploaded photos and documents - filesystem vs database blob). I hope there has been some change in the technology, with all the NoSQL databases in the spin. Hence, I am asking this again.
Please correct me if I should be doing something else instead of raising a fresh question.
It really depends: notably on the DBMS considered, the file system, whether the data is remote or local, the total size of the data (petabytes are not the same as gigabytes), the number of users/documents, etc.
If the data is remote on 1Gb/s Ethernet, the network is the bottleneck, so using a DBMS won't add significant additional overhead. See the answers section of this interesting webpage, or STFW for "Approximate timing for various operations on a typical PC"...
If the data is local, things matter much more (but few computers have a petabyte of SATA disks). Most file systems on Linux use some minimal block size (e.g. 1KB, 4KB, ...) per file.
A possible approach might be to have some threshold (typically 4 or 8 kilobytes, or perhaps even 64 kilobytes, i.e. several pages; YMMV). Data smaller than that could go directly into a database field; data bigger than that could live in a file, with the database containing the file path. Read about BLOBs in databases.
Consider not only RDBMSs like PostgreSQL, but also NoSQL solutions à la MongoDB, key-value stores à la REDIS, etc.
For a local-data approach, consider not only plain files, but also SQLite, GDBM, etc. If you use a file system, avoid very wide directories: instead of widedir/000001.jpg ... widedir/999999.jpg, organise it as dir/subdir000/001.jpg ... dir/subdir999/999.jpg, and keep no more than about a thousand entries per directory.
If you use a MySQL database locally and don't expect a lot of data (e.g. less than a terabyte), you might store any raw data smaller than e.g. 64KB directly in the database, and store bigger data in individual files (whose paths go into the database); but you should still avoid very wide directories for them.
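A sketch of that threshold rule combined with the directory fan-out described above (the 64KB cutoff, the .jpg extension, the ID-based layout, and the function name are all assumptions):

```python
from pathlib import Path

THRESHOLD = 64 * 1024  # assumed cutoff: smaller blobs go into the DB row

def storage_decision(doc_id: int, data: bytes, root: Path = Path("dir")):
    """Return ('db', bytes) for small data, or ('fs', path) for big data,
    fanning files out so no directory holds more than 1000 entries."""
    if len(data) < THRESHOLD:
        return ("db", data)
    subdir = root / f"subdir{doc_id // 1000:03d}"   # at most 1000 files each
    return ("fs", subdir / f"{doc_id % 1000:03d}.jpg")
```

The fan-out keeps directory lookups fast on file systems that scan directory entries linearly.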
Of course, don't forget to define and apply (human decided) backup procedures.

How to use image for oracle database table 11gR2

I have created an emp table to store employees' information and their passport-size photos.
This table has empno (number), emp_image_link (varchar2), ... etc. fields.
empno is auto-generated using a database trigger (max empno + 1).
Image: I don't want to store the images in the database, since I believe it will cause problems in terms of size, performance and portability. So the images should be in the file system at D:\images\,
and the image URL should be D:\images\empno.jpg, which means the emp_image_link field will contain only the image link.
I have searched Google a lot about this; everyone discusses how to store the image into the database.
I did not find any information about how to store only the link instead of the image.
I am going to use Oracle Forms Developer 11gR2.
Can anyone give me an idea of how I can do that, please?
Thank you in advance.
Murshed Khan
"I don't want to store images into the database since it will cause problems in terms of size, performance and portability, I believe. So images should be in the file system"
Your points are not valid ones.
Size. Passport photos are pretty small, so unless you are storing pictures with extremely high pixel counts they won't take up a lot of disk. Either way they will consume comparable amounts of space in the database and on the OS.
Performance. The only possible concern would be network traffic between the database server and the middle-tier server. This is a function of size, so it may or may not be a real issue. Using an OS file store would introduce a time delay while you retrieve the JPG for each record.
Portability. An all-in-the-database solution is more portable than what you're proposing. Nothing breaks like directory paths.
One thing you haven't considered but you really should is DML on the employee records. If the pictures are stored in the database they are committed in the same transaction as (hence consistent with) the rest of the data, they are backed-up at the same time and they are recoverable in the same window. None of which applies to an OS directory on a separate server.
"Storing in the file system ... I got the solution using BFILE "
BFILE is the mechanism for linking a database record with an OS file, so it is the appropriate solution for the problem as you define it. But a BFILE points to files on the database server, so you would lose the only possible efficiency to be gained from not storing the images in the database: reduced network traffic between the database and middle-tier servers. BFILEs would not be backed up with the database or subject to any transactional consistency.
"empno is auto generated using a database trigger (max empno+1)"
Another bad idea. It doesn't scale, and more importantly it doesn't work in a multi-user environment. Please use a sequence; they're designed for this task.

Upload images in database

I want to save uploaded images in a bytea column in my PostgreSQL database. I'm looking for advice on how to save images from Rails into a bytea column, preferably with examples.
I use Rails 3.1 with the "pg" driver to connect to PostgreSQL.
It's often not a good idea to store images in the database itself.
See the discussions on is it better to store images in a BLOB or just the URL? and Files - in the database or not?. Be aware that those questions and their answers aren't about PostgreSQL specifically.
There are some PostgreSQL specific wrinkles to this. PostgreSQL doesn't have any facilities for incremental dumps*, so if you're using pg_dump backups you have to dump all that image data for every backup. Storage space and transfer time can be a concern, especially since you should be keeping several weeks' worth of backups, not just a single most recent backup.
If the images are large or numerous you might want to consider storing images in the file system unless you have a strong need for transactional, ACID-compliant access to them. Store file names in the database, or just establish a convention of file naming based on a useful key. That way you can do easy incremental backups of the image directory, managing it separately to the database proper.
If you store the images in the FS you can't easily† access them via the PostgreSQL database connection. OTOH you can serve them over HTTP directly from the file system much more efficiently than you could ever hope to when you have to query them from the DB first. In particular, you can use sendfile() from Rails if your images are on the FS, but not from a database.
If you really must store the images in the DB
... then it's conceptually the same as in .NET, but the exact details depend on the Pg driver you're using.
There are two ways to do it:
Store and retrieve bytea, as you asked about; and
Use the built-in large object support, which is often preferable to using bytea.
For small images where bytea is OK:
Read the image data from the client into a local variable
Insert it into the DB by passing the variable as bytea. Assuming you're using the ruby-pg driver, the test_binary_values example from the driver should help you.
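The flow is the same in any driver: read the bytes, bind them as a binary parameter, insert, and read them back. A self-contained sketch using Python's stdlib sqlite3 purely to illustrate that round trip (the table name and the fake image bytes are invented; with ruby-pg or psycopg you would bind to a bytea column instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, data BLOB NOT NULL)")

image_bytes = b"\xff\xd8\xff fake JPEG bytes"   # would come from the upload
# Parameter binding keeps the binary data intact - never splice bytes into SQL.
conn.execute("INSERT INTO images (data) VALUES (?)", (image_bytes,))

(roundtrip,) = conn.execute("SELECT data FROM images WHERE id = 1").fetchone()
```

The key point is the bound parameter: the driver handles escaping and encoding of the binary value for you.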
For bigger images (more than a few megabytes) use lo instead:
For bigger images, please don't use bytea. Its theoretical max may be 2GB, but in practice you need 3x (or more) the image size in RAM to handle it, so you should avoid using bytea for large images or other large binary data.
PostgreSQL has a dedicated lo (large object) type for that. On 9.1 just:
CREATE EXTENSION lo;
CREATE TABLE some_images(id serial primary key, image_data lo not null);
... then use lo_import to read the data from a temporary file on disk, so you don't have to fit the whole thing in RAM at once.
The driver ruby-pg provides wrapper calls for lo_create, lo_open, etc, and provides a lo_import for local file access too. See this useful example.
Please use large objects rather than bytea.
* Incremental backup is possible with streaming replication, PITR / WAL archiving, etc., but again, growing the DB size can complicate things like WAL management. Anyway, unless you're an expert (or "brave") you should be taking pg_dump backups rather than relying on replication and PITR alone. Putting images in your DB will also, by increasing the size of your DB, greatly slow down pg_basebackup, which can be important in failover scenarios.
† The adminpack offers local file access via a Pg connection for superusers. Your webapp user should never have superuser rights or even ownership of the tables it works with, though. Do your file reads and writes via a separate secure channel like WebDAV.
