I want to save uploaded images in a bytea column in my PostgreSQL database. I'm looking for advice on how to save images from Rails into a bytea column, preferably with examples.
I use Rails 3.1 with the "pg" driver to connect to PostgreSQL.
It's often not a good idea to store images in the database itself.
See the discussions on "Is it better to store images in a BLOB or just the URL?" and "Files - in the database or not?". Be aware that those questions and their answers aren't about PostgreSQL specifically.
There are some PostgreSQL specific wrinkles to this. PostgreSQL doesn't have any facilities for incremental dumps*, so if you're using pg_dump backups you have to dump all that image data for every backup. Storage space and transfer time can be a concern, especially since you should be keeping several weeks' worth of backups, not just a single most recent backup.
If the images are large or numerous, you might want to consider storing them in the file system unless you have a strong need for transactional, ACID-compliant access to them. Store file names in the database, or just establish a file-naming convention based on a useful key. That way you can do easy incremental backups of the image directory, managing it separately from the database proper.
If you store the images in the FS you can't easily† access them via the PostgreSQL database connection. OTOH you can serve them over HTTP directly from the file system much more efficiently than you could ever hope to when you have to query them from the DB first. In particular you can use sendfile() from Rails if your images are on the FS, but not from a database.
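As a minimal sketch of that last point (the controller, directory layout and file naming here are assumptions for illustration, not anything from your app):

class ImagesController < ApplicationController
  def show
    path = Rails.root.join('images', "#{params[:id].to_i}.jpg").to_s
    if File.exist?(path)
      # With config.action_dispatch.x_sendfile_header set (e.g. "X-Sendfile"
      # for Apache or "X-Accel-Redirect" for nginx), Rails hands the actual
      # file transfer off to the web server.
      send_file path, :type => 'image/jpeg', :disposition => 'inline'
    else
      head :not_found
    end
  end
end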
If you really must store the images in the DB
... then it's conceptually the same as in any other language or framework, but the exact details depend on the Pg driver; the examples below assume the ruby-pg driver you mentioned.
There are two ways to do it:
Store and retrieve bytea, as you asked about; and
Use the built-in large object support, which is often preferable to using bytea.
For small images where bytea is OK:
Read the image data from the client into a local variable
Insert that into the DB by passing the variable as bytea. Assuming you're using the ruby-pg driver, the test_binary_values example from the driver should help you; a sketch follows this list.
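For instance, a minimal sketch with ruby-pg, assuming a table like images(id serial primary key, data bytea) — the table, column and file names are illustrative:

require 'pg'

conn = PG.connect(:dbname => 'mydb')

data = File.binread('upload.jpg')   # read the uploaded image into memory

# Passing :format => 1 sends the parameter in binary, so no escaping is needed.
conn.exec_params('INSERT INTO images (data) VALUES ($1)',
                 [{ :value => data, :format => 1 }])

# Request binary results (trailing 1) to get the raw bytes back unescaped.
res = conn.exec_params('SELECT data FROM images ORDER BY id DESC LIMIT 1', [], 1)
File.binwrite('copy.jpg', res.getvalue(0, 0))

conn.close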
For bigger images (more than a few megabytes) use lo instead:
For bigger images, please don't use bytea. Its theoretical maximum is 1GB, but in practice you need roughly 3x the image size in RAM (or more) to work with a value, so you should avoid bytea for large images or other large binary data.
PostgreSQL has a dedicated lo (large object) type for that. On 9.1 just:
CREATE EXTENSION lo;
CREATE TABLE some_images(id serial primary key, image_data lo not null);
... then use lo_import to read the data from a temporary file that's on disk, so you don't have to fit the whole thing in RAM at once.
The ruby-pg driver provides wrapper calls for lo_create, lo_open, etc, and provides lo_import for local file access too. See this useful example.
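Here is a hedged sketch of the large-object round trip with ruby-pg (the table matches the CREATE TABLE above; file names are illustrative):

require 'pg'

conn = PG.connect(:dbname => 'mydb')

# Large-object calls must run inside a transaction.
oid = nil
conn.transaction do
  # lo_import streams the file from the client into the database, so the
  # whole image never has to fit in RAM at once.
  oid = conn.lo_import('upload.jpg')
  conn.exec_params('INSERT INTO some_images (image_data) VALUES ($1)', [oid])
end

# And back out again to a local file:
conn.transaction { conn.lo_export(oid, 'copy.jpg') }

conn.close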
Please use large objects rather than bytea.
* Incremental backup is possible with streaming replication, PITR / WAL archiving, etc, but again, increasing the DB size can complicate things like WAL management. Anyway, unless you're an expert (or "brave") you should be taking pg_dump backups rather than relying on replication and PITR alone. Putting images in your DB will also, by increasing the size of your DB, greatly slow down pg_basebackup, which can be important in failover scenarios.
† The adminpack offers local file access via a Pg connection for superusers. Your webapp user should never have superuser rights or even ownership of the tables it works with, though. Do your file reads and writes via a separate secure channel like WebDAV.
Related
I wrote a basic blog system based on Spring Boot.
I'm trying to figure out how I can create posts with videos and images, without having to edit everything using HTML.
Right now, I am saving my blog posts in the DB as plain text.
Is it possible to create content combining text, images and videos, and save this "content" as one row in my DB table, without creating connections between different tables?
Many thanks in advance.
Images and videos are heavy content, and storing them in the database can be a costly affair unless you are developing an application for research purposes. Querying them from the database and serving them over the network can also hurt your application's performance.
If you want to store everything in a single row, that can be done using a database BLOB object. But I would suggest having two different tables: one containing the BLOB of the image or video, and the other your usual table containing the blog text and the primary key of the BLOB table.
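A rough sketch of that two-table layout (MySQL-style syntax as one possibility; all names are illustrative):

CREATE TABLE media_blob (
    id        BIGINT AUTO_INCREMENT PRIMARY KEY,
    mime_type VARCHAR(100) NOT NULL,
    content   LONGBLOB NOT NULL           -- the raw image/video bytes
);

CREATE TABLE blog_post (
    id       BIGINT AUTO_INCREMENT PRIMARY KEY,
    body     TEXT NOT NULL,               -- the post text
    media_id BIGINT,                      -- reference to the blob, if any
    FOREIGN KEY (media_id) REFERENCES media_blob(id)
);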
If you want to take your solution live, it is better to use dedicated image/video hosting, for the following reasons:
It saves on database cost
It ensures 24x7 availability
Application performance is better, as the media are hosted independently of the application
Videos can be embedded directly in an iframe, i.e. you do not need to query megabytes of data and serve them over the network
A strict answer to your question: yes, you can use BLOBs to store videos/images in the database. Think of them as a column that contains the bytes of a video or image.
For school projects, or cases where you have a very small number of videos/images, it's probably OK. However, if you're building a real application, then don't do it :)
Every DBA will raise a bunch of concerns about why not to use BLOBs.
So a more realistic approach would be to find some file-system-like (but distributed) storage, in the style of S3 on AWS, or a hard drive on the server if you're not in the cloud, etc.
Then store that big image/video there, get back an identifier (such as the path to it, if we're talking about a hard drive), and store that identifier in the database along with the metadata you're probably already storing (blogPostId, type of file, etc.).
Once the application becomes more "mature", you can switch providers; grow as you go. There are even cloud storage services designed especially for images (like Cloudinary).
I am working on a system which will store user's picture and in the future some soft documents as well.
Number of users: 4000+
Transcripts and other documents per user: 10 MB
Total system requirement in first year: 40 GB
Additional increment each year: 10%
Reduction due to archiving each year: 10%
Saving locally on an Ubuntu Linux system without any fancy RAID.
Using MySQL community edition for application.
Simultaneous Users: 10 to 20
Documents are for historical purposes and will not be accessed frequently.
I always thought it was cumbersome to store documents in an RDBMS due to the multiple layers of access involved. However, since non-RDBMS databases use key/value pairs, is it still better to store the documents in the file system, or in the DB? Thanks for any pointers.
A similar question was asked about 7 years ago (storing uploaded photos and documents - filesystem vs database blob). I hope there has been some change in the technology, with all the NoSQL databases in play. Hence, I am asking this again.
Please correct me if I should be doing something else instead of raising a fresh question.
It really depends: notably on the DBMS considered, the file system, whether the data is remote or local, the total size of the data (petabytes are not the same as gigabytes), the number of users/documents, etc.
If the data is remote on 1Gb/s Ethernet, the network is the bottleneck, so using a DBMS won't add significant additional overhead. See the answers section of this interesting webpage, or STFW for "Approximate timing for various operations on a typical PC"...
If the data is local, things matter much more (but few computers have a petabyte of SATA disks). Most file systems on Linux use some minimal block size (e.g. 1Kbyte, 4Kbytes, ...) per file.
A possible approach might be to have some threshold (typically 4 or 8 kilobytes, or perhaps even 64 kilobytes, that is, several pages; YMMV). Data smaller than that could go directly in a database field; data bigger than that could live in a file, with the database containing the file path. Read about BLOBs in databases.
Consider not only RDBMSs like PostgreSQL, but also NoSQL solutions à la MongoDB, key-value stores à la Redis, etc.
For a local-data approach, consider not only plain files, but also SQLite, GDBM, etc. If you use a file system, avoid very wide directories: instead of having widedir/000001.jpg ... widedir/999999.jpg, organise it as dir/subdir000/001.jpg ... dir/subdir999/999.jpg, and have no more than a thousand entries per directory.
If you use a local MySQL database and don't have a lot of data (e.g. less than a terabyte), you might store any raw data smaller than, say, 64Kbytes directly in the database, and store bigger data in individual files (whose paths go into the database); but you should still avoid very wide directories for them. A sketch of both ideas follows.
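Here is a hedged sketch, in Ruby for illustration, of the threshold plus the fan-out directory naming described above (the 64KB cut-off, directory root and helper names are all assumptions):

require 'fileutils'

THRESHOLD = 64 * 1024  # the 64KB cut-off suggested above; tune to taste

# Numeric fan-out matching the dir/subdir000/001.jpg scheme above:
# at most a thousand files per directory.
def blob_path(root, id)
  File.join(root, format('subdir%03d', id / 1000), format('%03d.jpg', id % 1000))
end

# Small blobs go straight into a database column; big ones go to disk,
# and only the resulting path is recorded in the database.
def store_blob(root, id, data)
  return [:db, data] if data.bytesize < THRESHOLD

  path = blob_path(root, id)
  FileUtils.mkdir_p(File.dirname(path))
  File.binwrite(path, data)
  [:file, path]
end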
Of course, don't forget to define and apply (human decided) backup procedures.
I have integrated Jackrabbit with an Oracle database, and I am storing the data using Jackrabbit. If I don't want to retrieve the data using Jackrabbit, in what way can I get at the data? In the database, the data is stored as a blob type.
The way Jackrabbit stores the data in the DB is an implementation detail, and it does not magically map this into a "nice" DB schema, if that's what you mean. (The hierarchical nature and all the JCR features make this impossible.) It's a bit like having a Unix file system and then asking how to read the low-level inodes etc. from the file system implementation: you really should not.
Last but not least, note that while Jackrabbit is running, nothing else (except for a Jackrabbit cluster setup) must write to the DB (the tables used by Jackrabbit), as this will easily lead to data corruption.
As @TedTrippin already mentioned above, an ORM framework would make things much easier. But if you really want to do it manually in Oracle, the approach would be:
Study the code of the OCM (http://jackrabbit.apache.org/jcr/object-content-mapping.html), then fetch the content from Oracle according to its logic of associations and relations, probably not in one but in multiple queries per document; possibly with user-defined functions, which are supported in Oracle and might make things easier.
It would be interesting to know the background of your question. You tagged it with "spring" and "cms". I don't see any reason why you would want to access the data directly from Oracle; it's tedious. In case you want to provide an API for the content to an external system, or in case you have lost a CMS that once sat in front of the Jackrabbit repo and used it as a content store, you could still use such an ORM/OCM framework standalone to make it easier to access the data.
I have created an emp table to store employees' information and their passport-size photos.
This table has empno (number), emp_image_link (varchar2), ... etc. fields.
empno is auto generated using a database trigger (max empno+1).
Image: I don't want to store images in the database, since I believe it will cause problems in terms of size, performance and portability. So images should be in the file system at D:\images\,
and the image URL should be D:\images\empno.jpg, which means the emp_image_link field will contain only the image link.
I have searched Google a lot about this; everyone discusses how to store the image in the database.
I did not find any information about how to store only the link instead of the image.
I am going to use Oracle Forms Developer 11gR2.
Can anyone give me an idea of how I can do that, please?
Thank you in advance.
Murshed Khan
"i dont want to store images into the database since it will cause
problem in terms of size, performance and portability i believe. so
images should be in the file system"
Your points are not valid ones.
Size. Passport photos are pretty small, so unless you are storing pictures with extremely high pixel counts they won't take up a lot of disk. Either way they will consume comparable amounts of space in the database and on the OS.
Performance. The only possible concern would be the network traffic between the database server and the middle-tier server. This would be a function of size, so may or may not be a real issue. Using an OS file store would introduce a time delay while you retrieve the JPG for each record.
Portability. An all-in-the-database solution is more portable than what you're proposing. Nothing breaks like directory paths.
One thing you haven't considered, but really should, is DML on the employee records. If the pictures are stored in the database they are committed in the same transaction as (and hence are consistent with) the rest of the data; they are backed up at the same time and are recoverable in the same window. None of this applies to an OS directory on a separate server.
"Storing in the file system ... I got the solution using BFILE "
BFILE is the mechanism for linking a database record with an OS file, so it is the appropriate solution for the problem as you define it. But a BFILE points to files on the database server, so you would lose the only possible efficiency gain from not storing the images in the database: reduced network traffic between the database and middle-tier servers. BFILEs are not backed up with the database, nor subject to any transactional consistency.
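For reference, a rough sketch of the BFILE setup (the directory object and column names are illustrative, and the directory must exist on the database server):

CREATE DIRECTORY images_dir AS 'D:\images';

ALTER TABLE emp ADD (photo BFILE);

-- Link an existing file to a row; no image bytes enter the database.
UPDATE emp
   SET photo = BFILENAME('IMAGES_DIR', empno || '.jpg')
 WHERE empno = 1001;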
"empno is auto generated using a database trigger (max empno+1)"
Another bad idea. It doesn't scale, and more importantly it doesn't work in a multi-user environment. Please use a sequence; they're designed for this task. A sketch follows.
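A minimal sketch of the sequence-based approach (sequence and trigger names are illustrative):

CREATE SEQUENCE emp_seq START WITH 1 INCREMENT BY 1;

CREATE OR REPLACE TRIGGER emp_bi
BEFORE INSERT ON emp
FOR EACH ROW
BEGIN
  :NEW.empno := emp_seq.NEXTVAL;  -- direct assignment works from 11g onward
END;
/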
A project I'm working on potentially entails storing large amounts (e.g. ~5GB) of binary data in CoreData. I'm wondering if this would negatively impact the user's Time Machine backup. From reading the documentation it seems that CoreData's persistent store uses a single file (e.g. XML, SQLite DB, etc) so it would seem to me that any time the user changes a piece of data in the datastore Time Machine would copy the data store in its entirety to the backup drive.
Does CoreData offer a different datastore format that is more Time Machine friendly?
Or is there a better way to do this?
You can use configurations in your data model to separate the larger entities into a different persistent store. You will need to create the persistent store coordinator yourself, using addPersistentStoreWithType:configuration:URL:options:error: to add each store with the correct configuration.
To answer your question directly, the only thing I can think of is to put your Core Data store in a sparse bundle disk image, so only the changed bands would be backed up by Time Machine. But really, I think if you're trying to store this much data in SQLite/Core Data you'd run into other problems. I'd suggest you try using a disk-based database such as PostgreSQL.