Can I set up a website to retrieve data from my own back-end server?

I've made a website for an arts organisation. The website allows people to browse a database of artists' work. The database is large, and the image files for the artists' work come to about 150 GB. I have my own server that is currently just being used to keep the images on its hard drive.
I'm going to purchase hosting so I don't have to worry about bandwidth etc., but would it be better to purchase hosting that allows me to upload my entire image database, or should I use the website to get the images from my server? If so, how would I do that?
Sorry, I'm very new to this.

I think it would be better to have the data on the same server, so you avoid calls to another server for the images, which, as you say, are quite big; those extra calls can slow you down overall.
I assume you would need to set up some API on your server to deliver the images (or at least URLs for them), and then make sure they are accessible from the web.
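If you do go the route of serving images from your own server, a minimal sketch of such an API might look like this (Python with Flask purely for illustration; the /srv/artwork path and the port are assumptions, not anything from the question):

```python
# Minimal image-serving API (illustrative sketch, not a production setup).
# Assumes Flask is installed and the images live under /srv/artwork.
from pathlib import Path
from flask import Flask, abort, send_from_directory

app = Flask(__name__)
IMAGE_ROOT = Path("/srv/artwork")  # assumed location of the image store

@app.route("/images/<path:filename>")
def serve_image(filename):
    if not (IMAGE_ROOT / filename).is_file():
        abort(404)
    # send_from_directory also guards against path traversal ("../..")
    return send_from_directory(IMAGE_ROOT, filename)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)  # put nginx or similar in front in production
```

The hosted site would then embed URLs like http://your-server.example/images/artist/work.jpg, with the bandwidth caveats raised in the answers below.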

You'll want the image files on the same server as your website; requests to pull images in from elsewhere will hurt your site's performance, especially with large files.

Given the large size of the database and the bandwidth considerations, a dedicated server would be suitable, as dedicated plans include large disk space and bandwidth allowances. You can install the web server and the database server on the same machine instead of managing them separately, and managing database backups and service monitoring becomes much easier.
For instance, you can review dedicated server configurations and resources here: https://www.accuwebhosting.com/dedicated-servers

Related

Storing files on a web server

I have a project using the MEAN stack that uploads image files to a server and the names of the images to the database. The images are then shown to users of the application, somewhat like an image gallery.
I have been trying to figure out an efficient way of storing the image files. At the moment I'm storing them under the Angular application in a folder, /var/www/app/files.
What are the usual ways of storing them on a cloud server like DigitalOcean, Heroku and many others?
I'm a bit thrown off by the fact that they offer many options for data storage.
Let's say that hundreds of thousands of images were uploaded by the application to the server.
Saving all of them inside your front-end app in a subfolder might not be the best solution? Or am I wrong about this?
I am very new to these cloud hosting services and how they actually operate.
Can someone clarify what the optimal solution would be?
Thanks!
Saving all of them inside your front-end app in a subfolder might not be the best solution?
You're very right about this. Over time this will get cluttered and, unless you use some very convoluted logic, it will slow down your server.
If you're using Angular and this is in the public folder sent to every user, this is even worse.
The best solution to this is using something like an AWS S3 bucket (DigitalOcean's equivalent is Spaces, its S3-compatible object storage, and I believe Heroku offers something a bit different). These services offer file storage and essentially act as logic-less servers. You can set some privacy policies and other settings, but there's no runtime like Node.js that can do logic for you.
Your Node server (or any other server setup) interfaces with this storage service and handles most of the fetching and storing of files. You can optionally lock these storage services down so they can only communicate with your Node server, so all file traffic goes through your Node server.
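As a sketch of that pattern (shown in Python with boto3 for brevity, even though a MEAN stack would use the AWS SDK for Node; the bucket name and keys are hypothetical):

```python
# Sketch: the app server stores uploads in S3 and hands out short-lived URLs,
# so the bucket itself can stay private. Assumes boto3 and AWS credentials.
import boto3

s3 = boto3.client("s3")
BUCKET = "my-gallery-uploads"  # hypothetical bucket name

def store_image(local_path: str, key: str) -> None:
    # Only the server talks to the bucket; the browser never writes directly.
    s3.upload_file(local_path, BUCKET, key)

def image_url(key: str, expires: int = 3600) -> str:
    # A presigned URL lets the browser fetch the file for a limited time
    # without the bucket being publicly readable.
    return s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": BUCKET, "Key": key},
        ExpiresIn=expires,
    )

store_image("/tmp/upload.jpg", "gallery/upload.jpg")
print(image_url("gallery/upload.jpg"))
```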

If I am using caching on my website, where are cached files saved?

I have a small technical question about caching.
I am planning to use caching for my website, and I was wondering whether the cached files are saved on visitors' personal computers.
I asked somebody, who told me that they are saved as HTML files, and that these are not on visitors' personal computers.
Regards
That depends on what you mean by cache. Most sites use caching to save bandwidth and reduce hits to the database or other server resources by not having to re-generate dynamic content on every request. On the other hand, browsers will cache JavaScript and CSS files from websites on the local computer as part of their normal process. Cookies 'cache' important information specific to that computer/user and are also stored locally by the browser.
I am assuming that you're talking about pages on a server being reused for multiple requests. Those can be stored as tmp files or as entries in a database on the server (CakePHP and CSP come to mind here). It really depends on your configuration and what you decide you want to do.
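To make the server-side case concrete, here is a minimal sketch of caching rendered pages as temporary files (Python; the cache directory, TTL and render function are all assumptions for illustration):

```python
# Sketch: cache rendered pages on the server so dynamic content is not
# regenerated on every request. Nothing here is stored on the visitor's PC.
import hashlib
import time
from pathlib import Path

CACHE_DIR = Path("/tmp/page-cache")  # assumed cache location
CACHE_DIR.mkdir(exist_ok=True)
TTL = 300  # seconds a cached page stays fresh

def render_page(path: str) -> str:
    # Placeholder for the expensive dynamic rendering (DB queries etc.).
    return f"<html><body>Content for {path}</body></html>"

def get_page(path: str) -> str:
    cache_file = CACHE_DIR / hashlib.sha1(path.encode()).hexdigest()
    if cache_file.exists() and time.time() - cache_file.stat().st_mtime < TTL:
        return cache_file.read_text()  # cache hit: serve the stored copy
    html = render_page(path)           # cache miss: render and store
    cache_file.write_text(html)
    return html
```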

How can I improve the performance of this architecture?

I'm running a website that is CPU-heavy due to a lot of image thumbnailing.
This is how I currently do things:
User uploads image to server
Server keeps a copy, and stores the image on Amazon S3
When a thumbnail is requested, the server uses the local copy to generate it, stores it on S3, and then gives the S3 URL to the client
Subsequent requests are optimized like this: the server caches the S3 URL in memcached so it won't do the work again; the server never regenerates a thumbnail if the file already exists; and the server uses mid-sized thumbnails to generate small ones, so it doesn't have to work with large files when it isn't necessary (a sketch of this flow follows below).
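For reference, that get-or-create flow might look roughly like this (a sketch assuming Pillow, pymemcache and boto3; the bucket, sizes and key scheme are illustrative, not the actual code):

```python
# Sketch of the flow above: check memcached for the S3 URL, otherwise
# generate the thumbnail from the local copy, upload it, and cache the URL.
import boto3
from PIL import Image
from pymemcache.client.base import Client

s3 = boto3.client("s3")
cache = Client(("localhost", 11211))  # could just as well be a remote host
BUCKET = "thumbnails"                 # hypothetical bucket

def thumbnail_url(local_path: str, key: str, size=(200, 200)) -> str:
    cached = cache.get(key)
    if cached:                         # memcached hit: skip all the work
        return cached.decode()
    tmp = f"/tmp/{key.replace('/', '_')}"
    with Image.open(local_path) as im: # use the local copy, not S3
        im = im.convert("RGB")         # JPEG cannot store an alpha channel
        im.thumbnail(size)             # resizes in place, keeps aspect ratio
        im.save(tmp, "JPEG")
    s3.upload_file(tmp, BUCKET, key)
    url = f"https://{BUCKET}.s3.amazonaws.com/{key}"
    cache.set(key, url)                # subsequent requests hit the cache
    return url
```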
Now, I'm hosting on a Linode 4G instance (8 cores with 4x priority, 4 GB RAM), and despite my optimizations and a memcached hit ratio of 70%, my average CPU is at 170%. I'm constantly seeing all 8 CPUs working, with frequent spikes of 100% on many of them at the same time.
I'm using nginx and gunicorn to serve a Django application, and the thumbnails are generated with PIL.
How can I improve this architecture?
I was thinking about a few possibilities:
#1. Easiest: add a second identical server with a load balancer in front, so that they'd share the load.
The problem with this is that the two servers would not share the local image cache. Could I solve this by placing such a share on a network drive, or would the latency ultimately hinder the gains?
#2. A little harder: split the thumbnailing code out of my app as a separate web service that would run on a second server. This way the main application and database would not suffer from high CPU usage, and the web pages would be served fast. The thumbnails are already served asynchronously with JavaScript anyway.
Can anyone recommend some other solution?
Are you sure your performance problems come from thumbnails? OK, I suppose you've checked that.
You can downsize the image and upload the two thumbnails to S3 immediately (or shortly) after the user uploads it. That way you save the CPU load you're currently wasting on every HTTP request checking for those thumbnails and doing IPC with memcached.
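A sketch of that eager approach, generating both sizes once at upload time (again assuming Pillow and boto3; the sizes and key scheme are illustrative):

```python
# Sketch: create the mid and small thumbnails at upload time, so requests
# never need to check memcached or generate anything on the fly.
import boto3
from PIL import Image

s3 = boto3.client("s3")
BUCKET = "thumbnails"  # hypothetical bucket
SIZES = {"mid": (600, 600), "small": (200, 200)}

def process_upload(local_path: str, base_key: str) -> None:
    for name, size in SIZES.items():
        tmp = f"/tmp/{name}-{base_key.replace('/', '_')}.jpg"
        with Image.open(local_path) as im:
            im = im.convert("RGB")
            im.thumbnail(size)
            im.save(tmp, "JPEG", quality=85)
        s3.upload_file(tmp, BUCKET, f"{name}/{base_key}.jpg")
    # Thumbnail URLs are now deterministic (e.g. small/<key>.jpg), so the
    # web tier can build them without checking whether the file exists.
```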
In a way your problem is a "good" problem to have (or at least it could have been a lot worse), in that there are no dependencies between separate image resizing tasks, so you can trivially distribute them over multiple servers. A few comments:
Have you checked whether there is anything you can do to make the image resizing operations faster? (Google brought this up; I don't know if it's any help: http://dmmartins.appspot.com/blog/speeding-up-image-resizing-with-python-and-pil) Even if you still find you need to add more servers, anything you can do to make each resize operation more efficient will make each server go further.
If your user base keeps growing, you will eventually need to "scale out", but for the short term, it is possible you could solve the problem simply by paying another $80 for the next "tier" of service (8 cores at 8x priority).
Is image resizing really your app's only bottleneck? If image resizing were "free", how much further could you scale on your existing server before rendering pages, running DB queries, etc. would limit throughput? If you don't know, it would be good to do some simulated load testing and find out. I ask because if rendering pages, DB queries, etc. are also bottlenecks, or are soon to become bottlenecks, you are going to have to distribute the app anyway. In that case, you might as well keep thumbnailing in the main app and distribute it right now, rather than making thumbnailing run as a web service on a second server.
Regardless of whether you distribute the main app or split thumbnailing out into a separate app on a different server, you need some kind of authoritative store to keep track of where each thumbnail is kept on S3. You can keep that information in memcached, in a database, or wherever you want; it doesn't really matter. Even if you keep it in memcached, that doesn't mean you can't share the cache between two servers: one server can connect to a memcached instance running on the other server.
You asked if "the latency" of checking a cache which is held on a different server will "hinder the gains". I don't think you need to worry about that. Your problem is throughput, not latency. Those high-latency network operations parallelize very well. So if you just service more requests in parallel, you can still make full use of your CPUs (which is the resource bottleneck right now).

What kind of hosting is used for *tube sites?

I'm not sure if this is the right place for this question, and I'll be happy to remove it if needed.
When a site grows from a just-for-fun project to a site with a bigger load of visitors, and you want to enable them to upload videos, you might find yourself in need of better hosting, including a dedicated server and unlimited web traffic (or some reasonable limit).
So, if people can upload their videos, and the page gets around 1,000-10,000 visitors per day, what kinds of hosting are there to choose from? What is needed in that case?
Thanks
You are looking for a scalable solution.
The term "cloud hosting" comes to mind. Hosting your site, in full or in part (only the large media, perhaps), at a cloud provider solves the problem of server storage limits in the easiest (and cheapest) manner.

What are the best practices for image serving?

What techniques do people commonly use for uploading, storing and presenting images with a CMS?
Do you store them in the database or on the file system?
Do you generate thumbnails on upload? Or on the fly, then maybe cache them for reuse? Or rely on browser scaling?
Typically, most content management systems store the actual data of image uploads on the file system and then add a link to the file within the database. Thumbnails can be generated either on upload or on first request (generating them on the fly for every request is considered inefficient, especially given the cheap cost of storage). Relying on browser scaling is a bad idea (images may be uploaded as multi-megabyte uncompressed files), but some systems do it.
I agree with Kevin. I can't think of any CMS that doesn't store images in the file system. The only issue that comes up with that technique is if you are planning on clustering multiple web servers to run your CMS; if that's the case, you have to plan for it and have the ability to point all the web servers at the same file storage location.
The technique I've used for years is: on upload, resize the image to something practical for the web, then generate the thumbnail, then write both to the file system and record the pointer in the database (a sketch follows below).
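That upload pipeline might be sketched like this (Python with Pillow and sqlite3 purely for illustration; the storage root, table and sizes are assumptions):

```python
# Sketch: resize on upload, generate a thumbnail from the resized copy,
# write both to the file system, and record only the pointers in the DB.
import sqlite3
from pathlib import Path
from PIL import Image

STORAGE = Path("/var/www/media")  # assumed storage root
db = sqlite3.connect("cms.db")
db.execute("CREATE TABLE IF NOT EXISTS images "
           "(id INTEGER PRIMARY KEY, web_path TEXT, thumb_path TEXT)")

def handle_upload(upload_path: str, name: str) -> int:
    web_file = STORAGE / f"{name}-web.jpg"
    thumb_file = STORAGE / f"{name}-thumb.jpg"
    with Image.open(upload_path) as im:
        im = im.convert("RGB")
        im.thumbnail((1600, 1600))   # something practical for the web
        im.save(web_file, "JPEG", quality=85)
        im.thumbnail((200, 200))     # thumbnail from the already-resized copy
        im.save(thumb_file, "JPEG", quality=85)
    cur = db.execute("INSERT INTO images (web_path, thumb_path) VALUES (?, ?)",
                     (str(web_file), str(thumb_file)))
    db.commit()
    return cur.lastrowid             # the pointer recorded in the database
```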
If the site is huge, then you need to serve the images from cache servers, because serving from the file system is slow compared with serving from RAM. Take Facebook, for example: they have billions of images on their site, and last I heard, 80% were held in RAM on cache servers around the world. The file storage array they have is more or less a backup to the cache servers.
