CSV importing: server or browser? - performance

From a conceptual point of view, which solution would perform better for importing a CSV file into a database in a SaaS application?
Parse the CSV file in the browser and make an AJAX call to the server for every row.
Upload the CSV file and let the server parse it and insert it into the DB.
I know this is quite an open question, given that no technology or hardware is specified. Still, what's better for the web server's performance: receiving thousands of small connections, or having to accept and parse big uploaded files?

I think the answer depends a bit, but from my experience, uploading the CSV and letting the server load it into the database has several benefits. There is less per-row overhead when you upload a straight CSV to a web or app server, and you can take advantage of things like server hardware and physical proximity to the DB server for speed. There are also a lot of tools that handle CSVs efficiently on the server side, depending on the tech stack you choose. I think it would be advantageous to send the file en masse and have the server process the data upon upload.
HTH,
CDC
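
As a rough illustration of the server-side approach, here is a minimal Python sketch; SQLite stands in for whatever database the stack actually uses, and the table and column names are hypothetical. The point is that the whole file is parsed once and inserted in a single transaction instead of one request per row.

    import csv
    import sqlite3

    def import_csv(path, conn):
        # Parse the uploaded file once and insert every row in a single
        # transaction -- far less overhead than one HTTP call per row.
        with open(path, newline="") as f:
            reader = csv.reader(f)
            next(reader)  # skip the header row, if your files have one
            with conn:    # commits once for the whole file
                conn.executemany(
                    "INSERT INTO imports (name, email) VALUES (?, ?)",
                    ((row[0], row[1]) for row in reader),
                )

    conn = sqlite3.connect("saas.db")
    conn.execute("CREATE TABLE IF NOT EXISTS imports (name TEXT, email TEXT)")
    import_csv("upload.csv", conn)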

Related

Can I set up a website to retrieve data from my own back-end server

I've made a website for an arts organisation. The website allows people to browse a database of artists' work. The database is large, and the image files for the artists' work come to about 150 GB. I have my own server that is currently just being used to keep the images on its hard drive.
I'm going to purchase hosting so I don't have to worry about bandwidth etc., but would it be better to purchase hosting that allows me to upload my entire image database, or should I use the website to fetch the images from my own server? If so, how would I do that?
Sorry, I am very new to this.
I think it would be better to have the data on the same server, so you avoid calls to another server for the images, which, as you say, are quite big; that can slow you down overall.
I assume you would need to set up some API on your server to deliver the images, or at least URLs for them, and then you must make sure they are accessible.
You'll want the image files on the same server as your website, as requests elsewhere to pull in images will definitely hinder your site's performance - especially if you have large files.
Given the large size of the database and the bandwidth involved, a dedicated server would be suitable, as these include large disk space and bandwidth. You can install the web server and the database server on the same machine instead of managing them separately. Managing database backups and service monitoring also becomes much easier.
For instance, you can review dedicated server configurations and resources here: https://www.accuwebhosting.com/dedicated-servers

Oracle Applications - programmatically manipulate form data

Oracle Applications 11.5.7 with Forms Server 6 - is there any way to programmatically send data from the client to the server, as if it were coming from the client forms?
This is an old version, but I am stuck with it for now.
I gather there is an Oracle-internal protocol that governs how the client JVM sends data across the network to the Forms server to get it to accept user input. Can I simulate this, at least for simple data input? My goal is to get mass data input done programmatically rather than having to use the Oracle forms, or get an Oracle specialist to build a custom loader for me, which is likely to be prohibitively expensive.
I have already looked at "generic" input loaders for BOMs, routings, costs, etc., such as apps4more.com, but they won't support 11.5.7. The only one I found so far that would support 11.5.7 was exorbitantly expensive.
Have you tried Forms Playback?
You can record Forms actions to a file and play them back later. It can be useful if you just need to redo the same activity on different instances. The file is plain text, but unfortunately the format is quite difficult, and it is not easy to generate it by means other than recording...

Image Server Performance

I have read many questions/comments regarding saving images in the DB or in the file system on the server side. However, I'm still confused. For now I allow users to upload images (limited to 10 MB each), save them in a server folder, and serve them via an Apache context-path configuration pointed at that location. However, due to the number of images and high load, we want to provide load balancing and failover functionality. So I have two options.
Add code to replicate each uploaded image to all servers, or use rsync to do that.
Use CouchDB or MongoDB and save the image as an attachment of a document, which gives me replication functionality out of the box.
Can anyone show me the pros/cons of these approaches? Can CouchDB/MongoDB match the read performance of the file system?
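
For reference, option 2 might look something like this minimal Python/GridFS sketch (connection string and filenames are hypothetical; the actual replication comes from the MongoDB replica set, not from this code):

    import gridfs
    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")  # hypothetical URI
    fs = gridfs.GridFS(client["images"])

    # Store an uploaded image; GridFS chunks it into documents, and a
    # replica set then replicates those documents with no rsync step.
    with open("photo.jpg", "rb") as f:
        file_id = fs.put(f, filename="photo.jpg")

    # Read it back when serving a request.
    data = fs.get(file_id).read()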
You can also store files in a distributed file system. The benefit over a DB-backed image server is that you do not have to alter the application. Obviously, storing all the data the same way, including images, may be a benefit for you, but changing the architecture of an already working system may also be problematic.
For example, GlusterFS may be installed on top of a "normal" file system to give you distributed features while minimizing changes to the system itself. Via its plugins (translators) it is supposed to support all the features you would expect from a cloud system: replication, load balancing, striping of files across servers, and failover.
Can CouchDB/MongoDB have the same read performance as the file system?
No; going through the database adds overhead on every read that a direct file-system hit does not have, and that is an unfortunate reality.
I have no idea of your current setup, load and performance, so I cannot really advise on what to do; however, Apache isn't really a good image server anyway.
Your best bet might be to look into a CDN cache for your images.

Caching Dynamic data that isn't really dynamic in an IIS7 environment

Okay, so I have an old ASP Classic website. I've determined I can reduce a huge number of DB calls by caching the data daily. Our site data is read only, and changes very slowly. I think based on our site usage, I would be able to cache pages by query string for every visit each day, without a hit to our server.
My first thought was to use output caching, but the problem I discovered right away was that I gained no performance until the third request for a given page. I verified this using SQL Profiler, but I'm not sure why.
My second thought was to add the ObjPageCache include file from https://web.archive.org/web/20211020131054/https://www.4guysfromrolla.com/webtech/032002-1.shtml. After some research I discovered that this could cause more issues than it solves: http://support.microsoft.com/kb/316451
I'm hoping someone on here will tell me that the issue with sending ServerXMLHTTP or WinHTTP requests to the same server has been resolved by Microsoft since 2002.
Depending on how your data is maintained you could choose from a number of ways to cache it.
If your data is changed and saved in one single place, you could generate an HTML file which you save to the server disk and link to. This requires write access for the process running your site, though (e.g. NETWORK SERVICE). It produces fast pages, as the server serves them without any scripting engine getting involved.
Another option is reading the data into a DomDocument which you store in the Application object and refer to on the page that needs it (saving the round trip to the database). You could keep two timestamps with the cached data: one for the caching time and one for the time the data last changed in the database. The timestamps allow a fast staleness check: cached timestamp <> database timestamp => refresh the data; otherwise use the cached copy. One thing to note about this approach is that Application does not accept anything but free-threaded objects, so you will have to use MSXML2.FreeThreadedDomDocument.6.0.
Personally I prefer the last one, as it allows for more dynamic usage and I don't have to worry about write-access permissions for the process running my site (which would probably pose security risks anyway).
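
The staleness check in the second option is simple; here it is sketched in Python for clarity (the cache dict stands in for the ASP Application object, and the two callbacks are hypothetical):

    import time

    _cache = {}  # stands in for the ASP Application object

    def get_data(key, fetch_from_db, get_db_timestamp):
        # Refresh only when the database timestamp has moved on;
        # otherwise serve the cached copy and skip the round trip.
        entry = _cache.get(key)
        db_ts = get_db_timestamp(key)
        if entry is None or entry["db_ts"] != db_ts:
            entry = {"data": fetch_from_db(key),
                     "db_ts": db_ts,
                     "cached_at": time.time()}
            _cache[key] = entry
        return entry["data"]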

Store user profile pictures on disk or in the database?

I'm building an ASP.NET MVC application where users can attach a picture to their profile; the picture also appears in other areas of the system, like a messaging gadget on the dashboard that displays recent messages.
When the user uploads these I am wondering whether it would be better to store them in the database or on disk.
Database advantages
Easy to back up the entire database and keep profile content/images with the associated profile/user tables
When I build web services later down the track, they can just pull all the profile-related data from one spot (the database)
Filesystem advantages
loading files from disk is probably faster
any other advantages?
Where do other sites store this sort of information? Am I right to be a little concerned about database performance for something like this?
Maybe there would be a way to cache images pulled out from the database for a period of time?
Alternatively, what about storing these images in the database, but shadow-copying them to disk so the web server can load them from there? This would seem to give both the backup and convenience of a DB, whilst keeping the speed advantage of files on disk.
Infrastructure in question
The website will be deployed to IIS on Windows Server 2003 running the NTFS file system.
The database will be SQL Server 2008.
Summary
Reading around a lot of related threads here on SO, many people are now trending towards the SQL Server FILESTREAM type. From what I could gather, however (I may be wrong), there isn't much benefit when the files are quite small; FILESTREAM looks to greatly improve performance when files are multiple MBs or larger.
As my profile pictures tend to sit around ~5 KB, I decided to just leave them stored in the database as varbinary(max).
In ASP.NET MVC I did see a bit of a performance issue returning FileContentResults for images pulled out of the database like this, so I ended up caching the file to disk when it is read, if the location of this file is not found in my application cache.
So I guess I went for a hybrid:
Database storage to make backing up of data easier, with files linked directly to profiles
Shadow copying to disk to allow better caching
At any point I can delete the cache folder on disk, and as the images are re-requested they will be re-copied on first hit and served from the cache thereafter.
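
The pattern, sketched in Python rather than ASP.NET MVC (the folder name and the DB loader callback are hypothetical):

    import os

    CACHE_DIR = "image_cache"  # hypothetical shadow-copy folder

    def profile_picture(user_id, load_blob_from_db):
        # Serve from the disk cache when present; otherwise pull the
        # varbinary(max) blob from the database and shadow-copy it.
        path = os.path.join(CACHE_DIR, f"{user_id}.jpg")
        if not os.path.exists(path):
            os.makedirs(CACHE_DIR, exist_ok=True)
            with open(path, "wb") as f:
                f.write(load_blob_from_db(user_id))
        with open(path, "rb") as f:
            return f.read()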
You should store a reference to the files on a database and store the actual files on disk.
This approach is more flexible and easier to scale.
You can have a single database and several servers serving static content. It will be much trickier to have several databases doing that work.
Flickr works this way.
I gave a more detailed answer here, you may find it useful.
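
A minimal sketch of that layout in Python, with SQLite standing in for the real database (directory and table names are hypothetical):

    import os
    import sqlite3
    import uuid

    UPLOAD_DIR = "uploads"  # hypothetical static-content directory

    conn = sqlite3.connect("app.db")
    conn.execute("CREATE TABLE IF NOT EXISTS profile_pictures"
                 " (user_id INTEGER PRIMARY KEY, path TEXT)")

    def save_picture(user_id, data):
        # The bytes go to disk, where any web server can serve them;
        # only the path reference lives in the database.
        os.makedirs(UPLOAD_DIR, exist_ok=True)
        path = os.path.join(UPLOAD_DIR, uuid.uuid4().hex + ".jpg")
        with open(path, "wb") as f:
            f.write(data)
        with conn:
            conn.execute("INSERT OR REPLACE INTO profile_pictures VALUES (?, ?)",
                         (user_id, path))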
Actually, the database lookup may be faster, depending on the number of images you have, unless you are using a highly optimized filesystem. Databases are designed for fast lookups and use a LOT more interesting techniques than a file system does.
ReiserFS (now obsolete) was really awesome for lookups; ZFS, XFS, and NTFS all have fantastic hashing algorithms, and Linux ext4 looks promising too.
The hit on the system is not going to be any different in terms of block reads. The question is which is faster: a query that returns the filename (maybe a hash?), which in turn is accessed with a separate open/send/close, or just dumping the blob out?
There are several things to consider, including the network hit, the processing hit, distributability, etc. If you store the images in the database, you can move them around. Then again, storing images on a content delivery service may be WAY faster, since you avoid taking the network hits yourself.
Think about it, and remember a bit of benchmarking never hurt nobody :-) So test it out with your typical dataset size, and take into account things like simultaneous queries etc.
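
For example, a rough micro-benchmark along these lines (SQLite standing in for your database, a random blob for a ~5 KB profile picture) is enough to get a first number on your own hardware:

    import os
    import sqlite3
    import time

    N = 1000
    blob = os.urandom(5 * 1024)  # stand-in for a ~5 KB profile picture

    with open("bench.img", "wb") as f:
        f.write(blob)

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE images (id INTEGER PRIMARY KEY, data BLOB)")
    conn.execute("INSERT INTO images VALUES (1, ?)", (blob,))

    start = time.perf_counter()
    for _ in range(N):
        with open("bench.img", "rb") as f:
            f.read()
    print("file system:", time.perf_counter() - start)

    start = time.perf_counter()
    for _ in range(N):
        conn.execute("SELECT data FROM images WHERE id = 1").fetchone()
    print("database:   ", time.perf_counter() - start)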
