I have an existing application which uses SQL Server 2005 as a back-end. It holds a large amount of data, and I need to join tables that contain 50K-70K records. The client machine is low-end hardware.
So, can I improve performance by using MS Access as the back-end? I also need to run search operations against the Access file. Which one performs better?
Is querying Access faster than querying SQL Server on low-end hardware?
Because SQL Server runs as a separate process, caches results, and uses RAM and processing power even when it is not being queried, if the client computer has very little RAM or a very slow processor (or, perhaps even more importantly, a single-core processor), I could see a situation where SQL Server is actually SLOWER than using MS Access.
Without information about your hardware setup, approximately what percentage of your application relies on querying the database, etc., I'm not sure this question can be easily answered.
MS SQL Server 2005 Express requires at least 512 MB RAM (see http://www.microsoft.com/sqlserver/2005/en/us/system-requirements.aspx), so if your lower-end hardware doesn't have at least 512MB, I would certainly choose MS Access over SQL Server.
I should also add that you may want to consider SQLite (see http://www.sqlite.org/), which should have significantly less overhead than MS SQL Server. I'm not certain how it would stack up against MS Access over Jet; my gut instinct is that it would perform better with less overhead.
70,000 records is really not that big for SQL Server (or Access for that matter). I would echo what has already been said: all things being equal, SQL Server will outperform Access.
I would go back to your query and look at the execution plan to see why it is so slow; missing indexes, out-of-date statistics or a whole host of other reasons could explain your current performance problems.
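If it helps, here is a minimal T-SQL sketch of the kind of checks I mean (the table, column and index names are made up for illustration):

SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- Run the slow join and compare logical reads before/after any index or statistics changes:
SELECT o.OrderID, c.CustomerName
FROM dbo.Orders AS o
JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID;
-- Refresh stale statistics and rebuild a fragmented index:
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
ALTER INDEX IX_Orders_CustomerID ON dbo.Orders REBUILD;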
SQL Server also gives you the option of using indexed (materialized) views to help with performance. The trade-off is slower insert/update/delete performance, but if you read more than you write it might be worth it.
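For example, a hedged sketch of what such an indexed view might look like (hypothetical table and column names; the summed column is assumed NOT NULL, and the view must be schema-bound):

CREATE VIEW dbo.vOrderTotals
WITH SCHEMABINDING
AS
SELECT o.CustomerID,
       COUNT_BIG(*)  AS OrderCount,   -- COUNT_BIG is required in an indexed view with GROUP BY
       SUM(o.Amount) AS TotalAmount   -- Amount assumed NOT NULL
FROM dbo.Orders AS o
GROUP BY o.CustomerID;
GO
-- The unique clustered index is what actually materializes the view on disk:
CREATE UNIQUE CLUSTERED INDEX IX_vOrderTotals ON dbo.vOrderTotals (CustomerID);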
I think Albert Kallal's comment is right, and the fact is that if you have a single-user app running on a single workstation (Access client with SQL Server running on the same workstation as a client), it will quite often be slower than if the setup on that workstation were Access client to Jet/ACE back end on the same machine. SQL Server adds a lot of overhead that delivers no benefit when there is no network in between the client and the SQL Server.
The performance equation flips when there's a network involved, even for a single-user app. If the Access client runs on a workstation, and the SQL Server on a server on the other end of a network connection (even a fast one), it will likely be faster than if the data is stored in a Jet/ACE file on a file server.
But it's not a given, in my opinion. It depends entirely on the engineering of the application and the excellence of the schema.
I tried SQL Server Express 2005 vs MS Access 2010. Many people said SQL Server would run faster than MS Access (I thought so too at first). But what happened surprised me: running the query in MS Access was significantly faster than in SQL Server, with the same data and structure (I had converted the database from Access to SQL Server beforehand).
I don't know how it performs for other operations like insert, update, and delete yet.
Local SQL Server Express 2014: about 2,200 records in 1 minute (2,200x connect to DB and retrieve 1 record)
External SQL Server Express 2014 (different IP): about 2,200 records in 1 minute (2,200x connect to DB and retrieve 1 record)
External SQL Server 2000 (old server): about 10,000 records in 1 minute (10,000x connect to DB and retrieve 1 record)
Local Access database: about 55,000 records in 1 minute (55,000x connect to DB and retrieve 1 record)
We were also surprised.
I'll answer the question directly, but first it is important to know a few things about Access and SQL.
In general, I have found that a small database of up to 10K records will perform equally well on either Access or SQL Server if all machines have reasonable hardware. Access has the benefit of simplicity for a small number of users, up to about 4, but it also has a size limit of 2GB, so you need to be careful that the database stays below that limit. Some databases start small but have a way of growing over time; that is something to keep in mind when planning the future of your program and/or database. If you might approach the 2GB limit, one option is Microsoft SQL Server 2014 Express edition, which has a database size limit of 10GB. SQL Express is full SQL Server, but with size limitations. Full-blown SQL Server 2014 has a maximum database size of 524PB (524,000,000GB), so it would be fair to say it has no practical limit.
If your database has more than 10K records and especially for larger databases of 100K records or more, SQL can demonstrate significant performance gains.
Some of that performance can be achieved with MS Access by using "pass-through queries", just as it can by any program that sends optimized queries to SQL Server.
Why? The answer comes from how the technology works under the hood. With Access, if it is not using pass-through queries, it will read an entire table, find which records it needs, and then show the result. A program that sends an optimized query to the SQL engine gets back just the result rows, in a very efficient manner.
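As a rough illustration, the body of an Access pass-through query is just T-SQL that SQL Server executes itself, so only the result rows travel back to the client (the table and column names below are invented):

SELECT c.CustomerName, COUNT(*) AS OrderCount
FROM dbo.Orders AS o
JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
WHERE o.OrderDate >= '20150101'
GROUP BY c.CustomerName;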
At the end of the day, if you have a small (<10K record) database used by up to 4 people, MS Access might make sense. If you have plans that the database could grow to more than 10K records or be used by more than 5 users, SQL would be the logical choice.
Specifically for the question posed about a 50-70K record database: I think if you have reasonable hardware, SQL Server will generally perform better; if you have a unique situation (such as lower-end hardware on the SQL Server machine), a move to Access could see some improvement.
My take on this topic is that one should think of payload in terms of pickup truck versus an 18 wheeler. Better/worse/faster/slower somewhat misses the point. It is a matter of choosing the appropriate vehicle for the payload.
70k records is easily handled by today's PCs, so one may as well stick with the pickup truck. Unless an organization already has an installed SQL Server skill set, there would be no reason to use it for an on-premises Windows application of just 70k records. Obviously, if it is a web/mobile app that requires a back-end db technology, then Access isn't a candidate.
SQL Server will always give you better performance because the query is executed on the server. Access on the back-end won't help because your client application will need to pull all the data from the tables, and then perform the join locally.
SQL Server has better indexing options... Filtered indexes, included columns, etc
There is ZERO chance that an Access query is faster than a query against a properly indexed SQL Server database.
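To illustrate the indexing options mentioned above, here is a hedged sketch of a filtered index with included columns (available from SQL Server 2008 onward; the names are hypothetical):

CREATE NONCLUSTERED INDEX IX_Orders_Open
ON dbo.Orders (CustomerID)
INCLUDE (OrderDate, Amount)   -- included columns make the index covering for the query
WHERE Status = 'Open';        -- the filter keeps the index small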
We have a B4ms VM running a SQL server (as well as web server). We have installed Power BI Gateway on it to make reports with on-prem data.
Basically the user can sign to the server and view power bi reports in the browser.
I find it a bit dumb that the user has to query Power BI for the data, which in turn gets it from the machine, but perhaps there is no other way.
The issue we are running into is that some visuals take a huge performance hit when loading. Some even seem to exceed the available resources.
I know it's somewhat of a broad question to ask, but maybe specifically - is there a way to improve the connection between the VM and the PBI server?
It will depend on the type of query that you are sending down to the SQL Server. For a number of projects that I have deployed, I have used Direct Query over data sources of at least 50-100GB; however, these have mostly been standard star-schema data warehouses or a defined reporting table, both with the relevant indexes, covering indexes, or columnstore indexes to allow more efficient retrieval of data. These have been on Azure SQL and on-prem SQL instances.
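As a sketch of the kind of indexing I mean on such a reporting table (the names are assumptions, and a clustered columnstore index requires SQL Server 2014+ or Azure SQL):

-- Compresses the fact table and speeds up the aggregate scans DirectQuery generates:
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales ON dbo.FactSales;
-- Alternatively, a covering rowstore index for a specific, frequently hit report query:
CREATE NONCLUSTERED INDEX IX_FactSales_Date
ON dbo.FactSales (DateKey)
INCLUDE (CustomerKey, SalesAmount);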
Direct Query mode will slow down due to the number of queries it has to run against the data source, based on the measures, the relationships, and the connection overhead. Another factor can be the number of visuals on the page, as each visual is a query and each one has to run against the data source.
One other method to increase the speed of Direct Query would be to use Aggregations in Power BI, to store an imported subset of data in Power BI. If the query can be answered by the aggregation layer then it will be answered quicker. Microsoft demonstrated this with the 'Trillion Row Demo'
In terms of Power BI Direct Query issues, from the range of clients that I work with, those that do have issues with Direct Query tend to have a mash-up of tables in an inefficient schema, running sub-optimal queries on the data source, with a number of data transformations done in DAX, and badly written DAX measures, for example lots of DISTINCT COUNTs and SWITCHes.
For the connection, make sure you have the latest Data Gateway installed/updated, as optimizations to the Mashup engine can make it faster. Another option would be to shift the DB to Azure SQL Database and remove the need for the gateway.
For DirectQuery reports you need to examine the generated SQL and evaluate the execution at SQL Server. You can use the Performance Analyzer in Power BI Desktop to capture the DAX and SQL generated as your DirectQuery model interacts with SQL Server, and then use SQL Server Management Studio and the Query Store to examine the Execution Plans and indexing options.
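For example, a rough sketch of pulling the slowest captured statements out of the Query Store (SQL Server 2016 or later, with Query Store enabled on the database):

SELECT TOP (20)
       qt.query_sql_text,
       rs.avg_duration,      -- microseconds
       rs.count_executions
FROM sys.query_store_query_text AS qt
JOIN sys.query_store_query AS q ON q.query_text_id = qt.query_text_id
JOIN sys.query_store_plan AS p ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs ON rs.plan_id = p.plan_id
ORDER BY rs.avg_duration DESC;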
When I run a query to copy data from schemas, does it perform all SQL on the server end or copy data to a local application and then push it back out to the DB?
The two tables sit in the same DB, but the DB is accessed through a VPN. Would it change if it was across databases?
For instance (Running in Toad Data Point):
create table schema2.target_table as
select
  sum(row1) as row1_total,
  row2
from schema1.source_table
group by row2;
The purpose I ask the question is because I'm getting quotes for a Virtual Machine in Azure Cloud and want to make sure that I'm not going to break the bank on data costs.
The processing of SQL statements on the same database usually takes place entirely on the server and generates little network traffic.
In Oracle, schemas are a logical object. There is no physical barrier between them. In a SQL query using two tables it makes no difference if those tables are in the same schema or in different schemas (other than privilege issues).
Some exceptions:
Real Application Clusters (RAC) - RAC may share a huge amount of data between the nodes. For example, if the table was cached on one node and the processing happened on another, it could send all the table data through the network. (I'm not sure how this works on the cloud though. Normally the inter-node traffic is done with a separate, dedicated network connection.)
Database links - It should be obvious if your application is using database links though.
Oracle Reports and Forms(?) - A few rare tools have client-side PL/SQL processing. Possibly those programs might send data to the client for processing. But I still doubt it would do something crazy like send an entire table to the client to be sorted, and then return the results to the server.
Backups/archive logs - I assume all the data will be backed up. I'm not sure how that's counted, but possibly that means all data written will also be counted as network traffic eventually.
The queries below are examples of different ways to check the network traffic being generated.
--SQL*Net bytes sent for a session.
select *
from gv$sesstat
join v$statname
on gv$sesstat.statistic# = v$statname.statistic#
--You probably also want to filter for a specific INST_ID and SID here.
where lower(display_name) like '%sql*net%';
--SQL*Net bytes sent for the entire system.
select *
from gv$sysstat
where lower(name) like '%sql*net%'
order by value desc;
Initial situation
I have a SQL Server Express 2008 R2 instance running. There are ten users who constantly read/write to the same tables using stored procedures. They do this day and night.
Problem
The performance of the stored procedures degrades more and more as the database size increases.
A Stored Procedure call needs avg 10ms when the database size is about 200MB.
The same call needs avg 200ms when the database size is about 3GB.
So we have to clean up the database once a month.
We already did index optimization for some tables with positive effects but the problem still exists.
Finally, I'm not a SQL Server expert. Could you give me some hints on where to start getting rid of this performance problem?
Download and read Waits and Queues
Download and follow the Troubleshooting SQL Server 2005/2008 Performance and Scalability Flowchart
Read Troubleshooting Performance Problems in SQL Server 2005
The SQL Server Express Edition limitations (1GB memory buffer pool, only one socket CPU used, 10GB database size) are unlikely to be the issue. Application design, bad queries, excessive locking concurrency and poor indexing are more likely to be the problem. The linked articles (specially the first one) include methodology on how to identify the bottleneck(s).
This is MOST likely simply a programmer mistake - it sounds like you either have:
Improper indexing on some tables. This is NOT optimization - bad indexing is like broken HTML for web people; if you have no index, you are basically not using SQL as it is supposed to be used. You should always have proper indexes.
Not enough hardware, such as RAM. Yes, SQL Server can manage a 10GB database, but if your hot set (the stuff accessed all the time) is 2GB and you only have 1GB available, it WILL hit disk more often than it needs to.
Slow discs, which is particularly an Express problem because most people do not bother to set up a proper disc layout. Then they run a SQL database against a slow 200 IOPS end-user disc, whereas - depending on need - a SQL database wants MANY spindles or an SSD (a typical SSD these days does 40,000 IOPS).
That is it at the end - plus possibly really bad SQL. A typical filter error: someformula(field) LIKE value, which means "forget your index, please do a table scan and calculate someformula(field) for every row before checking".
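A minimal sketch of that filter mistake and a sargable rewrite (hypothetical table and column; whether the rewrite is equivalent depends on the column's collation):

-- Non-sargable: the function hides the column from the index, forcing a scan:
SELECT * FROM dbo.Customers WHERE UPPER(LastName) LIKE 'SMI%';
-- Sargable: an index on LastName can now be used for a range seek:
SELECT * FROM dbo.Customers WHERE LastName LIKE 'Smi%';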
First, SQL Server Express is not the best edition for your requirement. Get a Developer Edition to test with: it is exactly like Enterprise but free, as long as you don't use it in production.
As for the performance, there are so many things involved here, and you can improve it using anything from indexes to partitioning. We need more info to provide help.
Before optimizing your SQL queries, you need to find the hotspots among them. Usually you would use SQL Profiler for this on SQL Server. The Express edition has no such tool, but you can work around it with a few queries:
Return all recent queries:
SELECT *
FROM sys.dm_exec_query_stats order by total_worker_time DESC;
Return only top time consuming queries:
SELECT total_worker_time, execution_count, last_worker_time, dest.TEXT
FROM sys.dm_exec_query_stats AS deqs
CROSS APPLY sys.dm_exec_sql_text(deqs.sql_handle) AS dest
ORDER BY total_worker_time DESC;
Now you should know which query needs to be optimized.
It may be poor indexes, poor database design, lack of normalization, unwanted column indexes, or poor queries that take a long time to execute.
SQL Express is built for testing purposes and its performance is deliberately limited by Microsoft. If you use it in a production environment you may want to get a license for SQL Server.
Have a look here SQL Express for production?
I am stress testing a database table
I am looking for any software that can connect to my database and show me some metrics, like the number of rows in a table, time for inserts, inserts/time, table fragmentation (logical/physical), etc.
It would be great if the reporting tool can do the following:
1] Report in real time, or at least at some interval, so that I do not have to wait for the test to finish to get a first look at the data
2] Ability to do stuff with the data later, like get the 99.99th percentile, averages, etc.
3] Is mostly freely available :)
Does anyone have any suggestions for something I can use with my Oracle table? Any pointers would be great.
I can actually write scripts to log stuff like select count(*) etc., but then I would have to spend a lot of time parsing and massaging the data for reporting rather than on the tests themselves.
I figure some intelligent tool might already be out there?
Thanks
Edit:
I am looking at a piece of design for a new architecture. The tests are "comparison" tests for different designs, and hence as long as I run them on the same hardware, same schema, etc., they are comparable to some granularity.
I want to monitor index fragmentation, response times, etc. If you think there are other things that can change, please let me know. I am trying to roll the table back to a particular state (basically truncate) for each new iteration of the test.
First, Oracle has built-in functionality for telling you the number of rows in a table (either use count(*) or search 'gather statistics oracle' for another option).
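For example (the schema and table names are placeholders):

-- Exact row count:
select count(*) from my_schema.my_table;
-- Refresh optimizer statistics, then read the stored row count:
exec dbms_stats.gather_table_stats(ownname => 'MY_SCHEMA', tabname => 'MY_TABLE');
select num_rows, last_analyzed
from all_tables
where owner = 'MY_SCHEMA' and table_name = 'MY_TABLE';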
But "stress testing a table" sounds to me like you're going down the wrong path. Most of the metrics you're mentioning ("time for inserts , inserts/time, table fragmentation[logical/physical] etc") are highly dependent on many factors:
what OS Oracle's running on
how the OS is tuned (i.e. other services running)
how the specific Oracle instance is configured
what underlying storage architecture Oracle's using (and how tablespaces are configured)
what other queries are being executed in the database at the exact same time as your test
But NONE of them would be related to the table design itself.
Now, if you're wondering if your normalized (or de-normalized) table schema is hurting your application, that's another matter. As is performance being degraded by improper/unneeded/missing indexes, triggers, or a host of other problems.
But if you really want an app that will give you real-time monitoring, check out Quest Software's Spotlight on Oracle. But it's definitely not free.
Just to add to the other comments, I believe what you really want is to stress test the queries you're running and not the table. The table is just a bunch of data blocks on a disk and the query is what will make the difference in performance as far as development is concerned. That will tell you if you need different indexes or need to redesign the query.
On the other hand, if you're looking at it as a DBA or system administrator, you're probably more interested in OS level statistics especially disk latency, memory paging, and CPU utilization.
All this is available in Enterprise Manager, which is my primary tuning tool for development and DBA work. If you don't have that, read up on using sql_trace to profile your queries, and on your OS-specific documentation for how to get those stats.
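A minimal sketch of tracing your own session and formatting the result with tkprof (the tracefile identifier is just an assumption to make the file easy to find):

alter session set tracefile_identifier = 'stress_test';
alter session set sql_trace = true;
-- ... run the test queries here ...
alter session set sql_trace = false;
-- then on the server: tkprof <tracefile>.trc report.txt sort=exeela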
Our application runs on the web, is mostly an inquiry tool, does some transactions. We host the Oracle database. The app has always had a different instance of Oracle for each customer. A customer is a company which pays us to provide our service to the company's employees, typically 10,000-25,000 employees per customer. We intend to have several hundred customers. We do a major release every few years, and migrating to that new release is challenging: we might have a team at the customer site for a couple weeks, explaining new functionality and setting up the driving data to suit that customer.
We're considering going multi-client, putting all our customers into a single shared Oracle 11g instance on a big honkin' Windows Server 2008 server -- in order to reduce costs. I'm wondering if that's advisable.
There are some advantages to having separate instances for each customer. Tell me if these are bogus, please. In my rough guess about decreasing importance:
Our customers MyCorp and YourCo can be migrated separately when breaking changes are made to the schema. (With multi-client, we'd be migrating 300+ customers overnight!?!)
MyCorp's data can be easily backed up and (!!!) restored, without affecting other customers.
MyCorp's data is securely separated from their competitor YourCo's data, without depending on developers to get the code right and/or DBAs getting the configuration right.
Multiple instances are lower risk, because a disaster with one customer (someone accidentally doubles everyone's salary and the error is discovered after pay day) doesn't affect other customers. A disaster that affected ALL our customers (whoops, new DBA, and suddenly every participant has the same SSN!?!) might put our company under.
Having one instance on one server presents a single point of failure, with our entire customer base out of business if a hurricane knocks the building over. Multiple instances on multiple servers permits geographic dispersion: no catastrophe will affect too large a proportion of our customers, and the unaffected servers in other regions can take on the load of the failed servers.
Performance is better because the database is smaller (10,000 vs 2,000,000 rows in ~50 tables).
If MyCorp's offices are (mostly) in just one region, then MyCorp's instance can be geographically co-located there, so network lag doesn't hurt performance. We can provide better service to global clients, for the same reason.
If MyCorp wants to take their database in-house, we can easily export their instance to give MyCorp their data.
Load-balancing is easier because instances can be placed on different servers (this is with a web farm).
When a DEV or QA instance is needed, it's easier to clone the real instance and anonymize the data, because there's much less data.
Because they're small enough, developers can have their own instance running locally, so they can work on code while waiting at the airport and while in-flight, without fighting VPN hassles.
Q1: What are other advantages of separate instances?
We are contemplating changing the database schema and merging all of our customers into one Oracle instance, running on one hefty server.
Here are advantages of the multi-client instance approach, most important first (my WAG). Please snipe if these are bogus:
Less work for the DBAs, since they only need to maintain one instance instead of hundreds. Less DBA work translates to lower cost, which is our main motive for this change.
With just one instance, the DBAs can do a better job of optimizing performance. They'll have time to add appropriate indexes and review our SQL.
It will be easier for developers to debug & enhance the application, because there is only one schema and one app (there might be dozens of schema versions if there are hundreds of instances, with a different version of the app for each version of the schema). This reduces costs too. The alternative is having to start every debug session with (1) What version is this customer running and (2) Let's struggle to recreate the corresponding development environment, code and database. (We need a Virtual Machine that includes the code AND database instance for each patch and release!)
Licensing Oracle is cheaper because it's priced per server irrespective of heft (or something -- I don't know anything about the subject).
The database becomes a viable persistent store for web session data, because there is just one instance.
Some database operations are easier with one multi-client instance, like finding a participant when they're hazy about which customer they (or their spouse, maybe) works for: all the names are in one table. Reporting across customers is straightforward.
Q2: What are other advantages of having multiple clients in one instance?
Q3: Which approach do you think is better (why)? Instance per customer, or all customers in one instance?
I'm concerned that having one multi-client instance makes migration near-impossible, and that's a deal killer...
... unless there is a compromise solution, like having two multi-client instances, the old and the new. In that case, we would design cross-instance solutions for finding participants, reporting, etc. so customers could go from one multi-client instance to the next without anything breaking.
Unless you are using Oracle XE (the limited, free edition) having one database per server will get very expensive very quickly, even if you're buying single core, single CPU boxes. Having several databases per server is inefficient, because each database incurs an overhead of CPU and RAM usage. Tuning is more difficult, because contention is harder to diagnose.
So, as well as being easier to administer, a single big server ought to work out cheaper than lots of discrete little servers (no guarantees, no money back!). Make sure you buy the biggest, fastest chips you can and as much RAM as you have free slots. Those are things which give you better performance without affecting your licensing costs.
Consider the Partitioning option, if you can afford it. This will address your concerns regarding backup and recovery, because each partition can have its own tablespace. So (given partitioning by client_id) it becomes possible to backup or restore an individual client's data without affecting the other clients. We can even export and import individual partitions. I'm surprised by David's observation that Partition pruning didn't work with VPD. But I haven't tried this combo, so I'll take his word for it.
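A hedged sketch of what list partitioning by client could look like (the table, partition and tablespace names are invented):

create table participants (
  client_id      number        not null,
  participant_id number        not null,
  last_name      varchar2(100)
)
partition by list (client_id) (
  partition p_mycorp values (1) tablespace ts_mycorp,
  partition p_yourco values (2) tablespace ts_yourco
);
-- Back up or export a single client's data by partition, e.g. with Data Pump:
-- expdp ... tables=participants:p_mycorp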
The one thing you might lose from consolidation is the ability to support different clients on different versions of your application. However, this is not necessarily a bad thing. As you observe, maintaining several hundred customers will be a lot easier if you forgo individualised versions of the application. If you do need to offer some bespoke features - even if you just want to beta test some functionality with an individual client - then have a look at Edition-Based Redefinition in 11gR2: it is a really nifty feature. Also it is available for all Oracle licenses, not just Enterprise.
When you say 'separate instances', are you talking about one instance with multiple schemas on it? Or do you really mean multiple instances running on a single machine? There is little reason to run multiple instances on a single machine, as opposed to running multiple schemas on a single instance - each schema would still have its own set of tables, indexes, etc.
Anyways, I don't have a full answer, but one thing to keep in mind is the licensing costs of Oracle, and how that can affect what the optimal solution is.
According to the Oracle store,
Oracle Standard Edition One is $5,800.00 / processor (where on x86 a processor is a socket, and you can go up to 2 sockets)
Oracle Standard Edition is $17,500.00 / processor (where on x86 a processor is a socket, and you can go up to 4 sockets)
Oracle Enterprise Edition is $47,500.00 / processor (where on x86 a processor is 2 CORES - so you effectively have to double that price for quad-core CPUs)
So if, for example, you need 8 quad core CPUs to handle 100 customers, licensing that on a single database is VASTLY more expensive than having 4 separate databases, each having 2 quad core CPUs, each running 25 customers.
8 quad-core CPUs requires Enterprise Edition, and would have a list price of 16 x $47,500 = $760,000. 4 machines, each running Standard Edition One, and each with 2 quad-core CPUs, would have a list price of 8 x $5,800.00 = $46,400 - a factor of 16 difference. Now, keep in mind that no one pays list price for Enterprise Edition, but there is still a huge difference to consider.
If you don't have a huge need for database operations across clients, and you don't need enterprise edition features, and you need this level of CPU power (or expect to grow to need this level of CPU power), the licensing costs are going to be a huge downside of the one-instance approach.
It may be worth researching Salesforce; the buzzword you're looking for is "multi-tenant architecture".
This makes a good read:
http://blog.dayspring-tech.com/2009/02/forcecom-multitenant-architecture-under-the-covers/
It's a good example because Salesforce do use an Oracle db under the covers.
Good question, glad to see you are considering all the alternatives. Lots of good points but I will stick to just addressing one.
I was the DBA for a hosted application and the developers decided to use Oracle Virtual Private Database feature for this.
The application was constructed with the intention of customers sharing a pool of app servers for load balancing and a single database schema on the back end.
Before VPD, we had a Java class that tacked "where customer_id=?" or "and customer_id=?" onto every query right before it went to the database, so the customer would only see their own data. To implement this with VPD, upon login to the DB we would have the app set a variable in the application context, which was then used by the VPD policies to allow the session to see only its own records. So yeah, you have to code it up right and assign VPD policies to tables, and also trust that Oracle holds up their end of the bargain.
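For readers unfamiliar with VPD, here is a rough sketch of the shape this took (the schema, context and object names are illustrative, not our real ones):

-- Policy function: returns the predicate that VPD appends to every statement.
create or replace function customer_policy (
  p_schema in varchar2,
  p_object in varchar2
) return varchar2
as
begin
  return 'customer_id = sys_context(''app_ctx'', ''customer_id'')';
end;
/
-- Attach the policy to a table:
begin
  dbms_rls.add_policy(
    object_schema   => 'APP',
    object_name     => 'ORDERS',
    policy_name     => 'orders_by_customer',
    function_schema => 'APP',
    policy_function => 'customer_policy',
    statement_types => 'select, insert, update, delete');
end;
/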
So was it good for us? In theory it was nice to offload the SQL predicate handling to something outside our application, but in practice the advantages didn't outweigh the disadvantages.
When we had dozens of clients in one database and when we upgraded they all had to get upgraded at the same time. We had lots of tug-of-wars with customers that didn't want to upgrade for whatever reason or wanted to do their own QA on the new versions.
We entertained the old instance/new instance idea for upgrades, but migrating data was risky and the associated downtime did not make customers happy. We did roll our own procedure that would step through tables and export data... but it was certainly not as easy as a quick Export or Data Pump job.
We also had issues with VPD predicate analysis when it came to partitioning. As with a lot of Oracle features, they may work OK on their own, but once you combine them with other features things get unpredictable. For us, partitions not related to the current customer_id weren't getting eliminated, because the predicate analysis was coming too late in the processing of the SQL statement. We worked around it by changing from static to dynamic VPD policies, but the time we spent parsing shot up.
So after all that what is my take on it? I would have spent the time making sure our app made good use of bind variables and continued with the old mechanism that added customer_id to the SQL statement.
Oracle is made to handle that kind of load.
My question - what do you do when you have a thousand customers, or say ten thousand?
Do you still keep separate instances/schemas?
I doubt anyone would do that. I worked earlier at a place where each client had a separate database, as well as a copy at a central location.
Change management becomes a headache; you'd have to maintain very good information about which client/company is on which database revision, schema, app version and all those things. This would become a piece of software in itself.
I'd suggest creating the software/design around a SaaS model, which will allow easy maintenance and the same database/schema for all users.
For Reliability you can still use clustering - Oracle RAC.
I've had to consider the same decision a few times. In our case we use MySQL, so there is no licensing cost associated with running each customer in a separate database.
The benefits to running all of our customers on a separate database have been great. We have a script that lets us move a customer's entire instance to any server to balance load. The script merely copies over the database, copies over any custom files, spins up the application, and sets up our routing system to send users to the new instance. The whole process takes just a few minutes.
Database changes can take a very long time on large MySQL databases. Since all our clients have their own database, we are able to keep all of our datasets small. Backups are also very fast.
Our development instances behave the same way, so this method allows us to run a variety of database schemas simultaneously as we develop and test new features. We often work with customers to have them try out a new feature before we deploy it to the rest of our instances. The one rule that we stick to (in order to avoid a few of the drawbacks you mention), is that all clients must be within one version of each other. Maintaining more than a couple versions across clients would have a huge overhead.
Facebook took the same approach when they started their company. Each school that they launched at had a separate database and they were able to set up new instances very quickly. The primary reason they finally consolidated their database was that they wanted to enable users to communicate between schools.
If not for potential cost issues I would definitely encourage you to stick with the separate database approach.