This may be debatable, but I'm looking at the case where a local Excel file needs to be exported into a table in a local SQL Server 2008 instance.
Has anyone ever had the chance to check execution time to compare OpenRowSet/OpenQuery/OpenDataSource for a very large file import in SQL Server 2008?
I'm able to use any of the three options, and the query can be executed from anywhere. However, the data source (the Excel file) is on the same server as the SQL Server instance.
Any pointers would be helpful.
It's been nearly 12 years since this question was asked and it has been viewed 2k times, which means people are still interested in knowing the answer. Even today, I wanted an answer to this myself.
So I created an Excel spreadsheet with 100k rows, registered it as a linked server, and then compared the average duration of four different types of open queries against this data. Here are the results:
There's a bit of setup to do that requires administrator privileges on the SQL server, registering an OLEDB provider, and acquiring permissions on the file.
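For reference, that setup typically looks something like the following (a hedged sketch: sp_MSset_oledb_prop is an undocumented procedure and the exact options needed vary by provider version, so treat these as assumptions rather than a recipe):
/* Allow ad hoc OPENROWSET / OPENDATASOURCE queries */
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Ad Hoc Distributed Queries', 1;
RECONFIGURE;
/* Typical options for the ACE OLEDB provider (assumed; undocumented procedure) */
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'AllowInProcess', 1;
EXEC master.dbo.sp_MSset_oledb_prop N'Microsoft.ACE.OLEDB.12.0', N'DynamicParameters', 1;
The SQL Server service account also needs read access to the Excel file itself.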
This test was run on a 2016 version of SQL Server Enterprise (64-bit).
Each query was run 12 times; the results were averaged and rounded.
1. Test for OpenRowset:
SELECT *
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
'Excel 12.0 Xml;Database=C:\temp\sample100k.xlsx;',
[Sample100k$]);
CPU Time: 4705 ms
Elapsed Time: 7894 ms
2. Test for OpenDataSource
SELECT *
FROM OPENDATASOURCE('Microsoft.ACE.OLEDB.12.0',
'Data Source=C:\temp\sample100k.xlsx;Extended Properties=EXCEL 12.0')
...[Sample100k$];
CPU Time: 4794 ms
Elapsed Time: 7918 ms
3. Test for a direct query on a Linked Server
/* Configuration. Only run once for setting up the linked server */
/* Note that this step needs to take place for the third and fourth tests */
EXEC sys.sp_addlinkedserver @server = N'SAMPLE100K',
    @srvproduct = N'Excel',
    @provider = N'Microsoft.ACE.OLEDB.12.0',
    @datasrc = N'C:\temp\sample100k.xlsx',
    @provstr = N'Excel 12.0';
SELECT * FROM [SAMPLE100K]...[sample100k$];
CPU Time: 4919 ms
Elapsed Time: 7934 ms
4. Test for OpenQuery on a Linked Server
/* Assume linked server has been registered, as mentioned in the third test */
SELECT * FROM OPENQUERY(SAMPLE100K, 'SELECT * FROM [sample100k$]');
CPU Time: 3569 ms
Elapsed Time: 5643 ms
I did not expect these results; it appears that test 4 (SELECT * FROM OPENQUERY...) performed about 20% faster than the average and over 25% faster than the linked server query in test 3 (SELECT * FROM SAMPLE100K...).
I'll let the OP and other readers determine whether or not they should really use any of these methods compared to doing a table import, a BCP, an SSIS ETL package or some other method.
I'm simply providing an answer to the question for the Stack Overflow visitors who visit this page every other day.
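Since the original question was about loading the Excel data into a local table rather than just reading it, here is a minimal sketch of how any of the variants above could feed one; the target table dbo.Sample100k is hypothetical:
/* Materialize the spreadsheet into a new local table using the fastest
   variant measured above (OPENQUERY against the linked server) */
SELECT *
INTO dbo.Sample100k   -- hypothetical target table
FROM OPENQUERY(SAMPLE100K, 'SELECT * FROM [sample100k$]');
/* Or append into an existing table */
-- INSERT INTO dbo.Sample100k
-- SELECT * FROM OPENQUERY(SAMPLE100K, 'SELECT * FROM [sample100k$]');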
Related
I have a very small encrypted SQLite test database. I run a very simple select: just one record from a table which contains one record. This request takes a very significant amount of time: 0.3 seconds.
lesnik@westfall:~/Projects/ls$ cat sql_enc.sql
PRAGMA KEY = "DUMMYKEYDUMMYKEY";
SELECT * FROM 'version';
lesnik@westfall:~/Projects/ls$
lesnik@westfall:~/Projects/ls$ time sqlcipher rabbits_enc.sqlite3 < sql_enc.sql
key ver
---------- ----------
1 aaa
real 0m0.299s
user 0m0.297s
sys 0m0.000s
Experiments show that the time doesn't depend on the number of queries in the script and doesn't depend on the size of the database (this test database is just 5 KB; the result is the same on 500 KB databases).
There is no such problem if the database is not encrypted.
Performance is slightly better on another Linux installation (in a different VirtualBox VM on the same host), and the problem does not occur at all on yet another Linux installation (script execution time is about 0.001 s there), so I believe this is some problem with the environment. But I have no idea how to investigate this further. Any help is appreciated.
We provide general performance guidance for utilizing SQLCipher here.
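A fixed per-connection delay that is independent of database size and query count usually points at the PBKDF2 key derivation performed when PRAGMA key is processed, which scales with CPU speed rather than data volume. A minimal sketch for testing that theory, assuming your SQLCipher build supports cipher_default_kdf_iter and sqlcipher_export() (both are assumptions about the installed version):
-- Copy the database into a new file keyed with a lower KDF iteration count,
-- then time the same query against the copy. If the fixed ~0.3 s shrinks,
-- key derivation is the bottleneck. The iteration count below is illustrative.
PRAGMA KEY = 'DUMMYKEYDUMMYKEY';
PRAGMA cipher_default_kdf_iter = 4000;
ATTACH DATABASE 'rabbits_lowkdf.sqlite3' AS lowkdf KEY 'DUMMYKEYDUMMYKEY';
SELECT sqlcipher_export('lowkdf');   -- copies schema and data into the attached DB
DETACH DATABASE lowkdf;
Note that lowering the iteration count weakens brute-force resistance, so this is a diagnostic step, not a recommended production setting.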
We run Mondrian (version "3.7.preview.506") on a Tomcat Webserver.
We have some long running MDX-queries.
For example: the first calculation takes 276,764 ms and sends 84 SQL requests to the database (30 to 700 ms per SQL statement).
We see that the SQL-Statements are not executed in parallel - only two "mondrian.rolap.agg.SegmentCacheManager$sqlExecutor" are running at the same time.
1. Is there a way to force Mondrian/olap4j to execute the SQL statements more in parallel?
2. What about the property "mondrian.rolap.maxSqlThreads", which is set to 100 by default?
Afterwards we execute the same MDX query and the calculation finishes in 4,904 ms.
Conclusion: if the "internal cache" (mondrian.rolap.agg.SegmentCacheManager) has already loaded the segments, the calculation is executed without any database requests - but ...
3. How can we "warm up" the internal cache?
One way we tried was to rewrite the MDX queries - we load several months into the cache at once (MDX-B):
MDX-A: SELECT ... ON ROWS FROM cube01 WHERE {[Time].[Default].[2017].[4]}
becomes
MDX-B: SELECT ... ON COLUMNS, CrossJoin( ... ,{[Time].[Default].[2017].[2]:[Time].[Default].[2017].[4]})" + " ON ROWS FROM cube01
The rewritten MDX query takes 1,235,128 ms (244 SQL requests) - afterwards we execute our original MDX query (MDX-A) and the calculation takes 6,987 ms
- the interesting part for us was that the calculation takes longer than 5 seconds (compared with the second execution of the same query),
even though we did not have any SQL requests anymore.
The warm-up of the cache does not work as expected (in our opinion) - MDX-B takes much longer to collect the data in one statement than running the monthly execution in three steps (February to April) - and the calculation in memory also takes more time. Why? How does segment loading really work?
What is the best practice to load the segments to speed up calculation in memory?
Is there a way to feed the "Mondrian-Cube" with simple SQL statements?
Thanks in advance.
Fact table with 3,026,236 rows - growing daily
6 dimension tables.
Date dimension table with 21,183 rows.
We have monitored our test classes with JVM's VisualAdmin.
Mondrian 3.7.preview.506 - olap4j-1.1.0
Database: Oracle Database 11g Release 11.2.0.4.0 - 64bit
(we also tried a MemSQL database; it was only about 50% faster ...)
My Oracle 10g production database has a performance problem. Some queries that used to come back in milliseconds have begun to take 20 seconds. I generated an AWR report, and the top three wait events are shown below. I searched, but I couldn't understand them well.
Can someone explain these events? Thanks.
Event                   Waits       Time(s)  Avg Wait(ms)  % Total Call Time  Wait Class  Month
----------------------  ----------  -------  ------------  -----------------  ----------  --------
direct path write temp  11,941,557  866,004            73               29.8  User I/O    February
direct path write temp  16,197,445  957,129            59               17.2  User I/O    March
db file scattered read   5,826,190   58,095            10                2.0  User I/O    February
db file scattered read  10,128,657   70,408             7                1.3  User I/O    March
direct path read temp   34,197,762  324,663             9               11.2  User I/O    February
direct path read temp   88,688,686  507,715             6                9.1  User I/O    March
Two of your wait events are related to sorting: direct path write temp and direct path read temp. These indicate an increase in sorting on disk rather than in memory; disk I/O is always slower.
So, what has changed regarding memory allocation usage? Perhaps you need to revisit the values of SORT_AREA_SIZE or PGA_AGGREGATE_TARGET init parameters (depending on whether you are using Automatic PGA memory). Here is a query which calculates the memory/disk sort ratio:
SELECT 100 * (mem.value - dsk.value)/(mem.value) AS sort_ratio
FROM v$sysstat mem
cross join v$sysstat dsk
WHERE mem.name = 'sorts (memory)'
AND dsk.name ='sorts (disk)'
In an OLTP application we would expect this to be over 95%.
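If the ratio does point at disk sorts, here is a hedged sketch for checking the current PGA configuration and Oracle's own advisory (assumes automatic PGA memory management; both views exist in 10g):
-- Current settings
SELECT name, value
FROM   v$parameter
WHERE  name IN ('pga_aggregate_target', 'workarea_size_policy');
-- Oracle's estimate of the cache hit percentage at other PGA target sizes
SELECT ROUND(pga_target_for_estimate / 1024 / 1024) AS target_mb,
       estd_pga_cache_hit_percentage,
       estd_overalloc_count
FROM   v$pga_target_advice
ORDER  BY pga_target_for_estimate;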
The other thing is, instead of looking at macro events you need to look at the specific queries which are running much slower. What has changed with regards to them? Lots more data? New indexes or dropped indexes? Refreshed statistics?
"SORT_RATIO ---------- 99.9985462"
So, sorts are higher but not too high. You need to focus on specific queries.
"in march we begun to user phyton application for some new queries. reason can be this ? "
Could be. Application change is always the prime suspect when our system exhibits different behavior.
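As a concrete starting point for finding those specific queries, here is a hedged sketch that lists the top statements by disk reads from V$SQL (standard columns, but verify against your 10g data dictionary; this only covers statements still in the shared pool):
SELECT *
FROM (
    SELECT sql_id,
           executions,
           disk_reads,
           ROUND(elapsed_time / 1e6, 1) AS elapsed_s,
           SUBSTR(sql_text, 1, 80)      AS sql_text_start
    FROM   v$sql
    ORDER  BY disk_reads DESC
)
WHERE ROWNUM <= 10;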
I am connecting to a remote Oracle DB using MS Access 2010 and the ODBC for Oracle driver.
In MS Access it takes about 10 seconds to execute:
SELECT * FROM SFMFG_SACIQ_ISC_DRAWING_REVS
But takes over 20 minutes to execute:
SELECT * INTO saciq_isc_drawing_revs FROM SFMFG_SACIQ_ISC_DRAWING_REVS
Why does it take so long to build a local table with the same data?
Is this normal?
The first query is only reading the data, and you might not even be getting the full result set back in one go. The second is both reading and writing the data, which will always take longer.
You haven't said how many records you're retrieving and inserting. If it's tens of thousands then 20 minutes (or 1200 seconds approx.) seems quite good. If it's hundreds then you may have a problem.
Have a look here https://stackoverflow.com/search?q=insert+speed+ms+access for some hints as to how to improve the response and perhaps change some of the variables - e.g. using SQL Server Express instead of MS Access.
You could also do a quick speed comparison test by trying to insert the records from a CSV file and/or Excel cut and paste.
Setup:
Entity Framework 4 with lazy loading enabled (model-first, table-per-hierarchy).
The number of tables is about 40 (and no table has more than 15-20 fields).
SQL Server Express 2008 (not r2).
No database triggers or any other stuff like this exist - it is only used for storage. All the logic is in the code.
Database size at the moment is approximately 2 GB.
(Primary keys are Guids and are generated in code via Guid.NewGuid() - if this matters)
Saving a complex operation result (which produces a complex object graph) takes anywhere from 40 to 60 seconds (the number returned by SaveChanges is approx. 8,000 - mostly added objects and some modified ones).
Saving the same operation result with an empty (or an almost empty) database usually takes around 1 second on the same computer.
The only variable that seems to affect this issue is the database size. But please note that I am only measuring the Context.SaveChanges() call (so even if I have some weird sluggish queries somewhere, they should not affect this issue).
Any suggestions as to why this operation may last this long are appreciated.
UPDATE 1
Just to clarify - the code that takes 40-60 seconds to execute is the following (it takes this long only when the DB size is around 2 GB):
Stopwatch sw = Stopwatch.StartNew();
int count = objectContext.SaveChanges(); // this method is not overridden
Debug.Write(sw.ElapsedMilliseconds); // prints out 40000 - 60000 ms
Debug.Write(count); // I am testing with exactly the same operation and the
// result always gives the same count for it (8460)
The same operation with an empty DB takes around 1000 ms (while still giving the same count - 8460). Thus the question would be - how could database size affect SaveChanges()?
Update 2
Running a perf profiler shows that the main bottleneck (from "code perspective") is the following method:
Method: static SNINativeMethodWrapper.SNIReadSync
Called: 3251 times
Avg: 10.56 ms
Max: 264.25 ms
Min: 0.01 ms
Total: 34338.51 ms
Update 3
There are non-clustered indexes for all PKs and FKs in the database. We are using random Guids as surrogate keys (not sequential), so fragmentation is always at very high levels. I tried executing the operation in question right after rebuilding all DB indexes (fragmentation was less than 2-3% for all indexes), but it did not seem to improve the situation in any way.
In addition, I must say that during the operation in question one table involved in the process has approximately 4 million rows (this table gets lots of inserts). SQL Profiler shows that inserts into that table can take anywhere from 1 to 200 ms (this is a "spike"). Yet again, this does not seem to change when the indexes are freshly rebuilt.
In any case - it seems (at the moment) that the problem is on the SQL Server side of the application since the main thing taking up time is that SNIReadSync method. Correct me if I am being completely ignorant.
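For reference, the kind of fragmentation check described in Update 3 looks roughly like the following hedged sketch (the table name dbo.MyLargeTable is a placeholder for the 4-million-row table):
-- Fragmentation of all indexes on one table; 'LIMITED' scans only the
-- upper B-tree levels, so it is cheap enough to run repeatedly.
SELECT i.name AS index_name,
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,
       ps.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.MyLargeTable'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
ORDER BY ps.avg_fragmentation_in_percent DESC;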
It's hard to guess without a profiler, but 8,000 records is definitely too many. EF 4 usually works fine with up to a couple of hundred objects. I would not be surprised if it turns out that change tracking takes most of this time. EF 5 and 6 have some performance optimizations, so if you cannot decrease the number of tracked objects somehow, you could experiment with them.