Oracle: performance of filtering results from a remote view

I have a remote database A which has a view v_myview. I am working on a local database which has a dblink to access v_myview on database A. If I query the view like this:
select * from v_myview@dblink;
it returns half a million rows. I just want to get some specific rows from the view, e.g., to get rows with id=123, my query is
select * from v_myview@dblink where id=123;
This works as expected. Here comes my question: when I run this query, will the remote database generate the half million rows first and then find the rows with id=123 among them? Or will my filter be applied on the remote side first, so the half million rows are never retrieved? How can I tell? Thank you!

Oracle is free to do either. You'd need to look at the query plan to see whether the filtering is being done locally or remotely.
Presumably, in a case as simple as the one you present, the optimizer would expect it to be more efficient to send the filter to the remote server than to pull half a million rows over the network only to filter them locally. That calculation may be different if the optimizer expects the unfiltered query to return a single row rather than half a million, and it may be different again if the query gets more complicated, for example by joining to a local table or calling a function on the local server.
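One quick way to check is to generate the plan with EXPLAIN PLAN and look for a REMOTE operation; for distributed queries, DBMS_XPLAN also prints a "Remote SQL Information" section showing exactly what is shipped to the other site. A minimal sketch, assuming your dblink really is named dblink:

explain plan for
select * from v_myview@dblink where id = 123;

select * from table(dbms_xplan.display);

-- If the filter was pushed, the "Remote SQL Information" section shows a
-- SELECT with WHERE "ID" = 123 being sent to the remote database. If it
-- wasn't, the DRIVING_SITE hint can push execution to the remote site.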

Related

Can I use the results of a query as criteria for the initial pull of a second query in Excel Power Query

I am using Microsoft Excel's Power Query to pull information directly from two separate data sources (IBM DB2 and Teradata) and merge them together in an Excel worksheet. The results of the first query, from DB2, are only around 300 rows, and I want to return data from the Teradata table only where it matches those 300 rows (a left join). The Teradata table is very large (more than 5 million rows). When I build my query in Excel's Power Query, it wants to pull the entire Teradata table first before joining it with the 300 criteria rows, and due to the size of the Teradata table, it fails.
Is there a way for me to set it up so that the initial query pull in Power Query from the Teradata table incorporates the results of the first query, so that it will process and pull back the matching information?
Thank you!
For a query like that, with two different systems as the data sources, all the data will have to be pulled into Excel so that Power Query can perform the join or filter.
With SQL data sources, Power Query can use query folding to create a SELECT statement that incorporates filters and joins, but that cannot be applied when the data lives on two totally separate systems. In that case, Excel is the tool that performs the selection, and in order to do that, all the data has to be in Excel first.
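For intuition, this is what query folding does when it is possible: Power Query rewrites your applied steps into the source's own SQL instead of downloading the table. A hypothetical sketch with made-up table and column names:

-- a "keep rows where ORDER_TOTAL > 1000" step folds into the source query:
SELECT ORDER_ID, CUSTOMER_ID, ORDER_TOTAL
FROM SALES.ORDERS
WHERE ORDER_TOTAL > 1000
-- The filter runs on the server, so only matching rows cross the network.
-- A join spanning DB2 and Teradata cannot be folded this way, which is why
-- all the rows must land in Excel first.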
If that is too big for Excel to handle, you could try Power BI and see if that makes a difference when the data is refreshed through a data gateway.

How to handle a large amount of data using LINQ in MVC

I am facing a problem with a LINQ query. I am getting data from a SQL database using the query below, with date-time parameters. When this query executes, it takes a long time, and after a long wait I get an error.
The data is available in the database, and when I use Take() with a row count, the query works. I don't know how to figure out the problem.
Is it possible that my query hits such a large amount of data that it stops working? Can you please share any suggestions on how to solve this issue?
var results = from classificationData in DbSet
              where classificationData.CameraListId == id &&
                    classificationData.DateTime <= endDate &&
                    classificationData.DateTime >= startdate
              orderby classificationData.Id descending
              select classificationData;
Your problem is probably more in the realm of SQL than LINQ. LINQ just translates what you write into Transact-SQL (T-SQL) that gets sent to SQL Server. If your SQL Server is not set up properly, you'll get a timeout when the query takes too long.
You need to make sure that you have indexes on the ClassificationData table (I assume it's a table, but it could be a view -- whatever it is, you need to put indexes on it if it has lots of rows). Make sure that an index is on DateTime, and that an index is also on CameraListId. Then, bring down the data unordered and execute the order-by in a separate query done on the local machine -- that will let SQL Server start giving you data right away instead of sorting it first, reducing the chance for a timeout.
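A minimal sketch of those indexes in T-SQL, assuming the table really is named ClassificationData (adjust names to your schema):

-- separate indexes on the two filter columns, as suggested above:
CREATE NONCLUSTERED INDEX IX_ClassificationData_DateTime
    ON dbo.ClassificationData ([DateTime]);
CREATE NONCLUSTERED INDEX IX_ClassificationData_CameraListId
    ON dbo.ClassificationData (CameraListId);
-- a single composite index on (CameraListId, DateTime) is a common
-- alternative that serves both predicates with one seek.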
If your problems persist, write queries directly against SQL Server (in Query Analyzer, for instance, if they still use that -- I'm old school, but a bit rusty). Once you have SQL Server actually working, then you should be able to write the LINQ that translates into that SQL. But you could also make it a stored procedure, which has other benefits, but that's another story for another day.
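Roughly the T-SQL the LINQ above translates into, handy for testing directly against the server (the parameter names are illustrative):

SELECT *
FROM dbo.ClassificationData
WHERE CameraListId = @id
  AND [DateTime] >= @startDate
  AND [DateTime] <= @endDate
ORDER BY Id DESC;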
Finally, if it's still too much data, break your data up into multiple tables, possibly by CameraListId, or perhaps one week of data at a time, so the DateTime part of the query doesn't have as much to look through.
Good luck.

Get the top N records from two unconnected data sets

I have two Rails services that return data from distinct databases. In one data set I have records with fields that are something like this:
query, clicks, impressions
In the second I have records with fields something like this:
query, clicks, visitors
What I want to be able to do is get paged data from the merged set, matching on queries. But it also needs to include records that exist in only one of the two data sets, and then sort them all by the 'clicks' column.
In SQL, if these two tables were in the same database, I'd do this:
SELECT COALESCE(a.query, b.query) AS query,
       a.clicks, b.clicks, a.impressions, b.visitors
FROM a
FULL OUTER JOIN b ON a.query = b.query
ORDER BY GREATEST(COALESCE(a.clicks, 0), COALESCE(b.clicks, 0)) DESC
LIMIT 100 OFFSET 1
Taking an individual "top 100" from each data set produces incorrect results, because 'clicks' in data set 'a' may be significantly higher or lower than in data set 'b'.
As they aren't in the same database, I'm looking for help with the algorithm that makes this kind of query efficient and clean.
I never found a way to do this outside of a database. In the end, we just used PostgreSQL's Foreign Data Wrapper feature to connect the two databases together and let PostgreSQL handle the sorting and paging.
One trick for anyone heading down this path: we built VIEWs on the remote server that provided exactly the data needed, as in 'a' above. This was thousands of times faster than trying to join tables across the remote connection, where the value of the indexes was lost.
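A minimal sketch of that setup with postgres_fdw, using hypothetical server, credential, and table names:

-- on the database that will run the combined query:
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER analytics_remote
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'remote.example.com', dbname 'analytics', port '5432');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER analytics_remote
    OPTIONS (user 'reporting', password 'secret');

-- map the remote VIEW that returns exactly the needed columns:
CREATE FOREIGN TABLE remote_search_stats (
    query    text,
    clicks   integer,
    visitors integer
) SERVER analytics_remote
  OPTIONS (schema_name 'public', table_name 'v_search_stats');

-- the FULL OUTER JOIN, ORDER BY and LIMIT can now run in one database.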

Delphi: ClientDataSet is not working with big tables in Oracle

We have a TDBGrid connected to a TClientDataSet via a TDataSetProvider, in Delphi 7 with an Oracle database.
It works fine for showing the contents of small tables, but the program hangs when you try to open a table with many rows (for example, 2 million rows), because the TClientDataSet tries to load the whole table into memory.
I tried setting "FetchOnDemand" to True for our TClientDataSet and "poFetchDetailsOnDemand" to True in the Options of the TDataSetProvider, but it does not solve the problem. Any ideas?
Update:
My solution is:
TClientDataSet.FetchOnDemand = True
TDataSetProvider.Options.poFetchDetailsOnDemand = True
TClientDataSet.PacketRecords = 500
I succeeded in solving the problem by setting the "PacketRecords" property of TCustomClientDataSet. This property indicates the number or type of records in a single data packet. PacketRecords defaults to -1, meaning that a single packet should contain all records in the dataset, but I changed it to 500 rows.
When working with an RDBMS, and especially with large datasets, trying to access a whole table is exactly what you shouldn't do. That's a typical newbie mistake, or a habit carried over from old file-based small database engines.
When working with an RDBMS, you should load only the rows you're interested in, display/modify/update/insert them, and send the changes back to the database. That means a SELECT with a proper WHERE clause, and also an ORDER BY - remember, row ordering is never assured when you issue a SELECT without an ORDER BY; a database engine is free to retrieve rows in whatever order it sees fit for a given query. A minimal sketch of that pattern follows.
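A sketch in SQL against a hypothetical orders table (FETCH FIRST requires Oracle 12c or later; on 11g and earlier, use ROWNUM instead):

SELECT order_id, customer_id, order_date, total_amount
FROM orders
WHERE order_date >= DATE '2024-01-01'   -- load only the rows you need
ORDER BY order_date DESC                -- make the row order explicit
FETCH FIRST 500 ROWS ONLY;              -- cap what travels to the client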
If you have to perform bulk changes, do them in SQL and have them processed on the server; don't load a whole table client side, modify it, and send the changes back row by row.
Loading large datasets client side may fail for several reasons: lack of memory (especially in 32-bit applications), memory fragmentation, and so on. You will probably flood the network with data you don't need, force the database to perform a full scan, and maybe flood the database cache as well.
That's why client datasets are not designed to handle millions or billions of rows. They are designed to cache the rows you need client side and then apply changes to the remote data. You need to change your application logic.

The query time of a view increases after fetching the last page in Oracle PL/SQL Developer

I'm using Oracle PL/SQL Developer against an Oracle Database 11g instance. I have recently written a view with some weird behaviour. When I run the simple query below without fetching the last page of the result, the query time is about 0.5 sec (0.2 when cached).
select * from covenant.v_status_covenant_tuning where bankkode = '4210';
However, if I fetch the last page in PL/SQL Developer, or if I run the query from Java code (i.e. a query that retrieves all the rows), something happens to the view and the query time increases to about 20-30 secs.
The view does not start working properly again until I recompile it. The explain plan is exactly the same before and after. All indexes and tables are analyzed. I don't know if it's relevant, but the view uses a few analytic expressions like rank() over (partition by ...), lag(), lead() and so on.
As I'm new here, I can't post a picture of the explain plan (that needs a reputation of 10), but in general the optimizer uses indexes efficiently, and it does a few sorts because of the analytic functions.
If the plan involves a full scan of some sort, the query will not complete until the very last block in the table has been read.
Imagine a table that has lots of matching rows in the very first few blocks in the table, and no matching rows in the rest of it. If there is a large volume of blocks to check, the query might return the first few pages of results very quickly, as it finds them all in the first few blocks of the table. But before it can return the final "no more results" to the client, it must check every last block of the table - it doesn't know if there might be one more result in the very last block of the table, so it has to wait until it has read that last block.
If you'd like more help, please post your query plan.
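One way to confirm this is to compare the plan's estimates with the actual row-source statistics after fetching every row; a sketch using standard Oracle features (the GATHER_PLAN_STATISTICS hint and DBMS_XPLAN.DISPLAY_CURSOR):

select /*+ gather_plan_statistics */ *
from covenant.v_status_covenant_tuning
where bankkode = '4210';

-- after fetching ALL rows, in the same session:
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

-- The A-Rows and A-Time columns show which step only completes once the
-- last block has been read, e.g. a full scan or a large analytic sort.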
