Power Query Connection Query vs Table Query Conundrum

I am looking for guidance on whether queries against external files (Excel and CSV) should be set up as connection-only, or whether each should load its results to a table in the spreadsheet.
I have a large main query (call it CALCS) that takes its data from these external connection queries, and every time I refresh CALCS, it executes all of them.
==> The external data changes very little.
Instead of keeping them as connection-only queries, should I have each of these queries load its data to a table and then have CALCS read from those tables? I would think that if I then refreshed only CALCS, the other queries would not execute (unless I did a Refresh All) and the read-in would be much quicker.

Only load to a table (or to the Power Pivot Data Model) if you plan to use the data in that final state. If it is an intermediate query that is consumed by another query, just create a connection-only query.

Related

Getting schema information from JDBC in batch

Is there a way of retrieving the schema of a database via JDBC in a batch, without having to make a .getTableNames call followed by a series of .getColumnNames / .getColumnTypes calls? I would like to be able to issue a single call (resulting in a single query, or a small number of queries, to the database) that returns all the tables in a schema, along with their columns and column type info. For large schemas, the separate method calls can take a long time.
Of course, it can be done with a database-system-specific query against the database's information schema, but I'm looking for a way that is generic to JDBC.
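For comparison, here is a sketch of the information-schema route mentioned above. It assumes a database that implements the SQL-standard INFORMATION_SCHEMA views, and 'myschema' is a placeholder schema name (Oracle, for one, uses ALL_TAB_COLUMNS instead):

-- One round trip returning every table, column, and type in a schema.
-- Assumes SQL-standard INFORMATION_SCHEMA views; 'myschema' is a
-- placeholder schema name.
SELECT table_name,
       column_name,
       data_type,
       is_nullable
FROM   information_schema.columns
WHERE  table_schema = 'myschema'
ORDER  BY table_name, ordinal_position;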

Can I use the results of a query as criteria for the initial pull of a second query in Excel Power Query

I am using Microsoft Excel's Power Query to pull information directly from two separate data sources (IBM DB2 and Teradata) and merge them together in an Excel worksheet. The results of the first query, from DB2, are only around 300 rows, and I want to return data from the Teradata table only where it matches those 300 rows (a left join). The Teradata table is very large (more than 5 million rows). When I build my query in Excel's Power Query, it wants to pull the entire Teradata table first before joining it with the 300 criteria rows, and due to the size of the Teradata table, it fails.
Is there a way for me to set it up so that the initial query pull in Power Query from the Teradata table incorporates the results of the first query, so that it will process and pull back the matching information?
Thank you!
For a query like that, with two different systems as the data sources, all the data will have to be pulled into Excel so that Power Query can perform a join or a filter.
With SQL data sources, Power Query can use query folding to create a SELECT statement that incorporates filters and joins, but that cannot be applied when the data lives on two totally separate systems. In that case, Excel is the tool that performs the selection, and in order to do that, all the data has to be in Excel first.
If that is too big for Excel to handle, you could try Power BI and see if that makes a difference when the data is refreshed through a data gateway.
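To illustrate the mechanism, this is roughly the kind of statement query folding can push down when everything lives on one system; the table and column names here are hypothetical:

-- A folded statement Power Query could send to a single source;
-- big_table, criteria_rows, and key_col are hypothetical names.
-- Across two separate systems, no such single statement is possible.
SELECT b.*
FROM   big_table b
INNER  JOIN criteria_rows c
       ON b.key_col = c.key_col;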

How to handle a large amount of data using LINQ in MVC

I am facing a problem with a LINQ query. I am getting data from a SQL database using the query below, with date-time parameters. When this query executes, it takes a long time, and after a long wait I get an error.
The data is available in the database, and when I use Take() with a row count, the query works. I don't know how to figure out the problem.
Is it possible that my query hits such a large amount of data that it stops working? Can you please share any suggestions on how to solve this issue?
// The query as posted, wrapped so it compiles; DbSet is assumed to be
// the ClassificationData set on an Entity Framework context.
var results =
    from c in DbSet
    where c.CameraListId == id
          && c.DateTime <= endDate
          && c.DateTime >= startdate
    orderby c.Id descending
    select c;
Your problem is probably more in the realm of SQL than LINQ. LINQ just translates what you write into Transact-SQL (T-SQL) that gets sent to SQL Server. If your SQL Server is not set up properly, you'll get a timeout if the query takes too long.
You need to make sure that you have indexes on the ClassificationData table (I assume it's a table, but it could be a view -- whatever it is, you need to put indexes on it if it has lots of rows). Make sure that an index is on DateTime, and that an index is also on CameraListId. Then, bring down the data unordered and execute the order-by in a separate query done on the local machine -- that will let SQL Server start giving you data right away instead of sorting it first, reducing the chance for a timeout.
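As a sketch, the indexes described above would look something like this in T-SQL; the table and column names are taken from the posted query and may differ in your schema:

-- Indexes supporting the WHERE clause of the posted query.
CREATE INDEX IX_ClassificationData_CameraListId
    ON ClassificationData (CameraListId);
CREATE INDEX IX_ClassificationData_DateTime
    ON ClassificationData ([DateTime]);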
If your problems persist, write queries directly against SQL Server (in Query Analyzer, for instance, if they still use that -- I'm old school, but a bit rusty). Once you have SQL Server actually working, then you should be able to write the LINQ that translates into that SQL. But you could also make it a stored procedure, which has other benefits, but that's another story for another day.
Finally, if it's still too much data, break up your data into multiple tables, possibly by CameraListId, or perhaps a week at a time of data so the DateTime part of the query doesn't have as much to look through.
Good luck.

Dynamically creating LINQ query with dynamic connection string

I am trying to page a large set of data using the IQueryable.Take method on a query that I am building dynamically. I was going to use Take so I can keep taking until I have read the entire large dataset. Basically, I am reading the data in chunks to improve performance.
I am trying to allow a user, via a few drop-downs, to build a connection string to the database (the database and table can change), and I then want to take the table the user selected and perform a select clause against it. Since I won't know which table or columns are available until run time, the select and from sections will have to be dynamic.
I am not sure if this is possible with LINQ, but does anyone have an idea of how I can solve this?
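For what it's worth, the chunked reads described above correspond to paging on the SQL side; LINQ providers typically fold a Skip/Take pair into something like the following on SQL Server 2012+ (table, column, and parameter names here are hypothetical):

-- Page through SomeTable in fixed-size chunks; the caller advances
-- @Offset by @PageSize after each round trip.
DECLARE @Offset INT = 0, @PageSize INT = 1000;

SELECT *
FROM   SomeTable
ORDER  BY Id            -- a stable order is required for paging
OFFSET @Offset ROWS
FETCH  NEXT @PageSize ROWS ONLY;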

Oracle: performance of filtering results from a remote view

I have a remote database A which has a view v_myview. I am working on a local database, which has a dblink to access v_myview on database A. If I query the view like this:
select * from v_myview@dblink;
it returns half a million rows. I just want to get some specific rows from the view, e.g., rows with id=123, so my query is
select * from v_myview@dblink where id=123;
This works as expected. Here comes my question: when I run this query, will the remote database generate the half million rows first and then search them for rows with id=123, or will the remote site apply my filter first, so the half million rows are never retrieved? How do I know which happens? Thank you!
Oracle is free to do either. You'd need to look at the query plan to see whether the filtering is being done locally or remotely.
Presumably, in a case as simple as the one you present, the optimizer would expect it to be more efficient to send the filter to the remote server rather than pulling half a million rows over the network only to filter them locally. That calculation may be different if the optimizer expects the unfiltered query to return a single row rather than half a million rows and it may be different if the query gets more complicated doing something like joining to a local table or calling a function on the local server.
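As a sketch of how to check, using the placeholder names from the question, you can capture the plan locally and look for where the filter is applied:

-- Capture the plan locally and display it; v_myview and dblink are
-- the placeholder names from the question.
EXPLAIN PLAN FOR
  SELECT * FROM v_myview@dblink WHERE id = 123;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
-- If the REMOTE operation in the output carries the WHERE clause
-- (see the "Remote SQL Information" section), the filter ran on
-- database A; otherwise the rows were pulled over and filtered locally.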
