Is it possible to return data portion by portion - Jersey

I have a client, a REST web service, and a database. The database has big tables (10 million rows).
I need to search for some data in the DB (about 1 million results), put it in an Excel file with Apache POI, and return the file to the client. The problem is that retrieving the data from the database and building the file can take longer than an hour. Does there exist a way to return the data in portions (retrieve 1,000 rows, return them, retrieve the next 1,000, return those, and so on)?
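One common approach combines Jersey's StreamingOutput with Apache POI's streaming SXSSFWorkbook, which keeps only a sliding window of rows in memory and spills older rows to temporary files. Below is a minimal sketch under that assumption; fetchPage() and its page size of 1,000 are hypothetical placeholders for your own paginated query. Note that the .xlsx bytes are only written to the response at the end (the format is a zip archive), so this bounds server memory rather than delivering partial spreadsheet data early.

    import java.io.OutputStream;
    import java.util.List;

    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.Produces;
    import javax.ws.rs.core.Response;
    import javax.ws.rs.core.StreamingOutput;

    import org.apache.poi.ss.usermodel.Row;
    import org.apache.poi.ss.usermodel.Sheet;
    import org.apache.poi.xssf.streaming.SXSSFWorkbook;

    @Path("/export")
    public class ExportResource {

        @GET
        @Produces("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet")
        public Response export() {
            StreamingOutput stream = (OutputStream out) -> {
                // Keep at most 1000 rows in memory; older rows are flushed to a temp file.
                SXSSFWorkbook workbook = new SXSSFWorkbook(1000);
                try {
                    Sheet sheet = workbook.createSheet("results");
                    int rowNum = 0;
                    int page = 0;
                    List<String[]> rows;
                    // fetchPage(page, size) is assumed to run a paginated DB query.
                    while (!(rows = fetchPage(page++, 1000)).isEmpty()) {
                        for (String[] values : rows) {
                            Row row = sheet.createRow(rowNum++);
                            for (int col = 0; col < values.length; col++) {
                                row.createCell(col).setCellValue(values[col]);
                            }
                        }
                    }
                    workbook.write(out); // the zip container is assembled here, at the end
                } finally {
                    workbook.dispose(); // delete the temp files backing the flushed rows
                }
            };
            return Response.ok(stream)
                    .header("Content-Disposition", "attachment; filename=\"results.xlsx\"")
                    .build();
        }

        // Hypothetical placeholder: replace with your actual paginated data access.
        private List<String[]> fetchPage(int page, int size) {
            throw new UnsupportedOperationException("wire up your data access here");
        }
    }

If the client can accept CSV instead of .xlsx, the same StreamingOutput pattern can write and flush each page as plain text, so the client starts receiving data immediately instead of waiting for the whole file.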

Related

How to push millions of records to db

I have to read 2 million records from one DB and store them in another DB. I tried reading all the data using Spring pagination; I need the best and easiest approach to process the records batch-wise and write them to the other DB.
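A minimal sketch of one way to do this with plain Spring JdbcTemplate, assuming two DataSources and placeholder table/column names: read the source table with keyset pagination (cheaper than a growing OFFSET) and write each page to the target DB with batchUpdate.

    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    import org.springframework.jdbc.core.JdbcTemplate;

    public class BatchCopier {

        private final JdbcTemplate source;
        private final JdbcTemplate target;

        public BatchCopier(JdbcTemplate source, JdbcTemplate target) {
            this.source = source;
            this.target = target;
        }

        public void copyAll() {
            long lastId = 0;
            List<Map<String, Object>> page;
            // Keyset pagination: seek past the last id seen instead of using OFFSET.
            // FETCH FIRST is the standard row-limiting form; adjust for your dialect.
            while (!(page = source.queryForList(
                    "SELECT id, payload FROM src_table WHERE id > ? ORDER BY id FETCH FIRST 1000 ROWS ONLY",
                    lastId)).isEmpty()) {
                target.batchUpdate(
                        "INSERT INTO dst_table (id, payload) VALUES (?, ?)",
                        page.stream()
                            .map(r -> new Object[] { r.get("id"), r.get("payload") })
                            .collect(Collectors.toList()));
                lastId = ((Number) page.get(page.size() - 1).get("id")).longValue();
            }
        }
    }

Spring Batch's chunk-oriented steps implement the same read-process-write loop with restartability built in, which may be worth the extra setup at 2 million rows.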

Can I use the results of a query as criteria for the initial pull of a second query in Excel Power Query

I am using Microsoft Excel's Power Query to pull information directly from two separate data sources (IBM DB2 and Teradata) and merge the results together in an Excel worksheet. The results of the first query, from DB2, are only around 300 rows, and I want to return data from the Teradata table only where it matches those 300 rows (a left join). The Teradata table is very large (more than 5 million rows). When I build my query in Excel's Power Query, it wants to pull the entire Teradata table first before joining it with the 300 criteria rows, and due to the size of the Teradata table, it fails.
Is there a way for me to set it up so that the initial query pull in Power Query from the Teradata table incorporates the results of the first query, so that it will process and pull back the matching information?
Thank you!
For a query like that, with two different systems as the data sources, all the data has to be pulled into Excel before Power Query can perform a join or a filter.
With SQL data sources, Power Query can use query folding to build a SELECT statement that incorporates filters and joins, but that cannot be applied when the data lives on two totally separate systems. In that case, Excel is the tool that performs the selection, and in order to do that, all the data has to be in Excel first.
If that is too big for Excel to handle, you could try Power BI and see if that makes a difference when the data is refreshed through a data gateway.

Query & plot 1 million data points using Hibernate + Highcharts

I have a table with 5 million records and need to run a query that returns at least 1 million records.
I tried implementing the querying part to return 1 million records using Spring Boot + Hibernate. It takes 5 minutes for Hibernate to query 60K records and map them to entities, while the same query takes only 0.03 seconds in SQL Developer. When querying 1M records, I get a timeout error.
I cannot use pagination, since I am going to plot the results in a chart (Highcharts) and need the entire data set as the chart's data source.
Please suggest how to handle querying such a large data set and rendering the plot.
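Most of the gap between SQL Developer and Hibernate in cases like this is entity mapping and first-level-cache overhead rather than the query itself. A hedged sketch of one common mitigation, assuming Hibernate is kept: scroll a projection of just the columns the chart needs through a StatelessSession, so no managed entities are built. The Point entity and its fields are placeholders.

    import java.util.function.Consumer;

    import org.hibernate.ScrollMode;
    import org.hibernate.ScrollableResults;
    import org.hibernate.SessionFactory;
    import org.hibernate.StatelessSession;

    public class ChartDataStreamer {

        private final SessionFactory sessionFactory;

        public ChartDataStreamer(SessionFactory sessionFactory) {
            this.sessionFactory = sessionFactory;
        }

        public void streamPoints(Consumer<Object[]> sink) {
            try (StatelessSession session = sessionFactory.openStatelessSession()) {
                // Select only the two columns the chart needs, not whole entities.
                ScrollableResults results = session
                        .createQuery("select p.timestamp, p.value from Point p")
                        .setFetchSize(1000)              // hint the JDBC driver to stream
                        .scroll(ScrollMode.FORWARD_ONLY);
                try {
                    while (results.next()) {
                        sink.accept((Object[]) results.get());
                    }
                } finally {
                    results.close();
                }
            }
        }
    }

Separately, Highcharts itself struggles to render a million raw points; downsampling on the server or enabling the Highcharts Boost module is usually part of the answer as well.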

Delphi: ClientDataSet is not working with big tables in Oracle

We have a TDBGrid connected to a TClientDataSet via a TDataSetProvider in Delphi 7 with an Oracle database.
It works fine for showing the contents of small tables, but the program hangs when you try to open a table with many rows (e.g. 2 million rows), because TClientDataSet tries to load the whole table into memory.
I tried setting "FetchOnDemand" to True for our TClientDataSet and "poFetchDetailsOnDemand" to True in the Options of the TDataSetProvider, but it does not solve the problem. Any ideas?
Update:
My solution is:
TClientDataSet.FetchOnDemand = True
TDataSetProvider.Options.poFetchDetailsOnDemand = True
TClientDataSet.PacketRecords = 500
I succeeded in solving the problem by setting the "PacketRecords" property of TCustomClientDataSet. This property indicates the number or type of records in a single data packet. PacketRecords defaults to -1, meaning that a single packet should contain all records in the dataset, but I changed it to 500 rows.
When working with an RDBMS, and especially with large datasets, trying to access a whole table is exactly what you shouldn't do. That's a typical newbie mistake, or a habit borrowed from old file-based small database engines.
When working with an RDBMS, you should load only the rows you're interested in, display/modify/update/insert them, and send the changes back to the database. That means a SELECT with a proper WHERE clause and also an ORDER BY - remember that row ordering is never guaranteed when you issue a SELECT without an ORDER BY; a database engine is free to retrieve rows in whatever order it sees fit for a given query.
If you have to perform bulk changes, do them in SQL and have them processed on the server; don't load a whole table client side, modify it, and send the changes back to the database row by row.
Loading large datasets client side can fail for several reasons: lack of memory (especially in 32-bit applications), memory fragmentation, and so on. You will probably flood the network with data you don't need, force the database to perform a full scan, and maybe flood the database cache as well.
Client datasets are simply not designed to handle millions or billions of rows. They are designed to cache the rows you need client side and then apply changes back to the remote data. You need to change your application logic.

Oracle: performance about filtering results from remote view

I have a remote database A which has a view v_myview. I am working on a local database, which has a dblink to access v_myview on database A. If I query the view like this:
select * from v_myview@dblink;
it returns half a million rows. I just want to get some specific rows from the view, e.g. the rows with id=123, so my query is
select * from v_myview@dblink where id=123;
This works as expected. Here comes my question: when I run this query, will the remote database generate the half million rows first and then search them for the rows with id=123? Or will the remote view apply my filter first, without retrieving the half million rows? How do I find out? Thank you!
Oracle is free to do either. You'd need to look at the query plan to see whether the filtering is being done locally or remotely.
Presumably, in a case as simple as the one you present, the optimizer would expect it to be more efficient to send the filter to the remote server rather than pulling half a million rows over the network only to filter them locally. That calculation may be different if the optimizer expects the unfiltered query to return a single row rather than half a million rows, and it may be different if the query gets more complicated, for example by joining to a local table or calling a function on the local server.
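To see which side does the filtering, you can run EXPLAIN PLAN on the query and read the plan with DBMS_XPLAN; for a distributed query the output includes a REMOTE step and a "Remote SQL Information" section showing exactly what is shipped to database A. A minimal sketch from JDBC (SQL*Plus or SQL Developer would work just as well; the connection details are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ShowPlan {
        public static void main(String[] args) throws Exception {
            // Connection details are placeholders for your local database.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:oracle:thin:@localhost:1521/ORCL", "user", "password");
                 Statement st = conn.createStatement()) {
                // Ask the optimizer to explain the distributed query.
                st.execute("EXPLAIN PLAN FOR SELECT * FROM v_myview@dblink WHERE id = 123");
                // Print the formatted plan, including the remote SQL section.
                try (ResultSet rs = st.executeQuery(
                        "SELECT plan_table_output FROM TABLE(DBMS_XPLAN.DISPLAY())")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1));
                    }
                }
            }
        }
    }

If the filter appears in the remote SQL, the predicate is being evaluated on database A and only the matching rows cross the network.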
