Oracle Text URL indexing performance - oracle

I have an oracle table that has two columns - id and url.
The URL is simply http://somemachinename/mypage.php?id=
I then create an oracle text index using the URL datastore on the url column.
If I then do:
BEGIN
ctx_ddl.sync_index(idx_name => 'MY_INDEX',
memory => '50M',
parallel_degree => 4);
END;
/
Then if I look at the apache logs on somemachinename I can see oracle requesting all of the URLs in turn.
The problem is that oracle requests about 60 urls in turn and then stops for 15s, before requesting another 60-ish urls.
The amount of data in the html page is small - less than 3k, so 60 pages shouldnt be filling any buffers - and even if it were it shouldnt take 15s to clear them.
Running wireshark shows that the delay is definitely in the requests arriving (rather than a problem with the webserver), so I dont know what Oracle is doing in those 15s.
The indexing is a big job (the table has about 2m rows), and it currently takes a week, wheras without the del;ays I think it would be more like hours....
Any thoughts?

"so I dont know what Oracle is doing in those 15s."
You could look at the Wait / Event on v$session. Could be chewing CPU parsing the HTML, or it maybe that it needs to write the data somewhere (temp tablespace) first.
Don't suppose there is anything on the network side treating this as some sort of attack ?

Related

Browser: Network - response time shows 2 seconds when it displays after ~15

I wanted to check how quickly my web application will display results for a query : SELECT * FROM orders.
the query returns about 20k records on one page and it takes about 15 seconds
Why on every browser the response time stops after two seconds? Is it because the browser has trouble displaying so many records per one page? at 70k it gets out of memory.
Database - mysql on hosting
problem
correct response time
If you want to check how long it takes for the web app to process. You can add logging before and after doing the query.
You also could add some logging of the current time, when receiving the request and before returning the response.
As for why the request stops after two seconds, I don't think we have enough information to decide.
It could be from the web server default configuration that you use.
In my opinion, displaying 20k records might not be an efficient approach.
Other than the time to query and response time.
You might want to consider the looping that happens on the front end.
Personally, I would recommend paging at a lower number, and if you need to display all the data at once. You might consider using lazy loading as an option.
I know this is a very generic answer, but hopefully, this could help you out.

Laravel pagination in Data Table

I am using DataTable plugin in Laravel. I have a record of 3000 entries in some
But when i load that page it loads all 3000 records in the browser then create pagination, this slow down the page loading.
How to fix this or correct way
Use server-side processing.
Get help from some Laravel Packages. Such as Yajra's: https://yajrabox.com/docs/laravel-datatables/
Generally you can solve pagination either on the front end, the back end (server or database side), or a combination of both.
Server side processing, without a package, would mean setting up TOP/FETCH or make rows in data being returned from your server. 

You could also load a small amount (say 20) and then when the user scrolls to the bottom of the list, load another 20 or so. I mention the inclusion of front end processing as well because I’m not sure what your use cases are, but I imagine it’s pretty rare any given user actually needs to see 3000 rows at a time.

Given that Data Tables seems to have built-in functionality for paginating data, I think that #tersakyan is essentially correct — what you want is some form of back-end filtering or paginating of rows of data to limit what’s being sent to the front end.

I don’t know if that package works for you or not or what your setup looks like, but pagination can also be achieved directly from a DataBase returning data via the SQL (using TOP/FETCH for example) or could be implemented in a Controller or Service by tracking pages of data and “loading a page at a time” both from the server and then into the table. All you would need is a unique key to associate each "set of pages" for a specific request.
But for performance, you want to avoid both large data requests and operations on large sets of data. So the more you limit how much data is being grabbed or processed at any stage of your application using it, the more performant your application will be in principle.




Delphi: ClientDataSet is not working with big tables in Oracle

We have a TDBGrid that connected to TClientDataSet via TDataSetProvider in Delphi 7 with Oracle database.
It goes fine to show content of small tables, but the program hangs when you try to open a table with many rows (for ex 2 million rows) because TClientDataSet tries to load the whole table in memory.
I tried to set "FetchOnDemand" to True for our TClientDataSet and "poFetchDetailsOnDemand" to True in Options for TDataSetProvider, but it does not help to solve the problem. Any ides?
Update:
My solution is:
TClientDataSet.FetchOnDemand = T
TDataSetProvider.Options.poFetchDetailsOnDemand = T
TClientDataSet.PacketRecords = 500
I succeeded to solve the problem by setting the "PacketRecords" property for TCustomClientDataSet. This property indicates the number or type of records in a single data packet. PacketRecords is automatically set to -1, meaning that a single packet should contain all records in the dataset, but I changed it to 500 rows.
When working with RDBMS, and especially with large datasets, trying to access a whole table is exactly what you shouldn't do. That's a typical newbie mistake, or a borrowing from old file based small database engines.
When working with RDBMS, you should load the rows you're interested in only, display/modify/update/insert, and send back changes to the database. That means a SELECT with a proper WHERE clause and also an ORDER BY - remember row ordering is never assured when you issue a SELECT without an OREDER BY, a database engine is free to retrieve rows in the order it sees fit for a given query.
If you have to perform bulk changes, you need to do them in SQL and have them processed on the server, not load a whole table client side, modify it, and send changes row by row to the database.
Loading large datasets client side may fali for several reasons, lack of memory (especially 32 bit applications), memory fragmentation, etc. etc., you will flood the network probably with data you don't need, force the database to perform a full scan, maybe flloding the database cache as well, and so on.
Thereby client datasets are not designed to handle millions of billions of rows. They are designed to cache the rows you need client side, and then apply changes to the remote data. You need to change your application logic.

Caching expensive SQL query in memory or in the database?

Let me start by describing the scenario. I have an MVC 3 application with SQL Server 2008. In one of the pages we display a list of Products that is returned from the database and is UNIQUE per logged in user.
The SQL query (actually a VIEW) used to return the list of products is VERY expensive.
It is based on very complex business requirements which cannot be changed at this stage.
The database schema cannot be changed or redesigned as it is used by other applications.
There are 50k products and 5k users (each user may have access to 1 up to 50k products).
In order to display the Products page for the logged in user we use:
SELECT TOP X * FROM [VIEW] WHERE UserID = #UserId -- where 'X' is the size of the page
The query above returns a maximum of 50 rows (maximum page size). The WHERE clause restricts the number of rows to a maximum of 50k (products that the user has access to).
The page is taking about 5 to 7 seconds to load and that is exactly the time the SQL query above takes to run in SQL.
Problem:
The user goes to the Products page and very likely uses paging, re-sorts the results, goes to the details page, etc and then goes back to the list. And every time it takes 5-7s to display the results.
That is unacceptable, but at the same time the business team has accepted that the first time the Products page is loaded it can take 5-7s. Therefore, we thought about CACHING.
We now have two options to choose from, the most "obvious" one, at least to me, is using .Net Caching (in memory / in proc). (Please note that Distributed Cache is not allowed at the moment for technical constraints with our provider / hosting partner).
But I'm not very comfortable with this. We could end up with lots of products in memory (when there are 50 or 100 users logged in simultaneously) which could cause other issues on the server, like .Net constantly removing cache items to free up space while our code inserts new items.
The SECOND option:
The main problem here is that it is very EXPENSIVE to generate the User x Product x Access view, so we thought we could create a flat table (or in other words a CACHE of all products x users in the database). This table would be exactly the result of the view.
However the results can change at any time if new products are added, user permissions are changed, etc. So we would need to constantly refresh the table (which could take a few seconds) and this started to get a little bit complex.
Similarly, we though we could implement some sort of Cache Provider and, upon request from a user, we would run the original SQL query and select the products from the view (5-7s, acceptable only once) and save that result in a flat table called ProductUserAccessCache in SQL. Next request, we would get the values from this cached-table (as we could easily identify the results were cached for that particular user) with a fast query without calculations in SQL.
Any time a product was added or a permission changed, we would truncate the cached-table and upon a new request the table would be repopulated for the requested user.
It doesn't seem too complex to me, but what we are doing here basically is creating a NEW cache "provider".
Does any one have any experience with this kind of issue?
Would it be better to use .Net Caching (in proc)?
Any suggestions?
We were facing a similar issue some time ago, and we were thinking of using EF caching in order to avoid the delay on retrieving the information. Our problem was a 1 - 2 secs. delay. Here is some info that might help on how to cache a table extending EF. One of the drawbacks of caching is how fresh you need the information to be, so you set your cache expiration accordingly. Depending on that expiration, users might need to wait to get the fresh info more than they would like to, but if your users can accept that they migth be seing outdated info in order to avoid the delay, then the tradeoff would worth it.
In our scenario, we decided to better have the fresh info than quick, but as I said before, our waiting period wasn't that long.
Hope it helps

ColdFusion's cfquery failing silently

I have a query that retrieves a large amount of data.
<cfsetting requesttimeout="9999999" >
<cfquery name="randomething" datasource="ds" timeout="9999999" >
SELECT
col1,
col2
FROM
table
</cfquery>
<cfdump var="#randomething.recordCount#" /> <!---should be about 5 million rows --->
I can successfully retrieve the data with python's cx_Oracle and using sys.getsizeof on the python list returns 22621060, so about 21 megabytes.
ColdFusion does not return an error on the page, and I can't find anything in any of the logs. Why is cfdump not showing the number of rows?
Additional Information
The reason for doing it this way is because I have about 8000 smaller queries to run against the randomthing query. In other words when I run those 8000 queries against the database it takes hours for that process to complete. I suspect this is because I am competing with several other database users, and the database is getting bogged down.
The 8000 smaller queries are getting counts of col1 over a period of col2.
SELECT
count(col1) as count
WHERE
col2 < 20121109
AND
col2 > 20121108
According to Adam Cameron's suggestions.
cflog is suggesting that the query isn't finishing.
I tried changing the queries timeout both in the code and in the CFIDE/administrator, apparently CF9 no long respects the timeout attribute, regardless of what I tried I couldn't get the query to timeout.
I also started playing around with the maxrows attribute to see if I could discern any information that way.
when maxrows is set to 1300000 everything works fine
when maxrows is 1400000 or greater I get this error
when maxrows is 2000000 I observe my original problem
Update
So this isn't a limit of cfquery. By using QueryNew then looping over it to add data and I can get well past the 2 million mark without any problems.
I also created a ThinClient datasource using the information in this question, I didn't observe any change in behavior.
The messages on the database end are
SQL*Net message from client
and
SQL*Net more data to client
I just discovered that by using the thin client along with blockfactor1="100" I can retrieve more rows (appx. 3000000).
Is there anything logged on the DB end of things?
I wonder if the timeout is not being respected, and JDBC is "hanging up" on the DB whilst it's working. That's a wild guess. What if you set a very low timeout - eg: 5sec - does it error after 5sec, or what?
The browser could be timing out too. What say you write something to a log before and after the <cfquery> block, with <cflog>. To see if the query is eventually finishing.
I have to wonder what it is you intend to do with these 22M records once you get them back to CF. Whatever it is, it sounds to me like CF is the wrong place to be doing whatever it is: CF ain't for heavy data processing, it's for making web pages. If you need to process 22M records, I suspect you should be doing it on the database. That said, I'm second-guessing what you're doing with no info to go on, so I presume there's probably a good reason to be doing it.
Have you tried wrapping your cfquery within cftry tags to see if that reports anything?
<cfsetting requesttimeout="600" >
<cftry>
<cfquery name="randomething" datasource="ds" timeout="590" >
SELECT
col1,
col2
FROM
table
</cfquery>
<cfdump var="#randomething.recordCount#" /> <!--- should be about 5 million rows --->
<cfcatch type="any">
<cfdump var="#cfcatch#">
</cfcatch>
</cftry>
This is just an idea, but you could give it a go:
You mention that using QueryNew you can successfully add the more-than-two-million records you need.
Also that when your maxRows is less than 1,300,000 things work as expected.
So why not first do a query to count(*) the total number of records in the table, divide by a million and round up, then cfloop over that number executing a query with maxRows=1000000 and startRow=((i - 1 * 1000000) + 1) on each iteration...
ArrayAppend each query from within the loop to an array then when it's all done, loop over your array pushing the records into a new Query object. That way you end up with a query at the end containing all the records you were trying to retrieve.
You might hit memory issues, and it will not perform all that well, but hey - this is Coldfusion, those are par for the course, and sometimes crazy things happen / work.
(You could always append the results of each query to the one you're building up from QueryNew as you go rather than pushing each query onto an array, but it'll be easier to debug and see how far you get if it doesn't work if you build an array as you go.)
(Also, using multiple queries within the size that it CF can handle, you may then be able to execute the process you need to by looping over the array and then each query, rather than building up one massive query - would save processing time and memory, but depends on whether you need the full results set in a single Query object or not)
if your date ranges are consistent, i would suggest some aggregate functions in sql instead of having cf process it. something like:
select col1, count(col1), year(col2), month(col2)
from table
group by year(col2), month(col2)
order by year(col2), month(col2)
add day() if you need that detail level, too. you can get really creative with date parts.
this should greatly speed up the entire run time, reduce the main query size.
Your problem here is that ColdFusion cannot time out SQL. This has always been an issue since CF6 I believe. So basically what is happening is that the cfquery is taking longer than 9999999 seconds but CF cannot timeout JDBC so it waits until afterwards tries to run cfdump (which internally uses cfoutput) and this is reported as timing out because the request is now considered to have run too long.
As Adam pointed out, whatever you are trying to do is too large for CF to realistically handle and will either need to be chopped up into smaller jobs or entirely handled in the DB.
So as it turns out the server was running out of memory, apparently cfquery takes up quite a bit more memory than a python list.
It was Barry's comment that got me going in the right direction, I didn't know much about the server monitor up until this point other than the fact that it existed.
As it turns out I am also not very good at reading, the errors that were getting logged in the application.log file were
GC overhead limit exceeded The specific sequence of files included or processed is: \path\to\index.cfm, line: 10 "
and
Java heap space The specific sequence of files included or processed is: \path\to\index.cfm
I'll end up going with Adams suggestion and let the database do the processing. At least now I'll be able to explain why things are slow instead of just saying, "I don't know".

Resources