When to specify PagingInfo.PagingCookie and PagingInfo.PageNumber - dynamics-crm

Despite reading lots of available resources online, I couldn't find a straightforward explanation of:
In which cases must both PagingInfo.PageNumber and PagingInfo.PagingCookie be assigned?
For example: consider we have 10,000 unique records and we'd like to retrieve all of them, 200 records at a time. Do we have to iterate over PageNumber (50 iterations), or is it enough to use only the PagingCookie?
Now, I'd like to share what I've found on online resources:
Firstly, many online resources link to the official MSDN examples (for FetchXML: 1, 2; for QueryExpression: 1, 2), but there is no direct answer to this issue there. They all iterate over the PageNumber, but as I understand it (and I may be wrong), a page always has 5,000 records (or does it?).
Secondly, I've found this demonstration of using PagingCookie, fetching records 6-10 out of 40 records (note that the cookie value must be XML-encoded when it is embedded in the paging-cookie attribute):
<fetch mapping="logical" count="5" page="2" paging-cookie="&lt;cookie page=&quot;1&quot;&gt;&lt;new_parentrecordid last=&quot;{F8DAB1AA-3A0F-E411-8189-005056B20097}&quot; first=&quot;{F8DAB1AA-3A0F-E411-8189-005056B20097}&quot; /&gt;&lt;/cookie&gt;" version="1.0">
  <entity name="new_parentrecord">
    <attribute name="new_name" />
    <link-entity name="new_childrecord" from="new_parentaid" to="new_parentrecordid">
      <attribute name="new_childrecordid" />
      <attribute name="new_name" />
    </link-entity>
  </entity>
</fetch>
Then, it's explained that the above is translated to the following SQL query:
select top 6 "new_parentrecord0".new_name as "new_name"
, "new_parentrecord0".new_parentrecordId as "new_parentrecordid"
, "new_childrecord1".new_childrecordId as "new_childrecord1.new_childrecordid"
, "new_childrecord1".new_name as "new_childrecord1.new_name"
from
new_parentrecord as "new_parentrecord0"
join new_childrecord as "new_childrecord1" on ("new_parentrecord0".new_parentrecordId = "new_childrecord1".new_ParentAId)
where
((("new_parentrecord0".new_parentrecordId > '01DBB1AA-3A0F-E411-8189-005056B20097')))
order by
"new_parentrecord0".new_parentrecordId asc
Thus, as I see it in this example, there is no need to use the page number, since only the paging cookie is used in the resulting SQL query (the top 6 for a page size of 5 is presumably the server fetching one extra row to detect whether more records exist).
It would be great to have a clear explanation of this.

My experience and the additional research I just did indicate that yes, we do need to iterate over all the pages. While the page number never makes it into the SQL query, it controls which ID SQL will use in its filter.
Without the page number in the query, the IDs in the paging cookie stay the same. With the page number, the IDs change.
It is also interesting to note that if you advance the page number without changing the page or the IDs in the paging cookie, the system uses the paging cookie's last ID and expands the SQL "top" parameter to cover the number of pages you requested. It must then do some processing on the SQL recordset, because it returns the right number of records (in this case 20), with the IDs advanced.

The PagingCookie is a read-only identifier for the result set the server has delivered to the client. Its purpose is to optimize data retrieval, and it should not be modified. The PagingCookie essentially tells the server roughly where the start and end of the specified page can be found.
I assume it enables the CRM server to use cached query result sets. When paging through a result set using paging cookies I noticed the server only submits the SQL query once (i.e. when it receives a query without a paging cookie) and delivers subsequent pages - whenever possible - from memory.
The client is expected to simply copy the cookie into the FetchXML or QueryExpression when retrieving another page of the same resultset.
As far as I can see (tested on Dynamics CRM 2016, v8.2), this also works with joins.
I suspect the SQL query you are showing is actually the result of tampering with the PagingCookie: the server cannot find the cookie in its cache, assumes that the original resultset has been flushed and when it builds a fresh SQL query it fully relies on the correctness of the boundaries declared by the cookie. Hence the unexpected where clause.
Conclusion
Always use the page attribute (FetchXML) or PageInfo.PageNumber property (QueryExpression). Never modify the PagingCookie, just copy it into subsequent page requests.
When the page size (count attribute) is not specified, it defaults to a maximum of 5,000 rows. When the expected result set is larger than that, you need to iterate over all available pages.
When you are working with Dynamics CRM On Premise, you can modify the maximum page size using a PowerShell command.
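To make the conclusion concrete, here is a minimal sketch of the canonical QueryExpression paging loop, assuming an IOrganizationService named service; the entity and attribute names ("account", "name") are placeholders:
// Requires Microsoft.Xrm.Sdk and Microsoft.Xrm.Sdk.Query.
var query = new QueryExpression("account")
{
    ColumnSet = new ColumnSet("name"),
    // 200 records per page, starting at page 1, as in the question's example.
    PageInfo = new PagingInfo { PageNumber = 1, Count = 200 }
};

while (true)
{
    EntityCollection page = service.RetrieveMultiple(query);
    foreach (Entity record in page.Entities)
    {
        // ... process each record ...
    }
    if (!page.MoreRecords)
        break;
    // Advance the page number AND copy the cookie verbatim; never modify it.
    query.PageInfo.PageNumber++;
    query.PageInfo.PagingCookie = page.PagingCookie;
}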

Related

Web Scraping returning empty data table UiPath

I'm using Data Scraping to scrape product information (i.e. Product Name, URL, Price, Model) from a shopping website.
When I search for a product, I want it to scrape the data of whichever item comes first, and for that purpose I have set the maximum number of results to 1. But the problem is that sometimes it returns an empty data table, and I cannot figure out why.
What I think is: if the current search result matches the elements that I selected in the data scraping wizard, it returns the data table, and if it doesn't match, it returns an empty data table.
For example, while selecting elements in the data scraping wizard, the search results were Samsung monitors. When I ran the project and searched for Dell monitors, it returned a data table, but when I searched for Samsung series or Dell series, it returned an empty data table. What is wrong with this?
You need to specify what you actually need as output.
But if your output is empty, the reason is usually one of the following:
make sure the timeout is high enough; set it to 30000 if you are unsure
set a robust selector that is not badly affected when the website changes for some reason
For me it works properly with a proper timeout and a flexible selector containing a wildcard (*).

Smart pagination algorithm that works with local data cache

This is a problem I have been thinking about for a long time but I haven't written any code yet because I first want to solve some general problems I am struggling with. This is the main one.
Background
A single page web application makes requests for data to some remote API (which is under our control). It then stores this data in a local cache and serves pages from there. Ideally, the app remains fully functional when offline, including the ability to create new objects.
Constraints
Assume a server-side database of products containing ±50,000 products (~50 MB)
Assume no specific DB type; we interact with it via a REST/GraphQL interface
Assume a single product record is < 1 kB
Assume a max payload for a result set of 256 kB
Assume max 5 MB of storage on the client
Assume search result sets ranging between 0 and 5,000 items per search
Challenge
The challenge is to define a stateless but network-efficient way to fetch pages from a result set, such that it is deterministic which results we will get.
Example
In traditional paging, when getting the next 100 results for some query using this url:
https://example.com/products?category=shoes&firstResult=100&pageSize=100
the search result may look like this:
{
  "totalResults": 2458,
  "firstResult": 100,
  "pageSize": 100,
  "results": [
    {"some": "item"},
    {"some": "other item"},
    // 98 more ...
  ]
}
The problem with this is that there is no way, based on this information, to get exactly the objects that are on a certain page. By the time we request the next page, the result set may have changed (due to changes in the DB), influencing which items are part of it. Even a small change can have a big impact: one item removed from the DB that happened to be on page 0 of the result set will change the results we get when requesting all subsequent pages.
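To illustrate the drift, a toy sketch (C#-flavored, purely illustrative):
using System;
using System.Linq;

class OffsetDriftDemo
{
    static void Main()
    {
        var items = Enumerable.Range(1, 10).ToList();
        // "Page 2" with a page size of 5:
        Console.WriteLine(string.Join(",", items.Skip(5).Take(5))); // 6,7,8,9,10

        items.Remove(3); // one record vanishes from "page 1"

        // The same page request now returns different items:
        Console.WriteLine(string.Join(",", items.Skip(5).Take(5))); // 7,8,9,10
    }
}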
Goal
I am looking for a mechanism that makes the definition of the result set independent of future database changes, so that if someone searched for shoes and got a result set of 2,458 items, he could reliably fetch all pages of that result set even if it was influenced by later changes in the DB (for this purpose, I plan to not really delete items, but to set a removed flag on them).
Ideas so far
I have seen a solution where the result set included a "pages" property, which was an array with the first and last id of the items in that page. Assuming your IDs keep going up in number and you don't really delete items from the DB ever, the number of items between two IDs is constant. Meaning the app could get all items between those two IDs and always get the exact same items back. The problem with this solution is that it only works if the list is sorted in ID order... I need custom sorting options.
The only way I have come up with so far is to just send a list of all IDs in the result set... That way pages can be fetched by doing SELECT * FROM products WHERE id IN (3,4,6,9,...), but this feels rather inelegant...
Anyway, I am hoping this is not too broad or theoretical. I have a web-based DB, just no good idea of how to do paging with it. I am looking for answers that point me in a direction to learn, not full solutions.
Versioning the DB is the answer for result set consistency.
Each record has a primary id, a modification counter (version number), and a timestamp of modification/creation. Instead of modifying record r, you add a new record with the same id, version number + 1, and sysdate as the modification time.
In the fetch response you add the DB request_time (do not use a client timestamp, due to possible differences in time between client and server). The first page is served normally, but you return sysdate as request_time. Subsequent pages are served differently: for each versioned table you add a condition like modification_time <= request_time, and for each id you take the row with the highest version that satisfies it (see the sketch below).
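A hypothetical sketch of serving one page of such a frozen snapshot; the table and column names (products, id, version, modified_at, removed) and SQL Server's OFFSET/FETCH syntax are assumptions, not part of the question:
using System;
using System.Data;
using System.Data.SqlClient;

public static class SnapshotPager
{
    // Fetch page `pageNumber` of the result set as it existed at requestTime.
    public static DataTable FetchPage(string connStr, DateTime requestTime,
                                      int pageNumber, int pageSize)
    {
        const string sql = @"
            SELECT p.*
            FROM products p
            WHERE p.modified_at <= @requestTime
              AND p.version = (SELECT MAX(v.version) FROM products v
                               WHERE v.id = p.id
                                 AND v.modified_at <= @requestTime)
              AND p.removed = 0            -- removed flag instead of DELETE
            ORDER BY p.name                -- any custom sort order works here
            OFFSET @skip ROWS FETCH NEXT @pageSize ROWS ONLY;";

        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand(sql, conn))
        {
            cmd.Parameters.AddWithValue("@requestTime", requestTime);
            cmd.Parameters.AddWithValue("@skip", pageNumber * pageSize);
            cmd.Parameters.AddWithValue("@pageSize", pageSize);
            var table = new DataTable();
            conn.Open();
            new SqlDataAdapter(cmd).Fill(table);
            return table;
        }
    }
}
Records added or modified after request_time are excluded by the timestamp filter, so every page is computed against the same snapshot.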
You can cache the result set of IDs on the server side when a query arrives for the first time and return a unique ID to the frontend. This unique ID corresponds to the result set for that query. So now the frontend can request something like next_page with the unique ID that it got the first time it made the query. You should still go ahead with your approach of changing DELETE operation to a removed operation because it would make sure that none of the entries from the result set it deleted. You can discard the result set of the query from the cache when the frontend reaches the end of the result set or you can set a time limit on the lifetime of the cache entry.
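Here, ResultSetCache, Store, GetPage and Discard are made-up names, purely for illustration:
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

// Server-side cache of frozen result sets, keyed by a token handed back to
// the frontend on its first request.
public class ResultSetCache
{
    private readonly ConcurrentDictionary<Guid, List<long>> sets =
        new ConcurrentDictionary<Guid, List<long>>();

    // First request: freeze the full, already-sorted list of matching IDs.
    public Guid Store(IEnumerable<long> matchingIds)
    {
        var token = Guid.NewGuid();
        sets[token] = matchingIds.ToList();
        return token;
    }

    // Later requests: slice the frozen list, so page contents never shift
    // even when the underlying table changes.
    public List<long> GetPage(Guid token, int pageNumber, int pageSize)
    {
        List<long> ids;
        if (!sets.TryGetValue(token, out ids))
            throw new KeyNotFoundException("Result set expired or unknown.");
        return ids.Skip(pageNumber * pageSize).Take(pageSize).ToList();
    }

    // Call when the frontend reaches the last page, or from an expiry timer.
    public void Discard(Guid token)
    {
        List<long> ignored;
        sets.TryRemove(token, out ignored);
    }
}
The frontend then resolves each page of IDs to records with a query like the WHERE id IN (...) approach from the question, but only one page at a time.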

Is it a good idea to store and access an active query resultset in ColdFusion vs re-querying the database?

I have a product search engine using ColdFusion 8 and MySQL 5.0.88.
The product search has two display modes: Multiple View and Single View.
Multiple view displays basic record info; Single view requires additional data to be polled from the database.
Right now a user does a search and I'm polling the database for
(a) the total number of records and
(b) the records FROM to TO.
The user always goes to Single view from his current result set, so my idea was to store the current result set for each user and not have to query the database again to get (wasted query a) the overall number of records and (wasted query b) the single record I already queried before, and only then fetch the detail information I still need for the Single view.
However, I'm getting nowhere with this.
I cannot cache the current resultset query, because it's unique to each user (session).
The queries are running inside a CFINVOKEd method inside a CFC I'm calling through AJAX, so the whole query runs and afterwards the CFC and CFINVOKE method are discarded, so I can't use query of query or variables.cfc_storage.
So my idea was to store the current result set in the Session scope, which will be updated with every new search the user runs (either pagination or a completely new search). The maximum number of results stored will be the number of results displayed.
I can store the query allright, using:
<cfset Session.resultset = query_name>
This stores the whole query with results, like so:
query
CACHED: false
EXECUTIONTIME: 2031
SQL: SELECT a.*, p.ek, p.vk, p.x, p.y
FROM arts a
LEFT JOIN p ON
...
LEFT JOIN f ON
...
WHERE a.aktiv = "ja"
AND
... 20 conditions ...
SQLPARAMETERS: [array]
1) ... 20+ parameters
RESULTSET:
[Record # 1]
a: true
style: 402
price: 2.3
currency: CHF
...
[Record # 2]
a: true
style: 402abc
...
This would be overwritten every time a user does a new search. However, if a user wants to see the details of one of these items, I don't need to run the two queries (total number of records, plus fetching the one record) if I can access the record I need from my temp storage. This way I would save two database trips worth 2,031 ms of execution time each to get data which I already pulled before.
The tradeoff would be every user having a result set of up to 48 results (the max number of items per page) in the Session scope.
My questions:
1. Is this feasible, or should I re-query the database?
2. If I have a structure/array/object like the above, how do I pick the record I need out of it by style number, i.e. how do I access the resultset? I can't just loop over the stored query (tried this for a while now...).
Thanks for help!
KISS rule. Just re-query the database unless you find that performance is really an issue. With the correct index, it should scale pretty well. When it becomes an issue, you can simply add a query cache there.
QoQ would introduce overhead (on the CF side: memory and computation), and might return stale data (where the query in session is older than the one in the DB). I only use QoQ when the same query is reused on the same view, not throughout a Session's time span.
Feasible? Yes. Depending on how many users and how much data this stores in memory, it's probably much better than going to the DB again.
It seems like the best way to get the single record you want is a query of query. In CF you can create another query that uses an existing query as its data source. It would look like this:
<cfquery name="subQuery" dbtype="query">
SELECT *
FROM Session.resultset
WHERE style = '#SelectedStyleVariable#'
</cfquery>
(Note the quotes around the variable: style values like 402abc are strings.)
Note that if you are using CFBuilder, it will probably scream "Error" at you for not having a datasource. This is a bug in CFBuilder; you are not required to have a datasource if your dbtype is "query".
Depending on how many records there are, what I would do is store the detail data in the application scope as a structure where the ID is the key. Something like:
APPLICATION.products[product_id].product_name
.product_price
.product_attribute
Then you would really only need to query for the ID of the item on demand.
And to improve the "on demand" query, you have at least two "in code" options:
1. A query of query, where you query the entire collection of items once, and then query from that for the data you need.
2. Verity or Solr to index everything; then you'd only have to query everything when refreshing your search collection. That would be tons faster than doing all the joins for every single query.

Pagination in Classic ASP with VB Script

I am using ASP/VBScript in my project, but I don't know much about pagination in Classic ASP. I have designed a datagrid format using tables and looping; the table is filled by querying the database. As we have a huge amount of data to display, we need pagination.
Thanks in advance
The pagination problem is not specific to Classic ASP or VBScript. You first need to decide which strategy to follow:
In the client:
Ajax-style pagination (you can use a jQuery plugin like SlickGrid)
Linked pagination: your page has links to page 1, page 2, etc.
Infinite scrolling: this is a modern way to do pagination, with more results added to the page via Ajax
In the server:
Retrieve the full DB result set and return only the requested page. This is sometimes necessary.
Retrieve the full result set but cache it, so subsequent page requests come from the cache, not the DB
Ask the DB for only the requested page (different techniques depending on the DB engine)
There is an issue you need to be aware of: the built-in ASP recordset will allow paging, but it is not very efficient. The entire result set gets returned to the browser, and then the display code locates the appropriate page and shows that data.
Think of it like this: your result set is a four-shelf bookcase. When you ask for page one, all four shelves of books get returned, and the display code says "Okay, now only show page 1". If you then ask for page two, all four shelves of books get returned again, and the display code says "Okay, give me page 2".
So you should look for a paging solution that takes place on the server, inside the database. This way, if you ask for page 15 of a 50-page result, the database will only return one shelf of books.
A web search for SQL paging stored procedures should put you on the right track.
Edit: How SQL Paging Works
You must use a stored procedure
One of the input parameters is the page to view
The stored procedure filters the results on the server
Here is the basic concept of what happens inside the proc:
Step 1:
Create a temp table that stores the entire result set. My preference is to store only two values in this temp table: an identity seed value called RowId, and the primary key of the result data. (I'm one of those people who believes in nonsensical identity seed keys.)
Step 2:
Insert all the PKey values from the select statement into the temp table
Step 3:
Determine the StartRowId and EndRowId based on the input page parameter.
Step 4:
Select from the temp table using an inner join to the datatable on the PKey. In the where clause limit the result so the RowId (of the temp table) is between StartRowId and EndRowId. Make sure to Order By the RowId.
Set the page size:
recordset.PageSize = 100 ' number of records per page
Set the current page:
recordset.AbsolutePage = nPage ' nPage being the page you want to jump to
Other useful bits:
recordset.RecordCount ' number of records returned
recordset.PageCount ' number of pages based on PageSize and RecordCount
That's the basic info. You'll still need to loop through the appropriate number of records, and check the page number as it is passed back to the page.

Paging, listing and grouping queries with AppFabric Cache

I have read a lot of documents about AppFabric caching, but most of them cover simple scenarios,
for example adding city list data or shopping cart data to the cache.
But I need to add product catalog data to the cache.
I have 4 tables:
Product (1 million rows), ProductProperty (25 million rows), Property (100 rows), PropertyOption (300 rows)
I display paged search results, querying with some filters on the Product and ProductProperty tables.
I build a criteria set over the search result set, for example (4 items New Product, 34 items Phone, 26 items Book, etc.).
I run a grouping query over the Product table on columns such as IsNew, CategoryId, PriceType, etc.,
and another grouping query over the ProductProperty table on the PropertyId and PropertyOptionId columns to get how many items each property has.
Therefore, to display search results I make one query for the search result and two for creating the criteria list (with counts).
The search result query takes 0.7 seconds, and the two grouping queries take 1.5 seconds in total.
When I run a load test, I reach 7 requests per second, and 10% get dropped by IIS because the DB could not respond.
This is why I want to cache the Product and Property records.
If I follow the items below (in AppFabric):
Create a named cache
Create a region for the product catalog data (a Product table which has 1 million rows and a ProductProperty table which has 25 million rows)
Tag items for querying and grouping data
Can I query with some tags and get the 1st or 2nd page of results?
Can I query with some tags and get the counts of some grouping results (displaying filter options with counts)?
And do I really need 3 servers? Can I build a solution with only one AppFabric server? (And of course, I know the risk.)
Do you know any article or any document explains those scenarios ?
Thanks.
Note:
Some additional tests:
I added about 30,000 items to the cache, and its size is 900 MB.
When I run the GetObjectsInRegion method, it takes about 2 minutes: IList<KeyValuePair<string, object>> dataList = this.DataCache.GetObjectsInRegion(region).ToList();
The problem is converting to IList. If I use IEnumerable, it works very quickly. But how can I get paging or grouping results without converting to my own type?
Another test:
I tried getting grouping counts with 30,000 product items, and getting the result for a grouping took 4 seconds, for example GetObjectsByTag("IsNew").Count(), with nearly 50 other queries like that.
There is, unfortunately, no paging API for AppFabric in V1. Any of the bulk APIs, like GetObjectsByTag, are going to perform the query on the server and stream back all the matching cache entries to the client. From there you can obviously use any LINQ operators you want on the IEnumerable (e.g. Skip/Take/Count), but be aware that you're always pulling the full result set back from the server.
I'm personally hoping that AppFabric V2 will provide support via IQueryable instead of IEnumerable which will give the ability to remote the full request to the server so it could page results there before returning to the client much like LINQ2SQL or ADO.NET EF.
For now, one potential solution, depending on the capabilities of your application, is to calculate some kind of paging as you inject the items into the cache. You can build ordered lists of entity keys representing each page and store each list as a single entry in the cache, which you can pull out in one request; then individually (in parallel) or bulk-fetch the items in the list from the cache and join them together with an in-memory LINQ query. If you want to trade memory for CPU, cache the actual list of full entities rather than IDs, avoiding the join entirely.
You would obviously have to come up with some kind of keying mechanism to quickly pull these lists of objects from the cache based on the incoming search criteria. Some kind of keying like this might work:
private static string BuildPageListCacheKey(string entityTypeName, int pageSize, int pageNumber, string sortByPropertyName, string sortDirection)
{
return string.Format("PageList<{0}>[pageSize={1};pageNumber={2};sortedBy={3};sortDirection={4}]", entityTypeName, pageSize, pageNumber, sortByPropertyName, sortDirection);
}
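A hypothetical sketch of how that key could be used, assuming the standard DataCache Put/Get overloads that take a region name; ComputeProductIdsForPage stands in for your own query logic, and "ProductCatalog" is an assumed region name:
string key = BuildPageListCacheKey("Product", 50, 2, "Name", "asc");

// Populate the entry (ideally from a background worker, as suggested below).
List<Guid> pageIds = ComputeProductIdsForPage(50, 2); // hypothetical helper
dataCache.Put(key, pageIds, "ProductCatalog");

// Serve a request: one round trip for the ID list, then bulk-fetch the items.
var ids = (List<Guid>)dataCache.Get(key, "ProductCatalog");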
You may want to consider doing this kind of thing with a separate process or worker thread that's keeping the cache up to date rather than doing it on demand and forcing the users wait if the cache entry isn't populated yet.
Whether or not this approach ultimately works for you depends on several factors of your application and data. If it doesn't exactly fit your scenarios maybe it will at least help shift your mind into a different way of thinking about solving the problem.
