I have an N:N relationship in my model-driven app. One table is "normal" and the other one is virtual. The problem appears when there are many items in this relationship; the request returns the following message:
"The query specified in the URI is not valid. The node count limit of '100' has been exceeded. To increase the limit, set the 'MaxNodeCount' property on EnableQueryAttribute or ODataValidationSettings."
The suggested solution is not valid: if I increased the limit, the problem would still persist, just at ~30+ records in the relationship instead of 15+ (which is a realistic scenario).
The fetch from the UI is OK:
<link-entity name="transport_purchaseorderline" intersect="true" visible="false" to="purchaseorderlineid" from="purchaseorderlineid">
  <link-entity name="transport" from="transportid" to="transportid" alias="bb">
    <filter type="and">
      <condition attribute="transportid" operator="eq" uitype="transport" value="a0d43369-a7ae-ec11-9840-000d3aa86785" />
    </filter>
  </link-entity>
</link-entity>
But OData builds a long query asking for each purchase order line separately, causing the node count limit error:
...odata/PurchaseOrderLines?$orderby=Name&$filter=Guid eq 00000000-0300-4d09-0000-000100000000 or Guid eq 00000000-0300-4d09-0000-000200000000 or Guid eq 00000000-0300-4d09-0000-000300000000 and so on
Is there any way to overcome this? How can I force OData not to build a long query asking for each GUID separately?
I'm a junior user of MyBatis, and I wonder about the difference between nested select and nested results: is it simply the difference between a sub-query and a join, especially in performance, or does MyBatis do some optimization?
I use MyBatis version 3.4.7 and an Oracle DB.
Here is an example for reference:
private List<Post> posts;
<resultMap id="blogResult" type="Blog">
<collection property="posts" javaType="ArrayList" column="id"
ofType="Post" select="selectPostsForBlog"/>
</resultMap>
<select id="selectBlog" resultMap="blogResult">
SELECT * FROM BLOG WHERE ID = #{id}
</select>
<select id="selectPostsForBlog" resultType="Post">
SELECT * FROM POST WHERE BLOG_ID = #{id}
</select>
or
<select id="selectBlog" resultMap="blogResult">
select
B.id as blog_id,
B.title as blog_title,
B.author_id as blog_author_id,
P.id as post_id,
P.subject as post_subject,
P.body as post_body
from Blog B
left outer join Post P on B.id = P.blog_id
where B.id = #{id}
</select>
<resultMap id="blogResult" type="Blog">
<id property="id" column="blog_id" />
<result property="title" column="blog_title"/>
<collection property="posts" ofType="Post">
<id property="id" column="post_id"/>
<result property="subject" column="post_subject"/>
<result property="body" column="post_body"/>
</collection>
</resultMap>
Is there still an N+1 problem in nested select, as with a sub-query?
Do you have any advice or experience about which one performs better in a given environment or under given conditions? Thanks a lot :).
First of all, a slight terminology note. A subquery in SQL is a part of a query that is a query by itself, for example:
SELECT ProductName
FROM Product
WHERE Id IN (SELECT ProductId
FROM OrderItem
WHERE Quantity > 100)
In this case the following piece of the query is the subquery:
SELECT ProductId
FROM OrderItem
WHERE Quantity > 100
So you are using the term "subquery" incorrectly here. The MyBatis documentation uses the term nested select.
There are two ways to fetch associated entities/collections in MyBatis. Here's the relevant part of the documentation:
Nested Select: By executing another mapped SQL statement that returns the complex type desired.
Nested Results: By using nested result mappings to deal with repeating subsets of joined results.
When nested select is used, MyBatis executes the main query first (in your case selectBlog) and then, for every record, executes another select (hence the name nested select) to fetch the associated Post entities.
When nested results are used, only one query is executed, but it already has the association data joined, so MyBatis maps the result onto the object structure.
In your example a single Blog entity is returned, so with nested select two queries are executed; but in the general case (if you fetched a list of Blogs) you would hit the N+1 problem, as illustrated below.
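To make the N+1 shape concrete, here is roughly what reaches the database when a list of blogs is fetched with the nested select mapping (an illustrative trace, not MyBatis's literal output; the blog ids are made up):

-- the main query returns, say, 3 blogs...
SELECT * FROM BLOG WHERE TITLE LIKE '%java%';
-- ...and the nested select then runs once per returned row:
SELECT * FROM POST WHERE BLOG_ID = 1;
SELECT * FROM POST WHERE BLOG_ID = 2;
SELECT * FROM POST WHERE BLOG_ID = 3;
-- 1 + N queries in total, hence "N+1"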
Now let's deal with performance. All of the following assumes that the queries are tuned (i.e. there are no missing indices), you are using a connection pool, the database is co-located; basically speaking, your system is tuned in all other regards.
Speaking of performance, there is no single correct answer and your mileage may vary; you always need to test your particular workflows in your setup. Given how many factors affect performance, like data distribution (think of the max/min/avg number of posts each blog has), the size of the records in the DB (think of the number and size of the data fields in blog and post), and the DB parameters (disk type and speed, amount of memory available for dataset caching, etc.), there can be no single answer, only the generic observations that follow.
But we can understand the performance difference if we look at cases at the ends of the performance spectrum, that is, cases where nested select significantly outperforms join and vice versa.
For collection fetching, join should be better in most cases, because the network latency of doing N+1 requests adds up.
One case where nested select may be better is a one-to-many association where the records in the main table reference some other table, the cardinality of that other table is not large, and its records are large.
For example, let's say Blog has a category property that references a categories table and may have one of these values: Science, Fashion, News. And let's imagine the list of blogs is selected by some filter, like keywords in the blog title. If the result contains, say, 500 items, then most of the associated categories will be duplicates.
If we select them with a join, every record in the result set will contain the Category data fields (and, as a reminder, most of them are duplicates and a Category record holds a lot of data).
If we select them using nested select, we run a query fetching the category by its id for every record, and here the MyBatis session cache comes into play. For the duration of the SqlSession, every time MyBatis executes a query it stores the result in the session cache, so it does not issue repeated requests to the database but serves them from the cache. This means that once MyBatis has retrieved some category by id for the first record, it reuses it for all other records in the result set it processes.
In the above example we would issue at most 4 requests to the database, and the reduced amount of data passed over the network may outweigh the cost of those 4 requests.
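A minimal sketch of that session-level caching (CategoryMapper, selectById, and the factory setup are made-up names for illustration):

import org.apache.ibatis.session.SqlSession;
import org.apache.ibatis.session.SqlSessionFactory;

public class SessionCacheDemo {
    static void demo(SqlSessionFactory factory) {
        try (SqlSession session = factory.openSession()) {
            CategoryMapper mapper = session.getMapper(CategoryMapper.class);
            Category first  = mapper.selectById(42);  // executes SQL against the DB
            Category second = mapper.selectById(42);  // served from the session-local cache
            // Nested selects benefit the same way: duplicate category ids across
            // a 500-row blog result set trigger at most one query per distinct id.
        }
    }
}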
Despite reading lots of available resources online, I couldn't find a straight explanation that clarifies:
In which cases are both PagingInfo.PageNumber and PagingInfo.PagingCookie required to be assigned?
For example: consider we have 10,000 unique records and we'd like to retrieve all of them, 200 records at a time. Do we have to iterate over PageNumber (50 iterations), or is it enough to use only the PagingCookie?
Now, I'd like to share what I've found in online resources:
Firstly, many online resources link to the official MSDN examples (for FetchXML: 1, 2; for QueryExpression: 1, 2), but there is no direct answer to this question. They all iterate over the PageNumber, but as I understand it (and I may be wrong), a page always has 5,000 records (or does it?).
Secondly, I've found this demonstration of using PagingCookie, fetching records 6-10 out of 40 records:
<fetch mapping="logical" count="5" page="2" paging-cookie="<cookie page="1"><new_parentrecordid last="{F8DAB1AA-3A0F-E411-8189-005056B20097}" first="{F8DAB1AA-3A0F-E411-8189-005056B20097}" /></cookie>" version="1.0">
<entity name="new_parentrecord">
<attribute name="new_name" />
<link-entity name="new_childrecord" from="new_parentaid" to="new_parentrecordid">
<attribute name="new_childrecordid" />
<attribute name="new_name" />
</link-entity>
</entity>
</fetch>
Then, it's explained that the above is translated to the following SQL query:
select top 6 "new_parentrecord0".new_name as "new_name"
, "new_parentrecord0".new_parentrecordId as "new_parentrecordid"
, "new_childrecord1".new_childrecordId as "new_childrecord1.new_childrecordid"
, "new_childrecord1".new_name as "new_childrecord1.new_name"
from
new_parentrecord as "new_parentrecord0"
join new_childrecord as "new_childrecord1" on ("new_parentrecord0".new_parentrecordId = "new_childrecord1".new_ParentAId)
where
((("new_parentrecord0".new_parentrecordId > '01DBB1AA-3A0F-E411-8189-005056B20097')))
order by
"new_parentrecord0".new_parentrecordId asc
Thus, as I see it in this example, there is no need to use the page number, since only the paging cookie ends up being used in the resulting SQL query.
It would be great to have a clear explanation of this issue.
My experience and the additional research I just did indicate that yes, we do need to iterate over all the pages. While the page number never makes it into the SQL query, it controls which ID SQL uses in its filter.
Without the page # in the query, the IDs in the paging cookie stay the same [screenshots omitted].
With the page #, the IDs change.
It is also interesting to note that if you advance the page # without changing the page or IDs in the paging cookie, the system uses the paging cookie's last ID and expands the SQL "top" parameter to cover the number of pages you requested. It must then do some processing on the SQL recordset, because it returns the right number of records (in this case 20), with the IDs advanced.
The PagingCookie is a read-only identifier of the result set the server has delivered to the client. Its purpose is to optimize data retrieval, and it should not be modified. The PagingCookie basically tells the server roughly where the start and end of the specified page can be found.
I assume it enables the CRM server to use cached query result sets. When paging through a result set using paging cookies I noticed the server only submits the SQL query once (i.e. when it receives a query without a paging cookie) and delivers subsequent pages - whenever possible - from memory.
The client is expected to simply copy the cookie into the FetchXML or QueryExpression when retrieving another page of the same resultset.
As far as I can see (tested on Dynamics CRM 2016 (v8.2)), this also works with joins.
I suspect the SQL query you are showing is actually the result of tampering with the PagingCookie: the server cannot find the cookie in its cache, assumes that the original result set has been flushed, and when it builds a fresh SQL query it relies fully on the correctness of the boundaries declared by the cookie. Hence the unexpected where clause.
Conclusion
Always use the page attribute (FetchXML) or PageInfo.PageNumber property (QueryExpression). Never modify the PagingCookie, just copy it into subsequent page requests.
When the page size (count attribute) is not specified, it defaults to a maximum of 5,000 rows. When the expected result set is larger than that, you need to iterate over all available pages.
When you are working with Dynamics CRM On Premise, you can modify the maximum page size using a PowerShell command.
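A minimal sketch of that paging loop with a QueryExpression (the entity and column names are placeholders, and service is an assumed IOrganizationService):

using System.Collections.Generic;
using Microsoft.Xrm.Sdk;
using Microsoft.Xrm.Sdk.Query;

static List<Entity> RetrieveAllPages(IOrganizationService service)
{
    var query = new QueryExpression("account")      // placeholder entity
    {
        ColumnSet = new ColumnSet("name"),          // placeholder column
        PageInfo = new PagingInfo { PageNumber = 1, Count = 200 }
    };

    var all = new List<Entity>();
    while (true)
    {
        EntityCollection page = service.RetrieveMultiple(query);
        all.AddRange(page.Entities);

        if (!page.MoreRecords)
            return all;

        // Advance the page number and copy the cookie verbatim, unmodified.
        query.PageInfo.PageNumber++;
        query.PageInfo.PagingCookie = page.PagingCookie;
    }
}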
Given I have a simple query:
List<Customer> findByEntity(String entity);
This query returns 7k records in 700 ms, whereas the paginated version:
Page<Customer> findByEntity(String entity, Pageable pageable);
returns 10 records in 1080 ms. I am aware of the additional count query for pagination, but something still seems off. Another strange thing I've noticed: if I increase the page size from 10 to 1900, the response time is exactly the same, around 1080 ms.
Any suggestions?
It might indeed be the count query that's expensive here. If you insist on knowing the total number of matching elements, there's unfortunately no way around that additional query. However, there are two possibilities to avoid some of the overhead if you're able to sacrifice part of the information returned, as sketched below:
Using Slice as the return type: Slice doesn't expose a method to find out the total number of elements, but it lets you find out whether a next slice is available. We avoid the count query here by reading one more element than requested and using its (non-)presence as an indicator of the availability of a next slice.
Using List as the return type: this simply applies the pagination parameters to the query and returns the selected window of elements, but it leaves you with no information about whether subsequent data is available.
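A repository sketch of both alternatives, assuming a Spring Data JPA setup (the findByEntityOrderByIdAsc name presumes an id property and is only illustrative):

import java.util.List;
import org.springframework.data.domain.Pageable;
import org.springframework.data.domain.Slice;
import org.springframework.data.repository.Repository;

public interface CustomerRepository extends Repository<Customer, Long> {

    // No count query: fetches pageSize + 1 rows and uses the extra
    // row's presence to answer slice.hasNext().
    Slice<Customer> findByEntity(String entity, Pageable pageable);

    // Applies the pagination window but returns no paging metadata at all.
    List<Customer> findByEntityOrderByIdAsc(String entity, Pageable pageable);
}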
A method with pagination runs two queries:
1) select count(e.id) from Entity e // to get the total number of records
2) select e from Entity e limit 10 [offset 10] // 'offset 10' is used for subsequent pages
It is the first query that runs slowly on 7k records, IMHO.
The upcoming Ingalls release of Spring Data will use an improved algorithm for paginated queries (more info).
Any suggestions?
I think using a paginated query for just 7k records is useless; you should simply limit it.
I am getting irritated with iReports. The problem is that I have a data set returning data for multiple customers, and I want to use the "Group Expression" against the customer ID and have the report lay out the Detail Tabs per customer.
I'm finding that, seemingly at random, where there is more than one data row for a customer, iReports will generate two or more groupings (sometimes it does what I expect and groups all the customer data together), even though the field identifying the customer is the same and doesn't change.
Has anyone seen this before? To be honest I can't believe it is actually a bug; it must be something I've missed. Much searching has yet to find a suitable answer.
I think this is a data sorting problem.
A quote from the iReport Ultimate Guide:
JasperReports groups records by evaluating the group expression. Every time the expression's value changes, a new group instance is created. The engine does not perform any record sorting (if not explicitly requested), so when we define groups we should always take care of the record sorting. That is, if we want to group a set of addresses by country, the records we select for the report should already be ordered by country. It is simple to sort data when using an SQL query by using the ORDER BY clause. When this is not possible (that is, when obtaining the records from an XML document), we can request that JasperReports sort the data for us. This can be done using the sort options available in the query window.
You can sort data in these ways:
in case of using a Database JDBC connection datasource type, you can add an ORDER BY customerId clause to the report's query, where customerId is the column holding the customer id
in case of using a File CSV connection or something similar, you can have the data sorted by adding a sortField element for the field to the report's template (jrxml file):
<jasperReport ...>
...
<field name="customerId" class="java.lang.String"/>
<sortField name="customerId"/>
Does your SQL statement have an ORDER BY?
Is the iReport group grouped by customer_id?
I have read a lot of documents about AppFabric caching, but most of them cover simple scenarios.
For example, adding city list data or shopping cart data to the cache.
But I need to add product catalog data to the cache.
I have 4 tables:
Product (1 million rows), ProductProperty (25 million rows), Property (100 rows), PropertyOption (300 rows)
I display paged search results, querying with some filters over the Product and ProductProperty tables.
I build a criteria set over the search result set, for example: (4 items New Product, 34 items Phone, 26 items Book, etc.)
I run one grouping query over the Product table with the IsNew, CategoryId, PriceType, etc. columns,
and another grouping query over the ProductProperty table with the PropertyId and PropertyOptionId columns, to get how many items each property has.
Therefore, to display search results I make one query for the results themselves and 2 for building the criteria list (with counts).
The search result query takes 0.7 seconds and the 2 grouping queries take 1.5 seconds in total.
When I run a load test I reach 7 requests per second, and 10% of requests are dropped by IIS because the DB cannot respond in time.
This is why I want to cache the Product and property records.
If I follow the items below (in AppFabric):
Create a named cache
Create a region for the product catalog data (a table with 1 million rows and a property table with 25 million rows)
Tag items for querying and grouping
Can I query with some tags and get the 1st or 2nd page of results?
Can I query with some tags and get the counts of some grouping results (to display filter options with counts)?
And do I really need 3 servers? Can I provide a solution with only one AppFabric server? (And of course, I know the risk.)
Do you know of any article or document that explains these scenarios?
Thanks.
Note:
Some additional tests:
I added about 30,000 items to the cache, and its size was 900 MB.
When I run the GetObjectsInRegion method, it takes about 2 minutes: IList<KeyValuePair<string, object>> dataList = this.DataCache.GetObjectsInRegion(region).ToList();
The problem is the conversion to IList; if I use IEnumerable it works very quickly. But how can I get paging or grouping results without converting to my type?
Another test:
I tried getting grouping counts with 30,000 product items, and getting the grouping results took 4 seconds, for example GetObjectsByTag("IsNew").Count() and nearly 50 other queries like that.
There is, unfortunately, no paging API for AppFabric in V1. All of the bulk APIs, like GetObjectsByTag, perform the query on the server and stream all the matching cache entries back to the client. From there you can obviously use any LINQ operators you want on the IEnumerable (e.g. Skip/Take/Count), but be aware that you're always pulling the full result set back from the server.
I'm personally hoping that AppFabric V2 will provide support via IQueryable instead of IEnumerable, which would give the ability to remote the full request to the server so it could page results there before returning them to the client, much like LINQ to SQL or ADO.NET EF.
For now, one potential solution, depending on the capabilities of your application, is to calculate some kind of paging as you inject the items into the cache. You can build ordered lists of entity keys representing each page and store each list as a single entry in the cache; you can then pull a list out in one request, fetch the items in it from the cache individually (in parallel) or in bulk, and join them together with an in-memory LINQ query. If you wanted to trade CPU for memory, you could cache the actual lists of full entities rather than IDs and skip the join.
You would obviously have to come up with some kind of keying mechanism to quickly pull these lists of objects from the cache based on the incoming search criteria. Some kind of keying like this might work:
private static string BuildPageListCacheKey(string entityTypeName, int pageSize, int pageNumber, string sortByPropertyName, string sortDirection)
{
return string.Format("PageList<{0}>[pageSize={1};pageNumber={2};sortedBy={3};sortDirection={4}]", entityTypeName, pageSize, pageNumber, sortByPropertyName, sortDirection);
}
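To illustrate, a hedged sketch of how such a page list might be stored and consumed, building on the key helper above (LoadProductIdPageFromDb and the "Product:" key scheme are hypothetical):

using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ApplicationServer.Caching;

static List<Product> GetProductPage(DataCache cache, int pageNumber)
{
    string key = BuildPageListCacheKey("Product", 20, pageNumber, "Name", "asc");

    var pageIds = cache.Get(key) as List<Guid>;
    if (pageIds == null)
    {
        // Cache miss: compute the id list for this page once and store it.
        pageIds = LoadProductIdPageFromDb(pageNumber, 20);   // hypothetical DB loader
        cache.Put(key, pageIds);
    }

    // Fetch the entities themselves and join them in memory.
    return pageIds
        .Select(id => (Product)cache.Get("Product:" + id))   // hypothetical key scheme
        .ToList();
}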
You may want to consider doing this kind of thing in a separate process or worker thread that keeps the cache up to date, rather than doing it on demand and forcing users to wait when a cache entry isn't populated yet.
Whether or not this approach ultimately works for you depends on several factors of your application and data. If it doesn't exactly fit your scenarios, maybe it will at least help shift your mind into a different way of thinking about the problem.