XQuery sorting issue

I'm in big trouble right now. I am new to XQuery, but I'm working on a project that uses XQuery together with PHP and XSLT.
Our project holds a large amount of data (it's a property listing site), and I'm storing that data in Berkeley DB (XML DB).
The problem is that searching for a property takes too long to return results. The ORDER BY is causing the problem (Query 1); without sorting it works fine (Query 2). But sorting is required for my project and is very important. Please check my query (Query 1) and suggest a solution. The queries follow:
Query 1:
let $properties := (
for $property in collection('bdb/properties.dbxml')/properties/property
[ ( sale_price >=60000 and sale_price <=500000 ) and ( building_square_footage >=300 and building_square_footage <=3000 ) and ( bedrooms >=2 and bedrooms <=6 ) ]
order by
contains($property/mls_agent_id/text(), '505199') descending,
matches($property/mls_office_id/text(), '^CBRR') ascending,
$property/sale_price/number() descending
return $property
)
let $properties := subsequence($properties,10,10) return <properties>{$properties}</properties>
Query 2:
let $properties := (
for $property in subsequence (
collection('bdb/properties.dbxml')/properties/property
[ ( sale_price >=60000 and sale_price <=500000 ) and ( building_square_footage >=300 and building_square_footage <=3000 ) and ( bedrooms >=2 and bedrooms <=6 ) ]
, 1, 10)
return $property
) return <properties>{$properties}</properties>

I don't know if this helps, but you may have more success with an alternative XML database that offers more advanced query optimizations. Just a guess, Hannes

I think sorting is the wrong tool to use here. You're really getting 3 lists - the agent's listings, the office's listings other than the agent's, and other listings. It might work better to just make these 3 queries, and the optimizer can more efficiently select the nodes for each subquery.
let $properties := (
for $property in collection('bdb/properties.dbxml')/properties/property
[ ( sale_price >=60000 and sale_price <=500000 ) and ( building_square_footage >=300 and building_square_footage <=3000 ) and ( bedrooms >=2 and bedrooms <=6 ) and mls_agent_id = '505199']
order by
$property/sale_price/number() descending
return $property,
for $property in collection('bdb/properties.dbxml')/properties/property
[ ( sale_price >=60000 and sale_price <=500000 ) and ( building_square_footage >=300 and building_square_footage <=3000 ) and ( bedrooms >=2 and bedrooms <=6 ) and (starts-with(mls_office_id, 'CBRR') and not(mls_agent_id = '505199'))]
order by
$property/sale_price/number() descending
return $property,
for $property in collection('bdb/properties.dbxml')/properties/property
[ ( sale_price >=60000 and sale_price <=500000 ) and ( building_square_footage >=300 and building_square_footage <=3000 ) and ( bedrooms >=2 and bedrooms <=6 ) and not(starts-with(mls_office_id, 'CBRR')) and not(mls_agent_id = '505199')]
order by
$property/sale_price/number() descending
return $property
)
let $properties := subsequence($properties,10,10) return <properties>{$properties}</properties>
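To see why three smaller queries can stand in for one multi-key sort, here is a rough Python model of the idea. It is not XQuery and not the asker's data: the sample listings are made up, and it assumes the intended order is the agent's listings first, then the office's, then everything else, each by price descending.

```python
# Hypothetical sample listings; field names mirror the XML elements in the queries.
listings = [
    {"id": 1, "mls_agent_id": "505199", "mls_office_id": "CBRR01", "sale_price": 100000},
    {"id": 2, "mls_agent_id": "111111", "mls_office_id": "CBRR02", "sale_price": 400000},
    {"id": 3, "mls_agent_id": "222222", "mls_office_id": "XYZ001", "sale_price": 450000},
    {"id": 4, "mls_agent_id": "505199", "mls_office_id": "XYZ001", "sale_price": 300000},
    {"id": 5, "mls_agent_id": "333333", "mls_office_id": "CBRR01", "sale_price": 250000},
]


def is_agent(p):
    return p["mls_agent_id"] == "505199"


def is_office(p):
    return p["mls_office_id"].startswith("CBRR")


def group_rank(p):
    """0 = agent's listing, 1 = office's listing (not the agent's), 2 = anything else."""
    if is_agent(p):
        return 0
    return 1 if is_office(p) else 2


def by_price_desc(items):
    return sorted(items, key=lambda p: p["sale_price"], reverse=True)


# Approach 1: one sort over everything with a composite key.
single_sort = sorted(listings, key=lambda p: (group_rank(p), -p["sale_price"]))

# Approach 2: three disjoint selections, each sorted on price only, then concatenated.
three_lists = (
    by_price_desc([p for p in listings if is_agent(p)])
    + by_price_desc([p for p in listings if is_office(p) and not is_agent(p)])
    + by_price_desc([p for p in listings if not is_office(p) and not is_agent(p)])
)

single_ids = [p["id"] for p in single_sort]
three_ids = [p["id"] for p in three_lists]
print(single_ids, three_ids)
```

Both approaches give the same order here as long as the three selections are disjoint, which is why the second and third predicates exclude the agent's listings.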
Other things that may be helpful:
Try using starts-with instead of matches. A simple string comparison is likely to be faster than a regular expression, and easier to optimize.
If you break the query into multiple selects as I did above, you can skip the second or third select when you already have enough items.
Don't select the entire <property> node only to discard most of what you've selected with subsequence. Particularly when sorting, this likely costs a lot of memory bandwidth. It's better to select just a unique identifier and then use those identifiers to fetch the rest of the property info later.
For example, rather than
let $properties:= for $property in collection('properties.dbxml')/properties/property[.....]
return $property
let $properties := subsequence($properties,10,10)
return <properties>{$properties}</properties>
do instead
let $property_ids:= for $property in collection('properties.dbxml')/properties/property[.....]
return $property/unique_id
return <properties>{
for $id in subsequence($property_ids,10,10) return
collection('properties.dbxml')/properties/property[unique_id = $id]
}</properties>
This means that the in-memory sequence will be a bunch of small ids rather than large nodes. Of course, this means that you need to have these unique ids to begin with, but I suspect an MLS database has such things.
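The two suggestions about skipping later selects and paging over ids can be sketched together in Python. This is only a model of the control flow, assuming each sub-query can be evaluated lazily; run_group, the id lists and the store dictionary are hypothetical stand-ins, not part of any XML database API.

```python
from itertools import chain, islice

evaluated = []  # records which group queries actually ran


def run_group(name, ids):
    """Stand-in for one sorted sub-query; the body runs only when an item is requested."""
    evaluated.append(name)
    yield from ids


def page(groups, start, length):
    """Mimics subsequence($seq, start, length): start is 1-based."""
    return list(islice(chain.from_iterable(groups), start - 1, start - 1 + length))


# Hypothetical id lists, already sorted by price within each group.
agent_ids = [f"A{i}" for i in range(1, 26)]
office_ids = [f"O{i}" for i in range(1, 11)]
other_ids = [f"X{i}" for i in range(1, 11)]

groups = [
    run_group("agent", agent_ids),
    run_group("office", office_ids),
    run_group("other", other_ids),
]

# Page over the small ids first ...
page_ids = page(groups, 10, 10)

# ... then fetch the full records only for the ten ids on the page.
store = {pid: {"unique_id": pid} for pid in agent_ids + office_ids + other_ids}
page_records = [store[pid] for pid in page_ids]

print(page_ids)
print(evaluated)
```

In this run the requested page falls entirely inside the first group, so the second and third stand-in queries never execute. Whether a real query engine evaluates the concatenated sequence this lazily is something you would have to check for your database.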


How to count two column values

I want to count the values in two columns in Power BI to create a visual. My measure is below:
Test Measure = COUNTA('Export'[Line])+COUNTA('Export'[Line 2])
You need to create a calculated table with all the line values, like this:
TableValues =
DISTINCT ( UNION ( DISTINCT ( 'Table'[Line] ), DISTINCT ( 'Table'[Line_2] ) ) )
With that table in place, you can write the following measure:
CountValues =
VAR _CurrentValue =
SELECTEDVALUE ( TableValues[Line] )
VAR _C1 =
CALCULATE ( COUNTA ( 'Table'[Line] ), 'Table'[Line] = _CurrentValue )
VAR _C2 =
CALCULATE ( COUNTA ( 'Table'[Line_2] ), 'Table'[Line_2] = _CurrentValue )
RETURN
_C1 + _C2
The calculation counts all the instances of a specific Line across both columns. It's important that TableValues doesn't have any relationship with the other tables.
Output
Use 'TableValues'[Line] as the Line and CountValues as the metric.
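If it helps to see what the measure is meant to compute, here is a small Python model of the same counting. The rows are made-up sample data, and the blank handling assumes COUNTA skips blank cells.

```python
from collections import Counter

# Hypothetical rows of the source table; None stands for a blank cell.
rows = [
    {"Line": "A", "Line_2": "B"},
    {"Line": "B", "Line_2": "B"},
    {"Line": "C", "Line_2": "A"},
    {"Line": "A", "Line_2": None},
]

counts = Counter()
for row in rows:
    for column in ("Line", "Line_2"):
        value = row[column]
        if value is not None:  # blanks are not counted
            counts[value] += 1

# One entry per distinct value, like one row per TableValues[Line] in the visual.
for value in sorted(counts):
    print(value, counts[value])
```

Each distinct value gets the number of times it appears in either column, which is what _C1 + _C2 returns for the currently selected value.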

Based on a value show different message

Follow up:
I have these two tables that are not connected in any way.
The first table has the date, the number of customers on the day, DISTINCTCOUNT(sales[user_name]), total sales, and tier (text, explained below).
The second table is CustomerLimit, which is basically consecutive numbers between 1 and 100.
I used the Tier measure from the answer below (thank you):
Tier =
VAR Limit = SELECTEDVALUE ( CustomerLimit[CustomerLimit] )
VAR CustCount = COUNT ( Customers[CustomerID] )
RETURN
IF (
ISBLANK ( Limit ), "Select a value",
IF ( CustCount > Limit, "Good", "Bad" )
)
Now I need to aggregate the total number of customers by Tier.
I used
calculate(DISTINCTCOUNT(sales[user_name]),Tier = "Good")
It gives me this error: A function 'CALCULATE' has been used in a True/False expression that is used as a table filter expression. This is not allowed.
Is that possible?
You can capture the limit using SELECTEDVALUE and then compare.
Tier =
VAR Limit = SELECTEDVALUE ( CustomerLimit[CustomerLimit] )
VAR CustCount = COUNT ( Customers[CustomerID] )
RETURN
IF (
ISBLANK ( Limit ), "Select a value",
IF ( CustCount > Limit, "Good", "Bad" )
)

DAX low performance when referencing the same measure two times

I designed a set of measures that should, in the end, let me calculate the sales amount without cancelled transactions and with discounts applied.
Here are the definitions for those measures:
define measure Sales[Amount] = SUM(Amount)
define measure Sales[Discounted Amount] = CALCULATE(
ABS(SUM(Sales[DiscountValue] ) ),
FILTER(
VALUES( Sales[SalesOrderSource] ),
Sales[SalesOrderSource] = "XXX"
),
USERELATIONSHIP ( Sales[InvoiceDate], SalesInvoiceDate[SalesInvoiceDate] ),
USERELATIONSHIP ( Sales[OrderDate], SalesOrderDate[SalesOrderDate] ) --
)
define measure Sales[Cancelled Amount] = CALCULATE(
ABS([Amount]),
FILTER( VALUES( 'Sales' ), Sales[Status] = "Cancelled" ),
USERELATIONSHIP ( Sales[InvoiceDate], SalesInvoiceDate[SalesInvoiceDate] ),
USERELATIONSHIP ( Sales[OrderDate], SalesOrderDate[SalesOrderDate] ) --
)
-- RUNNING EXTREMELY SLOW
define measure Sales[AmountNet] = CALCULATE(
[Amount] - [Cancelled Amount] - [Discounted Amount],
USERELATIONSHIP ( Sales[InvoiceDate], SalesInvoiceDate[SalesInvoiceDate] ),
USERELATIONSHIP ( Sales[OrderDate], SalesOrderDate[SalesOrderDate] )
)
Unfortunately, the performance of the final measure Sales[AmountNet] is very slow.
BUT when I remove the [Cancelled Amount] factor from the [AmountNet] definition, it performs well. I suspect it's because of referencing the same measure ([Amount]) two times, where the second reference is overloaded with the FILTER iterator.
I would like to get some support on understanding this behaviour and how this could be rewritten to achieve better performance.
Thanks.
Can you try:
define measure Sales[Cancelled Amount] = CALCULATE(
ABS([Amount]),
KEEPFILTERS( Sales[Status] = "Cancelled" ),
USERELATIONSHIP ( Sales[InvoiceDate], SalesInvoiceDate[SalesInvoiceDate] ),
USERELATIONSHIP ( Sales[OrderDate], SalesOrderDate[SalesOrderDate] ) --
)
My hope is that this approach will be less expensive than FILTER over VALUES(Sales).

Count unique matching items with filter as a calculated column

I have two tables, Data and Report.
Data table:
The Data table contains three columns: Item, Status, and Filter.
The Item column contains duplicate entries, and its values may be text and numbers, numbers only, or text only.
The Status column contains two different comments, "Okay" and "Not Okay".
The Filter column contains two different filters, A1 and A2.
Report table:
In the Report table, I entered both comments, "Okay" and "Not Okay". I am looking for a count against filters A1 and A2 according to the comments.
I would like to create a new calculated column in the Report table that returns the unique count of items, by comment and filter, based on the Data table's Item and Status columns.
DATA:
REPORT
Alexis Olson helped with the following calculated columns to get the unique count. I am trying to add one more filter to the existing DAX calculated column, but it's not working. Can you please advise?
1.Desired Result =
VAR Comment = REPORT[COMMENTS]
RETURN
CALCULATE (
DISTINCTCOUNT ( DATA[ITEM] ),
DATA[STATUS] = Comment
)
2.Desired Result =
COUNTROWS (
SUMMARIZE (
FILTER ( DATA, DATA[STATUS] = REPORT[COMMENTS] ),
DATA[ITEM]
)
)
3.Desired Result =
SUMX (
DISTINCT ( DATA[ITEM] ),
IF ( CALCULATE ( SELECTEDVALUE ( DATA[STATUS] ) ) = REPORT[COMMENTS], 1, 0 )
)
I think you can just add a filter to CALCULATE:
Filter by A1 Result =
VAR Comment = REPORT[COMMENTS]
RETURN
CALCULATE (
DISTINCTCOUNT ( DATA[ITEM] ),
DATA[STATUS] = Comment,
DATA[FILTER] = "A1"
)
For the second method,
Filter by A1 Result =
COUNTROWS (
SUMMARIZE (
FILTER ( DATA, DATA[STATUS] = REPORT[COMMENTS] && DATA[FILTER] = "A1" ),
DATA[ITEM]
)
)
I do not recommend using the third one, but it would look like this:
Filter by A1 Result =
SUMX (
DISTINCT ( DATA[ITEM] ),
IF (
CALCULATE ( SELECTEDVALUE ( DATA[STATUS] ) ) = REPORT[COMMENTS]
&& CALCULATE ( SELECTEDVALUE ( DATA[FILTER] ) ) = "A1",
1,
0
)
)
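As a sanity check on what the first two versions should return, here is a plain Python model of "distinct items matching a status and a filter". The rows are invented sample data, and distinct_items is a hypothetical helper, not a DAX function.

```python
# Hypothetical rows of the DATA table: (ITEM, STATUS, FILTER).
data = [
    ("X1", "Okay", "A1"),
    ("X1", "Okay", "A1"),      # duplicate item, counted once
    ("X2", "Okay", "A2"),
    ("X3", "Not Okay", "A1"),
    ("X4", "Okay", "A1"),
    ("X4", "Not Okay", "A1"),  # same item with a different status
]


def distinct_items(rows, status, flt):
    """Distinct ITEM values among rows matching both the status and the filter."""
    return len({item for item, s, f in rows if s == status and f == flt})


# One result per REPORT row (comment), for filter A1.
for comment in ("Okay", "Not Okay"):
    print(comment, distinct_items(data, comment, "A1"))
```

Both conditions are applied to rows of the data table before the distinct count, which is why the extra filter belongs on DATA[FILTER].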

Distinct Count without using CALCULATE

I've come across this DAX measure:
# CustMultProds =
COUNTROWS(
FILTER(
Customer,
CALCULATE( DISTINCTCOUNT( Sales[ProductKey] ) ) >= 2
)
)
I pretty much understand how it works - it iterates over Customer inside the FILTER function, then the row context created by this iterator is transitioned into a filter context so that it is able to get the number of distinct products from the Sales table.
I am wondering whether it is possible to rewrite the measure without using CALCULATE. I got as far as using RELATEDTABLE, but I'm not sure how to extract the distinct ProductKeys from each related table:
# CustMultProds =
COUNTROWS(
FILTER(
Customer,
RELATEDTABLE (Sales)
...
...
)
)
This is a possible implementation of the measure using RELATEDTABLE. A context transition still happens once per customer, though, because RELATEDTABLE itself performs a context transition.
# CustMultProds =
COUNTROWS(
FILTER(
Customer,
VAR CustomerSales =
RELATEDTABLE( Sales )
RETURN
MAXX( CustomerSales, Sales[ProductKey] )
<> MINX( CustomerSales, Sales[ProductKey] )
)
)
This is another way to write the measure with RELATEDTABLE, one that can be adapted to a different threshold of distinct products.
# CustMultProds =
COUNTROWS(
FILTER(
Customer,
COUNTROWS( SUMMARIZE( RELATEDTABLE( Sales ), Sales[ProductKey] ) ) >= 2
)
)
This is another possible implementation, without CALCULATE or RELATEDTABLE.
It scans the entire Sales table once per customer, so even though it doesn't perform a context transition, I'd expect it to be slower.
# CustMultProds =
COUNTROWS(
FILTER(
Customer,
VAR SalesProducts =
SUMMARIZE( Sales, Sales[CustomerKey], Sales[ProductKey] )
RETURN
COUNTROWS(
FILTER( SalesProducts, Sales[CustomerKey] = Customer[CustomerKey] )
) >= 2
)
)
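The logic behind these variants can be checked with a short Python model. The sales pairs and customer keys are made up, and the two helper functions are hypothetical names for the MIN/MAX test and the distinct-count test.

```python
from collections import defaultdict

# Hypothetical Sales rows as (CustomerKey, ProductKey); customer 4 has no sales.
sales = [(1, 10), (1, 10), (2, 10), (2, 20), (3, 30), (3, 31), (3, 30)]
customers = [1, 2, 3, 4]

# Products bought per customer, roughly what RELATEDTABLE( Sales ) gives per row.
products_by_customer = defaultdict(set)
for customer_key, product_key in sales:
    products_by_customer[customer_key].add(product_key)


def count_by_min_max():
    """Customers whose largest and smallest ProductKey differ (at least two products)."""
    total = 0
    for c in customers:
        products = products_by_customer.get(c, set())
        if products and max(products) != min(products):
            total += 1
    return total


def count_by_distinct(threshold=2):
    """Customers with at least `threshold` distinct products."""
    return sum(1 for c in customers if len(products_by_customer.get(c, set())) >= threshold)


print(count_by_min_max(), count_by_distinct(), count_by_distinct(3))
```

The MIN/MAX comparison only answers "at least two", while the distinct-count form takes any threshold, which matches the remark about dealing with a different number of distinct products.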
