I'm fairly new to elasticsearch (though with a fair bit of SQL experience) and am currently struggling with putting a proper query together. I have 2 boolean fields isPlayer and isEvil that an entry is either true or false on. Based on that, I want to split my dataset into 4 groups:
isPlayer: true, isEvil: true
isPlayer: true, isEvil: false
isPlayer: false, isEvil: true
isPlayer: false, isEvil: false
I want to sort each group randomly within itself, then concatenate the groups into one long list that I can paginate. I'd like to do that inside the query, as that seems like the "correct" way to do this, since I'd do it similarly in SQL. In that list, the groups are to be sorted in order: first all entries of Group 1 in a random order, then all entries of Group 2 in a random order, then all entries of Group 3, and so on. The randomness of the sorting must be reproducible given the same inputs, so if the sorting is based on random_score, ideally I'd be using a seed for the randomness.
I can build a single query, but how do I combine 4?
The approaches I've found so far are MultiSearch and the Disjunction Max Query. MultiSearch doesn't seem to support pagination. With the Disjunction Max Query I might be missing the forest for the trees, but I'm struggling to have the subqueries sorted randomly only within themselves before appending them to one another.
Here is how I write a single query for now, without the Disjunction Max Query, in case it helps:
{
"query": {
"bool": {
"should": [
{
"term": {
"isPlayer": true
}
},
{
"term": {
"isEvil": true
}
}
]
}
}
}
The solution to this problem is not to build 4 separate groups, but to ensure each group gets a different range of scores and then sort by score. This can be achieved by scoring the hits not by matching criteria but through a script_score query, which lets you write code that returns a custom score. (The default scripting language is called "painless", but I've seen examples in groovy as well.)
The logic is fairly simple:
If isPlayer = true, add 2 points to the score
If isEvil = true, add 4 points to the score
Either way, add a random number between 0 and 1 to the score at the end
This creates the 4 groups I wanted with distinct score-ranges:
isPlayer = true, isEvil = true --> Score-range: 6-7
isPlayer = false, isEvil = true --> Score-range: 4-5
isPlayer = true, isEvil = false --> Score-range: 2-3
isPlayer = false, isEvil = false --> Score-range: 0-1
The query would look like this:
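{
  "query": {
    "script_score": {
      "query": {
        "match_all": {}
      },
      "script": {
        "source": """
          double score = 0;
          // doc['field'] returns a doc-values holder; .value reads the boolean
          if (doc['isPlayer'].value) {
            score += 2;
          }
          if (doc['isEvil'].value) {
            score += 4;
          }
          // same seed + same field value gives the same random number
          int partialSeed = 1;
          score += randomScore(partialSeed, 'id');
          return score;
        """
      }
    }
  }
}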
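To see why the score bands keep the groups apart, here is a minimal Python sketch of the same scoring logic run over in-memory documents. It is only an illustration: `seeded_unit_random` is a hypothetical stand-in for `randomScore` (it will not reproduce Elasticsearch's actual values), and the sample documents are made up.

```python
import hashlib


def seeded_unit_random(seed: int, doc_id: str) -> float:
    """Deterministic pseudo-random value in [0, 1) derived from seed and id.

    Hypothetical stand-in for randomScore; not Elasticsearch's algorithm.
    """
    digest = hashlib.sha256(f"{seed}:{doc_id}".encode()).digest()
    # 48 bits keep the quotient exactly representable and strictly below 1.0
    return int.from_bytes(digest[:6], "big") / 2**48


def banded_score(doc: dict, seed: int) -> float:
    """Band offset (0, 2, 4 or 6) plus a seeded random part in [0, 1)."""
    score = 0.0
    if doc["isPlayer"]:
        score += 2
    if doc["isEvil"]:
        score += 4
    return score + seeded_unit_random(seed, doc["id"])


def ranked(docs: list, seed: int) -> list:
    """Sort by score, highest first, like the default _score ordering."""
    return sorted(docs, key=lambda d: banded_score(d, seed), reverse=True)


# Made-up sample data covering all four flag combinations
docs = [
    {"id": str(i), "isPlayer": bool(i & 1), "isEvil": bool(i & 2)}
    for i in range(12)
]
page = ranked(docs, seed=1)
```

Because the random part never reaches 1 and the bands are 2 apart, documents from different groups can never interleave, and reusing the same seed reproduces the same order, which is what makes from/size pagination stable.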
I am building an integration between Shopify and our ERP via the admin API using GraphQL. All is working well except when I try and get the exact prices for an order.
According to the documentation, discountedTotalSet is 'The total line price after discounts are applied', but I am finding it returns the full price (see the examples below).
Can anyone give me guidance on how to get the API to show the same prices that are on the order? I need this to match exactly line by line. This is the query I am using for the order:
{
node(id: "gid://shopify/Order/4866288156908") {
id
...on Order {
name
lineItems (first: 50) {
edges {
node {
id
quantity
sku
discountedTotalSet {
shopMoney {
currencyCode
amount
}
}
}
}
}
}
}
}
And this is the result. Note that amount says 599.0, which is not correct; see the screenshot of the same order from the UI.
{
"data": {
"node": {
"id": "gid://shopify/Order/4866288156908",
"name": "AK-1003",
"lineItems": {
"edges": [
{
"node": {
"id": "gid://shopify/LineItem/12356850286828",
"quantity": 1,
"sku": "AK-A1081",
"discountedTotalSet": {
"shopMoney": {
"currencyCode": "AUD",
"amount": "599.0"
}
}
}
}
]
}
}
}
}
[Shopify UI screenshot]
discountedTotalSet gives you the amount after the discounts applied to that particular line. In your example you're applying a discount to the whole order. There is no field in the lineItem object that will give you the expected value for that line.
So you have to distribute the whole discount across the individual lines.
I had the exact same problem and had to implement this solution in Python. I hope it helps:
from decimal import Decimal


def split_discounts(money, n):
    # Split money into n amounts that differ by at most one cent.
    quotient = ((money * 100) // n) / Decimal(100)
    remainder = int(money * 100 % n)
    q1 = quotient + Decimal("0.01")  # built from a string to avoid float error
    return [q1] * remainder + [quotient] * (n - remainder)  # list of per-line discounts


def retrieve_shop_money(obj):
    # Retrieve the inner shopMoney amount, defaulting to 0 when it is missing.
    if obj and obj['shopMoney'] and obj['shopMoney']['amount']:
        return Decimal(obj['shopMoney']['amount'])
    return Decimal(0)


def get_line_price(order_node):
    discount = retrieve_shop_money(order_node["cartDiscountAmountSet"])
    edges = order_node["lineItems"]["edges"]
    non_free_lines = len([1 for item in edges if
                          retrieve_shop_money(item["node"]["discountedTotalSet"]) > 0])
    if non_free_lines > 0:
        discounts = split_discounts(discount, non_free_lines)
    else:
        discounts = []  # this was an edge case for me, that you might not consider
    discounted = 0
    net_prices = []
    for item in edges:
        gross = retrieve_shop_money(item["node"]["originalTotalSet"])  # THIS IS THE VALUE WITHOUT DISCOUNTS
        net = retrieve_shop_money(item["node"]["discountedTotalSet"])
        if net > 0:  # excluding free gifts
            net = net - discounts[discounted]  # THIS IS THE VALUE YOU'RE LOOKING FOR
            discounted = discounted + 1
        net_prices.append(net)
    return net_prices  # one net price per line, in line order
So first I check whether the whole order was free. This was an edge case that was giving me some issues. In that case I just know that 0 is the answer I want.
Otherwise, with the split_discounts method I calculate the discount to apply to each line. The discounts can differ: if you split a $1 discount over 3 items you get [0.34, 0.33, 0.33]. So the result is an array. Then I just loop through the lines and apply the discount if discountedTotalSet is > 0.
Thinking about it, you might also want to make sure that the discount is not greater than the value of the line. That is an edge case I never encountered, but it depends on the kind of discounts you have.
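For the last edge case, one possible guard is to cap each share at the line's value and report whatever could not be applied. This is only a sketch under assumptions: `split_evenly` and `apply_capped` are hypothetical helper names, amounts are assumed to have at most two decimals, and how to redistribute the leftover is left open.

```python
from decimal import Decimal


def split_evenly(amount: Decimal, n: int) -> list:
    """Split amount into n parts that differ by at most one cent."""
    cents = int(amount * 100)  # assumes at most two decimal places
    base, extra = divmod(cents, n)
    return [Decimal(base + (1 if i < extra else 0)) / 100 for i in range(n)]


def apply_capped(line_totals: list, order_discount: Decimal):
    """Subtract an even share of the order discount from each non-free line.

    A line never drops below zero; the part of the discount that did not
    fit is returned as leftover so the caller can decide what to do with it.
    """
    paying = [i for i, total in enumerate(line_totals) if total > 0]
    nets = list(line_totals)
    if not paying:
        return nets, order_discount
    leftover = Decimal(0)
    for i, share in zip(paying, split_evenly(order_discount, len(paying))):
        applied = min(share, nets[i])
        nets[i] -= applied
        leftover += share - applied
    return nets, leftover


# Made-up order: one normal line, one very cheap line, one free gift
lines = [Decimal("10.00"), Decimal("0.20"), Decimal("0.00")]
nets, leftover = apply_capped(lines, Decimal("1.00"))
```

Here the cheap line can only absorb 0.20 of its 0.50 share, so 0.30 comes back as leftover instead of producing a negative line price.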
Title: How do I dynamically name a collection?
Pseudo-code: collect(n) AS :Label
The primary purpose of this is for easy reading of the properties in the API Server (node application).
Verbose example:
MATCH (user:User)--(n)
WHERE n:Movie OR n:Actor
RETURN user,
CASE
WHEN n:Movie THEN "movies"
WHEN n:Actor THEN "actors"
END as type, collect(n) as :type
Expected output in JSON:
[{
"user": {
....
},
"movies": [
{
"_id": 1987,
"labels": [
"Movie"
],
"properties": {
....
}
}
],
"actors": [ .... ]
}]
The closest I've gotten is:
[{
"user": {
....
},
"type": "movies",
"collect(n)": [
{
"_id": 1987,
"labels": [
"Movie"
],
"properties": {
....
}
}
]
}]
The goal is to be able to read the JSON result with ease like so:
neo4j.cypher.query(statement, function(err, results) {
for (var result of results) {
var user = result.user
var movies = result.movies
}
})
Edit:
I apologize for any confusion in my inability to correctly name database semantics.
I'm wondering if it's enough just to output the user and their lists of both actors and movies, rather than trying to do a more complicated means of matching and combining both.
MATCH (user:User)
OPTIONAL MATCH (user)--(m:Movie)
OPTIONAL MATCH (user)--(a:Actor)
RETURN user, COLLECT(DISTINCT m) as movies, COLLECT(DISTINCT a) as actors
This query should return each User and his/her related movies and actors (in separate collections):
MATCH (user:User)--(n)
WHERE n:Movie OR n:Actor
RETURN user,
REDUCE(s = {movies:[], actors:[]}, x IN COLLECT(n) |
CASE WHEN x:Movie
THEN {movies: s.movies + x, actors: s.actors}
ELSE {movies: s.movies, actors: s.actors + x}
END) AS types;
As for a dynamic solution to your question, one that works with any node connected to your user, there are a few options. I don't believe you can make the column names dynamic like this, or even the names of the returned collections, but we can associate each collection with its type.
MATCH (user:User)--(n)
WITH user, LABELS(n) as type, COLLECT(n) as nodes
WITH user, {type:type, nodes:nodes} as connectedNodes
RETURN user, COLLECT(connectedNodes) as connectedNodes
Or, if you prefer working with multiple rows, one row each per node type:
MATCH (user:User)--(n)
WITH user, LABELS(n) as type, COLLECT(n) as collection
RETURN user, {type:type, data:collection} as connectedNodes
Note that LABELS(n) returns a list of labels, since nodes can be multi-labeled. If you are guaranteed that every node of interest has exactly one label, you can use the first element of the list rather than the list itself: just use LABELS(n)[0] instead.
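Since the column names cannot be made dynamic in the query, the renaming can happen on the client instead. The following Python sketch shows the idea on rows shaped like the output of the one-row-per-type query above. The row shape, the `_id` key on the user, the `reshape_rows` name and the lowercase-plus-"s" naming rule are all assumptions made for illustration.

```python
def reshape_rows(rows: list) -> list:
    """Merge one-row-per-label results into one dict per user.

    Each row is assumed to look like
    {"user": {...}, "connectedNodes": {"type": [labels], "data": [nodes]}}.
    """
    by_user = {}
    for row in rows:
        user = row["user"]
        entry = by_user.setdefault(user["_id"], {"user": user})
        connected = row["connectedNodes"]
        for label in connected["type"]:
            # Hypothetical naming rule: "Movie" -> "movies"
            key = label.lower() + "s"
            entry.setdefault(key, []).extend(connected["data"])
    return list(by_user.values())


# Made-up rows for a single user
rows = [
    {"user": {"_id": 1}, "connectedNodes": {"type": ["Movie"], "data": [{"_id": 1987}]}},
    {"user": {"_id": 1}, "connectedNodes": {"type": ["Actor"], "data": [{"_id": 7}]}},
]
result = reshape_rows(rows)
```

After this step each entry can be read as `entry["user"]`, `entry["movies"]` and `entry["actors"]`, which is the shape the question asks for.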
You can dynamically sort nodes by label, and then convert to the map using the apoc library:
WITH ['Actor','Movie'] as LBS
// What are the nodes we need:
MATCH (U:User)--(N) WHERE size(filter(l in labels(N) WHERE l in LBS))>0
WITH U, LBS, N, labels(N) as nls
UNWIND nls as nl
// Combine the nodes on their labels:
WITH U, LBS, N, nl WHERE nl in LBS
WITH U, nl, collect(N) as RELS
WITH U, collect( [nl, RELS] ) as pairs
// Convert pairs "label - values" to the map:
CALL apoc.map.fromPairs(pairs) YIELD value
RETURN U as user, value
We have a lot of documents in each index (~10 000 000). But each document is very small and contains almost only integer values.
We need to SUM every numerical field.
First step - We ask for all available fields with a mapping.
Example :
GET INDEX/TYPE/_mapping
Second step - We build the request with the fields from the mapping.
Example :
GET INDEX/TYPE/_search
{
// SOME FILTERS TO REDUCE THE NUMBER OF DOCUMENTS
"size":0,
"aggs":{
"FIELD 1":{
"sum":{
"field":"FIELD 1"
}
},
"FIELD 2":{
"sum":{
"field":"FIELD 2"
}
},
// ...
"FIELD N":{
"sum":{
"field":"FIELD N"
}
}
}
}
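The second step can be sketched as a small Python helper that turns a list of field names into the request body shown above. The function name `build_sum_request` and the optional `query` argument are assumptions; extracting the field names from the mapping response is not shown.

```python
def build_sum_request(fields: list, query: dict = None) -> dict:
    """Build a size-0 search body with one sum aggregation per field."""
    body = {
        "size": 0,
        "aggs": {name: {"sum": {"field": name}} for name in fields},
    }
    if query is not None:
        # Optional filters to reduce the number of documents
        body["query"] = query
    return body


request_body = build_sum_request(["FIELD 1", "FIELD 2"])
```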
Our problem is that the execution time of the second request grows linearly with the number of fields N.
That's not acceptable, as these are only sums. So we tried to write our own aggregation with a scripted metric (groovy).
Example with only 2 fields:
// ...
"aggs": {
"test": {
"scripted_metric": {
"init_script": "_agg['t'] = []",
"map_script": "_agg.t.add(doc)",
"combine_script": "res = [:]; res['FIELD 1'] = 0; res['FIELD 2'] = 0; for (t in _agg.t) { res['FIELD 1'] += t['FIELD 1']; res['FIELD 2'] += t['FIELD 2']; }; return res",
"reduce_script": "res = [:]; res['FIELD 1'] = 0; res['FIELD 2'] = 0; for (t in _aggs) { res['FIELD 1'] += t['FIELD 1']; res['FIELD 2'] += t['FIELD 2']; }; return res"
}
}
}
// ...
But it appears that the more assignments we add to the script, the longer it takes to execute, so it doesn't solve our problem.
There are not many examples out there.
Do you have any ideas to improve this script's performance?
Or other ideas?
How could it calculate N sums in sub-linear time? Does any such system exist?
10 million documents isn't actually that many. How long are your queries taking, how many shards do you have, and is the CPU maxed at 100%? (I was gonna ask these in a comment but don't have 50 reputation yet.)
If you are interested in the total sum of all fields you could pre-calculate document-level sums when you are indexing the document and then at query time just take the sum of these values.
You could also try storing fields as doc_values and see if it helps. You would have less memory pressure and garbage collection, although docs mention a possible 10 - 25% performance hit.
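The pre-calculation idea could look roughly like this in Python: add a document-level total before indexing, then aggregate that single field at query time. The field name `total_sum` and the helper `with_total` are hypothetical, and the actual indexing call is left out.

```python
def with_total(doc: dict, total_field: str = "total_sum") -> dict:
    """Return a copy of doc with the sum of its numeric fields added."""
    numeric_values = [
        value
        for value in doc.values()
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    ]
    enriched = dict(doc)
    enriched[total_field] = sum(numeric_values)
    return enriched


# Query-time body: one sum aggregation instead of N
total_request = {
    "size": 0,
    "aggs": {"grand_total": {"sum": {"field": "total_sum"}}},
}

# Made-up document
enriched = with_total({"FIELD 1": 3, "FIELD 2": 4, "name": "x", "flag": True})
```

This only helps when the grand total over all fields is what you need; per-field sums still require one aggregation per field.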
Here, I am trying to get search results for multiple terms. Say fulltext="Lee jeans", then regexresult={"lee","jeans"}.
Code :
IProviderSearchContext searchContext = index.CreateSearchContext();
IQueryable<SearchItem> scQuery = searchContext.GetQueryable<SearchItem>();
var predicate = PredicateBuilder.True<SearchItem>();
//checking if the fulltext includes terms within " "
var regexResult = SearchRegexHelper.getSearchRegexResult(fulltext);
regexResult.Remove(" ");
foreach (string term in regexResult)
{
predicate = predicate.Or(p => p.TextContent.Contains(term));
}
scQuery = scQuery.Where(predicate);
IEnumerable<SearchHit<SearchItem>> results = scQuery.GetResults().Hits;
results=sortResult(results);
Sorting is based on sitecore fields:
switch (query.Sort)
{
case SearchQuerySort.Date:
results = results.OrderBy(x => GetValue(x.Document, FieldNames.StartDate));
break;
case SearchQuerySort.Alphabetically:
results = results.OrderBy(x => GetValue(x.Document, FieldNames.Profile));
break;
case SearchQuerySort.Default:
default:
results = results.OrderByDescending(x => GetValue(x.Document, FieldNames.Updated));
break;
}
Now, what I need is to get the results for "lee" first and sort them, then get the results for "jeans" and sort them. The final search result should be the concatenation of the sorted items for "lee" followed by the sorted items for "jeans".
So we would have to get the results for "lee" first and then the results for "jeans".
Is there a way to get results term by term?
You can use Query-Time Boosting to give the terms more relevance and therefore affect the ranking:
Sitecore 7: Six Types of Search Boosting
Lucene Boost With LINQ in Sitecore 7 ContentSearch
You want to give the first term the highest boost, and then gradually reduce for each additional term:
var regexResult = SearchRegexHelper.getSearchRegexResult(fulltext);
regexResult.Remove(" ");
float boost = regexResult.Count();
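foreach (string term in regexResult)
{
float termBoost = boost--; // decrement outside the lambda: expression trees cannot contain the -- operator
predicate = predicate.Or(p => p.TextContent.Contains(term).Boost(termBoost));
}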
EDIT:
Boosting and sorting in the same query is not possible; at least, the sorting will undo the "relevance"-based ordering that the boosting produced.
An alternative would be to search multiple times and concatenate the results into a single list. This is not as efficient, since you are essentially making multiple searches:
IProviderSearchContext searchContext = index.CreateSearchContext();
var items = new List<SearchResultItem>();
var regexResult = SearchRegexHelper.getSearchRegexResult(fulltext);
regexResult.Remove(" ");
foreach (string term in regexResult)
{
var results = searchContext.GetQueryable<SearchResultItem>()
.Where(p => p.Content.Contains(term));
SortSearchResults(results); //results passed in by reference, no need to return object to set it back to itself
items.AddRange(results);
}
NOTE: The above does not take into account duplicates between the result sets.
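One way to handle those duplicates is to remember which items have already been emitted. The Python sketch below shows the term-by-term logic with de-duplication on an in-memory list standing in for the search index; the item shape (`id`, `content`, `updated`) and the function name are assumptions made for illustration.

```python
def search_term_by_term(items: list, terms: list, sort_key) -> list:
    """Concatenate per-term result sets, each sorted, skipping repeats."""
    seen_ids = set()
    ordered = []
    for term in terms:
        # Stand-in for one search per term against the index
        hits = [item for item in items if term.lower() in item["content"].lower()]
        for item in sorted(hits, key=sort_key):
            if item["id"] not in seen_ids:
                seen_ids.add(item["id"])
                ordered.append(item)
    return ordered


# Made-up items
items = [
    {"id": 1, "content": "Lee jeans", "updated": 3},
    {"id": 2, "content": "Blue jeans", "updated": 5},
    {"id": 3, "content": "Lee jacket", "updated": 1},
]
results = search_term_by_term(items, ["lee", "jeans"], sort_key=lambda item: item["updated"])
```

An item that matches several terms stays in the block of the first term it matched, so the "lee" block still comes before the "jeans" block.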
How can I apply a natural sort to alphanumeric column names in the y-dimension?
For example:
consider column names AA1, AA2, AA3, AA10, AA11.
These are listed in order AA1, AA10, AA11, AA2, AA3 in pivot table y-dimension.
Desired order of columns is AA1, AA2, AA3, AA10, AA11
Free jqGrid 4.9 contains a fully rewritten version of jqPivot. I tried to keep compatibility with the previous version, but it contains many advanced features, which I tried to describe in the wiki.
Not many people use jqPivot, so let me recall what it does. It takes input data as its source and generates new data, which becomes the input data for jqGrid. Additionally, jqPivot generates colModel based on the input data and the yDimension parameter. While analyzing the input data, jqPivot sorts it by xDimension and by yDimension. The sort order of xDimension defines the order of rows in the resulting grid. The sort order of yDimension defines the order of columns in the resulting grid and the total number of resulting columns. The options compareVectorsByX and compareVectorsByY allow you to specify a callback function that is used for custom sorting by the whole x or y vector. It's important to understand that the sorting function not only specifies the order of columns, it also tells jqPivot which vectors should be interpreted as the same. For example, it can interpret the values 12, 12.0 and 012.00 as the same and specify that 12.0 is larger than 6.
I describe below some ways which can be used to customize sorting by xDimension and yDimension.
First of all, one can specify the skipSortByX: true or skipSortByY: true parameters. In that case the input data must already be sorted in the order you want. The next important options are the Boolean options caseSensitive (default value false) and trimByCollect (default value true). caseSensitive: true can be used to distinguish input data by case, and trimByCollect: false can be used to keep trailing spaces in the input data.
Some other important options can be specified in xDimension or yDimension: sorttype and sortorder. sortorder: "desc" can be used to reverse the order of the sorted data. The option sorttype can be "integer" (or "int"), which means the input data is truncated (Math.floor(Number(inputValue))) during sorting. The values "number", "currency" and "float" mean that the input data is converted to numbers during sorting (Number(inputValue)). Finally, one can omit sorttype and specify a compare callback function instead. The compare callback is a function with two parameters, and it should return the usual -1, 0 or 1 values.
For example, I created a demo for one issue. Someone asked me about the following situation: the web site has a login that identifies the user's country, and one wants to place the user's country first in the sort order. The demo uses the following yDimension parameter:
yDimension: [
{ dataName: "sellyear", sorttype: "integer" },
{ dataName: "sell month",
compare: function (a, b) {
if (a === "Germany") { return b !== "Germany" ? -1 : 0; }
if (b === "Germany") { return 1; }
if (a > b) { return 1; }
if (a < b) { return -1; }
return 0;
}}
]
It places "Germany" first in the sort order. The result looks like this: [screenshot of the resulting grid]
You can use the same approach with the natural-compare code from the answer to implement your requirement.
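The logic of such a natural compare can be sketched as follows. It is written in Python purely to illustrate the algorithm and would need to be ported to a JavaScript function before being used as the compare callback; the function names are hypothetical.

```python
import re
from functools import cmp_to_key


def natural_key(name: str) -> list:
    """Split "AA10" into ["aa", 10, ""] so digit runs compare as numbers."""
    parts = re.split(r"(\d+)", name)
    # With a capturing group, odd positions always hold the digit runs
    return [int(part) if index % 2 else part.lower() for index, part in enumerate(parts)]


def natural_compare(a: str, b: str) -> int:
    """Return -1, 0 or 1, the contract expected from a compare callback."""
    key_a, key_b = natural_key(a), natural_key(b)
    return (key_a > key_b) - (key_a < key_b)


columns = ["AA1", "AA10", "AA11", "AA2", "AA3"]
ordered = sorted(columns, key=cmp_to_key(natural_compare))
```

With this comparison, AA2 and AA3 sort before AA10 and AA11, because the numeric suffixes are compared as numbers rather than character by character.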
In more advanced cases one can use the options compareVectorsByX and compareVectorsByY. The requirement was to place a specific country first only in one specific year, keeping the standard order in all other cases. The corresponding demo uses compareVectorsByY to implement the requirement. It displays [screenshot of the resulting grid] and uses the following compareVectorsByY:
compareVectorsByY: function (vector1, vector2) {
var fieldLength = this.fieldLength, iField, compareResult;
if (fieldLength === 2) {
if (vector1[0] === "2011" && vector1[1] === "Germany") {
if (vector2[0] === "2011" && vector2[1] === "Germany") {
return {
index: -1,
result: 0
};
}
return {
index: vector2[0] === "2011" ? 1 : 0,
result: -1
};
}
// any other vector1 is larger than vector2 when vector2 is ("2011", "Germany")
if (vector2[0] === "2011" && vector2[1] === "Germany") {
return {
index: vector1[0] === "2011" ? 1 : 0,
result: 1
};
}
}
for (iField = 0; iField < fieldLength; iField++) {
compareResult = this.fieldCompare[iField](vector1[iField], vector2[iField]);
if (compareResult !== 0) {
return {
index: iField,
result: compareResult
};
}
}
return {
index: -1,
result: 0
};
}
It's important to mention that the compareVectorsByY callback should return an object with two properties: index and result. The value of result should be -1, 0 or 1. The value of index should be -1 when result is 0; otherwise it should be the 0-based index of the position where vector1 and vector2 differ.