Freebase MQL Query - Sort by Relevance

Say I have the following MQL query:
[{
"id": null,
"name": null,
"type": "/base/givennames/given_name",
"sort": "name",
"limit": 100
}]
I get a list of the first 100 names sorted alphabetically:
[
{
"id": "/wikipedia/fr/Ay$015Fe",
"name": "A'isha",
"type": "/base/givennames/given_name"
},
{
"id": "/en/aadu",
"name": "Aadu",
"type": "/base/givennames/given_name"
},
{
"id": "/m/0g9wn3v",
"name": "Aage",
"type": "/base/givennames/given_name"
},
{
"id": "/en/aakarshan",
"name": "Aakarshan",
"type": "/base/givennames/given_name"
},
...
]
Is there a way to get the 100 most relevant / common / important names instead?
I want to do this for a number of queries, not just given names - so I am not exactly sure how to define the relevancy metric. Perhaps by sorting by the number of inbound links to the id with a subquery?
The search API returns a score element, but I believe it's a relevancy metric related to the search query term (null in this case). I just started with MQL yesterday and I have no idea if this is possible.

You can do this with the Search API like this:
https://www.googleapis.com/freebase/v1/search?filter=(all+type:/base/givennames/given_name)&limit=100
This will give you a list of 100 given names. We don't give out the exact details of how they're ordered but the number of links is definitely a factor.
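As a rough illustration, the same request can be assembled in code for other types as well. This is only a sketch: the endpoint and the filter syntax are copied from the URL above, while the function name `build_search_url` is made up for the example. It only builds the URL and does not send a request.

```python
from urllib.parse import urlencode

# Endpoint copied from the answer above.
SEARCH_ENDPOINT = "https://www.googleapis.com/freebase/v1/search"


def build_search_url(type_id: str, limit: int = 100) -> str:
    """Build a Search API URL that filters on a single type.

    No search term is passed, so the ordering is left to the service.
    (build_search_url is a hypothetical helper, not part of any library.)
    """
    params = {
        "filter": f"(all type:{type_id})",
        "limit": limit,
    }
    # urlencode percent-encodes the parentheses, colon and slashes and turns
    # the space into '+', which is equivalent to the hand-written URL.
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


url = build_search_url("/base/givennames/given_name")
print(url)
```

Swapping in another type id gives the same kind of list for other queries.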


Elasticsearch Rank based on rarity of a field value

I'd like to know how can I rank lower items, which have fields that are frequently appearing among the results.
Say, we have a similar result set:
"name": "Red T-Shirt"
"store": "Zara"
"name": "Yellow T-Shirt"
"store": "Zara"
"name": "Red T-Shirt"
"store": "Bershka"
"name": "Green T-Shirt"
"store": "Benetton"
I'd like to rank the documents in such a manner that the documents containing frequently found fields,
"store" in this case, are deboosted to appear lower in the results.
This is to achieve a bit of variety, so that the search doesn't yield top results from the same store.
In the example above, if I search for "T-Shirt", I want to see one Zara T-Shirt at the top and the rest
of Zara T-Shirts should be appearing lower, after all other unique stores.
So far I have looked into using aggregation buckets for sorting, and into script sorting, but without success.
Is it possible to achieve this inside of the search engine?
Many thanks in advance!
This is possible with a combination of the diversified sampler aggregation and the top hits aggregation, as I learned from the Elastic forum. I don't know what the performance implications are on a high-load production system. Here is a code example; use it at your own risk:
{
"query": {}, // whatever query
"size": 0, // since we don't use hits
"aggs": {
"my_unbiased_sample": {
"diversified_sampler": {
"shard_size": 100,
"field": "store"
},
"aggs": {
"keywords": {
"top_hits": {
"_source": {
"includes": [ "name", "store" ]
},
"size": 100
}
}
}
}
}
}
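If the result set is small, the ordering asked for in the question can also be produced in application code after the search returns. To be clear, this is a different technique from the aggregation above: a plain client-side reordering, sketched under the assumption that each hit is a dict with a "store" key. The function name `diversify` is made up for the example.

```python
from collections import defaultdict


def diversify(hits, field="store"):
    """Reorder hits so repeated values of `field` sink below first occurrences.

    All first hits per store come first (in their original rank order),
    then all second hits per store, and so on.
    """
    seen = defaultdict(int)  # how many hits were already seen per store
    keyed = []
    for rank, hit in enumerate(hits):
        occurrence = seen[hit[field]]
        seen[hit[field]] += 1
        keyed.append((occurrence, rank, hit))
    # Sort on (occurrence, rank) only, so the dicts are never compared.
    keyed.sort(key=lambda item: (item[0], item[1]))
    return [hit for _, _, hit in keyed]


# The example result set from the question, in its original order.
hits = [
    {"name": "Red T-Shirt", "store": "Zara"},
    {"name": "Yellow T-Shirt", "store": "Zara"},
    {"name": "Red T-Shirt", "store": "Bershka"},
    {"name": "Green T-Shirt", "store": "Benetton"},
]
reordered = diversify(hits)
for hit in reordered:
    print(hit["store"], "-", hit["name"])
```

With the four example documents, the second Zara shirt moves to the end, after the Bershka and Benetton ones.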

Sorting a set of results with pre-ordered items

I have a list of pre-ordered items (order by score ASC) like:
[{
"id": "id2",
"score": 1
}, {
"id": "id12",
"score": 1
}, {
"id": "id8",
"score": 1.4
}, {
"id": "id9",
"score": 1.4
}, {
"id": "id14",
"score": 1.75
}, {
...
}]
Let's say I have an Elasticsearch index with a massive number of items. Note that there's no "score" field in the indexed documents.
Now I want elasticsearch to return only those items with ids in the said list. Ok, this one is easy. I'm now stuck at sorting the result. That means I need the result to be sorted exactly as my pre-ordered list above.
Any suggestion for me to achieve that?
I'm not an English native speaker, so sorry for my grammar and words.
As of version 7.4, Elasticsearch provides a pinned query, which promotes selected documents to rank higher than those matching a given query. The pinned documents come back in the order in which their ids are listed, so in your case this search query should return what you want:
GET /_search
{
"query": {
"pinned" : {
"ids" : ["id2", "id12", "id8"],
"organic" : {
other queries
}
}
}
}
For more information, see the pinned query page in the official Elasticsearch documentation.
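On versions without the pinned query, one fallback is to fetch the documents by id and reorder them in application code. The sketch below assumes the hits are dicts carrying an "_id" key, as in a normal search response, and that the pre-ordered list looks like the one in the question. The function name `order_like` is made up for the example.

```python
def order_like(preordered, hits):
    """Return hits sorted to follow the order of the pre-ordered list.

    Hits whose _id is not in the pre-ordered list are dropped.
    """
    # Map each id to its position in the pre-ordered list.
    position = {item["id"]: index for index, item in enumerate(preordered)}
    known = [hit for hit in hits if hit["_id"] in position]
    return sorted(known, key=lambda hit: position[hit["_id"]])


# Pre-ordered list taken from the question.
preordered = [
    {"id": "id2", "score": 1},
    {"id": "id12", "score": 1},
    {"id": "id8", "score": 1.4},
    {"id": "id9", "score": 1.4},
    {"id": "id14", "score": 1.75},
]

# Hypothetical search hits, arriving in arbitrary order.
hits = [{"_id": "id9"}, {"_id": "id14"}, {"_id": "id2"}, {"_id": "id8"}, {"_id": "id12"}]

ordered = order_like(preordered, hits)
print([hit["_id"] for hit in ordered])
```

Because the position is looked up in a dict, the reordering stays cheap even when the pre-ordered list is long.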

Term aggregation consider only the prefix to aggregate

In my Elasticsearch documents I have users and some sort of representation of their place in the organization, for instance:
The CEO is position 1
The ones directly under the CEO will be 1/1, 1/2, 1/3, and so on
The ones under 1/1 will be 1/1/1, 1/1/2, 1/1/3, etc.
I have an aggregation in which I want to aggregate by VP, so I want everybody under 1/1, 1/2, 1/3.
To do that I created a query like this one:
"aggs": {
"information": {
"terms": {
"field": "position",
"script": "_value.replaceAll('(1/1/[0/]*[1-9]).+', '$1')"
}
}
}
This rewrites each position to the prefix captured by the regex group, so everyone in the same group ends up with the same position value and can be aggregated together. However, it performs poorly.
I was thinking about using something like this
"aggs": {
"information": {
"terms": {
"field": "position",
"prefix": "1/1/.*"
}
}
}
So I would group everyone whose position starts with 1/1 (1/1/1/1, 1/1/1/2, 1/1/1/3 would be one group, 1/1/2/1, 1/1/2/2, 1/1/2/3 would be a second group and so on).
Is it possible?
If you know beforehand at which depth you want to run this aggregation, you could simply store the levels in separate fields:
{
"name": "Jack",
"own_level": 4,
"level_1": "1",
"level_2": "3",
"level_3": "2",
"level_4": null
}
But this would require many nested terms aggregations to reproduce the hierarchy. This version would make one such aggregation sufficient:
{
"name": "Jack",
"own_level": 4,
"level_1": "1",
"level_2": "1/3",
"level_3": "1/3/2",
"level_4": null
}
It also makes the query filter simpler: to focus on the people under 1/1, for example, put a filter on the field level_2 and run a terms aggregation on the field level_3.
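These fields can be derived from the position string at index time. The following is a minimal sketch, assuming positions are slash-separated as in the question and that only ancestors are stored (which is how I read the null level_4 in the example above). The function name `level_fields` and the position "1/3/2/7" are made up for illustration.

```python
def level_fields(position: str, max_depth: int) -> dict:
    """Derive own_level and cumulative level_N fields from a position string.

    level_N holds the ancestor prefix at depth N. The person's own depth
    and anything deeper is stored as None (an assumption, see above).
    """
    parts = position.split("/")
    doc = {"own_level": len(parts)}
    for depth in range(1, max_depth + 1):
        if depth < len(parts):
            doc[f"level_{depth}"] = "/".join(parts[:depth])
        else:
            doc[f"level_{depth}"] = None
    return doc


# Hypothetical position for Jack, four levels deep.
jack = {"name": "Jack", **level_fields("1/3/2/7", max_depth=4)}
print(jack)
```

The resulting dict has the same shape as the second document above and can be merged into the source before indexing.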
If you don't know the maximum level of the hierarchy you can use nested documents like this, but then queries and aggregations get a bit more complex:
{
"name": "Jack",
"own_level": 4,
"bosses": [
{
"level": 1,
"id": "1"
},
{
"level": 2,
"id": "1/3"
},
{
"level": 3,
"id": "1/3/2"
}
]
}

Elasticsearch how to return the number of matching terms for each document

I'm very new to Elasticsearch and I'd like to know how to retrieve the number of matching terms in each document processed.
I know that I can get a score, but I'm looking for the number of matches in each document. Is it possible?
Edit after mguillermin's answer
What I was looking for is a way to query my index and receive, at the same time, the tf for each document in the results, not simply the term frequency of a single document.
Thanks
For checking a single document, you can retrieve this kind of information using the explain API: http://www.elasticsearch.org/guide/reference/api/explain/
If you need this information collected along with the query results, you can add "explain": true to the body sent to _search. For example:
{
"explain": true,
"query": {
"term": {
"description": "test"
}
}
}
With this parameter, each hit will carry its associated _explanation data. For example:
"_explanation": {
"value": 1.4845161,
"description": "fieldWeight(description:test in 63), product of:",
"details": [
{
"value": 1,
"description": "tf(termFreq(description:test)=1)"
},
{
"value": 5.9380646,
"description": "idf(docFreq=23, maxDocs=3348)"
},
{
"value": 0.25,
"description": "fieldNorm(field=description, doc=63)"
}
]
}
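To pull the term frequency out of each hit, the _explanation tree can be walked in application code. This is only a sketch: it assumes the description strings have the "tf(termFreq(field:term)=N)" form shown above, which may differ between versions. The names `TERM_FREQ` and `term_frequencies` are made up for the example.

```python
import re

# Matches descriptions such as "tf(termFreq(description:test)=1)".
TERM_FREQ = re.compile(r"termFreq\(([^)]*)\)=([0-9.]+)")


def term_frequencies(explanation):
    """Collect {"field:term": frequency} from an _explanation tree."""
    found = {}
    stack = [explanation]
    while stack:
        node = stack.pop()
        match = TERM_FREQ.search(node.get("description", ""))
        if match:
            found[match.group(1)] = float(match.group(2))
        # Descend into child explanations, if there are any.
        stack.extend(node.get("details") or [])
    return found


# The _explanation from the example above.
explanation = {
    "value": 1.4845161,
    "description": "fieldWeight(description:test in 63), product of:",
    "details": [
        {"value": 1, "description": "tf(termFreq(description:test)=1)"},
        {"value": 5.9380646, "description": "idf(docFreq=23, maxDocs=3348)"},
        {"value": 0.25, "description": "fieldNorm(field=description, doc=63)"},
    ],
}

frequencies = term_frequencies(explanation)
print(frequencies)
```

Running this over the _explanation of every hit gives the per-document term frequency alongside the normal results.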

Is it possible to sort nested documents in ElasticSearch?

Let's say I have the following mapping:
"site": {
"properties": {
"title": { "type": "string" },
"description": { "type": "string" },
"category": { "type": "string" },
"tags": { "type": "array" },
"point": { "type": "geo_point" },
"localities": {
"type": "nested",
"properties": {
"title": { "type": "string" },
"description": { "type": "string" },
"point": { "type": "geo_point" }
}
}
}
}
I'm then doing an "_geo_distance" sort on the parent document and am able to sort the documents on "site.point". However I would also like the nested localities to be sorted by "_geo_distance", inside the parent document.
Is this possible? If so, how?
Unfortunately, no (at least not yet).
A query in ElasticSearch just identifies which documents match the query, and how well they match.
To understand what nested documents are useful for, consider this example:
{
"title": "My post",
"body": "Text in my body...",
"followers": [
{
"name": "Joe",
"status": "active"
},
{
"name": "Mary",
"status": "pending"
}
]
}
The above JSON, once indexed in ES, is functionally equivalent to the following. Note how the followers field has been flattened:
{
"title": "My post",
"body": "Text in my body...",
"followers.name": ["Joe","Mary"],
"followers.status": ["active","pending"]
}
A search for: followers with status == active and name == Mary would match this document... incorrectly.
Nested fields allow us to work around this limitation. If the followers field is declared to be of type nested instead of type object then its contents are created as a separate (invisible) sub-document internally. That means that we can use a nested query or nested filter to query these nested documents as individual docs.
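The difference can be reproduced outside of ES with a few lines of code. This is just a sketch of the idea, not how ES is implemented internally, and the helper names `flatten`, `matches_flattened` and `matches_nested` are made up for the example.

```python
def flatten(doc, field):
    """Flatten a list of objects into one value list per sub-field."""
    flat = {key: value for key, value in doc.items() if key != field}
    for item in doc[field]:
        for key, value in item.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat


def matches_flattened(flat, field, **criteria):
    """Each criterion is checked against its own value list, independently."""
    return all(
        value in flat.get(f"{field}.{key}", [])
        for key, value in criteria.items()
    )


def matches_nested(doc, field, **criteria):
    """All criteria must hold within the same sub-document."""
    return any(
        all(item.get(key) == value for key, value in criteria.items())
        for item in doc[field]
    )


post = {
    "title": "My post",
    "body": "Text in my body...",
    "followers": [
        {"name": "Joe", "status": "active"},
        {"name": "Mary", "status": "pending"},
    ],
}

flat = flatten(post, "followers")
print(matches_flattened(flat, "followers", name="Mary", status="active"))
print(matches_nested(post, "followers", name="Mary", status="active"))
```

The flattened check matches because "Mary" and "active" each appear somewhere in their lists, while the per-sub-document check does not, since no single follower has both values.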
However, the output from the nested query/filter clauses only tells us if the main doc matches, and how well it matches. It doesn't even tell us which of the nested docs matched. To figure that out, we'd have to write code in our application to check each of the nested docs against our search criteria.
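For the sorting in the question, that application-side code could look roughly like the sketch below. It assumes each point is stored as an object with "lat" and "lon" keys and uses the haversine formula with a mean Earth radius, which will not necessarily give the same numbers as ES's own distance calculation. The names `haversine_km` and `sort_localities` and the sample coordinates are made up for the example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(a, b):
    """Great-circle distance in km between two {"lat": .., "lon": ..} points."""
    lat1, lon1, lat2, lon2 = map(radians, (a["lat"], a["lon"], b["lat"], b["lon"]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def sort_localities(site, origin):
    """Return the site's nested localities ordered by distance from origin."""
    return sorted(
        site["localities"],
        key=lambda locality: haversine_km(origin, locality["point"]),
    )


# Hypothetical site document, shaped like the mapping in the question.
site = {
    "title": "Some site",
    "point": {"lat": 0.0, "lon": 0.0},
    "localities": [
        {"title": "far", "point": {"lat": 0.0, "lon": 3.0}},
        {"title": "near", "point": {"lat": 0.0, "lon": 1.0}},
        {"title": "mid", "point": {"lat": 0.0, "lon": 2.0}},
    ],
}

ordered = sort_localities(site, origin={"lat": 0.0, "lon": 0.0})
print([locality["title"] for locality in ordered])
```

This runs once per returned hit, so it is only practical when each parent document has a modest number of localities.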
There are a few open issues requesting the addition of these features, but it is not an easy problem to solve.
The only way to achieve what you want is to index your sub-docs as separate documents, and to query and sort them independently. It may be useful to establish a parent-child relationship between the main doc and these separate sub-docs (see the parent-type mapping, the Parent & Child section of the index API docs, and the top-children and has-child queries).
Also, an ES user has mailed the list about a new has_parent filter that they are currently working on in a fork. However, this is not available in the main ES repo yet.
