Is it possible to sort nested documents in ElasticSearch? - elasticsearch

Lets say I have the following mapping:
"site": {
"properties": {
"title": { "type": "string" },
"description": { "type": "string" },
"category": { "type": "string" },
"tags": { "type": "array" },
"point": { "type": "geo_point" }
"localities": {
type: 'nested',
properties: {
"title": { "type": "string" },
"description": { "type": "string" },
"point": { "type": "geo_point" }
}
}
}
}
I'm then doing an "_geo_distance" sort on the parent document and am able to sort the documents on "site.point". However I would also like the nested localities to be sorted by "_geo_distance", inside the parent document.
Is this possible? If so, how?

Unfortunately, no (at least not yet).
A query in ElasticSearch just identifies which documents match the query, and how well they match.
To understand what nested documents are useful for, consider this example:
{
"title": "My post",
"body": "Text in my body...",
"followers": [
{
"name": "Joe",
"status": "active"
},
{
"name": "Mary",
"status": "pending"
},
]
}
The above JSON, once indexed in ES, is functionally equivalent to the following. Note how the followers field has been flattened:
{
"title": "My post",
"body": "Text in my body...",
"followers.name": ["Joe","Mary"],
"followers.status": ["active","pending"]
}
A search for: followers with status == active and name == Mary would match this document... incorrectly.
Nested fields allow us to work around this limitation. If the followers field is declared to be of type nested instead of type object then its contents are created as a separate (invisible) sub-document internally. That means that we can use a nested query or nested filter to query these nested documents as individual docs.
However, the output from the nested query/filter clauses only tells us if the main doc matches, and how well it matches. It doesn't even tell us which of the nested docs matched. To figure that out, we'd have to write code in our application to check each of the nested docs against our search criteria.
There are a few open issues requesting the addition of these features, but it is not an easy problem to solve.
The only way to achieve what you want is to index your sub-docs as separate documents, and to query and sort them independently. It may be useful to establish a parent-child relationship between the main doc and these separate sub-docs. (see parent-type mapping, the Parent & Child section of the index api docs, and the top-children and has-child queries.
Also, an ES user has mailed the list about a new has_parent filter that they are currently working on in a fork. However, this is not available in the main ES repo yet.

Related

Merging fields in Elastic Search

I am pretty new to Elastic Search. I have a dataset with multiple fields like name, product_info, description etc., So while searching a document, the search term can come from any of these fields (let us call them as "search core fields").
If I start storing the data in elastic search, should I derive a field which is a concatenated term of all the "search core fields" ? and then index this field alone ?
I came across _all mapping concept and little confused. Does it do the same ?
no, you don't need to create any new field with concatenated terms.
You can just use _all with match query to search a text from any field.
About _all, yes, it searches the text from any field
The _all field has been removed in ES 7, so it would only work in ES 6 and previous versions. The main reason for this is that it used too much storage space.
However, you can define your own all field using the copy_to feature. You basically specify in your mapping which fields should be copied to your custom all field and then you can search on that field.
You can define your mapping like this:
PUT my-index
{
"mappings": {
"properties": {
"name": {
"type": "text",
"copy_to": "custom_all"
},
"product_info": {
"type": "text",
"copy_to": "custom_all"
},
"description": {
"type": "text",
"copy_to": "custom_all"
},
"custom_all": {
"type": "text"
}
}
}
}
PUT my-index/_doc/1
{
"name": "XYZ",
"product_info": "ABC product",
"description": "this product does blablabla"
}
And then you can search on your "all" field like this:
POST my-index/_search
{
"query": {
"match": {
"custom_all": {
"query": "ABC",
"operator": "and"
}
}
}
}

how to index questions and answers in elaticsearch

I am doing a project to index questions and answers of a website in elasticsearch (version 6) for search purpose.
I have first thought of creating two indexes as shown below, one for questions and one for answers.
questions mapping:
{"mappings": {
"question": {
"properties": {
"title":{
"type":"text"
},
"question": {
"type": "text"
},
"questionId":{
"type":"keyword"
}
}
}
}
}
answers mapping:
{"mappings": {
"answer": {
"properties": {
"answer":{
"type":"text"
},
"answerId": {
"type": "keyword"
},
"questionId":{
"type":"keyword"
}
}
}
}
}
I have used multimatch query along with term and top_hits aggregation to search the indexed Q&As (referred question).I used this method to remove the duplicates from the search results. As answers or the question itself of the same question can appear in the result. I only want one entry per question in the results. the problem I am facing is to paginate the results. there is no possible way to paginate aggregation in elasticsearch. It can only paginate hits not aggregations.
then I thought of saving the both question and answers in one document, answers in a Json array. the problem with this approach is that there is no clean way to add, remove, update a specific answer in a given question document. only way I found was using a groovy script (referred question). which is deprecated in elasticsearch v6 AFAIK.
Is there a better and clean way to design this ?
Thanks.
Parent-Child Relationship
Use the parent-child relationship. It is similar to the nested model, and allows association of one entity with another. You can associate one document type with another, in a one-to-many relationship.
More information on here: https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child.html
Child documents can be added, changed, or deleted without affecting the parent nor other children. You can do pagination on the parent documents using the Scroll API.
Child documents can be retrieved using the has_parent join.
The trade-off: you do not have to take care of duplicates and pagination problems, but parent-child queries can be 5 to 10 times slower than the equivalent nested query.
Your mapping can be like the following:
PUT /my-index
{
"mappings": {
"question": {
"properties": {
"title": {
"type": "text"
},
"question": {
"type": "text"
},
"questionId": {
"type": "keyword"
}
}
},
"answer": {
"_parent": {
"type": "question"
},
"properties": {
"answer": {
"type": "text"
},
"answerId": {
"type": "keyword"
},
"questionId": {
"type": "keyword"
}
}
}
}
}

Elasticsearch: Duplicate properties in a single record

I have to find every document in Elasticsearch that has duplicate properties. My mapping looks something like this:
"type": {
"properties": {
"thisProperty": {
"properties" : {
"id":{
"type": "keyword"
},
"other_id":{
"type": "keyword"
}
}
}
The documents I have to find have a pattern like this:
"thisProperty": [
{
"other_id": "123",
"id": "456"
},
{
"other_id": "123",
"id": "456"
},
{
"other_id": "4545",
"id": "789"
}]
So, I need to find any document by type that has repeat property fields. Also I cannot search by term because I do not what the value of either Id field is. So far the API hasn't soon a clear way to do this via query and the programmatic approach is possible but cumbersome. Is it possible to get this result set in a elasticsearch query? If so, how?
(The version of Elasticsearch is 5.3)

Mapping in elasticsearch

Good morning, In my code I can't search data which contain separate words. If I search on one word all good. I think problem in mapping. I use postman. When I put in URL http://192.168.1.153:9200/sport_scouts/video/_mapping and use method GET I get:
{
"sport_scouts": {
"mappings": {
"video": {
"properties": {
"hashtag": {
"type": "string"
},
"id": {
"type": "long"
},
"sharing_link": {
"type": "string"
},
"source": {
"type": "string"
},
"title": {
"type": "string"
},
"type": {
"type": "string"
},
"user_id": {
"type": "long"
},
"video_preview": {
"type": "string"
}
}
}
}
}
}
All good title have type string but if I search on two or more words I get empty massive. My code in Trait:
public function search($data) {
$this->client();
$params['body']['query']['filtered']['filter']['or'][]['term']['title'] = $data;
$search = $this->client->search($params)['hits']['hits'];
dump($search);
}
Then I call it in my Controller. Can you help me with this problem?
The reason that your indexed data can't be found is caused by a mismatch of the analyzing during indexing and a strict term filter when querying the data.
With your mapping configuration, you are using the default analyzing which (besides many other operations) does a tokenizing. So every multi-word data you insert is split at punctuation or whitespaces. If you insert for example "some great sentence", elasticsearch maps the following terms to your document: "some", "great", "sentence", but not the term "great sentence". So if you do a term filter on "great sentence" or any other part of the original value containing a whitespace, you will not get any results.
Please see the elasticsearch docs on how to configure your mapping for indexing without analyzing (https://www.elastic.co/guide/en/elasticsearch/guide/current/mapping-intro.html#_index_2) or consider doing a match query instead of a term filter on the existing mapping (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html).
Please be aware that if you switch to not_analyzed you will be disabling many of the great fuzzy fulltext query functionality. Of course you can set up a mapping that does both, analyzed and not_analyzed in different fields. Then it's up on you to decide on which field you want to query on.

elasticsearch: sorting by values of matching nested document

I chose nested documents to realize a multilingual book search with common book data in root of the doc and edition data in nested docs. The mapping:
{
"book": {
"properties": {
"bookinfo": {
...
},
"editions": {
"type": "nested",
"properties": {
"editionid": {
"type": "long",
"store": "yes",
"index": "no"
},
"title_author": {
"type": "string",
"store": "no",
"index": "analyzed"
},
"title": {
"type": "string",
"store": "yes",
"index": "not_analyzed"
},
"languageid": {
"type": "short",
"store": "yes",
"index": "no"
},
"ratings": {
"type": "integer",
"store": "no"
}
}
}
}
}
}
Different editions of one book go in the nested doc - that can be different languages but also just different publishers, isbn and so on. Sometimes even the title differs from editions in the same language.
When searching the document (on the title_author field) I need to know the other nested doc information like languageid and ratings to boost the matching score according to the users language skills and relevance of the edition.
The reason why I don't put every edition in a separate document is that I only want to have one hit (the best matching one) per book. And ElasticSearch doesn't have a UNIQUE functionality. And I need pagination. So whenever I change a result set after querying with double books inside, pagination of ElasticSearch breaks.
Nested sorting functionality doesn't seem to help here, since it sorts over all nested documents of one book.
How do I access the information of the matching nested doc?
And if that is not achievable, how could I solve this by a multi search?
In order to access the nested documents fields you can use:
doc['editions. languageid'].value
And for the boosting part, try some of the examples here:
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html
Is that what you're looking for?

Resources