Search specific fields in nested documents as one document - elasticsearch

I have the following structure:
{
"mappings": {
"document": {
"properties": {
"title": {
"type": "string"
},
"paragraphs": {
"type": "nested",
"properties": {
"paragraph": {
"type" : "object",
"properties" : {
"content": { "type": "string"},
"number":{"type":"integer"}
}
}
}
}
}
}
}
}
With these sample documents
{
"title":"Dubai seeks cause of massive hotel fire at New Year",
"paragraphs":[
{"paragraph": {"number": "1", "content":"Firefighters managed to subdue the blaze, but part of the Address Downtown Hotel is still smouldering."}},
{"paragraph": {"number": "2", "content":"A BBC reporter says a significant fire is still visible on the 20th floor, where the blaze apparently started."}},
{"paragraph": {"number": "3", "content":"The tower was evacuated and 16 people were hurt. But a fireworks show went ahead at the Burj Khalifa tower nearby."}},
{"paragraph": {"number": "4", "content":"The Burj Khalifa is the world's tallest building and an iconic symbol of the United Arab Emirates (UAE)."}}]
}
{
"title":"Munich not under imminent IS threat",
"paragraphs":[{"paragraph": {"number": "1", "content":"German officials say there is no sign of any imminent terror attack, after an alert that shut down two Munich railway stations on New Year's Eve."}}]
}
I can now search each paragraph using
{
"query": {
"nested": {
"path": "paragraphs", "query": {
"query_string": {
"default_field": "paragraphs.paragraph.content",
"query": "Firefighters AND still"
}
}
}
}
}
Question: How can I write a query that searches across several paragraphs, but only in the content field?
This works, but searches all fields
{
"query": {
"query_string": {
"query": "Firefighters AND apparently AND 1"
}
}
}
It matches Firefighters from paragraph 1 and apparently from paragraph 2, which is what I want. However, I do not want 1 to match, since it is not in a content field.
Clarification: The first search works per paragraph, which I sometimes want. However, I also want to be able to search the whole document (all paragraphs) at other times.
Solution
I added "include_in_parent": true as it is mentioned in https://www.elastic.co/guide/en/elasticsearch/reference/1.7/mapping-nested-type.html
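For reference, here is a minimal sketch of what that change might look like, written as Python dicts so it can be sent with any client. The mapping is the one from the question with include_in_parent added to the nested field. The whole-document query (no nested wrapper, restricted to the content field) is an assumption about how the flag would then be used; it was not shown in the original post.

```python
import json

# Mapping from the question, with include_in_parent added to the nested field.
# With this flag the nested fields are also indexed as flat fields on the
# parent document, so a plain (non-nested) query can see all paragraphs at once.
mapping = {
    "mappings": {
        "document": {
            "properties": {
                "title": {"type": "string"},
                "paragraphs": {
                    "type": "nested",
                    "include_in_parent": True,
                    "properties": {
                        "paragraph": {
                            "type": "object",
                            "properties": {
                                "content": {"type": "string"},
                                "number": {"type": "integer"},
                            },
                        }
                    },
                },
            }
        }
    }
}

# Assumed whole-document query: no "nested" wrapper, restricted to content.
whole_document_query = {
    "query": {
        "query_string": {
            "default_field": "paragraphs.paragraph.content",
            "query": "Firefighters AND apparently",
        }
    }
}

print(json.dumps(whole_document_query, indent=2))
```

The per-paragraph nested query from the question should keep working unchanged, since the nested documents are still indexed as before.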

The way you are querying is wrong because nested documents are indexed separately. See the last paragraph of the linked documentation page.
Your query
{
"query": {
"nested": {
"path": "paragraphs",
"query": {
"query_string": {
"default_field": "paragraphs.paragraph.content",
"query": "Firefighters AND apparently"
}
}
}
}
}
looks for both words in the same paragraph, which is why you get no result. You need to query for them separately, like this:
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "paragraphs",
"query": {
"match": {
"paragraphs.paragraph.content": "firefighters"
}
}
}
},
{
"nested": {
"path": "paragraphs",
"query": {
"match": {
"paragraphs.paragraph.content": "apparently"
}
}
}
}
]
}
}
}
This will give you the right results.
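To see why the two forms behave differently, here is a small simulation in plain Python. It is only a sketch: the tokens function is a rough stand-in for the analyzer, and nested_all_terms and bool_must_of_nested are made-up names that mimic the matching logic, not Elasticsearch APIs.

```python
import re

def tokens(text):
    """Very rough stand-in for the standard analyzer: lowercase word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

# First two paragraphs of the sample document from the question.
paragraphs = [
    "Firefighters managed to subdue the blaze, but part of the Address "
    "Downtown Hotel is still smouldering.",
    "A BBC reporter says a significant fire is still visible on the 20th "
    "floor, where the blaze apparently started.",
]

def nested_all_terms(paragraphs, terms):
    """One nested query: every term must occur in the SAME paragraph."""
    return any(all(t in tokens(p) for t in terms) for p in paragraphs)

def bool_must_of_nested(paragraphs, terms):
    """bool/must with one nested query per term: each term may be
    satisfied by a DIFFERENT paragraph of the same document."""
    return all(any(t in tokens(p) for p in paragraphs) for t in terms)

print(nested_all_terms(paragraphs, ["firefighters", "still"]))          # True
print(nested_all_terms(paragraphs, ["firefighters", "apparently"]))     # False
print(bool_must_of_nested(paragraphs, ["firefighters", "apparently"]))  # True
```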
As a side note, I do not think you need the object datatype inside paragraphs. The following will work fine too:
"paragraphs": {
"type": "nested",
"properties": {
"content": {
"type": "string"
},
"number": {
"type": "integer"
}
}
}
Hope this helps!!

Related

Proximity-Relevance in elasticsearch

I have a JSON record in Elasticsearch with the fields
"streetName": "5 Street",
"name": ["Shivam Apartments"]
I tried the query below, but it returns nothing once I add the streetName bool clause to the query:
{
"query": {
"bool": {
"must": [
{
"bool": {
"must": {
"match": {
"name": {
"query": "shivam apartments",
"minimum_should_match": "80%"
}
}
}
}
},
{
"bool": {
"must": {
"match": {
"streetName": {
"query": "5 street",
"minimum_should_match": "80%"
}
}
}
}
}
]
}
}
}
Document Mapping
{
"rabc_documents": {
"mappings": {
"properties": {
"name": {
"type": "text",
"analyzer": "autocomplete_analyzer",
"position_increment_gap": 0
},
"streetName": {
"type": "keyword"
}
}
}
}
}
According to the Elasticsearch documentation on the keyword field type:
"Keyword fields are only searchable by their exact value".
Keyword fields are also case sensitive.
Taking that into account:
Searching for "5 street" will not match "5 Street" ('s' vs 'S') on a keyword field.
minimum_should_match has no effect on a keyword field.
Suggestion: For partial matches, use the "text" mapping instead of "keyword". Keyword fields are meant for filtering, term-based aggregations, and so on.
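The difference can be sketched in a few lines of Python. This is a simplification under stated assumptions: keyword_match and text_match are hypothetical helpers, the "analyzer" is just lowercasing plus whitespace splitting, and requiring at least one term after rounding down is my shortcut rather than documented behavior.

```python
def keyword_match(stored_value, query):
    """keyword field: the whole value is one term and must be identical."""
    return stored_value == query

def text_match(stored_value, query, min_should_match_pct=80):
    """text field (simplified): lowercase + whitespace tokens, then count hits."""
    stored_tokens = set(stored_value.lower().split())
    query_tokens = query.lower().split()
    # The percentage is rounded down; this sketch keeps at least one required term.
    required = max(1, len(query_tokens) * min_should_match_pct // 100)
    hits = sum(1 for t in query_tokens if t in stored_tokens)
    return hits >= required

print(keyword_match("5 Street", "5 street"))  # False
print(keyword_match("5 Street", "5 Street"))  # True
print(text_match("5 Street", "5 street"))     # True
```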

How to boost specific terms in elastic search?

If I have the following mapping:
PUT /book
{
"settings": {},
"mappings": {
"properties": {
"title": {
"type": "text"
},
"author": {
"type": "text"
}
}
}
}
How can I boost specific authors higher than others?
In case of the below example:
PUT /book/_doc/1
{
"title": "car parts",
"author": "john smith"
}
PUT /book/_doc/2
{
"title": "car",
"author": "bob bobby"
}
PUT /book/_doc/3
{
"title": "soap",
"author": "sam sammy"
}
PUT /book/_doc/4
{
"title": "car designs",
"author": "joe walker"
}
GET /book/_search
{
"query": {
"bool": {
"should": [
{ "match": { "title": "car" }},
{ "match": { "title": "parts" }}
]
}
}
}
How do I make my search return books by "joe walker" at the top of the search results?
One solution is to make use of function_score.
The Elasticsearch documentation describes it as follows: "The function_score allows you to modify the score of documents that are retrieved by a query."
Based on your mapping, try running this query, for example:
GET book/_search
{
"query": {
"function_score": {
"query": {
"bool": {
"should": [
{
"match": {
"title": "car"
}
},
{
"match": {
"title": "parts"
}
}
]
}
},
"functions": [
{
"filter": {
"match": {
"author": "joe walker"
}
},
"weight": 30
}
],
"max_boost": 30,
"score_mode": "max",
"boost_mode": "multiply"
}
}
}
The query inside function_score is the same should query that you used.
Now we want to take all the results of that query and give more weight (a higher score) to joe walker's books, so that his books are prioritized over the others.
To achieve that, we added a function (inside functions) that computes a new score for each document returned by the query that passes the joe walker filter.
You can experiment with the weight and the other parameters.
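As a rough illustration of the arithmetic with the settings above (score_mode max, boost_mode multiply, max_boost 30), here is a sketch in Python. The function name is hypothetical and the base scores are invented for the example; real scores depend on the index.

```python
def function_score(query_score, matched_weights, max_boost=30.0):
    """Combine a query score with function weights.

    score_mode "max": take the largest weight among functions whose filter matched.
    A document matching no filter keeps its score (factor 1).
    boost_mode "multiply": multiply the query score by that factor,
    after capping the factor at max_boost.
    """
    factor = max(matched_weights) if matched_weights else 1.0
    factor = min(factor, max_boost)
    return query_score * factor

# Hypothetical base scores for the three books that match "car" or "parts".
hits = [
    {"id": 1, "author": "john smith", "score": 1.0},
    {"id": 2, "author": "bob bobby", "score": 0.5},
    {"id": 4, "author": "joe walker", "score": 0.25},
]

for hit in hits:
    weights = [30.0] if hit["author"] == "joe walker" else []
    hit["boosted"] = function_score(hit["score"], weights)

ranked = sorted(hits, key=lambda h: h["boosted"], reverse=True)
print([h["id"] for h in ranked])  # [4, 1, 2]
```

Even with the lowest base score, the joe walker book ends up first because its score is multiplied by 30 while the others are left unchanged.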
Hope it helps

Search-as-you-type inside arrays

I am trying to implement a search-as-you-type query inside an array.
This is the structure of the documents:
{
"guid": "6f954d53-df57-47e3-ae9e-cb445bd566d3",
"labels":
[
{
"name": "London",
"lang": "en"
},
{
"name": "Llundain",
"lang": "cy"
},
{
"name": "Lunnainn",
"lang": "gd"
}
]
}
and this is what I have come up with so far:
{
"query": {
"multi_match": {
"fields": ["labels.name"],
"query": name,
"type": "phrase_prefix"
}
}
}
which works exactly as requested.
The problem is that I would like to search also by language.
What I tried is:
{
"query": {
"bool": {
"must": [
{
"multi_match": {
"fields": ["labels.name"],
"query": "london",
"type": "phrase_prefix"
}
},
{
"term": {
"labels.lang": "gd"
}
}
]
}
}
}
but the two clauses are matched against separate values of the array.
So, for example, I would like to search only the Welsh-language labels (cy). That means the clause containing the city name should match only labels whose "lang" is "cy".
How do I write this kind of query?
Internally, Elasticsearch flattens arrays of JSON objects, so it can't correlate the lang and name of a specific element in the labels array. If you want this kind of correlation, you'll need to index your documents differently.
The usual way to do this is to use the nested data type with a matching nested query.
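The flattening described above can be sketched in plain Python. The helper names are made up for illustration, and the prefix check is only a crude stand-in for phrase_prefix matching.

```python
def flatten_object_array(field, objects):
    """Mimic how an array of objects is stored without the nested type:
    one multi-valued field per property, with element boundaries lost."""
    flat = {}
    for obj in objects:
        for key, value in obj.items():
            flat.setdefault(f"{field}.{key}", []).append(value)
    return flat

labels = [
    {"name": "London", "lang": "en"},
    {"name": "Llundain", "lang": "cy"},
    {"name": "Lunnainn", "lang": "gd"},
]

flat = flatten_object_array("labels", labels)
print(flat)
# {'labels.name': ['London', 'Llundain', 'Lunnainn'], 'labels.lang': ['en', 'cy', 'gd']}

def flat_match(flat, name_prefix, lang):
    """Object mapping: each clause is checked against the whole array."""
    name_ok = any(n.lower().startswith(name_prefix) for n in flat["labels.name"])
    return name_ok and lang in flat["labels.lang"]

def nested_match(labels, name_prefix, lang):
    """Nested mapping: both clauses must hold for the same element."""
    return any(
        label["name"].lower().startswith(name_prefix) and label["lang"] == lang
        for label in labels
    )

print(flat_match(flat, "london", "gd"))    # True  (false positive)
print(nested_match(labels, "london", "gd"))  # False
print(nested_match(labels, "llun", "cy"))    # True
```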
The query would end up looking something like this:
{
"query": {
"nested": {
"path": "labels",
"query": {
"bool": {
"must": [
{
"multi_match": {
"fields": ["labels.name"],
"query": "london",
"type": "phrase_prefix"
}
},
{
"term": {
"labels.lang": "gd"
}
}
]
}
}
}
}
}
But note that you'll need to also specify nested mappings for your labels, e.g.:
"properties": {
"labels": {
"type": "nested",
"properties": {
"name": {
"type": "text"
/* you might want to add other mapping-related configuration here */
},
"lang": {
"type": "keyword"
}
}
}
}
Other ways to do this include:
Indexing each label as a separate document, repeating the guid field
Using parent/child documents
You should use the nested datatype in your mapping instead of the object datatype. For a detailed explanation, see:
https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html
So you should define the mapping for your field something like this:
{
"properties": {
"labels": {
"type": "nested",
"properties": {
"name": {
"type": "text"
},
"lang": {
"type": "keyword"
}
}
}
}
}
After this, you can query with a nested query:
{
"query": {
"nested": {
"path": "labels",
"query": {
"bool": {
"must": [
{
"multi_match": {
"fields": ["labels.name"],
"query": "london",
"type": "phrase_prefix"
}
},
{
"term": {
"labels.lang": "gd"
}
}
]
}
}
}
}
}

promote results in Elasticsearch

I searched the documentation for a way to promote Elasticsearch results when a specific field has a certain value, but I didn't find a recommended practice. For example, if a user lives in Paris and runs a search, I want the documents relevant to Paris to appear first, or at least to be promoted.
There is a lot to this, but what you want to research is "boosting". It can be done at the mapping level or at the query level.
Mapping example:
{
"mappings": {
"_doc": {
"properties": {
"location": {
"type": "keyword",
"boost": 2 <--- 2x boost to the final score
}
}
}
}
}
Query Example:
GET /_search
{
"query": {
"bool": {
"must": {
"match": {
"content": {
"query": "full text search",
"operator": "and"
}
}
},
"should": [
{ "term": {
"location": {
"value": "xxx",
"boost": 3 <--- 3x boost if the location matches
}
}}
]
}
}
}
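For the Paris case in the question, the query-level variant could be built per user along these lines. This is a sketch only: promote_by_location is a hypothetical helper, and the content and location field names are taken from the example above rather than from the asker's real mapping.

```python
import json

def promote_by_location(text, user_location, boost=3.0):
    """Build a bool query: 'must' decides which documents match,
    'should' only raises the score of those in the user's location."""
    return {
        "query": {
            "bool": {
                "must": {
                    "match": {"content": {"query": text, "operator": "and"}}
                },
                "should": [
                    {"term": {"location": {"value": user_location, "boost": boost}}}
                ],
            }
        }
    }

body = promote_by_location("full text search", "Paris")
print(json.dumps(body, indent=2))
```

Because location is a keyword field in the mapping example, the value passed in has to match the stored value exactly, including case.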

How to get Elastic search to return both exact matched and then other matches in result

I need help with Elasticsearch. I am trying to get exact-match results first, followed by documents that match on only one field, using the query below, but with no luck. Basically, I want the top-scoring hits first and the less accurate, single-field matches after them in the overall search result.
The mapping is as following:
{
"palsx1493": {
"mappings": {
"pals": {
"properties": {
"aboutme": {
"type": "string"
},
"dob": {
"type": "date",
"format": "date"
},
"fccode": {
"type": "string"
},
"fcname": {
"type": "string"
},
"learning": {
"type": "nested",
"properties": {
"skillslevel": {
"type": "string"
},
"skillsname": {
"type": "string"
}
}
},
"name": {
"type": "string"
},
"rating": {
"type": "string"
},
"teaching": {
"type": "nested",
"properties": {
"skillslevel": {
"type": "string"
},
"skillsname": {
"type": "string"
}
}
},
"trate": {
"type": "string"
},
"treg": {
"type": "string"
}
}
}
}
}
}
When searching, I need the result to return the exactly matched documents first, followed by the lower-scored documents that match on the teaching skillname, in that order. What happens now is that I get the exact matches first, correctly, then the learning.skillname matches, and then the teaching.skillname matches. I want the last two swapped, so that the teaching.skillname matches come right after the exact matches.
Exact match:
1. fcname (the from-country name; it can be either a specific name or just "Any Country").
2. dob: Date of birth is a range value - a range value is given as input
3. teaching: skillname
4. learning: skillname
This is what I have tried with no luck:
{
"query": {
"bool": {
"should": [
{ "match": { "fcname": "spain"}},
{ "range": {
"bod": {
"from": "1950-10-10",
"to": "1967-12-12"
}
}
},
{
"nested": {
"path": "learning",
"score_mode": "max",
"query": {
"bool": {
"must": [
{ "match": { "learning.skillname": learningSkillName}}
]
}
}
}
},
{
"nested": {
"path": "teaching",
"query": {
"bool": {
"must": [
{ "match": { "teaching.skillname": teachingSkillName}}
]
}
}
}
}
]
}
}
}
Please look into how the field is indexed. By default a string field is used for full-text search: it is analyzed and stored in an inverted index, so what gets stored depends on the analyzer.
For an exact string match, use "index": "not_analyzed"
e.g.
"nick": {
"type": "string",
"index":"not_analyzed"
},
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-core-types.html
I figured it out. The solution was to use the function_score feature to override or add to the score of a document when a certain field matches. Replacing the nested part above with the following gave me the correct result:
"nested": {
"path": "teaching",
"query": {
"function_score": {
"query": {
"bool": {
"must": [
{ "match": { "teaching.skillname": "xxx"}}
]
}
},
"functions": [
{
"script_score": {
"script": "_score + 2"
}
}],
