Full-text search through complex structure Elasticsearch - elasticsearch

I have the following issue in case of a full-text search in Elasticsearch. I would like to search for all indexed attributes. However, one of my Project attributes is a very complex array of hashes/objects:
[
{
"title": "Group 1 title",
"name": "Group 1 name",
"id": "group_1_id",
"items": [
{
"pos": "1",
"title": "Position 1 title"
},
{
"pos": "1.1",
"title": "Position 1.1 title",
"description": "<p>description</p>",
"extra_description": {
"rotation": "2 years",
"amount": "1.947m²"
},
"inputs": {
"unit_price": true,
"total_net": true
},
"additional_inputs": [
{
"name": "additonal_input_name",
"label": "Additional input label:",
"placeholder": "Additional input placeholder",
"description": "Additional input description",
"type": "text"
}
]
}
]
}
]
My mappings look like this:
{:title=>{:type=>"text", :analyzer=>"english"},
:description=>{:type=>"text", :analyzer=>"english"},
:location=>{:type=>"keyword"},
:company=>{:type=>"keyword"},
:created_at=>{:type=>"date"},
:due_date=>{:type=>"date"},
:specification=>
{:type=>:nested,
:properties=>
{:id=>{:type=>"keyword"},
:title=>{:type=>"text"},
:items=>
{:type=>:nested,
:properties=>
{:pos=>{:type=>"keyword"},
:title=>{:type=>"text"},
:description=>{:type=>"text", :analyzer=>"english"},
:extra_description=>{:type=>:nested, :properties=>{:rotation=>{:type=>"keyword"}, :amount=>{:type=>"keyword"}}},
:additional_inputs=>
{:type=>:nested,
:properties=>
{:label=>{:type=>"keyword"},
:placeholder=>{:type=>"text"},
:description=>{:type=>"text"},
:type=>{:type=>"keyword"},
:name=>{:type=>"keyword"}
}
}
}
}
}
}
}
The question is, how to properly seek through it? For no nested attributes, it works as a charm, but for instance, I would like to seek by title in the specification, no result is returned. I tried both:
query:
{ nested:
{
multi_match: {
query: keyword,
fields: ['title', 'description', 'company', 'location', 'specification']
}
}
}
Or
{
nested: {
path: 'specification',
query: {
multi_match: {
query: keyword
}
}
}
}
Without any result.
Edit:
It's with elasticsearch-ruby for Ruby.
I am trying to query by: MODEL_NAME.all.search(query: with_specification("Group 1 title")) where with_specification is:
def with_specification(keyword)
{
bool: {
should: [
{
nested: {
path: 'specification',
query: {
bool: {
should: [
{
match: {
'specification.title': keyword,
}
},
{
multi_match: {
query: keyword,
fields: [
'specification.title',
'specification.id'
]
}
},
{
nested: {
path: 'specification.items',
query: {
match: {
'specification.items.title': keyword,
}
}
}
}
]
}
}
}
}
]
}
}
end

Querying on multi-level nested documents must follow a certain schema.
You cannot multi-match on nested & non-nested fields at the same time and/or query on nested fields under different paths.
You can wrap your queries in a bool-should but keep the 2 rules above in mind:
GET your_index/_search
{
"query": {
"bool": {
"should": [
{
"nested": {
"path": "specification",
"query": {
"bool": {
"should": [
{
"match": {
"specification.title": "TEXT" <-- standalone match
}
},
{
"multi_match": { <-- multi-match but 1st level path
"query": "TEXT",
"fields": [
"specification.title",
"specification.id"
]
}
},
{
"nested": {
"path": "specification.items", <-- 2nd level path
"query": {
"match": {
"specification.items.title": "TEXT"
}
}
}
}
]
}
}
}
}
]
}
}
}

Related

How to score based on number of substring occurrences within a field in ElasticSearch

I have an ElasticSearch document setup like so
{
metadata: {
content: "",
other_fields: ...
},
other_fields: ...
}
and am querying the content field like so
{
"function_score": {
"query": {
"multi-match": {
"query": "searchTerm",
"fields": ["content", "other_fields", ...]
}
},
"functions": [
{
"weight": {
"filter": {
"match": {
"metadata.content": {
"query": "searchTerm"
},
"weight": 3
}
}
},
{
other weight functions for the other fields with custom score values
}
],
"boost_mode": "replace",
"score_mode": "sum"
}
}
This works great calculating the score for all of the fields based on matches (including the content field). However, I want to multiply the content field score by the number of occurrences of the search term in the content field.
For example when I search for "test" in the doc below, the score is 3 but I want it to be 9:
{
metadata: {
content: "test test test"
}
}
Any suggestions on how I can do this?
I could not replicate your query -- too many typos/syntax issues. Here's what I could reconstruct:
Let's first create some sample docs
POST metaa/_doc
{
"metadata": {
"content": "test"
}
}
POST metaa/_doc
{
"metadata": {
"content": "test test"
}
}
POST metaa/_doc
{
"metadata": {
"content": "test test test"
}
}
then querying & using script_score, inspired by this cool answer:
GET metaa/_search
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "test",
"fields": [
"metadata.content",
"other_fields"
]
}
},
"functions": [
{
"script_score": {
"script": {
"source": """
def docval = doc['metadata.content.keyword'].value;
String temp = docval.replace('test', "");
return (docval.length() - temp.length()) / 4;
"""
}
},
"filter": {
"match": {
"metadata.content": {
"query": "test"
}
}
},
"weight": 3
}
],
"boost_mode": "replace",
"score_mode": "sum"
}
}
}
yielding
[
{
"_index":"metaa",
"_type":"_doc",
"_id":"suh763EBG_KW3EFnjwNq",
"_score":9.0,
"_source":{
"metadata":{
"content":"test test test"
}
}
},
{
"_index":"metaa",
"_type":"_doc",
"_id":"seh763EBG_KW3EFnewP6",
"_score":6.0,
"_source":{
"metadata":{
"content":"test test"
}
}
},
{
"_index":"metaa",
"_type":"_doc",
"_id":"s-h_63EBG_KW3EFnMQMU",
"_score":3.0,
"_source":{
"metadata":{
"content":"test"
}
}
}
]
_score = 9 -> 6 -> 3.
Note: you may want to perform some validity checks within the script (could be a simple try/catch). But that's an exercise for the reader.

Elasticsearch with nested objects query

I have an index with a nested mapping.
I want to preform a query that will return the following: give me all the documents where each word in the search term appears in one or more of the nested documents.
Here is the index:
properties: {
column_values_index_as_objects: {
type: "nested",
properties: {
value: {
ignore_above: 256,
type: 'keyword',
fields: {
word_middle: {
analyzer: "searchkick_word_middle_index",
type: "text"
},
analyzed: {
term_vector: "with_positions_offsets",
type: "text"
}
}
}
}
}
}
Here is the latest query I try:
nested: {
path: "column_values_index_as_objects",
query: {
bool: {
must: [
{
match: {
"column_values_index_as_objects.value.analyzed": {
query: search_term,
boost: 10 * boost_factor,
operator: "or",
analyzer: "searchkick_search"
}
}
}
For example if I search the words 'food and water', I want that each word will appear in at least on nested document.
The current search returns the document even if only one of the words exists
Thanks for the help!
Update:
As Cristoph suggested, the solution works. now I have the following problem.
Here is my index:
properties: {
name: {
type: "text"
},
column_values_index_as_objects: {
type: "nested",
properties: {
value: {
ignore_above: 256,
type: 'keyword',
fields: {
word_middle: {
analyzer: "searchkick_word_middle_index",
type: "text"
},
analyzed: {
term_vector: "with_positions_offsets",
type: "text"
}
}
}
}
}
}
And the query I want to preform is if I search for 'my name is guy', and will give all the documents where all the words are found - might be in the nested documents and might in the name field.
For example, I could have a document with the value 'guy' in the name field and other words in the nested documents
In order to do this, I usually split the terms and generate a request like this (foo:bar is an other criteria on an other field) :
{
"bool": {
"must": [
{
"nested": {
"path": "column_values_index_as_objects",
"query": {
"match": {
"column_values_index_as_objects.value.analyzed": {
"query": "food",
"boost": "10 * boost_factor",
"analyzer": "searchkick_search"
}
}
}
}
},
{
"nested": {
"path": "column_values_index_as_objects",
"query": {
"match": {
"column_values_index_as_objects.value.analyzed": {
"query": "and",
"boost": "10 * boost_factor",
"analyzer": "searchkick_search"
}
}
}
}
},
{
"nested": {
"path": "column_values_index_as_objects",
"query": {
"match": {
"column_values_index_as_objects.value.analyzed": {
"query": "water",
"boost": "10 * boost_factor",
"analyzer": "searchkick_search"
}
}
}
}
},
{
"query": {
"term": {
"foo": "bar"
}
}
}
]
}
}

elasticsearch: combine text match and array contains

We have documents in elastic saved in the following structure:
{
...
name: "name",
ancestors: ["id1", "id2", "id3"],
...
}
I want to create a search query that searches for name: "some name" AND ancestors contains "id1".
I've tried many queries but none seems to work, or return the desired result. If it's of any help, this combined query should only return one entry every time.
Some of the queries I've tried are the following:
filtered: {
query: {
query_string: {
query: "name:name"
},
term: {
ancestors: "id1"
}
}
}
__
match: {
name: "name",
ancestors: "id1"
},
defaultOperator: 'AND'
__
bool: {
must: { term: { name: "name" }},
filter: {
term: { ancestors: "id1" }
}
}
The mappings are the following:
{
"data": {
"mappings": {
"entry": {
"properties": {
"ancestors": {
"type": "string"
},
"id": {
"type": "string"
},
"name": {
"type": "string"
}
}
}
}
}
}
We haven't changed the default mappings, that's why ancestors is of type string, but I don't think this makes any difference
Try this query:
{
"query": {
"bool": {
"must": [
{
"query_string": {
"default_field": "name",
"query": "stripe AND 1"
}
},
{
"match": {
"ancestors": "id1"
}
}
]
}
}
}

ElasticSearch - Get only matching nested objects with All Top level fields in search response

let say I have following Document:
{
id: 1,
name: "xyz",
users: [
{
name: 'abc',
surname: 'def'
},
{
name: 'xyz',
surname: 'wef'
},
{
name: 'defg',
surname: 'pqr'
}
]
}
I want to Get only matching nested objects with All Top level fields in search response.
I mean If I search/filter for users with name 'abc', I want below response
{
id: 1,
name: "xyz",
users: [
{
name: 'abc',
surname: 'def'
}
]
}
How can I do that?
Reference : select matching objects from array in elasticsearch
If you're ok with having all root fields except the nested one and then only the matching inner hits in the nested field, then we can re-use the previous answer like this by specifying a slightly more involved source filtering parameter:
{
"_source": {
"includes": [ "*" ],
"excludes": [ "users" ]
},
"query": {
"nested": {
"path": "users",
"inner_hits": { <---- this is where the magic happens
"_source": [
"name", "surname"
]
},
"query": {
"bool": {
"must": [
{
"term": {
"users.name": "abc"
}
}
]
}
}
}
}
}
Maybe late, I use nested sorting to limit element on my nested relation, here a example :
"sort": {
"ouverture.periodesOuvertures.dateDebut": {
"order": "asc",
"mode": "min",
"nested_filter": {
"range": {
"ouverture.periodesOuvertures.dateFin": {
"gte": "2017-08-29",
"format": "yyyy-MM-dd"
}
}
},
"nested_path": "ouverture.periodesOuvertures"
}
},
Since 5.5 ES (I think) you can use filter on nested query.
Here a example of nested query filter I use:
{
"nested": {
"path": "ouverture.periodesOuvertures",
"query": {
"bool": {
"must": [
{
"range": {
"ouverture.periodesOuvertures.dateFin": {
"gte": "2017-08-29",
"format": "yyyy-MM-dd"
}
}
},
{
"range": {
"ouverture.periodesOuvertures.dateFin": {
"lte": "2017-09-30",
"format": "yyyy-MM-dd"
}
}
}
],
"filter": [
{
"range": {
"ouverture.periodesOuvertures.dateFin": {
"gte": "2017-08-29",
"format": "yyyy-MM-dd"
}
}
},
{
"range": {
"ouverture.periodesOuvertures.dateFin": {
"lte": "2017-09-30",
"format": "yyyy-MM-dd"
}
}
}
]
}
}
}
}
Hope this can help ;)
Plus if you ES is not in the last version (5.5) inner_hits could slow your query Including inner hits drastically slows down query results
https://www.elastic.co/guide/en/elasticsearch/reference/5.6/search-request-inner-hits.html#nested-inner-hits-source
"inner_hits": {
"_source" : false,
"stored_fields" : ["name", "surname"]
}
but you may need to change mapping to set those fields as "stored_fields" , otherwise you can use
"inner_hits": {}
to get a result that not that perfect.
You can make such a request, but the response will have internal fields starting with _
{
"_source": {
"includes": [ "*" ],
"excludes": [ "users" ]
},
"query": {
"nested": {
"path": "users",
"inner_hits": {},
"query": {
"bool": {
"must": [
{ "match": { "users.name": "abc" }}
]
}
}
}
}
}
In one of my projects, My expectation was to retrieve unique conversation messages text(inner fields like messages.text) having specific tags. So instead of using inner_hits, I used aggregation like below,
final NestedAggregationBuilder aggregation = AggregationBuilders.nested("parentPath", "messages").subAggregation(AggregationBuilders.terms("innerPath").field("messages.tag"));
final NativeSearchQuery searchQuery = new NativeSearchQueryBuilder()
.addAggregation(aggregation).build();
final Aggregations aggregations = elasticsearchOperations.search(searchQuery, Conversation.class).getAggregations();
final ParsedNested parentAgg = (ParsedNested) aggregations.asMap().get("parentPath");
final Aggregations childAgg = parentAgg.getAggregations();
final ParsedStringTerms childParsedNested = (ParsedStringTerms) childAgg.asMap().get("innerPath");
// Here you will get unique expected inner fields in key part.
Map<String, Long> agg = childParsedNested.getBuckets().stream().collect(Collectors.toMap(Bucket::getKeyAsString, Bucket::getDocCount));
I use the following body to get that result (I have set the full path to the values):
{
"_source": {
"includes": [ "*" ],
"excludes": [ "users" ]
},
"query": {
"nested": {
"path": "users",
"inner_hits": {
"_source": [
"users.name", "users.surname"
]
},
"query": {
"bool": {
"must": [
{
"term": {
"users.name": "abc"
}
}
]
}
}
}
}
}
Also another way exists:
{
"_source": {
"includes": [ "*" ],
"excludes": [ "users" ]
},
"query": {
"nested": {
"path": "users",
"inner_hits": {
"_source": false,
"docvalue_fields": [
"users.name", "users.surname"
]
},
"query": {
"bool": {
"must": [
{
"term": {
"users.name": "abc"
}
}
]
}
}
}
}
}
See results in inner_hits of the result hits.
https://www.elastic.co/guide/en/elasticsearch/reference/7.15/inner-hits.html#nested-inner-hits-source

How to search on multiple fields in URI Search

I would like to perform an AND operation in ElasticSearch using the URI Search (q=). How do I do it?
If I have document like:
[{ "name":"Test 1", "pub":"2"}, { "name":"Test 2", "pub":"1"}, { "name":"A", "pub":"1"}]
And I would like to query for documents containing with a name containing "Test" AND where pub equals "1". How do I do that?
Thanks!
Assuming your document looks like this:
{
"my_field": [
{ "name":"Test 1", "pub":"2"},
{ "name":"Test 2", "pub":"1"},
{ "name":"A", "pub":"1"}
]
}
And the mapping of my_field is of type nested similar to this:
{
"mappings": {
"doc_type": {
"properties": {
"my_field": {
"type": "nested",
"properties": {
"name": { "type": "string" },
"pub": {"type": "integer" }
}
}
}
}
}
}
Then you can query your index and get the expected documents with the following nested query:
POST /_search
{
"query": {
"nested": {
"path": "my_field",
"query": {
"bool": {
"filter": [
{
"match": {
"name": "Test"
}
},
{
"match": {
"pub": 1
}
}
]
}
}
}
}
}
Actually you'd need nested fields. The following is a good resource.
https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-objects.html

Resources