Elasticsearch partial matching of a number field - elasticsearch

I'm trying to partially match a number field. In the query below, I'd like id (which is defined as a long) to match any document which starts with 419. (so 4191 should match, as should 419534 but not 123419)
{
"size": 20,
"from": 0,
"sort": [{
"customerName": "asc"
}],
"query": {
"bool": {
"must": [{
"bool": {
"should": [{
"term": {
"id": 419
}
}]
}
}]
}
}
}
Anyone got a neat solution to use in my query?

To avoid a egde ngram, you could declare a not analyzed text sub field in your id mapping :
"mappings": {
"default": {
"properties": {
"id": {
"type": "integer",
"fields": {
"prefixed": {
"type": "string",
"index": "not_analyzed"
}
}
},
...
}
}
}
and use a prefix query against that field:
"query": {
"prefix" : {
"id.prefixed" : { "value" : 419 }
}
}

in a string field you can use wildcard query:
{
"query" : {
"wildcard" : { "id" : "*419*" }
}
}

Related

Elasticseach wildcard query on nested types

I'm trying to run a wildcard query on a nested type in ElasticSearch. I have records with the following structure:
{
"field_1": "value_1",
"nested_field_1": [
{
"field_type": "some_field_type",
"field_value": "some_value"
},
{
"field_type": "another_field_type",
"field_value": "another_value"
}
]
}
I want to be able to run wildcard query on the nested_field, either on field_value or on field_type.
I can query for an exact match with this syntax:
"query": {
"nested": {
"path": "nested_field_1",
"query": {
"bool": {
"must": [
{
"match": {
"nested_field_1.field_value": "another_value"
}
}
]
}
}
}
}
}
But replacing the match with wildcard doesn't yield any results.
Any help would be welcome.
So I just tried your example and it gives me the result and used elasticsearch official wildcard query doc.
Index Def
{
"mappings": {
"properties": {
"field_1": {
"type": "text"
},
"nested_field_1" :{
"type" : "nested",
"properties" : {
"field_type" :{
"type" : "text"
},
"field_value" :{
"type" : "integer" --> created as interfere field
}
}
}
}
}
}
Index doc
{
"field_1": "value_1",
"nested_field_1": [
{
"field_type": "some_field_type",
"field_value": 20
},
{
"field_type": "another_field_type",
"field_value": 40
}
]
}
Wildcard search query
{
"query": {
"nested": {
"path": "nested_field_1",
"query": {
"bool": {
"must": [
{
"wildcard": { --> note
"nested_field_1.field_type": {
"value": "another_field_type"
}
}
}
]
}
}
}
}
}
Search result
"nested_field_1": [
{
"field_type": "some_field_type",
"field_value": 20
},
{
"field_type": "another_field_type",
"field_value": 40
}
]
}

Term query on nested fields returns no result in Elasticsearch

I have a nested type field in my mapping. When I use Term search query on my nested field no result is returned from Elasticsearch whereas when I change Term to Match query, it works fine and Elasticsearch returns expected result
here is my mapping, imagine I have only one nested field in my type mapping
{
"homing.estatefiles": {
"mappings": {
"estatefile": {
"properties": {
"DynamicFields": {
"type": "nested",
"properties": {
"Name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"ValueBool": {
"type": "boolean"
},
"ValueDateTime": {
"type": "date"
},
"ValueInt": {
"type": "long"
}
}
}
}
}
}
}
}
And here is my term query (which returns no result)
{
"from": 50,
"size": 50,
"query": {
"bool": {
"filter": [
{
"nested": {
"query": {
"bool": {
"must": [
{
"term": {
"DynamicFields.Name":{"value":"HasParking"}
}
},
{
"term": {
"DynamicFields.ValueBool": {
"value": true
}
}
}
]
}
},
"path": "DynamicFields"
}
}
]
}
}
}
And here is my query which returns expected result (by changing Term query to Match query)
{
"from": 50,
"size": 50,
"query": {
"bool": {
"filter": [
{
"nested": {
"query": {
"bool": {
"must": [
{
"match": {
"DynamicFields.Name":"HasParking"
}
},
{
"term": {
"DynamicFields.ValueBool": {
"value": true
}
}
}
]
}
},
"path": "DynamicFields"
}
}
]
}
}
}
This is happening because the capital letters with the analyzer of elastic.
When you are using term the elastic is looking for the exact value you gave.
up until now it sounds good, but before it tries to match the term, the value you gave go through an analyzer of elastic which manipulate your value.
For example in your case it also turn the HasParking to hasparking.
And than it will try to match it and of course will fail. They have a great explanation in the documentation in the "Why doesn’t the term query match my document" section. This analyzer not being activated on the value when you query using match and this why you get your result.

Elastic Search: filter query results by entry its field into another query results

I found the question about the IN equivalent operator:
ElasticSearch : IN equivalent operator in ElasticSearch
But I would to find equivalent to the another more complicated request:
SELECT * FROM table WHERE id IN (SELECT id FROM anotherTable WHERE something > 0);
Mapping:
First index:
{
"mappings": {
"products": {
"properties": {
"id": { "type": "integer" },
"name": { "type": "text" },
}
}
}
}
Second index:
{
"mappings": {
"reserved": {
"properties": {
"id": { "type": "integer" },
"type": { "type": "text" },
}
}
}
}
I want to get products which ids are contained in reserved index and have the specific type of a reserve.
First step - get all relevant ids from reserved index:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"term": {
"type": "TYPE_HERE"
}
}
]
}
},
"aggregations": {
"ids": {
"terms": {
"field": "id"
}
}
}
}
--> see: Terms Aggregations, Bool Query and Term Query.
--> _source will retrieve only relevant field id.
Second step - get all relevant documents from products index:
{
"query": {
"bool": {
"must": [
{
"terms": {
"id": [
"ID_1",
"ID_2",
"AND_SO_ON..."
]
}
}
]
}
}
}
--> take all the ids from first step and put them as a list under terms:id[...]
--> see Terms Query.

ElasticSearch query on tags

I am trying to crack the elasticsearch query language, and so far I'm not doing very good.
I've got the following mapping for my documents.
{
"mappings": {
"jsondoc": {
"properties": {
"header" : {
"type" : "nested",
"properties" : {
"plainText" : { "type" : "string" },
"title" : { "type" : "string" },
"year" : { "type" : "string" },
"pages" : { "type" : "string" }
}
},
"sentences": {
"type": "nested",
"properties": {
"id": { "type": "integer" },
"text": { "type": "string" },
"tokens": { "type": "nested" },
"rhetoricalClass": { "type": "string" },
"babelSynsetsOcc": {
"type": "nested",
"properties" : {
"id" : { "type" : "integer" },
"text" : { "type" : "string" },
"synsetID" : { "type" : "string" }
}
}
}
}
}
}
}
}
It mainly resembles a JSON file referring to a pdf document.
I have been trying to make queries with aggregations and so far is going great. I've gotten to the point of grouping by (aggregating) rhetoricalClass, get the total number of repetitions of babelSynsetsOcc.synsetID. Heck, even the same query even by grouping the whole result by header.year
But, right now, I am struggling with filtering the documents that contain a term and doing the same query.
So, how could I make a query such that grouping by rhetoricalClass and only taking into account those documents whose field header.plainText contains either ["Computational", "Compositional", "Semantics"]. I mean contain instead of equal!.
If I were to make a rough translation to SQL it would be something similar to
SELECT count(sentences.babelSynsetsOcc.synsetID)
FROM jsondoc
WHERE header.plainText like '%Computational%' OR header.plainText like '%Compositional%' OR header.plainText like '%Sematics%'
GROUP BY sentences.rhetoricalClass
WHERE clauses are just standard structured queries, so they translate to queries in Elasticsearch.
GROUP BY and HAVING loosely translate to aggregations in Elasticsearch's DSL. Functions like count, min max, and sum are a function of GROUP BY and it's therefore also an aggregation.
The fact that you're using nested objects may be necessary, but it adds an extra layer to each part that touches them. If those nested objects are not arrays, then do not use nested; use object in that case.
I would probably look at translating your query to:
{
"query": {
"nested": {
"path": "header",
"query": {
"bool": {
"should": [
{
"match": {
"header.plainText" : "Computational"
}
},
{
"match": {
"header.plainText" : "Compositional"
}
},
{
"match": {
"header.plainText" : "Semantics"
}
}
]
}
}
}
}
}
Alternatively, it could be rewritten as this, which is a little less obvious of its intent:
{
"query": {
"nested": {
"path": "header",
"query": {
"match": {
"header.plainText": "Computational Compositional Semantics"
}
}
}
}
}
The aggregation would then be:
{
"aggs": {
"nested_sentences": {
"nested": {
"path": "sentences"
},
"group_by_rhetorical_class": {
"terms": {
"field": "sentences.rhetoricalClass",
"size": 10
},
"aggs": {
"nested_babel": {
"path": "sentences.babelSynsetsOcc"
},
"aggs": {
"count_synset_id": {
"count": {
"field": "sentences.babelSynsetsOcc.synsetID"
}
}
}
}
}
}
}
}
Now, if you combine them and throw away hits (since you're just looking for the aggregated result), then it looks like this:
{
"size": 0,
"query": {
"nested": {
"path": "header",
"query": {
"match": {
"header.plainText": "Computational Compositional Semantics"
}
}
}
},
"aggs": {
"nested_sentences": {
"nested": {
"path": "sentences"
},
"group_by_rhetorical_class": {
"terms": {
"field": "sentences.rhetoricalClass",
"size": 10
},
"aggs": {
"nested_babel": {
"path": "sentences.babelSynsetsOcc"
},
"aggs": {
"count_synset_id": {
"count": {
"field": "sentences.babelSynsetsOcc.synsetID"
}
}
}
}
}
}
}
}

Term, nested documents and must_not query incompatible in ElasticSearch?

I have trouble combining term, must_not queries on nested documents.
Sense example can be found here : http://sense.qbox.io/gist/be436a1ffa01e4630a964f48b2d5b3a1ef5fa176
Here my mapping :
{
"mappings": {
"docs" : {
"properties": {
"tags" : {
"type": "nested",
"properties" : {
"type": {
"type": "string",
"index": "not_analyzed"
}
}
},
"label" : {
"type": "string"
}
}
}
}
}
with two documents in this index :
{
"tags" : [
{"type" : "POST"},
{"type" : "DELETE"}
],
"label" : "item 1"
},
{
"tags" : [
{"type" : "POST"}
],
"label" : "item 2"
}
When I query this index like this :
{
"query": {
"nested": {
"path": "tags",
"query": {
"bool": {
"must": {
"term": {
"tags.type": "DELETE"
}
}
}
}
}
}
}
I've got one hit (which is correct)
When I want to get documents WHICH DON'T CONTAIN the tag "DELETE", with this query :
{
"query": {
"nested": {
"path": "tags",
"query": {
"bool": {
"must_not": {
"term": {
"tags.type": "delete"
}
}
}
}
}
}
}
I've got 2 hits (which is incorrect).
This issue seems very close to this one (Elasticsearch array must and must_not) but it's not...
Can you give me some clues to resolve this issue ?
Thank you
Your original query would search in each individual nested object and eliminate the objects that don't match, but if there are some nested objects left, they do match with your query and so you get your results. This is because nested objects are indexed as a hidden separate document
Original code:
{
"query": {
"nested": {
"path": "tags",
"query": {
"bool": {
"must_not": {
"term": {
"tags.type": "delete"
}
}
}
}
}
}
}
The solution is then quite simple really, you should bring the bool query outside the nested documents. Now all the documents are discarded who have a nested object with the "DELETE" type. Just what you wanted!
The solution:
{
"query": {
"bool": {
"must_not": {
"nested": {
"path": "tags",
"query": {
"term": {
"tags.type": "DELETE"
}
}
}
}
}
}
}
NOTE: Your strings are "not analyzed" and you searched for "delete" instead of "DELETE". If you want to search case insensitive, make your strings analyzed
This should fix your problem: http://sense.qbox.io/gist/f4694f542bc76c29624b5b5c9b3ecdee36f7e3ea
Two most important things:
include_in_root on "tags.type". This will tell ES to index tag types as "doc.tags.types" : ['DELETE', 'POSTS'], so you can access an array of those values "flattened" on the root doc . This means you no longer need a nested query (see #2)
Drop the nested query.
{
"mappings": {
"docs" : {
"properties": {
"tags" : {
"type": "nested",
"properties" : {
"type": {
"type": "string",
"index": "not_analyzed"
}
},
"include_in_root": true
},
"label" : {
"type": "string"
}
}
}
}
}
{
"query": {
"bool": {
"must_not": {
"term": {
"tags.type": "DELETE"
}
}
}
}
}

Resources