Range Query with ElasticSearch

I'm testing with ElasticSearch and I'm having problems with range queries.
Consider the following document that I've inserted:
curl -XPUT 'localhost:9200/test/test/test?pretty' -d '
{
"name": "John Doe",
"duration" : "10",
"state" : "unknown"
}'
And now I'm trying to do a range query that catches all documents whose duration is between 5 and 15:
curl -XPOST 'localhost:9200/test/_search?pretty' -d '
{
"query": {
"range": {
"duration": {
"gte": "5",
"lte": "15"
}
}
}
}'
This returns no hits. However, if I run the query like this:
curl -XPOST 'localhost:9200/test/_search?pretty' -d '
{
"query": {
"range": {
"duration": {
"gte": "10"
}
}
}
}'
It returns the document I inserted earlier. How can I query ElasticSearch for documents with a duration value between 5 and 15?

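The problem is that you are indexing your values as strings. On a string field, a range query compares terms lexicographically (character by character), not numerically. "5" sorts after "15", so the range from "5" to "15" is empty, while "10" still satisfies gte "10". Index and query the values as numbers instead. The existing duration field is already mapped as a string, and a field's mapping cannot be changed in place, so delete and recreate the test index first. Then index and query as follows: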
curl -XPUT 'localhost:9200/test/test/test?pretty' -d '
{
"name": "John Doe",
"duration" : 10,
"state" : "unknown"
}'
curl -XPOST 'localhost:9200/test/_search?pretty' -d '
{
"query": {
"range": {
"duration": {
"gte": 5,
"lte": 15
}
}
}
}'
This will yield the following result:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "test",
"_type" : "test",
"_id" : "test",
"_score" : 1.0,
"_source":
{
"name": "John Doe",
"duration" : 10,
"state" : "unknown"
}
} ]
}
}
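You can reproduce the lexicographic behaviour locally. The following is a minimal Python sketch, not Elasticsearch code. The helper in_range is hypothetical: it mimics an inclusive range check and uses Python string comparison as a stand-in for the ordering of string terms.

```python
# Sketch: how an inclusive range [gte, lte] behaves on strings vs numbers.
# Assumption: string terms are ordered lexicographically, which Python's
# str comparison mimics for these ASCII digit strings.

def in_range(value, gte=None, lte=None):
    """Return True if value lies inside the inclusive range [gte, lte]."""
    if gte is not None and value < gte:
        return False
    if lte is not None and value > lte:
        return False
    return True

# Indexed as a string: "10" < "5" because '1' sorts before '5'.
print(in_range("10", gte="5", lte="15"))  # False: no hit
print(in_range("10", gte="10"))           # True: the document is found
# Indexed as a number: ordinary numeric comparison.
print(in_range(10, gte=5, lte=15))        # True
```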

Related

How to order search results by slop in Elasticsearch?

I have an index in ES:
curl -XGET 'http://127.0.0.1:9200/so/_settings?pretty=true'
{
"so" : {
"settings" : {
"index" : {
"number_of_shards" : "1",
"provided_name" : "so",
"creation_date" : "1594912442805",
"analysis" : {
"analyzer" : {
"my_simple_analyzer" : {
"type" : "simple",
"tokenizer" : "lowercase"
}
}
},
"number_of_replicas" : "1",
"uuid" : "8YVu4zU_Sdylr3KhOIwu9Q",
"version" : {
"created" : "7080099"
}
}
}
}
}
It has around 15.4 million documents:
curl -XGET 'http://127.0.0.1:9200/so/_count?pretty=true'
{
"count" : 15426942,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
}
}
I want to perform a full-text search where results that match the query string as an exact phrase come first. They should be followed by results that match with a slop of 1, then a slop of 2, and so on.
So I came up with the query below:
curl -XGET 'http://127.0.0.1:9200/so/_search?pretty=true' -H 'Content-Type: application/json' -d '{
"query": {
"bool": {
"should": [
{
"match_phrase": {
"posts": {
"query": "get the scanner on a specific family like this",
"_name": "exact_match"
}
}
},
{
"match": {
"posts": {
"query": "get the scanner on a specific family like this",
"_name": "partial_match"
}
}
}
]
}
}
}'
Is this the correct query? I see that the partial_match results are not ordered by slop distance (1, then 2, and so on). How can I achieve that?
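One way to approximate the desired ordering is a bool/should query with one match_phrase clause per slop level, where lower slop gets a higher boost. This is offered only as a hedged sketch. The Python below just builds that request body. The function name slop_ladder_query, the max_slop of 3 and the boost values are hypothetical choices. A phrase that matches at slop 0 also matches every higher-slop clause, so exact matches tend to collect the most score. Relevance still depends on term statistics, though, so the ordering is a tendency rather than a guarantee.

```python
import json

def slop_ladder_query(field, text, max_slop=3):
    """Build a bool/should body with one match_phrase clause per slop level.

    Hypothetical weighting: lower slop gets a higher boost, so exact phrase
    matches tend to score above matches that need more slop.
    """
    should = []
    for slop in range(max_slop + 1):
        should.append({
            "match_phrase": {
                field: {
                    "query": text,
                    "slop": slop,
                    "boost": max_slop + 1 - slop,  # 4, 3, 2, 1 for max_slop=3
                    "_name": f"slop_{slop}",
                }
            }
        })
    # A plain match clause still returns documents with the terms in any order.
    should.append({"match": {field: {"query": text, "_name": "partial_match"}}})
    return {"query": {"bool": {"should": should}}}

body = slop_ladder_query("posts", "get the scanner on a specific family like this")
print(json.dumps(body, indent=2))  # body for POST /so/_search
```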

Search by exact match in all fields in Elasticsearch

Let's say I have 3 documents, each of which contains only one field (but imagine that there are more, and we need to search through all of them):
1) Field value is "first second"
2) Field value is "second first"
3) Field value is "first second third"
Here is a script that can be used to create these 3 documents:
# drop the index completely, use with care!
curl -iX DELETE 'http://localhost:9200/test'
curl -H 'content-type: application/json' -iX PUT 'http://localhost:9200/test/_doc/one' -d '{"name":"first second"}'
curl -H 'content-type: application/json' -iX PUT 'http://localhost:9200/test/_doc/two' -d '{"name":"second first"}'
curl -H 'content-type: application/json' -iX PUT 'http://localhost:9200/test/_doc/three' -d '{"name":"first second third"}'
I need to find only document 1, the one with a field whose value is exactly "first second".
Here is what I tried.
A. Plain search:
curl -H 'Content-Type: application/json' -iX POST 'http://localhost:9200/test/_search' -d '{
"query": {
"query_string": {
"query": "first second"
}
}
}'
This returns all 3 documents.
B. Quoting:
curl -H 'Content-Type: application/json' -iX POST 'http://localhost:9200/test/_search' -d '{
"query": {
"query_string": {
"query": "\"first second\""
}
}
}'
This returns 2 documents, 1 and 3, because both contain 'first second'.
This answer https://stackoverflow.com/a/28024714/7637120 suggests using the 'keyword' analyzer on the fields at index time, but I would like to avoid any customizations to the mapping.
Is it possible to avoid them and still only find document 1?
Yes, you can do that by declaring the mapping type of name as keyword.
To demonstrate it, I did the following:
1) created a mapping with `keyword` for the `name` field
2) indexed the three documents
3) searched with a `match` query
Mapping (this uses the Elasticsearch 6.x syntax with a `_doc` type; on 7.x, drop the `_doc` level):
PUT so_test16
{
"mappings": {
"_doc":{
"properties":{
"name": {
"type": "keyword"
}
}
}
}
}
Indexing the documents:
POST /so_test16/_doc
{
"id": 1,
"name": "first second"
}
POST /so_test16/_doc
{
"id": 2,
"name": "second first"
}
POST /so_test16/_doc
{
"id": 3,
"name": "first second third"
}
The query:
GET /so_test16/_search
{
"query": {
"match": {"name": "first second"}
}
}
And the result:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.2876821,
"hits" : [
{
"_index" : "so_test16",
"_type" : "_doc",
"_id" : "m1KXx2sB4TH56W1hdTF9",
"_score" : 0.2876821,
"_source" : {
"id" : 1,
"name" : "first second"
}
}
]
}
}
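The following deliberately simplified Python model (not Elasticsearch code) explains why attempts A and B behave as they do and why a keyword field fixes the problem. It assumes a text field is just lowercased and split on whitespace, and a keyword field stores the whole value as one term. The helper names are hypothetical.

```python
# Simplified model: text field = lowercase + whitespace split;
# keyword field = the whole value as a single term.
docs = {1: "first second", 2: "second first", 3: "first second third"}

def tokens(value):
    return value.lower().split()

def match_any(query):
    """Unquoted query_string: a document matches if it shares any token (OR)."""
    q = set(tokens(query))
    return sorted(d for d, v in docs.items() if q & set(tokens(v)))

def match_phrase(query):
    """Quoted query: the tokens must appear consecutively and in order."""
    q = tokens(query)
    hits = []
    for d, v in docs.items():
        t = tokens(v)
        if any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1)):
            hits.append(d)
    return sorted(hits)

def match_keyword(query):
    """keyword field: the single stored term must equal the query exactly."""
    return sorted(d for d, v in docs.items() if v == query)

print(match_any("first second"))      # [1, 2, 3]  (attempt A)
print(match_phrase("first second"))   # [1, 3]     (attempt B)
print(match_keyword("first second"))  # [1]        (keyword mapping)
```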
A second solution, for when name is a text field rather than a keyword field. The only extra requirement is fielddata: true on the name field. The script reads doc['name'], and text fields only support that when fielddata is enabled.
Mapping:
PUT so_test18
{
"mappings" : {
"_doc" : {
"properties" : {
"id" : {
"type" : "long"
},
"name" : {
"type" : "text",
"fielddata": true
}
}
}
}
}
And the search query:
GET /so_test18/_search
{
"query": {
"bool": {
"must": [
{"match_phrase": {"name": "first second"}}
],
"filter": {
"script": {
"script": {
"lang": "painless",
"source": "doc['name'].values.length == 2"
}
}
}
}
}
}
And the response:
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 0.3971361,
"hits" : [
{
"_index" : "so_test18",
"_type" : "_doc",
"_id" : "o1JryGsB4TH56W1hhzGT",
"_score" : 0.3971361,
"_source" : {
"id" : 1,
"name" : "first second"
}
}
]
}
}
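The script filter in the second solution effectively says "the phrase matches and the field has exactly two distinct terms". Below is a simplified Python model of that check. It assumes doc['name'].values exposes the field's distinct terms, since fielddata deduplicates them. Document 4 is hypothetical and added only to show an edge case: a value that repeats a term can also pass the check. The model compares against the number of distinct query terms instead of the hard-coded 2 used in the script.

```python
# Model of: match_phrase on name AND doc['name'].values.length == 2
# Assumption: .values holds the distinct terms of the analyzed field.
docs = {1: "first second", 2: "second first", 3: "first second third",
        4: "first second first"}  # doc 4 is hypothetical, shows a limitation

def tokens(value):
    return value.lower().split()

def contains_phrase(t, q):
    return any(t[i:i + len(q)] == q for i in range(len(t) - len(q) + 1))

def second_solution(query):
    q = tokens(query)
    hits = []
    for d, v in docs.items():
        t = tokens(v)
        distinct_terms = set(t)  # what the script's .values would contain
        if contains_phrase(t, q) and len(distinct_terms) == len(set(q)):
            hits.append(d)
    return sorted(hits)

print(second_solution("first second"))  # [1, 4]
```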
In Elasticsearch 7.1.0, it seems that you can use the keyword analyzer even without creating a special mapping. At least I didn't create one, and the following query does what I need:
curl -H 'Content-Type: application/json' -iX POST 'http://localhost:9200/test/_search' -d '{
"query": {
"query_string": {
"query": "first second",
"analyzer": "keyword"
}
}
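}'
This presumably works because 7.x dynamic mapping indexes each string both as text and as a name.keyword sub-field. query_string searches all eligible fields by default, so the single term "first second" produced by the keyword analyzer can match name.keyword.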

Elasticsearch 5 - how to query an object datatype and a nested array of JSON

I want to query nested data already loaded into Elasticsearch 5, but every query returns nothing. The data consists of an object field and a nested array of JSON objects.
This is the nested datatype, i.e. the team_members array of JSON objects:
[{
"id": 6,
"name": "mike",
"priority": 1
}, {
"id": 7,
"name": "james",
"priority": 2
}]
This is the object datatype, i.e. the availability_slot JSON:
{
"monday": {
"on": true,
"end_time": "15",
"start_time": "9",
"end_time_unit": "pm",
"start_time_unit": "am",
"events_starts_every": 10
}
}
This is my elasticsearch mapping:
{
"meetings_development_20170716013030509": {
"mappings": {
"meeting": {
"properties": {
"account": {"type": "integer"},
"availability_slot": {
"properties": {
"monday": {
"properties": {
"end_time": {"type": "text"},
"end_time_unit": {"type": "text"},
"events_starts_every": {
"type":"integer"
},
"on": {"type": "boolean"},
"start_time": {"type": "text"},
"start_time_unit": {
"type": "text"
}
}
}
}
},
"team_members": {
"type": "nested",
"properties": {
"id": {"type": "integer"},
"name": {"type": "text"},
"priority": {"type": "integer"}
}
}
}
}
}
}
}
I have two queries which are failing for different reasons:
Query 1
This query returns zero hits even though the records exist in Elasticsearch. I discovered that it fails because of the filter:
curl -u elastic:changeme http://172.19.0.4:9200/meetings_development/_search?pretty -d '{"query":{"nested":{"path":"team_members","score_mode":"avg","query":{"bool":{"must":[{"match":{"team_members.name":"mike"}},{"match":{"team_members.priority":1}}],"filter":[{"match":{"account":1}}]}}}}}'
This returns zero results:
{
"took" : 8,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
Query 1 without the filter
The same query without the filter works:
curl -u elastic:changeme http://172.19.0.4:9200/meetings_development/_search?pretty -d '{"query":{"nested":{"path":"team_members","score_mode":"avg","query":{"bool":{"must":[{"match":{"team_members.name":"mike"}},{"match":{"team_members.priority":1}}]}}}}}'
The query above returns 3 hits:
{
"took" : 312,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 2.1451323,
"hits" : [{**results available here**} ]
}
}
Query 2, for the object datatype:
curl -u elastic:changeme http://172.19.0.4:9200/meetings_development/_search?pretty -d '{"query":{"bool":{"must":{"match":{"availability_slot.start_time":1}},"filter":[{"match":{"account":1}}]}}}'
This query returns zero hits, but the data is in Elasticsearch:
{
"took" : 172,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
How do I get both queries to work while filtering by account? Thanks.
This Elasticsearch guide link was very helpful in coming up with the correct Elasticsearch queries shown below.
Query 1, for the nested array of JSON:
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "sales call"
}
},
{
"nested": {
"path": "team_members",
"score_mode": "avg",
"query": {
"bool": {
"must": {
"match": {"team_members.name": "mike"}
}
}
}
}
}
],
"filter": {
"term": {
"account": 1
}
}
}
}
}
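The key change is that the account filter now sits in the outer bool query, outside the nested query. Clauses inside a nested query are evaluated against the team_members sub-documents, which have no account field, so a filter on account placed there can never match. Then pass the query to Elasticsearch like this: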
curl http://172.19.0.4:9200/meetings_development/_search?pretty -d '{"query":{"bool":{"must":[{"match":{"name":"sales call"}},{"nested":{"path":"team_members","score_mode":"avg","query":{"bool":{"must":{"match":{"team_members.name":"mike"}}}}}}],"filter":{"term":{"account":1}}}}}'
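The scoping rule behind this fix can be modelled in a few lines of Python. This is a simplified illustration, not how Elasticsearch evaluates queries internally. The meeting document is hypothetical, exact string equality stands in for match, and the helper names are made up.

```python
# Simplified model of nested scope: clauses inside "nested" see each
# team_members object, not the parent document.
meeting = {  # hypothetical document
    "account": 1,
    "name": "sales call",
    "team_members": [
        {"id": 6, "name": "mike", "priority": 1},
        {"id": 7, "name": "james", "priority": 2},
    ],
}

def account_is_1(obj):
    return obj.get("account") == 1

def member_is_mike(obj):
    return obj.get("name") == "mike"

# Filter inside the nested query (the failing query 1): checked per member.
inside = any(member_is_mike(m) and account_is_1(m) for m in meeting["team_members"])
# Filter in the outer bool (the working query): checked on the parent.
outside = account_is_1(meeting) and any(member_is_mike(m) for m in meeting["team_members"])

print(inside)   # False: no team member has an "account" field
print(outside)  # True
```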
Correct syntax for query 2, for the object datatype (note the full path availability_slot.monday.start_time and the value "9"):
{
"query": {
"bool": {
"must": {
"match": {"availability_slot.monday.start_time": "9"}
},
"filter": [{
"match": {"account": 1}
}]
}
}
}
You then pass this to Elasticsearch like this:
curl http://172.19.0.4:9200/meetings_development/_search?pretty -d '{"query":{"bool":{"must":{"match":{"availability_slot.monday.start_time":"9"}},"filter":[{"match":{"account":1}}]}}}'
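The original query 2 had two problems. Object fields are indexed under dotted paths that include every level, so availability_slot.start_time does not exist and only availability_slot.monday.start_time does. The stored value is also "9", not 1. The small Python sketch below flattens the example object into such paths to make this visible. The flatten helper is hypothetical.

```python
# Object fields are addressed by dotted paths through every level of nesting.
def flatten(obj, prefix=""):
    """Return {dotted.path: leaf_value} for a nested dict."""
    paths = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            paths.update(flatten(value, path))
        else:
            paths[path] = value
    return paths

doc = {"availability_slot": {"monday": {
    "on": True, "end_time": "15", "start_time": "9",
    "end_time_unit": "pm", "start_time_unit": "am", "events_starts_every": 10}}}

paths = flatten(doc)
print("availability_slot.start_time" in paths)       # False: the failing path
print(paths["availability_slot.monday.start_time"])  # 9
```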

Aggregations in Elasticsearch split strings instead of using the whole value

Given the following simple mapping:
curl -XPUT localhost:9200/transaciones/ -d '{
"mappings": {
"ventas": {
"properties": {
"tipo": { "type": "string" },
"cantidad": { "type": "double" }
}
}
}
}'
Adding data:
curl -XPUT localhost:9200/transaciones/ventas/1 -d '{
"tipo": "Ingreso bancario",
"cantidad": 80
}'
curl -XPUT localhost:9200/transaciones/ventas/2 -d '{
"tipo": "Ingreso bancario",
"cantidad": 10
}'
curl -XPUT localhost:9200/transaciones/ventas/3 -d '{
"tipo": "PayPal",
"cantidad": 30
}'
curl -XPUT localhost:9200/transaciones/ventas/4 -d '{
"tipo": "Tarjeta de credito",
"cantidad": 130
}'
curl -XPUT localhost:9200/transaciones/ventas/5 -d '{
"tipo": "Tarjeta de credito",
"cantidad": 130
}'
When I try to get the aggs with:
curl -XGET localhost:9200/transaciones/ventas/_search?pretty=true -d '{
"size": 0,
"aggs": {
"tipos_de_venta": {
"terms": {
"field": "tipo"
}
}
}
}'
The response is:
{
"took" : 15,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 5,
"max_score" : 0.0,
"hits" : [ ]
},
"aggregations" : {
"tipos_de_venta" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "bancario",
"doc_count" : 2
}, {
"key" : "credito",
"doc_count" : 2
}, {
"key" : "de",
"doc_count" : 2
}, {
"key" : "ingreso",
"doc_count" : 2
}, {
"key" : "tarjeta",
"doc_count" : 2
}, {
"key" : "paypal",
"doc_count" : 1
} ]
}
}
}
As you can see, it splits the string Tarjeta de credito into the lowercased terms tarjeta, de and credito.
How can I aggregate on the entire string without setting not_analyzed on tipo in the mapping? My desired buckets are Ingreso bancario, PayPal and Tarjeta de credito, so the response would look something like this:
"aggregations" : {
"tipos_de_venta" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [ {
"key" : "Ingreso bancario",
"doc_count" : 2
}, {
"key" : "PayPal",
"doc_count" : 1
}, {
"key" : "Tarjeta de credito",
"doc_count" : 2
} ]
}
}
PS: I'm using ES 2.3.2
This happens because your tipo field is an analyzed string. At index time it is split into lowercase terms, and the terms aggregation builds its buckets from those terms. The right way to get what you want is to add a not_analyzed sub-field:
curl -XPUT localhost:9200/transaciones/_mapping/ventas -d '{
"properties": {
"tipo": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}'
Then you need to reindex your documents and finally you'll be able to run this and get the desired results:
curl -XGET localhost:9200/transaciones/ventas/_search?pretty=true -d '{
"size": 0,
"aggs": {
"tipos_de_venta": {
"terms": {
"field": "tipo.raw"
}
}
}
}'
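The difference between tipo and tipo.raw can be sketched in Python. The sketch uses lowercase-plus-whitespace-split as a rough stand-in for the standard analyzer, which gives the same terms for these values. It models a terms aggregation as counting each distinct term once per document, ordered by count and then by key. That ordering matches the response shown in the question.

```python
from collections import Counter

tipos = ["Ingreso bancario", "Ingreso bancario", "PayPal",
         "Tarjeta de credito", "Tarjeta de credito"]

def analyzed_terms(value):
    # Rough stand-in for the standard analyzer: lowercase, split on whitespace.
    return set(value.lower().split())

def terms_buckets(values, analyze):
    counts = Counter()
    for v in values:
        counts.update(analyze(v))  # each distinct term counts once per document
    # Order by doc count descending, then by key ascending.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

print(terms_buckets(tipos, analyzed_terms))  # buckets on "tipo"
print(terms_buckets(tipos, lambda v: {v}))   # buckets on "tipo.raw": whole value
```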
UPDATE
If you really don't want to create a not_analyzed field, there is another way, using a script in the terms aggregation, but it can severely hurt your cluster's performance:
curl -XGET localhost:9200/transaciones/ventas/_search?pretty=true -d '{
"size": 0,
"aggs": {
"tipos_de_venta": {
"terms": {
"script": "_source.tipo"
}
}
}
}'

ElasticSearch - searching different doc_types with the same field name but different analyzers

Let's say I make a simple ElasticSearch index:
curl -XPUT 'http://localhost:9200/test/' -d '{
"settings": {
"analysis": {
"char_filter": {
"de_acronym": {
"type": "mapping",
"mappings": [".=>"]
}
},
"analyzer": {
"analyzer1": {
"type": "custom",
"tokenizer": "keyword",
"char_filter": ["de_acronym"]
}
}
}
}
}'
Then I create two doc_types that have the same property name but analyze it slightly differently:
curl -XPUT 'http://localhost:9200/test/_mapping/docA' -d '{
"docA": {
"properties": {
"name": {
"type": "string",
"analyzer": "simple"
}
}
}
}'
curl -XPUT 'http://localhost:9200/test/_mapping/docB' -d '{
"docB": {
"properties": {
"name": {
"type": "string",
"analyzer": "analyzer1"
}
}
}
}'
Next, let's say I put a document in each doc_type with the same name:
curl -XPUT 'http://localhost:9200/test/docA/1' -d '{ "name" : "U.S. Army" }'
curl -XPUT 'http://localhost:9200/test/docB/1' -d '{ "name" : "U.S. Army" }'
Let's try to search for "U.S. Army" in both doc types at the same time:
curl -XGET 'http://localhost:9200/test/_search?pretty' -d '{
"query": {
"match_phrase": {
"name": {
"query": "U.S. Army"
}
}
}
}'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.5,
"hits" : [ {
"_index" : "test",
"_type" : "docA",
"_id" : "1",
"_score" : 1.5,
"_source":{ "name" : "U.S. Army" }
} ]
}
}
I only get one result! I get the other result when I specify docB's analyzer:
curl -XGET 'http://localhost:9200/test/_search?pretty' -d '
{
"query": {
"match_phrase": {
"name": {
"query": "U.S. Army",
"analyzer": "analyzer1"
}
}
}
}'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "test",
"_type" : "docB",
"_id" : "1",
"_score" : 1.0,
"_source":{ "name" : "U.S. Army" }
} ]
}
}
I was under the impression that ES would search each doc_type with the appropriate analyzer. Is there a way to do this?
The ElasticSearch docs say that the search analyzer is chosen in this order of precedence:
1) The analyzer defined in the query itself, else
2) The analyzer defined in the field mapping, else
...
In this case, is ElasticSearch arbitrarily choosing which field mapping to use?
Take a look at this issue on GitHub, which seems to have started from this post in the ES Google Group. I believe it answers your question:
if its in a filtered query, we can't infer it, so we simply pick one of those and use its analysis settings
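A simplified Python model of the two analyzers shows why each query finds only one document: the terms produced at query time must line up with the terms stored at index time. The functions are hypothetical stand-ins. simple_analyzer approximates the built-in simple analyzer (split on non-letters, then lowercase), handling ASCII only here. analyzer1 mirrors the custom analyzer from the question: it deletes dots and keeps the rest as one token, without lowercasing. Comparing whole term lists is a simplification of a phrase match.

```python
import re

def simple_analyzer(text):
    # Approximation of "simple": split on non-letters, lowercase (ASCII only).
    return [t.lower() for t in re.split(r"[^A-Za-z]+", text) if t]

def analyzer1(text):
    # Custom analyzer1: char filter ".=>" deletes dots, then the keyword
    # tokenizer emits the remaining string as one token (no lowercasing).
    return [text.replace(".", "")]

indexed = {"docA": simple_analyzer("U.S. Army"), "docB": analyzer1("U.S. Army")}

for name, query_analyzer in [("simple", simple_analyzer), ("analyzer1", analyzer1)]:
    q = query_analyzer("U.S. Army")
    hits = [doc for doc, terms in indexed.items() if terms == q]
    print(name, q, hits)
# simple ['u', 's', 'army'] ['docA']
# analyzer1 ['US Army'] ['docB']
```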
