I have the following indexed document:
curl -XGET "http://127.0.0.1:8200/logstash-test/1/_search"
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 1,
"hits": [
{
"_index": "logstash-test",
"_type": "1",
"_id": "AVthzksHqNe69jLmmCEp",
"_score": 1,
"_source": {
"foo": "bar2"
}
},
{
"_index": "logstash-test",
"_type": "1",
"_id": "AVthzlbfqNe69jLmmCSr",
"_score": 1,
"_source": {
"foo": "bar3"
}
},
{
"_index": "logstash-test",
"_type": "1",
"_id": "AVthwg4_qNe69jLmlStd",
"_score": 1,
"_source": {
"foo": "bar"
}
},
{
"_index": "logstash-test",
"_type": "1",
"_id": "AVth0IS1qNe69jLmmMpZ",
"_score": 1,
"_source": {
"foo": "bar4.foo_bar.foo"
}
}
]
}
}
I want to search for foo=bar2 or foo=bar3 or foo=bar4.foo_bar.foo:
curl -XPOST "http://127.0.0.1:8200/logstash-test/1/_search" -d '{"query":{"bool":{"filter":[{"terms":{"foo":["bar3","bar2","bar4.foo_bar.foo"]}}]}}}'
But bar4.foo_bar.foo does not match.
Thank you.
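The foo field is an analyzed text field, so the tokens stored in the index can differ from the full original string. Since you are searching for exact terms, use the keyword sub-field available on foo, as shown below:
curl -XPOST "http://127.0.0.1:8200/logstash-test/1/_search" -d '{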
"query": {
"bool": {
"filter": [
{
"terms": {
"foo.keyword": [
"bar3",
"bar2",
"bar4.foo_bar.foo"
]
}
}
]
}
}
}'
You can read more about multi-fields here
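If you build the request from a script, a minimal Python sketch of the same body could look like this. The helper name terms_filter_query is only illustrative, and it assumes the index has a foo.keyword sub-field (as created by the default dynamic mapping for strings in recent versions):

```python
import json


def terms_filter_query(field, values):
    """Build a bool/filter query matching documents whose `field` equals any of `values`."""
    return {"query": {"bool": {"filter": [{"terms": {field: list(values)}}]}}}


# Assumption: "foo.keyword" exists in the mapping of the target index.
body = terms_filter_query("foo.keyword", ["bar3", "bar2", "bar4.foo_bar.foo"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the _search request shown above.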
Method #2
You can also solve it by using a different analyzer (e.g. the whitespace analyzer) for the foo field when defining its mapping.
PUT logstash-test
{
"mappings": {
"1": {
"properties": {
"foo": {
"type": "text",
"analyzer": "whitespace"
}
}
}
}
}
But as you are searching on exact terms, method #1 is preferred over method #2.
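To see why the whitespace analyzer helps, here is a rough Python approximation. It only mimics the splitting behaviour (break on whitespace, nothing else) and is not the real analyzer; the function name whitespace_tokens is made up for illustration:

```python
def whitespace_tokens(text):
    """Approximate the whitespace analyzer: split on runs of whitespace only."""
    # str.split() with no arguments splits on any whitespace and drops empty strings.
    return text.split()


print(whitespace_tokens("bar4.foo_bar.foo"))
print(whitespace_tokens("bar2 bar3"))
```

Because the value contains no whitespace, it stays a single token, so a terms query for the full string can match it.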
I have several documents that look like the following stored in my Elasticsearch index:
PUT tests
{
"mappings": {
"_doc": {
"dynamic": false,
"properties": {
"objects": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"text": {
"type": "text"
}
}
}
}
}
PUT tests/_doc/1
{
"text": "lel",
"objects": ["A"]
}
PUT tests/_doc/2
{
"text": "lol",
"objects": ["B"]
}
PUT tests/_doc/3
{
"text": "lil",
"objects": ["C"]
}
PUT tests/_doc/4
{
"text": "lul",
"objects": ["A", "B", "C"]
}
I want to query for objects with the following query:
GET _search
{
"query": {
"terms": {
"objects.keyword": ["A", "B", "C"]
}
}
}
The result includes all four sample documents I provided.
My question is simply whether I can boost a document that is a full match (all keywords present in its objects array) above documents that are only partial matches and, if so, how. I could not find anything about this in the Elasticsearch documentation.
This is the result I am currently receiving:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 11,
"successful": 11,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 1,
"hits": [
{
"_index": "tests",
"_type": "_doc",
"_id": "2",
"_score": 1,
"_source": {
"text": "lol",
"objects": [
"B"
]
}
},
{
"_index": "tests",
"_type": "_doc",
"_id": "4",
"_score": 1,
"_source": {
"text": "lul",
"objects": [
"A",
"B",
"C"
]
}
},
{
"_index": "tests",
"_type": "_doc",
"_id": "1",
"_score": 1,
"_source": {
"text": "lel",
"objects": [
"A"
]
}
},
{
"_index": "tests",
"_type": "_doc",
"_id": "3",
"_score": 1,
"_source": {
"text": "lil",
"objects": [
"C"
]
}
}
]
}
}
I think your best bet is using a bool query with should and minimum_should_match: 1.
GET _search
{
"query": {
"bool": {
"should": [
{
"term": {
"objects.keyword": "A"
}
},
{
"term": {
"objects.keyword": "B"
}
},
{
"term": {
"objects.keyword": "C"
}
}
],
"minimum_should_match": 1
}
}
}
Results:
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 6,
"successful": 6,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 4,
"max_score": 1.5686159,
"hits": [
{
"_index": "tests",
"_type": "_doc",
"_id": "4",
"_score": 1.5686159,
"_source": {
"text": "lul",
"objects": [
"A",
"B",
"C"
]
}
},
{
"_index": "tests",
"_type": "_doc",
"_id": "1",
"_score": 0.2876821,
"_source": {
"text": "lel",
"objects": [
"A"
]
}
},
{
"_index": "tests",
"_type": "_doc",
"_id": "3",
"_score": 0.2876821,
"_source": {
"text": "lil",
"objects": [
"C"
]
}
},
{
"_index": "tests",
"_type": "_doc",
"_id": "2",
"_score": 0.18232156,
"_source": {
"text": "lol",
"objects": [
"B"
]
}
}
]
}
}
EDIT: Here's why, as explained by the docs (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-bool-query.html):
The bool query takes a more-matches-is-better approach, so the score from each matching must or should clause will be added together to provide the final _score for each document.
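If the list of keywords is dynamic, the should clauses can be generated. This is a small Python sketch under that assumption; should_terms_query is a hypothetical helper, not part of any client library:

```python
import json


def should_terms_query(field, values, minimum_should_match=1):
    """Build one `term` clause per value so each match adds to the score."""
    clauses = [{"term": {field: value}} for value in values]
    return {
        "query": {
            "bool": {
                "should": clauses,
                "minimum_should_match": minimum_should_match,
            }
        }
    }


query = should_terms_query("objects.keyword", ["A", "B", "C"])
print(json.dumps(query, indent=2))
```

A document matching all three clauses collects three clause scores, while a partial match collects fewer, which is what pushes document 4 to the top in the results above.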
I have the following field in my index
field1:{key:value}
Is it possible to sort my query results by the sum of the values in field1?
Thanks
Here's one way you could do this, assuming you know the fields ahead of time. It should be possible with some minor refinements if you need to wildcard the fields. This assumes the sub-fields of field1 are numeric.
Example mapping:
"test": {
"mappings": {
"type1": {
"properties": {
"field1": {
"properties": {
"key1": {
"type": "integer"
},
"key2": {
"type": "integer"
}
}
}
}
}
}
}
Default results:
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "test",
"_type": "type1",
"_id": "AV8O7956gIcGI2d5A_5g",
"_score": 1,
"_source": {
"field1": {
"key1": 11,
"key2": 17
}
}
},
{
"_index": "test",
"_type": "type1",
"_id": "AV8O78FqgIcGI2d5A_5f",
"_score": 1,
"_source": {
"field1": {
"key1": 5,
"key2": 6
}
}
}
]
}
Query with script:
GET /test/_search
{
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [
{
"script_score": {
"script": "return (doc['field1.key1'].value + doc['field1.key2'].value) * -1"
}
}
]
}
}
}
The script negates the sum, so the document with the smallest sum gets the best (least negative) score:
{
"took": 18,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": -11,
"hits": [
{
"_index": "test",
"_type": "type1",
"_id": "AV8O78FqgIcGI2d5A_5f",
"_score": -11,
"_source": {
"field1": {
"key1": 5,
"key2": 6
}
}
},
{
"_index": "test",
"_type": "type1",
"_id": "AV8O7956gIcGI2d5A_5g",
"_score": -28,
"_source": {
"field1": {
"key1": 11,
"key2": 17
}
}
}
]
}
}
Hopefully this gives you the gist of whatever specific scoring logic you need.
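To check the ordering logic outside Elasticsearch, the same arithmetic can be reproduced in plain Python on the two sample documents. This is only a sketch of what the script computes; the function name negated_sum is made up here:

```python
docs = [
    {"_id": "AV8O7956gIcGI2d5A_5g", "field1": {"key1": 11, "key2": 17}},
    {"_id": "AV8O78FqgIcGI2d5A_5f", "field1": {"key1": 5, "key2": 6}},
]


def negated_sum(doc):
    """Mirror the script: (key1 + key2) * -1."""
    return (doc["field1"]["key1"] + doc["field1"]["key2"]) * -1


# Highest score first, as in a search response sorted by _score.
ranked = sorted(docs, key=negated_sum, reverse=True)
scores = [negated_sum(d) for d in ranked]
print(scores)  # [-11, -28]
```

Dropping the multiplication by -1 would rank the largest sum first instead.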
Get the results of only those documents which contain '#test' and ignore the documents that contain just 'test' in elasticsearch
People may gripe at you about this question, so I'll note that it was in response to my comment on this post.
You're probably going to want to read up on analysis in Elasticsearch, as well as match queries versus term queries.
Anyway, the convention here is to use a .raw sub-field on a string field. That way, if you want to do searches involving analysis, you can use the base field, but if you want to search for exact (un-analyzed) values, you can use the sub-field.
So here is a simple mapping that accomplishes this:
PUT /test_index
{
"mappings": {
"doc": {
"properties": {
"post_text": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
Now if I add these two documents:
PUT /test_index/doc/1
{
"post_text": "#test"
}
PUT /test_index/doc/2
{
"post_text": "test"
}
A "match" query against the base field will return both:
POST /test_index/_search
{
"query": {
"match": {
"post_text": "#test"
}
}
}
...
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.5945348,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 0.5945348,
"_source": {
"post_text": "#test"
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": 0.5945348,
"_source": {
"post_text": "test"
}
}
]
}
}
But the "term" query below will only return the one:
POST /test_index/_search
{
"query": {
"term": {
"post_text.raw": "#test"
}
}
}
...
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1,
"_source": {
"post_text": "#test"
}
}
]
}
}
Here is the code I used to test it:
http://sense.qbox.io/gist/2f0fbb38e2b7608019b5b21ebe05557982212ac7
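The difference between the two queries can be imitated with a toy model in Python. The analyze function below is a crude stand-in for analysis (lowercase, keep only word characters) and is an assumption for illustration, not the real analyzer:

```python
import re

docs = {1: "#test", 2: "test"}


def analyze(text):
    """Crude stand-in for an analyzer: lowercase and keep runs of word characters."""
    return re.findall(r"\w+", text.lower())


def match_query(docs, query_text):
    """Analyzed search: the query and the documents are both tokenized first."""
    wanted = set(analyze(query_text))
    return sorted(i for i, text in docs.items() if wanted & set(analyze(text)))


def term_query_raw(docs, value):
    """Exact search on the raw value: no analysis on either side."""
    return sorted(i for i, text in docs.items() if text == value)


print(match_query(docs, "#test"))     # [1, 2]
print(term_query_raw(docs, "#test"))  # [1]
```

In this model the "#" is dropped during analysis, so both documents share the token "test", while the raw comparison keeps it.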
How do I instruct elasticsearch to return all documents which have data in one of the following fields: ['field1','field2']?
I have tried:
{
'query': {
'bool':{
'must':[
{
'multi_match':{
'fields':['field1','field2'],
'operator':'AND',
'tie_breaker':1.0,
'query': '*',
'type':'cross_fields'
}
}
]
}
}
}
I also tried:
{
"query":{
"wildcard":
{
"field1":"*"
}
}
}
which works, but:
{
"query":{
"wildcard":
{
"field*":"*"
}
}
}
does not.
You can do it with two exists filters combined under should in a bool filter.
As an example, I set up a simple index and gave it some data:
PUT /test_index
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"field1":"foo","field2":"bar"}
{"index":{"_id":2}}
{"field2":"foo","field3":"bar"}
{"index":{"_id":3}}
{"field3":"foo","field4":"bar"}
{"index":{"_id":4}}
{"field4":"foo","field5":"bar"}
If I want to find all documents that have "field1" or "field3", I can do this:
POST /test_index/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"should": [
{ "exists": { "field": "field1" } },
{ "exists": { "field": "field3" } }
]
}
}
}
}
}
It returns what I expect:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1,
"_source": {
"field1": "foo",
"field2": "bar"
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": 1,
"_source": {
"field2": "foo",
"field3": "bar"
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 1,
"_source": {
"field3": "foo",
"field4": "bar"
}
}
]
}
}
Here's the code I used:
http://sense.qbox.io/gist/991b828de250e5125fd372bf7e6b066acec55fcd
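For reference, the selection logic of the two exists filters can be mirrored in Python on the same four documents. This is a simplified model that treats a field as existing when it is present with a non-null value; has_any_field is a hypothetical name:

```python
docs = {
    1: {"field1": "foo", "field2": "bar"},
    2: {"field2": "foo", "field3": "bar"},
    3: {"field3": "foo", "field4": "bar"},
    4: {"field4": "foo", "field5": "bar"},
}


def has_any_field(doc, fields):
    """True if at least one of `fields` is present with a non-null value."""
    return any(doc.get(name) is not None for name in fields)


matching_ids = [i for i, doc in docs.items() if has_any_field(doc, ["field1", "field3"])]
print(matching_ids)  # [1, 2, 3]
```

Document 4 is excluded because it has neither field1 nor field3, matching the three hits in the response above.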
I have this large set of data and I want a sample that I can use in a graph. For this I don't need all of the data, I need every Nth item.
For instance if I have 4000 results, and I only need 800 results, I want to be able to get every 5th result.
So something like: get, skip, skip, skip, skip, get, skip, skip, skip, ...
I was wondering if such a thing is possible in Elasticsearch?
You're better off using a scripted filter. Otherwise you're needlessly using the score. Filters are just like queries, but they don't use scoring.
POST /test_index/_search
{
"query": {
"filtered": {
"filter": {
"script": {
"script": "doc['unique_counter'].value % n == 0",
"params" : {
"n" : 5
}
}
}
}
}
}
You're also better off not using dynamic scripting in real world usage.
That said, you probably want to take a look at aggregations for graphing analytical information about your data rather than taking an arbitrary sample.
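The modulo test used in the scripted filter above can be tried out in Python first. The sketch assumes, as the script does, that every document carries a sequential numeric unique_counter field; every_nth is an illustrative name:

```python
def every_nth(docs, n):
    """Keep documents whose counter is a multiple of n, like the scripted filter."""
    return [doc for doc in docs if doc["unique_counter"] % n == 0]


# 4000 documents with counters 1..4000, as in the example from the question.
docs = [{"unique_counter": i} for i in range(1, 4001)]
sample = every_nth(docs, 5)
print(len(sample))  # 800
```

If the counter has gaps or is not evenly distributed, the sample will not be exactly one in n.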
One way you could do it is with random scoring. It won't give you precisely every nth item according to a rigid ordering, but if you can relax that requirement this trick should do nicely.
To test it I set up a simple index (I mapped "doc_id" to "_id" just so the documents would have some contents, so that part isn't required, in case that's not obvious):
PUT /test_index
{
"mappings": {
"doc": {
"_id": {
"path": "doc_id"
}
}
}
}
Then I indexed ten simple documents:
POST /test_index/doc/_bulk
{"index":{}}
{"doc_id":1}
{"index":{}}
{"doc_id":2}
{"index":{}}
{"doc_id":3}
{"index":{}}
{"doc_id":4}
{"index":{}}
{"doc_id":5}
{"index":{}}
{"doc_id":6}
{"index":{}}
{"doc_id":7}
{"index":{}}
{"doc_id":8}
{"index":{}}
{"doc_id":9}
{"index":{}}
{"doc_id":10}
Now I can pull back three random documents like this:
POST /test_index/_search
{
"size": 3,
"query": {
"function_score": {
"functions": [
{
"random_score": {
"seed": "some seed"
}
}
]
}
}
}
...
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 10,
"max_score": 0.93746644,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 0.93746644,
"_source": {
"doc_id": 1
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "10",
"_score": 0.926947,
"_source": {
"doc_id": 10
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "5",
"_score": 0.79400194,
"_source": {
"doc_id": 5
}
}
]
}
}
Or a different random three like this:
POST /test_index/_search
{
"size": 3,
"query": {
"function_score": {
"functions": [
{
"random_score": {
"seed": "some other seed"
}
}
]
}
}
}
...
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 10,
"max_score": 0.817295,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "4",
"_score": 0.817295,
"_source": {
"doc_id": 4
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "8",
"_score": 0.469319,
"_source": {
"doc_id": 8
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 0.4374538,
"_source": {
"doc_id": 3
}
}
]
}
}
Hopefully it's clear how to generalize this method to what you need. Just take out however many documents you want, in however many chunks make it performant.
Here is all the code I used to test:
http://sense.qbox.io/gist/a02d4da458365915f5e9cf6ea80546d2dfabc75d
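The idea behind the seed can be illustrated with Python's own random module. This is only an analogy for how a fixed seed gives a repeatable selection; it does not reproduce the scores or the documents Elasticsearch would pick:

```python
import random


def sample_ids(doc_ids, size, seed):
    """Pick `size` ids pseudo-randomly; the same seed always gives the same pick."""
    rng = random.Random(seed)
    return rng.sample(doc_ids, size)


doc_ids = list(range(1, 11))
first = sample_ids(doc_ids, 3, "some seed")
again = sample_ids(doc_ids, 3, "some seed")
other = sample_ids(doc_ids, 3, "some other seed")
print(first, again, other)
```

Repeating a request with the same seed is what makes it possible to page through one random ordering in chunks.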
EDIT: Actually, now that I think about it, you could also use scripted scoring to get precisely every nth item, if you set it up right. Maybe something like:
POST /test_index/_search
{
"size": 3,
"query": {
"function_score": {
"functions": [
{
"script_score": {
"script": "if(doc['doc_id'].value % 3 == 0){ return 1 }; return 0;"
}
}
]
}
}
}
...
{
"took": 13,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 10,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 1,
"_source": {
"doc_id": 3
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "6",
"_score": 1,
"_source": {
"doc_id": 6
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "9",
"_score": 1,
"_source": {
"doc_id": 9
}
}
]
}
}
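One thing to watch with the scoring approach is that non-matching documents still get a score of 0 and are not excluded. A small Python model of the same logic shows the effect; the stable sort here is an assumption standing in for however ties are actually ordered:

```python
def nth_score(doc_id, n=3):
    """Mirror the script: 1 for every nth document, otherwise 0."""
    return 1 if doc_id % n == 0 else 0


def top_hits(doc_ids, size, n=3):
    """Sort by score, highest first, and keep the first `size` ids."""
    ranked = sorted(doc_ids, key=lambda i: nth_score(i, n), reverse=True)
    return ranked[:size]


doc_ids = list(range(1, 11))
print(top_hits(doc_ids, 3))  # [3, 6, 9]
print(top_hits(doc_ids, 5))  # [3, 6, 9, 1, 2]
```

So size has to be no larger than the number of matching documents, or the zero-scored ones start filling the page; a filter avoids that problem.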