Filter by terms in an array - elasticsearch

I'm trying to filter by terms within an array on elasticsearch documents. This is what the documents look like:
{
"name": "Foo",
"id": 10,
"industries": ["Tech", "Fashion"],
...
}
But for the various filter-based queries I try, I've gotten zero results. e.g.:
$ curl -XGET 'http://localhost:9200/_search?pretty=true' -d '
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [{
"terms": {
"industries": ["Tech"],
"execution": "or"
}
}]
}
},
"query": {"match_all": {}}
}
},
"from": 0,
"size": 20
}
'
I've tried about a dozen different queries against various simplifications and filter clauses, e.g. here's a simplified one:
$ curl -XGET 'http://localhost:9200/_search?pretty=true' -d '
{
"query": {
"filtered": {
"filter": {
"terms": {
"industries": ["Tech"],
"execution": "or"
}
}
}
},
"from": 0,
"size": 20
}
'
What am I missing here?

What analyzer are you using for the industries field? If you are using the default, it will lowercase and tokenize your strings, which would explain why your filters aren't matching those documents (e.g., the filter looks for "Tech" when only "tech" exists in the index). Setting the field's mapping to not_analyzed (or using the multi-field option) might solve your problem.
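To make the mismatch concrete, here is a rough local simulation in Python. It is only a sketch: simple_analyze is a hypothetical stand-in that lowercases and splits on non-alphanumeric characters, not Elasticsearch's actual analyzer, and terms_filter_matches is an illustrative helper, not a client API.

```python
import re

def simple_analyze(text):
    """Hypothetical stand-in for a default analyzer: tokenize and lowercase."""
    return [tok.lower() for tok in re.findall(r"[A-Za-z0-9]+", text)]

def indexed_terms(values, analyzed=True):
    """Return the set of terms that would end up in the index for a field."""
    terms = set()
    for value in values:
        if analyzed:
            terms.update(simple_analyze(value))
        else:
            # not_analyzed: the whole value is stored as a single exact term
            terms.add(value)
    return terms

def terms_filter_matches(values, wanted, analyzed=True):
    """A terms filter compares the wanted terms verbatim against indexed terms."""
    return bool(indexed_terms(values, analyzed) & set(wanted))

industries = ["Tech", "Fashion"]
print(terms_filter_matches(industries, ["Tech"]))                  # analyzed field, exact-case term
print(terms_filter_matches(industries, ["tech"]))                  # analyzed field, lowercase term
print(terms_filter_matches(industries, ["Tech"], analyzed=False))  # not_analyzed field
```

Under these assumptions, the filter on "Tech" only matches once the field is indexed without analysis, or once the filter term is lowercased to match what the analyzer produced.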

Related

elasticsearch how do i query (search) in single document?

Assuming that the index's name is index and document 1's id is "1",
how can I query within a single document?
Something like this:
GET index/_search
{
"query": {
"id": "1",
"terms": ["is this text in document 1?"]
}
}
or
GET index/_doc/1/_search
{
...
}
As far as I have found,
GET test/_doc/_search
{
"query": {
"terms" : {
"_id" : ["1"]
}
}
}
This will get the document with id "1", but I cannot perform any further queries on it.
The reason I want to query inside a single document is that my app uses a live-news view:
once news is retrieved from the server, I want to search it in Elasticsearch for keyword highlighting and spam filtering.
You have to compose your query with a Boolean query.
The best approach is to put the id query under filter, because filter clauses have no effect on scoring. You can then add queries under must, must_not and should, according to your needs:
GET index/_search
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"term": {
"field": "value"
}
}
],
"must_not": [],
"should": [],
"filter": [
{
"terms": {"_id": ["1"]}
}
]
}
}
}
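As a sketch of how the app could build this body per document, here is a small Python helper. The function name single_doc_query and the "text" field used in the usage line are assumptions for illustration only; the resulting dict would be sent as the search body with whatever client you use.

```python
def single_doc_query(doc_id, must_clauses, size=10):
    """Build a bool query restricted to one document id.

    The id restriction sits under "filter" so it does not affect scoring;
    the caller's clauses go under "must".
    """
    return {
        "from": 0,
        "size": size,
        "query": {
            "bool": {
                "must": list(must_clauses),
                "filter": [{"terms": {"_id": [doc_id]}}],
            }
        },
    }

# "text" is a hypothetical field name; replace it with a field from your mapping.
body = single_doc_query("1", [{"match": {"text": "is this text in document 1?"}}])
print(body)
```

If the response has no hits, the text was not found in that document; one hit means it matched.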

Elasticsearch search in documents with certain values for a field

I have an index with following document structure with 5 fields. I have written a search query as follows :
{
"query": {
"query_string": {
"fields": [
"field1.keyword",
"field2.keyword",
"field3.keyword"
],
"query": "*abc*"
}
},
"from": 0,
"size": 1000
}
This works fine, but as a new requirement I have to search only in documents where field4 has one of a given set of values, say (1, 2, 3), and omit the rest of the documents.
I can also obtain the list of field4 values to be omitted, as they are present in the db with skip status.
Please suggest a solution. Thanks in advance.
I suggest using a filter query inside a bool query to match the docs that meet the condition.
{
"query": {
"bool": {
"must": {
"query_string": {
"fields": [
"field1.keyword",
"field2.keyword",
"field3.keyword"
],
"query": "*abc*"
}
},
"filter": {
"terms": {
"field4.keyword": [1, 2, 3]
}
}
}
}
}
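Since the question mentions that the list of values to skip is also available, here is a hedged Python sketch that builds either variant: a terms clause under filter for the allowed values, or a terms clause under must_not for the skipped ones. The helper name build_search and its parameters are illustrative assumptions; the filter field defaults to the field4.keyword name used in the query above.

```python
def build_search(pattern, fields, allowed=None, skipped=None,
                 filter_field="field4.keyword", size=1000):
    """Build a bool query: query_string under must, value restrictions around it."""
    bool_query = {
        "must": {"query_string": {"fields": list(fields), "query": pattern}}
    }
    if allowed:
        # Keep only documents whose field has one of the allowed values.
        bool_query["filter"] = {"terms": {filter_field: list(allowed)}}
    if skipped:
        # Drop documents whose field has one of the skipped values.
        bool_query["must_not"] = {"terms": {filter_field: list(skipped)}}
    return {"query": {"bool": bool_query}, "from": 0, "size": size}

search_fields = ["field1.keyword", "field2.keyword", "field3.keyword"]
print(build_search("*abc*", search_fields, allowed=[1, 2, 3]))
print(build_search("*abc*", search_fields, skipped=[4, 5]))
```

Which variant is better depends on which list is shorter and easier to keep up to date; both clauses run without affecting the score.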

To find the distinct fields in an elastic search query

I need the values of only one field and there are duplicate values in it.
POST _search
{
"query": {
"bool": {
"must": [
{"term": {
"report": {
"value": "some_value"
}
}}
]
}
},
"fields": [
"field_name"
]
}
I need only the distinct values of field_name.
Try adding a terms aggregation to your query: each bucket it returns is keyed by one distinct value of the field. If you also want a sample document per value, a top_hits sub-aggregation can be nested inside it. Add this to the request body alongside "query":
"aggs": {
"values": {
"terms": {
"field": "your_field"
}
}
}
This SO could be helpful as well.
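For reading the distinct values back out, here is a minimal Python sketch. The helper names and the sample response are made up for illustration; the response shape (aggregations, then the aggregation name, then buckets with key and doc_count) is what a terms aggregation returns. Note that a terms aggregation returns only the top 10 buckets unless "size" is set.

```python
def distinct_values_request(field, query=None, max_values=100):
    """Build a search body that returns no hits, only a terms aggregation."""
    body = {
        "size": 0,  # skip the hits, we only want the buckets
        "aggs": {"values": {"terms": {"field": field, "size": max_values}}},
    }
    if query is not None:
        body["query"] = query
    return body

def distinct_values(response, agg_name="values"):
    """Extract the bucket keys (the distinct field values) from a response."""
    return [bucket["key"] for bucket in response["aggregations"][agg_name]["buckets"]]

# Hand-written sample response, not real output.
sample_response = {
    "aggregations": {
        "values": {
            "buckets": [
                {"key": "alpha", "doc_count": 3},
                {"key": "beta", "doc_count": 1},
            ]
        }
    }
}
print(distinct_values_request("field_name", {"term": {"report": {"value": "some_value"}}}))
print(distinct_values(sample_response))
```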

elasticsearch default_field vs fields different results

Here are two queries.
First:
{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "27444.2",
"default_field": "text"
}
}
}
},
"from": 0,
"size": 50
}
Second:
{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "27444.2"
}
}
}
},
"fields": ["text"],
"from": 0,
"size": 50
}
The only difference between them is that in the first I use default_field to specify the field to search, and in the second I specify it through the fields param. The field name is the same.
I expected both variants to produce the same results, but that's not the case. The first variant doesn't return any results, and the second returns a result. So what am I doing wrong here? Where is the catch?
elasticsearch 1.4.2
The way you have given the fields param is wrong.
In the second case you are using the top-level fields param of the search request, which restricts the response to certain fields instead of the entire _source. It does not restrict which fields are searched.
The following is what you are looking for:
{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "27444.2",
"fields": ["text"]
}
}
}
},
"from": 0,
"size": 50
}
The two queries are not the same.
The first searches only the field 'text'. The second searches all fields and returns only the 'text' field in the response.
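To make the two meanings of fields easier to see, here is a small Python sketch that builds the request body with each placement spelled out. The helper name search_body and its parameter names are assumptions for illustration, and the filtered wrapper is left out for brevity since it carries no filter here.

```python
def search_body(text, search_fields=None, return_fields=None, size=50):
    """Build a query_string search body.

    search_fields -> goes INSIDE query_string and limits which fields are searched.
    return_fields -> goes at the TOP LEVEL and only limits what comes back per hit.
    """
    query_string = {"query": text}
    if search_fields is not None:
        query_string["fields"] = list(search_fields)
    body = {"query": {"query_string": query_string}, "from": 0, "size": size}
    if return_fields is not None:
        body["fields"] = list(return_fields)
    return body

# Searches only "text":
print(search_body("27444.2", search_fields=["text"]))
# Searches the default field(s) and merely returns "text" for each hit:
print(search_body("27444.2", return_fields=["text"]))
```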

Issues with null value in Elasticsearch

Here's an example of my data :
{
"MOD_DATE_START": "2010-04-20T15:05:49Z",
"MOD_DATE_END": null,
"MOD_ID": "123456789"
}
I'm having some issues with my Elasticsearch query. I have a couple of date fields on which I am doing range-based filtering, to make sure that my date falls between the start and end dates.
My first query (which works well) filters on the MOD_DATE_START field:
curl -s -XPOST 'http://server:9200/myindex/mytype/_search?pretty=true' -d '
{
"fields": ["MOD_ID", "MOD_DATE_START", "MOD_DATE_END"],
"query": {
"bool": {
"must": [
{"term": {"MOD_ID": "123456789"}},
{"range": {"MOD_DATE_START": {"lte": "2012-04-20T15:05:49Z"}}}
]
}
}
}
'
The MOD_DATE_START field always contains information, so the first query works well.
Since the second date field, MOD_DATE_END, is null in most cases, I would like to modify my query to add the following test:
IF "MOD_DATE_END" NOT NULL then
{"range": {"MOD_DATE_END": {"gte": "2012-04-20T15:05:49Z"}}}
ELSE skip "MOD_DATE_END"
I am, however, not quite able to figure out how to modify my query to add the third condition to be able to perform the gte test successfully.
Thanks in advance for your help.
One way to achieve this is by using a missing filter in a filtered query.
Example below :
curl -s -XPOST 'http://server:9200/myindex/mytype/_search?pretty=true' -d '
{
"fields": ["MOD_ID", "MOD_DATE_START", "MOD_DATE_END"],
"query": {
"filtered": {
"filter": {
"bool": {
"must": {
"range": {
"MOD_DATE_START": {
"lte": "2012-04-20T15:05:49Z"
}
}
},
"should": [
{
"missing": {
"field": "MOD_DATE_END",
"null_value": true,
"existence": true
}
},
{
"range": {
"MOD_DATE_END": {
"gte": "2012-04-20T15:05:49Z"
}
}
}
]
}
},
"query": {
"term": {
"MOD_ID": "123456789"
}
}
}
}
}
'
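The intended logic (start on or before the date, and end either missing or on or after it) can be checked locally with a short Python sketch. The names is_active and active_query are illustrative assumptions. The query builder shows one possible equivalent for versions where filtered and missing are no longer available (both were removed in Elasticsearch 5.0): a bool query whose should branch accepts either a must_not exists clause or the range on MOD_DATE_END.

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%SZ"

def is_active(doc, when):
    """Local check mirroring the filter: start <= when and (end is null or end >= when)."""
    t = datetime.strptime(when, FMT)
    if datetime.strptime(doc["MOD_DATE_START"], FMT) > t:
        return False
    end = doc.get("MOD_DATE_END")
    return end is None or datetime.strptime(end, FMT) >= t

def active_query(mod_id, when):
    """Sketch of the same condition as a bool query without filtered/missing."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"MOD_ID": mod_id}},
                    {"range": {"MOD_DATE_START": {"lte": when}}},
                    {
                        "bool": {
                            "should": [
                                # No value (or null) in MOD_DATE_END ...
                                {"bool": {"must_not": {"exists": {"field": "MOD_DATE_END"}}}},
                                # ... or an end date on/after the given time.
                                {"range": {"MOD_DATE_END": {"gte": when}}},
                            ],
                            "minimum_should_match": 1,
                        }
                    },
                ]
            }
        }
    }

doc = {"MOD_DATE_START": "2010-04-20T15:05:49Z", "MOD_DATE_END": None, "MOD_ID": "123456789"}
print(is_active(doc, "2012-04-20T15:05:49Z"))
print(active_query("123456789", "2012-04-20T15:05:49Z"))
```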
