Filtering across multiple indices using Elasticsearch

Is it possible to write a conditional filter on an Elasticsearch multi-index query?
I am looking at the script filter, but I can't find anywhere in the documentation whether the document's index is available as a variable I can check.
My existing query looks like this. Note that the filter script doesn't work, but I assume this is where I need to put my condition.
{
"index": "tweets,articles,animals,buildings",
"type": "item",
"body": {
"query": {
"multi_match": {
"query": "cat",
"type": "phrase_prefix",
"fields": [
"label",
"body"
]
}
},
"filter": {
"script": {
"script": "if (_index == \"animals\") {return true;} else {return false;}"
}
},
"from": 0,
"size": 8
}
}
Obviously I'd like to do more in this filter than just exclude items from a certain index, this is simply an example.

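You should be able to solve this by combining several indices queries: each one applies its inner query only to the indices it lists, so every index can get its own condition.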
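As a rough sketch of that idea, the per-index conditions can also be written as a single bool query whose should clauses each pair an _index match with the condition for that index. The snippet below only builds the request body as a Python dict. The helper names per_index_filter and build_search_body and the species condition are hypothetical, and it assumes an Elasticsearch version (2.0 or later) where bool accepts a filter clause and the _index metadata field can be matched with a terms query.

```python
def per_index_filter(conditions, unfiltered_indices=()):
    """Build a bool query that applies a different condition per index.

    conditions maps an index name to the clause that documents from that
    index must satisfy. Documents from unfiltered_indices pass through as-is.
    """
    should = []
    for index_name, clause in conditions.items():
        should.append({
            "bool": {
                "filter": [
                    {"terms": {"_index": [index_name]}},  # only this index
                    clause,                               # its own condition
                ]
            }
        })
    if unfiltered_indices:
        should.append({"terms": {"_index": list(unfiltered_indices)}})
    return {"bool": {"should": should, "minimum_should_match": 1}}


def build_search_body(text, index_filter, size=8):
    """Wrap the multi_match from the question together with the filter."""
    return {
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": text,
                        "type": "phrase_prefix",
                        "fields": ["label", "body"],
                    }
                },
                "filter": index_filter,
            }
        },
        "from": 0,
        "size": size,
    }


# Hypothetical condition: only keep "animals" documents with species == feline.
body = build_search_body(
    "cat",
    per_index_filter(
        {"animals": {"term": {"species": "feline"}}},
        unfiltered_indices=["tweets", "articles", "buildings"],
    ),
)
```

The resulting body would be sent to the same comma-separated list of indices as in the question.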

Related

How to sort elasticsearch results based on number of collapsed items?

I'm using a query with collapse in order to gather some documents under a certain person, yet I wish to sort the results based on the number of documents in which the search found a match. This is my query:
GET documents/_search
{
"_source": {
"includes": [
"text"
]
},
"query": {
"query_string": {
"fields": [
"text"
],
"query": "some text"
}
},
"collapse": {
"field": "person_id",
"inner_hits": {
"name": "top_mathing_docs",
"_source": {
"includes": [
"doc_year",
"text"
]
}
}
}
}
Any suggestions?
Thanks
If I understand correctly, you want to sort the parent documents by the number of inner_hits, i.e. by the number of matching documents per person_id.
That means the _score of the parent documents in the result doesn't matter.
The only way I've found to do this is the Top Hits Aggregation, following the "Field collapse example" in its documentation. Below is what your query would look like.
Aggregation query (field collapse example):
POST <your_index_name>/_search
{
"size":0,
"query": {
"query_string": {
"fields": [
"text"
],
"query": "some text"
}
},
"aggs": {
"top_person_ids": {
"terms": {
"field": "person_id"
},
"aggs": {
"top_tags_hits": {
"top_hits": {
"size": 10
}
}
}
}
}
}
Note that I'm assuming person_id is of type keyword or a numeric type.
Also, if you look at the query closely, I've set "size": 0, which means only the aggregation results are returned, not the search hits. The terms aggregation orders its buckets by doc_count descending by default, so the person with the most matching documents comes first.
Another note: the above aggregation has nothing to do with the Field Collapse in Search Request feature that you posted in the question. It's just that, using this aggregation, your result can be shaped in a similar way.
Let me know if this helps!
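To illustrate how the response of that aggregation could be consumed, here is a minimal Python sketch. It assumes the aggregation names from the query above (top_person_ids, top_tags_hits); the function name persons_by_match_count and the sample response are made up for illustration and are not real output.

```python
def persons_by_match_count(response):
    """Return (person_id, match_count, sources) tuples, largest count first."""
    buckets = response["aggregations"]["top_person_ids"]["buckets"]
    # Buckets normally arrive ordered by doc_count already; sorting again
    # just makes the intent explicit and guards against a custom order.
    ordered = sorted(buckets, key=lambda b: b["doc_count"], reverse=True)
    return [
        (
            b["key"],
            b["doc_count"],
            [hit["_source"] for hit in b["top_tags_hits"]["hits"]["hits"]],
        )
        for b in ordered
    ]


# Hand-written sample in the shape of a terms + top_hits response.
sample_response = {
    "aggregations": {
        "top_person_ids": {
            "buckets": [
                {
                    "key": 2,
                    "doc_count": 1,
                    "top_tags_hits": {"hits": {"hits": [
                        {"_source": {"text": "some text here"}},
                    ]}},
                },
                {
                    "key": 7,
                    "doc_count": 2,
                    "top_tags_hits": {"hits": {"hits": [
                        {"_source": {"text": "some text a"}},
                        {"_source": {"text": "some text b"}},
                    ]}},
                },
            ]
        }
    }
}

ranked = persons_by_match_count(sample_response)
```

Each tuple then plays the role of one collapsed group: the person, how many documents matched, and the top matching documents.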

MySql Order By Value equivalent in ElasticSearch 5.6

ElasticSearch Version: 5.6
I have imported MySQL data into Elasticsearch and added the required mappings. The following is the mapping for the column application_status.
Mappings:
{
"settings": {
"analysis": {
"analyzer": {
"case_insensitive": {
"type": "custom",
"tokenizer": "keyword",
"filter": ["lowercase"]
}
}
}
},
"mappings": {
"lead": {
"properties": {
"application_status": {
"type": "string",
"analyzer": "case_insensitive",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
}}
With the above mapping, I am able to do simple sorting (asc or desc) using the following query:
{
"size": 50,
"from": 0,
"sort": [{
"application_status.keyword": {
"order": "asc"
}
}]}
which is the MySQL equivalent of
select * from <table_name> order by application_status asc limit 50;
I need help with the following problem:
I have a MySQL query which sorts based on application_status:
select * from vLoan_application_grid
order by
CASE WHEN application_status = "IP_QUAL_REASSI" THEN application_status END desc,
CASE WHEN application_status = "IP_COMPLE" THEN application_status END desc,
CASE WHEN application_status LIKE "IP_FRESH%" THEN application_status END desc,
CASE WHEN application_status LIKE "IP_%" THEN application_status END desc
Please help me write the same query in Elasticsearch. I have not been able to find an equivalent of ordering by value for strings in Elasticsearch. From searching online, I understand that I should use sort scripts, but I cannot find any proper documentation.
I have the following query, which just does a simple sort.
{
"size": 500,
"from": 0,
"query" : {
"match_all": {}
},
"sort": {
"_script": {
"type": "string",
"script": {
"source": "doc['application_status.keyword'].value",
"params": {
"factor": ["IP_QUAL_REASS", "IP_COMPLE"]
}
},
"order": "desc"
}
}}
In the above query, I am not using the params section because I don't know how to use it with type: string.
I realize I am asking a lot. Any help or links to relevant documentation would be greatly appreciated. I hope the question is clear; I'll provide more details if necessary.
You have two options:
1. The most performant one is to add another, numeric field at indexing time. This number (your choice) is the numerical representation of the status. At search time, you simply sort by that number instead of by the status.
2. At search time, use a script that does almost the same thing as the first option, but dynamically. This is less performant (but still quite fast).
Below is the second option:
"sort": {
"_script": {
"type": "number",
"script": {
"source": "if (params.factor[0].containsKey(doc['application_status.keyword'].value)) return params.factor[0].get(doc['application_status.keyword'].value); else return 1000;",
"params": {
"factor": [{
"IP_QUAL_REASS":1,
"IP_COMPLE":2,
"whatever":3
}
]
}
},
"order": "asc"
}
}
If you also want things like LIKE WHATEVER%, my suggestion is to consider an indexing-time change rather than a search-time one, because the script gets more complex. Still, this is the version that handles prefix (wildcard) matches as well:
"sort": {
"_script": {
"type": "number",
"script": {
"source": "if (params.factor[0].containsKey(doc['application_status.keyword'].value)) return params.factor[0].get(doc['application_status.keyword'].value); else { params.wildcard_factors[0].entrySet().stream().filter(kv -> doc['application_status.keyword'].value.startsWith(kv.getKey())).map(Map.Entry::getValue).findFirst().orElse(1000)}",
"params": {
"factor": [
{
"IP_QUAL_REASS": 1,
"IP_COMPLE": 2,
"whatever": 3
}
],
"wildcard_factors": [
{
"REJ_": 66
}
]
}
},
"order": "asc"
}
}
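The first option (computing a numeric rank before indexing) could look roughly like the Python sketch below. It mirrors the order of the CASE expressions in the question's SQL: exact values first, then prefixes checked from most to least specific. The field name application_status_rank, the helper names, and the concrete rank numbers are assumptions chosen for illustration.

```python
# Exact status values and their rank (lower sorts first).
EXACT_RANKS = {"IP_QUAL_REASSI": 0, "IP_COMPLE": 1}
# Prefix rules, checked in order, so the more specific prefix comes first.
PREFIX_RANKS = [("IP_FRESH", 2), ("IP_", 3)]
# Everything else sorts last.
DEFAULT_RANK = 1000


def status_rank(status):
    """Map a status string to its numeric sort position."""
    if status in EXACT_RANKS:
        return EXACT_RANKS[status]
    for prefix, rank in PREFIX_RANKS:
        if status.startswith(prefix):
            return rank
    return DEFAULT_RANK


def add_status_rank(doc):
    """Return a copy of doc with the hypothetical rank field added."""
    enriched = dict(doc)
    enriched["application_status_rank"] = status_rank(
        doc.get("application_status") or ""
    )
    return enriched


# Sort clause to use at search time once the rank field is indexed;
# the second key keeps the descending order inside each group.
SORT_BY_RANK = [
    {"application_status_rank": {"order": "asc"}},
    {"application_status.keyword": {"order": "desc"}},
]
```

Each row would pass through add_status_rank during the import from MySQL, and the search request would then use SORT_BY_RANK instead of a script.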

Elasticsearch prioritize specific _ids but don't filter?

I'm trying to sort my Elasticsearch query so that documents with specific _ids appear first. The query should not filter the results down to those _ids; it should only prioritize them.
Here's an example of what I've tried as an attempt:
{"query":{"constant_score":{"filter":{"terms":{"_id":[2,3,4]}},"boost":2}}}
The above would be combined with other queries. However, the query returns only the exact _id matches and not the rest of the results.
Any ideas on how to make it prioritize the documents with those ids without filtering the entire query?
Try this (instead of the match_all there, you can use a query that actually filters the results):
{
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [
{
"filter": {
"terms": {
"_id": [
2,
3,
4
]
}
},
"weight": 2
}
]
}
}
}
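If the request is assembled in code, the same function_score wrapper can be produced for any base query. This is only a sketch in Python; the helper name prioritize_ids is hypothetical and the weight of 2 is taken from the example above.

```python
def prioritize_ids(base_query, ids, weight=2):
    """Wrap base_query so documents with the given _ids score higher.

    Documents that do not match the ids filter are still returned; they
    simply keep their original score from base_query.
    """
    return {
        "query": {
            "function_score": {
                "query": base_query,
                "functions": [
                    {
                        "filter": {"terms": {"_id": list(ids)}},
                        "weight": weight,
                    }
                ],
                # "multiply" is the default; spelled out to show how the
                # weight is combined with the base query's score.
                "boost_mode": "multiply",
            }
        }
    }


body = prioritize_ids({"match_all": {}}, [2, 3, 4])
```

Swapping the match_all for a real query keeps the filtering in base_query and the prioritization in the function.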
If you need the results returned in an exact order, go with:
"sort": [
{
"_script": {
"script": "doc['id'] != null ? sortOrder.indexOf(doc['id'].value.toInteger()) : 0",
"type": "number",
"params": {
"sortOrder": [
2,3,4
]
},
"order": "desc"
}
},
"_score"
]
P.S. As @Val mentioned, this will not work with _id, so you would need to store the id in a separate field.
If you only need to move documents to the top, look at function_score.

ElasticSearch more_like_this with restricted result set

I want to run a more_like_this query, but only get the top results within a specific set of documents, so I would provide the IDs of these documents. Is there any way to do this? Docs indicate no.
One way would be to use a filtered query with an ids filter to specify the set of documents you want the more_like_this query to work on.
Example:
{
"query": {
"filtered": {
"query": {
"more_like_this": {
"fields": [
"ticker.whitespace"
],
"like_text": "WFC",
"min_term_freq": 1,
"max_query_terms": 12
}
},
"filter": {
"ids": {
"values": [
"7667"
]
}
}
}
}
}
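On newer Elasticsearch versions the filtered query is no longer available (it was removed in 5.0), and the same restriction is usually written as a bool query with a filter clause. A minimal Python sketch that builds such a body follows; the helper name more_like_this_within is hypothetical, and it assumes the like parameter, which replaced like_text in later releases.

```python
def more_like_this_within(ids, like, fields,
                          min_term_freq=1, max_query_terms=12):
    """Build a more_like_this query restricted to a fixed set of ids."""
    return {
        "query": {
            "bool": {
                # Scoring part: similarity to the given text.
                "must": {
                    "more_like_this": {
                        "fields": list(fields),
                        "like": like,
                        "min_term_freq": min_term_freq,
                        "max_query_terms": max_query_terms,
                    }
                },
                # Non-scoring part: only these documents are candidates.
                "filter": {"ids": {"values": list(ids)}},
            }
        }
    }


body = more_like_this_within(["7667"], "WFC", ["ticker.whitespace"])
```

The ids list is where the specific set of documents from the question would go.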

I don't get any documents back from my elasticsearch query. Can someone point out my mistake?

I thought I had figured out Elasticsearch but I suspect I have failed to grok something, and hence this problem:
I am indexing products, which have a huge number of fields, but the ones in question are:
{
"show_in_catalogue": {
"type": "boolean",
"index": "no"
},
"prices": {
"type": "object",
"dynamic": false,
"properties": {
"site_id": {
"type": "integer",
"index": "no"
},
"currency": {
"type": "string",
"index": "not_analyzed"
},
"value": {
"type": "float"
},
"gross_tax": {
"type": "integer",
"index": "no"
}
}
}
}
I am trying to return all documents where "show_in_catalogue" is true, and there is a price with site_id 1:
{
"filter": {
"term": {
"prices.site_id": "1",
"show_in_catalogue": true
}
},
"query": {
"match_all": {}
}
}
This returns zero results. I also tried an "and" filter with two separate terms - no luck.
A subset of one of the documents returned if I have no filters looks like:
{
"prices": [
{
"site_id": 1,
"currency": "GBP",
"value": 595,
"gross_tax": 1
},
{
"site_id": 2,
"currency": "USD",
"value": 745,
"gross_tax": 0
}
]
}
I hope it is OK to omit so much of the document here; I don't believe the rest to be relevant, but I cannot be certain, of course.
Have I missed a vital piece of knowledge, or have I done something terminally thick? Either way, I would be grateful for an expert's knowledge at this point. Thanks!
Edit:
At the suggestion of J.T. I also tried reindexing the documents so that prices.site_id was indexed - no change. Also tried the bool/must filter below to no avail.
To clarify, the reason I'm using an empty query is that the web interface may supply a query string, but the same code is used to simply filter all products. Hence I left in the query, but empty, since that's what Elastica seems to produce with no query string.
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"bool": {
"must": [
{
"term": {
"show_in_catalogue": true
}
},
{
"term": {
"prices.site_id": 1
}
}
]
}
}
}
}
}
You have site_id set as {"index": "no"}. This tells Elasticsearch to exclude the field from the index, which makes it impossible to query or filter on that field. The data will still be stored. Conversely, you can set a field to be only in the index and searchable, but not stored.
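A quick way to check which fields are affected is to walk the mapping and list everything marked "index": "no". The Python sketch below does this for the mapping as posted in the question; the function name non_indexed_fields is made up for illustration. Going by that mapping, show_in_catalogue carries the same setting as prices.site_id, so both terms of the filter would be affected.

```python
def non_indexed_fields(properties, prefix=""):
    """Return dotted names of fields whose mapping sets "index": "no"."""
    found = []
    for name, spec in properties.items():
        path = prefix + name
        if spec.get("index") == "no":
            found.append(path)
        # Object fields nest their sub-fields under "properties".
        found.extend(non_indexed_fields(spec.get("properties", {}), path + "."))
    return found


# The mapping fragment from the question, as a Python dict.
MAPPING = {
    "show_in_catalogue": {"type": "boolean", "index": "no"},
    "prices": {
        "type": "object",
        "dynamic": False,
        "properties": {
            "site_id": {"type": "integer", "index": "no"},
            "currency": {"type": "string", "index": "not_analyzed"},
            "value": {"type": "float"},
            "gross_tax": {"type": "integer", "index": "no"},
        },
    },
}

# Fields used in the term filters of the question.
FILTER_FIELDS = ["show_in_catalogue", "prices.site_id"]
blocked = [f for f in FILTER_FIELDS if f in non_indexed_fields(MAPPING)]
```

Any field that ends up in blocked needs its mapping changed and the documents reindexed before a term filter on it can match.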
I'm new to Elasticsearch as well and can't always grok the questions! I'm actually confused by your query. If you are going to "just filter", then you don't need a query. What I don't understand is your use of two fields inside the term filter. I've never done this. I guess it acts as an OR? Also, if nothing matches, it seems to return everything. If you wanted a query with the results of that query filtered, then you would want to use a filtered query:
-d '{
"query": {
"filtered": {
"query": {},
"filter": {}
}
}
}'
If you just want to apply filters, this is the filter that should work without any "query" being necessary:
-d '{
"filter": {
"bool": {
"must": [
{
"term": {
"show_in_catalogue": true
}
},
{
"term": {
"prices.site_id": 1
}
}
]
}
}
}'
