Elasticsearch more_like_this with restricted result set

I want to run a more_like_this query, but only get the top results within a specific set of documents, so I would provide the IDs of these documents. Is there any way to do this? Docs indicate no.

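One way is to use a filtered query with an ids filter, which restricts the set of documents the more_like_this query runs against. Note that the filtered query was deprecated in Elasticsearch 2.0 and removed in 5.0; on newer versions, use a bool query with more_like_this under must and ids under filter.
Example: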
{
"query": {
"filtered": {
"query": {
"more_like_this": {
"fields": [
"ticker.whitespace"
],
"like_text": "WFC",
"min_term_freq": 1,
"max_query_terms": 12
}
},
"filter": {
"ids": {
"values": [
"7667"
]
}
}
}
}
}
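For versions where the filtered query is no longer available, the same restriction can be sketched as a bool query. The Python below only builds the request body as a dict; the function name restricted_mlt_body is made up for illustration, and it assumes a version that accepts `like` in place of like_text. The field and ID values are the ones from the example above.

```python
import json


def restricted_mlt_body(fields, like, ids, min_term_freq=1, max_query_terms=12):
    """Build a search body: more_like_this for scoring, ids as a non-scoring filter.

    Hypothetical helper for illustration; it does not talk to a cluster.
    """
    return {
        "query": {
            "bool": {
                # The more_like_this clause decides the ranking.
                "must": [
                    {
                        "more_like_this": {
                            "fields": list(fields),
                            "like": like,
                            "min_term_freq": min_term_freq,
                            "max_query_terms": max_query_terms,
                        }
                    }
                ],
                # The ids clause only restricts the candidate set.
                "filter": [{"ids": {"values": [str(i) for i in ids]}}],
            }
        }
    }


if __name__ == "__main__":
    body = restricted_mlt_body(["ticker.whitespace"], "WFC", ["7667"])
    print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a _search request with whatever client is in use.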

Related

Elasticsearch: how do I query (search) within a single document?

Assuming the index is named index and the document's ID is "1", how can I query within that single document?
Something like this:
GET index/_search
{
"query": {
"id": "1",
"terms": ["is this text in document 1?"]
}
}
or
GET index/_doc/1/_search
{
...
}
As far as I have found,
GET test/_doc/_search
{
"query": {
"terms" : {
"_id" : ["1"]
}
}
}
This fetches the document with ID "1", but I cannot run any further queries against it.
I want to query inside a single document because my app has a live-news view: once a news item is retrieved from the server, I want to search it in Elasticsearch for keyword highlighting and spam filtering.
You have to compose your query with a bool query.
The best approach is to put the ID query under filter, because filter clauses do not affect scoring. You can then add queries under must, must_not, and should as needed:
GET index/_search
{
"from": 0,
"size": 10,
"query": {
"bool": {
"must": [
{
"term": {
"field": "value"
}
}
],
"must_not": [],
"should": [],
"filter": [
{
"terms": {"_id": ["1"]}
}
]
}
}
}
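Since the question mentions keyword highlighting, here is a minimal Python sketch of how such a body could be assembled for one document. The helper name single_doc_search_body and the field name "content" are assumptions made for illustration, not taken from the question; the code only builds the request body.

```python
import json


def single_doc_search_body(doc_id, field, text):
    """Build a body that searches `text` in `field`, restricted to one document.

    Hypothetical helper; the caller sends the result to GET index/_search.
    """
    return {
        "query": {
            "bool": {
                # Scored clause: does the document contain the text?
                "must": [{"match": {field: text}}],
                # Non-scoring clause: only look at this one document.
                "filter": [{"ids": {"values": [str(doc_id)]}}],
            }
        },
        # Ask for highlighted fragments of the matched field.
        "highlight": {"fields": {field: {}}},
    }


if __name__ == "__main__":
    body = single_doc_search_body("1", "content", "is this text in document 1?")
    print(json.dumps(body, indent=2))
```

With a body like this, zero hits means the document does not match the text, and one hit carries the highlighted fragments.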

How to sort elasticsearch results based on number of collapsed items?

I'm using a query with collapse to group documents under a certain person, but I want to sort the results by the number of documents in which the search found a match. This is my query:
GET documents/_search
{
"_source": {
"includes": [
"text"
]
},
"query": {
"query_string": {
"fields": [
"text"
],
"query": "some text"
}
},
"collapse": {
"field": "person_id",
"inner_hits": {
"name": "top_mathing_docs",
"_source": {
"includes": [
"doc_year",
"text"
]
}
}
}
}
Any suggestions?
Thanks
If I understand correctly, you want to sort the parent documents by the count of inner_hits per person_id.
That means the _score of the parent documents in the result doesn't matter.
The only way I've found to do this is with a terms aggregation combined with a top_hits sub-aggregation, as in the field collapse example in the Top Hits Aggregation docs. Your query would look like this:
Aggregation query (field collapse example):
POST <your_index_name>/_search
{
"size":0,
"query": {
"query_string": {
"fields": [
"text"
],
"query": "some text"
}
},
"aggs": {
"top_person_ids": {
"terms": {
"field": "person_id"
},
"aggs": {
"top_tags_hits": {
"top_hits": {
"size": 10
}
}
}
}
}
}
Note that I'm assuming person_id is of type keyword or a numeric type.
Also, if you look at the query closely, I've set "size": 0, which means only the aggregation result is returned, not the search hits.
The terms aggregation orders its buckets by doc_count, descending, by default, so the person with the most matching documents comes first.
Another note: the above aggregation has nothing to do with the Field Collapse in Search Request feature that you posted in the question. It's just that with this aggregation, your result can be formatted in a similar way.
Let me know if this helps!
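To show how the aggregation result could be consumed, here is a small Python sketch that flattens the buckets into one entry per person, ordered by match count. The function name and the sample response are invented for illustration; only the bucket layout (key, doc_count, and the nested top_hits hits) follows the shape that this kind of aggregation returns.

```python
def persons_by_match_count(response, agg_name="top_person_ids", hits_name="top_tags_hits"):
    """Return (person_id, match_count, sources) tuples, most matches first."""
    buckets = response["aggregations"][agg_name]["buckets"]
    rows = [
        (
            bucket["key"],
            bucket["doc_count"],
            [hit["_source"] for hit in bucket[hits_name]["hits"]["hits"]],
        )
        for bucket in buckets
    ]
    # Buckets normally arrive ordered by doc_count already; sorting again is a
    # harmless safeguard in case a different order was requested.
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hand-written sample in the response shape, not real output.
sample_response = {
    "aggregations": {
        "top_person_ids": {
            "buckets": [
                {
                    "key": 17,
                    "doc_count": 1,
                    "top_tags_hits": {"hits": {"hits": [{"_source": {"text": "some text a"}}]}},
                },
                {
                    "key": 42,
                    "doc_count": 2,
                    "top_tags_hits": {
                        "hits": {
                            "hits": [
                                {"_source": {"text": "some text b"}},
                                {"_source": {"text": "some text c"}},
                            ]
                        }
                    },
                },
            ]
        }
    }
}

if __name__ == "__main__":
    for person_id, count, sources in persons_by_match_count(sample_response):
        print(person_id, count, [s["text"] for s in sources])
```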

Elasticsearch search in documents with certain values for a field

I have an index with following document structure with 5 fields. I have written a search query as follows :
{
"query": {
"query_string": {
"fields": [
"field1.keyword",
"field2.keyword",
"field3.keyword"
],
"query": "*abc*"
}
},
"from": 0,
"size": 1000
}
This works fine, but as a new requirement I have to search only in documents where field4 has one of a given set of values, say (1, 2, 3), and omit the rest of the documents.
I can also obtain the list of field4 values to omit, since they are stored in the db with a skip status.
Please suggest a solution. Thanks in advance.
I suggest using a filter query inside a bool query to match the docs that meet the condition.
{
"query": {
"bool": {
"must": {
"query_string": {
"fields": [
"field1.keyword",
"field2.keyword",
"field3.keyword"
],
"query": "*abc*"
}
},
"filter": {
"terms": {
"field4.keyword": [1, 2, 3]
}
}
}
}
}
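The question also mentions that a list of values to skip may be available instead of a list of allowed values. The Python sketch below builds either variant: allowed values go under filter, skipped values under must_not. The helper name and the skipped values used in the demo are assumptions for illustration; the field names come from the question and answer.

```python
import json


def restricted_search_body(query, fields, allowed=None, skipped=None, field="field4.keyword"):
    """Build the bool query with an allow-list (filter) and/or a skip-list (must_not)."""
    bool_query = {
        "must": [{"query_string": {"fields": list(fields), "query": query}}]
    }
    if allowed:
        # Keep only documents whose field value is in the allow-list.
        bool_query["filter"] = [{"terms": {field: list(allowed)}}]
    if skipped:
        # Drop documents whose field value is in the skip-list.
        bool_query["must_not"] = [{"terms": {field: list(skipped)}}]
    return {"query": {"bool": bool_query}, "from": 0, "size": 1000}


if __name__ == "__main__":
    search_fields = ["field1.keyword", "field2.keyword", "field3.keyword"]
    print(json.dumps(restricted_search_body("*abc*", search_fields, allowed=[1, 2, 3]), indent=2))
    # Skip-list variant; the values 4 and 5 are made up.
    print(json.dumps(restricted_search_body("*abc*", search_fields, skipped=[4, 5]), indent=2))
```

Neither clause contributes to the score, so the ranking still comes from the query_string part alone.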

Elasticsearch prioritize specific _ids but don't filter?

I'm trying to order my Elasticsearch results so that documents with specific _ids appear first, without filtering the whole query down to those _ids; it should only prioritize them.
Here's an example of what I've tried as an attempt:
{"query":{"constant_score":{"filter":{"terms":{"_id":[2,3,4]}},"boost":2}}}
The above would be included along with other queries, but the query returns only the exact matches and not the rest of the results.
Any ideas on how to prioritize the documents with these ids without filtering the entire query?
Try this (instead of match_all, you can use a real query to actually filter the results):
{
"query": {
"function_score": {
"query": {
"match_all": {}
},
"functions": [
{
"filter": {
"terms": {
"_id": [
2,
3,
4
]
}
},
"weight": 2
}
]
}
}
}
If you need the results returned in an exact order, use a script-based sort:
"sort": [
{
"_script": {
"script": "doc['id'] != null ? sortOrder.indexOf(doc['id'].value.toInteger()) : 0",
"type": "number",
"params": {
"sortOrder": [
2,3,4
]
},
"order": "desc"
}
},
"_score"
]
P.S. As @Val mentioned, this will not work with _id, so you would need to store the id as a separate field.
If you only need to move documents to the top, look at function_score.
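If a separate id field is not available, the exact ordering can also be applied on the client after the hits come back. This is a different technique from the script sort above, and it only reorders the page of hits already fetched. A minimal Python sketch, with a made-up function name and made-up hits:

```python
def pin_ids_first(hits, pinned_ids):
    """Move hits whose _id is in pinned_ids to the front, in pinned_ids order.

    All other hits keep their original relative order (Python's sort is stable).
    """
    position = {str(doc_id): index for index, doc_id in enumerate(pinned_ids)}
    unpinned_rank = len(position)
    return sorted(hits, key=lambda hit: position.get(str(hit["_id"]), unpinned_rank))


if __name__ == "__main__":
    # Invented hits, already ordered by score.
    hits = [{"_id": "9"}, {"_id": "4"}, {"_id": "2"}, {"_id": "7"}, {"_id": "3"}]
    print([hit["_id"] for hit in pin_ids_first(hits, [2, 3, 4])])
```

A pinned document that falls outside the fetched page is not pulled in by this approach, which is where the function_score query is the better fit.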

How to implement 'Starts with' search in elasticsearch 2.x

I have a requirement to return only those records whose comments do not start with a given string. The query below is not working. Need help.
{
"size": 0,
"fields": ["id","comment"],
"query": {
"bool": {
"must_not": [
{
"wildcard": {
"comment":
"AG//*"
}
}
]
}
}
}
First, you should remove the "size": 0 from your query (or set the required size) to see the results.
Now, the best way to implement 'Starts with' in Elasticsearch is to use the Prefix Query, which takes the literal prefix with no wildcards:
{
"fields": ["id", "comment"],
"query": {
"bool": {
"must_not": [
{
"prefix": {
"comment": "AG"
}
}
]
}
}
}
Note: the Prefix Query and Wildcard Query make sense only on not_analyzed fields, so make sure your "comment" field is mapped as not_analyzed.
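To see why the mapping matters, the Python sketch below imitates what happens to a comment at index time. The analyzed_terms function is a crude stand-in for an analyzer (lowercase, split on non-alphanumerics), not the real implementation, and the sample comments are invented. The prefix itself is not analyzed, so it is compared literally against each indexed term.

```python
import re


def analyzed_terms(text):
    """Crude stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return [term for term in re.split(r"[^0-9a-z]+", text.lower()) if term]


def prefix_matches(indexed_terms, prefix):
    """Mimic a prefix query: literal comparison against every indexed term."""
    return any(term.startswith(prefix) for term in indexed_terms)


if __name__ == "__main__":
    comment = "AG//follow up with client"  # invented sample value

    # not_analyzed: the whole comment is indexed as a single term.
    print(prefix_matches([comment], "AG"))

    # analyzed: the terms are lowercased words, so "AG" matches nothing.
    print(prefix_matches(analyzed_terms(comment), "AG"))

    # A lowercase prefix matches any word in the comment, not just its start.
    print(prefix_matches(analyzed_terms("please see agenda"), "ag"))
```

On the analyzed field the prefix either fails to match at all or matches a word in the middle of the comment, so it no longer means "starts with".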
