Get document by index position in Elasticsearch - elasticsearch

I am working with Elasticsearch and I am getting a query error:
elasticsearch.exceptions.TransportError: TransportError(500, 'search_phase_execution_exception', 'script score query returned an invalid score: NaN for doc: 32894')
It seems like my metric is returning NaN for document 32894 (NaN for doc: 32894). Naturally, the next step is to look at that document to see if there is anything wrong with it.
The problem is that I upload my documents using my own ID, so "32894" is meaningless for me.
A query like
curl -X GET "localhost:9200/my_index/_doc/one_of_my_ids?pretty"
works fine, but this fails if I try with the doc number from the error message.
I expected this to be trivial, but some Googling has failed to help.
How can I then find this document? Or is using my own IDs not recommended and the unfixable source of this problem?
Edit: as requested, this is the query that fails. Note that obviously fixing this is my ultimate goal, but not the specific point of this question. Help appreciated in either case.
I am using the elasticsearch library in Python.
self.es.search(index=my_index, body=query_body, size=number_results)
With
query_body = {
    "query": {
        "script_score": {
            "query": {"match_all": {}},
            "script": {
                "source": "cosineSimilaritySparse(params.queryVector, doc['embedding']) + 10.0",
                "params": {"queryVector": query_vector}
            }
        }
    }
}
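The number in the error is presumably an internal document number rather than one of the uploaded IDs, so it cannot be fetched through _doc. One way to narrow down candidates without it is to check the source data before (or after) indexing. The sketch below assumes, which the error message does not confirm, that the NaN comes from an embedding that is missing, empty, all zeros, or contains a non-finite value, since a cosine similarity is undefined in those cases. The helper names and the (doc_id, embedding) input shape are hypothetical.

```python
import math


def is_degenerate_sparse_vector(vector):
    """Return True if a sparse vector (dict of dimension -> weight) is missing,
    empty, all zeros, or contains a non-finite value."""
    if not vector:
        return True
    values = [float(v) for v in vector.values()]
    if any(not math.isfinite(v) for v in values):
        return True
    # A zero norm means the cosine similarity would divide by zero.
    norm = math.sqrt(sum(v * v for v in values))
    return norm == 0.0


def find_suspect_ids(docs):
    """docs: iterable of (doc_id, embedding) pairs from your own source data.
    Returns the IDs whose embedding looks degenerate, in input order."""
    return [doc_id for doc_id, emb in docs if is_degenerate_sparse_vector(emb)]
```

The same check is worth running on query_vector itself, because a degenerate query vector would affect every document rather than a single one.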

Related

How to set case_insensitive for term query in elasticsearch?

The Elasticsearch term query documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-term-query.html) states that there is a case_insensitive field.
However, I can't manage to set it. I can set boost and value without issue, but not case_insensitive.
GET movies/_search
{
  "query": {
    "term": {
      "overview": {
        "value": "batman",
        "boost": 0.5,
        "case_insensitive": true
      }
    }
  }
}
When I run it, I get the error "[term] query does not support [case_insensitive]".
Where did I go wrong, or is the documentation wrong?
It looks like you are on a version older than ES 7.10.0, where this option was not yet present. If you check the ES 7.9 documentation, the case_insensitive option is not there either.
Please find the related link of the GitHub issue and PR which added support for case_insensitive to the term query.
Please refer to this diff where the caseInsensitive field was added to TermQuery.
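The version dependency above can be made explicit on the client side. This is a minimal sketch, not part of any Elasticsearch client: build_term_query is a hypothetical helper, and the es_version tuple is assumed to be supplied by the caller (for example from the cluster's info response).

```python
def build_term_query(field, value, es_version, boost=None, case_insensitive=False):
    """Build a term query body.

    es_version is a (major, minor) tuple supplied by the caller. The
    case_insensitive option is only emitted for 7.10 or later, because
    older versions reject it.
    """
    clause = {"value": value}
    if boost is not None:
        clause["boost"] = boost
    if case_insensitive:
        if es_version < (7, 10):
            raise ValueError("case_insensitive requires Elasticsearch 7.10.0 or later")
        clause["case_insensitive"] = True
    return {"query": {"term": {field: clause}}}
```

Failing early in the client gives a clearer message than the server-side "[term] query does not support [case_insensitive]" error.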

Return which field got matched in Elastic Search

I am trying to find out which field actually matched a search when a doc is returned.
For example, I have a table index with fields called table_name and column_name.
My search query searches both of those fields. When I fire a search query and either of them matches, I want to know which one matched: column_name or table_name.
I am aware of the Explain API, but that would require me to call another API.
You don't need to call the explain API. The search API supports the explain flag:
GET stackoverflow/_search?explain=true
This will return the _explanation section along with the _source section.
Update
Another solution would be to use highlight. I've used this before for manually evaluating queries. It's an easy way to get some feedback on what matched:
GET stackoverflow/_search
{
  "query": {
    "match": {
      "FIELD": "TEXT"
    }
  },
  "highlight": {
    "fields": {
      "*": {}
    }
  }
}
Of course, you can have the explain flag set as well.
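With highlighting enabled, each hit in the response carries a highlight object keyed by the field names that produced fragments. A minimal sketch of reading that out on the client side follows; matched_fields is a hypothetical helper, and response is assumed to be the search response already parsed into a dict.

```python
def matched_fields(response):
    """Map each hit's _id to the sorted list of fields that produced highlights.

    Hits without a "highlight" section map to an empty list.
    """
    result = {}
    for hit in response.get("hits", {}).get("hits", []):
        result[hit["_id"]] = sorted(hit.get("highlight", {}).keys())
    return result
```

For the table index in the question, a hit that matched only on column_name would then map to ["column_name"].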

ElasticSearch - Delete documents by specific field

This seemingly simple task is not well-documented in the ElasticSearch documentation:
We have an ElasticSearch instance with an index that has a field in it called sourceId. What API call would I make to first, GET all documents with 100 in the sourceId field (to verify the results before deletion) and then to DELETE same documents?
You probably need to make two API calls here: the first to view the count of documents, the second to perform the deletion.
The query is the same in both calls; only the endpoints differ. I'm also assuming that sourceId is of type keyword.
Query to Verify
POST <your_index_name>/_search
{
  "size": 0,
  "query": {
    "term": {
      "sourceId": "100"
    }
  }
}
Execute the above term query and take note of hits.total in the response.
Remove the "size": 0 from the above query if you want to see the full documents in the response.
Once you have the details, you can go ahead and perform the deletion using the same query, as shown below. Note the different endpoint.
Query to Delete
POST <your_index_name>/_delete_by_query
{
  "query": {
    "term": {
      "sourceId": "100"
    }
  }
}
Once you execute the delete by query, check the deleted field in the response. It should show the same number.
I've used a term query, but you can also use a match query or any complex bool query. Just make sure that the query is correct.
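The verify-then-delete steps above can be sketched in Python. This is a sketch under assumptions: client is assumed to be an object like the Python elasticsearch client, exposing count and delete_by_query methods that accept index and body keyword arguments (newer client versions may prefer other parameter names). delete_by_source_id is a hypothetical helper, and it uses a count request for the verification step instead of a search with "size": 0.

```python
def delete_by_source_id(client, index, source_id, dry_run=True):
    """Count the documents matching sourceId, then optionally delete them.

    With dry_run=True (the default) nothing is deleted; the caller can
    inspect the matched count first and call again with dry_run=False.
    """
    query = {"query": {"term": {"sourceId": source_id}}}
    # Verification step: how many documents would be affected?
    matched = client.count(index=index, body=query)["count"]
    if dry_run:
        return {"matched": matched, "deleted": 0}
    # Deletion step: same query, different endpoint.
    response = client.delete_by_query(index=index, body=query)
    return {"matched": matched, "deleted": response["deleted"]}
```

If matched and deleted differ after a real run, documents were probably added or removed between the two calls.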
Hope it helps!
Delete all the documents of an index without deleting the mapping and settings:
POST /my_index/_delete_by_query?conflicts=proceed&pretty
{
  "query": {
    "match_all": {}
  }
}
See: https://opster.com/guides/elasticsearch/search-apis/elasticsearch-delete-by-query/

Elasticsearch truncate string field in query

To display recent exceptions on a Grafana dashboard I am doing a query on exceptions in logfiles. Grafana doesn't seem to have an option to limit a string value in table view. Of course the stacktraces are huge.
So I came up with the idea of limiting this field in the Lucene query itself, but I am unsure how to do this. I tried using a painless script:
{
  "query": {
    "match_all": {}
  },
  "script_fields": {
    "message_short": {
      "script": {
        "lang": "painless",
        "inline": "return doc['message'].value.substring(50);"
      }
    }
  }
}
I don't get any error but also no additional field "message_short" which I would have expected. Do I have to enable scripting support somehow? I'm running on v6.1.2
I implemented a workaround using a drilldown URL ("Render value as link" in the Grafana table). It renders a link to my Kibana instance and uses the Grafana variable $__cell, which references the document_id I get from the underlying Elasticsearch query:
https://mykibana.host/app/kibana#/doc/myindex-*/myindex-prod-*/logs?id=$__cell&_g=h#8b5b71a
Not perfect, but it keeps my dashboard readable and allows more info if needed. Even better would be to add a shortened field to the ES index, but that is not possible for me currently.
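For reference, a script field that keeps only the first characters of a value could be built as in the sketch below. Note that substring(50) in the script above returns everything from index 50 onward, whereas substring(0, 50) returns the first 50 characters. The sketch makes several assumptions: that the script can read the original value through params._source, that the script text goes under the "source" key (older versions used "inline"), and that the hypothetical helper build_truncate_body and its parameter names suit your setup. Whether Grafana displays script fields is not addressed here.

```python
def build_truncate_body(field="message", max_len=50, out_field="message_short"):
    """Build a search body with a script field holding a truncated copy of `field`."""
    script = (
        "def v = params._source[params.field];"
        " if (v == null) { return null; }"
        " String s = v.toString();"
        " int n = (int) params.max_len;"
        # Guard against values shorter than the limit.
        " return s.length() > n ? s.substring(0, n) : s;"
    )
    return {
        "query": {"match_all": {}},
        "script_fields": {
            out_field: {
                "script": {
                    "lang": "painless",
                    "source": script,
                    "params": {"field": field, "max_len": max_len},
                }
            }
        },
    }
```

Passing the field name and length as params keeps the script text constant across calls, so only the parameters change.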

How to update multiple documents that match a query in elasticsearch

I have documents which at first contain only "url" (analyzed) and "respsize" (not_analyzed) fields. I want to update the documents that match a url and add a new field, "category".
That is,
at first, doc1 is:
{
  "url": "http://stackoverflow.com/users/4005632/mehmet-yener-yilmaz",
  "respsize": "500"
}
I have external data, and I know "stackoverflow.com" belongs to category 10.
I need to update the doc so that it looks like:
{
  "url": "http://stackoverflow.com/users/4005632/mehmet-yener-yilmaz",
  "respsize": "500",
  "category": "10"
}
Of course, I will do this for all documents whose url field contains "stackoverflow.com",
and I need to update each doc only once. Because the category of a url does not change, there is no need to update it again.
I think I need to use the _update API with the _version number to check this, but I can't compose the DSL query.
EDIT
I ran this and it looks like it works fine:
But the documents did not change.
Although the query result looks correct, the new field was not added to the docs. Do I need a refresh or something similar?
You could use the update by query plugin to do just that. The idea is to select all documents without a category whose url matches a certain string, and add the category you wish.
curl -XPOST 'localhost:9200/webproxylog/_update_by_query' -H "Content-Type: application/json" -d '
{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "url": "stackoverflow.com"
              }
            },
            {
              "missing": {
                "field": "category"
              }
            }
          ]
        }
      }
    }
  },
  "script" : "ctx._source.category = \"10\";"
}'
After running this, all your documents with url: stackoverflow.com that don't have a category, will get category: 10. You can run the same query again later to fix new stackoverflow.com documents that have been indexed in the meantime.
Also make sure to enable scripting in elasticsearch.yml and restart ES:
script.inline: on
script.indexed: on
In the script, you're free to add as many fields as you want, e.g.
...
"script" : "ctx._source.category1 = \"10\"; ctx._source.category2 = \"20\";"
UPDATE
ES 2.3 now features the update by query functionality. You can still use the above query exactly as is and it will work (except that filtered and missing are deprecated, but still working ;).
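On later versions where filtered and missing are no longer available (they were removed in 5.0), the same selection can be expressed with a bool query and a must_not exists clause. The sketch below only builds the request body; build_category_update is a hypothetical helper, and the object form of the script with a "source" key is an assumption that fits newer versions (some earlier ones used "inline").

```python
def build_category_update(url_token, category):
    """Build an _update_by_query body that sets `category` on documents whose
    url matches `url_token` and that do not yet have a category."""
    return {
        "query": {
            "bool": {
                # Same url condition as the term filter in the original query.
                "filter": [{"term": {"url": url_token}}],
                # Replacement for the old "missing" filter.
                "must_not": [{"exists": {"field": "category"}}],
            }
        },
        "script": {
            "lang": "painless",
            "source": "ctx._source.category = params.category",
            "params": {"category": category},
        },
    }
```

Because documents that already have a category are excluded, running the request again only touches newly indexed documents, which matches the "update each doc only once" requirement in the question.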
That all sounds great, but just to add to @Val's answer: Update By Query is available from ElasticSearch 2.3 but not in earlier versions. In our case we're using 1.4 for legacy reasons and there is no chance of upgrading in the foreseeable future, so another solution is to use the Update By Query plugin provided here: https://github.com/yakaz/elasticsearch-action-updatebyquery

Resources