Why am I not getting expected results when searching in ElasticSearch using the Chrome plugin Sense?

I've set up the following data set so I can test searching on a field storing multiple values:
post /test/participant
{
"Synonyms" : [ "foo" ]
}
post /test/participant
{
"Synonyms" : [ "bar" ]
}
post /test/participant
{
"Synonyms" : [ "foo", "bar" ]
}
I've tried to get some data back by trying something like:
get /test/participant/_search
{
"query": {
"filtered": {
"filter": {
"term": { "Synonyms": "foo" }
}
}
}
}
and I was expecting to get back the first and third records (see order above). However, I keep getting all the records back. I've tried no end of alterations to the query to try to get something sensible (there's not enough space to add them here), and all I keep getting is all the records in the index. Does anyone have an idea how I would query to get back those records with "foo" as a value (1st and 3rd)? And is there some subtle point I've been missing here? I'm aware that ElasticSearch does not store the values as an array but as an unordered collection.

I think you are running these queries in Sense, right?
The commands you need are these:
POST /test/participant
{"Synonyms":["foo"]}
POST /test/participant
{"Synonyms":["bar"]}
POST /test/participant
{"Synonyms":["foo","bar"]}
GET /test/participant/_search
{
"query": {
"filtered": {
"filter": {
"term": {
"Synonyms": "foo"
}
}
}
}
}
The explanation is related to GET vs. POST HTTP methods.
Behind the scenes, Sense actually converts a GET request to an HTTP POST (given that many browsers do not support HTTP GET requests with a request body). This means that, even if you write GET, the actual HTTP request is a POST.
Because Sense's autocomplete forces upper-case letters for request methods, it checks for those same upper-case letters when deciding whether a request is a GET (and not a get) with a request body. If it is, the request is transformed into a POST. If the method does not compare equal to GET, Sense sends the request as is, meaning with a get method and with a body. Since the body is then ignored, what reaches Elasticsearch is a get /test/participant/_search, which is basically a match_all and, of course, returns all documents :-).
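To make the case-sensitivity point concrete, here is a small Python model of the decision described above. It is not Sense's actual source code; the function name effective_request and its return shape are made up for illustration, and it only encodes the behaviour as this answer describes it.

```python
def effective_request(method, has_body):
    """Model of the behaviour described above (hypothetical helper, not Sense code).

    Returns a tuple: (HTTP method actually sent, whether the body is honoured).
    """
    # Only the exact upper-case string "GET" is recognised and rewritten to POST.
    if method == "GET" and has_body:
        return "POST", True
    # Any other spelling of get goes out unchanged, and its body ends up ignored.
    if method.upper() == "GET" and has_body:
        return method, False
    # Everything else is sent as written.
    return method, has_body


print(effective_request("GET", True))  # upper-case GET with a body
print(effective_request("get", True))  # lower-case get with a body
```

Under this model the upper-case request is sent as a POST with its body intact, while the lower-case one keeps its method and loses the body, which matches the match_all behaviour seen in the question.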

Related

Best practices for writing a PUT endpoint for a REST API

I am building a basic CRUD service with some business logic under the hood, and I'm about to start working on the PUT (update) endpoint. I have already fully written+tested GET (read) and POST (create) for my data object. The data store for my documents is an ElasticSearch instance on AWS.
I have some decisions to make about how I want to architect the PUT, namely, how I want to determine a valid request. My goal is to make it so the POST is only for the creation of new assets, and PUT will only update existing documents. (At the moment, I am POSTing to elastic with /_doc/, however the intent is to move to /_create/ as part of this work)
What I'm a little hung-up on is the "right" way to check that a document exists before making the API call to Elastic to update.
When a user submits a document to PUT, should I first GET from Elastic with the document ID to make sure the document already exists? Or should I simply try to "update" the resource and, if it doesn't exist, let one be created?
Obviously there are trade-offs to each strategy. With the latter, PUTing a document that doesn't exist almost completely negates the need for a POST at all, so I'd be more inclined to go with the former - despite the additional REST call - to maintain the integrity of the basic REST definition.
Thoughts?
Whether to update a doc (with versioning) or create a new one with some shared ID related to all previous versions depends on your use case -- either is 'correct', but there's too little information to advise on that right now.
With regards to the document-exists strategies -- there are essentially 2 types of IDs in ES -- what I call:
internal ids (_id)
external ids (doc_values-provided ids)
Create an index & a doc:
PUT myindex
PUT myindex/_doc/internal_id_1
{
"external_id": "1"
}
Internal ID check
GET myindex/_doc/internal_id_1
or
GET myindex/_count
{
"query": {
"ids": {
"values": [
"internal_id_1"
]
}
}
}
or
GET myindex/_count
{
"query": {
"term": {
"_id": {
"value": "internal_id_1"
}
}
}
}
External ID check
GET myindex/_count
{
"query": {
"term": {
"external_id": {
"value": "1"
}
}
}
}
and many others (terms, match (for partial matches etc), ...)
Note that I've used the _count endpoint instead of _search -- it's slightly faster.
If you intend to check the _version of a given doc before you proceed to update it, replace _count with _search?version=true and the _version attribute will become available:
{
"_index":"myindex",
"_type":"_doc",
"_id":"internal_id_1",
"_version":2, <---
"_score":1.0,
"_source":{
"external_id":"1"
}
}
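As a rough sketch of how the _count checks above could be wrapped in service code, the Python below builds the same term query and reads the count field of the response. The names count_body, doc_exists and fake_send are hypothetical, and the transport is injected so the example runs without a cluster; a real service would pass a function that performs the HTTP call.

```python
from typing import Callable, Dict

# Hypothetical transport signature: (method, path, body) -> parsed JSON response.
Transport = Callable[[str, str, Dict], Dict]


def count_body(field: str, value: str) -> Dict:
    """Build the term query from the checks above, for "_id" or "external_id"."""
    return {"query": {"term": {field: {"value": value}}}}


def doc_exists(send: Transport, index: str, field: str, value: str) -> bool:
    """Return True when the _count endpoint reports at least one match."""
    response = send("GET", f"/{index}/_count", count_body(field, value))
    return response.get("count", 0) > 0


def fake_send(method: str, path: str, body: Dict) -> Dict:
    """Stand-in for a real HTTP client: pretends only external_id "1" is indexed."""
    term = body["query"]["term"]
    hit = term.get("external_id", {}).get("value") == "1"
    return {"count": 1 if hit else 0}


print(doc_exists(fake_send, "myindex", "external_id", "1"))
print(doc_exists(fake_send, "myindex", "external_id", "2"))
```

With the fake transport, the first call reports that the document exists and the second that it does not, so a PUT handler could branch on that result before issuing the update.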

How to ignore highlighted fields in a kibana query?

I'm trying to make a query in Kibana that shows all the errors in a service, but the results only show the documents that have the field "highlight". How can I ignore it?
I've tried making a DSL query like this:
{
"query": {
"exists": {
"field": "payload.error"
}
}
}
but it does not work as I expect.
This is the structure of the data that is shown in the query response:
"payload": {
"method": "standardError",
"error": {
"code": "300",
"detail": "{\"Cliente no posee fecha\"}",
"message": "BUS_ERROR"
}
},
"highlight": {
"payload.error.code.keyword": [
"#kibana-highlighted-field#107#/kibana-highlighted-field#"
]
}
The data that is not shown in the query result does not have the field "highlight" but has the exact same payload structure.
I expect a query that shows all the data with the field payload.error, no matter whether it has a highlight field or not.
Not sure what you are looking for, but can't you just use the Search Box on the top and write something like:
payload.error:* AND highlight.payload.error.code.keyword:
That will give you all the hits that fulfill them both. Or:
payload.error:*
That will give you all the hits where "payload.error" is used.
As I read your example, highlight doesn't really have anything to do with the payload shown above, so searching only for "payload.error:*" should be enough.

Search within the results got from elasticsearch

Is it possible to search within the results that I get from elasticsearch?
To achieve that currently I need to run & wait for two searches on elasticsearch: the first search is
{ "match": { "title": "foo" } }
It takes 5 seconds and returns 500 docs. Then comes a second search:
{
"bool": {
"must": [
{ "match": { "title": "foo" } },
{ "match": { "title": "bar" } }
]
}
}
It takes another 5 seconds and returns 200 docs, which basically has nothing to do with the first search from elasticsearch's perspective.
Instead of doing it this way, I'd like to offer a "search further within the result" option to my users. With this option, users could run a second search with additional keywords against the results returned by the first search.
So my scenario is that a user makes a first search with the keyword "foo", gets 500 results on the webpage, and then selects "search further within the result" to make a second search within those 500 results, hoping to get refined results really quickly.
How can I achieve this? Thanks!
What you could do is use the IDS query. Collect all document IDs from the first request, and then post them with a new Bool query that includes an IDS query in a must clause next to the original query. You could efficiently collect the IDs in the first request using the Scroll API. Since you will return the second result sorted anyway, it does not make sense to do any sorting in the first request, so you can speed up the first request.
See:
Scroll API: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
IDS Query: https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-ids-query.html
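A minimal sketch of the second request described above, written as a Python helper that only builds the request body. The function name refine_within and the sample IDs are hypothetical; the body uses the bool and ids queries named in this answer.

```python
from typing import Dict, List


def refine_within(ids: List[str], query: Dict) -> Dict:
    """Build a bool query that applies `query` only to previously collected IDs."""
    return {
        "query": {
            "bool": {
                "must": [
                    query,                      # the refined search
                    {"ids": {"values": ids}},   # restrict to the first result set
                ]
            }
        }
    }


# Hypothetical IDs collected from the first search for "foo".
first_result_ids = ["1", "7", "42"]
body = refine_within(first_result_ids, {"match": {"title": "bar"}})
print(body)
```

The resulting body can be sent to the _search endpoint of the same index. Whether this is faster than rerunning the combined query depends on how many IDs are carried over, so it is worth measuring on real data.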
post_filter is a way to search inside another search.
In your case :
GET _search
{
"query": {
"match": {
"title": "foo"
}
},
"post_filter": {
"match": {
"title": "bar"
}
}
}
post_filter will be executed on the query result.

How about using body and GET parameters at the same time?

Here I am passing some parameters via GET to limit the query result, and the query string is also passed in the URL. However, I am also sending a request body to filter the results.
curl -XGET 'http://localhost:9200/books/fantasy/_search?from=0&size=10&q=%2A' -d '{
"query":{
"filtered":{
"filter":{
"exists":{
"field":"speacial.ean"
}
}
}
}
}'
I just want to check: is this approach okay? Are there any downsides to doing it like this? Or should I avoid passing parameters in the URL when a body is used?
This seems to work, but is it bad practice?
GET requests are not supposed to use a body. While curl might convert your GET requests with a body to POST, many tools might simply drop the body, or it might be sent to Elastic but ignored because you used GET.
When executing this query in Sense, I get all the documents instead of just the document matching my query, proving that the body has been ignored:
GET myIndex/_search
{
"query": {
"match": {
"zlob": true
}
}
}
This example shows that you should avoid using GET for requests with a body, because the result will depend on the tool you use for your REST queries.
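One way to sidestep the problem is to send the search as a POST and move from and size into the body. The Python sketch below only builds the request with the standard library and does not send it. The helper name build_search_request is made up, the host and path are the ones from the question, and the q URL parameter is left out as a simplification.

```python
import json
import urllib.request


def build_search_request(base_url, path, query, start=0, size=10):
    """Build a POST _search request with paging moved into the body."""
    body = {"from": start, "size": size, "query": query}
    return urllib.request.Request(
        url=f"{base_url}/{path}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Query copied from the question, field name included as written there.
query = {"filtered": {"filter": {"exists": {"field": "speacial.ean"}}}}
req = build_search_request("http://localhost:9200", "books/fantasy", query)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it; since the method is an explicit POST, no tool along the way has a reason to drop the body.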

Confused about elasticsearch query

POST http://localhost:9200/test2/drug?pretty
{
"title": "I can do this"
}
get test2/drug/_search
{
"query" : {
"match": {
"title": "cancer"
}
}
}
The mappings are:
{
"test2": {
"mappings": {
"drug": {
"properties": {
"title": {
"type": "string"
}
}
}
}
}
}
Running the above query returns the document. I want to understand what Elasticsearch is doing behind the scenes. Judging by the output of the default analyzer, it does not tokenize cancer into anything like "can", so why is a document with the word "can" being returned, and what is causing this? In other words, what other processing is applied to the search query "cancer"?
Updated
Is there a command I can run on my box that will clear all indexes and everything so I have a clean slate? I ran delete /* which succeeded but still getting a match.
The problem with your test is, if you are using Sense, the get request. In Sense it should be GET (capital letters).
The explanation is related to GET vs. POST HTTP methods.
Behind the scenes, Sense actually converts a GET request to an HTTP POST (given that many browsers do not support HTTP GET requests with a request body). This means that, even if you write GET, the actual HTTP request is a POST.
Because Sense's autocomplete forces upper-case letters for request methods, it checks for those same upper-case letters when deciding whether a request is a GET (and not a lowercase get) with a request body. If it is, the request is transformed into a POST. If the method does not compare equal to GET, Sense sends the request as is, meaning with a get method and with a body. Since the body is then ignored, what reaches Elasticsearch is a test2/drug/_search, which is basically a match_all.
I guess that you configured in your index mappings an NGram filter or tokenizer. Let's suppose (I hope you'll confirm my hypothesis) an Edge NGram is configured. You can check it with:
GET test2/_mapping
Then the document is tokenized as: i,c,ca,can,d,do,t,th,thi,this. As a result, in the index, the token can points to the document I can do this.
When you're searching cancer, the tokens c,ca,can,canc,cance,cancer are produced by the same analysis chain, and then looked for in the index. As a result your document is found.
With the NGram filter, you often need to configure a different analyzer for search than for indexing, for instance:
index_analyzer/analyzer: standard + edge ngram
search_analyzer: standard alone
Then if you search can you'll find documents containing can,cancer,candy... But if you search cancer, you'll only find documents containing cancer,cancerology... and so on.
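The effect can be reproduced outside Elasticsearch with a few lines of Python. This is a toy model, not the real analysis chain: edge_ngrams and whole_words are hypothetical helpers that lowercase, split on whitespace and, for the first one, emit every prefix of each word.

```python
def edge_ngrams(text, min_gram=1, max_gram=10):
    """Toy edge n-gram analyzer: every prefix of every lower-cased word."""
    tokens = set()
    for word in text.lower().split():
        for n in range(min_gram, min(len(word), max_gram) + 1):
            tokens.add(word[:n])
    return tokens


def whole_words(text):
    """Toy stand-in for a search analyzer without n-grams: whole words only."""
    return set(text.lower().split())


# Tokens stored in the index for the document from the question.
index_tokens = edge_ngrams("I can do this")

# Same analyzer at search time: the prefixes of "cancer" overlap with the index.
print(sorted(index_tokens & edge_ngrams("cancer")))
# Whole-word search analyzer: "cancer" no longer overlaps, but "can" still does.
print(sorted(index_tokens & whole_words("cancer")))
print(sorted(index_tokens & whole_words("can")))
```

In this model the first intersection contains c, ca and can, which is why the document matches when both sides use edge n-grams. With whole-word analysis at search time, cancer shares no token with the document while can still does.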
