How do I log all queries in embedded ElasticSearch? - elasticsearch

I'm trying to debug an ElasticSearch query. I've enabled explain for the problematic query, and that is showing that the query is doing a product of intermediate scores where it should be doing a sum. (I'm creating the query request using elastic4s.)
The problem is I cannot see what the generated query actually is. I want to determine whether the bug is in elastic4s (generating the query request incorrectly), in my code, or in elasticsearch. So I've enabled logging for the embedded elasticsearch instance used in the tests using the following code:
ESLoggerFactory.setDefaultFactory(new Slf4jESLoggerFactory())
val settings = Settings.settingsBuilder
  .put("path.data", dataDirPath)
  .put("path.home", "/var/elastic/")
  .put("cluster.name", clusterName)
  .put("http.enabled", httpEnabled)
  .put("index.number_of_shards", 1)
  .put("index.number_of_replicas", 0)
  .put("discovery.zen.ping.multicast.enabled", false)
  .put("index.refresh_interval", "10ms")
  .put("script.engine.groovy.inline.search", true)
  .put("script.engine.groovy.inline.update", true)
  .put("script.engine.groovy.inline.mapping", true)
  .put("index.search.slowlog.threshold.query.debug", "0s")
  .put("index.search.slowlog.threshold.fetch.debug", "0s")
  .build
but I can't find any queries being logged in the log file configured in my logback.xml. Other log messages from elasticsearch are appearing there, just not the actual queries.

You can't, at least not directly, and not in the ES versions currently available. It has been discussed at some length (e.g. https://github.com/elastic/elasticsearch/issues/9172 and https://github.com/elastic/elasticsearch/issues/12187), and it seems this may change soon with the rewrite of the tasks API. In the meantime, you can use something like ES Restlog (https://github.com/etsy/es-restlog) and/or put nginx in front of ES and capture the queries in the nginx logs. You can also use tcpdump (e.g. tcpdump -vvv -x -X -i any port 9200) to capture the query as it runs on the server. One last option is to modify your application to echo the query instead of executing it (and/or to insert the query into ES itself before you execute it, since the query itself is JSON).
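The "echo the query" option can be sketched roughly as follows. This is only an illustration in Python, not elastic4s code: log_and_send, the logger name "es.queries", and the stand-in sender are all hypothetical names, and the real send function would be whatever your client uses.

```python
import json
import logging

# Hypothetical logger name; route it wherever your other ES logs go.
logger = logging.getLogger("es.queries")


def log_and_send(body, send):
    """Log the query body as JSON, then pass it to `send` unchanged."""
    rendered = json.dumps(body, sort_keys=True)
    logger.debug("ES query: %s", rendered)
    return send(body)


# Usage with a stand-in sender that only echoes the body back
# (an assumption for the example; no cluster is contacted).
result = log_and_send({"query": {"match_all": {}}}, send=lambda b: b)
```

Funnelling every search through one such function means the logged JSON is exactly what the application handed to the client, which is the closest thing to a query log the application itself can produce.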

In the specific case of elastic4s, you can call .show on the elastic4s query object. For most types of request, this generates what the JSON body of the request would have been if the JSON-over-HTTP protocol had been used to send it. This can then be logged at a convenient point in your code, e.g. if you have one method that generates all ES search queries. The code in Elasticsearch that generates the fake JSON could still have bugs of course, so it should not be trusted entirely. However, it's worth trying to reproduce the issue with the output of .show using Sense against a real Elasticsearch cluster over HTTP. If you can, you (a) know that it's not an elastic4s bug, and (b) can easily manipulate the JSON to try to figure out what's causing the problem.
.show calls toString in some cases, so with the plain Elasticsearch API or another JVM-based wrapper on top of it, you can call toString to get the JSON string to log.
With embedded Elasticsearch, this is as good as you're going to get in terms of logging - short of putting a breakpoint on the builder invocations and observing the actual Java Elasticsearch request objects that are created (which is the most accurate approach).

Related

How to submit queries from the elastic cloud api console?

I'm new to the elastic-cloud interface. It lets me choose between the operations get, post, put and del. I'm trying to submit queries, but I don't know the precise syntax. For instance:
tweet/_search?q=something
works, but:
tweet/_search?q={ "match_all": {} }
does not, returning a parser error. I have tried with double quotes, but it seems that then it searches for the query as a string.
The preferred way to test the search APIs is to use the POST method. In some cases a GET request can even give incorrect search results, because the search is ignored and you get the top 10 results of a match_all query instead.
Elasticsearch supports both GET and POST for search, but a GET request that carries a payload isn't common on modern app servers. Although Elasticsearch implements it, it requires carefully crafting your queries.
Still, if you want to use the GET API, then for complex queries it's better to send the query as part of the request body. I know it sounds weird to send a body with a GET request, but it works 😀.
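As a rough sketch of the POST-with-body form, the snippet below builds (but does not send) such a request using only the Python standard library. The host http://localhost:9200 and the helper name build_search_request are assumptions for illustration; the tweet index and the match_all query come from the question.

```python
import json
import urllib.request


def build_search_request(base_url, index, query):
    """Build (but do not send) a POST _search request with a JSON body."""
    data = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed local cluster address; nothing is sent until urlopen(req) is called.
req = build_search_request("http://localhost:9200", "tweet", {"match_all": {}})
```

In the API console, the equivalent would be choosing post, entering tweet/_search as the path, and putting the JSON (wrapped in a top-level "query" key) in the request body rather than after q=.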

In Elasticsearch searches, are query string parameters for GET requests and the "Query DSL" for POST requests functionally equivalent?

I'm trying to create a small app that displays some simple visualizations from data indexed on Elasticsearch (on an AWS managed Elasticsearch service).
Since, to the best of my knowledge, the degree of access control that AWS offers over its ES service is based on allowing specific HTTP verbs (GET, POST, etc), to simplify my life and the ES admin's, I'm granting this app "read only" permissions, so only GET and HEAD.
However, I see that for its search API, ES exposes a GET endpoint that works with query string parameters, and a POST endpoint that works with a JSON based "Query DSL". This DSL seems to be the preferred method in all examples I have seen online and in the books.
Given the predominance of the Query DSL throughout the documentation, I was wondering:
Does the Query DSL expose functionality that standard query string parameters don't, or are they functionally equivalent?
Does the POST search endpoint result in any data actually being POSTed, or is it only a workaround for sending JSON as a query, one that breaks a little with REST conventions?
As per the docs
You can use query parameters to define your search criteria directly in the request URI, rather than in the request body. Request URI searches do not support the full Elasticsearch Query DSL, but are handy for testing.
The GET behavior is slightly confusing but even Kibana sends a POST in the background when you perform a GET with a body. If you have to use GET, some query results might be unexpected. What's your exact use case? Which queries are we talking?
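To make the difference concrete, here is a small sketch of a request-URI search next to a request body that expresses the same search. The q parameter takes Lucene query string syntax, which corresponds to the query_string query in the DSL. The helper names and the field names user and message are hypothetical.

```python
from urllib.parse import urlencode


def uri_search_path(index, lucene_query, size=10):
    """Build a request-URI search: the query goes in the `q` parameter."""
    params = urlencode({"q": lucene_query, "size": size})
    return f"/{index}/_search?{params}"


def body_equivalent(lucene_query, size=10):
    """Express the same search as a request body using a query_string query."""
    return {"size": size, "query": {"query_string": {"query": lucene_query}}}


# Hypothetical field names, for illustration only.
path = uri_search_path("tweet", "user:kimchy AND message:search")
body = body_equivalent("user:kimchy AND message:search")
```

Anything beyond what the query string syntax can express (other DSL query types, aggregations, and so on) has to go in a request body, which is why a URI-only setup covers just part of the search API.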

ElasticSearch: Is it possible to use dfs_query_then_fetch with the explain API?

Did the Explain endpoint ever support search_type: dfs_query_then_fetch? If it does now (I'm on 7.1), how do I specify it?
I was thrown for a loop when I used the Explain API on two identical documents and saw different score calculations. Learning that the documents lived in different shards, and that the TF/IDF inputs were calculated per shard, explained the difference. Using dfs_query_then_fetch on the Search API normalized the scores, but the ElasticSearch .NET clients (both LowLevel and NEST) don't appear to expose a way to specify it for calls to the Explain API.
I also tried to form a request manually, passing it as a querystring or request body parameter. Both fail saying the argument is invalid. I thought perhaps the Explain endpoint didn't offer a way to specify dfs_query_then_fetch, but digging through some old issues it appears that it at least did at some point:
https://github.com/elastic/elasticsearch/issues/2612
Search type is not supported on the explain API. An approach that might work would be to use the Search API with dfs_query_then_fetch and explain, with a compound query that filters only to the document you're interested in (using IdsQuery), along with the query you want the explanation for.
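A minimal sketch of that approach, written as plain Python dictionaries rather than NEST code: the helper name explain_one_document and the example title field are hypothetical, and the document id is a placeholder. The ids clause sits in filter context so it restricts the hits without contributing to the score being explained.

```python
def explain_one_document(doc_id, query):
    """Build URL params and a search body that explain `query` for one document."""
    body = {
        "explain": True,
        "query": {
            "bool": {
                # The query whose scoring you want explained.
                "must": [query],
                # Filter context: limits results to one document, adds no score.
                "filter": [{"ids": {"values": [doc_id]}}],
            }
        },
    }
    # search_type is passed as a URL parameter on the _search request.
    params = {"search_type": "dfs_query_then_fetch"}
    return params, body


# Placeholder id and a hypothetical `title` field.
params, body = explain_one_document("1", {"match": {"title": "elasticsearch"}})
```

Each hit in the search response then carries an _explanation section computed with the distributed term statistics.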

How do I generate fragment types for apollo client?

I have an app written with reason-react using apollo-client. I have defined some fragments on the frontend, basically to reuse some field definitions. I'm setting up automated tests for a component that uses fragments, but I keep getting this warning saying I need to use the IntrospectionFragmentMatcher.
'You are using the simple (heuristic) fragment matcher, but your queries contain union or interface types. Apollo Client will not be able to accurately map fragments. To make this error go away, use the `IntrospectionFragmentMatcher` as described in the docs: https://www.apollographql.com/docs/react/advanced/fragments.html#fragment-matcher'
I've tried setting up the fragment matcher according to the docs. The codegen result returns no types:
{
  "__schema": {
    "types": []
  }
}
When I queried my server and looked at the manual method recommended by apollo-client, I noticed it would also return no types.
Another strange thing is that when I don't use the fragment matcher, I get the mocked response back but I just get the warnings from apollo. If I do use it then the mocked response doesn't return correctly.
Why would I query the GraphQL API for fragments defined in my frontend code? Why would I only receive these errors when running the tests and using mock data, but not when running my actual application?
As the error states, the default fragment matcher does not work on union or interface types. You will need to use Apollo's IntrospectionFragmentMatcher. It works by asking the server (introspecting) for information about your schema types, and then providing that information to the cache for reference so that it can match the fields accurately. It's not querying the server for information about the fragments you define on the front end; it's asking for data about the GraphQL schema that must be defined on your back end, so that it can properly relate the two. There is an example in the documentation.
As for why your server is not returning any types, that is a separate issue that would require more info to debug. If you're using Apollo Server, double-check your schema to make sure all the necessary types are defined properly and that you are passing them into the server when it's initialized.

Why I can't search for UUID in elasticsearch database using filters

I have a Django app which uses an elasticsearch database. I use elasticsearch-dsl, and all my filters and queries work. But I have a problem with one parameter, the UUID. I always get 0 results from this request in the shell:
s = Search(index='my_index_name').filter('term', UUID='0deaa49b-15b6-4c10-acb7-d98df800e0df')
response=s.execute()
response
I also use django-rest-elasticsearch and have the same issue there: I get correct REST results with all my other filters, but not with a UUID request. Something like this works, but I need to use filtering.
q = Q("multi_match", query="0deaa49b-15b6-4c10-acb7-d98df800e0df", fields=["UUID",])
response=s.execute()
response
Maybe someone knows how to use UUID in my REST requests, because UUID=0deaa49b-15b6-4c10-acb7-d98df800e0df doesn't work.
