how to log or print python elasticsearch-dsl query that gets invoked - elasticsearch

I am using elasticsearch-dsl for my python application to query elastic search.
To debug what query is actually getting generated by elasticsearch-dsl library, I am unable to log or print the final query that goes to elasticsearch.
For example, like to see the request body sent to elasticsearch like this :
{
"query": {
"query_string": {
"query": "Dav*",
"fields": ["name", "short_code"],
"analyze_wildcard": true
}
}
}
Tried to bring the elasticsearch log level to TRACE. Even then, unable to see the queries that got executed.

Take a look at my blog post here, "Slowlog settings at index level" section. Basically, you can use slowlog to print in a separate log file Elasticsearch generates, the queries. I suggest using a very low threshold to be able to see all the queries.
For example, something like this, for a specific index:
PUT /test_index/_settings
{
"index": {
"search.slowlog.level": "trace",
"search.slowlog.threshold.query.trace": "1ms"
}
}
Or
PUT /_settings
{
"index": {
"search.slowlog.level": "trace",
"search.slowlog.threshold.query.trace": "1ms"
}
}
as a cluster-wide setting, for all the indices.
And the queries will be logged in your /logs location, a file called [CLUSTER_NAME]_index_search_slowlog.log.

Related

How to extract and visualize values from a log entry in OpenShift EFK stack

I have an OKD cluster setup with EFK stack for logging, as described here. I have never worked with one of the components before.
One deployment logs requests that contain a specific value that I'm interested in. I would like to extract just this value and visualize it with an area map in Kibana that shows the amount of requests and where they come from.
The content of the message field basically looks like this:
[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}
This plz is a German zip code, which I would like to visualize as described.
My problem here is that I have no idea how to extract this value.
A nice first success would be if I could find it with a regexp, but Kibana doesn't seem to work the way I think it does. Following its docs, I expect this /\"plz\":\"[0-9]{5}\"/ to deliver me the result, but I get 0 hits (time interval is set correctly). Even if this regexp matches, I would only find the log entry where this is contained and not just the specifc value. How do I go on here?
I guess I also need an external geocoding service, but at which point would I include it? Or does Kibana itself know how to map zip codes to geometries?
A beginner-friendly step-by-step guide would be perfect, but I could settle for some hints that guide me there.
It would be possible to parse the message field as the document gets indexed into ES, using an ingest pipeline with grok processor.
First, create the ingest pipeline like this:
PUT _ingest/pipeline/parse-plz
{
"processors": [
{
"grok": {
"field": "message",
"patterns": [
"%{POSINT:plz}"
]
}
}
]
}
Then, when you index your data, you simply reference that pipeline:
PUT plz/_doc/1?pipeline=parse-plz
{
"message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}"""
}
And you will end up with a document like the one below, which now has a field called plz with the 12345 value in it:
{
"message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}""",
"plz": "12345"
}
When indexing your document from Fluentd, you can specify a pipeline to be used in the configuration. If you can't or don't want to modify your Fluentd configuration, you can also define a default pipeline for your index that will kick in every time a new document is indexed. Simply run this on your index and you won't need to specify ?pipeline=parse-plz when indexing documents:
PUT index/_settings
{
"index.default_pipeline": "parse-plz"
}
If you have several indexes, a better approach might be to define an index template instead, so that whenever a new index called project.foo-something is created, the settings are going to be applied:
PUT _template/project-indexes
{
"index_patterns": ["project.foo*"],
"settings": {
"index.default_pipeline": "parse-plz"
}
}
Now, in order to map that PLZ on a map, you'll first need to find a data set that provides you with geolocations for each PLZ.
You can then add a second processor in your pipeline in order to do the PLZ/ZIP to lat,lon mapping:
PUT _ingest/pipeline/parse-plz
{
"processors": [
{
"grok": {
"field": "message",
"patterns": [
"%{POSINT:plz}"
]
}
},
{
"script": {
"lang": "painless",
"source": "ctx.location = params[ctx.plz];",
"params": {
"12345": {"lat": 42.36, "lon": 7.33}
}
}
}
]
}
Ultimately, your document will look like this and you'll be able to leverage the location field in a Kibana visualization:
{
"message": """[fooServiceClient#doStuff] {"somekey":"somevalue", "multivalue-key": {"plz":"12345", "foo": "bar"}, "someotherkey":"someothervalue"}""",
"plz": "12345",
"location": {
"lat": 42.36,
"lon": 7.33
}
}
So to sum it all up, it all boils down to only two things:
Create an ingest pipeline to parse documents as they get indexed
Create an index template for all project* indexes whose settings include the pipeline created in step 1

ElasticSearch - Delete documents by specific field

This seemingly simple task is not well-documented in the ElasticSearch documentation:
We have an ElasticSearch instance with an index that has a field in it called sourceId. What API call would I make to first, GET all documents with 100 in the sourceId field (to verify the results before deletion) and then to DELETE same documents?
You probably need to make two API calls here. First to view the count of documents, second one to perform the deletion.
Query would be the same, however the end points are different. Also I'm assuming the sourceId would be of type keyword
Query to Verify
POST <your_index_name>/_search
{
"size": 0,
"query": {
"term": {
"sourceId": "100"
}
}
}
Execute the above Term Query and take a note at the hits.total of the response.
Remove the "size":0 in the above query if you want to view the entire documents as response.
Once you have the details, you can go ahead and perform the deletion using the same query as shown in the below query, notice the endpoint though.
Query to Delete
POST <your_index_name>/_delete_by_query
{
"query": {
"term": {
"sourceId": "100"
}
}
}
Once you execute the Deletion By Query, notice the deleted field in the response. It must show you the same number.
I've used term queries however you can also make use of any Match or any complex Bool Query. Just make sure that the query is correct.
Hope it helps!
POST /my_index/_delete_by_query?conflicts=proceed&pretty
{
"query": {
"match_all": {}
}
}
Delete all the documents of an index without deleting the mapping and settings:
See: https://opster.com/guides/elasticsearch/search-apis/elasticsearch-delete-by-query/

Is it possible to put a comment in an Elasticsearch query?

Is it possible to put a comment into an Elasticsearch query JSON? I want to be able to add some extra text to the query that's human-readable but ignored by Elasticsearch.
For example, if I have the following query:
{ "query": { "match_all": {} } }
I would like to be able to add a comment, maybe something like this:
{ "query": { "match_all": {} }, "comment": "This query matches all documents." }
Hacky workarounds (e.g., a query clause that has no effect on the results) would also be appreciated.
Seems like Elasticsearch does allow Javascript comments (/* */ and //) in JSON (Despite the JSON standard not supporting comments). So that's another option.
One solution to make this work is to use named queries, i.e. each query can be named
{
"query": {
"match_all": {
"_name": "This query matches all documents."
}
}
}
by inserting # [hash symbol] ,yes you can put comment for elastic search queries in console

Elasticsearch to wildcard search email addresses

I'm trying to use elasticsearch for a project I'm working on. I was wondering if someone could help steer me in the right direction. I'm using an index with 100+ million records.
I need to be able to search with a wildcard query like the following:
b*g#gmail.com
b*g#*.com
*gus#gmail.com
br*gu*#gmail.com
*g*#*
When I try using Wildcard and other searches, I don't get completely expected results.
What type of search with elasticsearch should I look into implementing? Is ElasticSearch even the right tool to be using? The source I'm pulling this out of is Mysql, so if not I may consider using Sphinx or Solr.
I assume that you have tried out the wildcard query as described here.
However, it has very different behaviour if your email is analyzed versus not analyzed. I would suggest you delete your index and change your mapping. e.g.
PUT /emails
{
"mappings": {
"email": {
"properties": {
"email": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
Once you have this, you can just do the normal wildcard query or query_string. e.g.
GET emails/_search
{
"query": {
"wildcard": {
"email": {
"value": "s*com"
}
}
}
}
As an aside, when you just index email without setting it as not_analyzed, the default mapping actually splits up the email prefix from the domain and so that's why you don't get results for when you do s*#gmail.com. You would still get results for s* or *gmail.com but for your case, using not_analyzed works correctly. If you want to support case insensitivity, then you might want to look at a custom analyzer that uses the uax_url_email tokenizer as described here.

How to update multiple documents that match a query in elasticsearch

I have documents which contains only "url"(analyzed) and "respsize"(not_analyzed) fields at first. I want to update documents that match the url and add new field "category"
I mean;
at first doc1:
{
"url":"http://stackoverflow.com/users/4005632/mehmet-yener-yilmaz",
"respsize":"500"
}
I have an external data and I know "stackoverflow.com" belongs to category 10,
And I need to update the doc, and make it like:
{
"url":"http://stackoverflow.com/users/4005632/mehmet-yener-yilmaz",
"respsize":"500",
"category":"10"
}
Of course I will do this all documents which url fields has "stackoverflow.com"
and I need the update each doc oly once.. Because category data of url is not changeable, no need to update again.
I need to use _update api with _version number to check it but cant compose the dsl query.
EDIT
I run this and looks works fine:
But documents not changed..
Although query result looks true, new field not added to docs, need refresh or etc?
You could use the update by query plugin in order to do just that. The idea is to select all document without a category and whose url matches a certain string and add the category you wish.
curl -XPOST 'localhost:9200/webproxylog/_update_by_query' -H "Content-Type: application/json" -d '
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"term": {
"url": "stackoverflow.com"
}
},
{
"missing": {
"field": "category"
}
}
]
}
}
}
},
"script" : "ctx._source.category = \"10\";"
}'
After running this, all your documents with url: stackoverflow.com that don't have a category, will get category: 10. You can run the same query again later to fix new stackoverflow.com documents that have been indexed in the meantime.
Also make sure to enable scripting in elasticsearch.yml and restart ES:
script.inline: on
script.indexed: on
In the script, you're free to add as many fields as you want, e.g.
...
"script" : "ctx._source.category1 = \"10\"; ctx._source.category2 = \"20\";"
UPDATE
ES 2.3 now features the update by query functionality. You can still use the above query exactly as is and it will work (except that filtered and missing are deprecated, but still working ;).
That all sounds great but just to add to #Val answer, Update By Query is available form ElasticSearch 2.x but not for earlier versions. In our case we're using 1.4 for legacy reasons and there is no chance of upgrading in forseeable future so another solution is using the Update by query plugin provided here: https://github.com/yakaz/elasticsearch-action-updatebyquery

Resources