Distributed Tracing logging and Integrating with Logstash, Kibana and ElasticSearch - elasticsearch

I have implemented a Distributed Transaction Logging library with a tree-like structure, as described in Google Dapper (http://research.google.com/pubs/pub36356.html) and the eBay CAL Transaction Logging Framework (http://devopsdotcom.files.wordpress.com/2012/11/screen-shot-2012-11-11-at-10-06-39-am.png).
Log Format
TIMESTAMP HOSTNAME DATACENTER ENVIRONMENT EVENT_GUID PARENT_GUID TRACE_GUID APPLICATION_ID TREE_LEVEL TRANSACTION_TYPE TRANSACTION_NAME STATUS_CODE DURATION(in ms) PAYLOAD(key1=value1,key2=value2)
GUID HEX NUMBER FORMAT
MURMUR_HASH(HOSTNAME + DATACENTER + ENVIRONMENT)-JVM_THREAD_ID-(TIMESTAMP + atomic counter)
What I would like to do is integrate this format with the Kibana UI so that when a user searches and clicks on a TRACE_GUID, it shows something similar to a distributed call graph indicating where the time was spent, like the Zipkin UI: http://twitter.github.io/zipkin/. I am not a UI developer, so if someone can point me to how to do this, that would be great.
I would also like to know how I can index the Elasticsearch payload data so that a user can specify an expression against the payload, such as (duration > 1000), and Elasticsearch will return all the log lines that satisfy the condition. I would also like to index the payload as name=value pairs so users can query something like (key3=value2 or key4=exception), or some sort of regular expression. Please let me know if this can be achieved. Any help or pointers would be great.
Thanks,
Bhavesh

The first step to good searching in Elasticsearch is to create fields from your data. With logs, Logstash is the proper tool: the grok{} filter uses patterns (existing or user-defined regexps) to split the input into fields.
You would also need to make sure that duration was mapped as an integer (e.g. %{INT:duration:int} in your pattern). You could then query Elasticsearch for duration:>1000 to get the results.
Elasticsearch uses the Lucene query engine, so you can find sample queries based on that.
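As a minimal sketch of such a filter, assuming the log format described in the question (the grok pattern names and the kv{} step for the payload are assumptions, not tested against real data):

filter {
  grok {
    # Split one log line into named fields; duration and the other numerics are cast to integers
    match => { "message" => "%{TIMESTAMP_ISO8601:timestamp} %{HOSTNAME:hostname} %{WORD:datacenter} %{WORD:environment} %{NOTSPACE:event_guid} %{NOTSPACE:parent_guid} %{NOTSPACE:trace_guid} %{NOTSPACE:application_id} %{INT:tree_level:int} %{WORD:transaction_type} %{NOTSPACE:transaction_name} %{INT:status_code:int} %{INT:duration:int} %{GREEDYDATA:payload}" }
  }
  # Turn "key1=value1,key2=value2" into individual key1/key2 fields so each is queryable
  kv {
    source => "payload"
    field_split => ","
    value_split => "="
  }
}

With duration indexed as an integer and the payload keys as their own fields, queries like duration:>1000 or key4:exception work directly in the Kibana search bar.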

Related

Does ElasticSearch Keep Count Of The Number Of Times A Record Is Returned In A Given Period Of Time?

I have an ElasticSearch instance and it does one type of search - it takes a few parameters and returns the companies in its index that match the parameters given.
I'd like to be able to pull some stats that essentially says "This company has been returned from search queries X number of times in the past week".
Does ElasticSearch store metadata that will allow me to pull this kind of info from it? If this kind of data isn't stored in ES out of the box, is there a way to enable it?
Elasticsearch (not ElasticSearch ;)) does not do this natively, no. You can build something using the slow log, where you set the threshold to 0 to get it to log everything, but that logs every query, which may be too noisy to be useful.
Things like https://www.elastic.co/enterprise-search, built on top of Elasticsearch, do provide this sort of insight.

elasticsearch query statistics and analysis in near real time

I am pretty new to Elasticsearch and I want to create statistics and Kibana dashboards on queries sent to an Elasticsearch index. What is the best approach to do so? Any advice or recommendations will be highly appreciated.
The idea is to analyze all queries sent to the index and do some performance optimisation in the future when the userbase increases...
I am planning for the moment to store the logs in a different index, but parsing seems to be a rather complex activity...
Ideally I need to have:
- Counting of user queries
- Counting of queries that returned no results
- Logging of all search terms
- Sorting of queries, and of queries that returned no results, by most frequently contained search term
- A view of top queries, including the search term and the exact query
- A view of top queries returning no results, including the search term no results were found for and the exact query
Thanks
There is no OOTB functionality available in Elasticsearch for search analysis, but there are some workarounds you can use to get the information you are asking for.
First option: you can enable the slow log in Elasticsearch by executing the command below, which will log every request coming into Elasticsearch.
PUT /my-index-000001/_settings
{
  "index.search.slowlog.threshold.query.info": "0s",
  "index.search.slowlog.threshold.fetch.info": "0s"
}
Second option: you can log all the queries at the application layer, or at whatever intermediate level your application and Elasticsearch use to talk to each other.
Once you have the logs, you can configure Logstash / Filebeat / Fleet to read them, transform them, and index them into Elasticsearch. Logstash provides different kinds of filters which let you easily transform your plain-text logs into structured logs (e.g. the grok filter).
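Once the parsed logs are indexed, counts like "queries that returned no results" become simple aggregations. A sketch, assuming your pipeline produces documents with a query_text field and a total_hits field (the index and both field names are assumptions about your pipeline, not anything Elasticsearch provides by itself):

GET /search-logs/_search
{
  "size": 0,
  "query": { "term": { "total_hits": 0 } },
  "aggs": {
    "top_no_result_terms": {
      "terms": { "field": "query_text.keyword", "size": 10 }
    }
  }
}

The terms aggregation ranks the most frequent no-result search terms; the same request without the term filter gives overall query counts, and either can back a Kibana visualization.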

Using ElasticSearch Local version in postman

I am trying to use my Elasticsearch server, installed on my local machine, with Postman, i.e., with the help of Postman I want to POST data and retrieve it with a GET operation, but I am unable to do so because I am getting the error unknown key [High] for create index.
So please help me with the same.
If you want to add a document to your index,
your URL should look something like this (for document ID 1):
PUT http://localhost:9200/test/_doc/1
A good place to start:
https://www.elastic.co/guide/en/elasticsearch/reference/current/getting-started-index.html
For indexing a document in the index:
PUT http://localhost:9200/my_index/_doc/1
Retrieving the indexed document:
GET http://localhost:9200/my_index/_doc/1
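For example, a document PUT with a small JSON body, then read back with GET (the index name and fields here are just placeholders). In Postman, set the method to PUT, paste the URL, and put the JSON under Body > raw (JSON):

PUT http://localhost:9200/my_index/_doc/1
{
  "name": "first document",
  "created": "2021-01-01"
}

GET http://localhost:9200/my_index/_doc/1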
Introduction:
Elasticsearch is a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases. As the heart of the Elastic Stack, it centrally stores your data for lightning fast search, fine‑tuned relevancy, and powerful analytics that scale with ease.
Kibana is a free and open user interface that lets you visualize your Elasticsearch data and navigate the Elastic Stack. Do anything from tracking query load to understanding the way requests flow through your apps.
Logstash is a free and open server-side data processing pipeline that ingests data from a multitude of sources, transforms it, and then sends it to your favorite “stash.”
Elasticsearch exposes itself through a REST API, so in this case you don't have to use Logstash, as we are adding data directly to Elasticsearch.
How to add it directly
You can create an index and type using:
{{url}}/index/type
where the index is like a table and the type is just a unique data type that we will be storing in the index, e.g. {{url}}/movielist/movie.
https://praveendavidmathew.medium.com/visualization-using-kibana-and-elastic-search-d04b388a3032
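A concrete version of that pattern might look like the following (the movie fields are purely illustrative; note also that mapping types such as movie are deprecated in Elasticsearch 7.x and removed in 8.x, where _doc is used instead):

POST http://localhost:9200/movielist/movie
{
  "title": "Interstellar",
  "genre": "sci-fi",
  "year": 2014
}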

Elasticsearch-Hadoop: how to do a bulk search in a Spark program

I am writing a Spark program which is basically an RDD of Strings. What I need to do is create a query per string and run that query against an Elasticsearch index, so essentially the query differs per string. I wanted to use elasticsearch-hadoop to do the search so I can benefit from its optimizations. The RDD can be large, and I am looking for any optimizations possible.
For example, the RDD is List[India, IBM Company, Netflix, Lebron James]. We will create a more-like-this query for each of these terms, run it against the Wikipedia index, and get back the results; that is, four more-like-this queries, one each for India, IBM Company, Netflix, and Lebron James, and the hits for each of them.
I do have a workaround where I can use the HTTP REST API with a bulk (multi) search to get back the hits, but there I will be doing the optimizations on my own. I wanted to see if we can use the Spark Elasticsearch connector to create the queries and do the search in an optimized way.
This use case is not possible. Elasticsearch-hadoop basically assumes one fixed query (or a few of them) for a job; it does not work in an n-query batch mode.
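For reference, the REST workaround mentioned in the question would go through the multi-search endpoint. A rough sketch (the wikipedia index name and the fields list are assumptions):

POST http://localhost:9200/wikipedia/_msearch
{}
{"query": {"more_like_this": {"fields": ["title", "text"], "like": "India", "min_term_freq": 1}}}
{}
{"query": {"more_like_this": {"fields": ["title", "text"], "like": "Lebron James", "min_term_freq": 1}}}

Each pair of lines is one search (an empty header plus a query body), and the response contains one result set per query, so the batching per partition of the RDD remains hand-rolled, as the answer says.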

Kibana: How to visualise based on two fields

I have imported weblogs into Elasticsearch via Logstash. This has completed successfully.
I have a field in the log file (clientip) that is always populated and another field that is sometimes populated (trueclientip). I want to aggregate based on the coalescing of the two; e.g. if trueclientip is not empty then use that otherwise use clientip.
How can I do this with a visualisation in Kibana? Do I need to generate a scripted field, or is there another approach?
Thanks.
Define a scripted field with this formula: doc['trueclientip'].value ? doc['trueclientip'].value : doc['clientip'].value and use it in your aggregations.
But there is a downside to the scripted-fields functionality combined with the ip type: what you get back from the script is the number itself (which is logical, because scripted fields in Kibana 4 only use Lucene expressions as their language), not the string representation. IPs are actually stored internally as long numbers in Lucene.
For example, 127.0.0.1 is represented internally as 2130706433. And this is what you will see in Visualize.
This is not ideal, indeed, and it would be good to have a more advanced scripting language available in scripted fields, but a GitHub issue for that already exists.
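On newer stacks, where Kibana scripted fields can use Painless rather than Lucene expressions, the same coalescing can return the IP as a string directly. A sketch, using the field names from the question (untested):

doc['trueclientip'].size() == 0 ? doc['clientip'].value : doc['trueclientip'].value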
