Using Timelion in Elasticsearch/Kibana 5.0

I'm trying to visualize a time series in Timelion. I have a few hundred data points in Elasticsearch in roughly this format (I've manually removed some fields that I never meant to use in the time series plot):
"_index": "foo-2016-11-06",
"_type": "bar",
"_id": "7239171989271733678",
"_score": 1,
"_source": {
"timestamp": "2016-11-06T15:27:37.123581+00:00",
"rank": 2
}
What I want is simply to plot the change in rank over time. I found the post "Kibana Timelion plugin how to specify a field in the elastic search", which seems to describe the same thing, and I understand I should be able to just do .es(metric='sum:rank').
My problem is that no matter how I define my timelion query (even just calling .es(*)), I end up just getting a horizontal line where y=0.
[Screenshot: Timelion chart showing a flat horizontal line at y=0]
Things I've tried so far:
Changed timefield in timelion.json from #timefield to just timefield
Extending the timeseries window (even into the future)
Set default_index to _all in timelion.json
Queried specific indices that I know contain data
All of them give me the same outcome which you can see in the attached picture. Does anyone have any idea what might be going on here?

Set timelion.json as follows:
{
"quandl": {
"key": ""
},
"es": {
"timefield": "timestamp",
"default_index": "_all",
"allow_url_parameter": false
},
"graphite": {
"url": "https://www.hostedgraphite.com/UID/ACCESS_KEY/graphite"
},
"default_interval": "1h",
"max_buckets": 2000
}
Then set the granularity to 'Auto' and use this Timelion query: .es(index='foo-2016-11-06', metric='max:rank').
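If the chart still stays flat, it can help to check outside Timelion that the index really has rank values in the selected time range. Below is a minimal sketch of roughly the aggregation that .es(metric='max:rank') asks for: a date histogram on the time field with a max sub-aggregation. The function name and the aggregation names (over_time, max_rank) are made up for illustration, and the body is meant to be sent to POST /foo-2016-11-06/_search.

```python
import json


def rank_histogram_body(time_field="timestamp", metric_field="rank", interval="1h"):
    """Build a search body that buckets documents by time and takes max(rank) per bucket.

    Assumption: Elasticsearch 5.x, where date_histogram takes an "interval" key.
    """
    return {
        "size": 0,  # only the aggregation buckets are of interest, not the hits
        "aggs": {
            "over_time": {  # hypothetical aggregation name
                "date_histogram": {"field": time_field, "interval": interval},
                "aggs": {
                    "max_rank": {"max": {"field": metric_field}}  # hypothetical name
                },
            }
        },
    }


body = rank_histogram_body()
print(json.dumps(body, indent=2))
```

If the buckets come back empty here as well, the problem is more likely the time field or the time range than the Timelion expression itself.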

Related

Misspelling suggestion ("did you mean") with phrase suggest and whitespace correction with Elasticsearch

I use the default "english" analyzer for searching documents and it works pretty well.
But I also need "did you mean" results when the search query is misspelled, or a way to search by such misspelled phrases.
What analyzers/filters/queries do I need to achieve this behaviour?
Source text
Elasticsearch is a distributed, open source search and analytics engine for all types of data,
including textual, numerical, geospatial, structured, and unstructured. Elasticsearch is built
on Apache Lucene and was first released in 2010 by Elasticsearch N.V. (now known as Elastic).
Known for its simple REST APIs, distributed nature, speed, and scalability, Elasticsearch is
the central component of the Elastic Stack, a set of open source tools for data ingestion,
enrichment, storage, analysis, and visualization. Commonly referred to as the ELK Stack
(after Elasticsearch, Logstash, and Kibana), the Elastic Stack now includes a rich collection
of lightweight shipping agents known as Beats for sending data to Elasticsearch.
Search terms
search query => did you mean XXX?
missed letter or something like
Elastisearch => Elasticsearch
distribated => distributed
Apacje => Apache
extra space
Elastic search => Elasticsearch
no space
opensource => open source
misspelled phrase
serach engne => search engine
Your first case (a missed or wrong letter) can be handled with a fuzzy query, and the second one (extra or missing spaces) with a custom analyzer that uses an ngram or edge-ngram tokenizer. For examples of the latter, please refer to my blog post on autocomplete.
Adding fuzzy query example on your sample doc
Index mapping
{
"mappings": {
"properties": {
"title": {
"type": "text"
}
}
}
}
Index your sample docs and use the search queries below
{
"query": {
"fuzzy": {
"title": {
"value": "distributed"
}
}
}
}
And the search result
"hits": [
{
"_index": "didyou",
"_type": "_doc",
"_id": "2",
"_score": 0.89166296,
"_source": {
"title": "distribated"
}
}
]
And for Elasticsearch
{
"query": {
"fuzzy": {
"title": {
"value": "Elasticsearch"
}
}
}
}
And search Result
"hits": [
{
"_index": "didyou",
"_type": "_doc",
"_id": "1",
"_score": 0.8173577,
"_source": {
"title": "Elastisearch"
}
}
]
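To see why these two queries match, it helps to count the edits between the query term and the indexed term. The sketch below is plain Levenshtein distance written for illustration, not the code Elasticsearch runs (Lucene's fuzzy matching also treats a transposition of two adjacent characters as a single edit by default). The function name is an assumption.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance: insertions, deletions and substitutions cost 1 each."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep) the character
        prev = curr
    return prev[-1]


for typo, intended in [("distribated", "distributed"),
                       ("elastisearch", "elasticsearch"),
                       ("apacje", "apache")]:
    print(typo, "->", intended, edit_distance(typo, intended))
```

Each of these pairs is a single edit apart, which is within what a fuzzy query tolerates for terms of this length. A swap such as "serach" for "search" costs two edits under this plain definition.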

Elasticsearch, understanding completion suggester

I got the completion suggester working for autocomplete.
However, I have a question that I can't answer myself:
Why are we storing the suggestions in a field of the document?
GET /my_index/_search
{
"hits": [{
"_id": 1,
"suggest": {
"input": [
"p1",
"p22"
],
"weight": 1
}
}, {
"_id": 2,
"suggest": {
"input": [
"p22",
"p3"
],
"weight": 1
}
}]
}
For autocomplete, don't we just need a list of phrases?
[
"p1",
"p22",
"p3"
]
What do we gain by the association of suggest and the doc?
As in the example, multiple docs can have the same suggest input (p22 here). When I ask for autocomplete for p2, I get p22 twice.
Is there a way of handling this?
There's no other way to store suggestions than storing them in a completion field inside the document itself. This gives you maximum flexibility, because even if two documents have the same or similar suggestions, you can give one a higher weight than the other if you deem necessary.
If you have multiple documents with the same suggestions, you can leverage the skip_duplicates setting and ES will filter out duplicate suggestions from the response.
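A minimal sketch of what such a request body could look like, built as a Python dict. The suggester name doc_suggest and the helper function are made up for illustration; the field name suggest is taken from the question, and skip_duplicates assumes an Elasticsearch version that supports the option.

```python
import json


def completion_suggest_body(prefix, field="suggest", size=5):
    """Build a completion-suggester request that drops duplicate suggestion texts."""
    return {
        "suggest": {
            "doc_suggest": {  # hypothetical suggester name
                "prefix": prefix,
                "completion": {
                    "field": field,
                    "size": size,
                    "skip_duplicates": True,  # return each suggestion text only once
                },
            }
        }
    }


body = completion_suggest_body("p2")
print(json.dumps(body, indent=2))
```

Sent to POST /my_index/_search, this should return p22 once for the prefix p2, even though two documents carry that input.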

Count of "actual hits" (not just matching docs) for arbitrary queries in Elasticsearch

This one really frustrates me. I tried to find a solution for quite a long time, but wherever I try to find questions from people asking for the same, they either want something a little different (like here or here or here) or don't get an answer that solves the problem (like here).
What I need
I want to know how many hits my search has in total, independently from the type of query used. I am not talking about the number of hits you always get from ES, which is the number of documents found for that query, but rather the number of occurrences of document features matching my query.
For example, I could have two documents with a text field "description", both containing the word hero, but one of them containing it twice.
Like in this minimal example here:
Index mapping:
PUT /sample
{
"settings": {
"index" : {
"number_of_shards" : 1,
"number_of_replicas" : 0
}
},
"mappings": {
"doc": {
"properties": {
"name": { "type": "keyword" },
"description": { "type": "text" }
}
}
}
}
Two sample documents:
POST /sample/doc
{
"name": "Jack Beauregard",
"description": "An aging hero"
}
POST /sample/doc
{
"name": "Master Splinter",
"description": "This rat is a hero, a real hero!"
}
...and the query:
POST /sample/_search
{
"query": {
"match": { "description": "hero" }
},
"_source": false
}
... which gives me:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.22396864,
"hits": [
{
"_index": "sample",
"_type": "doc",
"_id": "hoDsm2oB22SyyA49oDe_",
"_score": 0.22396864
},
{
"_index": "sample",
"_type": "doc",
"_id": "h4Dsm2oB22SyyA49xDf8",
"_score": 0.22227617
}
]
}
}
So there are two hits ("total": 2), which is correct, because the query matches two documents. BUT I want to know how many times my query matched inside each document (or the sum of this), which would be 3 in this example, because the second document contains the search term twice.
IMPORTANT: This is just a simple example. But I want this to work for any type of query and any mapping, also nested documents with inner_hits and all.
I didn't expect this to be so difficult, because this must be information ES comes across during search anyway, right? I mean, it ranks documents with more hits inside them higher, so why can't I get the count of these hits?
I am tempted to call them "inner hits", but that is the name of a different ES feature (see below).
What I tried / could try (but it's ugly)
I could use highlighting (which I do anyway) and try to make the highlighter generate one highlight for each "inner match" (without combining them), then post-process the complete set of search results and count all the highlights. Of course, this is very ugly, because (1) I don't really want to post-process my results and (2) I'd have to fetch all results by setting size to a high enough value, when I actually only want the number of results requested by the client. This would be a lot of overhead!
The feature inner_hits sounds very promising, but it just means that you can handle the hits inside nested documents independently to get a highlighting for each of them. I use this for my nested docs already, but it doesn't solve this problem because (1) it persists on inner hit level and (2) I want this to work with non-nested queries, too.
Is there a way to achieve this in a generic way for arbitrary queries? I'd be most thankful for any suggestions. I'm even down for solving it by tinkering with the ranking or using script fields, anything.
Thanks a lot in advance!
I would definitely not recommend this for any kind of practical use due to the awful performance, but this data is technically available in the term frequency calculation in the results from the explain API. See What is Relevance? for a conceptual explanation and Explain API for usage.
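For illustration only, here is a rough sketch of how the term frequencies could be pulled out of explain output and summed. It assumes each explanation is a nested dict with "value", "description" and "details" keys, and that term frequency shows up either as a description starting with "termFreq=" or as one starting with "freq,". The exact description strings depend on the Elasticsearch version and similarity, so treat both patterns, the function name, and the simplified sample data as assumptions.

```python
import re

# Assumption: some versions describe the node as "termFreq=2.0"
_TERM_FREQ = re.compile(r"termFreq=([0-9.]+)")


def sum_term_freqs(explanation):
    """Recursively add up term-frequency values found in one explanation tree."""
    total = 0.0
    description = explanation.get("description", "")
    match = _TERM_FREQ.match(description)  # match() only matches at the start
    if match:
        total += float(match.group(1))
    elif description.startswith("freq,"):
        # Assumption: other versions put the count in "value" instead
        total += float(explanation.get("value", 0))
    for child in explanation.get("details", []):
        total += sum_term_freqs(child)
    return total


# Hand-written, heavily simplified explanation trees for the two sample docs
jack = {"value": 0.22, "description": "weight(description:hero)", "details": [
    {"value": 1.0, "description": "termFreq=1.0", "details": []}]}
splinter = {"value": 0.22, "description": "weight(description:hero)", "details": [
    {"value": 2.0, "description": "termFreq=2.0", "details": []}]}

total_occurrences = sum_term_freqs(jack) + sum_term_freqs(splinter)
print(total_occurrences)
```

This needs explain enabled for every hit you want to count, which is where the performance cost mentioned above comes from.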

Percentage of matched terms in Elasticsearch

I am using elasticsearch to find similar documents. Below is the query I am using:
{
"query": {
"more_like_this":{
"like": {
"_index": "docs",
"_type": "pdfs",
"_id": "pdf_1"
},
"min_term_freq": 1,
"min_doc_freq": 1,
"max_query_terms": 50,
"minimum_should_match": "50%"
}
}
}
I am extracting the text from PDF and storing in my index "docs". Below are the mappings for type "pdfs":
{
"properties": {
"content":{
"type": "string",
"analyzer": "my_analyzer"
}
}
}
In the result sets I am getting similar documents with their scores. Based on what I have read so far it is not possible to calculate percentage similarity based on score so I am not trying to do that. I am trying to figure out if it is possible to know:
"Out of 50 query terms from the source document how many terms are
matched in a document? or percentage of terms matched?"
As you can see, in my query I am specifying minimum_should_match as 50%, so I am assuming that Elasticsearch filters the documents somewhere based on what percentage of terms are matched in a document. I want to get that percentage. I am fairly new to Elasticsearch. So far I have gone through the documentation but couldn't find out how to do it.
Any pointer/help is appreciated!
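As a small aside on the arithmetic behind minimum_should_match: a positive percentage is turned into a required number of matching terms, rounded down. The sketch below only illustrates that calculation; the function name is made up, and the real number of query terms may be lower than max_query_terms if the source document yields fewer usable terms.

```python
def required_matches(num_query_terms: int, percentage: int) -> int:
    """Number of terms that must match for a positive percentage, rounded down."""
    return num_query_terms * percentage // 100


# With the query from the question: up to 50 terms and "50%"
print(required_matches(50, 50))
```

So the 50% setting acts as a cutoff on the count of matching terms. It does not by itself expose the matched percentage per hit.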

Elasticsearch termvector API not working

I've set the mapping of the title field for the article type in the testindex1 index as follows:
PUT /testindex1/article/_mapping
{
"article": {
"type": "object",
"dynamic": false,
"properties": {
"title": {
"type": "string",
"store": true,
"term_vector": "with_positions_offsets",
"_index": {
"enabled": true
}
},
}
}
}
omitting the remainder of the mapping specification. (This example and those that follow assume the Marvel Sense dashboard interface.) testindex1 is then populated with articles, including article with id 4540.
As expected,
GET /testindex1/article/4540/?fields=title
produces
{
"_index": "testindex1",
"_type": "article",
"_id": "4540",
"_version": 1,
"exists": true,
"fields": {
"title": "Elasticsearch is the best solution"
}
}
(The title text has been changed to protect the innocent.)
However,
GET /testindex1/article/4540/_termvector?fields=title
produces
No handler found for uri [/testindex1/article/4540/_termvector?fields=title&_=1404765178625] and method [GET]
I've experimented with variants of the mapping specification, and variants of the termvector request, so far to no avail. I've also looked for tips in official and non-official documentation, and on forums that cover Elasticsearch topics, including Stack Overflow. elasticsearch.org looks authoritative. I expect I've misused the termvector API in a way that will be instantly obvious to people who are familiar with it. Please point out my mistake(s). Thanks.
The _termvector API endpoint for returning term vector statistics was only added in 1.0.0.Beta1, so you will need to upgrade if you want to use term vectors.
Term Vectors
Note
Added in 1.0.0.Beta1.
Returns information and statistics on terms in the fields of a
particular document as stored in the index.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/docs-termvectors.html
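To check which side of that boundary a cluster is on, one option is to read the version.number value returned by GET / on the cluster and compare its major component. A minimal sketch, assuming version strings shaped like "0.90.13" or "1.0.0.Beta1"; the helper name is made up for illustration.

```python
def supports_termvector(version_number: str) -> bool:
    """True if the version is 1.0.0.Beta1 or later, judged by the major component only."""
    major = int(version_number.split(".")[0])
    return major >= 1


for version in ["0.90.13", "1.0.0.Beta1", "1.2.1"]:
    print(version, supports_termvector(version))
```

A "No handler found for uri" response to a _termvector request is consistent with a pre-1.0 cluster, which this check would report as unsupported.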
