Grafana + Elasticsearch: finding the ratio between counts

I'm trying to find the ratio of users that have some attribute to those that do not.
I assume that, if it is possible at all, I'd have to use the inline scripting option. I have been unable to figure out (or find good documentation on) what the limitations of inline scripts are. Can they be used for map-reduce?
Has anyone figured out how to find a ratio between counts in Grafana, using Elasticsearch as the data source?
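For reference, one way to get this without writing inline scripts yourself is a bucket_script pipeline aggregation, which divides the document counts of two sibling buckets. A minimal sketch in Python; the field names and interval are made up, and older Elasticsearch versions spell fixed_interval as interval:

    # Per-time-bucket ratio of users with "my_attribute" to all users.
    body = {
        "size": 0,
        "aggs": {
            "over_time": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": "1h"},
                "aggs": {
                    "with_attr": {"filter": {"exists": {"field": "my_attribute"}}},
                    "ratio": {
                        "bucket_script": {
                            "buckets_path": {
                                "withAttr": "with_attr>_count",  # docs with the attribute
                                "total": "_count"                # all docs in the bucket
                            },
                            "script": "params.total == 0 ? 0 : params.withAttr / params.total"
                        }
                    }
                }
            }
        }
    }

Newer Grafana releases also expose this pattern directly as a "Bucket Script" metric type on Elasticsearch data sources.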

Related

Does Elasticsearch use previous search frequencies?

Does Elasticsearch utilize the frequency with which a document was previously searched? For example, there are document A and document B. Both have similar scores in terms of edit distances and other metrics; however, document A is searched very frequently and B is not. Will Elasticsearch score A higher than B? If not, how can I achieve this?
Elasticsearch does not change scores based on previous searches in its default scoring algorithm. In fact, this is really a question about Lucene scoring, since Elasticsearch uses Lucene for all of the actual search logic.
I think you may be looking at this from the wrong viewpoint. Users search with a query, and Elasticsearch recommends documents. You have no way of knowing whether the document it recommended was valid or not based on the search alone. I think your question should really be, "How can I tune search relevance in an intelligent way based on user data?"
Now, there are a number of ways you can achieve this, but they require you to gather user data and build the model yourself. So unfortunately, there is no easy way.
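As one minimal illustration of such a model (the field name and weighting are assumptions, not anything built in): maintain a per-document popularity counter yourself, then fold it into the score with a function_score query.

    # Hypothetical: the application increments "search_count" whenever
    # a user clicks through to a document; queries then boost by it.
    body = {
        "query": {
            "function_score": {
                "query": {"match": {"title": "some user query"}},
                "field_value_factor": {
                    "field": "search_count",
                    "modifier": "log1p",   # log(1 + count) dampens runaway winners
                    "missing": 0           # never-clicked docs add nothing
                },
                "boost_mode": "sum"        # popularity added on top of the text score
            }
        }
    }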
However, I would recommend taking a look at https://www.elastic.co/app-search/, which offers a managed solution with lots of custom relevance tuning and may save you a lot of time, depending on your use case.

Can Elasticsearch's "explain" search option be used for all requests, not just for debugging?

I would like to retrieve information about what exact terms match the search query.
I found out that this problem was discussed in the following topic: https://github.com/elastic/elasticsearch/issues/17045
but it was not resolved, "since it would be too cumbersome and expensive to keep this information around" (in the Elasticsearch context).
Then I discovered that by using the "explain" option in a search request, I get detailed information about the score calculation, including the matching terms.
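For concreteness, that is just the top-level explain flag in the search body (the field name here is made up):

    body = {
        "explain": True,   # ask for a per-hit scoring explanation
        "query": {"match": {"message": "connection timeout"}}
    }
    # Each hit in the response then carries an "_explanation" tree whose
    # leaf descriptions name the exact terms that matched.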
I ran a performance test comparing search requests with the explain option set to true and without it, and the test didn't show a significant impact from using the explain option.
So I'm wondering: can this option be used in a production system? It looks like something of a workaround, but it seems to work.
Any considerations about this?
First of all, you didn't include the details of your performance test, so it's difficult to say whether it would have a performance impact; that depends on:
What is your cluster configuration: total nodes, sizes, shards, replicas, JVM settings, number of documents, document sizes?
Index configuration, i.e., for which index you are using the explain API; again, is it a read-heavy or write-heavy index, how many docs does it hold, how does it perform during peak time, etc.?
Apart from that, an application issues only certain types of queries; although the search terms might change, whether or not they match can be understood from samples alone.
I've worked with search systems extensively, and I use the explain API a lot, but only on samples and never on all queries; I haven't seen the latter done anywhere.
EDIT: Please have a look at named queries, which can also be used to check which parts of your query matched the search results; there is more info in the official blog.
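A minimal sketch of named queries (field names are made up): each clause gets a _name, and every hit reports which named clauses it matched.

    body = {
        "query": {
            "bool": {
                "should": [
                    {"match": {"title": {"query": "elasticsearch", "_name": "title_match"}}},
                    {"match": {"body":  {"query": "elasticsearch", "_name": "body_match"}}}
                ]
            }
        }
    }
    # A hit that matched only on "body" comes back with
    # "matched_queries": ["body_match"] -- much cheaper than "explain": true.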

Build an inverted index in distributed environment

What tools/libs/platforms would you use if you had to build a distributed inverted index from scratch? Elasticsearch (I need partial TF with date constraints) only partially does what I need, and I'm thinking about building an inverted index using HBase, but I'm wondering if there are saner choices (it will not all fit into memory, and I will initially be looking into caching).
Your requirements still sound pretty vague to me, so some additional detail would be helpful in providing a better answer.
Solr Cloud may be a good option if you need support for faceting and fuzzy term matching. Solr Cloud is simply the distributed configuration of Solr. It's a bit more tedious to set up than Elasticsearch, but it is still a very powerful and popular tool.
If you're not already using HBase I'm not sure I'd recommend introducing it just for the sole purpose of creating an index.
I could probably give you a better answer if I understood your use case and current environment better.

Can anyone point me toward a content relevance algorithm?

A new project with some interesting requirements has arrived on my desk. I need to develop a searchable directory of businesses, with a focus on delivering relevant results based on arbitrary search queries. The businesses can be of any niche; there's no one area that is more represented than another.
When googling for things like "search algorithm" or "content relevance algorithm," all I get are references to Google's "Mystical Algorithm of the Old Gods" and SEO firms.
Does the relevance value of MySQL's full-text MATCH() function have what it takes for the task? I've never used it, but I'm definitely going to do some testing. Also, since this will largely be a human-edited directory, I can assume that we can add weighted factors like tagging and categories. What would be a good way to combine these factors with MySQL's MATCH() relevancy?
I'm also open to ideas that I've not discussed here.
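One way the MATCH() combination could look, sketched against an assumed schema (a businesses table with a FULLTEXT index on name and description, plus a hand-assigned tag_boost column; none of these names come from the question):

    # Blend MySQL's full-text relevance with a curated per-row boost.
    sql = """
        SELECT id, name,
               MATCH(name, description) AGAINST (%s IN NATURAL LANGUAGE MODE)
                   * (1 + tag_boost) AS relevance
        FROM businesses
        WHERE MATCH(name, description) AGAINST (%s IN NATURAL LANGUAGE MODE)
        ORDER BY relevance DESC
        LIMIT 20
    """
    # cursor.execute(sql, (user_query, user_query))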
For examples of information-retrieval-based techniques, look up TF-IDF or BM25.
For machine-learning-based techniques, look up RankNet and its variants from MSR.
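To make TF-IDF concrete, here is a toy scorer (a sketch, nowhere near what a real engine does):

    import math
    from collections import Counter

    def tf_idf_scores(query_terms, docs):
        """Score each tokenized document against the query with plain TF-IDF."""
        n = len(docs)
        # document frequency: how many docs each term appears in
        df = Counter(term for doc in docs for term in set(doc))
        scores = []
        for doc in docs:
            tf = Counter(doc)
            scores.append(sum(
                tf[term] * math.log(n / df[term])   # term frequency * inverse document frequency
                for term in query_terms if df[term]
            ))
        return scores

    docs = [["cheap", "pizza", "delivery"], ["cheap", "flowers"]]
    print(tf_idf_scores(["cheap", "pizza"], docs))  # first doc scores higher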
If you have hand-edited data, have a look at Oracle Text search. In one of my previous projects we had some good results with it.
I was not directly involved in the database setup, but I know the results were very welcome. (Before this they had just keyword-based search.)
Use a search engine like Solr to index the data. You can still use MySQL to hold the data, but for searches, use the search engine.

Indexing algorithms to develop an app like Google Desktop Search?

I want to develop an application like Google Desktop Search, and I want to know which indexing techniques/algorithms I should use so I can get very fast data retrieval.
In general, what you want is an Inverted Index. You can do the indexing yourself, but it's a lot of work to get right - you need to handle stemming, stop words, extending the posting list to include positions in the document so you can handle multi-word queries, and so forth. Then, you need to store the index, probably in a B-Tree on disk - or you can make life easier for yourself by using an existing database for the disk storage, such as BDB. You also need to write a query planner that interprets user queries, performs query expansion and converts them to a series of index scans. Wikipedia's article on Search Engine Indexing provides a good overview of all the challenges, too.
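A toy positional inverted index, to make the structure above concrete (no stemming, stop words, or on-disk storage; an illustration, not Lucene's actual format):

    from collections import defaultdict

    def build_index(docs):
        """docs: {doc_id: text}. Returns {term: {doc_id: [positions]}}."""
        index = defaultdict(lambda: defaultdict(list))
        for doc_id, text in docs.items():
            for pos, term in enumerate(text.lower().split()):
                index[term][doc_id].append(pos)
        return index

    index = build_index({1: "the quick brown fox", 2: "quick brown dogs"})
    print(dict(index["quick"]))  # {1: [1], 2: [0]}
    # A phrase query like "quick brown" intersects the posting lists of
    # both terms and keeps docs where the positions are adjacent.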
Or, you can leverage existing work and use ready-made full text indexing solutions like Apache Lucene and Compass (which is built on Lucene). These tools handle practically everything detailed above (and more), which just leaves you writing the tool to build and update the index by feeding all your documents into Lucene, and the UI to allow users to search it.
The Burrows-Wheeler transform, used to compress data in bzip2, can be used (via the FM-index built on top of it) to make substring searching take time proportional to the query length, independent of the text size.
http://en.wikipedia.org/wiki/Burrows-Wheeler_transform
I haven't seen a simple introduction online, but here is a lot of detail:
http://www.ddj.com/architect/184405504
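A naive construction of the transform, just to show the mechanics (real implementations build it from a suffix array rather than materializing all rotations):

    def bwt(s):
        """Naive Burrows-Wheeler transform, with '$' as the end-of-string sentinel."""
        s = s + "$"
        rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
        return "".join(rotation[-1] for rotation in rotations)

    print(bwt("banana"))  # annb$aa -- runs of equal characters compress well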
