Fuzzy search by default in Kibana - Elasticsearch

I'm trying to run fuzzy searches in Kibana through its UI (ideally by default). I know how to make such a request in the Dev Tools section; the problem is having that option enabled by default. Is it possible? I'd also like to save all the requests I enter (by default).
Please find below the search I'm trying to incorporate to get the results.
GET /_search
{
  "query": {
    "fuzzy": { "NOM": "COUT" }
  }
}
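For reference, the same request body can also be assembled programmatically. Below is a minimal Python sketch that builds it as a dict; the `fuzziness` option is an assumption added for illustration and was not part of the original request:

```python
import json


def build_fuzzy_query(field, value, fuzziness="AUTO"):
    """Build an Elasticsearch fuzzy-query body as a Python dict.

    `fuzziness` controls the allowed edit distance; "AUTO" lets ES pick
    it based on the term length. (Illustrative assumption, not part of
    the original question's request.)
    """
    return {
        "query": {
            "fuzzy": {
                field: {
                    "value": value,
                    "fuzziness": fuzziness,
                }
            }
        }
    }


body = build_fuzzy_query("NOM", "COUT")
print(json.dumps(body, indent=2))
```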
PS: I know that there is a Lucene syntax for sophisticated requests.
Thanks a lot for your help!

Related

Elasticsearch - List all sources sending messages to ES

I am trying to get a list showing all the sources ES is receiving messages from. I am pretty new to this topic and trying to get deeper into it. Basically, I am looking for a way to see the total number of sources sending logs to my central logging solution and, in the best case, also get a list of the source names.
Does anyone have an idea how to get such information by querying Elasticsearch?
Yes, this is possible, though the solution depends on how your data looks.
Users typically index data in Elasticsearch so that it contains more than just the raw log lines. This happens automatically if you're using Filebeat; otherwise, you'd do something (add a field using Logstash, rely on a host field in syslog, etc.) to ensure you have a field that contains your "source" identifier:
{
  "message": "my super valuable logline",
  "source": "my_kinda_awesome_app"
}
Given a document like the above, you can identify all sources (and record counts!) with a terms aggregation like:
{
  "aggs": {
    "my_sources": {
      "terms": { "field": "source" }
    }
  }
}
Kibana makes this all easier, since you don't need to know or write ES queries and can do everything visually.
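To sketch this end to end, here is how the aggregation body could be assembled in Python and sent with the `requests` library. The host, index pattern, and bucket size are assumptions for illustration; setting `"size": 0` skips the hits and returns only the aggregation:

```python
def build_sources_agg(field="source", top_n=10):
    """Body for a terms aggregation listing distinct sources with counts.

    "size": 0 suppresses the search hits themselves, since only the
    aggregation buckets are of interest here.
    """
    return {
        "size": 0,
        "aggs": {
            "my_sources": {
                "terms": {"field": field, "size": top_n}
            }
        }
    }


body = build_sources_agg()

# Hypothetical usage against a local cluster (not executed here):
# import requests
# r = requests.post("http://localhost:9200/logs-*/_search", json=body)
# for bucket in r.json()["aggregations"]["my_sources"]["buckets"]:
#     print(bucket["key"], bucket["doc_count"])
```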

How to define a query timeout for a Spring Data Elasticsearch query?

My question is more general. Assume I have a simple query like this to Elasticsearch:
Page<MyEntity> findAll(Pageable pageable);
I want to be able to set a timeout for this query so that it doesn't hang forever. Although I read the documentation, I didn't see anything clear about how to do it.
Is there any way to do it? A way to set a timeout for Spring Data Elasticsearch queries so I can make sure that nothing runs for too long?
One way of achieving a timeout in a search request is to use the 'timeout' parameter in the query itself.
Let's assume we want to perform a full-text match query; we can add 'timeout' before the query:
{
  "timeout": "1ms",
  "query": {
    "match": {
      "description": "This is a fullText test"
    }
  }
}
You will have to use Elasticsearch time units, as described in the Elasticsearch documentation, and pass them as String values. Note that the search timeout is best-effort: Elasticsearch returns the partial results gathered so far with "timed_out": true rather than failing the request.
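As a sketch, the timed search body could be built like this in Python; the 1ms value mirrors the example above, and in practice you would pick a more realistic budget:

```python
def build_timed_search(query, timeout="1ms"):
    """Wrap a query clause with an Elasticsearch `timeout` value.

    `timeout` must be a string using an ES time unit, e.g. "1ms" or "5s".
    ES treats the timeout as best-effort: partial results come back with
    "timed_out": true instead of an error.
    """
    return {"timeout": timeout, "query": query}


body = build_timed_search(
    {"match": {"description": "This is a fullText test"}}
)
```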
In your case, I don't see any way to achieve this using a spring-data-elasticsearch repository, but you can add custom functionality to your repository and use the ElasticsearchIndexTemplate with matchAllQuery() (from the Java Elasticsearch API).
Something like this (I haven't tested it):
nodeEsTemplate.getClient().prepareSearch("test-index")
    .setQuery(QueryBuilders.matchAllQuery())
    .setTimeout(TimeValue.timeValueMillis(1))
    .execute().actionGet();
Here nodeEsTemplate is of type ElasticsearchIndexTemplate, and this assumes you have created a custom findAllWithTimeOut method in your repository class.

Using Elastic Query DSL in Kibana Discover to enable more_like_this etc

The Kibana documentation says:
When lucene is selected as your query language you can also submit queries using the Elasticsearch Query DSL.
However, whenever I try to enter such a query in the Discover pane, I get a parse error. These are queries that work fine in the Dev Tools pane.
For example, if I try even a simple query like this:
{"query":{"match_phrase":{"summary":"stochastic noise"}}}
I get the following error:
Discover: [parsing_exception] no [query] registered for [query], with { line=1 & col=356 }
Error: [parsing_exception] no [query] registered for [query], with { line=1 & col=356 }
at respond (http://<mydomain>:5601/bundles/vendors.bundle.js?v=16602:111:161556)
at checkRespForFailure (http://<mydomain>:5601/bundles/vendors.bundle.js?v=16602:111:160796)
at http://<mydomain>:5601/bundles/vendors.bundle.js?v=16602:105:285566
at processQueue (http://<mydomain>:5601/bundles/vendors.bundle.js?v=16602:58:132456)
at http://<mydomain>:5601/bundles/vendors.bundle.js?v=16602:58:133349
at Scope.$digest (http://<mydomain>:5601/bundles/vendors.bundle.js?v=16602:58:144239)
at Scope.$apply (http://<mydomain>:5601/bundles/vendors.bundle.js?v=16602:58:147018)
at done (http://<mydomain>:5601/bundles/vendors.bundle.js?v=16602:58:100026)
at completeRequest (http://<mydomain>:5601/bundles/vendors.bundle.js?v=16602:58:104697)
at XMLHttpRequest.xhr.onload (http://<mydomain>:5601/bundles/vendors.bundle.js?v=16602:58:105435)
(I've removed my domain above and replaced it with <mydomain>.)
The above query works fine and returns results using cURL on the command line, or using
GET /_search
{
  "query": {
    "match_phrase": {
      "summary": "stochastic noise"
    }
  }
}
in the Dev Tools console.
I'm hoping to use the more_like_this query from the Discover panel, so (I think) I will need to use the Query DSL rather than the plain Lucene query syntax. But if there's a way to use specialty queries like that with plain Lucene (or Kuery), that would be great.
The reason is simply that the input box only supports whatever you would put inside the query section, so if you input this, it will work:
{"match_phrase":{"summary":"stochastic noise"}}
It makes sense if you think about it: the aggs section makes no sense in the Discover pane, and the from/size attributes are already taken care of by the default settings.
If you look at the full query DSL, you'll see that there are several sections: query, aggs, from, size, _source, highlight, etc. In the Discover pane, you should only specify whatever goes into the query section, nothing else.
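To illustrate the point, here is a rough Python sketch of the wrapping that Discover effectively performs around your input; the exact defaults Kibana applies (from/size values) are assumptions shown only for illustration:

```python
def wrap_discover_input(query_clause, size=500):
    """Illustrate how Discover embeds the user's input as the `query`
    section of a full search body. The from/size defaults here are
    illustrative assumptions, not Kibana's documented values.
    """
    return {
        "query": query_clause,
        "from": 0,
        "size": size,
    }


body = wrap_discover_input({"match_phrase": {"summary": "stochastic noise"}})
```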

Cannot get Elasticsearch Highlight to work

I am working on a project that involves Elasticsearch. So far I can get most functions to work, except highlighting. I am using Laravel plus the official Elasticsearch PHP client.
Previously I thought it was a problem of my PHP code, and asked a question here:
highlight field missing from Elasticsearch results, PHP
Later, when I tried with elasticsearch-head in the browser, I still could not see the highlight field in the results, so I guess there must be something wrong with either my Elasticsearch settings or the way I indexed the documents.
Here is the query I entered into elasticsearch-head:
{
  "query": {
    "match": {
      "combined": "DNA"
    }
  },
  "highlight": {
    "fields": {
      "combined": {}
    }
  }
}
And I don't see a "highlight" field after "_source" in the hits returned by Elasticsearch.
What might I have done wrong here?
Thanks.
Update: I'm running Elasticsearch 2.3.3, on Ubuntu 16.04 LTS desktop, JDK 1.8.
The documentation says "store" needs to be set to true in the mapping. I did so and re-indexed a bunch of documents, but this didn't fix the problem.
OK, after stopping and restarting the Elasticsearch service, my code started working as intended, and I got the "highlight" field in the results.
The issue was that I needed to set "store" to true. Everything else being equal, including the following line
"store" => true
in the mapping ensured "highlight" appeared in my results, and vice versa.
Not sure why doing this earlier didn't solve my problem.
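For completeness, a minimal sketch of the two pieces involved, written in Python for brevity: a mapping fragment with store enabled on the field, and a search body that requests highlighting. The index/field names mirror the question; the mapping syntax assumes ES 2.x ("string" type), matching the version mentioned above:

```python
def build_mapping(field="combined"):
    """Mapping fragment that stores the field so highlighting can use
    the stored value (ES 2.x syntax, where text fields are "string")."""
    return {
        "properties": {
            field: {"type": "string", "store": True}
        }
    }


def build_highlight_query(field="combined", text="DNA"):
    """Search body asking ES to highlight matches in `field`."""
    return {
        "query": {"match": {field: text}},
        "highlight": {"fields": {field: {}}},
    }


mapping = build_mapping()
query = build_highlight_query()
```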

Carrot2+ElasticSearch Basic Flow of Information

I am using Carrot2 and Elasticsearch. I had an Elasticsearch server running with a lot of data when I installed the Carrot2 plugin.
I wanted to get answers to a few basic questions:
Will clustering work only on newly indexed documents or even old documents?
How can I specify which fields to look at for clustering?
The curl command is working and giving some results. How can I translate the curl command, which takes a JSON as input, into a call to a REST API url of the form localhost:9200/article-index/article/_search_with_clusters?.....
Appreciate any help.
Yes, if you want to use the plugin straight off the ES installation, you need to make REST calls of your own. I believe you are using Python; take a look at requests, a delightful REST library for Python.
To make POST requests you can do the following:
import json
import requests

url = 'http://localhost:9200/article-index/article/_search_with_clusters'
payload = {'some': 'data'}
r = requests.post(url, data=json.dumps(payload))
print(r.text)
Find more information at requests documentation.
Will clustering work only on newly indexed documents or even old documents?
It will work even on old documents.
How can I specify which fields to look at for clustering?
Here's an example using the Shakespeare dataset. The query is: which of Shakespeare's plays are about war?
$ curl -XPOST http://localhost:9200/shakespeare/_search_with_clusters?pretty -d '
{
  "search_request": {
    "query": {"match": {"_all": "war"}},
    "size": 100
  },
  "max_hits": 0,
  "query_hint": "war",
  "field_mapping": {
    "title": ["_source.play_name"],
    "content": ["_source.text_entry"]
  },
  "algorithm": "lingo"
}'
Running this, you'll get back plays like Richard, Henry... The title is what Carrot2 uses to develop the cluster names, and the text entry is what it uses to build the clusters.
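Tying this back to the requests suggestion above, the same clustering request could be assembled in Python. The host, index, field names, and algorithm mirror the curl example; this assumes the Carrot2 plugin is installed on the cluster:

```python
def build_cluster_request(query_text, title_field, content_field,
                          algorithm="lingo", size=100):
    """Body for a Carrot2 _search_with_clusters call, mirroring the
    curl example: title drives cluster labels, content drives grouping."""
    return {
        "search_request": {
            "query": {"match": {"_all": query_text}},
            "size": size,
        },
        "max_hits": 0,
        "query_hint": query_text,
        "field_mapping": {
            "title": [f"_source.{title_field}"],
            "content": [f"_source.{content_field}"],
        },
        "algorithm": algorithm,
    }


body = build_cluster_request("war", "play_name", "text_entry")

# Hypothetical call against a local cluster (not executed here):
# import requests
# r = requests.post(
#     "http://localhost:9200/shakespeare/_search_with_clusters?pretty",
#     json=body)
# print(r.text)
```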
The curl command is working and giving some results. How can I get the curl command which takes a JSON as input to a REST API url of the form localhost:9200/article-index/article/_search_with_clusters?.....
Typically, you would use the Elasticsearch client libraries for your language of choice.
