Elastic Search query analyzing - elasticsearch

I want to see all the tokens that are generated from the 'match' text.
Is there a specific file or capability that shows the details of query execution in Elasticsearch, or another way to see the sequence of tokens that is generated when I use 'match'-level queries?

I did not use a log file to see which tokens are generated during the 'match' operation. Instead, I used the _analyze endpoint.
If you want to use the analyzer of a specific index (for example, when different indices each use their own customised analyzer), put the name of the index in the URL:
POST /index_name/_analyze
{
  "text" : "1174HHA8285M360"
}
This uses the default analyzer defined in that index. If the index defines more than one analyzer, you can specify which one to use in the request body, as follows:
POST /index_name/_analyze
{
  "text" : "1174HHA8285M360",
  "analyzer" : "analyser_name"
}
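For anyone scripting this, the two requests above can be assembled programmatically. The following is a minimal sketch in Python using only the standard library. The helper name build_analyze_request is made up for illustration, and index_name and analyser_name are the placeholders from the examples above. It only builds the request line and body; it does not contact a cluster.

```python
import json


def build_analyze_request(text, index=None, analyzer=None):
    """Return (method, path, body) for an _analyze call.

    Hypothetical helper: it only assembles the request and does not send it.
    """
    # With an index in the path, analyzers defined in that index are available.
    path = f"/{index}/_analyze" if index else "/_analyze"
    body = {"text": text}
    if analyzer is not None:
        # Without "analyzer", the index's default analyzer is used.
        body["analyzer"] = analyzer
    return "POST", path, body


method, path, body = build_analyze_request(
    "1174HHA8285M360", index="index_name", analyzer="analyser_name"
)
print(method, path)
print(json.dumps(body, indent=2))
```

The response to such a request lists each token together with its position and offsets, which is the token sequence a match query would work with for that analyzer.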

Related

Elasticsearch: How does search work when using combination of analyzers?

I'm a novice to Elasticsearch (ES), experimenting with analyzers. As the documentation states, the analyzer can be specified at "index time" and at "search time", depending on the use case.
My document has a text field title, and I have defined the following mapping, which introduces a sub-field custom:
PUT index/_mapping
{
  "properties": {
    "title": {
      "type": "text",
      "fields": {
        "custom": {
          "type": "text",
          "analyzer": "standard",
          "search_analyzer": "keyword"
        }
      }
    }
  }
}
So if I have the text "email-id is someid#someprovider.com", the standard analyzer would analyze it into the following tokens during indexing:
[email, id, is, someid, someprovider.com].
However, whenever I query the field title.custom (with different variations of the query terms), I get no hits.
This is what I think happens when I query with the keyword email:
1. It gets analyzed by the keyword analyzer.
2. The value of the field title.custom is also analyzed by the keyword analyzer (analysis on tokens), resulting in the same set of tokens as mentioned earlier.
3. An exact match should happen on the email token, returning the document.
Clearly this is not the case, and there are gaps in my understanding.
I would like to know what exactly happens during search.
More generally, I would like to know how analysis and search work when a combination of search and index analyzers is specified.
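search_analyzer is set to "keyword" for title.custom, so the whole query string is treated as a single search term.
That single term is compared with the tokens produced at index time by the standard analyzer, so it can only match when it is exactly equal to one of those tokens. A multi-word query such as "email-id is someid#someprovider.com" stays one term and is equal to none of the indexed tokens [email, id, is, someid, someprovider.com].
search_analyzer is applied at search time and overrides the default behavior, which is to analyze the query with the analyzer applied at indexing time.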
Good question. To keep it simple, let me go through the different cases one by one.
Which analyzer plays a role depends on the following:
1. The type of query (match is an analyzed query, while term is not analyzed).
2. By default, for an analyzed query such as match, the search term is analyzed with the same analyzer that was used on the field at index time.
3. If you override the default behavior by specifying a search_analyzer on a field, that analyzer is used at query time to create the tokens, which are then matched against the tokens generated at index time by the field's analyzer (standard is the default analyzer).
Using the three points above and the explain API, you can figure out what is happening in your case.
Let me know if you need further information; I would be happy to explain further.
The difference between match and term queries, and the Analyze API for viewing the tokens, will be helpful as well.
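The three points above can be played through with a small simulation. This is a rough sketch in Python, not Elasticsearch code: standard_like only approximates the standard analyzer (lowercase, split on punctuation, keep dots between letters or digits), keyword_like stands in for the keyword analyzer, and match_hits is a made-up helper that mimics a match query with the default OR behavior. All three names are assumptions for illustration.

```python
import re


def standard_like(text):
    """Rough stand-in for the standard analyzer: lowercase, split on
    punctuation and whitespace, but keep dots between letters or digits."""
    return re.findall(r"[a-z0-9]+(?:\.[a-z0-9]+)*", text.lower())


def keyword_like(text):
    """Stand-in for the keyword analyzer: the whole input is one token."""
    return [text]


def match_hits(query, indexed_tokens, search_analyzer):
    """A match succeeds if any query token equals an indexed token."""
    return any(tok in indexed_tokens for tok in search_analyzer(query))


# Index time: the field value is analyzed with the standard-like analyzer.
indexed = set(standard_like("email-id is someid#someprovider.com"))
print(sorted(indexed))

# Same analyzer at index and search time (the default behavior).
print(match_hits("Email ID", indexed, standard_like))  # True
# search_analyzer = keyword: the query stays a single, unmodified term.
print(match_hits("email", indexed, keyword_like))      # True
print(match_hits("Email ID", indexed, keyword_like))   # False
```

In this simplified model, a query only hits when one of the terms produced by the search analyzer is identical to a term produced by the index analyzer. To check a real index, compare the output of the Analyze API for both analyzers.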

elasticsearch wild card query not working

I seem to be running into a peculiar issue. When I run my query with match, as below, I get a hit:
{
  "query": {
    "match": {
      "value.account.names.lastName" : "*GUILLERMO*"
    }
  }
}
However, when I use a wildcard query such as the one below, I don't get a hit:
{
  "query": {
    "wildcard": {
      "value.account.names.lastName" : "*GUILLERMO*"
    }
  }
}
I am really lost as to what the issue may be. Many thanks in advance for any input.
Assuming you are running the wildcard query against an analyzed field, the behavior of Elasticsearch is correct. As the Elasticsearch documentation states, the wildcard query operates at the term level. When you index a document whose name field contains the string "Guillermo del Toro", the value of that field is lowercased and split into three tokens: "guillermo", "del" and "toro". When you then run the wildcard query *GUILLERMO* against the name field, Elasticsearch compares the query string as is with every single token, trying to find a match. You get no hit here because your query string is in uppercase while the analyzed tokens are in lowercase.
Running wildcard queries against an analyzed field is probably a bad idea, but if it is strictly required I would recommend using the built-in name.keyword field instead of the name field (although you will again face the problem of case sensitivity). A better solution is to create your own lowercased, not-analyzed field for that purpose.
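The case-sensitivity effect described above can be reproduced outside Elasticsearch. The sketch below is an illustration in Python, not the actual implementation: it uses fnmatchcase from the standard library as a stand-in for term-level wildcard matching (its * and ? behave similarly, although it also supports bracket expressions that the wildcard query does not). The helper wildcard_hits and the three token lists are assumptions made up for the example.

```python
from fnmatch import fnmatchcase


def wildcard_hits(pattern, indexed_tokens):
    """Term-level matching: the pattern is compared, case-sensitively and
    without analysis, against each indexed token."""
    return any(fnmatchcase(token, pattern) for token in indexed_tokens)


# Tokens an analyzed text field would hold for "Guillermo del Toro".
analyzed_tokens = ["guillermo", "del", "toro"]
# A not-analyzed (keyword) field keeps the original value as one term.
keyword_tokens = ["Guillermo del Toro"]
# A lowercased, not-analyzed field, as suggested above.
lowercased_tokens = ["guillermo del toro"]

print(wildcard_hits("*GUILLERMO*", analyzed_tokens))    # False
print(wildcard_hits("*guillermo*", analyzed_tokens))    # True
print(wildcard_hits("*GUILLERMO*", keyword_tokens))     # False
print(wildcard_hits("*guillermo*", lowercased_tokens))  # True
print(wildcard_hits("*del toro*", lowercased_tokens))   # True
print(wildcard_hits("*del toro*", analyzed_tokens))     # False
```

The last two lines show why a lowercased, not-analyzed field is the more predictable target: patterns that span several words can only match when the value is kept as a single term.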

Changing field properties

I am using packetbeat to monitor mysql port on 3306 and it is working very well.
I can easily search for any word on the Discover tab. For example:
method:SET
This works as expected. But if I change it to
query:SET
then it does not return the documents with the word "SET" in the query field. Is the query field indexed differently?
How do I make the "query" field searchable?
Update:
Is this because of the parameter "ignore_above" that is used for all string fields? I checked the mapping using this API...
GET /packetbeat-2018.02.01/_mapping/mysql/
How do I remove this restriction and make all future beats index the query field?
Update 2:
If I give the entire string in a search on the "query" field, it works as expected...
query:"SELECT name, type, comment FROM mysql.proc WHERE name like 'residentDetails_get' and db <=> 'portal' ORDER BY name, type"
This returns all 688 records in the last 15 minutes. When I search the following, I expect to get more...
query:"SELECT"
But I do not get a single record. I guess this is because of the way the document is indexed. I would prefer to get the equivalent of SQL: query like '%SELECT%'
That's the correct behavior, given the query and the mapping of the field, and it's not about the 1024 limit. You can omit the query: part so that Elasticsearch uses the _all field (which will be removed in the near future), but whether that works depends on the version of the Stack you use.
Or, the better and more correct approach is to configure the query field differently in the packetbeat template (so that future indices use the new mapping), like this:
"query": {
  "type": "text",
  "fields": {
    "raw": {
      "type": "keyword",
      "ignore_above": 1024
    }
  }
}
The main idea is that ES does not split the values in the query field (since it's a keyword field), and you need a way to do this. You could use wildcards, but ES doesn't like them (especially leading wildcards) and you could run into performance issues when running such a query. The "correct" approach from the ES point of view is the one I already mentioned: make the field analyzed, and keep a raw version of it (for sorting and aggregations) alongside the analyzed version for searches.
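With a mapping like the one suggested, the two versions of the field would be addressed by different kinds of requests. The sketch below writes the request bodies as Python dicts and prints them as JSON; it does not talk to a cluster. The statement "select * from mytable" and the aggregation name top_queries are made up for illustration, and the bodies assume the field names query and query.raw from the mapping above.

```python
import json

# Full-text search on the analyzed "query" field: finds documents whose
# statement contains the word SELECT, whatever its case.
full_text_search = {"query": {"match": {"query": "SELECT"}}}

# Exact lookup on the keyword sub-field: the whole statement must match.
exact_search = {
    "query": {"term": {"query.raw": "select * from mytable"}}
}

# Aggregation on the raw sub-field: most frequent complete statements.
top_statements = {
    "size": 0,  # only the buckets are of interest, not the hits
    "aggs": {"top_queries": {"terms": {"field": "query.raw", "size": 10}}},
}

for name, body in [
    ("full_text_search", full_text_search),
    ("exact_search", exact_search),
    ("top_statements", top_statements),
]:
    print(name)
    print(json.dumps(body, indent=2))
```

Note that values longer than ignore_above are not indexed in query.raw, so very long statements would be missing from the exact lookup and the aggregation while remaining searchable through the text field.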
The query field of packetbeat is declared as "keyword". Therefore you can only search for the entire query. For example:
query: "select * from mytable"
But what if we need to search for query: "mytable"?
You need to make the query field searchable by modifying the fields.yml file. Add the type: text parameter to the query field in the MySQL section of the fields.yml file found in /etc/packetbeat.
The relevant section of the file will look like this:
- name: query
  type: text
  description: >
    The query in a human readable format. For HTTP, it will typically be
    something like `GET /users/_search?name=test`. For MySQL, it is
    something like `SELECT id from users where name=test`.

combine fields of different documents in same index

I have two types of documents in my index:
doc1
{
  "category":"15",
  "url":"http://stackoverflow.com/questions/ask"
}
doc2
{
  "url":"http://stackoverflow.com/questions/ask",
  "requestsize":"231",
  "logdate":"22/12/2012",
  "username":"mehmetyeneryilmaz"
}
Now I need a query that filters on the same url field and returns the fields of both documents:
result:
{
  "category":"15",
  "url":"http://stackoverflow.com/questions/ask",
  "requestsize":"231",
  "logdate":"22/12/2012",
  "username":"mehmetyeneryilmaz"
}
Elasticsearch always returns results per document: if multiple documents satisfy your query or filter, they always appear as separate documents in the result and are never merged into a single document. Merging them on the client side is therefore one option. To avoid fetching the complete documents and get only the relevant fields, you can use "fields" in your query.
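A client-side merge of this kind can be quite small. The sketch below is one possible way to do it in Python, assuming the document bodies have already been pulled out of the search response into plain dicts. The helper name merge_by_key is made up for illustration, and on conflicting field names the later document simply wins, which may or may not be what you want.

```python
def merge_by_key(documents, key="url"):
    """Merge documents that share the same value for `key` into one dict.

    Later documents overwrite earlier ones on conflicting field names.
    """
    merged = {}
    for doc in documents:
        if key not in doc:
            continue  # nothing to group on, so skip this document
        merged.setdefault(doc[key], {}).update(doc)
    return list(merged.values())


hits = [
    {"category": "15", "url": "http://stackoverflow.com/questions/ask"},
    {
        "url": "http://stackoverflow.com/questions/ask",
        "requestsize": "231",
        "logdate": "22/12/2012",
        "username": "mehmetyeneryilmaz",
    },
]

print(merge_by_key(hits))
```

For the two example documents this yields a single dict containing category, url, requestsize, logdate and username.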
If this is not what you need and you still want to narrow down the result in the query itself, you can use a top hits aggregation. It gives you the complete list of documents under a single bucket, but each entry also has a source field containing the complete document.
Have a read of this page:
https://www.elastic.co/guide/en/elasticsearch/reference/1.4/search-aggregations-metrics-top-hits-aggregation.html
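As a rough illustration of that suggestion, a request body that groups documents by url and returns the documents of each group could look like the sketch below, written as a Python dict and printed as JSON. The aggregation names by_url and documents and the size of 10 are arbitrary choices for the example, and it assumes that url is mapped so that the whole URL is kept as a single term (not analyzed); otherwise the buckets would be built from URL fragments.

```python
import json

group_by_url = {
    "size": 0,  # skip the normal hit list; only the buckets are needed
    "aggs": {
        "by_url": {
            # One bucket per distinct url value.
            "terms": {"field": "url"},
            "aggs": {
                # Inside each bucket, return up to 10 matching documents.
                "documents": {"top_hits": {"size": 10}}
            },
        }
    },
}

print(json.dumps(group_by_url, indent=2))
```

Each bucket then still contains the individual documents, so combining their fields into one object remains a client-side step.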

ElasticSearch: How to specify specific fields to search at?

Right now in my mapping, I am setting "include_in_all" to true, which means all the fields are included in the _all field.
However, when searching, instead of wasting space by putting everything in the _all field, I want to specify the particular fields to search in (taking into account the boost scores in the mapping).
How do I create a query that tells Elasticsearch to look only at specific fields (more than one) and take into account the boosts I gave them in my mapping?
Start with a multi_match query. It allows you to query multiple fields, giving them different weights, and it's usually the way to go when you have a search box.
{
  "multi_match" : {
    "query" : "this is a test",
    "fields" : [ "subject^2", "message" ]
  }
}
The query_string is more powerful but more dangerous too since it's parsed and can break. Use it only if you need it.
You don't need to keep data in the _all field in order to query a specific field.
You can use query_string or bool queries to search over multiple fields.
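For comparison, the bool and query_string alternatives mentioned above might be written as follows. This is a sketch of the request bodies as Python dicts, reusing the subject and message fields and the example text from the multi_match query; the boosts here are given at query time, which is an assumption of the example rather than something taken from the mapping.

```python
import json

search_text = "this is a test"

# bool query: one match clause per field, with a per-clause boost.
bool_query = {
    "query": {
        "bool": {
            "should": [
                {"match": {"subject": {"query": search_text, "boost": 2}}},
                {"match": {"message": {"query": search_text}}},
            ]
        }
    }
}

# query_string: same fields and boost, but the text is parsed as query
# syntax, so unescaped user input can cause parse errors.
query_string_query = {
    "query": {
        "query_string": {
            "query": search_text,
            "fields": ["subject^2", "message"],
        }
    }
}

print(json.dumps(bool_query, indent=2))
print(json.dumps(query_string_query, indent=2))
```

The bool form is more verbose but lets each field use its own query type or options, whereas query_string keeps everything in one string at the cost of being sensitive to its syntax.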

Resources