Why does a search with the _routing parameter return the wrong results? - elasticsearch

For example, with the URL index-0/_search?routing=24320 I search for data with routing 24320, but the result is
"_index": "index-0",
"_type": "member",
"_id": "40865630",
"_score": 1,
"_routing": "22500",
Why does a document with routing 22500 match the search condition?

When you specify ?routing=24320 in your search request, you're selecting the single shard on which documents with the routing value 24320 are stored.
Since your query doesn't specify any other constraints, you get all documents stored on that shard, which means you also get documents whose routing value is 22500 (and probably others too): multiple routing values can hash to the same shard.
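If you only want documents that were actually indexed with routing 24320, you can additionally filter on the _routing metadata field, which Elasticsearch lets you query with a term/terms query. A minimal sketch, reusing the index and routing value from the question:

GET index-0/_search?routing=24320
{
  "query": {
    "terms": {
      "_routing": ["24320"]
    }
  }
}

The routing parameter still restricts the search to a single shard, and the terms query then discards documents that live on that shard under a different routing value.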

Related

How to use Kibana and Elasticsearch [7.5.0] to track number of documents containing particular value

I have an index which contains information about some objects. I want to display some of this information on my Kibana dashboard. Let's assume an object looks as follows:
{
"_index": "obj",
"_type": "_doc",
"_id": "KwDPAHABfo5V345r4IYV",
"_version": 1,
"_score": 0,
"_source": {
"value_1": "some value",
"value_2": "some_other value",
"owner": "jason",
"modified_date": "2020-02-01T12:53:08.210317+00:00",
"created_date": "2020-02-01T12:53:08.243980+00:00"
}
}
I need to show a (live) count of the objects that have owner: 'UNKNOWN'. The thing is, this value changes over time. Each change is a new document - the documents are not updated in place. I need to track how many UNKNOWN owners I currently see. Updates (new documents) are sent to ELK at fixed intervals.
When I try to set up a metric, it sometimes shows 0 during the window between one update and the next, when no documents are flowing into ELK. How can I make Kibana display only the latest documents with owner: 'UNKNOWN'?
How can I make Kibana display only the latest documents with owner: 'UNKNOWN'?
You could set up a data table visualization for that as an alternative to the one-dimensional metric visualization.
This is how I personally would configure the data table:
Set a filter with 'owner(.keyword) is UNKNOWN'.
Use the 'Top Hit' metric on the field created_date (or @timestamp, that's up to you) instead of the Count metric.
Set the order to descending, based on the timestamp field.
Split the rows (terms aggregations) for every field you want to display in the rows. This will create the 'columns' of your table.
Go to the Options tab and enable the count on the sum of all rows.
Set an appropriate time interval, e.g. last 1 hour.
This will display all the relevant data of the documents whose owner field equals UNKNOWN. You also see the ingestion/creation timestamp of these documents in descending order, and the number of matching documents (configured via the Options tab as described above).
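For reference, such a data table boils down to a filtered terms aggregation with a top_hits sub-aggregation. A rough sketch of the underlying query (index and field names taken from the question; it assumes keyword sub-fields exist, and uses value_1 as an example row split):

GET obj/_search
{
  "size": 0,
  "query": {
    "term": { "owner.keyword": "UNKNOWN" }
  },
  "aggs": {
    "rows": {
      "terms": { "field": "value_1.keyword" },
      "aggs": {
        "latest": {
          "top_hits": {
            "size": 1,
            "sort": [{ "created_date": { "order": "desc" } }]
          }
        }
      }
    }
  }
}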
I hope I could help you.

Elasticsearch: Multiple partial words not scored high enough

So I'm trying to get good search results out of an Elasticsearch installation, but I run into problems when making a fuzzy search on some very simple data.
Somehow queries containing multiple (some of them partial) words are scored too low, and only score higher when more letters of the partial word are present in the search query.
Let me explain:
I have a simple index built with two simple documents.
{
"name": "Product with good qualities and awesome sound system"
},
{
"name": "Another Product that has better acustics than the other one"
}
Now I query the index with this parameters:
{
"query": {
"multi_match": {
"fields": ["name"],
"query": "product acust",
"fuzziness": "auto"
}
}
}
And the results look like this:
"hits": [
{
"_index": "test_products",
"_type": "_doc",
"_id": "1",
"_score": 0.19100355,
"_source": {
"name": "Product with good qualities and awesome sound system"
}
},
{
"_index": "test_products",
"_type": "_doc",
"_id": "2",
"_score": 0.17439455,
"_source": {
"name": "Another Product that has better acustics than the other one"
}
}
]
As you can see, the product with ID 2 is scored lower than the other product, even though it arguably matches the query string better: it has one full word match and one partial word match.
If the query looked like "product acusti", the results would start to behave correctly.
I've already fiddled around with bool queries, but the results are identical.
Any ideas how I can get the wanted results back sooner, without the user having to type almost the whole second word?
As far as I know, Elasticsearch does not do partial word matching by default, so the term acust is not matched in either of your documents.
The reason you are getting a higher score for the first document is that your matched term, product, appears in a shorter sentence:
Product with good qualities and awesome sound system
But as for the second document, product appears in a longer sentence:
Another Product that has better acustics than the other one
So your second document gets a lower score because the ratio of the matched term (product) to the number of terms in the sentence is lower.
In other words, it has a lower field-length norm:
norm = 1/sqrt(numFieldTerms)
Now if you want to be able to do partial prefix matching, you need to tokenize your terms into ngrams. For example, you can create the following edge ngrams for the term "acoustics":
"ac", "aco", "acou", "acous", "acoust", "acousti", "acoustic", "acoustics"
You have 2 options to achieve this; see the answer by Russ Cam on this question:
Use the Analyze API with an analyzer that will tokenize the field into the tokens/terms from which you would want to partial prefix match, and index this collection as the input to the completion field. The Standard analyzer may be a good one to start with.
Don't use the Completion Suggester here and instead set up your field (name) as a text datatype with multi-fields that include the different ways that name should be analyzed (or not analyzed, with a keyword sub-field for example). Spend some time with the Analyze API to build an analyzer that will allow for partial prefix matching of terms anywhere in the name. As a start, something like the Standard tokenizer, Lowercase token filter, Edgengram token filter and possibly a Stop token filter would get you running. A minimal mapping sketch for this option follows below.
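Here is that sketch for the second option: a custom edge-ngram analyzer applied at index time only (the analyzer name, gram sizes, and index name are my own choices):

PUT test_products
{
  "settings": {
    "analysis": {
      "filter": {
        "autocomplete_filter": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 15
        }
      },
      "analyzer": {
        "autocomplete": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "autocomplete_filter"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "standard"
      }
    }
  }
}

Setting search_analyzer to standard matters here: the query terms themselves are not ngrammed, so "acust" becomes a plain term that matches the indexed edge ngrams of "acustics".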
You may also find this guide helpful.

How to plot aggregated data in kibana

I'm a newbie to Kibana.
I have the following data stored in ES:
{
"_index": "test",
"_type": "impressions",
"_id": "AVZ4QLgkLqvQLIzbvF4e",
"_version": 1,
"_score": 1,
"_source": {
"campaign_id": "1011",
"count": 691,
"played_dt": "2016-01-02"
}
}
So, basically I have counts per campaign_id, which is already aggregated data.
I want a simple bar chart which plots counts per campaign_id, where the X axis is campaign_id and the Y axis is its count.
Instead, I'm getting the number of hits for each campaign_id (a unique/document count) rather than the actual value of the count field.
Thanks in advance!
Go to "Visualize" tab, select "Vertical bar chart":
Choose new search and select appropriate index. Now you probably want to visualize your data in time. So, on X axis use "Date histogram" and select your time filed (played_dt).
Now you can use e.g. "Split bars", use splitting by terms and select campaign_id field.
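The chart then effectively runs a terms aggregation with a sum sub-aggregation, roughly like this sketch (index and field names from the question; it assumes campaign_id is indexed as a non-analyzed/keyword field):

GET test/_search
{
  "size": 0,
  "aggs": {
    "per_campaign": {
      "terms": { "field": "campaign_id" },
      "aggs": {
        "total_count": {
          "sum": { "field": "count" }
        }
      }
    }
  }
}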

How to configure index pattern in Kibana

I have connected Kibana to my ES instance.
_cat/indices returns:
yellow open .kibana 1 1 1 0 3.1kb 3.1kb
yellow open tests 5 1 413042 0 3.4gb 3.4gb
However, I get the following on the Kibana configuration screen. What am I missing?
Update:
My sample document looks like this
"_index": "tests",
"_type": "test7",
"_id": "AVGlIKIM1CQ8BZRgLZVg",
"_score": 1.7840601,
"_source": {
"severity": "ERROR",
"code": "CODE,
"message": "MESSAGE",
"environment": "TEST",
"error_uuid": "cbe99080-0bf3-495c-a417-77384ba0fd39",
"correlation_id": "cf5a1fd5-4fd2-40bb-9cdf-405b91dcbd6f",
"timestamp": "2015-11-20 15:24:39.831"
Disable the option Use event times to create index names and put the index name instead of the pattern (tests).
The option you are trying to use is for index names based on timestamps (imagine you create a new index per day: tests-2015.12.01, tests-2015.12.02, and so on). It's quite clear if you read the message shown when you enable that option:
Patterns allow you to define dynamic index names. Static text in an index name is denoted using brackets. Example: [logstash-]YYYY.MM.DD. Please note that weeks are setup to use ISO weeks which start on Monday
EDIT: The empty dropdown for the time-field name appears because you don't have any field of type date in the mapping of your index. You can check this with GET /<index-name>/_mapping?pretty: the timestamp field is of type "string" and not "date". This happens because its format doesn't match the patterns used for date detection (yyyy/MM/dd HH:mm:ss Z||yyyy/MM/dd Z). To solve this:
You can change the format of the timestamp you are inserting so it matches the default patterns.
You can modify the dynamic_date_formats property and put a pattern that matches the current format of your timestamp.
You can set an index template that maps the "timestamp" field as type "date" (see the sketch below).
In any of these cases, you will need to delete the index and create a new one, or reindex the data.
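For the index template option, a minimal sketch in the template syntax of that Elasticsearch era (the template name is my own; the type name comes from your sample document, and the date format matches its timestamp):

PUT _template/tests_template
{
  "template": "tests*",
  "mappings": {
    "test7": {
      "properties": {
        "timestamp": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss.SSS"
        }
      }
    }
  }
}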

ElasticSearch - how to SUM function_score by field

I have a query that is using function_score to rank the results. Here is a sample of what is returned:
{
"_index": "clone",
"_type": "authEvent",
"_id": "6431823",
"_score": 4.8,
"fields": {
"authInput.uID": "MPXWDKW2P",
"authResult.productValue": 1,
"authInput.userName": "F936F3AA-E26C-48DB-BDBC-44956B634260",
"authResult.authEventDate": "2014-02-27T09:29:30.703125-06:00",
"authResult.rulesFailed": [
"AuthCountByUser"
]
}
}
What I want to do is take the results and run the equivalent of this SQL statement:
SELECT TOP 20 "authInput.userName", SUM("_score")
FROM foo
GROUP BY "authInput.userName"
ORDER BY SUM("_score") DESC
How can I do this with ES?
NOTE: I'm using ES 0.9x, we will be moving to 1.0.0 soon but we have not yet.
Use a facet query to get the totals: build the facet on the field you need the grouping on (authInput.userName) and sum the value you are after. Once you move to 1.0.0, the same thing is expressed more naturally with aggregations, as sketched below.
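For the 1.0.0 aggregations route, a sketch of the equivalent of the SQL above: a terms aggregation on authInput.userName, ordered by a sum sub-aggregation that accumulates _score through a script (field names copied from the question; replace match_all with your function_score query, and note that script access to _score assumes scripting is enabled):

GET clone/authEvent/_search
{
  "size": 0,
  "query": { "match_all": {} },
  "aggs": {
    "by_user": {
      "terms": {
        "field": "authInput.userName",
        "size": 20,
        "order": { "total_score": "desc" }
      },
      "aggs": {
        "total_score": {
          "sum": { "script": "_score" }
        }
      }
    }
  }
}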
