Kibana. Data tables. Exclude terms depending on the length - elasticsearch

I'm storing sentences in Elasticsearch.
Example:
this is a sentence
this is a second sentence
I want to show a data table with the most used terms in Kibana 4.3.1, with these settings:
Metric = count
Split rows
Aggregation = terms
Field = input
Order by = metric count
Order descending. Size 5
This is what I'm getting in the table:
this 2
is 2
a 2
sentence 2
second 1
I want to remove the short words, those with fewer than 3 characters. In this example, "is" and "a".
How can I achieve this?
Thanks!

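It works if you add this Exclude Pattern:
[a-zA-Z0-9]{0,3}
Note that {0,3} also excludes three-character terms; use {0,2} to exclude only terms shorter than 3 characters.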
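To see what the pattern removes, here is a minimal Python sketch. It assumes the exclude pattern has to match a whole term (anchored matching), which `re.fullmatch` imitates here; Python's `re` is not the regex engine Elasticsearch uses, and the `apply_exclude` helper is a hypothetical name for illustration only.

```python
import re

# Pattern copied from the answer above.
EXCLUDE_PATTERN = r"[a-zA-Z0-9]{0,3}"


def apply_exclude(terms, pattern=EXCLUDE_PATTERN):
    """Drop every term that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [term for term in terms if compiled.fullmatch(term) is None]


terms = ["this", "is", "a", "sentence", "second"]
print(apply_exclude(terms))  # ['this', 'sentence', 'second']
```

Because the upper bound is 3, a three-letter term such as "the" would be dropped as well.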

Related

Elastic Index. Enrich document based on aggregated value of a field from the same index

Is it possible to enrich documents in an index based on data from the same index? For example, if the source index has 10000 documents, I need to calculate an aggregated sum for each group of those documents and then use that sum to enrich the same index.
Let me try to explain. My case can be simplified to the one as below:
My elastic index A has documents with 3 fields:
timestamp1 identity_id hours_spent
...
timestamp2 identity_id hours_spent
Every hour I need to check the index and update documents with an SKU field. If timestamp1 is between [date1:date2] and the total hours_spent for the identity_id is < a_limit, I need to enrich the document with an additional field sku=A, otherwise with sku=B.
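The rule itself can be sketched in plain Python, independent of how the documents are read from or written back to Elasticsearch. This is only an illustration under assumptions: documents are dicts with `timestamp`, `identity_id` and `hours_spent` keys, the total is summed only over documents inside the date window, and `assign_sku` is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import datetime


def assign_sku(docs, date1, date2, a_limit):
    """Return copies of docs with an added 'sku' field."""
    # Pass 1: total hours per identity_id, counting only documents in the window
    # (assumption: the question does not say which documents feed the total).
    totals = defaultdict(float)
    for doc in docs:
        if date1 <= doc["timestamp"] <= date2:
            totals[doc["identity_id"]] += doc["hours_spent"]

    # Pass 2: apply the rule to every document.
    enriched = []
    for doc in docs:
        in_window = date1 <= doc["timestamp"] <= date2
        under_limit = totals[doc["identity_id"]] < a_limit
        sku = "A" if in_window and under_limit else "B"
        enriched.append({**doc, "sku": sku})
    return enriched


docs = [
    {"timestamp": datetime(2020, 1, 5), "identity_id": "u1", "hours_spent": 2},
    {"timestamp": datetime(2020, 1, 6), "identity_id": "u1", "hours_spent": 3},
    {"timestamp": datetime(2020, 1, 7), "identity_id": "u2", "hours_spent": 13},
]
for doc in assign_sku(docs, datetime(2020, 1, 1), datetime(2020, 1, 31), a_limit=10):
    print(doc["identity_id"], doc["sku"])
```

The two passes correspond to an aggregation step followed by an update step; how each maps onto Elasticsearch features is left open here.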

Aggregation on time interval

Let's say a document has {startDate, endDate} fields among other fields.
Now we want to create a range aggregation based on these fields, so a document must appear in each bucket that overlaps with its start-end interval.
I found that aggregations are not supported on range fields yet: https://github.com/elastic/elasticsearch/issues/34644
As a workaround, I can use a script: since I know the size (time interval) of the buckets, I could generate an array of values between the start and end dates:
aggregation_field : [startDate + bucketSize, startDate + bucketSize * 2, .... endDate]
But in some cases this array could be huge.
Are there other workarounds?
Thanks!
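For reference, the array-generating workaround described in the question can be sketched in Python as follows. It is not an alternative workaround, only an illustration of the same idea, and it assumes fixed-size buckets aligned to a chosen origin; `overlapping_buckets` and `origin` are hypothetical names.

```python
from datetime import datetime, timedelta


def overlapping_buckets(start, end, bucket_size, origin=datetime(1970, 1, 1)):
    """Return the start of every fixed-size bucket that overlaps [start, end]."""
    # Align the first key to the bucket boundary at or before `start`.
    first = origin + ((start - origin) // bucket_size) * bucket_size
    keys = []
    current = first
    while current <= end:
        keys.append(current)
        current += bucket_size
    return keys


keys = overlapping_buckets(
    datetime(2020, 1, 1, 10, 30),
    datetime(2020, 1, 1, 13, 15),
    timedelta(hours=1),
)
print([k.strftime("%H:%M") for k in keys])  # ['10:00', '11:00', '12:00', '13:00']
```

The list grows linearly with (endDate - startDate) / bucketSize, which is exactly the size concern raised above.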

Get final score by sum of multiple fields boost

I want to build a search that prioritizes the number of field matches instead of one field over another. All the fields would have the same boost value, and the final score should be calculated as the sum of the matched fields' boosts. If the full text matches two fields and each field has boost 1, the final score would be 1 + 1 = 2.
Let's use an example:
class Event < ApplicationRecord
  searchable do
    text :title
    text :category
    text :artist_name
  end
end
Suppose I have two events:
Event 1: Name: "Christmas festival" Artist name: "AC/DC"
Event 2: Name: "New year festival" Artist name: "Queen"
So, if the user searches just "festival", both events are returned with the same score because it matches both events' names.
But if the user searches "festival AC/DC", I want to return Event 1 in first place, or just Event 1, because it matches the event name (festival) and the artist name (AC/DC), while Event 2 only matches the event name (festival). Event 1's score should be 2 while Event 2's score should be 1.
Any suggestions on how I can do that? Is this even possible?
It seems you are mixing up scoring and boosting. I think your question should be titled "Compute total score by summing each field score (regardless of the boosts)".
Field scores are computed based on field matches, and an arbitrary set of additive or multiplicative boosts (functions and/or matching subqueries) can be applied to them. But in the end, what you want is to compute the global score by summing each field score, not the boosts themselves.
The DisMax query parser, for example, lets you control precisely how the final score is computed using the tie (Tie Breaker) parameter:
The tie parameter specifies a float value (which should be something
much less than 1) to use as tiebreaker in DisMax queries.
When a term from the user’s input is tested against multiple fields,
more than one field may match. If so, each field will generate a
different score based on how common that word is in that field (for
each document relative to all other documents). The tie parameter lets
you control how much the final score of the query will be influenced
by the scores of the lower scoring fields compared to the highest
scoring field.
A value of "0.0" - the default - makes the query a pure "disjunction
max query": that is, only the maximum scoring subquery contributes to
the final score. A value of "1.0" makes the query a pure "disjunction
sum query" where it doesn’t matter what the maximum scoring sub query
is, because the final score will be the sum of the subquery scores.
Typically a low value, such as 0.1, is useful.
In your situation you need a disjunction sum query, so you might want to set tie to 1.0.
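The effect of tie can be illustrated with a small Python sketch of the combination rule quoted above: the best field score plus tie times the sum of the remaining field scores. The per-field scores of 1.0 are a simplification taken from the question; real relevance scores will differ, and `dismax_score` is a hypothetical helper, not part of any library.

```python
def dismax_score(field_scores, tie=0.0):
    """Combine per-field scores the way the quoted tie description states."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    # Fields other than the best one contribute tie * their score.
    return best + tie * (sum(field_scores) - best)


# "festival AC/DC": Event 1 matches title and artist_name, Event 2 only title.
event1_fields = [1.0, 1.0]
event2_fields = [1.0]

print(dismax_score(event1_fields, tie=0.0), dismax_score(event2_fields, tie=0.0))  # 1.0 1.0
print(dismax_score(event1_fields, tie=1.0), dismax_score(event2_fields, tie=1.0))  # 2.0 1.0
```

With tie at 0.0 both events tie at 1.0; with tie at 1.0 Event 1 gets 2 and Event 2 gets 1, which is the ranking asked for.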

Elasticsearch Space independent search

I have an Elasticsearch instance set up with the default index. Tens of thousands of text documents have been indexed in it, and I want to perform space-independent queries, such as the cases below.
Case 1 space in index no space in query:
index data : 123 456 43
query data :12345643
Case 2 space in query no space in index:
index data : 12345643
query data : 123 456 43
As you can see, the query above will not match, since the query data is one term and the index data is three terms. Vice versa for case 2.
Case 3 partial matches with space difference:
index data : 12345643
query data : 123 4 5
Case 4 partial matches with additional data(trailing/leading) and space difference:
index data : 12345643
query data : 123 4 54
index data : 1234564343
query data : 123 4 5
I thought of creating an index with spaces removed, indexing the complete content as one word, but I don't know how that would work for cases 3 and 4. I also don't know the drawbacks of this method.
I would remove spaces and create a custom index-time analyzer with (edge-)ngrams (either tokenizer or token filter).
You can also use an edge-ngram tokenizer/token-filter at search time if you want to match prefixes and suffixes.
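The idea can be simulated in plain Python to check it against the four cases. This is a sketch of the technique, not of Elasticsearch's analyzers: `normalize`, `edge_ngrams`, `matches` and `matches_with_query_grams` are hypothetical helpers, and the gram sizes are assumptions.

```python
def normalize(text):
    """Remove all whitespace, as a space-stripping filter would."""
    return "".join(text.split())


def edge_ngrams(text, min_gram=3, max_gram=20):
    """Prefixes of the normalized text with length min_gram..max_gram."""
    s = normalize(text)
    return {s[:n] for n in range(min_gram, min(len(s), max_gram) + 1)}


def matches(indexed, query, min_gram=3):
    """Edge n-grams at index time, whole normalized query at search time."""
    return normalize(query) in edge_ngrams(indexed, min_gram)


def matches_with_query_grams(indexed, query, min_gram=3):
    """Edge n-grams on both sides: any shared prefix gram counts as a hit."""
    return bool(edge_ngrams(indexed, min_gram) & edge_ngrams(query, min_gram))


print(matches("123 456 43", "12345643"))                 # case 1: True
print(matches("12345643", "123 456 43"))                 # case 2: True
print(matches("12345643", "123 4 5"))                    # case 3: True
print(matches("12345643", "123 4 54"))                   # case 4: False
print(matches_with_query_grams("12345643", "123 4 54"))  # case 4: True
```

Cases 1 to 3 work with grams on the index side only. Case 4 only matches once the query is split into grams too, at the cost of looser matching, since any shared prefix of min_gram characters then counts as a hit.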

How to use the elasticsearch java api for dynamic searches?

So I'm trying to use elasticsearch for dynamic query building. Imagine that I can have a query like:
a = "something" AND b >= "other something" AND (c LIKE "stuff" OR c LIKE "stuff2" OR d BETWEEN "x" AND "y");
or like this:
(c>= 23 OR d<=43) AND (a LIKE "text" OR a LIKE "text2") AND f="text"
Should I use the QueryBuilder or the FilterBuilder, and how do I combine both? The official documentation says that for exact values we should use the filter approach, so I assume I should use filters for equality comparisons. What about dates and numbers? Should I use a filter or a query?
For the LIKE/equals problem on number/number values, I tried this:
@Field(type = String, index = FieldIndex.analyzed, pattern = "(\\d+\\/\\d+)|(\\d+\\/)|(\\d+)|(\\/\\d+)")
public String processNumber;
The pattern would deal with the structure number + slash + number, but also number and number + slash.
But with either the term filter or the match_query, I can't restrict hits to the exact structure, such as 20/2014: if I type 20, I still get hits on the term filter.
A query is the main component when you search for something; it takes into account ranking and other features such as stemming and synonyms. A filter, on the other hand, just filters the result set you get from your query.
I suggest that if you don't care about the ranking, you use filters, because they are faster. Otherwise, use a query.
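As an illustration of that split, here is a Python sketch that builds the JSON body for the second example from the question, (c >= 23 OR d <= 43) AND (a LIKE "text" OR a LIKE "text2") AND f = "text". It assumes Elasticsearch 2.0 or later, where filters are written as the `filter` clause of a `bool` query; mapping LIKE to `match` and equality to `term` is an assumption for this sketch, and `build_search_body` is a hypothetical name.

```python
import json


def build_search_body():
    """Scored text conditions go in 'must'; exact and range conditions in 'filter'."""
    return {
        "query": {
            "bool": {
                # Scored part: a LIKE "text" OR a LIKE "text2"
                "must": [
                    {
                        "bool": {
                            "should": [
                                {"match": {"a": "text"}},
                                {"match": {"a": "text2"}},
                            ],
                            "minimum_should_match": 1,
                        }
                    }
                ],
                # Unscored part: (c >= 23 OR d <= 43) AND f = "text"
                "filter": [
                    {
                        "bool": {
                            "should": [
                                {"range": {"c": {"gte": 23}}},
                                {"range": {"d": {"lte": 43}}},
                            ],
                            "minimum_should_match": 1,
                        }
                    },
                    {"term": {"f": "text"}},
                ],
            }
        }
    }


print(json.dumps(build_search_body(), indent=2))
```

The same nesting of AND/OR groups can be generated dynamically by building the `must`, `should` and `filter` lists from the parsed conditions.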
