Searching or filtering on a multi-value field - elasticsearch

I have a MySQL database with a table that contains two important fields: title and age_range.
That table stores age_range values like '45;60' for documents designed for users between 45 and 60 years old, '18;70' for users between 18 and 70 years old, and so on.
Now I would like to run the query 'test' on the title field with the filter '18;50' on the age_range field, and get back all documents matching 'test' whose age range overlaps this interval, which would include, for example, the two cases above.
Note that I use Logstash to index my data.
How can I achieve this?
Is there any processing to do while indexing my data with Logstash?
Is there any filter or tokenizer to use in the Elasticsearch analyzer?
Thank you in advance

You can split the data into two fields with a grok filter. To ship the data from MySQL to Elasticsearch, you can use the Logstash jdbc input together with the elasticsearch output. Configure your pipeline like below:
input {
  jdbc {
    # Configuration of the jdbc input
    # https://www.elastic.co/guide/en/logstash/current/plugins-inputs-jdbc.html
  }
}
filter {
  # Split the field into two separate fields
  grok {
    match => { "age_range" => "%{NUMBER:age_range_bottom};%{NUMBER:age_range_top}" }
  }
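  # Suggested addition (a sketch): convert the captured values to integers so
  # that Elasticsearch maps them as numbers and range filters behave as expected
  mutate {
    convert => {
      "age_range_bottom" => "integer"
      "age_range_top" => "integer"
    }
  }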
}
output {
  elasticsearch {
    # elasticsearch output configuration
  }
}
Analysis depends on how you want to search these fields. If you only need a range filter on the age range, the default mapping of the two numeric fields is enough. But you should do some work on title; for example, you can follow this example to handle autocomplete.
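For reference, here is a sketch of the kind of search this enables once the two bounds are indexed as numbers (the index name my-documents is an assumption; the field names follow the grok captures above): it matches 'test' on title and keeps only documents whose age range overlaps the requested 18 to 50 interval, so both sample documents above would be returned.
GET my-documents/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "test" } }
      ],
      "filter": [
        { "range": { "age_range_bottom": { "lte": 50 } } },
        { "range": { "age_range_top": { "gte": 18 } } }
      ]
    }
  }
}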

Related

Logstash Elasticsearch Output with specific fields to index

Is there a possibility for Logstash to ingest the whole document into one index and only specific fields into another index in the Elasticsearch output?
Example: if I have 10 fields coming in from the input, can I write all the fields to one index and some specific fields to another index?
My current Logstash configuration looks like this:
input {
  kafka {
    # kafka input
  }
}
filter {
  # a fingerprint id is generated from some of the fields
  fingerprint {
    # ...
  }
}
output {
  elasticsearch {
    # the whole document should go into this index
  }
  elasticsearch {
    # only the fingerprint field should go into this index
  }
}
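One possible approach, shown below as a sketch rather than a definitive answer (the fingerprint source fields and index names are made up), is to clone each event, prune the clone down to the fingerprint, and route the two copies to different indices based on the type the clone filter assigns to the copy:
filter {
  fingerprint {
    source => ["field1", "field2"]   # hypothetical source fields
    target => "fingerprint"
    method => "SHA1"
  }
  # create a copy of the event; the copy gets its type set to "fingerprint_only"
  clone {
    clones => ["fingerprint_only"]
  }
  if [type] == "fingerprint_only" {
    # keep only the fingerprint (plus type and timestamp) on the cloned event
    prune {
      whitelist_names => ["^fingerprint$", "^type$", "^@timestamp$"]
    }
  }
}
output {
  if [type] == "fingerprint_only" {
    elasticsearch {
      index => "fingerprints-%{+YYYY.MM.dd}"   # hypothetical index name
    }
  } else {
    elasticsearch {
      index => "documents-%{+YYYY.MM.dd}"      # hypothetical index name
    }
  }
}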

Elastic Stack: I need to set the Time Filter field name with another field

I need to read messages (the content is logs) from RabbitMQ with Logstash and then send them to Elasticsearch to build monitoring visualizations in Kibana. So I wrote the input to read from RabbitMQ in Logstash like this:
input {
  rabbitmq {
    queue => "testLogstash"
    host => "localhost"
  }
}
and I wrote the output configuration to store the data in Elasticsearch like this:
output {
  elasticsearch {
    hosts => "http://localhost:9200"
    index => "d13-%{+YYYY.MM.dd}"
  }
}
Both of them are placed in myConf.conf.
In the content of each message there is a JSON document that contains fields like this:
{
  "mDate": "MMMM dd YYYY, HH:mm:ss.SSS",
  "name": "test name"
}
But there are two problems. First, the mDate field is not offered as the Time Filter field name when creating a new index pattern. Second, when I use that field instead of the default @timestamp, it cannot be used to build time-based graphs. I think the reason for this is the data type of the field: it should be a date, but it is treated as a string.
I tried to convert the value of the field to a date with mutate in the Logstash config like this:
filter {
  mutate {
    convert => { "mdate" => "date" }
  }
}
Now, two questions arise:
1. Is this the problem? If yes, what is the right solution to fix it?
2. My main need is to use the time when messages are put into the queue, not when Logstash picks them up. What is the best solution?
If you don't specify a value for @timestamp, you should get the current system time from when Logstash creates the event. With that, you should be able to see items in Kibana.
If I understand you correctly, you'd rather use your mDate field for @timestamp. For this, use the date filter in Logstash.
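A sketch of that date filter (the format string is taken from the mDate example in the question; adjust it to the actual data):
filter {
  date {
    # parse the mDate string and use it as the event's @timestamp
    match => ["mDate", "MMMM dd YYYY, HH:mm:ss.SSS"]
    target => "@timestamp"
  }
}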

Logstash doc_as_upsert cross index in Elasticsearch to eliminate duplicates

I have a logstash configuration that uses the following in the output block in an attempt to mitigate duplicates.
output {
  if [type] == "usage" {
    elasticsearch {
      hosts => ["elastic4:9204"]
      index => "usage-%{+YYYY-MM-dd-HH}"
      document_id => "%{[@metadata][fingerprint]}"
      action => "update"
      doc_as_upsert => true
    }
  }
}
The fingerprint is calculated from a SHA1 hash of two unique fields.
This works when Logstash sees the same doc in the same index, but since the command that generates the input data doesn't have a reliable rate at which different documents appear, Logstash will sometimes insert duplicate docs into a different date-stamped index.
For example, the command that Logstash runs to get the input generally returns the last two hours of data. However, since I can't definitively tell when a doc will appear/disappear, I run the command every fifteen minutes.
This is fine when the duplicates occur within the same hour. However, when the hour or day date stamp rolls over and the document still appears, Elasticsearch/Logstash thinks it's a new doc.
Is there a way to make the upsert work across indices? These would all be the same type of doc; they would simply apply to every index that matches "usage-*".
A new index is an entirely new keyspace and there's no way to tell ES to not index two documents with the same ID in two different indices.
However, you could prevent this by adding an elasticsearch filter to your pipeline which would look up the document in all indices and, if it finds one, drop the event.
Something like this would do (note that usages would be an alias spanning all usage-* indices):
filter {
  elasticsearch {
    hosts => ["elastic4:9204"]
    index => "usages"
    query => "_id:%{[@metadata][fingerprint]}"
    fields => { "_id" => "other_id" }
  }
  # if the document was found, drop this one
  if [other_id] {
    drop {}
  }
}
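For the usages alias mentioned above, a minimal sketch would be something like the following (note that a wildcard add only covers indices that already exist; to have future usage-* indices pick up the alias automatically you would also add it to an index template):
POST /_aliases
{
  "actions": [
    { "add": { "index": "usage-*", "alias": "usages" } }
  ]
}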

Elasticsearch converting a string to number

I am new to Elasticsearch and am just starting out with the ELK stack. I am collecting key-value type logs in my Logstash and passing them to an index in Elasticsearch. I am using the kv filter plugin in Logstash. Due to this, all the fields are of string type by default.
When I try to perform aggregation like avg or sum on a numeric field in Elasticsearch, I am getting an Exception: ClassCastException[org.elasticsearch.index.fielddata.plain.PagedBytesIndexFieldData cannot be cast to org.elasticsearch.index.fielddata.IndexNumericFieldData]
When I check the mappings in the index, all the fields except the timestamp ones are marked as string.
Please tell me how to overcome this issue as I have many numeric fields in my log events for aggregation.
Thanks,
Keerthana
You could set explicit mappings for those fields (see e.g. Change default mapping of string to "not analyzed" in Elasticsearch for some guidance), but it's easier to just convert those fields to integers in Logstash using the mutate filter:
mutate {
  convert => ["name-of-field", "integer"]
}
Then Elasticsearch will do a better job at guessing the best data type for your field(s).
(See also Data type conversion using logstash grok.)
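If you do go the explicit-mapping route instead, a minimal sketch might look like the following (the index, type and field names are made up, and the type level assumes an older Elasticsearch release that still uses mapping types, which the exception above suggests; on Elasticsearch 7+ drop that level):
PUT my-logs
{
  "mappings": {
    "logs": {
      "properties": {
        "response_time": { "type": "integer" },
        "bytes_sent": { "type": "long" }
      }
    }
  }
}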
In the latest Logstash versions the syntax is as follows:
filter {
  mutate {
    convert => { "fieldname" => "integer" }
  }
}
You can visit this link for more detail: https://www.elastic.co/guide/en/logstash/current/plugins-filters-mutate.html#plugins-filters-mutate-convert

How to search fields with '-' characters in elastic search

I am new to Elasticsearch. I have got the following document where one of the fields, "eventId", has "-" in its value.
When I try to search with the complete value of eventId, I don't get any results.
Sample Document app/event
{
  "tags": {},
  "eventId": "cc98d57b-c6bc-424c-b54c-df1e3df0d942"
}
I haven't created any explicit settings for my index.
Thanks.
You should check whether the tokenizer splits your value into multiple tokens. Maybe your value is stored as five tokens: "cc98d57b", "c6bc", "424c", "b54c" and "df1e3df0d942".
You can analyze that with the 'Kopf' Plugin (https://github.com/lmenezes/elasticsearch-kopf).
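If you prefer not to install a plugin, the _analyze API shows the same information; a minimal sketch, assuming the index is called app as in the sample document path above (on recent Elasticsearch versions the parameters go in the request body instead of the query string):
GET app/_analyze?field=eventId&text=cc98d57b-c6bc-424c-b54c-df1e3df0d942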
If that is your problem, you should change your field mapping so that the value is not analyzed ("index": "not_analyzed").
For an example of how to set that mapping, see here: Elasticsearch mapping settings 'not_analyzed' and grouping by field in Java
After that, you should be able to search for your specific value.
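For reference, a minimal mapping sketch for this case, assuming an index called app and a type called event as in the sample document path, and an older Elasticsearch release where string/not_analyzed applies (on 5.x and later you would map the field as keyword instead):
PUT app
{
  "mappings": {
    "event": {
      "properties": {
        "eventId": {
          "type": "string",
          "index": "not_analyzed"
        }
      }
    }
  }
}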
