search with comma separated values in elasticsearch [closed] - elasticsearch

I am quite new to Elasticsearch, so I hope somebody can help me here.
Suppose the user has selected:
1) Category - Hollywood
2) Sub-Category - Bond Special
3) Genre - Action & Drama & Comedy (multiple values can be selected)
4) Language - English, Russian and Hindi (multiple values can be selected)
5) Release Year - 1990, 1999, 2000 (multiple values can be selected)
6) 3D Movie - True OR False (only one will be selected)
7) SortBy - "A-Z" OR "Z-A" OR "Date"
Can anyone help me build this query for Elasticsearch? I was going to use "match_phrase" to build the AND conditions, but the problem is that the search parameters can be multiple, comma-separated values.
My indexed document (_source) looks like this:
[_source] => Array (
[id] => 43
[value] => GREENBERG
[imageName] => Done
[date] => (1905) USA (Bengali)
[language] => (Bengali) | 1905 | 1.47hrs
[directorName] => Alejandro González Iñárritu, Ang Lee
[castForSearch] => Ben Stiller, John Turturro
[viewDetailsUrl] => /movie/greenberg
[movieType] => Animation
[rating] => 0
[cast] => Ben Stiller, John Turturro, Olivier Martinez
[synopsis] => A man from Los Angeles, who moved to New York years ago, returns to L.A. to figure out his life while he house-sits for his brother. He soon sparks with his brother's assistant.
[code] => HLR06
[type] => Non-3D
[trailer] => https://www.youtube.com/watch?v=cwdliqOGTLw
[imdb_code] => 1234654
[tags] => Array
(
[0] => Animation
)
[genre] => Adventure
[languages] => Bengali
[categories_filter] => Array
(
[0] => Category 2,Hollywood
)
[sub_categories_filter] => Array
(
[0] => Sub-Category 1,Sub-Category 4,Sub-Category 5,Sub-Category 6,Sub-Category 7
)
)

To match against one of multiple possible values, use a terms query. You don't need a match_phrase query because you're not doing any sort of free-text matching.
You'll need to split the comma-separated values into arrays before indexing your data into Elasticsearch (or use a comma-separated tokenizer).
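For instance, the categories_filter value above would ideally be indexed as an array of separate values rather than one comma-joined string. A sketch of the reshaped document (illustrative, not the poster's actual indexing code):
{
  "categories_filter": ["Category 2", "Hollywood"],
  "sub_categories_filter": ["Sub-Category 1", "Sub-Category 4", "Sub-Category 5", "Sub-Category 6", "Sub-Category 7"],
  "languages": ["Bengali"],
  "genre": ["Adventure"]
}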
Your use case suggests that you don't care about scoring but only about filtering, in which case your query should probably just have a filter.
Sorting is not the same as filtering; for your A-Z/Z-A/Date sorting you'll need to use a sort clause outside of the query.
The final thing would probably look like this:
GET /my_index/my_type/_search
{
  "query": {
    "bool": {
      "filter": [
        { "terms": { "genre": ["Action", "Drama", "Comedy"] } },
        { "terms": { "language": ["English", "Russian", "Hindi"] } }
        // more terms filters
      ]
    }
  },
  "sort": { "title": "asc" }
}
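For "Z-A" you would flip "asc" to "desc", and for "Date" you would sort on the date field instead; note that sorting on an analyzed string field generally needs a not_analyzed (or keyword) sub-field.
If you'd rather keep indexing the comma-separated strings as they are, the comma-separated tokenizer mentioned above can be built with a pattern tokenizer. A minimal sketch, assuming the same my_index/my_type names used above and 2.x-style string mappings (adjust field names and types to your actual mapping):
PUT /my_index
{
  "settings": {
    "analysis": {
      "tokenizer": {
        "comma_tokenizer": {
          "type": "pattern",
          "pattern": ","
        }
      },
      "analyzer": {
        "comma_analyzer": {
          "type": "custom",
          "tokenizer": "comma_tokenizer"
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "categories_filter": { "type": "string", "analyzer": "comma_analyzer" },
        "sub_categories_filter": { "type": "string", "analyzer": "comma_analyzer" }
      }
    }
  }
}
With this analyzer, "Category 2,Hollywood" is indexed as the two terms Category 2 and Hollywood, so a terms filter on the individual values will match. Keep in mind that terms filters are not analyzed at query time, so the values you send must match the indexed terms exactly (including case).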

Related

Laravel group data by months

I am working on a Laravel application where users can place football bets.
This is a simplified version of my tables:
users
- id
- name
bets
- id
- id_user
- cost
- profit (e.g. 0 if the user lost the bet, or any integer value if they won)
- created_at (default Laravel column; this should be used to group bets by month)
I need to show a chart with the ROI (I'm not looking for the formula; it can be simplified as calculateROI in your answer) of the last six months, counting from the current one.
Let's assume the current month is July. How can I write a query, or use Eloquent, to get something like:
[
[
"february" => 2%
],
[
"march" => 0%
],
[
"april" => 100%
],
[
"may" => 500%
],
[
"june" => 13%
],
[
"july" => 198%
],
]

Logstash ignoring multiple date filters

I'm writing a Logstash .conf, and in my filter I need to extract the weekday from two timestamps, but Logstash acts as if it only makes one match. Example:
Timestamp 1: Mar 7, 2019 @ 23:41:40.476 => Thursday
Timestamp 2: Mar 1, 2019 @ 15:22:47.209 => Thu
Expected Output
Timestamp 1: Mar 7, 2019 @ 23:41:40.476 => Thursday
Timestamp 2: Mar 1, 2019 @ 15:22:47.209 => Fri
These are my filters:
date {
match => ["[system][process][cpu][start_time]", "dd-MM-YYYY HH:mm:ss", "ISO8601"]
target => "[system][process][cpu][start_time]"
add_field => {"[Weekday]" => "%{+EEEEE}"}
}
date {
match => ["[FechaPrimero]", "dd-MM-YYYY HH:mm:ss", "ISO8601"]
target => "[FechaPrimero]"
add_field => {"[WeekdayFirtsDay]" => "%{+EE}"}
}
That's because, by default, %{+EEEEE} and %{+EE} format the @timestamp field, not a user-created field (as far as I know this isn't mentioned in the docs).
The only way of doing that, as far as I know, is to use a bit of Ruby code to extract the day of the week, as follows:
ruby {
  code => 'event.set("Weekday", Time.parse(event.get("[system][process][cpu][start_time]").to_s).strftime("%A"))'
}
ruby {
  code => 'event.set("WeekdayFirtsDay", Time.parse(event.get("FechaPrimero").to_s).strftime("%a"))'
}

logstash cron schedule to run every 12 hours starting at certain time

I am trying the 0 1/12 * * * cron expression but it only fires once a day. Below is one of my configurations.
input {
jdbc {
jdbc_connection_string => "jdbc:redshift://xxx.us-west-2.redshift.amazonaws.com:5439/xxx"
jdbc_user => "xxx"
jdbc_password => "xxx"
jdbc_validate_connection => true
jdbc_driver_library => "/mnt/logstash-6.0.0/RedshiftJDBC42-1.2.10.1009.jar"
jdbc_driver_class => "com.amazon.redshift.jdbc42.Driver"
schedule => "0 1/12 * * *" #01:00,13:00, tried from https://crontab.guru/#0_1/12_*_*_*
statement_filepath => "conf/log_event_query.sql"
use_column_value => true
tracking_column => dw_insert_dt
last_run_metadata_path => "metadata/logstash_jdbc_last_run_log_event"
}
}
output {
elasticsearch {
index => "logs-ics_%{+dd_MM_YYYY}"
document_type => "log_event"
document_id => "%{log_entry_id}"
hosts => [ "x.x.x.x:xxxx" ]
}
}
I also tried 0 0 1/12 ? * * * from https://www.freeformatter.com/cron-expression-generator-quartz.html but Logstash does not support that format.
Please help me find a cron expression that works in Logstash for the following schedule (and is there an online page where I can test my future Logstash cron expressions?):
1st at 2018-08-01 01:00:00
then at 2018-08-01 13:00:00
then at 2018-08-02 01:00:00
then at 2018-08-02 13:00:00
then at 2018-08-03 01:00:00
It looks like your scheduling format is wrong.
To do a once-every-twelve hours task, you would use */12, not 1/12:
0 */12 * * * # Every twelve hours at minute 0 of the hour.
Your schedule looks more like an attempt to run the task at one and 12, but to do that you would use a comma, like this:
0 1,12 * * * # Run at one and twelve hours at minute 0.
The rufus extension also allows for adding the timezone (like Asia/Kuala_Lumpur), if you need this to run scheduled on a specific timezone that is not the default machine clock.
Your code above doesn't show us the SQL query you are running. The query could be firing, but if there are no results from the query, you aren't going to get any input in logstash. In any case, your scheduling syntax needs to change from 1/12 to */12 to do what you want it to.
More generally, according to the logstash jdbc input plugin documentation, the scheduling format is considered to be cron-"like." The logstash jdbc input plugin uses the ruby Rufus scheduler. The docs on that scheduling format are here: https://github.com/jmettraux/rufus-scheduler#parsing-cronlines-and-time-strings
Logstash 6.0 JDBC plugin docs are here: https://www.elastic.co/guide/en/logstash/6.0/plugins-inputs-jdbc.html
Hope this helps.

I don't know how to filter my log file with grok and logstash

I have a small Java app that produces logs similar to the ones below:
Fri May 29 12:10:34 BST 2015 Trade ID: 2 status is :received
Fri May 29 14:12:36 BST 2015 Trade ID: 4 status is :received
Fri May 29 17:15:39 BST 2015 Trade ID: 3 status is :received
Fri May 29 21:19:43 BST 2015 Trade ID: 3 status is :Parsed
Sat May 30 02:24:48 BST 2015 Trade ID: 8 status is :received
Sat May 30 08:30:54 BST 2015 Trade ID: 3 status is :Data not found
Sat May 30 15:38:01 BST 2015 Trade ID: 3 status is :Book not found
Sat May 30 23:46:09 BST 2015 Trade ID: 6 status is :received
I want to use the ELK stack to analyse and filter my logs.
I would like at least 3 fields: date and time, trade ID, and status.
In the filter part of my Logstash configuration file, here is what I did:
filter {
  grok {
    match => { "message" => "%{DAY} %{MONTH} %{DAY} %{TIME} BST %{YEAR} Trade ID: %{NUMBER:tradeId} status is : %{WORD:status}" }
  }
}
And for the moment I can't filter my logs as I want.
You have some extra spaces in the pattern, and for the status you want to capture the rest of the message, so using GREEDYDATA instead of WORD is the right choice.
filter {
  grok {
    match => { "message" => "%{DAY:day} %{MONTH:month} %{MONTHDAY:monthday} %{TIME:time} BST %{YEAR:year} Trade ID: %{NUMBER:tradeId} status is :%{GREEDYDATA:status}" }
  }
}
For this log line:
Sat May 30 15:38:01 BST 2015 Trade ID: 3 status is :Book not found
You will end up with a json like:
{
"message" => "Sat May 30 15:38:01 BST 2015 Trade ID: 3 status is :Book not found",
"#version" => "1",
"#timestamp" => "2015-08-18T18:28:47.195Z",
"host" => "Gabriels-MacBook-Pro.local",
"day" => "Sat",
"month" => "May",
"monthday" => "30",
"time" => "15:38:01",
"year" => "2015",
"tradeId" => "3",
"status" => "Book not found"
}

How to do a year over year aggregation with Elasticsearch?

Assuming I have a date field on a document, I know using the date_histogram aggregation I can get a document count by day, month, year, etc.
What I want to do is get the average document count for January, February, March, etc. over several given years. The same goes for Monday, Tuesday, Wednesday, etc. over several given weeks. Is there a way to do this having just that same date field or what is the best way to accomplish this with Elasticsearch?
Example
Let's say we have a bunch of orders placed over three years:
2012 - Jan (10 orders), Feb (5 orders), Mar (7 orders), Apr (11 orders), etc
2013 - Jan (13 orders), Feb (7 orders), Mar (12 orders), Apr (15 orders), etc.
2014 - Jan (10 orders), Feb (7 orders), Mar (6 orders), Apr (13 orders), etc.
What I want is the average of each month over the given years, so the output would be:
Jan ((10 + 13 + 10) / 3 = 11 orders), Feb (6.33 orders), Mar (8.33 orders), Apr (13 orders), etc.
It would be best if this can be generalized for N years (or N Januaries, etc.) so that we search over any date range.
You can use 'monthOfYear' like this:
"aggregations": {
"timeslice": {
"histogram": {
"script": "doc['timestamp'].date.getMonthOfYear()",
"interval": 1,
"min_doc_count": 0,
"extended_bounds": {
"min": 1,
"max": 12
},
"order": {
"_key": "desc"
}
}
}
The extended bounds will ensure you get a value for every month (even if it is zero).
If you want the month names, you can either do that in your own code, or do this (with the consequence that you won't get values for months that have no data):
"aggregations": {
"monthOfYear": {
"terms": {
"script": "doc['timestamp'].date.monthOfYear().getAsText()",
"order": {
"_term": "asc"
}
}
}
Once you've got this, you can nest your stats aggregation inside this one:
"aggregations: {
"monthOfYear": {
"terms": {
...
},
"aggregations": {
"stats": ...
}
}
}
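If your Elasticsearch version supports pipeline aggregations (2.0+), these pieces can also be combined to compute the per-month average the question asks for directly: bucket by month of year, sub-bucket by year, then average the yearly doc counts with avg_bucket. A rough sketch, assuming an orders index with a timestamp date field (both names are illustrative):
GET /orders/_search
{
  "size": 0,
  "aggregations": {
    "monthOfYear": {
      "terms": {
        "script": "doc['timestamp'].date.monthOfYear().getAsText()"
      },
      "aggregations": {
        "per_year": {
          "date_histogram": { "field": "timestamp", "interval": "year" }
        },
        "avg_per_year": {
          "avg_bucket": { "buckets_path": "per_year>_count" }
        }
      }
    }
  }
}
Each month bucket then holds one sub-bucket per year, and avg_per_year averages their document counts. Note that years in which a month has no documents produce no bucket and therefore are not counted in the average.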
The question is pretty old now, but, hope this helps someone.
My understanding of what you want is:
You'd like to see the average number of documents per month, in yearly buckets.
Is that correct?
If so, you could count the number of documents in a year (i.e. the yearly bucket) and then divide by 12 using a script.
E.g. to show the daily average doc count in weekly buckets (assuming 30 days per month):
curl -XGET 'http://localhost:9200/index/type/_search?pretty' -d '{
  "aggs": {
    "monthly_bucket": {
      "date_histogram": { "field": "datefield", "interval": "week" },
      "aggs": {
        "weekly_average": { "sum": { "script": "doc[\"datefield\"].value > 0 ? 1.0/30 : 0" } }
      }
    }
  }
}'
