Calculating the time duration of a particular Log event using Logstash - elasticsearch

Objective: I want to calculate how long a particular event lasts, using Logstash.
Scenario: Consider a customer who is searching for a product to purchase on my page. Every page he visits is recorded in the log along with a timestamp. Now I want to find how long an average customer takes to purchase a product, and how long my server takes to respond to him.
Here is my log file:
16-09-2004 00:37:22 BEGIN_CUST
ts:16-09-2004T00:37:26+05:30
ID-XYZ456
16-09-2004 00:37:23 PAGE_1
ID-XYZ456
ts:16-09-2004T00:39:26+05:30
16-09-2004 00:37:23 PAGE_2
ID-XYZ456
ts:16-09-2004T00:41:26+05:30
16-09-2004 00:37:23 BUT_REQ
ID-XYZ456
ts:16-09-2004T00:43:26+05:30
16-09-2004 00:37:23 PURCHASE
ID-XYZ456
ts:16-09-2004T00:47:26+05:30
16-09-2004 00:51:22 BEGIN_CUST
ts:16-09-2004T00:52:26+05:30
ID-YUB98I
16-09-2004 00:53:23 PAGE_1
ID-YUB98I
16-09-2004 00:55:23 PURCHASE
ID-YUB98I
In the above log file, it is clear that BEGIN_CUST marks the beginning of an event and PURCHASE marks the end.
The ID line acts as a unique identifier for each customer.
I have tried scripted fields, but they do not yield proper results, for the following reasons:
It is not guaranteed that a customer completes a purchase.
A purchase may complete in a matter of seconds.
Is there a better way, using Logstash, to store the duration for each individual customer in a separate field so it can be visualized in Kibana?
Thanks in advance.

So long as you're using ElasticSearch as your store, the elasticsearch filter may do what you need. The trick is to search for the BEGIN_CUST event as soon as you get a PURCHASE event. The documentation for this plugin includes an example that does much of what you're looking for, but here is a summary:
if [trans_type] == "PURCHASE" {
  elasticsearch {
    hosts => ["localhost"]
    query => "trans_type:BEGIN_CUST AND cust_id:%{[cust_id]}"
    fields => { "@timestamp" => "started" }
  }
  date {
    match => [ "[started]", "ISO8601" ]
    target => "[started]"
  }
  ruby {
    # the difference between two Logstash timestamps is in seconds
    code => "event['shopping_time'] = (event['@timestamp'] - event['started']) rescue nil"
  }
}
This will yield a shopping_time field measured in seconds between when the BEGIN_CUST record arrived and when the first PURCHASE arrived. If a customer purchases twice, then each PURCHASE record will have its own shopping_time field based on the same BEGIN_CUST.
This works by querying ElasticSearch for the BEGIN_CUST record and copying that record's @timestamp into the PURCHASE record's started field. The date {} filter then turns that into a datetime data-type. Finally, the ruby {} block computes the difference in time between the current @timestamp field and the one pulled out of ElasticSearch, creating the shopping_time field.
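Note that this assumes each log line has already been parsed into trans_type and cust_id fields. A minimal grok sketch for the event lines in the question (field names are illustrative) might look like the following; since the customer ID arrives on its own line, the record would first need to be reassembled, e.g. with a multiline codec, before cust_id can be extracted alongside trans_type:
filter {
  grok {
    # parse event lines like "16-09-2004 00:37:22 BEGIN_CUST"
    match => { "message" => "%{DATE_EU:event_date} %{TIME:event_time} %{WORD:trans_type}" }
  }
}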

Related

Remove duplicates from groupBy

I would like to find out how many users have swipes per day, without duplicate user_id values within each group.
So if a user has swiped multiple times on one day, I want that user to show up only once per group (per day). I am not really interested in the actual swipes but rather in the swipe count per day.
I tried:
Swipe::all()->groupBy(function($item){ return $item->created_at->format('d-M-y'); })->unique('user_id')
To remove duplicate data, you can use unique().
I created an example for you, using some dummy swipe data.
So you want the data grouped by created_at, and for every date the users who swiped, but without duplicate users?
The code should be like:
$collect = Swipe::all()->groupBy(function($item){
    return $item->created_at->format('d-M-y');
})->transform(function($dataGrouped, $date){
    return [
        $date => $dataGrouped->unique('user_id')
    ];
});
The result will be a collection keyed by date, where each date holds only one swipe per user.
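Since the goal is the swipe count per day rather than the swipes themselves, a minimal variation (a sketch, assuming the same Swipe model) can map each group to its count of distinct users:
$countsPerDay = Swipe::all()->groupBy(function($item){
    return $item->created_at->format('d-M-y');
})->map(function($dataGrouped){
    // count distinct users within each day's group
    return $dataGrouped->unique('user_id')->count();
});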

Time-sensitive Cloudant view not always returning correct results

I have a view on a Cloudant database that is designed to show events that are happening in the next 24 hours:
function (doc) {
  // activefrom and activeto are in UTC
  // set start to local time in UTC
  var m = new Date();
  var start = m.getTime();
  // end is start plus 24 hours of milliseconds
  var end = start + (24*60*60*1000);
  // only want approved disruptions for today that are not changed conditions
  if (doc.properties.status === 'Approved' && doc.properties.category != 'changed' && doc.properties.activefrom && doc.properties.activeto) {
    if (doc.properties.activeto > start && doc.properties.activefrom < end) {
      emit([doc.properties.category, doc.properties.location], doc.properties.timing);
    }
  }
}
This works fine most of the time, but every now and then the view does not show the expected results.
If I edit the view, even just by adding a comment, the output changes to the expected results. If I re-edit the view and remove the change, the results become incorrect again.
Is this because of the time-sensitive nature of the view? Is there a better way to achieve the same result?
The date that is indexed by your MapReduce function is the time that the server dealing with the work performs the indexing operation.
Cloudant views are not necessarily generated at the point that data is added to the database. Sometimes, depending on the amount of work the cluster is having to do, the Cloudant indexer is not triggered until later. Documents can even remain unindexed until the view is queried. In that circumstance, the date in your index would not be "the time the document was inserted" but "the time the document was indexed/queried", which is probably not your intention.
Not only that, different shards (copies) of the database may process the view build at different times, giving you inconsistent results depending on which server you asked!
You can solve the problem by indexing something from the source document itself, e.g.
if your document looked like:
{
  "timestamp": 1519980078159,
  "properties": {
    "category": "books",
    "location": "Rome, IT"
  }
}
You could generate an index using the timestamp value from your document and the view you create would be consistent across all shards and would be deterministic.
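A sketch of such a map function under that assumption: emit the document's own data instead of calling new Date() at index time, and move the "next 24 hours" windowing to query time (e.g. via startkey/endkey computed by the client when it queries the view). The activeto check can then be applied client-side:
function (doc) {
  var p = doc.properties;
  // only approved disruptions that are not changed conditions
  if (p.status === 'Approved' && p.category !== 'changed' && p.activefrom && p.activeto) {
    // index the document's own date, so the view is deterministic
    // across shards; select the 24-hour window at query time
    emit(p.activefrom, { timing: p.timing, activeto: p.activeto });
  }
}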

Fetch first charge of a customer in stripe

I am reading the Stripe documentation and I want to fetch the first charge of a customer. Currently I am doing:
charge_list = Stripe::Charge.list(
  {
    customer: "cus_xxx"
  },
  "sk_test_xxxxxx"
)
first_charge = charge_list.data.last
This works since the Stripe API returns the charge list sorted with the most recent charges first, but I don't think it is a good approach. Can anyone help me with how to fetch the first charge by a customer, or how to sort the list by created date so that I could get the first charge as the first object in the array?
It seems there is no reverse-order sorting feature in the Stripe API.
Also remember the first charge may not be in the first page of the result set, so you have to iterate using #auto_paging_each.
A quick possible solution:
charge_list = Stripe::Charge.list(
  { customer: "cus_xxx", limit: 100 }, # limit 100 to reduce the number of requests
  "sk_test_xxxxxx"
)
first_charge = nil
charge_list.auto_paging_each { |c| first_charge = c }
You may want to persist the result somewhere since it is a heavy operation.
But the cleanest solution IMO would be to store all charge records into your DB and make subsequent queries against it.

How can I create a histogram of time stamp deltas?

We are storing small documents in ES that represent a sequence of events for an object. Each event has a date/time stamp. We need to analyze the time between events for all objects over a period of time.
For example, imagine these event json documents:
{ "object":"one", "event":"start", "datetime":"2016-02-09 11:23:01" }
{ "object":"one", "event":"stop", "datetime":"2016-02-09 11:25:01" }
{ "object":"two", "event":"start", "datetime":"2016-01-02 11:23:01" }
{ "object":"two", "event":"stop", "datetime":"2016-01-02 11:24:01" }
What we would want to get out of this is a histogram plotting the two resulting time stamp deltas (from start to stop): 2 minutes / 120 seconds for object one and 1 minute / 60 seconds for object two.
Ultimately we want to monitor the time between start and stop events, but that requires calculating the time between those events and then aggregating the results, or providing them to the Kibana UI to be aggregated / plotted. Ideally we would like to feed the results directly to Kibana so we can avoid creating any custom UI.
Thanks in advance for any ideas or suggestions.
Since you're open to using Logstash, there's a way to do it using the aggregate filter.
Note that this is a community plugin that needs to be installed first (i.e. it doesn't ship with Logstash by default).
The main idea of the aggregate filter is to merge two "related" log lines. You can configure the plugin so it knows what "related" means. In your case, "related" means that both events must share the same object name (i.e. one or two) and then that the first event has its event field with the start value and the second event has its event field with the stop value.
When the filter encounters the start event, it stores the datetime field of that event in an internal map. When it encounters the stop event, it computes the time difference between the two datetimes and stores the duration in seconds in the new duration field.
input {
  ...
}
filter {
  ...other filters

  if [event] == "start" {
    aggregate {
      task_id => "%{object}"
      code => "map['start'] = event['datetime']"
      map_action => "create"
    }
  } else if [event] == "stop" {
    aggregate {
      task_id => "%{object}"
      # parse both datetime strings so the difference comes out in seconds
      code => "require 'time'; event['duration'] = Time.parse(event['datetime']) - Time.parse(map['start'])"
      end_of_task => true
      timeout => 120
    }
  }
}
output {
  elasticsearch {
    ...
  }
}
Note that you can adjust the timeout value (here 120 seconds) to better suit your needs. When the timeout elapses before a stop event has arrived, the stored start event is discarded.
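Once the duration field is indexed, Kibana can aggregate it directly, or you can sanity-check the deltas with a plain Elasticsearch histogram aggregation (the index name and 30-second interval below are illustrative):
POST /logstash-*/_search
{
  "size": 0,
  "aggs": {
    "durations": {
      "histogram": { "field": "duration", "interval": 30 }
    }
  }
}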

SOQL - single row per each group

I have the following SOQL query to display List of ABCs in my Page block table.
public List<ABC__c> getABC(){
    List<ABC__c> ListABC = [Select WB1__c, WB2__c, WB3__c, Number, tentative__c, Actual__c, PrepTime__c, Forecast__c from ABC__c ORDER BY WB3__c];
    return ListABC;
}
In the resulting table (shown as an image in the original post), WB3 has a number of records each for A, B and C. But I want to display only one record for each WB3 group, based on Actual__c: only the latest Actual__c must be displayed for each WB3 group.
i.e., ideally I want to display only 3 rows (one each for A, B, C) in this example.
For this, I have used GROUP BY and displayed the result using AggregateResult.
I got the latest Actual Date for each WB3, as intended. But the Tentative Date does not correspond to it; it is simply the MAX in the list.
Here is the code I used
public List<SiteMonitoringOverview> getSPM(){
    AggregateResult[] AgR = [Select WBS_3__c, MAX(Tentative_Date__c) dtTentativeDate, MAX(Actual_Date__c) LatestCDate FROM Site_progress_Monitoring__c GROUP BY WBS_3__c];
    if(AgR.size() > 0){
        for(AggregateResult SalesList : AgR){
            CustSumList.add(new SiteMonitoringOverview(String.valueOf(SalesList.get('WBS_3__c')), String.valueOf(SalesList.get('dtTentativeDate')), String.valueOf(SalesList.get('LatestCDate'))));
        }
    }
    return CustSumList;
}
I am forced to use MAX() for the tentative date, but I want the Tentative Date corresponding to the MAX Actual Date, not the max Tentative Date.
For group A, the Tentative Date of the max Actual Date is 12/09/2012, but the query displays the max Tentative Date, 27/02/2013. It should display 12/09/2012. This happens because I am using MAX(Tentative_Date__c) in my code: every column in a SOQL query must be either grouped or aggregated. That's weird.
How do I get the required 3 rows in this example?
Any suggestions? Any different approach (e.g. looping within groups)? How?
Just ran into this issue myself. The solution I came up with only works if you want the oldest or newest record from each grouping, so unfortunately it probably won't work in your case. I'll still leave this here in case it happens to help someone searching for a solution to this issue.
AggregateResult[] groupedResults = [Select Max(Id), WBS_3__c FROM Site_progress_Monitoring__c GROUP BY WBS_3__c];
Calling MAX or MIN on the Id will let you get one record per group condition. You can then query other information for those records. In my case I just needed one record from each group and didn't really care which one it was.
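For example, a sketch of that second step (field names follow the question; the 'maxId' alias is illustrative):
// collect the winning Id from each WBS_3__c group
AggregateResult[] groupedResults = [Select MAX(Id) maxId, WBS_3__c FROM Site_progress_Monitoring__c GROUP BY WBS_3__c];
Set<Id> winners = new Set<Id>();
for (AggregateResult ar : groupedResults) {
    winners.add((Id) ar.get('maxId'));
}
// fetch the full rows for just those records
List<Site_progress_Monitoring__c> rows = [Select WBS_3__c, Tentative_Date__c, Actual_Date__c FROM Site_progress_Monitoring__c WHERE Id IN :winners];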
