Complex Queries in ELK?

I've successfully set up the ELK stack, and it gives me great insights into my data. However, I'm not sure how to fetch the following result.
Let's say I have the columns user_id and action. The values in action can be installed, activated, engagement, and click. If a particular user performed the activity installed on 21 May and again on 21 June, then when fetching results for the month of June, ELK should not return users who already performed that activity earlier. For example, for the following table:
Date      UserID   Activity
1 May     1        Activated
3 May     2        Activated
6 May     1        Click
8 May     2        Activated
11 June   1        Activated
12 June   1        Activated
13 June   1        Click
User 1 and User 2 activated on 1 May and 3 May respectively. User 2 also activated again on 8 May. So when I filter the users for the month of May with activity Activated, it should return a count of 2, i.e.
1 May     1        Activated
3 May     2        Activated
User 2's row on 8 May is excluded because that user performed the same activity before.
Now if I run the same query for the month of June, it should return nothing, because the same users performed the same activity earlier as well.
How can I write this query in ELK?

This type of relational query is not possible in Elasticsearch.
You would need to add another column (FirstUserAction) and either populate it when the data is loaded, or schedule a task (in whatever scripting/programming language you're comfortable with) to periodically calculate and update the values for this column.
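As a sketch of that pre-computation approach, the scheduled task could flag the first occurrence of each (user, action) pair in a single pass over the date-sorted events. This is illustrative Python with hypothetical field names taken from the example, not Elasticsearch-specific code:

```python
def mark_first_actions(events):
    """Given events sorted by date ascending, flag the first time each
    (user_id, action) pair occurs by adding a 'first_user_action' boolean."""
    seen = set()
    out = []
    for e in events:
        key = (e["user_id"], e["action"])
        out.append({**e, "first_user_action": key not in seen})
        seen.add(key)
    return out

events = [
    {"date": "2022-05-01", "user_id": 1, "action": "activated"},
    {"date": "2022-05-03", "user_id": 2, "action": "activated"},
    {"date": "2022-05-08", "user_id": 2, "action": "activated"},
    {"date": "2022-06-11", "user_id": 1, "action": "activated"},
]
flagged = mark_first_actions(events)
# Filtering May on first_user_action=True gives 2 users; June gives 0,
# matching the counts described in the question.
```

With the flag stored on each document, the original monthly filter becomes a plain term query on `first_user_action` plus the date range, which Elasticsearch handles natively.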

Related

Query to prevent booking overlap

I'm doing an app in Apex Oracle and trying to find a query that could prevent people from booking a room that is already booked. I managed to find a query that can prevent picking a date that starts or ends in between the booking time, but I can't find how to prevent overlapping. By that I mean if someone books a conference room Feb 2nd to Feb 5th, someone else can still book the same room from Feb 1st to Feb 7th. That is what I'm trying to prevent. Thanks for the help!
Here's my first query
SELECT RES_ID_LOC FROM WER_RES
WHERE (CAST(RES_DATE_ARRIVE AS DATE) < CAST(TRY_RESERVE_START_DATE AS DATE) OR CAST(RES_DATE_DEPART AS DATE) > CAST(TRY_RESERVE_START_DATE AS DATE))
AND (CAST(RES_DATE_ARRIVE AS DATE) < CAST(TRY_RESERVE_END_DATE AS DATE) OR CAST(RES_DATE_DEPART AS DATE) > CAST(TRY_RESERVE_END_DATE AS DATE))
The main issue you'll have here is concurrency, namely (in chronological order):
- User 1 runs the overlap check query, sees Room 5 is free, and inserts a row to book it
- User 2 runs the overlap check query, sees Room 5 is free, and inserts a row to book it
- User 1 commits
- User 2 commits
and voila! You have data corruption, even though all the code ran as you expected.
To avoid this, you'll need some way to lock a resource that multiple sessions might want to book. So let's say you have a ROOMS table (the list of available rooms) and a BOOKINGS table which is a child of ROOMS.
Then your logic will need to be something like:
select * from ROOMS where ROOM_NO = :selected_room for update;
This gives someone exclusive access to the room to check for bookings.
Now you can run your overlap check on that room against the BOOKINGS table. If that passes, then you insert your booking and commit the change to release the lock on the ROOMS row.
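The overlap check itself reduces to one condition: two ranges overlap exactly when each starts before (or on) the day the other ends. A minimal sketch of that predicate, in illustrative Python rather than Oracle SQL (the dates are placeholders for the Feb 2nd-5th example from the question):

```python
from datetime import date

def overlaps(existing_start, existing_end, new_start, new_end):
    """True when [new_start, new_end] overlaps [existing_start, existing_end].
    Two inclusive ranges overlap exactly when each starts on or before
    the day the other ends."""
    return new_start <= existing_end and existing_start <= new_end

# A booking for Feb 2-5 already exists; Feb 1-7 fully encloses it,
# so it must be rejected.
assert overlaps(date(2023, 2, 2), date(2023, 2, 5),
                date(2023, 2, 1), date(2023, 2, 7))
# Feb 6-8 starts after the existing booking ends, so it is allowed.
assert not overlaps(date(2023, 2, 2), date(2023, 2, 5),
                    date(2023, 2, 6), date(2023, 2, 8))
```

Translated back to the table in the question, the WHERE clause would look for any existing row with RES_DATE_ARRIVE on or before the requested end date and RES_DATE_DEPART on or after the requested start date; any row found means a conflict.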
As an aside, take care with simply casting strings to dates, because you're at the whim of the database's default format mask. It's better to use TO_DATE with an explicit, known format mask.

How to deal with reporting slowly changing dimensions

For a client I am creating a data warehouse in which we have some slowly changing dimensions (or facts, if that is even a thing?). For example, we want to report the annual recurring revenue (ARR) for subscriptions, and we want to include both the currently active and the expired subscriptions, so that we can see the ARR over a timeline.
The data we retrieve looks like this:
subscription_id   account_id   ARR   start_date   end_date
1                 1            10    01-01-2022   31-03-2022
2                 2            20    01-01-2022   31-12-2022
3                 1            5     01-04-2022   31-11-2022
So in this case the same account (account_id 1) renewed a subscription on 01-04-2022. In the report for 2022 we want to see the ARR for all months of 2022. I've looked into slowly changing dimensions; however, something I cannot really see in that concept is how to report both the currently active license and the history in a dashboard. If, for example, we want to visualize the ARR per month for all of 2022 in a dashboarding tool, we want to see both subscriptions for account_id 1 over the course of the year, not just the currently active one. This seems to be very tricky to do in most dashboarding tools.
To overcome this I've done the following: I created a calendar table with an interval of 1 month and cross-joined it with the table above to generate a fact table. The end result looks like:
timestamp    account_id   ARR
01-01-2022   1            10
01-01-2022   2            20
01-02-2022   1            10
...          ...          ...
01-11-2022   1            10
01-11-2022   2            20
This makes it really easy for the user of the reporting tool to filter on a specific month and show the ARR between dates and across multiple subscriptions. It does, however, generate a lot of extra data, but at the moment storage space is not an issue. It also makes this more of a transactional-style table, even though the ARR is not really a transaction (i.e. it is not a product sold on a specific date).
My question is: Are there better ways of generating a fact table where the source data contains a date range?
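The calendar cross-join described above can be sketched as follows. This is an illustrative Python version of the same expansion, with column names taken from the example (in a warehouse this would typically be a SQL join between the calendar table and the subscription table on the date range):

```python
from datetime import date

def month_starts(year):
    """The calendar table: one row per month, at monthly grain."""
    return [date(year, m, 1) for m in range(1, 13)]

def expand_to_monthly_fact(subscriptions, year):
    """Cross-join subscriptions with the monthly calendar, keeping a
    fact row for every month in which the subscription is active."""
    rows = []
    for ts in month_starts(year):
        for s in subscriptions:
            if s["start_date"] <= ts <= s["end_date"]:
                rows.append({"timestamp": ts,
                             "account_id": s["account_id"],
                             "ARR": s["ARR"]})
    return rows

subs = [
    {"subscription_id": 1, "account_id": 1, "ARR": 10,
     "start_date": date(2022, 1, 1), "end_date": date(2022, 3, 31)},
    {"subscription_id": 2, "account_id": 2, "ARR": 20,
     "start_date": date(2022, 1, 1), "end_date": date(2022, 12, 31)},
    {"subscription_id": 3, "account_id": 1, "ARR": 5,
     "start_date": date(2022, 4, 1), "end_date": date(2022, 11, 30)},
]
fact = expand_to_monthly_fact(subs, 2022)
# January has two rows (accounts 1 and 2); from April onwards,
# account 1's row carries the renewed subscription's ARR of 5.
```

This kind of monthly "snapshot" fact table is a recognized warehouse pattern (a periodic snapshot), which is one reason the extra row volume is usually considered acceptable.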

OsTicket dual level filter

This is my first question in stack overflow.
I have a problem with OSTicket and its Ticket filter option.
The following is what I set in the filter 1 and filter 2
Filter 1 -- User email contains xxx, then set priority to high -- Execution order 1
Filter 2 -- Priority contains high, then SLA plan is 24 hours.
So I raise 2 new tickets as follows
Ticket 1 - user email xxx@yyy.com -- The priority becomes high even though I did not enter a priority, but the system does not perform my second check, where the filter says that if priority is high the SLA should be 24 hrs.
Ticket 2 - user email sidd@yyy.com, with priority entered as high -- Now the ticket checks my priority and gives me an SLA plan of 24 hours without my entering an SLA plan.
Each ticket filter works fine on its own, but the dual level is what is not working. How can I overcome this?
Thanks in advance.
Make sure you uncheck "Stop processing further on match!" in filter settings. It is checked by default.

How to deal with intermediate additions while paginating a list with Spring Data MongoDB?

I'm trying to create a backend in Spring Data MongoDB. I have the following code, which works; I have used the built-in methods by extending my repo with the MongoRepository interface:
@RequestMapping(value="/nextpost", method=RequestMethod.GET)
public List getNextPosts(@RequestParam int next) {
    Pageable pageable = new PageRequest(next, 5, new Sort(new Sort.Order(Direction.DESC, "id")));
    Page page = repo.findAll(pageable);
    return page.getContent();
}
The above code returns the page corresponding to the page number passed in the "next" parameter.
My Android frontend, however, allows things to be added to and deleted from the database, and this causes problems with this pagination method. Let's take an example:
- When my Android frontend starts up, it loads the first 5 items by calling "getNextPosts" with next = 0.
- My Android frontend also keeps track of the page it is on and increments it when the user wants to see more items.
- Now, we immediately add 5 more items.
- When I swipe up to fetch the next 5 items, it calls the "getNextPosts" method passing the "next page" value = 1. The app will load the same 5 items originally displayed when the app started, because the 5 "NEW" items just pushed the 5 "OLD" items down in the database.
Therefore on the app we see 15 items, comprising:
5 "NEW" + 5 "OLD" + 5 "OLD"
So if I gave numbers to all my items on my android ListView, I would see:
15
14
13
12
11
// the above would be the new items added
10
9
8
7
6
//the above would be the original items on page 0
10
9
8
7
6
//the above would be still be the original items but now we are on page 1
Does anyone know how one can solve this issue so that when I swipe up, the items would be:
15
14
13
12
11
// the above would be the new items added
10
9
8
7
6
5
//the above would be the original items on page 0
4
3
2
1
0
//the above would be on page 1
tl;dr
That's the nature of the beast. Pagination in Spring Data is defined as retrieving a part of the result set at the time of querying. Especially for remote communication, that kind of statelessness is usually the best tradeoff between keeping state, keeping connections open, scalability etc.
Details
The only way to avoid this would be to capture the state of the database at the time of the first access and only work on that. You could actually build this by retrieving all items and paging through the data locally.
Of course, hardly anyone does this, as it easily gets out of hand for larger data volumes. It would also bring up other problems, such as: when do you actually want to see the items introduced in the meantime? So the definition of "correct content" when paginating a list is not clear-cut.
Mitigation strategies
If applicable to your scenario, you could try to apply a sorting that guarantees new items are added at the very end, basically making this an append-only list. This would naturally sort the most recent items last, though, which is contrary to what's needed most of the time.
If you use the pagination to work down a list of items and process all of them, another approach is to keep track of the identifiers of the items you have already processed. In your particular scenario, you'd be able to detect that the items have already been processed and move on to the next page. This of course only makes sense if you read and process faster than someone else manipulates the list in the backend.
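That bookkeeping can be as simple as a set of already-seen identifiers consulted while paging. A minimal illustrative Python sketch (the `fetch_page` callback and the `id` field are assumptions standing in for the real repository call):

```python
def fetch_unprocessed(fetch_page, seen_ids):
    """Page through results, yielding only items whose id is not yet in
    seen_ids (e.g. because concurrent inserts shifted items between pages).
    fetch_page(n) is assumed to return page n as a list of dicts with an
    'id' key, and an empty list past the last page."""
    page_no = 0
    while True:
        page = fetch_page(page_no)
        if not page:
            return
        for item in page:
            if item["id"] not in seen_ids:
                seen_ids.add(item["id"])
                yield item
        page_no += 1

# Demo: ids 10..1, newest first, paged 5 at a time.
items = [{"id": i} for i in range(10, 0, -1)]
def fetch_page(n):
    return items[n * 5:(n + 1) * 5]

seen = {10, 9}  # pretend these two were already processed earlier
remaining = [it["id"] for it in fetch_unprocessed(fetch_page, seen)]
# remaining now holds 8..1: the two already-seen ids were skipped.
```

The trade-off is that `seen_ids` grows with the number of processed items, so this fits batch-processing jobs better than an open-ended UI feed.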
Another solution could be to store an insert timestamp into the db for each entry. This enables you to create deterministic pagination queries:
The moment you initialize pagination (querying the first page), you restrict items to those with an insert timestamp less than or equal to now(). You have to save now() as the pagination timestamp for querying further pages in the future. Since newly added items all get an insert timestamp greater than the pagination timestamp, those items won't affect existing paginations.
Keep in mind that new items won't show up until you re-initialize pagination by refreshing the pagination timestamp. But you can simply check for the existence of new items by counting the items with an insert timestamp greater than the pagination timestamp, and in that case show a refresh button or something like that.
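A sketch of that timestamp-snapshot scheme over an in-memory list, mirroring the 15-item example from the question (illustrative Python; in the real backend the snapshot filter would be an extra criterion on the insert-timestamp field alongside the Pageable):

```python
def init_pagination(now):
    """Freeze the snapshot: only items inserted at or before `now`
    belong to this pagination session."""
    return {"snapshot_ts": now}

def get_page(items, session, page_no, page_size):
    """items: dicts with 'id' and 'inserted_at'. Returns the requested
    page, newest first, restricted to the frozen snapshot."""
    visible = [it for it in items if it["inserted_at"] <= session["snapshot_ts"]]
    visible.sort(key=lambda it: it["id"], reverse=True)
    start = page_no * page_size
    return visible[start:start + page_size]

def new_item_count(items, session):
    """Items that arrived after the snapshot, e.g. to drive a refresh button."""
    return sum(1 for it in items if it["inserted_at"] > session["snapshot_ts"])

# Demo: 10 items exist when the app starts.
items = [{"id": i, "inserted_at": i} for i in range(1, 11)]
session = init_pagination(now=10)
first_page = get_page(items, session, 0, 5)    # ids 10..6
items += [{"id": i, "inserted_at": i} for i in range(11, 16)]  # 5 new items arrive
second_page = get_page(items, session, 1, 5)   # ids 5..1, NOT 10..6 again
```

Because the new items fall outside the frozen snapshot, page 1 continues where page 0 left off instead of repeating it, which is exactly the behavior asked for in the question.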

Scribe: Not getting proper results

I am working on Sage ERP MAS 200 and Microsoft Dynamics CRM integration using Scribe.
I have a chain of 5 Scribe jobs with which I am trying to compute various values and update/insert in CRM (Target):
(1) Job 1: This job simply transfers all the data from the AR_Customer table of MAS (source) to the same table in CRM (target). For a few new fields (yeartilldate sales, monthtilldate sales, prioryear sales, monthlytrend), it also inserts the value 0.
(2) Job 2: Month till date (Period till date): Computes the Month-till-date (Period-till-date) sales values and updates them in CRM. For accounts that are not updated, the value is already set to 0 (by job 1).
(3) Job 3: Prior year: Computes the Prior Year sales values and updates them in CRM. For accounts that are not updated, the value is already set to 0 (by job 1).
(4) Job 4: Year till date: Computes the Year-till-date sales values and updates them in CRM. For accounts that are not updated, the value is already set to 0 (by job 1).
(5) Job 5: MonthlyTrend: Computes the MonthlyTrend values and updates them in CRM. For accounts that are not updated, the value is already set to 0 (by job 1).
Issue:
Jobs 1, 2, 3 and 4 run with no issue at all. The issue is happening in job 5.
I have 7 steps in my job. The 7th step (CRM admin) is not called by any of the other steps (i.e., no step in the workflow passes data to it), but I still have not removed it, for various reasons.
Step 6 in my job (Account) is supposed to do the account update. I have the same formula for calculating the MonthlyTrend values on both steps 6 and 7.
Following are the observations:
1. For records where the flow never reaches steps 6 and 7: the value of MonthlyTrend is calculated properly (I could see the values when I clicked on 'Test Job') for both steps 6 and 7.
2. For records where the flow never reaches step 7, but does reach step 6: the value of MonthlyTrend is calculated properly for step 7, but is not calculated for step 6 (the value remains #NULL).
Also, for step 6, when I tried giving a constant value (like 0 or 8), it does get displayed, even in case 2 mentioned above.
Please let me know why this might be happening.
