ActiveRecord conditional order clause - activerecord

I need to sort a list of tasks:
| title                           | priority | due_at              |
|---------------------------------|----------|---------------------|
| Mow the lawn                    | 1        | 2011-09-11 22:00:00 |
| Call mom                        | 3        | 2010-01-26 09:29:03 |
| Bake a cake                     | 2        | 2013-09-13 08:45:37 |
| Feed the cat                    | 2        | 2015-09-12 16:03:51 |
| Remember you don't like the cat | 2        | 2014-03-19 23:00:00 |
The order clause should sort overdue tasks by priority and all others by due_at, i.e. the resulting order should be:
Mow the lawn
Bake a cake
Call mom
Remember you don't like the cat
Feed the cat

I ended up with the following (plain SQL, not yet translated to AR):
SELECT *
FROM tasks
ORDER BY
due_at <= Now() DESC,
CASE due_at <= Now() WHEN true THEN priority END ASC,
CASE due_at <= Now() WHEN true THEN due_at END ASC,
CASE due_at <= Now() WHEN false THEN due_at END DESC,
CASE due_at <= Now() WHEN false THEN priority END ASC

If you just need an array of the tasks stored in @tasks, you can do this:
@tasks = Task.where(due_at: 10.years.ago..Time.now).order(:priority)
@tasks += Task.where.not(due_at: 10.years.ago..Time.now).order(:due_at)
If you need a Task::ActiveRecord_Relation, you'll have to do this:
Task.where(id: Task.where(due_at: 10.years.ago..Time.now).
                    order(:priority).pluck(:id) +
               Task.where.not(due_at: 10.years.ago..Time.now).
                    order(:due_at).pluck(:id))
Note that the resulting WHERE id IN (...) query does not by itself guarantee that rows come back in the order of the ids, so the relation may still need an explicit order.
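The intended ordering can also be checked outside the database. Below is a minimal plain-Ruby sketch (no ActiveRecord involved), assuming a hypothetical in-memory Task struct and a fixed "now" of 2014-01-01, chosen only so that three of the sample tasks count as overdue. It splits the tasks into overdue and upcoming, then sorts each part by its own key.

```ruby
# Hypothetical in-memory stand-in for the tasks table (not an ActiveRecord model).
Task = Struct.new(:title, :priority, :due_at)

TASKS = [
  Task.new("Mow the lawn", 1, Time.utc(2011, 9, 11, 22, 0, 0)),
  Task.new("Call mom", 3, Time.utc(2010, 1, 26, 9, 29, 3)),
  Task.new("Bake a cake", 2, Time.utc(2013, 9, 13, 8, 45, 37)),
  Task.new("Feed the cat", 2, Time.utc(2015, 9, 12, 16, 3, 51)),
  Task.new("Remember you don't like the cat", 2, Time.utc(2014, 3, 19, 23, 0, 0))
].freeze

# Overdue tasks come first, sorted by priority (due_at breaks ties);
# all remaining tasks follow, sorted by due_at (priority breaks ties).
def conditional_order(tasks, now)
  overdue, upcoming = tasks.partition { |task| task.due_at <= now }
  overdue.sort_by { |task| [task.priority, task.due_at] } +
    upcoming.sort_by { |task| [task.due_at, task.priority] }
end

# The value of "now" is an assumption picked for illustration.
ORDERED_TITLES = conditional_order(TASKS, Time.utc(2014, 1, 1)).map(&:title)
puts ORDERED_TITLES
```

With that assumed date, the titles come out in the order requested in the question.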

Related

Unexpected behaviour of rand() in MySQL

I encountered a very weird result while trying to filter my data using the RAND() function.
Suppose I have a table filled with some data:
CREATE TABLE `status_log` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`rank` int(11) DEFAULT 50,
PRIMARY KEY (`id`)
);
Then I run the following simple select:
select id,rank as rank,(rand()*100) as thres
from status_log
where rank = 50
and have a clear and expected output:
<...skip...>
| 6575476 | 50 | 34.51090244065123 |
| 6575511 | 50 | 67.84258230388404 |
| 6575589 | 50 | 35.68020727083106 |
| 6575644 | 50 | 74.87329251586766 |
| 6575723 | 50 | 67.32584384020961 |
| 6575771 | 50 | 12.009344726809621 |
| 6575863 | 50 | 58.06919518678374 |
+---------+------+-----------------------+
66169 rows in set (2.502 sec).
So I generate a random value from 0 to 100 for each row, around 66000 rows in total.
Then I want only a (random) part of the data to be shown. It doesn't have any purpose for production, by the way; it's just an artificial test, so let's not discuss that.
select *
from (
select id,rank as rank,(rand()*100) as thres
from status_log
where rank = 50) t
where thres>rank
order by thres;
After that I get the following:
<...skip...>
| 4396732 | 50 | 99.97966075314177 |
| 4001782 | 50 | 99.98002871869134 |
| 1788580 | 50 | 99.98064143581375 |
| 5300286 | 50 | 99.98275954274717 |
| 146401 | 50 | 99.98552389441573 |
| 4744748 | 50 | 99.98644758014609 |
+---------+------+--------------------+
16449 rows in set (2.188 sec)
With a threshold of 50, about half of the ~66000 rows (around 33000) should pass the filter, yet only about 16000 do. So it seems that the distribution of rand() is biased, correct?
Let's then change > to <:
select *
from (
select id,rank as rank,(rand()*100) as thres
from status_log
where rank = 50) t
where thres<rank
order by thres;
<...skip...>
| 4653786 | 50 | 49.98035016467827 |
| 6041489 | 50 | 49.980370281245904 |
| 5064204 | 50 | 49.989308742796354 |
| 1699741 | 50 | 49.991373205549436 |
| 3234039 | 50 | 49.99390454030959 |
| 806791 | 50 | 49.99575274996064 |
| 3713581 | 50 | 49.99814410693771 |
+---------+------+----------------------+
16562 rows in set (2.373 sec)
Again 16000! So not half but a quarter of all rows is shown!
It seems that the output of rand() inside the subquery is somehow influenced by the expression outside it. How is this possible?
I can also union it:
select * from (select id,rank as rank,(rand()*100) as thres from status_log where rank = 50) t where thres<50
UNION ALL
select * from (select id,rank as rank,(rand()*100) as thres from status_log where rank = 50) t where thres>=50;
The expected number of results has to be somewhere around 66000, but it returns only 33000 or so.
I observe this behavior only when rand() is non-deterministic and generated dynamically each time. If I do ...select id,rank as rank,(rand(id)*100)... (i.e. make the output of rand() depend on id), I start getting the expected number of results (33000-ish). The same happens if I precalculate the values and store them in a temporary column in the table.
I also tried making the filtering with rank=30, and the results were ~6000 and ~32000 for < and > respectively.
Version 10.5.8-MariaDB-3, InnoDB
Using a single query with HAVING instead of a subquery with WHERE in the main query seems to work around it.
select id,rank as rank,(rand()*100) as thres
from status_log
where rank = 50
having thres > rank
order by thres
This appears to be this bug:
RAND() evaluated and filtered twice with subquery
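The row counts in the question are consistent with that description. The following is a small plain-Ruby simulation, not MariaDB code, under the assumption that "evaluated and filtered twice" means each row must pass the threshold test with two independent random draws. The method name, the seed and the row count of 66000 are illustrative choices.

```ruby
# Fraction of rows that survive when a fresh uniform value in [0, 100) is drawn
# `evaluations` times per row and every draw must exceed the threshold.
def surviving_fraction(rows, threshold, evaluations, rng)
  kept = rows.times.count do
    # Each evaluation draws a new value, like a non-deterministic RAND().
    Array.new(evaluations) { rng.rand * 100 }.all? { |thres| thres > threshold }
  end
  kept.to_f / rows
end

RNG = Random.new(42) # fixed seed, only to make runs repeatable

SINGLE = surviving_fraction(66_000, 50, 1, RNG) # one evaluation per row
DOUBLE = surviving_fraction(66_000, 50, 2, RNG) # two independent evaluations per row

puts format("single evaluation: %.3f, double evaluation: %.3f", SINGLE, DOUBLE)
```

Under this assumption roughly half of the rows survive one evaluation and roughly a quarter survive two, which is in line with the ~16000 of ~66000 rows reported above. The same reasoning gives about 0.3 × 0.3 = 9% and 0.7 × 0.7 = 49% for a threshold of 30, close to the ~6000 and ~32000 mentioned in the question if the row count is similar.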

How to count total amount of pending tickets for each day this week in oracle-sql?

I want to count the total amount of pending tickets for each day in this week. I was only able to get it for one day at a time. I have this query right now:
SELECT (n.TOTAL - v.TODAY) + d.GISTER AS GISTER
FROM
(
-- Counts yesterday
SELECT
COUNT(ID) AS Gister
FROM FRESHDESK_API
-- 4 = resolved 5 = closed
-- Both count as closed
WHERE STATUS IN(4, 5)
AND TRUNC(UPDATED_AT) = TRUNC(SYSDATE - 1)
) d
CROSS JOIN
(
-- Total pending
SELECT
COUNT(ID) AS TOTAL
FROM FRESHDESK_API
-- 3 is pending
WHERE STATUS IN(3)
) n
CROSS JOIN
(
-- Pending tickets today
SELECT
COUNT(ID) AS TODAY
FROM FRESHDESK_API
-- 3 is pending
WHERE STATUS IN(3)
AND TRUNC(UPDATED_AT) = TRUNC(SYSDATE)
) v
I want to get a result like this:
+-----------+-----------------+
| day       | pending_tickets |
+-----------+-----------------+
| Monday    | 20              |
| Tuesday   | 22              |
| Wednesday | 25              |
| Thursday  | 24              |
| Friday    | 19              |
+-----------+-----------------+
The table is something like this (unused columns left out):
+----+------------+------------+--------+
| id | created_at | updated_at | status |
+----+------------+------------+--------+
|    |            |            |        |
+----+------------+------------+--------+
You can use left join and group by as follows:
Select to_char(tday.updated_at, 'day') as updated_at,
count(tday.id) - count(yday.id) as pending_tickets
From FRESHDESK_API tday
Left join FRESHDESK_API yday
On trunc(tday.UPDATED_AT) = trunc(yday.UPDATED_AT - 1)
And trunc(yday.UPDATED_AT + 1, 'iw') = trunc(sysdate, 'iw')
And yday.status in (4,5)
Where trunc(tday.UPDATED_AT, 'iw') = trunc(sysdate, 'iw')
And tday.status = 3
Group by to_char(tday.updated_at, 'day'), trunc(tday.updated_at)
Order by trunc(tday.updated_at);

Convert raw query into laravel eloquent

I have this written and working as a raw SQL query, but I am trying to convert it to a more Laravel eloquent / query builder design instead of just a raw query.
My table structure like this:
Table One (Name model)
______________
| id | name |
|------------|
| 1 | bob |
| 2 | jane |
--------------
Table Two (Date Model)
_________________________________
| id | table_1_id | date |
|-------------------------------|
| 1 | 1 | 2000-01-01 |
| 2 | 1 | 2000-01-31 |
| 4 | 1 | 2000-02-28 |
| 5 | 1 | 2000-03-03 |
| 6 | 2 | 2000-01-03 |
| 7 | 2 | 2000-01-05 |
---------------------------------
I am returning only the highest (most recent) date of each month from table 2 (Date model) for rows that match the user bob from table 1 (Name model).
For instance, in the example above, I return this from my query
2000-01-31
2000-02-28
2000-03-03
Here is what I am doing now (which works), but I'm just not sure how to use YEAR, MONTH and MAX with Laravel.
DB::select(
DB::raw("
SELECT MAX(date) as max_date
FROM table_2
INNER JOIN table_1 ON table_1.id = table_2.table_1_id
WHERE table_1.name = 'bob'
GROUP BY YEAR(date), MONTH(date)
ORDER BY max_date DESC
")
);
Try this:
DB::table('table_1')->join('table_2', 'table_1.id', '=', 'table_2.table_1_id')
->select(DB::raw('MAX(date) as max_date'), DB::raw('YEAR(date) year, MONTH(date) month'), 'table_1.name')
->where('table_1.name', 'bob')
->groupBy('year', 'month')
->orderBy('max_date', 'desc')
->get();
If you run into any problems with the code above, feel free to ask.
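To make the grouping logic itself concrete, independent of Laravel, here is a minimal plain-Ruby sketch of "latest date per year and month, newest first", applied to bob's dates from the question. The constant and method names are hypothetical.

```ruby
require "date"

# bob's dates from table 2 in the question.
BOB_DATES = %w[2000-01-01 2000-01-31 2000-02-28 2000-03-03].map { |text| Date.iso8601(text) }

# Mirrors GROUP BY YEAR(date), MONTH(date) with MAX(date),
# followed by ORDER BY max_date DESC.
def latest_per_month(dates)
  dates.group_by { |date| [date.year, date.month] }
       .map { |_year_month, group| group.max }
       .sort
       .reverse
end

MAX_DATES = latest_per_month(BOB_DATES).map(&:iso8601)
puts MAX_DATES
```

This yields one date per month, which is what the raw query's GROUP BY and MAX combination returns.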

Aggregating several columns in oracle sql

Having a difficult time phrasing this question. Let me know if there's a better title.
I have a query that produces data like this:
+----------+----------+----------+----------+----------+
| KEY | FEB_GRP1 | JAN_GRP1 | FEB_GRP2 | JAN_GRP2 |
+----------+----------+----------+----------+----------+
| 50840992 | 1 | 1 | 0 | 0 |
| 50840921 | 0 | 1 | 1 | 0 |
| 50848995 | 0 | 0 | 0 | 0 |
+----------+----------+----------+----------+----------+
Alternatively, I can produce data like this:
+----------+------+------+
| KEY | JAN | FEB |
+----------+------+------+
| 50840992 | <50 | ~<50 |
| 50840921 | <50 | <50 |
| 50848995 | ~<50 | ~<50 |
| 50840885 | <50 | <50 |
+----------+------+------+
Where <50 should be counted as "group 1" and ~<50 should be counted as "group 2".
And I want it to be like this:
+-------+------+------+
| MONTH | GRP1 | GRP2 |
+-------+------+------+
| JAN | 2 | 0 |
| FEB | 1 | 1 |
+-------+------+------+
I can already get JAN_GRP1_SUM just by summing JAN_GRP1, but I want that to just be a data point, not a column itself.
My query (generates the first diagram):
SELECT *
FROM (
SELECT KEY,
CASE WHEN "FEB-1-2016" = '<50' THEN 1 ELSE 0 END AS FEB_GRP1,
CASE WHEN "FEB-1-2016" != '<50' THEN 1 ELSE 0 END AS FEB_GRP2,
CASE WHEN "JAN-1-2016" = '<50' THEN 1 ELSE 0 END AS JAN_GRP1,
CASE WHEN "JAN-1-2016" != '<50' THEN 1 ELSE 0 END AS JAN_GRP2
FROM MY_TABLE);
Your data model doesn't make much sense, but from what you've shown you can do:
select 'JAN' as month,
count(case when "JAN-1-2016" = '<50' then 1 end) as grp1,
count(case when "JAN-1-2016" != '<50' then 1 end) as grp2
from my_table
union all
select 'FEB' as month,
count(case when "FEB-1-2016" = '<50' then 1 end) as grp1,
count(case when "FEB-1-2016" != '<50' then 1 end) as grp2
from my_table;
That doesn't scale well - if you have more months you need to add another union branch for each one.
If your query is based on a view or a previously calculated summary then it will probably be much easier to go back to the original data.
If you are stuck with this then another possible approach, which might be more manageable if you actually have more than two months to look at, could be to unpivot the data:
select *
from my_table
unpivot(value for month in ("JAN-1-2016" as date '2016-01-01',
"FEB-1-2016" as date '2016-02-01') --, etc. for other months
);
and then aggregate that:
select to_char(month, 'MON', 'NLS_DATE_LANGUAGE=ENGLISH') as month,
count(case when value = '<50' then 1 end) as grp1,
count(case when value != '<50' then 1 end) as grp2
from (
select *
from my_table
unpivot(value for month in ("JAN-1-2016" as date '2016-01-01',
"FEB-1-2016" as date '2016-02-01') --, etc. for other months
)
)
group by month;
Still not pretty, and I think Oracle is doing pretty much the same thing under the hood, but there are fewer case expressions to create and maintain; the drudge part is the unpivot pairs. You might need to include the year in the month field, depending on the range of data you have.
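The unpivot-then-aggregate idea can be illustrated outside the database as well. This is a plain-Ruby sketch, not Oracle code, using the four sample rows from the second table in the question; the hash keys and method names are made up for the example.

```ruby
# Sample rows taken from the second table in the question (KEY, JAN, FEB).
ROWS = [
  { key: 50840992, jan: "<50",  feb: "~<50" },
  { key: 50840921, jan: "<50",  feb: "<50" },
  { key: 50848995, jan: "~<50", feb: "~<50" },
  { key: 50840885, jan: "<50",  feb: "<50" }
].freeze

MONTH_COLUMNS = %i[jan feb].freeze

# Unpivot: one { key, month, value } record per month column of each row.
def unpivot(rows, month_columns)
  rows.flat_map do |row|
    month_columns.map { |month| { key: row[:key], month: month, value: row[month] } }
  end
end

# Aggregate: per month, count values equal to "<50" (grp1) and all others (grp2).
# The sample has no missing values; SQL would skip NULLs in both counts.
def count_groups(records)
  records.group_by { |record| record[:month] }.transform_values do |group|
    grp1 = group.count { |record| record[:value] == "<50" }
    { grp1: grp1, grp2: group.size - grp1 }
  end
end

SUMMARY = count_groups(unpivot(ROWS, MONTH_COLUMNS))
puts SUMMARY
```

Adding a month here means adding one entry to MONTH_COLUMNS, which corresponds to adding one pair to the unpivot list in the query.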

Hive Contiguous Date Ranges

I am using Hive and I would like to take a table with a historical list of customers, subscription events, and subscription types and summarize by contiguous runs of subscription types for each customer.
Example Input (db.cust_hist):
customer_id | eff_dt | exp_dt | sub_cd | sub_type
---------------------------------------------------------
1 | 02/01/2015 | 03/01/2015 | active | A
1 | 03/01/2015 | 04/01/2015 | active | A
1 | 03/15/2015 | 12/31/9999 | cancel | A
1 | 04/01/2015 | 05/01/2015 | active | A
1 | 05/01/2015 | 06/01/2015 | active | A
1 | 02/01/2015 | 03/01/2015 | active | B
1 | 03/01/2015 | 04/01/2015 | active | B
The sub_cd in this case refers to the type of event that is effective over the date range for that row. For example, the user canceled their A subscription type on 3/15 and resumed on 4/01.
The output I'm trying to get looks like this (db.cust_snapshot):
customer_id | eff_dt | exp_dt | sub_type
------------------------------------------------
1 | 02/01/2015 | 03/15/2015 | A
1 | 04/01/2015 | 06/01/2015 | A
1 | 02/01/2015 | 04/01/2015 | B
and reflects the gap in coverage.
From what I have read in this link from BetterAtOracle (specific to SQL) which does a very good job of laying things out, I need to use row numbers and a lagging window, but I can't seem to apply it to my situation in Hive (perhaps because of the 12/31/9999 notation/subscription code?)
I tried:
SELECT customer_id
, eff_dt
, exp_dt
, sub_cd
, sub_type
, CASE WHEN DATEDIFF(TO_DATE(eff_dt), TO_DATE(lag(exp_dt) OVER (PARTITION BY customer_id, sub_type ORDER BY eff_dt))) <= 1 THEN NULL
ELSE row_number() OVER (PARTITION BY customer_id, sub_type ORDER BY eff_dt)
END as grp
FROM db.cust_hist
ORDER BY TO_DATE(eff_dt)
As you can see, I haven't applied the subscription event code. This sort of gets me there as I can start to see different groups based on subscription type, but I feel like I'm stuck from here on out.
Any help or pointers would be greatly appreciated. Before this task, I never understood the true power of ranks, rows, lag, and other window functions!
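To pin down what the desired output implies, here is a plain-Ruby sketch of the merging logic, not a Hive query. It rests on two assumptions that the question does not confirm: every non-"active" row is a cancel event, and a cancel event ends any active row whose date range contains the cancel's eff_dt. All names (Event, parse_date, snapshot and so on) are hypothetical.

```ruby
require "date"

Event = Struct.new(:customer_id, :eff_dt, :exp_dt, :sub_cd, :sub_type)

def parse_date(text)
  Date.strptime(text, "%m/%d/%Y")
end

# Example input from the question (db.cust_hist).
HISTORY = [
  Event.new(1, parse_date("02/01/2015"), parse_date("03/01/2015"), "active", "A"),
  Event.new(1, parse_date("03/01/2015"), parse_date("04/01/2015"), "active", "A"),
  Event.new(1, parse_date("03/15/2015"), parse_date("12/31/9999"), "cancel", "A"),
  Event.new(1, parse_date("04/01/2015"), parse_date("05/01/2015"), "active", "A"),
  Event.new(1, parse_date("05/01/2015"), parse_date("06/01/2015"), "active", "A"),
  Event.new(1, parse_date("02/01/2015"), parse_date("03/01/2015"), "active", "B"),
  Event.new(1, parse_date("03/01/2015"), parse_date("04/01/2015"), "active", "B")
].freeze

# Assumption: a cancel whose eff_dt falls inside an active row ends that row early.
def clip_to_cancels(active, cancels)
  cut = cancels.map(&:eff_dt)
               .select { |cancel_dt| active.eff_dt <= cancel_dt && cancel_dt < active.exp_dt }
               .min
  [active.eff_dt, cut || active.exp_dt]
end

# Merge ranges whose start is on or before the end of the previous range.
def merge_contiguous(ranges)
  ranges.sort.each_with_object([]) do |(start, finish), merged|
    if merged.any? && start <= merged.last[1]
      merged.last[1] = [merged.last[1], finish].max
    else
      merged << [start, finish]
    end
  end
end

# One output row per contiguous run, per customer and subscription type.
def snapshot(history)
  history.group_by { |event| [event.customer_id, event.sub_type] }
         .flat_map do |(customer_id, sub_type), events|
    actives, cancels = events.partition { |event| event.sub_cd == "active" }
    ranges = actives.map { |active| clip_to_cancels(active, cancels) }
    merge_contiguous(ranges).map { |start, finish| [customer_id, start, finish, sub_type] }
  end
end

SNAPSHOT = snapshot(HISTORY).map do |customer_id, start, finish, sub_type|
  [customer_id, start.strftime("%m/%d/%Y"), finish.strftime("%m/%d/%Y"), sub_type]
end
SNAPSHOT.each { |row| puts row.join(" | ") }
```

Under those assumptions the sketch reproduces the three rows of db.cust_snapshot shown above. In Hive, the "start on or before the previous end" test is the part that lag() over the partition would express.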
