Single table version controlled data - Ruby

So I have a table designed as such:
create_table "entries", :force => true do |t|
t.integer "id", :null => false, :autoincrement => true
t.text "text"
t.string "uuid"
t.datetime "created_at", :null => false
end
I have a similar problem to "Database - data versioning in single table", except that I have multiple sources (some of which work offline) writing into this database, so the approach accepted there doesn't really work for me.
What I'm wondering is: what is the best way to get the most recent entry for each UUID?
In SQLite, SELECT "entries".* FROM "entries" GROUP BY uuid ORDER BY created_at DESC works great, but it's not valid syntax in Postgres, and it feels janky anyway. Is there a good way to do this, or do I need to redo my schema?

One way to get the results you're after is to use a window function and a derived table. ActiveRecord doesn't understand either of those but there's always find_by_sql:
Entry.find_by_sql(%q{
  select id, text, uuid, created_at
  from (
    select id, text, uuid, created_at,
           row_number() over (partition by uuid order by created_at desc) as r
    from entries
  ) as dt
  where r = 1
})
The interesting part is this:
row_number() over (partition by uuid order by created_at desc)
partition by uuid is similar to group by uuid, but it doesn't collapse the rows; it just changes how the row_number() window function behaves, so that row_number() is computed separately for each set of rows with matching uuids. The order by created_at desc likewise applies only within the window that row_number() looks at. The result is that r = 1 peels off the newest row in each group.
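If you only ever need this on Postgres, DISTINCT ON is a common shorthand for the same result; a minimal sketch against the entries table above:
-- Postgres-specific: DISTINCT ON (uuid) keeps the first row per uuid
-- according to the ORDER BY, i.e. the newest entry here.
SELECT DISTINCT ON (uuid) id, text, uuid, created_at
FROM entries
ORDER BY uuid, created_at DESC;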

Related

Oracle select rows from a query which do not exist in another query

Let me explain the question.
I have two tables, which have 3 columns with the same data types. The 3 columns form a key/ID if you like, but the names of the columns differ between the tables.
Now I am creating queries with these 3 columns for both tables. I've managed to get these results independently.
For example:
SELECT ID, FirstColumn, sum(SecondColumn)
FROM (SELECT ABC||DEF||GHI AS ID, FirstTable.*
      FROM FirstTable
      WHERE ThirdColumn = *1st condition*)
GROUP BY ID, FirstColumn
;
SELECT ID, SomeColumn, sum(AnotherColumn)
FROM (SELECT JKM||OPQ||RST AS ID, SecondTable.*
      FROM SecondTable
      WHERE AlsoSomeColumn = *2nd condition*)
GROUP BY ID, SomeColumn
;
So I make very similar queries for two different tables. I know the results share a certain number of rows on the ID attribute, the one I've just created in the queries. I need to check which rows in one result are not in the other query's result, and vice versa.
Do I have to make temporary tables or views from the queries? Maybe join the two tables in a specific way and only run one query on them?
As a beginner I don't have any experience with using results as input for the next query. I'm interested in the cleanest, most elegant way to do this.
No, you most probably don't need any "temporary" tables; a WITH factoring clause (subquery factoring) would help.
Here's an example:
with
  first_query as
    (select id, first_column, ...
     from (select ABC||DEF||GHI as id, ...)
    ),
  second_query as
    (select id, some_column, ...
     from (select JKM||OPQ||RST as id, ...)
    )
select id from first_query
minus
select id from second_query;
For the other direction you'd just swap the queries, e.g.
with ... <the same as above>
select id from second_query
minus
select id from first_query
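If you want both differences in a single result set, one option is to tag each MINUS branch and glue them together with UNION ALL; a sketch reusing the same factored subqueries:
with ... <the same as above>
select 'only_in_first' as src, id
from (select id from first_query
      minus
      select id from second_query)
union all
select 'only_in_second' as src, id
from (select id from second_query
      minus
      select id from first_query);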

Rails 4: column reference "updated_at" is ambiguous with Postgres

I am trying to query the database with a DateTime range for the 'updated_at' field. The front end sends queries in a JSON array:
["2015-09-01 00:00:00","2015-10-02 23:00:00"]
At the Rails controller, I parse the two strings to DateTime using:
start_date = DateTime.parse(params[:date_range_arr][0])
end_date = DateTime.parse(params[:date_range_arr][1])
#...
@events = @events.where('updated_at BETWEEN ? AND ?',
                        start_date, end_date)
The queries show:
WHERE (updated_at BETWEEN '2015-09-01 00:00:00.000000' AND '2015-10-02 23:00:00.000000')
And the errors are:
ActionView::Template::Error (PG::AmbiguousColumn: ERROR: column reference "updated_at" is ambiguous
LINE 1: ...texts"."id" = "events"."context_id" WHERE (updated_at...
Do you by any chance have a default scope with a join or an include on the Event model, or anywhere in code beyond what's included in the original question?
Either way, you simply need to be more specific in your query, as follows:
#...
@events = @events.where('events.updated_at BETWEEN ? AND ?',
                        start_date, end_date)
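With the table name prefixed, the generated SQL references the column unambiguously even with the contexts join in place, roughly:
... "contexts"."id" = "events"."context_id" WHERE (events.updated_at BETWEEN '2015-09-01 00:00:00.000000' AND '2015-10-02 23:00:00.000000')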

How to check if a set of values exists in an item table in Oracle

I have two tables - 'Order' and 'Order Item'.
The Order table contains:
Order Number, Order Date, etc.
The Order Item table contains:
Order Number, Order Item Number, Product Name, etc.
The joining condition between these two tables is on Order Number.
In my target table I need the orders and a flag. The flag should indicate whether a predefined set of products was ordered as part of that order; if so, it should be set to 'Yes'.
E.g., suppose order 'ORD-01' contains three products in the Order Item table - 'Mobile', 'PC' and 'Tablet'; then my resulting table should contain Order Number ORD-01 and Flag 'Yes'.
In the same way, if order 'ORD-02' contains only two products, 'Mobile' and 'Tablet', then the resulting table should contain 'ORD-02' and Flag 'No'.
Similarly, if order 'ORD-03' contains three different products, 'Notebook', 'PC' and 'Tablet', then the resulting table should contain 'ORD-03' and Flag 'No'.
With that understanding, I have written the query below -
SELECT order_number,
       (SELECT CASE WHEN COUNT(DISTINCT product_name) >= 3
                    THEN 'Yes' ELSE 'No' END Prod_Flag
        FROM order_item b
        WHERE a.order_number = b.order_number
        AND b.product_name IN ('Mobile','PC','Tablet'))
FROM order a
WHERE order_date > last_run_date;
But it takes too much time, as order_item is a very big table (>1 billion rows), and I only need incremental data based on the order date from the Order table. Even though there is an index on order number in both tables, it takes time.
Would a query like this get you to your result any quicker?
SELECT ord.ORDER_NUMBER,
       CASE WHEN SET_FOUND.ORDER_NUMBER IS NOT NULL
            THEN 'Yes' ELSE 'No' END PROD_FLAG
FROM ORDER ord,
     (SELECT ORDER_NUMBER
      FROM ORDER_ITEM
      WHERE PRODUCT_NAME = 'Mobile'
      INTERSECT
      SELECT ORDER_NUMBER
      FROM ORDER_ITEM
      WHERE PRODUCT_NAME = 'PC'
      INTERSECT
      SELECT ORDER_NUMBER
      FROM ORDER_ITEM
      WHERE PRODUCT_NAME = 'Tablet') SET_FOUND
WHERE ord.ORDER_NUMBER = SET_FOUND.ORDER_NUMBER (+)
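The (+) marker is Oracle's legacy outer-join notation; in ANSI syntax the join in the same query would read:
FROM ORDER ord
LEFT JOIN ( ...the INTERSECT subquery above... ) SET_FOUND
  ON ord.ORDER_NUMBER = SET_FOUND.ORDER_NUMBER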
My proposal would be this one:
WITH t AS
  (SELECT product_name, order_number
   FROM order_item
   WHERE product_name IN ('Mobile','PC','Tablet')
   GROUP BY order_number, product_name)
SELECT order_number,
       CASE WHEN COUNT(DISTINCT product_name) >= 3 THEN 'Yes' ELSE 'No' END
FROM t
JOIN order USING (order_number)
GROUP BY order_number
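One caveat: the inner join drops orders containing none of the three products, so those orders get no row at all instead of a 'No'. A hedged variation that keeps them, reusing the factored subquery t above and adding the incremental date filter from the question:
SELECT o.order_number,
       CASE WHEN COUNT(DISTINCT t.product_name) >= 3 THEN 'Yes' ELSE 'No' END prod_flag
FROM order o
LEFT JOIN t ON o.order_number = t.order_number
WHERE o.order_date > last_run_date
GROUP BY o.order_number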
Is the order number an increasing sequence number? If so, one approach would be to limit the data selected from order_item, which you said is a large table, by putting a condition on order_number, which you said is an indexed column. I assume last_run_date significantly limits the number of orders concerned.
If so you can:
select min(order_number) into order_num_from from Order where order_date>last_run_date
and then make your query
SELECT order_number,
       (SELECT CASE WHEN COUNT(DISTINCT product_name) >= 3
                    THEN 'Yes' ELSE 'No' END Prod_Flag
        FROM order_item b
        WHERE a.order_number = b.order_number
        AND b.order_number > order_num_from
        AND b.product_name IN ('Mobile','PC','Tablet'))
FROM order a
WHERE order_date > last_run_date;
If this runs significantly faster (I haven't seen the explain plan, so this is just an idea for avoiding a full table scan), put an index on the order_date column, and possibly turn the lookup of order_num_from into a subquery so you end up with one single query.
Generally, your query is right. As I understand it, you want to speed it up; if so, there are several things you can try.
You could consider putting these tables into an indexed cluster. It stores the data physically joined, so querying requires fewer physical reads.
For this query, the server has to scan two tables: one for the matching dates (either a full table scan or an index scan), the other for the products, joining the results by reading ORDER_NUMBER via rowid. That isn't very fast in any case. The simplest improvement is to add an (ORDER_DATE, ORDER_NUMBER) index on ORDERs and an (ORDER_NUMBER, PRODUCT_NAME) index on ORDER_ITEMs; that allows index-only access.
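For example (the index names are illustrative):
-- index names are illustrative
create index order_date_num_ix on order (order_date, order_number);
create index item_num_product_ix on order_item (order_number, product_name);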
Maybe it would be suitable to make a fast-refreshable materialized view, something like
create materialized view order_product_counts as  -- view name is illustrative
select
  a.order_date,
  a.order_number,
  sum(case when b.product_name = 'Mobile' then 1 else 0 end) cnt_mobiles,
  sum(case when b.product_name = 'PC' then 1 else 0 end) cnt_pcs,
  sum(case when b.product_name = 'Tablet' then 1 else 0 end) cnt_tablets
from
  order a, order_item b
where
  a.order_number = b.order_number
group by
  a.order_number, a.order_date
If it turns out to be impossible to make this fast-refreshable, you can maintain the equivalent manually with a trigger. Either way, you end up with precalculated data that's ready to check.
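The flag check then becomes a simple scan of the precalculated counts; a sketch, assuming the illustrative view name above:
select order_number,
       case when cnt_mobiles > 0 and cnt_pcs > 0 and cnt_tablets > 0
            then 'Yes' else 'No' end prod_flag
from order_product_counts
where order_date > last_run_date;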

ORACLE PL/SQL: ORDER BY CASE Performance

I currently have a server-side paging query like this:
SELECT * FROM
  (SELECT a.*, rownum rnum FROM
     ( /* Insert Query Here */ ) a
   WHERE rownum <= ((page_number + 1) * page_size))
WHERE rnum >= ((page_number + 1) * page_size) - (page_size - 1);
The problem, however, is trying to determine what the user is sorting on as this is tied to a gridview. Currently, I'm using:
ORDER BY
  CASE sort_direction
    WHEN 'ASC' THEN
      CASE sort_column
        WHEN 'PRIMARY_KEY' THEN primary_key
        ELSE key
      END
  END ASC,
  CASE sort_direction
    WHEN 'DESC' THEN
      CASE sort_column
        WHEN 'PRIMARY_KEY' THEN primary_key
        ELSE key
      END
  END DESC
I'm using this on every query I stick into the server-side paging scheme. The problem is that when a grid has quite a few fields, performance degrades substantially. Is there any other way to do this? Do I simply need to allow sorting on fewer fields?
Use Dynamic SQL and build the ORDER BY at runtime. Oracle's SQL engine will see a simple ORDER BY.
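A minimal PL/SQL sketch of that idea (my_table and the variable values are hypothetical; the CASE whitelists keep user input out of the SQL text):
DECLARE
  sort_column    VARCHAR2(30) := 'PRIMARY_KEY';  -- hypothetical inputs
  sort_direction VARCHAR2(4)  := 'DESC';
  l_sql          VARCHAR2(4000);
  l_cursor       SYS_REFCURSOR;
BEGIN
  -- Build a plain ORDER BY; only whitelisted column/direction strings
  -- are ever concatenated into the statement.
  l_sql := 'SELECT * FROM my_table ORDER BY '
        || CASE sort_column WHEN 'PRIMARY_KEY' THEN 'primary_key' ELSE 'key' END
        || CASE sort_direction WHEN 'DESC' THEN ' DESC' ELSE ' ASC' END;
  OPEN l_cursor FOR l_sql;
END;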
I suspect the optimizer can't figure this out, and thus can't use indices. Check what EXPLAIN PLAN says.
The obvious solution is to have your app "evaluate" that CASE and send a much simpler query. I think you'll find ORDER BY primary_key ASC to be much faster.
If primary_key and key are numbers, then sorting descending is the same as sorting ascending on the value multiplied by -1.
order by direction * CASE sort_column
                       WHEN 'PRIMARY_KEY' THEN primary_key
                       ELSE key
                     END
In the above, direction is an integer. Pass in 1 or -1. This is a much simpler expression and should run faster.

Can't define :joins conditions in has_many relationship?

I have a relationship table:
create_table "animal_friends", :force => true do |t|
t.integer "animal_id"
t.integer "animal_friend_id"
t.datetime "created_at"
t.datetime "updated_at"
t.integer "status_id", :default => 1
end
linking animals to each other. The best way to retrieve the associations in SQL is:
SELECT animals.*
from animals join animal_friends as af
  on animals.id =
     case when af.animal_id = #{id} then af.animal_friend_id else af.animal_id end
WHERE #{id} in (af.animal_id, af.animal_friend_id)
And I can't find a way to create a proper has_many relation in Rails for this. Apparently, there's no way to provide join conditions for has_many.
I'm currently using a finder_sql:
has_many :friends, :class_name => "Animal",
  :finder_sql => 'SELECT animals.* from animals join animal_friends as af ' +
    'on animals.id = case when af.animal_id = #{id} then af.animal_friend_id else af.animal_id end ' +
    'WHERE #{id} in (af.animal_id, af.animal_friend_id) and status_id = #{Status::CONFIRMED.id}'
but this method has the big disadvantage of breaking the ActiveRecord magic. For instance:
@animal.friends.first
will execute the finder_sql without a limit, fetching thousands of rows, then take the first element of the array (losing several precious seconds per request).
I guess it's a missing feature from AR, but I'd like to be sure first :)
Thanks
You could solve this on the database level with a view, which would be the correct method anyway.
CREATE VIEW with_inverse_animal_friends AS
  SELECT id,
         animal_id,
         animal_friend_id,
         created_at,
         updated_at,
         status_id
  FROM animal_friends
  UNION
  SELECT id,
         animal_friend_id AS animal_id,
         animal_id AS animal_friend_id,
         created_at,
         updated_at,
         status_id
  FROM animal_friends
If you don't want double entries for friends whose relation exists in both directions, you could do this:
CREATE VIEW unique_animal_friends AS
  SELECT MIN(id) AS id,
         animal_id,
         animal_friend_id,
         MIN(created_at) AS created_at,
         MAX(updated_at) AS updated_at,
         MIN(status_id) AS status_id
  FROM (SELECT id,
               animal_id,
               animal_friend_id,
               created_at,
               updated_at,
               status_id
        FROM animal_friends
        UNION
        SELECT id,
               animal_friend_id AS animal_id,
               animal_id AS animal_friend_id,
               created_at,
               updated_at,
               status_id
        FROM animal_friends) AS all_animal_friends
  GROUP BY animal_id, animal_friend_id
You would need a way to decide which status_id to use in case there are two conflicting ones. I chose MIN(status_id) but that is probably not what you want.
In Rails you can do this now:
class Animal < ActiveRecord::Base
  has_many :unique_animal_friends
  has_many :friends, :through => :unique_animal_friends
end

class UniqueAnimalFriend < ActiveRecord::Base
  belongs_to :animal
  belongs_to :friend, :class_name => "Animal", :foreign_key => "animal_friend_id"
end
This is off the top of my head and not tested. Also, you might need a plugin for view handling in Rails (like "redhillonrails-core").
There is a plugin that does what you want.
There is a post about it here.
There is an alternative here.
Both allow you to define join conditions and just use lazy initialization.
So you can use dynamic conditions. I find the former prettier, but you can use the latter if you don't want to install plugins.
The two ways to create many-to-many relationships in ActiveRecord are has_and_belongs_to_many and has_many :through. This site compares the two approaches. You don't have to write any SQL with these methods.
