JPA @Query count AND select - Spring

I have a somewhat complicated @Query in a JpaRepository.
I need to get the results of this query in two forms (but not at the same time!):
First, the client asks for a count of the number of results: SELECT COUNT(x.*) FROM my_table x ...
Then later (maybe), they want to see the actual data: SELECT x.* FROM my_table x ...
What follows (the ...) is identical for both queries. Is there any way to combine these so that I don't repeat myself?
I know I could just use the second method, and count the number of elements in the resulting List. However, this adds the overhead of actually fetching all those elements from the database.
I could put the ... in a String constant somewhere, but that separates it from its context (I'd lose IntelliJ's syntax highlighting/error checking).
I can't convert it to a Criteria or Example query, because I need to use PostGIS's geography type. (And these are less readable anyway...)
Any other ideas?

If your worry is that some developer might change the COUNT query and forget to change the SELECT query too, you can write a repository integration test that checks the two queries return consistent results.
Another alternative is to write a unit test that reads the annotation contents and verifies that the parts of the two queries after the select list are identical.
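A minimal sketch of both kinds of test, using Python's built-in sqlite3 module as a stand-in for the Spring/PostGIS setup (in the real project these would be JUnit tests, with the @Query strings read via reflection). The table, columns and query text are hypothetical, and `query_tail` naively assumes ASCII SQL with no " FROM " inside the select list.

```python
import sqlite3

# Hypothetical stand-ins for the two repository queries: only the select list
# differs, everything from FROM onwards is meant to be identical.
COUNT_QUERY = "SELECT COUNT(*) FROM my_table x WHERE x.kind = ? AND x.size > ?"
SELECT_QUERY = "SELECT x.* FROM my_table x WHERE x.kind = ? AND x.size > ?"


def query_tail(sql):
    """Return the text from the first ' FROM ' onwards (naive: assumes ASCII SQL
    and no ' FROM ' inside the select list itself)."""
    return sql[sql.upper().index(" FROM "):]


def tails_match(count_sql, select_sql):
    # Unit-test style check: the shared part of both queries must be identical.
    return query_tail(count_sql) == query_tail(select_sql)


def counts_agree(conn, params):
    # Integration-test style check: COUNT(*) must equal the number of selected rows.
    (count,) = conn.execute(COUNT_QUERY, params).fetchone()
    rows = conn.execute(SELECT_QUERY, params).fetchall()
    return count == len(rows)


def make_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, kind TEXT, size INTEGER)")
    conn.executemany(
        "INSERT INTO my_table (kind, size) VALUES (?, ?)",
        [("a", 1), ("a", 5), ("b", 7), ("a", 9)],
    )
    return conn


if __name__ == "__main__":
    print(tails_match(COUNT_QUERY, SELECT_QUERY))  # True
    print(counts_agree(make_demo_db(), ("a", 2)))  # True
```

The integration check deliberately fetches every row; that overhead is fine inside a test, which is exactly where you want it instead of in production code.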


Transform select statement

How can i dynamically transform an SQL-Query?
I know there is a Select.getSelect(), but how can I add fields to the select query?
Use case: for a REST query I have a lot of paginated resources and an abstraction that creates the paginated query. It takes the SelectConditionStep and adds the rest, depending on additional parameters. It works really well for simple queries, but queries containing joins would require a little transformation of the query (mainly because I can't naively limit the number of results, since the join can be a one-to-many relationship).
The easiest way is to keep a List<Field<?>> where you add the fields for your select() clause, and then create the Select statement only when you actually execute it, instead of passing a Select object around. Example:
List<Field<?>> fields = new ArrayList<>();
// Just some examples:
fields.addAll(getDefaultFields());
fields.addAll(getFieldsFromUI());
fields.addAll(getCalculatedFields());
// Much later on, you finally create the statement:
DSL.using(configuration)
.select(fields)
.from(...)
.fetch();
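The same "collect the fields first, build the statement last" idea, sketched in Python with sqlite3 rather than jOOQ so it can run on its own. The helper functions and the `book` table are hypothetical; unlike jOOQ's typed Field objects, these are plain strings, so they must come from trusted code and never from raw user input.

```python
import sqlite3


def default_fields():
    # Hypothetical: columns every paginated resource returns.
    return ["id", "name"]


def calculated_fields():
    # Hypothetical: a derived column added by a later stage of the pipeline.
    return ["length(name) AS name_len"]


def build_select(fields, table, limit):
    # The statement is assembled only here, right before execution, so every
    # earlier stage just appends to the plain `fields` list.
    return f"SELECT {', '.join(fields)} FROM {table} ORDER BY id LIMIT {int(limit)}"


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO book (name) VALUES (?)", [("Dune",), ("Emma",), ("It",)])
    fields = []
    fields.extend(default_fields())
    fields.extend(calculated_fields())
    # Much later on, the statement is finally created and run:
    return conn.execute(build_select(fields, "book", 2)).fetchall()


if __name__ == "__main__":
    print(demo())  # [(1, 'Dune', 4), (2, 'Emma', 4)]
```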

Oracle In Clause not working when using Parameter

I have a pesky SSRS report problem: the main query of my report has a condition that can have more than 1000 choices, and when the user selects all of them the query fails, because my backend database is Oracle. I have done some research and found a solution that should work.
The solution is to rewrite the IN clause like this:
(1,ColumnName) in ((1,Searchitem1),(1,SearchItem2))
This works. However, when I do this:
(1,ColumnName) in ((1,:assignedValue))
and pass just one value, it works. But when I pass more than one value, it fails with ORA-01722: invalid number.
I have tried multiple variations of the same IN clause, but nothing works.
Any help is appreciated.
Wild guess: your :assignedValue is a comma-separated list of numbers, and Oracle tries to parse it as a single number.
Passing multiple values as a single value for an IN query is (almost) never a good idea - either you have to use string concatenation (prone to SQL injection and terrible performance), or you have to have a fixed number of arguments to IN (which generally is not what you want).
I'd suggest you
INSERT your search items into a temporary table
use a JOIN with this search table in your SELECT
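A small sketch of both the failure mode and the suggested fix, using Python's sqlite3 as a stand-in for Oracle (in Oracle the search table would typically be a global temporary table). The `report` and `search_items` tables are hypothetical. Note that SQLite silently matches nothing where Oracle raises ORA-01722, but the cause is the same: "1,3" is bound as one string value, not two numbers.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE report (item_no INTEGER, label TEXT)")
    conn.executemany(
        "INSERT INTO report (item_no, label) VALUES (?, ?)",
        [(1, "a"), (2, "b"), (3, "c"), (4, "d")],
    )
    return conn


def single_bind(conn, csv_ids):
    # Anti-pattern: the whole comma-separated string is bound as ONE value,
    # so "1,3" is compared against each item_no as a single non-numeric string.
    return conn.execute(
        "SELECT item_no FROM report WHERE item_no IN (?)", (csv_ids,)
    ).fetchall()


def via_temp_table(conn, ids):
    # 1) INSERT the search items into a temporary table ...
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS search_items (item INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM search_items")
    conn.executemany("INSERT OR IGNORE INTO search_items (item) VALUES (?)", [(i,) for i in ids])
    # 2) ... and JOIN against it; no IN-list length limit is involved.
    return conn.execute(
        "SELECT r.item_no FROM report r JOIN search_items s ON s.item = r.item_no "
        "ORDER BY r.item_no"
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    print(single_bind(conn, "1"))        # [(1,)]
    print(single_bind(conn, "1,3"))      # []
    print(via_temp_table(conn, [1, 3]))  # [(1,), (3,)]
```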

Should I apply string manipulation after or before joining tables in Oracle

I have two tables that I need to inner join; one has relatively few records compared to the other. I need to apply some string manipulation to the smaller table, and my question is: can I apply the string functions after the join, or should I apply them in a subquery and then join the subquery to the bigger table?
An example would be something like this:
Option 1:
SELECT SUBSTR("SMALL_TABLE"."COL_NAME",x,y) "NEW_COL" FROM "BIG_TABLE"
JOIN "SMALL_TABLE" ON ...
Option 2:
SELECT "NEW_COL"
FROM "BIG_TABLE"
JOIN
(
SELECT SUBSTR("SMALL_TABLE"."COL_NAME",x,y) "NEW_COL" FROM "SMALL_TABLE"
) "T"
ON ...
Which is better for performance option 1 or 2?
I am using oracle 11g.
Regardless of how you structure the query, Oracle's optimizer is free to evaluate the function before or after the join. Assuming that the string manipulation is only done as part of the projection step (i.e. it is done only in the SELECT clause and is not used as a predicate in the WHERE clause), I would expect that Oracle would apply the SUBSTR before joining the tables if you used either formulation because it would then have to apply the function to fewer rows (though it can probably treat the SUBSTR as a deterministic call and cache the results if it applies the function after the join).
As with any query optimization question, the first step is always to generate a query plan and see if the different queries actually produce different plans. I would expect the plans to be identical and, thus, the performance to be identical. But there are any number of reasons that one of the two options might produce different plans on your system given your optimizer statistics, initialization parameters, etc.
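In Oracle the plan is produced with EXPLAIN PLAN FOR followed by SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY). As a self-contained illustration of the same "compare the plans" step, here is a sketch against SQLite via Python's sqlite3; the schema, the join condition (standing in for the elided ON ...) and SUBSTR(col, 1, 3) (standing in for SUBSTR(col, x, y)) are all hypothetical.

```python
import sqlite3

# Hypothetical schema standing in for BIG_TABLE / SMALL_TABLE.
SCHEMA = """
CREATE TABLE small_table (id INTEGER PRIMARY KEY, col_name TEXT);
CREATE TABLE big_table (id INTEGER PRIMARY KEY, small_id INTEGER REFERENCES small_table(id));
"""

OPTION_1 = """
SELECT SUBSTR(s.col_name, 1, 3) AS new_col
FROM big_table b JOIN small_table s ON s.id = b.small_id
ORDER BY b.id
"""

OPTION_2 = """
SELECT t.new_col
FROM big_table b
JOIN (SELECT id, SUBSTR(col_name, 1, 3) AS new_col FROM small_table) t
  ON t.id = b.small_id
ORDER BY b.id
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO small_table VALUES (?, ?)", [(1, "alpha"), (2, "beta")])
    conn.executemany("INSERT INTO big_table (small_id) VALUES (?)", [(1,), (2,), (1,), (2,)])
    return conn


def plan(conn, sql):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


if __name__ == "__main__":
    conn = make_db()
    print(plan(conn, OPTION_1))
    print(plan(conn, OPTION_2))
    # Both formulations return the same rows; only the plans matter for speed.
    print(conn.execute(OPTION_1).fetchall() == conn.execute(OPTION_2).fetchall())  # True
```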
It is better to apply the operations before doing the join, and then join and query for the final result. This is part of what is called query optimization.
For your question, doing so means you perform fewer operations when joining, because you eliminate the useless rows beforehand.
Lots of examples here: http://beginner-sql-tutorial.com/sql-query-tuning.htm
and this is the best one I could find: http://www.cse.iitb.ac.in/~sudarsha/db-book/slide-dir/ch14.ppt

Sqlite view vs plain select statement performance

I have a simple table (with about 8 columns and a LOT of rows) in a SQLite database. There is a single program that runs as a service and performs selects, updates and inserts on the table quite often (approximately every 5 minutes). The selects are used only to determine which rows are to be updated, and they are based on a column that holds boolean values (probably translated to integer internally by SQLite).
There is also a web application that performs selects (always with a GROUP BY clause) whenever a web user wishes to view part of the data.
There are two ways to ask for data through the web application: (a) predefined filters (i.e. the WHERE clause has specific conditions on 3 specific columns) and (b) custom filters (i.e. the user chooses the values for the conditions, but the columns participating in the WHERE clause are the same as in (a)). As mentioned, in both cases there is a GROUP BY operation.
I am wondering whether using a view or a custom function might increase the performance. Currently, a "custom" select may take more than 30 seconds to complete - and that's before any data has been sent back to the user.
EDIT:
Using EXPLAIN QUERY PLAN on a "predefined" select statement yields only one row:
0|0|TABLE mytable
Using EXPLAIN on the same query, yields the following:
0|OpenVirtual|1|4|keyinfo(2,-BINARY,BINARY)
1|OpenVirtual|2|3|keyinfo(1,BINARY)
2|MemInt|0|5|
3|MemInt|0|4|
4|Goto|0|27|
5|MemInt|1|5|
6|Return|0|0|
7|IfMemPos|4|9|
8|Return|0|0|
9|AggFinal|0|0|count(0)
10|AggFinal|2|1|sum(1)
11|MemLoad|0|0|
12|MemLoad|1|0|
13|MemLoad|2|0|
14|MakeRecord|3|0|
15|MemLoad|0|0|
16|MemLoad|1|0|
17|Sequence|1|0|
18|Pull|3|0|
19|MakeRecord|4|0|
20|IdxInsert|1|0|
21|Return|0|0|
22|MemNull|1|0|
23|MemNull|3|0|
24|MemNull|0|0|
25|MemNull|2|0|
26|Return|0|0|
27|Gosub|0|22|
28|Goto|0|82|
29|Integer|0|0|
30|OpenRead|0|2|
31|SetNumColumns|0|9|
32|Rewind|0|48|
33|Column|0|8|
34|String8|0|0|123456789
35|Le|356|39|collseq(BINARY)
36|Column|0|3|
37|Integer|180|0|
38|Gt|100|42|collseq(BINARY)
39|Column|0|7|
40|Integer|1|0|
41|Ne|356|47|collseq(BINARY)
42|Column|0|6|
43|Sequence|2|0|
44|Column|0|3|
45|MakeRecord|3|0|
46|IdxInsert|2|0|
47|Next|0|33|
48|Close|0|0|
49|Sort|2|69|
50|Column|2|0|
51|MemStore|7|0|
52|MemLoad|6|0|
53|Eq|512|58|collseq(BINARY)
54|MemMove|6|7|
55|Gosub|0|7|
56|IfMemPos|5|69|
57|Gosub|0|22|
58|AggStep|0|0|count(0)
59|Column|2|2|
60|Integer|30|0|
61|Add|0|0|
62|ToReal|0|0|
63|AggStep|2|1|sum(1)
64|Column|2|0|
65|MemStore|1|1|
66|MemInt|1|4|
67|Next|2|50|
68|Gosub|0|7|
69|OpenPseudo|3|0|
70|SetNumColumns|3|3|
71|Sort|1|80|
72|Integer|1|0|
73|Column|1|3|
74|Insert|3|0|
75|Column|3|0|
76|Column|3|1|
77|Column|3|2|
78|Callback|3|0|
79|Next|1|72|
80|Close|3|0|
81|Halt|0|0|
82|Transaction|0|0|
83|VerifyCookie|0|1|
84|Goto|0|29|
85|Noop|0|0|
The SELECT statement I used was the following:
SELECT
COUNT(*) as number,
field1,
SUM(CAST(filter2 +30 AS float)) as column2
FROM
mytable
WHERE
(filter1 > '123456789' AND filter2 > 180)
OR filter3=1
GROUP BY
field1
ORDER BY
number DESC, field1;
Whenever you compare on a non-primary-key field, it's a good idea to add an index on the field(s). Too many indexes, however, can cause INSERTs to crawl, so plan accordingly.
Also, if you have simple fields, such as ones that only hold a boolean value, you may want to consider declaring them as INTEGER instead of whatever you declared them as. Declaring a column as any type not specifically recognized by SQLite gives it NUMERIC affinity, which will take longer to compare values because it will store them internally as doubles and use the floating-point math processor instead of the integer math processor.
IMO, a GROUP BY is sometimes a dead giveaway of an unoptimized query: it eliminates redundant data that could often have been filtered out before it was pulled out of the database in the first place.
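A minimal sketch of the first two suggestions, using Python's sqlite3: the boolean flag is declared INTEGER, and there is one index per branch of the OR in the question's WHERE clause. The column types and index names are assumptions (the question does not show its schema), and whether the planner actually uses these indexes depends on your data and on statistics gathered with ANALYZE, so compare the plan output with the single "TABLE mytable" line from the question.

```python
import sqlite3

SCHEMA = """
CREATE TABLE mytable (
    id      INTEGER PRIMARY KEY,
    field1  TEXT,
    filter1 INTEGER,
    filter2 INTEGER,
    filter3 INTEGER          -- boolean flag stored as 0/1
);
-- One index per branch of the OR, so each branch can be answered from an index.
CREATE INDEX idx_mytable_filter1_filter2 ON mytable (filter1, filter2);
CREATE INDEX idx_mytable_filter3 ON mytable (filter3);
"""

PREDEFINED = """
SELECT COUNT(*) AS number, field1
FROM mytable
WHERE (filter1 > 123456789 AND filter2 > 180) OR filter3 = 1
GROUP BY field1
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def index_names(conn):
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'mytable' ORDER BY name"
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = make_db()
    print(index_names(conn))
    for row in conn.execute("EXPLAIN QUERY PLAN " + PREDEFINED):
        print(row[-1])  # human-readable plan step
```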
EDIT:
Looking at your query, there are some simple things you can do to optimize it:
SUM(CAST(filter2 +30 AS float)) is inefficient; why are you casting it as a float? Why not just SUM it then add 30 * the COUNT?
filter1 > '123456789' - Why the string comparison? Why not just use integer comparison?
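A sketch of the rewritten query checked against the original on sample data, using Python's sqlite3. It assumes filter1 is stored as an INTEGER (if it were TEXT, switching to an integer comparison would change the results). It uses 30 * COUNT(filter2) rather than 30 * COUNT(*) so the rewrite also matches when filter2 contains NULLs, because SUM and COUNT(column) both skip NULLs. The sample rows are made up for illustration.

```python
import sqlite3

SCHEMA = """
CREATE TABLE mytable (
    id INTEGER PRIMARY KEY, field1 TEXT,
    filter1 INTEGER, filter2 INTEGER, filter3 INTEGER
);
"""

ORIGINAL = """
SELECT COUNT(*) AS number, field1, SUM(CAST(filter2 + 30 AS float)) AS column2
FROM mytable
WHERE (filter1 > '123456789' AND filter2 > 180) OR filter3 = 1
GROUP BY field1
ORDER BY number DESC, field1
"""

# No per-row CAST or addition, and an integer comparison on filter1.
REWRITTEN = """
SELECT COUNT(*) AS number, field1, SUM(filter2) + 30 * COUNT(filter2) AS column2
FROM mytable
WHERE (filter1 > 123456789 AND filter2 > 180) OR filter3 = 1
GROUP BY field1
ORDER BY number DESC, field1
"""

SAMPLE_ROWS = [
    # (field1, filter1, filter2, filter3)
    ("a", 200000000, 200, 0),  # matches the first OR branch
    ("a", 100, 50, 1),         # matches via filter3 = 1
    ("b", 300000000, 190, 0),  # matches the first OR branch
    ("b", 1, 1, 0),            # matches neither branch
    ("c", 1, 1, 1),            # matches via filter3 = 1
]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO mytable (field1, filter1, filter2, filter3) VALUES (?, ?, ?, ?)",
        SAMPLE_ROWS,
    )
    return conn


if __name__ == "__main__":
    conn = make_db()
    print(conn.execute(ORIGINAL).fetchall())   # [(2, 'a', 310.0), (1, 'b', 220.0), (1, 'c', 31.0)]
    print(conn.execute(REWRITTEN).fetchall())  # [(2, 'a', 310), (1, 'b', 220), (1, 'c', 31)]
```

The only visible difference is that column2 comes back as an integer instead of a float; add a single CAST around the final expression if the caller really needs a float.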

How do I sort, group a query properly that returns a tuple of an orm object and a custom column?

I am looking for a way to write a query that returns a tuple, first sorted by one column and then grouped by another (in that order). Simply chaining .sort_by().group_by() didn't appear to work. I then tried the following, which broke the return value (I got only the ORM object, not the original tuple); details below:
Base scenario:
There is a query which queries for test orm objects linked from the test3 table through foreign keys.
This query also returns a column named linked that either contains true or false. It is originally ungrouped.
my_query = session.query(test_orm_object)
... lots of stuff like joining various things ...
add_column(..condition that either puts 'true' or 'false' into the column..)
So the original return value is a tuple (the orm object, and additionally the true/false column).
Now this query should be grouped for the test orm objects (so the test.id column), but before that, sorted by the linked column so entries with true are preferred during the grouping.
Assuming the current unsorted, ungrouped query is stored in my_query, my approach to achieve this was this:
# Get a sorted subquery
tmpquery = my_query.order_by(desc('linked')).subquery()
# Read the column out of the sub query
my_query = session.query(tmpquery).add_columns(getattr(tmpquery.c,'linked').label('linked'))
my_query = my_query.group_by(getattr(tmpquery.c, 'id')) # Group objects
The resulting SQL query when running this is shown below. It looks fine to me: the subquery 'anon_1' is properly sorted internally, then its id as well as the 'linked' column are extracted (along with a few other columns SQLAlchemy apparently wants), and the result is properly grouped:
SELECT anon_1.id AS anon_1_id, anon_1.name AS anon_1_name, anon_1.fk_test3 AS anon_1_fk_test3, anon_1.linked AS anon_1_linked, anon_1.linked AS linked
FROM (
SELECT test.id AS id, test.name AS name, test.fk_test3 AS fk_test3, CASE WHEN (anon_2.id = 87799534) THEN 'true' ELSE 'false' END AS linked
FROM test LEFT OUTER JOIN (SELECT test3.id AS id, test3.fk_testvalue AS fk_testvalue
FROM test3)
AS anon_2 ON anon_2.fk_testvalue = test.id ORDER BY linked DESC
)
AS anon_1 GROUP BY anon_1.id
I tested it in phpMyAdmin, where it gave me, as expected, the id column (for the ORM object id), then the additional columns SQLAlchemy seems to want, and the linked column. So far, so good.
Now my expected return values would be, as they were from the original unsorted, ungrouped query:
A tuple: 'test' orm object (anon_1.id column), 'true'/'false' value (linked column)
However, the actual return value of the new sorted/grouped query is (the original query DOES indeed return a tuple before the code above is applied):
'test' orm object only
Why is that so and how can I fix it?
Excuse me if that approach turns out to be somewhat flawed.
What I actually want is, have the original query simply sorted, then grouped without touching the return values. As you can see above, my attempt was to 'restore' the additional return value again, but that didn't work. What should I do instead, if this approach is fundamentally wrong?
Explanation for the subquery use:
The point of the whole subquery is to force SQLAlchemy to execute this query separately as a first step.
I want to order the results first, and then group the ordered results. That seems to be hard to do properly in one step (when trying manually with SQL I had issues combining order and group by in one step as I wanted).
Therefore I don't simply order, group, but I order first, then subquery it to enforce that the order step is actually completed first, and then I group it.
Judging from manual phpMyAdmin tests with the generated SQL, this seems to work fine. The actual problem is that the original query (which is now wrapped as the subquery you were confused about) had an added column, and by wrapping it up as a subquery, that column is gone from the overall result. My attempt to re-add it to the outer query failed.
It would be much better if you provided examples. I don't know if these columns are in separate tables or what not. Just looking at your first paragraph, I would do something like this:
a = session.query(Table1, Table2.column).\
join(Table2, Table1.foreign_key == Table2.id).\
filter(...).group_by(Table2.id).order_by(Table1.property.desc()).all()
I don't know exactly what you're trying to do since I need to look at your actual model, but it should look something like this with maybe the tables/objs flipped around or more filters.
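One way to get "true preferred" per test row without relying on the row order inside a subquery (SQL does not guarantee that an outer GROUP BY honours an inner ORDER BY, and MySQL documents that its choice of value for a non-aggregated column cannot be influenced by ORDER BY) is to aggregate the flag itself, here with MAX over a CASE expression. This is a swapped-in technique, sketched as plain SQL through Python's sqlite3 on a cut-down, hypothetical version of the test/test3 tables; in SQLAlchemy the same expression could be built with func.max and case.

```python
import sqlite3

SCHEMA = """
CREATE TABLE test  (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE test3 (id INTEGER PRIMARY KEY, fk_testvalue INTEGER REFERENCES test(id));
"""

LINKED_PER_TEST = """
SELECT t.id, t.name,
       -- 1 if ANY joined test3 row is the wanted one, otherwise 0
       MAX(CASE WHEN t3.id = :wanted THEN 1 ELSE 0 END) AS linked
FROM test t
LEFT OUTER JOIN test3 t3 ON t3.fk_testvalue = t.id
GROUP BY t.id, t.name
ORDER BY linked DESC, t.id
"""


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO test VALUES (?, ?)", [(1, "x"), (2, "y"), (3, "z")])
    conn.executemany("INSERT INTO test3 VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])
    return conn


def linked_per_test(conn, wanted_id):
    # Returns (id, name, linked) tuples, one per test row, linked as a bool.
    rows = conn.execute(LINKED_PER_TEST, {"wanted": wanted_id})
    return [(i, name, bool(linked)) for i, name, linked in rows]


if __name__ == "__main__":
    print(linked_per_test(make_db(), 11))
    # [(1, 'x', True), (2, 'y', False), (3, 'z', False)]
```

Because the flag is aggregated, the grouped result is deterministic, and the extra column stays part of the same SELECT, so it is still returned alongside the entity columns.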
