Distinct function somehow not returning all types

Feel free to check it on my latest project: http://arda-maps.org:2480 (login: arda / arda).
Now run select distinct(type) from Location
Here you get 8 records (River, Lake, Region, City, Island, House, Mountain, Hill), but there are actually many more.
To show that distinct is not returning every distinct value, we can search for one specific vertex that has another type:
select * from Location where name = "Citadel of Gondor"
So am I using distinct the wrong way? Or what else could be the reason for the incomplete result list?

Indeed, the order in which the parts of the query are applied to the result set is not obvious. You typed:
select distinct(type) from Location
and OrientDB Studio applies a limit of 20 (unless you change that setting or include a different limit in your query). So the query that finally runs is
select distinct(type) from Location limit 20
Now, this could mean one of two things:
1. Find at most 20 locations and give me their distinct types
2. Find at most 20 distinct types of all locations
Obviously, what you expect is the second and what happens is the first. The solution is to use an inner query so that the limit explicitly applies to the outer query:
select from (select distinct(type) from Location) limit 20
This now clearly says: find the distinct types of all locations and return at most 20 of them (which is the same as the second reading above).
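The two readings can be reproduced outside OrientDB. What follows is only a minimal sketch using Python's built-in sqlite3 module, not OrientDB itself. The table contents are invented for illustration, and because SQLite applies LIMIT after DISTINCT, the first reading has to be spelled out with an explicit subquery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Location (name TEXT, type TEXT)")

# Hypothetical data: 30 rivers first, then one row each of three other types.
rows = [(f"river {i}", "River") for i in range(30)]
rows += [("a", "City"), ("b", "Citadel"), ("c", "Hill")]
conn.executemany("INSERT INTO Location VALUES (?, ?)", rows)

# Reading 1: take at most 20 locations, then collect their distinct types.
limit_first = conn.execute(
    "SELECT DISTINCT type FROM "
    "(SELECT type FROM Location ORDER BY rowid LIMIT 20)"
).fetchall()

# Reading 2: collect the distinct types of all locations, then keep at most 20.
distinct_first = conn.execute(
    "SELECT type FROM (SELECT DISTINCT type FROM Location) LIMIT 20"
).fetchall()

limit_first_types = sorted(t for (t,) in limit_first)
distinct_first_types = sorted(t for (t,) in distinct_first)
print(limit_first_types)
print(distinct_first_types)
```

With this data the first reading only ever sees rivers, while the second reading returns all four types.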

Weird. But if you try:
select distinct(type) from Location limit -1
you'll get all 64 distinct entries.

Related

Oracle SQL: group function is not allowed here

I need someone who can explain "group function is not allowed here" to me, because I don't understand it and would like to.
I have to get the product name and the unit price of the products whose price is above the average.
I initially tried this, but Oracle quickly told me that it was wrong:
SELECT productname,unitprice
FROM products
WHERE unitprice>(AVG(unitprice));
I searched for information and found that I could get it this way:
SELECT productname,unitprice FROM products
WHERE unitprice > (SELECT AVG(unitprice) FROM products);
What I want to know is: why are there two SELECT statements?
What does "group function is not allowed here" mean?
I have encountered this error more than once, and I would like to understand what to do when it appears.
Thank you very much for your time.
The phrase "group function is not allowed here" refers to anything that is in some way an "aggregation" of data, e.g. SUM, MIN, MAX, etc. These functions must operate on a set of rows, and to operate on a set of rows you need a SELECT statement. (I'm leaving out UPDATE/DELETE here.)
If this were not the case, you would end up with ambiguities. For example, let's say we allowed this:
select *
from products
where region = 'USA'
and avg(price) > 10
Does this mean you want the average price across all products, or just the average price of the products in the USA? The syntax would be ambiguous.
Here's another option:
SELECT *
FROM (
SELECT productname,unitprice,AVG(unitprice) OVER (PARTITION BY 1) avg_price
FROM products)
WHERE unitprice > avg_price
The reason your original SQL doesn't work is that you didn't tell Oracle how to compute the average. What table should it find it in? What rows should it include? What, if any, grouping do you wish to apply? None of that is communicated with "WHERE unitprice>(AVG(unitprice))".
Now, as a human, I can make a pretty educated guess that you intend the averaging to happen over the same set of rows you select from the main query, with the same granularity (no grouping). We can accomplish that either by using a sub-query to make a second pass on the table, as your second SQL did, or by using the newer windowing capabilities of aggregate functions to internally make a second pass on your query block results, as I did in my answer. Using the OVER clause, you can tell Oracle exactly what rows to include (ROWS BETWEEN ...) and how to group them (PARTITION BY ...).
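Both approaches can be tried side by side. The following is a rough sketch that uses Python's sqlite3 module as a stand-in for Oracle, so the error text and some syntax details differ. The three product rows are made up, and the window-function query assumes an SQLite build of version 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (productname TEXT, unitprice REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("tea", 10.0), ("coffee", 20.0), ("cocoa", 60.0)],  # average is 30.0
)

# An aggregate placed directly in WHERE is rejected here too.
try:
    conn.execute(
        "SELECT productname FROM products WHERE unitprice > AVG(unitprice)"
    )
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Option 1: a scalar subquery computes the average in a second pass.
via_subquery = conn.execute(
    "SELECT productname, unitprice FROM products "
    "WHERE unitprice > (SELECT AVG(unitprice) FROM products)"
).fetchall()

# Option 2: a window function attaches the overall average to every row,
# and the outer query filters on that extra column.
via_window = conn.execute(
    "SELECT productname, unitprice FROM ("
    "  SELECT productname, unitprice, AVG(unitprice) OVER () AS avg_price"
    "  FROM products"
    ") WHERE unitprice > avg_price"
).fetchall()

print(rejected, via_subquery, via_window)
```

In both working variants the set of rows to average over is stated explicitly, which is exactly what the bare AVG() in the WHERE clause fails to do.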

Oracle duplicate field but still correct

So I built a query for my leadership team that was correct, but I don't understand why Oracle gave me the correct answer.
I have 3 tables that I needed to get data out of in order to get the total billed amount.
Here is my query (please forgive me, this is my 2nd post and I'm not sure how to properly format my queries):
select b.total_amount_billed as billed
from t1.billing_information b
where b.billing_no in
  (select h.billing_no
   from t1.res_history h
   where h.res_seq_no in
     (select r.reservation_seq_no
      from t1.res r
      where r.customer_order_no in ('THO40000')))
In the deepest select, I take the sequence numbers where my customer order number is THO40000; this query returns 2 sequence numbers.
The second subquery returns the billing numbers for my order from the history table where the sequence numbers match. In this case both rows use the same billing number, 312000.
The final select returns my total billed amount where it matches the billing numbers found, in my case $110.
The query works, but what I don't understand is why the result is not duplicated. Why does it not return 110 for each time it found 312000, giving me 2 records of 110? The billing number is the PK of the billing_information table. I'm not sure why it worked without me using the distinct keyword on the billing number subquery.
Anyway, thanks for the help. I'll do my best to explain if you have questions!
You are being saved because you used IN to get the billing_no values to use, rather than an INNER JOIN between the two tables using b.billing_no = h.billing_no. A join would have duplicated the records, but your IN query is essentially this:
select b.total_amount_billed as billed
from t1.billing_information b
where b.billing_no in (312000, 312000);
If there is a single row in billing_information having billing_no equal to 312000, it is in the list, so the WHERE condition is true and it is included in the results. The fact that it is in the list twice doesn't make the IN condition "more true".
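The difference between IN and a join is easy to see on a tiny data set. This is a minimal sketch in Python's sqlite3 rather than Oracle, with the schema cut down to the two tables and columns that matter and with invented res_seq_no values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE billing_information (billing_no INTEGER PRIMARY KEY,
                                      total_amount_billed REAL);
    CREATE TABLE res_history (res_seq_no INTEGER, billing_no INTEGER);
    INSERT INTO billing_information VALUES (312000, 110);
    -- Two history rows point at the same billing number.
    INSERT INTO res_history VALUES (1, 312000), (2, 312000);
""")

# IN only asks "is this billing_no in the set?", so the row appears once.
with_in = conn.execute(
    "SELECT b.total_amount_billed FROM billing_information b "
    "WHERE b.billing_no IN (SELECT h.billing_no FROM res_history h)"
).fetchall()

# A join produces one output row per matching pair, so the row appears twice.
with_join = conn.execute(
    "SELECT b.total_amount_billed FROM billing_information b "
    "JOIN res_history h ON b.billing_no = h.billing_no"
).fetchall()

print(with_in)
print(with_join)
```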

Is there a way to randomize search results (record ids) with Sphinx?

I have a complex SphinxQL query which, at the end, orders results by a specific field, Preferred, so that all records with the indexed value Preferred=1 come before all records with Preferred=0. I also order by weight(), so basically I end up with:
Select * from idx_X where MATCH('various parameters') ORDER by Preferred DESC,Weight() Desc
The problem is that, though Preferred records come first, I end up with records sorted by ID, which puts results from one field, Vendor, in blocks. For instance, I get:
Beta Shipping
Beta Shipping
Beta Shipping
Acme Widgets
Acme Widgets
Acme Widgets
Acme Widgets
Acme Widgets
This doesn't serve my purposes well in this case (often one Vendor will have 1000 results).
So I'm looking to essentially do:
ORDER BY Preferred DESC,weight() DESC,ID RANDOM
So that after getting to Preferred Vendors whose weight is (e.g.) 100, I will get random Vendors rather than blocks of them.
Update: I did find what appears to be a possible answer in another Stack Overflow question.
The issue is that it seems to require SPH_SORT_EXTENDED, while I am forced to use SPH_RANK_PROXIMITY (ranker=proximity), and I am unclear whether I can combine ranking and sorting.
Update 2: If I remove my existing two-level ORDER BY and just do ORDER BY Rand(), it does return random IDs. However, I cannot add Rand() after ORDER BY Preferred DESC,Weight() DESC, or I get the following error:
1064 - sphinxql: syntax error, unexpected '(', expecting $end near '()
Sadly yes, RAND() only works as a single sort order expression, but it DOES work as a select function:
Select *, RAND() AS r from idx_X where MATCH('various parameters')
ORDER by Preferred DESC,Weight() Desc, r DESC
Or, if you want a more consistent ordering that is still mixed, you can for example use the CRC32() function on a string attribute:
Select *, CRC32(title) AS r from idx_X where MATCH('various parameters')
ORDER by Preferred DESC,Weight() Desc, r DESC
You can also just limit results to a few per vendor (vendor will need to be an attribute):
Select * from idx_X where MATCH('various parameters')
GROUP 3 BY vendor_id ORDER by Preferred DESC,Weight() Desc
GROUP N BY is a little-known but very useful Sphinx feature.
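The effect of the CRC32 tie-breaker can be illustrated without a Sphinx server. This is only a plain-Python emulation of the three-level ordering: the rows are invented, zlib.crc32 stands in for Sphinx's CRC32() (the actual hash values may differ), and the title is added as a final key purely so the sketch stays deterministic.

```python
import zlib

# Hypothetical result rows: (vendor, title, preferred, weight).
rows = [
    ("Beta Shipping", "beta one", 1, 100),
    ("Beta Shipping", "beta two", 1, 100),
    ("Acme Widgets", "acme one", 1, 100),
    ("Acme Widgets", "acme two", 1, 100),
    ("Acme Widgets", "acme old", 0, 100),
    ("Beta Shipping", "beta top", 1, 200),
]

def sort_key(row):
    vendor, title, preferred, weight = row
    # Preferred DESC, weight DESC, then a hash of the title DESC as tie-breaker.
    return (-preferred, -weight, -zlib.crc32(title.encode("utf-8")), title)

ordered = sorted(rows, key=sort_key)
for row in ordered:
    print(row)
```

Rows that tie on Preferred and weight are ordered by the hash instead of by ID, and the order is the same on every run, unlike with RAND().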

Access - Get the total of a column in a select query

I have an Access database set up that takes a bunch of raw data, splits things up in different 'select' queries and pipes the results into various CSV files, where a dashboard set up in Excel will pick it up.
There's some data that I'm trying to calculate in Access: I have a quantity field, and I need to calculate each record's percentage of the total. In other words, quantity / total of quantity.
Using my rather limited Access abilities, I tried the following query:
SELECT [Sales].*, [Quantity] / Sum([Quantity]) AS QuantityPercent FROM [Sales];
Which comes up with an error:
Your query does not include the specified expression 'company_name' as part of an aggregate function.
Company_name is the first field of the table, and after some Googling and Binging, I'm still quite confused as to what it means in this context.
To sum it up, my question is this: Is there a way to calculate data based off the total of a column/field?
The easy method is to use DSum:
SELECT
[Sales].*,
[Quantity] / DSum("[Quantity]", "[Sales]") AS QuantityPercent
FROM
[Sales];
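The idea behind DSum is that the total is computed separately from the per-row expression, so no grouping is needed. As a rough illustration outside Access, here is the same calculation in Python's sqlite3, where a scalar subquery plays the role of DSum; the company names and quantities are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (company_name TEXT, Quantity INTEGER)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?)",
    [("Acme", 10), ("Beta", 30), ("Gamma", 60)],  # total quantity is 100
)

# The scalar subquery yields one total, which every row is divided by.
# Multiplying by 1.0 forces floating-point division in SQLite.
shares = conn.execute(
    "SELECT company_name, "
    "       Quantity * 1.0 / (SELECT SUM(Quantity) FROM Sales) AS QuantityPercent "
    "FROM Sales ORDER BY company_name"
).fetchall()

for name, share in shares:
    print(name, share)
```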

How to automatically exclude items already visited in recommendation algorithm?

I'm now using Slope One for recommendation.
How do I exclude visited items from the result?
I can't do it simply with not in (visited_id_list) to filter out the visited ones, because that will have scalability issues for a long-time user!
I've come up with a solution without not in:
select b.property,count(b.id) total from propertyviews a
left join propertyviews b on b.cookie=a.cookie
left join propertyviews c on c.cookie=0 and b.property=c.property
where a.property=1 and a.cookie!=0 and c.property is null
group by b.property order by total;
Seriously, if you are using MySQL, look at 12.2.10.3. Subqueries with ANY, IN, and SOME
For example:
SELECT s1 FROM t1 WHERE s1 IN (SELECT s1 FROM t2);
This is available in all versions of MySQL I looked at, albeit that the section numbers in the manual are different in the older versions.
EDIT in response to the OP's comment:
OK ... how about something like SELECT id FROM t1 WHERE ... AND NOT id IN (SELECT seen_id FROM user_seen_ids where user = ? ). This form avoids having to pass thousands of ids in the SQL statement.
If you want to entirely avoid the "test against a list of ids" part of the query, I don't see how it is even possible in theory, let alone how you would implement it.
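The subquery form from the edit above can be sketched as follows. This uses Python's sqlite3 instead of MySQL, the items table and all ids are hypothetical, and user_seen_ids follows the naming used in the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY);
    CREATE TABLE user_seen_ids (user INTEGER, seen_id INTEGER);
    INSERT INTO items VALUES (1), (2), (3), (4);
    -- User 7 has seen items 2 and 4; user 8 has seen item 1.
    INSERT INTO user_seen_ids VALUES (7, 2), (7, 4), (8, 1);
""")

# Only the user id is passed as a parameter; the seen ids stay in the database.
unseen = conn.execute(
    "SELECT id FROM items "
    "WHERE id NOT IN (SELECT seen_id FROM user_seen_ids WHERE user = ?) "
    "ORDER BY id",
    (7,),
).fetchall()

print(unseen)
```

The statement stays the same size no matter how many items the user has visited, because the list of seen ids never leaves the database.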
