I'm currently working on a Vertica database and ran into an error that I can't figure out, so I need some help here :)
Below are my query and expected output:
SELECT
country
, merchant
, DISTINCT(merchant)
, COUNT(*) as 'Total Transaction'
, Max(price) as 'Max_Charge'
FROM transaction_table
WHERE ("action")='CHARGE' and action_status='COMPLETED'
GROUP by(msisdn)
my table and expected output
The query does not make a lot of sense and, from what I can see, it has some really basic SQL shortcomings; that's why I voted you down. See the explanation below, try to follow the suggestions, and then edit your question once you've tried something along those lines.
It looks like going back to the SQL documentation (any dialect, not just Vertica) could help you a lot:
The DISTINCT keyword is only "legal" directly after SELECT or in COUNT(DISTINCT <expression>)
In a GROUP BY query, the columns in the SELECT list are either columns that are repeated in the GROUP BY clause, or they are aggregate functions, like your MAX() and COUNT(). GROUP BY (msisdn) when msisdn is not in the SELECT list won't help at all.
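Without seeing your expected output I can only guess at the intent, but a syntactically valid version along these lines might look like this (I'm assuming you want one row per country and merchant, and that the distinct count should be over msisdn; also note that aliases belong in double quotes, since single quotes denote string literals):
SELECT
      country
    , merchant
    , COUNT(*)                AS "Total Transaction"
    , COUNT(DISTINCT msisdn)  AS "Distinct Subscribers"
    , MAX(price)              AS "Max_Charge"
FROM transaction_table
WHERE action = 'CHARGE'
  AND action_status = 'COMPLETED'
GROUP BY country, merchant;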
Hope these hints help ---
Good luck
Marco the Sane
Related
I have Googled for a long time, but I'm guessing I'm struggling to find the right way to phrase my questions for Google. I guess my question is pretty easy to solve; I just need to know how ;)
I just started using Power BI and have established a connection to an Oracle database.
My challenge is:
I need to create some kind of "join" across multiple tables, so that I get the data I need.
Example:
Table 1
Table1_Id
Table1_FirstName
Table 2
Table2_Id
Table2_Table1_Id
Table2_LastName
Table 3
Table3_Id
Table3_Table2_Id
Table3_Email
etc....
And the user might have 100 emails, so there could be multiple rows here.
--
How do I do this? I've tried with "merge"/join, I think, but maybe in the wrong way, as I get a huge number of rows in return, more than I should.
I hope I'm being clear; if not, please let me know and I will try to clarify.
Brgds
Kristian
I don't use Power BI, but from what you said it looks like you didn't properly join all the tables, and somewhere there's a cross join, which results in too many rows being returned.
If you wrote the query yourself (I presume Power BI lets you do that, besides the GUI), it would be something like this:
select a.first_name,
       b.last_name,
       c.email
from table1 a join table2 b on b.table1_id = a.table1_id  --> this
              join table3 c on c.table2_id = b.table2_id  --> this
I marked the joins you should have.
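For comparison, a merge/join that lacks the proper key columns degenerates into a cross join: every row of one table gets paired with every row of the other (e.g. 10 rows x 1,000 rows = 10,000 rows), which matches the symptom you describe:
-- no join condition: every table1 row is combined with every table2 row
select a.first_name, b.last_name
from table1 a cross join table2 b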
I am a SQL Server guy and just started working on Netezza. One thing that comes up for me is a daily query to find out the row count of a table filtered by year: 2016, 2015, 2014, ...
What I am using now is something like the query below, and it works for me, but I wonder if there is a better way to do it:
select count(1)
from table
where extract(year from datacolumn) = 2016
extract is a built-in function; applying a function across a table with 10 billion+ rows is unimaginable in SQL Server, to my knowledge.
Thank you for your advice.
The only problem I see with the query is the WHERE clause, which applies a function to the column side of the comparison. That effectively disables zone maps and thus forces Netezza to scan all data pages, not only those with data from that year.
Instead, write something like:
select count(1)
from table
where datecolumn between '2016-01-01' and '2016-12-31'
A more generic alternative is to create a 'date dimension table' with one row per day covered by your tables (and a couple of years into the future).
This is an example for Postgres: https://medium.com/@duffn/creating-a-date-dimension-table-in-postgresql-af3f8e2941ac
This enables you to write code like this:
select count(1)
from table t join d_date d on t.datecolumn = d.date_actual
where d.year_actual = 2016
You may not have the generate_series() function on your system, but a 'select row_number()...' can do the same trick. A download is available here: https://www.ibm.com/developerworks/community/wikis/basic/anonymous/api/wiki/76c5f285-8577-4848-b1f3-167b8225e847/page/44d502dd-5a70-4db8-b8ee-6bbffcb32f00/attachment/6cb02340-a342-42e6-8953-aa01cbb10275/media/generate_series.tgz
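As a rough sketch of how such a table could be generated (Postgres syntax, as in the linked article; the column names are only illustrative, and a real date dimension would carry many more attributes):
create table d_date as
select d::date                    as date_actual,
       extract(year  from d)::int as year_actual,
       extract(month from d)::int as month_actual
from generate_series(date '2014-01-01',
                     date '2022-12-31',
                     interval '1 day') as gs(d);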
A couple of further notes on 'date interval' WHERE clauses:
Those columns are the most likely candidates for zone map optimization. Add an 'organize on (datecolumn)' clause at the bottom of your table DDL and groom your table. That will cause Netezza to move records onto pages with similar dates, and query times will improve.
Furthermore, you should ensure that the 'distribute on' clause for the table results in an even distribution across data slices if the table is big. The execution of the query will never be faster than the slowest data slice.
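As a sketch only (table and column names are made up; adjust them to your schema), the DDL could look like this, followed by a groom so the rows are physically regrouped:
create table transactions (
    transaction_id bigint,
    customer_id    bigint,
    datecolumn     date,
    amount         numeric(18,2)
)
distribute on (customer_id)   -- a high-cardinality key gives an even spread across data slices
organize on (datecolumn);     -- lets zone maps prune pages for date-range predicates

groom table transactions;     -- apply the organization after loading or after changing 'organize on'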
I hope this helps
I have to join 3 tables to retrieve data, and it looks like a full outer join is a potential solution, but when I tried it, the query took more than an hour to execute.
Any alternatives would be helpful.
Thank you.
I'm not sure what your query looks like; however, add indexes on the tables if they are newly created.
To answer your question, though: using UNION ALL will be faster, as it simply returns the rows of the first SELECT statement and then appends the results of the second SELECT statement to the end of the output. Even a normal UNION is often faster than a join.
The UNIONs will make better use of indexes, which could result in a faster query.
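As a sketch of that idea (the table and column names here are placeholders, since we haven't seen your actual query), a FULL OUTER JOIN of two tables can often be rewritten as a LEFT JOIN plus a UNION ALL of the unmatched rows from the other side:
-- all rows from t1, with matching t2 data where it exists...
select t1.id, t1.col_a, t2.col_b
from t1
left join t2 on t2.id = t1.id
union all
-- ...plus the t2 rows that have no match in t1
select t2.id, null as col_a, t2.col_b
from t2
where not exists (select 1 from t1 where t1.id = t2.id);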
I'm trying to create a query which gives back how many customers have registered in our system (using the REGISTERED_ID). However, when a customer registers, he can then register again with a different car. I want to return the number of registrations by month. I count X__INSDATE because basically I can count anything; all I need is a number. The error points to the DISTINCT; I tried to use HAVING instead of WHERE, but I may have missed something.
I use Oracle SQL Developer 4.0.0.12
SELECT
TRUNC(X__INSDATE, 'MONTH') as HONAP,
COUNT(X__INSDATE),
DISTINCT REGISTERED_ID
FROM
DATABASE.data_history
WHERE
DATABASE.data_history.X__INSDATE >= to_date('2013-JÚL. -01', 'YYYY-MON-DD')
GROUP BY TRUNC(X__INSDATE, 'MONTH') ORDER BY HONAP;
Thank you for your help!
You need to apply an aggregate function to all the columns that are not in the GROUP BY.
Try with:
SELECT
    TRUNC(X__INSDATE, 'MONTH') as HONAP,
    COUNT(X__INSDATE),
    COUNT(DISTINCT REGISTERED_ID)
(keeping the rest of your FROM, WHERE, GROUP BY and ORDER BY clauses unchanged).
Or group by REGISTERED_ID as well.
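That second variant would return one row per month and REGISTERED_ID, something along these lines (the rest of the query is taken unchanged from yours):
SELECT
    TRUNC(X__INSDATE, 'MONTH') as HONAP,
    REGISTERED_ID,
    COUNT(X__INSDATE)
FROM
    DATABASE.data_history
WHERE
    DATABASE.data_history.X__INSDATE >= to_date('2013-JÚL. -01', 'YYYY-MON-DD')
GROUP BY TRUNC(X__INSDATE, 'MONTH'), REGISTERED_ID
ORDER BY HONAP;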
For a query such as:
select country
from table1
inner join table2 on table1.id=table2.id
where table1.name='a' and table2.name='b'
group by country
After the parse, which part will be executed first?
It looks like you want to know the execution plan chosen by Oracle. You can get that output from Oracle itself:
set serveroutput off
< your query with hint "/*+ gather_plan_statistics */" inserted after SELECT >
select * from table(dbms_xplan.display_cursor(null, null, 'last allstats'));
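Applied to the query from your question, the whole sequence would look like this:
set serveroutput off
select /*+ gather_plan_statistics */ country
from table1
inner join table2 on table1.id=table2.id
where table1.name='a' and table2.name='b'
group by country;
select * from table(dbms_xplan.display_cursor(null, null, 'last allstats'));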
See here for an explanation of how to read a query plan: http://download.oracle.com/docs/cd/E11882_01/server.112/e16638/ex_plan.htm#i16971
Be aware, however, that the choice of query plan is not fixed. Oracle tries to find the best query plan at the time, based on the available statistics data.
There are plenty of places you can find the order in which SQL is executed:
FROM clause
WHERE clause
GROUP BY clause
HAVING clause
SELECT clause
ORDER BY clause
But note that this is the "theoretical" order - SQL engines are allowed to perform the operations in other orders, provided that the end result appears to have been produced by using the above order.
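Applied to the query in your question, that logical order would be:
select country                                  -- 5. SELECT (projects the grouped rows)
from table1                                     -- 1. FROM, including the join
inner join table2 on table1.id=table2.id
where table1.name='a' and table2.name='b'       -- 2. WHERE (filters before grouping)
group by country                                -- 3. GROUP BY (no HAVING or ORDER BY here)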
If you install the free tool SQL Developer from Oracle, then you can click a button to get the explain plan.
A quick explanation is at http://www.seeingwithc.org/sqltuning.html