I have Googled for a long time, but I suspect I am struggling to phrase my question the right way. I guess it is pretty easy to solve once you know how ;)
I just started using PowerBI, and have established a connection to an Oracle database.
My challenge is:
I need to create some kind of "join" across multiple tables, so I get the data I need.
Example:
Table 1
Table1_Id
Table1_FirstName
Table 2
Table2_Id
Table2_Table1_Id
Table2_LastName
Table 3
Table3_Id
Table3_Table2_Id
Table3_Email
etc....
And the user might have 100 emails, so there could be multiple rows here.
--
How do I do this? I've tried "merge"/join, I think, but maybe in the wrong way, as I get a sh*t load of rows in return, more than I should.
I hope I'm clear; if not, please let me know and I will try to clarify.
Brgds
Kristian
I don't use PowerBI, but from what you said it looks like you didn't properly join all the tables, and there is a cross join somewhere which causes too many rows to be returned.
If you wrote the query yourself (I presume PowerBI lets you do that in a GUI), it would be something like this:
select a.Table1_FirstName,
       b.Table2_LastName,
       c.Table3_Email
from table1 a
join table2 b on b.Table2_Table1_Id = a.Table1_Id --> this
join table3 c on c.Table3_Table2_Id = b.Table2_Id --> this
I marked the joins you should have.
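To see why a missing join condition inflates the row count, here is a minimal sketch using Python's built-in sqlite3 module rather than Oracle or PowerBI. The table and column names follow the example in the question; the sample rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the Oracle source (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (Table1_Id INTEGER PRIMARY KEY, Table1_FirstName TEXT);
    CREATE TABLE table2 (Table2_Id INTEGER PRIMARY KEY, Table2_Table1_Id INTEGER, Table2_LastName TEXT);
    CREATE TABLE table3 (Table3_Id INTEGER PRIMARY KEY, Table3_Table2_Id INTEGER, Table3_Email TEXT);
    -- Invented sample data: two people, three email addresses in total.
    INSERT INTO table1 VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO table2 VALUES (10, 1, 'Smith'), (20, 2, 'Jones');
    INSERT INTO table3 VALUES (100, 10, 'ann@example.com'),
                              (101, 10, 'ann.smith@example.com'),
                              (102, 20, 'bob@example.com');
""")

# Every table is joined on its key: one row per email address.
keyed_join = conn.execute("""
    SELECT a.Table1_FirstName, b.Table2_LastName, c.Table3_Email
    FROM table1 a
    JOIN table2 b ON b.Table2_Table1_Id = a.Table1_Id
    JOIN table3 c ON c.Table3_Table2_Id = b.Table2_Id
    ORDER BY c.Table3_Id
""").fetchall()

# The condition on table3 is missing: every person is paired with every email.
missing_condition = conn.execute("""
    SELECT a.Table1_FirstName, b.Table2_LastName, c.Table3_Email
    FROM table1 a
    JOIN table2 b ON b.Table2_Table1_Id = a.Table1_Id
    CROSS JOIN table3 c
""").fetchall()

print(len(keyed_join), len(missing_condition))
```

With this sample data the keyed join returns one row per email, while the version without the last condition multiplies people by emails.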
I am a SQL Server guy and just started working on Netezza. One thing that comes up for me is a daily query to find the size of a table filtered by year: 2016, 2015, 2014, ...
What I am using now is something like below and it works for me, but I wonder if there is a better way to do it:
select count(1)
from table
where extract(year from datecolumn) = 2016
extract is a built-in function, but to my knowledge, applying a function to a column in a table with 10 billion+ rows would be unthinkable in SQL Server.
Thank you for your advice.
The only problem I see with the query is the where clause, which applies a function to the column side of the comparison. That effectively disables zone maps and thus forces Netezza to scan all data pages, not only those with data from that year.
Instead write something like:
select count(1)
from table
where datecolumn between '2016-01-01' and '2016-12-31'
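One caveat worth checking: if datecolumn holds timestamps rather than plain dates, an upper bound of '2016-12-31' can drop rows from later on that last day. A half-open range avoids this. The sketch below uses Python's sqlite3 module, where timestamps are stored as text, so it only illustrates the comparison logic and not Netezza's zone map behaviour; the table name events and its rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (datecolumn TEXT)")  # hypothetical table
conn.executemany("INSERT INTO events VALUES (?)", [
    ("2015-12-31 23:59:59",),
    ("2016-01-01 00:00:00",),
    ("2016-06-15 12:30:00",),
    ("2016-12-31 18:45:00",),
    ("2017-01-01 00:00:00",),
])

def count(where_clause):
    """Count the rows of events that match the given WHERE clause."""
    return conn.execute(
        "SELECT count(*) FROM events WHERE " + where_clause).fetchone()[0]

# Function on the column side (strftime is SQLite's stand-in for extract).
by_function = count("strftime('%Y', datecolumn) = '2016'")

# Closed range on date literals: misses the row on the afternoon of Dec 31.
by_between = count("datecolumn BETWEEN '2016-01-01' AND '2016-12-31'")

# Half-open range: plain comparisons on the column, covers the whole year.
half_open = count("datecolumn >= '2016-01-01' AND datecolumn < '2017-01-01'")

print(by_function, by_between, half_open)
```

If the column is a pure date type, both range forms return the same rows and the between version is fine as written.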
A more generic alternative is to create a 'date dimension table' with one row per day covered by your tables (plus a couple of years into the future).
This is an example for Postgres: https://medium.com/#duffn/creating-a-date-dimension-table-in-postgresql-af3f8e2941ac
This enables you to write code like this:
Select count(1)
From table t join d_date d on t.datecolumn=d.date_actual
Where year_actual=2016
You may not have the generate_series() function on your system, but a 'select row_number()...' can do the same trick. A download is available here: https://www.ibm.com/developerworks/community/wikis/basic/anonymous/api/wiki/76c5f285-8577-4848-b1f3-167b8225e847/page/44d502dd-5a70-4db8-b8ee-6bbffcb32f00/attachment/6cb02340-a342-42e6-8953-aa01cbb10275/media/generate_series.tgz
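As a rough illustration of the date dimension idea, the sketch below builds a small d_date table and joins a fact table to it. It uses Python's sqlite3 module and a recursive CTE to generate the days, which is an assumption made to keep the example self-contained; on Netezza you would use generate_series or a row_number() based approach as described above. The column names date_actual and year_actual follow the query in this answer; the facts table and its rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Build one row per day for 2015 through 2017 (range chosen for illustration).
conn.executescript("""
    CREATE TABLE d_date (date_actual TEXT PRIMARY KEY, year_actual INTEGER);
    WITH RECURSIVE days(d) AS (
        SELECT '2015-01-01'
        UNION ALL
        SELECT date(d, '+1 day') FROM days WHERE d < '2017-12-31'
    )
    INSERT INTO d_date
    SELECT d, CAST(strftime('%Y', d) AS INTEGER) FROM days;
""")

# Hypothetical fact table with a plain date column.
conn.execute("CREATE TABLE facts (datecolumn TEXT)")
conn.executemany("INSERT INTO facts VALUES (?)", [
    ("2015-07-01",), ("2016-02-29",), ("2016-11-05",), ("2017-03-10",),
])

# Filter on the dimension attribute instead of a function on the fact column.
count_2016 = conn.execute("""
    SELECT count(*)
    FROM facts t
    JOIN d_date d ON t.datecolumn = d.date_actual
    WHERE d.year_actual = 2016
""").fetchone()[0]

days_2016 = conn.execute(
    "SELECT count(*) FROM d_date WHERE year_actual = 2016").fetchone()[0]

print(count_2016, days_2016)
```

The same dimension table can carry week, month and quarter columns, so other groupings need no date functions on the fact table either.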
A couple of further notes on 'date interval' where clauses:
Those columns are the most likely candidates for zone map optimization. Add an 'organize on (datecolumn)' clause at the bottom of your table DDL and then reorganize the table (GROOM TABLE on Netezza). That will cause Netezza to move records onto pages with similar dates, and query times will improve.
Furthermore, you should ensure that the 'distribute on' clause for the table results in an even distribution across data slices if the table is big. The query will never execute faster than its slowest data slice.
I hope this helps.
I'm trying to get the row counts of all tables within a specific schema (or owner). I am currently using the code at the bottom, which works, but is really slow. I know of the system tables dba_tables and all_tables, but our stats have not been refreshed in the past 1.5 years.
Are there other alternatives to this? I've tried looking at all the system tables, but none seem to have updated stats.
select
table_name,
owner as schemaname,
to_number(extractvalue(xmltype(dbms_xmlgen.getxml('select count(*) c from '||owner||'.'||table_name)),'/ROWSET/ROW/C')) as cnt
from all_tables
where lower(owner) = 'ownername'
You will have to count the rows in each table. Stats should not be used as an accurate measure of the row count. However, the fact that you have not refreshed your stats for 1.5 years is quite concerning, unless your database has not had data added in that period. I suspect that your next SO post will be about a performance problem ;)
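For illustration, counting each table explicitly from a client script could look like the sketch below. It uses Python's sqlite3 module and the sqlite_master catalog as stand-ins; against Oracle you would read the table names from all_tables for the owner and run the counts through your Oracle driver instead. The tables customers and orders and the helper count_rows_per_table are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Invented tables standing in for the schema to be counted.
    CREATE TABLE customers (id INTEGER);
    CREATE TABLE orders (id INTEGER);
    INSERT INTO customers VALUES (1), (2), (3);
    INSERT INTO orders VALUES (1), (2);
""")

def count_rows_per_table(connection):
    """Return {table_name: exact row count} by running count(*) per table."""
    names = [row[0] for row in connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    counts = {}
    for name in names:
        # Quote the identifier, since it is spliced into the statement text.
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = connection.execute(
            f"SELECT count(*) FROM {quoted}").fetchone()[0]
    return counts

counts = count_rows_per_table(conn)
print(counts)
```

This is still one full count per table, so it is not faster in itself; it mainly makes it easy to run the counts one at a time, log progress, or skip tables you do not need.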
In my company's application there is an Oracle query that uses parallel execution (configured for 4 servers). I didn't build it, but the developer set it up that way for performance.
The query makes joins between views and tables and the weirdest thing is: sometimes it returns 11k results (incorrect), sometimes 27k results (correct).
After much research I found out that if I remove the parallel hint, it always returns the correct number: 27k. And if I increase the number of servers to 6 or 7, it always returns the incorrect number: 11k.
The layout of the query is like this:
SELECT /*+ PARALLEL(NAME, 4) */ * FROM(
SELECT DISTINCT COLUMNS
FROM VIEW
JOIN TABLE1 ON (....)
JOIN TABLE2 ON (....)
JOIN TABLE3 ON (....)
ORDER BY 3
) NAME
Does anyone have any idea why? I don't know much about this subject.
I'm trying to cope with data densification for reporting purposes. I created two dimension tables (time & skills) and one data table (calls). Since there are periods with no calls in the data table, I do not get a time series that includes all the days. I have studied many samples on the Internet on how to handle data densification and came up with the solution below.
The query works as intended, but it takes quite long and I have the feeling it is quite inefficient. Could you please advise me how to speed up the query execution time?
Thank you and best regards,
Alex
SELECT DISTINCT
DAY_ID,
DAY_SHORT,
WEEK_ID,
MONTH_ID,
QUARTER_ID,
YEAR_ID,
AREA,
FIRMA,
PRODUCT,
PRODUCT_FAMILY,
PRODUCT_WFM,
LANGUAGE,
NVL(NCO,0) NCO,
NVL(NCH,0) NCH,
NVL(NCH60,0) NCH60,
NVL(LOST,0) LOST
FROM (
SELECT
DS.AREA,
DS.FIRMA,
DS.PRODUCT,
DS.PRODUCT_FAMILY,
DS.PRODUCT_WFM,
DS.LANGUAGE,
SUM(NVL(CH.HANDLED,0)+NVL(CH.LOST,0)) AS NCO,
SUM(CH.HANDLED) AS NCH,
SUM(CH.HANDLED_IN_SL) AS NCH60,
SUM(CH.LOST) AS LOST,
CH.DELIVER_DATE,
CH.SKILL_NAME
FROM
WFM.WFM_TBL_DIMENSION_SKILL DS
LEFT JOIN
OPS.VW_CALL_HISTORY CH
ON
DS.SPLIT_NAME=CH.SKILL_NAME
GROUP BY
DS.AREA,
DS.FIRMA,
DS.PRODUCT,
DS.PRODUCT_FAMILY,
DS.PRODUCT_WFM,
DS.LANGUAGE,
CH.DELIVER_DATE,
CH.SKILL_NAME
) temp_values
PARTITION BY
(
temp_values.AREA,
temp_values.FIRMA,
temp_values.PRODUCT,
temp_values.PRODUCT_FAMILY,
temp_values.PRODUCT_WFM,
temp_values.LANGUAGE,
temp_values.DELIVER_DATE,
temp_values.SKILL_NAME
)
RIGHT OUTER JOIN (
SELECT
DAY_ID,
DAY_SHORT,
WEEK_ID,
MONTH_ID,
QUARTER_ID,
YEAR_ID
FROM
WFM.WFM_TBL_DIMENSION_TIME
WHERE
DAY_ID BETWEEN(SELECT MIN(DELIVER_DATE) FROM OPS.VW_CALL_HISTORY) and TRUNC(sysdate-1)
) temp_time
ON
temp_values.DELIVER_DATE=temp_time.DAY_ID
Have a look at the execution plan and check which steps take very long. Use EXPLAIN PLAN to get it. Look for full table scans, see if indexes could help. Make sure you have up-to-date stats on the tables.
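As a small illustration of reading a plan before and after adding an index, here is a sketch using Python's sqlite3 module and SQLite's EXPLAIN QUERY PLAN. This is a stand-in, not Oracle's EXPLAIN PLAN, and the plan text differs between engines and versions; the calls table and the index name idx_calls_skill are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calls (skill_name TEXT, deliver_date TEXT, handled INTEGER)")

def plan(connection, sql):
    """Return the plan steps for a query as one string (detail column only)."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

query = "SELECT sum(handled) FROM calls WHERE skill_name = 'support'"

# Without an index the engine has to scan the whole table.
plan_before = plan(conn, query)

# With an index on the filter column the plan switches to an index search.
conn.execute("CREATE INDEX idx_calls_skill ON calls (skill_name)")
plan_after = plan(conn, query)

print(plan_before)
print(plan_after)
```

The point is the workflow: capture the plan, change one thing (an index, fresh stats, a rewritten join), and compare the plan again.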
Since you are talking about dimension tables, I assume this code is from a data warehousing database. If it is, do you use partitions? Parallel DML? Are you using Enterprise Edition (EE)?
I reduced the arguments in PARTITION BY () to a single primary key (temp_values.SKILL_NAME) and joined the missing information from the skill dimension with a LEFT OUTER JOIN at the end of the query described above. That way no duplicate rows are produced any more, which let me reduce SELECT DISTINCT to SELECT.
Additionally I added foreign & primary keys and let the query run in parallel mode.
This reduced the execution time by over 80%, which is sufficient. Thanks guys!
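For readers who want to try the densification idea outside Oracle, here is a minimal sketch using Python's sqlite3 module. SQLite has no partitioned outer join, so this swaps in the portable equivalent: cross join the time and skill dimensions, then left join the aggregated calls. The table names dim_time, dim_skill and calls and all rows are simplified, invented stand-ins for the tables in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_time (day_id TEXT PRIMARY KEY);
    CREATE TABLE dim_skill (split_name TEXT PRIMARY KEY, product TEXT);
    CREATE TABLE calls (skill_name TEXT, deliver_date TEXT,
                        handled INTEGER, lost INTEGER);
    INSERT INTO dim_time VALUES ('2016-01-01'), ('2016-01-02'), ('2016-01-03');
    INSERT INTO dim_skill VALUES ('sales', 'A'), ('support', 'B');
    INSERT INTO calls VALUES ('sales', '2016-01-01', 5, 1),
                             ('sales', '2016-01-01', 2, 0),
                             ('support', '2016-01-03', 4, 2);
""")

# Aggregate first on the key only, then densify: every (day, skill) pair
# appears once, with zeros where no calls were recorded.
rows = conn.execute("""
    SELECT t.day_id, s.split_name, s.product,
           COALESCE(c.nch, 0)  AS nch,
           COALESCE(c.lost, 0) AS lost
    FROM dim_time t
    CROSS JOIN dim_skill s
    LEFT JOIN (
        SELECT skill_name, deliver_date,
               SUM(handled) AS nch, SUM(lost) AS lost
        FROM calls
        GROUP BY skill_name, deliver_date
    ) c ON c.skill_name = s.split_name AND c.deliver_date = t.day_id
    ORDER BY t.day_id, s.split_name
""").fetchall()

for row in rows:
    print(row)
```

Because the aggregation groups on the skill key and date only, and the descriptive skill attributes are joined in afterwards, each (day, skill) pair appears exactly once and no DISTINCT is needed.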