I have two differente SELECTs with different conditions but the same columns.
Imagin that I have this:
So what I want to do is to substract the amount of the first table to the amount of the second table (where both IDs match and the date differs by one month) to know the delta between them.
This is a very simple explanation, but the data model and the extraction is pretty difficult, so this is the only way I can afford this problem.
You can use the left join as follows:
select s1.id, s1.date d1, s2.date d2, s1.value - s2.value as delta
from subquery1 s1
left join subquery2 s2
on s1.id = s2.id
and add_months(s1.date,-1) = s2.date
Assuming date difference can be either ways
select t1.id,t2.id, T1.DT, t2.dt, t1.amt,t2.amt, t1.amt-t2.amt diff
from t1, t2
where t1.id=t2.id
and abs(MONTHS_BETWEEN(t1.dt, t2.dt)) = 1
Related
We have almost 1B records in a replicated merge tree table.
The primary key is a,b,c
Our App keeps writing into this table with every user action. (we accumulate almost a million records per hour)
We append (store) the latest timestamp (updated_at) for a given unique combination of (a,b)
The key requirement is to provide a roll-up against the latest timestamp for a given combination of a,b,c
Currently, we are processing the queries as
select a,b,c, sum(x), sum(y)...etc
from table_1
where (a,b,updated_at) in (select a,b,max(updated_at) from table_1 group by a,b)
and c in (...)
group by a,b,c
clarification on the sub-query
(select a,b,max(updated_at) from table_1 group by a,b)
^ This part is for illustration only.. our app writes latest updated_at for every a,b implying that the clause shown above is more like
(select a,b,updated_at from tab_1_summary)
[where tab_1_summary has latest record for a given a,b]
Note: We have to keep the grouping criteria as-is.
The table is structured with partition (c) order by (a, b, updated_at)
Question is, is there a way to write a better query. (that can returns results faster..we are required to shave off few seconds from the overall processing)
FYI: We toyed working with Materialized View ReplicatedReplacingMergeTree. But, given the size of this table, and constant inserts + the FINAL clause doesn't necessarily work well as compared to the query above.
Thanks in advance!
Just for test try to use join instead of tuple in (tuples):
select t.a, t.b, t.c, sum(x), sum(y)...etc
from table_1 AS t inner join tab_1_summary using (a, b, updated_at)
where c in (...)
group by t.a, t.b, t.c
Consider using AggregatingMergeTree to pre-calculate result metrics:
CREATE MATERIALIZED VIEW table_1_mv
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(updated_at)
ORDER BY (updated_at, a, b, c)
AS SELECT
updated_at,
a,b,c,
sum(x) AS x, /* see [SimpleAggregateFunction data type](https://clickhouse.tech/docs/en/sql-reference/data-types/simpleaggregatefunction/) */
sum(y) AS y,
/* For non-simple functions should be used [AggregateFunction data type](https://clickhouse.tech/docs/en/sql-reference/data-types/aggregatefunction/). */
// etc..
FROM table_1
GROUP BY updated_at, a, b, c;
And use this way to get result:
select a,b,c, sum(x), sum(y)...etc
from table_1_mv
where (updated_at,a,b) in (select updated_at,a,b from tab_1_summary)
and c in (...)
group by a,b,c
I have table[Table 1] having three columns
OrganizationName, FieldName, Acres having data as follows
organizationname fieldname Acres
ABC |F1 |0.96
ABC |F1 |0.96
ABC |F1 |0.64
I want to calculate the sum of Distinct values of Acres
(eg: 0.96+0.64) in DAX.
One of the problems with doing what you want is that many measures rely on filters and not actual table expressions. So, getting a distinct list of values and then filtering the table by those values, just gives you the whole table back.
The iterator functions are handy and operate on table expressions, so try SUMX
TotalDistinctAcreage = SUMX(DISTINCT(Table1[Acres]),[Acres])
This will generate a table that is one column containing only the distinct values for Acres, and then add them up. Note that this is only looking at the Acres column, so if different fields and organizations had the same acreage -- then that acreage would still only be counted once in this sum.
If instead you want to add up the acreage simply on distinct rows, then just make a small change:
TotalAcreageOnDistinctRows = SUMX(DISTINCT(Table1),[Acres])
Hope it helps.
Ok, you added these requirements:
Thank You. :) However, I want to add Distinct values of Acres for a
Particular Fieldname. Is this possible? – Pooja 3 hours ago
The easiest way really is just to go ahead and slice or filter the original measure that I gave you. But if you have to apply the filter context in DAX, you can do it like this:
Measure =
SUMX(
FILTER(
SUMMARIZE( Table1, [FieldName], [Value] )
, [FieldName] = "<put the name of your specific field here"
)
, [Value]
)
If I execute both query individually it is not taking 4sec to get data, but when combine both I see query is deadslow. Any help much appreciated
Query1:
Select Med_Number,Med_Code,Member_Name,DOB FROM Med
WHERE Med.Med_Code=:Med_Code
Query2:
Select Red_Number,Red_Name,Red_Code FROM Red
WHERE Red.Red_Code =:Red_Code
Final One:Im passing one value at a time
Select Med_Number,Member_Name,Red_Number,Red_Name FROM Med M
LEFT JOIN Red R ON M.Med_Number=R.Red_Number
Where (Med.Med_Code=:Med_Code) OR (Red.Red_Code=:Red_Code)
If you look at the execution plan for all 3 statements, you'll figure it out. If you're not interested in figuring it out and you must execute only one query, then you can execute this, using src to determine which rows below to which row source assuming you have to know the difference and assuming the numbers and names are suitably equivalent data types:
Select 1 src, Med_Number,Member_Name,DOB
FROM Med
WHERE Med.Med_Number=:Med_Number
UNION ALL
Select 2 src, Red_Number,Red_Name, null
FROM Red
WHERE Red.Red_Number=:Red_Number
Of course, if the data types are equivalent and DOB is not allowed to be null, then this would suffice
Select Med_Number,Member_Name,DOB
FROM Med
WHERE Med.Med_Number=:Med_Number
UNION ALL
Select Red_Number,Red_Name, null
FROM Red
WHERE Red.Red_Number=:Red_Number
As you join your two tables on m.Med_number = R.Red_number, you don't need 2 parameters.
Select Med_Number,Member_Name,Red_Number,Red_Name FROM Med M
LEFT JOIN Red R ON M.Med_Number=R.Red_Number
Where M.Med_Number=:Number;
I'm trying to get an count based on two dates and I'm not sure how it should look in a query. I have two date fields; I want to get a count based on those dates.
<cfquery>
SELECT COUNT(*)
FROM Table1
Where month of date1 is one month less than month of date2
</cfquery>
Assuming Table1 is your original query, you can accomplish your goal as follows.
Step 1 - Use QueryAddColumn twice to add two empty columns.
Step 2 - Loop through your query and populate these two columns with numbers. One will represent date1 and the other will represent date2. It's not quite as simple as putting in the month numbers because you have to account for the year as well.
Step 3 - Write your Q of Q with a filter resembling this:
where NewColumn1 - NewColumn2 = 1
I have a table created. With one column named states and another column called land area. I am using oracle 11g. I have looked at various questions on here and cannot find a solution. Here is what I have tried so far:
SELECT LandAreas, State
FROM ( SELECT LandAreas, State, DENSE_RANK() OVER (ORDER BY State DESC) sal_dense_rank
FROM Map )
WHERE sal_dense_rank >= 5;
This does not provide the top 5 land areas as far as number wise.
I have also tried this one but no go either:
SELECT * FROM Map order by State desc)
where rownum < 5;
Anyone have any suggestions to get me on the right track??
Here is a samle of the table
states land areas
michagan 15000
florida 25000
tennessee 10000
alabama 80000
new york 150000
california 20000
oregon 5000
texas 6000
utah 3000
nebraska 1000
Desired output from query:
States land area
new york 150000
alabama 80000
florida 25000
california 20000
Try:
Select * from
(SELECT State, LandAreas FROM Map ORDER BY LandAreas DESC)
where rownum < 6
Link to Fiddle
Use a HAVING clause and count the number state states larger:
SELECT m.state, m.landArea
FROM Map m
LEFT JOIN Map m2 on m2.landArea > m.landArea
GROUP BY m.state, m.landArea
HAVING count(*) < 5
ORDER BY m.landArea DESC
See SQLFiddle
This joins each state to every state whose area is greater, then uses a HAVING clause to return only those states where the number of larger states was less than 5.
Ties are all returned, leading to more than 5 rows in the case of a tie for 5th.
The left join is needed for the case of the largest state, which has no other larger state to join to.
The ORDER BY is optional.
Try something like this
select m.states,m.landarea
from map m
where (select count(‘x’) from map m2 where m2.landarea > m.landarea)<=5
order by m.landarea
There are two bloomers in your posted code.
You need to use landarea in the DENSE_RANK() call. At the moment you're ordering the states in reverse alphabetical order.
Your filter in the outer query is the wrong way around: you're excluding the top four results.
Here is what you need ...
SELECT LandArea, State
FROM ( SELECT LandArea
, State
, DENSE_RANK() OVER (ORDER BY landarea DESC) as area_dr
FROM Maps )
WHERE area_dr <= 5
order by area_dr;
... and here is the SQL Fiddle to prove it. (I'm going with the statement in the question that you want the top 5 biggest states and ignoring the fact that your desired result set has only four rows. But adjust the outer filter as you will).
There are three different functions for deriving top-N result sets: DENSE_RANK, RANK and ROW_NUMBER.
Using ROW_NUMBER will always guarantee you 5 rows in the result set, but you may get the wrong result if there are several states with the same land area (unlikely in this case, but other data sets will produce such clashes). So: 1,2,3,4,5
The difference between RANK and DENSE_RANK is how they handle ties. DENSE_RANK always produces a series of consecutive numbers, regardless of how many rows there are in each rank. So: 1,2,2,3,3,3,4,5
RANK on the other hand will produce a sparse series if a given rank has more than one hit. So: 1,2,2,4,4,4.
Note that each of the example result sets has a different number of rows. Which one is correct? It depends on the precise question you want to ask.
Using a sorted sub-query with the ROWNUM pseudo-column will work like the ROW_NUMBER function, but I prefer using ROW_NUMBER because it is more powerful and more error-proof.