I have a table:
How do I select the IDs from this table where a value is less than the previous value (expected result: ID = B2 and C1)? Thank you.
You can use the window function lag to get the previous row's value and check whether the current row's value is less than that. You can use lead in a similar way to get the value from the next row.
select distinct ID
from (
select t.*,
lag(value) over (
partition by ID order by time
) as last_value
from your_table t
) t
where value < last_value;
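To try the lag approach without an Oracle instance, here is a minimal sketch that runs the same query shape in SQLite from Python (window functions need SQLite 3.25 or later). The sample rows are hypothetical, since the original table was not included in the post, and the alias prev_value is used in place of last_value only to avoid clashing with a window function name.

```python
import sqlite3

# Hypothetical sample data (the table from the question is not available).
rows = [
    ("A1", 1, 10), ("A1", 2, 12), ("A1", 3, 15),   # never drops
    ("B2", 1, 20), ("B2", 2, 18), ("B2", 3, 25),   # drops at time 2
    ("C1", 1, 7),  ("C1", 2, 9),  ("C1", 3, 4),    # drops at time 3
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id TEXT, time INTEGER, value INTEGER)")
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?)", rows)

query = """
SELECT DISTINCT id
FROM (
    SELECT t.*,
           LAG(value) OVER (PARTITION BY id ORDER BY time) AS prev_value
    FROM your_table t
) t
WHERE value < prev_value   -- first row per id has NULL prev_value and is skipped
ORDER BY id
"""
dropping_ids = [r[0] for r in conn.execute(query)]
print(dropping_ids)
```

The first row of each partition has no predecessor, so lag returns NULL there and the comparison filters it out automatically.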
This is a question about Oracle views. I have a table with Emp_id, Start_Period and Key. The sample data is given in descending order of Start_Period, with 201909 on top. I need to generate a column named Key_order. (Eventually I plan to create a view with all 4 columns.)
With the sample data as shown: in the list sorted by Start_Period, whatever comes in first position gets order 1, and from then on, whenever the Key changes, the order has to increment by one.
That is, in rows 1 and 2 the key is the same and the order is 1. In row 3 the key changes from SCD to ABC, so the order increments by 1 to 2. In the 4th position the key changes again and the order becomes 3.
In the 7th and 8th positions the value is the same, so the order remains 6 for both. I am trying to do this inside a view. I tried RANK(), but it sorts by the Key column and assigns the order based on that.
Please help. Sample data:
Set a one in each line that has a different key than the line before. Use LAG for this. Then build a running total of these ones with SUM OVER.
select
emp_id, start_period, key,
sum(chg) over (partition by emp_id order by start_period desc) as key_order
from
(
select
emp_id, start_period, key,
case when key = lag(key) over (partition by emp_id order by start_period desc)
then 0 else 1 end as chg
from mytable
)
order by emp_id, start_period desc;
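As a rough check of the LAG plus running SUM idea, the sketch below runs the same query shape in SQLite from Python (SQLite 3.25 or later). The rows are made up for illustration, and the column is called key_name here instead of key purely to stay clear of keyword issues; neither comes from the original post.

```python
import sqlite3

# Hypothetical rows for one employee, newest period first.
rows = [
    (1, 201909, "SCD"),
    (1, 201908, "SCD"),
    (1, 201907, "ABC"),
    (1, 201906, "XYZ"),
    (1, 201905, "ABC"),
    (1, 201904, "ABC"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (emp_id INTEGER, start_period INTEGER, key_name TEXT)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)

query = """
SELECT emp_id, start_period, key_name,
       SUM(chg) OVER (PARTITION BY emp_id ORDER BY start_period DESC) AS key_order
FROM (
    SELECT emp_id, start_period, key_name,
           -- 1 when the key differs from the previous row (or there is none)
           CASE WHEN key_name = LAG(key_name) OVER (PARTITION BY emp_id
                                                    ORDER BY start_period DESC)
                THEN 0 ELSE 1 END AS chg
    FROM mytable
)
ORDER BY emp_id, start_period DESC
"""
key_orders = [r[3] for r in conn.execute(query)]
print(key_orders)
```

Note that ABC gets a new order number when it reappears after XYZ, which is the behaviour RANK() cannot give because it ranks by the key value itself.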
I am trying to list the top 3 records from a table based on an amount stored in the column FTE_TMUSD, which is of varchar datatype.
Below is the query I tried:
SELECT * FROM
(
SELECT * FROM FSE_TM_ENTRY
ORDER BY FTE_TMUSD desc
)
WHERE rownum <= 3
ORDER BY FTE_TMUSD DESC ;
The output I got:
972, 9680, 963 --> FTE_TMUSD values, which are not displayed in descending order
I am expecting output that displays the top 3 records by value.
That should work; inline view is ordered by FTE_TMUSD in descending order, and you're selecting values from it.
What looks suspicious are values you specified as the result. It appears that FTE_TMUSD's datatype is VARCHAR2 (ah, yes - it is, you said so). It means that values are sorted as strings, not numbers - and it seems that you expect numbers. So, apply TO_NUMBER to that column. Note that it'll fail if column contains anything but numbers (for example, if there's a value 972C).
Also, an alternative to your query might be use of analytic functions, such as row_number:
with temp as
(select f.*,
row_number() over (order by to_number(f.fte_tmusd) desc) rn
from fse_tm_entry f
)
select *
from temp
where rn <= 3;
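The difference between string and numeric ordering can be seen with a small sketch in SQLite from Python. The three values 972, 9680 and 963 are the ones from the question; 1000 and 85 are added here as assumptions to make the top-3 cut visible. SQLite has no TO_NUMBER, so CAST(... AS INTEGER) stands in for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fse_tm_entry (fte_tmusd TEXT)")
conn.executemany(
    "INSERT INTO fse_tm_entry VALUES (?)",
    [("972",), ("9680",), ("963",), ("1000",), ("85",)],
)

# Sorting the text column compares character by character.
top3_as_text = [r[0] for r in conn.execute(
    "SELECT fte_tmusd FROM fse_tm_entry ORDER BY fte_tmusd DESC LIMIT 3"
)]

# Converting to a number first gives the intended ranking.
top3_as_number = [r[0] for r in conn.execute("""
    SELECT fte_tmusd
    FROM (
        SELECT f.*,
               ROW_NUMBER() OVER (ORDER BY CAST(f.fte_tmusd AS INTEGER) DESC) AS rn
        FROM fse_tm_entry f
    )
    WHERE rn <= 3
    ORDER BY rn
""")]

print(top3_as_text)
print(top3_as_number)
```

The text ordering reproduces the 972, 9680, 963 sequence reported in the question, while the numeric ordering puts 9680 first.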
Is there any way to reduce the time taken to get the result from the query below?
Please help. Thanks in advance!
select status, count(distinct id)
from emp
where id >=
( select min(id)
from emp
where id >= (select max(id-200000) from emp)
and trunc(join_date) >= '01-Mar-2018')
group by status;
Use analytic functions - this will perform only a single table scan (whereas your query has three table/index scans):
SELECT status,
COUNT( DISTINCT id )
FROM (
SELECT status,
id,
MIN( CASE WHEN join_date >= DATE '2018-03-01' THEN id END ) OVER () AS min_id
FROM (
SELECT status,
id,
join_date,
MAX( id ) OVER () AS max_id
FROM emp
)
WHERE id >= max_id - 200000
)
WHERE id >= min_id
GROUP BY status;
Also, you can use a date literal (rather than relying on implicit conversion of a string to a date using the NLS_DATE_FORMAT session parameter) and you do not need to use the TRUNC() function (since that may prevent Oracle using an index on the join_date column and would instead require a function-based index).
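One way to gain confidence in a rewrite like this is to run both forms against the same small data set and compare the results. The sketch below does that in SQLite from Python (SQLite 3.25 or later); the rows are hypothetical, and dates are stored as ISO strings with no time part, so the TRUNC() call from the question is not needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, status TEXT, join_date TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", [
    (100000, "A", "2018-03-02"),  # more than 200000 below max(id): ignored
    (250000, "A", "2018-02-15"),  # in range, but joined before the cutoff
    (300000, "A", "2018-03-05"),  # smallest in-range id on/after the cutoff
    (350000, "B", "2018-02-20"),  # early join date, but id >= 300000
    (400000, "A", "2018-03-10"),
])

nested = """
SELECT status, COUNT(DISTINCT id)
FROM emp
WHERE id >= (SELECT MIN(id) FROM emp
             WHERE id >= (SELECT MAX(id - 200000) FROM emp)
               AND join_date >= '2018-03-01')
GROUP BY status
ORDER BY status
"""

analytic = """
SELECT status, COUNT(DISTINCT id)
FROM (
    SELECT status, id,
           MIN(CASE WHEN join_date >= '2018-03-01' THEN id END) OVER () AS min_id
    FROM (
        SELECT status, id, join_date, MAX(id) OVER () AS max_id
        FROM emp
    )
    WHERE id >= max_id - 200000
)
WHERE id >= min_id
GROUP BY status
ORDER BY status
"""

nested_result = conn.execute(nested).fetchall()
analytic_result = conn.execute(analytic).fetchall()
print(nested_result, analytic_result)
```

Row 350000 is counted by both queries even though its join date is before the cutoff, because the date condition only decides where the id range starts.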
It is important to know whether id is a primary key (as columns with that name usually are). If it is not, you definitely need an index on id for the query to perform well (and I would also wonder what the purpose of the column is). If id is the primary key, you don't need the distinct, as the values will be unique anyway.
The select min(id) sub-select may be redundant: you have already found max(id - 200000), so you may not need the first min(id) greater than that. You can just use >= by itself (with the condition on the date added). Note that this filters every row by join_date, so it returns the same result only if join_date never decreases as id increases. By the way, I would write max(id) - 200000 instead; on some databases, it might work better.
The date comparison may be problematic. You should try an index on join_date if you haven't got one already, but the trunc might stop that from being used, so it would be best to remove that and make the other side of the compare use a TO_TIMESTAMP or TO_DATE to generate a corresponding literal as appropriate, setting the time to midnight.
But there can be problems with comparing timestamps due to timezones, etc. I'd need to know more about your setup to know whether that is likely to be a problem.
I have a table with >1M rows of data and 20+ columns.
Within my table (tableX) I have identified duplicate records (~80k) in one particular column (troubleColumn).
If possible I would like to retain the original table name and remove the duplicate records from my problematic column otherwise I could create a new table (tableXfinal) with the same schema but without the duplicates.
I am not proficient in SQL or any other programming language so please excuse my ignorance.
delete from Accidents.CleanedFilledCombined
where Fixed_Accident_Index
in(select Fixed_Accident_Index from Accidents.CleanedFilledCombined
group by Fixed_Accident_Index
having count(Fixed_Accident_Index) >1);
You can remove duplicates by running a query that rewrites your table (you can use the same table as the destination, or you can create a new table, verify that it has what you want, and then copy it over the old table).
A query that should work is here:
SELECT *
FROM (
SELECT
*,
ROW_NUMBER()
OVER (PARTITION BY Fixed_Accident_Index)
row_number
FROM Accidents.CleanedFilledCombined
)
WHERE row_number = 1
UPDATE 2019: To de-duplicate rows on a single partition with a MERGE, see:
https://stackoverflow.com/a/57900778/132438
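To illustrate why the ROW_NUMBER() rewrite differs from the DELETE ... HAVING COUNT > 1 statement quoted earlier, here is a small sketch in SQLite from Python (SQLite 3.25 or later, not BigQuery). The table name, the note column and the rows are assumptions made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cleaned (fixed_accident_index TEXT, note TEXT)")
conn.executemany("INSERT INTO cleaned VALUES (?, ?)", [
    ("A1", "first"), ("A1", "second"),
    ("B2", "only"),
    ("C3", "x"), ("C3", "y"), ("C3", "z"),
])

# Keep exactly one row per key (which one is arbitrary without ORDER BY).
kept_keys = [r[0] for r in conn.execute("""
    SELECT fixed_accident_index
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY fixed_accident_index) AS rn
        FROM cleaned
    )
    WHERE rn = 1
    ORDER BY fixed_accident_index
""")]

# The DELETE form removes every row whose key is duplicated,
# including the copy you probably wanted to keep.
conn.execute("""
    DELETE FROM cleaned
    WHERE fixed_accident_index IN (
        SELECT fixed_accident_index FROM cleaned
        GROUP BY fixed_accident_index
        HAVING COUNT(fixed_accident_index) > 1
    )
""")
keys_after_delete = [r[0] for r in conn.execute(
    "SELECT fixed_accident_index FROM cleaned ORDER BY fixed_accident_index"
)]

print(kept_keys)
print(keys_after_delete)
```

If a specific copy should survive, adding an ORDER BY inside the OVER clause makes the choice deterministic.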
An alternative to Jordan's answer - this one scales better when having too many duplicates:
#standardSQL
SELECT event.* FROM (
SELECT ARRAY_AGG(
t ORDER BY t.created_at DESC LIMIT 1
)[OFFSET(0)] event
FROM `githubarchive.month.201706` t
# GROUP BY the id you are de-duplicating by
GROUP BY actor.id
)
Or a shorter version (takes any row, instead of the newest one):
SELECT k.*
FROM (
SELECT ARRAY_AGG(x LIMIT 1)[OFFSET(0)] k
FROM `fh-bigquery.reddit_comments.2017_01` x
GROUP BY id
)
To de-duplicate rows on an existing table:
CREATE OR REPLACE TABLE `deleting.deduplicating_table`
AS
# SELECT id FROM UNNEST([1,1,1,2,2]) id
SELECT k.*
FROM (
SELECT ARRAY_AGG(row LIMIT 1)[OFFSET(0)] k
FROM `deleting.deduplicating_table` row
GROUP BY id
)
Not sure why nobody mentioned DISTINCT query.
Here is the way to clean duplicate rows:
CREATE OR REPLACE TABLE project.dataset.table
AS
SELECT DISTINCT * FROM project.dataset.table
If your schema doesn’t have any records - below variation of Jordan’s answer will work well enough with writing over same table or new one, etc.
SELECT <list of original fields>
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Fixed_Accident_Index) AS pos,
FROM Accidents.CleanedFilledCombined
)
WHERE pos = 1
In the more generic case, with a complex schema containing records/nested fields, etc., the above approach can be a challenge.
I would propose trying the Tabledata: insertAll API with rows[].insertId set to the respective Fixed_Accident_Index for each row.
In this case duplicate rows will be eliminated by BigQuery.
Of course, this will involve some client-side coding, so it might not be relevant for this particular question.
I haven't tried this approach myself either, but I feel it might be interesting to try :o)
If you have a large partitioned table and the duplicates are confined to a certain partition range, you don't want to overscan or process the whole table. Use the MERGE SQL below with predicates on the partition range:
-- WARNING: back up the table before this operation
-- FOR large size timestamp partitioned table
-- -------------------------------------------
-- -- To de-duplicate rows of a given range of a partitioned table, using surrogate_key as unique id
-- -------------------------------------------
DECLARE dt_start DEFAULT TIMESTAMP("2019-09-17T00:00:00", "America/Los_Angeles") ;
DECLARE dt_end DEFAULT TIMESTAMP("2019-09-22T00:00:00", "America/Los_Angeles");
MERGE INTO `gcp_project`.`data_set`.`the_table` AS INTERNAL_DEST
USING (
SELECT k.*
FROM (
SELECT ARRAY_AGG(original_data LIMIT 1)[OFFSET(0)] k
FROM `gcp_project`.`data_set`.`the_table` AS original_data
WHERE stamp BETWEEN dt_start AND dt_end
GROUP BY surrogate_key
)
) AS INTERNAL_SOURCE
ON FALSE
WHEN NOT MATCHED BY SOURCE
AND INTERNAL_DEST.stamp BETWEEN dt_start AND dt_end -- remove all data in partition range
THEN DELETE
WHEN NOT MATCHED THEN INSERT ROW
credit: https://gist.github.com/hui-zheng/f7e972bcbe9cde0c6cb6318f7270b67a
Easier answer, without a subselect
SELECT
*,
ROW_NUMBER()
OVER (PARTITION BY Fixed_Accident_Index)
row_number
FROM Accidents.CleanedFilledCombined
WHERE TRUE
QUALIFY row_number = 1
The WHERE TRUE is necessary because QUALIFY needs a WHERE, GROUP BY or HAVING clause.
Felipe's answer is the best approach for most cases. Here is a more elegant way to accomplish the same:
CREATE OR REPLACE TABLE Accidents.CleanedFilledCombined
AS
SELECT
Fixed_Accident_Index,
ARRAY_AGG(x LIMIT 1)[SAFE_OFFSET(0)].* EXCEPT(Fixed_Accident_Index)
FROM Accidents.CleanedFilledCombined AS x
GROUP BY Fixed_Accident_Index;
To be safe, make sure you back up the original table before you run this ^^
I don't recommend using the ROW_NUMBER() OVER() approach if you can avoid it, since you may run into BigQuery memory limits and get unexpected errors.
Update BigQuery schema with new table column as bq_uuid making it NULLABLE and type STRING
Create duplicate rows by running same command 5 times for example
insert into beginner-290513.917834811114.messages (id, type, flow, updated_at) Values(19999,"hello", "inbound", '2021-06-08T12:09:03.693646')
Check if duplicate entries exist
select * from beginner-290513.917834811114.messages where id = 19999
Use generate uuid function to generate uuid corresponding to each message
UPDATE beginner-290513.917834811114.messages
SET bq_uuid = GENERATE_UUID()
where id>0
Clean duplicate entries
DELETE FROM beginner-290513.917834811114.messages
WHERE bq_uuid IN
(SELECT bq_uuid
FROM
(SELECT bq_uuid,
ROW_NUMBER() OVER( PARTITION BY updated_at
ORDER BY bq_uuid ) AS row_num
FROM beginner-290513.917834811114.messages ) t
WHERE t.row_num > 1 );
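The steps above can be sketched end to end in SQLite from Python (SQLite 3.25 or later, not BigQuery). SQLite has no GENERATE_UUID(), so Python's uuid module is used as a stand-in, and the second message row (id 20000) is an assumption added to show that non-duplicated rows survive.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER, type TEXT, flow TEXT, updated_at TEXT, bq_uuid TEXT)"
)

# Insert the same row 5 times, plus one unrelated (hypothetical) row.
dup = (19999, "hello", "inbound", "2021-06-08T12:09:03.693646")
insert = "INSERT INTO messages (id, type, flow, updated_at) VALUES (?, ?, ?, ?)"
conn.executemany(insert, [dup] * 5)
conn.execute(insert, (20000, "bye", "outbound", "2021-06-09T08:00:00.000000"))

# Give every physical row its own UUID so duplicates become distinguishable.
for (rowid,) in conn.execute("SELECT rowid FROM messages").fetchall():
    conn.execute(
        "UPDATE messages SET bq_uuid = ? WHERE rowid = ?", (str(uuid.uuid4()), rowid)
    )

# Delete every row except the first one in each updated_at group.
conn.execute("""
    DELETE FROM messages
    WHERE bq_uuid IN (
        SELECT bq_uuid
        FROM (
            SELECT bq_uuid,
                   ROW_NUMBER() OVER (PARTITION BY updated_at ORDER BY bq_uuid) AS row_num
            FROM messages
        ) t
        WHERE t.row_num > 1
    )
""")

remaining = conn.execute(
    "SELECT id, COUNT(*) FROM messages GROUP BY id ORDER BY id"
).fetchall()
print(remaining)
```

Partitioning by updated_at follows the statement above; it treats two different messages with the same timestamp as duplicates, so partitioning by the real business key may be safer.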
Over in the SQL side, my data is looking like this:
Select f.id, f.TimeKey,t.CalendarYearMonth
from FactSubmission f
inner join DimTime t on t.TimeKey = f.TimeKey
order by f.Id asc
Sorting from MDX we have descending
SELECT
NON EMPTY ORDER(
[DimTime.CalendarYearMonth].[CalendarYearMonth].Members,
[DimTime.CalendarYearMonth].CurrentMember.Properties("MEMBER_KEY"),
DESC
) ON COLUMNS
FROM [PSE_FactSubmission]
And Ascending
The January dates aren't at the top of either sort, which suggests I'm sorting by the FactSubmission.ID key instead of DimTime.CalendarYearMonth
Is this how things are supposed to work? I'd like to pull back Jan,Feb,March.
DimTime.CalendarYearMonthNum is a column with data in the form 201501, 201502, 201503, etc. Here's an attempt at using this column to sort the CalendarYearMonth data.
Debugging Query to Select Keys
NonEmpty Query
Try ordering using a different property:
SELECT
NON EMPTY ORDER(
[DimTime.CalendarYearMonth].[CalendarYearMonth].Members,
[DimTime.CalendarYearMonth].CurrentMember.Properties("MEMBER_Value"),
DESC
) ON COLUMNS
FROM [PSE_FactSubmission];
or maybe this:
SELECT
NON EMPTY ORDER(
[DimTime.CalendarYearMonth].[CalendarYearMonth].Members,
[DimTime.CalendarYearMonth].CurrentMember.MEMBERValue,
DESC
) ON COLUMNS
FROM [PSE_FactSubmission];
In the above you should be good using DESC - sometimes you need to break the underlying hierarchical ordering by adding a B i.e. BDESC
From here I cannot see MEMBER_VALUE: http://mondrian.pentaho.com/documentation/mdx.php
...but there is a function .VALUE so maybe try the following:
SELECT
NON EMPTY ORDER(
[DimTime.CalendarYearMonth].[CalendarYearMonth].Members,
[DimTime.CalendarYearMonth].CurrentMember.Value,
DESC
) ON COLUMNS
FROM [PSE_FactSubmission];
Strange that the key doesn't work. What values do you get if you run something like this?
WITH MEMBER [KEYcheck] AS
[DimTime.CalendarYearMonth].CurrentMember.Properties("MEMBER_KEY")
//[DimTime.CalendarYearMonth].CurrentMember.MEMBER_KEY
//[DimTime.CalendarYearMonth].[CalendarYearMonth].CurrentMember.MEMBER_KEY
//[DimTime].CurrentMember.MEMBER_KEY
SELECT
[KEYcheck] ON 0,
[DimTime.CalendarYearMonth].[CalendarYearMonth].Members ON 1
FROM [PSE_FactSubmission];
You are doing an alphabetical sort. February->January->March.
For doing a sort based on the month number, there needs to be a field which maps January-1, February-2, March-3.
If you have such a column in the cube, use that to sort. If not, create a calculated member like below -
WITH MEMBER Measures.CalendarMonth AS
CASE [DimTime.CalendarYearMonth].CurrentMember
WHEN [DimTime.CalendarYearMonth].&[January] THEN 1
WHEN [DimTime.CalendarYearMonth].&[February] THEN 2
WHEN [DimTime.CalendarYearMonth].&[March] THEN 3
END
SELECT
NON EMPTY ORDER(
[DimTime.CalendarYearMonth].[CalendarYearMonth].Members,
Measures.CalendarMonth,
DESC
) ON COLUMNS
FROM [PSE_FactSubmission]
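The effect of sorting by name versus sorting by a mapped month number can be shown outside MDX with a few lines of Python. This is only an illustration of the idea behind the calculated member; the month_number mapping mirrors the CASE expression above.

```python
months = ["January", "February", "March"]

# Plain sort compares the captions as strings.
alphabetical = sorted(months)

# Sorting on a numeric key restores calendar order.
month_number = {"January": 1, "February": 2, "March": 3}
by_number = sorted(months, key=lambda m: month_number[m])
by_number_desc = sorted(months, key=lambda m: month_number[m], reverse=True)

print(alphabetical)
print(by_number)
print(by_number_desc)
```

A numeric year-month value such as 201501 would serve the same purpose as the mapping when the set spans more than one year.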
EDIT for Andrew
with member Measures.[MonthNum] as
NonEmpty
(
[DimTime.CalendarYearMonthNum].[CalendarMonthNum].members,
([DimTime.CalendarYearMonth].[CalendarYearMonth].currentmember, Measures.foo)
).item(0).membervalue
select
non empty
order
(
[DimTime.CalendarYearMonth].[CalendarYearMonth].members,
Measures.[MonthNum],
desc
) on rows
from [PSE_FactSubmission]
EDIT - with EXISTS
with member Measures.[MonthNum] as
EXISTS
(
[DimTime.CalendarYearMonthNum].[CalendarMonthNum].members,
[DimTime.CalendarYearMonth].[CalendarYearMonth].currentmember,
"SomeMeasureGroup"
).item(0).membervalue