My query looks something like this:
select datesent, count(*) the_count from receivedmessaged where status=5000
and datesent>(to_date('20130101', 'YYYYMMDD')) group by datesent
What I'm looking for is a table that has the count of messages with a status of 5000 per day, newer than a certain date. What I'm getting is a table with the same dates over and over. What I think is happening is that there is a hidden time part in that datesent field, and it's grouping the entries by the exact time they were sent, rather than just looking at the date. Can anyone confirm this and tell me how I can fix it? Thanks!
What I think is happening is that there is a hidden time part in that datesent field, and it's grouping the entries by the exact time they were sent, rather than just looking at the date.
That's very probably what's happening. Try this instead:
select TRUNC(datesent), count(*) the_count from receivedmessaged where status=5000
and datesent>(to_date('20130101', 'YYYYMMDD')) group by TRUNC(datesent)
TRUNC will remove the "time part" and allow you to group by day.
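To see the time part for yourself, a quick check along these lines may help. It is only a sketch that reuses the table and column names from the question; the format mask is just one way to make the stored time visible.

```sql
-- Show the stored value next to its truncated form for a few rows.
-- TRUNC(date) with no format argument resets the time to midnight.
SELECT TO_CHAR(datesent, 'YYYY-MM-DD HH24:MI:SS')        AS stored_value,
       TO_CHAR(TRUNC(datesent), 'YYYY-MM-DD HH24:MI:SS') AS truncated_value
FROM receivedmessaged
WHERE ROWNUM <= 5;
```

If the two columns differ, the rows carry a time of day, which explains why grouping on the raw column repeats the same date.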
Please note that wrapping datesent in TRUNC will prevent Oracle from using a regular index on that column. Take a look at your execution plan and, if needed, add a function-based index on TRUNC(datesent).
Of course, using TRUNC would solve your issue, and using a function-based index would make it efficient.
However, from 11g onwards, you could also use virtual columns. In your case, you can add a virtual column defined as new_date GENERATED ALWAYS AS (TRUNC(date_column)) and then use this virtual column in your query. If you need better performance, you can create an index on it.
NOTE: Indexes defined against virtual columns are equivalent to function-based indexes.
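The two indexing options mentioned in these answers could look roughly like the sketch below. The index names and the virtual column name datesent_day are made up for illustration; the table and column come from the question. The options are alternatives, so pick one rather than running both.

```sql
-- Option 1: function-based index on the truncated date.
CREATE INDEX receivedmessaged_day_fbi
    ON receivedmessaged (TRUNC(datesent));

-- Option 2 (11g and later): a virtual column plus an ordinary index on it.
-- datesent_day is a hypothetical name.
ALTER TABLE receivedmessaged
    ADD (datesent_day DATE GENERATED ALWAYS AS (TRUNC(datesent)) VIRTUAL);

CREATE INDEX receivedmessaged_day_ix
    ON receivedmessaged (datesent_day);

-- With option 2 in place, the query can group on the virtual column.
SELECT datesent_day, COUNT(*) AS the_count
FROM receivedmessaged
WHERE status = 5000
  AND datesent > TO_DATE('20130101', 'YYYYMMDD')
GROUP BY datesent_day
ORDER BY datesent_day;
```

Whether either index actually gets used depends on the data and the optimizer, so it is worth checking the execution plan afterwards.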
There are records in table for particular date. But when I query with that value, I am unable to filter the records.
select * from TBL_IPCOLO_BILLING_MST
where LAST_UPDATED_DATE = '03-09-21';
The dates are in dd-mm-yy format.
To the answer by Valeriia Sharak, I would just add a few things since your question is tagged Oracle. I was going to add this as a comment to her answer, but it's too long.
First, it is bad practice to compare dates to strings. Your query, for example, would not even execute for me -- it would end with ORA-01843: not a valid month. That is because Oracle must do an implicit type conversion to convert your string "03-09-21" to a date and it uses the current NLS_DATE_FORMAT setting to do that (which in my system happens to be DD-MON-YYYY).
Second, as was pointed out, your comparison is probably not matching rows due to LAST_UPDATED_DATE having hours, minutes, and seconds. But a more performant solution for that might be:
...
WHERE LAST_UPDATED_DATE >= TO_DATE('03-09-21','DD-MM-YY')
AND LAST_UPDATED_DATE < TO_DATE('04-09-21','DD-MM-YY')
This makes the comparison without wrapping LAST_UPDATED_DATE in a TRUNC() function. This could perform better in either of the following circumstances:
If there is an index on LAST_UPDATED_DATE that would be useful in your query
If the table with LAST_UPDATED_DATE is large and is being joined to other tables (because it makes it easier for Oracle to estimate the number of rows from your table that are inputs to the join).
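As a complete statement, the range comparison could also be written with ANSI date literals, which do not depend on NLS_DATE_FORMAT at all. This sketch assumes that '03-09-21' in the question means 3 September 2021, as the stated dd-mm-yy format suggests.

```sql
-- Half-open range: everything on 3 September 2021, whatever the time of day.
-- The column is left unwrapped, so a plain index on it remains usable.
SELECT *
FROM TBL_IPCOLO_BILLING_MST
WHERE LAST_UPDATED_DATE >= DATE '2021-09-03'
  AND LAST_UPDATED_DATE <  DATE '2021-09-04';
```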
Your column might contain hours, minutes, and seconds, even if they are not displayed.
When you filter on a date without a time, Oracle implicitly uses midnight as the time. So basically you are filtering on '03-09-21 00:00:00'.
Try to trunc your column:
select * from TBL_IPCOLO_BILLING_MST
where trunc(LAST_UPDATED_DATE) = to_date('03-09-21', 'DD-MM-YY');
I hope I understood your question correctly.
Oracle docs
I have a question about this. I have a table update where I have to backfill over a year's worth of data, and because of how the code works I have to update one day at a time (which takes 4-5 minutes per day). Does anyone know how I can do this more efficiently, by setting up a list of dates so that it can run in the background?
For example, if I set a variable called :reqdate, which is the date field, and I have a list of dates from a query (e.g. 01/01/20, 02/01/20... 04/04/20), is there something I can do to get SQL to run this repeatedly? E.g. :reqdate=01/01/20, then when that's done it automatically does 02/01/20, and so on.
Thanks
If I understood you correctly, the easiest way is to use a MERGE statement like:
merge into dest_table t
using (
select date'2020-01-01'+N as dt
from xmltable('0 to 10' columns N int path '.')
) dates
on (t.date_col = dates.dt)
when matched then update
set ...
Though I think you should redesign your update into a simple update like
update (select ... from) t
set ...
where t.dt between date'2020-01-01' and date'2020-01-20'
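If the per-day logic really cannot be folded into one set-based statement, a PL/SQL loop over generated dates is another option. This is a different technique from the MERGE and UPDATE above, sketched under several assumptions: the database is Oracle (as the answer's syntax suggests), the date range runs from 1 January 2020 to 4 April 2020, and backfill_one_day is a hypothetical procedure wrapping the existing daily update that takes the date as its parameter.

```sql
BEGIN
  -- Generate one row per day in the range, then process the days in turn.
  FOR d IN (
    SELECT DATE '2020-01-01' + LEVEL - 1 AS reqdate
    FROM dual
    CONNECT BY LEVEL <= DATE '2020-04-04' - DATE '2020-01-01' + 1
  ) LOOP
    backfill_one_day(d.reqdate);  -- hypothetical: your existing per-day update
    COMMIT;                       -- commit per day so a failure loses one day at most
  END LOOP;
END;
/
```

This still costs the same 4-5 minutes per day; it only removes the need to start each day by hand.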
I am a SQL Server guy and just started working on Netezza. One thing that has come up is a daily query to find out the size of a table filtered by year: 2016, 2015, 2014, ...
What I am using now is something like below and it works for me, but I wonder if there is a better way to do it:
select count(1)
from table
where extract(year from datecolumn) = 2016
extract is a built-in function, and to my knowledge applying a function to a table with 10 billion+ rows is unimaginable in SQL Server.
Thank you for your advice.
The only problem I see with the query is the where clause, which executes a function on the 'variable' side. That effectively disables zone maps and thus forces Netezza to scan all data pages, not only those with data from that year.
Instead write something like:
select count(1)
from table
where datecolumn between '2016-01-01' and '2016-12-31'
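One caveat with between: if datecolumn is a timestamp rather than a pure date, rows from 31 December after midnight fall outside the range. A half-open range avoids that and still leaves the column unwrapped. In this sketch my_table is a placeholder for the real table name.

```sql
-- Half-open range covering all of 2016, including any time on 31 December.
select count(1)
from my_table
where datecolumn >= '2016-01-01'
  and datecolumn <  '2017-01-01'
```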
A more generic alternative is to create a 'date dimension table' with one row per day in your tables (and a couple of years into the future).
This is an example for Postgres: https://medium.com/#duffn/creating-a-date-dimension-table-in-postgresql-af3f8e2941ac
This enables you to write code like this:
select count(1)
from table t join d_date d on t.datecolumn=d.date_actual
where year_actual=2016
You may not have the generate_series() function on your system, but a 'select row_number()...' can do the same trick. A download is available here: https://www.ibm.com/developerworks/community/wikis/basic/anonymous/api/wiki/76c5f285-8577-4848-b1f3-167b8225e847/page/44d502dd-5a70-4db8-b8ee-6bbffcb32f00/attachment/6cb02340-a342-42e6-8953-aa01cbb10275/media/generate_series.tgz
A couple of further notes on 'date interval' where clauses:
Those columns are the most likely candidates for a zone map optimization. Add an 'organize on (datecolumn)' at the bottom of your table DDL and reorganize your table. That will cause Netezza to move records around onto pages with similar dates, and the query times will be better.
Furthermore, you should ensure that the 'distribute on' clause for the table results in an even distribution across data slices if the table is big. The execution of the query will never be faster than the slowest data slice.
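The two suggestions above might be applied to an existing table roughly as follows. This is a sketch from memory of Netezza's syntax rather than something checked against a specific release, so verify it against your version's documentation; my_table is a placeholder name.

```sql
-- Declare the organizing key, then groom so existing rows are rearranged.
alter table my_table organize on (datecolumn);
groom table my_table records all;

-- Rough skew check: row counts per data slice should be similar.
select datasliceid, count(*) as row_count
from my_table
group by datasliceid
order by row_count desc;
```

If one data slice holds far more rows than the others, the 'distribute on' column is probably a poor choice for this table.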
I hope this helps
This might be a lack of very basic knowledge, but I just can't figure it out. Searching for the answer and trial and error haven't helped much.
Returning all recordsets from a table (SELECT * FROM X) --> no problem.
Returning today's date (SELECT TO_CHAR(SYSDATE, 'DD-MM-YYYY') FROM DUAL) --> no problem.
Returning all recordsets from the same table as well as today's date --> no luck. I have tried subselects, union, joins, with-statements, ... it's driving me nuts.
When I name the columns I want returned (SELECT Columnname1, Columnname2, to_char(sysdate....)) it works. This problem seems to only occur when using wildcards.
How do I get Oracle to return "all columns, today's date"?
Thanks!
You have to prefix the wildcard with the table name (or alias, if you've used one):
SELECT X.*, TO_CHAR(SYSDATE, 'DD-MM-YYYY') AS TODAYS_DATE FROM X
Using the wildcard is generally not considered a good idea, as you have no control over the order the columns are listed (if the table was built differently in different environments) and anyone consuming this output may be thrown off if the table definition changes in the future, e.g. by adding another column. It's better to list all the columns individually.
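For comparison, here are the alias form and the explicit-column form side by side. The alias t and the column names column_name1 and column_name2 are placeholders, since the question does not give the real ones.

```sql
-- Wildcard qualified by a table alias: accepted by Oracle.
SELECT t.*, TO_CHAR(SYSDATE, 'DD-MM-YYYY') AS todays_date
FROM X t;

-- Explicit column list: the output stays stable if the table changes later.
SELECT t.column_name1,
       t.column_name2,
       TO_CHAR(SYSDATE, 'DD-MM-YYYY') AS todays_date
FROM X t;
```

An unqualified `SELECT *, ...` is what Oracle rejects; once the asterisk is prefixed with the table name or alias, further select-list items are allowed.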
I have the following query (generated by Entity Framework with standard paging. This is the inner query and I added the TOP 438 part):
SELECT TOP 438 [Extent1].[Id] AS [Id],
[Extent1].[MemberType] AS [MemberType],
[Extent1].[FullName] AS [FullName],
[Extent1].[Image] AS [Image],
row_number() OVER (ORDER BY [Extent1].[FullName] ASC) AS [row_number]
FROM [dbo].[ShowMembers] AS [Extent1]
WHERE 3 = CAST( [Extent1].[MemberType] AS int)
The ShowMembers table has about 11K rows, but only 438 with MemberType == 3. The 'Image' column is of type nvarchar(2000) and holds the URL to the image on a CDN. If I include this column in the query (only in the SELECT part), the query chokes up somehow and takes anywhere from 2 to 30 seconds to return results (it differs between runs). If I comment out that column, the query runs fast as expected. If I include the 'Image' column but comment out the row_number column, the query also runs fast as expected.
Obviously, I've been too liberal with the size of the URL, so I started playing around with the size. I found out that if I set the Image column to nvarchar(884), then the query runs fast as expected. If I set it up to 885 it's slow again.
This is not bound to one column, but to the size of all columns in the SELECT statement. If I just increase the size by one, performance differences are obvious.
I am not a DB expert, so any advice is welcomed.
PS In local SQL Server 2012 Express there are no performance issues.
PPS Running the query with OFFSET 0 ROWS FETCH NEXT 438 ROWS ONLY (without the row_number column of course) is also slow.
Row_number has to sort all the rows to get you things in the order you want. Adding a larger column into the result set implies that it all gets sorted and thus is much slower/does more IO. You can see this, btw, if you enable "set statistics io on" and "set statistics time on" in SSMS when debugging problems like this. It will give you some insight into the number of IOs and other operations happening at runtime in the query:
https://learn.microsoft.com/en-us/sql/t-sql/statements/set-statistics-io-transact-sql?view=sql-server-2017
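In SSMS, that check might look like the following, wrapped around the query from the question. The statistics appear on the Messages tab rather than in the result grid.

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- The query under investigation, unchanged from the question.
SELECT TOP 438
       [Extent1].[Id] AS [Id],
       [Extent1].[MemberType] AS [MemberType],
       [Extent1].[FullName] AS [FullName],
       [Extent1].[Image] AS [Image],
       row_number() OVER (ORDER BY [Extent1].[FullName] ASC) AS [row_number]
FROM [dbo].[ShowMembers] AS [Extent1]
WHERE 3 = CAST([Extent1].[MemberType] AS int);

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Running it once with and once without the Image column should make the difference in reads and elapsed time visible.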
In terms of what you can do to make this query go faster, I encourage you to think about some things that may change the design of your database schema a bit.
First, consider whether you actually need the rows sorted in a specific order at all. If you don't need things in order, it will be cheaper to iterate over them without the row_number (by a measurable amount). So, if you just want to conceptually iterate over each entry once, you can do that by doing an order by something more static that is still monotonic such as the identity column.
Second, if you do need to have things in sorted order, then consider whether they are changing frequently/infrequently. If it is infrequent, it may be possible to just compute and persist a column value into each row that has the relative order that you want (and update it each time you modify the table). In this model, you could index the new column and then request things in that order (in the top-level order by in the query - row_number not needed).
If you do need things dynamically computed like you are doing and you need things in an exact order all the time, your final option is to move the URL to a second table and join with it after the row_number. This will avoid the sort being "wide" in the computation of row_number.
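The last option, moving the URL to a second table, could be sketched like this. The table dbo.ShowMemberImages and its columns MemberId and Image are hypothetical; the idea is only that the sort behind row_number sees narrow rows and the wide column is attached afterwards.

```sql
-- Number the narrow rows first; the wide Image column is not part of the sort.
WITH ordered AS (
    SELECT m.Id,
           m.MemberType,
           m.FullName,
           ROW_NUMBER() OVER (ORDER BY m.FullName ASC) AS rn
    FROM dbo.ShowMembers AS m
    WHERE CAST(m.MemberType AS int) = 3
)
-- Attach the URL from the hypothetical side table after numbering.
SELECT o.Id, o.MemberType, o.FullName, i.Image, o.rn
FROM ordered AS o
LEFT JOIN dbo.ShowMemberImages AS i
       ON i.MemberId = o.Id
ORDER BY o.rn;
```

The optimizer is free to reorder operations, so it is worth confirming in the execution plan that the sort really happens before the join.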
Best of luck to you