What is a reasonable year datatype in Oracle? - oracle

Two possibilities come into my mind:
NUMBER(4)
DATE
Pro NUMBER(4):
No duplicate entries possible if specified as UNIQUE
Easy arithmetic (add one, subtract one)
Con NUMBER(4):
No Validation (e.g. negative numbers)
Pro DATE:
Validation
Con DATE:
Duplicate entries are possible ('2013-06-24', '2013-06-23', ...)
Not so easy arithmetic (add one year = ADD_MONTHS(date, 12))
As an additional requirement, the column gets compared with the current year, EXTRACT(YEAR FROM SYSDATE). In my opinion NUMBER(4) is the better choice. What do you think, is there another option I have missed?

You can restrict a date column to only have one entry per year if you want to, with a function-based index:
create unique index uq_yr on <table> (trunc(<column>, 'YYYY'));
Trying to insert two dates in the same year would give you an ORA-00001 error. Of course, if you don't want the rest of the date then it may be unhelpful or confusing to hold it, but on the other hand there may be secondary info you want to keep (e.g. if you're recording that an annual audit happened, holding the full date might not hurt anything). You could also have a virtual column (from 11g) that holds the trunc value for easier manipulation, perhaps.
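A minimal sketch of that combination, with hypothetical table and column names (audit_date stands in for whatever DATE column you actually have):
create table annual_audit (
  audit_date date not null,
  -- virtual column (11g+) holding the date truncated to the year
  audit_year date generated always as (trunc(audit_date, 'YYYY')) virtual
);
-- one row per year, enforced on the virtual column
create unique index uq_audit_yr on annual_audit (audit_year);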
You could also use an interval year(4) to month data type, and insert using numtoyminterval(2013, 'year'), etc. You could do interval arithmetic to add and subtract years, and extract to get the year back out as a number. That would probably be more painful than using a date though, overall.
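A quick sketch of what that would look like (hypothetical table name):
create table year_demo (yr interval year(4) to month);
insert into year_demo values (numtoyminterval(2013, 'year'));
-- interval arithmetic to add a year, and extract to get the number back out
select yr + numtoyminterval(1, 'year') as next_yr,
       extract(year from yr) as yr_number
from year_demo;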
If you're really only interested in the year (and you are not holding the month in a different column!) then a number is probably going to be simplest, with a check constraint to make sure it's a sensible number - number(4) doesn't stop you inserting 2.013 when you meant 2,013 (though you need to be converting from a string to hit that, and not have an NLS parameter mismatch), which would be truncated to just 2.
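For instance, a hedged sketch (the table name and the 1000-9999 range are just placeholders; pick whatever range is sensible for your data):
create table audit_years (
  audit_year number(4) not null,
  constraint uq_audit_year unique (audit_year),
  constraint ck_audit_year check (audit_year between 1000 and 9999)
);
With this in place, inserting 2.013 is rounded to 2 and then rejected by the check constraint instead of being stored silently.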

You've summed up the pros and cons quite well.
Provided that you name your field clearly, so that it's easy to understand that it contains year information, I would go with NUMBER(4) for simplicity and for storing no more and no less than what is necessary. And even if there is no validation, IMO negative years are valid :)

Depending on your use case you might also consider building a one-off date (dimension) table and linking to a specific row via an ID. That way, you have access to more information which you could later add to the dimension table (leap year flags, etc.), and the entries in your dimension can be validated on creation.
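A minimal sketch of that idea (the table names, columns, and leap-year flag are all just illustrative assumptions):
create table dim_year (
  year_id  number primary key,
  year_num number(4) not null unique,
  is_leap  char(1) not null check (is_leap in ('Y', 'N'))
);
create table fact_example (
  fact_id number primary key,
  year_id number not null references dim_year (year_id)
);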

Related

How to create a DAX cross-sectional measure?

I don't know if I even worded the question correctly, but I'm trying to create a measure that depends on what is showing in the pivot table (using PowerPivot). In the image I posted, "DealMonth" is an expression in the PowerQuery table itself that simply takes the start date of the employee and subtracts it from the month a deal was closed in. That will show how long it took for that salesperson to close the deal. "TenureMonths" is also an expression in the PowerQuery table that calculates the tenure of the person. The values populating this screenshot are coming from a total headcount measure created. What I'm trying to do is create a separate measure that will show when the "TenureMonths" is less than the "DealMonth." So if the TenureMonths is 5, then after DealMonth of 5, the value would be 0. Is this possible?
Screenshot
I should add the following information.
"DealMonth" - Comes from the FactData table
"TenureMonths" - Comes from the DimSalesStart table
These two tables are joined by name. I feel like I'm so close because I can see what I want. The second image below is a copy/paste of the pivot table result but with my edits to show what I'd want to have shown. Basically, if(TenureMonths >= DealMonth,1,0). The trouble seems to be that since they're in two different tables, I can't make it work. The rows in the fact table are transactions, but the rows in the dim table are just the people with their start and end dates.
Desired Result
This is possible with something like IF([measure1]<[measure2],BLANK(),[measure1]); however, without seeing more of the data it is hard to guide you specifically.
You need to create two separate measures, one for TenureMonths and one for DealMonth; depending on the data this can be done with an aggregator formula such as SUM, MIN, MAX, etc. (it depends on whether there will be more than one value).
Then reference those two measures in the formula pattern I mentioned above, and that should give you what you want.
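A hedged sketch of that pattern in DAX, using the table and column names from the question; the MAX aggregator and the measure names themselves are just assumptions:
TenureMonthsMeasure := MAX ( DimSalesStart[TenureMonths] )
DealMonthMeasure := MAX ( FactData[DealMonth] )
AdjustedValue := IF ( [TenureMonthsMeasure] < [DealMonthMeasure], BLANK(), [TenureMonthsMeasure] )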
I figured out a solution. I added a dimension table for DealMonth itself and joined to my fact table. That allowed me to do the formulas that I needed.

Converting dates from different centuries

I have a staging table which contains dates as strings in the format 'mm/dd/yy'. I have an Oracle 11g procedure to convert the string to a date before loading into the main table. I'm using to_date('03/20/34','mm/dd/rr') to convert into date format which is giving wrong output as 03/20/2034 whereas the correct date is 03/20/1934. Please help me get the correct output, as my table contains dates from both centuries.
"I'm using to_date('03/20/34','mm/dd/rr') to convert into date format which is giving wrong output as 03/20/2034 whereas the correct date is 03/20/1934. "
RR was a hack Oracle introduced in the last Millennium as part of the fight to resolve the Y2K bug. The standard date mask YY defaults the century to the current century. But in 1999 it was more likely that 01/01/00 meant 01/01/2000 rather than 01/01/1900. So the RR hack was to derive the century for dates using a fixed window: values 00-49 are given century 20, and values 50-99 are given century 19. Clearly some of the time this guess would be wrong, but the data corruption introduced was of a lower level than defaulting all dates to century 19.
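For illustration, this is what the RR mask does with the question's value and with a value on the other side of the 49/50 boundary:
select to_date('03/20/34', 'mm/dd/rr') as d34,
       to_date('03/20/50', 'mm/dd/rr') as d50
from dual;
-- d34 comes back as 20-MAR-2034, d50 as 20-MAR-1950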
The key point is, the windows are fixed. It was intended to be a temporary solution, because there wasn't time to switch all the legacy systems to use four-digit years before 2000 arrived. But the vision was always that all systems would be fixed in the long term, even if only through retirement or replacement. Certainly nobody expected that new systems would be built supporting two-digit years.
It is now 2017 and there is no excuse for systems to still be using two-digit years. Back in the old days storage was expensive, and shaving two digits from a date was a valuable space saving. Now it is just sloppiness.
Which obviously doesn't help you solve your problem. The short answer is there is no way to change the pivot used by RR. The best solution would be to enforce stricter validation on the data input aspect of your system, and insist on four-digit years. Whether that's feasible depends on your office politics. The other solution is to write your own conversion function:
create or replace function my_to_date (p_str varchar2) return date as
begin
  -- p_str is expected as 'mm/dd/yy'; characters 7 onwards hold the two-digit year
  if to_number(substr(p_str, 7)) <= 35 then
    return to_date(substr(p_str, 1, 6) || '19' || substr(p_str, 7), 'mm/dd/yyyy');
  else
    return to_date(substr(p_str, 1, 6) || '20' || substr(p_str, 7), 'mm/dd/yyyy');
  end if;
end;
/
Obviously you'll need to define the actual rules for deciding whether to use 19 or 20.
I also encountered an issue like this, when inserting date values from the late 90s. The format in the script I was given read DD-MON-YY, so the database read that as 20YY, instead of 19YY.
My very inelegant solution was to open the raw data file and simply add a "19" before the YY year values.

Informatica: If Current month data missing, use previous month

The project I'm working on has monthly data for gas prices in California. The data is taken from a website and loaded into a table. I've done this part - the data is current until March 2016. We are now in April, which does not have any data yet, so the next step I need to do is use March's data and place that into April.
Here is what my table looks like right now:
My question is: How do I add a new row with first column data of 201604 and use March's price?
Let me know if I need to add more information.
INSERT INTO GAS_PRICES(YYYYMM, GAS_PRICE) VALUES (201604, 2.68);
Commit;
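If you'd rather pull March's price from the table instead of hard-coding 2.68, a sketch (assuming YYYYMM is stored as a number):
INSERT INTO GAS_PRICES (YYYYMM, GAS_PRICE)
SELECT 201604, GAS_PRICE
FROM GAS_PRICES
WHERE YYYYMM = (SELECT MAX(YYYYMM) FROM GAS_PRICES);
COMMIT;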
I can't help thinking that your table structure is going to hurt later:
You don't appear to have a primary key, which helps with integrity and performance.
YYYYMM could be a key, but it's not clear whether you are storing it as a number or a string.
The use of YYYYMM as a column name might prove troublesome, as it is also an Oracle date format mask.
The naming convention of the GAS_PRICES table and GAS_PRICE column could cause confusion due to their similarity.
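A hedged sketch of one tighter structure (the column names and types here are just suggestions, not a requirement):
CREATE TABLE GAS_PRICES (
  PRICE_MONTH DATE NOT NULL,          -- first day of the month, e.g. DATE '2016-04-01'
  GAS_PRICE   NUMBER(6,3) NOT NULL,
  CONSTRAINT GAS_PRICES_PK PRIMARY KEY (PRICE_MONTH)
);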

How do I specify invalid time in a time dimension?

I am building a time dimension for only time in my data warehouse. I already have a date dimension.
How do I denote an unknown time? In my DimDate dimension, I marked 01/01/1753 as being reserved for unknown dates, but I think a time will be a bit harder. We don't allow NULLs in our fact tables. How do I do this, and what might that row look like?
You state that "We don't allow NULLs in our fact tables" but ask "How do I denote an unknown time?"
Assuming your fact table uses a TIME data type and enforces a NOT NULL constraint on data arriving from the source system, you simply cannot insert an unknown/invalid time into your fact table and hence should have no problem.
The obvious exception is a business-wise invalid value reported by the source system, such as the '00:59:59.9999999' Sunil proposed, but this is an uncommon and unstable solution for obvious reasons (changing requirements can easily turn this value into a valid one).
If you chose to allow (and I hope you did) records with NULL values or invalid dates from your source system to enter the fact table, then the best practice would be to use surrogate keys on your DimTime and insert them as foreign keys into your fact tables – this easily lets you represent both valid and invalid values in your dimension.
This approach can also accommodate the business-wise invalid value ('00:59:59.9999999'): such a value simply gets FK_DimTime = -1.
I strongly advise allowing specific types of garbage from source systems (i.e. invalid/missing/NULL date/time values) to enter the fact tables, as long as you clearly mark it in the relevant dimensions, as this tends to drive users to improve data quality in the source systems.
Here is some background on the matter
https://www.kimballgroup.com/1997/07/its-time-for-time/
https://www.kimballgroup.com/2004/02/design-tip-51-latest-thinking-on-time-dimension-tables/
It can look like anything you want. Most dimensions have a 'display name' of some kind, so your dimensions could look something like this:
create table dbo.DimDate (DateID int, DateValue date, DisplayDate nvarchar(20))
go
-- this is an unknown date; 1753-01-01 is only there because we need some valid date value
insert into dbo.DimDate values (1, '1753-01-01', 'Unknown')
go
-- this is the real date 1 Jan 1753
insert into dbo.DimDate values (2, '1753-01-01', '01 Jan 1753')
go
create table dbo.DimTime (TimeID int, TimeValue time, DisplayTime nvarchar(20))
go
-- this is an unknown time; 00:00 is only there because we need some valid time value
insert into dbo.DimTime values (1, '00:00', 'Unknown')
go
-- this is the real time value for midnight
insert into dbo.DimTime values (2, '00:00', 'Midnight')
go
Of course, this assumes that your reporting tool and users use the DisplayDate and DisplayTime columns for filtering instead of the DateValue and TimeValue columns directly, but that's simply a matter of training and standards and whatever solution you adopt needs to be understood anyway.
There are other alternatives such as a flag column for unknown values, or a convention that a negative TimeID indicates an unknown value. But those are less obvious and harder to maintain than an explicit row value, in my opinion.
Just create a DimTime record with a technical surrogate key of -1 and populate the time column with the value '00:59:59.9999999'. This is a time your DWH is unlikely ever to capture (down to the last digit of precision), so it will always equate to "unknown" in your reports or queries when you want to apply a filter like:
EventTime < #ReportTime AND EventTime <> '00:59:59.9999999'
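Borrowing the dbo.DimTime shape from the previous answer, that reserved row might look like this (the display text is just an assumption):
insert into dbo.DimTime values (-1, '00:59:59.9999999', 'Unknown')
go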
Hope this is a viable solution to your problem.

Oracle - Fetch date/time in milliseconds from DATE datatype field

I have a last_update_date column defined as a DATE field.
I want to get time in milliseconds.
Currently I have:
TO_CHAR(last_update_date,'YYYY-DD-MM hh:mi:ss am')
But I want to get milliseconds as well.
I googled a bit and think DATE fields will not have milliseconds; only TIMESTAMP fields will.
Is there any way to get milliseconds? I do not have option to change data type for the field.
DATE fields in Oracle only store the time down to the second, so there is no way to provide anything more precise than that. If you want more precision, you must use another type such as TIMESTAMP.
Here is a link to another SO question regarding Oracle date and time precision.
As RC says, the DATE type only supports a granularity down to the second.
If converting to TIMESTAMP is truly not an option then how about the addition of another numerical column that just holds the milliseconds?
This option would be more cumbersome to deal with than a TIMESTAMP column but it could be workable if converting the type is not possible.
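For instance, a sketch with hypothetical table and column names; whatever writes the row would have to keep the extra column in step with the DATE column:
ALTER TABLE my_table ADD (last_update_millis NUMBER(3));
-- reading the combined value back out
SELECT TO_CHAR(last_update_date, 'YYYY-MM-DD HH24:MI:SS') || '.' ||
       LPAD(NVL(last_update_millis, 0), 3, '0') AS last_update_with_ms
FROM my_table;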
In a similar situation, where I couldn't change the fields in a table (couldn't afford to 'break' third-party software) but needed sub-second precision, I added a 1:1 supplemental table and an after-insert trigger on the original table to post the timestamp into the supplemental table.
If you only need to know the ORDER of records being added within the same second, you could do the same thing, only using a sequence as a data source for the supplemental field.
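A minimal sketch of that supplemental-table approach, with entirely hypothetical table and column names (it assumes the original table has a numeric ID primary key):
CREATE TABLE my_table_ts (
  row_id     NUMBER PRIMARY KEY,   -- same key as the original table
  created_ts TIMESTAMP(3) NOT NULL -- sub-second capture time
);
CREATE OR REPLACE TRIGGER trg_my_table_ts
AFTER INSERT ON my_table
FOR EACH ROW
BEGIN
  INSERT INTO my_table_ts (row_id, created_ts)
  VALUES (:NEW.id, SYSTIMESTAMP);
END;
/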
