Informatica: If current month data is missing, use previous month - Oracle

The project I'm working on has monthly data for gas prices in California. The data is taken from a website and loaded into a table. I've done this part - the data is current until March 2016. We are now in April, which does not have any data yet, so the next step is to take March's data and carry it forward into April.
Here is what my table looks like right now:
My question is: How do I add a new row with first column data of 201604 and use March's price?
Let me know if I need to add more information.

INSERT INTO GAS_PRICES(YYYYMM, GAS_PRICE) VALUES (201604, 2.68);
COMMIT;
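If you want the new row to pick up the most recent month's price automatically rather than hard-coding 2.68, a sketch along these lines should work (assuming YYYYMM is stored as a number like 201603):
INSERT INTO GAS_PRICES (YYYYMM, GAS_PRICE)
SELECT 201604, GAS_PRICE
FROM   GAS_PRICES
WHERE  YYYYMM = (SELECT MAX(YYYYMM) FROM GAS_PRICES);
COMMIT;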
I can't help thinking that your table structure is going to hurt later.
You don't appear to have a primary key, which helps with integrity and performance.
YYYYMM could be a key, but it's not clear whether you are storing it as a number or a string.
The use of YYYYMM as a column name might prove troublesome, as that is part of Oracle's date format syntax.
Your naming convention of a GAS_PRICES table and a GAS_PRICE column could cause confusion due to their similarity.
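One way to address those points, sketched here with assumed column types and a renamed month column (the original types aren't shown in the question):
CREATE TABLE GAS_PRICES (
  PRICE_MONTH NUMBER(6)   NOT NULL PRIMARY KEY,  -- e.g. 201604
  PRICE       NUMBER(6,3) NOT NULL
);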

How to create a DAX cross-sectional measure?

I don't know if I even worded the question correctly, but I'm trying to create a measure that depends on what is showing in the pivot table (using PowerPivot). In the image I posted, "DealMonth" is an expression in the PowerQuery table itself that simply takes the start date of the employee and subtracts it from the month a deal was closed in. That will show how long it took for that salesperson to close the deal. "TenureMonths" is also an expression in the PowerQuery table that calculates the tenure of the person. The values populating this screenshot are coming from a total headcount measure created. What I'm trying to do is create a separate measure that will show when the "TenureMonths" is less than the "DealMonth." So if the TenureMonths is 5, then after DealMonth of 5, the value would be 0. Is this possible?
Screenshot
I should add the following information.
"DealMonth" - Comes from the FactData table
"TenureMonths" - Comes from the DimSalesStart table
These two tables are joined by name. I feel like I'm so close because I can see what I want. The second image below is a copy/paste of the pivot table result but with my edits to show what I'd want to have shown. Basically, if(TenureMonths >= DealMonth,1,0). The trouble seems to be that since they're in two different tables, I can't make it work. The rows in the fact table are transactions, but the rows in the dim table are just the people with their start and end dates.
Desired Result
This is possible with something like IF([measure1] < [measure2], BLANK(), [measure1]); however, without seeing more of the data it is hard to guide you specifically.
You will need to create two separate measures, one for TenureMonths and one for DealMonth. Depending on the data, these can be built with an aggregation formula such as SUM, MIN, MAX, etc. (it depends on whether there will be more than one value).
Then reference those two measures in the formula pattern I mentioned above, and that should give you what you want.
I figured out a solution. I added a dimension table for DealMonth itself and joined to my fact table. That allowed me to do the formulas that I needed.

Table design: conversion rates on turnaround times

I have the following table called "status" in the source:
and the following is the target table requirement:
My issue is that I can't work out how to write the job so that the figures are laid out against a time dimension column. It's easy in Excel to simply divide one cell by another, but I am not able to do it using the ETL tool.
I am sure someone must have faced and resolved similar requirement.
Please help.
Caveat: I have not worked with MySQL or SAP BODS; I am using SQL Server tools as a platform to explain the solution. I am not including any code, just listing the high-level steps that I recommend.
I believe you should be able to tackle this problem by:
Build a date table (or date dimension, in DWH language). This table should have dates and the associated financial years. Refer to the following location for info:
https://www.mssqltips.com/sqlservertip/4054/creating-a-date-dimension-or-calendar-table-in-sql-server/
Select the data into a staging table and increment a counter column for each of the source columns (requests in, offers issued, etc.). The date dimension holds the month associated with each date, so you should be able to map the counts, as in the sketch after these steps.
You should then be able to pivot the data into your destination table.
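As a rough illustration of the counting step, here is a sketch in SQL Server syntax; stg_status, dim_date and the column names are assumed, not taken from the original post:
-- count one of the source metrics per month via the date dimension
SELECT d.month_key,
       COUNT(*) AS requests_in
FROM   stg_status s
JOIN   dim_date  d ON d.calendar_date = s.request_date
GROUP  BY d.month_key;
Each source column (requests in, offers issued, etc.) would get a similar count keyed on its own date column, and the monthly results can then be pivoted into the target layout.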
Hope that helps.
Cheers
Nithin

Track the rows which were updated or encrypted

I want to scrub (or encrypt) the email information in a few tables for rows which are older than a few years.
I am planning to do this as part of a job. The next time I run the job, how can I omit the rows which have already been scrubbed or encrypted?
I am looking for an approach with good performance.
"I want to scrub(or encrypt) the email information from a few tables which are older than a few years"
I hope this means you have a date column on these tables which you can use to determine which ones need to be scrubbed. The most efficient way of tackling the job is to track that date in an operational table, recording the most recent date scrubbed.
For example, say you have ten years' worth of data and you need to scrub records which are more than four years old. This would work:
update t23
set email = null
where date_created < add_months(sysdate, -48);
But it seems like you want to batch things up, so build a tracking table, which at its simplest would be:
create table tracker (
    last_date_scrubbed date
);
Populate last_date_scrubbed with a really old date, say DATE '2010-01-01'.
Now you can write a query like this
update t23
set email = null
where date_created
< (select last_date_scrubbed + interval '1' year from tracker);
That will clean all records older than 2011. Increment the date in the tracker table by one year, then run the query again to clean the records from 2011. Repeat until you get to your target state of cleanliness, at which point you can switch to running the query monthly, with an interval of one month, or whatever suits.
Obviously you should proceduralize this. A procedure is the best way to encapsulate the steps and make sure everything stays in step. You can also use the database scheduler to run the procedure.
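A minimal sketch of such a procedure, using the t23 and tracker tables from the examples above (error handling and where to commit are design choices left to you):
create or replace procedure scrub_next_batch as
  v_cutoff date;
begin
  select last_date_scrubbed + interval '1' year
  into   v_cutoff
  from   tracker;

  update t23
  set    email = null
  where  date_created < v_cutoff;

  -- sql%rowcount reports how many rows this run cleaned
  dbms_output.put_line(sql%rowcount || ' rows scrubbed up to ' || v_cutoff);

  update tracker
  set    last_date_scrubbed = v_cutoff;

  commit;
end scrub_next_batch;
/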
"there is one downside to this approach. I thought that you want to be free upon choosing which rows to be updated."
I don't see any requirement to track which individual rows have been scrubbed. After all, the end state is that every record older than a certain date has been scrubbed. When I have done jobs like this previously, all anybody wanted to know was, "how many rows have we done so far and how many have we still got to do?" That can be answered by tracking sql%rowcount for each run.
For the best performance, you can add a flag column to your main table, a column like IsEncrypted. Then every time you run a query that should only touch the not-yet-encrypted rows, you simply add a WHERE condition on IsEncrypted being false to restrict to those rows. There are other ways, though.
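A minimal sketch of that idea; the table name emails and the 48-month cutoff are assumptions, not from the question:
alter table emails add (is_encrypted number(1) default 0 not null);

update emails
set    email        = null,  -- or the encrypted value
       is_encrypted = 1
where  is_encrypted = 0
and    date_created < add_months(sysdate, -48);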
EDIT
Another way is to create a logging table. Basically, this table records whatever extra information you want about a given ID from another table. Create a table called EncryptionLogger; in it you would have at least two columns: EmailTableId and IsEncrypted. Then in any query you can simply select the rows whose IDs are NOT IN this table.
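A sketch of that structure, again with assumed table, column, and key names:
create table EncryptionLogger (
  EmailTableId number    primary key,
  IsEncrypted  number(1) default 1 not null
);

-- rows not yet processed
select e.*
from   emails e
where  e.id not in (select l.EmailTableId from EncryptionLogger l);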

How to get the last "row" in a long Cassandra row

In Cassandra, a row can be very long and store time-keyed data. For example, one row could look like the following:
RowKey: "weather"
name=2013-01-02:temperature, value=90,
name=2013-01-02:humidity, value=23,
name=2013-01-02:rain, value=false,
name=2013-01-03:temperature, value=91,
name=2013-01-03:humidity, value=24,
name=2013-01-03:rain, value=false,
name=2013-01-04:temperature, value=90,
name=2013-01-04:humidity, value=23,
name=2013-01-04:rain, value=false.
9 columns of 3 days' weather info.
Time is part of the key in this row, so the columns within the row are ordered by time.
My question is: is there any way for me to do a query like "what is the last/first day's humidity value in this row?" I know I could use an ORDER BY clause in CQL, but since this row is already sorted by time, there should be some way to just get the first/last one directly instead of doing another sort. Or does Cassandra already optimize the ORDER BY under the hood?
Another way I could think of is, store another column in this row called "last_time_stamp" that always updates itself as new data is inserted in. But that would require one more update every time I insert new weather data.
Thanks for any suggestions! :)
Without seeing more of your actual table, I suggest using a timestamp (or timeuuid if there is a possibility of collisions) as the second component of a compound primary key. Using this, you can get the last "row" by selecting with ORDER BY t DESC LIMIT 1.
You could also change the clustering order in your schema to order it naturally for "last N" queries.
Please see examples and linked resource in this answer.
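For illustration only (separate from the linked answer), here is a minimal sketch of that schema idea with assumed table and column names:
CREATE TABLE weather (
    station     text,
    t           timestamp,
    temperature int,
    humidity    int,
    rain        boolean,
    PRIMARY KEY (station, t)
) WITH CLUSTERING ORDER BY (t DESC);

-- newest reading for the partition (no extra sort needed thanks to the clustering order):
SELECT humidity FROM weather WHERE station = 'weather' LIMIT 1;

-- oldest reading:
SELECT humidity FROM weather WHERE station = 'weather' ORDER BY t ASC LIMIT 1;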

What is a reasonable year datatype in Oracle?

Two possibilities come into my mind:
NUMBER(4)
DATE
Pro NUMBER(4):
No duplicate entries possible if specified as UNIQUE
Easy arithmetic (add one, subtract one)
Con NUMBER(4):
No Validation (e.g. negative numbers)
Pro DATE:
Validation
Con DATE:
Duplicate entries are possible ('2013-06-24', '2013-06-23', ...)
Not so easy arithmetic (add one year = ADD_MONTHS(<date>, 12))
As an additional requirement, the column gets compared with the current year, EXTRACT(YEAR FROM SYSDATE). In my opinion NUMBER(4) is the better choice. What do you think, is there another option I have missed?
You can restrict a date column to only have one entry per year if you want to, with a function-based index:
create unique index uq_yr on <table> (trunc(<column>, 'YYYY'));
Trying to insert two dates in the same year would give you an ORA-00001 error. Of course, if you don't want the rest of the date then it may be unhelpful or confusing to hold it, but on the other hand there may be secondary info you want to keep (e.g. if you're recording that an annual audit happened, holding the full date might not hurt anything). You could also have a virtual column (from 11g) that holds the trunc value for easier manipulation, perhaps.
You could also use an interval year(4) to month data type, and insert using numtoyminterval(2013, 'year'), etc. You could do interval arithmetic to add and subtract years, and extract to get the year back out as a number. That would probably be more painful than using a date though, overall.
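A small sketch of that interval approach, with an assumed table and column name:
create table audit_years (yr interval year(4) to month);

insert into audit_years values (numtoyminterval(2013, 'YEAR'));

-- interval arithmetic, and extracting the year back out as a number
select yr + numtoyminterval(1, 'YEAR') as next_year,
       extract(year from yr)           as year_number
from   audit_years;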
If you're really only interested in the year (and you are not holding the month in a different column!) then a number is probably going to be simplest, with a check constraint to make sure it's a sensible number - number(4) doesn't stop you inserting 2.013 when you meant 2,013 (though you need to be converting from a string to hit that, and not have an NLS parameter mismatch), which would be truncated to just 2.
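A sketch of such a check constraint, assuming a NUMBER(4) column named yr on a hypothetical table annual_data; the allowed range here is arbitrary:
alter table annual_data
  add constraint yr_is_sensible
  check (yr = trunc(yr) and yr between 1900 and 2999);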
You've summed up the pros and cons quite well.
Provided that you name your field clearly, so that it's easy to understand that it contains a year, I would go with NUMBER(4) for simplicity and for storing no more or less than what is necessary. And even if there is no validation, IMO negative years are valid :)
Depending on your use case, you might also consider building a one-off date (dimension) table and linking to a specific row via an ID. That way you have access to more information which you could later add to the dimension table (leap year etc.), and the entries in your dimension can be validated on creation.
