Let's say I have a date table.
Let's say I also have another table that I want to join to the date table twice: once on the sales date and once on the shipped date. The goal is to join on the date values in order to get the month name, year, etc. from the date table.
Using Exago BI, is it possible to do such a join or would I have to create a view and just put the data in there manually?
Currently, in our on-prem Hadoop environment, we are using Hive tables with transactional properties. However, as we are moving to AWS, we don't have that feature yet, so I want to understand how to handle SCD Type 2 without updates.
For example,
for the following record:
With Updates
In a table with transactional properties enabled, when I get an update for a record, I change the end_date of the existing record to the current date and create a new record with effective_date set to the current date and end_date set to 12/31/9999, as shown in the table above. This makes it easy to find my active record (where end_date = "12/31/9999").
However, if I can't update the past record, I end up with two records with the same end_date, as shown in the table below.
My questions are:
If I can't update the end_date of the past record,
How do I get the historical duration of stay?
How do I get the active record?
without updates
First of all, convert all dates to the 'yyyy-MM-dd' format so that they sort correctly and analytic functions work. Then you can use lead(effective_date, 1, '9999-01-01') over(partition by id order by effective_date). For id=1 and effective_date = '2019-01-01' it should give you '2020-08-15', and you can assign this value as the end_date of the '2019-01-01' record. If there is no record with a later effective_date, the default '9999-01-01' is assigned. After this transformation, the active record is the one with end_date = '9999-01-01'.
Assuming the dates are already converted to yyyy-MM-dd, this is how you can rewrite your table (after the insert):
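insert overwrite table your_table
select name, id, location, effective_date,
-- lead(column, offset, default): next effective_date for the same id, or '9999-01-01' if there is none
lead(effective_date, 1, '9999-01-01') over(partition by id order by effective_date) as end_date
from your_table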
Or, without doing the insert first, you can UNION ALL the existing records with the new records in a subquery and then calculate lead over the result.
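To make the UNION ALL variant concrete, here is a minimal sketch that runs the same lead() logic in an in-memory SQLite database from Python. SQLite only stands in for Hive here and needs version 3.25 or later for window functions. The table names existing_rows and new_rows, the function name, and any sample values you pass in are assumptions for illustration, not part of the original tables.

```python
import sqlite3

def rebuild_scd2(existing, incoming):
    """Recompute end_date for all rows without updating any stored row.

    existing / incoming: lists of (name, id, location, effective_date),
    with effective_date already in 'yyyy-MM-dd' text form.
    """
    conn = sqlite3.connect(":memory:")
    ddl = "(name text, id integer, location text, effective_date text)"
    conn.execute("create table existing_rows " + ddl)  # hypothetical table name
    conn.execute("create table new_rows " + ddl)       # hypothetical table name
    conn.executemany("insert into existing_rows values (?, ?, ?, ?)", existing)
    conn.executemany("insert into new_rows values (?, ?, ?, ?)", incoming)
    rows = conn.execute(
        """
        select name, id, location, effective_date,
               lead(effective_date, 1, '9999-01-01')
                   over (partition by id order by effective_date) as end_date
        from (
            select name, id, location, effective_date from existing_rows
            union all
            select name, id, location, effective_date from new_rows
        ) s
        order by id, effective_date
        """
    ).fetchall()
    conn.close()
    return rows
```

The row with end_date = '9999-01-01' is the active one, and the other rows carry the next effective_date as their end_date, so the duration of each stay can be derived from effective_date and end_date.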
Actually, SCD2 is not recommended for historical data rewriting because of how non-equi joins are implemented in Hive. Such a join runs either as a cross join plus a filter, or as an equi-join on dim.id = fact.id (which duplicates rows) followed by where fact.date <= dim.end_date and fact.date >= dim.effective_date, which should filter the result down to one record. This join is very expensive if the dimension and fact tables are big, because the rows are duplicated before they are filtered.
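The cost described above can be seen in a small plain-Python sketch of the "equi-join on id, then filter on the date range" plan. It is only an illustration of the row duplication, not how Hive executes the join; the tuple layouts and the function name are assumptions.

```python
def range_join(facts, dims):
    """Join facts to the dimension version that was valid on the fact date.

    facts: list of (id, fact_date)
    dims:  list of (id, location, effective_date, end_date)
    Dates are 'yyyy-MM-dd' strings, so string comparison orders them correctly.
    """
    # Step 1: equi-join on id. Every fact row is paired with every
    # historical version of its dimension row, so rows are duplicated.
    candidates = [(f, d) for f in facts for d in dims if f[0] == d[0]]
    # Step 2: the date-range filter keeps the version valid on the fact date.
    matched = [(f, d) for f, d in candidates if d[2] <= f[1] <= d[3]]
    return candidates, matched
```

With two versions per id, the intermediate result is already twice the size of the fact table before the filter brings it back down, which is where the expense comes from.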
I am newish to SSRS and I am trying to create a report where each row displays sales for the selected year and the previous 2 years in separate columns. I would like to sort the results by the current year's sales, create a rank column, and add a parameter allowing the user to choose the top N rows they wish to display. I see the tablix sorting tab, which allows you to select an expression to sort by, but how would I specify that I want to sort by the sales for one year specifically? Or would I go about this by creating the rank column from that expression first and then sorting the table by that rank column?
UPDATE: It is a SQL Server database and I can edit the SQL. For the columns I have storeID, storeName, Year, and Sales.
I want to run a report daily and store the report's date as one of the column headers. Is this possible?
Example output (Counting the activities of employees for that day):
SELECT EMPLOYEE_NAME AS EMPLOYEE, COUNT(ACTIVITY) AS "Activity_On_SYSDATE" FROM EMPLOYEE_ACCESS GROUP BY EMPLOYEE_NAME;
Employee Activity_On_17042016
Jane 5
Martha 8
Sam 11
You are looking to do a reporting job with a data storing tool. The database (and SQL) is for storing and retrieving data, not for creating reports. There are special tools for creating reports.
In database design, it is very unhealthy to encode actual data in table or column name. Neither a table name nor a column name should have, as part of the name (and of the way they are used), an employee id, a date, or any other bit of actual data. Actual data should only be in fields, which in turn are in columns in different tables.
From what you describe, your base table should have columns for employee, activity and date. Then on any given day, if you want the count for the "current" day, you can query with
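select employee, count(activity) ct
from table_name
-- SYSDATE includes the time of day, so match the whole current day
where activity_date >= trunc(sysdate)
and activity_date < trunc(sysdate) + 1
group by employee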
If you want, you can also include the "activity_date" column in the output, that will show for which date the report was run.
Note that I assumed the column name for "date" is "activity_date", and in the output I used "ct" as a column alias, not "count". DATE and COUNT, like SYSDATE, are reserved words or keywords, and you should NOT use them as table or column names. You could use them as aliases, as long as you don't need to refer to these aliases anywhere else in the SQL, but it is still a very bad idea. Imagine you ever need to refer to a column (by name or by alias) and the name or alias is SYSDATE. What would a where clause like this mean?
where sysdate = sysdate
Do you see the problem?
Also, I can't tell from your question: were you thinking of storing these reports back in the database? To what end? It is better to store just one query and run it whenever needed, and to make the "activity_date" for which you want the counts an input parameter, so you can run the query for any date, at any time in the future. There is no need to store the actual daily reports in the database, as long as the base table is properly maintained.
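As a rough sketch of the "one stored query with the date as an input parameter" idea, the following runs against an in-memory SQLite database from Python. SQLite stands in for the real database, dates are stored as 'YYYY-MM-DD' text, and the table layout, function names, and sample rows are assumptions made up for the example.

```python
import sqlite3

def activity_counts(conn, report_date):
    """Count activities per employee for one day ('YYYY-MM-DD').

    The date is a bind parameter, so the same query serves any day,
    and it is returned as data in a column rather than in a column name.
    """
    return conn.execute(
        """
        select employee_name, activity_date, count(activity) as ct
        from employee_access
        where activity_date = ?
        group by employee_name, activity_date
        order by employee_name
        """,
        (report_date,),
    ).fetchall()

def demo_connection():
    """Build a tiny in-memory table with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table employee_access "
        "(employee_name text, activity text, activity_date text)"
    )
    conn.executemany(
        "insert into employee_access values (?, ?, ?)",
        [
            ("Jane", "login", "2016-04-17"),
            ("Jane", "export", "2016-04-17"),
            ("Sam", "login", "2016-04-17"),
            ("Jane", "login", "2016-04-18"),
        ],
    )
    return conn
```

Calling activity_counts(demo_connection(), "2016-04-17") returns one row per employee with the report date alongside the count, and rerunning it later for another date needs no schema change.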
Good luck!
I want to know how to get the latest updated record via Informatica. Suppose I have 10 records in a temporary table: 3 records for Account1, 3 for Account2 and 4 for Account3. For each of these 3 accounts, I need to fetch only the record that has the maximum date value (latest date) and insert it into another temporary table. Which transformations or Informatica logic should I use to get this? Please help.
If the date column comes from the input and the dates are unique within each account, use the aggregator transformation on it and take the maximum date.
If no date column is present, you could assign the system timestamp, but you cannot take a meaningful maximum date from that. You would have to use some other logic, such as the rowid and rownum features.
If the source is a DB, we can do it in the SQ (Source Qualifier) itself: build a derived table that groups by the pk field and selects the pk field and max(date), then join this output with the original source on pk and date.
For example:
select * from src_table
join ( select pk, max(date) as maxdate from src_table group by pk ) aggr_table
on src_table.pk=aggr_table.pk
and src_table.date=aggr_table.maxdate
The same can be implemented inside Informatica using an aggregator and a joiner. But since the aggregator's source is the SQ and its output is joined back to the same SQ, a sorter is required between the aggregator and the joiner.
You can also use the aggregator transformation on its own. First use a sorter transformation that sorts by account and by date ascending. After that, use an aggregator transformation grouped on account. You don't need to add any condition or aggregate function, as the aggregator returns the last record of every group.
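The sorter-then-aggregator logic can be mimicked in a few lines of plain Python, which may help when checking what the mapping should produce. This is only a sketch of the logic, not Informatica code; the record layout (account, date, payload) and the function name are assumptions.

```python
from itertools import groupby
from operator import itemgetter

def latest_per_account(records):
    """Keep the record with the latest date for each account.

    records: iterable of (account, date, payload) with dates as
    'YYYY-MM-DD' strings, so sorting the strings sorts the dates.
    """
    # Sorter step: order by account, then by date ascending.
    ordered = sorted(records, key=itemgetter(0, 1))
    # Aggregator step: group on account and keep the last row of each group,
    # which after the sort is the row with the maximum date.
    return [list(group)[-1] for _, group in groupby(ordered, key=itemgetter(0))]
```

If two rows of one account share the same maximum date, this keeps whichever of them sorts last, so a tie-breaking column would be needed in that case.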
I'm using SSRS 2005. I've got a table with various inventory data. In one column I've got a subreport that is designed to pull the date of the most recent Purchase Order based upon the product code of whichever row the subreport is in. This would be fine, but I'm now being asked to make this date column sortable. My assumption is that you cannot sort a column with a subreport in it, but I thought I'd ask. Is there any way to do this?
You can include the most recent purchase order value in your main report's dataset as a subquery like this:
SELECT *
,(SELECT TOP 1 PurchaseOrder
FROM Purchasing p
WHERE p.ProductCode = i.ProductCode
ORDER BY PurchaseDate DESC
) as LastPurchaseOrder
FROM Inventory i
Then you can use that value to sort your table.
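For a quick way to try the idea outside SSRS, here is a sketch that runs a similar correlated subquery in an in-memory SQLite database from Python. It is a variation, not the query above: it returns the purchase date (the value the question wants to sort by) and uses LIMIT 1 because SQLite has no TOP. The function names and sample rows are made up for illustration.

```python
import sqlite3

def inventory_with_last_purchase(conn):
    """Return (ProductCode, LastPurchaseDate) sorted by most recent purchase."""
    return conn.execute(
        """
        select i.ProductCode,
               (select p.PurchaseDate
                from Purchasing p
                where p.ProductCode = i.ProductCode
                order by p.PurchaseDate desc
                limit 1) as LastPurchaseDate
        from Inventory i
        order by LastPurchaseDate desc
        """
    ).fetchall()

def demo_inventory():
    """Build tiny Inventory and Purchasing tables with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table Inventory (ProductCode text)")
    conn.execute("create table Purchasing (ProductCode text, PurchaseDate text)")
    conn.executemany("insert into Inventory values (?)", [("A",), ("B",), ("C",)])
    conn.executemany(
        "insert into Purchasing values (?, ?)",
        [("A", "2020-01-01"), ("A", "2021-05-01"), ("B", "2022-02-02")],
    )
    return conn
```

Because the date is now an ordinary column of the main dataset, the report table can sort on it directly. Products with no purchase orders get a NULL date, which SQLite places last in a descending sort.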