I am looking to merge data in way described below:
I have a table below:
table: PTLANALYSIS
RENTALDATE
OUTBOUND,
INBOUND,
VEHICLE_SIZE,
COMPETITOR,
RATE;
The data I am trying to load into the tabs:
RENTALDATE,
OUTBOUND,
INBOUND,
VEHICLE_SIZE,
LOLY,
KAY,
RATE;
Now LOLY and KAY are suppose to be in column "Competitor" in table PTLANALYSIS. Can someone help me merge my data in an appropriate manner, the output should look something like this...
Rental Date | OUTBOUND | INBOUND | VEHICLE_SIZE | COMPETITOR | RATE
12/28/2019 223 333 small loly 33.5
12/28/2019 223 333 small kay 33.5
Currently it looks like this in my csv..
Rental Date | OUTBOUND | INBOUND | VEHICLE_SIZE | lolyRATE | KAYRATE
12/28/2019 223 333 small 33.5 NULL
12/28/2019 223 333 small NULL 33.5
Thanks in advance!
Most of the columns in the CSV file have fixed targets. You need to evaluate the LOLYRATE and KAYRATE to conditionally populate COMPETITOR and RATE. Something like this:
insert into PTLANALYSIS (
RENTALDATE
OUTBOUND,
INBOUND,
VEHICLE_SIZE,
COMPETITOR,
RATE
)
select
RENTALDATE,
OUTBOUND,
INBOUND,
VEHICLE_SIZE,
case when LOLYRATE is not null then 'loly' else 'kay' end as competitor,
coalesce(LOLYRATE, KAYRATE) as rate
from ext_table
;
You haven't said how you intend to load the data but I have assumed an external table, because it allows you to use SQL, and everything is easier with SQL. Find out more.
Related
I have a dataset like this in Power BI with connections between "Participant ID" Column and "Knows Participant":
Participant ID
Knows Participant
111
353
111
777
111
112
111
249
112
143
112
144
113
111
113
244
114
NaN
115
113
...
...
777
111
777
398
777
114
778
NaN
779
112
3499
NaN
I've build Network chart. However, there are a lot of 1-1 connections that are not very useful for visualization, so I want to exclude them (see image):
Is it possible to count a number of connections in each network using DAX and then use this value to filter out all nodes with only 1 connection (red circled)? Or maybe filter out 1 connection nodes using another approach?
I've tried to make a calculated column using DAX:
Connection Column = COUNTROWS(
FILTER(Table,
EARLIER(Table[Knows Participant])=Table[Knows Participant])
)
However, it only shows duplicate values in "Knows Participant" Column, but not number of connections in each network.
Example of desired output:
Participant ID
Knows Participant
Number of Connections in the Network
111
353
4
353
444
4
444
551
4
551
987
4
112
143
1
220
190
1
333
337
2
337
410
2
765
0
You need the PATH functions as you're essentially trying to flatten a hierarchy and then exclude certain parts of it. The following help page gives a good rundown of the approach to take.
https://learn.microsoft.com/en-us/dax/understanding-functions-for-parent-child-hierarchies-in-dax
You can add a column to the table with a measure like this:
VAR pIdLinksCount = CALCULATE(COUNTROWS(tbl), ALL('tbl'[Knows Participant]))
VAR neighbourLinksCount =
IF(
pIdLinksCount=1
, -- if pIdLinksCount=1 then count neighbour links
VAR neighbourId =
CALCULATETABLE(
Values('tbl'[Knows Participant])
)
RETURN
CALCULATE(
COUNTROWS(tbl)
,ALL() -- removes all filters from data model
,'tbl'[Participant ID] = neighbourId -- applies filter to [Participant ID] column
--,'tbl'[Participant ID] IN neighbourId -- alternatively try this. I believe it is not necessary
)
,2 -- returns 2 if pIdLinksCount>1.
-- The "value = 2" will return "result > 3 = TRUE()"
)
VAR result = pIdLinksCount + neighbourLinksCount
RETURN
IF(
result>2
,1
,0
)
The idea is to check a neighbor too - if it has more then 1 link
I would like to extract all the cases completed/working by particular customer throughout the time. E.g.:
CustomerId 111 completed case before 1 month and 222 completed before 2 months.
Now, user is filling the case with id 333, and I would like to fetch and display the data for case id 333 like:
CaseId Status
111 completed
222 working
Any ideas how to do that?
I am trying to load a set of policy numbers in my Target based on below criteria using Informatica PowerCenter.
I want to select all those rows of policy numbers, for which policy the Rider = 0
This is my source: -
Policy Rider Plan
1234 0 1000
1234 1 1010
1234 2 3000
9090 0 2000
9090 2 2545
4321 3 2000
4321 1 2000
Target should look like this: -
Policy Rider Plan
1234 0 1000
1234 1 1010
1234 2 3000
9090 0 2000
9090 2 2545
The policy number 4321 would not be loaded.
If I use filter as Rider = 0, then I miss out on below rows: -
1234 1 1010
1234 2 3000
9090 0 2000
9090 2 2545
What would be ideal way to load this kind of data using PowerCenter Designer?
Take the same source in one more qualifier in same mapping, use a filter as Rider=0 to get list of unique policy numbers that has Rider=0, then use a joiner with your regular source on policy number. This should work.
Another method, sort your data based on policy and Rider, and use variable ports with condition similar to below.
v_validflag=IIF(v_policy_prev!=policy, IIF(Rider=0, 'valid','invalid'), v_validflag)
v_policy_prev=policy
Then filter valid records.
There are many options. Here are two...
First:
It'll look like:
// AGGREGATOR \\
SOURCE >> SOURCE QUALIFIER >> SORTER << >> JOINER >> TARGET
\\============//
Connect all ports from Source Qualifier (SQ) to SORTER transformation (or sort in SQ itself) and define sorting Key for ‘Policy’ and ‘Rider’. After that split stream into two pipelines:
- Connect ‘Policy’ and ‘Rider’ to FILTER transformation and filter records by ‘Rider’ = 0. - After that link ‘Policy’ (only) to AGGREGATOR and set Group By to ‘YES’ for ‘Policy’. - Add a new port with FIRST or MAX function for ‘Policy’ port. This is to remove duplicate ‘Policy’-es.- Indicate ‘Sorted Input’ in the AGGREGATOR properties.- After that link ‘Policy’ from AGR to JOINER as Master in Port tab.
2.- Second stream, from SORTER, directly link to above JOINER (with aggregated ‘Policy’) as Detail. - Indicate ‘Sorted Input’ in the JOINER properties. - Set Join Type as ‘Normal Join’ and Join Condition as POLICY(master)=POLICY(detail) in JOINER properties.
... Target
Second option:
Just Override SQL in Source Qualifier...
WITH PLC as (
select POLICY
from SRC_TBL
where RIDER=0)
select s.POLICY, s.RIDER, s.PLAN
from PLC p left JOIN SRC_TBL s on s.POLICY = p.POLICY;
may vary depend on your source table constructions...
I have the following XML group...
<BOXES>
<BOX_CODE>01</BOX_CODE>
<BOX_CODE>12</BOX_CODE>
<BOX_CODE>15</BOX_CODE>
<BOX_CODE>45</BOX_CODE>
<BOX_CODE>46</BOX_CODE>
<BOX_CODE>70</BOX_CODE>
<BOX_CODE>80</BOX_CODE>
<BOX_CODE>98</BOX_CODE>
<BOX_CODE>SA</BOX_CODE>
</BOXES>
... and in the RTF template I would like to display each of those values in a separate column, like this:
01 | 12 | 15 | 45 | 46 | 70 | 80 | 98 | SA
I am trying to use a for-each-group function but not getting the results I want.
Keep in mind that the number of BOX_CODE values is dynamic. In my example there are 9, but there could be less or more at any given time.
I tried using for-each-group#column but did not get the results I wanted. Any help would be greatly appreciated.
Well, I went about it a different way. I created the group like this instead...
<BOX_GROUP>
<BOXES><BOX_CODE>01</BOX_CODE></BOXES>
<BOXES><BOX_CODE>12</BOX_CODE></BOXES>
<BOXES><BOX_CODE>15</BOX_CODE></BOXES>
<BOXES><BOX_CODE>45</BOX_CODE></BOXES>
<BOXES><BOX_CODE>46</BOX_CODE></BOXES>
<BOXES><BOX_CODE>70</BOX_CODE></BOXES>
<BOXES><BOX_CODE>80</BOX_CODE></BOXES>
<BOXES><BOX_CODE>98</BOX_CODE></BOXES>
<BOXES><BOX_CODE>SA</BOX_CODE></BOXES>
</BOX_GROUP>
And I was able to get the desired results using...
<?for-each-group#column: BOXES; BOX_CODE?>
<?BOX_CODE?>
<?end for-each-group?>
I created a toy example of my code below.
In this toy example I would like to create a measure of all higher prices minus lower prices within a self-created reference group. So within each reference group, I would like to take each individual and subtract its price value from all higher price values from other individuals in the same group. I do not want to have negative differences. Then I would like to sum all these differences. In creating this code I found some help here:
http://www.stata.com/support/faqs/data-management/try-all-values-with-foreach/
However, the code didn't work perfectly for me, because my dataset is quite large (several 100K obs) and the examples on the website and my code only work until the numlist maximum of 1600 in Stata. (I am using version 12). The toy example with the auto dataset works, due to small size of the dataset.
I would like to ask if someone has an idea how to code this more efficiently, so that I can get around the numlist restriction. I thought about summing the differences directly without saving them in intermediate variables, but that also blow up the numlist restriction.
clear all
sysuse auto
ren headroom refgroup
bysort refgroup : egen pricerank = rank(price)
qui: su pricerank, meanonly
gen test = `r(max)'
su test
foreach i of num 1/`r(max)' {
qui: bys refgroup: gen intermediate`i' = price[_n+`i'] -price if price[_n+`i'] > price
}
egen price_diff = rowmax(intermediate*)
drop intermediate*
If I understand this correctly, this isn't even a problem that requires explicit loops. The sum of all higher prices is just the difference between two cumulative sums. You might need to think through what you want to do if prices are tied.
. clear
. set obs 10
obs was 0, now 10
. gen group = _n > 5
. set seed 2803
. gen price = ceil(1000 * runiform())
. bysort group (price) : gen sumhigherprices = sum(price)
. by group : replace sumhigherprices = sumhigherprices[_N] - sumhigherprices
(10 real changes made)
. list
+--------------------------+
| group price sumhig~s |
|--------------------------|
1. | 0 218 1448 |
2. | 0 264 1184 |
3. | 0 301 883 |
4. | 0 335 548 |
5. | 0 548 0 |
|--------------------------|
6. | 1 125 3027 |
7. | 1 213 2814 |
8. | 1 828 1986 |
9. | 1 988 998 |
10. | 1 998 0 |
+--------------------------+
Edit: For what the OP needs, there is an extra line
. by group : replace sumhigherprices = sumhigherprices - (_N - _n) * price
If I understand the wording of the problem correctly, maybe this can help. It uses joinby (new observations are created and depending on the size of the original database, you may or not hit the Stata hard-limit on number of observations). The code reproduces the results that would follow from the code of the original post. This is a second attempt. The code before this final edit did not provide the sought-after results. The wording of the problem was somewhat difficult for me to understand.
clear all
set more off
* Load data
sysuse auto
* Delete unnecessary vars
ren headroom refgroup
keep refgroup price
* Generate id´s based on rankings (sort)
bysort refgroup (price): gen id = _n
* Pretty list
order refgroup id
sort refgroup id price
list, sepby(refgroup)
* joinby procedure
tempfile main
save "`main'"
rename (price id) =0
joinby refgroup using "`main'"
list, sepby(refgroup)
* Do not compare with itself and drop duplicates
drop if id0 >= id
* Compute differences and max
gen dif = abs(price0 - price)
collapse (max) dif, by(refgroup id0)
list, sepby(refgroup)