expanding dates outside of current range - time

I am trying to expand a data set to include dates outside of the current range.
The data I have ranges from 1992q1 to 2017q1. Each observation exists within a portion of that larger window, for example from 1993q2 to 1997q1.
I need to create quarterly observations for each range to fill the missing time. I have already expanded the existing data into quarters.
What I cannot figure out how to do is add in those missing quarters. For example, country1 may have the dates 1993q2 to 1997q1. I need to add in the missing dates from 1992q1 to 1993q1 and 1997q2 to 2017q1.

Here is a very simple sandbox dataset that is, I think, an analogue of your question:
clear
set obs 10
gen id = cond(_n < 7, 1, 2)
gen qdate = yq(1992, 1) in 1
replace qdate = yq(1992, 3) in 7
bysort id (qdate) : replace qdate = qdate[_n-1] + 1 if missing(qdate)
format qdate %tq
list, sepby(id)
     +-------------+
     | id    qdate |
     |-------------|
  1. |  1   1992q1 |
  2. |  1   1992q2 |
  3. |  1   1992q3 |
  4. |  1   1992q4 |
  5. |  1   1993q1 |
  6. |  1   1993q2 |
     |-------------|
  7. |  2   1992q3 |
  8. |  2   1992q4 |
  9. |  2   1993q1 |
 10. |  2   1993q2 |
     +-------------+
fillin id qdate
list, sepby(id)
     +-----------------------+
     | id    qdate   _fillin |
     |-----------------------|
  1. |  1   1992q1         0 |
  2. |  1   1992q2         0 |
  3. |  1   1992q3         0 |
  4. |  1   1992q4         0 |
  5. |  1   1993q1         0 |
  6. |  1   1993q2         0 |
     |-----------------------|
  7. |  2   1992q1         1 |
  8. |  2   1992q2         1 |
  9. |  2   1992q3         0 |
 10. |  2   1992q4         0 |
 11. |  2   1993q1         0 |
 12. |  2   1993q2         0 |
     +-----------------------+
So, fillin is a simple way of ensuring that all cross-combinations of identifier and time are present. Note that it only creates combinations of values that already occur somewhere in the data: id 2 gains 1992q1 and 1992q2 only because id 1 has them. However, to what benefit? Although not shown in this example, values of other variables spring into existence only as missing values. In some situations proceeding with interpolation is justified, but usually you just live with incomplete panels.
How do you find solutions like these? One good strategy is to skim through the [D] (Data Management) manual to see what basic data management commands exist.
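To make the cross-combination idea concrete outside Stata, here is a minimal Python sketch. It assumes each observation is an (id, qdate) pair with quarters counted from 1960q1 = 0, the convention Stata's yq() uses; fillin, yq and quarter_label here are hypothetical helpers, not a Stata API. By default it only uses quarters observed somewhere in the data, as Stata's fillin does; the optional dates argument is a hypothetical extension for reaching a full window such as 1992q1 to 2017q1.

```python
from itertools import product


def yq(year, quarter):
    """Quarters elapsed since 1960q1, the same convention as Stata's yq()."""
    return (year - 1960) * 4 + (quarter - 1)


def quarter_label(q):
    """Render a quarter count the way Stata's %tq format displays it."""
    return f"{1960 + q // 4}q{q % 4 + 1}"


def fillin(rows, dates=None):
    """Return (id, qdate, _fillin) for every id x qdate combination.

    rows  -- list of (id, qdate) pairs already in the data
    dates -- optional explicit quarters (hypothetical extension); by default
             only quarters observed somewhere in rows are used
    """
    present = set(rows)
    ids = sorted({i for i, _ in rows})
    if dates is None:
        dates = sorted({d for _, d in rows})
    # _fillin is 1 for combinations that had to be created, 0 otherwise.
    return [(i, d, 0 if (i, d) in present else 1) for i, d in product(ids, dates)]


# The sandbox data: id 1 covers 1992q1-1993q2, id 2 covers 1992q3-1993q2.
rows = [(1, q) for q in range(yq(1992, 1), yq(1993, 2) + 1)]
rows += [(2, q) for q in range(yq(1992, 3), yq(1993, 2) + 1)]

for i, d, filled in fillin(rows):
    print(i, quarter_label(d), filled)
```

Calling fillin(rows, dates=range(yq(1992, 1), yq(2017, 1) + 1)) would instead give each id all 101 quarters from 1992q1 to 2017q1.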


Getting cumulative risk values

Consider the following toy example:
use https://data.princeton.edu/pop509/justices2.dta, clear
stset tenure, fail(event == 1)
stcrreg age year, compete(event == 2)
stcurve, cif
I want to plot a cumulative incidence curve as done above but then I want to store those values with their 95% confidence intervals. However, it is not clear to me how to access/store them as variables.
Cross-posted at Statalist.
Use the outfile() option of the stcurve command:
stcurve, cif outfile(stdata)
use stdata, clear
list in 1/10
     +---------------------+
     |      ci1         _t |
     |---------------------|
  1. | .0465373   5.691992 |
  2. |        0   1.045859 |
  3. | .2600816    20.6078 |
  4. | .1169629   8.876112 |
  5. | .0465373   5.724846 |
     |---------------------|
  6. | .1249585   9.440109 |
  7. |        0   .4462697 |
  8. | .1574731   13.49213 |
  9. | .1991083   15.36756 |
 10. | .0232038   4.769336 |
     +---------------------+

MDX - filter empty outside of selected range

The cube is populated with data divided by a time dimension (Period), where each period represents a month.
The following query:
select non empty {[Measures].[a], [Measures].[b], [Measures].[c]} on columns,
{[Period].[Period].ALLMEMBERS} on rows
from MyCube
returns:
+--------+----+---+--------+
| Period | a | b | c |
+--------+----+---+--------+
| 2 | 3 | 2 | (null) |
| 3 | 5 | 3 | 1 |
| 5 | 23 | 2 | 2 |
+--------+----+---+--------+
Removing NON EMPTY:
select {[Measures].[a], [Measures].[b], [Measures].[c]} on columns,
{[Period].[Period].ALLMEMBERS} on rows
from MyCube
Renders:
+--------+--------+--------+--------+
| Period | a | b | c |
+--------+--------+--------+--------+
| 1 | (null) | (null) | (null) |
| 2 | 3 | 2 | (null) |
| 3 | 5 | 3 | 1 |
| 4 | (null) | (null) | (null) |
| 5 | 23 | 2 | 2 |
| 6 | (null) | (null) | (null) |
+--------+--------+--------+--------+
What I would like to get is all records from period 2 to period 5: the first occurrence of a value in measure "a" denotes the start of the range, and the last occurrence denotes its end.
This works, but I need the range to be calculated dynamically at runtime by MDX:
select non empty {[Measures].[a], [Measures].[b], [Measures].[c]} on columns,
{[Period].[Period].&[2] :[Period].[Period].&[5]} on rows
from MyCube
desired output:
+--------+--------+--------+--------+
| Period | a | b | c |
+--------+--------+--------+--------+
| 2 | 3 | 2 | (null) |
| 3 | 5 | 3 | 1 |
| 4 | (null) | (null) | (null) |
| 5 | 23 | 2 | 2 |
+--------+--------+--------+--------+
I tried looking for first/last values but just couldn't compose them into the query properly. Has anyone had this issue before? This should be fairly common, since I want a continuous financial report that doesn't skip months where nothing is going on. Thanks.
Maybe try playing with the NONEMPTY, HEAD and TAIL functions in a WITH clause:
WITH
SET [First] AS
{HEAD(NONEMPTY([Period].[Period].MEMBERS, [Measures].[a]))}
SET [Last] AS
{TAIL(NONEMPTY([Period].[Period].MEMBERS, [Measures].[a]))}
SELECT
{
[Measures].[a]
, [Measures].[b]
, [Measures].[c]
} on columns,
[First].ITEM(0).ITEM(0)
:[Last].ITEM(0).ITEM(0) on rows
FROM MyCube;
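The idea behind [First]:[Last] can be checked outside the cube with a small Python sketch: find the first and last periods where measure a is non-empty, then keep everything between them, including empty periods. The period list and the use of None for (null) are assumptions standing in for the cube's result; this illustrates the logic only and says nothing about how SSAS evaluates the query.

```python
def first_to_last(periods, values):
    """Return the periods from the first to the last non-empty value, inclusive.

    periods -- period keys in hierarchy order
    values  -- measure values aligned with periods; None plays the role of (null)
    """
    filled = [k for k, v in enumerate(values) if v is not None]
    if not filled:
        return []  # no non-empty cells: nothing to show
    # Slicing keeps the empty periods that fall inside the range.
    return periods[filled[0]:filled[-1] + 1]


# Measure [a] from the unfiltered result in the question.
periods = [1, 2, 3, 4, 5, 6]
a = [None, 3, 5, None, 23, None]
print(first_to_last(periods, a))  # [2, 3, 4, 5]
```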
To debug a custom set and see which members it returns, you can do something like this:
WITH
SET [First] AS
{HEAD(NONEMPTY([Period].[Period].MEMBERS, [Measures].[a]))}
SELECT
{
[Measures].[a]
, [Measures].[b]
, [Measures].[c]
} on columns,
[First] on rows
FROM MyCube;
Reading your comment about Children, I think this is also an alternative: add an extra [Period] so that the sets are built only from members of the Period level, which leaves out the All member:
WITH
SET [First] AS
{HEAD(NONEMPTY([Period].[Period].[Period].MEMBERS
, [Measures].[a]))}
SET [Last] AS
{TAIL(NONEMPTY([Period].[Period].[Period].MEMBERS
, [Measures].[a]))}
SELECT
{
[Measures].[a]
, [Measures].[b]
, [Measures].[c]
} on columns,
[First].ITEM(0).ITEM(0)
:[Last].ITEM(0).ITEM(0) on rows
FROM MyCube;

How to implement Oracle's "func(...) keep (dense_rank ...)" in Hive

I have a table abcd in an Oracle DB:
+-------------+----------+
| abcd.speed | abcd.ab |
+-------------+----------+
| 4.0 | 2 |
| 4.0 | 2 |
| 7.0 | 2 |
| 7.0 | 2 |
| 8.0 | 1 |
+-------------+----------+
And I'm using a query like this:
select min(speed) keep (dense_rank last order by abcd.ab NULLS FIRST) MOD from abcd;
I'm trying to convert the code to Hive, but it looks like keep is not available in Hive.
Could you suggest an equivalent statement?
select -max(struct(ab,-speed)).col2 as mod
from abcd
;
+------+
| mod |
+------+
| 4.0 |
+------+
Let's start by explaining min(speed) keep (dense_rank last order by abcd.ab NULLS FIRST):
Find the row(s) with the max value of ab.
For this/those row(s), find the min value of speed.
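Those two steps can be written out directly in a minimal Python sketch over the sample table. It assumes ab never contains NULLs, so the NULLS FIRST clause plays no role; the function name is illustrative only.

```python
# Sample rows from the abcd table as (speed, ab) pairs.
abcd = [(4.0, 2), (4.0, 2), (7.0, 2), (7.0, 2), (8.0, 1)]


def min_speed_keep_dense_rank_last(rows):
    """Emulate min(speed) keep (dense_rank last order by ab), assuming no NULLs."""
    # Step 1: the "last" dense rank when ordering by ab is the maximum ab.
    top_ab = max(ab for _, ab in rows)
    # Step 2: among the rows holding that ab, return the minimum speed.
    return min(speed for speed, ab in rows if ab == top_ab)


print(min_speed_keep_dense_rank_last(abcd))  # 4.0
```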
We are using 2 tricks here.
The 1st is based on the ability to get the max value of a struct.
max(struct(c1,c2,c3,...)) returns the same result as if you had sorted the structs by c1, then by c2, then by c3, etc., and then chosen the last element.
The 2nd trick is to use -speed (which is the same as -1*speed).
Finding the max of -speed and then negating that value (which gives us speed back) is the same as finding the min of speed.
If we had ordered the structs, it would have looked like this (since 2 is bigger than 1 and -4 is bigger than -7):
+----+--------+
| ab | -speed |
+----+--------+
|  1 |   -8.0 |
|  2 |   -7.0 |
|  2 |   -7.0 |
|  2 |   -4.0 |
|  2 |   -4.0 |
+----+--------+
The last struct in this case is struct(2,-4.0), therefore this is the result of the max function.
The field names of a struct built this way are col1, col2, col3, etc., so
struct(2,-4.0).col2 is -4.0, and preceding it with a minus sign (which is the same as multiplying it by -1), as in -struct(2,-4.0).col2, gives 4.0.
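Python tuples compare element by element, much like the struct ordering described above, so the same trick can be sanity-checked in a few lines. This is an analogy under that assumption, not Hive itself: the tuple (ab, -speed) stands in for struct(ab, -speed), and index 1 stands in for col2.

```python
# Sample rows from the abcd table as (speed, ab) pairs.
abcd = [(4.0, 2), (4.0, 2), (7.0, 2), (7.0, 2), (8.0, 1)]

# max() over tuples picks the highest ab first, then the highest -speed,
# i.e. the lowest speed among the rows with the highest ab.
best = max((ab, -speed) for speed, ab in abcd)
print(best)  # (2, -4.0)

# Negating the second field again turns -speed back into speed.
mod = -best[1]
print(mod)  # 4.0
```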

Vbscript basic functions

I am new to programming and computer science. HTML is all I know, and I have been facing problems with VBScript.
This program (my first in VBScript) was given to me by my teacher, but I really do not understand any of it. I referred to my book, but in vain.
I am not even sure if this is the right SE to post the question.
Please help.
What you have there is a loop with another nested loop, both of which print some text to the screen (document.write("...")).
The outer loop
For i = 1 To 5 Step 1
...
Next
iterates from 1 to 5 in steps of 1 (the Step 1 is redundant, since 1 is the default step size, so you could omit it). If you printed the value of i inside the loop:
For i = 1 To 5 Step 1
document.Write(i & "<br>")
Next
You'd get the following output:
1
2
3
4
5
In your code sample you just print <br>, though, so each cycle of the outer loop just prints a line break.
In addition to printing line breaks in the outer loop you also have a nested loop, which for each cycle of the outer loop iterates from 1 to the current value of i, again in steps of 1.
For j = 1 To i Step 1
...
Next
So in the first cycle of the outer loop (i=1) the inner loop iterates from 1 to 1, in the second cycle of the outer loop (i=2) it iterates from 1 to 2, and so on.
For i = 1 To 5 Step 1
document.Write("<br>")
For j = 1 To i Step 1
document.Write("*")
Next
Next
Since the inner loop prints an asterisk on each cycle, you get i asterisks on the line before the inner loop ends. The outer loop then goes into its next cycle and prints a line break, which ends the current output line.
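To see the output of the combined loops without a browser, here is a rough Python translation. It assumes document.Write simply appends text to the page, so each call becomes an append to a list; the function name and the n parameter are illustrative.

```python
def nested_loop_output(n=5):
    """Mimic the VBScript program: one <br> per outer cycle, then i asterisks."""
    page = []
    for i in range(1, n + 1):          # For i = 1 To n
        page.append("<br>")            # document.Write("<br>")
        for j in range(1, i + 1):      # For j = 1 To i
            page.append("*")           # document.Write("*")
    return "".join(page)


print(nested_loop_output())
# <br>*<br>**<br>***<br>****<br>*****
```

In a browser each <br> starts a new line, so the rendered page shows rows of one to five asterisks.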
A good (although somewhat tedious) way to get an understanding of how the loops work is to note the current value of each variable as well as the current output line in a table on a sheet of paper, e.g. like this:
code line | instruction | i | j | output line
----------+------------------------+-------+-------+------------
1 | For i = 1 To 5 Step 1 | 1 | Empty |
2 | document.Write("<br>") | 1 | Empty | <br>
3 | For j = 1 To i Step 1 | 1 | 1 |
4 | document.Write("*") | 1 | 1 | *
5 | Next | 1 | 1 | *
6 | Next | 1 | 1 | *
1 | For i = 1 To 5 Step 1 | 2 | 1 | *
2 | document.Write("<br>") | 2 | 1 | *<br>
3 | For j = 1 To i Step 1 | 2 | 1 |
4 | document.Write("*") | 2 | 1 | *
5 | Next | 2 | 1 | *
3 | For j = 1 To i Step 1 | 2 | 2 | *
4 | document.Write("*") | 2 | 2 | **
5 | Next | 2 | 2 | **
6 | Next | 2 | 2 | **
1 | For i = 1 To 5 Step 1 | 3 | 2 | **
2 | document.Write("<br>") | 3 | 2 | **<br>
3 | For j = 1 To i Step 1 | 3 | 1 |
4 | document.Write("*") | 3 | 1 | *
... | ... | ... | ... | ...

Sum of the grouped distinct values

This is a bit hard to explain in words ... I'm trying to calculate a sum of grouped distinct values in a matrix. Let's say I have the following data returned by a SQL query:
------------------------------------------------
| Group | ParentID | ChildID | ParentProdCount |
| A | 1 | 1 | 2 |
| A | 1 | 2 | 2 |
| A | 1 | 3 | 2 |
| A | 1 | 4 | 2 |
| A | 2 | 5 | 3 |
| A | 2 | 6 | 3 |
| A | 2 | 7 | 3 |
| A | 2 | 8 | 3 |
| B | 3 | 9 | 1 |
| B | 3 | 10 | 1 |
| B | 3 | 11 | 1 |
------------------------------------------------
There's some other data in the query, but it's irrelevant. ParentProdCount is specific to the ParentID.
Now, I have a matrix in the MS Report Designer in which I'm trying to calculate a sum for ParentProdCount (grouped by "Group"). If I just add the expression
=Sum(Fields!ParentProdCount.Value)
I get a result of 20 for Group A and 3 for Group B, which is incorrect. The correct values should be 5 for Group A and 1 for Group B. This wouldn't happen if ChildID weren't involved, but I have to use some other child-specific data in the same matrix.
I tried to nest FIRST() and SUM() aggregate functions but apparently it's not possible to have nested aggregation functions, even when they have scopes defined.
I'm pretty sure there is some way to calculate the grouped distinct sum without needing to create another SQL query. Anyone got an idea how to do that?
OK, I got this sorted out by adding a ROW_NUMBER() function to my SQL query:
SELECT [Group], ParentID, ROW_NUMBER() OVER (PARTITION BY ParentID ORDER BY ChildID ASC) AS Position, ChildID, ParentProdCount FROM [Table]
and then I replaced the SSRS SUM function with
=SUM(IIF(Fields!Position.Value = 1, Fields!ParentProdCount.Value, 0))
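The effect of the Position = 1 filter can be checked with a small Python sketch over the sample rows: summing every row counts each ParentProdCount once per child, while keeping only the first child per parent (what ROW_NUMBER() ... = 1 selects) counts it once per parent. The row tuples and function names are illustrative assumptions, not part of SSRS.

```python
from collections import defaultdict

# (Group, ParentID, ChildID, ParentProdCount) rows from the question.
rows = [
    ("A", 1, 1, 2), ("A", 1, 2, 2), ("A", 1, 3, 2), ("A", 1, 4, 2),
    ("A", 2, 5, 3), ("A", 2, 6, 3), ("A", 2, 7, 3), ("A", 2, 8, 3),
    ("B", 3, 9, 1), ("B", 3, 10, 1), ("B", 3, 11, 1),
]


def naive_sum(rows):
    """Equivalent of =Sum(Fields!ParentProdCount.Value) per group."""
    totals = defaultdict(int)
    for group, _, _, count in rows:
        totals[group] += count
    return dict(totals)


def first_child_sum(rows):
    """Sum ParentProdCount only on the row with the lowest ChildID per parent,
    mirroring ROW_NUMBER() OVER (PARTITION BY ParentID ORDER BY ChildID) = 1."""
    first_child = {}
    for group, parent, child, count in rows:
        if parent not in first_child or child < first_child[parent][1]:
            first_child[parent] = (group, child, count)
    totals = defaultdict(int)
    for group, _, count in first_child.values():
        totals[group] += count
    return dict(totals)


print(naive_sum(rows))        # {'A': 20, 'B': 3}
print(first_child_sum(rows))  # {'A': 5, 'B': 1}
```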
Put a grouping over ParentID and use a summation over that group,
e.g.:
if the group over ParentID is named "ParentIDGroup",
then
column sum of ParentProdCount = SUM(Fields!ParentProdCount.Value, "ParentIDGroup")
