Stata overwrite all observations in cross section except last 20 non NA - panel

I have a large, strongly unbalanced panel in Stata, where each cross section has only a few observations and the rest are NA (.).
In each cross section, I want to overwrite all non-NA observations that are not among the last 20 non-NA observations. I'm not sure how to correctly specify the range, but you can see my thoughts below. There are gaps between the observations.
Thanks
*Edit
I removed the code as it created uncertainty. It was included to show what I had tried.
My cross section dimension identifier is xsection
My time dimension identifier is id01
*Edit
I have created an example below. The code needs to extract the last 3 non-NA (.) values of each cross section in variable x and enter these into a new variable z. Alternatively, all observations in x should be set to . except the last 3 non-NA values (gaps allowed). It does not matter whether a new variable z is created or the observations in x are replaced so that x looks like z.
id01 xsection x z
2005 1 20 .
2006 1 21 .
2007 1 22 .
2008 1 23 23
2009 1 37 37
2010 1 38 38
2011 1 . .
2012 1 . .
2005 2 24 .
2006 2 25 .
2007 2 21 .
2008 2 27 27
2009 2 33 33
2010 2 . .
2011 2 37 37
2012 2 . .

Note that NA is the jargon of some other programs, but not native to Stata. Stata calls these "missing values".
If you (1) segregate the observations with missing values, then (2) the last so many observations with non-missing values can be identified immediately by sorting within the remaining observations.
. clear
. input id01 xsection x z
id01 xsection x z
1. 2005 1 20 .
2. 2006 1 21 .
3. 2007 1 22 .
4. 2008 1 23 23
5. 2009 1 37 37
6. 2010 1 38 38
7. 2011 1 . .
8. 2012 1 . .
9. 2005 2 24 .
10. 2006 2 25 .
11. 2007 2 21 .
12. 2008 2 27 27
13. 2009 2 33 33
14. 2010 2 . .
15. 2011 2 37 37
16. 2012 2 . .
17. end
. gen ismiss = missing(x)
. bysort ismiss xsection (id01) : gen z_last = x if _N - _n < 3
(10 missing values generated)
. sort id01 xsection
. assert z_last == z
Here z was supplied as the wanted result; z_last is calculated from x, and the assert shows that the two are equivalent.
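For readers who want to check the logic outside Stata, here is a minimal Python sketch of the same two steps (set the missing values aside, then keep the last k rows by time within each cross section). The function name keep_last_nonmissing and the use of None for Stata's missing value are my own assumptions for illustration; this is not a translation of the Stata log above.

```python
from collections import defaultdict

# Rows mirror the example: (id01, xsection, x); None stands in for Stata's missing value (.)
rows = [
    (2005, 1, 20), (2006, 1, 21), (2007, 1, 22), (2008, 1, 23),
    (2009, 1, 37), (2010, 1, 38), (2011, 1, None), (2012, 1, None),
    (2005, 2, 24), (2006, 2, 25), (2007, 2, 21), (2008, 2, 27),
    (2009, 2, 33), (2010, 2, None), (2011, 2, 37), (2012, 2, None),
]

def keep_last_nonmissing(rows, k):
    """Return x with every value blanked except the last k non-missing ones per xsection."""
    # Step 1: set the missing values aside by collecting only non-missing row positions.
    positions = defaultdict(list)
    for idx, (year, xsection, x) in enumerate(rows):
        if x is not None:
            positions[xsection].append((year, idx))
    # Step 2: sort each group by time and keep the positions of its last k rows.
    keep = set()
    for group in positions.values():
        keep.update(idx for _, idx in sorted(group)[-k:])
    return [x if idx in keep else None for idx, (_, _, x) in enumerate(rows)]

z = keep_last_nonmissing(rows, 3)
```

With k = 3 the list z should match the z column of the example; changing k to 20 corresponds to the original request.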

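This answer is a bit clunky, but it should get the job done. If x is the variable whose values you want to set to missing:
* sort by time within each cross section so that _n follows time order
bysort xsection (id01): gen maxCount = _N
by xsection: gen counter = _n
gen dropVar = maxCount - counter
replace x = . if dropVar >= 20
The equal sign should be included: within each cross section the last 20 observations have dropVar running from 0 to 19, so dropVar >= 20 picks out everything before them. Note that _N and _n count every observation in the cross section, including those where x is missing.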

Creating subset of a data

I have a column called Project_Id which lists the names of many different projects, say Project A, Project B and so on. A second column lists the sales for each project.
A third column shows time series information. For example:
Project_ID Sales Time Series Information
A 10 1
A 25 2
A 31 3
A 59 4
B 22 1
B 38 2
B 76 3
C 82 1
C 23 2
C 83 3
C 12 4
C 90 5
D 14 1
D 62 2
From this dataset, I need to keep only those projects that have at least 4 time series points, say, and store them in a new dataset. My question is how to do this in R. The new dataset should look like this:
Project_ID Sales Time Series Information
A 10 1
A 25 2
A 31 3
A 59 4
C 82 1
C 23 2
C 83 3
C 12 4
C 90 5
Could someone please help?
Thanks a lot!
I tried to do some filtering with R but had little success.
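The question asks for R, so the following is only a sketch of the grouping logic in plain Python: count the rows per project, then keep the rows of projects that reach the threshold. The function name projects_with_min_points and the tuple layout (Project_ID, Sales, time index) are assumptions made for illustration.

```python
from collections import Counter

# (Project_ID, Sales, time series index), copied from the example table
records = [
    ("A", 10, 1), ("A", 25, 2), ("A", 31, 3), ("A", 59, 4),
    ("B", 22, 1), ("B", 38, 2), ("B", 76, 3),
    ("C", 82, 1), ("C", 23, 2), ("C", 83, 3), ("C", 12, 4), ("C", 90, 5),
    ("D", 14, 1), ("D", 62, 2),
]

def projects_with_min_points(records, min_points):
    """Keep only rows whose project has at least min_points rows in total."""
    counts = Counter(project for project, _, _ in records)
    return [row for row in records if counts[row[0]] >= min_points]

subset = projects_with_min_points(records, 4)
```

Here subset keeps the rows of projects A and C in their original order, which is the new dataset described in the question.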

Make a matrix B of the first, fourth and fifth row and the first and fifth column from matrix A in OCTAVE

I have matrix A
A =
5 10 15 20 25
10 9 8 7 6
-5 -15 -25 -35 -45
1 2 3 4 5
28 91 154 217 280
I need to make a matrix B from the first, fourth and fifth rows and the first and fifth columns of matrix A.
How can I do it?
>> B = A([1,4,5],[1,5])
B =
5 25
1 5
28 280
You should look up how to use index expressions in the MATLAB and Octave languages to extract and work with submatrices.
See the Octave help on Index expressions: https://octave.org/doc/latest/Index-Expressions.html
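To make explicit what the index expression does, here is a small Python sketch of the same row and column selection on a nested list. It is only an illustration of the idea, not Octave code; the variable names are assumed, and the 1-based indices are converted to Python's 0-based ones by hand.

```python
A = [
    [5, 10, 15, 20, 25],
    [10, 9, 8, 7, 6],
    [-5, -15, -25, -35, -45],
    [1, 2, 3, 4, 5],
    [28, 91, 154, 217, 280],
]

rows = [1, 4, 5]  # 1-based row numbers, as in A([1,4,5],[1,5])
cols = [1, 5]     # 1-based column numbers

# Every selected row is crossed with every selected column.
B = [[A[r - 1][c - 1] for c in cols] for r in rows]
```

B then holds the same three rows and two columns as the Octave result shown above.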

Calculate within correlation in panel data long form

We have a simple panel data set in long form, which has the following structure:
i t x
1 Aug-2011 282
2 Aug-2011 -220
1 Sep-2011 334
2 Sep-2011 126
1 Sep-2012 -573
2 Sep-2012 305
1 Nov-2013 335
2 Nov-2013 205
3 Nov-2013 485
I would like to get the cross-correlation between each i within the time-variable t.
This would be possible by converting the data to wide format. Unfortunately, this approach is not feasible because of the large number of i and t values in the real data set.
Is it possible to do something like in this fictional command:
by (tabulate t): corr x
You can easily calculate the correlations of a single variable such as x across panel groups using the reshape option of the community-contributed command pwcorrf:
ssc install pwcorrf
For illustration, consider (a slightly simplified version of) your toy example:
clear
input i t x
1 2011 282
2 2011 -220
1 2012 334
2 2012 126
1 2013 -573
2 2013 305
1 2014 335
2 2014 205
3 2014 485
end
xtset i t
panel variable: i (unbalanced)
time variable: t, 2011 to 2014
delta: 1 unit
pwcorrf x, reshape
Variable(s): x
Panel var: i
corrMatrix[3,3]
            1           2           3
1           1           0           0
2  -.54223207           1           0
3           .           .           1
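As a cross-check on the figure above, here is a minimal Python sketch that computes the Pearson correlation of x between each pair of panel units over the time periods they share, without reshaping to wide format. Treating the correlation as pairwise over common periods is my assumption about what is wanted; I am not claiming this is how pwcorrf works internally, and the function names are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# (i, t, x) in long form, copied from the toy example
data = [
    (1, 2011, 282), (2, 2011, -220), (1, 2012, 334), (2, 2012, 126),
    (1, 2013, -573), (2, 2013, 305), (1, 2014, 335), (2, 2014, 205),
    (3, 2014, 485),
]

def pearson(a, b):
    """Pearson correlation of two equal-length lists, or None if undefined."""
    n = len(a)
    if n < 2:
        return None
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return None
    return cov / sqrt(var_a * var_b)

def panel_correlations(data):
    """Correlate x between every pair of units over their common time periods."""
    series = defaultdict(dict)
    for unit, period, value in data:
        series[unit][period] = value
    result = {}
    for a, b in combinations(sorted(series), 2):
        common = sorted(series[a].keys() & series[b].keys())
        result[(a, b)] = pearson([series[a][t] for t in common],
                                 [series[b][t] for t in common])
    return result

corr = panel_correlations(data)
```

For units 1 and 2 this gives about -0.5422, in line with the matrix above. Unit 3 shares only one period with the others, so its correlations are undefined (None here, . in Stata).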

ZIO 2012: Toy Set [closed]

A toy set contains blocks showing the numbers from 1 to 9. There are plenty of blocks
showing each number and blocks showing the same number are indistinguishable. We
want to examine the number of different ways of arranging the blocks in a sequence so
that the displayed numbers add up to a fixed sum.
For example, suppose the sum is 4. There are 8 different arrangements:
1 1 1 1
1 1 2
1 2 1
1 3
2 1 1
2 2
3 1
4
The rows are arranged in dictionary order (that is, as they would appear if they were
listed in a dictionary).
In each of the cases below, you are given the desired sum S and a number K. You have
to write down the Kth line when all arrangements that add up to S are written down
as described above. For instance, if S is 4 and K is 5, the answer is 2 1 1. Remember
that S may be large, but the numbers on the blocks are only from 1 to 9.
(a) S = 9, K = 156 (b) S = 11, K = 881 (c) S = 14, K = 4583
Each case (1111, 112, etc.) resembles what is known in maths as a partition of a number. In maths, 112 and 121 count as the same partition, but here I have to treat them as different partitions.
I tried brute force, looking for a common pattern. Consider an array par[] holding all the partitions of 9 (the first part of the question), arranged in dictionary order: par[0] = 111111111, par[1] = 11111112, and par[2] to par[3] are the 2 terms 11111121 and 1111113. If we look carefully at their last digits, we notice that they are partitions of 3. So the partitions starting with 1 follow the order 1 + 1 (partitions of 2) + 2 (partitions of 3) + 4 (partitions of 4) and so on, increasing in powers of 2, until par[127] = 18, which matches the number of partitions of 8. Adding these counts up gives powers of 2.
However, I am stuck on calculating position 156: par[128] = 21111111, and I am unable to take my method further. A recurrence relation or pseudocode would be most welcome. The answer as an integer is available online, but not the algorithm. Please help me out.
Source: http://www.iarcs.org.in/inoi/2012/zio2012/zio2012-qpaper.pdf
Solution: http://www.iarcs.org.in/inoi/2012/zio2012/zio2012-solutions.pdf
A hint:
partitions of 1
1 the number itself
partitions of 2
11 1 followed by partitions of 1
2 the number itself
partitions of 3
111 1 followed by partitions of 2
12 .
21 2 followed by partitions of 1
3 the number itself
partitions of 4
1111 1 followed by partitions of 3
112 .
121 .
13 .
211 2 followed by partitions of 2
22 .
31 3 followed by partitions of 1
4 the number itself
partitions of 5
11111 1 followed by partitions of 4
1112 .
1121 .
113 .
1211 .
122 .
131 .
14 .
2111 2 followed by partitions of 3
212 .
221 .
23 .
311 3 followed by partitions of 2
32 .
41 4 followed by partitions of 1
5 the number itself
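The pattern in the hint (the arrangements of S that start with d are d followed by the arrangements of S - d) can be turned into a counting recurrence, and the counts can then be used to skip whole blocks until the Kth line is reached. The Python sketch below is one way to do that under the problem's stated limit of blocks 1 to 9; the function names count and kth_arrangement are my own.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def count(total):
    """Number of ordered arrangements of blocks 1..9 that add up to total."""
    if total == 0:
        return 1  # the empty arrangement
    return sum(count(total - d) for d in range(1, min(9, total) + 1))

def kth_arrangement(total, k):
    """Return the k-th (1-based) arrangement in dictionary order, or None if k is out of range."""
    if k < 1 or k > count(total):
        return None
    result = []
    while total > 0:
        # Arrangements starting with d form one contiguous block of size count(total - d).
        for d in range(1, min(9, total) + 1):
            block = count(total - d)
            if k <= block:
                result.append(d)
                total -= d
                break
            k -= block  # skip the whole block that starts with d
    return result
```

For the worked example in the question, kth_arrangement(4, 5) returns [2, 1, 1]. The three cases (a) to (c) can be evaluated the same way and compared with the published solutions.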

How can I define a verb in J that applies a different verb alternately to each atom in a list?

Imagine I've defined the following name in J:
m =: >: i. 2 4 5
This looks like the following:
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25
26 27 28 29 30
31 32 33 34 35
36 37 38 39 40
I want to create a monadic verb of rank 1 that applies to each list in this list of lists. It will double (+:) or add 1 (>:) to each alternate item in the list. If we were to apply this verb to the first row, we'd get 2 3 6 5 10.
It's fairly easy to get a list of booleans which alternate with each item, e.g., 0 1 $~{:$ m gives us 0 1 0 1 0. I thought, aha! I'll use something like +:`>: @. followed by some expression, but I could never quite get it to work.
Any suggestions?
UPDATE
The following appears to work, but perhaps it can be refactored into something more elegant by a J pro.
poop =: monad define
(($ y) $ 0 1 $~{:$ y) ((]+:)`(]>:) @. [)"0 y
)
I would use the oblique verb, with rank 1 (/."1)- so it applies to successive elements of each list in turn.
You can pass a gerund into /. and it applies them in order, extending cyclically.
+:`>: /."1 m
2
3
6
5
10
12
8
16
10
20
22
13
26
15
30
32
18
36
20
40
42
23
46
25
50
52
28
56
30
60
62
33
66
35
70
72
38
76
40
80
I spent a long time looking at it, and I believe I know why ,@ works to recover the shape of the argument.
The shape of the arguments to the parenthesized phrase is the shape of the argument passed to it on the right, even though the rank is altered by the " conjunction (well, that is what trace called it; I thought it was an adverb). If , were monadic, it would be a ravel, and the result would be a vector, or at least of a lower rank than the input, based on adverbs to ravel. That is what happens if you take the conjunction out: you get a vector.
So what I believe is happening is that the conjunction is making , act like a dyadic , which is called append. Append alters what it is appending to fit what it is appending to. It is appending to nothing, but that thing still has a shape, and so it ends up altering the intermediate vector back to the shape of the input.
Now I'm probably wrong. But $ ,"0@(+:`>:/.)"1 >: i. 2 4 5 gives 2 4 5 1 1, which I thought sort of proved my case.
(,@(+:`>:/.)"1 a) works, but note that ((* 2 1 $~ $)@(+ 0 1 $~ $)"1 a) would also have worked (and is about 20 times faster on large arrays, in my brief tests).
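For anyone who wants to verify the expected numbers without a J session, here is a small Python sketch of the operation itself: double the items at even positions and add 1 to the items at odd positions of each row. It only illustrates the intended result; it is not a rendering of any of the J phrases above, and the names alternate and m are assumed.

```python
def alternate(row):
    """Double items at even positions (0, 2, ...) and add 1 to items at odd positions."""
    return [2 * v if i % 2 == 0 else v + 1 for i, v in enumerate(row)]

# Same values as >: i. 2 4 5 in J: the numbers 1..40 in 2 blocks of 4 rows of 5
m = [[[1 + 20 * b + 5 * r + c for c in range(5)] for r in range(4)] for b in range(2)]

# Apply the function to every row, which is what rank 1 does in J
result = [[alternate(row) for row in block] for block in m]
```

The first row of result is 2 3 6 5 10, as required in the question, and the shape of m is preserved.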
