Why does panelr::long_panel() keep giving "column name "" cannot match any column" error?

I have a very basic economic dataset in a consistent wide panel format: years as column names, rows denoting different countries. This should be pretty straightforward to reshape into a long panel. I have found workarounds but I'd like to know how to do this with panelr::long_panel() since it is so much simpler.
However, I keep getting the error "column name "" cannot match any column". Here is a reproducible example:
library(panelr)
mockcountries <- c("A", "B", "C")
mockyears <- c(2001:2020)
mockdata <- data.frame(replicate(20,sample(0:1,3,rep=TRUE)))
mockdata <- cbind(mockcountries, mockdata)
colnames(mockdata) <- c("id", mockyears)
At this point the data looks like this:
id 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016
1 A 0 0 1 1 0 0 0 1 0 1 0 0 0 1 1 0
2 B 0 1 0 1 0 0 0 0 0 0 0 1 0 1 1 0
3 C 1 1 0 1 0 0 0 1 1 1 1 1 0 0 1 0
2017 2018 2019 2020
1 1 0 1 1
2 0 1 1 0
3 1 1 0 0
Then I try to use panelr::long_panel()
mockdata_panel <- panelr::long_panel(mockdata,
id = "id",
begin = 2001,
end = 2020)
#OR alternatively
years <- c("2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013",
"2014", "2015", "2016", "2017", "2018", "2019", "2020")
mockdata_panel <- panelr::long_panel(mockdata,
id = "id",
periods = years)
And I get the following error:
Error in [<-.data.frame(*tmp*, , v.names, value = c(0L, 0L, 1L)) :
column name "" cannot match any column
Neither approach seems to work. Where does it go wrong? Thank you!


DAX formula for rept customers

I recently worked on a task where I needed to identify new clients.
I found something similar on Google and ended up with the measure below, but I don't understand the logic behind it. Maybe you can help me understand it.
I wrongly assumed that the second filter should be >=MIN(Sheet1[Data]), not <MIN(Sheet1[Data]).
I improvised some data along with the formula.
new_cust =
CALCULATE(
DISTINCTCOUNT(Sheet1[Cust_id])
,FILTER(
ALL(Sheet1[Data])
,Sheet1[Data]<=MAX(Sheet1[Data])
)
)
-
CALCULATE(
DISTINCTCOUNT(Sheet1[Cust_id])
,FILTER(
ALL(Sheet1[Data])
,Sheet1[Data]<MIN(Sheet1[Data])
)
)
Cust_id Data New_Cust
1 1/1/2023 1
1 1/2/2023 0
2 1/3/2023 1
2 1/4/2023 0
2 1/5/2023 0
3 1/6/2023 1
3 1/7/2023 0
1 2/1/2023 0
1 2/2/2023 0
3 2/3/2023 0
3 2/4/2023 0
3 2/5/2023 0
4 2/6/2023 1
4 2/7/2023 0
4 2/8/2023 0
1 3/1/2023 0
1 3/2/2023 0
2 3/3/2023 0
2 3/4/2023 0
3 3/5/2023 0
3 3/6/2023 0
4 3/7/2023 0
4 3/8/2023 0
6 3/9/2023 1
6 3/10/2023 0
Thank you in advance for your understanding and help

How do I rearrange elements in a 1D matrix array in-place?

I want to align the memory of a 5x5 matrix represented as a one-dimensional array.
The original array looks like this:
let mut a = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25];
or
[ 1 2 3 4 5 ]
[ 6 7 8 9 10 ]
a = [ 11 12 13 14 15 ]
[ 16 17 18 19 20 ]
[ 21 22 23 24 25 ]
with a length of 25 elements.
After resizing the memory to aligned bounds (a power of 2), the array will look like this:
a = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ];
or
[ 1 2 3 4 5 6 7 8 ]
[ 9 10 11 12 13 14 15 16 ]
[ 17 18 19 20 21 22 23 24 ]
[ 25 0 0 0 0 0 0 0 ]
a = [ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
The length of a is now 64 elements, so it becomes an 8x8 matrix.
The goal is to have the following representation:
a = [1 2 3 4 5 0 0 0 6 7 8 9 10 0 0 0 11 12 13 14 15 0 0 0 16 17 18 19 20 0 0 0 21 22 23 24 25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ];
or
[ 1 2 3 4 5 0 0 0 ]
[ 6 7 8 9 10 0 0 0 ]
[ 11 12 13 14 15 0 0 0 ]
[ 16 17 18 19 20 0 0 0 ]
[ 21 22 23 24 25 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 0 0 ]
The background is that memory aligned to a power of two lets the calculations be done partially in parallel (for OpenCL float4, or whatever vector sizes are available). I also do not want to allocate a new array just to insert the old elements at the correct positions, because I want to keep memory consumption low.
At first, I thought about swapping the elements in the ranges where a zero should be with the elements at the end of the array, keeping a pointer to the elements and simulating a queue, but elements would stack up towards the end, and I didn't come up with a working solution.
My language of choice is Rust. Is there any smart algorithm to achieve the desired result?
So you have an N * N matrix represented as a vector of size N^2, then you resize the vector to M^2 (M > N), so that the first N^2 elements are the original ones. Now you want to rearrange the original elements, so that the N * N sub-matrix in the upper left of the M * M matrix is the same as the original.
One thing to note is that if you go backwards you will never overwrite a value that you will need later.
The position of index X in the M * M matrix is row X / M (integer division) and column X % M.
The desired position of index X is row X / N and column X % N
An element at row R and column C in the M * M matrix has the index R * M + C
Now taking all this information we can come up with the formula to get the new index Y for the old index X:
Y = (X / N) * M + (X % N)
So you can just loop from N^2 - 1 down to N (the first row, indices 0 to N - 1, is already in place), copy each element to the new position calculated with the formula, and set its original position to 0. (Everything is 0-based; I hope Rust is 0-based as well, or you will have to add some +1.)
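The formula can be checked on the 5x5 to 8x8 case from the question with a small sketch. The names new_index and move_plan are made up for illustration. Because M > N, the new index Y is never smaller than X, which is why walking backwards never overwrites an element that still has to be read.

```rust
/// Maps flat index `x` of an n-by-n matrix to its flat index inside an
/// m-by-m matrix (m > n) whose upper-left corner holds the original.
/// Hypothetical helper name, not taken from any library.
fn new_index(x: usize, n: usize, m: usize) -> usize {
    (x / n) * m + (x % n)
}

/// Lists the (old, new) index pairs in the order the backward loop visits
/// them: from n*n - 1 down to n. The first row never moves.
fn move_plan(n: usize, m: usize) -> Vec<(usize, usize)> {
    (n..n * n).rev().map(|x| (x, new_index(x, n, m))).collect()
}
```

For N = 5 and M = 8, the last element (index 24) moves to index 36, and the first element of the second row (index 5) moves to index 8.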
According to maraca's solution, the code would look like this:
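/// Pads a row-major `dims.0` x `dims.1` matrix in place so that both
/// dimensions become powers of two. New cells are set to `first`.
fn zeropad<T: Copy>(
    first: T,
    data: &mut Vec<T>,
    dims: (usize, usize),
) -> (usize, usize) {
    // round each dimension up with std's usize::next_power_of_two
    let r = dims.0.next_power_of_two();
    let c = dims.1.next_power_of_two();
    if (r, c) == dims {
        return (r, c);
    }
    let old_len = data.len();
    let old_col = dims.1;
    // resize; the original elements stay at the front of the buffer
    data.resize(r * c, first);
    // walk backwards so no element is overwritten before it has been moved;
    // the first row (indices 0..old_col) is already in place
    for pos_old in (old_col..old_len).rev() {
        // same row and column, but with the wider row stride `c`
        let pos_new = (pos_old / old_col) * c + (pos_old % old_col);
        // if only the row count grew (c == old_col) nothing moves,
        // and clearing the slot would destroy the element
        if pos_new != pos_old {
            data[pos_new] = data[pos_old];
            data[pos_old] = first;
        }
    }
    (r, c)
}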

How to create relational matrix?

I have the following data:
client_id <- c(1,2,3,1,2,3)
product_id <- c(10,10,10,20,20,20)
connected <- c(1,1,0,1,0,0)
clientID_productID <- paste0(client_id,";",product_id)
df <- data.frame(client_id, product_id,connected,clientID_productID)
client_id product_id connected clientID_productID
1 1 10 1 1;10
2 2 10 1 2;10
3 3 10 0 3;10
4 1 20 1 1;20
5 2 20 0 2;20
6 3 20 0 3;20
The goal is to produce a relational matrix:
client_id product_id clientID_productID client_pro_1_10 client_pro_2_10 client_pro_3_10 client_pro_1_20 client_pro_2_20 client_pro_3_20
1 1 10 1;10 0 1 0 0 0 0
2 2 10 2;10 1 0 0 0 0 0
3 3 10 3;10 0 0 0 0 0 0
4 1 20 1;20 0 0 0 0 0 0
5 2 20 2;20 0 0 0 0 0 0
6 3 20 3;20 0 0 0 0 0 0
In other words, when product_id equals 10, clients 1 and 2 are connected. Importantly, I do not want client 1 to be connected with herself. When product_id=20, I have only one client, meaning that there is no connection, so I should have only zeros.
To be more specific, all that I am trying to create is a square matrix of relations, with all the combinations of client/product in the columns. A client can only be connected with another if they bought the same product.
I have searched a bunch and played with other code. The difference between this problem and others already answered is that I want to keep on my table client number 3, even though she never bought any product. I want to show that she does not have a relationship with any other client. Right now, I am able to create the matrix by stacking the relationships by product (How to create relational matrix in R?), but I am struggling with a way to not stack them.
I apologize if the question is not specific enough, or too specific. Thank you anyway, stackoverflow is a lifesaver for beginners.
I believe I figured it out.
It is for sure not the most elegant answer, though.
library(dplyr)    # inner_join()
library(reshape2) # dcast()
client_id <- c(1,2,3,1,2,3)
product_id <- c(10,10,10,20,20,20)
connected <- c(1,1,0,1,0,0)
clientID_productID <- paste0(client_id,";",product_id)
df <- data.frame(client_id, product_id,connected,clientID_productID)
df2 <- inner_join(df[c(1:3)], df[c(1:3)], by = c("product_id", "connected"))
df2$Source <- paste0(df2$client_id.x,"|",df2$product_id)
df2$Target <- paste0(df2$client_id.y,"|",df2$product_id)
df2 <- df2[order(df2$product_id),]
indices = unique(as.character(df2$Source))
mtx <- as.matrix(dcast(df2, Source ~ Target, value.var="connected", fill=0))
rownames(mtx) = mtx[,"Source"]
mtx <- mtx[,-1]
diag(mtx)=0
mtx = as.data.frame(mtx)
mtx = mtx[indices, indices]
I got the result I wanted:
1|10 2|10 3|10 1|20 2|20 3|20
1|10 0 1 0 0 0 0
2|10 1 0 0 0 0 0
3|10 0 0 0 0 0 0
1|20 0 0 0 0 0 0
2|20 0 0 0 0 0 0
3|20 0 0 0 0 0 0
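The rule from the question (two different client/product rows are related only when they share a product and both are connected) can also be written as a direct pairwise check. Below is a minimal sketch in Rust rather than R. The function name relation_matrix and the tuple layout (client_id, product_id, connected) are assumptions made for illustration.

```rust
/// Builds a square 0/1 matrix over the input rows. Cell (i, j) is 1 when
/// rows i and j are different rows, share a product, and are both connected.
/// Each row is (client_id, product_id, connected); this layout is assumed.
fn relation_matrix(rows: &[(u32, u32, bool)]) -> Vec<Vec<u8>> {
    let n = rows.len();
    let mut m = vec![vec![0u8; n]; n];
    for (i, a) in rows.iter().enumerate() {
        for (j, b) in rows.iter().enumerate() {
            let same_product = a.1 == b.1;
            let both_connected = a.2 && b.2;
            // i != j keeps the diagonal at zero (no self-relations)
            if i != j && same_product && both_connected {
                m[i][j] = 1;
            }
        }
    }
    m
}
```

Rows that are never connected, such as client 3, stay in the matrix as all-zero rows and columns.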

Kindly help to get office wise data row wise

select NUM_OFC_CODE,NUM_RO_CODE,
case when TXT_MONTH='JAN' then 1 ELSE 0 end as JAN,
case when TXT_MONTH='FEB' then 1 ELSE 0 end as FEB,
case when TXT_MONTH='MAR' then 1 ELSE 0 end as MAR,
case when TXT_MONTH='APR' then 1 ELSE 0 end as APR,
case when TXT_MONTH='MAY' then 1 ELSE 0 end as MAY,
case when TXT_MONTH='JUN' then 1 ELSE 0 end as JUN,
case when TXT_MONTH='JUL' then 1 ELSE 0 end as JUL,
case when TXT_MONTH='AUG' then 1 ELSE 0 end as AUG,
case when TXT_MONTH='SEP' then 1 ELSE 0 end as SEP,
case when TXT_MONTH='OCT' then 1 ELSE 0 end as OCT,
case when TXT_MONTH='NOV' then 1 ELSE 0 end as NOV,
case when TXT_MONTH='DEC' then 1 ELSE 0 end as DEC
from LEG_OMBUDSMAN_NONMACT where
NUM_YEAR=2019 group by NUM_OFC_CODE,TXT_MONTH,NUM_RO_CODE;
Result is showing as below:-
NUM_OFC_CODE NUM_RO_CODE JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC
280400 280000 0 0 0 0 0 0 0 1 0 0
282300 280000 0 0 0 0 0 0 0 1 0 0 0
281600 280000 0 0 0 0 0 0 0 1 0 0 0
280500 280000 0 0 0 0 0 0 1 0 0 0 0
280500 280000 0 0 0 1 0 0 0 0 0 0 0
281800 280000 0 0 0 0 0 0 0 1 0 0 0
282200 280000 0 0 0 0 0 0 0 1 0 0 0
280500 280000 0 0 0 0 1 0 0 0 0 0 0
280500 280000 0 0 0 0 0 1 0 0 0 0 0
280500 280000 0 0 0 0 0 0 0 1 0 0 0
281300 280000 0 0 0 0 0 0 0 1 0 0 0
I want one row per office. If August data is present, it should show 1, else 0; likewise for the other months. But my query returns a separate row for each month.
Basically you have to group the data only by NUM_OFC_CODE,NUM_RO_CODE (excluding TXT_MONTH, as you don't want a row for each instance of TXT_MONTH) and then use something like NVL(MAX(CASE WHEN TXT_MONTH='JAN' THEN 1 END), 0) as JAN (using an aggregate function to decide whether an entry exists or not) etc.
It's easier with the use of pivot:
-- Just some sampledata:
WITH LEG_OMBUDSMAN_NONMACT(NUM_OFC_CODE, NUM_RO_CODE, NUM_YEAR, TXT_MONTH) AS
(SELECT 1,1,2019, 'JAN' FROM dual union ALL
SELECT 1,1,2019, 'FEB' FROM dual)
-- Here starts the actual query:
SELECT NUM_OFC_CODE, NUM_RO_CODE
, NVL(JAN,0) AS JAN
, NVL(FEB,0) AS FEB
, NVL(MAR,0) AS MAR
, NVL(APR,0) AS APR
, NVL(MAY,0) AS MAY
, NVL(JUN,0) AS JUN
, NVL(JUL,0) AS JUL
, NVL(AUG,0) AS AUG
, NVL(SEP,0) AS SEP
, NVL(OCT,0) AS OCT
, NVL(NOV,0) AS NOV
, NVL(DEC,0) AS DEC
FROM LEG_OMBUDSMAN_NONMACT
pivot (MAX(1) FOR TXT_MONTH IN ('JAN' AS JAN,'FEB' AS FEB,'MAR' as MAR, 'APR' as APR, 'MAY' as MAY, 'JUN' as JUN, 'JUL' as JUL, 'AUG' as AUG, 'SEP' as SEP, 'OCT' as OCT, 'NOV' as NOV, 'DEC' as DEC ))
WHERE NUM_YEAR=2019
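What grouping by office and taking MAX over each month flag computes can be sketched outside SQL. The Rust below is only an illustration of that logic, not a replacement for the query. The function name month_flags and the tuple layout (NUM_OFC_CODE, NUM_RO_CODE, TXT_MONTH) are assumptions.

```rust
use std::collections::BTreeMap;

const MONTHS: [&str; 12] = [
    "JAN", "FEB", "MAR", "APR", "MAY", "JUN",
    "JUL", "AUG", "SEP", "OCT", "NOV", "DEC",
];

/// Collapses (office, regional office, month) rows into one entry per
/// (office, regional office) pair with a 0/1 flag for each month.
fn month_flags(rows: &[(u32, u32, &str)]) -> BTreeMap<(u32, u32), [u8; 12]> {
    let mut out: BTreeMap<(u32, u32), [u8; 12]> = BTreeMap::new();
    for &(ofc, ro, month) in rows {
        // one entry per group, like GROUP BY NUM_OFC_CODE, NUM_RO_CODE
        let flags = out.entry((ofc, ro)).or_insert([0u8; 12]);
        // setting the flag plays the role of MAX(CASE WHEN ... THEN 1 ELSE 0 END)
        if let Some(i) = MONTHS.iter().position(|m| *m == month) {
            flags[i] = 1;
        }
    }
    out
}
```

Several input rows for the same office end up in a single entry, which is the one-row-per-office result the question asks for.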
Your query is fine; it just needs a couple of changes:
Remove txt_month from group by.
Use Max in all case statements.
So your query should look like this
select NUM_OFC_CODE,
NUM_RO_CODE,
Max(case when TXT_MONTH='JAN' then 1 ELSE 0 end) as JAN,
Max(case when TXT_MONTH='FEB' then 1 ELSE 0 end) as FEB,
Max(case when TXT_MONTH='MAR' then 1 ELSE 0 end) as MAR,
Max(case when TXT_MONTH='APR' then 1 ELSE 0 end) as APR,
Max(case when TXT_MONTH='MAY' then 1 ELSE 0 end) as MAY,
Max(case when TXT_MONTH='JUN' then 1 ELSE 0 end) as JUN,
Max(case when TXT_MONTH='JUL' then 1 ELSE 0 end) as JUL,
Max(case when TXT_MONTH='AUG' then 1 ELSE 0 end) as AUG,
Max(case when TXT_MONTH='SEP' then 1 ELSE 0 end) as SEP,
Max(case when TXT_MONTH='OCT' then 1 ELSE 0 end) as OCT,
Max(case when TXT_MONTH='NOV' then 1 ELSE 0 end) as NOV,
Max(case when TXT_MONTH='DEC' then 1 ELSE 0 end) as DEC
from LEG_OMBUDSMAN_NONMACT
where NUM_YEAR=2019
group by NUM_OFC_CODE ,NUM_RO_CODE;
Cheers!!

how can I create an incidence matrix in Julia

I would like to create an incidence matrix.
I have a file with 3 columns, like:
id x y
A 22 2
B 4 21
C 21 360
D 26 2
E 22 58
F 2 347
And I want a matrix like (without col and row names):
2 4 21 22 26 58 347 360
A 1 0 0 1 0 0 0 0
B 0 1 1 0 0 0 0 0
C 0 0 1 0 0 0 0 1
D 1 0 0 0 1 0 0 0
E 0 0 0 1 0 1 0 0
F 1 0 0 0 0 0 1 0
I have started the code like:
haps = readdlm("File.txt",header=true)
hap1_2 = map(Int64,haps[1][:,2:end])
ID = (haps[1][:,1])
dic1 = Dict()
for (i in 1:21)
dic1[ID[i]] = hap1_2[i,:]
end
X=[zeros(21,22)]; #the original file has 21 rows and 22 columns
X1 = hcat(ID,X)
The problem now is that I don't know how to fill the matrix with 1s in the specific columns as in the example above.
I'm also not sure if I'm on the right way.
Any suggestion that could help me??
Thanks!
NamedArrays is a neat package which allows naming both rows and columns and seems to fit the bill for this problem. Suppose the data is in data.csv, here is one method to go about it (install NamedArrays with Pkg.add("NamedArrays")):
data,header = readcsv("data.csv",header=true);
# get the column names by looking at unique values in columns
cols = unique(vec([(header[j+1],data[i,j+1]) for i in 1:size(data,1),j=1:2]))
# row names from ID column
rows = data[:,1]
using NamedArrays
narr = NamedArray(zeros(Int,length(rows),length(cols)),(rows,cols),("id","attr"));
# now stamp in the 1s in the right places
for r=1:size(data,1),c=2:size(data,2) narr[data[r,1],(header[c],data[r,c])] = 1 ; end
Now we have (note I transposed narr for better printout):
julia> narr'
10x6 NamedArray{Int64,2}:
attr ╲ id │ A B C D E F
──────────┼─────────────────
("x",22) │ 1 0 0 0 1 0
("x",4) │ 0 1 0 0 0 0
("x",21) │ 0 0 1 0 0 0
("x",26) │ 0 0 0 1 0 0
("x",2) │ 0 0 0 0 0 1
("y",2) │ 1 0 0 1 0 0
("y",21) │ 0 1 0 0 0 0
("y",360) │ 0 0 1 0 0 0
("y",58) │ 0 0 0 0 1 0
("y",347) │ 0 0 0 0 0 1
But, if DataFrames are necessary, similar tricks should apply.
---------- UPDATE ----------
If the source column of a value should be ignored, i.e. x=2 and y=2 should both set a 1 in the column for value 2, then the code becomes:
using NamedArrays
data,header = readcsv("data.csv",header=true);
rows = data[:,1]
cols = map(string,sort(unique(vec(data[:,2:end]))))
narr = NamedArray(zeros(Int,length(rows),length(cols)),(rows,cols),("id","attr"));
for r=1:size(data,1),c=2:size(data,2) narr[data[r,1],string(data[r,c])] = 1 ; end
giving:
julia> narr
6x8 NamedArray{Int64,2}:
id ╲ attr │ 2 4 21 22 26 58 347 360
──────────┼───────────────────────────────────────
A │ 1 0 0 1 0 0 0 0
B │ 0 1 1 0 0 0 0 0
C │ 0 0 1 0 0 0 0 1
D │ 1 0 0 0 1 0 0 0
E │ 0 0 0 1 0 1 0 0
F │ 1 0 0 0 0 0 1 0
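The same construction (sorted unique values as columns, then a 1 wherever a row contains that value) does not depend on NamedArrays. Here is a minimal sketch of the idea in Rust rather than Julia. The function name incidence and the (x, y) tuple layout are assumptions made for illustration.

```rust
use std::collections::BTreeSet;

/// Returns the sorted unique values (the column labels) and a 0/1 matrix
/// with one row per input pair. The source column of a value is ignored,
/// so x = 2 and y = 2 both mark the column for value 2.
fn incidence(rows: &[(u32, u32)]) -> (Vec<u32>, Vec<Vec<u8>>) {
    // a BTreeSet both deduplicates and sorts the values
    let cols: Vec<u32> = rows
        .iter()
        .flat_map(|&(x, y)| [x, y])
        .collect::<BTreeSet<u32>>()
        .into_iter()
        .collect();
    let matrix: Vec<Vec<u8>> = rows
        .iter()
        .map(|&(x, y)| cols.iter().map(|&c| u8::from(c == x || c == y)).collect())
        .collect();
    (cols, matrix)
}
```

Fed with the six (x, y) pairs from the question, the column labels come out as 2, 4, 21, 22, 26, 58, 347, 360, matching the header of the table above.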
Here is a slight variation on something that I use for creating sparse matrices out of categorical variables for regression analyses. The function includes a variety of comments and options to suit it to your needs. Note: as written, it treats the appearances of "2" and "21" in x and y as separate. It is far less elegant in naming and appearance than the nice response from Dan Getz. The main advantage here is that it works with sparse matrices so if your data is huge, this will be helpful in reducing storage space and computation time.
function OneHot(x::Array, header::Bool)
UniqueVals = unique(x)
Val_to_Idx = [Val => Idx for (Idx, Val) in enumerate(unique(x))] ## create a dictionary that maps unique values in the input array to column positions in the new sparse matrix.
ColIdx = convert(Array{Int64}, [Val_to_Idx[Val] for Val in x])
MySparse = sparse(collect(1:length(x)), ColIdx, ones(Int32, length(x)))
if header
return [UniqueVals' ; MySparse] ## note: this won't be sparse
## alternatively use return (MySparse, UniqueVals) to get a tuple, second element is the header which you can then feed to something to name the columns or do whatever else with
else
return MySparse ## use MySparse[:, 2:end] to drop a value (which you would want to do for categorical variables in a regression)
end
end
x = [22, 4, 21, 26, 22, 2];
y = [2, 21, 360, 2, 58, 347];
Incidence = [OneHot(x, true) OneHot(y, true)]
7x10 Array{Int64,2}:
22 4 21 26 2 2 21 360 58 347
1 0 0 0 0 1 0 0 0 0
0 1 0 0 0 0 1 0 0 0
0 0 1 0 0 0 0 1 0 0
0 0 0 1 0 1 0 0 0 0
1 0 0 0 0 0 0 0 1 0
0 0 0 0 1 0 0 0 0 1
