Unable to set up xtset for panel data

I have panel data with different dates and object names.
I want to fit a panel VAR in Stata (the pvar command), but first I need to declare the panel structure with tsset or xtset. I tried:
sort object date
by object: gen t = _n
encode object , gen(icode)
xtset icode t
I get this error message:
t ambiguous abbreviation

I can't reproduce this problem. Consider this example:
. webuse grunfeld
. xtset
panel variable: company (strongly balanced)
time variable: year, 1935 to 1954
delta: 1 year
. gen t = time
. xtset company t
panel variable: company (strongly balanced)
time variable: t, 1 to 20
delta: 1 unit
Here t could be read as an ambiguous abbreviation for time, but Stata gives an exact variable name precedence over abbreviations, so it chooses t and the ambiguity doesn't bite.
In your case, I would ask for the results of
describe t*
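The precedence rule described above can be sketched in Python. This is a simplified model for illustration only, not Stata's actual implementation; the function name resolve_varname is hypothetical, and the variable list reuses only the names that appear in the example.

```python
def resolve_varname(name, variables):
    """Resolve a possibly abbreviated variable name (simplified model)."""
    # An exact match always wins, even if other names share the prefix.
    if name in variables:
        return name
    # Otherwise the name must be a prefix of exactly one variable.
    matches = [v for v in variables if v.startswith(name)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"variable {name} not found")
    raise ValueError(f"{name} ambiguous abbreviation")


# Variable names taken from the example above, after "gen t = time".
variables = ["company", "year", "time", "t"]
chosen = resolve_varname("t", variables)  # t exists, so it is chosen
```

Under this model, the error in the question would appear only if no variable named exactly t existed while two or more other names began with t, which is what describe t* would reveal.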


QuickSight add subset of fields

Total AWS QuickSight newbie here. I'm trying to import some cost data in CSV form into QuickSight and add some calculated fields.
The data I have is of the form:
Type    Units Consumed
A       2
B       3
A       1
B       5
... and so on
Unit Cost ($) is not part of the dataset and is something like
Unit Cost        Amount ($)
Unit Cost (A)    1
Unit Cost (B)    2
I would like to compute (either as part of the dataset or as part of an analysis visual, maybe) the total costs for A and B as separate line items. Something like
Total Cost (A) = Sum(Amount where Type = A) * Unit Cost (A)
Total Cost (B) = Sum(Amount where Type = B) * Unit Cost (B)
Here are the things I've tried which don't work:
sumOver({Units Consumed}, Type='A')
sumIf({Units Consumed}, Type='A')
To break it down and test smaller parts, I added a calculated field which simply does
sum({Units Consumed})
But it just adds a column to the dataset with every field as "Undefined".
How can I achieve what I'm trying to do?
I tried to replicate the code
sumIf({Units Consumed}, Type='A')
and it worked. Could you check whether Units Consumed is an integer column type?
How to change column type
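To make the intended calculation concrete, here is a minimal Python sketch of the arithmetic the question describes: sum the units per type, then multiply by that type's unit cost. It is not QuickSight syntax; the function name total_cost_by_type is hypothetical, and the data are the sample rows from the question.

```python
from collections import defaultdict

# Sample rows (Type, Units Consumed) and unit costs from the question.
rows = [("A", 2), ("B", 3), ("A", 1), ("B", 5)]
unit_cost = {"A": 1, "B": 2}


def total_cost_by_type(rows, unit_cost):
    """Sum units per type, then multiply by that type's unit cost."""
    units = defaultdict(int)
    for type_, consumed in rows:
        # Units must be numeric for the sum to work, which is why the
        # column type matters in the dataset as well.
        units[type_] += consumed
    return {t: units[t] * unit_cost[t] for t in units}


totals = total_cost_by_type(rows, unit_cost)
```

With the sample data, type A has 3 units at a cost of 1 and type B has 8 units at a cost of 2.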

Trying to generate an expression from a table based on table values

My SSRS report creates a summarized table. I need to do a calculation based on 2 pieces of data from the table and add it as an expression at the bottom of the report. My expression errors out as it is not reading the data output from the table.
I have followed up on all examples I could find with the error I get, but found no resolution. I am new to SSRS. I have tested the expression by simply using the IIf statement to return the values I'm looking for (without the rest of the calculations I need) and it doesn't return either value. I get "#Error" as the result. I have copied the lookup values directly from the SQL code, so I KNOW there are no typos in my comparison values.
I have this for the code in my expression:
=Code.Divide
(IIf(Fields!Results.Value = "BKR3 - Total Overtime Hours", Fields!Results.Value, 0)) ,
(IIf(Fields!Emp_Type.Value = "BKR1 - Total Paid Hours", Fields!Results.Value, 1))
Through another stackoverflow question I found this code and have added it to my report:
Public Function Divide(ByVal dividend as Double, ByVal divisor as Double) As Double
If IsNothing(divisor) Or divisor = 0 Or IsNothing(dividend) Or dividend=0 THEN
Return 0
Else
Return dividend/divisor
End If
End Function
I am getting this error:
The Value expression for the textrun
‘Textbox3.Paragraphs[0].TextRuns[0]’ contains an error: [BC30455]
Argument not specified for parameter 'divisor' of 'Public Function
Divide(dividend As Double, divisor As Double) As Double'.
This is what my output looks like:
Emp_Type                     Results
A3-Facility Payroll Hours    28,252.20
A4-Provider Payroll Hours    1,998.50
BKR1-Total Paid Hours        30,250.70
BKR2-Total Worked Hours      27,037.62
BKR3-Total Overtime Hours    504.20
BS1-Hospital FTEs            99.72
BS2-Clinic FTEs              23.25
Overtime %                   #Error
I am expecting this as my results:
Overtime % 0.18%
This is what I am getting instead:
Overtime % #Error
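For reference, the logic the Divide helper is meant to perform can be sketched in Python. This is only an illustration of the intended calculation, not SSRS code: the dictionary lookup stands in for picking the two rows out of the table, and the values are copied from the output above. The error text says the call reached Divide without a divisor, which is consistent with the parentheses around the first IIf being read as the whole argument list, though that reading is an assumption.

```python
def divide(dividend, divisor):
    """Mirror of the Divide helper: return 0 for missing or zero inputs."""
    if dividend is None or divisor is None or dividend == 0 or divisor == 0:
        return 0.0
    return dividend / divisor


# Values copied from the report output in the question.
results = {
    "BKR1-Total Paid Hours": 30250.70,
    "BKR3-Total Overtime Hours": 504.20,
}

# Both arguments are passed inside a single pair of call parentheses.
overtime_ratio = divide(
    results["BKR3-Total Overtime Hours"],
    results["BKR1-Total Paid Hours"],
)
```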

Poor h2o GBM Classification Performance in a balanced binomial response

In a fairly balanced binomial classification problem, I am observing an unusually high error rate from h2o.gbm when predicting class 0, even on the training set itself. The data come from a competition that is over, so my only interest is in understanding what is going wrong.
Confusion Matrix (vertical: actual; across: predicted) for F1-optimal threshold:
             0       1     Error              Rate
0       147857  234035  0.612830   =234035/381892
1        44782  271661  0.141517    =44782/316443
Totals  192639  505696  0.399260   =278817/698335
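As a quick check on how the Error column is derived, the following Python sketch recomputes the per-class and overall error rates from the four counts in the matrix. The helper names class_error and total_error are hypothetical.

```python
# Counts from the confusion matrix: cm[actual][predicted].
cm = {
    0: {0: 147857, 1: 234035},
    1: {0: 44782, 1: 271661},
}


def class_error(cm, actual):
    """Share of rows of one actual class that were predicted as another class."""
    row = cm[actual]
    wrong = sum(n for predicted, n in row.items() if predicted != actual)
    return wrong / sum(row.values())


def total_error(cm):
    """Share of all rows whose predicted class differs from the actual class."""
    wrong = sum(n for a, row in cm.items() for p, n in row.items() if p != a)
    total = sum(n for row in cm.values() for n in row.values())
    return wrong / total


error_class_0 = class_error(cm, 0)
error_class_1 = class_error(cm, 1)
error_overall = total_error(cm)
```

The class 0 error is high because most actual-0 rows fall in the predicted-1 column at this threshold.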
Any expert suggestions for treating the data and reducing the error are welcome.
I tried the following approaches, and the error did not decrease:
Approach 1: Selecting the top 5 most important variables via h2o.varimp(gbm).
Approach 2: Recoding the normalized variables as binary, with negative values as 0 and positive values as 1.
#Data Definition
# Variable Definition
#Independent Variables
# ID Unique ID for each observation
# Timestamp Unique value representing one day
# Stock_ID Unique ID representing one stock
# Volume Normalized values of volume traded of given stock ID on that timestamp
# Three_Day_Moving_Average Normalized values of three days moving average of Closing price for given stock ID (Including Current day)
# Five_Day_Moving_Average Normalized values of five days moving average of Closing price for given stock ID (Including Current day)
# Ten_Day_Moving_Average Normalized values of ten days moving average of Closing price for given stock ID (Including Current day)
# Twenty_Day_Moving_Average Normalized values of twenty days moving average of Closing price for given stock ID (Including Current day)
# True_Range Normalized values of true range for given stock ID
# Average_True_Range Normalized values of average true range for given stock ID
# Positive_Directional_Movement Normalized values of positive directional movement for given stock ID
# Negative_Directional_Movement Normalized values of negative directional movement for given stock ID
#Dependent Response Variable
# Outcome Binary outcome variable representing whether price for one particular stock at the tomorrow’s market close is higher(1) or lower(0) compared to the price at today’s market close
temp <- tempfile()
download.file('https://github.com/meethariprasad/trikaal/raw/master/Competetions/AnalyticsVidhya/Stock_Closure/test_6lvBXoI.zip',temp)
test <- read.csv(unz(temp, "test.csv"))
unlink(temp)
temp <- tempfile()
download.file('https://github.com/meethariprasad/trikaal/raw/master/Competetions/AnalyticsVidhya/Stock_Closure/train_xup5Mf8.zip',temp)
#Please wait for 60 Mb file to load.
train <- read.csv(unz(temp, "train.csv"))
unlink(temp)
summary(train)
#We don't want the ID
train<-train[,2:ncol(train)]
# Preserving Test ID if needed
ID<-test$ID
#Remove ID from test
test<-test[,2:ncol(test)]
#Create empty response column Outcome
test$Outcome<-NA
#Original
combi.imp<-rbind(train,test)
rm(train,test)
summary(combi.imp)
#Creating Factor Variable
combi.imp$Outcome<-as.factor(combi.imp$Outcome)
combi.imp$Stock_ID<-as.factor(combi.imp$Stock_ID)
combi.imp$timestamp<-as.factor(combi.imp$timestamp)
summary(combi.imp)
#Brute Force NA treatment by taking only complete cases without NA.
train.complete<-combi.imp[1:702739,]
train.complete<-train.complete[complete.cases(train.complete),]
test.complete<-combi.imp[702740:804685,]
library(h2o)
y<-c("Outcome")
features=names(train.complete)[!names(train.complete) %in% c("Outcome")]
h2o.shutdown(prompt=F)
#Adjust memory size based on your system.
h2o.init(nthreads = -1,max_mem_size = "5g")
train.hex<-as.h2o(train.complete)
test.hex<-as.h2o(test.complete[,features])
#Models
gbmF_model_1 = h2o.gbm( x=features,
y = y,
training_frame =train.hex,
seed=1234
)
h2o.performance(gbmF_model_1)
You've only trained a single GBM with the default parameters, so it doesn't look like you've put enough effort into tuning your model. I'd recommend a random grid search on GBM using the h2o.grid() function. Here is an H2O R code example you can follow.
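To illustrate what a random grid search does, here is a small Python sketch that samples distinct hyperparameter combinations from a discrete search space. It does not call H2O at all. The parameter names match h2o.gbm arguments, but the candidate values, the helper name sample_configs, and the number of samples are assumptions chosen for illustration.

```python
import random


def sample_configs(space, n, seed=1234):
    """Draw up to n distinct random combinations from a discrete search space."""
    rng = random.Random(seed)
    max_unique = 1
    for values in space.values():
        max_unique *= len(values)
    seen, configs = set(), []
    while len(configs) < min(n, max_unique):
        cfg = tuple((name, rng.choice(values)) for name, values in space.items())
        if cfg not in seen:  # skip combinations that were already drawn
            seen.add(cfg)
            configs.append(dict(cfg))
    return configs


# Illustrative values only; tune the ranges to the problem at hand.
space = {
    "ntrees": [50, 100, 200],
    "max_depth": [3, 5, 7, 9],
    "learn_rate": [0.01, 0.05, 0.1],
    "sample_rate": [0.6, 0.8, 1.0],
}
candidates = sample_configs(space, n=10)
```

Each sampled dictionary would then be used to train one model, and the models would be compared on a validation frame rather than on the training data.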

IF statement with current time Function in Xpath

I have a text box in InfoPath that displays the current time using this function:
substring-after(now(), "T")
I also have a Drop-Down list called "Location" that has the following values:
1.Boston
2.India
3.London
I want to modify this function so that the text box always displays the current time in Boston, even when the user is in India or London.
I believe this needs an If statement that follows these conditions:
If Location (drop-down list) = "London", then use substring-after(now(), "T") minus 5 hours.
If Location (drop-down list) = "India", then use substring-after(now(), "T") minus 11 hours.
If Location (drop-down list) = "Boston", then use substring-after(now(), "T") unchanged.
I'm relatively new to XPath and would appreciate some assistance.
The easiest way is to subtract the hours before ripping the time off the end with substring.
substring-after(addSeconds(now(), -3600 * 5), "T")
WARNING: Remember the current time (now()) is calculated from the LOCAL machine. So if a person in London opens the form they will already be at London time and then your code will subtract 5 hours on top of that - making it incorrect.
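The same idea, shifting the timestamp first and only then cutting off the time portion, can be sketched in Python. This is not InfoPath or XPath code. The offsets are the ones stated in the question, the input is assumed to be the user's local clock time, and the function name boston_time is hypothetical.

```python
from datetime import datetime, timedelta

# Hours to subtract per location, as given in the question.
HOURS_AHEAD_OF_BOSTON = {"Boston": 0, "London": 5, "India": 11}


def boston_time(local_now, location):
    """Shift the full timestamp first, then keep only the part after 'T'."""
    shifted = local_now - timedelta(hours=HOURS_AHEAD_OF_BOSTON[location])
    return shifted.isoformat(timespec="seconds").split("T", 1)[1]


noon = datetime(2024, 1, 1, 12, 0, 0)
london_result = boston_time(noon, "London")
```

Subtracting from the full timestamp rather than from the time string keeps the result correct when the shift crosses midnight.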

calculate standard deviation of daily data within a year

In MATLAB, I have a vector of 20 years of daily data (X) and a vector of the corresponding dates (DATES). To find the mean value of the daily data per year, I use the following script:
A = fints(DATES,X); %convert to financial time series
B = toannual(A,'CalcMethod', 'SimpAvg'); %calculate average value per year
C = fts2mat(B); %Convert fts object to vector
C is a vector of 20 values showing the average value of the daily data for each of the 20 years. So far, so good. Now I am trying to do the same thing, but instead of annual mean values I need annual standard deviations, and the toannual function does not seem to offer such an option.
Any ideas on how to do this?
Thank you in advance.
I'm assuming that X is the financial information and that it is evenly distributed across the years. You'll have to modify this if that isn't the case. Just to clarify, by evenly distributed I mean that if there are 20 years and X has 200 values, each year has 10 values.
You should be able to do something like this:
num_years = length(C);
span_size = length(X)/num_years;
for n = 0:num_years-1
std_dev(n+1,1) = std(X(1+(n*span_size):(n+1)*span_size));
end
The idea is that you simply pass the data for the given year (the day-to-day values) into MATLAB's standard deviation function. That will return the standard deviation for that year. std_dev should be a column vector that corresponds 1:1 with your C vector of yearly averages.
If the years do not all hold the same number of values, you can group the data by calendar year instead:
dv = datevec(DATES); % split each date into [year month day hour minute second]
yrs = dv(:,1); % calendar year of every observation
unique_years = unique(yrs); % this should return a vector of 20 elements since you have 20 years
std_dev = zeros(size(unique_years)); % preallocate the standard deviation vector
for n = 1:length(unique_years)
std_dev(n) = std(X(yrs == unique_years(n)));
end
This assumes that DATES is in a form that datevec accepts. If you have the dates in numeric form (serial date numbers) I know this will work; I'm just concerned about the dates being in string form.
In the event they are in string form, you can look at using regexp to parse the information, replace matching dates with a numeric identifier, and use the above code. Or you can take the basic theory behind this and adapt it to what works best for you!
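For comparison, the group-by-year approach can be sketched in Python with only the standard library. This is an illustration of the technique rather than a MATLAB replacement; the function name annual_std and the tiny sample data are made up. statistics.stdev uses the n-1 normalization, which matches the default of MATLAB's std.

```python
from collections import defaultdict
from datetime import date
from statistics import stdev


def annual_std(dates, values):
    """Group values by calendar year and return the sample std of each year."""
    by_year = defaultdict(list)
    for d, v in zip(dates, values):
        by_year[d.year].append(v)
    # stdev needs at least two observations per year.
    return {year: stdev(vals) for year, vals in sorted(by_year.items())}


# Made-up sample: three observations in 2000, two in 2001.
dates = [date(2000, 1, 1), date(2000, 6, 1), date(2000, 12, 31),
         date(2001, 1, 1), date(2001, 7, 1)]
values = [1.0, 2.0, 3.0, 2.0, 4.0]
yearly = annual_std(dates, values)
```

Because the grouping key is the year itself, the years may contain different numbers of observations, for example in leap years.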
