how to define for in rweka- InfoGainAttributeEval

how to define for in rweka- InfoGainAttributeEval - algorithm

is there anybody can tell me how to define the formula in the rweka?
A<- InfoGainAttributeEval(formula ~ . , data = TrainDataLSVT,na.action=NULL )
there are 310 features in the TrainDataLSVT.

Since you do not provide your data, I will illustrate with the built-in iris data (see ?iris). For this data, the goal is to predict the Species as a function of the other variables. You can express that as a formula for InfoGainAttributeEval like this:
InfoGainAttributeEval(Species ~ ., data=iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width
0.6982615 0.3855963 1.4180030 1.3784027
The values returned are the scores for each variable. The key part is the formula Species ~ . You should read this as "Species as a function of all other variables". Details on how to write a formula are available on the help page ?formula.

Related

CountIFS with filter

Syntax:
CountIfS(Range1, condition1, Range2, Condition2,.... So on)
Can we use FILTER function to retrieve a value.
I am trying below function
=COUNTIFS(A2:A610, "Yes", $B$2:$B$610, FILTER(Sheet2!14:14, Sheet2!2:2=G1))
The output is not correct answer, neither the its throwing any error.
I could save the output of filter(Sheet2!14:14,Sheet2!2:2=G1) in different cell and refer that cell in 2nd condition. But for that I need make plethora cells as I need to use this countifs function in every column.
PS : filter(Sheet2!14:14,Sheet2!2:2=G1) returns the correct value.

that's not how it works. the output of your FILTER formula needs to be exactly 609 cells to match the range of your COUNTIFS formula. only then your formula will work.
in your case try:
=COUNTA(IFNA(FILTER(A2:A; A2:A="yes"; REGEXMATCH(B2:B;
TEXTJOIN("|"; 1; FILTER(Sheet2!14:14; Sheet2!2:2=G1))))))

using ols from statsmodels.formula.api with only a constant term?

I'd like to show students what happens when only a constant is used in a regression model. I specified one model as price ~ age for an OLS model of the price of used cars as a function of age plus a constant. Now I'd like to drop the age variable and just have the constant. How do I do this?

The formula fitting in statsmodels uses Patsy, which tries to mimic R-style model specifications.
Since you didn't specify a data source, I've taken a dataset from the
statsmodels OLS guide to provide a worked example - can wealth explain lottery spending:
import statsmodels.api as sm
import statsmodels.formula.api as smf
# load example and trim to a few features
df = sm.datasets.get_rdataset("Guerry", "HistData").data
df = df[['Lottery', 'Literacy', 'Wealth', 'Region']].dropna()
# fit with y=mx + c
model1 = smf.ols(formula='Lottery ~ Wealth', data=df).fit()
print(model1.summary())
# fit with y=c (only an intercept)
model2 = smf.ols(formula='Lottery ~ 1', data=df).fit()
print(model2.summary())
For your question, a model with only the intercept is nothing more than the mean, but presumably you are interested in techniques for comparing different models, so let's do a quick comparison to see whether the simpler model gives a better fit - one option is the f-test:
f_val, p_val, _ = model1.compare_f_test(model2)
print(f_val, p_val, p_val<0.01)
The p value is below 1% significance level, so we interpret that the more complex model is "more correct" in this case.
For completeness, to specify a model without an intercept (useful e.g. if we already mean-centered the data), we can exclude with -1 in the formula:
# y = mx
model3 = smf.ols(formula='Lottery ~ Wealth -1', data=df).fit()
print(model3.summary())
f_val, p_val, _ = model1.compare_f_test(model3)
print(f_val, p_val, p_val<0.01)
Again, p_val is below 1% significance level, so including intercept and slope improves model fit. (No multi-test correction here, but p values are <<1%)

SPSS Count function advice

Im having some issues with some SPSS code. Im new to SPSS and still trying to figure out the syntax. I'm trying to get my code to count the sum of two dice equal to 7. I cant get the count function to work the way I want it. Below is my code. Any tips would be greatly appreciated.
INPUT PROGRAM.
LOOP #I=1 TO 100000.
COMPUTE case = 1.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.
COMPUTE Dice_1 = TRUNC (RV.UNIFORM(1,7)).
COMPUTE Dice_2 = TRUNC (RV.UNIFORM(1,7)).
COMPUTE total = Dice_1+Dice_2.
COMPUTE Number_Sum7= Dice_1+Dice_2 = 7.
COUNT Num= case TO Number_Sum7(1).
SAVE outfile = 'my file path'.

count function counts across a list of variables, in each line separately.
What you seem to be looking for is to count over rows.
You can start with:
frequencies total. /* see counts of all possible totals.
means Number_Sum7/cells=sum. /* count only the cases where total=7.
These will give you the answers in the output window.
If you want the answers in data for further analysis, look up the aggregate function.
For example, the following will give you the same results but in a new datasets:
DATASET NAME ORIG.
DATASET DECLARE freqs.
AGGREGATE /OUTFILE='freqs' /BREAK=total /Mycount=N.
DATASET ACTIVATE ORIG.
DATASET DECLARE only7.
AGGREGATE /OUTFILE='only7' /BREAK= /only7=sum(Number_Sum7).
Or, instead, you can add the results to your present data:
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=total /TotalCount=N.
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK= /total7=sum(Number_Sum7).

Random number in Lua script Load Impact

I'm trying to create a random number generator in Lua. I found out that I can just use math.random(1,100) to randomize a number between 1 and 100 and that should be sufficient.
But I don't really understand how to use the randomize number as variables in the script.
Tried this but of course it didn't work.
$randomCorr = math.random(1,100);
http.request_batch({
{"POST", "https://store.thestore.com/priceAndOrder/selectProduct", headers={["Content-Type"]="application/json;charset=UTF-8"}, data="{\"ChoosenPhoneModelId\":4,\"PricePlanId\":\"phone\",\"CorrelationId\":\"$randomCorr\",\"DeliveryTime\":\"1 vecka\",\"$$hashKey\":\"006\"},\"ChoosenAmortization\":{\"AmortizationLength\":0,\"ChoosenDataPackage\":{\"Description\":\"6 GB\",\"PricePerMountInKr\":245,\"DataAmountInGb\":6,\"$$hashKey\":\"00W\"},\"ChoosenPriceplan\":{\"IsPostpaid\":true,\"Title\":\"Fastpris\",\"Description\":\"Fasta kostnader till fast pris\",\"MonthlyAmount\":0,\"AvailiableDataPackages\":null,\"SubscriptionBinding\":0,\"$$hashKey\":\"00K\"}}", auto_decompress=true},
{"GET", "https://store.thestore.com/api/checkout/getproduct?correlationId=$randomCorr", auto_decompress=true},
})

In Lua, you can not start a variable name with $. This is where your main issue is at. Once the $ is removed from your code, we can easily see how to refer to variables in Lua.
randomCorr = math.random(100)
print("The random number:", randomCorr)
randomCorr = math.random(100)
print("New Random Number:", randomCorr)
Also, concatenation does not work the way you are implying it into your Http array. You have to concatenate the value in using .. in Lua
Take a look at the following example:
ran = math.random(100)
data = "{\""..ran.."\"}"
print(data)
--{"14"}
The same logic can be implied into your code:
data="{\"ChoosenPhoneModelId\":4,\"PricePlanId\":\"phone\",\"CorrelationId\":\""..randomCorr.."\",\"DeliveryTime\":\"1 vecka\",\"$$hashKey\":\"006\"},\"ChoosenAmortization\":{\"AmortizationLength\":0,\"ChoosenDataPackage\":{\"Description\":\"6 GB\",\"PricePerMountInKr\":245,\"DataAmountInGb\":6,\"$$hashKey\":\"00W\"},\"ChoosenPriceplan\":{\"IsPostpaid\":true,\"Title\":\"Fastpris\",\"Description\":\"Fasta kostnader till fast pris\",\"MonthlyAmount\":0,\"AvailiableDataPackages\":null,\"SubscriptionBinding\":0,\"$$hashKey\":\"00K\"}}"
Or you can format the value in using one of the methods provided by the string library
Take a look at the following example:
ran = math.random(100)
data = "{%q}"
print(string.format(data,ran))
--{"59"}
The %q specifier will take whatever you put as input, and safely surround it with quotations
The same logic can be applied to your Http Data.

Here is a corrected version of the code snippet:
local randomCorr = math.random(1,100)
http.request_batch({
{"POST", "https://store.thestore.com/priceAndOrder/selectProduct", headers={["Content-Type"]="application/json;charset=UTF-8"}, data="{\"ChoosenPhoneModelId\":4,\"PricePlanId\":\"phone\",\"CorrelationId\":\"" .. randomCorr .. "\",\"DeliveryTime\":\"1 vecka\",\"$$hashKey\":\"006\"},\"ChoosenAmortization\":{\"AmortizationLength\":0,\"ChoosenDataPackage\":{\"Description\":\"6 GB\",\"PricePerMountInKr\":245,\"DataAmountInGb\":6,\"$$hashKey\":\"00W\"},\"ChoosenPriceplan\":{\"IsPostpaid\":true,\"Title\":\"Fastpris\",\"Description\":\"Fasta kostnader till fast pris\",\"MonthlyAmount\":0,\"AvailiableDataPackages\":null,\"SubscriptionBinding\":0,\"$$hashKey\":\"00K\"}}", auto_decompress=true},
{"GET", "https://store.thestore.com/api/checkout/getproduct?correlationId=" .. randomCorr, auto_decompress=true},
})
There is something called $$hashKey also, in the quoted string. Not sure if that is supposed to be referencing a variable or not. If it is, it also needs to be concatenated into the resulting string, using the .. operator (just like with the randomCorr variable).

How to give equations in Apache pig

I am trying to get a value from this equation
--counted gives the total row count in a file
samplecount = counted*(10/100);
How to sample data according to this
--Load data
examples = LOAD '/home/sreeveni/myfiles/PE/USCensus1990New.csv' ;
--Group data
groupedByUser = group examples all;
--count no of lines in the file
counted = FOREACH groupedByUser generate COUNT(examples) ;
--sampling
sampled = SAMPLE examples counted*(10/100);
store sampled into '/home/sreeveni/myfiles/OUT/samplesout';
Showing error in above line
Invalid scalar projection: counted : A column needs to be projected
from a relation for it to be used as a scalar
Please advice.
Am I doing anything wrong.

i guess sample works with a number between [0,1]. In your case, its exceeding the required value. If you want just 10% of the data, pass 0.1 directly and to get that in a code, find this percentage in a FOREACH statement only.

If you are trying to generate a sample of "examples" with 10% of the total number of rows, all you have to do is:
SAMPLE examples 0.1;
Read the documentation for SAMPLE command here.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio

how to define for in rweka- InfoGainAttributeEval - algorithm

is there anybody can tell me how to define the formula in the rweka? A<- InfoGainAttributeEval(formula ~ . , data = TrainDataLSVT,na.action=NULL ) there are 310 features in the TrainDataLSVT.

Related

CountIFS with filter

using ols from statsmodels.formula.api with only a constant term?

SPSS Count function advice

Random number in Lua script Load Impact

How to give equations in Apache pig

Categories

Resources