Using weka from a batch file - windows

I want to make predictions from a Weka saved model without opening Weka Explorer or Simple CLI interfaces. So I created a batch file:
REM ECHO ON
title Weka caller
set "root=C:\Program Files\Weka-3-8\"
cd /D "%root%"
java -classpath weka.jar weka.classifiers.functions.LinearRegression -T Z:\ARFF_FILES\TestSet_regression.arff -l Z:\WEKA_MODELS\Regression_model_03_05_2018.model -p 0
I get this error message:
C:\Program Files\Weka-3-8>java -classpath weka.jar weka.classifiers.functions.LinearRegression -T Z:\ARFF_FILES\TestSet_regression.arff -l Z:\WEKA_MODELS\Regression_model_03_05_2018.model -p 0
Error: A JNI error has occurred, please check your installation and try again
Exception in thread "main" java.lang.NoClassDefFoundError: no/uib/cipr/matrix/Matrix
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Unknown Source)
at java.lang.Class.privateGetMethodRecursive(Unknown Source)
at java.lang.Class.getMethod0(Unknown Source)
at java.lang.Class.getMethod(Unknown Source)
at sun.launcher.LauncherHelper.validateMainClass(Unknown Source)
at sun.launcher.LauncherHelper.checkAndLoadMain(Unknown Source)
Caused by: java.lang.ClassNotFoundException: no.uib.cipr.matrix.Matrix
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 7 more
Has anyone already called Weka from the Windows cmd shell?

I have not used Weka in a Windows shell, but in Linux you could do it as follows:
#!/bin/bash
export CLASSPATH=/home/stalai/Weka/weka-3-9-1/weka.jar:.
echo "$CLASSPATH"
# Code that loops through various classification settings and appends the results to a text file
# Default values
CV=103                                  # Cross-validation folds (-x): 103 = one per instance, i.e. leave-one-out; change to 10 for 10-fold
files=dataset.csv                       # Input dataset (CSV)
CorAttEvalResults=CorAttEvalResults.txt # Results file (example name)
for i in {100..10}; do                  # keep the top 100, 99, ..., 10 ranked features
  java weka.classifiers.meta.AttributeSelectedClassifier -t "$files" -x "$CV" \
    -E "weka.attributeSelection.CorrelationAttributeEval" \
    -S "weka.attributeSelection.Ranker -T -1.7976931348623157E308 -N $i" \
    -W weka.classifiers.lazy.IBk -- -K 1 -W 0 \
    -A "weka.core.neighboursearch.LinearNNSearch -A \"weka.core.EuclideanDistance -R first-last\"" \
    >> "$CorAttEvalResults"
done
In this example, a correlation-based feature ranker (CorrelationAttributeEval with Ranker) reduces the number of retained top-ranked features from 100 down to 10, and the results of each run are saved in CorAttEvalResults after leave-one-out cross-validation. CV=103 is in fact the total number of instances in dataset.csv; using one fold per instance is what makes the cross-validation leave-one-out.
Once you have figured out the desired model, change the corresponding flag values and reload the model. Let me know if you need more help!
I would also recommend using CSV instead of ARFF, as it is easier to handle across platforms if you want to extend your code later.
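For the original goal (predictions from a saved model without opening the Explorer), a minimal bash sketch could wrap the question's command in a function. The function name weka_predict and the JAVA override variable are hypothetical. It assumes WEKA_HOME is a Weka installation directory and uses Java's classpath wildcard ("dir/*", which adds every .jar in that directory to the classpath); whether this supplies the missing no.uib.cipr.matrix classes depends on which jars your installation ships, so check that first. The same wildcard form (-classpath "%root%*") should also work in the question's batch file.

```shell
#!/bin/bash
# Hypothetical helper: print predictions from a saved Weka model on a test set.
# Assumptions: WEKA_HOME (first argument) is the Weka install directory holding
# weka.jar and the jars it depends on; the model was saved from LinearRegression.
weka_predict() {
  local weka_home=$1 model=$2 test_set=$3
  local java_cmd=${JAVA:-java}     # override with JAVA=/path/to/java if needed

  [ -f "$model" ]    || { echo "model not found: $model" >&2; return 1; }
  [ -f "$test_set" ] || { echo "test set not found: $test_set" >&2; return 1; }

  # "dir/*" is quoted so the shell does not expand it; the JVM itself treats it
  # as a wildcard and adds every .jar file in that directory to the classpath.
  # -l loads the saved model, -T names the test set, -p 0 prints predictions.
  "$java_cmd" -cp "$weka_home/*" weka.classifiers.functions.LinearRegression \
    -l "$model" -T "$test_set" -p 0
}
```

Usage, with paths adapted to your machine: weka_predict /home/stalai/Weka/weka-3-9-1 Regression_model_03_05_2018.model TestSet_regression.arff > predictions.txt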

Related

Unable to create Dashboard report in JMeter (Mismatch between expected number of columns)

I have issues generating the dashboard report in JMeter from the command line:
1) Copied the reportgenerator properties to the user.properties file
2) Restarted JMeter to pick up the changes
3) Added the following to the user.properties file:
jmeter.save.saveservice.bytes=true
jmeter.save.saveservice.label=true
jmeter.save.saveservice.latency=true
jmeter.save.saveservice.response_code=true
jmeter.save.saveservice.response_message=true
jmeter.save.saveservice.successful=true
jmeter.save.saveservice.thread_counts=true
jmeter.save.saveservice.thread_name=true
jmeter.save.saveservice.time=true
jmeter.save.saveservice.timestamp_format=ms
jmeter.save.saveservice.timestamp_format=yyyy/MM/dd HH:mm:ss
I suspect the main problem is a mismatch between the CSV/JTL file I have and the report I am trying to create. Any suggestions?
ERROR | An error occurred:
org.apache.jmeter.report.dashboard.GenerationException: Error while processing samples:Mismatch between expected number of columns:16 and columns in CSV file:6, check your jmeter.save.saveservice.* configuration
at org.apache.jmeter.report.dashboard.ReportGenerator.generate(ReportGenerator.java:246)
at org.apache.jmeter.JMeter.start(JMeter.java:517)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.jmeter.NewDriver.main(NewDriver.java:248)
Caused by: org.apache.jmeter.report.core.SampleException: Mismatch between expected number of columns:16 and columns in CSV file:6, check your jmeter.save.saveservice.* configuration
at org.apache.jmeter.report.core.CsvSampleReader.nextSample(CsvSampleReader.java:183)
at org.apache.jmeter.report.core.CsvSampleReader.readSample(CsvSampleReader.java:201)
at org.apache.jmeter.report.processor.CsvFileSampleSource.produce(CsvFileSampleSource.java:180)
at org.apache.jmeter.report.processor.CsvFileSampleSource.run(CsvFileSampleSource.java:238)
at org.apache.jmeter.report.dashboard.ReportGenerator.generate(ReportGenerator.java:244)
... 6 more
An error occurred: Error while processing samples:Mismatch between expected number of columns:16 and columns in CSV file:6, check your jmeter.save.saveservice.* configuration
errorlevel=1
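To see the mismatch the error describes, one can compare the number of comma-separated fields in the results file's header with the number in its data rows. A rough bash/awk sketch: it assumes the default comma delimiter and that the first line is a header, and it splits naively, so quoted fields that themselves contain commas (for example some response messages) will be miscounted. The function name count_jtl_columns is hypothetical.

```shell
#!/bin/bash
# Hypothetical diagnostic: report the header's field count and how many data
# rows of a JMeter CSV/JTL results file have each field count.
# Naive split on ",": quoted fields containing commas are miscounted.
count_jtl_columns() {
  local file=$1
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  awk -F',' '
    NR == 1 { printf "header: %d fields\n", NF; next }   # assumed header line
    { count[NF]++ }                                      # tally data rows
    END { for (n in count) printf "rows with %d fields: %d\n", n, count[n] }
  ' "$file"
}
```

Running count_jtl_columns test.csv and seeing rows with a different field count than the header (or than the 16 columns the report generator expects) points at the jmeter.save.saveservice.* settings that were active when the file was written.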
I made the same mistake. Forget about those properties and copy only this into the user.properties file:
jmeter.reportgenerator.overall_granularity=60000
jmeter.reportgenerator.apdex_satisfied_threshold=1500
jmeter.reportgenerator.apdex_tolerated_threshold=3000
jmeter.reportgenerator.exporter.html.series_filter=((^s0)|(^s1))(-success|-failure)?
jmeter.reportgenerator.exporter.html.filters_only_sample_series=true
Then from the command line run this:
.\jmeter -n -t sample_jmeter_test.jmx -l test.csv -e -o tmp
Where:
.\jmeter - runs JMeter; run it from the \bin directory
sample_jmeter_test.jmx - the test plan to run, located in the \bin directory
test.csv - the results file, also in the \bin directory, into which all gathered statistics are written
tmp - the directory, created under \bin, where the dashboard files are saved
The CSV or JTL file may still be being written, so the report generator tries to read it while fields and rows are still being added. In fact, I resolved the error by manually running the report generation command on the same JTL file:
jmeter -g <file csv or jtl> -o <path report>
It may be possible to configure a delay between the load test and the report generation, but I don't know whether such an option exists.
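Instead of a delay, the two steps can simply be run one after the other: a non-GUI run (-n) returns only once the test has finished, so a separate -g step afterwards reads a complete results file. A minimal bash sketch under that assumption; the function name run_then_report and the JMETER override variable are hypothetical. It also refuses a non-empty output folder, since JMeter expects the -o folder to be missing or empty.

```shell
#!/bin/bash
# Hypothetical wrapper: run a JMeter test plan in non-GUI mode, then build the
# HTML dashboard from the finished results file in a separate step.
run_then_report() {
  local plan=$1 results=$2 outdir=$3
  local jmeter_cmd=${JMETER:-jmeter}   # e.g. JMETER=./jmeter when run from \bin

  # JMeter expects the -o folder to be missing or empty.
  if [ -d "$outdir" ] && [ -n "$(ls -A "$outdir")" ]; then
    echo "output folder not empty: $outdir" >&2
    return 1
  fi

  # Step 1: the load test; this call returns only after the test has ended.
  "$jmeter_cmd" -n -t "$plan" -l "$results" || return 1

  # Step 2: generate the dashboard from the completed results file.
  "$jmeter_cmd" -g "$results" -o "$outdir"
}
```

For the answer's example this would be run_then_report sample_jmeter_test.jmx test.csv tmp, executed from the \bin directory with JMETER=./jmeter.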

Using s3distcp with Amazon EMR to copy a single file

I want to copy just a single file to HDFS using s3distcp. I have tried using the srcPattern argument, but it didn't help, and it keeps throwing a java.lang.RuntimeException.
The regex I am using may be the culprit; please help.
My code is as follows:
elastic-mapreduce -j $jobflow --jar s3://us-east-1.elasticmapreduce/libs/s3distcp/1.latest/s3distcp.jar --args '--src,s3://<mybucket>/<path>' --args '--dest,hdfs:///output' --arg --srcPattern --arg '(filename)'
Exception thrown:
Exception in thread "main" java.lang.RuntimeException: Error running job
at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:586)
at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:216)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at com.amazon.external.elasticmapreduce.s3distcp.Main.main(Main.java:12)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs:/tmp/a088f00d-a67e-4239-bb0d-32b3a6ef0105/files
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:197)
at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:40)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:1036)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1028)
at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:172)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:944)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:897)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:871)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1308)
at com.amazon.external.elasticmapreduce.s3distcp.S3DistCp.run(S3DistCp.java:568)
... 9 more
DistCp is intended to copy many files using many machines. It is not the right tool if you only want to copy one file.
On the hadoop master node, you can copy a single file using
hadoop fs -cp s3://<mybucket>/<path> hdfs:///output
The regex I was using was indeed the culprit.
Say the file names contain dates, for example abcd-2013-06-12.gz. Then, to copy ONLY this file, the following EMR command should work:
elastic-mapreduce -j $jobflow --jar s3://us-east-1.elasticmapreduce/libs/s3distcp/1.latest/s3distcp.jar --args '--src,s3:///' --args '--dest,hdfs:///output' --arg --srcPattern --arg '.*2013-06-12.gz'
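If I remember correctly, my regex was initially *2013-06-12.gz rather than .*2013-06-12.gz. The dot at the beginning was needed: in a regular expression * only repeats the preceding element, so a leading * has nothing to repeat, while .* means "any sequence of characters".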
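The difference can be tried out locally. The bash sketch below uses grep -E as a stand-in for the Java regex engine that s3distcp presumably uses (the two agree on patterns this simple), and it assumes, for illustration, that --srcPattern must match the whole path, which grep's -x flag mimics. The object keys and the select_keys helper are made up. The pattern also escapes the dot in \.gz, which the answer's pattern leaves as "any character".

```shell
#!/bin/bash
# Illustration only: grep -E stands in for Java's regex engine, and -x makes
# the pattern match the whole key, assumed to mirror --srcPattern's behaviour.
select_keys() {
  local pattern=$1
  grep -Ex -- "$pattern"    # reads candidate keys, one per line, from stdin
}

# Made-up object keys for the demonstration.
keys='logs/abcd-2013-06-11.gz
logs/abcd-2013-06-12.gz
logs/abcd-2013-06-12.gz.bak'

# ".*" = any prefix, "\." = a literal dot; selects only logs/abcd-2013-06-12.gz.
printf '%s\n' "$keys" | select_keys '.*2013-06-12\.gz'
```

Dropping the leading .* (pattern 2013-06-12\.gz) selects nothing under whole-path matching, because the logs/abcd- prefix is no longer covered.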

debugging a mahout logistic regression

I am new to Mahout, and I am trying out the standard "donut" example described here:
http://imiloainf.wordpress.com/2011/11/02/mahout-logistic-regression/
This example works like a charm.
But when I try it on my own dataset (which is huge), it doesn't work.
The dataset is a single CSV file; everything is the same except that it has many more features (~100) and the file is 1 TB.
I am getting this error:
bin/mahout trainlogistic --input /path/mahout_input/complete/input.csv \
--output mahoutmodel --target default --categories 2 --predictors O1 E1 I1 \
--types numeric --features 30 --passes 100 --rate 50
Running on hadoop, using HADOOP_HOME=/opt/mapr/hadoop/hadoop-0.20.2
No HADOOP_CONF_DIR set, using /opt/mapr/hadoop/hadoop-0.20.2/conf
Exception in thread "main" java.lang.NullPointerException
at org.apache.mahout.classifier.sgd.CsvRecordFactory.firstLine(CsvRecordFactory.java:167)
at org.apache.mahout.classifier.sgd.TrainLogistic.main(TrainLogistic.java:75)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:187)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:186)
What am I doing wrong?
How do I debug this, and what does the error mean?
Thanks
My guess is that your input doesn't exist or is empty. I'd check that /path/mahout_input/complete/input.csv is really the file you mean.
Either check your input path, or make sure the first line of your input file contains only the column names in double quotes, like "x1","x2","x3","label", and so on.
This happened to me as well.
My mistake was passing an incorrect --target parameter, one that did not exist among the columns. Specifically, my header line was
myColumn1,myColumn2,myColumn3
and my command line was
mahout trainlogistic --input ./input.csv --output ./logistic_model \
--target myMisTypedColumn1 --predictors myColumn2 myColumn3 --types w w w --features 2 --passes 100 --rate 50 --categories 2
Another tip: don't use quotes (") or long column names, so you avoid the headache of wondering whether Mahout disliked your column names.
And as feedback to the Mahout developers: the error message is terrible. We should never see a NullPointerException in such a promising framework.
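Both failure modes mentioned in these answers (a missing or empty input file, and a --target or --predictors name that is not in the header) can be checked before launching trainlogistic. A minimal bash sketch, assuming a comma-separated header on the first line; it strips double quotes from the header so quoted and unquoted column names are treated alike. The function name check_mahout_columns is hypothetical and not part of Mahout.

```shell
#!/bin/bash
# Hypothetical pre-flight check for mahout trainlogistic input:
#   check_mahout_columns FILE COLUMN...
# Fails if FILE is missing/empty or any COLUMN is absent from its header line.
check_mahout_columns() {
  local file=$1; shift
  if [ ! -s "$file" ]; then
    echo "input missing or empty: $file" >&2
    return 1
  fi

  local header col missing=0
  # Read only the first line; drop double quotes and a trailing CR (Windows files).
  IFS= read -r header < "$file"
  header=${header//\"/}
  header=${header%$'\r'}

  for col in "$@"; do
    # Surround with commas so that "x1" does not match inside "x10".
    case ",$header," in
      *",$col,"*) ;;
      *) echo "column not in header: $col" >&2; missing=1 ;;
    esac
  done
  return "$missing"
}
```

For example, check_mahout_columns ./input.csv myColumn1 myColumn2 myColumn3 && mahout trainlogistic ... would have reported myMisTypedColumn1 before training started. Because only the first line is read, the check stays cheap even on a 1 TB file.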

Why do we still get SWT Invalid thread access Exception on Mac with -XstartOnFirstThread VM option?

I'm working on a context simulator named Siafu.
As a first step, I am trying to build and run it on Mac OS X 10.6.8,
but it fails to run with this exception:
***WARNING: Display must be created on main thread due to Cocoa restrictions.
Exception in thread "GUI thread" org.eclipse.swt.SWTException: Invalid thread access
at org.eclipse.swt.SWT.error(Unknown Source)
at org.eclipse.swt.SWT.error(Unknown Source)
at org.eclipse.swt.SWT.error(Unknown Source)
at org.eclipse.swt.widgets.Display.error(Unknown Source)
at org.eclipse.swt.widgets.Display.createDisplay(Unknown Source)
at org.eclipse.swt.widgets.Display.create(Unknown Source)
at org.eclipse.swt.graphics.Device.<init>(Unknown Source)
at org.eclipse.swt.widgets.Display.<init>(Unknown Source)
at org.eclipse.swt.widgets.Display.<init>(Unknown Source)
at de.nec.nle.siafu.graphics.GUI.run(Unknown Source)
at java.lang.Thread.run(Thread.java:680)
The run script is below. Can anyone point out why we get this exception even though we specify -XstartOnFirstThread?
#!/bin/sh
if [ "$1" = "-h" ]; then echo "Syntax: $0 [SimulationJarFile]"; exit 1; fi
if [ -n "$1" ]; then SIMULATION="--simulation=$1"; fi
java -Xmx512m -XstartOnFirstThread -classpath \
lib/org.eclipse.swt.osx64.jar:\
Siafu.jar:\
lib/org.apache.commons.collections-3.2.1.jar:\
lib/org.apache.commons.configuration-1.6.0.jar:\
lib/org.apache.commons.lang-2.4.0.jar:\
lib/org.apache.commons.logging-1.1.1.jar \
de.nec.nle.siafu.control.Siafu $SIMULATION

bash - how to filter java exception info

We've got a multi-agent Java environment where the different agents may produce all sorts of exceptions, which are printed to stderr.
Here is a sample taken from the huge exception log:
**java.security.AccessControlException: access denied (java.io.FilePermission ..\tournament\Driver\HotelRoomAnalyser.class read)**
at java.security.AccessControlContext.checkPermission(Unknown Source)
at java.security.AccessController.checkPermission(Unknown Source)
at java.lang.SecurityManager.checkPermission(Unknown Source)
at java.lang.SecurityManager.checkRead(Unknown Source)
at java.io.File.length(Unknown Source)
at emarket.client.EmarketSandbox$SandboxFileLoader.loadClassData(EmarketSandbox.java:218)
at emarket.client.EmarketSandbox$SandboxFileLoader.loadClass(EmarketSandbox.java:199)
at java.lang.ClassLoader.loadClass(Unknown Source)
**java.security.AccessControlException: access denied (java.io.FilePermission ..\tournament\Driver\HotelRoomAnalyser.class read)**
at java.security.AccessControlContext.checkPermission(Unknown Source)
at java.security.AccessController.checkPermission(Unknown Source)
at java.lang.SecurityManager.checkPermission(Unknown Source)
at java.lang.SecurityManager.checkRead(Unknown Source)
at java.io.File.length(Unknown Source)
at emarket.client.EmarketSandbox$SandboxFileLoader.loadClassData(EmarketSandbox.java:218)
at emarket.client.EmarketSandbox$SandboxFileLoader.loadClass(EmarketSandbox.java:199)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClassInternal(Unknown Source)
at MySmarterAgent.hotelRoomBookings(MySmarterAgent.java:108)
Fortunately, all top-level exceptions are lines with no leading whitespace (marked with ** above).
I want to extract every top-level exception name (the part before the colon), together with the first line below it that looks like
at emarket.client.EmarketSandbox$SandboxFileLoader.loadClassData(EmarketSandbox.java:218)
Basically, an indented line that starts with "at" and ends with something like ".java:108)".
This information can then be forwarded to the owner of the error-prone agent so that they can fix it.
My code in ~/.bashrc is still incomplete:
alias startmatch='java -jar "emarket.jar" ../tournament 100';
function geterrors()
{
startmatch 2>"$1";
a=0;
while IFS= read -r line
do
if true;   # placeholder: the filtering condition is still missing
then a=$(($a+1));
echo $a;
fi;
done < "$1"
}
What it does now is redirect all stderr output to the text file named by the first argument, then parse that file line by line and, if certain conditions are true, echo only that line.
I am stuck on what to do inside the loop.
Any suggestion or hint is much appreciated.
You can use awk:
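One way to fill in the loop, as a sketch: remember the text of the last unindented line up to its first colon (the exception name) and print it together with the first indented "at ...java:NNN)" line that follows. It assumes, as stated above, that top-level exceptions are the only unindented lines; it accepts any leading whitespace because Java's printStackTrace indents stack frames with a tab, and it treats "Caused by: X: ..." lines as exception X. The function name filter_exceptions is hypothetical; it reads the log on stdin.

```shell
#!/bin/bash
# Hypothetical filter: for each top-level exception (an unindented line), print
# its name (text before the first colon) and the first indented frame that
# ends in "(SomeFile.java:NNN)". Reads the log on stdin.
filter_exceptions() {
  local line name="" pending=0
  local frame_re='^[[:space:]]+at .*\.java:[0-9]+\)$'
  while IFS= read -r line; do
    if [[ -n $line && $line != [[:space:]]* ]]; then
      name=${line#Caused by: }   # "Caused by: X: msg" is reported as X
      name=${name%%:*}           # e.g. java.security.AccessControlException
      pending=1
    elif (( pending )) && [[ $line =~ $frame_re ]]; then
      printf '%s\n%s\n' "$name" "$line"
      pending=0                  # only the first matching frame per exception
    fi
  done
}
```

With the question's alias this could be used as startmatch 2>errors.log; filter_exceptions < errors.log. Frames such as "(Unknown Source)" are skipped because they do not end in a .java line number.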
awk ' $1~/^\*\*/{except=$0}
/emarket\.client/{
print except
print
}' logfile
Output:
$ ./shell.sh
**java.security.AccessControlException: access denied (java.io.FilePermission ..\tournament\Driver\HotelRoomAnalyser.class read)**
at emarket.client.EmarketSandbox$SandboxFileLoader.loadClassData(EmarketSandbox.java:218)
**java.security.AccessControlException: access denied (java.io.FilePermission ..\tournament\Driver\HotelRoomAnalyser.class read)**
at emarket.client.EmarketSandbox$SandboxFileLoader.loadClass(EmarketSandbox.java:199)
**java.security.AccessControlException: access denied (java.io.FilePermission ..\tournament\Driver\HotelRoomAnalyser.class read)**
at emarket.client.EmarketSandbox$SandboxFileLoader.loadClassData(EmarketSandbox.java:218)
**java.security.AccessControlException: access denied (java.io.FilePermission ..\tournament\Driver\HotelRoomAnalyser.class read)**
at emarket.client.EmarketSandbox$SandboxFileLoader.loadClass(EmarketSandbox.java:199)
A more accurate version, which prints only the first "emarket.client" line after each exception:
awk 'f&&g{next}
$1~/^\*\*/{
except=$0
f=1
g=0
}
f&&/emarket\.client/{
print except
print
f=0;g=1
}' file
How about:
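java -jar "emarket.jar" ../tournament 100 2>&1 | grep '^\([^[:space:]]\|[[:space:]]\+at.*\.java:[0-9]\+)$\)' | grep -A 1 '^[^[:space:]]'
Not super efficient, since it reads things twice, but it's short. The 2>&1 sends the exceptions, which are written to stderr, into the pipe. The first grep keeps either an unindented line or an indented "at" line ending in a line number ([[:space:]] also matches the tab Java uses to indent stack frames; \| and \+ are GNU grep extensions). The second grep looks again for the unindented lines and keeps the line after each one (-A 1). It puts a '--' line between each pair of matches, which you can remove by tacking on | grep -v '^--$'.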
