WinMLRunner: Extract ordered features from ONNX model - windows-machine-learning

Using WinMLRunner, I pass in an ONNX model (I inspected it with Netron, I think) and an image. The program runs in cmd and prints the following:
Outputting top 5 values
Feature Name: pool5/7x7_s1
index: 661, value: 7.60391
index: 581, value: 7.41787
...
If the model's features can be exported to a list, the index can be used to see what WinML is ranking.
Without extracting the features, WinMLRunner works as a nice RNG utility ;)

The full tensor output can be written by WinMLRunner to a CSV file like this:
.\WinMLRunner.exe -model <path to model> -input <input path> -cpu -savetensordata first -periterationpath <desired directory path to output csv file>
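If you also have the labels file that matches the model (for a GoogLeNet-style ImageNet classifier that would be the 1000-entry class list), mapping the printed indices to names takes only a few lines. A minimal Python sketch, assuming a plain-text labels file with one class name per line in tensor order (labels.txt is a hypothetical name):

# Map WinMLRunner's "top 5" indices to human-readable class labels.
# Assumes labels.txt holds one class name per line, ordered to match
# the model's output tensor.
with open("labels.txt") as f:
    labels = [line.strip() for line in f]

# Index/value pairs as printed by WinMLRunner above
top5 = [(661, 7.60391), (581, 7.41787)]
for index, value in top5:
    print(f"{index}: {labels[index]} (value {value})")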
Please let me know if you have any further questions.

Related

JMeter: user variable to report

I have the following Thread Group structure in JMeter:
I have a variable LB_LEVEL and 100 threads. Its value may be different for every user (e.g. from 1lv to 23lv).
I'm trying to find a way to build a visual report from this variable; simply put, I need to show how users were distributed across the different levels.
It could be CSV or something similar. The ideal table of my dreams looks like this:
Googling hasn't brought up anything useful, so I need a small idea or a kick in the right direction.
Thank you!
Add the following lines to the user.properties file:
sample_variables=LB_LEVEL
jmeter.reportgenerator.graph.custom_testGraph.classname=org.apache.jmeter.report.processor.graph.impl.CustomGraphConsumer
jmeter.reportgenerator.graph.custom_testGraph.title=LB LEVEL
jmeter.reportgenerator.graph.custom_testGraph.property.set_Y_Axis=LB LEVEL
jmeter.reportgenerator.graph.custom_testGraph.property.set_X_Axis=Over Time
jmeter.reportgenerator.graph.custom_testGraph.property.set_granularity=60000
jmeter.reportgenerator.graph.custom_testGraph.property.set_Sample_Variable_Name=LB_LEVEL
jmeter.reportgenerator.graph.custom_testGraph.property.set_Content_Message=LB_LEVEL:
sample_variables is a special property which saves custom variable(s) into the .jtl results file
Restart JMeter to pick the properties up
Run your JMeter test in command-line non-GUI mode and generate the HTML Reporting Dashboard:
jmeter -n -t <test JMX file> -l <test log file> -e -o <Path to output folder>
Open the <Path to output folder>/index.html file with your favourite browser - you will see the plotted LB_LEVEL values along with the other tables and charts.
If for some reason this doesn't fit your needs, you can consider using the Flexible File Writer to store the metrics of your choice in a file; in your case they would be grpThreads and variable#0.
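If you only need the final "users per level" table rather than a chart, you can also post-process the results file directly. A minimal Python sketch, assuming a CSV-format .jtl to which sample_variables has added an LB_LEVEL column (the file name results.jtl is hypothetical):

import csv
from collections import Counter

# Tally how many samples carry each LB_LEVEL value.
levels = Counter()
with open("results.jtl", newline="") as jtl:
    for row in csv.DictReader(jtl):
        levels[row["LB_LEVEL"]] += 1

for level, count in sorted(levels.items()):
    print(f"{level}\t{count}")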

How to use im2rec in MXnet to create my own dataset

On Windows 10, I followed the step-by-step MXNet tutorial for using im2rec.py to create a dataset. I created an image list file like this:
integer_image_index \t label_index \t path_to_image
Next, I renamed the .txt file to .lst.
Finally, I executed the command:
python im2rec.py --exts '.jpg' --train-ratio 0.41 --test-ratio 0.49 --recursive=True --pack-label=True D:\CUB_200_2011\data\image_label.lst D:\CUB_200_2011\CUB_200_2011\image
The output shows "read none error", but the files created by the command (the .lst and .rec files) are 0 KB, i.e. empty. I don't know why.
Please tell me what mistakes I made.
im2rec.py will print
read none error:(filename)
for any file that it can't load for whatever reason. Maybe some of the files you list aren't there or are empty? Or maybe the base path you've specified is wrong -- I notice you have the folder name CUB_200_2011 twice.
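One quick way to test the missing/empty-file theory is to check every entry in the .lst against the image root before running im2rec.py. A minimal Python sketch, using the paths from the command above and the tab-separated index/label/path format shown earlier:

import os

lst_path = r"D:\CUB_200_2011\data\image_label.lst"
image_root = r"D:\CUB_200_2011\CUB_200_2011\image"

with open(lst_path) as lst:
    for line in lst:
        fields = line.rstrip("\n").split("\t")
        if not fields or not fields[-1]:
            continue  # skip blank lines
        # the image path is the last field of each record
        img = os.path.join(image_root, fields[-1])
        if not os.path.isfile(img):
            print("missing:", img)
        elif os.path.getsize(img) == 0:
            print("empty:", img)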

Running Word Count or Pig Script on a Directory to produce result in separate files

I am new to Hadoop/Pig.
I have a directory which has several files. Now I need to run a word count on those. I can use the Hadoop wordcount sample and run it on the directory to get the output, but the output will be in a single file. What should I do if I want the output for each file to be in a separate file?
I can use Pig too, and give the directory as input to Pig. But how can I read the file names inside the directory and then pass them to LOAD?
What I meant is:
Say I have a directory Test which has 5 files test1, test2, test3, test4, test5. Now I want the word count of each file separately in a separate file. I know I can provide individual names and do it, but that would take a lot of time.
Is it possible that I can read filenames from the directory and provide them as input to LOAD of pig?
If you're using Pig version 0.10.0 or later, you can take advantage of a combination of source tagging and MultiStorage to keep track of the files.
For example, if you had an input directory pigin with files and content as the following:
pigin
|-test1 => "hello"
|-test2 => "world"
|-test3 => "Apache"
|-test4 => "Hadoop"
|-test5 => "Pig"
The following script will read each file and write its contents to a different directory.
%declare inputPath 'pigin'
%declare outputPath 'pigout'
-- Register the piggybank jar that provides MultiStorage (path is a placeholder)
register <path to piggybank.jar>;
-- Define MultiStorage to write output to different directories based on the
-- first element in the tuple
define MultiStorage org.apache.pig.piggybank.storage.MultiStorage('$outputPath','0');
-- Load the input files, prepending each tuple with the file name
A = load '$inputPath' using PigStorage(',', '-tagsource');
-- Write output to different directories
store A into '$outputPath' using MultiStorage();
The above script will create an output directory tree that looks like the following:
pigout
|-test1
| `-test1-0 => "test1 hello"
|-test2
| `-test2-0 => "test2 world"
|-test3
| `-test3-0 => "test3 Apache"
|-test4
| `-test4-0 => "test4 Hadoop"
|-test5
| `-test5-0 => "test5 Pig"
The -0 at the end of the filenames corresponds to the reducer that produced the output. If you have more than one reducer, you may see more than one file per directory.
You could extend the PigStorage code to add the file name to the tuple; see the Pig code samples and look for the question "Q: I load data from a directory which contains different file. How do I find out where the data comes from?". For the output, you could similarly extend PigStorage to write into different output files.
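For a small directory you can also sanity-check the result you are after (one word count per input file, each written to its own output file) without Hadoop at all. A minimal local Python sketch reusing the pigin layout from the example above; the wordcounts output directory is a hypothetical name:

import os
from collections import Counter

indir, outdir = "pigin", "wordcounts"
os.makedirs(outdir, exist_ok=True)

for name in sorted(os.listdir(indir)):
    # count words in this one file
    counts = Counter()
    with open(os.path.join(indir, name)) as f:
        for line in f:
            counts.update(line.split())
    # write one output file per input file, e.g. wordcounts/test1.wc
    with open(os.path.join(outdir, name + ".wc"), "w") as out:
        for word, n in counts.most_common():
            out.write(f"{word}\t{n}\n")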

How to successfully run kmeans clustering using Mahout (esp. get human-readable output)

I tried to follow many online tutorials to run the kmeans example that comes with Mahout, but I have not yet succeeded in getting meaningful output. The main problem I am facing is the conversion from text file to sequence file and back.
When I followed the steps of the Mahout Wiki for "Clustering of synthetic control data"
(https://cwiki.apache.org/MAHOUT/clustering-of-synthetic-control-data.html) I could run the clustering process (using $MAHOUT_HOME/bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job) and it produced some readable console output. But I would like to get output files from the clustering process, since the output is large.
The output files generated by Mahout clustering are all sequence files, and I can't convert them to readable files.
When I tried to do "clusterdump" ($MAHOUT_HOME/bin/mahout clusterdump --seqFileDir output/clusters-10...) I got errors.
First it complains that the "seqFileDir" option is unexpected, and I guess either there is no "seqFileDir" option for clusterdump or I am missing something.
Trying to use Mahout the way "Mahout in Action" does seems tricky. I am not sure which classes ("import ??") are required to compile that code.
Can you please suggest the steps to successfully RUN kmeans on Mahout? Especially, how do I get readable output from the sequence files?
Regarding the 2nd question: you can obtain the source code for the book from the repository. The code in the master branch is for Mahout 0.5, while the code in the mahout-0.6 and mahout-0.7 branches is for the corresponding Mahout versions.
The source code is also posted on the book's site, so you can download it there (but that version is only for Mahout 0.5).
P.S. If you're reading the book right now, I recommend using Mahout 0.5 or 0.6, as all the code was checked against version 0.5, while for other versions it will differ - this is especially true for the clustering code in Mahout 0.7.
As for seqFileDir in clusterdump, you need to use --input not --seqFileDir.
I'm using Mahout 0.7. The call to clusterdump that I use to (for example) get a simple dump is:
mahout clusterdump --input output/clusters-9-final --pointsDir output/clusteredPoints --output <absolute path of dir where you want to output>/clusteranalyze.txt
Be sure that the path to the directory output/clusters-9-final above is correct for your system. Depending on the clustering algorithm, this directory may be different. Look in the output directory and make sure you use the directory with the word "final" in it.
To dump the data as CSV or GRAPH_ML, you would add the -of CSV argument to the above call. For example:
mahout clusterdump --input output/clusters-9-final -of CSV --pointsDir output/clusteredPoints --output <absolute path of dir where you want to output>/clusteranalyze.txt
Hope that helps.

Mahout - Naive Bayes

I tried deploying the 20-newsgroups example with Mahout, and it seems to work fine. Out of curiosity I would like to dig deeper into the model statistics.
For example, the bayes-model directory contains the following subdirectories:
trainer-tfIdf trainer-thetaNormalizer trainer-weights
Each of these contains part-0000 files. I would like to read the contents of the files for a better understanding, but the cat command doesn't seem to work; it prints some garbage.
Any help is appreciated.
Thanks
The 'part-00000' files are created by Hadoop, and are in Hadoop's SequenceFile format, containing values specific to Mahout. You can't open them as text files, no. You can find the utility class SequenceFileDumper in Mahout that will try to output the content as text to stdout.
As to what those values are to begin with, they're intermediate results of the multi-stage Hadoop-based computation performed by Mahout. You can read the code to get a better sense of what these are. The "tfidf" directory for example contains intermediate calculations related to term frequency.
You can read part-0000 files using Hadoop's filesystem -text option. Just go into the Hadoop directory and type the following:
`bin/hadoop dfs -text /Path-to-part-file/part-m-00000`
part-m-00000 will be printed to STDOUT.
If it gives you an error, you might need to set the HADOOP_CLASSPATH environment variable. For example, if running it gives you
text: java.io.IOException: WritableName can't load class: org.apache.mahout.math.VectorWritable
then add the jar containing the missing class to the HADOOP_CLASSPATH variable:
export HADOOP_CLASSPATH=/src/mahout/trunk/math/target/mahout-math-0.6-SNAPSHOT.jar
That worked for me ;)
In order to read part-00000 (sequence files) you need to use the "seqdumper" utility. Here's an example I used for my experiments:
MAHOUT_HOME$ bin/mahout seqdumper -s ~/clustering/experiments-v1/t14/tfidf-vectors/part-r-00000 -o ~/vectors-v2-1010
-s is the sequence file you want to convert to plain text
-o is the output file
