The output of the Stanford NLP classifier

We are learning how to use the stanford-nlp classifier. As its wiki page says, it can be used to build a model for classifying numerical data such as the Iris data set:
http://www-nlp.stanford.edu/wiki/Software/Classifier#Iris_data_set
But we have difficulty interpreting parts of the output: there are 4 columns for input attributes (1-Value, 2-Value, 3-Value, 4-Value) and one column for the output label (Iris-setosa, Iris-versicolor, Iris-virginica). What is CLASS here? Is it the output column overall?
Built this classifier: Linear classifier with the following weights
            Iris-setosa   Iris-versicolor   Iris-virginica
3-Value     -2.27         0.03              2.26
CLASS       0.34          0.65              -1.01
4-Value     -1.07         -0.91             1.99
2-Value     1.60          -0.13             -1.43
1-Value     0.69          0.42              -1.23
Total:      -0.72         0.05              0.57
Prob:       0.15          0.32              0.54

CLASS is like the intercept term in a simple linear regression: it represents the relative frequency of the different classes and is a feature that is present in every instance.
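You can check this against the last two rows of the output: up to rounding, each class's Total is the sum of its column (the per-feature contributions plus the CLASS term), and the Prob row matches a softmax over those totals, e.g.

P(\text{Iris-virginica}) = \frac{e^{0.57}}{e^{-0.72} + e^{0.05} + e^{0.57}} \approx 0.54

and likewise approximately 0.15 for Iris-setosa and 0.32 for Iris-versicolor, which is the Prob row shown above.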

Related

How to format a number in PL/SQL?

I need to convert some numbers to chars according to the following logic:
Input => Expected Output | Current Output
0 => 0 | 0.00 << Wrong
.1111 => 0.11 | 0.11
.1 => 0.1 | 0.10 << Wrong
1.111 => 1.11 | 1.11
Basically my logic is to use the minimum number of characters: only the user-friendly characters that describe the number.
Here is my current function:
to_char(Value,'9999999999999990D99');
As you can see, for 0, for example, it returns 0.00.
Does anyone know how to solve this?
Thanks.
Looks like you want this one:
rtrim(to_char(Value,'fm99999999999990D99'),'.')
I.e., you need to add 'fm' to the format mask and then remove the trailing '.':
Example:
select
to_char(Value,'9999999999999990D99') xx
,to_char(Value,'fm9999999999999990D99') x_fm -- just FM
,rtrim(to_char(Value,'fm99999999999990D99'),'.') x_fm_trim -- FM + rtrim
from xmltable('0, 0.1111, 0.1, 1.111' columns value number path '.');
XX X_FM X_FM_TRIM
-------------------- -------------------- ------------------
0.00 0. 0
0.11 0.11 0.11
0.10 0.1 0.1
1.11 1.11 1.11

CPLEX prints a lot to the terminal although the corresponding parameters are set

I am using CPLEX in C++.
After googling, I found out which parameters need to be set to keep CPLEX from printing to the terminal, and I use them like this:
IloCplex cplex(model);
std::ofstream logfile("cplex.log");
cplex.setOut(logfile);
cplex.setWarning(logfile);
cplex.setError(logfile);
cplex.setParam(IloCplex::MIPInterval, 1000);//Controls the frequency of node logging when MIPDISPLAY is set higher than 1.
cplex.setParam(IloCplex::MIPDisplay, 0);//MIP node log display information-No display until optimal solution has been found
cplex.setParam(IloCplex::SimDisplay, 0);//No iteration messages until solution
cplex.setParam(IloCplex::BarDisplay, 0);//No progress information
cplex.setParam(IloCplex::NetDisplay, 0);//Network logging display indicator
if ( !cplex.solve() ) {
....
}
but CPLEX still prints things like this:
Warning: Bound infeasibility column 'x11'.
Presolve time = 0.00 sec. (0.00 ticks)
Root node processing (before b&c):
Real time = 0.00 sec. (0.01 ticks)
Parallel b&c, 4 threads:
Real time = 0.00 sec. (0.00 ticks)
Sync time (average) = 0.00 sec.
Wait time (average) = 0.00 sec.
------------
Total (root+branch&cut) = 0.00 sec. (0.01 ticks)
Is there any way to avoid printing them?
Use the setOut method from the IloAlgorithm class (IloCplex inherits from IloAlgorithm). You can pass a null output stream as the parameter to prevent the messages from being logged to the screen.
This is what works in C++ according to the CPLEX parameters documentation:
cplex.setOut(env.getNullStream());
cplex.setWarning(env.getNullStream());
cplex.setError(env.getNullStream());

Trying to find a better way to process many large txt files across multiple directories using a Ruby script

I'm working on collecting test measurement data from products in a manufacturing environment.
The test measurement results of the units under test are generated by the test system. Each result is a 2 MB txt file and is kept in shared folders separated by product.
The folder structure looks like...
LOGS
|-Product1
| |-log_p1_1.txt
| |-log_p1_2.txt
| |..
|-Product2
| |-log_p2_1.txt
| |-log_p2_2.txt
| |..
|-...
My Ruby script can iterate through each Product directory under LOGS, read each log_px_n.txt file, parse the data I need from the file, and update it into the database.
The thing is that all log_px_n.txt files must be kept in their current directory, both old and new files, while I need to keep my database updated as soon as a new log_px_n.txt file is generated.
What I do today is iterate through each Product directory, read each individual .txt file, and then insert the file into the database if it is not already there.
My script looks like..
Dir['*'].each do |product|
  product_dir = File.join(BASE_DIR, product)
  Dir.chdir(product_dir)
  Dir['*.txt'].each do |log|
    if (Time.now - File.mtime(log) < SIX_HOURS_AGO) # take only files from the last six hours
      # Here we do..
      # - read each 2Mb .txt file
      # - extract information from the txt file
      # - update into the database
    end
  end
end
There are up to 30 different product directories and each product contains around 1000 .txt files (2 MB each), and they are growing!
I don't have an issue with the disk space to store the .txt files, but with the time it takes to complete this operation.
It takes >45 min to complete the task each time the above code block is run.
Is there any better way to deal with this situation?
Update:
I tried, as Iced suggested, to use the profiler, so I ran the code below and got the following result...
require 'profiler'

class MyCollector
  def initialize(dir, period, *filetypes)
    @dir = dir
    @filetypes = filetypes.join(',')
    @period = period
  end

  def collect
    Dir.chdir(@dir)
    Dir.glob('*').each do |product|
      products_dir = File.join(@dir, product)
      Dir.chdir(products_dir)
      puts "at product #{product}"
      Dir.glob("**/*.{#{@filetypes}}").each do |log|
        if Time.now - File.mtime(log) < @period
          puts Time.new
        end
      end
    end
  end
end

path = '//10.1.2.54/Shares/Talend/PRODFILES/LOGS'
SIX_HOURS_AGO = 21600

Profiler__::start_profile
collector = MyCollector.new(path, SIX_HOURS_AGO, "LOG")
collector.collect
Profiler__::stop_profile
Profiler__::print_profile(STDOUT)
The result shows...
at product ABU43E
..
..
..
at product AXF40J
at product ACZ16C
2014-04-21 17:32:07 +0700
at product ABZ14C
at product AXF90E
at product ABZ14B
at product ABK43E
at product ABK01A
2014-04-21 17:32:24 +0700
2014-04-21 17:32:24 +0700
at product ABU05G
at product ABZABF
2014-04-21 17:32:28 +0700
2014-04-21 17:32:28 +0700
2014-04-21 17:32:28 +0700
2014-04-21 17:32:28 +0700
2014-04-21 17:32:28 +0700
2014-04-21 17:32:28 +0700
% cumulative self self total
time seconds seconds calls ms/call ms/call name
32.54 1.99 1.99 43 46.40 265.60 Array#each
24.17 3.48 1.48 41075 0.04 0.04 File#mtime
13.72 4.32 0.84 43 19.AX 19.AX Dir#glob
9.13 4.88 0.AX 41075 0.01 0.03 Time#-
8.14 5.38 0.50 41075 0.01 0.01 Float#quo
6.65 5.79 0.41 41075 0.01 0.01 Time#now
2.06 5.91 0.13 41084 0.00 0.00 Time#initialize
1.79 6.02 0.11 41075 0.00 0.00 Float#<
1.79 6.13 0.11 41075 0.00 0.00 Float#/
0.00 6.13 0.00 1 0.00 0.00 Array#join
0.00 6.13 0.00 51 0.00 0.00 Kernel.puts
0.00 6.13 0.00 51 0.00 0.00 IO#puts
0.00 6.13 0.00 102 0.00 0.00 IO#write
0.00 6.13 0.00 42 0.00 0.00 File#join
0.00 6.13 0.00 43 0.00 0.00 Dir#chdir
0.00 6.13 0.00 10 0.00 0.00 Class#new
0.00 6.13 0.00 1 0.00 0.00 MyCollector#initialize
0.00 6.13 0.00 9 0.00 0.00 Integer#round
0.00 6.13 0.00 9 0.00 0.00 Time#to_s
0.00 6.13 0.00 1 0.00 6131.00 MyCollector#collect
0.00 6.13 0.00 1 0.00 6131.00 #toplevel
[Finished in 477.5s]
It turns out that it takes up to 7 minutes just to walk over the files in each directory and call mtime on them.
Although my .txt files are only 2 MB, it should not take that long, should it?
Any suggestions, please?
Relying on mtime is not robust. In fact, Rails switched from mtime to a content hash for naming the versions of asset files.
You should keep a list of file-hash pairs. That can be obtained like this:
require "digest"
file_hash_pair =
Dir.glob("LOGS/**/*")
.select{|f| File.file?(f)}
.map{|f| [f, Digest::SHA1.hexdigest(File.read(f))]}
and perhaps you can keep the content of this in a file as YAML. You can run the code above each time, and whenever file_hash_pair differs from the previous value, you can tell that there was a change. If file_hash_pair.transpose[0] changed, then the set of files changed. If for a particular [file, hash] pair the hash changed, then that particular file changed. A minimal sketch of this bookkeeping follows.
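For concreteness, here is a minimal sketch of that bookkeeping, under stated assumptions: the state file name state.yml and the "update the database" placeholder are illustrative choices, not part of the original answer. It persists the file-hash pairs as YAML and, on the next run, reports files that are new or whose content changed.

require "digest"
require "yaml"

STATE_FILE = "state.yml" # assumed name for the persisted snapshot

# Build the current file => hash map.
current = Dir.glob("LOGS/**/*")
             .select { |f| File.file?(f) }
             .map { |f| [f, Digest::SHA1.hexdigest(File.read(f))] }
             .to_h

# Load the snapshot from the previous run, if any.
previous = File.exist?(STATE_FILE) ? YAML.load_file(STATE_FILE) : {}

# Files that are new or whose content changed since the last run.
changed = current.select { |file, hash| previous[file] != hash }

changed.each_key do |file|
  # placeholder: parse the log file and update the database here
  puts "needs (re)import: #{file}"
end

# Persist the new snapshot for the next run.
File.write(STATE_FILE, current.to_yaml)

Note that hashing every 2 MB file still reads all of the data on every run, so this trades extra I/O for robustness compared with mtime.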

Importing CSV into Postgresql with duplicate values that are not duplicate rows

I am using Rails 4 with a PostgreSQL database and I have a question about loading a CSV dataset into the database.
Date Advertiser Name Impressions Clicks CPM CPA CPC CTR
10/21/13 Advertiser 1 77 0 4.05 0.00 0.00 0.00
10/21/13 Advertiser 2 10732 23 5.18 0.00 2.42 0.21
10/21/13 Advertiser 3 16941 14 4.64 11.23 5.62 0.08
10/22/13 Advertiser 1 59 0 3.67 0.00 0.00 0.00
10/22/13 Advertiser 2 10130 15 5.24 53.05 3.54 0.15
10/22/13 Advertiser 3 18400 22 4.59 10.55 3.84 0.12
10/23/13 Advertiser 1 77 0 4.06 0.00 0.00 0.00
10/23/13 Advertiser 2 9520 22 5.58 26.58 2.42 0.23
Using the data above I need to create a show page for each Advertiser.
Ultimately I need a list of Advertisers where I can click on any one of them, go to its show page, and display the information relevant to that advertiser (impressions, clicks, CPM, etc.).
Where I am confused is how to import the CSV data when there are rows with duplicate Advertisers, but the other columns contain relevant, non-duplicate information. How can I set up my database tables so that I will not have duplicate Advertisers and can still import and then display the correct information?
You will want to create two models: Advertiser and Site (or maybe Date).
Advertiser "has many" Sites, and each Site "belongs to" an Advertiser. This association will let you import your data correctly; a rough sketch follows the link below.
See: http://api.rubyonrails.org/classes/ActiveRecord/Associations/ClassMethods.html
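As an illustration only (the Report model, its column names, and the date format are assumptions based on the sample CSV above, not something given in the answer), an import along these lines would deduplicate advertisers while keeping one row of stats per advertiser per day:

require 'csv'

# Assumed models: Advertiser has_many :reports; Report belongs_to :advertiser.
CSV.foreach('db/MediaMathPerformanceReport.csv', headers: true) do |row|
  advertiser = Advertiser.find_or_create_by!(name: row['Advertiser Name'])
  advertiser.reports.create!(
    date:        Date.strptime(row['Date'], '%m/%d/%y'),
    impressions: row['Impressions'],
    clicks:      row['Clicks'],
    cpm:         row['CPM'],
    cpa:         row['CPA'],
    cpc:         row['CPC'],
    ctr:         row['CTR']
  )
end

Each advertiser then appears exactly once in the advertisers table, and its show page can load the associated daily rows through the association.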
Instead of creating two different models, I just created one Advertiser model and loaded the complete dataset into it.
require 'csv'

desc "Import advertisers from csv file"
task :import => [:environment] do
  CSV.foreach('db/MediaMathPerformanceReport.csv', :headers => true) do |row|
    Advertiser.create!(row.to_hash)
  end
end
After the data was imported by the above rake task, I simply set up the show action as follows:
def show
  # assumes the advertiser name is passed in via the request params
  @advertiser = Advertiser.where(advertiser_name: params[:advertiser_name])
end

Need help analysing the VarnishStat results

I am a newbie with Varnish. I have successfully installed it and it is now working, but I need some guidance from more knowledgeable people about how the server is performing.
I read this article - http://kristianlyng.wordpress.com/2009/12/08/varnishstat-for-dummies/ - but I am still not sure how the server is performing.
The server has been running for the last 9 hours. I understand that more content will be cached over time, so the cache hit ratio will improve, but right now I would like some interim feedback from you on the server's performance.
Hitrate ratio: 10 100 613
Hitrate avg: 0.2703 0.3429 0.4513
239479 8.00 7.99 client_conn - Client connections accepted
541129 13.00 18.06 client_req - Client requests received
157594 1.00 5.26 cache_hit - Cache hits
3 0.00 0.00 cache_hitpass - Cache hits for pass
313499 9.00 10.46 cache_miss - Cache misses
67377 4.00 2.25 backend_conn - Backend conn. success
316739 7.00 10.57 backend_reuse - Backend conn. reuses
910 0.00 0.03 backend_toolate - Backend conn. was closed
317652 8.00 10.60 backend_recycle - Backend conn. recycles
584 0.00 0.02 backend_retry - Backend conn. retry
3 0.00 0.00 fetch_head - Fetch head
314040 9.00 10.48 fetch_length - Fetch with Length
4139 0.00 0.14 fetch_chunked - Fetch chunked
5 0.00 0.00 fetch_close - Fetch wanted close
386 . . n_sess_mem - N struct sess_mem
55 . . n_sess - N struct sess
313452 . . n_object - N struct object
313479 . . n_objectcore - N struct objectcore
38474 . . n_objecthead - N struct objecthead
368 . . n_waitinglist - N struct waitinglist
12 . . n_vbc - N struct vbc
61 . . n_wrk - N worker threads
344 0.00 0.01 n_wrk_create - N worker threads created
2935 0.00 0.10 n_wrk_queued - N queued work requests
1 . . n_backend - N backends
47 . . n_expired - N expired objects
149425 . . n_lru_moved - N LRU moved objects
1 0.00 0.00 losthdr - HTTP header overflows
461727 10.00 15.41 n_objwrite - Objects sent with write
239468 8.00 7.99 s_sess - Total Sessions
541129 13.00 18.06 s_req - Total Requests
64678 3.00 2.16 s_pipe - Total pipe
5346 0.00 0.18 s_pass - Total pass
318187 9.00 10.62 s_fetch - Total fetch
193589421 3895.84 6459.66 s_hdrbytes - Total header bytes
4931971067 14137.41 164569.09 s_bodybytes - Total body bytes
117585 3.00 3.92 sess_closed - Session Closed
2283 0.00 0.08 sess_pipeline - Session Pipeline
892 0.00 0.03 sess_readahead - Session Read Ahead
458468 10.00 15.30 sess_linger - Session Linger
414010 9.00 13.81 sess_herd - Session herd
36912073 880.96 1231.68 shm_records - SHM records
What VCL are you using? If the answer is 'none' then you are probably not getting a very good hitrate. On a fresh install, Varnish is quite conservative about what it caches (and rightly so), but you can probably improve matters by reading how to achieve a high hitrate. If it's safe to, you can selectively unset cookies and normalise requests with your VCL, which will result in fewer backend calls.
How much of your website is cacheable? Is your object cache big enough? If you can answer those two questions, you ought to be able to achieve a great hitrate with Varnish.
