Accuracy for Random Forest Algorithm is 0.0

I'm doing a machine learning project in a Jupyter notebook. I'm using Random Forest with GridSearchCV; the code runs without errors, but I get Accuracy = 0.0.
When I tried a Decision Tree, the Accuracy was 99.99.
How do I solve this issue?
Input
#Training the RandomForest Algorithm
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
rfc=RandomForestClassifier(random_state=42)
param_grid = {
'n_estimators': [50, 100, 200],
'max_depth' : [5, 10, 20],
'min_samples_leaf': [1, 2, 3, 4, 5, 10, 20]
}
CV_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid, cv= 5)
CV_rfc.fit(X_train, y_train)
CV_rfc.best_params_
rfc1=RandomForestClassifier(random_state=42, n_estimators= 50, max_depth=5, criterion='gini')
rfc1.fit(X_train, y_train)
Which gives an output:
RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
max_depth=5, max_features='auto', max_leaf_nodes=None,
min_impurity_split=1e-07, min_samples_leaf=1,
min_samples_split=2, min_weight_fraction_leaf=0.0,
n_estimators=50, n_jobs=1, oob_score=False, random_state=42,
verbose=0, warm_start=False)
INPUT:
pred=rfc1.predict(X_test)
print("Accuracy for Random Forest on CV data: ",accuracy_score(y_test,pred))
OUTPUT:
Accuracy for Random Forest on CV data: 0.0
INPUT :
'''
Compute confusion matrix and print classification report.
'''
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score, f1_score
# score the model
Ntest = len(y_test)
Ntestpos = len([val for val in y_test if val])
NullAcc = float(Ntest-Ntestpos)/Ntest
print("Mean accuracy on Training set: %s" %rfc1.score(X_train, y_train))
print("Mean accuracy on Test set: %s" %rfc1.score(X_test, y_test))
print("Null accuracy on Test set: %s" %NullAcc)
print(" ")
y_pred = rfc1.predict(X_test)
f1_score(y_test, y_pred, average='weighted')
y_true, y_pred = y_test, rfc1.predict(X_test)
cm = confusion_matrix(y_true, y_pred)
print("Confusion matrix:\ntn=%6d fp=%6d\nfn=%6d tp=%6d" %(cm[0][0],cm[0][1],cm[1][0],cm[1][1]))
print("\nDetailed classification report: \n%s" %classification_report(y_true, y_pred))
OUTPUT:
Mean accuracy on Training set: 1.0
Mean accuracy on Test set: 0.0
Null accuracy on Test set: 0.0
along with this warning:
UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
'precision', 'predicted', average, warn_for)
UndefinedMetricWarning: F-score is ill-defined and being set to 0.0 in labels with no predicted samples.
'precision', 'predicted', average, warn_for)
Confusion matrix:
tn= 0 fp= 0
fn=1745395 tp= 0
Detailed classification report:
             precision    recall  f1-score   support
          0       0.00      0.00      0.00         0
          1       0.00      0.00      0.00   1745395
          2       0.00      0.00      0.00    143264
          3       0.00      0.00      0.00     75044
          4       0.00      0.00      0.00     46700
          5       0.00      0.00      0.00     31568
          6       0.00      0.00      0.00     22966
          7       0.00      0.00      0.00     16903
          8       0.00      0.00      0.00     13188
          9       0.00      0.00      0.00     10160
.
.
.
        119       0.00      0.00      0.00         2
        123       0.00      0.00      0.00         2
        124       0.00      0.00      0.00         1
        141       0.00      0.00      0.00         1
        165       0.00      0.00      0.00         1
avg / total       0.00      0.00      0.00   2148603
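A quick sanity check that may help narrow this down, assuming `y_train` and `y_test` are iterables of integer class labels as the report suggests: compare which labels occur in each split. The confusion matrix above places all 1,745,395 samples of class 1 in the predicted-0 column, and class 0 has a support of 0 in the test set, which is consistent with the model predicting a label that never occurs in `y_test`. The helper name `compare_label_sets` and the toy data are illustrative, not part of the original code.

```python
from collections import Counter


def compare_label_sets(y_train, y_test):
    """Summarise how the class labels of two splits overlap."""
    train_counts = Counter(y_train)
    test_counts = Counter(y_test)
    return {
        "only_in_train": sorted(set(train_counts) - set(test_counts)),
        "only_in_test": sorted(set(test_counts) - set(train_counts)),
        "shared": sorted(set(train_counts) & set(test_counts)),
        "train_counts": train_counts,
        "test_counts": test_counts,
    }


# Toy data (hypothetical): the training labels are all 0, the test labels never are.
toy_train = [0, 0, 0, 0]
toy_test = [1, 1, 2, 3]
summary = compare_label_sets(toy_train, toy_test)
print(summary["only_in_train"], summary["only_in_test"], summary["shared"])
```

If `shared` comes back empty on the real data, or the only predicted label sits in `only_in_train`, that would point at the train/test split or the label encoding rather than at the classifier itself.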


Julia pmap speed - parallel processing - dynamic programming

I am trying to speed up filling in a matrix for a dynamic programming problem in Julia (v0.6.0), and I can't seem to get much extra speed from using pmap. This is related to this question I posted almost a year ago: Filling a matrix using parallel processing in Julia. I was able to speed up serial processing with some great help then, and I'm now trying to get extra speed from parallel processing tools in Julia.
For the serial processing case, I was using a 3-dimensional matrix (essentially a set of equally-sized matrices, indexed by the 1st-dimension) and iterating over the 1st-dimension. I wanted to give pmap a try, though, to more efficiently iterate over the set of matrices.
Here is the code setup. To use pmap with the v_iter function below, I converted the three dimensional matrix into a dictionary object, with the dictionary keys equal to the index values in the 1st dimension (v_dict in the code below, with gcc equal to the 1st-dimension size). The v_iter function takes other dictionary objects (E_opt_dict and gridpoint_m_dict below) as additional inputs:
function v_iter(a,b,c)
    diff_v = 1
    while diff_v>convcrit
        diff_v = -Inf
        #These lines efficiently multiply the value function by the Markov transition matrix, using the A_mul_B function
        exp_v = zeros(Float64,gkpc,1)
        A_mul_B!(exp_v,a[1:gkpc,:],Zprob[1,:])
        for j=2:gz
            temp=Array{Float64}(gkpc,1)
            A_mul_B!(temp,a[(j-1)*gkpc+1:(j-1)*gkpc+gkpc,:],Zprob[j,:])
            exp_v=hcat(exp_v,temp)
        end
        #This tries to find the optimal value of v
        for h=1:gm
            for j=1:gz
                oldv = a[h,j]
                newv = (1-tau)*b[h,j]+beta*exp_v[c[h,j],j]
                a[h,j] = newv
                diff_v = max(diff_v, oldv-newv, newv-oldv)
            end
        end
    end
end
gz = 9
gp = 13
gk = 17
gcc = 5
gm = gk * gp * gcc * gz
gkpc = gk * gp * gcc
gkp = gk*gp
beta = ((1+0.015)^(-1))
tau = 0.35
Zprob = [0.43 0.38 0.15 0.03 0.00 0.00 0.00 0.00 0.00; 0.05 0.47 0.35 0.11 0.02 0.00 0.00 0.00 0.00; 0.01 0.10 0.50 0.30 0.08 0.01 0.00 0.00 0.00; 0.00 0.02 0.15 0.51 0.26 0.06 0.01 0.00 0.00; 0.00 0.00 0.03 0.21 0.52 0.21 0.03 0.00 0.00 ; 0.00 0.00 0.01 0.06 0.26 0.51 0.15 0.02 0.00 ; 0.00 0.00 0.00 0.01 0.08 0.30 0.50 0.10 0.01 ; 0.00 0.00 0.00 0.00 0.02 0.11 0.35 0.47 0.05; 0.00 0.00 0.00 0.00 0.00 0.03 0.15 0.38 0.43]
convcrit = 0.001 # chosen convergence criterion
E_opt = Array{Float64}(gcc,gm,gz)
fill!(E_opt,10.0)
gridpoint_m = Array{Int64}(gcc,gm,gz)
fill!(gridpoint_m,fld(gkp,2))
v_dict=Dict(i => zeros(Float64,gm,gz) for i=1:gcc)
E_opt_dict=Dict(i => E_opt[i,:,:] for i=1:gcc)
gridpoint_m_dict=Dict(i => gridpoint_m[i,:,:] for i=1:gcc)
For parallel processing, I executed the following commands:
wp = CachingPool(workers())
addprocs(3)
pmap(wp,v_iter,values(v_dict),values(E_opt_dict),values(gridpoint_m_dict))
...which produced this performance:
135.626417 seconds (3.29 G allocations: 57.152 GiB, 3.74% gc time)
I then tried serial processing instead:
for i=1:gcc
v_iter(v_dict[i],E_opt_dict[i],gridpoint_m_dict[i])
end
...and received better performance.
128.263852 seconds (3.29 G allocations: 57.101 GiB, 4.53% gc time)
This also gives me about the same performance as running v_iter on the original 3-dimensional objects:
v=zeros(Float64,gcc,gm,gz)
for i=1:gcc
v_iter(v[i,:,:],E_opt[i,:,:],gridpoint_m[i,:,:])
end
I know that parallel processing involves setup time, but when I increase the value of gcc, I still get about equal processing time for serial and parallel. This seems like a good candidate for parallel processing, since there is no need for messaging between the workers! But I can't seem to make it work efficiently.
You create the CachingPool before adding the worker processes, so the pool passed to pmap contains only a single worker.
You can check this by inspecting wp.workers; you will see something like Set([1]).
Hence it should be:
addprocs(3)
wp = CachingPool(workers())
You could also start Julia with the -p command-line parameter, e.g. julia -p 3, and then skip the addprocs(3) command.
On top of that, your for and pmap loops are not equivalent. A Julia Dict is a hash map and, as in other languages, does not guarantee any element order. In your for loop you are guaranteed to get the matching i-th element of each dictionary, whereas with values the order of the values need not match the original order (and it can differ for each of the three dictionaries in the pmap call).
Since the keys of your Dicts are just the numbers from 1 to gcc, you should simply use arrays instead. You can use generators, much like in Python. For example, instead of
v_dict=Dict(i => zeros(Float64,gm,gz) for i=1:gcc)
use
v_dict_a = [zeros(Float64,gm,gz) for i=1:gcc]
Hope that helps.
Based on @Przemyslaw Szufeul's helpful advice, I've placed below the code that properly executes parallel processing. After running it once, I achieved a substantial improvement in running time:
77.728264 seconds (181.20 k allocations: 12.548 MiB)
In addition to reordering the wp command and using the generator Przemyslaw recommended, I also recast v_iter as an anonymous function, in order to avoid having to sprinkle @everywhere around the code to feed functions and data to the workers.
I also added return a to the v_iter function, and set v_a below equal to the output of pmap, since you cannot pass by reference to a remote object.
addprocs(3)
v_iter = function(a,b,c)
    diff_v = 1
    while diff_v>convcrit
        diff_v = -Inf
        #These lines efficiently multiply the value function by the Markov transition matrix, using the A_mul_B function
        exp_v = zeros(Float64,gkpc,1)
        A_mul_B!(exp_v,a[1:gkpc,:],Zprob[1,:])
        for j=2:gz
            temp=Array{Float64}(gkpc,1)
            A_mul_B!(temp,a[(j-1)*gkpc+1:(j-1)*gkpc+gkpc,:],Zprob[j,:])
            exp_v=hcat(exp_v,temp)
        end
        #This tries to find the optimal value of v
        for h=1:gm
            for j=1:gz
                oldv = a[h,j]
                newv = (1-tau)*b[h,j]+beta*exp_v[c[h,j],j]
                a[h,j] = newv
                diff_v = max(diff_v, oldv-newv, newv-oldv)
            end
        end
    end
    return a
end
gz = 9
gp = 13
gk = 17
gcc = 5
gm = gk * gp * gcc * gz
gkpc = gk * gp * gcc
gkp =gk*gp
beta = ((1+0.015)^(-1))
tau = 0.35
Zprob = [0.43 0.38 0.15 0.03 0.00 0.00 0.00 0.00 0.00; 0.05 0.47 0.35 0.11 0.02 0.00 0.00 0.00 0.00; 0.01 0.10 0.50 0.30 0.08 0.01 0.00 0.00 0.00; 0.00 0.02 0.15 0.51 0.26 0.06 0.01 0.00 0.00; 0.00 0.00 0.03 0.21 0.52 0.21 0.03 0.00 0.00 ; 0.00 0.00 0.01 0.06 0.26 0.51 0.15 0.02 0.00 ; 0.00 0.00 0.00 0.01 0.08 0.30 0.50 0.10 0.01 ; 0.00 0.00 0.00 0.00 0.02 0.11 0.35 0.47 0.05; 0.00 0.00 0.00 0.00 0.00 0.03 0.15 0.38 0.43]
convcrit = 0.001 # chosen convergence criterion
E_opt = Array{Float64}(gcc,gm,gz)
fill!(E_opt,10.0)
gridpoint_m = Array{Int64}(gcc,gm,gz)
fill!(gridpoint_m,fld(gkp,2))
v_a=[zeros(Float64,gm,gz) for i=1:gcc]
E_opt_a=[E_opt[i,:,:] for i=1:gcc]
gridpoint_m_a=[gridpoint_m[i,:,:] for i=1:gcc]
wp = CachingPool(workers())
v_a = pmap(wp,v_iter,v_a,E_opt_a,gridpoint_m_a)

gprof on both OpenMP and without OpenMP codes produces different flat profile

After successfully adding OpenMP to my code, I am trying to measure how much it has improved performance, but gprof gives me a totally different flat profile for the parallel build. Below is my main program, which calls all the subroutines.
program main
use my_module
call inputf !to read inputs from a file
! call echo !to check if the inputs are read in correctly, but is muted
call allocv !to allocate dimension to all array variable
call bathyf !to read in the computational domain
call inicon !to setup initial conditions
call comput !computation from iteration 1 to n
call deallv !to deallocate all array variables
end program main
Following is the cpu_time and OMP_GET_WTIME() for both serial and parallel codes. The OpenMP parallel region is within subroutine comput.
!serial code
CPU time elapsed = 260.5080 seconds.
!parallel code
CPU time elapsed = 153.3600 seconds.
OMP time elapsed = 49.3521 seconds.
And the following are the flat profiles for both serial and parallel codes.
!Serial code
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 96.26    227.63   227.63        1   227.63   236.45  comput_
  3.60    236.13     8.50     2001     0.00     0.00  update_
  0.08    236.32     0.19     2000     0.00     0.00  openbc_
  0.05    236.45     0.13       41     0.00     0.00  output_
  0.01    236.47     0.02        1     0.02     0.02  bathyf_
  0.01    236.49     0.02        1     0.02     0.03  inicon_
  0.00    236.50     0.01        1     0.01     0.01  opwmax_
  0.00    236.50     0.00     1001     0.00     0.00  timser_
  0.00    236.50     0.00        2     0.00     0.00  timestamp_
  0.00    236.50     0.00        1     0.00     0.00  allocv_
  0.00    236.50     0.00        1     0.00     0.00  deallv_
  0.00    236.50     0.00        1     0.00     0.00  inputf_
!Parallel code
Flat profile:
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls   s/call   s/call  name
 95.52     84.90    84.90                             openbc_
  1.68     86.39     1.49     2001     0.74     0.74  update_
  0.10     86.48     0.09       41     2.20     2.20  output_
  0.00     86.48     0.00     1001     0.00     0.00  timser_
  0.00     86.48     0.00        2     0.00     0.00  timestamp_
  0.00     86.48     0.00        1     0.00     0.00  allocv_
  0.00     86.48     0.00        1     0.00     0.00  bathyf_
  0.00     86.48     0.00        1     0.00     0.00  deallv_
  0.00     86.48     0.00        1     0.00     2.20  inicon_
  0.00     86.48     0.00        1     0.00     0.00  inputf_
  0.00     86.48     0.00        1     0.00     0.00  comput_
  0.00     86.48     0.00        1     0.00     0.00  opwmax_
The subroutines update, openbc, output and timser are called within subroutine comput. As you can see, comput is supposed to account for most of the runtime, but the flat profile of the parallel code shows otherwise. Please let me know if you need any other information.
gprof is poorly suited for analysis of parallel programs as it doesn't understand the intricacies of OpenMP. You should instead use something like a combination of Score-P and Cube. The former is an instrumentation framework while the latter is a visualisation tool for hierarchical performance data. Both are open-source projects. On the commercial front, Intel VTune Amplifier could be used.
This article says:
One problem with gprof under certain kernels (such as Linux) is that it doesn’t behave correctly with multithreaded applications. It actually only profiles the main thread, which is quite useless.
The article also provides a work-around, but since you don't create your threads manually, but instead use OpenMP (which creates the threads transparently), you will have to modify it to make it work for you.
You could also choose a profiler that is able to work with parallel programs instead.
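Independently of any profiler, the timings quoted in the question already allow a rough speedup estimate. The sketch below assumes that the serial CPU time is close to the serial wall-clock time (plausible for a single-threaded, compute-bound run, but an assumption) and that "OMP time elapsed" is the wall-clock time of the parallel run; the variable names are illustrative.

```python
# Timings copied from the question (seconds).
serial_cpu = 260.5080      # cpu_time of the serial build
parallel_cpu = 153.3600    # cpu_time of the OpenMP build (typically summed over threads)
parallel_wall = 49.3521    # OMP_GET_WTIME() difference of the OpenMP build

# Assumption: serial wall-clock time is roughly the serial CPU time.
speedup = serial_cpu / parallel_wall

# CPU seconds consumed per wall-clock second, i.e. average number of busy threads.
avg_busy_threads = parallel_cpu / parallel_wall

print("speedup ~ %.2f, average busy threads ~ %.2f" % (speedup, avg_busy_threads))
```

Comparing wall-clock times like this answers the "how much faster" question directly; a thread-aware profiler is then only needed to find out where the remaining time goes.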

Ruby-prof with graph printer and sorting by self puts out total percentages higher than 100%

If I run
ruby-prof -p graph -s self aggregate.rb > graph.txt
the first few lines of my graph.txt will look like:
Total Time: 40.092432
%total %self total self wait child calls Name
--------------------------------------------------------------------------------
5.16 5.16 0.00 0.00 98304/98304 Object#totalDurationFromFile
100.00% 100.00% 5.16 5.16 0.00 0.00 98304 IO#read
--------------------------------------------------------------------------------
4.91 4.91 0.00 0.00 98304/98304 <Class::IO>#new
95.17% 95.17% 4.91 4.91 0.00 0.00 98304 File#initialize
--------------------------------------------------------------------------------
0.37 0.19 0.00 0.17 32768/32769 Hash#each
28.89 4.67 0.00 24.22 1/32769 Object#readFiles
566.81% 94.24% 29.26 4.86 0.00 24.39 32769 Array#collect
14.71 1.98 0.00 12.73 98304/98304 Object#totalDurationFromFile
9.11 0.64 0.00 8.48 98304/131072 Class#new
0.39 0.39 0.00 0.00 98304/196609 <Class::File>#basename
0.00 0.17 0.00 0.00 98304/1202331 Object#main
--------------------------------------------------------------------------------
3.76 3.35 0.00 0.42 524288/524288 Module#class_eval
72.94% 64.85% 3.76 3.35 0.00 0.42 524288 Module#define_method
0.42 0.42 0.00 0.00 524288/524288 BasicObject#singleton_method_added
I don't think that this is specific to my script aggregate.rb. Therefore, I am leaving the source code out for the sake of brevity.
The question is: why are there percentages higher than 100% in the %total column? Is sorting by self not allowed with the graph printer? Is this a bug, or did I overlook something? Any help is greatly appreciated.
Thanks!
Have you checked if this change on Github resolves the issue? Apparently, the gem version is out of date and/or does not include that change (as it would also increase the number of decimal places to three).
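One observation that can be checked directly from the listing (an inference from the printed numbers, not from ruby-prof's source): the percentages match each row's time divided by 5.16, the self time of the first row, rather than divided by the reported Total Time of 40.092432. The short Python sketch below redoes that arithmetic; the variable names are illustrative.

```python
total_time = 40.092432   # "Total Time" from the report header
first_row_self = 5.16    # self time of IO#read, the first row when sorting by self

# (name, total seconds, %total as printed) copied from the listing
rows = [
    ("IO#read", 5.16, 100.00),
    ("File#initialize", 4.91, 95.17),
    ("Array#collect", 29.26, 566.81),
    ("Module#define_method", 3.76, 72.94),
]

for name, total, printed_pct in rows:
    pct_of_first = 100.0 * total / first_row_self
    pct_of_total = 100.0 * total / total_time
    print("%-22s printed=%7.2f  vs first row=%7.2f  vs total time=%6.2f"
          % (name, printed_pct, pct_of_first, pct_of_total))
```

The recomputed "vs first row" values agree with the printed column to within the rounding of the displayed times, while dividing by Total Time gives values below 100% for every row (about 73% for Array#collect).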

Why is the Python implementation of Miller-Rabin so much faster than the Ruby one?

For one of my classes I recently came across both a Ruby and a Python implementation of the Miller-Rabin algorithm, used to count the primes between 20 and 29000. I am curious why, even though they are seemingly the same implementation, the Python code runs so much faster. I have read that Python is typically faster than Ruby, but is this much of a speed difference to be expected?
miller_rabin.rb
def miller_rabin(m,k)
  t = (m-1)/2;
  s = 1;
  while(t%2==0)
    t/=2
    s+=1
  end
  for r in (0...k)
    b = 0
    b = rand(m) while b==0
    prime = false
    y = (b**t) % m
    if(y ==1)
      prime = true
    end
    for i in (0...s)
      if y == (m-1)
        prime = true
        break
      else
        y = (y*y) % m
      end
    end
    if not prime
      return false
    end
  end
  return true
end
count = 0
for j in (20..29000)
  if(j%2==1 and miller_rabin(j,2))
    count+=1
  end
end
puts count
miller_rabin.py:
import math
import random
def miller_rabin(m, k):
    s=1
    t = (m-1)/2
    while t%2 == 0:
        t /= 2
        s += 1
    for r in range(0,k):
        rand_num = random.randint(1,m-1)
        y = pow(rand_num, t, m)
        prime = False
        if (y == 1):
            prime = True
        for i in range(0,s):
            if (y == m-1):
                prime = True
                break
            else:
                y = (y*y)%m
        if not prime:
            return False
    return True
count = 0
for j in range(20,29001):
    if j%2==1 and miller_rabin(j,2):
        count+=1
print count
When I measure the execution time of each using Measure-Command in Windows Powershell, I get the following:
Python 2.7:
Ticks: 4874403
Total Milliseconds: 487.4403
Ruby 1.9.3:
Ticks: 682232430
Total Milliseconds: 68223.243
I would appreciate any insight into why there is such a huge difference.
In Ruby you are using (a ** b) % c to compute the modular exponentiation. In Python, you are using the much more efficient three-argument pow call, whose docstring explicitly states:
With three arguments, equivalent to (x**y) % z, but may be more
efficient (e.g. for longs).
Whether you want to count the lack of such built-in operator against ruby is a matter of opinion. On the one hand, if ruby doesn't provide one, you might say that it's that much slower. On the other hand, you're not really testing the same thing algorithmically, so some would say that the comparison is not fair.
A quick googling reveals that there are implementations of modulo exponentiation for ruby.
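For reference, modular exponentiation by repeated squaring is short enough to write by hand. The sketch below is in Python and only illustrates the general technique; it is not CPython's actual implementation of three-argument pow, and `mod_pow` is a hypothetical name. The same loop translates almost line for line to Ruby.

```python
def mod_pow(base, exponent, modulus):
    """Compute (base ** exponent) % modulus without building the huge power."""
    if modulus == 1:
        return 0
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:                      # current low bit is set
            result = (result * base) % modulus
        base = (base * base) % modulus        # square for the next bit
        exponent >>= 1
    return result


# Same answer as the built-in, but every intermediate stays below modulus**2.
print(mod_pow(12345, 14499, 28999) == pow(12345, 14499, 28999))
```

The loop runs once per bit of the exponent, so for exponents below 29000 it needs at most 15 iterations on small integers instead of one multiplication chain that produces an enormous number.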
I think these profile results should answer your question:
%self total self wait child calls name
96.81 43.05 43.05 0.00 0.00 17651 Fixnum#**
1.98 0.88 0.88 0.00 0.00 17584 Bignum#%
0.22 44.43 0.10 0.00 44.33 14490 Object#miller_rabin
0.11 0.05 0.05 0.00 0.00 32142 <Class::Range>#allocate
0.11 0.06 0.05 0.00 0.02 17658 Kernel#rand
0.08 44.47 0.04 0.00 44.43 32142 *Range#each
0.04 0.02 0.02 0.00 0.00 17658 Kernel#respond_to_missing?
0.00 44.47 0.00 0.00 44.47 1 Kernel#load
0.00 44.47 0.00 0.00 44.47 2 Global#[No method]
0.00 0.00 0.00 0.00 0.00 2 IO#write
0.00 0.00 0.00 0.00 0.00 1 Kernel#puts
0.00 0.00 0.00 0.00 0.00 1 IO#puts
0.00 0.00 0.00 0.00 0.00 2 IO#set_encoding
0.00 0.00 0.00 0.00 0.00 1 Fixnum#to_s
0.00 0.00 0.00 0.00 0.00 1 Module#method_added
It looks like Ruby's ** operator is slow compared to Python.
It looks like (b**t) is often too big to fit in a Fixnum, so you are using Bignum (arbitrary-precision) arithmetic, which is much slower.
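To get a feel for how large that intermediate value is, the following Python sketch measures the bit length of b**t for the largest odd candidate in the tested range. The choice of `m = 28999` and of the base `b = m - 1` is illustrative; the scripts in the question draw the base at random.

```python
m = 28999                 # largest odd candidate in 20..29000
t = (m - 1) // 2          # same starting point as the scripts in the question
while t % 2 == 0:
    t //= 2

b = m - 1                 # illustrative base; the scripts pick it at random
huge = b ** t             # what (b**t) % m has to build first
print("t =", t)
print("bits in b**t:", huge.bit_length())
print("bits in m:   ", m.bit_length())

# The reduced value is identical, but pow(b, t, m) never materialises the big number.
assert huge % m == pow(b, t, m)
```

A Fixnum is limited to roughly one machine word, so an intermediate with hundreds of thousands of bits necessarily ends up in Bignum arithmetic.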

ruby-prof says Ruby increment operator (+=) takes 25 seconds

I'm trying to profile some Ruby code I wrote using ruby-prof gem and see that basic operations like i += 1 (listed as Fixnum#+ in the table below) take over 24 seconds to run (in this particular test, the operation is performed 2,199,978 times). Is this normal?
Thread 582936
%Total %Self Total Self Wait Child Calls Name
203.93 81.72 0.00 122.21 100001/100001 InputFile#parse
46.96% 18.82% 203.93 81.72 0.00 122.21 100001 InputFile#split_on_semicolon
24.59 24.59 0.00 0.00 2199978/3200094 Fixnum#+
16.02 16.02 0.00 0.00 100001/399998 String#split
14.72 14.72 0.00 0.00 999990/999991 String#[]
13.12 13.12 0.00 0.00 1199988/1199990 Fixnum#<
10.97 10.97 0.00 0.00 999990/2239978 String#empty?
10.49 10.49 0.00 0.00 1199988/1199988 String#<<
9.75 9.75 0.00 0.00 1199988/1200074 Array#[]
7.77 7.77 0.00 0.00 999990/999990 String#eql?
6.76 6.76 0.00 0.00 599994/599994 Fixnum#-
4.62 4.62 0.00 0.00 599994/599994 Array#delete_at
1.25 1.25 0.00 0.00 100001/1339989 Kernel#nil?
1.14 1.14 0.00 0.00 100001/300003 Array#size
1.01 1.01 0.00 0.00 100001/300002 Fixnum#>
Your results don't say += takes 25 seconds. They say that 2199978 calls to + took 24.59 seconds, which comes to 89.5 calls per ms. That's a bit slow, but probably only because it's being profiled. I don't see anything unusual in that.
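The arithmetic behind that figure, as a small Python check using only the numbers from the Fixnum#+ row of the profile (variable names are illustrative):

```python
calls = 2199978          # Fixnum#+ calls made from InputFile#split_on_semicolon
seconds = 24.59          # time attributed to those calls under the profiler

calls_per_ms = calls / (seconds * 1000.0)
microseconds_per_call = seconds * 1e6 / calls

print("%.1f calls per ms, %.1f microseconds per call"
      % (calls_per_ms, microseconds_per_call))
```

Put per call, the cost is on the order of ten microseconds, which is the number to compare against an unprofiled run before concluding that the addition itself is slow.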
