Area Under Half Normal Distribution is too big - transformation

I have a project where I need to look at what I think is a half-normal distribution.
x is roughly normally distributed and centered at 0. What I need to look at is y = |x|.
I've never used half-normal distributions before.
To get more familiar with it I've been experimenting with both R and Excel.
I created a standard normal distribution and verified that the area under the curve is one.
Then, I doubled all those probabilities for z scores greater than or equal to 0.
I verified this method by thinking about it and by using the R package "extraDistr".
Now the rub is that the area under the half-normal is equal to 1.4.
Here is a simple table to illustrate my problem:
    Z    norm    half
-6.00    0.00    0.00
-5.00    0.00    0.00
-4.00    0.00    0.00
-3.00    0.00    0.00
-2.00    0.05    0.00
-1.00    0.24    0.00
 0.00    0.40    0.80
 1.00    0.24    0.48
 2.00    0.05    0.11
 3.00    0.00    0.01
 4.00    0.00    0.00
 5.00    0.00    0.00
 6.00    0.00    0.00
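In code terms, here is the arithmetic behind the table (I actually used R and Excel; this Python version is only an illustrative sketch):
from math import exp, pi, sqrt

def phi(z):
    # standard normal density
    return exp(-z * z / 2) / sqrt(2 * pi)

# What the table does: double the density at each integer z >= 0 and add the values up (step = 1)
coarse_sum = sum(2 * phi(z) for z in range(0, 7))
print(round(coarse_sum, 2))   # about 1.4

# The same doubled density summed over [0, 6] with a much finer step
step = 0.001
fine_sum = sum(2 * phi(k * step + step / 2) * step for k in range(int(6 / step)))
print(round(fine_sum, 2))     # about 1.0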
I am pretty certain that the area under any pdf should not exceed 1.
Can someone please enlighten me?
Best of luck, thanks!

Related

Is there any parameter in Oracle statspack report to indicate how many commits are done?

Looking at an Oracle Statspack report whose snapshots were taken exactly before and after a batch program, is it possible to say how many commits were done on the database? How many transactions were done?
In the Load Profile section you have the number of transactions per second and the number of rollbacks per second:
Load Profile            Per Second   Per Transaction   Per Exec   Per Call
~~~~~~~~~~~~           -----------   ---------------   --------   --------
DB time(s):                    0.4               0.0       0.01       0.06
DB CPU(s):                     0.2               0.0       0.00       0.03
Redo size:                20,766.2           2,092.1
Logical reads:               548.5              55.3
Block changes:               138.3              13.9
Physical reads:               22.9               2.3
Physical writes:              16.5               1.7
User calls:                    7.0               0.7
Parses:                        9.2               0.9
Hard parses:                   0.0               0.0
W/A MB processed:              1.8               0.2
Logons:                        0.1               0.0
Executes:                     62.6               6.3
Rollbacks:                     0.0               0.0
Transactions:                  9.9
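To turn those per-second rates into counts, multiply them by the elapsed time of the snapshot window, which is shown in the header of the same report. A rough sketch of the arithmetic (the elapsed time below is a made-up placeholder, and it treats Transactions as roughly user commits plus user rollbacks):
# Rates taken from the Load Profile above; elapsed_seconds is a placeholder for
# the snapshot interval shown in the report header (assumed to be 1 hour here).
elapsed_seconds = 3600
transactions_per_sec = 9.9
rollbacks_per_sec = 0.0

transactions = transactions_per_sec * elapsed_seconds
rollbacks = rollbacks_per_sec * elapsed_seconds
commits = transactions - rollbacks    # roughly the number of commits in the window
print(int(commits))                   # about 35640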

Heavy disk I/O writes from a select statement in Hive

In Hive I am running this query:
select ret[0],ret[1],ret[2],ret[3],ret[4],ret[5],ret[6] from (select combined1(extra) as ret from log_test1) a ;
Here ret[0], ret[1], ret[2], ... are domain, date, IP, etc. This query is doing heavy writes to disk.
Here is the iostat result on one of the boxes in the cluster:
avg-cpu: %user %nice %system %iowait %steal %idle
20.65 0.00 1.82 57.14 0.00 20.39
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
xvdb 0.00 0.00 0.00 535.00 0.00 23428.00 87.58 143.94 269.11 0.00 269.11 1.87 100.00
My mapper is basically stuck in disk I/O. I have a 3-box cluster. My YARN configuration is:
Mapper memory (mapreduce.map.memory.mb) = 2 GB
I/O Sort Memory Buffer = 1 GB
I/O Sort Spill Percent = 0.8
The counters of my job are:
FILE: Number of bytes read 0
FILE: Number of bytes written 2568435
HDFS: Number of bytes read 1359720216
HDFS: Number of bytes written 19057298627
Virtual memory (bytes) snapshot 24351916032
Total committed heap usage (bytes) 728760320
Physical memory (bytes) snapshot 2039455744
Map input records 76076426
Input split bytes 2738
GC time elapsed (ms) 55602
Spilled Records 0
As I understand it, the mapper should initially write everything to RAM, and only when the RAM (the I/O Sort Memory Buffer) fills up should it spill the data to disk. But as I am seeing, Spilled Records = 0 and the mapper is not using its full RAM, yet there is still heavy disk writing.
Even when I run the query
select combined1(extra) from log_test1;
I get the same heavy disk I/O writes.
What can be the reason for this heavy disk writing, and how can I reduce it? In this case disk I/O is becoming the bottleneck for my mapper.
It may be that your subquery is being written to disk before the second stage of the processing takes place. You should use EXPLAIN to examine the execution plan.
You could also try rewriting your subquery as a CTE: https://cwiki.apache.org/confluence/display/Hive/Common+Table+Expression
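For example, your query could be written with a CTE along these lines (just a sketch; it assumes a Hive version that supports WITH, i.e. 0.13 or later):
with a as (
  select combined1(extra) as ret from log_test1
)
select ret[0], ret[1], ret[2], ret[3], ret[4], ret[5], ret[6]
from a;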

What do the statistics (usr, sys, cusr, csys, and CPU) outputted by Perl's prove command mean?

I've done quite a bit of Googling but can't find an answer to this question. When you run prove on your tests (http://perldoc.perl.org/prove.html), you get some statistics that look like:
Files=3, Tests=45, 2 wallclock secs ( 0.03 usr 0.00 sys + 0.50 cusr 0.12 csys = 0.65 CPU)
What do the numbers given for usr, sys, cusr, csys and CPU mean?
Wallclock seconds is the actual elapsed time, as if you looked at your watch to time it.
The usr seconds is the time actually spent on the CPU, in user space.
The sys seconds is the time actually spent on the CPU, in kernel space.
The CPU time is the total time spent on the CPU.
I don't know what cusr and csys represent; I guess they mean children_user and children_system?
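That guess is consistent with the underlying times() call: cusr and csys are the user and system CPU time accumulated by child processes, and prove runs each test script as a child process. As a purely illustrative sketch (in Python, whose os.times() exposes the same four counters):
import os, subprocess

# Burn some CPU in a child process (assumes a perl binary is on the PATH).
subprocess.run(["perl", "-e", "$x += $_ for 1 .. 5_000_000"])

t = os.times()
print("usr ", t.user)               # CPU time this process spent in user space
print("sys ", t.system)             # CPU time this process spent in kernel space
print("cusr", t.children_user)      # user-space CPU time of finished child processes
print("csys", t.children_system)    # kernel-space CPU time of finished child processes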

xpce prolog consult program

At this URL, http://www.commonkads.uva.nl/frameset-commonkads.html,
on the Tools tab there is an example program in Prolog.
I downloaded it and consulted main_xpce, but no window appears to show the program.
This is an XPCE Prolog program, so it should open a window.
I have a similar problem with the XPCE demos in SWI-Prolog: when I consult them, no windows appear.
For example, for kangaroo.pl in C:\Program Files (x86)\swipl\xpce\prolog\demo,
consulting the file shows nothing,
but when I use the xpce.man menu it works properly.
Please help me.
ck-prolog seems to be working... but I don't know what's expected.
I downloaded ck-prolog.zip, unzipped it, loaded main_pce.pl, and ran (note that consulting only loads the code; the window does not appear until you call the entry goal):
?- [main_pce].
% auxilliary compiled 0.01 sec, 33 clauses
% oo_kernel compiled 0.01 sec, 31 clauses
% ck_kernel compiled 0.02 sec, 116 clauses
% inf_methods compiled 0.00 sec, 14 clauses
% architecture compiled 0.04 sec, 203 clauses
% views_pce compiled 0.00 sec, 9 clauses
% controller compiled 0.00 sec, 24 clauses
% model compiled 0.04 sec, 256 clauses
% database compiled 0.00 sec, 2 clauses
% main_pce compiled 0.10 sec, 506 clauses
true.
?- go.
After a few seconds I get a window with 3 panels; then, after 3 clicks, I get:
I'm sorry, I don't know what that means...

Performance and resource testing of standalone Ruby code (gem)

I have a gem that does some tough number crunching: it crops "interesting" parts out of an image. For that, I set up several algorithms. Overall, it just performs badly, which, obviously, I want to improve :).
I want to test and measure three things:
memory usage
CPU-usage
overall time spent in a method/routine.
I want to investigate this and compare the values for various algorithms, parameters and set-ups.
Is there some Ruby functionality, a gem or anything like that, which will allow me to run my code, change a few parameters or a little bit of code, run it again and then compare the results?
I have test:unit and shoulda in place already, by the way, so if there is something that uses these testing frameworks, that is fine.
You can use the standard 'profiler' library. It reports the time spent in each of your methods.
require 'profiler'

def functionToBeProfiled
  a = 0
  1000.times do |i|
    a = a + i * rand
  end
end

Profiler__::start_profile
functionToBeProfiled
Profiler__::stop_profile
Profiler__::print_profile($stdout)
This will produce the following output:
    %   cumulative      self               self     total
 time      seconds   seconds    calls   ms/call   ms/call   name
16.58         0.03      0.03        1     31.00     93.00   Integer#times
16.58         0.06      0.03     1000      0.03      0.03   Kernel.rand
 8.56         0.08      0.02        3      5.33     10.33   RubyLex#identify_identifier
 8.56         0.09      0.02       10      1.60      6.20   IRB::SLex::Node#match_io
 8.56         0.11      0.02       84      0.19      0.19   Kernel.===
 8.56         0.13      0.02      999      0.02      0.02   Float#+
 8.56         0.14      0.02        6      2.67      5.33   String#gsub!
Be careful, however, as this library will hinder your application's performance. The measurements you get from it are only useful when compared with other measurements obtained with the same method; they can't be used as absolute measurements.
I've had good experiences with ruby-prof:
http://ruby-prof.rubyforge.org/
There's also a good presentation on various ways of profiling somewhere on the web, but I can't remember the title or author, and I'm having a hard time finding it right now... :-(
I was pleasantly surprised by JRuby. If your code runs on that implementation without change and you're familiar with Java benchmarking software you should take a look (and let me know how you get on).
http://www.engineyard.com/blog/2010/monitoring-memory-with-jruby-part-1-jhat-and-visualvm/
http://danlucraft.com/blog/2011/03/built-in-profiler-in-jruby-1.6/
Having now read your question more carefully, I realise this doesn't give you a way to automate the process.
