Partial Kendall's tau-b (τb) correlation with bootstrapping

Is it possible to run a partial Kendall's tau-b (τb) correlation with bootstrapping?
I used SPSS syntax to run a partial Kendall's tau-b (τb) correlation. I followed the steps on this website (https://toptipbio.com/spearman-partial-correlation-spss/) to run tau-b without bootstrapping (by using 'TAUB'='CORR' instead of 'RHO'='CORR'), but I don't know how to get a bootstrapped result.
Because my sample sizes can be quite small (some groups have as few as 8 cases), I am not confident running Kendall's tau-b without bootstrapping. (I am quite new to statistics, so maybe I am wrong about bootstrapping; please correct me if I am.)
Thank you so much!!!
P.S. I tried running the bootstrap before the 'matrix out' command, and SPSS generated 2000 results (I set 2000 resamples for the bootstrap). After I ran the 'matrix in' command, SPSS warned me that it could not split the data the way the 'matrix out' data set did. (This might be confusing: the SPSS warning was not written in English, and I am sorry that I couldn't understand it well enough to give a proper translation.)
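In case it helps while waiting for an SPSS-specific answer: the same quantity can be bootstrapped by hand outside SPSS. Below is a minimal Python sketch (not SPSS syntax), assuming the standard first-order partial-tau formula τxy·z = (τxy − τxz·τyz) / √((1 − τxz²)(1 − τyz²)) and a simple percentile bootstrap; x, y and z are placeholders for the two variables of interest and the control variable.
# Minimal sketch: percentile-bootstrap CI for a partial Kendall's tau-b,
# built from pairwise tau-b values via the first-order partial formula.
import numpy as np
from scipy.stats import kendalltau

def partial_tau(x, y, z):
    # Partial Kendall's tau-b of x and y, controlling for z.
    t_xy, _ = kendalltau(x, y)
    t_xz, _ = kendalltau(x, z)
    t_yz, _ = kendalltau(y, z)
    return (t_xy - t_xz * t_yz) / np.sqrt((1 - t_xz**2) * (1 - t_yz**2))

def bootstrap_partial_tau(x, y, z, n_boot=2000, seed=0):
    # Percentile bootstrap: resample whole cases (rows) with replacement.
    rng = np.random.default_rng(seed)
    x, y, z = map(np.asarray, (x, y, z))
    n = len(x)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)
        boot[b] = partial_tau(x[idx], y[idx], z[idx])
    # With very small samples some resamples can be degenerate (all ties),
    # which yields NaN; drop those before taking percentiles.
    boot = boot[~np.isnan(boot)]
    return partial_tau(x, y, z), np.percentile(boot, [2.5, 97.5])
With n = 8 per group the percentile interval will be very wide, but that is itself useful information about how much the data can support.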

Related

Using `callgrind` to count function calls in Linux

I am trying to track function call counts in a program I'm interested in. If I run the program on its own, it runs fine. If I run it under valgrind with the command seen below, I seem to get a different result.
Command run:
It produces this output immediately, even though the execution is normally slow.
I'd say that this is more likely to be related to this issue. However, to be certain you will need to tell us:
- what compilation options are being used - specifically, are you using anything related to AVX or x87?
- what hardware this is running on.
It would help if you can cut this down to a small example and either update this question or the frexp Bugzilla items.
valgrind has limited floating point support. You're probably using non-standard or very large floats.
UPDATE: since you're using long double, you're out of luck. Unfortunately, your least-worst option is to find a way to make your world work using just standard IEEE 754 64-bit double precision. This probably isn't easy, considering you're using an existing project.

How do I use the balance classes option for autoML in the flow interface?

I'm trying to use autoML in the flow interface for a classification problem.
My response column is an enum data type with values of 1 and 0.
My data set is highly imbalanced: only about 0.5% of rows have a response of 1.
I want to try the balance classes option, but every time I try it, the program ends up throwing errors.
If I check the balance classes option, am I required to also input values in the class_sampling_factors input box? If so, what do I put in?
The documentation says:
"class_sampling_factors: (DRF, GBM, DL, Naive-Bayes, AutoML) Specify the per-class (in lexicographical order) over/under-sampling ratios. By default, these ratios are automatically computed during training to obtain the class balance. This option is only applicable for classification problems and when balance_classes is enabled."
But it seems like the function fails to run unless I put something in.
I've tried putting in 200.0, 1 and also 1.0,200.0 but neither seemed to work well.
You are not required to specify the "Class sampling factors" parameter when using "Balance classes".
I just verified on H2O 3.26.0.9, using the HIGGS dataset (10k subset), that you can successfully run AutoML with "Balance classes" checked and "Class sampling factors" left blank. I also entered 1.0,0.5 for "Class sampling factors" and that worked as well. I don't see any bugs reported against older versions of H2O (I'm not sure which version you are using), so maybe the error is caused by something else?
Here's the Flow output generated by both options:
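If it is easier to reproduce outside Flow, the same two runs can be sketched with the H2O Python API. This is only an illustration, assuming the balance_classes and class_sampling_factors parameters quoted from the documentation above; the file path, response column name and model count are placeholders.
# Sketch of the two configurations above via the H2O Python API.
import h2o
from h2o.automl import H2OAutoML

h2o.init()

# Placeholder path: any binary-classification frame with an enum response works.
df = h2o.import_file("higgs_10k.csv")
df["response"] = df["response"].asfactor()

# Run 1: balance classes, let H2O compute the sampling ratios itself.
aml = H2OAutoML(max_models=5, seed=1, balance_classes=True)
aml.train(y="response", training_frame=df)

# Run 2: balance classes with explicit per-class ratios
# (lexicographical class order, e.g. "0" then "1").
aml2 = H2OAutoML(max_models=5, seed=1, balance_classes=True,
                 class_sampling_factors=[1.0, 0.5])
aml2.train(y="response", training_frame=df)

print(aml.leaderboard)
print(aml2.leaderboard)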

Why do different homographies affect running time?

I am applying OpenCV's warpPerspective() function to an image and timing this task (only the call to the function, nothing else). I noticed that the running time changes if I use different homographies.
For example, I tried using the identity matrix and found that it is faster than another homography that I generated with OpenCV's findHomography(), specifically this one:
[ -4.2374501377308356e+00, -4.1373817174321941e+00, 1.6044389922446646e+03,
-1.6805996938549963e+00, -9.0838245171456080e+00, 1.9901208871396577e+03,
-2.4454046226610403e-03, -8.2658343249518724e-03, 1. ]
Please note that the output is not my concern; I am only talking about the running time. So why is it different?
Thanks
EDIT: I'm using OpenCV 3.4 on a PowerVR GX6650. I tested it with and without OpenCL and the pattern is still the same.
As mentioned by @Micka in the comments, the difference seems to come from the different number of times the interpolation method is called.
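A quick way to see the effect is to time the two warps directly. The sketch below is illustrative only, with an arbitrary image size and repetition count, and uses the homography from the question.
# Timing sketch: warpPerspective with the identity vs. the homography above.
import time
import numpy as np
import cv2

img = np.random.randint(0, 256, (1080, 1920, 3), dtype=np.uint8)

H_identity = np.eye(3)
H_question = np.array([
    [-4.2374501377308356e+00, -4.1373817174321941e+00, 1.6044389922446646e+03],
    [-1.6805996938549963e+00, -9.0838245171456080e+00, 1.9901208871396577e+03],
    [-2.4454046226610403e-03, -8.2658343249518724e-03, 1.0]])

def time_warp(H, n=20):
    # Average wall-clock time of n calls, timing only warpPerspective itself.
    t0 = time.perf_counter()
    for _ in range(n):
        cv2.warpPerspective(img, H, (img.shape[1], img.shape[0]))
    return (time.perf_counter() - t0) / n

print("identity  :", time_warp(H_identity))
print("homography:", time_warp(H_question))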

Checking Calculator Type in TI-BASIC

I've been looking to make a program in TI-BASIC that can determine what kind of calculator the code is running on, without assembly, since I don't think there's anything that can read the information from the About screen. Here's one piece of code I came up with:
:ClrDraw
:Text(0,0,0
:pxl-Test(6,1
This will have different outputs based on which calculator it was run on. Are there any other tricks of a similar nature, or is there a better way of doing this?
Here's a simple and fast way to tell the difference between a TI-84 and a TI-84 CE. The other answer seems to focus on distinguishing between SE and non-SE. Since you accepted it (and asked this a year ago), I don't know if this is useful to you, but here you go.
: 0→Xmin
: 1→ΔX
: If Xmax=264
: Disp "TI-84 CE
Because the CE's screen is wider, the auto-generated Xmax is set to a higher value (264) than it would be on a regular TI-84. You can also store the window variables used, set them to something else, and restore them afterwards to keep the graph screen unaffected.
Great question! The only thing I could think of off the top of my head is the processor speed difference (or RAM/ROM difference, but I couldn't think of a way to test that without assembly). Unfortunately, the TI-83 doesn't have a built-in clock, but some code like this should be able to tell the difference between a TI-84 and a TI-84 SE:
:startTmr→T
:For(I,1,99
:e^(9
:End
:sub("TI-84+ SE",1,6+3(19>checkTmr(T

Hadoop for the Wikipedia pagecount dataset

I want to build a Hadoop job that basically takes the Wikipedia pagecount statistics as input and creates a list like
en-Articlename: en:count de:count fr:count
For that I need the article names in the different languages - i.e. Bruges (en, fr), Brügge (de) - which I can query from the MediaWiki API article by article (http://en.wikipedia.org/w/api.php?action=query&titles=Bruges&prop=langlinks&lllimit=500).
My question is about the right approach to solve this problem.
My sketched approach would be:
1. Process the pagecount file line by line (example line: 'de Brugge 2 48824').
2. Query the MediaWiki API and write something like 'en-Articlename: processed-language-key:count'.
3. Aggregate all en-Articlename values into one line (maybe in a second job?).
Querying the MediaWiki API for every line seems rather unhandy, but I currently can't get my head around a better solution.
Do you think this approach is feasible, or can you think of a different one?
On a side note: the resulting job chain shall be used for some time measurements on my (small) Hadoop cluster, so altering the task is still okay.
Edit:
Here is a quite similar discussion which I just found.
I think it isn't a good idea to query the MediaWiki API during your batch processing, because of:
network latency (your processing will be considerably slowed down)
single point of failure (if the API or your internet connection goes down, your calculation will be aborted)
external dependency (it's hard to repeat the calculation and get the same result)
legal issues and the possibility of a ban
A possible solution to your problem is to download the whole Wikipedia dump. Each article contains links to the same article in the other languages in a predefined format, so you can easily write a map/reduce job that collects that information and builds a correspondence between the English article name and the rest.
Then you can use that correspondence in a map/reduce job that processes the pagecount statistics. If you do that, you'll become independent of MediaWiki's API, speed up your data processing, and improve debugging.
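For the pagecount side of this, a Hadoop Streaming mapper/reducer pair in Python could look roughly like the sketch below. It only does the per-title aggregation of counts by language; the English-name correspondence built from the dump would have to be joined in as a separate step. The field layout follows the 'de Brugge 2 48824' example, and the script name and job options are placeholders.
#!/usr/bin/env python
# Rough Hadoop Streaming sketch for the pagecount aggregation step only.
# Input lines look like: "de Brugge 2 48824" (project, title, count, bytes).
import sys

def mapper():
    for line in sys.stdin:
        parts = line.split()
        if len(parts) < 3:
            continue                      # skip malformed lines
        project, title, count = parts[0], parts[1], parts[2]
        lang = project.split(".")[0]      # "de", "en.m", ... -> language code
        print("%s\t%s:%s" % (title, lang, count))

def reducer():
    # Hadoop sorts mapper output by key, so lines for one title arrive together.
    current, counts = None, []
    for line in sys.stdin:
        title, lang_count = line.rstrip("\n").split("\t", 1)
        if title != current:
            if current is not None:
                print("%s: %s" % (current, " ".join(counts)))
            current, counts = title, []
        counts.append(lang_count)
    if current is not None:
        print("%s: %s" % (current, " ".join(counts)))

if __name__ == "__main__":
    # Run as the map or reduce side of a streaming job, e.g.:
    #   hadoop jar hadoop-streaming.jar -mapper "pagecounts.py map" \
    #       -reducer "pagecounts.py reduce" -input ... -output ...
    (mapper if sys.argv[1] == "map" else reducer)()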

Resources