Does anybody know how to obtain confidence intervals for polychoric and tetrachoric correlations in R?

I tried John Fox's "polycor" package, but it does not show the CIs. I have also tried the "psych" package, but with no luck: the only things I got were the standard errors and the thresholds. Any help will be greatly appreciated.

Try the cor.ci function in psych. With poly = TRUE it computes the polychoric/tetrachoric correlations and finds their confidence intervals by bootstrapping.
In addition, cor.plot.upperLowerCi will then display a correlation plot showing the upper and lower confidence values.
library(psych)  # provides cor.ci and cor.plot.upperLowerCi (the bfi data ships with psych, or psychTools in newer versions)
p.c <- cor.ci(bfi[1:200, 1:5], poly = TRUE)   # bootstrapped CIs for the polychoric correlations
cor.plot.upperLowerCi(p.c, numbers = TRUE)    # plot the upper and lower confidence bounds

Related

Statistics/Algorithm: How do I compare a weekly graph with its own history to see when in the past it was almost the same?

I’ve got a statistical/mathematical problem I’m stumped on and I was really hoping to get some help. I’m working on a research project where I need to compare a weekly graph with its own history to see when in the past it was almost the same. Think of it as “finding the closest match”. The information is displayed as a line graph, but it’s readily available as raw data:
Date        Result
08/10/18    52.5
08/07/18    60.2
08/06/18    58.5
08/05/18    55.4
08/04/18    55.2
and so on...
What I really want is the output to be a form of correlation between the current data points and every other set of 5 consecutive data points in history. So, something like:
Date range            Correlation
07/10/18-07/15/18     0.98
We’ll be getting code written in Python for the software to do this automatically (so that as new data is added, it automatically runs and finds the set of numbers in history that most closely matches the current one).
Here’s where the difficulty sets in: since the numbers are on a general upward trend over time, we don’t want to compare absolute values (the numbers might never really match). One suggestion has been to compare the deltas (the rate of change as a percentage over the previous day), or to use a log scale.
I’m wondering: how do I go about this? What kind of calculation can I use to get the desired results? I’ve looked at different kinds of correlation equations, but they don’t account for the “shape” of the data; they generally just average it out. The shape of the line chart is the important thing.
Thanks very much in advance!
I would simply divide each week's data by its average (i.e., normalize each week to a mean of 1), then sum the squares of the day-by-day differences for each pair of weeks. This sum is what you want to minimize.
If you don't care how much a graph oscillates relative to its mean, you can also normalize the variance: for each week, calculate the mean and variance, then subtract the mean and divide by the square root of the variance. Each week will then have mean 0 and variance 1. Then minimize the sum of squared differences as before; a sketch of this follows below.
If normalizing the data is all you can change in your workflow, just do the normalization and leave out the sum-of-squares minimization part.
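A minimal Python sketch of the z-scored version (the names are illustrative; it assumes the series is a flat list of daily values, oldest first, with no missing days):

import numpy as np

def best_matching_week(values, window=5):
    # Find the start index of the historical window whose *shape* is
    # closest to the most recent window.
    x = np.asarray(values, dtype=float)

    def zscore(w):
        # normalize to mean 0, variance 1; assumes the window is not constant
        return (w - w.mean()) / w.std()

    current = zscore(x[-window:])
    best_start, best_score = None, np.inf
    # slide over history, excluding any overlap with the current window
    for start in range(len(x) - 2 * window + 1):
        candidate = zscore(x[start:start + window])
        score = np.sum((current - candidate) ** 2)  # sum of squared differences
        if score < best_score:
            best_start, best_score = start, score
    return best_start, best_score

Note that for z-scored windows the sum of squared differences equals 2·n·(1 − r), where r is the Pearson correlation between the two windows, so minimizing it is exactly the same as maximizing the correlation you asked about.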

Determine optimal cut-off value for data (in Matlab)

I realize this is an unspecific question (because I don't know a lot about the topic; please help me in this regard, too). That said, here's the task I'd like to achieve:
Find a statistically sound algorithm to determine an optimal cut-off value for binarizing a vector, i.e. for filtering out minimal values (getting rid of them). Here's Matlab code to visualize the problem:
randomdata = rand(1,100);  % 100 random values between 0 and 1
figure; plot(randomdata);  % plot the random data
cutoff = 0.5;              % candidate cut-off value
line(get(gca,'xlim'), [cutoff cutoff], 'Color', 'red');  % draw the cut-off line
Thanks
You could try Matlab's percentile function, prctile (in the Statistics Toolbox):
cutoff = prctile(randomdata, 10);  % everything below the 10th percentile gets cut off

Mood median test

I have two vectors, and I would like to use a statistical test to find out whether their medians are equal, but I don't know how to do that in R (I'm using RStudio).
Is there someone who could help me?
Thank you very much!
You should use boxplot.
Read its documentation. You can set notch = TRUE to add notches; if the notches of two boxes do not overlap, that is rough evidence that the medians differ.
Also, be sure to assign the result of boxplot to a name so you can gather the stats from it:
info <- boxplot(your.vectors, notch = TRUE)
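If stepping outside R is an option, SciPy ships Mood's median test directly; a minimal sketch (the two vectors are made-up example data):

from scipy.stats import median_test

x = [3.1, 4.2, 5.0, 4.8, 3.9, 4.4]  # made-up example data
y = [4.0, 5.1, 5.6, 4.9, 5.3, 6.0]

# H0: both samples come from distributions with the same median
stat, p_value, grand_median, table = median_test(x, y)
print(p_value)  # a small p-value is evidence that the medians differ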

Good algorithm for maximum likelihood estimation

I have a problem. I need to estimate some statistics with a GARCH/ARCH model. In Matlab I use something like this:
spec = garchset('P', 1, 'Q', 1)
[fit01,~,LogL01] = garchfit(spec, STAT);
This returns the three parameters of the GARCH model, estimated by maximum likelihood.
But I really need to know which algorithm is used in garchfit, because I need to write a program that does the same parameter estimation automatically.
My program is currently very slow and sometimes incorrect.
So the questions are:
How do I get the code of garchfit, or of MLE in general, in Matlab?
Does anyone know a good and fast algorithm for MLE?
(MLE = maximum likelihood estimation)
To see the code (if it's available) you can type edit garchfit.
The documentation of garchfit also contains this recommendation (in the original, each mention links to the estimate method of a different model class):
garchfit will be removed in a future release. Use the estimate method of garch, egarch, or gjr model objects instead.
My guess is that you want to look into garch.estimate.
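As for a good general algorithm: such routines typically maximize the (Gaussian) log-likelihood numerically with a quasi-Newton optimizer. A rough Python sketch of GARCH(1,1) estimation along those lines (an illustration, not Matlab's actual algorithm; in particular the stationarity constraint alpha + beta < 1 is not enforced here):

import numpy as np
from scipy.optimize import minimize

def garch11_negloglik(params, y):
    # params = (omega, alpha, beta); y = zero-mean return series
    omega, alpha, beta = params
    n = len(y)
    sigma2 = np.empty(n)
    sigma2[0] = np.var(y)  # initialize with the sample variance
    for t in range(1, n):
        sigma2[t] = omega + alpha * y[t-1]**2 + beta * sigma2[t-1]
    # negative Gaussian log-likelihood (additive constants dropped)
    return 0.5 * np.sum(np.log(sigma2) + y**2 / sigma2)

def fit_garch11(y):
    y = np.asarray(y, dtype=float)
    x0 = np.array([0.1 * np.var(y), 0.05, 0.90])  # rough starting values
    bounds = [(1e-8, None), (0.0, 1.0), (0.0, 1.0)]
    res = minimize(garch11_negloglik, x0, args=(y,),
                   method='L-BFGS-B', bounds=bounds)
    return res.x  # estimated (omega, alpha, beta)

A dedicated library (for example Python's arch package) handles the constraints, standard errors, and starting values much more robustly, so prefer that over rolling your own if you can.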

Equation for "importance" value of twitter user according to #followers #following

I am trying to find an equation that calculates the "importance" of a twitter user from #followers and #following.
Things I want to consider:
1. The bigger #followers / #following is, the more important the user is.
2. Differentiate between 20/20 and 10k/10k (10k/10k is more important, although the ratio is the same).
Considering these two points, I expect these two inputs to produce similar importance values:
#followers=1000 #following=100
#followers=30k #following=30k
I'm having trouble working the second point into the equation. I believe it should be quite simple. Help?
Thanks
One possibility is (#followers/#following) * [log(#followers) - CONST], where CONST is some predefined value, tuned as appropriate. This ensures the ratio keeps its importance, but the scale matters too.
For your last example you will need to set CONST ≈ 9.4 (this works out if the log is base 2) to achieve similar results.
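A quick check of that claim, assuming a base-2 logarithm (which is what makes CONST ≈ 9.4 come out; the answer does not state the base):

import math

def importance(followers, following, const=9.42):
    # base-2 log assumed; with it, CONST ~ 9.4 balances the two examples
    return (followers / following) * (math.log2(followers) - const)

print(importance(1000, 100))       # ~5.46
print(importance(30_000, 30_000))  # ~5.45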
There are many possible answers to this question; you need to decide how heavily to weight the number of followers against the ratio, so that you get a single number relating the two. For example, the first idea that comes to my mind is to multiply the ratio by the log of #followers. Something like this:
Importance = (#Followers / #Following)*Log(#Followers)
Based on what you said there, you could do 3*followers^2/following.
But you've described a system where users can increase their importance by following fewer other users. Doesn't seem too awesome.
You could normalize it by the total number of users.
I'd suggest taking logarithms of all the values, to get a less dramatic increase at higher values:
(log(#followers)/log(#TotalNumberOfPeopleInTwitter))*(log(#followers)/log(#following))
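A quick sanity check of this formula on the asker's two example users (the total-user count here is an arbitrary assumption):

import math

TOTAL_USERS = 300_000_000  # assumed size of the user base (arbitrary)

def importance(followers, following):
    scale = math.log(followers) / math.log(TOTAL_USERS)  # rewards absolute size
    ratio = math.log(followers) / math.log(following)    # rewards the ratio, damped by logs
    return scale * ratio

print(importance(1000, 100))       # ~0.53
print(importance(30_000, 30_000))  # ~0.53

Both example users come out almost equal, which matches the behaviour the asker wanted.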
