How to fit and calculate the average for data sets in xmgrace? - curve-fitting

I have data that I would like to fit to the function Y = a + (1-a)*exp(-x/T) in order to get the value of T.
I want to do this using Xmgrace but I do not know how.
Thanks for your suggestions.

In the xmgrace window, click Data -> Transformations -> Non-linear curve fitting.
In the Formula field, type
a0 + (1-a0)*exp(-x/a1)
You have two parameters, a0 and a1. In the Parameters section, select 2. Enter an initial guess and range for each parameter, plus a tolerance and a number of iterations; the default values normally suffice for tolerance and iterations.
Hit Apply, and keep hitting it until a good fit is obtained.
Note - A good guess of initial parameters will help you get a good fit faster.
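If it is ever convenient to do the same fit outside Xmgrace, here is a minimal Python sketch of the same model using scipy.optimize.curve_fit (the arrays x and y are assumed to hold your data; the synthetic data and initial guesses are only placeholders):

import numpy as np
from scipy.optimize import curve_fit

# model: y = a + (1 - a) * exp(-x / T)
def model(x, a, T):
    return a + (1.0 - a) * np.exp(-x / T)

# x, y are assumed to be your data; synthetic data used here as a placeholder
x = np.linspace(0.0, 10.0, 50)
y = model(x, 0.2, 2.5) + 0.01 * np.random.randn(x.size)

# p0 plays the role of the initial guesses for a0 and a1
popt, pcov = curve_fit(model, x, y, p0=[0.5, 1.0])
print("a =", popt[0], "T =", popt[1])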

Related

how to plot variables with possibly wild variable values?

I want to build an application that would do something roughly equivalent to running lsof in a loop (perhaps changing its output format, because string processing may keep it from being real-time enough) and then associate each line (entry) with the iteration in which it was present; I will refer to these iterations as frames, as that will make things easier to follow later on. My intention is that showing the times during which files are open by applications can reveal something about their structure, while having little impact on their execution, which is often a problem. One problem I have is with processing the output, which would be a table relating frames x entries, and I am already anticipating wildly variable entry lengths. This runs into the classic problem of representing very different scales geometrically: the smaller values become vanishingly small while the bigger ones become giant, and fragmentation makes it even worse. So my question is whether plotting libraries deal with this problem, and how they do it.
The easiest and most well-established technique for showing both small and large values in reasonable detail is a logarithmic scale: instead of plotting raw values, plot their logarithms. This is notoriously problematic if you can have zero or negative values, but as I understand your situation all your lengths are strictly positive, so it should work.
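For example, with matplotlib this is a one-line change (a minimal sketch; the lengths list is just placeholder data):

import matplotlib.pyplot as plt

lengths = [3, 12, 150, 7, 90000, 42]   # placeholder entry lengths
plt.plot(lengths, marker="o")
plt.yscale("log")                      # logarithmic axis instead of taking log(values) by hand
plt.ylabel("entry length")
plt.show()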
Another statistical transformation you could apply is to plot ranks instead of raw values. Take all the observed values and put them in a sorted list. When plotting a single data point, instead of plotting the value itself, look that value up in the sorted list (binary search works, since the list is sorted) and plot the index at which you found it.
This is a monotonic transformation, so small values map to small indices and big values to big indices. On the other hand it completely discards the actual magnitudes; only the relative comparisons matter.
If this is too radical, you could consider using it as an ingredient for something more tuneable. You could experiment with a linear combination, i.e. plot
a*x + b*log(x) + c*rank(x)
then tweak a, b and c till the result looks pleasing.
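A rough NumPy sketch of the rank transform and the tunable combination (the values array and the weights a, b, c are placeholders to experiment with):

import numpy as np

values = np.array([3.0, 12.0, 150.0, 7.0, 90000.0, 42.0])  # placeholder entry lengths

# rank transform: index of each value in the sorted list of all observed values
sorted_vals = np.sort(values)
rank = np.searchsorted(sorted_vals, values)

# tunable linear combination of raw value, logarithm and rank
a, b, c = 0.0, 1.0, 0.5
combined = a * values + b * np.log(values) + c * rank
print(combined)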

Outcome difference: using list & for-loop vs. single parameter input

This is my first question, so please let me know if I'm not giving enough details or asking a question that is not relevant on this platform!
I want to compute the same formula over a grid running from 0 to 4.0209, so I'm using a for-loop over an array defined with NumPy.
To be certain that the for-loop is right, I've also computed a selection of values by plugging specific values of the radius into the formula directly.
Now, the outcomes for the same radius inputs are slightly different. Am I interpreting my grid wrongly, or is there an error in my script?
It is probably something pretty straightforward, but maybe some of you can find a minute to help me out.
Here I use a selection of values for my radius parameter. Here I use a for-loop to compute over the distance grid.
Here are the differences in the outcomes (for-loop vs. selected radii):
9443.08675390222 vs. 9444.74817873163
1935.51047523251 vs. 1938.91852645833
57.1740507557277 vs. 57.4765994533098
1.68889402648458 vs. 1.70381552377580
0.0206826744240327 vs. 0.0209573782779846
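Without the script it is hard to be certain, but a common cause of this kind of discrepancy is that the grid points generated by NumPy are close to, but not exactly equal to, the hand-picked radii. A quick check (a rough sketch; the grid spacing and the selected radii below are assumptions):

import numpy as np

grid = np.linspace(0.0, 4.0209, 100)              # assumed grid from 0 to 4.0209
selected = np.array([0.5, 1.0, 2.0, 3.0, 4.0])    # assumed hand-picked radii

# nearest grid point to each selected radius, and the mismatch between them
idx = np.abs(grid[:, None] - selected[None, :]).argmin(axis=0)
print(np.column_stack((selected, grid[idx], grid[idx] - selected)))

If the nearest grid points are not exactly the selected radii, the formula will give slightly different outcomes for them.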

Clustsig with modified method.distance

I am attempting to perform a simprof test using a Pearson correlation as the distance method. I am aware that it is designed for the typical distance methods such as Euclidean or Bray-Curtis, but it supposedly allows any function that returns a dist object.
My issue lies with the creation of that function. My original data consists of 35 rows and 2146 columns, and I wish to correlate the columns. A small subset of that data is shown below (the vectors a, b and c).
I need a function that uses the absolute value of the Pearson correlation coefficient as the method.distance function. I can calculate the pieces individually, as in the CorrelationSmall and CCsmallAbs lines below, but I have no idea how to combine all of that into a single function. My attempt is the dist3 function below, but I know that as.dist needs the matrix of correlation coefficients, which you can only get from CorrelationSmall$r. I'm assuming it needs to be nested, but I'm at a loss. I apologize if I'm asking something ridiculous; I have combed the forums and don't know who else to ask. Many thanks!
library(clustsig)
library(Hmisc)
library(readr)  # provides read_csv
NetworkAnalysisSmall <- read_csv("C:/Users/WilhelmLab/Desktop/Lena/NetworkAnalysisSmall.csv")
NetworkAnalysisSmallMatrix<-as.matrix(NetworkAnalysisSmall)
#subset of NetworkAnalysisSmall
a<-c(0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000001505,0.0000000000685,0.0000000009909,0.0000000001543,0.0000000000000,0.0000000000000,0.0000000000000)
b<-c(0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000002228,0.0000000000000,0.0000000001375,0.0000000000000,0.0000000000000)
c<-c(0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000546,0.0000000000000,0.0000000000000,0.0000000002293,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000540,0.0000000002085,0.0000000000000,0.0000000000000,0.0000000000000,0.0000000000000)
subset<-data.frame(a,b,c)
CorrelationSmall<-rcorr(as.matrix(NetworkAnalysisSmall),type=c("pearson"))
CCsmall<-CorrelationSmall$r
CCsmallAbs<-abs(CCsmall)
# attempt at a custom distance function: as.dist() here receives the whole rcorr
# object rather than the matrix of correlation coefficients in its $r component
dist3 = function(x) {
  as.dist(rcorr(as.matrix(x), type = c("pearson")))
}
NetworkSimprof <- simprof(NetworkAnalysisSmall, num.expected = 1000, num.simulated = 1000,
                          method.cluster = c("ward"), method.distance = dist3,
                          method.transform = c("log"), alpha = 0.05, sample.orientation = "column")

Statistics/Algorithm: How do I compare a weekly graph with its own history to see when in the past it was almost the same?

I’ve got a statistical/mathematical problem I’m stumped on and I was really hoping to get some help. I’m working on a research project where I need to compare a weekly graph with its own history to see when in the past it was almost the same. Think of this as “finding the closest match”. The information is displayed as a line graph, but it’s readily available as raw data:
Date        Result
08/10/18    52.5
08/07/18    60.2
08/06/18    58.5
08/05/18    55.4
08/04/18    55.2
and so on...
What I really want as output is a form of correlation between the current 5 data points and every other set of 5 consecutive data points in the history. So, something like:
Date range             Correlation
07/10/18-07/15/18      0.98
We’ll be getting code written in Python for the software to do this automatically (so that as new data is added, it automatically runs and finds the set of numbers in the history that most closely matches the current one).
Here’s where the difficulty sets in: since the numbers are on a general upward trend over time, we don’t want to compare absolute values (the numbers might never really match). One suggestion has been to compare the deltas (rate of change as a percentage over the previous day), or to use a log scale.
I’m wondering: how do I go about this? What kind of calculation can I use to get the desired results? I’ve looked at the different kinds of correlation equations, but they don’t account for the “shape” of the data; they generally just average it out. The shape of the line chart is the important thing.
Thanks very much in advance!
I would simply divide the data of each week by its average (i.e., normalize each week to an average of 1), then sum the squares of the day-by-day differences between each pair of weeks. This sum is what you want to minimize.
If you don't care how much a graph oscillates around its mean, you can also normalize the variance: for each week, calculate the mean and variance, then subtract the mean and divide by the square root of the variance. Each week will then have mean 0 and variance 1. Then minimize the sum of squared differences as before.
If normalizing the data is all you can change in your workflow, just leave out the sum-of-squared-differences minimization part.
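A rough Python sketch of this idea (mean normalization, optionally full z-normalization, then the sum of squared differences against every past 5-day window; the series array and the helper names are only illustrative):

import numpy as np

def normalize(week, zscore=False):
    week = np.asarray(week, dtype=float)
    if zscore:
        return (week - week.mean()) / week.std()   # mean 0, variance 1
    return week / week.mean()                      # mean 1

def closest_match(series, window=5, zscore=False):
    series = np.asarray(series, dtype=float)
    current = normalize(series[-window:], zscore)
    best_start, best_score = None, np.inf
    # slide over all past windows that end before the current one starts
    for start in range(len(series) - 2 * window + 1):
        past = normalize(series[start:start + window], zscore)
        score = np.sum((current - past) ** 2)       # sum of squared differences
        if score < best_score:
            best_start, best_score = start, score
    return best_start, best_score

# usage with made-up data: an upward trend plus noise
data = np.linspace(0, 5, 200) + np.random.rand(200)
print(closest_match(data, window=5, zscore=True))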

In matlab, speed up cross correlation

I have a long time series, about 60000 samples, with some repeating and similar-looking signals in it (not entirely periodic). To identify the signals, I take one of them out, with a length of around 1000 samples, move it along my time series sample by sample, and compute the correlation coefficient (in Matlab: corrcoef). If this value is above some threshold, there is a match.
But this is excruciatingly slow (using a for loop to move the window).
Is there a way to speed this up, or is there already some mechanism in Matlab for this?
Many thanks
Edit: added information regarding using 'xcorr' instead:
If I use 'xcorr', or at least the way I have used it, I get the wrong picture. Looking at the data (first plot), there are two types of repeating signals: one marked by red rectangles, and another with much larger amplitudes (this is coherent noise) marked by a black rectangle. I am interested in the first type. The second plot shows the signal I am looking for, blown up.
If I use 'xcorr', I get the third plot. As you can see, 'xcorr' gives me the wrong signal (there is in fact high cross-correlation between my signal and the coherent noise).
But using 'corrcoef' and moving the window, I get the last plot, which is the correct one.
There may be a normalization problem when using 'xcorr', but I don't know.
I can think of two ways to speed things up.
1) Make your template 1024 elements long. With a power-of-two length, the correlation can be done using the FFT, which is significantly faster than an element-by-element multiplication at every position (see the sketch after this answer).
2) Ask yourself what it is about your template shape that you really care about. Do you really need the very high frequencies, or are you really after the lower frequencies? If you can re-sample your template and signal so they no longer contain any frequencies you don't care about, the processing will be very significantly faster. The steps would be:
determine the highest frequency you care about
filter your data so higher frequencies are blocked
resample the resulting data at a lower sampling frequency
Now combine that with a template whose size is a power of 2.
Let us know if any of the above helps!
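The question is about Matlab, but the FFT trick itself is language-agnostic; here is a rough NumPy sketch of the idea (synthetic signal and template; this computes the sliding dot product, which you would still need to normalize to get correlation coefficients):

import numpy as np

rng = np.random.default_rng(0)
signal = rng.standard_normal(60000)       # long time series (synthetic example)
template = signal[20000:21024].copy()     # 1024-sample template cut out of it

n = len(signal) + len(template) - 1
nfft = 1 << (n - 1).bit_length()          # next power of two for the FFT

# cross-correlation theorem: multiply one spectrum by the conjugate of the other
spec = np.fft.rfft(signal, nfft) * np.conj(np.fft.rfft(template, nfft))
sliding_dot = np.fft.irfft(spec, nfft)[:len(signal) - len(template) + 1]

print(sliding_dot.argmax())               # position of the strongest (unnormalized) match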
Your problem seems like a textbook example of cross-correlation, so there's no good reason to use any solution other than xcorr. A few technical comments:
xcorr assumes that the mean has been removed from the two cross-correlated signals, and by default it does not scale by the signals' standard deviations. Both issues can be solved by z-scoring your two signals: c = xcorr(zscore(longSig,1), zscore(shortSig,1)); c = c/n; where n is the length of the shorter signal. This should produce results equivalent to your sliding-window method.
xcorr's output is ordered according to lags, which can be obtained as a second output argument ([c,lags] = xcorr(...)). Always plot xcorr results with plot(lags,c). I recommend trying a synthetic signal to verify that you understand how to interpret this chart.
xcorr's implementation already uses the Discrete Fourier Transform, so unless you have unusual conditions it would be a waste of time to code a frequency-domain cross-correlation again.
Finally, a comment about terminology: correlating corresponding time points between two signals is plain correlation. That's what corrcoef does (its name stands for 'correlation coefficient'; no 'cross-correlation' there). Cross-correlation is the result of shifting one of the signals and calculating the correlation coefficient for each lag.
