My understanding of Common Lisp pseudorandom number generation is that (random 1.0) will generate a fraction strictly less than 1. I would like to get numbers upto 1.0 inclusive. Is this possible? I guess I could decide on a degree of precision and generate integers and divide by the range but I'd like to know if there is a more widely accepted way of doing this. Thanks.
As you say, random will generate numbers in [0,1) by default, and in general (random x) will generate random numbers in [0,x). If these were real numbers and if the distribution really is random, then the probability of getting any number is zero, so this is effectively no different than [0,1]. But they're not real numbers: they're floats, so the probability of getting any particular value is higher since there are only a finite number of floats in [0,1].
Fortunately you can express exactly what you want: CL has a bunch of constants with names like *-epsilon which are defined so that, for instance
(/= (+ 1.0f0 single-float-epsilon) 1.0f0)
and single-float-epsilon is the smallest single-float for which this is true.
Thus (random (+ 1.0f0 single-float-epsilon)) will produce random single-floats in the range [0,1], and will eventually probably turn out 1.0f0. You can test this:
(defun tsit ()
(let ((f (+ 1.0f0 single-float-epsilon)))
(assert (/= f 1.0f0) (f) "oops")
(loop for i upfrom 1
for v = (random f)
when (= v 1.0f0)
return (values i v))))
And for me
> (tsit)
12839205
1.0
If you use double floats it takes ... quite a lot longer ... to get 1.0d0 (and remember to use double-float-epsilon).
I have a bit of a different idea here. Instead of trying to stretch the range over an epsilon, we can work with the original range, and pick a victim number somewhere in that range which gets mapped to the range limit. We can avoid a hard-coded victim by choosing one randomly, and changing it from time to time:
(defun make-random-gen (range)
(let ((victim nil)
(count 1))
(lambda ()
(when (zerop (decf count))
(setf count 10000
victim (random range)))
(let ((out (random range)))
(if (eql out victim) range out)))))
(defun testit ()
(loop with r = (make-random-gen 1.0)
for x = (funcall r)
until (eql x 1.0)
counting t))
At the listener:
[5]> (testit)
23030093
There is a small bias here in that the victim is never equal to range. So that is to say, the range value such as 1.0 is never victim and therefore always has a certain chance of occurring. Whereas every other value can potentially take a turn at being victim, having its chance of occurring temporarily reduced to zero. That should be faintly detectable in a statistical analysis of the output in that the range value will occur slightly more often than any other value.
It would be interesting to update this approach with a correction for that, an attempt to do which is this:
(defun make-random-gen (range)
(let ((victim nil)
(count 1))
(labels ((gen ()
(when (zerop (decf count))
(setf count 10000
victim (gen)))
(let ((out (random range)))
(if (eql out victim) range out))))
#'gen)))
Now when we select victim, we recurse on our own function which can potentially select range. Whenever range is selected as victim, that value is correctly suppressed: range will not occur in the output, because out will never be eql to range.
We can justify this with the following hand-waving argument:
Let us suppose that the recursive call to gen has a slight bias in favor of range being output. But whenever that happens, range is selected as victim, which prevents it from appearing in the output of gen.
There is a kind of negative feedback which should almost entirely correct the bias.
Note: our random-number-generating lambda would be better designed if it also captured a random state object also and used that. Then the sequence it yields would be undisturbed by other uses of the pseudo-random-number generator. That's a different topic.
On a theoretical note, note that neither [0, 1) nor [0, 1] yield strictly correct distributions. If we had a mathematically ideal PRNG, it would yield actual real numbers in these ranges. Since that range contains an uncountable infinity of real values, each one would occur with a zero probability: 1/aleph-null, which, I'm guessing, so tiny, that it cannot be distinguished from a real zero.
What we want is the floating-point PRNG to approximate the ideal PRNG.
The problem is that each floating-point value approximates a range of real values. So this means that if we have a generator of values in the range 0.0 to 1.0, it actually represents a range of real numbers from -epsilon to 1.0 + epsilon. If we take values from this PRNG and plot a bar graph of values, each bar in the graph has to have some nonzero width. The 0.0 bar is centered on 0, and the 1.0 bar is centered on 1. The distribution of real numbers extends from the left edge of the left bar, to the right edge of the right bar.
In order to create a PRNG which mimics an even distribution of values in the 0.0 to 1.0 interval, we have to include the 0.0 and 1.0 values with half probability. So that is to say, when we collect a large number of values from the PRNG, the 0.0 and 1.0 bars of the graph should be about half as high as all the other bars.
Under these conditions, we cannot distinguish the [0, 1.0) interval from the [0, 1.0] interval because they are exactly as large. We must include the 1.0 value, at about half the usual probability to account for the above uniformity problem. If we simply exclude that value, we create a bias in the wrong direction, because the 1.0 bar in the histogram now has a zero value.
One way we could rescue the situation might be to take the 1.0-epsilon bar of the histogram and make that value 50% more likely, so that the bar is 50% taller than average. Basically, we overload that last value of the range just before 1.0 to represent everything up to and not including 1.0, requiring that value to be more likely. And then, we exclude the 1.0 value from the output. All values approaching 1.0 from the left get mapped to the extra 50% probability of 1.0 - epsilon.
Related
This question already has answers here:
Round to nearest multiple of a number
(3 answers)
Closed last year.
Let’s imagine I would like to round a number (i.e x = 7.4355) to a given arbitrary precision (i.e p = 0.002). In this case, I would expect to see:
round_arbitrary(x, p) = 7.436
What would be the best approach to design such a rounding function? Ideas in pseudocode or Rust are welcome
What would be the best approach to design such a rounding function?
An approach that gets near to OP's goal:
// Pseudo code (p != 0)
round_arbitrary(x, p)
x /= p
x = round(x)
return x*p
A key point is that floating point numbers are finite in size and so can represent about 264 different values exactly whereas code values like 7.4355, 0.002 and the math quotient 1/7.0 are of a much bigger set. Thus the above will get one close, but not certainty to an exact mathematically rounded value.
More advanced code would avoid overflow by not rounding large values which do not need rounding.
// Assume 0 < |p| < 1.0
round_arbitrary_2(x, p)
if (round(x) != x)
x /= p
x = round(x)
x *= p;
return x*p
Deeper
This issues lies with floating point numbers that are encoded with an integer times a power-of-2. Then the question is not so much "How to round to an arbitrary (non power-of-ten) precision", but "How to round to an arbitrary (non power-of-2) precision".
Im trying to construct and compare, the fastest possible way, two 01 random vectors of the same length using Julia, each vector with the same number of zeros and ones.
This is all for a MonteCarlo simulation of the following probabilistic question
We have two independent urns, each one with n white balls and n black balls. Then we take a pair of balls, one of each urn, each time up to empty the urns. What is the probability that each pair have the same color?
What I did is the following:
using Random
# Auxiliar function that compare the parity, element by element, of two
# random vectors of length 2n
function comp(n::Int64)
sum((shuffle!(Vector(1:2*n)) .+ shuffle!(Vector(1:2*n))).%2)
end
The above generate two random permutations of the vector from 1 to 2n, add element by element, apply modulo 2 to each elemnt and after sum all the values of the remaining vector. Then Im using above the parity of each number to model it color: odd black and white even.
If the final sum is zero then the two random vectors had the same colors, element by element. A different result says that the two vectors doesnt had paired colors.
Then I setup the following function, that it is just the MonteCarlo simulation of the desired probability:
# Here m is an optional argument that control the amount of random
# experiments in the simulation
function sim(n::Int64,m::Int64=24)
# A counter for the valid cases
x = 0
for i in 1:2^m
# A random pair of vectors is a valid case if they have the
# the same parity element by element so
if comp(n) == 0
x += 1
end
end
# The estimated value
x/2^m
end
Now I want to know if there is a faster way to compare such vectors. I tried the following alternative construction and comparison for the random vectors
shuffle!( repeat([0,1],n)) == shuffle!( repeat([0,1],n))
Then I changed accordingly the code to
comp(n)
With these changes the code runs slightly slower, what I tested with the function #time. Other changes that I did was changing the forstatement for a whilestatement, but the computation time remain the same.
Because Im not programmer (indeed just yesterday I learn something of the Julia language, and installed the Juno front-end) then probably will be a faster way to make the same computations. Some tip will be appreciated because the effectiveness of a MonteCarlo simulation depends on the number of random experiments, so the faster the computation the larger values we can test.
The key cost in this problem is shuffle! therefore in order to maximize the simulation speed you can use (I add it as an answer as it is too long for a comment):
function test(n,m)
ref = [isodd(i) for i in 1:2n]
sum(all(view(shuffle!(ref), 1:n)) for i in 1:m) / m
end
What are the differences from the code proposed in the other answer:
You do not have to shuffle! both vectors; it is enough to shuffle! one of them, as the result of the comparison is invariant to any identical permutation of both vectors after independently shuffling them; therefore we can assume that one vector is after random permutation reshuffled to be ordered so that it has trues in the first n entries and falses in the last n entries
I do shuffle! in-place (i.e. ref vector is allocated only once)
I use all function on the fist half of the vector; this way the check is stopped as I hit first false; if I hit all true in the first n entries I do not have to check the last n entries as I know they are all false so I do not have to check them
To get something cleaner, you could generate directly vectors of 0/1 values, and then just let Julia check for vector equality, e.g.
function rndvec(n::Int64)
shuffle!(vcat(zeros(Bool,n),ones(Bool,n)))
end
function sim0(n::Int64, m::Int64=24)
sum(rndvec(n) == rndvec(n) for i in 1:2^m) / 2^m
end
Avoiding allocation makes the code faster, as explained by Bogumił Kamiński (and letting Julia make the comparison is faster than his code).
function sim1(n::Int64, m::Int64=24)
vref = vcat(zeros(Bool,n),ones(Bool,n))
vshuffled = vref[:]
sum(shuffle!(vshuffled) == vref for i in 1:2^m) / 2^m
end
To go even faster use lazy evaluation and fast exit: if the first element is different, you don't even need to generate the rest of the vectors.
This would make the code much trickier though.
I find it's a bit not in the spirit of the question, but you could also do some more math.
There is binomial(2*n, n) possible vectors generated and you could therefore just compute
function sim2(n::Int64, m::Int64=24)
nvec = binomial(2*n, n)
sum(rand(1:nvec) == 1 for i in 1:2^m) / 2^m
end
Here are some timings I obtain:
#time show(("sim0", sim0(6, 21)))
#time show(("sim1", sim1(6, 21)))
#time show(("sim2", sim2(6, 21)))
#time test(("test", test(6, 2^21)))
("sim0", 0.0010724067687988281) 4.112159 seconds (12.68 M allocations: 1.131 GiB, 11.47% gc time)
("sim1", 0.0010781288146972656) 0.916075 seconds (19.87 k allocations: 1.092 MiB)
("sim2", 0.0010628700256347656) 0.249432 seconds (23.12 k allocations: 1.258 MiB)
("test", 0.0010166168212890625) 1.180781 seconds (2.14 M allocations: 98.634 MiB, 2.22% gc time)
I want to run tests with randomized inputs and need to generate 'sensible' random
numbers, that is, numbers that match good enough to pass the tested function's
preconditions, but hopefully wreak havoc deeper inside its code.
math.random() (I'm using Lua) produces uniformly distributed random
numbers. Scaling these up will give far more big numbers than small numbers,
and there will be very few integers.
I would like to skew the random numbers (or generate new ones using the old
function as a randomness source) in a way that strongly favors 'simple' numbers,
but will still cover the whole range, i.e., extending up to positive/negative infinity
(or ±1e309 for double). This means:
numbers up to, say, ten should be most common,
integers should be more common than fractions,
numbers ending in 0.5 should be the most common fractions,
followed by 0.25 and 0.75; then 0.125,
and so on.
A different description: Fix a base probability x such that probabilities
will sum to one and define the probability of a number n as xk
where k is the generation in which n is constructed as a surreal
number1. That assigns x to 0, x2 to -1 and +1,
x3 to -2, -1/2, +1/2 and +2, and so on. This
gives a nice description of something close to what I want (it skews a bit too
much), but is near-unusable for computing random numbers. The resulting
distribution is nowhere continuous (it's fractal!), I'm not sure how to
determine the base probability x (I think for infinite precision it would be
zero), and computing numbers based on this by iteration is awfully
slow (spending near-infinite time to construct large numbers).
Does anyone know of a simple approximation that, given a uniformly distributed
randomness source, produces random numbers very roughly distributed as
described above?
I would like to run thousands of randomized tests, quantity/speed is more
important than quality. Still, better numbers mean less inputs get rejected.
Lua has a JIT, so performance is usually not much of an issue. However, jumps based
on randomness will break every prediction, and many calls to math.random()
will be slow, too. This means a closed formula will be better than an
iterative or recursive one.
1 Wikipedia has an article on surreal numbers, with
a nice picture. A surreal number is a pair of two surreal
numbers, i.e. x := {n|m}, and its value is the number in the middle of the
pair, i.e. (for finite numbers) {n|m} = (n+m)/2 (as rational). If one side
of the pair is empty, that's interpreted as increment (or decrement, if right
is empty) by one. If both sides are empty, that's zero. Initially, there are
no numbers, so the only number one can build is 0 := { | }. In generation
two one can build numbers {0| } =: 1 and { |0} =: -1, in three we get
{1| } =: 2, {|1} =: -2, {0|1} =: 1/2 and {-1|0} =: -1/2 (plus some
more complex representations of known numbers, e.g. {-1|1} ? 0). Note that
e.g. 1/3 is never generated by finite numbers because it is an infinite
fraction – the same goes for floats, 1/3 is never represented exactly.
How's this for an algorithm?
Generate a random float in (0, 1) with a library function
Generate a random integral roundoff point according to a desired probability density function (e.g. 0 with probability 0.5, 1 with probability 0.25, 2 with probability 0.125, ...).
'Round' the float by that roundoff point (e.g. floor((float_val << roundoff)+0.5))
Generate a random integral exponent according to another PDF (e.g. 0, 1, 2, 3 with probability 0.1 each, and decreasing thereafter)
Multiply the rounded float by 2exponent.
For a surreal-like decimal expansion, you need a random binary number.
Even bits tell you whether to stop or continue, odd bits tell you whether to go right or left on the tree:
> 0... => 0.0 [50%] Stop
> 100... => -0.5 [<12.5%] Go, Left, Stop
> 110... => 0.5 [<12.5%] Go, Right, Stop
> 11100... => 0.25 [<3.125%] Go, Right, Go, Left, Stop
> 11110... => 0.75 [<3.125%] Go, Right, Go, Right, Stop
> 1110100... => 0.125
> 1110110... => 0.375
> 1111100... => 0.625
> 1111110... => 0.875
One way to quickly generate a random binary number is by looking at the decimal digits in math.random() and replace 0-4 with '1' and 5-9 with '1':
0.8430419054348022
becomes
1000001010001011
which becomes -0.5
0.5513009827118367
becomes
1100001101001011
which becomes 0.25
etc
Haven't done much lua programming, but in Javascript you can do:
Math.random().toString().substring(2).split("").map(
function(digit) { return digit >= "5" ? 1 : 0 }
);
or true binary expansion:
Math.random().toString(2).substring(2)
Not sure which is more genuinely "random" -- you'll need to test it.
You could generate surreal numbers in this way, but most of the results will be decimals in the form a/2^b, with relatively few integers. On Day 3, only 2 integers are produced (-3 and 3) vs. 6 decimals, on Day 4 it is 2 vs. 14, and on Day n it is 2 vs (2^n-2).
If you add two uniform random numbers from math.random(), you get a new distribution which has a "triangle" like distribution (linearly decreasing from the center). Adding 3 or more will get a more 'bell curve' like distribution centered around 0:
math.random() + math.random() + math.random() - 1.5
Dividing by a random number will get a truly wild number:
A/(math.random()+1e-300)
This will return an results between A and (theoretically) A*1e+300,
though my tests show that 50% of the time the results are between A and 2*A
and about 75% of the time between A and 4*A.
Putting them together, we get:
round(6*(math.random()+math.random()+math.random() - 1.5)/(math.random()+1e-300))
This has over 70% of the number returned between -9 and 9 with a few big numbers popping up rarely.
Note that the average and sum of this distribution will tend to diverge towards a large negative or positive number, because the more times you run it, the more likely it is for a small number in the denominator to cause the number to "blow up" to a large number such as 147,967 or -194,137.
See gist for sample code.
Josh
You can immediately calculate the nth born surreal number.
Example, the 1000th Surreal number is:
convert to binary:
1000 dec = 1111101000 bin
1's become pluses and 0's minuses:
1111101000
+++++-+---
The first '1' bit is 0 value, the next set of similar numbers is +1 (for 1's) or -1 (for 0's), then the value is 1/2, 1/4, 1/8, etc for each subsequent bit.
1 1 1 1 1 0 1 0 0 0
+ + + + + - + - - -
0 1 1 1 1 h h h h h
+0+1+1+1+1-1/2+1/4-1/8-1/16-1/32
= 3+17/32
= 113/32
= 3.53125
The binary length in bits of this representation is equal to the day on which that number was born.
Left and right numbers of a surreal number are the binary representation with its tail stripped back to the last 0 or 1 respectively.
Surreal numbers have an even distribution between -1 and 1 where half of the numbers created to a particular day will exist. 1/4 of the numbers exists evenly distributed between -2 to -1 and 1 to 2 and so on. The max range will be negative to positive integers matching the number of days you provide. The numbers go to infinity slowly because each day only adds one to the negative and positive ranges and days contain twice as many numbers as the last.
Edit:
A good name for this bit representation is "sinary"
Negative numbers are transpositions. ex:
100010101001101s -> negative number (always start 10...)
111101010110010s -> positive number (always start 01...)
and we notice that all bits flip accept the first one which is a transposition.
Nan is => 0s (since all other numbers start with 1), which makes it ideal for representation in bit registers in a computer since leading zeros are required (we don't make ternary computer anymore... too bad)
All Conway surreal algebra can be done on these number without needing to convert to binary or decimal.
The sinary format can be seem as a one plus a simple one's counter with a 2's complement decimal representation attached.
Here is an incomplete report on finary (similar to sinary): https://github.com/peawormsworth/tools/blob/master/finary/Fine%20binary.ipynb
How would you implement a function that is returning a random number from interval 1..1000
in the case there is a number N determining the chance of reaching higher numbers or lower numbers?
It should behave as follows:
e.g.
if N = 0 and we will generate many times the random number we will get a certain equilibrium (every number from interval 1..1000 has equal chance).
if N = 2321 (I call it positive factor) it will be very hard to achieve small number (often will be generated numbers > 900, sometimes numbers near 500 and rarely numbers < 100). The highest the positive factor the highest probability for high numbers
if N = -2321 (negative factor) this will be the opposite of positive factor
It's clear that the generated numbers will create for given N certain characteristic curve. Could you advise me how to achieve this goal and what curves can I create? What possibilities do I have here? How would you limit positive and negative factors etc.
thank you for help
If you generate a uniform random number, and then raise it to a power > 1, it will get smaller, but stay in the range [0, 1]. If you raise it to a power greater than 0 but less than 1, it will get larger, but stay in the range [0, 1].
So you can use the exponent to pick a power when generating your random numbers.
def biased_random(scale, bias):
return random.random() ** bias * scale
sum(biased_random(1000, 2.5) for x in range(100)) / 100
291.59652962214676 # average less than 500
max(biased_random(1000, 2.5) for x in range(100))
963.81166161355998 # but still occasionally generates large numbers
sum(biased_random(1000, .3) for x in range(100)) / 100
813.90199860117821 # average > 500
min(biased_random(1000, .3) for x in range(100))
265.25040459294883 # but still occasionally generates small numbers
This problem is severely underspecified. There are a million ways to solve it as it is mentioned.
Instead of arbitrary positive and negative values, try to think what is the meaning behind them. IMHO, beta distribution is the one you should consider. By selecting the parameters \alpha and \beta you should be appropriately modulate the behavior of your distribution.
See what shapes you can get with certain \alpha and \beta http://en.wikipedia.org/wiki/Beta_distribution#Shapes
http://en.wikipedia.org/wiki/File:Beta_distribution_pdf.svg
Lets for beginning decide that we will pick numbers from [0,1] because it makes stuff simpler.
n is number that represents distribution (0,2321 or -2321) as in example
We need solution only for n > 0, because if n < 0. You can take positive version of n and subtract from 1.
One simple idea for PDF in interval [0,1] is x^n. (or at least this kind of shape)
Calculating CDF is then integrating x^n and is x^(n+1)/(n+1)
Because CDF must be 1 at the end (in our case at 1) our final CDF is than x^(n+1) and is properly weighted
In order to generate this kind distribution from this, we must calaculate quantile function
Quantile function is just inverse of CDF and is in our case. x^(1/(n+1))
And that is it. Your QF is x^(1/(n+1))
To generate numbers from [0,1] you have to pick uniformly distributetd random from [0,1] (most common random function in programming languages)
and than power this whit (1/(n+1))
Only problem I see is that it can be problem to calculate 1-x^(1/(-n+1)) correctly, where n < 0 but i think that you can use log1p,
so it becomes exp(log1p(-x^(1/(-n+1))) if n<0
conclusion whit normalizations
if n>=0: (x^(1/(n/1000+1)))*1000
if n<0: exp(log1p(-(x^(1/(-(n/1000)+1)))))*1000
where x is uniformly distributed random value in interval [0,1]
I have some periodic data, but the amount of data is not a multiple of
the period. How can I Fourier analyze this data? Example:
% Let's create some data for testing:
data = Table[N[753+919*Sin[x/623-125]], {x,1,25000}]
% I now receive this data, but have no idea that it came from the
formula above. I'm trying to reconstruct the formula just from 'data'.
% Looking at the first few non-constant terms of the Fourier series:
ListPlot[Table[Abs[Fourier[data]][[x]], {x,2,20}], PlotJoined->True,
PlotRange->All]
shows an expected spike at 6 (since the number of periods is really
25000/(623*2*Pi) or about 6.38663, though we don't know this).
% Now, how do I get back 6.38663? One way is to "convolve" the data with
arbitrary multiples of Cos[x].
convolve[n_] := Sum[data[[x]]*Cos[n*x], {x,1,25000}]
% And graph the "convolution" near n=6:
Plot[convolve[n],{n,5,7}, PlotRange->All]
we see a spike roughly where expected.
% We try FindMaximum:
FindMaximum[convolve[n],{n,5,7}]
but the result is useless and inaccurate:
FindMaximum::fmmp:
Machine precision is insufficient to achieve the requested accuracy or
precision.
Out[119]= {98.9285, {n -> 5.17881}}
because the function is very wiggly.
% By refining our interval (using visual analysis on the plots), we
finally find an interval where convolve[] doesn't wiggle too much:
Plot[convolve[n],{n,6.2831,6.2833}, PlotRange->All]
and FindMaximum works:
FindMaximum[convolve[n],{n,6.2831,6.2833}] // FortranForm
List(1.984759605826571e7,List(Rule(n,6.2831853071787975)))
% However, this process is ugly, requires human intervention, and
computing convolve[] is REALLY slow. Is there a better way to do this?
% Looking at the Fourier series of the data, can I somehow divine the
"true" number of periods is 6.38663? Of course, the actual result
would be 6.283185, since my data fits that better (because I'm only
sampling at a finite number of points).
Based on Mathematica help for the Fourier function / Applications / Frequency Identification:
Checked on version 7
n = 25000;
data = Table[N[753 + 919*Sin[x/623 - 125]], {x, 1, n}];
pdata = data - Total[data]/Length[data];
f = Abs[Fourier[pdata]];
pos = Ordering[-f, 1][[1]]; (*the position of the first Maximal value*)
fr = Abs[Fourier[pdata Exp[2 Pi I (pos - 2) N[Range[0, n - 1]]/n],
FourierParameters -> {0, 2/n}]];
frpos = Ordering[-fr, 1][[1]];
N[(pos - 2 + 2 (frpos - 1)/n)]
returns 6.37072
Look for the period length using autocorrelation to get an estimate:
autocorrelate[data_, d_] :=
Plus ## (Drop[data, d]*Drop[data, -d])/(Length[data] - d)
ListPlot[Table[{d, autocorrelate[data, d]}, {d, 0, 5000, 100}]]
A smart search for the first maximum away from d=0 may be the best estimate you can get form the available data?
(* the data *)
data = Table[N[753+919*Sin[x/623-125]], {x,1,25000}];
(* Find the position of the largest Fourier coefficient, after
removing the last half of the list (which is redundant) and the
constant term; the [[1]] is necessary because Ordering returns a list *)
f2 = Ordering[Abs[Take[Fourier[data], {2,Round[Length[data]/2+1]}]],-1][[1]]
(* Result: 6 *)
(* Directly find the least squares difference between all functions of
the form a+b*Sin[c*n-d], with intelligent starting values *)
sol = FindMinimum[Sum[((a+b*Sin[c*n-d]) - data[[n]])^2, {n,1,Length[data]}],
{{a,Mean[data]},{b,(Max[data]-Min[data])/2},{c,2*f2*Pi/Length[data]},d}]
(* Result (using //InputForm):
FindMinimum::sszero:
The step size in the search has become less than the tolerance prescribed by
the PrecisionGoal option, but the gradient is larger than the tolerance
specified by the AccuracyGoal option. There is a possibility that the method
has stalled at a point that is not a local minimum.
{2.1375902350021628*^-19, {a -> 753., b -> -919., c -> 0.0016051364365971107,
d -> 2.477886509998064}}
*)
(* Create a table of values for the resulting function to compare to 'data' *)
tab = Table[a+b*Sin[c*x-d], {x,1,Length[data]}] /. sol[[2]];
(* The maximal difference is effectively 0 *)
Max[Abs[data-tab]] // InputForm
(* Result: 7.73070496506989*^-12 *)
Although the above doesn't necessarily fully answer my question, I found it
somewhat remarkable.
Earlier, I'd tried using FindFit[] with Method -> NMinimize (which is
supposed to give a better global fit), but that didn't work well,
possibly because you can't give FindFit[] intelligent starting values.
The error I get bugs me but appears to be irrelevant.