So heres my question, I have a big 3dim array which is 100GB in size as a #zarr file (the array is more than twice the size). I have tried using the histogram from #Dask to calculate but I get an error saying that it cant do it because the file has tuples within tuples. Im guessing thats the zarr file formate rather than anything else?
any thoughts?
edit:
yes the bigger computer thing wouldnt actually work...
Im running a dask client on a single machine, it runsthe calculation but just gets stuck somewhere.
I just tried dask.map function across the file but when I plot it out I get something like this:
ValueError: setting an array element with a sequence.
heres a version of the script:
def histo(img):
return da.histogram(img, bins=255, range=[0, 255])
histo_1 = da.map_blocks(histo, fimg)
I am actually going to try and use it out side of the map function. I wonder rather than the map funtion, does the windowing from map blocks, actually cause the issue. well, ill let you know if it is or now....
edit 2
So I tried to remove the map blocks function as suggested and this was my result:
[in] h, bins =da.histogram(fused_crop, bins=255, range=[0, 255])
[in] bins
[out] array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9., 10.,
11., 12., 13., 14., 15., 16., 17., 18., 19., 20., 21.,
22., 23., 24., 25., 26., 27., 28., 29., 30., 31., 32.,
33., 34., 35., 36., 37., 38., 39., 40., 41., 42., 43.,
44., 45., 46., 47., 48., 49., 50., 51., 52., 53., 54.,
55., 56., 57., 58., 59., 60., 61., 62., 63., 64., 65.,
66., 67., 68., 69., 70., 71., 72., 73., 74., 75., 76.,
77., 78., 79., 80., 81., 82., 83., 84., 85., 86., 87.,
88., 89., 90., 91., 92., 93., 94., 95., 96., 97., 98.,
99., 100., 101., 102., 103., 104., 105., 106., 107., 108., 109.,
110., 111., 112., 113., 114., 115., 116., 117., 118., 119., 120.,
121., 122., 123., 124., 125., 126., 127., 128., 129., 130., 131.,
132., 133., 134., 135., 136., 137., 138., 139., 140., 141., 142.,
143., 144., 145., 146., 147., 148., 149., 150., 151., 152., 153.,
154., 155., 156., 157., 158., 159., 160., 161., 162., 163., 164.,
165., 166., 167., 168., 169., 170., 171., 172., 173., 174., 175.,
176., 177., 178., 179., 180., 181., 182., 183., 184., 185., 186.,
187., 188., 189., 190., 191., 192., 193., 194., 195., 196., 197.,
198., 199., 200., 201., 202., 203., 204., 205., 206., 207., 208.,
209., 210., 211., 212., 213., 214., 215., 216., 217., 218., 219.,
220., 221., 222., 223., 224., 225., 226., 227., 228., 229., 230.,
231., 232., 233., 234., 235., 236., 237., 238., 239., 240., 241.,
242., 243., 244., 245., 246., 247., 248., 249., 250., 251., 252.,
253., 254., 255.])
[in] h.compute
[out] <bound method DaskMethodsMixin.compute of dask.array<sum-aggregate, shape=(255,), dtype=int64, chunksize=(255,), chunktype=numpy.ndarray>>
im going to try in another notebook and see if it still occurs.
edit 3
its the stranges thing, but if I just declare the variable h, it comes out as one small element from the dask array?
edit
Strange, if i call the xarray.hist or the da.hist function, they both fall over. If I use the skimage.exposure.histogram it works but it appears that the zarr file is unpacked before the histogram is a calculated. Which is a bit of a problem...
Update 7th June 2020 (with a solution for not big but annoyingly medium data) see below for answer.
You probably want to use dask's function for this rather than map_blocks. For the latter, Dask expects the output of each call to be the same size as the input block, or a shape derived from the input block, instead of the one-dimensional fixed-size output of histogram.
h, bins =da.histogram(fused_crop, bins=255, range=[0, 255])
h.compute()
Update 7th June 2020 (with a solution for not big but annoyingly medium data):
So unfortunately I got a bit ill around this time and it took a while for me to feel a bit better. Then the pandemic happened and I was on full childcare duty. I tried lots of different option and what ultimately, this looked like was that the following:
1) if just using x.compute, the memory would very quickly fill up.
2) Using distributed would fill the hard drive with spill to disk and take hours but would hang and crash and not do anything because...it would compute (im guessing here but based on the graph and dask api) it would create a sub histogram array for every chunk... that would all need to be merged at some point.
3) The chunking of my data was sub optimal so the amount of tasks was massive but even then I couldn't compute a histogram when i improved the chunking.
In the end I looked for a dynamic way of updating the histogram data. So I used Zarr to do it, by computing to it. Since it allows conccurrent reads and writing functions. As a reminder : my data is a zarr array in 3 dims x,y,z and uncompressed 300GB but compressed it's about 100GB. On my 4 yr old laptop with 16GB of ram using the following worked (I should have said my data was 16 bit unsigned:
imgs = da.from_zarr(.....)
imgs2 = imgs.rechunk((a,b,c)) ## individual chunk dim per dim
h, bins = da.histogram(imgs2, bins = 255, range=[0, 65535]) # binning to 256
h_out = da.to_zarr(h, "histogram.zarr")
I ran the progress bar alongside the process and to get a histogram from file took :
[########################################] | 100% Completed | 18min 47.3s
Which I dont think is too bad for a 300GB array. Hopefully this helps someone else as well, thanks for the help earlier in the year #mdurant.
I have created a very basic script in pinescript.
study(title='Renko Strat w/ Alerts', shorttitle='S_EURUSD_5_[MakisMooz]', overlay=true)
rc = close
buy_entry = rc[0] > rc[2]
sell_entry = rc[0] < rc[2]
alertcondition(buy_entry, title='BUY')
alertcondition(sell_entry, title='SELL')
plot(buy_entry/10)
The problem is that I get a lot of duplicate alerts. I want to edit this script so that I only get a 'Buy' alert when the previous alert was a 'Sell' alert and visa versa. It seems like such a simple problem, but I have a hard time finding good sources to learn pinescript. So, any help would be appreciated. :)
One way to solve duplicate alters within the candle is by using "Once Per Bar Close" alert. But for alternative alerts (Buy - Sell) you have to code it with different logic.
I Suggest to use Version 3 (version shown above the study line) than version 1 and 2 and you can accomplish the result by using this logic:
buy_entry = 0.0
sell_entry = 0.0
buy_entry := rc[0] > rc[2] and sell_entry[1] == 0? 2.0 : sell_entry[1] > 0 ? 0.0 : buy_entry[1]
sell_entry := rc[0] < rc[2] and buy_entry[1] == 0 ? 2.0 : buy_entry[1] > 0 ? 0.0 : sell_entry[1]
alertcondition(crossover(buy_entry ,1) , title='BUY' )
alertcondition(crossover(sell_entry ,1), title='SELL')
You'll have to do it this way
if("Your buy condition here")
strategy.entry("Buy Alert",true,1)
if("Your sell condition here")
strategy.entry("Sell Alert",false,1)
This is a very basic form of it but it works.
You were getting duplicate alerts because the conditions were fulfulling more often. But with strategy.entry(), this won't happen
When the sell is triggered, as per paper trading, the quantity sold will be double (one to cut the long position and one to create a short position)
PS :You will have to add code to create alerts and enter this not in study() but strategy()
The simplest solution to this problem is to use the built-in crossover and crossunder functions.
They consider the entire series of in-this-case close values, only returning true the moment they cross rather than every single time a close is lower than the close two candles ago.
//#version=5
indicator(title='Renko Strat w/ Alerts', shorttitle='S_EURUSD_5_[MakisMooz]', overlay=true)
c = close
bool buy_entry = false
bool sell_entry = false
if ta.crossover(c[1], c[3])
buy_entry := true
alert('BUY')
if ta.crossunder(c[1], c[3])
sell_entry := true
alert('SELL')
plotchar(buy_entry, title='BUY', char='B', location=location.belowbar, color=color.green, offset=-1)
plotchar(sell_entry, title='SELL', char='S', location=location.abovebar, color=color.red, offset=-1)
It's important to note why I have changed to the indices to 1 and 3 with an offset of -1 in the plotchar function. This will give the exact same signals as 0 and 2 with no offset.
The difference is that you will only see the character print on the chart when the candle actually closes rather than watch it flicker on and off the chart as the close price of the incomplete candle moves.
I have measured data from MATLAB and I'm wondering how to best smooth the data?
Example data (1st colum=x-data / second-colum=y-data):
33400 209.11
34066 210.07
34732 212.3
35398 214.07
36064 215.61
36730 216.95
37396 218.27
38062 219.52
38728 220.11
39394 221.13
40060 221.4
40726 222.5
41392 222.16
42058 223.29
42724 222.77
43390 223.97
44056 224.42
44722 225.4
45388 225.32
46054 225.98
46720 226.7
47386 226.53
48052 226.61
48718 227.43
49384 227.84
50050 228.41
50716 228.57
51382 228.92
52048 229.67
52714 230.02
53380 229.54
54046 231.19
54712 231.00
55378 231.5
56044 231.5
56710 231.79
57376 232.26
58042 233.12
58708 232.65
59374 233.51
60040 234.16
60706 234.21
The data in the second column should be monoton but it isn't. How to make it smooth?
I could probably invent a short algorithm myself but I think it's a better way to use an established and proven one... do you know a good way to somehow integrate the outliners to make it a monoton curve!?
Thanks in advance
Monotone in your case is always increasing!
See the options below (1. Cobb-Douglas; 2. Quadratic; 3. Cubic)
clear all
close all
load needSmooth.dat % Your data
x=needSmooth(:,1);
y=needSmooth(:,2);
n=length(x);
% Figure 1
logX=log(x);
logY=log(y);
Y=logY;
X=[ones(n,1),logX];
B=regress(Y,X);
a=exp(B(1,1));
b=B(2,1);
figure(1)
plot(x,y,'k*')
hold
for i=1:n-1
plot([x(i,1);x(i+1,1)],[a*x(i,1)^b;a*x(i+1,1)^b],'k-')
end
%Figure 2
X=[ones(n,1),x,x.*x];
Y=y;
B=regress(Y,X);
c=B(1,1);
b=B(2,1);
a=B(3,1);
figure(2)
plot(x,y,'k*')
hold
for i=1:n-1
plot([x(i,1);x(i+1,1)],[c+b*x(i,1)+a*x(i,1)^2; c+b*x(i+1,1)+a*x(i+1,1)^2],'k-')
end
%Figure 3
X=[ones(n,1),x,x.*x,x.*x.*x];
Y=y;
B=regress(Y,X);
d=B(1,1);
c=B(2,1);
b=B(3,1);
a=B(4,1);
figure(3)
plot(x,y,'k*')
hold
for i=1:n-1
plot([x(i,1);x(i+1,1)],[d+c*x(i,1)+b*x(i,1)^2+a*x(i,1)^3; d+c*x(i+1,1)+b*x(i+1,1)^2+a*x(i+1,1)^3],'k-')
end
There are also some cooked functions in Matlab such as "smooth" and "spline" that should also work in your case since your data is almost monotone.