How can I resample only a certain type of feature? - resampling

I want to resample from a data set so that I only resample the positive cases, since my data is very unbalanced. Here is my code:
bootstrap=resample(Stroke['stroke']==1,replace=True,n_samples=800,random_state=1)
All I get back is an array of True/False values.

I got it working!
Here is the code:
bootstrap=resample(Stroke[(Stroke['stroke'] == 1)],replace=True,n_samples=800,random_state=101)
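The difference between the two calls is that Stroke['stroke']==1 is only a boolean mask, so resampling it returns booleans, while Stroke[mask] selects the matching rows first. A minimal sketch of this, assuming Stroke is a pandas DataFrame and resample is sklearn.utils.resample (the toy data below is made up for illustration):

```python
import pandas as pd
from sklearn.utils import resample

# Toy stand-in for the Stroke DataFrame (hypothetical values).
Stroke = pd.DataFrame({
    "age": [67, 45, 80, 52, 71, 39],
    "stroke": [1, 0, 1, 0, 0, 0],
})

# The comparison alone yields a boolean Series, one True/False per row.
mask = Stroke["stroke"] == 1

# Indexing the DataFrame with the mask keeps only the positive rows.
positives = Stroke[mask]

# Bootstrap 800 rows (with replacement) from the positive cases only.
bootstrap = resample(positives, replace=True, n_samples=800, random_state=101)
```

Every row of bootstrap then comes from the stroke == 1 subset, with all columns intact.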


(Using Julia) How can I reduce my data matrix by averaging values from the same hour?

I am trying to reduce the size of my data and I cannot make it work. I have data points taken every minute over 1 month, and I want to reduce this to one sample per hour. The problem is that some of my rows have "NA" values, so I delete these rows. As a result there are not exactly 60 points for every hour; the count varies.
I have a 'Timestamp' column. I have used it to make a 'datehour' column, which has the same value for all rows with the same date and hour. I want to average all the rows that share the same 'datehour' value.
How can I do this? I have tried the nested for loop and if statement below, but it takes very long to run.
Thanks for all your help! I am new to Julia and come from a Matlab background.
======= CODE ==========
uniquedatehour=unique(datehour,1)
index=[]
avedata=reshape([],0,length(alldata[1,:]))
for j in uniquedatehour
    for i in 1:length(datehour)
        if datehour[i]==j
            index=vcat(index,i)
        else
            rows=alldata[index,:]
            rows=convert(Array{Float64,2},rows)
            avehour=mean(rows,1)
            avedata=vcat(avedata,avehour)
            index=[]
            continue
        end
    end
end
There are several layers to optimizing this code. I am assuming that your data is sorted on datehour (your code assumes this).
Layer one: general recommendation
Wrap your code in a function. Executing code in global scope in Julia is much slower than executing it within a function. When you wrap it, either pass the data to the function as arguments or, if the data stays in global scope, declare it const.
Layer two: recommendations to your algorithm
A statement like [] creates an array with element type Any, which is slow; use a typed literal such as index=Int[] instead.
Using vcat as in index=vcat(index,i) is inefficient because it allocates a new array each time; it is better to do push!(index, i) in place.
It is better to preallocate avedata with e.g. fill(NA, length(uniquedatehour), size(alldata, 2)) and assign values into the existing matrix than to vcat onto it.
If I am not mistaken, your code produces incorrect results because it never flushes the last entry of the uniquedatehour vector (assume it has only one element and check what happens: avedata will have zero rows).
The line rows=convert(Array{Float64,2},rows) is probably not needed at all. If alldata is not a Matrix{Float64}, it is better to convert it once at the beginning with Matrix{Float64}(alldata).
You can change the line rows=alldata[index,:] to a view, view(alldata, index, :), to avoid an allocation.
In general you can avoid creating the index vector altogether: it is enough to remember the start s and end e positions of each run of equal values and then use the range s:e to select the rows you want.
If you correct those things, please post your updated code and maybe I can help further. There is still room for improvement, but it requires a somewhat different algorithmic approach (although you may prefer the option below for simplicity).
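The last point (tracking the start and end of each run instead of building an index vector) can be sketched as follows. This is written in plain Python rather than Julia, purely as an illustration of the idea; the function name average_runs and the list-of-lists layout are assumptions, and the keys are assumed to be sorted so that equal values are adjacent.

```python
def average_runs(keys, rows):
    """Average consecutive rows that share the same key.

    Assumes equal keys are adjacent (e.g. data sorted by date-hour)
    and that keys and rows have the same length.
    """
    if len(keys) != len(rows):
        raise ValueError("keys and rows must have the same length")

    group_keys, averages = [], []
    start = 0
    n = len(keys)
    for i in range(1, n + 1):
        # A run ends at the end of the data or where the key changes,
        # so the final run is flushed as well.
        if i == n or keys[i] != keys[start]:
            run = rows[start:i]
            ncols = len(run[0])
            averages.append(
                [sum(r[c] for r in run) / len(run) for c in range(ncols)]
            )
            group_keys.append(keys[start])
            start = i
    return group_keys, averages


keys = ["d1h0", "d1h0", "d1h1"]
rows = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]]
hours, means = average_runs(keys, rows)
```

Because the end of the data also closes a run, a data set with a single distinct key still produces one averaged row, which is the case the original loop misses.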
Layer three: how I would do it
I would use DataFrames package to handle this problem like this:
using DataFrames
df = DataFrame(alldata) # assuming alldata is Matrix{Float64}, otherwise convert it here
df[:grouping] = datehour
agg = aggregate(df, :grouping, mean) # maybe this is all what you need if DataFrame is OK for you
Matrix(agg[2:end]) # here is how you can convert DataFrame back to a matrix
This is not the fastest solution (it converts to a DataFrame and back), but it is much simpler for me.

MATLAB ConnectedComponentLabeler does not work in for loop

I am trying to get a set of binary images' eccentricity and solidity values using the regionprops function. I obtain the label matrix using the vision.ConnectedComponentLabeler function.
This is the code I have so far:
files = getFiles('images');
ecc = zeros(length(files)); %eccentricity values
sol = zeros(length(files)); %solidity values
ccl = vision.ConnectedComponentLabeler;
for i=1:length(files)
    I = imread(files{i});
    [L NUM] = step(ccl, I);
    for j=1:NUM
        L = changem(L==j, 1, j); %*
    end
    stats = regionprops(L, 'all');
    ecc(i) = stats.Eccentricity;
    sol(i) = stats.Solidity;
end
However, when I run this, I get an error pointing at the line marked with *:
Error using ConnectedComponentLabeler/step
Variable-size input signals are not supported when the OutputDataType property is set to 'Automatic'.
I do not understand what MATLAB is talking about and I do not have any idea about how to get rid of it.
Edit
I have gone back to the bwlabel function and have no problems now.
The error is a bit hard to understand, but I can explain what it means. When you use the CVST Connected Components Labeller, it assumes that all of the images you pass to it are the same size. The error happens because it looks like your images aren't, hence the mention of "Variable-size input signals".
The 'Automatic' setting means that the output data type is chosen for you, so you don't have to worry about whether the output is uint8, uint16, etc. To remove this error, you need to set the output data type of the labeller manually, i.e. make the OutputDataType property static. Hopefully the images in the directory you're reading all have the same data type, so override this field with a data type that the function accepts. The available types are uint8, uint16 and uint32. Therefore, assuming your images are uint8 for example, do this before you run your loop:
ccl = vision.ConnectedComponentLabeler;
ccl.OutputDataType = 'uint8';
Now run your code, and it should work. Bear in mind that the input needs to be logical for this to have any meaningful output.
Minor comment
Why are you using the CVST Connected Component Labeller when the Image Processing Toolbox bwlabel function works exactly the same way? As you are using regionprops, you have access to the Image Processing Toolbox, so this should be available to you. It's much simpler to use and requires no setup: http://www.mathworks.com/help/images/ref/bwlabel.html

Opencv cvSaveImage

I am trying to save an image using the OpenCV cvSaveImage function. I perform a DCT on the image, modify the resulting coefficients, and then perform an inverse DCT to get back the pixel values. The pixel values I get back are fractional (e.g. 254.34576), and when I save them with cvSaveImage, everything after the decimal point is discarded (e.g. 254.34576 is saved as 254). This affects my results. Please help.
"The function cvSaveImage saves the image to the specified file. The image format is chosen depending on the filename extension, see cvLoadImage. Only 8-bit single-channel or 3-channel (with 'BGR' channel order) images can be saved using this function. If the format, depth or channel order is different, use cvCvtScale and cvCvtColor to convert it before saving, or use universal cvSave to save the image to XML or YAML format."
I'd suggest investigating the cvSave function.
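HOWEVER, a much easier way is to just write your own save/load functions. This would be very easy:
FILE *f = fopen("image.dat", "wb");
fprintf(f, "%d %d\n", width, height);          /* header: dimensions, separated so they can be read back */
for (int y = 0; y < height; y++)
    for (int x = 0; x < width; x++)
        fprintf(f, "%f\n", pixelAt(x, y));     /* pixelAt(x, y) is a placeholder for your own pixel accessor */
fclose(f);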
And a corresponding mirror function for reading.
P.S. Early morning and I can't remember for the life of me if fprintf works with binary files. But you get the idea. You could use fwrite() instead.
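For illustration, here is a minimal sketch of such a save/load pair, written in Python with the standard struct module rather than C. The function names, the file layout (two unsigned 32-bit integers for width and height, followed by little-endian doubles) and the list-of-rows image representation are all assumptions made for this example, not anything defined by OpenCV.

```python
import struct


def save_float_image(path, pixels):
    """Write a 2-D list of floats as a small binary file.

    Layout (an assumption of this sketch): width and height as
    little-endian uint32, then the pixels row by row as float64.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    with open(path, "wb") as f:
        f.write(struct.pack("<II", width, height))
        for row in pixels:
            f.write(struct.pack(f"<{width}d", *row))


def load_float_image(path):
    """Mirror of save_float_image: read the header, then each row."""
    with open(path, "rb") as f:
        width, height = struct.unpack("<II", f.read(8))
        row_size = 8 * width  # 8 bytes per float64
        return [
            list(struct.unpack(f"<{width}d", f.read(row_size)))
            for _ in range(height)
        ]
```

Since the values are stored as raw 64-bit floats, a value such as 254.34576 is read back unchanged instead of being truncated to 254.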

MATLAB - image huffman encoding

I have a homework assignment in which I have to convert some images to grayscale and compress them using Huffman encoding. I converted them to grayscale and then tried to compress them, but I get an error. I used the code I found here.
Here is the code I'm using:
A=imread('Gray\36.png');
[symbols,p]=hist(A,unique(A))
p=p/sum(p)
[dict,avglen]=huffmandict(symbols,p)
comp=huffmanenco(A,dict)
This is the error I get. It occurs at the second line.
Error using eps
Class must be 'single' or 'double'.
Error in hist (line 90)
bins = xx + eps(xx);
What am I doing wrong?
Thanks.
P.S. How can I find the compression ratio for each image?
The problem is that when you specify the bin locations (the second input argument of 'hist'), they need to be single or double. The vector A itself does not, though. That's nice because sometimes you don't want to convert your whole dataset from an integer type to floating precision. This will fix your code:
[symbols,p]=hist(A,double(unique(A)))
Click here to see this issue discussed in more detail.
First, try:
whos A
Seems like its type must be single or double. If not, just do A = double(A) after the imread line. Should work that way, however I'm surprised hist is not doing the conversion...
[EDIT] I have just tested it, and I am right, hist won't work in uint8, but it's okay as soon as I convert my image to double.
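On the P.S. about the compression ratio: one common definition is the size of the original data divided by the size of the encoded data. The sketch below illustrates that calculation in plain Python, not MATLAB. The function names are made up for this example, it assumes 8 bits per original pixel, and it ignores the cost of storing the Huffman dictionary itself.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # A single distinct symbol still needs one bit per occurrence.
        return {next(iter(counts)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def compression_ratio(pixels, bits_per_pixel=8):
    """Original size in bits divided by Huffman-encoded size in bits."""
    lengths = huffman_code_lengths(pixels)
    counts = Counter(pixels)
    encoded_bits = sum(counts[s] * lengths[s] for s in counts)
    return (len(pixels) * bits_per_pixel) / encoded_bits


pixels = [0, 0, 0, 0, 1, 1, 2, 3]   # flattened toy "image"
lengths = huffman_code_lengths(pixels)
ratio = compression_ratio(pixels)   # 64 original bits / 14 encoded bits
```

In MATLAB terms, the same ratio would be the number of pixels times 8 divided by the length of the encoded bit vector, under the same assumptions.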

Jxl and maximum number of formatted cells

When I write Excel files with jxl and use my own cell format, I get this warning: "The maximum number of formatted cells has expired. Using default format". I have about 350 cells that need to be formatted, which seems relatively few to me. Am I doing something wrong? I use loops to set my cell format. Or is there any way to increase the number of formatted cells? My whole code is long, but here is a simple example of the formatting:
for (int i = 0; i < 30; i++) {
    ws.getWritableCell(2, i).setCellFormat(sumrow());
}
How are you creating the CellFormat objects?
What you want to do is to make sure you are reusing the CellFormat objects and not recreating them in a loop somewhere.
That is unless you really have 350 cells that each have a different formatting. Otherwise create a single CellFormat object and pass that into setCellFormat.
Set the NumberFormat to EXPONENTIAL. It worked for me. Like this:
NumberFormats.EXPONENTIAL
