I posted a similar question recently, but I think I overcomplicated it, as I hadn't yet narrowed down the issue. Here is the context:
I am currently creating a custom module inside an existing CNN using PyTorch. I am doing this as part of my school's research, so I have access to a supercomputer with multiple GPU devices. When I train my model, after running through the first validation set I run into the following error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper__cudnn_convolution)
Now, this does not happen when I skip the convolution in my module and just put in some dummy calculation. The error also isn't raised when I use a single GPU or a CPU. Here is my custom module; the convolution takes place in the conv_gauss function at the very bottom:
import torch
import torch.nn as nn
from tqdm import tqdm

class DivisiveNormBlock(nn.Module):

    def __init__(self, channel_num=512, size=56, ksize=4):
        super().__init__()
        self.channel_num = channel_num
        self.size = size
        self.ksize = ksize
        scale = 90  # Random scale factor I've been playing with.

        # 512 thetas for a channel, 512 channels; same goes for p, sig, a.
        self.theta = torch.nn.Parameter(scale * torch.abs(torch.randn(
            self.channel_num, self.channel_num, device="cuda", requires_grad=True)))
        self.p = torch.nn.Parameter(scale * torch.abs(torch.randn(
            self.channel_num, self.channel_num, device="cuda", requires_grad=True)))
        self.sig = torch.nn.Parameter(scale * torch.abs(torch.randn(
            self.channel_num, self.channel_num, device="cuda", requires_grad=True)))
        self.a = torch.nn.Parameter(scale * torch.abs(torch.randn(
            self.channel_num, self.channel_num, device="cuda", requires_grad=True)))
        self.nI = torch.nn.Parameter(torch.abs(torch.randn(
            self.channel_num, self.channel_num, device="cuda", requires_grad=True)))
        self.nU = torch.nn.Parameter(torch.abs(torch.randn(self.channel_num, device="cuda", requires_grad=True)))
        self.bias = torch.nn.Parameter(torch.abs(torch.randn(self.channel_num, device="cuda", requires_grad=True)))

        self.gaussian_bank = torch.zeros(self.channel_num, self.channel_num,
                                         self.ksize * 2 + 1, self.ksize * 2 + 1, device="cuda")
        self.x = torch.linspace(-self.ksize, self.ksize, self.ksize * 2 + 1, device="cuda")
        self.y = torch.linspace(-self.ksize, self.ksize, self.ksize * 2 + 1, device="cuda")
        self.xv, self.yv = torch.meshgrid(self.x, self.y)
        for i in range(self.channel_num):
            for u in range(self.channel_num):
                self.gaussian_bank[i, u, :, :] = self.get_gaussian(i, u)

        p = int((self.ksize * 2) / 2)
        conv_kernel_size = self.ksize * 2 + 1
        self.conv = nn.Conv2d(self.channel_num, self.channel_num, padding=p, stride=1,
                              kernel_size=conv_kernel_size, bias=False)

    def get_gaussian(self, cc, oc):
        xrot = self.xv * torch.cos(self.theta[cc, oc]) + self.yv * torch.sin(self.theta[cc, oc])
        yrot = -self.xv * torch.sin(self.theta[cc, oc]) + self.yv * torch.cos(self.theta[cc, oc])
        g_kernel = (self.a[cc, oc] /
                    (2 * torch.pi * self.p[cc, oc] * self.sig[cc, oc])) * \
            torch.exp(-0.5 * (((xrot ** 2) / self.p[cc, oc] ** 2) + (yrot ** 2) / self.sig[cc, oc] ** 2))
        return g_kernel

    def forward(self, x):
        x_test = self.dn_f(x)
        return x

    def dn_f(self, x):
        batch_size = x.shape[0]
        under_sum = torch.zeros((self.channel_num, self.size, self.size), device="cuda")
        normalized_channels = torch.zeros((batch_size, self.channel_num, self.size, self.size), device="cuda")
        for b in tqdm(range(batch_size)):
            for i in range(self.channel_num):
                for u in range(self.channel_num):
                    under_sum[u] = self.conv_gauss(torch.pow(x[b, i], self.nI[i, u]), self.gaussian_bank[i, u])
                    # under_sum[u] = under_sum[u]
                normalized_channels[b, i] = torch.pow(x[b, i], self.nU[i]) / (
                    torch.pow(self.bias[i], self.nU[i]) + torch.sum(under_sum, 0))
        return normalized_channels

    def conv_gauss(self, x_conv, gauss_conv):
        x_conv = torch.reshape(x_conv, (1, 1, self.size, self.size))
        gauss_conv = torch.reshape(gauss_conv, (1, 1, self.ksize * 2 + 1, self.ksize * 2 + 1))
        p = int((self.ksize * 2) / 2)  # unused here; padding is set in __init__
        self.conv.weight = nn.Parameter(gauss_conv)
        output = self.conv(x_conv)
        output = torch.reshape(output, (self.size, self.size))
        return output
For some extra context, I also tried this with F.conv2d in the function instead, but to no avail.
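For what it's worth, here is a minimal sketch (my guess at the root cause, not confirmed) of device-agnostic registration in PyTorch: tensors created with a hardcoded device="cuda" land on cuda:0 and stay there, while parameters and registered buffers follow the module when it is moved or replicated across GPUs.

import torch
import torch.nn as nn

class DeviceAgnosticBlock(nn.Module):
    def __init__(self, channel_num=512, ksize=4):
        super().__init__()
        # Parameters created without device= follow the module on .to(device)
        # and when DataParallel replicates the module per GPU.
        self.theta = nn.Parameter(torch.abs(torch.randn(channel_num, channel_num)))
        # Non-trainable tensors must be registered as buffers to move with the module.
        self.register_buffer("gaussian_bank",
                             torch.zeros(channel_num, channel_num,
                                         ksize * 2 + 1, ksize * 2 + 1))

    def forward(self, x):
        # Allocate scratch tensors on the input's device, never a hardcoded "cuda".
        scratch = torch.zeros_like(x)
        return x + scratch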
I want to convert RAW image data (RGGB) to an sRGB image. There are many specialized ways to do this, but to first understand the basics, I've implemented some simple algorithms like debayering by resolution reduction.
My current pipeline is:
Rescale the u16 input data by blacklevel and whitelevel
Apply white balance coefficients
Debayer with size reduction, average for G: g=((g0+g1)/2)
Calculate pseudo-inverse for D65 illuminant XYZ_TO_CAM (from Adobe DNG)
Convert debayered RGB data to XYZ by CAM_TO_XYZ
Convert XYZ to D65 sRGB (matrix taken from Bruce Lindbloom)
Apply gamma correction (simple routine for now, should be replaced by sRGB gamma; see the sketch after this list)
Rescale from [minval..maxval] to [0..1] and convert f32 to u16
Save as tiff
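As an aside on the gamma step, a minimal Rust sketch of the piecewise sRGB transfer function (IEC 61966-2-1) that could eventually replace the plain power law:

/// Exact sRGB transfer function; input is linear light in [0, 1].
fn srgb_gamma(linear: f32) -> f32 {
    if linear <= 0.0031308 {
        12.92 * linear
    } else {
        1.055 * linear.powf(1.0 / 2.4) - 0.055
    }
}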
The problem is that if I skip the white balance coefficient multiplication (or just replace the coefficients with 1.0), the output image already looks acceptable. If I apply the coefficients (taken from AsShot in the DNG), the output has a huge color cast. I'm also not sure whether I have to multiply by coef or by 1/coef.
The first image is the result of the pipeline with wb_coefs set to 1.0.
The second image is the result with the "correct" wb_coefs.
What is wrong in my pipeline?
Additional question:
I'm not sure about the rescaling process. Do I have to rescale into [0..1] after every step, or is it enough to rescale during the final u16 conversion stage?
Full code:
use image::{DynamicImage, ImageBuffer};

macro_rules! max {
    ($x: expr) => ($x);
    ($x: expr, $($z: expr),+) => {{
        let y = max!($($z),*);
        if $x > y { $x } else { y }
    }}
}

macro_rules! min {
    ($x: expr) => ($x);
    ($x: expr, $($z: expr),+) => {{
        let y = min!($($z),*);
        if $x < y { $x } else { y }
    }}
}

/// sRGB D65
const XYZD65_TO_SRGB: [[f32; 3]; 4] = [
    [3.2404542, -1.5371385, -0.4985314],
    [-0.9692660, 1.8760108, 0.0415560],
    [0.0556434, -0.2040259, 1.0572252],
    [0.0, 0.0, 0.0],
];

// buf: RAW image data
fn to_srgb(buf: &Vec<u16>, width: usize, height: usize) {
    let w = width / 2;
    let h = height / 2;
    let blacklevel: [u16; 4] = [511, 511, 511, 511];
    let whitelevel: [u16; 4] = [12735, 12735, 12735, 12735];
    let xyz2cam_d65: [[i32; 3]; 4] = [[6722, -635, -963], [-4287, 12460, 2028], [-908, 2162, 5668], [0, 0, 0]];
    let cam2xyz = convert_matrix::<4>(xyz2cam_d65);
    eprintln!("CAM_TO_XYZ: {:?}", cam2xyz);

    // from DNG
    // As Shot Neutral: 0.518481 1 0.545842
    //let wb_coef = [1.0/0.518481, 1.0, 1.0, 1.0/0.545842];
    //let wb_coef = [0.518481, 1.0, 1.0, 0.545842];
    let wb_coef = [1.0, 1.0, 1.0, 1.0];

    // b/w level correction, rescale, debayer
    let mut rgb = vec![0.0_f32; width / 2 * height / 2 * 3];
    for row in 0..h {
        for col in 0..w {
            let r0 = buf[(row * 2 + 0) * width + (col * 2) + 0];
            let g0 = buf[(row * 2 + 0) * width + (col * 2) + 1];
            let g1 = buf[(row * 2 + 1) * width + (col * 2) + 0];
            let b0 = buf[(row * 2 + 1) * width + (col * 2) + 1];
            let r0 = ((r0.saturating_sub(blacklevel[0])) as f32 / (whitelevel[0] - blacklevel[0]) as f32) * wb_coef[0];
            let g0 = ((g0.saturating_sub(blacklevel[1])) as f32 / (whitelevel[1] - blacklevel[1]) as f32) * wb_coef[1];
            let g1 = ((g1.saturating_sub(blacklevel[2])) as f32 / (whitelevel[2] - blacklevel[2]) as f32) * wb_coef[2];
            let b0 = ((b0.saturating_sub(blacklevel[3])) as f32 / (whitelevel[3] - blacklevel[3]) as f32) * wb_coef[3];
            rgb[row * w * 3 + (col * 3) + 0] = r0;
            rgb[row * w * 3 + (col * 3) + 1] = (g0 + g1) / 2.0;
            rgb[row * w * 3 + (col * 3) + 2] = b0;
        }
    }

    // Convert to XYZ by CAM_TO_XYZ from D65 illuminant
    let mut xyz = vec![0.0_f32; w * h * 3];
    for row in 0..h {
        for col in 0..w {
            let r = rgb[row * w * 3 + (col * 3) + 0];
            let g = rgb[row * w * 3 + (col * 3) + 1];
            let b = rgb[row * w * 3 + (col * 3) + 2];
            xyz[row * w * 3 + (col * 3) + 0] = cam2xyz[0][0] * r + cam2xyz[0][1] * g + cam2xyz[0][2] * b;
            xyz[row * w * 3 + (col * 3) + 1] = cam2xyz[1][0] * r + cam2xyz[1][1] * g + cam2xyz[1][2] * b;
            xyz[row * w * 3 + (col * 3) + 2] = cam2xyz[2][0] * r + cam2xyz[2][1] * g + cam2xyz[2][2] * b;
        }
    }

    // Track min/max value for rescaling/clipping
    let mut maxval = 1.0;
    let mut minval = 0.0;

    // Convert to sRGB from XYZ
    let mut srgb = vec![0.0; w * h * 3];
    for row in 0..h {
        for col in 0..w {
            let r = xyz[row * w * 3 + (col * 3) + 0] as f32;
            let g = xyz[row * w * 3 + (col * 3) + 1] as f32;
            let b = xyz[row * w * 3 + (col * 3) + 2] as f32;
            srgb[row * w * 3 + (col * 3) + 0] = XYZD65_TO_SRGB[0][0] * r + XYZD65_TO_SRGB[0][1] * g + XYZD65_TO_SRGB[0][2] * b;
            srgb[row * w * 3 + (col * 3) + 1] = XYZD65_TO_SRGB[1][0] * r + XYZD65_TO_SRGB[1][1] * g + XYZD65_TO_SRGB[1][2] * b;
            srgb[row * w * 3 + (col * 3) + 2] = XYZD65_TO_SRGB[2][0] * r + XYZD65_TO_SRGB[2][1] * g + XYZD65_TO_SRGB[2][2] * b;
            let r = srgb[row * w * 3 + (col * 3) + 0];
            let g = srgb[row * w * 3 + (col * 3) + 1];
            let b = srgb[row * w * 3 + (col * 3) + 2];
            maxval = max!(maxval, r, g, b);
            minval = min!(minval, r, g, b);
        }
    }

    gamma_corr(&mut srgb, w, h, 2.2);

    let mut output = vec![0_u16; w * h * 3];
    for row in 0..h {
        for col in 0..w {
            let r = srgb[row * w * 3 + (col * 3) + 0];
            let g = srgb[row * w * 3 + (col * 3) + 1];
            let b = srgb[row * w * 3 + (col * 3) + 2];
            output[row * w * 3 + (col * 3) + 0] = (clip(r, minval, maxval) * (u16::MAX as f32)) as u16;
            output[row * w * 3 + (col * 3) + 1] = (clip(g, minval, maxval) * (u16::MAX as f32)) as u16;
            output[row * w * 3 + (col * 3) + 2] = (clip(b, minval, maxval) * (u16::MAX as f32)) as u16;
        }
    }

    let img = DynamicImage::ImageRgb16(ImageBuffer::from_raw(w as u32, h as u32, output).unwrap());
    img.save_with_format("/tmp/test.tif", image::ImageFormat::Tiff).unwrap();
}

fn pseudoinverse<const N: usize>(matrix: [[f32; 3]; N]) -> [[f32; 3]; N] {
    let mut result: [[f32; 3]; N] = [Default::default(); N];
    let mut work: [[f32; 6]; 3] = [Default::default(); 3];
    let mut num: f32;
    for i in 0..3 {
        for j in 0..6 {
            work[i][j] = if j == i + 3 { 1.0 } else { 0.0 };
        }
        for j in 0..3 {
            for k in 0..N {
                work[i][j] += matrix[k][i] * matrix[k][j];
            }
        }
    }
    for i in 0..3 {
        num = work[i][i];
        for j in 0..6 {
            work[i][j] /= num;
        }
        for k in 0..3 {
            if k == i {
                continue;
            }
            num = work[k][i];
            for j in 0..6 {
                work[k][j] -= work[i][j] * num;
            }
        }
    }
    for i in 0..N {
        for j in 0..3 {
            result[i][j] = 0.0;
            for k in 0..3 {
                result[i][j] += work[j][k + 3] * matrix[i][k];
            }
        }
    }
    result
}

fn convert_matrix<const N: usize>(adobe_xyz_to_cam: [[i32; 3]; N]) -> [[f32; N]; 3] {
    let mut xyz_to_cam: [[f32; 3]; N] = [[0.0; 3]; N];
    let mut cam_to_xyz: [[f32; N]; 3] = [[0.0; N]; 3];
    for i in 0..N {
        for j in 0..3 {
            xyz_to_cam[i][j] = adobe_xyz_to_cam[i][j] as f32 / 10000.0;
        }
    }
    eprintln!("XYZ_TO_CAM: {:?}", xyz_to_cam);
    let inverse = pseudoinverse::<N>(xyz_to_cam);
    for i in 0..3 {
        for j in 0..N {
            cam_to_xyz[i][j] = inverse[j][i];
        }
    }
    cam_to_xyz
}

fn clip(v: f32, minval: f32, maxval: f32) -> f32 {
    (v + minval.abs()) / (maxval + minval.abs())
}

// https://kosinix.github.io/raster/docs/src/raster/filter.rs.html#339-359
fn gamma_corr(rgb: &mut Vec<f32>, w: usize, h: usize, gamma: f32) {
    for row in 0..h {
        for col in 0..w {
            let r = rgb[row * w * 3 + (col * 3) + 0];
            let g = rgb[row * w * 3 + (col * 3) + 1];
            let b = rgb[row * w * 3 + (col * 3) + 2];
            rgb[row * w * 3 + (col * 3) + 0] = r.powf(1.0 / gamma);
            rgb[row * w * 3 + (col * 3) + 1] = g.powf(1.0 / gamma);
            rgb[row * w * 3 + (col * 3) + 2] = b.powf(1.0 / gamma);
        }
    }
}
The DNG for this example can be found at: https://chaospixel.com/pub/misc/dng/sample.dng (~40 MiB).
The main reason for getting wrong colors is that we have to normalize the rows of the rgb2cam matrix to 1, as described in the following guide.
According to DNG spec:
ColorMatrix1 defines a transformation matrix that converts XYZ values to reference camera native color space values, under the first calibration illuminant.
It means that if the calibration illuminant is D65, the ColorMatrix converts XYZ to "camera RGB".
(Convert it as is, without using any white balance scaling coefficients).
The inverse ColorMatrix converts from "camera RGB" to XYZ.
After converting XYZ to sRGB, the result is color balanced sRGB.
The conclusion is that ColorMatrix includes the white balance coefficients in it (the white balance coefficients apply to the D65 illuminant).
Normalizing the rows of rgb2cam to 1 neutralizes the white balance coefficients and keeps only the "Color Correction Matrix" (the math is a bit complicated).
Without normalizing the rows, we are scaling by the white balance multipliers twice:
Scale coefficients from ColorMatrix that balance the input to D65.
Scale coefficients taken from AsShotNeutral that balance the input to the illuminant of the scene (the illuminant of the scene is close to D65).
The result of scaling twice is an extreme color cast.
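To make the normalization concrete, a small sketch using the rgb2cam values computed later in this answer:

import numpy as np

rgb2cam = np.array([[0.2619, 0.1835, 0.0252],
                    [0.0921, 0.7620, 0.2053],
                    [0.0195, 0.1897, 0.5379]])
rows_sum = rgb2cam.sum(axis=1, keepdims=True)  # per-channel gains baked into ColorMatrix
rgb2cam = rgb2cam / rows_sum                   # each row now sums to 1: pure color mixing
print(rgb2cam.sum(axis=1))                     # [1. 1. 1.] -> white balance factored out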
Tracking the maximum in order to avoid "magenta cast in the highlights":
Instead of tracking the actual maximum color values in the input image, we are supposed to track the "theoretical maximum color value".
Take whitelevel - blacklevel and scale by the white balance multipliers.
Track the result...
The guiding rule is that the colors are supposed to be the same in both cases:
Applying the processing to small patches of the image, and placing the patches together (where we can't track the global minimum and maximum).
Applying the processing to the entire image.
I suppose you have to track the maximum of the scaled whitelevel - blacklevel only when the white balance multipliers are less than 1.
When all the multipliers are 1 or above, we can clip the result to 1.0 without tracking the maximum.
Note:
There is probably an advantage to scaling down and tracking the maximum, but I don't know this subject well.
In my solution we just multiply up (above 1.0) and clip the result.
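A sketch of that idea (my reading of it, assuming the data was already scaled to [0, 1] by whitelevel - blacklevel before white balancing):

import numpy as np

wb_multipliers = 1 / np.array([0.5185, 1.0, 0.5458])  # 1 / AsShotNeutral from the DNG
theoretical_max = wb_multipliers.max()  # largest value a balanced pixel can reach
# Rescale by this constant; or, since all multipliers here are >= 1, simply clip to 1.0.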
The solution is based on Processing RAW Images in MATLAB guide.
I am posting both MATLAB implementation and Python implementation (but no Rust implementation).
The first step is extracting the raw Bayer image from sample.dng using dcraw command line:
dcraw -4 -D -T sample.dng
Rename the tiff output to sample__lin_bayer.tif.
Conversion process:
Rescale the uint16 input data by blacklevel and whitelevel (subtract blacklevel from all the pixels and scale by whitelevel - blacklevel).
Apply white balance scaling coefficients.
The scaling coefficients equal 1./AsShotNeutral.
Scale the red pixels in the Bayer alignment by the red scaling coefficient, scale the greens by the green scaling, and the blues by the blue scaling.
Assumption: the minimum scaling is 1.0 and the others are above 1.0 (we may divide by the minimum scaling to make sure).
Clip the scaled result to [0, 1] (clipping is required due to demosaic implementation limitations).
Demosaicing (Debayer) using MATLAB demosaic function or cv2.cvtColor in Python.
Calculate rgb2cam matrix: rgb2cam = ColorMatrix * rgb2xyz.
rgb2xyz matrix is taken from Bruce Lindbloom site.
Normalize rows of rgb2cam matrix so the sum of each row equals 1 (divide each row by the sum of the row).
Compute cam2rgb matrix by inverting rgb2cam: cam2rgb = inv(rgb2cam).
cam2rgb is the "CCM matrix" (Color Correction Matrix).
Left multiply matrix cam2rgb by each RGB tuple (apply color correction).
Apply gamma correction (use sRGB standard gamma).
Convert to uint8 and save as PNG (PNG format is used for posting on the SO website).
MATLAB Implementation:
filename = 'sample__lin_bayer.tif'; % Output of: dcraw -4 -D -T sample.dng
% Exif information:
blacklevel = 511; % blacklevel = meta_info.SubIFDs{1}.BlackLevel(1);
whitelevel = 12735; % whitelevel = meta_info.SubIFDs{1}.WhiteLevel;
AsShotNeutral = [0.5185 1 0.5458];
ColorMatrix = [ 0.6722 -0.0635 -0.0963
-0.4287 1.2460 0.2028
-0.0908 0.2162 0.5668];
bayer_type = 'rggb';
% Constant matrix for converting sRGB to XYZ(D65):
% http://www.brucelindbloom.com/Eqn_RGB_XYZ_Matrix.html
rgb2xyz = [0.4124564 0.3575761 0.1804375
0.2126729 0.7151522 0.0721750
0.0193339 0.1191920 0.9503041];
% Read input image (Bayer mosaic alignment, pixels data type is uint16):
raw = imread(filename);
% "Linearizing":
% There is no LinearizationTable so we only have to subtract the black level.
% Convert from range [blacklevel, whitelevel] to range [0, 1].
lin_bayer = (double(raw) - blacklevel) / (whitelevel - blacklevel);
lin_bayer = max(0, min(lin_bayer, 1));
% The White Balance multipliers are 1./AsShotNeutral
wb_multipliers = 1./AsShotNeutral;
r_scale = wb_multipliers(1); % Assume value is above 1
g_scale = wb_multipliers(2); % Assume value = 1
b_scale = wb_multipliers(3); % Assume value is above 1
% Bayer alignment is RGGB:
% R G
% G B
%
% Apply white balancing to linear Bayer image.
balanced_bayer = lin_bayer;
balanced_bayer(1:2:end, 1:2:end) = balanced_bayer(1:2:end, 1:2:end)*r_scale; % Red (indices [1, 3, 5,... ; 1, 3, 5,... ])
balanced_bayer(1:2:end, 2:2:end) = balanced_bayer(1:2:end, 2:2:end)*g_scale; % Green (indices [1, 3, 5,... ; 2, 4, 6,... ])
balanced_bayer(2:2:end, 1:2:end) = balanced_bayer(2:2:end, 1:2:end)*g_scale; % Green (indices [2, 4, 6,... ; 1, 3, 5,... ])
balanced_bayer(2:2:end, 2:2:end) = balanced_bayer(2:2:end, 2:2:end)*b_scale; % Blue (indices [2, 4, 6,... ; 2, 4, 6,... ])
% Clip to range [0, 1] for avoiding "pinkish highlights" (avoiding "magenta cast" in the highlights).
balanced_bayer = min(balanced_bayer, 1);
% Demosaicing
temp = uint16(balanced_bayer*(2^16-1)); % Convert from double to uint16, because MATLAB demosaic() function requires a uint8 or uint16 input.
lin_rgb = double(demosaic(temp, bayer_type))/(2^16-1); % Apply demosaicing and convert back to type double in range [0, 1].
% Color Space Conversion
xyz2cam = ColorMatrix; % ColorMatrix applies XYZ(D65) to CAM_rgb
rgb2cam = xyz2cam * rgb2xyz;
% Result:
% rgb2cam = [0.2619 0.1835 0.0252
% 0.0921 0.7620 0.2053
% 0.0195 0.1897 0.5379]
% Normalize rows to 1. MATLAB shortcut: rgb2cam = rgb2cam ./ repmat(sum(rgb2cam,2),1,3);
rows_sum = sum(rgb2cam, 2);
% Result:
% rows_sum = [0.4706
% 1.0593
% 0.7470]
% Divide element of every row by the sum of the row:
rgb2cam(1, :) = rgb2cam(1, :) / rows_sum(1); % Divide top row
rgb2cam(2, :) = rgb2cam(2, :) / rows_sum(2); % Divide center row
rgb2cam(3, :) = rgb2cam(3, :) / rows_sum(3); % Divide bottom row
% Result (sum of every row is 1):
% rgb2cam = [0.5566 0.3899 0.0535
% 0.0869 0.7193 0.1938
% 0.0261 0.2539 0.7200]
cam2rgb = inv(rgb2cam); % Invert matrix
% Result:
% cam2rgb = [ 1.9644 -1.1197 0.1553
% -0.2412 1.6738 -0.4326
% 0.0139 -0.5498 1.5359]
R = lin_rgb(:, :, 1);
G = lin_rgb(:, :, 2);
B = lin_rgb(:, :, 3);
% Left multiply matrix cam2rgb by each RGB tuple (convert from "camera RGB" to "linear sRGB").
sR = cam2rgb(1,1)*R + cam2rgb(1,2)*G + cam2rgb(1,3)*B;
sG = cam2rgb(2,1)*R + cam2rgb(2,2)*G + cam2rgb(2,3)*B;
sB = cam2rgb(3,1)*R + cam2rgb(3,2)*G + cam2rgb(3,3)*B;
lin_srgb = cat(3, sR, sG, sB);
lin_srgb = max(min(lin_srgb, 1), 0); % Clip to range [0, 1]
% Convert from "Linear sRGB" to sRGB (apply gamma)
sRGB = lin2rgb(lin_srgb); % The lin2rgb MATLAB function uses the exact formula [you may approximate it with a power of (1/gamma)].
% Show the result, and save to sRGB.png
figure;imshow(sRGB);impixelinfo;title('sRGB');
imwrite(im2uint8(sRGB), 'sRGB.png');
% Inverting 3x3 matrix (some help of MATLAB Symbolic Toolbox):
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Assume:
% A = [ a11, a12, a13]
% [ a21, a22, a23]
% [ a31, a32, a33]
%
% 1. Compute determinant of A:
% detA = a11*a22*a33 - a11*a23*a32 - a12*a21*a33 + a12*a23*a31 + a13*a21*a32 - a13*a22*a31
%
% 2. Compute the inverse of the matrix A:
% invA = [ (a22*a33 - a23*a32)/detA, -(a12*a33 - a13*a32)/detA, (a12*a23 - a13*a22)/detA
% -(a21*a33 - a23*a31)/detA, (a11*a33 - a13*a31)/detA, -(a11*a23 - a13*a21)/detA
% (a21*a32 - a22*a31)/detA, -(a11*a32 - a12*a31)/detA, (a11*a22 - a12*a21)/detA]
Python implementation:
import numpy as np
import cv2
def lin2rgb(im):
    """ Convert im from "Linear sRGB" to sRGB - apply Gamma. """
    # sRGB standard applies gamma = 2.4, Break Point = 0.00304 (and computed Slope = 12.92)
    g = 2.4
    bp = 0.00304
    inv_g = 1/g
    sls = 1 / (g/(bp**(inv_g - 1)) - g*bp + bp)
    fs = g*sls / (bp**(inv_g - 1))
    co = fs*bp**(inv_g) - sls*bp
    srgb = im.copy()
    srgb[im <= bp] = sls * im[im <= bp]
    srgb[im > bp] = np.power(fs*im[im > bp], inv_g) - co
    return srgb
filename = 'sample__lin_bayer.tif' # Output of: dcraw -4 -D -T sample.dng
# Exif information:
blacklevel = 511
whitelevel = 12735
AsShotNeutral = np.array([0.5185, 1, 0.5458])
ColorMatrix = np.array([[ 0.6722, -0.0635, -0.0963],
[-0.4287, 1.2460, 0.2028],
[-0.0908, 0.2162, 0.5668]])
# bayer_type = 'rggb'
# Constant matrix for converting sRGB to XYZ(D65):
# http://www.brucelindbloom.com/Eqn_RGB_XYZ_Matrix.html
rgb2xyz = np.array([[0.4124564, 0.3575761, 0.1804375],
[0.2126729, 0.7151522, 0.0721750],
[0.0193339, 0.1191920, 0.9503041]])
# Read input image (Bayer mosaic alignment, pixel data type is np.uint16):
raw = cv2.imread(filename, cv2.IMREAD_UNCHANGED)
# "Linearizing":
# There is no LinearizationTable so we only have to subtract the black level.
# Convert from range [blacklevel, whitelevel] to range [0, 1] (convert type to np.float64).
lin_bayer = (raw.astype(np.float64) - blacklevel) / (whitelevel - blacklevel)
lin_bayer = lin_bayer.clip(0, 1)
# The White Balance multipliers are 1./AsShotNeutral
wb_multipliers = 1 / AsShotNeutral
r_scale = wb_multipliers[0] # Assume value is above 1
g_scale = wb_multipliers[1] # Assume value = 1
b_scale = wb_multipliers[2] # Assume value is above 1
# Bayer alignment is RGGB:
# R G
# G B
#
# Apply white balancing to linear Bayer image.
balanced_bayer = lin_bayer.copy()
balanced_bayer[0::2, 0::2] = balanced_bayer[0::2, 0::2]*r_scale # Red (indices [0, 2, 4,... ; 0, 2, 4,... ])
balanced_bayer[0::2, 1::2] = balanced_bayer[0::2, 1::2]*g_scale # Green (indices [0, 2, 4,... ; 1, 3, 5,... ])
balanced_bayer[1::2, 0::2] = balanced_bayer[1::2, 0::2]*g_scale # Green (indices [1, 3, 5,... ; 0, 2, 4,... ])
balanced_bayer[1::2, 1::2] = balanced_bayer[1::2, 1::2]*b_scale # Blue (indices [1, 3, 5,... ; 1, 3, 5,... ])
# Clip to range [0, 1] for avoiding "pinkish highlights" (avoiding "magenta cast" in the highlights).
balanced_bayer = np.minimum(balanced_bayer, 1)
# Demosaicing:
temp = np.round((balanced_bayer*(2**16-1))).astype(np.uint16) # Convert from double to np.uint16, because OpenCV demosaic() function requires a uint8 or uint16 input.
lin_rgb = cv2.cvtColor(temp, cv2.COLOR_BayerBG2RGB).astype(np.float64)/(2**16-1) # Apply Demosaicing and convert back to np.float64 in range [0, 1] (is there a bug in OpenCV Bayer naming?).
# Color Space Conversion
xyz2cam = ColorMatrix # ColorMatrix applies XYZ(D65) to CAM_rgb
rgb2cam = xyz2cam @ rgb2xyz  # matrix multiplication (MATLAB: xyz2cam * rgb2xyz)
# Result:
# rgb2cam = [0.2619 0.1835 0.0252
# 0.0921 0.7620 0.2053
# 0.0195 0.1897 0.5379]
# Normalize rows to 1. MATLAB shortcut: rgb2cam = rgb2cam ./ repmat(sum(rgb2cam,2),1,3);
rows_sum = np.sum(rgb2cam, 1)
# Result:
# rows_sum = [0.4706
# 1.0593
# 0.7470]
# Divide element of every row by the sum of the row:
rgb2cam[0, :] = rgb2cam[0, :] / rows_sum[0] # Divide top row
rgb2cam[1, :] = rgb2cam[1, :] / rows_sum[1] # Divide center row
rgb2cam[2, :] = rgb2cam[2, :] / rows_sum[2] # Divide bottom row
# Result (sum of every row is 1):
# rgb2cam = [0.5566 0.3899 0.0535
# 0.0869 0.7193 0.1938
# 0.0261 0.2539 0.7200]
cam2rgb = np.linalg.inv(rgb2cam) # Invert matrix
# Result:
# cam2rgb = [ 1.9644 -1.1197 0.1553
# -0.2412 1.6738 -0.4326
# 0.0139 -0.5498 1.5359]
r = lin_rgb[:, :, 0]
g = lin_rgb[:, :, 1]
b = lin_rgb[:, :, 2]
# Left multiply matrix cam2rgb by each RGB tuple (convert from "camera RGB" to "linear sRGB").
sr = cam2rgb[0, 0]*r + cam2rgb[0, 1]*g + cam2rgb[0, 2]*b
sg = cam2rgb[1, 0]*r + cam2rgb[1, 1]*g + cam2rgb[1, 2]*b
sb = cam2rgb[2, 0]*r + cam2rgb[2, 1]*g + cam2rgb[2, 2]*b
lin_srgb = np.dstack([sr, sg, sb])
lin_srgb = lin_srgb.clip(0, 1) # Clip to range [0, 1]
# Convert from "Linear sRGB" to sRGB (apply gamma)
sRGB = lin2rgb(lin_srgb) # lin2rgb uses the exact sRGB formula [you may approximate it with a power of (1/gamma)]
# Save to sRGB.png
cv2.imwrite('sRGB.png', cv2.cvtColor((sRGB*255).astype(np.uint8), cv2.COLOR_RGB2BGR))
Results (downscaled):
Result of RawTherapee (all enhancements are disabled):
MATLAB result:
Python result:
Note:
The result looks dark due to low exposure (and because we didn't apply any brightness correction).
I have trained different numbers of layers in a CNN+LSTM encoder-decoder model with attention.
The problem I am facing is very strange to me. The validation loss fluctuates around 3.***, as can be seen from the loss graphs below. I have a 3-layer CNN + 1-layer BLSTM encoder and a 1-layer LSTM decoder
3-layer CNN + 2-layer BLSTM encoder and a 1-layer LSTM decoder
I have also tried weight decay from 0.1 to 0.000001, but I am still getting this type of loss graph. Note that the accuracy of the model is increasing on both the validation and train sets. How is it possible that the validation loss is still around 3 while the accuracy is increasing? Can someone explain this?
Thanks
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class Encoder(nn.Module):
    def __init__(self, height, width, enc_hid_dim, dec_hid_dim, dropout):
        super().__init__()
        self.height = height
        self.enc_hid_dim = enc_hid_dim
        self.width = width
        self.layer0 = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(8),
            nn.MaxPool2d(2, 2),
        )
        self.layer1 = nn.Sequential(
            nn.Conv2d(8, 32, kernel_size=(3, 3), stride=(1, 1), padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(32),
            nn.MaxPool2d(2, 2),
        )
        self.layer2 = nn.Sequential(
            nn.Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=1),
            nn.ReLU(),
            nn.BatchNorm2d(64),
            nn.MaxPool2d(2, 2)
        )
        self.rnn = nn.LSTM(self.height // 8 * 64, self.enc_hid_dim, bidirectional=True)
        self.fc = nn.Linear(enc_hid_dim * 2, dec_hid_dim)
        self.dropout = nn.Dropout(dropout)
        self.cnn_dropout = nn.Dropout(p=0.2)

    def forward(self, src, in_data_len, train):
        batch_size = src.shape[0]
        out = self.layer0(src)
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.dropout(out)  # torch.Size([batch, channel, h, w])
        out = out.permute(3, 0, 2, 1)  # (width, batch, height, channels)
        out.contiguous()
        out = out.reshape(-1, batch_size, self.height // 8 * 64)  # (w, batch, (height, channels))
        width = out.shape[0]
        src_len = in_data_len.numpy() * (width / self.width)
        src_len = src_len + 0.999  # in case of 0 length value from float to int
        src_len = src_len.astype('int')
        out = pack_padded_sequence(out, src_len.tolist(), batch_first=False)
        outputs, hidden_out = self.rnn(out)
        hidden = hidden_out[0]
        cell = hidden_out[1]
        # output: t, b, f*2  hidden: 2, b, f
        outputs, output_len = pad_packed_sequence(outputs, batch_first=False)
        hidden = torch.tanh(self.fc(torch.cat((hidden[-2, :, :], hidden[-1, :, :]), dim=1)))
        cell = torch.tanh(self.fc(torch.cat((cell[-2, :, :], cell[-1, :, :]), dim=1)))
        return outputs, hidden, cell, output_len

class Decoder(nn.Module):
    def __init__(self, output_dim, emb_dim, enc_hid_dim, dec_hid_dim, dropout, attention):
        super().__init__()
        self.output_dim = output_dim
        self.attention = attention
        self.embedding = nn.Embedding(output_dim, emb_dim)
        self.rnn = nn.LSTM((enc_hid_dim * 2) + emb_dim, dec_hid_dim)
        self.fc_out = nn.Linear((enc_hid_dim * 2) + dec_hid_dim + emb_dim, output_dim)
        self.dropout_layer = nn.Dropout(dropout)

    def forward(self, input, hidden, cell, encoder_outputs, train):
        input = torch.topk(input, 1)[1]
        embedded = self.embedding(input)
        if train:
            embedded = self.dropout_layer(embedded)
        embedded = embedded.permute(1, 0, 2)
        # embedded = [1, batch size, emb dim]
        a = self.attention(hidden, encoder_outputs)
        # a = [batch size, src len]
        a = a.unsqueeze(1)
        # a = [batch size, 1, src len]
        encoder_outputs = encoder_outputs.permute(1, 0, 2)
        # encoder_outputs = [batch size, src len, enc hid dim * 2]
        weighted = torch.bmm(a, encoder_outputs)
        weighted = weighted.permute(1, 0, 2)
        # weighted = [1, batch size, enc hid dim * 2]
        rnn_input = torch.cat((embedded, weighted), dim=2)
        output, hidden_out = self.rnn(rnn_input, (hidden.unsqueeze(0), cell.unsqueeze(0)))  # comma was missing here
        hidden = hidden_out[0]
        cell = hidden_out[1]
        assert (output == hidden).all()
        embedded = embedded.squeeze(0)
        output = output.squeeze(0)
        weighted = weighted.squeeze(0)
        prediction = self.fc_out(torch.cat((output, weighted, embedded), dim=1))
        return prediction, hidden.squeeze(0), cell.squeeze(0)
I am new to control systems and have tried to use GEKKO. My model simulation runs correctly; however, at MHE and also at MPC, it runs correctly when t = 15 seconds, but if t = 16 seconds it goes erratic at the 12th time point, as shown in the attached figure. Is there a reason for that?
from gekko import GEKKO
import numpy as np

m = GEKKO(remote = False)
g = m.Const(value = 9.81)
Cc = m.Const(value = 2*10**-5)
D1 = m.Const(value = 0.1016)
D2 = m.Const(value = 0.1016)
h1 = m.Const(value = 100)
hv = m.Const(value = 1000)
L1 = m.Const(value = 500)
L2 = m.Const(value = 1100)
V1 = m.Const(value = 4.054)
V2 = m.Const(value = 9.729)
A1 = m.Const(0.008107)
A2 = m.Const(0.008107)
beta1 = m.Const(value = 1.5 * 10**9)
beta2 = m.Const(value = 1.5* 10**9)
Pres = m.Const(value = 1.261 * 10**7)
M = m.Const(value = 1.992 * 10**8)
rho = m.Const(value = 932)
PI = m.Const(value = 2.32 * 10**(-9))
visc = m.Const(value = 0.024)
fo = m.Const(value = 60)
Inp = m.Const(value = 65)
Pnp = m.Const(value = 1.625 * 10**5)
z = m.MV(1, lb = 0.1, ub = 1)
f = m.MV(35, lb = 35, ub = 65)
f.STATUS = 1
f.DMAX = 30
f.DCOST = 0.0002 # slow down change of frequency
f.UPPER = 65
Ppin = m.CV()
Ppin.STATUS = 1 # add the SP to the objective
Ppin.SP = 6e6 # set point
Ppin.TR_INIT = 1 # set point trajectory
Ppin.TAU = 5 # time constant of trajectory
Pwf = m.Var(ub = Pres)
Ppout = m.Var()
Pwh = m.Var()
Pm = m.Var()
P_fric_drop = m.Var()
F1 = m.Var()
F2 = m.Var()
q = m.Var(lb = 0)
qc = m.Var(lb = 0)
qr = m.Var(lb = 0)
I = m.Var()
H = m.Var()
BHP = m.Var(lb = 0)
BHPo = m.Var()
qo = m.Var()
Ho = m.Var()
m.Equation([Pwh.dt() == beta2*(q-qc)/V2,
Pwf.dt() == beta1*(qr-q)/V1,
q.dt() == (1/M)*(Pwf - Pwh - rho*g*hv - P_fric_drop + (Ppout-Ppin))])
m.Equation([qr == PI*(Pres - Pwf),
qc == Cc*z*(m.sqrt((Pwh - Pm))),
Pm == Pwh/2,
P_fric_drop == F1 + F2,
F1 == 0.158 * rho * L1* q**2 * visc**0.5 /(D1 * A1**2 * (rho*D1*q)**0.5),
F2 == 0.158 * rho * L2* q**2 * visc**0.5 /(D2 * A2**2 * (rho*D2*q)**0.5)])
m.Equation([
qo == q * fo/f,
H == Ho * (f/fo)**2,
BHPo == -2.3599e9*qo**3 - 1.8082e7*qo**2 + 4.3346e6*qo + 9.4355e4,
BHP == BHPo * (f/fo)**3,
Ppin == Pwf - rho*g*h1 - F1,
Ho == 9.5970e2+7.4959e3*qo+-1.2454e6*qo**2,
I == Inp * BHP / Pnp,
Ppout == H*rho*g + Ppin
])
m.options.solver = 1
m.options.MAX_ITER = 250
m.options.IMODE = 1
m.solve(debug=True)
tf = 30 # final time (sec)
st = 2.0 # simulation time intervals
nt = int(tf/st)+1 # simulation time points
m.time = np.linspace(0,tf,nt)
m.options.CV_TYPE = 2 # squared error
m.options.solver = 3
m.options.NODES=2
m.options.IMODE = 5
m.solve(debug = False)
I wasn't able to reproduce the problem you encountered so I'll give a few specific suggestions for troubleshooting your application and improving the ability of the solver to find a solution.
Help the Solver Find a Solution
Avoid sqrt(-negative)
#qc == Cc*z*(m.sqrt((Pwh - Pm)))
qc**2 == Cc**2 * z**2 * (Pwh - Pm) # avoid negative in sqrt
Avoid potential for divide by zero. The value of q is constrained away from zero but look for other equations that could be improved by re-arrangement.
#qo == q * fo/f
qo * f == q * fo
#H == Ho * (f/fo)**2
H * fo**2 == Ho * f**2
#I == Inp * BHP / Pnp
I * Pnp == Inp * BHP
Constrain the FVs and MVs
It can often help to set a small DMAX or to limit the upper and lower bounds for FVs and MVs. Constraints on the Var types can lead to infeasible solutions.
Reduce the degrees of freedom
For MHE applications, I recommend that you use FVs instead of MVs for your unknown parameters. The FVs calculate a single value over all of the measurements. The MVs calculate a new parameter value for every time point. Using an FV instead of an MV can help the solver because there are fewer decisions for the optimizer and the parameter values won't be adjusted to track signal noise.
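For example (a sketch; p is a hypothetical unknown parameter, not one from your model):

p = m.FV(value=1.0, lb=0.5, ub=2.0)  # one value estimated over the whole horizon
p.STATUS = 1                         # let the estimator adjust it
# An m.MV(...) here would instead give the optimizer one decision per time point.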
Check the Solutions
Configure the objective function.
For MHE applications, you need to have some CV values that you are trying to match, with FSTATUS=1. In the example you posted there are no CVs with FSTATUS=1 or any inserted measurements. The purpose of MHE is to align a model with measured CV values by adjusting the unknown parameters as FVs or MVs.
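A sketch of that configuration, using Ppin from your model (the measurement value is hypothetical):

Ppin.FSTATUS = 1    # include measurements in the MHE objective
Ppin.MEAS = 6.1e6   # latest measurement, inserted at each MHE cycle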
Check the solver status to make sure that it found a successful solution with APPSTATUS==1:
if m.options.APPSTATUS == 1:
    print('Successful Solution')
else:
    print('Solver Failed to Find a Solution')
Create plots of the variables, especially if there is a problematic period. You can often see where an integrating variable such as qr or qc is approaching a variable constraint. Here is an example where I converted your MHE application into an MPC application. I'm adjusting the friction factor f (not realistic) and the valve opening z to show how the MPC drives Ppin to the target set point.
from gekko import GEKKO
import numpy as np
import matplotlib.pyplot as plt
m = GEKKO(remote = False)
g = m.Const(value = 9.81)
Cc = m.Const(value = 2*10**-5)
D1 = m.Const(value = 0.1016)
D2 = m.Const(value = 0.1016)
h1 = m.Const(value = 100)
hv = m.Const(value = 1000)
L1 = m.Const(value = 500)
L2 = m.Const(value = 1100)
V1 = m.Const(value = 4.054)
V2 = m.Const(value = 9.729)
A1 = m.Const(0.008107)
A2 = m.Const(0.008107)
beta1 = m.Const(value = 1.5 * 10**9)
beta2 = m.Const(value = 1.5* 10**9)
Pres = m.Const(value = 1.261 * 10**7)
M = m.Const(value = 1.992 * 10**8)
rho = m.Const(value = 932)
PI = m.Const(value = 2.32 * 10**(-9))
visc = m.Const(value = 0.024)
fo = m.Const(value = 60)
Inp = m.Const(value = 65)
Pnp = m.Const(value = 1.625 * 10**5)
z = m.MV(1, lb = 0.1, ub = 1)
z.STATUS = 1
z.DMAX = 0.01
f = m.MV(35, lb = 35, ub = 65)
f.STATUS = 1
f.DMAX = 30
f.DCOST = 0.0002 # slow down change of frequency
f.UPPER = 65
Ppin = m.CV()
Ppin.STATUS = 1 # add the SP to the objective
Ppin.SP = 6e6 # set point
Ppin.TR_INIT = 1 # set point trajectory
Ppin.TAU = 5 # time constant of trajectory
Pwf = m.Var(ub = Pres)
Ppout = m.Var()
Pwh = m.Var()
Pm = m.Var()
P_fric_drop = m.Var()
F1 = m.Var()
F2 = m.Var()
q = m.Var(lb = 0)
qc = m.Var(lb = 0)
qr = m.Var(lb = 0)
I = m.Var()
H = m.Var()
BHP = m.Var(lb = 0)
BHPo = m.Var()
qo = m.Var()
Ho = m.Var()
m.Equation([Pwh.dt() == beta2*(q-qc)/V2,
Pwf.dt() == beta1*(qr-q)/V1,
M * q.dt() == Pwf - Pwh - rho*g*hv - P_fric_drop + (Ppout-Ppin)])
m.Equation([qr == PI*(Pres - Pwf),
qc == Cc*z*(m.sqrt((Pwh - Pm))),
Pm == Pwh/2,
P_fric_drop == F1 + F2,
F1 == 0.158 * rho * L1* q**2 * visc**0.5 /(D1 * A1**2 * (rho*D1*q)**0.5),
F2 == 0.158 * rho * L2* q**2 * visc**0.5 /(D2 * A2**2 * (rho*D2*q)**0.5)])
m.Equation([
qo * f == q * fo,
H * fo**2 == Ho * f**2,
BHPo == -2.3599e9*qo**3 - 1.8082e7*qo**2 + 4.3346e6*qo + 9.4355e4,
BHP == BHPo * (f/fo)**3,
Ppin == Pwf - rho*g*h1 - F1,
Ho == 9.5970e2+7.4959e3*qo+-1.2454e6*qo**2,
I * Pnp == Inp * BHP,
Ppout == H*rho*g + Ppin
])
m.options.solver = 1
m.options.MAX_ITER = 250
m.options.IMODE = 1
m.solve()
tf = 30 # final time (sec)
st = 2.0 # simulation time intervals
nt = int(tf/st)+1 # simulation time points
m.time = np.linspace(0,tf,nt)
m.options.CV_TYPE = 2 # squared error
m.options.solver = 3
m.options.NODES=2
m.options.IMODE = 6
plt.figure(figsize=(10, 5))
for i in range(10):
    m.solve(disp=False)
    print(i, m.options.APPSTATUS)
    plt.subplot(10, 1, i + 1)
    plt.plot(m.time + i * st, Ppin.value)
    plt.xlim([0, 50])
    plt.ylim([6e6, 7e6])
plt.show()
Your application looks like it is for hydraulic pressure control in drilling. There are additional models available on GitHub. I hope you'll consider contributing your model or case studies to the open-source repository.
import numpy as np

def adaboost(X_train, Y_train, X_test, Y_test, lamb=0.01, num_iterations=200, learning_rate=0.001):
    label_train = 2 * Y_train - 1
    label_test = 2 * Y_test - 1
    [n, p] = X_train.shape
    [ntest, ptest] = X_test.shape
    X_train_1 = np.concatenate((np.ones([n, 1]), X_train), axis=1)
    X_test_1 = np.concatenate((np.ones([ntest, 1]), X_test), axis=1)
    beta = np.zeros([p + 1])
    acc_train = []
    acc_test = []
    # margins = []
    for it in range(num_iterations):
        score = np.matmul(X_train_1, beta)
        error = (score * label_train < 1)
        dbeta = np.mean(X_train_1 * (error * label_train).reshape(-1, 1), axis=0)
        beta += learning_rate * dbeta
        beta[1:] -= lamb * beta[1:]
        # margins.append(np.min(score*label_train))
        # train
        predict = (np.sign(score) == label_train)
        acc = np.sum(predict) / n
        acc_train.append(acc)
        # test
        score_test = np.matmul(X_test_1, beta)
        predict = (np.sign(score_test) == label_test)
        acc = np.sum(predict) / ntest
        acc_test.append(acc)
    return beta, acc_train, acc_test
I am calling this function by:
_, train_acc, test_acc = adaboost(X_train, y_train, X_test, y_test)
and it is giving the error provided in the title
for line 68: [ntest, ptest] = X_test.shape
Any idea how to stop getting this error? Can someone explain what I am doing wrong?
Whatever X_test is, it must have only a single dimension, when it should be two-dimensional.
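A minimal sketch of the failure and one possible fix (shapes are hypothetical):

import numpy as np

X_test = np.zeros(10)            # 1-D, shape (10,)
# ntest, ptest = X_test.shape    # ValueError: not enough values to unpack
X_test = X_test.reshape(1, -1)   # make it 2-D: one sample with 10 features
ntest, ptest = X_test.shape      # now unpacks fine: (1, 10)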
I've been training a U-Net for single-class small lesion segmentation, and have been getting consistently volatile validation loss. I have about 20k images split 70/30 between training and validation sets, so I don't think the issue is too little data. I've tried shuffling and resplitting the sets a few times with no change in volatility, so I don't think the validation set is unrepresentative. I have tried lowering the learning rate with no effect on volatility, and I have tried a few loss functions (dice coefficient, focal tversky, weighted binary cross-entropy). I'm using a decent amount of augmentation so as to avoid overfitting. I've also run through all my data (512x512 float64s with corresponding 512x512 int64 masks, both stored as numpy arrays) to double-check that the value range, dtypes, etc. aren't screwy, and I even removed any ROIs in the masks under 35 pixels in area, which I thought might be artifact and messing with loss.
I'm using keras ImageDataGen.flow_from_directory. I was initially using zca_whitening and brightness_range augmentation, but I think this causes issues with flow_from_directory and the link between mask and image being lost, so I skipped this.
I've tried validation generators with and without shuffle=True. Batch size is 8.
Here's some of my code, happy to include more if it would help:
# loss
from keras.losses import binary_crossentropy
import keras.backend as K
import tensorflow as tf
epsilon = 1e-5
smooth = 1
def dsc(y_true, y_pred):
    smooth = 1.
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    score = (2. * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)
    return score

def dice_loss(y_true, y_pred):
    loss = 1 - dsc(y_true, y_pred)
    return loss

def bce_dice_loss(y_true, y_pred):
    loss = binary_crossentropy(y_true, y_pred) + dice_loss(y_true, y_pred)
    return loss

def confusion(y_true, y_pred):
    smooth = 1
    y_pred_pos = K.clip(y_pred, 0, 1)
    y_pred_neg = 1 - y_pred_pos
    y_pos = K.clip(y_true, 0, 1)
    y_neg = 1 - y_pos
    tp = K.sum(y_pos * y_pred_pos)
    fp = K.sum(y_neg * y_pred_pos)
    fn = K.sum(y_pos * y_pred_neg)
    prec = (tp + smooth) / (tp + fp + smooth)
    recall = (tp + smooth) / (tp + fn + smooth)
    return prec, recall

def tp(y_true, y_pred):
    smooth = 1
    y_pred_pos = K.round(K.clip(y_pred, 0, 1))
    y_pos = K.round(K.clip(y_true, 0, 1))
    tp = (K.sum(y_pos * y_pred_pos) + smooth) / (K.sum(y_pos) + smooth)
    return tp

def tn(y_true, y_pred):
    smooth = 1
    y_pred_pos = K.round(K.clip(y_pred, 0, 1))
    y_pred_neg = 1 - y_pred_pos
    y_pos = K.round(K.clip(y_true, 0, 1))
    y_neg = 1 - y_pos
    tn = (K.sum(y_neg * y_pred_neg) + smooth) / (K.sum(y_neg) + smooth)
    return tn

def tversky(y_true, y_pred):
    y_true_pos = K.flatten(y_true)
    y_pred_pos = K.flatten(y_pred)
    true_pos = K.sum(y_true_pos * y_pred_pos)
    false_neg = K.sum(y_true_pos * (1 - y_pred_pos))
    false_pos = K.sum((1 - y_true_pos) * y_pred_pos)
    alpha = 0.7
    return (true_pos + smooth) / (true_pos + alpha * false_neg + (1 - alpha) * false_pos + smooth)

def tversky_loss(y_true, y_pred):
    return 1 - tversky(y_true, y_pred)

def focal_tversky(y_true, y_pred):
    pt_1 = tversky(y_true, y_pred)
    gamma = 0.75
    return K.pow((1 - pt_1), gamma)
model = BlockModel((len(os.listdir(os.path.join(imageroot,'train_ct','train'))), 512, 512, 1),filt_num=16,numBlocks=4)
#model.compile(optimizer=Adam(learning_rate=0.001), loss=weighted_cross_entropy)
#model.compile(optimizer=Adam(learning_rate=0.001), loss=dice_coef_loss)
model.compile(optimizer=Adam(learning_rate=0.001), loss=focal_tversky)
train_mask = os.path.join(imageroot,'train_masks')
val_mask = os.path.join(imageroot,'val_masks')
model.load_weights(model_weights_path) #I'm initializing with some pre-trained weights from a similar model
data_gen_args_mask = dict(
    rotation_range=10,
    shear_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=[0.8, 1.2],
    horizontal_flip=True,
    #vertical_flip=True,
    fill_mode='nearest',
    data_format='channels_last'
)
data_gen_args = dict(
    **data_gen_args_mask
)
image_datagen_train = ImageDataGenerator(**data_gen_args)
mask_datagen_train = ImageDataGenerator(**data_gen_args)#_mask)
image_datagen_val = ImageDataGenerator()
mask_datagen_val = ImageDataGenerator()
seed = 1
BS = 8
steps = int(np.floor((len(os.listdir(os.path.join(train_ct,'train'))))/BS))
print(steps)
val_steps = int(np.floor((len(os.listdir(os.path.join(val_ct,'val'))))/BS))
print(val_steps)
train_image_generator = image_datagen_train.flow_from_directory(
    train_ct,
    target_size=(512, 512),
    color_mode="grayscale",
    classes=None,
    class_mode=None,
    seed=seed,
    shuffle=True,
    batch_size=BS)
train_mask_generator = mask_datagen_train.flow_from_directory(
    train_mask,
    target_size=(512, 512),
    color_mode="grayscale",
    classes=None,
    class_mode=None,
    seed=seed,
    shuffle=True,
    batch_size=BS)
val_image_generator = image_datagen_val.flow_from_directory(
    val_ct,
    target_size=(512, 512),
    color_mode="grayscale",
    classes=None,
    class_mode=None,
    seed=seed,
    shuffle=True,
    batch_size=BS)
val_mask_generator = mask_datagen_val.flow_from_directory(
    val_mask,
    target_size=(512, 512),
    color_mode="grayscale",
    classes=None,
    class_mode=None,
    seed=seed,
    shuffle=True,
    batch_size=BS)
train_generator = zip(train_image_generator, train_mask_generator)
val_generator = zip(val_image_generator, val_mask_generator)
# make callback for checkpointing
plot_losses = PlotLossesCallback(skip_first=0,plot_extrema=False)
%matplotlib inline
filepath = os.path.join(versionPath, model_version + "_saved-model-{epoch:02d}-{val_loss:.2f}.hdf5")
if reduce:
    cb_check = [ModelCheckpoint(filepath, monitor='val_loss',
                                verbose=1, save_best_only=False,
                                save_weights_only=True, mode='auto', period=1),
                reduce_lr,
                plot_losses]
else:
    cb_check = [ModelCheckpoint(filepath, monitor='val_loss',
                                verbose=1, save_best_only=False,
                                save_weights_only=True, mode='auto', period=1),
                plot_losses]

# train model
history = model.fit_generator(train_generator, epochs=numEp,
                              steps_per_epoch=steps,
                              validation_data=val_generator,
                              validation_steps=val_steps,
                              verbose=1,
                              callbacks=cb_check,
                              use_multiprocessing=False)
And here's how my loss looks:
Another potentially relevant thing: I tweaked the flow_from_directory code a bit (added npy to the white list). But training loss looks fine, so I'm assuming the issue isn't here.
Two suggestions:
Switch to the classic validation data format (i.e., a numpy array) instead of using a generator -- this ensures you always use exactly the same validation data every time. If you then see a different validation curve, something "random" in the validation generator is giving you different data at different epochs.
Use a fixed set of samples (100 or 1000 should be enough without any data augmentation) for both training and validation. If everything goes well, you should see your network quickly overfit to this dataset, and your training and validation curves should look very similar. If not, debug your network.
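A sketch of the first suggestion wired into the code from the question (assuming a single fixed batch is acceptable for the check; pull and concatenate more batches if not):

x_val, y_val = next(val_generator)  # one fixed batch of images and masks

history = model.fit_generator(train_generator, epochs=numEp,
                              steps_per_epoch=steps,
                              validation_data=(x_val, y_val),  # identical data every epoch
                              verbose=1,
                              callbacks=cb_check)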