User-defined kernels on multiple GPUs in CuPy - multi-GPU

I'm trying to launch raw kernels in CuPy on multiple GPUs.
Example 1 (RawKernel):
import numpy as np
import cupy

gpu_indices = (0, 1)
n_gpu = len(gpu_indices)

source = """
extern "C" {
__global__
void my_add(const float *x1, const float *x2, float *y, const int n)
{
    const int tid = threadIdx.x + blockDim.x * blockIdx.x;
    if (tid < n) {
        y[tid] = x1[tid] + x2[tid];
    }
}
}
"""

add_kernel = [None] * n_gpu
for gpu_id in gpu_indices:
    with cupy.cuda.Device(gpu_id):
        add_kernel[gpu_id] = cupy.RawKernel(source, 'my_add')

x1 = [None] * n_gpu
x2 = [None] * n_gpu
y = [None] * n_gpu
streams = [None] * n_gpu
for gpu_id in gpu_indices:
    with cupy.cuda.Device(gpu_id):
        x1[gpu_id] = cupy.arange(25, dtype=cupy.float32).reshape(5, 5)
        x2[gpu_id] = cupy.arange(25, dtype=cupy.float32).reshape(5, 5)
        y[gpu_id] = cupy.zeros((5, 5), dtype=cupy.float32)
        streams[gpu_id] = cupy.cuda.stream.Stream()

for gpu_id in gpu_indices:
    cupy.cuda.Device(gpu_id).use()
    streams[gpu_id].use()
    add_kernel[gpu_id]((5,), (5,), (x1[gpu_id], x2[gpu_id], y[gpu_id], x1[gpu_id].size), stream=streams[gpu_id])
    streams[gpu_id].synchronize()
    print(y[gpu_id])
This code runs without error with gpu_indices=(0,) (single GPU) and gpu_indices=(0,1) (dual GPU) on my machine.
Example 2 (RawModule):
import numpy as np
import cupy

gpu_indices = (0, 1)
n_gpu = len(gpu_indices)

source = """
extern "C" {
__global__
void my_add(const float *x1, const float *x2, float *y, const int n)
{
    const int tid = threadIdx.x + blockDim.x * blockIdx.x;
    if (tid < n) {
        y[tid] = x1[tid] + x2[tid];
    }
}
}
"""

module = cupy.RawModule(code=source)
add_kernel = [None] * n_gpu
for gpu_id in gpu_indices:
    with cupy.cuda.Device(gpu_id):
        add_kernel[gpu_id] = module.get_function('my_add')

# same as example 1
x1 = [None] * n_gpu
x2 = [None] * n_gpu
y = [None] * n_gpu
streams = [None] * n_gpu
for gpu_id in gpu_indices:
    with cupy.cuda.Device(gpu_id):
        x1[gpu_id] = cupy.arange(25, dtype=cupy.float32).reshape(5, 5)
        x2[gpu_id] = cupy.arange(25, dtype=cupy.float32).reshape(5, 5)
        y[gpu_id] = cupy.zeros((5, 5), dtype=cupy.float32)
        streams[gpu_id] = cupy.cuda.stream.Stream()

for gpu_id in gpu_indices:
    cupy.cuda.Device(gpu_id).use()
    streams[gpu_id].use()
    add_kernel[gpu_id]((5,), (5,), (x1[gpu_id], x2[gpu_id], y[gpu_id], x1[gpu_id].size), stream=streams[gpu_id])
    streams[gpu_id].synchronize()
    print(y[gpu_id])
The only difference between the two examples is how the kernel is obtained. In Example 2, cupy.RawModule() is called to compile the source, and the kernels are acquired using get_function(). However, this code fails with gpu_indices=(0,1):
[[ 0. 2. 4. 6. 8.]
[10. 12. 14. 16. 18.]
[20. 22. 24. 26. 28.]
[30. 32. 34. 36. 38.]
[40. 42. 44. 46. 48.]]
Traceback (most recent call last):
File "test_rawmodule.py", line 42, in <module>
add_kernel[gpu_id]((5,), (5,), (x1[gpu_id], x2[gpu_id], y[gpu_id], x1[gpu_id].size), stream=streams[gpu_id])
File "cupy/core/raw.pyx", line 66, in cupy.core.raw.RawKernel.__call__
File "cupy/cuda/function.pyx", line 162, in cupy.cuda.function.Function.__call__
File "cupy/cuda/function.pyx", line 144, in cupy.cuda.function._launch
File "cupy/cuda/driver.pyx", line 293, in cupy.cuda.driver.launchKernel
File "cupy/cuda/driver.pyx", line 118, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_INVALID_HANDLE: invalid resource handle
Could anyone provide an example of cupy.RawModule on multiple GPUs? I prefer RawModule over RawKernel for several reasons.
Here is my environment:
cupy 7.3, numpy 1.18.1
cuda 10.2 (10.2.89_440.33.01)
python 3.6.10 (anaconda)
Linux Mint 19.1 Tessa (based on Ubuntu 18.04)
Two GTX 1080 Ti GPUs (they work well with native CUDA C/C++ programs)
Addendum: if I try to compile the source on each GPU, like this:
module = [None] * n_gpu
add_kernel = [None] * n_gpu
for gpu_id in gpu_indices:
    with cupy.cuda.Device(gpu_id):
        module[gpu_id] = cupy.RawModule(code=source)
        add_kernel[gpu_id] = module[gpu_id].get_function('my_add')
then I get the following error:
module[gpu_id] = cupy.RawModule(code=source)
File "cupy/core/raw.pyx", line 252, in cupy.core.raw.RawModule.__init__
File "cupy/core/carray.pxi", line 146, in cupy.core.core.compile_with_cache
File "cupy/core/carray.pxi", line 194, in cupy.core.core.compile_with_cache
File "/home/kejang/anaconda3/envs/python3.6/lib/python3.6/site-packages/cupy/cuda/compiler.py", line 287, in compile_with_cache
extra_source, backend)
File "/home/kejang/anaconda3/envs/python3.6/lib/python3.6/site-packages/cupy/cuda/compiler.py", line 335, in _compile_with_cache_cuda
mod.load(cubin)
File "cupy/cuda/function.pyx", line 197, in cupy.cuda.function.Module.load
File "cupy/cuda/function.pyx", line 199, in cupy.cuda.function.Module.load
File "cupy/cuda/driver.pyx", line 240, in cupy.cuda.driver.moduleLoadData
File "cupy/cuda/driver.pyx", line 118, in cupy.cuda.driver.check_status
cupy.cuda.driver.CUDADriverError: CUDA_ERROR_CONTEXT_IS_DESTROYED: context is destroyed
This is strange, because I think the source should be compiled on each GPU, just as in Example 1.
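For anyone else hitting this: the handle returned by get_function() is tied to the CUDA context that was current when the module was loaded, which is consistent with launching it under another device failing with CUDA_ERROR_INVALID_HANDLE. Below is a minimal per-device sketch. It assumes a CuPy version where a RawModule can be created and queried under each device's context (the addendum above suggests that path was broken in CuPy 7.3; my understanding is that later CuPy releases compile and load RawModule lazily per device, so this pattern, and even Example 2 as written, should work there):

import cupy

source = """
extern "C" __global__
void my_add(const float *x1, const float *x2, float *y, const int n)
{
    const int tid = threadIdx.x + blockDim.x * blockIdx.x;
    if (tid < n) {
        y[tid] = x1[tid] + x2[tid];
    }
}
"""

gpu_indices = (0, 1)
kernels = {}
for gpu_id in gpu_indices:
    with cupy.cuda.Device(gpu_id):
        # Compile and resolve the function while this device is current, so
        # the module (and hence the kernel handle) belongs to its context.
        kernels[gpu_id] = cupy.RawModule(code=source).get_function('my_add')

for gpu_id in gpu_indices:
    with cupy.cuda.Device(gpu_id):
        x = cupy.arange(25, dtype=cupy.float32)
        y = cupy.zeros_like(x)
        kernels[gpu_id]((5,), (5,), (x, x, y, x.size))
        print(gpu_id, y)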

Related

Using Cython functions with multiprocessing failed

I have the following code in my Jupyter notebook on macOS to compute a similarity measure between pairs of lists. When I run distSeq on a single pair of lists, distSeq(list1, list2, len1, len2), the similarity measure is computed successfully. However, when I use multiprocessing in another cell to compute the measure among many list pairs, a ModuleNotFoundError is raised, and I can't work out what the problem is, as I can't find any answer online.
Define functions in a Cython cell
%%cython --annotate
cimport numpy as np
import numpy as np
import osmnx as ox
from scipy.spatial.distance import cdist
import cython
from __main__ import G

def simPnt(p1_n, p2_m):
    *details cleared*

cpdef list lcs(list r1, list r2):  # longest common subsequence btw route 1 and route 2
    cdef int N = len(r1)
    cdef int M = len(r2)  # route is list of node ID
    L_arr = np.empty((N+1, M+1), dtype=np.single)
    cdef float[:, :] L = L_arr  # declare L -> sum of similarity score
    cdef list LCS = []
    sp_arr = cdist(np.array(r1).reshape((N, 1)), np.array(r2).reshape((M, 1)), simPnt)
    cdef double[:, :] sp = sp_arr
    cdef int n, m
    for n in range(N+1):
        for m in range(M+1):
            if n == 0 or m == 0:
                L[n, m] = 0
            else:
                L[n, m] = max([L[n-1, m-1] + sp[n-1, m-1], L[n, m-1], L[n-1, m]])
    # backtrack
    n, m = N, M
    cdef float tmp
    while n > 0 and m > 0:
        tmp = L[n-1, m-1] + sp[n-1, m-1]
        if tmp > L[n, m-1] and tmp > L[n-1, m]:
            LCS.append((n-1, m-1))
            n -= 1
            m -= 1
        elif L[n-1, m] > L[n, m-1]:
            n -= 1
        else:
            m -= 1
    # print(LCS)
    return LCS[::-1]  # matched points solely from either route 1 or 2

cpdef float distSeq(list r1, list r2, float lenseq_r1, float lenseq_r2):
    *details cleared*
    if min(lenseq_LCS1, lenseq_LCS2) < gamma:
        return 1
    elif lenseq_r1 < lenseq_r2:
        return 1 - (lenseq_LCS1/lenseq_r1)
    else:
        return 1 - (lenseq_LCS2/lenseq_r2)
Run the function with multiprocessing
import itertools
import multiprocessing as mp
import pandas as pd

pool = mp.Pool(mp.cpu_count())
results = [pool.apply(distSeq, args=(r1, r2, lenseq[i], lenseq[j]))
           for ((i, r1), (j, r2)) in itertools.combinations(enumerate(routes), 2)]
df2 = pd.DataFrame()
df2['r1_r2'] = list(itertools.combinations(list(range(len(routes))), 2))
df2['distSeq'] = results
pool.close()
Error log
Process SpawnPoolWorker-28:
Traceback (most recent call last):
File "/Users/timmyhsu/miniconda3/envs/py3.8/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
self.run()
File "/Users/timmyhsu/miniconda3/envs/py3.8/lib/python3.8/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/Users/timmyhsu/miniconda3/envs/py3.8/lib/python3.8/multiprocessing/pool.py", line 114, in worker
task = get()
File "/Users/timmyhsu/miniconda3/envs/py3.8/lib/python3.8/multiprocessing/queues.py", line 358, in get
return _ForkingPickler.loads(res)
ModuleNotFoundError: No module named '_cython_magic_86f410b10c6f86b67ee02703c26cace8'
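A note on what the traceback is saying: on macOS, Python 3.8+ uses the spawn start method by default, so each SpawnPoolWorker is a fresh interpreter that must re-import whatever the pickled task references. The %%cython magic compiles the cell into a throwaway module (_cython_magic_...) that only the parent kernel knows about, hence the ModuleNotFoundError. A minimal sketch of one common workaround, forcing the fork start method so workers inherit the parent's memory (pairwise_dist below is a hypothetical stand-in; with fork, the real distSeq should be inherited the same way):

import itertools
import multiprocessing as mp

def pairwise_dist(args):
    # Stand-in for distSeq; a forked worker would also inherit the real
    # _cython_magic_* function from the parent process.
    (i, r1), (j, r2) = args
    return abs(len(r1) - len(r2))

if __name__ == '__main__':
    routes = [[1, 2, 3], [1, 2], [2, 3, 4]]
    ctx = mp.get_context('fork')  # default is 'spawn' on macOS since Python 3.8
    with ctx.Pool(ctx.cpu_count()) as pool:
        pairs = list(itertools.combinations(enumerate(routes), 2))
        results = pool.map(pairwise_dist, pairs)
    print(results)

Note that fork has its own caveats on macOS (it can be unsafe with threads and certain system libraries), so treat this as a starting point rather than a guaranteed fix.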

Error message for pytesseract

I'm a newbie at Python. I'm trying to automate the "chimp test" on Human Benchmark, and I decided to use Tesseract.
import pyautogui as p
import pytesseract
from PIL import Image
import PIL.ImageOps
import time as t

a, b = 0, 0
while a == 0 and b == 0:
    try:
        a, b = p.locateCenterOnScreen("Start.png", confidence=0.7)
    except TypeError:
        pass
p.click(a, b)

columns = 8
rows = 5
squareSize = 88
numbers = 4

while True:
    t.sleep(1)
    p.moveTo(100, 100)
    image = p.screenshot(region=(320, 96, columns * squareSize, rows * squareSize))
    image = image.convert("L")
    image = PIL.ImageOps.invert(image)
    pixels = image.load()
    for x in range(image.size[0]):
        for y in range(image.size[1]):
            if pixels[x, y] > 10:
                pixels[x, y] = 255
            else:
                pixels[x, y] = 0
    coords = [(0, 0)] * numbers
    grid = []
    for y in range(rows):
        row = []
        for x in range(columns):
            digit = pytesseract.image_to_string(image.crop((x * squareSize, y * squareSize, (x+1) * squareSize, (y+1) * squareSize)), config="--psm 10 --oem 2 -c tessedit_char_whitelist=0123456789 classify_max_slope=20 classify_min_slope=0.2")
            digit = digit[:-2]
            if digit.isdigit():
                coords[int(digit)-1] = ((x+0.5)*squareSize, (y+0.5)*squareSize)
            row.append(digit)
            print("At " + str(x) + "," + str(y) + " pytesseract saw: " + digit)
        grid.append(row)
    print(grid)
    for z in range(numbers):
        p.click(coords[z][0]+320, coords[z][1]+96)
    a, b = 0, 0
    while a == 0 and b == 0:
        try:
            a, b = p.locateCenterOnScreen("Continue.JPG", confidence=0.7)
        except TypeError:
            pass
    p.click(a, b)
    numbers += 1
and I get the error message:
Traceback (most recent call last):
File "C:\Users\franc\AppData\Local\Programs\Python\Python310\lib\site-packages\pytesseract\pytesseract.py", line 254, in run_tesseract
proc = subprocess.Popen(cmd_args, **subprocess_args())
File "C:\Users\franc\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 966, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\franc\AppData\Local\Programs\Python\Python310\lib\subprocess.py", line 1435, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\franc\Desktop\New folder\New Text Document.py", line 40, in <module>
digit = pytesseract.image_to_string(image.crop((x * squareSize, y * squareSize, (x+1) * squareSize, (y+1) * squareSize)), config="--psm 10 --oem 2 -c tessedit_char_whitelist=0123456789 classify_max_slope=20 classify_min_slope=0.2")
File "C:\Users\franc\AppData\Local\Programs\Python\Python310\lib\site-packages\pytesseract\pytesseract.py", line 416, in image_to_string
return {
File "C:\Users\franc\AppData\Local\Programs\Python\Python310\lib\site-packages\pytesseract\pytesseract.py", line 419, in <lambda>
Output.STRING: lambda: run_and_get_output(*args),
File "C:\Users\franc\AppData\Local\Programs\Python\Python310\lib\site-packages\pytesseract\pytesseract.py", line 286, in run_and_get_output
run_tesseract(**kwargs)
File "C:\Users\franc\AppData\Local\Programs\Python\Python310\lib\site-packages\pytesseract\pytesseract.py", line 258, in run_tesseract
raise TesseractNotFoundError()
pytesseract.pytesseract.TesseractNotFoundError: tesseract is not installed or it's not in your PATH. See README file for more information.
I've installed Tesseract correctly; I have uninstalled it, reinstalled it, and used it successfully before. If I could get an explanation or a fix for this, that would be great. I'm very new to Python, so I'm still learning the ropes.
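For what it's worth, TesseractNotFoundError means pytesseract could not find the tesseract executable on PATH, regardless of how correctly the engine itself is installed. The usual fix is to point pytesseract at the binary explicitly; a minimal sketch (the path below is the default Windows install location and is an assumption; adjust it to wherever tesseract.exe actually lives on your machine):

import pytesseract

# Tell pytesseract where the Tesseract binary is; without this it just runs
# "tesseract" and relies on the PATH environment variable.
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Sanity check: raises TesseractNotFoundError again if the path is wrong.
print(pytesseract.get_tesseract_version())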

nn.Conv2d causing error in multi-GPU learning

I posted a similar question recently, but I think I overcomplicated the question as I hadn't narrowed down the issue quite yet. Here is the context:
I am currently creating a custom module inside an existing CNN using PyTorch. I am doing this as part of my school's research, so I have access to a supercomputer with multiple GPU devices. When I train my model, after running through the first validation set I run into the following error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument weight in method wrapper__cudnn_convolution)
Now, this does not happen when I skip the convolution in my module and just put some dummy calculation in its place. The error also isn't raised when I use a single GPU or a CPU. Here is my custom module; the convolution takes place in the conv_gauss function at the very bottom:
import torch
import torch.nn as nn
from tqdm import tqdm

class DivisiveNormBlock(nn.Module):
    def __init__(self, channel_num=512, size=56, ksize=4):
        super().__init__()
        self.channel_num = channel_num
        self.size = size
        self.ksize = ksize
        scale = 90  # Random scale factor I've been playing with.
        self.theta = torch.nn.Parameter(scale * torch.abs(torch.randn(self.channel_num, self.channel_num, device="cuda",
                                        requires_grad=True)))  # 512 thetas for a channel, 512 channels, same goes for...
        self.p = torch.nn.Parameter(
            scale * torch.abs(torch.randn(self.channel_num, self.channel_num, device="cuda", requires_grad=True)))
        self.sig = torch.nn.Parameter(
            scale * torch.abs(torch.randn(self.channel_num, self.channel_num, device="cuda", requires_grad=True)))
        self.a = torch.nn.Parameter(
            scale * torch.abs(torch.randn(self.channel_num, self.channel_num, device="cuda", requires_grad=True)))
        self.nI = torch.nn.Parameter(
            torch.abs(torch.randn(self.channel_num, self.channel_num, device="cuda", requires_grad=True)))
        self.nU = torch.nn.Parameter(torch.abs(torch.randn(self.channel_num, device="cuda", requires_grad=True)))
        self.bias = torch.nn.Parameter(torch.abs(torch.randn(self.channel_num, device="cuda", requires_grad=True)))
        self.gaussian_bank = torch.zeros(self.channel_num, self.channel_num, self.ksize * 2 + 1, self.ksize * 2 + 1,
                                         device="cuda")
        self.x = torch.linspace(-self.ksize, self.ksize, self.ksize * 2 + 1, device="cuda")
        self.y = torch.linspace(-self.ksize, self.ksize, self.ksize * 2 + 1, device="cuda")
        self.xv, self.yv = torch.meshgrid(self.x, self.y)
        for i in range(self.channel_num):
            for u in range(self.channel_num):
                self.gaussian_bank[i, u, :, :] = self.get_gaussian(i, u)
        p = int((self.ksize * 2) / 2)
        conv_kernel_size = self.ksize * 2 + 1
        self.conv = nn.Conv2d(self.channel_num, self.channel_num, padding=p, stride=1,
                              kernel_size=conv_kernel_size, bias=False)

    def get_gaussian(self, cc, oc):
        xrot = self.xv * torch.cos(self.theta[cc, oc]) + self.yv * torch.sin(self.theta[cc, oc])
        yrot = -self.xv * torch.sin(self.theta[cc, oc]) + self.yv * torch.cos(self.theta[cc, oc])
        g_kernel = (self.a[cc, oc] /
                    (2 * torch.pi * self.p[cc, oc] * self.sig[cc, oc])) * \
                   torch.exp(-0.5 * ((((xrot) ** 2) / self.p[cc, oc] ** 2) + ((yrot) ** 2) / self.sig[cc, oc] ** 2))
        return g_kernel

    def forward(self, x):
        x_test = self.dn_f(x)
        return x

    def dn_f(self, x):
        batch_size = x.shape[0]
        under_sum = torch.zeros((self.channel_num, self.size, self.size), device="cuda")
        normalized_channels = torch.zeros((batch_size, self.channel_num, self.size, self.size), device="cuda")
        for b in tqdm(range(batch_size)):
            for i in range(self.channel_num):
                for u in range(self.channel_num):
                    under_sum[u] = self.conv_gauss(torch.pow(x[b, i], self.nI[i, u]), self.gaussian_bank[i, u])
                    #under_sum[u] = under_sum[u]
                normalized_channels[b, i] = torch.pow(x[b, i], self.nU[i]) / (
                    torch.pow(self.bias[i], self.nU[i]) + torch.sum(under_sum, 0))
        return normalized_channels

    def conv_gauss(self, x_conv, gauss_conv):
        x_conv = torch.reshape(x_conv, (1, 1, self.size, self.size))
        gauss_conv = torch.reshape(gauss_conv, (1, 1, self.ksize * 2 + 1, self.ksize * 2 + 1))
        p = int((self.ksize * 2) / 2)
        self.conv.weight = nn.Parameter(gauss_conv)
        output = self.conv(x_conv)
        output = torch.reshape(output, (self.size, self.size))
        return output
For some extra context, I also tried this with F.conv2d in the function instead, but to no avail.
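One plausible culprit, offered as a sketch rather than a verified fix: if the model is wrapped in nn.DataParallel, the module is replicated onto each GPU, but every tensor created with device="cuda" (the parameters, gaussian_bank) lands on cuda:0, and conv_gauss then assigns that cuda:0 kernel into each replica's conv, so the cuda:1 replica convolves a cuda:1 input with a cuda:0 weight. A device-agnostic variant of conv_gauss might look like this (the same .to(...) treatment would apply to the other hard-coded device="cuda" tensors):

import torch
import torch.nn.functional as F

def conv_gauss(self, x_conv, gauss_conv):
    # Follow the device of the input, which DataParallel scatters correctly,
    # instead of assuming everything lives on cuda:0.
    weight = torch.reshape(gauss_conv, (1, 1, self.ksize * 2 + 1, self.ksize * 2 + 1))
    weight = weight.to(x_conv.device)
    x_conv = torch.reshape(x_conv, (1, 1, self.size, self.size))
    # The functional API also avoids re-wrapping self.conv.weight in a new
    # nn.Parameter on every call.
    output = F.conv2d(x_conv, weight, padding=self.ksize)
    return torch.reshape(output, (self.size, self.size))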

How to apply white balance coefficients to a RAW image for sRGB output

I want to convert RAW image data (RGGB) to an sRGB image. There are many specialized ways to do this, but to first understand the basics, I've implemented some simple algorithms, like debayering by resolution reduction.
My current pipeline is:
Rescale the u16 input data by blacklevel and whitelevel
Apply white balance coefficients
Debayer with size reduction, average for G: g=((g0+g1)/2)
Calculate pseudo-inverse for D65 illuminant XYZ_TO_CAM (from Adobe DNG)
Convert debayered RGB data to XYZ by CAM_TO_XYZ
Convert XYZ to D65 sRGB (matrix taken from Bruce Lindbloom)
Apply gamma correction (simple routine for now, should be replaced by sRGB gamma)
Rescale from [minval..maxval] to [0..1] and convert f32 to u16
Save as tiff
The problem is that if I skip the white balance coefficient multiplication (or just replace the coefficients with 1.0), the output image already looks acceptable. If I apply the coefficients (taken from AsShot in the DNG), the output has a huge color cast. And I'm not sure whether I have to multiply by coef or by 1/coef.
The first image is the result of the pipeline with wb_coefs set to 1.0.
The second image is the result with the "correct" wb_coefs.
What is wrong with my pipeline?
Additional question:
I'm not sure about the rescaling process. Do I have to rescale into [0..1] after every step, or is it enough to rescale during the u16 conversion as the final stage?
Full code:
use image::{DynamicImage, ImageBuffer};

macro_rules! max {
    ($x: expr) => ($x);
    ($x: expr, $($z: expr),+) => {{
        let y = max!($($z),*);
        if $x > y {
            $x
        } else {
            y
        }
    }}
}

macro_rules! min {
    ($x: expr) => ($x);
    ($x: expr, $($z: expr),+) => {{
        let y = min!($($z),*);
        if $x < y {
            $x
        } else {
            y
        }
    }}
}

/// sRGB D65
const XYZD65_TO_SRGB: [[f32; 3]; 4] = [
    [3.2404542, -1.5371385, -0.4985314],
    [-0.9692660, 1.8760108, 0.0415560],
    [0.0556434, -0.2040259, 1.0572252],
    [0.0, 0.0, 0.0],
];

// buf: RAW image data
fn to_srgb(buf: &Vec<u16>, width: usize, height: usize) {
    let w = width / 2;
    let h = height / 2;
    let blacklevel: [u16; 4] = [511, 511, 511, 511];
    let whitelevel: [u16; 4] = [12735, 12735, 12735, 12735];
    let xyz2cam_d65: [[i32; 3]; 4] = [[6722, -635, -963], [-4287, 12460, 2028], [-908, 2162, 5668], [0, 0, 0]];
    let cam2xyz = convert_matrix::<4>(xyz2cam_d65);
    eprintln!("CAM_TO_XYZ: {:?}", cam2xyz);
    // from DNG
    // As Shot Neutral: 0.518481 1 0.545842
    //let wb_coef = [1.0/0.518481, 1.0, 1.0, 1.0/0.545842];
    //let wb_coef = [0.518481, 1.0, 1.0, 0.545842];
    let wb_coef = [1.0, 1.0, 1.0, 1.0];
    // b/w level correction, rescale, debayer
    let mut rgb = vec![0.0_f32; width / 2 * height / 2 * 3];
    for row in 0..h {
        for col in 0..w {
            let r0 = buf[(row * 2 + 0) * width + (col * 2) + 0];
            let g0 = buf[(row * 2 + 0) * width + (col * 2) + 1];
            let g1 = buf[(row * 2 + 1) * width + (col * 2) + 0];
            let b0 = buf[(row * 2 + 1) * width + (col * 2) + 1];
            let r0 = ((r0.saturating_sub(blacklevel[0])) as f32 / (whitelevel[0] - blacklevel[0]) as f32) * wb_coef[0];
            let g0 = ((g0.saturating_sub(blacklevel[1])) as f32 / (whitelevel[1] - blacklevel[1]) as f32) * wb_coef[1];
            let g1 = ((g1.saturating_sub(blacklevel[2])) as f32 / (whitelevel[2] - blacklevel[2]) as f32) * wb_coef[2];
            let b0 = ((b0.saturating_sub(blacklevel[3])) as f32 / (whitelevel[3] - blacklevel[3]) as f32) * wb_coef[3];
            rgb[row * w * 3 + (col * 3) + 0] = r0;
            rgb[row * w * 3 + (col * 3) + 1] = (g0 + g1) / 2.0;
            rgb[row * w * 3 + (col * 3) + 2] = b0;
        }
    }
    // Convert to XYZ by CAM_TO_XYZ from D65 illuminant
    let mut xyz = vec![0.0_f32; w * h * 3];
    for row in 0..h {
        for col in 0..w {
            let r = rgb[row * w * 3 + (col * 3) + 0];
            let g = rgb[row * w * 3 + (col * 3) + 1];
            let b = rgb[row * w * 3 + (col * 3) + 2];
            xyz[row * w * 3 + (col * 3) + 0] = cam2xyz[0][0] * r + cam2xyz[0][1] * g + cam2xyz[0][2] * b;
            xyz[row * w * 3 + (col * 3) + 1] = cam2xyz[1][0] * r + cam2xyz[1][1] * g + cam2xyz[1][2] * b;
            xyz[row * w * 3 + (col * 3) + 2] = cam2xyz[2][0] * r + cam2xyz[2][1] * g + cam2xyz[2][2] * b;
        }
    }
    // Track min/max value for rescaling/clipping
    let mut maxval = 1.0;
    let mut minval = 0.0;
    // Convert to sRGB from XYZ
    let mut srgb = vec![0.0; w * h * 3];
    for row in 0..h {
        for col in 0..w {
            let r = xyz[row * w * 3 + (col * 3) + 0] as f32;
            let g = xyz[row * w * 3 + (col * 3) + 1] as f32;
            let b = xyz[row * w * 3 + (col * 3) + 2] as f32;
            srgb[row * w * 3 + (col * 3) + 0] = XYZD65_TO_SRGB[0][0] * r + XYZD65_TO_SRGB[0][1] * g + XYZD65_TO_SRGB[0][2] * b;
            srgb[row * w * 3 + (col * 3) + 1] = XYZD65_TO_SRGB[1][0] * r + XYZD65_TO_SRGB[1][1] * g + XYZD65_TO_SRGB[1][2] * b;
            srgb[row * w * 3 + (col * 3) + 2] = XYZD65_TO_SRGB[2][0] * r + XYZD65_TO_SRGB[2][1] * g + XYZD65_TO_SRGB[2][2] * b;
            let r = srgb[row * w * 3 + (col * 3) + 0];
            let g = srgb[row * w * 3 + (col * 3) + 1];
            let b = srgb[row * w * 3 + (col * 3) + 2];
            maxval = max!(maxval, r, g, b);
            minval = min!(minval, r, g, b);
        }
    }
    gamma_corr(&mut srgb, w, h, 2.2);
    let mut output = vec![0_u16; w * h * 3];
    for row in 0..h {
        for col in 0..w {
            let r = srgb[row * w * 3 + (col * 3) + 0];
            let g = srgb[row * w * 3 + (col * 3) + 1];
            let b = srgb[row * w * 3 + (col * 3) + 2];
            output[row * w * 3 + (col * 3) + 0] = (clip(r, minval, maxval) * (u16::MAX as f32)) as u16;
            output[row * w * 3 + (col * 3) + 1] = (clip(g, minval, maxval) * (u16::MAX as f32)) as u16;
            output[row * w * 3 + (col * 3) + 2] = (clip(b, minval, maxval) * (u16::MAX as f32)) as u16;
        }
    }
    let img = DynamicImage::ImageRgb16(ImageBuffer::from_raw(w as u32, h as u32, output).unwrap());
    img.save_with_format("/tmp/test.tif", image::ImageFormat::Tiff).unwrap();
}

fn pseudoinverse<const N: usize>(matrix: [[f32; 3]; N]) -> [[f32; 3]; N] {
    let mut result: [[f32; 3]; N] = [Default::default(); N];
    let mut work: [[f32; 6]; 3] = [Default::default(); 3];
    let mut num: f32 = 0.0;
    for i in 0..3 {
        for j in 0..6 {
            work[i][j] = if j == i + 3 { 1.0 } else { 0.0 };
        }
        for j in 0..3 {
            for k in 0..N {
                work[i][j] += matrix[k][i] * matrix[k][j];
            }
        }
    }
    for i in 0..3 {
        num = work[i][i];
        for j in 0..6 {
            work[i][j] /= num;
        }
        for k in 0..3 {
            if k == i {
                continue;
            }
            num = work[k][i];
            for j in 0..6 {
                work[k][j] -= work[i][j] * num;
            }
        }
    }
    for i in 0..N {
        for j in 0..3 {
            result[i][j] = 0.0;
            for k in 0..3 {
                result[i][j] += work[j][k + 3] * matrix[i][k];
            }
        }
    }
    result
}

fn convert_matrix<const N: usize>(adobe_xyz_to_cam: [[i32; 3]; N]) -> [[f32; N]; 3] {
    let mut xyz_to_cam: [[f32; 3]; N] = [[0.0; 3]; N];
    let mut cam_to_xyz: [[f32; N]; 3] = [[0.0; N]; 3];
    for i in 0..N {
        for j in 0..3 {
            xyz_to_cam[i][j] = adobe_xyz_to_cam[i][j] as f32 / 10000.0;
        }
    }
    eprintln!("XYZ_TO_CAM: {:?}", xyz_to_cam);
    let inverse = pseudoinverse::<N>(xyz_to_cam);
    for i in 0..3 {
        for j in 0..N {
            cam_to_xyz[i][j] = inverse[j][i];
        }
    }
    cam_to_xyz
}

fn clip(v: f32, minval: f32, maxval: f32) -> f32 {
    (v + minval.abs()) / (maxval + minval.abs())
}

// https://kosinix.github.io/raster/docs/src/raster/filter.rs.html#339-359
fn gamma_corr(rgb: &mut Vec<f32>, w: usize, h: usize, gamma: f32) {
    for row in 0..h {
        for col in 0..w {
            let r = rgb[row * w * 3 + (col * 3) + 0];
            let g = rgb[row * w * 3 + (col * 3) + 1];
            let b = rgb[row * w * 3 + (col * 3) + 2];
            rgb[row * w * 3 + (col * 3) + 0] = r.powf(1.0 / gamma);
            rgb[row * w * 3 + (col * 3) + 1] = g.powf(1.0 / gamma);
            rgb[row * w * 3 + (col * 3) + 2] = b.powf(1.0 / gamma);
        }
    }
}
The DNG for this example can be found at: https://chaospixel.com/pub/misc/dng/sample.dng (~40 MiB).
The main reason for getting wrong colors is that we have to normalize the rows of the rgb2cam matrix to 1, as described in the Processing RAW Images in MATLAB guide referenced below.
According to the DNG spec:
ColorMatrix1 defines a transformation matrix that converts XYZ values to reference camera native color space values, under the first calibration illuminant.
This means that if the calibration illuminant is D65, the ColorMatrix converts XYZ to "camera RGB".
(Convert it as is, without using any white balance scaling coefficients.)
The inverse ColorMatrix converts from "camera RGB" to XYZ.
After converting XYZ to sRGB, the result is color balanced sRGB.
The conclusion is that ColorMatrix includes the white balance coefficients in it (the white balancing coefficients apply to the D65 illuminant).
Normalizing the rows of rgb2cam to 1 neutralizes the white balance coefficients and keeps only the "Color Correction Matrix" (the math is a bit complicated).
Without normalizing the rows, we are scaling by the white balance multipliers twice:
Scale coefficients from ColorMatrix that balance the input to D65.
Scale coefficients taken from AsShotNeutral that balance the input to the illuminant of the scene (which is close to D65).
The result of scaling twice is an extreme color cast.
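To make the normalization step concrete, here is a small Python sketch using the rgb2cam values that appear in the code comments further down (the numbers come from this answer's own output):

import numpy as np

# rgb2cam = ColorMatrix * rgb2xyz, as computed below.
rgb2cam = np.array([[0.2619, 0.1835, 0.0252],
                    [0.0921, 0.7620, 0.2053],
                    [0.0195, 0.1897, 0.5379]])

# Normalizing each row to sum to 1 strips the white balance gains baked into
# ColorMatrix, leaving only the color mixing ("CCM") part.
rgb2cam /= rgb2cam.sum(axis=1, keepdims=True)
cam2rgb = np.linalg.inv(rgb2cam)
print(cam2rgb)  # ~[[1.9644, -1.1197, 0.1553], ...], as in the comments below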
Tracking the maximum in order to avoid a "magenta cast" in the highlights:
Instead of tracking the actual maximum color values in the input image, we are supposed to track the "theoretical maximum color value".
Take whitelevel - blacklevel and scale it by the white balance multipliers.
Track the result...
The guiding rule is that the colors are supposed to come out the same in both of these cases:
Applying the processing to small patches of the image, then placing the patches together (where we can't track the global minimum and maximum).
Applying the processing to the entire image.
I suppose you only have to track the maximum of the scaled whitelevel - blacklevel when some white balance multipliers are less than 1.
When all the multipliers are 1 or above, we can clip the result to 1.0 without tracking the maximum.
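A tiny Python sketch of what the "theoretical maximum" means here (the multipliers are 1/AsShotNeutral from the sample DNG; treat the numbers as illustrative):

import numpy as np

blacklevel, whitelevel = 511, 12735
wb_multipliers = 1 / np.array([0.5185, 1.0, 0.5458])  # 1/AsShotNeutral

# A fully saturated pixel normalizes to (whitelevel - blacklevel) /
# (whitelevel - blacklevel) = 1.0, so after white balancing the largest
# reachable value is simply the largest multiplier. Track this instead of
# scanning the image for its actual maximum.
theoretical_max = wb_multipliers.max()
print(theoretical_max)  # ~1.929 for this DNG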
Note:
There is probably an advantage to scaling down and tracking the maximum, but I don't know this subject well.
In my solution we just scale up (above 1.0) and clip the result.
The solution is based on the Processing RAW Images in MATLAB guide.
I am posting both a MATLAB implementation and a Python implementation (but no Rust implementation).
The first step is extracting the raw Bayer image from sample.dng using the dcraw command line tool:
dcraw -4 -D -T sample.dng
Rename the TIFF output to sample__lin_bayer.tif.
Conversion process:
Rescale the uint16 input data by blacklevel and whitelevel (subtract blacklevel from all pixels and scale by whitelevel - blacklevel).
Apply the white balance scaling coefficients.
The scaling coefficients equal 1./AsShotNeutral.
Scale the red pixels in the Bayer alignment by the red scaling coefficient, the greens by the green scaling, and the blues by the blue scaling.
Assumption: the minimum scaling is 1.0 and the others are above 1.0 (we may divide by the minimum scaling to make sure).
Clip the scaled result to [0, 1] (clipping is required due to demosaic implementation limitations).
Demosaicing (debayering) using the MATLAB demosaic function or cv2.cvtColor in Python.
Calculate the rgb2cam matrix: rgb2cam = ColorMatrix * rgb2xyz.
The rgb2xyz matrix is taken from the Bruce Lindbloom site.
Normalize the rows of the rgb2cam matrix so that each row sums to 1 (divide each row by its sum).
Compute the cam2rgb matrix by inverting rgb2cam: cam2rgb = inv(rgb2cam).
cam2rgb is the "CCM" (Color Correction Matrix).
Left multiply each RGB tuple by the cam2rgb matrix (apply color correction).
Apply gamma correction (use the sRGB standard gamma).
Convert to uint8 and save as PNG (PNG is used for posting on the SO website).
MATLAB Implementation:
filename = 'sample__lin_bayer.tif'; % Output of: dcraw -4 -D -T sample.dng
% Exif information:
blacklevel = 511; % blacklevel = meta_info.SubIFDs{1}.BlackLevel(1);
whitelevel = 12735; % whitelevel = meta_info.SubIFDs{1}.WhiteLevel;
AsShotNeutral = [0.5185 1 0.5458];
ColorMatrix = [ 0.6722 -0.0635 -0.0963
-0.4287 1.2460 0.2028
-0.0908 0.2162 0.5668];
bayer_type = 'rggb';
% Constant matrix for converting sRGB to XYZ(D65):
% http://www.brucelindbloom.com/Eqn_RGB_XYZ_Matrix.html
rgb2xyz = [0.4124564 0.3575761 0.1804375
0.2126729 0.7151522 0.0721750
0.0193339 0.1191920 0.9503041];
% Read input image (Bayer mosaic alignment, pixels data type is uint16):
raw = imread(filename);
% "Linearizing":
% There is no LinearizationTable so we only have to subtract the black level.
% Convert from range [blacklevel, whitelevel] to range [0, 1].
lin_bayer = (double(raw) - blacklevel) / (whitelevel - blacklevel);
lin_bayer = max(0, min(lin_bayer, 1));
% The White Balance multipliers are 1./AsShotNeutral
wb_multipliers = 1./AsShotNeutral;
r_scale = wb_multipliers(1); % Assume value is above 1
g_scale = wb_multipliers(2); % Assume value = 1
b_scale = wb_multipliers(3); % Assume value is above 1
% Bayer alignment is RGGB:
% R G
% G B
%
% Apply white balancing to linear Bayer image.
balanced_bayer = lin_bayer;
balanced_bayer(1:2:end, 1:2:end) = balanced_bayer(1:2:end, 1:2:end)*r_scale; % Red (indices [1, 3, 5,... ; 1, 3, 5,... ])
balanced_bayer(1:2:end, 2:2:end) = balanced_bayer(1:2:end, 2:2:end)*g_scale; % Green (indices [1, 3, 5,... ; 2, 4, 6,... ])
balanced_bayer(2:2:end, 1:2:end) = balanced_bayer(2:2:end, 1:2:end)*g_scale; % Green (indices [2, 4, 6,... ; 1, 3, 5,... ])
balanced_bayer(2:2:end, 2:2:end) = balanced_bayer(2:2:end, 2:2:end)*b_scale; % Blue (indices [2, 4, 6,... ; 2, 4, 6,... ])
% Clip to range [0, 1] for avoiding "pinkish highlights" (avoiding "magenta cast" in the highlights).
balanced_bayer = min(balanced_bayer, 1);
% Demosaicing
temp = uint16(balanced_bayer*(2^16-1)); % Convert from double to uint16, because MATLAB demosaic() function requires a uint8 or uint16 input.
lin_rgb = double(demosaic(temp, bayer_type))/(2^16-1); % Apply Demosaicing and convert range back type double and range [0, 1].
% Color Space Conversion
xyz2cam = ColorMatrix; % ColorMatrix applies XYZ(D65) to CAM_rgb
rgb2cam = xyz2cam * rgb2xyz;
% Result:
% rgb2cam = [0.2619 0.1835 0.0252
% 0.0921 0.7620 0.2053
% 0.0195 0.1897 0.5379]
% Normalize rows to 1. MATLAB shortcut: rgb2cam = rgb2cam ./ repmat(sum(rgb2cam,2),1,3);
rows_sum = sum(rgb2cam, 2);
% Result:
% rows_sum = [0.4706
% 1.0593
% 0.7470]
% Divide element of every row by the sum of the row:
rgb2cam(1, :) = rgb2cam(1, :) / rows_sum(1); % Divide top row
rgb2cam(2, :) = rgb2cam(2, :) / rows_sum(2); % Divide center row
rgb2cam(3, :) = rgb2cam(3, :) / rows_sum(3); % Divide bottom row
% Result (sum of every row is 1):
% rgb2cam = [0.5566 0.3899 0.0535
% 0.0869 0.7193 0.1938
% 0.0261 0.2539 0.7200]
cam2rgb = inv(rgb2cam); % Invert matrix
% Result:
% cam2rgb = [ 1.9644 -1.1197 0.1553
% -0.2412 1.6738 -0.4326
% 0.0139 -0.5498 1.5359]
R = lin_rgb(:, :, 1);
G = lin_rgb(:, :, 2);
B = lin_rgb(:, :, 3);
% Left multiply matrix cam2rgb by each RGB tuple (convert from "camera RGB" to "linear sRGB").
sR = cam2rgb(1,1)*R + cam2rgb(1,2)*G + cam2rgb(1,3)*B;
sG = cam2rgb(2,1)*R + cam2rgb(2,2)*G + cam2rgb(2,3)*B;
sB = cam2rgb(3,1)*R + cam2rgb(3,2)*G + cam2rgb(3,3)*B;
lin_srgb = cat(3, sR, sG, sB);
lin_srgb = max(min(lin_srgb, 1), 0); % Clip to range [0, 1]
% Convert from "Linear sRGB" to sRGB (apply gamma)
sRGB = lin2rgb(lin_srgb); % The lin2rgb MATLAB function uses the exact formula [you may approximate it with a power of (1/gamma)].
% Show the result, and save to sRGB.png
figure;imshow(sRGB);impixelinfo;title('sRGB');
imwrite(im2uint8(sRGB), 'sRGB.png');
% Inverting 3x3 matrix (some help of MATLAB Symbolic Toolbox):
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Assume:
% A = [ a11, a12, a13]
% [ a21, a22, a23]
% [ a31, a32, a33]
%
% 1. Compute determinant of A:
% detA = a11*a22*a33 - a11*a23*a32 - a12*a21*a33 + a12*a23*a31 + a13*a21*a32 - a13*a22*a31
%
% 2. Compute the inverse of the matrix A:
% invA = [ (a22*a33 - a23*a32)/detA, -(a12*a33 - a13*a32)/detA, (a12*a23 - a13*a22)/detA
% -(a21*a33 - a23*a31)/detA, (a11*a33 - a13*a31)/detA, -(a11*a23 - a13*a21)/detA
% (a21*a32 - a22*a31)/detA, -(a11*a32 - a12*a31)/detA, (a11*a22 - a12*a21)/detA]
Python implementation:
import numpy as np
import cv2
def lin2rgb(im):
    """ Convert im from "Linear sRGB" to sRGB - apply Gamma. """
    # sRGB standard applies gamma = 2.4, Break Point = 0.00304 (and computed Slope = 12.92)
    g = 2.4
    bp = 0.00304
    inv_g = 1/g
    sls = 1 / (g/(bp**(inv_g - 1)) - g*bp + bp)
    fs = g*sls / (bp**(inv_g - 1))
    co = fs*bp**(inv_g) - sls*bp
    srgb = im.copy()
    srgb[im <= bp] = sls * im[im <= bp]
    srgb[im > bp] = np.power(fs*im[im > bp], inv_g) - co
    return srgb
filename = 'sample__lin_bayer.tif' # Output of: dcraw -4 -D -T sample.dng
# Exif information:
blacklevel = 511
whitelevel = 12735
AsShotNeutral = np.array([0.5185, 1, 0.5458])
ColorMatrix = np.array([[ 0.6722, -0.0635, -0.0963],
[-0.4287, 1.2460, 0.2028],
[-0.0908, 0.2162, 0.5668]])
# bayer_type = 'rggb'
# Constant matrix for converting sRGB to XYZ(D65):
# http://www.brucelindbloom.com/Eqn_RGB_XYZ_Matrix.html
rgb2xyz = np.array([[0.4124564, 0.3575761, 0.1804375],
[0.2126729, 0.7151522, 0.0721750],
[0.0193339, 0.1191920, 0.9503041]])
# Read input image (Bayer mosaic alignment, pixel data type is np.uint16):
raw = cv2.imread(filename, cv2.IMREAD_UNCHANGED)
# "Linearizing":
# There is no LinearizationTable so we only have to subtract the black level.
# Convert from range [blacklevel, whitelevel] to range [0, 1] (convert type to np.float64).
lin_bayer = (raw.astype(np.float64) - blacklevel) / (whitelevel - blacklevel)
lin_bayer = lin_bayer.clip(0, 1)
# The White Balance multipliers are 1./AsShotNeutral
wb_multipliers = 1 / AsShotNeutral
r_scale = wb_multipliers[0] # Assume value is above 1
g_scale = wb_multipliers[1] # Assume value = 1
b_scale = wb_multipliers[2] # Assume value is above 1
# Bayer alignment is RGGB:
# R G
# G B
#
# Apply white balancing to linear Bayer image.
balanced_bayer = lin_bayer.copy()
balanced_bayer[0::2, 0::2] = balanced_bayer[0::2, 0::2]*r_scale # Red (indices [0, 2, 4,... ; 0, 2, 4,... ])
balanced_bayer[0::2, 1::2] = balanced_bayer[0::2, 1::2]*g_scale # Green (indices [0, 2, 4,... ; 1, 3, 5,... ])
balanced_bayer[1::2, 0::2] = balanced_bayer[1::2, 0::2]*g_scale # Green (indices [1, 3, 5,... ; 0, 2, 4,... ])
balanced_bayer[1::2, 1::2] = balanced_bayer[1::2, 1::2]*b_scale # Blue (indices [1, 3, 5,... ; 1, 3, 5,... ])
# Clip to range [0, 1] for avoiding "pinkish highlights" (avoiding "magenta cast" in the highlights).
balanced_bayer = np.minimum(balanced_bayer, 1)
# Demosaicing:
temp = np.round((balanced_bayer*(2**16-1))).astype(np.uint16) # Convert from double to np.uint16, because OpenCV demosaic() function requires a uint8 or uint16 input.
lin_rgb = cv2.cvtColor(temp, cv2.COLOR_BayerBG2RGB).astype(np.float64)/(2**16-1) # Apply Demosaicing and convert back to np.float64 in range [0, 1] (is there a bug in OpenCV Bayer naming?).
# Color Space Conversion
xyz2cam = ColorMatrix # ColorMatrix applies XYZ(D65) to CAM_rgb
rgb2cam = xyz2cam @ rgb2xyz  # Matrix multiplication (same as MATLAB: xyz2cam * rgb2xyz)
# Result:
# rgb2cam = [0.2619 0.1835 0.0252
# 0.0921 0.7620 0.2053
# 0.0195 0.1897 0.5379]
# Normalize rows to 1. MATLAB shortcut: rgb2cam = rgb2cam ./ repmat(sum(rgb2cam,2),1,3);
rows_sum = np.sum(rgb2cam, 1)
# Result:
# rows_sum = [0.4706
# 1.0593
# 0.7470]
# Divide element of every row by the sum of the row:
rgb2cam[0, :] = rgb2cam[0, :] / rows_sum[0] # Divide top row
rgb2cam[1, :] = rgb2cam[1, :] / rows_sum[1] # Divide center row
rgb2cam[2, :] = rgb2cam[2, :] / rows_sum[2] # Divide bottom row
# Result (sum of every row is 1):
# rgb2cam = [0.5566 0.3899 0.0535
# 0.0869 0.7193 0.1938
# 0.0261 0.2539 0.7200]
cam2rgb = np.linalg.inv(rgb2cam) # Invert matrix
# Result:
# cam2rgb = [ 1.9644 -1.1197 0.1553
# -0.2412 1.6738 -0.4326
# 0.0139 -0.5498 1.5359]
r = lin_rgb[:, :, 0]
g = lin_rgb[:, :, 1]
b = lin_rgb[:, :, 2]
# Left multiply matrix cam2rgb by each RGB tuple (convert from "camera RGB" to "linear sRGB").
sr = cam2rgb[0, 0]*r + cam2rgb[0, 1]*g + cam2rgb[0, 2]*b
sg = cam2rgb[1, 0]*r + cam2rgb[1, 1]*g + cam2rgb[1, 2]*b
sb = cam2rgb[2, 0]*r + cam2rgb[2, 1]*g + cam2rgb[2, 2]*b
lin_srgb = np.dstack([sr, sg, sb])
lin_srgb = lin_srgb.clip(0, 1) # Clip to range [0, 1]
# Convert from "Linear sRGB" to sRGB (apply gamma)
sRGB = lin2rgb(lin_srgb)  # The lin2rgb function above uses the exact sRGB formula [you may approximate it with a power of (1/gamma)].
# Save to sRGB.png
cv2.imwrite('sRGB.png', cv2.cvtColor((sRGB*255).astype(np.uint8), cv2.COLOR_RGB2BGR))
Results (downscaled):
Result of RawTherapee (all enhancements are disabled):
MATLAB result:
Python result:
Note:
The result looks dark due to low exposure (and because we didn't apply any brightness correction).
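If a brighter preview is desired, a crude exposure gain can be applied to the linear data before the gamma step. A sketch that continues the Python script above (the 4x gain is an arbitrary value of mine, roughly +2 EV, and is not part of the original pipeline):

gain = 4.0  # arbitrary linear exposure gain
sRGB_bright = lin2rgb((lin_srgb * gain).clip(0, 1))  # clip back to [0, 1] first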

Dot "cpbitmap" Images (imgaename.cpbitmap)

How can I convert .cpbitmap images to .png or another common image type?
Thank you :)
Actually, the idea of writing Python code is excellent, as it is easier to execute than running some Xcode stuff.
As the previous author stated, he did not test the code, but I did. What I found is that it produces an image in which the RED and BLUE components are swapped.
That is why I decided to post a corrected version of the code here:
#!/usr/bin/python
from PIL import Image, ImageOps
import struct
import sys

if len(sys.argv) < 3:
    print "Need two args: filename and result_filename\n"
    sys.exit(0)

filename = sys.argv[1]
result_filename = sys.argv[2]

with open(filename) as f:
    contents = f.read()

unk1, width, height, unk2, unk3, unk4 = struct.unpack('<6i', contents[-24:])
im = Image.fromstring('RGBA', (width, height), contents, 'raw', 'RGBA', 0, 1)
r, g, b, a = im.split()
im = Image.merge('RGBA', (b, g, r, a))
im.save(result_filename)
Put this code in a file named decode_cpbitmap and run
chmod 755 decode_cpbitmap
to make it executable. Now you can call it as follows:
./decode_cpbitmap input_filename output_filename
where input_filename is the *.cpbitmap file that you already have and want to decode, and output_filename is something like output.png (it will be created by the script).
You may get the error
ImportError: No module named PIL
In that case you need to install the PIL Python module; I won't explain how to install Python modules, as you can find that elsewhere.
Here's a quick Python program to do it. I wasn't able to test it because I don't have any .cpbitmap images to use.
from PIL import Image
import struct

# filename: path to the .cpbitmap file to convert (set this first)
with open(filename) as f:
    contents = f.read()

unk1, width, height, unk2, unk3, unk4 = struct.unpack('<6i', contents[-24:])
im = Image.fromstring('RGBA', (width, height), contents, 'raw', 'RGBA', 0, 1)
im.save('converted.png')
I tried to convert images from iOS 11, and these scripts do not work. Nowadays the size of each row is rounded up to a multiple of 8 pixels by padding.
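In Python terms, the padded row addressing looks like this (a small sketch mirroring the offset math in the Node.js script below; the helper names are mine):

import math

def row_stride_bytes(width):
    # iOS 11 pads each cpbitmap row up to a multiple of 8 pixels,
    # at 4 bytes (RGBA) per pixel.
    return math.ceil(width / 8) * 8 * 4

def pixel_offset(x, y, width):
    # Byte offset of pixel (x, y) inside the padded pixel data.
    return y * row_stride_bytes(width) + x * 4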
I wrote a Node.js script instead. Before running the script, install the jimp module (npm install jimp). Tested on Node.js v9.2.0 and jimp 0.2.28.
const fs = require('fs')
const util = require('util')
const Jimp = require('jimp')

const main = async () => {
    if (process.argv.length != 4) {
        console.log('Need two args: input filename and result filename')
        console.log(`Example: ${process.argv[0]} ${process.argv[1]} HomeBackground.cpbitmap HomeBackground.png`)
        return
    }
    const inpFileName = process.argv[2]
    const outFileName = process.argv[3]
    const readFile = util.promisify(fs.readFile)
    const cpbmp = await readFile(inpFileName)
    const width = cpbmp.readInt32LE(cpbmp.length - 4 * 5)
    const height = cpbmp.readInt32LE(cpbmp.length - 4 * 4)
    console.log(`Image height: ${height}, width: ${width}`)
    const image = await new Jimp(width, height, 0x000000FF)
    const calcOffsetInCpbmp = (x, y, width) => {
        const lineSize = Math.ceil(width / 8) * 8
        return x * 4 + y * lineSize * 4
    }
    const calcOffsetInImage = (x, y, width) => {
        return x * 4 + y * width * 4
    }
    const swapRBColors = (c) => {
        const r = c & 0xFF
        const b = (c & 0xFF0000) >> 16
        c &= 0xFF00FF00
        c |= r << 16
        c |= b
        return c
    }
    for (let y = 0; y < height; y++) {
        for (let x = 0; x < width; x++) {
            const color = cpbmp.readInt32LE(calcOffsetInCpbmp(x, y, width))
            image.bitmap.data.writeInt32LE(swapRBColors(color), calcOffsetInImage(x, y, width))
        }
    }
    await image.write(outFileName)
    console.log('Done')
}
main()
