I am trying to reduce the resolution of an image to speed up training, so I used the tf.nn.max_pool method on my raw image. I expected the result to be a blurred image of smaller size, but it actually is not.
My raw image has shape [320, 240, 3], and it looks like:
And after max_pooling, with ksize=[1,2,2,1] and strides=[1,2,2,1] it becomes
produced by the following code:
# `img` is a numpy.array with shape [320, 240, 3].
# tf.nn.max_pool only accepts tensors of shape
# [batch_size, height, width, channels], so I reshape
# the image to add a dummy batch dimension.
img_tensor = tf.placeholder(tf.float32, shape=[1,320,240,3])
pooled = tf.nn.max_pool(img_tensor, ksize=[1,2,2,1], strides=[1,2,2,1],padding='VALID')
pooled_img = pooled.eval(feed_dict={img_tensor: img.reshape([1,320,240,3])})
plt.imshow(np.squeeze(pooled_img, axis=0))
The pooled image has shape [160, 120, 3], which is expected. It's just that the transformation behaviour really confuses me. It shouldn't show that "repeated shifting" pattern, since the pooling windows do not overlap.
Many thanks in advance.
I think the problem is how your image has been reshaped. The image actually has shape [240, 320, 3] (height before width).
So use [1, 240, 320, 3] instead of [1, 320, 240, 3]. It should work.
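For reference, a minimal sketch of the fixed version (assuming img is already a NumPy array of shape [240, 320, 3], i.e. height, width, channels):

import numpy as np
import tensorflow as tf

img_tensor = tf.placeholder(tf.float32, shape=[1, 240, 320, 3])
pooled = tf.nn.max_pool(img_tensor, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')

with tf.Session() as sess:
    # np.newaxis adds the dummy batch dimension without permuting the axes
    pooled_img = sess.run(pooled, feed_dict={img_tensor: img[np.newaxis, ...]})
    # pooled_img now has shape [1, 120, 160, 3]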
I am currently trying to understand the inception-v3 architecture and was taking a closer look at the definition of the model's layers:
with scopes.arg_scope([ops.conv2d, ops.max_pool, ops.avg_pool], stride=1, padding='VALID'):
    # 299 x 299 x 3
    end_points['conv0'] = ops.conv2d(inputs, 32, [3, 3], stride=2, scope='conv0')
    # 149 x 149 x 32
    end_points['conv1'] = ops.conv2d(end_points['conv0'], 32, [3, 3], scope='conv1')
    # 147 x 147 x 32
    end_points['conv2'] = ops.conv2d(end_points['conv1'], 64, [3, 3], padding='SAME', scope='conv2')
    # 147 x 147 x 64
    end_points['pool1'] = ops.max_pool(end_points['conv2'], [3, 3], stride=2, scope='pool1')
    # 73 x 73 x 64
    end_points['conv3'] = ops.conv2d(end_points['pool1'], 80, [1, 1], scope='conv3')
    # 73 x 73 x 80.
    end_points['conv4'] = ops.conv2d(end_points['conv3'], 192, [3, 3], scope='conv4')
    # 71 x 71 x 192.
    end_points['pool2'] = ops.max_pool(end_points['conv4'], [3, 3], stride=2, scope='pool2')
    # 35 x 35 x 192.
    net = end_points['pool2']
Checking the dimensions of each layer, I first had to take a look at the different padding styles: VALID and SAME. VALID will discard edges, while SAME will actually pad equally on both sides, so convolution still works on edges.
This checks out for the first layer: going from 299x299 to 149x149 with a [3, 3] filter and stride 2, only every other position is sampled, and because padding is VALID (edges are discarded) we end up with 149x149 rather than 150x150. Convolving that output again with the same filter size but stride 1 gives 147x147, since the border pixels are again lost. That layer is convolved once more, but this time with padding set to SAME, which keeps the dimension at 147x147.
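As a rough sanity check (my own helper, not part of the model code), the spatial output size for both padding modes can be computed with the standard formulas:

import math

def conv_output_size(input_size, filter_size, stride, padding):
    # VALID: count only the positions where the filter fits entirely inside the input
    if padding == 'VALID':
        return (input_size - filter_size) // stride + 1
    # SAME: the input is padded so that every stride position produces an output
    return math.ceil(input_size / stride)

print(conv_output_size(299, 3, 2, 'VALID'))  # 149  (conv0)
print(conv_output_size(149, 3, 1, 'VALID'))  # 147  (conv1)
print(conv_output_size(147, 3, 1, 'SAME'))   # 147  (conv2)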
Now comes the spot that confuses me:
Assuming SAME padding was only set for the conv2 layer and the scope default is still VALID, the dimension of pool1 is correctly shown as 73x73, since the edge is discarded. Moving on to the next convolutional layer, conv3, I would expect it to become 71x71 with VALID padding active. However, the output of conv3 stays at 73x73, which suggests SAME padding is used. Yet in conv4 the padding seems to be VALID again, reducing the dimension to 71x71, which confuses me completely.
In the README on GitHub for slim's arg_scope I found that setting one of the arguments locally overrides the value given globally in the scope:
with slim.arg_scope([slim.ops.conv2d], padding='SAME', stddev=0.01, weight_decay=0.0005):
net = slim.ops.conv2d(inputs, 64, [11, 11], scope='conv1')
net = slim.ops.conv2d(net, 128, [11, 11], padding='VALID', scope='conv2')
net = slim.ops.conv2d(net, 256, [11, 11], scope='conv3')
As the example illustrates, the use of arg_scope makes the code
cleaner, simpler and easier to maintain. Notice that while argument
values are specified in the arg_scope, they can be overwritten locally.
In particular, while the padding argument has been set to 'SAME', the
second convolution overrides it with the value of 'VALID'.
However, this would mean that conv4 should also have dimension 73x73, because the padding would be SAME and thus preserve the edges, and the final pooling layer pool2 would then even be 37x37.
What is the thing that I am missing? Where is my mistake?
Thank you for helping me, I hope I have made the confusing problem clear.
I didn't see that the filter size of the conv3 layer is actually [1, 1], so it does not reduce the dimensions at all. It has nothing to do with arg_scope; the sizes stay exactly how they should.
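For completeness, feeding the remaining layers through the conv_output_size helper sketched above reproduces the commented shapes under purely VALID padding:

print(conv_output_size(147, 3, 2, 'VALID'))  # 73  (pool1, [3, 3] filter)
print(conv_output_size(73, 1, 1, 'VALID'))   # 73  (conv3, [1, 1] filter -- no reduction)
print(conv_output_size(73, 3, 1, 'VALID'))   # 71  (conv4, [3, 3] filter)
print(conv_output_size(71, 3, 2, 'VALID'))   # 35  (pool2, [3, 3] filter)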
I'm looking for an algorithm that will detect the end of a curved line. I'm going to convert a binary image into a point cloud as coordinates, and I need to find the end of the line so I can start another algorithm.
I was thinking of taking the average of vectors for the N nearest '1' pixels to each point, and saying that the pixel with the longest vector must be an endpoint, because if a point is in the middle of a line then the average of the vectors will cancel out. However, I figure this must be a problem that is well known in image processing so I thought I'd throw it up here to see if anybody knows a 'proper' algorithm.
If the line will only ever be one or perhaps two pixels thick, you can use the approach suggested by Malcolm McLean in a comment.
Otherwise, one way to do this is to compute, for each red pixel, the red pixel in the same component that is furthest away, as well as how far away that furthest pixel is. (In graph theory terms, the distance between these two pixels is the eccentricity of each pixel.) Pixels near the end of a long line will have the greatest eccentricities, because the shortest path between them and points at the other end of the line is long. (Notice that, whatever the maximum eccentricity turns out to be, there will be at least two pixels having it, since the distance from a to b is the same as the distance from b to a.)
If you have n red pixels, all eccentricities (and corresponding furthest pixels) can be computed in O(n^2) time: for each pixel in turn, start a BFS at that pixel, and take the deepest node you find as its furthest pixel (there may be several; any will do). Each BFS runs in O(n) time, because there are only a constant number of edges (4 or 8, depending on how you model pixel connectivity) incident on any pixel.
For robustness you might consider taking the top 10 or 50 (etc.) pixel pairs and checking that they form 2 well-separated, well-defined clusters. You could then take the average position within each cluster as your 2 endpoints.
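If it helps, here is a rough Python sketch of the eccentricity idea (my own names; it assumes pixels is a set of (row, col) tuples for the line's pixels and 8-connectivity):

from collections import deque

def farthest_pixel(start, pixels):
    # BFS over the 8-connected pixel set; return the deepest pixel and its distance
    dist = {start: 0}
    queue = deque([start])
    farthest, max_d = start, 0
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nb = (r + dr, c + dc)
                if nb in pixels and nb not in dist:
                    dist[nb] = dist[(r, c)] + 1
                    if dist[nb] > max_d:
                        farthest, max_d = nb, dist[nb]
                    queue.append(nb)
    return farthest, max_d

def pixels_by_eccentricity(pixels):
    # One BFS per pixel: O(n^2) overall; the largest eccentricities are the endpoint candidates
    ecc = {p: farthest_pixel(p, pixels)[1] for p in pixels}
    return sorted(pixels, key=lambda p: ecc[p], reverse=True)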
If you apply thinning to the line so that it is just one pixel thick, you can leverage morphologyEx with MORPH_HITMISS in OpenCV. Essentially you create a template (kernel or filter) for every possible line ending (there are 8 of them) and filter the image with each one. The result of each operation is 1 wherever the kernel matches and 0 otherwise. You can also do the same manually in C if you feel you can do a better job.
Here is an example. It takes as input_image any image of zeros and ones in which the lines are one pixel thick.
import numpy as np
import cv2
import matplotlib.pylab as plt

def find_endoflines(input_image, show=0):
    # Hit-or-miss kernels: each one matches a foreground pixel with exactly one
    # 8-connected neighbour (i.e. a line ending) in a specific direction.
    kernel_0 = np.array((
        [-1, -1, -1],
        [-1,  1, -1],
        [-1,  1, -1]), dtype="int")

    kernel_1 = np.array((
        [-1, -1, -1],
        [-1,  1, -1],
        [ 1, -1, -1]), dtype="int")

    kernel_2 = np.array((
        [-1, -1, -1],
        [ 1,  1, -1],
        [-1, -1, -1]), dtype="int")

    kernel_3 = np.array((
        [ 1, -1, -1],
        [-1,  1, -1],
        [-1, -1, -1]), dtype="int")

    kernel_4 = np.array((
        [-1,  1, -1],
        [-1,  1, -1],
        [-1, -1, -1]), dtype="int")

    kernel_5 = np.array((
        [-1, -1,  1],
        [-1,  1, -1],
        [-1, -1, -1]), dtype="int")

    kernel_6 = np.array((
        [-1, -1, -1],
        [-1,  1,  1],
        [-1, -1, -1]), dtype="int")

    kernel_7 = np.array((
        [-1, -1, -1],
        [-1,  1, -1],
        [-1, -1,  1]), dtype="int")

    kernel = np.array((kernel_0, kernel_1, kernel_2, kernel_3,
                       kernel_4, kernel_5, kernel_6, kernel_7))

    # Accumulate the hit-or-miss response of every kernel; end-of-line pixels become 1.
    output_image = np.zeros(input_image.shape)
    for i in np.arange(8):
        out = cv2.morphologyEx(input_image, cv2.MORPH_HITMISS, kernel[i, :, :])
        output_image = output_image + out

    if show == 1:
        # Overlay the detected end points in red on top of the input image.
        show_image = np.reshape(np.repeat(input_image, 3, axis=1),
                                (input_image.shape[0], input_image.shape[1], 3)) * 255
        show_image[:, :, 1] = show_image[:, :, 1] - output_image * 255
        show_image[:, :, 2] = show_image[:, :, 2] - output_image * 255
        plt.imshow(show_image)

    return output_image
Similar to the Caffe framework, where it is possible to watch the learned filters during CNN training and their resulting convolution with input images, I wonder whether it is possible to do the same with TensorFlow?
A Caffe example can be viewed in this link:
http://nbviewer.jupyter.org/github/BVLC/caffe/blob/master/examples/00-classification.ipynb
Grateful for your help!
To see just a few conv1 filters in TensorBoard, you can use this code (it works for cifar10):
# this should be a part of the inference(images) function in cifar10.py file
# conv1
with tf.variable_scope('conv1') as scope:
    kernel = _variable_with_weight_decay('weights', shape=[5, 5, 3, 64],
                                         stddev=1e-4, wd=0.0)
    conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
    biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.0))
    bias = tf.nn.bias_add(conv, biases)
    conv1 = tf.nn.relu(bias, name=scope.name)
    _activation_summary(conv1)

with tf.variable_scope('visualization'):
    # scale weights to [0 1], type is still float
    x_min = tf.reduce_min(kernel)
    x_max = tf.reduce_max(kernel)
    kernel_0_to_1 = (kernel - x_min) / (x_max - x_min)

    # to tf.image_summary format [batch_size, height, width, channels]
    kernel_transposed = tf.transpose(kernel_0_to_1, [3, 0, 1, 2])

    # this will display random 3 filters from the 64 in conv1
    tf.image_summary('conv1/filters', kernel_transposed, max_images=3)
I also wrote a simple gist to display all 64 conv1 filters in a grid.
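The gist itself isn't reproduced here, but one simple way to look at all 64 filters at once is to fetch the kernel as a NumPy array and tile it with matplotlib (a sketch of my own, assuming the kernel has shape [5, 5, 3, 64] as above):

import numpy as np
import matplotlib.pyplot as plt

def plot_conv1_filters(weights, rows=8, cols=8):
    # weights: NumPy array of shape [height, width, 3, 64], e.g. sess.run(kernel)
    w_min, w_max = weights.min(), weights.max()
    scaled = (weights - w_min) / (w_max - w_min)   # scale to [0, 1] for imshow
    fig, axes = plt.subplots(rows, cols, figsize=(8, 8))
    for i, ax in enumerate(axes.flat):
        ax.imshow(scaled[:, :, :, i])              # one 5x5x3 filter per cell
        ax.axis('off')
    plt.show()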
When marking tick locations on a plot, are there any standard solutions to how to place the tick markers? I looked at Matplotlib's MaxNLocator (https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/ticker.py#L1212) but it's not immediately clear what all the different options do, or which of them are necessary for basic tick placement.
Can someone provide pseudocode for a simple tick location function?
I think the rule of thumb for placing ticks on a plot is to use multiples of 1, 2, 5, and 10. In my experience, matplotlib seems to abide by this. If you have reason to deviate from the default ticks, I think the easiest way to set them is to use the set_ticks() method for a particular axis. The relevant documentation is here: http://matplotlib.org/api/axis_api.html.
Example
import numpy as np
import matplotlib.pyplot as plt
ax = plt.subplot() # create axes to plot into
foo = np.array([0, 4, 12, 13, 18, 22]) # awkwardly spaced data
bar = np.random.rand(6) # random bar heights
plt.bar(foo, bar) # bar chart
ax.xaxis.get_ticklocs() # check tick locations -- currently array([ 0., 5., 10., 15., 20., 25.])
ax.xaxis.set_ticks(foo) # set the ticks to be right at each bar
ax.xaxis.get_ticklocs() # array([ 0, 4, 12, 13, 18, 22])
plt.draw()
ax.xaxis.set_ticks([0, 10, 20]) # minimal set of ticks
ax.xaxis.get_ticklocs() # array([ 0, 10, 20])
plt.draw()
Of the three options in my example, I would keep the default behaviour in this case; but there are definitely times when I would override the defaults. For example, another rule of thumb is that we should minimize the amount of ink in our plots that is not data (i.e. anything other than the markers and lines that encode the data). So if the default tick set was [0, 1, 2, 3, 4, 5, 6], I might change that to [0, 2, 4, 6], since that's less ink for the plot ticks without losing clarity.
Edit: The ticks at [0, 10, 20] can also be accomplished with locators, as suggested in the comment. Examples:
ax.xaxis.set_major_locator(plt.FixedLocator([0,10,20]))
ax.xaxis.set_major_locator(plt.MultipleLocator(base=10))
ax.xaxis.set_major_locator(plt.MaxNLocator(nbins=3))
Is there something like an anti-filter in image processing?
Say for instance, I am filtering an image using the following 13 tap symmetric filter:
{0, 0, 5, -6, -10, 37, 76, 37, -10, -6, 5, 0, 0} / 128
Each pixel is changed by this filtering process. My question is: can we get back the original image by applying some mathematical operation to the filtered image?
Obviously such mathematical operations exists for trivial filters, like:
{1, 1} / 2
Can we generalize this to complex filters like the one I mentioned at the beginning?
Here is a pointer to one method of deconvolution that takes account of noise, which in your case I guess you have due to rounding error: http://en.wikipedia.org/wiki/Wiener_deconvolution
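To illustrate the idea on your 13-tap filter (a rough 1-D sketch of my own, not the full Wiener formulation, and assuming periodic boundary handling), the filter can be approximately inverted by regularized division in the frequency domain:

import numpy as np

h = np.array([0, 0, 5, -6, -10, 37, 76, 37, -10, -6, 5, 0, 0]) / 128.0

def inverse_filter_1d(filtered_row, h, eps=1e-3):
    # Approximately undo a centred 1-D convolution; eps keeps frequencies where the
    # filter response is near zero from amplifying noise/rounding error.
    n = len(filtered_row)
    h_padded = np.zeros(n)
    h_padded[:len(h)] = h
    h_centered = np.roll(h_padded, -(len(h) // 2))   # put the centre tap at index 0
    H = np.fft.fft(h_centered)
    Y = np.fft.fft(filtered_row)
    X = Y * np.conj(H) / (np.abs(H) ** 2 + eps)      # Wiener-style regularized inverse
    return np.real(np.fft.ifft(X))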