Difference between convolution2d and conv2d in TensorFlow in terms of usage - filter

In TensorFlow for 2D convolution we have:
tf.nn.conv2d(input, filter, strides, padding, use_cudnn_on_gpu=None,
data_format=None, name=None)
and
tf.contrib.layers.convolution2d(*args, **kwargs)
I am not sure about the differences.
I know that I should use the first one if I want to use a special filter, right? But what else? Especially, what about the outputs?
Thank you

tf.nn.conv2d(...) is the core, low-level convolution functionality provided by TensorFlow. tf.contrib.layers.conv2d(...) is part of a higher-level API built around core TensorFlow.
Note that in current TensorFlow versions, parts of layers are now in core, too, e.g. tf.layers.conv2d.
The difference is simply that tf.nn.conv2d is an op that does convolution and nothing else. tf.layers.conv2d does more: for example, it also creates the variables for the kernel and the biases, among other things.
Check out the Tensorflow Tutorial on CNNs which uses Tensorflow core (here). With the low-level API the convolutional layers are created like this:
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding='SAME')

W_conv1 = weight_variable([5, 5, 1, 32])
b_conv1 = bias_variable([32])
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
Compare that to the TF Layers Tutorial for CNNs (here). With TF Layers, convolutional layers are created like this:
conv1 = tf.layers.conv2d(
    inputs=input_layer,
    filters=32,
    kernel_size=[5, 5],
    padding="same",
    activation=tf.nn.relu)
Without knowing your use case: Most likely you want to use tf.layers.conv2d.

There will be no difference between tf.keras.layers.Conv2D and tf.keras.layers.Convolution2D in TensorFlow 2.x.
Here's the link for the illustration.
In TensorFlow 2.x, Keras is an API within TensorFlow.
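A quick way to convince yourself of this, as a minimal sketch assuming a TF 2.x install, is to check that the two names point to the same class:

import tensorflow as tf

# Convolution2D is just an alias of Conv2D in tf.keras
print(tf.keras.layers.Convolution2D is tf.keras.layers.Conv2D)  # expected: True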

Related

Multiply Eigen::Matrices without transposing first

Say you have a Matrix<int, 4, 2> and a Matrix<int, 3, 2> which you want to multiply in the natural way that consumes the last (-1) dimension, without first transposing.
Is this possible? Or do we have to transpose first? That would be silly (bad for performance) from a cache perspective, because then the elements we are multiplying and summing aren't contiguous.
Here's a playground. https://godbolt.org/z/Gdj3sfzcb
Pytorch provides torch.inner and torch.tensordot which do this.
Just like in Numpy, transpose() just creates a "view". It doesn't do any expensive memory operations (unless you assign it to a new matrix). Just call a * b.transpose() and let Eigen handle the details of the memory access. A properly optimized BLAS library like Eigen handles the transposition on smaller tiles in temporary memory for optimal performance.
Memory order still matters for fine tuning though. If you can, write your matrix multiplications in the form a.transpose() * b for column-major matrices (like Eigen, Matlab), or a * b.transpose() for row-major matrices like those in Numpy. That saves the BLAS library the trouble of doing that transposition.
Side note: You used auto for your result. Please read the Common Pitfalls chapter in the documentation. Your code didn't compute a matrix multiplication, it stored an expression of one.

Why is my Doc2Vec model in gensim not reproducible?

I have noticed that my gensim Doc2Vec (DBOW) model is sensitive to document tags. My understanding was that these tags are cosmetic and so they should not influence the learned embeddings. Am I misunderstanding something? Here is a minimal example:
from gensim.test.utils import common_texts
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
import numpy as np
import os

os.environ['PYTHONHASHSEED'] = '0'
reps = []
for a in [0, 500]:
    documents = [TaggedDocument(doc, [i + a])
                 for i, doc in enumerate(common_texts)]
    model = Doc2Vec(documents, vector_size=100, window=2, min_count=0,
                    workers=1, epochs=10, dm=0, seed=0)
    reps.append(np.array([model.docvecs[k] for k in range(len(common_texts))]))
reps[0].sum() == reps[1].sum()
This last line returns False. I am working with gensim 3.8.3 and Python 3.5.2. More generally, is there any role that the values of the tags play (assuming they are unique)? I ask because I have found that using different tags for documents in a classification task leads to widely varying performance.
Thanks in advance.
First & foremost, your test isn't even comparing vectors corresponding to the same texts!
In run #1, the vector for the 1st text is in model.docvecs[0]. In run #2, the vector for the 1st text is in model.docvecs[500].
And, in run #2, the vector at model.docvecs[0] is just a randomly-initialized, but never-trained, vector - because none of the training texts had a document tag of (int) 0. (If using plain ints as the doc-tags, Doc2Vec uses them as literal indexes - potentially leaving any unused slots lower than your highest tag allocated-and-initialized, but never-trained.)
Since common_texts only has 11 entries, in run #2 all 11 of the vectors you collect into reps are such never-trained garbage, uncorrelated with any of your texts.
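A minimal sketch of a like-for-like comparison (my own adjustment of the snippet above, not an official recipe): look each vector up by the tag it was actually trained under, i.e. i + a, rather than by raw index.

reps = []
for a in [0, 500]:
    documents = [TaggedDocument(doc, [i + a])
                 for i, doc in enumerate(common_texts)]
    model = Doc2Vec(documents, vector_size=100, window=2, min_count=0,
                    workers=1, epochs=10, dm=0, seed=0)
    # collect the trained vectors under the tags that were actually used
    reps.append(np.array([model.docvecs[i + a] for i in range(len(common_texts))]))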
However, even after correcting that:
As explained in the Gensim FAQ answer #11, determinism in this algorithm shouldn't generally be expected, given many sources of potential randomness, and the fuzzy/approximate nature of the whole approach. If you're relying on it, or testing for it, you're probably making some unwarranted assumptions.
In general, tests of these algorithms should be evaluating "roughly equivalent usefulness in comparative uses" rather than "identical (or even similar) specific vectors". For example, a test whether apple and orange are roughly at the same positions in each others' nearest-neighbor rankings makes more sense than checking their (somewhat arbitrary) exact vector positions or even cosine-similarity.
Additionally:
tiny toy datasets like common_texts won't show the algorithm's usual behavior/benefits
PYTHONHASHSEED is only consulted by the Python interpreter at startup; setting it from Python can't have any effect. But also, the kind of indeterminism it introduces only comes up with separate interpreter launches: a tight loop within a single interpreter run like this wouldn't be affected by that in any case.
Have you checked the magnitude of the differences?
Just running:
delta = reps[0].sum() - reps[1].sum()
gives an aggregate difference of -1.2598932e-05 when I run it.
Comparing dimension-wise:
diff = reps[0] - reps[1]
eps = 10**-4
over = (np.abs(diff) <= eps).all()
This returns True on the vast majority of runs, which means that you are getting quite reproducible results given the complexity of the calculations.
I would blame numerical stability of the calculations or uncontrolled randomness. Even though you do try to control the random seed, there is a different random seed in NumPy and a different one in the random standard library, so you are not controlling all of the sources of randomness. This can also have an influence on the results, but I did not check the actual implementation in gensim and its dependencies.
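As a minimal sketch of what pinning those extra sources would look like (an assumption on my part; gensim may draw randomness from further places, such as its own hashing and threading):

import random
import numpy as np

random.seed(0)     # the standard-library RNG
np.random.seed(0)  # NumPy's global RNG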
Change
import os
os.environ['PYTHONHASHSEED'] = '0'
to
import os
import sys

hashseed = os.getenv('PYTHONHASHSEED')
if not hashseed:
    os.environ['PYTHONHASHSEED'] = '0'
    os.execv(sys.executable, [sys.executable] + sys.argv)

How to use another library in the tensorflow graph?

I just read this article. The article says that the resize algorithm of TensorFlow has some bugs. Now I want to use scipy.misc.imresize instead of tf.image.resize_images, and I wonder what the best way is to plug the scipy resize algorithm in.
Let's consider the following layer:
def up_sample(input_tensor, new_height, new_width):
    _up_sampled = tf.image.resize_images(input_tensor, [new_height, new_width])
    _conv = tf.layers.conv2d(_up_sampled, 32, [3,3], padding="SAME")
    return _conv
How can I use the scipy algorithm in this layer?
Edit:
An example can be this function:
input_tensor = tf.placeholder("float32", [10, 200, 200, 8])
output_shape = [32, 210, 210, 8]

def up_sample(input_tensor, output_shape):
    new_array = np.zeros(output_shape)
    for batch in range(input_tensor.shape[0]):
        for channel in range(input_tensor.shape[-1]):
            new_array[batch, :, :, channel] = misc.imresize(input_tensor[batch, :, :, channel], output_shape[1:3])
    return new_array
But obviously scipy raises a ValueError because the tf.Tensor object does not have the right shape. I read that during a tf.Session the Tensors are accessible as numpy arrays. How can I use the scipy function only during a session and skip its execution when creating the protocol buffer?
And is there a faster way than looping over all batches and channels?
Generally speaking, the tools you need are a combination of tf.map_fn and tf.py_func.
tf.py_func allows you to wrap a standard python function into a tensorflow op that is inserted into your graph.
tf.map_fn allows you to call a function repeatedly on the batch samples, when the function cannot operate on the whole batch, as is often the case with image functions.
In the present case, I would probably advise using scipy.ndimage.zoom on the basis that it can operate directly on the 4D tensor, which makes things simpler. On the other hand, it takes zoom factors as input, not sizes, so we need to compute them.
import tensorflow as tf
sess = tf.InteractiveSession()

# unimportant -- just a way to get an input tensor
batch_size = 13
im_size = 7
num_channel = 5
x = tf.eye(im_size)[None, ..., None] + tf.zeros((batch_size, 1, 1, num_channel))
new_size = 17

from scipy import ndimage
new_x = tf.py_func(
    lambda a: ndimage.zoom(a, (1, new_size/im_size, new_size/im_size, 1)),
    [x], [tf.float32], stateful=False)[0]

print(new_x.eval().shape)
# (13, 17, 17, 5)
You could use other functions (e.g. OpenCV's cv2.resize, Scikit-image's transform.resize, Scipy's misc.imresize), but none of them can operate directly on 4D tensors, and therefore they are more verbose to use. You may still want to use them if you want an interpolation other than zoom's spline-based interpolation.
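For illustration, here is a minimal sketch of that more verbose per-sample route, combining tf.map_fn and tf.py_func with ndimage.zoom applied image by image. It reuses the x, im_size and new_size names from the snippet above and is only an assumption of how one might wire it up, not the recommended path:

import numpy as np

def resize_one(img):
    # img is a single [height, width, channels] numpy array; zoom only the spatial dims
    return ndimage.zoom(img, (new_size / im_size, new_size / im_size, 1)).astype(np.float32)

new_x2 = tf.map_fn(
    lambda img: tf.py_func(resize_one, [img], tf.float32, stateful=False),
    x, dtype=tf.float32)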
However, be aware of the following things:
Python functions are executed on the host. So, if you are executing your graph on a device like a graphics card, it needs to stop, copy the tensor to host memory, call your function, then copy the result back on the device. This can completely ruin your computation time if memory transfers are important.
Gradients do not pass through python functions. If your node is used, say, in an upscaling part of a network, layers upstream will not receive any gradient (or only part of it, if you have skip connections), which would compromise your training.
For those two reasons, I would advise applying this kind of resampling to inputs only, when they are preprocessed on the CPU and gradients are not needed.
If you do want to use this upscale node for training on the device, then I see no alternative but to either stick with the buggy tf.image.resize_images or write your own op.

What is a tensorflow session actually?

What does the sentence below, quoted from Getting Started with TensorFlow, mean?
A session encapsulates the control and state of the TensorFlow runtime.
I know about encapsulation in object-oriented programming, and I have also played with sessions a bit with success. Still, I do not understand this sentence very well. Can someone rephrase it in simple words?
This encapsulation has nothing to do with OOP encapsulation. A slightly better (in terms of understanding for a newcomer) definition is in the Session documentation.
A Session object encapsulates the environment in which Operation
objects are executed, and Tensor objects are evaluated.
This means that none of the operations and variables defined in the graph-definition part are executed. For example, nothing is being executed/calculated here:
a = tf.Variable(tf.random_normal([3, 3], stddev=1.))
b = tf.Variable(tf.random_normal([3, 3], stddev=1.))
c = a + b
You will not get the values of the tensors a/b/c now. Their values will be evaluated only inside of a Session.
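As a minimal sketch of that point (TF 1.x style, reusing the a/b/c definitions above), the values only materialize once you run them in a session:

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # the variables get concrete values here
    print(sess.run(c))                           # c is actually computed only now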
Like Dali said, the encapsulation has nothing to do with OOP encapsulation.
the control and state of the TensorFlow runtime
the control: A TensorFlow graph is a description of computations. To compute anything, a graph must be launched in a Session.
state: A Session places the graph ops onto Devices, such as CPUs or GPUs, and provides methods to execute them. These methods return tensors produced by ops as numpy ndarray objects in Python, and as tensorflow::Tensor instances in C and C++.
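As a small illustrative sketch of both points (TF 1.x, with made-up values), the session honors the device placement and hands the result back to Python as a plain numpy value:

with tf.device("/cpu:0"):      # the session will place this op on the CPU
    total = tf.add(3.0, 4.0)
with tf.Session() as sess:
    result = sess.run(total)   # returned to Python as a numpy value
    print(type(result), result)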

Is there a way to parallelize stacked RNNs over multiple GPUs in TensorFlow?

Is it possible to take the output of a tf.scan operation and stream it directly to a different GPU, effectively running two stacked RNNs on two GPUs in parallel? Something like this:
cell1 = tf.nn.rnn_cell.MultiRNNCell(..)
cell2 = tf.nn.rnn_cell.MultiRNNCell(..)

with tf.device("/gpu:0"):
    ys1 = tf.scan(lambda a, x: cell1(x, a[1]), inputs,
                  initializer=(tf.zeros([batch_size, state_size]), init_state))

with tf.device("/gpu:1"):
    ys2 = tf.scan(lambda a, x: cell2(x, a[1]), ys1,
                  initializer=(tf.zeros([batch_size, state_size]), init_state))
Will TensorFlow automatically take care of that optimization, or will it block the graph flow until the list ys1 is finalized?
Unfortunately, tf.scan has a "boundary" at the output, and all iterations have to complete before the output tensor can be read by the next operations. However, you can run the different levels of your LSTM stack on different GPUs, and get frame parallelism within a scan. Write your own version of MultiRNNCell that uses a separate device for each LSTM layer.
Also you probably want to use tf.nn.dynamic_rnn instead of scan.
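As a hedged sketch of what such per-layer device placement could look like (TF 1.x rnn_cell API; the wrapper class and the reuse of state_size/inputs from the question are my own assumptions, not an established TensorFlow class):

class DeviceCellWrapper(tf.nn.rnn_cell.RNNCell):
    """Runs the wrapped cell's ops on a fixed device."""
    def __init__(self, device, cell):
        self._device = device
        self._cell = cell

    @property
    def state_size(self):
        return self._cell.state_size

    @property
    def output_size(self):
        return self._cell.output_size

    def __call__(self, inputs, state, scope=None):
        with tf.device(self._device):
            return self._cell(inputs, state, scope)

# one LSTM layer per GPU, stacked and driven by dynamic_rnn
cells = [DeviceCellWrapper("/gpu:%d" % i, tf.nn.rnn_cell.BasicLSTMCell(state_size))
         for i in range(2)]
stacked_cell = tf.nn.rnn_cell.MultiRNNCell(cells)
outputs, final_state = tf.nn.dynamic_rnn(stacked_cell, inputs, dtype=tf.float32)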
