TensorFlow - shuffle & split dataset of images and labels

I'm new to TensorFlow and am using neural networks to classify images. I have a Tensor of images with shape [N, 128, 128, 1] (N images of 128x128 with 1 channel) and a Tensor of shape [N] that holds the label of each image.
I want to shuffle everything and split it into training and testing tensors (say 80%-20%). I couldn't find a way to 'zip' my tensors so that each image stays associated with its label (so that images and labels are shuffled the same way). Is that possible? If not, how can I do the shuffling and splitting?
Thanks for any help!

Just pass the same 'seed' keyword argument, say seed=8, to tf.random_shuffle for both the labels and the data.
ipdb> my_data = tf.convert_to_tensor([[1,1], [2,2], [3,3], [4,4],
                                      [5,5], [6,6], [7,7], [8,8]])
ipdb> my_labels = tf.convert_to_tensor([1,2,3,4,5,6,7,8])
ipdb> sess.run(tf.random_shuffle(my_data, seed=8))
array([[5, 5],
       [3, 3],
       [1, 1],
       [7, 7],
       [2, 2],
       [8, 8],
       [4, 4],
       [6, 6]], dtype=int32)
ipdb> sess.run(tf.random_shuffle(my_labels, seed=8))
array([5, 3, 1, 7, 2, 8, 4, 6], dtype=int32)
EDIT:
If you need random shuffling at runtime, where, say, batches are shuffled differently on every run, you can use this trick:
# the shuffling pattern is different on every run
# for now, it works
indices = tf.random_shuffle(tf.range(8))
params = tf.convert_to_tensor([111, 222, 333, 444, 555, 666, 777, 888])
sess.run(tf.add(tf.gather(params, indices), tf.gather(params, indices) * 1000))
> array([555555, 444444, 666666, 222222, 111111, 888888, 333333, 777777], dtype=int32)
Every number is made of one repeated digit, which shows that both gather calls received the same shuffled indices within a single run.
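The same shared-index idea also covers the 80%-20% split from the question. Below is a minimal sketch in NumPy rather than TensorFlow, so it runs without a session. The function name shuffle_and_split and its parameters are made up for illustration. In TensorFlow the same index array could be passed to tf.gather, and in TensorFlow 2.x tf.random_shuffle is spelled tf.random.shuffle.

```python
import numpy as np

def shuffle_and_split(images, labels, train_fraction=0.8, seed=8):
    """Shuffle images and labels with one shared permutation, then split.

    Hypothetical helper: the name and the default values are assumptions.
    """
    if len(images) != len(labels):
        raise ValueError("images and labels must have the same length")
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))  # one permutation used for both arrays
    n_train = int(round(train_fraction * len(images)))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return (images[train_idx], labels[train_idx]), (images[test_idx], labels[test_idx])

# Stand-in data: image i is filled with the value i, so pairing is easy to check.
images = np.arange(10).reshape(10, 1, 1, 1) * np.ones((1, 2, 2, 1), dtype=int)
labels = np.arange(10)
(train_x, train_y), (test_x, test_y) = shuffle_and_split(images, labels)
print(train_x.shape, test_x.shape)
```

Because both arrays are indexed with the same permutation, every image keeps its own label on both sides of the split.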


How to vectorize getting sub arrays from numpy array using indexing arrays

I want to get a numpy array of sub-arrays from a base array using some kind of indexing arrays (I'm open to suggestions on the style/format of the indexing arrays). I can easily do this with a for loop, but I'm wondering whether there is a clever way to use numpy broadcasting.
Constraints: Sub-arrays are guaranteed to be the same size.
import numpy as np

up_idx = np.array([[0, 0],
                   [0, 2],
                   [1, 1]])
lw_idx = np.array([[2, 2],
                   [2, 4],
                   [3, 3]])
base = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 10, 11, 12]])
samples = []
for index in range(up_idx.shape[0]):
    up_row = up_idx[index, 0]
    up_col = up_idx[index, 1]
    lw_row = lw_idx[index, 0]
    lw_col = lw_idx[index, 1]
    samples.append(base[up_row:lw_row, up_col:lw_col])
samples = np.array(samples)
print(samples)
> [[[ 1  2]
    [ 5  6]]
   [[ 3  4]
    [ 7  8]]
   [[ 6  7]
    [10 11]]]
I've tried:
vector_s = base[up_idx[:, 0]:lw_idx[:, 1], up_idx[:, 1]:lw_idx[:, 1]]
But that was just nonsensical it seems.
I don't think there is a fast way to do this in general via numpy broadcasting operations – for one thing, the way you set up the problem there is no guarantee that the resulting sub-arrays will be the same shape, and thus able to fit into a single output array.
The most succinct and efficient way to solve this is probably via a list comprehension; e.g.
result = np.array([base[i1:i2, j1:j2] for (i1, j1), (i2, j2) in zip(up_idx, lw_idx)])
Unless your base array is very large, this shouldn't be much of a bottleneck.
If you have different problem constraints (i.e. same size slice in every case) it may be possible to come up with a faster vectorized solution based on fancy indexing. For example, if every slice is of size two (as in your example above) then you can use fancy indexing like this to obtain the same result:
i, j = up_idx.T[:, :, None] + np.arange(2)
result = base[i[:, :, None], j[:, None]]
The key to understanding this fancy indexing is to realize that the result follows the broadcasted shape of the index arrays.
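To make the broadcasting explicit, here is a small sketch that generalizes the two lines above to any fixed window size and checks the result against a plain loop. The helper name extract_windows and its height/width parameters are assumptions for illustration, not part of NumPy.

```python
import numpy as np

def extract_windows(base, up_idx, height, width):
    """Gather fixed-size windows whose top-left corners are the rows of up_idx.

    Hypothetical helper: assumes every window fits inside base.
    """
    rows = up_idx[:, 0, None] + np.arange(height)  # shape (k, height)
    cols = up_idx[:, 1, None] + np.arange(width)   # shape (k, width)
    # (k, height, 1) and (k, 1, width) broadcast to (k, height, width)
    return base[rows[:, :, None], cols[:, None, :]]

base = np.array([[1, 2, 3, 4],
                 [5, 6, 7, 8],
                 [9, 10, 11, 12]])
up_idx = np.array([[0, 0], [0, 2], [1, 1]])

windows = extract_windows(base, up_idx, 2, 2)
looped = np.array([base[r:r + 2, c:c + 2] for r, c in up_idx])
print(np.array_equal(windows, looped))
```

The row index array varies along the second axis and the column index array along the third, so their broadcast shape is exactly the shape of the stacked windows.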

Convert a nested array to a matrix in Ruby?

When converting a nested array to a matrix in Ruby, the matrix ends up with an extra [] around the values, compared to simply creating a matrix from scratch.
> require 'matrix'
> matrix1 = Matrix[[1,2,3],[4,5,6],[7,8,9]]
> p matrix1
=> Matrix[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
> nested_array = [[1,2,3],[4,5,6],[7,8,9]]
> matrix2 = Matrix[nested_array]
> p matrix2
=> Matrix[[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
Is there a way to avoid the extra square brackets when building from an array?
matrix2 = Matrix[*nested_array]
p matrix2
=> Matrix[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
The asterisk (*) there is called the "splat operator," and it essentially can be used to treat an array (nested_array in this case) as if it weren't an array, but rather as if its elements were individual elements/arguments.

How to generate partially repeated permutations in ruby?

I have a range of numbers R = (1..n). I also have another character 'a'. I want to generate strings of length L (L > n + 2) that contain all the numbers in the same order, and go through every repeated permutation of 'a' to fill the length L. For example, if n = 3 and L = 7, then some valid strings would be:
"123aaaa",
"1a23aaa",
"1aa2a3a",
"aaaa123"
while the following strings would be invalid:
"213aaaa", # invalid, because 1,2,3 are not in order
"123a", #invalid, because length < L
"1123aaa", # invalid because a number is repeated
I am currently doing this, which is way too inefficient:
n = 3
L = 7
all_terms = (1..n).to_a + Array.new(L - n, 'a')
all_terms.permutation.each do |permut|
  if(valid_permut? permut) # checks if numbers are in their natural order
    puts permut.join
  end
end
How do I directly generate valid strings more efficiently?
The problem is equivalent to: select n elements from index 0 to L - 1, fill these with 1 to n accordingly, and fill the rest with some constant character.
In your example, it's taking 3 elements from 0..6:
(0..6).to_a.combination(3).to_a
=> [[0, 1, 2], [0, 1, 3], [0, 1, 4], [0, 1, 5], [0, 1, 6], [0, 2, 3], [0, 2, 4],
[0, 2, 5], [0, 2, 6], [0, 3, 4], [0, 3, 5], [0, 3, 6], [0, 4, 5], [0, 4, 6], [0, 5, 6],
[1, 2, 3], [1, 2, 4], [1, 2, 5], [1, 2, 6], [1, 3, 4], [1, 3, 5], [1, 3, 6], [1, 4, 5],
[1, 4, 6], [1, 5, 6], [2, 3, 4], [2, 3, 5], [2, 3, 6], [2, 4, 5], [2, 4, 6], [2, 5, 6],
[3, 4, 5], [3, 4, 6], [3, 5, 6], [4, 5, 6]]
Every subarray here represents a possible result: it lists the positions that receive the numbers 1 to n, in order. For example, [0, 2, 3] corresponds to '1a23aaa', [3, 5, 6] corresponds to 'aaa1a23', etc. The code for this conversion is straightforward.
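The conversion from index combinations to strings could look roughly like this. The sketch is in Python rather than Ruby, using itertools.combinations in place of Array#combination, and the function name ordered_fillings is made up for illustration.

```python
from itertools import combinations

def ordered_fillings(n, length, filler="a"):
    """Yield every string with 1..n in order and `filler` in all other slots.

    Hypothetical helper: for n > 9 the multi-digit numbers make the joined
    string longer than `length` characters.
    """
    for positions in combinations(range(length), n):
        chars = [filler] * length
        # positions is increasing, so the numbers land in their natural order
        for number, pos in enumerate(positions, start=1):
            chars[pos] = str(number)
        yield "".join(chars)

results = list(ordered_fillings(3, 7))
print(len(results))
print(results[:3])
```

This produces each valid string exactly once, so nothing has to be generated and then filtered out as in the permutation approach.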
You can model this as all possible interleavings of two strings, where relative order of the input elements is preserved. Here's a recursive solution. It works by choosing an element from one list, and prepending it to all possible subproblems, then doing it again where an element is chosen from the second list instead, and combining the two solution sets at the end.
# Returns an array of all possible interleaving of two strings
# Maintains relative order of each character of the input strings
def interleave_strings_all(a1, a2)
  # Handle base case where at least one input string is empty
  return [a1 + a2] if a1.empty? || a2.empty?
  # Place element of first string, and prepend to all subproblems
  set1 = interleave_strings_all(a1[1..-1], a2).map{|x| a1[0] + x}
  # Place element of second string and prepend to all subproblems
  set2 = interleave_strings_all(a1, a2[1..-1]).map{|x| a2[0] + x}
  # Combine solutions of subproblems into overall problem
  return set1.concat(set2)
end
if __FILE__ == $0 then
  l = 5
  n = 3
  a1 = (1..n).to_a.map{|x| x.to_s}.join()
  a2 = 'a' * (l - n)
  puts interleave_strings_all(a1, a2)
end
The output is:
123aa
12a3a
12aa3
1a23a
1a2a3
1aa23
a123a
a12a3
a1a23
aa123

Sorting Array of Arrays having variable number of elements

I have to sort an array of arrays. I've searched for solutions, but my problem has two requirements:
I need to sort arrays whose size may differ from one script run to another.
I need to sort not just by one or two elements but, if possible, based on all elements.
For example, for the following inputs:
[[2,3,4,5,6],[1,3,4,5,7],[1,3,4,5,8]]
[[5,2,3],[2,2,4],[2,2,5]]
The output should be, respectively:
[[1,3,4,5,7],[1,3,4,5,8],[2,3,4,5,6]]
[[2,2,4],[2,2,5],[5,2,3]]
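Array#sort already does this: arrays are compared element by element, so it works for sub-arrays of any length.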
input=[[2,3,4,5,6],[1,3,4,5,7],[1,3,4,5,8]]
input.sort # => [[1, 3, 4, 5, 7], [1, 3, 4, 5, 8], [2, 3, 4, 5, 6]]

Given some integer ranges, finding a smallest set containing at least one integer from each range

Given some ranges of integers, how can I find a smallest set of integers that contains at least one integer from each range? For example, if I'm given these ranges:
[0, 4], [1, 2], [5, 7], [6, 7], [6, 9], [8, 10]
Then some solution sets are : { 1, 6, 8 }, { 2, 7, 9 }, { 1, 7, 8 } etc.
Imagine you draw all your ranges, ordered by end value, as you would draw meetings inside a day planner.
You can visually choose your numbers in a greedy manner: the first number is the end of the segment that finishes first (in your example, that would be 2).
Then you erase all segments that contain that number, and you start all over.
This algorithm would yield the solution { 2, 7, 10 }.
0  1  2  3  4  5  6  7  8  9  10
   ----
-------------
      ^        -------
      |           ----
                  ----------
                     ^  -------
                     |        ^
                              |
Algorithm:
Sort the start and end points. Pass over them until you meet an endpoint. Add it to the answer and remove all ranges whose start points have already been passed (i.e. the ranges that contain the current endpoint). Repeat until no points are left.
Example:
[0, 4], [1, 2], [5, 7], [6, 7], [6, 9], [8, 10]
After sorting will become
[0, [1, 2], 4], [5, [6, [6, 7], 7], [8, 9], 10], ans = []
First endpoint is 2], we add it to ans and remove ranges opened before it, i.e. [0 and [1:
[5, [6, [6, 7], 7], [8, 9], 10], ans = [2]
Now the first endpoint is 7], and we remove the ranges [5, 7], [6, 7], [6, 9]:
[8, 10], ans = [2, 7]
Finally, add 10 and remove the last range. The result will be [2, 7, 10].
Complexity:
Sorting takes O(n log n) time. After that you pass over each element twice: once when looking for the next endpoint and once when removing all currently open intervals. That part is linear, so the total complexity is O(n log n), dominated by the sorting.
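A minimal Python sketch of this sweep, under the assumption that ranges are closed and given as (start, end) pairs. The name stab_by_sweep and the event encoding are illustrative choices, not something from the answer.

```python
def stab_by_sweep(ranges):
    """Sweep over sorted start/end points and pick each first open endpoint.

    Hypothetical helper: ranges are closed (start, end) pairs.
    """
    events = []
    for idx, (start, end) in enumerate(ranges):
        events.append((start, 0, idx))  # kind 0: starts sort before ends at a tie
        events.append((end, 1, idx))    # kind 1: end point
    events.sort()

    open_ids = set()  # ranges that have started and are not yet covered
    chosen = []
    for point, kind, idx in events:
        if kind == 0:
            open_ids.add(idx)
        elif idx in open_ids:
            # First endpoint of a still-uncovered range: every open range contains it.
            chosen.append(point)
            open_ids.clear()
    return chosen

ranges = [(0, 4), (1, 2), (5, 7), (6, 7), (6, 9), (8, 10)]
print(stab_by_sweep(ranges))
```

On the example ranges this sketch picks 2, 7 and 10, since 10 is the endpoint of the last remaining range [8, 10].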
We sort the intervals by their end values. For each interval, if its start is not greater than the previous chosen end, then it contains that end (its own end cannot be smaller, because the intervals are sorted by end), so it is already covered and can be skipped. If the start of the current interval is greater than the previous end, there is no overlap, and we add the current end to the result set.
Consider the intervals (0, 3), (2, 6), (3, 4), (6, 10). After sorting, we have (0, 3), (3, 4), (2, 6), (6, 10). We start with result = [3] and previous = 3. Since 3 <= previous, we skip the interval (3, 4); previous remains unchanged. Since 2 <= previous, we skip the interval (2, 6); previous remains unchanged. Lastly, since 6 > previous, we add 10 to the result, and update previous = 10. The algorithm terminates; the answer is [3, 10].
Time complexity: O(n log n), where n is the number of intervals.
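The sort-by-end rule can be sketched in a few lines of Python. The function name stab_by_end is an assumption for illustration, and intervals are taken to be closed (start, end) pairs.

```python
def stab_by_end(intervals):
    """Greedy choice: sort by end, keep an end only if the interval is uncovered.

    Hypothetical helper: intervals are closed (start, end) pairs.
    """
    chosen = []
    previous = None  # last end value added to the result
    for start, end in sorted(intervals, key=lambda interval: interval[1]):
        if previous is None or start > previous:
            chosen.append(end)
            previous = end
        # otherwise start <= previous <= end, so `previous` already covers it
    return chosen

print(stab_by_end([(0, 3), (2, 6), (3, 4), (6, 10)]))
print(stab_by_end([(0, 4), (1, 2), (5, 7), (6, 7), (6, 9), (8, 10)]))
```

The first call reproduces the walkthrough above, and the second applies the same rule to the ranges from the question.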
