What does the sentence below, quoted from Getting Started with TensorFlow, mean?
A session encapsulates the control and state of the TensorFlow runtime.
I know about encapsulation in object-oriented programming, and I have also played with sessions a bit with success. Still, I do not understand this sentence very well. Can someone rephrase it in simple words?
This encapsulation has nothing to do with OOP encapsulation. A slightly better definition (in terms of understanding for a newcomer) is in the session documentation:
A Session object encapsulates the environment in which Operation objects are executed, and Tensor objects are evaluated.
This means that none of the operations and variables defined in the graph-definition part are executed. For example, nothing is being executed/calculated here:
a = tf.Variable(tf.random_normal([3, 3], stddev=1.))
b = tf.Variable(tf.random_normal([3, 3], stddev=1.))
c = a + b
You will not get the values of the tensors a/b/c now. Their values will be evaluated only inside of a Session.
Like Dali said, the encapsulation has nothing to do with OOP encapsulation.
the control and state of the TensorFlow runtime
the control: A TensorFlow graph is a description of computations. To compute anything, a graph must be launched in a Session.
state: A Session places the graph ops onto Devices, such as CPUs or GPUs, and provides methods to execute them. These methods return tensors produced by ops as numpy ndarray objects in Python, and as tensorflow::Tensor instances in C and C++.
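To make the "define first, execute later" split concrete, here is a toy analogy in plain Python (not real TensorFlow; the `Node` class and `run` function are invented for illustration). Building the graph records *how* to compute; the "session" is what actually runs it:

```python
# A toy analogy of graph definition vs. execution (not real TensorFlow).
class Node:
    """A graph node that knows how to compute its value, but does not yet."""
    def __init__(self, fn):
        self.fn = fn  # recipe for producing this node's value

    def __add__(self, other):
        # Writing c = a + b only records the computation.
        return Node(lambda: self.fn() + other.fn())

def run(node):
    """The 'Session' analog: actually executes the recorded graph."""
    return node.fn()

a = Node(lambda: 2.0)
b = Node(lambda: 3.0)
c = a + b        # nothing is computed here, just graph construction
print(run(c))    # 5.0 -- computed only when the graph is "run"
```

In real TensorFlow 1.x the same split appears as building ops on a `tf.Graph` versus calling `sess.run(c)` inside a `tf.Session`.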
I'm implementing a distributed version of the Barnes-Hut n-body simulation in Chapel. I've already implemented the sequential and shared memory versions which are available on my GitHub.
I'm following the algorithm outlined here (Chapter 7):
Perform orthogonal recursive bisection (ORB) and distribute bodies so that each process has an equal amount of work
Construct locally essential tree on each process
Compute forces and advance bodies
I have a pretty good idea of how to implement the algorithm in C/MPI, using MPI_Allreduce for the bisection and simple message passing for communication between processes (for body transfer). MPI_Comm_split is also a very handy function that lets me split the processes at each step of ORB.
I'm having some trouble performing ORB using the parallel/distributed constructs that Chapel provides. I would need some way to sum (reduce) work across processes (locales in Chapel), to split processes into groups, and to have process-to-process communication for transferring bodies.
I would be grateful for any advice on how to implement this in Chapel. If another approach would be better for Chapel that would also be great.
After a lot of deadlocks and crashes I did manage to implement the algorithm in Chapel. It can be found here: https://github.com/novoselrok/parallel-algorithms/tree/75312981c4514b964d5efc59a45e5eb1b8bc41a6/nbody-bh/dm-chapel
I was not able to use much of the fancy parallel machinery Chapel provides. I relied only on block-distributed arrays with sync elements. I also replicated the SPMD model.
In main.chpl I set up all of the necessary arrays that will be used to transfer data. Each array has a corresponding sync array used for synchronization. Then each worker is started with its share of bodies and the previously mentioned arrays. Worker.chpl contains the bulk of the algorithm.
I replaced the MPI_Comm_split functionality with a custom function determineGroupPartners where I do the same thing manually. As for the MPI_Allreduce I found a nice little pattern I could use everywhere:
var localeSpace = {0..#numLocales};
var localeDomain = localeSpace dmapped Block(boundingBox=localeSpace);

var arr: [localeDomain] SomeType;
var arr$: [localeDomain] sync int; // stores the ranks of intended receivers
var rank = here.id;

for i in 0..#numLocales-1 {
    var intendedReceiver = (rank + i + 1) % numLocales;
    var partner = ((rank - (i + 1)) + numLocales) % numLocales;

    // Wait until the previous value is read
    if (arr$[rank].isFull) {
        arr$[rank];
    }

    // Store my value
    arr[rank] = valueIWantToSend;
    arr$[rank] = intendedReceiver;

    // Am I the intended receiver?
    while (arr$[partner].readFF() != rank) { }

    // Read partner value
    var partnerValue = arr[partner];
    // Do something with partner value

    arr$[partner]; // empty
    // Reset write, which blocks until arr$[rank] is empty
    arr$[rank] = -1;
}
This is a somewhat complicated way of implementing FIFO channels (see Julia RemoteChannel, where I got the inspiration for this "pattern").
Overview:
First, each locale calculates its intended receiver and its partner (the locale it will read a value from)
The locale checks whether its previous value has been read
The locale stores a new value and "locks" it by setting arr$[rank] to the intended receiver
The locale waits until its partner stores its value and sets the appropriate intended receiver
Once the locale is the intended receiver, it reads the partner's value and does some operation on it
Then the locale empties/unlocks the value by reading arr$[partner]
Finally, it resets its arr$[rank] by writing -1. This way we also ensure that the value set by a locale was read by the intended receiver
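The ring-exchange idea in the steps above can be sketched in plain Python as a toy analogy (this is not the Chapel code: threads stand in for locales, and single-slot `queue.Queue` mailboxes stand in for the sync arrays; one mailbox per step sidesteps the locking/tagging that the Chapel version needs):

```python
# Toy analogy of the all-to-all ring exchange (threads = "locales").
import threading
import queue

NUM = 4  # number of "locales"
# One single-slot mailbox per (rank, step) so a slot is never reused.
boxes = [[queue.Queue(maxsize=1) for _ in range(NUM - 1)] for _ in range(NUM)]
totals = [0] * NUM

def worker(rank, value):
    total = value
    for i in range(NUM - 1):
        partner = (rank - (i + 1)) % NUM   # whose value I read this step
        boxes[rank][i].put(value)          # publish my value for my receiver
        total += boxes[partner][i].get()   # blocks until partner publishes
    totals[rank] = total                   # every rank ends with the full sum

threads = [threading.Thread(target=worker, args=(r, r + 1)) for r in range(NUM)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(totals)  # every rank holds 1 + 2 + 3 + 4 = 10, like an allreduce
```

At step i, rank r's value is read by exactly rank (r + i + 1) % NUM, so after NUM - 1 steps every rank has combined every other rank's value, which is the allreduce behavior the Chapel pattern implements.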
I realize that this might be an overly complicated solution for this problem. There probably exists a better algorithm that fits Chapel's global view of parallel computation. The algorithm I implemented lends itself to the SPMD model of computation.
That being said, I think that Chapel still does a good job performance-wise. Here are the performance benchmarks against Julia and C/MPI. As the number of processes grows the performance improves by quite a lot. I didn't have a chance to run the benchmark on a cluster with >4 nodes, but I think Chapel will end up with respectable benchmarks.
This is a pretty simple question. I'm practicing using MPI in Fortran, and as far as I can tell, the sending and receiving sources must be arrays, like this:
do 20 i = 1, 25
    final_sum(1) = final_sum(1) + random_numbers(i)
20 continue

print *, 'process final sum: ', final_sum(1)
call MPI_GATHER(final_sum, 1, MPI_INTEGER, sum_recv_buff, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierror)
Here, I'm using the array final_sum to hold a single value. Is there a better way to do this? This is my first time using Fortran, and since I've already done some practice in C, I was trying out Fortran to see the differences and similarities.
High Performance Mark is right. There is actually normally no type checking when calling MPI routines that work with buffers. The MPI routines make use of the Fortran feature that enables calling procedures with an implicit interface. At the machine level, just a pointer is passed (it may be a pointer to a temporary copy!). That means you can use scalars or arrays without any problem. Just use a count of 1 and the correct MPI type, and you can send and receive scalars.
MPI derived types are a feature that enables you to work with even more complicated pieces of data.
I'm trying to create a 2D array where accessing an index returns the value. However, if an undefined index is accessed, it calls a callback, fills the index with that value, and then returns the value.
The array will have negative indexes too, but I can work around that by using four arrays (one for each quadrant around 0,0).
You can create a Matrix class that relies on tuples and a dictionary, with the following behavior:
from collections import namedtuple

MatrixEntry = namedtuple("MatrixEntry", ["x", "y"])
matrix = dict()
defaultValue = 0

# add entry at (0, 1)
matrix[MatrixEntry(0, 1)] = 10.0

# get value at (0, 1), falling back to the default for missing keys
key = MatrixEntry(0, 1)
value = matrix.get(key, defaultValue)
Cheers
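The compute-on-first-access behavior the question describes can also be sketched with a dict subclass and `__missing__` (the `LazyMatrix` name and the multiply callback here are made up for illustration):

```python
# Sketch: a sparse 2-D grid whose unknown cells are computed once, then cached.
class LazyMatrix(dict):
    def __init__(self, callback):
        super().__init__()
        self.callback = callback  # called for cells not yet filled

    def __missing__(self, key):
        # Runs only when key is absent: compute, store, return.
        x, y = key
        self[key] = self.callback(x, y)
        return self[key]

m = LazyMatrix(lambda x, y: x * y)
print(m[3, -4])   # computed on first access: -12
print(m[3, -4])   # cached now; the callback is not called again
```

Since the keys are plain tuples, negative indexes work for free, so the four-quadrant workaround from the question is unnecessary with this approach.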
This question is probably too broad for Stack Overflow. There is no generic "one size fits all" solution for this, and the results depend a lot on the language used (and its standard library).
There are several problems in this question. First of all, let us consider a 2D array and assume it is simply already part of the language, and that such an array grows dynamically on access. If this isn't the case, the question becomes really language-dependent.
Now, when allocating memory, the language often automatically initializes the slots (again, how this happens and what the best method is are language-dependent; look into RAII). However, I can foresee that the actual calculation of a specific cell might be costly (compared to allocation). In that case, an interesting technique is so-called "two-phase construction". The array is filled with tuples/objects whose default construction sets a bit/boolean to false, indicating that the value is not ready. Then, on access (i.e., a get() method or an operator(), language-dependent), if this bit is false it constructs; otherwise it just reads.
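A minimal sketch of the two-phase construction idea, in Python (the `Cell`, `make_grid`, and `get` names are illustrative, not from any library):

```python
# Phase 1: allocate cheap placeholder cells.
# Phase 2: run the expensive computation only on first access.
class Cell:
    __slots__ = ("ready", "value")
    def __init__(self):
        self.ready = False   # "not constructed yet" bit
        self.value = None

def make_grid(width, height):
    return [[Cell() for _ in range(width)] for _ in range(height)]

def get(grid, x, y, compute):
    cell = grid[y][x]
    if not cell.ready:       # construct on first access
        cell.value = compute(x, y)
        cell.ready = True
    return cell.value

grid = make_grid(4, 4)
print(get(grid, 2, 3, lambda x, y: x + y))  # computed now: 5
```

Subsequent calls to `get` for the same cell skip the computation entirely, which is the whole point when construction is expensive.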
Another method is to use a dictionary/key-value map, where the key is the coordinates and the value is the value. This has the advantage that construct-on-access is inherent to the data structure (though, again, language-dependent). The drawback of using maps, however, is that the lookup speed changes from O(1) to O(log n). (The actual time varies widely depending on the language.)
Finally, I hope you understand that how to do this depends on your more specific requirements, the language you use, and other libraries. In the end, there is only a single data structure present in every language: a long sequence of unallocated values. Anything more advanced than that depends on the language.
I'm studying purely functional languages and currently thinking about some immutable data implementations.
Here is a pseudo code.
List a = [1 .. 10000]
List b = NewListWithoutLastElement a
b
When evaluating b, b must be copied in an eager/strict implementation of immutable data.
But in this case, a is not used anywhere anymore, so the memory of a can safely be reused to avoid the copying cost.
Furthermore, the programmer could force the compiler to always do this by marking the type List with some keyword meaning must-be-disposed-after-use, which would raise a compile-time error for code that cannot avoid the copying cost.
This could yield a huge performance gain, because it could be applied to huge object graphs too.
What do you think? Are there any implementations?
This would be possible, but severely limited in scope. Keep in mind that the vast majority of complex values in a functional program will be passed to many functions to extract various properties from them - and, most of the time, those functions are themselves arguments to other functions, which means you cannot make any assumptions about them.
For example:
let map2 f g x = f x, g x

let apply f =
    let a = [1 .. 10000]
    f a

// in another file:
apply (map2 NewListWithoutLastElement NewListWithoutFirstElement)
This is fairly standard in functional code, and there is no way to place a must-be-disposed-after-using attribute on a because no specific location has enough knowledge about the rest of the program. Of course, you could try adding that information to the type system, but type inference on this is decidedly non-trivial (not to mention that types would grow quite large).
Things get even worse when you have compound objects, such as trees, that might share sub-elements between values. Consider this:
let a = binary_tree [ 1; 2; 5; 7; 9 ]
let result_1 = complex_computation_1 (insert a 6)
let result_2 = complex_computation_2 (remove a 5)
In order to allow memory reuse within complex_computation_2, you would need to prove that complex_computation_1 does not alter a, does not store any part of a within result_1, and has finished using a by the time complex_computation_2 starts working. While the first two requirements might seem the hardest, keep in mind that this is a pure functional language: the third requirement actually causes a massive performance drop, because complex_computation_1 and complex_computation_2 can no longer run on different threads!
In practice, this is not an issue in the vast majority of functional languages, for three reasons:
They have a garbage collector built specifically for this. It is faster for them to just allocate new memory and reclaim the abandoned one, rather than try to reuse existing memory. In the vast majority of cases, this will be fast enough.
They have data structures that already implement data sharing. For instance, NewListWithoutFirstElement already provides full reuse of the memory of the transformed list without any effort. It's fairly common for functional programmers (and any kind of programmers, really) to determine their use of data structures based on performance considerations, and rewriting a "remove last" algorithm as a "remove first" algorithm is kind of easy.
Lazy evaluation already does something equivalent: a lazy list's tail is initially just a closure that can evaluate the tail if you need to—so there's no memory to be reused. On the other hand, this means that reading an element from b in your example would read one element from a, determine if it's the last, and return it without really requiring storage (a cons cell would probably be allocated somewhere in there, but this happens all the time in functional programming languages and short-lived small objects are perfectly fine with the GC).
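Point 2, data structures that already share memory, can be shown with a tiny cons list (a Python sketch; the `Cons` and `without_first` names are invented for illustration):

```python
# Structural sharing: "remove first" on a cons list is O(1) and reuses
# every cell of the original list instead of copying.
class Cons:
    __slots__ = ("head", "tail")
    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail

def without_first(lst):
    # No copying at all: the result *is* the old tail.
    return lst.tail

a = Cons(1, Cons(2, Cons(3)))
b = without_first(a)
print(b.head)       # 2
print(b is a.tail)  # True -- b shares all of its cells with a
```

This is why rewriting a "remove last" algorithm as a "remove first" one matters: the latter gets full memory reuse for free, which is exactly the optimization the question asks the compiler to perform.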
Does anyone know the answer and/or have an opinion about this?
Since tuples would normally not be very large, I would assume it would make more sense to use structs than classes for these. What say you?
Microsoft made all tuple types reference types in the interests of simplicity.
I personally think this was a mistake. Tuples with more than 4 fields are very unusual and should be replaced with a more typeful alternative anyway (such as a record type in F#) so only small tuples are of practical interest. My own benchmarks showed that unboxed tuples up to 512 bytes could still be faster than boxed tuples.
Although memory efficiency is one concern, I believe the dominant issue is the overhead of the .NET garbage collector. Allocation and collection are very expensive on .NET because its garbage collector has not been very heavily optimized (e.g. compared to the JVM). Moreover, the default .NET GC (workstation) has not yet been parallelized. Consequently, parallel programs that use tuples grind to a halt as all cores contend for the shared garbage collector, destroying scalability. This is not only the dominant concern but, AFAIK, was completely neglected by Microsoft when they examined this problem.
Another concern is virtual dispatch. Reference types support subtypes and, therefore, their members are typically invoked via virtual dispatch. In contrast, value types cannot support subtypes so member invocation is entirely unambiguous and can always be performed as a direct function call. Virtual dispatch is hugely expensive on modern hardware because the CPU cannot predict where the program counter will end up. The JVM goes to great lengths to optimize virtual dispatch but .NET does not. However, .NET does provide an escape from virtual dispatch in the form of value types. So representing tuples as value types could, again, have dramatically improved performance here. For example, calling GetHashCode on a 2-tuple a million times takes 0.17s but calling it on an equivalent struct takes only 0.008s, i.e. the value type is 20× faster than the reference type.
A real situation where these performance problems with tuples commonly arise is the use of tuples as keys in dictionaries. I actually stumbled upon this thread by following a link from the Stack Overflow question "F# runs my algorithm slower than Python!", where the author's F# program turned out to be slower than his Python version precisely because he was using boxed tuples. Manually unboxing with a hand-written struct type makes his F# program several times faster, and faster than Python. These issues would never have arisen if tuples had been represented by value types rather than reference types to begin with...
The reason is most likely that only the smaller tuples would make sense as value types, since they would have a small memory footprint. The larger tuples (i.e., the ones with more properties) would actually suffer in performance, since they would be larger than 16 bytes.
Rather than have some tuples be value types and others be reference types, forcing developers to know which are which, I imagine the folks at Microsoft thought making them all reference types was simpler.
Ah, suspicions confirmed! Please see Building Tuple:
The first major decision was whether to treat tuples either as a reference or value type. Since they are immutable, any time you want to change the values of a tuple, you have to create a new one. If they are reference types, this means there can be lots of garbage generated if you are changing elements in a tuple in a tight loop. F# tuples were reference types, but there was a feeling from the team that they could realize a performance improvement if two, and perhaps three, element tuples were value types instead. Some teams that had created internal tuples had used value instead of reference types, because their scenarios were very sensitive to creating lots of managed objects. They found that using a value type gave them better performance.

In our first draft of the tuple specification, we kept the two-, three-, and four-element tuples as value types, with the rest being reference types. However, during a design meeting that included representatives from other languages it was decided that this "split" design would be confusing, due to the slightly different semantics between the two types. Consistency in behavior and design was determined to be of higher priority than potential performance increases. Based on this input, we changed the design so that all tuples are reference types, although we asked the F# team to do some performance investigation to see if it experienced a speedup when using a value type for some sizes of tuples. It had a good way to test this, since its compiler, written in F#, was a good example of a large program that used tuples in a variety of scenarios. In the end, the F# team found that it did not get a performance improvement when some tuples were value types instead of reference types. This made us feel better about our decision to use reference types for tuple.
If the .NET System.Tuple<...> types were defined as structs, they would not be scalable. For instance, a ternary tuple of long integers currently scales as follows:
type Tuple3 = System.Tuple<int64, int64, int64>
type Tuple33 = System.Tuple<Tuple3, Tuple3, Tuple3>
sizeof<Tuple3> // Gets 4
sizeof<Tuple33> // Gets 4
If the ternary tuple were defined as a struct, the result would be as follows (based on a test example I implemented):
sizeof<Tuple3> // Would get 32
sizeof<Tuple33> // Would get 104
As tuples have built-in syntactic support in F# and are used extremely often in this language, "struct" tuples would put F# programmers at risk of writing inefficient programs without even being aware of it. It would happen very easily:
let t3 = 1L, 2L, 3L
let t33 = t3, t3, t3
In my opinion, "struct" tuples would carry a high probability of creating significant inefficiencies in everyday programming. On the other hand, the currently existing "class" tuples also cause certain inefficiencies, as mentioned by Jon. However, I think the product of "occurrence probability" times "potential damage" would be much higher with structs than it currently is with classes. Therefore, the current implementation is the lesser evil.
Ideally, there would be both "class" tuples and "struct" tuples, both with syntactic support in F#!
Edit (2017-10-07)
Struct tuples are now fully supported as follows:
Built into mscorlib (.NET >= 4.7) as System.ValueTuple
Available as NuGet for other versions
Syntactic support in C# >= 7
Syntactic support in F# >= 4.1
For 2-tuples, you can still always use the KeyValuePair<TKey,TValue> from earlier versions of the Common Type System. It's a value type.
A minor clarification to the Matt Ellis article would be that the difference in use semantics between reference and value types is only "slight" when immutability is in effect (which, of course, would be the case here). Nevertheless, I think it would have been best in the BCL design not to introduce the confusion of having Tuple cross over to a reference type at some threshold.
I don't know, but if you have ever used F#, tuples are part of the language. If I made a .dll and returned a tuple, it would be nice to have a standard type to put it in. I suspect that now that F# is part of the platform (.NET 4), some modifications to the CLR were made to accommodate some common structures in F#.
From http://en.wikibooks.org/wiki/F_Sharp_Programming/Tuples_and_Records
let scalarMultiply (s : float) (a, b, c) = (a * s, b * s, c * s);;
val scalarMultiply : float -> float * float * float -> float * float * float
scalarMultiply 5.0 (6.0, 10.0, 20.0);;
val it : float * float * float = (30.0, 50.0, 100.0)