Parallelizing code (attempted using multiprocessing) - parallel-processing

I am new to Python and have a parallelization question.
I am filling up matrices A and B using the function fill_mat() as follows:
for ib in range(par.nB):
    # Run function for each ib
    A[:, :, ib], B[:, :, :, :, ib] = fill_mat(ib, param1, param2, pardict)
where ib, param1, and param2 are numbers and pardict is a dictionary. Since each of these iterations is very expensive, what is the best way for me to parallelize over ib?
I have tried using pool.apply from multiprocessing, but I think I am using it incorrectly: I have fill_mat print a sentence at the start of each call, and these sentences are not printed all at once for all nB cases. Here is what I do:
print("Number of processors: ", mp.cpu_count())
pool = mp.Pool(mp.cpu_count())
results = [pool.apply(fill_mat, args=(ib, param1, param2, pardict)) for b in range(par.nB)]
Thank you for any help you are able to give.

You've made a mistake in your iteration: you've used b where you should use ib. See the following example:
import multiprocessing

def fill_mat(ib, param1, param2, pardict):
    print(ib * ib)

pool = multiprocessing.Pool(multiprocessing.cpu_count())
results = [pool.apply(fill_mat, args=(ib, "param1", "param2", "pardict"))
           for ib in range(10)]
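As a side note, pool.apply blocks until each call has finished, so the calls above still run one after another, which is also why the print statements do not appear all at once. If you want the iterations to actually run in parallel, pool.starmap (or pool.apply_async) is the usual tool. Here is a rough sketch; the fill_mat body and the dummy arguments are placeholders, not the real computation:

import multiprocessing as mp

def fill_mat(ib, param1, param2, pardict):
    print("starting ib =", ib)  # printed from the worker processes
    return ib * ib              # placeholder for the real computation

if __name__ == "__main__":
    args = [(ib, "param1", "param2", {"dummy": 0}) for ib in range(10)]
    with mp.Pool(mp.cpu_count()) as pool:
        results = pool.starmap(fill_mat, args)  # dispatches the calls in parallel
    print(results)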

Related

Python: Printing vertically

The final code will print the distance between airports. I'm trying to print a menu with the names of the airports, numbered and listed vertically. I'm really struggling to find my mistakes.
This code doesn't raise any error; it just prints nothing.
airport_data = """
Alexandroupoli 40.855869°N 25.956264°E
Athens 37.936389°N 23.947222°E
Chania 35.531667°N 24.149722°E
Chios 38.343056°N 26.140556°E
Corfu 39.601944°N 19.911667°E"""
airports = []
import re
airport_data1 = re.sub("[°N#°E]", "", airport_data)
def process_airports(string):
    airports_temp = string.split()
    airports = [(airports_temp[x], float(airports_temp[x + 1]), float(airports_temp[x + 2])) for x in
                range(0, len(airports_temp), 3)]
    return airports
def menu():
    for airport_data in range(airport_data1):
        print(f'{airport_data + 1} {name[number]}')
My first guess is that your code prints nothing, without errors, because you never actually call process_airports() or menu().
You have to call them like this at the end of your script:
something = process_airports(airport_data1)
menu()
This will now raise some errors though. So let's address them.
The menu() function will raise an error because neither name nor number is defined, and because you are trying to apply range() to a string (airport_data1) instead of an integer.
First, the range error: you mixed two ideas in your for loop, iterating over the elements of your data and iterating over the indexes of those elements.
You have to choose one (we'll see later that you can do both at once); in this example, I choose to iterate over the indexes of the list.
Then, since neither name nor number exists anywhere, they will raise an error. You always need to define variables somewhere; in this case they are not needed at all, so let's just remove them:
def menu(data):
    for i in range(len(data)):
        print(f'{i + 1} {data[i]}')

processed_airports = process_airports(airport_data1)
menu(processed_airports)
where data is the output of process_airports().
Now for some general advice and improvements.
First, global variables.
Notice how you can access airport_data1 inside the menu() function just fine. While this works, it is not recommended; it is usually better to pass variables explicitly as arguments.
Notice how in the function I proposed above, every single variable is defined in the function itself; there is no information coming from a higher scope. Again, this is not mandatory, but it makes the code much easier to work with and understand.
airport_data = """
Alexandroupoli 40.855869°N 25.956264°E
Athens 37.936389°N 23.947222°E
Chania 35.531667°N 24.149722°E
Chios 38.343056°N 26.140556°E
Corfu 39.601944°N 19.911667°E"""
airports = []
import re
airport_data1 = re.sub("[°N#°E]", "", airport_data)

def process_airports(string):
    airports_temp = string.split()
    airports = [(airports_temp[x], float(airports_temp[x + 1]), float(airports_temp[x + 2])) for x in
                range(0, len(airports_temp), 3)]
    return airports

def menu(data):
    for i in range(len(data)):
        print(f'{i + 1} {data[i]}')

# I'm adding the call to the functions for clarity
data = process_airports(airport_data1)
menu(data)
The printed menu now looks like this:
1 ('Alexandroupoli', 40.855869, 25.956264)
2 ('Athens', 37.936389, 23.947222)
3 ('Chania', 35.531667, 24.149722)
4 ('Chios', 38.343056, 26.140556)
5 ('Corfu', 39.601944, 19.911667)
Second, and this is mostly FYI: you can access both the index of an iterable and the element itself by looping over enumerate(), meaning the following function prints exactly the same thing as the one using range(len(data)). This is handy when you need to work with both the element and its index.
def menu(data):
    for the_index, the_element in enumerate(data):
        print(f'{the_index + 1} {the_element}')
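As a small aside, enumerate() also accepts a start argument, so the + 1 can be folded into the call itself:

def menu(data):
    for the_index, the_element in enumerate(data, start=1):
        print(f'{the_index} {the_element}')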

How can I print the intermediate variables in the loss function in TensorFlow and Keras?

I'm writing a custom objective to train a Keras (with TensorFlow backend) model but I need to debug some intermediate computation. For simplicity, let's say I have:
def custom_loss(y_pred, y_true):
    diff = y_pred - y_true
    return K.square(diff)
I could not find an easy way to access, for example, the intermediate variable diff or its shape during training. In this simple example, I know that I could return diff to print its values, but my actual loss is more complex, and I can't return intermediate values without getting compilation errors.
Is there an easy way to debug intermediate variables in Keras?
This is not something that is solved in Keras as far as I know, so you have to resort to backend-specific functionality. Both Theano and TensorFlow have Print nodes that are identity nodes (i.e., they return the input node) and have the side-effect of printing the input (or some tensor of the input).
Example for Theano:
diff = y_pred - y_true
diff = theano.printing.Print('shape of diff', attrs=['shape'])(diff)
return K.square(diff)
Example for TensorFlow:
diff = y_pred - y_true
diff = tf.Print(diff, [tf.shape(diff)])
return K.square(diff)
Note that this only works for intermediate values. Keras expects tensors that are passed to other layers to have specific attributes such as _keras_shape. Values processed by the backend, i.e. through Print, usually do not have that attribute. To solve this, you can wrap debug statements in a Lambda layer for example.
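For example, here is a minimal sketch of the Lambda-wrapping idea, assuming the TF 1.x graph API where tf.Print is an identity op; the layer sizes and names are just for illustration:

import tensorflow as tf
from tensorflow.keras.layers import Dense, Input, Lambda
from tensorflow.keras.models import Model

# Identity "debug" layer: passes its input through unchanged and prints the
# tensor's shape as a side effect when the graph runs.
print_shape = Lambda(lambda t: tf.Print(t, [tf.shape(t)], message='hidden shape: '))

inputs = Input(shape=(3,))
hidden = Dense(2)(inputs)
hidden = print_shape(hidden)  # safe to pass on to further Keras layers
outputs = Dense(1)(hidden)
model = Model(inputs, outputs)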
In TensorFlow 2, you can now add IDE breakpoints in the TensorFlow Keras models/layers/losses, including when using the fit, evaluate, and predict methods. However, you must add model.run_eagerly = True after calling model.compile() for the values of the tensor to be available in the debugger at the breakpoint. For example,
import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

def custom_loss(y_pred, y_true):
    diff = y_pred - y_true
    return tf.keras.backend.square(diff)  # Breakpoint in IDE here. =====

class SimpleModel(Model):
    def __init__(self):
        super().__init__()
        self.dense0 = Dense(2)
        self.dense1 = Dense(1)

    def call(self, inputs):
        z = self.dense0(inputs)
        z = self.dense1(z)
        return z

x = tf.convert_to_tensor([[1, 2, 3], [4, 5, 6]], dtype=tf.float32)
y = tf.convert_to_tensor([0, 1], dtype=tf.float32)

model0 = SimpleModel()
model0.run_eagerly = True
model0.compile(optimizer=Adam(), loss=custom_loss)
y0 = model0.fit(x, y, epochs=1)  # Values of diff *not* shown at breakpoint. =====

model1 = SimpleModel()
model1.compile(optimizer=Adam(), loss=custom_loss)
model1.run_eagerly = True
y1 = model1.fit(x, y, epochs=1)  # Values of diff shown at breakpoint. =====
This also works for debugging the outputs of intermediate network layers (for example, adding the breakpoint in the call of the SimpleModel).
Note: this was tested in TensorFlow 2.0.0-rc0.
In TensorFlow 2.0, you can use tf.print to print anything inside the definition of your loss function. You can also do something like tf.print("my_intermediate_tensor =", my_intermediate_tensor), i.e. with a message, similar to Python's print. However, you may need to decorate your loss function with @tf.function to actually see the results of the tf.print.
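A minimal sketch of that idea; the loss body, message text, and sample tensors below are placeholders for illustration only:

import tensorflow as tf

@tf.function
def custom_loss(y_true, y_pred):
    diff = y_pred - y_true
    tf.print("diff =", diff, "shape of diff =", tf.shape(diff))  # printed each time the loss runs
    return tf.reduce_mean(tf.square(diff))

# Direct call just to show the printed output; in practice you would pass
# custom_loss to model.compile(loss=custom_loss).
custom_loss(tf.constant([[1.0, 2.0]]), tf.constant([[1.5, 1.0]]))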

Ruby: evaluate string with dynamic binding of variables

I have a database of "formulas" stored as strings. Let's assume for simplicity that each formula contains two variables, denoted by a and b, and that the formulas are all well-formed and consist only of characters from the set ()ab+-*.
At runtime, formulas are fetched from this database, and from another source, numeric values for a and b are fetched, and the formulas are evaluated. The evaluation can be programmed like this:
# This is how it works right now
formula = fetch_formula(....)
a = fetch_left_arg(....)
b = fetch_right_arg(....)
result = eval(formula)
This design works, but I'm not entirely happy with it. It requires that my program names the free variables exactly the same as they are named in the formula, which is ugly.
If my "formula" would not be a string, but a Proc object or Lambda which accepts two parameters, I could do something like
# No explicitly named variables
result = fetch_proc(...).call(fetch_left_arg(....),fetch_right_arg(....))
but unfortunately, the formulas have to be strings.
I tried to experiment in the following way: what if the method that fetches the formula from the database wrapped the string in something that behaves like a block, to which I could pass parameters?
# This does not work of course, but maybe you get the idea:
block_string = "|a,b| #{fetch_formula(....)}"
Of course I can't eval such a block_string, but is there something similar which I could use? I know that instance_eval can pass parameters, but what object should I apply it to? So this is perhaps not an option either....
This is a very nasty approach, but for the simple formulas you've mentioned it should work:
▶ formula = 'a + b'
▶ vars = formula.scan(/[a-z]+/).uniq.join(',') # getting vars names
#⇒ "a,b"
▶ pr = eval("proc { |#{vars}| #{formula} }") # preparing proc
▶ pr.call 3, 5
#⇒ 8
Here we rely on the fact that parameters are passed to the proc in the same order as they appear in the formula.
If I understand your question correctly, this is something I have done recently, and it is fairly easy. Given a string:
s = "{|x, y| x + y}"
You can create a proc by doing:
eval("Proc.new#{s}")
One way to avoid creating the variables in the local scope could be to use a Binding:
bind = binding
formula = fetch_formula(....)
bind.local_variable_set :a, fetch_left_arg(....)
bind.local_variable_set :b, fetch_right_arg(....)
result = bind.eval(formula)
The variables a and b now only exist in the binding, and do not pollute the rest of your code.
You can create a lambda from a string, as shown below:
formula = "a + b"
lambda_template = "->(a,b) { %s }"
formula_lambda = eval(lambda_template % formula)
p formula_lambda.call(1,2)
#=> 3

Populating a list in Scala with random doubles takes forever

I am new to Scala and am trying to get a list of random double values:
The thing is, when I try to run the following, it takes way too long compared to its Java counterpart. Any ideas on why this is, or a suggestion for a more efficient approach?
def random: Double = java.lang.Math.random()
var f = List(0.0)
for (i <- 1 to 200000)
  f = f ::: List(random * 100)
f = f.tail
You can also achieve it like this:
List.fill(200000)(math.random)
the same goes for e.g. Array ...
Array.fill(200000)(math.random)
etc ...
You could construct an infinite stream of random doubles:
def randomList(): Stream[Double] = Stream.cons(math.random, randomList)
val f = randomList().take(200000)
This will leverage lazy evaluation so you won't calculate a value until you actually need it. Even evaluating all 200,000 will be fast though. As an added bonus, f no longer needs to be a var.
Another possibility is:
val it = Iterator.continually(math.random)
it.take(200000).toList
Stream also has a continually method if you prefer.
First of all, it is not taking longer than Java because there is no Java counterpart: Java does not have an immutable list. If it did, performance would be about the same.
Second, it's taking a lot of time because appending to a list has linear performance, so the whole thing has quadratic performance.
Instead of appending, prepend, which has constant performance.
If you're using mutable state anyway, you should use a mutable collection like ListBuffer, which you can add to with += (which would then be the real counterpart to the Java code).
But why don't you use a for comprehension?
val f = for (_ <- 1 to 200000) yield (math.random * 100)
By the way, var f = List(0.0) ... f = f.tail can be replaced by var f: List[Double] = Nil in your example (no more performance, but more beauty ;)).
Yet more options! Tail recursion:
def randlist(n: Int, part: List[Double] = Nil): List[Double] = {
  if (n <= 0) part
  else randlist(n - 1, 100 * random :: part)
}
or mapped ranges:
(1 to 200000).map(_ => 100*random).toList
Looks like you want to use Vector instead of List. List has O(1) prepend, while Vector has (effectively) O(1) append. Since you are appending (via concatenation), it'll be faster to use Vector:
def random: Double = java.lang.Math.random()
var f: Vector[Double] = Vector()
for (i <- 1 to 200000)
  f = f :+ (random * 100)
Got it?

Is Odersky serious with "bills !*&^%~ code!"?

In his book Programming in Scala (Chapter 5, Section 5.9, p. 93),
Odersky mentions the expression "bills !*&^%~ code!"
In the footnote on the same page:
"By now you should be able to figure out that given this code, the Scala compiler would invoke (bills.!*&^%~(code)).!()."
That's a bit too cryptic for me; could someone explain what's going on here?
What Odersky means to say is that it would be possible to have valid code looking like that. For instance, the code below:
class BadCode(whose: String, source: String) {
  def ! = println(whose + ", what the hell do you mean by '" + source + "'???")
}

class Programmer(who: String) {
  def !*&^%~(source: String) = new BadCode(who, source)
}

val bills = new Programmer("Bill")
val code = "def !*&^%~(source: String) = new BadCode(who, source)"

bills !*&^%~ code!
Just copy and paste it into the REPL.
The period is optional for calling a method that takes a single parameter, or has an empty parameter list.
When this feature is utilized, the next chunk after the space following the method name is assumed to be the single parameter.
Therefore,
(bills.!*&^%~(code)).!()
is identical to
bills !*&^%~ code!
The second exclamation mark calls a method on the returned value from the first method call.
I'm not sure if the book provides method signatures, but I assume it's just a comment on Scala's syntactic sugar, so that if you type:
bill add monkey
where there is an object bill which has a method add which takes a parameter then it automatically interprets it as:
bill.add(monkey)
Being a little rusty with Scala, I'm not entirely sure how it splits code! into (code).!(), except for a vague tickling of the grey cells that the ! operator is used to fire off an actor, which in compiler terms might be interpreted as an implicit .!() method on the object.
The combination of the '.' and '()' being optional for method calls (as Wysawyg explained above) and the ability to use (almost) whatever characters you like for naming methods makes it possible to write methods in Scala that look like operator overloading. You can even invent your own operators.
For example, I have a program that deals with 3D computer graphics. I have my own class Vector for representing a 3D vector:
class Vector(val x: Double, val y: Double, val z: Double) {
  def +(v: Vector) = new Vector(x + v.x, y + v.y, z + v.z)
  // ...etc.
}
I've also defined a method ** (not shown above) to compute the cross product of two vectors. It's very convenient that you can create your own operators like that in Scala; not many other programming languages have this flexibility.
