Geopandas touches method on geoseries does not work as I expect - geopandas

I have two GeoSeries in the same CRS. I want to extract from geoseries_1 all the polygons touching any polygon of geoseries_2. The documentation says that GeoSeries are tested element-wise, so I do:
geoseries_1.touches(geoseries_2)
but the output is
0 False
1 False
2 False
...
569 False
597 False
598 False
Length: 599, dtype: bool
but I know some of the polygons of geoseries_1 are actually touching some polygons in geoseries_2 and if I do for example:
geoseries_1.touches(geoseries_2.geometry.iloc[0]), the result is:
0 True
1 True
2 False
...
569 False
597 True
598 False
Length: 599, dtype: bool
Is this the expected output? Am I misinterpreting the documentation?
Thanks in advance!

Yes, this is the expected (but sometimes surprising) behaviour: if you pass another GeoSeries as argument, the 'touches' operation is done element-wise after aligning both series on their index (so the first of geoseries_1 with the first of geoseries_2, the second of geoseries_1 with the second of geoseries_2, ...).
So it does not do the "for each element in geoseries_1, check every element of geoseries_2" behaviour. That is more like a spatial join. But, unfortunately, GeoPandas (at the time of writing) does not support the 'touches' spatial relationship in its sjoin function.
So what is the solution? This depends on the desired output: do you want to repeat the rows that have multiple matches? Or do you just want to have the list of touching polygons?
BTW: I recently opened an issue on GitHub to propose disabling this automatic alignment (so that at least the above would give an error if geoseries_1 and geoseries_2 don't have the same length and index): https://github.com/geopandas/geopandas/issues/750
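A minimal sketch of both outputs, assuming GeoPandas and Shapely are installed; the toy squares are made up for illustration. Passing a single Shapely geometry (rather than a GeoSeries) broadcasts it against every element, just like the geoseries_2.geometry.iloc[0] call in the question, so checking each polygon of geoseries_1 against the whole of geoseries_2 gives a "touches any" mask. The nested loop instead yields one (index_1, index_2) pair per touching combination, which corresponds to the "repeat the rows" variant.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical toy data: axis-aligned unit squares.
geoseries_1 = gpd.GeoSeries([box(0, 0, 1, 1), box(5, 5, 6, 6), box(2, 0, 3, 1)])
geoseries_2 = gpd.GeoSeries([box(1, 0, 2, 1), box(10, 10, 11, 11)])

# Option 1: keep each polygon of geoseries_1 that touches ANY polygon of geoseries_2.
# geoseries_2.touches(g) compares the single geometry g with every element of geoseries_2.
mask = [bool(geoseries_2.touches(g).any()) for g in geoseries_1]
touching = geoseries_1[mask]

# Option 2: one (index in geoseries_1, index in geoseries_2) pair per touching
# combination, so a polygon with several matches appears several times.
pairs = [
    (i, j)
    for i, g1 in geoseries_1.items()
    for j, g2 in geoseries_2.items()
    if g1.touches(g2)
]

print(list(touching.index))  # [0, 2]
print(pairs)                 # [(0, 0), (2, 0)]
```

Both variants evaluate every pair of geometries, which is fine for a few hundred polygons like in the question but grows quadratically with larger inputs.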

Related

Julia type instability: Array of LinearInterpolations

I am trying to improve the performance of my code by removing any sources of type instability.
For example, I have several instances of Array{Any} declarations, which I know generally destroy performance. Here is a minimal example (greatly simplified compared to my code) of a 2D Array of LinearInterpolation objects, i.e.:
using Interpolations
n,m=5,5
abstract_arr=Array{Any}(undef,n+1,m+1)
arr_x=LinRange(1,10,100)
for l in 1:n
    for alpha in 1:m
        abstract_arr[l,alpha]=LinearInterpolation(arr_x,alpha.*arr_x.^n)
    end
end
so that typeof(abstract_arr) gives Array{Any,2}.
How can I initialize abstract_arr to avoid using Array{Any} here?
And how can I do this in general for Arrays whose entries are structures such as Dicts (e.g. dictionaries of 2-tuples of Float64)?
If you make a comprehension, the type will be figured out for you:
arr = [LinearInterpolation(arr_x, alpha.*arr_x.^n) for l in 1:n, alpha in 1:m]
isconcretetype(eltype(arr)) # true
When it can predict the type & length, it will make the right array the first time. When it cannot, it will widen or extend it as necessary. So probably some of these will be Vector{Int}, and some Vector{Union{Nothing, Int}}:
[rand()>0.8 ? nothing : 0 for i in 1:3]
[rand()>0.8 ? nothing : 0 for i in 1:3]
[rand()>0.8 ? nothing : 0 for i in 1:10]
The main trick is that you just need to know the type of the object that is returned by LinearInterpolation, and then you can specify that instead of Any when constructing the array. To determine that, let's look at the type of one of these objects:
julia> typeof(LinearInterpolation(arr_x,arr_x.^2))
Interpolations.Extrapolation{Float64, 1, ScaledInterpolation{Float64, 1, Interpolations.BSplineInterpolation{Float64, 1, Vector{Float64}, BSpline{Linear{Throw{OnGrid}}}, Tuple{Base.OneTo{Int64}}}, BSpline{Linear{Throw{OnGrid}}}, Tuple{LinRange{Float64}}}, BSpline{Linear{Throw{OnGrid}}}, Throw{Nothing}}
This gives a fairly complicated type, but we don't necessarily need to use the whole thing (though in some cases it might be more efficient to). So for instance, we can say
using Interpolations
n,m=5,5
abstract_arr=Array{Interpolations.Extrapolation}(undef,n+1,m+1)
arr_x=LinRange(1,10,100)
for l in 1:n
    for alpha in 1:m
        abstract_arr[l,alpha]=LinearInterpolation(arr_x,alpha.*arr_x.^n)
    end
end
which gives us a result of type
julia> typeof(abstract_arr)
Matrix{Interpolations.Extrapolation} (alias for Array{Interpolations.Extrapolation, 2})
Since the return type of this LinearInterpolation does not seem to be of known size, and
julia> isbitstype(typeof(LinearInterpolation(arr_x,arr_x.^2)))
false
each assignment to this array will still trigger allocations, and consequently there actually may not be much or any performance gain from the added type stability when it comes to filling the array. Nonetheless, there may still be performance gains down the line when it comes to using values stored in this array (depending on what is subsequently done with them).

Microsoft technical interview: Matrix Algorithm

I recently had an interview in which the interviewer gave me some pseudocode and asked questions related to it. Unfortunately, I was not able to answer his questions due to lack of preparation. Due to time constraints, I could not ask him for the solution. I would really appreciate it if someone could guide me and help me understand the problem so I can improve for the future. Below is the pseudocode:
A sample state of ‘a’:
[[ 2, NULL, 2, NULL],
[ 2, NULL, 2, NULL],
[NULL, NULL, NULL, NULL],
[NULL, NULL, NULL, NULL]]
FUNCTION foo()
    FOR y = 0 to 3
        FOR x = 0 to 3
            IF a[x+1][y] != NULL
                IF a[x+1][y] = a[x][y]:
                    a[x][y] := a[x][y]*2
                    a[x+1][y] := NULL
                END IF
                IF a[x][y] = NULL
                    a[x][y] := a[x+1][y]
                    a[x+1][y] := NULL
                END IF
            END IF
        END FOR
    END FOR
END FUNCTION
The interviewer asked me:
What is the issue with the above code and how would I fix it?
Once corrected, what does function foo do? Please focus on the result of the function, not the details of the implementation.
How could you make foo more generic? Explain up to three possible generalization directions and describe a strategy for each, no need to write the code!
I mentioned to him:
The state of the matrix looks incorrect because an integer matrix cannot have null values. By default they are assigned 0, false for Boolean and null for the reference type.
Another issue with the above code is at IF a[x+1][y] != NULL, the condition will produce an array index out-of-bounds error when x equals 3.
But I felt the interviewer was looking for something else in my answer and was not satisfied with the explanation.
Have you played the game "2048"? If not, this question will likely not make much intuitive sense to you, and because of that, I think it's a poor interview question.
What this attempts to do is simulate one step of the 2048 game where the numbers go upward. Numbers will move upward by one cell unless they hit another number or the matrix border (think of gravity pulling all numbers upward). If the two numbers are equal, they combine and produce a new number (their sum).
Note: this isn't exactly one step of the 2048 game because numbers only move one cell upward, while in the game they move "all the way" until they hit something else. To get (approximately) a step of the 2048 game, you'd repeat the given function until no more changes occur.
The issue in the code is, as you mentioned, the array index out-of-bounds. It should be fixed by iterating over x = 0 to 2 instead.
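A sketch in Python of the corrected function plus the "repeat until no more changes" idea from the note above, assuming None stands in for NULL and that a[x] is row x, as in the sample state. Having foo return whether anything changed is an addition for the loop, and repeat_until_stable is a hypothetical helper name. The second usage line shows why repetition only approximates the game: a column [2, 2, 4] collapses to 8, whereas a single upward move in 2048 gives [4, 4], because there a tile merges at most once per move.

```python
NULL = None  # stand-in for the pseudocode's NULL


def foo(a):
    """Corrected foo: one upward pass over grid a (a[x] is row x, a[x][y] is column y).

    Mutates a in place and returns True if any cell changed.
    """
    changed = False
    rows, cols = len(a), len(a[0])
    for y in range(cols):
        for x in range(rows - 1):  # the fix: stop one row early so a[x + 1] exists
            if a[x + 1][y] != NULL:
                if a[x + 1][y] == a[x][y]:  # equal neighbours merge upward
                    a[x][y] = a[x][y] * 2
                    a[x + 1][y] = NULL
                    changed = True
                if a[x][y] == NULL:  # empty cell above: move up one row
                    a[x][y] = a[x + 1][y]
                    a[x + 1][y] = NULL
                    changed = True
    return changed


def repeat_until_stable(a):
    """Apply foo until nothing changes, so tiles slide "all the way" up."""
    while foo(a):
        pass
    return a


print(repeat_until_stable([[NULL], [NULL], [NULL], [2]]))  # [[2], [None], [None], [None]]
print(repeat_until_stable([[2], [2], [4], [NULL]]))        # [[8], [None], [None], [None]]
```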
To make this more general, you have to be creative:
The main generalization is that it should take a "direction" parameter. (Again you wouldn't know this if you haven't played the 2048 game yourself.) Instead of gravity pulling numbers upward, gravity can pull numbers in any of the 4 cardinal directions.
Maybe the algorithm shouldn't check for NULL but should check against some other sentinel value (which is another input).
It's also pretty easy to generalize this to larger matrices.
Maybe there should be some other rule that dictates when numbers get combined, and how precisely they get combined (not necessarily 2 times the first). These rules can be given in the form of lambdas.
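A minimal sketch of how these generalization directions could fit together, in Python. Everything here is illustrative: step, DIRECTIONS, can_merge and merge are hypothetical names, not part of the original question. The direction picks which neighbour is the "source" for each target cell, the sentinel is the empty argument, the grid size is read from the input, and the combination rules are lambdas. With the defaults (direction "up", equal values merge into their sum) one call behaves like the corrected foo.

```python
# Direction -> (row step, column step) pointing the way "gravity" pulls the tiles.
DIRECTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def step(grid, direction="up", empty=None,
         can_merge=lambda a, b: a == b,
         merge=lambda a, b: a + b):
    """One pass of a generalized foo: any grid size, any of the four directions,
    a configurable sentinel for empty cells and pluggable merge rules.
    Mutates grid in place and returns it."""
    dr, dc = DIRECTIONS[direction]
    rows, cols = len(grid), len(grid[0])
    # Scan target cells starting from the edge gravity pulls toward, mirroring
    # how foo scans rows from the top when gravity pulls upward.
    row_order = range(rows) if dr <= 0 else range(rows - 1, -1, -1)
    col_order = range(cols) if dc <= 0 else range(cols - 1, -1, -1)
    for r in row_order:
        for c in col_order:
            sr, sc = r - dr, c - dc  # source: the neighbour on the side away from gravity
            if not (0 <= sr < rows and 0 <= sc < cols):
                continue  # bounds check replaces the hard-coded "x = 0 to 2" fix
            src = grid[sr][sc]
            if src == empty:
                continue
            if grid[r][c] != empty and can_merge(grid[r][c], src):
                grid[r][c] = merge(grid[r][c], src)
                grid[sr][sc] = empty
            elif grid[r][c] == empty:
                grid[r][c] = src
                grid[sr][sc] = empty
    return grid


print(step([[2, 2, None, None]], direction="left"))   # [[4, None, None, None]]
print(step([[None, None, 2, 2]], direction="right"))  # [[None, None, None, 4]]
```

Passing different lambdas (for example merge=lambda a, b: a * b) changes the combination rule without touching the traversal logic.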
As for this part of your answer:
integer matrix cannot have null values, by default they are assigned 0, false for Boolean and null for the reference type
That is largely dependent on the language being used, so I wouldn't call this an error in the pseudocode (which isn't supposed to be in any particular language). For instance, in dynamically typed languages you can certainly have a matrix with int and NULL values.
You don't mention what you said about the function's behavior. If I were the interviewer, I would want to see someone "think out loud" and realize at least the following:
The code is trying to compare each element with the one below it.
Nothing happens unless the lower element is non-NULL.
If the two elements are equal, then the lower one is replaced with NULL and the upper element becomes twice as large.
If the top element is NULL, then the lower non-NULL element "moves" to the top element's place.
These observations about the code are straightforward to obtain just by reading the source code. Whether or not you make sense of these "rules" and notice that it's (similar to) the 2048 game is largely dependent on whether you've played the game before.
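Here's the Python code for the same program. I have fixed the index-out-of-bounds issue in this code. Hope this helps.
null = 0  # stand-in for NULL (assumes 0 is never a real tile value)
array = [[2, null, 2, null],
         [2, null, 2, null],
         [null, null, null, null],
         [null, null, null, null]]
for y in range(4):          # every column
    for x in range(3):      # rows 0..2 only, so array[x + 1] stays in bounds
        if array[x + 1][y] != null:
            # equal neighbours merge into the upper cell
            if array[x + 1][y] == array[x][y]:
                array[x][y] = array[x][y] * 2
                array[x + 1][y] = null
            # an empty upper cell pulls the lower value up one row
            if array[x][y] == null:
                array[x][y] = array[x + 1][y]
                array[x + 1][y] = null
print(array)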
Once corrected, what does function foo do? Please focus on the result of the function, not the details of the implementation
The output will be:
4 null 4 null
null null null null
null null null null
null null null null

Check if number is NaN

I'm trying to check whether a variable I have is equal to NaN in my Ruby on Rails application.
I saw this answer, but it's not really useful because in my code I want to return 0 if the variable is NaN and the value otherwise:
return (average.nan? ? 0 : average.round(1))
The problem is that if the number is not a NaN I get this error:
NoMethodError: undefined method `nan?' for 10:Fixnum
I can't check if the number is a Float instance because it is in both cases (probably, I'm calculating an average).
What can I do?
Am I the only one who finds it strange that a method for checking whether a variable is NaN is available only on Float objects?
Quickest way is to use this:
under_the_test.to_f.nan? # gives you true/false e.g.:
123.to_f.nan? # => false
(0/0.0).to_f.nan? #=> true
Also note that only Floats have the #nan? method defined on them, which is why I use #to_f to convert the result to a Float first.
Tip: if you have an integer calculation that can potentially divide by zero, this will not work:
(123/0).to_f.nan?
This is because both 123 and 0 are integers, so the division raises ZeroDivisionError before #to_f is ever reached. To work around that, the Float::NAN constant can be useful, for example:
return Float::NAN if divisor == 0
return x / divisor
I found this answer while duckducking for a way to detect a value that is neither NaN nor Infinity (i.e., a finite number), so I'll add my discovery here for future googlers.
As usual in Ruby, the answer was simply to look up the name I expected in the Float documentation, and there it was: finite?
n = 1/0.0 #=> Infinity
n.nan? #=> false
n.finite? #=> false
The best way to avoid this kind of problem is to rely on the fact that a NaN isn't even equal to itself:
a = 0.0/0.0
a != a
# => true
With other ordinary types such as Integer, a != a is simply false, so unlike #nan? this check works without converting to Float first.

Selecting only a small amount of trials in a possibly huge condition file in a pseudo-randomized way

I am using the PsychoPy Builder and have only used code rudimentarily.
Now I'm having a problem for which I think coding is unavoidable, but I have no idea how to do it, and so far I haven't found helpful answers on the net.
I have an experiment with pictures of 3 valences (negative, neutral, positive).
In one of the corners of the pictures, additional pictures (letters and numbers) can appear (randomly in one of the 4 positions) at random latencies.
All in all, with all combinations (taking the identity of the letters/numbers into account), I have more than 2000 trial possibilities.
But I only need 72 trials, with the condition that each valence appears 24 times (or: each of the 36 pictures 2 times) and each latency 36 times. Thus, the valence and latency should be counterbalanced, but the positions and the identities of the letters and numbers can be random. However, in a fixed proportion of the trials (25%), no letters/numbers should appear in the corners.
Is there a way to do it?
Adding a pretty simple code component in builder will do this for you. I'm a bit confused about the conditions, but you'll probably get the general idea. Let's assume that you have your 72 "fixed" conditions in a conditions file and a loop with a routine that runs for each of these conditions.
I assume that you have a TextStim in your stimulus routine. Let's say that you called it 'letternumber'. Then the general strategy is to pre-compute a list of randomized characters and positions for each of the 72 trials and then just display them as we move through the experiment. To do this, add a code component to the top of your stimulus routine and add under "begin experiment":
import random # we'll use this module to pick random elements from below
# Indicator sequence, specifying whether letter/number should be shown. False= do not show. True = do show.
show_letternumber = [False] * 18 + [True] * 54 # 18/72=25%, 54/72=75%.
random.shuffle(show_letternumber)
# Sets of letters and numbers to present
char_set = ['1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'b', 'c', 'd', 'e', 'f', 'g'] # ... and so on.
char_trial = [random.choice(char_set) if show_char else '' for show_char in show_letternumber] # one character per trial, '' where nothing is shown
# List of positions
pos_set = [(0.5, 0.5),(-0.5, 0.5),(-0.5,-0.5),(0.5, -0.5)] # coordinates of your four corners
pos_trial = [random.choice(pos_set) for char in char_trial]
Then under "begin routine" in the code component, set letternumber to show the value of char_trial for that trial, at the position in pos_trial:
letternumber.pos = pos_trial[trials.thisN] # set position. trials.thisN is the current trial number
letternumber.text = char_trial[trials.thisN] # set text
# Save to data/log
trials.addData('pos', pos_trial[trials.thisN])
trials.addData('char', char_trial[trials.thisN])
You may need to tick "set every repeat" for the letternumber component in Builder for the text to actually show.
Here is a strategy you could try, but as I don't use Builder I can't integrate it into that workflow.
Prepare a list that has the types of trials you want in the right numbers. You could type this by hand if needed. For example, mytrials = ['a','a',...'d','d'], where those letters represent some label for the combination of trial types you want.
Then open up the console and permute that list (i.e. shuffle it).
import random
random.shuffle(mytrials)
That shuffles mytrials in place; you can check the result by printing it. When you are happy with it, paste it into your code with some sort of loop like:
for t in mytrials:
    if t == 'a':
        <grab a picture of type 'a'>
    elif t == 'b':
        <grab a picture of type 'b'>
    else:
        <grab a picture of type 'c'>
    <then show the picture you grabbed>
There are programmatic ways to build the list with the right number of repeats, but for what you are doing it may be easier to just get going with a hand written list, and then worry about making it fancier once that works.
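One programmatic way to build such a list, sketched in plain Python under stated assumptions: the three valences come from the question, while the two latency values are hypothetical placeholders (the question implies two latency levels, since 72 / 36 = 2). Crossing valence with latency gives 6 cells, and 12 repeats of each cell yields 72 trials with 24 per valence and 36 per latency. Each entry is a (valence, latency) tuple that could take the place of the hand-written labels in the loop above.

```python
import itertools
import random
from collections import Counter

valences = ["negative", "neutral", "positive"]  # from the question
latencies = [0.2, 0.8]  # hypothetical values; 72 / 36 implies two latency levels

# Cross every valence with every latency (6 cells) and repeat each cell 12 times:
# 6 * 12 = 72 trials, 24 per valence and 36 per latency.
mytrials = [cell for cell in itertools.product(valences, latencies) for _ in range(12)]
random.shuffle(mytrials)  # randomize the order in place

# Sanity check of the counterbalancing.
valence_counts = Counter(v for v, _ in mytrials)
latency_counts = Counter(lat for _, lat in mytrials)
print(valence_counts)
print(latency_counts)
```

Because valence and latency are fully crossed, each valence also appears equally often at each latency (12 times), which a hand-typed list can easily get wrong.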

Is there an efficient algorithm to find which composition of boolean functions will match the output of a given boolean function?

Suppose I have the following boolean functions.
def Bottom():
    return False
def implies(var1, var2):
    if var1 == True and var2 == False: return False
    return True
def land(var1, var2):
    return var1 == True and var2 == True
Is there an efficient algorithm which will take these three functions as input, and determine which (possibly multiple-application) functional composition of the first two functions will match the output of the third function for every Boolean (T,F) input to the third function?
I am using Python to write my example in, but I am not restricting solutions to Python or any programming language for that matter.
In fact I am not actually looking for code, but more of a description of an algorithm or an explanation for why one does not exist.
As a side note, my motivation for trying to discover this algorithm is because I was asked to show Functional Completeness of a particular set of logical connectives, and we do this by showing that one logical connective can be emulated by a certain set of others.
For logic, we have to use a little bit of guess and check, but I could not figure out a way to capture that in a program without a linear search over a large space of possibilities.
If you're only looking at boolean functions of two arguments, a simple brute-force technique will work. It could be extended to ternary logic, or ternary functions, or even both, but it is exponential so you can't push it too far. Here's the boolean version; I hope it's obvious how to extend it.
1) A binary boolean function is a function {False, True} × {False, True} -> {False, True}. There are exactly 2^4 = 16 of these. Note that these include various functions which are independent of one or even both of the inputs. So let's make the set 𝓕 consisting exactly of these 16 functions, and now note that every binary boolean function has a corresponding higher-order function 𝓕 × 𝓕 -> 𝓕 (apply it position by position to the two truth tables).
2) Now, start with the boolean functions Take first and Take second, and construct a closure using the HOFs corresponding to the "given functions". If the target function is in the closure, then it's achievable from some combination of the given functions. More generally, if every element in 𝓕 is in the closure, then the given function(s) are universal.
So, let's apply this to your example. I'm going to write elements of 𝓕 as a four-tuple corresponding to the inputs (F,F) (F,T) (T,F) (T,T) in that order, and I'm going to write the HOFs with capitalized names. So Bottom is FFFF and Implies is TTFT. Bottom(a, b) is FFFF for any (a,b).
Take first is FFTT and Take second is FTFT, so that's our starting set. We can use Bottom to add FFFF, but obviously no further applications of Bottom are going to add anything.
So now we have nine possible pairs of functions we can apply to Implies. Here we go:
Implies(FFTT, FFTT) == TTTT (new)
Implies(FFTT, FTFT) == TTFT (new)
Implies(FFTT, FFFF) == TTFF (new)
Implies(FTFT, FFTT) == TFTT (new)
Implies(FTFT, FTFT) == TTTT
Implies(FTFT, FFFF) == TFTF (new)
Implies(FFFF, FFTT) == TTTT
Implies(FFFF, FTFT) == TTTT
Implies(FFFF, FFFF) == TTTT
Now we're up to eight of the sixteen functions, and we have a bunch more pairs to check. Since this is actually a complete set, it will get tedious, so I'll leave the next step to the reader (or perhaps their computer program).
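The remaining rounds are easy to leave to a program. Below is a minimal brute-force sketch in Python of the closure described above, under these assumptions: truth tables are 4-tuples in the (F,F), (F,T), (T,F), (T,T) order used in the text; closure, table and INPUTS are illustrative names; and the zero-argument Bottom is wrapped as a constant two-argument function, as the text does. Alongside each reachable table it records one expression that produces it, which covers the "which composition" part of the question. It works round by round, so earlier rounds yield shallower compositions; the search space still grows very quickly with the number of inputs.

```python
from itertools import product

# Truth tables are 4-tuples over the inputs (F,F), (F,T), (T,F), (T,T), as in the text.
INPUTS = [(False, False), (False, True), (True, False), (True, True)]


def table(f):
    """Truth table of a binary boolean function f."""
    return tuple(f(a, b) for a, b in INPUTS)


def closure(connectives):
    """Closure starting from 'take first' (a) and 'take second' (b).

    connectives maps a name to a binary boolean function. Returns a dict from
    every reachable truth table to one expression string that produces it.
    """
    found = {table(lambda a, b: a): "a", table(lambda a, b: b): "b"}
    while True:
        new = {}
        for name, f in connectives.items():
            for (s, es), (t, et) in product(list(found.items()), repeat=2):
                # Apply the connective position by position: its HOF on tables.
                r = tuple(f(x, y) for x, y in zip(s, t))
                if r not in found and r not in new:
                    new[r] = f"{name}({es}, {et})"
        if not new:
            return found
        found.update(new)


def Bottom():
    return False


def implies(var1, var2):
    return not (var1 and not var2)


def land(var1, var2):
    return var1 and var2


# Bottom takes no arguments, so wrap it as a constant binary function.
connectives = {"bottom": lambda a, b: Bottom(), "implies": implies}
found = closure(connectives)
print(len(found))          # 16
print(found[table(land)])  # one composition of bottom/implies that equals land
```

Reaching all 16 tables confirms that {Bottom, Implies} is complete for binary functions, and the recorded expression for land is one concrete witness.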
