Erlang binary matching efficiency - performance

What would be the difference between matching like:
fun(Binary) ->
[Value, Rest] = binary:split(Binary, <<101>>)
end
and
fun(Binary) ->
[Value, <<Rest/binary>>] = binary:split(Binary, <<101>>)
end
I am thinking that one may simply increment a counter as it traverses the binary and keep the sub binary pointer and the other will copy a new binary. Any ideas?

I can think of pattern matching in two ways.
Method 1:
[A,B] = [<<"abcd">>,<<"fghi">>]
Method 2:
[A, <<B/binary>>] = [<<"abcd">>,<<"fghi">>]
Unless you need to make sure that B is a binary, Method 2 will take slightly longer (a few microseconds), because it does not just bind <<"fghi">> to B but also checks that it is a binary.
However, if you need to parse further, you can extend Method 2 in a way that Method 1 cannot:
[A, <<B:8, Rest/binary>>] = [<<"abcd">>,<<"fghi">>].

I think you could measure the difference with the timer module's tc/N function.

Related

Julia type instability: Array of LinearInterpolations

I am trying to improve the performance of my code by removing any sources of type instability.
For example, I have several instances of Array{Any} declarations, which I know generally destroy performance. Here is a minimal example (greatly simplified compared to my code) of a 2D Array of LinearInterpolation objects, i.e
n,m=5,5
abstract_arr=Array{Any}(undef,n+1,m+1)
arr_x=LinRange(1,10,100)
for l in 1:n
for alpha in 1:m
abstract_arr[l,alpha]=LinearInterpolation(arr_x,alpha.*arr_x.^n)
end
end
so that typeof(abstract_arr) gives Array{Any,2}.
How can I initialize abstract_arr to avoid using Array{Any} here?
And how can I do this in general for Arrays whose entries are structures like Dicts() where the Dicts() are dictionaries of 2-tuples of Float64?
If you make a comprehension, the type will be figured out for you:
arr = [LinearInterpolation(arr_x, alpha.*arr_x.^n) for l in 1:n, alpha in 1:m]
isconcretetype(eltype(arr)) # true
When it can predict the type & length, it will make the right array the first time. When it cannot, it will widen or extend it as necessary. So probably some of these will be Vector{Int}, and some Vector{Union{Nothing, Int}}:
[rand()>0.8 ? nothing : 0 for i in 1:3]
[rand()>0.8 ? nothing : 0 for i in 1:3]
[rand()>0.8 ? nothing : 0 for i in 1:10]
The main trick is that you just need to know the type of the object that is returned by LinearInterpolation, and then you can specify that instead of Any when constructing the array. To determine that, let's look at the typeof one of these objects
julia> typeof(LinearInterpolation(arr_x,arr_x.^2))
Interpolations.Extrapolation{Float64, 1, ScaledInterpolation{Float64, 1, Interpolations.BSplineInterpolation{Float64, 1, Vector{Float64}, BSpline{Linear{Throw{OnGrid}}}, Tuple{Base.OneTo{Int64}}}, BSpline{Linear{Throw{OnGrid}}}, Tuple{LinRange{Float64}}}, BSpline{Linear{Throw{OnGrid}}}, Throw{Nothing}}
This gives a fairly complicated type, but we don't necessarily need to use the whole thing (though in some cases it might be more efficient to). So for instance, we can say
using Interpolations
n,m=5,5
abstract_arr=Array{Interpolations.Extrapolation}(undef,n+1,m+1)
arr_x=LinRange(1,10,100)
for l in 1:n
for alpha in 1:m
abstract_arr[l,alpha]=LinearInterpolation(arr_x,alpha.*arr_x.^n)
end
end
which gives us a result of type
julia> typeof(abstract_arr)
Matrix{Interpolations.Extrapolation} (alias for Array{Interpolations.Extrapolation, 2})
Since the return type of this LinearInterpolation does not seem to be of known size, and
julia> isbitstype(typeof(LinearInterpolation(arr_x,arr_x.^2)))
false
each assignment to this array will still trigger allocations, and consequently there actually may not be much or any performance gain from the added type stability when it comes to filling the array. Nonetheless, there may still be performance gains down the line when it comes to using values stored in this array (depending on what is subsequently done with them).

Julia: Self-referential and recursive types

What I am trying to do is not very straightforward, so it may be easier if I start with the result and then explain how I am trying to get there.
I have a struct with two fields:
struct data{T}
point::T
mat::Array
end
What I would like to do is nest this and make the field mat self-referential to get something like this:
data{data{Int64}}(data{Int64}(1, [1]), [1])
The 'outer' type should not store [1] but reference to the innermost mat. I am not sure if this makes sense or is even possible. The field mat should store the same large array repeatedly.
I have tried something like this (n is the number of nested types):
struct data{T}
point::T
g::Array
function D(f, g, n)
for i = 1:n
(x = new{T}(f, g); x.f = x)
end
end
end
Again I am not sure if I understand self-referential constructors enough, or if this is possible. Any help/clarification would be appreciated, thanks!
The exact pattern will depend on what you want to achieve but here is one example:
struct Data{V, A <: AbstractArray{V}, T}
    mat::A
    point::T
    Data(mat::A, point::T = nothing) where {V, A <: AbstractArray{V}, T} =
        new{V,A,T}(mat,point)
end
Usage
julia> d0 = Data([1,2,3])
Data{Int64,Array{Int64,1},Nothing}([1, 2, 3], nothing)
julia> d1 = Data([1.0,2.0],d0)
Data{Float64,Array{Float64,1},Data{Int64,Array{Int64,1},Nothing}}([1.0, 2.0], Data{Int64,Array{Int64,1},Nothing}([1, 2, 3], nothing))
Tips:
Never use untyped containers. Hence, when you want to store an Array you need to have its type in your struct definition.
Use names starting with a capital letter for structs.
Provide constructors to make your API readable.
Last but not least: if you want several nesting levels for such a structure, compile times will increase hugely. In that case it is usually better to use homogeneous types. In such scenarios you could perhaps use type Unions instead (unions of a small number of types are fast in Julia).
Based on your description, the data seems like a very general wrapper. Maybe you can try something like this:
mutable struct Wrap{T}
    w::Wrap{T}
    d::T
    function Wrap(d::T) where T
        w = new{T}()
        w.d = d
        w
    end
end
function Wrap(d, n::Int)
    res = Wrap(d)
    cur = res
    for _ in 1:n-1
        cur.w = Wrap(d)
        cur = cur.w
    end
    res
end
Wrap([1], 4)
# Wrap{Array{Int64,1}}(Wrap{Array{Int64,1}}(Wrap{Array{Int64,1}}(Wrap{Array{Int64,1}}(#undef, [1]), [1]), [1]), [1])

How does element membership work in Perl 6?

Consider this example
my @fib = (1,1, * + * … * > 200).rotor(2 => -1);
say @fib[0] ∈ @fib; # prints True
The first statement creates a Sequence of 2-element subsequences via the use of the rotor function. @fib will contain (1,1), (1,2) and so on. Quite obviously, the first element of a sequence is part of a sequence. Or is it?
my @fib = (1,1, * + * … * > 200).rotor(2 => -1);
say @fib[0], @fib[0].^name; # OUTPUT: «(1 1)List␤»
So the first element contains a list whose value is (1 1). OK, let's see
my $maybe-element = (1,1);
say $maybe-element, $maybe-element.^name; # OUTPUT: «(1 1)List␤»
say $maybe-element ∈ @fib; # OUTPUT: «False␤»
Wait, what? Let's see...
my $maybe-element = @fib[0];
say $maybe-element ∈ @fib; # OUTPUT: «True␤»
Hum. So it's not the container. But
say (1,1).List === (1,1).List; # OUTPUT: «False␤»
And
say (1,1).List == (1,1).List; # OUTPUT: «True␤»
So I guess ∈ is using object identity, and not equality. That being the case, how can we check, in sets or sequences of lists, if an independently generated list is included using this operator? Should we use another different strategy?
Maybe a subquestion is why the same literals generate completely different objects, but there's probably a good, and very likely security-related, answer for that.
So I guess ∈ is using object identity, and not equality.
That is correct.
That being the case, how can we check, in sets or sequences of lists, if an independently generated list is included using this operator?
You can use .grep or .first and the equality operator of your choice (presumably you want eqv here), or you can try to find a list-like value type. Off the top of my head, I don't know if one is built into Perl 6.

Generate a random number in ruby with some conditions

I have no idea how to proceed; I've been learning Ruby for just one week. I thought I'd create an array filled from an external source, such as a database, and forbid the elements in it from being picked by the script. Is that possible? I just want a general idea of how to write such a script.
Do you mean something like this?
forbidden_numbers = [ 5 , 6 , 3 , 4]
new_number = loop do
tmp_number = rand 1_000_000
break tmp_number unless forbidden_numbers.include?(tmp_number)
end
puts new_number
In general, you have two choices:
Remove the ineligible elements, then choose one at random:
arr.reject {...}.sample
Choose an element at random. If it is disallowed, repeat, continuing until a valid element is found:
until (n = arr.sample) && ok?(n) do end
n
Without additional information we cannot say which approach is best in this case.
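As a minimal sketch of the two choices above, assuming the forbidden values are already loaded into a Set (for example from a database query): the helper names `pick_by_filtering` and `pick_by_retrying`, the `random:` keyword and the `max_tries` cap are illustrative assumptions, not part of the original answers.

```ruby
require 'set'

# Hypothetical helper: remove the forbidden values first, then sample
# from what is left. Returns nil when nothing is allowed.
def pick_by_filtering(candidates, forbidden, random: Random.new)
  allowed = candidates.reject { |n| forbidden.include?(n) }
  allowed.sample(random: random)
end

# Hypothetical helper: draw from the range and retry while the draw is
# forbidden. max_tries is an assumed safety cap so the loop always ends.
def pick_by_retrying(range, forbidden, random: Random.new, max_tries: 1_000)
  max_tries.times do
    n = random.rand(range)
    return n unless forbidden.include?(n)
  end
  nil
end

forbidden = Set[3, 4, 5, 6]
puts pick_by_filtering((1..10).to_a, forbidden)
puts pick_by_retrying(1..1_000_000, forbidden)
```

Filtering builds the whole list of allowed values, so it suits small candidate sets; retrying avoids that cost and tends to be the better fit when the range is large and few values are forbidden.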

Sort array of hashes based on value of key in hash?

I'm attempting to work with Vagrant to perform some automation in spinning up Docker containers. Vagrantfiles are essentially Ruby and thus I should be able to apply Ruby logic to assist with this issue.
I am reading through a conf.d directory filled with YAML files containing configuration data and then pushing a hash of configuration items into an array. Once done I am iterating through the array with .each and applying the configuration to each entry in the array based on the values of some of the keys inside the hash. One of these keys is "link". The value of link will correlate to the value of another key "name".
I essentially need to ensure that the hash with link => 'name' is in the array prior to the hash with name => 'value'.
Example of input and expected output:
Input
containers = [{"name"=>"foo", "ports"=>["80:80", "443:443"], "links"=>["bar", "baz"]}, {"name"=>"bar", "ports"=>["8888:8888"]}, {"name"=>"baz","ports"=>"80:80"}]
Expected Output
containers = [{"name"=>"bar", "ports"=>["8888:8888"]}, {"name"=>"baz", "ports"=>"80:80"}, {"name"=>"foo", "ports"=>["80:80", "443:443"], "links"=>["bar", "baz"]}]
The end result is that any entry with "link" appears after an entry in the array where the hash's name key matches it. (Basically dependency ordering based on the link key.)
Note it may occur that a linked container links to another linked container.
It's been puzzling me a bit as I have the ideas of what I need to do but lack the technical chops to actually figure out "How?" :)
Thanks in advance for any assistance.
This should work for you:
def order_containers(containers)
  unordered = containers.dup
  ordered = []
  names_from_ordered = {}
  name_is_ordered = names_from_ordered.method(:[])
  until unordered.empty?
    container = unordered.find do |c|
      c.fetch('links', []).all?(&name_is_ordered)
    end
    raise 'container ordering impossible' if !container
    ordered << container
    unordered.delete(container)
    names_from_ordered[container.fetch('name')] = true
  end
  ordered
end
containers = [
{ 'name'=>'foo', 'links'=>['bar'] },
{ 'name'=>'a', 'links'=>['goo'] },
{ 'name'=>'bar' },
{ 'name'=>'goo', 'links'=>['foo'] },
]
containers = order_containers(containers)
require 'pp'
pp containers
# => [{"name"=>"bar"},
# {"name"=>"foo", "links"=>["bar"]},
# {"name"=>"goo", "links"=>["foo"]},
# {"name"=>"a", "links"=>["goo"]}]
The basic idea is that we use a loop, and each iteration of the loop will find one container from the input list that is suitable for adding to the output list. A container is suitable for adding to the output list if all the containers it depends on have already been added to the output list. The container is then removed from the input list and added to the output list.
This loop can terminate in two main ways:
when the input list is empty, which would indicate success, or
when we cannot find a container that we are able to start, which would be an error caused by a circular dependency.
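The same dependency ordering can also be sketched with Ruby's standard-library TSort module, which is a different technique from the loop above: it performs a topological sort and raises TSort::Cyclic on a circular dependency. The helper name `order_with_tsort` is an assumption for illustration, and the sketch assumes every name in "links" refers to a container present in the input (otherwise `fetch` raises KeyError).

```ruby
require 'tsort'

# Hypothetical helper: return the containers ordered so that every
# container appears after all the containers it links to.
def order_with_tsort(containers)
  by_name = containers.to_h { |c| [c.fetch('name'), c] }

  # TSort needs two callables: one that yields every node, and one that
  # yields the children (here: the link targets) of a given node.
  each_node  = lambda { |&block| by_name.each_key(&block) }
  each_child = lambda { |name, &block| by_name.fetch(name).fetch('links', []).each(&block) }

  TSort.tsort(each_node, each_child).map { |name| by_name.fetch(name) }
end

containers = [
  { 'name' => 'foo', 'ports' => ['80:80', '443:443'], 'links' => ['bar', 'baz'] },
  { 'name' => 'bar', 'ports' => ['8888:8888'] },
  { 'name' => 'baz', 'ports' => '80:80' },
]
p order_with_tsort(containers).map { |c| c['name'] }
# => ["bar", "baz", "foo"]
```

Because TSort emits children before their parents, chains of linked containers that link to other linked containers are handled without extra code.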
Seems to me the simplest thing would be something like:
linkless_configs = []
linked_configs = []
containers.each do |config_hash|
  if config_hash.has_key?("links")
    linked_configs.push(config_hash)
  else
    linkless_configs.push(config_hash)
  end
end
then you can iterate over linkless_configs + linked_configs and be guaranteed that each linked config comes after the corresponding link-less config.
Alternatively, if you must sort, you could
containers.sort_by { |config| config.has_key?("links") ? 1 : 0 }
[Edit: @DavidGrayson has pointed out a flaw with my answer. I'll see if I can find a fix, but if I cannot, and I fear that may be the case, I'll delete the answer. [Edit#2: Oh, my! Someone upvoted my answer after my initial edit. I'm not sure I can delete it now, but to be truthful, I'd already decided not to do so, mainly because my explanation has implications for any proposed solution to the OP's problem. With 10 points in the balance, leaving it up is now even more compelling. 2#tidE]
I believe I understand the problem. sort requires a total order, which is a partial order in which a <= b or b <= a for every pair of elements (ref). The latter is not a problem, but the partial order requirement is. A partial order must satisfy the axioms of:
reflexivity (x ≤ x),
antisymmetry (if x ≤ y and y ≤ x then x = y) and
transitivity (if x ≤ y and y ≤ z, then x ≤ z).
My ordering only satisfies the reflexivity axiom. David gives the counter-example:
containers = [h0, h1, h2]
where
h0 = {'name'=>'foo', 'links'=>['bar']},
h1 = {'name'=>'a'},
h2 = {'name'=>'bar'},
containers.sort
#=> [{"name"=>"foo", "links"=>["bar"]},
# {"name"=>"a"}, {"name"=>"bar"}]
My method Hash#<=> establishes:
h0 = h1
h0 > h2
h1 = h2
If sort were to find that h0 = h1 and h1 = h2, it would conclude, by transitivity, that h0 = h2 (and not check h0 <=> h2), which may produce an incorrect result.
David also points out that o.follows?(self) should raise an exception because I have defined follows? as private. As I have not yet encountered an exception, I conclude that the statement has not been executed. I have not traced the reason for that, but it's a minor point (though no doubt a useful clue).
I'm grateful to David for identifying the problem. Incorrect answers need to be exposed, of course, but I feel I've learned something useful as well.
tidE]
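The three comparison results listed in the edit above can be reproduced without reopening Hash. This is a minimal sketch: `follows?` and `compare_by_links` are stand-alone, hypothetical versions of the answer's comparison logic, written as top-level methods for illustration.

```ruby
# Hypothetical stand-alone version of the answer's follows? check.
def follows?(a, b)
  a.fetch('links', []).include?(b['name'])
end

# Hypothetical stand-alone version of the answer's <=> logic.
def compare_by_links(a, b)
  if follows?(a, b) then 1
  elsif follows?(b, a) then -1
  else 0
  end
end

h0 = { 'name' => 'foo', 'links' => ['bar'] }
h1 = { 'name' => 'a' }
h2 = { 'name' => 'bar' }

p compare_by_links(h0, h1)  # => 0
p compare_by_links(h1, h2)  # => 0
p compare_by_links(h0, h2)  # => 1
```

The first two results say h0 = h1 and h1 = h2, yet the third says h0 > h2, so the comparison is not transitive and sort is free to leave h0 ahead of h2.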
If I understand the question correctly, and the data provides a valid ordering, I think you could do it as follows.
class Hash
  def <=>(o)
    case
    when follows?(o) then 1
    when o.follows?(self) then -1
    else 0
    end
  end
  private
  def follows?(o)
    key?("links") && self["links"].include?(o["name"])
  end
end
containers = [{"name"=>"foo", "ports"=>["80:80", "443:443"],
"links"=>["bar", "baz"]},
{"name"=>"bar", "ports"=>["8888:8888"]},
{"name"=>"baz","ports"=>"80:80"}]
containers.sort
#=> [{"name"=>"baz", "ports"=>"80:80"},
# {"name"=>"bar", "ports"=>["8888:8888"]},
# {"name"=>"foo", "ports"=>["80:80", "443:443"],
# "links"=>["bar", "baz"]}]
Addendum
Although I prefaced with the assumption that the data provides a valid ordering, @Ajedi32 asks what happens when there is a circular reference. Let's find out:
containers = [{"name"=>"foo", "links"=>["bar"]},
{"name"=>"bar", "links"=>["baz"]},
{"name"=>"baz", "links"=>["foo"]}]
containers.sort
#=> [{ "name"=>"baz", "links"=>["foo"] },
# { "name"=>"bar", "links"=>["baz"] },
# { "name"=>"foo", "links"=>["bar"] }]
containers = [{"name"=>"foo", "links"=>["bar"]},
{"name"=>"bar", "links"=>["foo"]}]
containers.sort
#=> [{ "name"=>"bar", "links"=>["foo"] },
# { "name"=>"foo", "links"=>["bar"] }]
This shows that if one were not certain that there were no circular references, one should check for that before sorting.
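One way such a pre-sort check might look is a depth-first search over the "links" entries. This is only a sketch: `circular_links?` is a hypothetical helper name, and treating links to unknown container names as having no further links is an assumption, not something the answer specifies.

```ruby
# Hypothetical helper: true when following "links" from some container
# eventually leads back to that same container.
def circular_links?(containers)
  links = containers.to_h { |c| [c.fetch('name'), c.fetch('links', [])] }
  state = {}  # name => :visiting while on the current path, :done afterwards

  visit = lambda do |name|
    return false if state[name] == :done
    return true if state[name] == :visiting  # back on our own path: a cycle
    state[name] = :visiting
    cyclic = links.fetch(name, []).any? { |child| visit.(child) }
    state[name] = :done
    cyclic
  end

  links.keys.any? { |name| visit.(name) }
end

p circular_links?([{ 'name' => 'foo', 'links' => ['bar'] },
                   { 'name' => 'bar', 'links' => ['foo'] }])
# => true
p circular_links?([{ 'name' => 'foo', 'links' => ['bar'] },
                   { 'name' => 'bar' }])
# => false
```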

Resources