Ordering an array by dependencies with Perl - algorithm

I have an array of hashes:
my @arr = get_from_somewhere();
The contents of @arr (for example) are:
@arr = (
{ id => "id2", requires => 'someid', text => "another text2" },
{ id => "xid4", requires => 'id2', text => "text44" },
{ id => "someid", requires => undef, text => "some text" },
{ id => "id2", requires => 'someid', text => "another text2" },
{ id => "aid", requires => undef, text => "alone text" },
{ id => "id2", requires => 'someid', text => "another text2" },
{ id => "xid3", requires => 'id2', text => "text33" },
);
I need something like:
my $texts = join("\n", get_ordered_texts(@arr));
So I need to write a sub that returns the array of texts from the hashes, in dependency order. From the above example I need to get:
"some text", #someid the id2 depends on it - so need be before id2
"another text2", #id2 the xid3 and xid4 depends on it - and it is depends on someid
"text44", #xid4 the xid4 and xid3 can be in any order, because nothing depend on them
"text33", #xid3 but need be bellow id2
"alone text", #aid nothing depends on aid and hasn't any dependencies, so this line can be anywhere
As you can see, @arr can contain some duplicated "lines" ("id2" in the above example); each id must be output only once.
I'm not providing any code example yet, because I have no idea how to start. ;(
Is there a CPAN module that can be used for the solution?
Can anybody point me in the right direction?

Using Graph:
use Graph qw( );
my @recs = (
{ id => "id2", requires => 'someid', text => "another text2" },
{ id => "xid4", requires => 'id2', text => "text44" },
{ id => "someid", requires => undef, text => "some text" },
{ id => "id2", requires => 'someid', text => "another text2" },
{ id => "aid", requires => undef, text => "alone text" },
{ id => "id2", requires => 'someid', text => "another text2" },
{ id => "xid3", requires => 'id2', text => "text33" },
);
sub get_ordered_recs {
my %recs;
my $graph = Graph->new();
for my $rec (@_) {
my ($id, $requires) = @{$rec}{qw( id requires )};
$graph->add_vertex($id);
$graph->add_edge($requires, $id) if $requires;
$recs{$id} = $rec;
}
return map $recs{$_}, $graph->topological_sort();
}
my @texts = map $_->{text}, get_ordered_recs(@recs);
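As a quick sanity check, printing the result gives one possible valid ordering (topological order is not unique, so "alone text" may land elsewhere):
print join("\n", @texts), "\n";
# some text
# another text2
# text44
# text33
# alone text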

An interesting problem.
Here's my first round solution:
sub get_ordered_texts {
    my %dep_found;  # track the set of ids already output
    my @sorted_arr; # output
    my $last_count = scalar @_; # infinite loop protection
    while (@_ > 0) {
        # keep only the entries we are not yet ready for
        @_ = grep {
            my $value = $_;
            if (defined $value->{requires}
                and not $dep_found{ $value->{requires} }) {
                1; # dependency not seen yet, retry on the next pass
            }
            else {
                # Add to the sorted list (only once per id, to handle duplicates)
                push @sorted_arr, $value->{text}
                    unless $dep_found{ $value->{id} };
                # Remember that we found it
                $dep_found{ $value->{id} }++;
                0; # done, remove it from the work list
            }
        } @_;
        die "some requirements don't exist or there is a dependency loop"
            if scalar @_ == $last_count;
        $last_count = scalar @_;
    }
    return \@sorted_arr;
}
This is not terribly efficient (worst case it makes a full pass over the list per resolved level, so roughly O(n^2) time), but if you don't have a huge dataset, it's probably OK.

I would use a directed graph to represent the dependency tree and then walk the graph. I've done something very similar using Graph.pm.
Each of your hashes would be a graph vertex and the edges would represent the dependencies. This has the added benefit of supporting more complex dependencies in the future, as well as providing shortcut functions for working with the graph.

You didn't say what to do if the dependencies are "independent" of each other.
E.g. id1 requires id2; id3 requires id4; id3 requires id5. What should the order be, other than id2 before id1 and id4/id5 before id3?
What you want is basically a BFS (Breadth-First Search) of a tree (directed graph) of dependencies, or a forest, depending on the answer to the question above (a forest being a set of non-connected trees).
To do that:
Find all of the root nodes (ids that don't have a requirement themselves). You can easily do that by building a hash of ALL the IDs using grep on your data structure.
Put all those root nodes into a starting array.
Then implement BFS, as in the sketch below. If you need help implementing basic BFS using an array and a loop in Perl, ask a separate question. There may be a CPAN module, but the algorithm/code is rather trivial (at least once you've written it once :)
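For illustration, here is a minimal sketch of that approach (my own illustration, not the original poster's code; it assumes the id/requires fields shown in the question):
use strict;
use warnings;

sub get_ordered_texts_bfs {
    my @recs = @_;
    my (%rec_by_id, %children);
    for my $rec (@recs) {
        next if $rec_by_id{ $rec->{id} };   # keep duplicate ids only once
        $rec_by_id{ $rec->{id} } = $rec;
        push @{ $children{ $rec->{requires} } }, $rec->{id}
            if defined $rec->{requires};
    }
    # root nodes: ids with no requirement of their own
    my @queue = sort grep { !defined $rec_by_id{$_}{requires} } keys %rec_by_id;
    my @texts;
    while (@queue) {
        my $id = shift @queue;                   # dequeue
        push @texts, $rec_by_id{$id}{text};
        push @queue, @{ $children{$id} || [] };  # enqueue dependents
    }
    # note: entries whose requirement never appears are silently dropped here
    return @texts;
}
BFS is valid here because each record has at most one requires, so every node is reached only after the one it depends on.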

Related

How to force Elastic to keep more decimals from a float

I have some coordinates that I pass to Elasticsearch from Logstash, but Elastic keeps only 3 decimals, so coordinate-wise I completely lose the precision of the location.
When I send the data from Logstash, I can see it got the right value:
{
"nasistencias" => 1,
"tiempo_demora" => "15",
"path" => "/home/elk/data/visits.csv",
"menor" => "2",
"message" => "5,15,Parets del Vallès,76,0,8150,41.565505,2.234999575,LARINGITIS AGUDA,11/3/17 4:20,1,38,1,2,POINT(2.2349995750000695 41.565505000000044)",
"id_poblacion" => 76,
"#timestamp" => 2017-03-11T04:20:00.000Z,
"poblacion" => "Parets del Vallès",
"edad_valor" => 0,
"patologia" => "LARINGITIS AGUDA",
"host" => "elk",
"#version" => "1",
"Geopoint_corregido" => "POINT(2.2349995750000695 41.565505000000044)",
"id_tipo" => 1,
"estado" => "5",
"cp" => 8150,
"location" => {
"lon" => 2.234999575, <- HERE
"lat" => 41.565505 <- AND HERE
},
"id_personal" => 38,
"Fecha" => "11/3/17 4:20"
}
But then in Kibana I see the value truncated to only 3 decimals.
I do the conversion as follows:
mutate {
convert => { "longitud_corregida" => "float" }
convert => { "latitude_corregida" => "float" }
}
mutate {
rename => {
"longitud_corregida" => "[location][lon]"
"latitude_corregida" => "[location][lat]"
}
}
How can I keep all the decimals? With geolocation, losing decimals can place the point in the wrong city.
Another question (related)
I add the data to the csv document as follows:
# echo "5,15,Parets del Vallès,76,0,8150,"41.565505","2.234999575",LARINGITIS AGUDA,11/3/17 4:20,1,38,1,2,POINT(2.2349995750000695 41.565505000000044)" >> data/visits.csv
But in the original document, instead of dots there are comas for the coordinates. like this:
# echo "5,15,Parets del Vallès,76,0,8150,"41,565505","2,234999575",LARINGITIS AGUDA,11/3/17 4:20,1,38,1,2,POINT(2.2349995750000695 41.565505000000044)" >> data/visits.csv
But the problem was that it was getting the coma as field separator, and all the data was being sent to Elasticsearch wrong. Like here:
Here, the latitude was 41,565505, but that coma made it understand 41 as latitude, and 565505 as longitude. I changed the coma by dot, and am not sure if float understands comas and dots, or just comas. My question is, did I do wrong changing the coma by dot? Is there a better way to correct this?
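For what it's worth, a common alternative (a sketch on my part, not from the question or the answer below; it assumes the fields arrive quoted so the csv parsing keeps them intact) is to keep the commas and normalize the decimal separator in Logstash with mutate/gsub before the float conversion:
mutate {
  # turn the decimal comma into a dot before converting to float
  gsub => [
    "longitud_corregida", ",", ".",
    "latitude_corregida", ",", "."
  ]
}
mutate {
  convert => { "longitud_corregida" => "float" }
  convert => { "latitude_corregida" => "float" }
}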
Create a geo_point mapping for the lat/lon fields. This will lead to more precise, internally optimized storage in ES and allow you more sophisticated geo queries.
Please keep in mind that you'll need to reindex the data, as mapping changes are not possible afterwards (if there are already docs present having the fields to change).
Zero downtime approach:
Create a new index with an optimized mapping (derive it from the current one and make your changes manually; see the sketch after this list)
Reindex the data (at least some docs for verification)
Empty the new index again
Change the logstash destination to the new index (consider using aliases)
Reindex the old data into the new index
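A rough sketch of such a mapping (the index name is my own placeholder, not from the question; on Elasticsearch versions before 7 the properties block sits under a mapping type name):
PUT /visits_v2
{
  "mappings": {
    "properties": {
      "location": { "type": "geo_point" }
    }
  }
}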

Checking if a ruby hash contains a value greater than x

I have the following object returned from an InfluxDB query, and I want to be able to check if any of the derivatives are greater than or equal to, say, 100, and if so, do stuff.
I've been trying to use select to check that field, but I don't really understand how to work with a data structure like this. How would I go about iterating through every derivative value in my returned object?
I'm not really seeing an example that's similar to my case in the enumerable documentation.
https://ruby-doc.org/core-2.4.0/Enumerable.html
[{
"name" => "powerdns_value",
"tags" => nil,
"values" => [
{ "time" => "2017-03-21T14:20:00Z", "derivative" => 1},
{ "time" => "2017-03-21T14:30:00Z", "derivative" => 900},
{ "time" => "2017-03-21T14:40:00Z", "derivative" => 0},
{ "time" => "2017-03-21T15:20:00Z", "derivative" => 0}
]
}]
If you just want to know whether one of the hashes in your array meets the condition:
arr.first['values'].any? { |hash| hash['derivative'] >= 100 }
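For example, wrapped in a condition (a sketch, assuming the query result shown above is stored in arr and may contain more than one series):
if arr.any? { |series| series['values'].any? { |h| h['derivative'] >= 100 } }
  # do stuff
  puts "at least one derivative reached 100"
end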

With a hash of lists how do I operate on each key/list element once in random order?

For example, if my HoL looks like:
%HoL = (
"flintstones" => [ "fred", "barney" ],
"jetsons" => [ "george", "jane", "elroy" ],
"simpsons" => [ "homer", "marge", "bart" ],
);
And I want to create a loop that will allow me to operate only once on each key/element pair, in a completely random order (so that it jumps between keys randomly too, not just elements). How do I do that? I'm thinking it will use shuffle, but figuring out the specifics is defeating me.
(Sorry for the noobishness of the question; I haven't been coding long. I was also unable to find an answer to this specific problem by googling, though I daresay it's been answered somewhere before.)
Build an array of all key-value pairs, then shuffle that:
use List::Util 'shuffle';
my %HoL = (
"flintstones" => [ "fred", "barney" ],
"jetsons" => [ "george", "jane", "elroy" ],
"simpsons" => [ "homer", "marge", "bart" ],
);
# Build an array of arrayrefs ($ref->[0] is the key and $ref->[1] is the value)
my @ArrayOfPairs = map {
my $key = $_;
map { [ $key, $_ ] } @{$HoL{$key}}
} keys %HoL;
for my $pair (shuffle @ArrayOfPairs) {
print "$pair->[1] $pair->[0]\n";
}

How to count many _ids for the search term?

How can I make the query return the count of post_ids for the searched names? I would like to have the name and count in the resulting array.
Actual code:
@array_tags = Tag.where(:name.in=>[/r/])
# returns
# [{"_id":"4eb57a20b51ab102cc00001f","name":"ruby","post_ids":["4eb57a20b51ab102cc00001e","4eb57a53b51ab102cc000023","4eb57a63b51ab102cc000025"]}]
# best expected
# [{"_id":"4eb57a20b51ab102cc00001f","name":"ruby","count":"3"}]
Schema:
{ "_id" : ObjectId( "4eb57a20b51ab102cc00001f" ),
"name" : "ruby",
"post_ids" : [
ObjectId( "4eb57a20b51ab102cc00001e" ),
ObjectId( "4eb57a53b51ab102cc000023" ),
ObjectId( "4eb57a63b51ab102cc000025" ) ] }
EDIT!
I got it! Solution:
@tags = Tag.grpost(params[:term])
def self.grpost(find_by)
self.collection.group(
:key => 'name',
:cond => {:name=>{"$in"=>[/^#{find_by}/]}},
:reduce => "function(obj,prev) { prev.total_posts += obj.post_ids.length; }",
:initial => { total_posts: 0 }
)
end
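With the schema above, that should return something along these lines (my sketch of the shape; the group helper reports numeric totals as doubles):
# => [{"name" => "ruby", "total_posts" => 3.0}]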
I don't think it's possible to do that on the fly without some complex map-reduce operation.
Most probably it would be easier to add the count field to your Tag document and maintain it yourself with $inc or something, as sketched below.
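For illustration, that could look something like this with the classic Ruby driver (a sketch; Tag.collection, tag_id and post_id are assumptions on my part):
# atomically attach a post to a tag and bump its counter
Tag.collection.update(
  { "_id" => tag_id },
  {
    "$push" => { "post_ids" => post_id },
    "$inc"  => { "count"    => 1 }
  }
)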

MongoDB and MongoRuby: Sorting on mapreduce

I am currently trying to do a simple mapreduce over some documents stored in MongoDB. I use
map = BSON::Code.new "function() { emit(this.userid, 1); }"
for the mapping and
reduce = BSON::Code.new "function(key, values) {
var sum = 0;
values.forEach(function(value) {
sum += value;
});
return sum;
}"
for the reduction. This works fine when I call map_reduce the following way:
output = col.map_reduce(map, reduce, # col is the collection in mongodb, e.g. db.users
{
:out => {:inline => true},
:raw => true
}
)
Now to the real question: how can I use the above call to map_reduce to enable sorting? The manual says that I must use sort and an array of [key, direction] pairs. I guessed the following should work, but it doesn't:
output = col.map_reduce(map, reduce,
{
:sort => [["value", Mongo::ASCENDING]],
:out => {:inline => true},
:raw => true
}
)
Do I have to choose another datatype? The option also doesn't work (same error) when using an empty [], although the manual says that is the default for the option. Unfortunately, the error message from MongoDB doesn't help much:
/usr/lib/ruby/gems/1.9.1/gems/mongo-1.3.1/lib/mongo/db.rb:506:in `command': Database command 'mapreduce' failed: {"assertion"=>"sort has to be blank or an Object", "assertionCode"=>13609, "errmsg"=>"db assertion failure", "ok"=>0.0} (Mongo::OperationFailure)
from /usr/lib/ruby/gems/1.9.1/gems/mongo-1.3.1/lib/mongo/collection.rb:576:in `map_reduce'
from ./mapreduce.rb:26:in `<main>'
If you need the full runnable code, please say so in the comments. I exclude it for now as it only contains the initialization of a connection to mongodb and initialization of the collection col by querying a database.
Use a BSON::OrderedHash and it will work:
sort_order = BSON::OrderedHash.new
sort_order["value"] = Mongo::ASCENDING

output = col.map_reduce(map, reduce,
{
:sort => sort_order,
:out => {:inline => true},
:raw => true
}
)
