Following is the input set given.
1009 2000
1009 2001
1002 2002
1003 2002
Each line represents one group, Number represents ID of the member in the group. Problem is to choose minimum number of people which re-presents complete given set. Only one member should be choose from each Group. 2-tuple members will not repeated. But members can be part of more than one group.
So in this example answer be 1009 and 2002 which represents the sets. 1009 is chosen because it is representing two team and same is the case for 2002.
I am looking for what algorithm can be used to solve this problem.
Another example:
1009 2000
1009 2001
1002 2002
1003 2002
1004 2003
Answer can be { 1009 , 2002, 1004} or { 1009, 2002, 2003}
Actually, the example given by Sodved shows, that I was wrong. It is not solved by the edge cover, as that still leaves the problem of selecting the actual vertices.
You left some details out, so I'm making the following assumptions. Let me know if they're wrong.
There are exactly two numbers per line
No number that appears first on some line also appears second on some other line
So if you think of each number as a vertex in a graph, and each line of input as an edge between two vertices, what you've got is a bipartite graph--a set of "first numbers" and a set of "second numbers," and edges between them. Now, break up the graph into each of its connected components. In each connected component, you either have to pick all the "first numbers" or all the "second numbers." So for each connected component, pick whichever of those two options is a smaller set.
If anyone's intersted, here's some clumsy inefficent perl code which seems to solev the problem. Takes a LONG... time with a large data set. I'm sure things could be done better, especially he generation of the index permutations (sequences).
#!/usr/bin/perl
# http://stackoverflow.com/questions/6689147/
use strict;
use warnings;
# Return array of arrays with every possible sequence of 0..n-1
sub sequences($);
sub sequences($)
{
my $n = $_[0];
my #ret;
if( $n > 0 )
{
for( my $i=0; $i<$n; $i++ )
{
my #a = sequences( $n-1 );
foreach my $br (#a)
{
my #b = #$br;
splice( #b, $i, 0, $n-1 );
push( #ret, \#b );
}
}
}
else
{
#ret = ( [] );
}
return #ret;
} # END sequences
# Remove elements from set which are covered by later elements in set
sub stripset($$)
{
my( $data, $set ) = #_;
my $strip = 0;
my %cover;
for( my $i=0; $i<scalar(#$set); $i++ )
{
my $covered;
for( my $j=scalar(#$set)-1; $j>$i; $j-- )
{
if( $data->{$set->[$j]}->{$set->[$i]}
&& !$cover{$set->[$i]} )
{
$covered = 1;
$cover{$set->[$j]} = 1;
last;
}
}
if( $covered )
{
$strip = $i+1;
}
else
{
last;
}
}
if( $strip )
{
splice( #$set, 0, $strip );
}
} # END stripset
# Load input
my %links;
while( my $line = <STDIN> )
{
if( $line =~ /^\s*(\S+)\s+(\S+)\s*$/ )
{
$links{$1}->{$2} = 1;
$links{$2}->{$1} = 1;
}
else
{
warn "INVALID INPUT LINE: $line";
}
}
my #elems = keys(%links);
my #minset = #elems;
foreach my $seq ( sequences( scalar(#elems) ) )
{
my #set = map( $elems[$_], #$seq );
#print "TEST set: " . join( ' ', #set ) . "\n";
stripset( \%links, \#set );
#print "STRP set: " . join( ' ', #set ) . "\n";
if( scalar(#set) < scalar(#minset) )
{
#minset = #set;
}
}
print "Shortest set: " . join( ' ', #minset ) . "\n";
Just want to notice that this requirement
Only one member should be choose from each Group.
doesn't make much sense. If you enforce it, this simple problem
1 2
2 3
3 1
has no solutions.
Related
In EMEditor, is there a way to get the number of occurrences of a "find in files" search per file? In other words, it finds 10,000 "hits" across 25 files, I'd like to know that 1200 where in file1 etc.
Notepad++ does a great job of this by allowing you to collapse the results by file and showing a summary for each, but I haven't seen a way to get the information in EMEditor.
After Find in Files, you can run this macro while the results document is active. Save this code as, for instance, statistics.jsee, and then select this file from Select... in the Macros menu. Finally, do Find in Files, and select Run in the Macros menu while the results document is active.
// Creates statistics from Find in Files Results.
// 2020-06-27
Redraw = false;
sOutput = "";
y = 1;
yMax = document.GetLines();
for( ;; ) {
document.selection.SetActivePoint( eePosLogical, 1, y++ );
document.selection.Mode = eeModeStream | eeModeKeyboard;
bFound = document.selection.Find("\\(\\d+?\\)\\:",eeFindNext | eeFindReplaceCase | eeFindReplaceRegExp,0);
document.selection.Mode = eeModeStream;
if( !bFound ) {
break;
}
sFile = document.selection.Text;
n = sFile.lastIndexOf("(");
sFile = sFile.substr( 0, n );
nCount = 1;
for( ;; ) {
document.selection.SetActivePoint( eePosLogical, 1, y );
sLine = document.GetLine( y );
if( sLine.length > sFile.length && sLine.substr( 0, sFile.length ) == sFile ) {
++nCount;
++y;
}
else {
sOutput += sFile + "\t" + nCount + "\n";
break;
}
}
}
document.selection.Mode = eeModeStream;
Redraw = true;
editor.NewFile();
document.write( sOutput );
editor.ExecuteCommandByID(4471); // switch to TSV mode
I was playing around with parallel sorting tonight.
creating sort file
naive-sort ...
1000000
23.61265496
partial-hyper-sort ...
4
7.4924575
simple-hyper-sort ...
1000000
141.7945921
naive-hyper-sort ...
1000000
23.5756172
Two things stand out.
a) naive-hyper-sort is just as fast as ordinary sort
b) The sorting in partial-hyper-sort is 66% faster than ordinary sort.
My problem: partial-hyper-sort is exactly that: "partial". It returns (on my system) 4 sublists, but you want of course one. My attempt to merge them into one (simple-hyper-sort) is an order of magnitude slower than the whole sorting!
So how do I get this faster? And if someone can explain why naive-hyper-sort is not faster than naive-sort, bonus points and a cookie (seriously, a literal cookie).
create-sortfile
unless "tosort.txt".IO.e;
my $start = DateTime.now;
say "naive-sort ...";
say naive-sort.elems;
say DateTime.now - $start;
$start = DateTime.now;
say "partial-hyper-sort ...";
say partial-hyper-sort.elems;
say DateTime.now - $start;
$start = DateTime.now;
say "simple-hyper-sort ...";
say simple-hyper-sort.elems;
say DateTime.now - $start;
$start = DateTime.now;
say "naive-hyper-sort ...";
say naive-hyper-sort.elems;
say DateTime.now - $start;
sub create-sortfile
{
say "creating sort file";
my $to-sort = "tosort.txt".IO.open(:w);
$to-sort.say( ( 10_000 .. 99_999 ).pick )
for ( 1 .. 1_000_000 );
$to-sort.close;
}
sub simple-hyper-sort
{
my $to-sort = "tosort.txt".IO.open( :r );
my $lines = $to-sort.lines;
my $degrees = $*KERNEL.cpu-cores;
my $batch = $lines.elems div $degrees;
my #parts = $lines.batch( $batch ).hyper( :batch(1) ).map({ .sort });
my #index = 0 xx $degrees;
return gather loop
{
my $smallest = Inf;
my $smallest-index = -1;
my $smallest-degree = -1;
for ^$degrees -> $degree
{
my $index = #index[$degree];
if ( $index < $batch )
{
my $value = #parts[$degree;$index];
if $value < $smallest
{
$smallest = $value;
$smallest-index = $index;
$smallest-degree = $degree;
}
}
}
last if $smallest-index < 0;
#index[$smallest-degree]++;
take $smallest;
}
}
sub partial-hyper-sort
{
my $to-sort = "tosort.txt".IO.open( :r );
my $lines = $to-sort.lines;
my $degrees = $*KERNEL.cpu-cores;
my $batch = $lines.elems div $degrees;
my #parts = $lines.batch( $batch ).hyper( :batch(1) ).map({ .sort });
}
multi sub naive-hyper-sort
{
my $to-sort = "tosort.txt".IO.open( :r );
my $lines = $to-sort.lines;
my $degrees = $*KERNEL.cpu-cores;
my $batch = $lines.elems div $degrees;
$lines.hyper( :$batch, :$degrees ).sort;
}
sub naive-sort {
my $to-sort = "tosort.txt".IO.open( :r );
$to-sort.lines.sort;
}
Using .hyper and .race only results in a speedup if there is a parallel implementation of the operation that follows. At the time of writing, there is not a parallel sort implementation in Rakudo, which means that it will fall back to using the regular sort implementation. So, this answers why native-hyper-sort doesn't come out faster right now (however it almost certainly will in the future).
The idea in simple-hyper-sort is along the right lines: break the data up into sublists, sort the sublists, and then merge them. We can therefore parallelize the sorting of the sublists. As you've observed, this achieving a win is dependent on the merge operation itself being fast enough, and so we'd need to carefully optimize that.
It's much easier to write a tight (not to mention correct!) merge operation if it only needs to merge two sublists. Thus, we need to structure the problem in a way that gives us that. This points to a different approach:
Break the list in half
start a task to sort each half
await the two tasks
Merge the results of the two tasks
Note that step 2 involves recursion. We stop recursing when the size of a partition is too small, and use the built-in sort on such partitions. (We can choose to define "too small" by dividing the input list size by the number of CPU cores, along the lines of your example.)
Thus we get a solution like this:
sub parallel-merge-sort {
my $to-sort = "tosort.txt".IO.open( :r );
my $lines = $to-sort.lines;
return do-sort $lines, ceiling($lines.elems / $*KERNEL.cpu-cores);
sub do-sort(#in, $limit) {
if #in.elems < $limit {
#in.sort
}
else {
my $pivot = #in.elems div 2;
merge |await
(start do-sort #in[0..$pivot], $limit),
(start do-sort #in[$pivot^..#in.end], $limit)
}
}
sub merge(#a, #b) {
my #result;
my int $a-idx = 0;
my int $a-elems = +#a;
my int $b-idx = 0;
my int $b-elems = +#b;
my int $r-idx = 0;
while $a-idx < $a-elems && $b-idx < $b-elems {
my $a := #a[$a-idx];
my $b := #b[$b-idx];
if $a before $b {
$a-idx++;
#result[$r-idx++] := $a;
}
else {
$b-idx++;
#result[$r-idx++] := $b;
}
}
if $a-idx < $a-elems {
#result[$r-idx++] := $_ for #a[$a-idx..*];
}
elsif $b-idx < $b-elems {
#result[$r-idx++] := $_ for #b[$b-idx..*];
}
return #result;
}
}
I didn't spend terribly long optimizing this (haven't profiled, etc.), but did take care to use natives and binding in order to reduce allocations. On My Machine, this does give a speedup over the serial sorting, however.
One other easy speedup we can get on this - at the cost of a tad more complexity in the code - comes from realizing that we don't need to slice the input in do-sort until the point that we actually need to send it to the built-in sort:
sub do-sort(#in, $limit, $from = 0, $to = #in.end) {
my $elems = $to - $from;
if $elems < $limit {
#in[$from..$to].sort
}
else {
my $pivot = $from + $elems div 2;
merge |await
(start do-sort #in, $limit, $from, $pivot),
(start do-sort #in, $limit, $pivot + 1, $to)
}
}
Which saves some work; by this point, I measure a factor of two speedup on the machine I'm testing it on, which isn't amazing, but given we've an enforced serial O(n) step, and a bunch more parallelized O(n) steps, over the serial sort algorithm, it's perhaps not so disappointing after all.
Last week I decided to give a try to Perl6 and started to reimplement one of my program.
I have to say, Perl6 is so the easy for object programming, an aspect very painfull to me in Perl5.
My program have to read and store big files, such as whole genomes (up to 3 Gb and more, See example 1 below) or tabulate data.
The first version of the code was made in the Perl5 way by iterating line by line ("genome.fa".IO.lines). It was very slow and unsable for a correct execution time.
my class fasta {
has Str $.file is required;
has %!seq;
submethod TWEAK() {
my $id;
my $s;
for $!file.IO.lines -> $line {
if $line ~~ /^\>/ {
say $id;
if $id.defined {
%!seq{$id} = sequence.new(id => $id, seq => $s);
}
my $l = $line;
$l ~~ s:g/^\>//;
$id = $l;
$s = "";
}
else {
$s ~= $line;
}
}
%!seq{$id} = sequence.new(id => $id, seq => $s);
}
}
sub MAIN()
{
my $f = fasta.new(file => "genome.fa");
}
So after a little bit of RTFM, I changed for a slurp on the file, a split on the \n which I parsed with a for loop. This way I managed to load the data in 2 min. Much better but not enough. By cheating, I mean by removing a maximum of \n (Example 2), I decreased the execution time to 30 seconds. Quite good, but not totaly satisfied, by this fasta format is not the most used.
my class fasta {
has Str $.file is required;
has %!seq;
submethod TWEAK() {
my $id;
my $s;
say "Slurping ...";
my $f = $!file.IO.slurp;
say "Spliting file ...";
my #lines = $f.split(/\n/);
say "Parsing lines ...";
for #lines -> $line {
if $line !~~ /^\>/ {
$s ~= $line;
}
else {
say $id;
if $id.defined {
%!seq{$id} = seq.new(id => $id, seq => $s);
}
$id = $line;
$id ~~ s:g/^\>//;
$s = "";
}
}
%!seq{$id} = seq.new(id => $id, seq => $s);
}
}
sub MAIN()
{
my $f = fasta.new(file => "genome.fa");
}
So RTFM again and I discovered the magic of Grammar. So new version and an execution time of 45 seconds whatever the fasta format used. Not the fastest way but more elegant and stable.
my grammar fastaGrammar {
token TOP { <fasta>+ }
token fasta {<.ws><header><seq> }
token header { <sup><id>\n }
token sup { '>' }
token id { <[\d\w]>+ }
token seq { [<[ACGTNacgtn]>+\n]+ }
}
my class fastaActions {
method TOP ($/){
my #seqArray;
for $<fasta> -> $f {
#seqArray.push: seq.new(id => $f.<header><id>.made, seq => $f<seq>.made);
}
make #seqArray;
}
method fasta ($/) { make ~$/; }
method id ($/) { make ~$/; }
method seq ($/) { make $/.subst("\n", "", :g); }
}
my class fasta {
has Str $.file is required;
has %seq;
submethod TWEAK() {
say "=> Slurping ...";
my $f = $!file.IO.slurp;
say "=> Grammaring ...";
my #seqArray = fastaGrammar.parse($f, actions => fastaActions).made;
say "=> Storing data ...";
for #seqArray -> $s {
%!seq{$s.id} = $s;
}
}
}
sub MAIN()
{
my $f = fasta.new(file => "genome.fa");
}
I think that I found good solution to handle these kind of big files, but performances are still under those of Perl5.
As a newbie in Perl6, I would be interested to know if there is better ways to deal with big data or if there is some limitation due to the Perl6 implementation ?
As a newbie in Perl6, I would ask two questions :
Is there other Perl6 mechanisms that I'm not aware yet, or not yet
documented, for storing huge data from a file (like my genomes) ?
Did I reach the maximum performances for the current version of
Perl6 ?
Thanks for reading !
Fasta Example 1 :
>2L
CGACAATGCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCCATATTATAGGGAGAAATATG
ATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCTCTTTGATTTTTTGGCAACCCAAAATGGTGGCGGATGAACGAGAT
...
>3R
CGACAATGCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCATTTTCTCTCCCATATTATAGGGAGAAATATG
ATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCTCTTTGATTTTTTGGCAACCCAAAATGGTGGCGGATGAACGAGAT
...
Fasta example 2 :
>2L
GACAATGCACGACAGAGGAAGCAGAACAGATATTTAGATTGCCTCTCAT...
>3R
TAGGGAGAAATATGATCGCGTATGCGAGAGTAGTGCCAACATATTGTGCT...
EDIT
I applied advises of #Christoph and #timotimo and test with code:
my class fasta {
has Str $.file is required;
has %!seq;
submethod TWEAK() {
say "=> Slurping / Parsing / Storing ...";
%!seq = slurp($!file, :enc<latin1>).split('>').skip(1).map: {
.head => seq.new(id => .head, seq => .skip(1).join) given .split("\n").cache;
}
}
}
sub MAIN()
{
my $f = fasta.new(file => "genome.fa");
}
The program finished in 2.7s, which is so great !
I also tried this code on the wheat genome (10 Gb). It finished in 35.2s.
Perl6 is not so slow finally !
Big Thank for the help !
One simple improvement is to use a fixed-width encoding such as latin1 to speed up character decoding, though I'm not sure how much this will help.
As far as Rakudo's regex/grammar engine is concerned, I've found it to be pretty slow, so it might indeed be necessary to take a more low-level approach.
I did not do any benchmarking, but what I'd try first is something like this:
my %seqs = slurp('genome.fa', :enc<latin1>).split('>')[1..*].map: {
.[0] => .[1..*].join given .split("\n");
}
As the Perl6 standard library is implemented in Perl6 itself, it is sometimes possible to improve performance by just avoiding it, writing code in an imperative style such as this:
my %seqs;
my $data = slurp('genome.fa', :enc<latin1>);
my $pos = 0;
loop {
$pos = $data.index('>', $pos) // last;
my $ks = $pos + 1;
my $ke = $data.index("\n", $ks);
my $ss = $ke + 1;
my $se = $data.index('>', $ss) // $data.chars;
my #lines;
$pos = $ss;
while $pos < $se {
my $end = $data.index("\n", $pos);
#lines.push($data.substr($pos..^$end));
$pos = $end + 1
}
%seqs{$data.substr($ks..^$ke)} = #lines.join;
}
However, if the parts of the standard library used has seen some performance work, this might actually make things worse. In that case, the next step to take would be adding low-level type annotations such as str and int and replacing calls to routines such as .index with NQP builtins such as nqp::index.
If that's still too slow, you're out of luck and will need to switch languages, eg calling into Perl5 by using Inline::Perl5 or C using NativeCall.
Note that #timotimo has done some performance measurements and wrote an article about it.
If my short version is the baseline, the imperative version improves performance by 2.4x.
He actually managed to squeeze a 3x improvement out of the short version by rewriting it to
my %seqs = slurp('genome.fa', :enc<latin-1>).split('>').skip(1).map: {
.head => .skip(1).join given .split("\n").cache;
}
Finally, rewriting the imperative version using NQP builtins sped things up by a factor of 17x, but given potential portability issues, writing such code is generally discouraged, but may be necessary for now if you really need that level of performance:
use nqp;
my Mu $seqs := nqp::hash();
my str $data = slurp('genome.fa', :enc<latin1>);
my int $pos = 0;
my str #lines;
loop {
$pos = nqp::index($data, '>', $pos);
last if $pos < 0;
my int $ks = $pos + 1;
my int $ke = nqp::index($data, "\n", $ks);
my int $ss = $ke + 1;
my int $se = nqp::index($data ,'>', $ss);
if $se < 0 {
$se = nqp::chars($data);
}
$pos = $ss;
my int $end;
while $pos < $se {
$end = nqp::index($data, "\n", $pos);
nqp::push_s(#lines, nqp::substr($data, $pos, $end - $pos));
$pos = $end + 1
}
nqp::bindkey($seqs, nqp::substr($data, $ks, $ke - $ks), nqp::join("", #lines));
nqp::setelems(#lines, 0);
}
I am asked to do the perl program to find a value(from user input) in array. If matched "its ok". If not matched, then check within the value in the index[0] to index[1] ... index[n]. So then if the value matched to the between two elements then report which is near to these elements might be index[0] or index[1].
Let you explain.
Given array : 10 15 20 25 30;
Get the value from user : 14 (eg.)
Hence 14 matched with in the two elements that is 10(array[0]) - 15(array[1])
Ultimately the check point is do not use more than one for loop and never use the while loop. You need to check one for loop and many of if conditions.
I got the output by which I did here is:
use strict;
use warnings;
my #arr1 = qw(10 15 20 25 30);
my $in = <STDIN>;
chomp($in);
if(grep /$in/, #arr1)
{ } #print "S: $in\n"; }
else
{
for(my $i=0; $i<scalar(#arr1); $i++)
{
my $j = $i + 1;
if($in > $arr1[$i] && $in < $arr1[$j])
{
#print "SN: $arr1[$i]\t$arr1[$j]\n";
my ($inc, $dec) = "0";
my $chk1 = $arr1[$i] + 1;
AGAIN1:
if($in == $chk1)
{ }
else
{ $chk1++; $inc++; goto AGAIN1; }
my $chk2 = $arr1[$j] - 1;
AGAIN2:
if($in == $chk2){ }
else
{ $chk2--; $dec++; goto AGAIN2; }
if($inc > $dec)
{ print "Matched value nearest to $arr1[$j]\n"; }
elsif($inc < $dec)
{ print "Matched value nearest to $arr1[$i]\n"; }
}
}
}
However my question is there a way in algorithm?. Hence if someone can help on this one and it would be appreciated.
Thanks in advance.
You seem determined to make this as complicated as possible :-)
Your specification isn't completely clear, but I think this does what you want:
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
my #array = qw[10 15 20 25 30];
chomp(my $in = <STDIN>);
if ($in < $array[0]) {
say "$in is less than first element in the array";
exit;
}
if ($in > $array[-1]) {
say "$in is greater than last element in the array";
exit;
}
for (0 .. $#array) {
if ($in == $array[$_]) {
say "$in is in the array";
exit;
}
if ($in < $array[$_]) {
if ($in - $array[$_ - 1] < $array[$_] - $in) {
say "$in is closest to $array[$_ - 1]";
} else {
say "$in is closest to $array[$_]";
}
exit;
}
}
say "Shouldn't get here!";
Using the helper functions any and reduce from the core module List::Util and the built in abs.
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/reduce any/;
my #arr1 = qw(10 15 20 25 30);
chomp(my $in = <STDIN>);
if (any {$in == $_} #arr1) {
print "$in is in the array\n";
}
else {
my $i = reduce { abs($in - $arr1[$a]) > abs($in - $arr1[$b]) ? $b : $a} 0 .. $#arr1;
print "$in is closest to $arr1[$i]\n";
}
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
PHP: “Notice: Undefined variable” and “Notice: Undefined index”
i am getting "ranksection" array at run time and after implementing ksort on "ranksection" i wanna move its data upward on null index as i am printing ranksection before moving its data upward if there were any free array i am successfully getting what i want but it also giving error "Undefined Index" i dont know why my code is,
$sortvar = count($ranksection);
$seqnum = 0;
for ($var = 0; $var <= $sortvar; $var++) {
if ($ranksection[$var] != null) {
$sequence[$seqnum] = $ranksection[$var];
$seqnum++;
}
}
print_r($sortvar);
print_r($ranksection);
print_r($sequence);
the result is,
3
Array ( [1] => Self Introduction [2] => Experience in Econometrics and multivariate S [3] => Experience )
Array ( [0] => Self Introduction [1] => Experience in Econometrics and multivariate S [2] => Experience )
Hopes for your suggestions
See your print_r section of second array it starts with index 1 and your $var assigned to 0.
Now here you are trying to access the 0th index. that is why you're getting this error.
Try to use foreach
foreach($ranksection as $key => $value ) {
if ($ranksection[$key] != null) {
$sequence[$seqnum] = $ranksection[$key];
$seqnum++;
}
}
Do this as the condition for your for loop:
for ($var = 0; $var <= $sortvar - 1; $var++) {
The -1 is important since arrays start from 0 and go the length of the array, minus one.
You ought to be using count()-1 in your for loop:
$sortvar = count($ranksection) -1;
$seqnum = 0;
for ($var = 0; $var <= $sortvar; $var++) {
Or, use less than (without less than equal to) as the operator:
for ($var = 0; $var < $sortvar; $var++) {
You're going past the end of your array because:
$sortvar = count($ranksection); // This is 4
Array indexs start at 0, but count returns the number where 1 is the first item, not 0. Do this to fix it:
$sortvar = count($ranksection) - 1;
Or change <= to <
for ($var = 0; $var < $sortvar; $var++) {
if ($ranksection[$var] != null) {
$sequence[$seqnum] = $ranksection[$var];
$seqnum++;
}
}