How to sort strings using two substring equality conditions? - sorting

I have a list of strings with the following format:
('group1-1', 'group1-2','group1-9', 'group2-1','group2-2', 'group2-9','group1-10', 'group2-10' )
I need them to be sorted as below:
group wise first and then number wise.
('group1-1', 'group1-2','group1-9','group1-10', 'group2-1','group2-2', 'group2-9', 'group2-10' )
I've written following code, but it's not working as expected:
a comparator that sorts based on the group and if the groups match, it sorts based on the number.
my #list = ('group1-1', 'group1-2','group1-9',
'group2-1','group2-2', 'group2-9','group1-10', 'group2-10' );
#list = sort compare #list;
for (#list){
print($_."\n");
}
sub compare{
my $first_group, $first_num = get_details($a);
my $second_group, $second_num = get_details($b);
if($first_group < $second_group){
return -1;
} elsif($first_group == $second_group){
if ( $first_num < $second_num) {
return -1;
} elsif ( $first_num == $second_num ) {
return 0;
} else {
return 1;
}
} else{
return 1;
}
}
sub get_details($){
my $str= shift;
my $group = (split /-/, $str)[0];
$group =~ s/\D//g;
my $num = (split /-/, $str)[1];
$num =~ s/\D//g;
return $group, $num;
}

You could use a Schwartzian transform:
use warnings;
use strict;
my #list = ('group1-1', 'group1-2','group1-9',
'group2-1','group2-2', 'group2-9','group1-10', 'group2-10' );
#list = map { $_->[0] }
sort { $a->[1] cmp $b->[1] or $a->[2] <=> $b->[2] }
map { [$_, split /-/] }
#list;
for (#list) {
print($_."\n");
}
Prints:
group1-1
group1-2
group1-9
group1-10
group2-1
group2-2
group2-9
group2-10

There's a little detail with the data here that can lead to a quiet bug. When you use the pre-hyphen substring for sorting (group1 etc), it has both letters and numbers so when sorted lexicographically it may be wrong for multi-digit numbers. For example
group1, group2, group10
is sort-ed (by default cmp) into
group1
group10
group2
What is wrong, I presume.
So inside sorting we need to break the groupN into group and N, and sort numerically by N.
use warnings;
use strict;
use feature 'say';
my #list = ('group1-1', 'group1-2','group1-9',
'group2-1','group2-2', 'group2-9',
'group1-10', 'group2-10',
'group10-2', 'group10-1' # Added some 'group10' data
);
# Break input string into: group N - N (and sort by first then second number)
#list =
map { $_->[0] }
sort { $a->[2] <=> $b->[2] or $a->[4] <=> $b->[4] }
map { [ $_, /[0-9]+|[a-zA-Z]+|\-/g ] }
#list;
say for #list;
The regex extracts both numbers and words from the string, for sorting. But if that lone substring is always indeed the same (group) then we only ever sort by numbers and can use /[0-9]+/g, and compare numerically arrayref elements at indices 1 and 2.
Prints
group1-1
group1-2
group1-9
group1-10
group2-1
group2-2
group2-9
group2-10
group10-1
group10-2

I'd make sure the strings in the list follows the pattern (\S+\d+-\d+) and then use cmp for the string comparison part and <=> for the numbers:
sub compare {
if( $a =~ /(\S+)(\d+)-(\d+)/ ) {
my($A1,$A2,$A3) = ($1,$2,$3);
if( $b =~ /(\S+)(\d+)-(\d+)/ ) {
my($B1,$B2,$B3) = ($1,$2,$3);
return ($A1 cmp $B1) || ($A2 <=> $B2) || $A3 <=> $B3;
}
}
$a cmp $b; # fallback if a string doesn't follow the pattern
};

Natural sort
What you want is called a "natural sort".
use Sort::Key::Natural qw( natsort );
my #sorted = natsort #unsorted;
It can also be performed in-place.
use Sort::Key::Natural qw( natsort_inplace );
natsort_inplace #array;
Key sort
For when you want more control.
use Sort::Key::Multi qw( uukeysort );
my #sorted = uukeysort { /(\d+)/g } #unsorted;
or
use Sort::Key::Multi qw( uukeysort_inplace );
uukeysort_inplace { /(\d+)/g } #array;
Without modules (Unoptimized)
my #sorted =
sort {
my ($ag, $an) = $a =~ /(\d+)/g;
my ($bg, $bn) = $b =~ /(\d+)/g;
$ag <=> $bg || $an <=> $bn
}
#unsorted;
Without modules (Schwartzian transform)
This avoids repeating the same work. Instead of extracting the info 2*N*log2(N) times, it only extracts it N times.
my #sorted =
map $_->[0],
sort { $a->[1] <=> $b->[1] || $a->[2] <=> $b->[2] }
map [ $_, /(\d+)/g ],
#unsorted;
Without modules (GRT)
An optimization of ST.
my #sorted =
map substr($_->[0], 8),
sort
map pack('NNa*', /(\d+)/g, $_),
#unsorted;

Related

Appropriate way to return an arrayref from a sub with optional sorting in Perl version 5.20

I try to write a subroutine under Perl 5 version 5.20, that creates a large directory list stored in an array. The subroutine returns the result as an arrayref. For convenience reasons I want the have the option to sort the result.
#!/usr/bin/env perl
use v5.20;
use warnings;
use strict;
use File::Slurp qw(read_dir);
use Time::HiRes;
use feature qw(signatures);
no warnings 'once';
no warnings 'experimental';
no warnings 'experimental::signatures';
my $PATH='/net/dbfs/GRM-RS/Flight-Campaigns/2021-08-23.Ram-Head-i-22.SE-01/cam/MM010259/iiq/';
sub fsReadDir($base, $sort, $mode = 1) {
$base //= '.'; # Base path default is the current path
$sort //= 0; # Flag for array sorting of the result
my #res=read_dir($base);
if ($sort) {
return [sort(#res)] if $mode == 1;
if ($mode == 2) {
#res = sort(#res);
return \#res;
}
} else {
return \#res;
}
}
sub testSorting($sort, $mode, $max = 1000) {
my $start = [Time::HiRes::gettimeofday()];
my $count = 0;
for my $ix (0..$max) {
my $array = fsReadDir($PATH, $sort, $mode );
$count = #$array;
}
my $end = time();
my $dif = Time::HiRes::tv_interval($start);
print "SORT: $sort MODE: $mode COUNT: $count TIME: $dif s\n"
}
testSorting(0, 1);
testSorting(1, 1);
testSorting(1, 2);
Results
/usr/bin/env perl "test-array.pl"
SORT: 0 MODE: 1 COUNT: 14861 TIME: 6.882694 s
SORT: 1 MODE: 1 COUNT: 14861 TIME: 9.131504 s
SORT: 1 MODE: 2 COUNT: 14861 TIME: 8.622628 s
What is the effective way to sort the array directly at the return level?
If you insist on sorting out the sorting business in the return statement itself can use a ternary
return $sort ? [ sort #res ] : \#res;
This may be all well and clear enough in simple cases.
However, I find it clearer to first deal with cases and options and then return the result
#res = sort #res if $sort;
if ($mode == 1) { ... } # modes given in the question do nearly the same,
elsif ($mode == 2) { ... } # but imagine different processing based on value
...
return \#res;
Also, sorting in place should be a little more efficient.
If this were about efficiency then you'd want to benchmark different approaches, and under realistic circumstances. For one, it may all get blown out of the water by reading a large directory, when one may not be able to tell any performance difference in how exactly the return is constructed.
So I'd go for clarity, until it is clearly seen that the choice does affect performance.

Are single element lists allowed in perl?

I'm trying to dynamically create a list of values from an AJAX request (using Catalyst), like this:
my #listofvalues = #{$params->{parameter_with_many_values}};
Then I loop through the list to make database insertions (one for each value). Since I need to loop through various lists like the one above, I need to access the index of the list. I am currently doing it like this:
foreach my $key (0 .. $#listofvalues){
$table_model->new({
field1 => $listofvalues[$key],
field2 => $x,
field3 => $another_listofvalues[$key]
field4 => $yet_another_listofvalues[$key]
});
}
This seems to work fine when two or more elements are received in the request. Whenever a single element is received, I get an error like
[error] Caught exception in pbitdb::Controller::Subjects->add "Can't use string ("1") as an ARRAY ref while "strict refs" in use at /home/lioneluranl/svn/pbitdb/pbitdb/script/../lib/pbitdb/Controller/Subjects.pm line 119."
Where, in this case, 1 is the value received and line 119 is the line where #listofvalues is being declared.
Now I've tried several to workaround this issue, but haven't found anything that works both ways (for a single or for various values). Any suggestions?
Yes, single element lists are OK in Perl, as are arrays and references to such arrays.
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
sub new {
print Dumper \#_;
}
my $table_model = 'main';
for my $values ( [ 'a' .. 'c' ],
[ 'd' ]
) {
my $params = { parameter_with_many_values => $values };
my #listofvalues = #{ $params->{parameter_with_many_values} };
my #another_listofvalues = map uc, #listofvalues;
for my $key (0 .. $#listofvalues) {
my $x = rand;
$table_model->new({
field1 => $listofvalues[$key],
field2 => $x,
field3 => $another_listofvalues[$key]
});
}
}
How do you populate $params->{parameter_with_many_values}?
Update
It seems Catalyst::Request should mention that their "safe" parameters should be handled as follows:
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;
for my $params ( { param_with_many_values => 'a' },
{ param_with_many_values => [ 'a' .. 'e' ] },
{ something => 'else' }
) {
my $value_or_values = $params->{param_with_many_values};
my #list_of_values = ref $value_or_values ? #$value_or_values
: defined $value_or_values ? $value_or_values
: ();
print Dumper \#list_of_values;
}
First of all, you are asking about arrays (a type of variable), not lists (a vague term which can take on numerous definitions, none of which are pertinent here).
Yes, you can have an array with only one element.
$ perl -e'my #a = "ele"; CORE::say 0+#a; CORE::say for #a;'
1
ele
That's not the problem. The problem is that
#{$params->{parameter_with_many_values}}
expects
$params->{parameter_with_many_values}
to contain a reference to an array, but it contains 1 instead. It was probably set using
$params->{parameter_with_many_values} = #a; # Assigns number of elements
instead of
$params->{parameter_with_many_values} = \#a;

Sorting hash kv pairs

my %hash =
two => 2,
three => 3,
one => 1,
;
for %hash.sort(*.key)>>.kv -> ($key, $value) {
say "'$key' => '$value'";
}
Is %hash.sort({.key})>>.kv equivalent to above sort?
Why this sort doesn't work without hyper >> hint?
The sort method is returning a List of Pairs.
Since calling .kv on the list returns a list of index, Pair lists, which you don't want; you can't just call .kv. So you have to pull out the key and value from the Pair objects in the list individually by calling the .kv method on each of them, which >>.kv does.
You could also have used .map(*.kv) instead.
The >>.kv syntax allows an implementation to spread the work over multiple threads if it makes sense to do so.
( Currently Rakudo just does the work in a semi-random order to prevent people from using the feature wrong )
There is an alternate way of writing the loop by extracting the attributes using adverbs in the sub-signature:
for %hash.sort -> (:$key, :$value) {
say "'$key' => '$value'";
}
for %hash.sort -> $pair (:$key, :$value) {
say $pair;
say $key === $pair.key and $value === $pair.value; # True␤
}
# :$key is short for :key($key)
for %hash.sort -> (:key($k), :value($v)) {
say "'$k' => '$v'";
}
This can be useful on other objects which don't have a method for creating a list of their public attributes
class C { has $.a; has $.b; has $.c; has $!private-value }
my $c = 5;
my $obj = C.new(:a<A>,:b(1),:$c);
given $obj -> ( :$a, :b($b), :$c) ) {
say "$a $b $c";
}
# ignore $.a by using an unnamed scalar
given $obj -> ( :a($), :$b, :$c ) { ... }
# places any unspecified public attributes in %others
given $obj -> ( :$a, :$b, *%others ) {
.say for keys %others; # c␤
}
# ignores any unspecified attributes
# useful to allow subclasses to add more attributes
# or to just drop any values you don't care about
given $obj -> ( :$a, :$b, *% ) { ... }
# fails because it doesn't handle the public c attribute
# in the sub-signature
given $obj -> ( :$a, :$b ) { ... }
That is only the beginning of what's possible with signatures.
All of the following is also allowed in subroutine and method signatures, optional, and completely overkill for this example.
It is really useful in multi subs and multi methods for restricting the possible candidates.
for 'one' => 1, 1/3
->
# Type is an alias to the object type
::Type Any $_ # Any is the default type requirement
# the public attributes of the object
(
::A-Type Any :key( :numerator( $a ) ),
::B-Type Any :value( :denominator( $b ) ) where $b >= 1,
)
{
my Type $obj = $_; # new variable declared as having the same type
my A-Type $new-a = $a;
my B-Type $new-b = $b;
# could have used $_.^name or .^name instead of Type.^name
# so you don't actually have to add the alias to the signature
# to get the name of the arguments type
say Type.^name, ' ', $_;
say ' ', A-Type.^name, ' ', $a;
say ' ', B-Type.^name, ' ', $b;
}
Pair one => 1
Str one
Int 1
Rat 0.333333
Int 1
Int 3
As to using .sort({.key}), yes that is basically the same thing, as sort accepts anything Callable there.
I would like to point out that you didn't even need to provide an argument to sort because it's default is even smarter than what you gave it.
Perl 6 has many ways of creating and accessing Callable things. So any of the following would have worked:
*.key
{ .key } # { $_.key }
-> $_ { .key } # basically what the previous line turns into
{ $^placeholder-var.key }
sub ($_) { .key }
&a-subroutine-reference # you would have to create the subroutine though
Also since all of the normal operators are actually subroutines, you could use them in other places where you need a Callable. ( I can't think of one that works in that spot though )
&infix:<+> # the subroutines responsible for the numeric addition operator
&[+] # ditto
&prefix:<++>
&postfix:<++>
# etc
As far as I can see, the ony difference between the two versions is use of a block with implicit $_ parameter instead of using a Whatever-Star, so they are indeed equivalent.
This is Perl, so There Is More Than One Way to Do It:
*.key
{ .key }
{ $^arg.key }
-> $arg { $arg.key }
Why this sort doesn't work without hyper >> hint?
sort coerces the hash to a list of pairs, and that's what you'll get:
say %hash.sort(*.key).perl;
# ("one" => "1", "three" => "3", "two" => "2")
To get rid of the pairs, you need to iterate over the list and call .kv on each one:
say %hash.sort(*.key)>>.kv.perl;
# (("one", "1"), ("three", "3"), ("two", "2"))
say %hash.sort(*.key).map(*.kv).perl;
# (("one", "1"), ("three", "3"), ("two", "2"))
You could coerce to Hash before calling .kv:
say %hash.sort(*.key).hash.kv.perl;
# ("one", "1", "three", "3", "two", "2")
but this would of course defeat the purpose of the excercise as hash ordering cannot be relied on.
You may have noticed that you'll get different results depending on how exactly you write the code. If there no trailing .list, what you get is actually a Parcel and not a List, but semantics have not been finalized.
Note that even though the returned objects all perlify with simple parentheses, some are parcels and some are lists, which you can check by calling .WHAT. This is still a work in progress.
Also note the inner parentheses in some of these variants, which you can get rid of with a call to .flat. If you do so, you can use -> $key, $value as signature of your for loop instead of -> ($key, $value) (or, more explicitly, -> $anon ($key, $value)) which uses signature binding to unpack the parcels.
Instead of using .kv, you could use the same approach to unpack the pair objects instead:
for %hash.sort(*.key) -> (:$key, :$value) {
say "'$key' => '$value'";
}

Can I use 'where' inside a for-loop in swift?

Is there also a possibility to use the 'where' keyword in another place then a switch? Can I use it in a for in loop for example?
I have an array with bools, all with a value, can I do something like this:
var boolArray: [Bool] = []
//(...) set values and do stuff
for value where value == true in boolArray {
doSomething()
}
This would be a lot nicer than use an if, so I am wondering if there is a possibility to use where in combination with a for-loop. Ty for your time.
In Swift 2, new where syntax was added:
for value in boolArray where value == true {
...
}
In Pre 2.0 one solution would be to call .filter on the array before you iterate it:
for value in boolArray.filter({ $0 == true }) {
doSomething()
}
A normal for-loop will iterate all elements present in the list. But sometimes we want to iterate only when data satisfy some condition, there we can use where clause with for -loop. It's just a replacement of if condition inside the loop.
For example:
let numbers = [1,2,3,4,5,6,7]
for data in numbers {
if (data % 2 == 0) {
print(data)
}
}
can be rewritten in the simpler way as:
for data in numbers where data % 2 == 0 {
print(data)
}
Yes, you can use "where" clause with for loop.
let arr = [1,2,3,4,5]
for value in arr where value != 0 {
print(value)
}
Considering your example,
var boolArray: [Bool] = []
//(...) set values and do stuff
for value in boolArray where value == true {
doSomething()
}

replacing letters in text (pseudo-code)

I'm making a script to create a username. It should be four letters long; traditionally we've used 3 letters of the last name + 1 of the first name.
If it was already used we manually thought of an alternative.
So if my name is Fred Flinstones we should try FLIF. If this doesnt work; we loop through the name: FLIA, FLIB, FLIC ... FLIZ, FLAA, FLAB, FLAC, ... FLZZ, FAAA, FAAB, ...
The easiest way is to loop through last letters; then make another set of loops through second last letter and loop through last letters; then a set of loops through third last, second last, last; and fourth+third+second+last.
This makes a lot of do while loops nested in eachother + unreadable for other humans + a lot of typing.
I could use a counter per letter but that also doesn't seem elegant
I could try with one counter and then using mod 26 to see how many letters need replacement (but that seems very complex).
Is there some elegant/efficient ways to do this?
Bonus points for first trying to keep the string as 'logically correct' as possible (f.e. keeping the last letter an F for Fred or skipping letters FLIF; FLNF, FLSF, FLTF, ...) .
Not sure if this is what you mean, but if you structure your username-script in the following way (I used PHP as language), you can extend it by adding options with higher fuzz factors while keeping the code readable:
echo findName('FLINTSTONE', 'FRED');
function findName($last, $first) {
for ($fuzzFactor = 0; ; $fuzzFactor++) {
$candidates = fuzzNames($last, $first, $fuzzFactor);
if (empty($candidates)) {
// exhausted
return "sorry, I'm out of options!";
}
foreach ($candidates as $candidate) {
if (isUnique($candidate)) {
return $candidate;
}
}
}
}
function fuzzNames($last, $first, $fuzzFactor) {
switch ($fuzzFactor) {
case 0:
// no fuzz, return first choice
return array(substr($last, 0, 3) . $first[0]);
case 1:
// replace the third letter of the last name
// by the fourth/fifth/.../last letter (FLNF, FLTF, ... , FLEF)
$candidates = array();
for ($i = 3; $i < strlen($last); $i++) {
$candidates[] = substr($last, 0, 2) . $last[$i] . $first[0];
}
return $candidates;
case 2:
// replace the second and third letter of the last name
// by their follow-ups (FINF, FITF, ... , FNEF)
$candidates = array();
for ($i = 2; $i < strlen($last) - 1; $i++) {
for ($j = $i + 1; $j < strlen($last); $j++) {
$candidates[] = $last[0] . $last[$i] . $last[$j] . $first[0];
}
}
return $candidates;
default:
return array();
}
}

Resources