Sort hash by values - sorting

This is not how I populate my hash, but for easier reading, here are its contents. The keys are fixed-length strings:
my %country_hash = (
"001 Sample Name New Zealand" => "NEW ZEALAND",
"002 Samp2 Nam2 Zimbabwe " => "ZIMBABWE",
"003 SSS NNN Australia " => "AUSTRALIA",
"004 John Sample Philippines" => "PHILIPPINES",
);
I want to get the sorted keys based on values. So my expectation:
"003 SSS NNN Australia "
"001 Sample Name New Zealand"
"004 John Sample Philippines"
"002 Samp2 Nam2 Zimbabwe "
What I did:
foreach my $line( sort {$country_hash{$a} <=> $country_hash{$b} or $a cmp $b} keys %country_hash ){
print "$line\n";
}
and also this (I doubted it would sort, but tried anyway):
my @sorted = sort { $country_hash{$a} <=> $country_hash{$b} } keys %country_hash;
foreach my $line(@sorted){
print "$line\n";
}
Neither of them sorted correctly. I hope someone could help.

If you had used warnings, you would have been told that <=> is the wrong operator; it is used for numeric comparison. Use cmp for string comparison instead. Refer to sort.
use warnings;
use strict;
my %country_hash = (
"001 Sample Name New Zealand" => "NEW ZEALAND",
"002 Samp2 Nam2 Zimbabwe " => "ZIMBABWE",
"003 SSS NNN Australia " => "AUSTRALIA",
"004 John Sample Philippines" => "PHILIPPINES",
);
my @sorted = sort { $country_hash{$a} cmp $country_hash{$b} } keys %country_hash;
foreach my $line(@sorted){
print "$line\n";
}
This prints:
003 SSS NNN Australia
001 Sample Name New Zealand
004 John Sample Philippines
002 Samp2 Nam2 Zimbabwe
This also works (without the extra array):
foreach my $line (sort {$country_hash{$a} cmp $country_hash{$b}} keys %country_hash) {
print "$line\n";
}
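To see why the original attempt came out ordered by key rather than by value: under <=>, non-numeric strings such as "ZIMBABWE" are treated as 0, so every comparison is a tie and the "or $a cmp $b" fallback decides the order. A minimal sketch, using a hypothetical shortened hash rather than the asker's data:

```perl
use strict;
use warnings;

# Hypothetical shortened data, not the asker's full hash.
my %country_hash = (
    "001 New Zealand" => "NEW ZEALAND",
    "002 Zimbabwe"    => "ZIMBABWE",
    "003 Australia"   => "AUSTRALIA",
);

my @numeric;
{
    # Silence the "isn't numeric" warnings only for this demonstration.
    no warnings 'numeric';
    # Every value numifies to 0, so <=> always returns 0 and the
    # key comparison after "or" decides the order.
    @numeric = sort { $country_hash{$a} <=> $country_hash{$b} or $a cmp $b }
               keys %country_hash;
}

# cmp compares the values as strings; the key is only a tie-breaker.
my @by_value = sort { $country_hash{$a} cmp $country_hash{$b} or $a cmp $b }
               keys %country_hash;

print "numeric: $_\n" for @numeric;
print "string:  $_\n" for @by_value;
```

Keeping the key as a tie-breaker in the cmp version gives a stable, predictable order when two keys share the same value.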

Related

Comparison between two tab separated files in unix using awk

I've written this code on unix but I am facing the problem as mentioned below.
My Code is:
paste 1.txt 2.txt|
awk ' { FS = "\t " } ; NR == 1 { n = NF/2 }
{for(i=1;i<=n;i++)
if($i!=$(i+n))
{c = c s i; s = "," }
if(c)
{print "Line No. " NR-1 " COLUMN NO " c;
c = "" ; s = "" } } '
Expected Output:
Line No. 2 COLUMN NO 2,3
Line No. 4 COLUMN NO 1,2,3,4
Line No. 6 COLUMN NO 2,3,4,5
Line No. 7 COLUMN NO 1,2,3,4,5
Output that is getting generated:
Line No. 2 COLUMN NO 2,3
Line No. 4 COLUMN NO 1,2,3,4
Line No. 6 COLUMN NO 2,3,4,5
Line No. 7 COLUMN NO 1,2,3,4
Below specified file is space separated. To understand it better I have formatted it this way.
File1:
ID_ID First_name Last_name Address Contact_Number
ID1 John Rock 32, Park Lake, California 2222200000
ID2 Tommy Hill 5322 Otter Lane Middleberge 3333300000
ID3 Leonardo Test Half-Way Pond, Georgetown 4444400000
ID8 Rhyan Bigsh 6762,33 Ave N,St. Petersburg 5555500000
ID50 Steve Goldberg 6762,33 Ave N,St. Petersburg 6666600000
ID60 Steve Goldberg 6666600000
File2:
ID_ID First_name Last_name Address Contact_Number
ID1 John Rock 32, Park Lake, California 2222200000
ID2 Tommy1 Hill1 5322 Otter Lane Middleberge 3333300000
ID3 Leonardo Test Half-Way Pond, Georgetown 4444400000
ID80 Sylvester Stallone 5555500000
ID50 Steve Goldberg 6762,33 Ave N,St. Petersburg 6666600000
ID60 Mark Waugh St. Petersburg 7777700000
ID70 John Smith 8888800000

Efficient use of Perl hash

I'm using a hash to abbreviate state names
%STATEABBRIVATE = ('ALABAMA' => 'AL',
...);
Some of my input sets already have abbreviated state names. Would it be more efficient to use an if defined $STATEABBRIVATE{$state} or to add another 51 matched pairs 'AL'=>'AL' to the hash?
If you want to verify that the state really exists, using AL => 'AL' might be the easiest way.
To keep your code DRY (Don't Repeat Yourself), you can just
my %STATEABBRIVATE = ( ALABAMA => 'AL',
...
);
my @abbrevs = values %STATEABBRIVATE;
@STATEABBRIVATE{@abbrevs} = @abbrevs;
If you're concerned about performance, the bottleneck is probably somewhere else:
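As a small illustration of what that hash slice does, here is a sketch with a hypothetical two-state table (the %abbrev_for name and its contents are assumptions made for the example):

```perl
use strict;
use warnings;

# Hypothetical two-state table for illustration only.
my %abbrev_for = ( ALABAMA => 'AL', ALASKA => 'AK' );

# Hash slice: add each abbreviation as a key that maps to itself.
my @abbrevs = values %abbrev_for;
@abbrev_for{@abbrevs} = @abbrevs;

for my $state (qw( ALABAMA AK TEXAS )) {
    # Full names and abbreviations both resolve; anything else is rejected.
    my $result = exists $abbrev_for{$state} ? $abbrev_for{$state} : '(unknown)';
    print "$state => $result\n";
}
```

After the slice assignment a single lookup both normalizes and validates the input, with no separate branch for already-abbreviated names.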
#! /usr/bin/perl
use warnings;
use strict;
use Benchmark qw{ cmpthese };
use Test::More;
my %hash = qw( Alabama AL Alaska AK Arizona AZ Arkansas AR California CA
Colorado CO Connecticut CT Delaware DE Florida FL
Georgia GA Hawaii HI Idaho ID Illinois IL Indiana IN
Iowa IA Kansas KS Kentucky KY Louisiana LA Maine ME
Maryland MD Massachusetts MA Michigan MI Minnesota MN
Mississippi MS Missouri MO Montana MT Nebraska NE
Nevada NV Ohio OH Oklahoma OK Oregon OR Pennsylvania PA
Tennessee TN Texas TX Utah UT Vermont VT Virginia VA
Washington WA Wisconsin WI Wyoming WY );
$hash{'West Virginia'} = 'WV';
$hash{'South Dakota'} = 'SD';
$hash{'South Carolina'} = 'SC';
$hash{'Rhode Island'} = 'RI';
$hash{'North Dakota'} = 'ND';
$hash{'North Carolina'} = 'NC';
$hash{'New York'} = 'NY';
$hash{'New Mexico'} = 'NM';
$hash{'New Jersey'} = 'NJ';
$hash{'New Hampshire'} = 'NH';
my %larger = %hash;
@larger{ values %hash } = values %hash;
sub def {
my $state = shift;
return defined $hash{$state} ? $hash{$state} : $state
}
sub ex {
my $state = shift;
return exists $hash{$state} ? $hash{$state} : $state
}
sub hash {
my $state = shift;
return $larger{$state}
}
is(def($_), ex($_), "def-ex-$_") for keys %larger;
is(def($_), hash($_), "def-hash-$_") for keys %larger;
done_testing();
cmpthese(-1,
{ hash => sub { map hash($_), keys %larger },
ex => sub { map ex($_), keys %larger },
def => sub { map def($_), keys %larger },
});
Results:
Rate def ex hash
def 27307/s -- -2% -11%
ex 27926/s 2% -- -9%
hash 30632/s 12% 10% --
Both if defined $STATEABBRIVATE{$state} and any hash lookups are going to be constant time (i.e. O(1) operations). In fact, defined() probably uses a hash table lookup behind the scenes anyway. So, my prediction is that the difference in performance is going to be negligible, even with large data sets. This is, at best, an educated guess.
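Speed aside, the two checks can give different answers: exists is true for a key stored with an undef value, while defined is not. A minimal sketch with a hypothetical hash:

```perl
use strict;
use warnings;

# Hypothetical hash: one real entry and one key stored with an undef value.
my %abbrev_for = ( ALABAMA => 'AL', PENDING => undef );

for my $key (qw( ALABAMA PENDING TEXAS )) {
    my $exists  = exists  $abbrev_for{$key} ? 'yes' : 'no';
    my $defined = defined $abbrev_for{$key} ? 'yes' : 'no';
    print "$key: exists=$exists defined=$defined\n";
}
```

For a table whose values are always non-empty strings, as in the state abbreviation case, the two behave the same.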

Number of string value occurrences for distinct another column value

I have a model Counter which returns the following records:
name.....flowers.....counter
vino.....rose.........1
vino.....lily.........1
gaya.....rose.........1
rosi.....lily.........1
vino.....lily.........1
rosi.....rose.........1
rosi.....rose.........1
I want to display in the table like:
name | Rose | Lily |
---------------------
Vino | 1 | 2 |
---------------------
Gaya | 1 | 0 |
---------------------
Rosi | 2 | 1 |
I want to display the count of flowers for each distinct name. I have tried the following and am wondering how I can do it elegantly:
def counter_results
@counter_results= {}
Counter.each do |name|
rose = Counter.where(flower: 'rose').count
lily= Counter.where(flower: 'lily').count
@counter_results['name'] = name
@counter_results['rose_count'] = rose
@counter_results['lily_count'] = lily
end
return @counter_results
end
but with this I don't get the hash values I expect.
This will give you slightly different output, but I think it is probably closer to what you want than what you showed.
You can use the query:
Counter.group([:name, :flowers]).sum(:counter)
To get a result set that looks like:
{ ["vino", "rose"] => 1, ["vino", "lily"] => 2, ["gaya", "rose"] => 1, ["gaya", "lily"] => 0, ... }
And you can do something like this to generate your hash:
def counter_results
@counter_results = {}
Counter.group([:name, :flowers]).sum(:counter).each do |k, v|
@counter_results[k.join("_")] = v
end
@counter_results
end
The resulting hash would look like this:
{
"vino_rose" => 1,
"vino_lily" => 2,
"gaya_rose" => 1,
"gaya_lily" => 0,
...
}
Somebody else may have a better way to do it, but seems like that should get you pretty close.

How to extract a number using regular expression in ruby

I am new to regular expressions and Ruby. Below is the example I started working with:
words= "apple[12345]: {123123} boy 1233 6F74 2AC 28458 1594 6532 1500 D242g
apple[13123]: {123123123} girl Aui817E 9AD453 91321SDF 3423FS 1213FDAS 110FADA4 43ADAC0 1AADS4D8 BASAA24 "
I want to extract boy 1233 6F74 .. to .. D242g into an array.
Similarly, I want to extract girl Aui817E 9AD453 .. to .. 43ADAC0 1AADS4D8 BASAA24 into an array.
I tried the following but could not get it to work. Can someone please help me with this simple exercise?
Thanks in advance.
begin
pattern = /apple\[\d+\]: \{\d+\} (\w) (\d+) (\d+) /
f = pattern.match(words)
puts " #{f}"
end
words.scan(/apple\[\d+\]: \{\d+\}(.+)/).map{|a| a.first.scan(/\S+/)}
or
words.each_line.map{|s| s.split.drop(2)}
Output:
[
["boy", "1233", "6F74", "2AC", "28458", "1594", "6532", "1500", "D242g"],
["girl", "Aui817E", "9AD453", "91321SDF", "3423FS", "1213FDAS", "110FADA4", "43ADAC0", "1AADS4D8", "BASAA24"]
]
array = words.scan(/apple\[\d+\]: {\d+}(.+)/).flatten.map { |line| line.scan(/\w+/) }
({ and } do not need to be escaped in this regex.)
This returns:
[
["boy", "1233", "6F74", "2AC", "28458", "1594", "6532", "1500", "D242g"],
["girl", "Aui817E", "9AD453", "91321SDF", "3423FS", "1213FDAS", "110FADA4", "43ADAC0", "1AADS4D8", "BASAA24"]
]
array[0] gives an array starting with "boy", and array[1] gives an array starting with "girl".

Creating a tree data structure (Has to be native) in Perl to represent a call tree that is located in a external file

Here is an example of the file.
powrup.asm POWER_UP
......EXTERNAL_RAM_ADDRESSING_CHECK powrup.asm:461
......EXRAM powrup.asm:490
......INRAM powrup.asm:540
......OUTPUT_TEST powrup.asm:573
............AD_READ douttst.asm:276
............AD_READ douttst.asm:366
......OUTPUT2_TEST powrup.asm:584
............AD_READ douttst2.asm:253
............AD_READ douttst2.asm:342
......OUTPUT3_TEST powrup.asm:599
............AD_READ douttst3.asm:307
............AD_READ douttst3.asm:398
......INPUT_TEST powrup.asm:614
......PROGRAM_PINS2_INPUT powrup.asm:629
......ARINC_TEST powrup.asm:633
............ARINC_LEVEL_TEST artest.asm:178
..................AD_READ arltst.asm:204
..................AD_READ arltst.asm:250
..................AD_READ arltst.asm:300
..................AD_READ arltst.asm:346
..................AD_READ arltst.asm:396
..................AD_READ arltst.asm:442
............ARINC_READ artest.asm:209
............ARINC_WORD_TXRX_TEST artest.asm:221
..................ARINC_OUT artxrx.asm:207
..................ARINC_READ artxrx.asm:221
............ARINC_READ artest.asm:251
............ARINC_WORD_TXRX_TEST artest.asm:263
..................ARINC_OUT artxrx.asm:207
..................ARINC_READ artxrx.asm:221
......PROGRAM_PINS2_INPUT powrup.asm:640
......PROGRAM_PIN_TEST powrup.asm:642
......PT_RCVR_BITE powrup.asm:645
............AD_READ10 ptbite.asm:225
..................AD_READ adread10.asm:141
............AD_READ10 ptbite.asm:308
..................AD_READ adread10.asm:141
............AD_READ10 ptbite.asm:384
..................AD_READ adread10.asm:141
............AD_READ10 ptbite.asm:467
..................AD_READ adread10.asm:141
............AD_READ10 ptbite.asm:542
..................AD_READ adread10.asm:141
............AD_READ10 ptbite.asm:622
..................AD_READ adread10.asm:141
......PROGRAM_PINS2_INPUT powrup.asm:653
......EXEC_INIT powrup.asm:663
The leading dots represent the call depth. The file name and number after each function name indicate the file and the line it was called from in the parent. I can parse the file. What I am trying to do once I have parsed the file is put the data in an n-ary tree.
I am doing a data coupling and control coupling analysis and have already collected all the set/use data for all the variables in the build. I now need to traverse the tree and, based on the depth, figure out whether there are any use-before-set situations or any set-but-not-used situations. I thought a tree traversal would make the most sense.
Here is an example of the collected data:
$hash_DB{'alt_deviation_evaluation.asm->ALT_STATUS'} = [
'alt_deviation_evaluation.asm',
'ALT_STATUS',
'1.1',
1,
"",
"",
"135,188,202,242",
"130,144"
];
'alt_deviation_evaluation.asm->ALT_STATUS' is the file name and variable name.
'alt_deviation_evaluation.asm', File name
'ALT_STATUS', Variable name
'1.1', Version of the file
1, Indicates the entry has been processed
"", Not used (maybe in future)
"", Not used (maybe in future)
"135,188,202,242", Variable set line numbers for this file/variable
"130,144" Variable use line numbers for this file/variable
I also have an array with all the variable names. Shortened example:
our @vars_list = (
'A429_TX_BUFFER_LENGTH',
'A429_TX_INPUT_BUFFER',
'A429_TX_INT_MASK',
'ABS_ALT_DIFF',
'ACTUAL_ALT',
'ADDRESS_FAIL',
'AD_CONV_FAIL',
'AD_CONV_SIGNAL',
'AD_DATA',
'AD_FAIL',
'AD_STATUS',
'AIR_MODE',
'AIR_MODE_COUNT',
'AIR_MODE_LAST',
'ALPHA_COR_SSM',
'ALPHA_EC_SSM',
'ALPHA_GRAD_SSM',
'ALPHA_LE_SSM',
'ALPHA_LG_SSM',
'ALPHA_MAX_MC_SSM'
);
My biggest hurdle is figuring out the proper data structures and algorithms to accomplish this task.
I figured a depth-first search of an n-ary tree would give me what I want.
Here is my final solution:
#!/usr/local/bin/perl
# !/usr/bin/perl
use Data::Dumper; #!!!
sub Create_Tree;
sub Treverse;
#for my $node (@TREE) {
# print_node($node[0], 1);
#}
#Main
our @TREE;
Create_Tree("call_tree.small_01.txt");
my $str = Dumper @TREE;
$str =~ s/^(\s+)/' 'x(length($1)>>2)/meg;
#print "\n\n=======================================\n$str"; #!!!
#print "\n\n=======================================\n" . (Dumper @TREE); #!!!
#print "Arr = @TREE, SZ = $#TREE\n\n";
Treverse(\@TREE,1);
sub Create_Tree
{
my ($call_tree) = @_;
my @stack;
my ($old_depth, $p_arr) = (0, \@TREE);
open(IN, "< $call_tree" ) or die "Can not open '$call_tree' for input.\n";
for (<IN>)
{
if (m/^(\s*)(\S+)\s*=>\s*(\S+):(\d+)/ or m/^(\s*)(\S+)()()/)
{
my ($depth, $callee_fn, $caller_fn, $ln, $diff) = ($1, $2, $3, $4, 0);
$depth = int(length($depth) / 6);
$diff = $depth - $old_depth;
if ($diff == 1)
{
push @stack, $p_arr;
$p_arr = \@{$$p_arr[$#{$p_arr}]{children}};
}
elsif ($diff < 0)
{
$p_arr = pop @stack while ++$diff <= 0;
}
elsif ($diff > 1)
{
die "Incorrectly formatted call tree:\n $_\n";
}
push @$p_arr, {
caller => $caller_fn,
called_by => $callee_fn,
at_line => $ln
};
$old_depth = $depth;
}
}
close IN;
}
exit;
The output looks like this:
......file1
............file1 101:A
..................XXX.AA 102:AA
........................XXX.AAA 103:AAA
........................XXX.AAB 104:AAB
..............................XXX.AABA 105:AABA
..................XXX.AB 106:AB
........................XXX.ABA 107:ABA
............file1 108:B
..................XXX.BA 109:BA
........................XXX.BAA 110:BAA
........................XXX.BAB 111:BAB
From this call_tree.txt file:
file1
A => file1:101
AA => XXX.AA:102
AAA => XXX.AAA:103
AAB => XXX.AAB:104
AABA => XXX.AABA:105
AB => XXX.AB:106
ABA => XXX.ABA:107
B => file1:108
BA => XXX.BA:109
BAA => XXX.BAA:110
BAB => XXX.BAB:111
Using this subroutine:
sub Treverse
{
my ($p_arr, $level) = @_;
for (my $ind=0; $ind<=$#{$p_arr}; $ind++)
{
print "." x ($level*6);
if ($$p_arr[$ind]{'caller'} ne "") {print "$$p_arr[$ind]{'caller'}" . " " x 4;}
if ($$p_arr[$ind]{'at_line'} ne "") {print "$$p_arr[$ind]{'at_line' }" . ":";}
if ($$p_arr[$ind]{'called_by'} ne "") {print "$$p_arr[$ind]{'called_by'}" . "\n";}
Treverse(\@{$$p_arr[$ind]{children}}, $level +1) if defined $$p_arr[$ind]{children};
}
}
# END of Treverse
Here is how to print the structure you built using Wes's answer.
Once you have processed the data, you end up with something like this:
my @nodes = (
{ name => 'ARINC_TEST', file => 'powrup.asm', line => 633,
children => [
{ name => 'ARINC_LEVEL_TEST', file => 'artest.asm', line => 178,
children => [
{ name => 'AD_READ', file => 'arltst.asm', line => 204 },
{ name => 'AD_READ', file => 'arltst.asm', line => 250 },
{ name => 'AD_READ', file => 'arltst.asm', line => 300 },
{ name => 'AD_READ', file => 'arltst.asm', line => 346 },
{ name => 'AD_READ', file => 'arltst.asm', line => 396 },
{ name => 'AD_READ', file => 'arltst.asm', line => 442 },
],
},
{ name => 'ARINC_READ', file => 'artest.asm', line => 209,
children => [],
},
{ name => 'ARINC_WORD_TXRX_TEST', file => 'artest.asm', line => 221,
children => [
{ name => 'ARINC_OUT', file => 'artxrx.asm', line => 207 },
{ name => 'ARINC_READ', file => 'artxrx.asm', line => 221 },
],
}
]
}
);
The structure is recursive: each children key points to an array reference holding further hashes of the same shape. To print this out, you need recursive code:
for my $node (@nodes) {
print_node($node, 1);
}
sub print_node {
my ($node, $level) = @_;
# the node itself
print "." x ($level*6)
, $node->{name}, " " x 4
, $node->{file}, ":"
, $node->{line}, "\n";
# recurse for children
if(defined $node->{children}) {
for my $child (@{ $node->{children} }) {
print_node($child, $level + 1);
}
}
}
For the data above, the code outputs:
......ARINC_TEST powrup.asm:633
............ARINC_LEVEL_TEST artest.asm:178
..................AD_READ arltst.asm:204
..................AD_READ arltst.asm:250
..................AD_READ arltst.asm:300
..................AD_READ arltst.asm:346
..................AD_READ arltst.asm:396
..................AD_READ arltst.asm:442
............ARINC_READ artest.asm:209
............ARINC_WORD_TXRX_TEST artest.asm:221
..................ARINC_OUT artxrx.asm:207
..................ARINC_READ artxrx.asm:221
For data structures, one of Perl's biggest strengths is arbitrary nesting. Rather than have a single variable that contains all the data for all the nodes, you can store "subnodes" inside their parents.
Let's say you had a hash for one entry:
%node1 = (
name => 'ALPHA_MAX_MC_SSM',
file => 'arltst.asm',
line => 42
);
The above code creates a nice, simple node to store data in. But you can store more data inside that node itself, such as a "child node":
%node2 = (
name => 'ACTUAL_ALT',
file => 'foo.asm',
line => 2001
);
$node1{children}[0] = \%node2;
Then the first node has a 'children' entry that is an array of all its children. You can access data in the child directly like this:
$node1{'children'}[0]{'name'};
To understand this and how it works, you need to read up on Perl references and Perl data types. It takes a while for a new Perl programmer to absorb the concepts, but once you do, you can quickly write powerful programs that read in complex hierarchical data and process it.
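As a sketch of how such a nested structure could be built from the dotted call-tree format in the question: this assumes six dots per depth level and a "NAME file:line" layout on each line. The parse_call_tree name and the sample lines are illustrative assumptions, not part of the original code.

```perl
use strict;
use warnings;

# Build a nested node list from lines such as "......NAME file.asm:123".
# Assumes six dots per depth level, as in the question's sample file.
sub parse_call_tree {
    my @lines = @_;
    my @roots;
    my @stack;    # $stack[$d] is the most recent node seen at depth $d

    for my $line (@lines) {
        # Groups of six dots, then name, file and line number;
        # lines that do not fit this shape are skipped.
        next unless $line =~ /^((?:\.{6})+)([^.\s]\S*)\s+([^\s:]+):(\d+)\s*$/;
        my $depth = length($1) / 6;
        my $node  = { name => $2, file => $3, line => $4, children => [] };

        if ($depth == 1) {
            push @roots, $node;
        }
        else {
            my $parent = $stack[ $depth - 1 ]
                or die "No parent for line: $line\n";
            push @{ $parent->{children} }, $node;
        }

        # Remember this node and forget any deeper, now stale, entries.
        $stack[$depth] = $node;
        $#stack = $depth;
    }
    return @roots;
}

my @nodes = parse_call_tree(
    '......ARINC_TEST powrup.asm:633',
    '............ARINC_LEVEL_TEST artest.asm:178',
    '..................AD_READ arltst.asm:204',
    '............ARINC_READ artest.asm:209',
);

print scalar(@nodes), " root, ",
      scalar( @{ $nodes[0]{children} } ), " children\n";
```

Keeping only the latest node per depth in @stack is enough here because a call-tree listing always places a child directly below its ancestors.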
