Inserting a hyphen when string changes from alpha to numeric - performance

Given a random alphanumeric string (A-Z0-9) between 1 and 10 characters long, I'd like to insert a hyphen wherever it changes from alpha to numeric, or numeric to alpha.
I have something that works now, but I'm sure it performs as awfully as it looks. I know there's a better way to handle this, but someone in the office made weak coffee this morning, or at least that's the excuse I'm going with. ;)
I have to do this ~15 million times, so the faster, the better.
Code snip:
my @letters = split //, $string;
my $type;
foreach my $letter ( @letters ) {
if (! $type) {
if ($letter =~ /^[A-Z]$/) {
$type = 'a'
}
else {
$type = 'd'
}
$string = $letter;
next;
}
else {
if ($type eq 'a') {
if ($letter =~ /^[0-9]$/) {
$string .= '-' . $letter;
$type = 'd';
next;
}
else {
$string .= $letter;
}
}
else {
if ($letter =~ /^[A-Z]$/) {
etc, etc.
Ugh, it hurts just looking at that.

This should work:
$string =~ s/([A-Z])([0-9])/$1-$2/g;
$string =~ s/([0-9])([A-Z])/$1-$2/g;
Add the /i modifier if you want it to be case-insensitive.
Probably faster (since it avoids captures), but requires Perl 5.10 or later for \K:
$string =~ s/[A-Z]\K(?=[0-9])/-/g;
$string =~ s/[0-9]\K(?=[A-Z])/-/g;
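For comparison, here is a minimal Python sketch of the same zero-width idea. Python's re module has no \K, so this uses a lookbehind plus a lookahead instead; the names BOUNDARY and hyphenate are made up for illustration and are not from the answer above.

```python
import re

# Zero-width position between a letter and a digit, in either order.
# Nothing is captured or consumed; only the hyphen is inserted.
BOUNDARY = re.compile(r"(?<=[A-Z])(?=[0-9])|(?<=[0-9])(?=[A-Z])")


def hyphenate(s):
    """Insert '-' wherever the string switches between A-Z and 0-9."""
    return BOUNDARY.sub("-", s)


print(hyphenate("ABC123DE4"))  # ABC-123-DE-4
```

Compiling the pattern once matters if this runs millions of times, though actual timings would need to be measured on your data.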

Related

Using explode but after first <br><br> Pattern

I am using explode to create an array from a string, with <br><br> as the delimiter, but in my case I want the array to start after the first <br><br>. That is, the text before the first <br><br> should be skipped, and the array should be built from the second segment onward, split on each further <br><br>.
<?php
$myString = "Welcome, j.<br><br>
 
(1) this revisional application has been preferred under read with section of the against the order passed by the learned special judge, ndps act, 6th court at barasat.<br><br>
(2) case of the petitioner is that he along with other accused persons are facing trial case, as pending the learned additional sessions judge, 6th court. barasat.";
$myArray = explode('<br><br>', $myString);
// $arr = ltrim($myArray, ' ');
echo "<pre>"; print_r($myArray);
foreach($myArray as $key => $value)
{
$whatIWant = substr($value, strpos($value," "));
echo ucfirst($whatIWant);
}
Basically, my task is to capitalize the first letter of the second word of each segment, i.e. "this" and "case".
You can use the array_shift() function to remove the first element of the array, which skips the first item.
See the PHP documentation for array_shift().
$myArray = explode('<br><br>', $myString);
array_shift($myArray);
foreach($myArray as $key => $value)
{
$whatIWant = substr($value, strpos($value," "));
echo ucfirst($whatIWant);
}
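The split-then-drop-first idea can be sketched in Python as follows. This is only an illustration under some assumptions: the function name capitalize_after_marker is hypothetical, each segment is assumed to start with a "(n)" marker followed by a space, and unlike the PHP above it keeps the marker in the output.

```python
def capitalize_after_marker(text, sep="<br><br>"):
    """Split on sep, skip the first piece, capitalize the word after '(n)'."""
    # [1:] drops everything before the first separator,
    # which is what array_shift() does after explode().
    segments = text.split(sep)[1:]
    result = []
    for segment in segments:
        segment = segment.strip()
        # marker is the leading "(n)", rest is the remainder of the segment.
        marker, _, rest = segment.partition(" ")
        result.append(marker + " " + rest[:1].upper() + rest[1:])
    return result


sample = "Welcome, j.<br><br>\n(1) this revisional application.<br><br>\n(2) case of the petitioner."
for line in capitalize_after_marker(sample):
    print(line)
```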
I would try using preg_match_all instead.
The code below looks for the pattern (\(\d+\))\s(\w+)
\(\d+\) = 1 or more digits inside literal parentheses
\s = a single whitespace character
\w+ = 1 or more word characters (letters, digits, or underscores)
If you were to add (3)... (4)... and so on, it would match all of them
<?php
$myString = "Welcome, j.<br><br>
(1) this revisional application has been preferred under read with section of the against the order passed by the learned special judge, ndps act, 6th court at barasat.<br><br>
(2) case of the petitioner is that he along with other accused persons are facing trial case, as pending the learned additional sessions judge, 6th court. barasat.";
preg_match_all("/(\(\d+\))\s(\w+)/", $myString, $all_matches);
foreach ($all_matches[1] as $idx => $match) {
$original = "{$match} " . $all_matches[2][$idx];
$replacement = "{$match} " . ucfirst($all_matches[2][$idx]);
$myString = str_replace($original, $replacement, $myString);
}
echo $myString;
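The same match-and-capitalize step can also be done in a single substitution. Below is a rough Python sketch, assuming the same "(n) word" layout as the sample string; capitalize_numbered is a hypothetical name used only here.

```python
import re


def capitalize_numbered(text):
    """Uppercase the first letter of the word that follows each '(n)' marker."""
    # Group 1: "(n)" plus one whitespace character.
    # Group 2: the first character of the following word.
    return re.sub(
        r"(\(\d+\)\s)(\w)",
        lambda m: m.group(1) + m.group(2).upper(),
        text,
    )


print(capitalize_numbered("(1) this revisional application. (2) case of the petitioner."))
```

Using a replacement callback avoids the separate str_replace pass, so identical text elsewhere in the string is not touched by accident.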

Perl: how to combine consecutive page numbers?

OS: Windows server 2012, so I don't have access to Unix utils
ActiveState Perl 5.16. Sorry, I cannot upgrade the OS or Perl; I'm stuck with it.
I did a Google search and read about 10 pages of results. I found similar problems, but not what I'm looking for.
I then did 3 searches here and found similar issues with SQL, R, and XSLT, but not what I'm looking for.
I actually am not sure where to start so I don't even have code yet.
I'd like to combine consecutive page numbers into a page range. Input will be a series of numbers in an array.
Input as an array of numbers: my @a=(1,2,5)
Output as a string: 1-2, 5
Input ex: (1,2,3,5,7)
Output ex: 1-3, 5, 7
Input ex: (100,101,102,103,115,120,121)
Output ex: 100-103,115,120-121
Thank you for your help!
This is the only code I have so far.
sub procpages_old
# $aref = array ref to list of page numbers.
# $model = used for debugging.
# $zpos = used for debugging only.
{my($aref,$model,$zpos)=@_;
my $procname=(caller(0))[3];
my @arr=@$aref; # Array of page numbers.
my @newarr=();
my $i=0;
my $np1=0; # Page 1 of possible range.
my $np2=0; # Page 2 of possible range.
my $p1=0; # Page number to test.
my $p2=0;
my $newpos=0;
while ($i<$#arr)
{
$np1=$arr[$i];
$np2=getdata($arr[$i+1],'');
$p1=$np1;
$p2=$np2;
while ($p2==($p1+1)) # Consecutive page numbers?
{
$i++;
$p1=$a[$i];
$p2=getdata($a[$i+1],'');
}
$newarr[$newpos]=$np1.'-'.$p2;
$newpos++;
# End of loop
$i++;
}
my $pages=join(', ',@arr);
return $pages;
}
That's called an intspan. Use Set::IntSpan::Fast::XS.
use Set::IntSpan::Fast::XS qw();
my $s = Set::IntSpan::Fast::XS->new;
$s->add(100,101,102,103,115,120,121);
$s->as_string; # 100-103,115,120-121
This seems to do what you want.
#!/usr/bin/perl
use strict;
use warnings;
use feature 'say';
while (<DATA>) {
chomp;
say rangify(split /,/);
}
sub rangify {
    my @nums = @_;
    my @range;
    for (0 .. $#nums) {
        if ($_ == 0 or $nums[$_] != $nums[$_ - 1] + 1) {
            push @range, [ $nums[$_] ];
        } else {
            push @{$range[-1]}, $nums[$_];
        }
    }
    for (@range) {
        if (@$_ == 1) {
            $_ = $_->[0];
        } else {
            $_ = "$_->[0]-$_->[-1]";
        }
    }
    return join ',', @range;
}
__DATA__
1,2,5
1,2,3,5,7
100,101,102,103,115,120,121
The rangify() function builds an array of arrays. It traverses your input list and if a number is just one more than the previous number then it adds the new number to the second-level array that's currently at the end of the first-level array. If the new number is not sequential, it adds a new second-level array at the end of the first level array.
Having built this data structure, we walk the first-level array, looking at each of the second-level arrays. If the second level array contains only one element then we know it's not a range, so we overwrite the value with the single number from the array. If it contains more than one element, then it's a range and we overwrite the value with the first and last elements separated with a hyphen.
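The two passes described above (group consecutive numbers into sub-lists, then render each sub-list) can be sketched in Python like this. It is an illustrative translation, not the answer's own code, and it assumes the input is already sorted with no duplicates.

```python
def rangify(nums):
    """Collapse runs of consecutive integers into 'first-last' strings."""
    runs = []  # list of lists, one inner list per run of consecutive numbers
    for n in nums:
        if runs and n == runs[-1][-1] + 1:
            runs[-1].append(n)  # extends the current run
        else:
            runs.append([n])    # starts a new run
    # A run of one element is printed as-is, longer runs as "first-last".
    return ",".join(
        str(run[0]) if len(run) == 1 else "{}-{}".format(run[0], run[-1])
        for run in runs
    )


print(rangify([100, 101, 102, 103, 115, 120, 121]))  # 100-103,115,120-121
```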
So I managed to adjust this code to work for me. Pass your array of numbers into procpages() which will then call num2range().
######################################################################
# In:
# Out:
sub num2range
{
local $_ = join ',' => @_;
s/(?<!\d)(\d+)(?:,((??{$++1}))(?!\d))+/$1-$+/g;
tr/-,/, /;
return $_;
}
######################################################################
# Concatenate consecutive page numbers in array.
# In: array like (1,2,5,7,99,100,101)
# Out: string like "1-2, 5, 7, 99-101"
sub procpages
{my($aref,$model,$zpos)=@_;
my $procname=(caller(0))[3];
my @arr=@$aref;
my $pages=num2range(@arr);
$pages=~s/\,/\-/g; # Change comma to dash.
$pages=~s/ /\, /g; # Change space to comma and space.
#$pages=~s/\,/\, /g;
return $pages;
}
You probably have the best solution already with the Set::IntSpan::Fast::XS module, but assuming you want to take the opportunity to learn Perl, here's another Perl-ish way to do it.
use strict;
use warnings;
my @nums = (1,2,5);
my $prev = -999; # assuming you only use positive values, this will work
my @out = ();
for my $num (@nums) {
    # if we are continuing a sequence, add a hyphen unless we did last time
    if ($num == $prev + 1) {
        push (@out, '-') unless (@out and $out[-1] eq '-');
    }
    else {
        # if we are breaking a sequence (@out ends in '-'), add the previous number first
        if (@out and $out[-1] eq '-') {
            push(@out, $prev);
        }
        # then add the current number
        push (@out, $num);
    }
    # track the previous number
    $prev = $num;
}
# add the final number if necessary to close the sequence
push(@out, $prev) if (@out and $out[-1] eq '-');
# join all values with comma
my $pages = join(',', @out);
# flatten the ',-,' sequence to a single '-'
$pages =~ s/,-,/-/g;
print "$pages\n";
This is not super elegant or short, but is very simple to understand and debug.

Mixin Passing Variables to Each/For

Instead of doing something like this (which is obviously inefficient):
@mixin padding($top, $right, $bottom, $left) {
    $top: $top * $spacer;
    $right: $right * $spacer;
    $bottom: $bottom * $spacer;
    $left: $left * $spacer;
    $output: $top $right $bottom $left;
    padding: $output;
}
Can I do something similar to this?
@mixin padding($top:"", $right:"", $bottom:"", $left:"") {
    $params: $top, $right, $bottom, $left;
    $output: "";
    @each $var in $params {
        $var: $var * $spacer;
        $output: $output + $var;
    }
    padding: $output;
}
Yes you can =)
In this case you can also skip the first step and use $params... as the parameter (variable argument list), and then you can have padding with 1, 2, 3, or 4 values.
@mixin padding($params...) {
    $output: ();
    @each $var in $params {
        $var: $var * $spacer;
        $output: join( $output, $var );
    }
    padding: $output;
}
If you use the join function instead of string concatenation you won't have troubles separating the values with spaces when printing out (a list gets automatically compiled to CSS as space separated elements).
And if you want to make sure to limit the params to 4 max, you can do something like this instead of the @each loop:
$n: length($params);
@for $i from 1 through if( $n < 4, $n , 4) {
    $var: nth($params,$i) * $spacer;
    $output: join( $output, $var );
}
However, if you want to stick with strings and concatenation instead of lists, you would need to use an additional space in the concatenation inside the loop (e.g. $output + " " + $var) and then return the $output with string interpolation #{$output} or using unquote($output). But you would end up with an extra space attached to the string ... and would need to apply some additional logic in case you would want to get rid of it.

finding text using regex

I have a string like this: abc+def(ghi)+jkl, and I want to get {abc,ghi,jkl} as the result of a regex. So far I have found [a-z]+(?!\\() but it returns {abc,de,ghi,jkl}. Does anyone know how to write the proper regular expression?
Examples:
var + var_s    => { var, var_s }
var + method(arg) + var_s    => { var, arg, var_s }
string * string_s + method_name(arg,arg_s)   => { string, string_s, arg, arg_s }
var + 2 * ( 3 + something ) +count( 3, gender )   => { var, something, gender }
I need to take all strings consisting of 'a-z A-Z _' that are not immediately followed by a ( character. Strings such as method(, method_name(, and count( should be omitted because of the ( .
if (preg_match('/(\w+)/i', $subject, $regs)) {
$result = $regs[0];
} else {
$result = "";
}
(\w+)
Options: case insensitive
Match the regular expression below and capture its match into backreference number 1 «(\w+)»
Match a single character that is a “word character” (letters, digits, and underscores) «\w+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
(regexbuddy is your friend!)
I have found a solution on http://msdn.microsoft.com/en-us/library/az24scfc.aspx:
\b[a-zA-Z_]+(?!\()\b
if (preg_match('/\b([a-z]+)(?!\(|\))\b/i', $subject, $regs)) {
$result = $regs[0];
} else {
$result = "";
}
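To collect every match rather than only the first, the same pattern can be applied with a find-all call. Here is a small Python sketch using the \b[a-zA-Z_]+(?!\()\b pattern quoted above; the names IDENT and variables are invented for this example.

```python
import re

# A run of letters/underscores that is a whole word and is not
# immediately followed by "(" (which would make it a method name).
IDENT = re.compile(r"\b[a-zA-Z_]+(?!\()\b")


def variables(expr):
    """Return all identifiers in expr that are not used as method names."""
    return IDENT.findall(expr)


print(variables("abc+def(ghi)+jkl"))  # ['abc', 'ghi', 'jkl']
```

The trailing \b is what stops the engine from backtracking to a partial name such as "de": after giving up the "f", the position between "e" and "f" is not a word boundary, so the match fails entirely.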

Script to find words inside a given word from wordlist

I have a dictionary with 250K words (txt file). For each of those words I would like to come up with a script that will throw all possible anagrams (each anagram should also be in the dictionary).
Ideally the script would output in this format:
word1: anagram1,anagram2...
word2: anagram1,anagram2...
Any help would be greatly appreciated.
Inspired by this, I would suggest you create a Trie.
Then, the trie with N levels will have all possible anagrams (where N is the length of the original word). Now, to get different sized words, I suggest you simply traverse the trie, ie. for all 3 letter subwords, just make all strings that are 3 levels deep in the trie.
I'm not really sure of this, because I didn't test this, but it's an interesting challenge, and this suggestion would be how I would start tackling it.
Hope it helps a little =)
It must be anagram week.
I'm going to refer you to an answer I submitted to a prior question: https://stackoverflow.com/a/12811405/128421. It shows how to build a hash for quick searches of words that have common letters.
For your purpose, of finding substrings/inner-words, you will also want to find the possible inner words. Here's how to quickly locate unique combinations of letters of varying sizes, based on a starting word:
word = 'misses'
word_letters = word.downcase.split('').sort
3.upto(word.length) { |i| puts word_letters.combination(i).map(&:join).uniq }
eim
eis
ems
ess
ims
iss
mss
sss
eims
eiss
emss
esss
imss
isss
msss
eimss
eisss
emsss
imsss
eimsss
Once you have those combinations, split them (or don't do the join) and do look-ups in the hash my previous answer built.
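For readers more comfortable with Python, the combination step might look roughly like this. It is a sketch of the same idea rather than a port of the linked answer; letter_combinations and min_len are hypothetical names.

```python
from itertools import combinations


def letter_combinations(word, min_len=3):
    """Unique sorted-letter combinations of word, from min_len letters up."""
    letters = sorted(word.lower())
    result = []
    for size in range(min_len, len(letters) + 1):
        keys = ("".join(combo) for combo in combinations(letters, size))
        # dict.fromkeys removes duplicates while keeping first-seen order.
        result.extend(dict.fromkeys(keys))
    return result


print(letter_combinations("misses"))
```

Each resulting string is already in sorted-letter form, so it can be used directly as a lookup key in a hash keyed by sorted letters.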
What I tried so far in Perl:
use strict;
use warnings;
use Algorithm::Combinatorics qw(permutations);
die "First argument should be a dict\n" unless $ARGV[0];
open my $fh, "<", $ARGV[0] or die $!;
my @arr = <$fh>;
my $h = {};
map { chomp; $h->{lc($_)} = [] } @arr;
foreach my $word (@arr) {
    $word = lc($word);
    my $chars = [ ( $word =~ m/./g ) ];
    my $it = permutations($chars);
    while ( my $p = $it->next ) {
        my $str = join "", @$p;
        if ($str ne $word && exists $h->{$str}) {
            push @{ $h->{$word} }, $str
                unless grep { /^$str$/ } @{ $h->{$word} };
        }
    }
    if (@{ $h->{$word} }) {
        print "$word\n";
        print "\t$_\n" for @{ $h->{$word} };
    }
}
END{ close $fh; }
There may be room to improve the speed, but it works.
I use the French dictionary from the Arch Linux words package.
EXAMPLE
$ perl annagrammes.pl /usr/share/dict/french
abaissent
absentais
abstenais
abaisser
baissera
baserais
rabaisse
(...)
NOTE
To install the Perl module:
cpan -i Algorithm::Combinatorics
h = Hash.new { |hash, key| hash[key] = [] }
array_of_words.each{|w| h[w.downcase.chars.sort].push(w)}
h.values
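The sorted-letters-as-key idea in the snippet above can be sketched in Python as follows. The function name anagram_groups is made up for illustration, and filtering out single-word groups is an added assumption, on the grounds that a word with no partner has no anagrams to report.

```python
from collections import defaultdict


def anagram_groups(words):
    """Group words that share the same multiset of letters."""
    groups = defaultdict(list)
    for word in words:
        # Two words are anagrams exactly when their sorted letters match.
        key = "".join(sorted(word.lower()))
        groups[key].append(word)
    # Keep only keys that are shared by at least two words.
    return [group for group in groups.values() if len(group) > 1]


print(anagram_groups(["listen", "silent", "enlist", "google", "banana"]))
```

This makes one pass over the dictionary and sorts each word once, so it avoids generating every permutation of every word.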
