preg_replace technicality in PHP regular expressions

My current code:
$text = "This is my string exec001 and this is the rest of the string exec222 and here is even more execSOMEWORD a very long string!";
$text2 = preg_replace('/\bexec(\S+)/', "<html>$1</html><div>$1</div>", $text);
echo $text2,"\n";
Outputs the following:
This is my string <html>001</html><div>001</div> and this is the rest of the string <html>222</html><div>222</div> and here is even more <html>SOMEWORD</html><div>SOMEWORD</div> a very long string!
My question is: how can I store multiple variables? For example, I want to replace execVARIABLE1:VARIABLE2:VARIABLE3 and capture each of VARIABLE1, 2 and 3 in, say, $1, $2 and $3 when the string is rewritten.

In order to save the matched groups you can use preg_match():
$text = "This is my string exec001 and this is the rest of the string exec222 and here is even more execSOMEWORD a very long string!";
preg_match( '/.*?\bexec(\S+).*?\bexec(\S+).*?\bexec(\S+)/', $text, $matches);
print_r($matches);
OUTPUT
Array
(
[0] => This is my string exec001 and this is the rest of the string exec222 and here is even more execSOMEWORD
[1] => 001
[2] => 222
[3] => SOMEWORD
)

Thanks to M42, we have the correct answer as follows:
$text2 = preg_replace('/\bexec([^:\s]+):([^:\s]+)/', "<html>$1</html><div>$2</div>", $text);
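The asker's three-variable case (execVARIABLE1:VARIABLE2:VARIABLE3) follows the same shape with a third group: `/\bexec([^:\s]+):([^:\s]+):([^:\s]+)/` with `$1`, `$2`, `$3` in the replacement string. As a sketch of how the pattern behaves, here it is in Ruby, whose regex engine handles this pattern identically (the surrounding tags are just placeholders):

```ruby
text = "start execAAA:BBB:CCC end"

# [^:\s]+ stops at the next colon or whitespace, so each group
# captures exactly one colon-separated variable
result = text.gsub(/\bexec([^:\s]+):([^:\s]+):([^:\s]+)/) do
  "<html>#{$1}</html><div>#{$2}</div><div>#{$3}</div>"
end
# result == "start <html>AAA</html><div>BBB</div><div>CCC</div> end"
```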

Related

How do I handle Unicode with DBD::Oracle?

The Perl DBI documentation says this:
Perl supports two kinds of strings: Unicode (utf8 internally) and non-Unicode (defaults to iso-8859-1 if forced to assume an encoding). Drivers should accept both kinds of strings and, if required, convert them to the character set of the database being used. Similarly, when fetching from the database character data that isn't iso-8859-1 the driver should convert it into utf8.
DBD::Sqlite with parameter (sqlite_unicode => 1), or DBD::Pg with parameter (pg_enable_utf8 => -1) -- which is the default -- indeed do such conversions.
With DBD::Oracle (v1.83, NLS_LANG='FRENCH_FRANCE.UTF8') this is not the case: if non-Unicode strings are passed to INSERT or UPDATE statements, the driver does not automatically upgrade them to utf8.
Here is my test suite. The variants for SQLite and Pg succeed, but this Oracle variant fails:
use utf8;
use strict;
use warnings;
use Test::More;
use SQL::Abstract::More;
use Scalar::Util qw/looks_like_number/;
use DBI;
my @DBI_CONNECT_ARGS = @ARGV;
my ($table, $key_col, $val_col) = qw/TST_UTF8 KEY VAL/; # assuming this table is already created
binmode $_, ':utf8' for *STDERR, *STDOUT;

# strings for tests
my %str;
$str{utf8} = "il était une bergère"; # has flag utf8 because of 'use utf8'
$str{native} = $str{utf8}; utf8::downgrade($str{native}); # without flag utf8
$str{wide_chars} = "il était une bergère♥♡"; # chars > 256 - cannot be a native string (\x{2665}\x{2661})
$str{named_chars} = "il \N{LATIN SMALL LETTER E WITH ACUTE}tait une " # identical to string 'wide_chars'
                  . "berg\N{LATIN SMALL LETTER E WITH GRAVE}re"
                  . "\N{BLACK HEART SUIT}\N{WHITE HEART SUIT}";

# check that test strings meet expectations
ok utf8::is_utf8($str{utf8}), "perl string with utf8 flag";
ok !utf8::is_utf8($str{native}), "perl string without utf8 flag, (native chars ... latin1)";
is $str{utf8}, $str{native}, "strings 'utf8' and 'native' have different encodings but represent the same chars";
ok utf8::is_utf8($str{wide_chars}), "string with wide chars must have utf8 flag";
ok utf8::is_utf8($str{named_chars}), "string with named wide chars must have utf8 flag";
is $str{wide_chars}, $str{named_chars}, "named chars are identical to chars from perl source";

my $dbh = DBI->connect(@DBI_CONNECT_ARGS);
my $sqlam = SQL::Abstract::More->new;
my ($sql, @bind);

# suppress records from previous run
my @k = keys %str;
($sql, @bind) = $sqlam->delete(-from => $table, -where => {$key_col => {-in => \@k}});
my $del = $dbh->do($sql, {}, @bind);
note "DELETED $del records";

# insert strings via bind values
while (my ($key, $val) = each %str) {
  ($sql, @bind) = $sqlam->insert(-into => $table, -values => {$key_col => $key, $val_col => $val});
  my $ins = $dbh->do($sql, {}, @bind);
  note "INSERT via bind $key: $ins";
}

# read data back
($sql, @bind) = $sqlam->select(-from    => $table,
                               -columns => [$key_col, $val_col],
                               -where   => {$key_col => {-in => \@k}});
my $rows = $dbh->selectall_arrayref($sql, {}, @bind);
my %str_from_db = map {@$_} @$rows;

# check round trip
is_deeply \%str_from_db, \%str, 'round trip with bind values';

# suppress again
($sql, @bind) = $sqlam->delete(-from => $table, -where => {$key_col => {-in => \@k}});
$del = $dbh->do($sql, {}, @bind);
note "DELETED $del records";

# insert strings via raw sql
while (my ($key, $val) = each %str) {
  my $ins = $dbh->do("INSERT INTO $table($key_col, $val_col) VALUES ('$key', '$val')");
  note "INSERT via raw SQL $key: $ins";
}

# check round trip
is_deeply \%str_from_db, \%str, 'round trip with raw SQL';
As a workaround, I added some callbacks that automatically upgrade native strings; with this addition the tests pass:
$dbh->{Callbacks}{prepare} = sub {
  # warn "PREPARE : upgrading stmt: $_[1]\n";
  utf8::upgrade($_[1]);
  return;
};
$dbh->{Callbacks}{ChildCallbacks}{execute} = sub {
  # warn "EXECUTE: ";
  foreach my $i (1 .. $#_) {
    if ($_[$i] && !ref $_[$i] && !looks_like_number($_[$i])) {
      # warn "upgrading $i : $_[$i];";
      utf8::upgrade($_[$i]);
    }
  }
  print STDERR "\n";
  return;
};
If I understand the DBI spec properly, this automatic upgrade should be performed by the DBD::Oracle driver, not by application code. Or am I missing something?

require solution for string replacement

I have a code where I have to replace a string with another string.
My file contains
secondaryPort = 7504
The code below
filtered_data = filtered_data.gsub(
  /secondaryPort=\d+/,
  'secondaryPort=' + node['server']['secondaryPort']
)
should replace my file with
secondaryPort = 7555
but it fails to do so.
Make sure you account for the spaces around the equals sign in your string:
filtered_data = 'secondaryPort = 7504'
=> 'secondaryPort = 7504'
# with literal spaces
filtered_data.gsub(/secondaryPort = \d+/, 'secondaryPort = 7555')
=> 'secondaryPort = 7555'
# with regex character class for literal space
filtered_data.gsub(/secondaryPort\s{1}=\s{1}\d+/, 'secondaryPort = 7555')
=> 'secondaryPort = 7555'
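If the file's spacing isn't guaranteed, a tolerant pattern with `\s*` around the equals sign covers both forms; a small sketch (`new_port` stands in for `node['server']['secondaryPort']`):

```ruby
new_port = '7555' # stand-in for node['server']['secondaryPort']

lines = ['secondaryPort = 7504', 'secondaryPort=7504']

# \s* matches zero or more whitespace characters, so spaced and
# unspaced assignments are both rewritten to the same canonical form
updated = lines.map do |line|
  line.gsub(/secondaryPort\s*=\s*\d+/, "secondaryPort = #{new_port}")
end
# updated == ['secondaryPort = 7555', 'secondaryPort = 7555']
```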

two regex tests within regex

Now I have a problem with preg_match.
This is an example string: "!asksheet!H69=var8949"; there can also be more than one var8949 or H69 index in this row. The result should be "var33333=var8949".
This is my part:
preg_match_all('#\b\!(.*)\![A-Z]{1,3}\d+\b#', $output, $matches2);
foreach ($matches2[0] as $match2) {
    $result6 = $db->query("SELECT varid FROM variablen WHERE varimportedindex = '".$match2."' AND projectid = $pid AND varsheetname LIKE '%".$match2."%' ");
    $rowoperation2 = $result6->fetch_assoc();
    if ($rowoperation2['varid'] != "" AND $rowoperation2['varid'] != "0") {
        $output2 = preg_replace("#\b\!(.*)\![A-Z]{1,3}\d+\b#", "var".$rowoperation2['varid']."", $output);
    }
}
Can someone perhaps help?
Thank you,
Regards
Olaf
Why not use a simple preg_match instead of preg_match_all? You don't need the word boundaries, and the exclamation mark doesn't need to be escaped. The strings you're looking for are in groups 1 and 2:
$str = '"!asksheet!H69=var8949"';
preg_match('#!(.*?)!([A-Z]{1,3}\d+)#', $str, $m);
print_r($m);
Output:
Array
(
[0] => !asksheet!H69
[1] => asksheet
[2] => H69
)
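To get from the match to the "var33333=var8949" result the question describes, the same two-group pattern can drive the substitution directly; a sketch in Ruby syntax (the pattern itself is unchanged, and '33333' stands in for the varid looked up from the variablen table):

```ruby
str = '!asksheet!H69=var8949'
varid = '33333' # hypothetical value fetched by the SELECT above

# replace the whole "!sheet!CELL" reference; the "=var8949" tail is untouched
result = str.gsub(/!(.*?)!([A-Z]{1,3}\d+)/, "var#{varid}")
# result == "var33333=var8949"
```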

Parsing a string field

I have these Syslog messages:
N 4000000 PROD 15307 23:58:12.13 JOB78035 00000000 $HASP395 GGIVJS27 ENDED\r
NI0000000 PROD 15307 23:58:13.41 STC81508 00000200 $A J78036 /* CA-JOBTRAC JOB RELEASE */\r
I would like to parse these messages into various fields in a Hash, e.g.:
event['recordtype'] #=> "N"
event['routingcode'] #=> "4000000"
event['systemname'] #=> "PROD"
event['datetime'] #=> "15307 23:58:12.13"
event['jobid'] #=> "JOB78035"
event['flag'] #=> "00000000"
event['messageid'] #=> "$HASP395"
event['logmessage'] #=> "$HASP395 GGIVJS27 ENDED\r"
This is the code I have currently:
message = event["message"];
if message.to_s != "" then
  if message[2] == " " then
    array = message.split(%Q[ ]);
    event[%q[recordtype]] = array[0];
    event[%q[routingcode]] = array[1];
    event[%q[systemname]] = array[2];
    event[%q[datetime]] = array[3] + " " + array[4];
    event[%q[jobid]] = message[38,8];
    event[%q[flags]] = message[47,8];
    event[%q[messageid]] = message[57,8];
    event[%q[logmessage]] = message[56..-1];
  else
    array = message.split(%Q[ ]);
    event[%q[recordtype]] = array[0][0,2];
    event[%q[routingcode]] = array[0][2..-1];
    event[%q[systemname]] = array[1];
    event[%q[datetime]] = array[2] + " " + array[3];
    event[%q[jobid]] = message[38,8];
    event[%q[flags]] = message[47,8];
    event[%q[messageid]] = message[57,8];
    event[%q[logmessage]] = message[56..-1];
  end
end
I'm looking to improve the above code. I think I could use a regular expression, but I don't know how to approach it.
You can't use split(' ') or a default split to process your fields, because you are dealing with columnar data whose fields sometimes have no whitespace between them, which throws your array off. Instead, you have to pick each record apart by columns.
There are many ways to do that, but the simplest, and probably the fastest, is indexing into a string and grabbing n characters:
'foo'[0, 1] # => "f"
'foo'[1, 2] # => "oo"
The first means "starting at index 0 in the string, grab one character." The second means "starting at index 1 in the string, grab two characters."
Alternately, you could tell Ruby to extract by ranges:
'foo'[0 .. 0] # => "f"
'foo'[1 .. 2] # => "oo"
These are documented in the String class.
This makes writing code that's easily understood:
record_type = message[ 0 .. 1 ].rstrip
routing_code = message[ 2 .. 8 ]
system_name = message[ 10 .. 17 ]
Once you have your fields captured add them to a hash:
{
'recordtype' => record_type,
'routingcode' => routing_code,
'systemname' => system_name,
'datetime' => date_time,
'jobid' => job_id,
'flags' => flags,
'messageid' => message_id,
'logmessage' => log_message,
}
While you could use a regular expression, there's not much to gain from one here; it's just another way of doing it. If you were picking data out of free-form text it'd be more useful, but with columnar data it tends to add visual noise that makes maintenance more difficult. I'd recommend simply determining your columns, then cutting the data you need from each line based on those.
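Putting those pieces together, here is a minimal sketch; the record is padded by hand and the column offsets are the ones assumed above (a real parser would need the full offset table for this syslog format):

```ruby
# fixed-width record built for illustration:
# cols 0-1 record type, 2-8 routing code, 10-17 system name
message = 'N 4000000 PROD    '

event = {
  'recordtype'  => message[0..1].rstrip,
  'routingcode' => message[2..8],
  'systemname'  => message[10..17].rstrip,
}
# event == { 'recordtype' => 'N', 'routingcode' => '4000000', 'systemname' => 'PROD' }
```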

Join array of strings into 1 or more strings each within a certain char limit (+ prepend and append texts)

Let's say I have an array of Twitter account names:
string = %w[example1 example2 example3 example4 example5 example6 example7 example8 example9 example10 example11 example12 example13 example14 example15 example16 example17 example18 example19 example20]
And a prepend and append variable:
prepend = 'Check out these cool people: '
append = ' #FollowFriday'
How can I turn this into an array of as few strings as possible, each with a maximum length of 140 characters, starting with the prepend text, ending with the append text, and with the Twitter account names in between, each starting with an @-sign and separated by spaces? Like this:
tweets = ['Check out these cool people: @example1 @example2 @example3 @example4 @example5 @example6 @example7 @example8 @example9 #FollowFriday', 'Check out these cool people: @example10 @example11 @example12 @example13 @example14 @example15 @example16 @example17 #FollowFriday', 'Check out these cool people: @example18 @example19 @example20 #FollowFriday']
(The order of the accounts isn't important so theoretically you could try and find the best order to make the most use of the available space, but that's not required.)
Any suggestions? I'm thinking I should use the scan method, but I haven't figured out the right way yet.
It's pretty easy using a bunch of loops, but I'm guessing that won't be necessary with the right Ruby methods. Here's what I came up with so far:
# Create one long string of @usernames separated by a space
tmp = twitter_accounts.map!{|a| a.insert(0, '@')}.join(' ')
# alternative: tmp = '@' + twitter_accounts.join(' @')
# Number of characters left for mentioning the Twitter accounts
length = 140 - (prepend + append).length
# This method would split a string into multiple strings
# each with a maximum length of 'length' and it will only split on empty spaces (' ')
# ideally strip that space as well (although .map(&:strip) could be used too)
tweets = tmp.some_method(' ', length)
# Prepend and append
tweets.map!{|t| prepend + t + append}
P.S.
If anyone has a suggestion for a better title let me know. I had a difficult time summarizing my question.
The String rindex method has an optional parameter where you can specify where to start searching backwards in a string:
arr = %w[example1 example2 example3 example4 example5 example6 example7 example8 example9 example10 example11 example12 example13 example14 example15 example16 example17 example18 example19 example20]
str = arr.map{|name| "@#{name}"}.join(' ')
prepend = 'Check out these cool people: '
append = ' #FollowFriday'
max_chars = 140 - prepend.size - append.size
until str.size <= max_chars do
  p str.slice!(0, str.rindex(" ", max_chars))
  str.lstrip! # get rid of the leading space
end
p str unless str.empty?
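Completing that approach, the chunks can be collected instead of printed and then wrapped with the prepend/append text; a self-contained sketch:

```ruby
arr = %w[example1 example2 example3 example4 example5 example6 example7
         example8 example9 example10 example11 example12 example13 example14
         example15 example16 example17 example18 example19 example20]
prepend = 'Check out these cool people: '
append  = ' #FollowFriday'
max_chars = 140 - prepend.size - append.size

str = arr.map { |name| "@#{name}" }.join(' ')
chunks = []
until str.size <= max_chars
  # cut at the last space that keeps the chunk within the limit
  chunks << str.slice!(0, str.rindex(' ', max_chars))
  str.lstrip!
end
chunks << str unless str.empty?

tweets = chunks.map { |chunk| prepend + chunk + append }
# three tweets, each at most 140 characters
```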
I'd make use of reduce for this:
string = %w[example1 example2 example3 example4 example5 example6 example7 example8 example9 example10 example11 example12 example13 example14 example15 example16 example17 example18 example19 example20]
prepend = 'Check out these cool people:'
append = '#FollowFriday'
# Extra -1 is for the space before `append`
max_content_length = 140 - prepend.length - append.length - 1
content_strings = string.reduce([""]) { |result, target|
  result.push("") if result[-1].length + target.length + 2 > max_content_length
  result[-1] += " @#{target}"
  result
}
tweets = content_strings.map { |s| "#{prepend}#{s} #{append}" }
Which would yield:
"Check out these cool people: @example1 @example2 @example3 @example4 @example5 @example6 @example7 @example8 @example9 #FollowFriday"
"Check out these cool people: @example10 @example11 @example12 @example13 @example14 @example15 @example16 @example17 #FollowFriday"
"Check out these cool people: @example18 @example19 @example20 #FollowFriday"
