In Perl, best way to insert a char every N chars - algorithm

I would like to find the best way in Perl to insert a char every N chars in a string.
Suppose I have the following :
my $str = 'ABCDEFGH';
I would like to insert a space every two chars, so that I get:
my $finalstr = 'AB CD EF GH';
The innocent way would be:
my $finalstr;
while ($str =~ s/(..)//) {
$finalstr .= $1.' ';
}
(But the last space does not make me happy.)
Can we do better? Is it possible using a single substitution pattern s///, especially to use that same string $str (and not using $finalstr)?
The next step: do the same but with text before and after patterns to be cut (and to be kept, for sure), say for example '<<' and '>>':
my $str = 'blah <<ABCDEFGH>> blah';
my $finalstr1 = 'blah <<AB CD EF GH>> blah';
my $finalstr2 = 'blah << AB CD EF GH >> blah'; # alternate

Using positive lookahead and lookbehind assertions to insert a space:
my $str = 'ABCDEFGH';
$str =~ s/..\K(?=.)/ /sg;
use Data::Dump;
dd $str;
Outputs:
"AB CD EF GH"
Enhancement for limiting the Translation
If you want to apply this modification to only part of the string, break it into steps:
my $str = 'blah <<ABCDEFGH>> blah';
$str =~ s{<<\K(.*?)(?=>>)}{$1 =~ s/..\K(?=.)/ /sgr}esg;
use Data::Dump;
dd $str;
Outputs:
"blah <<AB CD EF GH>> blah"

The best solution using substitutions would probably be s/\G..\K/ /sg. Why?
The \G anchors at the current “position” of the string. This position is where the last match ended (usually this is set to the beginning of the string; if in doubt, set pos($str) = 0). Because we use the /g modifier, this will be where the previous substitution ended.
The .. matches any two characters. Note that we also use the /s modifier which causes . to really match any character, and not just the [^\n] character class.
The \K treats the previous part of the regex as a look-behind, by not including the previously matched part of the string in the substring that will be substituted. So \G..\K matches the zero length string after two arbitrary characters.
We substitute that zero length string with a single space.
I'd let the regex engine handle the substitution, rather than manually appending $1 . " ". Also, my lookbehind solution avoids the cost of using captures like $1.

You want the //g modifier with its many capabilities. See e.g. here for an introduction to the intricacies of global matching.

Do you mean something like...
$str =~ s/(..)/$1 /sg;
Update: For more complex substitutions, such as the one you are asking about in the second part of your question, you can use the e modifier, which allows you to evaluate arbitrary Perl code:
sub insert_spcs {
my $str = shift;
join ' ', $str =~ /(..?)/sg
}
my $str = 'blah <<ABCDEFGH>> blah';
$str =~ s/<<(.*?)>>/'<< '.insert_spcs($1).' >>'/se;

Personally I'd split the text with m//g and use join:
use feature 'say';
my $input = "ABCDEFGH";
my $result = join " ", ( $input =~ m/(..)/g );
say "RESULT <$result>";
Yields
RESULT <AB CD EF GH>
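For comparison, the same chunk-and-join idea can be sketched in plain bash without Perl. The function name insert_every and its three parameters are assumptions for this illustration only. Unlike m/(..)/g, which silently drops a trailing odd character (ABCDEFG gives AB CD EF), this version keeps the short final chunk.

```shell
# insert_every STRING N SEP
# Prints STRING with SEP inserted after every N characters
# (N is assumed to be a positive integer).
insert_every() {
    local str=$1 n=$2 sep=$3 out="" i
    for (( i = 0; i < ${#str}; i += n )); do
        # Prepend the separator only when something was already emitted,
        # so there is no leading or trailing separator.
        out+="${out:+$sep}${str:i:n}"
    done
    printf '%s\n' "$out"
}

insert_every ABCDEFGH 2 ' '   # AB CD EF GH
insert_every ABCDEFG 3 '-'    # ABC-DEF-G
```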

The other answers are better, but just for giggles:
join ' ', grep length, split /(..)/, 'ABCDEFGH';

Related

Extract value for a key in a key/pair string

I have key value pairs in a string like this:
key1 = "value1"
key2 = "value2"
key3 = "value3"
In a bash script, I need to extract the value of one of the keys like for key2, I should get value2, not in quote.
My bash script needs to work in both Redhat and Ubuntu Linux hosts.
What would be the easiest and most reliable way of doing this?
I tried something like this simplified script:
pattern='key2\s*=\s*\"(.*?)\".*$'
if [[ "$content" =~ $pattern ]]
then
key2="${BASH_REMATCH[1]}"
echo "key2: $key2"
else
echo 'not found'
fi
But it does not work consistently.
Any better/easier/more reliable way of doing this?
To separate the key and value from your $content variable, you can use:
[[ $content =~ (^[^ ]+)[[:blank:]]*=[[:blank:]]*[[:punct:]](.*)[[:punct:]]$ ]]
That will properly populate the BASH_REMATCH array with both values where your key is in BASH_REMATCH[1] and the value in BASH_REMATCH[2].
Explanation
In bash, [[ ... ]] treats whatever appears on the right side of =~ as a POSIX extended regular expression and matches it according to man 3 regex. See man 1 bash under the section heading for [[ expression ]] (4th paragraph). Sub-expressions in parentheses (..) are saved in the array variable BASH_REMATCH: BASH_REMATCH[0] contains the portion of the string (your $content) matched by the entire regex, and each remaining element contains the text matched by one parenthesized sub-expression, in the order the parentheses appear in the regex.
The Regular Expression (^[^ ]+)[[:blank:]]*=[[:blank:]]*[[:punct:]](.*)[[:punct:]]$ is explained as:
(^[^ ]+) - '^' anchored at the beginning of the line, [^ ]+ match one or more characters that are not a space. Since this sub-expression is enclosed in (..) it will be saved as BASH_REMATCH[1], followed by;
[[:blank:]]* - zero or more whitespace characters, followed by;
= - an equal sign, followed by;
[[:blank:]]* - zero or more whitespace characters, followed by;
[[:punct:]] - a punctuation character (matching the '"', which avoids caveats associated with using quotes within the regex), followed by the sub-expression;
(.*) - zero or more characters (the rest of the characters), and since it is a sub-expression in (..) the characters will be stored in BASH_REMATCH[2], followed by;
[[:punct:]] - a punctuation character (matching the '"' ... ditto), at the;
$ - end of line anchor.
So if you match input lines that consist of a key and a value separated by an = sign, the regex will split the key and the value into the array BASH_REMATCH as you wanted.
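The explanation above can be tried out with a short sketch. The regex is the one from this answer, stored in a variable so that its quoting does not interfere with [[ ]]; the function name split_kv and the key|value output format are only assumptions for the demonstration.

```shell
# split_kv LINE
# Prints "key|value" if LINE looks like: key = "value"
# Returns non-zero when the line does not match.
split_kv() {
    local re='(^[^ ]+)[[:blank:]]*=[[:blank:]]*[[:punct:]](.*)[[:punct:]]$'
    if [[ $1 =~ $re ]]; then
        # BASH_REMATCH[1] is the key, BASH_REMATCH[2] is the value
        printf '%s|%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    else
        return 1
    fi
}

split_kv 'key2 = "value2"'    # key2|value2
```

Note that the trailing $ anchor means a line ending in a carriage return (a Windows line ending) would not match this particular regex.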
Bash's =~ operator uses POSIX extended regular expressions (ERE), not Perl-compatible ones, so you cannot rely on \s or the non-greedy .*?.
As an alternative, please try:
while IFS= read -r content; do
# pattern='key2\s*=\s*\"(.*)\".*$'
pattern='key2[[:blank:]]*=[[:blank:]]*"([^"]*)"'
if [[ $content =~ $pattern ]]
then
key2="${BASH_REMATCH[1]}"
echo "key2: $key2"
(( found++ ))
fi
done < input-file.txt
if (( found == 0 )); then
echo "not found"
fi
When you start talking about key-value pairs, it is best to use an associative array:
declare -A map
Now looking at your lines, they look like key = "value" where we assume that:
the value is always enclosed in double quotes, but could itself contain a quote
an unknown amount of whitespace may appear before and/or after the equals sign.
So assuming we have a variable line which contains key = "value", the following operations will extract that value:
key="${line%%=*}"; key="${key// /}"
value="${line#*=}"; value="${value#*\"}"; value="${value%\"*}"
This allows us now to have something like:
declare -A map
while read -r line; do
key="${line%%=*}"; key="${key// /}"
value="${line#*=}"; value="${value#*\"}"; value="${value%\"*}"
map["$key"]="$value"
done <inputfile
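As a usage sketch of the associative-array approach, the loop can be fed from a here-document and the map queried afterwards. The sample data is invented for the illustration, the double quote is written as \" in the parameter expansions, and associative arrays require bash 4 or later.

```shell
declare -A map   # associative arrays need bash 4 or later

while IFS= read -r line; do
    [[ $line == *=* ]] || continue          # skip lines without "="
    key="${line%%=*}"; key="${key// /}"     # text before first "=", spaces removed
    value="${line#*=}"                      # text after first "="
    value="${value#*\"}"                    # drop everything through the opening quote
    value="${value%\"*}"                    # drop the closing quote and what follows
    map["$key"]="$value"
done <<'EOF'
key1 = "value1"
key2 = "say "hi""
key3="value3"
EOF

printf '%s\n' "${map[key2]}"   # say "hi"
```

Because the opening quote is stripped from the left and the closing quote from the right, quotes inside the value survive.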
With awk:
awk -v key="key2" '$1 == key { gsub("\"","",$3);print $3 }' <<< "$string"
This reads the contents of the variable called string and passes the required key in as an awk variable called key. If the first whitespace-delimited field is equal to the key, it removes the quotes from the third field with the gsub function and prints it.
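The awk command above relies on whitespace around the = sign and on a value without spaces. A variation that splits on the double quote instead is sketched below; the function name get_value_awk is hypothetical, and the approach assumes the value itself contains no double quotes.

```shell
# get_value_awk KEY
# Reads key = "value" lines on stdin and prints the value(s) for KEY.
get_value_awk() {
    awk -F'"' -v key="$1" '
        {
            k = $1                 # everything before the first double quote
            gsub(/[ \t=]/, "", k)  # strip blanks and the "=" to leave the bare key
            if (k == key) print $2 # $2 is the text between the first two quotes
        }'
}

printf 'key1="value1"\nkey2 = "some value"\n' | get_value_awk key2   # some value
```

Since only the text between the quotes is printed, a trailing carriage return from a Windows-style file does not end up in the result.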
Ok, after spending so many hours, this is how I solved the problem:
If you don't know where your script will run or what type of file (Windows/Mac/Linux) you are reading:
Avoid non-greedy matching in Linux bash instead of tweaking different switches.
Don't trust the end-of-line match $ when you might get data from Windows or Mac.
This post solved my problem: Non greedy text matching and extrapolating in bash
This pattern works for me in many Linux environments and with all types of line endings:
pattern='key2\s*=\s*"([^"]*)"'
The value is in BASH_REMATCH[1]
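Putting these points together, a lookup function might look like the sketch below. It is only an illustration: the name get_value is made up, the key is assumed to contain no regex metacharacters, and [[:blank:]] is used in place of \s because \s is not part of POSIX ERE and its support depends on the system's regex library.

```shell
# get_value KEY
# Reads key = "value" lines on stdin, prints the value of the first
# line whose key is KEY, and returns non-zero if KEY is not found.
get_value() {
    local key=$1 line re
    # No trailing $ anchor, so a carriage return at the end of the line is harmless.
    re="^[[:blank:]]*${key}[[:blank:]]*=[[:blank:]]*\"([^\"]*)\""
    # The "|| [[ -n $line ]]" part also processes a last line without a newline.
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ $re ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}"
            return 0
        fi
    done
    return 1
}

printf 'key1 = "value1"\r\nkey2 = "value2"\r\n' | get_value key2   # value2
```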

Use sed to escape a pattern to another sed

I would like an approach to do a replace using the sed command that escapes a "pattern" (string) to be used in another sed command. This escape process must include handling for multi-line strings pattern.
To illustrate I present the code below. It works perfectly (well tested so far), but fails when we have strings with multiple lines (see "STRING_TO_ESCAPE").
#!/bin/bash
# Escape TARGET_STRING.
read -r -d '' STRING_TO_ESCAPE <<'HEREDOC'
$N = "magic_quotes_gpc = <b>"._("On")."</b>";
$D = _("Increase your server security by setting magic_quotes_gpc to 'on'. PHP will escape all quotes in strings in this case.");
$S = _("Search for 'magic_quotes_gpc' in your php.ini and set it to 'On'.");
$R = ini_get('magic_quotes_gpc');
$M = TRUE;
$this->config_checks[] = array("NAME" => $N , "DESC" => $D , "RESULT" => $R , "SOLUTION" => $S , "MUST" => $M );
HEREDOC
ESCAPED_STRING=$(echo "'${STRING_TO_ESCAPE}'" | sed 's/[]\/$*.^|[]/\\&/g')
ESCAPED_STRING=${ESCAPED_STRING%?}
TARGET_STRING=${ESCAPED_STRING#?}
# NOTE: The single quotes in "'${STRING_TO_ESCAPE}'" serve to prevent spaces
# being "lost" at the beginning and end of the string! The manipulations with
# "ESCAPED_STRING" are used to remove them. When we use sed with the file being
# input (flag "-i") this problem does not occur.
# Escape REPLACE_STRING.
read -r -d '' STRING_TO_ESCAPE <<'HEREDOC'
/* NOTE: "Magic_quotes_gpc" is no longer required. We taught GOsa2 to deal with it (see /usr/share/gosa/html/main.php). By Questor */
/* Automatic quoting must be turned on */
/* $N = "magic_quotes_gpc = <b>"._("On")."</b>";
$D = _("Increase your server security by setting magic_quotes_gpc to 'on'. PHP will escape all quotes in strings in this case.");
$S = _("Search for 'magic_quotes_gpc' in your php.ini and set it to 'On'.");
$R = ini_get('magic_quotes_gpc');
$M = TRUE;
$this->config_checks[] = array("NAME" => $N , "DESC" => $D , "RESULT" => $R , "SOLUTION" => $S , "MUST" => $M ); */
HEREDOC
ESCAPED_STRING=$(echo "'${STRING_TO_ESCAPE}'" | sed 's/[]\/$*.^|[]/\\&/g')
ESCAPED_STRING=${ESCAPED_STRING%?}
REPLACE_STRING=${ESCAPED_STRING#?}
# Do the replace.
STRING_TO_MODIFY=$(cat file_name.txt)
MODIFIED_STRING=$(echo "'${STRING_TO_MODIFY}'" | sed "s/$TARGET_STRING/$REPLACE_STRING/g")
MODIFIED_STRING=${MODIFIED_STRING%?}
MODIFIED_STRING=${MODIFIED_STRING#?}
echo "$MODIFIED_STRING"
Thanks! =D
This escape process must include handling for multi-line strings pattern.
I think you're barking up the wrong tree. If you're trying to match a multiline pattern then the most significant problem is not how to escape the pattern, but rather how to write a sed script that will successfully match anything to it.
The problem is that sed reads input one line at a time. There are various ways to collect multiple lines and to operate on such collections, but you need to do that explicitly in the program. sed is therefore a poor choice for attempting to match arbitrary multiline text. To make your task feasible, you would want to know how many lines the pattern will contain, so as to write your sed program to be specific for that. Even then, this might be a better job for Perl.
Update:
Because I like sed, however, here's an example of how you could write a sed program that matches multiline patterns:
#!/bin/sed -f
# Build up a three-line window in the pattern space
:a
/\(.*\
\)\{2\}/! { N; ba; }
# A(nother) multiline pattern. If the pattern fails to match then the
# first line of the pattern space is printed and deleted, then
# we loop back to reload.
/^The\
quick\
brown$/! { P; D; ba; }
# Do whatever we want to do in the event of a match
s/brown/red/
# If control reaches here then the whole pattern space is printed,
# and if any input lines remain then we start again from the beginning
# with an initially-empty pattern space.
Example input:
$ ./ml.sed <<EOF
The
quick
brown
fox
jumped over
the lazy dog.
EOF
Output:
The
quick
red
fox
jumped over
the lazy dog.
Note well that newlines are matched as ordinary characters, but that literal newlines in pattern or replacement text need to be escaped in the normal way, for syntactic reasons.
Update 2:
Here's a variation that replaces appearances of the three-line sequence
brown
fox
jumped over
with the three-line sequence
red
pig
is fat
Of course there are many other ways to accomplish the same thing with sed, and one of the others might be preferable to this for your particular purposes.
#!/bin/sed -f
:a
/\(.*\
\)\{2\}/! { N; ba; }
/^brown\
fox\
jumped over$/! { P; D; ba; }
s/.*/red\
pig\
is fat/

How to escape special chars powershell

I am using the code below to send some keys to automate some process in my company.
$wshell = New-Object -ComObject wscript.shell;
$wshell.SendKeys("here comes my string");
The problem is that the string that gets sent must be sanitized to escape some special chars as described here.
For example: {, [, +, ~ all those symbols must be escaped like {{}, {[}, {+}, {~}
So I am wondering: is there any easy/clean way to do a replace in the string? I don't want to use tons of string.replace("{","{{}"); string.replace("[","{[}")
What is the right way to do this?
You can use a Regular Expression (RegEx for short) to do this. RegEx is used for pattern matching, and works great for what you need. Ironically, you will need to escape the characters for RegEx before defining the RegEx pattern, so we'll make an array of the special characters, escape them, join them all with | (which indicates OR), and then replace on that with the -replace operator.
$SendKeysSpecialChars = '{','}','[',']','~','+','^','%'
$ToEscape = ($SendKeysSpecialChars|%{[regex]::Escape($_)}) -join '|'
"I need to escape [ and } but not # or !, but I do need to for %" -replace "($ToEscape)",'{$1}'
That produces:
I need to escape {[} and {}} but not # or !, but I do need to for {%}
Just put the first two near the beginning of the script, then use the replace as needed. Or make a function that you can call that'll take care of the replace and the SendKeys call for you.
You can use Here Strings.
Note: Here Strings were designed for multi-line strings, but you can still use them to escape expression characters.
As stated on this website.
A here string is a single-quoted or double-quoted string which can
span multiple lines. Expressions in single-quoted strings are not
evaluated.
All the lines in a here-string are interpreted as strings,
even though they are not enclosed in quotation marks.
Example:
To declare a here string you have to use a new-line for the text
itself, Powershell syntax.
$string = @'
{ [ + ~ ! £ $ % ^ & ( ) _ - # ~ # '' ""
'@
Output: { [ + ~ ! £ $ % ^ & ( ) _ - # ~ # '' ""

Replace and increment letters and numbers with awk or sed

I have a string that contains
fastcgi_cache_path /var/run/nginx-cache15 levels=1:2 keys_zone=MYSITEP:100m inactive=60m;
One of the goals of this script is to increment the two-digit number after nginx-cache, based on the value found in the previous file. To do that I used this code:
# Replace cache_path
PREV=$(ls -t /etc/nginx/sites-available | head -n1) #find the previous cache_path number
CACHE=$(grep fastcgi_cache_path $PREV | awk '{print $2}' |cut -d/ -f4) #take the string to change
SUB=$(echo $CACHE |sed "s/nginx-cache[0-9]*[0-9]/&#/g;:a {s/0#/1/g;s/1#/2/g;s/2#/3/g;s/3#/4/g;s/4#/5/g;s/5#/6/g;s/6#/7/g;s/7#/8/g;s/8#/9/g;s/9#/#0/g;t a};s/#/1/g") #increment number
sed -i "s/nginx-cache[0-9]*/$SUB/g" $SITENAME #replace number
Maybe not so elegant, but it works.
The other goal is to increment the last letter of all occurrences of MYSITEx (MYSITEP, in this case, should become MYSITEQ, and so on), and once MYSITEZ is reached, to add another letter, like MYSITEAA, MYSITEAB, etc.
I thought something like:
sed -i "s/MYSITEP[A-Z]*/MYSITEGG/g" $SITENAME
but it can't work because MYSITEGG is a static value.
How can I calculate the last letter, increment it to the next one and, once Z is reached, add another letter?
Thank you!
Perl's autoincrement works on letters as well as digits, in exactly the manner you describe.
We may as well tidy your nginx-cache increment while we're at it.
I assume SITENAME holds the name of the file to be modified?
It would look like this. I have to assign the capture $1 to an ordinary variable $n to increment it, as $1 is read-only:
perl -i -pe 's/nginx-cache\K(\d+)/ ++($n = $1) /e; s/MYSITE\K(\w+)/ ++($n = $1) /e;' $SITENAME
If you wish, this can be done in a single substitution, like this
perl -i -pe 's/(?:nginx-cache|MYSITE)\K(\w+)/ ++($n = $1) /ge' $SITENAME
Note: The solution below is needlessly complicated, because as Borodin's helpful answer demonstrates (and @stevesliva's comment on the question hinted at), Perl directly supports incrementing letters alphabetically in the manner described in the question, by applying the ++ operator to a variable containing a letter (sequence); e.g.:
$ perl -E '$letters = "ZZ"; say ++$letters'
AAA
The solution below may still be of interest as an annotated showcase of how Perl's power can be harnessed from the shell, showing techniques such as:
use of s///e to determine the replacement string with an expression.
splitting a string into a character array (split //, "....")
use of the ord and chr functions to get the codepoint of a char., and convert a(n incremented) codepoint back to a char.
string replication (x operator)
array indexing and slices:
getting an array's last element ($chars[-1])
getting all but the last element of an array (@chars[0..$#chars-1])
A perl solution (in effect a re-implementation of what ++ can do directly):
perl -pe 's/\bMYSITE\K([A-Z]+)/
@chars = split qr(), $1; $chars[-1] eq "Z" ?
"A" x (1 + scalar @chars)
:
join "", @chars[0..$#chars-1], chr (1 + ord $chars[-1])
/e' <<'EOF'
...=MYSITEP:...
...=MYSITEZP:...
...=MYSITEZZ:...
EOF
yields:
...=MYSITEQ:... # P -> Q
...=MYSITEZQ:... # ZP -> ZQ
...=MYSITEAAA:... # ZZ -> AAA
You can use perl's -i option to replace the input file with the result
(perl -i -pe '...' "$SITENAME").
As Borodin's answer demonstrates, it's not hard to solve all tasks in the question using perl alone.
The s function's /e option allows use of a Perl expression for determining the replacement string, which enables sophisticated replacements:
$1 references the current MYSITE suffix in the expression.
@chars = split qr(), $1 splits the suffix into a character array.
$chars[-1] eq "Z" tests if the last suffix char. is Z
If so: The suffix is replaced with all As, with an additional A appended
("A" x (1 + scalar @chars)).
Otherwise: The last suffix char. is replaced with the following letter in the alphabet
(join "", @chars[0..$#chars-1], chr (1 + ord $chars[-1]))

remove only *some* fullstops from a csv file

If I have lines like the following:
1,987372,987372,C,T,.,.,.,.,.,.,.,.,1,D,.,.,.,.,.,.,.,1.293,12.23,0.989,0.973,D,.,.,.,.,0.253,0,4.08,0.917,1.048,1.000,1.000,12.998
1,987393,987393,C,T,.,.,.,.,.,.,.,.,1,D,.,.,.,.,.,.,0.152,1.980,16.09,0.999,0.982,D,-0.493,T,0.335,T,0.696,0,5.06,0.871,0.935,0.998,0.997,16.252
how can I replace all instances of ,., with ,?,
I want to preserve actual decimal places in the numbers so I can't just do
sed 's/./?/g' file
however when doing:
sed 's/,.,/,?,/g' file
this only appears to work in some cases. i.e. there are still instances of ,., hanging around.
anyone have any pointers?
Thanks
This should work :
sed ':a;s/,\.,/,?,/g;ta' file
With successive ,., strings, after a substitution succeeds, the next character to be processed is the following ., which doesn't match the pattern because its leading comma was already consumed, so you need a second pass.
:a is a label for upcoming loop
,\., will match a dot between commas. Note that the dot must be escaped because an unescaped . matches any character (,., would also match ,a,).
g is for global substitution
ta tests previous substitution and if it succeeded, loops to :a label for remaining substitutions.
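The effect of the loop can be seen on a short made-up line with three consecutive dots. The function names below are only for illustration, and the loop is written with separate -e options, which should also be accepted by sed implementations that do not allow a semicolon after a label.

```shell
# Single global substitution: overlapping ",.," matches are missed.
one_pass() { sed 's/,\.,/,?,/g'; }

# Same substitution repeated until nothing changes any more.
looped() { sed -e ':a' -e 's/,\.,/,?,/g' -e 'ta'; }

printf 'a,.,.,.,b\n' | one_pass   # a,?,.,?,b
printf 'a,.,.,.,b\n' | looped     # a,?,?,?,b
```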
Using sed this is possible by running a loop, as shown in the answer above; however, the problem is easily solved using the perl command line with lookarounds:
perl -pe 's/(?<=,)\.(?=,)/?/g' file
1,987372,987372,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,?,1.293,12.23,0.989,0.973,D,?,?,?,?,0.253,0,4.08,0.917,1.048,1.000,1.000,12.998
1,987393,987393,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,0.152,1.980,16.09,0.999,0.982,D,-0.493,T,0.335,T,0.696,0,5.06,0.871,0.935,0.998,0.997,16.252
This command doesn't need a loop because instead of matching surrounding commas we're just asserting their position using a lookbehind and lookahead.
All that's necessary is a single substitution
$ perl -pe 's/,\.(?=,)/,?/g' dots.csv
1,987372,987372,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,?,1.293,12.23,0.989,0.973,D,?,?,?,?,0.253,0,4.08,0.917,1.048,1.000,1.000,12.998
1,987393,987393,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,0.152,1.980,16.09,0.999,0.982,D,-0.493,T,0.335,T,0.696,0,5.06,0.871,0.935,0.998,0.997,16.252
You have an example using sed style regular expressions. I'll offer an alternative - parse the CSV, and then treat each thing as a 'field':
#!/usr/bin/perl
use strict;
use warnings;
#iterate input row by row
while ( <DATA> ) {
#remove linefeeds
chomp;
#split this row on ,
my @row = split /,/;
#iterate each field
foreach my $field ( @row ) {
#replace this field with "?" if it's "."
$field = "?" if $field eq ".";
}
#stick this row together again.
print join( ",", @row ), "\n";
}
__DATA__
1,987372,987372,C,T,.,.,.,.,.,.,.,.,1,D,.,.,.,.,.,.,.,1.293,12.23,0.989,0.973,D,.,.,.,.,0.253,0,4.08,0.917,1.048,1.000,1.000,12.998
1,987393,987393,C,T,.,.,.,.,.,.,.,.,1,D,.,.,.,.,.,.,0.152,1.980,16.09,0.999,0.982,D,-0.493,T,0.335,T,0.696,0,5.06,0.871,0.935,0.998,0.997,16.252
This is more verbose than it needs to be, to illustrate the concept. This could be reduced down to:
perl -F, -lane 'print join ",", map { $_ eq "." ? "?" : $_ } @F'
If your CSV also has quoting, then you can break out the Text::CSV module, which handles that neatly.
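The same field-by-field idea can also be sketched with awk from the shell, for simple CSV without quoted fields. The function name dots_to_qmarks is an assumption for this example.

```shell
# dots_to_qmarks: reads comma-separated lines on stdin and replaces
# every field that is exactly "." with "?".
dots_to_qmarks() {
    awk 'BEGIN { FS = OFS = "," }
         { for (i = 1; i <= NF; i++) if ($i == ".") $i = "?"; print }'
}

printf '1,.,.,1.293,D,.\n' | dots_to_qmarks   # 1,?,?,1.293,D,?
```

Because whole fields are compared, a dot in the first or last column is handled as well, which the ,., patterns do not cover.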
You just need 2 passes since the trailing , found on a ,., match isn't available to match the leading , on the next ,.,:
$ sed 's/,\.,/,?,/g; s/,\.,/,?,/g' file
1,987372,987372,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,?,1.293,12.23,0.989,0.973,D,?,?,?,?,0.253,0,4.08,0.917,1.048,1.000,1.000,12.998
1,987393,987393,C,T,?,?,?,?,?,?,?,?,1,D,?,?,?,?,?,?,0.152,1.980,16.09,0.999,0.982,D,-0.493,T,0.335,T,0.696,0,5.06,0.871,0.935,0.998,0.997,16.252
The above will work in any sed on any OS.
