Perl difference between 2 paths - windows

How can I get the difference between two paths?
We have a $src variable that is defined as the base path, and we're getting the list of modified files into FilesList.txt, for example:
$src = "C:\\Users\\Desktop\\Perl\\Index"
$_ = "C:\Users\Desktop\Perl\Index\CC\Login.jsp";
Now, how can we get the value "CC\Login.jsp"? I'm using the code below, but we're not getting the expected output. Please help.
$src="C:\\Users\\Desktop\\Perl\\Index";
open IN, "FilesList.txt";
while(<IN>)
{
    chomp($_);
    $final=$_;
    $final =~ s/\$src//;
    print "\nSubvalue is ---$final \n";
}

Don't use regex patterns to handle path strings. There are multiple different representations of equivalent paths, and the strings may not match. A regex also pays no attention to path separators, so it will not correct for a trailing separator on the base path, and it may match partial path steps like C:\Users\Desktop\Perl\Ind, leaving ex\CC\Login.jsp, which is clearly wrong.
You need the abs2rel function from File::Spec::Functions.
Like this:
use strict;
use warnings 'all';
use feature 'say';
use File::Spec::Functions 'abs2rel';
my $src = 'C:\Users\Desktop\Perl\Index';
for ( 'C:\Users\Desktop\Perl\Index\CC\Login.jsp' ) {
say abs2rel($_, $src);
}
output
CC\Login.jsp
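If you want to apply the same thing to the loop that reads FilesList.txt, a minimal sketch (assuming FilesList.txt holds one full path per line, as in the question) could look like this:
use strict;
use warnings 'all';
use feature 'say';
use File::Spec::Functions 'abs2rel';

my $src = 'C:\Users\Desktop\Perl\Index';

# Read each full path from FilesList.txt and print it relative to $src
open my $in, '<', 'FilesList.txt' or die "Cannot open FilesList.txt: $!";
while ( my $path = <$in> ) {
    chomp $path;
    say abs2rel( $path, $src );
}
close $in;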

Search for a specific file name in a specific folder in laravel

So, what I basically want to do is to search for all files that start with "dm" or end with ".tmp" in storage_path("app/public/session").
I already tried File::allFiles() and File::files(), but what I get is all the files in that session folder, and I can't figure out how to filter them. What I could find here are questions on how to empty a folder, but that's not what I am looking for. Thanks.
Try this code :
$files = File::allFiles(storage_path("app/public/session"));
$files = array_filter($files, function ($file) {
return (strpos($file->getFilename(), 'dm') === 0) || (substr($file->getFilename(), -4) === '.tmp');
});
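Since array_filter() preserves the original keys, you may want to re-index the result; a small, hypothetical usage of the filtered list (the entries are the SplFileInfo objects returned by File::allFiles()) could be:
$files = array_values($files);

foreach ($files as $file) {
    echo $file->getFilename() . PHP_EOL;
}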
Or you can use the glob function like this :
$files = array_merge(
glob(storage_path("app/public/session/dm*")),
glob(storage_path("app/public/session/*.tmp"))
);
In Laravel, you can use the File facade's glob() method to search for files that match a certain pattern. The glob() function searches for all the pathnames matching a specified pattern according to the rules used by the libc glob() function, which is similar to the rules used by common shells.
You can use the glob() method to search for files that start with "dm" or end with ".tmp" in the "app/public/session" directory like this:
use Illuminate\Support\Facades\File;
$storagePath = storage_path("app/public/session");
// Find files that start with "dm"
$files = File::glob("$storagePath/dm*");
// Find files that end with ".tmp"
$files = File::glob("$storagePath/*.tmp");
You can also use the ? and [] wildcard characters (? matches any single character, and [] matches one character out of the set between the square brackets) to search for files that match more specific patterns, like this:
// Find files that start with "dm" and end with ".tmp"
$files = File::glob("$storagePath/dm*.tmp");
Note that the File::glob() method returns an array of matched paths; you can loop over it and inspect the files, or use it according to your needs.
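For instance, a minimal sketch (assuming the same "app/public/session" path) that merges both patterns and lists the matching file names:
use Illuminate\Support\Facades\File;

$storagePath = storage_path("app/public/session");

// Combine both patterns and loop over the matched paths
$files = array_merge(
    File::glob("$storagePath/dm*"),
    File::glob("$storagePath/*.tmp")
);

foreach ($files as $path) {
    echo basename($path) . PHP_EOL;
}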

Insert string into multiple filenames

I have multiple files named in this format:
Fat1920OVXPlacebo_S20_R1_001.fastq
Kidney1235SHAM_S65_R1_001.fastq
Kidney1911OVXPlacebo_S94_R2_001.fastq
Liver1289OVXEstrogen_S24_R2_001.fastq
I need to insert the string "L1000_" into their names so that they read
Fat1920OVXPlacebo_S20_L1000_R1_001.fastq
Kidney1235SHAM_S65_L1000_R1_001.fastq
Kidney1911OVXPlacebo_S94_L1000_R2_001.fastq
Liver1289OVXEstrogen_S24_L1000_R2_001.fastq
I apologize but I have absolutely no experience in coding in powershell. The closest thing I could find to do this was a script that renames the entire file:
Set objFso = CreateObject("Scripting.FileSystemObject")
Set Folder = objFSO.GetFolder("ENTER\PATH\HERE")
For Each File In Folder.Files
    sNewFile = File.Name
    sNewFile = Replace(sNewFile, "ORIGINAL", "REPLACEMENT")
    if (sNewFile <> File.Name) then
        File.Move(File.ParentFolder + "\" + sNewFile)
    end if
Next
However, I just need to insert a string at a specific place in the file's name. I have 257 files and do not want to go one by one. Does anyone have an idea of how to do this in Windows?
Use Get-ChildItem to enumerate the files of interest, pipe them to Rename-Item, and use a delay-bind script block ({ ... }) to dynamically determine the new name, via a regex-based -replace operation.
(Get-ChildItem $yourFolder -Filter *.fastq) |
Rename-Item -NewName { $_.Name -replace '(?<=_S\d+_)', 'L1000_' } -WhatIf
Note:
• The -WhatIf common parameter in the command above previews the operation. Remove -WhatIf once you're sure the operation will do what you want.
• Even though not strictly necessary in this case, enclosing the Get-ChildItem command in (...), the grouping operator, ensures that already renamed files don't accidentally re-enter the enumeration of files to be renamed - see this answer.
• (?<=_S\d+_) uses a positive look-behind assertion ((?<=...)) to match the verbatim string _S, followed by one or more (+) digits (\d), followed by a verbatim _.
Since the look-behind assertion merely matches a position in the string rather than a substring, the replacement operand, verbatim L1000_ in this case, is inserted at that position in (a copy of) the input string.
For a more detailed explanation of the delay-bind script-block technique, see this answer.
here's one way to do that with PoSh. note that the demo does not handle either the rename or directory related stuff. it ONLY handles generating the new file names.
what it does ...
• fakes reading in a list of fileinfo objects
  (when ready to do this for real, replace the entire #region/#endregion block with a call to Get-ChildItem and save it to $FileList)
• sets the text to be inserted
• iterates thru the file list
• splits the file .Name property on the underscores
• saves that to a $Var
• adds the 1st two splits, the insertion text, and the last two splits to a new array
• joins that array with an underscore as the delimiter
• sends the new file name to the $Result collection
• displays the list of new names
the code ...
#region - fake reading in a list of files
# in real life, use Get-ChildItem
$FileList = @(
    [system.io.fileinfo]'Fat1920OVXPlacebo_S20_R1_001.fastq'
    [system.io.fileinfo]'Kidney1235SHAM_S65_R1_001.fastq'
    [system.io.fileinfo]'Kidney1911OVXPlacebo_S94_R2_001.fastq'
    [system.io.fileinfo]'Liver1289OVXEstrogen_S24_R2_001.fastq'
)
#endregion - fake reading in a list of files
$InsertionText = 'L1000'
$Result = foreach ($FL_Item in $FileList)
{
    $FLI_Parts = $FL_Item.Name.Split('_')
    ($FLI_Parts[0,1] + $InsertionText + $FLI_Parts[2,3]) -join '_'
}
$Result
output ...
Fat1920OVXPlacebo_S20_L1000_R1_001.fastq
Kidney1235SHAM_S65_L1000_R1_001.fastq
Kidney1911OVXPlacebo_S94_L1000_R2_001.fastq
Liver1289OVXEstrogen_S24_L1000_R2_001.fastq
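If you want to take the same splitting approach all the way to an actual rename on disk, a rough sketch (assuming the files live in a hypothetical $yourFolder; adjust the path to the real location) might look like this:
$yourFolder = 'C:\path\to\fastq'   # hypothetical folder
$InsertionText = 'L1000'

Get-ChildItem -Path $yourFolder -Filter *.fastq | ForEach-Object {
    # split the name on underscores, insert the new piece after the second part, rejoin
    $parts   = $_.Name.Split('_')
    $newName = ($parts[0,1] + $InsertionText + $parts[2,3]) -join '_'
    Rename-Item -LiteralPath $_.FullName -NewName $newName -WhatIf
}
Remove -WhatIf once the preview looks right.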
Using PowerShell, you could use a regular expression to rename the files. Example:
Get-ChildItem "C:\foldername\here\*.fastq" | ForEach-Object {
$oldName = $_.Name
$newName = [Regex]::Replace($oldName,'(S\d+)_(R\d+)','$1_L1000_$2')
Rename-Item $_ $newName -WhatIf
}
[Regex] is a PowerShell type accelerator for the .NET Regex class, and Replace is the Regex method that performs text substitutions. The first parameter to the Replace method is the input string (the old filename), the second parameter is the regular expression pattern (run help about_Regular_Expressions for more information), and the third parameter is the replacement string pattern, where $1 is the first capture group in ( ) and $2 is the second. Finally, the Rename-Item cmdlet renames the files. Remove the -WhatIf parameter if the output looks correct to actually perform the renames.

Checking really fast if one of many strings exists in one of many other strings, in Perl

Let's say I have a set of 100,000 strings, each one around 20-50 bytes on average.
Then let's say I have another set of 100,000,000 strings, each one also around 20-50 bytes on average.
I'd like to go through all 100,000,000 strings from the second set and check if any one of the strings from the first set exist in any one string from the second set.
Example: string from first set: "abc", string from second set: "123abc123" = match!
I've tried using Perl's index(), but it's not fast enough. Is there a better way to do this type of matching?
I found Algorithm::AhoCorasick::XS on CPAN, which implements the classic, very efficient multiple-string-search Aho-Corasick algorithm (the same one used by grep -F), and it seems to be reasonably fast (about half the speed of an equivalent grep invocation):
Example script:
#!/usr/bin/env perl
use warnings;
use strict;
use autodie;
use feature qw/say/;
use Algorithm::AhoCorasick::XS;
open my $set1, "<", "set1.txt";
my @needles = <$set1>;
chomp @needles;
my $search = Algorithm::AhoCorasick::XS->new(\@needles);
open my $set2, "<", "set2.txt";
while (<$set2>) {
    chomp;
    say if defined $search->first_match($_);
}
and using it (With randomly-generated test files):
$ wc -l set1.txt set2.txt
10000 set1.txt
500000 set2.txt
510000 total
$ time perl test.pl | wc -l
458414
real 0m0.403s
user 0m0.359s
sys 0m0.031s
$ time grep -Ff set1.txt set2.txt | wc -l
458414
real 0m0.199s
user 0m0.188s
sys 0m0.031s
You should use a regex alternation, like:
my @string = qw/abc def ghi/;
my $matcher = qr/@{[join '|', map quotemeta, sort @string]}/;
This should be faster than using index. But it can be made faster yet:
Up to a certain limit, depending on both the length and number of the strings, Perl will build a trie for efficient matching; see e.g. https://perlmonks.org/?node_id=670558. You will want to experiment with how many strings you can include in a single regex, and generate an array of regexes accordingly. Then combine those separate regexes into a single one (untested):
my @search_strings = ...;
my @matchers;
my $string_limit = 3000; # a guess on my part
my @strings = sort @search_strings;
while (my @subset = splice @strings, 0, $string_limit) {
    push @matchers, qr/^.*?@{[join '|', map quotemeta, sort @subset]}/s;
}
my $matcher = '(?:' . join('|', map "(??{\$matchers[$_]})", 0..$#matchers) . ')';
$matcher = do { use re 'eval'; qr/$matcher/ };
/$matcher/ and print "line $. matched: $_" while <>;
The (??{...}) construct is needed to join the separate regexes; without it, the subregexes are all just interpolated and the joined regex compiled all together, removing the trie optimization. Each subregex starts with ^.*? so it searches the entire string; without that, the joined regex would have to invoke each subregex separately for each position in the string.
Using contrived data, I'm seeing about 3000 strings searched per second with this approach in a not very fast vm. Using the naive regex is less than 50 strings per second. Using grep, as suggested in a comment by Shawn, is faster (about 4200 strings per second for me) but gives you less control if you want to do things like identify which strings matched or at what positions.
You may want to have a look at https://github.com/leendo/hello-world .
Its parallel processing makes it really fast. Either just type in all search terms individually or as || conjunctions, or (better) adapt it to run the second set programmatically in one go.
Here is an idea: you could partition the dictionary into lists of words that have the same 2 or 3 letter prefix. You would then iterate on the large set and for each position in each string, extract the prefix and try and match the strings that have this prefix.
You would use a hashtable to store the lists, giving O(1) lookup time.
If some words in the dictionary are shorter than the prefix length, you would have to special case short words.
Making prefixes longer will make the hashtable larger but the lists shorter, improving the test time.
I have no clue if this can be implemented efficiently in Perl.
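For what it's worth, a rough Perl sketch of that idea (hypothetical sample data, fixed 3-character prefixes, and no special-casing of needles shorter than the prefix) could look like this:
use strict;
use warnings;
use feature 'say';

my $prefix_len = 3;

# Partition the dictionary by its first $prefix_len characters
my @needles = qw(abc abcd xyzzy);   # hypothetical small dictionary
my %by_prefix;
push @{ $by_prefix{ substr $_, 0, $prefix_len } }, $_ for @needles;

# For each position in each haystack, look up the prefix and test
# only the candidate needles that share it
for my $haystack ('123abc123', 'no match here') {
    POS: for my $pos ( 0 .. length($haystack) - $prefix_len ) {
        my $candidates = $by_prefix{ substr $haystack, $pos, $prefix_len }
            or next;
        for my $needle (@$candidates) {
            if ( substr( $haystack, $pos, length $needle ) eq $needle ) {
                say "'$haystack' matches '$needle'";
                last POS;
            }
        }
    }
}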
The task is quite simple and is probably performed on an everyday basis around the globe.
Please see the following code snippet for one of many possible implementations.
use strict;
use warnings;
use feature 'say';
use Getopt::Long qw(GetOptions);
use Pod::Usage;
my %opt;
my $re;
GetOptions(
    'sample|s=s' => \$opt{sample},
    'debug|d'    => \$opt{debug},
    'help|h'     => \$opt{help},
    'man|m'      => \$opt{man}
) or pod2usage(2);
pod2usage(1) if $opt{help};
pod2usage(-verbose => 2) if $opt{man};
pod2usage("$0: No files given.") if ((#ARGV == 0) && (-t STDIN));
$re = read_re($opt{sample});
say "DEBUG: search pattern is ($re)" if $opt{debug};
find_in_file($re);
sub find_in_file {
    my $search = shift;

    while( <> ) {
        chomp;
        next unless /$search/;
        say;
    }
}
sub read_re {
    my $filename = shift;

    open my $fh, '<', $filename
        or die "Couldn't open $filename";
    my @data = <$fh>;
    close $fh;

    chomp @data;
    my $re = join('|', @data);

    return $re;
}
__END__

=head1 NAME

file_in_file.pl - search strings of one file in another

=head1 SYNOPSIS

    file_in_file.pl [options] -s sample.txt dbfile.txt

    Options:
        -s,--sample     search pattern file
        -d,--debug      debug flag
        -h,--help       brief help message
        -m,--man        full documentation

=head1 OPTIONS

=over 4

=item B<-s,--sample>

Search pattern file

=item B<-d,--debug>

Print debug information.

=item B<-h,--help>

Prints a brief help message and exits.

=item B<-m,--man>

Prints the manual page and exits.

=back

B<This program> searches patterns from B<sample> file in B<dbfile>

=cut

Are there any other uses of parenthesis in powershell?

As someone new to the PowerShell world, I sometimes get stuck on its tricky syntax. That's why I'm trying to figure out all the possible uses of parentheses in the language.
Do you know any more? Can you add them here?
Here are mine (I left out the basic use of curly braces in pipelines and round brackets in method calls):
# empty array
$myarray = @()
# empty hash
$myhash = @{}
# empty script block
$myscript = {}
# variables with special characters
${very strange variable # stack !! overflow ??}="just an example"
# Single statement expressions
(ls -filter $home\bin\*.ps1).length
# Multi-statement expressions inside strings
"Processes: $($p = “a*”; get-process $p )"
# Multi statement array expression
@( ls c:\; ls d:\)
Cause a statement to yield a result in an expression:
($x=3) + 5 # yields 8
When using generics, you need to wrap the type in [..]:
New-Object Collections.Generic.LinkedList[string]
For some people this might look confusing, because it is similar to indexing in arrays.
The Param( ) statement (in a function, script, or scriptblock)
Around the condition in an If (or Elseif statement)
Around the expression in a switch statement.
Edit: Forgot the condition in the while statement.
Edit2: Also, $() for subexpressions (e.g. in strings).
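To make those concrete, here is a small illustrative sketch (the function name Test-Parens is made up) that uses Param( ), an if condition, a while condition, a switch expression, and a $() subexpression inside a string:
function Test-Parens {
    Param([int]$Count = 3)

    if ($Count -gt 0) {
        while ($Count -gt 0) {
            switch ($Count) {
                1       { "one" }
                default { "Count is $($Count * 10) after scaling" }
            }
            $Count--
        }
    }
}

Test-Parens -Count 2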
Regular expressions are arguably a first-class construct in Powershell.
If we're compiling a complete list, we can include the role that square and round brackets play in regular expressions.
An example:
$obj.connectionString = $obj.connectionString -replace '(Data Source)=[^;]+', '$1=serverB\SQL2008_R2'
Because of the support for XML, you can go so far as to include the square brackets used in XPath. (That's really drawing a long bow though :-)
select-xml $config -xpath "./configuration/connectionStrings/add[@name='LocalSqlServer']"
This is already mentioned, though not clearly enough, in the first short list (under "Multi-statement expressions inside strings"), but I'll add:
# Var property inside a string
$a = get-process a*
write-host "Number of process : $a.length" # Get a list of process and then ".length
Number of process : System.Diagnostics.Process (accelerometerST) System.Diagnostics.Process (AEADISRV) System.Diagnostics.Process (agr64svc).length
write-host "Number of process : $($a.length)" # To get correct number of process
Number of process : 3
Parentheses are most powerful.
Suppose you want to collect all the output of some script block, including errors, and redirect it to a variable or to another function that handles it... With parentheses, this is an easy task:
$customScript = { "This i as test"; This will be procedure error! }
(. $customScript 2>&1 ) | %{"CAPTURING SCRIPT OUTPUT: "+$_}

How do I parse YAML with nil values?

I apologize for the very specific issue I'm posting here but I hope it will help others that may also run across this issue. I have a string that is being formatted to the following:
[[,action1,,],[action2],[]]
I would like to translate this to valid YAML so that it can be parsed which would look like this:
[['','action1','',''],['action2'],['']]
I've tried a bunch of regular expressions to accomplish this but I'm afraid that I'm at a complete loss. I'm ok with running multiple expressions if needed. For example (ruby):
puts s.gsub!(/,/,"','") # => [[','action1','',']','[action2]','[]]
puts s.gsub!(/\[',/, "['',") # => [['','action1','',']','[action2]','[]]
That's getting there, but I have a feeling I'm starting to go down a rat-hole with this approach. Is there a better way to accomplish this?
Thanks for the help!
This does the job for the empty fields (ruby1.9):
s.gsub(/(?<=[\[,])(?=[,\]])/, "''")
Or for ruby1.8, which doesn't support zero-width look-behind:
s.gsub(/([\[,])(?=[,\]])/, "\\1''")
Quoting non-empty fields can be done with one of these:
s.gsub(/(?<=[\[,])\b|\b(?=[,\]])/, "'")
s.gsub(/(\w+)/, "'\\1'")
In the above I'm making use of zero-width positive look behind and zero-width positive look ahead assertions (the '(?<=' and '(?=').
I've looked for some Ruby-specific documentation but could not find anything that explains these features in particular. Instead, please let me refer you to perlre.
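Putting the two steps together, a small sketch (Ruby 1.9+, since it relies on the look-behind; it uses the second variant above for the non-empty fields) that quotes the string and feeds it to the YAML parser:
require 'yaml'

s = "[[,action1,,],[action2],[]]"

# quote the empty fields, then quote the non-empty fields
quoted = s.gsub(/(?<=[\[,])(?=[,\]])/, "''")
quoted = quoted.gsub(/(\w+)/, "'\\1'")

puts quoted            # => [['','action1','',''],['action2'],['']]
p YAML.load(quoted)    # => [["", "action1", "", ""], ["action2"], [""]]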
It would be easier to just parse it, then output valid YAML.
Since I don't know Ruby, here is an example in Perl.
Since you only want a subset of YAML, that appears to be similar to JSON, I used the JSON module.
I've been wanting an excuse to use Regexp::Grammars, so I used it to parse the data.
I guarantee it will work, no matter how deep the arrays are.
#! /usr/bin/env perl
use strict;
#use warnings;
use 5.010;
#use YAML;
use JSON;
use Regexp::Grammars;
my $str = '[[,action1,,],[action2],[],[,],[,[],]]';
my $parser = qr{
    <match=Array>

    <token: Text>
        [^,\[\]]*

    <token: Element>
        (?:
            <.Text>
        |
            <MATCH=Array>
        )

    <token: Array>
        \[
        (?:
            (?{ $MATCH = [qw'']; })
        |
            <[MATCH=Element]> ** (,)
        )
        \]
}x;

if( $str =~ $parser ){
    say to_json $/{match};
}else{
    die $@ if $@;
}
Which outputs.
[["","action1","",""],["action2"],[],["",""],["",[],""]]
If you really wanted YAML, just uncomment "use YAML;" and replace to_json() with Dump():
---
-
  - ''
  - action1
  - ''
  - ''
-
  - action2
- []
-
  - ''
  - ''
-
  - ''
  - []
  - ''
Try this:
s.gsub(/([\[,])(?=[,\]])/, "\\1''")
.gsub(/([\[,])(?=[^'\[])|([^\]'])(?=[,\]])/, "\\+'");
EDIT: I'm not sure about the replacement syntax. That's supposed to be group #1 in the first gsub, and the highest-numbered participating group -- $+ -- in the second.
