Search for strings from array in text file - windows

I want to search a text file for more than one string. If at least one string is found (I repeat: only one of them needs to be found, not all), I want the program to stop and create a file containing the text "found".
This is my code, which doesn't work properly:
$f = 'C:\users\datboi\desktop\dump.dmp'
$text = 'found'
$array = "_command",".command","-
command","!command","+command","^command",":command","]command","[command","#command","*command","$command","&command","#command","%command","=command","/command","\command","command!","command#","command#","command$","command%","command^","command&","command*","command-","command+","command=","command\","command/","command_","command.","command:"
$len = 9
$offset = 8
$data = [IO.File]::ReadAllBytes($f)
for ($i=0; $i -lt $data.Count - $offset; $i++) {
    $slice = $data[$i..($i+$offset)]
    $sloc = [char[]]$slice
    if ($array.Contains($sloc)){
        $text > 'command.log'
        break
    }
}
When I say it doesn't work properly I mean: it runs with no errors, but even if the file contains at least one of the strings from the array, it doesn't create the file I want.

This is literally what the Select-String cmdlet was created for. You can use a Regular Expression to simplify your search. For the RegEx I would use:
[_\.\-!\+\^:\[\]#\*\$&%=/\\]command|command[_\.\-!\+\^:\[\]#\*\$&%=/\\]
That comes down to any of the characters in the [] brackets followed by the word 'command', or the word 'command' followed by any of the characters in the [] brackets. Then just pipe that to a ForEach-Object loop that outputs to your file and breaks.
Select-String -Path $f -Pattern '[_\.\-!\+\^:\[\]#\*\$&%=/\\]command|command[_\.\-!\+\^:\[\]#\*\$&%=/\\]' | ForEach{
    $text > 'command.log'
    break
}
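A variation on the same idea, sketched with the $f and $text variables from the question: instead of breaking out of ForEach-Object (break ends the entire pipeline, which here happens to give the desired stop-after-first-match behaviour), take only the first match and write the file if one exists.
# Sketch: stop after the first matching line without using break.
# $f and $text are the variables defined in the question; the pattern is the one above.
$firstHit = Select-String -Path $f -Pattern '[_\.\-!\+\^:\[\]#\*\$&%=/\\]command|command[_\.\-!\+\^:\[\]#\*\$&%=/\\]' | Select-Object -First 1
if ($firstHit) {
    $text > 'command.log'
}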

First, I would recommend using a regular expression as you can greatly shorten your code.
Second, PowerShell is good at pattern matching.
Example:
$symbolList = '_\-:!\.\[\]#\*\/\\&#%\^\+=\$'
$pattern = '([{0}]command)|(command[{0}])' -f $symbolList
$found = Select-String $pattern "inputfile.txt" -Quiet
$found
The $symbolList variable is a regular expression pattern containing a list of characters you want to find either before or after the word "command" in your search string.
The $pattern variable uses $symbolList to create the pattern.
The $found variable will be $true if the pattern is found in the file.
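Tying this back to what the question asked for (writing "found" to command.log as soon as any pattern matches), a minimal sketch built on the same $symbolList and $pattern, with the dump path and log name taken from the question, could be:
# Sketch: write the marker file only when at least one pattern matches.
# $symbolList and $pattern are built exactly as above; paths come from the question.
$f = 'C:\users\datboi\desktop\dump.dmp'
$symbolList = '_\-:!\.\[\]#\*\/\\&#%\^\+=\$'
$pattern = '([{0}]command)|(command[{0}])' -f $symbolList
# -Quiet makes Select-String return a Boolean instead of match objects.
if (Select-String -Path $f -Pattern $pattern -Quiet) {
    'found' > 'command.log'
}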

Related

Powershell 7.x How to Select a Text Substring of Unknown Length Only Using Boundary Substrings

I am trying to extract and store a substring of a text file's content, identified by a beginning and an end boundary string. I am new to PowerShell, so my methods are simple/crude. Basically my approach has been:
Roughly get what I want from the start of the string
Worry about trimming off what I don't want later
My minimal reproducible example is as follows:
# selectStringTest.ps
$inputFile = Get-Content -Path "C:\test\test3\Copy of 31832_226140__0001-00006.txt"
# selected text string needs to span from $refName up to $boundaryName
[string]$refName = "001 BARTLETT"
[string]$boundaryName = "001 BEECH"
# a rough estimate of the text file lines required
[int]$lines = 200
if (Select-String -InputObject $inputFile -pattern $refName) {
Write-Host "Selected shortened string found!"
# this selects the start of required string but with extra text
[string]$newFileStart = $inputFile | Select-String $refName -CaseSensitive -SimpleMatch -Context 0, $lines
}
else {
Write-Host "Selected string NOT FOUND."
}
# tidy up the start of the string by removing rubbish
$newFileStart = $newFileStart.TrimStart('> ')
# this is the kind of thing I want but it doesn't work
$newFileStart = $newFileStart - $newFileStart.StartsWith($boundaryName)
$newFileStart | Out-File tempOutputFile
As it is, the output begins correctly, but I cannot remove the text from $boundaryName onward.
The original text file is OCR generated (Optical Character Recognition), so it is unevenly formatted and there are newlines in odd places, which leaves me limited options when it comes to delimiting.
I am not sure my if (Select-String -InputObject $inputFile -pattern $refName) is valid, though it appears to work correctly. The general design seems crude, in that I am guessing how many lines I will need. Finally, I have tried various methods of trimming the string from $boundaryName onward, without success. For this:
string.split() not practical
replacing spaces with newlines in an array and looping through to the elements of $boundaryName is possible, but I don't know how to terminate the array at that point before converting it back to a string.
Any suggestions would be appreciated.
Abbreviated content of the single Copy of 31832_226140__0001-00006.txt file (two sets of 200 listings) is:
Beginning of text file
________________
BARTLETT-BEDGGOOD
PENCARROW COMPOSITE ROLL
PAGE 6
PAGE 7
PENCARROW COMPOSITE ROLL
BEECH-BEST
www.
.......................
001 BARTLETT. Lois Elizabeth
Middle of text file
............. 15 St Ronans Av. Lower Hutt Marned 200 BEDGGOOD. Percy Lloyd
............15 St Ronans Av, Lower Mutt. Coachbuild
001 BEECH, Margaret ..........
End of text file
..............312 Munita Rood Eastbourne, Civil Eng 200 BEST, Dons Amy .........
..........50 Man Street, Wamuomata, Marned
SO NON
To use a regex across newlines, the file needs to be read as a single string; Get-Content -Raw does that. This assumes that you do not want the lines containing $refName and $boundaryName included in the output.
$c = Get-Content -Path '.\beech.txt' -Raw
$refName = "001 BARTLETT"
$boundaryName = "001 BEECH"
if ($c -match "(?smi).*$refName.*?`r`n(.*)$boundaryName.*?`r`n.*") {
$result = $Matches[1]
}
$result
More information at https://stackoverflow.com/a/12573413/447901
How close does this come to what you want?
function Process-File {
    param (
        [Parameter(Mandatory = $true, Position = 0)]
        [string]$HeadText,
        [Parameter(Mandatory = $true, Position = 1)]
        [string]$TailText,
        [Parameter(ValueFromPipeline)]
        $File
    )
    Process {
        $Inside = $false;
        switch -Regex -File $File.FullName {
            #'^\s*$' { continue }
            "(?i)^\s*$TailText(?<Tail>.*)`$" { $Matches.Tail; $Inside = $false }
            '^(?<Line>.+)$' { if($Inside) { $Matches.Line } }
            "(?i)^\s*$HeadText(?<Head>.*)`$" { $Matches.Head; $Inside = $true }
            default { continue }
        }
    }
}
$File = 'Copy of 31832_226140__0001-00006.txt'
#$Path = $PSScriptRoot
$Path = 'C:\test\test3'
$Result = Get-ChildItem -Path "$Path\$File" | Process-File '001 BARTLETT' '001 BEECH'
$Result | Out-File -FilePath "$Path\SpanText.txt"
This is the output:
. Lois Elizabeth
............. 15 St Ronans Av. Lower Hutt Marned 200 BEDGGOOD. Percy Lloyd
............15 St Ronans Av, Lower Mutt. Coachbuild
, Margaret ..........

batch rename files and keep the last dash

I have many many files in one folder, which look like this:
E123_1_410_4.03_97166_456_2.B.pdf
E123-1-410-4.03-97166-456_2.B.pdf
I can change all the underscores, but not just 5 of them.
$names = "AD1-D-1234-3456-01","111-D-abcd-3456-01","abc-d-efgi-jklm-no","xxx-xx-xxxx-xxxx-xx"
$names |
ForEach-Object{
$new = $_ -replace '(?x)
^ # beginning of string
( # begin group 1
[^-]{3} # a pattern of three non-hyphen characters
) # end of group 1
- # a hyphen
( # begin group 2
[^-] # a non-hyphen (one character)
- # a hyphen
[^-]{4} # a pattern of non-hyphen characters four characters in length
- # a hyphen
[^-]{4} # a pattern of non-hyphen characters four characters in length
) # end of group 2
- # a hyphen
( # begin group 3
[^-]{2} # a pattern of non-hyphen characters two characters in length
) # end of group 3
$ # end of string
', '$1_$2_$3' # put the groups back in order and insert "_" between the three groups
if ($new -eq $_){ # check to see if the substitution worked. I.e., was the pattern in $_ correct
Write-Host "Replacement failed for '$_'"
}
else{
$new
}
}
This will rename the files by replacing all underscores in the name with dashes, except for the last underscore:
(Get-ChildItem -Path 'X:\Where\The\Files\Are' -Filter '*_*.*' -File) | Rename-Item -NewName {
$prefix, $postfix = $_.Name -split '^(.+)(_[^_]+)$' -ne ''
"{0}$postfix" -f ($prefix -replace '_', '-')
} -WhatIf
I have put the Get-ChildItem inside brackets to let it finish gathering the files first. If you leave that out, there is the possibility it might pick up files that were already renamed, which is a waste of time.
The added switch -WhatIf is a safety device. This lets you see in the console window what the code would rename. If you are satisfied this is correct, remove the -WhatIf switch and run the code again so the files actually are renamed.
Examples:
X:\Where\The\Files\Are\111_D_abcd_3456_01_qqq_7C.pdf --> X:\Where\The\Files\Are\111-D-abcd-3456-01-qqq_7C.pdf
X:\Where\The\Files\Are\AD1_D-1234_3456-01_xyz_3.A.pdf --> X:\Where\The\Files\Are\AD1-D-1234-3456-01-xyz_3.A.pdf
X:\Where\The\Files\Are\E123_1_410_4.03_97166_456_2.B.pdf --> X:\Where\The\Files\Are\E123-1-410-4.03-97166-456_2.B.pdf
If you want to keep the last underscore when renaming your file, split the name on underscores, reconstruct it in a loop, and finally re-attach the last piece with an underscore. This way, whatever the number of underscores, you can replace all of them.
Working code:
$names = "E123_1_410_4.03_97166_456-test-test_2.pdf", "E123_1_410_4.03_97166_456_2.B.pdf"
$names |
ForEach-Object{
    $new = [string]::empty;
    # split on underscores
    $tab = $_.split("_");
    # do nothing if there is only one underscore or none
    if($tab.count -gt 2){
        # reconstruct the name, joining the middle parts with dashes
        $new = $tab[0];
        for($i = 1; $i -lt $tab.count - 1; $i++){
            $txt = $tab[$i];
            $new += "-" + $txt;
        }
        # re-attach the last part with an underscore
        $txt = $tab[$tab.count - 1];
        $new += "_" + $txt;
        if ($new -eq $_){ # check to see if the substitution worked, i.e. was the pattern in $_ correct
            Write-Host "Replacement failed for '$_'"
        }
        else{
            Write-Host $new;
        }
    }
}

Windows Powershell script to find and replace a string after a particular string

I am currently working to convert an AS3 class to JavaScript using a PowerShell script.
Below is the sample code that needs to be converted.
package somePackageName
{
    class someClassName
    {
        // other codes
    }
}
I need the entire package block to be removed and "class someClassName{" should be converted to "function someClassName(){".
The "someClassName" can be any string.
And I need the output like this.
function someClassName()
{
}
This is what I tried.
$l1 = Get-Content $dest | Where-Object {$_ -like 'class'}
$arr = $l1 -split ' '
$n1 = "function "+ $arr[1] + "() " +$arr[2]
(Get-Content $dest) -creplace $l1, $n1 | Set-Content $dest
I am able to achieve what I intended if the opening brace is on the same line as the package declaration line. As PowerShell checks line by line, I am stuck if the opening brace is present on the next line.
Regex based solution
Depending on your willingness to post-process this or accept leading spaces, you could use this regex to remove the block outside of the class and replace it with a function declaration. This is messier than it needs to be, but safer, since we cannot guess what // other codes is. You could just match the whole class block outright, but if there are other curly braces in there it would muddy the regex.
PS M:\> (Get-Content -Raw $dest) -replace "(?sm).*?class (\w+)(.*)}",'function $1()$2'
function someClassName()
{
// other codes
}
See Regex101 for more detail on what the regex is doing.
Basically, dump everything until the word class (first occurrence), then keep everything until the last closing brace.
Note the leading space in the captured portion. This is honoring the existing space. To account for this we need to calculate the indentation, since simply removing all leading space would break the existing indentation inside the class/function.
So a solution like this might be preferred:
# Read in the file as a single string
$raw = (Get-Content -Raw $dest)
# Using the line that has the class declaration measure the number of spaces in front of it.
[void]($raw -match "(?m)^(\s+)class")
$leadingSpacesToRemove = $Matches[1].Length
# Remove the package block. Also remove a certain amount of leading space.
$raw -replace "(?sm).*?class (\w+)(.*)}",'function $1()$2' -replace "(?m)^\s{$leadingSpacesToRemove}"
Less regex
It seems filtering out the lines with no leading spaces is an easy way to narrow down to what you want.
Get-Content $dest | Where-Object{$_.StartsWith(" ")}
From there we still need to replace the "class" and deal with the leading spaces. For those we are going to use similar solutions to what I showed above.
# Read in the file as a single string. Skipping the package wrapper since it has no leading spaces.
$classBlock = Get-Content $dest | Where-Object{$_.StartsWith(" ")}
# Get the class name and the number of leading spaces.
$classBlock[0] -match "^(\s+)class (\w+)" | Out-Null
$leadingSpacesToRemove = $matches[1].Length
$className = $matches[2]
# Output the new declaration and the trimmed block.
# Using an array to start so that piping output will be in one pipe
#("function $className()") + ($classBlock | Select -Skip 1) -replace "^\s{$leadingSpacesToRemove}"
Both solutions try to account for your exact specifications and account for the presence of weird stuff inside the class block.
I'd suggest using regex:
#class myclass -> function myclass()
(Get-Content $dest) -creplace 'class\s(.+)', 'function $1()' |
Set-Content $dest
This will capture the class declaration and replace it with a backreference to the class name capture.

Powershell: Count instances of strings in a file using a list

I am trying to get the number of times each string (varying from 40 to 400+ characters) in "file1" occurs in "file2" in an efficient way. file1 has about 2k lines and file2 has about 130k lines. I currently have a Unix solution that does it in about 2 minutes in a VM and about 5 in Cygwin, but I am trying to do it with PowerShell/Python since the files are on Windows, I use the output in Excel, and I use it with automation (AutoIt).
I have a solution, but it takes WAY too long (in about the same time that Cygwin finished all 2k lines, I had only gotten through 40-50 lines in PowerShell!).
Although I haven't prepared a solution yet, I am open to using Python as well if there is a solution that can be fast and accurate.
Here is the Unix Code:
while read SEARCH_STRING;
do printf "%s$" "${SEARCH_STRING}";
grep -Fc "${SEARCH_STRING}" file2.csv;
done < file1.csv | tee -a output.txt;
And here is the Powershell code I currently have
$Target = Get-Content .\file1.csv
Foreach ($line in $Target){
    # Just to keep strings small, since I found that not all
    # strings were being compared correctly if they were 250+ chars
    $line = $line.Substring(0,180)
    $Coll = Get-Content .\file2.csv | Select-string -pattern "$line"
    $cnt = $Coll | measure
    $cnt.count
}
Any ideas or suggestions will help.
Thanks.
EDIT
I'm trying a modified solution suggested by C.B.
del .\output.txt
$Target = Get-Content .\file1.csv
$file= [System.IO.File]::ReadAllText( "C:\temp\file2.csv" )
Foreach ($line in $Target){
$line = [string]$line.Substring(0, $line.length/2)
$cnt = [regex]::matches( [string]$file, $line).count >> ".\output.txt"
}
But since my strings in file1 vary in length, I kept getting OutOfBound exceptions from the Substring function, so I halved (/2) the input string to try to get a match. And when I try to halve them, if a string has an open parenthesis, it tells me this:
Exception calling "Matches" with "2" argument(s): "parsing "CVE-2013-0796,04/02/2013,MFSA2013-35 SeaMonkey: WebGL
crash with Mesa graphics driver on Linux (C" - Not enough )'s."
At C:\temp\script_test.ps1:6 char:5
+ $cnt = [regex]::matches( [string]$file, $line).count >> ".\output.txt ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : ArgumentException
I don't know if there is a way to raise the input limit in PowerShell (my biggest string is 406 characters at the moment, but it could be bigger in the future) or whether I should just give up and try a Python solution.
Thoughts?
EDIT
Thanks to @C.B. I got the correct answer and it matches the output of the Bash script perfectly. Here is the full code that outputs results to a text file:
$Target = Get-Content .\file1.csv
$file= [System.IO.File]::ReadAllText( "C:\temp\file2.csv" )
Foreach ($line in $Target){
$cnt = [regex]::matches( $file, [regex]::escape($line)).count >> ".\output.txt"
}
Give this a try:
$Target = Get-Content .\file1.csv
$file= [System.IO.File]::ReadAllText( "c:\test\file2.csv" )
Foreach ($line in $Target){
$line = $line.Substring(0,180)
$cnt = [regex]::matches( $file, [regex]::escape($line)).count
}
One issue with your script is that you read file2.csv over and over again, for each line from file1.csv. Reading the file just once and storing the content in a variable should significantly speed things up. Try this:
$f2 = Get-Content .\file2.csv
foreach ($line in (gc .\file1.csv)) {
$line = $line.Substring(0,180)
@($f2 | ? { $_ -match $line }).Count
}
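If the end goal is the same term-plus-count output the Unix loop produced (for example, for pasting into Excel), a sketch along the same read-once lines, assuming the file names used in the question, might be:
# Sketch: emit "search string,count" pairs, reading file2.csv only once.
# File names are the ones used in the question; adjust paths as needed.
$f2 = Get-Content .\file2.csv
Get-Content .\file1.csv | ForEach-Object {
    $term = $_
    # Escape the term so regex metacharacters (parentheses etc.) are treated literally.
    $count = @($f2 | Where-Object { $_ -match [regex]::Escape($term) }).Count
    '{0},{1}' -f $term, $count
} | Set-Content .\output.txt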

Searching Multiple Strings in Huge log files

Powershell question
Currently I have 5-10 log files, each about 20-25 GB, and I need to search through each of them to check whether any of 900 different search parameters match. I have written a basic PowerShell script that searches through the whole log file for one search parameter; if it matches, it dumps the results into a separate text file. The problem is that it is pretty slow. I was wondering if there is a way to speed this up, by making it search for all 900 parameters at once while only looking through the log once. Any help would be good, even if it's just improving the script.
Basic overview:
1 csv file with all the 900 items listed under an "item" column
1 log file (.txt)
1 result file (.txt)
1 ps1 file
Here is the code I have for PowerShell in a PS1 file:
$search = "filepath to csv file"
$log = "filepath to log file"
$result = "file path to result text file"
$list = import-csv $search
foreach ($address in $list) {
    Get-Content $log | Select-String $address.item | add-content $result
    # below is just a rudimentary counter of how far through searching it is
    $i = $i + 1
    echo $i
}
900 search terms is quite a large group. Can you reduce its size by using regular expressions? A trivial solution is based on reading the file row by row and looking for matches. Set up a collection that contains regexes or literal strings for the search terms, like so:
$terms = @("Keyword[12]", "KeywordA", "KeyphraseOne") # Array of regexps
$src = "path-to-some-huge-file" # Path to the file
$reader = new-object IO.StreamReader($src) # Stream reader to file
while(($line = $reader.ReadLine()) -ne $null){ # Read one row at a time
    foreach($t in $terms) { # For each search term...
        if($line -match $t) { # check if the line read is a match...
            $("Hit: {0} ({1})" -f $line, $t) # and print match
        }
    }
}
$reader.Close() # Close the reader
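To search for all 900 parameters in a single pass over the log, one option (a sketch, assuming the same CSV layout with an "item" column as in the question) is to combine the escaped terms into one alternation pattern and test each line against that single regex:
# Sketch: build one alternation regex from the CSV of search terms,
# then scan the log once, line by line, with a StreamReader.
# File paths and the "item" column name follow the question's layout.
$terms = Import-Csv "path-to-csv-file" | ForEach-Object { [regex]::Escape($_.item) }
$pattern = $terms -join '|'   # one big "term1|term2|..." pattern
$regex = [regex]$pattern
$reader = New-Object IO.StreamReader("path-to-some-huge-file")
while (($line = $reader.ReadLine()) -ne $null) {
    if ($regex.IsMatch($line)) {
        Add-Content "path-to-result-file" $line   # record the matching line
    }
}
$reader.Close()
With 900 literal terms the combined pattern gets long; another single-pass option is to pass the whole escaped array to Select-String -Pattern, which accepts multiple patterns.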
Surely this is going to be incredibly painful on any parser you use, just based on the file sizes you have there, but if your log files are in a standard format (for example IIS log files), then you could consider using a log parsing app such as Log Parser Studio instead of PowerShell.
