Searching Multiple Strings in Huge Log Files - Windows

PowerShell question
Currently I have 5-10 log files, each about 20-25 GB, and I need to search through each of them to check whether any of 900 different search parameters match. I have written a basic PowerShell script that searches the whole log file for one search parameter; if it matches, it dumps the results into a separate text file. The problem is that it is pretty slow. I was wondering if there is a way to speed this up, either by making it search for all 900 parameters at once so the log is only read through once, or in some other way. Any help would be good, even if it's just improving the script.
Basic overview:
1 CSV file with all 900 items listed under an "item" column
1 log file (.txt)
1 result file (.txt)
1 PS1 file
Here is the PowerShell code I have in the PS1 file:
$search = "filepath to csv file"
$log = "filepath to log file"
$result = "file path to result text file"
$list = Import-Csv $search
foreach ($address in $list) {
    Get-Content $log | Select-String $address.item | Add-Content $result
    # below is just a rudimentary counter of how far through the search we are
    $i = $i + 1
    echo $i
}

900 search terms is quite a large group. Can you reduce its size by using regular expressions? A trivial solution is based on reading the file row by row and looking for matches. Set up a collection that contains regexps or literal strings for the search terms, like so:
$terms = @("Keyword[12]", "KeywordA", "KeyphraseOne") # Array of regexps
$src = "path-to-some-huge-file"                       # Path to the file
$reader = New-Object IO.StreamReader($src)            # Stream reader to the file
while (($line = $reader.ReadLine()) -ne $null) {      # Read one row at a time
    foreach ($t in $terms) {                          # For each search term...
        if ($line -match $t) {                        # check if the line read is a match...
            $("Hit: {0} ({1})" -f $line, $t)          # and print the match
        }
    }
}
$reader.Close()                                       # Close the reader
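Since the 900 terms live in a CSV with an "item" column (per the question), they can also be folded into one alternation pattern so each line is tested once instead of 900 times. A sketch along those lines; the paths are the question's placeholders:

# Build a single "term1|term2|..." pattern from the CSV, escaping regex metacharacters
$terms = Import-Csv "filepath to csv file" | ForEach-Object { [regex]::Escape($_.item) }
$pattern = $terms -join '|'

$reader = New-Object IO.StreamReader("filepath to log file")
while (($line = $reader.ReadLine()) -ne $null) {
    if ($line -match $pattern) {      # one regex test covers all 900 terms
        Add-Content "file path to result text file" $line
    }
}
$reader.Close()

As an aside, Select-String's -Pattern parameter also accepts an array, so Get-Content $log | Select-String -Pattern $list.item is another single-pass option, though building one escaped alternation keeps literal terms from being misread as regexps.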

Surely this is going to be incredibly painful for any parser just based on the file sizes you have there, but if your log files are in a standard format (for example IIS log files), you could consider using a log-parsing app such as Log Parser Studio instead of PowerShell.

Related

PowerShell IF conditional isn't firing in the way I expected. Unsure what I'm doing wrong

I am writing a simple script that makes use of 7-Zip's command line to extract archives within folders and then delete the original archives.
There is a part of my script that isn't behaving how I would expect it to. I can't get my if statement to trigger correctly. Here's a snippet of the code:
if ($CurrentRar.Contains(".part1.rar")) {
    [void] $RarGroup.Add($CurrentRar)
    # Value of $CurrentRar:
    # Factory_Selection_2.part1.rar
    $CurrentRarBase = $CurrentRar.TrimEnd(".part1.rar")
    # Value: Factory_Selection_2
    for ($j = 1; $j -lt $AllRarfiles.Count; $j++) {
        $NextRar = $AllRarfiles[$j].Name
        # Value: Factory_Selection_2.part2.rar
        if ($NextRar.Contains("$CurrentRarBase.part$j.rar")) {
            Write-Host "Test Hit" -ForegroundColor Green
            # Never fires, and I have no idea why
            # [void] $RarGroup.Add($NextRar)
        }
    }
    $RarGroups.Add($RarGroup)
}
if ($NextRar.Contains("$CurrentRarBase.part$j.rar")) is the line that I can't get to fire.
If I shorten it to if ($NextRar.Contains("$CurrentRarBase.part")), it evaluates true, but as soon as I add the inline $j it always evaluates false. I've tried casting $j to a string, but it still doesn't work. Am I missing something stupid?
Appreciate any help.
The issue is your for statement combined with the fact that an array / list is zero-indexed (it starts at 0).
In your case, index 0 of $AllRarfiles is probably part1, while your for statement starts at 1. So the file name at index 1 does not contain part1, which is what $NextRar.Contains("$CurrentRarBase.part$j.rar") is testing for, but part2 — that is, part ($j + 1).
As a table comparison:

Index / $j | Value                         | Built string for comparison (with index)
-----------|-------------------------------|-----------------------------------------
0          | Factory_Selection_2.part1.rar | Factory_Selection_2.part0.rar
1          | Factory_Selection_2.part2.rar | Factory_Selection_2.part1.rar
2          | Factory_Selection_2.part3.rar | Factory_Selection_2.part2.rar
3          | Factory_Selection_2.part4.rar | Factory_Selection_2.part3.rar
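A minimal fix for the loop, keeping everything else as-is, is to build the comparison string from $j + 1; note the $( ) subexpression, which is required for the arithmetic to happen inside the string:

for ($j = 1; $j -lt $AllRarfiles.Count; $j++) {
    $NextRar = $AllRarfiles[$j].Name
    # index $j holds part ($j + 1), so that is the part number to compare against
    if ($NextRar.Contains("$CurrentRarBase.part$($j + 1).rar")) {
        [void] $RarGroup.Add($NextRar)
    }
}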
Another, simpler approach
Since it seems you want to group split RAR files that belong together, you could also use a simpler approach with Group-Object:
# Collect and group all RAR files.
$rarGroups = Get-ChildItem -LiteralPath 'C:\somewhere\' -Filter '*.rar' |
    Group-Object -Property { $_.Name -replace '\.part\d+\.rar$' }

# Do some stuff afterwards.
foreach ($rarGroup in $rarGroups) {
    Write-Verbose -Verbose "Processing RAR group: $($rarGroup.Name)"
    foreach ($rarFile in $rarGroup.Group) {
        Write-Verbose -Verbose "`tCurrent RAR file: $($rarFile.Name)"
        # do some stuff per file
    }
}
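Since the original goal was extraction with 7-Zip, each group could then be handed off by its first part. This is only a sketch: it assumes 7z.exe is on the PATH, and relies on 7-Zip pulling in the remaining parts of a split set when the .part1 file is extracted:

foreach ($rarGroup in $rarGroups) {
    # locate the first part of the set (part1, part01, part001, ...)
    $firstPart = $rarGroup.Group |
        Where-Object { $_.Name -match '\.part0*1\.rar$' } |
        Select-Object -First 1
    if ($firstPart) {
        # x = extract with full paths; -o sets the output folder; -y answers yes to prompts
        & 7z x $firstPart.FullName "-o$($firstPart.DirectoryName)" -y
    }
}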

Search for strings from array in text file

I want to search a text file for more than one string. If at least one string is found (I repeat: only one string needs to be found, not all of them), I want the program to stop and create a file containing the text "found".
This is my code that doesn't work properly:
$f = 'C:\users\datboi\desktop\dump.dmp'
$text = 'found'
$array = "_command", ".command", "-command", "!command", "+command", "^command", ":command", "]command", "[command", "#command", "*command", "$command", "&command", "#command", "%command", "=command", "/command", "\command", "command!", "command#", "command#", "command$", "command%", "command^", "command&", "command*", "command-", "command+", "command=", "command\", "command/", "command_", "command.", "command:"
$len = 9
$offset = 8
$data = [IO.File]::ReadAllBytes($f)
for ($i = 0; $i -lt $data.Count - $offset; $i++) {
    $slice = $data[$i..($i + $offset)]
    $sloc = [char[]]$slice
    if ($array.Contains($sloc)) {
        $text > 'command.log'
        break
    }
}
When I say it doesn't work properly, I mean: it runs with no errors, but even if the file contains at least one of the strings from the array, it doesn't create the file I want.
This is literally what the Select-String cmdlet was created for. You can use a regular expression to simplify your search. For the regex I would use:
[_\.\-!\+\^:\[\]\#\*\$&#%=/\\]command|command[_\.\-!\+\^:\#\*\$&#%=/\\]
That comes down to any of the characters in the [] brackets followed by the word 'command', or the word 'command' followed by any of the characters in the [] brackets (note that the hyphen has to be escaped inside a character class, or it is read as a range). Then just pipe that to a ForEach-Object loop that outputs to your file and breaks.
Select-String -Path $f -Pattern '[_\.\-!\+\^:\[\]\#\*\$&#%=/\\]command|command[_\.\-!\+\^:\#\*\$&#%=/\\]' | ForEach-Object {
    $text > 'command.log'
    break
}
First, I would recommend using a regular expression as you can greatly shorten your code.
Second, PowerShell is good at pattern matching.
Example:
$symbolList = '_\-:!\.\[\]#\*\/\\&#%\^\+=\$'
$pattern = '([{0}]command)|(command[{0}])' -f $symbolList
$found = Select-String $pattern "inputfile.txt" -Quiet
$found
The $symbolList variable is a regular expression pattern containing a list of characters you want to find either before or after the word "command" in your search string.
The $pattern variable uses $symbolList to create the pattern.
The $found variable will be $true if the pattern is found in the file.
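To get the marker file the question asks for (a file containing the text "found" as soon as anything matches), a short follow-up using the names from the question's own code would be:

if ($found) {
    # $found is $true when at least one pattern matched
    Set-Content -Path 'command.log' -Value 'found'
}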

PowerShell - Read a single text file and sort contents to multiple files based on text within the line

I'm looking for some direction on how to read a file line by line, then copy the line based on a search criteria to a newly created file. Since my description is probably poor, I've tried to illustrate below:
Single Text File Sample:
Name=N0060093G
Name=N0060093H
Name=N400205PW
Name=N400205PX
Name=N966O85Q0
Name=N966O85Q1
The script would read each line and use the three digits after "Name=N" to create a new file named after that identifier, then copy each matching line to that file. So the lines "Name=N0060093G" and "Name=N0060093H" would go to "006.txt"; "Name=N400205PW" and "Name=N400205PX" would be written to "400.txt", etc.
A regex-style approach:
$File = 'test.txt'
Get-Content $File | ForEach-Object {
    if ($_ -match '^Name\=N(?<filename>\d{3}).*') {
        # the named capture group becomes the file name; remove -WhatIf to actually write
        $_ | Out-File -Append "$($Matches.Filename).txt" -WhatIf
    }
}

Awk command for PowerShell

Is there any command like awk in PowerShell?
I want to execute this command:
awk '
BEGIN { count = 1 }
/^Text/    { text = $0 }
/^Time/    { time = $0 }
/^Rerayzs/ { retext = $0 }
{
    if (NR % 3 == 0) {
        printf("%s\n%s\n%s\n", text, time, retext) > (count ".txt")
        count++
    }
}' file
to a powershell command.
Usually we like to see what you have tried; it at least shows that you are making an effort, and that we aren't just doing your work for you. I think you're new to PowerShell, so I'm going to spoon-feed you an answer, hoping that you use it to learn and expand your knowledge, and hopefully have better questions in the future.
I am pretty sure that this will accomplish the same thing as what you laid out. You have to give it an array of input (the contents of a text file, an array of strings, something like that), and it will generate several files depending on how many matches it finds for the trio "Text", "Time", and "Rerayzs". It will order them as Text, then a new line with Time, and then a new line with Rerayzs.
$Text, $Time, $Retext = $null
$FileCounter = 1
Get-Content C:\temp\test.txt | ForEach-Object {
    switch ($_) {
        { $_ -match "^Text" }    { $Text = $_ }
        { $_ -match "^Time" }    { $Time = $_ }
        { $_ -match "^Rerayzs" } { $Retext = $_ }
    }
    if ($Text -and $Time -and $Retext) {
        "{0}`n{1}`n{2}" -f $Text, $Time, $Retext > "C:\temp\$FileCounter.txt"
        $FileCounter++
        $Text, $Time, $Retext = $null
    }
}
That will get the text of a file C:\Temp\Test.txt and will output numbered files to the same location. The file I tested against is:
Text is good.
Rerayzs initiated.
Stuff to not include
Time is 18:36:12
Time is 20:21:22
Text is completed.
Rerayzs failed.
I was left with 2 files as output. The first reads:
Text is good.
Time is 18:36:12
Rerayzs initiated.
The second reads:
Text is completed.
Time is 20:21:22
Rerayzs failed.

PowerShell: Count instances of strings in a file using a list

I am trying to count, in an efficient way, the number of times each string in "file1" (strings varying from 40 to 400+ characters) occurs in "file2". file1 has about 2k lines and file2 has about 130k lines. I currently have a Unix solution that does it in about 2 minutes in a VM and about 5 in Cygwin, but I am trying to do it with PowerShell/Python, since the files are on Windows, I use the output in Excel, and I drive everything with automation (AutoIt).
I have a solution, but it takes WAY too long (in about the same time that Cygwin finished all 2k lines, my PowerShell version had only done 40-50!)
Although I haven't prepared a solution yet, I am open to using Python as well, if there is a solution that can be fast and accurate.
Here is the Unix Code:
while read SEARCH_STRING;
do printf "%s$" "${SEARCH_STRING}";
grep -Fc "${SEARCH_STRING}" file2.csv;
done < file1.csv | tee -a output.txt;
And here is the PowerShell code I currently have:
$Target = Get-Content .\file1.csv
foreach ($line in $Target) {
    # Just to keep strings small, since I found that not all
    # strings were compared correctly if they were 250+ chars
    $line = $line.Substring(0, 180)
    $Coll = Get-Content .\file2.csv | Select-String -Pattern "$line"
    $cnt = $Coll | Measure-Object
    $cnt.Count
}
Any ideas or suggestions will help.
Thanks.
EDIT
I'm trying a modified solution suggested by C.B.
del .\output.txt
$Target = Get-Content .\file1.csv
$file = [System.IO.File]::ReadAllText("C:\temp\file2.csv")
foreach ($line in $Target) {
    $line = [string]$line.Substring(0, $line.Length / 2)
    $cnt = [regex]::Matches([string]$file, $line).Count >> ".\output.txt"
}
But since my strings in file1 vary in length, I kept getting OutOfBound exceptions from the Substring call, so I halved (/2) the input string to try to get a match. And when I halve them, if a string contains an open parenthesis, it tells me this:
Exception calling "Matches" with "2" argument(s): "parsing "CVE-2013-0796,04/02/2013,MFSA2013-35 SeaMonkey: WebGL
crash with Mesa graphics driver on Linux (C" - Not enough )'s."
At C:\temp\script_test.ps1:6 char:5
+ $cnt = [regex]::matches( [string]$file, $line).count >> ".\output.txt ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : ArgumentException
I don't know if there is a way to raise the input limit in PowerShell (my biggest string at the moment is 406 characters, but they could be bigger in the future), or whether I should just give up and try a Python solution.
Thoughts?
EDIT
Thanks to @C.B. I got the correct answer, and it matches the output of the Bash script perfectly. Here is the full code that outputs the results to a text file:
$Target = Get-Content .\file1.csv
$file = [System.IO.File]::ReadAllText("C:\temp\file2.csv")
foreach ($line in $Target) {
    $cnt = [regex]::Matches($file, [regex]::Escape($line)).Count >> ".\output.txt"
}
Give this a try:
$Target = Get-Content .\file1.csv
$file = [System.IO.File]::ReadAllText("c:\test\file2.csv")
foreach ($line in $Target) {
    $line = $line.Substring(0, 180)
    $cnt = [regex]::Matches($file, [regex]::Escape($line)).Count
}
One issue with your script is that you read file2.csv over and over again, once for each line from file1.csv. Reading the file just once and storing the content in a variable should significantly speed things up. Try this:
$f2 = Get-Content .\file2.csv
foreach ($line in (gc .\file1.csv)) {
    $line = $line.Substring(0, 180)
    @($f2 | ? { $_ -match $line }).Count
}
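One more note: the Unix version uses grep -F, which treats the search strings literally. Select-String has a -SimpleMatch switch that does the same, so a variant of the loop above could skip regex matching (and its escaping pitfalls) entirely; like grep -c, this counts matching lines rather than total occurrences:

$f2 = Get-Content .\file2.csv
foreach ($line in (Get-Content .\file1.csv)) {
    # -SimpleMatch performs a literal substring match, like grep -F
    @($f2 | Select-String -SimpleMatch $line).Count
}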
