PowerShell: inserting text in the middle of large files (90 MB) on Windows

As part of our project we download a large number of .eml files from a secure SFTP location. After downloading, we need to add a subtag to each file; the files are around 90 MB each. I tried to add the subtag with a PowerShell script I found on another site (pasted below). It works fine for small files of 10 KB to 200 KB, but when I use the same script on the large files it gets stuck. Can anyone please help me get past this?
(Get-Content F:\EmlProcessor\UnZipped\example.eml) |
Foreach-Object {
$_ # send the current line to output
if ($_ -match "x-globalrelay-MsgType: ICECHAT")
{
#Add Lines after the selected pattern
" X-Autonomy SubTag=GMAIL"
}
} | Set-Content F:\EmlProcessor\EmlProcessor\example2.txt
SAMPLE EML FILE
Date: Tue, 3 Oct 2017 07:44:32 +0000 (UTC)
From: XYZ
To: ABC
Message-ID: <1373565887.28221.1507075364517.JavaMail.tomcat@HKLVATAPP075>
Subject: Symphony: 2 users, 4 messages, duration 00:00
MIME-Version: 1.0
Content-Type: multipart/mixed;
boundary="----=_Part_28220_1999480254.1507075364517"
x-globalrelay-MsgType: GMAIL
x-symphony-StreamType: GMAIL
x-symphony-StreamID: RqN3HnR/ajgZvWOstxzLuH///qKcERyOdA==
x-symphony-ContentStartDateUTC: 1507016636610
x-symphony-ContentStopDateUTC: 1507016672387
x-symphony-FileGeneratedDateUTC: 1507075364516
------=_Part_28220_1999480254.1507075364517
Content-Type: text/html; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
<!DOCTYPE html><html><body><p><font color=3D"grey">Message ID: Un/pfFrGvvVy=
T6quhMBKjX///qEezwdFdA=3D=3D</font><br>2017-10-03T07:43:56.610Z 0
----
------
-----
</HTML>
As shown in the sample input file above, I must add the text "X-Autonomy SubTag" above or below "x-globalrelay-MsgType".
I tried to add the subtag to a sample file of about 90 MB and, as I said, the script got stuck. My actual requirement is to do this for nearly 2,000 files by looping through each one, but so far I have been unsuccessful even with a single file. I am very new to batch and Windows PowerShell scripting, so any quick help is appreciated.

Are you sure it is stuck or just takes longer? Your code has to iterate through thousands of lines to find a match.
I did not have a large text file to test with, so I converted a large CSV (60 MB) to .txt, and the script below ran fairly quickly on it (10-15 seconds).
Note: Since you are new and are just discovering the power of PowerShell, I am going to be generous. Most people would expect you to put in some effort yourself, but I trust that you will at least try to understand what the script does. If you run scripts from here directly in your environment without testing, you could do serious damage, so work out what each line does, at least for the sake of testing. I have edited the code to use a function for scalability. I could use multi-threading to speed things up, but since this is a CPU-heavy operation, I do not think it would help much.
#Coz functions are the best
Function Insert-SubTag ($Path)
{
$FileName = $Path | Split-Path -Leaf
$File = Get-Content -Path $Path
$Line = $File | Select-String -Pattern "x-globalrelay-MsgType"
$LineNumber = $Line.LineNumber
#Since Linenumber starts from 1 but array count starts from 0
$File[$LineNumber - 1] = "$Line
X-Autonomy SubTag=GMAIL"
$SavePath = "F:\EmlProcessor\UnZipped2\$FileName" #You can also pass the save folder as a parameter to this function like $path
$File | Set-Content -Path $SavePath
}
#If you have the list of Files in a text file use this
$FileList = Get-content C:\FileList.txt
#If you have a folder, and want to iterate through each file, use this
$FileList = (Get-ChildItem -Path "F:\EmlProcessor\UnZipped").FullName
Foreach ($FilePath in $FileList)
{
Insert-SubTag -Path $FilePath
}
This assumes that x-globalrelay-MsgType appears only once in each file.
Do not forget to consider selecting this as the answer if it works for you.
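For files of this size, one possible alternative is to stream each file line by line instead of loading it whole with Get-Content. The sketch below is not part of the answer above: Add-SubTagStreaming and its parameter names are made up for illustration, and it assumes the header sits at the start of a line, that only the first match needs the tag, and that the default UTF-8 handling of StreamReader and StreamWriter suits the .eml files.

```powershell
# Hypothetical helper (not from the original answer): copies a file line by line
# and writes one extra line after the first line that starts with $Pattern.
function Add-SubTagStreaming {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,         # full path of the source file
        [Parameter(Mandatory = $true)] [string] $Destination,  # full path of the output file
        [string] $Pattern = 'x-globalrelay-MsgType',
        [string] $Tag     = 'X-Autonomy SubTag=GMAIL'
    )
    $reader = New-Object System.IO.StreamReader($Path)
    $writer = New-Object System.IO.StreamWriter($Destination)
    try {
        $inserted = $false
        while ($null -ne ($line = $reader.ReadLine())) {
            $writer.WriteLine($line)
            $isHeader = $line.StartsWith($Pattern, [System.StringComparison]::OrdinalIgnoreCase)
            if ($isHeader -and -not $inserted) {
                $writer.WriteLine($Tag)   # the tag goes directly below the header line
                $inserted = $true
            }
        }
    }
    finally {
        # Always release the file handles, even if reading fails part-way.
        $writer.Dispose()
        $reader.Dispose()
    }
}

# Possible usage over a folder (paths taken from the question):
# Get-ChildItem 'F:\EmlProcessor\UnZipped' | Where-Object { -not $_.PSIsContainer } | ForEach-Object {
#     Add-SubTagStreaming -Path $_.FullName -Destination (Join-Path 'F:\EmlProcessor\UnZipped2' $_.Name)
# }
```

The .NET classes resolve relative paths against the process working directory rather than the current PowerShell location, so passing full paths is the safer choice.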

Related

Powershell script: List files with specific change date (Amount if possible)

For licensing purposes I am trying to automate a counting process, instead of having to log in to every single server, go into a directory, search for a file name, and count the results based on the change date.
What I'm aiming for:
Running a powershell script every month that checks the directory "C:\Users" for the file "Outlook.pst" recursively. And then filters the result by change date (one month or newer). Then packing this into an email to send to my inbox.
I'm not sure if that's possible, cause I am fairly new to powershell. Would appreciate your help!
It is possible.
I don't know how to start a PS session on a remote computer, but I think the cmdlet Enter-PSSession will do the trick; at least it was the first result when searching for "open remote powershell session". If that does not work, use Invoke-Command, as suggested by lit, to get $outlookFiles as shown below.
For the rest use this.
$outlookFiles = Get-ChildItem -Path "C:\Users" -Recurse | Where-Object { $_.Name -eq "Outlook.pst" }
Now you have all files with this name. If you are not familiar with the pipe in PowerShell: it passes every object found by Get-ChildItem to the next pipeline section, where Where-Object filters the received objects. If the current object ($_) passes the condition, it is returned by the whole command.
Now you can filter these objects again to only include the latest ones with.
$latestDate = (Get-Date).AddMonths(-1)
# LastWriteTime is the change date; LastAccessTime would also count plain reads
$newFiles = $outlookFiles | Where-Object { $_.LastWriteTime -gt $latestDate }
Now you have all the data you want in one object. You only have to format it the way you like, e.g. you could use $mailBody = $newFiles | Out-String and then use Send-MailMessage -To x@y.z -From r@g.b -Body $mailBody to send the mail (Send-MailMessage also requires -Subject, and an SMTP server given via -SmtpServer or the $PSEmailServer variable).
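The steps above can be combined into one small function. This is only a sketch under assumptions: Get-RecentFile and its parameters are hypothetical names, "change date" is taken to mean LastWriteTime, and folders that cannot be read are silently skipped.

```powershell
# Hypothetical helper: finds files with a given name that changed since a cutoff date.
function Get-RecentFile {
    param(
        [string]   $Root  = 'C:\Users',
        [string]   $Name  = 'Outlook.pst',
        [datetime] $Since = (Get-Date).AddMonths(-1)
    )
    # -Filter is applied by the file system provider, which is usually faster than
    # filtering by name in Where-Object; inaccessible folders are skipped quietly.
    Get-ChildItem -Path $Root -Filter $Name -Recurse -ErrorAction SilentlyContinue |
        Where-Object { -not $_.PSIsContainer -and $_.LastWriteTime -gt $Since }
}

# Possible usage: build a plain-text mail body with the count and the paths.
# $newFiles = @(Get-RecentFile)
# $mailBody = "Found $($newFiles.Count) file(s):`r`n" +
#             (($newFiles | ForEach-Object { $_.FullName }) -join "`r`n")
```

Wrapping the call in @(...) keeps .Count reliable when zero or one file is found.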

How to get the Dropbox folder in Powershell in Windows

Same question exists for Python here: How can I get the Dropbox folder location programmatically in Python?, or here for OSX: How to get the location of currently logined Dropbox folder
Same thing in Powershell. I need the path of DropBox to copy files to it (building a software and then copying it to dropbox to share with team).
This Dropbox help page tells us where this info is stored, ie, in a json file in the AppData of the user: https://www.dropbox.com/help/4584
function GetDropBoxPathFromInfoJson
{
$DropboxPath = Get-Content "$ENV:LOCALAPPDATA\Dropbox\info.json" -ErrorAction Stop | ConvertFrom-Json | % 'personal' | % 'path'
return $DropboxPath
}
The line above is taken from: https://www.powershellgallery.com/packages/Spizzi.Profile/1.0.0/Content/Functions%5CProfile%5CInstall-ProfileEnvironment.ps1
Note that it doesn't check if you've got a Dropbox business account, or if you have both. It just uses the personal one.
You can then use this base Dropbox folder to build your final path, for example:
$targetPath = Join-Path -Path (GetDropBoxPathFromInfoJson) -ChildPath 'RootDropboxFolder\Subfolder1\Subfolder2'
if (-not (Test-Path -Path $targetPath)) { throw "Path '$targetPath' not found!" }
--
An alternative is to use the host.db file, as shown on this page:
http://bradinscoe.tumblr.com/post/75819881755/get-dropbox-path-in-powershell
$base64path = gc $env:appdata\Dropbox\host.db | select -index 1 # -index 1 is the 2nd line in the file
$dropboxPath = [System.Text.Encoding]::ASCII.GetString([System.Convert]::FromBase64String($base64path)) # convert from base64 to ascii
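Since the info.json approach above only reads the personal account, a slightly more general variant might look like the sketch below. Get-DropboxPath and its parameters are hypothetical names; the sketch assumes info.json has top-level "personal" and/or "business" objects, each with a "path" property, and that the file lives under either %LOCALAPPDATA% or %APPDATA%.

```powershell
# Hypothetical helper: returns the Dropbox folder for the requested account type.
function Get-DropboxPath {
    param(
        [ValidateSet('personal', 'business')] [string] $Account = 'personal',
        # Candidate locations of info.json; the first one that exists is used.
        [string[]] $InfoPath = @(
            "$env:LOCALAPPDATA\Dropbox\info.json",
            "$env:APPDATA\Dropbox\info.json"
        )
    )
    $file = $InfoPath | Where-Object { Test-Path -LiteralPath $_ } | Select-Object -First 1
    if (-not $file) { throw "info.json not found in: $($InfoPath -join ', ')" }

    $info = Get-Content -LiteralPath $file -Raw | ConvertFrom-Json
    if (-not $info.$Account) { throw "No '$Account' account listed in $file" }
    $info.$Account.path
}

# Possible usage:
# $targetPath = Join-Path -Path (Get-DropboxPath -Account business) -ChildPath 'Subfolder1'
```

Throwing when the requested account is missing keeps a build script from silently copying files to the wrong place.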

Move columns in CSV with batch or powershell

I'm using MediaInfo CLI version in Win 7 x64 to automatically make a CSV via template when a video file has finished encoding in StaxRip.
However, its CLI version is critical about how to apply the output template (long story short, its variables are in sections (general, video, audio, text) and you can only use one section in one block, you can't go back to a previous section further down the template), so one variable that I want elsewhere has to end up in the wrong spot for the automation to even work.
Like this:
UTC 2015-05-21 18:04:06,Episode01.mp4,211 MiB,22mn 7s,29.970 fps,1 210 Kbps,High 10@L3,120 Kbps,AAC,Japanese
UTC 2015-05-21 19:16:18,Episode02.mp4,211 MiB,22mn 6s,29.970 fps,1 212 Kbps,High 10@L3,118 Kbps,AAC,Japanese
UTC 2015-05-21 20:24:57,Episode03.mp4,211 MiB,22mn 6s,29.970 fps,1 212 Kbps,High 10@L3,119 Kbps,AAC,Japanese
What I'm looking for is the timestamp portion (first column) to become the LAST column instead:
Episode01.mp4,211 MiB,22mn 7s,29.970 fps,1 210 Kbps,High 10@L3,120 Kbps,AAC,Japanese,UTC 2015-05-21 18:04:06
I would very much love to find a solution to this in a .bat or Powershell script if possible since these are already used in the aforementioned process, but am open to small single-purpose applications. The crucial part is being able to be run from CMD or from a master .bat file.
Thank you for your time.
I tried this out and it works.
[string] $SourceFileFullPath = "C:\Projects\INT\CSV_ColumnSwap.csv"
[Array] $SourceFileContent = Get-Content $SourceFileFullPath
[int] $ArrayLength = $SourceFileContent.length
for ($i=0; $i -lt $ArrayLength; $i++) {
$splitter1 = ","
$LineData = $SourceFileContent[$i] -split $splitter1
$DateTimeV, $Linedata = $LineData
$LineData += $DateTimeV
$LineData -join "," >> Result.csv
}
I am not particularly sure about the performance aspects. YMMV.
Cheers
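On the performance question, a pipeline version avoids holding the whole file in an array and appending to the result file once per line. This is a minimal sketch, not the answer's method: Move-FirstColumnToEnd is a hypothetical name, and it assumes the first column never contains a quoted comma.

```powershell
# Hypothetical helper: moves the first comma-separated field of each line to the end.
function Move-FirstColumnToEnd {
    param([Parameter(ValueFromPipeline = $true)] [string] $Line)
    process {
        # Split into at most two parts: the first field and everything after it.
        $first, $rest = $Line -split ',', 2
        if ($null -eq $rest) {
            $Line                 # no comma on this line, pass it through unchanged
        }
        else {
            "$rest,$first"
        }
    }
}

# Possible usage (file names are placeholders):
# Get-Content .\CSV_ColumnSwap.csv | Move-FirstColumnToEnd | Set-Content .\Result.csv
```

Saved in a .ps1 file, this can be called from CMD or a master .bat file with powershell -NoProfile -File followed by the script path.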

Extract hostnames from Perfmon blg with Powershell

I'm writing a script which will automate the extraction of data from .blg Perfmon logs.
I've worked out the primary Import-Counter commands I will need to use to get the data out, but am trying to parametrise this so that I can do it for each machine in the log file (without having to open the log up in Perfmon, which can take 15 minutes or sometimes more, and is the reason I'm writing this script), and find out what each hostname is.
The script I have does the job, but it still takes a minute to return the data I want, and I wondered if there was a simpler way to do this, as I'm not too familiar with Powershell?
Here's what I have:
$counters = Import-Counter -Path $log_path$logfile -ListSet * | Select-Object paths -ExpandProperty paths
$svrs = @()
# for each line in the list of counters, extract the name of the server and add it to the array
foreach ($line in $counters) {
$svrs += $line.split("\")[2]
}
# remove duplicates and sort the list of servers
$sorted_svrs = $svrs | sort -unique
foreach ($svr in $sorted_svrs) {
Write-Host $svr
}
I'm just printing the names for the moment, but they'll go into an array in the proper script, and then I'll run my Import-Counter block with each of these hosts parametrised in.
Just wondered if there was a better way of doing this?
$sorted_svrs=Import-Counter "$log_path$logfile" -Counter "\\*\physicaldisk(_total)\% disk time" | %{$_.countersamples.path.split("\")[2]} | sort -Unique
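Both the question's loop and the one-liner rely on the same parsing step: a counter path starts with two backslashes followed by the machine name, so splitting on the backslash puts the host at index 2. The sketch below isolates that step on made-up sample paths; Get-HostFromCounterPath and the server names are hypothetical.

```powershell
# Hypothetical helper: returns the machine name from a counter path such as
# \\SERVER01\physicaldisk(_total)\% disk time
function Get-HostFromCounterPath {
    param([Parameter(ValueFromPipeline = $true)] [string] $CounterPath)
    process {
        # Splitting on '\' yields '', '', 'SERVER01', ... so index 2 is the host.
        $CounterPath.Split('\')[2]
    }
}

# Sample paths for illustration only; real ones would come from Import-Counter.
$samplePaths = @(
    '\\server02\physicaldisk(_total)\% disk time',
    '\\server01\physicaldisk(_total)\% disk time',
    '\\server02\memory\available mbytes'
)
$sortedServers = $samplePaths | Get-HostFromCounterPath | Sort-Object -Unique
$sortedServers
```

Piping straight into Sort-Object -Unique also avoids growing an array with += inside a loop.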

Unix tail equivalent command in Windows Powershell

I have to look at the last few lines of a large file (typical size is 500 MB-2 GB). I am looking for an equivalent of the Unix command tail for Windows PowerShell. A few alternatives are:
http://tailforwin32.sourceforge.net/
and
Get-Content [filename] | Select-Object -Last 10
I am not allowed to use the first alternative, and the second is slow. Does anyone know of an efficient implementation of tail for PowerShell?
Use the -wait parameter with Get-Content, which displays lines as they are added to the file. This feature was present in PowerShell v1, but for some reason not documented well in v2.
Here is an example
Get-Content -Path "C:\scripts\test.txt" -Wait
Once you run this, update and save the file and you will see the changes on the console.
For completeness I'll mention that Powershell 3.0 now has a -Tail flag on Get-Content
Get-Content ./log.log -Tail 10
gets the last 10 lines of the file
Get-Content ./log.log -Wait -Tail 10
gets the last 10 lines of the file and waits for more
Also, for those *nix users, note that most systems alias cat to Get-Content, so this usually works
cat ./log.log -Tail 10
As of PowerShell version 3.0, the Get-Content cmdlet has a -Tail parameter that should help. See the technet library online help for Get-Content.
I used some of the answers given here but just a heads up that
Get-Content -Path Yourfile.log -Tail 30 -Wait
will chew up memory after awhile. A colleague left such a "tail" up over the last day and it went up to 800 MB. I don't know if Unix tail behaves the same way (but I doubt it). So it's fine to use for short term applications, but be careful with it.
PowerShell Community Extensions (PSCX) provides the Get-FileTail cmdlet. It looks like a suitable solution for the task. Note: I did not try it with extremely large files but the description says it efficiently tails the contents and it is designed for large log files.
NAME
Get-FileTail
SYNOPSIS
PSCX Cmdlet: Tails the contents of a file - optionally waiting on new content.
SYNTAX
Get-FileTail [-Path] <String[]> [-Count <Int32>] [-Encoding <EncodingParameter>] [-LineTerminator <String>] [-Wait] [<CommonParameters>]
Get-FileTail [-LiteralPath] <String[]> [-Count <Int32>] [-Encoding <EncodingParameter>] [-LineTerminator <String>] [-Wait] [<CommonParameters>]
DESCRIPTION
This implementation efficiently tails the contents of a file by reading lines from the end rather than processing the entire file. This behavior is crucial for efficiently tailing large log files and large log files over a network. You can also specify the Wait parameter to have the cmdlet wait and display new content as it is written to the file. Use Ctrl+C to break out of the wait loop. Note that if an encoding is not specified, the cmdlet will attempt to auto-detect the encoding by reading the first character from the file. If no characters have been written to the file yet, the cmdlet will default to using Unicode encoding. You can override this behavior by explicitly specifying the encoding via the Encoding parameter.
Probably too late for an answer, but try this one:
Get-Content <filename> -tail <number of items wanted> -wait
Just some additions to previous answers. There are aliases defined for Get-Content, for example if you are used to UNIX you might like cat, and there are also type and gc. So instead of
Get-Content -Path <Path> -Wait -Tail 10
you can write
# Print whole file and wait for appended lines and print them
cat <Path> -Wait
# Print last 10 lines and wait for appended lines and print them
cat <Path> -Tail 10 -Wait
I have a useful tip on this subject concerning multiple files.
Following a single log file (like 'tail -f' in Linux) with PowerShell 5.1 (Win7 and Win10) is easy (just use "Get-Content MyFile -Tail 1 -Wait"). However, watching MULTIPLE log files at once seems complicated. With PowerShell 7.x+, however, I've found an easy way by using "ForEach-Object -Parallel". This performs multiple 'Get-Content' commands concurrently. Note that -Parallel runs at most 5 script blocks at a time by default, so following more than 5 files needs a higher -ThrottleLimit. For example:
Get-ChildItem C:\logs\*.log | Foreach-Object -Parallel { Get-Content $_ -Tail 1 -Wait }
Using Powershell V2 and below, get-content reads the entire file, so it was of no use to me. The following code works for what I needed, though there are likely some issues with character encodings. This is effectively tail -f, but it could be easily modified to get the last x bytes, or last x lines if you want to search backwards for line breaks.
$filename = "\wherever\your\file\is.txt"
$reader = new-object System.IO.StreamReader(New-Object IO.FileStream($filename, [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read, [IO.FileShare]::ReadWrite))
#start at the end of the file
$lastMaxOffset = $reader.BaseStream.Length
while ($true)
{
Start-Sleep -m 100
#if the file size has not changed, idle
if ($reader.BaseStream.Length -eq $lastMaxOffset) {
continue;
}
#seek to the last max offset
$reader.BaseStream.Seek($lastMaxOffset, [System.IO.SeekOrigin]::Begin) | out-null
#read out of the file until the EOF
$line = ""
while (($line = $reader.ReadLine()) -ne $null) {
write-output $line
}
#update the last max offset
$lastMaxOffset = $reader.BaseStream.Position
}
I found most of the code to do this here.
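The "last x lines" modification mentioned above can be sketched by reading only a fixed-size chunk from the end of the file. Get-LastLine and its parameters are hypothetical names, and the sketch assumes the last $Count lines fit inside $ChunkBytes and that the text is ASCII or UTF-8 (a seek into the middle of a UTF-16 file could land between the two bytes of a character).

```powershell
# Hypothetical helper: returns the last $Count lines by reading only the end of the file.
function Get-LastLine {
    param(
        [Parameter(Mandatory = $true)] [string] $Path,
        [int] $Count = 10,
        [int] $ChunkBytes = 65536     # how much of the file tail to read
    )
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $stream = New-Object System.IO.FileStream($fullPath, [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read, [System.IO.FileShare]::ReadWrite)
    try {
        # Jump close to the end instead of reading from the beginning.
        $start = [Math]::Max([long]0, $stream.Length - $ChunkBytes)
        $stream.Seek($start, [System.IO.SeekOrigin]::Begin) | Out-Null
        $reader = New-Object System.IO.StreamReader($stream)
        $lines = New-Object System.Collections.Generic.List[string]
        while ($null -ne ($line = $reader.ReadLine())) { $lines.Add($line) }
        # If we started mid-file, the first line is probably partial, so drop it.
        if ($start -gt 0 -and $lines.Count -gt 0) { $lines.RemoveAt(0) }
        $skip = [Math]::Max(0, $lines.Count - $Count)
        $lines | Select-Object -Skip $skip
    }
    finally {
        $stream.Dispose()
    }
}

# Possible usage:
# Get-LastLine -Path .\path\to\myfile.log -Count 20
```

If fewer than $Count lines come back for a large file, the chunk was too small for that file's line lengths and $ChunkBytes needs to be raised.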
I took @hajamie's solution and wrapped it up into a slightly more convenient script wrapper.
I added an option to start from an offset before the end of the file, so you can use the tail-like functionality of reading a certain amount from the end of the file. Note the offset is in bytes, not lines.
There's also an option to continue waiting for more content.
Examples (assuming you save this as TailFile.ps1):
.\TailFile.ps1 -File .\path\to\myfile.log -InitialOffset 1000000
.\TailFile.ps1 -File .\path\to\myfile.log -InitialOffset 1000000 -Follow:$true
.\TailFile.ps1 -File .\path\to\myfile.log -Follow:$true
And here is the script itself...
param (
[Parameter(Mandatory=$true,HelpMessage="Enter the path to a file to tail")][string]$File = "",
[Parameter(Mandatory=$false,HelpMessage="Enter the number of bytes from the end of the file")][int]$InitialOffset = 10248,
[Parameter(Mandatory=$false,HelpMessage="Continuing monitoring the file for new additions?")][boolean]$Follow = $false
)
$ci = get-childitem $File
$fullName = $ci.FullName
$reader = new-object System.IO.StreamReader(New-Object IO.FileStream($fullName, [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read, [IO.FileShare]::ReadWrite))
#start at the end of the file
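#clamp to 0 so files smaller than the offset do not produce a negative seek position
$lastMaxOffset = [Math]::Max(0, $reader.BaseStream.Length - $InitialOffset)
while ($true)
{
#read only when the file extends to or beyond the last offset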
if ($reader.BaseStream.Length -ge $lastMaxOffset) {
#seek to the last max offset
$reader.BaseStream.Seek($lastMaxOffset, [System.IO.SeekOrigin]::Begin) | out-null
#read out of the file until the EOF
$line = ""
while (($line = $reader.ReadLine()) -ne $null) {
write-output $line
}
#update the last max offset
$lastMaxOffset = $reader.BaseStream.Position
}
if($Follow){
Start-Sleep -m 100
} else {
break;
}
}
Try the Windows Server 2003 Resource Kit Tools.
They include a tail.exe that can be run on Windows systems.
https://www.microsoft.com/en-us/download/details.aspx?id=17657
There have been many valid answers; however, none of them has the same syntax as tail in Linux. The following function can be stored in your $Home\Documents\PowerShell\Microsoft.PowerShell_profile.ps1 for persistence (see the PowerShell profiles documentation for more details).
This allows you to call...
tail server.log
tail -n 5 server.log
tail -f server.log
tail -Follow -Lines 5 -Path server.log
which comes quite close to the linux syntax.
function tail {
<#
.SYNOPSIS
Get the last n lines of a text file.
.PARAMETER Follow
output appended data as the file grows
.PARAMETER Lines
output the last N lines (default: 10)
.PARAMETER Path
path to the text file
.INPUTS
System.Int
IO.FileInfo
.OUTPUTS
System.String
.EXAMPLE
PS> tail c:\server.log
.EXAMPLE
PS> tail -f -n 20 c:\server.log
#>
[CmdletBinding()]
[OutputType('System.String')]
Param(
[Alias("f")]
[parameter(Mandatory=$false)]
[switch]$Follow,
[Alias("n")]
[parameter(Mandatory=$false)]
[Int]$Lines = 10,
[parameter(Mandatory=$true, Position=5)]
[ValidateNotNullOrEmpty()]
[IO.FileInfo]$Path
)
if ($Follow)
{
Get-Content -Path $Path -Tail $Lines -Wait
}
else
{
Get-Content -Path $Path -Tail $Lines
}
}
Very basic, but does what you need without any addon modules or PS version requirements:
while ($true) {Clear-Host; gc E:\test.txt | select -last 3; sleep 2 }
It is possible to download all of the UNIX commands compiled for Windows from this GitHub repository: https://github.com/George-Ogden/UNIX
For those admins who live by the axiom that less typing is best, here is the shortest version I can find:
gc filename -wai -ta 10

Resources