Meaningful data from text file powershell - windows

I have a text file that looks something like the one below. I am trying to extract meaningful data from it using replace operations and wildcard characters, but I am not quite getting it right.
computer - server1
Volume 1 System Rese NTFS junk data
Volume 2 C NTFS junk data
Volume 3 R5T1_ABCDEF NTFS junk data
H:\R5X1_ABCDEF99XY2_APP01_ABCD_LH\
Volume 4 H R5T1_ABCDEF NTFS junk data
Volume 5 R5T1_ABCDEF NTFS junk data
H:\R5X1_ABCDEF99XY2_DBE01_EFGH_LH\
Volume 10 R6T3_ABCDEF NTFS junk data
H:\R6X3_ABCDEF99XY2_QRS_IJKL_LH\
Volume 7 R5T2_ABCDEF NTFS junk data
H:\R5X2_ABCDEF99XY2_QWE__MNOP_LH\
Volume 8 R5T1_ABCDEF NTFS junk data
H:\R5X1_ABCDEF99XY2_BTE___0DF8_LH\
computer - server2
Volume 1 System Rese NTFS junk data
Volume 2 C NTFS junk data
Volume 3 R5T1_ABCDEF NTFS junk data
H:\R5X1_ABCDEF88XY2_APP01_ABCD_LH\
Volume 4 H R5T1_ABCDEF NTFS junk data
Volume 5 R5T1_ABCDEF NTFS junk data
H:\R5X1_ABCDEF88XY2_DBE01_EFGH_LH\
Volume 10 R6T3_ABCDEF NTFS junk data
H:\R6X3_ABCDEF88XY2_QRS_IJKL_LH\
Volume 7 R5T2_ABCDEF NTFS junk data
H:\R5X2_ABCDEF88XY2_QWE__MNOP_LH\
Volume 8 R5T1_ABCDEF NTFS junk data
H:\R5X1_ABCDEF88XY2_BTE___0DF8_LH\
This is the output I am looking for:
1) Get those volumes with a letter next to them (like volumes 2, 4).
2) Get those volumes with no letter next to them, together with the line below them, which is not a volume line (like volumes 3, 5, 6).
3) Remove those volumes with neither a letter nor a non-volume line below them (like volume 1).
Eventually, output looks like:
computer1 Volume 2 C
computer1 Volume 3 H:\R5X1_ABCDEF99XY2_APP01_ABCD_LH\
computer1 Volume 4 H
computer1 Volume 5 H:\R5X1_ABCDEF99XY2_DBE01_EFGH_LH\
computer1 Volume 10 H:\R6X3_ABCDEF99XY2_QRS_IJKL_LH\
computer2 Volume 2 C
computer2 Volume 3 H:\R5X1_ABCDEF88XY2_APP01_ABCD_LH\
computer2 Volume 2 H
computer2 Volume 3 H:\R5X1_ABCDEF88XY2_DBE01_EFGH_LH\
computer2 Volume 4 H:\R6X3_ABCDEF88XY2_QRS_IJKL_LH\
computer2 Volume 10 H:\R6X3_ABCDEF88XY2_QRS_IJKL_LH\
Edit: example code from comment.
I'm a bit stuck on the conditions that need to be used:
$FileListArray2 = @()
Foreach($file in Get-Content $FilesName | Where-Object {$_ -notmatch "(junk1)|(junk2)"}) {
    if($file -match "(Volume)") { }
    $FileListArray2 += ,@($file2)
}
$FileListArray2
Please note that I have left the condition empty here; I have tried a few things for it, but they do not quite work the way I want.

This should do it:
UPDATED:
$inputstring = @'
computer - server1
Volume 1 System Rese NTFS junk data
Volume 2 C NTFS junk data
Volume 3 R5T1_ABCDEF NTFS junk data
H:\R5X1_ABCDEF99XY2_APP01_ABCD_LH\
Volume 4 H R5T1_ABCDEF NTFS junk data
Volume 5 R5T1_ABCDEF NTFS junk data
H:\R5X1_ABCDEF99XY2_DBE01_EFGH_LH\
Volume 10 R6T3_ABCDEF NTFS junk data
H:\R6X3_ABCDEF99XY2_QRS_IJKL_LH\
Volume 7 R5T2_ABCDEF NTFS junk data
H:\R5X2_ABCDEF99XY2_QWE__MNOP_LH\
Volume 8 R5T1_ABCDEF NTFS junk data
H:\R5X1_ABCDEF99XY2_BTE___0DF8_LH\
computer - server2
Volume 1 System Rese NTFS junk data
Volume 2 C NTFS junk data
Volume 3 R5T1_ABCDEF NTFS junk data
H:\R5X1_ABCDEF88XY2_APP01_ABCD_LH\
Volume 4 H R5T1_ABCDEF NTFS junk data
Volume 5 R5T1_ABCDEF NTFS junk data
H:\R5X1_ABCDEF88XY2_DBE01_EFGH_LH\
Volume 10 R6T3_ABCDEF NTFS junk data
H:\R6X3_ABCDEF88XY2_QRS_IJKL_LH\
Volume 7 R5T2_ABCDEF NTFS junk data
H:\R5X2_ABCDEF88XY2_QWE__MNOP_LH\
Volume 8 R5T1_ABCDEF NTFS junk data
H:\R5X1_ABCDEF88XY2_BTE___0DF8_LH\
'@
# Split the input into an array of lines (handles both CRLF and LF line endings)
$lines = $inputstring -split '\r?\n'
# Set up patterns to match against; leading whitespace is optional
$pattern1 = "^\s*(Volume \d+)\s+(\w)\s+"
$pattern2 = "^\s*(Volume \d+)\s+"
$pattern3 = "^\s*(\w:\\.*)"
$pattern4 = "^\s*computer - (\S+)\s*$"
# Store the current computer that is being viewed
$currentcomputer = "UNKNOWN"
# Loop through each line
for($i = 0; $i -lt $lines.count; $i++)
{
    # Start off assuming the current line produces no output
    $output = ""
    # Look for volumes with drive letters
    if($lines[$i] -match $pattern1)
    {
        $output = $matches[1] + " " + $matches[2]
    }
    # Look for volumes without drive letters
    elseif($lines[$i] -match $pattern2)
    {
        $volume = $matches[1]
        # Consider the next line to see if it has a path
        if($lines[$i + 1] -match $pattern3)
        {
            $output = $volume + " " + $matches[1]
            # If the next line had a path then we handled it and need to skip it
            $i++
        }
    }
    # Look for the line that names the computer
    elseif($lines[$i] -match $pattern4)
    {
        $currentcomputer = $matches[1]
    }
    # Write out any output that was produced
    if($output -ne "")
    {
        $currentcomputer + " " + $output
    }
}
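For readers who want to check the parsing logic outside PowerShell, the same three rules can be sketched in Python. This is only an illustration under assumptions: the function name summarize_volumes and the sample lines are made up for the example, and the regular expressions mirror the patterns above rather than any official diskpart format.

```python
import re

def summarize_volumes(text):
    """Return 'computer Volume N target' rows from diskpart-style text."""
    with_letter = re.compile(r"^\s*(Volume \d+)\s+(\w)\s+")     # volume with a drive letter
    without_letter = re.compile(r"^\s*(Volume \d+)\s+")          # volume without a letter
    mount_path = re.compile(r"^\s*(\w:\\.*\S)\s*$")              # e.g. H:\folder\
    computer = re.compile(r"^\s*computer - (\S+)\s*$")

    lines = text.splitlines()
    current = "UNKNOWN"
    rows = []
    i = 0
    while i < len(lines):
        line = lines[i]
        letter_match = with_letter.match(line)
        volume_match = without_letter.match(line)
        if letter_match:
            rows.append(f"{current} {letter_match.group(1)} {letter_match.group(2)}")
        elif volume_match:
            # Only keep the volume if the next line is a mount path.
            path_match = mount_path.match(lines[i + 1]) if i + 1 < len(lines) else None
            if path_match:
                rows.append(f"{current} {volume_match.group(1)} {path_match.group(1)}")
                i += 1  # the path line has been consumed
        else:
            computer_match = computer.match(line)
            if computer_match:
                current = computer_match.group(1)
        i += 1
    return rows

# Hypothetical sample, shortened from the question's data.
sample = "\n".join([
    "computer - server1",
    "Volume 1 System Rese NTFS junk data",
    "Volume 2 C NTFS junk data",
    "Volume 3 R5T1_ABCDEF NTFS junk data",
    "H:\\R5X1_ABCDEF99XY2_APP01_ABCD_LH\\",
    "Volume 4 H R5T1_ABCDEF NTFS junk data",
])
for row in summarize_volumes(sample):
    print(row)
```

Volume 1 is dropped because it has neither a single-letter column nor a path line after it, which matches rule 3 in the question.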

Related

Packetbeat interface detection

I'm using Packetbeat to monitor network traffic for a SIEM-like setup with ELK. I'd like to push it to a large number of machines, but the setup requires manually identifying the interface in packetbeat.yml.
Has anyone been able to script the process of selecting the appropriate interface to monitor for Packetbeat?
I've put this together - which uses 3 separate .yml
ConfigTemplate.yml which contains the rest of the packetbeat.yml minus the interfaces.
Interfaces.yml which is a temp file used to write the interfaces to.
packetbeat.yml which is the final config file packetbeat will use.
The python script should be in the packetbeat directory along with the config .yml's
The only limitation is that it needs python on the host machines - the next stage is to see if it can be done with powershell.
Hope this helps anyone else! Any improvements are welcome!
import subprocess

# Ask Packetbeat (through PowerShell) how many capture devices it lists.
devices = subprocess.check_output(["powershell.exe", "(./packetbeat.exe devices).count"])
devicesCount = int(devices.decode('utf-8'))
print(devicesCount)
deviceCount = range(devicesCount)
with open('ConfigTemplate.yml', 'r') as original:
    data1 = original.read()
# Write one interface line per device index to the temp file.
with open('Interfaces.yml', 'w') as modified:
    for i in deviceCount:
        modified.write("packetbeat.interfaces.device: " + str(i) + "\n")
with open('Interfaces.yml', 'r') as original:
    data2 = original.read()
# Final config: header, interface lines, then the rest of the template.
with open('Packetbeat.yml', 'w') as modified2:
    modified2.write("# ================== Set listening interfaces ==================" + "\n" + data2 + "\n" + data1 + "\n")
Powershell version -
$count = (C:\path\to\packetbeat.exe devices).count
$line = ''
for($i=0; $i -le ($count-1); $i++){
    $line +="packetbeat.interfaces.device:"+" $i `r`n"
}
$line | Out-File -FilePath "C:\path\to\packetbeat\Interfaces.yml"
$configTemplate = Get-Content -Path "C:\path\to\packetbeat\ConfigTemplate.yml"
$interfaces = Get-Content -Path "C:\path\to\packetbeat\Interfaces.yml"
$interfaces + "`r`n" + $configTemplate | Out-File -FilePath "C:\path\to\packetbeat\packetbeat.yml"
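The temporary Interfaces.yml is not strictly required, since the interface lines can be assembled in memory. A minimal Python sketch of that idea follows. The function name build_config is hypothetical, the device count is assumed to come from the same packetbeat.exe devices call used above, and the output layout only approximates what the scripts above write.

```python
def build_config(device_count, template_text):
    """Return the config text: header, one device line per index, then the template."""
    header = "# ================== Set listening interfaces =================="
    device_lines = [f"packetbeat.interfaces.device: {i}" for i in range(device_count)]
    # An empty string in the list yields the blank line between the two parts.
    return "\n".join([header, *device_lines, "", template_text])

# Usage with a made-up template fragment:
print(build_config(2, "output.elasticsearch.hosts: ['localhost:9200']\n"))
```

Keeping the assembly in one function makes it easy to check the result before it is written to packetbeat.yml.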

Iterate a windows ascii text file, find all instances of {LINE2 1-9999} replace with {LINE2 "line number the code is on"}. Overwrite. Faster?

This code works. I just want to see how much faster someone can make it work.
Back up your Windows 10 batch file in case something goes wrong. Find all instances of the string {LINE2 1-9999} and replace them with {LINE2 "line number the code is on"}. Overwrite the file, encoding it as ASCII.
If _61.bat is:
TITLE %TIME% NO "%zmyapps1%\*.*" ARCHIVE ATTRIBUTE LINE2 1243
TITLE %TIME% DOC/SET YQJ8 LINE2 1887
SET ztitle=%TIME%: WINFOLD LINE2 2557
TITLE %TIME% _*.* IN WINFOLD LINE2 2597
TITLE %TIME% %%ZDATE1%% YQJ25 LINE2 3672
TITLE %TIME% FINISHED. PRESS ANY KEY TO SHUTDOWN ... LINE2 4922
Results:
TITLE %TIME% NO "%zmyapps1%\*.*" ARCHIVE ATTRIBUTE LINE2 1
TITLE %TIME% DOC/SET YQJ8 LINE2 2
SET ztitle=%TIME%: WINFOLD LINE2 3
TITLE %TIME% _*.* IN WINFOLD LINE2 4
TITLE %TIME% %%ZDATE1%% YQJ25 LINE2 5
TITLE %TIME% FINISHED. PRESS ANY KEY TO SHUTDOWN ... LINE2 6
Code:
Copy-Item $env:windir\_61.bat -d $env:temp\_61.bat
(gc $env:windir\_61.bat) | foreach -Begin {$lc = 1} -Process {
$_ -replace "LINE2 \d*", "LINE2 $lc";
$lc += 1
} | Out-File -Encoding Ascii $env:windir\_61.bat
I expect this to take less than 984 milliseconds. It takes 984 milliseconds. Can you think of anything to speed it up?
The key to better performance in PowerShell code (short of embedding C# code compiled on demand with Add-Type, which may or may not help) is to:
avoid use of cmdlets and the pipeline in general,
especially invocation of a script block ({...}) for each pipeline input object, such as with ForEach-Object and Where-Object
However, it isn't the pipeline per se that is to blame, it is the current inefficient implementation of these cmdlets - see GitHub issue #10982 - and there is a workaround that noticeably improves pipeline performance:
# Faster alternative to:
# 1..10 | ForEach-Object { $_ * 10 }
1..10 | . { process { $_ * 10 } }
# Faster alternative to:
# 1..10 | Where-Object { $_ -gt 5 }
1..10 | . { process { if ($_ -gt 5) { $_ } } }
Avoiding the pipeline requires direct use of the .NET framework types as an alternative to cmdlets.
If feasible, use switch statements for array or line-by-line file processing - switch statements generally outperform foreach loops.
To be clear: The pipeline and cmdlets offer clear benefits, so avoiding them should only be done if optimizing performance is a must.
In your case, the following code, which combines the switch statement with direct use of the .NET framework for file I/O seems to offer the best performance - note that the input file is read into memory as a whole, as an array of lines, and a copy of that array with the modified lines is created before it is written back to the input file:
$file = "$env:temp\_61.bat" # must be a *full* path.
$lc = 0
$updatedLines = & { switch -Regex -File $file {
'^(.*? LINE2 )\d+(.*)$' { $Matches[1] + ++$lc + $Matches[2] }
default { ++$lc; $_ } # pass non-matching lines through
} }
[IO.File]::WriteAllLines($file, $updatedLines, [Text.Encoding]::ASCII)
Note:
Enclosing the switch statement in & { ... } is an obscure performance optimization explained in this answer.
If case-sensitive matching is sufficient, as suggested by the sample input, you can improve performance a little more by adding the -CaseSensitive option to the switch command.
In my tests (see below), this provided a more than 4-fold performance improvement in Windows PowerShell relative to your command.
Here's a performance comparison via the Time-Command function:
The commands compared are:
The switch command from above.
A slightly streamlined version of your own command.
A PowerShell Core v6.1+ alternative that uses the -replace operator with the array of lines as the LHS and a scriptblock as the replacement expression.
Instead of a 6-line sample file, a 6,000-line file is used.
100 runs are being averaged.
It's easy to adjust these parameters.
# Sample file content (6 lines)
$fileContent = @'
TITLE %TIME% NO "%zmyapps1%\*.*" ARCHIVE ATTRIBUTE LINE2 1243
TITLE %TIME% DOC/SET YQJ8 LINE2 1887
SET ztitle=%TIME%: WINFOLD LINE2 2557
TITLE %TIME% _*.* IN WINFOLD LINE2 2597
TITLE %TIME% %%ZDATE1%% YQJ25 LINE2 3672
TITLE %TIME% FINISHED. PRESS ANY KEY TO SHUTDOWN ... LINE2 4922
'@
# Determine the full path to a sample file.
# NOTE: Using the *full* path is a *must* when calling .NET methods, because
# the latter generally don't see the same working dir. as PowerShell.
$file = "$PWD/test.bat"
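# Create the sample file with the sample content repeated N times.
# A newline is appended to each copy so that the repeated blocks don't run together.
$repeatCount = 1000 # -> 6,000 lines
[IO.File]::WriteAllText($file, ($fileContent + [Environment]::NewLine) * $repeatCount)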
# Warm up the file cache and count the lines.
$lineCount = [IO.File]::ReadAllLines($file).Count
# Define the commands to compare as an array of scriptblocks.
$commands =
{ # switch -Regex -File + [IO.File]::Read/WriteAllLines()
$i = 0
$updatedLines = & { switch -Regex -File $file {
'^(.*? LINE2 )\d+(.*)$' { $Matches[1] + ++$i + $Matches[2] }
default { ++$i; $_ } # pass non-matching lines through, but still count them
} }
[IO.File]::WriteAllLines($file, $updatedLines, [text.encoding]::ASCII)
},
{ # Get-Content + -replace + Set-Content
(Get-Content $file) | ForEach-Object -Begin { $i = 1 } -Process {
$_ -replace "LINE2 \d*", "LINE2 $i"
++$i
} | Set-Content -Encoding Ascii $file
}
# In PS Core v6.1+, also test -replace with a scriptblock operand.
if ($PSVersionTable.PSVersion.Major -gt 6 -or ($PSVersionTable.PSVersion.Major -eq 6 -and $PSVersionTable.PSVersion.Minor -ge 1)) {
$commands +=
{ # -replace with scriptblock + [IO.File]::Read/WriteAllLines()
$i = 0
[IO.File]::WriteAllLines($file,
([IO.File]::ReadAllLines($file) -replace '(?<= LINE2 )\d+', { (++$i) }),
[text.encoding]::ASCII
)
}
} else {
Write-Warning "Skipping -replace-with-scriptblock command, because it isn't supported in this PS version."
}
# How many runs to average.
$runs = 100
Write-Verbose -vb "Averaging $runs runs with a $lineCount-line file of size $('{0:N2} MB' -f ((Get-Item $file).Length / 1mb))..."
Time-Command -Count $runs -ScriptBlock $commands
Here are sample results from my Windows 10 machine (the absolute timings aren't important, but hopefully the relative performance shown in the Factor column is somewhat representative); the PowerShell Core version used is v6.2.0-preview.4
# Windows 10, Windows PowerShell v5.1
WARNING: Skipping -replace-with-scriptblock command, because it isn't supported in this PS version.
VERBOSE: Averaging 100 runs with a 6000-line file of size 0.29 MB...
Factor Secs (100-run avg.) Command
------ ------------------- -------
1.00 0.108 # switch -Regex -File + [IO.File]::Read/WriteAllLines()...
4.22 0.455 # Get-Content + -replace + Set-Content...
# Windows 10, PowerShell Core v6.2.0-preview 4
VERBOSE: Averaging 100 runs with a 6000-line file of size 0.29 MB...
Factor Secs (100-run avg.) Command
------ ------------------- -------
1.00 0.101 # switch -Regex -File + [IO.File]::Read/WriteAllLines()…
1.67 0.169 # -replace with scriptblock + [IO.File]::Read/WriteAllLines()…
4.98 0.503 # Get-Content + -replace + Set-Content…
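To make the renumbering rule itself easy to verify, here is a small Python sketch of the same transformation. It makes no performance claim about PowerShell. The function name renumber is hypothetical, and the lookbehind pattern is borrowed from the -replace variant above.

```python
import re

# Match the digits that follow " LINE2 " without consuming the marker itself.
LINE2_NUMBER = re.compile(r"(?<= LINE2 )\d+")

def renumber(lines):
    """Replace the number after ' LINE2 ' with the 1-based number of its line."""
    return [
        LINE2_NUMBER.sub(str(number), line)
        for number, line in enumerate(lines, start=1)
    ]

# Usage with made-up lines; the middle line has no marker but still counts.
for line in renumber(["TITLE a LINE2 1243", "REM no marker here", "TITLE b LINE2 2557"]):
    print(line)
```

As in the switch version, lines without the marker pass through unchanged but still advance the counter, so each number equals the line's position in the file.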

Windows Volume without a Partition

Background:
I'm working on a powershell script to automate installation from a USB stick via WinPE. Because the target systems have several drives, each possibly having a couple partitions, Windows quickly runs out of drive letters. Part of my script unassigns all drive letters, then reassigns only the necessary disks. Right now, I assign hard-coded letters to certain partitions, but I've run into a problem with one of the letters not being unassigned.
The issue is that I somehow have a volume with an assigned drive letter, yet there's apparently no underlying partition, and since Remove-PartitionAccessPath requires a partition object, there's no way to do it from powershell (without resorting to diskpart).
Here's the output of diskpart - you can see the selected disk has no partitions, yet somehow has a volume:
Microsoft DiskPart version 10.0.15063.0
Copyright (C) Microsoft Corporation.
On computer: MININT-6GI0UNM
DISKPART> list disk
Disk ### Status Size Free Dyn Gpt
-------- ------------- ------- ------- --- ---
Disk 0 Online 5589 GB 0 B *
Disk 1 Online 5589 GB 0 B *
Disk 2 Online 5589 GB 0 B *
Disk 3 Online 5589 GB 0 B *
Disk 4 Online 5589 GB 0 B *
Disk 5 Online 5589 GB 0 B *
Disk 6 Online 5589 GB 0 B *
Disk 7 Online 5589 GB 0 B *
Disk 8 Online 5589 GB 0 B *
Disk 9 Online 5589 GB 0 B *
Disk 10 Online 5589 GB 0 B *
Disk 11 Online 5589 GB 0 B *
Disk 12 Online 447 GB 0 B *
Disk 13 Online 447 GB 0 B *
Disk 14 Online 232 GB 0 B *
Disk 15 Online 29 GB 29 GB
Disk 16 Online 28 GB 0 B *
DISKPART> sel disk 15
Disk 15 is now the selected disk.
DISKPART> list part
There are no partitions on this disk to show.
DISKPART> detail disk
ATA Hypervisor USB Device
Disk ID: E0623CE6
Type : USB
Status : Online
Path : 0
Target : 0
LUN ID : 0
Location Path : UNAVAILABLE
Current Read-only State : No
Read-only : No
Boot Disk : No
Pagefile Disk : No
Hibernation File Disk : No
Crashdump Disk : No
Clustered Disk : No
Volume ### Ltr Label Fs Type Size Status Info
---------- --- ----------- ----- ---------- ------- --------- --------
Volume 20 E Removable 0 B Unusable
DISKPART>
Here's what happens when I try to remove the letter from powershell:
PS X:\sources> Get-Volume -DriveLetter E | Remove-PartitionAccessPath -AccessPath "E:"
Remove-PartitionAccessPath : The input object cannot be bound to any parameters for the command either because the
command does not take pipeline input or the input and its properties do not match any of the parameters that take
pipeline input.
At line:1 char:29
+ ... t-Volume -DriveLetter E | Remove-PartitionAccessPath -AccessPath "E:"
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidArgument: (MSFT_Volume (Ob...rosoft/Wind...):PSObject) [Remove-PartitionAccessPat
h], ParameterBindingException
+ FullyQualifiedErrorId : InputObjectNotBound,Remove-PartitionAccessPath
PS X:\sources> Get-Volume -DriveLetter E | fl *
OperationalStatus : Unknown
HealthStatus : Healthy
DriveType : Removable
FileSystemType : Unknown
DedupMode : NotAvailable
ObjectId : {1}\\MININT-6GI0UNM\root/Microsoft/Windows/Storage/Providers_v2\WSP_Volume.ObjectId="{63585070-
3cd2-11e7-b877-806e6f6e6963}:VO:\\?\Volume{635850c4-3cd2-11e7-b877-806e6f6e6963}\"
PassThroughClass :
PassThroughIds :
PassThroughNamespace :
PassThroughServer :
UniqueId : \\?\Volume{635850c4-3cd2-11e7-b877-806e6f6e6963}\
AllocationUnitSize : 0
DriveLetter : E
FileSystem :
FileSystemLabel :
Path : \\?\Volume{635850c4-3cd2-11e7-b877-806e6f6e6963}\
Size : 0
SizeRemaining : 0
PSComputerName :
CimClass : ROOT/Microsoft/Windows/Storage:MSFT_Volume
CimInstanceProperties : {ObjectId, PassThroughClass, PassThroughIds, PassThroughNamespace...}
CimSystemProperties : Microsoft.Management.Infrastructure.CimSystemProperties
PS X:\sources> Get-Volume -DriveLetter E | Get-Partition
PS X:\sources> $null -eq (Get-Volume -DriveLetter E | Get-Partition)
True
Powershell version table:
PS X:\sources> $PSVersionTable
Name Value
---- -----
PSVersion 5.1.15063.0
PSEdition Desktop
PSCompatibleVersions {1.0, 2.0, 3.0, 4.0...}
BuildVersion 10.0.15063.0
CLRVersion 4.0.30319.42000
WSManStackVersion 3.0
PSRemotingProtocolVersion 2.3
SerializationVersion 1.1.0.1
I can try to get more details about the contents of the disk in question if necessary.
What could be causing this? Is there a powershell workaround?
Note: I realize it would probably be better to have Windows pick drive letters instead of hard-coding them, but I'm still curious about the mysterious volume.
Try this:
Get-Volume -Drive 'E' | Get-Partition | Remove-PartitionAccessPath -AccessPath 'E:\'
Reference: https://blogs.technet.microsoft.com/heyscriptingguy/2015/12/07/powertip-use-powershell-to-remove-drive-letter/

Windows / NTFS: Two files with identical long-names in the same directory?

I have been a lurker at stackoverflow.com for many years (great site and users here), but never had the need to ask a question. Now the time has come :-) Let me begin:
OS: x64 Windows 8.0 to Windows 10 (15063.14) (the issue has existed for years, but I have never fully pursued it until now, so we can rule out that it is specific to one Windows version)
FS: NTFS
Issue: 2 files with the same (long) name in the same directory, and I cannot figure out how this is even possible. This has been happening for years whenever I manually upgrade my email client. When I copy its new main .EXE file (MailClient.exe) over to the same directory, I am never asked whether to replace the old one. Instead, both are placed there, with the exact same long name.
The issue has nothing to do with a specific directory, I can copy around both .EXE files to freshly created directories on the NTFS drive without issues (also getting no "overwrite" question there).
Let me show you:
C:\temp\2>dir
Volume in drive C is SSD 840 Pro
Volume Serial Number is 0C6D-D489
Directory of C:\temp\2
13.04.2017 02:29 <DIR> .
13.04.2017 02:29 <DIR> ..
21.10.2016 17:10 24.742.760 MailClient.exe
27.12.2016 03:26 24.911.872 MailCliеnt.exe
2 File(s) 49.654.632 bytes
2 Dir(s) 78.503.038.976 bytes free
However, if doing a dir /x, this comes up:
C:\temp\2>dir /x
Volume in drive C is SSD 840 Pro
Volume Serial Number is 0C6D-D489
Directory of C:\temp\2
13.04.2017 02:29 <DIR> .
13.04.2017 02:29 <DIR> ..
21.10.2016 17:10 24.742.760 MAILCL~2.EXE MailClient.exe
27.12.2016 03:26 24.911.872 MAILCL~1.EXE MailCliеnt.exe
2 File(s) 49.654.632 bytes
2 Dir(s) 78.503.038.976 bytes free
So they obviously have a different 8.3 name, OK, but the exact same long name. Here is another screenshot of the situation. Both files show the same location within the Windows "properties" dialog (right click) too. Unfortunately I am not allowed to post images just yet (it seems) - just tried. So you will have to take my word.
I cannot figure out how this is possible and this is bugging me ;) As soon as I rename both files for example to 1.exe, Windows starts telling me that there is already a file with that name in the same directory. So it obviously has something to do with the filename, but they are both exactly identical, no extra spaces, nothing, as you can see from the DIR command.
I've also tried renaming them, retyping the exact name "MailClient.exe" manually for both to make sure the characters are EXACTLY the same; Windows still won't complain, and they both end up there once again under the same name. However, renaming them both to "Mail.exe" will NOT work: Windows then says that another file with that name already exists. Naming them both back to "MailClient.exe" is absolutely fine, though, with no complaints from Windows.
Another fun fact about this, if I dir for mailclient.exe directly, this happens:
C:\temp\2>dir mailclient.exe
Volume in drive C is SSD 840 Pro
Volume Serial Number is 0C6D-D489
Directory of C:\temp\2
21.10.2016 17:10 24.742.760 MailClient.exe
1 File(s) 24.742.760 bytes
0 Dir(s) 78.501.998.592 bytes free
However, if looking for *.exe, this happens:
C:\temp\2>dir *.exe
Volume in drive C is SSD 840 Pro
Volume Serial Number is 0C6D-D489
Directory of C:\temp\2
21.10.2016 17:10 24.742.760 MailClient.exe
27.12.2016 03:26 24.911.872 MailCliеnt.exe
2 File(s) 49.654.632 bytes
0 Dir(s) 78.501.990.400 bytes free
This yields also interesting results:
C:\temp\2>ren mailclient.exe *.bak
C:\temp\2>dir
Volume in drive C is SSD 840 Pro
Volume Serial Number is 0C6D-D489
Directory of C:\temp\2
13.04.2017 02:50 <DIR> .
13.04.2017 02:50 <DIR> ..
21.10.2016 17:10 24.742.760 MailClient.bak
27.12.2016 03:26 24.911.872 MailCliеnt.exe
2 File(s) 49.654.632 bytes
2 Dir(s) 78.501.990.400 bytes free
And back:
C:\temp\2>ren mailclient.bak MailClient.exe
C:\temp\2>dir
Volume in drive C is SSD 840 Pro
Volume Serial Number is 0C6D-D489
Directory of C:\temp\2
13.04.2017 02:51 <DIR> .
13.04.2017 02:51 <DIR> ..
21.10.2016 17:10 24.742.760 MailClient.exe
27.12.2016 03:26 24.911.872 MailCliеnt.exe
2 File(s) 49.654.632 bytes
2 Dir(s) 78.501.982.208 bytes free
I've also checked the permissions on the files and taken ownership; it changes nothing. Additionally, I've cleared the NTFS journal and even the transaction log, and run chkdsk, which reveals no errors either.
Any ideas on this mysterious situation? What am I missing?
Thanks so much:)
UPDATE #1:
I've just tried this: going to Windows Explorer and renaming both files one after the other by truncating their names. So I first renamed the first "MailClient.exe" to "MailClien.exe", then the second "MailClient.exe" to "MailClien.exe". Again, no message from Windows that they have the same name; it just renamed both fine. I then continued to "MailClie.exe". Worked.
However, as soon as I tried to rename both to "MailCli.exe", Windows complained and told me that there is already another file with that name. Trying to rename both back from there to "MailClient.exe" also does not work, just for one of them, because then Windows says (and rightly so) that a file with that name already exists. So it seems to come down to the "e" possibly being a different ANSI character in the two filenames? I, however, wouldn't know of another one for "e", or am I missing something?
Harry Johnston is right: one of the filenames contains a Unicode character that just looks the same as an ANSI character.
Read Naming Files, Paths, and Namespaces:
On newer file systems, such as NTFS, exFAT, UDFS, and FAT32, Windows
stores the long file names on disk in Unicode, which means that the
original long file name is always preserved. This is true even if a
long file name contains extended characters, regardless of the code
page that is active during a disk read or write operation.
Use the following PowerShell script 43381802b.ps1 to detect and show non-ANSI file names (see different calls below):
param( [string[]]$Path = '.',
[switch]$Cpp, ### list any non-ANSI character in file names like a C++ literal
### i.e. a prefix \u followed by a four digit Unicode code point
[switch]$All ### list all files including pure ANSI-encoded file names
)
Set-StrictMode -Version latest
$strArr = Get-ChildItem -path $Path
$arrDiff = @()
for ($i=0; $i -lt $strArr.Count; $i++) {
$strDiff = 'ANSI'
$strName = ''
$auxName = $strArr[$i].Name
for ( $k=0; $k -lt $auxName.Length; $k++ ) {
if ( [int][char]$auxName[$k] -gt 255 ) {
$strDiff = 'UCS2'
$strName += '\u{0:X4}' -f [int][char]$auxName[$k]
} else {
$strName += $auxName[$k]
}
}
if ( $All.IsPresent -or $strDiff -eq 'UCS2' ) {
$strArr[$i] | Add-Member NoteProperty Code $strDiff
$strArr[$i] | Add-Member NoteProperty CppName $strName
$arrDiff += $strArr[$i]
}
}
if ( $Cpp.IsPresent ) {
$arrDiff | Select-Object -Property Code, Mode, LastWriteTime, Length, CppName | ft
} else {
$arrDiff | Select-Object -Property Code, Mode, LastWriteTime, Length, Name | ft
}
Output:
PS D:\PShell> .\SO\43381802b.ps1 'C:\testC\43381802'
Code Mode LastWriteTime Length Name
---- ---- ------------- ------ ----
UCS2 -a---- 02/05/2017 11:47:53 317 MailCliеnt.txt
UCS2 -a---- 02/05/2017 11:49:04 317 МailClient.txt
UCS2 -a---- 02/05/2017 11:50:16 399 МailCliеnt.txt
PS D:\PShell> .\SO\43381802b.ps1 'C:\testC\43381802' -Cpp
Code Mode LastWriteTime Length CppName
---- ---- ------------- ------ -------
UCS2 -a---- 02/05/2017 11:47:53 317 MailCli\u0435nt.txt
UCS2 -a---- 02/05/2017 11:49:04 317 \u041CailClient.txt
UCS2 -a---- 02/05/2017 11:50:16 399 \u041CailCli\u0435nt.txt
PS D:\PShell> .\SO\43381802b.ps1 'C:\testC\43381802' -Cpp -All
Code Mode LastWriteTime Length CppName
---- ---- ------------- ------ -------
ANSI -a---- 02/05/2017 11:44:05 235 MailClient.txt
UCS2 -a---- 02/05/2017 11:47:53 317 MailCli\u0435nt.txt
UCS2 -a---- 02/05/2017 11:49:04 317 \u041CailClient.txt
UCS2 -a---- 02/05/2017 11:50:16 399 \u041CailCli\u0435nt.txt
Use the following 43381802a.ps1 script to get more info about non-ANSI characters (see the first call below) and their position in file names (see the latter call below with the -Detail switch):
param( [string[]] $strArr = @('ΗGreek', 'НCyril', 'HLatin'),
[switch]$Detail )
Set-StrictMode -Version latest
$auxArr = @()
if ( ( Get-Command -Name Get-CharInfo -ErrorAction SilentlyContinue ) -and
( -not $Detail.IsPresent ) ) {
$auxArr = $strArr | Get-CharInfo |
Where-Object { [int]$_.Codepoint.Replace('U+', '0x') -ge 128 }
} else {
foreach ($strStr in $strArr) {
for ($i = 0; $i -lt $strStr.Length; $i++ ) {
if ( [int][char]$strStr[$i] -ge 128 ) {
$auxArr += [PSCustomObject] @{
Char = $strStr[$i]
CodePoint = 'U+{0:x4}' -f [int][char]$strStr[$i]
Category = $i + 1 ### 1-based index
Description = $strStr ### string itself
}
}
}
}
}
$auxArr
Output:
PS D:\PShell> .\SO\43381802a.ps1 ( Get-childitem -path 'C:\testC\43381802' ).Name
Char CodePoint Category Description
---- --------- -------- -----------
е U+0435 LowercaseLetter Cyrillic Small Letter Ie
М U+041C UppercaseLetter Cyrillic Capital Letter Em
М U+041C UppercaseLetter Cyrillic Capital Letter Em
е U+0435 LowercaseLetter Cyrillic Small Letter Ie
PS D:\PShell> .\SO\43381802a.ps1 ( Get-childitem -path 'C:\testC\43381802' ).Name -detail
Char CodePoint Category Description
---- --------- -------- -----------
е U+0435 8 MailCliеnt.txt
М U+041c 1 МailClient.txt
М U+041c 1 МailCliеnt.txt
е U+0435 8 МailCliеnt.txt
Tested on files:
==> dir /-C /X /A-D C:\testC\43381802\
Volume in drive C has no label.
Volume Serial Number is …
Directory of C:\testC\43381802
02/05/2017 11:44 235 MAILCL~1.TXT MailClient.txt
02/05/2017 11:47 317 MAILCL~2.TXT MailCliеnt.txt
02/05/2017 11:49 317 AILCLI~1.TXT МailClient.txt
02/05/2017 11:50 399 AILCLI~2.TXT МailCliеnt.txt
4 File(s) 1268 bytes
0 Dir(s) 69914857472 bytes free
==>
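The same look-alike check can be sketched in a few lines of Python for anyone who wants to confirm a suspicious name outside PowerShell. This is an illustration only: the function name non_ascii_chars is hypothetical, and it flags every character above U+007F, which is a stricter threshold than the 255 used in the first script.

```python
import unicodedata

def non_ascii_chars(name):
    """Return (1-based position, code point, Unicode name) for each non-ASCII character."""
    return [
        (position, f"U+{ord(char):04X}", unicodedata.name(char, "UNKNOWN"))
        for position, char in enumerate(name, start=1)
        if ord(char) > 127
    ]

# The second name contains a Cyrillic letter written as an escape so it is visible here.
for candidate in ["MailClient.exe", "MailCli\u0435nt.exe"]:
    print(candidate.encode("unicode_escape").decode("ascii"), non_ascii_chars(candidate))
```

A name made only of ASCII characters yields an empty list, so any non-empty result points at the exact position of the look-alike character.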

Get last n lines or bytes of a huge file in Windows (like Unix's tail). Avoid time consuming options

I need to retrieve the last n lines of huge files (1-4 Gb), in Windows 7.
Due to corporate restrictions, I cannot run any command that is not built-in.
The problem is that all solutions I found appear to read the whole file, so they are extremely slow.
Can this be accomplished, fast?
Notes:
I managed to get the first n lines, fast.
It is ok if I get the last n bytes. (I used this https://stackoverflow.com/a/18936628/2707864 for the first n bytes).
Solutions here Unix tail equivalent command in Windows Powershell did not work.
Using -wait does not make it fast. I do not have -tail (and I do not know if it will work fast).
PS: There are quite a few related questions for head and tail, but not focused on the issue of speed. Therefore, useful or accepted answers there may not be useful here. E.g.,
Windows equivalent of the 'tail' command
CMD.EXE batch script to display last 10 lines from a txt file
Extract N lines from file using single windows command
https://serverfault.com/questions/490841/how-to-display-the-first-n-lines-of-a-command-output-in-windows-the-equivalent
powershell to get the first x MB of a file
https://superuser.com/questions/859870/windows-equivalent-of-the-head-c-command
If you have PowerShell 3 or higher, you can use the -Tail parameter for Get-Content to get the last n lines.
Get-content -tail 5 PATH_TO_FILE;
On a 34MB text file on my local SSD, this returned in 1 millisecond vs. 8.5 seconds for get-content |select -last 5
How about this (reads last 8 bytes for demo):
$fpath = "C:\10GBfile.dat"
$fs = [IO.File]::OpenRead($fpath)
$fs.Seek(-8, 'End') | Out-Null
for ($i = 0; $i -lt 8; $i++)
{
$fs.ReadByte()
}
UPDATE. To interpret bytes as string (but be sure to select correct encoding - here UTF8 is used):
$N = 8
$fpath = "C:\10GBfile.dat"
$fs = [IO.File]::OpenRead($fpath)
$fs.Seek(-$N, [System.IO.SeekOrigin]::End) | Out-Null
$buffer = new-object Byte[] $N
$fs.Read($buffer, 0, $N) | Out-Null
$fs.Close()
[System.Text.Encoding]::UTF8.GetString($buffer)
UPDATE 2. To read last M lines, we'll be reading the file by portions until there are more than M newline char sequences in the result:
$M = 3
$fpath = "C:\10GBfile.dat"
$result = ""
$seq = "`r`n"
$buffer_size = 10
$buffer = new-object Byte[] $buffer_size
$fs = [IO.File]::OpenRead($fpath)
while (([regex]::Matches($result, $seq)).Count -lt $M)
{
$fs.Seek(-($result.Length + $buffer_size), [System.IO.SeekOrigin]::End) | Out-Null
$fs.Read($buffer, 0, $buffer_size) | Out-Null
$result = [System.Text.Encoding]::UTF8.GetString($buffer) + $result
}
$fs.Close()
($result -split $seq) | Select -Last $M
Try playing with bigger $buffer_size - this ideally is equal to expected average line length to make fewer disk operations. Also pay attention to $seq - this could be \r\n or just \n.
This is very dirty code without any error handling and optimizations.
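The block-wise backward read described above can be written a little more defensively. The Python sketch below illustrates the technique under assumptions: the names tail_lines and block_size are hypothetical, the file is assumed to be UTF-8, and lines are assumed to end in \n or \r\n. It stops at the start of the file instead of seeking past it.

```python
import os
import tempfile

def tail_lines(path, count, block_size=4096):
    """Return the last `count` lines of a file by reading blocks from the end."""
    if count <= 0:
        return []
    with open(path, "rb") as handle:
        handle.seek(0, os.SEEK_END)
        position = handle.tell()
        data = b""
        # Keep reading backwards until more than `count` newlines are buffered,
        # so the first kept line is complete, or until the file start is reached.
        while position > 0 and data.count(b"\n") <= count:
            step = min(block_size, position)
            position -= step
            handle.seek(position)
            data = handle.read(step) + data
    # Split on raw bytes first so a multi-byte character is never cut in half.
    return [line.decode("utf-8", errors="replace") for line in data.splitlines()[-count:]]

# Usage with a throwaway file of 1000 short lines.
with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
    tmp.write(b"".join(b"line %d\r\n" % n for n in range(1000)))
print(tail_lines(tmp.name, 3, block_size=16))
os.remove(tmp.name)
```

Only the blocks near the end of the file are read, so the cost depends on the length of the requested lines rather than on the file size.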
When the file is already open in another process, it's better to use
Get-Content $fpath -tail 10
because OpenRead fails with: Exception calling "OpenRead" with "1" argument(s): "The process cannot access the file..."
This is not an answer, but a large comment as reply to sancho.s' answer.
When you want to use small PowerShell scripts from a Batch file, I suggest you to use the method below, that is simpler and allows to keep all the code in the same Batch file:
@PowerShell ^
$fpath = %2; ^
$fs = [IO.File]::OpenRead($fpath); ^
$fs.Seek(-%1, 'End') ^| Out-Null; ^
$mystr = ''; ^
for ($i = 0; $i -lt %1; $i++) ^
{ ^
$mystr = ($mystr) + ([char[]]($fs.ReadByte())); ^
} ^
Write-Host $mystr
%End PowerShell%
With the awesome answer by Aziz Kabyshev, which solves the issue of speed, and with some googling, I ended up using this script
$fpath = $Args[1]
$fs = [IO.File]::OpenRead($fpath)
$fs.Seek(-$Args[0], 'End') | Out-Null
$mystr = ''
for ($i = 0; $i -lt $Args[0]; $i++)
{
$mystr = ($mystr) + ([char[]]($fs.ReadByte()))
}
$fs.Close()
Write-Host $mystr
which I call from a batch file containing
@PowerShell -NoProfile -ExecutionPolicy Bypass -Command "& '.\myscript.ps1' %1 %2"
(thanks to How to run a PowerShell script from a batch file).
Get last n bytes of a file:
set file="C:\Covid.mp4"
set n=7
copy /b %file% tmp
for %i in (tmp) do set /a m=%~zi-%n%
FSUTIL file seteof tmp %m%
fsutil file createnew temp 1
FSUTIL file seteof temp %n%
type temp >> tmp
fc /b tmp %file% | more +1 > temp
REM problem parsing file with byte offsets in hex from fc, to be converted to decimal offsets before output
type nul > tmp
for /f "tokens=1-3 delims=: " %i in (temp) do set /a 0x%i >> tmp & set /p=": " <nul>> tmp & echo %j %k >> tmp
set /a n=%m%+%n%-1
REM output
type nul > temp
for /l %j in (%m%,1,%n%) do (find "%j: "< tmp || echo doh: la 00)>> temp
(for /f "tokens=3" %i in (temp) do set /p=%i <nul) & del tmp & del temp
Tested on Win 10 cmd Surface Laptop 1
Result: 1.43 GB file processed in 10 seconds

Resources