remove last n characters of a file using batch - windows

I have around 500 files in a folder. I concatenated 4KB of data (stored as a file) to each of them using the batch command: for /r %i in (*) do type 4KB file.txt >> %i
Now I want to revert them to their original state. A few of the files are around 14GB, and just trying to read them takes a long time.
Please let me know how I can revert them back to their original state.

First off, the difficulty of this task depends on the actual string you appended to the file.
Get-ChildItem [your path here] | ForEach-Object {
    # Keep only the lines that do NOT match the appended content
    $TempVar = Get-Content $_ | Select-String -Pattern "Something that uniquely matches the string you want to remove" -NotMatch
    $TempVar | Set-Content $_ }
Alternatively:
Get-ChildItem [your path here] | ForEach-Object {
    $Text = Get-Content $_ -Raw          # read the whole file as a single string
    $Length = $Text.Length - 4096        # as you have 4KB, exactly?
    $NewFile = $Text.Substring(0, $Length)
    $NewFile | Set-Content $_ -NoNewline
}
Both attempts should be treated with caution, as they may affect encoding, file structure, etc. - I just don't have enough time to check on that. If the files are of any importance, back them up first.
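If the appended block is always exactly 4096 bytes, another option is to truncate each file in place, which avoids reading multi-gigabyte files into memory at all. A minimal sketch, assuming every file in the folder had exactly one 4 KB block appended and has not been touched since (the path is a placeholder):
Get-ChildItem 'C:\your\folder' -Recurse -File | ForEach-Object {
    # Open for read/write and cut the last 4096 bytes off the end
    $fs = [System.IO.File]::Open($_.FullName, [System.IO.FileMode]::Open, [System.IO.FileAccess]::ReadWrite)
    try {
        if ($fs.Length -gt 4096) {
            $fs.SetLength($fs.Length - 4096)   # truncate in place, no rewrite of the rest
        }
    }
    finally {
        $fs.Dispose()
    }
}
This still deserves a backup first, but it works on byte counts rather than text, so encoding is not an issue and even the 14GB files finish almost instantly.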

Related

Script in PowerShell to add checksum as alternate data stream fails with some file names but otherwise works

I want to check files for integrity with a checksum. To make it easier I put the hash into an alternate data stream of the file. When someone alters the file I can verify this with the checksum.
However, when I add a data stream the file's LastWriteTime gets updated, so I added functionality to reverse it.
It works like a charm - mostly. But it fails with some files, about 5%. I have no idea why. It looks like it fails with file names that contain spaces or extra dots, but many other that have spaces and multiple dots in the file name work just fine.
Does anyone know what's going on, how to prevent these failures or how to improve the code?
Thanks!
The code:
$filenames = Get-ChildItem *.xl* -Recurse | % { $_.FullName }
foreach( $filename in $filenames ) { ForEach-Object { $timelwt = Get-ItemProperty $filename | select -expand LastWriteTime | select -expand ticks } {add-content -stream MD5 -value (Get-FileHash -a md5 $filename).hash $filename } { Set-ItemProperty $filename -Name LastWriteTime -Value $timelwt}}
Your code can be reduced to this:
Get-ChildItem *.xl* -Recurse | ForEach-Object {
    $lastWriteTime = $_.LastWriteTime
    $_ | Add-Content -Stream MD5 -Value ($_ | Get-FileHash -a md5).Hash
    $_.LastWriteTime = $lastWriteTime
}
Get-ChildItem with the -Filter you have in place will return FileInfo objects, which have a settable LastWriteTime property, so there is no reason to use Get-ItemProperty or Set-ItemProperty on them.
As for why your code could be failing, the likely explanation is that some of your file paths contain wildcard metacharacters, and since you're not using -LiteralPath, the cmdlets default to the -Path parameter (which interprets wildcard metacharacters).
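For example, a file named report[1].xlsx (a hypothetical name) contains the wildcard metacharacters [ and ], so -Path treats it as a pattern. The same loop body with -LiteralPath everywhere would be, as a sketch:
$timelwt = Get-ItemProperty -LiteralPath $filename | Select-Object -ExpandProperty LastWriteTime
Add-Content -LiteralPath $filename -Stream MD5 -Value (Get-FileHash -LiteralPath $filename -Algorithm MD5).Hash
Set-ItemProperty -LiteralPath $filename -Name LastWriteTime -Value $timelwt
(Note this keeps the full DateTime rather than the raw ticks, which Set-ItemProperty accepts directly.)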
As an aside, I would personally recommend creating a separate checksum file for the files instead of adding an alternate data stream.
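For example, a single sidecar CSV for the whole tree could be produced like this (a rough sketch; checksums.csv is just an assumed name):
# One CSV with the MD5 of every workbook, instead of a stream per file
Get-ChildItem *.xl* -Recurse |
    Get-FileHash -Algorithm MD5 |
    Select-Object Path, Hash |
    Export-Csv -Path .\checksums.csv -NoTypeInformation
This also leaves LastWriteTime untouched, so the workaround in the original code is no longer needed.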

Rename multiple files in a directory by looping through a list of names in a .txt / .csv file for a partial match

I'm trying to automate the renaming of many files in a Windows 7 directory. I need to search a source index file (.txt or .csv, which is a list of extended file names) and, where there is a partial match to the original file name, copy the first 12 characters of the relevant string in the index file and rename the original file accordingly (preserving the original file extension).
e.g.
(a) Files currently in the Windows directory are named as follows (hundreds of files):
23456abc.doc
76543cab.doc
92837bca.doc
(b) Values in the .txt/.csv file as follows (hundreds of values - NOTE: these do not have file extensions):
BetterName1.RandomText1.23456abc.MoreRandomText1
BetterName2.RandomText2.76543cab.MoreRandomText2
BetterName3.RandomText3.92837bca.MoreRandomText3
(c) Desired Result is for the files to be auto renamed as follows:
[by searching for filename in (a) within the list of values in (b) and, where there is a match, returning the first 12 characters as the new filename whilst preserving the original file extension]
BetterName1.doc
BetterName2.doc
BetterName3.doc
NOTE: My preference is to use an index file for the look-up that is in .txt format. However, if needed, I can also use a .csv.
I have never used PowerShell before and am new to Windows batch scripting. I have searched around and tried to cobble together snippets of code into a Windows batch script (also tried a PowerShell script) to achieve this but my knowledge in this area is seriously lacking so unfortunately I'm still struggling away at square one.
Any assistance would be greatly appreciated. Thank you in advance.
P.S. Here is a PowerShell script that I tried to get working but to no avail.
$fdPath = 'C:\TEST\Data'
$sourcelistFiles = Get-ChildItem -Path $FDPATH\*.txt | ForEach-Object {$_.user } FullName
$findReplaceList = Import-Csv -Path $FDPATH\AllNames.csv
$totalitems = $sourcelistFiles.count
$currentrow = 0
foreach ($sourcelistFile in $sourcelistFiles)
{
$currentrow += 1
Write-Progress -Activity "Processing record $currentrow of $totalitems" -Status "Progress:" -PercentComplete (($currentrow / $totalitems) * 100)
[string] $txtSourceListFile = Get-Content $sourcelistFile | Out-String
ForEach ($findReplaceItem in $findReplaceList)
{
$txtSourceListFile = $txtSourceListFile -replace "$($findReplaceitem.FindString)", "$($findReplaceitem.ReplaceString)"
}
$txtSourceListFile | Set-Content ($sourcelistFile) -NoNewLine
}
$FDPATH = 'C:\TEST\Data'
foreach($obj in (Get-Content -Path $FDPATH\AllNames.csv)){   # each index entry as a plain string
    foreach($thing in $(gci -Path $FDPATH\*.doc)){            # the files to be renamed; adjust the filter as needed
        if($obj -match $thing.BaseName){$ret = $thing.FullName}
    }
    $stuff = $obj -split "\."
    ren -Path $ret -NewName ($stuff[0] + [IO.Path]::GetExtension($ret))   # keep the original extension
}
See if this works: it iterates through the index file, then iterates through the directory to see whether each file's base name appears in the line currently being processed, sets a variable to the full name of the matching file, and renames it to the first name before the period while keeping the original extension.
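An alternative sketch that avoids re-scanning the folder for every index line: build a lookup table from the embedded code to the new base name first, then rename in a single pass. It assumes the index is a plain-text file at C:\TEST\Data\AllNames.txt, that the code is always the third dot-separated field (as in the sample entries), and that the files to rename sit in the same folder; all of those are assumptions to adjust.
$dataDir   = 'C:\TEST\Data'           # assumed folder holding the files to rename
$indexFile = "$dataDir\AllNames.txt"  # assumed index file, one entry per line
$lookup = @{}
foreach ($entry in Get-Content $indexFile) {
    $fields = $entry -split '\.'
    $lookup[$fields[2]] = $fields[0]  # e.g. 23456abc -> BetterName1
}
Get-ChildItem $dataDir -File | ForEach-Object {
    if ($lookup.ContainsKey($_.BaseName)) {
        # Keep the original extension, e.g. 23456abc.doc -> BetterName1.doc
        Rename-Item -LiteralPath $_.FullName -NewName ($lookup[$_.BaseName] + $_.Extension)
    }
}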
Import-CSV LISTA.csv -Header newFileName | % { Copy-Item -Path archivo_convert.pdf -Destination "$($_.newfilename).pdf" }

PowerShell Format-Table -AutoSize not Producing an Output File

When running the following line in PowerShell including the "Format-Table -AutoSize", an empty output file is generated:
Get-ChildItem -Recurse | select FullName,Length | Format-Table -AutoSize | Out-File filelist.txt
The reason I need the output to be auto-sized is that longer filenames from the directory are being truncated. I am trying to pull the file name and file size for every file within a folder and its subfolders. When I remove the -AutoSize element, an output file is generated, but with truncated file names:
Get-ChildItem -Recurse | select FullName,Length | Out-File filelist.txt
Like AdminOfThings commented, use Export-CSV to get the untruncated values of your object.
Get-ChildItem -Recurse | select FullName,Length | Export-Csv -Path $myPath -NoTypeInformation
I do not use Out-File much at all, and I only use Format-Table/Format-List for interactive scripts. If I want to write data to a file, Select-Object Column1,Column2 | Sort-Object Column1 | Export-Csv lets me select the properties of the object I am exporting and sort the records as needed. You can change the delimiter from a comma to a tab, pipe, or whatever else you may need.
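For instance, a tab-delimited export might look like this (the output name is only an example):
Get-ChildItem -Recurse | Select-Object FullName, Length |
    Sort-Object FullName |
    Export-Csv -Path .\filelist.tsv -Delimiter "`t" -NoTypeInformation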
While the other answer may address the issue, you may have other reasons for wanting to use Out-File. Out-File has a "Width" parameter. If this is not set, PowerShell defaults to 80 characters - hence your issue. This should do the trick:
Get-ChildItem -Recurse | select FullName,Length | Out-File filelist.txt -Width 250 (or any other value)
The Format-* cmdlets in PowerShell are only intended for display in the console. They do not produce output that can usefully be piped to other cmdlets.
The usual approach to get the data out is with Export-Csv. CSV files are easily imported into other scripts or spreadsheets.
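Once exported, the file can be pulled back in and queried as objects; for example (assuming it was saved as filelist.csv):
# Re-import the report and show the ten largest files
Import-Csv .\filelist.csv |
    Sort-Object { [long]$_.Length } -Descending |
    Select-Object -First 10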
If you really need to output a nicely formatted text file you can use .Net composite formatting with the -f (format) operator. This works similarly to printf() in C. Here is some sample code:
# Get the files for the report
$files = Get-ChildItem $baseDirectory -Recurse
# Path column width
$nameWidth = $files.FullName |
ForEach-Object { $_.Length } |
Measure-Object -Maximum |
Select-Object -ExpandProperty Maximum
# Size column width
$longestFileSize = $files |
ForEach-Object { $_.Length.tostring().Length } |
Measure-Object -Maximum |
Select-Object -ExpandProperty Maximum
# Have to consider that some directories will have no files with
# length strings longer than "Size (Bytes)"
$sizeWidth = [System.Math]::Max($longestFileSize, "Size (Bytes)".Length)
# Left-align paths, right-align file sizes
$formatString = "{0,-$nameWidth} {1,$sizeWidth}"
# Build the report and write it to a file
# An ArrayList is much more efficient than using += with arrays
$lines = [System.Collections.ArrayList]::new($files.Length + 3)
# The [void] casts are just to prevent ArrayList.Add() from cluttering the
# console with the returned indices
[void]$lines.Add($formatString -f ("Path", "Size (Bytes)"))
[void]$lines.Add($formatString -f ("----", "------------"))
foreach ($file in $files) {
[void]$lines.Add($formatString -f ($file.FullName, $file.Length.ToString()))
}
$lines | Out-File "Report.txt"

Sort very large text file in PowerShell

I have standard Apache log files, between 500MB and 2GB in size. I need to sort the lines in them (each line starts with a date in yyyy-MM-dd hh:mm:ss format, so no preprocessing is necessary for sorting).
The simplest and most obvious thing that comes to mind is
Get-Content unsorted.txt | sort | get-unique > sorted.txt
I am guessing (without having tried it) that doing this using Get-Content would take forever on my 1GB files. I don't quite know my way around System.IO.StreamReader, but I'm curious whether an efficient solution could be put together using that?
Thanks to anyone who might have a more efficient idea.
[edit]
I tried this subsequently, and it took a very long time; some 10 minutes for 400MB.
Get-Content is terribly inefficient for reading large files, and Sort-Object is not very fast either.
Let's set up a base line:
$sw = [System.Diagnostics.Stopwatch]::StartNew();
$c = Get-Content .\log3.txt -Encoding Ascii
$sw.Stop();
Write-Output ("Reading took {0}" -f $sw.Elapsed);
$sw = [System.Diagnostics.Stopwatch]::StartNew();
$s = $c | Sort-Object;
$sw.Stop();
Write-Output ("Sorting took {0}" -f $sw.Elapsed);
$sw = [System.Diagnostics.Stopwatch]::StartNew();
$u = $s | Get-Unique
$sw.Stop();
Write-Output ("uniq took {0}" -f $sw.Elapsed);
$sw = [System.Diagnostics.Stopwatch]::StartNew();
$u | Out-File 'result.txt' -Encoding ascii
$sw.Stop();
Write-Output ("saving took {0}" -f $sw.Elapsed);
With a 40 MB file having 1.6 million lines (made of 100k unique lines repeated 16 times) this script produces the following output on my machine:
Reading took 00:02:16.5768663
Sorting took 00:02:04.0416976
uniq took 00:01:41.4630661
saving took 00:00:37.1630663
Totally unimpressive: more than 6 minutes to sort a tiny file. Every step can be improved a lot. Let's use a StreamReader to read the file line by line into a HashSet, which removes duplicates, then copy the data into a List and sort it there, then use a StreamWriter to dump the results back.
$hs = new-object System.Collections.Generic.HashSet[string]
$sw = [System.Diagnostics.Stopwatch]::StartNew();
$reader = [System.IO.File]::OpenText("D:\log3.txt")
try {
while (($line = $reader.ReadLine()) -ne $null)
{
$t = $hs.Add($line)
}
}
finally {
$reader.Close()
}
$sw.Stop();
Write-Output ("read-uniq took {0}" -f $sw.Elapsed);
$sw = [System.Diagnostics.Stopwatch]::StartNew();
$ls = new-object system.collections.generic.List[string] $hs;
$ls.Sort();
$sw.Stop();
Write-Output ("sorting took {0}" -f $sw.Elapsed);
$sw = [System.Diagnostics.Stopwatch]::StartNew();
try
{
$f = New-Object System.IO.StreamWriter "d:\result2.txt";
foreach ($s in $ls)
{
$f.WriteLine($s);
}
}
finally
{
$f.Close();
}
$sw.Stop();
Write-Output ("saving took {0}" -f $sw.Elapsed);
this script produces:
read-uniq took 00:00:32.2225181
sorting took 00:00:00.2378838
saving took 00:00:01.0724802
On the same input file it runs more than 10 times faster. I am still surprised, though, that it takes 30 seconds to read the file from disk.
I've grown to hate this part of Windows PowerShell; it is a memory hog with these larger files. One trick is to read the lines with [System.IO.File]::ReadLines('file.txt') | sort -u | out-file file2.txt -encoding ascii
Another trick, seriously, is to just use Linux.
cat file.txt | sort -u > output.txt
Linux is so insanely fast at this, it makes me wonder what on earth Microsoft was thinking with this setup.
It may not be feasible in all cases, and I understand that, but if you have a Linux machine, you can copy 500 megs to it, sort and unique it, and copy it back in under a couple of minutes.
If each line of the log is prefixed with a timestamp, and the log messages don't contain embedded newlines (which would require special handling), I think it would take less memory and execution time to convert the timestamp from [String] to [DateTime] before sorting. The following assumes each log entry is of the format yyyy-MM-dd HH:mm:ss: <Message> (note that the HH format specifier is used for a 24-hour clock):
Get-Content unsorted.txt `
| ForEach-Object {
    # Ignore empty lines; can substitute with [String]::IsNullOrWhitespace($_) on PowerShell 3.0 and above
    if (-not [String]::IsNullOrEmpty($_))
    {
        # Split into at most two fields, even if the message itself contains ': '
        [String[]] $fields = $_ -split ': ', 2;
        return New-Object -TypeName 'PSObject' -Property @{
            Timestamp = [DateTime] $fields[0];
            Message = $fields[1];
        };
    }
} | Sort-Object -Property 'Timestamp', 'Message';
If you are processing the input file for interactive display purposes you can pipe the above into Out-GridView or Format-Table to view the results. If you need to save the sorted results you can pipe the above into the following:
| ForEach-Object {
# Reconstruct the log entry format of the input file
return '{0:yyyy-MM-dd HH:mm:ss}: {1}' -f $_.Timestamp, $_.Message;
} `
| Out-File -Encoding 'UTF8' -FilePath 'sorted.txt';
(Edited to be more clear based on n0rd's comments)
It might be a memory issue. Since you're loading the entire file into memory to sort it (and adding the overhead of the pipe into Sort-Object and the pipe into Get-Unique), it's possible that you're hitting the memory limits of the machine and forcing it to page to disk, which will slow things down a lot. One thing you might consider is splitting the logs up before sorting them, and then splicing them back together.
This probably won't match your format exactly, but if I've got a large log file for, say, 8/16/2012 which spans several hours, I can split it up into a different file for each hour using something like this:
for($i=0; $i -le 23; $i++){ Get-Content .\u_ex120816.log | ? { $_ -match "^2012-08-16 $i`:" } | Set-Content -Path "$i.log" }
This is creating a regular expression for each hour of that day and dumping all the matching log entries into a smaller log file named by the hour (e.g. 16.log, 17.log).
Then I can run your process of sorting and getting unique entries on much smaller subsets, which should run a lot faster:
for($i=0; $i -le 23; $i++){ Get-Content "$i.log" | sort | get-unique > "${i}sorted.txt" }
And then you can splice them back together.
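Because each per-hour file is already sorted and the hours themselves are in order, the splice can be a straight concatenation; a sketch, using the ${i}sorted.txt names from above (the combined file name is arbitrary):
# Stitch the per-hour results back into one sorted log
0..23 | ForEach-Object { Get-Content "${_}sorted.txt" } |
    Set-Content sortedfull.txt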
Depending on the frequency of the logs, it might make more sense to split them by day, or minute; the main thing is to get them into more manageable chunks for sorting.
Again, this only makes sense if you're hitting the memory limits of the machine (or if Sort-Object is using a really inefficient algorithm).
"Get-Content" can be faster than you think. Check this code-snippet in addition to the above solution:
foreach ($block in (get-content $file -ReadCount 100)) {
    foreach ($line in $block) { [void]$hs.Add($line) }
}
There doesn't seem to be a great way to do it in PowerShell, including [IO.File]::ReadLines(), but with the native Windows sort.exe or the GNU sort.exe, either run from cmd.exe, 30 million random numbers can be sorted in about 5 minutes with around 1 GB of RAM. The GNU sort automatically breaks things up into temp files to save RAM. Both commands have options to start the sort at a certain character column. GNU sort can merge sorted files. See external sorting.
30 million line test file:
& { foreach ($i in 1..300kb) { get-random } } | set-content file.txt
And then in cmd:
copy file.txt+file.txt file2.txt
copy file2.txt+file2.txt file3.txt
copy file3.txt+file3.txt file4.txt
copy file4.txt+file4.txt file5.txt
copy file5.txt+file5.txt file6.txt
copy file6.txt+file6.txt file7.txt
copy file7.txt+file7.txt file8.txt
With GNU sort.exe from http://gnuwin32.sourceforge.net/packages/coreutils.htm (don't forget the dependency DLLs -- libiconv2.dll & libintl3.dll), within cmd.exe:
.\sort.exe < file8.txt > filesorted.txt
Or the Windows sort.exe within cmd.exe:
sort.exe < file8.txt > filesorted.txt
With the function below:
PS> PowerSort -SrcFile C:\windows\win.ini
function PowerSort {
    param(
        [string]$SrcFile = "",
        [string]$DstFile = "",
        [switch]$Force
    )
    if ($SrcFile -eq "") {
        Write-Host "USAGE: PowerSort -SrcFile (srcfile) [-DstFile (dstfile)] [-Force]"
        return 0;
    }
    else {
        $SrcFileFullPath = Resolve-Path $SrcFile -ErrorAction SilentlyContinue -ErrorVariable _frperror
        if (-not($SrcFileFullPath)) {
            throw "Source file not found: $SrcFile";
        }
    }
    [Collections.Generic.List[string]]$lines = [System.IO.File]::ReadAllLines($SrcFileFullPath)
    $lines.Sort();
    # Write sorted lines to the pipeline
    if ($DstFile -eq "") {
        foreach ($line in $lines) {
            Write-Output $line
        }
    }
    # Write sorted lines to a file
    else {
        $DstFileFullPath = Resolve-Path $DstFile -ErrorAction SilentlyContinue -ErrorVariable ev
        # Destination file doesn't exist
        if (-not($DstFileFullPath)) {
            $DstFileFullPath = $ev[0].TargetObject
        }
        # Destination exists and -Force not specified
        elseif (-not $Force) {
            throw "Destination file already exists: ${DstFile} (use the -Force flag to overwrite)"
        }
        Write-Host "Writing-File: $DstFile"
        [System.IO.File]::WriteAllLines($DstFileFullPath, $lines)
    }
    return
}

script to find given string and replace in all files in given directory

How do I write a script in PowerShell that finds a given string in all files in a given directory and changes it to a given second string?
Thanks for any help,
bye
Maybe something like this
$files = Get-ChildItem "DirectoryContainingFiles"
foreach ($file in $files)
{
$content = Get-Content -path $file.fullname
$content | foreach {$_ -replace "toreplace", "replacewith"} |
Set-Content $file.fullname
}
If the string to replace spans multiple lines then using Get-Content isn't going to cut it unless you stitch together the output of Get-Content into a single string. It's easier to use [io.file]::ReadAllText() in this case e.g.:
Get-ChildItem | Where {!$_.PSIsContainer} |
    Foreach { $txt = [IO.File]::ReadAllText($_.FullName);
              $txt = $txt -replace $old,$new;
              $txt | Out-File $_.FullName }
Note that with $old, you may need to use a regex directive like '(?s)' at the beginning to indicate that . should match newline characters as well.
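For example, a multi-line pattern might be written like this (the pattern and file name are purely illustrative):
# (?s) makes . match newline characters, so the whole block is replaced across line breaks
$old = '(?s)<!-- BEGIN -->.*?<!-- END -->'
$new = '<!-- removed -->'
$txt = [IO.File]::ReadAllText('C:\temp\example.html')
[IO.File]::WriteAllText('C:\temp\example.html', ($txt -replace $old, $new))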
I believe you can already get the list of all files in a directory (that part is simple). Now comes the replacement part. Here is how you can do it with PowerShell:
type somefile.txt | %{$_ -replace "string_to_be_replaces","new_strings"}
Modify it as per your needs. You can also redirect the output to a new file the same way you do other redirections (using >).
To get the list of files, use:
Get-ChildItem <DIR_PATH> -name
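Putting the two pieces together, one possible sketch for a whole directory (the path and the strings are placeholders) is:
Get-ChildItem 'C:\some\dir' -File | ForEach-Object {
    $file = $_.FullName
    (Get-Content $file) -replace 'string_to_be_replaced', 'new_string' |
        Set-Content $file
}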
