Well, as the title states, I made a PowerShell script whose performance was so horrible that it overextended the server resources and crashed it.
The script reads an entire .xml file and appends text at the beginning and at the end of the file. It also renames the file according to what is in my filename.txt.
The .xml files are around 500 MB in size and have over 4.7 million rows. Is there a way to avoid reading the entire file without losing information?
function start-jobhere([scriptblock]$block){
start-job -argumentlist (get-location),$block { set-location $args[0]; invoke-expression $args[1] }
}
$handler_button1_Click= {
Try{
$job3 = start-jobhere {
#Text that should be at filebeginning
@('<?xml version="1.0" encoding="UTF-8"?>
<ids:ControlInfo>
<ids:ObjectFormat>CSV</ids:ObjectFormat>
<ids:SeparatorForCSV>;</ids:SeparatorForCSV>
</ids:ControlInfo>
<ids:BatchDeltaUntil></ids:BatchDeltaUntil>
</ids:BatchInfo>
</ids:Header>
<ids:Body>'
) + (get-content ZUB_Lokalisation.xml) | set-content ZUB_Lokalisation.xml
#Text that should be at file end
Add-Content ZUB_Lokalisation.xml -Value "</ids:Body>`n</ids:SimpleOperation>"
#Information that goes into the header of the file but has to be extracted from the filename inside a .txt
$filename = Select-String filename.txt -Pattern "Lokalisation"
$nameoffile = [System.IO.Path]::GetFileName($filename)
$split = $nameoffile.split('_')
$finalid = $split[5]
$content = Get-Content ZUB_Lokalisation.xml
$content[8] = ' <ids:BatchInfo ids:BatchID="{0}">' -f $finalid
$content | Set-Content ZUB_Lokalisation.xml
#Rename the file
Rename-Item ZUB_Lokalisation.xml -NewName $filename}
}catch [System.Exception]{
[System.Windows.Forms.MessageBox]::Show("ZUB_LOK_ERROR", "ERROR")}
}
Get-Job | Wait-Job | Where State -eq "Running"
}
Create files containing the start and end fragments that you want.
Then run this in a DOS window or batch file:
COPY StartFile.TXT + YourXMLFile.TXT + EndFile.TXT OutputFile.TXT
This sticks the three files together and saves the result as OutputFile.TXT.
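If you would rather stay inside PowerShell, the same idea can be done with .NET streams, which copy the big file through a small buffer instead of loading all 500 MB into memory the way Get-Content does. This is a minimal sketch, not the answerer's code; the function name Join-FilesStreaming is hypothetical, and it assumes the start and end fragments already exist as separate files.

```powershell
# Hypothetical helper: concatenate files byte for byte into a new file, streaming each input.
function Join-FilesStreaming {
    param(
        [string[]]$InputPaths,   # files to concatenate, in order (start fragment, XML, end fragment)
        [string]$OutputPath      # destination file; overwritten if it already exists
    )
    # .NET resolves relative paths against the process directory, not the PowerShell location,
    # so convert the output path to a full file-system path first.
    $fullOut = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($OutputPath)
    $out = [System.IO.File]::Create($fullOut)
    try {
        foreach ($path in $InputPaths) {
            $fullIn = (Resolve-Path -LiteralPath $path).ProviderPath
            $src = [System.IO.File]::OpenRead($fullIn)
            try {
                # Stream.CopyTo copies through a fixed-size buffer, so memory use stays small
                $src.CopyTo($out)
            }
            finally {
                $src.Dispose()
            }
        }
    }
    finally {
        $out.Dispose()
    }
}
```

Usage, with the file names from the question as placeholders: `Join-FilesStreaming -InputPaths .\StartFile.txt, .\ZUB_Lokalisation.xml, .\EndFile.txt -OutputPath .\Output.xml`. Because the bytes are copied verbatim, save the fragments without a byte order mark (for example with `[System.IO.File]::WriteAllText`, which writes UTF-8 without a BOM); otherwise the BOM bytes end up in the middle of the output.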
Related
In Windows, how can I batch-convert base64-encoded file names in a folder back to their original names, assuming every file name in the folder is base64-encoded?
You can do this by iterating over the files and trying to decode the base64 base name of each one. If that succeeds, rename the file.
Get-ChildItem -Path 'TheFolderWhereTheFilesAre' -File | ForEach-Object {
# store the file name for when we hit the catch block
$file = $_.FullName
try {
$newBase = [System.Text.Encoding]::Default.GetString([System.Convert]::FromBase64String($_.BaseName))
$_ | Rename-Item -NewName ('{0}{1}' -f $newBase, $_.Extension) -ErrorAction Stop
}
catch {
Write-Warning "Error renaming file '$file':`r`n$($_.Exception.Message)"
}
}
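The decode step can also be pulled out into a small helper that returns `$null` instead of throwing when a name is not valid base64, which makes it easy to test on its own. This is a sketch; the name ConvertFrom-Base64Name is hypothetical, and unlike the answer above (which uses `[System.Text.Encoding]::Default`) it assumes the names were encoded from UTF-8 text.

```powershell
# Hypothetical helper: decode a base64 base name, or return $null if it is not valid base64.
# Assumes the original names were UTF-8 before encoding.
function ConvertFrom-Base64Name {
    param([string]$BaseName)
    try {
        $bytes = [System.Convert]::FromBase64String($BaseName)
        return [System.Text.Encoding]::UTF8.GetString($bytes)
    }
    catch {
        # FromBase64String throws on invalid input; treat that as "not encoded"
        return $null
    }
}
```

Inside the loop above you would then skip files for which the helper returns `$null` and rename the rest.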
I have a bunch of files in folder A and their corresponding metadata files in folder B. I want to loop through the data files and check whether their columns match those in the metadata file (incoming data files can have new columns added at any position without notice). If the columns in both files match, no action is to be taken. If the data file has more columns than the metadata file, the extra columns should be deleted from the incoming data file. Any help would be appreciated. Thanks!
The data file is ps_job.txt:
“empid”|”name”|”deptid”|”zipcode”|”salary”|”gender”
“1”|”Tom”|”10″|”11111″|”1000″|”M”
“2”|”Ann”|”20″|”22222″|”2000″|”F”
The metadata file is ps_job_metadata.dat:
“empid”|”name”|”zipcode”|”salary”
I would like my output to be:
“empid”|”name”|”zipcode”|”salary”
“1”|”Tom”|”11111″|”1000″
“2”|”Ann”|”22222″|”2000″
That's a seemingly simple question with a very complicated answer. However, I've broken the code down into what you will need to do. Here are the steps that need to happen for PowerShell to do everything you're asking of it:
Read the .dat file
Save the .dat data into an object
Read the .txt file
Save the .txt header into an object
Check for the differences
Delete the old text file (that had too many columns)
Create a new text file with the new columns
I've made some assumptions in how this looks. However, with the way I've structured the code, it should be easy enough to make modifications as necessary if my assumptions are wrong. Here are my assumptions:
The text file will always have all of the columns that the DAT file has (even though it will sometimes have more)
The DAT file is structured like a text file and can be imported directly into PowerShell.
And here is the code, with comments. I've done my best to explain the purpose of each section, but I've written it with the expectation that you have a basic knowledge of PowerShell, especially arrays. If you have questions I'll do my best to answer, though I'd ask that you refer to the section of code you have questions about.
###
### The paths. I'm sure you will have multiples of each file. However, I didn't want to attempt to pull in
### the files with this sample code as it can vary so much in your environment.
###
$dat = "C:\StackOverflow\thingy.dat"
$txt = "C:\stackoverflow\ps_job.txt"
###
### This is the section to process the DAT file
###
# This will read the file and put it in a variable
$dat_raw = get-content -Path $dat
# Now, let's separate out the punctuation and give us our object
$dat_array = $dat_raw.split("|")
$dat_object = @()
foreach ($thing in $dat_array)
{
$dat_object+=$thing.Replace("""","")
}
###
### This is the section to process the TXT file
###
# This will read the file and put it into a variable
$txt_raw = get-content -Path $txt
# Now, let's separate out the punctuation and give us our object
$txt_header_array = $txt_raw[0].split("|")
$txt_header_object = @()
foreach ($thing in $txt_header_array)
{
$txt_header_object += $thing.Replace("""","")
}
###
### Now, let's figure out which columns we're eliminating (if any)
###
$x = 0
$total = $txt_header_object.count
$to_keep = @()
While ($x -lt $total)
{
if ($dat_object -contains $txt_header_object[$x])
{
$to_keep += $x
}
$x++
}
### Now that we know which columns to keep, we can apply the changes to each line of the text file.
### We will save each line to a new variable. Then, once we have the new variable, we will replace
### the existing file with a new file that has only the data we want. Note: we will only run this
### code if there's a difference in the files.
if ($total -ne $to_keep.count)
{
### This first section will go line by line and 'fix' the number of columns
$new_text_file = @()
foreach ($line in $txt_raw)
{
if ($line.Length -gt 0)
{
# Blank out the array each time
$line_array = @()
foreach ($number in $to_keep)
{
$line_array += ($line.split("|"))[$number]
}
$new_text_file += $line_array -join "|"
}
else
{
$new_text_file +=""
}
}
### This second section will delete the original file and replace it with our good
### file that has been created.
Remove-item -Path $txt
$new_text_file | out-file -FilePath $txt
}
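The header comparison and per-line filtering above can also be packaged as one function that works on lines already in memory, which makes it easy to test before pointing it at real files. This is a sketch of the same index-based approach, not a drop-in replacement; the function name Select-DelimitedColumns is hypothetical, and it assumes straight double quotes (the curly quotes in the question would also need stripping) and that no field contains a `|`.

```powershell
# Hypothetical function: keep only the columns whose header names appear in $KeepColumns.
function Select-DelimitedColumns {
    param(
        [string[]]$Lines,          # data file lines; the first line is the header
        [string[]]$KeepColumns     # column names from the metadata file, quotes already removed
    )
    # @() keeps $header an array even when there is only one column
    $header = @($Lines[0].Split('|') | ForEach-Object { $_.Replace('"', '') })
    # Indexes of header columns that also appear in the metadata, in data-file order
    $keep = @(for ($i = 0; $i -lt $header.Count; $i++) {
        if ($KeepColumns -contains $header[$i]) { $i }
    })
    foreach ($line in $Lines) {
        if ($line.Length -eq 0) { ''; continue }   # preserve blank lines, as the answer does
        $fields = $line.Split('|')
        ($keep | ForEach-Object { $fields[$_] }) -join '|'
    }
}
```

Usage: `Select-DelimitedColumns -Lines (Get-Content $txt) -KeepColumns $dat_object | Set-Content $txt`. Splitting each line once, rather than once per kept column, also saves work on large files.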
This small example can be a start for your solution :
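$ps_job = Import-Csv D:\ps_job.txt -Delimiter '|'
# Column names of the data file, taken from the first imported row
$column = ($ps_job | Select-Object -First 1).psobject.Properties.Name
$ps_job_metadata = (Get-Content D:\ps_job_metadata.txt) -split '\|' -replace '"'
foreach( $d in (Compare-Object $column $ps_job_metadata))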
{
if($d.SideIndicator -eq '<=')
{
$ps_job | %{ $_.psobject.Properties.Remove($d.InputObject) }
}
}
$ps_job | Export-Csv -Path D:\output.txt -Delimiter '|' -NoTypeInformation
I tried this and it works.
$outputFile = "C:\Script_test\ps_job_mod.dat"
$sample = Import-Csv -Path "C:\Script_test\ps_job.dat" -Delimiter '|'
$metadataLine = Get-Content -Path "C:\Script_test\ps_job_metadata.txt" -First 1
$desiredColumns = $metadataLine.Split("|").Replace("`"","")
$sample | select $desiredColumns | Export-Csv $outputFile -Encoding UTF8 -NoTypeInformation -Delimiter '|'
Please note that the smart quotes are inconsistent across the rows and there are empty lines between the rows (I highly recommend reformatting/updating your question).
Anyway, as long as the quoting of the header is consistent between the two files (ps_job.txt and ps_job_metadata.dat):
# $JobTxt = Get-Content .\ps_job.txt
$JobTxt = @'
“empid”|”name”|”deptid”|”zipcode”|”salary”|”gender”
“1”|”Tom”|”10″|”11111″|”1000″|”M”
“2”|”Ann”|”20″|”22222″|”2000″|”F”
'@
# $MetaDataTxt = Get-Content .\ps_job_metadata.dat
$MetaDataTxt = @'
“empid”|”name”|”zipcode”|”salary”
'@
$Job = ConvertFrom-Csv -Delimiter '|' $JobTxt
$MetaData = ConvertFrom-Csv -Delimiter '|' (@($MetaDataTxt) + 'x|')
$Job | Select-Object $MetaData.PSObject.Properties.Name
“empid” ”name” ”zipcode” ”salary”
------- ------ --------- --------
“1” ”Tom” ”11111″ ”1000″
“2” ”Ann” ”22222″ ”2000″
Here's the same answer I posted to your question on PowerShell.org:
$jobfile = "ps_job.dat"
$metafile = "ps_job_metadata.dat"
$outputfile = "some_file.csv"
$meta = ((Get-Content $metafile -First 1 -Encoding UTF8) -split '\|')
Class ColumnSelector : System.Collections.Specialized.OrderedDictionary {
Select($line,$meta)
{
$meta | foreach{$this.add($_,(iex "`$line.$_"))}
}
ColumnSelector($line,$meta)
{
$this.select($line,$meta)
}
}
import-csv $jobfile -Delimiter '|' |
foreach{[pscustomobject]([columnselector]::new($_,$meta))} |
Export-CSV $outputfile -Encoding UTF8 -NoTypeInformation -Delimiter '|'
Output
PS C:\>Get-Content $outputfile
"empid"|"name"|"zipcode"|"salary"
"1"|"Tom"|"11111"|"1000"
"2"|"Ann"|"22222"|"2000"
Provided you want to keep those curly quotes, and your code page and console font support all the characters, you can do the following:
# Create array of properties delimited by |
$headers = (Get-Content .\ps_job_metadata.dat -Encoding UTF8) -split '\|'
Import-Csv ps_job.dat -Delimiter '|' -Encoding utf8 | Select-Object $headers
Is there a way to convert HTML to plain text?
I have a script that exports the licenses of all NuGet packages used in a Visual Studio project to a text file.
Unfortunately, the exported licenses are mostly HTML, and I have found no way to solve this.
# Run in Package Manager Console with `./download-packages-license.ps1`.
# If access denied, execute `Set-ExecutionPolicy -Scope Process -ExecutionPolicy RemoteSigned`.
# Save licenses to one text file and one csv file instead of individual files
$LicensesFile = (Join-Path (pwd) 'licenses\Licenses.txt')
$LicensesFile_csv = (Join-Path (pwd) 'licenses\Licenses.csv')
$results = @()
# Comment out the 2 lines below if you uncomment the Split-Path line
$solutionFile = "d:\Solutions\SolFile.sln"
cd "d:\Solutions"
# Uncomment the line below if you want to use it instead of the 2 lines above
# Split-Path -parent $dte.Solution.FileName | cd;
New-Item -ItemType Directory -Force -Path ".\licenses";
@( Get-Project -All | ? { $_.ProjectName } | % {
Get-Package -ProjectName $_.ProjectName | ? { $_.LicenseUrl }
} ) | Sort-Object Id -Unique | % {
$pkg = $_;
Try
{
if ($pkg.Id -notlike 'microsoft*' -and $pkg.LicenseUrl.StartsWith('http'))
{
Write-Host ("Download license for package " + $pkg.Id + " from " + $pkg.LicenseUrl);
#Write-Host (ConvertTo-Json ($pkg));
$licenseUrl = $pkg.LicenseUrl
if ($licenseUrl.contains('github.com')) {
$licenseUrl = $licenseUrl.replace("/blob/", "/raw/")
}
$extension = ".txt"
if ($licenseUrl.EndsWith(".md"))
{
$extension = ".md"
}
(New-Object System.Net.WebClient).DownloadFile($licenseUrl, (Join-Path (pwd) 'licenses\') + $pkg.Id + $extension);
$licenseText = get-content "$((Join-Path (pwd) 'licenses\') + $pkg.Id + $extension)"
Remove-Item $((Join-Path (pwd) 'licenses\') + $pkg.Id + $extension) -ErrorAction SilentlyContinue -Force
$data = '' | select PkgId, LicenseText
$data.PkgId = $pkg.Id
$data.LicenseText = $licenseText | Out-String
$results += $data
# save in txt file
"Designation: NugetPackage $($pkg.Id)" | Add-Content $LicensesFile
$licenseText | Add-Content $LicensesFile
"" | Add-Content $LicensesFile
"" | Add-Content $LicensesFile
"" | Add-Content $LicensesFile
"" | Add-Content $LicensesFile
Write-Host "Package $($pkg.Id): License Text saved to $LicensesFile" -ForegroundColor Green
}
}
Catch [system.exception]
{
Write-Host ("Could not download license for " + $pkg.Id)
}
}
# save in .csv file
$results | Export-Csv $LicensesFile_csv -nti
Source of the Script here
A user also said: "Unfortunately, most license URLs now point to HTML-only versions (early 2020). For example, licenses.nuget.org ignores any 'Accept: text/plain' (or JSON) headers and returns HTML regardless."
So is there even a way to get the license information in plain text?
Thanks and stay healthy!
So is there even a way to get the license information in plain text?
Actually, we do not recommend converting the HTML into plain text. The license data you get from nuget.org is returned by the site as full HTML, by design.
The returned data also uses various formats for the license field, so we should not casually convert it into another format (such as plain text). If it were possible, the only way to do it would be to strip the HTML formatting from the source data, but that is not possible with PowerShell and cannot be done so far.
Therefore, to stay faithful to the format of the returned data, it is best to save the license information to an HTML file, which keeps it consistent with the website.
Suggestion
1) Change these lines in the PowerShell script:
$LicensesFile = (Join-Path (pwd) 'licenses\Licenses.html')
$LicensesFile_csv = (Join-Path (pwd) 'licenses\Licenses_csv.html')
And then you can get what you want.
Hope it helps.
I am not sure if this is possible. I want to add the file name to the end of each line of a text file.
Assume I have text files Sam_NEW.txt, Tom_New.txt, Robin_New.txt, etc., each containing the following lines:
test1.rar
test2.rar
test3.rar
I want the output to be:
copy "C:\test1.rar" "F:\Sam_NEW\"
copy "C:\test2.rar" "F:\Sam_NEW\"
copy "C:\test3.rar" "F:\Sam_NEW\"
copy "C:\test1.rar" "F:\Tom_New\"
copy "C:\test2.rar" "F:\Tom_New\"
copy "C:\test3.rar" "F:\Tom_New\"
copy "C:\test1.rar" "F:\Robin_New\"
copy "C:\test2.rar" "F:\Robin_New\"
copy "C:\test3.rar" "F:\Robin_New\"
and then save the text files. English is not my first language; here is an image of what I am trying to do:
https://i.imgur.com/V2VTHa4.png
Here is the replace code I have so far:
(Get-Content C:\temp\*.txt) -creplace '^', '"C:\' | Set-Content C:\temp\*.txt
(Get-Content C:\temp\*.txt) -creplace '$', '"F:\HOW TO add here filename \"' | Set-Content C:\temp\*.txt
I am stuck on the last part: how do I add the file name for the destination folder?
You'll want something like this:
$item = get-item -path "C:\temp\test.txt"
$lines = get-content -path $item.fullname
$newfileoutput = @()
foreach ($line in $lines){
$newfileoutput += 'copy "C:\' + $line + '" "F:\' + $item.basename + '\"'
}
$newfileoutput | set-content $item.fullname
But I can only encourage you to deepen your knowledge of simple cmdlets like get-item, get-content and the like. I don't have the impression that you understand the code you're writing. Sometimes less code (and more pipelining) makes things more complicated. Try to write code that you understand.
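The snippet above handles a single file. A minimal sketch of running the same transformation over every .txt file in a folder follows; the function name Convert-ListToCopyCommands is hypothetical, the C:\ and F:\ prefixes are taken from the question, and blank lines are skipped as an assumption.

```powershell
# Hypothetical function: rewrite each .txt file in $Folder into copy commands
# that target a folder named after the file (e.g. Sam_NEW.txt -> F:\Sam_NEW\).
function Convert-ListToCopyCommands {
    param([string]$Folder)
    # Parentheses collect the file list first, so rewriting files cannot disturb the enumeration
    (Get-ChildItem -Path $Folder -Filter *.txt -File) | ForEach-Object {
        $item = $_
        $lines = @(Get-Content -LiteralPath $item.FullName)
        $newLines = foreach ($line in $lines) {
            if ($line.Trim().Length -eq 0) { continue }   # skip blank lines
            'copy "C:\{0}" "F:\{1}\"' -f $line.Trim(), $item.BaseName
        }
        Set-Content -LiteralPath $item.FullName -Value $newLines
    }
}
```

Usage: `Convert-ListToCopyCommands -Folder C:\temp`. Run it only once per file, since it overwrites each file with the generated commands.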
I don't know that this code will do exactly what you're looking for, but I've tried to write it in a clear way with lots of explanation. Hopefully the techniques and cmdlets in here are helpful to you.
$RarFolderPath = 'C:\Temp'
$RarFileNames = Get-ChildItem -Path $RarFolderPath -Filter *.rar | Select-Object -ExpandProperty Name
$NewFolderPaths = Get-ChildItem -Path F:\ -Directory | Select-Object -ExpandProperty FullName
foreach( $NewFolderPath in $NewFolderPaths )
{
foreach( $RarFile in $RarFileNames )
{
# EXAMPLE: C:\Temp\test1.rar
$RarFilePath = Join-Path -Path $RarFolderPath -ChildPath $RarFile
# EXAMPLE: Sam_New.txt
$NewFileName = (Split-Path $NewFolderPath -Leaf) + '.txt'
# EXAMPLE: F:\Sam_NEW\Sam_NEW.txt
$NewFilePath = Join-Path -Path $NewFolderPath -ChildPath ($NewFileName)
# This is the string that will be saved in the .txt file
# EXAMPLE: copy "C:\Temp\test1.rar" "F:\Sam_NEW\"
$StringToOutput = 'copy "' + $RarFilePath + '" "' + $NewFolderPath + '\"'
# Append that string to the file:
Add-Content -Value $StringToOutput -Path $NewFilePath
}
}
With PowerShell, I'm trying to split a text file into multiple files, using the beginning of each line as the delimiter.
Input file (transfer.txt):
3M|9935551876|11.99|2235641|001|1|100|N|780
3M|1135741031|13.99|8735559|003|1|100|N|145
3M|5835551001|20.50|4556481|002|1|100|N|222
3M|4578420001|33.00|1125785|001|1|100|N|652
8L|00811444243|134148|4064080040|1|02/05/2017 21:15:13|8|170502707|19.85
8L|00811444243|130925|4189133003|1|02/05/2017 21:15:13|8|170502707|4.69
8L|00811444243|136513|4186144003|2|02/05/2017 21:15:13|8|170502707|10.83
Output file (Article.txt):
3M|9935551876|11.99|2235641|001|1|100|N|780
3M|1135741031|13.99|8735559|003|1|100|N|145
3M|5835551001|20.50|4556481|002|1|100|N|222
3M|4578420001|33.00|1125785|001|1|100|N|652
Here's a snippet of my code:
$Path = "D:\BATCH\"
$InputFile = (Join-Path $Path "transfer.txt")
$Reader = New-Object System.IO.StreamReader($InputFile)
while (($Line = $Reader.ReadLine()) -ne $null) {
if ($Line.StartsWith("3M")) {
$OutputFile = "Article.txt"
}
Add-Content (Join-Path $Path $OutputFile) $Line
}
As a result, this creates a file identical to the input file. What's wrong with the code?
The line below is the problem. It is outside the if block, so it adds every line to the output file. As I understand it, that is not what you want: you want only the lines that pass the if condition to be added to the output file. Hence, it needs to be inside the if block.
Add-Content (Join-Path $Path $OutputFile) $Line
That said, I am not too fond of this approach, because it performs as many disk I/O operations as there are lines that pass the if condition, which does not scale well.
You can change your code to something like this to reduce the number of disk writes to just one:
$out = While (($Line = $Reader.ReadLine()) -ne $null) {
If ($Line.StartsWith("3M")) {
$Line
}
}
$Reader.Close()
$OutputFile = "Article.txt"
Add-Content (Join-Path $Path $OutputFile) $Out
As others have already pointed out, you never change the output file to anything different from "Article.txt", and you write all input lines to the defined output file.
If you want to write the lines of the input file to different files depending on the value of the first field, I'd recommend naming the output files after that value. And since you're writing the output with Add-Content, I'd also suggest reading the input file via Get-Content for simplicity. Use a StreamReader when performance is an issue (in which case you'll want to use a StreamWriter too), but not just because.
Get-Content $InputFile | ForEach-Object {
$basename, $null = $_.Split('|', 2)
Add-Content (Join-Path $Path "${basename}.txt") $_
}
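For the case where performance does matter, here is a sketch of the StreamReader/StreamWriter variant mentioned above: it keeps one open writer per first-field value instead of reopening the output file for every line. The function name Split-ByFirstField is hypothetical; unlike Add-Content it overwrites existing output files, and it assumes the output folder exists and the first field is a valid file name.

```powershell
# Hypothetical function: split a pipe-delimited file into <first field>.txt files,
# reading with one StreamReader and writing with one StreamWriter per output file.
function Split-ByFirstField {
    param(
        [string]$InputFile,   # e.g. D:\BATCH\transfer.txt
        [string]$OutputDir    # existing folder that receives 3M.txt, 8L.txt, ...
    )
    # .NET resolves relative paths against the process directory, so use full paths
    $InputFile = (Resolve-Path -LiteralPath $InputFile).ProviderPath
    $OutputDir = (Resolve-Path -LiteralPath $OutputDir).ProviderPath
    $writers = @{}
    $reader = [System.IO.StreamReader]::new($InputFile)
    try {
        while ($null -ne ($line = $reader.ReadLine())) {
            if ($line.Length -eq 0) { continue }   # skip blank lines
            $key = $line.Split('|')[0]
            if (-not $writers.ContainsKey($key)) {
                # StreamWriter(path) creates or overwrites the file
                $writers[$key] = [System.IO.StreamWriter]::new((Join-Path $OutputDir "$key.txt"))
            }
            $writers[$key].WriteLine($line)
        }
    }
    finally {
        $reader.Dispose()
        # Disposing each writer flushes its buffer to disk
        foreach ($w in $writers.Values) { $w.Dispose() }
    }
}
```

Usage: `Split-ByFirstField -InputFile D:\BATCH\transfer.txt -OutputDir D:\BATCH`.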