Powershell, download CSV into string - powershell-4.0

In PowerShell, instead of downloading a CSV file to disk from a web request, can you download it directly into a string? I often pull earthquake data from the USGS database, which outputs the data as a downloadable CSV file. The issue is that the data is pulled per day, so when pulling years of data I end up downloading a thousand or more files and then piping them together. If I could download directly into a string, I could eliminate the read/write time for a thousand or more files by doing it all in memory.
The basic portion of code used to pull the file is:
$U = $env:userprofile
$Location = "Region_Earthquakes"
$MaxLat = "72.427"
$MinLat = "50.244"
$MaxLon = "-140.625"
$MinLon = "-176.133"
$yearspulled = 1
$ts = New-TimeSpan -Days 1
$Today = Get-Date
$StartDate = [datetime](((Get-Date -format "yyyy") - $yearspulled + 1).ToString() + "-01-01 00:00:00")
While ($StartDate -le $Today)
{
    $EndDate = $StartDate + $ts
    $Start = $StartDate.ToString("yyyy-MM-dd HH:mm:ss")
    $FileDate = $StartDate.ToString("yyyy_MM_dd")
    $End = $EndDate.ToString("yyyy-MM-dd HH:mm:ss")
    $url = "http://earthquake.usgs.gov/fdsnws/event/1/query.csv?starttime=$Start&endtime=$End&minmagnitude=-1.00&maxmagnitude=10.00&maxlatitude=$MaxLat&minlatitude=$MinLat&maxlongitude=$MaxLon&minlongitude=$MinLon&eventtype=earthquake&orderby=time"
    $output = "$U\DownLoads\$Location" + "_$FileDate.csv"
    (New-Object System.Net.WebClient).DownloadFile($url, $output)
    Write-Output "Date pulled for $Start"
    $StartDate = $StartDate + $ts
}

When in doubt, read the documentation. The System.Net.WebClient class has a method DownloadString() that does exactly what you're asking.
WebClient.DownloadString Method
Namespace: System.Net
Assemblies: System.dll, netstandard.dll, System.Net.WebClient.dll, System.Net.dll
Downloads the requested resource as a String. The resource to download may be specified as either a String containing the URI or a Uri.
Emphasis mine.
$s = (New-Object System.Net.WebClient).DownloadString($url)
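Building on that, here is a minimal sketch of the question's loop reworked to stay in memory, reusing the variables from the setup above. ConvertFrom-Csv is one assumed way to combine the per-day results, since each response carries its own header row:
$wc = New-Object System.Net.WebClient
$allRows = New-Object System.Collections.Generic.List[object]
While ($StartDate -le $Today)
{
    $EndDate = $StartDate + $ts
    $Start = $StartDate.ToString("yyyy-MM-dd HH:mm:ss")
    $End = $EndDate.ToString("yyyy-MM-dd HH:mm:ss")
    $url = "http://earthquake.usgs.gov/fdsnws/event/1/query.csv?starttime=$Start&endtime=$End&minmagnitude=-1.00&maxmagnitude=10.00&maxlatitude=$MaxLat&minlatitude=$MinLat&maxlongitude=$MaxLon&minlongitude=$MinLon&eventtype=earthquake&orderby=time"
    $csvText = $wc.DownloadString($url)                 # CSV lands in a string, nothing touches disk
    $allRows.AddRange(@($csvText | ConvertFrom-Csv))    # parse each day's rows; header is consumed per download
    Write-Output "Data pulled for $Start"
    $StartDate = $StartDate + $ts
}
# One combined file at the end, if a file is still wanted:
$allRows | Export-Csv "$U\DownLoads\$Location.csv" -NoTypeInformation
On PowerShell 3.0 and later, Invoke-WebRequest or Invoke-RestMethod would work here too; WebClient is kept only to match the question's code.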

Related

OledbDataReader using up all the RAM (in powershell)

From all my reading, the OleDb DataReader does not store records in memory, but this code is maxing out the RAM. It's meant to pull data from an Oracle DB (about 10M records) and write it to a GZIP file. I have tried everything (including commenting out the gzip write) and it still ramps up the RAM until it falls over. Is there a way to just execute the reader without it staying in memory? What am I doing wrong?
$tableName='ACCOUNTS'
$fileNum=1
$gzFilename="c:\temp\gzip\$tableName.$fileNum.txt.gz"
$con=Open-Con ORA -tns $tns -userName $userName -fetchSize $fetchSize
$cmd = New-Object system.Data.OleDb.OleDbCommand($sql,$con);
$cmd.CommandTimeout = '0';
$output = New-Object System.IO.FileStream $gzFilename, ([IO.FileMode]::Create), ([IO.FileAccess]::Write), ([IO.FileShare]::None)
[System.IO.Compression.GzipStream]$gzipStream = New-Object System.IO.Compression.GzipStream $output, ([IO.Compression.CompressionMode]::Compress)
$encoding = [System.Text.Encoding]::UTF8
$reader=$cmd.ExecuteReader()
[int]$j=0
While ($reader.Read())
{
    $j++
    $str = $reader[0..$($reader.FieldCount - 1)] -join '|'
    $out = $encoding.GetBytes("$str`n")
    $gzipStream.Write($out, 0, $out.Length)
    if ($j % 10000 -eq 0) { Write-Host $j }
    if ($j % 1000000 -eq 0) {
        Write-Host 'creating new gz file'
        $gzipStream.Close()
        $gzipStream.Dispose()
        $fileNum += 1
        $gzFilename = "c:\temp\gzip\$tableName.$fileNum.txt.gz"
        $output = New-Object System.IO.FileStream $gzFilename, ([IO.FileMode]::Create), ([IO.FileAccess]::Write), ([IO.FileShare]::None)
        [System.IO.Compression.GzipStream]$gzipStream = New-Object System.IO.Compression.GzipStream $output, ([IO.Compression.CompressionMode]::Compress)
    }
}
Edit:
From the comments, [system.gc]::Collect() had no effect. Also, stripping the loop down to its simplest form and reading only a single field had no effect either. This code ramps up to 16 GB of memory (viewed in Task Manager) and then quits with an OOM error:
$con=Open-Con ORA -tns $tns -userName $userName -fetchSize $fetchSize
$cmd = New-Object system.Data.OleDb.OleDbCommand($sql,$con);
$cmd.CommandTimeout = '0';
$reader=$cmd.ExecuteReader()
[int]$j=0
While ($reader.Read())
{
    $str = $reader[0]
}
Possibly it's using up virtual address space rather than actual RAM. That's a common problem with the underlying .NET garbage collector used with (at least) the ADO.NET and string objects created here, especially if any of the records have fields with lots of text.
Building on that, it looks like you're doing most of the correct things to avoid this issue (using a DataReader, writing directly to a stream, etc.). What you could do to improve it is write to the stream one field at a time, rather than using -join to push all the fields into one string before writing, and re-use the same $out byte buffer instead of allocating a new one per row (I'm not sure exactly what that last part looks like in PowerShell or with Encoding.GetBytes(); see the sketch after the code below).
This may help, but it can still create garbage through how it concatenates the field delimiter and line terminator. If you find this runs longer but still eventually produces an error, you probably need to do the tedious work of issuing separate write operations to the gzip stream for each of those values.
$tableName = 'ACCOUNTS'
$fileNum = 1
$gzFilename = "c:\temp\gzip\$tableName.$fileNum.txt.gz"
$con = Open-Con ORA -tns $tns -userName $userName -fetchSize $fetchSize
$cmd = New-Object System.Data.OleDb.OleDbCommand($sql, $con)
$cmd.CommandTimeout = 0
$output = New-Object System.IO.FileStream $gzFilename, ([IO.FileMode]::Create), ([IO.FileAccess]::Write), ([IO.FileShare]::None)
[System.IO.Compression.GzipStream]$gzipStream = New-Object System.IO.Compression.GzipStream $output, ([IO.Compression.CompressionMode]::Compress)
$encoding = [System.Text.Encoding]::UTF8
$reader = $cmd.ExecuteReader()
[int]$j = 0
While ($reader.Read())
{
    $j++
    $fieldDelimiter = ""
    $terminator = ""
    for ($k = 0; $k -lt $reader.FieldCount; $k++) {
        if ($k -eq $reader.FieldCount - 1) { $terminator = "`n" }
        $out = $encoding.GetBytes("$fieldDelimiter$($reader[$k])$terminator")
        $gzipStream.Write($out, 0, $out.Length)
        $fieldDelimiter = "|"
    }
    if ($j % 10000 -eq 0) { Write-Host $j }
    if ($j % 1000000 -eq 0) {
        Write-Host 'creating new gz file'
        $gzipStream.Close()
        $gzipStream.Dispose()
        $fileNum += 1
        $gzFilename = "c:\temp\gzip\$tableName.$fileNum.txt.gz"
        $output = New-Object System.IO.FileStream $gzFilename, ([IO.FileMode]::Create), ([IO.FileAccess]::Write), ([IO.FileShare]::None)
        [System.IO.Compression.GzipStream]$gzipStream = New-Object System.IO.Compression.GzipStream $output, ([IO.Compression.CompressionMode]::Compress)
    }
}
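As for re-using the buffer, here is a minimal sketch of the idea (an assumption, not tested against the original database): allocate one byte array up front and let the GetBytes(string, charIndex, charCount, byte[], byteIndex) overload fill it, so no new array is allocated per field. File rotation is omitted for brevity, and the buffer size is an assumed upper bound on one encoded field.
$encoding = [System.Text.Encoding]::UTF8
$buffer = New-Object byte[] 65536            # assumed max encoded size of a single field
While ($reader.Read())
{
    for ($k = 0; $k -lt $reader.FieldCount; $k++) {
        $s = "$($reader[$k])"
        if ($k -gt 0) { $gzipStream.WriteByte([byte][char]'|') }   # delimiter written directly, no string concat
        $count = $encoding.GetBytes($s, 0, $s.Length, $buffer, 0)  # fills $buffer in place, returns byte count
        $gzipStream.Write($buffer, 0, $count)
    }
    $gzipStream.WriteByte([byte][char]"`n")                        # line terminator, also allocation-free
}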

How to save a PNG as a JPG without saving the file in the dir

I'm using FromFile to get the image out of files, and it throws the following error for the PNGs on the FromFile line:
Exception calling "FromFile" with "1" argument(s): "The given path's format is not supported."
So I'm trying to convert the PNGs to JPGs (see the convert line above FromFile below), but all the examples I see (that seem usable) save the file. I don't want to save the file in the dir. All I need is the image object, so FromFile can use it like this example. I saw ConvertTo-Jpeg, but I don't think it is a standard PowerShell module, and I don't see how to install it.
I saw this link, but I don't think that would leave the image in the format needed by FromFile.
This is my code:
$imageFile2 = Get-ChildItem -Recurse -Path $ImageFullBasePath -Include @("*.bmp","*.jpg","*.png") | Where-Object {$_.Name -match "$($pictureName)"} #$imageFile | Select-String -Pattern '$($pictureName)' -AllMatches
Write-Host $imageFile2
if ($imageFile2.Exists)
{
    if ($imageFile2 -Match "png")
    {
        $imageFile2 | .\ConvertTo-Jpeg #I don't think this will work with FromFile below
    }
    $image = [System.Drawing.Image]::FromFile($imageFile2)
}
else {
    Write-Host "$($imageFile2) does not exist"
}
And then I put it in excel:
$xlsx = $result | Export-Excel -Path $outFilePath -WorksheetName $errCode -Autosize -AutoFilter -FreezeTopRow -BoldTopRow -PassThru # -ClearSheet can't ClearSheet every time or it clears previous data ###left off
$ws = $xlsx.Workbook.Worksheets[$errCode]
$ws.Dimension.Columns #number of columns
$tempRowCount = $ws.Dimension.Rows #number of rows
#only change width of 3rd column
$ws.Column(3).Width
$ws.Column(3).Width = 100
#Change all row heights
for ($row = 2; $row -le $tempRowCount; $row++)
{
    #Write-Host $($ws.Dimension.Rows)
    #Write-Host $($row)
    $ws.Row($row).Height
    $ws.Row($row).Height = 150
    #place the image in the spreadsheet
    #https://github.com/dfinke/ImportExcel/issues/1041 https://github.com/dfinke/ImportExcel/issues/993
    $drawingName = "$($row.PictureID)_Col3_$($row)" #Name_ColumnIndex_RowIndex
    Write-Host $image
    $picture = $ws.Drawings.AddPicture("$drawingName", $image)
    $picture.SetPosition($row - 1, 0, 3 - 1, 0)
    if ($ws.Row($row).Height -lt $image.Height * (375/500)) {
        $ws.Row($row).Height = $image.Height * (375/500)
    }
    if ($ws.Column(3).Width -lt $image.Width * (17/120)) {
        $ws.Column(3).Width = $image.Width * (17/120)
    }
}
Update:
I just wanted to reiterate that FromFile couldn't be used on the PNG image here, so the Hey Scripting Guy approach of loading the image like this didn't work:
$image = [drawing.image]::FromFile($imageFile2)
I figured out that the $imageFile2 path actually held two filenames: two files met the Get-ChildItem/Where-Object/-match criteria. The images look identical but have similar names, so they are easy to process. After I split the names apart, FromFile works fine.
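For the title question, converting without writing anything to the dir, here is a minimal sketch using System.Drawing and a MemoryStream (an assumed approach, separate from the ConvertTo-Jpeg script mentioned above):
Add-Type -AssemblyName System.Drawing
$png = [System.Drawing.Image]::FromFile($imageFile2.FullName)    # a single path, per the fix above
$ms = New-Object System.IO.MemoryStream
$png.Save($ms, [System.Drawing.Imaging.ImageFormat]::Jpeg)       # JPEG bytes stay in memory
$png.Dispose()
$ms.Position = 0
$image = [System.Drawing.Image]::FromStream($ms)                 # usable wherever FromFile's result was
# GDI+ requires $ms to stay open for as long as $image is in use; dispose both when done.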

Powershell Script not outputting data into File outside of ISE

I understand that other people have had similar questions, but none are like this. I made a .ps1 script to convert a file of XML objects into a CSV file of rows representing some of that data. Last night I was able to run the batch file and convert files, but this morning it saves an empty CSV file when run from the batch file, while it works fine in PowerShell ISE.
I run it from a batch file with -STA mode to enable it to open the dialog windows:
powershell -sta C:\Users\*******\Downloads\JiraXMLtoCSV.ps1
And here is the script (it was tough to make this code block, lol; excuse the '}'):
# This function will open a file-picker for the user to select their Jira XML export
Function Get-JiraXMLFile() {
    [System.Reflection.Assembly]::LoadWithPartialName("System.Windows.Forms") | Out-Null
    $OpenFileDialog = New-Object System.Windows.Forms.OpenFileDialog
    $OpenFileDialog.initialDirectory = Get-Location
    $OpenFileDialog.filter = "XML files (*.xml)|*.xml"
    $OpenFileDialog.ShowDialog() | Out-Null
    $OpenFileDialog.filename
    $OpenFileDialog.ShowHelp = $true
}
# This function will open the file-save dialog to allow the user to choose the location and name of the converted XML-to-CSV file
Function Get-SaveFile() {
    [System.Reflection.Assembly]::LoadWithPartialName("System.Windows.Forms") | Out-Null
    $SaveFileDialog = New-Object System.Windows.Forms.SaveFileDialog
    $SaveFileDialog.initialDirectory = Get-Location
    $SaveFileDialog.filter = "CSV files (*.csv)|*.csv"
    $SaveFileDialog.ShowDialog() | Out-Null
    $SaveFileDialog.filename
    $SaveFileDialog.ShowHelp = $true
}
# Invoke the file-picker function and obtain input file
$inputFile = Get-JiraXMLFile
# Initialize a list for items that will be extracted from the XML input file
$list = @()
# Loop through all the items in the Jira XML export file
foreach ($item in $XMLFile.rss.channel.item) {
    # Create a new hash object
    $issue = @{}
    # Gather wanted attributes
    $issue.Key = $item.key.InnerXML
    $issue.StatusColor = $item.statusCategory.colorName
    $issue.Status = $item.status.InnerXML
    # Check for comments
    if ($item.comments) {
        # Record the comments with column name/header format as follows: comment #0 | comment #1 | ...
        # Change this value to 1 if you want it to start at comment #1 instead of comment #0
        $incrementalCounter = 0
        # Loop through all comments on the issue
        foreach ($comment in $item.comments.comment) {
            $issue.("comment #" + $incrementalCounter) = $comment.InnerXML
            $incrementalCounter += 1
        }
    }
    # Create an object to be added to the list
    $object = New-Object -TypeName PSObject -Prop $issue
    Write-Output $object
    # Add this issue to the list to convert/export to CSV
    $list += $object
}
# Open the file-save dialog to choose a name and location for the new CSV
$OutputFile = Get-SaveFile
$list | Export-Csv -Path $OutputFile -NoTypeInformation
And here is some sample XML, in case it helps show what I am doing wrong:
<rss version="0.92">
  <channel>
    <title>XML Export</title>
    <link>...</link>
    <description>An XML representation of a search request</description>
    <language>en-us</language>
    <issue start="0" end="7" total="7"/>
    <build-info>...</build-info>
    <item>
      <title>[AJT-46] another new story</title>
      <project id="1652" key="AJT">Advanced Training</project>
      <description/>
      <environment/>
      <key id="220774">AJT-46</key>
      <status id="16615" iconUrl="https://website.com/" description="Desc text">To Do</status>
      <statusCategory id="2" key="new" colorName="gray"/>
      <labels></labels>
      <created>Tue, 5 Jun 2018 11:25:38 -0400</created>
      <updated>Tue, 5 Jun 2018 11:29:00 -0400</updated>
      <due/>
    </item>
  </channel>
</rss>
It was working last night, and now it is not working this morning; nothing changed that I know of, and I didn't reboot. It still works in PowerShell ISE, which is fine, but I need the batch-file method for the person I am making it for. Any help, advice, etc. is appreciated! Thanks.
Changes I made, and it works now (blocks separated by blank lines):
# Invoke the file-picker function and obtain input file
[Xml]$inputFile = Get-JiraXMLFile

# Grab all the items we exported, ignore the header info
if ($inputFile) {
    #$XmlComments = Select-Xml "//comment()" -Xml $inputFile
    #$inputFile.RemoveChild($XmlComments)
    $items = Select-Xml "//rss/channel/item" -Xml $inputFile
}

# Iterate over items and grab important info to be put into CSV format
foreach ($item in $items) {
    # Create a new hash object
    $issue = @{}
    # Gather wanted attributes
    if ($item.Node.key) {
        $issue.Key = $item.Node.key.InnerXML
    }
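One detail worth hedging here: Get-JiraXMLFile returns a path string from the dialog, so if the [Xml] cast on the return value ever complains, the usual pattern is to read the file's contents explicitly and cast those (a minimal sketch, assuming the functions above are unchanged):
$xmlPath = Get-JiraXMLFile
[xml]$inputFile = Get-Content -Path $xmlPath -Raw   # parse the file's contents, not the path itself
$items = Select-Xml "//rss/channel/item" -Xml $inputFile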

formatting csv files and powershell

OK, so we have a manual process that runs through PL/SQL Developer to run a query and then export to CSV.
I am trying to automate that process using PowerShell, since we are working in a Windows environment.
I have created two files that seem to be exact duplicates from the automated and manual processes, but they don't behave the same, so I assume I am missing some hidden characters, but I can't find them or figure out how to remove them.
The most obvious example of them working differently is opening them in Excel. The manual file opens in Excel automatically, putting each column in its own separate column. The automated file instead puts everything into one column.
Can anybody shed some light? I am hoping that resolving this, or at least getting some info, will help with the bigger problem of it not processing correctly.
Thanks.
Example of the one-column file's header:
"rownum","year","month","batch","facility","transfer_facility","trans_dt","meter","ticket","trans_product","trans","shipper","customer","supplier","broker","origin","destination","quantity"
Example of the separate-columns file's header:
"","ROWNUM","RPT_YR","RPT_MO","BATCH_NBR","FACILITY_CD","TRANSFER_FACILITY_CD","TRANS_DT","METER_NBR","TKT_NBR","TRANS_PRODUCT_CD","TRANS_CD","SHIPPER_CD","CUSTOMER_NBR","SUPPLIER_NBR","BROKER_CD","ORIGIN_CD","DESTINATION_CD","NET_QTY"
$connectionstring = "Data Source=database;User Id=user;Password=password"
$connection = New-Object System.Data.OracleClient.OracleConnection($connectionstring)
$command = New-Object System.Data.OracleClient.OracleCommand($query, $connection)
$connection.Open()
Write-Host -ForegroundColor Black " Opening Oracle Connection"
Start-Sleep -Seconds 2
#Getting data from oracle
Write-Host
Write-Host -ForegroundColor Black "Getting data from Oracle"
$Oracle_data = $command.ExecuteReader()
Start-Sleep -Seconds 2
if ($Oracle_data.read()) {
    Write-Host -ForegroundColor Green "Connection Success"
    while ($Oracle_data.read()) {
        #Variables for recordset
        $rownum = $Oracle_data.GetDecimal(0)
        $rpt_yr = $Oracle_data.GetDecimal(1)
        $rpt_mo = $Oracle_data.GetDecimal(2)
        $batch_nbr = $Oracle_data.GetString(3)
        $facility_cd = $Oracle_data.GetString(4)
        $transfer_facility_cd = $Oracle_data.GetString(5)
        $trans_dt = $Oracle_data.GetDateTime(6)
        $meter_nbr = $Oracle_data.GetString(7)
        $tkt_nbr = $Oracle_data.GetString(8)
        $trans_product_cd = $Oracle_data.GetString(9)
        $trans_cd = $Oracle_data.GetString(10)
        $shipper_cd = $Oracle_data.GetString(11)
        $customer_nbr = $Oracle_data.GetString(12)
        $supplier_nbr = $Oracle_data.GetString(13)
        $broker_cd = $Oracle_data.GetString(14)
        $origin_cd = $Oracle_data.GetString(15)
        $destination_cd = $Oracle_data.GetString(16)
        $net_qty = $Oracle_data.GetDecimal(17)
        #Define new file
        $filename = "Pipeline" #Get-Date -UFormat "%b%Y"
        $filename = $filename + ".csv"
        $fileLocation = $newdir + "\" + $filename
        $fileExists = Test-Path $fileLocation
        #Create object to hold record
        $obj = New-Object psobject -Prop @{
            rownum = $rownum
            year = $rpt_yr
            month = $rpt_mo
            batch = $batch_nbr
            facility = $facility_cd
            transfer_facility = $transfer_facility_cd
            trans_dt = $trans_dt
            meter = $meter_nbr
            ticket = $tkt_nbr
            trans_product = $trans_product_cd
            trans = $trans_cd
            shipper = $shipper_cd
            customer = $customer_nbr
            supplier = $supplier_nbr
            broker = $broker_cd
            origin = $origin_cd
            destination = $destination_cd
            quantity = $net_qty
        }
        $records += $obj
    }
} else {
    Write-Host -ForegroundColor Red " Connection Failed"
}
#Write records to file with headers
$records | Select-Object rownum,year,month,batch,facility,transfer_facility,trans_dt,meter,ticket,trans_product,trans,shipper,customer,supplier,broker,origin,destination,quantity |
    ConvertTo-Csv |
    Select -Skip 1 |
    Out-File $fileLocation
Why are you skipping the first row (usually the headers)? Also, try using Export-Csv instead:
#Write records to file with headers
$records | Select-Object rownum, year, month, batch, facility, transfer_facility, trans_dt, meter, ticket, trans_product, trans, shipper, customer, supplier, broker, origin, destination, quantity |
Export-Csv $fileLocation -NoTypeInformation
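For very large result sets, a hedged variant (an assumption, not tested against this database) is to stream each row straight down the pipeline instead of growing $records with +=, which copies the whole array on every add. Note also that the original calls .read() once in the if and again in the while, so the first record is silently dropped; a single loop avoids that.
$Oracle_data = $command.ExecuteReader()
& {
    while ($Oracle_data.Read()) {
        # Emit one record per row; the pipeline carries it to Export-Csv immediately
        [pscustomobject]@{
            rownum            = $Oracle_data.GetDecimal(0)
            year              = $Oracle_data.GetDecimal(1)
            month             = $Oracle_data.GetDecimal(2)
            batch             = $Oracle_data.GetString(3)
            facility          = $Oracle_data.GetString(4)
            transfer_facility = $Oracle_data.GetString(5)
            trans_dt          = $Oracle_data.GetDateTime(6)
            meter             = $Oracle_data.GetString(7)
            ticket            = $Oracle_data.GetString(8)
            trans_product     = $Oracle_data.GetString(9)
            trans             = $Oracle_data.GetString(10)
            shipper           = $Oracle_data.GetString(11)
            customer          = $Oracle_data.GetString(12)
            supplier          = $Oracle_data.GetString(13)
            broker            = $Oracle_data.GetString(14)
            origin            = $Oracle_data.GetString(15)
            destination       = $Oracle_data.GetString(16)
            quantity          = $Oracle_data.GetDecimal(17)
        }
    }
} | Export-Csv $fileLocation -NoTypeInformation
Since [pscustomobject] preserves property order, the Select-Object step becomes unnecessary.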

Windows Script to consolidate files

I have to work with a huge number of text files. I am able to consolidate the files into one single file, but I also need the file name in my work, and I would like to have it before the text of the file itself in Excel format: preferably the first column should contain the names of the files, and the columns afterwards can contain the data.
Any help would be appreciated. Thanks.
Here's the PowerShell script. You might need to modify it a bit to look for specific file extensions, as right now it's only looking for .ps1 files:
[System.Threading.Thread]::CurrentThread.CurrentCulture = New-Object System.Globalization.CultureInfo("en-US")
$excel = new-Object -comobject Excel.Application
$excel.visible = $false
$workBook = $excel.Workbooks.Add()
$sheet = $workBook.Sheets.Item(1)
$sheet.Name = "Files"
$sheet.Range("A1", "B1").Font.Bold = $true
$sheet.Range("A1","A2").ColumnWidth = 40
$sheet.Range("B1","B2").ColumnWidth = 100
$sheet.Cells.Item(1,1) = "Filename"
$sheet.cells.Item(1,2) = "Content"
$files = Get-ChildItem C:\PST -Recurse | Where-Object {$_.Extension -eq ".ps1"}
$index = 2
foreach ($file in $files)
{
    $sheet.Cells.Item($index,1) = $file.FullName
    $sheet.Cells.Item($index,2) = [System.IO.File]::ReadAllText($file.FullName)
    $index++
}
$workBook.SaveAs("C:\PST\1.xlsx")
$excel.Quit()
Note: I'm not pretending it's perfect; you still need to polish and refactor it, but at least it will give you a direction.
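Part of that polishing is usually releasing the COM objects, since a hidden Excel.exe tends to linger after the script ends. A minimal cleanup sketch (an assumed addition, not part of the original answer), placed after $excel.Quit():
[void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($sheet)     # release in reverse order of creation
[void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($workBook)
[void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($excel)
[GC]::Collect()
[GC]::WaitForPendingFinalizers()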
