Transform text file : how to? - etl

I have a comma-separated text file with 85 fields.
I need to produce a different text file with only 25 of those fields.
The first option I thought of is to import the file into a database, then re-export only the fields I need.
What other tools or options can I use?
Is there a command-line tool (Windows) that can transform a text file without going through a database?
Thanks

You can use a PowerShell script (Microsoft). See also https://technet.microsoft.com/de-de/library/ee176874.aspx
Create a text file with the following content:
Name,Department,Title
Pilar Ackerman,Research,Manager
Jonathan Haas,Finance,Finance Specialist
Ken Myer,Finance,Accountant
The following command reads all the information and keeps only the rows where the department is Finance:
Import-Csv c:\temp\test.txt | Where-Object {$_.department -eq "Finance"}
Name Department Title
---- ---------- -----
Jonathan Haas Finance Finance Specialist
Ken Myer Finance Accountant
This command accesses specific columns:
Import-Csv -Delimiter (",") -Header "Name","Department","Title" -Path c:\temp\test.txt | SELECT {$_.Title + " " + $_.Name + " " + $_.Department }
$_.Title + " " + $_.Name + " " + $_.Department
-----------------------------------------------
Title Name Department
Manager Pilar Ackerman Research
Finance Specialist Jonathan Haas Finance
Accountant Ken Myer Finance
You can, of course, also save the result in a new file.
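As a minimal sketch of that last step, the pipeline below keeps a subset of columns and writes them to a new file. The file names and the temp-directory location are hypothetical, and the sample data is the three-row file from above; for the 85-field file you would list the 25 column names you need after Select-Object.

```powershell
# Hypothetical paths in the temp directory; adjust to your own files.
$inPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'people.csv'
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'people_subset.csv'

# Sample input with a header row (same data as in the answer above).
@'
Name,Department,Title
Pilar Ackerman,Research,Manager
Jonathan Haas,Finance,Finance Specialist
Ken Myer,Finance,Accountant
'@ | Set-Content -Path $inPath

# Keep only the columns you need, in the order you want, and write a new CSV.
Import-Csv -Path $inPath |
    Select-Object -Property Name, Title |
    Export-Csv -Path $outPath -NoTypeInformation

# Show the resulting file.
Get-Content -Path $outPath
```

If the source file has no header row, Import-Csv needs -Header with a name for each of the input fields so that Select-Object has names to pick from.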

Related

Export text output into csv format ready for insert into databases using Powershell

I wish to pipe aws cli output which appears on my screen as text output from a powershell session into a text file in csv format.
I have researched the Export-CSV cmdlet from articles such as the below:
https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/export-csv?view=powershell-7.1
I cannot see how to use this to help me with my goal. From my testing, it only seems to work with specific windows programs, not general text output.
An article on this site shows how you can achieve my goal with unix commands, by replacing spaces with commas.
Output AWS CLI command with filters to CSV without jq
The answer with unix is to use sed at the end of the command like so:
aws rds describe-db-instance-automated-backups --query 'DBInstanceAutomatedBackups[*].{ARN:DBInstanceArn,EarliestTime:RestoreWindow.EarliestTime,LatestTime:RestoreWindow.LatestTime}' --output text | sed -E 's/\s+/,/g'
Export-Csv appears not to be able to do this.
Does anyone know how I might replicate what sed is doing here with powershell?
Here is an example of the output that I would like in csv format:
arn:aws:rds:ap-southwest-2:9711387875370:db:catflow--prod 2019-03-03T09:54:29.402Z 2019-03-05T01:25:53Z
arn:aws:rds:ap-southwest-2:9711387875370:db:xyz-prod-rds-golf 2019-03-01T09:04:31.477Z 2019-03-05T01:28:40Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-asm-prod-rds-stardb 2019-02-01T09:07:30.648Z 2019-03-05T01:27
:20Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-asm-prod-rds-domaindb 2019-02-02T09:04:30.771Z 2019-03-05T01:28
:40Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-ctz-prod-rds-datavault 2019-02-26T14:14:30.254Z 2019-03-05T01:29
:13Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-gcp-prod-rds-datavault 2019-02-01T14:05:40.456Z 2019-03-05T01:31
:05Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-conformed-datavault-prod 2019-02-02T14:06:26.050Z 2019-03-
05T01:27:02Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-dqm-datavault-prod 2019-02-01T14:12:05.286Z 2019-03-05T01:26
:53Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-prod-dgc-cde-lineage 2019-03-02T09:54:29.053Z 2019-03-05T01:29
:11Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-rec-prod 2019-02-02T22:09:00.673Z 2019-03-05T01:29:40Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-serve-prod 2019-03-02T09:54:20.729Z 2019-03-05T01:30:21Z
It's possible that you are working with a tab delimited text file, with no headers.
The tab separator can look like multiple spaces when it is displayed on your screen.
If so, you can read this file with Import-Csv, but you have to use the -Header parameter to supply your own field names and the -Delimiter parameter to specify tab as the delimiter. The tab character has to be written using the backtick escape mechanism ("`t").
For details, see the accepted answer to this question.
If you have control over your data feed, there is an alternative. The aws cli interface has an option to format the output in JSON format. That format will be much easier to import into Powershell in a form you can use.
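A rough sketch of the JSON route follows. The JSON text is a mock of what the command from the question might return with --output json, assuming the --query expression yields an array of objects with the keys ARN, EarliestTime and LatestTime; it is not captured from a real call.

```powershell
# Mock JSON (assumed shape: array of objects keyed ARN, EarliestTime, LatestTime).
$json = @'
[
  { "ARN": "arn:aws:rds:ap-southwest-2:9711387875370:db:catflow--prod",
    "EarliestTime": "2019-03-03T09:54:29.402Z",
    "LatestTime": "2019-03-05T01:25:53Z" },
  { "ARN": "arn:aws:rds:ap-southwest-2:9711387875370:db:xyz-prod-rds-golf",
    "EarliestTime": "2019-03-01T09:04:31.477Z",
    "LatestTime": "2019-03-05T01:28:40Z" }
]
'@

# Parse the JSON into objects, then render them as CSV lines.
$rows = $json | ConvertFrom-Json
$csvLines = $rows |
    Select-Object -Property ARN, EarliestTime, LatestTime |
    ConvertTo-Csv -NoTypeInformation

$csvLines
```

Depending on the PowerShell version, ConvertFrom-Json may turn the ISO 8601 timestamps into DateTime objects, in which case the date columns are written in the current culture's format rather than as the original strings. Replace ConvertTo-Csv with Export-Csv -Path to write a file instead.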
Edit:
The following script uses the mockup provided by Theo, except that the multiple spaces have been replaced by a tab character. It uses ConvertFrom-Csv rather than Import-Csv, but it's the same idea:
$awsReturn = @"
arn:aws:rds:ap-southwest-2:9711387875370:db:catflow--prod 2019-03-03T09:54:29.402Z 2019-03-05T01:25:53Z
arn:aws:rds:ap-southwest-2:9711387875370:db:xyz-prod-rds-golf 2019-03-01T09:04:31.477Z 2019-03-05T01:28:40Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-asm-prod-rds-stardb 2019-02-01T09:07:30.648Z 2019-03-05T01:27:20Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-asm-prod-rds-domaindb 2019-02-02T09:04:30.771Z 2019-03-05T01:28:40Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-ctz-prod-rds-datavault 2019-02-26T14:14:30.254Z 2019-03-05T01:29:13Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-gcp-prod-rds-datavault 2019-02-01T14:05:40.456Z 2019-03-05T01:31:05Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-conformed-datavault-prod 2019-02-02T14:06:26.050Z 2019-03-05T01:27:02Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-dqm-datavault-prod 2019-02-01T14:12:05.286Z 2019-03-05T01:26:53Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-prod-dgc-cde-lineage 2019-03-02T09:54:29.053Z 2019-03-05T01:29:11Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-rec-prod 2019-02-02T22:09:00.673Z 2019-03-05T01:29:40Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-serve-prod 2019-03-02T09:54:20.729Z 2019-03-05T01:30:21Z
"@
$myarray = $awsreturn | ConvertFrom-Csv -header "Prod","DateStart","DateEnd" -delimiter "`t"
$myarray | Format-Table
$myarray | gm
When I ran it in my environment, it produced the following:
Prod DateStart DateEnd
---- --------- -------
arn:aws:rds:ap-southwest-2:9711387875370:db:catflow--prod 2019-03-03T09:54:29.402Z 2019-03-05T01:25:53Z
arn:aws:rds:ap-southwest-2:9711387875370:db:xyz-prod-rds-golf 2019-03-01T09:04:31.477Z 2019-03-05T01:28:40Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-asm-prod-rds-stardb 2019-02-01T09:07:30.648Z 2019-03-05T01:27:20Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-asm-prod-rds-domaindb 2019-02-02T09:04:30.771Z 2019-03-05T01:28:40Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-ctz-prod-rds-datavault 2019-02-26T14:14:30.254Z 2019-03-05T01:29:13Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-gcp-prod-rds-datavault 2019-02-01T14:05:40.456Z 2019-03-05T01:31:05Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-conformed-datavault-prod 2019-02-02T14:06:26.050Z 2019-03-05T01:27:02Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-dqm-datavault-prod 2019-02-01T14:12:05.286Z 2019-03-05T01:26:53Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-prod-dgc-cde-lineage 2019-03-02T09:54:29.053Z 2019-03-05T01:29:11Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-rec-prod 2019-02-02T22:09:00.673Z 2019-03-05T01:29:40Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-serve-prod 2019-03-02T09:54:20.729Z 2019-03-05T01:30:21Z
TypeName: System.Management.Automation.PSCustomObject
Name MemberType Definition
---- ---------- ----------
Equals Method bool Equals(System.Object obj)
GetHashCode Method int GetHashCode()
GetType Method type GetType()
ToString Method string ToString()
DateEnd NoteProperty string DateEnd=2019-03-05T01:25:53Z
DateStart NoteProperty string DateStart=2019-03-03T09:54:29.402Z
Prod NoteProperty string Prod=arn:aws:rds:ap-southwest-2:9711387875370:db:catflow--prod
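To go from those objects to an actual CSV file, the same pipeline can end in Export-Csv. The sketch below assumes tab-separated input and writes the tabs explicitly as "`t" so they cannot be mistaken for spaces; the two sample lines are taken from the question, and the output path in the temp directory is hypothetical.

```powershell
# Two sample lines with explicit tab separators (`t) between the three fields.
$awsReturn = @(
    "arn:aws:rds:ap-southwest-2:9711387875370:db:catflow--prod`t2019-03-03T09:54:29.402Z`t2019-03-05T01:25:53Z"
    "arn:aws:rds:ap-southwest-2:9711387875370:db:xyz-prod-rds-golf`t2019-03-01T09:04:31.477Z`t2019-03-05T01:28:40Z"
)

# Hypothetical output location.
$outPath = Join-Path ([System.IO.Path]::GetTempPath()) 'backups.csv'

# Parse the tab-delimited text into objects, then write a comma-separated file with headers.
$awsReturn |
    ConvertFrom-Csv -Header 'Prod', 'DateStart', 'DateEnd' -Delimiter "`t" |
    Export-Csv -Path $outPath -NoTypeInformation

Get-Content -Path $outPath
```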
Let's assume the data returned looks like this mockup (in the question it is strangely formatted):
$awsReturn = @"
arn:aws:rds:ap-southwest-2:9711387875370:db:catflow--prod 2019-03-03T09:54:29.402Z 2019-03-05T01:25:53Z
arn:aws:rds:ap-southwest-2:9711387875370:db:xyz-prod-rds-golf 2019-03-01T09:04:31.477Z 2019-03-05T01:28:40Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-asm-prod-rds-stardb 2019-02-01T09:07:30.648Z 2019-03-05T01:27:20Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-asm-prod-rds-domaindb 2019-02-02T09:04:30.771Z 2019-03-05T01:28:40Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-ctz-prod-rds-datavault 2019-02-26T14:14:30.254Z 2019-03-05T01:29:13Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-gcp-prod-rds-datavault 2019-02-01T14:05:40.456Z 2019-03-05T01:31:05Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-conformed-datavault-prod 2019-02-02T14:06:26.050Z 2019-03-05T01:27:02Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-dqm-datavault-prod 2019-02-01T14:12:05.286Z 2019-03-05T01:26:53Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-prod-dgc-cde-lineage 2019-03-02T09:54:29.053Z 2019-03-05T01:29:11Z
arn:aws:rds:ap-southwest-2:9711387875370:db:prod-rec-prod 2019-02-02T22:09:00.673Z 2019-03-05T01:29:40Z
arn:aws:rds:ap-southwest-2:9711387875370:db:-serve-prod 2019-03-02T09:54:20.729Z 2019-03-05T01:30:21Z
"@
Then, you can do this:
# Since I don't know if that is one single string or a string array:
if ($awsReturn -isnot [array]) { $awsReturn = $awsReturn -split '\r?\n' }
# write it to csv file
$awsReturn -replace '\s+', ',' | Set-Content -Path 'WhereEver.csv' -PassThru # PassThru also displays on screen
to get a file that can serve as CSV (although it has no headers or quoted fields)
If you want to use Export-CSV to get a csv file with headers and quoted fields, you need to split the lines and output objects.
Something like this:
# Since I don't know if that is one single string or a string array:
if ($awsReturn -isnot [array]) { $awsReturn = $awsReturn -split '\r?\n' }
# write it to csv file (with headers and quoted values)
$awsReturn | ForEach-Object {
    $data = $_ -split '\s+' # in this case we know we have 3 fields
    [PsCustomObject]@{
        Prod = $data[0]
        DateStart = $data[1]
        DateEnd = $data[2]
    }
} | Export-Csv -Path 'WhereEver.csv' -NoTypeInformation
The WhereEver.csv file will then look like this:
"Prod","DateStart","DateEnd"
"arn:aws:rds:ap-southwest-2:9711387875370:db:catflow--prod","2019-03-03T09:54:29.402Z","2019-03-05T01:25:53Z"
"arn:aws:rds:ap-southwest-2:9711387875370:db:xyz-prod-rds-golf","2019-03-01T09:04:31.477Z","2019-03-05T01:28:40Z"
"arn:aws:rds:ap-southwest-2:9711387875370:db:-asm-prod-rds-stardb","2019-02-01T09:07:30.648Z","2019-03-05T01:27:20Z"
"arn:aws:rds:ap-southwest-2:9711387875370:db:-asm-prod-rds-domaindb","2019-02-02T09:04:30.771Z","2019-03-05T01:28:40Z"
"arn:aws:rds:ap-southwest-2:9711387875370:db:-ctz-prod-rds-datavault","2019-02-26T14:14:30.254Z","2019-03-05T01:29:13Z"
"arn:aws:rds:ap-southwest-2:9711387875370:db:-gcp-prod-rds-datavault","2019-02-01T14:05:40.456Z","2019-03-05T01:31:05Z"
"arn:aws:rds:ap-southwest-2:9711387875370:db:prod-conformed-datavault-prod","2019-02-02T14:06:26.050Z","2019-03-05T01:27:02Z"
"arn:aws:rds:ap-southwest-2:9711387875370:db:prod-dqm-datavault-prod","2019-02-01T14:12:05.286Z","2019-03-05T01:26:53Z"
"arn:aws:rds:ap-southwest-2:9711387875370:db:prod-prod-dgc-cde-lineage","2019-03-02T09:54:29.053Z","2019-03-05T01:29:11Z"
"arn:aws:rds:ap-southwest-2:9711387875370:db:prod-rec-prod","2019-02-02T22:09:00.673Z","2019-03-05T01:29:40Z"
"arn:aws:rds:ap-southwest-2:9711387875370:db:-serve-prod","2019-03-02T09:54:20.729Z","2019-03-05T01:30:21Z"

Export Firefox bookmarks (and tags) to CSV?

Firefox lets you export to HTML, and while I could write a script that uses regex to parse that into CSV I was curious if there were any existing utilities / Firefox addons that allowed us to directly export to CSV. Also interested if there is any way to import like this.
I believe such an extension does not exist as of yet, but I wanted to make you aware that you can also export your bookmarks to JSON format, which might make a conversion to CSV easier compared to working with the HTML export (... depending).
Mozilla's official Firefox support page Restore bookmarks from backup or move them to another computer mentions how to do this under "Manual backup", though I find navigating there via the browser menu bar easier:
Bookmarks > Show All Bookmarks, click the star-shaped button and select Backup.... This will prompt the Save File dialogue for a JSON file named bookmarks-YYYY-MM-DD.json with the current date.
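As a rough sketch of how the JSON backup could be flattened to CSV, the function below walks the bookmark tree recursively and emits one row per bookmark. It assumes that folders keep their contents in a children array and that bookmarks carry title, uri and (optionally) tags properties; the inline JSON is a hand-written mock following that assumed layout, not a real backup, and Get-BookmarkRow is a hypothetical helper name.

```powershell
# Hand-written mock of a bookmarks backup (assumed property names: title, uri, tags, children).
$json = @'
{
  "title": "",
  "children": [
    { "title": "Bookmarks Menu",
      "children": [
        { "title": "Example", "uri": "https://example.com/", "tags": "demo,test" }
      ] },
    { "title": "Bookmarks Toolbar",
      "children": [
        { "title": "Mozilla", "uri": "https://www.mozilla.org/" }
      ] }
  ]
}
'@

# Hypothetical helper: emits one object per bookmark found under the given node.
function Get-BookmarkRow {
    param($Node)
    # A node with a uri is treated as a bookmark; emit one row for it.
    if ($Node.uri) {
        [pscustomobject]@{ Name = $Node.title; URL = $Node.uri; Tags = $Node.tags }
    }
    # Folders hold their contents in "children"; walk them recursively.
    foreach ($child in $Node.children) {
        Get-BookmarkRow -Node $child
    }
}

$rows = @(Get-BookmarkRow -Node ($json | ConvertFrom-Json))
$rows | ConvertTo-Csv -NoTypeInformation
```

For a real backup you would read the file with Get-Content -Raw instead of using the inline mock, and end the pipeline with Export-Csv to write a file.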
Edit: the closest solution to using a FF add-on is probably a JavaScript bookmarklet. I included code for a first simple version in a Gist over on GitHub. You'd run this with the HTML export of your bookmarks open in your browser.
I put this PowerShell script together just for this question. It does not handle import. See the comments in the code for an explanation of what is happening. There is more to a bookmark than the name and URL, but these are the most essential pieces of data, so that is all I collected for the CSV. Note that you first have to export the bookmarks to HTML; the script then converts that file to CSV.
#set paths
#where your bookmarks.html is
$bkmarkPath = "C:/Users/jhancock/Desktop/test/FFbookmarks/bookmarks.html"
#where you want your CSV file to be.
$newCSVPath = 'C:/Users/jhancock/Desktop/test/FFbookmarks/bookmarks.csv'
#get the HTML and parse it out.
$bookmarkpage = New-Object -ComObject "HTMLFile"
$bookmarkpage.IHTMLDocument2_write($(Get-content $bkmarkPath -Raw))
#get the links, and link names and put into variable.
$atags = $bookmarkpage.all.tags("a") | % innerText;
$links = $bookmarkpage.links | % ie8_href
#clear the file if it exists
if (Test-Path $newCSVPath) {
clear-content $newCSVPath
}
#create a new csvfile if it doesn't exist
"""Name"",""URL""`n" | Out-File $newCSVPath -Append
#add number of lines equal to number of links
For ($i=0; $i -lt $links.length; $i++) {
"`n"""",""""" | Out-File $newCSVPath -Append
}
#sleep while file is created
start-sleep 2
#import our fresh CSV file
$csv = Import-Csv $newCSVPath -Header Name, URL | Select-object -skip 1
#populate our links and URLs into the CSV
$numItems = $links.Count
for ($i = 0; $i -lt $numItems; $i++) {
$csv[$i].Name = $atags[$i]
$csv[$i].URL = $links[$i]
}
#Generate the CSV!
$csv | Export-Csv $newCSVPath -NoTypeInformation

Mapping tons of printers

Can you help me understand why this script won't work?
I need to map tons of printers
$path = 'C:\temp\printers.csv'
Import-Csv -Header ('Printernames') -Path $path
foreach ($Printer in $Printername) {
start \\print01\$Printer
}
It looks like it picks up the header of the CSV file every time it loops.
The Import-Csv cmdlet automatically reads the first line of the input file as the CSV headers. The -Header parameter exists so you can provide custom headers in case your data file comes without headers.
Example:
Consider a file input.csv with the following content:
1,"a",23
2,"b",42
If you read that file normally, the first line would be interpreted as the headers of the CSV:
PS C:\> Import-Csv 'input.csv'
1 a 23
- - --
2 b 42
To import all rows as data rows you provide custom headers via the parameter -Header:
PS C:\> Import-Csv 'input.csv' -Header A,B,C
A B C
- - -
1 a 23
2 b 42
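Applied to the printer list, a minimal sketch might look like the following. It assumes the CSV has no header row and one printer name per line; the printer names and the temp-file path are made up, and the property name PrinterName is just the custom header chosen here. Note that the result of Import-Csv is stored in a variable and the loop reads the property from each row.

```powershell
# Hypothetical header-less input file with one printer name per line.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'printers.csv'
'Printer-A', 'Printer-B' | Set-Content -Path $path

# Supply a custom header because the file has none, and keep the result.
$printers = Import-Csv -Path $path -Header 'PrinterName'

# Build the UNC path for each row by reading the PrinterName property.
$uncPaths = foreach ($printer in $printers) {
    "\\print01\$($printer.PrinterName)"
}
$uncPaths

# To open each connection as in the question, something like this could follow:
# $uncPaths | ForEach-Object { Start-Process $_ }
```

If the file does have a header row, drop -Header and use the column name from the file instead.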

PowerShell csv character replacement

I managed to generate a CSV file with a PowerShell script that collects disk information from a group of servers, but the output in the CSV file needs some cleanup.
The script, for reference:
foreach ($pc in $comp) {
    $diskvalue += Get-WmiObject @Params | Select @{l='drives';e='DeviceID'}, @{l='server';e='SystemName'}, @{Name="size(MB)";Expression={"{0:N1}" -f ($_.size/1mb)}}, @{Name="freespace(MB)";Expression={"{0:N1}" -f ($_.freespace/1mb)}}, @{Name="UsedSpace(MB)";Expression={"{0:N2}" -f (($_.size - $_.FreeSpace)/1mb)}}
}
$diskvalue | Export-Csv C:\disk_info\DiskReport.csv -NoTypeInformation
The "drives" column in the output CSV file contains:
C:
I would like to remove the trailing ":" so that it contains:
C
Change:
@{l='drives';e='DeviceID'}
to
@{l='drives';e={"$($_.DeviceID)".Trim(": ")}}
Trim(": ") removes any leading and trailing spaces and ':' characters from the DeviceID string.
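The effect can be checked without querying any servers. In the sketch below, the two objects are mock stand-ins for the Get-WmiObject results (the values are made up), and only three of the calculated properties from the question are reproduced.

```powershell
# Mock objects standing in for the Get-WmiObject output (hypothetical values).
$disks = @(
    [pscustomobject]@{ DeviceID = 'C:'; SystemName = 'SRV01'; Size = 100GB; FreeSpace = 50GB }
    [pscustomobject]@{ DeviceID = 'D:'; SystemName = 'SRV01'; Size = 200GB; FreeSpace = 10GB }
)

# Same calculated-property technique, with the trailing ':' trimmed from DeviceID.
$report = $disks | Select-Object @{ l = 'drives'; e = { "$($_.DeviceID)".Trim(': ') } },
    @{ l = 'server'; e = 'SystemName' },
    @{ l = 'size(MB)'; e = { '{0:N1}' -f ($_.Size / 1MB) } }

$report | Format-Table
```

The exact text in the size(MB) column depends on the current culture's number format.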

Extract session ID from psexec / PowerShell query user command

I'm writing a PowerShell script to find out the session ID of the active user at a remote machine, to then launch a program using that session ID. Here is what I have so far.
$queryusers = $psexecdirectory + ' \\' + $remotepc + ' -u ' + $domain + '\' + $username + ' -p ' + $password + ' query user'
$results = iex $queryusers
The above works fine, with the example results below being stored on the variable $results
USERNAME SESSIONNAME ID STATE IDLE TIME LOGON TIME
usr1 3 Disc 1:12 9/5/2013 11:59 AM
>usr2 rdp-tcp#1 4 Active . 9/5/2013 11:59 AM
I've used the below to get the ID, but the number in the session name 'rdp-tcp#0' changes when another user logs in, as in the output above, which makes this approach useless:
$id = $results | Select-String "$rdp-tcp#0\s+(\w+)" |
Foreach {$_.Matches[0].Groups[1].Value}
I am unfamiliar with the PowerShell syntax, and have been unable to find a site where formatting options are explained. Can someone help me out, or point me to a website where I can learn more about extracting snippets from strings? Thanks in advance.
Try this:
$id = $results | ? { $_ -match '(\d+)\s+Active' } | % { $matches[1] }
The regular expression (\d+)\s+Active matches the keyword "Active" preceded by a number, and the subsequent loop returns the first submatch (i.e. the number) for each matching line.
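The one-liner can be tried against canned text instead of a live psexec call. In this sketch the $results array is a mock of the query user output shown in the question, with the column spacing guessed.

```powershell
# Mock of the 'query user' output from the question (column spacing is a guess).
$results = @(
    ' USERNAME              SESSIONNAME        ID  STATE   IDLE TIME  LOGON TIME'
    ' usr1                                      3  Disc         1:12  9/5/2013 11:59 AM'
    '>usr2                  rdp-tcp#1           4  Active          .  9/5/2013 11:59 AM'
)

# Keep only lines where a number is followed by whitespace and 'Active',
# then return the captured number from the automatic $Matches variable.
$id = $results |
    Where-Object { $_ -match '(\d+)\s+Active' } |
    ForEach-Object { $Matches[1] }

$id
```

The value comes back as a string; cast it with [int] if the program you launch expects a number. If several sessions are active, $id will contain one entry per matching line.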

Resources