PowerShell Invoke-WebRequest: extract a specific URL from the filtered text - powershell-4.0

I have written a PowerShell script that extracts the required text from a URL:
$ExtractData = Invoke-WebRequest "https://www.somesite.com/downloads"
$ExtractData = $ExtractData.tostring() -split "[`r`n]" | select-string "http://somesite.com/download"
which gives the following result:
onclick="_gaq.push(['_trackEvent', 'Downloads', 'http://somesite.com/download/some.exe']);">
I thought of splitting the result on commas, but is there a better way to get only this part?
http://somesite.com/download/some.exe
My try with regex
$regex = '(http|ftp|https)://([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,@?^=%&:/~+#-]*[\w@?^=%&/~+#-])?'
$ExtractData= $ExtractData | select-string -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value }
$ExtractData
This returns the path, but without the .exe file name:
http://somesite.com/download

Use Regex.Matches to extract all links in an array of Match records, then collect Groups[1].Value:
$webpage = Invoke-WebRequest "https://www.somesite.com/downloads"
$links = ([regex]'((?:ftp|https?)://\S+?)[''"]').Matches($webpage) |
ForEach { [Web.HTTPUtility]::HtmlDecode($_.Groups[1].Value) }
Note: since we're processing raw HTML, the URLs may be HTML-encoded with &amp; instead of &, so HtmlDecode was used.
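As a minimal sketch of the same idea, assuming the sample onclick line from the question stands in for the downloaded page: the variable names and the character class that stops at a quote or whitespace are illustrative choices, and [System.Net.WebUtility] is used here instead of [Web.HttpUtility] so that no extra assembly has to be loaded.

```powershell
# Sample line taken from the question; a real page would come from Invoke-WebRequest.
$line = @'
onclick="_gaq.push(['_trackEvent', 'Downloads', 'http://somesite.com/download/some.exe']);">
'@

# Match a scheme, then everything up to the next quote or whitespace character.
$pattern = '(?:ftp|https?)://[^''"\s]+'

# Collect every match and undo any HTML encoding (for example &amp; becomes &).
$urls = [regex]::Matches($line, $pattern) |
    ForEach-Object { [System.Net.WebUtility]::HtmlDecode($_.Value) }

$urls
```

Because the pattern stops at the closing quote, the file name stays part of the match.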

Related

PowerShell: compare 2 large CSV files to find users that don't exist in one of them

I have two CSV files with ~10,000 users each. I need to count how many users appear in csv1 and not in csv2. At the moment I have the code below. However, I'm aware that this is probably extremely inefficient, as it potentially loops through up to 10,000 users 10,000 times. The code takes forever to run and I'm sure there must be a more efficient way. Any help or suggestions are appreciated; I am fairly new to PowerShell.
foreach ($csv1User in $csv1) {
$found = $false
foreach ($csv2User in $csv2) {
if ($csv1User.identifier -eq $csv2User.identifier)
{
$found = $true
break
}
}
if ($found -ne $true){
$count++
}
}
If you replace your nested loops with two HashSets, you'll have two ways of calculating the difference between the two lists:
Using SymmetricExceptWith()
The HashSet<T>.SymmetricExceptWith() function allows us to calculate the subset of terms that exist in either collection but not in both:
# Create hashset from one list
$userIDs = [System.Collections.Generic.HashSet[string]]::new([string[]]$csv1.identifier)
# Pass the other list to `SymmetricExceptWith`
$userIDs.SymmetricExceptWith([string[]]$csv2.identifier)
# Now we have an efficient filter!
$relevantRecords = @($csv1;$csv2) |Where-Object { $userIDs.Contains($_.identifier) } |Sort-Object -Unique identifier
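The question itself asks only for users that appear in csv1 but not in csv2, which is a one-sided difference. A minimal sketch of that case, assuming made-up sample data and using HashSet<T>.ExceptWith() rather than SymmetricExceptWith() (the variable name $onlyInCsv1 is hypothetical):

```powershell
# Made-up sample data standing in for the two imported CSV files.
$csv1 = @'
identifier
bob
sally
john
sue
'@ | ConvertFrom-Csv

$csv2 = @'
identifier
bill
sally
john
stan
'@ | ConvertFrom-Csv

# Start with every identifier from csv1 ...
$onlyInCsv1 = [System.Collections.Generic.HashSet[string]]::new([string[]]$csv1.identifier)

# ... and remove everything that also occurs in csv2.
$onlyInCsv1.ExceptWith([string[]]$csv2.identifier)

# Number of users that appear in csv1 but not in csv2.
$onlyInCsv1.Count
```

With this sample data the set is left holding bob and sue.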
Using sets to track duplicates
Similarly, we can use hash sets to keep track of which terms have been observed at least once, and which ones have been seen more than once:
# Create sets for tracking
$seenOnce = [System.Collections.Generic.HashSet[string]]::new()
$seenTwice = [System.Collections.Generic.HashSet[string]]::new()
# Loop through whole superset of records
foreach($record in @($csv1;$csv2)){
# Always attempt to add to the $seenOnce set
if(!$seenOnce.Add($record.identifier)){
# We've already seen this identifier once, add it to $seenTwice
[void]$seenTwice.Add($record.identifier)
}
}
# Just like the previous example, we now have an efficient filter!
$relevantRecords = @($csv1;$csv2) |Where-Object { $seenOnce.Contains($_.identifier) -and -not $seenTwice.Contains($_.identifier) } |Sort-Object -Unique identifier
Using a hash table as a grouping construct
You could also use a dictionary type (like a [hashtable] for example) to group records from both csv files based on their identifier, and then filter on number of record values in each dictionary entry:
# Groups records on their identifier value
$groupsById = @{}
foreach($record in @($csv1;$csv2)){
if(-not $groupsById.ContainsKey($record.identifier)){
$groupsById[$record.identifier] = @()
}
$groupsById[$record.identifier] += $record
}
# Filter based on number of records with a distinct identifier
$relevantRecords = $groupsById.GetEnumerator() |Where-Object { $_.Value.Count -eq 1 } |Select-Object -Expand Value
If you're just looking for the count, this should be much faster. Since you want the users in csv1 that are not in csv2, filter with -notin:
$csv2 = Import-Csv $csvfile2
Import-Csv $csvfile1 |
Where-Object identifier -notin $csv2.identifier |
Measure-Object | Select-Object -ExpandProperty Count
Here's a small example:
$csvfile1 = New-TemporaryFile
$csvfile2 = New-TemporaryFile
@'
identifier
bob
sally
john
sue
'@ | Set-Content $csvfile1 -Encoding UTF8
@'
identifier
bill
sally
john
stan
'@ | Set-Content $csvfile2 -Encoding UTF8
$csv2 = Import-Csv $csvfile2
Import-Csv $csvfile1 |
Where-Object identifier -notin $csv2.identifier |
Measure-Object | Select-Object -ExpandProperty Count
Output is simply
2

Compare columns between 2 files and delete non common columns using Powershell

I have a bunch of files in folder A and their corresponding metadata files in folder B. I want to loop through the data files and check whether the columns are the same as in the metadata file (incoming data files could have new columns added at any position without notice). If the columns in both files match, no action is to be taken. If the data file has more columns than the metadata file, those columns should be deleted from the incoming data file. Any help would be appreciated. Thanks!
Data file is ps_job.txt
“empid”|”name”|”deptid”|”zipcode”|”salary”|”gender”
“1”|”Tom”|”10″|”11111″|”1000″|”M”
“2”|”Ann”|”20″|”22222″|”2000″|”F”
Meta data file is ps_job_metadata.dat
“empid”|”name”|”zipcode”|”salary”
I would like my output to be
“empid”|”name”|”zipcode”|”salary”
“1”|”Tom”|”11111″|”1000″
“2”|”Ann”|”22222″|”2000″
That's a seemingly simple question with a very complicated answer. However, I've broken down the code for what you will need to do. Here are the steps that need to happen in order for PowerShell to do everything you're asking of it:
Read the .dat file
Save the .dat data into an object
Read the .txt file
Save the .txt header into an object
Check for the differences
Delete the old text file (that had too many columns)
Create a new text file with the new columns
I've made some assumptions in how this looks. However, with the way I've structured the code, it should be easy enough to make modifications as necessary if my assumptions are wrong. Here are my assumptions:
The text file will always have all of the columns that the DAT file has (even though it will sometimes have more)
The dat file is structured like a text file and can be directly imported into powershell.
And here is the code, with comments. I've done my best to explain the purpose of each section, but I've written this with the expectation that you have a basic knowledge of powershell, especially arrays. If you have questions I'll do my best to answer, though I'll ask that you refer to the section of code you have questions on.
###
### The paths. I'm sure you will have multiples of each file. However, I didn't want to attempt to pull in
### the files with this sample code as it can vary so much in your environment.
###
$dat = "C:\StackOverflow\thingy.dat"
$txt = "C:\stackoverflow\ps_job.txt"
###
### This is the section to process the DAT file
###
# This will read the file and put it in a variable
$dat_raw = get-content -Path $dat
# Now, let's separate out the punctuation and give us our object
$dat_array = $dat_raw.split("|")
$dat_object = @()
foreach ($thing in $dat_array)
{
$dat_object+=$thing.Replace("""","")
}
###
### This is the section to process the TXT file
###
# This will read the file and put it into a variable
$txt_raw = get-content -Path $txt
# Now, let's separate out the punctuation and give us our object
$txt_header_array = $txt_raw[0].split("|")
$txt_header_object = @()
foreach ($thing in $txt_header_array)
{
$txt_header_object += $thing.Replace("""","")
}
###
### Now, let's figure out which columns we're eliminating (if any)
###
$x = 0
$total = $txt_header_object.count
$to_keep = @()
# Indexes run from 0 to $total - 1, so stop before $total
While ($x -lt $total)
{
if ($dat_object -contains $txt_header_object[$x])
{
$to_keep += $x
}
$x++
}
### Now that we know which columns to keep, we can apply the changes to each line of the text file.
### We will save each line to a new variable. Then, once we have the new variable, we will replace
### the existing file with a new file that has only the data we want. Note, we will only run this
### code if there's a difference in the files.
if ($total -ne $to_keep.count)
{
### This first section will go line by line and 'fix' the number of columns
$new_text_file = @()
foreach ($line in $txt_raw)
{
if ($line.Length -gt 0)
{
# Blank out the array each time
$line_array = @()
foreach ($number in $to_keep)
{
$line_array += ($line.split("|"))[$number]
}
$new_text_file += $line_array -join "|"
}
else
{
$new_text_file +=""
}
}
### This second section will delete the original file and replace it with our good
### file that has been created.
Remove-item -Path $txt
$new_text_file | out-file -FilePath $txt
}
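The index-based column selection above can be condensed. The following is only a sketch under stated assumptions: the data is held in memory rather than read from files, the sample uses straight quotes instead of the curly quotes in the question, and the names $metaColumns, $lines, $keep and $filtered are hypothetical.

```powershell
# Columns listed in the metadata file (assumed already split and unquoted).
$metaColumns = 'empid', 'name', 'zipcode', 'salary'

# Lines of the data file, header first (straight quotes assumed).
$lines = @(
    '"empid"|"name"|"deptid"|"zipcode"|"salary"|"gender"'
    '"1"|"Tom"|"10"|"11111"|"1000"|"M"'
    '"2"|"Ann"|"20"|"22222"|"2000"|"F"'
)

# Unquoted header names of the data file.
$header = $lines[0].Split('|') | ForEach-Object { $_.Trim('"') }

# Indexes of the header columns that also exist in the metadata.
$keep = 0..($header.Count - 1) | Where-Object { $metaColumns -contains $header[$_] }

# Rebuild every line from the kept indexes only.
$filtered = foreach ($line in $lines) {
    ($line.Split('|'))[$keep] -join '|'
}

$filtered
```

Indexing an array with an array of positions returns just those elements, which replaces the inner loop over $to_keep.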
This small example can be a starting point for your solution:
$ps_job = Import-Csv D:\ps_job.txt -Delimiter '|'
$ps_job_metadata = (Get-Content D:\ps_job_metadata.txt) -split '\|' -replace '"'
# Column names present in the data file
$column = $ps_job[0].psobject.Properties.Name
foreach( $d in (Compare-Object $column $ps_job_metadata))
{
if($d.SideIndicator -eq '<=')
{
$ps_job | %{ $_.psobject.Properties.Remove($d.InputObject) }
}
}
$ps_job | Export-Csv -Path D:\output.txt -Delimiter '|' -NoTypeInformation
I tried this and it works.
$outputFile = "C:\Script_test\ps_job_mod.dat"
$sample = Import-Csv -Path "C:\Script_test\ps_job.dat" -Delimiter '|'
$metadataLine = Get-Content -Path "C:\Script_test\ps_job_metadata.txt" -First 1
$desiredColumns = $metadataLine.Split("|").Replace("`"","")
$sample | select $desiredColumns | Export-Csv $outputFile -Encoding UTF8 -NoTypeInformation -Delimiter '|'
Please note that the smart quotes are inconsistent across the rows and there are empty lines between the rows (I highly recommend reformatting/updating your question).
Anyways, as long as the quoting of the header is consistent between the two (ps_job.txt and ps_job_metadata.dat) files:
# $JobTxt = Get-Content .\ps_job.txt
$JobTxt = @'
“empid”|”name”|”deptid”|”zipcode”|”salary”|”gender”
“1”|”Tom”|”10″|”11111″|”1000″|”M”
“2”|”Ann”|”20″|”22222″|”2000″|”F”
'@
# $MetaDataTxt = Get-Content .\ps_job_metadata.dat
$MetaDataTxt = @'
“empid”|”name”|”zipcode”|”salary”
'@
$Job = ConvertFrom-Csv -Delimiter '|' $JobTxt
$MetaData = ConvertFrom-Csv -Delimiter '|' (@($MetaDataTxt) + 'x|')
$Job | Select-Object $MetaData.PSObject.Properties.Name
“empid” ”name” ”zipcode” ”salary”
------- ------ --------- --------
“1” ”Tom” ”11111″ ”1000″
“2” ”Ann” ”22222″ ”2000″
Here's the same answer I posted to your question on Powershell.org
$jobfile = "ps_job.dat"
$metafile = "ps_job_metadata.dat"
$outputfile = "some_file.csv"
$meta = ((Get-Content $metafile -First 1 -Encoding UTF8) -split '\|')
Class ColumnSelector : System.Collections.Specialized.OrderedDictionary {
Select($line,$meta)
{
$meta | foreach{$this.add($_,(iex "`$line.$_"))}
}
ColumnSelector($line,$meta)
{
$this.select($line,$meta)
}
}
import-csv $jobfile -Delimiter '|' |
foreach{[pscustomobject]([columnselector]::new($_,$meta))} |
Export-CSV $outputfile -Encoding UTF8 -NoTypeInformation -Delimiter '|'
Output
PS C:\>Get-Content $outputfile
"empid"|"name"|"zipcode"|"salary"
"1"|"Tom"|"11111"|"1000"
"2"|"Ann"|"22222"|"2000"
Provided you want to keep those curly quotes and your code page and console font supports all the characters, you can do the following:
# Create array of properties delimited by |
$headers = (Get-Content .\ps_job_metadata.dat -Encoding UTF8) -split '\|'
Import-Csv ps_job.dat -Delimiter '|' -Encoding utf8 | Select-Object $headers

using -replace to remove a string with special characters from cells in a csv

I have a CSV file like:
"localpath"
"C:\Users\calabresel"
"C:\Users\goslinep"
"C:\Users\deangelisr"
"C:\Users\bannont"
"C:\Users\goodwind"
I am looking for a way to isolate just the username from each field. I will then query the AD to determine if each user is disabled or enabled. I haven't been able to figure out how to get just the last piece though. My idea was to use -replace to replace the identical string with null like this:
$txt = import-csv paths1.csv | % {$_.localpath = $_.localpath -replace "C:\Users\", ""}
That came back with invalid regular expression pattern errors, though, which I assumed was a result of the target string containing special characters (the backslashes). I then started looking for a way to get PowerShell to take the \ literally instead. That led me to try this:
$txt = import-csv paths1.csv | % {$_.localpath = $_.localpath -replace [Regex]::Escape("C:\\Users\\"), ""}
and this
$txt = import-csv paths1.csv | % {$_.localpath = $_.localpath -replace "C:\\Users\\", ""}
Both of those methods stop the invalid regular expression errors and just return me a fresh line without complaining. However, when I print the $txt variable it is empty...
I'm certain I am approaching this problem from the wrong angle and/or with improper syntax, but I could use some guidance as I just started working with PowerShell a week ago.
Any help provided would be greatly appreciated.
The following will import the CSV file and then get the leaf of the path, i.e. the user name.
$txt = Import-Csv paths1.csv | ForEach-Object { Split-Path $_.localpath -leaf }
If you still want to use your replace method, just take out the $_.localpath = part and it should work.
$txt = Import-Csv C:\##Scatch\test.csv | % { $_.localpath -replace "C:\\Users\\", ""}
The reason why you aren't getting anything back into $txt is that you update a property of $_ but don't return $_.
Assuming that you want to use the regex rather than Split-Path
$txt = import-csv C:\temp\test.csv | % {
$_.localpath = $_.localpath -replace "C:\\Users\\", ""
$_
}
Or
$txt = import-csv C:\temp\test.csv | % {
$_.localpath -replace "C:\\Users\\", ""
}
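If you would rather not escape the backslashes by hand, [regex]::Escape() can build the pattern from the literal prefix, as long as it is given the plain path with single backslashes. A minimal sketch, assuming in-memory sample paths from the question instead of the CSV file (the variable names are illustrative):

```powershell
# Literal prefix exactly as it appears in the data (single backslashes).
$prefix = 'C:\Users\'

# Escape() turns the literal text into a safe regex pattern.
$pattern = [regex]::Escape($prefix)

# Sample values standing in for the localpath column.
$paths = 'C:\Users\calabresel', 'C:\Users\goslinep'

# Strip the prefix only when it occurs at the start of the value.
$names = $paths | ForEach-Object { $_ -replace "^$pattern", '' }

$names
```

Passing an already doubled string such as "C:\\Users\\" to Escape() doubles the backslashes again, so the resulting pattern no longer matches the data.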
Another solution:
Get-Content "C:\temp\test.txt" | select @{N="Value";E={$_.split('\')[-1].replace('"', '')}} -Skip 1

Need help on PowerShell column looping

I have one requirement which should be done in windows PowerShell or command line. I need to split CSV file columns into .txt files.
customer.csv:
id,name
1,a
2,b
I need to split columns into text files (here rows and columns count are dynamic)
The output text files should be as follows:
id.txt:
1
2
name.txt:
a
b
I found the following script with the help of Google:
$a = Import-Csv "D:\Final\customer.csv"
$b = $a[0] | Get-Member | select -Skip 1 | ? { $_.membertype -eq 'noteproperty'}
$b | % { $a | ft -Property $_.name | out-file "$($_.name).txt" }
But the output text files contain column names, extra spaces, etc. I am unable to customize the above code. Kindly provide any help, and let me know if anyone needs more information.
Thank you,
Satish Kumar
The problem with your code is the use of ft (Format-Table), which formats the data from the CSV file for display, hence the column headers and spaces.
The following PowerShell script is a cleaner way to do it:
$csv = Import-Csv -Path 'D:\Final\customer.csv'
$columns = $csv | Get-Member -MemberType NoteProperty
foreach( $c in $columns )
{
foreach( $line in $csv )
{
Add-Content -Path $( $c.Name + '.txt' ) -Value $line.$( $c.Name )
}
}
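Note that Add-Content appends, so running the script twice doubles the content of each file. A possible variant, sketched here with in-memory sample data and a hypothetical output folder under the temp directory, writes each column in one go with Set-Content, which overwrites:

```powershell
# Sample data standing in for customer.csv.
$csv = @'
id,name
1,a
2,b
'@ | ConvertFrom-Csv

# Hypothetical output folder; adjust to wherever the .txt files should go.
$outDir = Join-Path ([IO.Path]::GetTempPath()) 'column-split-demo'
New-Item -ItemType Directory -Path $outDir -Force | Out-Null

# Column names come from the properties of the first row.
foreach ($name in @($csv)[0].psobject.Properties.Name) {
    # $csv.$name collects that column's value from every row.
    $csv.$name | Set-Content -Path (Join-Path $outDir "$name.txt")
}

Get-ChildItem $outDir
```

Reading the names from psobject.Properties also keeps the columns in file order, whereas Get-Member sorts them alphabetically.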

script to find given string and replace in all files in given directory

How do I write a PowerShell script that finds a given string in all files in a given directory and replaces it with a second given string?
thanks for any help,
bye
Maybe something like this
$files = Get-ChildItem "DirectoryContainingFiles"
foreach ($file in $files)
{
$content = Get-Content -path $file.fullname
$content | foreach {$_ -replace "toreplace", "replacewith"} |
Set-Content $file.fullname
}
If the string to replace spans multiple lines, Get-Content isn't going to cut it unless you stitch its output together into a single string. It's easier to use [IO.File]::ReadAllText() in this case, e.g.:
Get-ChildItem | Where {!$_.PSIsContainer} |
Foreach { $txt = [IO.File]::ReadAllText($_.FullName);
$txt = $txt -replace $old,$new; $txt | Out-File $_.FullName }
Note that with $old you may need a regex directive like '(?s)' at the beginning so that . also matches newline characters.
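A minimal sketch of the multi-line case, assuming PowerShell 3.0 or later (for Get-Content -Raw) and a throwaway temp file with made-up content; the BEGIN/END markers and the values of $old and $new are purely illustrative:

```powershell
# Throwaway file with a block that spans several lines.
$path = [IO.Path]::GetTempFileName()
Set-Content -LiteralPath $path -Value 'keep', 'BEGIN', 'old text', 'END', 'keep'

# (?s) lets . match newlines, so the pattern can span lines.
$old = '(?s)BEGIN.*?END'
$new = 'REPLACED'

# -Raw returns the whole file as one string instead of an array of lines.
$text = Get-Content -LiteralPath $path -Raw

# Write the result back without adding an extra trailing newline.
[IO.File]::WriteAllText($path, ($text -replace $old, $new))

Get-Content -LiteralPath $path
```

Without -Raw, -replace would be applied to each line separately and a pattern spanning several lines would never match.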
I believe you can already get the list of all files in a directory (simple?). Now comes the replacement part. Here is how you can do it with PowerShell:
type somefile.txt | %{$_ -replace "string_to_be_replaces","new_strings"}
Modify it as per your need. You can also redirect the output to a new file the same way you do other redirection (using: >).
To get the list of files, use:
Get-ChildItem <DIR_PATH> -name
