Problem: I'm working on a PowerShell script that will download the site's source code, find all the file targets, and then download those targets. Authentication isn't a concern for the moment, so on my test website I enabled anonymous authentication, enabled directory browsing, and disabled all other default pages, so all I get is a list of files on my site. What I have so far is this:
$source = "http://testsite/testfolder/"
$webclient = New-Object System.Net.WebClient
$destination = "c:/users/administrator/desktop/test/"
$webclient.DownloadString($source)
$webclient.DownloadString($source) returns essentially the source code of my site, and I can see the files I want wrapped in the rest of the markup. My question is: what is the best or easiest way of isolating the links I want, so I can run a foreach loop to download all of them?
Also, for extra credit: how would I go about adding code to download folders, and the files within those folders, from my site? I could at least write separate scripts to pull the files from each subfolder, but obviously it would be much nicer to get it all in one script.
If you are on PowerShell v3 the Invoke-WebRequest cmdlet may be of help.
To get an object representing the website:
Invoke-WebRequest "http://stackoverflow.com/search?tab=newest&q=powershell"
To get all the links in that website:
Invoke-WebRequest "http://stackoverflow.com/search?tab=newest&q=powershell" | select -ExpandProperty Links
And to just get a list of the href elements:
Invoke-WebRequest "http://stackoverflow.com/search?tab=newest&q=powershell" | select -ExpandProperty Links | select href
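To connect this back to your download goal, you could then fetch each linked file in a loop. Here is a minimal sketch, assuming an IIS directory listing where file links have extensions (so the filter below can skip the parent-directory link) and the href values end with the file name; adjust the filter to your site's actual markup:
$source      = "http://testsite/testfolder/"
$destination = "c:/users/administrator/desktop/test/"
# Keep only links that look like files; this skips the "[To Parent Directory]" link
$links = (Invoke-WebRequest $source).Links | Where-Object { $_.href -like "*.*" }
foreach ($link in $links) {
    $fileName = $link.href.Split('/')[-1]
    Invoke-WebRequest ($source + $fileName) -OutFile (Join-Path $destination $fileName)
}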
If you are on PowerShell v2 or earlier you'll have to create an InternetExplorer.Application COM object and use that to navigate the page:
$ie = new-object -com "InternetExplorer.Application"
# sleep for a second while IE launches
Start-Sleep -Seconds 1
$ie.Navigate("http://stackoverflow.com/search?tab=newest&q=powershell")
# sleep for a second while IE opens the page
Start-Sleep -Seconds 1
$ie.Document.Links | select IHTMLAnchorElement_href
# quit IE
$ie.Application.Quit()
Thanks to this blog post where I learnt about Invoke-WebRequest.
Update:
One could also download the website source like you posted and then extract the links from the source. Something like this:
$webclient.downloadstring($source) -split "<a\s+" | %{ [void]($_ -match "^href=[`'`"]([^`'`">\s]*)"); $matches[1] }
The -split part splits the source at each <a tag followed by one or more whitespace characters. The output is placed in an array, which I then pipe through a ForEach-Object block. There I match each fragment against a regex that extracts the href value and outputs it.
If you want to do more with the output you can pipe it further through another block which does something with it.
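For example, to feed the extracted links straight into a download loop with the same WebClient (a sketch that assumes the href values are plain file names or site-relative paths ending in the file name, as in a simple directory listing):
$links = $webclient.DownloadString($source) -split "<a\s+" |
    ForEach-Object { if ($_ -match "^href=[`'`"]([^`'`">\s]*)") { $matches[1] } }
foreach ($link in $links) {
    # Take the last path segment as the local file name
    $fileName = $link.Split('/')[-1]
    $webclient.DownloadFile($source + $fileName, $destination + $fileName)
}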
I was trying to use the dir command to recursively list all files ending with .cpp in a given directory. I tried to follow various solutions, but my PowerShell seems not to accept any options after the '/' sign, as seen in the screenshot below:
(screenshot of the error)
The command I initially tried was 'dir sourcefolder "*.cpp"', but it only lists files in the given folder (because I can't provide any additional options, as seen in the Microsoft docs). Any example command provided there also does not work for me, giving the same error as shown above.
Here is how I would bring out all the .cpp files. A small PowerShell program:
$path = "C:\temp\"
$filter = "*.cpp"
# -Recurse also searches subfolders, which the question asks for
$files = Get-ChildItem -Path $path -Filter $filter -Recurse
Write-Host "here, all the .cpp files in '$path' :"
Write-Host $files -Separator "`r`n"
I prefer to use the Get-ChildItem cmdlet rather than dir (in PowerShell, dir is simply an alias for Get-ChildItem).
Here is the content of the folder from my test (screenshot not reproduced).
And why so many / signs? PowerShell expects dash-prefixed named parameters, not /-style switches.
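That is the cause of the error in the screenshot: anything written after a / is treated as a path by PowerShell. So the recursive listing the question asks for can be written like this (C:\sourcefolder is a placeholder path):
# dir is an alias for Get-ChildItem, so either name works here
dir -Path C:\sourcefolder -Filter *.cpp -Recurse | Select-Object FullName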
I found this script to download Java x64 and it really works, but I had some problems.
The first is that the command has to live inside an XML file that a PowerShell script reads. Put in directly like this, it gave some errors, because where the script contains "<a" the XML parser understood it as markup rather than as something only PowerShell would make use of.
The second is New-Object -ComObject "InternetExplorer.Application": this is not working on my Windows Server, and it is recommended not to use it since Internet Explorer is being discontinued soon. It still works normally on Windows 10, but on Windows Server it gets stuck in a loop and won't get out.
How would I convert this script to use Invoke-WebRequest? Is this possible? Then I would just need to put the complete Invoke-WebRequest string in my XML file and PowerShell would read it normally, I think.
$ie = New-Object -ComObject "InternetExplorer.Application"
# Navigate to the requested page
$ie.Navigate2("https://www.java.com/en/download/manual.jsp")
$anchor = $null
while($anchor -eq $null -or $anchor -eq "")
{
    # wait 1 second for the page to load
    Start-Sleep -m 1000
    # get the html of the page
    $html = $ie.document.body.innerHTML
    # apply your regex to identify the anchor with the download link
    $anchor = [regex]::Match($html, '(?:<a title="Download Java software for Windows \(64-bit\)" href=")(.*)(?:">)').Groups[1].Value
}
# the regex doesn't return the link cleanly, which is why I take the substring up to the closing quote
$url_download = $anchor.Substring(0, $anchor.IndexOf('"'))
$url_download
Edit: Is there an equivalent of this, but to download Edge?
Note: Neither Invoke-WebRequest nor the built-in .NET clients for obtaining files over HTTP seem to support rendering the full DOM, so JavaScript cannot be executed. JavaScript is required to access those download links and to use the site in general. You have two choices:
Use static links as I have outlined in my original answer below; or
Automate Edge using WebDriver, which is how Microsoft recommends you automate MS Edge. There is no COM functionality for controlling the Edge browser.
Unfortunately, I cannot help with the latter as I have no experience using WebDriver.
Looking at the oraclejdk Chocolatey package installation script, the URL is
https://download.oracle.com/java/17/archive/jdk-17.0.2_windows-x64_bin.msi. Since you're already familiar with Chocolatey from another environment, I would figure out the version you need, see if there is a Chocolatey package for it, and get the direct URL from that package version's installation script.
You could also attempt to templatize the URL like so:
https://download.oracle.com/java/MAJOR_VERSION/archive/jdk-MAJOR_VERSION.MINOR_VERSION.PATCH_windows-x64_bin.msi
where MAJOR_VERSION, MINOR_VERSION, and PATCH are pieces of the Java version. However, I have not tested that all Java MSI URLs follow this pattern.
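If the pattern does hold, building the URL from the version pieces might look like this (a sketch; the version numbers are just an example, and the resulting URL should be verified before relying on it):
# Hypothetical version pieces for the release you need
$MajorVersion = 17
$MinorVersion = 0
$Patch        = 2
$MSI_URL = "https://download.oracle.com/java/$MajorVersion/archive/jdk-$MajorVersion.$MinorVersion.${Patch}_windows-x64_bin.msi"
$MSI_URL   # https://download.oracle.com/java/17/archive/jdk-17.0.2_windows-x64_bin.msi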
Regardless, once you have the URL, it's as simple as:
# Work around performance issue with iwr and the progress bar
$ProgressPreference = 'SilentlyContinue'
$MSI_URL = 'https://download.oracle.com/java/17/archive/jdk-17.0.2_windows-x64_bin.msi'
Invoke-WebRequest -UseBasicParsing $MSI_URL -OutFile 'jdk-17.0.2_windows-x64_bin.msi'
I've just started using PowerShell, and I have a task where I need the path of a file displayed on screen when I enter the file name.
Is there a script that allows me to do the following?
Ex 1: I enter "test.txt" and I get "C:\Program Files...."
Ex 2: I enter a file name "My Documents" and I also get its path.
I have searched online on how to do this but I didn't quite find what I was looking for and all the queries/answers were too complicated for me to understand.
Can anyone help me out, please?
Thanks in advance!
Here is a starter sample for you.
This example searches only within the confines of the paths present in the Path system environment variable. It also looks only for files and does not recurse through those paths.
So anything you could access directly from the command line should be available to you through it.
Now, if you want to search the whole drive, you could replace the $DefaultPaths assignment with Get-ChildItem -Path 'C:\' -Recurse, but doing that each time won't be super efficient.
You could do it and it will work... but it will be slow.
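For completeness, a sketch of that slow whole-drive variant (expect it to take a while on a large disk):
$Filter = 'notepad.exe'
# Walk the entire C: drive; SilentlyContinue skips folders the current user cannot read
$Paths = Get-ChildItem -Path 'C:\' -Filter $Filter -File -Recurse -ErrorAction SilentlyContinue |
    ForEach-Object { $_.Directory.FullName } |
    Sort-Object -Unique
$Paths | Out-String | Write-Host -ForegroundColor Cyan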
To search the whole drive or whole filesystem, there are alternative methods that might work better. Some options that might be enticing:
Using a database, which you have to build and maintain, to index all the files, so that when you search, the results are instantaneous and/or very fast
Parsing the MFT (if using a Windows / NTFS filesystem only) instead of using Get-ChildItem (this is not something natively doable through a simple cmdlet, though)
Relying on third-party software and interfacing with it (for example, Void Tools' Everything search engine already parses the MFT and builds its own database, allowing users to search a Windows NTFS filesystem instantly. It also has its own SDK you can call from PowerShell to retrieve what you seek instantly. The caveat is that the software needs to be installed first for that solution to work.)
Example: Searching through all paths defined in the Path variable
# What you are looking for. Accept wildcards characters (*)
$Filter = 'notepad.exe'
# Get the System Environment Path variable in an array
$DefaultPaths = $env:Path -split ';'
$Paths =
    Foreach ($P in $DefaultPaths) {
        # Search for files matching the specified filter.
        # Ignore errors (often because a path is listed in Path but does not exist)
        $MatchingFiles = Get-ChildItem -Path $P -Filter $Filter -File -ErrorAction SilentlyContinue
        if ($MatchingFiles.count -gt 0) {
            $MatchingFiles.Directory.FullName
        }
    }
$Paths | out-string | Write-Host -ForegroundColor Cyan
Output for a Notepad.exe search using this method:
C:\Windows\system32
C:\Windows
What I'm trying to accomplish:
Create a PS script to run from a single Admin machine, but search against C$ on all Windows servers in AD.
Search for a specific list of paths\filenames that I provide.
If ANY of the specific list of paths\filenames are found on a server, THEN output the server name, and paths\filenames to a *.CSV file titled "Badfiles.csv" on the Admin machine.
I was trying to build from the following syntax, but admittedly my old brain is not good at this stuff, and I've only specified a single file here. How do I refer to a list of multiple paths\filenames? Thank you for helping an old lady out. :)
$name= gc env:computername
$computers= get-content -path C:\Windows\Temp\v.bat
$csvfile = "c:\temp\$badfiles.csv"
foreach ($computer in $computers) {
"\$computer\C$\" | Get-ChildItem -recurse -filter "*.bat"
}
To refer to a list of items, whether those are files or computer names, you will need to use what is called an array.
You can create an array in many ways. In your case it might be best to create the list in a TXT file; then, in PowerShell, read its contents using Get-Content and save the result in a variable, and it will automatically be saved as an array!
Then iterate through it using what is called a foreach loop, which lets you take each item in the array, do something with it, then move to the next item, and so on until every item has been dealt with.
Now the most important part of what you want to achieve is not clear. Let me explain.
To check if a file exists you can use Test-Path. It returns true or false, and you can then act upon that result. For this check to work, you need to provide the exact path and name of the file.
If you don't know the exact names and paths of the files to be checked, you can use Get-ChildItem, similar to the code you provided. The caveat here is that you have to narrow the scope of the file search as much as you can. In your example you search for the .bat file extension on the whole machine, and that can cause issues: a typical C drive has hundreds of thousands, if not millions, of files and folders, and parsing all of them can take a long time.
This is an important distinction to understand, and what causes confusion for me is that in point 2 you say "Search for a specific list of paths\filenames that I provide", yet in the code you use Get-ChildItem to get all files instead of providing a list of filenames.
Further I will assume you have a list of filenames with exact known paths.
Now in your given code I can see you have found some of the right commands but they need to be arranged differently to produce the results you need.
Please review this example code that might help you further:
Example ComputerList.txt file content (list of computer hostnames to check):
computer1
serverXYZ
laptop123
Example FileList.txt file content (list of files to check for on each of the above computers):
c:\temp\virus.bat
c:\games\game.exe
c:\Pictures\personal.jpg
Now the PowerShell code:
# Get the lists of items from the TXT files and save them as arrays in variables
$ComputerNames = Get-Content 'c:\temp\ComputerList.txt'
$FileList = Get-Content 'c:\temp\FileList.txt'
# Define the path and name of the CSV report
$BadFiles = "c:\temp\badfiles.csv"
# Initialize the array that will collect the found bad files
$BadFileList = @()
# Iterate through each hostname in the computer list
foreach($computer in $ComputerNames){
    # Iterate through each file in the list and test its path on the current computer
    foreach($file in $FileList){
        # Convert the file path to a C$ share path
        $file = $file -replace("c:","c$")
        # Define the path of the file to test
        $FileToTest = "\\$computer\$file"
        # Test the path of the current file and append the report if it was found
        if (Test-Path $FileToTest -ErrorAction SilentlyContinue){
            # This block runs only when a bad file has been found
            # Build a properly formatted entry for the resulting CSV file
            $BadFile = "" | select Computer,File
            # Save information about the current computer
            $BadFile.Computer = $computer
            # Save information about the current file
            $BadFile.File = $file
            # Append the entry to the array of found bad files
            $BadFileList += $BadFile
        }
    }
}
# When done iterating through every computer and file, save the results in a CSV file
$BadFileList | ConvertTo-Csv -NoTypeInformation | Out-File $BadFiles
The above is a full code snippet you can test and run in your environment. First please create the two TXT files and make sure you run PowerShell with the appropriate permissions to access the C$ network shares of the servers.
The snippet should work but I have not tested it myself. Let me know if there are any errors.
Please test and feel free to ask if you have any follow up questions.
What I am trying to do is download 2 images from URL's and open them after download. Here's what I have:
#echo off
set files='https://cdn.suwalls.com/wallpapers/cars/mclaren-f1-gtr-42852-400x250.jpg','http://www.dubmagazine.com/home/media/k2/galleries/9012/GTR_0006_EM-2014-12-21_04_GTR_007.jpg'
powershell "(%files%)|foreach{$fileName='%TEMP%'+(Split-Path -Path $_ -Leaf);(new-object System.Net.WebClient).DownloadFile($_,$fileName);Invoke-Item $fileName;}"
I'm getting "Cannot find drive. A drive with the name 'https' cannot be found."
It's the Split-Path command that is having problems, but I can't seem to find a solution.
You could get away with basic string manipulation but, if the option is available, I would opt for anything that is data-aware. In your case you could use the [uri] type accelerator to help with this. I would also opt for pure PowerShell instead of splitting the work between batch and PowerShell.
$urls = 'https://cdn.suwalls.com/wallpapers/cars/mclaren-f1-gtr-42852-400x250.jpg',
'http://www.dubmagazine.com/home/media/k2/galleries/9012/GTR_0006_EM-2014-12-21_04_GTR_007.jpg'
$urls | ForEach-Object{
$uri = [uri]$_
Invoke-WebRequest $_ -OutFile ([io.path]::combine($env:TEMP,$uri.Segments[-1]))
}
Segments gets you the last portion of the URL, which in your case is a proper file name. Combine() builds the target destination path for you. Feel free to add your Invoke-Item logic, of course.
This also lacks error handling for when the URL cannot be accessed, so be aware of that possibility. The code above was meant to be brief and to give direction.
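For instance, here is the same loop sketched with the Invoke-Item step and a basic try/catch added (the warning text is just an example):
$urls | ForEach-Object {
    $uri    = [uri]$_
    $target = [io.path]::Combine($env:TEMP, $uri.Segments[-1])
    try {
        Invoke-WebRequest $uri -OutFile $target -ErrorAction Stop
        # Open the image with its default handler once the download succeeds
        Invoke-Item $target
    }
    catch {
        Write-Warning "Could not download $uri : $($_.Exception.Message)"
    }
}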