I am having a hard time with PowerShell (because I am learning it on the fly). I have a huge amount of data and I am trying to find a unique identifier for every folder with data. I wrote a script which simply MD5-hashes every folder recursively and saves the hash value for each folder, but as you might have already guessed, it is super slow. So I thought I would hash only the metadata, but I have no idea how to do this in PowerShell. The ideas from the internet are not working: they always return the same hash value. Has anyone had a similar problem? Is there a magic PowerShell trick to perform such a task?
Sorry for the lack of precision.
I have a big list of ~20000 folders. Every folder contains unique data: photos, files, etc. I iterated through every folder and computed the hash of every file (I actually used a CryptoStream here, so I had one hash for all the data). This solution takes ages.
The solution I wanted to adopt was to use the metadata, like the properties returned by this command:
Get-ChildItem -Path $Env:USERPROFILE\Desktop -Force | Select-Object -First 1 | Format-List *
But hashing this always gives me the same value, even when something has changed. I need to be able to check whether anything in those files has changed.
First, create an MD5 class that does not create a new instance of System.Security.Cryptography.MD5 every time we create an MD5 from a string.
class MD5 {
    # Reuse a single hasher instance instead of creating one per call
    static hidden [System.Security.Cryptography.MD5]$_md5 = [System.Security.Cryptography.MD5]::Create()

    # Hash a string and return the hex digest (hyphen-separated)
    static [string]Create([string]$inputString) {
        return [BitConverter]::ToString([MD5]::_md5.ComputeHash([Text.Encoding]::ASCII.GetBytes($inputString)))
    }
}
Second, figure out a way to use each child item's Name, Length, CreationTimeUtc, and LastWriteTimeUtc to create a unique ID string per child in the folder, merge those into a single string, and create an MD5 hash of that resulting string.
Get the child objects of a folder.
Select only certain properties, returning the content as a string array.
Join the string array into a single string; there's no need to join with newlines.
Convert the string into an MD5.
Output the newly created MD5.
$ChildItems = Get-ChildItem -Path $Env:USERPROFILE\Desktop -Force
$SelectProperties = [string[]]($ChildItems | Select-Object -Property Name, Length, CreationTimeUtc, LastWriteTimeUtc)
$JoinedText = $SelectProperties -join ''
$MD5 = [MD5]::Create($JoinedText)
$MD5
Alternatively, join the above lines into one very long command.
$AltMD5 = [MD5]::Create([string[]](Get-ChildItem -Path $Env:USERPROFILE\Desktop -Force | Select-Object -Property Name, Length, CreationTimeUtc, LastWriteTimeUtc) -join '')
$AltMD5
The resulting MD5 should be a unique signature of the folder's contents only, not of the folder itself; in theory you could rename the folder and the MD5 would remain the same.
Not exactly sure how you aim to use this, but be aware that if any file or sub-folder in the folder changes, the MD5 for the folder will also change.
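If it helps, here is a minimal sketch, not part of the answer above, that wraps the approach into a reusable function and runs it over every folder under a root (it assumes the [MD5] class above is defined; C:\Data is a hypothetical root holding the ~20000 folders):
function Get-FolderMetadataHash {
    param([string]$Path)
    # Stringify the metadata of every child item, then hash the joined string
    $children = Get-ChildItem -Path $Path -Force
    $text = [string[]]($children |
        Select-Object -Property Name, Length, CreationTimeUtc, LastWriteTimeUtc) -join ''
    return [MD5]::Create($text)
}
Get-ChildItem -Path 'C:\Data' -Directory | ForEach-Object {
    [pscustomobject]@{ Folder = $_.FullName; Hash = Get-FolderMetadataHash $_.FullName }
}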
Continuing from my comment.
As per this resource:
3rd-party tool: http://www.idrix.fr/Root/Samples/DirHash.zip
function Get-FolderHash ($folder)
{
    # Collect the raw bytes of every file under the folder (initialise the
    # buffer so a variable of the same name in the caller's scope can't leak in)
    [Byte[]]$contents = @()
    Get-ChildItem $folder -Recurse | Where-Object { !$_.PSIsContainer } |
        ForEach-Object { $contents += [System.IO.File]::ReadAllBytes($_.FullName) }

    # SHA1-hash the combined bytes and return the hex string
    $hasher = [System.Security.Cryptography.SHA1]::Create()
    [string]::Join("", ($hasher.ComputeHash($contents) | ForEach-Object { "{0:x2}" -f $_ }))
}
Note that I've not tested/validated either of the above and will leave that to you.
Lastly, this is not the first time this kind of question has been asked on SO, using the default cmdlets and some .NET, so this could be seen/marked as a duplicate.
$HashString = (Get-ChildItem C:\Temp -Recurse |
Get-FileHash -Algorithm MD5).Hash |
Out-String
Get-FileHash -InputStream ([IO.MemoryStream]::new([char[]]$HashString))
Original, faster but less robust, method:
$HashString = Get-ChildItem C:\script\test\TestFolders -Recurse | Out-String
Get-FileHash -InputStream ([IO.MemoryStream]::new([char[]]$HashString))
This could be condensed into one line if wanted, although it starts getting harder to read:
Get-FileHash -InputStream ([IO.MemoryStream]::new([char[]]"$(Get-ChildItem C:\script\test\TestFolders -Recurse|Out-String)"))
Whether it's faster or fast enough for your use case is a different matter, but it does ensure that you get a different hash when the target folder's contents change.
I want to know if any of the folders in a directory have any subfolders or files in them. I tried just looking at the directory in PowerShell, but it gave me only mode, last write time, and name. Is there any way of adding to this list to include metadata of the folder, like size or number of sub-files/folders? All I want to know is whether they are empty or not, so there may be a simpler way I'm missing.
Thanks for any help you can give!
I see the question is tagged 'windows', so on Windows you could also use a COM object.
$fso = New-Object -ComObject Scripting.FileSystemObject
$folder = $fso.GetFolder($pathToFolder)
$folder will be an object with a bunch of interesting metadata on it, including SubFolders and Files. One of the interesting ones is Size: if Size is zero, there are no files in that directory, nor in any nested subdirectories.
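A short sketch of that emptiness check ('C:\Test' is just a stand-in path):
$fso = New-Object -ComObject Scripting.FileSystemObject
$folder = $fso.GetFolder('C:\Test')
if ($folder.Size -eq 0 -and $folder.SubFolders.Count -eq 0) {
    'Completely empty: no files and no subfolders'
}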
If you just want to know if there are folders/subfolders and/or files then this will work:
$folder="C:\Test"
Get-ChildItem $folder -Recurse | Measure-Object
Output (in my case)
Count : 2
Average :
Sum :
Maximum :
Minimum :
Property :
If you want to see more properties then this might work for you:
Get-ChildItem -Path $folder -Recurse | Format-List *
Alternatively, you can also select the first x, last x, or even skip items:
Get-ChildItem -Path $folder -Recurse |Select-Object -First 2| Format-List *
* -Recurse will check all folders below the given path
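Building on the Measure-Object approach, here is a small hedged sketch that lists only the folders that are completely empty (requires PowerShell 3+ for -Directory):
Get-ChildItem -Path $folder -Directory -Recurse |
    Where-Object { (Get-ChildItem -Path $_.FullName -Force | Measure-Object).Count -eq 0 }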
Is there a way to stop PowerShell from sorting by default? I need to read in files from a directory, and the order in which they are listed in the directory should also be the order in the array (variable). Even when I use -lastwritetime on the get-childitem command, it seems to have no effect. The primary reason I want to do this is that the files have names that are all the same except each file has a number after it, like the following:
document1.doc
document2.doc
document3.doc
.....
document110.doc
The problem is that if it's sorted by name, it will sort in this manner:
document1.doc
document10.doc
document111.doc
Which is horribly wrong!
Right now I have this command and it doesn't work:
$filesnames1 = get-childItem -name *.doc -Path c:\fileFolder\test | sort-object -LastWriteTime
You probably want something more along these lines.
$filesnames1 = Get-ChildItem -Path c:\fileFolder\test\*.doc |
Sort-Object -Property LastWriteTime
I don't think either of those two cmdlets has a -LastWriteTime parameter.
If you need only the names from those filesystem objects, you can use ($filesnames1).Name after the code above. There are other ways.
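As an aside, not part of the answer above: if the real goal is a natural numeric sort by filename (document1, document2, ..., document110), one sketch is to sort on the numeric suffix, assuming every file matches the documentN.doc pattern:
Get-ChildItem -Path c:\fileFolder\test -Filter *.doc |
    Sort-Object -Property { [int]($_.BaseName -replace '\D', '') } |
    Select-Object -ExpandProperty Name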
Thanks for responding, Mike. What I did was put a "-Filter *.doc" just before -Path, which gave me the headers. Then I piped in a "Select-Object -ExpandProperty Name" to list it exactly how I needed. It was a little trial and error, but I did eventually figure it out.
$filesnames1 = Get-ChildItem -Filter *.doc -Path c:\fileFolder\test |
    Sort-Object -Property LastWriteTime | Select-Object -ExpandProperty Name
I am writing a PowerShell module to look for data that each user who has logged onto the computer at some point might have in their directory in HKEY_USERS. My initial thought was to mount HKEY_USERS, find a way to store each user's SID in a string variable, and then loop through all folders like so:
dir HKU\<STRING VARIABLE HOLDING SID>\Software\MyApp\Mydesireddata
Is there a way I can avoid having to loop through SIDs (because I won't know them ahead of time), and extract that file info from each SID on the system while remembering which SID it came from?
EDIT: Here is an example of the key I'm trying to extract from each user's SID using regedit (vncviewer's EulaAccepted)
Use Get-ChildItem to retrieve each user-specific subkey:
$UserHives = Get-ChildItem Registry::HKEY_USERS\ | Where-Object { $_.Name -match '^HKEY_USERS\\S-1-5-21-[\d\-]+$' }
Then loop over each entry and retrieve the desired registry value:
foreach($Hive in $UserHives)
{
    # Construct path from base key
    $Path = Join-Path $Hive.PSPath "SOFTWARE\MyApp\DataKey"

    # Attempt to retrieve the item property
    $Item = Get-ItemProperty -Path $Path -Name ValueName -ErrorAction SilentlyContinue

    # Check whether the item property was there or not
    if($Item)
    {
        $Item.ValueName
    }
    else
    {
        # doesn't exist
    }
}
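Since you also want to remember which SID each value came from, a small variation (same placeholder key and value names as above) could emit both together:
foreach($Hive in $UserHives)
{
    $Path = Join-Path $Hive.PSPath "SOFTWARE\MyApp\DataKey"
    $Item = Get-ItemProperty -Path $Path -Name ValueName -ErrorAction SilentlyContinue
    if($Item)
    {
        # PSChildName of the hive key is the SID itself
        [pscustomobject]@{
            SID   = $Hive.PSChildName
            Value = $Item.ValueName
        }
    }
}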
I tackled this issue a slightly different way, preferring to make use of a conspicuously placed wildcard.
Get-ItemProperty -Path Registry::HKEY_USERS\*\SOFTWARE\TestVNC\viewer\ -Name EulaAccepted |
    Select-Object -Property @{n="SID";e={$_.PSPath.Split('::')[-1].Split('\')[1]}},EulaAccepted
The wildcard will automatically check all available paths and return what you need as well as the SID from the parent path.
As for the username (which is probably more useful than a SID), you didn't specifically ask for it, but I added it in for grins; this should cover local and domain accounts.
mind the line breaks
Get-ItemProperty -Path Registry::HKEY_USERS\*\SOFTWARE\TestVNC\viewer\ -Name EulaAccepted |
    Select-Object -Property @{n="SID";e={$_.PSPath.Split('::')[-1].Split('\')[1]}},EulaAccepted |
    Select-Object -Property @{n="User";e={[System.Security.Principal.SecurityIdentifier]::new($_.SID).`
        Translate([System.Security.Principal.NTAccount]).Value}},SID,EulaAccepted
Getting the username is just ugly; there's likely a cleaner way to get it, but that's what I have in my head. The double Select-Object really makes my skin crawl. I could do it in one shot, but then it gets so unwieldy you can't even tell what it's doing by looking at it.
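For what it's worth, one hedged way to drop the double Select-Object is to compute both calculated properties in a single pass (same SID-extraction expression as above):
Get-ItemProperty -Path Registry::HKEY_USERS\*\SOFTWARE\TestVNC\viewer\ -Name EulaAccepted |
    Select-Object -Property @{n='User';e={
            $sid = $_.PSPath.Split('::')[-1].Split('\')[1]
            [System.Security.Principal.SecurityIdentifier]::new($sid).Translate(
                [System.Security.Principal.NTAccount]).Value
        }},
        @{n='SID';e={$_.PSPath.Split('::')[-1].Split('\')[1]}},
        EulaAccepted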
I've included a screenshot of the registry below, and a screenshot of the screen output from running the few lines.
I have generated a list of MD5 checksum values from a directory within my project using PowerShell's Get-FileHash cmdlet and then exported the values to a .csv file.
$path = "C:\Users\Krishnaa\Documents\Visual Studio 2012\Projects\NamePrint\NamePrint\obj\Debug"
$hash = Get-FileHash -Path $path\* -Algorithm MD5
$export = $hash | Export-csv $path\hashfile.csv
This is what the output looks like if I call $hash: http://i.stack.imgur.com/Owi0Q.png
Then I imported the .csv file back to the Powershell console.
$import = Import-csv $path\hashfile.csv | Format-Table
And when I call $import, it outputs this: http://i.stack.imgur.com/cqvsO.png
When I created a simple function of my own to compare the contents, I ran into a problem whereby it says that the contents do not match. I do understand that each line in a .csv is treated as an object by PowerShell. How do I compare object to object in PowerShell?
One problem with your above code is your use of Import-Csv. You aren't assigning the objects returned by Import-Csv to $import; you're assigning the array of formatting objects returned by Format-Table. If you drop the Format-Table, you should be able to compare $import.Hash with $hash.Hash (although you may need to loop through and compare row by row).
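For the object-to-object comparison itself, Compare-Object is a reasonable fit; a minimal sketch, assuming $path and $hash from your code above:
$import = Import-Csv $path\hashfile.csv    # note: no Format-Table here
Compare-Object -ReferenceObject $hash -DifferenceObject $import -Property Hash
No output from Compare-Object means every Hash value matched.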
I'm looking to thin down how many folders I need to recover after a cryptolocker outbreak at a client's site, and I started looking into PowerShell as a good way to do this. What I need to do is recover a folder if it has any file inside with the extension .encrypted.
I can run the below
get-childitem C:\ -recurse -filter "*.encrypted" | %{$_.DirectoryName} | Get-Unique
and get a list of all folders that have .encrypted files in them. What I would like to do is thin down the list: for example, take the file list below and assume * means the folder contains encrypted files.
C:\Folder1
C:\Folder1\Folder2\Folder4*
C:\Folder1\Folder2*
C:\Folder1\Folder3\Folder5*
C:\Folder1\Folder3\Folder6\
rather than returning
C:\Folder1\Folder2\Folder4*
C:\Folder1\Folder2*
C:\Folder1\Folder3\Folder5*
I would like it to return just the following, as this would be the optimal recovery option:
C:\Folder1\Folder2*
C:\Folder1\Folder3\Folder5*
I know this is a fairly complex problem so I'm not asking anyone to solve it for me just some pointers in the right direction would be awesome as my brain is fried at the moment and I need to write this fairly quickly.
Here's a simple way to do this that should be pretty efficient:
PS C:\> dir -ad -rec | where { test-path (join-path $_.FullName *.encrypted) }
dir is an alias for get-childitem
where is an alias for where-object
-ad means return directories only
-rec means recurse
test-path returns $true if the path exists (yes, it handles wildcards)
So, we recurse through all folders, forwarding each folder object down the pipeline. We get the full name of the folder and append *.encrypted to it. If test-path returns $true for this path, we forward the folder down the pipeline, and the folder ends up in the console output.
Now, if you want to get a little fancier, here's a more fleshed-out one-liner that will report the folders and the encrypted-file counts into a csv file named after the machine:
dir -ad -rec | ? { test-path (join-path $_.FullName *.encrypted) } | % {
    [pscustomobject]@{"Path"=$_.FullName;"Count"=(dir (join-path $_ *.encrypted)).count} } |
    Export-Csv "c:\temp\$(hostname).csv" -NoTypeInformation
(? and % are aliases for where-object and foreach-object respectively)
With a little more effort, you could use a fan-out scan of the entire company assuming powershell remoting is enabled on each target machine and have it return all results to you from all machines.
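A hedged sketch of that fan-out idea, assuming remoting is enabled and that $computers holds your (hypothetical) machine names:
$computers = 'PC01', 'PC02'   # hypothetical machine list
Invoke-Command -ComputerName $computers -ScriptBlock {
    Get-ChildItem C:\ -Directory -Recurse -ErrorAction SilentlyContinue |
        Where-Object { Test-Path (Join-Path $_.FullName '*.encrypted') }
} | Select-Object PSComputerName, FullName
Invoke-Command tags each result with PSComputerName, so you can tell which machine each folder came from.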
Good luck!
This is too much for a comment, but I don't know that it would be a good answer either; it's just a kind of hackish way to get it done...
The only thing I could think of is to get your list of folders, then match them all against each other, and when two at least partially match, remove the longer one.
$FullList = Get-ChildItem C:\ -Recurse -Filter *.encrypted | Select-Object -ExpandProperty DirectoryName -Unique | Sort-Object -Property Length
$ToRemove = @()
foreach($Folder in $FullList){ $ToRemove += $FullList | Where-Object { $_ -ne $Folder -and ($_ -match [regex]::Escape($Folder)) } }
$FinalList = $FullList | Where-Object { $ToRemove -notcontains $_ }
That's going to be slow though; there has to be a better way to do it, I just haven't thought of one yet. Don't get me wrong, this will work, and it's faster than going through things by hand for sure, but I'm sure there's a better way.
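One possible direction, offered only as a sketch: sort the unique folder paths from shortest to longest, then keep a path only when no already-kept path is its parent. Each folder is then compared only against the kept (top-most) folders rather than against the whole list:
$FullList = Get-ChildItem C:\ -Recurse -Filter *.encrypted |
    Select-Object -ExpandProperty DirectoryName -Unique |
    Sort-Object -Property Length
$FinalList = [System.Collections.Generic.List[string]]::new()
foreach ($Folder in $FullList) {
    # a shorter kept path that prefixes this one means a parent already covers it
    $covered = $false
    foreach ($kept in $FinalList) {
        if ($Folder -like "$kept\*") { $covered = $true; break }
    }
    if (-not $covered) { $FinalList.Add($Folder) }
}
$FinalList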