PowerShell v2, getting specific lines from file, sorting

I have a text file with a simple structure, which is actually the contents of an FTP directory:
1.0
1.0a
10.0
10.0b
11.0
11.0f
2.0
3.0
4.0
...(and so on)
random string
random string
I'm using Get-Content to get the contents of the file, but then I want to be able to retrieve only the lines that contain the max number and the max-1 number. In this case, for example, I would want it to return:
10.0
10.0b
11.0
11.0f
I tried using Sort-Object but it didn't work. Is there a way to use Sort-Object so that it knows it is sorting numbers and not strings (so that it doesn't place 10 after 1), sorts according to the digits before the full stop, and ignores the random strings at the end altogether?
Or if you have another method to suggest, please do so. Thank you.

You can pass scriptblocks to some cmdlets, in this case Sort-Object and Group-Object. To clarify a bit more:
Load the data
Get-Content foo.txt |
Group by the number (ignoring the suffix, if present):
Group-Object { $_ -replace '\D+$' } |
This will remove non-digits at the end of the string first and use the remainder of the string (hopefully now just containing a floating-point number) as the group name.
Sort by that group name, numerically.
Sort-Object { [int] $_.Name } |
This is done simply by converting the name of the group to a number and sorting by that, similar to how we grouped by something derived from the original line.
Then we can get the last two groups, representing all lines with the maximum number and second-to-maximum number, and unwrap the groups. The -Last parameter is fairly self-explanatory; -ExpandProperty selects the values of a property instead of constructing a new object with a filtered property list:
Select-Object -Last 2 -ExpandProperty Group
And there we are. You can try this pipeline at various stages just to get a feeling for what the commands do:
PS Home:\> gc foo.txt
1.0
1.0a
10.0
10.0b
11.0
11.0f
2.0
3.0
4.0
PS Home:\> gc foo.txt | group {$_ -replace '\D+$'}
Count Name Group
----- ---- -----
2 1.0 {1.0, 1.0a}
2 10.0 {10.0, 10.0b}
2 11.0 {11.0, 11.0f}
1 2.0 {2.0}
1 3.0 {3.0}
1 4.0 {4.0}
PS Home:\> gc foo.txt | group {$_ -replace '\D+$'} | sort {[int]$_.Name}
Count Name Group
----- ---- -----
2 1.0 {1.0, 1.0a}
1 2.0 {2.0}
1 3.0 {3.0}
1 4.0 {4.0}
2 10.0 {10.0, 10.0b}
2 11.0 {11.0, 11.0f}
PS Home:\> gc foo.txt | group {$_ -replace '\D+$'} | sort {[int]$_.Name} | select -l 2 -exp group
10.0
10.0b
11.0
11.0f
If you need the items within the groups (and thus in the final result for the last two groups) sorted by suffix, you can stick another Sort-Object directly after the Get-Content.
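Putting it all together with that extra sort added, a sketch over the sample file from above:

```powershell
# Sort the raw lines first so each group's members end up in suffix order,
# then group by the numeric prefix, sort the groups numerically,
# and unwrap the last two groups.
Get-Content foo.txt |
    Sort-Object |
    Group-Object { $_ -replace '\D+$' } |
    Sort-Object { [int] $_.Name } |
    Select-Object -Last 2 -ExpandProperty Group
```

Group-Object preserves the input order within each group, so the per-group suffix order from the first Sort-Object survives to the final output.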

You can pass an expression to Sort-Object; the sort will then use that expression to order the objects. This is done by passing a hash table with the key Expression (can be abbreviated to e). To reverse the order, add a second key Descending (or d) with the value $true.
In your case:
...input... | Sort @{e={convert $_ as required}}
Multiple property names and hash tables can be supplied, so 11.0f could be split into a number and a suffix.
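As a sketch of that (assuming, as in the sample data, that a suffix is just trailing letters):

```powershell
# Sort by the numeric prefix (as a number) first, then by the letter suffix
'10.0b','1.0','2.0','10.0','1.0a' | Sort-Object `
    @{ e = { [double]($_ -replace '[a-z]+$','') } },
    @{ e = { $_ -replace '^[\d.]+','' } }
# yields 1.0, 1.0a, 2.0, 10.0, 10.0b
```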
If there is a lot of overlap between the sort expressions you could pre-process the input into objects with the sort properties first (and remove after):
...input... | %{
    if ($_ -match '^(\d+\.0)(.)?') {
        New-Object PSObject -Prop @{ value = $_; a = [double]::Parse($matches[1]); b = $matches[2] }
    } else {
        New-Object PSObject -Prop @{ value = $_; a = [double]::MinValue; b = $null }
    }
} | sort a,b | select -expand value

Related

Extract multiple columns from multiple test files in powershell

I got 450 files from computational model calculations for a nanosystem. Each of these files contains three top lines with title, conditions and date/time. The fourth line has column labels (x y z t n m lag lead bus cond rema dock). Data starts from the fifth line and goes up to the 55th line. There are multiple spaces as the delimiter, and the number of spaces is not fixed.
I want to
I) create new text files with only the x y z n m rema columns
II) get only the x y z and n values of all txt files into a single file
How to do it in PowerShell? Please help!
Based on your description, I guess the content of your files looks something like this:
Title: MyFile
Conditions: Critical
Date: 2020-02-23T11:33:02
x y z t n m lag lead bus cond rema dock
sdasd asdfafd awef wefaewf aefawef aefawrgt eyjrteujer bhtnju qerfqeg 524rwefqwert q3tgqr4fqr4 qregq5g
avftgwb ryhwtwtgqreg efqerfe rgwetgq ergqreq erwf ef 476j q4 w4th2 ef 42r13gg asdfasdrv
You can always read files like that line by line and keep only the lines you actually want. In your case, the data is in lines 4-55 (including the header).
To get to that data, you can use this command:
Get-Content MyFile.txt | Select-Object -Skip 3 -First 52
If you can confirm that this is the data you want, you can start working on the next issue: the multiple-spaces delimiter.
Since the number of spaces is not fixed, you need to replace multiple spaces with a single space. Assuming that the values you are looking for contain no spaces themselves, you can add this to your pipeline:
Get-Content C:\MyFile.txt | Select-Object -Skip 3 -First 52 | ForEach-Object {$_ -replace '( )+',' '}
The '( )+' part means one or more spaces.
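You can sanity-check the replacement on its own, for instance:

```powershell
'x   y  z' -replace '( )+',' '   # yields 'x y z'
```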
Now you have proper CSV data. To turn it into proper objects, convert it from CSV like this:
ConvertFrom-Csv -InputObject (Get-Content C:\MyFile.txt | Select-Object -Skip 3 -First 52 | ForEach-Object {$_ -replace '( )+',' '}) -Delimiter ' '
From here it is pretty simple to select the values you want:
ConvertFrom-Csv -InputObject (Get-Content C:\MyFile.txt | Select-Object -Skip 3 -First 52 | ForEach-Object {$_ -replace '( )+',' '}) -Delimiter ' ' | Select-Object x,y,z,n,m,rema
You also need to process all the files, so you might start by getting them like this:
foreach ($file in (Get-ChildItem C:\MyFiles -Filter *.txt)){
    ConvertFrom-Csv -InputObject (Get-Content $file.FullName | Select-Object -Skip 3 -First 52 | ForEach-Object {$_ -replace '( )+',' '}) -Delimiter ' ' | Select-Object x,y,z,n,m,rema
}
You might want to split up the code into a more read-able format, but this should pretty much cover it.
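For part II (only the x, y, z and n values of all txt files in a single file), a sketch along the same lines; C:\MyFiles and combined.csv are assumed names here:

```powershell
# Collect x,y,z,n from every txt file and write one combined CSV
Get-ChildItem C:\MyFiles -Filter *.txt | ForEach-Object {
    ConvertFrom-Csv -InputObject (
        Get-Content $_.FullName |
            Select-Object -Skip 3 -First 52 |
            ForEach-Object { $_ -replace '( )+',' ' }
    ) -Delimiter ' '
} | Select-Object x,y,z,n |
    Export-Csv C:\MyFiles\combined.csv -NoTypeInformation
```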

find string with most occurrences in .txt file with powershell

I'm currently working on a school assignment in PowerShell and I have to display the word longer than 6 characters with the most occurrences from a txt file. I tried this code, but it's returning the number of occurrences for each word, which is not what I need. Please help.
$a = Get-Content -Path .\germinal_split.txt
foreach($object in $a)
{
    if($object.length -gt 6){
        $object | Group-Object | Sort-Object -Property "Count" -Descending | ft -Property ("Name", "Count");
    }
}
From the question we don't know what's in the text file. The approaches so far will only work if there is only one word per line. I think something like the following will work regardless:
$Content = (Get-Content 'C:\temp\test12-01-19' -Raw) -split '\b'
$Content |
    Where-Object { $_.Length -gt 6 } |
    Group-Object -NoElement | Sort-Object Count -Descending |
    Select-Object -First 1
Here I'm reading in the file as a single string using the -Raw parameter, then splitting it on word boundaries. Where-Object still filters out words of 6 characters or fewer. Group-Object -NoElement then counts the occurrences of each word, and sorting by Count descending puts the most frequent word first.
I don't use the word-boundary regex very often. My concern is that it might behave oddly around punctuation, but my tests look pretty good.
Let me know what you think.
You can do something like the following:
$a = Get-Content -Path .\germinal_split.txt
$a | Where Length -gt 6 | Group-Object -NoElement | Sort-Object Count -Descending
Explanation:
Where specifies the Length property's condition. Group-Object -NoElement leaves off the Group property, which contains the actual object data. Sort-Object sorts the grouped output in ascending order by default. Here the Count property is specified as the sorted property and the -Descending parameter reverses the default sort order.
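For instance, against a small hypothetical word list (one word per line, as the question's file appears to be):

```powershell
'monsieur','madame','monsieur','charbon' |
    Where-Object { $_.Length -gt 6 } |
    Group-Object -NoElement | Sort-Object Count -Descending
# monsieur (Count 2) is listed first; madame is dropped
# because it is only 6 characters long
```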

Read latest content from multiple CSV files and sort-Logic

I need to work out some logic for sorting/filtering across multiple CSV files. I have two CSV files with some investment content in them. The data looks like this:
File A_11012015_123.csv(Time stamp appended)
TktNo, AcctID, Rate
1 1 187
2 1 145
7 2 90
File A_12012015_1345.csv(Timestamp appended)
TktNo, AcctID, Rate
1 2 190
2 2 187
3 5 43
Expected output file content
TktNo, AcctID, Rate
1 2 190
2 2 187
3 5 43
7 2 90
Things I have tried (not the exact code):
$output=@()
foreach(...)  # multiple files
{
    $output += Get-Content -Path $csv | sort "TktNo" -Unique
}
export $output
Conditions for the output file
Ticket number should be unique and sorted
If the same ticket number appears in both files, the content of the latest file should be added to the output file.
As this is part of a migration to PowerShell, and I am a beginner, I would appreciate it if anybody could help me with the implementation.
This code assumes a couple of things that I tried to address in the comments. More description to follow.
Get-ChildItem C:\temp -Filter *.csv | ForEach-Object{
    $rawDate = ($_.BaseName -split "_",2)[1]
    $filedate = [datetime]::ParseExact($rawDate,"MMddyyyy_HHmmss",[System.Globalization.CultureInfo]::CurrentCulture)
    Import-Csv $_ | Add-Member -MemberType NoteProperty -Name "Date" -Value $filedate -PassThru
} | Group-Object TktNo | ForEach-Object{
    $_.Group | Sort-Object Date | Select-Object -Last 1
} | Select-Object TktNo,AcctID,Rate | Sort-Object TktNo
Assumptions
All your csv files are in one location like "c:\temp". Add -recurse if you need to
You say that your date format is "mmddyyyy_hhmmss". The example file times do not correspond with that, so I edited the file names to use "MMddyyyy_HHmmss": "File A_11012015_123321.csv" and "File A_12012015_134522.csv".
Breakdown
There are a couple of ways to do this, but a simple one used here is Group-Object. As long as you don't have hundreds of these files with thousands of entries, it should do the trick.
Take each file and, for every entry, append its file date with Import-Csv $_ | Add-Member -MemberType NoteProperty -Name "Date" -Value $filedate -PassThru. For example you would have:
TktNo AcctID Rate Date
----- ------ ---- ----
1 1 187 11/1/2015 12:33:21 PM
2 1 145 11/1/2015 12:33:21 PM
7 2 90 11/1/2015 12:33:21 PM
We take all of these entries and group them based on TktNo. Each group that is created is sorted by the Date property we added earlier, and Select -Last 1 returns the most current entry. Finally, drop the Date property and sort the remaining data on TktNo.
As for the output, you could just append this to the end:
} | Select-Object TktNo,AcctID,Rate | Sort-Object TktNo | Export-CSV "C:\somepath.csv" -NoTypeInformation

Powershell sort and filter

I have a csv file containing detailed data, say columns A,B,C,D etc. Columns A and B are categories and C is a time stamp.
I am trying to create a summary file showing one row for each combination of A and B. It should pick the row from the original data where C is the most recent date.
Below is my attempt at solving the problem.
Import-CSV InputData.csv | `
Sort-Object -property @{Expression="ColumnA";Descending=$false}, `
@{Expression="ColumnB";Descending=$false}, `
@{Expression={[DateTime]::ParseExact($_.ColumnC,"dd-MM-yyyy HH:mm:ss",$null)};Descending=$true} | `
Sort-Object ColumnA, ColumnB -unique `
| Export-CSV OutputData.csv -NoTypeInformation
First the file is read; then everything is sorted by all 3 columns. The second Sort-Object call is supposed to take the first row of each group, but Sort-Object with the -Unique switch seems to pick a random row rather than the first one. Thus this does get one row for each A/B combination, but not the one corresponding to the most recent C.
Any suggestions for improvements? The data set is very large, so going through the file line by line is awkward; I would prefer a PowerShell pipeline solution.
You should look into Group-Object. I didn't create a sample CSV (you should provide one :-) ) so I haven't tested this out, but I think it should work:
Import-CSV InputData.csv | `
Select-Object -Property *, @{Label="DateTime";Expression={[DateTime]::ParseExact($_.ColumnC,"dd-MM-yyyy HH:mm:ss",$null)}} | `
Group-Object ColumnA, ColumnB | `
% {
$sum = ($_.Group | Measure-Object -Property ColumnD -Sum).Sum
$_.Group | Sort-Object -Property "DateTime" -Descending | Select-Object -First 1 -Property *, @{name="SumD";e={ $sum } } -ExcludeProperty DateTime
} | Export-CSV OutputData.csv -NoTypeInformation
This returns the same columns that were input (the helper DateTime column is excluded from the output).

Powershell Sort with Custom Sorting Expression

I have a directory containing numbered directories:
Archive
|-1
|-2
|-3
|-...
I need to create the next directory in numerical sequence, which I am currently doing with:
$lastArchive = ls .\Archive | sort Name | select -Last 1
$dirName = '1'
if($lastArchive) {
    $dirName = ([int]$lastArchive.Name)+1
}
This of course fails once we get to 10, which by string-sorting rules comes after 1, not 9. I need the sort expression to actually be [int]$_.Name - how would I do this?
I think you need to change that first line as follows:
$lastArchive = ls .\Archive |
Sort-Object -Property @{Expression={[int]$_.Name}} |
Select-Object -Last 1
Then, you can create the next directory in numerical order like this:
mkdir ([int]$lastArchive.Name + 1).ToString()
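Put together, a sketch (assuming the new directory should be created inside Archive, and that Archive contains only numbered directories):

```powershell
# Find the highest-numbered directory, then create the next one
$lastArchive = Get-ChildItem .\Archive |
    Sort-Object -Property @{Expression={[int]$_.Name}} |
    Select-Object -Last 1
$next = if ($lastArchive) { [int]$lastArchive.Name + 1 } else { 1 }
mkdir (Join-Path .\Archive $next)
```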
