I'm new to PowerShell and I am currently stuck on an issue.
I import a CSV file with 2 columns (ServerName and Size), like this:
Server | Size
-------------
SRV1 | 140
SRV2 | 120
SRV1 | 100
SRV1 | 140
SRV2 | 200
I want to add all Size values for each server, for example:
SRV2 = 120+200
SRV1 = 140+100+140
I have no idea how to do it.
I tried with a for loop, but the operation is done for each line, so my results are wrong.
Could anyone help me?
Use the Group-Object cmdlet to group the CSV rows by server name (Server), then use Select-Object to construct a single output object per group, containing the server name and the sum of all the associated rows' Size values, obtained via a calculated property that uses the Measure-Object cmdlet:
Import-Csv file.csv | Group-Object Server |
Select-Object Name, @{ n='Size'; e={ ($_.Group | Measure-Object Size -Sum).Sum } }
If you want the first output column to be named Server, replace Name with @{ n='Server'; e='Name' }.
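For illustration, the full pipeline with the renamed column would then look like this (same sample file assumed):
Import-Csv file.csv | Group-Object Server |
Select-Object @{ n='Server'; e='Name' }, @{ n='Size'; e={ ($_.Group | Measure-Object Size -Sum).Sum } }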
With your sample data, the above yields:
Name Size
---- ----
SRV1 380
SRV2 320
Here is an example of how you could do it:
$Data = Import-Csv -Path "yourfilepath" -Delimiter ";"
$SortedData = $Data | Group {$_.server}
$Hashtable = @{}
$SortedData.group | ForEach-Object {
if ($Hashtable.Contains($_.server)) {
$Hashtable[$_.server] += ",$($_.size)"
} else {
$Hashtable.Add($_.server,$_.size)
}
}
You may need to change the delimiter to match your file.
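Note that the hashtable above collects the Size values as a comma-separated string rather than adding them up. If what you want is the sum per server, as in the question, here is a rough, untested sketch of a variant that accumulates numbers instead (column names Server and Size assumed from the sample data; adjust the path and delimiter):
$Data = Import-Csv -Path "yourfilepath" -Delimiter ";"
$Totals = @{}
$Data | ForEach-Object {
    # a missing key evaluates to $null, which [int] turns into 0
    $Totals[$_.Server] = [int]$Totals[$_.Server] + [int]$_.Size
}
$Totals   # one entry per server, value = summed Size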
I'm trying to split a CSV file by UserId. The file looks like this:
UserId;FirstName;LastName;Start;End;Type;BreakInMinutes;DateOfCreation;DateOfUpdate
1206;Viktoria;Jechsmayr;2017-10-04 08:15:00.000;2017-10-04 16:15:00.000;work;30;04.10.2017 16:07;05.10.2017 12:31
1205;Brigitte;Jechsmayr;2017-10-05 12:15:00.000;2017-10-05 16:15:00.000;work;0;05.10.2017 12:32;05.10.2017 16:15
1207;Lisa;Jechsmayr;2017-10-06 08:40:00.000;2017-10-06 12:00:00.000;work;0;05.10.2017 15:51;06.10.2017 08:42
1206;Viktoria;Jechsmayr;2017-10-09 08:25:00.000;2017-10-09 16:35:00.000;work;30;09.10.2017 08:23;09.10.2017 16:34
1204;Karl;Jechsmayr;2017-10-11 08:15:00.000;2017-10-11 16:30:00.000;work;60;11.10.2017 08:24;11.10.2017 16:14
1204;Karl;Jechsmayr;2017-10-12 12:30:00.000;2017-10-12 16:45:00.000;work;0;12.10.2017 12:39;12.10.2017 16:43
1205;Brigitte;Jechsmayr;2017-10-13 08:10:00.000;2017-10-13 12:25:00.000;work;0;13.10.2017 08:13;16.10.2017 07:41
1207;Lisa;Jechsmayr;2017-10-16 07:30:00.000;2017-10-16 17:05:00.000;work;30;16.10.2017 07:41;16.10.2017 17:05
I'm trying to split the file (>750,000 rows) by the UserId column (1,400 distinct UserIds).
All rows for one UserId should be moved/copied to a separate CSV file named like
UserId_LastName-FirstName.csv
I don't have any idea how to do that. I work on a Windows 10 PC.
I have already tried various scripts found on Stack Overflow/Google. They don't seem to work:
the export generates a ".csv" file without a name and 0 KB in size (empty),
or it does nothing.
I tried:
Import-Csv file.csv | Group-Object -Property "UserId" |
Foreach-Object {$path=$_.name+".csv" ; $_.group |
Export-Csv -Path $path -NoTypeInformation}
This generates a file with the same content as the original, but with " at the front and end of each line, and the file name is just .csv (only the extension, no name).
awk -F',' 'UserId==NR{a[$1]++;next} a[$1]==1' file.csv file.csv
Output: nothing; no file, no error.
And some others I cannot find anymore, sorry.
Thanks for the help.
You were close with the Group-Object. This should work for you.
Import-Csv -Path 'D:\file.csv' -Delimiter ';' | Group-Object UserId | ForEach-Object {
$firstName = $_.Group.FirstName | Select-Object -First 1
$lastName = $_.Group.LastName | Select-Object -First 1
$fileOut = 'D:\test\{0}_{1}.csv' -f $lastName, $firstName
$_.Group | Export-Csv -Path $fileOut -NoTypeInformation
}
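If you want the files named exactly UserId_LastName-FirstName.csv, as asked in the question, a small tweak to the format string should do it; $_.Name holds the UserId the group was built from (untested tweak):
$fileOut = 'D:\test\{0}_{1}-{2}.csv' -f $_.Name, $lastName, $firstName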
I have a working PowerShell script that removes duplicates in a CSV file, but it sorts the original column header row in with the data, which I don't want, and I cannot figure out a way to keep the column headers.
Get-Content C:\testdata.csv | ConvertFrom-Csv -Header "Column1", "Column2", "Column3", "Column4" | sort -Unique -Property Column1 | % {"{0},{1},{2},{3}" -f $_.Column1, $_.Column2, $_.Column3, $_.Column4} | set-content c:\output.csv
The test data csv is as follows:
Name,IDNumber,OtherNumber,UniqueCode
Tom,10,133,abcd
Tom,10,133,abcd
Bill,4,132,efgh
Bill,4,132,efgh
Bill,4,132,efgh
Lefty,3,122,ijkl
Lefty,3,122,ijkl
Lefty,3,122,ijkl
Lefty,3,122,ijkl
Is there a way to accomplish this with Powershell?
Using Import-Csv and Export-Csv makes this process much easier as they are built to deal with csv files and headers.
Import-Csv "C:\testdata.csv" | Sort-Object * -Unique | Export-Csv "c:\output.csv" -NoTypeInformation
Untested, but try this...
Import-Csv -Path 'C:\path\to\File.csv' |
Select * -Unique |
Export-Csv 'C:\path\to\NewFile.csv' -NoTypeInformation
You could use Select -Skip 1 to skip over the original header column:
Get-Content testdata.csv | Select -Skip 1 | ConvertFrom-Csv -Header "Column1","Column2","Column3","Column4" | sort -Unique -Property Column1 | % {"{0},{1},{2},{3}" -f $_.Column1, $_.Column2, $_.Column3, $_.Column4} | set-content output.csv
I have folder with 3 text files.
File 1, call it test1.txt has values
11
22
22
test2.txt has values
11
22
22
33
test3.txt has values
11
22
22
33
44
44
How can I get my final result file (New.txt) to be:
44
44
These values are not in the other 2 files, so this is what I want.
The code so far:
$result = "C:\NonDuplicate.txt"
$filesvalues=gci "C:\*.txt" | %{$filename=$_.Name; gc $_ | %{[pscustomobject]@{FileName= $filename; Row=$_ }}}
#list file where not exists others file with same value
$filesvalues | % {
$valtockeck=$_
[pscustomobject]@{
Val=$valtockeck
Exist=$filesvalues.Where({ $_.FileName -ne $valtockeck.FileName -and $_.Row -eq $valtockeck.Row }).Count -gt 0
}
} |
where Exist -NE $true |
% {$_.Val.Row | out-file $result -Append}
This is the error:
Where-Object : Cannot bind parameter 'FilterScript'. Cannot convert the "Exist" value of type "System.String" to type "System.Management.Automation.ScriptBlock".
At line:16 char:23
+ where <<<< Exist -NE $true |
+ CategoryInfo : InvalidArgument: (:) [Where-Object], ParameterBindingException
+ FullyQualifiedErrorId : CannotConvertArgumentNoMessage,Microsoft.PowerShell.Commands.WhereObjectCommand
Try this:
#list files/values couple
$filesvalues=gci "C:\temp\test\test*.txt" -file | %{$filename=$_.Name; gc $_ | %{[pscustomobject]@{FileName= $filename; Row=$_ }}}
#list file where not exists others file with same value
$filesvalues | % {
$valtockeck=$_
[pscustomobject]@{
Val=$valtockeck
Exist=$filesvalues.Where({ $_.FileName -ne $valtockeck.FileName -and $_.Row -eq $valtockeck.Row }).Count -gt 0
}
} |
where Exist -NE $true |
% {$_.Val.Row | out-file "c:\temp\test\New.txt" -Append}
$file1 = ".\test1.txt"
$file2 = ".\test2.txt"
$file3 = ".\test3.txt"
$results = ".\New.txt"
$Content = Get-Content $File1
$Content += Get-Content $File2
Get-Content $file3 | Where {$Content -notcontains $_}| Set-Content $Results
Another solution:
#get couple files/values
$filesvalues=gci "C:\temp\test\test*.txt" -file |
%{$filename=$_.Name; gc $_ |
%{[pscustomobject]@{FileName= $filename; Row=$_ }}}
#group by value and filter by number of distinct filename, then extract data into file
($filesvalues | group -Property Row | where {($_.Group.FileName | Get-Unique).Count -eq 1 }).Group.Row |
out-file "C:\temp\test\New2.txt" -Append
The Compare-Object cmdlet's purpose is to compare two sets of inputs.
Nesting two Compare-Object calls yields the desired output:
$file1Lines = Get-Content .\test1.txt
$file2Lines = Get-Content .\test2.txt
$file3Lines = Get-Content .\test3.txt
(Compare-Object `
(Compare-Object -IncludeEqual $file1Lines $file2Lines).InputObject `
$file3Lines |
Where-Object SideIndicator -eq '=>'
).InputObject
Compare-Object outputs [pscustomobject] instances whose .InputObject property contains the input object and whose .SideIndicator property indicates which operand the value is unique to - <= (LHS) or => (RHS) - and, with -IncludeEqual, whether it is contained in both operands (==).
-IncludeEqual in the 1st Compare-Object call not only outputs the lines that differ, but also includes the ones that are the same, resulting in a union of the lines from files test1.txt and test2.txt.
By not specifying switches for the 2nd Compare-Object call, only [objects wrapping] the lines that differ are output (the default behavior).
The Where-Object SideIndicator -eq '=>' filter then narrows the differences down to those lines that are unique to the RHS.
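As a tiny illustration of those properties (hypothetical two-element arrays, not the question's files):
Compare-Object -IncludeEqual @('11','22') @('22','33')
# yields '11' with SideIndicator <= (only in the reference set),
# '33' with => (only in the difference set), and '22' with == (present in both)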
To generalize the command to N > 3 files and output to a new file:
# Get all input files as file objects.
$files = Get-ChildItem .\test*.txt
# I'll assume that all files but the last are the *reference files* - the
# files for which the union of all their lines should be formed first...
$refFiles = $files[0..$($files.count-2)]
# ... and that the last file is the *difference file* - the file whose lines
# to compare against the union of lines from the reference files.
$diffFile = $files[($files.count-1)]
# The output file path.
$results = ".\New.txt"
# Build the union of all lines from the reference files.
$unionOfLines = @()
$refFiles | ForEach-Object {
$unionOfLines = (Compare-Object -IncludeEqual $unionOfLines (Get-Content $_)).InputObject
}
# Compare the union of lines to the difference file and
# output only the lines unique to the difference file to the output file.
(Compare-Object $unionOfLines (Get-Content $diffFile) |
Where-Object SideIndicator -eq '=>').InputObject |
Set-Content $results
Note that Set-Content uses the Windows legacy single-byte encoding by default. Use the -Encoding parameter to change that.
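For instance, to write UTF-8 output instead, the final line of the snippet above would become:
Set-Content $results -Encoding UTF8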
Well, instead of writing the result to the $results file, save it in a variable $tmpResult and then do the same check as above for $tmpResult and $file3 to get a final result. And if you have more than 3 files, you can create a loop to repeat the check.
But something is missing in the code above: you only get the unique lines of file2 and not those of file1.
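One way to read that suggestion as a loop over N files, reusing the -notcontains check from the snippet above (a rough, untested sketch that assumes, as in the question, only the lines unique to the last file are wanted):
$files = Get-ChildItem .\test*.txt | Sort-Object Name
$results = ".\New.txt"
$seen = @()      # all lines encountered in earlier files
$unique = @()    # lines of the current file not seen before
foreach ($file in $files) {
    $current = Get-Content $file
    $unique = $current | Where-Object { $seen -notcontains $_ }
    $seen += $current
}
# after the loop, $unique holds the last file's lines that appear in no earlier file
$unique | Set-Content $results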
I need to work out the logic for one type of sorting/filtering across multiple CSV files. The problem is I have two CSV files with some investment content in them. The data looks like this:
File A_11012015_123.csv (timestamp appended)
TktNo, AcctID, Rate
1 1 187
2 1 145
7 2 90
File A_12012015_1345.csv (timestamp appended)
TktNo, AcctID, Rate
1 2 190
2 2 187
3 5 43
Expected output file content
TktNo, AcctID, Rate
1 2 190
2 2 187
3 5 43
7 2 90
Things I have tried (not the exact code):
$output = @()
foreach (..)   # multiple files
{
    $output += Get-Content -Path $csv | sort "TktNo" -Unique
}
export $output
Conditions for the output file:
Ticket numbers should be unique and sorted.
If the same ticket number is in both files, the content from the latest file should be added to the output file.
As this is part of a migration to PowerShell and I am a beginner, I would appreciate it if anybody could help me with the implementation.
This code assumes a couple of things that I tried to address in the comments. More description to follow.
Get-ChildItem C:\temp -Filter *.csv | ForEach-Object{
$rawDate = ($_.BaseName -split "_",2)[1]
$filedate = [datetime]::ParseExact($rawDate,"MMddyyyy_HHmmss",[System.Globalization.CultureInfo]::CurrentCulture)
Import-csv $_ | Add-Member -MemberType NoteProperty -Name "Date" -Value $filedate -PassThru
} | Group-Object tktno | ForEach-Object{
$_.Group | Sort-Object Date | Select -Last 1
} | Select-Object TktNo,AcctID,Rate | Sort-Object TktNo
Assumptions
All your CSV files are in one location like "C:\temp". Add -Recurse if you need to.
You say that your date format is "mmddyyyy_hhmmss". The example file times do not correspond with that, so I edited the file names to use "MMddyyyy_HHmmss": "File A_11012015_123321.csv" and "File A_12012015_134522.csv".
Breakdown
There are a couple of ways to do this, but a simple one used here is Group-Object. As long as you don't have hundreds of these files with thousands of entries, it should do the trick.
Take each file and, for every entry, append its file date with Import-Csv $_ | Add-Member -MemberType NoteProperty -Name "Date" -Value $filedate -PassThru. For example you would have:
TktNo AcctID Rate Date
----- ------ ---- ----
1 1 187 11/1/2015 12:33:21 PM
2 1 145 11/1/2015 12:33:21 PM
7 2 90 11/1/2015 12:33:21 PM
We take all of these files and group them together based on TktNo. Each group that is created is sorted by the Date property we created earlier, and the most recent entry is returned using Select -Last 1. Then drop the Date property and sort the remaining data on TktNo.
As for output, you could just append this to the end:
} | Select-Object TktNo,AcctID,Rate | Sort-Object TktNo | Export-CSV "C:\somepath.csv" -NoTypeInformation
I have a csv file containing detailed data, say columns A,B,C,D etc. Columns A and B are categories and C is a time stamp.
I am trying to create a summary file showing one row for each combination of A and B. It should pick the row from the original data where C is the most recent date.
Below is my attempt at solving the problem.
Import-CSV InputData.csv | `
Sort-Object -property @{Expression="ColumnA";Descending=$false}, `
@{Expression="ColumnB";Descending=$false}, `
@{Expression={[DateTime]::ParseExact($_.ColumnC,"dd-MM-yyyy HH:mm:ss",$null)};Descending=$true} | `
Sort-Object ColumnA, ColumnB -unique `
| Export-CSV OutputData.csv -NoTypeInformation
First the file is read, then everything is sorted by all 3 columns; the second Sort-Object call is supposed to then take the first row of each group. However, Sort-Object with the -Unique switch seems to pick a random row rather than the first one. Thus this does get one row for each A/B combination, but not the one corresponding to the most recent C.
Any suggestions for improvements? The data set is very large, so going through the file line by line is awkward; I would prefer a PowerShell solution.
You should look into Group-Object. I didn't create a sample CSV (you should provide it :-) ) so I haven't tested this out, but I think it should work:
Import-CSV InputData.csv | `
Select-Object -Property *, @{Label="DateTime";Expression={[DateTime]::ParseExact($_.ColumnC,"dd-MM-yyyy HH:mm:ss",$null)}} | `
Group-Object ColumnA, ColumnB | `
% {
$sum = ($_.Group | Measure-Object -Property ColumnD -Sum).Sum
$_.Group | Sort-Object -Property "DateTime" -Descending | Select-Object -First 1 -Property *, @{name="SumD";e={ $sum } } -ExcludeProperty DateTime
} | Export-CSV OutputData.csv -NoTypeInformation
This returns the same columns that were inputted (the DateTime helper property gets excluded from the output).