Split CSV file by distinct UserId into separate files

I'm trying to split a CSV file by UserId. The data looks like this:
UserId;FirstName;LastName;Start;End;Type;BreakInMinutes;DateOfCreation;DateOfUpdate
1206;Viktoria;Jechsmayr;2017-10-04 08:15:00.000;2017-10-04 16:15:00.000;work;30;04.10.2017 16:07;05.10.2017 12:31
1205;Brigitte;Jechsmayr;2017-10-05 12:15:00.000;2017-10-05 16:15:00.000;work;0;05.10.2017 12:32;05.10.2017 16:15
1207;Lisa;Jechsmayr;2017-10-06 08:40:00.000;2017-10-06 12:00:00.000;work;0;05.10.2017 15:51;06.10.2017 08:42
1206;Viktoria;Jechsmayr;2017-10-09 08:25:00.000;2017-10-09 16:35:00.000;work;30;09.10.2017 08:23;09.10.2017 16:34
1204;Karl;Jechsmayr;2017-10-11 08:15:00.000;2017-10-11 16:30:00.000;work;60;11.10.2017 08:24;11.10.2017 16:14
1204;Karl;Jechsmayr;2017-10-12 12:30:00.000;2017-10-12 16:45:00.000;work;0;12.10.2017 12:39;12.10.2017 16:43
1205;Brigitte;Jechsmayr;2017-10-13 08:10:00.000;2017-10-13 12:25:00.000;work;0;13.10.2017 08:13;16.10.2017 07:41
1207;Lisa;Jechsmayr;2017-10-16 07:30:00.000;2017-10-16 17:05:00.000;work;30;16.10.2017 07:41;16.10.2017 17:05
I want to split the file (more than 750,000 rows) by the UserId column (1,400 distinct user IDs).
All rows for one UserId should be moved or copied to a separate CSV file named like this:
UserId_LastName-FirstName.csv
I have no idea how to do that. I work on a Windows 10 PC.
I have already tried various scripts found on Stack Overflow and Google, but none of them work:
the export either generates a file named ".csv" (no base name) with a size of 0 KB (empty),
or it does nothing.
I tried:
Import-Csv file.csv | Group-Object -Property "UserId" |
Foreach-Object {$path=$_.name+".csv" ; $_.group |
Export-Csv -Path $path -NoTypeInformation}
Result: a file with the same content as the original, but with a " at the start and end of each line, and the file name .csv (just the extension, no name).
awk -F',' 'UserId==NR{a[$1]++;next} a[$1]==1' file.csv file.csv
Output: nothing. No file, no error.
I also tried some other scripts that I cannot find anymore, sorry.
Thanks for your help.

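You were close with Group-Object. The missing piece is -Delimiter ';': your file is semicolon-separated, so without it Import-Csv reads each line as a single column and there is no UserId property to group on, which is why the file name came out empty. This should work for you; it names the files UserId_LastName-FirstName.csv as requested and keeps the semicolon delimiter in the output:
Import-Csv -Path 'D:\file.csv' -Delimiter ';' | Group-Object UserId | ForEach-Object {
    # Every row in a group belongs to the same user, so take the names from the first row.
    $firstName = $_.Group.FirstName | Select-Object -First 1
    $lastName = $_.Group.LastName | Select-Object -First 1
    # $_.Name is the UserId the group was built on.
    $fileOut = 'D:\test\{0}_{1}-{2}.csv' -f $_.Name, $lastName, $firstName
    $_.Group | Export-Csv -Path $fileOut -Delimiter ';' -NoTypeInformation
}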
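To see how grouping leads to the file names the question asks for, here is a minimal in-memory sketch. It uses an abbreviated copy of the sample rows (only four of the columns, which is my simplification) and only computes the target file names instead of writing files; the $plan variable is a hypothetical name.

```powershell
# Three sample rows from the question, parsed in memory instead of from a file.
$rows = @'
UserId;FirstName;LastName;Type
1206;Viktoria;Jechsmayr;work
1205;Brigitte;Jechsmayr;work
1206;Viktoria;Jechsmayr;work
'@ | ConvertFrom-Csv -Delimiter ';'

# Group-Object exposes the grouping value as .Name and the matching rows as .Group.
$plan = $rows | Group-Object UserId | ForEach-Object {
    $first = $_.Group[0]
    [pscustomobject]@{
        FileName = '{0}_{1}-{2}.csv' -f $_.Name, $first.LastName, $first.FirstName
        RowCount = $_.Count
    }
}
$plan
```

Each group's Name is the UserId it was built on, so prefixing it to the names gives the UserId_LastName-FirstName.csv pattern.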

Related

Extract multiple columns from multiple test files in powershell

I have 450 files from computational model calculations for a nanosystem. The top three lines of each file contain the title, conditions, and date/time. The fourth line holds the column labels (x y z t n m lag lead bus cond rema dock). The data starts on the fifth line and runs up to the 55th line. Columns are delimited by a varying number of spaces.
I want to:
I) create new text files with only the x, y, z, n, m and rema columns
II) collect only the x, y, z and n values of all the text files in a single file
How can I do this in PowerShell? Please help!

Based on your description, I guess the content of your files looks something like this:
Title: MyFile
Conditions: Critical
Date: 2020-02-23T11:33:02
x y z t n m lag lead bus cond rema dock
sdasd asdfafd awef wefaewf aefawef aefawrgt eyjrteujer bhtnju qerfqeg 524rwefqwert q3tgqr4fqr4 qregq5g
avftgwb ryhwtwtgqreg efqerfe rgwetgq ergqreq erwf ef 476j q4 w4th2 ef 42r13gg asdfasdrv
You can always read files like that line by line and keep only the lines you actually want. In your case, the data is in lines 4-55 (including the header line), which is 52 lines.
To get to that data, you can use this command:
Get-Content C:\MyFile.txt | Select-Object -Skip 3 -First 52
Once you have confirmed that this is the data you want, you can start working on the next issue: the multiple-space delimiter.
Since the number of spaces is not fixed, you need to replace each run of spaces with a single space. Assuming that the values you are looking for contain no spaces themselves, you can add this to your pipeline:
Get-Content C:\MyFile.txt | Select-Object -Skip 3 -First 52 | ForEach-Object {$_ -replace '( )+',' '}
The '( )+' part means one or more spaces.
Now you have proper space-delimited data. To turn it into objects, convert it from CSV with a space as the delimiter:
ConvertFrom-Csv -InputObject (Get-Content C:\MyFile.txt | Select-Object -Skip 3 -First 52 | ForEach-Object {$_ -replace '( )+',' '}) -Delimiter ' '
From here it is pretty simple to select the values you want:
ConvertFrom-Csv -InputObject (Get-Content C:\MyFile.txt | Select-Object -Skip 3 -First 52 | ForEach-Object {$_ -replace '( )+',' '}) -Delimiter ' ' | Select-Object x,y,z,n,m,rema
You also need to process all the files, so you might loop over them like this (Get-ChildItem returns file objects, which is what $file.FullName relies on):
foreach ($file in (Get-ChildItem C:\MyFiles -Filter *.txt)){
    ConvertFrom-Csv -InputObject (Get-Content $file.FullName | Select-Object -Skip 3 -First 52 | ForEach-Object {$_ -replace '( )+',' '}) -Delimiter ' ' | Select-Object x,y,z,n,m,rema
}
You might want to split the code up into a more readable format, but this should pretty much cover it.
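The steps above can be tried without any files on disk. The following is a sketch under stated assumptions: the $lines array is a made-up stand-in for one model output file, with numeric placeholder values, and the Trim() call is my addition in case lines start with spaces (which would otherwise produce an empty first column). It uses '\s+' instead of '( )+' so that tabs are collapsed as well.

```powershell
# Hypothetical sample: three preamble lines, the header line, then two data lines.
$lines = @(
    'Title: MyFile'
    'Conditions: Critical'
    'Date: 2020-02-23T11:33:02'
    'x   y  z t n    m lag lead bus cond rema dock'
    '1   2  3 4 5    6 7   8    9   10   11   12'
    '  21 22 23 24 25 26 27 28 29 30 31 32'
)

$data = $lines |
    Select-Object -Skip 3 |                               # drop title, conditions, date
    ForEach-Object { $_.Trim() -replace '\s+', ' ' } |    # one space between values
    ConvertFrom-Csv -Delimiter ' ' |                      # first remaining line is the header
    Select-Object x, y, z, n, m, rema

$data
```

For the second task in the question, the same pipeline with Select-Object x, y, z, n could be run inside the foreach loop over all files and its combined output piped to Export-Csv once.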

Removing duplicates from CSV yet keeping column headers

I have a working powershell script that removes duplicates in a csv file, but it sorts the column headers within the data, which I don't want, and cannot figure out a way to keep the column headers.
Get-Content C:\testdata.csv | ConvertFrom-Csv -Header "Column1", "Column2", "Column3", "Column4" | sort -Unique -Property Column1 | % {"{0},{1},{2},{3}" -f $_.Column1, $_.Column2, $_.Column3, $_.Column4} | set-content c:\output.csv
The test data csv is as follows:
Name,IDNumber,OtherNumber,UniqueCode
Tom,10,133,abcd
Tom,10,133,abcd
Bill,4,132,efgh
Bill,4,132,efgh
Bill,4,132,efgh
Lefty,3,122,ijkl
Lefty,3,122,ijkl
Lefty,3,122,ijkl
Lefty,3,122,ijkl
Is there a way to accomplish this with Powershell?

Using Import-Csv and Export-Csv makes this process much easier as they are built to deal with csv files and headers.
Import-Csv "C:\testdata.csv" | Sort-Object * -Unique | Export-Csv "c:\output.csv" -NoTypeInformation

Untested, but try this...
Import-Csv -Path 'C:\path\to\File.csv' |
Select * -Unique |
Export-Csv 'C:\path\to\NewFile.csv' -NoTypeInformation

You could use Select -Skip 1 to skip over the original header row. Note that this writes the data rows only, so output.csv will not contain a header line:
Get-Content testdata.csv | Select -Skip 1 | ConvertFrom-Csv -Header "Column1","Column2","Column3","Column4" | sort -Unique -Property Column1 | % {"{0},{1},{2},{3}" -f $_.Column1, $_.Column2, $_.Column3, $_.Column4} | set-content output.csv
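To compare the approaches on the question's data without touching the disk, here is a small sketch. It assumes the sample is held in a here-string ($csv is a hypothetical name) and lists the sort properties explicitly instead of using a wildcard. Because the rows are parsed into objects first, the header never takes part in the sort and is written back as the first line.

```powershell
# A shortened copy of the test data from the question.
$csv = @'
Name,IDNumber,OtherNumber,UniqueCode
Tom,10,133,abcd
Tom,10,133,abcd
Bill,4,132,efgh
Bill,4,132,efgh
Lefty,3,122,ijkl
'@

# Parse to objects, then sort and de-duplicate on all four columns.
$unique = $csv | ConvertFrom-Csv |
    Sort-Object Name, IDNumber, OtherNumber, UniqueCode -Unique

# Serialize again; the header line is regenerated from the property names.
$unique | ConvertTo-Csv -NoTypeInformation
```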

Iterate through txt files and find rows that are not in all files

I have folder with 3 text files.
File 1, call it test1.txt has values
11
22
22
test2.txt has values
11
22
22
33
test3.txt has values
11
22
22
33
44
44
How can I get my final result (New.txt) to be:
44
44
These values are not in the other two files, so this is what I want.
My code so far:
$result = "C:\NonDuplicate.txt"
$filesvalues=gci "C:\*.txt" | %{$filename=$_.Name; gc $_ | %{[pscustomobject]@{FileName= $filename; Row=$_ }}}
#list file where not exists others file with same value
$filesvalues | % {
    $valtockeck=$_
    [pscustomobject]@{
        Val=$valtockeck
        Exist=$filesvalues.Where({ $_.FileName -ne $valtockeck.FileName -and $_.Row -eq $valtockeck.Row }).Count -gt 0
    }
} |
where Exist -NE $true |
% {$_.Val.Row | out-file $result -Append}
This is the error:
Where-Object : Cannot bind parameter 'FilterScript'. Cannot convert the "Exist" value of type "System.String" to type "System.Management.Automation.ScriptBlock".
At line:16 char:23
+ where <<<< Exist -NE $true |
+ CategoryInfo : InvalidArgument: (:) [Where-Object], ParameterBindingException
+ FullyQualifiedErrorId : CannotConvertArgumentNoMessage,Microsoft.PowerShell.Commands.WhereObjectCommand

Try this. The error in the question suggests the script was run on PowerShell 2.0, where the simplified where Exist -NE $true syntax is not available; that syntax needs PowerShell 3.0 or later, and the .Where() method needs 4.0 or later.
#list files/values couple
$filesvalues=gci "C:\temp\test\test*.txt" -file | %{$filename=$_.Name; gc $_ | %{[pscustomobject]@{FileName= $filename; Row=$_ }}}
#list file where not exists others file with same value
$filesvalues | % {
    $valtockeck=$_
    [pscustomobject]@{
        Val=$valtockeck
        Exist=$filesvalues.Where({ $_.FileName -ne $valtockeck.FileName -and $_.Row -eq $valtockeck.Row }).Count -gt 0
    }
} |
where Exist -NE $true |
% {$_.Val.Row | out-file "c:\temp\test\New.txt" -Append}

$file1 = ".\test1.txt"
$file2 = ".\test2.txt"
$file3 = ".\test3.txt"
$results = ".\New.txt"
$Content = Get-Content $File1
$Content += Get-Content $File2
Get-Content $file3 | Where {$Content -notcontains $_}| Set-Content $Results
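The -notcontains test scans the whole $Content array for every line of the third file, which gets slow for large inputs. A possible variation, sketched here with in-memory arrays standing in for the three files, keeps the already seen lines in a HashSet so that each lookup is a constant-time operation. The variable names are hypothetical, and ::new() assumes PowerShell 5.0 or later.

```powershell
# Stand-ins for the content of test1.txt, test2.txt and test3.txt.
$file1 = '11', '22', '22'
$file2 = '11', '22', '22', '33'
$file3 = '11', '22', '22', '33', '44', '44'

# Collect every line of the first two files in a set (duplicates collapse).
$seen = [System.Collections.Generic.HashSet[string]]::new()
foreach ($line in ($file1 + $file2)) { [void]$seen.Add($line) }

# Keep only the lines of the third file that were never seen before.
$onlyInLast = $file3 | Where-Object { -not $seen.Contains($_) }
$onlyInLast
```

With real files, the three arrays would come from Get-Content and the result could be piped to Set-Content as in the answer above.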

Other solution 1
#get couple files/values
$filesvalues=gci "C:\temp\test\test*.txt" -file |
    %{$filename=$_.Name; gc $_ |
        %{[pscustomobject]@{FileName= $filename; Row=$_ }}}
#group by value and filter by number of distinct filename, then extract data into file
($filesvalues | group -Property Row | where {($_.Group.FileName | Get-Unique).Count -eq 1 }).Group.Row |
    out-file "C:\temp\test\New2.txt" -Append

The Compare-Object cmdlet's purpose is to compare two sets of inputs.
Nesting two Compare-Object calls yields the desired output:
$file1Lines = Get-Content .\test1.txt
$file2Lines = Get-Content .\test2.txt
$file3Lines = Get-Content .\test3.txt
(Compare-Object `
(Compare-Object -IncludeEqual $file1Lines $file2Lines).InputObject `
$file3Lines |
Where-Object SideIndicator -eq '=>'
).InputObject
Compare-Object outputs [pscustomobject] instances whose .InputObject property contains the input object and whose .SideIndicator property indicates which operand the value is unique to - <= (LHS) or => (RHS) - and, with -IncludeEqual, if it is contained in both operands (==).
-IncludeEqual in the 1st Compare-Object call not only outputs the lines that differ, but also includes the ones that are the same, resulting in a union of the lines from file test1.txt and test2.txt.
By not specifying switches for the 2nd Compare-Object call, only [objects wrapping] the lines that differ are output (the default behavior).
Filter Where-Object SideIndicator -eq '=>' then filters the differences down to those lines that are unique to the RHS.
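The three SideIndicator values can be observed in isolation with two small arrays. This is only an illustrative sketch; $left, $right and $diff are hypothetical names and the values are made up.

```powershell
$left  = '11', '22', '33'
$right = '22', '33', '44'

# With -IncludeEqual, values present on both sides are reported too.
$diff = Compare-Object -IncludeEqual $left $right
$diff

# Values unique to the right-hand side only.
$rightOnly = ($diff | Where-Object { $_.SideIndicator -eq '=>' }).InputObject
```

Here '11' is reported with <=, '44' with =>, and '22' and '33' with ==.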
To generalize the command to N > 3 files and output to a new file:
# Get all input files as file objects.
$files = Get-ChildItem .\test*.txt
# I'll assume that all files but the last are the *reference files* - the
# files for which the union of all their lines should be formed first...
$refFiles = $files[0..$($files.count-2)]
# ... and that the last file is the *difference file* - the file whose lines
# to compare against the union of lines from the reference files.
$diffFile = $files[($files.count-1)]
# The output file path.
$results = ".\New.txt"
# Build the union of all lines from the reference files.
$unionOfLines = @()
$refFiles | ForEach-Object {
$unionOfLines = (Compare-Object -IncludeEqual $unionOfLines (Get-Content $_)).InputObject
}
# Compare the union of lines to the difference file and
# output only the lines unique to the difference file to the output file.
(Compare-Object $unionOfLines (Get-Content $diffFile) |
Where-Object SideIndicator -eq '=>').InputObject |
Set-Content $results
Note that Set-Content uses the Windows legacy single-byte encoding by default. Use the -Encoding parameter to change that.

Well, instead of writing the result in the $results file, save it in a variable $tmpResult and then do the same check as above for $tmpResult and $file3 to gain a final result. And if you have more than 3 files, you can create a loop to repeat the check.
But something is missing in the code above - you only get the unique lines in file2 and not those in file1.

Read latest content from multiple CSV files and sort-Logic

I need some logic for sorting and filtering across multiple CSV files. I have two CSV files with investment data in them. The data looks like this:
File A_11012015_123.csv (timestamp appended)
TktNo, AcctID, Rate
1 1 187
2 1 145
7 2 90
File A_12012015_1345.csv (timestamp appended)
TktNo, AcctID, Rate
1 2 190
2 2 187
3 5 43
Expected output file content
TktNo, AcctID, Rate
1 2 190
2 2 187
3 5 43
7 2 90
What I have tried so far (not the exact code):
$output=@()
foreach(..)(multiple files)
{
$output+=Get-Content -Path $csv | sort "TktNo" -Unique
}
export $output
Conditions for the output file:
Ticket numbers should be unique and sorted.
If the same ticket number appears in both files, the row from the latest file should go into the output file.
As this is part of a migration to PowerShell and I am also a beginner, I would appreciate any help with the implementation.

This code assumes a couple of things, which I address below. More description follows.
Get-ChildItem C:\temp -Filter *.csv | ForEach-Object{
    # Everything after the first underscore in the file name is the timestamp.
    $rawDate = ($_.BaseName -split "_",2)[1]
    $filedate = [datetime]::ParseExact($rawDate,"MMddyyyy_HHmmss",[System.Globalization.CultureInfo]::CurrentCulture)
    # Use the full path so the import does not depend on the current directory.
    Import-Csv $_.FullName | Add-Member -MemberType NoteProperty -Name "Date" -Value $filedate -PassThru
} | Group-Object tktno | ForEach-Object{
    $_.Group | Sort-Object Date | Select -Last 1
} | Select-Object TktNo,AcctID,Rate | Sort-Object TktNo
Assumptions
All your CSV files are in one location, such as "c:\temp". Add -Recurse if you need to.
You say that your date format is "mmddyyyy_hhmmss". The example file names do not correspond to that, so I edited them to match "MMddyyyy_HHmmss": "File A_11012015_123321.csv" and "File A_12012015_134522.csv"
Breakdown
There are a couple of ways to do this, but a simple one, used here, is Group-Object. As long as you don't have hundreds of these files with thousands of entries each, it should do the trick.
Take each file and, for every entry, append the file's date with Add-Member -MemberType NoteProperty -Name "Date" -Value $filedate -PassThru. For example, you would have:
TktNo AcctID Rate Date
----- ------ ---- ----
1 1 187 11/1/2015 12:33:21 PM
2 1 145 11/1/2015 12:33:21 PM
7 2 90 11/1/2015 12:33:21 PM
We take the entries from all of these files and group them by TktNo. Within each group, we sort the entries by the Date property created earlier and return the most recent one using Select -Last 1. Finally, we drop the Date property and sort the remaining data on TktNo.
As for output, you could just append Export-Csv to the end of the pipeline:
} | Select-Object TktNo,AcctID,Rate | Sort-Object TktNo | Export-CSV "C:\somepath.csv" -NoTypeInformation
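The grouping step can be checked against the expected output from the question with an in-memory sketch. The rows below are built by hand instead of being imported, the two timestamps are the edited file name stamps from the assumptions, and the invariant culture is my choice to keep the parsing independent of regional settings.

```powershell
$culture = [System.Globalization.CultureInfo]::InvariantCulture
$older = [datetime]::ParseExact('11012015_123321', 'MMddyyyy_HHmmss', $culture)
$newer = [datetime]::ParseExact('12012015_134522', 'MMddyyyy_HHmmss', $culture)

# Rows of both files, each tagged with the date of the file it came from.
$rows = @(
    [pscustomobject]@{ TktNo = 1; AcctID = 1; Rate = 187; Date = $older }
    [pscustomobject]@{ TktNo = 2; AcctID = 1; Rate = 145; Date = $older }
    [pscustomobject]@{ TktNo = 7; AcctID = 2; Rate = 90;  Date = $older }
    [pscustomobject]@{ TktNo = 1; AcctID = 2; Rate = 190; Date = $newer }
    [pscustomobject]@{ TktNo = 2; AcctID = 2; Rate = 187; Date = $newer }
    [pscustomobject]@{ TktNo = 3; AcctID = 5; Rate = 43;  Date = $newer }
)

# Per ticket, keep the row from the newest file, then drop the helper column.
$latest = $rows | Group-Object TktNo | ForEach-Object {
    $_.Group | Sort-Object Date | Select-Object -Last 1
} | Select-Object TktNo, AcctID, Rate | Sort-Object TktNo

$latest
```

Tickets 1 and 2 take their values from the newer file, while tickets 3 and 7 appear in only one file and pass through unchanged.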

Powershell sort and filter

I have a csv file containing detailed data, say columns A,B,C,D etc. Columns A and B are categories and C is a time stamp.
I am trying to create a summary file showing one row for each combination of A and B. It should pick the row from the original data where C is the most recent date.
Below is my attempt at solving the problem.
Import-CSV InputData.csv | `
Sort-Object -property @{Expression="ColumnA";Descending=$false}, `
@{Expression="ColumnB";Descending=$false}, `
@{Expression={[DateTime]::ParseExact($_.ColumnC,"dd-MM-yyyy HH:mm:ss",$null)};Descending=$true} | `
Sort-Object ColumnA, ColumnB -unique `
| Export-CSV OutputData.csv -NoTypeInformation
First the file is read, then everything is sorted by all 3 columns, the second Sort-Object call is supposed to then take the first row of each. However, Sort-Object with the -unique switch seems to pick a random row, rather than the first one. Thus this does get one row for each AB combination, but not the one corresponding to most recent C.
Any suggestions for improvement? The data set is very large, so going through the file line by line is awkward; I would prefer a PowerShell solution.

You should look into Group-Object. I didn't create a sample CSV (you should provide one :-) ), so I haven't tested this, but I think it should work:
Import-CSV InputData.csv | `
Select-Object -Property *, @{Label="DateTime";Expression={[DateTime]::ParseExact($_.ColumnC,"dd-MM-yyyy HH:mm:ss",$null)}} | `
Group-Object ColumnA, ColumnB | `
% {
    $sum = ($_.Group | Measure-Object -Property ColumnD -Sum).Sum
    $_.Group | Sort-Object -Property "DateTime" -Descending | Select-Object -First 1 -Property *, @{name="SumD";e={ $sum } } -ExcludeProperty DateTime
} | Export-CSV OutputData.csv -NoTypeInformation
This returns the input columns plus a SumD column holding the sum of ColumnD for each group; the helper DateTime column is excluded from the output.
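A reduced version of this idea, without the sum, can be run against a few made-up rows to confirm that the most recent row per ColumnA/ColumnB combination is the one that survives. The sample values and the $latest name are hypothetical, and the invariant culture replaces $null so the sketch does not depend on regional settings.

```powershell
$culture = [System.Globalization.CultureInfo]::InvariantCulture

# Made-up rows: two for the combination x/1, one for y/2.
$rows = @'
ColumnA,ColumnB,ColumnC,ColumnD
x,1,01-02-2020 10:00:00,5
x,1,03-02-2020 09:30:00,7
y,2,02-02-2020 08:00:00,1
'@ | ConvertFrom-Csv

# Within each group, sort newest first by the parsed timestamp and keep one row.
$latest = $rows | Group-Object ColumnA, ColumnB | ForEach-Object {
    $_.Group |
        Sort-Object { [datetime]::ParseExact($_.ColumnC, 'dd-MM-yyyy HH:mm:ss', $culture) } -Descending |
        Select-Object -First 1
}

$latest
```

Sorting inside each group makes the choice of row explicit, so it does not rely on which duplicate Sort-Object -Unique happens to keep.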
