Bigger txt file with fewer rows on Windows after PowerShell Select-String

I applied several filters to a text file using PowerShell's Get-Content and the -notmatch operator, and then I spooled the result to a file.
gc file.txt | ?{$_ -notmatch 'excl1|excl2|excl3'} | out-file newfile.txt
What happens is that the output file (newfile.txt) has fewer lines, but Windows reports it as larger than file.txt.
Has anyone encountered this behavior? How can I get Windows to report the correct size? I checked the number of rows: the file with fewer rows is reported as bigger in size.

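I'm certain you have an encoding issue. In Windows PowerShell, Get-Content reads a file without a byte-order mark as single-byte text (the system's default ANSI code page, which covers ASCII), whereas Out-File writes Unicode (UTF-16LE), which takes two bytes for each such character.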
From TechNet
-Encoding
Specifies the type of character encoding used in the file. Valid values are "Unicode", "UTF7", "UTF8", "UTF32", "ASCII", "BigEndianUnicode", "Default", and "OEM". "Unicode" is the default.
Use -Encoding ascii with Out-File, or just use Set-Content, as it is the partner of Get-Content.
Get-Content file.txt | ?{$_ -notmatch 'excl1|excl2|excl3'} |
    Out-File -Encoding ascii newfile.txt
# or
Get-Content file.txt | ?{$_ -notmatch 'excl1|excl2|excl3'} |
    Set-Content newfile.txt
Coming from the other direction, if you have issues with your input file, Get-Content in PowerShell v3 and above also supports -Encoding.
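To see the effect for yourself, here is a minimal sketch that writes the same filtered lines once as UTF-16 ("Unicode") and once as ASCII, then compares the file sizes. The sample lines, the scratch folder and all variable names are made up for illustration. Exact byte counts depend on the platform's newline format, so only the comparison matters.

```powershell
# Scratch folder so the demo does not touch real files (hypothetical sample data).
$dir = Join-Path ([IO.Path]::GetTempPath()) ([Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null

$source = Join-Path $dir 'file.txt'
Set-Content -Path $source -Value 'alpha', 'excl1 beta', 'gamma' -Encoding ascii

# Same filter as in the question: drop lines matching any excluded term.
$kept = Get-Content $source | Where-Object { $_ -notmatch 'excl1|excl2|excl3' }

# Write the identical result with two different encodings.
$utf16File = Join-Path $dir 'utf16.txt'
$asciiFile = Join-Path $dir 'ascii.txt'
$kept | Out-File -Encoding unicode $utf16File
$kept | Out-File -Encoding ascii $asciiFile

$sourceSize = (Get-Item $source).Length
$utf16Size  = (Get-Item $utf16File).Length
$asciiSize  = (Get-Item $asciiFile).Length

"source: $sourceSize bytes, UTF-16: $utf16Size bytes, ASCII: $asciiSize bytes"
```

The UTF-16 file should come out larger than the source even though it has fewer lines, while the ASCII file comes out smaller.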

Related

How to cat all files in a directory except first line?

I have around 3,000 gzip folders that I need to concatenate into one file except for the first line of each file. I also need to have the word "break" in between each file.
Concatenating Files And Insert a Word In Between Files
I asked this a few days ago. I need to do the exact same thing, just taking out the first line. Any help would be appreciated.
For this, you'll need to pipe each file to Get-Content individually and omit the -Raw switch:
Get-ChildItem *gz | ForEach-Object {
    $_ | Get-Content | Select-Object -Skip 1
    'break'
} | Select-Object -SkipLast 1 | Set-Content -Encoding utf8 allmethods.txt
The Select-Object -Skip 1 command will discard the first line from each file read, and Select-Object -SkipLast 1 will remove the last trailing break from the entire output stream.
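A small self-contained sketch of the same pattern, assuming two plain-text sample files in a scratch folder (the file names, contents and the Sort-Object step for a predictable order are illustrative additions, not part of the original setup):

```powershell
# Scratch folder with two hypothetical sample files, each starting with a header line.
$dir = Join-Path ([IO.Path]::GetTempPath()) ([Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'a.txt') -Value 'header A', 'a1', 'a2'
Set-Content -Path (Join-Path $dir 'b.txt') -Value 'header B', 'b1'

$result = Get-ChildItem -Path $dir -Filter *.txt | Sort-Object Name | ForEach-Object {
    $_ | Get-Content | Select-Object -Skip 1   # drop the first line of this file
    'break'                                    # separator emitted after every file
} | Select-Object -SkipLast 1                  # remove the separator after the last file

$result
```

With these sample files the stream contains a1, a2, break, b1, in that order. Note that Select-Object -SkipLast requires PowerShell 5.0 or later.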

How can i convert a sed command to its PowerShell equivalent?

Editor's note:
The macOS sed command below performs an in-place (-i '') string-substitution (string-replacement) operation on the given file, i.e. it transforms the file's existing content. The specific substitution shown, s/././g, replaces all non-newline characters (regex metacharacter .) with verbatim . characters, so be careful when trying the command yourself.
While the intended question may ultimately be a different one, as written the question is well-defined, and can be answered to show the full PowerShell equivalent of the sed command (a partial translation is in the question itself), notably including the in-place updating of the file.
I have a Mac command that I need to run on Windows. I have no experience with Mac whatsoever.
sed -i '' 's/././g' dist/index.html
After some research, I found that I should use
get-content path | %{$_ -replace 'expression','replace'}
but I can't get it to work yet.
Note:
The assumption is that s/././g in your sed command is just an example string substitution that you've chosen as a placeholder for real-world ones. What this example substitution does is replace all characters other than newlines (regex .) with a verbatim . character. Therefore, do not run the commands below as-is on your files, unless you're prepared to have their characters turn into . characters.
The direct translation of your sed command, which performs in-place updating of the input file, is (ForEach-Object is the name of the cmdlet that the built-in % alias refers to):
(Get-Content dist/index.html) |
    ForEach-Object { $_ -replace '.', '.' } |
    Set-Content dist/index.html -WhatIf
Note: The -WhatIf common parameter in the command above previews the operation. Remove -WhatIf once you're sure the operation will do what you want.
Or, more efficiently:
(Get-Content -ReadCount 0 dist/index.html) -replace '.', '.' | Set-Content dist/index.html -WhatIf
-ReadCount 0 reads the lines into a single array before outputting the result, instead of the default behavior of emitting each line one by one to the pipeline.
Or, even more efficiently, if line-by-line processing isn't required and the -replace operation can be applied to the entire file content, using the -Raw switch:
(Get-Content -Raw dist/index.html) -replace '.', '.' | Set-Content -NoNewLine dist/index.html -WhatIf
Note:
-replace, the regular-expression-based string replacement operator, uses the syntax <input> -replace <regex>, <replacement> and invariably performs global replacements (as requested by the g option in your sed command), i.e. it replaces all matches it finds.
Unlike sed's regular expressions, however, PowerShell's are case-insensitive by default; to make them case-sensitive, use the -creplace operator variant.
Note the required (...) around the Get-Content call, which ensures that the file is read into memory in full and closed again first, which is the prerequisite for being able to rewrite the file with Set-Content in the same pipeline.
Caveat: While unlikely, this approach can result in data loss, namely if the write operation that saves back to the input file gets interrupted.
You may need -Encoding with Set-Content to ensure that the rewritten file uses the same character encoding as the original content - Get-Content reads text files into .NET strings recognizing a variety of encodings, and no information is retained as to what encoding was encountered.
Except with the Get-Content -Raw / Set-Content -NoNewLine solution, which preserves the original newline format, the output file will use the platform-native newline format - CRLF (\r\n) on Windows, LF (\n) on Unix-like platforms - irrespective of which format the input file originally used.
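As an illustration of the notes above, here is a minimal sketch with a more realistic substitution than s/././g. The temp file, its contents and the patterns are invented for the demo. It contrasts -replace with -creplace and shows that a literal dot has to be escaped as \. in the regex.

```powershell
# Hypothetical sample file in the temp folder; nothing here touches dist/index.html.
$file = Join-Path ([IO.Path]::GetTempPath()) ("sed-demo-{0}.html" -f [Guid]::NewGuid())

Set-Content -Path $file -Value '<p>Foo foo</p>', '<p>v1.0</p>'
# -replace is case-insensitive, so both "Foo" and "foo" are replaced.
(Get-Content $file) -replace 'foo', 'bar' | Set-Content $file
$insensitive = Get-Content $file

Set-Content -Path $file -Value '<p>Foo foo</p>', '<p>v1.0</p>'
# -creplace is case-sensitive (like sed); '\.' matches only a literal dot.
(Get-Content $file) -creplace 'foo', 'bar' -replace '\.', '_' | Set-Content $file
$sensitive = Get-Content $file

Remove-Item $file

$insensitive
$sensitive
```

The first pass turns the first line into <p>bar bar</p>. The second pass leaves Foo alone, giving <p>Foo bar</p>, and changes v1.0 to v1_0.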

How to get actual separate lines in PowerShell's Write-Output using a newline character

I tried to create a multiline input to practice Select-String, expecting only a single matching line to be output, like I would normally see with an echo -e ... | grep combination. But the following command still gives me both lines. It seems the newline is only interpreted on final output, and Select-String still gets a single line of input.
Write-Output "Hi`nthere" | Select-String -Pattern "i"
#
# Hi
# there
#
#
while I would expect it to return just
Hi
I used this version of PowerShell:
Get-Host | Select-Object Version
# 5.1.19041.906
For comparison, this is how I would test commands on multiline input in bash: I usually generate multiple lines with echo -e, and grep then processes the individual lines.
echo -e "Hi\nthere" | grep "i"
# Hi
I hope someone can explain what I am missing here in PowerShell. This seems like a basic misconception on my part, and I was not sure what to Google for.
Edits
[edit 1]: problem also for line ending with carriage return
Write-Output "Hi`r`nthere" | Select-String -Pattern "i"
I saw that separating with commas works as valid multiline input. So maybe the question is how to convert from newline to actual input line separation.
Write-Output "Hi","there" | Select-String -Pattern "i"
# Hi
[edit 2]: from edit 1 I found this stackoverflow-answer, where for me it now works with
Write-Output "Hi`nthere".Split([Environment]::NewLine) | Select-String -Pattern "i"
# or
Write-Output "Hi`nthere".Split("`n") | Select-String -Pattern "i"
Still, could someone please explain why this is necessary here, but not in bash?
All the information is in the comments, but let me summarize and complement it:
PowerShell's pipeline is object-based, and Select-String operates on each input object, even if that happens to be a single multi-line string object, such as the one output by Write-Output "Hi`nthere".
It is only the output from external programs that is streamed line by line.
Therefore, you must split your multi-line string into individual lines in order to match them as such.
The best idiom for that is -split '\r?\n', because it recognizes both Windows-format CRLF and Unix-format LF-only newlines:
"Hi`nthere" -split '\r?\n' | Select-String -Pattern "i"
Note:
I've omitted Write-Output in favor of PowerShell's implicit output behavior (see the bottom section of this answer for more information).
For more information on how -split '\r?\n' works, see this answer.
Select-String doesn't directly output the matching lines (strings); instead it wraps them in match-information objects that provide metadata about each match. To get just the matching line (string):
In PowerShell (Core) 7+, add the -Raw switch.
In Windows PowerShell, pipe to ForEach-Object Line or wrap the entire call in (...).Line
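The points above can be sketched as follows. The sample string and variable names are illustrative. The ForEach-Object Line form is used because it works in both Windows PowerShell and PowerShell 7+.

```powershell
# One multi-line string with mixed LF and CRLF newlines (made-up sample input).
$text = "Hi`nthere`r`nthis"

# Without splitting, Select-String sees ONE input object and matches it as a whole.
$whole = @($text | Select-String -Pattern 'i')

# After splitting, each line is a separate input object and is matched on its own.
$perLine = @($text -split '\r?\n' | Select-String -Pattern 'i')

# Extract just the matching strings from the match-information objects.
$lines = $perLine | ForEach-Object Line

"unsplit matches: $($whole.Count); split matches: $($perLine.Count)"
$lines
```

Here the unsplit input yields a single match object covering the entire string, while the split input yields two matches, Hi and this.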

Size of the sorted file is double than original file in powershell

I have a powershell script, that reads file content, sorts it and writes output to new file. Following is the script:
get-content $inputFile | sort > $sortedFile
The output in the file is sorted properly, but the output file ($sortedFile) is twice as large as the input file ($inputFile). Note: There are no duplicate or extra lines in the output file.
Any help or ideas regarding this will be helpful.
Most likely the input file is ASCII-encoded, while redirection with > writes Unicode (UTF-16LE) by default in Windows PowerShell, which uses two bytes per character.
Instead of using > for redirection, you can use Out-File and specify an encoding.
get-content $inputFile | sort | out-file -encoding ASCII $sortedFile
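If you want to confirm that encoding is the cause, one option is to look at the first bytes of the output file. The sketch below uses made-up sample data and a hypothetical helper, Test-Utf16LeBom, which checks for the FF FE byte-order mark that Out-File -Encoding unicode writes. It is a simplified check and does not distinguish UTF-16LE from UTF-32LE, whose mark starts with the same two bytes.

```powershell
# Scratch folder and sample data (both hypothetical).
$dir = Join-Path ([IO.Path]::GetTempPath()) ([Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
$utf16File = Join-Path $dir 'sorted-utf16.txt'
$asciiFile = Join-Path $dir 'sorted-ascii.txt'

$sorted = 'pear', 'apple', 'fig' | Sort-Object
$sorted | Out-File -Encoding unicode $utf16File   # what > does by default in Windows PowerShell
$sorted | Out-File -Encoding ascii $asciiFile

# Hypothetical helper: returns $true if the file starts with the bytes FF FE.
function Test-Utf16LeBom {
    param([string]$Path)
    $bytes = [IO.File]::ReadAllBytes($Path)
    ($bytes.Length -ge 2) -and ($bytes[0] -eq 0xFF) -and ($bytes[1] -eq 0xFE)
}

$utf16HasBom = Test-Utf16LeBom $utf16File
$asciiHasBom = Test-Utf16LeBom $asciiFile

"UTF-16 file has BOM: $utf16HasBom; ASCII file has BOM: $asciiHasBom"
```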

Windows : How to list files recursively with size and last access date?

I need a simple way to create a list of all files in a certain folder. (recursively)
Each file must be in a single line. I also need the file size and the last access date in the same line, separated by a special character.
The output (textfile) should look like this:
c:\folder\file1.txt|400|2012-11-12 15:23:08
c:\folder\file2.txt|200|2012-11-12 15:23:08
c:\folder\file3.txt|100|2012-11-12 15:23:08
c:\folder\sub folder\file4.txt|500|2012-11-12 15:23:08
'Dir' seems not to be an option, because the German Special characters get messed up that way. (öäüß)
Powershell handles the special characters well, but I couldn't make it so that the information for one file ends up in a single line:
get-childitem D:\temp -rec | where {!$_.PSIsContainer} | foreach-object -process {$_.FullName, $_.LastWriteTime, $_.Length}
Try this:
get-childitem D:\temp -rec | where {!$_.PSIsContainer} |
    select-object FullName, LastWriteTime, Length | export-csv -notypeinformation -delimiter '|' -path file.csv
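If the exact layout from the question is needed (path|size|last access date, with no header row or quoting), a format string is an alternative to Export-Csv. This is only a sketch: it runs against a scratch folder with two invented files instead of D:\temp, uses LastAccessTime because the question asks for the last access date, and the output name filelist.txt is an assumption.

```powershell
# Scratch folder standing in for D:\temp (hypothetical sample files).
$root = Join-Path ([IO.Path]::GetTempPath()) ([Guid]::NewGuid().ToString())
$sub  = Join-Path $root 'sub folder'
New-Item -ItemType Directory -Path $sub -Force | Out-Null
[IO.File]::WriteAllText((Join-Path $root 'file1.txt'), 'hello')   # 5 bytes
[IO.File]::WriteAllText((Join-Path $sub 'file2.txt'), 'hi')       # 2 bytes

$lines = Get-ChildItem -Path $root -Recurse |
    Where-Object { -not $_.PSIsContainer } |
    Sort-Object FullName |
    ForEach-Object {
        # Invariant culture keeps the date layout independent of regional settings.
        $stamp = $_.LastAccessTime.ToString('yyyy-MM-dd HH:mm:ss', [cultureinfo]::InvariantCulture)
        '{0}|{1}|{2}' -f $_.FullName, $_.Length, $stamp
    }

# UTF-8 output so that characters such as öäüß survive.
$lines | Set-Content -Encoding utf8 (Join-Path $root 'filelist.txt')
$lines
```

Each file ends up on a single line, for example a path followed by |5| and a timestamp for file1.txt.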
