I have around 3000 .gz files that I need to concatenate with the word "break" in between each file in PowerShell.
cat *gz > allmethods.txt
This concatenates all my files but does not leave any space in between. I need to add a word in between each file. Any help would be appreciated.
Try the following:
Get-Content -Raw *gz |
ForEach-Object { $_ + 'break' } |
Set-Content -Encoding utf8 allmethods.txt
On Windows, cat is a built-in alias for the Get-Content cmdlet; -Raw reads each matching file in full, as a single, multiline string.
The ForEach-Object call concatenates each file's content - reflected in the automatic $_ variable - with the verbatim string break and outputs the result.
Note: This assumes that each input file has a trailing newline and that you don't want an empty line before each occurrence of break; to in effect insert a newline between the file's content and break, use $_; 'break' instead.
The last file's content will also be followed by break.
The Set-Content call saves all strings it receives to the specified output file, using the specified encoding via the -Encoding parameter - adjust as needed.
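As a minimal sketch of the variant mentioned in the note above (inserting a newline between each file's content and break), outputting $_ and 'break' as two separate objects makes Set-Content write them on separate lines:

```powershell
# 'break' is emitted as its own output object, so Set-Content
# writes it on its own line, after each file's content:
Get-Content -Raw *gz |
  ForEach-Object { $_; 'break' } |
  Set-Content -Encoding utf8 allmethods.txt
```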
Related
I have around 3,000 gzip files that I need to concatenate into one file, except for the first line of each file. I also need to have the word "break" in between each file.
Concatenating Files And Insert a Word In Between Files
I asked this a few days ago. I need to do the exact same thing, just taking out the first line. Any help would be appreciated.
For this, you'll need to pipe each file to Get-Content individually and omit the -Raw switch:
Get-ChildItem *gz | ForEach-Object {
  $_ | Get-Content | Select-Object -Skip 1
  'break'
} | Select-Object -SkipLast 1 | Set-Content -Encoding utf8 allmethods.txt
The Select-Object -Skip 1 command will discard the first line from each file read, and Select-Object -SkipLast 1 will remove the last trailing break from the entire output stream.
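To see the two Select-Object switches in isolation, here is a small, self-contained illustration using an array of strings in place of file lines:

```powershell
'line1', 'line2', 'line3' | Select-Object -Skip 1      # -> line2, line3
'line1', 'line2', 'line3' | Select-Object -SkipLast 1  # -> line1, line2
```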
Editor's note:
The macOS sed command below performs an in-place (-i '') string-substitution (string-replacement) operation on the given file, i.e. it transforms the file's existing content. The specific substitution shown, s/././g, replaces all non-newline characters (regex metacharacter .) with verbatim . characters, so be careful when trying the command yourself.
While the intended question may ultimately be a different one, as written the question is well-defined, and can be answered to show the full PowerShell equivalent of the sed command (a partial translation is in the question itself), notably including the in-place updating of the file.
I have a mac command and i need it to run on windows. I have no experience in mac whatsoever.
sed -i '' 's/././g' dist/index.html
After research i found that i should use
get-content path | %{$_ -replace 'expression','replace'}
but can't get it to work yet.
Note:
The assumption is that s/././g in your sed command is just an example string substitution that you've chosen as a placeholder for real-world ones. What this example substitution does is replace all characters other than newlines (regex .) with a verbatim . character. Therefore, do not run the commands below as-is on your files, unless you're prepared to have all their characters turn into . characters.
The direct translation of your sed command, which performs in-place updating of the input file, is (ForEach-Object is the name of the cmdlet that the built-in % alias refers to):
(Get-Content dist/index.html) |
ForEach-Object { $_ -replace '.', '.' } |
Set-Content dist/index.html -WhatIf
Note: The -WhatIf common parameter in the command above previews the operation. Remove -WhatIf once you're sure the operation will do what you want.
Or, more efficiently:
(Get-Content -ReadCount 0 dist/index.html) -replace '.', '.' | Set-Content dist/index.html -WhatIf
-ReadCount 0 reads the lines into a single array before outputting the result, instead of the default behavior of emitting each line one by one to the pipeline.
Or, even more efficiently, if line-by-line processing isn't required and the -replace operation can be applied to the entire file content, using the -Raw switch:
(Get-Content -Raw dist/index.html) -replace '.', '.' | Set-Content -NoNewLine dist/index.html -WhatIf
Note:
-replace, the regular-expression-based string replacement operator, uses the syntax <input> -replace <regex>, <replacement> and invariably performs global replacement (as requested by the g option in your sed command), i.e. it replaces all matches it finds.
Unlike sed's regular expressions, however, PowerShell's are case-insensitive by default; to make them case-sensitive, use the -creplace operator variant.
Note the required (...) around the Get-Content call, which ensures that the file is read into memory in full and closed again first, which is the prerequisite for being able to rewrite the file with Set-Content in the same pipeline.
Caveat: While unlikely, this approach can result in data loss, namely if the write operation that saves back to the input file gets interrupted.
You may need -Encoding with Set-Content to ensure that the rewritten file uses the same character encoding as the original content - Get-Content reads text files into .NET strings recognizing a variety of encodings, and no information is retained as to what encoding was encountered.
Except with the Get-Content -Raw / Set-Content -NoNewLine solution, which preserves the original newline format, the output file will use the platform-native newline format - CRLF (\r\n) on Windows, LF (\n) on Unix-like platforms - irrespective of which format the input file originally used.
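The case-sensitivity difference noted above can be illustrated with a quick sketch (the input string and pattern are arbitrary examples):

```powershell
'ABC' -replace 'a', 'x'   # -> 'xBC' (case-insensitive by default)
'ABC' -creplace 'a', 'x'  # -> 'ABC' (case-sensitive: no match, string unchanged)
```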
I tried to create a multiline input to practice Select-String, expecting only a single matching line to be output, as I would normally see with an echo -e ... | grep combination. But the following command still gives me both lines. It seems the newline is only interpreted on final output and Select-String still receives a single line of input:
Write-Output "Hi`nthere" | Select-String -Pattern "i"
#
# Hi
# there
#
#
while I would expect it to return just
Hi
I used this version of PowerShell:
Get-Host | Select-Object Version
# 5.1.19041.906
Comparing with bash, I would do the following to test commands on multiline input: I usually generate multiple lines with echo -e, and grep then processes the individual lines.
echo -e "Hi\nthere" | grep "i"
# Hi
I hope someone can explain what I'm missing here in PowerShell. This problem seems like a basic misconception on my part, and I also wasn't sure what to Google for.
Edits
[edit 1]: problem also for line ending with carriage return
Write-Output "Hi`r`nthere" | Select-String -Pattern "i"
I saw that separating with commas works as valid multiline input. So maybe the question is how to convert from newline to actual input line separation.
Write-Output "Hi","there" | Select-String -Pattern "i"
# Hi
[edit 2]: from edit 1 I found this stackoverflow-answer, where for me it now works with
Write-Output "Hi`nthere".Split([Environment]::NewLine) | Select-String -Pattern "i"
# or
Write-Output "Hi`nthere".Split("`n") | Select-String -Pattern "i"
Still, may someone please explain why this matters here, but not in bash?
All the information is in the comments, but let me summarize and complement it:
PowerShell's pipeline is object-based, and Select-String operates on each input object - even if that happens to be a single multi-line string object, such as output by Write-Output "Hi`nthere"
It is only the output from external programs that is streamed line by line.
Therefore, you must split your multi-line string into individual lines in order to match them as such.
The best idiom for that is -split '\r?\n', because it recognizes both Windows-format CRLF and Unix-format LF-only newlines:
"Hi`nthere" -split '\r?\n' | Select-String -Pattern "i"
Note:
I've omitted Write-Output in favor of PowerShell's implicit output behavior (see the bottom section of this answer for more information).
For more information on how -split '\r?\n' works, see this answer.
Select-String doesn't directly output the matching lines (strings); instead it wraps them in match-information objects that provide metadata about each match. To get just the matching line (string):
In PowerShell (Core) 7+, add the -Raw switch.
In Windows PowerShell, pipe to ForEach-Object Line or wrap the entire call in (...).Line
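A minimal sketch of both approaches, using an ad-hoc two-string input in place of a split multi-line string:

```powershell
# PowerShell (Core) 7+: -Raw outputs the matching strings themselves.
'Hi', 'there' | Select-String -Raw -Pattern 'i'     # -> Hi

# Windows PowerShell: extract the .Line property from the match-information objects.
('Hi', 'there' | Select-String -Pattern 'i').Line   # -> Hi
```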
I have a file having key-value data in it. I have to get the value of a specific key in that file. I have the Linux equivalent command:
File:
key1=val1
key2=val2
..
Command:
cat path/file | grep 'key1' | awk -F '=' '{print $2}'
Output:
val1
I want to achieve the same output on windows as well. I don't have any experience working in power shell but I tried with the following:
Get-Content "path/file" | Select-String -Pattern 'key1' -AllMatches
But I'm getting output like this:
key1=val1
What am I doing wrong here?
<# requires PowerShell version 5.1 or later; create the sample file:
@'
key1=val1
key2=val2
'@ | Out-File d:\temp.txt
#>
(Get-Content d:\temp.txt | ConvertFrom-StringData).key1
Note:
With your specific input format (key=value lines), Алексей Семенов's helpful answer offers the simplest solution, using ConvertFrom-StringData; note that it ignores whitespace around = and trailing whitespace after the value.
The answer below focuses generally on how to implement grep and awk-like functionality in PowerShell.
It is not the direct equivalent of your approach, but a faster and PowerShell-idiomatic solution using a switch statement:
# Create a sample file
@'
key1=val1
key2=val2
'@ > sample.txt
# -> 'val1'
switch -Regex -File ./sample.txt { '^\s*key1=(.*)' { $Matches[1]; break } }
The -Regex option implicitly performs a -match operation on each line of the input file (thanks to -File), and the results are available in the automatic $Matches variable.
$Matches[1] therefore returns what the first (and only) capture group ((...)) in the regex matched; break stops processing instantly.
A more concise, but slower option is to combine the -match and -split operators, but note that this will only work as intended if only one line matches:
((Get-Content ./sample.txt) -match '^\s*key1=' -split '=')[1]
Also note that this invariably involves reading the entire file, by loading all lines into an array up front via Get-Content.
A comparatively slow version - due to using a cmdlet and thereby implicitly the pipeline - that fixes your attempt:
(Select-String -List '^\s*key1=(.*)' ./sample.txt).Matches[0].Groups[1].Value
Note:
Select-String outputs wrapper objects of type Microsoft.PowerShell.Commands.MatchInfo that wrap metadata around the matching strings rather than returning them directly (the way that grep does); .Matches is the property that contains the details of the match, which allows accessing what the capture group ((...)) in the regex captured, but it's not exactly obvious how to access that information.
The -List switch ensures that processing stops at the first match, but note that this only works with a direct file argument rather than with piping a file's lines individually via Get-Content.
Note that -AllMatches is for finding multiple matches in a single line (input object), and therefore not necessary here.
Another slow solution that uses ForEach-Object with a script block in which each line is -split into the key and value part, as suggested by Jeroen Mostert:
Get-Content ./sample.txt | ForEach-Object {
$key, $val = $_ -split '='
if ($key -eq 'key1') { $val }
}
Caveat: This invariably processes all lines, even after the key of interest was found.
To prevent that, you can append | Select-Object -First 1 to the command.
Unfortunately, as of PowerShell 7.1 there is no way to directly exit a pipeline on demand from a script block; see long-standing GitHub feature request #3821.
Note that break does not work as intended - except if you wrap your pipeline in a dummy loop statement (such as do { ... } while ($false)) to break out of.
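A sketch of that dummy-loop workaround, applied to the ForEach-Object solution above (assuming the same sample.txt file):

```powershell
# The do { ... } while ($false) wrapper gives break a loop to exit,
# which also stops the enclosed pipeline early.
do {
  Get-Content ./sample.txt | ForEach-Object {
    $key, $val = $_ -split '='
    if ($key -eq 'key1') { $val; break }
  }
} while ($false)
```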
I have a PowerShell script that reads file content, sorts it, and writes the output to a new file. Following is the script:
get-content $inputFile | sort > $sortedFile
The output in the file is sorted properly, but the output file ($sortedFile) is twice as large as the input file ($inputFile). Note: There are no duplicate or extra lines in the output file.
Any help or ideas regarding this will be helpful.
Most likely the input file is ASCII-encoded, while the default output encoding when using redirection is Unicode (UTF-16LE), which uses two bytes per character.
Instead of using > for redirection, you can use Out-File and specify an encoding:
get-content $inputFile | sort | out-file -encoding ASCII $sortedFile