I have around 3,000 gzip files that I need to concatenate into one file, skipping the first line of each file. I also need the word "break" between each file.
Concatenating Files And Insert a Word In Between Files
I asked this a few days ago. I need to do the exact same thing, just taking out the first line. Any help would be appreciated.
For this, you'll need to pipe each file to Get-Content individually and omit the -Raw switch:
Get-ChildItem *gz | ForEach-Object {
    $_ | Get-Content | Select-Object -Skip 1
    'break'
} | Select-Object -SkipLast 1 | Set-Content -Encoding utf8 allmethods.txt
The Select-Object -Skip 1 command will discard the first line from each file read, and Select-Object -SkipLast 1 will remove the last trailing break from the entire output stream.
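A minimal in-memory sketch of that behavior, with hypothetical arrays standing in for the lines of each file (the names $files and $result are illustrative and not part of the original answer):

```powershell
# Each inner array is a hypothetical stand-in for the lines of one file.
$files = @('header1', 'a', 'b'), @('header2', 'c'), @('header3', 'd')

$result = $files | ForEach-Object {
    $_ | Select-Object -Skip 1   # drop this "file's" first line
    'break'                      # emit the separator after each file
} | Select-Object -SkipLast 1    # drop the separator after the last file

$result -join ','   # a,b,break,c,break,d
```

Note that -SkipLast requires PowerShell 5.0 or later.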
Related
I have around 3000 .gz files that I need to concatenate with the word "break" in between each file in PowerShell.
cat *gz > allmethods.txt
This concatenates all my files but does not leave any space in between. I need to add a word in between each file. Any help would be appreciated.
Try the following:
Get-Content -Raw *gz |
ForEach-Object { $_ + 'break' } |
Set-Content -Encoding utf8 allmethods.txt
On Windows, cat is a built-in alias for the Get-Content cmdlet; -Raw reads each matching file in full, as a single, multiline string.
The ForEach-Object call concatenates each file's content, reflected in the automatic $_ variable, with the verbatim string break and outputs the result.
Note: This assumes that each input file has a trailing newline and that you don't want an empty line before each occurrence of break; to in effect insert a newline between the file's content and break, use $_; 'break' instead.
The last file's content will also be followed by break.
The Set-Content call saves all strings it receives to the specified output file, using the specified encoding via the -Encoding parameter - adjust as needed.
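To see what -Raw and the string concatenation do, here is a small sketch that works on two throwaway files in a scratch directory. The directory, the file names a.gz and b.gz, and their contents are all hypothetical stand-ins chosen for illustration:

```powershell
# Hypothetical scratch directory with two small sample files.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.IO.Path]::GetRandomFileName())
$null = New-Item -ItemType Directory -Path $dir
Set-Content -Path (Join-Path $dir 'a.gz') -Value 'one'   # Set-Content appends a trailing newline
Set-Content -Path (Join-Path $dir 'b.gz') -Value 'two'

# -Raw yields ONE string per file (trailing newline included),
# so 'break' lands on the line right after each file's content.
$joined = @(Get-Content -Raw (Join-Path $dir '*gz') | ForEach-Object { $_ + 'break' })

$joined.Count   # 2: one string per input file

# Clean up the scratch directory.
Remove-Item -Recurse -Force $dir
```

Each element of $joined ends in break, including the last one, which matches the caveat above about the final file.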
I tried to create multiline input to practice Select-String, expecting only a single matching line to be output, as I would normally see with an echo -e ... | grep combination. But the following command still gives me both lines. It seems the newline is only interpreted on final output, and Select-String still gets a single line of input.
Write-Output "Hi`nthere" | Select-String -Pattern "i"
#
# Hi
# there
#
#
while I would expect it to return just
Hi
I used this version of PowerShell:
Get-Host | Select-Object Version
# 5.1.19041.906
For comparison, this is how I would test commands on multiline input in bash: I usually generate multiple lines with echo -e, and grep then processes the individual lines.
echo -e "Hi\nthere" | grep "i"
# Hi
I hope someone can explain what I am missing here in PowerShell. This seems like a basic misconception on my part, and I was not sure what to Google for.
Edits
[edit 1]: problem also for line ending with carriage return
Write-Output "Hi`r`nthere" | Select-String -Pattern "i"
I saw that separating with commas works as valid multiline input. So maybe the question is how to convert from newline to actual input line separation.
Write-Output "Hi","there" | Select-String -Pattern "i"
# Hi
[edit 2]: from edit 1 I found this stackoverflow-answer, where for me it now works with
Write-Output "Hi`nthere".Split([Environment]::NewLine) | Select-String -Pattern "i"
# or
Write-Output "Hi`nthere".Split("`n") | Select-String -Pattern "i"
Still, could someone please explain why this is relevant here, but not in bash?
All the information is in the comments, but let me summarize and complement it:
PowerShell's pipeline is object-based, and Select-String operates on each input object, even if that happens to be a single multi-line string object, such as the one output by Write-Output "Hi`nthere".
It is only the output from external programs that is streamed line by line.
Therefore, you must split your multi-line string into individual lines in order to match them as such.
The best idiom for that is -split '\r?\n', because it recognizes both Windows-format CRLF and Unix-format LF-only newlines:
"Hi`nthere" -split '\r?\n' | Select-String -Pattern "i"
Note:
I've omitted Write-Output in favor of PowerShell's implicit output behavior (see the bottom section of this answer for more information).
For more information on how -split '\r?\n' works, see this answer.
Select-String doesn't directly output the matching lines (strings); instead it wraps them in match-information objects that provide metadata about each match. To get just the matching line (string):
In PowerShell (Core) 7+, add the -Raw switch.
In Windows PowerShell, pipe to ForEach-Object Line or wrap the entire call in (...).Line.
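The points above can be sketched as follows; the variable names $text, $whole, and $perLine are illustrative only:

```powershell
$text = "Hi`nthere"

# Unsplit: Select-String sees ONE input object, so the whole two-line string matches.
$whole = @($text | Select-String -Pattern 'i')
$whole.Count          # 1
$whole[0].Line        # the entire string, both lines

# Split first, then extract the plain strings via the .Line property
# (this form works in both Windows PowerShell and PowerShell 7+).
$perLine = @($text -split '\r?\n' | Select-String -Pattern 'i' | ForEach-Object Line)
$perLine              # Hi
```

The @(...) wrapper is only there so that .Count behaves predictably even when there is a single result.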
I have a file having key-value data in it. I have to get the value of a specific key in that file. I have the Linux equivalent command:
File:
key1=val1
key2=val2
..
Command:
cat path/file | grep 'key1' | awk -F '=' '{print $2}'
Output:
val1
I want to achieve the same output on Windows as well. I don't have any experience working in PowerShell, but I tried the following:
Get-Content "path/file" | Select-String -Pattern 'key1' -AllMatches
But I'm getting output like this:
key1=val1
What am I doing wrong here?
<# required powershell version 5.1 or later
@'
key1=val1
key2=val2
'@ | out-file d:\temp.txt
#>
(Get-Content d:\temp.txt | ConvertFrom-StringData).key1
Note:
With your specific input format (key=value lines), Алексей Семенов's helpful answer offers the simplest solution, using ConvertFrom-StringData; note that it ignores whitespace around = and trailing whitespace after the value.
The answer below focuses generally on how to implement grep and awk-like functionality in PowerShell.
It is not the direct equivalent of your approach, but a faster and PowerShell-idiomatic solution using a switch statement:
# Create a sample file
@'
key1=val1
key2=val2
'@ > sample.txt
# -> 'val1'
switch -Regex -File ./sample.txt { '^\s*key1=(.*)' { $Matches[1]; break } }
The -Regex option implicitly performs a -match operation on each line of the input file (thanks to -File), and the results are available in the automatic $Matches variable.
$Matches[1] therefore returns what the first (and only) capture group ((...)) in the regex matched; break stops processing instantly.
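The same switch logic can be tried without a file, since switch also enumerates an array passed as its input. This is only a sketch; the $lines array is hypothetical sample data, including a second key1 line to show that break stops at the first match:

```powershell
# Hypothetical in-memory lines; with -File, switch would read them from a file instead.
$lines = 'key2=val2', 'key1=val1', 'key1=later'

$value = switch -Regex ($lines) {
    '^\s*key1=(.*)' { $Matches[1]; break }   # emit capture group 1, then stop
}

$value   # val1
```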
A more concise, but slower option is to combine the -match and -split operators, but note that this will only work as intended if only one line matches:
((Get-Content ./sample.txt) -match '^\s*key1=' -split '=')[1]
Also note that this invariably involves reading the entire file, by loading all lines into an array up front via Get-Content.
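The reason this works is that -match behaves differently depending on its left-hand side. A short sketch with hypothetical in-memory lines in place of the Get-Content output:

```powershell
$lines = 'key1=val1', 'key2=val2'

# Scalar left-hand side: -match returns a Boolean.
$isMatch = 'key1=val1' -match '^\s*key1='
$isMatch                                 # True

# Array left-hand side: -match acts as a filter and returns the matching elements.
$matching = @($lines -match '^\s*key1=')
$matching                                # key1=val1

# Splitting the single matching line on '=' yields the key and the value.
$value = ($lines -match '^\s*key1=' -split '=')[1]
$value                                   # val1
```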
A comparatively slow version - due to using a cmdlet and thereby implicitly the pipeline - that fixes your attempt:
(Select-String -List '^\s*key1=(.*)' ./sample.txt).Matches[0].Groups[1].Value
Note:
Select-String outputs wrapper objects of type Microsoft.PowerShell.Commands.MatchInfo that wrap metadata around the matching strings rather than returning them directly (the way that grep does); .Matches is the property that contains the details of the match, which allows accessing what the capture group ((...)) in the regex captured, but it's not exactly obvious how to access that information.
The -List switch ensures that processing stops at the first match, but note that this only works with a direct file argument rather than with piping a file's lines individually via Get-Content.
Note that -AllMatches is for finding multiple matches in a single line (input object), and therefore not necessary here.
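To illustrate what -AllMatches changes, here is a sketch using a hypothetical single line that contains the pattern twice:

```powershell
$line = 'key1=a;key1=b'

$default = $line | Select-String -Pattern 'key1'
$default.Matches.Count    # 1: only the first match on the line is recorded

$all = $line | Select-String -Pattern 'key1' -AllMatches
$all.Matches.Count        # 2: every match on the line is recorded
```

In both cases a single MatchInfo object is returned, because there is only one input line; only the contents of its .Matches property differ.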
Another slow solution that uses ForEach-Object with a script block in which each line is -split into the key and value part, as suggested by Jeroen Mostert:
Get-Content ./sample.txt | ForEach-Object {
    # Limit the split to 2 parts so that a value containing '=' stays intact.
    $key, $val = $_ -split '=', 2
    if ($key -eq 'key1') { $val }
}
Caveat: This invariably processes all lines, even after the key of interest was found.
To prevent that, you can append | Select-Object -First 1 to the command.
Unfortunately, as of PowerShell 7.1 there is no way to directly exit a pipeline on demand from a script block; see long-standing GitHub feature request #3821.
Note that break does not work as intended - except if you wrap your pipeline in a dummy loop statement (such as do { ... } while ($false)) to break out of.
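Both ways of stopping early can be observed with a small sketch. The number range and the lists $processed and $seen are hypothetical; they only serve to record which inputs the script block actually handled:

```powershell
# Track which inputs the script block actually processed.
$processed = [System.Collections.Generic.List[int]]::new()

$first = 1..10 | ForEach-Object {
    $processed.Add($_)
    if ($_ -ge 3) { $_ }
} | Select-Object -First 1

$first             # 3
$processed.Count   # 3: upstream processing stopped after the first output

# Dummy-loop variant: break exits the enclosing do loop and thereby the pipeline.
$seen = [System.Collections.Generic.List[int]]::new()
do {
    1..10 | ForEach-Object {
        $seen.Add($_)
        if ($_ -eq 3) { break }
    }
} while ($false)

$seen.Count        # 3
```

Without the enclosing do loop, break would instead terminate whatever loop (or script) happens to surround the pipeline, which is why it should not be used on its own.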
Say I have several text files that contain the word 'not' and I want to find them and create a file containing the matches. In Linux, I might do
grep -r not *.txt > found_nots.txt
This works fine. In PowerShell, the following echoes what I want to the screen
get-childitem *.txt -recurse | select-string not
However, if I pipe this to a file:
get-childitem *.txt -recurse | select-string not > found_nots.txt
It runs for ages. I eventually CTRL-C to exit and take a look at the found_nots.txt file which is truly huge. It looks as though PowerShell includes the output file as one of the files to search. Every time it adds more content, it finds more to add.
How can I stop this behavior and make it behave more like the Unix version?
Use the -Exclude option.
get-childitem *.txt -Exclude 'found_nots.txt' -recurse | select-string not > found_nots.txt
The easiest solution is to give the output file a different extension, so that the *.txt filter no longer picks it up.
I need a simple way to create a list of all files in a certain folder. (recursively)
Each file must be in a single line. I also need the file size and the last access date in the same line, separated by a special character.
The output (textfile) should look like this:
c:\folder\file1.txt|400|2012-11-12 15:23:08
c:\folder\file2.txt|200|2012-11-12 15:23:08
c:\folder\file3.txt|100|2012-11-12 15:23:08
c:\folder\sub folder\file4.txt|500|2012-11-12 15:23:08
'Dir' does not seem to be an option, because the German special characters get messed up that way. (öäüß)
Powershell handles the special characters well, but I couldn't make it so that the information for one file ends up in a single line:
get-childitem D:\temp -rec | where {!$_.PSIsContainer} | foreach-object -process {$_.FullName, $_.LastWriteTime, $_.Length}
Try this:
get-childitem D:\temp -rec | where {!$_.PSIsContainer} |
    select-object FullName, LastWriteTime, Length |
    export-csv -notypeinformation -delimiter '|' -path file.csv
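Note that Export-Csv writes a header row and, by default, wraps every field in double quotes. If the exact unquoted layout from the question is needed, one possible alternative is to build each line with the -f format operator. The sketch below is an assumption-laden illustration, not part of the original answer: $file is a hypothetical stand-in for one Get-ChildItem result, and the property order (path, size, date) follows the sample output in the question.

```powershell
# Hypothetical stand-in for one file object returned by Get-ChildItem.
$file = [pscustomobject]@{
    FullName      = 'c:\folder\file1.txt'
    Length        = 400
    LastWriteTime = [datetime]::new(2012, 11, 12, 15, 23, 8)
}

# Format the timestamp culture-independently, then join the fields with '|'.
$inv  = [cultureinfo]::InvariantCulture
$line = '{0}|{1}|{2}' -f $file.FullName, $file.Length,
        $file.LastWriteTime.ToString('yyyy-MM-dd HH:mm:ss', $inv)

$line   # c:\folder\file1.txt|400|2012-11-12 15:23:08
```

With real files, the same expression could go inside a ForEach-Object script block (using $_ in place of $file), with the result piped to Set-Content.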