Windows Batch Script to remove multiple special characters from a text file - windows

I have a text file which has a lot of special characters. From that text file, I would like to remove three special characters (~ œ <). Can someone please provide me with a script to address my need? I tried some scripts, but they don't seem to work for the character ~.

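You can use sed for this (it has to be downloaded separately on Windows). Note that this pattern removes every character that is not an ASCII letter or digit, including spaces, not just the three you listed: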
sed "s/[^a-zA-Z0-9]//g" file.txt
If you have Windows 7 or later, you can do something like this in PowerShell (this likewise removes every non-word character, including spaces and punctuation):
Get-Content file.txt | foreach { $_ -replace '[^\w\d]' } | Out-File -Encoding UTF8 file.new.txt
Or download Ruby for Windows and remove only the characters you list in the brackets:
C:\>ruby -ne 'print $_.gsub(/[~)œ\[\]<]/,"")' file
Thanks!
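For comparison, here is a minimal Python sketch that removes only the three characters named in the question (~, œ and <) and leaves everything else, including spaces and punctuation, untouched. The helper name strip_chars and the sample string are illustrative assumptions, not part of the answers above.

```python
# Characters to delete; taken from the question (~ œ <).
UNWANTED = "~œ<"

def strip_chars(text, unwanted=UNWANTED):
    # Hypothetical helper: str.maketrans with a third argument maps each
    # listed character to None, so translate() deletes it.
    return text.translate(str.maketrans("", "", unwanted))

# Made-up sample text for illustration.
sample = "price~ <tag> cœur"
cleaned = strip_chars(sample)
print(cleaned)
```

To apply this to a file, the file would have to be opened with its actual encoding, because œ is stored as different bytes in UTF-8 and in Windows-1252.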

Related

Concatenating Files And Insert a Word In Between Files

I have around 3000 .gz files that I need to concatenate with the word "break" in between each file in PowerShell.
cat *gz > allmethods.txt
This concatenates all my files but does not leave any space in between. I need to add a word in between each file. Any help would be appreciated.
Try the following:
Get-Content -Raw *gz |
ForEach-Object { $_ + 'break' } |
Set-Content -Encoding utf8 allmethods.txt
On Windows, cat is a built-in alias for the Get-Content cmdlet; -Raw reads each matching file in full, as a single, multiline string.
The ForEach-Object call concatenates each file's content, reflected in the automatic $_ variable, with the verbatim string break, and outputs the result.
Note: This assumes that each input file has a trailing newline and that you don't want an empty line before each occurrence of break; to in effect insert a newline between the file's content and break, use $_; 'break' instead.
The last file's content will also be followed by break.
The Set-Content call saves all strings it receives to the specified output file, using the specified encoding via the -Encoding parameter - adjust as needed.
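The concatenation logic described above can be sketched in Python as follows. This is only an illustration on in-memory strings: the function names and the sample contents are assumptions, and the second function shows the variant where the marker appears only between files rather than after the last one as well.

```python
def concat_with_marker(contents, marker="break"):
    # Mirrors the answer above: each file's text is followed directly by
    # the marker line, so the marker also appears after the last file.
    return "".join(text + marker + "\n" for text in contents)

def concat_marker_between(contents, marker="break"):
    # Variant (an assumption about what may be wanted): the marker line
    # is placed only between files, not after the final one.
    return (marker + "\n").join(contents)

# Made-up file contents, each ending in a trailing newline.
files = ["first\n", "second\n"]
after_each = concat_with_marker(files)
between_only = concat_marker_between(files)
print(after_each)
print(between_only)
```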

How can I convert a sed command to its PowerShell equivalent?

Editor's note:
The macOS sed command below performs an in-place (-i '') string-substitution (string-replacement) operation on the given file, i.e. it transforms the file's existing content. The specific substitution shown, s/././g, replaces all non-newline characters (regex metacharacter .) with verbatim . characters, so be careful when trying the command yourself.
While the intended question may ultimately be a different one, as written the question is well-defined, and can be answered to show the full PowerShell equivalent of the sed command (a partial translation is in the question itself), notably including the in-place updating of the file.
I have a Mac command and I need it to run on Windows. I have no experience with Mac whatsoever.
sed -i '' 's/././g' dist/index.html
After some research I found that I should use
get-content path | %{$_ -replace 'expression','replace'}
but can't get it to work yet.
Note:
The assumption is that s/././g in your sed command is just an example string substitution that you've chosen as a placeholder for real-world ones. What this example substitution does is to replace all characters other than newlines (regex .) with a verbatim . Therefore, do not run the commands below as-is on your files, unless you're prepared to have their characters turn into .
The direct translation of your sed command, which performs in-place updating of the input file, is (ForEach-Object is the name of the cmdlet that the built-in % alias refers to):
(Get-Content dist/index.html) |
ForEach-Object { $_ -replace '.', '.' } |
Set-Content dist/index.html -WhatIf
Note: The -WhatIf common parameter in the command above previews the operation. Remove -WhatIf once you're sure the operation will do what you want.
Or, more efficiently:
(Get-Content -ReadCount 0 dist/index.html) -replace '.', '.' | Set-Content dist/index.html -WhatIf
-ReadCount 0 reads the lines into a single array before outputting the result, instead of the default behavior of emitting each line one by one to the pipeline.
Or, even more efficiently, if line-by-line processing isn't required and the -replace operation can be applied to the entire file content, using the -Raw switch:
(Get-Content -Raw dist/index.html) -replace '.', '.' | Set-Content -NoNewLine dist/index.html -WhatIf
Note:
-replace, the regular-expression-based string replacement operator, uses the syntax <input> -replace <regex>, <replacement> and invariably performs global replacements (as requested by the g option in your sed command), i.e. replaces all matches it finds.
Unlike sed's regular expressions, however, PowerShell's are case-insensitive by default; to make them case-sensitive, use the -creplace operator variant.
Note the required (...) around the Get-Content call, which ensures that the file is read into memory in full and closed again first, which is the prerequisite for being able to rewrite the file with Set-Content in the same pipeline.
Caveat: While unlikely, this approach can result in data loss, namely if the write operation that saves back to the input file gets interrupted.
You may need -Encoding with Set-Content to ensure that the rewritten file uses the same character encoding as the original content - Get-Content reads text files into .NET strings recognizing a variety of encodings, and no information is retained as to what encoding was encountered.
Except with the Get-Content -Raw / Set-Content -NoNewLine solution, which preserves the original newline format, the output file will use the platform-native newline format - CRLF (\r\n) on Windows, LF (\n) on Unix-like platforms - irrespective of which format the input file originally used.
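As a language-neutral illustration of the same read-fully-then-rewrite pattern, here is a minimal Python sketch. The function name replace_in_place, the foo/bar substitution and the temporary demo file are assumptions made for the example. Opening the file with newline="" plays the role of the -Raw / -NoNewLine combination: the original newline format is passed through unchanged. Unlike -replace, Python's re module is case-sensitive by default, as sed is.

```python
import os
import re
import tempfile

def replace_in_place(path, pattern, replacement, encoding="utf-8"):
    # newline="" disables newline translation, so CRLF or LF sequences
    # are read and written back exactly as they were.
    with open(path, "r", encoding=encoding, newline="") as f:
        content = f.read()  # the file is read in full and closed here
    updated = re.sub(pattern, replacement, content)
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(updated)

# Demonstration on a throwaway file with CRLF line endings.
fd, demo_path = tempfile.mkstemp(suffix=".html")
os.close(fd)
with open(demo_path, "w", encoding="utf-8", newline="") as f:
    f.write("foo\r\nFoo\r\n")

replace_in_place(demo_path, r"foo", "bar")

with open(demo_path, "rb") as f:
    result = f.read()
os.remove(demo_path)
print(result)
```

The same data-loss caveat applies here: if the second open or the write is interrupted, the original content is gone, so writing to a temporary file and renaming it afterwards would be the more cautious design.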

How to get actual separate lines in PowerShell's Write-Output using a newline character

I tried to create multiline input to practice Select-String, expecting only a single matching line to be output, like I would normally see with an echo -e ... | grep combination. But the following command still gives me both lines. It seems the newline is only interpreted on final output and Select-String still gets a single line of input.
Write-Output "Hi`nthere" | Select-String -Pattern "i"
#
# Hi
# there
#
#
while I would expect it to return just
Hi
I used this version of PowerShell:
Get-Host | Select-Object Version
# 5.1.19041.906
For comparison, in bash I would test commands on multiline input as follows: I usually generate multiple lines with echo -e, and grep then processes the individual lines.
echo -e "Hi\nthere" | grep "i"
# Hi
I hope someone can explain what I am missing here in PowerShell. This problem seems like a basic misconception on my part, and I was not sure what to Google for.
Edits
[edit 1]: problem also for line ending with carriage return
Write-Output "Hi`r`nthere" | Select-String -Pattern "i"
I saw that separating with commas works as valid multiline input. So maybe the question is how to convert from newline to actual input line separation.
Write-Output "Hi","there" | Select-String -Pattern "i"
# Hi
[edit 2]: from edit 1 I found this stackoverflow-answer, where for me it now works with
Write-Output "Hi`nthere".Split([Environment]::NewLine) | Select-String -Pattern "i"
# or
Write-Output "Hi`nthere".Split("`n") | Select-String -Pattern "i"
Still, could someone please explain why this is relevant here, but not in bash?
All the information is in the comments, but let me summarize and complement it:
PowerShell's pipeline is object-based, and Select-String operates on each input object - even if that happens to be a single multi-line string object, such as output by Write-Output "Hi`nthere"
It is only the output from external programs that is streamed line by line.
Therefore, you must split your multi-line string into individual lines in order to match them as such.
The best idiom for that is -split '\r?\n', because it recognizes both Windows-format CRLF and Unix-format LF-only newlines:
"Hi`nthere" -split '\r?\n' | Select-String -Pattern "i"
Note:
I've omitted Write-Output in favor of PowerShell's implicit output behavior (see the bottom section of this answer for more information).
For more information on how -split '\r?\n' works, see this answer.
Select-String doesn't directly output the matching lines (strings); instead it wraps them in match-information objects that provide metadata about each match. To get just the matching line (string):
In PowerShell (Core) 7+, add the -Raw switch.
In Windows PowerShell, pipe to ForEach-Object Line or wrap the entire call in (...).Line
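The difference between matching one multi-line string and matching its individual lines can be illustrated with a short Python sketch. The function name matching_lines is an assumption for the example; the regex \r?\n is the same one used with -split above. Note that Python's re is case-sensitive by default, whereas Select-String is not, which makes no difference for this input.

```python
import re

def matching_lines(text, pattern):
    # Split on CRLF or LF first, then test each line separately.
    lines = re.split(r"\r?\n", text)
    return [line for line in lines if re.search(pattern, line)]

# Without splitting, the whole two-line string counts as one match,
# which is what happens to the single string object in the pipeline.
unsplit = [s for s in ["Hi\nthere"] if re.search("i", s)]

lf_result = matching_lines("Hi\nthere", "i")
crlf_result = matching_lines("Hi\r\nthere", "i")
print(unsplit, lf_result, crlf_result)
```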

Size of the sorted file is double that of the original file in PowerShell

I have a PowerShell script that reads a file's content, sorts it and writes the output to a new file. The script is as follows:
get-content $inputFile | sort > $sortedFile
The output in the file is sorted properly, but the output file ($sortedFile) is twice as large as the input file ($inputFile). Note: There are no duplicate or extra lines in the output file.
Any help or ideas regarding this will be helpful.
Most likely the input file is ASCII-encoded, while redirection with > in Windows PowerShell writes Unicode (UTF-16LE) by default, which uses two bytes per character.
Instead of using > for redirection, you can use Out-File and specify an encoding:
get-content $inputFile | sort | out-file -encoding ASCII $sortedFile
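The size difference can be checked with a few lines of Python. This is a sketch on a made-up sample string: the same text is encoded once as ASCII and once as UTF-16LE, with the two-byte byte-order mark added by hand to show what a UTF-16 file with a BOM would contain.

```python
# Made-up sample text consisting only of ASCII characters.
text = "alpha\r\nbeta\r\n"

ascii_bytes = text.encode("ascii")      # 1 byte per character
utf16_bytes = text.encode("utf-16-le")  # 2 bytes per character, no BOM
bom = b"\xff\xfe"                       # UTF-16LE byte-order mark

print(len(ascii_bytes), len(utf16_bytes), len(bom + utf16_bytes))
```

For text that contains only ASCII characters, the UTF-16 form is exactly twice the size, plus two bytes if a byte-order mark is written.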

Replacing a string in a file in windows

I am looking for a way to replace all occurrences of string A with B in a file.
I tried using the GnuWin32 sed utility, but the resulting file is truncated. This probably happens because the file contains non-Unicode characters. The same command worked on Mac with the same file only after adding LC_ALL=C before the command.
What other tools can I use, and how? Can I pass some flag to GnuWin32 sed that will make it work with non-Unicode characters?
In PowerShell something like this should work:
$f = 'C:\path\to\your.txt'
(Get-Content $f) -replace 'A','B' | Out-File $f
Another option is to use variable syntax:
${C:\path\to\your.txt} -replace 'A','B' | Out-File C:\path\to\your.txt
Without seeing the file's encoding it's impossible to tell whether this will work, but this uses a helper batch file called repl.bat from http://www.dostips.com/forum/viewtopic.php?f=3&t=3855
type "file.txt" |repl "A" "B" >"newfile.txt"
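When the encoding is unknown or the file contains bytes that are not valid in the assumed encoding, one way to sidestep decoding altogether is to replace at the byte level. The following Python sketch makes that assumption; the function names and the sample bytes are illustrative, and the approach only works when both A and B can be written as plain byte sequences in the file's encoding.

```python
def replace_bytes(data, old, new):
    # Operates on raw bytes, so nothing has to be decoded and bytes that
    # are invalid in any particular encoding pass through unchanged.
    return data.replace(old, new)

def replace_in_file_bytes(src, dst, old, new):
    # Hypothetical file-level wrapper: binary mode avoids both decoding
    # and newline translation.
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(replace_bytes(data, old, new))

# \xe9 on its own is not valid UTF-8; it is copied through as-is.
raw = b"A caf\xe9 A\r\n"
out = replace_bytes(raw, b"A", b"B")
print(out)
```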
