How to do a "sed -" like operation on Windows? - windows

I have a large 55 GB file in which there is a sentence on every line.
I want to check if there are any lines that have a dot "." at the end, and then if there is, I want to insert a space before the dot in that line.
Ex: I like that car.
Replace with: I like that car .
A space before the trailing dot on every line if there is a dot.
I don't have Cygwin or Unix, and I use a Windows OS. Is there a sed-like command that I can run on this 55 GB file?
I tried GetGnuWin32 but I am unable to determine the actual command there.

Install Perl. Strawberry Perl is probably the best distribution for Windows. http://strawberryperl.com/
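To do what you're talking about in Perl, it would be this (cmd.exe needs double quotes around the script, and -i.bak keeps a backup copy, which some Windows builds of Perl require for in-place editing):
perl -p -i.bak -e "s/\.$/ ./" filename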
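For a file this large, the point is to stream it one line at a time instead of loading it whole. Here is a minimal sketch of that idea in Python, not a tested production tool. The function name space_before_trailing_dot and the file names are made up for illustration, and it assumes an ASCII-compatible encoding such as UTF-8 or an ANSI code page (not UTF-16).

```python
def space_before_trailing_dot(src_path, dst_path):
    """Copy src_path to dst_path, inserting a space before a dot that ends a line."""
    # Binary mode leaves the existing line endings (LF or CRLF) untouched
    # and avoids having to guess the text encoding.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        for line in src:  # one line at a time, never the whole file
            body = line.rstrip(b"\r\n")
            ending = line[len(body):]
            if body.endswith(b"."):
                body = body[:-1] + b" ."
            dst.write(body + ending)


if __name__ == "__main__":
    # Hypothetical file names; the result goes to a second file.
    space_before_trailing_dot("sentences.txt", "sentences.spaced.txt")
```

Writing to a second file means roughly another 55 GB of free disk space is needed, but memory use stays small no matter how large the input is.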

You can install Cygwin and use sed from there. And here I found Sed for Windows
Edit:
Very Good Answers to your Question:
Is there any sed like utility for cmd.exe
(I always add "stackoverflow" when I search on Google. I did the same for you: "sed on windows stackoverflow", but that is a different matter.)

For your use case:
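From PowerShell.exe (comes with Windows); note that the parentheses make Get-Content read the whole file into memory before anything is written, so this suits files that fit in RAM: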
(Get-Content file.txt) -Replace '\.$', ' .' | Set-Content file.txt
I searched for hours and hours and had so much trouble trying to find a solution to my use case, so I hope adding this answer helps someone else in the same situation.
For those who got here to figure out git filter clean/smudge like I did, here's how I finally managed it:
In file: .gitconfig (global)
[filter "replacePassword"]
required = true
clean = "PowerShell -Command \"(Get-Content " %f ") -Replace 'this is a password', 'this is NOT a password'\""
smudge = "PowerShell -Command \"(Get-Content " %f ") -Replace 'this is NOT a password', 'this is a password'\""
Please note that this snippet doesn't change the original file (this is intended for my use case).
Additional search terms to help those looking: interpolation, interpolate

Related

Converting unix script to windows script - emulating a Sed command in PowerShell

I have a Unix script (Korn shell, to be exact) that works well, and I need to convert it to a Windows batch script. So far I have tried inserting a PowerShell command line into my code, but it doesn't work. Please help; I am new to both Unix scripting and Windows scripting, so any help is welcome.
This is the line of code that I need to convert:
#create new file to parse ; exclude past instances of timestamp
parsefile=/tmp/$$.parse
sed -e "1,/$TIMESTAMP/d" -e "/$TIMESTAMP/d" $DSTLOGFILE > $parsefile
So far I have tried a powershell command line to be called on my script but it didn't work:
:set_parse_file
@powershell -Command "Get-Content $SCHLOGFILE | Foreach-Object {$_ -replace('1,/"$TIMESTAMP"/d' '/"$TIMESTAMP"/d'} | Set-Content $PARSEFILE"
Any suggestions please?
PowerShell has no sed-like constructs for processing ranges of lines (e.g., sed interprets 1,/foo/ as the range of consecutive lines from line 1 through the next line that matches regex foo).
Emulating this feature with line-by-line processing would be much more verbose, but a comparatively concise version is possible if the input file is processed as a whole, which is only an option for files small enough to fit into memory.
Here's the pure PowerShell code (PSv5+ syntax):
$escapedTimeStamp = [regex]::Escape($TIMESTAMP)
(Get-Content -Raw $SCHLOGFILE) -replace ('(?ms)\A.*?\r?\n.*?' + $escapedTimeStamp + '.*?\r?\n') `
-replace ('(?m)^.*?' + $escapedTimeStamp + '.*\r?\n') |
Set-Content -NoNewline $PARSEFILE
Note that [regex]::Escape() is used to make sure that the value of $TIMESTAMP is treated as a literal, even if it happens to contain regex metacharacters (chars. with special meaning to the regex engine).
Your ksh code doesn't do that (and it's nontrivial to do in ksh), so if - conversely - $TIMESTAMP should be interpreted as a regex, simply omit that step and use $TIMESTAMP directly.
The -replace operator is regex-based and uses the .NET regular-expression engine.
It is the use of Get-Content's -Raw switch that requires PSv3+ and the use of Set-Content's -NoNewline switch that requires PSv5+. You can make this command work in earlier versions, but it requires more effort.
Calling the above from cmd.exe (a batch file) gets quite unwieldy - and you always have to be wary of quoting issues - but it should work:
@powershell.exe -noprofile -command "$escapedTimeStamp = [regex]::Escape('%TIMESTAMP%'); (Get-Content -Raw '%SCHLOGFILE%') -replace ('(?ms)\A.*?\r?\n.*?' + $escapedTimeStamp + '.*?\r?\n') -replace ('(?m)^.*?' + $escapedTimeStamp + '.*\r?\n') | Set-Content -NoNewline '%PARSEFILE%'"
Note how the -command argument is passed as a single "..." string, which is ultimately the safest and conceptually cleanest way to pass code to PowerShell.
Also note the need to embed batch variables as %varname% in the command, and since they are enclosed in embedded '...' above, the assumption is that their values contain no ' chars.
Therefore, consider implementing your entire script in PowerShell - you'll have a much more powerful scripting language at your disposal, and you'll avoid the quoting headaches that come from bridging two disparate worlds.
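To make the sed semantics concrete, here is a minimal line-by-line sketch in Python of what sed -e "1,/TS/d" -e "/TS/d" keeps. It is an illustration rather than a drop-in replacement for the script: the function name lines_after_timestamp and the file names are hypothetical, and the timestamp is treated as a literal, as with [regex]::Escape() above.

```python
import re


def lines_after_timestamp(lines, timestamp):
    """Yield the lines that sed -e "1,/TS/d" -e "/TS/d" would keep."""
    # re.escape makes the timestamp a literal; drop it to treat it as a regex.
    pattern = re.compile(re.escape(timestamp))
    in_range = True  # the range 1,/TS/ is open from the first line
    for number, line in enumerate(lines, start=1):
        if in_range:
            # sed looks for the end of the range only from line 2 onward.
            if number > 1 and pattern.search(line):
                in_range = False
            continue  # every line inside the range is deleted
        if not pattern.search(line):
            yield line  # later lines survive unless they contain the timestamp


if __name__ == "__main__":
    # Hypothetical file names and timestamp value.
    with open("sched.log") as src, open("parse.tmp", "w") as dst:
        dst.writelines(lines_after_timestamp(src, "2016-01-01 12:00"))
```

Because the generator consumes the input lazily, this version also works on logs that do not fit into memory.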

Recursive find and replace on Command Line with Special Characters

I'm trying to recursively go through a folder structure and update a bunch of pom.xml files. I only want to update my version number, so I'm trying to be as exact as possible. What I want to change is:
<version>5.1.1</version>
to
<version>5.2.0</version>
I'm trying to include the version tags to be sure I don't replace any comments or dependencies in which this same version number may appear.
I think the characters like '<,> or /' are causing issues.
I don't have much experience with escaping characters like this on the command line so any help is appreciated.
I am on a Windows 7 machine but have Git Bash and Cygwin installed.
If the replacement is simple, I use a tool called "fart.exe" for this: https://sourceforge.net/projects/fart-it/
If I need regex, I use PowerShell.
Here is an example (a mix of batch file and PowerShell) that replaces a version string in all XML files:
[replace.bat]:
SET version=1.2.3
for /r %%x in (*.xml) do (
powershell -Command "& {(Get-Content '%%x') | Foreach-Object { $_ -replace '(''version''\s?\:\s?'')(\d*\.\d*\.\d*)('')', '${1}%version%${3}' } | Set-Content '%%x'}"
)
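For the exact tag pair in the question, a plain substring replacement avoids escaping <, > and / altogether, since nothing is interpreted as a regex. The following is a minimal sketch in Python under that assumption; the function name bump_pom_versions is made up for illustration, and the files are assumed to use an ASCII-compatible encoding.

```python
from pathlib import Path

OLD = "<version>5.1.1</version>"
NEW = "<version>5.2.0</version>"


def bump_pom_versions(root, old=OLD, new=NEW):
    """Replace old with new in every pom.xml below root; return the changed paths."""
    changed = []
    for pom in Path(root).rglob("pom.xml"):
        raw = pom.read_bytes()  # bytes keep line endings exactly as they are
        updated = raw.replace(old.encode("utf-8"), new.encode("utf-8"))
        if updated != raw:
            pom.write_bytes(updated)
            changed.append(pom)
    return changed


if __name__ == "__main__":
    # Run from the top of the folder structure.
    for path in bump_pom_versions("."):
        print("updated", path)
```

Matching the whole tag pair leaves the bare text 5.1.1 alone, but a dependency that declares the identical <version>5.1.1</version> element would still be changed, so the list of updated files is worth reviewing.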

Convert file from Windows to UNIX through Powershell or Batch

I have a batch script that prompts a user for some input then outputs a couple of files I'm using in an AIX environment. These files need to be in UNIX format (which I believe is UTF8), but I'm looking for some direction on the SIMPLEST way of doing this.
I'd rather not download extra software packages such as Cygwin or GnuWin32. I don't mind coding this if it is possible; my coding options are Batch, PowerShell and VBS. Does anyone know of a way to do this?
Alternatively, could I create the files with Batch and call a PowerShell script to reformat them?
The idea is that a user is prompted for some information, and then I output a standard file that basically contains the prompt answers for a job in AIX. I'm using Batch initially because I didn't know I would run into this problem, but I'm leaning towards redoing this in PowerShell, because I found some code on another forum that can do the conversion (below).
% foreach($i in ls -name DIR/*.txt) { \
get-content DIR/$i | \
out-file -encoding utf8 -filepath DIR2/$i \
}
Looking for some direction or some input on this.
You can't do this without external tools in batch files.
If all you need is the file encoding, then the snippet you gave should work. If you want to convert the files inline (instead of writing them to another place) you can do
Get-ChildItem *.txt | ForEach-Object { (Get-Content $_) | Out-File -Encoding UTF8 $_ }
(The parentheses around Get-Content are important: they make it read the whole file before Out-File opens the same file for writing.) However, in Windows PowerShell this will write the files in UTF-8 with a signature (U+FEFF, the byte order mark) at the start, which some Unix tools don't accept (it is technically legal, though discouraged).
Then there is the problem that line breaks are different between Windows and Unix. Unix uses only U+000A (LF) while Windows uses two characters for that: U+000D U+000A (CR+LF). So ideally you'd convert the line breaks, too. But that gets a little more complex:
Get-ChildItem *.txt | ForEach-Object {
    # use the full path so .NET does not depend on its own working directory
    $path = $_.FullName
    # get the contents and replace line breaks by U+000A
    $contents = [IO.File]::ReadAllText($path) -replace "`r`n?", "`n"
    # create UTF-8 encoding without signature
    $utf8 = New-Object System.Text.UTF8Encoding $false
    # write the text back
    [IO.File]::WriteAllText($path, $contents, $utf8)
}
Try the overloaded version ReadAllText(String, Encoding) if you are using ANSI characters and not only ASCII ones.
$contents = [IO.File]::ReadAllText($_, [Text.Encoding]::Default) -replace "`r`n", "`n"
https://msdn.microsoft.com/en-us/library/system.io.file.readalltext(v=vs.110).aspx
https://msdn.microsoft.com/en-us/library/system.text.encoding(v=vs.110).aspx
ASCII - Gets an encoding for the ASCII (7-bit) character set.
Default - Gets an encoding for the operating system's current ANSI code page.
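The two conversions discussed above (line breaks to LF, UTF-8 without a signature) can also be sketched outside PowerShell. Below is a minimal Python version, offered as an illustration only: the function name to_unix_text and the file names are hypothetical, and the default source encoding is an assumption that has to be changed (for example to "cp1252") if the batch script writes an ANSI code page.

```python
def to_unix_text(src_path, dst_path, src_encoding="utf-8-sig"):
    """Rewrite a text file with LF line breaks as UTF-8 without a signature."""
    # newline="" turns off newline translation, so CR and CRLF are seen as-is.
    # "utf-8-sig" drops a leading U+FEFF if there is one.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Plain "utf-8" never writes a signature; newline="\n" keeps LF on Windows.
    with open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        dst.write(text)


if __name__ == "__main__":
    # Hypothetical file names.
    to_unix_text("answers.txt", "answers.unix.txt")
```

Like ReadAllText, this reads the whole file at once, which is fine for small generated answer files.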

an app or a batch file script to remove special characters from text

I love this online tool http://textmechanic.co/ but it lacks an important feature: deleting special characters such as %, %, [, ), *, ?, ', etc. (everything except _, -, and .) from a large quantity of text.
I am looking for an online tool or a small windows utility or a batch script that can do this.
I think sed is the easiest choice here. You can download it for Windows here. Furthermore, nearly every text editor should allow that (but most won't cope well with files in the multi-GiB range).
With sed you'd probably want something like this:
sed "s/[^a-zA-Z0-9_.-]//g" file.txt
Likewise, if you have a semi-recent Windows (i.e. Windows 7 or later), then PowerShell comes preinstalled with it. The following one-liner will do that for you:
Get-Content file.txt | foreach { $_ -replace '[^\w\d_.-]' } | Out-File -Encoding UTF8 file.new.txt
This can easily be adapted to multiple files as well. Writing back into the original file does not work with the pipeline as written, because Get-Content streams the lines while the file is still open for reading. Wrapping it in parentheses, (Get-Content file.txt), reads everything into an array first and makes that possible, at the price of the same memory problem with very large files.
You can do regex with any tool/language that supports it. Here's a Ruby for Windows command
C:\work>ruby -ne 'print $_.gsub(/[%)?\[\]*]/,"")' file
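The same whitelist idea can be sketched in Python. This is a minimal illustration that assumes whitespace should survive, which the sed pattern above would also delete because a space is not in its allowed set. The name strip_special and the file names are hypothetical.

```python
import re

# Keep ASCII letters, digits, underscore, dot, hyphen and whitespace; delete the rest.
SPECIAL = re.compile(r"[^A-Za-z0-9_.\-\s]")


def strip_special(text):
    """Return text with every character outside the whitelist removed."""
    return SPECIAL.sub("", text)


if __name__ == "__main__":
    # Hypothetical file names; lines are processed one by one to keep memory use low.
    with open("file.txt", encoding="utf-8") as src, \
            open("file.new.txt", "w", encoding="utf-8") as dst:
        for line in src:
            dst.write(strip_special(line))
```

Listing the allowed characters, rather than the forbidden ones, means that a special character nobody thought of is still removed.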

Why can't my Perl script find the file when I run it from Windows?

I have a Perl script that was built on a Linux platform using Perl 5.8. Now I am trying to run it from the command prompt on a Windows platform with the same Perl version.
I am using the command perl rgex.pl, but it gives me a whole chunk of errors that look to me as if they have already been dealt with in the script itself. The weird thing is that I am able to run another Perl script, consisting of simple functions such as print and input, without problems.
The Code:
#!/usr/bin/perl
use warnings;
use strict;
use Term::ANSIColor;
my $file = "C:\Documents and Settings\Desktop\logfiles.log";
open LOG, $file or die "The file $file has the error of:\n => $!";
my @lines = <LOG>;
close (LOG);
my $varchar = 0;
foreach my $line ( @lines ) {
if ( $line =~ m/PLLog/ )
{
print("\n\n\n");
my $coloredText = colored($varchar, 'bold underline red');
print colored ("POS :: $coloredText\n\n", 'bold underline red');
$varchar ++;
}
print( $line );
}
When I run on the windows command prompt it gives me errors such as:
Unrecognized escape \D passed through at rgex.pl line 7.
=> No such file or directory at rgex.pl line 8.
Please give some advice on the code. Thanks.
A \ in a Perl string enclosed in double quotes marks the beginning of an escape sequence, like \n for newline or \t for tab. Since you want \ to be treated literally, you need to escape it as \\:
my $file = "C:\\Documents and Settings\\Desktop\\logfiles.log";
Since you are not interpolating any variables in the string it's better to use single quotes:
my $file = 'C:\Documents and Settings\Desktop\logfiles.log';
(Inside single quotes, \ is not special unless the next character is a backslash or single quote.)
These error messages are pretty clear. They tell you exactly which lines the problems are on (unlike some error messages, which tell you the line where Perl first thought "Hey, wait a minute!").
When you run into these sorts of problems, reduce the program to just the problematic lines and start working on them. Start with the first errors first, since they often cascade to the errors that you see later.
When you want to check the value that you get, print it to ensure it is what you think it is:
my $file = "C:\\D....";
print "file is [$file]\n";
This would have shown you very quickly that there was a problem with $file, and once you know where the problem is, you're most of the way to solving it.
This is just basic debugging technique.
Also, you're missing quite a bit of the basics, so going through a good Perl tutorial will help you immensely. There are several listed in perlfaq2 or perlbook. Many of the problems that you're having are things that Learning Perl deals with in the first couple of chapters.

Resources