Converting a Unix script to a Windows script: emulating a sed command in PowerShell

I have a Unix script (Korn shell, to be exact) that works well, and I need to convert it to a Windows batch script. So far I have tried inserting a PowerShell command line into my code, but it doesn't work. Please help; I am new to both Unix scripting and Windows scripting, so any help will do.
This is the line of code that I need to convert:
#create new file to parse ; exclude past instances of timestamp
parsefile=/tmp/$$.parse
sed -e "1,/$TIMESTAMP/d" -e "/$TIMESTAMP/d" $DSTLOGFILE > $parsefile
So far I have tried a PowerShell command line to be called from my script, but it didn't work:
:set_parse_file
powershell -Command "Get-Content $SCHLOGFILE | Foreach-Object {$_ -replace('1,/"$TIMESTAMP"/d' '/"$TIMESTAMP"/d'} | Set-Content $PARSEFILE"
Any suggestions please?

PowerShell has no sed-like constructs for processing ranges of lines (e.g., sed interprets 1,/foo/ as the range of consecutive lines from line 1 through the first subsequent line that matches the regex foo).
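To make those sed semantics concrete, here is a minimal line-by-line sketch in Python (not PowerShell; it only illustrates the logic). The function name `sed_range_delete` is hypothetical, and Python's `re` syntax stands in for sed's basic regular expressions, so patterns may need adjusting.

```python
import re
from typing import Iterable, List


def sed_range_delete(lines: Iterable[str], pattern: str) -> List[str]:
    """Mimic: sed -e "1,/PATTERN/d" -e "/PATTERN/d" (hypothetical helper)."""
    regex = re.compile(pattern)
    kept: List[str] = []
    in_range = True  # the address range 1,/PATTERN/ is active from line 1
    for lineno, line in enumerate(lines, start=1):
        if in_range:
            # sed tests the closing address only on lines after the start line,
            # so a match on line 1 does not end the range
            if lineno > 1 and regex.search(line):
                in_range = False
            continue  # first -e: every line inside the range is deleted
        if regex.search(line):
            continue  # second -e: delete any later line that matches
        kept.append(line)
    return kept
```

If no line after line 1 matches, the range never closes and every line is deleted, which is also what sed does.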
Emulating this feature with line-by-line processing in PowerShell would be much more verbose. A comparatively concise version is possible if the input file is processed as a whole, but that is only an option for files small enough to fit into memory (PSv5+ syntax).
Here's the pure PowerShell code:
$escapedTimeStamp = [regex]::Escape($TIMESTAMP)
(Get-Content -Raw $SCHLOGFILE) -replace ('(?ms)\A.*?\r?\n.*?' + $escapedTimeStamp + '.*?\r?\n') `
-replace ('(?m)^.*?' + $escapedTimeStamp + '.*\r?\n') |
Set-Content -NoNewline $PARSEFILE
Note that [regex]::Escape() is used to make sure that the value of $TIMESTAMP is treated as a literal, even if it happens to contain regex metacharacters (characters with special meaning to the regex engine).
Your ksh code doesn't do that (and it's nontrivial to do in ksh), so if, conversely, $TIMESTAMP should be interpreted as a regex, simply omit that step and use $TIMESTAMP directly.
The -replace operator is regex-based and uses the .NET regular-expression engine.
It is the use of Get-Content's -Raw switch that requires PSv3+ and the use of Set-Content's -NoNewline switch that requires PSv5+. You can make this command work in earlier versions, but it requires more effort.
Calling the PowerShell code above from cmd.exe (a batch file) gets quite unwieldy, and you always have to be wary of quoting issues, but it should work:
powershell.exe -noprofile -command "$escapedTimeStamp = [regex]::Escape('%TIMESTAMP%'); (Get-Content -Raw '%SCHLOGFILE%') -replace ('(?ms)\A.*?\r?\n.*?' + $escapedTimeStamp + '.*?\r?\n') -replace ('(?m)^.*?' + $escapedTimeStamp + '.*\r?\n') | Set-Content -NoNewline '%PARSEFILE%'"
Note how the -command argument is passed as a single "..." string, which is ultimately the safest and conceptually cleanest way to pass code to PowerShell.
Also note the need to embed batch variables as %varname% in the command; since they are enclosed in embedded '...' above, the assumption is that their values contain no ' characters.
Given these complications, consider implementing your entire script in PowerShell: you'll have a much more powerful scripting language at your disposal, and you'll avoid the quoting headaches that come from bridging two disparate worlds.
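For comparison, the two whole-file `-replace` patterns used in this answer translate almost verbatim to Python's `re` module, which shares the `(?ms)`, `\A` and lazy-quantifier syntax. A sketch, with `strip_through_timestamp` as a hypothetical name and `re.escape` playing the role of `[regex]::Escape()`:

```python
import re


def strip_through_timestamp(text: str, timestamp: str) -> str:
    """Whole-text equivalent of the two PowerShell -replace operations."""
    ts = re.escape(timestamp)  # treat the timestamp as a literal string
    # Delete line 1 through the first later line containing the timestamp
    text = re.sub(r"(?ms)\A.*?\r?\n.*?" + ts + r".*?\r?\n", "", text)
    # Delete any remaining lines that contain the timestamp
    return re.sub(r"(?m)^.*?" + ts + r".*\r?\n", "", text)
```

One behavioral difference from sed: if the timestamp never appears after line 1, the first pattern simply doesn't match and the text is left alone, whereas sed's range would run to the end of the file and delete everything.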

Related

Is there a PowerShell Get-Content Function to extract line based on the first character?

I am trying to extract each line from a CSV file that has over 1 million (1,000,000) lines, where the first character is a 1.
The 1 in this case refers to the first line of a log. There are several different logs in this file, and I need the first line from all of them. The problem is (as you can understand) that 1 is not unique, and can appear in any of the 12 'columns' of data I have in this CSV.
Essentially, I would like to extract them all to a new CSV file as well, for further breakdown.
I know it sounds simple enough, but I cannot seem to get the information I need.
I have searched StackOverflow, Microsoft, Google and my own Tech Team.
PS: Get-Content 'C:\Users\myfiles\Desktop\massivelogs.csv' | Select-String "1" | Out-File "extractedlogs.csv"
The immediate answer is that you must use Select-String '^1' in order to restrict matching to the start (^) of each input line.
However, a much faster solution is to use the switch statement with the -File option:
$inFile = 'C:\Users\myfiles\Desktop\massivelogs.csv'
$outFile = 'extractedlogs.csv'
& { switch -File $inFile -Wildcard { '1*' { $_ } } } | Set-Content $outFile
Note, however, that the output file won't be a true CSV file, because it will lack a header row.
Also, note that Set-Content applies an edition-specific default character encoding (the active ANSI code page in Windows PowerShell, BOM-less UTF-8 in PowerShell Core); use -Encoding as needed.
Using -Wildcard with a wildcard pattern (1*) speeds things up slightly, compared to -Regex with ^1.
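A Python sketch of the same streaming filter, for readers who want the logic without PowerShell: the file is read one line at a time, so the million-line input never has to fit in memory. `copy_lines_starting_with` is a hypothetical name, and UTF-8 input is an assumption.

```python
from pathlib import Path


def copy_lines_starting_with(in_path: Path, out_path: Path, prefix: str = "1") -> int:
    """Stream in_path line by line, writing lines that start with prefix.

    Returns the number of lines written. Like the switch-based answer,
    the output has no CSV header row.
    """
    written = 0
    # newline="" keeps each line's original ending (CRLF or LF) intact
    with open(in_path, encoding="utf-8", newline="") as src, \
            open(out_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:  # file iteration reads lazily, one line at a time
            if line.startswith(prefix):
                dst.write(line)
                written += 1
    return written
```

As with the '1*' wildcard, a line whose first field is 10 also starts with 1; testing for a prefix of "1," instead would restrict matches to a first field of exactly 1.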

How do you resolve special folders in path strings in PowerShell?

In the Windows Command Prompt, special folders are resolved like so:
However, in PowerShell, these folders do not seem to be resolved:
Consider the string:
$myfile = "%temp%\\myfolder\\myfile.txt"
How can I use this as an argument to PowerShell functions (e.g., Remove-Item), and have PowerShell correctly resolve the special folders, as opposed to taking the string literally and prepending the current working directory?
Edit:
I am working with strings using standard windows path notation coming from external configuration files, for example:
config.json:
{
"some_file": "%TEMP%\\folder\\file.txt"
}
myscript.ps1:
$config = Get-Content -Raw -Path "config.json" | ConvertFrom-Json
Remove-Item -path $config.some_file -Force
Note: as any of the Windows special folders can appear in these strings, I'd rather avoid horrible find-replace hacks like this:
$config.some_file = $config.some_file -replace '%TEMP%', $env:temp
You can expand it to a full path using:
[System.Environment]::ExpandEnvironmentVariables("%TEMP%\\myfolder\\myfile.txt")
c:\users\username\AppData\Local\Temp\\myfolder\\myfile.txt
Note that the double backslash \\ isn't needed in PowerShell either: \ is not a special character in a PowerShell string (PowerShell's escape character is the backtick, `). Double backslashes in a path do seem to work, though.
Documentation: https://msdn.microsoft.com/en-us/library/system.environment.expandenvironmentvariables.aspx
If you don't mind some performance overhead (this starts a cmd.exe process):
$resolvedPathInABitHackyWay = (cmd /c echo "%TEMP%\\folder\\file.txt")
This will actually give you %TEMP% resolved by cmd itself.
You can grab all environment variable names from the env:\ drive and use them to construct a succinct regex pattern for your find-replace operation, then use the Regex.Replace() method with a match evaluator:
# escape each name so that names like ProgramFiles(x86) are matched literally
$envNames = Get-ChildItem env:\ | ForEach-Object { [regex]::Escape($_.Name) }
# a capturing group around the alternation, so that Groups[1] holds the matched name
$find = "%($($envNames -join '|'))%"
[regex]::Replace($config.some_file, $find, { param([System.Text.RegularExpressions.Match]$found) return (Get-Item "env:\$($found.Groups[1].Value)").Value }, 'IgnoreCase')
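The same match-evaluator idea as a Python sketch, to show what the pattern and callback do: build one alternation from the escaped variable names, match %NAME% case-insensitively, and look each hit up in the environment. `expand_percent_vars` is a hypothetical name; %...% tokens that name no known variable never match and are left untouched.

```python
import os
import re
from typing import Mapping, Optional


def expand_percent_vars(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace %NAME% tokens with values from env (default: os.environ)."""
    if env is None:
        env = os.environ
    # Case-insensitive lookup table, mirroring how Windows treats variable names
    lookup = {name.lower(): value for name, value in env.items()}
    if not lookup:
        return text  # an empty alternation would make the pattern match "%%"
    # Escape each name (e.g. "ProgramFiles(x86)") and join into one alternation;
    # group 1 captures whichever name matched
    names = "|".join(re.escape(name) for name in env)
    pattern = re.compile("%(" + names + ")%", re.IGNORECASE)
    # The callback plays the role of the PowerShell match evaluator
    return pattern.sub(lambda m: lookup[m.group(1).lower()], text)
```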

Convert file from Windows to UNIX through Powershell or Batch

I have a batch script that prompts a user for some input, then outputs a couple of files I'm using in an AIX environment. These files need to be in UNIX format (which I believe is UTF-8), but I'm looking for some direction on the SIMPLEST way of doing this.
I'd rather not download extra software packages such as Cygwin or GnuWin32. I don't mind coding this if it is possible; my coding options are Batch, PowerShell and VBS. Does anyone know of a way to do this?
Alternatively could I create the files with Batch and call a Powershell script to reform these?
The idea here is that a user is prompted for some information, and then I output a standard file that basically contains the prompt answers for a job in AIX. I'm using Batch initially because I didn't know I would run into this problem, but I'm leaning towards redoing this in PowerShell, because I found some code on another forum that can do the conversion (below).
foreach ($i in ls -name DIR/*.txt) {
get-content DIR/$i |
out-file -encoding utf8 -filepath DIR2/$i
}
Looking for some direction or some input on this.
You can't do this without external tools in batch files.
If all you need is the file encoding, then the snippet you gave should work. If you want to convert the files in place (instead of writing them to another location), you can do:
Get-ChildItem *.txt | ForEach-Object { (Get-Content $_) | Out-File -Encoding UTF8 $_ }
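(The parentheses around Get-Content are important: they make PowerShell read the whole file before Out-File starts overwriting it.) However, in Windows PowerShell this writes the files as UTF-8 with a signature at the start (U+FEFF), which some Unix tools don't accept (it's technically legal, though its use is discouraged). PowerShell 6 and later write BOM-less UTF-8 for -Encoding UTF8.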
Then there is the problem that line breaks are different between Windows and Unix. Unix uses only U+000A (LF) while Windows uses two characters for that: U+000D U+000A (CR+LF). So ideally you'd convert the line breaks, too. But that gets a little more complex:
Get-ChildItem *.txt | ForEach-Object {
# get the contents and replace line breaks by U+000A
# (use the full path: .NET's current directory can differ from PowerShell's location)
$contents = [IO.File]::ReadAllText($_.FullName) -replace "`r`n?", "`n"
# create UTF-8 encoding without signature
$utf8 = New-Object System.Text.UTF8Encoding $false
# write the text back
[IO.File]::WriteAllText($_.FullName, $contents, $utf8)
}
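ReadAllText(String) assumes UTF-8 unless the file starts with a byte order mark, so if your files contain ANSI characters rather than only ASCII ones, use the overload ReadAllText(String, Encoding) instead:
$contents = [IO.File]::ReadAllText($_.FullName, [Text.Encoding]::Default) -replace "`r`n", "`n"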
https://msdn.microsoft.com/en-us/library/system.io.file.readalltext(v=vs.110).aspx
https://msdn.microsoft.com/en-us/library/system.text.encoding(v=vs.110).aspx
ASCII - Gets an encoding for the ASCII (7-bit) character set.
Default - Gets an encoding for the operating system's current ANSI code page.
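A Python sketch of the same conversion, in case it helps to see the steps spelled out. `convert_to_unix` is a hypothetical name; the input is assumed to be UTF-8 with or without a BOM (the `utf-8-sig` codec strips one if present), and for ANSI files you would pass the code page instead, e.g. `encoding="cp1252"` (an assumption about which code page your system uses).

```python
import re
from pathlib import Path


def convert_to_unix(path: Path, encoding: str = "utf-8-sig") -> None:
    """Rewrite path in place with LF line endings and BOM-less UTF-8."""
    # newline="" turns off Python's own newline translation, so the CR
    # characters reach the regex below unchanged
    with open(path, encoding=encoding, newline="") as f:
        text = f.read()
    # Same idea as the PowerShell answer: CR LF and lone CR both become LF
    text = re.sub(r"\r\n?", "\n", text)
    # The plain "utf-8" codec never writes a BOM (unlike "utf-8-sig")
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
```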

an app or a batch file script to remove special characters from text

I love this online tool, http://textmechanic.co/, but it lacks another important feature: deleting special characters such as %, [, ), *, ?, ', etc. (except for _, -, and .) from a large quantity of text.
I am looking for an online tool or a small windows utility or a batch script that can do this.
I think sed is the easiest choice here; a Windows port of it is available for download. Furthermore, nearly every text editor should allow this (but most won't cope well with files in the multi-GiB range).
With sed you'd probably want something like this:
sed "s/[^a-zA-Z0-9_.-]//g" file.txt
Alternatively, if you have a semi-recent Windows (Windows 7 or later), PowerShell comes preinstalled. The following one-liner will do that for you:
Get-Content file.txt | foreach { $_ -replace '[^\w\d_.-]' } | Out-File -Encoding UTF8 file.new.txt
This can easily be adapted to multiple files as well. To write the result back to the original file, wrap the Get-Content call in parentheses, (Get-Content file.txt), so that the whole file is read into memory before Out-File starts writing; without them, Out-File would truncate the file while Get-Content is still reading it. Reading the whole file into memory is a problem with very large files, though.
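For comparison, the sed character class translates directly to Python's `re.sub`. This sketch (hypothetical name `strip_special`) reads the whole file before rewriting it, so it can safely overwrite the original at the cost of holding the file in memory. Like the sed command, it also removes spaces.

```python
import re
from pathlib import Path

# Same character class as the sed command, plus \r and \n so that
# line breaks survive (sed never sees the newline in its pattern space)
NOT_ALLOWED = re.compile(r"[^A-Za-z0-9_.\r\n-]")


def strip_special(path: Path, encoding: str = "utf-8") -> None:
    """Remove every character except letters, digits, _, . and - (in place)."""
    # Read everything first so that overwriting the same file is safe
    with open(path, encoding=encoding, newline="") as f:
        text = f.read()
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(NOT_ALLOWED.sub("", text))
```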
You can use regex with any tool or language that supports it. Here's a Ruby command for Windows:
C:\work>ruby -ne 'print $_.gsub(/[%)?\[\]*]/,"")' file

VS2010 Post-build event, replace string in a file. Powershell?

I need to replace a simple string in a minified .js file after a successful build in VS2010.
So I'm trying to run a simple command line call from the Post-build events window.
This example, from here: https://blogs.technet.com/b/heyscriptingguy/archive/2008/01/17/how-can-i-use-windows-powershell-to-replace-characters-in-a-text-file.aspx totally mangles the resulting .js file. Something is wrong; I suspect it is running into some unusual characters in my minified .js file that break it.
(Get-Content C:\Scripts\Test.js) |
Foreach-Object {$_ -replace "// Old JS comment", "// New JS comment"} |
Set-Content C:\Scripts\Test.js
How can I achieve such a simple task in a single line, as I could in Unix?
It would help to see a diff of the file. Without more information, here are some notes:
Set-Content adds a newline at the end of the file (probably not a problem for you).
You can use the -replace operator like this:
(gc C:\Scripts\Test.js) -replace 'a','b' | sc C:\Scripts\Test.js
-replace works on arrays too.
You could read the content via [io.file]::ReadAllText('c:\scripts\test.js') and use -replace, but again, I don't think there will be a significant difference.
Edit:
Use double quotes when the string should be evaluated, i.e., when variables such as $r and $a inside it should be expanded. Example:
$r = 'x'
$a = 'test'
'beg',1,2,"3x",'4xfour','last' -replace "1|$r","$a"
gives
beg
test
2
3test
4testfour
last
To save the content without a trailing newline, use [io.file]::WriteAllText:
$repl = (gc C:\Scripts\Test.js) -replace 'a','b' -join "`r`n"
[io.file]::WriteAllText('c:\scripts\test.js', $repl)
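If, as suspected in the question, the problem is how unusual characters survive the read/write round trip, a sketch that reads and writes with an explicit encoding and does a literal (non-regex) replacement can help pin it down. This is Python rather than PowerShell; `replace_in_file` is a hypothetical name, and UTF-8 is an assumption about how the minified .js file is encoded.

```python
from pathlib import Path


def replace_in_file(path: Path, old: str, new: str, encoding: str = "utf-8") -> int:
    """Literally replace old with new in path; returns the number of hits.

    newline="" keeps the existing line endings, and nothing is appended,
    so the file gains no trailing newline.
    """
    with open(path, encoding=encoding, newline="") as f:
        text = f.read()
    hits = text.count(old)
    if hits:
        with open(path, "w", encoding=encoding, newline="") as f:
            f.write(text.replace(old, new))  # str.replace: no regex semantics
    return hits
```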
