I am running multiple scripts in PowerShell ISE to allow me to remove a specific line of code in many XML documents. The code I have been advised to use is the following:
Get-Content .\AdamInfTest.xml | Get-Content .\AdamInfTest.xml | Where-Object {$_ -notmatch '<!DOCTYPE'} | Set-Content .\Complete\AdamInfTestOut.xml
This removes the line containing 'DOCTYPE' from the XML file, and it does work. However, when running the script, the output file repeats the entire document over and over until it becomes worryingly large (about 30MB, when the original XML is about 60KB).
I am unsure why this is happening and any help would be appreciated - Thanks!
This Get-Content .\AdamInfTest.xml | Get-Content .\AdamInfTest.xml doesn't make sense, as it means you re-read the whole .\AdamInfTest.xml file once for each line in it. I'd guess that 60KB times the number of lines in .\AdamInfTest.xml (which I do not know) comes out to about 30MB... In other words, what happens when you remove one of the two Get-Content .\AdamInfTest.xml calls? –
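For reference, here is the pipeline with the duplicate Get-Content removed, demonstrated on a small stand-in file (in.xml and out.xml are placeholder names, not the asker's real files):

```powershell
# Create a small sample file standing in for AdamInfTest.xml
Set-Content in.xml @'
<?xml version="1.0"?>
<!DOCTYPE note SYSTEM "note.dtd">
<note>hi</note>
'@

# Read the file once, drop the DOCTYPE line, write the rest out
Get-Content in.xml |
    Where-Object { $_ -notmatch '<!DOCTYPE' } |
    Set-Content out.xml
```

With a single Get-Content, each input line is read and tested exactly once, so the output is the input minus the filtered line.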
Related
I am trying to extract each line from a CSV that has over 1 million (1,000,000) lines, where the first character is a 1.
The 1 in this case refers to the 1st line of a log. There are several different logs in this file, and I need the first line from all of them. The problem is (as you can imagine) that 1 is not unique and can appear in any of the 12 'columns' of data I have in this CSV.
Essentially, I would like to extract them all to a new CSV file as well, for further break down.
I know it sounds simple enough, but I cannot seem to get the information I need.
I have searched StackOverflow, Microsoft, Google and my own Tech Team.
PS: Get-Content 'C:\Users\myfiles\Desktop\massivelogs.csv' | Select-String "1" | Out-File "extractedlogs.csv"
The immediate answer is that you must use Select-String '^1' in order to restrict matching to the start (^) of each input line.
However, a much faster solution is to use the switch statement with the -File option:
$inFile = 'C:\Users\myfiles\Desktop\massivelogs.csv'
$outFile = 'extractedlogs.csv'
& { switch -File $inFile -Wildcard { '1*' { $_ } } } | Set-Content $outFile
Note, however, that the output file won't be a true CSV file, because it will lack a header row.
Also, note that Set-Content applies an edition-specific default character encoding (the active ANSI code page in Windows PowerShell, BOM-less UTF-8 in PowerShell Core); use -Encoding as needed.
Using -Wildcard with a wildcard pattern (1*) speeds things up slightly, compared to -Regex with ^1.
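For comparison, here is what the corrected Select-String approach looks like, demonstrated on a few in-memory sample lines (the sample data is made up; against the real file you would pipe Get-Content as in the question):

```powershell
# Sample lines standing in for the CSV contents
$lines = '1,start,log', '21,data,x', '1,another,start', '31,data,y'

# ^ anchors the match to the start of each line, so '21' and '31' don't match
$firstLines = $lines | Select-String '^1' | ForEach-Object Line
$firstLines
```

Select-String emits MatchInfo objects, so ForEach-Object Line extracts the original line strings before writing them out.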
I have a Unix script (Korn shell, to be exact) that is working well, and I need to convert it to a Windows batch script. So far I have tried inserting a PowerShell command line into my code, but it doesn't work. Please help; I am new to both Unix scripting and Windows scripting, so any help will do.
This is the line of code that I need to convert:
#create new file to parse ; exclude past instances of timestamp
parsefile=/tmp/$$.parse
sed -e "1,/$TIMESTAMP/d" -e "/$TIMESTAMP/d" $DSTLOGFILE > $parsefile
So far I have tried a powershell command line to be called on my script but it didn't work:
:set_parse_file
#powershell -Command "Get-Content $SCHLOGFILE | Foreach-Object {$_ -replace('1,/"$TIMESTAMP"/d' '/"$TIMESTAMP"/d'} | Set-Content $PARSEFILE"
Any suggestions please?
PowerShell has no sed-like constructs for processing ranges of lines (e.g., sed interprets 1,/foo/ as the range of consecutive lines from line 1 through the next line that matches regex foo).
Emulating this feature with line-by-line processing would be much more verbose, but a comparatively concise version is possible if the input file is processed as a whole - which is only an option for files small enough to fit into memory, however (PSv5+ syntax).
Here's the pure PowerShell code:
$escapedTimeStamp = [regex]::Escape($TIMESTAMP)
(Get-Content -Raw $SCHLOGFILE) -replace ('(?ms)\A.*?\r?\n.*?' + $escapedTimeStamp + '.*?\r?\n') `
-replace ('(?m)^.*?' + $escapedTimeStamp + '.*\r?\n') |
Set-Content -NoNewline $PARSEFILE
Note that [regex]::Escape() is used to make sure that the value of $TIMESTAMP is treated as a literal, even if it happens to contain regex metacharacters (chars. with special meaning to the regex engine).
Your ksh code doesn't do that (and it's nontrivial to do in ksh), so if - conversely - $TIMESTAMP should be interpreted as a regex, simply omit that step and use $TIMESTAMP directly.
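A quick illustration of what [regex]::Escape buys you (the timestamp value here is made up for demonstration):

```powershell
$TIMESTAMP = '12:30:45.123'
$escaped   = [regex]::Escape($TIMESTAMP)   # '.' becomes '\.'

# Unescaped, '.' matches any character, so a bogus value matches too:
'12:30:45x123' -match $TIMESTAMP   # matches - false positive
'12:30:45x123' -match $escaped     # no match - the dot is now literal
'12:30:45.123' -match $escaped     # still matches the real value
```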
The -replace operator is regex-based and uses the .NET regular-expression engine.
It is the use of Get-Content's -Raw switch that requires PSv3+ and the use of Set-Content's -NoNewline switch that requires PSv5+. You can make this command work in earlier versions, but it requires more effort.
Calling the above from cmd.exe (a batch file) gets quite unwieldy - and you always have to be wary of quoting issues - but it should work:
@powershell.exe -noprofile -command "$escapedTimeStamp = [regex]::Escape('%TIMESTAMP%'); (Get-Content -Raw '%SCHLOGFILE%') -replace ('(?ms)\A.*?\r?\n.*?' + $escapedTimeStamp + '.*?\r?\n') -replace ('(?m)^.*?' + $escapedTimeStamp + '.*\r?\n') | Set-Content -NoNewline '%PARSEFILE%'"
Note how the -command argument is passed as a single "..." string, which is ultimately the safest and conceptually cleanest way to pass code to PowerShell.
Also note the need to embed batch variables as %varname% in the command, and since they are enclosed in embedded '...' above, the assumption is that their values contain no ' chars.
Therefore, consider implementing your entire script in PowerShell - you'll have a much more powerful scripting language at your disposal, and you'll avoid the quoting headaches that come from bridging two disparate worlds.
I need to extract information using a powershell cmdlet and a txt file.
The TXT file contains a list of groups
I want to first feed powershell the script... pretty simple:
get-content c:\scripts\mygroups.txt
I then want to run a ForEach-Object cmdlet against it and pull only the distinguished name.
The problem is that I keep running into the -Filter parameter, and I shouldn't need it because the names are pulled exactly from AD.
Foreach-Object {Get-ADGroup -Filter "*" | select DistinguishedName} works, but I don't want all the groups; I want to use the variable from the Get-Content command. I feel I am missing some kind of link between -Filter and selecting the field I want to display. Please help me link the two together. Thanks!
Here is the error I am getting...
Cannot convert 'System.Object[]' to the type 'Microsoft.ActiveDirectory.Management.ADGroup'
Assuming that each group name is on a line in the file and there are no blank lines, try this:
Get-Content c:\scripts\mygroups.txt | Foreach {Get-ADGroup $_} |
Select DistinguishedName
You could actually take out the "Foreach" part of Keith's code and just let the pipeline do the loop for you:
Get-Content c:\scripts\mygroups.txt | Get-ADGroup | Select DistinguishedName
This is still assuming that the text file contains the group names, ("Name" attribute), with only one group name per line.
Pipe the content of the file to the Get-ADGroup cmdlet and expand the DistinguishedName of each output object:
Get-Content c:\scripts\mygroups.txt |
Get-ADGroup |
Select-Object -ExpandProperty DistinguishedName
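If the file might contain blank or whitespace-only lines (the assumption noted above), you can filter them out before the pipeline reaches Get-ADGroup. A sketch with made-up group names; the commented line shows where the real AD cmdlet would go:

```powershell
# Simulated file contents; in practice: $names = Get-Content c:\scripts\mygroups.txt
$names = 'GroupA', '', '   ', 'GroupB'

# Drop blank and whitespace-only lines before piping on
$clean = $names | Where-Object { $_.Trim() }

# Then: $clean | Get-ADGroup | Select-Object -ExpandProperty DistinguishedName
$clean
```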
I have a batch script that prompts a user for some input then outputs a couple of files I'm using in an AIX environment. These files need to be in UNIX format (which I believe is UTF8), but I'm looking for some direction on the SIMPLEST way of doing this.
I'd rather not download extra software packages such as Cygwin or GnuWin32. I don't mind coding this if it is possible; my coding options are Batch, PowerShell and VBS. Does anyone know of a way to do this?
Alternatively, could I create the files with Batch and call a PowerShell script to reformat them?
The idea here is that a user would be prompted for some information, and then I output a standard file, which is basically prompt answers for a job in AIX. I'm using Batch initially because I didn't know I would run into this problem, but I'm leaning towards redoing this in PowerShell, because I found some code on another forum that can do the conversion (below).
foreach ($i in ls -name DIR/*.txt) {
    get-content DIR/$i |
        out-file -encoding utf8 -filepath DIR2/$i
}
Looking for some direction or some input on this.
You can't do this without external tools in batch files.
If all you need is the file encoding, then the snippet you gave should work. If you want to convert the files inline (instead of writing them to another place) you can do
Get-ChildItem *.txt | ForEach-Object { (Get-Content $_) | Out-File -Encoding UTF8 $_ }
(The parentheses around Get-Content are important.) However, this will write the files in UTF-8 with a signature (U+FEFF) at the start, which some Unix tools don't accept (even though it's technically legal, if discouraged).
Then there is the problem that line breaks are different between Windows and Unix. Unix uses only U+000A (LF) while Windows uses two characters for that: U+000D U+000A (CR+LF). So ideally you'd convert the line breaks, too. But that gets a little more complex:
Get-ChildItem *.txt | ForEach-Object {
# get the contents and replace line breaks by U+000A
$contents = [IO.File]::ReadAllText($_) -replace "`r`n?", "`n"
# create UTF-8 encoding without signature
$utf8 = New-Object System.Text.UTF8Encoding $false
# write the text back
[IO.File]::WriteAllText($_, $contents, $utf8)
}
Try the overloaded version ReadAllText(String, Encoding) if you are using ANSI characters and not only ASCII ones.
$contents = [IO.File]::ReadAllText($_, [Text.Encoding]::Default) -replace "`r`n", "`n"
https://msdn.microsoft.com/en-us/library/system.io.file.readalltext(v=vs.110).aspx
https://msdn.microsoft.com/en-us/library/system.text.encoding(v=vs.110).aspx
ASCII - Gets an encoding for the ASCII (7-bit) character set.
Default - Gets an encoding for the operating system's current ANSI code page.
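If you want to verify that the output really is BOM-less, check the file's first bytes: a UTF-8 BOM is the byte sequence EF BB BF. The file name below is a placeholder:

```powershell
# Write a BOM-less UTF-8 file the same way as above
$path = Join-Path $PWD 'sample.txt'
$utf8 = New-Object System.Text.UTF8Encoding $false
[IO.File]::WriteAllText($path, "hello`n", $utf8)

# A UTF-8 BOM would show up as the first three bytes: 0xEF 0xBB 0xBF
$bytes  = [IO.File]::ReadAllBytes($path)
$hasBom = $bytes.Length -ge 3 -and $bytes[0] -eq 0xEF -and $bytes[1] -eq 0xBB -and $bytes[2] -eq 0xBF
$hasBom   # should report no BOM for the file written above
```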
I need to replace a simple string in a minified .js file after a successful build in VS2010.
So I'm trying to run a simple command line call from the Post-build events window.
This example, from here: https://blogs.technet.com/b/heyscriptingguy/archive/2008/01/17/how-can-i-use-windows-powershell-to-replace-characters-in-a-text-file.aspx totally mangles the resulting .js file. Something is wrong; I suspect it comes across some weird chars in my minified .js file that screw it up.
(Get-Content C:\Scripts\Test.js) |
Foreach-Object {$_ -replace "// Old JS comment", "// New JS comment"} |
Set-Content C:\Scripts\Test.js
How can I achieve such a simple task in a single line, like I could in Unix?
It would be great to see a diff of the files. Without more info, here are some notes:
Set-Content adds a new empty line at the end (probably not a problem for you)
You can use -replace operator like this:
(gc C:\Scripts\Test.js) -replace 'a','b' | sc C:\Scripts\Test.js
-replace works on arrays too.
You could read the content via [io.file]::ReadAllText('c:\scripts\test.js') and use -replace, but again, I don't think there would be a significant difference.
Edit:
Double quotes are used when evaluating the string. Example:
$r = 'x'
$a = 'test'
'beg',1,2,"3x",'4xfour','last' -replace "1|$r","$a"
gives
beg
test
2
3test
4testfour
last
To save the content with no trailing newline, just use [io.file]::WriteAllText:
$repl = (gc C:\Scripts\Test.js) -replace 'a','b' -join "`r`n"
[io.file]::WriteAllText('c:\scripts\test.js', $repl)
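To see the difference between the two approaches, compare the resulting file sizes (the file names are placeholders):

```powershell
# Set-Content appends a trailing newline; WriteAllText writes the string as-is
$a = Join-Path $PWD 'a.txt'
$b = Join-Path $PWD 'b.txt'

Set-Content $a 'hello'
[IO.File]::WriteAllText($b, 'hello')

(Get-Item $a).Length   # 5 chars plus a newline (encoding-dependent size)
(Get-Item $b).Length   # exactly 5 bytes
```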