I am planning to write an object-oriented shell (based on Python). I have many ideas already, but before I implement it, I want to draw inspiration from some existing shells.
What I basically mean by object-oriented:
Parameters are not just an array of strings but an array of objects.
The return value is also an object.
There are not just stdin, stdout and stderr but an arbitrary number of named streams, which can be of certain types (not just streams of bytes).
I have read that Windows PowerShell is somewhat like that (it is based on .NET). However, I am searching for existing Linux/macOS shells.
Of course there is also IPython but it is not really intended as a Unix shell, i.e. piping stuff around is quite complicated.
Microsoft's PowerShell. Installed by default on Windows 7 & Server 2008, and it can be installed on XP & Vista. It's a really good tool; it takes a while to warm up to, but once you do it's really useful.
The feature I really love in it is the filtering:
ls | where-object { $_.length -eq 0 }
which can be rewritten in the compact form
ls | ? { $_.length -eq 0 }
and the transformation (followed by its compact form):
ls | foreach-object { $_.name -replace "folderName","daba" }
ls | % { $_.name -replace "folderName","daba" }
You can also easily create pipe filters within the shell language, which is a pretty neat feature.
function concat() {
    Begin   { $rez = "" }
    Process { $rez = $rez + $_ }
    End     { $rez }
}
ls | % { $_.name } | concat
The last expression lists all files, extracts the file names and concatenates them into a single string (there might be a cmdlet that does this, but I don't remember its name).
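As an aside, PowerShell's -join operator can concatenate directly, in case the cmdlet I couldn't remember doesn't turn up (a small example):

(ls | % { $_.name }) -join ""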
Another important part of PowerShell is introspection: you can query your objects' properties/methods from the command line:
ls | get-member
Really useful for playing with new objects; it's a bit more descriptive than dir() from Python.
You may want to take a look at Pash.
It is an open-source implementation of PowerShell for other platforms. For educational purposes and inspiration it might be useful. Unfortunately, as far as I can see, this promising project is no longer being actively developed.
According to the shell comparison list on Wikipedia, the only existing shells which can do that are MS PowerShell and IPython (if that counts as a command shell) with the IPipe extension for piping.
If you only count real cross-platform solutions, MS PowerShell cannot be used. There is the Pash port of it (thanks to Roman for pointing it out), though it is incomplete and thus not really usable.
So, to answer my question: There is no such thing yet.
Related
I have a directory containing hundreds of thousands of PDF files with quite complex names. I need to be able to move SOME (not all files) from the directory they're in to another directory. Here is an example of my .sh script that handles it:
#!/bin/bash
/usr/bin/echo "Moving subset 300-399"
# 300-399
/usr/bin/mv *-*-*-3[0-9][0-9]-*-*-*-*.pdf ../destination_folder/
/usr/bin/echo "Moving subset 450-499"
# 450-499
/usr/bin/mv *-*-*-4[5-9][0-9]-*-*-*-*.pdf ../destination_folder/
/usr/bin/echo "Moving subset 500-599"
# 500-599
/usr/bin/mv *-*-*-5[0-9][0-9]-*-*-*-*.pdf ../destination_folder/
Because there are so many files and I think that mv is performing an evaluation on every single one, it's taking upwards of two hours to perform the work. This is a script that must be run EVERY day, so I need to find a more efficient way to do the work. Is there a more efficient command I can utilize in a Windows environment or a more efficient way I can evaluate each file in order to speed up the mv process?
As mentioned in the comments, PowerShell will probably be faster as it is native to Windows. The difference in speed will depend on the bash implementation you are using.
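For reference, a minimal PowerShell sketch of the same move (the 8-field file naming is assumed from your examples; adjust the pattern and paths to your layout):

# Move the 300-399, 450-499 and 500-599 subsets in a single pass.
Get-ChildItem -File *.pdf |
    Where-Object { $_.Name -match '^([^-]+-){3}(3[0-9][0-9]|4[5-9][0-9]|5[0-9][0-9])(-[^-]+){4}\.pdf$' } |
    Move-Item -Destination ..\destination_folder\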
For a pure bash solution, you can try:
#!/bin/bash
find /input/folder -maxdepth 1 -regextype posix-extended \
     -regex '.*/([^-]+-){3}(4[5-9]|[35][0-9])[0-9](-[^-]+){4}\.pdf' \
     -exec mv -t /destination/folder {} +
Explanation:
find /input/folder -maxdepth 1 -regextype posix-extended -regex :
find every file directly inside your input folder whose path matches the regex (find matches against the whole path, hence the leading .*/; -maxdepth 1 keeps it from descending into subdirectories, like the original glob)
'.*/([^-]+-){3}(4[5-9]|[35][0-9])[0-9](-[^-]+){4}\.pdf'
the pattern matching your files: three dash-separated fields, a fourth field in the ranges 300-399, 450-499 or 500-599, then four more fields (POSIX ERE does not support (?:...) non-capturing groups, so plain groups are used)
-exec mv -t /destination/folder {} +
execute the mv command on every file found; -t names the target directory up front so the list of files can come last
the + means find batches the matched files into as few mv invocations as possible, instead of spawning one mv per file
It is worth mentioning that the duration of these mv commands naturally depends on the amount of data: the total size of the PDF files being moved.
Please note that the mv command has two different behaviors with very different performance, depending on the location of the ../destination_folder/ directory:
../destination_folder/ and *.pdf files on different file systems: the mv command is copying the files and then removing them from the source directory.
../destination_folder/ and *.pdf files on the same file system: only a rename is done which is super fast.
The df command can be used to show which file system the ../destination_folder/ directory resides on.
If you can choose the destination directory, make sure it is located on the same file system as the source: expect a great improvement.
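Since the job runs on Windows, you can also check where both directories live from PowerShell (a sketch; paths assumed from the question):

# If these two roots differ, the move crosses file systems and degrades
# from a cheap rename to a copy followed by a delete.
(Get-Item '.').PSDrive.Root
(Get-Item '..\destination_folder').PSDrive.Root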
In addition, if the ../destination_folder/ directory is located on a remote server, the duration also depends on the network speed. If that is your situation, compressing/uncompressing the files while moving is worth testing: the performance can be much better.
If you have bash on Windows, you can run each command in the background with the & suffix and parallelize the work to achieve better performance. Use the wait builtin to wait for the background processes to complete. For example:
/usr/bin/echo "Moving subset 300-399"
/usr/bin/mv *-*-*-3[0-9][0-9]-*-*-*-*.pdf ../destination_folder/ & # Run this line in the background
# Other async calls
# Wait for background processes to finish
wait
If you want PowerShell, you can use Start-Job to run these in the background. To use your 300 subset as an example:
Write-Host "Moving subset 300-399"
$mv300jb = Start-Job {
    # Note: background jobs may not inherit the caller's working directory,
    # so prefer absolute paths (or an explicit Set-Location) inside the script block.
    $sourceFiles = Get-ChildItem -File .\*-*-*-3*-*-*-*-*.pdf | Where-Object {
        $_.FullName -match '\\(\w+-){3}3[0-9]{2}(-\w+){4}\.pdf$'
    }
    Move-Item -Path $sourceFiles -Destination "..\destination_folder"
}
# Here you would also start other async jobs, assigning $mv400jb, $mv500jb, etc. like above
...
# Wait for the job to complete
while( $mv300jb.State -notin 'Completed', 'Failed' ) {
    Start-Sleep 30 # Change this to the number of seconds between polls
}
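If you don't need to poll, Wait-Job blocks until a job reaches a terminal state (a small sketch reusing the same job variable):

Wait-Job $mv300jb | Out-Null   # blocks until the job finishes or fails
Receive-Job $mv300jb           # fetch any output the job produced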
Honorable mention
A second alternative on Windows would be to use robocopy.exe, which copies and moves files more performantly than the standard copy and move commands; its /mov switch moves files, and the /mt parameter makes use of multi-threading.
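A rough, untested sketch of those switches (C:\source stands in for your PDF directory; note that robocopy wildcards only support * and ?, not character ranges, so the numeric subsetting would still have to happen some other way):

# /mov deletes each file from the source after copying; /mt:8 uses 8 threads.
robocopy.exe C:\source ..\destination_folder *.pdf /mov /mt:8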
Explaining the regex
Note: I have since learned that you can use basic character ranges with Get-ChildItem and some other PowerShell cmdlets which support globbing. See my edit at the bottom of this answer for more information.
Since it was asked, here's a breakdown of the .NET regex I used to match the filename:
\\(\w+-){3}3[0-9]{2}(-\w+){4}\.pdf$
\\: Literal \ character
(\w+-): Looks for group of one or more \w word-characters followed by a -
{3}: Quantifier to match on exactly 3 occurrences of the previous group
3[0-9]: Looks for a literal 3 followed by a digit character
{2}: Quantifier matching exactly two occurrences of the preceding [0-9] class (so the 3 is followed by two digits)
(-\w+): Looks for a group consisting of a literal - followed by one or more word-characters \w
{4}: Quantifier to match exactly 4 occurrences of the previous group
\.pdf: Literal . character followed by pdf
$: End of input/string
At the time of writing I was unaware that character ranges can be used with globbing in Get-ChildItem, so I resorted to a regular expression to find the exact number of fields matching the specific number pattern in the 4th field, while ensuring the 8-field filename was intact for any found files.
If you plug this expression into https://regexr.com, it will break the expression down and explain everything better visually than I can here, without making this answer too long.
EDIT
As I learned the other day, you can use character ranges with PowerShell's file matching, though this doesn't work in other contexts within Windows. In my example above the following line can be modified to match letter and number ranges as well without having to use regex. If you take the following code from above:
$sourceFiles = Get-ChildItem -File .\*-*-*-3*-*-*-*-*.pdf | Where-Object {
$_.FullName -match '\\(\w+-){3}3[0-9]{2}(-\w+){4}\.pdf$'
}
we can use globbing to match on the filename without having to use the Where-Object or regular expression, greatly reducing the complexity of this bit:
$sourceFiles = Get-ChildItem -File .\*-*-*-3[0-9][0-9]-*-*-*-*.pdf
Here is the modified code for eschewing the regex in favor of globbing:
Write-Host "Moving subset 300-399"
$mv300jb = Start-Job {
    $sourceFiles = Get-ChildItem -File .\*-*-*-3[0-9][0-9]-*-*-*-*.pdf
    Move-Item -Path $sourceFiles -Destination "..\destination_folder"
}
# Here you would also start other async jobs, assigning $mv400jb, $mv500jb, etc. like above
...
# Wait for the job to complete
while( $mv300jb.State -notin 'Completed', 'Failed' ) {
    Start-Sleep 30 # Change this to the number of seconds between polls
}
The availability of this feature seems to hinge on whether a PowerShell construct is performing the globbing (it works) or if it is native to the Win32 API (does not work). In other words, it seems to be supported by PowerShell but not by other Windows APIs.
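For example, this works because PowerShell's file-system provider expands the range itself (hypothetical file names):

# Matches report-300.pdf through report-399.pdf in the current directory.
Get-ChildItem .\report-3[0-9][0-9].pdf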
I am trying to extract each line from a CSV that has over 1 million (1,000,000) lines, where the first character is a 1.
The 1 in this case refers to the 1st line of a log. There are several different logs in this file, and I need the first line from all of them. The problem is (as you might expect) that 1 is not unique and can appear in any of the 12 'columns' of data I have in this CSV.
Essentially, I would like to extract them all to a new CSV file as well, for further break down.
I know it sounds simple enough, but I cannot seem to get the information I need.
I have searched StackOverflow, Microsoft, Google and my own Tech Team.
PS: Get-Content 'C:\Users\myfiles\Desktop\massivelogs.csv' | Select-String "1" | Out-File "extractedlogs.csv"
The immediate answer is that you must use Select-String '^1' in order to restrict matching to the start (^) of each input line.
However, a much faster solution is to use the switch statement with the -File option:
$inFile = 'C:\Users\myfiles\Desktop\massivelogs.csv'
$outFile = 'extractedlogs.csv'
& { switch -File $inFile -Wildcard { '1*' { $_ } } } | Set-Content $outFile
Note, however, that the output file won't be a true CSV file, because it will lack a header row.
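If the input's first line is a header row, a small addition carries it over (a sketch, assuming the header really is line 1 of the input):

# Copy the header line first, then append the matching data lines.
Get-Content $inFile -TotalCount 1 | Set-Content $outFile
& { switch -File $inFile -Wildcard { '1*' { $_ } } } | Add-Content $outFile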
Also, note that Set-Content applies an edition-specific default character encoding (the active ANSI code page in Windows PowerShell, BOM-less UTF-8 in PowerShell Core); use -Encoding as needed.
Using -Wildcard with a wildcard pattern (1*) speeds things up slightly, compared to -Regex with ^1.
I have a Unix script (Korn, to be exact) that is working well, and I need to convert it to a Windows batch script. So far I have tried inserting a PowerShell command line in my code, but it doesn't work. Please help; I am new to both Unix scripting and Windows scripting, so any help will do.
This is the line of code that I need to convert:
#create new file to parse ; exclude past instances of timestamp
parsefile=/tmp/$$.parse
sed -e "1,/$TIMESTAMP/d" -e "/$TIMESTAMP/d" $DSTLOGFILE > $parsefile
So far I have tried a powershell command line to be called on my script but it didn't work:
:set_parse_file
#powershell -Command "Get-Content $SCHLOGFILE | Foreach-Object {$_ -replace('1,/"$TIMESTAMP"/d' '/"$TIMESTAMP"/d'} | Set-Content $PARSEFILE"
Any suggestions please?
PowerShell has no sed-like constructs for processing ranges of lines (e.g., sed interprets 1,/foo/ as the range of consecutive lines from line 1 through the next line that matches the regex foo).
Emulating this feature with line-by-line processing would be much more verbose, but a comparatively concise version is possible if the input file is processed as a whole - which is only an option for files small enough to fit into memory, however (PSv5+ syntax).
Here's the pure PowerShell code:
$escapedTimeStamp = [regex]::Escape($TIMESTAMP)
(Get-Content -Raw $SCHLOGFILE) -replace ('(?ms)\A.*?\r?\n.*?' + $escapedTimeStamp + '.*?\r?\n') `
-replace ('(?m)^.*?' + $escapedTimeStamp + '.*\r?\n') |
Set-Content -NoNewline $PARSEFILE
Note that [regex]::Escape() is used to make sure that the value of $TIMESTAMP is treated as a literal, even if it happens to contain regex metacharacters (chars. with special meaning to the regex engine).
Your ksh code doesn't do that (and it's nontrivial to do in ksh), so if - conversely - $TIMESTAMP should be interpreted as a regex, simply omit that step and use $TIMESTAMP directly.
The -replace operator is regex-based and uses the .NET regular-expression engine.
It is the use of Get-Content's -Raw switch that requires PSv3+ and the use of Set-Content's -NoNewline switch that requires PSv5+. You can make this command work in earlier versions, but it requires more effort.
Calling the above from cmd.exe (a batch file) gets quite unwieldy - and you always have to be wary of quoting issues - but it should work:
#powershell.exe -noprofile -command "$escapedTimeStamp = [regex]::Escape('%TIMESTAMP%'); (Get-Content -Raw '%SCHLOGFILE%') -replace ('(?ms)\A.*?\r?\n.*?' + $escapedTimeStamp + '.*?\r?\n') -replace ('(?m)^.*?' + $escapedTimeStamp + '.*\r?\n') | Set-Content -NoNewline '%PARSEFILE%'"
Note how the -command argument is passed as a single "..." string, which is ultimately the safest and conceptually cleanest way to pass code to PowerShell.
Also note the need to embed batch variables as %varname% in the command, and since they are enclosed in embedded '...' above, the assumption is that their values contain no ' chars.
Therefore, consider implementing your entire script in PowerShell - you'll have a much more powerful scripting language at your disposal, and you'll avoid the quoting headaches that come from bridging two disparate worlds.
I have a batch script that prompts a user for some input then outputs a couple of files I'm using in an AIX environment. These files need to be in UNIX format (which I believe is UTF8), but I'm looking for some direction on the SIMPLEST way of doing this.
I'd rather not have to download extra software packages such as Cygwin or GnuWin32. I don't mind coding this if it is possible; my options are batch, PowerShell and VBS. Does anyone know of a way to do this?
Alternatively, could I create the files with batch and call a PowerShell script to reformat them?
The idea here is that a user would be prompted for some information, and then I output a standard file which is basically the prompt answers for a job in AIX. I'm using batch initially, because I didn't know I would run into this problem, but I'm leaning towards redoing this in PowerShell, because I found some code on another forum that can do the conversion (below).
foreach ($i in ls -name DIR/*.txt) {
    get-content DIR/$i |
        out-file -encoding utf8 -filepath DIR2/$i
}
Looking for some direction or some input on this.
You can't do this without external tools in batch files.
If all you need is the file encoding, then the snippet you gave should work. If you want to convert the files inline (instead of writing them to another place) you can do
Get-ChildItem *.txt | ForEach-Object { (Get-Content $_) | Out-File -Encoding UTF8 $_ }
(the parentheses around Get-Content are important). However, this will write the files in UTF-8 with a signature (U+FEFF) at the start, which some Unix tools don't accept (even though it's technically legal, its use is discouraged).
Then there is the problem that line breaks are different between Windows and Unix. Unix uses only U+000A (LF) while Windows uses two characters for that: U+000D U+000A (CR+LF). So ideally you'd convert the line breaks, too. But that gets a little more complex:
Get-ChildItem *.txt | ForEach-Object {
# get the contents and replace line breaks by U+000A
$contents = [IO.File]::ReadAllText($_) -replace "`r`n?", "`n"
# create UTF-8 encoding without signature
$utf8 = New-Object System.Text.UTF8Encoding $false
# write the text back
[IO.File]::WriteAllText($_, $contents, $utf8)
}
Try the overloaded version ReadAllText(String, Encoding) if you are using ANSI characters and not only ASCII ones.
$contents = [IO.File]::ReadAllText($_, [Text.Encoding]::Default) -replace "`r`n", "`n"
https://msdn.microsoft.com/en-us/library/system.io.file.readalltext(v=vs.110).aspx
https://msdn.microsoft.com/en-us/library/system.text.encoding(v=vs.110).aspx
ASCII - Gets an encoding for the ASCII (7-bit) character set.
Default - Gets an encoding for the operating system's current ANSI code page.
I love this online tool http://textmechanic.co/ but it lacks another important feature: deleting special characters such as %, [, ), *, ?, ', etc., except for _, -, and ., from a large quantity of text.
I am looking for an online tool or a small windows utility or a batch script that can do this.
I think sed is the easiest choice here; native Windows builds of it are available for download. Furthermore, nearly every text editor should allow that (though most won't cope well with files in the multi-GiB range).
With sed you'd probably want something like this:
sed "s/[^a-zA-Z0-9_.-]//g" file.txt
Likewise, if you have a semi-recent Windows (e.g. Windows 7), then PowerShell comes preinstalled with it. The following one-liner will do that for you:
Get-Content file.txt | foreach { $_ -replace '[^\w\d_.-]' } | Out-File -Encoding UTF8 file.new.txt
This can easily be adapted to multiple files as well. You may also be able to write back into the original file, provided Get-Content is wrapped in parentheses so the whole file is read before the pipeline starts writing (otherwise the pipeline would try to read and write the file at the same time). Very large files remain a problem either way, since the contents are held in memory.
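A sketch of the in-place, multi-file variant (the parentheses around Get-Content are what make writing back to the same file safe):

Get-ChildItem *.txt | ForEach-Object {
    # Read the whole file first (note the parentheses), then rewrite it in place.
    (Get-Content $_) -replace '[^\w\d_.-]' | Out-File -Encoding UTF8 $_.FullName
}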
You can do regex with any tool/language that supports it. Here's a Ruby for Windows command
C:\work>ruby -ne "print $_.gsub(/[%)?\[\]*]/, '')" file