How to more accurately detect whether a file is a Ruby script on Linux - ruby

I know that checking for the .rb extension works.
I know the Linux file command also works in some cases.
But neither is accurate enough to decide whether a file is a Ruby script.
What I want to do is count the lines of Ruby I have written more accurately,
with a bash one-liner like the following:
find . -name '*.rb' | xargs -n100 cat | grep -v '^\s*#' | wc -l
But in fact I have also written some executable Ruby scripts and other Ruby files, e.g.
.rake, Gemfile, Capfile, jbuilder, etc.

Use ruby -c. From the man page:
-c Causes Ruby to check the syntax of the script and exit
This will tell you if the file is a valid Ruby script without executing it. If it is, it will print "Syntax OK" to STDOUT and exit with status code 0; otherwise it will print a syntax error to STDERR and exit with a nonzero code. (You can of course suppress the messages using I/O redirection, e.g. &>/dev/null.)
Of course, false positives are possible (the fact that a file is valid Ruby doesn't necessarily mean it was intended to be a Ruby script), but unlikely except with very short files.
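A minimal sketch of how the ruby -c check could be scripted from Ruby itself, assuming the interpreter running the script (located via RbConfig.ruby) is the one you want to check against. The helper names ruby_syntax_ok? and count_code_lines are hypothetical, and the counter simply skips blank lines and lines whose first non-blank character is #:

```ruby
require "open3"
require "rbconfig"

# Returns true when `ruby -c` accepts the file (exit status 0).
def ruby_syntax_ok?(path)
  _stdout, _stderr, status = Open3.capture3(RbConfig.ruby, "-c", path)
  status.success?
end

# Counts lines that are neither blank nor whole-line comments.
def count_code_lines(path)
  File.foreach(path).count { |line| line !~ /\A\s*(#|\z)/ }
end

# Usage (hypothetical file name): ruby count_lines.rb file1 file2 ...
total = ARGV.select { |path| File.file?(path) && ruby_syntax_ok?(path) }
            .sum { |path| count_code_lines(path) }
puts total unless ARGV.empty?
```

Feeding it the output of find (without the -name filter) would cover Gemfile, Capfile, .rake files and executable scripts too, subject to the false positives mentioned above.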

What you want is impossible. For example, the following is a valid, and semantically identical program in at least Ruby, PHP, Scala, and Perl:
print("Hello")
It is also valid in Python, although semantically slightly different: it prints a newline (i.e. it prints the string "Hello\n") while the others don't (the others print "Hello" without a newline).
It is also at least syntactically valid in ECMAScript, and may be semantically equivalent assuming a suitable print function exists in the standard library.
It is probably valid in a lot more languages than that; some that I can think of are AmbientTalk, Atomy, CoffeeScript, Converge, Dart, Dylan, E, Elixir, Falcon, Fancy, Groovy, Hack, Io, Ioke, Julia, Lua, Monte, Neko, Pico, Pike, and Seph. It is also a valid fragment, although not a complete program, in at least Perl6, C, C++, Objective-C, Objective-C++, D, Java, C♯, Spec♯, Sing♯, M♯, Cω, X♯, Kotlin, Ceylon, and Rust.
There is no way of knowing whether this is a Ruby program except asking the person who wrote it.


How does Bash parse multi-flag commands?

I'm trying to create an overly simplified version of bash, and I've tried to split the program into "lexer + expander, parser, executor".
In the lexer I store my data (commands, flags, files) and create tokens out of them. My procedure is simply to loop through the input character by character and use a state machine to track the state, which is either a special character, an alphanumeric character, or a space.
When I'm in the alphanumeric state, I'm at a command. I know where the next flag starts when I enter the alphanumeric state again or when input[i] == '-'. The problem is with multi-flag commands.
For example:
$ ls -la | grep "*.c"
I successfully get the commands ls and grep and the flags -la and *.c.
However, with multi-flag commands like
$ sed -i "*.bak" "s/a/b/g" file1 file2
it seems very difficult, and I can't yet figure out how to tell where the flags for a specific command end. So my question is: how does bash parse these multi-flag commands? Any suggestions regarding my problem would be appreciated!
The shell does not attempt to parse command arguments; that's the responsibility of the utility. The range of possible command argument syntaxes, both in use and potentially useful, is far too great to attempt that.
On Unix-like systems, the shell identifies individual arguments from the command line, mostly by splitting at whitespace but also taking into account the use of quotes and a variety of other transformations, such as "glob expansion". It then makes a vector of these arguments ("argv") and passes the vector to execve, which hands them to the newly created process.
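To make the "split into arguments, don't interpret them" idea concrete, here is a small sketch using Ruby's standard Shellwords module, which splits a line roughly the way a POSIX shell splits words. It is only an approximation of a real shell: it handles whitespace, quotes and backslashes, but does no glob expansion, pipelines or redirection. The helper name to_argv is hypothetical:

```ruby
require "shellwords"

# Split a command line into an argv-style array. Quotes group words
# and are removed; nothing is classified as "command" or "flag".
def to_argv(line)
  Shellwords.split(line)
end

p to_argv(%q(sed -i "*.bak" "s/a/b/g" file1 file2))
# => ["sed", "-i", "*.bak", "s/a/b/g", "file1", "file2"]
```

Everything after the first word is just an opaque string as far as the shell is concerned; deciding that -i is a flag and "*.bak" is its value is left to sed.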
On Windows systems, the shell doesn't even do that. It just hands over the command-line as a string, and leaves it to the command-line tool to do everything. (In order to provide a modicum of compatibility, there's an intermediate layer which is called by the application initialization code, which eventually calls main(). This does some basic argument-splitting, although its quoting algorithm is quite a bit simplified from that used by a Unix shell.)
No command-line shell that I know of attempts to identify command-line flags. And neither should you.
For a bit of extracurricular reading, have a look at the description of shell parsing in the POSIX standard. Trying to implement all of that goes far beyond the requirements given to you for this assignment, and I'm certainly not recommending that you do that. But it might still be interesting, and understanding it will help you immensely if you start using a shell.
Alternatively, you could try reading the Bash manual, which might be easier to understand. Note that Bash implements a lot of extensions to the POSIX standard.

Ruby script equivalent to the bash `set` builtin's `-x` flag?

When a bash script is running, the set -x command can be used to print all commands and output as they're being executed.
I'm writing a Ruby script. Is there a way to also have this print the commands and output as they're being executed?
While interpreted, Ruby parses and runs code very differently from languages like Bash or Tcl, so there's no built-in way to do exactly what you want. You'll have to use a debugger or REPL to get something that approximates what you're trying to do, but it won't really be the same as using flags like -x or -v in Bash. An external or IDE-based debugger will probably come closest, though.
A Couple of Options
There is no built-in way to do this, as Ruby is not really a line-by-line interpreted language in the same way as Bash or Tcl. While Ruby is generally considered an interpreted language, it actually uses a tokenizer and parser to generate code that runs on a virtual machine such as YARV or GraalVM. You do have a couple of options, though:
Use the -d flag or set $DEBUG to a truthy value in your code, and then do some level of introspection based on whether the debug flag is enabled. For example:
# 1 is printed because $DEBUG is truthy
$ ruby -e 'BEGIN { $DEBUG = true }; puts 1 if $DEBUG'
# nothing is printed because $DEBUG is falsey
$ ruby -e 'puts 1 if $DEBUG'
Please note that Ruby 3.0.3 and 3.1.0 seem to have an issue with the -d flag, so the first example uses a BEGIN statement to set the value of the flag inside the program.
Use the debug gem (now standard with Ruby 3). You can either step through the code with rdbg and use the list command liberally, or (if you're clever) script a series of list commands on specific lines using the ~/.rdbgrc file.
Use an external debugger, with or without rdbg. Note that the new debugger supports IDE-based debugging (e.g. with RubyMine or VS Code) and remote debugging, but setting up IDE or remote debugging is likely a topic outside the scope of a reasonable SO answer.
Use irb or pry with the debugger of your choice, which usually gives you a number of ways to inspect source code, frames, expressions, variables, and so on, although you need to run from an on-disk file rather than a REPL to access some of the functionality you may be looking for.
For the most part, if you're not using an IDE or a debugger, you will generally need to rely on return values in a REPL or Kernel#pp statements in your code to inspect return values as you go along. However, short of a debugger or REPL that supports listing methods or lines of code on request, you'll need to either use external tools or solve the underlying problem another way.
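One rough approximation that is not among the options above is Ruby's TracePoint API, which can report each line as it is executed. The sketch below is only that, a sketch: it prints the file and line number rather than the command text and its output, so it is not equivalent to set -x, and the helper name trace_lines is hypothetical:

```ruby
# Runs the given block while writing "+ path:lineno" for every
# line of Ruby executed, then switches tracing off again.
def trace_lines(io = $stderr)
  tracer = TracePoint.new(:line) { |tp| io.puts "+ #{tp.path}:#{tp.lineno}" }
  tracer.enable
  yield
ensure
  tracer&.disable
end

trace_lines do
  total = 1 + 2
  puts total
end
```

The trace goes to standard error by default so it does not mix with the script's normal output.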
Other Options
If you use pry, the pry-rescue gem along with pry-stack_explorer will allow you to automatically trigger a REPL session that allows you to traverse up and down the stack if you hit an exception without requiring you to start your session in the REPL or explicitly call binding.pry. On supported versions of Ruby, this can be very useful, especially since Pry supports a show-source -l command that will do something similar to what you want (at least interactively), although the line numbers may not be what you expect if the code is entered directly in the REPL rather than loaded from a Ruby program on disk.

Interpretation of additional arguments to Ruby's Kernel::system method

Why does the first excerpt succeed and the second fail?
system 'emacs', '--batch', '--quick', '--eval="(require \'package)"'
system 'emacs --batch --quick --eval="(require \'package)"'
(If it matters, I'm executing the code on Mac OS X Mountain Lion with Ruby version 1.8.7 and Emacs version 22.1.1.)
First of all, those two system calls are different in ways that you may not expect. A quick example will probably explain the difference better than a bunch of words and hand waving. Start with a simple shell script:
echo $1
I'll call that because I like pancakes more than foo. Then we can step into irb and see what's going on:
>> system('./ --where-is="house?"')
>> system('./', '--where-is="house?"')
Do you see the significant difference? The single argument form of system hands the whole string to /bin/sh for processing and /bin/sh will deal with the double quotes in its own way so the program being called will never see them. The multi-argument form of system doesn't invoke /bin/sh to process the command line so the arguments are passed as-is with double quotes intact.
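The same difference can be reproduced without a helper script. This sketch assumes a Unix-like system with an echo command on the PATH and uses Open3.capture2, which follows the same one-string versus several-strings rule as system. The method names echo_via_shell and echo_direct are made up for the illustration:

```ruby
require "open3"

# One string: it contains quotes, so Ruby hands it to /bin/sh,
# and the shell removes the double quotes before echo sees them.
def echo_via_shell(arg)
  out, _status = Open3.capture2("echo #{arg}")
  out.chomp
end

# Several strings: no shell is involved, so the argument is
# passed to echo exactly as written.
def echo_direct(arg)
  out, _status = Open3.capture2("echo", arg)
  out.chomp
end

puts echo_via_shell('--where-is="house?"')  # --where-is=house?
puts echo_direct('--where-is="house?"')     # --where-is="house?"
```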
Back to your system calls. The first one will send this exact argument to emacs (note that Ruby will take care of converting \' to just '):
--eval="(require 'package)"
and emacs will try to evaluate "(require 'package)"; that looks more like a string than an elisp snippet to me and evaluating a string literal doesn't do much of anything. Your second will send this to emacs:
--eval=(require 'package)
and emacs will complain that it
Cannot open load file: package
Note that my elisp knowledge is buried under about 20 years of rust and forgetfulness so some of the emacs details may be a bit off.

What is the use of "#!/usr/local/bin/ruby -w" at the start of a Ruby program?

What is the use of writing the following line at the start of a Ruby program?
#!/usr/local/bin/ruby -w
Is it an OS-specific command? Is it valid for Ruby on Windows? If not, what is the equivalent on Windows?
It is called a Shebang. It tells the program loader what command to use to execute the file. So when you run ./myscript.rb, it actually translates to /usr/local/bin/ruby -w ./myscript.rb.
Windows uses file associations for the same purpose; the shebang line has no effect (edit: see FMc's answer) but causes no harm either.
A portable way (working, say, under Cygwin and RVM) would be:
#!/usr/bin/env ruby
This will use the env command to figure out where the Ruby interpreter is, and run it.
Edit: apparently, precisely Cygwin will misbehave with /usr/bin/env ruby -w and try to look up ruby -w instead of ruby. You might want to put the effect of -w into the script itself.
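A minimal sketch of putting the effect of -w into the script itself, assuming that setting the global $VERBOSE to true at runtime is an acceptable stand-in. Warnings that Ruby would emit while parsing this same file have already been missed by the time the assignment runs:

```ruby
#!/usr/bin/env ruby
# Assumption: turning warnings on at runtime is close enough to passing -w.
$VERBOSE = true

puts "warnings enabled: #{$VERBOSE.inspect}"
```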
The Shebang line is optional, and if you run the ruby interpreter and pass the script to it as a command line argument, then the flags you set on the command line are the flags ruby runs with.
A Shebang line is not Ruby at all (unless you want to call it a Ruby comment). It's really shell scripting. Most Linux and Unix users are running the Bash shell (it stands for Bourne Again SHell), but pretty much every OS has a command interpreter that will honor the Shebang.
“#!/usr/local/bin/ruby -w”
The "she" part is the octothorpe (#), aka pound sign, number sign, hash mark, and now hashtag (I still call it tic-tac-toe just cuz).
The "bang" part is the exclamation mark (!), and it's like banging your fist on the table to exclaim the command.
On Windows, the "Shell" is the command prompt, but even without a black DOS window, the command interpreter will run the script based on file associations. It doesn't really matter whether the command interpreter or the programming language is reading the shebang and making sure the flags are honored; the important point is that they are honored.
The "-w" is a flag. Basically it's an instruction for ruby to follow when it runs the script. In this case "-w" turns on warnings, so you'll get extra warnings (script keeps running) or errors (script stops running) during the execution of the script. Warnings and exceptions can be caught and acted upon during the program. These help programmers find problems that lead to unexpected behavior.
I'm a fan of quick and dirty scripts to get a job done, so no -w. I'm also a fan of high quality reusable coding, so definitely use -w. The right tool for the right job. If you're learning, then always use -w. When you know what you're doing, and stop using -w on quick tasks, you'll start to figure out when it would have helped to use -w instead of spending hours troubleshooting. (Hint: when the cause of a problem isn't pretty obvious, just add -w and run it to see what you get.)
"-w" requires some extra coding to make it clear to Ruby what you mean, so it doesn't immediately solve things, but if you already write code with -w, then you won't have much trouble adding the necessary bits to make a small script run with warnings. In fact, if you're used to using -w, you're probably already writing code that way and -w won't change anything unless you've forgotten something. Ruby requires far less "plumbing code" than most (maybe all) compiled languages like C++, so choosing not to use -w doesn't save you much typing; it just lets you think less before you try running the script (IMHO).
-v is verbose mode, and does NOT change the running of the script (no warnings are raised, no stopping the script in new places). Several sites and discussions call -w verbose mode, but -w is warning mode and it changes the execution of the script.
Although the execution behavior of a shebang line does not translate directly to the Windows world, the flags included on that line (for example the -w in your question) do affect the running Ruby script.
Example 1 on a Windows machine:
#!/usr/local/bin/ruby -w
puts $VERBOSE # true
Example 2 on a Windows machine:
puts $VERBOSE # false

Converting a history command into a shell script

This is sort of one of those things that I figured a lot of people would use a lot, but I can't seem to find any people who have written about this sort of thing.
I find that a lot of times I do a lot of iteration on a command-line one-liner and when I end up using it a lot, or anticipate wanting to use it in the future, or when it becomes cumbersome to work with in one line, it generally is a good idea to turn the one-liner into a shell script and stick it somewhere reasonable and easily accessible like ~/bin.
It's obviously too cumbersome to use any sort of roundabout method involving a text editor to get this done, and it's possible to simply do it on the shell, for instance in zsh typing
echo "#!/usr/bin/env sh" > ~/bin/ && echo !523 >> ~/bin/
followed by pressing Tab to inject the !523rd command and somehow shoehorning it into an acceptable string to be saved.
This is particularly cumbersome and has at minimum three problems:
Does not work in bash as it does not complete the !523
Requires some manual inspection and string escapement
Requires too much typing such as the script name must be entered twice
So it looks like I need to do some meta shell scripting here.
I think a good solution would function under both bash and zsh, and it should probably work by taking two arguments, an integer for the history command number and a name for the shell script to poop out in a hardcoded directory which contains that one command. Furthermore, under bash, it appears that multi-line commands are treated as separate commands, but I'm willing to assume that we only care about one-liners here and I only use zsh anyway at this point.
The stumbling block here is that I think I'll still be running shell scripts through bash even when using zsh, so the script likely won't be able to parse zsh's history files. I may need to make this into two separate programs then.
Update: I agree with @Floris's comment that direct use of commands like !! would be helpful, though I am not sure how to make this work. Suppose the usage is
mkscript command_number_24 !24
This is inadequate because mkscript will receive the expanded contents of the 24th command. If the 24th command contains any file globs or the like, they will already have been expanded. This is bad; I basically want the contents of the history file, i.e. the raw command string. I guess this can be worked around by manually implementing those shortcuts in here. Or just screw it and take an integer argument.
function mkscript() {
    # $1 = history number, $2 = name of the script to create in ~/bin
    echo '#!/bin/bash' > ~/bin/"$2"
    history -p '!'"$1" >> ~/bin/"$2"
}
Only tested in Bash.
Update from OP: In zsh I can accomplish this with fc -l $2 $2
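The file-writing half of this could also live in a small shell-agnostic helper, which avoids needing one version per shell. The following is a sketch only: write_script, its dir and shebang parameters, and the mkscript.rb file name are all hypothetical, and obtaining the raw command text from history is still left to the calling shell:

```ruby
require "fileutils"

# Writes `command` into an executable script at dir/name and
# returns the path of the new file.
def write_script(name, command, dir: File.expand_path("~/bin"), shebang: "#!/bin/bash")
  FileUtils.mkdir_p(dir)
  path = File.join(dir, name)
  File.write(path, "#{shebang}\n#{command}\n")
  File.chmod(0o755, path)
  path
end

# Command-line use: ruby mkscript.rb NAME 'raw command text'
if __FILE__ == $PROGRAM_NAME && ARGV.size == 2
  puts write_script(ARGV[0], ARGV[1])
end
```

From either shell it might be called as ruby mkscript.rb myscript "$(fc -ln 24 24)", since fc -ln prints the stored history entry without its number; that invocation is untested here.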
