Most reliable way to get text into a Ruby script

I have a Ruby script that’ll do some text parsing (à la Markdown). It does it in a sequence of steps, like:
string = string.gsub # more code here
string = string.gsub # more code here
# and so on
What is the best (i.e. most reliable) way to feed text into string in the first place? It’s a script, and the text it’ll be fed can vary a lot: it can be multilingual, contain characters that might trip up a shell (like ", ', ’, &, $, you get the idea), and will likely be multi-line.
Is there some trick along the lines of
cat << EOF
bunch of text here
EOF
Additional considerations
I’m not looking for a Markdown parser; this is something I want to do myself, not something I want a tool for.
I’m not a big Ruby user (I’m just starting to use it), so the more detailed the answer, the better.
It must be completely scriptable (i.e., no interrupting to ask the user for information).

The Kernel#gets method reads a string, up to the record separator ($/, a newline by default), from stdin or from the files specified on the command line. So if you use it, you can do things like:
yourscript < filename    # read from filename
yourscript file1 file2   # read both file1 and file2
yourscript               # lets you type at your script
So you can run something like:
cat <<'eof' | ruby yourscript.rb
This' & will $all 'eof' be 'fine'''
eof
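Because the heredoc delimiter is quoted ('eof'), the shell performs no expansion inside it, so the text reaches Ruby verbatim. The script might contain something like:
s = gets            # read a line
lines = readlines   # read all lines into an array
That's fairly standard for command-line scripts. If you want a user interface, you'll need something more complex. The Ruby interpreter's -E (--encoding) option sets the default external encoding, i.e. how input is interpreted as it is read (for example, ruby -E UTF-8 yourscript.rb).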
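A minimal sketch of how the pieces fit together, assuming the text arrives through ARGF (the same source gets and readlines use) and that the conversion is the question's chain of gsub calls. The three rules in convert are hypothetical placeholders, not a real Markdown implementation; the point is that once the text is inside Ruby it is an ordinary string, so shell-special characters need no escaping.

```ruby
# Hypothetical conversion steps standing in for the question's gsub chain.
def convert(text)
  text = text.gsub(/\*\*(.+?)\*\*/, '<strong>\1</strong>')  # **bold**
  text = text.gsub(/\*(.+?)\*/, '<em>\1</em>')              # *italic*
  text = text.gsub(/^# (.+)$/, '<h1>\1</h1>')               # "# Heading" lines
  text
end

# ARGF reads the files named on the command line, or stdin when there are none.
# Reading it all at once keeps multi-line constructs together for the regexes.
def main(input = ARGF)
  print convert(input.read)
end
```

A real script would end with a call to main and could then be fed with the quoted heredoc shown above.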

Just read from stdin (which is an IO object):
$stdin.read
As you can see, stdin is provided in the global variable $stdin. Since it’s an IO object, there are a lot of other methods available if read doesn’t suit your needs.
Here’s a simple one-line example in the shell (printf is used rather than echo, because echo's handling of \n varies between shells):
$ printf 'foo\nbar\n' | ruby -e 'puts $stdin.read.upcase'
FOO
BAR
Reading from stdin is extremely flexible, since you can pipe input in from anywhere.
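If the input can be large and the rules never need to see more than one line at a time, the other IO methods also allow streaming. A minimal sketch, assuming a line-by-line transform (upcase is only a placeholder); input and output are parameters so any IO can be passed in.

```ruby
# Stream input line by line instead of slurping it with read, so memory use
# depends on the longest line rather than on the total input size.
# upcase is a placeholder for real per-line work.
def upcase_lines(input = $stdin, output = $stdout)
  input.each_line do |line|  # each line keeps its trailing "\n", if any
    output.print line.upcase
  end
end
```

Called at the end of a script, upcase_lines behaves like the one-liner above. Rules that span several lines (such as multi-line Markdown blocks) need the whole text at once, so $stdin.read remains the simpler choice for the original question.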

Ruby is very adept at encodings (see e.g. the Encoding docs). To get text into Ruby, one typically uses gets, reads File objects, or uses a GUI, which one can build with the gtk2 gem or rugui (if it is finished). If you are getting text from the wild internet, security should be a concern. Ruby used to have four $SAFE levels; level 4 was removed in Ruby 2.1, and $SAFE was removed entirely in Ruby 3.0, so do not rely on it. In any case, the best strategy for handling strings is to know as much as possible in advance about the properties of the strings you expect. Handling absolutely arbitrary strings is a surprisingly difficult task. Try to limit the number of possible encodings and figure out the maximum size of string you expect.
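The advice to limit encodings and size can be made concrete. A minimal sketch, assuming the input is supposed to be UTF-8; MAX_BYTES is a hypothetical limit, and rejecting bad input (rather than repairing it, e.g. with String#scrub) is a design choice, not the only option.

```ruby
MAX_BYTES = 1_000_000  # hypothetical limit: choose one that fits your data

# Read at most max_bytes + 1 bytes, so oversized input is detected
# without loading all of it into memory, then insist on valid UTF-8.
def read_limited(io, max_bytes = MAX_BYTES)
  data = io.read(max_bytes + 1) || String.new  # read(n) returns nil at EOF
  if data.bytesize > max_bytes
    raise ArgumentError, "input larger than #{max_bytes} bytes"
  end
  text = data.force_encoding(Encoding::UTF_8)
  raise ArgumentError, "input is not valid UTF-8" unless text.valid_encoding?
  text
end
```

A script would call read_limited($stdin) and report the ArgumentError to the user.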
Also, with respect to your original goal of writing a Markdown-processor-like something, you might not want to reinvent the wheel (unless it is for didactic purposes). There is this SO post:
Better ruby markdown interpreter?
The answer will direct you to the kramdown gem, which gets a lot of praise, though I have not tried it personally.

Related

How to create a script that takes a string and converts it to another managed string?

My intent is to capture the values of a string that I type and have those values be shifted to other letters. Essentially it would be a fake translation program or custom cipher generation script. Example of function:
I would type the sentence:
Who are you?
and the output would be shifted by, let's say, 1 to the next consonant or vowel. The script would also need to know how to skip vowels or consonants as needed, and for the sake of argument, y would always be considered a vowel. So the output would be:
Xju eso auy?
This is something I wanted to attempt for a creative writing project as a means of making another language. Ideally the shift variable could be an input as well to work with to find the best outcome. Possibly even variable shifts for vowels and consonants at the same time?
If you truly are doing this for a creative writing project, then I submit that diving deep into programming is not warranted. None of the input transformations you described require the program to make decisions. That is, once an encoding is chosen, each incoming letter is firmly associated with one outgoing letter. This greatly expands your options for how to achieve it and greatly reduces the complexity of the task.
Since you tagged Terminal, here are a couple commands you could use in action:
echo "Who are you?" | perl -pe 'tr/N-ZA-Mn-za-m/A-Za-z/'
outputs: Jub ner lbh?
This is the famous Rot13 "encoding" (all it does is substitute the letter that is 13 later in the alphabet). It's particularly handy as 13 is half the alphabet's 26, so putting some "encoded" text in will give you back the original text:
echo "Jub ner lbh?" | perl -pe 'tr/N-ZA-Mn-za-m/A-Za-z/'
outputs: Who are you?
echo just sends text to the screen or to other commands. Here we echo our text "Who are you?" into a pipe | to pass it to the next command, perl, which is a very powerful and flexible text-manipulation and reporting program. The rest of the line tells perl to replace each letter with the one 13 places later in the alphabet, wrapping around from Z to A.
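For comparison, Ruby's String#tr accepts the same kind of character ranges as perl's tr/// above, so the same Rot13 mapping can live inside a Ruby script. A minimal sketch:

```ruby
# Rot13 with String#tr: each character in the first set is replaced by the
# character at the same position in the second set; others pass through.
def rot13(text)
  text.tr("A-Za-z", "N-ZA-Mn-za-m")
end

puts rot13("Who are you?")          # prints "Jub ner lbh?"
puts rot13(rot13("Who are you?"))   # applying it twice restores the original
```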
Quick note: normally, pressing Return runs the command in Terminal. If you put a backslash \ at the end of a line and then press Return, you can keep typing on the next line, and it is all treated as one command. Handy for lining things up.
echo "How are you?" | tr \
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz' \
'DFVBTXEUWZOSHCJMAQYRINKLPGdfvbtxeuwzoshcjmaqyrinklpg'
outputs: Ujk dqt pji?
This uses another command, tr. The example demonstrates an arbitrary substitution, in this case a random one. tr looks up each character in the first long set of letters and swaps in the letter at the matching position in the second set. Since this substitution is random, you could use this kind of mapping to create "Cryptogram" puzzles.
The great thing about the tr command is that you can give it whatever input-to-output "mapping" you'd like. Sure, it's a bit manual, but hey: no programming needed!
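The random key above was presumably typed by hand. A minimal sketch of generating one in Ruby instead, assuming an integer seed (a hypothetical parameter) so the same key can be rebuilt to decode a puzzle; it applies the key with String#tr, the same position-for-position substitution the tr command performs.

```ruby
# Build a random one-to-one letter substitution (a cryptogram key).
# The same seed always gives the same key, so puzzles can be decoded later.
def random_key(seed)
  rng = Random.new(seed)
  upper = ("A".."Z").to_a.shuffle(random: rng).join
  upper + upper.downcase  # 52 target letters, in "A-Za-z" order
end

def encode(text, seed)
  text.tr("A-Za-z", random_key(seed))
end

def decode(text, seed)
  text.tr(random_key(seed), "A-Za-z")
end

puts encode("Who are you?", 42)
```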
Here's the mapping to achieve your requested "consonants and vowels" example shift:
echo "Who are you?" | tr \
'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz' \
'ECDFIGHJOKLMNPUQRSTVYWXZABecdfighjoklmnpuqrstvywxzab'
Outputs: Xju esi auy? Not doing it by hand has its advantages: you missed a vowel in there (your example has "eso" where the mapping gives "esi").
So if you need to rapidly try different mappings, consider learning a bit more about perl (or something simpler, such as sed, or more complex, such as awk, among many others). If, instead, you don't mind a bit of careful command construction, just lining up each incoming letter with your desired output letter, I think tr would serve nicely.
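The question also asks for separate shifts for vowels and consonants. Generating the tr mapping in code makes that easy and avoids hand-copying mistakes. A minimal Ruby sketch, assuming y is always a vowel as the question specifies; vowel_shift and consonant_shift are hypothetical parameter names.

```ruby
# Vowels and consonants each form their own ring; y counts as a vowel.
VOWELS     = "aeiouy"
CONSONANTS = ("a".."z").to_a.join.delete(VOWELS)  # "bcdfghjklmnpqrstvwxz"

# Rotate a group left by n places, wrapping around (negative n works too).
def rotate(group, n)
  n %= group.length
  group[n..-1] + group[0...n]
end

# Build the full letter mapping and apply it with String#tr, keeping case.
def shift_cipher(text, vowel_shift: 1, consonant_shift: 1)
  from = VOWELS + CONSONANTS
  to   = rotate(VOWELS, vowel_shift) + rotate(CONSONANTS, consonant_shift)
  text.tr(from + from.upcase, to + to.upcase)
end

puts shift_cipher("Who are you?")   # prints "Xju esi auy?"
```

Negative shifts undo positive ones, so shift_cipher(text, vowel_shift: -1, consonant_shift: -1) decodes the example.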

How can I get piped data along with arguments in Ruby 2.4?

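In Python 3, I have this code:
import argparse
import fileinput
parser = argparse.ArgumentParser()
parser.add_argument('--option')  # e.g. the --option used in the Ruby example below
arg, unk = parser.parse_known_args()
buf = ''
for line in fileinput.input(unk):
    buf += line
fileinput.close()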
This code allows me to get piped data along with arguments to the program.
What I am trying to achieve is to get piped e-mail from Postfix. Postfix pipes the e-mail to my Python app and also adds some arguments that I want. In Ruby I cannot find a proper way of doing this. The piped data can be at most ~25 MB, so I need a correct, robust way of handling it; I want to handle even large files without issues.
ruby test.rb --option ARG
Of course, I can get arguments easily but I also want to get PIPED data.
In fact, I cannot find the exact method Ruby provides for getting piped data. I am stuck at this point. Can anyone give me a hand with this?
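It seems that in Ruby you want to read from ARGF. It handles both files passed as filenames and data piped to your program's stdin. Parse your own options first (for example with OptionParser#parse!, which removes them from ARGV); otherwise ARGF will try to open them as files.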
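A minimal sketch of that pattern, assuming OptionParser from the standard library: parse the options first, because OptionParser#parse! removes them from ARGV, then read ARGF, which treats whatever is left in ARGV as file names or falls back to stdin. The --option flag mirrors the question's example; reading everything with read assumes ~25 MB fits comfortably in memory.

```ruby
require "optparse"

# Parse options and remove them from argv, leaving only file names (if any).
def parse_options(argv)
  options = { option: nil }
  OptionParser.new do |opts|
    opts.banner = "Usage: test.rb [--option ARG] [FILE...]"
    opts.on("--option ARG", "Example option from the question") { |v| options[:option] = v }
  end.parse!(argv)
  options
end

# Read the whole message as raw bytes, so no encoding conversion is attempted.
def read_input(io)
  io.binmode
  io.read
end

def main(argv = ARGV, input = ARGF)
  options = parse_options(argv)  # must run first: ARGF treats leftovers in ARGV as files
  mail = read_input(input)
  warn "option=#{options[:option].inspect}, read #{mail.bytesize} bytes"
end
```

With a call to main on the last line of test.rb, both `cat message.eml | ruby test.rb --option ARG` and `ruby test.rb --option ARG message.eml` should work. If memory ever becomes a concern, ARGF.each_line can stream the message instead.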

Getting both File input AND STDIN from ARGF?

I am using the Shoes library to run a piece of Ruby code and have discovered that it treats the Ruby code it's running as file input, and thus does not let me read STDIN anymore (since ARGF allows file input OR STDIN, but apparently not both).
Is there any way to override this? I'm told Perl, for example, allows you to read from STDIN once the IO buffer is empty.
Edit:
I have had some success with the "-" special filename character, which apparently is a signal to switch to STDIN on the command line.
Previous Form of Question: Is Shoes ARGF Broken?
Using general Ruby, I can read either files or Standard In with ARGF. With Shoes, I am only able to read files. Anything from standard in just gets ignored. Is it eating standard in, or is there another way to access it?
Example code: either standalone in a Ruby file, or inside a Shoes app in Shoes.
# ruby testargf.rb aus.txt is the same as ruby testargf.rb < aus.txt,
# but not in Shoes: Shoes only prints with the first input, not the second.
ARGF.each do |line| # readlines.each has the same result
  puts line
end
Or in Shoes:
# shoes testargfshoes.rb aus.txt should be the same as shoes testargfshoes.rb < aus.txt, but isn't.
Shoes.app(title: "File I/O test", width: 800, height: 650) do
  ARGF.each do |line| # readlines.each has the same result
    puts line
    para line
  end
end
In retrospect, I do also see a further difference between Shoes and Ruby: Shoes ALSO prints out the source code of the program I am running, along with any files I pass along. If I try to input a file to standard in, ONLY the source code is printed.
I imagine this means that the shoes app is taking my program as an input, and then not sanitizing (or whatever the correct word would be) the input when it passes it along to my code. This seems to strengthen my "Shoes eats Standard In" hypothesis, since it is clearly USING standard In for something. I guess it can take two files in a row, but not one file and THEN a reference to standard in.
I can confirm that Ruby without Shoes provides identical behavior if I mix file input and STDIN with:
ruby testargf.rb aus_simple.txt < testargf.rb
I have had some success with the "-" special filename character, which apparently is a signal to switch to STDIN on the command line.
Example of use:
shoes testargfshoes.rb - <aus_simple.txt
Don't pass "-" without also supplying standard input; otherwise the program hangs waiting for input.
Found the answer here: https://robots.thoughtbot.com/rubys-argf
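A minimal sketch (not taken from the linked article) of what the "-" handling looks like from inside a script: ARGF.filename reports "-" while ARGF is reading standard input, so tagging each line with it shows which source the line came from. Whether Shoes passes its own arguments through to ARGV is not something this sketch addresses.

```ruby
# Collect [source, line] pairs from every input ARGF is given, in order.
# ARGF.filename is the file currently being read, or "-" for standard input,
# so `ruby testargf.rb aus.txt - < other.txt` reads aus.txt first, then stdin.
def lines_with_source(argf = ARGF)
  result = []
  argf.each_line { |line| result << [argf.filename, line] }
  result
end

# Example use in a script:
#   lines_with_source.each { |source, line| puts "#{source}: #{line}" }
```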

How to syntax-highlight function arguments inside function in Sublime?

I would like to highlight the arguments of a Ruby function in Sublime, when they are used inside the function. Like so:
def my_func(arg1, arg2 = nil)
  puts arg1 # should be highlighted
  puts arg2 # should be highlighted
end
I've been experimenting with Sublime's plist syntax highlighting format for a while (the same as TextMate's), but I'm having trouble figuring out how to capture one group (the args on the def line) and use them to match more expressions in another group (the whole method).
I have seen \1 and \2 being used in EndCapture groups before, which gives me hope that this is possible, for example by using \1 in a match group. But I just can't seem to get it to work. Anybody have any ideas?
(too long for comment)
If writing regexes in XML/PLIST is driving you batty, try installing the PackageDev plugin via Package Control. There is an option to convert PLIST .tmLanguage syntax files to YAML, and when you're done editing you can convert it back to PLIST. This way, you don't have to mess around with trying to get all the <dict><array><whatever> tags correct in the .tmLanguage file, and you can focus on the regexes, capturing groups, etc. It also uses the Oniguruma syntax, which I assume you're at least somewhat familiar with if you're a Rubyist. I maintain an improved syntax for Python, and my work has been so much easier since I started using the .YAML-tmlanguage format.
Good luck!

Best way to read output of shell command

In Vim, what is the best (portable and fast) way to read the output of a shell command? This output may be binary and thus contain NULs, and whether or not it has a trailing newline matters. Current solutions I see:
Use system(). Problems: does not work with NULs.
Use :read !. Problems: won’t save trailing newline, tries to be smart detecting output format (dos/unix/mac).
Use ! with redirection to temporary file, then readfile(, "b") to read it. Problems: two calls for fs, shellredir option also redirects stderr by default and it should be less portable ('shellredir' is mentioned here because it is likely to be set to a valid value).
Use system() and filter outputs through xxd. Problems: very slow, least portable (no equivalent of 'shellredir' for pipes).
Any other ideas?
You are using a text editor. If you care about NULs, trailing EOLs and (possibly) conflicting encodings, you need to use a hex editor anyway?
If I need this amount of control of my operations, I use the xxd route indeed, with
:se binary
One nice option you seem to have missed is inserting through the expression register in Insert mode:
<C-r>=system('ls -l')<Enter>
This may or may not be smarter/less intrusive about character encoding business, but you could try it if it is important enough for you.
Or you could use Vim's Perl or Python support to effectively use popen.
Rough idea:
:perl open(my $fh, "ls /tmp/ |"); chomp(my @lines = <$fh>); close($fh); $curbuf->Append(0, @lines)
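Along the same lines as the Perl suggestion, Vim's Ruby interface (when Vim is built with +ruby) would allow the same popen approach. A minimal standalone sketch of just the capturing part, using IO.popen in binary mode so NUL bytes and the presence or absence of a trailing newline survive; getting the bytes into a buffer is a separate step not shown here.

```ruby
# Run a command (without a shell, since argv is passed as an array) and
# return its stdout byte-for-byte. "rb" opens the pipe in binary mode,
# so there is no newline or encoding conversion.
def command_output(*cmd)
  IO.popen(cmd, "rb", &:read)
end

p command_output("printf", 'a\0b')  # the NUL byte is kept in the result
```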
