Open a file with user input in Ruby

I take a string variable from a user like this:
mail = gets
and I want to use this variable to open a file.
file = File.new(mail, "r") ##obviously this isn't working
How do I actually use this mail variable to open a file of that name?
Thanks

mail = gets.chomp
gets returns the string with a trailing \n, so File.new looks for a file whose name ends in a newline. chomp removes it.
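A minimal sketch of the whole flow, assuming the user types the name of a file that may or may not exist. The helper names read_filename and read_named_file are made up for illustration:

```ruby
# Hypothetical helper: read one line from an IO and drop the trailing newline.
# Returns nil at end of input.
def read_filename(input = $stdin)
  line = input.gets
  line && line.chomp
end

# Hypothetical helper: return the file's contents, or nil if no such file exists.
def read_named_file(name)
  return nil if name.nil? || !File.file?(name)
  File.read(name)
end

# Typical use from a script:
#   mail = read_filename
#   puts read_named_file(mail) || "No such file: #{mail}"
```

Checking File.file? first avoids an Errno::ENOENT exception when the name is mistyped.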

I prefer mail = gets.strip.
strip seems to be slightly slower than chomp but I find it to be a little bit more readable.
If you're curious about the benchmark, check out the gist here.
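The two methods also differ in what they remove: chomp drops only a trailing line ending, while strip drops leading and trailing whitespace as well. A small sketch (the cleaned helper is hypothetical):

```ruby
# Hypothetical helper showing what each method leaves behind for the same input.
def cleaned(raw)
  { chomp: raw.chomp, strip: raw.strip }
end

result = cleaned("  notes.txt \n")
puts result[:chomp].inspect  # surrounding spaces are kept
puts result[:strip].inspect  # surrounding spaces are removed
```

If a file name could legitimately start or end with a space, chomp is the safer choice.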

Related

Getting both File input AND STDIN from ARGF?

I am using the shoes library to run a piece of ruby code and have discovered that it treats the ruby code it's running as File Input, and thus does not allow me to get STDIN anymore (since ARGF allows File Input OR STDIN but apparently not both).
Is there any way to override this? I'm told Perl, for example, allows you to read from STDIN once the IO buffer is empty.
Edit:
I have had some success with the "-" special filename character, which apparently is a signal to switch to STDIN on the command line.
Previous Form of Question: Is Shoes ARGF Broken?
Using general Ruby, I can read either files or Standard In with ARGF. With Shoes, I am only able to read files. Anything from standard in just gets ignored. Is it eating standard in, or is there another way to access it?
Example code lines: Either stand alone in a ruby file, or inside a Shoes app in shoes.
#ruby testargf.rb aus.txt is the same as ruby testargf.rb<aus.txt
#but isn't in shoes. shoes only prints with the first input, not the second
ARGF.each do |line| #readLine.each has same result
puts line
end
Or in Shoes:
#shoes testargfshoes.rb aus.txt should be the same as <aus.txt but isn't.
Shoes.app(title: "File I/0 test",width:800,height:650) do
ARGF.each do |line| #readLine.each has same result
puts line
para line
end
end
In retrospect, I do also see a further difference between Shoes and Ruby: Shoes ALSO prints out the source code of the program I am running, along with any files I pass along. If I try to input a file to standard in, ONLY the source code is printed.
I imagine this means that the shoes app is taking my program as an input, and then not sanitizing (or whatever the correct word would be) the input when it passes it along to my code. This seems to strengthen my "Shoes eats Standard In" hypothesis, since it is clearly USING standard In for something. I guess it can take two files in a row, but not one file and THEN a reference to standard in.
I can confirm that Ruby without Shoes provides identical behavior if I mix file input and STDIN with:
ruby testargf.rb aus_simple.txt < testargf.rb
I have had some success with the "-" special filename character, which apparently is a signal to switch to STDIN on the command line.
Example of use:
shoes testargfshoes.rb - <aus_simple.txt
Don't pass "-" without also supplying something on standard input, or the program hangs waiting for it.
Found the answer here: https://robots.thoughtbot.com/rubys-argf
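To make the "-" convention concrete, here is a hand-rolled approximation of how the arguments are consumed. It is not ARGF itself, and the name collect_input_lines is an assumption for illustration:

```ruby
# Approximates ARGF's argument handling: each argument is read as a file,
# except "-", which means "read standard input at this point".
def collect_input_lines(args, stdin = $stdin)
  args.flat_map do |arg|
    arg == "-" ? stdin.each_line.to_a : File.foreach(arg).to_a
  end
end

# Typical use from a script:
#   collect_input_lines(ARGV).each { |line| puts line }
```

Called with ["aus_simple.txt", "-"], this returns the file's lines followed by whatever arrives on standard input.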

Ruby sub not applying in the output file using IO.write

I can't get the same result in the output txt file that I get when applying the sub method to a single string. When I apply the following code to a single string, I get the \n before the capital letter in the middle of the string:
line3= "We were winning The Home Secretary played a
important role."
line3.sub(/(?<!^) *+(?=[A-Z])/, "\n")
=> "We were winning\nThe Home Secretary played a\n important role."
But if I apply the following code the txt file that I get doesn't have any \n before the capital letter.
old= File.readlines("Modificado word.txt")
second= old.join
third= second.sub(/(?<!^) *+(?=[A-Z])/, "\n")
new= IO.write("new.txt", third)
I've tried multiple ways of encoding (surely in the wrong way) because I thought the issue might be there, but none of them worked. I tried gsub as well, and that didn't work either.
OK, I've got the solution. I don't know why, but the txt file is in an encoding that readlines is not able to read properly, so I copied all the content into another txt file created from scratch and it worked :)
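If copying the text into a new file is not an option, the file can be transcoded while reading. The sketch below assumes the source file is UTF-16LE, which is only a guess since the post does not say what the encoding was. The name read_as_utf8 is hypothetical:

```ruby
# Assumption: the file is UTF-16LE. Pass a different encoding if that is wrong.
def read_as_utf8(path, source_encoding = Encoding::UTF_16LE)
  raw = File.binread(path)                    # raw bytes, no decoding attempted
  text = raw.force_encoding(source_encoding)  # label the bytes with their real encoding
  # Transcode to UTF-8 and drop a leading byte order mark if there is one.
  text.encode(Encoding::UTF_8).delete_prefix("\uFEFF")
end

# Typical use:
#   third = read_as_utf8("Modificado word.txt").sub(/ +(?=[A-Z])/, "\n")
#   IO.write("new.txt", third)
```

encode raises an exception if the bytes are not valid in the assumed encoding, which is a useful signal that the guess was wrong.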

Cucumber Ruby read Word Doc or Text

I'm writing tests that will be confirming a lot of text on page. It's for Terms and Conditions pages, Cookies, Privacy Policy etc. Not what I'd like to do but it's a requirement I can't avoid. I've heard that Cucumber can open a text file such as .txt or .doc and compare its contents with the text on screen.
I've tried to find any reference to this but have come up short. Is anyone able to point me in the right direction please?
Thanks
Cucumber/aruba has a Given step that goes like this:
Given a file named "foo" with:
"""
hello world
"""
You would then be able to check that your webpage has that content with:
Then I should see "hello world"
And your step:
Then /^I should see "([^"]*)"$/ do |text|
  page.should have_content(text)
end
I continued to play and found this works for a .txt file:
File.foreach(doc) do |line|
line = line.strip
within_window(->{ page.title == 'Cookies' }) do
page_content = find('#main').text
page_content.gsub!('‘',"'")
page_content.gsub!('’',"'")
expect(page_content).to have_text line
end
end
end
doc is my filename variable. I used strip to remove the newline characters created when pasting into the txt file. The cookies open up in a new tab so I navigated to that. I had invalid UTF-8 characters so just used gsub on those.
I'm sure there's a much better way to do this but this works for now
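The comparison itself can be separated from Capybara, which makes it easier to see which lines are missing. In this sketch page_text stands in for something like find('#main').text, the helper names are hypothetical, and quotes are normalized on both sides rather than only in the page content:

```ruby
# Hypothetical helper: replace typographic single quotes with plain apostrophes.
def normalize_quotes(text)
  text.tr("‘’", "'")
end

# Returns the non-blank lines of the file at `path` that do not appear in
# `page_text`. An empty array means every line was found.
def missing_lines(path, page_text)
  haystack = normalize_quotes(page_text)
  File.readlines(path, encoding: "UTF-8")
      .map { |line| normalize_quotes(line.strip) }
      .reject { |line| line.empty? || haystack.include?(line) }
end
```

Inside a step this could be used as expect(missing_lines(doc, page_content)).to be_empty, so a failure lists the missing lines instead of stopping at the first one.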
I'd recommend using approval tests. A person approves the original, then the automated test verifies no change. If the license text changes, the test fails. If the text is supposed to pass, then the tester approves the new text, then it's automated thereafter.
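A bare-bones version of that idea, without any approval-testing library. The ".received" file naming and the approved? helper are assumptions for illustration:

```ruby
# Returns true when `received` matches the approved file byte for byte.
# Otherwise writes the text to "<approved_path>.received" for a person to
# review, and returns false.
def approved?(approved_path, received)
  if File.file?(approved_path) && File.binread(approved_path) == received.b
    return true
  end
  File.binwrite("#{approved_path}.received", received)
  false
end
```

When the text changes on purpose, the reviewer replaces the approved file with the received one and the test passes again.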

Ruby REGEX parser

Can someone have a look at the below code and tell me whether this is truly the correct way to go about parsing text after the ":" sign.
require 'yaml'
the_file = ARGV[0]
f = File.open(the_file)
content = f.read
r = Regexp.new(/((?=:).+)/)
emails = content.scan(r).uniq
puts YAML.dump(emails)
This script parses email addresses from text files to clean out junk. TEXT:email_address.
I'm trying to make my scripts a bit more efficient. All my ruby/regex scripts look the same, only with different regex patterns. I wrote them in Ruby by cutting and pasting here and there, and because I have Ruby on the majority of my servers, it's easier to run any script anywhere.
Any help would be appreciated.
If you truly just want the text after the first :, I would not use a regex. I would use String#split:
lines = File.readlines(the_file)
emails = lines.map { |line| line.split(':', 2).last }.uniq
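One detail to watch: File.readlines keeps the trailing "\n" on each line, so the same address can appear with and without a newline and survive uniq. A sketch that chomps first; the name fields_after_colon is hypothetical, and skipping lines without a colon is an assumption about what is wanted:

```ruby
# Returns the text after the first ":" on each line, without duplicates.
# Lines that contain no ":" are skipped.
def fields_after_colon(lines)
  lines.map(&:chomp)
       .select { |line| line.include?(":") }
       .map { |line| line.split(":", 2).last }
       .uniq
end

# Typical use:
#   puts fields_after_colon(File.readlines(ARGV[0]))
```

The limit of 2 passed to split keeps any later colons inside the second field.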
If you only want valid emails, I would just search for a regexp that captures emails:
email_regexp = /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}/i
puts YAML.dump(
File.read(ARGV[0]).scan(email_regexp)
)
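A quick way to try the scan approach on sample text. This sketch writes the pattern with @ as the separator and a case-insensitive flag, both of which are assumptions about the intent. The names EMAIL_PATTERN and extract_emails are hypothetical:

```ruby
# Simple, permissive email pattern; it is not a full validator.
EMAIL_PATTERN = /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}/i

# Returns every distinct substring of `text` that looks like an email address.
def extract_emails(text)
  text.scan(EMAIL_PATTERN).uniq
end
```

Because of the {2,6} limit, an address whose top-level domain is longer than six letters is cut short rather than matched in full, so the bound may need raising.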
If you know the colon is the left delimiter before the email, and a close paren on the right, then you can just use
:(.+[^)])
as your regex to extract whatever is in between. There are some very specific email-matching regexen out there though, which may be more appropriate (for when the source text is less 'regular')

Detect encoding

I'm getting some string data from the web, and I suspect that it's not always what it says it is. I don't know where the problem is, and I just don't care any more. From day one on this project I've been fighting Ruby string encoding. I really want some way to say: "Here's a string. What is it?", and then use that data to get it to UTF-8 so that it doesn't explode gsub() 2,000 lines down in the depths of my app. I've checked out rchardet, but even though it supposedly works for 1.9 now, it just blows up given any input with multiple bytes... which is not helpful.
You can't really detect the encoding. You can only assume it.
For most Western-language applications, the following construct will work. The traditional encoding is usually "ISO-8859-1"; the newer, preferred encoding is UTF-8. Why not simply try to encode it as UTF-8 and fall back to the old encoding?
def detect_encoding( str )
begin
str.encode("UTF-8")
"UTF-8"
rescue
"ISO-8859-1"
end
end
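One caveat with this approach: calling encode("UTF-8") on a string that is already labelled UTF-8 does not check its bytes, so it will not raise for invalid input. Checking valid_encoding? is more direct. A sketch of the same fallback idea, with hypothetical helper names:

```ruby
# Guess between UTF-8 and ISO-8859-1 by checking whether the bytes form
# valid UTF-8. This is a heuristic, not real detection.
def guess_encoding(bytes)
  candidate = bytes.dup.force_encoding(Encoding::UTF_8)
  candidate.valid_encoding? ? Encoding::UTF_8 : Encoding::ISO_8859_1
end

# Label the bytes with the guessed encoding, then transcode to UTF-8.
def to_utf8(bytes)
  bytes.dup.force_encoding(guess_encoding(bytes)).encode(Encoding::UTF_8)
end
```

Every byte sequence is valid ISO-8859-1, so the fallback never fails, but it can still be wrong for data that was really in some other legacy encoding.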
It is impossible to tell from a string what encoding it is in. You always need some additional metadata that tells you what the string's encoding is.
If you get the string from the web, that metadata is in the HTTP headers. If the HTTP headers are wrong, there is absolutely nothing that you or Ruby or anyone else can do. You need to file a bug with the webmaster of the site where you got the string from and wait till he fixes it. If you have a Service Level Agreement with the website, file a bug, wait a week, then sue them.
Old question, but chardet works on 1.9: http://rubygems.org/gems/chardet
Why not try https://github.com/brianmario/charlock_holmes to detect the most likely encoding, and then use it to convert to UTF-8 as well?
require 'charlock_holmes'
class EncodeParser
  def initialize(text)
    @text = text
  end

  def detected_encoding
    CharlockHolmes::EncodingDetector.detect(@text)[:encoding]
  end

  def convert_to_utf8
    CharlockHolmes::Converter.convert(@text, detected_encoding, "UTF-8")
  end
end
Then just use EncodeParser.new(text).detected_encoding or EncodeParser.new(text).convert_to_utf8
We had a good experience with ensure_encoding. We use it to convert resource files of unknown encoding to UTF-8.
The README will give you some hints which options would be a good fit for your situation.
I have never tried chardet since ensure_encoding did the job just fine for us.
I covered here how we use ensure_encoding.
Try setting these in your environment.
export LC_ALL=en_US.UTF-8
export LC_CTYPE=en_US.UTF-8
Try passing -EBINARY or -EASCII-8BIT on the ruby command line.
Try adding -Ku or -Kn to your ruby command line.
Could you paste the error message?
Also try this: http://github.com/candlerb/string19/blob/master/string19.rb
Might try reading this: http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/
I know it's an old question, but in modern versions of Ruby it's as simple as str.encoding. You get a return value something like this: #<Encoding:UTF-8>
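Note that str.encoding reports the encoding the string is labelled with, not one worked out from its bytes, so it is worth pairing with valid_encoding?. A small sketch (encoding_report is a hypothetical helper):

```ruby
# Returns the encoding a string is labelled with, plus whether its bytes are
# actually valid in that encoding.
def encoding_report(str)
  [str.encoding, str.valid_encoding?]
end

# Bytes that are really ISO-8859-1 but carry a UTF-8 label:
mislabelled = "caf\xE9".b.force_encoding(Encoding::UTF_8)
p encoding_report(mislabelled)
```

A false in the second position means the label cannot be trusted and the data needs to be re-labelled or transcoded before use.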
