Ruby REGEX parser - ruby

Can someone have a look at the code below and tell me whether this is truly the correct way to go about parsing the text after the ":" sign?
require 'yaml'
the_file = ARGV[0]
f = File.open(the_file)
content = f.read
r = Regexp.new(/((?=:).+)/)
emails = content.scan(r).uniq
puts YAML.dump(emails)
This script parses email addresses from text files to clean out junk. Each line has the form TEXT:email_address.
I'm trying to make my scripts a bit more efficient. All my ruby/regex scripts look the same, only with different regex patterns. I wrote them in ruby by cutting and pasting here and there, and because I have ruby on the majority of my servers, it's easier to run any script anywhere.
Any help would be appreciated.

If you truly just want the text after the first :, I would not use a Regex. I would use String#split (with chomp, so the trailing newline from readlines is not kept):
lines = File.readlines(the_file)
emails = lines.map { |line| line.chomp.split(':', 2).last }.uniq
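As a minimal sketch of how the limit argument to split behaves, here is the same idea run on a few made-up lines (the sample data and the variable names are assumptions for illustration, not from the question). Lines without a colon are skipped, and only the first colon is treated as the delimiter:

```ruby
# Hypothetical sample lines in the TEXT:email_address format.
lines = [
  "junk:alice@example.com\n",
  "note: a:b@example.com\n",
  "no delimiter here\n"
]

after_colon = lines
  .map(&:chomp)                                # drop the trailing newline
  .select { |line| line.include?(':') }        # ignore lines with no delimiter
  .map { |line| line.split(':', 2).last.strip } # limit 2 keeps later colons intact

puts after_colon
```

With a limit of 2, the second sample line yields "a:b@example.com" rather than being cut at its second colon.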

If you only want valid emails, I would just search for a regexp that captures emails:
require 'yaml'
email_regexp = /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}/i
puts YAML.dump(
  File.read(ARGV[0]).scan(email_regexp)
)
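To see what scan returns with such a pattern, here is a small sketch that runs on an in-memory string instead of a file. The sample text and variable names are made up for illustration, and the pattern is a simplified one that will not match every valid address:

```ruby
# Hypothetical input in the TEXT:email_address format from the question.
sample = "junk:alice@example.com\nmore junk:bob@example.org\ndup:alice@example.com\n"

# Simplified, case-insensitive email pattern (an assumption, not a full validator).
email_pattern = /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}/i

# With no capture groups, scan returns an array of the matched strings.
found = sample.scan(email_pattern).uniq

puts found
```

Because the pattern has no capture groups, there is no need to unwrap nested arrays before calling uniq.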

If you know the colon is the left delimiter before the email, and a close paren on the right, then you can just use
:(.+[^)])
as your regex to extract whatever is in between. There are some very specific email-matching regexen out there though, which may be more appropriate (for when the source text is less 'regular')
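A possible variation on that idea, sketched under the assumption that the address never contains a close paren: using a negated character class stops the capture at the first ")" instead of relying on greedy matching. The sample line is hypothetical.

```ruby
# Hypothetical line: a colon on the left of the address, a close paren on the right.
line = "(contact:alice@example.com) trailing text (more)"

# [^)]+ cannot run past the first close paren, so later parens are ignored.
match = line.match(/:([^)]+)\)/)
address = match && match[1]

puts address
```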

Related

Remove &thinsp; from Ruby String

I am trying to parse some data and am having trouble removing a &thinsp; symbol. I know that this is just a "space", but I really can't manage to strip it from the string.
My code:
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
page = agent.get('my_page.hmtl')
price = page.search('#product_buy .price').text.to_s.gsub(/\s+/, "").gsub(" ","").gsub(" ", "")
puts price
As a result I always get "4 162", with the space still in it. I don't know what to do.
Has anyone run into this issue before? Any help is appreciated. Thank you.
HTML escape codes don't mean anything to Ruby's regex engine. Looking for "&thinsp;" will look for those literal characters, not a thin space. Instead, Ruby 1.9 and later support Unicode escapes in strings, meaning that you can use the Unicode code point corresponding to a thin space to make your substitution. The Unicode code point for a thin space is 0x2009, meaning that you can reference it in a double-quoted Ruby string as \u2009.
Additionally, instead of calling some_string.gsub('some_string', ''), you can just call some_string.delete('some_string').
Note that this isn't appropriate for all situations, because delete removes every occurrence of every character in its argument (or, given several arguments, in their intersection), while gsub removes only segments matching the pattern provided. For example, 'hellohi'.gsub('hello', '') == 'hi', while 'hellohi'.delete('hello') == 'i'.
In your specific case, I'd use something like:
price = page.search('#product_buy .price').text.delete("\u2009\s")
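Here is a small sketch of the difference, using a made-up price string in place of the scraped page (the variable names and sample value are assumptions). It shows why gsub(/\s+/, "") leaves the thin space in place, and two ways to remove it:

```ruby
# Hypothetical scraped text: "4", a thin space (U+2009), "162", and a trailing space.
raw_price = "4\u2009162 "

# \s only covers ASCII whitespace, so the thin space survives.
ascii_only = raw_price.gsub(/\s+/, "")

# delete removes every listed character: here the thin space and a plain space.
deleted = raw_price.delete("\u2009 ")

# [[:space:]] is Unicode-aware in Ruby and also matches the thin space.
posix = raw_price.gsub(/[[:space:]]+/, "")

puts deleted
puts posix
```

The escape must be in a double-quoted string; inside single quotes, \u2009 is just a backslash followed by ordinary characters.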

Is there an addslashes equivalent in Ruby?

Using Ruby how would I be able to automatically escape single and double quotes in some of the variables being written to the output file. Coming from PHP I'm looking for an addslashes type function, but there doesn't seem to be a simple solution for this in Ruby.
require "csv"
def generate_array( file )
  File.open("#{file}" + "_output.txt", 'w') do |output|
    CSV.foreach(file) do |img, _, part, focus, country, loc, lat, lon, desc, link|
      output.puts("[#{lat}, #{lon}, '#{img.downcase}', '#{part}', '#{loc}', '#{focus}', '#{country}', '#{desc}', '#{link}'],")
    end
  end
end
ARGV.each do |file|
  generate_array(file)
end
I suppose you can emulate PHP addslashes functionality with this Ruby construct:
.gsub(/['"\\\x0]/,'\\\\\0')
For example:
slashed_line = %q{Here's a heavily \s\l\a\s\h\e\d "string"}
puts slashed_line.gsub(/['"\\\x0]/,'\\\\\0')
# Here\'s a heavily \\s\\l\\a\\s\\h\\e\\d \"string\"
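If the layered backslashes in the replacement string are hard to read, the same idea can be sketched with the block form of gsub. The helper name add_slashes is hypothetical (Ruby has no built-in by that name), and the special case for NUL reflects my understanding that PHP writes it as a backslash followed by "0":

```ruby
# Hypothetical helper; not part of Ruby's standard library.
def add_slashes(text)
  # Match single quote, double quote, backslash, or NUL, and prefix a backslash.
  text.gsub(/['"\\\x00]/) do |char|
    char == "\x00" ? "\\0" : "\\#{char}"
  end
end

escaped = add_slashes(%q{It's a "test" with a \ in it})
puts escaped
```

The block receives each matched character, so no backreference syntax is needed in the replacement.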
There is also String#dump:
slashed_line = %q{Here's a heavily \s\l\a\s\h\e\d "string"}
puts slashed_line.dump
#=> "Here's a heavily \\s\\l\\a\\s\\h\\e\\d \"string\""
I don't know Ruby, but I know that in PHP addslashes is pretty much deprecated.
Every time that you need to escape data, it requires a different escape routine. HTML needs different encoding and handling over database work, and each database has its own special rules.
I assume from your question that you are looking to output things to a CSV file. That, again, opens up a whole kettle of fish, as there is no single standard CSV. You'll need to do some research on both what will be producing the data (and whether it will be strict ASCII, Unicode, or something else) and which format of quote escaping will be needed. Most CSV consumers expect a double quote inside a quoted field to be written as two double quotes. If you need " in your string, you write "".
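If CSV really is the target format, one option is to let Ruby's csv library do the quoting. This is a sketch with made-up field values; it shows the doubled-quote convention described above and that the row survives a round trip:

```ruby
require "csv"

# Hypothetical row containing a double quote, a single quote, and a comma.
row = [51.5, -0.12, %q{He said "hi"}, "O'Brien, Ltd"]

# generate_line quotes fields only where needed and doubles embedded quotes.
csv_line = CSV.generate_line(row)
puts csv_line

# Parsing the line back gives the original values (as strings).
parsed = CSV.parse_line(csv_line)
```

Single quotes need no escaping in this format; only the comma and the double quote force a field to be quoted.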

What does <<DESC mean in ruby?

I am learning Ruby, and the book I am using has example code like this:
#...
restaurant = Restaurant.new
restaurant.name = "Mediterrano"
restaurant.description = <<DESC
One of the best Italian restaurants in the Kings Cross area,
Mediterraneo will never leave you disappointed
DESC
#...
Can someone explain to me what <<DESC means in the above example? How does it differ from an ordinary double-quoted string?
It is used to create multiline strings. Basically, '<<DESC' tells Ruby to treat everything that follows, up to the next line containing only 'DESC', as one string. The name 'DESC' is not special; any identifier can be used, as long as the opening and closing markers match.
a = <<STRING
Here
is
a
multiline
string
STRING
The << operator is followed by an identifier that marks the end of the document. The end mark is called the terminator. The lines of text prior to the terminator are joined together, including the newlines and any other whitespace.
http://en.wikibooks.org/wiki/Ruby_Programming/Here_documents
It allows the creation of multi-line string constants in a readable way. See http://en.wikibooks.org/wiki/Ruby_Programming/Here_documents.
It is called a heredoc, or here document. It allows you to write multiline strings. You can test it in your terminal!
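As a short sketch of how a heredoc compares with a double-quoted string, the following shows three common forms. The variable names and text are made up, and the squiggly form assumes Ruby 2.3 or later:

```ruby
name = "Mediterraneo"

# Plain heredoc: interpolates just like a double-quoted string.
plain = <<DESC
One of the best restaurants,
#{name} will never leave you disappointed
DESC

# Quoting the terminator with single quotes turns interpolation off.
raw = <<'DESC'
#{name} stays literal
DESC

# Squiggly heredoc strips the common leading indentation.
indented = <<~DESC
  Here
    is
DESC

puts plain
```

In every form the resulting string ends with a newline, because the line break before the terminator is part of the text.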

Convert Ruby string to *nix filename-compatible string

In Ruby I have an arbitrary string, and I'd like to convert it to something that is a valid Unix/Linux filename. It doesn't matter what it looks like in its final form, as long as it is visually recognizable as the string it started as. Some possible examples:
"Here's my string!" => "Heres_my_string"
"* is an asterisk, you see" => "is_an_asterisk_you_see"
Is there anything built-in (maybe in the file libraries) that will accomplish this (or close to this)?
By your specifications, you could accomplish this with a regex replacement. This regex will match all characters other than word characters (letters, digits, underscores), whitespace, and hyphens:
s/[^\w\s_-]+//g
This will remove any extra whitespace in between words, as shown in your examples:
s/(^|\b\s)\s+($|\s?\b)/\\1\\2/g
And lastly, replace the remaining spaces with underscores:
s/\s+/_/g
Here it is in Ruby:
def friendly_filename(filename)
  filename.gsub(/[^\w\s_-]+/, '')
          .gsub(/(^|\b\s)\s+($|\s?\b)/, '\\1\\2')
          .gsub(/\s+/, '_')
end
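A somewhat simpler variation, offered only as a sketch: strip the unwanted characters, trim the ends, then collapse each run of whitespace into one underscore. The method name simple_filename is hypothetical, and this is not the same regex sequence as above, though it gives the same results for the two examples in the question:

```ruby
# Hypothetical alternative to friendly_filename.
def simple_filename(name)
  name.gsub(/[^\w\s-]+/, '') # drop everything except word chars, whitespace, hyphens
      .strip                 # remove leading and trailing whitespace
      .gsub(/\s+/, '_')      # collapse each whitespace run into one underscore
end

puts simple_filename("Here's my string!")
puts simple_filename("* is an asterisk, you see")
```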
I realize the question asks about plain Ruby, and that the goal (a *nix-compatible filename) is not quite the same, but if you are using Rails, there is a method called parameterize that should help.
In rails console:
"Here's my string!".parameterize => "here-s-my-string"
"* is an asterisk, you see".parameterize => "is-an-asterisk-you-see"
I think that parameterize, since it produces URL-safe strings, should work for filenames as well :)
You can read more about it here:
http://api.rubyonrails.org/classes/ActiveSupport/Inflector.html#method-i-parameterize
There are also a whole lot of other helpful methods there.

how to convert strings like "this is an example" to "this-is-an-example" under ruby

How do I convert strings like "this is an example" to "this-is-an-example" under ruby?
The simplest version:
"this is an example".tr(" ", "-")
#=> "this-is-an-example"
You could also do something like this, which is slightly more robust and easier to extend by updating the regular expression:
"this is an example".gsub(/\s+/, "-")
#=> "this-is-an-example"
The above will replace each chunk of white space (any combination of multiple spaces, tabs, newlines) with a single dash.
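A quick sketch of where the two versions differ, using a made-up input with a double space and a tab:

```ruby
messy = "this  is\tan example"

# tr maps each space to a dash one-for-one and leaves the tab alone.
with_tr = messy.tr(" ", "-")

# gsub with \s+ collapses every whitespace run, tabs included, into one dash.
with_gsub = messy.gsub(/\s+/, "-")

puts with_tr.inspect
puts with_gsub.inspect
```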
See the String class reference for more details about the methods that can be used to manipulate strings in Ruby.
If you are trying to generate a string that can be used in a URL, you should also consider stripping other non-alphanumeric characters (especially the ones that have special meaning in URLs), or replacing them with an alphanumeric equivalent (for example, as suggested by Rob Cameron in his answer).
If you are trying to make something that is a good URL slug, there are lots of ways to do it.
Generally, you want to remove everything that is not a letter or number, and then replace all whitespace characters with dashes.
So:
s = "this is an 'example'"
s = s.gsub(/\W+/, ' ').strip
s = s.gsub(/\s+/,'-')
At the end s will equal "this-is-an-example"
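The same steps can be wrapped in a small method. This is only a sketch: the name slugify is hypothetical, and the downcase call is an addition of mine that the snippet above does not include:

```ruby
# Hypothetical helper wrapping the steps above.
def slugify(text)
  text.downcase            # assumption: slugs are usually lowercase
      .gsub(/\W+/, ' ')    # turn runs of non-word characters into a space
      .strip               # drop spaces left at either end
      .gsub(/\s+/, '-')    # join the remaining words with dashes
end

puts slugify("This is an 'Example'!")
```

Note that \W treats the underscore as a word character, so underscores pass through unchanged.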
I used the source code from a ruby testing library called contest to get this particular way to do it.
If you're using Rails, take a look at parameterize(); it does exactly what you're looking for:
http://api.rubyonrails.org/classes/ActiveSupport/CoreExtensions/String/Inflections.html#M001367
foo = "Hello, world!"
foo.parameterize => 'hello-world'
