Using Ruby, how can I automatically escape single and double quotes in some of the variables being written to the output file? Coming from PHP, I'm looking for an addslashes-type function, but there doesn't seem to be a simple solution for this in Ruby.
require "csv"

def generate_array(file)
  File.open("#{file}_output.txt", 'w') do |output|
    CSV.foreach(file) do |img, _, part, focus, country, loc, lat, lon, desc, link|
      output.puts("[#{lat}, #{lon}, '#{img.downcase}', '#{part}', '#{loc}', '#{focus}', '#{country}', '#{desc}', '#{link}'],")
    end
  end
end

ARGV.each do |file|
  generate_array(file)
end
I suppose you can emulate PHP addslashes functionality with this Ruby construct:
.gsub(/['"\\\x0]/,'\\\\\0')
For example:
slashed_line = %q{Here's a heavily \s\l\a\s\h\e\d "string"}
puts slashed_line.gsub(/['"\\\x0]/,'\\\\\0')
# Here\'s a heavily \\s\\l\\a\\s\\h\\e\\d \"string\"
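Wrapped in a helper, the same substitution can be applied to each CSV field before it is interpolated into the single-quoted output. The method name addslashes is hypothetical (borrowed from PHP), and unlike PHP, this sketch leaves a NUL byte as a raw byte after the backslash rather than writing the character 0:

```ruby
# Hypothetical helper mimicking PHP's addslashes: put a backslash before
# ', ", \ and NUL. nil (an empty CSV field) becomes "".
def addslashes(value)
  value.to_s.gsub(/['"\\\x00]/, '\\\\\0')
end

desc = %q{O'Brien's "shop"}
puts "'#{addslashes(desc)}'"
# 'O\'Brien\'s \"shop\"'
```

In the question's loop, '#{desc}' would become '#{addslashes(desc)}', and likewise for the other quoted fields.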
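There is also String#dump, which returns a double-quoted Ruby literal: it escapes double quotes, backslashes, and non-printable characters, but leaves single quotes alone.
slashed_line = %q{Here's a heavily \s\l\a\s\h\e\d "string"}
puts slashed_line.dump
# "Here's a heavily \\s\\l\\a\\s\\h\\e\\d \"string\""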
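Because dump produces a Ruby string literal, it suits double-quoted output. String#undump (available since Ruby 2.5) turns the literal back into the original string, which makes the round trip easy to check:

```ruby
original = %q{Here's a heavily \s\l\a\s\h\e\d "string"}

# dump: a double-quoted literal with " and \ escaped; ' is left as is
dumped = original.dump
puts dumped

# undump reverses dump (Ruby 2.5+)
restored = dumped.undump
puts restored == original
# true
```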
I don't know Ruby, but I know that in PHP addslashes is pretty much deprecated.
Every time you need to escape data, it requires a different escape routine: HTML needs different encoding and handling than database work, and each database has its own special rules.
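As an illustration of choosing the escape routine for the target: the lines the question writes look like JavaScript array literals (an assumption based on their shape), so a JSON encoder, which already knows the quoting rules for that format, could generate them. The sample row below is made up:

```ruby
require 'json'

# Made-up stand-in for one CSV record from the question.
lat, lon, img, desc = "51.5", "-0.12", "IMG_01.JPG", %q{O'Brien's "shop"}

# to_json writes double-quoted strings and escapes " and \ itself,
# so apostrophes need no special handling.
line = [lat.to_f, lon.to_f, img.downcase, desc].to_json + ","
puts line
# [51.5,-0.12,"img_01.jpg","O'Brien's \"shop\""],
```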
I assume from your question that you are writing the output to a CSV file. That, again, opens up a whole other kettle of fish, as there is no single CSV standard. You'll need to research both what will be handling the data (and whether it will be strict ASCII, Unicode, or something else) and which style of quote escaping it expects. Most CSV consumers use two double quotes to represent a single double quote: if you need " in your string, you write "".
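For CSV output specifically, Ruby's standard CSV library already applies the doubled-quote convention: a field that contains a double quote is wrapped in quotes and each inner quote is doubled, while an apostrophe is left alone:

```ruby
require 'csv'

fields = ['plain', 'say "hi"', "it's"]
line = CSV.generate_line(fields)  # includes the trailing newline
puts line
# plain,"say ""hi""",it's
```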
Related
I need to tokenise strings in Ruby - string.split is almost perfect, except some of the strings may be enclosed in double-quotes, and within them, whitespace should be preserved. In the absence of lex for Ruby (correct?), writing a character-by-character tokenizer seems silly. What are my options?
I want a loop that's essentially:
until file.eof?
  line = file.readline
  tokens = line.tokenize # hypothetical: like split(), but handles "some thing" as one token
end
I.e. an array of whitespace-delimited fields, but with correct handling of quoted sequences. Note there is no escape sequence for the quotes I need to handle.
The best I can imagine so far is repeatedly match()ing a regex that matches either the quoted sequence or everything up to the next whitespace character, but even then I'm not sure how to formulate that neatly.
As Andrew said, the most straightforward way is to parse the input with the stock CSV library and set the appropriate :col_sep and :quote_char options.
If you insist on parsing manually, you can use the following pattern in a more Ruby-like way:
file.each_line do |line|
  tokens = line.scan(/"[^"]*"|\S+/) # quoted runs (quotes included) or runs of non-whitespace
  # do whatever with array of tokens
end
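If the quotes themselves should not end up in the tokens, a variant can capture the quoted content in one group and bare words in another and keep whichever matched. The method name tokenize is hypothetical, echoing the question's pseudocode:

```ruby
# Each match is either "..." (group 1, quotes dropped) or a run of
# non-whitespace characters (group 2); exactly one group is non-nil.
def tokenize(line)
  line.scan(/"([^"]*)"|(\S+)/).map { |quoted, bare| quoted || bare }
end

p tokenize(%q{foo "bar baz" qux-1})
# ["foo", "bar baz", "qux-1"]
```

Used with a file, this could look like File.foreach(path) { |line| tokens = tokenize(line) }.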
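split accepts a regex, so you could write a regexp that matches only the separators and call split on the line you just read. Here a run of whitespace counts as a separator only when an even number of double quotes follows it, i.e. when it lies outside a quoted sequence (quoted tokens keep their quotes):
line.split(/\s+(?=(?:[^"]*"[^"]*")*[^"]*\z)/)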
Try using Ruby's CSV library with a space (" ") as the :col_sep option. From the CSV documentation:
:col_sep
The String placed between each field. This String will be transcoded into the data’s Encoding before parsing.
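With that option, a quoted field containing spaces comes back as one token. Two caveats worth checking against real data: consecutive spaces yield nil fields (hence compact below), and a stray quote inside an unquoted field should raise CSV::MalformedCSVError. The standard library's Shellwords module is another option; it also treats single quotes and backslashes specially, which may or may not be what you want:

```ruby
require 'csv'
require 'shellwords'

line = %q{foo  "bar baz" qux}

# CSV: space as the column separator, " as the (default) quote character.
# The double space produces an empty (nil) field, which compact removes.
csv_tokens = CSV.parse_line(line, col_sep: " ").compact
p csv_tokens

# Shellwords: shell-style splitting; runs of whitespace are skipped.
shell_tokens = Shellwords.split(line)
p shell_tokens
```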
Can someone have a look at the below code and tell me whether this is truly the correct way to go about parsing text after the ":" sign.
require 'yaml'
the_file = ARGV[0]
f = File.open(the_file)
content = f.read
r = Regexp.new(/((?=:).+)/)
emails = content.scan(r).uniq
puts YAML.dump(emails)
This script parses email addresses out of text files to clean out junk; each line looks like TEXT:email_address.
I'm trying to make my scripts a bit more efficient. All my Ruby/regex scripts look the same, only with different regex patterns. I wrote them in Ruby by cutting and pasting here and there, and because Ruby is installed on most of my servers, it's easy to run any script anywhere.
Any help would be appreciated.
If you truly just want the text after the first :, I would not use a regex. I would use String#split:
lines = File.readlines(the_file)
emails = lines.map { |line| line.chomp.split(':', 2).last }.uniq
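A quick check of this split-based approach with made-up lines. Calling chomp first matters, because the file's last line often has no trailing newline and would otherwise differ from an identical address earlier in the file:

```ruby
lines = ["junk:alice@example.com\n", "other:bob@example.com\n", "junk:alice@example.com"]

# Keep everything after the first colon, minus the newline, then dedupe.
emails = lines.map { |line| line.chomp.split(':', 2).last }.uniq
p emails
# ["alice@example.com", "bob@example.com"]
```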
If you only want valid emails, I would just search for a regexp that captures emails (the i flag makes the uppercase character classes match lowercase too):
email_regexp = /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}/i
puts YAML.dump(
  File.read(ARGV[0]).scan(email_regexp).uniq
)
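A quick check of an email pattern of that shape on made-up text. The i flag matters here because the character classes only list uppercase letters:

```ruby
email_regexp = /[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,6}/i

text = "junk:alice@example.com\nmore junk:Bob.Smith@mail.example.org\n"
p text.scan(email_regexp)
# ["alice@example.com", "Bob.Smith@mail.example.org"]
```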
If you know the colon is the left delimiter before the email and a close paren is on the right, then you can just use
:([^)]+)\)
as your regex to extract whatever is in between. There are some very specific email-matching regexen out there, though, which may be more appropriate when the source text is less 'regular'.
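A sketch of that extraction with String#[] on made-up text. The pattern here, :([^)]+)\), captures everything after the colon up to the first close paren, so text after the paren is not swallowed:

```ruby
text = "TEXT:alice@example.com) trailing notes"

# String#[](regexp, 1) returns capture group 1, or nil if nothing matches.
email = text[/:([^)]+)\)/, 1]
p email
# "alice@example.com"
```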
In Ruby I have an arbitrary string, and I'd like to convert it to something that is a valid Unix/Linux filename. It doesn't matter what it looks like in its final form, as long as it is visually recognizable as the string it started as. Some possible examples:
"Here's my string!" => "Heres_my_string"
"* is an asterisk, you see" => "is_an_asterisk_you_see"
Is there anything built-in (maybe in the file libraries) that will accomplish this (or close to this)?
By your specifications, you could accomplish this with regex replacements. This one removes every character other than letters, digits, underscores, hyphens, and whitespace:
s/[^\w\s_-]+//g
This one removes leading whitespace and collapses repeated whitespace between words, as shown in your examples:
s/(^|\b\s)\s+($|\s?\b)/\1\2/g
And lastly, replace the remaining spaces with underscores:
s/\s+/_/g
Here it is in Ruby:
def friendly_filename(filename)
  filename.gsub(/[^\w\s_-]+/, '')
          .gsub(/(^|\b\s)\s+($|\s?\b)/, '\\1\\2')
          .gsub(/\s+/, '_')
end
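For the two examples in the question, a shorter variant gives the same results, assuming leading and trailing whitespace should simply be dropped: delete the unwanted characters, strip, then turn each run of whitespace into one underscore. The method name simple_filename is hypothetical:

```ruby
def simple_filename(str)
  name = str.gsub(/[^\w\s-]/, '') # drop all but letters, digits, _, - and whitespace
  name = name.strip               # remove leading/trailing whitespace
  name.gsub(/\s+/, '_')           # each whitespace run becomes one underscore
end

p simple_filename("Here's my string!")          # "Heres_my_string"
p simple_filename("* is an asterisk, you see")  # "is_an_asterisk_you_see"
```

Unlike the three-step version, this also drops a single trailing space instead of turning it into a trailing underscore.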
I know the question asks about plain Ruby, and that this method isn't meant for *nix-compatible filenames, but if you are using Rails, there is a method called parameterize that should help.
In the Rails console:
"Here's my string!".parameterize => "here-s-my-string"
"* is an asterisk, you see".parameterize => "is-an-asterisk-you-see"
Because parameterize produces URL-safe strings, I think its output may work as filenames as well :)
You can read more about it here:
http://api.rubyonrails.org/classes/ActiveSupport/Inflector.html#method-i-parameterize
There are also plenty of other helpful methods there.
I have a bunch of strings with special escape codes that I want to store unescaped. E.g., the interpreter shows
"\\014\"\\000\"\\016smoothing\"\\011mean\"\\022color\"\\011zero#\\016"
but I want it to show (when inspected) as
"\014\"\000\"\016smoothing\"\011mean\"\022color\"\011zero#\016"
What's the method to unescape them? I imagine that I could make a regex to remove 1 backslash from every consecutive n backslashes, but I don't have a lot of regex experience and it seems there ought to be a "more elegant" way to do it.
For example, when I puts MyString it displays the output I'd like, but I don't know how I might capture that into a variable.
Thanks!
Edited to add context: I have this class that is being used to marshal/restore some stuff, but when I restore some old strings it raises a TypeError, which I've determined is because, for some inexplicable reason, they weren't stored as base64. Instead they appear to have just been escaped, which I don't want, because trying to restore them the same way gives the TypeError
TypeError: incompatible marshal file format (can't be read)
format version 4.8 required; 92.48 given
because Marshal looks at the first characters of the string to determine the format.
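The numbers in that message can be reproduced: a Marshal dump starts with the version bytes 4 and 8, while an escaped copy starts with the characters \ and 0, whose byte values are 92 and 48. The escaped sample below is made up:

```ruby
# Marshal output begins with the format version: major 4, minor 8.
p Marshal.dump("anything").bytes.first(2)  # [4, 8]

# A made-up escaped copy begins with a backslash (92) and "0" (48) instead.
escaped = '\\004\\010'
p escaped.bytes.first(2)                   # [92, 48]

begin
  Marshal.load(escaped)
rescue TypeError => e
  puts e.message # the "incompatible marshal file format" error from the question
end
```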
require 'base64'

class MarshaledStuff < ActiveRecord::Base
  validates_presence_of :marshaled_obj

  def contents
    obj = self.marshaled_obj
    return Marshal.restore(Base64.decode64(obj))
  end

  def contents=(newcontents)
    self.marshaled_obj = Base64.encode64(Marshal.dump(newcontents))
  end
end
Edit 2: Changed wording. I was thinking they were "double-escaped", but they were only single-escaped. Whoops!
If your strings give you the correct output when you print them, then they are already escaped correctly. The extra backslashes you see are probably there because you are displaying them in the interactive interpreter, which adds backslashes when it displays variables to make them unambiguous.
> x
=> "\\"
> puts x
\
=> nil
> x.length
=> 1
Note that even though it looks like x contains two backslashes, the length of the string is one. The extra backslash is added by the interpreter and is not really part of the string.
If you still think there's a problem, please be more specific about how you are displaying the strings that you mentioned in your question.
Edit: In your example, the only things that need unescaping are the octal escape codes. You could try this:
x = x.gsub(/\\[0-2][0-7]{2}/){ |c| c[1,3].to_i(8).chr }
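A round-trip sketch of that gsub feeding Marshal. The escaped input is simulated by writing every byte of a fresh dump as a three-digit octal escape, which is an assumption about how the old rows were stored; the [0-2] range in the pattern covers byte values up to \277 (191), enough for this ASCII-only sample:

```ruby
obj = { "color" => "zero", "mean" => 1 }
dumped = Marshal.dump(obj)

# Simulate the old rows: every byte written as \ooo (three octal digits).
escaped = dumped.bytes.map { |b| format('\\%03o', b) }.join

# The gsub from above: turn each \ooo back into the byte it encodes.
unescaped = escaped.gsub(/\\[0-2][0-7]{2}/) { |c| c[1, 3].to_i(8).chr }

restored = Marshal.load(unescaped)
p restored == obj # true
```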
How do I convert strings like "this is an example" to "this-is-an-example" in Ruby?
The simplest version:
"this is an example".tr(" ", "-")
#=> "this-is-an-example"
You could also do something like this, which is slightly more robust and easier to extend by updating the regular expression:
"this is an example".gsub(/\s+/, "-")
#=> "this-is-an-example"
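The difference between the two shows up when the input contains repeated or mixed whitespace. String#tr_s, which squeezes runs of a translated character, is a third option for plain spaces:

```ruby
s = "this  is an\texample"

p s.tr(" ", "-")     # "this--is-an\texample"  (each space replaced; tab kept)
p s.gsub(/\s+/, "-") # "this-is-an-example"    (any whitespace run -> one dash)
p s.tr_s(" ", "-")   # "this-is-an\texample"   (space runs squeezed; tab kept)
```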
The above replaces every chunk of whitespace (any combination of spaces, tabs, and newlines) with a single dash.
See the String class reference for more details about the methods that can be used to manipulate strings in Ruby.
If you are trying to generate a string that can be used in a URL, you should also consider stripping other non-alphanumeric characters (especially the ones that have special meaning in URLs), or replacing them with an alphanumeric equivalent (as Rob Cameron suggests in his answer).
If you are trying to make something that is a good URL slug, there are lots of ways to do it.
Generally, you want to remove everything that is not a letter or number, and then replace all whitespace characters with dashes.
So:
s = "this is an 'example'"
s = s.gsub(/\W+/, ' ').strip
s = s.gsub(/\s+/,'-')
At the end s will equal "this-is-an-example"
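The same two steps fit in a small method. The name slugify and the final downcase are additions of this sketch, not part of the answer above:

```ruby
# Replace runs of non-word characters with spaces, trim the ends, then join
# the remaining words with single dashes. Underscores count as word
# characters and are kept.
def slugify(str)
  str.gsub(/\W+/, ' ').strip.gsub(/\s+/, '-').downcase
end

p slugify("this is an 'example'") # "this-is-an-example"
p slugify("Hello, world!")        # "hello-world"
```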
I took this particular approach from the source code of contest, a Ruby testing library.
If you're using Rails, take a look at parameterize(); it does exactly what you're looking for:
http://api.rubyonrails.org/classes/ActiveSupport/CoreExtensions/String/Inflections.html#M001367
foo = "Hello, world!"
foo.parameterize => 'hello-world'