In Ruby, trying to convert those weird quotes into "regular" quotes

I am trying to parse a text file that has weird quotes like
“ and ”, and convert them into normal " quotes.
I tried this:
text.gsub!("“",'"')
text.gsub!("”",'"')
But when it's done, they are still there and show up as
\x93 and \x94
so I tried replacing those too, with no luck:
text.gsub!('\\x93', '"')
text.gsub!('\\x94', '"')
The problem is, when I try to show those weird quotes on a webpage, it makes that weird diamond with a question mark symbol: �

It seems to work:
irb(main):001:0> text = "“foo”"
=> "\342\200\234foo\342\200\235"
irb(main):002:0> text.gsub!("“",'"')
=> "\"foo\342\200\235"
irb(main):003:0> text.gsub!("”",'"')
=> "\"foo\""
If the same gsub doesn't work on your file, its quotes are probably stored with different character codes; use a hex editor to find out exactly which bytes are involved.
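If you'd rather not reach for a hex editor, Ruby can list the codes itself. A minimal sketch, assuming Ruby 1.9 or later (for String#codepoints); the octal \342\200\234 in the irb output above is simply the UTF-8 byte sequence E2 80 9C for “.

```ruby
text = "\u201Cfoo\u201D" # the same “foo” string, written with Unicode escapes

# Unicode code points, one per character
codepoints = text.codepoints.map { |cp| format("U+%04X", cp) }
p codepoints # => ["U+201C", "U+0066", "U+006F", "U+006F", "U+201D"]

# Raw bytes as stored in the UTF-8 string
bytes = text.bytes.map { |b| format("%02X", b) }
p bytes # => ["E2", "80", "9C", "66", "6F", "6F", "E2", "80", "9D"]
```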

Re: the second question of why the weird quotes show on a web page as the � symbol:
Your problem is that your web page is not being served as UTF-8. To fix that, see
http://www.w3.org/International/O-HTTP-charset
If you can't change your web server's configuration, add a meta charset line to the head section of your web pages: http://www.utf-8.com/
Larry

Your first gsubs should work. The second set doesn't work because you're using single quotes with a double backslash, so '\\x93' is the literal four characters \x93 rather than the byte 0x93. Try it the other way around:
text.gsub!("\x93", '"')
text.gsub!("\x94", '"')
You can also do this in one line. Use the non-bang gsub when chaining, because gsub! returns nil when nothing was replaced:
text = text.gsub("\x93", '"').gsub("\x94", '"')
# or
text.gsub!(/(\x93|\x94)/, '"')
Are you sure the encoding of the string is correct?
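The \x93 and \x94 reported in the question are the Windows-1252 codes for “ and ”, which suggests the file is Windows-1252 rather than UTF-8. A minimal sketch for Ruby 2.0+, assuming that encoding: transcode to UTF-8 first, then replace the proper Unicode characters. straighten_quotes is a hypothetical helper name, not something from the posts.

```ruby
# Assumption: the raw bytes are Windows-1252, where 0x93 is “ and 0x94 is ”.
def straighten_quotes(raw)
  # Tag the bytes as Windows-1252, then transcode to UTF-8 so the
  # curly quotes become real Unicode characters (U+201C and U+201D).
  text = raw.dup.force_encoding("Windows-1252").encode("UTF-8")
  # Replace both curly quotes with a plain ASCII double quote.
  text.tr("\u201C\u201D", '""')
end

raw = "\x93foo\x94".b       # bytes as File.binread might return them
puts straighten_quotes(raw) # prints "foo"
```

When reading straight from disk, File.read(path, encoding: "Windows-1252:UTF-8") should perform the same transcoding as the file is read.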

Related

Ruby regex and special characters like dash (—) and »

I'm trying to replace all punctuation and the like in some text with just a space. So I have the line
text = "—Bonne chance Harry murmura t il »"
How can I remove the dash and the »? I tried
text.gsub( /»|—/, ' ')
which gives an error, not surprisingly. I'm new to Ruby and just trying to get the hang of things by writing a script to pull all the words out of a chapter of a book. I figured I'd remove the punctuation and symbols and then use text.split. Any help would be appreciated; I couldn't find much.
It turns out the problem had to do with the UTF-8 encoding. Adding
# encoding: utf-8
solved my issues, and what @Andrewlton said works great.
This should properly substitute in the way you were trying to do it; just add brackets and remove the pipe:
text.gsub(/[»—]/, ' ')
The Unicode punctuation property also works:
text.gsub(/\p{P}/, ' ')
You should be able to use regexp pretty universally, coming from whatever language you know. Hope this helps!
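Putting the pieces together for the word-extraction goal, a short sketch. The \p{L}+ scan at the end is a swapped-in alternative, not something from the answers: it collects runs of Unicode letters directly instead of blanking out punctuation first.

```ruby
# encoding: utf-8
# (The magic comment is only needed on Ruby 1.9; from Ruby 2.0 on, source files default to UTF-8.)

text = "—Bonne chance Harry murmura t il »"

# Approach from the answer: blank out Unicode punctuation, then split on whitespace.
words = text.gsub(/\p{P}/, ' ').split
p words # => ["Bonne", "chance", "Harry", "murmura", "t", "il"]

# Alternative: keep only runs of Unicode letters (accented letters included).
letters = text.scan(/\p{L}+/)
p letters
```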

Single quote string interpolation to access a file in linux

How do I make the file parameter of the method sound become the file name, with a .fifo extension, using single quotes? I've searched up and down and tried many different approaches, but I think I need a new set of eyes on this one.
def sound(file)
@cli.stream_audio('audio\file.fifo')
end
Alright, so I finally got it working. It might not be the correct way, but this seemed to do the trick. First, there may have been some whitespace interfering with my file parameter. Then I used the File.join option that a few different people posted here.
I used a bit of each of the answers really, and this is how it came out:
def sound(file)
file = file.strip
file = File.join('audio/',"#{file}.fifo")
@cli.stream_audio(file) if File.exist? file
end
Works like a charm! :D
Ruby interpolation requires that you use double quotes.
Is there a reason you need to use single quotes?
def sound(file)
@cli.stream_audio("audio/#{file}.fifo")
end
As Charles Caldwell stated in his comment, the best way to get cross-platform file paths to work correctly is to use File.join. Using that, your method would look like this:
def sound(file)
@cli.stream_audio(File.join("audio", "#{file}.fifo"))
end
Your problem is with your use of file path separators. You are using a \. While this may not seem like a big deal, it actually is in Ruby strings.
Inside a single-quoted string, a backslash is kept as-is (the only exceptions are \\ and \'):
puts 'Hello\tWorld' #=> Hello\tWorld
Notice what happens when we use double quotes:
puts "Hello\tWorld" #=> Hello<TAB>World
The \t got interpreted as a tab. Just as Ruby interpolates #{} code in a double-quoted string, it also turns escape sequences such as \n or \t into a newline or a tab. So when it sees "audio\file.fifo", it actually reads "audio", then \f, then "ile.fifo". \f means 'form feed', so a form-feed character ends up in your string. Here is a list of escape sequences. It is for C++, but most of the same sequences apply in Ruby double-quoted strings.
As @sawa pointed out, if your escape sequence does not exist (for instance \y), Ruby just removes the \ and keeps the 'y'.
"audio\yourfile.fifo" #=> audioyourfile.fifo
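The escape behaviour described above can be checked directly. A small sketch using the file names from the question:

```ruby
single  = 'audio\file.fifo'     # single quotes: the backslash stays
double  = "audio\file.fifo"     # double quotes: \f becomes a form feed
unknown = "audio\yourfile.fifo" # unknown escape \y: the backslash is dropped

p single.length         # => 15
p double.include?("\f") # => true
p double.length         # => 14
p unknown               # => "audioyourfile.fifo"
```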
There are three possible solutions:
Use a forward slash:
"audio/#{file}.fifo"
The forward slash will be interpreted as a file path separator when passed to the system. I do most of my work on Windows, which uses \, but using / in my code works perfectly fine.
Use \\:
"audio\\#{file}.fifo"
Using a double \\ escapes the \ and causes it to be read as you intended it.
Use File.join:
File.join("audio", "#{file}.fifo")
This joins the arguments with whatever separator is set in the File::SEPARATOR constant.
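A side-by-side sketch of the three solutions. fifo_paths is a hypothetical helper for illustration only; it also strips whitespace, as the asker found necessary.

```ruby
# Hypothetical helper (not from the original posts) building the path three ways.
def fifo_paths(file)
  file = file.strip # drop stray whitespace around the name
  {
    forward_slash: "audio/#{file}.fifo",
    escaped_backslash: "audio\\#{file}.fifo", # \\ yields one literal backslash
    joined: File.join("audio", "#{file}.fifo")
  }
end

paths = fifo_paths(" song ")
p paths[:forward_slash]        # => "audio/song.fifo"
puts paths[:escaped_backslash] # prints audio\song.fifo
p paths[:joined]               # => "audio/song.fifo"
```

File::SEPARATOR is "/" even on Windows (File::ALT_SEPARATOR holds "\\" there), which is why the forward-slash and File.join forms come out identical.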

quote_char causing fits in ruby CSV import

I have a simple CSV file that uses the | (pipe) as a quote character. After upgrading my Rails app from Ruby 1.9.2 to 1.9.3, I'm getting a "CSV::MalformedCSVError: Missing or stray quote in line 1" error.
If I pop open vim and replace the | with regular quotes, single quotes or even "=", the file works fine, but | and * result in the error. Anyone have any thoughts on what might be causing this? Here's a simple one-liner that can reproduce the error:
@csv = CSV.read("public/sample_file.csv", {quote_char: '|', headers: false})
I also reproduced this in Ruby 2.0, and in irb without loading Rails.
Edit: here are some sample lines from the CSV
|076N102 |,|CARD |,| 1|,|NEW|,|PCS |
|07-1801 |,|BASE |,| 18|,|NEW|,|PCS |
I think you've just discovered a bug in Ruby's CSV module.
From csv.rb :
1587: @re_chars = /#{%"[-][\\.^$?*+{}()|# \r\n\t\f\v]".encode(@encoding)}/
This Regexp is used to escape characters conflicting with special regular expression symbols, including your "pipe" char | .
I don't see any reason for the prepended [-], and if you remove it, your example starts to work:
Edit: inside a character class (an expression surrounded by brackets []), the hyphen only needs escaping when it is not the leading character, so I had to update the fixed Regexp:
1587: @re_chars = /#{%"(?<!\\[)-(?=.*\\])|[\\.^$?*+{}()|# \r\n\t\f\v]".encode(@encoding)}/
CSV.read('sample.csv', {quote_char: '|'})
# [["076N102 ",
# "CARD ",
# " 1", "NEW", "PCS "],
# ["07-1801 ",
# "BASE ",
# " 18", "NEW", "PCS "]]
Since most regex engines, Ruby's included, do not support lookbehind expressions with quantifiers, I had to write a negative lookbehind for the left bracket instead. It would also match hyphens when the left bracket of a pair is missing. If you find a better solution, please leave a comment.
I'd be glad to hear any comments before filing a bug report at ruby-lang.org.
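As the question notes, replacing | with ordinary quotes makes the file parse. On an affected Ruby version, a minimal in-memory workaround might look like this. It is only a sketch and assumes that no field contains a literal " or | character.

```ruby
require "csv"

# Sample rows from the question, using | as the quote character.
data = <<~ROWS
  |076N102 |,|CARD |,| 1|,|NEW|,|PCS |
  |07-1801 |,|BASE |,| 18|,|NEW|,|PCS |
ROWS

# Swap the pipes for ordinary double quotes and parse with the default quote_char.
rows = CSV.parse(data.tr("|", '"'))
p rows
```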

Backslash in string returns two backslash

I entered this access token value
864876322670016\u00257C4e1d481ecad9eb45b9386745.1-1026038548\u00257CshuA8v7lgo7-hRr2AjbUBd3shek
on a form but it was returned with double backslash like this
864876322670016\\u00257C4e1d481ecad9eb45b9386745.1-1026038548\\u00257CshuA8v7lgo7-hRr2AjbUBd3shek
I'm passing this value to Facebook GraphAPI and this returns an error.
How can I replace the double backslash with a single one? Or is there a way to keep the double backslash from appearing?
Are you sure it's actually returned with double backslashes? Internally strings with backslashes will look like they have double backslashes because Ruby is escaping them:
> a = 'aaa\bbb\ccc'
=> "aaa\\bbb\\ccc" # Looks like doubles
> a
=> "aaa\\bbb\\ccc"
> a.inspect
=> "\"aaa\\\\bbb\\\\ccc\"" # Looks even worse
> puts a
aaa\bbb\ccc # ...but it isn't
But if they really are double backslashes, you can do something like this:
> aa = 'aaa\\\\bbb\\\\ccc'
> puts aa
aaa\\bbb\\ccc # String with double backslash
> aa.gsub!("\\\\", "\\")
> puts aa
aaa\bbb\ccc
It's just the way it's being displayed, in escaped form. Your error is likely elsewhere.
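A quick way to see what a string really contains is to count the characters instead of trusting the inspected form. A sketch using a shortened, made-up token in the same shape as the one in the question:

```ruby
token = 'abc\u00257Cdef' # single quotes: \u is not an escape here, so this is one backslash

p token                   # inspect escapes the backslash: "abc\\u00257Cdef"
puts token                # the real contents: abc\u00257Cdef
p token.chars.count("\\") # => 1

# If a string really does contain doubled backslashes, collapse them.
# The block form avoids gsub's special handling of backslashes in replacement strings.
doubled = 'aaa\\\\bbb' # contents: aaa\\bbb
single  = doubled.gsub("\\\\") { "\\" }
puts single # prints aaa\bbb
```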
This is a common misinterpretation of the output, and a little confusing when you first see it, as Casper has pointed out.
From this question/answer, where the person's issue was essentially the same:
Dang it. I forgot that when the result is displayed in double quotes it shows it escaped.
There's also a short discussion of this perceived issue in this blog post.

Watir magic escape sequence?

I am currently using Watir with Firefox and it seems that when I try to set a field with the following text:
@#$QWER7890uiop
The command I am using is the following:
text_field(:name, "password").value=("!@#$QWER7890uiop)
I've also tried this:
text_field(:name, "password").set "!@#$QWER7890uiop)
Only the first 2 characters get entered. Is there something I can do to bypass this feature?
You need to quote the string with single quotes (') instead:
text_field(:name, "password").value = '!@#$QWER7890uiop'
Many characters are substituted inside double quotes:
Escape sequences like \n, \t, \s, etc. are replaced by their equivalent character(s). See here for the full list.
#{} where anything inside the braces is interpreted as a Ruby expression.
#$something where $something is interpreted as a Ruby global variable. That's the problem with your string above, besides it not being terminated.
%s is not touched by the quotes at all: it is a format directive that String#% (like Kernel#format) fills in, so it behaves the same inside single quotes.
For instance:
puts "%s hours later" % 'Five'
results in
"Five hours later".
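The #$ interpolation can be reproduced without Watir. A sketch, writing the password as !@#$QWER7890uiop (the exact first characters are an assumption, since the page text is garbled there); $QWER7890uiop is never assigned, so it is nil and interpolates as an empty string.

```ruby
double_quoted = "!@#$QWER7890uiop"  # "#$QWER7890uiop" interpolates the global variable
single_quoted = '!@#$QWER7890uiop'  # no interpolation in single quotes
escaped       = "!@\#$QWER7890uiop" # \# stops interpolation in double quotes

p double_quoted      # => "!@"
puts single_quoted   # prints !@#$QWER7890uiop
p escaped == single_quoted # => true

# %s is handled by String#%, not by the quotes, so both forms behave the same.
p "%s hours later" % "Five" # => "Five hours later"
p '%s hours later' % "Five" # => "Five hours later"
```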
