Elixir: String Force encode UTF-8 - ruby

I have a gem in Ruby that does translations using the Google API, and I am "translating" it to Elixir.
For example, I get something like this from the API:
(image: api-data)
And in Ruby today I do this:
encoded = rawdata.force_encoding("UTF-8")
Is there a way to "force_encode" (like Ruby does) in Elixir?
UPDATE: SOLUTION
I reached a solution based on your answers, guys. Thanks a lot!
Since Elixir handles strings as binaries, that is the trick: I take the response body and do body |> IO.iodata_to_binary ...
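defmodule Request do
  alias Extract
  use HTTPotion.Base

  def process_url(url) do
    "https://translate.google.com/translate_a/" <> url
  end

  # Flatten the iodata body into a single binary; if the API sent UTF-8,
  # that binary already is a valid Elixir string.
  def process_response_body(body) do
    body |> IO.iodata_to_binary |> Extract.extract
  end
end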
Here is the entire code.

You use force_encoding in Ruby when the data is tagged as binary but really is UTF-8. In Elixir, the data is both at the same time: all strings are binaries, and we don't tag them in any way. In other words, you shouldn't need to force the encoding.
However, if the data is not in UTF-8, you need to find a way to convert it to UTF-8 in the first place.
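In Ruby terms, the two cases the answer distinguishes look like this. It is a minimal sketch; the byte strings are made-up examples, and ISO-8859-1 stands in for "some encoding other than UTF-8":

```ruby
# The UTF-8 bytes for "café", tagged as binary (ASCII-8BIT), which is how a
# raw HTTP response body often arrives.
raw = String.new("caf\xC3\xA9", encoding: Encoding::BINARY)

# force_encoding only changes the tag; the bytes are left untouched.
tagged = raw.dup.force_encoding(Encoding::UTF_8)
puts tagged.valid_encoding? # => true (the bytes really were UTF-8)

# If the bytes are in another encoding (here ISO-8859-1, where "é" is the
# single byte 0xE9), re-tagging is not enough: the bytes must be converted.
latin1 = String.new("caf\xE9", encoding: Encoding::ISO_8859_1)
puts latin1.dup.force_encoding(Encoding::UTF_8).valid_encoding? # => false
converted = latin1.encode(Encoding::UTF_8)
puts converted # => café
```

The first case is the one Elixir never needs to handle, because its binaries carry no encoding tag; the second case needs an actual conversion in either language.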

Related

Ruby RSS::Parser.to_s silently fails?

I'm using Ruby 1.8.7's RSS::Parser, part of stdlib. I'm new to Ruby.
I want to parse an RSS feed, make some changes to the data, then output it (as RSS).
The docs say I can use #to_s, and it seems to work with some feeds but not others.
This works:
#!/usr/bin/ruby -w
require 'rss'
require 'net/http'
url = 'http://news.ycombinator.com/rss'
feed = Net::HTTP.get_response(URI.parse(url)).body
rss = RSS::Parser.parse(feed, false, true)
# Here I would make some changes to the RSS, but right now I'm not.
p rss.to_s
Returns expected output: XML text.
This fails:
#!/usr/bin/ruby -w
require 'rss'
require 'net/http'
url = 'http://feeds.feedburner.com/devourfeed'
feed = Net::HTTP.get_response(URI.parse(url)).body
rss = RSS::Parser.parse(feed, false, true)
# Here I would make some changes to the RSS, but right now I'm not.
p rss.to_s
It returns nothing (an empty string).
And yet, if I change the last line to:
p rss
I can see that the object is filled with all of the feed data. It's the to_s method that fails.
Why?
How can I get some kind of error output to debug a problem like this?
From what I can tell, the problem isn't in to_s; it's in the parser itself. Stepping deep into the parser.rb code showed nothing being returned, so to_s returning an empty string is valid.
I'd recommend looking at something like Feedzirra.
Also, as an FYI, take a look at Ruby's OpenURI module (require 'open-uri') for easy retrieval of web assets, like feeds. Open-URI is simple but adequate for most tasks. Net::HTTP is lower level, so you will need to write a lot more code to replace the functionality of Open-URI.
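A minimal sketch of the open-uri route; fetch_feed is a hypothetical helper name. URI.open is available from Ruby 2.5; on older Rubies such as the 1.8.7 in the question, require 'open-uri' patches Kernel#open, so open(url) works instead.

```ruby
require "open-uri"

# Fetches the body of a feed (or any resource). For http/https/ftp URLs,
# URI.open goes through open-uri; with a block, the IO is closed afterwards
# and the block's result (here the whole body) is returned.
def fetch_feed(url)
  URI.open(url, &:read)
end

# Usage, replacing Net::HTTP.get_response(URI.parse(url)).body for a simple GET:
#   feed = fetch_feed("http://feeds.feedburner.com/devourfeed")
#   rss  = RSS::Parser.parse(feed, false, true)
```

Unlike the bare Net::HTTP.get_response call, open-uri also follows HTTP redirects and raises OpenURI::HTTPError on error status codes.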
I had the same problem, so I started debugging the code. I think Ruby's RSS library has a few too many required elements. The channel needs to have title, link, and description; if one is missing, to_s will fail.
The second feed in the example above is missing the description, which makes to_s fail...
I believe this is a bug, but I don't really understand the code and barely know Ruby, so who knows. It would seem natural to me that to_s would try its best even if some elements are missing.
Either way, this will "work":
rss.channel.description = "something"
rss.to_s
The problem lies in def have_required_elements?, or in self.class::MODELS.
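The missing-description diagnosis can also be checked without stepping through parser.rb, which answers the "how do I get error output" part of the question: passing true as the second argument to RSS::Parser.parse turns validation on, so a missing required element raises an RSS::InvalidRSSError subclass instead of being silently accepted. A sketch, using a small hypothetical feed and a hypothetical helper name (rss_validation_error):

```ruby
require "rss"

# Hypothetical feed whose <channel> lacks the required <description>,
# like the second feed in the question.
FEED_WITHOUT_DESCRIPTION = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <rss version="2.0">
    <channel>
      <title>Example feed</title>
      <link>http://example.com/</link>
    </channel>
  </rss>
XML

# With validation on, the parser raises and the message names the problem.
def rss_validation_error(xml)
  RSS::Parser.parse(xml, true)
  nil
rescue RSS::InvalidRSSError => e
  "#{e.class}: #{e.message}"
end

puts rss_validation_error(FEED_WITHOUT_DESCRIPTION)

# Lenient parse (as in the question), then supply the missing element
# before serializing so to_s has everything it requires.
rss = RSS::Parser.parse(FEED_WITHOUT_DESCRIPTION, false, true)
rss.channel.description = "something"
puts rss.to_s
```

Running the strict parse first is a cheap way to find out which element a given feed is missing before deciding what to fill in.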

ruby parse HTTParty array - json

Still new to Ruby - I apologize in advance if this has been asked.
I am using HTTParty to get data from an API, and it is returning an array of JSON data that I can't quite figure out how to parse.
#<Net::HTTPOK:0x1017fb8c0>
{"ERRORARRAY":[],"DATA":[{"ALERT":1,"LABEL":"hello","WATCHDOG":1},{"LABEL":"goodbye","WATCHDOG":1}
I guess the first question is that I don't really know what I am looking at. When I do response.class I get HTTParty::Response. It appears to be a Hash inside an array? I am not sure. Anyway, I want a way to just grab the "LABEL" for every separate array, so the result would be "hello", "goodbye". How would I go about doing so?
You don't need to parse it per se. What you could do is replace ':' with '=>' and evaluate it (bear in mind that eval runs whatever code is in the string, so this is risky on data that comes from an external API).
For example, say you have ["one":"a","two":"b"]. With s set to that string, eval s.gsub(/^\[/, '{').gsub(/\]$/, '}').gsub('":', '"=>') yields a Ruby hash (inspect shows {"one"=>"a", "two"=>"b"}).
Alternatively, you could do something like this:
require 'json'
string_to_parse = "{\"one\":\"a\",\"two\":\"b\"}"
parsed_and_a_hash = JSON.parse(string_to_parse)
parsed_and_a_hash is a hash!
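Applied to the data in the question, JSON.parse gives a Hash whose "DATA" key holds an Array of Hashes, one per record, so collecting the labels is a single map. This is a minimal sketch: the closing ]} that is cut off in the post is assumed. With HTTParty, response.parsed_response usually already holds this Hash when the API sends a JSON content type.

```ruby
require "json"

# Response body from the question, with the truncated closing "]}" assumed.
body = '{"ERRORARRAY":[],"DATA":[{"ALERT":1,"LABEL":"hello","WATCHDOG":1},{"LABEL":"goodbye","WATCHDOG":1}]}'

# Top level: a Hash. "DATA": an Array of Hashes, one per record.
data = JSON.parse(body)
labels = data.fetch("DATA").map { |record| record["LABEL"] }
p labels # => ["hello", "goodbye"]
```

fetch raises a KeyError if "DATA" is missing, which surfaces an unexpected response shape early instead of failing later on nil.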
If that's JSON, then your best bet is to install a library that handles the JSON format. There's really no point in reinventing the wheel (although it is fun). Have a look at this article.
If you know that the JSON data will always, always be in exactly the same format, then you might manage something relatively simple without a JSON gem. But I'm not sure that it's worth the hassle.
If you're struggling with the json gem, consider using the Crack gem. It has the added benefit of also parsing xml.
require 'crack'
my_hash_array = Crack::JSON.parse(my_json_string)
my_hash_array = Crack::XML.parse(my_xml_string)

Convert POST parameters to hash in Ruby without rails

What's the best library to use to convert the HTTP POST string received from a browser into a Ruby hash? I don't want to use the large rails-based libraries. I am using eventmachine and evma_httpserver, and want to include the lightest library possible that will decode and convert the params string.
Note: I don't need a webserver. I have the encoded post string in hand, and just need to convert it to a hash.
URI.decode_www_form from the Ruby standard library can do this: http://rubydoc.info/docs/ruby-stdlib/1.9.2/URI#decode_www_form-class_method
You could use the rack gem for its Rack::Utils.parse_query method.
If you want something lighter than that, you could just copy the source code of the parse_query and unescape methods from it.
If you want even lighter (but perhaps not as performant or robust) than that, just implement your own split and lean on CGI.unescape.
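A sketch of that "own split plus CGI.unescape" option; parse_post_body is a hypothetical name, and unlike Rack::Utils.parse_query it has no support for nested keys such as a[b]=c, while repeated keys simply keep the last value:

```ruby
require "cgi"

# Parses an application/x-www-form-urlencoded body into a Hash:
# split on "&", split each pair on the first "=", and unescape both parts
# with CGI.unescape (which also turns "+" into a space).
def parse_post_body(body)
  body.split("&").each_with_object({}) do |pair, params|
    key, value = pair.split("=", 2)
    next if key.nil? || key.empty? # skip empty segments such as "a=1&&b=2"

    params[CGI.unescape(key)] = CGI.unescape(value.to_s)
  end
end

p parse_post_body("your=post&params=values&msg=hello+world%21")
```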
Try this:
require "uri"
result = URI.decode_www_form("your=post&params=values").inject({}) {|r, (key,value)| r[key.to_sym] = value;r}
puts result[:your]
puts result[:params]

StringScanner scanning IO instead of a string

I've got a parser written using ruby's standard StringScanner. It would be nice if I could use it on streaming files. Is there an equivalent to StringScanner that doesn't require me to load the whole string into memory?
You might have to rework your parser a bit, but you can feed lines from a file to a scanner like this:
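require 'strscan'

File.open('filepath.txt', 'r') do |file|
  scanner = StringScanner.new(String.new)
  file.each_line do |line|
    scanner << line          # append the next line to the scanner's string
    scanner.scan(/whatever/) # scanning resumes where the last scan stopped
  end
end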
StringScanner was intended for exactly that: loading a big string and moving back and forth over it with an internal pointer. If you make it a stream, the references get lost, and you cannot use unscan, check_until, pre_match, post_match, and so on.
Well, you can, but for that you need to buffer all the previous input.
If you are concerned about the buffer size, just load the data in chunks and use a simple regexp or a gem called Parser.
The simplest way is to read fixed-size chunks of data.
# iterate over fixed-length records
File.open("fixed-record-file") do |f|
  while (record = f.read(1024))
    # parse the record here using a regexp or a parser
  end
end
[Updated]
Even with this loop you can use StringScanner; you just need to update its string with each new chunk of data:
string=(str): Changes the string being scanned to str and resets the scanner. Returns str.
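A sketch of the chunked loop combined with string=, assuming the goal is to pull whitespace-separated words out of a stream (each_word is a hypothetical name). Whatever the scanner does not consume at the end of a chunk, typically a word cut in half at the boundary, is carried into the next chunk, so only that partial token is buffered rather than the whole input:

```ruby
require "strscan"
require "stringio"

# Yields each whitespace-separated word from io, reading chunk_size bytes
# at a time and re-pointing one StringScanner at each chunk via string=.
def each_word(io, chunk_size = 1024)
  return enum_for(:each_word, io, chunk_size) unless block_given?

  scanner = StringScanner.new("")
  leftover = ""
  while (chunk = io.read(chunk_size))
    scanner.string = leftover + chunk
    until scanner.eos?
      scanner.skip(/\s+/)
      # Only accept a word that is followed by whitespace; a word touching
      # the end of the chunk may continue in the next chunk.
      word = scanner.scan(/\S+(?=\s)/)
      break unless word
      yield word
    end
    leftover = scanner.rest
  end
  # After the last chunk, whatever remains is complete.
  leftover.split.each { |w| yield w }
end

p each_word(StringIO.new("alpha beta\ngamma"), 4).to_a # => ["alpha", "beta", "gamma"]
```

The trade-off is the one described in the answer: backtracking methods such as unscan or pre_match only see the current chunk plus the carried-over tail.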
There is StringIO.
Sorry, I misread your question. Take a look at this; it seems to have streaming options.

Ruby: Convert HTML/Redcloth to plain text

Does anybody know how I can convert HTML to plain text with Ruby? Really, I need to convert RedCloth to plain text, but either way would be fine.
I'm not talking about just stripping out the tags (that is all I've done so far). For example, I would like an ordered list to retain the numbers, unordered lists to use an asterisk for bullets, etc.
def red_cloth_to_plain_text(s)
  s = RedCloth.new(s).to_html
  s = strip_tags(s)
  s = html_unescape(s) # reverse of html_escape
  s = undo_red_cloths_html_codes(s)
  return s
end
Maybe I have to attempt a RedCloth-to-plain-text formatter.
You need to make a new formatter class.
module RedCloth::Formatters
  module PlainText
    include RedCloth::Formatters::Base
    # ...
  end
end
I won't write your code for you today, but this is very easy to do. Read the RedCloth source if you doubt me: it's only 346 lines for the HTML formatter.
So, once you have your PlainText formatter you patch the class and use it:
module RedCloth
  class TextileDoc
    def to_txt( *rules )
      apply_rules(rules)
      to(RedCloth::Formatters::PlainText)
    end
  end
end
print RedCloth.new(str).to_txt
Joseph Halter wrote a RedCloth plain formatter:
http://github.com/JosephHalter/redcloth-formatters-plain
Example usage:
RedCloth.new("p. this is *simple* _test_").to_plain
will return:
"this is simple test"
That may be what you have to do. You're not the first to want this, but I'm guessing it's not part of the library yet because everyone wants their plaintext a little different.
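If you would rather post-process the HTML than write a RedCloth formatter, a rough alternative is to walk the output of to_html with REXML (shipped with Ruby). This is a sketch under strong assumptions: the HTML must be well-formed XHTML and use only XML entities (an HTML-only entity such as &nbsp; makes REXML raise), and html_to_plain_text and its helpers are hypothetical names that special-case only p, br, ol and ul:

```ruby
require "rexml/document"

# Converts a small, well-formed XHTML fragment to plain text, numbering
# <ol> items and bulleting <ul> items with "*".
def html_to_plain_text(html)
  doc = REXML::Document.new("<root>#{html}</root>")
  render_children(doc.root).gsub(/\n{3,}/, "\n\n").strip
end

def render_children(node)
  node.children.map { |child| render_node(child) }.join
end

def render_node(node)
  # Text#value returns the text with entities such as &amp; decoded.
  return node.value if node.is_a?(REXML::Text)
  return "" unless node.is_a?(REXML::Element) # drop comments etc.

  case node.name
  when "p" then render_children(node).strip + "\n\n"
  when "br" then "\n"
  when "ol"
    node.get_elements("li").each_with_index.map do |li, i|
      "#{i + 1}. #{render_children(li).strip}\n"
    end.join + "\n"
  when "ul"
    node.get_elements("li").map { |li| "* #{render_children(li).strip}\n" }.join + "\n"
  else
    render_children(node) # any other tag: keep only its text
  end
end

html = "<p>Intro</p>\n<ol>\n\t<li>one</li>\n\t<li>two</li>\n</ol>\n<ul>\n\t<li>a &amp; b</li>\n</ul>"
puts html_to_plain_text(html)
```

As the answer above says, everyone wants their plain text a little different, so the case statement is the part you would extend.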
