Segmentation fault in hpricot - ruby

I'm using Hpricot to read HTML and I got a segmentation fault. I googled it, and some people say to upgrade to the latest version of Ruby. I am using Rails 2.3.2 and Ruby 1.8.7. How can I resolve this error?

I was trying to parse html pages with many unicode characters in them and Hpricot kept crashing. Finally, I used the monkey patch from sanitize and put it in the environment.rb for my rails application. There hasn't been a single crash since I added this patch:
http://github.com/rgrove/sanitize/blob/1e1dc9681de99e32dc166f591343dfa60fc1f648/lib/sanitize/monkeypatch/hpricot.rb

If you're free to choose your HTML parsing library, switch it.
Why, the creator of Hpricot, recently posted that you are better off using Nokogiri instead of Hpricot nowadays.
You may also have a look at HTTParty.

On Ruby 1.8.5, try installing Hpricot version 0.6.161 (gem install hpricot -v 0.6.161).
That worked for me.

From memory, since I last used it about a year ago:
Hpricot stores attributes in a fixed-size buffer, and some frameworks generate outrageously long hashes in document attributes. There's some static field you can set before parsing that lets you set the size of this buffer.
I remember it being fairly prominent in the docs on the webpage, though of course it's gone now.

Well, based on your own question, I'd say "Upgrade to the latest version of Ruby". However, I've also had problems with hpricot segfaulting, which seemed to be related to my usage of threading.
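If threading really is the trigger, one common workaround is to make sure only one thread is inside the parser at a time. The following is a minimal sketch of that idea, not a confirmed fix for Hpricot: `parse_serially` and `fake_parser` are hypothetical names, and the lambda only stands in for the real parser call (for example `Hpricot(html)`).

```ruby
require 'thread'

# A single process-wide lock guarding every call into the parser.
PARSER_LOCK = Mutex.new

# Runs the given parser (any callable taking an HTML string) while holding
# the lock, so two threads never execute parser code concurrently.
def parse_serially(html, parser)
  PARSER_LOCK.synchronize { parser.call(html) }
end

# Hypothetical stand-in for the real parser; it just lists opening tag names.
fake_parser = lambda { |html| html.scan(/<(\w+)/).flatten }

pages = ["<p>one</p>", "<div><b>two</b></div>"]

# Each page is handled in its own thread, but parsing itself is serialized.
threads = pages.map { |page| Thread.new { parse_serially(page, fake_parser) } }
results = threads.map(&:value)
```

Fetching the pages can still happen in parallel; only the parse step is funneled through the lock.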

This appears to be an outstanding issue on the bug list. I have experienced it too. My theory is that it has to do with the HTML structure or a bad/corrupt character in the file, but I have not found where exactly.
Here are the links to the issues:
http://github.com/why/hpricot/issues/#issue/10
http://github.com/why/hpricot/issues/#issue/4

I'm having the same segfault issue but sadly can't consult the issues Dave cited above, even via Google cache. From what I've been googling, the parse.rb segfaults have to do with encoded entities or alternate character sets (accented characters perhaps).
The sanitize lib encountered the same issue and posted a monkeypatch here:
http://github.com/rgrove/sanitize/blob/1e1dc9681de99e32dc166f591343dfa60fc1f648/lib/sanitize/monkeypatch/hpricot.rb
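Separately from that monkeypatch (this is a different technique, and not what the patch does), one defensive step is to make sure the input is valid UTF-8 before it ever reaches the parser. The sketch below assumes a modern Ruby (2.1 or later, where `String#scrub` exists), so it does not apply as written to Ruby 1.8.7; `clean_for_parsing` is a hypothetical helper name.

```ruby
# Returns a copy of the raw bytes tagged as UTF-8, with every invalid byte
# sequence replaced by "?" so the parser only ever sees well-formed text.
def clean_for_parsing(raw)
  raw.dup.force_encoding(Encoding::UTF_8).scrub('?')
end

# "\xE9" is a Latin-1 "e acute" byte, which is not valid UTF-8 on its own.
dirty = "caf\xE9 <b>ok</b>".b
clean = clean_for_parsing(dirty)

puts clean
puts clean.valid_encoding?
```

Whether this avoids the crash depends on what actually triggers it; it only removes malformed bytes as a possible cause.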

Related

ZXing recognizes barcode only in Java version

I'm trying to bulk scan some JPG files with barcodes on them. I've used the Ruby bindings for the C++ port of ZXing. When I have this file scanned by the web version of ZXing (https://zxing.org/w/decode.jspx), everything turns out fine. When I try to scan it in Ruby (using https://github.com/glassechidna/zxing_cpp.rb), nothing is recognized. I already tried cranking up the contrast, but it did not help. It's not my Ruby setup, because it works for loads of other nearly identical codes. The only thing I can think of is some difference between the Java version and the C++ port, but this is a complete shot in the dark; I only started using ZXing today.
Could anyone get this code recognized in ruby? Thank you very much.
The gem you're using and/or its dependencies are out of date. If you still want to use Ruby for your project, you can try using one of the online services in the comments for the decoding. You could either use the mechanize gem or roll your own using other Ruby HTTP tools such as httparty or Ruby's Net::HTTP.
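A rough sketch of the "roll your own" route with Net::HTTP is below. It only shows how a multipart file upload is built and sent; the endpoint URL and the form field name (`'f'` here) are assumptions that must be checked against whichever online decoder you pick, and `build_decode_request` and `decode_barcode` are hypothetical helper names.

```ruby
require 'net/http'
require 'uri'
require 'stringio'

# Builds (but does not send) a multipart POST that uploads one image.
# field_name is an assumption: use the name of the file input on the
# decoder's upload form.
def build_decode_request(endpoint, image_io, filename, field_name = 'f')
  uri = URI(endpoint)
  request = Net::HTTP::Post.new(uri)
  request.set_form([[field_name, image_io, { filename: filename }]],
                   'multipart/form-data')
  [uri, request]
end

# Sends the image at image_path to the endpoint and returns the raw
# response body, which you would then have to parse yourself.
def decode_barcode(endpoint, image_path)
  File.open(image_path, 'rb') do |file|
    uri, request = build_decode_request(endpoint, file, File.basename(image_path))
    Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
      http.request(request).body
    end
  end
end
```

Usage would look like `decode_barcode(decoder_url, 'scan_001.jpg')`, where `decoder_url` is the upload URL of the service you choose.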

Ruby 2.4.4 documentation missing

My question is about the ruby-doc.org documentation, but also relates to the ri documentation lookup inside ruby.
I've already read dozens of similar questions/answers about the ri not working and giving "nothing known" messages and I've tried to follow some of that old advice. It just seems that those old answers aren't applicable to me.
One was to run rdoc --all --ri from the Ruby root directory. I tried that and it failed (unable to convert to UTF-8 or something like that).
Another suggested that the RubyInstaller for Windows just doesn't contain that info anymore and that I should use the online documentation. But when I go to http://ruby-doc.org/downloads/ I discover that the version I am using (2.4.4) does not exist there.
This is odd, because the RubyInstaller site specifically says that if I'm new to Ruby (which I am), I should install 2.4.4. You'd think that if any version had good documentation, it would be that one. Instead, it seems to be missing entirely.
This all started because I am trying to learn Ruby and am watching the Lynda.com course on Ruby by Kevin Skoglund, which was recorded many versions ago. In that course he refers to the ri command from the shell, which doesn't work in my version. See below:
ruby --version
ruby 2.4.4p296 (2018-03-28 revision 63013) [x64-mingw32]
ri --version
ri.cmd 5.0.0
ri String
Nothing known about String
Now, if it's not available within ruby using ri, and I have to use online documentation, AND it's missing for my version, which happens to be the version recommended for new users, … you see my frustration.
Here's what I really want...
1. I want to use ri and have it work.
2. If that's just not possible, I'd like to know where the documentation for my version is online, because it's not where it's supposed to be.
Any help is appreciated. If it involves installing anything, letting me know HOW to do that is also appreciated. As I mentioned, I'm new.
Since you are using the RubyInstaller, I will assume that you are on Windows.
I will open this by saying that I am not 100% sure on this, but I am pretty confident in this answer.
The "Use 2.4, not 2.5" advice was due to errors with RubyGems when Ruby 2.5 was first released. I happened to be on a Windows machine installing Ruby at that exact time, and that was the reasoning at that point not to use the newest 2.5 version.
The above mentioned reason has since been corrected.
Realistically, if you are a beginner, as long as you are using documentation that is close to your version (2.0+ to 2.4-ish), it will be fine. Now obviously, and I shouldn't have to provide this disclaimer, though I will to avoid the inevitable down-votes if I don't, this is not a 100% perfect solution and there will be very small differences. As a beginner, the likelihood of you encountering any of these differences is extremely low, low enough not to even worry about. There are missing and poorly documented sections of every language, and Ruby is no exception. Typically these are less used classes (though Ruby Fiddle is an exception, and I hate how poorly documented it is), and they will have no effect on your learning process as you learn the fundamentals and core of the language.
To my recollection, the "core" is rather well documented, and so long as you use documentation from 2.0+ (the closer to 2.4 the better), you should be completely fine, and it is exactly the same. The "standard library" may be slightly more hit or miss, and your mileage may vary a little more, yet still nothing too extreme.
So, to address the second part of your question, do not worry too hard about finding the EXACT version of the documentation you are using. It may not even exist online, though the installer should have provided a CHM help file (there will be a shortcut for it alongside the shortcuts for Ruby, IRB, etc.).
As for "why" ri is not working, I am not 100% sure yet again. I am on Arch Linux, and RDoc doesn't even build. Honestly, RDoc is being left by the wayside for a newer (and IMO better) documentation engine, namely YARD. A possible solution that I use, and prefer, is to install the YARD gem right after I install Ruby:
gem install yard
And then set YARD to generate my documentation with this in CMD:
yard config --gem-install-yri
If you decide to take this route, much more can be learned about it here.
The benefit with it is that it also supports RDoc and is backwards compatible.

Octopress - Generating blank files

Asked this on superuser.com, not sure if stackoverflow is a better suitable place for it, but I am not getting any answers yet:
===
I am trying to generate a new blog entry in my Octopress setup, but I noticed that some previous posts are being generated as empty files in public, and so are the new ones I am trying to generate.
There seems to be no difference at all between the markup file of an entry that is generated properly and one that isn't.
I've got two Octopress installations: one is working and the one I am talking about isn't. I updated Octopress on both and reinstalled the bundle, but no luck. Files such as atom.xml are also not being generated correctly.
I also updated from Ruby 1.9.2p290 to the latest 1.9.3 release, but that made no difference either.
Anyone's encountered this before?
===
This is most likely because you started using codeblocks. This was happening to me, and even posts/pages that didn't use codeblocks would fail to generate. My problem (on Windows) was that I didn't even have Python installed (thought I did). Installing it fixed the problem, then gave me another error, which was fixed by updating the pygments.rb (note .rb) gem. Doing these two things fixed all my problems.
There's a similar issue if you're on arch linux which defaults python to version 3 which isn't supported by pygments.rb yet. You'll have to look around to figure out how to fix that to use 2.7 instead, but it should be pretty straightforward.
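Since the root cause in that answer was Python silently missing, a quick way to check from Ruby is to look for the interpreter on the PATH. This is a minimal sketch; `find_executable_on_path` is a hypothetical helper, and the executable names checked at the bottom (`python`, `python2`) are assumptions about how Python is named on your system.

```ruby
# Searches each PATH directory for an executable with the given name.
# On Windows, PATHEXT supplies the extensions to try (.EXE, .BAT, ...);
# elsewhere only the bare name is tried. Returns the full path or nil.
def find_executable_on_path(name, path = ENV['PATH'].to_s,
                            exts = ENV['PATHEXT'].to_s.split(';'))
  exts = [''] if exts.empty?
  path.split(File::PATH_SEPARATOR).each do |dir|
    exts.each do |ext|
      candidate = File.join(dir, name + ext)
      return candidate if File.file?(candidate) && File.executable?(candidate)
    end
  end
  nil
end

%w[python python2].each do |name|
  found = find_executable_on_path(name)
  puts(found ? "#{name}: #{found}" : "#{name}: not found on PATH")
end
```

Finding an interpreter does not tell you which version it is; running `python --version` on the reported path settles that.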
Can you provide an example of: a) a post that doesn't generate correctly, and b) a post that does generate correctly?
I assume they are just individual posts (and not, for example, pages like /about/). I would also assume that they render as blank both in the blog index on your front page and on the individual post page.
Also - what does render? Is it rendering the rest of the page, but just without the "content" of the post itself? Or does the page not even exist? (404?)

Ruby: code "upgrading" from 1.8.6 to 1.9.2

I'm interested in updating code written for Ruby 1.8.6 to 1.9.2. Are there any useful links to read (preferably with some warnings and recommendations)?
Just to be clear, I'm not expecting any problems right now, but I would like to avoid them.
P.S.
Links like this one are mostly not helpful.
A slideshow showing the differences: http://slideshow.rubyforge.org/ruby19.html#11
I migrated an app from Ruby 1.8.7 to 1.9.3 a few days ago, and no problems happened.
But I advise you to test all of your code for little bugs.
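To give an idea of the kind of little bug worth testing for, here is a short sketch of a few well-known behavior changes between 1.8 and 1.9, written in forms that give the same result on both. It is not an exhaustive list, and the variable names are only for illustration.

```ruby
text = "abc\ndef"

# text[0] is 97 (a Fixnum) on 1.8 but "a" on 1.9; the two-argument form
# returns a one-character string on both.
first_char = text[0, 1]

# When you really want the byte value, unpack works on both versions.
first_byte = text.unpack('C').first

# String#each was removed in 1.9; each_line exists on both.
lines = []
text.each_line { |line| lines << line.chomp }

# Array#to_s is "12" on 1.8 but "[1, 2]" on 1.9; join is stable.
joined = [1, 2].join

# On 1.8 a block parameter overwrites an outer variable of the same name;
# on 1.9 it is block-local. Distinct names avoid relying on either rule.
count = 1
[2, 3].each { |item| item + count }

puts first_char, first_byte, lines.inspect, joined, count
```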
Verify that all the gems you depend on are 1.9.2 compatible. isitruby19.com is a good resource to check this.
Other than that, the best strategy is to try migrating and do some testing. If your application has comprehensive unit test coverage, this shouldn't be too painful.
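As a rough local complement to checking gems by hand, the declared Ruby requirement of each installed gem can be read through the RubyGems API. This sketch only catches gems that explicitly declare a requirement excluding the target version; many gems declare none, so an empty result does not prove compatibility. `gems_rejecting_ruby` is a hypothetical helper name.

```ruby
require 'rubygems'

# Returns the gem specs whose required_ruby_version excludes the given
# Ruby version string. Defaults to every gem installed for this Ruby.
def gems_rejecting_ruby(version_string, specs = Gem::Specification.to_a)
  target = Gem::Version.new(version_string)
  specs.reject { |spec| spec.required_ruby_version.satisfied_by?(target) }
end

gems_rejecting_ruby('1.9.2').each do |spec|
  puts "#{spec.name} #{spec.version} requires Ruby #{spec.required_ruby_version}"
end
```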

Where to find the Rails API rdoc template

I'm trying to prettify my rdoc documentation, using version 3.5.3. I'm not a fan of the built-in darkfish theme, so I tried to find a way to replace it with the one used by the official Rails API documentation at http://api.rubyonrails.org/, but I've had no luck finding it in any readily available form. I've searched all over github, among other things.
What I've found so far is
https://github.com/mislav/hanna
which might be slightly out of date, and its fork
https://github.com/rdoc/hanna-nouveau
Both are nice, but not quite what I want. So before I start fiddling with those templates, does anyone know if the template used by the Rails API docs is available as a gem somewhere?
Thanks!
I know this is very late, but it looks like the new version of Rails uses something called sdoc, which enhances the output with JavaScript searching, and is a little cleaner IMO. Doing a simple "gem install sdoc" will get what you need, then just use rdoc.options << '-f' << 'sdoc'.
The github project appears to be at https://github.com/voloko/sdoc/
By the way, thanks for the question! Without the initial answer posted, I'd have never found where to look, and been stuck with that horrible darkfish theme for my own projects!
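For context, this is roughly where that options line would sit in a Rakefile. It is a minimal sketch assuming the sdoc gem is installed and that RDoc's bundled rake task is used; the output directory, title and file patterns are placeholders for your own project.

```ruby
# Rakefile (sketch)
require 'rdoc/task'

doc_task = RDoc::Task.new(:rdoc) do |rdoc|
  rdoc.rdoc_dir = 'doc'                        # placeholder output directory
  rdoc.title    = 'My Project API'             # placeholder title
  rdoc.rdoc_files.include('README*', 'lib/**/*.rb')
  rdoc.options << '-f' << 'sdoc'               # select the sdoc formatter
end
```

With this in place, `rake rdoc` generates the documentation; defining the task does not need sdoc, but running it does.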
This looks like it. Ignore the instructions that say to do a gem install horo --pre -- that'll actually give you an older beta version. Just do gem install horo and you'll get the current 1.0.3 version (Edit: I sent a pull request to update the instructions, which has already been accepted).
https://github.com/tenderlove/horo
By the way, I found this by looking at the Rails source code and viewing the Rakefile to see the RDoc options. Specifically, line 67 shows rdoc.options << '-f' << 'horo'.
