I'm trying to find a markdown interpreter class/module that I can use in a rakefile.
So far I've found maruku, but I'm a bit wary of beta releases.
Has anyone had any issues with maruku? Or, do you know of a better alternative?
I use Maruku to process 100,000 - 200,000 documents per day. Mostly forum posts but I also use it on large documents like wiki pages. Maruku is much faster than BlueCloth and it doesn't choke on large documents. It's all Ruby and although the code isn't especially easy to extend and augment, it is doable. We have a few tweaks and extras in our dialect of Markdown.
If you want something that is pure Ruby, I definitely recommend Maruku.
For the fastest option out there, you probably want RDiscount. The guts are implemented in C.
See also: "Moving Past BlueCloth" on Ryan Tomayko's blog.
Ryan's post includes the following benchmark of 100 iterations of a markdown test:
BlueCloth: 13.029987s total time, 00.130300s average
Maruku: 08.424132s total time, 00.084241s average
RDiscount: 00.082019s total time, 00.000820s average
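For anyone who wants to reproduce numbers like these on their own documents, here is a minimal sketch using only Ruby's standard Benchmark module. The `time_renderer` helper and the `stand_in` lambda are made up for illustration and are not from Ryan's post; the stand-in is not a real Markdown engine and only exists so the sketch runs without any gem installed.

```ruby
require 'benchmark'

# Time `iterations` calls of `renderer` on `text`.
# Returns [total_seconds, average_seconds_per_call].
def time_renderer(renderer, text, iterations: 100)
  total = Benchmark.realtime do
    iterations.times { renderer.call(text) }
  end
  [total, total / iterations]
end

# Hypothetical stand-in renderer (NOT a Markdown engine).
# Replace it with a lambda that calls the engine you want to measure.
stand_in = ->(text) { text.gsub(/\*\*(.+?)\*\*/, '<strong>\1</strong>') }

total, average = time_renderer(stand_in, "Some **bold** text.\n" * 200)
puts format('stand-in: %.6fs total time, %.6fs average', total, average)
```

Timings depend heavily on the input document and the machine, so compare engines on the same text in the same run.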
Update August 2009
BlueCloth2 was released (http://www.deveiate.org/projects/BlueCloth)
Its speed is on par with RDiscount because it is based on RDiscount - it is not pure Ruby.
(Thanks Jim)
Update November 2009
Kramdown 1.0 was just released. I haven't tried it yet, but it is a pure-Ruby Markdown parser that claims to be 5x faster than Maruku.
Update April 2011
Maruku hasn't seen a commit since June 2010. You may want to look into Kramdown instead.
A new fast option that is not pure Ruby: GitHub has released Redcarpet, which is based on libupskirt: https://github.com/blog/832-rolling-out-the-redcarpet
Update August 2013
Kramdown is still a very healthy project (based on recent commits, outstanding issues, pull requests) and a great choice for a pure Ruby Markdown engine: https://github.com/gettalong/kramdown
Redcarpet is probably still the most commonly used and actively maintained option for people that don't need or want pure Ruby.
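Since the right engine depends on what is installed, a rakefile can try the gems mentioned above in order and use the first one that loads. This is only a sketch: `ENGINES` and `pick_engine` are names invented here, the Redcarpet call assumes Redcarpet 2 or later, and the order (C-backed engines first, Kramdown as the pure-Ruby fallback) is just one reasonable preference.

```ruby
# Gem name => lambda that renders Markdown text to HTML.
# A lambda is only called after its gem has been required successfully,
# so the constants inside it are never touched for missing gems.
ENGINES = {
  'redcarpet' => ->(text) { Redcarpet::Markdown.new(Redcarpet::Render::HTML).render(text) },
  'rdiscount' => ->(text) { RDiscount.new(text).to_html },
  'kramdown'  => ->(text) { Kramdown::Document.new(text).to_html }
}.freeze

# Returns [name, renderer] for the first engine that loads, or nil if none do.
# `loader` is injectable so the selection logic can be exercised without gems.
def pick_engine(engines = ENGINES, loader: ->(name) { require name })
  engines.each do |name, render|
    begin
      loader.call(name)
      return [name, render]
    rescue LoadError
      next
    end
  end
  nil
end

name, render = pick_engine
if name
  puts "using #{name}"
  puts render.call('Hello, *world*!')
else
  puts 'no Markdown gem installed'
end
```

Keep in mind that the engines do not produce identical HTML for every input, so switching between them can change the output slightly.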
The listing at http://ruby-toolbox.com/categories/markup_processors.html would be a good place to start looking.
RDiscount is fast and simple to use.
Try RDiscount. BlueCloth is slow and buggy.
The benchmark in the answer given by casey uses BlueCloth 1. BlueCloth 2 is the fastest these days: http://www.deveiate.org/projects/BlueCloth
I believe BlueCloth is the most prominent one.
Looks like a lot of these answers are outdated.
Best thing I've found out there as of now (summer 2013) is the Redcarpet gem: https://github.com/vmg/redcarpet
To ensure you're getting BlueCloth 2, install like this:
gem install bluecloth
Note that "bluecloth" should be in all lowercase, not camel case.
Source: http://rubygems.org/gems/bluecloth
If you need a fair example of how to use something like Kramdown in a rakefile, there is a repo on GitHub with code and articles in Markdown (.md) that can be converted to HTML with Ruby syntax highlighting. Alas, the output includes line numbers as well, and I would prefer to turn line numbering off.
If anyone knows how to turn off the default line numbering, please tell us.
Anyway the link is https://github.com/elm-city-craftworks/practicing-ruby-manuscripts
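Not taken from that repo, but here is a rough sketch of what a Kramdown conversion step for a rakefile could look like. The helper names `html_path_for` and `convert` are invented for illustration. The line-number setting is an assumption: as far as I know, kramdown 1.x documented a `coderay_line_numbers` option that accepts nil, and later releases moved highlighter settings under `syntax_highlighter_opts`, so check the options page for the version you have installed.

```ruby
# ASSUMPTION: `coderay_line_numbers: nil` disables CodeRay line numbers in
# kramdown 1.x. Newer kramdown versions may expect highlighter settings
# under `syntax_highlighter_opts` instead.
KRAMDOWN_OPTIONS = { coderay_line_numbers: nil }.freeze

# Map "articles/intro.md" (or ".markdown") to "articles/intro.html".
def html_path_for(md_path)
  md_path.sub(/\.(md|markdown)\z/, '.html')
end

# Render one Markdown file to HTML next to it.
def convert(md_path, options = KRAMDOWN_OPTIONS)
  require 'kramdown' # loaded lazily, only when a file is actually converted
  html = Kramdown::Document.new(File.read(md_path), options).to_html
  File.write(html_path_for(md_path), html)
end
```

In a Rakefile this could be wired up with a rule such as `rule '.html' => '.md' do |t| convert(t.source) end`, so that `rake some/article.html` rebuilds only that file.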
I'm interested in the Stanza constituency parser for Italian.
In https://stanfordnlp.github.io/stanza/constituency.html it is said that a new release with updated models (including an Italian model trained on the Turin treebank) should have been available in mid-November.
Any idea about when the next release of Stanza will appear?
Thanks
alberto
Technically you can already get it! If you install the dev branch of stanza, you should be able to download an IT parser.
pip install git+https://github.com/stanfordnlp/stanza.git@704d90df2418ee199d83c92c16de180aacccf5c0
Then, in Python:
import stanza
stanza.download("it")
It's trained on the Turin treebank, which has about 4000 trees. If you download the BERT version of the model, it gets over 91 F1 on the Evalita test set (but has a length limit of about 200 words per sentence).
We might splurge on getting the VIT treebank or something. I've been agitating that we use that budget on Danish or PT or some other language where we have very few users, but it's a hard sell...
Edit: there are also some scripts included for converting the publicly available Turin trees into brackets. Their MWT annotation style was to repeat the MWT twice in a row, which doesn't work too well for a task like parsing raw text.
It is still very much a live task ... either December or January, I would say.
p.s. This isn't really a great SO question....
Is there a Ruby gem that I can use with Ruby or Ruby on Rails that accepts an info hash and returns information on the torrent, such as seeders, leechers, size, etc.?
If not, is there any other way I can get this information using Ruby? Is there an API that I can easily digest?
Thanks in advance.
Take a look at the thepiratebay gem.
It doesn't seem to be actively maintained anymore, but it should solve your problem.
You can find a torrent:
ThePirateBay::Torrent.find("123123123")
You can also sort results by size, seeders, or leechers:
ThePirateBay::SortBy::Size # Size, largest first
ThePirateBay::SortBy::Seeders # Most seeders first
ThePirateBay::SortBy::Leechers # Most leechers first
So why not give it a try?
It really depends what torrents you are talking about. Different torrent trackers have different APIs.
You might want to dig into a specific tracker's API (please be mindful that these are not Ruby APIs):
https://getstrike.net/api/
https://www.npmjs.com/package/thepiratebay
This question is related to this one: Using Delta Indexes for associations in Thinking Sphinx
I have exactly the same dilemma right now. I tried the solution posted by Pat and Claudio but no luck since I'm using Thinking Sphinx version 3.0.6.
I'm using the ts-delayed-delta gem as well.
This is covered here: https://github.com/pat/thinking-sphinx/issues/780 - but the short answer is:
ThinkingSphinx::Deltas::IndexJob.new('product_delta').perform
If you want to queue it up in Delayed Job, though, then the following is what you're after:
Delayed::Job.enqueue(
  ThinkingSphinx::Deltas::DelayedDelta::DeltaJob.new('product_delta')
)
I want to parse a web page (a catalog) with a Ruby library and store the results in a database. I'm finding it hard to choose which library is best for this purpose. I'm familiar with Hpricot, but I'm not really sure it is still cutting edge nowadays.
P.S. Or is there any library for parsing URLs?
Thank you!
I think for HTML parsing nokogiri with open-uri is best.
Why do you care whether a library is "on the edge nowadays"? If you feel confident with Hpricot, then use it. Don't waste your time on endless searching: just start writing the program. That is my answer.
Hehe, I was looking to quote the Hpricot author on this matter, and I found this comment:
Hpricot was the work of the hacker _why who has now disappeared. But
even before he disappeared nokogiri overtook hpricot in performance.
He even tweeted "caller asks, “should i use hpricot or nokogiri?” if
you're NOT me: use nokogiri. and if you're me: well cut it out, stop
being me"
And here is a link to the comment I quoted:
http://news.ycombinator.com/item?id=1955644
Summing this up: go with Nokogiri.
I need some gem/plugin to create an Excel spreadsheet with formulas to use in my Rails application. Any suggestions?
I've used Roo and it's quite good and makes spreadsheet processing easy (once you get all the gem dependencies installed). However, it doesn't support formulas natively. It won't evaluate the formula and return the result (this would be difficult, I think -- use the Excel engine?), but it will give you the text of the formula, for example:
=SUM(.A1,.B1)
It'd be pretty easy to handle this specific case but if you have many different formulas and functions then rolling your own evaluator is going to be difficult. Going and getting A1 and B1 to add them together is very doable with Roo. It's just a question of how complex your formulas are.
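To give an idea of what handling that one case could look like, here is a minimal sketch that evaluates a SUM formula from its text. It does not use Roo itself: `evaluate_sum` is a made-up helper, and the `cells` hash is a hypothetical stand-in for values you would read from the sheet through Roo.

```ruby
# Matches formula text of the form "=SUM(.A1,.B1)".
SUM_FORMULA = /\A=SUM\((.*)\)\z/i

# Evaluate a SUM formula against a lookup of cell reference => numeric value.
# Raises ArgumentError for any formula that is not a plain SUM of cells.
def evaluate_sum(formula, cells)
  match = SUM_FORMULA.match(formula.strip)
  raise ArgumentError, "unsupported formula: #{formula}" unless match

  # ".A1" style references carry a leading dot; strip it before the lookup.
  refs = match[1].split(',').map { |ref| ref.strip.delete_prefix('.') }
  refs.sum { |ref| cells.fetch(ref) }
end

# Hypothetical cell values standing in for data read from the spreadsheet.
cells = { 'A1' => 2, 'B1' => 3.5 }
puts evaluate_sum('=SUM(.A1,.B1)', cells) # prints 5.5
```

Ranges, nested functions, and other operators are not handled, which is exactly where rolling your own evaluator starts to get difficult.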
writeexcel does it wonderfully!
I think you should create a blank Excel file that already contains the formulas and then fill it in from Rails, because you can't create formulas with Ruby.
There's a spreadsheet gem listed on RubyGems but having never used it I can't recommend it.