I understand that this is a very broad question and could get flagged, but I need input from experienced programmers, so I will ask it anyway. If there is another forum where I can post this question, please let me know.
Currently we manage all our application information in an Excel spreadsheet. At a high level it contains an app ID, the names of the servers it is hosted on, and the name of the environment. The spreadsheet has become too large to manage, and I am looking to build a simple application to replace it.
Ideally, I would like to write this app on Windows, as everyone here uses Windows, but I don't know how to go about it. I then thought of using MySQL with PHP or Perl (CGI) to build it, but I would also like to explore something new. I read about Joomla and a few other CMS products that make it very easy to build websites, but I am not sure whether they allow me to pull information from a database.
I am seeking input on what would be a good way to build this application.
Joomla! CMS is a good choice, and to pull data from the database you can use web service calls. You will be able to create a CMS website with Joomla and retrieve data from the database easily through web services.
You can add web service support to Joomla by installing the redCORE component.
Component: https://github.com/redCOMPONENT-COM/redCORE
Wiki: http://redcomponent-com.github.io/redCORE/?chapters/webservices/overview.md
Other videos: https://www.youtube.com/watch?v=UzJkC7f9fJE
https://www.youtube.com/watch?v=1NRT5jh3Ewc
Joomla dev group discussion https://groups.google.com/d/msg/joomla-dev-cms/3OctbkIZlQw/5d_1MLrzbgYJ
You can also post questions in the Joomla forum: http://forum.joomla.org/
I think Joomla is a great option for handling large amounts of information. If you already know PHP and don't want to reinvent the wheel, it works well. Data in Joomla is handled through components.
If you want to try it, it is as easy as installing a local copy of Joomla, building the field structure with component-creator.com, installing the generated component, and importing the data into it using phpMyAdmin.
I would like to build a Chrome extension (CE) that pulls data from a Ruby app's database for a specific user. So, in a basic example, if a user submits their favorite color as 'red' and sport as 'tennis' into the database from the core website, then when they click the CE, 'red' and 'tennis' will show up no matter where they are on the internet.
Any guidance on how to build something like this? It seems quite simple, but I am not sure how the CE files fit in with the Ruby folder structure.
Also, is it possible to write to the database from a popped-out CE? i.e., submitting 'red' and 'tennis' from the CE to the database, to go along with the previous example. Any guidance?
Cheers
This is a very general question, so it sounds like you will need to learn a lot, which can be a good thing :)
Here are the general steps you need:
Look into building an API for your Ruby application. This will allow you to get data from your database. For example, you can make an endpoint where a request to http://yoursite.com/api/favorites returns a list of all favorites as JSON. Then, in your Chrome extension, you can parse the JSON and display the results to the user. You will probably want to do this with an AJAX call (see jquery.ajax for an easy way to use AJAX).
Assuming you want user accounts, your user will need to be logged in. Then you can use the user's cookies to verify that they are logged in and show them custom info, i.e., going to http://yoursite.com/api/favorites will show the favorites for that user only, not for everyone.
Finally, submitting things to the database: you can have another route where users can send data. For example, a request to http://yoursite.com/api/favorites/add?color=red would add the color red to that user's favorites. You will need to write the logic for adding things to the database yourself; again, it may help to go through a Rails tutorial and then look at building an API. (A minimal sketch of such an API appears at the end of this answer.)
Related to the previous point, look into RESTful APIs. A good convention is that a GET request asks for data, while a POST request adds data (in your case, creating a new favorite).
Finally, for terminology: it's not a "ruby" database, it's just a database. You can access a database using almost any language, and it sounds like you are accessing it using ruby right now :)
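Here is a minimal sketch of such an API, written with Sinatra and Sequel. The /api/favorites routes, the favorites table, and the session-based user lookup are all assumptions for illustration, not your actual app:

    # Hedged sketch: a tiny JSON API for favorites.
    require 'sinatra'
    require 'sequel'
    require 'json'

    DB = Sequel.sqlite('app.db')   # hypothetical database; use your real connection
    enable :sessions

    # GET returns only the logged-in user's favorites as JSON.
    get '/api/favorites' do
      halt 401, 'not logged in' unless session[:user_id]
      content_type :json
      DB[:favorites].where(user_id: session[:user_id]).all.to_json
    end

    # POST creates a new favorite for the logged-in user (REST-style: POST adds data, as noted above).
    post '/api/favorites' do
      halt 401, 'not logged in' unless session[:user_id]
      DB[:favorites].insert(user_id: session[:user_id],
                            kind: params['kind'],    # e.g. "color"
                            value: params['value'])  # e.g. "red"
      content_type :json
      { status: 'ok' }.to_json
    end

The Chrome extension would then issue an AJAX GET to /api/favorites (sending credentials so the session cookie goes along) and render the parsed JSON.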
If you only need to store data for one machine browsing anywhere online, Chrome has a storage API that would work great.
If you do need a Ruby server, I would recommend looking at Sinatra.
I have an app that runs on Ruby on Rails with a MySQL database, and I need to generate a couple of reports. Presently I generate a CSV file, but my idea is to integrate with Google Docs: as soon as a user opens a particular document, it should automatically sync with my Rails app, fetch the data from the database, and display it in the Google Doc.
I need a little heads-up. I have gone through the docs but couldn't figure out how to do it.
Thanks
Check out the to_google_spreadsheets gem. It is well documented and seems to have all the functionality you are looking for.
Cheers,
Sean
You will need to write an onOpen trigger in your Google Doc (using Apps Script). From there, call UrlFetch against your app.
I got sick of rewriting login forms and user account management pages with the usual use cases of registering a new account, changing the password, and changing the e-mail address, with the associated notification e-mails. (This is for clients that won't accept an OAuth/OpenID solution.) So I am creating a sample site with Sinatra and DataMapper that contains nothing but those features in their most distilled form.
What I'd like to do is package that site into a gem that someone could drop into an existing app and customize. I think it could get tricky because the app defines its own database and web server. So they would have to be redesigned as mix-ins for a Sinatra::App and Datamapper::Model.
Has anyone else tried this?
I created the Ruby gem "accounts" to provide this functionality for web-apps using Sinatra. It can be cloned or forked at https://github.com/lsiden/accounts.
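As a rough illustration of the mix-in idea (this is not how the accounts gem is actually implemented, and all names below are hypothetical), Sinatra's extension API lets a gem inject routes that a host application registers:

    # Hypothetical sketch of packaging account routes as a Sinatra extension.
    require 'sinatra/base'

    module Sinatra
      module AccountPages
        def self.registered(app)
          app.get '/login' do
            erb :login                 # the host app supplies the view
          end

          app.post '/login' do
            # Authentication would go here, against whatever model/ORM
            # the host app configures (DataMapper, Sequel, etc.).
            redirect '/'
          end
        end
      end

      register AccountPages            # classic-style apps pick it up automatically
    end

    # A modular host app opts in explicitly:
    class MyApp < Sinatra::Base
      register Sinatra::AccountPages
    end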
I am looking at writing my own, but I am wondering if there are any good web crawlers out there which are written in Ruby.
Short of a full-blown web crawler, any gems that might be helpful in building a web crawler would be useful. I know this part of the question is touched upon in a couple of places, but a list of gems applicable to building a web crawler would be a great resource as well.
I used to write spiders, page scrapers and site analyzers for my job, and still write them periodically to scratch some itch I get.
Ruby has some excellent gems to make it easy:
Nokogiri is my #1 choice for the HTML parser. I used to use Hpricot, but found some sites that made it explode in flames. I switched to Nokogiri afterwards and have been very happy with it. I regularly use it for parsing HTML, RDF/RSS/Atom and XML. Ox looks interesting too, so that might be another candidate, though I find searching the DOM a lot easier than trying to walk through a big hash, such as what is returned by Ox.
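For example, a quick sketch of pulling links out of markup with Nokogiri (the HTML string here is stand-in content):

    require 'nokogiri'

    html = '<html><body><a href="/about">About</a> <a href="/blog">Blog</a></body></html>'
    doc  = Nokogiri::HTML(html)

    # CSS selectors make it easy to grab just the anchor tags.
    doc.css('a').each do |link|
      puts "#{link.text} -> #{link['href']}"
    end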
OpenURI is good as a simple HTTP client, but it can get in the way when you want to do more complex things or need to have multiple requests firing at once. I'd recommend looking at HTTPClient or Typhoeus with Hydra for modest to heavyweight jobs. Curb is good too, because it uses the cURL library, but the interface isn't as intuitive to me; it's worth looking at, though I lean toward the previously mentioned ones.
Note: OpenURI has some flaws and vulnerabilities that can affect unsuspecting programmers so it's fallen out of favor somewhat. RestClient is a very worthy successor.
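As a hedged sketch, a single fetch with RestClient and a handful of parallel fetches with Typhoeus/Hydra might look like this (URLs are placeholders):

    require 'rest-client'
    require 'typhoeus'

    # One-off request.
    puts RestClient.get('https://example.com').body.bytesize

    # Several requests fired at once via Hydra.
    hydra = Typhoeus::Hydra.new
    requests = %w[https://example.com https://example.org].map do |url|
      req = Typhoeus::Request.new(url, followlocation: true)
      hydra.queue(req)
      req
    end
    hydra.run
    requests.each { |req| puts "#{req.base_url}: #{req.response.code}" }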
You'll need a backing database, and some way to talk to it. This isn't a task for Rails per se, but you could use ActiveRecord, detached from Rails, to talk to the database. I've done that a couple times and it works all right. Instead, I really like Sequel for my ORM. It's very flexible in how it lets you talk to the database, from using straight SQL to using Sequel's ability to programmatically build a query, to modeling the database and using migrations. Once you have the database built, you could use Rails to act as a front-end to the data though.
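A small sketch of Sequel detached from Rails, showing the range from straight SQL to datasets to models (the database and table names are placeholders):

    require 'sequel'

    DB = Sequel.connect('sqlite://crawl.db')   # or postgres://..., mysql2://...

    # Straight SQL when you want it...
    DB.fetch('SELECT 1 AS ok') { |row| puts row[:ok] }

    # ...or programmatically built datasets...
    DB.create_table? :links do
      primary_key :id
      String :href
    end
    DB[:links].insert(href: 'https://example.com')
    puts DB[:links].where(Sequel.like(:href, 'https%')).count

    # ...or full models over the same table.
    class Link < Sequel::Model(:links); end
    puts Link.count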
If you are going to navigate sites in any way beyond simply grabbing pages and following links, you'll want to look at Mechanize. It makes it easy to fill out forms and submit pages. As an added bonus, you can grab the content of a page as a Nokogiri HTML document and parse away using Nokogiri's multitude of tricks.
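A brief Mechanize sketch (the URL and the form/field names are made up for illustration):

    require 'mechanize'

    agent = Mechanize.new
    page  = agent.get('https://example.com/search')

    # Fill out and submit a form, if the page has one by this name.
    form = page.form_with(name: 'search')
    if form
      form['q'] = 'ruby crawler'
      page = agent.submit(form)
    end

    # Every Mechanize page wraps a Nokogiri document, so the usual tricks apply.
    page.parser.css('a').each { |a| puts a['href'] }
    page.links.each { |link| puts link.href }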
For massaging/mangling URLs I really like Addressable::URI. It's more full-featured than the built-in URI module. One thing that URI does that's nice is it has the URI#extract method to scan a string for URLs. If that string happened to be the body of a web page it would be an alternate way of locating links, but its downside is you'll also get links to images, videos, ads, etc., and you'll have to filter those out, probably resulting in more work than if you use a parser and look for <a> tags exclusively. For that matter, Mechanize also has the links method which returns all the links in a page, but you'll still have to filter them to determine whether you want to follow or ignore them.
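For instance (a sketch; as noted above, URI#extract is coarse and you still have to filter what it returns):

    require 'uri'
    require 'addressable/uri'

    # Addressable joins and normalizes URLs more robustly than the stdlib URI.
    base = Addressable::URI.parse('https://example.com/blog/')
    puts base.join('../about?x=1').normalize.to_s   # => https://example.com/about?x=1

    # URI.extract scoops every URL-looking string out of free text,
    # image and ad links included.
    text = 'See https://example.com/logo.png and https://example.com/post'
    puts URI.extract(text, %w[http https])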
If you think you'll need to deal with Javascript manipulated pages, or pages that get their content dynamically from AJAX, you should look into using one of the WATIR variants. There are flavors for the different browsers on different OSes, such as Firewatir, Safariwatir and Operawatir, so you'll have to figure out what works for you.
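A minimal sketch with modern Watir, which drives browsers through WebDriver (the browser choice and headless option are assumptions here, not part of the older Firewatir-era variants above):

    require 'watir'
    require 'nokogiri'

    # Drive a real browser so JavaScript/AJAX content actually gets rendered.
    browser = Watir::Browser.new :chrome, headless: true
    browser.goto 'https://example.com'

    # Hand the rendered DOM to Nokogiri for the scraping itself.
    doc = Nokogiri::HTML(browser.html)
    doc.css('a').each { |a| puts a['href'] }

    browser.close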
You do NOT want to rely on keeping your list of URLs to visit, or visited URLs, in memory. Design a database schema and store that information there. Spend some time up front designing the schema, thinking about what things you'll want to know as you collect links on a site. SQLite3, MySQL and Postgres are all excellent choices, depending on how big you think your database needs will be. One of my site analyzers was custom designed to help us recommend SEO changes for a Fortune 50 company. It ran for over three weeks covering about twenty different sites before we had enough data and stopped it. Imagine what would have happened if we had a power-outage and all that data went in the bit-bucket.
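One way to persist the frontier, sketched with Sequel over SQLite (the schema is only an illustration of the idea):

    require 'sequel'

    DB = Sequel.connect('sqlite://frontier.db')

    DB.create_table? :urls do
      primary_key :id
      String    :url, unique: true, null: false
      TrueClass :visited, default: false
      DateTime  :visited_at
    end

    URLS = DB[:urls]

    # Enqueue a URL unless it has been seen before.
    def enqueue(url)
      URLS.insert_conflict.insert(url: url)   # ignore duplicates (SQLite/Postgres)
    end

    # Pull the next unvisited URL and mark it done; nothing lives only in memory,
    # so a crash or power outage costs you at most the page in flight.
    def next_url
      row = URLS.where(visited: false).order(:id).first
      return nil unless row
      URLS.where(id: row[:id]).update(visited: true, visited_at: Time.now)
      row[:url]
    end

    enqueue('https://example.com')
    puts next_url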
After all that you'll want to also make your code be aware of proper spidering etiquette: What are the key considerations when creating a web crawler?
I am building wombat, a Ruby DSL to crawl web pages and extract content. Check it out on github https://github.com/felipecsl/wombat
It is still at an early stage but already provides basic functionality. More features will be added soon.
So you want a good Ruby-based web crawler?
Try spider or anemone. Both have solid usage according to RubyGems download counts.
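A quick hedged sketch with Anemone (the options and selectors are illustrative):

    require 'anemone'

    # Crawl a site, depth-limited, printing every page URL and its <title>.
    Anemone.crawl('https://example.com', depth_limit: 2) do |anemone|
      anemone.on_every_page do |page|
        title = page.doc.at('title')
        puts "#{page.url} #{title && title.text}"
      end
    end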
The other answers, so far, are detailed and helpful, but they don't have a laser-like focus on the question, which asks for Ruby libraries for web crawlers. It would seem that this distinction can get muddled: see my answer to "Crawling vs. Web-Scraping?"
Tin Man's comprehensive list is good but partly outdated for me.
Most websites my customers deal with are heavily AJAX/Javascript dependent.
I've been using Watir / watir-webdriver / Selenium for a few years too, but the overhead of having to load up a hidden web browser on the backend to render the DOM just isn't viable. On top of that, after all this time they still haven't implemented a usable "browser session reuse" that would let a new code execution reuse an old browser already in memory for this purpose, and tickets that might eventually have worked their way up the API layers keep getting shot down (referring to https://code.google.com/p/selenium/issues/detail?id=18 ). **
The phantomjs gem (https://rubygems.org/gems/phantomjs) is what we're migrating new projects to now, so that the necessary data gets rendered without any sort of invisible, Xvfb-backed, memory- and CPU-heavy web browser.
** Alternative approaches also failed to pan out:
how to serialize an object using TCPServer inside?
Can a watir browser object be re-used in a later Ruby process?
If you don't want to write your own, then use any ordinary web crawler. There are dozens out there.
If you do want to write your own, then write your own. A web crawler isn't exactly a complicated activity; it consists of:
Downloading a website.
Locating URLs in that website, filtered however you dang well please.
For each URL in that website, repeat step 1 (see the sketch below).
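A bare-bones sketch of that loop (no politeness, no persistence, single host only, all of which a real crawler would add):

    require 'open-uri'
    require 'nokogiri'
    require 'set'

    # Naive breadth-first crawl: download, extract links, repeat.
    def crawl(start_url, limit: 20)
      queue   = [start_url]
      visited = Set.new

      until queue.empty? || visited.size >= limit
        url = queue.shift
        next if visited.include?(url)
        visited << url

        html = URI.open(url).read rescue next     # 1. download the page
        doc  = Nokogiri::HTML(html)

        doc.css('a[href]').each do |a|            # 2. locate URLs, filtered as you please
          link = URI.join(url, a['href']).to_s rescue next
          queue << link if link.start_with?(start_url)
        end                                       # 3. each queued URL goes back to step 1
      end
      visited
    end

    puts crawl('https://example.com').to_a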
Oh, and this seems to be a duplicate of "Web crawler in ruby".