Dynamic scraping and parsing [closed] - ruby

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Original Question rephrased:
I know a good amount of PHP, JS, CSS and OOP, and have recently honed my regex skills by using the vim editor's netrw and elinks plugins to download a series of web pages (about a million lines) that were parsed and made ready for uploading into my website. I work on a Linux/Ubuntu system with a localhost setup, and this particular project uses the Concrete5 CMS, which is written in PHP.
Seeing the benefits of scraping and parsing information, I would like to have my site dynamically perform this function, though on a much smaller scale; such as, enabling my new user to transfer their personal information from another website into mine - which will typically be under a secure connection (though not always) and password.
Question: What is the best tool (scripting language) to use for this? I do not know either Perl or Ruby, but I believe either one would be a good choice. I have also heard AWK and sed mentioned. I'm sure I can figure out HOW to do it once I begin studying the language. I would really appreciate some experienced input on which language would be the best to invest my time in learning.
Thanks for your help.

I would strongly recommend Ruby and Capybara for web scraping. (See the non-test related examples toward the bottom of the capybara page). Reasons:
Simple, short scraping syntax, cookie support, js support.
Ruby has many other uses, a friendly syntax, and an active job market.
Capybara supports multiple drivers. You can run a real browser visibly, or run one headlessly (invisibly), so JavaScript-driven sites work. With the same code, you can switch to a driver that makes plain HTTP requests with no JS (mechanize) for speed. This lets you overcome many hurdles (needing to run JS/Ajax, needing to see the interaction, etc.) by changing a single line of code (Capybara.current_driver = :some_driver).
Drivers: Capybara-Webkit, Capybara-Mechanize
Ability to use CSS or XPath selectors, whichever you're comfortable with.
Active development, and an ecosystem growing rapidly around the underlying technologies.
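To make the selector point concrete, here is a minimal sketch of pulling fields out of a page with XPath. It deliberately uses REXML, which ships with Ruby, rather than Capybara, so it runs without extra gems. The markup, the `profile` id and the `field` helper are all made up for illustration. REXML only accepts well-formed XML, so real-world HTML would normally go through an HTML-aware parser instead.

```ruby
require 'rexml/document'

# Hypothetical, well-formed markup standing in for a fetched profile page.
PAGE = <<~HTML
  <html>
    <body>
      <div id="profile">
        <span class="name">Ada Example</span>
        <span class="email">ada@example.com</span>
      </div>
    </body>
  </html>
HTML

doc = REXML::Document.new(PAGE)

# Hypothetical helper: select one span inside the profile block by its
# class attribute and return its text, or nil when nothing matches.
def field(doc, css_class)
  node = REXML::XPath.first(doc, "//div[@id='profile']/span[@class='#{css_class}']")
  node && node.text
end

profile = { name: field(doc, 'name'), email: field(doc, 'email') }
puts profile.inspect
```

The same XPath expressions would carry over to a scraping library that accepts XPath; only the fetching and parsing layer changes.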

Perl has two very nice ready-to-use tools for scraping that I know of: Web::Scraper and Scrappy. Both can work with CSS3 and XPath selectors for identifying elements. Scrappy builds on Web::Scraper and adds integrated scraping and crawling, with a nice URL-matching system to select the links to follow to gather more information (whereas Web::Scraper works with a single document). Scrappy moves between pages using the well-established and robust WWW::Mechanize library, which is smart, reliable, and aware of authentication and cookies.
If you want to get into the lower level yourself, there are a lot of good tools to build on, including the aforementioned WWW::Mechanize, HTML::TreeBuilder, HTML::TreeBuilder::XPath, HTML::TableExtract and more.

Related

Ruby tools to set up a personal website [closed]

Closed 7 years ago.
I'm trying to find a good set of tools to be able to implement my personal website.
The must have:
The site or its generator must be Ruby based
It must be easy to deploy and maintain
The nice to have:
It should be typographically clean and beautiful
It should have html5/css3 capabilities
I was thinking about having a go directly with rails 3 but it seemed somehow overkill.
EDIT
The content will be a mix of portfolio and blogging.
What are you Rubyists using? Is it working well?
You didn't really specify how exactly your site is going to be in terms of static/dynamic content etc, so all one can really do is list some options:
Sinatra
Padrino
Ramaze
nanoc
Stasis
Camping (thanks fl00r)
At work we use Rails; for my private projects I tend to use Sinatra and am very happy with its minimalism. I am, however, planning to do something with Padrino soon, since it seems to be positioned in a nice niche between Sinatra and Rails.
I'm currently using Nanoc, and I'd definitely recommend starting with a static site generator. This almost completely cuts out many types of issue. It also enables you to store your content as text files on a filesystem, rather than dealing with a database and special editor interfaces.
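As a rough illustration of the "content as text files" idea, here is a minimal sketch of a static generator using only Ruby's standard library. It is not how nanoc works internally. The layout, the `render_page` and `build_site` helpers, and the convention that the first line of each file is the title are all assumptions made for the example.

```ruby
require 'erb'
require 'fileutils'
require 'tmpdir'

# Hypothetical page layout; real generators have their own layout systems.
LAYOUT = ERB.new(<<~HTML)
  <!DOCTYPE html>
  <html>
    <head><title><%= title %></title></head>
    <body>
  <%= body %>
    </body>
  </html>
HTML

# Turn one text file into HTML: the first line is the title and
# blank-line-separated blocks become paragraphs.
def render_page(text)
  title, *rest = text.strip.split("\n")
  paragraphs = rest.join("\n").split(/\n{2,}/).map(&:strip).reject(&:empty?)
  body = paragraphs.map { |para| "<p>#{ERB::Util.html_escape(para)}</p>" }.join("\n")
  LAYOUT.result_with_hash(title: ERB::Util.html_escape(title), body: body)
end

# Render every .txt file in content_dir to an .html file in output_dir
# and return the list of files written.
def build_site(content_dir, output_dir)
  FileUtils.mkdir_p(output_dir)
  Dir.glob(File.join(content_dir, '*.txt')).sort.map do |path|
    target = File.join(output_dir, File.basename(path, '.txt') + '.html')
    File.write(target, render_page(File.read(path)))
    target
  end
end

# Demo in a throwaway directory so nothing is left behind.
Dir.mktmpdir do |dir|
  content = File.join(dir, 'content')
  FileUtils.mkdir_p(content)
  File.write(File.join(content, 'about.txt'), "About me\n\nI build things & write about them.\n")
  pages = build_site(content, File.join(dir, 'output'))
  puts File.read(pages.first)
end
```

Because the output is plain HTML files, deploying amounts to copying the output directory to any web server.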
If you need server-side programming then move up to something that uses Git as the storage, again really to avoid locking your content into a database.
It's well worth looking at Compass to help you with the CSS - Compass will work with whatever you choose. Compass does require you to spend a little time learning it, but can make CSS much easier in the longer term. For example, it has helpers that let you set up CSS3 effects.
Jekyll seems to be what the cool kids use these days. It's a generator, not a CMS.
You can find lots of "open source" sites online with various setups (see here)
My Solution
After too much thinking I ended up using Nesta CMS as envisioned in this Peepcode blog article.
Actually I use a homemade SCSS version of http://semantic.gs and the HTML5 Boilerplate layout.
Nesta is now plugin capable and has two wonderful plugins available:
Blogazine which helps you obtain the peepcode blog solution
Maldini which generates citations and reference lists from BibTeX files
Thanks everyone for sharing your thoughts.

UI Automation Testing Tools [closed]

Closed 11 months ago.
I'm working on UI automation.
We are using the following tools.
Bewildr
Snoop
Our WPF application uses a custom framework developed by the company. Many of the buttons are generated dynamically. For example, the controls that have ID guids, get new ID guids every time I run the program. Many controls don't have names.
Are there any other tools which might be worth a look?
Is this commercial or personal - i.e. do you have a budget? That'll affect whether you might consider the Mercury or HP suites, or just go straight to open source ;)
http://en.wikipedia.org/wiki/List_of_GUI_testing_tools provides a good list of GUI testing tools. AutoIT is nice and easy to learn and use, especially if you're a coder anyway. Phantom AL and IcuTest are both useful for WPF applications.
If you have a budget, there's not much better than the Mercury/HP toolsets - QTP (QuickTest Pro) and WinRunner - the former uses VBScript while the latter uses a custom Test Script Language - very clever for quickly writing tests.
I won't provide links to them all as the Wiki article already has that, but I hope that helps.
As for targeting the names, hypothetically you could work out the order in which the controls are loaded and tab through them that way, ignoring names and GUIDs. Alternatively you could send clicks to targeted coordinates on the app if you know where the buttons are going to be.
Mark,
There's nothing you mention that bewildr can't already do. Even if you don't know the name, id or even the type of object, you can always get elements dynamically using the .children method... See this for a brief intro: http://www.natontesting.com/2010/11/27/bewildr-0-1-7/
...and here for code examples:
https://github.com/natritmeyer/bewildr/blob/82cd1e907484583be26bc22024ca6a8f34c0d6a4/features/step_definitions/hierarchy_steps.rb
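The idea of locating a control by its position in the element tree, rather than by a name or GUID, can be sketched in plain Ruby. This is not Bewildr's API: the `Element` class, `descendants` and `nth_of_type` are hypothetical stand-ins, and the random IDs simply mimic GUIDs that change on every run.

```ruby
require 'securerandom'

# Hypothetical stand-in for a UI automation element.
class Element
  attr_reader :type, :automation_id, :children

  def initialize(type, children = [])
    @type = type
    @automation_id = SecureRandom.uuid # differs on every run, like the GUIDs in the question
    @children = children
  end
end

# Depth-first list of all descendants, in a stable tree order.
def descendants(element)
  element.children.flat_map { |child| [child] + descendants(child) }
end

# Find the nth (0-based) descendant of a given type, ignoring IDs and names.
def nth_of_type(root, type, index)
  descendants(root).select { |el| el.type == type }[index]
end

# Build the same hypothetical window layout, as two separate program runs would.
def build_window
  Element.new(:window, [
    Element.new(:panel, [Element.new(:button), Element.new(:text_box)]),
    Element.new(:panel, [Element.new(:button)])
  ])
end

first_run = build_window
second_run = build_window

# The IDs differ between runs, but the second button sits at the same place in both trees.
puts nth_of_type(first_run, :button, 1).automation_id
puts nth_of_type(second_run, :button, 1).automation_id
```

This only works while the layout of the window stays stable, so it is worth keeping such lookups in one place where they are easy to update.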
@Jon Abaca
As far as I know, it depends on which interfaces you are going to test (mobile or web) and whether the people testing those applications know how to code.
If your team has less coding knowledge and needs mobile testing, cross-browser testing and CI/CD, you can go with Katalon Studio; yes, it's free.
Otherwise, it is better to go with Selenium.
https://github.com/last-hit-aab/last-hit is a UI automation testing tool that lets Chrome developers test their websites without changing test scripts.

Which language should I use to program a GUI application? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 6 years ago.
I would like to write a GUI application for managing information (text documents). In more detail, it should be similar to TiddlyWiki. I would like it to have some good visual effects (such as a nice representation of tree structures that you can rotate, and some sound). I would also like to include some communication via the Internet (for sharing and collaboration). It should include some features of applications such as a web browser, a word processor, and Skype.
Which programming language should I use?
I like the idea of using JavaScript (like TiddlyWiki). The good thing about that is that users do not have to install anything. They open a file in a browser and it works! The bad thing is that JavaScript cannot communicate via the Internet with other applications.
I think the choice of programming language, in my case, is conditioned by 2 things:
What can be done with this programming language (which restrictions are there).
How easy it is to program in. I would like to have "blocks" that can do a lot of things (rather than programming them myself and, in this way, "reinventing the bicycle").
ADDED:
I would like to make it platform independent.
There is no simple solution in 2010.
If you want to make your GUI platform independent, you have these options:
Run it as a JavaScript application inside the browser with a server running a program + database you like. Hard to get to work but the simplest solution for your users. There are good editors like CKEditor but they use HTML underneath, and sometimes, they are slow or weird. Also, they are absolutely unsuited for large amounts of text.
Use Java. Java is available for many platforms but not all. It comes with a UI framework called Swing that could be better. Java offers a huge set of frameworks and libraries. Most are free to use but it will take some time for you to select the best ones in your case. Plus: So far, there are no good text editor components in Java. So you either have to buy one or you must live with some ... oddities.
Use .NET/Mono. Not available right away for many platforms but you can find binary installers for Mono for the major ones (Linux and Mac) and Mono is available as source, so your fans can build versions for their favorite OS themselves. There are pretty good editor components for .NET but almost everything for .NET is either not free (as in freedom) or costs money.

Headless, scriptable Firefox/Webkit on linux? [closed]

Closed 3 years ago.
I'm looking to automate some web interactions, namely periodic download of files from a secure website. This basically involves entering my username/password and navigating to the appropriate URL.
I tried simple scripting in Python, followed by more sophisticated scripting, only to discover this particular website is using some obnoxious JavaScript- and Flash-based mechanism for login, rendering my methods useless.
I then tried HTMLUnit, but that doesn't seem to want to work either. I suspect use of Flash is the issue.
I don't really want to think about it any more, so I'm leaning towards scripting an actual browser to log in and grab the file I need.
Requirements are:
Run on a Linux server (i.e. no X running). If I really need to have X I can make that happen, but I won't be happy.
Be reliable. I want to start this thing and never think about it again.
Be scriptable. Nothing too sophisticated, but I should be able to tell the browser the various steps to take and pages to visit.
Are there any good toolkits for a headless, X-less scriptable browser? Have you tried something like this and if so do you have any words of wisdom?
What about phantomjs?
I did a related task with an embedded IE browser (although it was a GUI application with a hidden browser component panel). Actually, you can take any layout engine and cut out the output logic. Navigation should be done by firing script-like events.
You can use Crowbar. It is a headless version of Firefox (Gecko engine). It turns the browser into a RESTful server that can accept requests ("fetch url"). It parses the HTML, represents it as a DOM, and waits a defined delay for all scripts to run.
It works on Linux. I suppose you can easily extend it for your goal using JS and the rich abilities of XULRunner.
Have you tried Selenium? It will allow you to record a usage scenario, using an extension for Firefox, which can later be played back using a number of different methods.
Edit: I just realized this was a very late response. :)
Have a look at WebKitDriver. The project includes a headless implementation of WebKit.
I don't know how to do flash interactions (and am also interested), but for html/javascript you can use Chickenfoot.
And to get a headless + scriptable browser working on Linux you can use the Qt webkit library. Here is an example use.
To accomplish this, I just write Chrome extensions that post to CouchDBs (example and its Futon). Add the Couch to the permissions in the manifest to allow cross-domain XHRs.
(I arrived at this thread in search of a headless alternative to what I've been doing; having found this thread, I'm going to try Crowbar at some point.)
Also, considering the bizarre characteristics of this website, I can't help wondering whether you can exploit some security hole to get around the Flash and Javascript.

Best way to create GOOD LOOKING, multi-platform, desktop Ruby apps? [closed]

Closed 2 years ago.
I've got an idea for an idiotically simple application, one that converts HAML and SASS into HTML & CSS files for the user by watching directory changes (like Compass). Almost all the components are already available in the community, I just need to figure out what to use for the front-end.
The catch:
It must be:
a standalone app (i.e. users must NOT be required to install Ruby or HAML),
that looks good,
and is available on several platforms (Linux, Mac, Windows).
So far I know very little about:
RubyScript2Exe: which packages ruby applications for you
Adobe AIR: desktop-style web-development...but is it easy to integrate with Ruby?
Adobe Flex: Is this only for web-based development?
Java / JRuby: (I get scared just thinking about it)
FXRuby: a ruby GUI toolkit which is unfortunately too old-fashioned (read 'ugly') to attract the audience I'm looking to target (designers and HTML developers....no, I'm not planning to charge for it, just want to make an attractive app)
Shoes: Another Ruby-based GUI toolkit that may or may not suffice...is there a GUI builder for this?
Of course, other options are more than welcome.
If you provide an answer, please be kind enough to also leave a link to a good starter tutorial that integrates Ruby and your technology of choice?
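The directory-watching part of the idea is independent of whichever GUI toolkit gets picked, and can be sketched with the standard library alone. This is a polling approach under stated assumptions: `snapshot` and `changed_files` are hypothetical helper names, the file pattern is a guess at what the app would watch, and the actual HAML/SASS conversion step is left out.

```ruby
require 'tmpdir'

# Map each matching file in dir to its last-modified time.
def snapshot(dir, pattern = '*.{haml,sass}')
  Dir.glob(File.join(dir, pattern)).to_h { |path| [path, File.mtime(path)] }
end

# Files that are new, or whose modification time differs from the previous snapshot.
def changed_files(before, after)
  after.reject { |path, mtime| before[path] == mtime }.keys.sort
end

# Demo in a throwaway directory.
Dir.mktmpdir do |dir|
  page = File.join(dir, 'index.haml')
  File.write(page, '%h1 Hello')
  before = snapshot(dir)

  # Simulate an edit by moving the modification time forward, then add a new file.
  later = Time.now + 60
  File.utime(later, later, page)
  File.write(File.join(dir, 'style.sass'), "body\n  margin: 0\n")

  changed = changed_files(before, snapshot(dir))
  puts changed.map { |path| File.basename(path) }.inspect
end
```

A real watcher would repeat the snapshot in a loop with a short sleep and hand each changed file to the converter; polling trades a little latency for working the same way on every platform.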
I recently had to decide on a Windowed front end for a simple app. I looked into FXRuby, TKRuby, Shoes and WXRuby.
Shoes was the only one that helped me make my app. The rest were (probably) more powerful but the cost in complexity (compared to Shoes) seemed vast. I had never had to sit down and work with a big ugly window API before and didn't want to learn one just to achieve my simple report generator. It wasn't clear how to take code for these API's and reliably generate an executable. Shoes' built in packager works nicely for me.
The one problem I had with Shoes was the trouble getting documentation. I eventually learned that running shoes -m launches a shoes app which acts as a very useful manual. The official tutorial is a worthwhile (and short) read. That's located here.
Shoes served me well and will be my first port of call on any simple utility I choose to make in the future.
Have you had a look at Titanium Desktop? It might be what you're looking for.
Oh, hotness flows from my pores about this question. I believe the future of the internet lies over thisaway: Cappuccino. I know it sounds like a plug but I swear, I'm just impressed as hell by 280slides and Atlas. A web framework that's built using Cocoa's interface builder and can be compiled for both Cocoa natively as well as a kickass web page by a simple drop-down box? Hot hot hot. Boiling maybe?
Limelight is another alternative. It's JRuby based and available as a binary install for Windows and OSX, or as a gem for any platform. There is a tutorial and screencast linked on the Limelight homepage.
I haven't used it, but thought it was worth a mention (I did download for Windows, but couldn't get it to launch - I suspect my work proxy is causing problems).
My vote would be for Shoes as well.

Resources