Pretty easy question I hope: does anyone know of a tool that will effectively scrape sites built with Microsoft Matrix? I could write the code in python, but it will take me way longer than I think I want to dedicate to the task, namely because of the really bad and ugly HTML generated by Matrix.
I have tried Web Harvey, Helium Scraper, and I tried the Web Scraper plugin for Chrome. WebHarvey choked on the HTML and couldn't load subsequent pages. Helium Scraper was able to move from one details page to another (the Next links were followed) but content from within the details pages was not lifted out. The Chrome plugin web scraper was not able to navigate links, with the popup window displaying an error page. My gut is telling me that this has to do with uniquely ASP.net things, but I could be wrong.
Any pointers or suggestions appreciated.
You know there are two completely different versions of Microsoft Web Matrix right? There's the one from 2003; i have no idea what its html looks like. There's the one from 2011 to current which uses razor cshtml source files to produce its html. In the 2011+ one, you write the html by hand; there's no drag and drop, and so it's unlikely you'll get consistent html from site to site.
I was thinking about the applications of web scraping (still quite new to it) and came up with a question. Can you get an image from a page if there are advertisements on the page (like can you avoid advertisements and only look for the correct image content on the page)? Also, if the image is also a link to another page, can you say go to the next page and get that image (and then go from there until you either reach a certain amount or get all of the images)? This would mean avoiding going to the advertisements pages.
Absolutely. If you use a tool like kimonolabs.com this can be relatively easy. You click the data that you want on the page, so instead of getting all images including advertisements, Kimono uses the CSS selectors of the data you clicked to know which data to scrape.
You can use Kimono to scrape data within links as well. It's actually a very common use. Here's a break-down of that strategy: https://help.kimonolabs.com/hc/en-us/articles/203438300-Source-URLs-to-crawl-from-another-kimono-API
This might be a helpful solution for you, especially if you're not a programmer because it doesn't require coding experience. It's a pretty powerful tool.
I think if you are ok with PHP programming then give a look into php simple html dome parser. I have used it a lot and scrapped number of websites.
I want to build an OS X application, in which one of the requirements is for the user to be able to generate PDF output according to a layout that they, the user, will create. Typical items on the page would be things like a corporate logo (a JPEG or PNG), an address (a block of text) and a narrative (another block of text).
I'd like the user to be able to move and resize the items using the mouse to drag handles around on-screen.
Is there an Interface Builder object that will let me do that, or some third-party library that exists for this purpose?
Try GCDrawKit if you're looking for a drop-in solution. It's still in beta (and has been for ages) but you might find it useful.
You seem to be looking for an all-encompassing, self-contained "Pages" control or some sort of reporting suite. That's asking a bit much.
There is nothing in the Cocoa frameworks that gives you this. Unfortunately, there's no Cocoa equivalent of Crystal Reports either. You'll have to roll your own.
I suggest using standard CSS / HTML templates with WebKit. The only drawback is WebKit doesn't yet support CSS pagination, so there's no concept of "8.5"x11" page 1...15" but it's the closest you'll come without writing your own Pages application (NOT an easy project by any stretch of the imagination).
I talked to a friend of mine and he told me that it's possible to create an image in an image editor (gimp/photoshop) and then use it as a button . He said that's the way applications that have great GUIs do it.
He also said that there is a file describing which parts of the image make up the button.
Is this possible , or is he "crazy"? :)
This needs to be clarified with a language of choice, etc. In general, most languages (WinForms, Java AWT/SWT, etc) have an image or background image property that allows you to use images for buttons. There are even skinning frameworks that will let you use images for all controls in an easy-to-define manner.
If you are talking about HTML, there is a button input type that can allow an image to be used as a button for a form.
#Vhaerun
CodeProject is a good place to find lots of skinning libraries. I used this one a long time ago. Winamp is a great example of a skinned application, where users can actually create their own templates to completely change the look of the application without changing code whatsoever. Actually, most media players have some sort of skinning available.
You can do anything, especially since you have no constraints re language, environment, etc.
No he is not crazy, you can use images on almost all GUI tools instead of buttons, they are generally an image on the button, or in some cases you can put the image on the screen and have an onclick event assigned to it.
You haven't been very specific with your question so nobody is able to give you a definitive answer, but here's an attempt to do so without demeaning you:
It's quite common for graphics designers (using tools like photoshop, gimp, etc.) to participate alongside developers for both desktop and web based applications. Web based applications can easily capture information about when an image is clicked and frequently people will either design the button with the text in the image file itself, or use background pictures/borders with plain text on top. There is not standard, per se, on how this is accomplished on the web, but plenty of sites serve as an example (try using Firebug with FireFox to inspect other sites and see how they do things).
If the circumstance at hand is desktop oriented then the answer becomes much more complicated. Skinning is accomplished in many way and, depending on platforms and libraries being used, implementation specifics vary greatly. In it's most simple terms, most GUI frameworks (like GTK, QT, Windows Forms, Windows Presentation Foundation) include a basic picture control, and this control can usually process a "Click" event, which would allow it to function as a button, but if you want different states (pressed, disabled, etc.) you will have to invest more effort in such a thing; you also won't find this method suitable for replacing the rendering of all buttons in an application, but rather something you would do manually for each one, or write your own custom button control that uses your assets specifically.
In terms of a file describing different images that combine as described in the file to override the rendering of the button this would lead me to believe you are either working with an already existent application that is skinable (like Firefox or Winamp) or that he is speaking of some specific UI toolkit. I'm not aware of this functionality being generally available in most of the common system-level UI toolkits.
In the future you may wish to be more specific with your questions.
In HTML, you could do:
<input type="button" src="/path/to/image.png" />
Alternately, assigning an onclick event to an image causes that image to work similarly to a button:
<img src="/path/to/image.png" onclick="function(){doSomething();}" />
If you're talking HTML you can use <input type="image" src="myfile.png" />
Specifications here
Imagemaps I guess.
No seperate file describes the map, it is all part of the html document.
http://www.w3schools.com/TAGS/tag_map.asp
What Firefox add-ons do you use that are useful for programmers?
I guess it's silly to mention Firebug -- doubt any of us could live without it. Other than that I use the following (only listing dev-related):
Console2: next-generation error console
DOM inspector: as the title might indicate, allows you to browse the DOM
Edit Cookies: change cookies on the fly
Execute JS: ad-hoc Javascript execution
IE Tab: render a page in IE
Inspect This: brings the selected object into the DOM inspector
JSView: display linked javascript and CSS
LORI (Life of Request Info): shows how long it takes to render a page
Measure IT: a popup ruler.
URL Params: shows GET and POST variables
Web Developer: a myriad of tools for the web developer
Here are mine (developer centric):
FireBug - a myriad of productivity enhancing tools, includes javascript debugger, DOM inspector, allows you to edit the CSS/HTML on the fly which is highly valuable for troubleshooing layout and display problems.
Web Developer - again another great developer productivity tool. I mostly use it for quickly validating pages, disabling javascript (yes I disable javascript sometimes, don't you?), viewing cookies, etc.
Tamper Data - lets you tamper with http headers, form values, cookies, etc. prior to posting back to a page, or getting a page. Incredibly valuable for poking and prodding your pages, and seeing how your web app responds when used with slightly malicious intent.
JavaScript Debugger - has a few more features than javascript debugger provided by firebug. Although I must admit, I sparingly use this one since firebug has largely won me over.
Live HTTP Headers - invaluable for troubleshooting, use it frequently. Lets you spy on all HTTP headers communicated back and forth between client and server. It has helped me track down nefarious problems, especially when debugging issues when deploying your web app between environments.
Header Spy - nice addon for the geeky types, shows you the web server and platform a web site runs on in the status bar.
MeasureIt - I don't use this all too frequently, but I've still found it valuable from time to time.
ColorZilla - again, not something I use all that frequently, but when I need it, I need it. Valuable when you want to know a color and you don't want to dig through a CSS file, or open up a graphics editing app to get a color embedded in some image.
Add N Edit Cookies - this has been a great debugging tool in web farms where the load balancer writes a cookie, and uses the cookie value to keep your session "sticky". It allowed me to switch at will between servers to track down problems on specific machine. Also a good tool if you want to try to mess with a site that uses cookies to track your login status/account, and you want to see how your code responds to malformed or hacked info.
Yellowpipe Lynx Viewer Tool - yeah I know what your thinking, lynx, who needs it, its so 1994. But if you are developing a site that needs to take web accessibility into account (meaning accessible to users with visual impairments who use screen readers), or if you need to get a sense of how a web spider/indexer "sees" your site, this tool is invaluable. Granted, you could always just go out and grab Lynx for yourselfhere's the windows xp port that I use.
I've got a handful of other addons that I've used from time to time that I'll just quickly mention: FireFTP (one I installed wasn't stable and I've not tried a newer release), Html Validator (also found this one unstable, least back when I installed like a year ago), IE Tab (I usually just have both IE and FireFox open concurrently, but that is just me, I know many others that find this addon useful).
I'd also recommend the Web Developer extension by Chris Pederick.
As far as web development, especially for javascript, I find Firebug to be invaluable. Web developer toolbar is also very useful.
The ones I have are...
Y-SLow
Live Headers
Firebug
Dom Inspector
One that wasn't mentioned yet is this HTML Validator extension that I found very useful.
#Flávio Amieiro
MeasureIt is an unnecessary extension to have if you install the Web Developer Toolbar. Web Developer Toolbar includes a ruler as one of its features. Under the "Miscellaneous" category for Web Developer click the option "Display Ruler" to use a ruler identical to the MeasureIt one.
That will allow you reduce the number of extensions needed by at least one.
Firefox addons:
FireBug:helps web developers and designers test and inspect front-end code. It provides us with many useful features such as a console panel for logging information, a DOM inspector, detailed information about page elements, and much, much more.
Web Developer-gives you the power disable CSS, edit CSS on the fly, measure certain areas of a page and much more.
ColorZilla
Just click on the icon, hover over the area you'd like to know the hex color for, and click.
Window Resizer
to make sure the layout is displayed properly in the standard resolutions of today.
Total Validator
validating websites much easier by checking HTML, links, CSS and doing a lot more.
Web Developer for web development. Scribefire if you're a blogger-progammer
For web developing I use the Web Developer Toolbar, CSS Viewer and MeasureIt.
But I'm really not one of those who has a thousand of extensions to do everything. I like to keep things simple.
EDIT: Thanks to Dan's answer I don't need MeasureIt anymore. Can't believe I've never seen that! I guess I'll just have to pay more atention to this WebDeveloper toolbar.
Adding to everyones lists, Tamper Data is quite useful, lets you intercept requests and change the data in them.
It can be used to bypass javascript validation and check whether the server side is doing its thing.
I use Web Developer, it's a real time saver.
+1 for LORI ("life-of-request-info"). It's a very convenient alternative for rough measurements of the load time of a particular web page -- the kind of thing that you might otherwise use an external stopwatch for.
New Tab Homepage. Combined with a "speed dial"-type homepage (a personal, fast-loading page of links that you use frequently), helps you get where you're going faster when you open a new browser tab.
LastTab. Changes the behavior of Ctrl+Tab to let you navigate back and forth between your most-recently-used tabs with repeated presses of Ctrl+Tab, the same way that Alt+Tab works in Windows. Also provides a nice view of all open tabs while Ctrl is still being held down for easy navigation. (The resultant behavior is very similar to the Ctrl+Tab behavior in recent releases of Visual Studio.)
FireFTP is good for grabbing/uploading any necessary files.
I find Hackbar to be quite useful. Very useful if you want to edit the querystring part of the url, to test for vulnerabilities, or just general other types of testing where you might end up with complicated query string values.
I was learning DOM inspector, but I've switched to Firebug.
Some of which has been missed above are here
Load Time Analyzer – View detailed graphs of the loading time of web pages in firefox. The graphs display events like page requests, image loading times etc.
Poster – A must have tool for web developers enabling them to interact with web services and other web resources.
Aardvark – A cool extension for web developers and designers, allows them to view CSS attributes, id, class by highlighting page element individually.
Fiddler is a really great debugging proxy. Think of it as a more powerful version of the "Net" panel in Firebug or the Live HTTP headers.
It used to be an IE-only extension, now it also has hooks into Firefox.
Groundspeed, is useful for testing server side code. It was created for input validation tests during pentest, but can be useful for any test that require manipulating input (similar to TamperData).
It lets you control the form elements in the page, you can change their type and other attributes (size, lenght, javascript event handlers, etc). So for example you can change a hidden field or a select to a textbox and then enter any value to test the server response and stuff like that.