How do I Make a Web Crawling Application User-Friendly

How do I Make a Web Crawling Application User-Friendly - user-interface

I'm creating a web crawling application that I want the 'average' user to be able to use. I'm concerned that a web crawling application is probably just too complex for most users though because users need to:
Understand URL structure (domain, path, etc...).
Understand crawling 'depth'.
Understand file extensions and be able to setup 'filters' to narrow their crawling to achieve better performance (or they'll be frustrated with the program).
Understand where URLs are found in pages (image srcs, links, plain text URLs, etc...).
What can I do to help users get quickly acquainted with my program? Or even better, what can I do so the program is intuitive enough that users just 'get it'? I know this seems pretty broad, but if you can confine your answers to web crawlers that should help. I've read up on general usability, ui design, etc... but I'm struggling with the domain I'm working in. Thanks.

Just because a web crawler is complex in implementation, doesn't mean it has to be complex to use. Only offer what is really necessary, use sensible defaults for the rest. That will get you 80% of use cases, and then rely on the other 20% being more willing for have a deeper understanding.
Why should they have to understand this? Depends on the expected usage, but I would of assumed most uses where crawling a full website, so only the domain is needed.
Gert G's suggestion of a slider with extending folder structure was a good one. This doesn't have to be dynamic with the site in question, just an illustration of what it means.
Forget exposing file extensions, instead offer common types of file with icons, possibly even grouping them (e.g. all common image types, jpg, png, gif, go into one 'images' type). Only give raw file extension settings under an advanced config section, those that need it will understand it.
I don't really see why they need to understand this? Surely that's a job for the crawler.

Some ideas:
Make an interactive user interface (e.g. a slider for depth, which shows a small picture of folders and subfolders opening as they move the slider)
Avoid clutter. Divide the settings into logical tabs.
Make video tutorials for the things you need to teach them.

Perhaps you could have a picture of "the web" showing two or three pages each from two or three websites. As the user selected where to find links (for example, images, plain text, links, etc), the parts of the page they selected would be highligted in the images.

Related

Wordpress theme/image management

So I've been creating a custom responsive theme in Wordpress and I've hit a wall when it comes to image management. I'd like to style images in a way that wordpress doesn't seem to inherently support - I'm looking for something like this:
with the images added via the regular wordpress media management pane, and inserted into posts/pages. The images should be out of the flow of the content but accurately placed next to the correct headers/text blocks. Most importantly, the images ought to collapse into a column with the rest of the content at the correct media query breakpoints.
Here's what I've tried, from worst to best:
Hard coded images in template files
Obviously the worst option. Not portable, requires a lot of meddling, and would be almost impossible to align the images with the correct content. Also, no real way of making the images responsive with the content.
Use the default image styling and abandon the idea of pulling the images out of the regular flow
Non optimal, but it would allow anyone to change/edit images easily.
Remove images from the results of the_content(), then place and style them separate.
Portable, but has the same problems as #1 - difficult to align the images with content and keep responsiveness.
Use the featured image on pages that only require one image
Pretty good option for pages that need ONLY one image, but there is no easy way to make the featured image an arbitrary size/aspect ratio.
Use markup in the editor to correctly layout the images
Requires anyone editing the posts/pages to have some knowledge of the underlying theme. This seems to work the best, but it isn't portable (might break stuff on theme change).
While I've had the best results with this option, it seems sort of antithetical to using a cms/wysiwyg editor in the first place.
My question is whether or not the last option really is the best to get the result I want?

In the end, the answer was clearly custom fields, and none of the other options I listed. With the advanced custom fields plugin, it becomes a breeze to do what I wanted. You don't need the plugin, but it makes image management a whole lot easier, as it fully integrates the wordpress media library with the custom field (which you would have to do manually otherwise). With the plugin, custom fields meet all of my needs (responsiveness, portability, and ease of use for the technically challenged).

CMS WYSIWYG Editors - What techniques do you use to client-proof these types of pages?

This is a topic that may be considered something not necessarily "programming related"; however, I feel it is since I'm asking for specific techniques.
Essentially, as a web developer, I work with a variety of platforms that include a WYSIWYG editor in the backend (TinyMCE, WYGWAM, etc) and one of the selling points of such systems is that it becomes easier to manage your own content because of these tools.
In theory, sounds great, in practice, not so much.
It can be way too easy for a client to break a layout by using many of the advanced features of a WYSIWYG editor. They can start floating things, setting too much margin/padding, etc.
Generally, I have tried to build any of these types of pages with only some sensible default styles applied to a few of the most common tags, such as setting a font size, colors, some margins, and some text decorations.
I would like to know if anyone has used anything more advanced to essentially turn the output of:
$cms->getContent();
...or equivalent into something that is effectively sandboxed and operates entirely agnostic of any other style/layout elements being used.
As often as possible, I express to clients that they should purchase an HTML/CSS book for Dummies and read it so that they aren't deer in headlights when they click "code view" in a WYSIWYG. But I know they don't do this, nor do they hire anyone who has experience, and it ends up allowing a client more control than they should responsibly have.
Plus, it sucks when you are using their sites as work samples to show others knowing they have the ability to take your beautiful design and development and make it look like crap.

A few things:
I have a standard WYGWAM config that I reuse on new sites by importing the exp_wygwam_configs table.
I keep options very limited in the editors
Areas of the page delineated for images should use a File field, with an image resizer like CE Image used to insure proper size
Client training. Make videos with Camtasia or similar tool if you have to.
Use a custom stylesheet for WYGWAM that has a small subset of styles, so they can choose h2...h4, for example, but not h1 or h5.

After encountering a lot of issues with WYSIWYG editors (which, by the way, never reflect accurately what you "get" in the end), I now prefer to leave only the most basic formatting features in the editor's configuration. For example, take a look at stackoverflow's editor.
It's got the following features: bold, italic, link, quote, pictures, lists, and alignments. The only special feature here are code sample and html, which are targeted to this site's audience. Most of your client don't need them.
I think it's the best approach, because if you give your clients the feeling that they can do whatever they want in the page, but in the end, this content is filtered when the page is rendered, they are going to be really frustrated. Not to mention the fact that the site will be slowed by the filtering process and the need to put the filtered content in cache.
Sometimes the client indeed wants to have a special layout in a page, but I think that can be best done by customizing the CMS so that it fits the client need.

How can I extract images from a site that I'm linking to?

If you're familiar with Reddit, you'll know how all of their posts containing pictures get a small thumbnail preview beside the title of the submission. How does Reddit go about doing that? Does it just check to see if the link ends with .jpg, .png, .bmp, etc?

reddit will try to pull a thumbnail from any source--not just an image URL. This is done firstly by having set rules for specific sites, and secondly by having one generic process for retrieving thumbnails for unknown URLs--and is an automated periodic task.
One of the (many) benefits of reddit is that the source code is open, and if you understand Python, you should check out /r2/lib/scraper.py for a more detailed view at how this process works.
Also, while StackOverflow is a great place to have programming-related questions answered, you might also want to check out reddit's own /r/redditdev for information on reddit development.

Indeed, if the URL contains .jpg, .png,
etc., use that.
If the site is a
popular domain (flickr.com,
youtube.com, amazon.com, etc.), have
a set of predefined rules to extract
something you know will be relevant
(may it be the featured image, YouTube
thumbnail, Amazon product image,
etc.)
Otherwise, if all you have to
work with is some HTML, you'll have to dig it out yourself. You could choose the
first one on the page, the biggest by size,
or even the one you've algorithmically
determined to be the most relevent (e.g. relatively big, inside what you think is the main body content.)
If you have to resort to the last option, one technique I'd recommend is to extract multiple images, and A/B test them to find the one which has the best click-through rate. That way you can nearly always get the best one.

You can check for the content of the <img> tag.

What is needed in a web site's wireframe?

I'm going to be going to be meeting with a number of programmers and custom software companies to get bids on creating a website for a company that I'm involved with. My question is this: What should I prepare for the programmers so that they can give me an accurate bid, timetable, etc. for the development of the website? I have a clear picture of how I would like the site to work and the features that I would like to have included.

I'd suggest using something like balsamiq to put some simple sketches together as suggested elsewhere.
Quite often the act of putting your requirements down on paper in a way that represents the actual site will flush out all manner of issues you hadn't considered before, and will give you a much clearer understanding of what you're after.
Also consider the sources of the data you're displaying. From a functional spec aspect, simply saying something like 'show this figure here' is easy. From a programming point of view, coming up with the figure in the first place is often the hard bit.

The best you can do is to put the end-user's hat on and describe what you'd like the system to look like / work.
Imagine all pages and create a new frame for each one. Make as many annotations as you can so all bidders know exactly what you are expecting.
I'd also add at the end if it's likely the site's requirements to change during development, so everybody's warned in advance.

Details Details Details.
You might think you have a clear picture, you don't. You need to write every single step down no matter how trivial. You will see there are things you haven't thought of.
Try to write down as much information as you can think of. Go through all the scenarios a user would when using your site. Use steps such as
1) User clicks on Buy Button
2) Screen shows up with 4 items, Link to details, price, quantity and a 32x32 thumbnail.
2a) If User clicks on thumbnail full resolution image i s displayed
etc etc.
Don't try to gloss over the "simple" stuff and you will get the most accurate bid possible!

Basically draw out what you want (ie textboxes, drop down lists, controls, etc) in very simple manner. Then add little numbers around each area that has some functionality. In the margins or on another sheet, describe each point you numbered on the controls with simple instructions on how that functionality should work.
Think of it as a skeleton to describe the application you want.

Not a complete list, but here a couple of thoughts:
Do not forget the back button.
Back button behaviour is an issue on every site I've ever worked on. Specify exactly what you want to happen on every page if the user gets to that page by hitting the back button. Often it's easy, but sometimes it is not at all trivial.
Security:
Do people need to log on, how, how do you create accounts, reset passwords etc. What pages need you to be logged on, what happens if you hit those pages without being logged on.

You could read Joel Spolsky's Painless Functional Specifications for ideas, but I've just tried to summarise what that means for web software too.
I usually do this in 3 stages:
a list of contents, under the headings they'll appear on the site. Get this firmly agreed by all parties before doing any wireframes;
a greyscale functional wireframe in plain HTML/CSS, using examples of real-world content and dummy static pages for dymanic content, with everything where it should be. This is the first thing programmers want to see;
a purely visual graphic mockup of each type of page - this is the next thing programmers like to see, as in 'show me how you want it to look and I'll make it happen'.

How to make my applications "skinnable"?

Is there some standard way to make my applications skinnable?
By "skinnable" I mean the ability of the application to support multiple skins.
I am not targeting any particular platform here. Just want to know if there are any general guidelines for making applications skinnable.
It looks like skinning web applications is relatively easy. What about desktop applications?

Skins are just Yet Another Level Of Abstraction (YALOA!).
If you read up on the MVC design pattern then you'll understand many of the principles needed.
The presentation layer (or skin) only has to do a few things:
Show the interface
When certain actions are taken (clicking, entering text in a box, etc) then it triggers actions
It has to receive notices from the model and controller when it needs to change
In a normal program this abstraction is done by having code which connects the text boxes to the methods and objects they are related to, and having code which changes the display based on the program commands.
If you want to add skinning you need to take that ability and make it so that can be done without compiling the code again.
Check out, for instance, XUL and see how it's done there. You'll find a lot of skinning projects use XML to describe the various 'faces' of the skin (it playing music, or organizing the library for an MP3 player skin), and then where each control is located and what data and methods it should be attached to in the program.
It can seem hard until you do it, then you realize it's just like any other level of abstraction you've dealt with before (from a program with gotos, to control structures, to functions, to structures, to classes and objects, to JIT compilers, etc).
The initial learning curve isn't trivial, but do a few projects and you'll find it's not hard.
-Adam

Keep all your styles in a separate CSS file(s)
Stay away from any inline styling

It really depends on how "skinnable" you want your apps to be. Letting the user configure colors and images is going to be a lot easier than letting them hide/remove components or even write their own components.
For most cases, you can probably get away with writing some kind of Resource Provider that serves up colors and images instead of hardcoding them in your source file. So, this:
Color backgroundColor = Color.BLUE;
Would become something like:
Color backgroundColor = ResourceManager.getColor("form.background");
Then, all you have to do is change the mappings in your ResourceManager class and all clients will be consistent. If you want to do this in real-time, changing any of the ResourceManager's mappings will probably send out an event to its clients and notify them that something has changed. Then the clients can redraw their components if they want to.

Implementation varies by platform, but here are a few general cross-platform considerations:
It is good to have an established overall layout into which visual elements can be "plugged." It's harder (but still possible) to support completely different general layouts through skinning.
Develop a well-documented naming convention for the assets (images, HTML fragments, etc.) that comprise a skin.
Design a clean way to "discover" existing skins and add new ones. For example: Winamp uses a ZIP file format to store all the images for its skins. All the skin files reside in a well-known folder off the application folder.
Be aware of scaling issues. Not everyone uses the same screen resolution.
Are you going to allow third-party skin development? This will affect your design.
Architecturally, the Model-View-Controller pattern lends itself to skinning.
These are just a few things to be aware of. Your implementation will vary between web and fat client, and by your feature requirements. HTH.

The basic principle is that used by CSS in web pages.
Rather than ever specifying the formatting (colour / font / layout[to some extent]) of your content, you simply describe what kind of content it is.
To give a web example, in the content for a blog page you might mark different sections as being an:
Title
Blog Entry
Archive Pane
etc.
The Entry might be made of severl subsections such as "heading", "body" and "timestamp".
Then, elsewhere you have a stylesheet which specifies all the properties of each kind of element, size, alignment, colour, background, font etc. When rendering the page or srawing / initialising the componatns in your UI you always consult the current stylesheet to look up these properties.
Then, skinning, and indeed editing your design, becomes MUCH easier. You simple create a different stylesheet and tweak the values to your heat's content.
Edit:
One key point to remember is the distinction between a general style (like classes in CSS) and a specific style (like ID's in CSS). You want to be able to uniquely identify some items in your layout, such as the heading, as being a single identifiable item that you can apply a unique style to, whereas other items (such as an entry in a blog, or a field in a database view) will all want to have the same style.

It's different for each platform/technology.
For WPF, take a look at what Josh Smith calls structural skinning: http://www.codeproject.com/KB/WPF/podder2.aspx

This should be relatively easy, follow these steps:
Strip out all styling for your entire web application or website
Use css to change the way your app looks.
For more information visit css zen garden for ideas.

You shouldn't. Or at least you should ask yourself if it's really the right decision.
Skinning breaks the UI design guidelines. It "jars" the user because your skinned app operates and looks totally different from all the other apps their using. Things like command shortcut keys won't be consistent and they'll lose productivity. It will be less handicapped accessible because screen readers will have a harder time understanding it.
There are a ton of reasons NOT to skin. If you just want to make your application visually distinct, that's a poor reason in my opinion. You will make your app harder to use and less likely that people will ever go beyond the trial period.
Having said all that, there are some classes of apps where skinning is basically expected, such as media players and imersive full screen games. But if your app isn't in a class where skinning is largely mandated, I would seriously consider finding other ways to make your app better than your competition.

Depending on how deep you wish to dig, you can opt to use a 'formatting' framework (e.g. Java's PLAF, the web's CSS), or an entirely decoupled multiple tier architecture.
If you want to define a pluggable skin, you need to consider that from the very beginning. The presentation layer knows nothing about the business logic but it's API and vice versa.

It seems most of the people here refer to CSS, as if its the only skinning option.
Windows Media Player (and Winamp, AFAIR) use XML as well as images (if neccesary) to define a skin.
The XML references hooks, events, etc. and handles how things look and react. I'm not sure about how they handle the back end, but loading a given skin is really as simply as locating the appropriate XML file, loading the images then placing them where they need to go.
XML also gives you far more control over what you can do (i.e. create new components, change component sizes, etc.).
XML combined with CSS could give wonderful results for a skinning engine of a desktop or web application.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio