Related
I am trying to build a price comparison program for personal use (and for practice) that allows me to compare prices of the same item across different websites. I have just started using the Scrapy library and played around by scraping websites. These are my steps whenever I scrape a new website:
1) Find the website's search url, understand its pattern, and store it. For instance, Target's search url is composed by a fixed url="https://www.target.com/s?searchTerm=" plus the search terms (in parsed url)
2)Once I know the website's search url, I send a SplashRequest using the Splash library. I do this because many pages are heavily loaded with JS
3)Look up the HTML structure of the results page and determine the correct xpath expression to parse the prices. However, many websites have results page in different formats depending on the search terms or product category, changing thus the page's HTML code. Therefore, I have to examine all the possible results page's formats and come up with an xpath that can account for all the different formats
I find this process to be very inefficient, slow, and inaccurate. For instance, at step 3, even though I have the correct xpath, I am still unable to scrape all the prices in the page (sometimes I also get prices of items that are not present in the HTML rendered page), which I dont understand. Also, I dont know whether the websites know that my requests come from a bot, thus maybe sending me a faulty or incorrect HTML code. Moreover, this process cannot be automated. For example, I have to repeat step 1 and 2 for every new website. Therefore, I was wondering if there was a more efficient process, library, or approach that I could use to help me finish this program. I also heard something about using the website's API, although I dont quite understand how it works. This is my first time doing scraping and I dont know too much about web technologies, so any help/advice is highly appreciate!
The most common problem with crawling is that in general, they are determining everything to be scraped syntactically, while conceptualizing the entities you are to be working with helps a lot, I am speaking from my own experience.
In a research about scraping I was involved in we have reached to the conclusion that we need to use a semantic tree. This tree should contain nodes, which represent important data for your purpose and a parent-child relation means that the parent encapsulates the child in the HTML, XML or other hierarchical structure.
You will therefore need some kind of concept about how you will want to represent the semantic tree and how it will be mapped with site structures. If your search method allows you to use the logical OR, then you will be able to define the same semantic tree for multiple online sources.
On the other hand, if the owners of some sites are willing to allow you to scrape their data, then you might ask them to define the semantic tree.
If a given website's structure is changed, then using a semantic tree more often than not you will be able to comply to the change by just changing the selector of a few elements, if the semantic tree's node structure remains the same. If some owners are partners in allowing scraping, then you will be able to just download their semantic trees.
If a website provides an API, then you can use that, read about REST APIs to do so. However, these APIs are probably not uniform.
What 3 things would you tell developers new to XPages to do to help maximize the performance of their XPages apps?
Tim Tripcony had given a bunch of suggestion here.
http://www-10.lotus.com/ldd/xpagesforum.nsf/topicThread.xsp?action=openDocument&documentId=365493C31B0352E3852578FD001435D2#AEFBCF8B111E149B852578FD001E617B
Not sure if this tipp is for beginners, but use any of the LifeCyclePhaseListeners from the OpenNTF Snippets to see what is going on in your datasources during a complete or partial refresh (http://openntf.org/XSnippets.nsf/snippet.xsp?id=a-simple-lifecyclelistener-)
Use the extension Library. Report Bugs ( or what you consider a bug ) at OpenNTF.
Use the SampleDb from the extLib. ou can easily modify the samples to your own need. Even good for testing if the issue you encounter is reproducable in this DB.
Use Firebug ( or a similar tool that comes with the browser of your choice ) If you see an error in the error tab, go and fix it.
Since you're asking for only 3, here are the tips I feel make the biggest difference:
Determine what your users / customers mean by "performance", and set the page persistence option accordingly. If they mean scalability (max concurrent users), keep pages on disk. If they mean speed, keep pages in memory. If they want an ideal mixture of speed and scalability, keep the current page in memory. This latter option really should be the server default (set in the server's xsp.properties file), overridden only as needed per application.
Set value bindings to compute on page load (denoted by a $ in the source XML) wherever possible instead of compute dynamically (denoted by a #). $ bindings are only evaluated once, # bindings are recalculated over and over again, so changing computations that only need to be loaded once per page to $ bindings speed up both initial page load and any events fired against the page once loaded.
Minimize the use of SSJS. Wherever possible, use standard EL instead (e.g. ${database.title} instead of ${javascript:return database.getTitle();}). Every SSJS expression must be parsed into an abstract syntax tree to be evaluated, which is incrementally slower than the standard EL resolver.
There are many other ways to maximize performance, of course, but in my opinion these are the easiest ways to gain noticeable improvement.
1. Use the Script Library instead writing a bulk of code into the Xpage.
2. Use the Theme or separate CSS class for each elements [Relational]
3. Moreover try to control your SSJS code. Because server side request only reduce our system performance.
4. Final point consider this as sub point of 3, Try to get the direct functions from our SSJS, Don't use the while llop and for loop for like document collection, count and other things.
The basics like
Use the immediate flags ( or one of the other flags) on serverside events if possible
Check the Flag which (forgot its name..) generates the css and js as
one big file at runtime therefore minimizing the ammount of
requests.
Choose your scope wisely. Dont put everything in your sessionscope but define when, where and how your are using the data and based on that use the correct scope. This can lead to better memory usage..
And of course the most important one read the mastering xpages book.
Other tips I would add:
When retrieving data use viewentrycollections or the viewnavigstor
Upgrade to 8.5.3
Use default html tags if possible. If you dont need the functionality of a xp:div or xp:panel use a <div> instead so you dont generate an extra uicomponent on the tree.
Define what page persistance mode you need
Depends a lot what you mean by performance. For performance of the app:
Use compute on page load wherever feasible. It significantly improves performance.
In larger XPages particularly, combine code into single controls where possible. E.g. Use a single Computed Field control combining literal strings, EL and SSJS rather than one control for each language. On that point, EL performs better than SSJS, and SSJS on the XPage performs better than SSJS in a Script Library.
Use dataContexts for properties that are calculated more than once on an XPage.
Partial Execution mode is a very strong recommendation, but probably beyond new XPages developers at this point. Java will also perform better than SSJS in a Script Library, but again beyond new developers. XPages controls you've created with the Extensibility Framework should perform better, because they should run fewer lines of Java than multiple controls, but I haven't tested that.
If you mean performance of the developer:
Get the Extension Library.
Use themes to set default properties, e.g. A standard style for all your pagers.
Use Firebug. If you're developing for Notes Client or IE, still use Firebug. You'll spend longer suffering through Client/IE thank you will fixing the few quirks that will remain.
I am trying to learn how to use BDD for our development process and I sometimes end-up writing things that implies a UI design, so for brand new development or new features, the UI does not always exists.
For example, if I say this in a scenario "When a column header is clicked" it implies that this feature is based on some sort of table or grid, but at this point we are still just writing user-stories so there is no UI yet.
That gets me confused to know at what point in the process do we come up with a UI design ?
Keep in mind, I only have read articles about BDD and I think it would help our team a lot but still very new at this! Thx!
If you write your scenarios with a focus on the capabilities of the system, you'll be able to refactor the underlying steps within those scenarios more easily. It keeps them flexible. So I'd ask - what does clicking the column get for you? Are you selecting something? What are you going to do with the selection? Are you searching for something and sorting by a value?
I like to see scenarios which say things like:
When I look for the entry
When I go to the diary for January
When I look at the newest entries
When I look at the same T-shirt in black
These could all involve clicking on a column header, but the implementation detail doesn't matter. It's the capability of the system.
Beneath these high-level scenarios and steps I like to create a screen or page with the smaller steps like clicking buttons in it. This makes it easy to refactor.
I wrote this in a DSL rather than English, but it works with the same idea - you can't tell from the steps whether it's a GUI or a web page, and some of the steps involve multiple UI actions:
http://code.google.com/p/wipflash/source/browse/Example.PetShop.Scenarios/PetRegistrationAndPurchase.cs
Hope you find it interesting and maybe it helps. Good luck!
I guess you can write around that by saying "when I sort the information by X, then..." But then you would have to adjust your scenario to remove any mention of the data being displayed in a grid format, which could lead to some rather obtuse writing.
I think it's a good idea to start with UI design as soon as you possibly can. In the case you mentioned above, I think it would be perfectly valid to augment the user story with sketch of the relevant UI as you would imagine it, and then refine it as you go along. A pencil sketch on a piece of paper should be fine. Or you could use a tablet and SketchBook Pro if you want something all digital.
My point is that I don't see a real reason for the UI design to be left out of user stories. You probably already know that you're going to build a Windows, WPF, or Web application. And it's safe to assume that when you want to display tabular data, you'll be using a grid. Keeping these assumptions out of the requirements obfuscates them without adding any real value.
User stories benefit from the fact, that you describe concrete interactions and once you know concrete data and behaviour of the system for it, you might as well add more information about the way you interact. This allows you to use some tools like Cucumber, which with Selenium enables you to translate a story to a test. You might go even further and e.g. for web apps capture all pages you start concrete story at and collect all interactions with that page resulting in some sort of information architecture you might use for documentation or prototyping and later UI testing.
On the other hand, this makes your stories somewhat brittle when it comes to UI changes. I think the agile way of thinking about this is same as when it comes to design changes - do not design for the future, do the simplest possible thing, in the future you might need to change it anyway.
If you stripped your user stories of all concrete things (even inputs) you will end up with use cases(at least in their simplest format, depends on how you write your stories). Use cases are in this respect not brittle at all, they specify only goals. This makes them resistant to change, but its harder to transfer information automatically using tools.
As for the process, RUP/UP derives UI from use cases, but I think agile is in its nature incremental (I will not say iterative, this would exclude agile methods like FDD and Kanban). This means, as you implement new story, you add to your UI what is necessary. This only makes adding UI specifics in stories more reasonable. The problem is, that this is not a very good way to create UI or more generally UX(user experience). This is exactly what one might call a weakpoint of agile. The Agile manifesto concentrates on functional software, but that is it. There are as far as I know no agile techniques for designing UI or UX.
I think you just need to step back a bit.
BAD: When I click the column header, the rows get sorted by the column I clicked.
GOOD: Then I sort the rows by name, or sometimes by ZIP code if the name is very common, like "Smith".
A user story / workflow is a sequence of what the user wants to achieve, not a sequence of actions how he achieves that. You are collecting the What's so you can determine the best How's for all users and use cases.
Looking at a singular aspect of your post:
if I say this in a scenario "When a column header is clicked" it implies that this feature is based on some sort of table or grid, but at this point we are still just writing user-stories so there is no UI yet.
If this came from a user, not from you, it would show a hidden expectation that there actually is a table or grid with column headers. Even coming from you it's not entirely without value, as you might be a user, too. It might be short-sighted, thinking of a grid just because it comes from an SQL query, or it might be spot-on because it's the presentation you expect the data in. A creative UI isnÄt a bad thing as such, but ignoring user expectations is.
It happens sometimes that one feature seems to belong to more than one place.
Trivial example, let's say I've got the following menus :
File
Pending orders
Accepted orders
Tools
Help
I've got a search feature, and the same search window work for both pending and accepted orders (it's just an 'order status' combo you can change)
Where does this search feature belongs?
The Tools menu seems to be a good choice, but I'm afraid the users may expect the search accepted orders to be in the accepted orders menu, which would make sense
Duplicating the menu entry in both pending and accepted order seems wrong to me.
What would you do? (And let's pretend we cannot merge the two orders menu into one single menu)
I think the problem you've run into is that you're thinking like a programmer. (code duplication bad). I'm not faulting you for it, I do the same thing. Multiple paths to the same screen, or multiple ways to handle the same process can actually be extremely beneficial. I would guess that more than one person is going to use your program and each probably have slightly different job functions. In essence, they have different needs for the application and will approach using it different ways. If you stick to the all items have one way of being accessed, some people will find the application beneficial and others won't. Sure all people can learn to do a task a certain way, but it won't make sense to some users. It's not intuitive (read familiar) to they way they are used to processing information, which means the application will ultimately be less beneficial to them. When people find a process (program etc.) frustrating, they won't adopt it. They find reasons why the process will need to be changed or abandoned.
An excellent example of the multiple approaches to a problem is with Adobe Photoshop. Normally there are at least 2 different ways to access a function. Most users only know of one, because that's all they are concerned with, but most users are really comfortable with using one, because it makes the most sense to them. With a little extra work, Adobe scored a huge win, because more people find their product intuitive.
Having a feature in multiple locations is not a bad thing. Consider the overall workflow for viewing both pending orders and accepted orders, and think of your new feature as a component, rather than a one-off entity.
After you map out exactly what tasks a user completes in the pending and accepted order viewing process, see where having the ability to search would provide value (by shortening the workflow or otherwise). This is where your search component belongs.
The main thing to remember about UI is that all that really matters in the end is whether your design makes using your application or site a better experience for your users.
In the search example you list above you'll commonly see apps take two approaches:
Put the search feature in a single location and allow the user to filter the search by selecting pending or accepted, or
Put the search feature in both menus, already configured for the type of search to be done based on the menu it was launched from.
If you repeat the above choice for a number of factors you'll see a much more advanced (aka 'complicated') search interface for number one, and a much simpler (aka 'restrictive') search interface for number two.
Which one is best completely depends on your users. This is why many general applications have a simple search by default and a link to a more advanced search for those that want or need the additional capabilities; they're attempting to make everyone happy. There is absolutely nothing wrong with that if you're writing for a wide variety of people with different needs. If you're writing for a set of users with a restricted set of needs however, you can make some better choices.
In my experience your best bet is to work with one or two of your primary users and map out all of the steps they need to take to get each of the tasks the application will be helping them with accomplished. If there aren't a lot of branching points in that sequence of steps there shouldn't be a lot of choices or settings to make in the application; otherwise the users may feel that the app is harder to work with than it needs to be.
For the search example above, if the user has already navigated into the Pending Orders menu, the likelihood that they'll want to launch a search for Accepted Orders is very small and having to make that choice, or go elsewhere to do the search, will be an extra decision or action they'll need to take. Basic principle is if your user has already made a decision, use it; don't make them tell you again.
Use the UI you come up with as a first cut. Let your users, or a subset of them, try it out and make suggestions. If you have the option, watch them use it. You'll learn far more about how to improve the interface by seeing how they work with it than you will from what they tell you.
Generally you do not want the same menu item appearing in different menus. It adds complexity and clutter to the menu, and users will wonder if the two menu items are really the same or not. When it appears that a menu item belongs in two places, then you may have a more basic problem with your menu organization.
For instance, your example shows a menu bar that is organized by the class or attribute of the object the commands within act on. In general, the menu bar should be organized by category of action not type of object. For example, you could have a Retrieval menu for commands like Search and other means of displaying orders, and a Modify menu for processing the orders (e.g., updating, accepting, forwarding). Both menus would have menu items that apply to both types of objects, although some commands may apply to only one.
Organizing commands by object type is actually a good idea but it is better accomplished with a context menu (right click) than the menu bar.
I would try the search in both the Accepted Orders and Pending Orders menus. However, user testing will show if this is a good idea or not. But it also depends on your user base.
You are doing user testing right?
...you may already know this, but this is a good place to use the command\action pattern IMHO.
So to answer your question: IMO, yes, it is ok :) This situation is definitely warranted.
Just put it under both menus and have it open your search window, pre-configured for the order type who's menu it was launched from. Name them accordingly and voila they're actually two different actions - even though they use the same code/component.
Keep the user-selectable "status combo you can change" in the search window active though so the user still can adjust the settings without relaunching it from the other menu... and then perhaps rethink the structure, see some of the great answers in here for ideas ^^
I've read a statement somewhere that generating UI automatically from DB layout (or business objects, or whatever other business layer) is a bad idea. I can also imagine a few good challenges that one would have to face in order to make something like this.
However I have not seen (nor could find) any examples of people attempting it. Thus I'm wondering - is it really that bad? It's definately not easy, but can it be done with any measure success? What are the major obstacles? It would be great to see some examples of successes and failures.
To clarify - with "generating UI automatically" I mean that the all forms with all their controls are generated completely automatically (at runtime or compile time), based perhaps on some hints in metadata on how the data should be represented. This is in contrast to designing forms by hand (as most people do).
Added: Found this somewhat related question
Added 2: OK, it seems that one way this can get pretty fair results is if enough presentation-related metadata is available. For this approach, how much would be "enough", and would it be any less work than designing the form manually? Does it also provide greater flexibility for future changes?
We had a project which would generate the database tables/stored proc as well as the UI from business classes. It was done in .NET and we used a lot of Custom Attributes on the classes and properties to make it behave how we wanted it to. It worked great though and if you manage to follow your design you can create customizations of your software really easily. We also did have a way of putting in "custom" user controls for some very exceptional cases.
All in all it worked out well for us. Unfortunately it is a sold banking product and there is no available source.
it's ok for something tiny where all you need is a utilitarian method to get the data in.
for anything resembling a real application though, it's a terrible idea. what makes for a good UI is the humanisation factor, the bits you tweak to ensure that this machine reacts well to a person's touch.
you just can't get that when your interface is generated mechanically.... well maybe with something approaching AI. :)
edit - to clarify: UI generated from code/db is fine as a starting point, it's just a rubbish end point.
hey this is not difficult to achieve at all and its not a bad idea at all. it all depends on your project needs. a lot of software products (mind you not projects but products) depend upon this model - so they dont have to rewrite their code / ui logic for different client needs. clients can customize their ui the way they want to using a designer form in the admin system
i have used xml for preserving meta data for this sort of stuff. some of the attributes which i saved for every field were:
friendlyname (label caption)
haspredefinedvalues (yes for drop
down list / multi check box list)
multiselect (if yes then check box
list, if no then drop down list)
datatype
maxlength
required
minvalue
maxvalue
regularexpression
enabled (to show or not to show)
sortkey (order on the web form)
regarding positioning - i did not care much and simply generate table tr td tags 1 below the other - however if you want to implement this as well, you can have 1 more attribute called CssClass where you can define ui specific properties (look and feel, positioning, etc) here
UPDATE: also note a lot of ecommerce products follow this kind of dynamic ui when you want to enter product information - as their clients can be selling everything under the sun from furniture to sex toys ;-) so instead of rewriting their code for every different industry they simply let their clients enter meta data for product attributes via an admin form :-)
i would also recommend you to look at Entity-attribute-value model - it has its own pros and cons but i feel it can be used quite well with your requirements.
In my Opinion there some things you should think about:
Does the customer need a function to customize his UI?
Are there a lot of different attributes or elements?
Is the effort of creating such an "rendering engine" worth it?
Okay, i think that its pretty obvious why you should think about these. It really depends on your project if that kind of model makes sense...
If you want to create some a lot of forms that can be customized at runtime then this model could be pretty uselful. Also, if you need to do a lot of smaller tools and you use this as some kind of "engine" then this effort could be worth it because you can save a lot of time.
With that kind of "rendering engine" you could automatically add error reportings, check the values or add other things that are always build up with the same pattern. But if you have too many of this things, elements or attributes then the performance can go down rapidly.
Another things that becomes interesting in bigger projects is, that changes that have to occur in each form just have to be made in the engine, not in each form. This could save A LOT of time if there is a bug in the finished application.
In our company we use a similar model for an interface generator between cash-software (right now i cant remember the right word for it...) and our application, just that it doesnt create an UI, but an output file for one of the applications.
We use XML to define the structure and how the values need to be converted and so on..
I would say that in most cases the data is not suitable for UI generation. That's why you almost always put a a layer of logic in between to interpret the DB information to the user. Another thing is that when you generate the UI from DB you will end up displaying the inner workings of the system, something that you normally don't want to do.
But it depends on where the DB came from. If it was created to exactly reflect what the users goals of the system is. If the users mental model of what the application should help them with is stored in the DB. Then it might just work. But then you have to start at the users end. If not I suggest you don't go that way.
Can you look on your problem from application architecture perspective? I see you as another database terrorist – trying to solve all by writing stored procedures. Why having UI at all? Try do it in DB script. In effect of such approach – on what composite system you will end up? When system serves different businesses – try modularization, selectively discovered components, restrict sharing references. UI shall be replaceable, independent from business layer. When storing so much data in DB – there is hard dependency of UI – system becomes monolith. How you implement MVVM pattern in scenario when UI is generated? Designers like Blend are containing lots of features, which cannot be replaced by most futuristic UI generator – unless – your development platform is Notepad only.
There is a hybrid approach where forms and all are described in a database to ensure consistency server side, which is then compiled to ensure efficiency client side on deploy.
A real-life example is the enterprise software MS Dynamics AX.
It has a 'Data' database and a 'Model' database.
The 'Model' stores forms, classes, jobs and every artefact the application needs to run.
Deploying the new software structure used to be to dump the model database and initiate a CIL compile (CIL for common intermediate language, something used by Microsoft in .net)
This way is suitable for enterprise-wide software and can handle large customizations. But keep in mind that this approach sets a framework that should be well understood by whoever gonna maintain and customize the application later.
I did this (in PHP / MySQL) to automatically generate sections of a CMS that I was building for a client. It worked OK my main problem was that the code that generates the forms became very opaque and difficult to understand therefore difficult to reuse and modify so I did not reuse it.
Note that the tables followed strict conventions such as naming, etc. which made it possible for the UI to expect particular columns and infer information about the naming of the columns and tables. There is a need for meta information to help the UI display the data.
Generally it can work however the thing is if your UI just mirrors the database then maybe there is lots of room to improve. A good UI should do much more than mirror a database, it should be built around human interaction patterns and preferences, not around the database structure.
So basically if you want to be cheap and do a quick-and-dirty interface which mirrors your DB then go for it. The main challenge would be to find good quality code that can do this or write it yourself.
From my perspective, it was always a problem to change edit forms when a very simple change was needed in a table structure.
I always had the feeling we have to spend too much time on rewriting the CRUD forms instead of developing the useful stuff, like processing / reporting / analyzing data, giving alerts for decisions etc...
For this reason, I made long time ago a code generator. So, it become easier to re-generate the forms with a simple restriction: to keep the CSS classes names. Simply like this!
UI was always based on a very "standard" code, controlled by a custom CSS.
Whenever I needed to change database structure, so update an edit form, I had to re-generate the code and redeploy.
One disadvantage I noticed was about the changes (customizations, improvements etc.) done on the previous generated code, which are lost when you re-generate it.
But anyway, the advantage of having a lot of work done by the code-generator was great!
I initially did it for the 2000s Microsoft ASP (Active Server Pages) & Microsoft SQL Server... so, when that technology was replaced by .NET, my code-generator become obsoleted.
I made something similar for PHP but I never finished it...
Anyway, from small experiments I found that generating code ON THE FLY can be way more helpful (and this approach does not exclude the SAVED generated code): no worries about changing database etc.
So, the next step was to create something that I am very proud to show here, and I think it is one nice resolution for the issue raised in this thread.
I would start with applicable use cases: https://data-seed.tech/usecases.php.
I worked to add details on how to use, but if something is still missing please let me know here!
You can change database structure, and with no line of code you can start edit data, and more like this, you have available an API for CRUD operations.
I am still a fan of the "code-generator" approach, and I think it is just a flavor of using XML/XSLT that I used for DATA-SEED. I plan to add code-generator functionalities.