software for organizing text - full-text-search

(
I suspect that the question may not belong here as it's about software and not about programing. However, this is my computers community, and I trust you guys to refer me elsewhere if you think it's not appropriate to answer it here.
)
So,
I'm writing a lot. Text. For myself. Diaries, ideas, insights, observations. It always comes in the form of passages, passage at a time.
Until now I used to write in word documents, organizing them by rough categories divided to different documents, and by chronological order.
I figure out that this is way sub optimal. I can have more, and I do need more.
I'm looking for a software that will allow me to:
1 - tag passages
2 - store date and time automatically (created and edited)
3 - powerful full text search
4 - besides the above, I'd like it to have as much word processing capabilities as possible
Recommendations for a software that have this?
Now, I don't need this to be online. I'm doing this for myself, and don't want it to be published anywhere. I figure out however that many web platforms may have much of what I need, so I don't automatically reject the possibility to use one for my offline needs.
Thanks guys
Gidi

You could install wordpress or any other suitable blogging software locally and have your own private blog - let's you write passages as short as you like, you can tag it, categorize it, search it. Keeps track of when it was created and edited. And you can probably add a fair amount of word processing capabilities to it via plugins. And you could put it online when you wanted to.
It's a bit install overhead required (probably XAMP) though.

Related

Body Text extraction from websites e.g. extract only article heading and text not all text in site

I am looking for algorithms that allow text extraction from websites. I do not mean "strip html", or any of the hundreds of libraries that allow this.
So for example for a news article I would like to identify the heading and all the text, but not the comments section and so on.
Are there any algorithms for that out there? Thank you!
In computer science literature this problem is usually referred to as the page segmentation or boiler plate detection problem. See the report Boilerplate Detection using Shallow Text Features and its related blog post. Also, I have a few reports and software sites bookmarked that address the problem. Also, see this stackoverflow question.
there are a few open source tools available that do similar article extraction tasks.
https://github.com/jiminoc/goose which was open source by Gravity.com
It has info on the wiki as well as the source you can view. There are dozens of unit tests that show the text extracted from various articles.
"Content extraction" is a very difficult topic. There are no common standards to identify the "main-article" content (there are several approaches to make HTML easier readably for crawlers, e.g. schema.org, but none of these is very popularly used).
So it turns out, that if you want good results, its probably best to define your own XPath selectors for each (news) website you want to scrape. Although there are some APIs for HTML content extraction, but as I said its very hard to develop an algorithm which works for every site.
Some APIs you could use:
alchemyapi.com
diffbot.com
boilerpipe-web.appspot.com
aylien.com
textracto.com
What you're trying to do is called "content extraction". It turns out to be a surprisingly hard problem to solve well, and many naive solutions do quite badly.
Instapaper and Readability both have to solve this, and you may learn something from looking at their solutions. They also both provide services that you may be able to take advantage of - perhaps you can outsource your problem to them and let their API take care of it. :)
Failing that, a search for "html content extraction" returns a great deal of useful results, including a number of papers on the subject.
I compared a few different libraries, and had really great luck with Mozilla's Readability library (Node), or its Python wrapper.
For example, take this CNN article: https://edition.cnn.com/2022/06/01/tech/elon-musk-tesla-ends-work-from-home/index.html
Readability successfully returns only the relevant data:
New York (CNN Business) Elon Musk is demanding that Tesla office workers return to in-person work or leave the company. The policy, disclosed in leaked emails Musk sent to Tesla's executive staff Tuesday, was first reported by electric vehicle news site Electrek. "Anyone who wishes to do remote work must be in the office for a minimum (and I mean *minimum*) of 40 hours per week or depart Tesla. This is less than we ask of factory workers," Musk wrote, adding that the office must be the employee's primary workplace where the other workers they regularly interact with are based — "not a remote branch office unrelated to the job duties." Musk said he would personally review any request for exemption from the policy, but that for the most part, "If you don't show up, we will assume you have resigned."
etc.
I think your best shoot is study what information can you get from the metadata and write a good html parser, oEmbed could be a good standard =)
https://oembed.com/#section7

Ways to enhance a trial user's first time experience

I am looking for some ideas on enhancing a trial-user's user experience when he uses a product for the first time. The product is aimed at a particular domain and has various features/workflows. Experienced users of the product naturally find interesting ways to combine features to get the results they want (somewhat like using an IDE from a programmer's perspective).Trial users get to use all features of the product in a limited fashion (For ex: If there is a search functionality, the trial-user might see only the top 20 results, or he may be allowed to search only a 100 times). My question is: What are the best ways to help a trial-user explore/understand the possibilities of the product in the trial period, especially in the first 20 - 60 mins before the user gives up on the product?
Edit 1: The product is a desktop app (served via JNLP, so no install required) and as pointed out in the comments, the expectations can be different in this case. That said, many webapps do take a virtual desktop form and so, all suggestions are welcome.
Check out how blinksale.com handles this. It's an invoicing app, but to prevent it from looking too empty for a new account, they show static images in places where you'd actually have content if you used the app. Makes it look less barren at first until you get your own data in.
if you can, avoid feature limiting a trial. it stops the user from experiencing what the product is ACTUALLY like. It also prevents a user from finding out if a feature actually works like they want/expect/need it to.
if you have a trial version, and you can, optimise it for first time use. focus on / highlight the features that allow the user to quickly and easily get benefits for useful output from the system.
allow users to export any data they enter into a trial system - and indicate that this is possible/easy. you don't want them to be put off from trying something because of a potential for wasted effort.
avoid users being required to do lots of configuration before using a trial. prepopulate settings based on typical/common/popular settings. you may also want to consider having default settings for different types of usage. e.g. "If you want to see what the system is like for scenario X, use configuration J. If you want to see what the system is like for use case Y, use configuration K." where J & K are collections of settings best suited to a particular type of usage.
I'll speak from personal experience while evaluating trial applications.
The most annoying trial applications are those which keep popping up nag screens or constantly reminding me that I'm using a trial. Trials which act exactly like the real product from the beginning till the end of the trial period are just awesome. Limited features are annoying, the only exception I can think of when you could use it is where you have rarely used feature which would allow people to exploit the trial (by using this "once-in-lifetime" needed feature and uninstalling). If you have for example video editing software trial which puts "trial" watermark on output, I'd uninstall it as soon as I'd notice it. In my opinion trial should seamlessly integrate into user work-flow so that once the trial ends they would think "Hey, I have been using this awesome program almost each day since I got the trial, I absolutely have to buy it." Sure some people will exploit it, but at the end you should target the group which will use your product in daily work-flow instead of one time users. Even if user "trials" it 2 times per year, he will keep coming back to your product and might even buy it after 2nd or 3rd "one-time use".
(Sorry for the wall of the text and rant)
As for how to improve the first session. I usually find my way around programs easily, but one time only pop-up/screen (or with check-box to never show it again) with videos showing off best features and intended work-flow are quite helpful. Also links to sample documents might be helpful. If your application can self-present itself (for example slide-show about the your slide-show program) you could include such document. People don't like to read long and boring help files, but if you have designer in your team, you could ask him to make a short colourful intro pdf. Also don't throw all the features at the user at the same time. Split information into simple categories and if user is interested into one specific category keep feeding him more specific information. That's why videos are so good, with 3-6 x ~3-5 minute videos you can tell a lot. Also depending how complex your program is you could include picture with information where specific things are located on the screen.
Just my personal opinion, I have never made a trial myself. Hope it helps.
An interactive walk through/lab exercise that really highlights the major and exciting offerings of your application.
Example: Yahoo mail does the same when the users opt to use new mail interface
There are so many ways you can go with this. I still can't claim to have found the best approach.
However, my plan from the beginning with my online (Silverlight) software was to give away something thousands of people will find useful and can use for free. The free version is pretty well representative of the professional product, with only a few features missing that enhance productivity (I'm working on those professional features now). And then I do have a nag popup that comes up every 5 minutes suggesting that you should buy it. That popup can be dismissed as many times as you want. I know that popup will annoy some people but I suppose that's the trade off. There is no perfect plan. But I don't think the occasional nag popup scares that many people away, especially when it can be dismissed with a single click.
I was inspired by Balsamiq Mockups, which has been hugely successful over the past couple years. My trial/nag popup way of doing things was copied almost exactly from Balsamiq. I honestly don't know if this is the ideal plan, but it has obviously worked for them. By the way, I think another reason for Balsamiq's success is that the demo doesn't have to be downloaded & installed. Since the demo is in Flash, there's a very high conversion rate of users actually trying it and becoming addicted to it.

What kind of specs, documents, analysis do you get from superiors when starting a project? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 3 years ago.
Improve this question
I currently work in a small business (15-20 employees, 5 programmers) where most projects are custom built CMS and a few web applications products.
Since I started working there, I have worked on many projects, but specifications for each project vary a lot. Sometimes we get a little detail, a Word document telling what the client wants, and what we are suggesting (suggested form fields, a short description of display, etc.). Sometimes almost nothing except "do what you think is the best approach for this project/module/request".
My question to you guys, who might work in different kind of businesses, is: How (huge pile of paper? Word docs? Visios?) and what kind of information do you get from your superiors, managers, teamates when starting a project (plenty of analysis, drawings, etc.)? How much detail do you get on this?
Hope my question is clear enough, thank you.
Specs..that's kind of funny...how about never :(.
Seriously a lot of companies assume specs aren't needed, its absolutely unacceptable but this is how it is in a LOT of companies. They assume a one liner and the programmer knows what the program should do, the inputs / outputs and so on.
Unfortunately in my case I have to actually help write the specs..and Im the programmer :(.
I mostly get a lot of verbal direction and I use a voice recorder to record the conversation and transcribe it when I am done. I write my own specs from my customers' words.
Then, as a good consultant should, I take the writeup back to the customer and verify it, and get a signature and build it, and they live happily every after! (no they dont, they change their mind a 100 times)
It can vary depending on what group the work falls under:
Support request - If the change will take a short period of time and is fixing something broken, there is this group. This could be as simple as, "Add Bob to the list of authorized users for that ancient form" where the form is something written years ago and aside from adding and removing users, it isn't touched for fear of breaking things.
Service Advisory Committee request - Items that are up to a few days are in this group as these are kind of like mini-projects as the request may be to create a new form or portal for a group. This could be upgrading some 3rd party software where we have some customizations that make the upgrade not necessarily a simple thing for Operations to do.
Project - In this case there are usually a few Word documents and/or e-mail threads that help nail down requirements in terms of scope, budget, and time. These can take months though there is something to be said for having a prototype to change rather than creating the initial prototype to tell if requirements are really met or not. Course my current project is over a year old, still has a few more months to the timeline and already has a successor coming after it is done,i.e. there is a Phase II to go after Phase I.
Uber project - These merit their own group of documentation and are the million dollar, multiple company projects that usually try to document everything up front rarely works out well here. Thus, there is some adoptioon of agile for these but there are still some growing pains to go through as how we use agile matures. Think installing a dozen modules of some off-the-shelf software that requires both internal and external developers to customize the suite for our specific needs as the software is supposed to be very robust, flexible and help save lots of time and money on how people otherwise do their jobs generally. Think ERP or CRM for a couple of examples here.
We are a 16-person company that creates and supports customized software for small retail shop owners.
The projects we get fall into three general categories (as related to specs):
"Here, automate this form." A sales person explains that our customer only wants this form to appear where they can fill it out and print it to make it look professional to their customer. Our specs is a single piece of paper that looks something like an order form or report. This is always false; they want pop-up lookups, automatic updating from other sources, and "while you're at it" add-ons that more than double the time. These, we've learned to just live in the moment and let the project take its course. By the time we're done, the program doesn't look anything like their original form.
Small changes. Like a simple e-mail explaining that the background color is stale, or a request to sort a report by a different column. These, we just do as time allows.
Big company integrations, where we're tasked with making our software work with some big outfit like Intuit (QuickBook) or FedEx (shipping rates). These often have well thought out documentation and sample code. We get 100's of pages in word documents or pdfs. The problem with these is when their specs are wrong. We find out about inaccuracies when we try to test or certify our integration. In these instances, we usually take longer in certification than we did to originally develop the processes.
In all cases, the real trouble is when a sales person promises a solution to the customer before even asking a programmer what it would take. As recently as 2 weeks ago, a sales person got into real trouble and had to issue a refund (that person is no longer with the company).
None - at least not from management.
Instead, as a developer (and particularly one leading a software project right now), I'm expected to contact my users/customers/etc and work directly with them to come up with our specifications and requirements. The documentation I do request from my team is only what will be useful to the team. I am lucky in that management rarely requests a document that doesn't make sense or won't provide some use to our project.
I currently have a half-dozen or so specs each 60-80 pages. One of them is 80 pages with no table of contents. Good times.
Our Product Managers and senior engineers prepare three planning docs for our data management software projects.
High-level requirements: 1-to-3 sentence descriptions of hardware/software supported or specific feature for this project. (10-15 pages of Excel-like grids)
Technical details: Engineering implementation of each high-level requirements. Up to a page for each, depending on amount of detail. (30-40 pages of filled-in feature details)
Business agreement: Summary of 1 & 2 with engineering schedule and Product Mgmt's market analysis. Everyone signs off on this. (5 pages analysis, 20 technical)
I haven't seen work flows or other Visio-like details in our specs. The prioritized requirements and schedule prove critical, so we understand when to lop things off to save development and testing time.

What are good/bad ways of providing help for an application..?

I'm in the process of developling various applications for whom the end users are both engineers and salesman. Some of the operations and options may not be immediately obvious to all users. All applications are delivered with a PDF and paper manual - but of course nobody reads them!
I would like to improve the usability of the applications by including dynamic context sensitive help. One option would be alá MSDN and have F1 call up a web page - however internet access will not always be available and even this will be too much effort for some.
Another idea is to have descriptions pop up when an option is hovered over - like a tooltip.
I'm interested in other peoples views on this and what are best practices in this situation. Along a similar theme to this post What are common UI misconceptions and annoyances? I'd like to start a discussion regarding these two points:
What would be the best way to go about it?
What help features in existing applications you use either delight or annoy you..?
In my experience nobody but programmers reads the help. So when you have a technical and non-technical target audience you end up providing 2 ways of doing everything:
A Wizard with a few options.
A property editor with lots of options.
In either case, pictures are usually better than words for documentation. So a screenshot or 3 with big green arrows and circles calling out what does what will go a lot further than an indexing, exhaustive help file.
In my experience it would be very helpful to have a tooltip on each option that provides a little more definition/clarity for each option. Additionally, you can improve usability by having the default screen contain a few common, simple options and providing an advanced section that provides more control.
I'm currently working on a similar side-project. We have an existing product that's used by people as part of their day job. There is an inherent learning curve on the product, so users receive some degree of training and have people they can turn to for assistance. Even so, we know it needs more help and user documentation in general.
We are starting this help enhancement project by running a quick survey on the end users, (offering a prize draw as an incentive). We will also speak to the support staff who have to deal with help requests. This will uncover some of the pain points, and will give us a clear idea of how to focus our time & resources.
Guidelines on when to use inline tips vs tool tips etc can be found in various style guides, e.g. here:
http://developers.sun.com/docs/web-app-guidelines/uispec4_0/11-help.htm
Bear in mind that it's probably a bad idea to just copy & paste the text from your existing manuals into contextual help tips. You're going to need help writing completely new content. See if you can get some time from a technical writer / copywriter.

Refresh Oldschool GUI Design [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
I'm building Desktop Software for over 10 years now, mostly it's simple Data-Input Software. My problem is, it's always looking the same: A Treeview on the Left and a lot of Text/Data Fields to the right, depending on the type of data currently is worked on. Are there any fresh ideas how such software nowadays should look like?
For further clarification:
It's very hierarchical data, mostly for electronic devices. There are elements of data which provide static settings for the device and there are parts which describe some sort of 'Program' for the device. There are a lot (more than 30) of different input masks. Of course i use combo boxes and Up/Down Entry Fields.
Having all of your software look the same thing is a good thing. One of the best ways to make it easy for people to use your software is to make it look exactly the same as other software your users already know how to use.
There are basically two common strategies for how to handle entry of a lot of data. The first is to have lots of data entry fields on one page. The next is to have only a few data entry fields but a lot of pages in a sort of wizard-style interface. Expert users find the latter much slower to use, as do users who are entering data over and over again. However, the wizard style interface is less confusing for newer users since it offers fewer elements at once and tends to provide more detail on them.
I do suggest replacing as many text fields as possible with auto-complete-based combo-boxes. This allows users to enter data exactly the same as with text-boxes, but also allows users to save typing by hitting the down key to scroll through choices after typing part of the data in.
Providing more detail on what data is being entered would probably yield more specific answers.
I'd also answer with a question, which is to ask what your motivation for considering a change is? Like the other posters, I'd agree that there is some value in consistency, but there's also a strong value in not ignoring niggles-in-the-back-of-the-mind feelings you have. Maybe you have a sense that your users aren't as productive as you'd like them to be, or you've heard feedback to that effect from your customers, or you're just looking to add some innovation for your own interest. Scratching itches is a good trait in a developer, in my view.
One thing I'd advocate would be a detailed user study. How much do you know about what your users do with the interfaces you create? Do you know the key tasks, the overall workflow? Would you know if one task regularly consumed 60% of your users' time, or if there was a task that was only performed once a month? Getting a good sense of what the users actually do (and not what they say they do) is a great place to start thinking about what changes might be worthwhile, especially if you can refactor the task to get a qualitatively different user experience.
A couple of specific alternative designs you might like to include in re-visioning the UI might be be facet browsing (works well for searching and exploring in hierarchies), or building a database of defaults / past responses so that text boxes can use predictive completion. However, I think my starting point would be the user study.
Ian
If it works...
Depending on what you've got happening with the data (that is, is it hierarchical, or fairly flat), you might want to try a tab-based metaphor, or perhaps the "Outlook-style", with a sidebar showing the sections of an application. One other notion I've played with lately is the "Object desktop" that I first saw proposed by Scott Ambler (Building Object Applications That Work). In this, you can display collections of items, or the user can "peel off" individual records for easy access.
Your information is not enough to really suggest you an interface alternative. However, may I answer your question with a question? Why do you think you have to change it? Has your customer complained? If not, it looks like your customer is happy with the way the software works right now, thus I wouldn't change it. If your customer complains about it, he'll most likely not just say "It's bad", he will say "Why can't it look like ..." and this will give you an idea how to change it.
I once had to re-design a very outdated goods management system. The old one was written for a now dead database system, still running in MS-DOS. The customer suggested I should create a prototype how this re-implementation might look like and then he'll decide if I get that job or not. I replaced the old, dead database with a modern MySQL database, I replaced the problematic shared peer access with a client server approach and I chose to rewrite the UI in Java, since different OSes were used and this had the smallest porting costs. So far the concept seemed good, the customer liked it. However, when he asked his employees what they think about it, they asked "So far it's great, but we have one question: Why doesn't it look like the old one?". Actually, it turned out that even with all the modern technologies, they wanted the interface to exactly look and being operated like the old one. So I had to re-build a 1986 usability nightmare MS-DOS UI in Java, because no other UI was accepted.
For me it is more about a clean, usable, logical design than anything else. If your program makes sense to the user, isn't clunky and works as advertised, then everything else UI related is essentially just like painting the house. I've sometimes rolled out a new version of a program with essentially the same controls that are skinned differently.
There's a reason that you've probably chosen the tree view - because it probably makes really good sense to do so. There are different containers and controls available in the various UI libraries, depending on the language, but I tend to stick with the familiar because the user probably gets how a tree control works and how a combobox works.
A user interface needs to be usable, just don't do the misstake to change to something working to something fancy-schmancy just because it looks better (been down that road)...
Make sure that added
widgets/controls really add a business value
Make sure that the added
widgets/controls do not mess up your
architecture (too much) and makes
the application harder to
manage/maintain
Try to keep platform standards on
how to do things (for example the Vista ux guidelines)
:)
//W

Resources