I would like to represent data that gives an overview but allows them to drill down in an inline fashion - so if you had a grouping of say 6 objects the user could expand the data and it would show the 6 objects immeadiately below it before any more high level data.
It would appear that MSHFlexgrid gives this ability but I can't find any information about actually using it, or what it's limitations are (can you have differing number of fields and/or can they have different spacing, what about column headers, indentation at for the start, etc).
I found this site, but the images are broken (in ie8 and ff3.5). Google searches show people just using the flat data representation but nothing using the hierarchical properties). Does anyone know any good tutorials or forums with a good discussion about pitfalls?
Due to lack of information about using it, I am thinking of coding my own version but if anyone has done work in this area I haven't found it - I would of thought it would be a natural wish for data representation. If someone has coded a version of this (any language) then I wouldn't mind reading about it - maybe my idea of how to do it wouldn't be the best way.
You might want to check out vbAccelerator. He has a Multi-Column Treeview control that sounds like what you may be looking for. He gives you the source and has some pretty decent samples.
The MSHFlexGrid reference pages and the "using the MSHFlexGrid" topic in the Visual Basic manual?
Sorry if you've already looked at these!
Related
I need to create a universal web scraper to parse articles on the different websites. Of course, I know about XPath, but I want to try to make it universal for any website despite the HTML markup of a page.
I need to determine whether there is an article on the page and if it is - parse a text of title, body and tags (if exists).
Frankly speaking, my knowledge in DS is not very huge, but I assume this task (determine whether it is article, and parsing only needed parts) is possible to solve.
What tools should I use? Any help?
Actually, for the second task, I need to implement something similar that google chrome mobile does. When page is not optimised for mobile, then propose to show the page in adaptive mode (just title, and main content).
If you are using Python, some libraries to look at are:
scrapy, which scrapes data and can extract some of the results) and,
BeautifulSoup, which is more geared towards the extraction part itself.
It is possible to request a version of a website (e.g. for Chrome, Safari, Mobile, old-school systems) by creating a custom header for your scraper.
HAve a look at the relevant documentation, and you can get an idea of how to use headers in scrapy here.
I do not know of any more specialised tools. Your tasks are more analytical and are typically not performed with the use of models for estimating e.g. what content is where on a webpage. This might be an intersting research direction though; to see if you can create a model that generalises across many websites to extract the desired content.
That leads me on to my last point, which is to say that creating a single scraper that works for any website *containing your artile type) is not usually possible. People create websites differently, however they see fit, which means they also change them. This usually leads to a good scraper requiring constant updates as time (and developers) moves on.
EDIT:
Then if you have lots of labelled examples, it might be possible to train a model. The challenge might be the look-back range of the model. For example, a typical LSTM model is given a parameter that tells it how far to look back into the past. It is stored within its memory internally. In your case, you might be looking for a start and end HTML tag of an article, to then extract just that part. These tahs could be thousands of words apart. Something a standard LSTM might not be fit to retain and use.
If you could pose your problem a little differently, then there are other approaches that might be plausible. E.g., you could make it a "question-answer" problem, by saying: I have this HTML, where is the article content? If that sounds ok for your use-case, have a look here for some model based approaches.
I'm getting frustrated with my own inability to find a source of information regarding what options / attributes are to be used when defining the XML file for a form in a component.
The file I'm talking about might be located in /administrator/components/com_report_wiz/models/forms, as an example. It defines the field to be used in the admin form for a component. I used a component creator to build a sample component as a learning experience. It created an xml file in that folder which has fieldset elements that then contain field elements. This us then used with the getLabel and getInput methods of JForm to generate the form shown in the admin interface. That's terrific!
But, after spending hours Googleing everything I could think of, I still can't find any reference that shows what types of fields are available, and their parameters/options. I've found lot's of tutorials and such regarding creating custom field types, and that's been interesting.
In the file I'm looking at, for example, the following creates a simple text input field in the form:
<field name="rpt_appname" type="text"
label="COM_REPORT_WIZ_FORM_LBL_REPORT_RPT_APPNAME"
description="COM_REPORT_WIZ_FORM_DESC_REPORT_RPT_APPNAME"
default="None"
maxlength="100" />
I would love to find some reference that lists the different possible values for the "type" attribute, and the parameters that can be used with each.
I'm beginning to think I'm dumber than a box of rocks since I can't figure out where to find information on some of the most basic parts of Joomla! development. The docs that are auto-generated from the code are less than helpful to me since they don't explain the parameters to functions. It's nice to know what parameters a method/function expects, but it's more helpful to understand what those parameters are and contain.
The tutorials have been helpful, but are mostly too basic to use for more advanced features, or at least as a source of information. They have been great, and I really appreciate the effort the authors have put into them, but now that I've gone through them, I find it difficult to discover the info needed to write a proper, complex component. With a system as complex and extensive as Joomla, it seems that there should be a place to find out how to use the wonderful abilities it provides without having to resort to reading the source code.
Any suggestions about where to search, search terms would be greatly appreciated!!
The first starting point would be to look at the Joomla! Documentation - I know sometimes it is frustrating to use it, but give it a change. It gets better and better as we speak.
Typing in the search box text will get you to the page Text form field type. Also in the documentation you will find a list of Standard form field types.
My favourite way of doing is directly inspecting the code in JOOMLA_ROOT/libraries/joomla/form/fields for the needed form type. You get to see there all the parameters and quicklier understand why something does not work the way you think it should work.
Since you are new to Joomla, your questions might get a better attention at the Joomla! Q&A site.
Hope this answers your question.
I got a financial application and I wish to add to it the ability to get user command or input in textbox and then take the right action. for example, wish the user to write "show the revenue in the last 10 days" and it'll show the revenue to him/her - the point is that I wish it to really understand the meaning of the question, so the previus statement will bring the same results as "do I got any revenue in the last 10 days" or something like that - BI (something like the Wolfram|Alpha engine).
I wonder if there's any opensource library or algorithm books or whatever that I can use to learn the subject. Regards to opensource libraries - I don't mind which language it'll be written in.
I've read about this subject and saw many engines and services (OpenNLP, Apache UIMA, CoreNLP etc.) but did not figure out if they're right for my needs.
Any answer or suggestion is welcome.
Many thanks!
The field you're talking about is usually called "natural language processing". It's hard, and an active field of research. There are various libraries which you could consider based on your preferred programming language and use case:
http://en.wikipedia.org/wiki/List_of_natural_language_processing_toolkits
I've used NLTK a little bit. This field is seriously difficult to get right, so you might want to try to restrict your application to some small set of verbs and nouns such that people are using a controlled vocabulary in the first instance, and then try to extend it beyond that.
I want in an application with a simple text input, enriched with some marks to include formatting or semantic labeling. I want the syntax as easy as possible and I want to include self-defined labels.
Example:
[bold]Stackoverflow[/bold] is a [tag]good[/tag] resource for programmers.
Tables would be needed too.
HTML/XML and LaTeX are mighty enough to allow this, but too complicated. Wiki-Syntax seems simple, but uses another symbol for each markup, has unclear quoting and every Wiki seems to have another syntax. For tables and similar stuff Wiki becomes very complicated.
Exists a language/syntax, that matches my needs or can be slightly changed to do so? Or do I have to invent something myself? In that case, do you have suggestions?
Definitely do NOT invent your own. There are plenty of simple markup languages already, and users HATE learning new ones. Trust me on this!
I would suggest using one of the following:
Textile
Markdown
BBCode
Make your decision based on your userbase, as well as what tools and parsers are available in your chosen language. For my site, we went with Textile, but I've found that BBCode tends to be the language that most people already know. However, this will vary with different user demographics.
StackOverflow, along with several other sites, uses Markdown. I think it will give you the best balance between features and simplicity.
Let me add ReStructuredText to the list.
An additional benefit of using it is given by the availability of ReStructuredText to Anything service that makes extremely easy to create HTML or PDF versions of the document.
As already pointed out there are a lot of lightweight markup languages (many are listed here: wikipedia article), there should be no need of creating your own.
For example, right now I have a roll-my-own solution that uses data files that include blocks like:
PlayerCharacter Fighter
Hitpoints 25
Strength 10
StartPosition (0, 0, 0)
Art
Model BigBuffGuy
Footprint LargeFootprint
end
InventoryItem Sword
InventoryItem Shield
InventoryItem HealthPotion
end
human editable (w/ minimal junk characters, ideally)
resilient to errors (fewest 'wow i can't parse anything useful anymore' style errors, and thus i've lost all of the data in the rest of the file) - but still able to identify and report them, of course. My example the only complete failure case is missing 'end's.
nested structure style data
array/list style data
customizable foundation types
fast
Are there any well known solutions that meet/exceed these requirements?
Yaml is a good solution and very close to what you have. Search for it.
I second the YAML suggestion. It's extremely easy to edit, very forgiving of mistakes and widely supported (especially among the dynamic languages).
I'd say the most common choices are:
JSON (offical site) - very flexible, though the punctuation can take a bit for people to get used to
INI - super simple to use, but a bit limited in data-types
XML - pretty flexible, common, but way too verbose sometimes
You could try JSON available at: http://www.json.org/
It was designed for javascript and web usage initially. But it's pretty clean, and supported in many languages.
Lua was designed to be a programming language where the syntax lets you easily use it as a markup language as well, so that you include data files as if they were code. Many computer games use it for their scripting, such as World of Warcraft due to its speed and ease of use. However it's originally designed and maintained for the energy industry so there's a serious background.
Scheme with its S-expressions is also a very nice but different-looking syntax for data. Finally, you've got XML that has the benefit of the most entry-level developers knowing it. You can also roll your own well-defined and efficient parser with a nice development suite such as ANTLR.
I would suggest JSON.
Just as readable/editable as YAML
If you happen to use for Web then can be eval()'ed into JavaScript objects
Probably as cross language as YAML