When training a doc2vec model using a corpus in the TaggedDocument class, you can provide a list of tags. When the doc2vec model is trained it learns a vector representation for the tags. For example you could have one tag representing the document, and another representing some classification that can be shared between documents.
How would one provide additional tags when streaming a corpus using TaggedLineDocument?
The TaggedLineDocument class only considers documents to be one per line, with a single tag that is their line-number.
If you want more tags, you'll have to provide your own iterable which does that. It should only be a few lines of code, depending on where your other tags come from. You can use the source for TaggedLineDocument, which is itself only 9 lines of Python code, as a model to build on:
https://github.com/RaRe-Technologies/gensim/blob/e4199cb4e9a90df44ca59c1d0505b138caa21951/gensim/models/doc2vec.py#L1126
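A minimal sketch of such an iterable, assuming the extra tags come from a mapping of line number to tag list. The class name `MultiTagLineDocument`, the `lines_factory` and `extra_tags` parameters, and the `line_N` tag format are all made up for illustration. The per-line tag is written as a string here only to keep every tag the same type. The fallback namedtuple lets the sketch run without gensim installed:

```python
import io
from collections import namedtuple

try:
    from gensim.models.doc2vec import TaggedDocument
except ImportError:
    # Fallback so the sketch runs without gensim; it has the same two fields.
    TaggedDocument = namedtuple("TaggedDocument", "words tags")


class MultiTagLineDocument:
    """Stream one document per line, tagged with its line plus extra tags."""

    def __init__(self, lines_factory, extra_tags):
        # lines_factory: callable returning a fresh iterable of text lines,
        # so the corpus can be iterated more than once during training.
        # extra_tags: hypothetical mapping {line number: [tag, ...]}.
        self.lines_factory = lines_factory
        self.extra_tags = extra_tags

    def __iter__(self):
        for item_no, line in enumerate(self.lines_factory()):
            words = line.split()
            tags = ["line_%d" % item_no] + list(self.extra_tags.get(item_no, []))
            yield TaggedDocument(words, tags)


text = "the cat sat\nstocks fell today\n"
corpus = MultiTagLineDocument(lambda: io.StringIO(text),
                              {0: ["animals"], 1: ["finance"]})
docs = list(corpus)
```

For a real corpus, `lines_factory` could be something like `lambda: open(path, encoding="utf8")`, and the resulting object can be passed wherever a TaggedLineDocument would be used.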
Note: while supplying more than one tag per document is a natural extension of the original 'Paragraph Vectors' approach, and often can provide benefits, sometimes it also 'dilutes' the salience of each tag's vector, which will be a special concern as the average number of tags per document grows, or the model acquires many more tags than unique documents. So be sure to comparatively evaluate whether any multiple-tag strategy is helping or hurting, in different modes, and whether things like pre-known categories work better as extra tags or as known labels for some later step.
In my webapp I have several places (like the user profile) where I use columns to place propertyName -> propertyValue pairs. Sometimes property values are short and I can afford to have both the property name and value in one row, but sometimes property values are big (like a cloud of user interests or a paragraph describing something), so it makes sense to devote a whole new row to the property value.
Is there a recommended way (mostly by means of singularity.gs) to make the property value go to another row if it is too long?
It's really tricky to answer without knowing more about the web app. Generally speaking I'd wrap the key value pairs in a div and attach a class to the wrapper that styles the elements inside.
E.g., if the key-value pair is "big", use block elements; if it's "small", use inline-block elements so it stays inline.
I'd use singularity to layout the outer wrappers, not the elements themselves. Basic CSS should cover the inner elements. If you don't know the length of the data coming out of a given key-value pair then I don't think CSS alone can solve your challenge. JS, or server side scripting will likely be needed to attach the appropriate wrapper classes described above.
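As a rough sketch of the server-side variant, assuming a Python backend: pick the wrapper class from the length of the value and let the CSS do the rest. The class names (`kv--block`, `kv--inline`), the 40-character threshold and both function names are invented for this example:

```python
from html import escape

LONG_VALUE_THRESHOLD = 40  # assumed cutoff in characters; tune for your layout


def wrapper_class(value, threshold=LONG_VALUE_THRESHOLD):
    """Return the CSS classes for the wrapper div of one key-value pair."""
    is_big = len(value) > threshold or "\n" in value
    return "kv kv--block" if is_big else "kv kv--inline"


def render_pair(name, value):
    """Render one pair; the wrapper class decides one row vs. two rows."""
    return (
        '<div class="{}"><span class="kv-name">{}</span> '
        '<span class="kv-value">{}</span></div>'
    ).format(wrapper_class(value), escape(name), escape(value))
```

The stylesheet would then display the children of `.kv--block` as block elements and those of `.kv--inline` as inline-blocks.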
Is that possible? I tried doing it using the admin GUI. When I add the attribute it disappears from the available attributes, as usual, but if I move it inside the groups of the attribute set there is no option for "copying" or the like.
I noticed this issue because we import products, categories and attributes programmatically. Among the language attributes we have, for example, german. It is used in several places: for the manual language, for titles, descriptions, etc. We didn't know about this and only realized it when we found that the attribute exists only in the last created group where it was placed.
If it is not possible, then how do we solve this issue? Should we create several different attributes to be used in different groups inside an attribute set? Maybe german_manual, german_language or something similar? Or is there already a solution for this that we do not know about?
I hope I have described my question clearly; it is somewhat complicated.
An Attribute Set is a virtual box for attributes (we can consider attributes as physical properties). Groups inside an Attribute Set are just more virtual boxes that split one big "box" (the Attribute Set); they exist to simplify attribute management by dividing the Attribute Set into logical containers.
Now, as I understand from your question, you want 1 attribute to be assigned to 2 different Groups inside 1 Attribute Set. That's the same as putting 1 stone into 2 boxes...
I am making a pretty abstract tree drawing system, but I am having quite a lot of trouble formalizing all the drawing features it should have. I'd very much appreciate if someone could point me to things to read about this topic, because unfortunately my searches have been in vain.
I am looking for/trying to make a meta-language for displaying trees. In these trees each node is an instance of a user-defined Object which has a user-defined graphical representation.
Each Object is associated with a Name and a graphical representation, and has a finite number of children ( 0+ ), which are only known to be Objects themselves. Object recursion is not allowed.
Each Object may have user-defined Options that are used to trigger conditions which would change their graphical representation ( in user-defined ways ). Some Options are automatically applied, others may require user interaction ( "Would you like this Object to be A or B?" ), thus explaining why Object trees need to be instanced.
Object
    Name // The Object Name
    Childs // List of Object Childs
        ContextName // The Name of the Child within this context
        Types // List of Objects' names. This child may be only one of them. Decided by the user during instancing.
        Options // List of Options assigned to this child. Some of them may require user interaction, and apply other Options to the Child's childs.
        *Priority // This is an integer which is used to decide the order in which childs are drawn.
    Symbol Name // The Graphical representation of the Object
Once an Object Tree has been instanced, it has to be drawn without any additional user input, and this is where I am having some trouble. The instancing of an Object tree assigns to each Object a particular graphical representation ( let's call it Symbol ). The assignment is however not known before the instancing. Different Objects may also have the same Symbol, which may be drawn differently depending on the Object's Options.
Because of this, Symbols must be defined separately from Objects, and must have a series of abstract mechanisms to be able to draw themselves ( and their assigned children ) correctly, following the user-specified rules.
Each Symbol is represented by an image ( or no image ) plus a finite number of Attachments. Attachments are relative positions to the Symbol's coordinates which tell the drawing code where to draw the Symbols of the Object's childs. Each one of them may have particular conditions to be used ( e.g. this Attachment may only be used by a Symbol that has a particular Option, or if N Symbols have already been drawn, no collisions with already drawn Symbols etc etc ).
The algorithm has to find a free Attachment for each of the Object's children, following the order specified by their Priority. If it is not possible to find an Attachment for a Child, the user may specify beforehand rules that allow some automatic retries, but if they also fail then the whole tree drawing fails. Some of these rules allow for adding additional child Symbols and/or assigning child Symbols to other children ( making them grandchildren ) etc.
Symbol
    Name
    Main Image // Image Path, Height, Width
    Attachments // List of the attachments, their position, requirements and additional info
    Fail Rules // List of actions to do if it is not possible to successfully assign each Child to an Attachment
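To make the Attachment search concrete, here is a minimal Python sketch of one possible greedy strategy. It assumes that a lower Priority number is drawn first (the description above does not fix the direction) and that each Attachment carries a single `accepts` predicate; all class, field and function names are hypothetical. Being greedy, it can fail even when a valid assignment exists, which is where the Fail Rules or backtracking would come in:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Child:
    context_name: str
    priority: int                     # assumed: lower value is placed first
    options: frozenset = frozenset()


@dataclass
class Attachment:
    dx: int                           # position relative to the Symbol
    dy: int
    # Condition a child must satisfy to use this attachment (assumed shape).
    accepts: Callable[[Child], bool] = lambda child: True


def assign_attachments(children: List[Child],
                       attachments: List[Attachment]) -> Optional[Dict[str, int]]:
    """Map each child's context name to an attachment index, or return None."""
    free = list(range(len(attachments)))
    assignment = {}
    for child in sorted(children, key=lambda c: c.priority):
        slot = next((i for i in free if attachments[i].accepts(child)), None)
        if slot is None:
            return None               # a real system would try Fail Rules here
        free.remove(slot)
        assignment[child.context_name] = slot
    return assignment


slots = [Attachment(0, 1),
         Attachment(1, 1, accepts=lambda c: "wide" in c.options)]
kids = [Child("a", priority=2, options=frozenset({"wide"})),
        Child("b", priority=1)]
placed = assign_attachments(kids, slots)
```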
My main problem is that the number of variables that a Symbol should be able to access is pretty high. Each Symbol, which, I'll remind you again, should be defined using this meta-language, should be able to access its child Symbols' information ( not others', to avoid deadlocks and circular referencing ): for example the user may want the height and width of a Symbol to be equal to the sum of the heights and widths of all the child Symbols, or to use the same picture, and so on. This is also caused by the fact that the user writes Symbols' rules independently from the final structure.
At the same time, since the tree must be drawn from top to bottom, some of this information may not be available from the start, and may require a great deal of backtracking.
Also, since all of this has to be defined within a meta-language which I have to be able to formalize and parse, I have to define which functions the meta-language requires to allow the maximum degree of freedom to the language-writing user without being overly complex ( this is a vague limit, but essentially I don't want to have Tikz as a subset of my meta-language ). I am however having quite a bit of trouble identifying them.
As I said before, I am looking for information about this kind of topic and/or methods for completing a project like this. Once I am able to fully complete the meta-language I think I won't have too much trouble implementing the code to do all of this; my problems are for the most part theoretical.
I have done a few similar projects with hierarchical data. I'll point you to where I started:
Joe Celko is the king of tree data. I recommend you start with his book. It is a mix of logic and business case. Trees and Hierarchies in SQL for Smarties even has a new edition out. There is a language for describing the hierarchies there, too.
I have used Oracle for storing my hierarchies which has a very efficient system for pulling and storing tree data. Look up "connect by" either in documentation or in the book: Mastering Oracle SQL by Mishra and Beaulieu.
You can use pointers to pull the images from the server so you wouldn't store them in the database. I have built several systems that use hierarchical displays of data with graphical objects, and it keeps the overhead down this way. DevExpress and Telerik both have great viewers for displaying the trees, and I have mine build the next levels dynamically. It doesn't know how many or what the next level is going to be until it is drilled down on. Try these examples and read the docs and you will be able to put this together in no time.
For Telerik this link will show you multiple load on demand views: http://demos.telerik.com/aspnet-ajax/treeview/examples/programming/loadondemandmodes/defaultcs.aspx
For Devexpress: http://demos.devexpress.com/ASPxTreeListDemos/Data/VirtualMode.aspx
Think in HTML/DOM.
I was surprised to find that the file format of the outliner I am using, NoteCase, is plain HTML. NoteCase can be found here: http://notecase.sourceforge.net/index1.html
If you aren't familiar with it, an outliner is a type of application in which you organize mainly text nodes in a hierarchical tree. There are task outliners, too. When an outliner has a graphical representation, it's called mind mapping. Anyway, the directory structure of a filesystem is an outline, too. There are lots of outliners for various areas. See Wikipedia for more details.
Notecase uses DL/DT/DD: DL is the list, DT is the item, and DD is the description of an item. They can be nested, of course.
If the format is HTML, you need only some CSS to show it in a browser in a form that is easy for humans to read.
If you have additional properties, you can define additional tags or attributes, which the browser will not show, but your renderer can.
You should write a converter which transforms your source HTML file format into a more detailed HTML format that contains computed fields (e.g. sums of values from the sub-nodes, or "inherit" marks in a sub-node replaced with the value inherited from the parent node) and some additional formatting, or you can transform attributes into HTML nodes:
<node type="x" size="y" />
to
<div class="node">
<div class="param"> type: x </div>
<div class="param"> size: y </div>
</div>
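A small Python sketch of that attribute-to-node transformation, using the standard library's `xml.etree.ElementTree`. It assumes the source is well-formed XML/XHTML with `node` elements as in the example above; the function name `node_to_div` is made up here:

```python
import xml.etree.ElementTree as ET


def node_to_div(node):
    """Turn <node a="b" .../> into <div class="node"> with one param div each."""
    div = ET.Element("div", {"class": "node"})
    for name, value in node.attrib.items():
        param = ET.SubElement(div, "div", {"class": "param"})
        param.text = " {}: {} ".format(name, value)
    for child in node:
        # Nested nodes are converted recursively and kept in document order.
        div.append(node_to_div(child))
    return div


source = ET.fromstring('<node type="x" size="y"><node type="z" /></node>')
html = ET.tostring(node_to_div(source), encoding="unicode")
```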
Your data representation is a kind of DOM, and you should process it in a similar way. First, parse and read values from the file. Then, you should run some additional rounds (walk the tree) to fill missing values with defaults, calculate inherited and summarized values etc.
I think you can't use standard DOM parsers, because you mentioned custom sorting of stuff depending on parameters, which the DOM model doesn't really support.
Don't be afraid to walk the object tree in as many passes as you have operations to perform on it. You can play with changing the order of the passes, enabling and disabling passes... as you add more and more features, they will take shape as new processing passes.
You may have passes which must be run several times, e.g. if one pass can't calculate a value (because its source has to be calculated first), it may return a flag saying "I'm not done yet", and it should run again on the tree until it reports "no changes made, I'm done".
I hope I've given you a bit of a push.
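The repeat-until-done idea can be sketched in Python like this. It is a toy model, assuming a node's height is the sum of its children's heights and that each pass returns True when it changed something; the `Node` class and all function names are illustrative only:

```python
class Node:
    def __init__(self, name, height=None, children=()):
        self.name = name
        self.height = height          # None means "not computed yet"
        self.children = list(children)


def walk(node):
    """Visit the node first, then its children (top-down order)."""
    yield node
    for child in node.children:
        yield from walk(child)


def sum_heights_pass(root):
    """Fill in heights whose inputs are ready; report whether anything changed."""
    changed = False
    for node in walk(root):
        ready = node.children and all(c.height is not None for c in node.children)
        if node.height is None and ready:
            node.height = sum(c.height for c in node.children)
            changed = True
    return changed


def run_until_stable(root, passes, max_rounds=100):
    """Run all passes repeatedly until none of them reports a change."""
    for round_no in range(1, max_rounds + 1):
        results = [run_pass(root) for run_pass in passes]
        if not any(results):
            return round_no
    raise RuntimeError("passes did not settle; check for circular rules")


tree = Node("root", children=[
    Node("mid", children=[Node("a", 2), Node("b", 3)]),
    Node("c", 4),
])
rounds = run_until_stable(tree, [sum_heights_pass])
```

Because the walk is top-down, the root cannot be computed in the first round; it takes a second round, plus a third that confirms nothing changes any more.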
We are using DevExpress XtraReports 2009v3.3 and although I can achieve what I want through various formatting objects in code, there must be a (better/less painful/more maintainable/visual) way of achieving what I require...
I need to produce a report designed to match the end user's 'look & feel'. We have many companies which use our software and they all require different design schemas/templates for their reports. For example, for a single report, the template it uses should depend on who logs on (we know what company they belong to).
As an example, some of the requirements (per end-user/company) include:
their own logo (positioned in the correct place),
Margins being of specific size
their own fonts (or font choice)
alternating colours schemes
Specific rows / columns being particular colours (both permanently and based on value)
Formatting of values, for example a European user would get euros and a UK user would get pounds on certain columns/cells/rows.
I know there is an End-User Report Designer, however this isn't what we require - I must create the schema/template design for a report then apply it at runtime.
Also using save/load layout for multiple repx files isn't the best solution as a change to the report would cause a lot of extra work as you would have to update each repx template file.
It is possible to create different reports, save them to the repx format via the XtraReport.SaveLayout method, and use these repx files as templates.
Similar but different question
I asked DevExpress and they basically said there isn't any 'layout abstraction' that would do what I require.