How to create my own RecommenderJob?

How to create my own RecommenderJob? - hadoop

I have found several tutorials on how to create my own non-distributed recommender but none how to create my own distributed recommender job (any link is welcome if you know one).
In the book “Mahout in Action” there are some examples of how to write Mappers/Reducers using Mahout’s objects, but it does not seem to show how to put these jobs together?
However there is item/RecommenderJob in mahout-core which gives an idea of how this can be done. My actual intent is to replace the first mapper so that I don't have to prepare my data outside of mahout (lines look like "userid,itemid1,itemid2,itemid3..." and using item.RecommenderJob I obviously need lines like "itemid1,itemid2", "itemid1,itemid3", ...).
Now would it be a good idea to just copy over the RecommenderJob class and change what I need?
I have tried it, but since this class uses variables that are in package scope (e. g. UserVectorSplitterMapper.USERS_FILE) I have to replace these – which does not feel good.
Should I rather create a new class extending AbstractJob and pick out the things I need from RecommenderJob? Then what are the elements in RecommenderJob that I really need?

Your alternatives are to precede the job with your own job that translates your input into a form the job wants, or, indeed to just modify the job. I don't think it's a big deal to copy the job and modify and customize it if you need non-trivial changes that aren't (and wouldn't make sense to be) supported as some kind of config parameter.

Related

Clarify steps to add a language variant to Stanza

I would like to add a non-standard variant of a language already supported by Stanza. It should be named differently from the standard variety included in the common distribution of Stanza. I could use a modification of the corpus for training the AI, since the changes are mostly morphological rather than syntactical, but how many steps would I need to take in order to make a new language variety for Stanza from this background? I don't understand what data are input and what are output in the process of adding a new language in the web documentation.

It sounds like you are trying to add a different set of processors rather than a whole new language. The difference being that other steps of the pipeline will still work the same, right? NER models, for example.
If that's the case, if you can follow the steps to retrain the current models, you should be able to then replace the input data with your morphological updates.
I suggest filing an issue on github if you encounter difficulties in the process. It will be a lot easier to back & forth there.
Times when we would actually recommend a whole new language are when 1) it's actually a new language or 2) it uses a different character set - think different writing systems for ZH or for Punjabi, if we had any Punjabi models

How can I separate properties visually for an EPiServer ContentType?

I want to make the editor experience better and more visually pleasing when filling in content on a page (In all properties view). Could be a simple divider or a heading..
I am already using tabs, whenever it makes sense. Also, I have been experimenting with using blocks as properties. This adds a nice separation with at clear heading, but it is so much more code to maintain and a bit of a mess to be honest when the properties truly belong to the page type.

Out-of-the-box, it is not possible to decorate properties with headlines, unless you use block-properties, as you mention yourself.
However, I thought your question was quite interesting, and I discovered that extending Episerver to accommodate this behavior is surprisingly easy. I have written an example solution, which you're free to use as you like: https://arlc.dk/grouping-properties-with-headlines-without-property-blocks.
If you dislike the solution, an alternative approach would be to introduce your own Property-type (Headline), and create a 1) a custom dojo-widget to simply display a headline, and 2) an EditorDescriptor to set the ClientEditingClass.
Linus wrote an excellent blog post on this here: https://world.episerver.com/blogs/Linus-Ekstrom/Dates/2012/7/Creating-a-custom-editor-for-a-property/.
EDIT:
I see, I have skipped too quickly over the overriding part.
You don't have to override any files by replacing them, and you won't have to extract Shell.zip (unless you're curious how Episerver has implemented their widgets). The part that overrides the specific component is define("epi/shell/form/Field". As long as your definition of this widget is loaded after shell, dojo will use your implementation, whenever something is requiring "epi/shell/form/Field". The thing that ensures your implementation is loaded after, is in module.config, under 'This injects our field-implementation [...]'.
The path ~/ClientResources/Scripts/Shell/Field/Field.js is simply the location I have chosen to put the overriden version of Field.js. You can put it wherever you like, as long as you update module.config accordingly, with the new path.
It works like this: First, Episerver defines widget A. Then you define a widget with the same name, A. When anything tries to fetch A, it returns your implementation, rather than Episerver's.

Whats the best way to include static information in my app?

I am currently working on my first small desktop menubar app (macOS, Swift 3). It needs to access
a) A list of words (Think word dictionary, 1k-5k words, per supported language)
b) A list of structured data (Think simple structs, ~500)
I am currently pondering, whether to build these in code - maybe a factory class per language. Or include them in my app as json and parse at runtime. Or maybe build an SQLite file and read that during runtime, although that approach would be harder to diff in source control ...
As I am new to the platform I was wondering whether there might be a better way that I am not aware of, or maybe performance considerations that render one of the mentioned approaches useless.
As usual, thanks in advance folks !

Your listed solutions can be used for this task. However I think for such kind of tasks the best solution is to use CoreData, where you can store a list of words as well as structured data, also make relations between them if you need it

How to use the MultipleTextOutputFormat class to rename the default output file to some meaningful names?

After the reduce phase in Hadoop, I wanted the output file names to be something meaningful depending on the input key value. However I'm not successful on following the example on "Hadoop: The Definative Guide" which used MultipleTextOutputFormat to do this. The reason is that it's based on old API and it doesn't work on the new API ?
Can anybody hint on the solution or point me to the relevant documentation ?

You are probably right. Most things that worked in the old API don't always work in the new one.
There is a "new way" of doing this now, called MultipleOutputs.

Is it possible? (To show specified methods, depending on a given parameter.)

I have a (binary)reader, to read DBC files (Its a file format used by a game) into structures, that I know. Every DBC file differs a little bit (their structs), from the others, and their reading method differs too. (there is spell.dbc item.dbc map.dbc etc...)
So I made unique reading methods for all the DBC files I want to read. (I think its not the best solution, but for now, its okey for me.)
Here Is an example usage:
DBCReader reader = new DBCReader(DBCFile.Spell); // you can use DBCFile.Map or others here
Now my question : Is that possible to list only the spell related methods, when I use my reader?
So my DBCReader class contains different reading methods, for all the dbc files,and I would like to only see the spell related ones.
So now when I write "reader." to C#, it lists all my methods for all dbc files, like
ReadMapName() (which is for Map.dbc),
ReadSpellID() (which is for
Spell.dbc), GetItemName() (which is
for Item.dbc) etc..
But I only want the spell related methods to be listed. Is it possible? :)
Thanks.

I'm not aware of a way to get intellisense to display a subset of the methods in a class.
You could make seperate classes for each type, ie:
dbcSpellreader, dbcMapReader, etc.

Develop Reference

ruby bash windows laravel spring algorithm oracle macos go visual-studio