Is there a search construct that would allow a client to search for Locations within another location (e.g. a bed within a hospital or campus)?
There are tree-traversal notions for ValueSets (code systems), but I don't see anything for the part-of trees.
It would be possible to define a search criterion that did this, but there isn't one at the moment. If you think it would be widely used, you can submit a change request to define one.
Context: I'm working on an analyzer for useragent strings (Yauaa) and as part of this analysis I want to make an educated guess what brand of the device should be reported. I have an implementation that I need to rewrite to be a lot more efficient.
Because I do not want to have a complete list of all devices I want to do the detection based on the prefix of the model.
So I have a dataset with prefixes and the brand that is associated:
"GT-" --> "Samsung"
"LLD-" --> "Huawei"
And then I want to do a .get("GT-1234124") which should result in "Samsung" because that is the "longest matching prefix".
I had a look at the Trie structure but that seems to be for the opposite situation. What I understand is that you start with a set of values and you can efficiently get all the values that start with the provided prefix.
If I were to implement this from scratch I would use a tree similar to the Trie but walk around it differently. What I'm looking for is a datastructure that does what I need as fast as possible.
What datastructure do you recommend for this usecase?
Is there an existing (proven) implementation I can use?
I did some digging into datastructures and found that essentially the Trie structure is what I need with a different way of walking around the structure.
Since this structure is really simple I created my own implementation that works very well.
See:
https://github.com/nielsbasjes/yauaa/blob/master/analyzer/src/main/java/nl/basjes/parse/useragent/utils/PrefixLookup.java
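For readers who want to see the idea without opening the repository, here is a minimal sketch of a longest-matching-prefix lookup on a trie. It is written in Python for brevity and is not the linked Java implementation; the class name `PrefixLookup` is reused only for illustration, and the dict-based node layout is an assumption.

```python
class PrefixLookup:
    """Longest-matching-prefix lookup backed by a character trie (illustrative sketch)."""

    def __init__(self, prefixes):
        # Each node is a dict: child nodes are stored under their character,
        # and the value of a stored prefix is kept under the key None.
        self._root = {}
        for prefix, value in prefixes.items():
            node = self._root
            for ch in prefix:
                node = node.setdefault(ch, {})
            node[None] = value  # marks the end of a stored prefix

    def get(self, text):
        """Return the value of the longest stored prefix of text, or None."""
        node = self._root
        best = node.get(None)  # value of an empty-string prefix, if one was stored
        for ch in text:
            node = node.get(ch)
            if node is None:
                break  # no stored prefix continues with this character
            if None in node:
                best = node[None]  # a longer stored prefix matches
        return best


brands = PrefixLookup({"GT-": "Samsung", "LLD-": "Huawei"})
print(brands.get("GT-1234124"))  # Samsung
print(brands.get("XYZ-1"))       # None
```

The walk follows the input one character at a time and remembers the last node that ended a stored prefix, so a lookup costs at most one step per character of the input, regardless of how many prefixes are stored.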
Updates:
I wrote an article about this https://techlab.bol.com/finding-the-longest-matching-string-prefix-fast/
I put my implementation into a separate library which I opensourced and which is already available via maven central. See https://github.com/nielsbasjes/prefixmap
The use case for the graph database is to have users and contents (vertices) linked by likes, favorites and reports relations (edges). The problem I have is that I will sometimes need to show the reported contents (from any users). Since this is not a standard graph traversal, I fear this would have a big performance hit.
Is it possible to index the edges of type "reports" to quickly get the list of all contents that have been reported? Is there a better way to do this?
No, you cannot (don't need to) explicitly manage indices. Neptune uses a novel indexing strategy based on semi-clustered indices and offers excellent index performance out of the box. There is no need for custom indices.
From Neptune FAQs: https://aws.amazon.com/neptune/faqs/
Do I need to create indices on my data with Amazon Neptune?
No, existing graph database users are often forced to try and outguess the vendor implementation. Explicitly maintaining indices is just one aspect of that. Amazon Neptune does not require you to create specific indices to achieve good query performance, and it minimizes the need for such second guessing of the database design.
Can you share some details on the specific queries that you are looking for?
I am facing a problem in finding out the right definition of the level and the type of the Gene Ontology term.
I know that the Type of the GO term is related to isA graph without considering the partOf relationships. But I am still confused when I am having the following case:
When finding the level:
If I have GO:123 (assuming it's the root) pointing to GO:345 and to the term GO:567 with isA relations in both links.
Also, GO:345 is pointing to the term GO:567 with partOf relationship.
Now, what's the level of GO:567? Is it 2 because the root is pointing to it? Or is it 3 because the son of the root is pointing to it? How should I deal with such cases for both the level and the type of the graph?
You may get more helpful answers taking this question to Biostar. Or perhaps the GO website and GO mailing lists.
The issues you raise arise directly out of the semantics of the GO. It is a directed acyclic graph (DAG) so nodes do not have a single definition of depth or level. You could choose to use min(depth) or max(depth) if you want one consistent definition but that may not be a good choice for your application, which you don't describe. Likewise the edges of the graph have different properties describing different biological relationships ("is a", "part of" and "regulates").
How you best process these properties depends on what you want to accomplish or what queries you want to make of the data. Because the annotation density of GO terms to genes varies greatly across organisms and terms, you may be better off considering metrics that measure the informativeness of annotations in your particular context rather than 'depth' in the graph.
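To make the min(depth) versus max(depth) point concrete, here is a small sketch that computes both for the three edges in the question. The edge direction (parent to child, as the question describes "pointing to"), the relation names `is_a` and `part_of`, and the convention that the root is at level 1 are assumptions taken from the question, not from the GO files themselves.

```python
from collections import defaultdict

# Edges from the question, written parent -> child with their relation type.
edges = [
    ("GO:123", "GO:345", "is_a"),
    ("GO:123", "GO:567", "is_a"),
    ("GO:345", "GO:567", "part_of"),
]


def levels(edges, root, relations):
    """Return {term: (min_level, max_level)} with the root at level 1,
    following only edges whose relation is in `relations`.
    Assumes the graph is acyclic; walks every path, so it suits small graphs only."""
    children = defaultdict(list)
    for parent, child, relation in edges:
        if relation in relations:
            children[parent].append(child)

    result = {}

    def visit(term, level):
        lo, hi = result.get(term, (level, level))
        result[term] = (min(lo, level), max(hi, level))
        for child in children[term]:
            visit(child, level + 1)

    visit(root, 1)
    return result


print(levels(edges, "GO:123", {"is_a", "part_of"})["GO:567"])  # (2, 3)
print(levels(edges, "GO:123", {"is_a"})["GO:567"])             # (2, 2)
```

With both relation types followed, GO:567 has level 2 by the shortest path and level 3 by the longest; restricting the walk to `is_a` edges removes the ambiguity in this particular example.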
A new project with some interesting requirements has arrived on my desk. I need to develop a searchable directory of businesses, with a focus on delivering relevant results based on arbitrary search queries. The businesses can be of any niche; there's no one area that is more represented than another.
When googling for things like "search algorithm" or "content relevance algorithm," all I get are references to Google's "Mystical Algorithm of the Old Gods" and SEO firms.
Does the relevance value of MySQL's full text Match() function have what it takes for the task? I've never used it, but I'm definitely going to do some testing. Also, since this will largely be a human edited directory, I can assume that we can add weighted factors like tagging and categories. What would be a good way to combine these factors with MySQL's Match() relevancy?
I'm also open to ideas that I've not discussed here.
For an example of information-retrieval-based techniques, look up TF-IDF or BM25.
For machine-learning-based techniques, look up RankNet and its variants from MSR.
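As a rough illustration of what BM25 computes, here is a minimal sketch in plain Python. The whitespace tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, the non-negative IDF variant, and the sample listings are all assumptions chosen for the example; a real search engine adds stemming, stop words and much more.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25.
    Tokenization is a plain lowercase whitespace split (an assumption).
    Expects a non-empty list of documents."""
    docs = [doc.lower().split() for doc in documents]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher weight; the "1 +" keeps the weight non-negative.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates with k1 and is normalized by document length via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical directory listings, for illustration only.
listings = [
    "family pizza restaurant with wood fired pizza oven",
    "plumber for emergency pipe repair",
    "italian restaurant and wine bar",
]
scores = bm25_scores("pizza restaurant", listings)
ranked = sorted(zip(scores, listings), reverse=True)
```

One possible way to bring in hand-edited tags or categories is to add a fixed boost to a listing's score when a query term matches one of its tags; the size of that boost would have to be tuned against real queries.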
If you have hand edited data, have a look at Oracle text search. In one of my previous projects we had some good results.
I was not directly involved in the database setups, but I know that the results were very welcome. (Before this they had just keyword based search).
Use a search engine like Solr to index the data. You can still use MySQL to hold the data, but for searches use a search engine.
In our application we have a repository that contains things (they are called methods and queries, but this is not particularly relevant for this question). Each thing has a title, description (though some may lack both) and some other data. Users save things to repository and load and use things from repository.
I wonder what is the best way to organize the repository from a usability point of view. There seem to be two major approaches. The first approach is to put things in folders, subfolders and so on, and have a hierarchical structure similar to a filesystem. The second approach (that has become fashionable) is to have a flat space and assign zero or more tags to each thing, so that users can view a list of things for a particular tag.
Currently we use flat space, tags and search. It appears to be somewhat unmanageable. I am not sure if switching to folders/subfolders will make it better.
I would like to learn more about the pros and cons of each approach and what properties of the collection and the things themselves suggest using one or another approach or a combination of both. If anybody can point me to some studies or discussions of those, I would really appreciate that.
There is no reason you can't use both methods. To some extent, finding things depends on what the thing is and why it is being looked for. A hierarchical design can work well when somebody knows what they are looking for, and a tag/keyword-based system can work better when the structure is less obvious.
A network structure that links similar things can also be very good, as you can see with the internet or Wikipedia.
I use the law of symmetry to help me in this situation.
First you build the tree-like structure in the back end and then build the tagging system for the front end.
You use both to organize your data collection.
A tag cloud works better than a hierarchy if
the taxonomy is uncertain
("Now is this a small car or a large truck?")
there is no central authority for classification
there is no obvious or natural order between the classes
(cars can be classified by color or by size, there is no obvious rank between color and size)
new categories may be created on the fly
Otherwise, a hierarchy gives more confidence in completeness, as every item has exactly one obviously correct location: did I find all documents about birds? Is there really no document about five-story houses?
Tag clouds need some maintenance, and I am not sure whether this can be left entirely to users:
dealing with synonyms, merging tags, clarifying tags (e.g. is "blue" a feeling or a color?)
Another option is attribute-value pairs. They can be built upon a well-maintained tag cloud, e.g. by grouping the "red / black / blue" tags under "color". They can also work with continuous values (such as age, date, or even multidimensional ones like color), where the search can be extended to similar values if there are not enough results.
However, this requires knowing ahead of time which search criteria users need. If you need to introduce a new category, you need to re-tag the entire body of documents.
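The idea of extending a search to similar values can be sketched as follows. This is only an illustration: the function name `search_numeric`, the dict-based item layout, the step size and the tolerance cap are all assumptions, not part of any existing system.

```python
def search_numeric(items, attribute, target, min_results, step=1.0, max_tolerance=10.0):
    """Return (matches, tolerance): items whose attribute lies within `tolerance`
    of `target`. The tolerance starts at 0 and grows by `step` until at least
    `min_results` items match or `max_tolerance` is reached."""
    tolerance = 0.0
    while True:
        matches = [
            item for item in items
            if attribute in item and abs(item[attribute] - target) <= tolerance
        ]
        if len(matches) >= min_results or tolerance >= max_tolerance:
            return matches, tolerance
        tolerance += step


# Hypothetical repository entries with attribute-value pairs.
things = [
    {"title": "A", "color": "red", "age": 3},
    {"title": "B", "color": "blue", "age": 5},
    {"title": "C", "color": "red", "age": 8},
]
matches, tolerance = search_numeric(things, "age", target=4, min_results=2)
print([m["title"] for m in matches], tolerance)  # ['A', 'B'] 1.0
```

No item has an age of exactly 4, so the search widens by one step and returns the two nearest items. Items that lack the attribute are simply skipped, which is where the re-tagging cost mentioned above shows up.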
See also my request for clarification: what are the problems? Not enough tagging? Tagging too distinct? Users not finding what they are looking for? Users not confident in search results?