What to prefer in GQL; StringListProperty or ListProperty? - performance

I am building an application with a many-to-many relationship:
An entity of kind 'Picture' can be linked to any number of galleries ('Gallery'), and of course a Gallery can hold any number of Pictures.
So, following Google's suggestion here, I will put a list on 'Picture' that holds the keys of the related 'Gallery' entities. This is the BigTable approach.
(The old-style relational DB approach would be to have a table / entity in between 'Picture' and 'Gallery'.)
Here's my question: when storing the keys, should I go for a "StringListProperty" on 'Picture', or would a "ListProperty(db.Key)" work better?
One reason for a StringListProperty would be that I could also store values other than keys, but that would be dirty style anyway. I am also fairly sure Google suggested not using more than one list on an entity because the indexes will explode, so a string list would leave me a back door.
As for the ListProperty of type Key, one advantage would be the automatic check that each value actually is a Key.
Since it is very easy to convert strings to keys and vice versa, I don't see a strong reason to prefer either list type.
When it comes to performance, I have no idea how I could test this, but it looks like this will be the main factor in the decision.
I'm curious about your input, especially if someone has tested the performance of this or would be kind enough to do it.
Cheers,
//Hannes

Use a db.ListProperty(db.Key) if you're intending to store lists of keys. They will be stored in a binary representation, which is more compact than the string representation you would use in a string list.
You're right that mixing keys with other objects in a list is messy. Having multiple lists in an entity is fine, as long as you don't index more than one of them in the same custom index - that is what causes exploding indexes.
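The "exploding index" effect is combinatorial: a custom index that includes two list properties needs one entry per combination of their values, while a single list only adds one entry per element. The pure-Python sketch below models just that counting; it is not the App Engine API, and the property names `galleries`, `tags` and `owner` are hypothetical.

```python
from itertools import product


def composite_index_rows(entity: dict, indexed_props: list[str]) -> list[tuple]:
    """Model the rows a composite index over `indexed_props` would hold for one entity.

    Each list-valued property contributes one value per row, so the rows are the
    Cartesian product of the property values (a scalar counts as a 1-element list).
    """
    value_lists = []
    for prop in indexed_props:
        value = entity[prop]
        value_lists.append(value if isinstance(value, list) else [value])
    return list(product(*value_lists))


picture = {
    "galleries": ["g1", "g2", "g3"],            # hypothetical gallery keys
    "tags": ["sunset", "beach", "sea", "sky"],  # hypothetical second list property
    "owner": "hannes",
}

# One list in the index: rows grow linearly with the list length.
print(len(composite_index_rows(picture, ["galleries", "owner"])))  # 3
# Two lists in the same index: rows grow as the product of the lengths.
print(len(composite_index_rows(picture, ["galleries", "tags"])))   # 12
```

Keeping each list property in its own index (or out of custom indexes entirely) keeps the growth linear, which is why several lists on one entity are fine.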

Use db.ListProperty(db.Key); it makes fetching the data easier than with strings. Suppose the Gallery model has a property pic_list of type db.ListProperty(db.Key) that contains the keys of the Picture entities (assuming Picture is the name of your entity). Then Picture.get(gallery.pic_list), where gallery is a Gallery instance, fetches all the Picture entities in one call.
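The list-of-keys pattern can be modeled without the App Engine SDK. The sketch below is a plain-Python simulation, not the `db` API: `store`, `put`, `get` and `galleries_containing` are hypothetical stand-ins. `get(gallery.pic_list)` mirrors the batch fetch done by `Picture.get(gallery.pic_list)`, and `galleries_containing` mirrors the datastore rule that an equality filter on a list property matches when any element equals the value.

```python
from dataclasses import dataclass, field


@dataclass
class Picture:
    key: str
    title: str


@dataclass
class Gallery:
    key: str
    name: str
    pic_list: list[str] = field(default_factory=list)  # keys of Picture entities


# Hypothetical in-memory "datastore": key -> entity.
store: dict[str, object] = {}


def put(entity) -> None:
    store[entity.key] = entity


def get(keys: list[str]) -> list:
    """Batch get by a list of keys; None for keys that do not exist."""
    return [store.get(k) for k in keys]


def galleries_containing(picture_key: str) -> list[Gallery]:
    """Reverse lookup: a list property 'equals' a value if any element equals it."""
    return [e for e in store.values()
            if isinstance(e, Gallery) and picture_key in e.pic_list]


put(Picture("p1", "Sunset"))
put(Picture("p2", "Harbour"))
put(Gallery("g1", "Holidays", pic_list=["p1", "p2"]))
put(Gallery("g2", "Best of", pic_list=["p2"]))

print([p.title for p in get(store["g1"].pic_list)])  # ['Sunset', 'Harbour']
print([g.name for g in galleries_containing("p2")])   # ['Holidays', 'Best of']
```

Whichever side holds the list, one direction is a batch get by key and the other is a filter on the list property.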


Marklogic - get list of all unique document structures in a Marklogic database

I want to get a list of all distinct document structures with a count in a Marklogic database.
e.g. a database with these 3 documents:
1) <document><name>Robert</name></document>
2) <document><name>Mark</name></document>
3) <document><fname>Robert</fname><lname>Smith</lname></document>
Would return that there are two unique document structures in the database, one used by 2 documents, and the other used by 1 document.
I am using this xquery and am getting back the list of unique sequence of elements correctly:
for $i in distinct-values(
  for $document in doc()
  return <div>{distinct-values(
    for $element in $document//*/*/name() return <div>{$element}</div>)} </div>)
return $i
I appreciate that this code will not handle duplicate element names but that is OK for now.
My three questions are:
1) Is there a better/more efficient way to do this? I am assuming yes.
2) Is there a way to get back enough detail so that I could build up the xml tree of each unique structure?
3) What is the best way to return the count of each distinct structure, e.g. 2 and 1 in the above example?
If you have a finite list of elements for which you need to do this, consider co-occurrence or other similar solutions: https://docs.marklogic.com/cts:value-co-occurrences
This requires a range index on each element in question.
MarkLogic works best when it can use indexes. The other solution I can think of is to create a hash/checksum of the target content for each document in question and store it with the document (or in a triple, if you happen to have a licence for semantics). Then you would already have a key for the unique combinations.
1) Is there a better/more efficient way to do this? I am assuming yes.
If it were up to me, I would create the document structured in a consistent fashion (like you're doing), then hash it, and attach the hash to each document as a collection. Then I could count the docs in each collection. I can't see any efficient way (using indexes) to get the counts without first writing to the document content or metadata (collection is a type of metadata) then querying against the indexes.
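The hash-and-group idea can be illustrated outside MarkLogic. The Python below is only an illustration, not MarkLogic code: it reduces each document to its element names and nesting (ignoring text content and attributes), hashes that signature with SHA-1, and counts documents per hash. In MarkLogic the hash would instead be written to each document as a collection and counted through the collection lexicon.

```python
import hashlib
import xml.etree.ElementTree as ET
from collections import Counter


def structure_signature(elem: ET.Element) -> str:
    """Serialize only element names and nesting, e.g. 'document(fname(),lname())'."""
    return elem.tag + "(" + ",".join(structure_signature(c) for c in elem) + ")"


def structure_hash(xml_text: str) -> str:
    root = ET.fromstring(xml_text)
    return hashlib.sha1(structure_signature(root).encode("utf-8")).hexdigest()


docs = [
    "<document><name>Robert</name></document>",
    "<document><name>Mark</name></document>",
    "<document><fname>Robert</fname><lname>Smith</lname></document>",
]

# One counter entry per distinct structure.
counts = Counter(structure_hash(d) for d in docs)
print(sorted(counts.values(), reverse=True))  # [2, 1]
```

Unlike the `distinct-values` query in the question, this signature keeps child order and repeated elements; whether two documents with the same elements in a different order count as one structure is a design choice.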
2) Is there a way to get back enough detail so that I could build up the xml tree of each unique structure?
After you get the counts for each collection, you could retrieve one doc from each collection and walk through it to build an empty XML structure. XSLT would probably be a good way to do this if you already know XSLT.
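The answer suggests XSLT for this step. As a language-neutral illustration of the same transform, here is a Python sketch (not XSLT, and not run inside MarkLogic) that copies one sample document while dropping text content and attributes, leaving an empty skeleton.

```python
import xml.etree.ElementTree as ET


def empty_skeleton(elem: ET.Element) -> ET.Element:
    """Copy the element tree, keeping tags and nesting but no text, tails or attributes."""
    copy = ET.Element(elem.tag)
    for child in elem:
        copy.append(empty_skeleton(child))
    return copy


sample = ET.fromstring("<document><fname>Robert</fname><lname>Smith</lname></document>")
print(ET.tostring(empty_skeleton(sample), encoding="unicode"))
# <document><fname /><lname /></document>
```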
3) What is the best way to return the count of each distinct structure, e.g. 2 and 1 in the above example?
Turn on the collection lexicon on your database. Then do something like the following:
for $collection in cts:collections()
return ($collection, cts:frequency($collection))
Not sure I follow exactly what you are after, but I wonder whether this is closer to what you are looking for: functx:distinct-element-paths($doc)
http://www.xqueryfunctions.com/xq/functx_distinct-element-paths.html
Here's a quick example:
xquery version "1.0-ml";
import module namespace functx = "http://www.functx.com" at "/MarkLogic/functx/functx-1.0-nodoc-2007-01.xqy";
let $doc := <document><fname>Robert</fname><lname>Smith</lname></document>
return
functx:distinct-element-paths($doc)
Outputs the following strings (which could be parsed, of course):
document
document/fname
document/lname
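For comparison outside MarkLogic, this Python sketch produces the same kind of output for the sample document. It is an independent reimplementation written for illustration, not a port of the FunctX source, and it assumes documents without namespaces (ElementTree would otherwise report tags as `{uri}local`).

```python
import xml.etree.ElementTree as ET


def distinct_element_paths(root: ET.Element) -> list[str]:
    """Distinct slash-separated element paths, in order of first appearance."""
    paths: dict[str, None] = {}  # dict keys keep insertion order and drop duplicates

    def walk(node: ET.Element, parent_path: str) -> None:
        path = f"{parent_path}/{node.tag}" if parent_path else node.tag
        paths.setdefault(path, None)
        for child in node:
            walk(child, path)

    walk(root, "")
    return list(paths)


doc = ET.fromstring("<document><fname>Robert</fname><lname>Smith</lname></document>")
for p in distinct_element_paths(doc):
    print(p)
# document
# document/fname
# document/lname
```

Joining the returned paths into one string gives another possible per-document key for the hash-and-count approach.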
There are existing third-party tools that may work, depending on the size of the data and the coverage required (is 100% sampling needed?).
Search for "Generate Schema from XML".
Such tools look at a sample set and infer a schema (XSD, DTD, RNG, etc.).
They do an accurate job, but not always in the same way a human would.
If they do not have native MarkLogic integration, you will need to expose a service or export the data for analysis.
Once you have a schema, load it into MarkLogic, and you can query the schema (and the elements validated by it) directly and programmatically in MarkLogic.
If you find a 'generate schema' tool implemented in XSLT, XQuery, or JavaScript, you may be able to import and execute it in-server.

Algorithm and data structure to store First name and last name

Is there an efficient way to store first and last names in a data structure so that we can look a person up by either first or last name? I would consider a binary search tree keyed on first name. It would be efficient for searching by first name, but not for searching by last name. We could also add a second BST keyed on last name. Any ideas on how to implement this efficiently?
What if the question is:
String names[] = { "A B","C D"};
A requirement is to be able to extend this directory dynamically at runtime,
without persistent storage. The directory can eventually grow to hundreds or
thousands of names and must be searchable by first or last name.
Now we can't use hash tables for storage. Any ideas?
Two hash tables: one from first name to person, and one from last name to person.
Simple is best.
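A minimal Python sketch of the two-table idea, with the hypothetical names `Person` and `Directory`. Each table maps a name to a list of people, because several people can share a first or last name, and both tables hold references to the same `Person` objects. Splitting "First Last" on the first space is an assumption based on the sample array.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    first: str
    last: str


class Directory:
    """Two hash indexes over the same Person objects: by first name and by last name."""

    def __init__(self) -> None:
        self.by_first: dict[str, list[Person]] = defaultdict(list)
        self.by_last: dict[str, list[Person]] = defaultdict(list)

    def add(self, full_name: str) -> Person:
        first, last = full_name.split(" ", 1)
        person = Person(first, last)
        # Both tables store a reference to the same object; nothing is copied.
        self.by_first[first].append(person)
        self.by_last[last].append(person)
        return person

    def find_by_first(self, first: str) -> list[Person]:
        return list(self.by_first.get(first, []))  # .get avoids creating empty entries

    def find_by_last(self, last: str) -> list[Person]:
        return list(self.by_last.get(last, []))


d = Directory()
for name in ["A B", "C D", "A D"]:
    d.add(name)
print(d.find_by_first("A"))  # [Person(first='A', last='B'), Person(first='A', last='D')]
print(d.find_by_last("D"))   # [Person(first='C', last='D'), Person(first='A', last='D')]
```

Both lookups are expected O(1), and adding a name at runtime is two appends.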
Why not put both first and last names in a trie?
As a bonus, this way you can even get suggestions for partial names by traversing all entries below the current node (perhaps in an asynchronous call).
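A minimal Python sketch of a single trie holding both first and last names; `NameTrie` is a hypothetical name. Each node records the full names whose first or last name ends there, and `suggest` collects every entry under the node reached by a prefix. Children are kept in a dict for brevity; under a strict no-hash-table rule they could be a sorted list or a fixed array indexed by letter.

```python
class TrieNode:
    def __init__(self) -> None:
        self.children: dict[str, "TrieNode"] = {}
        self.people: list[str] = []  # full names whose first or last name ends here


class NameTrie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, full_name: str) -> None:
        # Index both halves of the name in the same trie.
        for part in full_name.split():
            node = self.root
            for ch in part:
                node = node.children.setdefault(ch, TrieNode())
            if full_name not in node.people:
                node.people.append(full_name)

    def _find(self, prefix: str):
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def exact(self, name: str) -> list[str]:
        node = self._find(name)
        return list(node.people) if node else []

    def suggest(self, prefix: str) -> list[str]:
        """Full names with a first or last name starting with prefix."""
        node = self._find(prefix)
        if node is None:
            return []
        found, stack = set(), [node]
        while stack:  # depth-first walk over everything below the prefix node
            current = stack.pop()
            found.update(current.people)
            stack.extend(current.children.values())
        return sorted(found)


t = NameTrie()
for name in ["Anna Smith", "Sam Jones", "John Andrews"]:
    t.add(name)
print(t.exact("Smith"))  # ['Anna Smith']
print(t.suggest("An"))   # ['Anna Smith', 'John Andrews']
```

Lookup cost depends on the length of the name rather than on how many names are stored.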
Your idea is pretty good, but here's another option: how about implementing two hash tables?
The first hash table would use first names as keys, and the associated value would be either the last name or a pointer to a Name object. The second hash table would use last names as keys, with the first names or pointers to Name objects as the values.
Personally, for the values I would go for a pointer to a Name object, since that approach also works if you'd like to store more information later (e.g. date of birth).
Also, see Does Java have a HashMap with reverse lookup?…, which is specific to Java but the discussion on the data structures is relevant to any language.
Note that structures such as Bidirectional Sorted Maps also allow range searches (which dual hash tables don't).
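Since the follow-up rules out hash tables, and sorted structures also support range searches, here is a Python sketch using two sorted lists and the standard `bisect` module. The class name `SortedNameIndex` and the "First Last" input format are assumptions. Insertion is O(n) because `insort` shifts list elements, which should be acceptable for hundreds or thousands of names; a balanced tree would make it O(log n).

```python
import bisect


class SortedNameIndex:
    """Two sorted lists of (name, full_name) pairs supporting range lookups."""

    def __init__(self) -> None:
        self.by_first: list[tuple[str, str]] = []
        self.by_last: list[tuple[str, str]] = []

    def add(self, full_name: str) -> None:
        first, last = full_name.split(" ", 1)
        bisect.insort(self.by_first, (first, full_name))
        bisect.insort(self.by_last, (last, full_name))

    @staticmethod
    def _between(index: list[tuple[str, str]], lo: str, hi: str) -> list[str]:
        """Full names whose key k satisfies lo <= k < hi (binary search on both ends)."""
        start = bisect.bisect_left(index, (lo, ""))
        end = bisect.bisect_left(index, (hi, ""))
        return [full for _, full in index[start:end]]

    def first_between(self, lo: str, hi: str) -> list[str]:
        return self._between(self.by_first, lo, hi)

    def last_between(self, lo: str, hi: str) -> list[str]:
        return self._between(self.by_last, lo, hi)


idx = SortedNameIndex()
for name in ["Anna Smith", "Sam Jones", "John Andrews", "Bob Jones"]:
    idx.add(name)
print(idx.last_between("J", "K"))   # ['Bob Jones', 'Sam Jones']
print(idx.first_between("A", "C"))  # ['Anna Smith', 'Bob Jones']
```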
If you need to search only by first name or only by last name, then yes, two hash maps are the best option (and notice that you're not duplicating the data, you're partitioning it). But if you don't need to tell the two apart, put both first and last names in a single hash map and don't differentiate between them.

Query core data store based on a transient calculated value

I'm fairly new to the more complex parts of Core Data.
My application has a core data store with 15K rows. There is a single entity.
I need to display a subset of those rows in a table view filtered on a calculated search criteria, and for each row displayed add a value that I calculate in real time but don't store in the entity.
The calculation needs to use a couple of values supplied by the user.
A hypothetical example:
Entity: contains fields "id", "first", and "second"
User inputs: 10 and 20
Search / Filter Criteria: only display records where the entity field "id" is a prime number between the two supplied numbers. (I need to build some sort of complex predicate method here I assume?)
Display: all fields of all records that meet the criteria, along with a derived field (not in the Core Data entity) that is the sum of the "id" field and a random number, so each row in the table view would contain 4 fields:
"id", "first", "second", -calculated value-
From my reading / Googling it seems that a transient property might be the way to go, but I can't work out how to do this given that the search criteria and the resultant property need to calculate based on user input.
Could anyone give me any pointers that will help me implement this code? I'm pretty lost right now, and the examples I can find in books etc. don't match my particular needs well enough for me to adapt them as far as I can tell.
Thanks
Darren.
The first thing you need to do is to stop thinking in terms of fields, rows and columns, as none of those structures are actually part of Core Data. In this case it is important, because Core Data supports arbitrarily complex fetches but the SQLite store does not. So, if you use an SQLite store, your fetches are restricted to those supported by SQLite.
In this case, predicates aimed at SQLite can't perform complex operations such as calculating whether an attribute value is prime.
The best solution for your first case would be to add a boolean attribute isPrime and modify the setter for your id attribute so that it calculates whether the new id value is prime and sets isPrime accordingly. That value is stored in the SQLite store and can be fetched against, e.g. isPrime == YES AND id >= %@ AND id <= %@, with the two user-supplied numbers as the arguments.
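A Python stand-in for this approach, not Core Data code: the hypothetical `Record` class recomputes a stored `is_prime` flag whenever `id` is assigned, so the later filter only compares stored values, which is the kind of predicate an SQLite-backed fetch can evaluate.

```python
def is_prime(n: int) -> bool:
    """Trial division; adequate for ids in a store of roughly 15K objects."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True


class Record:
    """Stand-in for the managed object: setting `id` also refreshes `is_prime`."""

    def __init__(self, id_: int, first: str, second: str) -> None:
        self.first = first
        self.second = second
        self.id = id_  # goes through the setter below

    @property
    def id(self) -> int:
        return self._id

    @id.setter
    def id(self, value: int) -> None:
        self._id = value
        self.is_prime = is_prime(value)  # stored next to id, so a fetch can filter on it


def fetch(records: list[Record], low: int, high: int) -> list[Record]:
    """Equivalent of: isPrime == YES AND id >= low AND id <= high."""
    return [r for r in records if r.is_prime and low <= r.id <= high]


records = [Record(n, f"first{n}", f"second{n}") for n in range(1, 31)]
print([r.id for r in fetch(records, 10, 20)])  # [11, 13, 17, 19]
```

The cost of the primality test is paid once per write instead of on every fetch.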
The second case would simply use a transient property for which you would supply a custom getter to calculate its value when the managed object was in memory.
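In Python terms, a transient attribute with a custom getter behaves like a read-only property: the value is computed whenever it is asked for and is never part of the stored fields. The sketch below uses a hypothetical `Row` class, and the random offset range 0 to 100 is an assumption, since the question does not specify one.

```python
import random
from dataclasses import dataclass


@dataclass
class Row:
    id: int
    first: str
    second: str

    @property
    def calculated_value(self) -> int:
        """Computed on every read (like a transient attribute's getter); never stored."""
        return self.id + random.randint(0, 100)


def table_rows(rows: list[Row]) -> list[tuple[int, str, str, int]]:
    """The four display columns: id, first, second, calculated value."""
    return [(r.id, r.first, r.second, r.calculated_value) for r in rows]


print(table_rows([Row(11, "a", "b"), Row(13, "c", "d")]))
```

Because the value is recomputed on each read, a table view cell would show a new number every time it is redrawn; caching it per display pass is a separate design decision.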
One often-overlooked option is not to use an SQLite store at all but an XML store instead. If the amount of data is relatively small, e.g. a few thousand text attributes with a total memory footprint of a few dozen megabytes, an XML store will be super fast and can handle more complex operations.
SQLite is something of the stunted stepchild in Core Data. It is useful for large data sets and low memory, but with memory becoming ever more plentiful, it's losing its edge. I find myself using it less these days. You should consider whether you need SQLite in this particular case.

Enumerate indexes on a Extensible Storage Engine (ESENT) table

Background
I'm writing an adapter for ESE to .NET and LINQ in a Google Code project called eselinq. One important function I can't seem to figure out is how to get a list of indexes defined for a table. I need to be able to list available indexes so the LINQ part can automatically determine when indexes can be used. This will allow much more efficient plans for user queries if appropriate indexes can be found.
There are two related functions for querying index information:
JetGetTableIndexInfo - get index information by tableID
JetGetIndexInfo - get index information by tableName
These only differ in how the related table is specified (name or tableid). It sounds like these would support the function I want but all the info levels seem to require that I already have a certain index to query information for. The only exception is JET_IdxInfoCount, but that only counts how many indexes are present.
JET_IdxInfo with its JET_INDEXLIST sounds plausible but it only lists the columns on a specific index.
Alternatives
I am aware that I could get the index information another way, like annotations on .NET types corresponding to database tables, or by requiring a index mapping be provided ahead of time. I think there's enough introspection implemented to make everything else work out of the box without the user supplying extra information, except for this one function.
Another option may be to examine the system tables to find related index objects, but this would mean depending on an undocumented interface.
To satisfy this question, I want a supported method of enumerating the indexes (just the name would be sufficient) on a table.
You are correct about JetGetTableIndexInfo, JetGetIndexInfo and JET_IdxInfo. The twist is that the data is returned in a somewhat complex format: a temporary table containing a row for the index followed by a row for each column in the table. To get just the index names, you need to skip the column rows (the column count is given by the value of the columnidcColumn column in the first row).
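Assuming the layout described in this answer (an index row whose column-count value says how many column rows follow it), collecting the names reduces to a skip loop. This is a plain-Python model of that loop over hypothetical row dictionaries, not ESENT or ManagedEsent code; in real code the rows would come from moving a cursor over the temporary table, and the count from the column identified by `columnidcColumn`.

```python
def index_names(rows: list[dict]) -> list[str]:
    """Collect index names, skipping the column rows that follow each index row."""
    names = []
    i = 0
    while i < len(rows):
        row = rows[i]
        names.append(row["index_name"])
        # Move past this index row plus its `column_count` column rows.
        i += 1 + row["column_count"]
    return names


# Hypothetical temporary-table contents for two indexes.
rows = [
    {"index_name": "primary", "column_count": 1},
    {"column_name": "id"},
    {"index_name": "by_name", "column_count": 2},
    {"column_name": "last"},
    {"column_name": "first"},
]
print(index_names(rows))  # ['primary', 'by_name']
```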
For a .NET example of how to decipher this, look at the ManagedEsent project. In the MetaDataHelpers.cs file there is a method called GetIndexInfoFromIndexlist that extracts all the data from the temporary table.

iterating over linq entity column

I need to insert a record with LINQ.
I have a NameValueCollection with the data from a form post,
which started out in the name=value&name2=value2 format.
The thing is, I need to insert all these values into the table, but of course the table fields are typed, so I need to convert the data before inserting it.
I could of course explicitly do
linqtableobj.columnproperty = Convert.ToWhatever(value);
but I have many columns in the table, and the data coming back from the form doesn't always contain all the fields in the table.
I thought I could iterate over the LINQ object's columns, getting each column's data type to convert the appropriate value from the form data.
That's all fine, but then I'm still stuck doing
linqtableobj.columnproperty = convertedValue
...once for every column in the table. What I'd like is something like:
foreach(col in newlinqrowobj)
{
newlinqobj[col] = convert.changetype(namevaluecollection[col.name],col.datatype)
}
Clearly I can't do that, but is anything like it possible? Or
is it possible to loop over the columns of the new 'record', setting the values as I go, and grabbing the types at that point to do the conversion?
Stumped, I am.
thanks
nat
If you have some data type with a hundred different properties, and you want to copy those into a completely different data type with a hundred different properties, then somehow somewhere in your code you are going to have to define a hundred different "mapping" instructions. It doesn't matter what framework you are using, or whether the "mapping" instructions are lines of C# code, XML elements, lambda functions, proprietary "stuff", or whatever. There's no getting away from it.
Bearing that in mind, having one line of code per property looks to me like the fastest, simplest, most readable and maintainable solution.
If I understood your problem correctly, you could use reflection (or dynamic code generation, if it is performance-sensitive) to get around your typing problems.
There is a pretty good description of how to do something like this at CodeProject.
Basically, you get a PropertyInfo for the property you want to set (if it's not a property, I think you would need dynamic code generation) and use its SetValue method (after calling the appropriate Convert.ChangeType, of course). This basically circumvents the static typing, so there you are.
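The same idea in Python terms, as an analogy rather than C#: `dataclasses.fields` plays the role of `GetProperties()`, the declared field type stands in for `PropertyInfo.PropertyType`, and `setattr` for `SetValue`. The `Customer` class and the converter table are hypothetical. A converter table is used instead of a generic change-type call because, for example, `bool("false")` is `True` in Python.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class Customer:  # hypothetical table row
    name: str = ""
    age: int = 0
    balance: float = 0.0
    active: bool = False
    joined: date = date(2000, 1, 1)


# Form values arrive as strings; map each declared field type to a parser.
CONVERTERS = {
    str: str,
    int: int,
    float: float,
    bool: lambda s: s.strip().lower() in ("1", "true", "yes", "on"),
    date: date.fromisoformat,
}


def populate(obj, form: dict[str, str]):
    """Set every field present in `form`, converting the string to the field's type."""
    for f in fields(obj):
        if f.name in form:  # form posts may omit some columns
            convert = CONVERTERS[f.type]
            setattr(obj, f.name, convert(form[f.name]))
    return obj


form = {"name": "Nat", "age": "42", "active": "true", "extra": "ignored"}
print(populate(Customer(), form))
# Customer(name='Nat', age=42, balance=0.0, active=True, joined=datetime.date(2000, 1, 1))
```

This still needs one mapping rule per type rather than per column, which is the trade-off the first answer describes.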
