eXist-db collection sort - xpath

Following on from this question about navigating collections using pos:
In eXist 4.7 I have a collection in myapp/data/ which contains thousands of TEI XML documents. I use the following solution from Martin Honnen to get the documents before and after a given document:
let $data := myapp/data
let $examples := $data/tei:TEI[@type="example"]
for $example at $pos in $examples
where $example/@xml:id = 'TC0005'
return (
$examples[$pos - 1],
$example,
$examples[$pos + 1]
)
With this I would have expected $examples[$pos - 1] to produce document 'TC0004' and $examples[$pos + 1] to produce 'TC0006' (based on the sort order seen in eXide collection navigation view for example). They do not, producing the inverse instead.
Honnen and Michael Kay responded that
ordering of documents within a collection is very much processor-dependent
Applying an order by $example/@xml:id ascending clause did not change the result for the better.
So, the question is how can I impose an alpha-numeric order on $data?
Many thanks.

It seems at the XQuery level you can change let $examples := $data/tei:TEI[@type="example"] to
let $examples := sort($data/tei:TEI[@type="example"], (), function($e) { $e/@xml:id })
(assuming the XQuery/XPath 3.1 higher-order sort function is available) or to
let $examples := for $e in $data/tei:TEI[@type="example"] order by $e/@xml:id return $e
using the order by clause.
I don't know whether exist-db has some way to impose an order during the creation or during the selection of a collection.

Based on experience with older versions of eXist, the $pos value while going through a loop is not the sorted position order. It is the position while going through.
What you first want to do is create an ordered list, then get the three items from the list you're looking for.
let $data := myapp/data[tei:TEI/@type eq 'example']
let $examples := for $e in $data order by $e/@xml:id ascending return $e
let $pos := index-of($examples/@xml:id, 'TC0005')
return if (count($pos) eq 1) then (
if ($pos gt 1) then $examples[$pos - 1] else (),
$examples[$pos],
$examples[$pos + 1]
) else ()
A potential problem with this approach is that you'll have to sort all items every time. Creating a sorted cached list may alleviate this problem and would also allow for a much more efficient query, where you can use preceding-sibling and following-sibling from the query result.
Another potential solution, if the naming convention for the IDs is consistent, would be to query the before and after IDs.
The check that $pos contains exactly one item guards against cases where @xml:id is not unique (yes, that would be against the spec, but it happens in real-world data) or where no matching item exists. Keep in mind that index-of returns a sequence of positions, which may hold zero, one, or several values.
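The sort-then-look-up approach described above is not specific to XQuery. As a minimal sketch in Python (the function name neighbours and the sample IDs are made up for illustration), the steps are: sort once, locate the target, and return the items on either side, refusing to answer when the ID is missing or duplicated:

```python
def neighbours(ids, target):
    """Return (previous, current, next) around target in sorted order.

    Returns None when target is absent or not unique, which mirrors the
    count($pos) eq 1 check in the XQuery above.
    """
    ordered = sorted(ids)
    positions = [i for i, value in enumerate(ordered) if value == target]
    if len(positions) != 1:
        return None
    pos = positions[0]
    previous = ordered[pos - 1] if pos > 0 else None
    following = ordered[pos + 1] if pos + 1 < len(ordered) else None
    return previous, ordered[pos], following


print(neighbours(["TC0006", "TC0004", "TC0005"], "TC0005"))
# → ('TC0004', 'TC0005', 'TC0006')
```

Note that a plain string sort only gives the expected alpha-numeric order when the numeric part of the IDs is zero-padded to a fixed width, as it is in 'TC0005'.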

Related

Iterate over table in order of value

Let's say I have a table like so:
{
value = 4
},
{
value = 3
},
{
value = 1
},
{
value = 2
}
and I want to iterate over this and print the value in order so the output is like so:
1
2
3
4
How do I do this? I understand how to use ipairs, pairs, and table.sort, but that only works if using table.insert and the key is valid; I need to be able to loop over this in order of the value.
I tried a custom function but it simply printed them in the incorrect order.
I have tried:
Creating an index and looping that
Sorting the table (throws error: attempt to perform __lt on table and table)
And a combination of sorts, indexes and other tables that not only didn't work, but also made it very complicated.
I am well and truly stumped.
"Sorting the table"
This was the right solution.
"(throws error: attempt to perform __lt on table and table)"
Sounds like you tried to use a < b.
For Lua to be able to sort values, it has to know how to compare them. It knows how to compare numbers and strings, but by default it has no idea how to compare two tables. Consider this:
local people = {
{ name = 'fred', age = 43 },
{ name = 'ted', age = 31 },
{ name = 'ned', age = 12 },
}
If I call sort on people, how can Lua know what I intend? It doesn't know what 'age' or 'name' means or which I'd want to use for comparison. I have to tell it.
It's possible to add a metatable to a table which tells Lua what the < operator means for a table, but you can also supply sort with a callback function that tells it how to compare two objects.
You supply sort with a function that receives two values and you return whether the first is "less than" the second, using your knowledge of the tables. In the case of your tables:
table.sort(t, function(a,b) return a.value < b.value end)
for i,entry in ipairs(t) do
print(i,entry.value)
end
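The same idea carries over to other languages. As a rough Python sketch (not Lua; the names records and by_value are illustrative), sorting a list of dicts directly fails with a TypeError much like Lua's __lt error, and the fix is again to supply the comparison. One difference worth noting: Lua's callback returns a boolean "less than", whereas a Python comparator returns a negative, zero, or positive number.

```python
from functools import cmp_to_key

records = [{"value": 4}, {"value": 3}, {"value": 1}, {"value": 2}]


def by_value(a, b):
    # Negative when a sorts first, positive when b sorts first, 0 when equal.
    return (a["value"] > b["value"]) - (a["value"] < b["value"])


for entry in sorted(records, key=cmp_to_key(by_value)):
    print(entry["value"])
```

In everyday Python the shorter form sorted(records, key=lambda e: e["value"]) does the same job; the comparator version is shown only because it maps more directly onto the Lua callback.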
If you want to leave the original table unchanged, you could create a custom 'sort by value' iterator like this:
local function valueSort(a,b)
return a.value < b.value;
end
function sortByValue( tbl ) -- use as iterator
-- build new table to sort
local sorted = {};
for i,v in ipairs( tbl ) do sorted[i] = v end;
-- sort new table
table.sort( sorted, valueSort );
-- return iterator
return ipairs( sorted );
end
When sortByValue() is called, it clones tbl to a new sorted table, and then sorts the sorted table. It then hands the sorted table over to ipairs(), and ipairs outputs the iterator to be used by the for loop.
To use:
for i,v in sortByValue( myTable ) do
print(v)
end
While this ensures your original table remains unaltered, it has the downside that each time you do an iteration the iterator has to clone myTable to make a new sorted table, and then table.sort that sorted table.
If performance is vital, you can greatly speed things up by 'caching' the work done by the sortByValue() iterator. Updated code:
local resort, sorted = true;
local function valueSort(a,b)
return a.value < b.value;
end
function sortByValue( tbl ) -- use as iterator
if not sorted then -- rebuild sorted table
sorted = {};
for i,v in ipairs( tbl ) do sorted[i] = v end;
resort = true;
end
if resort then -- sort the 'sorted' table
table.sort( sorted, valueSort );
resort = false;
end
-- return iterator
return ipairs( sorted );
end
Each time you add or remove an element to/from myTable set sorted = nil. This lets the iterator know it needs to rebuild the sorted table (and also re-sort it).
Each time you update a value property within one of the nested tables, set resort = true. This lets the iterator know it has to do a table.sort.
Now, when you use the iterator, it will try and re-use the previous sorted results from the cached sorted table.
If it can't find the sorted table (eg. on first use of the iterator, or because you set sorted = nil to force a rebuild) it will rebuild it. If it sees it needs to resort (eg. on first use, or if the sorted table was rebuilt, or if you set resort = true) then it will resort the sorted table.
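The two-flag caching scheme described above can be sketched in Python as a small class. This is an illustration under assumptions rather than a port of the Lua code: the class name SortedView, its method names, and the sort_calls counter are all made up for the example.

```python
class SortedView:
    """Cache a sorted copy of items; rebuild or re-sort only when told to."""

    def __init__(self, items, key):
        self._items = items
        self._key = key
        self._sorted = None   # None means the copy must be rebuilt
        self._resort = True   # True means the copy must be sorted again
        self.sort_calls = 0   # demonstration only: counts real sorts

    def invalidate(self):
        """Call after adding or removing elements (like sorted = nil)."""
        self._sorted = None

    def mark_dirty(self):
        """Call after changing a value in place (like resort = true)."""
        self._resort = True

    def __iter__(self):
        if self._sorted is None:
            self._sorted = list(self._items)  # shallow copy, original untouched
            self._resort = True
        if self._resort:
            self._sorted.sort(key=self._key)
            self._resort = False
            self.sort_calls += 1
        return iter(self._sorted)


data = [{"value": 4}, {"value": 3}, {"value": 1}]
view = SortedView(data, key=lambda e: e["value"])
first = [e["value"] for e in view]
second = [e["value"] for e in view]   # served from the cache, no new sort
data.append({"value": 2})
view.invalidate()
third = [e["value"] for e in view]
print(first, second, third, view.sort_calls)
```

As with the Lua version, correctness depends on the caller remembering to invalidate the cache after every change; forgetting to do so silently yields stale order.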

Sort Google Spreadsheet With Multiple Criteria Using Script

I have a spreadsheet that I update on a regular basis. I also have to re-sort the spreadsheet when finished because of the changes made. I need to sort with multiple criteria like the below settings. I have searched for examples but my Google search skills have failed me.
Sort range from A1:E59
[x] Data has header rows
sort by "Priority" A > Z
then by "Open" Z > A
then by "Project" A > Z
Mogsdad's answer works fine if none of your cells have values automatically calculated via a formula. If you do use formulas, though, then that solution will erase all of them and replace them with static values. And even so, it is more complicated than it needs to be, as there's now a built-in method for sorting based on multiple columns. Try this instead:
function onEdit(e) {
var priorityCol = 1;
var openCol = 2;
var projectCol = 3;
var sheet = SpreadsheetApp.getActiveSheet();
var dataRange = sheet.getDataRange();
dataRange.sort([
{column: priorityCol, ascending: true},
{column: openCol, ascending: false},
{column: projectCol, ascending: true}
]);
}
Instead of making a separate function, you can use the built-in onEdit() function, and your data will automatically sort itself when you change any of the values. The sort() function accepts an array of criteria, which it applies one after the other, in order.
Note that with this solution, the first column in your spreadsheet is column 1, whereas if you're doing direct array accesses like in Mogsdad's answer, the first column is column 0. So your numbers will be different.
That is a nice specification, a great place to start!
Remember that Google Apps Script is, to a large extent, JavaScript. If you extend your searching into JavaScript solutions, you'll find plenty of examples of array sorts here on SO.
As it happens, much of what you need is in Script to copy and sort form submission data. You don't need the trigger part, but the approach to sorting can be easily adapted to handle multiple columns.
The workhorse here is the comparison function-parameter, which is used by the JavaScript Array.sort() method. It works through the three columns you've indicated, with ascending or descending comparisons. The comparisons used here are OK for Strings, Numbers and Dates. It could be improved with some cleaning up, or even generalized, but it should be pretty fast as-is.
function sortMySheet() {
var sheet = SpreadsheetApp.getActiveSheet();
var dataRange = sheet.getDataRange();
var data = dataRange.getValues();
var headers = data.splice(0,1)[0]; // remove headers from data
data.sort(compare); // Sort 2d array
data.splice(0,0,headers); // replace headers
// Replace with sorted values
dataRange.setValues(data);
};
// Comparison function for sorting two rows
// Returns -1 if 'a' comes before 'b',
// +1 if 'b' before 'a',
// 0 if they match.
function compare(a,b) {
var priorityCol = 0; // Column containing "Priority", 0 is A
var openCol = 1;
var projectCol = 2;
// First, compare "Priority" A > Z
var result = (a[priorityCol] > b[priorityCol] ) ? 1 :
(a[priorityCol] < b[priorityCol] ? -1 : 0);
if (result == 0) {
// "Priority" matched. Then compare "Open" Z > A
result = (b[openCol] > a[openCol] ) ? 1 :
(b[openCol] < a[openCol] ? -1 : 0);
}
if (result == 0) {
// "Open" matched. Finally, compare "Project" A > Z
result = (a[projectCol] > b[projectCol] ) ? 1 :
(a[projectCol] < b[projectCol] ? -1 : 0);
}
return result;
}
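The cascading comparison described above (compare the first column, fall through to the next only on a tie, and swap the operands for a descending column) can be checked outside a spreadsheet. Here is a small Python sketch of the same logic; the column positions and sample rows are invented for the example.

```python
from functools import cmp_to_key

PRIORITY, OPEN, PROJECT = 0, 1, 2  # hypothetical column positions


def cmp(x, y):
    """Three-way comparison: -1, 0, or 1."""
    return (x > y) - (x < y)


def compare_rows(a, b):
    result = cmp(a[PRIORITY], b[PRIORITY])      # ascending
    if result == 0:
        result = cmp(b[OPEN], a[OPEN])          # descending: operands swapped
    if result == 0:
        result = cmp(a[PROJECT], b[PROJECT])    # ascending
    return result


rows = [
    ["B", 1, "x"],
    ["A", 1, "z"],
    ["A", 2, "y"],
    ["A", 1, "a"],
]
rows.sort(key=cmp_to_key(compare_rows))
for row in rows:
    print(row)
```

Within priority "A" the row with Open = 2 comes first because that column is descending, and the two rows with Open = 1 are then ordered by project.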
Try this using the Apps Script sort instead of the native JavaScript. I had the same issue with sorting the header row(s) and this solved the issue.
So I think something like this should work:
function onOpen() {
SpreadsheetApp.getActiveSpreadsheet()
.getSheetByName("Form Responses 1").sort(2);
}
Regarding sorting by multiple columns, you can chain that sort() method, with the final sort() having the highest priority, and the first sort() the lowest. So something like this should sort by Start date, then by End date:
function onOpen() {
SpreadsheetApp.getActiveSpreadsheet()
.getSheetByName("Form Responses 1").sort(3).sort(2);
}
Reference link:
https://support.google.com/docs/thread/16556745/google-spreadsheet-script-how-to-sort-a-range-of-data?hl=en
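Chaining sorts so that the last one has the highest priority only works when each sort is stable, that is, when rows that compare equal keep their previous relative order. Python's built-in sort is documented as stable, so the trick can be illustrated there; the sample rows and column positions below are made up, and nothing here makes a claim about how the Sheets sort() is implemented.

```python
from operator import itemgetter

# (name, start, end) with hypothetical values
rows = [
    ("task c", 2, 9),
    ("task a", 1, 7),
    ("task b", 2, 3),
    ("task d", 1, 5),
]

# Lowest-priority key first, highest-priority key last.
chained = sorted(rows, key=itemgetter(2))      # by end
chained = sorted(chained, key=itemgetter(1))   # then by start (wins ties last)

# The same result in one pass with a compound key.
single_pass = sorted(rows, key=itemgetter(1, 2))

print(chained == single_pass)
```

The single-pass form is usually easier to read when all keys sort in the same direction; the chained form is handy when the directions differ.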
Not sure if this is still relevant, but you can use the sort() function to define another tab as a sorted version of the original data.
Say your original data is in a tab named Sheet1; I'm also going to act as though your Priority, Open, and Project columns are A, B, and C, respectively.
Create a new tab, and in cell A1 type:
=sort(Sheet1!A1:E59, 1, TRUE, 2, FALSE, 3, TRUE)
The first argument specifies the sheet and range to be sorted, followed by three pairs: the first of each pair specifies the column (A=1, B=2, etc.), and the second specifies ascending (TRUE) or descending (FALSE).

xquery, preserve sort order

Is there any way to preserve sort order in xquery? My problem is that the data has to get passed to the MVC framework's get-response() function on the return, so I think it's automatically reverting to document order. I thought that doing the sort right in the first parameter of the subsequence() function would capture the first 'n' items after they are sorted, but it does not. I also tried having the $search-results parameter sorted before the call to subsequence(), but that did not work either. See the following code:
let $data :=
<figures count="{$count}"
mediatypes="{$mtypes}"
start="{$start}"
end="{$start+$myns:image-paging-default}"
page="{$page}"
increment="{$myns:image-paging-default}"
total-pages="{
if ($count lt $myns:image-paging-default) then
1
else
ceiling(($count + 1) div $myns:image-paging-default)
}">
{
subsequence(
( for $item in ($search-results)
order by $item//figure/@ftype descending
return $item),
$start,
$myns:image-paging-default)
}
</figures>
let $sidebar := xdmp:get-server-field('imagefacets')
return utils:get-response($req, ($data,$sidebar) )

Performing linq search using substring - is there a better way?

I'm using Entity Framework and have started writing queries using linq.
I have a Store table where each store has a name field. One of the searches I want to do is by initial letter and I'm trying to find the best way of achieving it. By best I mean most efficient.
Most searches only look for one key, 'A', 'B', 'C' etc but one of the searches is different in that the group '0-9' will contain a list of keys, one for 0, one for 1 etc. So my starting point is a list of some kind.
I then need to say if a Store name starts with any key in the list, because a Store does not store the initial letter in the table.
There are 2 things I'm looking for help with. Firstly, how to get the linq working as I've outlined. Secondly, any advice as to whether this is the best/only approach to bringing back the data or whether there may be a better way.
Your question is not very clear, but from what I understand you want to search for stores which begin with a key found in a list of keys. You can achieve that like this:
List<string> keys = new List<string>() { "A", "B", "M" };
var result = stores.Where(store => keys.Any(key => store.Name.StartsWith(key)));
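To make the "starts with any key in the list" test concrete, here is an in-memory sketch in Python. It only illustrates the matching logic; it says nothing about the SQL that Entity Framework would generate, and the store names and the helper stores_starting_with are invented for the example.

```python
def stores_starting_with(names, keys):
    """Return the names that begin with any of the given keys."""
    prefixes = tuple(keys)  # str.startswith accepts a tuple of prefixes
    return [name for name in names if name.startswith(prefixes)]


names = ["Aldi", "Boots", "7-Eleven", "Marks", "99p Store", "Tesco"]
digits = [str(d) for d in range(10)]  # the '0-9' group as ten separate keys

print(stores_starting_with(names, ["A", "B", "M"]))
print(stores_starting_with(names, digits))
```

Treating a single letter as a one-element key list means the letter searches and the '0-9' search can share the same code path.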
If the queries can differ, then for the letters:
stores.Where(s => s.Name.First() == c);
And for numerals:
stores.Where(s => char.IsDigit(s.Name.First()));
If they must be the same, then I suggest a character range:
stores.Where(s => c1 <= s.Name.First() && s.Name.First() <= c2)
You can represent the ranges by Tuples if you want:
Tuple<char, char> range = Tuple.Create('A', 'A');
//Tuple<char, char> range = Tuple.Create('0', '9');
stores.Where(s => range.Item1 <= s.Name.First() && s.Name.First() <= range.Item2)
EDIT: Using the function in the Entities Framework
Tuple<string, string> range = Tuple.Create("A", "A");
//Tuple<string, string> range = Tuple.Create("0", "9");
stores.Where(s => string.Compare(s.Name.Substring(0, 1), range.Item1) >= 0 && string.Compare(s.Name.Substring(0, 1), range.Item2) <= 0)
I asked a more specific question on another thread.
Here is the answer:
Linq to entities - first letter of string between 2 keys

LINQ return records where string[] values match Comma Delimited String Field

I am trying to select some records using LINQ for Entities (EF4 Code First).
I have a table called Monitoring with a field called AnimalType which has values such as
"Lion,Tiger,Goat"
"Snake,Lion,Horse"
"Rattlesnake"
"Mountain Lion"
I want to pass in some values in a string array (animalValues) and have the rows returned from the Monitorings table where one or more values in the field AnimalType match the one or more values from the animalValues. The following code ALMOST works as I wanted but I've discovered a major flaw with the approach I've taken.
public IQueryable<Monitoring> GetMonitoringList(string[] animalValues)
{
var result = from m in db.Monitorings
where animalValues.Any(c => m.AnimalType.Contains(c))
select m;
return result;
}
To explain the problem, if I pass in animalValues = { "Lion", "Tiger" } I find that three rows are selected due to the fact that the 4th record "Mountain Lion" contains the word "Lion" which it regards as a match.
This isn't what I wanted to happen. I need "Lion" to only match "Lion" and not "Mountain Lion".
Another example is if I pass in "Snake" I get rows which include "Rattlesnake". I'm hoping somebody has a better bit of LINQ code that will allow for matches that match the exact comma delimited value and not just a part of it as in "Snake" matching "Rattlesnake".
This is a kind of hack that will do the work:
public IQueryable<Monitoring> GetMonitoringList(string[] animalValues)
{
var values = animalValues.Select(x => "," + x + ",");
var result = from m in db.Monitorings
where values.Any(c => ("," + m.AnimalType + ",").Contains(c))
select m;
return result;
}
This way, you will have
",Lion,Tiger,Goat,"
",Snake,Lion,Horse,"
",Rattlesnake,"
",Mountain Lion,"
A check for ",Lion," then no longer matches ",Mountain Lion,".
It's dirty, I know.
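The delimiter-wrapping trick above is easy to verify in isolation. The following Python sketch applies it to the four sample values from the question; the function name matches_exact is made up, and the sketch assumes the stored values have no spaces around the commas.

```python
def matches_exact(field, wanted):
    """True if any wanted value is a whole comma-delimited entry of field."""
    wrapped = "," + field + ","
    return any("," + value + "," in wrapped for value in wanted)


rows = ["Lion,Tiger,Goat", "Snake,Lion,Horse", "Rattlesnake", "Mountain Lion"]

print([r for r in rows if matches_exact(r, ["Lion", "Tiger"])])
# → ['Lion,Tiger,Goat', 'Snake,Lion,Horse']
print([r for r in rows if matches_exact(r, ["Snake"])])
# → ['Snake,Lion,Horse']
```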
Because the data in your field is comma delimited you really need to break those entries up individually. Since SQL doesn't really support a way to split strings, the option that I've come up with is to execute two queries.
The first query uses the code you started with to at least get you in the ballpark and minimize the amount of data you're retrieving. It converts it to a List<> to actually execute the query and bring the results into memory which will allow access to more extension methods like Split().
The second query uses the subset of data in memory and joins it with your database table to then pull out the exact matches:
public IQueryable<Monitoring> GetMonitoringList(string[] animalValues)
{
// execute a query that is greedy in its matches, but at least
// it's still only a subset of data. The ToList()
// brings the data into memory, so to speak
var subsetData = (from m in db.Monitorings
where animalValues.Any(c => m.AnimalType.Contains(c))
select m).ToList();
// given that subset of data in the List<>, join it against the DB again
// and get the exact matches this time
var result = from data in subsetData
join m in db.Monitorings on data.ID equals m.ID
where data.AnimalType.Split(',').Intersect(animalValues).Any()
select m;
return result.AsQueryable();
}
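The two-pass idea above (a greedy substring filter first, then an exact match on the split values) can be sketched in plain Python to show what each pass keeps. The function names coarse_match and exact_match are illustrative, and both passes run in memory here, whereas in the C# version the first pass runs in the database.

```python
def coarse_match(field, wanted):
    """Greedy first pass: substring containment, may over-match."""
    return any(value in field for value in wanted)


def exact_match(field, wanted):
    """Second pass: compare whole comma-delimited entries."""
    return not set(field.split(",")).isdisjoint(wanted)


rows = ["Lion,Tiger,Goat", "Snake,Lion,Horse", "Rattlesnake", "Mountain Lion"]
wanted = ["Lion", "Tiger"]

subset = [r for r in rows if coarse_match(r, wanted)]    # still includes "Mountain Lion"
result = [r for r in subset if exact_match(r, wanted)]   # exact matches only

print(subset)
print(result)
```

The first pass exists purely to cut down how many rows are pulled into memory; the second pass alone would give the same final answer.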

Resources