Reading a large batch of objects taking way too long - applescript

I’m experimenting with scripting a batch of OmniFocus tasks in JXA and running into some big speed issues. I don't think the problem is specific to OmniFocus or JXA; rather I think this is a more general misunderstanding of how getting objects works - I'm expecting it to work like a single SQL query that loads all objects in memory but instead it seems to do each operation on demand.
Here’s a simple example - let’s get the names of all uncompleted tasks (which are stored in a SQLite DB on the backend):
var tasks = Application('OmniFocus').defaultDocument.flattenedTasks.whose({completed: false})
var totalTasks = tasks.length
for (var i = 0; i < totalTasks; i++) {
tasks[i].name()
}
[Finished in 46.68s]
Actually getting the list of 900 tasks takes ~7 seconds - already slow - but then looping and reading basic properties takes another 40 seconds, presumably because it's hitting the DB for each one. (Also, tasks doesn't behave like an array - it seems to be recomputed every time it's accessed.)
Is there any way to do this quickly - to read a batch of objects and all their properties into memory at once?

Introduction
With AppleEvents, the IPC technology that JavaScript for Automation (JXA) is built upon, the way you request information from another application is by sending it an "object specifier," which works a little bit like dot notation for accessing object properties, and a little bit like a SQL or GraphQL query.
The receiving application evaluates the object specifier and determines which objects, if any, it refers to. It then returns a value representing the referred-to objects. The returned value may be a list of values, if the referred-to object was a collection of objects. The object specifier may also refer to properties of objects. The values returned may be strings, or numbers, or even new object specifiers.
Object specifiers
An example of a fully-qualified object specifier written in AppleScript is:
a reference to the name of the first window of application "Safari"
In JXA, that same object specifier would be expressed:
Application("Safari").windows[0].name
To send an IPC request to Safari to ask it to evaluate this object specifier and respond with a value, you can invoke the .get() function on an object specifier:
Application("Safari").windows[0].name.get()
As a shorthand for the .get() function, you can invoke the object specifier directly:
Application("Safari").windows[0].name()
A single request is sent to Safari, and a single value (a string in this case) is returned.
In this way, object specifiers work a little bit like dot notation for accessing object properties. But object specifiers are much more powerful than that.
Collections
You can effectively perform maps or comprehensions over collections. In AppleScript this looks like:
get the name of every window of Application "Safari"
In JXA it looks like:
Application("Safari").windows.name.get()
Even though this requests multiple values, it requires only a single request to be sent to Safari, which then iterates over its own windows, collecting the name of each one, and then sends back a single list value containing all of the name strings. No matter how many windows Safari has open, this statement only results in a single request/response.
For-loop anti-pattern
Contrast that approach to the for-loop anti-pattern:
var nameOfEveryWindow = []
var everyWindowSpecifier = Application("Safari").windows
var numberOfWindows = everyWindowSpecifier.length
for (var i = 0; i < numberOfWindows; i++) {
var windowNameSpecifier = everyWindowSpecifier[i].name
var windowName = windowNameSpecifier.get()
nameOfEveryWindow.push(windowName)
}
This approach may take much longer, as it requires length+1 number of requests to get the collection of names.
(Note that the length property of collection object specifiers is handled specially, because collection object specifiers in JXA attempt to behave like native JavaScript Arrays. No .get() invocation is needed (or allowed) on the length property.)
Filtering, and why your code example is slow
The really interesting part of AppleEvents is the so-called "whose clause". This allows you provide criteria with which to filter the objects from which the values will be returned from.
In the code you included in your question, tasks is an object specifier that refers to a collection of objects that have been filtered to only include uncompleted tasks using a whose clause. Note that this is still just reference at this point; until you call .get() on the object specifier, it's just a pointer to something, not the thing itself.
The code you included then implements the for-loop anti-pattern, which is probably why your observed performance is so slow. You are sending length+1 requests to OmniFocus. Each invocation of .name() results in another AppleEvent.
Furthermore, you're asking OmniFocus to re-filter the collection of tasks every time, because the object specifier you're sending each time contains a whose clause.
Try this instead:
var taskNames = Application('OmniFocus').defaultDocument.flattenedTasks.whose({completed: false}).name.get()
This should send a single request to OmniFocus, and return an array of the names of each uncompleted task.
Another approach to try would be to ask OmniFocus to evaluate the "whose clause" once, and return an array of object specifiers:
var taskSpecifiers = Application('OmniFocus').defaultDocument.flattenedTasks.whose({completed: false})()
Iterating over the returned array of object specifies and invoking .name.get() on each one would likely be faster than your original approach.
Answer
While JXA can get arrays of single properties of collections of objects, it appears that due to an oversight on the part of the authors, JXA doesn't support getting all of the properties of all of the objects in a collection.
So, to answer you actual question, with JXA, there is not a way to read a batch of objects and all their properties into memory at once.
That said, AppleScript does support it:
tell app "OmniFocus" to get the properties of every flattened task of default document whose completed is false
With JXA, you have to fall back to the for-loop anti-pattern if you really want all of the properties of the objects, but we can avoid evaluating the whose clause more than once by pulling its evaluation outside of the for loop:
var tasks = []
var taskSpecifiers = Application('OmniFocus').defaultDocument.flattenedTasks.whose({completed: false})()
var totalTasks = taskSpecifiers.length
for (var i = 0; i < totalTasks; i++) {
tasks[i] = taskSpecifiers[i].properties()
}
Finally, it should be noted that AppleScript also lets you request specific sets of properties:
get the {name, zoomable} of every window of application "Safari"
But there is no way with JXA to send a single request for multiple properties of an object, or collection of objects.

Try something like:
tell app "OmniFocus"
tell default document
get name of every flattened task whose completed is false
end tell
end tell
Apple event IPC is not OOP, it’s RPC + simple first-class relational queries. AppleScript obfuscates this, and JXA not only obfuscates it even worse but cripples it too; but once you learn to see through the faux-OO syntactic nonsense it makes a lot more sense. This and this may give a bit more insight.
[ETA: Omni recently implemented its own embedded JavaScriptCore-based scripting support in its apps; if JS is your thing you might find that a better bet.]

Related

Store a sheet object in cache of my google app script [duplicate]

I am trying to develop a webapp using Google Apps Script to be embedded into a Google Site which simply displays the contents of a Google Sheet and filters it using some simple parameters. For the time being, at least. I may add more features later.
I got a functional app, but found that filtering could often take a while as the client sometimes had to wait up to 5 seconds for a response from the server. I decided that this was most likely due to the fact that I was loading the spreadsheet by ID using the SpreadsheetApp class every time it was called.
I decided to cache the spreadsheet values in my doGet function using the CacheService and retrieve the data from the cache each time instead.
However, for some reason this has meant that what was a 2-dimensional array is now treated as a 1-dimensional array. And, so, when displaying the data in an HTML table, I end up with a single column, with each cell being occupied by a single character.
This is how I have implemented the caching; as far as I can tell from the API reference I am not doing anything wrong:
function doGet() {
CacheService.getScriptCache().put('data', SpreadsheetApp
.openById('####')
.getActiveSheet()
.getDataRange()
.getValues());
return HtmlService
.createTemplateFromFile('index')
.evaluate()
.setSandboxMode(HtmlService.SandboxMode.IFRAME);
}
function getData() {
return CacheService.getScriptCache().get('data');
}
This is my first time developing a proper application using GAS (I have used it in Sheets before). Is there something very obvious I am missing? I didn't see any type restrictions on the CacheService reference page...
CacheService stores Strings, so objects such as your two-dimensional array will be coerced to Strings, which may not meet your needs.
Use the JSON utility to take control of the results.
myCache.put( 'tag', JSON.stringify( myObj ) );
...
var cachedObj = JSON.parse( myCache.get( 'tag' ) );
Cache expires. The put method, without an expirationInSeconds parameter expires in 10 minutes. If you need your data to stay alive for more than 10 minutes, you need to specify an expirationInSeconds, and the maximum is 6 hours. So, if you specifically do NOT need the data to expire, Cache might not be the best use.
You can use Cache for something like controlling how long a user can be logged in.
You could also try using a global variable, which some people would tell you to never use. To declare a global variable, define the variable outside of any function.

How exactly does the fetchAllIfNeeded differ from fetchAll in the JS SDK?

I never quite understood the if needed part of the description.
.fetchAll()
Fetches the given list of Parse.Object.
.fetchAllIfNeeded()
Fetches the given list of Parse.Object if needed.
What is the situation where I might use this and what exactly determines the need? I feel like it's something super elementary but I haven't been able to find a satisfactory and clear definition.
In the example in the API, I notice that the fetchAllIfNeeded() has:
// Objects were fetched and updated.
In the success while the fetchAll only has:
// All the objects were fetched.
So does the fetchAllIfNeeded() also save stuff too? Very confused here.
UPDATES
TEST 1
Going on some of the hints #danh left in the comments I tried the following things.
var todos = [];
var x = new Todo({content:'Test A'}); // Parse.Object
todos.push(x);
x.save();
// So now we have a todo saved to parse and x has an id. Async assumed.
x.set({content:'Test B'});
Parse.Object.fetchAllIfNeeded(todos);
So in this scenario, my client x is different than the server. But the x.hasChanged() is false since we used the set function and the change event is triggered. fetchAllIfNeeded returns no results. So it isn't that it's trying to compare this outright to what is on the server to sync and fetch.
I notice that in the request payload, running the fetchAllIfNeeded is sending the following interesting thing.
{where: {objectId: {$in: []}}, _method: "GET",…}
So it seems that on the clientside something determines whether an object isNeeded
Test 2
So now, based on the comments I tried manipulating the changed state of the object by setting with silent.
x.set({content:'Test C'}, {silent:true});
x.hasChanged(); // true
Parse.Object.fetchAllIfNeeded(todos);
Still nothing interesting. Clearly the server state ("Test A") is different than clientside ("Test C"). and I still results [] and the request payload is:
{where: {objectId: {$in: []}}, _method: "GET",…}
UPDATE 2
Figured it out by looking at the Parse source. See answer.
After many manipulations, then taking a look at the source - I figured this out. Basically fetchAllIfNeeded will fetch models in an array that have no data, meaning there are no attribute properties and values.
So the use case would be you have lets say a parent object with an array of nested Parse Objects. When you fetch the parent object, the nested child objects in the array will not be included (unless you have the include query constraint set). Instead, the pointers are sent back to clientside and in your client, those pointers are translated into 'empty' models with no data, basically just blank Parse.Objects with ids.
Specifically, the Parse.Object has an internal Boolean property called _hasData which seems to be toggled true any time stuff like set, or fetch, or whatever gives that model attributes.
So, lets say you need to fetch those child objects. You can just do something like
var childObjects = parent.get('children'); // Array
Parse.Object.fetchAllIfNeeded(childObjects);
And it will search for those children who are currently only represented as empty Objects with id.
It's useful as opposed to fetchAll in that you might go through the children array and lazily load one at a time as needed, then at a later time need to "get the rest". fetchAllIfNeeded essentially just filters "the rest" and sends a whereIn query that limits fetching to those child objects that have no data.
In the Parse documentation, they have a comment in the callback response to fetchAllIfNeeded as:
// Objects were fetched and UPDATED.
I think they mean the clientside objects were updated. fetchAllIfNeeded is definitely sending GET calls so I doubt anything updates on the serverside. So this isn't some sync function. This really confused me as I instantly thought of serverside updating when they really mean:
// Client objects were fetched and updated.

Improve Script performance by caching Spreadsheet values

I am trying to develop a webapp using Google Apps Script to be embedded into a Google Site which simply displays the contents of a Google Sheet and filters it using some simple parameters. For the time being, at least. I may add more features later.
I got a functional app, but found that filtering could often take a while as the client sometimes had to wait up to 5 seconds for a response from the server. I decided that this was most likely due to the fact that I was loading the spreadsheet by ID using the SpreadsheetApp class every time it was called.
I decided to cache the spreadsheet values in my doGet function using the CacheService and retrieve the data from the cache each time instead.
However, for some reason this has meant that what was a 2-dimensional array is now treated as a 1-dimensional array. And, so, when displaying the data in an HTML table, I end up with a single column, with each cell being occupied by a single character.
This is how I have implemented the caching; as far as I can tell from the API reference I am not doing anything wrong:
function doGet() {
CacheService.getScriptCache().put('data', SpreadsheetApp
.openById('####')
.getActiveSheet()
.getDataRange()
.getValues());
return HtmlService
.createTemplateFromFile('index')
.evaluate()
.setSandboxMode(HtmlService.SandboxMode.IFRAME);
}
function getData() {
return CacheService.getScriptCache().get('data');
}
This is my first time developing a proper application using GAS (I have used it in Sheets before). Is there something very obvious I am missing? I didn't see any type restrictions on the CacheService reference page...
CacheService stores Strings, so objects such as your two-dimensional array will be coerced to Strings, which may not meet your needs.
Use the JSON utility to take control of the results.
myCache.put( 'tag', JSON.stringify( myObj ) );
...
var cachedObj = JSON.parse( myCache.get( 'tag' ) );
Cache expires. The put method, without an expirationInSeconds parameter expires in 10 minutes. If you need your data to stay alive for more than 10 minutes, you need to specify an expirationInSeconds, and the maximum is 6 hours. So, if you specifically do NOT need the data to expire, Cache might not be the best use.
You can use Cache for something like controlling how long a user can be logged in.
You could also try using a global variable, which some people would tell you to never use. To declare a global variable, define the variable outside of any function.

Use Session state within pLinq queries

I have a fairly simple Linq query (simplified code):
dim x = From Product In lstProductList.AsParallel
Order By Product.Price.GrossPrice Descending Select Product
Product is a class. Product.Price is a child class and GrossPrice is one of its properties. In order to work out the price I need to use Session("exchange_rate").
So for each item in lstProductList there's a function that does the following:
NetPrice=NetPrice * Session("exchange_rate")
(and then GrossPrice returns NetPrice+VatAmount)
No matter what I've tried I cannot access session state.
I have tried HttpContext.Current - but that returns Nothing.
I've tried Implements IRequiresSessionState on the class (which helps in a similar situation in generic http handlers [.ashx]) - no luck.
I'm using simple InProc session state mode. Exchange rate has to be user specific.
What can I do?
I'm working with:
web development, .Net 4, VB.net
Step-by-step:
page_load (in .aspx)
dim objSearch as new SearchClass()
dim output = objSearch.renderProductsFound()
then in objSearch.renderProductsFound:
lstProductList.Add(objProduct(1))
...
lstProductList.Add(objProduct(n))
dim x = From Product In lstProductList.AsParallel
Order By Product.Price.GrossPrice Descending Select Product
In Product.Price.GrossPrice Get :
return me.NetPrice+me.VatAmount
In Product.Price.NetPrice Get:
return NetBasePrice*Session("exchange_rate")
Again, simplified code, too much to paste in here. Works fine if I unwrap the query into For loops.
I'm not exactly sure how HttpContext.Current works, but I wouldn't be surprised if it would work only on the main thread that is processing the HTTP request. This would mean that you cannot use it on any other threads. When PLINQ executes the query, it picks some random threads from the thread pool and evaluates the predicates in the query using these threads, so this may be the reason why your query doesn't work.
If the GrossPrice property needs to access only a single thing from the session state, it should be fairly easy to change it to a method and pass the value from the session state as an argument:
Dim rate = Session("exchange_rate")
Dim x = From product In lstProductList.AsParallel
Order By product.Price.GetGrossPrice(rate) Descending
Select product
Depending on where you use x later, you could also add a call to ToList to force the evaluation of the query (otherwise it may be executed lazily at some later time), but I think the change I described above should fix it.
You should really read the values out of session state and into local variables that you know you need within the LINQ statement. Otherwise you're actually accessing the NameValueCollection instance every single time for every single element in every single thread when the value is essentially constant.

Can I do delayed execution in ASP? Or: please offer any help on improving performance of an old ASP site

I'm trying to improve the performance of a website written in classic ASP.
It supports multiple languages the problem lies in how this was implemented. It has the following method:
GetTranslation(id,language)
Which is called all over the shop like this:
<%= GetTranslation([someid],[thelanguage]) %>
That method just looks up the ID and language in SQL and returns the translation. Simple.
But incredibly inefficient. On each page load, there's around 300 independent calls to SQL to get an individual translation.
I have already significantly improved performance for many scenarios:
A C# tool that scans the .asp files and picks up references to GetTranslation
The tool then builds up a "bulk-cache" method that (depending on the page) takes all the IDs it finds and in one fell swoop caches the results in a dictionary.
The GetTranslation method was then updated to check the dictionary for any requests it has and only go to SQL if it's not already in there (and cache it's own result if necessary)
This only goes so far.
When IDs of translations are stored in the database I can't pick these up (particularly easily).
Ideally the GetTranslation method would, on each call, build up one big SQL string that would only be executed at the end of the page request.
Is this possible in ASP? Can I have the result of a <%= ... %> to be a reference to something that is later resolved?
I would also sincerely appreciate any other creative ways I might improve the performance of this old, ugly beast.
I don't think you can do delayed execution in Classic ASP. As for suggestions on improving the performance. You can have a class like this :
Class TranslationManager
Private Sub Class_Initialize
End Sub
Private Sub Class_Terminate
End Sub
Private Function ExistsInCache(id, language)
ExistsInCache = _
Not IsEmpty(Application("Translation-" & id & "-" & language))
End Function
Private Function GetFromCache(id, language)
GetFromCache = Application("Translation-" & id & "-" & language)
End Function
Private Function GetFromDB(id, language)
//'GET THE RESULT FROM DB
Application("Translation-" & id & "-" & language) = resultFromDB
GetFromDB = resultFromDB
End Function
Public Default Function GetTranslation(id, language)
If ExistsInCache(id, language) Then
GetTranslation = GetFromCache(id, language)
Else
GetTranslation = GetFromDB(id, language)
End If
End Function
End Class
And use it like this in your code
Set tm = New TranslationManager
translatedValue = tm([someid], [thelanguage])
Set tm = Nothing
This would definitely reduce the calls to DB. But you need to be very careful about how much data you put into the application object. You don't wanna run out of memory. It's best you also track how long the translations stay in the memory and have them expired (deleted from the Application object) when they were not accessed for some time.
We use a i18n class with a namespace and language attribute in oure e-commerce system. The class has a default function called 'translate' which basically preforms a dictionary lookup. This dictionary is loaded using the memento pattern from a text file containing all the translations for the namespace and language.
The skeletons for these translations files are generated by a custom tool (written in vbscript actually) which parses the ASP's for i18n($somestring) calls. The filenames are based on the namespace and language e.g. "shoppingcart_step_1_FR.txt". The tool can actually update/extent existing translation files when we add new translatable strings to the ASP code, which is very important for maintenance.
The performance overhead for using this method is minimal. Due to the segmentation our largest translation file contains about 200 translatable strings (including static image urls). Loading it per request has very little effect on the performance. I guess one could cache the translation dictionaries in the application object using some third-party treadsafe dictionary, yet IMHO this isn't worth the trouble.
An extra tip, use variable replacement in your string to improve translatability.
For example use:
<%=replace(i18n("Buy $productname"), "$productname", product.name)%>
instead of
<%=i18n("Buy")%> <%=product.name%>
The first method is much more flexible for the translators. A lot of languages have different sentence structures.
<%= x %> simply translates into Response.Write(x). There's no way to make that deferred.
In fact, Classic ASP has no way to make anything deferred, as far as I can remember.
You've already done a lot in terms of writing the caching tool. Next step would be writing a tool to convert these ASP pages to ASP.NET.
Your cache is the best saving you can make here, you could do away with the complication of the .net pre-cacher and just have each call to GetTranslate check your dictionary for its entry, if not there then it can fetch it and cache it in Application space. That would be lightening fast. Then there just the problem of refreshing the Cache every now and then but that will be done for you every 24 hours ish when the worker process gets refreshed.
If you need it to be more up to date than that then you could pull out all your references as you do using your .net cacher. Then you could create a new function to go get all the entries for a given set of ids, write a call to this function near the top of your ASP pages for each page which would call your DB with one SQL string stashing the results in a local dictionary. Then modify GetTranslation to use the values from this dictionary. Would need to be updated but that could be part of your build process or just a job that runs every hour/night.
The approach I use in my own CMS is to load all the static Language strings from the Database into the Application object at start up using unique keys based on the variable name, language ID and CMS instance. Then I use page level caching into an array as the page gets created so repeatedly used variable gets cached and avoids lots of trips to the Application object where possible. I use an array instead of a dictionary as the way my CMS function each page is created in sections with each section being isolated from each other so I'd be creating multiple dictionaries per page which is undesirable.
Obviously the viability of this solution depends entirely on the number of variables you have and the number of translations you have plus how many variables you need to retrieve on each page.

Resources