BigQuery JavaScript UDF process - per row or per processing node? - performance

I'm thinking of using BigQuery's JavaScript UDF as a critical component in a new data architecture. It would be used to logically process each row loaded into the main table, and also to process each row during periodic and ad-hoc aggregation queries.
Using a SQL UDF for the same purpose seems infeasible, because each row represents a complex object, and implementing the business logic in SQL, including things such as parsing complex text fields, gets ugly very fast.
I just read the following in the Optimizing query computation documentation page:
Best practice: Avoid using JavaScript user-defined functions. Use native UDFs instead.
Calling a JavaScript UDF requires the instantiation of a subprocess.
Spinning up this process and running the UDF directly impacts query
performance. If possible, use a native (SQL) UDF instead.
I understand why a new process is needed for each processing node, and I know that JS tends to be deployed in a single-thread-per-process manner (even though V8 does support multithreading these days). But it's not clear to me whether, once a JS runtime process is up, it can be expected to be reused between calls to the same function (e.g. for processing different rows on the same processing node). The amount of reuse will probably significantly affect the cost. My table is not that large (tens to hundreds of millions of rows), but I still need a better understanding here.
I could not find any authoritative source on this. Has anybody done any analysis of the actual impact of using a JavaScript UDF on each processed row, in terms of execution time and cost?

If it's not documented, then that's an implementation detail that could change. But let's test it:
CREATE TEMP FUNCTION randomThis(views INT64)
RETURNS FLOAT64
LANGUAGE js AS """
  // 'variable' is an implicit global: if the JS VM is reused across calls,
  // it survives between rows and Math.random() runs only once
  if (typeof variable === 'undefined') {
    variable = Math.random()
  }
  return variable
""";
SELECT randomThis(views), COUNT(*) c
FROM (
  SELECT views
  FROM `fh-bigquery.wikipedia_v3.pageviews_2019`
  LIMIT 10000000
)
GROUP BY 1
ORDER BY 2 DESC
I was expecting ten million different numbers, or a handful, but I only got one: The same process was reused ten million times, and variables were kept around in between calls.
This even happened when I went up to 100 million, signaling that parallelism is bounded by one JS VM.
Again, these are implementation details that could change. But while it stays that way, you can make the best use out of it.

I was expecting ten million different numbers, or a handful, but I only got one
That's because you didn't allow Math.random to be called more than once
and variables were kept around in between calls
due to the variable defined at the global scope.
In other words, your code explicitly permits Math.random to be executed only once (by implicitly defining the variable at the global scope).
If you try this:
CREATE TEMP FUNCTION randomThis(seed INT64)
RETURNS FLOAT64
LANGUAGE js AS """
  // 'ret' is declared locally, so it is reset to undefined on every call
  // and Math.random() runs once per invocation
  let ret = undefined
  if (ret === undefined) {
    ret = Math.random()
  }
  return ret
""";
SELECT randomThis(size), COUNT(*) c
FROM (
  SELECT repository_size as size
  FROM `my-internal-dataset.sample-github-table`
  LIMIT 10000000
)
GROUP BY 1
ORDER BY 2 DESC
then you get many rows. And now it takes much longer to execute, probably because the single VM became a bottleneck.
(I used another dataset here to reduce the query cost.)
Conclusion:
1. There is one VM (or maybe a container) per query to support JS UDF. This is in line with a single subprocess ("Calling a JavaScript UDF requires the instantiation of a subprocess") mentioned in the documentation.
2. If you can apply an execute-once pattern (using some kind of cache, or a coding technique like memoisation) and write a UDF similar to the previous answer, then the mere presence of a JS UDF has a limited impact on your query (see the sketch after this list).
3. If you have to write a JS UDF like the one in this answer, then the impact on your query becomes very significant, with query execution time skyrocketing even for simple JS code. For that case it's certainly better to stay away.
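For illustration, here is a minimal sketch of that execute-once/caching pattern; the parsing logic and the table are made up, and it relies on the (undocumented) VM reuse demonstrated above:
CREATE TEMP FUNCTION parseField(raw STRING)
RETURNS STRING
LANGUAGE js AS """
  if (raw === null) return null;  // guard NULL inputs
  // 'cache' is an implicit global, so it survives between calls
  // for as long as the same JS VM is reused within the query
  if (typeof cache === 'undefined') {
    cache = {};
  }
  if (!(raw in cache)) {
    // expensive work happens at most once per distinct input
    cache[raw] = raw.trim().split('|')[0];  // stand-in for real parsing
  }
  return cache[raw];
""";
SELECT parseField(title) FROM `my-project.my_dataset.my_table`  -- hypothetical table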

Best Practices for Multiple OnEdit Functions

Problem
I have 6 onEdit functions which work as intended individually, but when together they don't work as intended; by this I mean some simply don't trigger.
Properties of the Script
They have different names - function onEdit(e) {code}, function onEdit1(e1) {code}, function onEdit2(e2) {code}, function onEdit3(e3) {code}, function onEdit4(e4) {code}, function onEdit5(e5) {code}
They are all in the same .gs tab
Some of them have the same variables. For example OnEdit has var range = e.range; and OnEdit5 has var range = e5.range;
My Understanding
I believe that you can run multiple OnEdit functions within the same .gs tab. Is this correct? Or do I need to somehow create new .gs tabs?
I believe that my onEdit functions should be named differently, so they are called correctly. Is this correct, or should I be getting rid of the different functions and putting them into one massive function? (I imagine this would lead to slower execution and more cases of not being able to isolate incorrect code).
I believe that the variables that are created within each function are specific to that function. Is this true? Or are they impacting each other?
Why I'm asking this
Iterations of this question seem to have been asked before. But people generally give advice on integrating two functions into one big one, rather than preparing someone to integrate 10-20 different OnEdit functions. Nor do they give a clear indication of best coding practices.
I've spent hours reading through this subject and feel that people new to scripts, like me, would greatly benefit from knowing this.
Thank you in advance for any contributions!
Notes:
There can only be one function with a given name. If there are two, the latter overwrites the former; it's as if the former never existed.
A function named onEdit is triggered automatically on (you guessed it!) edit.
There's no simple trigger for other names like onEdit1 or onEdit2....
Simple triggers are limited to 30 seconds of execution
So, in a single code.gs file, or even in a single project, there can only be one function named onEdit that will actually trigger.
If you create multiple projects, onEdit will trigger in each project asynchronously. But there are limits to number of projects that can be created and other quotas will apply.
Alternatively, you can use installed triggers, which don't have the 30-second limit; with those, you can also use any name for your function.
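If you stay with the single simple trigger, a common pattern is one onEdit that dispatches to your six functions, renamed as ordinary handlers. A sketch (the handler names and ranges here are hypothetical):
function onEdit(e) {
  // Each handler receives the same event object and decides for itself
  // whether the edited range concerns it (one call per former onEditN).
  handleCheckboxA(e);
  handleCheckboxB(e);
}

// Hypothetical handlers: clear A2:A10 / B2:B10 when A1 / B1 is ticked.
function handleCheckboxA(e) {
  var range = e.range;
  if (range.getA1Notation() === 'A1' && range.getValue() === true) {
    range.getSheet().getRange('A2:A10').clearContent();
  }
}

function handleCheckboxB(e) {
  var range = e.range;
  if (range.getA1Notation() === 'B1' && range.getValue() === true) {
    range.getSheet().getRange('B2:B10').clearContent();
  }
}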
The best way to optimize functions is to never touch the spreadsheet unless it is absolutely necessary. For example, sorting values inside the script is better than repeatedly calling .sort on multiple ranges. The less interaction between sheets and scripts, the better. A highly optimized script will only require two calls to the spreadsheet: one to get the data and one to set the data (see the sketch below).
After optimizing the number of calls to the sheet, you can optimize the script itself: control the logic so that only the necessary operations are done for each edit. For example, if A1 and B1 are checkboxes that, when clicked, clear A2:A10 and B2:B10 respectively, then you should check whether A1 was clicked and, if so, clear the range and exit, rather than going on to check B1 as well. Script optimization requires at least a basic knowledge of JavaScript objects. Nevertheless, this isn't as effective as reducing the number of calls to the sheet, which are the slowest part of any Apps Script.
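As a rough sketch of the two-call pattern (the range and sort logic are made up; this could be one of the handlers called from the single onEdit above):
function sortDataOnce(e) {
  var sheet = e.range.getSheet();
  var range = sheet.getRange('A2:B100');
  // Call 1: read everything needed in one go.
  var values = range.getValues();
  // All sorting happens on the in-memory array, not on the sheet.
  values.sort(function(a, b) { return a[0] - b[0]; });
  // Call 2: write the result back in one go.
  range.setValues(values);
}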
References:
Best practices

Overhead when calling a component function vs inline code - ColdFusion

I've been diagnosing a performance issue with generating a CSV containing around 50,000 lines and I've narrowed it down to a single function that is used once per line.
After a lot of messing about, I've discovered that there is an overhead in using the function, rather than placing the logic directly in the loop - my question is: Why?!
The function in question is very simple, it accepts a string argument and passes that to a switch/case block containing around 15 options - returning the resulting string.
I've put a bunch of timers all over the place and discovered that a lot (not all) of the time this function call takes between 0 and 200 ms to run... however if I put the exact same code inline, it sits at 0 on every iteration.
All this points to a fundamental issue in my understanding of object instantiation and I'd appreciate some clarification.
I was always under the impression that if I instantiate a component at the top of a page, or indeed if I instantiate it in a persistent scope like Application or Session, then it would be placed into memory and subsequent calls to functions within that component would be lightning fast.
It seems however, that there is an overhead to calling these functions and while we're only talking a few milliseconds, when you have to do that 50,000 times it quickly adds up.
Furthermore, it seems that doing this consumes resources. I'm not particularly well versed in the way the JVM uses memory; I've read up on it and played with settings and such, but it's an overwhelming topic, especially for those of us with no Java development experience. It seems that when calling the method, rather than running the code inline, sometimes the ColdFusion service just collapses and the request never ends. Other times it does complete, although way too slowly. This suggests that the request can complete only when the server has the resources to handle it, and thus that the method call itself is consuming memory... (?)
If indeed the calling of a method has an overhead attached, I have a big problem. It's not really feasible to move all of this code inline, (while the function in question is simple, there are plenty of other functions that I will need to make use of) and doing so goes against everything I believe as a developer!!
So, any help would be appreciated.
Just for clarity and because I'm sure someone will ask for it, here's the code in question:
EDIT: As suggested, I've changed the code to use a struct lookup rather than CFSwitch - below is amended code for reference, however there's also a test app in pastebin links at the bottom.
Inside the init method:
<cfset Variables.VehicleCategories = {
    'T1'    : 'Beetle'
    , 'T1C'   : 'Beetle Cabrio'
    , 'T2'    : 'Type 2 Split'
    , 'T2B'   : 'Type 2 Bay'
    , 'T25'   : 'Type 25'
    , 'Ghia'  : 'Karmann Ghia'
    , 'T3'    : 'Type 3'
    , 'G1'    : 'MK1 Golf'
    , 'G1C'   : 'MK1 Golf Cabriolet'
    , 'CADDY' : 'MK1 Caddy'
    , 'G2'    : 'MK2 Golf'
    , 'SC1'   : 'MK1/2 Scirocco'
    , 'T4'    : 'T4'
    , 'CO'    : 'Corrado'
    , 'MISC'  : 'MISC'
} />
Function being called:
<cffunction name="getCategory" returntype="string" output="false">
<cfargument name="vehicleID" required="true" type="string" hint="Vehicle type" />
<cfscript>
if (structKeyExists(Variables.VehicleCategories, Arguments.VehicleID)) {
return Variables.VehicleCategories[Arguments.VehicleID];
}
else {
return 'Base SKUs';
}
</cfscript>
</cffunction>
As requested, I've created a test application to replicate this issue:
http://pastebin.com/KE2kUwEf - Application.cfc
http://pastebin.com/X8ZjL7D7 - TestCom.cfc (Place in 'com' folder outside webroot)
http://pastebin.com/n8hBLrfd - index.cfm
A function call will always be slower than inline code in any language. That's why there's the inline keyword in C++, and in JVM land the JIT optimizer will inline functions for you if it deems it necessary.
Now, ColdFusion is yet another layer on top of the JVM, so a function in CF is not a function in the JVM, and things don't translate 1:1 from the JIT optimizer's standpoint. A CFML function is actually compiled down to a Java class. Also, scopes like arguments and local (Java hashtables) are created on every invocation. Those take time and memory, and are therefore overhead.
...if I instantiate it in a persistent scope like Application or
Session, then it would be placed into memory and subsequent calls to
functions within that component would be lightning fast
It'd be faster than instantiating a new instance for sure, but it's not going to be "lightning fast" especially when you call it in a tight loop.
In conclusion: inline the function, and if it's still not fast enough, locate the slowest part of the code and write it in Java. A rough sketch of the inlining follows.
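For the CSV loop in the question, inlining might look roughly like this (a sketch only; the query name and loop body are made up):
<cfscript>
    // Build the lookup once, before the loop (same data as in init()).
    categories = structNew();
    categories['T1'] = 'Beetle';
    categories['T1C'] = 'Beetle Cabrio';
    // ...remaining categories as in the init method...
    result = arrayNew(1);
    for (i = 1; i <= qVehicles.recordCount; i++) {
        id = qVehicles.vehicleID[i];
        // Inline lookup: no function call, so no arguments/local scopes
        // are created on each of the ~50,000 iterations.
        if (structKeyExists(categories, id)) {
            arrayAppend(result, categories[id]);
        } else {
            arrayAppend(result, 'Base SKUs');
        }
    }
    csv = arrayToList(result, chr(10));
</cfscript>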
Just a side note here: since Railo uses inner classes instead of complete independent classes, it is faster if you write in a style with many small functions. In my experiments, both engines perform similarly with basic inline code. Adobe ColdFusion lends itself to large god functions if you need to squeeze out performance under load. With the JVM being unable to inline ColdFusion functions during compilation, you'll never get the benefit of the compiler being smart with your code.
This is especially important if you created an application that uses a ton of explicit getters/setters and you find your traffic increasing from small volume to high volume. All those little functions will bring you to your knees vs. having fewer large "god" functions.
Slowest to fastest, from one basic test we ran of 100,000 iterations:
1. Adobe ColdFusion (many small functions) (200X slower than Java)
2. Railo (many small functions) (60X slower)
3. ColdFusion / Railo (all code inline in one giant function) (10X slower)
4. Native Java class (fastest)

What's the most efficient way to ignore code in lua?

I have a chunk of Lua code that I'd like to be able to (selectively) ignore. I don't have the option of not reading it in, and sometimes I'd like it to be processed, sometimes not, so I can't just comment it out (that is, there's a whole bunch of blocks of code and I either have the option of reading none of them or reading all of them). I came up with two ways to implement this (there may well be more; I'm very much a beginner): either enclose the code in a function and then call or not call the function (and once I'm sure I'm past the point where I would call it, I can set it to nil to free up the memory), or enclose the code in an if ... end block. The former has slight advantages, in that there are several of these blocks and it makes it easier for one block to load another even if the main program didn't request it, but the latter seems the more efficient. However, not knowing much, I don't know if the efficiency saving is worth it.
So how much more efficient is:
if false then
-- a few hundred lines
end
than
throwaway = function ()
-- a few hundred lines
end
throwaway = nil -- to ensure that both methods leave me in the same state after garbage collection
?
If it depends a lot on the lua implementation, how big would the "few hundred lines" need to be to reliably spot the difference, and what sort of stuff should it include to best test (the main use of the blocks is to define a load of possibly useful functions)?
Lua's not smart enough to dump the code for the function, so you're not going to save any memory.
In terms of speed, you're talking about a difference of nanoseconds which happens once per program execution. It's harming your efficiency to worry about this; it has virtually no relevance to actual performance. Write the code that you feel expresses your intent most clearly, without trying to be clever. If you run into performance issues, they will be a million miles away from this decision.
If you want to save memory, which is understandable on a mobile platform, you could put your conditional code in its own module and never load it at all if not needed (if your framework supports it; e.g. MOAI does, Corona doesn't).
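A minimal sketch of that module approach (the file and function names are made up, and it assumes plain require is available in your framework):
-- optional_stuff.lua: the possibly-unused block lives in its own file
local M = {}

function M.usefulFunction(x)
    return x + 1
end

return M

-- main.lua: the file is read and compiled only when actually needed
local optional
if want_optional_code then      -- your run-time condition
    optional = require("optional_stuff")
    print(optional.usefulFunction(41))
end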
If there is really a lot of unused code, you can define it as a collection of strings and loadstring() it when needed. Storing functions as strings will reduce the initial compile time; however, for most functions the string representation probably takes up more memory than the compiled form, and what you save in compilation probably isn't significant before a few thousand lines... Just saying.
If you put this code in a table, you could compile it transparently through a metatable for minimal performance impact on repeated calls.
Example code
local code_uncompiled = {
    f = [=[
        local x, y = ...;
        return x+y;
    ]=]
}
code = setmetatable({}, {
    __index = function(self, k)
        -- compile on first access and cache the compiled function
        self[k] = assert(loadstring(code_uncompiled[k]));
        return self[k];
    end
});
local ff = code.f; -- code of f gets compiled here
ff = code.f; -- no compilation here
for i = 1, 1000 do
    print( ff(2*i, -i) ); -- no compilation here either
    print( code.f(2*i, -i) ); -- no compile either, but table access (slower)
end
The beauty of it is that this compiles as needed and you don't really have to waste another thought on it, it's just like storing a function in a table and allows for a lot of flexibility.
Another advantage of this solution is that when the amount of dynamically loaded code gets out of hand, you could transparently change it to load code from external files on demand through the __index function of the metatable. Also, you can mix compiled and uncompiled code by populating the "code" table with "real" functions.
Try the one that makes the code more legible to you first. If it runs fast enough on your target machine, use that.
If it doesn't run fast enough, try the other one.
Lua can also ignore multiple lines with a block comment:
function dostuff()
    print("blabla") -- placeholder statements
    print("faaaaa")
    --[[
    ignore this
    and this
    maybe this
    this as well
    ]]
end

Strangely, making copies of parameters drastically speeds up SP in SQL Server 2008

When running a sproc with SqlDataAdapter.fill(), I noticed it was taking upwards of 90 seconds, when running the same sproc in Management Studio took only 1-2 seconds. I started messing around with the parameters to try to find the issue, and I eventually did, though it's a strange one. I discovered that if I simply declared three new variables in the sproc and directly copied the contents of the parameters into them, and then used those new variables in the body of the sproc, the fill() method dropped to 1-2 seconds just like running the sproc directly in Management Studio. In other words, doing this:
CREATE PROCEDURE [dbo].[TestProc]
    @location nvarchar(100), @startTime datetime, @endTime datetime
AS
    declare @location2 nvarchar(100), @endTime2 datetime, @startTime2 datetime
    set @location2 = @location
    set @startTime2 = @startTime
    set @endTime2 = @endTime
    -- ... query using @location2, @startTime2, @endTime2
If I changed even just one of the references in the query body from @startTime2 back to @startTime (the actual parameter passed in from C#), the query jumped right back up to around 90s or even longer.
SO.... why in the world does SQLDataAdapter or SQL Server care what I do with its parameters once they're passed into the sproc? Why would this affect execution time? Any guidance of how to root out this issue further is greatly appreciated. Thanks!
Edit: Although I could've sworn there was a difference between running the query from C# using SqlDataAdapter and using Management Studio, as of right now, I can't replicate the difference. Now, Management Studio also takes > 90 seconds to run the sproc when I do NOT copy the parameters. This is a huge relief, because it means the problem isn't somehow with C#, and it's just a more run-of-the-mill (though still strange) SQL Server issue. One of the guys on my team that's an excellent SQL guy is looking at the execution path of the sproc when run with and without first copying the parameters. If we figure it out, I'll post the answer here. Thanks for the help so far!
It's undoubtedly a case of parameter sniffing and improper reuse of execution plans that were created with a different set of parameters that had a very different optimal access pattern.
The sudden change to the two different-style accesses being the same (rather than one quick) strongly suggests that the cached execution plan was updated to a version that now performs slowly with both access methods, or your data or your parameters changed.
In my experience the general culprit in this sort of small/huge time difference of execution is use of a nested loop join where a hash match is actually required. (For a very small number of rows the nested loop is superior, past a certain fairly low barrier, then the hash match becomes less expensive. Unless you're lucky that your inputs are both sorted by the join criteria, a merge join is rare to find as sorting large sets tends to be more expensive than hash matching.)
The reason that your parameter tweaking in the SP fixed the problem is that then SQL Server became aware you were doing something to the parameters by setting them to some value (ignoring what you'd set them to) and it had to compute a new execution plan, so it threw out the old one and designed a new access path based on the current set of parameters, getting better results.
If this problem persists, then playing with SP recompilation/clearing the plan cache, combined with using different parameters that must deal with hugely different numbers of rows, may reveal where the problem is. Look at the execution plan that is used to run the SP with different parameters and see the effects of the different access strategies being employed in the wrong conditions. Two commonly used mitigations are sketched below.
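For reference, a sketch of those mitigations (dbo.Events and its columns are hypothetical stand-ins for the real query). OPTION (RECOMPILE) rebuilds the plan on every call from the actual parameter values, while OPTION (OPTIMIZE FOR UNKNOWN), available since SQL Server 2008, keeps one cached plan built from average statistics, which is roughly what copying the parameters into local variables achieves:
CREATE PROCEDURE [dbo].[TestProcRecompile]
    @location nvarchar(100), @startTime datetime, @endTime datetime
AS
    SELECT *
    FROM dbo.Events   -- hypothetical table standing in for the real query
    WHERE Location = @location
      AND EventTime BETWEEN @startTime AND @endTime
    OPTION (RECOMPILE)   -- or: OPTION (OPTIMIZE FOR UNKNOWN)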

efficient serverside autocomplete

First of all, I know:
Premature optimization is the root of all evil
But I think a badly chosen autocomplete approach can really blow up your site.
I would like to know if there are any libraries out there which can do autocomplete efficiently (server-side), preferably fitting into RAM (for best performance). So no browser-side JavaScript autocomplete (YUI/jQuery/Dojo); I think there are enough topics about those on Stack Overflow. But I could not find a good thread about server-side autocomplete (maybe I did not look hard enough).
For example autocomplete names:
names:[alfred, miathe, .., ..]
What I can think off:
A simple SQL LIKE, for example: SELECT name FROM users WHERE name LIKE 'al%'.
I think this implementation will blow up with a lot of simultaneous users or a large data set, but maybe I am wrong, so numbers (of what it could handle) would be cool.
Using something like solr terms like for example: http://localhost:8983/solr/terms?terms.fl=name&terms.sort=index&terms.prefix=al&wt=json&omitHeader=true.
I don't know the performance of this so users with big sites please tell me.
Maybe something like an in-memory Redis trie, which I also haven't tested the performance of.
I also read in this thread about how to implement this in Java (Lucene and some library created by Shilad).
What I would like to hear about is implementations used by real sites, and numbers on how well they handle load, preferably with:
Link to implementation or code.
numbers to which you know it can scale.
It would be nice if it could be accessed over HTTP or sockets.
Many thanks,
Alfred
Optimising for Auto-complete
Unfortunately, the resolution of this issue will depend heavily on the data you are hoping to query.
LIKE queries will not put too much strain on your database, as long as you spend time using 'EXPLAIN' or the profiler to show you how the query optimiser plans to perform your query.
Some basics to keep in mind:
Indexes: Ensure that you have indexes setup. (Yes, in many cases LIKE does use the indexes. There is an excellent article on the topic at myitforum. SQL Performance - Indexes and the LIKE clause ).
Joins: Ensure your JOINs are in place and are optimized by the query planner. SQL Server Profiler can help with this. Look out for full index or full table scans
Auto-complete sub-sets
Auto-complete queries are a special case, in that they usually work on ever-decreasing subsets.
'name' LIKE 'a%' (may return 10000 records)
'name' LIKE 'al%' (may return 500 records)
'name' LIKE 'ala%' (may return 75 records)
'name' LIKE 'alan%' (may return 20 records)
If you return the entire result set for query 1, then there is no need to hit the database again for the following result sets, as they are subsets of your original query.
Depending on your data, this may open a further opportunity for optimisation.
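For instance, a server-side cache could refine the first result set in memory instead of re-querying. A sketch in JavaScript (queryDatabase is a hypothetical call, and this only works when the cached result set was complete rather than truncated by TOP/LIMIT):
const cache = {}; // prefix -> array of matching names

async function autocomplete(prefix) {
  // Reuse the longest cached prefix that the current one extends.
  for (let n = prefix.length - 1; n >= 1; n--) {
    const base = cache[prefix.slice(0, n)];
    if (base) {
      // Refine in memory: every name matching 'alf%' also matched 'al%'.
      cache[prefix] = base.filter(name => name.startsWith(prefix));
      return cache[prefix];
    }
  }
  // No usable cache entry: hit the database once and remember the result.
  cache[prefix] = await queryDatabase(prefix); // hypothetical DB call
  return cache[prefix];
}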
I won't fully comply with your requirements, as obviously the scale numbers will depend on hardware, size of the DB, architecture of the app, and several other items; you must test it yourself.
But I will tell you the method I've used with success:
Use simple SQL, for example: SELECT TOP 100 name FROM users WHERE name LIKE 'al%', using TOP 100 to limit the number of results.
Cache the results and maintain a list of terms that are cached
When a new request comes in, first check in the list whether you have the term (or part of the term) cached.
Keep in mind that your cached results are limited, so you may need to do a SQL query anyway if the term remains valid at the end of the cached result set (I mean valid in the sense that the last cached result still matches the term, in which case the truncated cache may be missing rows).
Hope it helps.
Using SQL versus Solr's terms component is really not a comparison. At their core they solve the problem the same way by making an index and then making simple calls to it.
What I would want to know is "what you are trying to auto-complete".
Ultimately, the easiest and most surefire way to scale a system is to make a simple solution and then scale it by replicating data. Trying to cache calls or predict results just makes things complicated, and doesn't get to the root of the problem (i.e. you can only take those so far, for example when each request misses the cache).
Perhaps a little more info about how your data is structured and how you want to see it extracted would be helpful.
