the param oagesize of SortedSetScan can not work - stackexchange.redis

I'm using SortedSetScan to filter some data,my code is below:
db.SortedSetScan("SR.Cache.APP:Termial1", "A*", 50, CommandFlags.None);
but the pagesize is always not work,the result count always be all. What's wrong about my code? or is it a bug?
may anybody can help me,thx!

Yes, the result of that is always "all". The page size simply impacts the number of round-trips, versus the amount of data each call, when issuing the underlying ZSCAN command. The IEnumerable<T>, however, is lazy, etc, so if you only want the first 50 items, use:
db.SortedSetScan(...).Take(50)
instead, which will perform whatever operations it needs in order to get 50 items. Tweaking the page size simply changes how many operations are needed. It would be incorrect to think "I'll make the page size 50 so it only takes one operation" - it doesn't work like that; redis *SCAN commands can return empty pages, or pages with one or two items on, regardless of the page size. The page size is more a "how many things to look at before giving up for this iteration" guidance fore redis. This is described more fully on the redis SCAN documentation - in particular, read what it says about "The COUNT option".
Note that the sequence obtained from all of the SE.Redis scanning operations can be resumed at a later point by casting the IEnumerable<T> or IEnumerator<T> to an IScanningCursor, and obtaining the cursor details to supply as parameters.
You might also want to think whether the "range" methods are more appropriate (note: they don't allow pattern filters).

Related

Lost Duration while Debugging Apex CPU time limit exceeded

I'm open to posting the code in this section to work through the optimization but its a bit length and complex, so instead I'm hoping that somebody can assist me with a few debugging questions I have. My goal is to find out what is causing my Apex CPU Time Limit Exceeded issue.
When using the Debug Log in its basic or normal layout I receive the message
Maximum CPU Time: 15062 out of 10,000 ** Close to Limit
I've optimized and re-wrote various loops and queries several times now and in each case this number concludes around there which leads me to believe it is lying to me and that my actual usage far exceeds that number. So on my journey I switched the Log Panels of the Developer Console to Analysis in hopes of isolating exactly what loop, method, or area of the code is giving me a headache.
This leads me to my main question and problem.
Execution Tree, Performance Tree & Executed Units
All show me that my durations UNDER the 10,000ms allowance. My largest consumption is 3,556.19ms which is being used by a wrapper class I created and consumed in the constructor method where there is a fair amount of logic that is constructing a fairly complicated wrapper class that spans over 5-7 custom objects. Still even with those 3,000ms the remainder of the process shows at negligible times bringing my total around 4,000ms. Again my question is.... Why am I unable to see or find what is consuming all my time?
Incorrect Iteration Data
In addition to this, on the Performance tree there is a column of data that shows the number of iterations for each method. I know that my Production Org has 81 objects that would essentially call the constructor for my custom wrapper object. I.E. my Constructor SHOULD be called 81 times, but instead it is called 32 times. So my other question is can I rely on the iteration data in the column? Or because it was iterating so many times does it stop counting at a certain point? Its possible that one of my objects is corrupted or causing an infinite loop somehow, but I don't want to dig through all the data in search of that conclusion if its a known issue that the iteration data is not accurate anyway.
System.Debug in the Production org
The Last question is why my System.Debug() lines are not displaying in my Developer Console on the production org. I've added serveral breadcrumbs throughout the code that would help me isolate just which objects are making it through and which are not, however, I cannot in any layout view system.debug messages outside of my Sandbox.
Sorry for the wealth of questions but I did want to give an honest effort to better understand the debugging process in Salesforce. If this is a lost cause I'm happy to start sharing some code as well but hopefully some debugging tips can get me to the solution.
It's likely your debug log got truncated, see "Each debug log must be 20 MB or smaller. If it exceeds this amount, you won’t see everything you need." in https://trailhead.salesforce.com/en/content/learn/modules/apex_basics_dotnet/debugging_diagnostics
Download the log and search for text similar to "skipped 123456 bytes of detailed log" to confirm, some system.debug statements will just not show up.
You might have to fine-tune the log levels (don't log validation rules and workflows? don't log every single variable assignment with "FINE" level etc). You might have to set all flags to NONE, then track only 1 particular class/trigger that you suspect (see https://help.salesforce.com/articleView?id=code_debug_log_classes.htm&type=5 and https://salesforce.stackexchange.com/questions/214380/how-are-we-supposed-to-use-debug-logs-for-a-specific-apex-class-only)
If it's truncated it's possible analysis tools give up (I had mixed luck with console to be honest, sometimes https://apextimeline.herokuapp.com/ is great to give overview - but it'll also fail to parse a 20 MB log...
When all else fails you can load up the log into Notepad++ (or any editor of your choice), find lines related to method entry/method exit (you might need a regular expression search), take these filtered lines tor excel, play with "text to columns" and just look at timing manually, see if there's a record that causes the spike. Because it could be #10 that's the problem, the fact it exhausts limits on #32 of 81 doesn't mean much. Search like [METHOD_ENTRY|METHOD_EXIT]MyTriggerHandler.onBeforeUpdate could be a good start. But first thing is to make sure log is not truncated.

Making sure my Go page view counter isn't abused

I believe I have a found a very good and fast solution for efficiently counting page views:
Working example in go playground here: https://play.golang.org/p/q_mYEYLa1h
My idea is to push this to the database every X minutes, and after pushing a key then delete it from the page map.
My question now is, what would be the optimal way to ensure that this isn't abused? Ideally, I would only want to increase page count from the same person if there was a time interval of 2 hours since last visiting the page.
As far as I know, it would be ideal to store and compare both IP and user agent (I don't want to rely on cookie/localstorage), but I'm not quite sure how to efficiently store and compare this information.
I'd likely get both the IP (req.Header.Get("x-forwarded-for")) and UserAgent (req.UserAgent()) from http.Request.
I was thinking making a visitor struct similar to my page struct that would look like this:
type visitor struct {
mutex sync.Mutex
urlIPUAAndTime map[string]time
}
This way should make it possible to do something similar to before. However, imagine if the website had so many requests that there would be hundreds of millions of unique visitor maps being stored, and each of these could only be deleted after 2 (or more) hours. I therefore think this is not a good solution.
I guess it would be ideal/necessary to write to and read from some file, but not sure how this should be done efficiently. Help would be greatly appreciated
One of optimization ways is to add a Bloom filter before this map. Bloom filter is a probabilistic structure which can say one of these:
this user is definitely new
and this user possibly was here
This is a way to cut off computation on early stage. If many of your users are new then you save requests to database to check all of them.
What if structure says "user is possibly non-unique"? Then you go the database and check it.
Here's one more optimization: if you do not need very accurate information and can agree with mistake about several percent, you may use the sole bloom filter. I guess many large sites use this technique for estimation.

Questions within questions for tin can api?

Does Tin Can API support questions within questions?
If so, what would be the specification for passing data to an LRS?
I was thinking of adding ID's to each sub question.
This would be much easier to answer if you could provide an example, but the flexibility of the Tin Can API is such that you can literally capture anything (which is also part of the complexity) with more or less grace.
Some immediate options come to mind:
Use a single interaction activity statement (likely with type choice) and use the formatting allowed to have multi-value responses (i.e. golf[,]tetris).
Use multiple statements where there is a combined statement (necessary if there is an overall result) such that there is a single main activity and each sub-question has its own statement where the sub-question has its own activity and the main activity would be stored in the context.contextActivities.parent list. When there is a combined statement in this case I would include a reference to the combined statement in the sub-question statements' context.statement property such that you can tie them all together.
Use result, context, and activity definition extensions to capture anything. This should be a last resort option, it usually makes setting things up simple but adds significant complexity on the reporting side. Though tempting because of the simplicity, unless you are trying to capture a specific type of data point (like geo-location data, math equations, etc.) usually you should try to avoid the use of extensions.
Which of the above makes the most sense is probably determined by what sort of response is being given, and whether or not questions are nested such that there is an overall result and sub-results or whether there is just overall results.

How do I preload items in a SproutCore ListView?

I've got a relatively fast SproutCore app that I'm trying to make just a tad bit faster.
Right now, when the user scrolls my SC.ListView and they scroll into view some list items that have not been loaded from the server (say from a relationship), the app automatically makes a call to the server to load these records. Even though this is fast, there is still a short period of time where my list items are blank.
I know that I can make them say "Loading..." or something like that (and I have), but I was wondering: is there was a way to pre-load my "off-screen" records so that as the user scrolls, the list items are already loaded?
My ListItemViews will be fairly large (pixel-wise), so even loading double the amount of data is not going to be killer from an AJAX perspective, and it would be nice if as the user scrolled, the content was always loaded (unless they scroll SUPER-SUPER-fast, in which case I'm okay with them seeing a loading indicator).
I currently found a solution by adding the following to my SC.ListView, but I've noticed some major performance issues on mobile and they are directly related to making this change, so I was wondering if there was a better way.
contentIndexesInRect: function(rect) {
rect.height = rect.height * 2;
return sc_super();
}
Overriding contentIndexesInRect is the way I would do this. I would do less than double it though – I might get the result from sc_super() and then add a few extra items to the resulting index set. (I believe it comes back frozen, so you may have to clone-edit-freeze.) One or two extra may give you enough breathing room to get the stuff loaded, without contributing nearly as much to the apparent performance issue.
I'm surprised that it results in major performance issues though. It sounds to me like your list items themselves may be heavier-weight than they need to be – for example, they may have a lot of bindings to hook and unhook. If that's what's going on, you may benefit more from improving their efficiency.
I think you would be better served to load the additional data outside of the context of what the list is actually displaying. For instance, forcing more list items to render in order to trigger additional requests does result in having the extra data available, but also adds several unnecessary elements to the DOM, which is actually detrimental to overall performance. In fact these extra elements are most likely the cause of the major slowdown on mobile once you get to a sufficient number of extras.
Instead, I would first ensure that your list item views are properly pooling so that only the visible items are updating in place with as little DOM manipulation as possible. Then second, I would lazily load in additional data only after the required data is requested. There are quite a few ways to do this depending on your setup. You might want to add some logic to a data source to trigger an additional request on each filled request range or you might want to do something like override itemViewForContentIndex in SC.CollectionView as the point to trigger the extra data. In either case, I imagine it could look something like this,
// …
prefetchTriggered: function (lastIndex) {
// A query that will fetch more data (this depends totally on your setup).
var query = SC.Query.remote(MyApp.Record, {
// Parameters to pass to the data source so it knows what to request.
lastIndex: lastIndex
});
// Run the query.
MyApp.store.find(query);
},
// …
As I mention in the comments above, the structure of the request depends totally on your setup and your API so you'll have to modify it to meet your needs. It will work better if you are able to request a suitable range of items, rather than one-at-a-time.

Transferring lots of objects with Guid IDs to the client

I have a web app that uses Guids as the PK in the DB for an Employee object and an Association object.
One page in my app returns a large amount of data showing all Associations all Employees may be a part of.
So right now, I am sending to the client essentially a bunch of objects that look like:
{assocation_id: guid, employees: [guid1, guid2, ..., guidN]}
It turns out that many employees belong to many associations, so I am sending down the same Guids for those employees over and over again in these different objects. For example, it is possible that I am sending down 30,000 total guids across all associations in some cases, of which there are only 500 unique employees.
I am wondering if it is worth me building some kind of lookup index that I also send to the client like
{ 1: Guid1, 2: Guid2 ... }
and replacing all of the Guids in the objects I send down with those ints,
or if simply gzipping the response will compress it enough that this extra effort is not worth it?
Note: please don't get caught up in the details of if I should be sending down 30,000 pieces of data or not -- this is not my choice and there is nothing I can do about it (and I also can't change Guids to ints or longs in the DB).
Your wrote at the end of your question the following
Note: please don't get caught up in the details of if I should be
sending down 30,000 pieces of data or not -- this is not my choice and
there is nothing I can do about it (and I also can't change Guids to
ints or longs in the DB).
I think it's your main problem. If you don't solve the main problem you will be able to reduce the size of transferred data to 10 times for example, but you still don't solve the main problem. Let us we think about the question: Why so many data should be sent to the client (to the web browser)?
The data on the client side are needed to display some information to the user. The monitor is not so large to show 30,000 total on one page. No user are able to grasp so much information. So I am sure that you display only small part of the information. In the case you should send only the small part of information which you display.
You don't describe how the guids will be used on the client side. If you need the information during row editing for example. You can transfer the data only when the user start editing. In the case you need transfer the data only for one association.
If you need display the guids directly, then you can't display all the information at once. So you can send the information for one page only. If the user start to scroll or start "next page" button you can send the next portion of data. In the way you can really dramatically reduce the size of transferred data.
If you do have no possibility to redesign the part of application you can implement your original suggestion: by replacing of GUID "{7EDBB957-5255-4b83-A4C4-0DF664905735}" or "7EDBB95752554b83A4C40DF664905735" to the number like 123 you reduce the size of GUID from 34 characters to 3. If you will send additionally array of "guid mapping" elements like
123:"7EDBB95752554b83A4C40DF664905735",
you can reduce the original size of data 30000*34 = 1020000 (1 MB) to 300*39 + 30000*3 = 11700+90000 = 101700 (100 KB). So you can reduce the size of data in 10 times. The usage of compression of dynamic data on the web server can reduce the size of data additionally.
In any way you should examine why your page is so slowly. If the program works in LAN, then the transferring of even 1MB of data can be quick enough. Probably the page is slowly during placing of the data on the web page. I mean the following. If you modify some element on the page the position of all existing elements have to be recalculated. If you would be work with disconnected DOM objects first and then place the whole portion of data on the page you can improve the performance dramatically. You don't posted in the question which technology you use in you web application so I don't include any examples. If you use jQuery for example I could give some example which clear more what I mean.
The lookup index you propose is nothing else than a "custom" compression scheme. As amdmax stated, this will increase your performance if you have a lot of the same GUIDs, but so will gzip.
IMHO, the extra effort of writing the custom coding will not be worth it.
Oleg states correctly, that it might be worth fetching the data only when the user needs it. But this of course depends on your specific requirements.
if simply gzipping the response will compress it enough that this extra effort is not worth it?
The answer is: Yes, it will.
Compressing the data will remove redundant parts as good as possible (depending on the algorithm) until decompression.
To get sure, just send/generate the data uncompressed and compressed and compare the results. You can count the duplicate GUIDs to calculate how big your data block would be with the dictionary compression method. But I guess gzip will be better because it can also compress the syntactic elements like braces, colons, etc. inside your data object.
So what you are trying to accomplish is Dictionary compression, right?
http://en.wikibooks.org/wiki/Data_Compression/Dictionary_compression
What you will get instead of Guids which are 16 bytes long is int which is 4 bytes long. And you will get a dictionary full of key value pairs that will associate each guid to some int value, right?
It will decrease your transfer time when there're many objects with the same id used. But will spend CPU time before transfer to compress and after transfer to decompress. So what is the amount of data you transfer? Is it mb / gb / tb? And is there any good reason to compress it before sending?
I do not know how dynamic is your data, but I would
on a first call send two directories/dictionaries mapping short ids to long GUIDS, one for your associations and on for your employees e.g. {1: AssoGUID1, 2: AssoGUID2,...} and {1: EmpGUID1, 2:EmpGUID2,...}. These directories may also contain additional information on the Associations and Employees instances; I suspect you do not simply display GUIDs
on subsequent calls just send the index of Employees per Association { 1: [2,4,5], 3:[2,4], ...}, the key being the association short id and the ids in the array value, the short ids of the employees. Given your description building the reverse index: Employee to Associations may give better result size wise (but higher processing)
Then its all down to associative arrays manipulations which is straightforward in JS.
Again, if your data is (very) dynamic server side, the two directories will soon be obsolete and maintaining synchronization may cost you a lot.
I would start by answering the following questions:
What are the performance requirements? Are there size requirements? Speed requirements? What is the minimum performance that is truly needed?
What are the current performance metrics? How far are you from the requirements?
You characterized the data as possibly being mostly repeats. Is that the normal case? If not, what is?
The 2 options you listed above sound reasonable and trivial to implement. Try creating a look-up table and see what performance gains you get on actual queries. Try zipping the results (with look-ups and without), and see what gains you get.
In my experience if you're not TOO far from the goal, performance requirements are often trial and error.
If those options don't get you close to the requirements, I would take a step back and see if the requirements are reasonable in the time you have to solve the problem.
What you do next depends on which performance goals are lacking. If it is size, you're starting to be limited if you're required to send the entire association list ever time. Is that truly a requirement? Can you send the entire list once, and then just updates?

Resources