There are two ways to get changes from Exchange using EWS:
1. Use SyncFolderItems to get a list of changed items
2. Use FindItem to get a list of changed items (filtered by modify time)
I just need to know new and modified items, not the deleted ones, so both of above can satisfy the functional need. The performance impact is more important for us. Based on my testing on a folder with 20000 items (200 are new items), both APIs take similar amount of time to complete.
This seems to be counter intuitive, I thought SyncFolderItems would be more efficient as it doesn't need to search against everything. My question is, from performance perspective, are they the same or there are other considerations? (I prefer to use FindItem if so since I don't need to store the sync state of each folder)
Thanks!
Related
I'm trying to find some information about the performance of angular.
If I had a list of 10k (or 50k) objects with 20 attributes each, would an average pc be able to filter this array in a reasonable time?
(Assuming a good implementation and this beeing the only operation executed)
Does anybody have any expierience in this direction?
Generally speaking, unless you can find a way to do this asynchronously; it's going to cause blocking UI issues. I've found the average "PC" (and let's not forget browsers are not created equal) is too slow for my use cases and that was with more simple objects (less attributes).
If you had a very limited set of attributes you actually want to filter on, there is a chance you could speed up the work by leveraging IndexedDB.
I'm pondering over how to efficiently represent locations in a database, such that given an arbitrary new location, I can efficiently query the database for candidate locations that are within an acceptable proximity threshold to the subject.
Similar things have been asked before, but I haven't found a discussion based on my criteria for the problem domain.
Things to bear in mind:
Starting from scratch, I can represent data in any way (eg. long&lat, etc)
Any result set is time-sensitive, in that it loses validity within a short window of time (~5-15mins) so I can't cache indefinitely
I can tolerate some reasonable margin of error in results, for example if a location is slightly outside of the threshold, or if a row in the result set has very recently expired
A language agnostic discussion is perfect, but in case it helps I'm using C# MVC 3 and SQL Server 2012
A couple of first thoughts:
Use an external API like Google, however this will generate thousands of requests and the latency will be poor
Use the Haversine function, however this looks expensive and so should be performed on a minimal number of candidates (possibly as a Stored Procedure even!)
Build a graph of postcodes/zipcodes, such that from any node I can find postcodes/zipcodes that border it, however this could involve a lot of data to store
Some optimization ideas to reduce possible candidates quickly:
Cache result sets for searches, and when we do subsequent searches, see if the subject is within an acceptable range to a candidate we already have a cached result set for. If so, use the cached result set (but remember, the results expire quickly)
I'm hoping the answer isn't just raw CPU power, and that there are some approaches I haven't thought of that could help me out?
Thank you
ps. Apologies if I've missed previously asked questions with helpful answers, please let me know below.
What about using GeoHash? (refer to http://en.wikipedia.org/wiki/Geohash)
I have the following setup:
boolean data: (userid, itemid)
hadoop based mahout itemSimilarityJob with following arguements:
--similarityClassname Similarity_Loglikelihood
--maxSimilaritiesPerItem 50 & others (input,output..)
item based boolean recommender:
-model MySqlBooleanPrefJDBCDataModel
-similarity MySQLJDBCInMemoryItemSimilarity
-candidatestrategy AllSimilarItemsCandidateItemsStrategy
-mostSimilarItemsCandidateStrategy AllSimilarItemsCandidateItemsStrategy
Is there a way to use similarity cooccurence in my setup to get final recommendations? If I plug SIMILARITY_COOCCURENCE in the job, the MySqlJDBCInMemorySimilarity precondition checks fail since the counts become greater than 1. I know I can get final recommendations by running the recommender job on the precomputed similarities. Is there way to do this real time using the api like in the case of similarity loglikelihood (and other similarity metrics with similarity values between -1 & 1) using MysqlInMemorySimilarity?
How can we cap the max no. of similar items per item in the item similarity job. What I mean here is that the allsimilaritemscandidatestrategy calls .allsimilaritems(item) to get all possible candidates. Is there a way I can get say top 10/20/50 similar items using the API. I know we can pass a --maxSimilaritiesPerItem to the item similarity job but i am not completely sure as to what is stands for and how it works. If I set this to 10/20/50, will I be able to achieve what stated above. Also is there way to accomplish this via the api?
I am using a rescorer for filtering out and rescoring final recommendations. With rescorer, the calls to /recommend/userid?howMany=10&rescore={..} & to /similar/itemid?howMany=10&rescore{..} are taking way to longer (300ms-400ms) compared to (30-70ms) without the rescorer. I m using redis as an in memory store to fetch rescore data. The rescorer also receives some run-time data as shown above. There are only a few checks that happen in rescorer. The problem is that as the no. of item preferences for a particular user increase (> 100), the no. of calls to isFiltered() & rescore() increase massively. This is mainly due to the fact that for every user preference, the call to candidateStrategy.getCandidatItems(item) returns around (100+) similar items for each and the rescorer is called for each of these items. Hence the need to cap the max number of similar items per item in the job. Is this correct or am I missing something here? Whats the best way to optimise the rescorer in this case?
The MysqlJdbcInMemorySimilarity uses GenericItemSimilarity to load item similarities in memeory and its .allsimilaritems(item) returns all possible similar items for a given item from the precomputed item similarities in mysql. Do i need to implement my own item similarity class to return top 10/20/50 similar items. What about the if user's no. of preferences continue to grow?
It would be really great if anyone can tell me how to achieve the above? Thanks heaps !
What Preconditions check are you referring to? I don't see them; I'm not sure if similarity is actually prohibited from being > 1. But you seem to be asking whether you can make a similarity function that just returns co-occurrence, as an ItemSimilarity that is not used with Hadoop. Yes you can; it does not exist in the project. I would not advise this; LogLikelihoodSimilarity is going to be much smarter.
You need a different CandidateItemStrategy, particularly, look at SamplingCandidateItemsStrategy and its javadoc. But this is not related to Hadoop, rather than run-time element, and you mention a flag to the Hadoop job. That is not the same thing.
If rescoring is slow, it means, well, the IDRescorer is slow. It is called so many times that you certainly need to cache any lookup data in memory. But, reducing the number of candidates per above will also reduce the number of times this is called.
No, don't implement your own similarity. Your issue is not the similarity measure but how many items are considered as candidates.
I am the author of much of the code you are talking about. I think you are wrestling with exactly the kinds of issues most people run into when trying to make item-based work at significant scale. You can, with enough sampling and tuning.
However I am putting new development into a different project and company called Myrrix, which is developing a sort of 'next-gen' recommender based on the same APIs, but which ought to scale without these complications as it's based on matrix factorization. If you have time and interest, I strongly encourage you to have a look at Myrrix. Same APIs, the real-time Serving Layer is free/open, and the Hadoop-based Computation Layer backed in also available for testing.
Recently I found an nice blog post presenting 2 approaches for tracking online users of a web site with the help of Redis.
1) Smart-keys and setting their expiration
http://techno-weenie.net/2010/2/3/where-s-waldo-track-user-locations-with-node-js-and-redis
2) Set-s and intersects
http://www.lukemelia.com/blog/archives/2010/01/17/redis-in-practice-whos-online/
Can you judge which one should be faster and why?
For knowing whether or not a particular user is online, the first method will be a lot faster - nothing is faster than reading a single key.
Finding users on a particular page is not as clear (I haven't seen hard numbers on the performance of either intersection or wildcard keys), but if the set is big enough to cause performance problems in either implementation it isn't practical to display them all anyway.
For matching users to a friends list I would probably go with the first approach also - even a few hundred get operations (checking the status of everyone in the list) should outperform intersection on multiple sets if those sets have a large number of records and are difficult to maintain.
Redis sets are more appropriate for things that can't be done with keys, particularly where getting all items in the set is more important than checking if a particular item is in the set.
All web developers run into this problem when the amount of data in their project grows, and I have yet to see a definitive, intuitive best practice for solving it. When you start a project, you often create forms with tags to help pick related objects for one-to-many relationships.
For instance, I might have a system with Neighbors and each Neighbor belongs to a Neighborhood. In version 1 of the application I create an edit user form that has a drop down for selecting users, that simply lists the 5 possible neighborhoods in my geographically limited application.
In the beginning, this works great. So long as I have maybe 100 records or less, my select box will load quickly, and be fairly easy to use. However, lets say my application takes off and goes national. Instead of 5 neighborhoods I have 10,000. Suddenly my little drop-down takes forever to load, and once it loads, its hard to find your neighborhood in the massive alphabetically sorted list.
Now, in this particular situation, having hierarchical data, and letting users drill down using several dynamically generated drop downs would probably work okay. However, what is the best solution when the objects/records being selected are not hierarchical in nature? In the past, of done this with a popup with a search box, and a list, but this seems clunky and dated. In today's web 2.0 world, what is a good way to find one object amongst many for ones forms?
I've considered using an Ajaxifed search box, but this seems to work best for free text, and falls apart a little when the data to be saved is just a reference to another object or record.
Feel free to cite specific libraries with generic solutions to this problem, or simply share what you have done in your projects in a more general way
I think an auto-completing text box is a good approach in this case. Here on SO, they also use an auto-completing box for tags where the entry already needs to exist, i.e. not free-text but a selection. (remember that creating new tags requires reputation!)
I personally prefer this anyways, because I can type faster than select something with the mouse, but that is programmer's disease I guess :)
Auto-complete is usually the best solution in my experience for searches, but only where the user is able to provide text tokens easily, either as part of the object name or taxonomy that contains the object (such as a product category, or postcode).
However this doesn't always work, particularly where 'browse' behavior would be more suitable - to give a real example, I once wrote a page for a community site that allowed a user to send a message to their friends. We used auto-complete there, allowing multiple entries separated by commas.
It works great when you know the names of the people you want to send the message to, but we found during user acceptance that most people didn't really know who was on their friend list and couldn't use the page very well - so we added a list popup with friend icons, and that was more successful.
(this was quite some time ago - everyone just copies Facebook now...)
Different methods of organizing large amounts of data:
Hierarchies
Spatial (geography/geometry)
Tags or facets
Different methods of searching large amounts of data:
Filtering (including autocomplete)
Sorting/paging (alphabetically-sorted data can also be paged by first letter)
Drill-down (assuming the data is organized as above)
Free-text search
Hierarchies are easy to understand and (usually) easy to implement. However, they can be difficult to navigate and lead to ambiguities. Spatial visualization is by far the best option if your data is actually spatial or can be represented that way; unfortunately this applies to less than 1% of the data we normally deal with day-to-day. Tags are great, but - as we see here on SO - can often be misused, misunderstood, or otherwise rendered less effective than expected.
If it's possible for you to reorganize your data in some relatively natural way, then that should always be the first step. Whatever best communicates the natural ordering is usually the best answer.
No matter how you organize the data, you'll eventually need to start providing search capabilities, and unlike organization of data, search methods tend to be orthogonal - you can implement more than one. Filtering and sorting/paging are the easiest, and if an autocomplete textbox or paged list (grid) can achieve the desired result, go for that. If you need to provide the ability to search truly massive amounts of data with no coherent organization, then you'll need to provide a full textual search.
If I could point you to some list of "best practices", I would, but HID is rarely so clear-cut. Use the aforementioned options as a starting point and see where that takes you.