Create an LDAP cache using the UnboundID LDAP SDK?

I would like to build an LDAP cache with the following goals:
Decrease the number of connection attempts to the LDAP server
Read from the local cache if the entry exists and is valid in the cache
Fetch from LDAP if the entry has not been requested before or the cached entry is invalid
Currently I am using the UnboundID LDAP SDK to query LDAP, and it works.
After doing some research, I found a persistent search example that may work. When an entry is updated on the LDAP server, it is passed to searchEntryReturned, so updating the cache is possible.
https://code.google.com/p/ldap-sample-code/source/browse/trunk/src/main/java/samplecode/PersistentSearchExample.java
http://www.unboundid.com/products/ldapsdk/docs/javadoc/com/unboundid/ldap/sdk/AsyncSearchResultListener.html
But I am not sure how to do this since it is asynchronous. Or is there a better way to implement the cache? Examples and ideas are greatly welcomed.
The LDAP server is Apache DS, and it supports persistent search.
The program is a JSF2 application.

I believe that Apache DS supports the use of the content synchronization controls as defined in RFC 4533. These controls may be used to implement a kind of replication or data synchronization between systems, and caching is a somewhat common use of that. The UnboundID LDAP SDK supports these controls (http://www.unboundid.com/products/ldap-sdk/docs/javadoc/index.html?com/unboundid/ldap/sdk/controls/ContentSyncRequestControl.html). I'd recommend looking at those controls and the information contained in RFC 4533 to determine whether that might be more appropriate.
Another approach might be to see if Apache DS supports an LDAP changelog (e.g., in the format described in draft-good-ldap-changelog). This allows you to retrieve information about entries that have changed so that they can be updated in your local copy. By periodically polling the changelog to look for new changes, you can consume information about changes at your own pace (including those which might have been made while your application was offline).
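To make the polling idea above concrete, here is a minimal Python sketch. It does not use the UnboundID LDAP SDK or any real changelog format: `fetch_changes`, the `(change_number, dn, entry)` tuple shape, and the `ChangelogCache` class are all hypothetical stand-ins for whatever your server actually exposes. The only point is that remembering the last change number lets the client resume where it left off and consume changes at its own pace.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical change record: (change_number, dn, new_entry).
# new_entry is None when the entry was deleted.
Change = Tuple[int, str, Optional[dict]]


class ChangelogCache:
    """Local copy of entries, kept current by polling a changelog."""

    def __init__(self, fetch_changes: Callable[[int], List[Change]]):
        # fetch_changes(last_seen) is an assumed callback that returns every
        # change whose number is greater than last_seen.
        self._fetch_changes = fetch_changes
        self._entries: Dict[str, dict] = {}
        self._last_seen = 0

    def poll(self) -> int:
        """Apply pending changes in order and return how many were applied."""
        changes = sorted(self._fetch_changes(self._last_seen), key=lambda c: c[0])
        for number, dn, entry in changes:
            if entry is None:
                self._entries.pop(dn, None)   # entry was deleted
            else:
                self._entries[dn] = entry     # replace the whole cached entry
            self._last_seen = number          # remember progress for next poll
        return len(changes)

    def get(self, dn: str) -> Optional[dict]:
        return self._entries.get(dn)
```

Because progress is tracked by change number rather than by an open connection, calling `poll()` after the application has been offline would pick up everything it missed, as long as the server still retains those changelog records.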
Although persistent search may work in your case, there are a few issues that might make it problematic. The first is that you don't get any control over the rate at which updated entries are sent to your client, and if the server can apply changes faster than the client can consume them, then this can overwhelm the client (which has been observed in a number of real-world cases). The second is that a persistent search will let you know what entries were updated, but not what changes were made to them. In the case of a cache, this may not have a huge impact because you'll just replace your copy of the entire entry, but it's less desirable in other cases. Another big problem is that a persistent search will only return information about entries updated while the search was active. If your client is shut down or the connection becomes invalid for some reason, then there's no easy way to get information about any changes while the client was in that state.
Client-side caching is generally a bad thing, for many reasons. It can serve stale data to applications, which has the potential to cause incorrect behavior or in some cases pose a security risk, and it's absolutely a huge security risk if you're using it for authentication. It could also pose a security risk if not all of the clients have the same level of access to the data contained in the cache. Further, implementing a cache for each client application isn't a scalable solution, and if you were to try to share a cache across multiple applications, then you might as well just make it a full directory server instance. It's much better to use a server that can simply handle the desired load without the need for any additional caching.

Related

Clarification on database caching

Correct me if I'm wrong, but from my understanding, "database caches" are usually implemented with an in-memory database that is local to the web server (same machine as the web server). Also, these "database caches" store the actual results of queries. I have also read up on several caching strategies, such as Cache Aside, Read Through, Write Through, Write Behind, and Write Around.
For some context, the original post included diagrams of the Write Through and Cache Aside strategies here (images not preserved).
I believe that the "Application" refers to a backend server with a REST API.
My first question is, in the Write Through strategy (application writes to cache, cache then writes to database), how does this work? From my understanding, the most commonly used database caches are Redis or Memcached - which are just key-value stores. Suppose you have a relational database as the main database, how are these key-value stores going to write back to the relational database? Do these strategies only apply if your main database is also a key-value store?
In a Write Through (or Read Through) strategy, the cache sits in between the application and the database. How does that even work? How do you get the cache to talk to the database server? From my understanding, the web server (the application) is always the one facilitating the communication between the cache and the main database - which is basically a Cache Aside strategy. Unless Redis has some kind of functionality that allows it to talk to another database, I don't quite understand how this works.
Isn't it possible to mix and match caching strategies? From how I see it, Cache Aside and Read Through are caching strategies for application reads (user wants to read data), while Write Through and Write Behind are caching strategies for application writes (user wants to write data). Couldn't you have a strategy that uses both Cache Aside and Write Through? Why do most articles always seem to portray them as independent strategies?
What happens if you have a cluster of web servers? Do they each have their own local in-memory database that acts as a cache?
Could you implement a cache using a normal (not in-memory) database? I suppose this would still be somewhat useful since you do not need to make an additional network hop to the database server (since the cache lives on the same machine as the web server)?

Introduction & clarification
I think there is one misunderstanding: the cache is NOT necessarily stored on the same server as the web server. Sometimes not even the database is separated onto its own server from the web server. If you think of APIs, like HTTP REST APIs, you can use caching to avoid spending too many resources on database connections and queries. Generally, you want to use as few database connections and queries as possible. Now imagine the following setting:
You have a web server that serves your application and a REST API, which the web server uses to work with some resources. Those resources come from a database (let's say a relational database) that is also hosted on the same server. Now there is one endpoint that serves, e.g., a list of posts (like blog posts). Every user can fetch all posts (to keep this example simple). This is a case where the API response could be cached, so that not every user triggers the database just to query the same resources (via the REST API) over and over again. This is where caching comes in. Redis is one of many tools that can be used for caching. Since Redis is a simple in-memory key-value store, you can just put all of your posts (remember the REST API) into the cache after the first DB query. All future requests for the posts list would first check whether the posts are already cached. If they are, the API returns the cached content for this specific request.
This is one simple example of what caching can be used for.
Answers to your questions
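The posts-list scenario above is the Cache Aside pattern: the application checks the cache first and only queries the database on a miss. Below is a minimal Python sketch of that flow. It is an illustration under assumptions, not Redis client code: a plain dict stands in for Redis, and the `CacheAside` class, the `load` callback, and the time-to-live parameter are names invented for this example.

```python
import time
from typing import Any, Callable, Dict, Tuple


class CacheAside:
    """The application owns the logic: look in the cache, else load and store."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        # key -> (expires_at, value); a dict stands in for Redis here
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, load: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                  # cache hit: no database query
        value = load()                     # cache miss: query the database
        self._store[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: str) -> None:
        """Call this after writing to the database so readers see fresh data."""
        self._store.pop(key, None)
```

An endpoint handler would then call something like `cache.get_or_load("posts", query_posts)`, where `query_posts` is whatever function runs the real database query, and call `invalidate("posts")` whenever a post is created or edited.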
My first question is, why would you ever write to a cache?
To reduce the amount of database connections and queries.
how is writing to these key-value stores going to help with updating the relational database?
It does not help you with updating; instead, it helps you spend fewer resources. It also helps with temporarily backing up some data, but only as a minor side effect. There are more attractive solutions for that (Redis is primarily an in-memory store, although it supports persistence).
Do these cache writing strategies only apply if your main database is also a key-value store?
No, it does not matter which database you use, whether it's a NoSQL or SQL DB. It strongly depends on what you want to cache and how the database and its tables are set up. Do your resources change frequently? Do resources get updated manually or only through user-initiated actions? Those are the questions that lead you to the right caching implementation.
Isn't it possible to mix and match caching strategies?
I am not an expert at caching strategies, but let me try:
I guess it is possible, but it also depends heavily on what you are doing in your DB and what kind of application you have. Once you know what kind of application you are building, you will know which strategy to use. I would guess that mixing strategies is not recommended, because each strategy is coupled to your application type; in other words, it may not work out well.
What happens if you have a cluster of web servers? Do they each have their own local in-memory database that acts as a cache?
I guess both are possible. Usually you have one database, maybe clustered or synchronized with copies, to which your web servers (e.g. REST APIs) make their requests. Then either each of your API servers has its own cache, to avoid querying the database (in cloud-based applications your database may also be on a separate server, which means another "hop" in terms of networking), or (what I can also imagine) you have another middleware layer between your clustered APIs and your DB (maybe also clustered). I would guess that few people do the latter because of the network traffic: it would result in a higher response time, which you usually want to avoid.
Could you implement a cache using a normal (not in-memory) database?
Yes, you could, but it would be much slower. A machine can access in-memory data faster than it can open another (local) connection to a database and query the cached entries, and the database also has to write the entries to files on disk to persist the data.
Conclusion
All in all, it is about fast response times and avoiding unnecessary network traffic. I hope I could help you out a little bit.

what are some caches that are responsible for fetching the data on miss?

The book 'architecture of open source software' says that the most common type of global cache in a web application is responsible for fetching the data itself, in case it is missing, as shown in this figure. This seems different from what I've encountered so far. Most applications I have encountered make the application server responsible for fetching data from the db, and updating the server. At first, I thought the book might be talking about caching proxies, like Varnish, but they cover those in the next section, so that doesn't seem to be the case.
What cache systems actually fetch the data in case of a miss, and how do they know how to interact with the database?

Caching solutions provide read-through/write-behind features, which let users configure a read-through/write-behind provider by implementing an interface and deploying it with the cache server. These providers contain the logic for how the cache server interacts with the database to load and save data.
On a cache fetch operation, if the data is not present in the cache server, the cache loads it from the database using the configured provider, so the client never has to handle the miss itself.
This way, client applications treat the cache as their only data source, and the cache itself is responsible for interactions with the database. You can read further details in this article by Iqbal Khan.
NCache and TayzGrid are enterprise solutions among many others that provide this feature.
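The provider idea described in this answer can be sketched in a few lines of Python. This is a generic illustration only and not the API of NCache, TayzGrid, or any other product: the `Provider` interface, its `load`/`save` methods, and the `DictProvider` stand-in database are all made up for the example. Note that `put` below is a synchronous write-through, which is simpler than the write-behind (deferred write) feature mentioned above.

```python
from typing import Any, Dict, Protocol


class Provider(Protocol):
    """Hypothetical interface the cache calls to reach the database."""

    def load(self, key: str) -> Any: ...
    def save(self, key: str, value: Any) -> None: ...


class ReadThroughCache:
    """Clients talk only to the cache; the cache talks to the provider."""

    def __init__(self, provider: Provider):
        self._provider = provider
        self._data: Dict[str, Any] = {}

    def get(self, key: str) -> Any:
        if key not in self._data:
            # Miss: the cache itself fetches from the backing store.
            self._data[key] = self._provider.load(key)
        return self._data[key]

    def put(self, key: str, value: Any) -> None:
        # Write-through: persist first, then update the cached copy.
        self._provider.save(key, value)
        self._data[key] = value


class DictProvider:
    """Stand-in 'database' backed by a dict; counts loads for illustration."""

    def __init__(self, rows: Dict[str, Any]):
        self.rows = rows
        self.loads = 0

    def load(self, key: str) -> Any:
        self.loads += 1
        return self.rows[key]

    def save(self, key: str, value: Any) -> None:
        self.rows[key] = value
```

With a relational database, the provider is where the SQL would live: `load` would run a SELECT and `save` an INSERT or UPDATE, so the key-value cache never needs to understand the database schema itself.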

Caching Dynamic data that isn't really dynamic in an IIS7 environment

Okay, so I have an old ASP Classic website. I've determined I can reduce a huge number of DB calls by caching the data daily. Our site data is read only, and changes very slowly. I think based on our site usage, I would be able to cache pages by query string for every visit each day, without a hit to our server.
My first thought was to use Output Caching, but the problem I discovered right away was that it wasn't until the third page request was generated that I gained any performance. I verified this using SQL profiler, but I'm not sure why.
My second thought was to add the ObjPageCache include file described at https://web.archive.org/web/20211020131054/https://www.4guysfromrolla.com/webtech/032002-1.shtml (link). After some research, I discovered that this could cause more issues than it may solve: http://support.microsoft.com/kb/316451
I'm hoping someone on here will tell me that, since 2002, the issue with sending ServerXMLHTTP or WinHTTP requests to the same server has been resolved by Microsoft.

Depending on how your data is maintained you could choose from a number of ways to cache it.
If your data is changed and saved in one single place, you could generate an HTML file that you save to the server's disk and link to directly. This requires write access for the process running your site, though (e.g. NETWORK SERVICE). It produces fast pages, because the server serves them without any scripting engine getting involved.
Another option is reading the data into a DomDocument that you store in the Application object and refer to on the page that needs it (saving the round trip to the database). You could keep two timestamps together with the cached data (one for the caching time and one for the time the data last changed in the database). The timestamps allow a fast staleness check of the cached data: cached timestamp <> database timestamp => refresh data; otherwise use cached data. One thing to note about this approach is that Application does not accept objects that are not free-threaded, so you will have to use MSXML2.FreeThreadedDomDocument.6.0.
Personally I prefer the latter, as it allows for more dynamic usage and I don't have to worry about write-access permissions for the process running my site (which would probably pose security risks anyway).
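The timestamp comparison described above (cached timestamp <> database timestamp => refresh) can be sketched as follows. This is Python rather than ASP Classic and is only meant to show the control flow; `read_db_timestamp` and `load_data` are hypothetical callbacks standing in for a cheap "last modified" query and the expensive full query.

```python
from typing import Any, Callable, Optional


class TimestampedCache:
    """Keeps one cached result plus the database timestamp it was loaded at."""

    def __init__(self,
                 read_db_timestamp: Callable[[], Any],
                 load_data: Callable[[], Any]):
        self._read_db_timestamp = read_db_timestamp  # cheap query (assumed)
        self._load_data = load_data                  # expensive query (assumed)
        self._data: Optional[Any] = None
        self._cached_timestamp: Optional[Any] = None

    def get(self) -> Any:
        db_timestamp = self._read_db_timestamp()
        if self._cached_timestamp != db_timestamp:
            # Stale (or first use): reload and remember the new timestamp.
            self._data = self._load_data()
            self._cached_timestamp = db_timestamp
        return self._data
```

This still costs one small query per request to read the timestamp, so it only pays off when that check is much cheaper than reloading the data.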

what way to store data by key and value?

I store data with:
HttpContext.Current.Application.Add(appKey, value);
And read it back with:
HttpContext.Current.Application[appKey];
The advantage for me is that I can look up a value by key, but after a short time (about 20 minutes) it stops working and I can no longer find [appKey], because the data is lost when the application life cycle in IIS ends.
Is there another way to store my data by key and value?
I do not want to use SQL Server, files, etc., and I want to store the data on the server, not on the client.
I store some user data in it.
Thanks for your help.

Since IIS may recycle and throw away any cache/memory contents at any time, the only way to get data persisted is to store it outside IIS. Some examples are (and yes, I included the ones you stated you didn't want, just to make the list a bit more complete; feel free to skip them):
A SQL database (there are quite a few free ones if the price is prohibitive)
A NoSQL database (same thing there, quite a few free ones and usually simpler to use for key/value)
File (which you also stated you didn't want)
Some kind of external memory cache, à la AppFabric cache or memcached.
Cookies (somewhat limited in size and not secure in any way by default)
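To illustrate why storing the data outside the worker process survives a recycle, here is a small language-agnostic sketch in Python (not ASP.NET code). It uses SQLite from the standard library as the external store; the `SqliteKeyValueStore` class and the `kv` table name are made up for this example, and any of the options listed above could play the same role.

```python
import sqlite3
from typing import Optional


class SqliteKeyValueStore:
    """Key/value store backed by a file, so values outlive the process."""

    def __init__(self, path: str):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv ("
            "key TEXT PRIMARY KEY, value TEXT NOT NULL)"
        )
        self._conn.commit()

    def set(self, key: str, value: str) -> None:
        # 'with' commits on success and rolls back on error.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                (key, value),
            )

    def get(self, key: str, default: Optional[str] = None) -> Optional[str]:
        row = self._conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        return row[0] if row else default

    def close(self) -> None:
        self._conn.close()
```

Closing the store and opening a new one on the same path plays the part of an application recycle here: the second instance still sees the values written by the first.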

You could create a persistent cookie on the user's machine so that the session doesn't expire, or increase the session timeout to a value that works better for your situation and users.
How to create persistent cookies in asp.net?
Session timeout in ASP.NET

You're talking about persisting data beyond the scope of a session. So you're going to have to use some form of persistent storage (Database, File, Caching Server).
Have you considered using AppFabric? It's actually pretty easy to implement. You could either access it directly from your code using the NuGet packages, or you could just configure it as a session store. (I think) doing the latter would mean you'd get rid of the session timeout issue.

Do you understand that whatever you decide to store in Application, will be available for all users in your application?
Now, regarding your actual question: what kind of data do you plan on storing? If it's sensitive user data, then it probably makes sense to store it in the session. If it's client-specific and doesn't contain any sensitive information, then cookies are probably a reasonable way forward.
If it is indeed application-wide data that must be the same for every user of your application, then you can make configuration changes to make sure that it doesn't expire after 20 minutes.

Move application to Websphere clusters

What should we take care of before moving an application from a single WebSphere Application Server to a WebSphere cluster?

This is my list from experience. It is not complete but should cover the most common problem areas:
Plan ahead for the distributed session management configuration (i.e. whether you will use memory-to-memory or database-based replication). Note that if you are still on a 32-bit platform, the resource overhead from clustering might cause instability if your application already uses a lot of memory.
Make sure that everything you put into user sessions can be serialized with the default serializer (implements Serializable). You might otherwise run into problems with distributed sessions.
The same goes for everything you put into DynaCache. Make sure everything serializes properly.
Specify all the resource definitions (JDBC providers etc.) at the proper scope. I would usually recommend using the actual Cluster scope for everything that the applications installed on the cluster use. That ensures the testing features work properly from the proper points, and that you don't make conflicting definitions.
Make sure your application uses relative paths for resources in web interfaces. Once you start load balancing and stuff you can run into some serious problems if you have bolted down a lot of stuff.
If you have any sort of timers, make sure they work well with clusters. With Quartz, that probably means you should use the JDBC store for timer tasks. With EJB Timers, make sure you register the timers only once (it is possible to corrupt the timer database of WAS if several nodes attempt the registration at exactly the same time) and make sure you install them to Cluster scope.
Make sure you use the WAS provided SSO mechanisms. If you have a custom implementation please make sure it handles moving the user between servers in cluster well.

Keep it simple: depending on your requirements, try configuring your load balancer to use sticky sessions and avoid holding state in your HTTP session. That way you don't need resource-hungry in-memory session replication.
Single Sign On isn't an issue for a single cluster as your HTTP clients will not be moving off the same http://server.acme.com/... host domain name.
Most of your testing should focus on database contention. If you have a highly transactional application (i.e. many writes to the same table), make sure you look at your database isolation levels so that locks are not held unnecessarily. The same goes for your transaction demarcation: keep transactions as brief as possible. If you don't have database skills yourself, make sure you get a database analyst to help you monitor the database while you test.

It is also good advice to raise a PMR with IBM Support ahead of any major change, such as this one or an upgrade to a new version. Raise it as a "Software Usage Question" and they can provide you with feedback from their knowledge database based on other customers' input. The same applies to any product for which you have a support agreement: ask support before problems occur.
