Does SQLAlchemy support some kind of caching so if I repeatedly run the same query it returns the response from cache instead of querying the database? Is this cache automatically cleared when the DB is updated?
Or what's the best way to implement this on a CherryPy + SQLAlchemy setup?
We have a pretty comprehensive caching solution, as an example in conjunction with embedded hooks, in 0.6. It's a recipe to subclass Query, make it aware of Beaker, and allow control of query caching for explicit queries as well as lazy loaders via query options.
I'm running it in production now. The example itself is in the dist and the intro documentation is at http://www.sqlalchemy.org/docs/orm/examples.html#beaker-caching .
UPDATE: Beaker has now been replaced with dogpile caching: http://docs.sqlalchemy.org/en/latest/orm/examples.html#module-examples.dogpile_caching
Not an answer to your second question, but from the comments in this link indicates that SQLAlchemy does not support cacheing : http://spyced.blogspot.com/2007/01/why-sqlalchemy-impresses-me.html
Raven said...
Does SQLAlchemy do any kind of internal caching?
For example, if you ask for the same data twice (or an obvious subset
of the initially requested data) will the database be hit once or twice?
I recently wrote a caching database abstraction layer for an
application and (while fun) it was a fair bit of work to get it to a
minimally functional state. If SQLAlchemy did that I would seriously
consider jumping on the bandwagon.
I've found things in the docs that imply something like this might be
going on, but nothing explicit.
4:36 PM
Jonathan Ellis said...
No; the author of SA [rightly, IMO] considers caching a separate concern.
What you saw in the docs is probably the SA identity map, which makes it so
if you load an instance in two different places, they will refer
to the same object. But the database will still be queried twice, so it is
not a cache in the sense you mean.
SQLAlchemy supports two types of caches:
Caching the result set so that repeatedly running the same query hits the cache instead of the database. It uses dogpile which supports many different backends, including memcached, redis, and basic flat files.
Docs are here: http://docs.sqlalchemy.org/en/latest/orm/examples.html#module-examples.dogpile_caching
Caching the query object so that Python interpreter doesn't have to manually re-assemble the query string every time. These queries are called baked queries and the cache is called baked. Basically it caches all the actions sqlalchemy takes BEFORE hitting the database--it does not cut down on database calls. Initial benchmarks show speedups of up to 40% in query generation time at the tradeoff of a slight increase in code verbosity.
Docs are here: http://docs.sqlalchemy.org/en/latest/orm/extensions/baked.html
Or use an application-level cache via weak references dicts (weakref.WeakValueDictionary), see an example here : http://www.sqlalchemy.org/trac/wiki/UsageRecipes/UniqueObject
Related
I've been asking myself this question for quite a while. Maybe someone has already done some digging (or is involved in WP) to know the answer.
I'm talking about storing objects from WP-functions in PHP variables, for the duration of a page load, e.g. to avoid having to query the database twice for the same result set.
I don't mean caching in the sense of pre-rendering dynamic pages and saving them in HTML format for faster retrieval.
Quite a few "template tags" (Wordpress functions) may be used multiple times in a theme during one page load. When a theme or plugin calls such a function, does WP run a database query every time to retrieve the necessary data, and does it parse this data every time to return the desired object?
Or, does the function store the its result in a PHP variable the first time it runs, and checks if it already exists before it queries the database or parses?
Examples include:
wp_get_nav_menu_object()
wp_get_nav_menu_items()
wp_list_categories()
wp_tag_cloud()
wp_list_authors()
...but also such important functions as bloginfo() or wp_nav_menu().
Of course, it wouldn't make much sense to cache any and all queries like post-related ones. But for the above examples (there are more), I believe it would.
So far, I've been caching these generic functions myself when a theme required the same function to be called more than once on a page, by writing my own functions or classes and caching in global or static variables. I don't see why I should add to the server load by running the exact same generic query more than a single time.
Does this sort of caching already exist in Wordpress?
Yes, for some queries and functions. See WP Object Cache. The relevant functions are wp_cache_get, wp_cache_set, wp_cache_add, and wp_cache_delete. You can find these functions being used in many places through the WordPress code to do exactly what you are describing.
What are some of the common and notable performance issues/bottlenecks that are typically encountered in a web application in both, the front-end layer, and the back-end layer?
An example of what I mean in a database is not having something you are querying on be an index. That would slow down the query. On the front-end it might be something funky going on with JavaScript that makes your application seem slow.
What are the general rules of thumb that help navigate such issues? And what are some good to-do's?
Thanks,
Alex
On front-end:
-push all of your assets - css files, images, static content - to a CDN. Edgecast is pretty good and reasonably priced.
-don't use load entire javascript frameworks when you only need a few features from it. only load what's needed.
On back-end
-memcache the results from all database calls by using a hash of the sql query as the key name, and the result set as the value
-make sure you are not making your database tables really 'wide' - tons of columns and column types like 'text' and 'blob'
For the front-end, there are well-known guidelines/rules you can follow, and there are some great tools like YSlow that can help you pinpoint the bottlenecks.
For the back-end, as you've noted, efficient use of indexes is a must. Other optimizations usually involve caching, and basic stuff like avoiding doing stuff within loops that can be done once. I'm sure people here will have suggestions, but remember "premature optimization is the root of all evil!" :-)
Millhouse is on to it. I can also add:
Batch expensive operations up. For example: don't make lots of individual calls to a database if you can do it all in one hit.
Avoid server hops where you can.
Process in parallel if you can (not so common for your 'average' web app but quite possible in larger Enterprise scale apps).
Pre-process: crunching data, pre-puiblishing content etc, the more you can do before it's needed the better.
Use a CQRS-based architecture. CQRS stands for Command/Query Responsability Segregation; it basically means that you have different code (services) for reading from the DB and writing to the DB. A good practice for scalability is to have separate DB's for reading and writing (it actually does make sense, if you read more about CQRS), and you can scale out the reading database by having copies run on multiple servers.
CQRS is not only interesting from a scalability point of view, but also from a code maintenance and clarity point of view. It does take some effort to learn about CQRS and understand it, though.
Check out these links:
http://www.slideshare.net/skillsmatter/ddd-exchange-2010-udi-dahan-on-architectural-innovation-cqrs
http://www.slideshare.net/pjvdsande/rethink-your-architecture-with-cqrs
convert dynamic contents to static contents. regenerate those static contents if their dependent objects changed. I saw one article said that more than 80 percent contents are static on Amazon website.
I am working on one real estate website which is Using RETS service to get the data to my local server.
but I have one little bit problem here,I can fetch data from RETS which is having about 3lacks record in RETS Database but I didn't find the way,How can I fetch that all records in bunch of 50k at a time ?
I didn't find any 'LIMIT' keyword on RETS.so how can I fetch without 'LIMIT' 50k records at a time?
Please help me.
RETS is not really much of a standard. It's more closely resembles a pseudo standard. It loosely defines an XML schema that describes real estate listings.
In version 1.x, the "standard" was composed of DTD documents. In 2.x, the "standard" uses XSD documents to describe the list.
http://www.rets.org/documentation
However, in practice, there is almost no consistency amongst implementers. Having connected to hundreds of "RETS Compliant" service providers, I'm convinced that not one of them is like any other one.
Furthermore, the 2.x "standard" has not changed in 3 years. It's an unmaintained, sloppy attempt at a standard. It (RETS) is often used as a business buzz word by non-technical people. In reality, it's just an arbitrary attempt at modeling real estate listing in XML.
Try asking the specific implementer for their documentation. Often, they don't have any. So, emailing the lead developer has frequently been helpful. Sometimes they'll provide a WSDL which will outline the supported calls. Often, the WSDL doesn't coincide with the actual service, so beware.
As for your specific question, try caching the results. Usually, the use of a limit on a RETS call is a sign of a direct dependency. As requests for your service increase, the load that your service puts on theirs will break (and not be appreciated). Also, if their service goes down (even temporarily), yours will be interrupted as well. Most importantly, it will make the live requests to your pages really, really slow (especially if their system is slow at the time). The listings usually don't change frequently enough for worries about stale data, so caching up to and hour is pretty acceptable.
Best of luck!
libRets provides support for generating a query with fetch limits:
http://www.crt.realtors.org/projects/rets/librets/documentation/api/classlibrets_1_1_search_request.html
But last I knew: I remember the company Intereality either ignored or outright didn't provide complete compatibility to RETS. Quickest way to know your dealing with them is that also thought making all "System" name's for table fields numeric.
If you're lucky, you're using a Rapattoni backed server and they do provide spec. compatible servers.
Last point, I can't for the life of me remember it's name, but I used to use a free Java based RETS tool to build valid queries ( included offset/limit clauses ) and that made it a tad easier to build automated fetchers for a client's batch processing system.
IN RETS if Count More Than limit then We can download using Batch form or we can remove that Limit using regex while downloading
Best way to solve Problem divide Data Count in small unit of download and while we have to consider download limit in mind Field for Divide that one in MLS/IDX I Suggest Modification Date and ListingDate
I am fetching some questions from the server (database) and showing it to client (user) in the browser. The client will answer the question and based on his/her answer the next set of questions will be fetched from the database. Now, I want to pre-fetch the next set of questions while the user read the present question so that the waiting time for user to see the next question will be shorter.
My questions is, how to store the pre-fetched questions i.e. which data structure should I use to store the pre-fetched questions in the memory so that I can get better performance? I want a "cache" type of thing. Also once the user hit any question from the cache the question won't be there any more.
PS: Each question has unique Id.
Thanks
Naveen
There are multiple options to go about it. One that makes a big difference, one that makes little.
Little difference would be to fetch questions and store it in user's session. It's basically depends on where your session is stored, could also be database, or a file. This only makes sense if your db tables are very denormalized and it requires lots of joins to get the answer. I doubt that's the case so this won't make much difference for the user no matter which data structure used.
Big difference would make prefetching them with AJAX using javascript straight into the browser. In this case a simple array would suffice. JS gives you flexibility to build any objects with any properties, anything would be good enough. So write a poller in JS which fetches the questions from server while user is looking at the question, return them using JSON for example. JSON will become a simple object. Since each user stores only a couple of questions prefetched in their browser particular data structure choice won't make a difference here either.
Try using LinkedHashMap as You will have LRU algorithm implemented quickly with good performance.
Read this link as well :
LinkedHashMap as cache
First a few questions to adapt to your context :
assuming you use Java ?
using Hibernate also ?
If you want to prefetch in the server, many caching solutions exists.
Taking into account your unique id (see PS), if this ID is database related and you are using Hibernate, the easiest solution would be to configure the Hibernate second-level cache for that entity. Then, your only code would be to run the query in advance....
If theses requisites do not fit, I used EhCache as the caching solutions.
Somehow easy to start using, and it has plenty of features available when you later need them.
I have been playing with some LINQ ORM (LINQ directly to SQL) and I have to admit I like its expressive powers . For small utility-like apps, It also works quite fast: dropping a SQL server on some surface and you're set to linq away.
For larger apps however, the DAL never was that big of an issue to me to setup, nor maintain, and more often than not, once it was set, all the programming was not happening there anyway...
My, honest - I am an ORM newbie - question : what is the big advantage of ORM over writing a decent DAL by hand?
(seems like a double, couldn't find it though)
UPDATE : OK its a double :-) I found it myself eventually :
ORM vs Handcoded Data Access Layer
Strong-typing
No need to write the DAL yourself => time savings
No need to write SQL code yourself =>
less error-prone
I've used Hibernate in the past to dynamically create quite complex queries. The logic involved to create the appropriate SQL would have been very time-consuming to implement, compared with the logic to build the appropriate Criteria. Additionally, Hibernate knew how to work with various different databases, so I didn't need to put any of that logic in our code. We had to test against different databases of course, and I needed to write an extension to handle "like" queries appropriately, but then it ran against SQL Server, Oracle and HSqldb (for testing) with no issues.
There's also the fact that it's more code you don't have to write, which is always a nice thing :) I can't say I've used LINQ to SQL in anything big, but where I've used it for a "quick and dirty" web-site (very small, rarely updated, little benefit from full layer abstraction) it was lovely.
I used JPA in a project, and at first I was extremely impressed. Gosh it saved me all that time writing SQL! Gradually, however, I became a bit disenchanted.
Difficulty defining tables without surrogate keys. Sometimes we need tables that don't have surrogate keys. Sometimes we want a multicolumn primary key. TopLink had difficulties with that.
Forced datastructure relationships. JPA uses annotations to describe the relationship between a field and the container or referencing class. While this may seem great at first site, what do you do when you reference the objects differently in the application? Say for example, you need just specific objects that reference specific records based on some specific criteria (and it needs to be high-performance with no unnecessary object allocation or record retrieval). The effort to modify Entity classes will almost always exceed the effort that would have existed had you never used JPA in the first place (assuming you are at all successful getting JPA to do what you want).
Caching. JPA defines the notion of caches for your objects. It must be remembered that the database has its own cache, typically optimized around minimizing disk reads. Now you're caching your data twice (ignoring the uncollected GC heap). How this can be an advantage is beyond me.
Data != Objects. For high-performance applications, the retrieval of data from the DB must be done very efficiently. Forcing object creation is not always a good thing. For example, sometimes you may want arrays of primitives. This is about 30 minutes of work for an experienced programmer working with straight JDBC.
Performance, debugging.
It is much more difficult to gauge the performance of an application with complex things going on in the (sub-optimal, autogenerated) caching subsystem, further straining project resources and budgets.
Most developers don't really understand the impedence mismatch problem that has always existed when mapping objects to tables. This fact ensures that JPA and friends will probably enjoy considerable (cough cough) success for the forseeable future.
Well, for me it is a lot about not having to reinvent/recreate the wheel each time I need to implement a new domain model. It is simply a lot more efficient to use for instance nHibernate (my ORM of choice) for creating, using and maintaining the data access layer.
You don't specify exactly how you build your DAL, but for me I used to spend quite some time doing the same stuff over and over again. I used to start with the database model and work my way up from there, creating stored procedures etc. Even if I sometimes used little tools to generate parts of the setup, it was a lot of repetitive coding.
Nowadays I start with the domain. I model it in UML, and for most of the time I'm able to generate everything from that model, including the database schema. It need a few tweaks here and there, but with my current setup I get 95% of the job with the data access done in no time at all. The time I save I can use to fine tune the parts that need tuning. I seldom need to write any SQL statements.
That's my two cents. :-)
Portability between different db vendors.
My, honest - i am an ORM newbie - question : what is the big advance of ORM over writing a decent DAL by hand?
Not all programmers are willing or even capable of writing "a decent DAL". Those who can't or get scared from the mere thought of it, find LINQ or any other ORM a blessing.
I personally use LINQ to manipulate collections in the code because of its expressiveness. It offers a very compact and transparent way to perform some common tasks on collections directly in code.
LINQ will stop being useful to you when you will want to create very specific and optimized queries by hand. Then you are likely to get a mixture of LINQ queries intermingled with custom stored procedures wired into it. Because of this considerations, I decided against LINQ to SQL in my current project (since I have a decent (imho) DAL layer). But I'm sure LINW will do just fine for simple sites like maybe your blog (or SO for that matter).
With LINQ/ORM there may also be a consideration of lagging for high traffic sites (since each incoming query will have to be compiled all over again). Though I have to admit I do not see any performance issues on SO.
You can also consider waiting for the Entity Framework v2. It should be more powerful than LINQ (and hopefully not that bad as v1 (according to some people)).
Transparent persistence - changes get saved (and cascaded) without you having to call Save(). At first glance this seems like a nightmare, but once you get used to working with it rather than against it, your domain code can be freed of persistence concerns almost completely. I don't know of any ORM other than Hibernate / NHibernate that does this, though there might be some...
The best way to answer the question is to understand exactly what libraries like Hibernate are actually accomplishing on your behalf. Most of the time abstractions exist for a reason, often to make certain problems less complex, or in the case Hibernate is almost a DSL for expression certain persistance concepts in a simple terse manner.
One can easily change the fetch strategy for collections by changing an annotation rather than writing up lots of code.
Hibernate and Linq are proven and tested by many, there is little chance you can achieve this quality without lots of work.
Hibernate addresses many features that would take you months and years to code.
Also, while the JPA documentation says that composite keys are supported, it can get very (very) tricky quickly. You can easily spend hours (days?) trying to get something quite simple working. If JPA really makes things simpler then developers should be freed from thinking too much about these details. It doesn't, and we are left with having to understand two levels of abstraction, the ORM (JPA) and JDBC. For my current project I'm using a very simple implementation that uses a package protected static get "constructor" that takes a ResultSet and returns an Object. This is about 4 lines of code per class, plus one line of code for each field. It's simple, high-performance, and quite effective, and I retain total control. If I need to access objects differently I can add another method that reads different fields (leaving the others null, for example). I don't require a spec that tells me "how ORMs must (!) be done". If I require caching for that class, I can implement it precisely as required.
I have used Linq, I found it very useful. I saves a lot of your time writing data access code. But for large applications you need more than DAL, for them you can easily extent classes created by it. Believe me, it really improves your productivity.