BlogEngine xml format practicality - blogengine.net

How practical is using xml format storage of BlogEngine's posts, without adversely affecting its performance ?
For very small number of posts, definitely its a good choice.

XML storage is good for websites that have less than 100 [Posts(and/or)Pages].
If you have more than 100 then the site is going to slow down the server.
Also if the website is a "members" based site under 300 members for xml.
http://blogengine.codeplex.com/discussions/257560
http://blogengine.codeplex.com/discussions/253252

Related

Caching data with frequently changed fields, what are the best solutions?

I'm currently implementing a news feed feature in our application, where an user should be able to query a number of posts that were pre-generated for them and were cached in redis.
The problem is, each post contains a lot of fields that are frequently updated (number of likes, comments, etc...) and if I run these write operations to redis itself, I'm afraid it would affects the read performance, since there are very large number of users currently using our application.
Do you recommend any solution for this?
Even a few seconds can help greatly. I set most API objects to at least 3-5 seconds.
Here are some best practices recommended from AWS:
https://aws.amazon.com/caching/best-practices/

Should sitemap requests load quickly?

Given the fact that the only clients accessing a sites' sitemaps are search engine spiders, does it make any sense to make these sitemap requests load quickly?
If you are worried over such things, you can always choose to compress them (.gz) - that minimizes load a lot since compressed XML sitemap files are much smaller.

Google crawling indexing algorithms

I am looking for some documents on how Google crawl and index content. I read many "light" papers and articles on what you need to do to improve your ranking and make sure your content is properly indexed but I am looking for some more advanced technical documents on how Google crawl and index content.
The things I would like to know more about:
What elements Google look for when it crawls: page content, URLs format, keywords, description etc...
How the index is updated?
Basically, I am trying to understand why some pages are indexed but not others even if the formats are similar. Why only 10% of my site's pages appear when I do a search on the entire domain even if I can see on my server logs that Google crawled every single link.
The answers to both things are closely-guarded trade secrets, ostensibly to prevent gaming the system.
Also keep in mind that Google makes over 400 algorithmic changes per year, making it close to impossible for an outsider to be accurate and up-to-date. Short of working for Google, you're likely not going to find an in-depth and accurate answer.
However, Matt Cutts, head of the web spam team, frequently provides the most accurate insights in how Google handles content, both on his blog and on the GoogleWebmasterHelp YouTube channel. It's worth going through his content to get a much better understanding of Google's methodology.
In order to provide a technical approach of how a webcrawler works I will suggest you to take a deep look into nutch.apache.org solution.
A typical webcrawler displays the following areas, a fetcher, a parser, and indexer and a searcher. To put it briefly a webcrawler fetch all urls available on a website and creates segments where its store up to 101kb per page. Those pages are parsed but typical words such as and-or-the are not stored but other words are analyzed using bayesian calculations in order to make a rank.
Search engine indexing collects, parses, and stores data to facilitate fast and accurate information retrieval. These tasks are mainly performed by storing a list of occurrences of each search critera, typically in the form of a hash table or binary tree using an inverted index.
As Mark stated Google´s calculations are mainly trade secrets but Patents issued by google could be a good start. Pagerank http://en.wikipedia.org/wiki/PageRank analyses backlinks mainly and the importance that websites pointing to your site have on people´s preferences. In my experience its important to offer an xml sitemap stating all your webpages at your site. On that sitemap you could define the crawl frequency for each page. gsitecrawler.com/ is an interesting possibility.
Google Website Optimizer will give you the chance to see what is google finding on your site, logs are ok but probably the robot finds problem and the best way to know that is with google´s website optimizer in order to display errors.
Finally most of your concerns are things that SEO´s specialist live for, I suggest you to check sites like seomoz.com and their tools... You will learn how to position your website better on organic results on search engines.
hope it helps!, sebastian.
"Yes" Google like fresh & unique content.
Use Google webmaster guideline "try this instead" H1 or H2 meta tag on your HTML programming under the head tag ....your keyword. Anchor have to must use your business related keywords in H1, H2, it can help your site search engine.
Also use for Rich snippets in this tag..!
It scans you web page very precisely and sensitively. Factors like you have javascript embedded or in different file matter, whether you are using frames in designing or using heavy graphics can reduce the ranking of your page. Keywords are obviously rank affecting entities. Broken links also bring your website ranking down.
Basically you can refer to http://www.tutorialspoint.com/seo/ to go through all the important points of google's crawler. This will take a maximum of 40 mins.
MapReduce: Simplified Data Processing on Large Clusters
I analysed the latest algorithm and found that now
Google gives more importance to CONTENT rather than LINKS.
So if your content is good enough with properly available tags, Google will automatically generate index for you. I would suggest H1 - H6 all to be used in good manner.

RETS data fetching problem

I am working on one real estate website which is Using RETS service to get the data to my local server.
but I have one little bit problem here,I can fetch data from RETS which is having about 3lacks record in RETS Database but I didn't find the way,How can I fetch that all records in bunch of 50k at a time ?
I didn't find any 'LIMIT' keyword on RETS.so how can I fetch without 'LIMIT' 50k records at a time?
Please help me.
RETS is not really much of a standard. It's more closely resembles a pseudo standard. It loosely defines an XML schema that describes real estate listings.
In version 1.x, the "standard" was composed of DTD documents. In 2.x, the "standard" uses XSD documents to describe the list.
http://www.rets.org/documentation
However, in practice, there is almost no consistency amongst implementers. Having connected to hundreds of "RETS Compliant" service providers, I'm convinced that not one of them is like any other one.
Furthermore, the 2.x "standard" has not changed in 3 years. It's an unmaintained, sloppy attempt at a standard. It (RETS) is often used as a business buzz word by non-technical people. In reality, it's just an arbitrary attempt at modeling real estate listing in XML.
Try asking the specific implementer for their documentation. Often, they don't have any. So, emailing the lead developer has frequently been helpful. Sometimes they'll provide a WSDL which will outline the supported calls. Often, the WSDL doesn't coincide with the actual service, so beware.
As for your specific question, try caching the results. Usually, the use of a limit on a RETS call is a sign of a direct dependency. As requests for your service increase, the load that your service puts on theirs will break (and not be appreciated). Also, if their service goes down (even temporarily), yours will be interrupted as well. Most importantly, it will make the live requests to your pages really, really slow (especially if their system is slow at the time). The listings usually don't change frequently enough for worries about stale data, so caching up to and hour is pretty acceptable.
Best of luck!
libRets provides support for generating a query with fetch limits:
http://www.crt.realtors.org/projects/rets/librets/documentation/api/classlibrets_1_1_search_request.html
But last I knew: I remember the company Intereality either ignored or outright didn't provide complete compatibility to RETS. Quickest way to know your dealing with them is that also thought making all "System" name's for table fields numeric.
If you're lucky, you're using a Rapattoni backed server and they do provide spec. compatible servers.
Last point, I can't for the life of me remember it's name, but I used to use a free Java based RETS tool to build valid queries ( included offset/limit clauses ) and that made it a tad easier to build automated fetchers for a client's batch processing system.
IN RETS if Count More Than limit then We can download using Batch form or we can remove that Limit using regex while downloading
Best way to solve Problem divide Data Count in small unit of download and while we have to consider download limit in mind Field for Divide that one in MLS/IDX I Suggest Modification Date and ListingDate

Recommended file sizes for web

I am doing a web performance optimization using YSlow and Page Speed recomendations but when it comes to file sizes neither says anything about it.
Does anybody knows where I can get some recomendation of the max size of some file types??
For example images shouldn't be greater than XXX KB or something like that.
But i need reliable sources not just suggestions. I mean XXX did some research and they recommend this or that.
I never stumble on any recommendations rather than common sense. A lot of small size files create a chatter, big files slow you down and can time-out. It really is what performance tuning is for - run your tests for your particular setup, there's no magic bullet
When I first did web development (1998) Australia our recommended limit for the entire page size (including images) was 40 kB (who was that laughing?).
The scary thing is that for accessability, we still have to worry about dial-up users and so try to keep to this if we can (exploiting client side caching where possible).

Resources