Magento Indexing of Catalog Products & its Attributes - magento

I have some questions about the Magento indexing process:
First of all, why doesn't Magento reindex programmatically after a product or any of its attributes is added or modified? Why are admins required to run the indexing themselves, when the index processes are so important and could be run programmatically?
When the "Reindex Data" link is clicked, are all products indexed (including those that have already been indexed), or only the products that have not yet been indexed?
If I want to view or debug the time taken by each indexing process, what do I need to do, and how?

Three part answer, so this should be worth triple points, right? :)
1) Probably because indexing tends to be a computationally intensive task, so rather than slow the site whenever a product is saved (usually during business hours), an Administrator can choose a low-load period, or schedule via cron for the indexing to occur at a typically low-load time.
2) All products are re-indexed. If you look in Mage_CatalogIndex_Model_Indexer::plainReindex(), you will see that it performs a clear to delete all index data before getting all active products to index.
    $this->_getResource()->clear(
        $attributeCodes,
        $priceAttributeCodes,
        count($priceAttributeCodes)>0,
        count($priceAttributeCodes)>0,
        count($priceAttributeCodes)>0,
        $products,
        $stores
    );
    <snip/>
    $collection = $this->_getProductCollection($store, $products);
    $collection->addAttributeToFilter(
        'status',
        array('in'=>Mage::getModel('catalog/product_status')->getSaleableStatusIds())
    );
    $this->_walkCollection($collection, $store, $attributeCodes);
where the _walkCollection method creates the index for each product.
3) You could turn on the Profiler; there are some great blog posts on how to use it. You could wrap the key code in Mage_CatalogIndex_Model_Indexer with Varien_Profiler::start('Indexer') and a matching Varien_Profiler::stop('Indexer') to check the time taken.
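The start/stop pattern itself is simple enough to sketch outside Magento. The Python below is only an illustration of named timers, not Magento code: SimpleProfiler and fake_reindex are hypothetical names, and the workload is a stand-in for whatever code you want to time.

```python
import time
from collections import defaultdict


class SimpleProfiler:
    """Hypothetical named-timer profiler; mimics the start/stop pattern only."""

    def __init__(self):
        self._started = {}                 # timer name -> start timestamp
        self.totals = defaultdict(float)   # timer name -> accumulated seconds
        self.counts = defaultdict(int)     # timer name -> completed runs

    def start(self, name):
        self._started[name] = time.perf_counter()

    def stop(self, name):
        # Accumulate the time elapsed since the matching start() call.
        elapsed = time.perf_counter() - self._started.pop(name)
        self.totals[name] += elapsed
        self.counts[name] += 1
        return elapsed


def fake_reindex(n):
    # Stand-in workload (assumption): replace with the code being measured.
    return sum(i * i for i in range(n))


profiler = SimpleProfiler()
for _ in range(3):
    profiler.start("Indexer")
    fake_reindex(10_000)
    profiler.stop("Indexer")

print(f"Indexer: {profiler.counts['Indexer']} runs, "
      f"{profiler.totals['Indexer']:.6f}s total")
```

Accumulating totals per timer name is what makes it possible to compare several indexers side by side after one run.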

To expand a bit on Jonathan's answer, it would be silly to run entire indexes every time a product is saved. Magento's indexing process for things like category products involves truncating tables and running the index queries again. If you were to save 20 different products, and the indexes were regenerated for each of those, you'd have wasted major time. Since the system can't guess when you mean to save 20 products in a row, you are instead expected to run your indexes yourself.
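The cost argument can be made concrete with a toy model. This is a minimal sketch, assuming an index that is always rebuilt in full; FlatIndex and its methods are hypothetical names and do not correspond to any Magento class.

```python
class FlatIndex:
    """Hypothetical index that is rebuilt in full on every reindex call."""

    def __init__(self):
        self.source = {}    # product id -> name (the editable data)
        self.rows = {}      # denormalized copy used for reads
        self.rebuilds = 0   # number of full rebuilds performed

    def reindex(self):
        # Full rebuild: discard the old copy and regenerate it.
        self.rows = dict(self.source)
        self.rebuilds += 1

    def save(self, product_id, name, reindex_now):
        self.source[product_id] = name
        if reindex_now:
            self.reindex()


# Strategy A: rebuild after every save.
eager = FlatIndex()
for pid in range(20):
    eager.save(pid, f"product-{pid}", reindex_now=True)

# Strategy B: save everything first, then rebuild once.
deferred = FlatIndex()
for pid in range(20):
    deferred.save(pid, f"product-{pid}", reindex_now=False)
deferred.reindex()

print(eager.rebuilds, deferred.rebuilds)  # prints "20 1"
```

Both strategies end with the same index contents; the only difference is how many full rebuilds were paid for along the way.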

Related

Magento 1.9 very slow after creating a new category

I created a new category in my store. Since then the website takes 15-20 seconds to load, whereas before it took only 5-6 seconds.
Please re-index all the indexes, and if the site is still slow, try enabling flat tables for the catalog.
Also make sure you have not entered anything invalid in the category details, such as special characters in the description.

Sorting elasticsearch types based on child type property

I'm working on an e-commerce search page and need to run free-text searches on products, with multiple facet options and sorting capabilities. The issue I'm facing has to do with product prices:
One product has multiple prices - there are special discounts, B2B customer-specific prices, and specific B2C prices. There could be a few hundred prices per product.
I need to be able to do a full-text search on products, but still be able to sort on one of the selected price groups.
My initial thought would be to put all of the prices into the product item, but that means I'll need to update the product objects in the index every time a price changes - which is often. This will also make the objects quite big.
I see that elasticsearch now has the capability of HasParent/HasChildren queries, but I am not sure if that is the right way to go, or if it is even possible.
Is it possible to keep prices as a separate type outside the product type and use the HasParent/HasChildren queries to sort the products on the price?
> My initial thought would be to put all of the prices into the product item, but that means I'll need to update the product objects in the index every time a price changes - which is often. This will also make the objects quite big.
I would personally be inclined not to store complex pricing data within Elasticsearch, at least not prices calculated by business logic such as discounts and specific B2C prices.
A base price could be stored for querying and sorting, with the pricing logic applied to it through scripting, using script queries and script sorting, respectively.
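As a rough sketch of the script-sorting half of that idea, the Python below builds a search request body that sorts on a price computed at query time. The field names name and base_price, the discount factor, and the helper price_sorted_search are assumptions made for illustration; only the _script sort structure follows Elasticsearch's documented request format.

```python
import json


def price_sorted_search(text, discount_factor):
    """Build a search body that sorts by a price computed at query time."""
    return {
        # "name" is a hypothetical full-text field on the product document.
        "query": {"match": {"name": text}},
        "sort": [
            {
                "_script": {
                    "type": "number",
                    "order": "asc",
                    "script": {
                        "lang": "painless",
                        # "base_price" is a hypothetical numeric field.
                        "source": "doc['base_price'].value * params.factor",
                        "params": {"factor": discount_factor},
                    },
                }
            }
        ],
    }


body = price_sorted_search("running shoes", 0.9)
print(json.dumps(body, indent=2))
```

Passing the factor through params rather than building it into the script source lets Elasticsearch reuse the compiled script across requests with different discounts.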
> I see that elasticsearch now has the capability of HasParent/HasChildren queries, but I am not sure if that is the right way to go, or if it is even possible. Is it possible to keep prices as a separate type outside the product type and use the HasParent/HasChildren queries to sort the products on the price?
Parent/Child relationships operate on documents within a single index, with a join datatype field on a document to indicate the relationship between a parent and a child, and child documents indexed on the same shard as the parent. If children are not evenly distributed across parents/shards e.g. one parent document has a million children and the others have only a few each, it's possible to end up with hot spots within shards that can affect performance. Product and pricing data doesn't feel like a good fit for Parent/Child; pricing sounds like it's too dynamic to be stored within documents.
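For completeness, here is roughly what the Parent/Child approach would involve if you did want to try it. This is a sketch under assumptions, not a recommendation: the field names relation, price_group and amount, the relation names product and price, and the helper function are all hypothetical. It relies on the documented workaround that has_child results cannot use the standard sort options, so the child's value is turned into a score with function_score and the parents are sorted by _score.

```python
import json

# Index mapping with a join field linking product parents to price children.
mapping = {
    "mappings": {
        "properties": {
            "relation": {"type": "join", "relations": {"product": "price"}},
            "name": {"type": "text"},
            "price_group": {"type": "keyword"},
            "amount": {"type": "double"},
        }
    }
}


def products_sorted_by_group_price(price_group):
    """Match products that have a price in the group, cheapest first."""
    return {
        "query": {
            "has_child": {
                "type": "price",
                # The parent's score is taken from its matching children.
                "score_mode": "min",
                "query": {
                    "function_score": {
                        "query": {"term": {"price_group": price_group}},
                        # Use the child's amount as its score.
                        "script_score": {
                            "script": {"source": "doc['amount'].value"}
                        },
                        "boost_mode": "replace",
                    }
                },
            }
        },
        "sort": [{"_score": {"order": "asc"}}],
    }


print(json.dumps(products_sorted_by_group_price("b2b-gold"), indent=2))
```

Child documents must be indexed with routing set to the parent's ID so that they land on the same shard, which is where the hot-spot concern above comes from.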

Which database tables are involved in Magento reindexing?

Which tables are involved in the reindexing process in Magento?
Please share any documentation available on this.
I can't take credit for this, as it is taken from the original post: Can someone explain Magentos Indexing feature in detail?
Magento's indexing is only similar to database-level indexing in spirit. As Anton states, it is a process of denormalization to allow faster operation of a site. Let me try to explain some of the thoughts behind the Magento database structure and why it makes indexing necessary to operate at speed.
In a more "typical" MySQL database, a table for storing catalog products would be structured something like this:
PRODUCT:
    product_id INT
    sku VARCHAR
    name VARCHAR
    size VARCHAR
    longdesc VARCHAR
    shortdesc VARCHAR
    ... etc ...
This is fast for retrieval, but it leaves a fundamental problem for a piece of eCommerce software: what do you do when you want to add more attributes? What if you sell toys, and rather than a size column, you need age_range? Well, you could add another column, but it should be clear that in a large store (think Walmart, for instance), this would result in rows that are 90% empty, and attempting to maintain new attributes would be nigh impossible.
To combat this problem, Magento splits tables into smaller units. I don't want to recreate the entire EAV system in this answer, so please accept this simplified model:
PRODUCT:
    product_id INT
    sku VARCHAR
PRODUCT_ATTRIBUTE_VALUES:
    product_id INT
    attribute_id INT
    value MISC
PRODUCT_ATTRIBUTES:
    attribute_id
    name
Now it's possible to add attributes at will by entering new values into product_attributes and then putting adjoining records into product_attribute_values. This is basically what Magento does (with a little more respect for datatypes than I've displayed here). In fact, now there's no reason for two products to have identical fields at all, so we can create entire product types with different sets of attributes!
However, this flexibility comes at a cost. If I want to find the color of a shirt in my system (a trivial example), I need to find:
The product_id of the item (in the product table)
The attribute_id for color (in the attribute table)
Finally, the actual value (in the attribute_values table)
Magento used to work like this, but it was dead slow. So, to allow better performance, they made a compromise: once the shop owner has defined the attributes they want, go ahead and generate the big table from the beginning. When something changes, nuke it from space and generate it over again. That way, data is stored primarily in our nice flexible format, but queried from a single table.
These resulting lookup tables are the Magento "indexes". When you re-index, you are blowing up the old table and generating it again.
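As a rough illustration of the idea, the simplified model above can be reproduced with SQLite. This is a sketch under assumptions, not Magento's actual schema or indexer: the table names follow the simplified model, product_flat and reindex are hypothetical names, and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, sku TEXT);
    CREATE TABLE product_attributes (attribute_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product_attribute_values (
        product_id INTEGER, attribute_id INTEGER, value TEXT);
    INSERT INTO product VALUES (1, 'shirt-001'), (2, 'toy-001');
    INSERT INTO product_attributes VALUES (1, 'color'), (2, 'age_range');
    INSERT INTO product_attribute_values VALUES (1, 1, 'blue'), (2, 2, '3-5');
""")


def reindex(conn):
    """Drop the flat lookup table and regenerate it from the EAV tables."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM product_attributes ORDER BY attribute_id")]
    # One column per attribute. The names come from our own table here;
    # real code would need to validate them before building SQL.
    columns = ", ".join(
        f"MAX(CASE WHEN a.name = '{n}' THEN v.value END) AS {n}"
        for n in names)
    conn.execute("DROP TABLE IF EXISTS product_flat")
    conn.execute(f"""
        CREATE TABLE product_flat AS
        SELECT p.product_id, p.sku, {columns}
        FROM product p
        LEFT JOIN product_attribute_values v ON v.product_id = p.product_id
        LEFT JOIN product_attributes a ON a.attribute_id = v.attribute_id
        GROUP BY p.product_id, p.sku
    """)


reindex(conn)
# A single-table lookup replaces the three-step EAV lookup.
color = conn.execute(
    "SELECT color FROM product_flat WHERE sku = 'shirt-001'").fetchone()[0]
print(color)  # prints "blue"
```

Adding a new attribute only requires new rows in the EAV tables followed by another reindex call, which rebuilds product_flat with the extra column.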

Avoid query each time the page is loaded

I work on an educational website in which we show dynamic filters. What does this mean? We now have several course categories that will increase in number in the following weeks.
Right now, categories are a string field in each course. I'm planning to model this properly and create a Categories table. Once that is done, it would be pretty easy to load the filters based on the categories we have in the database. However, I see a problem: the query will be made each time the website is loaded.
I thought of caching these categories, but I'm guessing that is not the best solution.
Any ideas on how I can avoid running these queries each time while still getting this information from the database?

Magento catalog URL rewrite indexing taking too long

I've been dealing with this problem with around 10k+ products in two store views in Magento 1.7.
The URL indexing process took around 30 hours to change its state to ready. I also found multiple entries for the same product in the core_url_rewrite table, and the number of rows has now reached 6500k.
This is causing deadlocks. I tried clearing the locks, but that didn't help. Is there a workaround for this problem, given that this is Magento core functionality?
There's some good general advice on the Magento Stack Exchange site covering common indexing problems.
It's also common for larger stores to create a rewrite/code-pool override for the following method
#File: app/code/core/Mage/Catalog/Model/Resource/Url.php
protected function _getProducts($productIds, $storeId, $entityId, &$lastEntityId)
{
    //...
}
This method queries for the products that need a URL reindex. By default, this includes all simple and configurable products. However, if you're not displaying simple products individually, you can tweak this query to not include those products. That can greatly reduce the number of URLs Magento needs to generate.

Resources