Joomla 3.1: extremely slow in bulk deletion - joomla

I have a simple script to delete the articles in a category, but it takes forever to get through about 10k articles. Here's my code:
// include libraries
$db = JFactory::getDbo();

// select the ids of all articles in the category
$query = $db->getQuery(true);
$query->select(array('id'));
$query->from('#__content');
$query->where('catid = 14');
$db->setQuery($query);
$results = $db->loadObjectList();

// delete each article through JTable
$table = JTable::getInstance('Content', 'JTable', array());
foreach ($results as $article)
{
    $table->delete($article->id);
}
With this code it takes about 12 hours to delete roughly 3k articles. Am I doing something wrong, or is this just how Joomla works?
I could simply run a MySQL script to delete the rows in #__content, but that way many related tables would not be processed, for example #__assets, the #__ucm* tables, #__contentitem_tag_map, etc.

12 hours is an extremely long time to delete 3k articles; I would expect it to take a minute or two.
Try the following to see if it makes any difference.
NOTE: Please make sure you have a backup of the database before trying this code!
$db = JFactory::getDbo();
$query = $db->getQuery(true);

// delete every article in the category with a single query
$conditions = array('catid = 14');
$query->delete($db->quoteName('#__content'));
$query->where($conditions);

$db->setQuery($query);
$db->execute();
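If you go this route and also want to tidy up the matching #__assets rows the question mentions, here is a rough sketch. It assumes you first collected the article ids as in the question's select query (so $results is available), and note that deleting asset rows directly leaves gaps in the nested-set values of #__assets:
$names = array();
foreach ($results as $article)
{
    // asset names for articles follow the pattern com_content.article.<id>
    $names[] = $db->quote('com_content.article.' . (int) $article->id);
}
$assetQuery = $db->getQuery(true);
$assetQuery->delete($db->quoteName('#__assets'));
$assetQuery->where('name IN (' . implode(',', $names) . ')');
$db->setQuery($assetQuery);
$db->execute();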
Hope this helps

You could delete 3000 articles by hand by setting the list length to 100, checking all, and clicking the delete button 30 times. So clearly the way you are doing it is not what should be happening, although the theory is basically right.
Can you please make sure that no asset rows other than root.1 have parent_id == 0?
Also make sure that all of the articles have properly set up asset ids, i.e. that they are always parented to the asset for category 14.
You should be able to use the model to delete an array of ids (see the sketch below). But even looping 3000 times should not take 12 hours unless you have some infinite loops that keep starting over and triggering the failsafe for runaway loops.
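For the model route, here is a minimal sketch of deleting an array of ids through the com_content article model. It assumes an administrator context on a standard Joomla 3.x install, and note that the model only deletes articles that are already trashed (state == -2):
JModelLegacy::addIncludePath(JPATH_ADMINISTRATOR . '/components/com_content/models');
JTable::addIncludePath(JPATH_ADMINISTRATOR . '/components/com_content/tables');
$model = JModelLegacy::getInstance('Article', 'ContentModel', array('ignore_request' => true));

$ids = array();
foreach ($results as $article)
{
    $ids[] = (int) $article->id;
}

// JModelAdmin::delete() accepts an array of primary keys and cleans up
// the related asset and tag-map rows as it goes
$model->delete($ids);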

Related

Programmatically modify related products in magento

I'm trying to programmatically manipulate the product relations in a Magento store.
From what I've read, setRelatedLinkData should be the way to go.
As a simple test, I'm just trying to replace a product's related products with nothing (i.e. an empty array), however it's not working - the product in question still shows the related product in the backend.
The test code I'm working with is:
$product = Mage::getModel('catalog/product')->load($product->getId());
$linkData = array();
print_r($linkData);
$product->setRelatedLinkData($linkData);
echo "Save\n";
$r = $product->save();
As mentioned above however the product still has a related product when I reload it in the backend.
NOTE: I don't only want to remove related products; eventually I want to be able to add new ones as well, so a DELETE FROM ... SQL query isn't what I'm looking for. However, if I can't get it to work for removing products, then it's certainly not going to work for adding them, so one step at a time :-)
The quickest way I can think of is to use the Link Resource:
app/code/core/Mage/Catalog/Model/Resource/Product/Link.php saveProductLinks
// sample code
$product = Mage::getModel('catalog/product')->load(147);
$linkData = array();
Mage::getResourceModel('catalog/product_link')->saveProductLinks(
    $product, $linkData, Mage_Catalog_Model_Product_Link::LINK_TYPE_RELATED
);
and if you want to assign products use the same code but provide this as $linkData:
$linkData = array(
    '145' => array('position' => 1),
    '146' => array('position' => 2),
);

Which way is better to load models in magento?

I don't know if I'm asking this right. I need to load a product, change some values, and save it. My question is which way is the appropriate one to do it. Currently I'm using this:
$id = Mage::getModel('catalog/product')->getIdBySku($sku);
$product = Mage::getModel('catalog/product')->load($id);
1) In general it works fine even with 40K products, but I've read that this approach can lead to a memory leak. I've also read that disabling the reindex functionality can improve the processing time.
2) Would it be better to load a collection and then apply some filters, e.g. addFieldToFilter('sku', $the_product_i_want)?
When I say better I mean: 1) the Magento way, 2) time efficient, 3) not doing something I don't need to do.
Each time you call Mage::getModel('catalog/product'), Magento creates a new model object in memory, and that leads to wasted memory. You can do the following instead:
$model = Mage::getModel('catalog/product');
$id = $model->getIdBySku($sku);
$product = $model->load($id);
Alternatively, you can filter a collection by SKU and take the first match:
$collection = Mage::getModel('catalog/product')->getCollection();
$collection->addFieldToFilter('sku', $the_product_i_want);
$product = $collection->getFirstItem(); // addFieldToFilter() returns the collection, not a product
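Note that getFirstItem() returns an empty product object when nothing matches the filter, so it is worth checking the id before using the result:
if (!$product->getId()) {
    // no product with that SKU exists
}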

Unexpected behavior in Magento collection filters used in loops

I'm seeing unexpected behavior in collections that maybe someone can enlighten me on (or maybe it's a bit of PHP I don't grok). Sorry for the length of this post, but I had to include sample code and results.
My goal was to write a report where I get an order and its order items, then go to the invoice and shipment data for the order and get the matching order item data from their items. I know that each order has exactly one invoice and one shipment, so even though Magento uses a 1-M relationship between orders and invoices/shipments, I can treat it as if it were 1-1.
I know that the items are all related through the order_item_id fields, so I tried to write a function that used the following call -
$invoiceItem = $order
    ->getInvoiceCollection()
    ->getFirstItem()
    ->getItemsCollection()
    ->addFieldToFilter('order_item_id', $orderItem->getItemId())
    ->getFirstItem();
But that didn't produce the results I expected; what I saw was the same invoice item returned regardless of the order item id used in the filter.
So, to try to understand the problem, I wrote the following small program to see how the queries were created.
<?php
require dirname(__FILE__).'/../app/Mage.php';
umask(0);
Mage::app('default');
$orders = Mage::getResourceModel('sales/order_collection')
    ->addAttributeToSelect('*')
    ->addAttributeToFilter('state', 'processing')
    ->addAttributeToSort('created_at', 'desc')
    ->addAttributeToSort('status', 'asc')
    ->load();

foreach ($orders as $order) {
    echo "\ngetting data for order id ". $order->getId();
    $items = $order->getAllItems();
    $invoice = $order->getInvoiceCollection()->getFirstItem();

    foreach ($items as $orderItem) {
        echo "\n\ngetting data for order item id ". $orderItem->getItemId();
        $invoiceItems = $order
            ->getInvoiceCollection()
            ->getFirstItem()
            ->getItemsCollection()
            ->addFieldToFilter('order_item_id', $orderItem->getItemId());
        echo "\n".$invoiceItems->getSelect();
    }

    die; // just quit after one iteration
}
The output from this program was the following -
getting data for order id 7692
getting data for order item id 20870
SELECT `main_table`.* FROM `sales_flat_invoice_item` AS `main_table` WHERE (parent_id = '7623') AND (order_item_id = '20870')
getting data for order item id 20871
SELECT `main_table`.* FROM `sales_flat_invoice_item` AS `main_table` WHERE (parent_id = '7623') AND (order_item_id = '20870') AND (order_item_id = '20871')
getting data for order item id 20872
SELECT `main_table`.* FROM `sales_flat_invoice_item` AS `main_table` WHERE (parent_id = '7623') AND (order_item_id = '20870') AND (order_item_id = '20871') AND (order_item_id = '20872')
As you can see, every time through the loop another "AND (order_item_id =" condition was added for each item id I was filtering on. I thought that every time through the loop I'd be getting a fresh version of the collection from $order->getInvoiceCollection().
So, can anyone tell me what's going wrong in my sample code and educate me on the correct way to do this?
Thanks!
Regarding your business question: more info is needed. Is the goal to generate a collection of order item objects, each aware of its invoice and shipment details? It seems like rendering concerns are getting pushed into modeling concerns.
Regarding the select statement question: Varien collections have an optimization which prevents them from accessing the storage backend more than once. In DB collection instances this standard behavior is achieved by setting the _isCollectionLoaded property to true.
In your case, the invoice collection instance created via the order is stored in a protected property and immediately load()ed via IteratorAggregate (invoked by the foreach). Because you are using the same order object instance in each iteration, you keep working with that already-loaded invoice collection and are effectively calling addFieldToFilter(/* next order item id */) on it each time, resulting in the ever-expanding WHERE clause. This specific optimization can be worked around by calling $order->reset(). That brings us back to the salient issue, though, which is the need to better understand the goal and (likely) use a custom collection or to manipulate a collection to join in the specific data that you need.
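Another way to avoid the accumulating filters (assuming, as the question states, one invoice per order) is to build a fresh invoice item collection from the resource model on each iteration instead of reusing the instance cached on the order; a rough sketch:
$invoice = $order->getInvoiceCollection()->getFirstItem();
foreach ($items as $orderItem) {
    // a new collection instance per iteration, so filters do not accumulate
    $invoiceItem = Mage::getResourceModel('sales/order_invoice_item_collection')
        ->addFieldToFilter('parent_id', $invoice->getId())
        ->addFieldToFilter('order_item_id', $orderItem->getItemId())
        ->getFirstItem();
}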

Magento: Updating Product Catalogs faster

I have written quite a few scripts to update my product catalog based on some parameter or other. In each of them the base logic is something similar to this...
// Get collection
$collection = Mage::getModel('catalog/product')->getCollection();
$collection->addAttributeToSelect('sku');
$collection->addAttributeToSelect('publisher');
$collection->addFieldToFilter(array(array('attribute' => 'publisher', 'eq' => $publisher)));

// For each product in the collection, do an individual save
foreach ($collection as $product) {
    $product->setSKU($newValue);
    $product->save();
}
Though this works, each save issues its own SQL update query, and with a very large catalog that is fairly slow.
I was wondering if this could be sped up by doing a single save on the collection instead of on each product.
There are several things you can do to write a much faster update script. I don't know how you are getting some of your variables, so you'll need to modify the code to get it working in your case, but the version below should be much faster than your current approach, e.g.:
// Set indexing to manual before starting updates, otherwise it'll continually get slower as you update
$processes = Mage::getSingleton('index/indexer')->getProcessesCollection();
$processes->walk('setMode', array(Mage_Index_Model_Process::MODE_MANUAL));
$processes->walk('save');

// Get collection
$collection = Mage::getModel('catalog/product')->getCollection()
    ->addAttributeToSelect('sku')
    ->addAttributeToSelect('publisher')
    ->addFieldToFilter(array(array('attribute' => 'publisher', 'eq' => $publisher)));

function productUpdateCallback($args)
{
    $product = Mage::getModel('catalog/product');
    $product->setData($args['row']);
    $productId = $product->getId();
    $sku = 'yourSku';

    // Updates a single attribute, much faster than calling a full product save
    Mage::getSingleton('catalog/product_action')
        ->updateAttributes(array($productId), array('sku' => $sku), 0);
}

// Walk through the collection; for large collections this is much faster than using foreach
Mage::getSingleton('core/resource_iterator')->walk($collection->getSelect(), array('productUpdateCallback'));

// Reindex all
$processes->walk('reindexAll');

// Set indexing back to realtime; if you normally keep it on manual you can comment this line out
$processes->walk('setMode', array(Mage_Index_Model_Process::MODE_REAL_TIME));
$processes->walk('save');
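For what it's worth, updateAttributes() is fast because it writes the new values straight to the attribute backend tables and skips the full save pipeline (events, URL rewrites, stock handling and so on). It also accepts an array of product ids, so updates that share one value can be batched into a single call; a hypothetical example, assuming you wanted to set the same publisher on a whole set of products:
Mage::getSingleton('catalog/product_action')
    ->updateAttributes($productIds, array('publisher' => 'New Publisher'), 0);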

Preventing Doctrine's query cache in Symfony

In my Symfony/Doctrine app, I have a query that orders by RANDOM(). I call this same method several times, but it looks like the query's result is being cached.
Here's my relevant code:
$query = $table->createQuery('p')
    ->select('p.*, RANDOM() as rnd')
    ->orderBy('rnd')
    ->limit(1)
    ->useQueryCache(null)
    ->useResultCache(null);
$result = $query->fetchOne();
Unfortunately, the same record is returned every time, regardless of me passing null to both useQueryCache and useResultCache. I tried using false instead of null, but that didn't work either. Lastly, I also tried calling both setResultCacheLifeSpan(0) and setResultCacheLifeSpan(-1), but neither call made a difference.
Any insight on how to prevent caching since I want a different random row to be selected each time I call this method?
Edit: I also tried calling clearResultCache(), but that just ended up causing an error stating: "Result Cache driver not initialized".
Edit 2: As requested, here's the SQL generated by calling $query->getSqlQuery():
SELECT c.id AS c__id, c.name AS c__name, c.image_url AS c__image_url,
c.level AS c__level, c.created_at AS c__created_at, c.updated_at
AS c__updated_at, RANDOM() AS c__0 FROM cards c ORDER BY c__0 LIMIT 1
It turns out I'm a moron. I tried to simplify my query for this question, and in doing so, I didn't capture the true cause. I had a where() and andWhere() call, and the combination of conditions resulted in only one possible record being matched. Thanks for taking the time to respond, everyone, sorry to have wasted your time!
Doctrine also caches entities you created in the same request/script run.
For instance:
$order = new Order();
$order->save();

sleep(10); // edit this record in the DB in another process

$q = Doctrine_Query::create()
    ->select('o.*')
    ->from('Order o')
    ->where('o.id = ?', $order->id);

$result = $q->execute();
$order = $result->getFirst();
print_r($order->toArray());
The print_r will not contain the changes you made during the sleep.
The following code will clear that kind of in-memory cache:
$manager = Doctrine_Manager::getInstance();
$connection = $manager->getCurrentConnection();
$tables = $connection->getTables();
foreach ($tables as $table) {
    $table->clear();
}
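If only a single record needs to be brought back in sync, refreshing it is lighter than clearing every table's identity map (Doctrine 1.x):
$order->refresh(); // re-queries the database and overwrites the record's data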
PS: I added this answer because I found this topic while trying to resolve the above issue.
