Algolia Magento module: ordered_qty as ranking attribute blocking cron executions

We moved Algolia search from our local development environment to our staging environment. On staging we have 144,000 sample orders and 100,000 products. Both of these numbers are smaller than our production environment.
We entered our app ID and other credentials and saved. We're using AOE Scheduler to run our cron jobs. algoliasearch_run_queue has been running for 5 hours now, and it seems to be issuing the same query over and over:
SELECT SUM(order_items.qty_ordered) AS ordered_qty, AS order_items_name, `o....
I believe this is related to ranking = ordered_qty. This cron job is holding up all subsequent cron jobs, meaning other Magento tasks (order emails, indexing, etc.) will not run while this one is running.
What is the fix for this?

An improvement was made in 1.4.3, but it will probably not resolve the issue for such a big store. Computing ordered_qty can indeed take a long time, but it is used to provide good relevance.


What is the wait time before BigQuery executes a query?

Every time I execute a query in Google BigQuery, the Explanation tab shows that it involves an average waiting time. Is it possible to know the wait time as a percentage or in seconds?
Since BigQuery is a managed service, a lot of customers around the globe are using it. It has an internal scheduling system based on the billingTier and other internals of your project. A query is scheduled for execution based on cluster availability, so there will always be a minimum time until it finds a cluster of machines to execute your job.
I have never seen significant wait times. If you do have this issue, contact Google support to have them look at your project. If you edit your original question and add a job ID, a Google engineer may check whether there is an issue or not.
It's currently not exposed in the UI.
But you can find a similar concept in the API: search for "wait" in the API reference.
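As a rough sketch of reading those wait figures from a job, assuming you already have the job resource as a dict (for example from a jobs.get REST call). The field names creationTime, startTime, queryPlan and waitMsAvg are how I recall them from the Jobs resource, so check them against the reference; the sample job below is made up.

```python
def queue_wait_seconds(job):
    """Seconds between job creation and the moment execution started."""
    stats = job["statistics"]
    # The REST API reports these as millisecond timestamps in strings.
    return (int(stats["startTime"]) - int(stats["creationTime"])) / 1000.0


def stage_waits(job):
    """Return (stage name, average wait in ms) for each query plan stage."""
    plan = job["statistics"].get("query", {}).get("queryPlan", [])
    return [(stage["name"], int(stage.get("waitMsAvg", 0))) for stage in plan]


# Hypothetical job resource, trimmed to the fields used above.
sample_job = {
    "statistics": {
        "creationTime": "1500000000000",
        "startTime": "1500000000750",
        "endTime": "1500000003000",
        "query": {
            "queryPlan": [
                {"name": "Stage 1", "waitMsAvg": "12"},
                {"name": "Stage 2", "waitMsAvg": "3"},
            ]
        },
    }
}

print(queue_wait_seconds(sample_job))
print(stage_waits(sample_job))
```

The first number is the time the job sat waiting to be scheduled; the per-stage values are waits inside the query plan, which is closer to what the Explanation tab shows.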
Is it possible to reduce the BigQuery execution wait time to the minimum?
Purchase more BigQuery Slots.
Contact your sales representative or support for more information.

Magento Reindexing Data - Risks

I have a Magento site in which the cross-selling products do not seem to be appearing.
After looking on Stack and Google it seems that 'reindexing the data' has solved this issue for a lot of individuals.
My question is, are there any risks associated with performing this task? Or is it a relatively straightforward procedure?
Indexing is a fundamental part of Magento and will not affect your site in a negative way.
Magento uses a complex EAV (entity-attribute-value) database structure that can sometimes require heavy database queries to retrieve simple results. Because of this, the Magento developers have implemented index tables: the indexer queries all of this data and stores it in a single table structure. This allows Magento to quickly query the single index table, rather than making complex joins across multiple tables.
With that being said, reindexing does not alter your existing data. It simply queries your existing data and copies it to its own tables.
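To make the idea concrete, here is a toy sketch in Python with SQLite. The table and column names (entity, attribute, entity_value, product_flat) are invented for illustration and are much simpler than Magento's real schema. It shows a "reindex" that reads the EAV rows through joins and copies the result into one flat table, leaving the source rows untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (entity_id INTEGER PRIMARY KEY, sku TEXT);
    CREATE TABLE attribute (attribute_id INTEGER PRIMARY KEY, code TEXT);
    CREATE TABLE entity_value (entity_id INTEGER, attribute_id INTEGER, value TEXT);
    INSERT INTO entity VALUES (1, 'shirt-01'), (2, 'mug-07');
    INSERT INTO attribute VALUES (1, 'name'), (2, 'price');
    INSERT INTO entity_value VALUES
        (1, 1, 'Blue Shirt'), (1, 2, '19.99'),
        (2, 1, 'Coffee Mug'), (2, 2, '7.50');
""")

# One join per attribute: this is the expensive shape of an EAV read.
EAV_QUERY = """
    SELECT e.entity_id, e.sku, n.value AS name, p.value AS price
    FROM entity e
    JOIN entity_value n ON n.entity_id = e.entity_id
     AND n.attribute_id = (SELECT attribute_id FROM attribute WHERE code = 'name')
    JOIN entity_value p ON p.entity_id = e.entity_id
     AND p.attribute_id = (SELECT attribute_id FROM attribute WHERE code = 'price')
    ORDER BY e.entity_id
"""


def reindex(conn):
    """Rebuild the flat table from the EAV tables; source data is only read."""
    conn.execute("DROP TABLE IF EXISTS product_flat")
    conn.execute("CREATE TABLE product_flat AS " + EAV_QUERY)
    conn.commit()


reindex(conn)
flat_rows = conn.execute(
    "SELECT sku, name, price FROM product_flat ORDER BY entity_id"
).fetchall()
source_count = conn.execute("SELECT COUNT(*) FROM entity_value").fetchone()[0]
print(flat_rows, source_count)
```

After the rebuild, reads hit product_flat with no joins, and the EAV rows are exactly as they were before.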
To reindex your site, you can simply go to System > Index Management, check all the indexes that you wish to reindex, then submit.
If you have a large set of products, I recommend reindexing your site from the shell command line.
Log in to your site using an SSH program (such as PuTTY)
Once logged in, cd to your magento/shell/ directory (where magento is your Magento root directory)
Run the following command to reindex your site: php indexer.php reindexall
Wait for the index processes to complete.
Lastly, ensure that your Catalog is using the Flat index tables. To do this:
Go to System > Configuration > Catalog > Frontend (section)
Set Use Flat Catalog Category to Yes
Set Use Flat Catalog Product to Yes
Click Save Config
No, you're safe to reindex whenever you see the notice appear.
If you know you're going to make a lot of changes, you can wait until you're done, saving yourself some time by only running it once at the end.
The only exception where this is not safe is if you have tens of thousands of products and/or lots of store views. It may end up running for hours and hours, slowing down your site leading to an undesirable experience for the customer.
I have found on sites with a large number of products, running the price reindex can cause a database lock, which can cause certain actions to be unavailable and for orders to be duplicated during that time. It also can affect performance and eat resources. I recommend performing this late at night only if possible.

Slow Indexing in Magento EE 1.12

We have 4000 products with 4 stores and around 80 categories. We are running a dedicated DB server with SSDs and have followed the white paper on optimizing the DB as well as the app server. Each product also has 12 custom options. Indexing is now very slow. Any suggestions? Would it make any difference to reduce the number of categories in the store, etc.?
Indexing is dramatically slowed when multi store setups are used. On one of our servers 3000 products on a single store setup takes approx 60 seconds to index everything. But 3000 products on a multi store setup (7 stores) can take up to 10 minutes. We have it set up to manual index only. You could set the indexing to run on a cron which might help you. (Throwing more resources at the server will obviously help too).
Reduce Catalog price rules if in use
Reduce all attributes not required for Search Index - the default Magento install includes attributes like "Tax Status" in the Search Index and then this is multiplied 4x for your multi stores
Same can be said for URL rewrites - 4x increase
12 custom options per product? Could common attributes be used instead?
Run indexing from SSH rather than from the Magento admin
In the web root in the shell/ folder you will find indexer.php
php indexer.php --help will give you some options
You can then time each of the nine indexes to see which is taking the most time which may help narrow the problem down
By running from command line you can increase the php memory limit just for that one process which may improve results
I set up a cron job to reindex the site at off-peak times and, in the Magento admin, changed the index mode to Manual Update only
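A small sketch of the "time each index" idea in Python. It assumes it is run from the shell/ directory, that indexer.php accepts --reindex followed by an index code (that is how I remember its --help output), and that the codes below match a stock Magento 1 install; confirm them with php indexer.php info. The 1024M memory limit is just an example value passed through PHP's -d switch.

```python
import subprocess
import time

# Assumed index codes from a stock Magento 1 install; verify on your own site.
INDEX_CODES = [
    "catalog_product_attribute",
    "catalog_product_price",
    "catalog_url",
    "catalog_product_flat",
    "catalog_category_flat",
    "catalog_category_product",
    "catalogsearch_fulltext",
    "cataloginventory_stock",
    "tag_summary",
]


def run_indexer(code):
    """Reindex one index with a raised memory limit for this process only."""
    subprocess.run(
        ["php", "-d", "memory_limit=1024M", "indexer.php", "--reindex", code],
        check=True,
    )


def time_indexes(codes, runner=run_indexer, clock=time.perf_counter):
    """Run each index in turn and return (code, seconds), slowest first."""
    timings = []
    for code in codes:
        started = clock()
        runner(code)
        timings.append((code, clock() - started))
    return sorted(timings, key=lambda item: item[1], reverse=True)
```

Calling time_indexes(INDEX_CODES) and printing the result shows which index dominates the run. The runner and clock parameters exist only so the timing logic can be tried without PHP installed.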

Magento customer/order transferring

Most of you coders most likely already have the habit of working on different platforms (development-staging-production). In the company I work with they also have these different platforms with a Magento enterprise edition (v1.9.0.0) instance deployed.
About 2 months ago our team took a database backup of the production version to start working from (a rather large development project about the content (product images, descriptions, ...) and automatic product loading).
Currently all the modifications have been deployed on the staging platform, which only contains orders placed up to two months ago.
After buying an extension (badly coded and full of bugs) for exporting and importing orders (including order information, quotes, shipping info and customer info), which does not work properly, I decided to just copy the following tables from the production site:
All tables starting with customer_
All tables starting with s_
All tables starting with sales_
I imported them on my development platform (just to try it out) and it works! :O
All order, shipping, credit-memo and customer information is maintained and seems to be fully working and correct.
Here comes the actual question:
Is there a chance of a conflict with something order/customer-related in the future by doing this? As far as I know, orders only carry relations to customers and customer addresses and not actually to products (at least I think they are linked by SKU and not by product entity_id like most things in Magento).
This is proven by the fact that if you remove all products from your magento instance, all the order and customer information is maintained and fully working.
Edit: This actually worked ;)
This is probably a bit late but I have come across this situation several times recently and this is an excellent question and it's important to be aware of it to avoid future problems with the customer/sales relationship.
At the time of writing the most current Magento version is and yes, the habit of working with a production and development site is likely, so the transferring of new sales and customer during the development is an important step to take and shouldn't require extensions if one has even little experience with the DB, so here it goes:
It is correct that the transferring of the customer_ and sales_ tables will transfer all the data correctly and safely.
Thereafter, one MUST only update the eav_entity_store table's increment_last_id column for each row.
This last step ensures that new orders, invoices, shipments and credit memos do not ignore the transferred IDs, so that new orders start from where the transferred orders left off.
It may be a bit confusing, but it's a very easy step. This article explains it in more detail.
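As a rough sketch of that step, here is the idea in Python against SQLite with a cut-down schema. The real tables have more columns, the entity_type_id value 5 is made up, and only orders are shown; invoices, shipments and credit memos would get the same treatment against their own tables. On a real site you would run the equivalent UPDATE in MySQL after backing up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE eav_entity_type (entity_type_id INTEGER PRIMARY KEY, entity_type_code TEXT);
    CREATE TABLE eav_entity_store (entity_type_id INTEGER, store_id INTEGER, increment_last_id TEXT);
    CREATE TABLE sales_flat_order (entity_id INTEGER PRIMARY KEY, store_id INTEGER, increment_id TEXT);
    INSERT INTO eav_entity_type VALUES (5, 'order');
    -- Stale counters left over from the two-month-old backup.
    INSERT INTO eav_entity_store VALUES (5, 1, '100000010'), (5, 2, '200000004');
    -- Orders copied over from production.
    INSERT INTO sales_flat_order (store_id, increment_id) VALUES
        (1, '100000010'), (1, '100000057'), (2, '200000004'), (2, '200000019');
""")


def sync_order_increment(conn):
    """Set each store's last order increment to the highest transferred one."""
    # MAX() on text works here because increment IDs for one store share
    # the same prefix and length, so string order matches numeric order.
    conn.execute("""
        UPDATE eav_entity_store
        SET increment_last_id = (
            SELECT MAX(o.increment_id) FROM sales_flat_order o
            WHERE o.store_id = eav_entity_store.store_id
        )
        WHERE entity_type_id = (
            SELECT entity_type_id FROM eav_entity_type WHERE entity_type_code = 'order'
        )
        AND EXISTS (
            SELECT 1 FROM sales_flat_order o
            WHERE o.store_id = eav_entity_store.store_id
        )
    """)
    conn.commit()


sync_order_increment(conn)
counters = conn.execute(
    "SELECT store_id, increment_last_id FROM eav_entity_store ORDER BY store_id"
).fetchall()
print(counters)
```

The EXISTS guard leaves the counter alone for stores that have no transferred orders, instead of overwriting it with NULL.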
There's this script:
It grabs a customer and all his orders through the single command:
if 1234 is the customer_entity.entity_id. You can take a look at the source code to see how the table constraints were queried to make sure all rows were grabbed.
You are approaching this the wrong way around. If you have EE, then it bundles content staging procedures, and you should use those for your content changes.
And yes, it most certainly can cause issues, as all your relations with other order-related content such as sent invoices and all object attributes might just get new entity_ids, and this will eventually end in a mess somewhere down the road.
If you add attribute sets and attributes to a large installation it's always recommended to implement those as extension setup routines so you can move your codebase and all changes are automatically populated to whatever database you might connect in the future.

(ASP.NET) How would you go about creating a real-time counter which tracks database changes?

Here is the issue.
On a site I've recently taken over it tracks "miles" you ran in a day. So a user can log into the site, add that they ran 5 miles. This is then added to the database.
At the end of the day, around 1am, a service runs which calculates all the miles all the users ran that day and outputs a text file to App_Data. That text file is then displayed in Flash on the home page.
I think this is kind of ridiculous. I was told they had to do this due to massive performance issues. They won't tell me exactly how they were doing it before or what the major performance issue was.
So what approach would you guys take? The first thing that popped into my mind was a web service which gets the data via an AJAX call. Perhaps every time a new "mile" entry is added, a trigger is fired and updates the "GlobalMiles" table.
I'd appreciate any info or tips on this.
Thanks so much!
Answering this question is a bit difficult since we don't know all of your requirements or what didn't work before. So here are some different ideas.
First, revisit your assumptions. Generating a static report once a day is a perfectly valid solution if all you need is daily reports. Why hit the database multiple times throughout the day if all that's needed is a snapshot? (For instance, lots of blog software used to write html files when a blog was posted rather than serving up the entry from the database each time -- many still do as an optimization.) Is the "real-time" feature something you are adding?
I wouldn't jump to AJAX right away. Use the same input method, just move the report from static to dynamic. Doing too much at once is a good way to get yourself buried. When changing existing code I try to find areas that I can change in isolation with the least amount of impact on the rest of the application. Once you have the dynamic report, you can add AJAX (and please use progressive enhancement).
As for the dynamic report itself you have a few options.
Of course you can just SELECT SUM(), but it sounds like that would cause the performance problems if each user has a large number of entries.
If your database supports it, I would look at using an indexed view (sometimes called a materialized view). It should allow fast updates to the real-time sum data:
SELECT SUM([Count]) AS TotalMiles,
COUNT_BIG(*) AS [EntryCount]
FROM Miles
If the overhead of that is too much, @jn29098's solution is a good one. Roll it up using a scheduled task. If there are a lot of entries for each user, you could add only the delta since the last time the task was run.
UPDATE GlobalMiles SET [TotalMiles] = [TotalMiles] +
(SELECT COALESCE(SUM([Count]), 0)
FROM Miles
WHERE UserId = @id
AND EntryDate > @lastTaskRun)
WHERE UserId = @id
If you don't care about storing the individual entries but only the total you can update the count on the fly:
UPDATE Miles SET [Count] = [Count] + @newCount WHERE UserId = @id
You could use this method in conjunction with the SPROC that adds the entry and have both worlds.
Finally, your trigger method would work as well. It's an alternative to the indexed view where you do the update yourself on a table instead of SQL doing it automatically. It's also similar to the previous option, except that you move the global update out of the sproc and into a trigger.
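A minimal sketch of the trigger approach, written in Python against SQLite so it can be run anywhere. SQL Server trigger syntax is different (it works on the inserted and deleted pseudo-tables), and the single-row GlobalMiles layout and trigger names here are assumptions, not the asker's schema. A delete trigger is included to show one way of handling removed entries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Miles (
        EntryId INTEGER PRIMARY KEY, UserId INTEGER, [Count] REAL
    );
    -- Single-row table holding the site-wide running total.
    CREATE TABLE GlobalMiles (
        Id INTEGER PRIMARY KEY CHECK (Id = 1), TotalMiles REAL NOT NULL
    );
    INSERT INTO GlobalMiles VALUES (1, 0);

    CREATE TRIGGER miles_added AFTER INSERT ON Miles
    BEGIN
        UPDATE GlobalMiles SET TotalMiles = TotalMiles + NEW.[Count] WHERE Id = 1;
    END;

    CREATE TRIGGER miles_removed AFTER DELETE ON Miles
    BEGIN
        UPDATE GlobalMiles SET TotalMiles = TotalMiles - OLD.[Count] WHERE Id = 1;
    END;
""")


def global_total(conn):
    """Read the running total without scanning the Miles table."""
    return conn.execute("SELECT TotalMiles FROM GlobalMiles WHERE Id = 1").fetchone()[0]


conn.execute("INSERT INTO Miles (EntryId, UserId, [Count]) VALUES (1, 10, 5.0)")
conn.execute("INSERT INTO Miles (EntryId, UserId, [Count]) VALUES (2, 11, 3.5)")
total_after_insert = global_total(conn)

conn.execute("DELETE FROM Miles WHERE EntryId = 1")
total_after_delete = global_total(conn)
print(total_after_insert, total_after_delete)
```

The home page then only ever reads one row, and the cost of keeping it current is paid on each write.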
The last three options make it more difficult to handle the situation when an entry is removed, although if that's not a feature of your application then you may not need to worry about that.
Now that you've got materialized, real-time data in your database, you can dynamically generate your report. Then you can add the fancy parts with AJAX.
If they are truly having performance issues due to too many hits on the database, then I suggest that you take all the input and cram it into a message queue (MSMQ). Then you can have a service on the other end that picks up the messages and does a bulk insert of the data. This way you have fewer DB hits. Then you can output to the text file on the update too.
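The shape of that idea, sketched in Python with an in-process queue.Queue standing in for MSMQ and SQLite standing in for the real database. Both are substitutions for illustration only, and the drain and flush helpers are hypothetical names. The point is that many small submissions become one batched write.

```python
import queue
import sqlite3


def drain(pending, limit=1000):
    """Take up to `limit` queued entries without blocking."""
    batch = []
    while len(batch) < limit:
        try:
            batch.append(pending.get_nowait())
        except queue.Empty:
            break
    return batch


def flush(conn, pending):
    """Write everything currently queued in one transaction."""
    batch = drain(pending)
    if batch:
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                "INSERT INTO Miles (UserId, [Count]) VALUES (?, ?)", batch
            )
    return len(batch)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Miles (UserId INTEGER, [Count] REAL)")

# The web tier only enqueues; it never touches the database directly.
pending = queue.Queue()
for entry in [(10, 5.0), (11, 3.0), (10, 2.5)]:
    pending.put(entry)

inserted = flush(conn, pending)
total = conn.execute("SELECT SUM([Count]) FROM Miles").fetchone()[0]
print(inserted, total)
```

In a real setup the flush would run in a separate service on a timer, and that same service could rewrite the text file after each batch.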
I would create a summary table that's rolled up once/hour or nightly which calculates total miles run. For individual requests you could pull from the nightly summary table plus any additional logged miles for the period between the last rollup calculation and when the user views the page to get the total for that user.
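Something along these lines, sketched in Python with SQLite. The UserMilesSummary table, its RolledUpTo column and the ISO-8601 text timestamps are all assumptions made for the example. The rollup job stores a per-user total plus the cut-off time, and a page request adds only the entries logged after that cut-off.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Miles (UserId INTEGER, [Count] REAL, EntryDate TEXT);
    CREATE TABLE UserMilesSummary (
        UserId INTEGER PRIMARY KEY, TotalMiles REAL, RolledUpTo TEXT
    );
""")


def roll_up(conn, now):
    """Scheduled job: rebuild per-user totals for entries up to `now`."""
    with conn:
        conn.execute("DELETE FROM UserMilesSummary")
        conn.execute(
            """INSERT INTO UserMilesSummary (UserId, TotalMiles, RolledUpTo)
               SELECT UserId, SUM([Count]), ? FROM Miles
               WHERE EntryDate <= ? GROUP BY UserId""",
            (now, now),
        )


def total_for(conn, user_id):
    """Summary total plus whatever was logged since the last rollup."""
    row = conn.execute(
        "SELECT TotalMiles, RolledUpTo FROM UserMilesSummary WHERE UserId = ?",
        (user_id,),
    ).fetchone()
    base, since = row if row else (0.0, "")
    (recent,) = conn.execute(
        "SELECT COALESCE(SUM([Count]), 0) FROM Miles WHERE UserId = ? AND EntryDate > ?",
        (user_id, since),
    ).fetchone()
    return base + recent


conn.executemany(
    "INSERT INTO Miles VALUES (?, ?, ?)",
    [(1, 5.0, "2024-01-01T08:00"), (1, 3.0, "2024-01-01T20:00")],
)
roll_up(conn, "2024-01-02T01:00")

# Logged after the nightly job ran, so not in the summary yet.
conn.execute("INSERT INTO Miles VALUES (1, 2.5, '2024-01-02T09:00')")
conn.execute("INSERT INTO Miles VALUES (2, 4.0, '2024-01-02T10:00')")

print(total_for(conn, 1), total_for(conn, 2))
```

The per-request query only scans entries newer than the last rollup, so it stays cheap as long as the job runs reasonably often.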
How many users are you talking about and how many log records per day?
