Magento URL rewrite useless? - magento

Two points I want to mention.
First point.
I noticed a strange behavior, most probably a bug.
I configured a new clean instance of Magento (no other modules, so from scratch) with an empty database.
I created 3 categories below the root one.
And 3 products, one in each category.
Something like:
Cat 1
+ Prod 1
Cat 2
+ Prod 2
Cat 3
+ Prod 3
If I change the order of the categories so "Cat 3" is before "Cat 2", like this:
Cat 1
+ Prod 1
Cat 3
+ Prod 3
Cat 2
+ Prod 2
I just need to drag and drop "Cat 3" above "Cat 2" in the category management screen.
So the "order" numbers of Cat 2 and Cat 3 are simply swapped.
BUT the URL index process reindexes ALL products of ALL categories (URL REWRITE index)!
I analyzed the SQL log, and it actually does an INSERT for every single product in the database.
I see INSERTs into core_url_rewrite for "Prod 1", "Prod 2" and "Prod 3".
This is a bug, because "Cat 3" keeps the same parent category, so:
1) there is no need to rewrite products within "Cat 3" (the product name didn't change, the category name didn't change!!)
2) there is no need to rewrite products linked to other categories
Actually, by doing a select, I can see that the rows of the core_url_rewrite table are the same (for sure as no name changed! and no association between products and any categories above the products changed!)
Here is one SQL query that I see in the log file when I move the category:
SQL: INSERT INTO `core_url_rewrite` (`store_id`,`category_id`,`product_id`,`id_path`,`request_path`,`target_path`,`is_system`) VALUES (?, ?, ?, ?, ?, ?, ?) ON DUPLICATE KEY UPDATE store_id = VALUES(`store_id`), category_id = VALUES(`category_id`), product_id = VALUES(`product_id`), id_path = VALUES(`id_path`), request_path = VALUES(`request_path`), target_path = VALUES(`target_path`), is_system = VALUES(`is_system`)
BIND: array (
0 => '1',
1 => NULL,
2 => '4',
3 => 'product/4',
4 => 'testun.html',
5 => 'catalog/product/view/id/4',
6 => 1,
)
AFF: 0
TIME: 0.0005
Actually, the worst part is that it "inserts" a row that already exists, so nothing is actually written. "AFF: 0" means no row was affected: with INSERT ... ON DUPLICATE KEY UPDATE, MySQL reports 0 affected rows when the existing row already holds exactly these values.
It is a waste of time to process every product for nothing, and to try to write something that is already there!!
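A minimal sketch of what the poster expects, assuming the desired rewrite rows can be computed in memory: compare them with the stored rows and write only what differs, so a pure reorder of sibling categories produces no writes at all. The function `plan_rewrite_writes`, the `id_path -> (request_path, target_path)` row shape and the second sample row are hypothetical, not Magento's indexer API.

```python
from typing import Dict, List, Tuple

# Hypothetical row shape: id_path -> (request_path, target_path)
Rewrites = Dict[str, Tuple[str, str]]


def plan_rewrite_writes(existing: Rewrites, desired: Rewrites) -> Tuple[List[str], List[str]]:
    """Return (id_paths to upsert, id_paths to delete); unchanged rows produce no work."""
    to_upsert = [p for p, row in desired.items() if existing.get(p) != row]  # new or changed
    to_delete = [p for p in existing if p not in desired]                     # no longer valid
    return to_upsert, to_delete


existing = {
    "product/4": ("testun.html", "catalog/product/view/id/4"),  # row from the SQL log
    "product/5": ("prod-2.html", "catalog/product/view/id/5"),  # made-up sample row
}
# Moving "Cat 3" above "Cat 2" changes no name and no parent,
# so the rows an indexer would compute are identical to the stored ones.
desired = dict(existing)
print(plan_rewrite_writes(existing, desired))  # ([], [])
```

The design choice is to diff before writing instead of relying on ON DUPLICATE KEY UPDATE to absorb unchanged rows; the database then receives no statement at all for untouched products.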
Second point.
I found another bug/strange behavior.
If I have 2 products with the same name (it can happen), then the URL key is the same (by default).
BTW, the URL key is also the same by default when you duplicate a product to create a new one.
Then the reindex process goes crazy.
E.g., 2 products named "camera" will get URL rewrites like this:
camera-1.html
camera-2.html
I'm ok with this.
BUT, if I now reindex everything, it goes crazy.
It changes the URL rewrites of those products (even though I didn't change anything related to them).
It updates the 2 products like this:
UPDATE camera-1.html => camera-3.html
UPDATE camera-2.html => camera-4.html
and inserts redirects if the setting is enabled (so previous links are not lost), something like:
INSERT camera-1.html , camera-3.html ,RP
INSERT camera-2.html , camera-4.html , RP
The RP option means permanent redirect.
So that is 2 useless UPDATEs and 2 useless INSERTs.
If I reindex, wait until it finishes, and reindex again immediately, Magento does 4 updates, 4 inserts, etc.
Why?? No data changed at all between the reindexes :-)
If you have 5,000 products with the same name (like I have), then it's 10,000 updates and 10,000 (real) inserts for nothing...
The size of core_url_rewrite grows day after day, and the reindex duration is extremely high.
Note: I have a good reason to have 5 000 products with exactly the same name :-)
Whatever my reason this looks strange.
Have you already checked this?
It is quite easy to check with a fresh installation of Magento and log files enabled.
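A simplified model (not Magento's actual indexer code) of why the suffixes keep climbing: if the uniqueness check treats a path the product already owns as "taken", every reindex hands out a new suffix. `naive_assign`, `stable_assign` and the in-memory `owners` map are hypothetical; old paths are never removed here, which loosely mirrors the RP redirect rows.

```python
import re
from typing import Callable, Dict, List

Owners = Dict[str, int]  # request_path -> product_id (old paths are never removed here)


def _next_free(url_key: str, owners: Owners) -> str:
    candidate, n = f"{url_key}.html", 0
    while candidate in owners:
        n += 1
        candidate = f"{url_key}-{n}.html"
    return candidate


def naive_assign(product_id: int, url_key: str, owners: Owners) -> str:
    """Buggy model: a path the product already owns still counts as 'taken'."""
    path = _next_free(url_key, owners)
    owners[path] = product_id
    return path


def stable_assign(product_id: int, url_key: str, owners: Owners) -> str:
    """Fixed model: reuse a path this product already owns for the same url_key."""
    own = re.compile(rf"{re.escape(url_key)}(-\d+)?\.html")
    for path, owner in owners.items():
        if owner == product_id and own.fullmatch(path):
            return path
    return naive_assign(product_id, url_key, owners)


def reindex(assign: Callable[[int, str, Owners], str], owners: Owners) -> List[str]:
    # Two products, both with the url_key "camera".
    return [assign(pid, "camera", owners) for pid in (1, 2)]


naive, stable = {}, {}
for _ in range(3):
    # naive: camera/camera-1, then camera-2/camera-3, then camera-4/camera-5
    # stable: camera/camera-1 every time
    print(reindex(naive_assign, naive), reindex(stable_assign, stable))
```

With the ownership check, a second reindex with unchanged data is a no-op, which is the behaviour the poster expects.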
Last thing: why do we need the core_url_rewrite table?
This is one of the main causes of performance issues with Magento!
4 lines of PHP code plus an .htaccess URL rewrite would do exactly the same job; there is no need for a DB table for this (except for custom URL rewrites or CMS pages).
One method to dynamically generate the URL of a product (based on its name, and category if needed) and one to generate the URL of a category.
Then .htaccess to redirect.
You just need a keyword in the URL to know whether it is a link to a product or a category, and its ID.
Something like:
my-cat/camera-112-p.html
The .htaccess URL rewrite detects it's a link to a product (because of -p.html), gets the product ID out of the URL (112) and redirects the user accordingly.
Having the product ID in the URL might look ugly or be an SEO issue, but I don't think so (it is not as bad as you may read).
And it has to be weighed against the big benefits:
1) no huge table anymore
2) no need to reindex this table (this takes hours, like 8 hours, on a lot of Magento websites). This process can cause a lot of timeout issues, locking, etc.
At least this should be possible through an option (or a module).
Note also that you don't even need to care about permanent redirection, since the content (text) within the link does not matter! Just the ID matters.
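A sketch of the ID-in-URL scheme proposed above, written as a Python routing function rather than .htaccess rules so it is easy to test. The `-c.html` category marker, the helper names and the slug rules are assumptions; the `catalog/product/view/id/N` target format is the one visible in the SQL log earlier, and the category target mirrors it.

```python
import re
from typing import Optional

# Hypothetical scheme: "<slug>-<id>-p.html" for products, "<slug>-<id>-c.html" for categories.
ROUTE = re.compile(r"^(?:.*/)?[a-z0-9-]+?-(\d+)-([pc])\.html$")
TARGETS = {"p": "catalog/product/view/id/{}", "c": "catalog/category/view/id/{}"}


def slugify(text: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def product_url(name: str, product_id: int, category_path: str = "") -> str:
    prefix = f"{category_path.strip('/')}/" if category_path else ""
    return f"{prefix}{slugify(name)}-{product_id}-p.html"


def resolve(request_path: str) -> Optional[str]:
    """Map a request path to an internal target using only the embedded ID."""
    m = ROUTE.match(request_path)
    if not m:
        return None  # fall through to CMS pages / custom rewrites
    entity_id, kind = m.groups()
    return TARGETS[kind].format(entity_id)


print(product_url("Camera", 112, "my-cat"))  # my-cat/camera-112-p.html
print(resolve("my-cat/camera-112-p.html"))   # catalog/product/view/id/112
print(resolve("old-name-112-p.html"))        # catalog/product/view/id/112
```

Because only the numeric ID drives the routing, an outdated slug still resolves, which is the point made about not needing permanent redirects.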
Does it exist? If yes, I will definitely buy it to say "bye-bye" to this complex, messy mechanism (with bugs).
Any feedback will be highly appreciated.
(Especially if you find any rationale in the way Magento behaves, taking into account the poor performance linked to using/managing this table, so any rationale will be highly appreciated :-) )
thanks
Rod

Points one and two seem to have been addressed; see the notes for EE 1.13.0.2 (released today, CE 1.7 coming soon): http://www.magentocommerce.com/knowledge-base/entry/ee113-later-release-notes#prod-url-unique
But, it's worth addressing some of your points.
Why do/did URL rewrites work this way? Because that's the way they worked - it's just how they were created/evolved, including the racing rewrite bug you noticed when two products have the same url_key.
Based on a lot of benchmarking and experience, I can state that the core_url_rewrite table is not "the main cause of poor performance in Magento". The reindex process can suck though, no doubt.
The URL rewrite table is necessary for custom rewrites in general. Suggesting manipulation of server config files (e.g. Apache .htaccess) to add rewrites fails to consider that Magento is an application which can be modified and extended without direct developer knowledge (e.g. by a store owner).
The suggestion to use a pretty-urls mod_rewrite pattern is not tenable for any shop concerned with SEO, and I assure you that the URL path is quite important to ranking/relevance.

Related

Smart pagination algorithm that works with local data cache

This is a problem I have been thinking about for a long time but I haven't written any code yet because I first want to solve some general problems I am struggling with. This is the main one.
Background
A single page web application makes requests for data to some remote API (which is under our control). It then stores this data in a local cache and serves pages from there. Ideally, the app remains fully functional when offline, including the ability to create new objects.
Constraints
Assume a server-side database of roughly 50,000 products (50 MB)
Assume no particular DB type; we interact with it via a REST/GraphQL interface
Assume a single product record is < 1kB
Assume a max payload for a resultset of 256kB
Assume max 5MB storage on the client
Assume search result sets ranging between 0 ... 5000 items per search
Challenge
The challenge is to define a stateless but (network-)efficient way to fetch pages from a result set, so that it is deterministic which results we will get.
Example
In traditional paging, when getting the next 100 results for some query using this url:
https://example.com/products?category=shoes&firstResult=100&pageSize=100
the search result may look like this:
{
"totalResults": 2458,
"firstResult": 100,
"pageSize": 100,
"results": [
{"some": "item"},
{"some": "other item"},
// 98 more ...
]
}
The problem with this is that there is no way, based on this information, to get exactly the objects that are on a certain page. Because by the time we request the next page, the result set may have changed (due to changes in the DB), influencing which items are part of the result set. Even a small change can have a big impact: one item removed from the DB, that happened to be on page 0 of the result set, will change what results we will get when requesting all subsequent pages.
Goal
I am looking for a mechanism to make the definition of the result set independent of future database changes, so if someone was looking for shoes and got a result set of 2458 items, he could actually fetch all pages of that result set reliably even if it got influenced by later changes in the DB (I plan to not really delete items, but set a removed flag on them, for this purpose)
Ideas so far
I have seen a solution where the result set included a "pages" property, which was an array with the first and last id of the items in that page. Assuming your IDs keep going up in number and you don't really delete items from the DB ever, the number of items between two IDs is constant. Meaning the app could get all items between those two IDs and always get the exact same items back. The problem with this solution is that it only works if the list is sorted in ID order... I need custom sorting options.
The only way I have come up with for now is to just send a list of all IDs in the result set... That way pages can be fetched by doing a SELECT * FROM products WHERE id IN (3,4,6,9,...)... but this feels rather inelegant...
Anyway, I hope it is not too broad or theoretical. I have a web-based DB, just no good idea of how to do paging with it. I am looking for answers that point me in a direction to learn, not full solutions.
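A minimal sketch of the "send all IDs" idea, assuming soft deletes via a `removed` flag as planned in the question. SQLite stands in for the remote API; the table layout and function names are made up. With at most 5,000 IDs below 50,000, the ID list is at most about 35 kB as JSON, within the 256 kB payload limit.

```python
import sqlite3
from typing import List, Tuple


def snapshot_ids(conn: sqlite3.Connection, category: str) -> List[int]:
    """Run the search once and freeze the ordered list of matching IDs."""
    rows = conn.execute(
        "SELECT id FROM products WHERE category = ? AND removed = 0 ORDER BY name, id",
        (category,),
    )
    return [row[0] for row in rows]


def fetch_page(conn: sqlite3.Connection, ids: List[int], page: int, size: int) -> List[Tuple[int, str]]:
    """Fetch exactly the items of one page of the frozen result set."""
    page_ids = ids[page * size:(page + 1) * size]
    if not page_ids:
        return []
    marks = ",".join("?" * len(page_ids))
    rows = conn.execute(f"SELECT id, name FROM products WHERE id IN ({marks})", page_ids)
    by_id = {row[0]: row for row in rows}
    # IN (...) returns rows in no guaranteed order: restore the snapshot order.
    return [by_id[i] for i in page_ids if i in by_id]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, category TEXT, removed INTEGER DEFAULT 0)")
conn.executemany(
    "INSERT INTO products (id, name, category) VALUES (?, ?, ?)",
    [(i, f"shoe {i:02d}", "shoes") for i in range(1, 7)],
)
ids = snapshot_ids(conn, "shoes")                             # frozen: [1, 2, 3, 4, 5, 6]
conn.execute("UPDATE products SET removed = 1 WHERE id = 1")  # soft-delete an item on page 0
print(fetch_page(conn, ids, 1, 2))                            # [(3, 'shoe 03'), (4, 'shoe 04')]
```

Soft-deleted items stay fetchable by ID, so page 1 keeps the same content even after an item on page 0 is removed.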
A versioned DB is the answer for result-set consistency.
Each record has a primary id, a modification counter (version number) and a timestamp of modification/creation. Instead of modifying record r, you add a new record with the same id, version number + 1 and sysdate as the modification time.
In the fetch response you add the DB request_time (do not use the client timestamp, because client and server clocks may differ). The first page is served normally, but you return sysdate as request_time. Other pages are served differently: you add a condition like modification_time <= request_time for each versioned table.
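A sketch of the versioned-read idea, with SQLite and integer timestamps standing in for the server's sysdate. `write`, `page_as_of` and the column names are assumptions. Here page 0 is also read as of the returned request_time, which gives the same result as serving it normally at that moment; the key points are that writes only ever append a new version and that every later page uses the same request_time.

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products (id INTEGER, version INTEGER, modified_at INTEGER,"
    " name TEXT, removed INTEGER, PRIMARY KEY (id, version))"
)


def write(conn: sqlite3.Connection, pid: int, name: str, now: int, removed: int = 0) -> None:
    """Never UPDATE in place: append version + 1 stamped with the server time."""
    (version,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) + 1 FROM products WHERE id = ?", (pid,)
    ).fetchone()
    conn.execute("INSERT INTO products VALUES (?, ?, ?, ?, ?)", (pid, version, now, name, removed))


# Latest version of each record as of :t, skipping records deleted by then.
AS_OF = """
SELECT p.id, p.name FROM products p
WHERE p.removed = 0
  AND p.version = (SELECT MAX(v.version) FROM products v
                   WHERE v.id = p.id AND v.modified_at <= :t)
ORDER BY p.name, p.id
LIMIT :size OFFSET :offset
"""


def page_as_of(conn: sqlite3.Connection, request_time: int, page: int, size: int) -> List[Tuple[int, str]]:
    params = {"t": request_time, "size": size, "offset": page * size}
    return conn.execute(AS_OF, params).fetchall()


for i in range(1, 5):
    write(conn, i, f"shoe {i}", now=100)
t0 = 200                                      # request_time returned with page 0 (server clock)
page0 = page_as_of(conn, t0, 0, 2)
write(conn, 1, "shoe 1", now=300, removed=1)  # a page-0 item is deleted later
write(conn, 3, "shoe 3 (renamed)", now=300)   # another item is edited later
print(page_as_of(conn, t0, 1, 2))             # [(3, 'shoe 3'), (4, 'shoe 4')]
```

Page 1 read as of t0 is unaffected by the later delete and rename, while a fresh query with a newer request_time sees the current data.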
You can cache the result set of IDs on the server side when a query arrives for the first time and return a unique ID to the frontend. This unique ID corresponds to the result set for that query, so the frontend can request something like next_page with the unique ID it got the first time it made the query. You should still go ahead with your approach of replacing the DELETE operation with a removed flag, because it makes sure that none of the entries in the result set is deleted. You can discard the cached result set when the frontend reaches the end of it, or you can set a time limit on the lifetime of the cache entry.
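A minimal in-memory sketch of that server-side cache: the first query freezes its ID list under an opaque token, pages are served from that list, and the entry is dropped at the end of the result set or when its TTL expires. `ResultSetCache`, the 600-second default TTL and the injectable clock are assumptions for illustration; a real deployment would likely need a shared store rather than a per-process dict.

```python
import time
import uuid
from typing import Callable, Dict, List, Optional, Tuple


class ResultSetCache:
    """Server-side cache: frozen ID lists addressed by an opaque token."""

    def __init__(self, ttl_seconds: float = 600.0, clock: Callable[[], float] = time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self._entries: Dict[str, Tuple[List[int], float]] = {}

    def open(self, ids: List[int]) -> str:
        """Store the result set of a first-time query and return its token."""
        token = uuid.uuid4().hex
        self._entries[token] = (list(ids), self.clock() + self.ttl_seconds)
        return token

    def page(self, token: str, page: int, size: int) -> Optional[List[int]]:
        """IDs of one page, or None if the token is unknown or expired."""
        entry = self._entries.get(token)
        if entry is None or entry[1] < self.clock():
            self._entries.pop(token, None)
            return None  # the client has to run the query again
        ids = entry[0]
        chunk = ids[page * size:(page + 1) * size]
        if (page + 1) * size >= len(ids):
            del self._entries[token]  # end of the result set reached: discard it
        return chunk


now = [0.0]  # fake clock so the example is deterministic
cache = ResultSetCache(ttl_seconds=60, clock=lambda: now[0])
token = cache.open([10, 11, 12, 13, 14])
print(cache.page(token, 0, 2), cache.page(token, 1, 2), cache.page(token, 2, 2))
print(cache.page(token, 0, 2))  # None: discarded after the last page
```

The client then fetches the records for each returned ID chunk, so the page boundaries stay fixed for the lifetime of the token.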

Magento quick search by SKU

When I try to search products by SKU, I get incomplete results. For example: I have products with SKUs IR-CP-CH_1 and A-453-B-I_1. Both are configurable products, and both are visible in Catalog, Search. I get the correct result for the query IR-CP-CH_1 and no result for A-453-B-I_1.
Indexes are rebuilt. I use combined search type (like + fulltext). In advanced search everything works fine.
I suggest you take a quick look in your database at the table catalogsearch_fulltext. In the data_index column you should be able to see the SKUs as part of the full text string Magento creates for quick searching in.
See if you can either manually spot the elusive SKU 'A-453-B-I_1' or hit it with a
SELECT * FROM catalogsearch_fulltext WHERE data_index LIKE '%453%'
Maybe the SKU got entered with some strange characters or a space instead of a hyphen. You could search in the product_id column instead to see what search string Magento does have for that SKU.
If the string is in the table and the characters match exactly, then I think you should look at indexing, caching, stock, store views, etc., as suggested in the comments above by others.
If the string is not in the table at all, then I think you should look at 'visibility'.
If you look in the table catalogsearch_query and find your search string 'A-453-B-I_1' then look to the num_results column - if that value is greater than zero then items were found but it's not displaying that product for some reason.
EDIT following the comments below:
Actually, I think you should remove that '0' result from the catalogsearch_query table. You could remove it using SQL or phpMyAdmin. Magento will return a result from catalogsearch_query if it finds one, rather than searching catalogsearch_fulltext every time.
It is possible that at some point the result was '0' but now it is non-zero but Magento is stuck with the '0 results' in the catalogsearch_query table.
There is more analysis that can be done, but try that first and if it still isn't right we can look at trapping the database query to try to understand why Magento thinks the result is zero.
For information, in my case a free module (activo_catalogsearch) was breaking search by SKU, because it was not up to date and was probably conflicting with Magento 1.9.4.1 (it worked fine before with Magento 1.9.2.1).

Passing more than 3 items in a reports column link

I have a report that is listing students and I want a column to edit a student. I've done so by following this answer:
How do you add an edit button to each row in a report in Oracle APEX?
However, I can only seem to pass 3 items and there's no option to add more. I took a screenshot to explain more:
I need to pass 8 values, how can I do that?
Thanks!
Normally, for this you would only pass the Primary Key columns (here looks like #RECORD_NUMBER# only). The page that you send the person to would then load the form based on the primary key lookup only. If multiple users were using this application, you would want the edit form to always retrieve the current values of the database, not what happened to be on the screen when a particular person ran a certain report.
Change the Target type to URL.
APEX will format what you already have into a URL text field, which appears between Item 3 and Page Checksum.
All you need to do is to add your new items and values in the appropriate places in the URL.
I found a workaround, at least it was useful to my scenario.
I have an IR page whose query returns 4 columns, let's say: ID, DESCRIPTION, SOME_NUMBER, SOME_NUMBER2.
ID NUMBER(9), DESCRIPTION VARCHAR2(30), SOME_NUMBER NUMBER(1), SOME_NUMBER2 NUMBER(3).
What I did was set up the items this way:
P11_ITEM1-->#ID#
P11_ITEM2-->#DESCRIPTION#
P11_ITEM3-->#SOME_NUMBER##SOME_NUMBER2#
These values are sent to page 11.
In page 11, all items are display only items.
And P11_ITEM3 actually received two concatenated values.
For example, if the calling page has SOME_NUMBER=4 and SOME_NUMBER2=150,
then on page 11, P11_ITEM3 shows 4150.
On page 11 I created a Before Footer process (PL/SQL expression)
to set up new items, for example P11_N1 with source SUBSTR(P11_ITEM3,1,1)
and P11_N2 with source SUBSTR(P11_ITEM3,2,3).
So, I had those items with corresponding values from the calling IR page.
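The split only works because SOME_NUMBER is assumed to always render as exactly one character. NUMBER(1) also allows negative values such as -4 (and NULLs), which shift every later position. A small Python model of the SUBSTR logic shows this, alongside a delimiter-based alternative (a swapped-in technique, not part of the answer; in PL/SQL it could be done with INSTR and SUBSTR or REGEXP_SUBSTR). The function names and the `~` separator are hypothetical.

```python
from typing import List, Tuple


def substr(s: str, start: int, length: int) -> str:
    """Oracle-style SUBSTR: 1-based start position."""
    return s[start - 1:start - 1 + length]


def split_fixed(packed: str) -> Tuple[str, str]:
    # Mirrors P11_N1 = SUBSTR(P11_ITEM3,1,1) and P11_N2 = SUBSTR(P11_ITEM3,2,3)
    return substr(packed, 1, 1), substr(packed, 2, 3)


def pack(values: List[str], sep: str = "~") -> str:
    # Alternative: join with a separator assumed never to occur in the data
    return sep.join(values)


def unpack(packed: str, sep: str = "~") -> List[str]:
    return packed.split(sep)


print(split_fixed("4" + "150"))     # ('4', '150')
print(split_fixed("-4" + "150"))    # ('-', '415'): a negative NUMBER(1) breaks the widths
print(unpack(pack(["-4", "150"])))  # ['-4', '150']
```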
The reason I did not pass only the primary key for a new lookup is that I do not want to stress the database with new queries, since all the data is already loaded into page items. I've been an Oracle DBA for twenty years and I know there is no need to re-execute queries if you already have the information somewhere else.
These workarounds are not very useful for a product that bills itself as a RAD tool.
Just include a single-quoted word in the select statement (SELECT col1, 'Randomword', col2 FROM table1;)
Then define that column as a link and bingo! More items than 3 to select.

Products not appearing in Magento search results, despite being search enabled

(Yes I know this is a duplicate, but none of the other solutions work for me.)
I’ve got a product that won’t appear in search results, despite being search enabled.
Here is an image to prove it.
And I’m using the latest version of Magento CE version 1.7.0
I've reindexed several times and have disabled the cache.
Please help! I can’t setup my shop without this.
1. Check the product's quantity and stock status, and that the product is assigned to a category
2. Clear Cache
3. Run Re-Indexing.
Try changing the Search Type from Like to Fulltext in
Configuration -> Catalog -> Catalog Search.
Regards
Check the MySQL variable ft_min_word_len. I had a similar issue: if I searched for the word "Pad" it returned 0 results, because ft_min_word_len was set to 4 characters and the word 'pad' has only 3.
Warning: changing ft_min_word_len to 3 may affect your server performance.
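A simplified Python model of the effect (MySQL's real tokenizer also applies stopwords and other rules): words shorter than the minimum length never reach the index, so a search for them cannot match. `build_index` and `search` are hypothetical. Note that ft_min_word_len applies to MyISAM FULLTEXT indexes, cannot be changed at runtime, and the FULLTEXT indexes must be rebuilt after changing it.

```python
import re
from typing import Dict, Set

WORD = re.compile(r"\w+")  # letters, digits, underscore: roughly MySQL's word characters


def build_index(docs: Dict[int, str], min_word_len: int = 4) -> Dict[str, Set[int]]:
    """Simplified model: words shorter than min_word_len are never indexed."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for word in WORD.findall(text.lower()):
            if len(word) >= min_word_len:
                index.setdefault(word, set()).add(doc_id)
    return index


def search(index: Dict[str, Set[int]], word: str) -> Set[int]:
    return index.get(word.lower(), set())


docs = {1: "Mouse Pad black", 2: "Laptop sleeve"}
print(search(build_index(docs), "Pad"))                  # set(): 'pad' was never indexed
print(search(build_index(docs, min_word_len=3), "Pad"))  # {1}
```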
I just had the same problem, and it turned out that the Website checkbox (on the product's "Website" tab in the Magento backend) was not set during import.
I tried the following (still working on the development site):
Re-Index all Data
Delete the Cache
Delete all Session Files (rm -rf session/* in var/)
Checked Product Settings for:
Stock
Visibility
Status
Selected Categories
Website
I've been playing around with these for years. Here is what I currently have set up.
Catalog > Catalog Search settings (in admin):
2
25
128
combine (Like + Fulltext)
2000
Searchable attributes:
name, short_description, sku, color, manufacturer_part_number, manufacturer, upc
I did have description searchable, but found that it returns way too many results. I really just want the search text to be found in my products' names.
Our client had created a new Root Category and moved all their products under it. They had neglected to update the Main Website Store configuration in:
"System/Manage Stores/Main Website Store/Root Category"
We set this to the "New Root Category" and the search sprang back into life :)

MySQL get rows, but for any matching get the latest version

I'm developing a CMS, and implementing versioning by adding a new entry to the database with the current timestamp.
My table is set up as follows:
id | page | section | timestamp | content
"Page" is the page being accessed, which is either the path to the page ($page_name below), or '/' (to indicate 'global' fields).
"Section" is the section of the page being edited.
I want to be able to select all rows for a given page, but each section should only be selected once, the one with the latest timestamp being selected.
I've tried using the following CodeIgniter Active Record code:
$this->db->select('DISTINCT(section), content');
$this->db->where_in('page', array('/', $page_name));
$this->db->order_by('timestamp', 'desc');
$query = $this->db->get('cms_content');
Which is producing the following SQL:
SELECT DISTINCT(section), `content`
FROM (`cms_content`)
WHERE `page` IN ('/', 'index.html')
AND `enabled` = 1
ORDER BY `timestamp` desc
Which is returning both test rows (rows have all same fields except id, timestamp and content).
Any ideas as to where I'm going wrong?
Thanks!
Your mistake is thinking that DISTINCT applies only to section - an easy mistake to make as the parentheses are misleading here. In fact the DISTINCT applies to the entire row whether or not you have parentheses. It is therefore best to omit the parentheses to avoid confusion.
Your problem is a classic 'max per group' problem. There are many, many ways to write this query and it is probably one of the most popular SQL questions on this site so you can search Stack Overflow to find ways to solve it. One way to get you started is to only select rows which hold the maximum timestamp for that section:
SELECT section, content
FROM cms_content T1
WHERE page IN ('/', 'index.html')
AND enabled = 1
AND timestamp = (
SELECT MAX(timestamp)
FROM cms_content T2
WHERE page IN ('/', 'index.html')
AND enabled = 1
AND T1.section = T2.section
)
I'm sorry but I do not know how to convert this SQL code into CodeIgniter Active Record. If another user more familiar with Active Record wishes to use this as a starting point for their own answer, they are welcome.
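To see both points in one place, here is a self-contained run with SQLite standing in for MySQL and made-up rows: DISTINCT keeps both "intro" versions because their content differs, while the correlated MAX subquery keeps only the latest row per section. As with any max-per-group query of this shape, two rows sharing the same latest timestamp would both be returned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cms_content (id INTEGER PRIMARY KEY, page TEXT, section TEXT,"
    " timestamp INTEGER, content TEXT, enabled INTEGER)"
)
conn.executemany(
    "INSERT INTO cms_content (page, section, timestamp, content, enabled) VALUES (?, ?, ?, ?, 1)",
    [
        ("index.html", "intro", 1, "old intro"),
        ("index.html", "intro", 2, "new intro"),
        ("/", "footer", 1, "footer v1"),
    ],
)

# DISTINCT applies to the whole selected row, so both "intro" versions survive.
distinct = conn.execute(
    "SELECT DISTINCT section, content FROM cms_content"
    " WHERE page IN ('/', 'index.html') AND enabled = 1 ORDER BY section, content"
).fetchall()

# Max-per-group: keep only rows holding the latest timestamp for their section.
latest = conn.execute("""
    SELECT section, content
    FROM cms_content T1
    WHERE page IN ('/', 'index.html')
      AND enabled = 1
      AND timestamp = (
        SELECT MAX(timestamp)
        FROM cms_content T2
        WHERE page IN ('/', 'index.html')
          AND enabled = 1
          AND T1.section = T2.section
      )
    ORDER BY section
""").fetchall()

print(distinct)  # [('footer', 'footer v1'), ('intro', 'new intro'), ('intro', 'old intro')]
print(latest)    # [('footer', 'footer v1'), ('intro', 'new intro')]
```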
DISTINCT is for all columns selected, and because "content" differs you will get two different rows.
You only want to order by timestamp and limit 1 because you always want the latest.
But may I suggest that you keep a cross reference to the "active" page? That way, you are able to revert to a previous revision without dumping the new ones.
Meaning:
page
----
id
info
active_page_id

page_revisions
--------------
id
page_id
content
timestamp
...
Meaning, you have a one-to-many relation between page <-> page_revisions, as well as a one-to-one link from page to page_revisions to keep track of the "current" revision. With this approach you can just join in the active revision.
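A minimal sketch of the suggested page/page_revisions layout, using SQLite and made-up data. Reverting is just repointing active_page_id, so newer revisions are kept. The `revert` helper and its check that the revision belongs to the page are assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (id INTEGER PRIMARY KEY, info TEXT, active_page_id INTEGER);
    CREATE TABLE page_revisions (id INTEGER PRIMARY KEY, page_id INTEGER,
                                 content TEXT, timestamp INTEGER);
    INSERT INTO page_revisions VALUES (10, 1, 'first draft', 100), (11, 1, 'second draft', 200);
    INSERT INTO page VALUES (1, 'home', 11);
""")

# page -> page_revisions is one-to-many; active_page_id points at the current revision.
ACTIVE = """
    SELECT p.info, r.content
    FROM page p
    JOIN page_revisions r ON r.id = p.active_page_id
    WHERE p.id = ?
"""


def revert(conn: sqlite3.Connection, page_id: int, revision_id: int) -> None:
    """Repoint the page at one of its own revisions; nothing is deleted."""
    conn.execute(
        "UPDATE page SET active_page_id = ? WHERE id = ?"
        " AND EXISTS (SELECT 1 FROM page_revisions WHERE id = ? AND page_id = ?)",
        (revision_id, page_id, revision_id, page_id),
    )


print(conn.execute(ACTIVE, (1,)).fetchone())  # ('home', 'second draft')
revert(conn, 1, 10)
print(conn.execute(ACTIVE, (1,)).fetchone())  # ('home', 'first draft')
```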
This will do the job in CodeIgniter, without temporary tables:
$query = $this->db->query( "SELECT c1.*
FROM cms_content AS c1
LEFT JOIN cms_content AS c2
ON c1.page=c2.page
AND c1.section=c2.section
AND c1.timestamp < c2.timestamp
WHERE c2.timestamp IS NULL AND c1.page=?", array($page_name) );
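A runnable check of the LEFT JOIN approach in SQLite with made-up rows. The self-join makes an unqualified `page` in the WHERE clause ambiguous, so this sketch qualifies it as c1.page and selects only c1's columns. A row survives exactly when no newer row for the same page and section exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cms_content (id INTEGER PRIMARY KEY, page TEXT, section TEXT,"
    " timestamp INTEGER, content TEXT)"
)
conn.executemany(
    "INSERT INTO cms_content (page, section, timestamp, content) VALUES (?, ?, ?, ?)",
    [
        ("index.html", "intro", 1, "old intro"),
        ("index.html", "intro", 2, "new intro"),
        ("index.html", "body", 1, "body v1"),
    ],
)

# Keep c1 only when no newer row (c2) exists for the same page and section.
LATEST = """
    SELECT c1.section, c1.content
    FROM cms_content AS c1
    LEFT JOIN cms_content AS c2
      ON c1.page = c2.page
     AND c1.section = c2.section
     AND c1.timestamp < c2.timestamp
    WHERE c2.timestamp IS NULL AND c1.page = ?
    ORDER BY c1.section
"""

print(conn.execute(LATEST, ("index.html",)).fetchall())  # [('body', 'body v1'), ('intro', 'new intro')]
```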
