I have an eCommerce store in Magento (if that matters). I have a goal set up as "Buy tickets", with the goal type "Destination", and here is the setup:
The funnel seems to be "broken": it obviously isn't working right. I'm not the most advanced user and am still learning, so I'm not sure what the problem is. Hope someone can help, thank you.
Your regular expression for Step 2 also matches Step 3 and 4. This causes inaccuracies.
Try the below for your regex pattern instead. Unfortunately Google Analytics doesn't support negative lookaheads in regular expressions - so this is more convoluted than it should be.
Step 2: /checkout/[A-Za-z0-9\-\.]+
This assumes the only special characters that could appear in your URL are the hyphen (-) and the period (.). Modify the character class to add more if needed (except for #).
Replace Step 1 with the following regex to allow all formats of the homepage URL:
^/$|^/\?
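To see what the homepage pattern above accepts, here is a minimal sketch using Python's re module. This is only an approximation: Google Analytics has its own regex engine, and the sample request URIs are made up for illustration.

```python
import re

# Pattern from the answer: the bare homepage, or the homepage plus a query string.
HOMEPAGE = re.compile(r"^/$|^/\?")

# Hypothetical request URIs, invented for this example.
samples = ["/", "/?utm_source=newsletter", "/checkout/", "/tickets"]

# Record whether each URI would count as a Step 1 (homepage) hit.
matches = {uri: bool(HOMEPAGE.search(uri)) for uri in samples}
for uri, hit in matches.items():
    print(f"{uri!r:30} {'match' if hit else 'no match'}")
```

Because both alternatives are anchored with ^, deeper pages such as the checkout URLs cannot be counted as Step 1.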
We are comparing different search engines for our research
archives and, having browsed the Xapian-Omega documentation, we
decided to try it out, since Omega appears to be an
appropriate solution with several interesting search options.
We installed Xapian-Omega on a Linux Server (Deb 7) and tested
the setup with success. However we are unsure as to how one can
employ or perhaps even enable the use of Wild Cards or Regular
Expressions with Xapian-Omega.
We read that for Xapian one has to enable the wildcard option
through the "QueryParser flags".
Could someone clarify this, i.e. explain it or point to a page
with an example or two?
But we did not see much information regarding examples with Omega
CGI and although this latter runs well, wild card options
(such as * for the general wild card and ? as a single character),
do not seem to work as expected by default and they would be
useful, even though stemming and substrings etc may be functional.
Eg: It would be interesting to be able to employ standard simple
wild char searches with a certain precision such as :
medic* for medicine medical medicament
or with ? for single characters
Can Regexp be recognised with Omega ?
eg : sep[ae]r[ae]te(\w+)?
or searching for structured formats such as Email or Credit Card
Numbers or certain formula types in research papers etc.
In a note from Olly Betts long ago (Dev Mailing List) regarding
this one suggestion was to grep the index file but this would
defeat the RAD advantage of Omega.
Any examples of searches using Omega with Wild Cards or Regular
Expressions would be most appreciated ... even an indication of
a page where information regarding this theme is well presented
with examples illustrating how to develop advanced searches
using Xapian alone would be most welcome (PHP or Python perhaps).
(We are not concerned for the moment about a possible
substantial increase in the size of the index or in the
time to index the archive.)
You can enable right-wildcards (such as "medic*") in Omega using $set{flag_wildcard,1} (covered in the Omegascript documentation), which enables FLAG_WILDCARD. There's a section in the user manual on using wildcards.
Xapian doesn't provide support for regular expression searching, although in theory I believe it would be possible to support, if potentially costly (depending on the regex). It would have to run the regular expression against unstemmed terms in the database, and then feed them into the search. Where it becomes difficult is if the regex expands to a lot of terms (eg just 'a' as a regex). There's also some subtlety in making it efficient; it's easy to jump through the term list to something with a constant prefix, and you'd want to take advantage of that if possible.
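The idea in the paragraph above (run the regex against the unstemmed terms, but use any constant prefix to jump straight to the right part of the term list) can be sketched in plain Python. This does not use Xapian at all: the sorted list stands in for a database's term list, and expand_regex and its arguments are hypothetical names chosen for the example.

```python
import bisect
import re

def expand_regex(sorted_terms, literal_prefix, pattern):
    """Return the terms fully matching `pattern`, scanning only those that
    start with `literal_prefix` (the constant text before the first
    metacharacter of the regex)."""
    regex = re.compile(pattern)
    # Jump directly to the first term that could carry the prefix.
    start = bisect.bisect_left(sorted_terms, literal_prefix)
    matched = []
    for term in sorted_terms[start:]:
        if not term.startswith(literal_prefix):
            break  # sorted order: no later term can share the prefix
        if regex.fullmatch(term):
            matched.append(term)
    return matched

# Stand-in for the unstemmed terms of an index (illustrative data only).
terms = sorted(["medical", "medicine", "seperate", "separate",
                "separated", "september", "zebra"])
expanded = expand_regex(terms, "sep", r"sep[ae]r[ae]te(\w+)?")
print(expanded)
```

The matched terms would then be fed into the search as alternatives. A regex with no constant prefix (such as just 'a') would force a scan of every term, which is the costly case described above.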
For your example of sep[ae]r[ae]te(\w+)?, it sounds like you actually want a combination of spelling correction (for the a-e substitutions, which you can enable using $set{flag_spelling_correction,1}) and stemming (for the trailing letters after 'te'; Omega defaults to English stemming, but that can be changed), or either wildcard or partial match support.
If you do need regular expressions for your use case, then I'd suggest bringing it up on the xapian-discuss mailing list. Xapian has moved on since the last discussion, and I believe it would be easier to build such support now than it was then.
James Ayatt: Thank you for your answer and help, my apologies for this belated reply, a distraction with other work.
We had already seen the Omegascript page, but it was not clear to us how to employ these options with the CGI interface. Also, the use of * seems to be for trailing characters only, is that correct? That is, not for characters inside a word, e.g. omeg*ipt; there are cases where the stemming option would not be sufficient. We did not see an option for single-character wildcards, sometimes represented by ? in certain search engines. Could you comment here?
Regarding the use of regular expressions, we had imagined that it might not be quite as simple as one could hope. The examples mentioned in the preceding post were of course simple possible uses; there are many more. Your comment on using the stemming option seems appropriate.
In certain cases it could be interesting to enable some type of regexp option for the extraction of text forms, such as those mentioned. The quick extraction of such text, perhaps together with some surrounding text, could be very useful.
We will certainly try your proposal with the mailing list.
Thank you again.
I need to have more than one synonym for a search term in Magento (version 1.4.2.0, which I can't upgrade for now), but all my attempts to add multiple synonyms have failed.
I've been looking around without finding a solution. Has anyone had a similar need and managed to solve it?
Thanks for any help,
Mat.
So you have people look for 'doodad' or 'dodad' and you want to show people the 'macguffin' instead.
So far you have tried to add these search terms in on the back-end but it has not worked.
The workaround is surprisingly simple.
Type in 'dodad' in the frontend - no result given.
Now type 'doodad' in the frontend - again no results.
Now go into the backend and go to the last page of the search terms.
The entries for 'dodad' and 'doodad' will be in there. You can now put 'macguffin' in the synonym box.
Now go to the front and type in 'dodad' or 'doodad' into the search box and it will take you straight to the 'macguffin' item.
It's hard for me to grasp what a rewrite is actually doing. I would like to setup some rules and then be able to throw tests at it and step through it like a debugger.
As far as I'm aware, there isn't a tool that would let you do that. However, there are a few things you could use that have helped me a lot.
We all know rewrites are all about regular expressions, and being able to write them properly is a MUST.
Helicon Tech has a tool called Regular Expression Test Utility (at the bottom of the page). It's really good for writing your rules and evaluating them. It will let you use rewrite commands, and will tell you if there are matches or errors in your evaluations.
As a second tool (for quick regexps), I use RegExr by gskinner. It's available as an online version or as a desktop version, which is my favourite. It won't check rewrite evaluations, but it will let you write your regular expressions and highlight the results. So when, for example, I want to redirect a user who hits the page:
http://www.website.co.uk/blog/index.cfm?article=1
to
http://www.website.co.uk/blog/article/1
I simply use it to test my matches as such:
(screenshot: http://img13.imageshack.us/img13/5608/exampleyo.jpg)
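The same kind of match test can be scripted. Below is a minimal sketch in Python rather than in any particular rewrite engine's syntax; the pattern is an assumption written for the two example URLs above, not the rule from the screenshot.

```python
import re

# Assumed pattern: capture the numeric article id from the query string.
OLD_URL = re.compile(r"^/blog/index\.cfm\?article=(\d+)$")

def rewrite(path_and_query):
    """Return the friendly URL, or the input unchanged if it does not match."""
    return OLD_URL.sub(r"/blog/article/\1", path_and_query)

print(rewrite("/blog/index.cfm?article=1"))
print(rewrite("/blog/index.cfm?page=2"))
```

Feeding a handful of sample URLs through a function like this is a cheap way to check a pattern before putting it into a real rule. Note that some rewrite engines match the query string separately from the path, so the pattern may need splitting when you port it.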
Hope this helps.
What are the best algorithms for recognizing structured data on an HTML page?
For example Google will recognize the address of home/company in an email, and offers a map to this address.
A named-entity extraction framework such as GATE has at least tackled the information extraction problem for locations, assisted by a gazetteer of known places to help resolve common issues. Unless the pages were machine generated from a common source, you're going to find regular expressions a bit weak for the job.
If you have the markup proper, and not just the text from the page, I second the Beautiful Soup suggestion above. In particular, the address tag should provide the lowest of low-hanging fruit. Also look into the adr microformat. I'd only fall back to regexes if the first two didn't pull enough info or I didn't have the necessary data to look for the first two.
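As a rough sketch of the address-tag approach, the following collects the text inside address elements. To stay dependency-free it uses Python's standard html.parser instead of Beautiful Soup; the AddressExtractor class and the sample markup are invented for this example.

```python
from html.parser import HTMLParser

class AddressExtractor(HTMLParser):
    """Collect the text content of every <address> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # > 0 while inside an <address> element
        self.current = []     # text chunks of the address being read
        self.addresses = []   # finished addresses

    def handle_starttag(self, tag, attrs):
        if tag == "address":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "address" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                # Join the chunks and collapse runs of whitespace.
                self.addresses.append(" ".join(" ".join(self.current).split()))
                self.current = []

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)

# Made-up markup for illustration.
html = "<p>Visit us:</p><address>1 Example Street<br>Springfield</address>"
parser = AddressExtractor()
parser.feed(html)
parser.close()
print(parser.addresses)
```

With Beautiful Soup the same idea is shorter, but the principle is identical: let a parser find the element, then work on its text.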
If you also have to handle international addresses, you're in for a world of headaches; international address formats are amazingly varied.
I'd guess that Google takes a two-step approach to the problem (at least that's what I would do). First they use some fairly general search pattern to pick out everything that could be an address, and then they use their map database to look up that string and see if they get any matches. If they do, it's probably an address; if they don't, it probably isn't. If you can use a map database in your code, that will probably make your life easier.
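A toy version of that two-step guess might look like the sketch below. Everything in it is an assumption made for illustration: the loose candidate pattern, the tiny KNOWN_STREETS set standing in for a map database, and the sample text. It says nothing about how Google actually does it.

```python
import re

# Step 1: a deliberately loose pattern, a number followed by up to four
# capitalised words.
CANDIDATE = re.compile(r"\b\d{1,5}\s+(?:[A-Z][a-z]+\s?){1,4}")

# Step 2 stand-in: a hypothetical "map database" of known street names.
KNOWN_STREETS = {"baker street", "main street"}

def find_addresses(text):
    """Return candidates whose street part is found in the lookup set."""
    confirmed = []
    for match in CANDIDATE.finditer(text):
        candidate = match.group().strip()
        # Split the house number from the rest of the candidate.
        _number, street = candidate.split(None, 1)
        if street.lower() in KNOWN_STREETS:
            confirmed.append(candidate)
    return confirmed

text = "We moved to 221 Baker Street in May. Order 4520 Blue Widgets today."
found = find_addresses(text)
print(found)
```

The loose pattern also picks up "4520 Blue Widgets", and only the lookup step rejects it, which is the point of splitting the work in two.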
Unless you can limit the geographic location of the addresses, I'm guessing that it's pretty much impossible to identify a string as an address just by parsing it, simply due to the huge variation of address formats used around the world.
Do not use regular expressions on the raw HTML. Use an existing HTML parser; in Python, for example, I strongly recommend BeautifulSoup. That holds even if you then use a regular expression on the contents of the elements BeautifulSoup grabs.
If you do it with your own regexes, you not only have to worry about finding the data you require, you also have to worry about things like invalid HTML and lots of other very non-obvious problems you'll stumble over.
What you're asking is really quite a hard problem if you want to get it perfect. While a simple regexp will get it mostly right most of the time, writing one that will get it exactly right every time is fiendishly hard. There are plenty of strange corner cases, and in several cases there is no single unambiguous answer. Most web sites that I've seen do a pretty bad job of handling all but the simplest URLs.
If you want to go down the regexp route your best bet is probably to check out the sourcecode of
http://metacpan.org/pod/Regexp::Common::URI::http
Again, regular expressions should do the trick.
Because of the wide variety of addresses, you can only guess if a string is an address or not by an expression like "(number), (name) Street|Boulevard|Main", etc
You can consider looking into some firefox extensions which aim to map addresses found in text to see how they work
You can check this USA extraction example http://code.google.com/p/graph-expression/wiki/USAAddressExtraction
It depends upon your requirements.
For email and contact details, regex is more than enough.
For addresses, regex alone will not help. Think about NLP (NER) and POS tagging.
For finding people-related information, you can't do anything without NER.
If you need information like paragraphs, get the contents by using tags.
The latest makefiles we've received from a third party vendor contain rules with --depend on the end of build rules, so I thought I would look it up on Google, but try as I might, I can't persuade it to display any pages with exactly the characters --depend
I've tried surrounding it with quotes ("--depend"), I've tried the Advanced Search, and I've tried backslashes ("\-\-depend") in the (vain) hope that there is some sort of unpublished regular expression search available.
Am I missing something blindingly obvious?
Please note that this is NOT a question about what --depend does, I know that, it's a question about how you Google for very precise, programmer oriented, text.
You can specify literal symbols in a Google Code Search but not in a Google Web Search.
Examples:
Google Code Search for +"--depend"
Google Web Search for +"--depend"
I had the same issue searching for 'syntax-rules'. You would think they would have solved this by now.
I remember reading somewhere that Google's web search does not index non-alphanumeric characters and treats them as word separators, so that's not possible.
The reason for this problem is that a minus sign at the start of a token indicates that you want to EXCLUDE it from the search.
This is how you can filter out really popular results that have nothing to do with what you want.
For example, try searching for "wow". Then try searching for "wow -warcraft".