Our script uses preg_match to check whether the URL is correct, and throws a 404 if it is not. However, we want our URLs to look like this:
http://www.my-domain.com/my-post-title-here/my_id_here/
When it originates as this:
http://www.my-domain.com/my-post-title-here-my_id_here.html
So we tried to change the preg_match to the following, but it's not working:
preg_match('(([a-z]+)/([0-9]+))/?', request_uri(), $matches);
Any help would be appreciated!
Well, we got the preg_match to work using the following:
preg_match('#([a-z])/(\d+)/$#i', request_uri(), $matches);
But we still can't get it to restrict to just the SEO title. For example, if the URL looks like this:
http://www.my-domain.com/my-post-title-here/my_id_here/
You can still load the post by replacing the post title with anything you want, which could result in duplicate content issues!
How can we restrict it to just the title of the post as that's how we generate the url??
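One possible direction, sketched here in JavaScript rather than PHP so it can be run standalone: capture the whole title segment, look up the title that belongs to the captured ID, and redirect (or 404) when the two differ. Note that ([a-z]) in the working pattern above captures only the single letter before the slash, so the full title never reaches $matches. The lookup table canonicalSlugs and the function resolvePostUrl are hypothetical names standing in for a database query and the real request handler.

```javascript
// Hypothetical lookup table standing in for the database: post id -> canonical title slug.
const canonicalSlugs = { 42: "my-post-title-here" };

// Decide what to do with a request path: serve it, redirect to the canonical URL, or 404.
function resolvePostUrl(path) {
  // Whole slug (letters, digits, hyphens), then a numeric id, then an optional trailing slash.
  const m = /^\/([a-z0-9-]+)\/(\d+)\/?$/i.exec(path);
  if (!m) return { status: 404 };

  const [, slug, id] = m;
  const expected = canonicalSlugs[id];
  if (expected === undefined) return { status: 404 };

  // A wrong title with a valid id is redirected, so only one URL per post gets indexed.
  if (slug !== expected) return { status: 301, location: `/${expected}/${id}/` };
  return { status: 200, id: Number(id) };
}

console.log(resolvePostUrl("/my-post-title-here/42/")); // { status: 200, id: 42 }
console.log(resolvePostUrl("/anything-else/42/").location); // /my-post-title-here/42/
```

Whether to redirect or return a 404 for a mismatched title is a design choice; a 301 keeps old or mistyped links working while still pointing search engines at a single URL.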
I have a column of links in Google Sheets. I want to tell if a page is producing an error message using importxml
As an example, this works fine
=importxml("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_T", "//td/b")
i.e. it looks for td, and pulls out b (which are postcodes in Canada)
But this code that looks for the error message does not work:
=importxml("https://www.awwwards.com/error1/", "//div/h1" )
I want it to pull out the "THE PAGE YOU WERE LOOKING FOR DOESN'T EXIST."
...on this page https://www.awwwards.com/error1/
I'm getting a Resource at URL not found error. What could I be doing wrong? Thanks
After some quick trial and error with the default formulae:
=IMPORTXML("https://www.awwwards.com/error1/", "//*")
=IMPORTHTML("https://www.awwwards.com/error1/", "table", 1)
=IMPORTHTML("https://www.awwwards.com/error1/", "list", 1)
=IMPORTDATA("https://www.awwwards.com/error1/")
It seems that this website cannot be scraped in Google Sheets with any of the regular built-in formulae.
You want to retrieve the value of THE PAGE YOU WERE LOOKING FOR DOESN'T EXIST. from the URL of https://www.awwwards.com/error1/.
If my understanding is correct, how about this answer? Please think of this as just one of several possible answers.
Issue and workaround:
I think that the page at your URL returns a 404 (Not Found) status code. Because of this, I thought that built-in functions like IMPORTXML might not be able to retrieve the HTML data.
So as one workaround, how about using a custom function with UrlFetchApp? When UrlFetchApp is used, the HTML data can be retrieved even when the status code is 404.
Sample script for custom function:
Please copy and paste the following script into the script editor of the Spreadsheet, then put =SAMPLE("https://www.awwwards.com/error1") into a cell on the Spreadsheet. This runs the script.
function SAMPLE(url) {
  return UrlFetchApp
    .fetch(url, {muteHttpExceptions: true})
    .getContentText()
    .match(/<h1>([\w\s\S]+)<\/h1>/)[1]
    .toUpperCase();
}
Note:
This custom function is written for the URL https://www.awwwards.com/error1. If you use it for other URLs, it might not retrieve the expected results. Please be careful about this.
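As an illustration of that caveat: .match(...)[1] throws when a page has no <h1>, and the greedy pattern spans from the first <h1> to the last </h1> on the page. A guarded variant of just the extraction step might look like the sketch below. It is plain JavaScript so it can be tried outside Apps Script, and extractFirstH1 is a hypothetical helper name, not part of any Google API.

```javascript
// Hypothetical helper: returns the upper-cased text of the first <h1>, or "" if there is none.
function extractFirstH1(html) {
  // Non-greedy match so only the first <h1>...</h1> pair is captured;
  // [^>]* tolerates attributes on the opening tag.
  const m = /<h1[^>]*>([\s\S]*?)<\/h1>/i.exec(html);
  if (!m) return "";
  // Drop nested tags and collapse whitespace before upper-casing.
  return m[1].replace(/<[^>]+>/g, "").replace(/\s+/g, " ").trim().toUpperCase();
}

console.log(extractFirstH1("<div><h1 class='t'>The page\n you were looking for</h1></div>"));
// THE PAGE YOU WERE LOOKING FOR
console.log(extractFirstH1("<p>no heading here</p>") === ""); // true
```

Inside the custom function, the string passed in would be the result of getContentText().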
References:
Custom Functions in Google Sheets
fetch(url, params)
muteHttpExceptions: If true the fetch doesn't throw an exception if the response code indicates failure, and instead returns the HTTPResponse. The default is false.
match()
toUpperCase()
If this was not the direction you wanted, I apologize.
So I'm trying to redirect the entire url with a th:href, but it is adding characters that I don't want.
My current url is this
http://localhost:8080/viewCourse/post/5
And I'm trying to backtrack to the course the post was a part of, which is
http://localhost:8080/viewCourse/1
So currently this is what my html looks like
<a th:href="@{'/viewCourse/'(${post.course.id})}"><span th:text="${post.course.name}"></span></a>
And this is the url I get
http://localhost:8080/viewCourse/?1
And the Id is correct, but I'm not sure why the ? is there.
I've also tried this
<a th:href="@{'/viewCourse/'(id=${post.course.id})}"><span th:text="${post.course.name}"></span></a>
Which gives me this
http://localhost:8080/viewCourse/?id=1
If anybody can see how I can fix this and let me know that would be great, thanks in advance.
You can append the id without the question mark by using string concatenation:
<a th:href="@{/viewCourse/} + ${post.course.id}"><span th:text="${post.course.name}"></span></a>
However, I would recommend studying this answer, https://stackoverflow.com/a/14938399/5900967, as concatenation can fail in some contexts.
Apparently your id was added as a parameter.
Your code should be like this:
<a th:href="@{/viewCourse/{id}(id=${post.course.id})}"><span th:text="${post.course.name}"></span></a>
And the output should be like this:
http://localhost:8080/viewCourse/1
To learn more about thymeleaf url syntax, see https://www.thymeleaf.org/doc/articles/standardurlsyntax.html
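To make the difference between the two forms concrete, here is a toy model in JavaScript of the rule at work: a parameter whose name appears as a {placeholder} in the path is substituted there, and any parameter without a placeholder is appended as a query string. This is not Thymeleaf's implementation, and expandUrl is a hypothetical name used only for illustration.

```javascript
// Toy model of link-URL expansion: fill {name} placeholders, send the rest to the query string.
function expandUrl(template, params) {
  const used = new Set();
  const path = template.replace(/\{(\w+)\}/g, (whole, name) => {
    if (!(name in params)) return whole; // leave unknown placeholders untouched
    used.add(name);
    return encodeURIComponent(params[name]);
  });
  const query = Object.keys(params)
    .filter((name) => !used.has(name))
    .map((name) => `${encodeURIComponent(name)}=${encodeURIComponent(params[name])}`)
    .join("&");
  return query ? `${path}?${query}` : path;
}

console.log(expandUrl("/viewCourse/", { id: 1 }));     // /viewCourse/?id=1
console.log(expandUrl("/viewCourse/{id}", { id: 1 })); // /viewCourse/1
```

The first call reproduces the unwanted URL from the question; the second shows why adding the {id} placeholder removes the question mark.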
I've defined the following route in the config file:
$route['apartments/(:any)'] = 'apartments/view/$1';
If I enter http://localhost/apartment_advertisement/apartments/shobha_complex as the URL, it works perfectly fine.
If I enter http://localhost/apartment_advertisement/apartments/shobha_complex/abcd/abcd, it goes to the same page as above. I need an error page for this URL instead. How can I control these URLs? Any help would be much appreciated.
Do you mean displaying a 404 Not Found error when the request URL has an unwanted "tail"? You can replace (:any) with a pattern that restricts the accepted string. It's simple:
$route['apartments/(\w+)'] = 'apartments/view/$1';
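Why this rejects the tail can be sketched with a toy matcher in JavaScript. It assumes the router anchors each route pattern at both ends of the URI and that (:any) expands to .+, as in older CodeIgniter versions; matchRoute is a hypothetical name, not a CodeIgniter function.

```javascript
// Toy model of route matching: the pattern has to match the whole URI, start to end.
function matchRoute(pattern, uri) {
  const m = new RegExp(`^${pattern}$`).exec(uri);
  return m ? m.slice(1) : null; // captured groups, or null when the route does not apply
}

const tail = "apartments/shobha_complex/abcd/abcd";

// Assumed expansion of (:any): .+ also matches slashes, so the tail is swallowed.
console.log(matchRoute("apartments/(.+)", tail)); // [ 'shobha_complex/abcd/abcd' ]

// \w+ matches letters, digits and underscores only, so a URI with extra segments fails.
console.log(matchRoute("apartments/(\\w+)", tail)); // null
console.log(matchRoute("apartments/(\\w+)", "apartments/shobha_complex")); // [ 'shobha_complex' ]
```

One consequence of \w+ is that names containing hyphens would no longer match; a pattern such as ([^/]+) would accept them while still excluding slashes.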
I need to exclude some parameters to aggregate properly the pages of my website. I know there is "Exclude URL Query Parameters" and that's OK.
The problem is when I use the URL rewrite. Example.
I have tried with a custom filter for renaming URLs, but it seems to be ignored.
Can anyone help me with the correct syntax?
Please, see this screenshot:
I doubt any of your URIs start with "fbphoto" (^fbphoto). "/fbphoto" is more likely (^/fbphoto)
If the intent is to rewrite all photo URLs with /fbphoto/, here's the syntax to use:
Search String:
^/fbphoto.*
Replace String:
/fbphoto/
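The effect of the leading slash can be checked with an ordinary regular-expression replace. The sketch below is a toy model in JavaScript, not the filter's actual implementation; applyFilter and the sample URI are made up for illustration.

```javascript
// Toy model of a search-and-replace filter applied to a request URI.
function applyFilter(searchPattern, replacement, uri) {
  return uri.replace(new RegExp(searchPattern), replacement);
}

const uri = "/fbphoto/123?ref=share"; // hypothetical request URI

// Without the leading slash, ^ anchors at "/" and the pattern never matches.
console.log(applyFilter("^fbphoto.*", "/fbphoto/", uri)); // /fbphoto/123?ref=share

// With the leading slash, the whole URI is replaced by the aggregated path.
console.log(applyFilter("^/fbphoto.*", "/fbphoto/", uri)); // /fbphoto/
```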
So I'm using CodeIgniter and I've had very little experience with it so far, but here's my issue.
I have an if statement I've set up that says if (@$_GET['f'] == 'callback') then do something, if not, do something else.
So basically my URL ends up looking like this:
http://localhost/finalproject/myspacedev/index?f=start
and all I get is a 404 page. I've tried turning on get in the config, done a bunch of reading on here about using the uri segment class in CI, but all I get are 404 errors. What am I doing wrong here? It's driving me nuts!
Nevermind I'm dumb.
It's PATH_INFO, not PATH INFO.
Still having some issues but for now I'm good.
CodeIgniter automatically maps URI segments to controller method parameters, which is far nicer to work with than $_GET variables (see Controllers in the CI docs).
An example:
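<?php
class blog extends Controller {
    // Reachable at /blog/archives/<filter>
    function archives($filter = '') {
        // $filter receives the URI segment after /blog/archives/,
        // taking the place of a $_GET parameter
    }
}
?>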
The above controller would be available at /blog/archives/, and anything after that portion of the URI would be passed to the method as parameters, in place of $_GET values. If /blog/archives/ returns a 404, then you probably don't have the .htaccess file in the web root, or you may not have it enabled in the Apache configuration.
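The segment-to-parameter mapping described above can be pictured with a small dispatcher. This is a toy model in JavaScript, not CodeIgniter's router; the controllers registry and the dispatch function are hypothetical stand-ins.

```javascript
// Hypothetical controller registry standing in for CodeIgniter's controller classes.
const controllers = {
  blog: {
    archives(filter = "") {
      return `archives filtered by "${filter}"`;
    },
  },
};

// First segment picks the controller, second the method, the rest become arguments.
function dispatch(uri) {
  const [controllerName, methodName, ...args] = uri.split("/").filter(Boolean);
  const controller = controllers[controllerName];
  if (!controller || typeof controller[methodName] !== "function") {
    return "404";
  }
  return controller[methodName](...args);
}

console.log(dispatch("/blog/archives/2009")); // archives filtered by "2009"
console.log(dispatch("/blog/archives/"));     // archives filtered by ""
console.log(dispatch("/blog/missing"));       // 404
```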
It must have something to do with my .htaccess file, even though I thought I had it set up correctly. I tried to do it that way and never had any success so I just ended up enabling GET with the parse_str line that everyone passes around.
In any case, I got it to work, even if it's not the cleanest, most efficient way.