Why is this lighttpd URL rewrite not working? - mod-rewrite

I have the following mod_rewrite code for lighttpd, but it does not forward the user properly:
$SERVER["socket"] == ":3041" {
server.document-root = server_root + "/paste"
url.rewrite-once = ( "^/([^/\.]+)/?$" => "?page=paste&id=$1")
}
It should turn the URL domain.com/H839jec into domain.com/index.php?page=paste&id=H839jec. However, it is not doing that; instead, it redirects everything to domain.com. I don't know much about mod_rewrite and would appreciate some input on why it is doing this.

Use the following:
url.rewrite-once = ("^/(.*)$" => "/?page=paste&id=$1")
I don't know the exact issue in your code, but, first, the regex looks unnecessarily complicated and may not match what you expect; second, you're rewriting to a bare query string, whereas I would expect you still need a valid path before the query string. That's why I rewrite to /?page... instead of just ?page...
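Here is a rough Ruby emulation of a single rewrite rule applied to the sample path /H839jec. It is only a sketch: `apply_rule` is a hypothetical helper, and it assumes Ruby's regex engine treats these simple patterns the way lighttpd's PCRE does. It suggests that the question's pattern does match the path, and that the difference lies in the target, which in the question's rule does not start with "/".

```ruby
# Rough emulation of a single lighttpd rewrite rule (hypothetical helper,
# not lighttpd code): if the pattern matches, substitute $1 in the target.
def apply_rule(pattern, target, uri)
  m = pattern.match(uri)
  return uri unless m # no match: the URL is left alone
  target.sub("$1") { m[1] }
end

QUESTION_PATTERN = %r{\A/([^/.]+)/?\z} # "^/([^/\.]+)/?$" from the question
ANSWER_PATTERN   = %r{\A/(.*)\z}       # "^/(.*)$" from the answer

uri = "/H839jec"
puts apply_rule(QUESTION_PATTERN, "?page=paste&id=$1", uri)  # ?page=paste&id=H839jec
puts apply_rule(ANSWER_PATTERN, "/?page=paste&id=$1", uri)   # /?page=paste&id=H839jec
```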

Related

URL Rewrite missing $_GET parameters

I'm having trouble figuring out why my URL rewrite is not preserving the request (GET) parameters.
This is a sample URL:
http://localhost:8888/testwelt/allgemein?test=1234
And this is my rewrite inside the lighttpd.conf:
url.rewrite-once = (
"^(/testwelt/(?!(favicon.ico$|sitemap.xml$|js/|pages/)).*)(\?|$)(.*)" => "/testwelt/index.php?url=$1&$3"
)
A var_dump of my $_GET reveals this:
array(1) { ["url"]=> string(39) "/testwelt/allgemein?test=1234" }
I'm not very experienced with URL rewriting. What am I doing wrong?
Thank you!
I fixed my problem with something like this:
url.rewrite-once = (
"^/testwelt/(sitemap.xml$|favicon\.ico$|php/|css/|js/).*" => "$0",
"^/testwelt/([^?]*)(?:\?(.*))?" => "/testwelt/index.php?url=$1&$2"
)
A short explanation:
The first rule prevents a rewrite for specific files and folders by rewriting them to the whole match ($0), i.e. to themselves; because the rules are rewrite-once, no later rule is applied to them.
The second rule matches everything up to the "?" character that starts the URL parameters, then captures those parameters and appends them to the rewritten URL.
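The difference between the two rule sets can be reproduced with a rough Ruby emulation of rewrite-once (first matching rule wins, $N replaced by capture N). As the var_dump above shows, the pattern is matched against the URI including its query string, so the question's greedy `.*` swallows `?test=1234`. `rewrite_once` is a hypothetical helper, and Ruby's regex engine stands in for lighttpd's PCRE, which is assumed to behave the same for these patterns.

```ruby
# Rough emulation of lighttpd url.rewrite-once (hypothetical helper): try each
# rule in order against the full request URI, query string included, and
# replace $N in the target of the first matching rule with capture N.
def rewrite_once(uri, rules)
  rules.each do |pattern, target|
    m = pattern.match(uri)
    next unless m
    return target.gsub(/\$(\d)/) { m[Regexp.last_match(1).to_i].to_s }
  end
  uri # no rule matched
end

uri = "/testwelt/allgemein?test=1234"

# Rule from the question: the greedy .* in group 1 also consumes "?test=1234".
question_rules = [
  [%r{^(/testwelt/(?!(favicon.ico$|sitemap.xml$|js/|pages/)).*)(\?|$)(.*)},
   "/testwelt/index.php?url=$1&$3"]
]

# Rules from the answer: pass some paths through unchanged, then split at "?".
answer_rules = [
  [%r{^/testwelt/(sitemap.xml$|favicon\.ico$|php/|css/|js/).*}, "$0"],
  [%r{^/testwelt/([^?]*)(?:\?(.*))?}, "/testwelt/index.php?url=$1&$2"]
]

puts rewrite_once(uri, question_rules)  # /testwelt/index.php?url=/testwelt/allgemein?test=1234&
puts rewrite_once(uri, answer_rules)    # /testwelt/index.php?url=allgemein&test=1234
puts rewrite_once("/testwelt/js/app.js", answer_rules)  # /testwelt/js/app.js
```

With the answer's rules, test ends up in $_GET as its own key; note that url then holds the path without the /testwelt/ prefix.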

Regular expression to find a word after "/" in a URL

I am trying to parse URLs using Ruby and return the URLs that contain a given word after a "/" following the .com, .org, etc.
If I am trying to capture "questions" in a URL such as
https://stackoverflow.com/questions, I also want to be able to capture https://stackoverflow.com/blah/questions. But I do not want to capture https://stackoverflow.com/queStioNs.
Currently my expression can match https://stackoverflow.com/questions, but it cannot match "questions" when it comes after another "/", or two "/"s, etc.
The end of my regular expression is using \bquestions\.
I tried doing ([a-zA-Z]+\W{1}+\bjob\b|\bjob\b) but this only gets me URLs with /questions and /blah/questions but not /blah/bleh/questions.
What am I doing wrong and how do I match what I need?
You don't actually need a regex for this; you can use the URI module instead:
require 'uri'
urls = ['https://stackoverflow.com/blah/questions', 'https://stackoverflow.com/queStioNs']
urls.each do |url|
  the_path = URI(url).path
  puts the_path if the_path.include?('questions')
end
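`include?` checks for a substring, so a path such as /questionsfoo would also pass. If "questions" must be a whole path segment, one option is to split the path on "/" and compare the pieces. A minimal sketch; the helper name `segment?` is hypothetical:

```ruby
require "uri"

# True when `word` appears as a complete, case-sensitive path segment of `url`.
def segment?(url, word)
  URI(url).path.split("/").include?(word)
end

urls = [
  "https://stackoverflow.com/questions",
  "https://stackoverflow.com/blah/questions",
  "https://stackoverflow.com/blah/bleh/questions",
  "https://stackoverflow.com/queStioNs",
  "https://stackoverflow.com/questionsfoo"
]

# Prints only the first three URLs.
urls.each { |url| puts url if segment?(url, "questions") }
```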
I don't know whether there is a simpler way around this; here is my solution:
regexp = '^(https|http)?:\/\/[\w]+\.(com|org|edu)(\/{1}[a-z]+)*$'
group_length = "https://stackoverflow.com/blah/questions".match(regexp).length
"https://stackoverflow.com/blah/questions".match(regexp)[group_length - 1].gsub("/","")
It will return 'questions'.
Update as per your comments below:
use [\S]*(\/questions){1}$
Hope it helps :)
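The updated pattern can be checked against the URLs from the question like this (a quick sketch using `Array#grep`). Because the match is case-sensitive and anchored with `$`, /queStioNs is rejected, and so would be a URL such as /questions/123, where questions is not the last segment.

```ruby
# The pattern from the update: non-whitespace characters ending in "/questions".
pattern = /[\S]*(\/questions){1}$/

urls = %w[
  https://stackoverflow.com/questions
  https://stackoverflow.com/blah/questions
  https://stackoverflow.com/blah/bleh/questions
  https://stackoverflow.com/queStioNs
]

matching = urls.grep(pattern)
puts matching # the first three URLs
```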

How to redirect with a query string in lighttpd

The following URL works fine:
http://www.domain.com/address
but when I pass a query string like:
http://www.domain.com/address?back=order-opc.php?step=1
it shows a 404 page.
My rewrite rule:
"^/address" => "/address.php",
I have tried many different rewrites, but nothing seems to work.
How should I write this rule?
You should take the rest of the query string into account in your rule.
"^/address(\?.*)?" => "/address.php$1",
Your URL should be:
http://www.domain.com/address/?back=order-opc.php?step=1
Note the slash (/): /address means /address/index.php or /address/index.html (or whatever your default document is).
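To see what the suggested rule does with and without a query string, here is a rough Ruby emulation. `rewrite_address` is a hypothetical helper, and it assumes, as in lighttpd, that the pattern is matched against the path and query string together.

```ruby
# Rough emulation of the lighttpd rule "^/address(\?.*)?" => "/address.php$1"
# (hypothetical helper, not lighttpd code).
def rewrite_address(uri)
  m = %r{^/address(\?.*)?}.match(uri)
  return uri unless m
  "/address.php#{m[1]}" # m[1] is nil (interpolated as "") when there is no query string
end

puts rewrite_address("/address")                            # /address.php
puts rewrite_address("/address?back=order-opc.php?step=1")  # /address.php?back=order-opc.php?step=1
```

Because the pattern is not anchored at the end, a path such as /addressbook would also be rewritten to /address.php; adding a trailing `$` to the rule is one way to avoid that, if it matters.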

How do I get just the sitename from url in ruby?

I have a URL such as:
http://www.relevantmagazine.com/life/relationship/blog/23317-pursuing-singleness
And would like to extract just relevantmagazine from it.
Currently I have:
@urlroot = URI.parse(@link.url).host
But it returns www.relevantmagazine.com. Can anyone help me?
Using a gem for this might be overkill, but anyway: there's a handy gem called domainatrix that can extract the site name for you while dealing with things like two-element top-level domains and more.
url = Domainatrix.parse("http://www.pauldix.net")
url.url # => "http://www.pauldix.net" (the original url)
url.public_suffix # => "net"
url.domain # => "pauldix"
url.canonical # => "net.pauldix"
url = Domainatrix.parse("http://foo.bar.pauldix.co.uk/asdf.html?q=arg")
url.public_suffix # => "co.uk"
url.domain # => "pauldix"
url.subdomain # => "foo.bar"
url.path # => "/asdf.html?q=arg"
url.canonical # => "uk.co.pauldix.bar.foo/asdf.html?q=arg"
How about:
@urlroot = URI.parse(@link.url).host.gsub("www.", "").split(".")[0]
Try this regular expression:
regex = %r{http://[w]*[\.]*[^/|$]*}
For the following URL strings, it gives:
url = 'http://www.google.com/?q=blah'
url.scan(regex) # => ["http://www.google.com"]
url = 'http://google.com/?q=blah'
url.scan(regex) # => ["http://google.com"]
url = 'http://google.com'
url.scan(regex) # => ["http://google.com"]
url = 'http://foo.bar.pauldix.co.uk/asdf.html?q=arg'
url.scan(regex) # => ["http://foo.bar.pauldix.co.uk"]
It's not perfect, but it strips out everything except the prefix and the host name. You can then clean up the prefix with some other code, since you now only need to look for http:// or http://www. at the beginning of the string. You may also need to tweak the regex a little if you are going to parse https:// URLs as well. I hope this helps you get started!
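The clean-up step could look like the following sketch: take the prefix the regex finds, then strip the leading http:// and an optional www. The helper name `host_from` is hypothetical, and, like the regex, it only handles http:// URLs.

```ruby
regex = %r{http://[w]*[\.]*[^/|$]*}

# Take the prefix found by the regex, then strip "http://" and an optional "www.".
def host_from(url, regex)
  prefix = url[regex]
  return nil if prefix.nil? # the regex found no http:// prefix
  prefix.sub(%r{\Ahttp://(www\.)?}, "")
end

puts host_from("http://www.google.com/?q=blah", regex)                # google.com
puts host_from("http://foo.bar.pauldix.co.uk/asdf.html?q=arg", regex) # foo.bar.pauldix.co.uk
```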
Edit:
I reread the question and realized my answer doesn't really do what you asked. It would help to know whether the URLs you're parsing have a set format, such as always including the www. If they do, you could use a regular expression that extracts everything between the first and second period in the URL. If not, perhaps you could tweak my regex so that it captures everything between the http:// (or www.) and the first period. That might be the easiest way to get just the site name without the www. or the .com or .au.uk and such.
Revised regex:
regex = %r{http://[w]*[\.]*[^\.]*}
url = 'http://foo.bar.pauldix.co.uk/asdf.html?q=arg'
url.scan(regex) # => ["http://foo"]
It gets awkward: if you use regexes, you'll probably have to clean up the URL incrementally to extract the part you want.
Maybe you can just split it?
URI.parse(@link.url).host.split('.')[1]
Keep in mind that some registered domains may have more than one component to the registered country domain, like .co.uk or .co.jp or .com.au for example.
I found the answer, inspired by tadman's answer and an answer to another question:
@urlroot = URI.parse(item.url).host
@urlroot = @urlroot.start_with?('www.') ? @urlroot[4..-1] : @urlroot
@urlroot = @urlroot.split('.')[0]
The first line gets the host, the second line removes the www. if there is one, and the third line gets everything before the next dot.
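The three lines above can be wrapped in a small method for testing; `site_name` is a hypothetical name, and the logic is the same as the answer's.

```ruby
require "uri"

# Same steps as the answer: take the host, drop a leading "www.",
# then keep everything before the first remaining dot.
def site_name(url)
  host = URI.parse(url).host
  host = host.start_with?("www.") ? host[4..-1] : host
  host.split(".")[0]
end

puts site_name("http://www.relevantmagazine.com/life/relationship/blog/23317-pursuing-singleness") # relevantmagazine
puts site_name("http://foo.bar.pauldix.co.uk/asdf.html?q=arg")                                     # foo
```

For foo.bar.pauldix.co.uk this returns foo rather than pauldix, so the approach is only reliable when the site name is the first label after an optional www.; that is the subdomain and multi-part domain problem the domainatrix answer addresses.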

FB PHP SDK and mod_rewrite: getUser returns 0

I am using the Facebook PHP SDK in order to allow users to log in to my site using Facebook.
In my test below (assume the URL is http://mysite.com/test/fbtest.php),
<?php
require_once("facebook.php");
$fb = new Facebook(array('appId' => 'APP_ID', 'secret' => 'APP_SECRET'));
$fbuser = false;
$fbuserid = $fb->getUser();
$fblogin = $fb->getLoginUrl(array(
'redirect_uri' => "http://{$_SERVER['HTTP_HOST']}/test/fbtest.php"));
if($fbuserid)
{
try
{
$fbuser = $fb->api("/me");
print_r($fbuser);
echo "<br/>\n";
}
catch (FacebookApiException $e)
{
$fbuser = false;
$fbuserid = 0;
}
}
if(!$fbuser)
echo "FB Login\n";
?>
This seems to work as expected. However, when I add the following rewrite rule,
RewriteRule ^/FBtest/(.*)$ http://%{HTTP_HOST}/test/fbtest.php$1
Then change my login redirect to the following,
$fblogin = $fb->getLoginUrl(array(
'redirect_uri' => "http://{$_SERVER['HTTP_HOST']}/FBtest/"));
Then $fb->getUser() always returns 0. I feel that I am missing something important.
In the server-side flow, your redirect_uri gets called with the necessary values as GET parameters in the query string.
'redirect_uri' => "http://{$_SERVER['HTTP_HOST']}/FBtest/"
So with this redirect_uri, something like
http://example.com/FBtest/?state=foo&code=bar
will be called on your server.
RewriteRule ^/FBtest/(.*)$ http://%{HTTP_HOST}/test/fbtest.php$1
RewriteRules don’t examine the query string, they only look at the path component of the URL. In your case, that’s /FBtest/, nothing behind it – so the internal redirect goes to /test/fbtest.php, and the query string parameters get lost, because you didn’t say you wanted to pass them on.
Add the flag [QSA] – for “query string append” – to your RewriteRule (and remove the unnecessary (.*)) – then things should work as expected, because your fbtest.php will get the query string parameters needed for the auth process.
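The point that a RewriteRule only sees the path can be illustrated outside Apache. The following Ruby sketch is an emulation, not mod_rewrite itself (`qsa_target` is a hypothetical helper). It splits the example callback URL into path and query, shows that `(.*)` captures nothing from the query string, and appends the original query string to the target, which is what [QSA] is documented to do.

```ruby
require "uri"

# Hypothetical helper: append the original query string to a rewrite target,
# roughly what the [QSA] flag does (joined with "&" if the target has a query).
def qsa_target(target, original_query)
  return target if original_query.nil? || original_query.empty?
  separator = target.include?("?") ? "&" : "?"
  target + separator + original_query
end

callback = URI("http://example.com/FBtest/?state=foo&code=bar")

# A RewriteRule pattern is matched against the path only ("/FBtest/").
m = %r{\A/FBtest/(.*)\z}.match(callback.path)
puts m[1].inspect   # "" : the (.*) group captures nothing from the query string
puts callback.query # state=foo&code=bar

puts qsa_target("/test/fbtest.php", callback.query) # /test/fbtest.php?state=foo&code=bar
```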
Finally figured this problem out. Here is the RewriteRule I need:
RewriteRule ^/FBtest/$ http://%{HTTP_HOST}/test/fbtest.php [QSA,NE]
It turns out I needed to add both the QSA and NE flags to this rule.
Much like in CBroe's answer, the QSA flag was needed for the state/code parameters added to redirect_uri (using (.*) within the RewriteRule instead doesn't catch these additional parameters).
I also needed to add the NE flag because the state/code that the Facebook authentication was adding to the redirect_uri was being escaped.
You will not be able to use any domain other than the one you set in your application's settings.
"be sure you are using the root domain, and not sample.com/test/ as your url settings."
