URL Rewrite Pattern for my URL - mod-rewrite

I'm using the URL Rewrite module, but I do not know how to write a pattern for my requirement. Please help.
My URLs are:
mychick.com/Cat/Prod/chicken-skin-clean.html
&
mychick.com/Cat/Prod/chicken-head-crown.html
I need these URLs to be rewritten to:
mychick.com/Cat/Prod/chicken-skin-clean
mychick.com/Cat/Prod/chicken-head-crown
I need a single pattern that rewrites both of these URLs.

If you are looking to do it in Java, you can split the string on ".html". Note that split takes a regular expression, so the dot has to be escaped:
public static void main(String[] args) {
    String s = "mychick.com/Cat/Prod/chicken-skin-clean.html";
    System.out.println(s);
    // split() takes a regex; escape the dot so it matches only a literal "."
    System.out.println(s.split("\\.html")[0]);
}
The output is:
mychick.com/Cat/Prod/chicken-skin-clean.html
mychick.com/Cat/Prod/chicken-skin-clean
I bet it can be done better, and it only works for cutting off .html, but that is what you asked for.
A problem arises if ".html" also appears elsewhere in the URL. In that case you can join the array elements from 0 to length-1, putting ".html" back between the strings.
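As an alternative to splitting, here is a minimal sketch that removes only a trailing ".html" and leaves any earlier occurrence alone. The class and method names (HtmlSuffix, stripHtmlSuffix) are made up for illustration, and this is plain string handling in Java, not a rule for the URL Rewrite module itself.

```java
class HtmlSuffix {
    // Hypothetical helper: drops one trailing ".html" if present,
    // otherwise returns the input unchanged.
    static String stripHtmlSuffix(String url) {
        String suffix = ".html";
        return url.endsWith(suffix)
                ? url.substring(0, url.length() - suffix.length())
                : url;
    }

    public static void main(String[] args) {
        System.out.println(stripHtmlSuffix("mychick.com/Cat/Prod/chicken-skin-clean.html"));
        System.out.println(stripHtmlSuffix("mychick.com/Cat/Prod/chicken-head-crown.html"));
    }
}
```

Because it only looks at the end of the string, a URL such as a/b.html.html loses just the last ".html".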

Related

Why is this lighttpd url rewrite not working?

I have the following mod_rewrite code for lighttpd, but it does not properly forward the user:
$SERVER["socket"] == ":3041" {
server.document-root = server_root + "/paste"
url.rewrite-once = ( "^/([^/\.]+)/?$" => "?page=paste&id=$1")
}
It should turn the URL domain.com/H839jec into domain.com/index.php?page=paste&id=H839jec. However, it is not doing that; instead it is redirecting everything to domain.com. I don't know much about mod_rewrite and would appreciate some input on why it is doing this.
Use the following:
url.rewrite-once = ("^/(.*)$" => "/?page=paste&id=$1")
I don't know the exact issue in your code, but first, the regex looks unnecessarily complicated and may not match what you expected it to match. Second, you're rewriting to a bare query string, whereas I would expect that you still need a valid path before the query string. That's why I rewrite to /?page... instead of just ?page....
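To see what each pattern actually matches, the two regexes can be tried outside the server. The sketch below uses Java's regex engine as a stand-in (lighttpd uses PCRE, so this assumes the two engines agree on these simple patterns), and RewriteCheck and rewrite are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RewriteCheck {
    // Applies one rewrite rule: returns the rewritten path when the
    // pattern matches, otherwise the original path unchanged.
    static String rewrite(String path, String pattern, String target) {
        Matcher m = Pattern.compile(pattern).matcher(path);
        return m.find() ? m.replaceFirst(target) : path;
    }

    public static void main(String[] args) {
        String original = "^/([^/\\.]+)/?$"; // pattern from the question
        String loose = "^/(.*)$";            // pattern from the answer
        String target = "/?page=paste&id=$1";

        for (String path : new String[] {"/H839jec", "/style.css", "/a/b"}) {
            System.out.println(path + " | original: " + rewrite(path, original, target)
                    + " | loose: " + rewrite(path, loose, target));
        }
    }
}
```

The original pattern skips any path containing a dot or a second slash, while the looser one rewrites every path, including requests for static files.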

Jackrabbit XPath Query: UUID with leading number in path

I have what I think is an interesting problem executing queries in Jackrabbit when a node in the query path is a UUID that starts with a number.
For example, this query works fine, as the second node starts with a letter, 'f':
/*/JCP/feeadeaf-1dae-427f-bf4e-842b07965a93/label//*[@sequence]
This query, however, does not, if the first 'f' is replaced with '2':
/*/JCP/2eeadeaf-1dae-427f-bf4e-842b07965a93/label//*[@sequence]
The exception:
Encountered "-" at line 1, column 26.
Was expecting one of:
<IntegerLiteral> ...
<DecimalLiteral> ...
<DoubleLiteral> ...
<StringLiteral> ...
... rest omitted for brevity ...
for statement: for $v in /*/JCP/2eeadeaf-1dae-427f-bf4e-842b07965a93/label//*[@sequence] return $v
My code in general
def queryString = queryFor path
def queryManager = session.workspace.queryManager
def query = queryManager.createQuery queryString, Query.XPATH // fails here
query.execute().nodes
I'm aware my query, with the leading asterisk, may not be the best, but I'm just starting out with querying in general. Maybe using another language other than XPATH might work.
I tried the advice in this post (Jackrabbit Running Queries against UUID), adding a save before creating the query, but no luck.
Thanks in advance for any input!
A solution that worked was to properly escape parts of the query path, namely the individual steps used to build up the path into the repository. The exception message was somewhat misleading, at least to me, as it made me think that the hyphens were part of the root cause. The root problem was that the leading number in the node name created an illegal XPath query, as suggested above.
A solution in this case is to encode the individual steps of the path and then build the rest of the query. This results in only the leading number being escaped:
/*/JCP/_x0032_eeadeaf-1dae-427f-bf4e-842b07965a93//*[@sequence]
Code that represents a list of steps or a path into the Jackrabbit repository:
import java.util.ArrayList;
import java.util.List;
import org.apache.commons.lang3.StringUtils;
import org.apache.jackrabbit.util.ISO9075;

class Path {
    List<String> steps; //...

    public String asQuery() {
        return steps.size() > 0 ? "/*" + asPathString(encodedSteps()) + "//*" : "//*";
    }

    private String asPathString(List<String> steps) {
        return '/' + StringUtils.join(steps, '/');
    }

    // Encode each step on its own so that the '/' separators stay unescaped.
    private List<String> encodedSteps() {
        List<String> encodedSteps = new ArrayList<>();
        for (String step : steps) {
            encodedSteps.add(ISO9075.encode(step));
        }
        return encodedSteps;
    }
}
Some more notes:
If we escape more of the query string as in:
/_x002a_/JCP/_x0032_eeadeaf-1dae-427f-bf4e-842b07965a93//_x002a_[@sequence]
Or the original path encoded as a whole as in:
_x002f_a_x002f_fffe4dcf0-360c-11e4-ad80-14feb59d0ab5_x002f_2cbae0dc-35e2-11e4-b5d6-14feb59d0ab5_x002f_c
The queries do not produce the wanted results.
Thanks to @matthias_h and @LarsH
An XML element name cannot start with a digit. See the XML spec's rules for STag, Name, and NameStartChar. Therefore, the "XPath expression"
/*/JCP/2eeadeaf-1dae-427f-bf4e-842b07965a93/label//*[@sequence]
is illegal, because the name test 2eead... isn't a legal XML name.
As such, you can't just use any old UUID as an XML element name nor as a name test in XPath. However if you put a legal NameStartChar on the front (such as _), you can probably use any UUID.
I'm not clear on whether you think you already have XML data with an element named <2eead...> (and are trying to query that element's descendants); if so, whatever tool produced it is broken, as it emits illegal XML. On the other hand if the <2eead...> is something that you yourself are creating, then presumably you have the option of modifying the element name to be a legal XML name.
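To illustrate the per-step escaping described in the first answer without a Jackrabbit dependency, here is a simplified sketch. It only handles a leading ASCII digit and is a hand-written stand-in, not the real ISO9075.encode, which covers more cases. LeadingDigitEscape and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LeadingDigitEscape {
    // Simplified stand-in for ISO9075.encode: escapes only a leading
    // ASCII digit as _xHHHH_ and leaves everything else untouched.
    static String escapeLeadingDigit(String step) {
        if (step.isEmpty()) {
            return step;
        }
        char first = step.charAt(0);
        if (first < '0' || first > '9') {
            return step;
        }
        return String.format("_x%04x_", (int) first) + step.substring(1);
    }

    // Encodes each step separately, then joins them, so the '/'
    // separators and the wildcards are never escaped.
    static String asQuery(List<String> steps) {
        List<String> encoded = new ArrayList<>();
        for (String step : steps) {
            encoded.add(escapeLeadingDigit(step));
        }
        return encoded.isEmpty() ? "//*" : "/*/" + String.join("/", encoded) + "//*";
    }

    public static void main(String[] args) {
        System.out.println(asQuery(Arrays.asList("JCP", "2eeadeaf-1dae-427f-bf4e-842b07965a93")));
    }
}
```

Escaping step by step is the point here: encoding the whole path at once would also escape the separators, which is the variant reported above as not producing the wanted results.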

Regular Expression find usage of word after "/" in URL

I am trying to parse through URLs using Ruby and return the URLs that match a word after the "/" in .com , .org , etc.
If I am trying to capture "questions" in a URL such as
https://stackoverflow.com/questions I also want to be able to capture https://stackoverflow.com/blah/questions. But I do not want to capture https://stackoverflow.com/queStioNs.
Currently my expression can match https://stackoverflow.com/questions but cannot match with "questions" after another "/", or 2 "/"s, etc.
The end of my regular expression is using \bquestions\.
I tried doing ([a-zA-Z]+\W{1}+\bjob\b|\bjob\b) but this only gets me URLs with /questions and /blah/questions but not /blah/bleh/questions.
What am I doing wrong and how do I match what I need?
You don't actually need a regex for this, you can instead use the URI module:
require 'uri'
urls = ['https://stackoverflow.com/blah/questions', 'https://stackoverflow.com/queStioNs']
urls.each do |url|
the_path = URI(url).path
  puts the_path if the_path.include?('questions')
end
I don't know whether there is a simpler way; here is my solution:
regexp = '^(https|http)?:\/\/[\w]+\.(com|org|edu)(\/{1}[a-z]+)*$'
group_length = "https://stackoverflow.com/blah/questions".match(regexp).length
"https://stackoverflow.com/blah/questions".match(regexp)[group_length - 1].gsub("/","")
It will return 'questions'.
Update as per your comments below:
Use [\S]*(\/questions){1}$
Hope it helps :)
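Another way to think about the requirement is to compare whole path segments rather than searching the URL text. The sketch below shows that idea in Java using java.net.URI; PathSegmentMatch and hasSegment are hypothetical names, and the comparison is deliberately case-sensitive so that "queStioNs" is rejected.

```java
import java.net.URI;
import java.util.Arrays;

class PathSegmentMatch {
    // Returns true when some path segment equals the word exactly
    // (case-sensitive), at any depth below the host.
    static boolean hasSegment(String url, String word) {
        String path = URI.create(url).getPath();
        if (path == null) {
            return false;
        }
        return Arrays.asList(path.split("/")).contains(word);
    }

    public static void main(String[] args) {
        String[] urls = {
            "https://stackoverflow.com/questions",
            "https://stackoverflow.com/blah/questions",
            "https://stackoverflow.com/blah/bleh/questions",
            "https://stackoverflow.com/queStioNs"
        };
        for (String url : urls) {
            System.out.println(url + " -> " + hasSegment(url, "questions"));
        }
    }
}
```

Matching on segments also avoids accepting a path such as /questionsabc, which a plain substring check would let through.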

URL rewriting using Tuckey

My project (we have Spring 3) needs to rewrite URLs from the form
localhost:8888/testing/test.htm?param1=val1&paramN=valN
to
localhost:8888/nottestinganymore/test.htm?param1=val1&paramN=valN
My current rule looks like:
<from>^/testing/(.*/)?([a-z0-9]*.htm.*)$</from>
<to type="passthrough">/nottestinganymore/$2</to>
But my query parameters are being doubled, so I am getting param1=val1,val1 and paramN=valN,valN...please help! This stuff is a huge pain.
To edit/add, we have use-query-string=true on the project and I doubt I can change that.
The regular expression needs some tweaking. Tuckey uses the Java regular expression engine unless specified otherwise. Hence the best way to deal with this is to write a small test case that confirms whether your regular expression is correct. For example, a slightly tweaked version of your regular expression with a test case is below.
// Needs java.util.regex.Pattern, java.util.regex.Matcher and JUnit's @Test.
@Test
public void testRegularExpression() {
    String regexp = "/testing/(.*)([a-z0-9]*.htm.*)$";
    String url = "localhost:8888/testing/test.htm?param1=val1&paramN=valN";
    Pattern pattern = Pattern.compile(regexp);
    Matcher matcher = pattern.matcher(url);
    if (matcher.find()) {
        // Print each captured group to see what $1 and $2 will contain.
        System.out.println("$1 : " + matcher.group(1));
        System.out.println("$2 : " + matcher.group(2));
    }
}
The above will print the output as follows :
$1 : test
$2 : .htm?param1=val1&paramN=valN
You can modify the expression now to see what "groups" you want to extract from URL and then form the target URL.
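As one possible variation to experiment with, the sketch below escapes the dot before "htm" and captures the query string in its own group, so it is easy to see which part of the URL each group holds. This is only a way to inspect the groups; whether leaving the query string out of the target stops the doubled parameters in Tuckey is an assumption that would need testing. TuckeyPatternCheck and groups are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TuckeyPatternCheck {
    // Groups: 1 = optional sub-directories, 2 = page name, 3 = query string.
    static final Pattern FROM =
            Pattern.compile("^/testing/(.*/)?([a-z0-9]*\\.htm)(\\?.*)?$");

    // Returns the three captured groups, or null when the URL does not match.
    // A group that did not take part in the match is null.
    static String[] groups(String url) {
        Matcher m = FROM.matcher(url);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String[] g = groups("/testing/test.htm?param1=val1&paramN=valN");
        System.out.println("$1 : " + g[0]);
        System.out.println("$2 : " + g[1]);
        System.out.println("$3 : " + g[2]);
    }
}
```

With the groups separated like this, a target can be built from $2 alone or from $2 followed by $3, depending on which one behaves correctly with use-query-string=true.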

Proper gsub regular expression for this URL?

Say I have a string representing a URL:
http://www.mysite.com/somepage.aspx?id=33
..I'd like to escape the forward slashes and the question mark:
http:\/\/www.mysite.com\/somepage.aspx\?id=33
How can I do this via gsub? I've been playing with some regular expressions in there but haven't hit on the winning formula yet.
I suggest you use
url = url.gsub(/(?=[\/?])/, '\\')
As shown here
url = 'http://www.mysite.com/somepage.aspx?id=33'
url = url.gsub(/(?=[\/?])/, '\\')
puts url
output
http:\/\/www.mysite.com\/somepage.aspx\?id=33
How about this one: result = searchText.gsub(/(\/|\?)/, '\\\\\1') (in a Ruby replacement string the backreference is written \1, not $1).
I would suggest using a block to make it more readable:
url.gsub(/[\/?]/) { |c| "\\#{c}" }
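For comparison, the same escaping can be sketched in Java, where the replacement string uses $0 for the matched text and a literal backslash has to be written as four backslashes in source code. UrlEscape and escapeSlashesAndQuestionMarks are hypothetical names.

```java
class UrlEscape {
    // Prefixes every '/' and '?' with a backslash.
    // In the replacement, "\\\\" yields one literal backslash
    // and "$0" re-inserts the matched character.
    static String escapeSlashesAndQuestionMarks(String url) {
        return url.replaceAll("[/?]", "\\\\$0");
    }

    public static void main(String[] args) {
        System.out.println(escapeSlashesAndQuestionMarks("http://www.mysite.com/somepage.aspx?id=33"));
    }
}
```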
