I'm building a new version of a website and I have to handle the old URLs, which look like this:
http://website.com/article_title_rewrited-article_id.html
Currently, I'm trying something like this:
app.get('/:title\-:id([0-9]).html', function...);
But of course it fails!
Can I do this kind of rewriting with Express, or do I have to use another method to port the URL rewriting?
Thanks in advance!
I see you are trying to mix the Express route-string parser with a regular expression. Unfortunately this doesn't work; you have to use one or the other.
Replace the route string with a regex pattern. The capture groups of the regex will then be available as req.params[0] and req.params[1].
app.get(/(.+)\.html/, function(req, res, next) {
  // The capture group already excludes '.html', so no trimming is needed
  res.redirect(req.params[0]);
});
I believe (untested) that this should handle all .html URLs generically, if that is helpful as a guide :)
So I'm working on a crawler to get a bunch of images on a page that are saved as links. The relevant code, at the moment, is:
def parse_html(html)
  html_doc = Nokogiri::HTML(html)
  nodes = html_doc.xpath("//a[@href]")
  nodes.inject([]) do |uris, node|
    uris << node.attr('href').strip
  end.uniq
end
I am currently getting a bunch of links, most of which are images, but not all. I want to narrow down the links with a regex before downloading. So far, I haven't been able to come up with a Ruby-friendly regex for the job. The best I have is:
^https?:\/\/(?:[a-z0-9\-]+\.)+[a-z]{2,6}(?:/[^\/?]+)+\.(?:jpg|gif|png)$.match(nodes)
Admittedly, I got that regex from someone else, and tried to edit it to work and I'm failing. One of the big problems I'm having is the original Regex I took had a few "#"'s in it, which I don't know if that is a character I can escape, or if Ruby is just going to stop reading at that point. Help much appreciated.
I would consider modifying your XPath to include your logic. For example, if you only wanted the a elements that contained an img you can use the following:
"//a[img][@href]"
Or even go further and extract just the URIs directly from the href values:
uris = html_doc.xpath("//a[img]/@href").map(&:value)
As some have said, you may not want to use Regex for this, but if you're determined to:
^http(s?):\/\/.*\.(jpeg|jpg|gif|png)
is a pretty simple one that will match anything beginning with http or https and containing one of the listed file extensions. You should be able to figure out how to extend it; Rubular.com is good for experimenting with these.
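As a rough sketch of how that pattern could be applied in Ruby, the snippet below filters an array of links with grep. The sample links and the variable names are made up for illustration. The \A and \z anchors and the i flag are additions (assumptions, not part of the answer above) so that the whole string has to match and upper-case extensions are accepted.

```ruby
# Pattern from the answer above, anchored to the whole string and made
# case-insensitive (both changes are assumptions of this sketch).
image_url = %r{\Ahttps?://.*\.(?:jpe?g|gif|png)\z}i

# Hypothetical sample data standing in for the hrefs returned by parse_html.
links = [
  "http://example.com/a/photo.jpg",
  "https://example.com/b/logo.PNG",
  "http://example.com/page.html",
  "ftp://example.com/c/pic.gif"
]

# Array#grep keeps the elements for which `image_url === element` is true.
image_links = links.grep(image_url)
puts image_links
```

Using %r{...} instead of /.../ avoids having to escape the forward slashes in the URL.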
Regular expressions are a very powerful tool but, compared to simple string comparisons, they are pretty slow.
For your simple example, I would suggest using a simple condition like:
IMAGE_EXTS = %w[gif jpg png]
if IMAGE_EXTS.any? { |ext| uri.end_with?(ext) }
  # ...
end
In the context of your question, you might want to change your method to:
IMAGE_EXTS = %w[gif jpg png]
def parse_html(html)
  uris = []
  Nokogiri::HTML(html).xpath("//a[@href]").each do |node|
    uri = node.attr('href').strip
    uris << uri if IMAGE_EXTS.any? { |ext| uri.end_with?(ext) }
  end
  uris.uniq
end
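One limitation of a bare end_with?(ext) check is that it accepts a link such as ".../notapng" and rejects "photo.jpg?size=large". A possible variation, sketched below, compares the file extension of the URL path instead. The helper name image_link? and the sample links are hypothetical, and the extension list is only an example.

```ruby
require "uri"

# Returns true when the path part of the link ends in one of the given
# extensions. Only URI#path is inspected, so query strings are ignored.
def image_link?(link, exts = %w[.gif .jpg .jpeg .png])
  path = URI.parse(link).path.to_s
  exts.include?(File.extname(path).downcase)
rescue URI::InvalidURIError
  false # hrefs that are not valid URIs are treated as non-images
end

# Hypothetical sample hrefs.
links = [
  "http://example.com/a/photo.jpg?size=large",
  "http://example.com/b/LOGO.PNG",
  "http://example.com/notapng",
  "not a uri"
]

images = links.select { |link| image_link?(link) }
puts images
```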
I have the following in my routes.php:
Route::get('test/foo', 'TestController@index');
Route::get('test/bar', 'TestController@index');
Route::get('test/baz', 'TestController@index');
and I am trying to reduce this to the following:
Route::get(either 'test/foo' or 'test/bar' or 'test/baz', 'TestController@index');
One documented approach that would sort of apply here is to place a regex constraint on the route:
Route::get('test/{uri}','TestController@index')->where('uri', 'regex for foo, bar, and baz...');
However, this solution would be ugly. Isn't there an elegant way to just express
{uri} in foo,bar,baz
in Laravel's language? Otherwise, what would the regex look like?
Thanks!
P.S. I've read this, but it didn't apply to my case with 3 routes.
I'm not sure why you say that regex is ugly. I think regex is one of the most powerful tools available.
In your case, I think the below snippet should do the job:
Route::get('user/{name}', function ($name) {
    //
})->where('name', '(foo|bar|baz)');
The regex (foo|bar|baz) will match any of these strings: 'foo', 'bar', 'baz'. If you need more, just add a pipe (|) followed by the additional string.
I have a number of routes that can look like this:
possible routes:
- mac-book-retina-17-pid234-234
- hp-laptop-pid234-234
- vaoe-x12-pid234-234
and I want to match all of them to one action using constraints in the Rails routes file. Something like
get 'products/:product_info', to: 'products#type', constraints: { product_info: /[a-z]+-[a-z]+-[a-z]+-pid\d+-\d+/ }
The problem is that the /[a-z]+-/ part can be repeated one, two, or three times, which makes it hard to write a single regex that covers all the cases.
The only part that is constant in all routes is the last part: pid234-234 which refers to the product id and another sub_id.
I am thinking of something like: match everything until you reach this part (pid), but I do not know how to do that.
I would say a good place to start is dynamic-segments
get 'products/:product_info', to: 'products#type', constraints: { product_info: /[A-Z]\d{5}/ }
I hope that this helps
Happy Hacking
I think I managed to find a possible solution for this:
(.*)pid\d+-\d+
This regex matches everything up to and including the trailing pid part (for example pid234-234).
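A quick way to check this idea outside of Rails is to run the pattern against the sample routes in plain Ruby. The sketch below is a variation, not the exact pattern above: it adds capture groups for the two ids, and it adds \A and \z because, as far as I know, Rails matches a segment constraint against the whole segment and does not allow anchors in the constraint itself. The hash keys name, pid and sub_id are made up for illustration.

```ruby
# Variation of (.*)pid\d+-\d+ with capture groups and explicit anchors.
pattern = /\A(.*)-pid(\d+)-(\d+)\z/

samples = %w[
  mac-book-retina-17-pid234-234
  hp-laptop-pid234-234
  vaoe-x12-pid234-234
  hp-laptop
]

# Map each sample to its parsed parts, or nil when it does not match.
parsed = samples.map do |segment|
  m = pattern.match(segment)
  m && { name: m[1], pid: m[2].to_i, sub_id: m[3].to_i }
end

parsed.each { |result| p result }
```

In the routes file itself the constraint would be written without the anchors.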
I'm new to regular expressions. I need to accept all subdomains like:
something.mysite.com
something2.mysite.com
anotherthing.mysite.com
What kind of regex can I put there if I want to do something like:
rack_env['SERVER_NAME'].match <regex>
You shouldn't be using a regex here. The way to go is:
rack_env['SERVER_NAME'].end_with?(".mysite.com")
Something along the lines of \.mysite\.com$ should work. http://rubular.com is a good resource for testing regular expressions.
[a-zA-Z0-9]+\.[a-z]+\.com
something.mysite.com //ok
something2.mysite.com //ok
anotherthing.mysite.com //ok
something2mysite.com //not ok
anotherthing.mysitecom //not ok
But it is risky, because you.can.have.as.many.subdomains.as.you.want in the future.
If it is just the subdomains that are changing, you could use:
/\w+\.mysite\.com/
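To compare the suggestions in this thread, here is a small sketch that runs an anchored regex and the end_with? check over a few host names. The host list is invented for illustration, and the anchored pattern is an assumption of this sketch rather than something proposed above. Without \A and \z, a pattern like /\w+\.mysite\.com/ would also match a host such as x.mysite.com.evil.example, because an unanchored regex only has to match somewhere inside the string.

```ruby
# Exactly one subdomain label in front of mysite.com, whole string only.
single_subdomain = /\A[a-z0-9-]+\.mysite\.com\z/i

# Hypothetical SERVER_NAME values.
hosts = %w[
  something.mysite.com
  something2.mysite.com
  anotherthing.mysite.com
  something2mysite.com
  mysite.com.evil.example
  a.b.mysite.com
]

regex_ok  = hosts.select { |host| host.match?(single_subdomain) }
suffix_ok = hosts.select { |host| host.end_with?(".mysite.com") }

puts "regex:  #{regex_ok.inspect}"
puts "suffix: #{suffix_ok.inspect}"
```

The two checks differ only on a.b.mysite.com: the suffix check accepts nested subdomains, while this regex allows a single label.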
I have the following:
tmpArray[cTerms++] = "[sclenka] CONTAINS \"*" + sessionScope.sclenka +"*\"";
(With the help of Per Henrik Lausten)
Which should result in: "*term*"
But it doesn't, I get this instead: "term"
So, my question is how do I use wildcard full text search?
Thank you!
If you want to use a wildcard search, then generate the following query string:
tmpArray[cTerms++] = "[sclenka] = \"*" + sessionScope.sclenka +"*\"";
This should generate a search on "*search query*".
In general, this is a good way of performing a search, since users probably expect your search to work like that.
Source: http://www-10.lotus.com/ldd/ddwiki.nsf/dx/Searching_for_Documents#Full-text+Search
If your string is correct and you are getting no results, then test the same string in the Notes client FTI search.
You can also use the following debug on the server.
DEBUG_FTV_SEARCH=1
Then check the output on the domino console when you do a search.
So if I understand you, the result is an escaped form of the search term in which the asterisks have been removed?
Could you use the construct:
tmpArray[cTerms++] = "[sclenka] CONTAINS \"" + String.fromCharCode(42) + sessionScope.sclenka + String.fromCharCode(42) + "\"";
At least that should avoid escaping?
I think you have missed some escape characters in the string you are generating.
tmpArray[cTerms++] = "[sclenka] CONTAINS \"" + sessionScope.sclenka +"\"";
leyrer, is it possible -- just possible -- that you're doing this in a browser and your session is not authenticated? If so, you may be searching the database as "anonymous" where when you test from the browser you're searching as "leyrer".
It's just a thought - but I used to see that all the time when people would start using my NCT Search tools. They'd swear they were getting no results, and when I'd dig I'd always find that they were using the browser as anonymous rather than as a logged in session.
@GKIDD
I just tested this on my own site. I have NCTSearch set up. It accepts the search term from the web and runs database.ftsearch() as part of its job from within LotusScript.
I searched on "data*" and got at least as many results as when I searched on "database".
Based on that, I think something else is going on.
From my earlier comment on the other answer, try this: create another agent that does JUST the search. Have it grab the search term from the agent context as if it were a docid. Call the agent from the first agent using "agent.runonserver(searchterm)" and see if you can fool it.
Andrew, I'm getting the results with Anonymous user, but not with the wildcard. Here goo.gl/YVtXm on the first line, it says that CONTAINS or contains or = does not work when searching from the web.