URL being stored in SCRIPT_NAME on subsequent requests in IIRF? - url-rewriting

I'm having an issue with IIRF (Ionic's Isapi Rewrite Filter) on IIS6 (although I'm not sure the IIS version is relevant). Rewriting works on initial requests, but on a standard refresh (i.e. F5, not CTRL + F5) it will 404 most of the time, although the failure is intermittent. My rewrite rule itself seems correct: I've tested it on various regex testers, and it works fine on a clear-cache refresh. I'm not an expert, but on the requests that fail, the URL of the page I'm trying to hit is being fed through in the SCRIPT_NAME server variable rather than arriving as the URL; the URL instead appears to be the path I want to rewrite to (although, as I say, it 404s, so it doesn't appear to actually be going to that path in these cases). As you'll see quite quickly, I'm essentially just doing extensionless URL rewriting. I've tried adding rules that rewrite on SCRIPT_NAME, but no luck so far.
My config is:
RewriteLog iirf
RewriteLogLevel 5
RewriteEngine ON
IterationLimit 5
UrlDecoding OFF
# Rewrite all extensionless URLs to index.html
RewriteRule ^[^.]*$ /appname/index.html
See the log below, which shows a case of it NOT working. I'm hitting /appname/task/5, but that path appears to be stored in SCRIPT_NAME. Strangely, the URL it tries to rewrite is actually the URL I want it to rewrite to. Again, this only happens on subsequent requests; on the first request it almost always rewrites without issue and the page loads fine.
Tue Jul 12 10:17:33 - 4432 - Cached: DLL_THREAD_DETACH
Tue Jul 12 10:17:33 - 4432 - Cached: DLL_THREAD_DETACH
Tue Jul 12 10:17:33 - 4432 - HttpFilterProc: SF_NOTIFY_URL_MAP
Tue Jul 12 10:17:33 - 4432 - HttpFilterProc: cfg= 0x01C8CC60
Tue Jul 12 10:17:33 - 4432 - HttpFilterProc: SF_NOTIFY_AUTH_COMPLETE
Tue Jul 12 10:17:33 - 4432 - DoRewrites
Tue Jul 12 10:17:33 - 4432 - GetServerVariable_AutoFree: getting 'QUERY_STRING'
Tue Jul 12 10:17:33 - 4432 - GetServerVariable_AutoFree: 1 bytes
Tue Jul 12 10:17:33 - 4432 - GetServerVariable_AutoFree: result ''
Tue Jul 12 10:17:33 - 4432 - GetHeader_AutoFree: getting 'method'
Tue Jul 12 10:17:33 - 4432 - GetHeader_AutoFree: 4 bytes ptr:0x000D93A8
Tue Jul 12 10:17:33 - 4432 - GetHeader_AutoFree: 'method' = 'GET'
Tue Jul 12 10:17:33 - 4432 - DoRewrites: Url: '/appname/index.html'
Tue Jul 12 10:17:33 - 4432 - EvaluateRules: depth=0
Tue Jul 12 10:17:33 - 4432 - GetServerVariable: getting 'SCRIPT_NAME'
Tue Jul 12 10:17:33 - 4432 - GetServerVariable: 16 bytes
Tue Jul 12 10:17:33 - 4432 - GetServerVariable: result '/appname/task/5'
Tue Jul 12 10:17:33 - 4432 - EvaluateRules: no RewriteBase
Tue Jul 12 10:17:33 - 4432 - EvaluateRules: Rule 1: pattern: ^[^.]*$ subject: /appname/index.html
Tue Jul 12 10:17:33 - 4432 - EvaluateRules: Rule 1: -1 (No match)
Tue Jul 12 10:17:33 - 4432 - EvaluateRules: returning 0
Tue Jul 12 10:17:33 - 4432 - DoRewrites: No Rewrite
Any help is much appreciated!
Thanks

I'm not sure what the reason for it being stored in the SCRIPT_NAME variable was, but I've written an extra rule to cater for it, which fixed it for me:
RewriteEngine ON
IterationLimit 5
UrlDecoding OFF
# Rewrite all extensionless URLs to index.html
RewriteRule ^[^.]*$ /appname/index.html
# Subsequent requests may store the URL in the SCRIPT_NAME variable for some reason
RewriteCond %{SCRIPT_NAME} ^[^.]*$ # If SCRIPT_NAME variable contains an extensionless URL
RewriteRule .*\S.* /appname/index.html [L] # Rewrite all URLs to index if RewriteCond met
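The matching logic of the two rules above can be sketched in Python. This is only a model of the regular expressions, not of IIRF itself; the function name `should_rewrite` is hypothetical, and the sample values are taken from the log in the question.

```python
import re

# Same pattern as the RewriteRule / RewriteCond above: no dot anywhere.
EXTENSIONLESS = re.compile(r"^[^.]*$")
# Pattern of the second RewriteRule: at least one non-whitespace character.
NON_BLANK = re.compile(r".*\S.*")


def should_rewrite(url, script_name):
    """Return True if either rule would rewrite to /appname/index.html."""
    # Rule 1: the URL itself is extensionless.
    if EXTENSIONLESS.search(url):
        return True
    # Rule 2: SCRIPT_NAME is extensionless and the URL is not blank.
    return bool(EXTENSIONLESS.search(script_name) and NON_BLANK.search(url))


# Values from the failing request in the log: the URL is already
# /appname/index.html, while SCRIPT_NAME holds the requested path.
print(should_rewrite("/appname/index.html", "/appname/task/5"))
```

With only the first rule, the failing request from the log is not rewritten, because `/appname/index.html` contains a dot; the extra condition on SCRIPT_NAME is what catches it.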

I've had this issue for a while, and to compound it I have it running on an Integrated Windows Authentication secure website. I think the problem affects more than just the requested uri. My guess is somehow IIRF is caching the previous request's directive set, and using that.
For testing, firstly I ensured that the HTTP response headers forced the browser not to cache any content, so the browser should always request and receive updated content. This does work as expected.
I had the IIRF directives parse the request through various destination script files I could flip between. What I found, after updating and saving both the IIRF.ini directives and the php files, was that each request, essentially after a soft refresh (F5), would indeed receive updated content (IIS was parsing and serving the scripts), but the directives executed were the previous ones.
eg, if a request resulted in a legitimate 404 error via the directives, then the next URL request, whether to an existing script or not, would also produce the 404 error. Getting a new full request (generally the Ctrl-F5 hard refresh, or a first request) would actually execute the proper/current directives.
After some header analysis, it seems that the requests showing problems have unsynchronized server variables, on first run through the directives:
SCRIPT_NAME holds the current URL being requested
REQUEST_URI showed the previous URL requested (the only variable with the previous URL)
That meant that the RewriteRule directives would be making use of the previous request_uri, which didn't hold the correct url. That seems to explain the 'old' incorrect end result on subsequent requests. I added the following directives, which seems to have solved this, at least on a basic level:
RewriteCond %{REQUEST_URI}::%{SCRIPT_NAME} !^([^\?]*?)(\?.*)?::\1$
RewriteRule ^ %{SCRIPT_NAME} [R=301,L,QSA]
The first check makes sure the pre-querystring value of REQUEST_URI matches the SCRIPT_NAME value.
The 301 forces the browser to a hard refresh, which should then synchronize the URLs (and any query string is also resent). It should be the first directive, with the rest following.
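As a rough illustration of what the RewriteCond pattern tests, here is the same regular expression applied in Python to the joined string `REQUEST_URI::SCRIPT_NAME`. The helper name `in_sync` is hypothetical, and this models only the regex, not how IIRF populates the variables.

```python
import re

# Same expression as in the RewriteCond: capture the path before any
# query string, then require SCRIPT_NAME (after "::") to equal it.
SYNC_PATTERN = re.compile(r"^([^\?]*?)(\?.*)?::\1$")


def in_sync(request_uri, script_name):
    """Return True when the path part of request_uri equals script_name."""
    return SYNC_PATTERN.search(request_uri + "::" + script_name) is not None


# Synchronized variables: no redirect would be issued.
print(in_sync("/appname/task/5?x=1", "/appname/task/5"))
# Stale REQUEST_URI from a previous request: the negated condition
# (!pattern) holds, so the 301 redirect to SCRIPT_NAME would fire.
print(in_sync("/appname/index.html", "/appname/task/5"))
```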
This doesn't work, however, if the same url is requested a 2nd time. The newly saved directives won't take until the hard refresh is sent. I haven't yet found a way to determine if the latest directives are being parsed on soft refresh of the same URL.
In context of my secure site, it also didn't seem to want to work with requests that didn't include the Authorization HTTP header. Each of the browsers does the login handshake process properly, but once authorized, if the browser doesn't send that header, it can cause cached 401 errors on requests that shouldn't produce it. So I determined that the same solution can apply to these requests - forcing the refresh as long as the browser has not sent the Authorization header:
RewriteCond %{REQUEST_URI}::%{SCRIPT_NAME} !^([^\?]*?)(\?.*)?::\1$ [OR]
RewriteCond %{HTTP_Authorization} !^.+$
RewriteRule ^ %{SCRIPT_NAME} [R=301,L,QSA]
I think this solution addresses the same-url soft refresh problem above, since it appears that the soft refresh doesn't send the Authorization header. Effectively, every request becomes a hard refresh, ensuring sync'd variables and Auth header for the secure content.
For false positives, in theory this solution may be insufficient if the same url is requested on a soft refresh after the directives have changed and the Authorization header is sent. Secure content will be most recent, but directives used will still be the previous version.
It may also be insufficient for any request being parsed with differing yet valid request_uri and script_name values (usually this is on the second iteration, though). One step may be to set an environment variable on the first iteration match, and only do the sync check and redirect if it's the first iteration of the directives. But IIRF doesn't seem to support the [E] modifier.
In short, I think this describes the problem:
Via IIRF, for any valid first request iteration, ^%{REQUEST_URI}(\?.+)?$ should always match %{SCRIPT_NAME}, but an untrappable old-directives parsing would occur if the variables are identical on first iteration, unless there's a way to check the cached (currently used) directives against the saved directives.
At any rate, hopefully this is one step closer to determining a universal workaround for this seemingly cached previous-request-and-directives IIRF bug.

Related

How to solve malformed URI while using elasticdump?

I am using Elasticsearch 7.9 and elasticdump 6.7.
I am trying to get a dump (.json) file from Elasticsearch with all the documents of an index.
I am getting
Thu, 27 May 2021 06:26:35 GMT | starting dump
Thu, 27 May 2021 06:26:35 GMT | Error Emitted => URI malformed
Thu, 27 May 2021 06:26:35 GMT | Error Emitted => URI malformed
Thu, 27 May 2021 06:26:35 GMT | Total Writes: 0
Thu, 27 May 2021 06:26:35 GMT | dump ended with error (get phase) => URIError: URI malformed
My command
elasticdump \
--input=https://username:password@elasticsearchURL:9200/index \
--output=/home/ubuntu/dump.json \
--type=data
The problem is that the password contains many special characters.
I can't change the password.
I tried:
Quoting the password.
Escaping the special characters.
Encoding the URL.
In all cases I get the same error.
Please help me send a password with special characters (# & % ^ * : ) , $).
Thanks in advance.
Actually, I tried elasticdump, but it did not work properly. You should use Elasticsearch snapshots instead: elasticdump may work with a small amount of data, but it won't work if the data is big.
Look at the Elasticsearch snapshot feature.
For example, registering a filesystem repository and using snapshots:
PUT /_snapshot/my_fs_backup?pretty
{
"type": "fs",
"settings": {
"location": "/home/zeus/Desktop/denesnapshot",
"compress": true
}
}
#chmod 777 /home/zeus/Desktop/denesnapshot
#create a snapshot named deneme in the repository
PUT /_snapshot/my_fs_backup/deneme
#list all snapshots in the repository
GET /_snapshot/my_fs_backup/_all
#delete the snapshot, if you want to
DELETE /_snapshot/my_fs_backup/deneme
#restore
POST /_snapshot/my_fs_backup/deneme/_restore
In addition, you should add
path.repo: /home/zeus/Desktop/denesnapshot
to the elasticsearch.yml file.
I imported 1.5 million records this way, which was 300GB.
My mistake was that I had encoded the complete URL. Don't encode the complete URL.
Encode only the password when adding it to the URL.
For example:
Password - a@b)c#d%e*f.^gh
Encoded password - a%40b%29c%23d%25e*f.%5Egh
My command becomes:
elasticdump \
--input=https://username:a%40b%29c%23d%25e*f.%5Egh@elasticsearchURL:9200/index \
--output=/home/ubuntu/dump.json \
--type=data
Please refer to an ASCII encoding reference for encoding the password.
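Rather than encoding the password by hand, the same step can be sketched with Python's standard library. This is a minimal sketch, assuming Python 3 is available; the password literal is the one the encoded value above decodes to, and `username` and `elasticsearchURL` are the placeholders from the question. Note that `quote` also escapes `*` as `%2A`, which decodes to the same character.

```python
from urllib.parse import quote, unquote

# Password that the encoded value above decodes to.
password = "a@b)c#d%e*f.^gh"

# safe="" makes quote() escape every reserved character, including
# "@", "#", "%" and "/", so none of them can break the URL structure.
encoded = quote(password, safe="")

# Only the password is encoded; the rest of the URL is left untouched.
input_url = "https://username:" + encoded + "@elasticsearchURL:9200/index"

print(encoded)
print(input_url)
```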

Optional regex pattern in td-agent

I have two different log line formats.
I need the client section in the expression below to be optional: if it is present it should be captured, otherwise it should be ignored.
\[(?<date>[^\]]*)\] \[(?<level>[^\]]*)\] \[(?<pid-tid>[^\]]*)\] (\[(?<client>[^\]]*)\]) (?<message>[^\]]*)
Log Lines - without client
[Mon Jan 18 21:55:58.239970 2016] [proxy_http:error] [pid 2769:tid 140041068427008] AH01114: HTTP: failed to make connection to backend: xx.xxx.xx.xx
Log Lines - with client
[Mon Jan 18 21:55:58.239970 2016] [proxy_http:error] [pid 2769:tid 140041068427008] [client xx.xxx.x.xx:10723] AH01114: HTTP: failed to make connection to backend: xx.xxx.xx.xx
I have tried like (.*?clientsection) -> 0 or more matches
\[(?<date>[^\]]*)\] \[(?<level>[^\]]*)\] \[(?<pid-tid>[^\]]*)\] (.*?(\[(?<client>[^\]]*)\])) (?<message>[^\]]*)
but it does not work
In your second expression, the obligatory space before (.*?(\[(?<client>[^\]]*)\])) is matched first; the group then captures any 0+ chars, as few as possible, matches [, captures 0+ chars other than ] into the "client" group, and matches ], all inside the numbered capture group. If the client part is missing from the text, your expression will still require a space, then a [...] substring, and then another space.
If you want to fix the regex, you need to make the "client" group optional and make sure the adjoining context is also made optional.
Replace the (.*?(\[(?<client>[^\]]*)\])) with (?: \[(?<client>[^\]]*)\])?. Here, (?:...)? is an optional non-capturing group that will create no subgroup (no capture), and will match 1 or 0 occurrences of its pattern, only if all that sequence is present.
See the Rubular demo (\n is added to the negated character classes since a multiline string is used for testing).
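The optional group can also be checked outside td-agent. The sketch below uses Python's `re` module, which requires two syntax changes that are assumptions of this example rather than part of the answer: named groups are written `(?P<name>...)`, and group names cannot contain a hyphen, so `pid-tid` becomes `pid_tid`. The helper name `parse` is hypothetical.

```python
import re

# The corrected expression, with the optional non-capturing group
# (?: \[...\])? around the client section.
LOG_PATTERN = re.compile(
    r"\[(?P<date>[^\]]*)\] \[(?P<level>[^\]]*)\] \[(?P<pid_tid>[^\]]*)\]"
    r"(?: \[(?P<client>[^\]]*)\])? (?P<message>[^\]]*)"
)

WITHOUT_CLIENT = (
    "[Mon Jan 18 21:55:58.239970 2016] [proxy_http:error] "
    "[pid 2769:tid 140041068427008] "
    "AH01114: HTTP: failed to make connection to backend: xx.xxx.xx.xx"
)
WITH_CLIENT = (
    "[Mon Jan 18 21:55:58.239970 2016] [proxy_http:error] "
    "[pid 2769:tid 140041068427008] [client xx.xxx.x.xx:10723] "
    "AH01114: HTTP: failed to make connection to backend: xx.xxx.xx.xx"
)


def parse(line):
    """Return the named groups as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


for line in (WITHOUT_CLIENT, WITH_CLIENT):
    fields = parse(line)
    # "client" is None when the optional section is absent.
    print(fields["client"], "|", fields["message"])
```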

Executable downloaded from Cloudfront in IE11 (Windows 7) downloads without a file extension

The long-short of it is an .exe downloaded from Cloudfront (using signed URLs) in IE11/Win7 downloads without an extension (exe_file.exe -> exe_file)
I don't think that it's the same issue as described here (among many, many other places) as the file is not renamed exe_file_exe, the extension is just dropped.
The file is being served from Cloudfront from S3 - and was uploaded via aws-cli
$ aws s3 cp exe_file.exe s3://cdn/exe_file.exe --content-type "application/x-msdownload"
as far as I'm aware the content-type argument isn't absolutely necessary as CF/S3/something, at some point, tries to do some intelligent MIME assigning (plus, before, when I was uploading without that arg, inspecting the download headers would show the correct MIME type).
Headers received when downloading the file
HTTP/1.1 200 OK
Content-Type: application/x-msdownload
Content-Length: 69538768
Connection: keep-alive
Date: Tue, 27 Dec 2016 17:36:51 GMT
Last-Modified: Thu, 22 Dec 2016 22:31:59 GMT
ETag: "c8fc68a920e198dca95e5549f8657bfb"
Accept-Ranges: bytes
Server: AmazonS3
Age: 335
X-Cache: Hit from cloudfront
This only happens in IE11 on Windows 7 - it works fine on IE11/Windows 10 (I say only but I have not tried on, for example, IE8 - you couldn't pay me enough money to put myself through that). And it does not happen with other downloads - dmg_file.dmg and linux_file.zip are both downloaded with the extension. Other browsers are also not impacted - they all download the file as-is in S3.
I have tried with and without AVs present - it does not make a difference.
You need to set the content-disposition correctly:
Forcing SaveAs using the HTTP header
In order to force the browser to show SaveAs dialog when clicking a hyperlink you have to include the following header in HTTP response of the file to be downloaded:
Content-Disposition: attachment; filename="<file name.ext>"; filename*=utf-8''<file name.ext>
Note: Those user agents that do not support the RFC 5987 encoding ignore filename* when it occurs after filename.
Where <file name.ext> is the filename you want to appear in the SaveAs dialog (like finances.xls or mortgage.pdf) - without the < and > symbols.
You have to keep the following in mind:
The filename should be in US-ASCII charset and shouldn't contain special characters: < > \ " / : | ? * space.
The filename should not have any directory path information specified.
The filename should be enclosed in double quotes but most browsers will support file names without double quotes.
Ancient browsers also required the following (not needed nowadays, but for a fool proof solution might be worth doing):
Content-Type header should be before Content-Disposition.
Content-Type header should refer to an unknown MIME type (at least until the older browsers go away).
So, you should use cp with options:
--content-type (string) Specify an explicit content type for this operation. This value overrides any guessed mime types.
--content-disposition (string) Specifies presentational information for the object.
--metadata-directive REPLACE Specifies whether the metadata is copied from the source object or replaced with metadata provided when copying S3 objects.
Note that if you are using any of the following parameters: --content-type, --content-language, --content-encoding, --content-disposition, --cache-control, or --expires, you will need to specify --metadata-directive REPLACE for non-multipart copies if you want the copied objects to have the specified metadata values.
try:
aws s3 cp exe_file.exe s3://cdn/exe_file.exe --content-type "application/x-msdownload" --content-disposition "attachment; filename=\"exe_file.exe\"; filename*=utf-8''exe_file.exe" --metadata-directive REPLACE
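If the header value has to be generated for many files, it can be built programmatically. The following is a minimal sketch in Python 3, not part of the AWS tooling; the function name `content_disposition` and the choice to replace unsupported characters in the plain `filename` parameter are assumptions of this example.

```python
from urllib.parse import quote


def content_disposition(filename):
    """Build an attachment header value with a plain and an RFC 5987 filename."""
    # Plain parameter: US-ASCII only; non-ASCII characters become "?".
    ascii_name = filename.encode("ascii", "replace").decode("ascii")
    # Double quotes and backslashes would break the quoted string.
    ascii_name = ascii_name.replace('"', "_").replace("\\", "_")
    # Extended parameter: percent-encoded UTF-8, placed after filename
    # so that older user agents ignore it.
    encoded_name = quote(filename, safe="")
    return 'attachment; filename="{}"; filename*=utf-8\'\'{}'.format(
        ascii_name, encoded_name
    )


print(content_disposition("exe_file.exe"))
```

The printed value can be passed to the --content-disposition option shown above.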
In addition to the accepted answer, I supplied my own response-content-disposition parameter to the Cloudfront Signer:
in Python, it looked like
from urllib.parse import quote_plus

from botocore.signers import CloudFrontSigner

def generate_presigned_url(filename, headers=None):
    # CF_KEY_ID, rsa_signer and CF_DOMAIN are defined elsewhere in the module
    cf_signer = CloudFrontSigner(CF_KEY_ID, rsa_signer)
    # Build the query string, e.g. response-content-disposition=...
    query = '&'.join('%s=%s' % (key, quote_plus(value)) for key, value in (headers or {}).items())
    return cf_signer.generate_presigned_url(
        'https://' + CF_DOMAIN + '/' + filename + ('?%s' % query if query else ''),
        # ... other params
    )
called using
cloudfront.generate_presigned_url(file_name, {
'response-content-disposition': 'attachment; filename="exe_file.exe"; filename*=utf-8\'\'exe_file.exe'
})

Get source code of this website

I would like to get some data about some books I want to buy. But for that I need to get the source code of the page, and I can't.
An example URL is:
http://www.mcu.es/webISBN/tituloDetalle.do?sidTitul=793927&action=busquedaInicial&noValidating=true&POS=0&MAX=50&TOTAL=0&prev_layout=busquedaisbn&layout=busquedaisbn&language=es
I'm testing with various possibilities in curl, wget, lynx, accepting cookies, etc.
# curl http://www.mcu.es/webISBN/tituloDetalle.do?sidTitul=793927&action=busquedaInicial&noValidating=true&POS=0&MAX=50&TOTAL=0&prev_layout=busquedaisbn&layout=busquedaisbn&language=es
[1] 1680
[2] 1681
[3] 1682
[4] 1683
[5] 1684
[6] 1685
[7] 1686
[8] 1687
If I look at the headers, I see a 302:
curl -I 'http://www.mcu.es/webISBN/tituloDetalle.do?sidTitul=793927&action=busquedaInicial&noValidating=true&POS=0&MAX=50&TOTAL=0&prev_layout=busquedaisbn&layout=busquedaisbn&language=es'
HTTP/1.1 302 Movido temporálmente
Date: Fri, 08 Jul 2016 09:31:07 GMT
Server: Apache
X-Powered-By: Servlet 2.4; JBoss-4.2.1.GA (build: SVNTag=JBoss_4_2_1_GA date=200707131605)/Tomcat-5.5
Location: http://www.mcu.es/paginaError.html
Vary: Accept-Encoding,User-Agent
Content-Type: text/plain; charset=ISO-8859-1
The same happens whether I use '', "", \? and \&, wget, lynx -source, accepting cookies, etc. The only thing I manage to download is the error page (the one the 302 redirects me to).
Does anyone know how I can download the source code of the example URL above? (Bash, PHP, Python, Perl ...)
Thank you very much.
The page you are looking for isn't available: if you visit the URL in a browser, you still won't get the information you need. If you need the source anyway, pass curl the -L flag so that it follows the redirect and fetches the source of the page it lands on.

Symfony 1.4 routing issues only when using index.php and mod_rewrite

The most succinct way of summarizing the problem at hand:
Development is over, and everything was run against frontend_dev.php during development and testing
This means that all URLs were: server.com/frontend_dev.php/module/action/parm
Moving to production means switching environments, and thusly using index.php instead
server.com/index.php/module/action/parm
Part of moving to production is using mod_rewrite under Apache2 to make the “index.php” part of the URL vanish, but still be functioning
server.com/module/action/parm is still routed against index.php
The URLs are indeed appearing w/o the index.php part, but symfony routing is now complaining:
ie, server.com/goals which routes to goals/index
-- perfectly fine using frontend_dev.php or index.php as an explicit controller
server.com/index.php/goals
-- using no explicit controller (via rewrite):
[Tue Dec 14 12:59:51 2010] [error] [client 75.16.181.113] Empty module and/or action after parsing the URL "/goals/" (/)
I have verified the rewrite is indeed routing to index.php by changing the rewrite to something that doesn’t exist:
[Tue Dec 14 13:05:43 2010] [error] [client 75.16.181.113] script '/opt/www/projects/adam/web/index2.php' not found or unable to stat
I have tried rerouting to frontend_dev.php, but only am provided with more debug information from symfony, none of which is helpful:
404 | Not Found | sfError404Exception Empty module and/or action after parsing the URL "/goals/" (/).
stack trace
1. at () in SF_SYMFONY_LIB_DIR/controller/sfFrontWebController.class.php line 44 ...
2. at sfFrontWebController->dispatch() in SF_SYMFONY_LIB_DIR/util/sfContext.class.php line 170 ...
3. at sfContext->dispatch() in SF_ROOT_DIR/web/frontend_dev.php line 13 ...
I have tried the using the RewriteBase option in .htaccess, but that does not help any, nor changing the true/false in the configuration line of the controllers
I hope this provides enough to understand why we’re confused, and able to direct us to a resolution.
Following is the current .htaccess and index/frontend configuration lines
Index.php:
$configuration = ProjectConfiguration::getApplicationConfiguration('frontend', 'prod', false);
Frontend_dev.php:
$configuration = ProjectConfiguration::getApplicationConfiguration('frontend', 'dev', true);
.htaccess:
RewriteEngine On
# uncomment the following line, if you are having trouble
# getting no_script_name to work
#RewriteBase /
# we skip all files with .something
#RewriteCond %{REQUEST_URI} \..+$
#RewriteCond %{REQUEST_URI} !\.html$
#RewriteRule .* - [L]
# we check if the .html version is here (caching)
RewriteRule ^$ index.html [QSA]
RewriteRule ^([^.]+)$ $1.html [QSA]
RewriteCond %{REQUEST_FILENAME} !-f
# no, so we redirect to our front web controller
RewriteRule ^(.*)$ index.php [QSA,L]
I had similar issue and setting 'AllowOverride' to ALL for Symfony's WEB folder in Virtual Host's config sorted out this problem.
Welcome to Stack Overflow.
Maybe you're confusing the "index" route with "index.php"?
These URLs should theoretically all work.
server.com/frontend_dev.php/goals/index
server.com/index.php/goals/index
server.com/goals/index
server.com/goals
I can't remember if the trailing slash, like server.com/goals/, works or not. There's a gotcha there.

Resources