Downloading a working local version of a website without js/css version names

Is there a way to wget a local version of a website without the version names on its JS/CSS files? The command I used to get the site is below:
wget --mirror --page-requisites --convert-links --adjust-extension --compression=auto --reject-regex "/search|/rss" --no-if-modified-since --no-check-certificate --user-agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36" http://www.example.com
But it crawled the files with their version names, so my JS file looks like this:
frontend.min.js#ver=2.5.11
Instead of
frontend.min.js
Also, the source code has the same thing:
../jquery/frontend.min.js?ver=2.5.11
I would like to avoid that and have the files saved without the version names/info.

You can try removing --page-requisites if you don't need things such as images or interactive elements. Removing it will cause wget not to download any CSS or JS files at all.
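If you do need the requisites, one workaround is to post-process the mirror after the download finishes: rename the saved files and rewrite the references inside the downloaded pages. The following is only a sketch, assuming GNU find/sed, that the mirror landed in www.example.com/, and that the version suffixes look like ?ver=2.5.11 or #ver=2.5.11 as in your example:

#!/bin/bash
# Sketch: strip "?ver=..." / "#ver=..." suffixes from mirrored file names
# and from references inside the downloaded pages. Test on a copy first.
cd www.example.com || exit 1

# Rename files such as "frontend.min.js?ver=2.5.11" to "frontend.min.js"
find . -type f \( -name '*\?ver=*' -o -name '*#ver=*' \) | while read -r f; do
    mv -- "$f" "${f%%[?#]ver=*}"
done

# Rewrite references such as "../jquery/frontend.min.js?ver=2.5.11" in the HTML;
# %3F covers links that wget percent-encoded while converting them.
# Adjust the version pattern [0-9.]* if your versions are not purely numeric.
find . -type f -name '*.html' -exec sed -i -E 's/([.](js|css))(%3F|[?#])ver=[0-9.]*/\1/g' {} +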

Related

"At least one output file must be specified" error message issue

I want to export an m3u8, which I assume is a streaming URL, to mp4.
While googling for a way to do it, I found this gist, which suggests using ffmpeg: https://gist.github.com/primaryobjects/7423d7982656a31e72542f60d30f9d30
So I installed ffmpeg and typed the command in CMD, but had no luck with this code:
ffmpeg -user_agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/601.7.8 (KHTML, like Gecko) Version/9.1.3 Safari/537.86.7" -i
http://221.157.125.242:1935/mp4/mp4/happyday/mp4:happyday190201_00W.mp4/playlist.m3u8
-c copy pd.mkv
but it says:
"At least one output file must be specified"
How do I solve this problem?
You are probably missing double quotes around the URL passed to -i.
Try it this way:
ffmpeg -user_agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/601.7.8 (KHTML, like Gecko) Version/9.1.3 Safari/537.86.7" -i "http://221.157.125.242:1935/mp4/mp4/happyday/mp4:happyday190201_00W.mp4/playlist.m3u8" -c copy pd.mkv
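If you want to keep the command on several lines for readability in CMD, end each line with the ^ continuation character; otherwise CMD runs each line as a separate command:

ffmpeg -user_agent "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/601.7.8 (KHTML, like Gecko) Version/9.1.3 Safari/537.86.7" ^
  -i "http://221.157.125.242:1935/mp4/mp4/happyday/mp4:happyday190201_00W.mp4/playlist.m3u8" ^
  -c copy pd.mkv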

How to set the user agent for PhantomJS from the command line

How can I set a user agent for PhantomJS? I am currently running it with the following command on an AWS EC2 instance:
phantomjs --web-security=no --ssl-protocol=any --ignore-ssl-errors=true driver.js http://example.com
You can set a user agent in PhantomJS only in the script (driver.js in your example). The documentation is here: http://phantomjs.org/api/webpage/property/settings.html
If you want to pass the user agent to PhantomJS on the command line, you can pass it as an argument, read it in the script, and set it as the user agent there. You can try the example below (ua.js):
var webPage = require('webpage');
var system = require('system');
var page = webPage.create();
// The first command-line argument (after the script name) is the user agent string
var userAgent = system.args[1];
page.settings.userAgent = userAgent;
console.log('user agent: ' + page.settings.userAgent);
phantom.exit();
Running it as follows:
$ phantomjs ua.js "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36"
you will get output:
user agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36
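Applied to your original command, and assuming you modify driver.js to read the value from system.args (with the URL staying in system.args[1], the user agent would then be system.args[2]), the call would look something like:

phantomjs --web-security=no --ssl-protocol=any --ignore-ssl-errors=true driver.js http://example.com "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120 Safari/537.36"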

Pattern matching with multiple parameters in a shell script

We face a complicated issue with an Apache web server running on Linux: intermittently Apache returns 5XX errors for some URLs, and not continuously. It starts with a few requests and grows over time. The issue resolves once we restart Apache.
We are trying to fix the root cause, but until then we need a workaround: a script that monitors the Apache access log and restarts Apache whenever the issue occurs.
Our idea was a shell script that tails the log and greps all 5xx errors into a separate file, plus another script triggered by cron that checks whether the error is repeated a certain number of times within a given time window.
My problem is that the URLs are not always the same, so I have to grep the file that holds all the 5XX errors and check whether the URLs are repeated, and when.
Can anyone suggest some logic for filtering the errors? I tried to be clear, but I'm not sure this is the best way of explaining the issue.
The log values below are slightly modified, but the format is the same.
x.x.x.x, y.y.y.y - - [11/May/2016:08:29:05 +0800](0) "HTTPS" "GET /html/js/barebone.jsp?browserId=other&themeId=expressportal_WAR_expressportaltheme&colorSchemeId=01&minifierType=js&minifierBundleId=javascript.barebone.files&languageId=en_US&b=6200&t=1462268846000 HTTP/1.1" 502 319 "https://myportal.test.com/web/guest/home" "Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36"
x.x.x.x, y.y.y.y - - [11/May/2016:08:29:05 +0800](0) "HTTPS" "GET /combo/?browserId=other&minifierType=&languageId=en_US&b=6200&t=1462268846000&/html/js/aui/event-touch/event-touch-min.js&/html/js/aui/event-move/event-move-min.js HTTP/1.1" 502 319 "https://myportal.test.com/web/guest/home" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36"
x.x.x.x, y.y.y.y - - [11/May/2016:08:29:05 +0800](0) "HTTPS" "GET /html/js/liferay/available_languages.jsp?browserId=other&themeId=expressportal_WAR_expressportaltheme&colorSchemeId=01&minifierType=js&languageId=en_US&b=6200&t=1462268846000 HTTP/1.1" 502 319 "https://myportal.test.com/web/guest/home" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36"
x.x.x.x, y.y.y.y - - [11/May/2016:08:29:05 +0800](0) "HTTPS" "GET /combo/?browserId=other&minifierType=&languageId=en_US&b=6200&t=1462268846000&/html/js/aui/widget-stack/assets/skins/sam/widget-stack.css HTTP/1.1" 502 319 "https://myportal.test.com/web/guest/home" "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.94 Safari/537.36"
Are you 100% sure a restart fixes the 500 errors? If so, a line like this in the crontab should do:
tail -n 100 /var/log/apache2/access.log | awk '{if ($9 >= 500) {nb += 1}} END {if (nb > 10) {exit 1}}' || service apache2 restart
It means that if there are more than 10 errors in the last 100 lines, Apache is restarted. You may change the values for your specific problem. Note that $9 is the status code in the standard combined log format; in the format you posted the status is the 11th field, so adjust the field number accordingly.
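If you want something a bit more self-contained than the one-liner, here is a sketch of a watchdog script you could run from cron. It assumes the log format you posted (status code in field 11), Debian-style paths and service names, and thresholds you would tune yourself:

#!/bin/bash
# Sketch only: restart Apache if too many 5xx responses appear in the recent log.
LOG=/var/log/apache2/access.log
WINDOW=100      # number of recent log lines to inspect
THRESHOLD=10    # restart if more than this many 5xx responses are found

# Count 5xx status codes in the last $WINDOW lines (field 11 in the posted format)
count=$(tail -n "$WINDOW" "$LOG" | awk '$11 >= 500 && $11 < 600 {n++} END {print n+0}')

if [ "$count" -gt "$THRESHOLD" ]; then
    echo "$(date): $count 5xx responses in last $WINDOW lines, restarting Apache" >> /var/log/apache-watchdog.log
    service apache2 restart
fi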
Otherwise, the first thing I can think of is: upgrade your Apache if it's not up to date.

How to remove more than one field from a file using a Unix script

I have a file which has two '-' (hyphen or minus) symbols as fields. It has 21 fields in total. I can count the position of those fields: they are $2 and $3. How would I remove such fields using a Unix shell script? Sample data is given below:
192.168.1.223 - - [15/Jul/2015:16:54:07 +0530] "GET / HTTP/1.1" 403 4954 "-" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2453.0 Safari/537.36"
192.168.1.223 - - [15/Jul/2015:16:54:08 +0530] "GET /icons/apache_pb.gif HTTP/1.1" 200 2326 "http://192.168.1.232/" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2453.0 Safari/537.36"
I want to remove the '-' at positions $2 and $3.
But I would like to add a point: I can figure out these positions because they are known to me. What if I do not know the positions and the number of fields is larger? I want to automate this so that the script finds such fields in each line and removes them.
In short, I want to write a script that checks each field of the given file and removes the fields that are junk characters like '-' (hyphen or minus).
The following works on my (Linux) machine to remove columns 2 and 3:
cut -d ' ' --complement -f 2,3
The --complement option is a GNU extension, so it may not be available on non-GNU systems (e.g. BSD or macOS).
On the other hand, if you want to remove fields consisting of - no matter where they appear, try:
perl -pe 's/ -(?= )//g'
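If you prefer awk, here is a rough equivalent (again, just a sketch; replace yourfile with your log file) that drops every whitespace-separated field consisting of just a -, wherever it appears:

awk '{out = ""; for (i = 1; i <= NF; i++) if ($i != "-") out = out (out ? OFS : "") $i; print out}' yourfile

Like the Perl one-liner, this leaves the quoted "-" referrer field alone, since that field includes the surrounding double quotes.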

String manipulation and variables in a bash script

I'm trying to download something with wget using a for loop in a bash script.
When I'm not using a variable everything works fine; when I put the value in a variable I get a 500 server error. This is strange to me, because it is only copy-paste.
What I'm trying to do is take the number from the loop variable i and paste it into the POST body.
Here is my code:
#!/bin/bash
for i in {1..5}
do
STR="some_static_stuff_before"$i"some_static_suff_after"
echo $STR
wget -O ready/page$i.aspx --header="Host: www.something.com" --header="Pragma: no-cache" --header="Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8" --header="Accept-Language: en-en" --header="Accept-Encoding: gzip, deflate" --header="Content-Type: application/x-www-form-urlencoded" --header="Origin: http://something.com" --header="Connection: keep-alive" --header="User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_1) AppleWebKit/537.73.11 (KHTML, like Gecko) Version/7.0.1 Safari/537.73.11" --header="Referer: http://www.something.com/something.aspx" --header="Cookie: ASP.NET_SessionId=u5cmt0figi4bvs40a30gnwsa; __utma=20059042.38323768.1389369038.1389710153.1389780868.6; __utmb=20059042.2.10.1389780868; __utmc=20059042; __utmz=20059042.1389627823.2.2.utmcsr=something.com|utmccn=(referral)|utmcmd=referral|utmcct=/something.aspx" --post-data='"$STR"' http://something.com/something.aspx
done
And when I paste the string directly into --post-data there is no problem downloading the content.
I've tried --post-data= "/"$STR/"" and --post-data='"$STR"' and it still isn't working.
You single-quoted the variable reference (in addition to double-quoting it), which prevents substitution of the variable value.
Instead of
--post-data='"$STR"'
use
--post-data="$STR"
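For reference, a minimal sketch of the corrected loop (most headers omitted for brevity; keep the ones from your original command):

#!/bin/bash
for i in {1..5}
do
    # Build the POST body with the loop counter spliced in
    STR="some_static_stuff_before${i}some_static_suff_after"
    echo "$STR"
    # Double quotes let the shell expand $STR; single quotes would send the literal text $STR
    wget -O "ready/page$i.aspx" \
         --header="Content-Type: application/x-www-form-urlencoded" \
         --post-data="$STR" \
         http://something.com/something.aspx
done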
