Downloading text files only with wget - ftp

Using wget, is it possible to spider a host for a specific file type? I'm archiving some documents from an FTP server and need wget to crawl the entire host, downloading only .txt files.
I've attempted it like so:
wget mysite.com/ftplist --config=./.wgetrc
With the following .wgetrc:
accept = txt
check_certificate = off
connect_timeout = 3
cookies = off
dns_cache = off
follow_ftp = on
logfile = amz.log
max_redirect = 3
no_clobber = on
recursive = on
save_headers = on
This requests mysite.com/ftplist, a page that contains a list of ftp:// URLs. wget fetches this page but doesn't proceed any further; it seems to stop there.
Here is amz.log:
Saving to: ‘mysite.com/ftplinks/index.html.tmp’
0K .......... .......... .......... .......... .......... 656K
50K .......... .......... .......... .......... .......... 741K
100K .......... .......... .......... .......... .......... 1.12M
150K .......... .......... .......... .......... .......... 975K
200K .......... .......... .......... .......... .......... 935K
250K .......... .......... .......... .......... .......... 835K
300K .......... .......... .......... .......... .......... 870K
350K .......... .......... .......... .......... .......... 1.07M
400K .......... .......... .......... ....... 907K=0.5s
2018-12-20 17:55:54 (881 KB/s) - ‘mysite.com/ftplinks/index.html.tmp’ saved [447555]
Removing mysite.com/ftplinks/index.html.tmp since it should be rejected.
Am I missing something?
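One possible workaround, offered as a sketch rather than a confirmed fix: pull the ftp:// links out of the saved list page yourself and hand them to wget directly, so the crawl no longer depends on wget following FTP links out of the HTML page. The helper name extract_ftp_urls and the file name index.html are hypothetical; -r, -A, -nc and -i are standard wget options.

```shell
# Print each ftp:// URL found in an HTML file, one per line, de-duplicated.
# A URL is taken to end at the first quote, angle bracket, or space.
extract_ftp_urls() {
    grep -oE "ftp://[^\"'<> ]+" "$1" | sort -u
}

# Possible usage (not run here, needs network access):
#   wget -O index.html mysite.com/ftplist
#   extract_ftp_urls index.html | wget -r -A txt -nc -i -
```

Each extracted URL becomes its own starting point for the recursive crawl, and -A txt keeps only the .txt files.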

Related

Get Mbps from wget -o logged output

I'm using wget inside Python to test internet speed. My goal is to track the latency throughout the download, so I need to know the Mbps at least every second during the download.
If I manually run script and then wget I get the desired output ...
--2022-06-20 04:14:13-- https://speed.hetzner.de/100MB.bin
Resolving speed.hetzner.de (speed.hetzner.de)... 88.198.248.254
Connecting to speed.hetzner.de (speed.hetzner.de)|88.198.248.254|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 104857600 (100M) [application/octet-stream]
Saving to: ‘100MB.bin.9’
100MB.bin.9 0%[ ] 0 --.-KB/s
100MB.bin.9 0%[ ] 119.69K 516KB/s
100MB.bin.9 0%[ ] 231.69K 488KB/s
100MB.bin.9 0%[ ] 343.69K 494KB/s
100MB.bin.9 0%[ ] 423.69K 447KB/s
100MB.bin.9 0%[ ] 519.69K 431KB/s
But if I run wget <address> -o wget.log I get the following...
Resolving speed.hetzner.de (speed.hetzner.de)... 88.198.248.254
Connecting to speed.hetzner.de (speed.hetzner.de)|88.198.248.254|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 104857600 (100M) [application/octet-stream]
Saving to: ‘100MB.bin.7’
0K .......... .......... .......... .......... .......... 0% 1.22M 82s
50K .......... .......... .......... .......... .......... 0% 1.68M 71s
100K .......... .......... .......... .......... .......... 0% 1.70M 67s
150K .......... .......... .......... .......... .......... 0% 8.41M 53s
200K .......... .......... .......... .......... .......... 0% 3.89M 47s
(snip)
102250K .......... .......... .......... .......... .......... 99% 3.45M 0s
102300K .......... .......... .......... .......... .......... 99% 2.86M 0s
102350K .......... .......... .......... .......... ..........100% 3.22M 0s
102400K 100% 0.00 =30s
2022-06-20 03:44:11 (3.37 MB/s) - ‘100MB.bin.7’ saved [104857600/104857600]
What exactly does each column mean?
For example in the lines...
50K .......... .......... .......... .......... .......... 0% 1.68M 71s
100K .......... .......... .......... .......... .......... 0% 1.70M 67s
Does the 1.68M mean that the first 50 Kilo(bytes?) of data was downloaded at 1.68Mega(bits?) per second, and the 1.70M means the next 50K was at 1.70Mbps?
GNU wget has two distinct ways of representing progress: the thermometer-style bar and dots. The bar is used when the output is a TTY (as in your first example). Writing to a file is non-TTY, so you need to tell GNU wget explicitly to use the bar if you want the first style written to the file, that is
wget --progress=bar:force <address> -o wget.log
For a more detailed description, see --progress in the wget man page.
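On the column question: per the wget manual, in the default dot style each dot stands for 1K, with ten dots per cluster and 50 dots per line. The leading number is the amount already retrieved when the line starts, followed by the percentage, the rate for that line, and the estimated time remaining. The sketch below splits a line into those columns and converts the rate to megabits per second. It assumes the rate is in bytes per second with 1024-based K/M/G suffixes, which matches the MB/s summary line but is worth verifying for your wget version. The function names are hypothetical.

```shell
# Reduce dot-style progress lines on stdin to "offset percent rate eta".
parse_dot_line() {
    awk '{
        out = ""
        for (i = 1; i <= NF; i++) {
            f = $i
            sub(/^\.+/, "", f)        # drop dot clusters, even when fused to "100%"
            if (f != "") out = out (out == "" ? "" : " ") f
        }
        print out
    }'
}

# Convert a rate such as "1.68M" to megabits per second.
# Assumption: bytes per second, 1024-based suffixes.
rate_to_mbps() {
    echo "$1" | awk '{
        n = $1 + 0                     # numeric prefix, e.g. 1.68
        u = substr($1, length($1))     # last character, e.g. "M"
        m = (u == "K") ? 1024 : ((u == "M") ? 1048576 : ((u == "G") ? 1073741824 : 1))
        printf "%.2f\n", n * m * 8 / 1000000
    }'
}
```

Under that assumption, rate_to_mbps 1.68M prints 14.09, so the second 50K line would have arrived at roughly 14 Mbit/s.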

Maven (with tycho) fails while materialize-products

I have an RCP application which builds fine on Eclipse 4.5.2 and 4.6.3. Today I tried to upgrade to Eclipse 2019-06.
When I try to materialize a product with Maven (mvn clean install),
I get the following output:
[INFO] --- tycho-p2-repository-plugin:1.4.0:archive-repository (default-archive-repository) @ at.biooffice.rcp.product ---
[INFO] Building zip: D:\WS18\git\at.biooffice.rcp.product\target\at.biooffice.rcp.product-3.0.1-SNAPSHOT.zip
[INFO]
[INFO] --- tycho-p2-director-plugin:1.4.0:materialize-products (materialize-products) @ at.biooffice.rcp.product ---
[INFO] Installing product at.biooffice for environment win32/win32/x86_64 to D:\WS18\git\at.biooffice.rcp.product\target\products\at.biooffice\win32\win32\x86_64
Installing at.biooffice 3.0.1.201907011447.
Installation failed.
Cannot complete the install because one or more required items could not be found.
Software being installed: BioOffice 3.0.1.201907011447 (at.biooffice 3.0.1.201907011447)
Missing requirement: toolingwin32.win32.x86_64org.eclipse.equinox.ds 3.0.1.201907011447 requires 'osgi.bundle; org.eclipse.equinox.ds 1.6.0.v20190122-0806' but it could not be found
Cannot satisfy dependency:
From: BioOffice 3.0.1.201907011447 (at.biooffice 3.0.1.201907011447)
To: org.eclipse.equinox.p2.iu; toolingat.biooffice.configuration [3.0.1.201907011447,3.0.1.201907011447]
Cannot satisfy dependency:
From: toolingat.biooffice.configuration 3.0.1.201907011447
To: org.eclipse.equinox.p2.iu; toolingwin32.win32.x86_64org.eclipse.equinox.ds [3.0.1.201907011447,3.0.1.201907011447]
There were errors. See log file: D:\WS18\git\at.biooffice.parent\workspace\.metadata\.log
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] BioOffice 3.0.1-SNAPSHOT ........................... SUCCESS [ 2.483 s]
[INFO] microsoftsqlserver 2.0.0-SNAPSHOT .................. SUCCESS [ 0.590 s]
[INFO] org.eclipse.gemini.dbaccess.microsoftsqlserver 2.0.0-SNAPSHOT SUCCESS [ 0.869 s]
[INFO] com.mysql.jdbc 5.1.38-SNAPSHOT ..................... SUCCESS [ 0.170 s]
[INFO] org.eclipse.gemini.dbaccess.mysql 5.1.38-SNAPSHOT .. SUCCESS [ 0.298 s]
[INFO] at.biooffice.osgi.service.dialog 3.0.1-SNAPSHOT .... SUCCESS [ 0.269 s]
[INFO] lumo.osgi.service.notification 3.0.1-SNAPSHOT ...... SUCCESS [ 0.457 s]
[INFO] lumo.osgi.service.notification.impl 3.0.1-SNAPSHOT . SUCCESS [ 0.471 s]
[INFO] lumo.core.runtime 3.0.1-SNAPSHOT ................... SUCCESS [ 18.182 s]
[INFO] at.biooffice.osgi.service.eclipselink 3.0.1-SNAPSHOT SUCCESS [ 2.032 s]
[INFO] at.biooffice.rcp 3.0.1-SNAPSHOT .................... SUCCESS [ 2.129 s]
[INFO] at.biooffice.osgi.service.eclipselink.impl 3.0.1-SNAPSHOT SUCCESS [ 2.474 s]
[INFO] lumo.exports.csv 3.0.1-SNAPSHOT .................... SUCCESS [ 1.399 s]
[INFO] lumo.exports.mssql 3.0.1-SNAPSHOT .................. SUCCESS [ 1.387 s]
[INFO] lumo.exports.kml 3.0.1-SNAPSHOT .................... SUCCESS [ 1.365 s]
[INFO] lumo.exports.shp 3.0.1-SNAPSHOT .................... SUCCESS [ 1.209 s]
[INFO] lumo.exports.vcard 3.0.1-SNAPSHOT .................. SUCCESS [ 1.260 s]
[INFO] lumo.exports.taxaendangered 3.0.1-SNAPSHOT ......... SUCCESS [ 1.276 s]
[INFO] lumo.imports.xml.nls 3.0.1-SNAPSHOT ................ SUCCESS [ 1.272 s]
[INFO] lumo.osgi.service.multimedia 3.0.1-SNAPSHOT ........ SUCCESS [ 1.254 s]
[INFO] lumo.osgi.service.multimedia.impl 3.0.1-SNAPSHOT ... SUCCESS [ 1.380 s]
[INFO] at.biooffice.osgi.service.dialog.impl 3.0.1-SNAPSHOT SUCCESS [ 1.540 s]
[INFO] at.biooffice.update 3.0.1-SNAPSHOT ................. SUCCESS [ 1.262 s]
[INFO] at.biooffice.osgi.service.map 3.0.1-SNAPSHOT ....... SUCCESS [ 1.224 s]
[INFO] at.biooffice.osgi.service.map.impl 3.0.1-SNAPSHOT .. SUCCESS [ 1.537 s]
[INFO] at.biooffice.views.attachedliterature 3.0.1-SNAPSHOT SUCCESS [ 1.365 s]
[INFO] at.biooffice.views.bioobject 3.0.1-SNAPSHOT ........ SUCCESS [ 1.399 s]
[INFO] at.biooffice.common.admin 3.0.1-SNAPSHOT ........... SUCCESS [ 1.278 s]
[INFO] at.biooffice.views.collection 3.0.1-SNAPSHOT ....... SUCCESS [ 1.266 s]
[INFO] at.biooffice.views.contact 3.0.1-SNAPSHOT .......... SUCCESS [ 1.281 s]
[INFO] at.biooffice.views.determination 3.0.1-SNAPSHOT .... SUCCESS [ 1.265 s]
[INFO] at.biooffice.views.dataexchange 3.0.1-SNAPSHOT ..... SUCCESS [ 3.052 s]
[INFO] at.biooffice.views.excursion 3.0.1-SNAPSHOT ........ SUCCESS [ 1.257 s]
[INFO] at.biooffice.views.literature 3.0.1-SNAPSHOT ....... SUCCESS [ 1.263 s]
[INFO] at.biooffice.views.lookups 3.0.1-SNAPSHOT .......... SUCCESS [ 1.326 s]
[INFO] at.biooffice.views.multimedia 3.0.1-SNAPSHOT ....... SUCCESS [ 1.329 s]
[INFO] at.biooffice.views.report 3.0.1-SNAPSHOT ........... SUCCESS [ 2.193 s]
[INFO] at.biooffice.views.nls 3.0.1-SNAPSHOT .............. SUCCESS [ 1.284 s]
[INFO] at.biooffice.views.project 3.0.1-SNAPSHOT .......... SUCCESS [ 1.279 s]
[INFO] at.biooffice.views.querymanager 3.0.1-SNAPSHOT ..... SUCCESS [ 1.798 s]
[INFO] at.biooffice.views.site 3.0.1-SNAPSHOT ............. SUCCESS [ 1.313 s]
[INFO] at.biooffice.views.taxon 3.0.1-SNAPSHOT ............ SUCCESS [ 1.601 s]
[INFO] at.biooffice.views.welcome 3.0.1-SNAPSHOT .......... SUCCESS [ 1.339 s]
[INFO] at.biooffice.views.loan 3.0.1-SNAPSHOT ............. SUCCESS [ 1.375 s]
[INFO] at.biooffice.views.servicemonitor 3.0.1-SNAPSHOT ... SUCCESS [ 1.235 s]
[INFO] jre.win32.win32.x86_64 8.0.66 ...................... SUCCESS [ 2.631 s]
[INFO] at.biooffice.feature 3.0.1-SNAPSHOT ................ SUCCESS [ 0.967 s]
[INFO] at.biooffice.feature.admin 3.0.1-SNAPSHOT .......... SUCCESS [ 0.226 s]
[INFO] at.biooffice.feature.free.addons 3.0.1-SNAPSHOT .... SUCCESS [ 0.275 s]
[INFO] at.biooffice.rcp.product 3.0.1-SNAPSHOT ............ FAILURE [ 37.278 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 07:01 min
[INFO] Finished at: 2019-07-01T16:54:21+02:00
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.eclipse.tycho:tycho-p2-director-plugin:1.4.0:materialize-products (materialize-products) on project at.biooffice.rcp.product: Installation of product at.biooffice for environment win32/win32/x86_64 failed: Call to p2 director application failed with exit code 13. Program arguments were: [-metadataRepository, file:/D:/WS18/git/at.biooffice.rcp.product/target/,file:/D:/WS18/git/at.biooffice.rcp.product/target/targetPlatformRepository/, -artifactRepository, file:/D:/WS18/git/at.biooffice.rcp.product/target/,file:/resolution-context-artifacts#D%253A%255CWS18%255Cgit%255Cat.biooffice.rcp.product,file:/D:/WS18/git/at.biooffice.rcp.product/target/,file:/D:/WS18/git/at.biooffice.feature/target/,file:/D:/WS18/git/at.biooffice.feature.admin/target/,file:/D:/WS18/git/at.biooffice.feature.free.addons/target/,file:/D:/WS18/git/at.biooffice.common.admin/target/,file:/D:/WS18/git/lumo.core.runtime/target/,file:/D:/WS18/git/at.biooffice.osgi.service.dialog/target/,file:/D:/WS18/git/lumo.osgi.service.notification/target/,file:/D:/WS18/git/at.biooffice.views.lookups/target/,file:/D:/WS18/git/at.biooffice.views.nls/target/,file:/D:/WS18/git/at.biooffice.views.querymanager/target/,file:/D:/WS18/git/lumo.imports.xml.nls/target/,file:/D:/WS18/git/jre.win32.win32.x86_64/target/,file:/D:/WS18/git/lumo.osgi.service.multimedia/target/,file:/D:/WS18/git/lumo.osgi.service.multimedia.impl/target/,file:/D:/WS18/git/lumo.osgi.service.notification.impl/target/,file:/D:/WS18/git/microsoftsqlserver/target/,file:/D:/WS18/git/org.eclipse.gemini.dbaccess.microsoftsqlserver/target/,file:/D:/WS18/git/at.biooffice.osgi.service.dialog.impl/target/,file:/D:/WS18/git/at.biooffice.osgi.service.eclipselink/target/,file:/D:/WS18/git/at.biooffice.osgi.service.eclipselink.impl/target/,file:/D:/WS18/git/at.biooffice.views.bioobject/target/,file:/D:/WS18/git/at.biooffice.views.collection/target/,file:/D:/WS18/git/at.biooffice.views.contact/target/,file:/D:/WS18/git/at.biooffice.views.dataexchange/target/,file:/D:/WS18/git/at.biooffice.views.excursion/target/,file:/D:/WS18/git/at.biooffice.views.literature/target/,file:/D:/WS18/git/at.biooffice.views.multimedia/target/,file:/D:/WS18/git/at.biooffice.views.report/target/,file:/D:/WS18/git/at.biooffice.views.project/target/,file:/D:/WS18/git/at.biooffice.views.site/target/,file:/D:/WS18/git/at.biooffice.views.taxon/target/,file:/D:/WS18/git/at.biooffice.views.welcome/target/,file:/D:/WS18/git/at.biooffice.rcp/target/,file:/D:/WS18/git/at.biooffice.update/target/,file:/D:/WS18/git/at.biooffice.views.determination/target/,file:/D:/WS18/git/at.biooffice.views.attachedliterature/target/,file:/D:/WS18/git/at.biooffice.views.loan/target/,file:/D:/WS18/git/com.mysql.jdbc/target/,file:/D:/WS18/git/org.eclipse.gemini.dbaccess.mysql/target/,file:/D:/WS18/git/lumo.exports.kml/target/,file:/D:/WS18/git/lumo.exports.taxaendangered/target/,file:/D:/WS18/git/lumo.exports.vcard/target/,file:/D:/WS18/git/lumo.exports.shp/target/,file:/D:/WS18/git/lumo.exports.mssql/target/,file:/D:/WS18/git/lumo.exports.csv/target/,file:/D:/WS18/git/at.biooffice.osgi.service.map/target/,file:/D:/WS18/git/at.biooffice.osgi.service.map.impl/target/,file:/C:/Users/BlackPearl/.m2/repository/, -installIU, at.biooffice, -destination, D:\WS18\git\at.biooffice.rcp.product\target\products\at.biooffice\win32\win32\x86_64, -profile, DefaultProfile, -profileProperties, org.eclipse.update.install.features=true, -roaming, -p2.os, win32, -p2.ws, win32, -p2.arch, x86_64]. -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :at.biooffice.rcp.product
So all bundles build successfully, but the product does not!
The error refers to org.eclipse.equinox.p2.iu.
In my definition I had both org.eclipse.rcp and org.eclipse.e4.rcp as requirements, which leads to this error.
I tried removing org.eclipse.rcp,
and also org.eclipse.e4.rcp,
but in the end nothing works.
Any ideas how to fix this error?
It looks like you have an explicit dependency on the org.eclipse.equinox.ds plug-in. Current versions of Eclipse have dropped that plug-in and replaced it with the org.apache.felix.scr plug-in.
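To locate the explicit dependency, one option is to search the source tree for the bundle name. Typical places would be a .product file, a feature.xml, or a MANIFEST.MF, but which file holds it in this project cannot be told from the post. A minimal sketch, with the hypothetical helper name find_ds_refs:

```shell
# List files under a directory (default: current directory) that mention
# the org.eclipse.equinox.ds bundle. -F treats the dots literally.
find_ds_refs() {
    grep -rlF 'org.eclipse.equinox.ds' "${1:-.}" 2>/dev/null | sort
}

# Possible usage from the parent project directory:
#   find_ds_refs .
```

Each reported file is a candidate for replacing org.eclipse.equinox.ds with org.apache.felix.scr, as the answer suggests. Build output directories such as target/ may also show up and can be ignored.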

bash - getting a number from a stream

The typical output recorded into chk file from the command:
wget -O - http://website/file > /dev/null 2>chk &
is something like :
0K .......... .......... .......... .......... .......... 0% 143K 62s
50K .......... .......... .......... .......... .......... 1% 433K 41s
100K .......... .......... .......... .......... .......... 1% 1.20M 30s
150K .......... .......... .......... .......... .......... 2% 259K 31s
200K .......... .......... .......... .......... .......... 2% 83.2M 24s
...
8800K .......... .......... .......... .......... .......... 98% 260K 1s
8850K .......... .......... .......... .......... .......... 98% 329K 0s
8900K .......... .......... .......... .......... .......... 99% 433K 0s
8950K .......... .......... .......... .......... ......... 100% 331K=31s
2017-01-13 13:16:59 (288 KB/s) - written to stdout [9215609/9215609]
The file is updated, line after line, during the whole download process.
Well, I need to get only the percentage: 0, 1, 2 ... 99 and nothing more.
The following pipeline does the job, even if not perfectly:
tail -n 5 chk | tail -n 1 | colrm 1 63 | cut -d '%' -f 1
The problem arises when I need to do the same in a bash script, as in the following:
#!/bin/bash
# Test script for getting the percentage number from 'wget' output
i=0
wget -O - http://website/file > /dev/null 2>chk &
sleep 1
while (( $i < 90 ))
do
i=`tail -n 5 chk | tail -n 1 | colrm 1 63 | cut -d '%' -f 1`
echo $i
done
The script starts getting the wanted file, it writes out the chk file, but stops with the error message:
line 9: ((: < 90 : syntax error: operand expected (error token is "< 90 ")
I have tried using [[ ]], quotes... but it doesn't work.
Any idea how to do this better?
Progress bar with wget, whiptail and GNU sed:
wget --progress=dot 'URL' 2>&1 | sed -un 's/.* \([0-9]\+\)% .*/\1/p' | whiptail --gauge "Download" 7 50 0
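The error in the question's loop appears when the pipeline returns an empty string (for example, before chk contains a complete progress line), so (( $i < 90 )) expands to (( < 90 )). A minimal sketch of a more tolerant approach follows. The helper name last_percent is hypothetical, and it assumes the dot-style log layout shown in the question.

```shell
# Print the most recent percentage in a wget dot-style log, or 0 if there is none yet.
last_percent() {
    # Keep only lines shaped like "<n>K <dots and spaces> <pct>% ..." and print <pct>.
    p=$(sed -nE 's/^ *[0-9]+K[ .]+([0-9]+)%.*/\1/p' "$1" 2>/dev/null | tail -n 1)
    echo "${p:-0}"    # fall back to 0 so numeric comparisons never see an empty value
}

# Possible polling loop (not run here; expects wget to be writing to chk):
#   i=0
#   while [ "$i" -lt 90 ]; do
#       i=$(last_percent chk)
#       echo "$i"
#       sleep 1
#   done
```

Matching on the percent sign instead of fixed columns avoids the colrm offset, and the sleep keeps the loop from spinning on an unchanged file.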

Why does the stderr of wget look so different when writing to the screen than to a file?

For example, when using wget
$ wget https://pypi.python.org/packages/source/F/Flask/Flask-0.10.1.tar.gz
The output looks like
--2016-03-05 20:01:58-- https://pypi.python.org/packages/source/F/Flask/Flask-0.10.1.tar.gz
Resolving pypi.python.org (pypi.python.org)... 199.27.74.223
Connecting to pypi.python.org (pypi.python.org)|199.27.74.223|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 544247 (531K) [application/octet-stream]
Saving to: ‘Flask-0.10.1.tar.gz’
100%[====================================================================================================================================================================================================>] 544,247 2.38MB/s in 0.2s
2016-03-05 20:01:59 (2.38 MB/s) - ‘Flask-0.10.1.tar.gz’ saved [544247/544247]
But when I redirect it to a log
$ wget https://pypi.python.org/packages/source/F/Flask/Flask-0.10.1.tar.gz &> tmp.log
$ cat tmp.log
--2016-03-05 20:02:54-- https://pypi.python.org/packages/source/F/Flask/Flask-0.10.1.tar.gz
Resolving pypi.python.org (pypi.python.org)... 199.27.74.223
Connecting to pypi.python.org (pypi.python.org)|199.27.74.223|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 544247 (531K) [application/octet-stream]
Saving to: ‘Flask-0.10.1.tar.gz.3’
0K .......... .......... .......... .......... .......... 9% 443K 1s
50K .......... .......... .......... .......... .......... 18% 409K 1s
100K .......... .......... .......... .......... .......... 28% 433K 1s
150K .......... .......... .......... .......... .......... 37% 374K 1s
200K .......... .......... .......... .......... .......... 47% 374K 1s
250K .......... .......... .......... .......... .......... 56% 338K 1s
300K .......... .......... .......... .......... .......... 65% 337K 0s
350K .......... .......... .......... .......... .......... 75% 241K 0s
400K .......... .......... .......... .......... .......... 84% 346K 0s
450K .......... .......... .......... .......... .......... 94% 384K 0s
500K .......... .......... .......... . 100% 693K=1.4s
2016-03-05 20:02:55 (369 KB/s) - ‘Flask-0.10.1.tar.gz.3’ saved [544247/544247]
I am very curious about what happens when writing to the screen.
How is the incremental appearance of those equals signs made possible, and where
do they go when the output is redirected to a log?
wget calls isatty() on stderr to decide whether or not to display the incremental equals-sign progress bar for the download. On a terminal this is convenient, because wget can send terminal control characters to back up, erase the line, and rewrite it. Rewriting a line in place is not possible when the output goes to a file or a pipe, so wget uses a different mechanism there: the dot-style output, which is only ever appended.
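The same decision can be sketched in shell, where [ -t 2 ] plays the role of isatty() on stderr. This is an illustration of the idea, not wget's actual code. The function names are hypothetical, and the carriage return is one example of a control character that allows a line to be redrawn.

```shell
# Report which style a wget-like tool might pick, based on whether stderr is a terminal.
progress_style() {
    if [ -t 2 ]; then
        echo "bar"    # terminal: a line can be redrawn in place
    else
        echo "dot"    # file or pipe: output can only be appended
    fi
}

# Draw a crude 20-column bar on stderr; "\r" returns the cursor to the line start,
# so each call overwrites the previous bar when stderr is a terminal.
draw_bar() {
    pct=$1
    filled=$((pct / 5))
    bar=""
    n=0
    while [ "$n" -lt "$filled" ]; do
        bar="${bar}="
        n=$((n + 1))
    done
    printf '\r%3d%%[%-20s]' "$pct" "$bar" >&2
}
```

Calling draw_bar repeatedly with rising values animates the bar on a terminal. Redirected to a file, the same calls leave every intermediate bar in the log, which is why an append-only format is the better fit there.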

sed won't replace text

I have this text file (wget.log):
1400K .......... .......... .......... .......... .......... 5% 78.5K 4m10s
1450K .......... .......... .......... .......... .......... 5% 46.6K 4m19s
1500K .......... .......... .......... .......... .......... 5% 105K 4m17s
1550K .......... .......... .......... .......... .......... 6% 63.0K 4m21s
1600K ..........
I just want to replace the runs of dots in the last 3 lines, so I tried this command:
tail -n 3 /www/wget.log | sed 's/. /=>/g'
but it won't replace anything. I want the output to be like this:
1500K => 5% 105K 4m17s
1550K => 5% 105K 4m17s
1600K =>
How should I do that?
You can use:
tail -n 3 /www/wget.log | sed -r 's/(\.+ *)+/=> /'
1500K => 5% 105K 4m17s
1550K => 6% 63.0K 4m21s
1600K =>
On OSX use:
tail -n 3 /www/wget.log | sed -E 's/(\.+ *)+/=> /'
You could try the below sed command.
$ tail -n 3 /www/wget.log | sed 's/ \..*\.\( \|$\)/ => /g'
1500K => 5% 105K 4m17s
1550K => 6% 63.0K 4m21s
1600K =>
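A note on the original attempt: in a regular expression an unescaped . matches any character, so a literal dot has to be written as \. (or placed in a bracket expression). The sketch below wraps a variant of the first answer in a function. It uses -E, which both GNU sed and the sed shipped with OS X accept. The function name collapse_dots is hypothetical.

```shell
# Replace the first run of dot clusters on each stdin line with "=>".
# The leading space in the pattern anchors the match to the separator before
# the dots; without the g flag, values such as "63.0K" are never touched.
collapse_dots() {
    sed -E 's/ (\.+ *)+/ => /'
}

# Possible usage:
#   tail -n 3 /www/wget.log | collapse_dots
```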
