Closed 2 years ago; not accepting answers.
I have a file with 2192 URLs, one per line. I am trying to download them in parallel like this:
cat urls.txt | tr -d '\r' | xargs -P 8 -n 1 curl -s -LJO -n -c ~/.urs_cookies -b ~/.urs_cookies
However, when I count the downloaded files with ls -1 | wc -l, I only have 1400 files. I know that the URLs are all properly formatted (they were autogenerated by the website I am downloading the data from).
I can rerun the above command and get a few more files each time, but this is not sufficient. Downloading the files one at a time would be an option, but the server takes about 30 seconds to respond to each request while each file takes only about 2 seconds to download, and I have at least 5 lists of 2192 URLs each. I would very much like to download in parallel.
Can anyone help me figure out why parallel downloads would stop early?
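One way to narrow this down (a sketch only; --fail and --retry are standard curl options, and failed-urls.txt is just a placeholder name) is to retry transient failures and record every URL that still does not download, so you can see which requests the server is rejecting:
tr -d '\r' < urls.txt | xargs -P 8 -n 1 sh -c 'curl -sS -L -J -O -n -c ~/.urs_cookies -b ~/.urs_cookies --fail --retry 5 --retry-delay 2 "$0" || echo "$0" >> failed-urls.txt'
Anything left in failed-urls.txt afterwards can be fed back through the same pipeline.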
If you're okay with a (slightly) different tool, may I recommend using GNU Wget2? It is the spiritual successor to GNU Wget, and it is already available in the Debian and OpenSUSE repositories and on the AUR.
Wget2 provides multi-threaded downloads out of the box with a nice progress bar to view the current status. It also supports HTTP/2 and many other newer features that were nearly impossible to add into Wget.
See my answer here: https://stackoverflow.com/a/49386440/952658 for some more details.
With Wget2, you can simply run wget2 -i urls.txt and it will start downloading your files in parallel.
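If you need to cap or raise the number of parallel downloads, here is a sketch using the --max-threads option (check wget2 --help to confirm the exact name and default on your version):
wget2 --max-threads=8 -i urls.txt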
EDIT: As mentioned in the other answer, a disclaimer: I maintain both Wget and Wget2, so I'm clearly biased towards this tool.
Closed 11 days ago; not accepting answers.
In my directory on a Linux server I have discovered a file with a very strange name (it appears to consist of escape characters).
From the command history, I can tell that it was probably created by this command:
sudo docker logs <container_id> -n 200000 | less
I suspect I have entered some combination of letters in less (probably starting with s to save a file).
Do you know what exactly has happened?
P.S. If you want to remove such a file, see How to escape the escape character in bash?
I have discovered that such a file is created when you press s in a piped less, at which point you are asked to enter a log file name. If you then press Escape three times followed by Enter, you get a file with exactly this kind of name.
The s command is actually useful for saving the contents of a piped less session.
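In case it helps, a rough sketch of how to find and delete a file like this without typing its name (the inode number 1234567 below is a placeholder; use the one printed by ls):
ls -lbi
find . -maxdepth 1 -inum 1234567 -delete
Here -b makes ls print control characters as backslash escapes and -i shows each file's inode number, so you can identify the odd entry and remove it by inode.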
Closed 12 months ago; not accepting answers.
This may be easy for Linux users, but I am having a hard time figuring out how to delete multiple files (partial log files) using a wildcard.
Neither sudo rm logs/archived/remove_me.2022.* nor sudo rm logs/archived/remove_me.2022.? seems to work.
I am getting the error rm: cannot remove 'logs/archived/remove_me.*': No such file or directory
I am currently in /var/lib/tomcat8/, trying to remove these logs inside logs/archived.
I have been removing them one by one, but there are a lot of files to remove (they go back to 2020, and each day's log is split into several partial files).
For example, when I am inside /var/lib/tomcat8/logs/archived/, I want to remove all log files matching remove_me.2021.*
Below is a sample list of the files that I want to remove. There are also other files in this directory that should not be removed.
remove_me.2022-03-02.1.log.gz
remove_me.2022-03-02.2.log.gz
remove_me.2022-03-02.3.log.gz
remove_me.2022-03-02.4.log.gz
remove_me.2022-03-02.5.log.gz
remove_me.2022-03-03.1.log
remove_me.2022-03-03.2.log
remove_me.2022-03-03.3.log
remove_me.2022-03-03.4.log
remove_me.2022-03-03.5.log
remove_me.2022-03-03.6.log
remove_me.2022-03-03.7.log
remove_me.2022-03-03.8.log
remove_me.2022-03-03.9.log
remove_me.2022-03-03.10.log
I believe the issue here is that the wildcard is expanded by your shell as the current user, i.e., before sudo switches to the superuser. Because the current user cannot even read that directory, the pattern matches nothing, and the shell passes it to rm literally, which is why the error message shows the unexpanded pattern.
Solve this by becoming the superuser first, and then doing everything normally (note that the files in your listing are named remove_me.2022-..., so the pattern needs to be remove_me.2022* rather than remove_me.2022.*):
sudo -i
cd /var/lib/tomcat8/logs/archived/
rm remove_me.2022*
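If you would rather not open an interactive root shell, the same idea can be written as a one-liner that lets root expand the pattern instead of your own shell, for example:
sudo sh -c 'rm /var/lib/tomcat8/logs/archived/remove_me.2022*'
sudo find /var/lib/tomcat8/logs/archived/ -maxdepth 1 -name 'remove_me.2022*' -delete
Both use absolute paths, so they can be run from any directory.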
Closed 6 years ago; not accepting answers.
Wondering if I am overlooking the obvious here.
I am trying to use
wget -rl 0 -A "*.fna.gz" ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/Acinetobacter_nosocomialis/all_assembly_versions
to download all the files matching *.fna.gz in all the directories contained in ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/Acinetobacter_nosocomialis/all_assembly_versions/.
If you visit the above link, you will see a list of directories starting with GCA. I want all the files in those directories that match *.fna.gz, but I get nothing when I run the command. I'm wondering whether wget is not recognizing the GCA* directories as directories and that is the problem, or whether there is something wrong with my wget command.
I am suspicious because when I try to download the directories with FileZilla I get:
GCA_000248315.2_ASM24831v2: Not a regular file
Error: Critical file transfer error
These are not directories but links to somewhere else. There is no information in the file listing which gives the type of the target file, i.e. if directory or plain file or whatever. Thus wget will probably assume plain file and not follow it.
Apparently this isn't working as expected because of a bug on the server which displays symbolic links to directories as ordinary files. Thus as @Steffen Ullrich mentioned, "There is no information in the file listing which gives the type of the target file, i.e. if directory or plain file or whatever. Thus wget will probably assume plain file and not follow it." Thanks to codesquid_ on the FileZilla IRC for the clarification.
Follow-up question regarding a workaround: https://stackoverflow.com/questions/35307325/recursive-wget-cant-get-files-within-symbolic-link-directories
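To see this for yourself, one quick check (just a sketch; curl prints the server's raw LIST output when given a directory URL) is to dump the listing and look at the first character of each GCA_* entry, which is where the entry type would normally appear:
curl -s 'ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/Acinetobacter_nosocomialis/all_assembly_versions/' | head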
Closed 6 years ago; not accepting answers.
I'm attempting to download wget because a provisioner I am using can't retrieve certain information without it. (Naturally, this issue doesn't come up at work, where I have a Mac; at home I have a 64-bit Windows 10 machine.) I have tried the wget.exe builds from https://eternallybored.org/misc/wget/ and from SourceForge, but no luck. Are there any current issues with wget on Windows 10? If not, does anyone have any ideas what my issue may be?
The eternallybored build will crash when you are downloading a large file.
This can be avoided by disabling the LFH (Low Fragmentation Heap) via the GlobalFlag registry setting.
GNU Wget is a free network utility to retrieve files from the World Wide Web using HTTP and FTP, the two most widely used Internet protocols.
It works great for me on Windows 10.
I use it on Windows 10 without issue. I don't know which version it is; the digital signature in the properties says March 19, 2015. I have it in a folder on the C: drive called ab, and I use:
c:\ab\wget --no-check-certificate https://myURL -O c:\ab\Save_Name.txt
and it saves the file as Save_Name.txt.
Closed 9 years ago; not accepting answers.
I work on a Solaris (32-bit) platform, with zip package version 2.3.
We have a bash script which compresses a lot of large XML files. However, we get the following errors: "File too large" and "File size limit exceeded".
We cannot upgrade the kernel or the zip package, or change the archive format.
I would like to know whether it is possible, with a bash script, to generate several zip archives: start compressing and, for example, when the archive size reaches 1.8 GB, have the script start a second archive, and so on.
If this is possible, can you please explain how I can set it up?
Thanks for your help.
Best regards,
If your zip command recognizes "-" as a special file name (meaning standard output), you can zip the files and pipe the result to the split command:
user@solaris> zip -r - /my/file-*.xml | split -b 2000000000
Then transfer all the x* pieces to another machine and concatenate them into a single zip file:
user@linux$ cat from/solaris/x* > myxmlfiles.zip
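It may be worth verifying the reassembled archive on the receiving machine before deleting the pieces, for example:
user@linux$ unzip -t myxmlfiles.zip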
Since you are only zipping XML files, it should be safe to split them first, zip each file, and then on the other end unzip and concatenate them.
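Another option, closer to what the question describes, is to batch whole files into numbered archives and start a new archive once the current batch approaches a size threshold. This is only a sketch: /my/file-*.xml and the archive-N.zip names are placeholders, and the threshold here is measured on the uncompressed input, so the resulting zip files will come out smaller than 1.8 GB:
#!/bin/bash
# Add XML files to archive-1.zip, archive-2.zip, ... starting a new
# archive once the current batch of input files approaches ~1.8 GB.
limit=$((1800 * 1024 * 1024))
batch=1
size=0
for f in /my/file-*.xml; do
    fsize=$(wc -c < "$f")
    if [ "$size" -gt 0 ] && [ $((size + fsize)) -gt "$limit" ]; then
        batch=$((batch + 1))
        size=0
    fi
    zip -q "archive-$batch.zip" "$f"
    size=$((size + fsize))
done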