Unable to download data using Aspera

I am trying to download data from the European Nucleotide Archive (ENA) using the Aspera CLI, but my downloads keep stalling. I downloaded several files earlier with the same tool; this has only been happening for the last month. I usually use the following command:
ascp -QT -P33001 -k 1 -i ~/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:/vol1/fastq/ERR192/009/ERR1924229/ERR1924229.fastq.gz .
From a post on Beta Science, I learnt that this might be due to not limiting the download speed, so I tried the -l argument, but it was of no help:
ascp -QT -l 300m -P33001 -k 1 -i ~/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:/vol1/fastq/ERR192/009/ERR1924229/ERR1924229.fastq.gz .

Your command works here.
You might be overdriving your local network. How much bandwidth do you have?
Here -l 300m sets a target rate of 300 Mbps; if your actual bandwidth is lower than that, it can cause exactly such stalls.
Try reducing the target rate to what you actually have. (Are you using wired or WiFi?)
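For example, assuming roughly a 20 Mbps downlink (an assumption; substitute your measured bandwidth), cap the target rate below it:
ascp -QT -l 10m -P33001 -k 1 -i ~/.aspera/connect/etc/asperaweb_id_dsa.openssh era-fasp@fasp.sra.ebi.ac.uk:/vol1/fastq/ERR192/009/ERR1924229/ERR1924229.fastq.gz .
Since -k 1 enables resume, retrying at a lower rate picks up where the stalled transfer left off rather than starting over.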

Faster way of Appending/combining thousands (42000) of netCDF files in NCO

I'm having trouble combining thousands of netCDF files (42,000+, about 3 GB for this particular folder/variable). The main variable that I want to combine has a structure of (6, 127, 118), i.e. (time, lat, lon).
I'm appending the files one by one, since the list of files is too long to pass in a single command.
I have tried:
for i in input_source/**/**/*.nc; do ncrcat -A -h append_output.nc $i append_output.nc ; done
but this method is really slow (on the order of kB/s, and it seems to get slower as more files are appended) and also gives a warning:
ncrcat: WARNING Intra-file non-monotonicity. Record coordinate "forecast_period" does not monotonically increase between (input file file1.nc record indices: 17, 18) (output file file1.nc record indices 17, 18) record coordinate values 6.000000, 1.000000
which basically says that the variable "forecast_period" repeats 1-6 n times, where n = 42,000 files, i.e. [1,2,3,4,5,6,1,2,3,4,5,6,...].
Despite this warning I can still open the file and ncrcat does what it's supposed to; it is just slow, at least with this particular method.
I have also tried adding in the option:
--no_tmp_fl
but this gives an error:
ERROR: nco__open() unable to open file "append_output.nc"
full error attached below
If it helps, I'm using WSL with Ubuntu on Windows 10.
I'm new to bash, and any comments would be much appreciated.
Either of these commands should work:
ncrcat --no_tmp_fl -h *.nc
or
ls input_source/**/**/*.nc | ncrcat --no_tmp_fl -h append_output.nc
Your original command is slow because it opens and closes the output file N times. These commands open it once, fill it up, then close it.
I would use CDO for this task. Given the huge number of files, it is recommended to first sort them by time (assuming you want to merge them along the time axis). After that, you can use
cdo cat *.nc outfile
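If your filenames happen to sort lexicographically in time order (an assumption; adjust the sort to however your files encode the timestamp), the sorting step can be a plain shell sort feeding cdo:
cdo cat $(ls input_source/**/**/*.nc | sort) append_output.nc
Note that with 42,000 paths the expanded command line can exceed the shell's argument limit, in which case split the list into batches and cat the intermediate files.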

UFTP is not working as expected

I am using UFTP to transfer files between computers on a subnetwork.
But when I use -H to send to only particular computers instead of all of them, it does not work as expected.
Let me explain in detail:
I have two Windows machines on the same network, with IPs 172.21.170.198 and 172.21.181.216 respectively.
From one of the systems, I used the command below to send the file:
uftp.exe -R 100000 -H 172.21.170.198,172.21.181.216 e:\setup.exe
But neither machine receives the file.
However, if I use this command, both machines receive it:
uftp.exe -R 100000 E:\setup.exe
I want to know whether I made a mistake; please correct me if I am wrong.
Thanks in advance, and kindly revert for any clarifications.
Regards,
Thiyagu
If IPv6 isn't enabled, the -H arguments should be the IPv4 addresses converted to hex (with a converter like http://www.kloth.net/services/iplocate.php):
uftp.exe -R 100000 -H 0xAC15AAC6,0xAC15B5D8 e:\setup.exe
But if the client has an IPv6 address, the client ID comes from the end of it, byte-reversed. For example, if the address were "fe80::e5ca:e3ca:fea3:153f%5", the command would look like:
uftp.exe -R 100000 -H 0x3f15a3fe e:\setup.exe
(0x3f15a3fe coming from the trailing bytes "fe a3 15 3f", reversed)
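If you would rather not depend on a web converter, a short shell sketch (runnable from any Unix-like shell, e.g. WSL) produces the same IPv4-derived hex IDs; the address below is the first one from the question:
# convert a dotted-quad IPv4 address to the 0x-prefixed hex ID used by -H
ip=172.21.170.198
printf '0x%02X%02X%02X%02X\n' $(echo "$ip" | tr '.' ' ')    # prints 0xAC15AAC6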

Curl range not working (downloads entire file)

curl -v -r 0-500 http://somefile -o localfile
It should download just the first 501 bytes, no? Instead, it downloads the entire thing, all 67 megabytes. Thanks, curl! Could my company's proxy servers be blocking this feature somehow? I am skeptical about that, since the downloads themselves do work, just not the range feature. Am I missing something?
As a client you can always abort the download once you have received what you want.
By piping through head you can limit what gets saved to the first 501 bytes, even if the server ignores the Range header:
curl -v -r 0-500 http://somefile | head -c 501 > localfile
It should download just the first 501 bytes, no?
It depends on the server. From man curl:
You should also be aware that many HTTP/1.1 servers do not have this feature enabled, so that when you attempt to get a range, you'll instead get the whole document.
As you can see in the response from the server, it's using HTTP/1.1. So it's not surprising that the range feature is not supported at the server side.
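One quick way to check whether the server honours ranges at all is to look for an Accept-Ranges header in a HEAD request (a sketch; http://somefile stands in for the real URL, as in the question):
curl -sI http://somefile | grep -i '^accept-ranges'
"Accept-Ranges: bytes" means range requests should work; no such header (or "Accept-Ranges: none") means the server may ignore -r and send the whole file.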
You can also pass the range as an explicit header:
curl -H "Range: bytes=354-500" -O http://example.com/file.extension

ab (Apache Bench) error: apr_poll: The timeout specified has expired (70007) on Windows

I'm load testing IIS 7.5 (WinR2/SP1) from my Windows 7/SP1 client. I have a script that makes three ab calls like:
start /B cmd /c ab.exe -k -n 500 -c 50 http://rhvwr2vsu410/HelloWebAPI/Home/SyncProducts > SyncProducts.txt
When the concurrency is > 5, I soon get the error message
apr_poll: The timeout specified has expired (70007)
And ab stops making requests. I don't even get to Completed 100 requests.
This happens within 30 seconds of starting my script. The ab documentation page doesn't provide much; there are related questions on Stack Overflow and Server Fault.
You need ab from the Apache 2.4 line and its -s timeout option.
Edit:
https://www.wampserver.com/ includes Apache 2.4.x for Win32 and Win64.
Deprecated, and the binaries below are no longer available:
You can use my win32-x86 binary (compiled under Visual Studio 2008 from trunk, 8 Feb 2013):
http://mars.iti.pk.edu.pl/~nkg/ab-standalone.exe (no longer available)
http://mars.iti.pk.edu.pl/~nkg/ab-standalone-src.zip (no longer available)
I built it using http://code.google.com/p/apachebench-standalone/wiki/HowToBuild and
http://ftp.ps.pl/pub/apache//apr/binaries/win32/apr-1.3.6-iconv-1.2.1-util-1.3.8-win32-x86-msvcrt60.zip (also no longer available).
From ab --help:
-s timeout Seconds to max. wait for each response (default is 30 seconds)
Add the option -s 120 to your ab command, where 120 is the new timeout in seconds. If that is not enough, set it even higher...
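Applied to the command from the question, that would look like this (same URL and counts as above, only -s added):
start /B cmd /c ab.exe -k -n 500 -c 50 -s 120 http://rhvwr2vsu410/HelloWebAPI/Home/SyncProducts > SyncProducts.txt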
From ab --help:
-s timeout Seconds to max. wait for each response (default is 30 seconds)
-k Use HTTP KeepAlive feature
This works for me.
Sounds like an ab bug.
I had a similar problem on OS X (now that you mention it happens on Windows too, I feel more confident that ab is the culprit). I profiled and traced my web application but couldn't find anything; I then tested static pages off nginx, and it still gave me the error. So I went and found a replacement: JMeter. It works great, but I would still like to know what the ab problem is.

How to detect non-busy machines over a LAN automatically?

I'm writing an MPI program to be run over a local area network. These machines can be ssh'd to by any student at any time.
Although I always test my program at night, the performance has been very inconsistent. My guess is that some nodes were busy when I ran the program.
So my question is: can I write a script to detect non-busy machines and update the machine file? What's an easy way to write it?
Thanks a lot.
SSH into each machine, then read /proc/loadavg, or determine the "busyness" in some other way.
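A minimal sketch of that idea, assuming passwordless SSH and a hypothetical machines.txt listing one hostname per line (both assumptions, not from the question):
# keep only machines whose 1-minute load average is below 0.5
while read -r host; do
  load=$(ssh -n -o ConnectTimeout=5 "$host" 'cut -d" " -f1 /proc/loadavg') || continue
  awk -v l="$load" 'BEGIN { exit !(l < 0.5) }' && echo "$host"
done < machines.txt > machinefile
The surviving hostnames become your MPI machine file; tune the 0.5 threshold to the machines' core counts (ssh -n keeps ssh from swallowing the remaining lines of machines.txt).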
I think the easiest way would be installing the check_load[1] script from Nagios to every node you want to check and call it via ssh with some sensible parameters:
# /usr/lib64/nagios/plugins/check_load -w 1,2,3 -c 3,4,5
OK - load average: 0.20, 0.43, 0.50|load1=0.200;1.000;3.000;0; load5=0.430;2.000;4.000;0; load15=0.500;3.000;5.000;0;
# /usr/lib64/nagios/plugins/check_load -w 0.1,2,3 -c 3,4,5
WARNING - load average: 0.18, 0.43, 0.50|load1=0.180;0.100;3.000;0; load5=0.430;2.000;4.000;0; load15=0.500;3.000;5.000;0;
# /usr/lib64/nagios/plugins/check_load -w 0.01,2,3 -c 0.1,4,5
CRITICAL - load average: 0.41, 0.46, 0.51|load1=0.410;0.010;0.100;0; load5=0.460;2.000;4.000;0; load15=0.510;3.000;5.000;0;
CRITICAL would mean "really busy", WARNING could be "is kinda busy" and OK would mean "the machine is idle".
You have to pay attention to the thresholds, given as 1/5/15-minute load averages for warning and critical; for instance, a machine with 16 cores having a load of 3 is perfectly OK, while a load of 3 on a single-core machine means it is really, really busy.
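Recent versions of check_load also accept -r (--percpu), which divides the load averages by the number of cores, so one set of thresholds can serve a mixed pool (check your plugin version before relying on this):
# /usr/lib64/nagios/plugins/check_load -r -w 0.7,0.6,0.5 -c 0.9,0.8,0.7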
Good luck!
Alex.
[1] http://nagiosplugins.org/man/check_load
