Adjust rate limit of a running pipe viewer (pv) process - bash

I am using pipe viewer (pv) to limit the transfer rate while uploading VM backups to online storage. Here is how I use it in a bash script:
ssh root@xenserver "xe vm-export uuid=${CurrentSnapshotUUID} filename=" | ${gpgEncrypt} | pv --quiet --rate-limit 300k | /usr/local/bin/aws s3 cp - ${bucketS3}/${CurrentVM}_${TodayDate}.xva.gpg
This works like a charm, but there is a constraint: I cannot upload at 300 KByte/s during peak hours, because the resulting traffic gets pretty expensive.
Unfortunately, I cannot split the data into several parts and upload them one after the other. It's one huge data stream generated by the VM export that I need to process in one go. So I need a way to lower the rate limit at certain times without interrupting pv.
Does anyone have an idea how I can achieve this?
Cheers,
Rob

Thanks to Andrew Wood, the author of pv, I found the answer to my question. You can change the rate limit of an already-running pv process with PID 123 like this:
pv --remote 123 --rate-limit 200k
What a cool feature. Case closed!
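For completeness, this also makes the peak-hour switch easy to script. Below is a minimal sketch, assuming the backup pipeline above is the only pv process running under this user and that cron triggers it at the start of the peak window (the 100k figure is just an example):
#!/bin/bash
# find the pv process of the running backup pipeline and lower its limit
PV_PID=$(pgrep -u "$(whoami)" -x pv | head -n 1)
[ -n "$PV_PID" ] && pv --remote "$PV_PID" --rate-limit 100k
A second cron entry using --rate-limit 300k restores the original speed once the peak window is over.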

Related

Limiting number of CPU cores used for indexing

Is there a way to limit how many CPU cores Xcode can use to index code in the background?
I write code in emacs but I run my app from Xcode, because the debugger is pretty great. The problem is that in emacs I use rtags for indexing, which already needs a lot of CPU, and then Xcode wants to do the same. Basically whenever I touch a common header file, my computer is in big trouble...
I like this question, it presents hacky problem-solving :)
Not sure if this would work (I don't know how to force Xcode to index on demand), but here are some thoughts that might set you on the right track: there's a tool called cpulimit that you can use to slow down processes. It works by repeatedly pausing and resuming the target process (SIGSTOP/SIGCONT); I used it when experimenting with crypto mining.
If you can figure out the process ID of the indexing daemon, maybe you can cpulimit it!
I'd start by running ps -A | grep -i xcode before and after indexing occurs to see what's changed (if anything), or using Activity Monitor to see what spikes (/Applications/Xcode10.1.app/Contents/SharedFrameworks/DVTSourceControl.framework/Versions/A/XPCServices/com.apple.dt.Xcode.sourcecontrol.WorkingCopyScanner.xpc/Contents/MacOS/com.apple.dt.Xcode.sourcecontrol.WorkingCopyScanner looks interesting)
There is a -i or --include-children param on cpulimit that should take care of this, but not sure how well it works in practice.
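A minimal sketch of how that might look, assuming the WorkingCopyScanner process above turns out to be the CPU hog (the process name, the 50% cap, and --include-children are all things to adjust once you have confirmed what is actually spinning):
# throttle the suspected indexing helper to ~50% of one core;
# requires cpulimit to be installed (e.g. via Homebrew) and sudo rights
PID=$(pgrep -f "WorkingCopyScanner" | head -n 1)
[ -n "$PID" ] && sudo cpulimit --pid "$PID" --limit 50 --include-children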
I made a script, /usr/local/bin/xthrottle:
#!/bin/ksh
# grab the first Xcode-related process and lower its CPU priority
PID=$(pgrep -f Xcode | head -n 1)
[ -n "$PID" ] && sudo renice 10 "$PID"
You can play with the nice level: -20 is the least nice, 20 is the nicest to your neighbour processes.

Nice / IOnice which one first? Does it matter? Any other way to reduce server load on a script?

I've been trying the "nicer" way to run a gzip from a bash script on an active server, but it still pushes the load average higher than I would like.
Which of the following would be softer on the I/O and the CPU?
Is there another way I'm not aware of?
/usr/bin/nice -n 19 /usr/bin/ionice -c2 -n7 gzip -9 -q foo*
or
/usr/bin/ionice -c2 -n7 /usr/bin/nice -n 19 gzip -9 -q foo*
Also, are there other commands, such as ulimit, that would help reduce the load on the server?
I'm not familiar with the ionice thing, but nice just means that if another process wants to use the CPU, then the nice process will be more willing to wait a bit.
The CPU load is unaffected by this since it's just a measure of the length of the "run queue", which will be the same.
I'm guessing it'll be the same with ionice, but affecting disk load.
So, the "niceness" only affects how willing your process is to allow others to go before you in the queue, but in the end the load will be the same because the CPU/disk has to carry out the job.
ANALOGY: Think of a person behind a checkout counter. They still have to process the queue, but the people in the queue may be nice to each other and let others go ahead of them to the counter. The "load" is the length of that queue.
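If you want to convince yourself that the ordering doesn't matter, a quick check is sketched below (assuming the foo* files are large enough that gzip is still running when you look): each wrapper only sets a priority and then executes the next command, so the same gzip process ends up carrying both settings either way.
# start the compression with both wrappers, then inspect the gzip process
/usr/bin/nice -n 19 /usr/bin/ionice -c2 -n7 gzip -9 -q foo* &
PID=$!
ps -o pid,ni,comm -p "$PID"   # NI column should show 19
ionice -p "$PID"              # should report best-effort, priority 7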

get progress of a file being uploaded - unix

I have a requirement to monitor the progress of a file being uploaded using a script. In PuTTY (the software) we can view the upload percentage, bytes transferred, upload speed and ETA on the right-hand side. I want to develop similar functionality. Is there any way to achieve this?
Your question is lacking any information of how your file is transferred. Most clients have some way to display progress, but that is depending on the individual client used (scp, sftp, ncftp, ...).
But there is a way to monitor progress independently on what is progressing: pv (pipe viewer).
This tool has the sole purpose of generating monitoring information. It can be used much like cat. You either use it to "lift" a file to pv's stdout...
pv -petar <file> | ...
...or you use it in the middle of a pipe -- but you need to manually provide the "expected size" in order to get a proper progress bar, since pv cannot determine the size of the transfer beforehand. I used 2 Gigabyte expected size here (-s 2G)...
cat <file> | pv -petar -s 2G | ...
The options used are -p (progress bar), -e (ETA), -t (elapsed time), -a (average rate), and -r (current rate). Together they make for a nice mnemonic.
Other nice options:
-L, which can be used to limit the maximum rate in the pipe (throttle).
-W, to make pv wait until data is actually transferred before showing a progress bar (e.g. if the tool you are piping the data to will require a password first).
This is most likely not what you're looking for (since chances are the transfer client you're using has its own options for showing progress), but it's the one tool I know that could work for virtually any kind of data transfer, and it might help others visiting this question in the future.
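As an illustration of how this might look in a script, here is a minimal sketch, assuming the upload goes over ssh to a hypothetical host named backuphost; since pv reads the file directly, it can work out the total size, and therefore the ETA, by itself:
FILE=big_dump.tar
pv -petar "$FILE" | ssh backuphost "cat > /incoming/$FILE"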

Windows Azure virtual machine is slow to access network when scaling

I'm running some startup scripts (cmd/bat) on my small Azure VM which include a file-transfer operation from a mounted VHD, and normally it finishes in about 3 minutes (copying files and extracting a ~500 MB zip file with command-line 7z).
When I scale out to ~150 instances, the same operation is very slow (up to 15 minutes in total, most of which is spent in 7z). Also, the nodes that are slowest to complete the boot procedure are very hard to access at first using mstsc (the animation lags and logging in takes a long time), but that might not be related.
What could be the problem?
We had the idea to examine the cache, but it would be nice to know of any other potential bottleneck which may be present in the following situation.
UPDATE:
I tried extracting on the D:\ drive instead of on C:\, and while scaling to 200 the unzip takes about a minute! So it seems the problem is that C:\ is backed by blob storage. But again, I have 3 GB of data in 40 files, so 60 MB/s per blob should be enough to handle it. Or could it be that there is a cap across all blobs combined?
The VM sizes each have their own bandwidth limitations.
| VM Size     | Bandwidth |
| ----------- | --------- |
| Extra Small | 5 Mbps    |
| Small       | 100 Mbps  |
| Medium      | 200 Mbps  |
| Large       | 400 Mbps  |
| Extra Large | 800 Mbps  |
I suspect you always have one copy of your mounted VHD and have ~150 instances hitting it. Increasing the VM size of the VM hosting the VHD would be a good test but an expensive solution. Longer term put the files in blob storage. That means modifying your scripts to access RESTful endpoints.
It might be easiest to create 2-3 drives on 2-3 different VMs and write a script that ensures they have the same files. Your scripts could randomly hit one of the 2-3 mounted VHDs to spread out the load.
Here are the most recent limitations per VM size. Unfortunately this table doesn't include network bandwidth: http://msdn.microsoft.com/en-us/library/windowsazure/dn197896.aspx
-Rich
p.s. I got the bandwidths from a PowerPoint slide in the Microsoft provided Azure Training Kit dated January 2013.
One thing to consider is the per-storage-account scalability targets. With geo-replication enabled, you have 10 Gbps egress and 20K transactions/sec, which you could be bumping into. Figure that with 150 Small instances, you could potentially be pulling 150 x 100 Mbps, or 15 Gbps, as all of your instances start up.
Not sure about the "mounted VHD" part of your question. With Azure's drive-mounting, only one virtual machine instance can mount to a drive at any given time. For this type of file-copy operation, typically you'd grab a file directly from a storage blob, rather than a file stored in a vhd (which, in turn, is stored in a page blob).
EDIT: Just wanted to mention that an individual blob is limited to 60MB/sec (also mentioned in the blog post I referenced). This could also be related to your throttling.
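Regarding grabbing the file directly from a blob, as suggested above: the idea is just an HTTP GET against the blob URL (plus a SAS token if the container isn't public). From a cmd/bat startup script that would typically be done with PowerShell or azcopy, but as a rough sketch (account name, container and token are placeholders):
# download the payload straight from blob storage instead of a mounted VHD
curl -f -o payload.zip "https://myaccount.blob.core.windows.net/startup/payload.zip?<sas-token>"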

Analyze Local Network Traffic, Update Quota with tshark and BASH [duplicate]

I have a slightly weird problem and I really hope someone can help with this:
I go to university and the wireless network here issues every login a certain quota/week (mine is 2GB). This means that every week, I am only allowed to access 2GB of the Internet - my uploads and downloads together must total at most 2GB (I am allowed access to a webpage that tells me my remaining quota). I'm usually allowed a few grace KB but let's not consider that for this problem.
My laptop runs Ubuntu and has the conky system monitor installed, which I've configured to display (among other things) my remaining wireless quota. Originally, I had conky hit the webpage and grep for my remaining quota. However, since my conky refreshes every 5 seconds and I'm on the wireless connection for upwards of 12 hours, checking the webpage itself kills my wireless quota.
To solve this problem, I figured I could do one of two things:
Hit the webpage much less frequently so that doing so doesn't kill my quota.
Monitor the wireless traffic at my wireless card and keep subtracting it from 2GB
(1) is what I've done so far: I set up a cron job to hit the webpage every minute and store the result in a file on my local filesystem. Conky then reads this file - no need for it to hit the webpage; no loss of wireless quota thanks to conky.
This solution is a win by a factor of 12, which is still not enough. However, I'm a fan of realtime data and will not reduce the cron frequency further.
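For reference, the cron side of that is just a one-liner. A minimal sketch, assuming a hypothetical quota page and that the remaining-KB figure can be grepped out of the HTML (the URL and pattern are placeholders for whatever the university actually serves):
# runs every minute from cron; conky only ever reads the local file
curl -s https://quota.example.edu/usage | grep -o '[0-9]* KB' > "$HOME/.quota_remaining"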
So, the only other solution that I have is (2). This is when I found out about Wireshark and its command-line version, tshark. Now, here's what I think I should do:
daemonize tshark
set tshark to monitor the amount (in KB or B or MB - I can convert this later) of traffic flowing through my wireless card
keep appending this traffic information to file1
sum up the traffic information in file1 and subtract it from 2GB. Store the result in file2
set conky to read file2 - that is my remaining quota
setup a cron job to delete (or erase the contents of) file1 every Monday at 6.30AM (that's when the weekly quota resets)
At long last, my questions:
Do you see a better way to do this?
If not, how do I setup tshark to make it do what I want? What other scripts might I need?
If it helps, the website reports my remaining quota in KB.
I've already looked at the tshark man page, which unfortunately makes little sense to me, being the network-n00b that I am.
Thank you in advance.
Interesting question. I've no experience using tshark, so personally I would approach this using iptables.
Looking at:
[root@home ~]# iptables -nvxL | grep -E "Chain (INPUT|OUTPUT)"
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
Chain OUTPUT (policy ACCEPT 9763462 packets, 1610901292 bytes)
we see that iptables keeps a tally of the bytes that pass through each chain. So one can presumably monitor your bandwidth usage as follows:
When your system starts up, retrieve your remaining quota from the web
Zero the byte tally in iptables (Use the -z option)
Every X seconds, get usage from iptables and deduct from quota
Here are some examples of using iptables for IP accounting.
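A minimal sketch of those three steps as a cron-driven script, assuming it runs as root, that there are no other iptables rules (so the policy counters see all the traffic), and that a hypothetical file /var/tmp/quota_bytes was seeded with the remaining quota fetched from the web:
#!/bin/bash
# read the INPUT/OUTPUT byte tallies, zero them, and deduct from the quota file
QUOTA_FILE=/var/tmp/quota_bytes
read -r IN_BYTES OUT_BYTES < <(iptables -nvxL | awk '/^Chain INPUT/ {i=$7} /^Chain OUTPUT/ {o=$7} END {print i, o}')
iptables -Z    # zero all counters
echo $(( $(cat "$QUOTA_FILE") - IN_BYTES - OUT_BYTES )) > "$QUOTA_FILE"   # conky reads this file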
Caveats
There are some drawbacks to this approach. First of all, you need root access to run iptables, which means either running conky as root or having a root cron job write the current values to a file that conky can read.
Also, not all INPUT/OUTPUT packets may count towards your bandwidth allocation, e.g. intranet access, DNS, etc. One can filter out only relevant connections by matching them and placing them in a separate iptables chain (examples in the link given above). An easier approach (if the disparity is not too large) would be to occasionally grab your real time quota from the web, reset your values and start again.
It also gets a little tricky when you have existing iptables rules which are either complicated or use custom chains. You'll then need some knowledge of iptables to retrieve the right values.
