I use grep in the terminal to find the string "Converged?" recursively across several folders.
My command is
grep -r -i -A4 'Converged?'
What I get as output, in a minimal example, is this:
start_struc.log: Item Value Threshold Converged?
start_struc.log- Maximum Force 0.000022 0.000450 YES
start_struc.log- RMS Force 0.000005 0.000300 YES
start_struc.log- Maximum Displacement 0.010813 0.001800 NO
start_struc.log- RMS Displacement 0.002734 0.001200 NO
--
start_struc.log: Item Value Threshold Converged?
start_struc.log- Maximum Force 0.000001 0.000450 YES
start_struc.log- RMS Force 0.000000 0.000300 YES
start_struc.log- Maximum Displacement 0.001210 0.001800 YES
start_struc.log- RMS Displacement 0.000312 0.001200 YES
But I only want the last occurrence of Converged? together with the next four lines.
I looked through several internet forums and the grep manual, but I could not find a flag for this. The problem is that I get up to 50 hits before the last one, and I don't want them all printed in the terminal.
Does anyone have an idea?
Thanks in advance
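One approach that might work (just a sketch, not a grep flag): since grep prints its matches in order, you can pipe the output through tail and keep only the final block. With -A4 each block is the matching line plus four lines of context, so the last block is the last five lines of the output:

grep -r -i -A4 'Converged?' . | tail -n 5

Note that with -r the "last" block is the last one in grep's file traversal order, so this is most predictable when run on a single log file.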
Summary:
I need to count all unique lines in all .txt files in an HDFS instance.
Total size of .txt files ~450GB.
I use this bash command:
hdfs dfs -cat /<top-level-dir>/<sub-dir>/*/*/*.txt | cut -d , -f 1 | sort --parallel=<some-number> | uniq | wc -l
The problem is that this command takes all the free RAM, and the HDFS instance exits with code 137 (out of memory).
Question:
Is there any way I can limit the RAM usage of this entire command to, let's say, half of what's free on the HDFS instance, OR somehow free the memory while the command is still running?
Update:
I need to remove | sort | because it is a merge-sort implementation and therefore has O(n) space complexity.
I can use | uniq | on its own, without | sort |.
Some things you can try to limit sort's memory consumption (a combined example follows this list):
Use sort -u instead of sort | uniq. That way sort has a chance to remove duplicates on the spot instead of having to keep them until the end. 🞵
Write the input to a file and sort the file instead of running sort in a pipe. Sorting pipes is slower than sorting files and I assume that sorting pipes requires more memory than sorting files:
hdfs ... | cut -d, -f1 > input && sort -u ... input | wc -l
Set the buffer size manually using -S 2G. The buffer is shared between all threads; the size specified here roughly equals the overall memory consumption when running sort.
Change the temporary directory using -T /some/dir/different/from/tmp. On many Linux systems /tmp is a ramdisk, so be sure to use an actual hard drive.
If the hard disk is not an option you could also try --compress-program=PROG to compress sort's temporary files. I'd recommend a fast compression algorithm like lz4.
Reduce parallelism using --parallel=N, as more threads need more memory. With a small buffer, too many threads are less efficient.
Merge at most two temporary files at once using --batch-size=2.
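Putting several of these suggestions together, a rough sketch might look like this (untested against the original 450 GB workload; the 2G buffer, the /mnt/disk/tmp directory and the availability of lz4 are assumptions, adjust them to your system):

hdfs dfs -cat /<top-level-dir>/<sub-dir>/*/*/*.txt |
  cut -d, -f1 |
  sort -u -S 2G -T /mnt/disk/tmp --compress-program=lz4 --parallel=2 --batch-size=2 |
  wc -l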
🞵 I assumed that sort was smart enough to immediately remove sequential duplicates in the unsorted input. However, from my experiments it seems that (at least) sort (GNU coreutils) 8.31 does not.
If you know that your input contains a lot of sequential duplicates as in the input generated by the following commands …
yes a | head -c 10m > input
yes b | head -c 10m >> input
yes a | head -c 10m >> input
yes b | head -c 10m >> input
… then you can drastically reduce the resources sort needs by running uniq first:
# takes 6 seconds and 2'010'212 kB of memory
sort -u input
# takes less than 1 second and 3'904 kB of memory
uniq input > preprocessed-input &&
sort -u preprocessed-input
Times and memory usage were measured using GNU time 1.9-2 (often installed as /usr/bin/time) and its -v option. My system has an Intel Core i5 M 520 (two cores + hyper-threading) and 8 GB memory.
Reduce the number of sorts run in parallel.
From info sort:
--parallel=N: Set the number of sorts run in parallel to N. By default, N is set
to the number of available processors, but limited to 8, as there
are diminishing performance gains after that. Note also that using
N threads increases the memory usage by a factor of log N.
You say it runs out of memory. From man sort:
--batch-size=NMERGE
merge at most NMERGE inputs at once; for more use temp files
--compress-program=PROG
compress temporaries with PROG; decompress them with PROG -d
-S, --buffer-size=SIZE
use SIZE for main memory buffer
-T, --temporary-directory=DIR
use DIR for temporaries, not $TMPDIR or /tmp; multiple options
specify multiple directories
These are the options you could be looking into. Specify a temporary directory on disk and a buffer size, e.g. 1 GB, like so: sort -u -T "$HOME"/tmp -S 1G.
Also as advised in other answers, use sort -u instead of sort | uniq.
Is there any way I can limit the RAM usage of this entire command to, let's say, half of what's free on the HDFS instance
Kind of: use the -S option. You could use sort -S "$(free -t | awk '/Total/{print $4}')", which sets the buffer to the amount of currently free memory.
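If you literally want half of what's free, here is a minimal sketch building on the same free/awk call (the halving and the explicit K suffix are my additions, not from the original answer; it assumes free reports kibibytes, which matches sort's default unit for -S):

half_free_kib=$(( $(free -t | awk '/Total/{print $4}') / 2 ))
hdfs dfs -cat /<top-level-dir>/<sub-dir>/*/*/*.txt | cut -d, -f1 | sort -u -S "${half_free_kib}K" | wc -l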
I am using the Veins simulator. For creating cars with routes I am using the following commands:
python c:/DLR/Sumo/tools/randomTrips.py -n test.net.xml -e 1200 -l
python c:/DLR/Sumo/tools/randomTrips.py -n test.net.xml -r test.rou.xml -e 1200 -l
These commands generate 1200 vehicles over 1200 seconds of simulation, but I want to generate 1200 vehicles within 100 seconds of simulation. How can I do that?
Just to answer this (mainly rephrasing what Julian Heinovski said in the comments)
randomTrips.py -n net.net.xml -o passenger.trips.xml -e 100
will generate 100 trips. If you want to make sure all of them are possible (connected in the network), you can add --validate, but this will remove invalid trips (and you may end up with fewer than 100). You can simply play around with the number then.
To let all of them start at second 0 you can edit the trips file using a regular expression replacement, replacing all departure times with 0s. On *nix the following will probably do:
sed -i 's/depart="[0-9]*/depart="0/' passenger.trips.xml
Now you can start sumo for the period of your choice
sumo -n net.net.xml -r passenger.trips.xml -e 1200
A number of vehicles randomly distributed over a specified period of time can be generated using the --begin, --end and --period options.
For example, to generate 1200 vehicles for 100 seconds of simulation, the following command can be used:
python randomTrips.py -n net.net.xml -r net.rou.xml -o net.trip.xml --begin=0 --end=100 --period=0.083333
In short,
number of generated vehicles = (end - begin) / period
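As a quick check for the command above: (100 - 0) / 0.083333 ≈ 1200 vehicles, which matches the requested count.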
I need help with my bash script.
The task is counting the total size of the files in a directory. I already did that part (using ls, awk and grep). My output may look like this, for example:
1326
40
598
258
12
$
These numbers are the sizes of the files in the directory.
I need to add them all up, and that is where I'm stuck.
So I would be really grateful if someone could tell me how to sum them all (and find the total size of the files in the directory).
Thank you
Well, in Unix shell programming, never forget the most basic philosophy:
Keep It Simple, Stupid!
In other words: use the right tool, one that does one thing but does it well. You could achieve what you want with a mix of ls or find, grep, awk, cut, sed and so on, or you can use the tool that has been designed for calculating file sizes.
And that tool is du:
% du -chs /directory
4.3G /directory
4.3G total
Note that this gives the total size of every file within every directory under the given path. If you want to limit it to just the files within the directory itself (and not the ones in subdirectories), you can do:
% du -chsS /directory
3G /directory
3G total
For more details, refer to the manual page (man du). Here are the arguments I'm using in this answer:
-c, --total produce a grand total
-h, --human-readable print sizes in human readable format (e.g., 1K 234M 2G)
-s, --summarize display only a total for each argument
If you remove -s you'll get a per-directory breakdown instead of a single summary (add -a if you also want a line for each file); if you remove -h you'll get exact sizes in blocks (1 KiB by default) instead of the rounded human-readable form; and if you remove -c you won't get the grand total (i.e. the total line at the end).
HTH
awk to the rescue!
awk '$1+0==$1{sum+=$1; count++} END{print sum, count}'
This adds up and counts all the numbers ($1+0==$1 is true for a number, but not for a string) and prints the sum and the count when done.
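For example, if the numbers from the question are saved in a file (sizes.txt is just a hypothetical name; you could equally pipe your existing ls/awk/grep output straight into the awk command):

$ awk '$1+0==$1{sum+=$1; count++} END{print sum, count}' sizes.txt
2234 5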
I have a text file with a series of floating point numbers – one per line – like so:
1
0.98
1.21
0.68
0.647
0.1
More specifically: I generate these lines using an awk call.
How would I go about extracting the largest of these numbers in a single call? Bonus points for extracting the top n values.
Try this: cat your_filename | sort -rn | head -1
Read about head: you can pass the number of lines you want to display.
Does it solve your problem?
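For the bonus part, a small sketch along the same lines (the 3 is just an example value for the top n):

$ sort -rn your_filename | head -n 3
1.21
1
0.98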
I'm trying to create a simple bash script to monitor the following: CPU Utilization, outbound network bandwidth, and inbound network bandwidth. The kicker: I have to use information from /proc/loadavg for the CPU and information from /proc for the bandwidth.
For the CPU Utilization, because it is supposed to be on a short time interval, I can use the first value from /proc/loadavg. Thing is, I'm not sure how to just get that one value so what I have so far is this:
CPU=sudo cat /proc/loadavg | sed 's///'
echo "CPU Utilization: $CPU %"
Where I'm not sure what the sed operation should be. Also I'm not sure how to format what I would get from that so that it would print as "16.5%"
For the bandwidth monitors I haven't the slightest clue of what I could use in /proc to get that sort of information so I'm open to all suggestions.
Load average
You don't need sudo to read /proc/loadavg.
In addition, sed is the wrong tool here; try using cut, for example:
$ cut -d' ' -f1 < /proc/loadavg
0.04
cut will cut lines by a delimiter (given with -d), in this case a space, and you can then use -f to select a field, in this case the first one.
Now, converting it to a percentage is actually fairly meaningless, since you'll often end up above 100% (see the comment below); I've seen load averages in excess of 50 (that would be 5000%?).
In all my years of UNIX/Linux experience, I can't recall ever seeing the load average being expressed as a percentage, and if I would encounter such a thing, I would find it very odd.
But if you really want to (you don't!), just multiply by 100 with dc, like so:
$ dc -e "`cut -d' ' -f1 < /proc/loadavg` 100 * p"
29.00
For the CPU Utilization, because it is supposed to be on a short time interval, I can use the first value from /proc/loadavg.
The load average is not the same thing as CPU usage.
A load average of 1 means there is one process waiting for something (usually the CPU or disk).
A load average of 2 means there are two processes waiting.
A load average of 0.5 (over the last minute), can mean that for 30 seconds, there was one process waiting, and for 30 seconds, there were no processes waiting. It can also mean that for 15 seconds there were two processes waiting, and for 45 seconds there were no processes waiting. The keyword here is average.
If you want to get the CPU utilization, then this is probably the most portable way:
$ top -bn2 | grep "Cpu(s)" | \
tail -n1 | \
sed "s/.*, *\([0-9.]*\)%* id.*/\1/" | \
awk '{print 100 - $1"%"}'
Note you need to use -n2 to get fairly accurate results: the first iteration reports CPU usage averaged since boot, and only the second one reflects usage over the sampling interval.
I've adapted this from this answer, which also lists some other possibilities, some simpler, but most tools mentioned aren't installed by default on most systems.
Network
For the bandwidth monitors I haven't the slightest clue of what I could use in /proc to get that sort of information so I'm open to all suggestions.
You can use the output of ifconfig, for example, on my system:
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.178.28 netmask 255.255.255.0 broadcast 192.168.178.255
inet6 2001:980:82cd:1:20c:29ff:fe9e:c84b prefixlen 128 scopeid 0x0<global>
inet6 fe80::20c:29ff:fe9e:c84b prefixlen 64 scopeid 0x20<link>
ether 00:0c:29:9e:c8:4b txqueuelen 1000 (Ethernet)
RX packets 45891 bytes 36176865 (34.5 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 20802 bytes 2603821 (2.4 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
It's the byte counts on the RX packets and TX packets lines that we want. Let's extract just those values:
$ ifconfig ens33 | grep -E '(R|T)X packets' | grep -Eo '\([0-9].*\)' | tr -d '()'
34.5 MiB
2.5 MiB
First we grep all the lines containing RX packets or TX packets.
With those lines, we then grep for a parenthesis \(, followed by a number [0-9], followed by any characters .*, followed by a closing parenthesis \). With the -o flag we show only the matching part, instead of the whole line.
With tr, we remove the unwanted parentheses.
This should be what you want. If you want the raw number of bytes instead, you can use a different pattern in the second grep. I'll leave it as an exercise for you to work out what exactly that is.
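Since the question specifically asks about /proc: the same counters are exposed in /proc/net/dev, and sampling them twice gives an actual bandwidth figure. The following is only a rough sketch (the interface name ens33 and the one-second interval are assumptions, not part of the original answer):

# In /proc/net/dev, after stripping "iface:", field 1 is RX bytes and field 9 is TX bytes
IFACE=ens33
read rx1 tx1 < <(awk -v i="$IFACE" '$0 ~ i":"{sub(/.*:/,""); print $1, $9}' /proc/net/dev)
sleep 1
read rx2 tx2 < <(awk -v i="$IFACE" '$0 ~ i":"{sub(/.*:/,""); print $1, $9}' /proc/net/dev)
echo "Inbound:  $(( rx2 - rx1 )) bytes/sec"
echo "Outbound: $(( tx2 - tx1 )) bytes/sec"

Because the interval is one second, the byte difference is already bytes per second; with a longer interval you would divide by it.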
Here's how you can print the first number output by cat /proc/loadavg as a percent value (but see @Carpetsmoker's caveat above regarding whether that makes sense), rounded to 1 decimal place:
printf "1-minute load average: %.1f%%\n" \
$(bc <<<"$(cut -d ' ' -f 1 /proc/loadavg) * 100")