Explain how this command finds the 5 most CPU intensive processes - bash

If I have the command
ps | sort -k 3,3 | tail -n 5
How does this find the 5 most CPU-intensive processes?
I get that it is taking all the processes, sorting them based on a column through the -k option, but what does 3,3 mean?

You can find what you are looking for in the official manual of sort (info sort on Linux); in particular, you are interested in the following extracts:
‘-k POS1[,POS2]’
‘--key=POS1[,POS2]’
Specify a sort field that consists of the part of the line between
POS1 and POS2 (or the end of the line, if POS2 is omitted),
_inclusive_.
and, skipping a few paragraphs,
Example: To sort on the second field, use ‘--key=2,2’ (‘-k 2,2’).
See below for more notes on keys and more examples. See also the
‘--debug’ option to help determine the part of the line being used
in the sort.
So, basically, 3,3 specifies that only the third column is considered for sorting; the other columns are ignored.
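For a concrete, hedged illustration (the BSD-style ps aux output, where column 3 is %CPU, is an assumption about your system; plain ps prints PID/TTY/TIME/CMD instead):
# Sort numerically on column 3 only, then keep the last (largest) five lines.
ps aux | sort -n -k 3,3 | tail -n 5
# GNU sort's --debug option underlines the exact part of each line used as the sort key:
ps aux | sort -n -k 3,3 --debug | less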

Related

Use cut and grep to separate data while printing multiple fields

I want to be able to separate data by weeks. The week is stated in a specific field on every line, and I would like to know how to use grep, cut, or anything else relevant on JUST the field the week is specified in, while still keeping the rest of the data on each line. I need to be able to pipe the information in via | because that's how the rest of my program works.
As the output gets processed, it should look something like this:
asset.14548.extension 0
asset.40795.extension 0
asset.98745.extension 1
I want to be able to sort those names by their week number while still keeping the asset name in my output, because the number of times each asset shows up gets counted. My problem is that I can't make my program smart enough to take just the "1" from the week number while ignoring the "1" located in the asset name.
UPDATE
The closest answer I found was
grep "^.........................$week" ;
That's good, but it relies on every string being the same length. Is there a way I can have it start from the right instead of the left? If so, that would answer my question.
^ tells grep to anchor the match at the start of the line, and each . matches any single character, so it effectively ignores whatever is in that position.
I found what I was looking for in some documentation. Anchor matches!
grep "$week$" file
would output this if $week was 0
asset.14548.extension 0
asset.40795.extension 0
I couldn't find my exact question, or a closely similar question with a simple answer, so hopefully this helps the next person scratching their head over this.
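A hedged alternative, not part of the answer above: have awk compare only the week field, so a "1" inside an asset name can never match. This assumes the week number is the last whitespace-separated field, as in the sample output.
# Hypothetical awk variant: print only lines whose last field equals $week.
awk -v week="$week" '$NF == week' file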

Sort scientific and float

I have been trying desperately to use the sort command to sort a mixture of scientific-notation and plain floating-point values, both positive and negative, e.g.:
-2.0e+00
2.0e+01
2.0e+02
-3.0e-02
3.0e-03
3.0e-02
Without the floating-point values, or without the scientific exponent, it works just fine with
sort -k1 -g file.dat. Using both at once, as stated above, results in:
-3.0e-02
-2.0e+00
2.0e+01
2.0e+02
3.0e-02
3.0e-03
This is obviously wrong since it should be:
-2.0e+00
-3.0e-02
3.0e-03
3.0e-02
...
Any idea how I can solve this issue? And once I solve it, is there any possibility to sort by the absolute value (e.g. get rid of the negative ones)? I know I could square each value, sort, and then take the square root, but doing this I would lose precision, and it would be neat to have a nice, fast and straightforward way.
My sort version: 8.12, Copyright © 2011
Thank you very much!
UPDATE: if I run it in debug mode, sort -k1 -g filename.dat --debug, I get the following result (translated into English; the original output was German):
sort: using 'de_DE.UTF-8' sorting rules
sort: key 1 is numeric and spans multiple fields
-3.0e-02
__
________
-2.0e+00
__
________
2.0e+01
_
_______
2.0e+02
_
_______
3.0e-02
_
_______
3.0e-03
_
_______
Based on comments under the question, this is a locale issue: sort is using a locale which expects , as the decimal separator, while your text uses .. The ideal solution would be to make sort use a different locale, and hopefully someone will write a correct answer covering that.
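For reference, a minimal sketch of that locale approach (assuming a GNU system; the C locale always uses . as the decimal separator):
# Run sort under a locale whose decimal separator is ".".
LC_ALL=C sort -k1 -g file.dat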
But if you can't, or don't want to, change how sort works, then you can change the input it gets. The easiest way is to have sort take its input from a pipe and modify the data on the way. Here it is enough to change every . to ,, so the tool of choice is tr:
cat file.dat | tr . , | sort -k1 -g
This solution has one big drawback: if the command is executed in a locale where sort uses . as the decimal separator, then instead of fixing the sorting, this will break it. So if you are writing a shell script which may be used elsewhere, don't do this.
Important note: the above command has an unnecessary use of cat; tr . , < file.dat | sort -k1 -g does the same thing. Anyone who wants to be taken seriously as a professional shell script programmer shouldn't do that!
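As for the absolute-value part of the question, a hedged sketch (not part of the answer above): strip the leading minus sign before sorting. This discards the sign entirely, which matches the "get rid of the negative ones" wording, but it does not preserve the original signed values.
# Remove a leading "-" from each line, then sort as general numbers.
sed 's/^-//' file.dat | LC_ALL=C sort -g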

Find the most hit url in a large file

I was reading this Yelp interview on Glassdoor
"We have a fairly large log file, about 5GB. Each line of the log file contains an url which a user has visited on our site. We want to figure out what's the most popular 100 urls visited by our users. "
and one of the solutions is
cat log | sort | uniq -c | sort -k2n | head 100
Can someone explain to me what the purpose of the second sort (sort -k2n) is?
Thanks!
It looks like the stages are:
1) get the log file into the filter
2) get identical filenames together
3) count the number of occurrences of each different filename
4) Sort the pairs (filename, number of occurrences) by number of occurrences
5) Print out the 100 most common filenames
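For comparison, a hedged variant of the quoted pipeline (assuming each log line contains just the URL): it sorts on the count that uniq -c puts into field 1, in reverse numeric order, so that head really does keep the 100 most popular URLs.
# Count duplicate lines, sort by the count (field 1) largest-first, keep the top 100.
sort log | uniq -c | sort -k1,1nr | head -n 100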

Sort by specific column [duplicate]

This question already has answers here:
How to sort numeric and literal columns in Vim
(5 answers)
Closed 9 years ago.
If I have a text file with several tab-separated columns like this:
1 foo bar
3 bar foo
How would I sort based on the second or third column?
I read something like using :'<,'>!sort -n -k 2 in visual mode or :sort /.*\%2v/, but none of these commands seem to work.
You can use the built-in sort command.
To sort by the second tab-delimited column you can use :sort /[^\t]*\t/.
To sort by the third column you can use :sort /\([^\t]*\t\)\{2}/ (the \(...\) group is needed so that \{2} repeats the whole "column plus tab" pattern rather than just the tab).
Generally, set the repeat count to the column number minus one, i.e. the number of whole columns to skip before the one you want to sort on.
Sadly, it doesn't seem to be possible with visual blocks inside the same file and/or with one command, because Ex commands are linewise, i.e. Ctrl-v + selection + :'<,'>sort would still sort whole lines either way.
A somewhat hacky "solution" would be to select whatever you want to sort with a visual block, sort it in another window and apply the changes to your original file. Something like this:
Ctrl-v + selection + x + :tabnew + p + :sort + Ctrl-vG$x + :q + `[P (align paste)
Source: Barry Arthur - Sort Me A Column (bairui in #vim on Freenode).
The external sort, called via :'<,'>!sort -k 2, does work. The result is only unexpected if the -n flag (numeric sorting) is given while the column you want to sort on is non-numeric, so to use the external sort here, just drop -n from your example.
Remark: :'<,'>sort /.*\%2v/ also works for me.
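Since the external route simply filters the selected lines through sort(1), you can prototype the command in a shell first. A hedged example with an explicit tab delimiter (file.txt stands in for your data, and tab-separated columns are assumed, as in the question):
# Sort a tab-separated file by its second, then by its third, column.
sort -t $'\t' -k2,2 file.txt
sort -t $'\t' -k3,3 file.txt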

Automatic update data file and display?

I am trying to develop an application that lets a user book ticket seats. I am trying to implement an automatic function where the app chooses the best seats for the user.
My current database (seats.txt) stores the seats this way (not sure if it's a good format):
X000X
00000
0XXX0
where X means the seat is occupied and 0 means the seat is free.
After the user logs in to my system and chooses "Choose best for you", the user is prompted to enter how many seats he/she wants (I have done this part). Now, if the user enters 2, I check from the first row whether there are enough empty seats, and if so, I assign them (this is a simplistic approach; once I get it working, I will write a better automatic-booking algorithm).
I have tried to play with sed, awk and grep, but I just can't get it to work (I am new to bash programming; I started learning it three days ago).
Can anyone help?
FYI: the seats.txt format doesn't have to be that way. It could also store all the seats in one row, like: X000X0XXX00XXX
Thanks =)
Here's something to get you started. This script reads each seat from your file and displays whether it's taken or empty, keeping track of the row and column numbers all the while.
#!/bin/bash
let ROW=1
let COL=1
# Read one character at a time into the variable $SEAT.
while read -n 1 SEAT; do
    # Check whether $SEAT is an X, a 0, or something else.
    case "$SEAT" in
        # Taken.
        X)  echo "Row $ROW, col $COL is taken"
            let COL++
            ;;
        # Empty.
        0)  echo "Row $ROW, col $COL is EMPTY"
            let COL++
            ;;
        # Anything else is the empty string that read returns at the end of a
        # line ('\n'), so move on to the next row.
        *)  let ROW++
            let COL=1
            ;;
    esac
done < seats.txt
Notice that we feed in seats.txt at the end of the script, not at the beginning. It's weird, but that's UNIX for ya. Curiously, the entire while loop behaves like one big command:
while read -n 1 SEAT; do {stuff}; done < seats.txt
The < at the end feeds in seats.txt to the loop as a whole, and specifically to the read command.
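With the seats.txt from the question, a run of the script (saved here under the assumed name seats.sh) should produce output roughly like this:
$ bash seats.sh
Row 1, col 1 is taken
Row 1, col 2 is EMPTY
Row 1, col 3 is EMPTY
Row 1, col 4 is EMPTY
Row 1, col 5 is taken
Row 2, col 1 is EMPTY
...
Row 3, col 5 is EMPTY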
It's not really clear what help you're asking for here. "Can anyone help?" is a very broad question.
If you're asking if you're using the right tools then yes, the text processing tools (sed/awk/grep et al) are ideal for this given the initial requirement that it be done in bash in the first place. I'd personally choose a different baseline than bash but, if that's what you've decided, then your tool selection is okay.
I should mention that bash itself can do a lot of the things you'll probably be doing with the text processing tools and without the expense of starting up external processes. But, since you're using bash, I'm going to assume that performance is not your primary concern (don't get me wrong, bash will probably be fast enough for your purposes).
I would probably stick with the multi-line data representation, for two reasons. The first is that simple text searches for two adjacent seats will be easier if you keep the rows separate from each other. Otherwise, in the 5-seat-by-2-row layout XXXX00XXXX, a simplistic search would consider those two 0 seats adjacent despite the fact that they're nowhere near each other:
XXXX0
0XXXX
Secondly, some people consider the row to be very important. I won't sit in the first five rows at the local cinema simply because I have to keep moving my head to see all the action.
By way of example, you can get the front-most row with two consecutive empty seats with (the commands are split across lines for readability):
pax> cat seats.txt
X000X
00000
0XXX0
pax> expr $(
       (echo '00000'; cat seats.txt) |
       grep -n 00 |
       tail -1 |
       sed 's/:.*//'
     ) - 1
2
The expr magic and extra echo are to ensure you get back 0 if no seats are available. And you can get the first position in that row with:
pax> cat seats.txt |
     grep 00 |
     tail -1 |
     awk '{print index($0,"00")}'
3
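Building on those one-liners, here is a hedged awk sketch (not part of the answer above) that reports both the row and the starting column of the first run of free seats, scanning seats.txt from the top; the shell variable N is a hypothetical parameter for the number of seats wanted.
# Find the first row, from the top of the file, containing N consecutive
# free seats ("0"), and the 1-based column where that run starts.
N=2
awk -v n="$N" '
BEGIN { for (i = 0; i < n; i++) run = run "0" }   # build a string of n zeros
{
    col = index($0, run)                          # 0 if this row has no such run
    if (col) { print "row " NR ", col " col; exit }
}' seats.txt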
