the command wc's function - shell

I have been running into many shell scripts lately that use the wc command, which is really awesome. For example:
In the last 20 hours or so, 16 new users registered with FAS:
$ cat messages | grep \
'fedora-infrastructure:org.fedoraproject.prod.fas.user.create'|wc -l
16 /*output*/
Or, to count the lines of code in the current directory:
find . -type f -exec cat '{}' \; | wc -l
I'm a newbie to Linux, so I want to know what kind of amazing things wc can do (with the support of other commands), beyond the basic usage that comes from the man page.

(Edit: It sounds like you might benefit most from understanding what pipes are, and how they work)
Try this on your command line:
man wc
It gives you the manual page for wc, with which you can find out all the things you can do with the tool.
And here are a few basic idioms / mental models to know when you are starting out learning Linux:
wc is one tool. There are many tools you can directly interact with in your shell, such as: ls (lists contents of current directory), pwd (prints your current working directory), date (prints the current time), and more advanced ones such as awk, sed, grep, tr.
The | syntax (but not ||!) is called a "pipe":
By default, commands can either read from stdin (standard input) or read from file, or don't need to read anything at all (ls, for example, doesn't require input)
So when you do something like ls | wc -l, it's actually saying:
run ls
take the output of ls, which would normally be written to stdout (standard output), and "pipe it" directly into the stdin of wc, which, together with the -l option to wc, counts the number of lines.
There is no exhaustive list of other commands that wc can interact with, because of the pipes-and-filters paradigm in shell languages. Any time you see something like ... | wc, it just means that whatever the program before the | outputs is fed to wc as input.
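To make the pipe idea concrete, here is a minimal sketch. The sample log lines are made up for illustration; the point is only that each | hands one command's output to the next command's input.

```shell
# Three lines of text go into the pipe; wc -l counts them.
# tr -d ' ' strips the padding that some wc implementations add.
line_count=$(printf 'alpha\nbeta\ngamma\n' | wc -l | tr -d ' ')
echo "$line_count"

# A longer pipeline: grep keeps only the matching lines, wc -l counts what is left.
match_count=$(printf 'user.create alice\nuser.login bob\nuser.create carol\n' |
    grep 'user.create' | wc -l | tr -d ' ')
echo "$match_count"
```

This is the same shape as the grep ... | wc -l command in the question: filter first, count afterwards.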

Unix tools are designed to be very simple, and wc is a very good example of this.
wc = Word count
wc -l = Line count
In unix you can direct output from one command to another using |
The idea is that you combine these small tools together to achieve results way beyond their individual capacities.
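As a rough sketch of that idea (the sample text is invented), wc on its own can count lines, words or bytes, and putting tr and sort in front of it answers a question none of them answers alone:

```shell
text='one two
two three two'

# wc by itself: -l counts lines, -w counts words, -c counts bytes.
lines=$(printf '%s\n' "$text" | wc -l | tr -d ' ')
words=$(printf '%s\n' "$text" | wc -w | tr -d ' ')
bytes=$(printf '%s\n' "$text" | wc -c | tr -d ' ')

# Combined with other tools: put one word per line, drop duplicates, count the rest.
distinct=$(printf '%s\n' "$text" | tr ' ' '\n' | sort -u | wc -l | tr -d ' ')

echo "lines=$lines words=$words bytes=$bytes distinct=$distinct"
```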
find
Is a very powerful command with many options, but essentially finds filenames for files matching the predicates and options specified.
man
man is a help system built into Unix. Just type man wc to get info about wc. You can use it for most commands available from the command line, with only a few exceptions.
Zsh, Antigen & Oh-My-Zsh
Unix is a lot easier with a good shell and helpful tools. I recommend you use Zsh; the easy way to get this set up is to use Antigen and Oh-My-Zsh (Antigen will help install Oh-My-Zsh and a bunch of other tools, so follow its instructions.)
Once you have it set up, you have Tab auto-completion showing commands and command options (for many tools, such as git, find, etc.)
This will very quickly transform the murky darkness of the shell into a vibrant environment to work in.
How to learn cool combinations of commands?
Well, to get started, always remember you have basic looping and conditions available on the unix shell.
loops
while
for
until
select
conditions
if
case
These commands usually work with filename patterns, e.g. *.txt and also often work with piped | input.
For example, if you wanted to perform a complex operation, let's say rename a set of files replacing a given pattern:
for f in *bak*; do mv -- "$f" "${f/bak/}"; done
Would remove the word bak from all the filenames in the current folder.
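If you want to try a rename loop without risking real files, something like the sketch below may help. It is a variation, not the exact command above: it runs in a scratch directory, strips a made-up .bak infix, and computes the new name with sed so that it also works in shells that lack the ${var/pattern/} expansion.

```shell
# Work in a throwaway directory so nothing real gets renamed.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch 'notes.bak.txt' 'my report.bak.txt' 'plain.txt'

for f in *.bak.*; do
    # Compute the new name by deleting the first ".bak" from the old one.
    new=$(printf '%s\n' "$f" | sed 's/\.bak//')
    # Quote both names so spaces survive; -- protects names that start with "-".
    mv -- "$f" "$new"
done

ls
```

The quoting is the important part: without the double quotes, a name like "my report.bak.txt" would be split into two arguments.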
Hidden gold
There are two commands sed and awk which are almost languages in their own right. It is not necessary to know them inside out, and many things they do can be replicated by simpler commands, such as tr and cut. However, understanding how they basically work is a very handy thing to know. In fact, Sed and Awk are so powerful, they even have an O'Reilly book dedicated to them.
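A small sketch of that overlap, using an invented passwd-style line: cut and tr cover the simple cases, while awk and sed do the same jobs and go further when you need them to.

```shell
line='alice:x:1000:1000:Alice Example:/home/alice:/bin/bash'

# Field extraction: cut and awk both pull out the first colon-separated field.
user_cut=$(printf '%s\n' "$line" | cut -d: -f1)
user_awk=$(printf '%s\n' "$line" | awk -F: '{ print $1 }')

# Character replacement: tr and sed both turn every colon into a space.
spaced_tr=$(printf '%s\n' "$line" | tr ':' ' ')
spaced_sed=$(printf '%s\n' "$line" | sed 's/:/ /g')

# Something cut cannot express directly: "the last field", whatever its number.
shell_awk=$(printf '%s\n' "$line" | awk -F: '{ print $NF }')

printf '%s\n' "$user_cut" "$spaced_tr" "$shell_awk"
```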
Where to find examples of Unix command line awesomeness?
A good place to look for examples is command-line fu
Good luck

Related

How to not lose color when pipe output to variable [duplicate]

If I do
$ ls -l --color=always
I get a list of files inside the directory with some nice colouring for different file types etc..
Now, I want to be able to pipe the coloured output of ls through grep to filter out some files I don't need. The key is that I still want to preserve the colouring after the grep filter.
$ ls -l --color=always | grep -E some_regex
^ I lose the colouring after grep
EDIT: I'm using headless-server Ubuntu 8.10, Bash 3.2.39, pretty much a stock install with no fancy configs
Your grep is probably removing ls's color codes because it has its own coloring turned on.
You "could" do this:
ls -l --color=always | grep --color=never pattern
However, it is very important that you understand what exactly you're grepping here. Not only is grepping ls unnecessary (use a glob instead), this particular case is grepping through not only filenames and file stats, but also through the color codes added by ls!
The real answer to your question is: Don't grep it. There is never a need to pipe ls into anything or capture its output. ls is only intended for human interpretation (eg. to look at in an interactive shell only, and for this purpose it is extremely handy, of course). As mentioned before, you can filter what files ls enumerates by using globs:
ls -l *.txt # Show all files with filenames ending with `.txt'.
ls -l !(foo).txt # Show all files with filenames that end on `.txt' but aren't `foo.txt'. (This requires `shopt -s extglob` to be on, you can put it in ~/.bashrc)
I highly recommend you read these two excellent documents on the matter:
Explanation of the badness of parsing ls: http://mywiki.wooledge.org/ParsingLs
The power of globs: http://mywiki.wooledge.org/glob
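For shells where extglob is not available, a rough equivalent of the !(foo).txt filter is a glob plus a case statement. The sketch below uses made-up file names in a scratch directory; ls still does the coloured display, but the selection is done by the shell, not by grep.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a.txt 'b c.txt' foo.txt notes.md

kept=''
for f in *.txt; do
    case $f in
        foo.txt) continue ;;    # skip the one file we do not want listed
    esac
    ls -ld -- "$f"              # add --color=always here if your ls supports it
    kept="$kept$f;"
done
```

Because the file names never pass through a text filter, names with spaces or colour codes cannot confuse the selection.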
You should check if you are really using the "real" ls, just by directly calling the binary:
/bin/ls ....
Because: The code you described really should work, unless ls ignores --color=always for some weird reason or bug.
I suspect some alias or function that adds (directly or through a variable) some options. Double-check that this isn't the case.

Unix Epoch to date with sed

I want to convert a Unix epoch timestamp to a normal date.
I'm trying:
sed < file.json -e 's/\([0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]/`date -r \1`/g'
any hint?
With the limited information in your post I cannot give you a better answer than this, but it is possible to execute commands using sed!
There are different ways to do it:
use sed's e instruction (a GNU sed extension) directly, followed by the command to be executed; if you do not pass a command to e, it treats the contents of the pattern buffer as an external command.
use a simple substitute command with sed and pipe the output to sh.
Example 1:
echo 12687278 | sed "s/\([0-9]\{8,\}\)/date -d @\1/;e"
Example 2:
echo 12687278 | sed "s/\([0-9]\{8,\}\)/date -d @\1/" | sh
Test 1 (with Japanese locale LC_TIME=ja_JP.UTF-8):
Test 2 (with Japanese locale LC_TIME=ja_JP.UTF-8):
Remarks:
I will let you adapt the date command according to your system's specifications
Since modern dates are longer than 8 characters, the sed command uses an
open ended length specifier of at least 8, rather than exactly 8.
Allan has a nice way to tackle dynamic arguments: write a script dynamically and pipe it to a shell! It works. It tends to be a bit more insecure, because you could potentially pipe unintentional shell components to sh. For example, if rm -f some-important-file was in the file along with the numbers, the sed pipeline wouldn't change that line, and it would also be passed to sh along with the date commands. Obviously, this is only a concern if you don't control the input. But mistakes can happen.
A similar method I much prefer is with xargs. It's a bit of a head trip for new users, but very powerful. The idea behind xargs is that it takes its input from its standard in, then adds it to the command comprised of its own non-option arguments and runs the command(s). For instance,
$ echo -e "/tmp\n/usr/lib" | xargs ls -d
/tmp /usr/lib
It's a trivial example of course, but you can see more exactly how this works by adding an echo:
echo -e "/tmp\n/usr/lib" | xargs echo ls -d
ls -d /tmp /usr/lib
The input to xargs becomes the additional arguments to the command specified in xargs's own arguments. Read that twice if necessary, or better yet, fiddle with this powerful tool, and the light bulb should come on.
Here's how I would approach what you're doing. Of course I'm not sure if this is actually a logical thing to do in your case, but given the detail you went into in your question, it's the best I can do.
$ cat dates.txt
Dates:
1517363346
I can run a command like this:
$ sed -ne '/^[0-9]\{8,\}$/ p' < dates.txt | xargs -I % -n 1 date -d @%
Tue Jan 30 19:49:06 CST 2018
Makes sense, because I used the command echo -e "Dates:\n$(date +%s)" > dates.txt to make the file a few minutes before I wrote this post! Let's go through it together and I'll break down what I'm doing here.
For one thing, I'm running sed with -n. This tells it not to print the lines by default. That makes this script work if not every line has an 8+ digit "date" in it. I also added anchors to the start (^) and end ($) of the regex so the line had only the appropriate digits (I realize this may not be perfect for you, but without understanding your input, I can't do better). These are important changes if your file is not entirely comprised of date strings. Additionally, I am matching at least 8 characters, as modern date strings are going to be more like 10 characters long. Finally, I added a command p to sed. This tells it to print the matching lines, which is necessary because I specifically said not to print the nonmatching lines.
The next bit is the xargs itself. The sed will write a date string out to xargs's standard input. I set only a few settings for xargs. By default it will add the standard input to the end of the command, separated by a space. I didn't want a space, so I used -I to specify a replacement string. % doesn't have a special meaning; it's just a placeholder that gets replaced with the input. I used % because it's not a special character but is rarely used in commands. Finally, I added -n 1 to make sure only 1 input was used per execution of date. (xargs can also add many inputs together, as in my ls example above.)
The end result? Sed matches lines that consist, exclusively, of 8 or more numeric values, outputting the matching lines. The pipe then sends this output to xargs, which takes each line separately (-n 1) and, replacing the placeholder (-I %) with each match, then executes the date command.
This is a shell pattern I really like, and use every day, and with some clever tweaks, can be very powerful. I encourage anyone who uses linux shell to get to know xargs right away.
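The same filter-then-convert idea can also be sketched as a plain while read loop, which some people find easier to follow than xargs. The epoch_to_date helper is a made-up name for illustration, and it rests on an assumption: that your date is either GNU date (which accepts -d @SECONDS) or BSD/macOS date (which accepts -r SECONDS). It prints UTC so the result does not depend on the local time zone.

```shell
# Hypothetical helper: convert one epoch value to a UTC timestamp.
# Tries the GNU form first, then falls back to the BSD/macOS form.
epoch_to_date() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S' 2>/dev/null ||
        date -u -r "$1" '+%Y-%m-%d %H:%M:%S'
}

sample=$(mktemp)
printf 'Dates:\n1517363346\n' > "$sample"

# grep keeps only lines made up of 8 or more digits; the loop converts each one.
converted=$(grep -E '^[0-9]{8,}$' "$sample" | while read -r ts; do
    epoch_to_date "$ts"
done)

echo "$converted"
rm -f "$sample"
```

For the sample value this prints 2018-01-31 01:49:06, which is the same instant as the CST output shown above, six hours later on the clock because it is UTC.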
There is another option for GNU sed users. While the BSD land folks were pretty true to their old BSD unix roots, the GNU folks, who wrote their userspace from scratch, added many wonderful enhancements to the standards. GNU Sed can apparently run a subshell command for you and then do the replacement for you, which would be dramatically easier. Since you are using the bsd style date invocation, I'm going to assume you don't have gnu sed at your disposal.
Using sed: tested with macOs only
There is a slight difference with the date command: on macOS it takes the -r flag instead of -d.
echo 12687278 | sed "s/\([0-9]\{8,\}\)/$(date -r \1)/g"
Results:
Thu Jan 1 09:00:01 JST 1970

grep - how to output progress bar or status

Sometimes I'm grep-ing thousands of files and it'd be nice to see some kind of progress (bar or status).
I know this is not trivial because grep outputs the search results to STDOUT and my default workflow is that I output the results to a file and would like the progress bar/status to be output to STDOUT or STDERR .
Would this require modifying source code of grep?
Ideal command is:
grep -e "STRING" --results="FILE.txt"
and the progress:
[curr file being searched], number x/total number of files
written to STDOUT or STDERR
This wouldn't necessarily require modifying grep, although you could probably get a more accurate progress bar with such a modification.
If you are grepping "thousands of files" with a single invocation of grep, it is most likely that you are using the -r option to recursively search a directory structure. In that case, it is not even clear that grep knows how many files it will examine, because I believe it starts examining files before it explores the entire directory structure. Exploring the directory structure first would probably increase the total scan time (and, indeed, there is always a cost to producing progress reports, which is why few traditional Unix utilities do this.)
In any case, a simple but slightly inaccurate progress bar could be obtained by constructing the complete list of files to be scanned and then feeding them to grep in batches of some size, maybe 100, or maybe based on the total size of the batch. Small batches would allow for more accurate progress reports but they would also increase overhead since they would require additional grep process start-up, and the process start-up time can be more than grepping a small file. The progress report would be updated for each batch of files, so you would want to choose a batch size that gave you regular updates without increasing overhead too much. Basing the batch size on the total size of the files (using, for example, stat to get the filesize) would make the progress report more exact but add an additional cost to process startup.
One advantage of this strategy is that you could also run two or more greps in parallel, which might speed the process up a bit.
In broad terms, here is a simple script (which just divides the files by count, not by size, and which doesn't attempt to parallelize):
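# Requires bash 4 and GNU grep
shopt -s globstar
files=(**)
total=${#files[@]}
for ((i=0; i<total; i+=100)); do
  # Progress goes to stderr so it does not mix with the results
  echo "$i/$total" >&2
  # Search the next batch of up to 100 entries, skipping directories
  grep -d skip -e "$pattern" "${files[@]:i:100}" >>results.txt
done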
For simplicity, I use a globstar (**) to safely put all the files in an array. If your version of bash is too old, then you can do it by looping over the output of find, but that's not very efficient if you have lots of files. Unfortunately, there is no way that I know of to write a globstar expression which only matches files. (**/ only matches directories.) Fortunately, GNU grep provides the -d skip option which silently skips directories. That means that the file count will be slightly inaccurate, since directories will be counted, but it probably doesn't make much difference.
You probably will want to make the progress report cleaner by using some console codes. The above is just to get you started.
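One simple way to tidy up the report, sketched below, is to start each update with a carriage return so it overwrites the previous one. report_progress is a name made up for this example, and the loop body is only a placeholder for the grep batch.

```shell
# Print "done/total files (percent%)" on one line, overwriting the previous report.
# \r moves the cursor back to the start of the line; writing to stderr keeps
# the progress text out of whatever is redirected from stdout.
report_progress() {
    printf '\r%d/%d files (%d%%)' "$1" "$2" $(( $1 * 100 / $2 )) >&2
}

total=250
i=0
while [ "$i" -lt "$total" ]; do
    report_progress "$i" "$total"
    # ... grep one batch of up to 100 files here ...
    i=$(( i + 100 ))
done
report_progress "$total" "$total"
printf '\n' >&2
```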
The simplest way to divide that into different processes would be to just divide the list into X different segments and run X different for loops, each with a different starting point. However, they probably won't all finish at the same time so that is sub-optimal. A better solution is GNU parallel. You might do something like this:
find . -type f -print0 |
parallel --progress -L 100 -m -j 4 grep -e "$pattern" > results.txt
(Here -L 100 specifies that up to 100 files should be given to each grep instance, and -j 4 specifies four parallel processes. I just pulled those numbers out of the air; you'll probably want to adjust them.)
Try the parallel program
find * -name \*.[ch] | parallel -j5 --bar '(grep grep-string {})' > output-file
Though I found this to be slower than a simple
find * -name \*.[ch] | xargs grep grep-string > output-file
This command shows the progress (speed and offset), but not the total amount. This could be manually estimated however.
dd if=/input/file bs=1c skip=<offset> | pv | grep -aob "<string>"
I'm pretty sure you would need to alter the grep source code. And those changes would be huge.
Currently grep does not know how many lines a file has until it has finished parsing the whole file. For your requirement it would need to parse the file twice, or at least determine the full line count some other way.
The first time it would determine the line count for the progress bar. The second time it would actually do the work and search for your pattern.
This would not only increase the runtime but violate one of the main UNIX philosophies.
Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new "features". (source)
There might be other tools out there for your need, but afaik grep won't fit here.
I normally use something like this:
grep | tee "FILE.txt" | cat -n | sed 's/^/match: /;s/$/ /' | tr '\n' '\r' 1>&2
It is not perfect, as it only displays the matches, and if they are too long or differ too much in length there are display errors, but it should give you the general idea.
Or simple dots:
grep | tee "FILE.txt" | sed 's/.*//' | tr '\n' '.' 1>&2

BASH file attribute gymnastics: How do I easily get a file with full paths and privileges?

Dear Masters of The Command Line,
I have a directory tree for which I want to generate a file that contains only two entries per line: the full path of each file and the corresponding privileges of said file.
For example, one line might contain:
/v1.6.0.24/lib/mylib.jar -r-xr-xr-x
The best way to generate the left hand column there appears to be find. However, because ls doesn't seem to have a capability to either read a list of filenames or take stdin, it looks like I have to resort to a script that does this for me. ...Cumbersome.
I was sure I've seen people somehow get find to run a command against each file found but I must be daft this morning as I can't seem to figure it out!
Anyone?
In terms of reading said file there might be spaces in filenames, so it sure would be nice if there was a way to get some of the existing command-line tools to count fields right to left. For example, we have cut. However, cut is left-hand-first and won't take a negative number to mean start the numbering on the right (as seems the most obvious syntax to me). ... Without having to write a program to do it, are there any easy ways?
Thanks in advance, and especial thanks for explaining any examples you may provide!
Thanks,
RT
GNU findutils 4.2.5+:
find -printf "$PWD"'/%p %M\n'
It can also be done with ls and awk:
ls -l -d $PWD/* | awk '{print $9 " " $1}' > my_files.txt
stat -c %A file
Will print file permissions for file.
Something like:
find . -exec echo -ne '{}\t\t' ';' -exec stat -c %A {} ';'
Will give you a badly formatted version of what you're after.
It is made much trickier because you want everything aligned in tables. You might want to look into the 'column' command. TBH I would just relax my output requirements a little bit. Formatting output in SH is a pain in the ass.
bash 4
shopt -s globstar
for file in /path/**
do
  stat -c "%n %A" "$file"
done
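On the second half of the question (counting fields from the right when paths may contain spaces), one possible approach is to split each line at its last space. The sketch below uses an invented line in the "path permissions" layout; it relies on the permissions string itself never containing a space.

```shell
line='/v1.6.0.24/lib/my lib.jar -r-xr-xr-x'

# ${line##* } deletes the longest prefix that ends in a space: only the last field is left.
perms=${line##* }
# ${line% *} deletes the shortest suffix that starts with a space: everything before it is left.
path=${line% *}

# awk can also address the last field directly as $NF.
perms_awk=$(printf '%s\n' "$line" | awk '{ print $NF }')

printf 'path=%s\nperms=%s\n' "$path" "$perms"
```

Both expansions are plain POSIX shell, so no extra program is needed for the split.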
