Is there an option for uniq -c (or an alternative) that doesn't pad the count with leading whitespace? Currently I pipe it through sed, like so:
sort | uniq -c | sed 's/^ *\([0-9]*\) /\1 /'
But this seems kinda redundant, particularly given how frequently I have to do this.
You can try to make the sed command as short as possible with
sort | uniq -c | sed 's/^ *//'
If you have GNU grep, you can also use the -P flag:
sort | uniq -c | grep -Po '\d.*'
(Do not use awk '{$1=$1};1'. It also collapses runs of whitespace inside the line, so it trims more than you want.)
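To see the difference, here is a small demonstration on made-up input (two identical lines with a double space inside; the sample data is an assumption, not from the question):

```shell
# sed only removes the padding in front of the count.
printf 'a  b\na  b\n' | uniq -c | sed 's/^ *//'      # prints: 2 a  b

# awk rebuilds the whole line with single spaces between fields.
printf 'a  b\na  b\n' | uniq -c | awk '{$1=$1};1'    # prints: 2 a b
```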
When you need this often, you can make a function or script calling
sort | uniq -c | sed 's/^ *//'
or only
uniq -c | sed 's/^ *//'
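A minimal sketch of such a function, assuming a POSIX-style shell; the name count_uniq is made up for illustration:

```shell
# count_uniq is a hypothetical name; define it in ~/.bashrc or a similar startup file.
# Sorts the named files (or stdin), counts duplicate lines, and strips uniq's padding.
count_uniq() {
    sort "$@" | uniq -c | sed 's/^ *//'
}
```

For example, printf 'b\na\nb\n' | count_uniq prints "1 a" and "2 b" on separate lines.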
I have a grep command that finds the files that need a value replaced. Then I have a perl one-liner that needs to be executed on each file to replace a variable found in that file.
How can I pipe the results of my grep command to the perl one-liner?
grep -Irc "/env/file1/" /env/scripts/ | cut -d':' -f1 | sort | uniq
/env/scripts/config/MainDocument.pl
/env/scripts/config/MainDocument.pl2
/env/scripts/config/MainDocument.pl2.bak
perl -p -i.bak -e 's{/env/file1/}{/env/file2/}g' /env/scripts/config/MainDocument.pl
Thanks for your help.
Use bash command substitution, $(...):
perl -p -i.bak -e 's{/env/file1/}{/env/file2/}g' $(grep -Irc "/env/file1/" /env/scripts/ | cut -d':' -f1 | sort | uniq)
I'd drop the perl one-liner and use xargs and sed instead:
grep -Irc "/env/file1/" /env/scripts/ | cut -d':' -f1 | sort | uniq | xargs sed -i.bak 's:/env/file1/:/env/file2/:g'
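Both answers break on file names containing whitespace, and grep -c also lists files with a count of 0. A hedged variant, assuming GNU grep, GNU sed and an xargs that supports -0 and -r, lets grep list only the matching files (-l) and passes the names NUL-separated. The function name replace_env_path is made up for illustration:

```shell
# replace_env_path DIR: hypothetical helper, rewrites /env/file1/ to /env/file2/
# in every text file under DIR that contains it, keeping a .bak copy.
replace_env_path() {
    # -I skip binary files, -r recurse, -l print matching file names only,
    # -Z terminate each name with NUL so xargs -0 can split safely.
    grep -IrlZ -- '/env/file1/' "$1" |
        xargs -0 -r sed -i.bak 's:/env/file1/:/env/file2/:g'
}
```

It would be called as replace_env_path /env/scripts/.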
My log files are in key-value format. I want to extract the value of a particular key while running tail -f.
Suppose the log contains these lines:
ts=2016-12-23-18-31-34-849 | deviceType=LENOVO Lenovo A6000 | elapsed=11 | firstHomePage=null | installId=37797b61-0bb1-4c1a-844c-5904c7e83de8 | ip=157.48.104.146
ts=2016-12-23-18-31-34-849 | deviceType=LENOVO Lenovo A6000 | elapsed=15 | firstHomePage=null | installId=37797b61-0bb1-4c1a-844c-5904c7e83de8 | ip=157.48.104.146
I am not sure how to pipe the output of tail -f so that the result is the following:
11
15
Use GNU grep with the --line-buffered option, which flushes output after every line so that matches appear as the file grows. The -o flag prints only the matched part, and -P enables Perl-style regular expressions; \K drops everything matched before it (here elapsed=) from the output.
tail -f file | grep --line-buffered -oP "elapsed=\K(\d+)"
11
15
From the grep man page:
--line-buffered
Use line buffering on output.
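Try grep with cut (--line-buffered keeps grep from holding back output when reading from tail -f):
tail -f log_file | grep --line-buffered -o '\<elapsed=[^[:space:]]*' | cut -d= -f2
Or with awk on a file, splitting on = and |:
awk -F'[=|]' '{print $6}' file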
11
15
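The awk command above depends on elapsed being the third key on the line. A sketch that looks the key up by name instead, assuming fields are separated by " | " and have the form key=value as in the sample log; get_key is a hypothetical helper name:

```shell
# get_key KEY: print the value of KEY from each line read on stdin.
get_key() {
    awk -v key="$1" -F ' [|] ' '{
        for (i = 1; i <= NF; i++)
            if (index($i, key "=") == 1) {          # field starts with key=
                print substr($i, length(key) + 2)   # text after the = sign
                fflush()                            # emit now, useful behind tail -f
            }
    }'
}
```

Usage would be tail -f file | get_key elapsed.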
I have a couple of OSes that do not have sort -R, which I use to generate a random list from a text file. For example, I am trying to use the following command:
sort -R file | head -20000 > newfile
I looked up the man pages on these OSes and, sure enough, the -R option is not listed.
What is an alternative that can generate a random list from a file and print to a new file?
CentOS 5
Try:
shuf file | head -n 20000 > newfile
or:
cat file | perl -MList::Util=shuffle -e 'print shuffle(<STDIN>);'
You can use the shuf command, if it is installed.
shuf can either take a file as its input
shuf file | head -n 20000 > newfile
or read from stdin
cat file | shuf | head -n 20000 > newfile
cat file | awk 'BEGIN{srand();}{print rand()"\t"$0}' | sort -k1 -n | cut -f2 | head -20000 > newfile
This is working out for me:
cat ALLEMAILS.txt | awk 'BEGIN{srand();}{print rand()"\t"$0}' | sort -k1 -n | cut -f2 | head -20000 | tee 20000random.txt
The tee at the end is for seeing progress.
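The approaches above can be combined into one helper that prefers shuf and falls back to the awk decoration trick. This is a sketch, and random_lines is a made-up name. It uses cut -f2- rather than cut -f2 so that lines containing tabs are not truncated, and note that awk's srand() seeds from the clock, so two runs within the same second may produce the same order.

```shell
# random_lines N FILE: print N randomly chosen lines of FILE.
random_lines() {
    n=$1 file=$2
    if command -v shuf >/dev/null 2>&1; then
        shuf -n "$n" "$file"        # GNU coreutils: pick N lines directly
    else
        # Fallback: prefix each line with a random key, sort on it, strip it.
        awk 'BEGIN { srand() } { print rand() "\t" $0 }' "$file" |
            LC_ALL=C sort -k1,1n | cut -f2- | head -n "$n"
    fi
}
```

For the question's case: random_lines 20000 file > newfile.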
I have a question about a bash script. Let's say there is a file which contains lines; each line has a path to a file and a date. The problem is how to find the most frequent path.
Thanks in advance.
Here's a suggestion
$ cut -d' ' -f1 file.txt | sort | uniq -c | sort -rn | head -n1
# \____________________/   \__/   \_____/   \______/   \______/
# select the file column   sort   print     sort on    print top
#                          files  counts    count      result
Example use:
$ cat file.txt
/home/admin/fileA jan:17:13:46:27:2015
/home/admin/fileB jan:17:13:46:27:2015
/home/admin/fileC jan:17:13:46:27:2015
/home/admin/fileA jan:17:13:46:27:2015
/home/admin/fileA jan:17:13:46:27:2015
$ cut -d' ' -f1 file.txt | sort | uniq -c | sort -rn | head -n1
3 /home/admin/fileA
You can strip out 3 from the final result by another cut.
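Because uniq -c pads the count with spaces, a plain cut -d' ' is awkward for that last step. One hedged option is awk, which splits on runs of whitespace. The sketch assumes, like the pipeline above, that paths contain no whitespace; most_frequent_path is a made-up name:

```shell
# most_frequent_path FILE: print only the path that occurs most often.
most_frequent_path() {
    cut -d' ' -f1 "$1" | sort | uniq -c | sort -rn | head -n1 | awk '{ print $2 }'
}
```

With the example file.txt this prints /home/admin/fileA.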
Reverse the lines, cut the beginning (the date, which has a fixed width here), reverse them again, then sort and count unique lines:
cat file.txt | rev | cut -b 22- | rev | sort | uniq -c
If you're absolutely sure you won't have whitespace in your paths, you can avoid rev altogether:
cat file.txt | cut -d " " -f 1 | sort | uniq -c
If the output is too long to inspect visually, aioobe's suggestion of following this with sort -rn | head -n1 will serve you well.
It's worth noting, as aioobe mentioned, that many Unix commands optionally take a file argument. Using it lets you drop the extra cat at the beginning by passing the file name to the next command:
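If the date is not always the same width, a different technique (not from the answers here) is to strip the last space-separated field with sed. This also keeps paths that contain spaces intact, assuming the date itself never contains a space. The name count_paths is hypothetical:

```shell
# count_paths FILE: drop the trailing date field, then count each path,
# most frequent first.
count_paths() {
    sed 's/ [^ ]*$//' "$1" | sort | uniq -c | sort -rn
}
```

Appending | head -n1 gives just the most frequent path with its count.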
cat file.txt | rev | ... vs rev file.txt | ...
While I personally find the first option easier to both remember and understand, the second is preferred by many (most?) people, as it saves system resources (specifically, the memory and references used by an additional process) and can perform better in some specific use cases. Wikipedia's cat article discusses this in detail.
A newbie to shell programming here.
I have this code so far:
prog inputfile outputfile1
sort -rn outputfile1 | cut -f1-2 > outputfile2
My question: is there a way to pipe the output file directly from the first command to the second to get outputfile2, i.e. without creating outputfile1? prog is a custom program that takes the input file and output file names as parameters.
The closest thing I have found is process substitution in the shell, e.g.
sort <(ls dir)
But it's not really helpful in this case, as I want to pipe the output file only and not stdout.
Thanks for your help!
If I understand you correctly, you want the opposite form:
prog inputfile >(sort -rn | cut -f1-2 >outputfile)
Depending on prog, you may also be able to use
prog inputfile /dev/stdout | sort -rn | cut -f1-2 >outputfile
or even
prog inputfile - | sort -rn | cut -f1-2 >outputfile
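If the shell has no process substitution, or the script must be sure the output is complete before continuing (bash does not wait for a >(...) process), a named pipe is another option. This is a sketch under assumptions: prog below is only a stand-in so the example runs, sorted_columns is a made-up wrapper name, and it only works if the real prog writes its output sequentially without seeking.

```shell
# Stand-in for the real prog, only so the sketch runs: copies $1 to $2.
prog() { cat "$1" > "$2"; }

# sorted_columns INPUT OUTPUT: run prog, sort its output file contents in
# reverse numeric order, keep columns 1-2, without an intermediate file.
sorted_columns() {
    tmpdir=$(mktemp -d) || return 1
    mkfifo "$tmpdir/out" || return 1
    # Start the reader first, in the background; it blocks until prog opens the pipe.
    sort -rn "$tmpdir/out" | cut -f1-2 > "$2" &
    prog "$1" "$tmpdir/out"
    wait                    # let sort and cut finish writing
    rm -rf "$tmpdir"
}
```

It would be called as sorted_columns inputfile outputfile2.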