Suppose my C++ program has output a lot of stuff to the terminal, say a 10000x3 matrix.
Is there any Unix command-line utility for me to inspect the lines which contain a desired number/string?
In other words if the output looks like
1.23 4.56 7.89
1.54 9.86 7.78
6.78 1.23 9.86
4.56 6.77 8.98
9.86 3.45 7.54
Some Unix command should search this output for 9.86 and print only the lines containing this number.
Try using mycppprogram | grep '9.86'
Note that . is a regex metacharacter, so the pattern 9.86 also matches strings like 9786; use grep -F '9.86' for a fixed-string match.
grep is your friend: man grep
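For example, feeding the sample output above through grep (printf stands in for the real program here):
printf '1.23 4.56 7.89\n1.54 9.86 7.78\n6.78 1.23 9.86\n4.56 6.77 8.98\n9.86 3.45 7.54\n' | grep -F '9.86'
# 1.54 9.86 7.78
# 6.78 1.23 9.86
# 9.86 3.45 7.54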
I am a Bash & Terminal NEWBIE. I have been given the task of counting the number of entries of a specific area code using a single-line Bash terminal command. Can you please point me in the right direction to achieving this goal? I've been using a bash scripting cheat sheet, but I'm not familiar enough with bash commands to create a script that iterates and counts the number of times 213 appears in a file.
If you are looking for the string 123 anywhere in the file, then:
grep -c 123 file # counts lines containing 123, 4123, 41235, etc.
If you are looking for the "word" 123, then:
grep -wc 123 file # counts lines containing the word 123, e.g. in 123, /123/, #123#, but not 1234 or 4123
If you want multiple occurrences of the word on the same line to be counted separately, then use the -o option:
grep -ow 123 file | wc -l
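A quick demonstration of the difference on a throwaway file (the file name is just for illustration):
printf '123 123\n4123\n123\n' > sample.txt
grep -c 123 sample.txt          # 3 (lines containing the string 123)
grep -wc 123 sample.txt         # 2 (lines containing the word 123)
grep -ow 123 sample.txt | wc -l # 3 (total word occurrences)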
See also:
Confused about word boundary on Unix & Linux Stack Exchange
grep -o '213' filename | wc -l
In the future, you should try searching for general forms of your command. You would have found a number of similar questions.
See man grep. grep has a count option.
So you want to run grep -c 213 file.
The following awk may help you here too (it will look for the string 213 anywhere in the line(s) of Input_file):
awk '/213/{count++} END{print count}' Input_file
In case you want to count only those lines which contain exactly 213, then use the following:
awk '/^213$/{count++} END{print count}' Input_file
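A quick test of both on sample data (Input_file created just for illustration):
printf '213 x\nfoo\n213\n' > Input_file
awk '/213/{count++} END{print count}' Input_file   # prints 2
awk '/^213$/{count++} END{print count}' Input_file # prints 1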
I'm trying to perform a simple literal search/replace on a large (30G) one-line file, using sed.
I would expect this to take some time but, when I run it, it returns after a few seconds and, when I look at the generated file, it's zero length.
the input file is 30G:
$ ls -lha Full-Text-Tokenized-Single-Line.txt
-rw-rw-r-- 1 ubuntu ubuntu 30G Jun 9 19:51 Full-Text-Tokenized-Single-Line.txt
run the command:
$ sed 's/<unk>/ /g' Full-Text-Tokenized-Single-Line.txt > Full-Text-Tokenized-Single-Line-No-unks.txt
the output file has zero length!
$ ls -lha Full-Text-Tokenized-Single-Line-No-unks.txt
-rw-rw-r-- 1 ubuntu ubuntu 0 Jun 9 19:52 Full-Text-Tokenized-Single-Line-No-unks.txt
Things I've tried
running the very same example on a shorter file: works
using the -e option: doesn't work
escaping "<" and ">": doesn't work
using a simple pattern like 's/foo/bar/g' instead: doesn't work; a zero-length file is returned.
EDIT (more information)
return code is 0
sed version is (GNU sed) 4.2.2
Just use awk; it's designed for handling records separated by arbitrary strings. With GNU awk for multi-char RS:
awk -v RS='<unk>' '{ORS=(RT?" ":"")}1' file
The above splits the input into records separated by <unk>, so if enough <unk>s are present in the input then the individual records will be small enough to fit in memory. It then prints each record followed by a blank char, so the overall effect on the data is that all <unk>s become blank chars.
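A quick way to sanity-check this before pointing it at the 30G file (toy input assumed; requires GNU awk for the multi-char RS and RT):
printf 'foo<unk>bar<unk>baz' | awk -v RS='<unk>' '{ORS=(RT?" ":"")}1'
# foo bar baz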
If that direct approach doesn't work for you THEN it'd be time to start looking for alternative solutions.
With line-based editors like sed you can't expect this to work, since their unit of work (the record) is a line terminated by a line break.
One suggestion, if you have whitespace in your file (so the searched pattern won't be split across line breaks), is to use:
fold -s file_with_one_long_line |
sed 's/find/replace/g' |
tr -d '\n' > output
P.S. fold's default width is 80; in case you have words longer than 80 characters, you can add -w 1000 (or at least the longest word size) to prevent word splitting.
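A toy run of the pipeline (sample input assumed, standing in for the 30G file):
printf 'aaa bbb find ccc\n' | fold -s -w 8 | sed 's/find/replace/g' | tr -d '\n'
# aaa bbb replace ccc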
Officially, GNU sed has no line limit:
http://www.linuxtopia.org/online_books/linux_tool_guides/the_sed_faq/sedfaq6_005.html
However, the page states that:
"no limit" means there is no "fixed" limit. Limits are actually determined by one's hardware, memory, operating system, and which C library is used to compile sed.
I tried running sed on a 7 GB single-line file and could reproduce the same issue.
This page https://community.hpe.com/t5/Languages-and-Scripting/Sed-Maximum-Line-Length/td-p/5136721 suggests using Perl instead:
perl -pe 's/start=//g;s/stop=//g;s/<unk>/ /g' file > output
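A toy check of that one-liner (sample input assumed; the start=/stop= substitutions come from the linked page's example):
printf 'start=1<unk>stop=2\n' | perl -pe 's/start=//g;s/stop=//g;s/<unk>/ /g'
# 1 2
Note that perl -pe is still line-oriented, so the whole 30G line ends up in memory at once; Perl just has no fixed line-length limit.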
If the tokens are space-delimited (not all whitespace) and assuming you are only matching single words, then you could use Perl with space as the record separator:
perl -040 -pe 's/<unk>/ /' file
or GNU awk to match all whitespace:
awk -v RS='[[:space:]]' '{ORS=RT; sub(/<unk>/," ")} 1' file
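A quick sanity check of both record-separator variants on toy input:
printf 'foo<unk>bar baz\n' | perl -040 -pe 's/<unk>/ /'
# foo bar baz
printf 'foo<unk>bar baz\n' | awk -v RS='[[:space:]]' '{ORS=RT; sub(/<unk>/," ")} 1'
# foo bar baz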
I have a large block of binary, take this for an example:
000110101100101001110001010101110010101010110101
(Not sure if the example is a multiple of 8 but...)
I'd like to split this block of text into 8 bit chunks, and output it to a file line by line, i.e. like:
00011010
11001010
01110001
etc...
Apologies if this is really simple. I've attempted using 'split' but can't get the right syntax, and I'd ideally like to do this in bash. Thanks.
Try this with grep:
grep -Eo '.{8}' file > newfile
Output to newfile:
00011010
11001010
01110001
01010111
00101010
10110101
Same output to newfile with fold from GNU Core Utilities:
fold -w 8 file > newfile
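A side note (not from the answers above): if the input length is not a multiple of 8, grep -Eo '.{8}' silently drops the short trailing chunk, while fold -w 8 keeps it as a final shorter line:
printf '111100001' | grep -Eo '.{8}' # prints only 11110000
printf '111100001' | fold -w 8       # prints 11110000, then 1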
I need to write a shell script that does the following:
1. In a given folder with files that fit the pattern update-8.1.0-v46.sql, I need to find the maximum version.
2. I need to write the maximum version I've found into a configuration file.
For 1, I've found the following answer: Shell script: find maximum value in a sequence of integers without sorting
The only problem I have is that I can't get down to a list of only the versions.
I tried:
ls | grep -o "update-8.1.0-v\(\d*\).sql"
but I get the entire file name in return and not just the matching part
Any ideas?
Maybe move everything to awk?
I ended up using:
SCHEMA=`ls database/targets/oracle/ | grep -o "update-$VERSION-v.*.sql" | sed "s/update-$VERSION-v\([0-9]*\).sql/\1/p" | awk '$0>x{x=$0};END{print x}'`
based on dreamer's answer
You can use sed for this:
echo "update-8.1.0-v46.sql" | sed -n 's/update-8.1.0-v\([0-9]*\)\.sql/\1/p'
The output in this case will be 46.
grep isn't really the best tool for extracting captured matches, but you can use look-behind assertions if you switch it to use perl-like regular expressions. Anything in the assertion will not be printed when using the -o flag.
ls | grep -Po "(?<=update-8.1.0-v)\d+"
46
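To get from the extracted versions to the maximum the question asks for, one option (a sketch building on the command above; the directory contents are assumed) is a numeric sort:
ls | grep -Po "(?<=update-8.1.0-v)\d+" | sort -n | tail -1
# 46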
The file data.txt contains the following:
1.00 1.23 54.4 213.2 3.4
The output of the script is supposed to be:
ave: 54.646
Some simple scripts are preferred.
Here is one method, using RS=" " so that each space-separated value becomes its own record and NR counts the numbers:
$ awk '{s+=$1}END{print "ave:",s/NR}' RS=" " file
ave: 54.646
Another option is to use jq:
$ seq 100|jq -s add/length
50.5
-s (--slurp) creates an array for the input lines after parsing each line as JSON, or as a number in this case.
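Applied to the data.txt from the question (this works because whitespace-separated numbers parse as a stream of JSON values, which -s then slurps into one array):
jq -s add/length data.txt
# 54.646 (give or take floating-point rounding in the last digits)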
Edit: awk is faster and it doesn't require reading the whole input into memory:
$ time seq 1e6|awk '{x+=$0}END{print x/NR}'>/dev/null
real 0m0.145s
user 0m0.148s
sys 0m0.008s
$ time seq 1e6|jq -s add/length>/dev/null
real 0m0.685s
user 0m0.669s
sys 0m0.024s
perl -lane '$a+=$_ for(@F); print "ave: ".$a/scalar(@F)' file
if you have multiple lines and you just need a single average:
perl -lane '$a+=$_ for(@F); $f+=scalar(@F); END{print "ave: ".$a/$f}' file