one command line grep and word count recursively - bash

I can do the following using a for loop
for f in *.txt; do grep 'RINEX' $f |wc -l; done
Is there any way to get a per-file report by running a one-liner?
Meaning that I want to grep & wc one file at a time, in a similar fashion to
grep 'RINEX' *.txt
UPDATE:
grep -c 'RINEX' *.txt
returns the name of each file and its corresponding number of occurrences. Thanks @Evert

grep is not the right tool for this task.
grep does line-based matching, e.g. grep 'o' <<< "fooo" will return 1 line, even though there are 3 o's.
This one-liner should do what you want:
awk -F'RINEX' 'FILENAME!=f{if(f)print f,s;f=FILENAME;s=0}
{s+=(NF-1)}
END{print f,s}' /path/*.txt
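If occurrence counts (rather than matching-line counts) are what you're after, another option is grep -o, which prints each match on its own line; a minimal sketch of a per-file loop, assuming GNU grep:
for f in *.txt; do
    # grep -o emits one line per match, so wc -l counts occurrences rather than lines
    printf '%s %d\n' "$f" "$(grep -o 'RINEX' "$f" | wc -l)"
done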

Related

1. How to use the input not including the first one 2. Using grep and sed to find the pattern entered by the user and how to create the next line

The command that I'm making should take the first input as a file and search for how many times certain patterns occur within that file, using grep and sed.
Ex:
$ cat file1
oneonetwotwotwothreefourfive
Intended output:
$ ./command file1 one two three
one 2
two 3
three 1
The problem is that the file does not have any lines and is just one long string of letters. I'm trying to use sed to replace the pattern I'm looking for with "FIND" and push the rest of the list onto the next line, continuing until the end of the file. Then, use grep FIND to get the lines that contain FIND. Finally, use wc -l to count the number of lines. However, I cannot find the option to move the list to the next line.
Ex:
$cat file1
oneonetwosixone
Intended output:
FIND
FIND
twosixFIND
Another problem that I've been having is how to use the rest of the input, not including the file.
Failed attempt:
file=$1
for PATTERN in 2 3 4 5 ... N
do
variable=$(sed 's/$PATTERN/find/g' $file | grep FIND $file | wc -l)
echo $PATTERN $variable
exit
Another failed attempt:
file=$1
PATTERN=$($2,$3 ... $N)
for PATTERN in $*
do variable=$(sed 's/$PATTERN/FIND/g' $file | grep FIND $file | wc-1)
echo $PATTERN $variable
exit
Any suggestions and help will be greatly appreciated. Thank you in advance.
Non-portable solution with GNU grep:
file=$1
shift
for pattern in "$@"; do
    echo "$pattern" $(grep -o -e "$pattern" <"$file" | wc -l)
done
If you want to use sed and your "patterns" are actually fixed strings (which don't contain characters that have special meaning to sed), you could do something like:
file=$1
shift
for pattern in "$@"; do
echo "$pattern" $(
sed "s/$pattern/\n&\n/g" "$file" |\
grep -e "$pattern" | wc -l
)
done
Your code has several issues:
you should quote variables where word splitting may happen
don't use ALLCAPS variable names - they are reserved for use by the shell
if you put a string in single-quotes, variable expansion does not happen
if you give grep a file, it won't read standard input
your for loop has no terminating done
This might work for you (GNU bash, sed and uniq):
f(){ local file=$1;
shift;
local args="$@";
sed -E 's/'${args// /|}'/\n&\n/g
s/(\n\S+)\n\S+/\1/g
s/\n+/\n/g
s/.(.*)/echo "\1"|uniq -c/e
s/ *(\S+) (\S+)/\2 \1/mg' $file; }
Separate arguments into file and remaining arguments.
Apply arguments as alternation within a sed substitution command which splits words into lines separated by a newline either side.
Remove unwanted words and unwanted newlines.
Evaluate the manufactured file within a sed substitution using the uniq command with the -c option.
Rearrange the output and print the result.
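Called on the example from the question, the invocation would be something like the following (GNU sed is required for the e flag used above):
f file1 one two three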
The problem is the file does not have any lines
Great! So the problem reduces to putting newlines.
func() {
file=$1
shift
rgx=$(printf "%s\\|" "$@" | sed 's#\\|$##');
# put the newline between words
sed 's/\('"$rgx"'\)/&\n/g' "$file" |
# it's just standard here
sort | uniq -c |
# filter only input - i.e. exclude fourfive
grep -xf <(printf " *[0-9]\+ %s\n" "$@")
};
func <(echo oneonetwotwotwothreefourfive) one two three
outputs:
2 one
1 three
3 two

Concatenate files based on numeric sort of name substring in awk w/o header

I am interested in concatenating many files together based on the numeric part of their names, and also removing the first line of each.
e.g. chr1_smallfiles then chr2_smallfiles then chr3_smallfiles.... etc (each without the header)
Note that chr10_smallfiles needs to come after chr9_smallfiles -- that is, this needs to be numeric sort order.
When I run the two commands awk and ls -v1 separately, each does its job properly, but when I put them together, it doesn't work. Please help, thanks!
awk 'FNR>1' | ls -v1 chr*_smallfiles > bigfile
The issue is with the way that you're trying to pass the list of files to awk. At the moment, you're piping the output of awk to ls, which makes no sense.
Bear in mind that, as mentioned in the comments, ls is a tool for interactive use, and in general its output shouldn't be parsed.
If sorting weren't an issue, you could just use:
awk 'FNR > 1' chr*_smallfiles > bigfile
The shell will expand the glob chr*_smallfiles into a list of files, which are passed as arguments to awk. For each filename argument, all but the first line will be printed.
Since you want to sort the files, things aren't quite so simple. If you're sure the full range of files exist, just replace chr*_smallfiles with chr{1..99}_smallfiles in the original command.
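That is, something along the lines of the following (brace expansion generates chr1_smallfiles through chr99_smallfiles in numeric order; awk will complain about any file that doesn't exist):
awk 'FNR > 1' chr{1..99}_smallfiles > bigfile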
Using some Bash-specific and GNU sort features, you can also achieve the sorting like this:
printf '%s\0' chr*_smallfiles | sort -z -n -k1.4 | xargs -0 awk 'FNR > 1' > bigfile
printf '%s\0' prints each filename followed by a null-byte
sort -z sorts records separated by null-bytes
-n -k1.4 does a numeric sort, starting from the 4th character (the numeric part of the filename)
xargs -0 passes the sorted, null-separated output as arguments to awk
Otherwise, if you want to go through the files in numerical order, and you're not sure whether all the files exist, then you can use a shell loop (although it'll be significantly slower than a single awk invocation):
for file in chr{1..99}_smallfiles; do # 99 is the maximum file number
[ -f "$file" ] || continue # skip missing files
awk 'FNR > 1' "$file"
done > bigfile
You can also use tail to concatenate all the files without their headers:
tail -q -n+2 chr*_smallfiles > bigfile
In case you want to concatenate the files in a natural sort order as described in your question, you can pipe the result of ls -v1 to xargs using
ls -v1 chr*_smallfiles | xargs -d $'\n' tail -q -n+2 > bigfile
(Thanks to Charles Duffy) xargs -d $'\n' sets the delimiter to a newline \n in case the filename contains white spaces or quote characters
Using a bash 4 associative array to extract only the numeric substring of each filename; sort those individually; and then retrieve and concatenate the full names in the resulting order:
#!/usr/bin/env bash
case $BASH_VERSION in ''|[123].*) echo "Requires bash 4.0 or newer" >&2; exit 1;; esac
# when this is done, you'll have something like:
# files=( [1]=chr_smallfiles1.txt
# [10]=chr_smallfiles10.txt
# [9]=chr_smallfiles9.txt )
declare -A files=( )
for f in chr*_smallfiles.txt; do
files[${f//[![:digit:]]/}]=$f
done
# now, emit those indexes (1, 10, 9) to "sort -n -z" to sort them as numbers
# then read those numbers, look up the filenames associated, and pass to awk.
while read -r -d '' key; do
awk 'FNR > 1' <"${files[$key]}"
done < <(printf '%s\0' "${!files[@]}" | sort -n -z) >bigfile
You can do it with a for loop like the one below, which works for me:
for file in chr*_smallfiles
do
tail +2 "$file" >> bigfile
done
How will it work? The for loop reads all the files from the current directory matching the wildcard pattern chr*_smallfiles and assigns each file name to the variable file; tail +2 "$file" then outputs all the lines of that file except the first and appends them to bigfile. So finally all files are merged (except the first line of each) into one file, bigfile.
Just for completeness, how about a sed solution?
for file in chr*_smallfiles
do
sed -n '2,$p' "$file" >> bigfile
done
Hope it helps!

Loop through text file and execute If Then statement for each line with bash

I have a command that lists the full 8 level deep path of all folders we are backing up.
I also have a command that enumerates all 8 level deep folders on the system.
Both of these are stored as variables in a bash script.
I'm trying to get a loop together that takes file 1 and uses its first line as a variable in an if/then/else, and then moves onwards through to the end of the file.
I've tried so many things, but it's beyond my skill set to provide an example that won't confuse the reader of this post.
TempFile1=/ifs/data/scripts/ConfigMonitor/TempFile1.txt
TempFile2=/ifs/data/scripts/ConfigMonitor/TempFile2.txt
find /ifs/*/*/backup -maxdepth 4 -mindepth 4 -type d > $TempFile1
isi snapshot schedules list -v | grep Path: | awk '{print $2}' > $TempFile2
list line 1 on $TempFile1
Grep for line 1 within $TempFile2
if result yielded then
echo found
else
echo fullpath not being backed up
fi
Use Grep's -f Flag
grep(1) says:
-f FILE, --file=FILE
Obtain patterns from FILE, one per line. The empty file
contains zero patterns, and therefore matches nothing. (-f is
specified by POSIX.)
Therefore, the following should work:
grep -f patterns_to_match.txt file_to_examine.txt
Faster Reporting
Another way to think about this is that you can ask GNU grep to show you all the matches:
echo 'Lines that match a pattern in your pattern file.'
grep -f patterns_to_match.txt file_to_examine.txt
and then show you all the lines that don't match any of the patterns:
echo 'Lines that do not match any patterns in your pattern file.'
grep -f patterns_to_match.txt -v file_to_examine.txt
This is likely to be faster and more efficient than looping through the file one line at a time in Bash. You may or may not get similar results with a grep other than GNU grep; while the -f and -v flags are specified by POSIX, I only tested it against GNU grep 2.16, so your mileage may vary.
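Applied to the two temp files from the question, that could look something like the following (adding -F for fixed-string matching and -x for whole-line matches, both standard grep options, so that one path can't partially match another):
# paths from TempFile1 that are covered by a schedule listed in TempFile2
grep -Fxf "$TempFile2" "$TempFile1"
# paths from TempFile1 that are not being backed up
grep -Fxvf "$TempFile2" "$TempFile1"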
This should iterate through Tempfile1.txt and grep for the line in TempFile2.txt.
while IFS= read -r line; do
if grep -q "$line" /path/to/TempFile2.txt
then
echo "Found $line"
else
echo "Did not find $line"
fi
done < Tempfile1.txt
Tempfile1.txt:
a
b
c
Tempfile2.txt
b
d
z
Output:
Did not find a
Found b
Did not find c

select nth file in folder (using sed)?

I am trying to select the nth file in a folder of which the filename matches a certain pattern:
I've tried using sed for this, e.g.,
sed -n 3p /path/to/files/*pattern*.txt
but it appears to return the 3rd line of the first matching file.
I've also tried
sed -n 3p ls /path/to/files/*pattern*.txt
which doesn't work either.
Thanks!
Why sed, when bash is so much better at it?
Assuming some name n indicates the index you want:
Bash
files=(path/to/files/*pattern*.txt)
echo "${files[n]}"
Posix sh
i=0
for file in path/to/files/*pattern*.txt; do
if [ $i = $n ]; then
break
fi
i=$((i+1))
done
echo "$file"
What's wrong with sed is that you would have to jump through many hoops to make it safe for the entire set of possible characters that can occur in a filename, and even if that doesn't matter to you, you end up with a double layer of subshells to get the answer.
file=$(printf '%s\n' path/to/files/*pattern*.txt | sed -n "$n"p)
Please, never parse ls.
ls -1 /path/to/files/*pattern*.txt | sed -n '3p'
or, if pattern is a regex pattern
ls -1 /path/to/files/ | egrep 'pattern' | sed -n '3p'
There are lots of other possibilities; it depends on whether you're looking for performance or simplicity.

Unix: merge many files, while deleting first line of all files

I have >100 files that I need to merge, but for each file the first line has to be removed. What is the most efficient way to do this under Unix? I suspect it's probably a command using cat and sed '1d'. All files have the same extension and are in the same folder, so we probably could use *.extension to point to the files. Many thanks!
Assuming your filenames are sorted in the order you want your files appended, you can use:
ls *.extension | xargs -n 1 tail -n +2
EDIT: After Sorin and Gilles comments about the possible dangers of piping ls output, you could use:
find . -name "*.extension" | xargs -n 1 tail -n +2
Everyone has to be complicated. This is really easy:
tail -q -n +2 file1 file2 file3
And so on. If you have a large number of files you can load them in to an array first:
list=(file1 file2 file3)
tail -q -n +2 "${list[@]}"
All the files with a given extension in the current directory?
list=(*.extension)
tail -q -n +2 "${list[@]}"
Or just
tail -q -n +2 *.extension
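Since the goal is to merge everything into one file, you would normally redirect the output somewhere; the output name merged below is just an example:
tail -q -n +2 *.extension > merged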
Just append each file after removing the first line.
#!/bin/bash
DEST=/tmp/out
FILES="space separated list of files"
: >"$DEST"    # start with an empty output file
for FILE in $FILES
do
    sed -e '1d' "$FILE" >>"$DEST"
done
tail outputs the last lines of a file. You can tell it how many lines to print, or how many lines to omit at the beginning (-n +N where N is the number of the first line to print, counting from 1 — so +2 omits one line). With GNU utilities (i.e. under Linux or Cygwin), FreeBSD or other systems that have the -q option:
tail -q -n +2 *.extension
Without -q, tail prints a header before each file, and -q is not standard. If your implementation doesn't have it, or to be portable, you need to iterate over the files.
for x in *.extension; do tail -n +2 <"$x"; done
Alternatively, you can call Awk, which has a way to identify the first line of each file. This is likely to be faster if you have a lot of small files and slower if you have many large files.
awk 'FNR != 1' *.extension
ls -1 file*.txt | xargs nawk 'FNR!=1'
