count words in a file without using wc - shell

Working in a shell script here, trying to count the number of words/characters/lines in a file without using the wc command. I can get the file broken into lines and count those easily enough, but I'm struggling to get the words and the characters.
#define word_count function
count_stuff(){
    c=0
    w=0
    l=0
    local f="$1"
    while read -r line
    do
        l=`expr $l + 1`
        # now that I have a line I want to break it into words and characters???
    done < "$f"
    echo "Number characters: $c"
    echo "Number words: $w"
    echo "Number lines: $l"
}

As for characters, try this (adjust echo "test" to where you get your output from):
expr `echo "test" | sed "s/./ + 1/g;s/^/0/"`
As for lines, try this:
expr `echo -e "test\ntest\ntest" | sed "s/^.*$/./" | tr -d "\n" | sed "s/./ + 1/g;s/^/0/"`
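If the text is coming from a file rather than an echo, the same character trick can be applied line by line; a minimal sketch, assuming a file named file.txt:
# minimal sketch (the file name file.txt is an assumption): sum characters over a whole file
total=0
while IFS= read -r line; do
    n=$(expr $(printf '%s\n' "$line" | sed "s/./ + 1/g;s/^/0/"))
    total=$(expr $total + $n + 1)   # the extra + 1 counts each line's newline
done < file.txt
echo "Number characters: $total"
The sed substitution turns every character into " + 1" and prefixes a 0, so expr simply adds them up.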
===
As for your code, you want something like this to count words (if you want to go at it completely raw):
while read -r line ; do
    set -- $line
    while true ; do
        [ -z "$1" ] && break
        w=`expr $w + 1`
        shift
    done
done
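Putting the pieces together, one possible sketch of the full count_stuff function (not the only way to wire it up; it uses bash's built-in ${#line} for the per-line character count rather than the sed trick above):
count_stuff(){
    c=0; w=0; l=0
    local f="$1"
    while IFS= read -r line
    do
        l=`expr $l + 1`                # one more line
        c=`expr $c + ${#line} + 1`     # characters in the line, plus 1 for the newline
        set -- $line                   # split the line into words
        while [ $# -gt 0 ]; do
            w=`expr $w + 1`
            shift
        done
    done < "$f"
    echo "Number characters: $c"
    echo "Number words: $w"
    echo "Number lines: $l"
}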

You can do this with the following Bash shell script:
count=0
for var in `cat $1`
do
    count=`echo $count+1 | bc`
done
echo $count

Related

Reverse the words but keep the order Bash

I have a file with lines. I want to reverse the words, but keep them in same order.
For example: "Test this word"
Result: "tseT siht drow"
I'm using a Mac, so awk doesn't seem to work.
What I have so far:
input=FILE_PATH
while IFS= read -r line || [[ -n $line ]]
do
    echo $line | rev
done < "$input"
Here is a solution that completely avoids awk
#!/bin/bash
input=./data
while read -r line ; do
    for word in $line ; do
        output=`echo $word | rev`
        printf "%s " $output
    done
    printf "\n"
done < "$input"
In case xargs works on mac:
echo "Test this word" | xargs -n 1 | rev | xargs
Inside your read loop, you can just iterate over the words of your string and pass them to rev:
line="Test this word"
for word in $line; do
    echo -n " $word" | rev
done
echo # Add final newline
Output:
tseT siht drow
You are actually in fairly good shape with bash. You can use string indexes, string length, and C-style for loops to loop over the characters in each word, building a reversed string to output. You can control formatting in a number of ways to handle spaces between words, but a simple first=1 flag is about as easy as anything else. You can do the following with your read:
#!/bin/bash
while read -r line || [[ -n $line ]]; do    ## read line
    first=1                                 ## flag to control space
    a=( $( echo $line ) )                   ## put line in array
    for i in "${a[@]}"; do                  ## for each word
        tmp=                                ## clear temp
        len=${#i}                           ## get length
        for ((j = 0; j < len; j++)); do     ## loop length times
            tmp="${tmp}${i:$((len-j-1)):1}" ## add char len - j to tmp
        done
        if [ "$first" -eq '1' ]; then       ## if first word
            printf "%s" "$tmp"; first=0     ## output w/o space
        else
            printf " %s" "$tmp"             ## output w/space
        fi
    done
    echo ""                                 ## output newline
done
Example Input
$ cat dat/lines2rev.txt
my dog has fleas
the cat has none
Example Use/Output
$ bash revlines.sh <dat/lines2rev.txt
ym god sah saelf
eht tac sah enon
Look things over and let me know if you have questions.
Using rev and awk
Consider this as the sample input file:
$ cat file
Test this word
Keep the order
Try:
$ rev <file | awk '{for (i=NF; i>=2; i--) printf "%s%s",$i,OFS; print $1}'
tseT siht drow
peeK eht redro
(This uses awk but, because it uses no advanced awk features, it should work on MacOS.)
Using in a script
If you need to put the above in a script, then create a file like:
$ cat script
#!/bin/bash
input="/Users/Anastasiia/Desktop/Tasks/test.txt"
rev <"$input" | awk '{for (i=NF; i>=2; i--) printf "%s%s",$i,OFS; print $1}'
And, run the file:
$ bash script
tseT siht drow
peeK eht redro
Using bash
while read -a arr
do
    x=" "
    for ((i=0; i<${#arr[@]}; i++))
    do
        ((i == ${#arr[@]}-1)) && x=$'\n'
        printf "%s%s" $(rev <<<"${arr[i]}") "$x"
    done
done <file
Applying the above to our same test file:
$ while read -a arr; do x=" "; for ((i=0; i<${#arr[@]}; i++)); do ((i == ${#arr[@]}-1)) && x=$'\n'; printf "%s%s" $(rev <<<"${arr[i]}") "$x"; done; done <file
tseT siht drow
peeK eht redro

Grep command in array

For a homework assignment I have to take the results from the grep command and write out up to the first 5 of them, numbering them from 1 to 5 (print the number, then a space, then the line from grep). If there are no lines, print a message saying so. So far I have managed to store the grep results in an array, but this is where I've gotten stuck. Can anyone provide guidance as to how to proceed with printing this as stated above?
pattern="*.c"
fileList=$(grep -l "main" $pattern)
IFS=$"\n"
declare -a array
array=$fileList
for x in "${array[#]}"; do
echo "$x"
done
You can use grep's -c and -l options:
pattern="*.c"
searchPattern="main"
counter=1
while read -r line ; do
IFS=':' read -r -a lineInfo <<< "$line"
if [[ $counter > 5 ]]; then
exit 1
fi
if [[ ${lineInfo[1]} > 0 ]]; then
numsOfLine=""
while read -r fileline ; do
IFS=':' read -r -a fileLineInfo <<< "$fileline"
numsOfLine="$numsOfLine ${fileLineInfo[0]} "
done < <(grep -n $searchPattern ${lineInfo[0]})
echo "$counter ${lineInfo[0]} match on lines: $numsOfLine"
let "counter += 1"
else
echo "${lineInfo[0]} no match lines"
fi
done < <(grep -c $searchPattern $pattern)
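For context, grep -c over several files prints one file:count line per file, which is what the outer loop splits on ':' (the file names here are just illustrative):
$ grep -c main *.c
foo.c:2
bar.c:0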
If you're only allowed to use grep and bash(?):
pattern="*.c"
fileList=($(grep -l "main" $pattern))
if test ${#fileList[#]} = 0 ; then
echo "No results"
else
n=0
while test $n -lt ${#fileList[#]} -a $n -lt 5 ; do
i=$n
n=$(( n + 1 ))
echo "$n ${fileList[$i]}"
done
fi
If you are allowed to use commands in addition to grep, you can pipe the results through nl to add line numbers, then head to limit the results to the first 5 lines, then a second grep to test if there were any lines. For example:
if ! grep -l "main" $pattern | \
nl -s ' ' | sed -e 's/^ *//' | \
head -n 5 | grep '' ; then
echo "No results"
fi
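With two hypothetical matching files, the pipeline's output would look like:
1 foo.c
2 bar.c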

Check if a string contains "-" and "]" at the same time

I have the next two regex in Bash:
1. ^[-a-zA-Z0-9\,\.\;\:]*$
2. ^[]a-zA-Z0-9\,\.\;\:]*$
The first matches when the string contains a "-" and the other values.
The second when it contains a "]".
I put these values at the beginning of my regex because I can't escape them.
How can I match the two values at the same time?
You can also place the - at the end of the bracket expression, since a range must be closed on both ends.
^[]a-zA-Z0-9,.;:-]*$
You don't have to escape any of the other characters, either. Colons, semicolons, and commas have no special meaning in any part of a regular expression, and a period loses its special meaning inside a bracket expression.
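A quick way to confirm that the merged class treats both characters literally (the sample strings are made up):
$ printf '%s\n' 'ab-c]d' 'ab_c' | grep '^[]a-zA-Z0-9,.;:-]*$'
ab-c]d
Only the first string matches, because _ is not in the class.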
Basically you can use this:
grep -E '^.*\-.*\]|\].*\-.*$'
It matches either a - followed by zero or more arbitrary chars and a ], or a ] followed by zero or more chars and a -.
However, since you don't accept arbitrary chars, you need to change it to:
grep -E '^[a-zA-Z0-9,.;:]*\-[a-zA-Z0-9,.;:]*\]|\][a-zA-Z0-9,.;:]*\-[a-zA-Z0-9,.;:]*$'
Maybe this can help you:
#!/bin/bash
while read p; do
    echo $p | grep -E '\-.*\]|\].*\-' | grep "^[]a-zA-Z0-9,.;:-]*$"
done <$1
user-host:/tmp$ cat test
-i]string
]adfadfa-
string-
]string
str]ing
]123string
123string-
?????
++++++
user-host:/tmp$ ./test.sh test
-i]string
]adfadfa-
There are two questions in your post.
One is in the description:
How I can get match the two values at the same time?
That is an OR match, which could be done with a range that mixes your two ranges:
pattern='^[]a-zA-Z0-9,.;:-]*$'
That will match a line that contains one (or several) -, OR ], OR any of the other included characters. That would be all the lines (except ?????, ++++++ and as df gh) in the test script below.
Two is in the title:
… a string contains “-” and “]” at the same time
That is an AND match. The simplest (and slowest) way to do it is:
echo "$line" | grep '-' | grep ']' | grep '^[-a-zA-Z0-9,.;:]*$'
The first two calls to grep select only the lines that:
contain both (one or several) - and (one or several) ]
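A quick spot check with three of the test lines used further below:
$ printf '%s\n' 'as-dfg]h' 'as-df' 'as]df' | grep '-' | grep ']' | grep '^[]a-zA-Z0-9,.;:-]*$'
as-dfg]h
Only the line containing both characters survives all three filters.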
Test script:
#!/bin/bash
printlines(){
cat <<-\_test_lines_
asdfgh
asdfgh-
asdfgh]
as]df
as,df
as.df
as;df
as:df
as-df
as]]]df
as---df
asAS]]]DFdf
as123--456DF
as,.;:-df
as-dfg]h
as]dfg-h
a]s]d]f]g]h
a]s]d]f]g]h-
s-t-r-i-n-g]
as]df-gh
123]asdefgh
123asd-fgh-
?????
++++++
as df gh
_test_lines_
}
pattern='^[]a-zA-Z0-9,.;:-]*$'
printf '%s\n' "Testing the simple pattern of $pattern"
while read line; do
resultgrep="$( echo "$line" | grep "$pattern" )"
printf '%13s %-13s\n' "$line" "$resultgrep"
done < <(printlines)
echo "#############################################################"
echo
p1='-'; p2=']'; p3='^[]a-zA-Z0-9,.;:-]*$'
printf '%s\n' "Testing a 'grep AND' of '$p1', '$p2' and '$p3'."
while read line; do
resultgrep="$( echo "$line" | grep "$p1" | grep "$p2" | grep "$p3" )"
[[ $resultgrep ]] && printf '%13s %-13s\n' "$line" "$resultgrep"
done < <(printlines)
echo "#############################################################"
echo
printf '%s\n' "Testing an 'AWK AND' of '$p1', '$p2' and '$p3'."
while read line; do
resultawk="$( echo "$line" |
awk -v p1="$p1" -v p2="$p2" -v p3="$p3" '$0~p1 && $0~p2 && $0~p3' )"
[[ $resultawk ]] && printf '%13s %-13s\n' "$line" "$resultawk"
done < <(printlines)
echo "#############################################################"
echo
printf '%s\n' "Testing a 'bash AND' of '$p1', '$p2' and '$p3'."
while read line; do
rgrep="$( echo "$line" | grep "$p1" | grep "$p2" | grep "$p3" )"
[[ ( $line =~ $p1 ) && ( $line =~ $p2 ) && ( $line =~ $p3 ) ]]
rbash=${BASH_REMATCH[0]}
[[ $rbash ]] && printf '%13s %-13s %-13s\n' "$line" "$rgrep" "$rbash"
done < <(printlines)
echo "#############################################################"
echo

How to browse a line from a file?

I have a file that contains 10 lines with this sort of content:
aaaa,bbb,132,a.g.n.
I want to walk through every line, char by char, and put the data that appears before each "," into an output file.
if [ $# -eq 2 ] && [ -f $1 ]
then
echo "Read nr of fields to be saved or nr of commas."
read n
nrLines=$(wc -l < $1)
while $nrLines!="1" read -r line || [[ -n "$line" ]]; do
do
for (( i=1; i<=$n; ++i ))
do
while [ read -r -n1 temp ]
do
if [ temp != "," ]
then
echo $temp > $(result$i)
else
fi
done
paste -d"\n" $2 $(result$i)
done
nrLines=$($nrLines-1)
done
else
echo "File not found!"
fi
}
In parameter $2 I have an empty file in which I will store the data from file $1 after I extract it without the " , " and add a couple of comments.
Example:
My input_file contains:
a.b.c.d,aabb,comp,dddd
My output_file is empty.
I call my script: ./script.sh input_file output_file
After execution the output_file contains:
First line info: a.b.c.d
Second line info: aabb
Third line info: comp
(yes, without the 4th line info)
You can do what you want very simply with parameter-expansion and substring-removal using bash alone. For example, take an example file:
$ cat dat/10lines.txt
aaaa,bbb,132,a.g.n.
aaaa,bbb,133,a.g.n.
aaaa,bbb,134,a.g.n.
aaaa,bbb,135,a.g.n.
aaaa,bbb,136,a.g.n.
aaaa,bbb,137,a.g.n.
aaaa,bbb,138,a.g.n.
aaaa,bbb,139,a.g.n.
aaaa,bbb,140,a.g.n.
aaaa,bbb,141,a.g.n.
A simple one-liner using native bash string handling could be the following, which gives these results:
$ while read -r line; do echo ${line%,*}; done <dat/10lines.txt
aaaa,bbb,132
aaaa,bbb,133
aaaa,bbb,134
aaaa,bbb,135
aaaa,bbb,136
aaaa,bbb,137
aaaa,bbb,138
aaaa,bbb,139
aaaa,bbb,140
aaaa,bbb,141
Parameter expansion w/substring removal works as follows:
var=aaaa,bbb,132,a.g.n.
Beginning at the left and removing up to, and including, the first ',' is:
${var#*,} # bbb,132,a.g.n.
Beginning at the left and removing up to, and including, the last ',' is:
${var##*,} # a.g.n.
Beginning at the right and removing up to, and including, the first ',' is:
${var%,*} # aaaa,bbb,132
Beginning at the right and removing up to, and including, the last ',' is:
${var%%,*} # aaaa
Note: the text to remove above is represented with a wildcard '*', but wildcard use is not required. It can be any allowable text. For example, to only remove ,a.g.n where the preceding number is 136, you can do the following:
${var%,136*},136 # aaaa,bbb,136 (all others unchanged)
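As a rough sketch of how these expansions could produce the labelled output asked for in the question (the First/Second/Third labels and the input_file/output_file names come from the question; the wiring here is only an illustration):
while IFS= read -r line; do
    f1=${line%%,*}          # everything before the first ','
    rest=${line#*,}         # drop the first field
    f2=${rest%%,*}
    rest=${rest#*,}
    f3=${rest%%,*}
    printf 'First line info: %s\nSecond line info: %s\nThird line info: %s\n' "$f1" "$f2" "$f3"
done < input_file > output_file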
To print the 2016th line from a file named file.txt, you have to run a command like this:
sed -n '2016p' < file.txt
More:
sed -n '2p' < file.txt
will print the 2nd line,
sed -n '2011p' < file.txt
the 2011th line,
sed -n '10,33p' < file.txt
lines 10 up to 33,
sed -n '1p;3p' < file.txt
the 1st and 3rd lines,
and so on...
In native bash the following should do what you want, assuming you replace the contents of your script.sh with the below:
#!/bin/bash
IN_FILE=${1}
OUT_FILE=${2}
IFS=\,
while read line; do
    set -- ${line}
    for ((i=1; i<=${#}; i++)); do
        ((${i}==4)) && continue
        ((n+=1))
        printf '%s\n' "Line ${n} info: ${!i}"
    done
done < ${IN_FILE} > ${OUT_FILE}
This prints each field of each line in the input file on its own line in the output file, skipping the 4th field (I assume this is your requirement, as per your comment?).
[wspace#wspace sandbox]$ awk -F"," 'BEGIN{OFS="\n"}{for(i=1; i<=NF-1; i++){print "line Info: "$i}}' data.txt
line Info: a.b.c.d
line Info: aabb
line Info: comp
This little snippet can ignore the last field.
updated:
#!/usr/bin/env bash
if [ ! -f "$1" -o $# -ne 2 ];then
    echo "Usage: $(basename $0) input_file out_file"
    exit 127
fi
input_file=$1
output_file=$2
: > $output_file
if [ "$(wc -l < $1)" -ne 0 ];then
    while true
    do
        read -r -n1 char
        if [ "$char" == "" ];then
            break
        elif [ "$char" != "," ];then
            temp=$temp$char
        else
            echo "line info: $temp" >> $output_file
            temp=""
        fi
    done < $input_file
else
    echo "file $1 is empty"
fi
Maybe this is what you want.
Did you try:
sed "s|,|\n|g" $1 | head -n -1 > $2
I assume that only the last word would not have a comma on its right.
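Note that head -n -1 is a GNU extension, and BSD sed does not turn \n in the replacement into a newline, so a more portable variant of the same idea could be:
# tr turns the commas into newlines, sed '$d' drops the last line (the word after the last comma)
tr ',' '\n' < "$1" | sed '$d' > "$2"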
Try this (tested with your sample line):
#!/bin/bash
# script.sh
echo "Number of fields to save ?"
read nf
while IFS=$',' read -r -a arr; do
    newarr=( "${arr[@]:0:${nf}}" )
done < "$1"
for i in "${newarr[@]}"; do
    printf "%s\n" $i
done > "$2"
Execute script with :
$ ./script.sh inputfile outputfile
Number of fields to save ?
3
$ cat outputfile
a.b.c.d
aabb
comp
All words separated by commas are stored in the array $arr.
A temporary array $newarr keeps only the first $nf elements ($nf comes from the read command).
It loops over the new array and prints the result into $2, the output file.

unix shell script to check for EOF

I wish to take the names of two files as command line arguments in a bash shell script, and then, for each word in the first file (words are comma separated and the file has more than one line), count its occurrences in the second file.
I wrote a shell script like this:
if [ $# -ne 2 ]
then
echo "invalid number of arguments"
else
i=1
a=$1
b=$2
fp=*$b
while[ fgetc ( fp ) -ne EOF ]
do
d=$( cut -d',' -f$i $a )
echo "$d"
grep -c -o $d $b
i=$(( $i + 1 ))
done
fi
For example, file1 has the words abc,def,ghi,jkl (in the first line)
mno,pqr (in the second line)
and file2 has the words abc,abc,def.
Now the output should be like abc 2
def 1
ghi 0
To read a file word by word, with words separated by commas, use this snippet:
while read -r p; do
    IFS=, && for w in $p; do
        printf "%s: " "$w"
        tr , '\n' < file2 | grep -Fc "$w"
    done
done < file1
Another approach:
words=( `tr ',' ' ' < file1` )       # split file1 into words...
for word in "${words[@]}"; do        # iterate over the words
    printf "%s : " "$word"
    awk 'END{print FNR-1}' RS="$word" file2
    # split file2 with 'word' as record separator.
    # number of records - 1 == number of occurrences of the word.
done
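With the sample files from the question (file1 containing abc,def,ghi,jkl and mno,pqr, file2 containing abc,abc,def), either approach prints counts along these lines (the exact spacing around the colon differs slightly between the two snippets):
abc : 2
def : 1
ghi : 0
jkl : 0
mno : 0
pqr : 0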
