Efficient way to add/ append huge files - bash

Below is a shell script written to process a huge file. It reads a fixed-length file line by line, extracts substrings, and appends them to another file in delimited form. It works perfectly, but it is too slow.
array=() # Create array
while IFS='' read -r line || [[ -n "$line" ]] # Read a line
do
coOrdinates="$(echo -e "${line}" | grep POSITION | cut -d'(' -f2 | cut -d')' -f1 | cut -d':' -f1,2)"
if [[ -z "${coOrdinates// }" ]];
then
echo "Not adding"
else
array+=("$coOrdinates")
fi
done < "$1_CTRL.txt"
while read -r line;
do
result='"'
for e in "${array[@]}"
do
SUBSTRING1=`echo "$e" | sed 's/.*://'`
SUBSTRING=`echo "$e" | sed 's/:.*//'`
result1=`perl -e "print substr('$line', $SUBSTRING,$SUBSTRING1)"`
result1="$(echo -e "${result1}" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
result=$result$result1'"'',''"'
done
echo $result >> $1_1.txt
done < "$1.txt"
Earlier I used the cut command and then changed to the version above, but the run time did not improve.
Can you please suggest what changes could reduce the processing time?
Thanks in advance.
Update:
Sample content of the input file :
XLS01G702012 000034444132412342134
Control File :
OPTIONS (DIRECT=TRUE, ERRORS=1000, rows=500000) UNRECOVERABLE
load data
CHARACTERSET 'UTF8'
TRUNCATE
into table icm_rls_clientrel2_hg
trailing nullcols
(
APP_ID POSITION(1:3) "TRIM(:APP_ID)",
RELATIONSHIP_NO POSITION(4:21) "TRIM(:RELATIONSHIP_NO)"
)
Output file:
"LS0","1G702012 0000"

perl:
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;
use feature 'say'; # say is not available without this
# read the control file
my $ctrl;
{
local $/ = "";
open my $fh, "<", shift @ARGV;
$ctrl = <$fh>;
close $fh;
}
my @positions = ( $ctrl =~ /\((\d+):(\d+)\)/g );
# read the data file
open my $fh, "<", shift @ARGV;
while (<$fh>) {
chomp; # keep the line ending out of the last field
my @words;
for (my $i = 0; $i < scalar(@positions); $i += 2) {
push @words, substr($_, $positions[$i], $positions[$i+1]);
}
say join ",", map {qq("$_")} @words;
}
close $fh;
perl parse.pl x_CTRL.txt x.txt
"LS0","1G702012 00003"
This differs from the output you requested, which raises two questions:
in the POSITION(m:n) syntax of the control file, is n a length or an index?
in the data file, are those spaces or tabs?

I suggest, with pure bash and to avoid subshells:
if [[ $line =~ POSITION ]] ; then # grep POSITION
coOrdinates="${line#*(}" # cut -d'(' -f2
coOrdinates="${coOrdinates%)*}" # cut -d')' -f1
# cut -d':' -f1,2 leaves "m:n" unchanged, so the colon is kept for the splitting further down
if [[ -z "${coOrdinates// }" ]]; then
echo "Not adding"
else
array+=("$coOrdinates")
fi
fi
More efficient, by gniourf_gniourf:
if [[ $line =~ POSITION\(([[:digit:]]+):([[:digit:]]+)\) ]]; then
array+=( "${BASH_REMATCH[1]}:${BASH_REMATCH[2]}" )
fi
similarly:
SUBSTRING1=${e#*:} # $( echo "$e" | sed 's/.*://' )
SUBSTRING=${e%:*} # $( echo "$e" | sed 's/:.*//' )
# to confirm, I don't know perl substr
result1=${line:$SUBSTRING:$SUBSTRING1} # $( perl -e "print substr('$line', $SUBSTRING,$SUBSTRING1)" )
#result1= # "$(echo -e "${result1}" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')"
# trim, if necessary?
result1="${result1%${result1##*[^[:space:]]}}" # right
result1="${result1#${result1%%[^[:space:]]*}}" # left
gniourf_gniourf suggests moving the grep out of the loop:
while read ...; do
...
done < <(grep POSITION ...)
This gives extra efficiency: while/read loops are very slow in Bash, so prefiltering as much as possible speeds up the process quite a lot.
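Putting the pieces of this answer together, a minimal sketch could look like the following. It assumes that POSITION(m:n) means a 0-based offset m and a length n, as in the perl substr call of the original script; the function name extract_fields and its two arguments are made up for illustration.

```shell
#!/bin/bash
# extract_fields CTRL_FILE DATA_FILE (hypothetical helper)
# Prints every data line as quoted, comma-separated, trimmed fields.
# Assumption: POSITION(m:n) is "offset m (0-based), length n".
extract_fields() {
    local ctrl=$1 data=$2 line field out i s n
    local -a starts=() lens=()
    local re='POSITION\(([0-9]+):([0-9]+)\)'

    # Prefilter with grep so the slow read loop only sees relevant lines.
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            starts+=("${BASH_REMATCH[1]}")
            lens+=("${BASH_REMATCH[2]}")
        fi
    done < <(grep POSITION "$ctrl")

    # No subshells inside this loop: only parameter expansion.
    while IFS= read -r line || [[ -n $line ]]; do
        out=""
        for i in "${!starts[@]}"; do
            s=${starts[i]} n=${lens[i]}
            field=${line:s:n}
            field="${field#"${field%%[![:space:]]*}"}"   # trim left
            field="${field%"${field##*[![:space:]]}"}"   # trim right
            out+="${out:+,}\"$field\""
        done
        printf '%s\n' "$out"
    done < "$data"
}
```

It would be called as extract_fields x_CTRL.txt x.txt > x_1.txt, so the output file is opened once rather than once per line.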

Updated Answer
Here is a version where I parse the control file with awk, save the character positions and then use those when parsing the input file:
awk '
/APP_ID/ {
sub(/\).*/,"") # Strip closing parenthesis and all that follows
sub(/^.*\(/,"") # Strip everything up to opening parenthesis
split($0,a,":") # Extract the two character positions separated by colon into array "a"
next
}
/RELATIONSHIP/ {
sub(/\).*/,"") # Strip closing parenthesis and all that follows
sub(/^.*\(/,"") # Strip everything up to opening parenthesis
split($0,b,"[():]") # Extract character positions into array "b"
next
}
FNR==NR{next}
{ f1=substr($0,a[1]+1,a[2]); f2=substr($0,b[1]+1,b[2]); printf("\"%s\",\"%s\"\n",f1,f2)}
' ControlFile InputFile
Original Answer
Not a complete, rigorous answer, but this should give you an idea of how to do the extraction with awk once you have the POSITION parameters from the control file:
awk -v a=2 -v b=3 -v c=5 -v d=21 '{f1=substr($0,a,b); f2=substr($0,c,d); printf("\"%s\",\"%s\"\n",f1,f2)}' InputFile
Sample Output
"LS0","1G702012 00003"
Try running that on your large input file to get an idea of the performance, then tweak the output. Reading the control file is not at all time-critical so don't bother with optimising that.
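If the control file has more than two fields, the same idea can be generalised. The sketch below is one possible way, not the answer's own code: it collects every POSITION(m:n) pair in the order it appears, again assuming m is a 0-based offset and n a length. The wrapper name extract_fields_awk is hypothetical.

```shell
#!/bin/bash
# extract_fields_awk CTRL_FILE DATA_FILE (hypothetical wrapper)
# Assumption: POSITION(m:n) is "offset m (0-based), length n".
extract_fields_awk() {
    awk '
        FNR == NR {                      # first file: the control file
            if (match($0, /POSITION\([0-9]+:[0-9]+\)/)) {
                # "POSITION(" is 9 characters long, ")" is 1
                spec = substr($0, RSTART + 9, RLENGTH - 10)
                split(spec, p, ":")
                n++
                start[n] = p[1] + 1      # awk substr is 1-based
                len[n] = p[2]
            }
            next
        }
        {                                # second file: the data
            out = ""
            for (i = 1; i <= n; i++)
                out = out (i > 1 ? "," : "") "\"" substr($0, start[i], len[i]) "\""
            print out
        }
    ' "$1" "$2"
}
```

Unlike the TRIM calls in the control file, this sketch does not strip surrounding whitespace from the fields.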

To avoid the (slow) while loop, you can use cut and paste:
#!/bin/bash
inFile=${1:-checkHugeFile}.in
ctrlFile=${1:-checkHugeFile}_CTRL.txt
outFile=${1:-checkHugeFile}.txt
cat /dev/null > $outFile
typeset -a array # Create array
while read -r line # Read a line
do
coOrdinates="${line#*(}"
coOrdinates="${coOrdinates%%)*}"
[[ -z "${coOrdinates// }" ]] && { echo "Not adding"; continue; }
array+=("$coOrdinates")
done < <(grep POSITION "$ctrlFile" )
echo coOrdinates: "${array[@]}"
for e in "${array[@]}"
do
nr=$((nr+1))
start=${e%:*}
len=${e#*:}
from=$(( start + 1 ))
to=$(( start + len + 1 ))
cut -c$from-$to $inFile > ${outFile}.$nr
done
paste $outFile.* | sed -e 's/^/"/' -e 's/\t/","/' -e 's/$/"/' >${outFile}
rm $outFile.[0-9]

Related

Grep -rl from a .txt list

I'm trying to find a list of strings, stored in a .txt file, in a directory of multiple .csv files (that is, to find which .csv files contain each string).
I have already found how to do it manually:
grep -rl doggo C:\dirofcsv\
The next step is to do it from a list of hundreds of terms.
I tried grep -rl -f list.txt C:\dirofcsv < print.txt but only the last term is printed. I want the results line by line.
I'm missing something but I don't know what.
I'm working on Windows with a terminal emulator.
EDIT: I've found how to list the terms from a file. Now I need to see which terms have which results, like "doggo => file2, file4". Do I need to write a loop?
Thanks community.
grep -rl -f list.txt C:\dirofcsv >> print.txt
You are looking to append lines to the print.txt file and so will need to use >> as opposed to > which will overwrite what is already in the file.
To get the output in the format required by your edit, you can feed the output of a loop back into awk (GNU awk is needed, because the script uses arrays of arrays):
awk '/^FILE -/ { fil=$3; # When the line starts with "FILE -", set fil to the third space-delimited field (the search term)
next # Skip to the next line
}
{ arr[fil][$0]="" # Build a two-dimensional array: search term (fil) as the first index, matching file name as the second
}
END { for (i in arr) { # Loop through the search terms
printf "%s => ",i # First print the search term in the required format
for (j in arr[i]) {
printf "%s,",j # Print the file name followed by a comma
}
printf "\n" # Print a new line
}
}' <<< "$(while read line # Read list.txt line by line
do
echo "FILE - $line" # Echo a marker for identification in awk
grep -rl "$line" C:\dirofcsv # Grep for the line
done < list.txt)" >> print.txt
One liner:
awk '/^FILE -/ { fil=$3;next } { arr[fil][$0]="" } END { for (i in arr) { printf "%s => ",i;for (j in arr[i]) { printf "%s,",j } printf "\n" } }' <<< "$(while read line;do echo "FILE - $line";grep -rl "$line" C:\dirofcsv; done < list.txt)" >> print.txt
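If GNU awk is not available, a plain loop gives the same "term => files" listing. This is only a sketch: the function name list_matches and its arguments are hypothetical, terms are treated as fixed strings (grep -F) rather than regular expressions, and a trailing carriage return is stripped on the assumption that list.txt may have been saved with Windows line endings.

```shell
#!/bin/bash
# list_matches LIST_FILE DIR (hypothetical helper)
# Prints one "term => file1,file2" line per non-empty term in LIST_FILE.
list_matches() {
    local listfile=$1 dir=$2 term files
    while IFS= read -r term || [[ -n $term ]]; do
        term=${term%$'\r'}            # drop a Windows carriage return, if any
        [[ -z $term ]] && continue    # skip blank lines
        # -r recurse, -l file names only, -F fixed string, -- ends the options
        files=$(grep -rlF -- "$term" "$dir" | sort | paste -sd, -)
        printf '%s => %s\n' "$term" "$files"
    done < "$listfile"
}
```

For example, list_matches list.txt /path/to/dirofcsv >> print.txt appends one line per term; a term with no match still gets a line, with nothing after the arrow.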
I think you meant to pass the command as:
grep -rl -f list.txt C:\dirofcsv >> print.txt
Give it a shot. It should take all patterns from list.txt line by line and search in the directory C:\dirofcsv for files with matching patterns and print their names to print.txt file.
Try this for printing without a loop (just like you asked in comments ;-)
One Line Answer
dir=C:\dirofcsv
listfile=list.txt
eval $(jq -Rsr 'split("\n") | map(select(length > 0)) | reduce .[] as $line ([]; . + ["echo \($line) :; grep -rl \($line) \($dir); echo"]) | (join("; "))' --arg dir "$dir" < "$listfile")
Another solution, for explanation say:
unset li
readarray li <"$listfile"
quoted_commands="$(jq -R 'reduce inputs as $line ([]; . + ["echo \($line) :; grep -rl \($line) \($dir); echo"]) | (join("; "))' \
--arg dir $dir \
<<< $(echo; printf "%s" "${li[@]}"))"
quoted_commands=${quoted_commands%\"}
commands=${quoted_commands#\"}
eval $commands
Breaking down the command, with explanations in comments:
# read contents of listfile in li
unset li && readarray li <"$listfile"
# add the content to new list so that it prints the list elements in new-lines
# also add a newline at top as it will be discarded by jq (in this case only)
list="$(echo; printf "%s" "${li[@]}";)"
# pass jq command
quoted_commands="$(jq -R 'reduce inputs as $line
([]; . + ["echo \($line) :; grep -rl \($line) \($dir); echo"])
| (join("; "))' \
--arg dir $dir <<< "$list")"
# the elements are read with reduce filter and converted to JSON Array of corresponding commands to execute
# the commands for all elements of list are joined with join filter
# trim quotes to execute commands properly
commands=$(sed -e 's/^"//' -e 's/"$//' <<< "$quoted_commands")
# run commands
eval "$commands"
You may want to print the above variables. Take care to use quotes in echo/printf while doing so, i.e., echo "$variable".
Replacement of sed command:
single_quoted=${quoted_commands%\"}
commands=${single_quoted#\"}
echo "$commands"
I am now using the following implementations (though the dictionary implementation uses a for loop, the key : value implementation doesn't, and is a single line command):
# print an Associative bash array as a JSON dictionary
print_dict()
{
declare -n ref
ref=$1
for k in $(echo "${!ref[@]}")
do
printf '{"name":"%s", "value":"%s"}\n' "$k" "${ref[$k]}"
done | jq -s 'reduce .[] as $i ({}; .[$i.name] = $i.value)'
}
#-------------------------------------------------------------------------
# print the grep output in key : value format
function list_grep()
{
local listfile=$1
local dir=$2
eval $(jq -Rsr 'split("\n") | map(select(length > 0)) | reduce .[] as $line ([]; . + ["echo \($line) :; grep -rl \($line) \($dir); echo"]) | (join("; "))' --arg dir "$dir" < "$listfile")
}
#-------------------------------------------------------------------------
# print the grep output as JSON dictionary
function dict_grep()
{
local listfile=$1
local dir=$2
eval declare -A Arr=\($(eval echo $(jq -Rrs 'split("\n") | map(select(length > 0)) | reduce .[] as $k ([]; . + ["[\($k)]=\\\"$(grep -rl \($k) \($dir))\\\""]) | (join(" "))' --arg dir "$dir" < "$listfile"))\)
print_dict Arr
}
#-------------------------------------------------------------------------
# call:
list_grep $listfile $dir
dict_grep $listfile $dir
-Himanshu

base64 decode while ignoring brackets

I'm trying to decode a file that is mostly base64-encoded. I want to decode the following while leaving the [_*_] markers untouched.
example.txt
wq9cXyjjg4QpXy/Crwo=
[_NOTBASE64ED_]
aGkgdGhlcmUK
[_CONSTANT_]
SGVsbG8gV29ybGQhCg==
Sometimes it'll be in this form
aGkgdGhlcmUK[_CONSTANT_]SGVsbG8gV29ybGQhCg==
Desired output
¯\_(ツ)_/¯
[_NOTBASE64ED_]
hi there
[_CONSTANT_]
Hello World!
hi there[_CONSTANT_]Hello World!
Error output
¯\_(ツ)_/¯
4��!:�#�H\�B�8ԓ��[��ܛBbase64: invalid input
What I've tried
base64 -di example.txt
base64 -d example.txt
base64 --wrap=0 -d -i example.txt
I tried to individually base64 the [_*_] using grep -o. Then find and
replacing them through a weird arrangement with arrays, but I couldn't
get it to work.
base64ing it all, then decoding. Results in double base64ed rows.
The file is significantly downsized!
Encoded using base64 --wrap=0, while loop, and if/else statement.
The [_*_] still need to be there after being decoded.
I am sure someone has a more clever solution than this. But try this
#! /bin/bash
MYTMP1=""
function printInlineB64()
{
local lines=($(echo $1 | sed -e 's/\[/\n[/g' -e 's/\]/]\n/g'))
OUTPUT=""
for line in "${lines[@]}"; do
MYTMP1=$(base64 -d <<< "$line" 2>/dev/null)
if [ "$?" != "0" ]; then
OUTPUT="${OUTPUT}${line}"
else
OUTPUT="${OUTPUT}${MYTMP1}"
fi;
done
echo "$OUTPUT"
}
MYTMP2=""
function printB64Line()
{
local line=$1
# not fully base64 line
if [[ ! "$line" =~ ^[A-Za-z0-9+/=]+$ ]]; then
printInlineB64 "$line"
return
fi;
# likely base64 line
MYTMP2=$(base64 -d <<< "$line" 2>/dev/null)
if [ "$?" != "0" ]; then
echo "$line"
else
echo "$MYTMP2"
fi;
}
FILE=$1
if [ -z "$FILE" ]; then
echo "Please give a file name in argument"
exit 1;
fi;
while read line; do
printB64Line "$line"
done < ${FILE}
and here is output
$ cat example.txt && echo "==========================" && ./base64.sh example.txt
wq9cXyjjg4QpXy/Crwo=
[_NOTBASE64ED_]
aGkgdGhlcmUK
[_CONSTANT_]
SGVsbG8gV29ybGQhCg==
==========================
¯\_(ツ)_/¯
[_NOTBASE64ED_]
hi there
[_CONSTANT_]
Hello World!
$ cat example2.txt && echo "==========================" && ./base64.sh example2.txt
aGkgdGhlcmUK[_CONSTANT_]SGVsbG8gV29ybGQhCg==
==========================
hi there[_CONSTANT_]Hello World!
You need a loop that reads each line and tests whether it's base64 or non-base64, and processes it appropriately.
while read -r line
do
case "$line" in
\[*\]) echo "$line" ;;
*) base64 -d <<< "$line" ;;
esac
done < example.txt
I would suggest using a language other than sh, but here is a solution using cut. It handles the case where a line contains more than one [_constant_].
#!/bin/bash
function decode() {
local data=""
local line=$1
while [[ -n $line ]]; do
data=$data$(echo $line | cut -d[ -f1 | base64 -d)
const=$(echo $line | cut -d[ -sf2- | cut -d] -sf1)
[[ -n $const ]] && data=$data[$const]
line=$(echo $line | cut -d] -sf2-)
done
echo "$data"
}
while read -r line; do
decode "$line"
done < example.txt
If Perl is an option, you can say something like:
perl -MMIME::Base64 -lpe '$_ = join("", grep {/^\[/ || chomp($_ = decode_base64($_)), 1} split(/(?=\[)|(?<=\])/))' example.txt
The code below is equivalent to the above but is broken down into steps for the explanation purpose:
#!/bin/bash
perl -MMIME::Base64 -lpe '
@ary = split(/(?=\[)|(?<=\])/, $_);
foreach (@ary) {
if (! /^\[/) {
chomp($_ = decode_base64($_));
}
}
$_ = join("", @ary);
' example.txt
The -MMIME::Base64 option loads the base64 codec module.
The -lpe options make Perl behave like AWK: it loops over the input lines and handles newlines implicitly.
The regular expression (?=\[)|(?<=\]) matches the boundaries between a base64 block and a literal block enclosed in [...].
The split function divides the line into blocks at those boundaries and stores them in an array.
The loop then decodes every entry that does not start with [.
Finally, the blocks are joined back into one line and printed.
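The same split-decode-join steps can be sketched in bash with a regular expression that peels off one [...] marker at a time. This is an illustration under assumptions, not a drop-in replacement: decode_line is a made-up name, base64 -d is the GNU coreutils spelling, and a trailing newline inside each decoded chunk is dropped so inline markers stay on one line.

```shell
#!/bin/bash
# decode_line TEXT (hypothetical helper)
# Decodes the base64 parts of TEXT and copies [...] markers through unchanged.
decode_line() {
    local rest=$1 out="" chunk
    # group 1: text before the first "[", group 2: the marker, group 3: the rest
    local re='^([^[]*)(\[[^]]*\])(.*)$'
    while [[ $rest =~ $re ]]; do
        chunk=${BASH_REMATCH[1]}
        [[ -n $chunk ]] && out+=$(printf '%s' "$chunk" | base64 -d)
        out+=${BASH_REMATCH[2]}
        rest=${BASH_REMATCH[3]}
    done
    # whatever follows the last marker (or the whole line if there was none)
    [[ -n $rest ]] && out+=$(printf '%s' "$rest" | base64 -d)
    printf '%s\n' "$out"
}
```

A whole file would be processed with while IFS= read -r line; do decode_line "$line"; done < example.txt.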

Reverse the words but keep the order Bash

I have a file with lines. I want to reverse the words, but keep them in same order.
For example: "Test this word"
Result: "tseT siht drow"
I'm using MAC, so awk doesn't seem to work.
What I got for now
input=FILE_PATH
while IFS= read -r line || [[ -n $line ]]
do
echo $line | rev
done < "$input"
Here is a solution that completely avoids awk
#!/bin/bash
input=./data
while read -r line ; do
for word in $line ; do
output=`echo $word | rev`
printf "%s " $output
done
printf "\n"
done < "$input"
In case xargs works on mac:
echo "Test this word" | xargs -n 1 | rev | xargs
Inside your read loop, you can just iterate over the words of your string and pass them to rev
line="Test this word"
for word in $line; do
echo -n " $word" | rev
done
echo # Add final newline
output
tseT siht drow
You are actually in fairly good shape with bash. You can use string-indexes and string-length and C-style for loops to loop over the characters in each word building a reversed string to output. You can control formatting in a number of ways to handle spaces between words, but a simple flag first=1 is about as easy as anything else. You can do the following with your read,
#!/bin/bash
while read -r line || [[ -n $line ]]; do ## read line
first=1 ## flag to control space
a=( $( echo $line ) ) ## put line in array
for i in "${a[@]}"; do ## for each word
tmp= ## clear temp
len=${#i} ## get length
for ((j = 0; j < len; j++)); do ## loop length times
tmp="${tmp}${i:$((len-j-1)):1}" ## add char len - j to tmp
done
if [ "$first" -eq '1' ]; then ## if first word
printf '%s' "$tmp"; first=0; ## output w/o space
else
printf ' %s' "$tmp" ## output w/space
fi
done
echo "" ## output newline
done
Example Input
$ cat dat/lines2rev.txt
my dog has fleas
the cat has none
Example Use/Output
$ bash revlines.sh <dat/lines2rev.txt
ym god sah saelf
eht tac sah enon
Look things over and let me know if you have questions.
Using rev and awk
Consider this as the sample input file:
$ cat file
Test this word
Keep the order
Try:
$ rev <file | awk '{for (i=NF; i>=2; i--) printf "%s%s",$i,OFS; print $1}'
tseT siht drow
peeK eht redro
(This uses awk but, because it uses no advanced awk features, it should work on MacOS.)
Using in a script
If you need to put the above in a script, then create a file like:
$ cat script
#!/bin/bash
input="/Users/Anastasiia/Desktop/Tasks/test.txt"
rev <"$input" | awk '{for (i=NF; i>=2; i--) printf "%s%s",$i,OFS; print $1}'
And, run the file:
$ bash script
tseT siht drow
peeK eht redro
Using bash
while read -a arr
do
x=" "
for ((i=0; i<${#arr[@]}; i++))
do
((i == ${#arr[@]}-1)) && x=$'\n'
printf "%s%s" "$(rev <<<"${arr[i]}")" "$x"
done
done <file
Applying the above to our same test file:
$ while read -a arr; do x=" "; for ((i=0; i<${#arr[@]}; i++)); do ((i == ${#arr[@]}-1)) && x=$'\n'; printf "%s%s" "$(rev <<<"${arr[i]}")" "$x"; done; done <file
tseT siht drow
peeK eht redro

Check if a string contains "-" and "]" at the same time

I have the following two regexes in Bash:
1.^[-a-zA-Z0-9\,\.\;\:]*$
2.^[]a-zA-Z0-9\,\.\;\:]*$
The first matches when the string contains a "-" along with the other allowed characters.
The second matches when it contains a "]".
I put these characters at the beginning of the bracket expression because I can't escape them.
How can I match both characters at the same time?
You can also place the - at the end of the bracket expression, since a range must be closed on both ends.
^[]a-zA-Z0-9,.;:-]*$
You don't have to escape any of the other characters, either. Colons, semicolons, and commas have no special meaning in any part of a regular expression, and a period loses its special meaning inside a bracket expression.
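As a quick check of that placement rule, here is a small sketch using bash's own =~ operator. The function name only_allowed is hypothetical; the pattern is the one shown above, with ] first and - last inside the bracket expression.

```shell
#!/bin/bash
# "]" directly after "[" and "-" directly before the closing "]" are literals.
pattern='^[]a-zA-Z0-9,.;:-]*$'

# only_allowed STRING (hypothetical helper)
# Succeeds when STRING consists solely of the allowed characters.
only_allowed() {
    [[ $1 =~ $pattern ]]   # the pattern stays unquoted so it is used as a regex
}

for s in 'as]df-gh' 'as,.;:-df' 'as df gh' '?????'; do
    if only_allowed "$s"; then
        printf '%-12s allowed\n' "$s"
    else
        printf '%-12s rejected\n' "$s"
    fi
done
```

Keeping the pattern in a variable avoids quoting problems with the brackets inside [[ ... ]].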
Basically you can use this:
grep -E '^.*-.*\]|\].*-.*$'
It matches either a - followed by zero or more arbitrary chars and a ], or a ] followed by zero or more chars and a -
However, since you don't accept arbitrary chars, you need to change it to:
grep -E '^[a-zA-Z0-9,.;:]*-[a-zA-Z0-9,.;:]*\]|\][a-zA-Z0-9,.;:]*-[a-zA-Z0-9,.;:]*$'
Maybe, this can help you
#!/bin/bash
while read p; do
echo $p | grep -E '\-.*\]|\].*\-' | grep "^[]a-zA-Z0-9,.;:-]*$"
done <$1
user-host:/tmp$ cat test
-i]string
]adfadfa-
string-
]string
str]ing
]123string
123string-
?????
++++++
user-host:/tmp$ ./test.sh test
-i]string
]adfadfa-
There are two questions in your post.
One is in the description:
How I can get match the two values at the same time?
That is an OR match, which can be done with one bracket expression that merges your two:
pattern='^[]a-zA-Z0-9,.;:-]*$'
That will match a line made up only of -, ], and the other included characters, in any combination. That covers every line in the test script below except ?????, ++++++ and as df gh.
Two is in the title:
… a string contains “-” and “]” at the same time
That is an AND match. The simplest (and slowest) way to do it is:
echo "$line" | grep '-' | grep ']' | grep '^[]a-zA-Z0-9,.;:-]*$'
The first two calls to grep select only the lines that contain both (one or several) - and (one or several) ].
The third keeps only the lines made up entirely of the allowed characters.
Test script:
#!/bin/bash
printlines(){
cat <<-\_test_lines_
asdfgh
asdfgh-
asdfgh]
as]df
as,df
as.df
as;df
as:df
as-df
as]]]df
as---df
asAS]]]DFdf
as123--456DF
as,.;:-df
as-dfg]h
as]dfg-h
a]s]d]f]g]h
a]s]d]f]g]h-
s-t-r-i-n-g]
as]df-gh
123]asdefgh
123asd-fgh-
?????
++++++
as df gh
_test_lines_
}
pattern='^[]a-zA-Z0-9,.;:-]*$'
printf '%s\n' "Testing the simple pattern of $pattern"
while read line; do
resultgrep="$( echo "$line" | grep "$pattern" )"
printf '%13s %-13s\n' "$line" "$resultgrep"
done < <(printlines)
echo "#############################################################"
echo
p1='-'; p2=']'; p3='^[]a-zA-Z0-9,.;:-]*$'
printf '%s\n' "Testing a 'grep AND' of '$p1', '$p2' and '$p3'."
while read line; do
resultgrep="$( echo "$line" | grep "$p1" | grep "$p2" | grep "$p3" )"
[[ $resultgrep ]] && printf '%13s %-13s\n' "$line" "$resultgrep"
done < <(printlines)
echo "#############################################################"
echo
printf '%s\n' "Testing an 'AWK AND' of '$p1', '$p2' and '$p3'."
while read line; do
resultawk="$( echo "$line" |
awk -v p1="$p1" -v p2="$p2" -v p3="$p3" '$0~p1 && $0~p2 && $0~p3' )"
[[ $resultawk ]] && printf '%13s %-13s\n' "$line" "$resultawk"
done < <(printlines)
echo "#############################################################"
echo
printf '%s\n' "Testing a 'bash AND' of '$p1', '$p2' and '$p3'."
while read line; do
rgrep="$( echo "$line" | grep "$p1" | grep "$p2" | grep "$p3" )"
[[ ( $line =~ $p1 ) && ( $line =~ $p2 ) && ( $line =~ $p3 ) ]]
rbash=${BASH_REMATCH[0]}
[[ $rbash ]] && printf '%13s %-13s %-13s\n' "$line" "$rgrep" "$rbash"
done < <(printlines)
echo "#############################################################"
echo
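For the AND case, the per-line grep and awk processes can be avoided altogether. The sketch below is an assumption-laden alternative, not part of the test script: has_dash_and_bracket is a made-up name, and it combines two glob tests with the same whole-line pattern as p3.

```shell
#!/bin/bash
# has_dash_and_bracket STRING (hypothetical helper)
# Succeeds when STRING contains at least one "-", at least one "]",
# and nothing outside the allowed character set.
has_dash_and_bracket() {
    local s=$1
    local re='^[]a-zA-Z0-9,.;:-]*$'
    [[ $s == *-* && $s == *']'* && $s =~ $re ]]
}

while IFS= read -r line; do
    has_dash_and_bracket "$line" && printf '%s\n' "$line"
done <<'EOF'
-i]string
]adfadfa-
string-
str]ing
a]b-c?
EOF
```

Of the five sample lines, only the first two satisfy all three conditions.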

count words in a file without using wc

Working in a shell script here, trying to count the number of words/characters/lines in a file without using the wc command. I can get the file broken into lines and count those easy enough, but I'm struggling here to get the words and the characters.
#define word_count function
count_stuff(){
c=0
w=0
l=0
local f="$1"
while read line
do
l=`expr $l + 1`
# now that I have a line I want to break it into words and characters???
done < "$f"
echo "Number characters: $c"
echo "Number words: $w"
echo "Number lines: $l"
}
As for characters, try this (adjust echo "test" to where you get your output from):
expr `echo "test" | sed "s/./ + 1/g;s/^/0/"`
As for lines, try this:
expr `echo -e "test\ntest\ntest" | sed "s/^.*$/./" | tr -d "\n" | sed "s/./ + 1/g;s/^/0/"`
===
As for your code, you want something like this to count words (if you want to go at it completely raw):
while read line ; do
set -- $line ;
while true ; do
[ -z "$1" ] && break
w=`expr $w + 1`
shift ;
done ;
done
You can do this with the following Bash shell script:
count=0
for var in `cat $1`
do
count=`echo $count+1 | bc`
done
echo $count
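All three counts can also be collected in one pass with bash built-ins only, with no expr, bc, or cat. The following is a sketch under a few assumptions: count_stuff takes the file name as its only argument, words are runs of non-blank characters, characters are counted in the current locale (like wc -m rather than wc -c), and lines are counted by their newline, so a final line without one adds words and characters but no line.

```shell
#!/bin/bash
# count_stuff FILE
# Prints "lines words chars" for FILE without calling wc.
count_stuff() {
    local file=$1 line lines=0 words=0 chars=0
    local -a fields
    while IFS= read -r line; do
        lines=$((lines + 1))
        chars=$((chars + ${#line} + 1))   # +1 for the newline that read removed
        read -ra fields <<< "$line"       # split the line on blanks
        words=$((words + ${#fields[@]}))
    done < "$file"
    # read fails on a last line without a newline but still fills $line
    if [[ -n $line ]]; then
        chars=$((chars + ${#line}))
        read -ra fields <<< "$line"
        words=$((words + ${#fields[@]}))
    fi
    printf '%s %s %s\n' "$lines" "$words" "$chars"
}
```

The output order follows wc, so count_stuff somefile can be compared directly against wc -lwm somefile.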
