How to read line by line from file, count specific character [duplicate] - bash

This question already has answers here:
Count occurrences of a char in a string using Bash
(9 answers)
Closed 3 years ago.
I have data in file.txt like below
N4*1
NM1*IL*2
PER*IC*XM*
How can I read it line by line and get the count of the * character in each line?
Pseudocode
#!/bin/sh
cd "$(dirname "$0")"
filelinecount=$(wc -l < file.txt)
if [ "$filelinecount" -gt 0 ]; then
    for each line in file.txt; do (PSEUDOCODE: ITERATE OVER LINES)
        fileline=$line (STORES LINE IN VARIABLE)
        charactercount=(COUNT OF [*] IN "$fileline")
        (GET CHARACTER [*] COUNT AND STORE IN VARIABLE)
        echo $charactercount
    done
else
    echo "file.txt doesn't contain any lines"
fi
The 'for' loop should read the file line by line, store each line in the variable "fileline", count the [*] characters, store that count in the variable "charactercount", and print $charactercount. The loop has to repeat for all the lines in the file. How can I achieve this with a 'for' loop?
Expected output:
1
2
3
This is not a duplicate question, as this question specifically asks for the count of characters using a "for" loop.
The "Count occurrences of a char in a string using Bash" post doesn't answer it.

awk '{print gsub(/\*/,"")}' file
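gsub returns the number of substitutions it made, so this prints one count per line; on the sample data above:
$ awk '{print gsub(/\*/,"")}' file
1
2
3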
To achieve the same in a loop:
#! /bin/bash
while IFS= read -r line
do
grep -o '*' <<<"$line" | grep -c .
done < file
This should print * count per line.
Update as per the comment:
#! /bin/bash
while IFS= read -r line
do
echo "$line" | awk -F"[*]" '{print NF-1}'
done < file
[ ! -s file ] && echo "no lines in file"
[ ! -s file ] has nothing to do with the loop. The -s flag checks whether the file has contents; if it does, the test returns true. In your case you want the opposite behaviour, so we use !. So when the file is empty it returns true, and && causes the next command to execute, i.e. echo "no lines in file".
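For completeness, a pure-bash variant of the same loop, counting with parameter expansion instead of calling an external command per line (a small sketch):
#!/bin/bash
while IFS= read -r line
do
    stars=${line//[^*]/}    # delete every character that is not a *
    echo "${#stars}"        # the length of what remains is the * count
done < file
[ ! -s file ] && echo "no lines in file"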

tr -dc '*\n' < file.txt | awk '{print length}'
Remove everything except stars and newlines from the file. Then print the line lengths.
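On the question's file.txt this yields:
$ tr -dc '*\n' < file.txt | awk '{print length}'
1
2
3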

Related

Detect double new lines with bash script

I am attempting to return the line number of lines that have a break. An input example:
2938
383

3938
3

383
33333
But my script is not working and I can't see why. My script:
input="./input.txt"
declare -i count=0
while IFS= read -r line;
do
((count++))
if [ "$line" == $'\n\n' ]; then
echo "$count"
fi
done < "$input"
So I would expect 3 and 6 as output.
I just receive a blank response in the terminal when I execute, so there isn't a syntax error; something else is wrong with my approach. I'm a bit stumped and grateful for any pointers.
Also "just use awk" doesn't help me. I need this structure for additional conditions (this is just a preliminary test) and I don't know awk syntax.
The issue is that "$line" == $'\n\n' will never match: read strips the newline delimiter, so after consuming an empty line from the input, $line is simply the empty string. Instead you can match an empty line with the regex pattern ^$:
if [[ "$line" =~ ^$ ]]; then
Now it should work.
It's also much easier with an awk command:
$ awk '$0 == ""{ print NR }' test.txt
3
6
As Roman suggested, each line read by read is terminated by a delimiter, and that delimiter never shows up in the line, so testing for it the way you do cannot work.
If the pattern you are searching for looks like an empty line (which I infer is how a "double newline" always manifests), then you can just test for that:
while read -r; do
((count++))
if [[ -z "$REPLY" ]]; then
echo "$count"
fi
done < "$input"
Note that IFS is for field-splitting data on lines, and since we're only interested in empty lines, IFS is moot.
Or if the file is small enough to fit in memory and you want something faster:
mapfile -t -O1 foo < "$input"
declare -p foo
for n in "${!foo[@]}"; do
if [[ -z "${foo[$n]}" ]]; then
echo "$n"
fi
done
Reading the file all at once (mapfile) then stepping through an array may be easier on resources than stepping through a file line by line.
You can also just use GNU awk:
gawk -v RS= -F '\n' 'RT ~ /\n\n/ { print (i += NF + 1); i += length(RT) - 2 }' input.txt
By using FS = ".+", it ensures that only truly zero-length lines (i.e. $0 == "") have their line numbers printed, while rows consisting entirely of [[:space:]] characters are skipped:
echo '2938
383

3938
3

383
33333' |
{m,g,n}awk -F'.+' '!NF && $!NF = NR'
3
6
This sed one-liner should do the job at once:
sed -n '/^$/=' input.txt
Simply writes the current line number (the = command) if the line read is empty (the /^$/ matches the empty line).
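grep can do the same if you prefer its interface; -n prefixes each matching line with its line number:
$ grep -n '^$' input.txt | cut -d: -f1
3
6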

Unix bash script grep loop counter (for)

I am looping over a grep result. The result contains 10 lines (every line has different content), so the stuff in the loop gets executed 10 times.
I need the index, 0-9, in each run so I can do actions based on the index.
ABC=(cat test.log | grep "stuff")
counter=0
for x in $ABC
do
echo $x
((counter++))
echo "COUNTER $counter"
done
Currently the counter won't really change.
Output:
51209
120049
148480
1211441
373948
0
0
0
728304
0
COUNTER: 1
If your requirement is only to print a counter (which is all the shown samples do), you could do this in a single awk command, if you are OK with that: instead of creating a variable and then using grep as you do currently, awk can perform both the search and the counter printing in one shot.
awk -v counter=0 '/stuff/{print "counter=" counter++}' Input_file
Replace stuff above with the actual string you are looking for, and replace Input_file with your actual file name.
This should print like:
counter=1
counter=2
........and so on
Your shell script contains what should be an obvious syntax error.
ABC=(cat test.log | grep "stuff")
This fails with
-bash: syntax error near unexpected token `|'
There is no need to save the output in a variable if you only want to process one line at a time (and there is obviously no need for the useless cat).
grep "stuff" test.log | nl
gets you numbered lines, though the index will be 1-based, not zero-based.
If you absolutely need zero-based, refactoring to Awk should solve it easily:
awk '/stuff/ { print n++, $0 }' test.log
If you want to loop over this and do something more with this information,
awk '/stuff/ { print n++, $0 }' test.log |
while read -r index output; do
echo index is "$index"
echo output is "$output"
done
Because the while loop executes in a subshell the value of index will not be visible outside of the loop. (I guess that's what your real code did with the counter as well. I don't think that part of the code you posted will repro either.)
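If you do need index after the loop finishes, a common workaround is process substitution, which keeps the while loop in the current shell; a minimal sketch:
#!/bin/bash
index=
while read -r index output; do
    echo "output is $output"
done < <(awk '/stuff/ { print n++, $0 }' test.log)
echo "last index was $index"    # still visible: the loop did not run in a subshell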
Do not store the result of grep in a scalar variable $ABC.
If a line of the log file contains whitespace, the variable $x
is split on it due to bash's word splitting.
(BTW, the statement ABC=(cat test.log | grep "stuff") causes a syntax error.)
Please try something like:
readarray -t abc < <(grep "stuff" test.log)
for x in "${abc[@]}"
do
echo "$x"
echo "COUNTER $((++counter))"
done
or
readarray -t abc < <(grep "stuff" test.log)
for i in "${!abc[@]}"
do
echo "${abc[i]}"
echo "COUNTER $((i + 1))"
done
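For instance, with a hypothetical two-line match in test.log, the second form prints:
$ printf 'a stuff 1\nnope\nb stuff 2\n' > test.log
$ readarray -t abc < <(grep "stuff" test.log)
$ for i in "${!abc[@]}"; do echo "${abc[i]}"; echo "COUNTER $((i + 1))"; done
a stuff 1
COUNTER 1
b stuff 2
COUNTER 2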
You can use the increment statement below:
counter=$((counter + 1))

Counting number of delimiters of special character bash shell script Performance improvement

Hi, I have a script that counts the number of records in a file and finds the expected number of delimiters per record by dividing the delimiter count (rs_count) by the total record count. It works fine, but it is a little slow on large files. I was wondering if there is a way to improve performance. The delimiter (RS) is the special character octal \246, and I am using a bash shell script.
Some additional info:
A line is a record.
The file will always have the same number of delimiters.
The purpose of the script is to check if the file has the expected number of fields. After calculating it, the script just echos it out.
for file in $SOURCE; do
echo "executing File -"$file
if (( $total_record_count != 0 ));then
filename=$(basename "$file")
total_record_count=$(wc -l < $file)
rs_count=$(sed -n 'l' $file | grep -o $RS | wc -l)
Delimiter_per_record=$((rs_count/total_record_count))
fi
done
Counting the delimiters (not total records) in a file
On a file with 50,000 lines, I see roughly a 10-fold speedup by replacing the sed, grep, and wc pipeline with a single awk process:
awk -v RS='Delimiter' 'END{print NR -1}' input_file
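The idea: with the delimiter as the record separator, awk sees one record per delimiter-separated chunk, so NR minus one is the delimiter count. A quick sanity check with an ordinary comma as the delimiter:
$ printf 'a,b,c\n' | awk -v RS=',' 'END{print NR - 1}'
2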
Dealing with wc when there's no trailing line break
If you count the instances of ^ (start of line), you will get a true count of lines. Using grep:
grep -co "^" input_file
(Thankfully, even though ^ is a regex, the performance of this is on par with wc)
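To see the difference, compare both on input whose last line lacks a newline, which is exactly where wc -l undercounts:
$ printf 'a\nb' | wc -l
1
$ printf 'a\nb' | grep -co "^"
2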
Incorporating these two modifications into a trivial test based on your supplied code:
#!/usr/bin/env bash
SOURCE="$1"
RS=$'\246'
for file in $SOURCE; do
echo "executing File -"$file
if [[ $total_record_count != 0 ]];then
filename=$(basename "$file")
total_record_count=$(grep -oc "^" $file)
rs_count="$(awk -v RS=$'\246' 'END{print NR -1}' $file)"
Delimiter_per_record=$((rs_count/total_record_count))
fi
done
echo -e "\$rs_count:\t${rs_count}\n\$Delimiter_per_record:\t${Delimiter_per_record}\n\$total_record_count:\t${total_record_count}" | column -t
Running this on a file with 50,000 lines on my macbook:
time ./recordtest.sh /tmp/randshort
executing File -/tmp/randshort
$rs_count: 186885
$Delimiter_per_record: 3
$total_record_count: 50000
real 0m0.064s
user 0m0.038s
sys 0m0.012s
Unit test one-liner
(creates /tmp/recordtest, chmod +x's it, creates /tmp/testfile with 10 lines of random characters including octal \246, and then runs the script file on the testfile)
echo $'#!/usr/bin/env bash\n\nSOURCE="$1"\nRS=$\'\\246\'\n\nfor file in $SOURCE; do\n echo "executing File -"$file\n if [[ $total_record_count != 0 ]];then\n filename=$(basename "$file")\n total_record_count=$(grep -oc "^" $file)\n rs_count="$(awk -v RS=$\'\\246\' \'END{print NR -1}\' $file)"\n Delimiter_per_record=$((rs_count/total_record_count))\n fi\ndone\n\necho -e "\\$rs_count:\\t${rs_count}\\n\\$Delimiter_per_record:\\t${Delimiter_per_record}\\n\\$total_record_count:\\t${total_record_count}" | column -t' > /tmp/recordtest ; echo $'\246459ca4f23bafff1c8fc017864aa3930c4a7f2918b\246753f00e5a9278375b\nb\246a3\246fc074b0e415f960e7099651abf369\246a6f\246f70263973e176572\2467355\n1590f285e076797aa83b2ee537c7f99\24666990bb60419b8aa\246bb5b6b\2467053\n89b938a5\246560a54f2826250a2c026c320302529331229255\246ef79fbb52c2\n9042\246bb\246b942408a22f912268ffc78f08c\2462798b0c05a75439\246245be2ea5\n0ef03170413f90e\246e0\246b1b2515c4\2466bf0a1bb\246ee28b78ccce70432e6b\24653\n51229e7ab228b4518404360b31a\2463673261e3242985bf24e59bc657\246999a\n9964\246b08\24640e63fae788ea\246a1777\2460e94f89af8b571e\246e1b53e6332\246c3\246e\n90\246ae12895f\24689885e\246e736f942080f267a275132a348ec1e837b99efe94\n2895e91\246\246f506f\246c1b986a63444b4258\246bc1b39182\24630\24696be' > /tmp/testfile ; chmod +x /tmp/recordtest ; /tmp/./recordtest /tmp/testfile
Which produces this result:
$rs_count: 39
$Delimiter_per_record: 3
$total_record_count: 10
Though there are a number of solutions for counting instances of characters in files, quite a few come undone when trying to process special characters like octal \246.
awk seems to handle it reliably and quickly.

Count number of lines under each header in a text file using bash shell script

I can do this easily in python or some other high level language. What I am interested in is doing this with bash.
Here is the file format:
head-xyz
item1
item2
item3
head-abc
item8
item5
item6
item9
What I would like to do is print the following output:
head-xyz: 3
head-abc: 4
Headers have a specific pattern, similar to the example I gave above; items likewise have specific patterns. I am only interested in the count of items under each header.
You can use awk:
awk '/head/{h=$0}{c[h]++}END{for(i in c)print i, c[i]-1}' input.file
Breakdown:
/head/{h=$0}
For every line matching /head/, set variable h to record the header.
{c[h]++}
For every line in the file, update the array c, which stores a map from header string to line count.
END{for(i in c)print i, c[i]-1}
At the end, loop through the keys in array c and print the key (header) followed by the value (count). Subtract one to avoid counting the header itself.
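Running it on the sample input gives (note that for (i in c) visits keys in an unspecified order, so the two lines may swap):
$ awk '/head/{h=$0}{c[h]++}END{for(i in c)print i, c[i]-1}' input.file
head-abc 4
head-xyz 3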
Note: Bash version 4 only (uses associative arrays)
#!/usr/bin/env bash
FILENAME="$1"
declare -A CNT
while read -r LINE || [[ -n $LINE ]]
do
if [[ $LINE =~ ^head ]]; then HEADLINE="$LINE"; fi
if [ ${CNT[$HEADLINE]+_} ];
then
CNT[$HEADLINE]=$(( ${CNT[$HEADLINE]} + 1 ))
else
CNT[$HEADLINE]=0
fi
done < "$FILENAME"
for i in "${!CNT[@]}"; do echo "$i: ${CNT[$i]}"; done
Output:
$ bash countitems.sh input
head-abc: 4
head-xyz: 3
Does this answer your question, @powerrox?
If you don't consider sed a high-level language, here's another approach:
for file in head-*; do
echo "$file: \c"
sed -n '/^head-/,${
/^head-/d
/^item[0-9]/!q
p
}
' <$file | wc -l
done
In English, the sed script does
Don't print by default
Within the range from the first line matching /^head-/ to the end of the file
Delete the "head line"
After that, quit if you find a non-item line
Otherwise, print the line
And wc -l to count lines.

replacing multiple lines in shell script with only one output file

I have one file, Length.txt, containing 40 names, one per line.
I want to write a small shell script that counts the characters in each line of the file and, if the count is less than 9, replaces that line by appending 8 extra spaces and a 1 at the end.
For example, if the name is
XXXXXX
replace as
XXXXXX        1
I tried the code below. It works for me; however, whenever it replaces a line it displays all the lines at once.
So suppose I have 40 lines in Length.txt and 4 of them have a character count below 9: my output then has 160 lines.
Can anyone help me get a 40-line output with just the 4 changed lines?
#!/usr/bin/sh
#set -x
while read line;
do
count=`echo $line|wc -m`
if [ $count -lt 9 ]
then
Number=`sed -n "/$line/=" Length.txt`;
sed -e ""$Number"s/$line/$line 1/" Length4.txt
fi
done < Length.txt
A single sed command can do that:
sed -E 's/^.{,8}$/&        1/' file
To modify the contents of the file add -i:
sed -E -i 's/^.{,8}$/&        1/' file
Partial output:
94605320        1
105018263
2475218231
7728563        1
        1
* Fixed to add only 8 spaces, not 9, and to include empty lines. If you don't want to process empty lines, use {1,8}.
$ cat foo.input
I am longer than 9 characters
I am also longer than 9 characters
I am not
Another long line
short
$ while read line; do printf '%s' "$line"; (( ${#line} < 9 )) && printf ' 1'; echo; done < foo.input
I am longer than 9 characters
I am also longer than 9 characters
I am not 1
Another long line
short 1
Let me show you what is wrong with your script. The only thing missing is that you need sed -i to edit the file and re-save it after making the replacement.
I'm assuming Length4.txt is just a copy of Length.txt?
I added sed -i to your script and it should work now:
cp Length.txt Length4.txt
while read line;
do
count=`echo $line|wc -m`
if [ $count -lt 9 ]
then
Number=`sed -n "/$line/=" Length.txt`
sed -i -e "${Number}s/$line/$line 1/" Length4.txt
fi
done < Length.txt
However, you don't need sed or wc. You can simplify your script as follows:
while IFS= read -r line
do
count=${#line}
if (( count < 9 ))
then
echo "$line 1"
else
echo "$line"
fi
done < Length.txt > Length4.txt
$ awk -v FS= 'NF<9{$0=sprintf("%s%*s1",$0,8,"")} 1' file
XXXXXX        1
Note how simple it would be to check for a number other than 9 characters and to print some sequence of blanks other than 8.
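For example, both numbers can be hoisted into variables (a sketch; n and pad are assumed names for the length threshold and the padding width):
$ awk -v FS= -v n=9 -v pad=8 'NF<n{$0=sprintf("%s%*s1",$0,pad,"")} 1' file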
