Printing the line number of the first occurrence of a pattern starting from a specific line number in a file - bash

How can one print the line number of the first occurrence of the string foo in a file abc.text? I want the line number of the first occurrence as if abc.text started from its 10th line.

One way would be to use GNU sed this way:
sed -n '10,$ {/foo/{=; q}}' abc.text
That is:
In the range from line 10 to the end
-> For a line matching foo
-> Print the line number and then stop processing
Some tests to demonstrate the correctness:
{ for i in {1..9}; do echo $i; done; echo foo; echo bar; echo foo; } | \
sed -n '10,$ {/foo/{=; q}}'
# correctly prints 10
{ for i in {1..9}; do echo $i; done; echo x; echo foo; echo bar; echo foo; } | \
sed -n '10,$ {/foo/{=; q}}'
# correctly prints 11
{ for i in {1..9}; do echo $i; done; } | \
sed -n '10,$ {/foo/{=; q}}'
# correctly prints nothing

You could try:
awk 'NR>=10 && $0~/foo/ {print NR; exit}' abc.text
Explanations:
NR is the line number, only lines with number >=10 will be considered
if you want the output line number to ignore the first 9 lines, print NR-9
$0~/foo/ is standard pattern matching in awk
the exit will stop processing at the first occurrence

This may be more readable for some users. Note that grep numbers the lines it receives, so a match on line 10 of the file is reported as 1:
tail -n +10 abc.text | grep -n foo | head -n 1 | cut -d: -f1
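A small wrapper can report both the absolute and the relative line number at once. This is a minimal sketch, not taken from any of the answers: the function name first_match_from is hypothetical, and the pattern is treated as an awk regular expression.

```shell
#!/bin/bash
# first_match_from FILE START PATTERN
# Hypothetical helper: prints "ABS REL" for the first line at or after
# START that matches PATTERN (an awk regular expression).
#   ABS = line number within the whole file
#   REL = line number counted as if the file began at line START
first_match_from() {
    awk -v start="$2" -v pat="$3" '
        NR >= start && $0 ~ pat { print NR, NR - start + 1; exit }
    ' "$1"
}
```

Calling first_match_from abc.text 10 foo prints nothing when there is no match at or after line 10, which mirrors the behaviour of the sed and awk commands above.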


Accept filename as argument and calculate repeated words along with count

I need to find the number of repeated characters in a text file, and I need to pass the filename as an argument.
Example:
test.txt data contains
Zoom
Output should be like:
z 1
o 2
m 1
I need a command that accepts a filename as an argument and lists how many times each character occurs in that file. In my example, test.txt contains the word Zoom, so the output shows how many times each letter is repeated.
My attempt:
vi test.sh
#!/bin/bash
FILE="$1" # to pass filename as argument
sort file1.txt | uniq -c # to count the number of letters
Just a guess?
cat test.txt |
tr '[:upper:]' '[:lower:]' |
fold -w 1 |
sort |
uniq -c |
awk '{print $2, $1}'
m 1
o 2
z 1
I suggest an awk script that counts all kinds of characters (splitting on an empty FS works in gawk, but is not guaranteed by POSIX):
awk '
BEGIN { FS = "" } # make each char a field
{
    for (i = 1; i <= NF; i++) { # iterate over all fields in the line
        ++charsArr[$i]; # count each occurrence of the field in an array
    }
}
END {
    for (char in charsArr) { # iterate over the chars array
        printf("%3d %s\n", charsArr[char], char); # print the occurrence count and the char
    }
}' input.1.txt | sort -n
Or in one line:
awk '{for(i=1;i<=NF;i++)++arr[$i]}END{for(char in arr)printf("%3d %s\n",arr[char],char)}' FS="" input.1.txt|sort -n
#!/bin/bash
#get the argument for further processing
inputfile="$1"
#check if file exists
if [ -f "$inputfile" ]
then
#convert file to a usable format
#convert all characters to lowercase
#put each character on a new line
#output to temporary file
tr '[:upper:]' '[:lower:]' < "$inputfile" | sed -e 's/\(.\)/\1\n/g' > tmp.txt
#loop over every character from a-z
for char in {a..z}
do
#count how many times a character occurs
count=$(grep -c "$char" tmp.txt)
#print if count > 0
if [ "$count" -gt "0" ]
then
echo -e "$char" "$count"
fi
done
rm tmp.txt
else
echo "file not found!"
exit 1
fi
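The pipeline and the file check from the answers above can be combined into one function that takes the filename as its argument. This is only a sketch: the name count_letters is hypothetical, and it assumes that only letters should be counted, so digits, punctuation and whitespace are dropped.

```shell
#!/bin/bash
# count_letters FILE
# Hypothetical helper: prints "letter count" for each letter in FILE,
# ignoring case and discarding every non-letter character.
count_letters() {
    if [ ! -f "$1" ]; then
        echo "file not found: $1" >&2
        return 1
    fi
    tr '[:upper:]' '[:lower:]' < "$1" |
        tr -cd '[:lower:]' |    # keep letters only (this also drops newlines)
        fold -w 1 |             # one character per line
        sort |
        uniq -c |
        awk '{ print $2, $1 }'  # swap "count letter" to "letter count"
}
```

Saved in test.sh with a final line such as count_letters "$1", it would be called as ./test.sh test.txt. The output is sorted by letter, not by order of appearance in the file.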

How to get the line number of a string in another string in Shell

Given
str="hij";
sourceStr="abc\nefg\nhij\nlmn\nhij";
I'd like to get the line number of the first occurrence of $str in $sourceStr, which should be 3.
I don't know how to do it.
I have tried:
awk 'match($0, v) { print NR; exit }' v=$str <<<$sourceStr
grep -n $str <<< $sourceStr | grep -Eo '^[^:]+';
grep -n $str <<< $sourceStr | cut -f1 -d: | sort -ug
grep -n $str <<< $sourceStr | awk -F: '{ print $1 }' | sort -u
All output 1, not 3.
How can I get the line number of $str in $sourceStr?
Thanks!
You may use this awk + printf in bash:
awk -v s="$str" '$0 == s {print NR; exit}' <(printf "%b\n" "$sourceStr")
3
Or even this awk without any bash support:
awk -v s="$str" -v source="$sourceStr" 'BEGIN {
split(source, a); for (i=1; i in a; ++i) if (a[i] == s) {print i; exit}}'
3
You may use this sed as well:
sed -n "/^$str$/{=;q;}" <(printf "%b\n" "$sourceStr")
3
Or this grep + cut:
printf "%b\n" "$sourceStr" | grep -nxF -m 1 "$str" | cut -d: -f1
3
It's not clear if you've just made a cut-n-paste error, but your sourceStr is not a multiline string (as demonstrated below). Also, you really need to quote your herestring (also demonstrated below). Perhaps you just want:
$ sourceStr="abc\nefg\nhij\nlmn\nhij"
$ echo "$sourceStr"
abc\nefg\nhij\nlmn\nhij
$ sourceStr=$'abc\nefg\nhij\nlmn\nhij'
$ echo "$sourceStr"
abc
efg
hij
lmn
hij
$ cat <<< $sourceStr
abc efg hij lmn hij
$ cat <<< "$sourceStr"
abc
efg
hij
lmn
hij
$ str=hij
$ awk "/${str}/ {print NR; exit}" <<< "$sourceStr"
3
Just use sed!
printf 'abc\nefg\nhij\nlmn\nhij\n' \
| sed -n '/hij/ { =; q; }'
Explanation: if sed meets a line that contains "hij" (regex /hij/), it prints the line number (the = command) and exits (the q command). Else it doesn't print anything (the -n switch) and goes on with the next line.
[update] Hmmm, sorry, I just noticed your "All output 1, not 3".
The primary reason why your commands don't output 3 is that sourceStr="abc\nefg\nhij\nlmn\nhij" doesn't automagically change your \n into new lines, so it ends up being one single line and that's why your commands always display 1.
If you want a multiline string, here are two solutions with bash:
printf -v sourceStr "abc\nefg\nhij\nlmn\nhij"
sourceStr=$'abc\nefg\nhij\nlmn\nhij'
And now that your variable contains whitespace characters (newlines), as stated by William Pursell, you must enclose $sourceStr in double quotes to preserve them:
grep -n "$str" <<< "$sourceStr" | ...
There's always a hard way to do it:
str="hij";
sourceStr="abc\nefg\nhij\nlmn\nhij";
echo -e $sourceStr | nl | grep $str | head -1 | gawk '{ print $1 }'
or, a bit more efficient:
str="hij";
sourceStr="abc\nefg\nhij\nlmn\nhij";
echo -e $sourceStr | gawk '/'$str/'{ print NR; exit }'
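The answers in this thread can be folded into one small function. This is a sketch under stated assumptions: the name line_of is hypothetical, the match is on whole lines as a fixed string (as in the grep -nxF answer), and grep supports -m as GNU and BSD grep do.

```shell
#!/bin/bash
# line_of STR SOURCE
# Hypothetical helper: prints the number of the first line of SOURCE that
# is exactly STR. SOURCE may contain real newlines or literal "\n"
# escapes; printf %b expands the latter (and any other backslash escapes).
line_of() {
    printf '%b\n' "$2" | grep -nxF -m 1 -- "$1" | cut -d: -f1
}
```

Both line_of "$str" "$sourceStr" with the original assignment and the same call with sourceStr=$'abc\nefg\nhij\nlmn\nhij' go through the same code path, because every expansion is quoted. When nothing matches, the function prints nothing.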

How to pad out values line by line while maintaining overall record length in a Unix Shell script ksh

IFS=$'\n'
while read -r line
do
# header/trailer record
if echo ${line} | grep -e '000000000000000' -e '999999999999999' >/dev/null 2>&1
then
echo ${line} >> outfile.01.DAT.sampleNEW
elif echo ${line} | grep '+0' >/dev/null 2>&1
then
echo ${line} | sed -e 's/+/+00000000/; s/ X/X/' >> outfile.01.DAT.sampleNEW
else
echo ${line} | sed -e 's/-/-00000000/; s/ X/X/' >> outfile.01.DAT.sampleNEW
fi
done < Inputfile.01.DAT
I have a large file in which I need to pad out the signed amount fields while retaining the overall record length, so some filler spaces at the end have to be removed (each line ends with X). The header and trailer records must not change. My approach works, but it is very slow on a large input file; I suspect that running grep for every line is the problem.
Sample records (each ends with X; overall length 107 bytes):
000000000000000PPPPPPPPP Information INV TRANSACTION 0120160505201605052154HI203.SEQ 01 X
000000000000001PPPPP14PA 000YYYYYY488 -0001235.2520150319 X
000000000000002PPPMS PA 000RRRRR4539 +0008285.0020160301 X
000000000000003PPPP506 000TTTTTT605 -0000225.0020150608 X
9999999999999990000000000000439.940000000079802782.180000005 X
I suspect you want something like this, but it is very hard to tell given the way you have presented your question:
awk '
/000000000000000/ || /999999999999999/ {print; next}
# "+0" becomes "+" and nine zeroes: the original 0 plus eight new ones
/\+0/ {sub(/\+0/,"+000000000"); sub(/        X$/,"X"); print; next}
/-0/ {sub(/-0/,"-000000000"); sub(/        X$/,"X"); print; next}
' Inputfile.01.DAT
That says: "If the line contains a string of 15 zeroes or 15 nines, print it and move to the next line. If the line contains +0, insert eight zeroes after the sign and remove eight spaces before the final X, then print. Likewise for -0." The inserted zeroes and the removed spaces cancel out, so the record length stays the same.
You could also maybe use Perl, and do something like this:
perl -nle '/0{15}|9{15}/ && print; s/([+-])(?=0)/${1}00000000/ && s/ {8}X$/X/ && print' Inputfile.01.DAT
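Whichever tool is used, it is worth checking that the record length really is unchanged. The following is a sketch with assumptions that the question does not confirm: the function name pad_amounts is hypothetical, the first + or - in a detail record is taken to be the amount sign, and at least eight filler spaces precede the final X.

```shell
#!/bin/bash
# pad_amounts: reads records on stdin, writes padded records to stdout.
# Assumptions: the first + or - in a detail record is the amount sign,
# and at least eight filler spaces precede the final X.
pad_amounts() {
    awk '
        # Header and trailer records pass through unchanged.
        /000000000000000/ || /999999999999999/ { print; next }
        {
            # Insert eight zeroes after the sign; only if that happened,
            # drop eight filler spaces so the record length is unchanged.
            if (sub(/[+-]/, "&00000000"))
                sub(/        X$/, "X")
            print
        }
    '
}
```

A quick sanity check is to compare line lengths before and after, for example by running awk '{ print length($0) }' over both files and confirming that every line still reports 107.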

Run script on pipeline output variable number of times

I have a command that appends a computed column to stdout that I would like to apply a variable N number of times.
For example if my input was 'hello\nworld\n' and I wanted to append a column of 0, N=3 times I could type the following:
echo -e 'hello\nworld' | sed 's/$/ 0/' | sed 's/$/ 0/' | sed 's/$/ 0/'
I've been trying stupid ideas like:
echo -e 'hello\nworld' | (for i in $(seq 1 $N); do echo $(cat) 0; done)
and
echo -e 'hello\nworld' | (for i in $(seq 1 $N); do sed 's/$/ 0/'; done)
but clearly these are not chaining the pipeline.
Any ideas?
This is easily done with recursion:
repeat() {
count="$1"
shift
if [ "$count" -ge 1 ]
then
"$@" | repeat "$((count-1))" "$@"
else
cat
fi
}
Examples:
$ echo foo | repeat 0 sed 's/$/ 0/'
foo
$ echo foo | repeat 1 sed 's/$/ 0/'
foo 0
$ echo foo | repeat 3 sed 's/$/ 0/'
foo 0 0 0
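The recursive function builds a real pipeline of N processes, so data streams through it. For comparison, here is a sketch of an iterative variant. It is not from the answer: the name repeat_loop is hypothetical, and it assumes the whole input fits comfortably in a shell variable.

```shell
#!/bin/bash
# repeat_loop COUNT COMMAND [ARGS...]
# Hypothetical iterative variant: applies COMMAND to stdin COUNT times.
# Unlike the recursive version it buffers all data in a variable, so
# trailing blank lines are lost and very large inputs are a poor fit.
repeat_loop() {
    local count=$1 data i
    shift
    data=$(cat)
    for ((i = 0; i < count; i++)); do
        data=$(printf '%s\n' "$data" | "$@")
    done
    printf '%s\n' "$data"
}
```

The trade-off is simplicity against streaming: the loop is easier to follow, but each pass waits for the previous one to finish, whereas the recursive pipeline runs all stages concurrently.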
So you want to append the value 0 to each line N = 3 times:
awk -v value="0" -v N=3 \
'{printf "%s", $0; for (i = 0; i < N; i++) printf " %s", value; print "" }'
Pass the value and the repeat count as variables to awk. For each line, print the input; then add N copies of the value; then emit a newline.
You could use OFS (the output field separator) in place of blanks to separate the output in the loop:
printf "%s%s", OFS, value;
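Putting the OFS suggestion together with the loop gives a reusable function. This is a sketch only: the name append_column and its optional third argument are hypothetical conveniences, not part of the answer.

```shell
#!/bin/bash
# append_column VALUE N [SEP]
# Hypothetical wrapper: appends N copies of VALUE to every stdin line,
# separated by SEP (default: a single space), which awk receives as OFS.
append_column() {
    awk -v value="$1" -v N="$2" -v OFS="${3:- }" '
        {
            printf "%s", $0
            for (i = 0; i < N; i++) printf "%s%s", OFS, value
            print ""
        }
    '
}
```

For example, echo -e 'hello\nworld' | append_column 0 3 appends three space-separated zeroes to each line, and passing a comma as the third argument produces comma-separated columns instead.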

How do I use Head and Tail to print specific lines of a file

I want to output, say, lines 5 - 10 of a file, with the file name and the line numbers passed in as arguments.
How could I use head and tail to do this?
where firstline = $2 and lastline = $3 and filename = $1.
Running it should look like this:
./lines.sh filename firstline lastline
head -n XX # <-- print first XX lines
tail -n YY # <-- print last YY lines
If you want lines from 20 to 30 that means you want 11 lines starting from 20 and finishing at 30:
head -n 30 file | tail -n 11
#
# first 30 lines
# last 11 lines from those previous 30
That is, you firstly get first 30 lines and then you select the last 11 (that is, 30-20+1).
So in your code it would be:
head -n "$3" "$1" | tail -n "$(( $3 - $2 + 1 ))"
Based on firstline = $2, lastline = $3, filename = $1:
head -n "$lastline" "$filename" | tail -n "$(( lastline - firstline + 1 ))"
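One caveat with the head-then-tail order is that it relies on the file having at least lastline lines. The sketch below puts both orders side by side; the function names are hypothetical and only serve the comparison.

```shell
#!/bin/bash
# Two hypothetical helpers with the same interface: FILE FIRST LAST.

# head first, then tail: correct only if FILE has at least LAST lines,
# because tail counts backwards from whatever head delivered.
lines_head_tail() {
    head -n "$3" "$1" | tail -n "$(( $3 - $2 + 1 ))"
}

# tail first, then head: always starts at FIRST, and simply stops
# early when FILE is shorter than LAST.
lines_tail_head() {
    tail -n "+$2" "$1" | head -n "$(( $3 - $2 + 1 ))"
}
```

With a 25-line file and the range 20 to 30, lines_head_tail prints lines 15 to 25, while lines_tail_head prints lines 20 to 25. When the file is long enough, both print the same lines.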
Aside from the answers given by fedorqui and Kent, you can also use a single sed command:
#!/bin/sh
filename=$1
firstline=$2
lastline=$3
# Basics of sed:
# 1. sed commands have a matching part and a command part.
# 2. The matching part matches lines, generally by number or regular expression.
# 3. The command part executes a command on that line, possibly changing its text.
#
# By default, sed will print everything in its buffer to standard output.
# The -n option turns this off, so it only prints what you tell it to.
#
# The -e option gives sed a command or set of commands (separated by semicolons).
# Below, we use two commands:
#
# ${firstline},${lastline}p
# This matches lines firstline to lastline, inclusive
# The command 'p' tells sed to print the line to standard output
#
# ${lastline}q
# This matches line ${lastline}. It tells sed to quit. This command
# is run after the print command, so sed quits after printing the last line.
#
sed -ne "${firstline},${lastline}p;${lastline}q" < "${filename}"
Or, to avoid any external utilities, if you're using a recent version of bash (or zsh):
#!/bin/bash
filename=$1
firstline=$2
lastline=$3
i=0
exec <"${filename}" # redirect file into our stdin
while IFS= read -r ; do # read each line, unmodified, into the REPLY variable
    i=$(( i + 1 )) # maintain line count
    if [ "$i" -ge "${firstline}" ] ; then
        if [ "$i" -gt "${lastline}" ] ; then
            break
        else
            printf '%s\n' "${REPLY}"
        fi
    fi
done
Try this one-liner:
awk -v s="$begin" -v e="$end" 'NR>=s && NR<=e' "$f"
In the line above:
$begin is your $2
$end is your $3
$f is your $1
Save this as "script.sh":
#!/bin/sh
filename="$1"
firstline=$2
lastline=$3
linestoprint=$((lastline - firstline + 1))
tail -n "+$firstline" "$filename" | head -n "$linestoprint"
There is NO ERROR HANDLING (for simplicity), so you have to call your script as follows:
./script.sh yourfile.txt firstline lastline
$ ./script.sh yourfile.txt 5 10
If you need only line "10" from yourfile.txt:
$ ./script.sh yourfile.txt 10 10
Please make sure that:
(firstline > 0) AND (lastline > 0) AND (firstline <= lastline)
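Those preconditions can also be checked by the script itself. The following is a sketch, not part of the answer: the function name print_lines and the error messages are hypothetical, and it uses the single-sed approach shown earlier in this thread for the actual printing.

```shell
#!/bin/bash
# print_lines FILE FIRST LAST
# Hypothetical helper: validates the arguments, then prints lines
# FIRST..LAST of FILE with a single sed call.
print_lines() {
    local file=$1 first=$2 last=$3 n
    if [ ! -r "$file" ]; then
        echo "cannot read file: $file" >&2
        return 1
    fi
    # Both line numbers must consist of digits only.
    for n in "$first" "$last"; do
        case $n in
            ''|*[!0-9]*)
                echo "line numbers must be positive integers" >&2
                return 1 ;;
        esac
    done
    # Enforce (firstline > 0) AND (firstline <= lastline).
    if [ "$first" -lt 1 ] || [ "$first" -gt "$last" ]; then
        echo "need 1 <= firstline <= lastline" >&2
        return 1
    fi
    sed -n "${first},${last}p;${last}q" "$file"
}
```

Ending script.sh with print_lines "$1" "$2" "$3" keeps the same calling convention, and a bad call such as ./script.sh yourfile.txt 10 5 then fails with a message and a non-zero exit status instead of printing unexpected lines.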
