How to extract the lines between patterns? - bash

I have a file with a format like this:
[PATTERN]
line1
line2
line3
.
.
.
line
[PATTERN]
line1
line2
line3
.
.
.
line
[PATTERN]
line1
line2
line3
.
.
.
line
I want to extract the following blocks from above file :
[PATTERN]
line1
line2
line3
.
.
.
line
Note: The number of lines between two [PATTERN] markers may vary, so I can't rely on a fixed line count.
Basically, I want to store each pattern and the lines following it in a database, so I will have to iterate over all such blocks in my file.
How can I do this with shell scripting?

This assumes you are using bash as your shell. For other shells, the actual solution can be different.
Assuming your data is in data:
i=0
while IFS= read -r line; do
    if [ "$line" = "[PATTERN]" ]; then
        # separator found: start the next output file and skip the separator line
        i=$((i + 1)); touch "file.$i"; continue
    fi
    echo "$line" >> "file.$i"
done < data
Replace [PATTERN] with your actual separator pattern.
This will create the files file.1, file.2, etc.
Edit: responding to request about an awk solution:
awk '/^\[PATTERN\]$/{close("file"f);f++;next}{print $0 > ("file" f)}' data
The idea is to open a new file each time the [PATTERN] is found (skipping that line - next command), and writing all successive lines to that file. If you need to include [PATTERN] in your generated files, delete the next command.
Notice the escaping of the [ and ], which have special meaning for regular expressions. If your pattern does not contain those, you do not need the escaping. The ^ and $ are advisable, since they tie your pattern to the beginning and end of line, which you will usually need.
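As a rough illustration of the variant that keeps the [PATTERN] line in each generated file, here is a minimal sketch. The function name split_blocks, the output directory argument, and the block.N file names are assumptions made for this example, not part of the answer above; the output directory is assumed to exist already.

```shell
#!/usr/bin/env bash
# split_blocks INPUT OUTDIR
# Hypothetical helper: writes OUTDIR/block.1, OUTDIR/block.2, ... and keeps
# the [PATTERN] line at the top of each file.
split_blocks() {
    awk -v dir="$2" '
        /^\[PATTERN\]$/ {                 # separator line: switch to the next file
            if (out != "") close(out)     # close the previous file first
            n++
            out = dir "/block." n
        }
        out != "" { print > out }         # lines before the first separator are dropped
    ' "$1"
}
```

Usage would be something like split_blocks data ./blocks, after which each block file can be handed to whatever loads it into the database.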

This can surely be improved, but if you want to store the lines in an array, here is something I did in the past:
#!/bin/bash
file=$1
gp_cnt=-1
i=-1
while IFS= read -r line
do
    # Match pattern
    if [[ "$line" == "[PATTERN]" ]]; then
        gp_cnt=$((gp_cnt + 1))
        # If this is not the first match, process the previous group
        if [[ $gp_cnt -gt 0 ]]; then
            # Process the group
            echo "Processing group #$((gp_cnt - 1))"
            echo "${parsed[*]}"
        fi
        # Start new group
        echo "Pattern #$gp_cnt caught"
        i=0
        unset parsed
        parsed[$i]="$line"
    # Other lines (lines before first pattern are not processed)
    elif [[ $gp_cnt != -1 ]]; then
        i=$((i + 1))
        parsed[$i]="$line"
    fi
done < "$file"
# Process last group
echo "Processing group #$gp_cnt"
echo "${parsed[*]}"
I don't like the processing of the last group out of the loop...
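One way to soften that complaint is to move the per-group work into a function, so the flush after the loop shrinks to a single call. The sketch below assumes this layout; process_group and read_groups are hypothetical names, and process_group only prints where real code might insert into a database.

```shell
#!/usr/bin/env bash
# process_group N LINE...   (hypothetical placeholder for the real work)
process_group() {
    local n=$1
    shift
    printf 'group %d: %s\n' "$n" "$*"
}

# read_groups FILE: call process_group once per [PATTERN] block
read_groups() {
    local line count=0
    local -a group=()
    # "|| [[ -n $line ]]" keeps a final line that lacks a trailing newline
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line == "[PATTERN]" ]]; then
            # a new block starts: hand over the previous one, if any
            if (( count > 0 )); then process_group "$count" "${group[@]}"; fi
            count=$((count + 1))
            group=("$line")
        elif (( count > 0 )); then
            group+=("$line")          # lines before the first pattern are ignored
        fi
    done < "$1"
    # the last block has no following separator, so flush it here
    if (( count > 0 )); then process_group "$count" "${group[@]}"; fi
}
```

The final flush is still there, but the processing logic itself now lives in one place.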

Related

Bash while loop to read line upto ;

I have a text file like
line2
line3;
line4
line5
line6
line7;
I need a loop that reads lines up to a ; on each iteration.
The first iteration should read up to line3;, the second up to line7;, and so on.
There is no need to merge the lines into a single one.
Consider telling read to stop on ; instead of on newlines, and to use newlines to delimit the individual input items (when using -a to read into an array, this makes each line an array element).
When given your input,
while IFS=$'\n' read -r -d ';' -a lines; do
echo "Read group of lines:"
printf ' - %s\n' "${lines[@]};"
done
...emits as output:
Read group of lines:
- line2
- line3;
Read group of lines:
- line4
- line5
- line6
- line7;
You can, if you choose, replace the printf with something like for line in "${lines[@]}"; do to create an inner loop to operate on lines within a group one-by-one.
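The inner-loop variant mentioned above could look roughly like this. It is only a sketch: print_groups is a hypothetical name, and the printf stands in for whatever per-line work is needed. Note that any trailing lines after the last ; are silently dropped by this approach.

```shell
#!/usr/bin/env bash
# print_groups: read ;-terminated groups from stdin and visit each line of a group
print_groups() {
    local group=0 line
    local -a lines
    # -d ';' ends each read at a semicolon; IFS=$'\n' with -a splits the chunk into lines
    while IFS=$'\n' read -r -d ';' -a lines; do
        group=$((group + 1))
        for line in "${lines[@]}"; do
            printf 'group %d: %s\n' "$group" "$line"
        done
    done
}
```

Usage would be print_groups < file.txt.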
You can use two loops: one to continue until the end of the file, and an inner loop to read individual lines until you find one that ends with a ;.
For example,
while :; do
    lines=()
    while IFS= read -r line; do
        lines+=( "$line" )
        if [[ $line = *\; ]]; then
            break
        fi
    done
    if (( ${#lines[@]} == 0 )); then
        # The previous loop didn't add anything to the array,
        # so the last read must have failed, and we've reached
        # the end of the file
        break
    fi
    # do something with "${lines[@]}"
done < file.txt
Or, use one loop that pauses to use lines when one ending with a ; is found:
lines=()
while IFS= read -r line; do
    lines+=("$line")
    if [[ $line = *\; ]]; then
        # do stuff with lines, then clear the array
        lines=()
    fi
done < file.txt
# If applicable, do something with the last batch of lines
# if the file doesn't end with a ;-terminated line.

Copy number of line composed by special character in bash

I have an exercise where I have a file and at the begin of it I have something like
#!usr/bin/bash
# tototata
#tititutu
#ttta
Hello world
Hi
Test test
#zabdazj
#this is it
I have to take the first run of lines starting with a #, up to the first line that does not start with one, and store it in a variable. If there is a shebang, it has to be skipped, and blank lines between the comment lines have to be skipped too. We just want the comments between the shebang and the next non-comment line.
I'm new to bash and I would like to know if there's a way to do this, please?
Expected output:
# tototata
#tititutu
#ttta
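Try it this simple way, which may be easier to understand:
#!/bin/bash
sed 1d your_input_file | while IFS= read -r line
do
    # keep lines starting with # or ; and empty lines
    check=$(printf '%s\n' "$line" | grep '^[#;]')
    if [ -n "$check" ] || [ -z "$line" ]
    then
        echo "$line"
    else
        # first other line: stop reading
        break
    fi
done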
This may be more correct, although your question was unclear about whether the input file had a script shebang, whether the shebang had to be skipped to match your sample output, or whether the input file's shebang was just bogus.
It is also unclear what to do if the first lines of the input file do not start with #.
You should really post your assignment's text as a reference.
Anyway, here is a script that collects the first set of consecutive lines starting with a hash sign (#) into the arr array variable.
It may not be an exact solution to your assignment (which you should be able to solve with what your previous lessons taught you), but it will give you some clues about how to iterate over the lines of a file and test whether a line starts with a #.
#!/usr/bin/env bash
# Our variable to store parsed lines:
# an array of strings with one entry per line
declare -a arr=()
# Iterate over the lines of the file
# while they match the regex ^[#],
# i.e. while lines start with a hash sign (#)
while IFS=$'\n' read -r line && [[ "$line" =~ ^[#] ]]; do
    # Add line to the arr array variable
    arr+=("$line")
done <a.txt
# Print each array entry followed by a newline
printf '%s\n' "${arr[@]}"
How about this (not tested, so you may have to debug it a bit, but my comments in the code should explain what is going on):
while IFS= read -r line
do
    # initial is 1 on the first line, and 0 after this. When the script starts,
    # the variable is undefined.
    : ${initial:=1}
    # Test for lines starting with #. Need to quote the hash
    # so that it is not taken as comment.
    if [[ $line == '#'* ]]
    then
        # Test for initial #!
        if (( initial == 1 )) && [[ $line == '#!'* ]]
        then
            : # ignore it
        else
            echo "$line" # or do whatever you want to do with it
        fi
    # stop on non-blank, non-comment line
    elif [[ $line == *[![:space:]]* ]]
    then
        break
    fi
    initial=0 # Next line won't be an initial line
done < your_file
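Since the exercise asks for the comment lines to end up in a variable, the loop can be wrapped in a function whose output is captured with command substitution. This is only a sketch under the reading of the assignment used above (skip a shebang on the first line, skip blank lines, stop at the first other line); the name leading_comments is made up for the example.

```shell
#!/usr/bin/env bash
# leading_comments FILE: print the first block of comment lines
leading_comments() {
    local line first=1
    while IFS= read -r line || [[ -n $line ]]; do
        if (( first )) && [[ $line == '#!'* ]]; then
            first=0
            continue                      # shebang on the very first line: skip it
        fi
        first=0
        if [[ $line == '#'* ]]; then
            printf '%s\n' "$line"         # comment line: keep it
        elif [[ $line == *[![:space:]]* ]]; then
            break                         # first non-blank, non-comment line ends the block
        fi
    done < "$1"
}
```

It could then be used as header=$(leading_comments your_file), which leaves the collected lines, separated by newlines, in the header variable.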

How can I append a string to a line when certain conditions are met?

I'm handling large .txt files and we are trying to identify which ones do not comply with the correct number of characters per line (at most 80 characters).
For the sake of this example, let's say that we need 10 characters on every line. I need to append "(+number of extra characters)" or "(-number of missing characters)" to each line that does not have exactly 10 characters.
Here is what I have so far:
while IFS='' read -r line || [[ -n "$line" ]]; do
if [[ "${#line}" -gt 10 ]]; then
echo "Mo dan 10 D: ${#line}"
elif [[ "${#line}" -lt 10 ]]; then
echo "Less dan 10 D: ${#line}"
fi
done < "$1"
I'm stuck on finding a way to append the two strings I'm echoing to the corresponding line so we can identify them.
I have looked into awk and sed but haven't been able to loop through the entire .txt file, count the characters in every line, and append a string with the appropriate message.
I would appreciate some assistance, either as a shell script or as an awk or sed solution.
Thank You.
Edit: This is an example input file (note white spaces also count as characters)
Line 1****
Line 2*****
Line 3*
Line 4****
Line 5****
Line 6**
Line 7****
Line 8********
Line 9****
This is the desired output
Line 1****
Line 2*****(+1)
Line 3*(-3)
Line 4****
Line 5****
Line 6**(-2)
Line 7****
Line 8********(+4)
Line 9****
For performance reasons, using a shell loop to process the lines of a file is the wrong approach (unless the file is very small).
A text-processing utility such as awk is the much better choice:
awk -v targetLen=10 '
diff = length($0) - targetLen { # input line ($0) does not have the expected length
$0 = $0 "(" (diff > 0 ? "+" : "") diff ")" # append diff (with +, if positive)
}
1 # Print the (possibly modified) line.
' <<'EOF' # sample input as a here-document
1234567890
123456789
123456789012
EOF
This yields:
1234567890
123456789(-1)
123456789012(+2)
Caveat: The BSD/macOS awk implementation is not locale-aware, so its length function counts bytes, which will only work as intended with ASCII-range characters.
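If the goal is only to find the non-compliant lines in a large file rather than rewrite it, the same length test can report line numbers instead. The following is a sketch, not part of the answer above; report_bad_lines is a hypothetical name and the "line:difference" output format is an arbitrary choice. The same byte-versus-character caveat applies.

```shell
#!/usr/bin/env bash
# report_bad_lines TARGET FILE
# Print "line number:signed difference" for every line whose length is not TARGET.
report_bad_lines() {
    awk -v targetLen="$1" '
        length($0) != targetLen {
            printf "%d:%+d\n", NR, length($0) - targetLen
        }
    ' "$2"
}
```

For example, report_bad_lines 10 lines.in would list only the lines that are too long or too short.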
$ cat lines.in
Line 1****
Line 2*****
Line 3*
Line 4****
Line 5****
Line 6**
Line 7****
Line 8********
Line 9****
$ cat lines.sh
#!/bin/bash
mark=10
while IFS='' read -r line || [[ -n "$line" ]]; do
diff=$(( ${#line} - mark ))
if [ ${diff} -eq 0 ]; then
echo "${line}"
else
printf "%s (%+d)\n" "${line}" "${diff}"
fi
done < "$1"
$ ./lines.sh lines.in
Line 1****
Line 2***** (+1)
Line 3* (-3)
Line 4****
Line 5****
Line 6** (-2)
Line 7****
Line 8******** (+4)
Line 9****
I based my answer on your original script
#!/bin/bash
while IFS='' read -r line || [[ -n "$line" ]]; do
nchars=${#line}
target=10
if [[ $nchars -gt $target ]]; then
echo "${line}(+$((nchars-target)))"
elif [[ $nchars -lt $target ]]; then
echo "${line}(-$((target-nchars)))"
else
echo "$line"
fi
done < "$1"
Simply use it like this: bash evalscript inputfile > outputfile

Running math, ignoring non-numeric values

I am trying to do some math on the second column of a text file, but some lines are not numeric. I only want to operate on the lines that contain numbers and keep the other lines unchanged.
The text file looks like this:
aaaaa
1 2
3 4
How can I do this?
Doubling the second column in any line that doesn't contain any alphabetic content might look a bit like the following in native bash:
#!/bin/bash
# iterate over lines in input file
while IFS= read -r line; do
if [[ $line = *[[:alpha:]]* ]]; then
# line contains letters; emit unmodified
printf '%s\n' "$line"
else
# break into a variable for the first word, one for the second, one for the rest
read -r first second rest <<<"$line"
if [[ $second ]]; then
# we extracted a second word: emit it, doubled, between the first word and the rest
printf '%s\n' "$first $(( second * 2 )) $rest"
else
# no second word: just emit the whole line unmodified
printf '%s\n' "$line"
fi
fi
done
This reads from stdin and writes to stdout, so usage is something like:
./yourscript <infile >outfile
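A slightly stricter variant tests the second column itself instead of looking for letters anywhere in the line. This is a sketch with assumptions: double_second is a made-up name, only non-negative integers count as numeric, and the 10# prefix is there so that values with leading zeros are not read as octal. Modified lines are re-joined with single spaces, so their original spacing is not preserved.

```shell
#!/usr/bin/env bash
# double_second: read lines from stdin and double the second column when it is an integer
double_second() {
    local line first second rest
    while IFS= read -r line || [[ -n $line ]]; do
        read -r first second rest <<<"$line"
        if [[ $second =~ ^[0-9]+$ ]]; then
            # numeric second column: emit it doubled, followed by the rest (if any)
            printf '%s %d%s\n' "$first" "$(( 10#$second * 2 ))" "${rest:+ $rest}"
        else
            # anything else: emit the line unmodified
            printf '%s\n' "$line"
        fi
    done
}
```

Usage follows the same pattern: double_second <infile >outfile after sourcing the function.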
Thanks, all. This is only my second time using this website, and I find it very helpful that answers arrive so quickly.
I also found the answer below:
#!/bin/bash
FILE=$1
while read -r f1 f2; do
    # f1 consists only of digits: do the math on the second column
    if [[ $f1 != *[!0-9]* ]]; then
        f2=$(echo "$f2 - 1" | bc)
        echo "$f1 $f2"
    else
        echo "$f1 $f2"
    fi
done < "$FILE"

Shell Script: how to read a text file that does not end with a newline on Windows

The following program reads a file and is intended to store all the values (one per line) in a variable, but it doesn't store the last line. Why?
file.txt :
1
2
.
.
.
n
Code :
FileName=file.txt
if test -f $FileName # Check if the file exists
then
while read -r line
do
fileNamesListStr="$fileNamesListStr $line"
done < $FileName
fi
echo "$fileNamesListStr" # prints 1 2 3 ..... n-1 (but it should print up to n)
Instead of reading line-by-line, why not read the whole file at once?
[ -f "$FileName" ] && fileNamesListStr=$( tr '\n' ' ' < "$FileName" )
One probable cause is that a newline is missing after the last line, n.
Use the following command to check it:
tail -1 file.txt
And the following fixes it:
echo >> file.txt
If you really need to keep the last line without a newline, here is a reorganized while loop:
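#!/bin/bash
FileName=file.txt
if test -f "$FileName" ; then
    while true ; do
        # read fails at end of file but still fills $line with a final partial line
        read -r line
        # note: this also stops at the first empty line in the file
        if [ -z "$line" ] ; then
            break
        fi
        fileNamesListStr="$fileNamesListStr $line"
    done < "$FileName"
fi
echo "$fileNamesListStr"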
The issue is that when the file does not end in a newline, read returns non-zero and the loop does not proceed. The read command will still read the data, but it will not process the loop. This means that you need to do further processing outside of the loop. You also probably want an array instead of a space separated string.
FileName=file.txt
if test -f $FileName # Check if the file exists
then
while read -r line
do
fileNamesListArr+=("$line")
done < $FileName
[[ -n $line ]] && fileNamesListArr+=("$line")
fi
echo "${fileNamesListArr[@]}"
See the "My text files are broken! They lack their final newlines!" section of this article:
http://mywiki.wooledge.org/BashFAQ/001
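The usual idiom for this situation folds the "did read still get data?" check into the loop condition, so no extra processing is needed after the loop. Below is a minimal, self-contained sketch; the temporary file and the values array are stand-ins invented for the demonstration.

```shell
#!/usr/bin/env bash
tmpfile=$(mktemp)
printf '1\n2\n3' > "$tmpfile"     # sample file WITHOUT a trailing newline

values=()
# read returns non-zero on the last, unterminated line but still fills $line;
# "|| [[ -n $line ]]" lets the loop body run one more time for it
while IFS= read -r line || [[ -n $line ]]; do
    values+=("$line")
done < "$tmpfile"

echo "${#values[@]} lines, last: ${values[${#values[@]}-1]}"   # prints: 3 lines, last: 3
rm -f "$tmpfile"
```

This leaves the input file untouched, which matters when the file is read-only or shared.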
As a workaround, a newline can be appended to the file before reading from it.
echo >> "$file_path"
This ensures that all the lines that were previously in the file will be read. The file can then be read line by line.
