stopping 'sed' after match found on a line; don't let sed keep checking all lines to EOF - bash

I have a text file in which, on each line, a first block of text is separated by a tab from a second block of text, like so:
VERBS, AUXILIARY. "Be," subjunctive and quasi-subjunctive Be, Beest, &c., was used in A.-S. (beon) generally in a future sense.
In case it is hard to tell, the tab is the long space between "quasi-subjunctive" and "Be".
So, off the top of my head, I am thinking of a 'for' loop in which a var is set using 'sed' to read the first block of text of a line, up to and including the tab (or not, it doesn't really matter), and then the var is used to find subsequent matches, adding a "(x)" right before the tab to make the line unique. The 'x' would be a running counter, numbering the first instance '1' and each subsequent match one number higher.
One problem I see is stopping 'sed' after each match so the counter can be incremented. Is there a way to do this, since sed's normal behaviour (as far as I know) is to continue through without stopping until all lines are processed?
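For reference, sed does have ways to stop early. A minimal sketch, assuming GNU sed for the second function: the q command makes sed quit after the current line, and GNU's 0,/regex/ address limits an edit to the lines up to and including the first match. The function names first_match and replace_first are hypothetical, and the pattern is pasted into the sed script unescaped, so it must not contain /.

```shell
#!/usr/bin/env bash

# first_match PATTERN FILE (hypothetical helper)
# Print the first line of FILE matching PATTERN, then quit with q so sed
# stops reading instead of scanning on to EOF. PATTERN must not contain '/'.
first_match() {
    sed -n "/$1/{p;q;}" "$2"
}

# replace_first PATTERN REPLACEMENT FILE (hypothetical helper, GNU sed only)
# The 0,/re/ range ends at the first line matching PATTERN, so only that
# line gets the substitution; every later line passes through unchanged.
replace_first() {
    sed "0,/$1/s/$1/$2/" "$3"
}
```

Note that neither function lets you "pause" sed and resume it with a new counter value; for per-match counting, a loop in the shell (or a single pass that keeps its own counts) is the more natural fit.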

You can set IFS to the TAB character and read each line into variables. Something like:
$ while IFS=$'\t' read -r block1 block2; do
echo "block1 is $block1"
echo "block2 is $block2"
done < file
block1 is VERBS, AUXILIARY. "Be," subjunctive and quasi-subjunctive
block2 is Be, Beest, &c., was used in A.-S. (beon) generally in a future sense.

OK, so I got the job done with this little (or perhaps big, if it's overkill?) script I whipped up:
#!/bin/bash
sedLnCnt=1
while [[ "$sedLnCnt" -lt 521 ]] ; do
lN=$(sed -n "${sedLnCnt} p" sGNoSecNums.html|sed -r 's/^([^\t]*\t).*$/\1/') #; echo "\$lN: $lN"
lnNum=($(grep -nF "$lN" sGNoSecNums.html|sed -r 's/^([0-9]+):.*$/\1/')) #; echo "num of matches: ${#lnNum[@]}"
if [[ "${#lnNum[@]}" -gt 1 ]] ; then #'if'
lCnt="${#lnNum[@]}"
((eleN = $lCnt-1)) #; echo "\$eleN: ${eleN}" # $eleN is the last index of the zero-based array, i.e. the match count minus 1
while [[ "$lCnt" -gt 0 ]] ; do
sed -ri "${lnNum[$eleN]}s/^([^\t]*)\t/\1 \(${lCnt}\)\t/" sGNoSecNums.html
((lCnt--))
((eleN--))
done
fi
((sedLnCnt++))
done
Grep was the perfect way to find the line numbers of the matches: they go into an array, and then each of those lines is edited to append the unique identifier.
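The same job can also be done in one bash process, without re-running sed and grep for every line. A minimal sketch, assuming bash 4+ (for associative arrays), that the first block is everything before the first tab, and a hypothetical function name number_duplicate_keys; it writes to stdout instead of editing in place.

```shell
#!/usr/bin/env bash

# number_duplicate_keys FILE (hypothetical helper, bash 4+)
# Pass 1 counts how often each first block (text before the first tab) occurs.
# Pass 2 appends " (n)" before the tab of every block that occurs more than
# once, numbering the first instance 1, the next 2, and so on.
number_duplicate_keys() {
    local -A total=() seen=()
    local -a lines=()
    local line key
    # Pass 1: remember every line and count each first block.
    while IFS= read -r line || [[ -n $line ]]; do
        lines+=("$line")
        [[ $line == *$'\t'* ]] || continue
        key=${line%%$'\t'*}
        [[ -n $key ]] && total[$key]=$(( ${total[$key]:-0} + 1 ))
    done < "$1"
    # Pass 2: rewrite duplicated blocks, leave everything else untouched.
    for line in "${lines[@]}"; do
        key=${line%%$'\t'*}
        if [[ $line == *$'\t'* && -n $key && ${total[$key]} -gt 1 ]]; then
            seen[$key]=$(( ${seen[$key]:-0} + 1 ))
            line="$key (${seen[$key]})"$'\t'"${line#*$'\t'}"
        fi
        printf '%s\n' "$line"
    done
}
```

To update the file itself, redirect into a temporary file and move it over the original, e.g. number_duplicate_keys sGNoSecNums.html > tmp && mv tmp sGNoSecNums.html.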

Related

Using sed in order to change a specific character in a specific line

I'm a beginner in bash and here is my problem. I have a file just like this one:
Azzzezzzezzzezzz...
Bzzzezzzezzzezzz...
Czzzezzzezzzezzz...
I am trying to edit this file in a script. The letters A, B and C are unique in the whole file, and there is only one per line.
I want to replace the first e of each line with a number, which can be:
1 in line beginning with an A,
2 in line beginning with a B,
3 in line beginning with a C,
and I'd like to loop this in order to have this type of result
Azzz1zzz5zzz1zzz...
Bzzz2zzz4zzz5zzz...
Czzz3zzz6zzz3zzz...
All the numbers here are random integers between 0 and 9. I really need to start by replacing with 1, 2, 3 in the first iteration of my loop, then 5, 4, 6, then 1, 5, 3 and so on.
I tried this
sed "0,/e/s/e/$1/;0,/e/s/e/$2/;0,/e/s/e/$3/" /tmp/myfile
But the result was this (because I didn't specify the line)
Azzz1zzz2zzz3zzz...
Bzzzezzzezzzezzz...
Czzzezzzezzzezzz...
I noticed that sed -i "/A/ s/$/ezzz/" /tmp/myfile adds ezzz at the end of the A line, so I tried this
sed -i "/A/ 0,/e/s/e/$1/;/B/ 0,/e/s/e/$2/;/C/ 0,/e/s/e/$3/" /tmp/myfile
but it failed
sed: -e expression #1, char 5: unknown command: `0'
Here I'm lost.
I have in a variable (let's call it number_of_e_per_line) the number of e in either A, B or C line.
Thank you for the time you take for me.
Just apply the s command only to the line that matches A, and use double quotes so that the shell expands $1, $2 and $3:
sed "
/^A/{ s/e/$1/; }
/^B/{ s/e/$2/; }
# or shorter
/^C/s/e/$3/
" /tmp/myfile
The s command replaces the first occurrence by default. You can, for example, use s/e/$1/2 to replace the second occurrence, while s/e/$1/g (g as in "global") replaces all occurrences.
0,/e/ specifies a range of lines (a GNU sed extension): it selects lines from the first one up to and including the first line that matches /e/.
sed is not part of Bash. It is a separate (crude) programming language and a very standard command. See https://www.grymoire.com/Unix/Sed.html.
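Putting those points together for the loop described in the question: a sketch assuming GNU sed (for -i without a backup suffix) and hypothetical function names replace_next_e and fill_random. Each call replaces the first remaining e on the A, B and C lines, so repeated calls fill the lines left to right.

```shell
#!/usr/bin/env bash

# replace_next_e FILE NA NB NC (hypothetical helper, GNU sed -i)
# On the line starting with A, replace the first remaining 'e' with NA;
# likewise NB on the B line and NC on the C line. Double quotes let the
# shell insert the numbers into the sed script.
replace_next_e() {
    sed -i "/^A/s/e/$2/; /^B/s/e/$3/; /^C/s/e/$4/" "$1"
}

# fill_random FILE COUNT (hypothetical helper)
# Call replace_next_e COUNT times with three random digits 0-9 each time,
# e.g. COUNT=$number_of_e_per_line to replace every 'e'.
fill_random() {
    local file=$1 count=$2 i
    for (( i = 0; i < count; i++ )); do
        replace_next_e "$file" $(( RANDOM % 10 )) $(( RANDOM % 10 )) $(( RANDOM % 10 ))
    done
}
```

Calling replace_next_e /tmp/myfile 1 2 3 and then replace_next_e /tmp/myfile 5 4 6 gives the Azzz1zzz5zzz... / Bzzz2zzz4zzz... / Czzz3zzz6zzz... pattern from the question.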
Continuing from the comment. sed is a poor choice here unless all your files can only have 3 lines. The reason is sed processes each line and has no way to keep a separate count for the occurrences of 'e'.
Instead, wrapping sed in a script and keeping track of the replacements allows you to handle any file no matter the number of lines. You just loop and handle the lines one at a time, e.g.
#!/bin/bash
[ -z "$1" ] && { ## validate one argument for filename provided
printf "error: filename argument required.\nusage: %s filename\n" "$0" >&2
exit 1
}
[ -s "$1" ] || { ## validate file exists and non-empty
printf "error: file not found or empty '%s'.\n" "$1" >&2
exit 1
}
declare -i n=1 ## occurrence counter initialized 1
## loop reading each line
while IFS= read -r line || [ -n "$line" ]; do
[[ $line == *e* ]] || { printf '%s\n' "$line"; continue; } ## line has 'e', or print it unchanged and get next
sed "s/e/1/$n" <<< "$line" ## substitute the n-th occurrence of 'e'
((n++)) ## increment counter
done < "$1"
Your data file having "..." at the end of each line suggests your file is larger than the snippet posted. If you have lines beginning with 'A' through 'Z', you don't want to write 26 separate /match/s/find/replace/ substitutions. And if you have somewhere between 3 and 26 (or more), you don't want to rewrite a different sed expression for every new file you are faced with.
That's why I say sed is a poor choice: you really have no way to make the task generic with sed. The downside to using a script is that it becomes a poor choice as the number of records you need to process increases (beyond 100000 or so, purely for efficiency reasons).
Example Use/Output
With the script in replace-e-incremental.sh and your data in file, you would do:
$ bash replace-e-incremental.sh file
Azzz1zzzezzzezzz...
Bzzzezzz1zzzezzz...
Czzzezzzezzz1zzz...
To Modify file In-Place
Since you make multiple calls to sed here, you need to redirect the script's output to a temporary file and then replace the original by overwriting it with the temp file, e.g.
$ bash replace-e-incremental.sh file > mytempfile && mv -f mytempfile file
$ cat file
Azzz1zzzezzzezzz...
Bzzzezzz1zzzezzz...
Czzzezzzezzz1zzz...

Process contents in array based on type in shellscript

I have an array that has three types of data in it: integer, integer/integer, and string values.
I have shown a sample below.
myarr = (2301/2320,Team Lifeline, 2311, 7650/7670, 232)
I have the following algorithm that I want to come up with.
For index in myarr
if index contains data as number1/number2; then
create an array, "mynumbers" to hold all the numbers starting from number1 to number2
else if index is a string
add it in "mystrarr"
else
add it in "myintarr"
done
For the first case, if I have an entry in myarr such as 2301/2320,
then mynumbers, as shown in the pseudocode, will have the entries {2301, 2302, ... , 2320}. I don't understand how to parse an entry in myarr and detect that it contains a /.
For the second case, I am also not sure how to tell that an entry in myarr is a string. mystrarr should have {Team Lifeline}.
For the final case, the myintarr should have {2311, 232}.
Any help would be appreciated. I am very new to shell script.
Stack Overflow is not a coding service... but I was bored, so here you go...
#!/bin/bash
myarr=(2301/2320 'Team Lifeline' 2311 7650/7670 232)
for element in "${myarr[@]}"; do
if [[ $element =~ ^[0-9]+/[0-9]+$ ]]; then
range="{${element%/*}..${element##*/}}"
mynumbers=( $(eval "echo $range") )
elif [ $element -eq $element ] 2>> /dev/null; then
intarr+=( $element )
else
strarr+=( "$element" )
fi
done
echo "mynumbers = ${mynumbers[*]}"
echo "intarr = ${intarr[*]}"
echo "strarr = ${strarr[*]}"
There's a lot to unpack here for someone inexperienced, so ask about anything I didn't cover. Things to note:
In all assignments, there are no spaces around =.
Array assignments are of the format ( element1 element2 ... )
Appending to arrays with +=(...) format
Looping through array elements: for element in "${myarr[@]}"
Note that the array generated by 7650/7670 will overwrite the array generated by 2301/2320. I assume you have some kind of plan for this array, so I didn't do anything to stop it from being overwritten.
More details
This line is validating the format for 111/222:
if [[ $element =~ ^[0-9]+/[0-9]+$ ]]; then
[[ x =~ x ]] performs a regex comparison and this regex essentially just means:
^ - beginning of the string
[0-9]+ - at least one digit
/ - character literal
$ - end of string
These lines are expanding your beginning and ending numbers:
range="{${element%/*}..${element##*/}}"
mynumbers=( $(eval "echo $range") )
This is maybe more complicated than it needs to be, as most people try to avoid eval in general for security reasons. I'm leveraging bash's brace expansion: if you run echo {5..9}, it outputs 5 6 7 8 9. Brace expansion happens before variable expansion, so it does not work with variables directly ({$a..$b} is left as is), which is why I cheated and used eval.
This line is checking if we are dealing with an integer:
[ $element -eq $element ] 2>> /dev/null
This works by running an integer -eq (equals) comparison of the variable against itself, which fails with an error message for anything but an integer. That is not how the test was designed to be used, which is why we discard the error messages (2>> /dev/null).
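If you would rather avoid eval entirely, an arithmetic for loop can expand the range instead. A minimal sketch with a hypothetical helper name expand_range; it assumes the element was already validated against ^[0-9]+/[0-9]+$, and unlike the script above it appends every range to mynumbers instead of overwriting it.

```shell
#!/usr/bin/env bash

# expand_range N/M (hypothetical helper)
# Print every integer from N to M, one per line. 10# forces base 10 so a
# leading zero (e.g. 08) is not read as octal.
expand_range() {
    local first=${1%/*} last=${1#*/} i
    for (( i = 10#$first; i <= 10#$last; i++ )); do
        printf '%s\n' "$i"
    done
}

myarr=(2301/2320 'Team Lifeline' 2311 7650/7670 232)
mynumbers=()
for element in "${myarr[@]}"; do
    if [[ $element =~ ^[0-9]+/[0-9]+$ ]]; then
        # Read the expanded numbers into a temporary array, then append them.
        mapfile -t range < <(expand_range "$element")
        mynumbers+=("${range[@]}")
    fi
done
```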
This is a nice succinct script, but is using some unconventional practices. A longer more verbose version may be better for a beginner.
You can use regular expressions to match elements that are nothing but digits, or digits/digits, and assume everything else is a string:
#!/bin/bash
myarr=(2301/2320 "Team Lifeline" 2311 7650/7670 232)
declare -a mynumbers mystrarr myintarr
for elem in "${myarr[@]}"; do
if [[ $elem =~ ^([0-9]+)/([0-9]+)$ ]]; then
mynumbers+=($(seq ${BASH_REMATCH[1]} ${BASH_REMATCH[2]}))
elif [[ $elem =~ ^[0-9]+$ ]]; then
myintarr+=($elem)
else
mystrarr+=("$elem")
fi
done
echo mynumbers is "${mynumbers[@]}"
echo myintarr is "${myintarr[@]}"
echo mystrarr is "${mystrarr[*]}"
Jason explained a lot in his (very similar; there are only so many obvious ways to do this) answer, so to expand on where ours differ:
We both use regular expressions to match the integer/integer case, but he then goes on to extract the two numbers using parameter expansion with pattern removal options, while mine captures the two integers in the regular expression, and uses the BASH_REMATCH array to access their values as well as the seq command to generate the numbers between the two.

How to split a string by a defined string with multiple characters in bash?

The following output, consisting of several devices, needs to be parsed:
0 interface=ether1 address=172.16.127.2 address4=172.16.127.2
address6=fe80::ce2d:e0ff:fe00:05 mac-address=CC:2D:E0:00:00:08
identity="myrouter1" platform="MikroTik" version="6.43.8 (stable)"
1 interface=ether2 address=10.5.44.100 address4=10.5.44.100
address6=fe80::ce2d:e0ff:fe00:07 mac-address=CC:2D:E0:00:00:05
identity="myrouter4" platform="MikroTik" version="6.43.8 (stable)"
3 interface=ether4 address=fe80::ba69:f4ff:fe00:0017
address6=fe80::ba69:f4ff:fe00:0017 mac-address=B8:69:F4:00:00:07
identity="myrouter2" platform="MikroTik" version="6.43.8 (stable)"
...
10 interface=ether5 address=10.26.51.24 address4=10.26.51.24
address6=fe80::ba69:f4ff:fe00:0039 mac-address=B8:69:F4:00:00:04
identity="myrouter3" platform="MikroTik" version="6.43.8 (stable)"
11 interface=ether3 address=10.26.51.100 address4=10.26.51.100
address6=fe80::ce2d:e0ff:fe00:f00 mac-address=CC:2D:E0:00:00:09
identity="myrouter5" platform="MikroTik" version="6.43.8 (stable)"
Edit: for simplicity I shortened and anonymized the output. The first block has 7 lines, the second 5, the third 7 and the fourth 4, so the number of lines per block is inconsistent.
Basically, it's the output from a Mikrotik device: "/ip neighbor print detail"
Ideally I would access every device (= number) on its own, then access each setting=value pair of that device separately, to finally get at settings like $device[0][identity] or similar.
I tried to set IFS='\d{1,2} ', but it seems IFS only works for single-character separators.
Looking on the web I didn't find a way to accomplish this, am I looking for the wrong way and there is another way to solve this?
Thanks in advance!
edit: Found this solution Split file by multiple line breaks which helped me to get:
devices=()
COUNT=0;
while read LINE
do
[ "$LINE" ] && devices[$COUNT]+="$LINE " || { (( ++COUNT )); }
done < devices.txt
Then I could use @Kamil's solution to easily access the values.
While your precise output format is a bit unclear, bash offers an efficient way to parse the data using process substitution. Where command substitution captures a command's output as a string, process substitution exposes it as a file (a path such as /dev/fd/63), which you can redirect into a loop's stdin with done < <(...). This lets you read, in the current shell, the result of a set of commands that reformat your mikrotik file into a single line per device.
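To see why reading from < <(...) matters, here is a small sketch (hypothetical function names, assuming default bash settings with the lastpipe option off): each part of a pipeline runs in a subshell, so a counter updated inside a piped while loop is lost, whereas with process substitution the loop runs in the current shell.

```shell
#!/usr/bin/env bash

# Count input lines with a pipe: the while loop runs in a subshell,
# so its increments never reach the function's own $n.
count_with_pipe() {
    local n=0 line
    printf 'a\nb\n' | while IFS= read -r line; do n=$(( n + 1 )); done
    echo "$n"
}

# Same loop fed by process substitution: it runs in the current shell,
# so $n keeps its value after the loop.
count_with_procsub() {
    local n=0 line
    while IFS= read -r line; do n=$(( n + 1 )); done < <(printf 'a\nb\n')
    echo "$n"
}
```

The first function prints 0 and the second prints 2, which is why the script below collects id inside a loop fed by < <(...).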
While there are a number of ways to do it, one way to handle the gymnastics needed to turn each device's multi-line information into a single line is with tr and sed: first tr replaces each '\n' with an '_' (or your favorite character not used elsewhere), then a second tr "squeezes" runs of spaces into a single space (technically not required, but included for completeness). After that, two sed expressions change the "__" (resulting from a blank line) back into a '\n' and then remove all remaining '_'.
With that, you can read the device number n and the remainder of the line holding your setting=value pairs. To locate the "identity=" setting, convert the line into an array and loop over it, using parameter expansions (substring removal) to save the "identity" value as id (trimming the double quotes is left to you).
Now it is simply a matter of outputting the values (or doing whatever you wish with them). While you can loop again and output the array values, it is just as easy to pass the intentionally unquoted line to printf and let printf's reuse of its format string print each setting=value pair on its own line. Lastly, you form your $device[0][identity] identifier and output it as the final line of the device block.
Putting it all together, you could do something like the following:
#!/bin/bash
id=
while read n line; do ## read each line from process substitution
a=( $line ) ## split line into array
for i in ${a[@]}; do ## search array, set id
[ "${i%=*}" = "identity" ] && id="${i##*=}"
done
echo "device=$n" ## output device=
printf " %s\n" ${line[@]} ## output setting=value (unquoted on purpose)
printf " \$device[%s][%s]\n" "$n" "$id" ## $device[0][identity]
done < <(tr '\n' '_' < "$1" | tr -s ' ' | sed -e 's/__/\n/g' -e 's/_//g')
Example Use/Output
Note, the script takes the filename to parse as the first input.
$ bash mikrotik_parse.sh mikrotik
device=0
interface=ether1
address=172.16.127.2
address4=172.16.127.2
address6=fe80::ce2d:e0ff:fe00:05
mac-address=CC:2D:E0:00:00:08
identity="myrouter1"
platform="MikroTik"
version="6.43.8
(stable)"
$device[0]["myrouter1"]
device=1
interface=ether2
address=10.5.44.100
address4=10.5.44.100
address6=fe80::ce2d:e0ff:fe00:07
mac-address=CC:2D:E0:00:00:05
identity="myrouter4"
platform="MikroTik"
version="6.43.8
(stable)"
$device[1]["myrouter4"]
device=3
interface=ether4
address=fe80::ba69:f4ff:fe00:0017
address6=fe80::ba69:f4ff:fe00:0017
mac-address=B8:69:F4:00:00:07
identity="myrouter2"
platform="MikroTik"
version="6.43.8
(stable)"
$device[3]["myrouter2"]
Look things over and let me know if you have further questions. As mentioned at the beginning, you haven't defined an explicit output format you are looking for, but gleaning what information was in the question, this should be close.
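If the goal really is to look values up as $device[0][identity], note that bash has no two-dimensional arrays; a common workaround is a single associative array keyed by "number,setting". A minimal sketch, assuming bash 4+, that each device block starts with its number, that values are non-empty, and hypothetical names device, store_pairs and parse_devices. Quoted values such as version="6.43.8 (stable)" are kept whole and their quotes stripped.

```shell
#!/usr/bin/env bash

declare -A device=()   # emulates device[N][setting] as device["N,setting"]

# store_pairs N TEXT (hypothetical helper)
# Store every setting=value pair found in TEXT under the key "N,setting".
store_pairs() {
    local n=$1 rest=$2 value
    # name=  then either "quoted value" or an unquoted run of non-spaces,
    # followed by whitespace or the end of the text.
    local re='^[[:space:]]*([^=[:space:]]+)=("[^"]*"|[^"[:space:]][^[:space:]]*)([[:space:]].*)?$'
    while [[ $rest =~ $re ]]; do
        value=${BASH_REMATCH[2]}
        value=${value#\"}; value=${value%\"}   # strip surrounding quotes
        device["$n,${BASH_REMATCH[1]}"]=$value
        rest=${BASH_REMATCH[3]}
    done
}

# parse_devices FILE (hypothetical helper)
# A line starting with a number begins a new device; other lines continue it.
parse_devices() {
    local line n=
    while IFS= read -r line || [[ -n $line ]]; do
        if [[ $line =~ ^[[:space:]]*([0-9]+)[[:space:]]+(.*)$ ]]; then
            n=${BASH_REMATCH[1]}
            store_pairs "$n" "${BASH_REMATCH[2]}"
        elif [[ -n $n ]]; then
            store_pairs "$n" "$line"
        fi
    done < "$1"
}
```

After parse_devices mikrotik, "${device[0,identity]}" holds myrouter1 and "${device[1,mac-address]}" holds that device's MAC address.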
I think you're on the right track with IFS.
Try piping IFS=$'\n\n' (to break apart the line groups by interface) through cut (to extract the specific field(s) you want for each interface).
Bash likes single long rows with delimiter-separated values, so first we need to convert your file to that format.
Below I read 4 lines at a time from the input. I noticed that each entry spans only 4 lines, so I just concatenate the 4 lines and act as if they were a single line.
while
IFS= read -r line1 &&
IFS= read -r line2 &&
IFS= read -r line3 &&
IFS= read -r line4 &&
line="$line1 $line2 $line3 $line4"
do
if [ -n "$line4" ]; then
echo "ERR: 4th line should be empty - $line4 !" >&2
exit 4
fi
if ! num=$(printf "%d" ${line:0:3}); then
echo "ERR: reading number" >&2
exit 1
fi
line=${line:3}
# bash variables can't have `-`
line=${line/mac-address=/mac_address=}
# unsafe magic
vars=(interface address address4
address6 mac_address identity platform version)
for v in "${vars[@]}"; do
unset "$v"
if ! <<<"$line" grep -q "$v="; then
echo "ERR: line does not have $v= part!" >&2
exit 1
fi
done
# eval call
if ! eval "$line"; then
echo "ERR: eval line=$line" >&2
exit 1
fi
for v in "${vars[@]}"; do
if [ -z "${!v}" ]; then
echo "ERR: variable $v was not set in eval!" >&2
exit 1;
fi
done
echo "$num: $interface $address $address4 $address6 $mac_address $identity $platform $version"
done < file
First I retrieve the leading number from the line; I suspect it was printed with printf "%3d", so I just slice the line with ${line:0:3}.
For the rest of the line I intend to use eval. In this case I trust upstream, but I try to assert some cases (a variable not defined in the line, a syntax error and similar).
Then the magic eval "$line" happens, which assigns all the variables in my shell.
After that I can use the variables from the line like normal variables.
A live example was posted at tutorialspoint.
See also: Eval command and security issues.

Access next item in for loop bash

I am trying to loop through my file and grab the lines in groups of 2. Every data entry in the file contains a header line and then the following line has the data.
I am trying to: Loop through the file, grab every two lines and manipulate them. My current problem is that I am trying to echo the next line in the loop. So every time I hit a header row, it will print the data line (next line) with it.
out="$(cat $1)" #file
file=${out}
iter=0
for line in $file;
do
if [ $((iter%2)) -eq 0 ];
then
#this will be true when it hits a header
echo $line
# I need to echo the next line here
fi
echo "space"
iter=$((iter+1))
done
Here is an example of a possible input file:
>fc11ba964421kjniwefkniojhsdeddb4_runid=65bedc43sdfsdfsdfsd76b7303_read=42_ch=459_start_time=2017-11-01T21:10:05Z
TGAGCTATTATTATCGGCGACTATCTATCTACGACGACTCTAGCTACGACTATCGACTCGACTACSAGCTACTACGTACCGATC
>fd38df1sd6sdf9867345uh43tr8199_runid=65be1fasdfsdfgdsfg4376b7303_read=60_ch=424_start_time=2017-11-01T21:10:06Z
TGAGCTATTATTATCGGCGACTATCTATCTACGACGACTCTAGCTACGACTATCGACTCGACTACSAGCTACTACGTACCGATC
>1d03jknsdfnjhdsf78sd89ds89cc17d_runid=65bedsdfsdfsdf03_read=24_ch=439_start_time=201711-01T21:09:43Z
TGAGCTATTATTATCGGCGACTATCTATCTACGACGACTCTAGCTACGACTATCGACTCGACTACSAGCTACTACGTACCGATC
Header lines start with >, and the data lines are the sequence lines (strings of letters such as TGACATC).
EDIT:
For those asking about the output, based on the original question, I am trying to access the header and data together. Each header and matching data will be processed 6 times. The end goal is to have each header and data pair:
>fc11ba964421kjniwe (original header)
GATATCTAGCTACTACTAT (original data)
translate to:
>F1_fc11ba964421kjniwe
ASNASDKLNASDHGASKNHDLK
>F2_fc11ba964421kjniwe
ASHGASKNHDLKNASDKLNASD
>F3_fc11ba964421kjniwe
KNHDLKNASDKLNASDASHGAS
>R1_fc11ba964421kjniwe
ASHGLKNASDKLNASDASKNHD
>R2_fc11ba964421kjniwe
AKNASDKLNASDSHGASKNHDL
>R3_fc11ba964421kjniwe
SKNHDLKNASDKASHGALNASD
and then the next header and data entry would generate another 6 lines
If you know your records each consist of exactly 2 lines, use the read command twice on each iteration of the while loop.
while IFS= read -r line1; IFS= read -r line2; do
...
done < "$1"
Your for line in $file notation cannot work; in bash, the text after in is a series of values, not an input file. What you're probably looking for is a while read loop that takes the file as standard input. Something like this:
while read -r header; do
# We should be starting with a header.
if [[ $header != '>'* ]]; then
echo "ERROR: corrupt header: $header" >&2
break
fi
# read the next line...
read -r data
printf '%s\n' "$data" >> data.out
done < "$file"
I don't know what output you're looking for, so I just made something up. This loop enforces header position with the if statement, and prints data lines to an output file.
Of course, if you don't want this enforcement, you could simply:
grep -v '^>' "$file"
to return lines which are not headers.
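Building on the two-reads-per-iteration idea, here is a sketch of the loop structure asked for in the edit: for each header/sequence pair it prints six labelled records. Assumptions: every record is exactly two lines, the id is everything after >, R1 to R3 come from the reverse complement, and emit_frames is a hypothetical name. It only emits the shifted nucleotide strings; translating codons to amino acids would be a further step that is not shown.

```shell
#!/usr/bin/env bash

# emit_frames FILE (hypothetical helper)
# FILE must hold records of exactly two lines: a '>' header and a sequence.
emit_frames() {
    local header data id rc i j
    while IFS= read -r header && IFS= read -r data; do
        id=${header#'>'}                      # drop the leading '>'
        # Reverse the sequence one character at a time...
        rc=
        for (( j = ${#data} - 1; j >= 0; j-- )); do
            rc+=${data:j:1}
        done
        # ...then complement it (A<->T, C<->G).
        rc=$(tr ACGT TGCA <<< "$rc")
        # Forward frames start at offsets 0, 1, 2 of the original sequence.
        for i in 1 2 3; do
            printf '>F%d_%s\n%s\n' "$i" "$id" "${data:i-1}"
        done
        # Reverse frames start at offsets 0, 1, 2 of the reverse complement.
        for i in 1 2 3; do
            printf '>R%d_%s\n%s\n' "$i" "$id" "${rc:i-1}"
        done
    done < "$1"
}
```

With a two-line input of >abc and ACGTT, this prints >F1_abc ACGTT, >F2_abc CGTT, >F3_abc GTT, then >R1_abc AACGT, >R2_abc ACGT, >R3_abc CGT (each header and sequence on its own line).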

How can I create a Bash script that creates multiple files with text, excluding one?

I need to create Bash script that generates text files named file001.txt through file050.txt
Of those files, all should contain the text "This is file number xxx" (where xxx is the file's number), except for file007.txt, which needs to be empty.
This is what I have so far:
#!/bin/bash
touch {001..050}.txt
for f in {001..050}
do
echo This is file number > "$f.txt"
done
Not sure where to go from here. Any help would be very appreciated.
#!/bin/bash
for f in {001..050}
do
if [[ ${f} == "007" ]]
then
# creates empty file
touch "file${f}.txt"
else
# creates + inserts text into file
echo "This is file number ${f}" > "file${f}.txt"
fi
done
The continue statement can be used to skip an iteration of a loop and go on to the next -- though since you actually do want to take an operation on file 7 (creating it), it makes just as much sense to have a conditional:
for (( i=1; i<=50; i++ )); do
printf -v num '%03d' "$i"
filename="file${num}.txt"
if (( i == 7 )); then
# create file if it doesn't exist, truncate if it does
>"$filename"
else
echo "This is file number $num" >"$filename"
fi
done
A few words about the specific implementation decisions here:
Using touch file is much slower than > file (since it starts an external command), and doesn't truncate (so if the file already exists it will retain its contents); your textual description of the problem indicates that you want 007.txt to be empty, making truncation appropriate.
Using a C-style for loop, i.e. for ((i=1; i<=50; i++)), means you can use a variable for the maximum number, i.e. for ((i=1; i<=max; i++)). You can't do {001..$max}, by contrast. However, this does mean adding the zero-padding in a separate step, hence the printf.
Of course, you can customize the file names and the text; the key thing is ${i}. I tried to be clear, but let us know if you don't understand something.
#!/bin/bash
# Looping through 001 to 050
for i in {001..050}
do
if [ "${i}" == 007 ]
then
# Create an empty file if "i" is 007 (": >" writes nothing, unlike "echo >", which writes a newline)
: > "file${i}.txt"
else
# Else create a file ("file012.txt" for example)
# with the text "This is file number 012"
echo "This is file number ${i}" > "file${i}.txt"
fi
done
