Removing current line in bash during read - bash

I'm using bash to read a file, and after performing an operation on a particular line, I need to delete that line from the input file.
Can you please suggest a way to do this using sed or any other tool?
I've tried using sed like this:
#!/bin/sh
file=/Volumes/workplace/GeneratedRules.ion
while read line; do
    printf "%s\n" "$line"
    sed '1d' $file
done <$file
My aim in this program is to read one line and then delete it.
Input:
AllIsWell
LetsHopeForBest
YouCanMakeIt
But the output I got was weirder than I expected.
Output:
AllIsWell
LetsHopeForBest
YouCanMakeIt
LetsHopeForBest
LetsHopeForBest
YouCanMakeIt
YouCanMakeIt
LetsHopeForBest
YouCanMakeIt
But I need the output to be:
AllIsWell
LetsHopeForBest
YouCanMakeIt
as I want to delete each line after reading it.
NOTE: I have simplified my problem here. The actual use case is: I need to perform a bunch of operations on each line besides reading it, and the input file is very long; my processing sometimes fails partway through. So I want the lines I have already read to be deleted, so that if I start the process again, it does not start from the beginning but from the point where it got stuck.
Please help.

You effectively said you want your process to be restartable. Depending upon how you define the successful completion of an iteration of your while loop, you should store a line number in a separate file, x, that indicates how many lines you have successfully processed. If the file doesn't exist, then assume you would start reading at line 1.
Otherwise, you would get the content of x into variable n and then you would start reading $file at line $n + 1.
How you start reading at a particular line depends on constraints we don't know yet.
One way you could do it is to use sed to put lines $n + 1 through the end into a temporary file, remove $file, and then move the temporary file to $file before your while loop begins.
There are other solutions but each one might not elegantly satisfy your constraints.
But you'll want to carefully consider what happens if some other process is modifying the content of $file and when it is changing the content. If it only changes the content before or after your bash script is running, then you're probably ok to continue down this path. Otherwise, you have a synchronization problem to solve.
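A minimal sketch of that checkpoint idea, assuming a progress file named x as described above. The demo input created here stands in for your real $file, and the printf stands in for your actual per-line work:

```shell
#!/bin/sh
# Demo input (stand-in for the real GeneratedRules.ion)
file=$(mktemp)
checkpoint=$(mktemp -u)            # progress file; absent on the very first run
printf 'AllIsWell\nLetsHopeForBest\nYouCanMakeIt\n' > "$file"

# Resume point: 0 if no checkpoint exists yet
n=0
[ -f "$checkpoint" ] && n=$(cat "$checkpoint")

lineno=0
while IFS= read -r line; do
    lineno=$((lineno + 1))
    # Skip lines already processed in a previous run
    [ "$lineno" -le "$n" ] && continue

    printf '%s\n' "$line"          # your real per-line work goes here

    # Record progress only after this line was handled successfully
    echo "$lineno" > "$checkpoint"
done < "$file"
```

If the script dies partway through, re-running it picks up after the last recorded line; remove the checkpoint file once a run completes cleanly. The input file itself is never modified, which sidesteps the read-while-writing problem entirely.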

As stated in comments, there are many issues with altering the file you are currently reading from. Don't do it.
You could just keep track of which lines you have dealt with in the first loop (with a counter) then use sed to delete those lines after your first loop has processed them.
A simple example:
cd /tmp
echo 'Line 1
Line 2
Line 3
Line 4
Line 5' >file
echo "file before:"
cat file
cnt=1
while IFS= read -r line || [[ -n $line ]]; do
    printf "'%s' processed\n" "$line"
    if [ "$cnt" -ge 3 ]; then
        break
    fi
    let "cnt+=1"
done <file
sed -i '' "1,${cnt}d" file
echo "file after:"
cat file
Prints:
file before:
Line 1
Line 2
Line 3
Line 4
Line 5
'Line 1' processed
'Line 2' processed
'Line 3' processed
file after:
Line 4
Line 5
Another method is to use something like awk, ruby or perl to 'slurp' the file and then feed that slurped content line-by-line to your Bash while loop. The file can then be modified in your loop, since the other process has already fully read and closed the file:
# Note: This is SLOWER and USES MORE MEMORY...
echo "file before:"
cat file
while IFS= read -r line; do
    printf "'%s' processed\n" "$line"
    sed -i '' "1d" file
done < <(awk -v cnt=3 'NR>cnt{next}
    {arr[NR]=$0}
    END { for(i=1;i<=cnt;i++) print arr[i] }' file)
echo "file after:"
cat file
# same output
Note:
Please make sure you brush up on how to use bash to read a stream line-by-line, for fewer surprises.
Read HERE and HERE for more.

Your sed command needs the -i option; if you need a backup of the file, give the option a suffix (for example, sed -i.bak '1d' file keeps the original as file.bak). See man sed.
#!/bin/sh
file=/Volumes/workplace/GeneratedRules.ion
while read line; do
    printf "%s\n" "$line"
    sed -i '1d' $file
done <$file

Related

Grep lines between two patterns, one unique and one repeated

I have a text file which looks like this
1
bbbbb
aaa
END
2
ttttt
mmmm
uu
END
3
....
END
The number of lines between the single number patterns (1, 2, 3) and END is variable. So the upper delimiting pattern changes, but the final one does not. Using some bash commands, I would like to grep the lines between a specified upper pattern and the corresponding END; for example, a command that takes 2 as input and returns
2
ttttt
mmmm
uu
END
I've tried various solutions with sed and awk, but still can't figure it out. The main problem is that I may need to grep an entry in the middle of the file, so I can't use sed with /pattern/q. Any help will be greatly appreciated!
With awk we set a flag f when matching the start pattern, which is passed in as an input argument. From that row on, the flag is on and every line is printed. When reaching "END" (and the flag is on!) it exits.
awk -v p=2 '$0~p{f=1} f{print} f&&/END/{exit}' file
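For instance, recreating the sample input from the question in a temp file (the file creation here is just for the demo) and running the command:

```shell
# Recreate the sample input in a temp file
f=$(mktemp)
printf '1\nbbbbb\naaa\nEND\n2\nttttt\nmmmm\nuu\nEND\n3\nzzz\nEND\n' > "$f"

# Flag turns on at the line matching 2, turns the whole block on, exits at END
out=$(awk -v p=2 '$0~p{f=1} f{print} f&&/END/{exit}' "$f")
printf '%s\n' "$out"
# prints:
# 2
# ttttt
# mmmm
# uu
# END
```

Note that p is used as a regular expression, so a start pattern like 2 would also fire on any earlier line containing a 2; anchoring it (as the sed answers below do with ^ and $) is safer for real data.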
Use sed and its addresses to only print a part of the file between the patterns:
#!/bin/bash
start=x
while [[ $start = *[^0-9]* ]] ; do
    read -p 'Enter the start pattern: ' start
done
sed -n "/^$start$/,/^END$/p" file
You can use sed with an address range. Modify the first regular expression (RE1) in /RE1/,/RE2/ to suit:
sed -n '/^[[:space:]]*2$/,/^[[:space:]]*END$/p' file
Or,
sed '
/^[[:space:]]*2$/,/^[[:space:]]*END$/!d
/^[[:space:]]*END$/q
' file
This quits upon reading the END, thus may be more efficient.
Another option/solution using just bash
#!/usr/bin/env bash
start=$1
while IFS= read -r lines; do
    if [[ ${lines##* } == $start ]]; then
        print=on
    elif [[ ${lines##* } == [0-9] ]]; then
        print=off
    fi
    case $print in on) printf '%s\n' "$lines";; esac
done < file.txt
Run the script with the number as the argument: 1, 2, 3, etc.
./myscript 1
This might work for you (GNU sed):
sed -n '/^\s*2$/{:a;N;/^\s*END$/M!ba;p;q}' file
Switch off implicit printing by setting the -n option.
Gather up the lines beginning with a line starting with 2 and ending in a line starting with END, print the collection and quit.
N.B. The second regexp uses the M flag, which allows the ^ and $ to match start and end of lines when multiple lines are being matched. Another thing to bear in mind is that using a range i.e. sed -n '/start/,/end/p' file, will start printing lines the moment the first condition is met and if the second match does not materialise, it will continue printing to the end of the file.
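To see that last caveat in action, here is a demo file (created inline just for illustration) where no END follows the start match; the plain range form keeps printing all the way to end of file:

```shell
# Demo file: the "2" block is truncated and has no closing END
f=$(mktemp)
printf '1\naaa\nEND\n2\nttttt\nmmmm\n' > "$f"

# Once /^2$/ matches, the end pattern never does, so printing runs to EOF
out=$(sed -n '/^2$/,/^END$/p' "$f")
printf '%s\n' "$out"
# prints:
# 2
# ttttt
# mmmm
```

With well-formed input this doesn't matter, but for defensive scripts the quit-on-END variant above fails more visibly.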

Delete lines in file that have a date older than x

I can read an entire file into memory like so:
#!/bin/bash
filename='peptides.txt'
filelines=`cat $filename`
ten_days_ago="$(date)"
for line in $filelines ; do
    date_of="$(echo "$line" | jq -r '.time')"
    if [[ "$ten_days_ago" > "$date_of" ]]; then
        : # delete this line
    fi
done
The problems are:
I may not want to read the whole file into memory.
If I stream it line by line with bash, how can I keep track of which lines to delete? I would delete lines 0 to x, where line x has a date equal to 10 days ago.
A binary search would be appropriate here, so maybe bash is not a good fit for this? I would need to find the number of lines in the file, divide by two, and go to that line.
You can use binary search only if the file is sorted.
You do not need to read the whole file into memory; you can process it line by line:
while read line
do
    ....
done <$filename
And: yes, I personally would not use shell scripting for this kind of problem, but that is of course a matter of taste.
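As a hedged sketch of that line-by-line pattern applied to the deletion problem: rather than deleting lines in place, stream the file and write only the lines you want to keep to a temporary file, then move it over the original. The demo input, the fixed cutoff, and the plain first-field extraction (in place of the question's jq -r '.time') are all stand-ins, and the string comparison assumes dates that sort lexically, e.g. ISO-8601:

```shell
#!/bin/sh
# Demo stand-ins: a temp file instead of peptides.txt, a fixed cutoff instead
# of "ten days ago", and a plain first field instead of jq -r '.time'
filename=$(mktemp)
printf '2024-01-01 old\n2024-06-01 old\n2024-12-01 new\n' > "$filename"
cutoff='2024-09-30'              # ISO-8601 dates compare correctly as strings

tmp=$(mktemp)
while IFS= read -r line; do
    date_of=${line%% *}          # first whitespace-separated field
    # Keep only lines strictly newer than the cutoff
    if [ "$date_of" \> "$cutoff" ]; then
        printf '%s\n' "$line" >> "$tmp"
    fi
done < "$filename"
mv "$tmp" "$filename"            # replace the original in one step
```

Because dropped lines are simply never written out, there is no need to track line numbers or rewrite the file repeatedly, and memory use stays constant regardless of file size.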
You didn't show what the input file looks like, but judging by your jq, it's JSON data.
With that said, this is how I would do it:
today=$(date +%j)
tenDaysAgo=$(date --date="10 day ago" +%j)
# This is where you would create the data for peptides.txt
# 20 spaces away there is a date stamp so it doesn't distract you
echo "Peptides stuff $today" >> peptides.txt
while read pepStuff; do
    if [ $pepStuff == $tenDaysAgo ]; then
        sed -i "/.*$pepStuff/d" peptides.txt
    fi
done < <(awk '{print $3}' peptides.txt)

No new line produced by >>

I have the following piece of code that selects two line numbers in a file, extracts everything between these lines, replaces the new line characters with tabs and places them in an output file. I want all lines extracted within one loop to be on the same line, but lines extracted on different loops to go on a new line.
for ((i=1; i<=numTimePoints; i++)); do
    # Get the starting point for line extraction. This is just an integer.
    startScan=$(($(echo "${dataStart}" | sed -n ${i}p)+1))
    # Get the end point for line extraction. This is just an integer.
    endScan=$(($(echo "${dataEnd}" | sed -n ${i}p)-1))
    # From file ${file}, take all lines between ${startScan} and ${endScan}. Replace new lines with tabs and output to file ${tmpOutputFile}
    head -n ${endScan} ${file} | tail -n $((${endScan}-${startScan}+1)) | tr "\n" "\t" >> ${tmpOutputFile}
done
This script works mostly as intended; however, all extracted lines are appended to the previous line rather than placed on new lines (as I thought >> would do). In other words, if I now run cat ${tmpOutputFile} | wc, it returns 0 12290400 181970555. Can anyone point out what I'm doing wrong?
Any redirection, including >>, does not have anything to do with newline creation at all -- redirection operations don't generate output themselves, newlines or otherwise; they only control where file descriptors (stdout, stderr, etc) are connected to, and it's the programs performing those writes which are responsible for contents.
Consequently, your tr '\n' '\t' is entirely preventing newlines from being added to the output file -- there's nowhere one could come from that doesn't go through that pipeline.
Consider the following instead:
while read -r startScan <&3 && read -r endScan <&4; do
    # generate your output
    head -n "$endScan" "$file" | tail -n $(( endScan - startScan + 1 )) | tr '\n' '\t'
    # append your newline
    printf '\n'
done 3<<<"$dataStart" 4<<<"$dataEnd" >"$tmpOutputFile"
Note:
We aren't paying the cost of running sed to extract startScan and endScan, but rather are reading them a line at a time from herestrings created from the contents of dataStart and dataEnd
We're redirecting to our output file exactly once, and reusing that file handle for the entire loop (over multiple commands -- first the pipeline, and then the printf)
We're actually running a printf to generate that newline, rather than expecting it to be somehow implicitly created by magic.

How to get line WITH tab character using tail and head

I have made a script to practice my Bash, only to realize that this script does not take tabulation into account, which is a problem since it is designed to find and replace a pattern in a Python script (which obviously needs tabulation to work).
Here is my code. Is there a simple way to get around this problem ?
pressure=1
nline=$(cat /myfile.py | wc -l) # count the lines in the file
echo $nline
for ((c=0;c<=${nline};c++))
do
    res=$( tail -n $(($(($nline+1))-$c)) myfile.py | head -n 1 | awk 'gsub("="," ",$1){print $1}' | awk '{print$1}')
    #echo $res
    if [ $res == 'pressure_run' ]
    then
        echo "pressure_run='${pressure}'" >> myfile_mod.py
    else
        echo $( tail -n $(($nline-$c)) myfile.py | head -n 1) >> myfile_mod.py
    fi
done
Basically, it finds the line that has pressure_run=something and replaces it by pressure_run=$pressure. The rest of the file should be untouched. But in this case, all tabulation is deleted.
If you want to just do the replacement as quickly as possible, sed is the way to go as pointed out in shellter's comment:
sed "s/\(pressure_run=\).*/\1$pressure/" myfile.py
For Bash training, as you say, you may want to loop manually over your file. A few remarks for your current version:
Is /myfile.py really in the root directory? Later, you don't refer to it at that location.
cat ... | wc -l is a useless use of cat and better written as wc -l < myfile.py.
Your for loop is executed one more time than you have lines.
To get the next line, you do "show me all lines, but counting from the back, don't show me c lines, and then show me the first line of these". There must be a simpler way, right?
To get what's the left-hand side of an assignment, you say "in the first space-separated field, replace = with a space, then show me the first space-separated field of the result". There must be a simpler way, right? This is, by the way, where you strip out the leading tabs (your first awk command does it).
To print the unchanged line, you do the same complicated thing as before.
A band-aid solution
A minimal change that would get you the result you want would be to modify the awk command: instead of
awk 'gsub("="," ",$1){print $1}' | awk '{print$1}'
you could use
awk -F '=' '{ print $1 }'
"Fields are separated by =; give me the first one". This preserves leading tabs.
The replacements have to be adjusted a little bit as well; you now want to match something that ends in pressure_run:
if [[ $res == *pressure_run ]]
I've used the more flexible [[ ]] instead of [ ] and added a * to pressure_run (which must not be quoted): "if $res ends in pressure_run, then..."
The replacement has to use $res, which has the proper amount of tabs:
echo "$res='${pressure}'" >> myfile_mod.py
Instead of appending each line each loop (and opening the file each time), you could just redirect output of your whole loop with done > myfile_mod.py.
This prints literally ${pressure} as in your version, because it's single quoted. If you want to replace that by the value of $pressure, you have to remove the single quotes (and the braces aren't needed here, but don't hurt):
echo "$res=$pressure" >> myfile_mod.py
This fixes your example, but it should be pointed out that enumerating lines and then getting one at a time with tail | head is a really bad idea. You traverse the file for every single line twice, it's very error prone and hard to read. (Thanks to tripleee for suggesting to mention this more clearly.)
A proper solution
This all being said, there are preferred ways of doing what you did. You essentially loop over a file, and if a line matches pressure_run=, you want to replace what's on the right-hand side with $pressure (or the value of that variable). Here is how I would do it:
#!/bin/bash
pressure=1
# Regular expression to match lines we want to change
re='^[[:space:]]*pressure_run='
# Read lines from myfile.py
while IFS= read -r line; do
    # If the line matches the regular expression
    if [[ $line =~ $re ]]; then
        # Print what we matched (with whitespace!), then the value of $pressure
        line="${BASH_REMATCH[0]}"$pressure
    fi
    # Print the (potentially modified) line
    echo "$line"
# Read from myfile.py, write to myfile_mod.py
done < myfile.py > myfile_mod.py
For a test file that looks like
blah
test
pressure_run=no_tab
blah
something
	pressure_run=one_tab
		pressure_run=two_tabs
the result is
blah
test
pressure_run=1
blah
something
	pressure_run=1
		pressure_run=1
Recommended reading
How to read a file line-by-line (explains the IFS= and -r business, which is quite essential to preserve whitespace)
BashGuide

Grep from a file and store the result lines in separate file, and the file name should increment with every entry

I am trying to write a shell script which takes input from a column in a csv file line by line; that input is then used to search another file.
The line(s) in which the resulting pattern is found should be stored in another file, and with every such iteration the file name should increment.
For example: test1, test2, test3, ...
Below is the code i used:
#!/bin/bash
input=file1.csv
while
IFS=,
read -a csv_line;
do echo "${csv_line[1]}";
for((i=0;i<=22558;i++))
do
#echo "${csv_line[1]}";
filename=$'test'${i++};
grep "csv_line[1]" log_file > filename.txt;
echo=$filename;
echo="$csv_line[1]}"
done<file1.csv
done
I did not get any error, but when I execute the script, it neither shows anything nor stores anything in a file. Using my code, the filename does not change to test1 but remains "filename.txt".
Is that what you need?
#!/bin/bash
counter=0
while read -r csv_line
do
    value=$(cut -d ',' -f2 <<< $csv_line)
    counter=$((counter+1))
    grep $value RBFCDC_in.log.2015-11-23.PROD > "test$counter"
done < dbo_fidessa_order_hist_sql.csv
