Bash Process Substitution usage with tee and while loop - bash

I want to use nested process substitution with tee in a while loop.
while read line; do
#process line
echo "--$line"
done < <(cat sample.file | tee >(grep "SPECLINE") | grep "LINESTOPROCESS")
Therefore, I need:
all lines in sample.file that contain "LINETOPROCESS" to be passed into the loop, where they will be printed with a "--" prefix.
all lines containing "SPECLINE" to be printed by tee's first process substitution (the grep).
I want to avoid cat-ting sample.file more than once, as it is large.
With a simple sample.test file:
line1 SPECLINE
line2 LINETOPROCESS
line3 LINETOPROCESS
line4 SPECLINE
line5 I don't need it
line6 also not
line7 also not
line8 SPECLINE
line9 LINETOPROCESS
My result:
# ./test.sh
#
My desired result:
# ./test.sh
line1 SPECLINE
--line2 LINETOPROCESS
--line3 LINETOPROCESS
line4 SPECLINE
line8 SPECLINE
--line9 LINETOPROCESS
Or I can also accept this as output:
# ./test.sh
--line2 LINETOPROCESS
--line3 LINETOPROCESS
--line9 LINETOPROCESS
line1 SPECLINE
line4 SPECLINE
line8 SPECLINE
UPDATE1
greps are for demo only.
I really need those 2 substitutions.
sample.file is an HTML file.
grep "SPECLINE" would be hxselect -i -s ';' -c 'div.hour'
grep "LINESTOPROCESS" would be hxselect -i -s ';' -c 'div.otherclass' | hxpipe
hx programs are not line-oriented; they read from stdin and write to stdout.
Therefore tee's first command will select divs with the 'hour' class and separate them with ';'. Afterwards, the pipe after tee will select all divs with class 'otherclass', and hxpipe will flatten the result for further processing in the loop.

I would use no process substitution at all.
while IFS= read -r line; do
if [[ $line = *SPECLINE* ]]; then
printf '%s\n' "$line"
elif [[ $line = *LINETOPROCESS* ]]; then
printf '--%s\n' "$line"
fi
done < sample.txt
You are already paying the cost of reading an input stream line-by-line in bash; no reason to add the overhead of two separate grep processes to it.
A single awk process would be even better, as it is more efficient than bash's read-one-character-at-a-time approach to reading lines of text.
awk '/SPECLINE/ {print} /LINETOPROCESS/ {print "--"$0}' sample.txt
(which is too simple if a single line could match both SPECLINE and LINETOPROCESS, but I leave that as an exercise to the reader to fix.)
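One possible fix for the both-match case (a sketch; demo.txt and out.txt are illustrative names): giving SPECLINE priority with next means a line matching both patterns is printed exactly once, without the prefix.

```shell
# Create a small demo file; the third line matches both patterns.
printf '%s\n' 'line1 SPECLINE' 'line2 LINETOPROCESS' 'line3 SPECLINE LINETOPROCESS' > demo.txt

# 'next' skips the second pattern test for SPECLINE lines,
# so a line matching both is printed only once, without the -- prefix.
awk '/SPECLINE/ {print; next} /LINETOPROCESS/ {print "--"$0}' demo.txt > out.txt
```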

The following simply loops through the entire file and prints the matching lines. All other lines are ignored.
while read line; do
case "$line" in
*SPECLINE*) echo "$line" ;;
*LINETOPROCESS*) echo "--$line" ;;
esac
done < sample.file

When you want the tee, you can make 2 changes.
Your test code greps for LINESTOPROCESS, but the input contains LINETOPROCESS.
The output process substitution causes problems like the ones explained in https://stackoverflow.com/a/42766913/3220113. You can do this differently:
while IFS= read -r line; do
#process line
echo "--$line"
done < sample.file |
tee >(grep "SPECLINE") >(grep "LINETOPROCESS") >/dev/null
I don't know hxselect, but it seems to operate on a complete well-formed XML document, so avoid the grep.
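If the two branches really must come from a single pass over the input, one awk process can split the stream without tee at all (a sketch using the demo greps; sample.demo, spec.out and proc.out are illustrative names). Note this only works for line-oriented filters, so it would not replace the hx pipeline from the update.

```shell
# Demo input standing in for sample.file.
printf '%s\n' 'line1 SPECLINE' 'line2 LINETOPROCESS' 'line5 neither' > sample.demo

# One pass: SPECLINE lines go to spec.out, LINETOPROCESS lines
# (prefixed with --) go to stdout for further processing.
awk '/SPECLINE/ {print > "spec.out"; next}
     /LINETOPROCESS/ {print "--"$0}' sample.demo > proc.out
```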

Related

how to move string from position to position in bash (sed, awk)

Please help me:)
How do I move the string from line 4 to just under line 1 in bash (awk, sed)?
For example, I have a file:
line1
line2
line3
moveline4
desired result, using sed, awk, or other utils:
line1
moveline4
line2
line3
Another approach with sed:
sed '2{N;h;d};4G;' file
Explanations:
2N: on the second line, appends the third line to the pattern space
h: stores both lines in the hold space
d: deletes both lines
4G: appends the hold space content after the fourth line
input file:
line1
line2
line3
moveline4
line5
line6
sed solution:
sed -n -e '1p' -e '4p' lines.txt; sed -n -e '2,3p' -e '5,$p' lines.txt
output:
line1
moveline4
line2
line3
line5
line6
You can redirect your output to a specific file (> and >>):
sed -n -e '1p' -e '4p' lines.txt > new_file.txt; sed -n -e '2,3p' -e '5,$p' lines.txt >> new_file.txt
explanation:
-n to avoid default printing of lines
-e to pass several commands to sed
1p to print the first line
4p to print the 4th line
2,3p to print the 2nd and 3rd line
5,$p to print the 5th line to the last line
awk solution:
awk 'NR==1||NR==4' lines.txt; awk 'NR>=2 && NR!=4' lines.txt
You can redirect your output to a specific file (> and >>):
awk 'NR==1||NR==4' lines.txt > new_file.txt; awk 'NR>=2 && NR!=4' lines.txt >> new_file.txt
explanation:
NR==1||NR==4 makes awk perform its default action (printing) for records 1 and 4
NR>=2 && NR!=4 prints every line from the 2nd onward, except the 4th
head/tail solution:
head -1 lines.txt; head -4 lines.txt | tail -1; head -3 lines.txt | tail -2; tail -n +5 lines.txt
for information about the behavior of head and tail please do man head and man tail
perl solution:
perl -ne 'print if 1..1' lines.txt; perl -ne 'print if 4..4' lines.txt; perl -ne 'print if 2..3' lines.txt; perl -ne 'print if 5..6' lines.txt
full bash solution:
#!/usr/bin/env bash
#print the 1st line
while IFS='' read -r line || [[ -n "$line" ]]; do
echo "$line"
break
done <"$1"
#print the 4th line
x=1
while IFS='' read -r line || [[ -n "$line" ]]; do
if [ $x -eq 4 ]; then
echo "$line"
break
fi
x=$((x+1))
done <"$1"
#print all lines except the 1st and the 4th
x=1
while IFS='' read -r line || [[ -n "$line" ]]; do
if [ $x -ne 1 -a $x -ne 4 ]; then
echo "$line"
fi
x=$((x+1))
done <"$1"
you can call the script ./myscript.sh lines.txt and redirect the output if necessary: ./myscript.sh lines.txt > new_file.txt
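The three loops above read the file three times; with bash 4+ a single read via mapfile suffices (a sketch; new_file.txt is an illustrative name):

```shell
printf '%s\n' line1 line2 line3 moveline4 line5 line6 > lines.txt

# Read the whole file into an array once, then print it reordered:
# first line, fourth line, then everything else in order.
mapfile -t lines < lines.txt
printf '%s\n' "${lines[0]}" "${lines[3]}" "${lines[1]}" "${lines[2]}" "${lines[@]:4}" > new_file.txt
```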

How to grep and remove from a file all lines between a separator

I have a file that looks like this:
===SEPARATOR===
line2
line3
===SEPARATOR===
line5
line6
===SEPARATOR===
line8
...
lineX
===SEPARATOR===
How can I do a while loop that goes through the file and dumps everything between two ===SEPARATOR=== occurrences into another file for further processing?
On the first iteration I want to add only line2 and line3 to the second file and parse it; on the next iteration I want line5 and line6 in the second file, to run the same parsing again on different data.
It sounds like you want to save each block of lines to a separate file.
The following solutions create output files f1, f2, ... containing the (non-empty) blocks of lines between the ===SEPARATOR=== lines.
With GNU Awk or Mawk:
awk -v fnamePrefix='f' -v RS='(^|\n)===SEPARATOR===(\n|$)' \
'NF { fname = fnamePrefix (++n); print > fname; close(fname) }' file
Pure bash - which will be slow:
#!/usr/bin/env bash
fnamePrefix='f'; i=0
while IFS= read -r line; do
[[ $line == '===SEPARATOR===' ]] && { (( ++i )); > "${fnamePrefix}${i}"; continue; }
printf '%s\n' "$line" >> "${fnamePrefix}${i}"
done < file
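A portable-awk middle ground between the gawk one-liner and the bash loop (a sketch; it assumes the file starts with a separator line, and f1, f2, ... are the output names):

```shell
printf '%s\n' '===SEPARATOR===' line2 line3 '===SEPARATOR===' line5 line6 '===SEPARATOR===' > blocks.txt

# Each separator closes the current output file and opens the next name;
# every other line is appended to the currently open block file.
awk '/^===SEPARATOR===$/ { if (out) close(out); out = "f" (++n); next }
     { print > out }' blocks.txt
```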
You can exclude all lines matching ===SEPARATOR=== with grep -v and redirect the rest to a file:
grep -vx '===SEPARATOR===' file > file_processed
-x makes sure that only lines completely matching ===SEPARATOR=== are excluded.
This uses sed to find lines between separators, and then grep -v to delete the separators.
$ sed -n '/===SEPARATOR===/,/===SEPARATOR===/ p' file | grep -v '===SEPARATOR==='
line2
line3
line8
...
lineX
There's got to be a more elegant answer that doesn't repeat the separator three times, but I'm drawing a blank.
I am assuming that you do not need line5 and line6. You can do it with awk like this:
awk '$0 == "===SEPARATOR===" {interested = ! interested; next} interested {print}'
Credit goes to https://www.gnu.org/software/gawk/manual/html_node/Boolean-Ops.html#Boolean-Ops
Output:
[root@hostname ~]# awk '$0 == "===SEPARATOR===" {interested = ! interested; next} interested {print}' /tmp/1
line2
line3
line8
...
lineX
awk to the rescue!
with multi-char support (e.g. gawk)
$ awk -v RS='\n?===SEPARATOR===\n' '!(NR%2)' file
line2
line3
line8
...
lineX
or without that
$ awk '/===SEPARATOR===/{p=!p;next} p' file
line2
line3
line8
...
lineX
which is practically the same as @Jay Rajput's answer.

Bash while sed is not null

I need to run a while loop while sed's output is not null. Example:
File 1
line1
line2
line3
File 2
i=1
while sed "${i}p" # here I need an expression which checks that the line is not null
# here echo this line and i++
I tried writing just while sed -n "${i}p", but it does not work as I expected.
You can use the = command in sed to print line numbers:
sed -n '/./!q;=;p' input | sed 'N;s/\n/ /'
For an input:
a
b
c

d
This gives:
1 a
2 b
3 c
If you only want the number of the last non-empty line before the first empty line:
sed -n '/./!q;=' input | tail -1
A while loop that prints all lines:
while read line; do
echo "$line"
done < input
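What the question seems to be after, printing numbered lines until the first empty one, needs no sed at all (a sketch; input.txt and numbered.txt are illustrative names):

```shell
printf '%s\n' a b c '' d > input.txt

# Stop as soon as read fails or the line is empty.
i=1
while IFS= read -r line && [ -n "$line" ]; do
  printf '%d %s\n' "$i" "$line"
  i=$((i+1))
done < input.txt > numbered.txt
```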
If you want to count the lines until the first empty line, you could do this.
$ cat in.txt
line1
line2
line3

line4
line5
$ echo $(($(sed '/^\s*$/q' < in.txt | wc -l) - 1))
3

How do I read entire lines with <<< operator?

I need to validate some program output. I know the output contains exactly three lines that I would like to store in distinct variables for easy access. I tried several variations of the following without success.
IFS=$'\n' read line1 line2 line3 <<< $(grep pattern file.log)
IFS='' read line1 line2 line3 <<< $(grep pattern file.log)
Is it possible to combine read and <<< to do what I want? How?
If it is not possible, what is the explanation? What alternative do you suggest?
Thank you.
{
read line1
read line2
read line3
} < <(grep pattern file.log)
I feel that putting a command substitution ($()) after a herestring (<<<)
is an unnecessary contortion, as is trying to trick read into reading more than one line at a time. This is a case where multiple reads in a compound command is a natural solution.
Well I have found how to make it work. It hinged on the use of double quotes.
First I tried this:
data=$(grep pattern file.log)
IFS=$'\n' read -d '' -r line1 line2 line3 <<< "$data"
Then found that the variable was not absolutely necessary if I kept the double quotes:
IFS=$'\n' read -d '' -r line1 line2 line3 <<< "$(grep pattern file.log)"
By default, a newline indicates the end of data, which makes read stop after the first line. To avoid that, I override the delimiter with -d ''.
By default, the field separator is <space><tab><newline>. Since I want to read entire lines, I set IFS to $'\n'.
Note that -r is not central to the solution; I added it to avoid stumbling on any backslashes in my input.
EDIT
Using read and <<< has one subtle but important drawback: blank lines will disappear entirely.
data="1

3"
IFS=$'\n' read -d '' -r line1 line2 line3 <<< "$data"
echo ${line1:-blank}
echo ${line2:-blank}
echo ${line3:-blank}
output:
1
3
blank
Same if you try storing lines into an array with -a:
IFS=$'\n' read -d '' -r -a line_ar <<< "$data"
echo ${line_ar[0]:-blank}
echo ${line_ar[1]:-blank}
echo ${line_ar[2]:-blank}
output:
1
3
blank
However, you will still obtain all lines with a construct like this:
while read -r line ; do
echo ${line:-blank}
done <<< "$data"
output:
1
blank
3
If the number of lines is very limited, then you're better off using multiple reads as suggested by kojiro. Then again, it is perfectly legal to use <<<:
{
read -r line1
read -r line2
read -r line3
} <<< "$data"
echo ${line1:-blank}
echo ${line2:-blank}
echo ${line3:-blank}
Remember to enclose your "$var" inside double quotes so that the newlines are preserved.
Set IFS to \n and pass -d '' to read (note the space: writing -d'' would make read consume the next argument as its delimiter). Also you can use process substitution instead of a here string:
while IFS=$'\n' read -d '' -r line1 line2 line3; do :; done < <(grep pattern file.log)
Use an array
IFS=$'\n' read -r -d '' -a array < <(grep pattern file.log)
# Optional
line1=${array[0]}
line2=${array[1]}
line3=${array[2]}
However, I would recommend kojiro's answer.
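With bash 4+, mapfile/readarray reads all lines into an array in one call, without the IFS and -d gymnastics (a sketch; the printf stands in for the grep output):

```shell
# Stand-in for the grep output: three known lines.
mapfile -t lines < <(printf '%s\n' 'first match' 'second match' 'third match')

line1=${lines[0]}
line2=${lines[1]}
line3=${lines[2]}
```

Unlike the IFS=$'\n' read approach, mapfile also preserves blank lines as empty array elements.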
while read -r arry[x]; do
((x++));
done < <(grep pattern file.log)
This will create an array called arry, and the lines will be stored at successive indices starting from 0.

ksh + while loop + get the same file with the same spaces

need advice about the following
with the following ksh script I copy file1 to file2
my problem is that the lines in file2 do not keep the same leading spaces as in file1
#!/bin/ksh
while read -r line ; do
echo $line >> file2
done < file1
for example
more file1
line1
line2
line3
more file2
line1
line2
line3
the question: what do I need to change in my script so that, after running it, the lines in file2 keep the same layout as in file1?
lidia
You can try:
while read -r line ; do
echo $line | sed -re 's/^\s+//' >> file2
done < file1
This uses sed to get rid of the leading whitespaces present in lines from file1.
you can set IFS= (empty) so that read does not strip leading whitespace:
while IFS= read -r line; do echo "$line"; done < file
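Putting it together for the original copy loop: IFS= keeps leading whitespace, -r keeps backslashes, and printf (or a quoted "$line") keeps the line intact (a sketch; the two-space and four-space indents are illustrative):

```shell
printf '%s\n' '  line1' '    line2' 'line3' > file1

# Copy file1 to file2 line by line without losing indentation.
while IFS= read -r line; do
  printf '%s\n' "$line"
done < file1 > file2
```

The unquoted `echo $line` in the original loop is what collapses the whitespace that read would otherwise leave intact.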
