How to read two lines in shell script (In single iteration line1, line2 - Next iteration it should take line2,line3.. so on) - bash

In my shell script, one of the variable contains set of lines. I have a requirement to get the two lines info at single iteration in which my awk needs it.
var contains:
12 abc
32 cdg
9 dfk
98 nhf
43 uyr
5 ytp
Here, In a loop I need line1, line2[i.e 12 abc \n 32 cdg] content and next iteration needs line2, line3 [32 cdg \n 9 dfk] and so on..
I tried to achieve by
while IFS= read -r line
do
count=`echo ${line} | awk -F" " '{print $1}'`
id=`echo ${line} | awk -F" " '{print $2}'`
read line
next_id=`echo ${line} | awk -F" " '{print $2}'`
echo ${count}
echo ${id}
echo ${next_id}
## Here, I have some other logic of awk..
done <<< "$var"
It's reading line1, line2 at first iteration. At second iteration it's reading line3, line4. But, I required to read line2, line3 at second iteration. Can anyone please sort out my requirement.
Thanks in advance..

Don't mix a shell script spawing 3 separate subshells for awk per-iteration when a single call to awk will do. It will be orders of magnitude faster for large input files.
You can group the messages as desired, just by saving the first line in a variable, skipping to the next record and then printing the line in the variable and the current record through the end of the file. For example, with your lines in the file named lines, you could do:
awk 'FNR==1 {line=$0; next} {print line"\n"$0"\n"; line=$0}' lines
Example Use/Output
$ awk 'FNR==1 {line=$0; next} {print line"\n"$0"\n"; line=$0}' lines
12 abc
32 cdg
32 cdg
9 dfk
9 dfk
98 nhf
98 nhf
43 uyr
43 uyr
5 ytp
(the additional line-break was simply included to show separation, the output format can be changed as desired)
You can add a counter if desired and output the count via the END rule.

The solution depends on what you want to do with the two lines.
My first thought was something like
sed '2,$ s/.*/&\n&/' <<< "${yourvar}"
But this won't help much when you must process two lines (I think | xargs -L2 won't help).
When you want them in a loop, try
while IFS= read -r line; do
if [ -n "${lastline}" ]; then
echo "Processing lines starting with ${lastline:0:2} and ${line:0:2}"
fi
lastline="${line}"
done <<< "${yourvar}"

Related

bash - how do I use 2 numbers on a line to create a sequence

I have this file content:
2450TO3450
3800
4500TO4560
And I would like to obtain something of this sort:
2450
2454
2458
...
3450
3800
4500
4504
4508
..
4560
Basically I would need a one liner in sed/awk that would read the values on both sides of the TO separator and inject those in a seq command or do the loop on its own and dump it in the same file as a value per line with an arbitrary increment, let's say 4 in the example above.
I know I can use several one temp file, go the read command and sorts, but I would like to do it in a one liner starting with cat filename | etc. as it is already part of a bigger script.
Correctness of the input is guaranteed so always left side of TOis smaller than bigger side of it.
Thanks
Like this:
awk -F'TO' -v inc=4 'NF==1{print $1;next}{for(i=$1;i<=$2;i+=inc)print i}' file
or, if you like starting with cat:
cat file | awk -F'TO' -v inc=4 'NF==1{print $1;next}{for(i=$1;i<=$2;i+=inc)print i}'
Something like this might work:
awk -F TO '{system("seq " $1 " 4 " ($2 ? $2 : $1))}'
This would tell awk to system (execute) the command seq 10 4 10 for lines just containing 10 (which outputs 10), and something like seq 10 4 40 for lines like 10TO40. The output seems to match your example.
Given:
txt="2450TO3450
3800
4500TO4560"
You can do:
echo "$txt" | awk -F TO '{$2<$1 ? t=$1 : t=$2; for(i=$1; i<=t; i++) print i}'
If you want an increment greater than 1:
echo "$txt" | awk -F TO -v p=4 '{$2<$1 ? t=$1 : t=$2; for(i=$1; i<=t; i+=p) print i}'
Give a try to this:
sed 's/TO/ /' file.txt | while read first second; do if [ ! -z "$second" ] ; then seq $first 4 $second; else printf "%s\n" $first; fi; done
sed is used to replace TO with space char.
read is used to read the line, if there are 2 numbers, seq is used to generate the sequence. Otherwise, the uniq number is printed.
This might work for you (GNU sed):
sed -r 's/(.*)TO(.*)/seq \1 4 \2/e' file
This evaluates the RHS of the substitution command if the LHS contains TO.

How to add all values in a certain column?

I want to add all the 3rd fields from each line and produce the result.
Below is the way I solved the problem
sum=0
grep '2016Feb' input.txt|awk -F\- '{print $3}'|while read LINE; do
sum = $(expr $sum + $LINE)
done
echo $sum
Is there a better way of solving the problem than my code? Possible a command that solves the problem # command line itself?
For a file like:
$ cat input.txt
Feb2016-2016-110
Feb2016-2016-20
Feb2016-2016-220
Feb2016-2016-140
Feb2016-2016-100
The output is: 590.
Just set the field separator to the dash and sum the third column:
$ awk -F- '{sum+=$3} END{print sum+0}' file
590 ^^
# in case there are no matching lines, print 0
Since it looks like you are just counting those lines that contain the text "Feb2016", you can also add a filter:
awk -F- '/Feb2016/{sum+=$3} END{print sum+0}' file
# ^^^^^^^^^
# just on lines containing the string "Feb2016"
$ cat data
Feb2016-2016-110
Feb2016-2016-20
Feb2016-2016-220
Feb2016-2016-140
Feb2016-2016-100
$ cut -d - -f 3 data | paste -s -d '+' | bc
590
$

Combine two lines from different files when the same word is found in those lines

I'm new with bash, and I want to combine two lines from different files when the same word is found in those lines.
E.g.:
File 1:
organism 1
1 NC_001350
4 NC_001403
organism 2
1 NC_001461
1 NC_001499
File 2:
NC_001499 » Abelson murine leukemia virus
NC_001461 » Bovine viral diarrhea virus 1
NC_001403 » Fujinami sarcoma virus
NC_001350 » Saimiriine herpesvirus 2 complete genome
NC_022266 » Simian adenovirus 18
NC_028107 » Simian adenovirus 19 strain AA153
i wanted an output like:
File 3:
organism 1
1 NC_001350 » Saimiriine herpesvirus 2 complete genome
4 NC_001403 » Fujinami sarcoma virus
organism 2
1 NC_001461 » Bovine viral diarrhea virus 1
1 NC_001499 » Abelson murine leukemia virus
Is there any way to get anything like that output?
You can get something pretty similar to your desired output like this:
awk 'NR == FNR { a[$1] = $0; next }
{ print $1, ($2 in a ? a[$2] : $2) }' file2 file1
This reads in each line of file2 into an array a, using the first field as the key. Then for each line in file1 it prints the first field followed by the matching line in a if one is found, else the second field.
If the spacing is important, then it's a little more effort but totally possible.
For a more Bash 4 ish solution:
declare -A descriptions
while read line; do
name=$(echo "$line" | cut -d '»' -f 1 | xargs echo)
description=$(echo "$line" | cut -d '»' -f 2)
eval "descriptions['$name']=' »$description'"
done < file2
while read line; do
name=$(echo "$line" | cut -d ' ' -f 2)
if [[ -n "$name" && -n "${descriptions[$name]}" ]]; then
echo "${line}${descriptions[$name]}"
else
echo "$line"
fi
done < file1
We could create a sed-script from the second file and apply it to the first file. It is straight forward, we use the sed s command to construct another sed s command from each line and store in a variable for later usage:
sc=$(sed -rn 's#^\s+(\w+)([^\w]+)(.*)$#s/\1/\1\2\3/g;#g; p;' file2 )
sed "$sc" file1
The first command looks so weird, because we use # in the outer sed s and we use the more common / in the inner sed s command as delimiters.
Do a echo $sc to study the inner one. It just takes the parts of each line of file2 into different capture groups and then combines the captured strings to a s/find/replace/g; with
find is \1
replace is \1\2\3
You want to rebuild file2 into a sed-command file.
sed 's# \(\w\+\) \(.*\)#s/\1/\1 \2/#' File2
You can use process substitution to use the result without storing it in a temp file.
sed -f <(sed 's# \(\w\+\) \(.*\)#s/\1/\1 \2/#' File2) File1

UNIX: How to print out specific lines in a file using sed/awk/grep?

I have a data in Unix which were fetched by a command and prints:
line01
line02
line03
line04
line05
line06
line07
line08
line09
line10
line11
line12
and I wanted to sort it out such that the lines 10 to 12 are above lines 1 to 9. like this:
line10
line11
line12
line01
line02
line03
line04
line05
line06
line07
line08
line09
i tried using
<command that fetches the data> | awk 'NR>=10 || NR<=9'
and
<command that fetches the data> | sed -n -e '4,5p' -e '1,3p'
but it still display in a sorted order. i'm new to unix so i don't know how to properly use awk/sed.
PS. These data are stored in a variable which will then be processed by another command. so i needed it to be sorted that way so that line 10-12 will be processed first. :)
Use head and tail:
$ tail -n 2 file && head -n 3 file
name4
name5
name1
name2
name3
Your awk and sed approach do not work because you are just saying: print lines number X, Y and Z, and they will do so as soon as they find any of them. If you wanted to use these tools, you would need to read the file first, storing its content, and then print it.
$ awk -v OFS="\n" '{a[NR]=$0} END {print a[4], a[5], a[1], a[2], a[3]}' file
name4
name5
name1
name2
name3
Or even give the order as a variable:
awk -v order="4 5 1 2 3"
'BEGIN {split(order,lines)}
{a[NR]=$0}
END {for (i=1;i<=length(lines);i++) print a[lines[i]]}' file
If you want to give the order of the lines as an argument, you can use process substitution saying awk '...' <(command) file and working with FNR/NR to distinguish between the input and the file
Or you can use - to read from stdin as first file:
echo "4 5 1 2 3" | awk 'FNR==NR {n=split($0,lines); next}
{a[FNR]=$0}
END {for (i=1;i<=n;i++) print a[lines[i]]}' - file
As one-liner:
$ echo "4 5 1 2 3" | awk 'FNR==NR {n=split($0,lines); next} {a[FNR]=$0} END {for (i=1;i<=n;i++) print a[lines[i]]}' - a
This might work for you (GNU sed):
sed '1h;2,9H;1,9d;12G' file
Replace the hold space with line 1, then append lines 2 to 9 to the hold space and delete lines 1 thru 9. Print all other lines normally but on line 12 append the lines stored in the hold space to the pattern space.
Using sort actually:
$ sort -r -s -k 1.5,1.5 /tmp/lines
line10
line11
line12
line01
line02
line03
line04
line05
line06
line07
line08
line09
The -k 1.5,1.5 means I'm using only 5th character of first word for sorting. -r means reverse order and -s means stable - leaving lines that have same 5th character in the same order.

BASH: Cannot awk with a variable in a while loop

I have a Problem when trying to awk a READ input in a while loop.
This is my code:
#!/bin/bash
read -p "Please enter the Array LUN ID (ALU) you wish to query, separated by a comma (e.g. 2036,2037,2045): " ARRAY_LUNS
LUN_NUMBER=`echo $ARRAY_LUNS | awk -F "," '{ for (i=1; i<NF; i++) printf $i"\n" ; print $NF }' | wc -w`
echo "you entered $LUN_NUMBER LUN's"
s=0
while [ $s -lt $LUN_NUMBER ];
do
s=$[$s+1]
LUN_ID=`echo $ARRAY_LUNS | awk -F, '{print $'$s'}' | awk -v n1="$s" 'NR==n1'`
echo "NR $s :"
echo "awk -v n1="$s" 'NR==n1'$LUN_ID"
done
No matter what options with awk i try, i dont get it to display more than the first entry before the comma. It looks to me, like the loop has some problems to get the variable s counted upwards. But on the other hand, the code line:
LUN_ID=`echo $ARRAY_LUNS | awk -F, '{print $'$s'}' | awk -v n1="$s" 'NR==n1'`
works just great! Any idea on how to solve this. Another solution to my READ input would be just fine as well.
#!/bin/bash
typeset -a ARRAY_LUNS
IFS=, read -a -p "Please enter the Array LUN ID (ALU) you wish to query, separated by a comma (e.g. 2036,2037,2045): " ARRAY_LUNS
LUN_NUMBER="${#ARRAY_LUNS[#]}"
echo "you entered $LUN_NUMBER LUNs"
for((s=0;s<LUN_NUMBER;s++))
do
echo "LUN id $s: ${ARRAY_LUNS[s]}"
done
Why does your awk code not work?
The problem is not the counter. I said The last awk command in the pipe i.e.
awk -v n1="$s" 'NR==n1'.
This awk code tries to print the first line when s is 1, the second line when s is 2, the third line when s is 3, and so on... But how many lines are printed by echo $ARRAY_LUNS? Just ONE... there is no second line, no third line... just ONE line and just ONE line is printed.
That line contains all LUN_IDs in ONE LINE, i.e, one LUN_ID next to another LUN_ID, like this way:
34 45 21 223
NOT this way
34
45
21
223
Those LUN_IDs are fields printable by awk using $1, $2, $3, ... and so on.
Therefore if you want you code to run fine just remove that last command in the pipe:
LUN_ID=$(echo "$ARRAY_LUNS" | awk -F, '{print $'$s'}')
Please, for any further question, firstly read this awk guide

Resources