I have a text file with the following content:
20210910 ABC ZZZ EEE Rcvd Staging QV QV P
20210813_20210816_20210818
20210910 XYZ YYY EEE Rcvd Staging QV QV R
20210813_20210816
There are four rows. How can I print them as two rows? I can't work out how to write the if statement in the code below. If the logic is otherwise correct, please advise:
cat file.txt | while read n
do
if [ row number odd ]
then
column1=`echo $n | awk 'NF' | awk '{print $1}'`
column2=`echo $n | awk 'NF'| awk '{print $2}'`
...till column9
else
column10=`echo $n | awk 'NF'| awk '{print $1}'`
[Printing all columns :
echo " $column1 " >> ${tmpfn}
echo " $column2 " >> ${tmpfn}
...till column10]
fi
done
Output:
20210910 ABC ZZZ EEE Rcvd Staging QV QV P 20210813_20210816_20210818
20210910 XYZ YYY EEE Rcvd Staging QV QV R 20210813_20210816
You can do this with a single awk script:
awk '{x=$0; getline y; print x, y}' file.txt
No need for an if statement. Just call read twice each time through the loop.
while read -r line1 && read -r line2
do
printf "%s %s\n" "$line1" "$line2"
done < file.txt > "${tmpfn}"
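One caveat with the loop above: when the file has an odd number of lines, the second read fails and the last, unpaired line is silently dropped. A minimal sketch that keeps it, assuming that printing the leftover line on its own is acceptable (pair_lines is an illustrative name, not part of the answer):

```shell
#!/bin/bash
# pair_lines FILE: join every two lines of FILE with a single space.
# pair_lines is a hypothetical helper name used only for this sketch.
pair_lines() {
    local line1 line2
    # "|| [ -n ... ]" also accepts a final line that lacks a trailing newline.
    while IFS= read -r line1 || [ -n "$line1" ]; do
        if IFS= read -r line2 || [ -n "$line2" ]; then
            printf '%s %s\n' "$line1" "$line2"
        else
            # Odd number of lines: print the leftover line unpaired.
            printf '%s\n' "$line1"
        fi
    done < "$1"
}
```

It would be called as pair_lines file.txt > "${tmpfn}".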
Use this Perl one-liner (it joins each pair of lines on the tab character):
perl -lne 'chomp( $s = <> ); print join "\t", $_, $s;' file.txt > out_file.txt
For example:
seq 1 4 | perl -lne 'chomp( $s = <> ); print join "\t", $_, $s;'
1 2
3 4
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
Here,
-n and -l command line switches cause the script to read 1 line from STDIN or from file(s) on the command line (in a loop), and store it in variable $_, removing the terminal newline.
chomp( $s = <> ); : Do the same as above, and store it in variable $s.
Now you have, for example, line 1 stored in $_ and line 2 stored in $s.
print join "\t", $_, $s; : print the two lines delimited by tab.
Repeat the above.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
Related
I want to count the number of spaces at the beginning of each line. My sample text file is the following:
aaaa bbbb cccc dddd
  aaaa bbbb cccc dddd
    aaaa bbbb cccc dddd
aaaa bbbb cccc dddd
When I wrote a simple script to count them, I noticed a difference in output between the inline awk command and the full awk script.
First try
#!/bin/bash
while IFS= read -r line; do
echo "$line" | awk '
{
FS="[^ ]"
print length($1)
}
'
done < "tmp"
The output is
4
4
4
4
Second try
#!/bin/bash
while IFS= read -r line; do
echo "$line" | awk -F "[^ ]" '{print length($1)}'
done < "tmp"
The output is
0
2
4
0
I want to write a full script that gives the same output as the inline version.
Could anyone explain this difference? Thank you very much.
Fixed your first try:
$ while IFS= read -r line; do
echo "$line" | awk '
BEGIN { # you forgot the BEGIN
FS="[^ ]" # gotta set FS before record is read
}
{
print length($1)
}'
done < file
Output now:
0
2
4
0
And to speed it up, just use awk for it:
$ awk '
BEGIN {
FS="[^ ]"
}
{
print length($1)
}' file
Could you please try the following, which does not change FS? Written and tested at https://ideone.com/N8QcC8
awk '{if(match($0,/^ +/)){print RSTART+RLENGTH-1} else{print 0}}' Input_file
OR try:
awk '{match($0,/^ */); print RLENGTH}' Input_file
Output will be:
0
2
4
0
Explanation: the first solution uses an if/else condition. In the if branch, awk's match function is given a regex that matches the spaces at the start of the line, and RSTART+RLENGTH-1 is printed as the number of spaces. This works because RSTART and RLENGTH are built-in awk variables that match sets whenever it is called; here RSTART is always 1, so the sum equals RLENGTH.
The second solution, following rowboat's suggestion, simply prints RLENGTH. Because /^ */ also matches zero spaces, it prints 0 without needing an if/else.
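If only the count is needed, the same number can also be obtained in the shell itself with parameter expansion, without starting awk for every line. This is a sketch rather than a drop-in replacement: it assumes only space characters (not tabs) should be counted, and leading_spaces is a hypothetical name.

```shell
#!/bin/bash
# leading_spaces STRING: print the number of spaces at the start of STRING.
# leading_spaces is a hypothetical helper name used only for this sketch.
leading_spaces() {
    local line=$1
    # Remove the longest suffix that begins with a non-space character;
    # what remains is exactly the run of leading spaces.
    local prefix=${line%%[! ]*}
    printf '%s\n' "${#prefix}"
}

# Example: prints 0, 2 and 4 on separate lines.
printf '%s\n' 'aaaa bbbb' '  aaaa bbbb' '    aaaa bbbb' |
while IFS= read -r line; do
    leading_spaces "$line"
done
```

IFS= and -r on read matter here: without IFS=, read would strip the leading spaces before they could be counted.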
You can try Perl. Simply capture the leading spaces in a group and print its length.
"a"=~/a/ is just to reset the regex captures at the end of each line.
perl -nle ' /(^\s+)/; print length($1)+0; "a"=~/a/ ' count_space.txt
0
2
4
0
In my shell script, one of the variables contains a set of lines. I need to get two lines per iteration, because my awk logic needs both.
var contains:
12 abc
32 cdg
9 dfk
98 nhf
43 uyr
5 ytp
In the loop, the first iteration needs line1 and line2 [i.e. 12 abc \n 32 cdg], the next iteration needs line2 and line3 [32 cdg \n 9 dfk], and so on.
I tried to achieve this with:
while IFS= read -r line
do
count=`echo ${line} | awk -F" " '{print $1}'`
id=`echo ${line} | awk -F" " '{print $2}'`
read line
next_id=`echo ${line} | awk -F" " '{print $2}'`
echo ${count}
echo ${id}
echo ${next_id}
## Here, I have some other logic of awk..
done <<< "$var"
It reads line1 and line2 in the first iteration, but line3 and line4 in the second. I need it to read line2 and line3 in the second iteration. Can anyone please help with this?
Thanks in advance..
Don't have the shell script spawn three separate awk subshells per iteration when a single call to awk will do. For large inputs, the single call will be orders of magnitude faster.
You can group the messages as desired, just by saving the first line in a variable, skipping to the next record and then printing the line in the variable and the current record through the end of the file. For example, with your lines in the file named lines, you could do:
awk 'FNR==1 {line=$0; next} {print line"\n"$0"\n"; line=$0}' lines
Example Use/Output
$ awk 'FNR==1 {line=$0; next} {print line"\n"$0"\n"; line=$0}' lines
12 abc
32 cdg

32 cdg
9 dfk

9 dfk
98 nhf

98 nhf
43 uyr

43 uyr
5 ytp
(the additional line-break was simply included to show separation, the output format can be changed as desired)
You can add a counter if desired and output the count via the END rule.
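The counter mentioned above could look roughly like this. It is a sketch: the function name print_pairs, the variable names and the wording of the final summary line are assumptions, and input is read from standard input so it works with a file or a variable alike.

```shell
#!/bin/bash
# print_pairs: read lines on stdin, print each overlapping pair followed by
# a blank line, then report how many pairs were printed.
# print_pairs and the "N pairs" summary format are illustrative choices.
print_pairs() {
    awk '
        NR == 1 { prev = $0; next }      # remember the first line, print nothing yet
        {
            print prev                   # previous line
            print $0                     # current line
            print ""                     # blank separator between pairs
            prev = $0                    # slide the window by one line
            pairs++
        }
        END { print pairs + 0 " pairs" } # pairs + 0 prints 0 for short input
    '
}
```

With the data in a variable it would be called as print_pairs <<< "$var", or with a file as print_pairs < lines.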
The solution depends on what you want to do with the two lines.
My first thought was something like
sed '2,$ s/.*/&\n&/' <<< "${yourvar}"
But this won't help much when you must process two lines (I think | xargs -L2 won't help).
When you want them in a loop, try
while IFS= read -r line; do
if [ -n "${lastline}" ]; then
echo "Processing lines starting with ${lastline:0:2} and ${line:0:2}"
fi
lastline="${line}"
done <<< "${yourvar}"
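Building on the loop above, the fields the question asks for (count, id and next_id) can be split by read itself, so no awk call is needed inside the loop. This is a sketch under the assumption that each line holds exactly two whitespace-separated fields; process_pairs and next_count are hypothetical names.

```shell
#!/bin/bash
# process_pairs: read "count id" lines on stdin and print, for each line
# except the last, its count, its id and the id of the following line.
# process_pairs is a hypothetical helper name used only for this sketch.
process_pairs() {
    local count id next_count next_id
    # Prime the window with the first line; stop quietly on empty input.
    read -r count id || return 0
    while read -r next_count next_id; do
        printf '%s %s %s\n' "$count" "$id" "$next_id"
        # Slide the window: the line just read becomes the current line.
        count=$next_count
        id=$next_id
    done
}
```

It would be called as process_pairs <<< "$var"; for the sample data the first two output lines are "12 abc cdg" and "32 cdg dfk".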
I have two text files, and each file has one column with several rows:
FILE1
a
b
c
FILE2
d
e
f
I want to create a file that has the following output:
a - d
b - e
c - f
All the entries are meant to be numbers (decimals). I am completely stuck and do not know how to proceed.
Using paste seems like the obvious choice but unfortunately you can't specify a multiple character delimiter. To get around this, you can pipe the output to sed:
$ paste -d- file1 file2 | sed 's/-/ - /'
a - d
b - e
c - f
Paste joins the two files together and sed adds the spaces around the -.
If your desired output is the result of the subtraction, then you could use awk:
paste file1 file2 | awk '{ print $1 - $2 }'
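Since the entries are decimals and may be negative, note that sed 's/-/ - /' replaces the first hyphen on the line, which would be the sign of a negative first number. A sketch that lets awk do both the formatting and the subtraction instead; pair_columns and the "a - b = result" layout are assumptions for illustration, and %g prints about six significant digits:

```shell
#!/bin/bash
# pair_columns FILE1 FILE2: print "a - b = result" for each pair of lines.
# pair_columns and the output layout are illustrative, not from the answer.
pair_columns() {
    # paste joins the files with a tab, so split on tabs only.
    paste "$1" "$2" |
        awk -F'\t' '{ printf "%s - %s = %g\n", $1, $2, $1 - $2 }'
}
```

Dropping the last format item (and the " = %g" part) gives just the "a - d" layout asked for in the question.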
given:
$ cat /tmp/a.txt
1
2
3
$ cat /tmp/b.txt
4
5
6
awk is a good bet to process the two files and do arithmetic:
$ awk 'FNR==NR { a[FNR""] = $0; next } { print a[FNR""]+$1 }' /tmp/a.txt /tmp/b.txt
5
7
9
Or, if you want the strings rather than arithmetic:
$ awk 'FNR==NR { a[FNR""] = $0; next } { print a[FNR""] " - "$0 }' /tmp/a.txt /tmp/b.txt
1 - 4
2 - 5
3 - 6
Another solution using while and file descriptors :
while read -r line1 <&3 && read -r line2 <&4
do
#printf '%s - %s\n' "$line1" "$line2"
# note: $(( )) is integer-only in bash, so this line will not work for decimals
printf '%s\n' "$((line1 - line2))"
done 3<f1.txt 4<f2.txt
I have a document A which contains n lines. I also have a sequence of n integers all of which are unique and <n. My goal is to create a document B which has the same contents as A, but with reordered lines, based on the given sequence.
Example:
A:
Foo
Bar
Bat
sequence: 2,0,1 (meaning: First line 2, then line 0, then line 1)
Output (B):
Bat
Foo
Bar
Thanks in advance for the help
Another solution:
You can create a sequence file by doing (assuming sequence is comma delimited):
echo $sequence | sed s/,/\\n/g > seq.txt
Then, just do:
paste seq.txt A.txt | sort -n | sed "s/^[0-9]*\s//"
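Note that sorting the pasted sequence moves line k of A to position sequence[k], which is the inverse of what the question describes: for the example 2,0,1 it gives Bar, Bat, Foo rather than Bat, Foo, Bar. A sketch that applies the sequence in the question's sense ("first line 2, then line 0, then line 1") by inverting the permutation first; reorder_lines is a hypothetical name, and the sequence is assumed to be stored one zero-based index per line, as in seq.txt:

```shell
#!/bin/bash
# reorder_lines SEQFILE FILE: print the lines of FILE in the order given by
# the zero-based indices in SEQFILE (one per line, each used exactly once).
# reorder_lines is a hypothetical helper name used only for this sketch.
reorder_lines() {
    awk '{ print $1 "\t" NR }' "$1" |  # index <TAB> output position
        sort -n |                      # order by index, i.e. by source line
        cut -f2 |                      # keep only the output positions
        paste - "$2" |                 # output position <TAB> source line
        sort -n |                      # order by output position
        cut -f2-                       # drop the position column
}
```

It would be called as reorder_lines seq.txt A.txt > B.txt.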
Here's a bash function. The order can be delimited by anything.
Usage: schwartzianTransform "A.txt" 2 0 1
function schwartzianTransform {
local file="$1"
shift
local sequence="$@"
echo -n "$sequence" | sed 's/[^[:digit:]][^[:digit:]]*/\
/g' | paste -d ' ' - "$file" | sort -n | sed 's/^[[:digit:]]* //'
}
Read the file into an array and then use the power of indexing :
echo "Enter the input file name"
read -r ip
index=0
# IFS= and -r keep leading whitespace and backslashes in each line intact
while IFS= read -r line ; do
NAME[$index]="$line"
index=$((index+1))
done < "$ip"
echo "Enter the file having order"
read -r od
while IFS= read -r line ; do
echo "${NAME[$line]}"
done < "$od"
[aman@aman sh]$ cat test
Foo
Bar
Bat
[aman@aman sh]$ cat od
2
0
1
[aman@aman sh]$ ./order.sh
Enter the input file name
test
Enter the file having order
od
Bat
Foo
Bar
An awk one-liner could do the job:
awk -vs="$s" '{d[NR-1]=$0}END{split(s,a,",");for(i=1;i<=length(a);i++)print d[a[i]]}' file
Here $s holds your comma-separated sequence.
Take a look at this example:
kent$ seq 10 >file #get a 10 lines file
kent$ s=$(seq 0 9 |shuf|tr '\n' ','|sed 's/,$//') # get a random sequence by shuf
kent$ echo $s #check the sequence in var $s
7,9,1,0,5,4,3,8,6,2
kent$ awk -vs="$s" '{d[NR-1]=$0}END{split(s,a,",");for(i=1;i<=length(a);i++)print d[a[i]]}' file
8
10
2
1
6
5
4
9
7
3
One way (not an efficient one for big files, though):
$ seq="2 0 1"
$ for i in $seq
> do
> awk -v l="$i" 'NR==l+1' file
> done
Bat
Foo
Bar
If your file is a big one, you can use this one:
$ seq='2,0,1'
$ x=$(echo $seq | awk '{printf "%dp;", $0+1;print $0+1> "tn.txt"}' RS=,)
$ sed -n "$x" file | awk 'NR==FNR{a[++i]=$0;next}{print a[$0]}' - tn.txt
The 2nd line prepares a sed command print instruction, which is then used in the 3rd line with the sed command. This prints only the line numbers present in the sequence, but not in the order of the sequence. The awk command is used to order the sed result depending on the sequence.
I have a tab delimited file (in which number of columns in each row is not fixed) which looks like this:
chr1 92536437 92537640 NM_024813 NM_053274
From this I want a file in the following format (the first three columns are identifiers that must be repeated on each line when splitting):
chr1 92536437 92537640 NM_024813
chr1 92536437 92537640 NM_053274
Suggestions for a shell script.
#!/bin/bash
{
IFS=$'\t'
while read -r a b c rest
do
for fld in $rest
do
echo -e "$a\t$b\t$c\t$fld"
done
done
}
Note that IFS must be set to a tab character here; in bash, $'\t' avoids having to type a literal tab.
I also thought I should do a perl version:
#!/bin/perl -n
($a,$b,$c,@r)=(chomp and split /\t/); print "$a\t$b\t$c\t$_\n" for @r
To do it all from the commandline, reading from in.txt and outputting to out.txt:
perl -ne '($a,$b,$c,@r)=(chomp and split /\t/); print "$a\t$b\t$c\t$_\n" for @r' in.txt > out.txt
Of course if you save the perl script (say as script.pl)
perl script.pl in.txt > out.txt
If you also make the script file executable (chmod +x script.pl):
./script.pl in.txt > out.txt
HTH
Not shell, and the other answer is perfectly fine, but I one-lined it in Perl:
perl -F'/\s/' -lane '$,="\t"; print @F,$_ for splice @F,3' $FILE
Edit: New (even more unreadable ;) version, inspired by the other answers. Abusing perl's command line parameters and special variables for autosplitting and line ending handling.
Means: For each of the fields after the first three (for splice @F,3), print the first three and it (print @F,$_).
-F sets the field separator to \s (should be \t) for -a autosplitting into @F.
-l turns on line ending handling for -n which runs the -e code for each line of the input.
$, is the output field separator.
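The same logic (for each field after the first three, print the first three plus that field) can also be written in awk, which may be easier to read. A minimal sketch, assuming strictly tab-delimited input; split_columns is a hypothetical name:

```shell
#!/bin/bash
# split_columns [FILE...]: for every field from the 4th onward, print one
# line made of the first three fields plus that field, tab-separated.
# split_columns is a hypothetical helper name used only for this sketch.
split_columns() {
    awk -F'\t' -v OFS='\t' '
        { for (i = 4; i <= NF; i++) print $1, $2, $3, $i }
    ' "$@"
}
```

It would be called as split_columns in.txt > out.txt. Lines with only three fields produce no output.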
[Edited]
So you want to duplicate the first three columns for each remaining item?
$ cat File | while read X
do PRE=$(echo "$X" | cut -f1-3 -d ' ')
for Y in $(echo "$X" | cut -f4- -d ' ')
do echo $PRE $Y >> OutputFilename
done
done
Returns:
chr 786 789 NM
chr 786 789 NR
chr 786 789 NT
chr 123 345 NR
This cuts the first three space delimited columns as a prefix, and then abuses the fact that a for loop will step through a space delimited list to call echo.
Enjoy.
This is just a subset of your data comparison in two files question.
Extracting my slightly hacky solution from there:
for i in 4 5 6 7; do join -e _ -j $i f f -o 1.1,1.2,1.3,0; done | sed '/_$/d'