awk inline command and full script have different output - bash

I want to count the number of leading spaces at the beginning of each line. My sample text file is the following:
aaaa bbbb cccc dddd
  aaaa bbbb cccc dddd
    aaaa bbbb cccc dddd
aaaa bbbb cccc dddd
Now when I write a simple script to count them, I notice a difference between the output of the inline awk command and that of the full awk script.
First try
#!/bin/bash
while IFS= read -r line; do
echo "$line" | awk '
{
FS="[^ ]"
print length($1)
}
'
done < "tmp"
The output is
4
4
4
4
Second try
#!/bin/bash
while IFS= read -r line; do
echo "$line" | awk -F "[^ ]" '{print length($1)}'
done < "tmp"
The output is
0
2
4
0
I want to write a full script that produces the same output as the inline version.
Could anyone explain this difference? Thank you very much.

Fixed your first try:
$ while IFS= read -r line; do
echo "$line" | awk '
BEGIN { # you forgot the BEGIN
FS="[^ ]" # gotta set FS before record is read
}
{
print length($1)
}'
done < file
Output now:
0
2
4
0
And to speed it up, just use awk for it:
$ awk '
BEGIN {
FS="[^ ]"
}
{
print length($1)
}' file
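Why the placement matters: awk splits a record into fields as soon as the record is read, so an FS assigned inside the main rule only applies from the next record on. Since the original loop starts a fresh awk for every line, that next record never arrives. A minimal sketch of the difference follows; the leading_spaces_late and leading_spaces_early names are made up for this illustration, and the sample line is assumed to start with two spaces.

```shell
# Hypothetical helper names, used only for this illustration.

# FS is assigned in the main rule: by then the current record has
# already been split with the default FS (whitespace).
leading_spaces_late() {
    awk '{ FS = "[^ ]"; print length($1) }'
}

# FS is assigned with -v, that is, before any input is read
# (same effect as a BEGIN block or the -F option).
leading_spaces_early() {
    awk -v FS='[^ ]' '{ print length($1) }'
}

# One awk per line, as in the question: the new FS never takes effect.
printf '%s\n' '  aaaa bbbb' | leading_spaces_late    # prints 4

# FS is in place before the line is split.
printf '%s\n' '  aaaa bbbb' | leading_spaces_early   # prints 2
```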

Could you please try the following, which does not change FS. Written and tested at https://ideone.com/N8QcC8
awk '{if(match($0,/^ +/)){print RSTART+RLENGTH-1} else{print 0}}' Input_file
OR try:
awk '{match($0,/^ */); print RLENGTH}' Input_file
Output will be:
0
2
4
0
Explanation: the first solution uses a simple if/else. In the if part, awk's match function is given a regex that matches the leading spaces of the line, and RSTART+RLENGTH-1 is printed as the number of spaces. This works because RSTART and RLENGTH are built-in awk variables that are set whenever match is called: on a match they hold the start position and the length of the matched text (here RSTART is always 1, so the sum equals RLENGTH).
The second solution, as per rowboat's suggestion, simply prints RLENGTH. Since /^ */ also matches zero spaces, it prints 0 without needing an if/else.
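To see why the first version needs the else branch, it can help to print RSTART and RLENGTH directly. The sketch below uses two made-up lines and a hypothetical show_match helper. When /^ +/ does not match, awk sets RSTART to 0 and RLENGTH to -1, so RSTART+RLENGTH-1 would give -2 rather than 0; /^ */ always matches, possibly with length 0.

```shell
# Hypothetical helper: print "RSTART RLENGTH" for /^ +/ and for /^ */.
show_match() {
    awk '{
        match($0, /^ +/); plus = RSTART " " RLENGTH
        match($0, /^ */); star = RSTART " " RLENGTH
        print plus " | " star
    }'
}

printf '%s\n' 'aaaa' '  aaaa' | show_match
# 0 -1 | 1 0
# 1 2 | 1 2
```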

You can try Perl. Simply capture the leading spaces in a group and print its length.
"a"=~/a/ is just to reset the regex captures at the end of each line.
perl -nle ' /(^\s+)/; print length($1)+0; "a"=~/a/ ' count_space.txt
0
2
4
0
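For short inputs the count can also be done in the shell itself, without awk or Perl. This is only a sketch of an alternative, not something taken from the answers above; count_leading_spaces is a hypothetical helper name.

```shell
# Hypothetical helper: print the number of leading spaces of its argument.
count_leading_spaces() {
    local line=$1
    # Remove the longest suffix that starts at the first non-space
    # character; what remains is the run of leading spaces.
    local prefix=${line%%[! ]*}
    printf '%s\n' "${#prefix}"
}

printf '%s\n' 'aaaa bbbb' '  aaaa bbbb' '    aaaa bbbb' |
while IFS= read -r line; do
    count_leading_spaces "$line"
done
# 0
# 2
# 4
```

Starting one awk or Perl process for the whole file is still likely to be faster on large inputs than a shell loop.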

Related

How to echo each two lines in one line [duplicate]

I have one txt file with below content:
20210910 ABC ZZZ EEE Rcvd Staging QV QV P
20210813_20210816_20210818
20210910 XYZ YYY EEE Rcvd Staging QV QV R
20210813_20210816
There are four rows. How can I echo them as two rows? I can't work out how to write the if statement in the code below. If the logic is correct, please advise:
cat file.txt | while read n
do
if [ row number odd ]
then
column1=`echo $n | awk 'NF' | awk '{print $1}'`
column2=`echo $n | awk 'NF'| awk '{print $2}'`
...till column9
else
column10=`echo $n | awk 'NF'| awk '{print $1}'`
[Printing all columns :
echo " $column1 " >> ${tmpfn}
echo " $column2 " >> ${tmpfn}
...till column10]
fi
done
Output:
20210910 ABC ZZZ EEE Rcvd Staging QV QV P 20210813_20210816_20210818
20210910 XYZ YYY EEE Rcvd Staging QV QV R 20210813_20210816
You can do this with a single awk script:
awk '{x=$0; getline y; print x, y}' file.txt
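One caveat with the getline version: if the file has an odd number of lines, the final getline fails and y still holds the value from the previous pair. Below is a sketch of a variant that pairs lines by record number instead and prints a trailing unpaired line on its own; pair_lines is a hypothetical name.

```shell
# Hypothetical helper: join every two input lines with a space.
pair_lines() {
    awk '
        NR % 2 { first = $0; next }         # odd record: remember it
               { print first, $0 }          # even record: print the pair
        END    { if (NR % 2) print first }  # leftover line, if any
    ' "$@"
}

printf '%s\n' 1 2 3 | pair_lines
# 1 2
# 3
```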
No need for an if statement. Just call read twice each time through the loop.
while read -r line1 && read -r line2
do
printf "%s %s\n" "$line1" "$line2"
done < file.txt > "${tmpfn}"
Use this Perl one-liner (it joins each pair of lines on the tab character):
perl -lne 'chomp( $s = <> ); print join "\t", $_, $s;' file.txt > out_file.txt
For example:
seq 1 4 | perl -lne 'chomp( $s = <> ); print join "\t", $_, $s;'
1 2
3 4
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
Here,
-n and -l command line switches cause the script to read 1 line from STDIN or from file(s) on the command line (in a loop), and store it in variable $_, removing the terminal newline.
chomp( $s = <> ); : Do the same as above, and store it in variable $s.
Now you have, for example, line 1 stored in $_ and line 2 stored in $s.
print join "\t", $_, $s; : print the two lines delimited by tab.
Repeat the above.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches

How to read two lines in shell script (In single iteration line1, line2 - Next iteration it should take line2,line3.. so on)

In my shell script, one of the variables contains a set of lines. I need to get two lines per iteration, because my awk logic needs both.
var contains:
12 abc
32 cdg
9 dfk
98 nhf
43 uyr
5 ytp
Here, in a loop, I need line1 and line2 [i.e. 12 abc \n 32 cdg] in the first iteration, line2 and line3 [32 cdg \n 9 dfk] in the next iteration, and so on.
I tried to achieve this with:
while IFS= read -r line
do
count=`echo ${line} | awk -F" " '{print $1}'`
id=`echo ${line} | awk -F" " '{print $2}'`
read line
next_id=`echo ${line} | awk -F" " '{print $2}'`
echo ${count}
echo ${id}
echo ${next_id}
## Here, I have some other logic of awk..
done <<< "$var"
It reads line1 and line2 in the first iteration, and line3 and line4 in the second. But I need it to read line2 and line3 in the second iteration. Can anyone please help me with this?
Thanks in advance.
Don't have a shell script spawn 3 separate subshells running awk per iteration when a single call to awk will do. It will be orders of magnitude faster for large input files.
You can group the messages as desired, just by saving the first line in a variable, skipping to the next record and then printing the line in the variable and the current record through the end of the file. For example, with your lines in the file named lines, you could do:
awk 'FNR==1 {line=$0; next} {print line"\n"$0"\n"; line=$0}' lines
Example Use/Output
$ awk 'FNR==1 {line=$0; next} {print line"\n"$0"\n"; line=$0}' lines
12 abc
32 cdg
32 cdg
9 dfk
9 dfk
98 nhf
98 nhf
43 uyr
43 uyr
5 ytp
(the additional line-break was simply included to show separation, the output format can be changed as desired)
You can add a counter if desired and output the count via the END rule.
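If the goal is the count, id and next id from the question rather than the raw lines, the same remember-the-previous-record idea can be applied to fields. The sketch below assumes the two-column layout shown in the question; the awk variable names and the "pairs:" summary line are made up for illustration.

```shell
var='12 abc
32 cdg
9 dfk
98 nhf
43 uyr
5 ytp'

result=$(printf '%s\n' "$var" | awk '
    NR > 1 { print count, id, $2; pairs++ }  # previous count and id, current id
           { count = $1; id = $2 }           # remember this record for the next one
    END    { print "pairs: " (pairs + 0) }   # counter reported from the END rule
')

printf '%s\n' "$result"
# 12 abc cdg
# 32 cdg dfk
# 9 dfk nhf
# 98 nhf uyr
# 43 uyr ytp
# pairs: 5
```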
The solution depends on what you want to do with the two lines.
My first thought was something like
sed '2,$ s/.*/&\n&/' <<< "${yourvar}"
But this won't help much when you must process two lines (I think | xargs -L2 won't help).
When you want them in a loop, try
while IFS= read -r line; do
if [ -n "${lastline}" ]; then
echo "Processing lines starting with ${lastline:0:2} and ${line:0:2}"
fi
lastline="${line}"
done <<< "${yourvar}"

Use bash to cut lines in one file to lengths explicitly stated in another

I have one file that is a list of numbers, and another file (same number of lines) in which I need the length of each line to match the number of the line in the other file. For example:
file 1:
5
8
7
11
15
file 2:
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
output:
abcde
abcdefgh
abcdefg
abcdefghijk
abcdefghijklmno
I've tried using awk and cut together but I keep getting the error "fatal: attempt to use array `line' in a scalar context". I'm not sure how else to go about this. Any guidance is much appreciated!
awk is probably more appropriate, but you can also do:
while read line <&3; do
read len <&4; echo "${line:0:$len}";
done 3< file2 4< file1
awk is your tool for this: one of
# read all the lengths, then process file2
awk 'NR == FNR {len[NR] = $1; next} {print substr($0, 1, len[FNR])}' file1 file2
# fetch a line from file1 whilst processing file2
awk '{getline len < lenfile; print substr($0, 1, len)}' lenfile=file1 file2
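Here is a self-contained way to try the first awk variant, with throwaway files created by mktemp standing in for file1 and file2. The cut_to_lengths name is hypothetical.

```shell
# Hypothetical helper: $1 = file with lengths, $2 = file with text lines.
cut_to_lengths() {
    awk 'NR == FNR { len[FNR] = $1; next }     # first file: remember each length
         { print substr($0, 1, len[FNR]) }     # second file: cut the matching line
    ' "$1" "$2"
}

lengths=$(mktemp) && text=$(mktemp) || exit 1
printf '%s\n' 5 8 7 > "$lengths"
printf '%s\n' abcdefghijklmnopqrstuvwxyz abcdefghijklmnopqrstuvwxyz \
    abcdefghijklmnopqrstuvwxyz > "$text"

cut_to_lengths "$lengths" "$text"
# abcde
# abcdefgh
# abcdefg

rm -f "$lengths" "$text"
```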
another awk
$ paste file1 file2 | awk '{print substr($2,1,$1)}'
abcde
abcdefgh
abcdefg
abcdefghijk
abcdefghijklmno
Using Perl
perl -lne ' BEGIN { open($f,"file1.txt");@x=<$f>;close($f) }
print substr($_,0,$x[$.-1]) ' file2.txt
with the given inputs
$ cat cmswen1.txt
5
8
7
11
15
$ cat cmswen2.txt
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
abcdefghijklmnopqrstuvwxyz
$ perl -lne ' BEGIN { open($f,"cmswen1.txt");@x=<$f>;close($f) } print substr($_,0,$x[$.-1]) ' cmswen2.txt
abcde
abcdefgh
abcdefg
abcdefghijk
abcdefghijklmno
$

Read file and pipe it to awk does not print the expected results

Suppose I have a file test.txt that contains 3 lines:
1
2
3
If I run
cat test.txt| while read a ; do awk -v c=$a '{print c}' ;done
It will print these values.
1
1
But if I run
cat test.txt| awk '{a=$0; print a}'
It will behave as expected.
1
2
3
Any explanation?
Thanks
In your first version awk is not getting stdin as you intended.
Copying from the comment by @William Purcell: "The read command reads the first line of input, and awk reads the next 2 (the 2 and the 3). That's why you see two lines of output."
For those two lines, the variable c has already been initialized to 1 (from the first read).
If you wrap your statement in a BEGIN block, it will work as intended.
$ seq 3 | while IFS= read -r a; do awk -v c="$a" 'BEGIN{print c}'; done
However, that's a rather inefficient way of doing things.
Let awk read from the variable that has been set, not from the loop's stdin.
seq 3 | while IFS= read -r a; do awk '{print}' <<<"$a"; done
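The stdin sharing described above can be made visible by letting awk report what it actually reads. A small sketch follows; show_stdin_sharing is a hypothetical name.

```shell
# Hypothetical helper: reproduce the loop from the question, but make
# awk print both the variable it was given and the line it read.
show_stdin_sharing() {
    printf '%s\n' 1 2 3 | while IFS= read -r a; do
        # awk inherits the stdin of the loop, so it consumes the
        # lines that read has not taken yet.
        awk -v c="$a" '{ print "c=" c ", awk read " $0 }'
    done
}

show_stdin_sharing
# c=1, awk read 2
# c=1, awk read 3
```

After awk has drained the pipe, the next read hits end of file and the loop stops after a single iteration.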

Find and update(append) csv with shell script

Input file:
ID,Name,Values
1,A,vA|A2
2,B,VB
Expected output:
1,A,vA|VA2|vA3
2,B,VB
Search the file for a given ID and then append a given value to the Values field.
Use case: append 'testvalue' to the Values field of ID = 1.
The problem is: how to cache the line found?
sed's s can be used for substitution; I tried sed's p (print), but to no avail.
Just set n to ID of the row you want to update and x to the value:
# vA3 to entry with ID==1
$ awk -F, '$1==n{$0=$0"|"x}1' n=1 x="vA3" file
ID,Name,Values
1,A,vA|A2|vA3
2,B,VB
# TEST_VALUE to entry with ID==2
$ awk -F, '$1==x{$0=$0"|"v}1' x=2 v="TEST_VALUE" file
ID,Name,Values
1,A,vA|A2
2,B,VB|TEST_VALUE
Explanation:
-F, sets the field separator to be a comma.
$1==x checks whether the line we are looking at has the ID we want to change, where $1 is the first field of each line and x is the variable we define (n in the first command).
If that condition is true, the following block is executed: {$0=$0"|"v}, where $0 is the variable containing the whole line, so we are just appending the string "|" and the value of the variable v (x in the first command) to the end of the line.
The trailing 1 is just an awk shortcut for printing the line. The 1 is a condition that always evaluates to true, and since no block is provided, awk executes the default block {print $0}. Written out explicitly, the first command would be awk -F, '$1==n{$0=$0"|"x}{print $0}' n=1 x="vA3" file.
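The same one-liner can be wrapped in a small function so that the ID and the value arrive as arguments. This is a sketch only: append_value is a hypothetical name, and because the values are passed with -v, awk interprets backslash escapes in them.

```shell
# Hypothetical helper: $1 = ID to match, $2 = value to append, $3 = CSV file.
append_value() {
    awk -F, -v id="$1" -v val="$2" '$1 == id { $0 = $0 "|" val } 1' "$3"
}

csv=$(mktemp) || exit 1
printf '%s\n' 'ID,Name,Values' '1,A,vA|A2' '2,B,VB' > "$csv"

append_value 1 vA3 "$csv"
# ID,Name,Values
# 1,A,vA|A2|vA3
# 2,B,VB

rm -f "$csv"
```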
The following script does something similar to what you need. It is in pure bash.
#!/usr/bin/bash
[ $# -ne 2 ] && echo "Arg missing" && exit 1
# Read infile line by line; append |VALUE to the line whose first field equals ID.
while IFS= read -r l; do
[ "${l%%,*}" = "$1" ] && l="$l|$2"
printf '%s\n' "$l"
done <infile
You can use it as script <ID> <VALUE>. Example:
$ ./script 1 va3
ID,Name,Values
1,A,vA|A2|va3
2,B,VB
$ cat infile
ID,Name,Values
1,A,vA|A2
2,B,VB
Or maybe this?
awk '/vA/ { $NF=$NF"|VA2" } 1' FS=, OFS=,
$ echo "1,A,vA
2,B,VB" | awk '/vA/ { $NF=$NF"|VA2" } 1' FS=, OFS=,
1,A,vA|VA2
2,B,VB
Edit 1: GNU awk has supported in-place editing since version 4.1.0 (-i inplace). But for your requirement it is best to go with the sed solution that Kent has posted.
$ cat file
ID,Name,Values
1,A,vA|A2
2,B,VB
$ awk '$1==1 { $NF=$NF"|vA3" } 1' FS=, OFS=, file
ID,Name,Values
1,A,vA|A2|vA3
2,B,VB
Are you looking for this?
kent$ echo "1,A,vA
2,B,VB"|sed '/vA/s/$/|VA2/'
1,A,vA|VA2
2,B,VB
EDIT: check the ID, then replace
kent$ echo "ID,Name,Values
1,A,vA|A2
2,B,VB"|sed 's/^1,.*/&|vA3/'
ID,Name,Values
1,A,vA|A2|vA3
2,B,VB
& means the matched part. That would be what you meant by "cache".
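A parameterized form of the same substitution, as a sketch. sed_append is a hypothetical name; the ID and the value are pasted into the sed script unescaped, so this assumes they contain no slashes, ampersands, backslashes or regex metacharacters.

```shell
# Hypothetical helper: $1 = ID, $2 = value; reads the CSV on stdin.
sed_append() {
    # & in the replacement stands for the whole matched line.
    sed "s/^$1,.*/&|$2/"
}

printf '%s\n' 'ID,Name,Values' '1,A,vA|A2' '2,B,VB' | sed_append 1 vA3
# ID,Name,Values
# 1,A,vA|A2|vA3
# 2,B,VB
```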
sed ' 1 a\ |VA2|vA3 ' file1.txt
