How to add number to beginning of each line? - shell

This is what I normally use to add numbers to the beginning of each line:
awk '{ print FNR " " $0 }' file
However, what I need to do is start the number at 1000001. Is there a way to start with a specific number like this instead of having to use line numbers?

There is a dedicated command for this, nl:
nl -v1000001 file
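By default nl skips empty lines, pads the number on the left, and separates it from the text with a tab. A minimal sketch of adjusting that, assuming the goal is the same "number, space, line" layout as the awk command in the question (the sample input is made up for illustration):

```shell
# Hypothetical sample input with an empty line in the middle.
input=$(printf 'alpha\n\nbeta\n')

# -ba       number all lines, including empty ones
# -v1000001 start counting at 1000001
# -w1       minimum number width of 1, so no left padding
# -s' '     separate number and text with a single space instead of a tab
numbered=$(printf '%s\n' "$input" | nl -ba -v1000001 -w1 -s' ')
printf '%s\n' "$numbered"
```

Without -ba the empty line would be passed through unnumbered and the counter would not advance for it.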

You can just add an offset of 1000000 to FNR (or NR), so that the first line is numbered 1000001:
awk '{ print (1000000 + FNR), $0 }' file
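The choice between FNR and NR only matters when awk is given more than one file: NR keeps counting across files, while FNR restarts at 1 for each file. A small sketch with two throwaway files (the file names and contents are assumptions for illustration):

```shell
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/one"
printf 'c\nd\n' > "$tmpdir/two"

# NR: one running count over both files.
by_nr=$(awk '{ print 1000000 + NR, $0 }' "$tmpdir/one" "$tmpdir/two")

# FNR: the count restarts for the second file.
by_fnr=$(awk '{ print 1000000 + FNR, $0 }' "$tmpdir/one" "$tmpdir/two")

printf 'NR:\n%s\nFNR:\n%s\n' "$by_nr" "$by_fnr"
rm -rf "$tmpdir"
```

With a single input file both variants give the same numbering.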

$ seq 5 | awk -v n=1000000 '{print ++n, $0}'
1000001 1
1000002 2
1000003 3
1000004 4
1000005 5
$ seq 5 | awk -v n=30 '{print ++n, $0}'
31 1
32 2
33 3
34 4
35 5

Related

Print first few and last few lines of file through a pipe with "..." in the middle

Problem Description
This is my file
1
2
3
4
5
6
7
8
9
10
I would like to send the cat output of this file through a pipe and receive this
% cat file | some_command
1
2
...
9
10
Attempted solutions
Here are some solutions I've tried, with their output
% cat temp | (head -n2 && echo '...' && tail -n2)
1
2
...
% cat temp | tee >(head -n3) >(tail -n3) >/dev/null
1
2
3
8
9
10
# I don't know how to get the ...
% cat temp | sed -e 1b -e '$!d'
1
10
% cat temp | awk 'NR==1;END{print}'
1
10
# Can only get 2 lines
An awk:
awk -v head=2 -v tail=2 'FNR==NR && FNR<=head
FNR==NR && cnt++==head {print "..."}
NR>FNR && FNR>(cnt-tail)' file file
Or if a single pass is important (and memory allows), you can use perl:
perl -0777 -lanE 'BEGIN{$head=2; $tail=2;}
END{say join("\n", @F[0..$head-1],("..."),@F[-$tail..-1]);}' file
Or, an awk that is one pass:
awk -v head=2 -v tail=2 'FNR<=head
{lines[FNR]=$0}
END{
print "..."
for (i=FNR-tail+1; i<=FNR; i++) print lines[i]
}' file
Or, there is nothing wrong with the direct 'caveman' approach:
head -2 file; echo "..."; tail -2 file
Any of these prints:
1
2
...
9
10
In terms of efficiency, here are some stats.
For small files (i.e., less than 10 MB or so), all of these take less than 1 second, and the 'caveman' approach takes 2 ms.
I then created a 1.1 GB file with seq 99999999 >file
Two-pass awk: 50 seconds
One-pass perl: 10 seconds
One-pass awk: 29 seconds
'Caveman': 2 ms
You may consider this awk solution:
awk -v top=2 -v bot=2 'FNR == NR {++n; next} FNR <= top || FNR > n-bot; FNR == top+1 {print "..."}' file{,}
1
2
...
9
10
Two single pass sed solutions:
sed '1,2b
3c\
...
N
$!D'
and
sed '1,2b
3c\
...
$!{h;d;}
H;g'
Assumptions:
as OP has stated, a solution must be able to work with a stream from a pipe
the total number of lines coming from the stream is unknown
if the total number of lines is less than the sum of the head/tail offsets then we'll print duplicate lines (we can add more logic if OP updates the question with more details on how to address this situation)
A single-pass awk solution that implements a queue in awk to keep track of the most recent N lines; the queue allows us to limit awk's memory usage to just N lines (as opposed to loading the entire input stream into memory, which could be problematic when processing a large volume of lines/data on a machine with limited available memory):
h=2 t=3
cat temp | awk -v head=${h} -v tail=${t} '
{ if (NR <= head) print $0
lines[NR % tail] = $0
}
END { print "..."
if (NR < tail) i=0
else i=NR
do { i=(i+1)%tail
print lines[i]
} while (i != (NR % tail) )
}'
This generates:
1
2
...
8
9
10
Demonstrating the overlap issue:
$ cat temp4
1
2
3
4
With h=3;t=3 the proposed awk code generates:
$ cat temp4 | awk -v head=${h} -v tail=${t} '...'
1
2
3
...
2
3
4
Whether or not this is the 'correct' output will depend on OP's requirements.
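If duplicate lines are not wanted, the extra logic mentioned in the assumptions could look roughly like the sketch below. It is one possible reading of the requirement, not OP's stated behaviour: the "..." is printed only when lines were actually skipped, and a line already printed as part of the head is never repeated. The names head_tail, buf and start are made up for this example, and tail is assumed to be at least 1.

```shell
# head_tail HEAD TAIL: read stdin, print the first HEAD and last TAIL lines.
head_tail() {
    awk -v head="$1" -v tail="$2" '
        NR <= head { print; next }           # leading lines go straight out
        { buf[NR % tail] = $0 }              # ring buffer of the latest lines
        END {
            if (NR > head + tail) print "..."    # only when lines were skipped
            start = NR - tail + 1
            if (start <= head) start = head + 1  # never reprint a head line
            for (i = start; i <= NR; i++) print buf[i % tail]
        }'
}

seq 10 | head_tail 2 3
seq 4 | head_tail 3 3
```

Like the queue-based version above, this keeps at most tail lines in memory, so it still works on a stream of unknown length.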
With bash, I suggest:
(head -n 2; echo "..."; tail -n 2) < file
Output:
1
2
...
9
10

Finding difference of values between corresponding fields in two CSV files

I have been trying to find difference of values between corresponding fields in two CSV files
$ cat f1.csv
A,B,25,35,50
C,D,30,40,36
$
$ cat f2.csv
E,F,20,40,50
G,H,22,40,40
$
Desired output:
5 -5 0
8 0 -4
I was able to achieve it like this:
$ paste -d "," f1.csv f2.csv
A,B,25,35,50,E,F,20,40,50
C,D,30,40,36,G,H,22,40,40
$
$ paste -d "," f1.csv f2.csv | awk -F, '{print $3-$8 " " $4-$9 " " $5-$10 }'
5 -5 0
8 0 -4
$
Is there any better way to achieve it with awk alone without paste command?
As a first step, replace only paste with awk:
awk -F ',' 'NR==FNR {file1[FNR]=$0; next} {print file1[FNR] FS $0}' f1.csv f2.csv
Output:
A,B,25,35,50,E,F,20,40,50
C,D,30,40,36,G,H,22,40,40
Then split file1[FNR] FS $0 into an array, with , as the field separator:
awk -F ',' 'NR==FNR {file1[FNR]=$0; next} {split(file1[FNR] FS $0, arr, FS); print arr[3]-arr[8], arr[4]-arr[9], arr[5]-arr[10]}' f1.csv f2.csv
Output:
5 -5 0
8 0 -4
From man awk:
FNR: The input record number in the current input file.
NR: The total number of input records seen so far.
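The same NR==FNR idiom can be written without hard-coding the column numbers. The following is only a sketch, assuming the numeric columns start at field 3 and both files have the same layout; the temporary files just reproduce the sample data from the question, and the variable names are made up:

```shell
tmpdir=$(mktemp -d)
printf 'A,B,25,35,50\nC,D,30,40,36\n' > "$tmpdir/f1.csv"
printf 'E,F,20,40,50\nG,H,22,40,40\n' > "$tmpdir/f2.csv"

diffs=$(awk -F, '
    NR == FNR { row[FNR] = $0; next }         # first file: remember each line
    {
        n = split(row[FNR], a, FS)             # fields of the matching f1 line
        out = ""; sep = ""
        for (i = 3; i <= NF && i <= n; i++) {  # numeric columns in both lines
            out = out sep (a[i] - $i)
            sep = " "
        }
        print out
    }' "$tmpdir/f1.csv" "$tmpdir/f2.csv")

printf '%s\n' "$diffs"
rm -rf "$tmpdir"
```

NR == FNR is true only while the first file is being read, which is what lets one awk process treat the two files differently.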
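Another way, using nl and awk (the stable numeric sort on the line number keeps each f1.csv line ahead of its f2.csv counterpart):
$ (nl f1.csv; nl f2.csv) | sort -s -k1,1n | awk -F, '{a1=$3; a2=$4; a3=$5; getline; print a1-$3, a2-$4, a3-$5}'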
5 -5 0
8 0 -4
$

Linux - loop through each element on each line

I have a text file with the following information:
cat test.txt
a,e,c,d,e,f,g,h
d,A,e,f,g,h
I wish to iterate through each line and, for each line, print the index of every character that is different from e. The ideal output would be either tab-separated or comma-separated:
1 3 4 6 7 8
1 2 4 5 6
or
1,3,4,6,7,8
1,2,4,5,6
I have managed to iterate through each line and print the indexes, but the results are all printed on the same line instead of one line per input line.
while read line;do echo "$line" | awk -F, -v ORS=' ' '{for(i=1;i<=NF;i++) if($i!="e") {print i}}' ;done<test.txt
With the result being
1 3 4 6 7 8 1 2 4 5 6
If I do it only using awk
awk -F, -v ORS=' ' '{for(i=1;i<=NF;i++) if($i!="e") {print i}}'
I get the same output.
Could anyone help me with this specific issue of separating the lines?
If you don't mind some trailing whitespace, you can just do:
while read line;do echo "$line" | awk -F, '{for(i=1;i<=NF;i++) if($i!="e") {printf i " "}; print ""}' ;done<test.txt
but it would be more typical to omit the while loop and do:
awk -F, '{for(i=1;i<=NF;i++) if($i!="e") {printf i " "}; print ""}' <test.txt
You can avoid the trailing whitespace with the slightly cryptic:
awk -F, '{m=0; for(i=1;i<=NF;i++) if($i!="e") {printf "%s%d", m++ ? " " : "", i }; print ""}' <test.txt
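For the comma-separated form from the question, one option is to build each output line in a variable and print it once. This is a sketch only; the variable names out, d and sep are made up, and the input is the sample data from the question:

```shell
indexes=$(printf 'a,e,c,d,e,f,g,h\nd,A,e,f,g,h\n' |
    awk -F, -v sep=, '{
        out = ""; d = ""                 # d is empty before the first index
        for (i = 1; i <= NF; i++)
            if ($i != "e") { out = out d i; d = sep }
        print out                        # one output line per input line
    }')
printf '%s\n' "$indexes"
```

Passing sep='\t' instead should give the tab-separated variant, since awk interprets escape sequences in -v assignments.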

Variable in commands in bash

I wrote a program that should print the words from example.txt from the longest to the shortest. I don't know exactly how '^.{$v}$' should look to make it work.
#!/bin/bash
v=30
while [ $v -gt 0 ] ; do
grep -P '^.{$v}$' example.txt
v=$(($v - 1))
done
I tried:
${v}
$v
"$v"
It is my first question, sorry for any mistake :)
What you're doing is not how you'd approach this problem in shell. Read "Why is using a shell loop to process text considered bad practice?" to learn some of the issues; this is how you'd really do what you're trying to do in a shell script:
$ cat file
now
is
the
winter
of
our
discontent
$ awk -v OFS='\t' '{print length($0), NR, $0}' file | sort -k1rn -k2n | cut -f3-
discontent
winter
now
the
our
is
of
To understand what that's doing, look at the awk output:
$ awk -v OFS='\t' '{print length($0), NR, $0}' file
3 1 now
2 2 is
3 3 the
6 4 winter
2 5 of
3 6 our
10 7 discontent
The first number is the length of each line and the second number is the order the lines appeared in the input file so when we come to sort it:
$ awk -v OFS='\t' '{print length($0), NR, $0}' file | sort -k1rn -k2n
10 7 discontent
6 4 winter
3 1 now
3 3 the
3 6 our
2 2 is
2 5 of
we can sort by length (longest first) with -k1rn but retain the order from the input file for lines that are the same length by adding -k2n. Then the cut just removes the 2 leading numbers that awk added for sort to use.
Use double quotes, so that the shell expands $v before grep sees the pattern:
grep -P "^.{$v}$" example.txt
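The difference between the two quoting styles can be checked directly. The sketch below uses grep -E rather than -P, since this pattern needs nothing Perl-specific, and a made-up word list in place of example.txt:

```shell
v=3
single='^.{$v}$'   # single quotes: $v stays literal
double="^.{$v}$"   # double quotes: $v is replaced by its value
printf '%s\n%s\n' "$single" "$double"

# The loop from the question with the quoting fixed, longest words first.
words=$(printf 'now\nis\nthe\nwinter\n')   # hypothetical sample data
sorted=$(
    v=30
    while [ "$v" -gt 0 ]; do
        # grep exits with status 1 when no word has this length; ignore that.
        printf '%s\n' "$words" | grep -E "^.{$v}$" || true
        v=$((v - 1))
    done
)
printf '%s\n' "$sorted"
```

The first printf shows that only the double-quoted string contains the number 3; the trailing $ is left alone because it is not followed by a name.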

awk space delimiter with empty content

I have a text file which is delimited by space
1 dsfsdf 2
2  3
4 sdfsdf 4
5 sdfsdf 5
When I run
awk -F' ' '{s+=$3} END {print s}' test
It returns 11. It should return 14. I believe awk gets confused by the second line, where there is nothing between the two spaces. How should I modify my command?
Thanks
Try:
awk -F' {1}' '{s+=$3} END {print s}' test
You get:
14
Note: if the test file contains
1 dsfsdf 2 1
2  3 1
4 sdfsdf 4 1
5 sdfsdf 5 1
it works as well. I use GNU awk.
Edit: as @Ed_Morton and @"(9 )*" say, it is better to use a bracket expression for a literal space, [ ]:
awk -F'[ ]' '{s+=$3} END {print s}' test
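The reason the original command returns 11 is that a field separator of a single space is awk's default, which treats any run of blanks as one separator, so the empty second field disappears. A short sketch comparing the two behaviours on the question's data, assuming the second line really contains two consecutive spaces:

```shell
data=$(printf '1 dsfsdf 2\n2  3\n4 sdfsdf 4\n5 sdfsdf 5\n')

# Default splitting: runs of blanks collapse, so line 2 has only two fields.
default_sum=$(printf '%s\n' "$data" | awk '{ s += $3 } END { print s }')

# [ ] is a regular expression matching exactly one space, so the empty
# field between the two spaces is kept and 3 stays in the third field.
single_sum=$(printf '%s\n' "$data" | awk -F'[ ]' '{ s += $3 } END { print s }')

printf 'default: %s\nsingle space: %s\n' "$default_sum" "$single_sum"
```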
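This should work too if only the second column has missing values, because the column to sum is then always the last field ($NF); with the four-column variant it would be $(NF-1):
awk '{s+=$NF} END{print s}' test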
