Add sequential number at the beginning of files - bash

I have 5 files. I want to add a sequential number and a tab at the beginning of each line, but the numbering in the second file should continue from the last number of the first file, and so on. Here's an example:
file1
line1
line2
....
line13
file2
line1
line2
file5
line1
line2
Output file1
1 line1
........
13 line13
output file2
14 line1
15 line2
And so on

If you want to concatenate the files and number the lines, use cat:
cat -n file1 file2 file3 file4 file5
If you want to create a separate output file for each input file, use awk:
awk '{
printf "%d\t%s\n",NR,$0 > ("output_"FILENAME)
}' file1 file2 file3 file4 file5
This reads file1..5, numbers the lines, and writes them to output_file1..5. Note that with a large number of input files this awk command can fail with an error such as "too many open files". In that case use the following version, which closes the previous output file whenever the input file changes:
awk '
FILENAME!=f{close("output_"f);f=FILENAME}
{printf "%d\t%s\n",NR,$0 > ("output_"f)}
' file1 file2 file3 file4 file5
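If the numbered lines should end up back in the original files, as the question seems to ask, one option is to rename each output file over its input once awk has finished. A minimal sketch, assuming the output_ prefix from the answer above; the scratch directory and the two small sample files are created here only for illustration:

```shell
# Work in a scratch directory so no real files are touched.
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'line%d\n' 1 2 3 > file1
printf 'line%d\n' 1 2 > file2

# NR keeps counting across input files, so file2 continues at 4.
awk '
FILENAME != f { if (f != "") close("output_" f); f = FILENAME }
{ printf "%d\t%s\n", NR, $0 > ("output_" f) }
' file1 file2

# Replace each original with its numbered copy.
for name in file1 file2; do
    mv "output_$name" "$name"
done

cat file2
```

The rename happens only after awk has read every input, so no file is overwritten while it is still being read.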

Related

pick a string from file1, search in file2 and get the other values from file2 and paste in file1

File1:
string1,string2,string3
File2:
string2,value1,value2,value3,value4
Output:
string1,string2,value1,value2,string3
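No answer is given for this one. A possible sketch with awk, assuming the fields are comma-separated, that the second field of File1 is the lookup key, and that only the first two values from the matching File2 line are wanted (all of this is inferred from the sample output, not stated in the question):

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf 'string1,string2,string3\n' > file1
printf 'string2,value1,value2,value3,value4\n' > file2

# First pass (file2): remember value1,value2 under the key in field 1.
# Second pass (file1): append the remembered values to field 2.
result=$(awk -F, -v OFS=, '
NR == FNR { vals[$1] = $2 OFS $3; next }
($2 in vals) { $2 = $2 OFS vals[$2] }
{ print }
' file2 file1)

printf '%s\n' "$result"
```

Lines of file1 whose key has no match in file2 are printed unchanged.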

How to copy lines one by one from a file and paste them into another file after every n lines using a shell script

Say I have file1 with content
line1
line2
line3
and another file2 with content
lineA
lineB
lineC
lineD
lineE
lineF
lineG
lineH
lineI
I want to make file2 as
lineA
lineB
lineC
line1
lineD
lineE
lineF
line2
lineG
lineH
lineI
line3
Here is a way to do it with paste
cat file2 | paste -d'\n' - - - file1
The dash argument for paste means to read from the standard input, which is the cat file2 output, while the fourth argument is file1. So, with three dashes, we will paste every 3 lines of one file with 1 from another and the delimiter is the newline character (-d'\n').
This also works when either file has lines left over, because paste keeps going after one of its inputs reaches EOF. In that case it may print a few empty lines, which you can remove by piping the output through a filter (assuming your files contain no genuinely empty lines), for example:
cat file2 | paste -d'\n' - - - file1 | sed '/^$/d'
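To see the padding behaviour described above, here is a small sketch with inputs that do not divide evenly. The sample files (one line in file1, four in file2) are made up for the demonstration:

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' line1 > file1
printf '%s\n' lineA lineB lineC lineD > file2

# Without the filter, the incomplete last group is padded with empty lines.
raw_count=$(paste -d'\n' - - - file1 < file2 | wc -l)

# sed drops the padding (and any genuinely empty lines as well).
clean=$(paste -d'\n' - - - file1 < file2 | sed '/^$/d')

echo "unfiltered line count: $raw_count"
printf '%s\n' "$clean"
```

Redirecting with < file2 is equivalent to the cat file2 | form used in the answer.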
This Python code will do it. The parameters in your case would be:
python interlace.py file1 file2 file3 3
If you want the change to be in place, I would suggest running mv file3 file2 afterward. Writing directly to file2 before everything has been read from it could overwrite data that is still needed.
import sys

if len(sys.argv[1:]) == 4:
    file1 = open(sys.argv[1], 'r')   # lines to insert
    file2 = open(sys.argv[2], 'r')   # file to insert into
    file3 = open(sys.argv[3], 'w')   # output file
    line_count = int(sys.argv[4])    # insert after every line_count lines
    current_counter = 0
    for file2_line in file2.readlines():
        current_counter += 1
        file3.write(file2_line)
        if current_counter == line_count:
            # readline() returns '' once file1 is exhausted, so nothing is written
            file3.write(file1.readline())
            current_counter = 0
    # append whatever is left of file1
    for file1_line in file1.readlines():
        file3.write(file1_line)
    file3.close()
This also works in the cases where file1 runs out of lines early, in which case file2's lines continue as normal, and when file1 has extra lines they just get added to the end.
This might work for you (GNU sed):
n=3
sed "$n~$n"'R file1' file2
After the third line and subsequently every third line of file2, append a line from file1.
Using awk and getline:
awk '1;NR%3==0{if((getline < "file1")>0)print}' file2
lineA
lineB
lineC
line1
lineD
...
You could probably obfuscate it to awk '1;NR%3==0&&(getline < "file1")' file2 (untested).
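The same idea can be parameterised and extended so that leftover lines of file1 are appended at the end, as the Python version does. A sketch, in which the variable names n, extra and line are my own choices and the sample files are shortened for the demonstration:

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' line1 line2 line3 > file1
printf '%s\n' lineA lineB lineC lineD > file2

result=$(awk -v n=3 -v extra=file1 '
{ print }                                                  # every line of file2
NR % n == 0 && (getline line < extra) > 0 { print line }   # one line of file1 after each group
END { while ((getline line < extra) > 0) print line }      # leftovers from file1
' file2)

printf '%s\n' "$result"
```

Reading into a separate variable (line) leaves $0 untouched, and testing the return value with > 0 avoids printing anything when file1 is exhausted or unreadable.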

Shell - replace only number on specific line with number from another file

I am working on a shell script and I need to replace only the number on line 13 with a number from another file.
file1:
line1
line2
...
Text: 95%
...
file2:
98.4256
The result should look like this:
file1:
...
Text: 98.4256%
...
Basically I need to replace the number before % in file1 on line 13 with a number from file2 (the number in file2 is on line 1).
Thanks in advance for any tips.
sed "4 s/:.*/: $(cat file2)%/" file1
line1
line2
...
Text: 98.4256%
...
Change 4 to whichever line number you need (13 in the question; the shortened sample above has the Text: line on line 4).
Contents of file1
cat file1
line1
line2
...
Text: 95%
...
Contents of file2
cat file2
98.4256
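For the question as asked (line 13, and only the number in front of % replaced), a variation might look like the sketch below. It assumes the value in file2 contains no characters that are special to sed, such as / or &; the 14-line sample file and the file1.new output name are made up for the demonstration:

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
{
    printf 'line%d\n' 1 2 3 4 5 6 7 8 9 10 11 12
    echo 'Text: 95%'
    echo 'line14'
} > file1
echo '98.4256' > file2

new=$(head -n 1 file2)      # the number is on line 1 of file2
# On line 13 only, replace the digits and dots directly before %.
sed "13s/[0-9.][0-9.]*%/${new}%/" file1 > file1.new

sed -n '13p' file1.new
```

Unlike s/:.*/.../, this pattern leaves any text after the % sign on that line untouched.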

diff command and writing output to tab separated file

I have two txt files
file 1:
a 1
b 2
d 4
and file 2:
a 1
d 4
I want the lines which are in file1 but not in file2 to be in a tab separated file3 i.e.
b 2
I use
diff file1 file2 | grep ">" > file3
file3 has the right lines but I want to get rid of the ">" symbol.
Can you suggest how I can do this?
You don't want diff here; you want comm. Note that comm expects both input files to be sorted:
comm -2 -3 file1 file2
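If the inputs are not already sorted, they can be sorted into temporary copies first. A sketch, with a deliberately unsorted file1 and hypothetical .sorted file names:

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' 'd 4' 'a 1' 'b 2' > file1   # deliberately unsorted
printf '%s\n' 'a 1' 'd 4' > file2

sort file1 > file1.sorted
sort file2 > file2.sorted

# -2 -3 suppresses lines unique to file2 and lines common to both,
# leaving only the lines unique to file1.
comm -2 -3 file1.sorted file2.sorted > file3

cat file3
```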
Here is an awk command that doesn't require input files to be sorted:
awk 'FNR==NR{a[$0]; next} !($0 in a)' file2 file1
b 2
Explanation:
FNR==NR # execute this block for first file in the list (file2)
a[$0] # populate an associative array with key as $0 (full line)
next # move to next record
!($0 in a) # for 2nd file in list (file1) print if a record doesn't exist in array a
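Since the question asks for a tab-separated file3, the same awk approach can rewrite the separator on the way out. A sketch, assuming the input columns are separated by spaces or tabs (the array name seen is illustrative):

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' 'a 1' 'b 2' 'd 4' > file1
printf '%s\n' 'a 1' 'd 4' > file2

awk -v OFS='\t' '
FNR == NR { seen[$0]; next }         # first file (file2): remember each line
!($0 in seen) { $1 = $1; print }     # file1 only: rebuild the line with OFS and print
' file2 file1 > file3

cat file3
```

Assigning $1 to itself forces awk to rebuild the record using OFS, which is what turns the separator into a tab.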

extract different lines from files using Bash

I have two files and I use the "comm -23 file1 file2" command to extract the lines that are different from a file to another.
I would also need something that extracts the different lines but also preserves the string "line_$NR".
Example:
file1:
line_1: This is line0
line_2: This is line1
line_3: This is line2
line_4: This is line3
file2:
line_1: This is line1
line_2: This is line2
line_3: This is line3
I need this output:
differences file1 file2:
line_1: This is line0.
In short, I need to find the differences as if the files did not have line_$NR at the beginning, but when I print the result I also need to print line_$NR.
Try using awk
awk -F: 'NR==FNR {a[$2]; next} !($2 in a)' file2 file1
Output:
line_1: This is line0
Short Description
awk -F: ' # Set field separator to ':'. $1 contains line_<n> and $2 contains ' This is line<m>'
NR==FNR { # If Number of records equal to relative number of records, i.e. first file is being parsed
a[$2]; # store $2 as a key in associative array 'a'
next # Don't process further. Go to next record.
}
!($2 in a) # Print a line if $2 of that line is not a key of array 'a'
' file2 file1
Additional Requirement (In comment)
And if I have multiple ":" in a line : "line_1: This :is: line0"
doesn't work. How can I only take the line_x
In that case, try following (GNU awk only)
awk -F'line_[0-9]+:' 'NR==FNR {a[$2]; next} !($2 in a)' file2 file1
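Another way to ignore the prefix, without relying on a regular-expression field separator, is to strip line_<n>: from a copy of each line and compare the remainder. A sketch (the variable names key and seen are illustrative, and the sample lines are shortened from the question):

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' 'line_1: This :is: line0' 'line_2: This :is: line1' > file1
printf '%s\n' 'line_1: This :is: line1' > file2

result=$(awk '
{ key = $0; sub(/^line_[0-9]+:/, "", key) }   # text after the prefix
NR == FNR { seen[key]; next }                 # first file (file2): remember the text
!(key in seen)                                # file1: print lines whose text is new
' file2 file1)

printf '%s\n' "$result"
```

Because only the copy in key is modified, the printed line still carries its original line_<n>: prefix.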
This awk line is longer; however, it works no matter which file the differences are in:
awk 'NR==FNR{a[$NF]=$0;next}a[$NF]{a[$NF]=0;next}7;END{for(x in a)if(a[x])print a[x]}' file1 file2
test:
kent$ head f*
==> f1 <==
line_1: This is line0
line_2: This is line1
line_3: This is line2
line_4: This is line3
==> f2 <==
line_1: This is line1
line_2: This is line2
line_3: This is line3
#test f1 f2
kent$ awk 'NR==FNR{a[$NF]=$0;next}a[$NF]{a[$NF]=0;next}7;END{for(x in a)if(a[x])print a[x]}' f1 f2
line_1: This is line0
#test f2 f1:
kent$ awk 'NR==FNR{a[$NF]=$0;next}a[$NF]{a[$NF]=0;next}7;END{for(x in a)if(a[x])print a[x]}' f2 f1
line_1: This is line0
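The one-liner is dense (the bare 7 is simply an always-true pattern that prints the current line). An expanded sketch of the same idea follows. It uses in and delete rather than zeroing the entries, so it is a rewrite rather than the original author's exact code; like the original, it compares lines only by their last whitespace-separated field. The function name sym_diff is hypothetical:

```shell
tmp=$(mktemp -d) && cd "$tmp" || exit 1
printf '%s\n' 'line_1: This is line0' 'line_2: This is line1' \
              'line_3: This is line2' 'line_4: This is line3' > f1
printf '%s\n' 'line_1: This is line1' 'line_2: This is line2' \
              'line_3: This is line3' > f2

sym_diff() {
    awk '
    NR == FNR { first[$NF] = $0; next }          # first file: index lines by last field
    ($NF in first) { delete first[$NF]; next }   # present in both files: drop it
    { print }                                    # only in the second file
    END { for (k in first) print first[k] }      # only in the first file
    ' "$1" "$2"
}

forward=$(sym_diff f1 f2)
backward=$(sym_diff f2 f1)
printf '%s\n' "$forward" "$backward"
```

The order of lines printed from the END block is unspecified in awk, which matters only when more than one line is unique to the first file.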
