tab delimit a file in bash - bash

I have two files. I would like to join them by column and convert them from tab-delimited to space-delimited.
What is needed on top of
paste fileA fileB
to make that work?

Using awk:
awk 'FNR==NR{a[FNR]=$1; next} {print a[FNR]"\t"$2}' file1 file2
Example:
$ cat m
cat
dog
$ cat r
foo bar
bar foo
$ awk 'FNR==NR{a[FNR]=$1; next} {print a[FNR]"\t"$2}' m r
cat bar
dog foo

Talking about pure bash, something like this:
exec 3<file1
exec 4<file2
while :; do
    read -r -u 3 f1_w || break
    read -r -u 4 f2_w1 f2_w2 || break
    printf '%s\t%s\n' "$f1_w" "$f2_w2"
done
exec 3<&- 4<&-
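For the space-delimited output the question literally asks for, piping paste through tr is the minimal addition (a sketch using the question's file names, recreated in a temporary directory; it assumes the fields themselves contain no tabs that need preserving):

```shell
# Recreate the two input files in a throwaway directory.
tmp=$(mktemp -d)
printf 'cat\ndog\n' > "$tmp/fileA"
printf 'foo bar\nbar foo\n' > "$tmp/fileB"

# paste joins corresponding lines with a tab; tr turns each tab into a space.
paste "$tmp/fileA" "$tmp/fileB" | tr '\t' ' '
```

With the sample data this prints `cat foo bar` and `dog bar foo`, one joined line per input line.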

How to replace a match with an entire file in BASH?

I have a line like this:
INPUT file1
How can I get bash to read that line and directly copy in the contents of "file1.txt" in place of that line? Or, if it sees INPUT file2 on a line, put in "file2.txt", etc.
The best I can do is a lot of tr commands to paste the file together, but that seems an overly complicated solution.
sed also replaces lines with strings, but I don't know how to supply the entire content of a file, which can be hundreds of lines, as the replacement.
Seems pretty straightforward with awk. You may want to handle errors differently/more gracefully, but:
$ cat file1
Line 1 of file 1
$ cat file2
Line 1 of file 2
$ cat input
This is some content
INPUT file1
This is more content
INPUT file2
This file does not exist
INPUT file3
$ awk '$1=="INPUT" {system("cat " $2); next}1' input
This is some content
Line 1 of file 1
This is more content
Line 1 of file 2
This file does not exist
cat: file3: No such file or directory
A perl one-liner, using the CPAN module Path::Tiny
perl -MPath::Tiny -pe 's/INPUT (\w+)/path("$1.txt")->slurp/e' input_file
Use perl -i -M... to edit the file in-place.
Not the most efficient possible way, but as an exercise I made a file to edit named x and a couple of input sources named t1 & t2.
$: cat x
a
INPUT t2
b
INPUT t1
c
$: while read k f;do sed -ni "/$k $f/!p; /$k $f/r $f" x;done< <( grep INPUT x )
$: cat x
a
here's
==> t2
b
this
is
file ==> t1
c
Yes, the blank lines were in the INPUT files.
This will sed your base file repeatedly, though.
The awk solution given is better, as it only reads through it once.
If you want to do this in pure Bash, here's an example:
#!/usr/bin/env bash
if (( $# < 1 )); then
echo "Usage: ${0##*/} FILE..."
exit 2
fi
for file; do
readarray -t lines < "${file}"
for line in "${lines[@]}"; do
if [[ "${line}" == "INPUT "* ]]; then
cat "${line#"INPUT "}"
continue
fi
echo "${line}"
done > "${file}"
done
Save to file and run like this: ./script.sh input.txt (where input.txt is a file containing text mixed with INPUT <file> statements).
A sed solution similar to the awk one given earlier:
$ cat f
test1
INPUT f1
test2
INPUT f2
test3
$ cat f1
new string 1
$ cat f2
new string 2
$ sed 's/INPUT \(.*\)/cat \1/e' f
test1
new string 1
test2
new string 2
test3
Bash variant
while read -r line; do
    [[ $line =~ ^INPUT[[:space:]]+(.+) ]] && cat "${BASH_REMATCH[1]}" || echo "$line"
done < f

combine two files and overwrite original file using cat

I am trying to combine two files using the cat command, but I am facing a problem.
original.txt
============
foo
bar
foo
bar
following is my script.
cat original.txt | wc -l > linecount.txt | cat linecount.txt original.txt > original.txt
This script returns an error that says "input file and output file is the same".
Expected result is like this.
original.txt
============
4
foo
bar
foo
bar
Any idea?
You can probably use:
{ wc -l < original.txt; cat original.txt; } > linecount.txt &&
mv linecount.txt original.txt
Or using awk:
awk 'NR==FNR{++n; next} FNR==1{print n} 1' original.txt{,} > linecount.txt &&
mv linecount.txt original.txt
Or:
awk -v n=$(wc -l < original.txt) 'NR==1{print n} 1' original.txt > linecount.txt &&
mv linecount.txt original.txt
You can use sponge from the moreutils package. I like it for that:
cat <(wc -l < orig.txt) orig.txt | sponge orig.txt
If you don't have sponge or cannot install it, you can implement it with awk as a bash function:
function sponge() {
awk -v o="${1}" '{b=NR>1?b""ORS""$0:$0}END{print b > o}'
}
Keep in mind that this will need to store the whole file in memory. Don't use it for very large files.
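A quick run of that awk stand-in on the question's data (sketched with throwaway file names; the line-count step uses a command group instead of process substitution, which has the same effect):

```shell
# awk-based stand-in for moreutils' sponge: buffer all of stdin,
# then write it to the named file only after input is exhausted.
sponge() {
    awk -v o="$1" '{b = NR > 1 ? b ORS $0 : $0} END {print b > o}'
}

tmp=$(mktemp -d)
printf 'foo\nbar\nfoo\nbar\n' > "$tmp/original.txt"

# Prepend the line count; this is safe because awk writes only at END,
# after cat has finished reading the original file.
{ wc -l < "$tmp/original.txt"; cat "$tmp/original.txt"; } | sponge "$tmp/original.txt"
cat "$tmp/original.txt"
```

The file then starts with the line count (4) followed by the original four lines.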

extract multiple lines of a file unix

I have a file A with 400,000 lines. I have another file B that has a bunch of line numbers.
File B:
-------
98
101
25012
10098
23489
I have to extract the line numbers specified in file B from file A. That is, I want to extract lines 98, 101, 25012, 10098, and 23489 from file A. How do I extract these lines in the following cases?
File B is an explicit file.
File B arrives out of a pipe. For example, grep -n pattern somefile.txt is producing file B.
I wanted to use sed -n 'x'p fileA. However, I don't know how to supply the 'x' from a file, nor how to pipe the value of 'x' in from a command.
sed can print the line numbers you want:
$ printf $'foo\nbar\nbaz\n' | sed -ne '2p'
bar
If you want multiple lines:
$ printf $'foo\nbar\nbaz\n' | sed -ne '2p;3p'
bar
baz
To transform a set of lines to a sed command like this, use sed for beautiful sedception:
$ printf $'98\n101' | sed -e 's/$/p;/'
98p;
101p;
Putting it all together:
sed -ne "$(sed -e 's/$/p;/' B)" A
Testing:
$ cat A
1
22
333
4444
$ cat B
1
3
$ sed -ne "$(sed -e 's/$/p;/' B)" A
1
333
QED.
awk fits this task better:
fileA in file case:
awk 'NR==FNR{a[$0]=1;next}a[FNR]' fileB fileA
fileA content from pipe:
cat fileA|awk 'NR==FNR{a[$0]=1;next}a[FNR]' fileB -
Whether file B is an explicit file or arrives from a pipe, the same awk command works:
awk '...' fileB fileA
and
cat fileB|awk '...' - fileA
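Putting the pipe case together concretely (a sketch; the hard-coded line numbers here stand in for whatever grep -n ... | cut -d: -f1 would emit):

```shell
tmp=$(mktemp -d)
printf '1\n22\n333\n4444\n' > "$tmp/A"

# First pass (NR==FNR) reads the wanted line numbers from stdin ("-");
# the second pass prints only the lines of A whose number was marked.
printf '1\n3\n' | awk 'NR==FNR {want[$0] = 1; next} want[FNR]' - "$tmp/A"
```

This prints lines 1 and 3 of A, i.e. `1` and `333`.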

Looping over input fields as array

Is it possible to do something like this:
$ cat foo.txt
1 2 3 4
foo bar baz
hello world
$ awk '{ for(i in $){ print $[i]; } }' foo.txt
1
2
3
4
foo
bar
baz
hello
world
I know you could do this:
$ awk '{ split($0,array," "); for(i in array){ print array[i]; } }' foo.txt
2
3
4
1
bar
baz
foo
world
hello
But then the result is not in order.
Found out myself:
$ awk '{ for(i = 1; i <= NF; i++) { print $i; } }' foo.txt
I'd use sed:
sed 's/ /\n/g' foo.txt
No need for awk, sed or perl. You can easily do this directly in the shell:
for i in $(cat foo.txt); do echo "$i"; done
If you're open to using Perl, either of these should do the trick:
perl -lane 'print $_ for @F' foo.txt
perl -lane 'print join "\n", @F' foo.txt
These command-line options are used:
-n loop around each line of the input file, do not automatically print the line
-l removes newlines before processing, and adds them back in afterwards
-a autosplit mode – split input lines into the @F array. Defaults to splitting on whitespace.
-e execute the perl code

Replace certain token with the content of a file (using a bash-script)

I have a file containing some text and the words INSERT_HERE1 and INSERT_HERE2. I'd like to replace these words with the content of file1.txt and file2.txt respectively.
I suspect sed or awk could pull it off but I've basically never used them.
Sed does have a built-in read file command. The commands you want would look something like this:
$ sed -e '/INSERT_HERE1/ {
r FILE1
d }' -e '/INSERT_HERE2/ {
r FILE2
d }' < file
This would output
foo
this is file1
bar
this is file2
baz
The r command reads in the file, and the d command deletes the line with the INSERT_HERE tag. You need the curly braces and the multi-line input because everything after r is taken as the file name, so each command has to sit on its own line; depending on your shell, you may need a \ at the end of each line to avoid premature execution. If this is something you would use a lot, you can put the commands in a file and use sed -f to run it.
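Spelled out end to end with a sed script file, which sidesteps the shell quoting of multi-line commands (a sketch with throwaway file names in a temporary directory):

```shell
tmp=$(mktemp -d)
printf 'this is file1\n' > "$tmp/FILE1"
printf 'this is file2\n' > "$tmp/FILE2"
printf 'foo\nINSERT_HERE1\nbar\nINSERT_HERE2\nbaz\n' > "$tmp/file"

# For each marker: r queues the named file's contents for output,
# then d deletes the marker line itself (the queued text still prints).
cat > "$tmp/insert.sed" <<EOF
/INSERT_HERE1/{
r $tmp/FILE1
d
}
/INSERT_HERE2/{
r $tmp/FILE2
d
}
EOF
sed -f "$tmp/insert.sed" "$tmp/file"
```

Each marker line is replaced by its file's contents while the surrounding lines pass through untouched.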
If you are okay with Perl you can do:
$ cat FILE1
this is file1
$ cat FILE2
this is file2
$ cat file
foo
INSERT_HERE1
bar
INSERT_HERE2
baz
$ perl -ne 's/^INSERT_HERE(\d+)\s+$/`cat FILE$1`/e;print' file
foo
this is file1
bar
this is file2
baz
$
This is not tested, but would be pretty close to what you need:
sed -e "s/INSERT_HERE1/`cat file1.txt`/" -e "s/INSERT_HERE2/`cat file2.txt`/" <file >file.out
It will not properly handle a file with slashes in it, though, so you may need to tweak it a bit.
I'd recommend Perl instead, though. Something like this:
#!/usr/bin/perl -w
my $f1 = `cat file1.txt`;
my $f2 = `cat file2.txt`;
while (<>) {
chomp;
s/INSERT_HERE1/$f1/;
s/INSERT_HERE2/$f2/;
print "$_\n";
}
This assumes that INSERT_HERE1 and INSERT_HERE2 appear at most once per line, and that file1.txt does not itself contain the text INSERT_HERE2 (neither would be difficult to fix, though). Use like this:
./script <file >file.out
This is suitable for small substitution files that may be substituted many times:
awk 'BEGIN {
while ((getline line < ARGV[1]) > 0) {file1 = file1 nl line; nl = "\n"};
close (ARGV[1]); nl = "";
while ((getline line < ARGV[2]) > 0) {file2 = file2 nl line; nl = "\n"};
close (ARGV[2]);
ARGV[1] = ""; ARGV[2] = "" }
{ gsub("token1", file1);
gsub("token2", file2);
print }' file1.txt file2.txt mainfile.txt
You may want to add some extra newlines here and there, depending on how you want your output to look.
Easily done with Bash. If you need it to be POSIX shell let me know:
#!/bin/bash
IFS= # Needed to prevent the shell from interpreting the newlines
f1=$(< /path/to/file1.txt)
f2=$(< /path/to/file2.txt)
while read -r line; do
if [[ "$line" == "INSERT_HERE1" ]]; then
echo "$f1"
elif [[ "$line" == "INSERT_HERE2" ]]; then
echo "$f2"
else
echo "$line"
fi
done < /path/to/input/file
This snippet replaces every marker listed in the replace array below with the contents of the corresponding file. For example, it replaces
<!--insert.txt-->
with the contents of "insert.txt":
#!/bin/bash
replace[1]=\<!--insert.txt--\> ; file[1]=insert.txt
replace[2]=\<!--insert2.txt--\> ; file[2]=insert2.txt
replacelength=${#replace[@]}
cat blank.txt > tmp.txt
for i in $(seq 1 ${replacelength})
do
echo Replacing ${file[i]} ...
sed -e "/${replace[i]}/r ${file[i]}" -e "/${replace[i]}/d" tmp.txt > tmp_2.txt
mv tmp_2.txt tmp.txt
done
mv tmp.txt file.txt
If you're not afraid of .zip files you can try this example as long as it is online: http://ablage.stabentheiner.de/2013-04-16_contentreplace.zip
I would use perl's in-place replacement with the -i.ext option:
perl -pi.bak -e 's|INSERT_HERE1|`cat FILE1`|ge;
s|INSERT_HERE2|`cat FILE2`|ge;' myfile
Then use diff myfile.bak myfile to verify the changes.
