extract multiple lines of a file unix - shell

I have a file A with 400,000 lines. I have another file B that has a bunch of line numbers.
File B:
-------
98
101
25012
10098
23489
I have to extract those line numbers specified in file B from file A. That is I want to extract lines 98,101,25012,10098,23489 from file A. How to extract these lines in the following cases.
File B is a explicit file.
File B is arriving out of a pipe. For eg., grep -n pattern somefile.txt is giving me the file B.
I wanted to use see -n 'x'p fileA. However, I don't know how to give the 'x' from a file. Also, I don't to how to pipe the value of 'x' from a command.

sed can print the line numbers you want:
$ printf $'foo\nbar\nbaz\n' | sed -ne '2p'
bar
If you want multiple lines:
$ printf $'foo\nbar\nbaz\n' | sed -ne '2p;3p'
bar
baz
To transform a set of lines to a sed command like this, use sed for beautiful sedception:
$ printf $'98\n101' | sed -e 's/$/;/'
98;
101;
Putting it all together:
sed -ne "$(sed -e 's/$/p;/' B)" A
Testing:
$ cat A
1
22
333
4444
$ cat B
1
3
$ sed -ne "$(sed -e 's/$/p;/' B)" A
1
333
QED.

awk fits this task better:
fileA in file case:
awk 'NR==FNR{a[$0]=1;next}a[FNR]' fileB fileA
fileA content from pipe:
cat fileA|awk 'NR==FNR{a[$0]=1;next}a[FNR]' fileB -
oh, you want FileB in file or from pipe, then same awk cmd:
awk '...' fileB fileA
and
cat fileB|awk '...' - fileA

Related

How to replace a match with an entire file in BASH?

I have a line like this:
INPUT file1
How can I get bash to read that line and directly copy in the contents of "file1.txt" in place of that line? Or if it sees: INPUT file2 on a line, put in `file2.txt" etc.
The best I can do is a lot of tr commands, to paste the file together, but that seems an overly complicated solution.
'sed' also replaces lines with strings, but I don't know how to input the entire content of a file, which can be hundreds of lines into the replacement.
Seems pretty straightforward with awk. You may want to handle errors differently/more gracefully, but:
$ cat file1
Line 1 of file 1
$ cat file2
Line 1 of file 2
$ cat input
This is some content
INPUT file1
This is more content
INPUT file2
This file does not exist
INPUT file3
$ awk '$1=="INPUT" {system("cat " $2); next}1' input
This is some content
Line 1 of file 1
This is more content
Line 1 of file 2
This file does not exist
cat: file3: No such file or directory
A perl one-liner, using the CPAN module Path::Tiny
perl -MPath::Tiny -pe 's/INPUT (\w+)/path("$1.txt")->slurp/e' input_file
use perl -i -M... to edit the file in-place.
Not the most efficient possible way, but as an exercise I made a file to edit named x and a couple of input sources named t1 & t2.
$: cat x
a
INPUT t2
b
INPUT t1
c
$: while read k f;do sed -ni "/$k $f/!p; /$k $f/r $f" x;done< <( grep INPUT x )
$: cat x
a
here's
==> t2
b
this
is
file ==> t1
c
Yes, the blank lines were in the INPUT files.
This will sed your base file repeatedly, though.
The awk solution given is better, as it only reads through it once.
If you want to do this in pure Bash, here's an example:
#!/usr/bin/env bash
if (( $# < 1 )); then
echo "Usage: ${0##*/} FILE..."
exit 2
fi
for file; do
readarray -t lines < "${file}"
for line in "${lines[#]}"; do
if [[ "${line}" == "INPUT "* ]]; then
cat "${line#"INPUT "}"
continue
fi
echo "${line}"
done > "${file}"
done
Save to file and run like this: ./script.sh input.txt (where input.txt is a file containing text mixed with INPUT <file> statements).
Sed solution similar to awk given erlier:
$ cat f
test1
INPUT f1
test2
INPUT f2
test3
$ cat f1
new string 1
$ cat f2
new string 2
$ sed 's/INPUT \(.*\)/cat \1/e' f
test1
new string 1
test2
new string 2
test3
Bash variant
while read -r line; do
[[ $line =~ INPUT.* ]] && { tmp=($BASH_REMATCH); cat ${tmp[1]}; } || echo $line
done < f

How to merge multiple files in order and append filename at the end in bash

I have multiple files like this:
BOB_1.brother_bob12.txt
BOB_2.brother_bob12.txt
..
BOB_35.brother_bob12.txt
How to join these files in order from {1..36} and append filename at the end of each row? I have tried:
for i in *.txt; do sed 's/$/ '"$i"'/' $i; done > outfile #joins but not in order
cat $(for((i=1;i<38;i++)); do echo -n "BOB_${i}.brother_bob12.txt "; done) # joins in order but no filename at the end
file sample:
1 345 378 1 3 4 5 C T
1 456 789 -1 2 3 4 A T
Do not do cat $(....). You may just:
for ((i=1;i<38;i++)); do
f="BOB_${i}.brother_bob12.txt"
sed "s/$/ $f/" "$f"
done
You may also do:
printf "%s\n" bob.txt BOB_{1..38}.brother_bob12.txt |
xargs -d'\n' -i sed 's/$/ {}/' '{}'
You may use:
for i in {1..36}; do
fn="BOB_${i}.brother_bob12.txt"
[[ -f $fn ]] && awk -v OFS='\t' '{print $0, FILENAME}' "$fn"
done > output
Note that it will insert FILENAME as the last field in every record. If this is not what you want then show your expected output in question.
This might work for you (GNU sed);
sed -n 'p;F' BOB_{1..36}.brother_bob12.txt | sed 'N;s/\n/ /' >newFile
Used 2 invocations of sed, the first to append the file name after each line of each file. The second to replace the newline between each 2 lines by a space.

How do I merge different text files?

I have 3 txt files:
file1.txt:
11
file2.txt:
22
file3.txt:
33
I want to combine the 3 text files into a single file and put a comma between them.
endfile.txt should be as follows:
11,22,33
I'd try:
cat file1.txt; cat file2.txt; cat file3.txt > endfile.txt
Wrote line by line but I want to print side by side and put a comma
Could you help?
cat file1.txt | cat - file2.txt | cat - file3.txt | tr '\n' ',' | head --bytes -1
A very easy approach simply uses printf:
(printf "%s" $(cat file1.txt); printf ",%s" $(cat file2.txt file3.txt)) > endfile.txt
Which would results in 11,22,33 in endfile.txt. The two grouping of printf were used to prevent a comma from being written before 11 and the entire line is executed as a subshell so output from all commands is redirected to endfile.txt. You may also want to write a final '\n' after file3.txt to ensure the resulting endfile.txt contains a POSIX line-ending.
My answer is following.
$ cat *.txt | sed -z 's/\n\(.\)/,\1/g'
If you define exactly order, it is following.
$ cat file{1,2,3}.txt | sed -z 's/\n\(.\)/,\1/g'
CAUTION
My sed is version 4.8.
$ sed --version | head -n 1
sed (GNU sed) 4.8
Use paste:
paste -sd, file1.txt file2.txt file3.txt > endfile.txt

Delete first line of file if it's empty

How can I delete the first (!) line of a text file if it's empty, using e.g. sed or other standard UNIX tools. I tried this command:
sed '/^$/d' < somefile
But this will delete the first empty line, not the first line of the file, if it's empty. Can I give sed some condition, concerning the line number?
With Levon's answer I built this small script based on awk:
#!/bin/bash
for FILE in $(find some_directory -name "*.csv")
do
echo Processing ${FILE}
awk '{if (NR==1 && NF==0) next};1' < ${FILE} > ${FILE}.killfirstline
mv ${FILE}.killfirstline ${FILE}
done
The simplest thing in sed is:
sed '1{/^$/d}'
Note that this does not delete a line that contains all blanks, but only a line that contains nothing but a single newline. To get rid of blanks:
sed '1{/^ *$/d}'
and to eliminate all whitespace:
sed '1{/^[[:space:]]*$/d}'
Note that some versions of sed require a terminator inside the block, so you might need to add a semi-colon. eg sed '1{/^$/d;}'
Using sed, try this:
sed -e '2,$b' -e '/^$/d' < somefile
or to make the change in place:
sed -i~ -e '2,$b' -e '/^$/d' somefile
If you don't have to do this in-place, you can use awk and redirect the output into a different file.
awk '{if (NR==1 && NF==0) next};1' somefile
This will print the contents of the file except if it's the first line (NR == 1) and it doesn't contain any data (NF == 0).
NR the current line number,NF the number of fields on a given line separated by blanks/tabs
E.g.,
$ cat -n data.txt
1
2 this is some text
3 and here
4 too
5
6 blank above
7 the end
$ awk '{if (NR==1 && NF==0) next};1' data.txt | cat -n
1 this is some text
2 and here
3 too
4
5 blank above
6 the end
and
cat -n data2.txt
1 this is some text
2 and here
3 too
4
5 blank above
6 the end
$ awk '{if (NR==1 && NF==0) next};1' data2.txt | cat -n
1 this is some text
2 and here
3 too
4
5 blank above
6 the end
Update:
This sed solution should also work for in-place replacement:
sed -i.bak '1{/^$/d}' somefile
The original file will be saved with a .bak extension
Delete the first line of all files under the actual directory if the first line is empty :
find -type f | xargs sed -i -e '2,$b' -e '/^$/d'
This might work for you:
sed '1!b;/^$/d' file

How can I delete all lines before a specific string from a number of files

I have n files, like:
file1:
1aaa
2eee
Test XXX
Hanna
Lars
file2:
1fff
2ddd
3zzz
Test XXX
Mike
Charly
I want to remove all rows before "Test XXX" from all n files.
The number of rows to delete varies between files.
My idea:
for file in 1 :n
do
pos=grep -n "Test XXX" file$file
sed -i "1:$pos-1 d" file$file >new$file
done
This should work for you:
sed -i '1,/Test XXX/d' file1
sed -i '1,/Test XXX/d' file2
or simply
sed -i '1,/Test XXX/d' file*
This will work for your examples and even if the matched pattern is on the very first line:
sed -n -E -e '/Text XXX/,$ p' input.txt | sed '1 d'
For example if you input is simply
Test XXX
Mike
Charly
This will give you
Mike
Charly
If you want to keep the first match Test XXX then just use:
sed -n -E -e '/Text XXX/,$ p' input.txt
You can do it with bash ( eg for 1 file)
t=0
while read -r line
do
[[ $line =~ Test.*XXX ]] && t="1"
case "$t" in
1) echo "$line";;
esac
done < file > tempo && mv tempo file
Use a for loop as necessary to go through all the files
cat <<-EOF > file1.txt
1aaa
2eee
Test XXX
Hanna
Lars
EOF
cat file1.txt | sed -e '/Test *XXX/p' -e '0,/Test *XXX/d'
Output:
Test XXX
Hanna
Lars
Explanation:
-e '/Test *XXX/p' duplicates the line matching /Test *XXX/
-e '0,/Test *XXX/d' deletes from line 0 to the first line matching /Test *XXX/
By duplicating the line, then removing the first one, we effectively retain the matched line, successfully deleting all lines BEFORE Test XXX
Note: this will not work as expected if there are multiple Test XXX lines.

Resources