How can I delete all lines before a specific string from a number of files - bash

I have n files, like:
file1:
1aaa
2eee
Test XXX
Hanna
Lars
file2:
1fff
2ddd
3zzz
Test XXX
Mike
Charly
I want to remove all rows before "Test XXX" from all n files.
The number of rows to delete varies between files.
My idea:
for file in 1 :n
do
    pos=grep -n "Test XXX" file$file
    sed -i "1:$pos-1 d" file$file >new$file
done
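For reference, the grep -n idea above can be made runnable roughly as follows (a sketch, assuming GNU grep/sed, files literally named file1 … fileN, and $n already holding the number of files); the answers below show simpler ways:
for i in $(seq 1 "$n"); do
    pos=$(grep -n -m1 'Test XXX' "file$i" | cut -d: -f1)   # line number of the first match
    if [ -n "$pos" ] && [ "$pos" -gt 1 ]; then
        sed "1,$((pos - 1))d" "file$i" > "new$i"           # drop lines 1 .. pos-1
    fi
done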

This should work for you:
sed -i '1,/Test XXX/d' file1
sed -i '1,/Test XXX/d' file2
or simply
sed -i '1,/Test XXX/d' file*
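If the files are the numbered file1 … fileN from the question, the same command can also be run in a loop (a sketch, assuming GNU sed for -i and that $n holds the number of files):
for i in $(seq 1 "$n"); do
    sed -i '1,/Test XXX/d' "file$i"    # edit each file in place
done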

This will work for your examples and even if the matched pattern is on the very first line:
sed -n -E -e '/Test XXX/,$ p' input.txt | sed '1 d'
For example, if your input is simply
Test XXX
Mike
Charly
This will give you
Mike
Charly
If you want to keep the first match Test XXX then just use:
sed -n -E -e '/Test XXX/,$ p' input.txt
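A quick comparison with GNU sed shows why this variant is safer when the match sits on line 1: in 1,/Test XXX/d the end pattern is not tested against line 1 itself, so that command deletes up to the next match, or the whole file if there is none:
$ printf 'Test XXX\nMike\nCharly\n' | sed '1,/Test XXX/d'
$ printf 'Test XXX\nMike\nCharly\n' | sed -n -E -e '/Test XXX/,$ p'
Test XXX
Mike
Charly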

You can do it with bash (e.g. for one file):
t=0
while read -r line
do
    [[ $line =~ Test.*XXX ]] && t="1"
    case "$t" in
        1) echo "$line";;
    esac
done < file > tempo && mv tempo file
Use a for loop as necessary to go through all the files
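For all the files, one way is to wrap it in a loop (a sketch; adjust the glob to your file names):
for f in file*; do
    t=0
    while read -r line; do
        [[ $line =~ Test.*XXX ]] && t=1
        case "$t" in
            1) echo "$line";;
        esac
    done < "$f" > tempo && mv tempo "$f"
done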

cat <<-EOF > file1.txt
1aaa
2eee
Test XXX
Hanna
Lars
EOF
cat file1.txt | sed -e '/Test *XXX/p' -e '0,/Test *XXX/d'
Output:
Test XXX
Hanna
Lars
Explanation:
-e '/Test *XXX/p' prints the line matching /Test *XXX/ an extra time, which together with sed's automatic printing duplicates it
-e '0,/Test *XXX/d' deletes from the start of the file through the first line matching /Test *XXX/
By duplicating the matched line and letting the delete remove one copy, we retain the matched line while deleting all lines BEFORE Test XXX.
Note: this will not work as expected if there are multiple Test XXX lines.
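If multiple Test XXX lines are possible, an awk sketch that simply prints everything from the first match onward avoids the duplicate-then-delete trick (newfile1.txt is a hypothetical output name):
awk '/Test *XXX/{found=1} found' file1.txt > newfile1.txt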

Related

How to add the beginning of a file to another file using a loop

I have files 1.txt, 2.txt, 3.txt and 1-bis.txt, 2-bis.txt, 3-bis.txt
cat 1.txt
#ok
#5
6
5
cat 2.txt
#not ok
#56
13
56
cat 3.txt
#nothing
#
cat 1-bis.txt
5
4
cat 2-bis.txt
32
24
cat 3-bis.txt
I would like to add the lines starting with # (from the non-bis files) at the beginning of the corresponding "bis" files, in order to get:
cat 1-bis.txt
#ok
#5
5
4
cat 2-bis.txt
#not ok
#56
32
24
cat 3-bis.txt
#nothing
#
I was thinking of using grep -P "#" to select the lines with # (or maybe sed -n), but I don't know how to loop over the files to solve this problem.
Thank you very much for your help.
You can use this solution:
for f in *-bis.txt; do
    { grep '^#' "${f//-bis}"; cat "$f"; } > "$f.tmp" && mv "$f.tmp" "$f"
done
If you only want the # lines that appear at the beginning of the source files, then replace
grep '^#' "${f//-bis}"
with:
awk '!/^#/{exit}1' "${f//-bis}"
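Put together, the loop with that replacement would look like this (a sketch, same assumptions as above, i.e. the source file name is the bis name with -bis removed):
for f in *-bis.txt; do
    { awk '!/^#/{exit}1' "${f//-bis}"; cat "$f"; } > "$f.tmp" && mv "$f.tmp" "$f"
done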
You can loop over the ?.txt files and use parameter expansion to derive the corresponding bis- filename:
for file in ?.txt ; do
    bis=${file%.txt}-bis.txt
    grep '^#' "$file" > tmp
    cat "$bis" >> tmp
    mv tmp "$bis"
done
You don't need grep -P, simple grep is enough. Just add ^ to only match the octothorpes at the beginning of a line.
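For example, the parameter expansion maps each name like this:
$ file=2.txt; echo "${file%.txt}-bis.txt"
2-bis.txt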

How to merge multiple files in order and append filename at the end in bash

I have multiple files like this:
BOB_1.brother_bob12.txt
BOB_2.brother_bob12.txt
..
BOB_35.brother_bob12.txt
How can I join these files in order from {1..36} and append the filename at the end of each row? I have tried:
for i in *.txt; do sed 's/$/ '"$i"'/' $i; done > outfile #joins but not in order
cat $(for((i=1;i<38;i++)); do echo -n "BOB_${i}.brother_bob12.txt "; done) # joins in order but no filename at the end
file sample:
1 345 378 1 3 4 5 C T
1 456 789 -1 2 3 4 A T
Do not do cat $(....). You may just:
for ((i=1;i<38;i++)); do
    f="BOB_${i}.brother_bob12.txt"
    sed "s/$/ $f/" "$f"
done
You may also do:
printf "%s\n" bob.txt BOB_{1..38}.brother_bob12.txt |
xargs -d'\n' -i sed 's/$/ {}/' '{}'
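With either variant, redirect the output to collect the merged result, for example into the outfile name from the question (a sketch):
for ((i=1;i<38;i++)); do
    f="BOB_${i}.brother_bob12.txt"
    sed "s/$/ $f/" "$f"     # append the file name to every line
done > outfile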
You may use:
for i in {1..36}; do
    fn="BOB_${i}.brother_bob12.txt"
    [[ -f $fn ]] && awk -v OFS='\t' '{print $0, FILENAME}' "$fn"
done > output
Note that it will insert FILENAME as the last field in every record. If this is not what you want, then show your expected output in the question.
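With the sample rows from the question, each output record would end with the source file name, and the separator before it is a tab, e.g.:
1 345 378 1 3 4 5 C T	BOB_1.brother_bob12.txt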
This might work for you (GNU sed):
sed -n 'p;F' BOB_{1..36}.brother_bob12.txt | sed 'N;s/\n/ /' >newFile
This uses two invocations of sed: the first appends the file name after each line of each file, and the second replaces the newline between each pair of lines with a space.
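To make the intermediate step visible: the first sed emits each line followed by its file name on the next line, and the second sed folds each such pair back into one line. A sketch on the first file, assuming it holds the sample rows from the question:
$ sed -n 'p;F' BOB_1.brother_bob12.txt
1 345 378 1 3 4 5 C T
BOB_1.brother_bob12.txt
1 456 789 -1 2 3 4 A T
BOB_1.brother_bob12.txt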

Shell Scripting: Delete all instances of an exact word from file (Not pattern)

I'm trying to delete every instance of a certain word in a file. I can't make it so that it doesn't delete the pattern from other words. For example if I want to remove the word 'the' from the file. It will remove 'the' from 'then' and leave me with just 'n'.
Right now I have tried:
sed s/"$word"//g -i final_in
And:
sed 's/\<"$word"\>//g' -i final_in
But neither of them have worked. I thought this would be pretty easy to Google, but every solution I find does not work properly.
$ word='the'
$ sed -r "s/\b$word\b//g" << HEREDOC
> Sample text
> therefore
> then
> the sky is blue
> HEREDOC
Sample text
therefore
then
sky is blue
\b = word boundary
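To apply the same word-boundary substitution directly to the file from the question (a sketch; assumes GNU sed for -r and -i, and final_in is the file name used in the question):
word='the'
sed -r -i "s/\b$word\b//g" final_in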
# test: a plain substitution also removes the word inside other words (e.g. 'then')
word='the'
echo 'aaa then bbb' | sed -r "s/$word//g"
# To match the exact word, you can add spaces:
word=then
echo 'aaa then bbb' | sed -e "s/ $word / /g"
# to modify a file
word='the'
cat file.txt | sed -r "s/ $word / /g" > tmp.txt
mv tmp.txt file.txt
# to take punctuation into account:
word=then
echo 'aaa. then, bbb' | sed -e "s/\([:.,;/]\)* *$word *\([:.,;/]\)*/\1 \2/g"

Extract multiple lines of a file in Unix

I have a file A with 400,000 lines. I have another file B that has a bunch of line numbers.
File B:
-------
98
101
25012
10098
23489
I have to extract those line numbers specified in file B from file A. That is I want to extract lines 98,101,25012,10098,23489 from file A. How to extract these lines in the following cases.
File B is an explicit file.
File B arrives out of a pipe. E.g., grep -n pattern somefile.txt is giving me file B.
I wanted to use sed -n 'x'p fileA. However, I don't know how to supply the 'x' from a file, and I also don't know how to pipe the value of 'x' from a command.
sed can print the line numbers you want:
$ printf $'foo\nbar\nbaz\n' | sed -ne '2p'
bar
If you want multiple lines:
$ printf $'foo\nbar\nbaz\n' | sed -ne '2p;3p'
bar
baz
To transform a set of lines to a sed command like this, use sed for beautiful sedception:
$ printf $'98\n101' | sed -e 's/$/;/'
98;
101;
Putting it all together:
sed -ne "$(sed -e 's/$/p;/' B)" A
Testing:
$ cat A
1
22
333
4444
$ cat B
1
3
$ sed -ne "$(sed -e 's/$/p;/' B)" A
1
333
QED.
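For the second case in the question, where B arrives out of a pipe such as grep -n, the line numbers can be peeled off before building the sed script (a sketch; cut keeps everything before the first colon of grep -n's num:line output):
sed -ne "$(grep -n pattern somefile.txt | cut -d: -f1 | sed -e 's/$/p;/')" A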
awk fits this task better:
fileA in file case:
awk 'NR==FNR{a[$0]=1;next}a[FNR]' fileB fileA
fileA content from pipe:
cat fileA|awk 'NR==FNR{a[$0]=1;next}a[FNR]' fileB -
If fileB is itself a file or arrives from a pipe, the same awk command works:
awk '...' fileB fileA
and
cat fileB|awk '...' - fileA
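For reference, a commented sketch of the NR==FNR idiom used above:
awk 'NR==FNR {        # true only while reading the first file (fileB)
         a[$0] = 1    # remember each wanted line number
         next         # and skip the rest for fileB lines
     }
     a[FNR]           # for fileA: print a line when its number was remembered
' fileB fileA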

Delete first line of file if it's empty

How can I delete the first (!) line of a text file if it's empty, using e.g. sed or other standard UNIX tools? I tried this command:
sed '/^$/d' < somefile
But this will delete the first empty line, not the first line of the file, if it's empty. Can I give sed some condition, concerning the line number?
With Levon's answer I built this small script based on awk:
#!/bin/bash
for FILE in $(find some_directory -name "*.csv")
do
    echo Processing ${FILE}
    awk '{if (NR==1 && NF==0) next};1' < ${FILE} > ${FILE}.killfirstline
    mv ${FILE}.killfirstline ${FILE}
done
The simplest thing in sed is:
sed '1{/^$/d}'
Note that this does not delete a line that contains all blanks, but only a line that contains nothing but a single newline. To get rid of blanks:
sed '1{/^ *$/d}'
and to eliminate all whitespace:
sed '1{/^[[:space:]]*$/d}'
Note that some versions of sed require a terminator inside the block, so you might need to add a semicolon, e.g. sed '1{/^$/d;}'
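A quick check with GNU sed (note that the question's original command would also have removed the blank line in the second example):
$ printf '\nfoo\nbar\n' | sed '1{/^$/d}'
foo
bar
$ printf 'foo\n\nbar\n' | sed '1{/^$/d}'
foo

bar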
Using sed, try this:
sed -e '2,$b' -e '/^$/d' < somefile
(The b command with no label branches to the end of the script, so lines 2 through $ skip the deletion and only line 1 can be matched by /^$/d.)
or to make the change in place:
sed -i~ -e '2,$b' -e '/^$/d' somefile
If you don't have to do this in-place, you can use awk and redirect the output into a different file.
awk '{if (NR==1 && NF==0) next};1' somefile
This will print the contents of the file, except when the line is the first line (NR == 1) and it doesn't contain any data (NF == 0).
NR is the current line number; NF is the number of fields on a given line, separated by blanks/tabs.
E.g.,
$ cat -n data.txt
1
2 this is some text
3 and here
4 too
5
6 blank above
7 the end
$ awk '{if (NR==1 && NF==0) next};1' data.txt | cat -n
1 this is some text
2 and here
3 too
4
5 blank above
6 the end
and
$ cat -n data2.txt
1 this is some text
2 and here
3 too
4
5 blank above
6 the end
$ awk '{if (NR==1 && NF==0) next};1' data2.txt | cat -n
1 this is some text
2 and here
3 too
4
5 blank above
6 the end
Update:
This sed solution should also work for in-place replacement:
sed -i.bak '1{/^$/d}' somefile
The original file will be saved with a .bak extension
Delete the first line of all files under the current directory if the first line is empty:
find -type f | xargs sed -i -e '2,$b' -e '/^$/d'
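With unusual file names (spaces, newlines), a null-delimited variant is safer (a sketch assuming GNU find, xargs and sed):
find . -type f -print0 | xargs -0 sed -i -e '2,$b' -e '/^$/d'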
This might work for you:
sed '1!b;/^$/d' file
