split text file in two using bash script - bash

I have a text file with a marker somewhere in the middle:
one
two
three
blah-blah *MARKER* blah-blah
four
five
six
...
I just need to split this file into two files, the first containing everything before MARKER and the second containing everything after MARKER. It seems it can be done in one line with awk or sed, I just can't figure out how.
I tried the easy way, using csplit, but csplit doesn't play well with Unicode text.

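You can do it easily with awk. A multi-character RS is not POSIX, but gawk and mawk support it. Note that the text is cut at the MARKER string itself, so the rest of the marker line ends up split across the two files:
awk -v RS="MARKER" '{print $0 > (NR ".txt")}' file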

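Try this:
awk '/MARKER/{n++}{print > ("out" n ".txt")}' final.txt
It reads final.txt and writes the lines before the first MARKER to out.txt (n is still empty at that point), then starts out1.txt, out2.txt, etc. at each line containing MARKER. The marker line itself becomes the first line of the new file.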
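If the marker line should not appear in either file, a small variation can route lines to one of two fixed files. This is only a sketch: sample.txt, before.txt and after.txt are names made up for the example, and it splits at the first line containing MARKER.

```shell
# Build a small sample file like the one in the question (hypothetical name).
printf '%s\n' one two three 'blah-blah *MARKER* blah-blah' four five six > sample.txt

# before.txt receives the lines ahead of the first marker line, after.txt the
# lines that follow it; the marker line itself is dropped.
awk '
    !found && /MARKER/ { found = 1; next }            # first marker line: switch files
    { print > (found ? "after.txt" : "before.txt") }  # route every other line
' sample.txt

cat before.txt   # one, two, three
cat after.txt    # four, five, six
```

The output file name is parenthesized because an unparenthesized expression after > is parsed differently by different awk implementations.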

sed -n '/MARKER/q;p' inputfile > outputfile1
sed -n '/MARKER/{:a;n;p;ba}' inputfile > outputfile2
Or all in one:
sed -n -e '/MARKER/! w outputfile1' -e'/MARKER/{:a;n;w outputfile2' -e 'ba}' inputfile

The split command will almost do what you want (this is BSD split; GNU split has no -p option):
$ split -p '\*MARKER\*' splitee
$ cat xaa
one
two
three
$ cat xab
blah-blah *MARKER* blah-blah
four
five
six
$ tail -n+2 xab
four
five
six
Perhaps it's close enough for your needs.
I have no idea if it does any better with Unicode than csplit, though.

Related

replace different text in different lines using sed

I need to do the following:
I have two files, the first one contains only the lines that are going to be modified:
1
2
3
and the second contains the replacement text for the original file (final_output.txt):
13e
19f
16a
the original file is
wire1: 0x'd318
wire2: 0x'd415
wire3: 0x'd362
I want to get the following:
wire1: 0x13e
wire2: 0x19f
wire3: 0x16a
This is only a part of final_output.txt; the file can contain 100 lines or more. I intend to do it using a for loop, but I don't know how to implement it.
awk to the rescue!
assuming the part after the single quote will be replaced.
$ awk -v q="'" 'NR==FNR {a[$1]=$2;next}
FNR in a {sub(q".*",a[FNR])}1' <(paste index rep) file
index is the index file, rep is the replacement file, and file is the original data file.
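The same idea can be tried without process substitution by passing all three files to awk directly. This is a sketch under assumptions: index.txt, rep.txt, data.txt and patched.txt are placeholder names, and the k-th line of rep.txt is taken to belong to the line number on the k-th line of index.txt.

```shell
# Sample inputs mirroring the question (file names are placeholders).
printf '%s\n' 1 2 3 > index.txt
printf '%s\n' 13e 19f 16a > rep.txt
printf '%s\n' "wire1: 0x'd318" "wire2: 0x'd415" "wire3: 0x'd362" > data.txt

awk -v q="'" '
    FILENAME == ARGV[1] { line[FNR] = $1; next }       # k-th target line number
    FILENAME == ARGV[2] { new[line[FNR]] = $1; next }  # replacement for that line
    FNR in new { sub(q ".*", new[FNR]) }               # replace from the quote to end of line
    1                                                  # print every line
' index.txt rep.txt data.txt > patched.txt

cat patched.txt
```

The result goes to a separate file here so the original data stays untouched while you check it.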
Another solution, where file1 contains the line numbers, file2 contains the replacement text, and final_output.txt contains your original text:
for ((i=1;i<=$(wc -l < file1);i++)); do sed -i "$(sed -n "${i}p" file1)s#$(sed -n "$(sed -n "${i}p" file1)p" final_output.txt | grep -oP "'.*")#$(sed -n "${i}p" file2)#g" final_output.txt; done
Output
darby#Debian:~/Scrivania$ cat final_output.txt
wire1: 0x13e
wire2: 0x19f
wire3: 0x16a
darby#Debian:~/Scrivania$

pass sed a long list of line numbers to remove from a file

I am trying to remove 500+ non-consecutive lines from a very large file with sed.
I have the line numbers stored in a list.txt file, but I can't use it in a for loop
for i in `cat list`; do echo 'sed -i -e ' \'"$i"d\'' huge_file.txt' ; done
because line numbers in the original file would change every time sed removes one and exits.
I should do:
sed -i -e '1d;2d;93572277d;93572278d; ......;nth ' huge_file.txt
Is there a way to pass that list to sed in a file?
you can try with awk:
awk -v s="2,3,..,n" 'BEGIN{n=split(s,t,",");for(i=1;i<=n;i++)d[t[i]]=1}
!d[NR]' huge.txt
The comma-separated line numbers are passed to awk with -v; awk splits them into an array and, for each input line, skips it if its line number is in the array.
Test it on a small file first. If it works as expected, you can do:
awk -v '....' '....' huge.txt > tmp.txt && mv tmp.txt huge.txt
to write the change back to your original input file.
update
If you have the 500 line numbers in another file, one number per line, you can use:
awk 'NR==FNR{a[$0]=1;next}!a[FNR]' ln.txt huge.txt
If it's just for a single particular task (not frequent) you may use the following GNU sed approach (assuming that numbers in list.txt are separated with newline \n):
sed -i "$(sed -z 's/\n/d;/g' list.txt)" huge_file.txt
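The question asks whether the list can be handed to sed as a file. One way, sketched here with made-up sample data, is to turn list.txt into a sed script and run it with -f. The names delete.sed and trimmed.txt are chosen for this example only, and list.txt is assumed to hold one line number per line.

```shell
# Sample data: ten numbered lines and a list of line numbers to delete.
awk 'BEGIN { for (i = 1; i <= 10; i++) print "line " i }' > huge_file.txt
printf '%s\n' 2 3 7 > list.txt

# Turn "2" into "2d", "3" into "3d", and so on, producing a sed script file.
sed 's/$/d/' list.txt > delete.sed

# Apply every deletion in a single pass; line numbers refer to the original
# file, so they do not shift. Write to a new file first and inspect it.
sed -f delete.sed huge_file.txt > trimmed.txt

cat trimmed.txt
```

Because sed reads the input once and checks each line against the whole script, the numbering problem from the for loop does not arise.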

grep - List all lines not containing both patterns

I have a text file with some records. I have two patterns to check, and I want to list all lines from the file that do not contain both patterns. How can I do this using the grep command?
I tried a few things using grep -v but nothing seems to work.
Suppose my text file is as follows.
1. qwerpattern1yui
2. adspattern2asd
3. cczxczc
4. jkjkpattern2adsdapattern1
I want to list lines 1, 2 and 3 only.
Thanks in advance.
You can use:
grep -w -v -e "word1" -e "word2" file
OR else using egrep:
egrep -w -v -e "word1|word2" file
UPDATE: Based on comments, it seems following awk will work better:
awk '!(/pattern1/ && /pattern2/)' file
If I'm keeping up with the comments and edits right, I think this is what you need:
$ grep -E -v 'pattern1.*pattern2|pattern2.*pattern1' test
1. qwerpattern1yui
2. adspattern2asd
3. cczxczc
$
If you like to try awk
awk '!/pattern1|pattern2/' file
It will not print a line if it contains any of the patterns (so it also drops lines that contain only one of them).
You can also expand this:
awk '!/pattern1|pattern2|pattern3|pattern4/' file
Example
cat file
one
two
three
four
one two
two
nine
six two
remove all lines with one or two or both of them.
awk '!/one|two/' file
three
four
nine
While the standard tools-based answers (awk, grep, etc.) are generally simpler and more straightforward, for completeness, if you need a pure-bash solution, you could do this:
$ while IFS= read -r ln; do [[ $ln =~ pattern1 ]] && [[ $ln =~ pattern2 ]] && continue; printf "%s\n" "$ln"; done < test
1. qwerpattern1yui
2. adspattern2asd
3. cczxczc
$
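The answers in this thread mix two different filters, so a side-by-side run on the sample data may help. This is a sketch: test.txt, not_both.txt and neither.txt are names invented for the example.

```shell
# Sample data from the question (hypothetical file name).
printf '%s\n' '1. qwerpattern1yui' '2. adspattern2asd' '3. cczxczc' \
    '4. jkjkpattern2adsdapattern1' > test.txt

# Drop a line only when BOTH patterns occur in it (what the question asks).
awk '!(/pattern1/ && /pattern2/)' test.txt > not_both.txt

# Drop a line when EITHER pattern occurs in it (a stricter filter).
grep -v -e 'pattern1' -e 'pattern2' test.txt > neither.txt

cat not_both.txt   # lines 1, 2 and 3
cat neither.txt    # line 3 only
```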

sed command in Linux: how to use sed to replace only the first n occurrences

I want to replace only the first four occurrences of LC-COUNT=1. How can I do that?
sed -i "s/LC-COUNT=1/LC-COUNT=$LC_COUNT/1,4" file.txt
Try this -
sed -e '0,/LC-COUNT=1/s//LC-COUNT=\$LC_COUNT/' file.txt > output.txt
Run once, this replaces the first occurrence of LC-COUNT=1 with the literal text LC-COUNT=$LC_COUNT and writes the result to output.txt. Note: the $ is escaped so that it is inserted literally. The 0,/regexp/ address is a GNU sed extension.
To replace the first four occurrences you would have to run it four times, each time using the previous run's output as the input.
I don't think sed can find and replace only the first N occurrences in a single step.
In vim you can do something similar:
:%s/LC-COUNT=1/LC-COUNT=\$LC_COUNT/gc
The gc flags make vim ask for confirmation on each replacement, so you can accept the first four and then quit.
This is better suited for awk.
Consider this awk command:
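awk -F= '$1=="LC-COUNT" && $2=="1" && c<4 {$2="=$LC_COUNT";c++}1' OFS= file
This assumes each LC-COUNT=1 sits alone on its own line, and it inserts the literal text $LC_COUNT. The c<4 test limits the change to the first four matches.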
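If the value of the shell variable should be inserted rather than the literal text, it can be passed to awk with -v. The following is a sketch only: the sample file contents and the value 7 are invented, n and new are names chosen for the example, and it assumes at most one occurrence per line.

```shell
# Sample input: six occurrences of LC-COUNT=1 plus one unrelated line.
printf '%s\n' 'LC-COUNT=1' 'x=2' 'LC-COUNT=1' 'LC-COUNT=1' \
    'LC-COUNT=1' 'LC-COUNT=1' 'LC-COUNT=1' > file.txt
LC_COUNT=7   # example value; the real one would come from your script

# sub() returns 1 when it replaced something, so c counts the replacements
# and the condition stops matching once n of them have been made.
# Caveat: the pattern would also match the start of a value such as LC-COUNT=10.
awk -v n=4 -v new="LC-COUNT=$LC_COUNT" '
    c < n && sub(/LC-COUNT=1/, new) { c++ }
    1
' file.txt > output.txt

cat output.txt
```

With the sample above, the first four occurrences become LC-COUNT=7 and the last two stay as LC-COUNT=1.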

Move lines between text files in shell script

Basically what I'm trying to do is move lines 1 through 4 from A.txt
and replace the lines 5 through 8 in B.txt with them.
I figured out how to get the first four lines with sed,
but I cannot figure out how to "send" them to replace the lines in the second txt file.
cat A.txt
1 a
2 b
3 c
4 d
5 e
cat B.txt
one
two
three
four
five
six
seven
eight
nine
Result
one
two
three
four
1 a
2 b
3 c
4 d
nine
This might work for you (GNU sed):
sed -i -e '5,8R A.txt' -e '5,8d' B.txt
For your example, this awk one-liner works too:
awk 'NR>4&&NR<9{getline $0<"A.txt"}7' B.txt
This prints the expected output; redirect it to a temporary file and move that over B.txt if you want to save the result.
This awk should do:
awk 'FNR==NR {a[NR]=$0;next} FNR>=5 && FNR<=8 {$0=a[FNR-4]}1' A.txt B.txt > tmp && mv tmp B.txt
It stores the lines of A.txt in an array named a
Then if line number of B.txt is between 5 and 8 replace value using info from array a
Result is stored in a temp file tmp and then moved back to B.txt
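The same result can also be assembled from plain head and tail calls. This is a sketch with the question's sample data recreated first; B.txt.new is a temporary name chosen for the example, and A.txt is left unchanged.

```shell
# Recreate the sample files from the question (run this in a scratch directory).
printf '%s\n' '1 a' '2 b' '3 c' '4 d' '5 e' > A.txt
printf '%s\n' one two three four five six seven eight nine > B.txt

{
    head -n 4 B.txt     # lines 1-4 of B.txt stay
    head -n 4 A.txt     # lines 1-4 of A.txt take the place of lines 5-8
    tail -n +9 B.txt    # line 9 onward of B.txt stays
} > B.txt.new && mv B.txt.new B.txt

cat B.txt
```

Writing to a temporary file and renaming it avoids reading and truncating B.txt at the same time.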
#!/usr/local/bin/bash -x
sed -n '1,4p' B.txt > B.txt.tmp
sed -n '1,4p' A.txt >> B.txt.tmp
sed -n '9,$p' B.txt >> B.txt.tmp
mv B.txt B.txt.bak
mv B.txt.tmp B.txt
This is static. Still, as long as you know your line addresses, this will work.
If you want support for variable span lengths, you will need to do something like this in your files:-
#----------numbers-begin----------
one
two
three
four
#----------numbers-end----------
From there, you can get to them inside the file with:-
sed -n '/--numbers-begin--/,/--numbers-end--/p' <filename> > newfile
Not only does that give you anchors to play with, but sed printing is my own preferred method of importing strings for variables in scripts, because it doesn't cause the shell to try and literally interpret the text as a command, as cat does for some reason.
The other thing that you can do in future files, is something like this:-
numbers:one
numbers:two
numbers:three
numbers:four
words:dog
words:cat
words:rat
Then:-
#!/usr/local/bin/bash
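# Read the file line by line so lines containing spaces stay intact.
while IFS= read -r i
do
# Keep only lines whose key is "numbers".
if [ -n "$(echo "${i}" | sed -n '/^numbers:/p')" ]
then
echo "${i}" | cut -d':' -f2 >> numbers-only-file
fi
done < file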
Data structuring. It's all about the data structuring. Structure your data properly, and you will have practically no work at all.
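With the key:value layout described above, the extraction can also be done in one awk call, since the colon can serve as the field separator. A minimal sketch, assuming the sample lines are stored in a file called tagged.txt (a name made up here):

```shell
# Sample data in the key:value layout shown above (hypothetical file name).
printf '%s\n' numbers:one numbers:two numbers:three numbers:four \
    words:dog words:cat words:rat > tagged.txt

# Print the value (field 2) of every line whose key (field 1) is "numbers".
awk -F: '$1 == "numbers" { print $2 }' tagged.txt > numbers-only-file

cat numbers-only-file
```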
