Unix: Split a file into two based on matched string - bash

I want to split a file into two, but cannot find a way to do this.
Master.txt
Happy Birthday to you! [[#HAPPY]]
Stop it. [[#COMMAND]]
Make a U-turn. [[#COMMAND]]
I want to split it into two files, with the second file's content starting wherever the pattern [[# matches on each line.
Output1.txt
Happy Birthday to you!
Stop it.
Make a U-turn.
Output2.txt
[[#HAPPY]]
[[#COMMAND]]
[[#COMMAND]]
I've tried using awk:
awk -v RS="[[#*" '{ print $0 > "temp" NR }'
but it doesn't give my desired output -- any help would be appreciated!

Here is one way with GNU awk:
awk -v RS='\\[\\[#|\n' 'NR%2{print $0>"Output1.txt";next}{print "[[#"$0>"Output2.txt"}' master
Test:
$ ls
master
$ cat master
Happy Birthday to you! [[#HAPPY]]
Stop it. [[#COMMAND]]
Make a U-turn. [[#COMMAND]]
$ awk -v RS='\\[\\[#|\n' 'NR%2{print $0>"Output1.txt";next}{print "[[#"$0>"Output2.txt"}' master
$ ls
master Output1.txt Output2.txt
$ head Out*
==> Output1.txt <==
Happy Birthday to you!
Stop it.
Make a U-turn.
==> Output2.txt <==
[[#HAPPY]]
[[#COMMAND]]
[[#COMMAND]]
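If it helps to see why the NR%2 test works: a regex RS is a GNU awk extension (POSIX awk only honours the first character of RS), and with RS='\\[\\[#|\n' every input line is split into two records, the text before the tag and the tag itself. A small sketch to visualise the record numbering:
awk -v RS='\\[\\[#|\n' '{printf "record %d: <%s>\n", NR, $0}' master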

A pure bash solution might be a little slower, but is very readable:
while IFS= read -r line; do
    [[ $line =~ (.*)\ (\[\[#.*]]) ]] || continue
    printf '%s\n' "${BASH_REMATCH[1]}" >&3
    printf '%s\n' "${BASH_REMATCH[2]}" >&4
done < Master.txt 3> output1.txt 4> output2.txt

You can write a small script like this:
#!/bin/ksh
sed -i -e 's/ \[\[#/,\[\[#/' $1
cut -d, -f1 $1 > $1.part1
cut -d, -f2 $1 > $1.part2
---------------------------------------------
Or use it as a multi-command one-liner:
$ sed -i -e 's/ \[\[#/,\[\[#/' Master.txt ; cut -d, -f1 Master.txt > output1.txt ; cut -d, -f2 Master.txt > output2.txt
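For reference (not shown in the original answer), after the in-place sed the master file would look like this; the comma serves only as a delimiter for cut, so this approach assumes the text itself never contains a comma:
$ cat Master.txt
Happy Birthday to you!,[[#HAPPY]]
Stop it.,[[#COMMAND]]
Make a U-turn.,[[#COMMAND]]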

Simpler in sed, IMHO:
$ sed 's/^\([^[]*\).*/\1/' Master.txt > Output1.txt
$ sed 's/^[^[]*//' Master.txt > Output2.txt

sed -n 's/\[\[#/\
&/;P
/\n/ {s/.*\n//;H;}
$ {x;s/\n//;w Output2.txt
}' YourFile > Output1.txt
All in one sed call, but awk is better suited for this task.

This might work for you (GNU sed):
sed -n 's/\[\[#/\n&/;P;s/.*\n//w file3' file1 >file2
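Roughly how that one-liner works, written out with comments (my reading of it; the \n in the replacement text is a GNU sed extension, which is why the previous sed answer uses a literal escaped newline instead):
sed -n '
    # put a newline in front of the first [[# on the line
    s/\[\[#/\n&/
    # print up to that newline: the text part goes to stdout, i.e. file2
    P
    # delete through the newline and write the remaining [[#...]] tag to file3
    s/.*\n//w file3
' file1 > file2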

No need for GNU awk; this should work with any awk:
awk -F'\\[\\[#' '{print $1>"Output1.txt";print "[[#"$2>"Output2.txt"}' Master.txt
cat Output1.txt
Happy Birthday to you!
Stop it.
Make a U-turn.
cat Output2.txt
[[#HAPPY]]
[[#COMMAND]]
[[#COMMAND]]
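One small caveat not visible in the output above: with -F'\\[\\[#' the first field keeps the blank that precedes the tag, so the Output1.txt lines end with a trailing space. If that matters, folding the space into the field separator is one option, e.g.:
awk -F' *\\[\\[#' '{print $1>"Output1.txt";print "[[#"$2>"Output2.txt"}' Master.txt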

Related

Cut a field from one file and grep -v from another file

File1
Ada
Billy
Charles
Delta
Eight
File2
Ada,User,xxx
Beba,User,xxx
Charles,Admin,xxx
I am executing the following:
Acc=`cut -d',' -f1 $PATH/File2`
for account in `cat $File1 |grep -v Acc`
do
cat......
sed....
How can I correct this?
Expected output:
Check which accounts from File2 exist in File1:
Ada
Charles
This awk should work for you:
awk -F, 'FNR == NR {seen[$1]; next} $1 in seen' file2 file1
Ada
Charles
If this is not the output you're looking for then edit your question and add your expected output.
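For anyone new to the FNR == NR idiom, here is the same command spread out with comments; nothing changes logically, and file2 still has to be listed first:
awk -F, '
    FNR == NR { seen[$1]; next }   # first file (file2): remember each account name
    $1 in seen                     # second file (file1): print lines whose name was seen
' file2 file1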
Your grep command searches for lines which do not contain the literal string Acc. You need the flag -f, which causes grep to read a list of patterns from a file, something like this:
tmpf=/tmp/$$
cut -d',' -f1 File2 >$tmpf
for account in $(grep -f "$tmpf" File1)
do
...
done
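A hedged alternative that avoids the temp file, if the shell supports process substitution; -F treats the accounts as fixed strings and -x requires whole-line matches:
grep -Fxf <(cut -d',' -f1 File2) File1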

Is it possible to pipe head output to sed?

Input file
hello how are u
some what doing fine
so
thats all
huh
thats great
cool
gotcha im fine
I want to remove the last 4 lines without redirecting to another file, i.e., an in-place edit.
I used head -n -3 input.txt but it's removing only the last 2 lines.
I also wanted to understand whether it is possible to pipe head's output to sed,
like head -n -3 input.txt | sed ...
Yes, I went through sed's options for removing the last n lines, like the one below, but couldn't understand the nuances of the command, so I went with the head alternative:
sed -e :a -e '$d;N;2,5ba' -e 'P;D' file
EDIT: A solution without creating a temp file:
awk -i inplace -v lines=$(wc -l < Input_file) 'FNR<=(lines-4)' Input_file
Could you please try the following and let me know if this helps you:
tac Input_file | tail -n +5 | tac > temp_file && mv temp_file Input_file
Second solution, using awk:
awk -v lines=$(wc -l < Input_file) 'FNR<=(lines-4)' Input_file > temp_file && mv temp_file Input_file
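To answer both parts of the question directly (assuming GNU head, since negative counts to -n are a GNU extension): head -n -4 prints everything except the last 4 lines, and yes, its output pipes into sed like any other stream; the sed substitution below is just an arbitrary example:
head -n -4 input.txt > temp_file && mv temp_file input.txt
head -n -4 input.txt | sed 's/thats/that is/'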

Creating a script that checks to see if each word in a file exists in another file

I am pretty new to Bash and scripting in general and could use some help. Each word in the first file is separated by \n, while the second file could contain anything. If a string from the first file is not found in the second file, I want to output it. Pretty much: "check if these words are in those words and tell me the ones that are not".
File1.txt contains something like:
dog
cat
fish
rat
file2.txt contains something like:
dog
bear
catfish
magic ->rat
I know I want to use grep (or do I?) and the command would be (to my best understanding):
$foo.sh file1.txt file2.txt
Now for the script...
I have no idea...
grep -iv $1 $2
Give this a try. It is straightforward and not optimized, but it does the trick (I think):
while read line ; do
fgrep -q "$line" file2.txt || echo "$line"
done < file1.txt
There is a funny version below, with 4 parallel fgrep calls and an additional result.txt file.
> result.txt
nb_parallel=4
while read line ; do
while [ $(jobs | wc -l) -gt "$nb_parallel" ]; do sleep 1; done
fgrep -q "$line" file2.txt || echo "$line" >> result.txt &
done < file1.txt
wait
cat result.txt
You can increase the value 4 to run more parallel fgrep processes, depending on the number of CPUs and cores and the IOPS available.
With the -f flag you can tell grep to read its patterns from a file.
grep -vf file2.txt file1.txt
To get a good match on complete lines, use
grep -vFxf file2.txt file1.txt
As @anubhava commented, this will not match substrings. To fix that, we will use the result of grep -Fof file1.txt file2.txt (all the relevant keywords).
Combining these will give
grep -vFxf <(grep -Fof file1.txt file2.txt) file1.txt
Using awk you can do:
awk 'FNR==NR{a[$0]; next} {for (i in a) if (index(i, $0)) next} 1' file2 file1
rat
You can simply do the following:
comm -2 -3 file1.txt file2.txt
and also:
diff -u file1.txt file2.txt
I know you were looking for a script, but I don't think there is any reason for one; if you still want a script, you can just run these commands from it.
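One caveat worth noting with comm: it expects both inputs to be sorted, and it compares whole lines, so catfish would not count as containing cat. For unsorted files, a sketch would be:
comm -23 <(sort file1.txt) <(sort file2.txt)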
A similar awk:
$ awk 'NR==FNR{a[$0];next} {for(k in a) if(k~$0) next}1' file2 file1
rat

How to cut and overwrite a certain section from a file BASH

Content in dl.txt: "127. www.example.com"
I have tried:
#cat dl.txt|egrep -v "^[0-9]+.[ ]" > dl.txt
#cat dl.txt|egrep "www.example.com" > dl.txt
Could this maybe be done in awk?
If you mean you want to alter the contents of the file dl.txt and delete "127. ", you can use sed:
sed -i.bak 's/127. //' dl.txt
Then you will see that dl.txt is changed and dl.txt.bak is a backup copy.
Likewise if you want to remove the "www.example.com"
sed -i.bak 's/www.example.com//' dl.txt
Or if you want to delete everything up to, and including the space on each line:
sed -i.bak 's/.* //' dl.txt
Or using awk:
awk '{print $1}' dl.txt
127.
awk '{print $2}' dl.txt
www.example.com
Or you can do it nearly in-place with awk like this, only overwriting the original if the awk is successful:
awk '{print $2}' dl.txt > $$.tmp && mv $$.tmp dl.txt
The command below should get rid of the 127. before the URL, or any IP address for that matter:
sed -i 's/[0-9]\+\.\?\s*//g' dl.txt
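As an aside on the attempts in the question: cat dl.txt | ... > dl.txt fails because the shell truncates dl.txt for the output redirection before cat has reliably read it, which is why sed -i or a temp file is used above. A minimal sketch of the temp-file route with grep (grep -o keeps only the matched text; dl.tmp is just an arbitrary name):
grep -o 'www\.example\.com' dl.txt > dl.tmp && mv dl.tmp dl.txt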

Using awk to put a header in a text file

I have lots of text files and need to put a header on each one of them, depending on the data in each file.
This awk command accomplishes the task:
awk 'NR==1{first=$1}{sum+=$1;}END{last=$1;print NR,last,"L";}' my_text.file
But this prints to the screen, and I want to put this output as a header in each of my files, saving the modifications under the same file name.
Here is what I've tried:
for i in *.txt
do
echo Processing ${i}
cat awk 'NR==1{first=$1}{sum+=$1;}END{last=$1;print NR,last,"L";}' "${i}" ${i} > $$.tmp && mv $$.tmp "${i}"
done
So I guess I can't use cat to put them as a header, or am I doing something wrong?
Thanks in advance
UPDATE:
with awk:
awk 'BEGIN{print "header"}1' test.txt
without awk, using cat & echo:
cat <(echo "header") test.txt
(OR)
using tac:
{ tac test.txt; echo "header"; } | tac > temp_file && mv temp_file test.txt
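The awk and cat versions above print to stdout rather than touching test.txt; if an in-place rewrite is wanted without a temp file, one sketch (GNU awk 4.1+ for -i inplace, the same extension used in an answer further up; note FNR==1 rather than BEGIN, because output from a BEGIN rule is not redirected into the file by the inplace extension):
gawk -i inplace 'FNR==1{print "header"} {print}' test.txt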
I THINK what you're trying to do with your loop is:
for i in *.txt
do
echo "Processing $i"
awk 'NR==1{first=$1}{sum+=$1}END{last=$1;print NR,last,"L"}' "$i" > $$.tmp &&
cat "$i" >> $$.tmp &&
mv $$.tmp "$i"
done
but it's not clear what you're really trying to do, since you never use first or sum. Also, setting last from $1 in the END section is a bad idea, as it will not work across all awks, and there is a simple alternative.
If you update your question with some sample input and expected output we can help you.
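The simple alternative hinted at there is to capture the value while reading each line, instead of referencing $1 in the END block (which, as noted above, not every awk preserves); a sketch of the original one-liner rewritten that way:
awk '{sum+=$1; last=$1} END{print NR, last, "L"}' my_text.file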
