How to merge multiple files into single file in given directories - shell

I want to write a shell script to merge the contents of multiple files in the given directories.
DIR1 contains sample1.txt sample2.txt
sample1.txt contents :---this is sample1 file
sample2.txt contents :---this is sample2 file
DIR2 contains demo1.txt demo2.txt
demo1.txt contents :---this is demo1 file
I tried :
(find /home/DIR1 /home/DIR2 -type f | xargs -i cat {} ) > /result/final.txt
It worked!
this is sample2 file this is sample1 file this is demo1 file
However, the output appears on a single line; I need each file's contents on its own line,
like this:
this is sample1 file
this is sample2 file
this is demo1 file
How can I achieve this?
Any help would be appreciated.

Your files don't end with newlines, and therefore there are no newlines in the output file.
You should either make sure your input files end with newlines, or add them in the find command:
find /home/DIR1 /home/DIR2 -type f -exec cat {} \; -exec echo \;
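A minimal sketch of that find command in action, assuming a scratch directory created with mktemp (the work variable and the final.txt name are hypothetical, not from the question). The input files are written with printf and no trailing \n to reproduce the problem, and the result is redirected to a file outside the searched directories:

```shell
#!/bin/sh
# Hypothetical scratch layout mirroring the question's DIR1 and DIR2.
work=$(mktemp -d)
mkdir "$work/DIR1" "$work/DIR2"

# printf without a trailing \n creates files that lack a final newline.
printf 'this is sample1 file' > "$work/DIR1/sample1.txt"
printf 'this is sample2 file' > "$work/DIR1/sample2.txt"
printf 'this is demo1 file'   > "$work/DIR2/demo1.txt"

# For every file: cat prints its contents, then echo supplies the newline.
find "$work/DIR1" "$work/DIR2" -type f -exec cat {} \; -exec echo \; > "$work/final.txt"

cat "$work/final.txt"
```

The order of the lines follows the order in which find visits the files, which is not guaranteed to be alphabetical. Files that already end with a newline will be followed by an extra blank line.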

As pointed out in the comments, the issue is most likely that the last line of each file is not terminated by a newline character (\n) before the end of file (EOF).
One way of circumventing this is to run sed on each file separately: with GNU sed, the expression $a\ adds the missing trailing newline when it prints the last line. Hopefully the following command resolves the issue:
find DIR1/ DIR2/ -type f | xargs -I{} sed -e '$a\' {} > result/final.txt

Related

Delete lines X to Y using Mac Unix Sed

Command line on a Mac. Have some text files. Want to remove certain lines from a group of files, then cat the remaining text of the file to a new merged file. Currently have the following attempt:
for file in *.txt;
do echo $file >> tempfile.html;
echo ''>>tempfile.html;
cat $file>>tempfile.html;
find . -type f -name 'tempfile.html' -exec sed -i '' '3,10d' {} +;
find . -type f -name 'tempfile.html' -exec sed -i '' '/<ACROSS>/,$d' {} +;
# ----------------
# some other stuff
# ----------------
done;
I am extracting a section of text from a bunch of files and concatenating them all together, but I still need to know which file each selection came from. First I append the name of the file, then (supposedly) the selection of text from that file, then repeat the process for the next file.
Plus, I need to leave the original text files in place for other purposes.
So the concatenated file would be:
filename1.txt
text-selection
more_text
filename2.txt
even-more-text
text-text-test-test
The first sed is supposed to delete lines 3 to 10. The second is supposed to delete from the line containing <ACROSS> to the end of the file.
However, what happens is that the first deletes everything in the tempfile, and the second does nothing. (Each was tested separately.)
What am I doing wrong?
I must be missing something. Even what appears to be a very simple example does not work. My hope was that the following would delete lines 3-10 but save the rest of the file to test.txt.
sed '3,10d' nxd2019-01-06.txt > test.txt
Your invocation of find (with {} +) will run sed with as many files as possible per call. But note: addresses in sed do not refer to lines in each input file; they refer to the whole input of sed, which can consist of many input files.
Try this:
> a.txt cat <<EOF
1
2
EOF
> b.txt cat <<EOF
3
4
EOF
Now try this:
sed 1d a.txt b.txt
2
3
4
As you can see, sed removed the first line of a.txt but not the first line of b.txt, because the address 1 refers to the first line of the combined input.
The problem in your case is the second invocation of find. It will remove everything from the first occurrence of <ACROSS> to the last line of the last file found by find. This effectively removes the content of all but the first tempfile.html.
Assuming that the remaining logic in your script works, you should just change the find invocations to:
find . -type f -name 'tempfile.html' -exec sed -i '' '3,10d' {} \;
find . -type f -name 'tempfile.html' -exec sed -i '' '/<ACROSS>/,$d' {} \;
This would call sed once per input file.
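Since the goal is a merged file that leaves the originals untouched, a hedged alternative is to skip the in-place edits of tempfile.html and filter each source file on its way into the merged file. The sketch below assumes hypothetical inputs file1.txt and file2.txt and a hypothetical output name merged.out; the 3,10 range is taken from the question, but here it counts lines of each source file rather than lines of the merged file:

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

# Hypothetical inputs: 11 numbered lines, an <ACROSS> marker, and a trailer.
for n in 1 2; do
    { seq 1 11 | sed "s/^/file$n line /"; echo '<ACROSS>'; echo 'trailing text'; } > "file$n.txt"
done

: > "$work/merged.out"   # start with an empty result file (not matched by *.txt)
for file in *.txt; do
    echo "$file" >> "$work/merged.out"
    # One sed call per file, so both addresses apply to this file only.
    sed -e '3,10d' -e '/<ACROSS>/,$d' "$file" >> "$work/merged.out"
done

cat "$work/merged.out"
```

Because sed writes to standard output here, the original .txt files are never modified. Note that if the <ACROSS> line itself fell inside lines 3 to 10, it would be deleted before the second expression could start its range.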

how to replace a pattern in only one file from current directory

Suppose I have 4 files with a .txt extension in the current directory, and I want to replace a pattern in only one of them. The conditions are that we don't know the name of the file and that the replacement must be done in only one file. The other files should not be affected.
example:
user/home>ls -lart *.txt
a1.txt
b1.txt
c1.txt
d1.txt
The word "Day" appears in all of these files. I want to replace it with "Night" in only one file without affecting the others. How can I do this?
I tried below, but it replaces the pattern in all 4 files.
find . -type f -name "*.txt"|xargs sed 's/Day/Night/g'
Search for a file with the pattern and replace inplace with the -i flag:
sed -i 's/Day/Night/' $(grep -l "Day" *.txt | head -1 )
There are many ways to do that. For example:
for file in *.txt; do
    sed -i 's/Day/Night/g' "$file"
    break
done
This edits only the first .txt file in the current directory, so it should be both fast and robust.
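The grep -l approach can be sketched end to end as follows. This is only an illustration under assumptions: the four files and their contents are made up, the target variable is hypothetical, and the edit goes through a temporary file instead of sed -i so that it does not depend on GNU or BSD sed behaviour:

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

# Hypothetical test data: four files that all contain the word "Day".
for f in a1 b1 c1 d1; do
    echo 'Good Day' > "$f.txt"
done

# grep -l lists the files that contain the pattern; head keeps the first one.
target=$(grep -l 'Day' ./*.txt | head -n 1)

if [ -n "$target" ]; then
    # Write the result to a temporary file, then move it over the original.
    sed 's/Day/Night/g' "$target" > "$target.tmp" && mv "$target.tmp" "$target"
fi

# Show the per-file count of lines containing "Night".
grep -c 'Night' ./*.txt
```

The check on target avoids calling sed without a file name when no file matches. File names containing newlines would confuse the head step.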

bash- use filename to append to every line in each file using sed

I have multiple files named as such --> 100.txt, 101.txt, 102.txt, etc.
The files are located within a directory. For every one of these files, I need to append the number before the extension in the file name to every line in the file.
So if the file content of 100.txt is:
blahblahblah
blahblahblah
...
I need the output to be:
blahblahblah 100
blahblahblah 100
...
I need to do this using sed.
My current code looks like this, but it is ugly and not very concise:
dir=$1
for file in $dir/*
do
base=$(basename $file)
filename="${base%.*}"
sed "s/$/ $filename/" $file
done
Is it possible to do this in such a way?
find $dir/* -exec sed ... {} \;
The code you already have is essentially the simplest, shortest way of performing the task in bash. The only changes I would make are to pass -i to sed, assuming you are using GNU sed (otherwise you will need to redirect the output to a temporary file, remove the old file, and move the new file into its place), and to provide a default value in case $1 is empty.
dir="${1:-.}"
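For a sed without -i, the temporary-file route mentioned above might look like the sketch below. The scratch directory and the two sample files are assumptions for the demonstration, and the number variable is a hypothetical name; the loop itself mirrors the code in the question:

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory.
dir=$(mktemp -d)
printf 'blahblahblah\nblahblahblah\n' > "$dir/100.txt"
printf 'foo\n' > "$dir/101.txt"

for file in "$dir"/*.txt; do
    base=$(basename "$file")
    number=${base%.*}          # strip the extension: 100.txt -> 100
    # \$ keeps the end-of-line anchor literal inside double quotes.
    sed "s/\$/ $number/" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
done

cat "$dir/100.txt"
```

Quoting "$file" and "$dir" keeps the loop working for paths that contain spaces.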
The following command line finds all files whose names start with digits followed by an extension, and appends the numeric part of the filename to the end of each non-empty line in that file. (I tested it with a couple of files.)
find <directory path> -type f -name '[0-9]*' -exec bash -c 'num=$(basename "$1" | sed "s/^\([0-9]\{1,\}\)\..*/\1/"); sed -i.bak "s/.$/& $num/" "$1"' _ {} \;
Note: this command line has not been tested with the sed on OS X.
Replace <directory path> with the path of your directory.

Recursive cat with file names

I'd like to cat recursively several files with same name to another file. There's an earlier question "Recursive cat all the files into single file" which helped me to get started. However I'd like to achieve the same so that each file is preceded by the filename and path, different files preferably separated with a blank line or ----- or something like that. So the resulting file would read:
files/pipo1/foo.txt
flim
flam
floo
files/pipo2/foo.txt
plim
plam
ploo
Any way to achieve this in bash?
Of course! Instead of just cating the file, you just chain actions to print the filename, cat the file, then add a line feed:
find . -name 'foo.txt' \
-print \
-exec cat {} \; \
-printf "\n"
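The -printf action is a GNU find extension. Where it is unavailable, a hedged variant is to let echo print the separating blank line, as sketched below on a made-up tree that matches the question's example (the all.txt output name is hypothetical):

```shell
#!/bin/sh
work=$(mktemp -d)
cd "$work" || exit 1

# Hypothetical tree matching the example in the question.
mkdir -p files/pipo1 files/pipo2
printf 'flim\nflam\nfloo\n' > files/pipo1/foo.txt
printf 'plim\nplam\nploo\n' > files/pipo2/foo.txt

# For each match: print its path, print its contents, then an empty line.
find files -name 'foo.txt' -print -exec cat {} \; -exec echo \; > "$work/all.txt"

cat "$work/all.txt"
```

The actions run left to right for every matching file, which is why the path always precedes the contents.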

combining grep and find to search for file names from query file

I've found many similar examples but cannot find an example to do the following. I have a query file with file names (file1, file2, file3, etc.) and would like to find these files in a directory tree; these files may appear more than once in the dir tree, so I'm looking for the full path. This option works well:
find path/to/files/*/* -type f | grep -E "file1|file2|file3|fileN"
What I would like is to pass grep a file with filenames, e.g. with the -f option, but am not successful. Many thanks for your insight.
The query file contains one column of filenames, one per line (separated by '\n'). This is what it looks like:
103128_seqs.fna
7010_seqs.fna
7049_seqs.fna
7059_seqs.fna
7077A_seqs.fna
7079_seqs.fna
grep -f FILE reads the patterns to match from FILE, one per line:
cat files_to_find.txt
n100079_seqs.fna
103128_seqs.fna
7010_seqs.fna
7049_seqs.fna
7059_seqs.fna
7077A_seqs.fna
7079_seqs.fna
Remove any whitespace (or do it manually):
perl -i -nle 'tr/ //d; print if length' files_to_find.txt
Create some files to test:
touch `cat files_to_find.txt`
Use it:
find ~/* -type f | grep -f files_to_find.txt
output:
/home/user/tmp/7010_seqs.fna
/home/user/tmp/103128_seqs.fna
/home/user/tmp/7049_seqs.fna
/home/user/tmp/7059_seqs.fna
/home/user/tmp/7077A_seqs.fna
/home/user/tmp/7079_seqs.fna
/home/user/tmp/n100079_seqs.fna
Is this what you want?
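One refinement worth considering: the names in the query file contain dots, which grep treats as "any character" in a regular expression. Adding -F makes grep match the lines as fixed strings. A small sketch, assuming a made-up directory tree and a hypothetical query.txt:

```shell
#!/bin/sh
work=$(mktemp -d)

# Hypothetical tree in which one name occurs in two directories.
mkdir -p "$work/tree/a" "$work/tree/b"
touch "$work/tree/a/7010_seqs.fna" "$work/tree/b/7010_seqs.fna" \
      "$work/tree/a/7049_seqs.fna" "$work/tree/b/other.fna"

# Query file: one filename per line.
printf '7010_seqs.fna\n7049_seqs.fna\n' > "$work/query.txt"

# -f reads the patterns from the file; -F treats them as literal strings.
find "$work/tree" -type f | grep -F -f "$work/query.txt" | sort > "$work/hits.txt"

cat "$work/hits.txt"
```

This still matches substrings of the path, so a query line such as 7010_seqs.fna would also report a file named 17010_seqs.fna.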
