Recursive cat with file names - bash

I'd like to recursively cat several files with the same name into another file. There's an earlier question, "Recursive cat all the files into single file", which helped me get started. However, I'd like each file's content to be preceded by its filename and path, with different files separated by a blank line, a row of -----, or something similar. So the resulting file would read:
files/pipo1/foo.txt
flim
flam
floo
files/pipo2/foo.txt
plim
plam
ploo
Any way to achieve this in bash?

Of course! Instead of just cat-ing the file, chain actions to print the filename, cat the file, then add a line feed:
find . -name 'foo.txt' \
-print \
-exec cat {} \; \
-printf "\n"
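Note that -printf is a GNU find extension. If your find lacks it (e.g. on BSD/macOS), a small sh -c wrapper produces the same output; this is a sketch assuming a POSIX sh:
find . -name 'foo.txt' -exec sh -c '
    printf "%s\n" "$1"    # print the path, as -print would
    cat "$1"              # dump the file contents
    echo                  # blank line between files
' _ {} \;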

Related

Delete lines X to Y using Mac Unix Sed

Command line on a Mac. Have some text files. Want to remove certain lines from a group of files, then cat the remaining text of each file to a new merged file. Currently I have the following attempt:
for file in *.txt; do
    echo $file >> tempfile.html
    echo '' >> tempfile.html
    cat $file >> tempfile.html
    find . -type f -name 'tempfile.html' -exec sed -i '' '3,10d' {} +
    find . -type f -name 'tempfile.html' -exec sed -i '' '/<ACROSS>/,$d' {} +
    # ----------------
    # some other stuff
    # ----------------
done
I am extracting a section of text from a bunch of files and concatenating them all together, but I still need to know which file each selection originated from. First I concatenate the name of the file, then (supposedly) the selection of text from that file, then repeat the process.
Plus, I need to leave the original text files in place for other purposes.
So the concatenated file would be:
filename1.txt
text-selection
more_text
filename2.txt
even-more-text
text-text-test-test
The first sed is supposed to delete from line 3 to line 10. The second is supposed to delete from the line containing <ACROSS> to the end of the file.
However, what happens is that the first deletes everything in the tempfile, and the second one does nothing. (Each was tested separately.)
What am I doing wrong?
I must be missing something. Even what appears to be a very simple example does not work. My hope was that the following would delete lines 3-10 but save the rest of the file to test.txt:
sed '3,10d' nxd2019-01-06.txt > test.txt
Your invocation of find will attempt to run sed with as many files as possible per call. But note: addresses in sed do not address lines in each input file; they address the whole input of sed, which can consist of many input files.
Try this:
> a.txt cat <<EOF
1
2
EOF
> b.txt cat <<EOF
3
4
EOF
Now try this:
sed 1d a.txt b.txt
2
3
4
As you can see, sed removed the first line of a.txt, not of b.txt: both files were treated as a single input stream.
The problem in your case is the second invocation of find. It will remove everything from the first occurrence of <ACROSS> up to the last line of the last file found by find. This effectively removes the content from all but the first tempfile.html.
Assuming the remaining logic in your script works, you just need to change the find invocations to:
find . -type f -name 'tempfile.html' -exec sed -i '' '3,10d' {} \;
find . -type f -name 'tempfile.html' -exec sed -i '' '/<ACROSS>/,$d' {} \;
This would call sed once per input file.
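As an aside, GNU sed (not the BSD sed shipped with macOS) also has -s/--separate, which makes addresses apply per input file even within a single invocation:
sed -s 1d a.txt b.txt
2
4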

How to move files based on file names in a .csv doc - macOS Terminal?

Terminal noob needs a little help :)
I have a 98-row-long filename list in a .csv file. For example:
name01; name03, etc.
I have an external hard drive with a lot of files in a chaotic file structure. BUT the file names are consistent, something like:
name01_xy; name01_zq; name02_xyz etc.
I would like to copy every file and directory from the external hard drive which begins with a filename stored in the .csv file to my computer.
So basically it's a search-and-copy based on a text file, from an eHDD to my computer. I guess the easiest way to do this is a Terminal command. Do you have any advice? Thanks in advance!
The task can be split into three steps: read the search criteria from the file; find files by those criteria; copy the found files. We discuss each step separately, then combine them into a one-liner:
Read search criteria from .csv file
Since your .csv file is pretty much just a text file with one criterion per line, it's pretty easy: just cat the file.
$ cat file.csv
bea001f001
bea003n001
bea007f005
bea008f006
bea009n003
Find files
We will use find. Example: you have a directory /Users/me/where/to/search and want to find all files in there whose names start with bea001f001:
$ find /Users/me/where/to/search -type f -name "bea001f001*"
If you want to find all files that end with bea001f001, move the star wildcard (zero-or-more) to the beginning of the search criterion:
$ find /Users/me/where/to/search -type f -name "*bea001f001"
Now you can already guess what the search criterion for all files containing the name bea001f001 would look like: "*bea001f001*".
We use -type f to tell find that we are interested only in finding files and not directories.
Combine reading and finding
We use xargs to pass each line of the file to find as a -name argument:
$ cat file.csv | xargs -I [] find /Users/me/where/to/search -type f -name "[]*"
/Users/me/where/to/search/bea001f001_xy
/Users/me/where/to/search/bea001f001_xyz
/Users/me/where/to/search/bea009n003_zq
Copy files
We use cp. It is pretty straightforward: cp file target copies file into the directory target, or, if target is not a directory, overwrites a file named target.
Complete one-liner
We pass results from find to cp not by piping, but by using the -exec argument passed to find:
$ cat file.csv | xargs -I [] find /Users/me/where/to/search -type f -name "[]*" -exec cp {} /Users/me/where/to/copy \;
Sorry, this is my first post here. In response to the comments above: only the last file is selected, most likely because the other lines end in a carriage return (\r). If you first prepend the directory to each filename in the csv, you can perform the copy with the following command, which strips the \r:
cp `tr -d '\r' < file.csv` /your/target/directory
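Alternatively, keeping the search-and-copy approach, a while-read loop avoids the xargs quoting pitfalls and the \r problem in one go; a minimal sketch, using the same placeholder paths as above:
while IFS= read -r name; do
    name=${name%$'\r'}    # strip a trailing carriage return, if any
    find /Users/me/where/to/search -type f -name "${name}*" \
        -exec cp {} /Users/me/where/to/copy \;
done < file.csv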

Rename *.csv.* files with multiple extensions in Linux bash

I would like to rename files that have multiple extensions (only the csv files) so that .csv becomes the final extension.
example input files in the directory:
zebra.txt
sounds.pdf
input.csv
input.csv.aa
input.csv.ab
...
input.csv.zz
123.csv
123.csv.aa
...
123.csv.zz
xxx.csv
yyy.csv
All the .csv.* files are in the same format. I would like the output to be *.csv files with no further extensions.
I would like to rename the files so that the last extension is swapped into the name, like below:
input.csv.aa to input_aa.csv
input.csv.ab to input_ab.csv ..
input.csv.zz to input_zz.csv
xxx.csv - will remain as is
yyy.csv - will remain as is
or
If we can combine them into one file based on the name, that is fine too:
input.csv (all of input.csv.aa, input.csv.ab, ..., input.csv.zz combined)
123.csv (all of 123.csv.aa, ..., 123.csv.zz combined)
xxx.csv
yyy.csv
Try something like this for your first option:
find . -name "*.csv*" | \
sed -e 's/\(\(.*\)\.csv\(.*\)\)/\1|\2\3.csv/' | \
tr '|' '\n' | \
xargs -n 2 mv
This does:
Finds all the files with a .csv extension somewhere in the name, putting one filename per line
Changes the line to output the original name, then a pipe character (|), then a new filename with any additional extensions after .csv moved to come before the .csv (e.g. input.csv.aa becomes input.csv.aa|input.aa.csv)
Replace pipe with a newline
For every two lines, pass as arguments to mv to rename the files
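Note that this produces names like input.aa.csv rather than the requested input_aa.csv. A minimal pure-bash loop that matches the requested pattern exactly (assuming each name contains a single .csv segment and no pipe characters):
for f in *.csv.*; do
    ext=${f##*.}                        # trailing extension, e.g. aa
    mv "$f" "${f%.csv.*}_${ext}.csv"    # input.csv.aa -> input_aa.csv
done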
To rename *.csv files:
find . -regex ".*\.csv\.[a-z]*$" -exec rename 's/(\.csv)\.([a-z]+)$/_$2$1/' {} \;
-regex pattern - file name matches regular expression pattern
rename perlexp - renames the filenames according to the Perl expression perlexp
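If the rename on your system is the Perl one (File::Rename), you can preview the result with -n before renaming for real; this is an assumption about which rename is installed, since util-linux rename uses a different syntax:
find . -regex ".*\.csv\.[a-z]*$" -exec rename -n 's/(\.csv)\.([a-z]+)$/_$2$1/' {} \;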
To combine a separate group of files (say input.csv.aa, input.csv.ab, ..., input.csv.zz) into one file, use the following cat approach:
cat input.csv.* > input.csv
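To combine every group in one pass, loop over the distinct base names; a sketch assuming the chunks always start at .aa, as split produces by default:
for first in *.csv.aa; do
    base=${first%.aa}          # e.g. input.csv
    cat "$base".* > "$base"    # the glob cannot match "$base" itself
done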

bash - use filename to append to every line in each file using sed

I have multiple files named as such --> 100.txt, 101.txt, 102.txt, etc.
The files are located within a directory. For every one of these files, I need to append the number before the extension in the file name to every line in the file.
So if the file content of 100.txt is:
blahblahblah
blahblahblah
...
I need the output to be:
blahblahblah 100
blahblahblah 100
...
I need to do this using sed.
My current code looks like this, but it is ugly and not very concise:
dir=$1
for file in "$dir"/*
do
    base=$(basename "$file")
    filename="${base%.*}"
    sed "s/$/ $filename/" "$file"
done
Is it possible to do this in a single command, something like the following?
find $dir/* -exec sed ... {} \;
The code you already have is essentially the simplest, shortest way of performing the task in bash. The only changes I would make are to pass -i to sed so the files are edited in place, assuming you are using GNU sed (otherwise you will need to redirect the output to a temporary file, remove the old file, and move the new file into its place), and to provide a default value in case $1 is empty:
dir="${1:-.}"
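Putting both suggestions together, a sketch of the adjusted script (sed -i without a suffix assumes GNU sed; on macOS write sed -i ''):
dir="${1:-.}"
for file in "$dir"/*
do
    base=$(basename "$file")
    sed -i "s/\$/ ${base%.*}/" "$file"   # append the bare name, in place
done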
The following command line will find all files whose names start with digits followed by an extension, and append that leading number to the end of each line in the file (I tested with a couple of files). The path is passed to bash as a positional argument rather than embedding {} in the script, which is safer:
find <directory path> -type f -name '[0-9]*' -exec bash -c '
    num=$(basename "$1" | sed "s/^\([0-9]\{1,\}\)\..*/\1/")
    sed -i.bak "s/.$/& $num/" "$1"
' _ {} \;
Note: this sed command line has not been tested on OS X.
Replace <directory path> with the path of your directory.

Visit all subdirectories and extract first page from every pdf

I have a few folders with e-books and I want to extract the first page from every book. There are over two hundred books, so doing this manually would be a big pain and very time-consuming.
I have a command that does the job for a single file:
pdftk TehInput.pdf cat 1 output cover_TehInput.pdf
How do I wrap this into a single script that visits everything and names the output like cover_whatever-the-original-name-is.pdf? The output files can go anywhere: the directory where the script was started, or next to the original file.
You want to use the find command for this. Something like:
find . -iname '*.pdf' -exec pdftk '{}' cat 1 output '{}'.cover.pdf ';'
This will find all PDFs from the current directory (.) downwards, and execute
pdftk filename.pdf cat 1 output filename.pdf.cover.pdf
on each one. It's the whole path that gets passed to pdftk, so you'll end up with the cover PDFs in the same directory as the original files. (You could do something to get rid of the .pdf.cover.pdf extensions if you need to.)
If your filenames contain no blanks or newlines:
find . -iname '*.pdf' -printf "%h %f\n" | sed -E 's|(.*) (.*)|echo pdftk \1/\2 cat 1 output \1/cover_\2|' | sh
If output is okay, remove "echo ".
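If filenames may contain blanks, a bash -c wrapper sidesteps the word splitting in the pipeline above while keeping the cover_ prefix; a sketch, assuming pdftk is on the PATH:
find . -iname '*.pdf' -exec bash -c '
    for f; do
        pdftk "$f" cat 1 output "$(dirname "$f")/cover_$(basename "$f")"
    done
' _ {} +
(Re-running this would also pick up the generated cover_*.pdf files, so run it once.)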
