Rename *.csv.* files with multiple extensions in Linux bash - bash

I would like to rename files with multiple extensions (only the csv files) so that the .csv extension ends up at the end of the name.
Example input files in the directory:
zebra.txt
sounds.pdf
input.csv
input.csv.aa
input.csv.ab
...
input.csv.zz
123.csv
123.csv.aa
...
123.csv.zz
xxx.csv
yyy.csv
All the .csv.* files are in the same format. I would like the output to be *.csv files with no further extensions.
I would like to rename the files so that the last part of the extension is swapped into the name, like below:
input.csv.aa to input_aa.csv
input.csv.ab to input_ab.csv
...
input.csv.zz to input_zz.csv
xxx.csv - will remain as is
yyy.csv - will remain as is
or
If we can combine the files into one based on the name, that is fine too:
input.csv (combined from input.csv.aa, input.csv.ab, ..., input.csv.zz)
123.csv (combined from 123.csv.aa, ..., 123.csv.zz)
xxx.csv
yyy.csv

Try something like this for your first option:
find . -name "*.csv*" | \
sed -e 's/\(\(.*\)\.csv\(.*\)\)/\1|\2\3.csv/' | \
tr '|' '\n' | \
xargs -n 2 mv
This does:
Finds all the files with a further extension after .csv, putting one filename per line
Rewrites each line to the original name, then a pipe character (|), then a new filename with the trailing extension moved before .csv and joined with an underscore (e.g. input.csv.aa becomes input.csv.aa|input_aa.csv)
Replaces the pipe with a newline
Passes every two lines as arguments to mv to rename the files
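Note that the pipeline above assumes the filenames contain no whitespace or pipe characters. A minimal pure-bash alternative without that assumption, run from the directory containing the files:
for f in *.csv.*; do
base=${f%.csv.*}    # strip the final ".csv.xx", e.g. input.csv.aa -> input
ext=${f##*.}        # trailing extension, e.g. aa
mv -- "$f" "${base}_${ext}.csv"
done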

To rename the *.csv.* files:
find . -regex ".*\.csv\.[a-z]*$" -exec rename 's/(\.csv)\.([a-z]+)$/_$2$1/' {} \;
-regex pattern - file name matches regular expression pattern
rename perlexp - renames the filenames according to the Perl expression perlexp
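If the rename on your system is the Perl variant (some distributions ship the util-linux rename instead, which takes a different syntax), you can preview the result first with the -n (no act) flag:
find . -regex ".*\.csv\.[a-z]*$" -exec rename -n 's/(\.csv)\.([a-z]+)$/_$2$1/' {} \;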
To combine a separate group of files (say input.csv.aa, input.csv.ab, ..., input.csv.zz) into one file, use the following cat approach:
cat input.csv.* > input.csv
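If there are many base names, here is a small sketch that combines every group in one pass; it assumes each group has at least an .aa part and that plain glob order is the desired concatenation order:
for f in *.csv.aa; do
base=${f%.aa}            # e.g. input.csv
cat "$base".* > "$base"  # input.csv.aa, input.csv.ab, ... -> input.csv
done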

Related

bash: use list of file names to concatenate matching files across directories and save all files in new directory

I have a large number of files that are found in three different directories. For some of these files, a file with an identical name exists in another directory. Other files exist in only one directory. I'd like to use bash to copy all of the files from the three directories to a single new directory, but for files with identically named files in more than one directory I want to concatenate the file contents across directories before saving to the new directory.
Here's an example of what my file structure looks like:
ls dir1/
file1.txt
file2.txt
file4.txt
ls dir2/
file2.txt
file5.txt
file6.txt
file9.txt
ls dir3/
file2.txt
file3.txt
file4.txt
file7.txt
file8.txt
file10.txt
Using this example, I'd like to produce a new directory that contains file1.txt through file10.txt, but with the contents of identically named files (e.g. file2.txt, file4.txt) concatenated in the new directory.
I have a unique list of all of the file names contained in my three directories (single instance of each unique file name is contained within the list). So far, I have come up with code to take a list of file names from one directory and concatenate these files with identically named files in a second directory, but I'm not sure how to use my list of file names as a reference for concatenating and saving files (instead of the output from ls in the first directory). Any ideas for how to modify? Thanks very much!
PATH1='/path/to/dir1'
PATH2='/path/to/dir2'
PATH3='/path/to/dir3'
mkdir dir_new
ls $PATH1 | while read FILE; do
cat $PATH1/"$FILE" $PATH2/"$FILE" $PATH3/"$FILE" >> ./dir_new/"$FILE"
done
You can do it like this:
mkdir -p new_dir
for f in path/to/dir*/*.txt; do
cat "$f" >> "new_dir/${f##*/}"
done
This is a common use for substring removal with parameter expansion, in order to use only the basename of the file to construct the output filename.
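For example, ${f##*/} removes the longest prefix matching */, leaving just the basename:
f=path/to/dir1/file2.txt
echo "${f##*/}"    # prints: file2.txt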
Or you can use a find command to get the files and execute the command for each one:
find path/to/dir* -type f -name '*.txt' -print0 |\
xargs -0 -n1 sh -c 'cat "$0" >> new_dir/"${0##*/}"'
In the above command, the filenames out of find are preserved with zero separation (-print0), and xargs also accepts a zero-separated list (-0). For each argument (-n1), the command following is executed. We call sh -c 'command' for convenience, so that we can use the substring removal inside it; the argument provided by xargs is accessible there as $0.
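If you would rather drive the concatenation from your unique list of file names, here is a minimal sketch reusing the PATH1..PATH3 variables from the question (the list file name, name_list.txt, is a placeholder; cat's complaints about files missing from a directory are silenced with 2>/dev/null):
mkdir -p dir_new
while IFS= read -r f; do
cat "$PATH1/$f" "$PATH2/$f" "$PATH3/$f" 2>/dev/null > "dir_new/$f"
done < name_list.txt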

How to move files based on file names in a .csv doc - macOS Terminal?

Terminal noob needs a little help :)
I have a 98-row-long filename list in a .csv file. For example:
name01; name03, etc.
I have an external hard drive with a lot of files in a chaotic file structure. BUT the file names are consistent, something like:
name01_xy; name01_zq; name02_xyz etc.
I would like to copy every file and directory from the external hard drive which begins with a filename stored in the .csv file to my computer.
So basically it's a search and copy based on a text file from an eHDD to my computer. I guess the easiest way to do is a Terminal command. Do you have any advice? Thanks in advance!
The task can be split into three parts: read the search criteria from the file; find files by those criteria; copy the found files. We discuss each one separately, then combine them into a one-liner step by step:
Read search criteria from .csv file
Since your .csv file is pretty much just a text file with one criterion per line, it's pretty easy: just cat the file.
$ cat file.csv
bea001f001
bea003n001
bea007f005
bea008f006
bea009n003
Find files
We will use find. Example: you have a directory /Users/me/where/to/search and want to find all files in there whose names start with bea001f001:
$ find /Users/me/where/to/search -type f -name "bea001f001*"
If you want to find all files that end with bea001f001, move the star wildcard (zero-or-more) to the beginning of the search criterion:
$ find /Users/me/where/to/search -type f -name "*bea001f001"
Now you can already guess what the search criterion for all files containing the name bea001f001 would look like: "*bea001f001*".
We use -type f to tell find that we are interested only in finding files and not directories.
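Since the question mentions copying directories as well as files, you can widen the -type test to match both (cp then needs -R to copy a directory):
$ find /Users/me/where/to/search \( -type f -o -type d \) -name "bea001f001*"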
Combine reading and finding
We use xargs to pass each line of the file to find as a -name argument:
$ cat file.csv | xargs -I [] find /Users/me/where/to/search -type f -name "[]*"
/Users/me/where/to/search/bea001f001_xy
/Users/me/where/to/search/bea001f001_xyz
/Users/me/where/to/search/bea009n003_zq
Copy files
We use cp. It is pretty straightforward: cp file target copies file into target if it is a directory, or to a file named target (replacing it if it exists).
Complete one-liner
We pass results from find to cp not by piping, but by using the -exec argument passed to find:
$ cat file.csv | xargs -I [] find /Users/me/where/to/search -type f -name "[]*" -exec cp {} /Users/me/where/to/copy \;
Sorry, this is my first post here. In response to the comments above: only the last file is selected, likely because the other lines end with a carriage return (\r). If you first prepend the source directory to each filename in the csv, you can perform the copy with the following command, which strips the \r:
cp `tr -d '\r' < file.csv` /your/target/directory
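Alternatively, the \r can be stripped on the fly, without editing the csv, by feeding tr's output into the xargs/find one-liner from above:
tr -d '\r' < file.csv | xargs -I [] find /Users/me/where/to/search -type f -name "[]*" -exec cp {} /Users/me/where/to/copy \;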

Save filenames and information from the files into a two-column txt doc - Ubuntu terminal

I have a question regarding the manipulation and creation of text files in the Ubuntu terminal. I have a directory that contains several thousand subdirectories. In each directory, there is a file with the extension stats.txt. I want to write a piece of code that will run from the parent directory and create a file with the names of all the stats.txt files in the first column and the contents of the 5th line of the same stats.txt file in the next column. The 5th line of the stats.txt file is a sentence of six words, not a single value.
For reference, I have successfully used the sed command in combination with find and cat to make a file containing the 5th line from each stats.txt file. I then used the ls command to save a list of all my subdirectories. I assumed both files would be in alphabetical order of the subdirectories, and thus easy to merge, but I was wrong. The find and cat commands, or at least my implementation of them, resulted in a file that appeared to be in random order (see below). No need to try to remedy this code, I'm open to all solutions.
# loop through subdirectories and save the 5th line of stats.txt as a different file.
for f in ~/*; do [ -d "$f" ] && (cd "$f" && sed -n 5p *stats.txt > final.stats.txt); done
# find the final.stats.txt files and save them as a single file
find ./ -name 'final.stats.txt' -exec cat {} \; > compiled.stats.txt
Maybe something like this can help you get on track:
find . -name "*stats.txt" -exec awk 'FNR==5{print FILENAME, $0}' '{}' + > compiled.stats

Recursive cat with file names

I'd like to cat recursively several files with the same name into another file. There's an earlier question, "Recursive cat all the files into single file", which helped me get started. However, I'd like to achieve the same so that each file's contents are preceded by the filename and path, with different files preferably separated by a blank line or ----- or something like that. So the resulting file would read:
files/pipo1/foo.txt
flim
flam
floo
files/pipo2/foo.txt
plim
plam
ploo
Any way to achieve this in bash?
Of course! Instead of just cat-ing the file, you chain actions to print the filename, cat the file, then add a line feed:
find . -name 'foo.txt' \
-print \
-exec cat {} \; \
-printf "\n"

combining grep and find to search for file names from query file

I've found many similar examples but cannot find an example to do the following. I have a query file with file names (file1, file2, file3, etc.) and would like to find these files in a directory tree; these files may appear more than once in the dir tree, so I'm looking for the full path. This option works well:
find path/to/files/*/* -type f | grep -E "file1|file2|file3|fileN"
What I would like is to pass grep a file with filenames, e.g. with the -f option, but I have not been successful. Many thanks for your insight.
The query file contains one column of filenames, separated by '\n', and looks like this:
103128_seqs.fna
7010_seqs.fna
7049_seqs.fna
7059_seqs.fna
7077A_seqs.fna
7079_seqs.fna
grep -f FILE gets the patterns to match from FILE, one per line:
cat files_to_find.txt
n100079_seqs.fna
103128_seqs.fna
7010_seqs.fna
7049_seqs.fna
7059_seqs.fna
7077A_seqs.fna
7079_seqs.fna
Remove any stray whitespace and empty lines (or do it manually):
perl -i -nle 'tr/ //d; print if length' files_to_find.txt
Create some files to test:
touch `cat files_to_find.txt`
Use it:
find ~/* -type f | grep -f files_to_find.txt
output:
/home/user/tmp/7010_seqs.fna
/home/user/tmp/103128_seqs.fna
/home/user/tmp/7049_seqs.fna
/home/user/tmp/7059_seqs.fna
/home/user/tmp/7077A_seqs.fna
/home/user/tmp/7079_seqs.fna
/home/user/tmp/n100079_seqs.fna
Is this what you want?
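One caveat: grep treats each line of the pattern file as a regular expression and matches it anywhere in the path. To match the names literally instead (useful if they contain dots or other metacharacters), add -F:
find ~/* -type f | grep -F -f files_to_find.txt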
