Combine CSV files with condition - bash

I need to combine all the csv files in some directory (.csv), provided that there is another file in the directory with the same name but a different extension (.csv.done).
If a csv file doesn't have a matching .csv.done file, then I don't need it in the combine process.
What is the best way to do this using Bash?

This approach is a solution to your problem. I see you've commented that it "didn't work", but whatever the reason, it's likely simple to fix, e.g. you left out a key detail or didn't adapt it to your specific situation. If you need further help troubleshooting, add more detail to your question.
The approach:
for f in *.csv.done
do
cat "${f%.*}" >> combined_file.csv
done
How it works:
In your example, you have 3 files named 1.csv 2.csv 3.csv and two 'done' files named 1.csv.done 2.csv.done.
This script begins by making a list of all files that end in .csv.done (two files: 1.csv.done 2.csv.done).
It then uses a parameter expansion, specifically ${parameter%word}, to 'shorten' the names of the two files in the list so they end in .csv (instead of .csv.done).
Then it 'prints' the content of the two 'shortened' filenames (1.csv and 2.csv) into a 'combined' file.
It doesn't 'print' the content of 1.csv.done or 2.csv.done, or 3.csv, because these files weren't in the original 'list'.
If you run this script multiple times, it will keep appending the contents of files 1.csv and 2.csv to the 'combined' file (only run it once, or delete the 'combined' file before running it again).
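If you want to be able to re-run it safely, one variation (a sketch of the same approach, not something the answer above requires) is to redirect the whole loop once, so the combined file is rewritten from scratch on every run:
for f in *.csv.done
do
    cat "${f%.*}"
done > combined_file.csv
The redirection truncates combined_file.csv before the loop's output fills it, and since combined_file.csv doesn't match *.csv.done it never feeds into itself.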

Related

for loop concatenating files that share part of their common basename (paired end sequencing reads)

I'm trying to concatenate a bunch of paired files into one file per pair (for those who work with sequencing data, you'll be familiar with the paired-end read format).
For example, I have
SLG1.R1.fastq.gz
SLG1.R2.fastq.gz
SLG2.R1.fastq.gz
SLG2.R2.fastq.gz
SLG3.R1.fastq.gz
SLG3.R2.fastq.gz
etc.
I need to concatenate the two SLG1 files, the two SLG2 files, and the two SLG3 files.
So far I have this:
cd /workidr/slg/diet_manip/filtered_concatenated_reads/nonhost
for i in *1.fastq.gz
do
base=(basename $i "1.fastq.gz")
cat ${base}1.fastq.gz ${base}2.fastq.gz > /workdir/slg/diet_manip/filtered_concatenated_reads/cat/${base}.fastq.gz
done
The original files are all in the /filtered_concatenated_reads/nonhost directory, and I want the concatenated versions to be in /filtered_concatenated_reads/cat
The above code gives this error:
-bash: /workdir/slg/diet_manip/filtered_concatenated_reads/cat/basename.fastq.gz: No such file or directory
Any ideas?
Thank you!!
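A sketch of a possible fix, assuming the layout shown above (and that the working directory really is /workdir; the cd line above says /workidr, which may itself be a typo). The error comes from base=(basename $i "1.fastq.gz"): that builds an array of literal words rather than running basename, so ${base} expands to the word basename. Command substitution $(...) is what actually runs the command. Quoting the expansions and creating the output directory first also helps:
cd /workdir/slg/diet_manip/filtered_concatenated_reads/nonhost || exit 1
mkdir -p /workdir/slg/diet_manip/filtered_concatenated_reads/cat
for i in *.R1.fastq.gz
do
    base=$(basename "$i" .R1.fastq.gz)    # e.g. SLG1
    cat "${base}.R1.fastq.gz" "${base}.R2.fastq.gz" \
        > "/workdir/slg/diet_manip/filtered_concatenated_reads/cat/${base}.fastq.gz"
done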

for loop calling two different files (same name, different extension) across several files

I have several files. I need to run a perl script on all the files in a folder, calling it each time with one file with the extension .pep and the matching file with the same name but a different extension, .pep.nuc (like Oh01.pep and Oh01.pep.nuc).
I have this script so far, but I am missing something of course.
for file *.pep; do ./script.pl *.pep *.pep.nuc > "${file%.*}.nucleo"; done
This should do it; loop over the .pep files and derive the matching .pep.nuc name (and the output name) from the loop variable:
for file in *.pep; do
./script.pl "$file" "${file}.nuc" > "${file%.pep}.nucleo";
done
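For example, when $file is Oh01.pep, the expansions work out as follows (the filename is just the one from the question):
"$file"                -> Oh01.pep
"${file}.nuc"          -> Oh01.pep.nuc
"${file%.pep}.nucleo"  -> Oh01.nucleo
so the script is called with each .pep/.pep.nuc pair and the output lands in a matching .nucleo file.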

Bash get all specific files in specific directory

I have a script that takes as an argument a path to a file upon which it performs certain operations. These files are stored in directories with paths of the form storage/<year>/<month>/<day>_<id>/files (so on 22 July 2016 it would be storage/2016/Jul/22_1/files for the first set of files, .../Jul/22_2/files for the second one, etc.). The problem is each directory stores files with two extensions (say file.doc and file.txt) and I want to perform operations only on .txt files. I've tested earlier something like
for file in "/home/gonczor/temp/"*/*".txt"; do
echo "$file"
done
And it worked perfectly, given that the directory names don't change. But when I move one step further and add these 22_1, 22_2, 23_1 directories, something strange happens.
This is my script (simplified):
for file in "$FILE_PATH/""$YEAR/""$MONTH/""$DAY"*/*".txt"; do
my_program ${report}
done
And instead of finding .../2016/Jul/22_1/file.txt it finds /2016/Jul/22*/*.txt
How can I make it work? The solution I've tried to make up is from here
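One possible fix (a sketch, assuming $FILE_PATH, $YEAR, $MONTH and $DAY hold plain path components, and that my_program should receive each matched file rather than the unrelated ${report}): keep the glob characters outside the quotes as you already do, pass the loop variable to the program, and enable nullglob so an unmatched pattern expands to nothing instead of being handed to the program literally (which is what the /2016/Jul/22*/*.txt result suggests is happening):
shopt -s nullglob    # unmatched patterns expand to nothing instead of staying literal
for file in "$FILE_PATH/$YEAR/$MONTH/$DAY"*/*.txt; do
    my_program "$file"
done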

Extracting contents of many zipped folders into a single directory

Kind of easy question, but I can't find the answer. I want to extract the contents of multiple zipped folders into a single directory. I am using the bash console, which is the only tool available on the particular website I am using.
For example, I have two folders: a.zip (which contains a1.txt and a2.txt) and b.zip (which contains b1.txt and b2.txt). I want to extract all four text files into a single directory.
I have tried
unzip \*.zip -d \newdirectory
But it creates two directories (a and b) with two text files in each.
I also tried concatenating the two zipped folders into one big folder and extracting it, but it still creates two directories, even when I specify a new directory.
I can't figure what I am doing wrong. Any help?
Thanks in advance!
Use the -j parameter to ignore any directory structure.
unzip -j -d /path/to/your/directory '*.zip'
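For example, with the two archives above (the target directory name here is just an illustration):
mkdir -p extracted
unzip -j -d extracted '*.zip'
The quotes stop the shell from expanding the wildcard, so unzip itself matches every .zip in the current directory, and -j drops each archive's internal folder structure; extracted/ ends up holding a1.txt, a2.txt, b1.txt and b2.txt directly.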

Finding and Removing Unused Files Through Command Line

My website's file structure has gotten very messy over the years from uploading random files to test different things out. I have a list of all my files, such as this:
file1.html
another.html
otherstuff.php
cool.jpg
whatsthisdo.js
hmmmm.js
Is there any way I can input my list of files via the command line, search the contents of all the other files on my website, and output a list of the files that aren't mentioned anywhere in my other files?
For example, if cool.jpg and hmmmm.js weren't mentioned in any of my other files then it could output them in a list like this:
cool.jpg
hmmmm.js
And then the other files mentioned above wouldn't be listed, because they are referenced somewhere in another file. Note: I don't want it to just automatically delete the unused files, I'll do that manually.
Also, of course I have multiple folders so it will need to search recursively from my current location and output all the unused (unreferenced) files.
I'm thinking the command line would be the fastest/easiest way, unless someone knows of another. Thanks in advance for any help you guys can give!
Yep! This is pretty easy to do with grep. In this case, you would run a command like:
$ for orphan in `cat orphans.txt`; do \
echo "Checking for presence of ${orphan} in present directory..." ;
grep -rl $orphan . ; done
And orphans.txt would look like your list of files above, one file per line. You can add -i to the grep above if you want to match case-insensitively. And you would want to run that command in /var/www or wherever your webroot lives. If a "Checking for..." line is followed by no matches, nothing references that file.
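If you'd rather print only the unreferenced names directly (and cope with names containing spaces), here is a small variant of the same idea, a sketch assuming the list lives in orphans.txt with one name per line and that you have GNU grep (for --exclude):
while IFS= read -r name; do
    # -r recursive, -q quiet (exit status only), -F match the name as a literal string;
    # --exclude keeps the list file itself from counting as a reference
    if ! grep -rqF --exclude=orphans.txt "$name" .; then
        echo "$name"
    fi
done < orphans.txt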
