How do I add a header in each of my text files based on the first two words of the filename? - bash

I have a folder with several hundred .txt files with numbers. The files are named in a format word1_word2_word3_word4.txt. The words are different for each of the .txt files.
I would like to add a header to each of those text files based on the filename such that the header is of the format:
'>c_word1_word2'
Is there a way to do this for all the .txt files using the command line or a bash script?

Use this in bash and sed using some parameter expansion builtins :
for i in *.txt; do j="${i%_*}"; sed "1i >c_${j%_*}" "$i"; done
Output :
$ ls -1
word1_word2_word3_word4.txt
word2_word2_word3_word4.txt
word3_word2_word3_word4.txt
$ cat word1_word2_word3_word4.txt
foo
bar
base
$ for i in *.txt; do j="${i%_*}"; sed -i "1i >c_${j%_*}" $i; done
$ cat word1_word2_word3_word4.txt
>c_word1_word2
foo
bar
base
$
-i switch edit the file in place

This is just an alternative to Gilles' answer, using a regular expression
for f in *.txt; do
if [[ $f =~ ^([^_]+_[^_]+) ]]; then
sed -i "1i>c_${BASH_REMATCH[1]}" "$f"
fi
done

Related

cat multiple files in separate directories file1 file2 file3....file100 using loop in bash script

I have several files in multiple directories like in directory 1/file1 2/file2 3/file3......100/file100. I want to cat all those files to a single file using loop over index in bash script. Is there easy loop for doing so?
Thanks,
seq 100 | sed 's:.*:dir&/file&:' | xargs cat
seq 100 generates list of numbers from 1 to 100
sed
s substitutes
: separates parts of the command
.* the whole line
: separator. Usually / is used, but it's used in replacement string.
dir&/file& by dir<whole line>/file<whole line>
: separator
so it generates list of dir1/file1 ... dir100/file100
xargs - pass input as arguments to ...
cat - so it will execute cat dir1/file1 dir2/file2 ... dir100/file100.
This code should do the trick;
for((i=1;i<=`ls -l | wc -l`;i++)); do cat dir${i}/file${i} >> output; done
I made an example of what you're describing about your directory structure and files. Create directories and files with It's own content.
for ((i=1;i<=100;i++)); do
mkdir "$i" && touch "$i/file$i" && echo content of "$(pwd) $i" > "$i/file$i"
done
Check the created directories.
ls */*
ls */* | sort -n
If you see that the directories and files are created then proceed to the next step.
This solution does not involve any external command from the shell except of course cat :-)
Now we can check the contents of each files using bash syntax.
i=1
while [[ -e "$i" ]]; do
cat "$i"/*
((i++))
done
This code was tested in dash.
i=1
while [ -e "$i" ]; do
cat "$i"/*
i=$((i+1))
done
Just add the redirection of the output to the file after the done.
You can add some more test if you like see help test
One more thing :-), you can just check the contents using tail and brace expansion
tail -n +1 {1..100}/*
Using cat also you can redirect the output already, just remember brace expansion is bash3+ feature/syntax.
cat {1..100}/*

Batch change multiple file names in bash and save output

I'm trying to change multiple file names with a for loop.
This works to send the output to the screen:
for i in *.gz; do echo $i | sed 's/\-//g'; done
However, when I try to overwrite the file name using sed -i, I get this error:
for i in *.gz; do echo $i | sed -i 's/\-//g'; done
sed: no input files
Any suggestions?
there is a command for this
$ rename - '' *.gz
NB. this is the standard one, not the advanced perl version.
Use Perl rename instead:
rename 's/-//g' *.gz
Or use simple parameter expansion:
for i in *.gz; do mv -- "$i" "${i//-}"; done

Remove middle of filenames

I have a list of filenames like this in bash
UTSHoS10_Other_CAAGCC-TTAGGA_R_160418.R1.fq.gz
UTSHoS10_Other_CAAGCC-TTAGGA_R_160418.R2.fq.gz
UTSHoS11_Other_AGGCCT-TTAGGA_R_160418.R2.fq.gz
UTSHoS11_Other_AGGCCT-TTAGGA_R_160418.R2.fq.gz
UTSHoS12_Other_GGCAAG-TTAGGA_R_160418.R1.fq.gz
UTSHoS12_Other_GGCAAG-TTAGGA_R_160418.R2.fq.gz
And I want them to look like this
UTSHoS10_R1.fq.gz
UTSHoS10_R2.fq.gz
UTSHoS11_R1.fq.gz
UTSHoS11_R2.fq.gz
UTSHoS12_R1.fq.gz
UTSHoS12_R2.fq.gz
I do not have the perl rename command and sed 's/_Other*160418./_/' *.gz
is not doing anything. I've tried other rename scripts on here but either nothing occurs or my shell starts printing huge amounts of code to the console and freezes.
This post (Removing Middle of Filename) is similar however the answers given do not explain what specific parts of the command are doing so I could not apply it to my problem.
Parameter expansions in bash can perform string substitutions based on glob-like patterns, which allows for a more efficient solution than calling an extra external utility such as sed in each loop iteration:
for f in *.gz; do echo mv "$f" "${f/_Other_*-TTAGGA_R_160418./_}"; done
Remove the echo before mv to perform actual renaming.
You can do something like this in the directory which contains the files to be renamed:
for file_name in *.gz
do
new_file_name=$(sed 's/_[^.]*\./_/g' <<< "$file_name");
mv "$file_name" "$new_file_name";
done
The pattern (_[^.]*\.) starts matching from the FIRST _ till the FIRST . (both inclusive). [^.]* means 0 or more non-dot (or non-period) characters.
Example:
AMD$ ls
UTSHoS10_Other_CAAGCC-TTAGGA_R_160418.R1.fq.gz UTSHoS12_Other_GGCAAG-TTAGGA_R_160418.R1.fq.gz
UTSHoS10_Other_CAAGCC-TTAGGA_R_160418.R2.fq.gz UTSHoS12_Other_GGCAAG-TTAGGA_R_160418.R2.fq.gz
UTSHoS11_Other_AGGCCT-TTAGGA_R_160418.R2.fq.gz
AMD$ for file_name in *.gz
> do new_file_name=$(sed 's/_[^.]*\./_/g' <<< "$file_name")
> mv "$file_name" "$new_file_name"
> done
AMD$ ls
UTSHoS10_R1.fq.gz UTSHoS10_R2.fq.gz UTSHoS11_R2.fq.gz UTSHoS12_R1.fq.gz UTSHoS12_R2.fq.gz
Pure Bash, using substring operation and assuming that all file names have the same length:
for file in UTS*.gz; do
echo mv -i "$file" "${file:0:9}${file:38:8}"
done
Outputs:
mv -i UTSHoS10_Other_CAAGCC-TTAGGA_R_160418.R1.fq.gz UTSHoS10_R1.fq.gz
mv -i UTSHoS10_Other_CAAGCC-TTAGGA_R_160418.R2.fq.gz UTSHoS10_R2.fq.gz
mv -i UTSHoS11_Other_AGGCCT-TTAGGA_R_160418.R2.fq.gz UTSHoS11_R2.fq.gz
mv -i UTSHoS11_Other_AGGCCT-TTAGGA_R_160418.R2.fq.gz UTSHoS11_R2.fq.gz
mv -i UTSHoS12_Other_GGCAAG-TTAGGA_R_160418.R1.fq.gz UTSHoS12_R1.fq.gz
mv -i UTSHoS12_Other_GGCAAG-TTAGGA_R_160418.R2.fq.gz UTSHoS12_R2.fq.gz
Once verified, remove echo from the line inside the loop and run again.
Going with your sed command, this can work as a bash one-liner:
for name in UTSH*fq.gz; do newname=$(echo $name | sed 's/_Other.*160418\./_/'); echo mv $name $newname; done
Notes:
I've adjusted your sed command: it had an * without a preceeding . (sed takes a regular expression, not a globbing pattern). Similarly, the dot needs escaping.
To see if it works, without actually renaming the files, I've left the echo command in. Easy to remove just that to make it functional.
It doesn't have to be a one-liner, obviously. But sometimes, that makes editing and browsing your command-line history easier.

bash removing part of a file name

I have the following files in the following format:
$ ls CombinedReports_LLL-*'('*.csv
CombinedReports_LLL-20140211144020(Untitled_1).csv
CombinedReports_LLL-20140211144020(Untitled_11).csv
CombinedReports_LLL-20140211144020(Untitled_110).csv
CombinedReports_LLL-20140211144020(Untitled_111).csv
CombinedReports_LLL-20140211144020(Untitled_12).csv
CombinedReports_LLL-20140211144020(Untitled_13).csv
CombinedReports_LLL-20140211144020(Untitled_14).csv
CombinedReports_LLL-20140211144020(Untitled_15).csv
CombinedReports_LLL-20140211144020(Untitled_16).csv
CombinedReports_LLL-20140211144020(Untitled_17).csv
CombinedReports_LLL-20140211144020(Untitled_18).csv
CombinedReports_LLL-20140211144020(Untitled_19).csv
I would like this part removed:
20140211144020 (this is the timestamp the reports were run so this will vary)
and end up with something like:
CombinedReports_LLL-(Untitled_1).csv
CombinedReports_LLL-(Untitled_11).csv
CombinedReports_LLL-(Untitled_110).csv
CombinedReports_LLL-(Untitled_111).csv
CombinedReports_LLL-(Untitled_12).csv
CombinedReports_LLL-(Untitled_13).csv
CombinedReports_LLL-(Untitled_14).csv
CombinedReports_LLL-(Untitled_15).csv
CombinedReports_LLL-(Untitled_16).csv
CombinedReports_LLL-(Untitled_17).csv
CombinedReports_LLL-(Untitled_18).csv
CombinedReports_LLL-(Untitled_19).csv
I was thinking simply along the lines of the mv command, maybe something like this:
$ ls CombinedReports_LLL-*'('*.csv
but maybe a sed command or other would be better
rename is part of the perl package. It renames files according to perl-style regular expressions. To remove the dates from your file names:
rename 's/[0-9]{14}//' CombinedReports_LLL-*.csv
If rename is not available, sed+shell can be used:
for fname in Combined*.csv ; do mv "$fname" "$(echo "$fname" | sed -r 's/[0-9]{14}//')" ; done
The above loops over each of your files. For each file, it performs a mv command: mv "$fname" "$(echo "$fname" | sed -r 's/[0-9]{14}//')" where, in this case, sed is able to use the same regular expression as the rename command above. s/[0-9]{14}// tells sed to look for 14 digits in a row and replace them with an empty string.
Without using an other tools like rename or sed and sticking strictly to bash alone:
for f in CombinedReports_LLL-*.csv
do
newName=${f/LLL-*\(/LLL-(}
mv -i "$f" "$newName"
done
for f in CombinedReports_LLL-* ; do
b=${f:0:20}${f:34:500}
mv "$f" "$b"
done
You can try line by line on shell:
f="CombinedReports_LLL-20140211144020(Untitled_11).csv"
b=${f:0:20}${f:34:500}
echo $b
You can use the rename utility for this. It uses syntax much like sed to change filenames. The following example (from the rename man-page) shows how to remove the trailing '.bak' extension from a list of backup files in the local directory:
rename 's/\.bak$//' *.bak
I'm using the advice given in the top response and have put the following line into a shell script:
ls *.nii | xargs rename 's/[f_]{2}//' f_0*.nii
In terminal, this line works perfectly, but in my script it will not execute and reads * as a literal part of the file name.

Ubuntu:- append remove keyword to a file

I have a file containing file name.
file1
file2
file3
file4
I wan to create a shell script that add the'rm' infront
rm file1
rm file2
rm file3
rm file4
How to append the rm in front the file name?
You can do that many ways - sed, vim, perl, awk.
Or you can simply use xargs like this:
xargs rm < filelist
If you really insist on editing filelist, use sed:
sed 's/^/rm /g' filelist > newscript
(which means find start of line ^ and replace it with rm for every line /g).
You can even edit filelist in-place using sed -i:
sed -i 's/^/rm /g' filelist
I think mvp's answer is the best, but if you're talking about changing your current file list to a shell script with rm inserted before each filename, you can do this simply with any good text editor that supports find and replace with regular expressions.
Search term : ^(.)
Replacement : rm \1
Vi one-liner :
:%s/^/rm /
Another way could be using a simple shell script.
#!/bin/sh
FILE=$1
while read line
do
echo "Removing $line"
rm $line
done < $FILE
You could then run it as sh multirm.sh filelist
If you want to simply add rm in to the file you could use awk for that.
awk '{ print "rm", $1" }' filelist

Resources