replace $1 variable in file with 1-10000 - bash

I want to create thousands of copies of one file.
All I need to replace in the file is one variable:
kitename = $1
But I want to do that thousands of times to create thousands of different files.
I'm sure it involves a loop.
people answering people is more effective than google search!
thx

I'm not really sure what you are asking here, but the following will create 1000 files named filename.n, each containing the single line "kitename = n", for n = 1 to 1000:
for i in {1..1000}
do
echo "kitename = $i" > filename.$i
done

If you have mysql installed, it comes with a lovely command line util called "replace" which replaces strings in place across any number of files. Too few people know about this, given it exists on most linux boxen everywhere. (Note that it was removed in MySQL 8.0, so newer MySQL installations may not have it.) Syntax is easy:
replace SEARCH_STRING REPLACEMENT -- targetfiles*
If you MUST use sed for this... that's okay too :) The syntax is similar:
sed -i.bak 's/SEARCH_STRING/REPLACEMENT/g' targetfile.txt
So if you're just using numbers, you'd use something like:
for a in {1..1000}
do
cp inputFile.html outputFile-$a.html
replace kitename $a -- outputFile-$a.html
done
This will produce a bunch of files "outputFile-1.html" through "outputFile-1000.html", with the word "kitename" replaced by the relevant number, inside the file.
But, if you want to read your lines from a file rather than generate them by magic, you might want something more like this (we're not using "for a in `cat file`" since that splits on words, and I'm assuming here that you might have multi-word replacement strings that you'd want to put in):
cat kitenames.txt | while read -r a
do
cp inputFile.html "outputFile-$a.html"
replace kitename "$a" -- "outputFile-$a.html"
done
This will produce a bunch of files like "outputFile-red kite.html" and "outputFile-kite with no string.html", which have the word "kitename" replaced by the relevant name, inside the file.
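If the replace utility isn't available, the numeric case can probably be done with sed alone. This is only a minimal sketch: the names template.txt and kite-N.txt, the scratch directory, and the count of 5 are all made up for illustration, and it replaces the literal text $1 from the question rather than the word kitename.

```shell
#!/usr/bin/env bash
# Sketch only: template.txt and kite-N.txt are hypothetical names.
workdir=$(mktemp -d)            # scratch directory so nothing real is touched
cd "$workdir" || exit 1

# A stand-in template containing the literal placeholder $1.
printf '%s\n' 'kitename = $1' > template.txt

# Write one copy per number, replacing the placeholder on the way.
for i in {1..5}; do
    sed -e 's/\$1/'"$i"'/g' template.txt > "kite-$i.txt"
done

ls kite-*.txt
```

Change {1..5} to {1..1000} for the real run. Since sed writes to a new file each time, the template itself is never modified.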

Related

Iterating through pairs of files with glob

I'm having a difficult time trying to iterate through a long set of files that I need to pair up to run through some process. I'd like to generate a bit of a batch file, pairing each set of matching files one per line. I've done this kind of thing before when it's a simple replacement (e.g. file1 = something.txt, file2 = something.csv). But in this case, the end of the file string is a random UUID, and I can't figure out how to get bash to properly expand the glob for the second file.
Given a directory of files like this:
banana_pre-proc_b101a65a-31c7-5e4f-b433-bac4fb1efc1f.txt
banana_proc_a75b3a3e-7140-1cb6-2ad1-c10f7db6743f.txt
cherry_pre-proc_f5d0716f-c205-b0b4-5c63-d33755767de4.txt
cherry_proc_025ff6d5-534d-0020-5446-5da3ed04adc6.txt
kiwi_pre-proc_26075f3b-e3a2-fc1a-a741-615cacfc1a7e.txt
kiwi_proc_be1760f6-413d-edc0-1efc-a134b1b6bfbb.txt
peach_pre-proc_ecafbb30-3df0-6014-61ee-11d1d5745b53.txt
peach_proc_bb3ea3fc-671e-e024-6e61-06a2bc147363.txt
pear_pre-proc_c2db376f-f351-7141-114e-a2ebc3cfc410.txt
pear_proc_ccb2f16a-27cd-c70d-7aac-ce72c3af6575.txt
How can I get a file that looks like:
banana_pre-proc_b101a65a-31c7-5e4f-b433-bac4fb1efc1f.txt banana_proc_a75b3a3e-7140-1cb6-2ad1-c10f7db6743f.txt
cherry_pre-proc_f5d0716f-c205-b0b4-5c63-d33755767de4.txt cherry_proc_025ff6d5-534d-0020-5446-5da3ed04adc6.txt
kiwi_pre-proc_26075f3b-e3a2-fc1a-a741-615cacfc1a7e.txt kiwi_proc_be1760f6-413d-edc0-1efc-a134b1b6bfbb.txt
peach_pre-proc_ecafbb30-3df0-6014-61ee-11d1d5745b53.txt peach_proc_bb3ea3fc-671e-e024-6e61-06a2bc147363.txt
pear_pre-proc_c2db376f-f351-7141-114e-a2ebc3cfc410.txt pear_proc_ccb2f16a-27cd-c70d-7aac-ce72c3af6575.txt
I thought I could do something like
for f in *pre-proc_*txt; do echo "$f" "${f/-pre-proc_/-proc_}"; done
But that doesn't deal with the UUID at the end of the file. I've tried a few other iterations of this strategy too, but none get any closer. What is the trick to doing this? Obviously for a few files like this, I can just manually do it. But, the actual set of files I need to process is quite long and apart from just pulling them all into a text doc and then using some Vim macro or something, I'm a bit baffled as to how to get Bash to expand the glob like I'm intending.
This seems to work:
for preproc in *_pre-proc*; do
base=${preproc%_pre-proc*}
proc=${base}_proc*
echo $preproc $proc
done
We get a base name by stripping off the _pre-proc_<uuid> part, and
then use the base name to find the matching _proc file.
This I think should be sufficient:
printf "%s %s\n" *[-_]proc_*.txt
Glob expansions are sorted and the pairs of files share the same prefix.
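If you would rather make the pairing explicit than rely on sort order, one option is to expand the second glob into an array and skip any file that has no partner. A rough sketch, assuming shortened made-up file names, a scratch directory, and an output file called pairs.txt (none of which come from the question):

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Made-up sample files; kiwi deliberately has no _proc_ partner.
touch banana_pre-proc_b101.txt banana_proc_a75b.txt \
      cherry_pre-proc_f5d0.txt cherry_proc_025f.txt \
      kiwi_pre-proc_2607.txt

for pre in *_pre-proc_*.txt; do
    base=${pre%%_pre-proc_*}            # e.g. "banana"
    matches=( "$base"_proc_*.txt )      # the glob expands inside the array
    # With no match the pattern stays literal, so check that the file exists.
    if [ -e "${matches[0]}" ]; then
        printf '%s %s\n' "$pre" "${matches[0]}"
    else
        printf 'no partner for %s\n' "$pre" >&2
    fi
done > pairs.txt

cat pairs.txt
```

The unpaired kiwi file is reported on stderr instead of silently shifting every later pair by one, which is the failure mode of the printf one-liner when a file is missing.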

Call script on all file names starting with string in folder bash

I have a set of files in a folder that I want to perform an action on, and I'm hoping to write a script for it. Each file name starts with mazeFilex, where x can be any number. Is there a quick and easy way to perform an action on each file? E.g. I will be doing
cat mazeFile0.txt | ./maze_ppm 5 | convert - maze0.jpg
how can I select each file knowing the file will always start with mazeFile?
for fname in mazeFile*
do
base=${fname%.txt}
base=${base#mazeFile}
./maze_ppm 5 <"$fname" | convert - "maze${base}.jpg"
done
Notes
for fname in mazeFile*; do
This code starts the loop. Written this way, it is safe for all filenames, whether they have spaces, tabs or whatever in their names.
base=${fname%.txt}; base=${base#mazeFile}
This removes the mazeFile prefix and .txt suffix to just leave the base name that we will use for the output file.
./maze_ppm 5 <"$fname" | convert - "maze${base}.jpg"
The output filename is constructed using base. Note also that cat was unnecessary and has been removed here.
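To see what the two expansions do without running maze_ppm or convert, here is a small sketch that only computes the output names. The helper name maze_output_name and the sample file names are made up for illustration.

```shell
#!/usr/bin/env bash
# maze_output_name is a hypothetical helper, named here only for illustration.
maze_output_name() {
    local base=${1%.txt}      # drop the .txt suffix
    base=${base#mazeFile}     # drop the mazeFile prefix
    printf 'maze%s.jpg\n' "$base"
}

for fname in mazeFile0.txt mazeFile12.txt "mazeFile 7b.txt"; do
    printf '%s -> %s\n' "$fname" "$(maze_output_name "$fname")"
done
```

The % form removes a matching suffix and the # form removes a matching prefix, so whatever sits between mazeFile and .txt survives, however many digits it has.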
for i in mazeFile*.txt ; do ./maze_ppm 5 <"$i" | convert - "$(basename "maze${i:8}" .txt).jpg" ; done
You can use a for loop to run through all the filenames.
#!/bin/bash
for fn in mazeFile*; do
echo "the next file is $fn"
# do something with file $fn
done
See answer here as well: Bash foreach loop
I see you want a backreference to the number in the mazeFile. Thus I recommend John1024's answer.
Edit: removed the unnecessary ls command, per @guido's comment.

Bash scripting print list of files

Its my first time to use BASH scripting and been looking to some tutorials but cant figure out some codes. I just want to list all the files in a folder, but i cant do it.
Heres my code so far.
#!/bin/bash
# My first script
echo "Printing files..."
FILES="/Bash/sample/*"
for f in $FILES
do
echo "this is $f"
done
and here is my output..
Printing files...
this is /Bash/sample/*
What is wrong with my code?
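You misunderstood what bash means by the word "in". The statement for f in $FILES simply iterates over the (space-delimited) words in the string $FILES, whose value is "/Bash/sample/*" (one word). Because $FILES is unquoted, bash then tries to expand that word as a glob, and a pattern that matches nothing is left as it is, which is why the loop printed the literal pattern. You seemingly want the files that are "in" the named directory, a spatial metaphor that bash's syntax doesn't assume, so you would have to explicitly tell it to list the files.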
for f in `ls $FILES` # illustrates the problem - but don't actually do this (see below)
...
might do it. This converts the output of the ls command into a string, "in" which there will be one word per file.
NB: this example is to help understand what "in" means but is not a good general solution. It will run into trouble as soon as one of the files has a space in its name: such files will contribute two or more words to the list, each of which taken alone may not be a valid filename. This highlights (a) that you should always take extra steps to program around the whitespace problem in bash and similar shells, and (b) that you should avoid spaces in your own file and directory names, because you'll come across plenty of otherwise useful third-party scripts and utilities that have not made the effort to comply with (a). Unfortunately, proper compliance can often lead to quite obfuscated syntax in bash.
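The whitespace problem is easy to reproduce. The sketch below, which uses a scratch directory and two made-up file names, counts how many loop iterations each approach produces for the same two files.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)                # scratch directory, for illustration only
touch "$dir/plain.txt" "$dir/red kite.txt"

ls_words=0
for f in $(ls "$dir"); do       # splits on whitespace: "red" and "kite.txt" arrive separately
    ls_words=$((ls_words + 1))
done

glob_matches=0
for f in "$dir"/*; do           # one word per file, spaces preserved
    glob_matches=$((glob_matches + 1))
done

echo "ls words: $ls_words, glob matches: $glob_matches"
```

With the default IFS this prints "ls words: 3, glob matches: 2": the ls loop sees three words for two files, while the glob loop sees each file exactly once.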
I think the problem is in the path "/Bash/sample/*".
You need to change this location to the full absolute path of a directory that exists, for example:
/home/username/Bash/sample/*
Or use a path relative to your home directory, for example (leave the ~ outside the quotes so that it expands):
~/Bash/sample/*
On most systems this is equivalent to:
/home/username/Bash/sample/*
where username is your current username; use whoami to see it.
Best place for learning Bash: http://www.tldp.org/LDP/abs/html/index.html
This should work:
echo "Printing files..."
FILES=(/Bash/sample/*) # create an array.
# Works with filenames containing spaces.
# String variable does not work for that case.
for f in "${FILES[@]}" # iterate over the array.
do
echo "this is $f"
done
Also, you should not parse ls output.
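One more detail worth knowing: by default a glob that matches nothing is left as the literal pattern, which is consistent with the "this is /Bash/sample/*" output in the question. A minimal sketch of the array approach with nullglob turned on, using a scratch directory as a stand-in for /Bash/sample (the file names are made up):

```shell
#!/usr/bin/env bash
shopt -s nullglob               # a glob with no matches expands to nothing

dir=$(mktemp -d)                # stand-in for /Bash/sample
touch "$dir/notes.txt" "$dir/two words.txt"

files=( "$dir"/* )              # one array element per file
empty=( "$dir"/no-such-prefix* )  # nothing matches, so the array is empty

echo "found ${#files[@]} files, ${#empty[@]} in the empty case"
for f in "${files[@]}"; do
    echo "this is $f"
done
```

With nullglob set, an empty or missing directory simply gives a loop that runs zero times instead of one iteration over the pattern itself.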
Getting a list of your files:
If you want to list your files and see them:
ls ###Lists files###
ls -sh ###Lists files + file sizes###
...
If you want to send the list of files to a file to read and check later:
ls > FileName.Format ###Lists files and sends the list to a file###
ls -sh > FileName.Format ###Lists files with file sizes and sends the list to a file###

Bash: Trying to append to a variable name in the output of a function

this is my very first post on Stackoverflow, and I should probably point out that I am EXTREMELY new to a lot of programming. I'm currently a postgraduate student doing projects involving a lot of coding in various programs, everything from LaTeX to bash, MATLAB etc etc.
If you could explicitly explain your answers that would be much appreciated as I'm trying to learn as I go. I apologise if there is an answer else where that does what I'm trying to do, but I have spent a couple of days looking now.
So to the problem I'm trying to solve: I'm currently using a selection of bioinformatics tools to analyse a range of genomes, and I'm trying to somewhat automate the process.
I have a few sequences with names that look like this for instance (all contained in folders of their own currently as paired files):
SOL2511_S5_L001_R1_001.fastq
SOL2511_S5_L001_R2_001.fastq
SOL2510_S4_L001_R1_001.fastq
SOL2510_S4_L001_R2_001.fastq
...and so on...
I basically wish to automate the process by turning these in to variables and passing these variables to each of the programs I use in turn. So for example my idea thus far was to assign them as wildcards, using the R1 and R2 (which appears in all the file names, as they represent each strand of DNA) as follows:
#!/bin/bash
seq1=*R1_001*
seq2=*R2_001*
On a rudimentary level this works, as it returns the correct files, so now I pass these variables to my first function which trims the DNA sequences down by a specified amount, like so:
# seqtk is the program suite, trimfq is a function within it,
# and the options -b -e specify how many bases to trim from the beginning and end of
# the DNA sequence respectively.
seqtk trimfq -b 10 -e 20 $seq1 >
seqtk trimfq -b 10 -e 20 $seq2 >
So now my problem is I wish to be able to append something like "_trim" to the output file which appears after the >, but I can't find anything that seems like it will work online.
Alternatively, I've been hunting for a script that will take the name of the folder that the files are in, and create a variable for the folder name which I can then give to the functions in question so that all the output files are named correctly for use later on.
Many thanks in advance for any help, and I apologise that this isn't really much of a minimum working example to go on, as I'm only just getting going on all this stuff!
Joe
EDIT
So I modified @ghoti's for loop (does the job wonderfully I might add, rep for you :D ) and now I append trim_, as the loop as it was before ended up giving me a .fastq.trim which will cause errors later.
Is there any way I can append _trim to the end of the filename, but before the extension?
Explicit is usually better than implied, when matching filenames. Your wildcards may match more than you expect, especially if you have versions of the files with "_trim" appended to the end!
I would be more precise with the wildcards, and use for loops to process the files instead of relying on seqtk to handle multiple files. That way, you can do your own processing on the filenames.
Here's an example:
#!/bin/bash
# Define an array of sequences
sequences=(R1_001 R2_001)
# Step through the array...
for seq in "${sequences[@]}"; do
# Step through the files in this sequence...
for file in SOL*_${seq}.fastq; do
seqtk trimfq -b 10 -e 20 "$file" > "${file}.trim"
done
done
I don't know how your folders are set up, so I haven't addressed that in this script. But the basic idea is that if you want the script to be able to manipulate individual filenames, you need something like a for loop to handle that manipulation on a per-filename basis.
Does this help?
UPDATE:
To put _trim before the extension, replace the seqtk line with the following:
seqtk trimfq -b 10 -e 20 "$file" > "${file%.fastq}_trim.fastq"
This uses something documented in the Bash man page under Parameter Expansion if you want to read up on it. Basically, the ${file%.fastq} takes the $file variable and strips off a suffix. Then we add your extra text, along with the suffix.
You could also strip an extension using basename(1), but there's no need to call something external when you can use something built in to the shell.
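Here is a small sketch of those expansions on their own, without calling seqtk. The helper name trimmed_name and the path /data/genomes/SOL2511 are made up for illustration; the second expansion shows one possible way to get the folder name the question asked about.

```shell
#!/usr/bin/env bash
# trimmed_name is a hypothetical helper used only for this illustration.
trimmed_name() {
    printf '%s\n' "${1%.fastq}_trim.fastq"   # strip .fastq, then add _trim.fastq
}

file=SOL2511_S5_L001_R1_001.fastq
folder_path=/data/genomes/SOL2511            # made-up path; "$PWD" would also work
folder=${folder_path##*/}                    # keep only the last path component

trimmed_name "$file"
echo "$folder"
```

${var%pattern} removes the shortest matching suffix and ${var##pattern} removes the longest matching prefix, so neither needs an external command such as basename.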
Instead of setting variables with the filenames, you could pipe the output of ls to the command you want to run with these filenames, like this:
ls *R{1,2}_001* | xargs -I@ sh -c 'seqtk trimfq -b 10 -e 20 "$1" > "${1}_trim"' -- @
xargs -I@ will take each line of output from the previous command and substitute it for @, so that it is passed to seqtk

How to rename files keeping a variable part of the original file name

I'm trying to make a script that will go into a directory and run my own application with each file matching a regular expression, specifically Test[0-9]*.txt.
My input filenames look like this TestXX.txt. Now, I could just use cut and chop off the Test and .txt, but how would I do this if XX wasn't predefined to be two digits? What would I do if I had Test1.txt, ..., Test10.txt? In other words, How would I get the [0-9]* part?
Just so you know, I want to be able to make a OutputXX.txt :)
EDIT:
I have files with filename Test[0-9]*.txt and I want to manipulate the string into Output[0-9]*.txt
Would something like this help?
#!/bin/bash
for f in Test*.txt ;
do
process < "$f" > "${f/Test/Output}"
done
Bash Shell Parameter Expansion
A good tutorial on regexes in bash is here. Summarizing, you need something like:
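if [[ $filenamein =~ ^Test([0-9]*)\.txt$ ]]; then
filenameout="Output${BASH_REMATCH[1]}.txt"
and so on. The key is that, when you perform the =~ regex match, the "sub-matches" to parentheses-enclosed groups in the RE are set in the entries of the array BASH_REMATCH (the [0] entry is the whole match, [1] the first parentheses-enclosed group, etc.). Note that the regex must not be quoted, since quoting makes bash match it as a literal string, that [[ and ]] need spaces around them, and that an assignment cannot have spaces around the =.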
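A self-contained sketch of the regex approach, wrapped in a function so it can be tried on a few names. The helper name output_name and the sample file names are made up for illustration.

```shell
#!/usr/bin/env bash
# output_name is a hypothetical helper used only for this illustration.
output_name() {
    local re='^Test([0-9]*)\.txt$'   # keep the regex in a variable, unquoted at use
    if [[ $1 =~ $re ]]; then
        printf 'Output%s.txt\n' "${BASH_REMATCH[1]}"
    else
        return 1                     # not a TestNN.txt name
    fi
}

output_name Test7.txt
output_name Test10.txt
output_name Notes.txt || echo "Notes.txt does not match"
```

Storing the pattern in a variable and expanding it unquoted on the right of =~ sidesteps the quoting rules inside [[ ]], and the capture group picks up the digits however many there are.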
You need to use rounded brackets around the part you want to keep.
i.e. "Test([0-9]*).txt"
The syntax for replacing these bracketed groups varies between programs, but you'll probably find you can use \1 , something like this:
s/Test\([0-9]*\)\.txt/Output\1.txt/
If you're using a unix shell, then 'sed' might be your best bet for performing the transformation.
http://www.grymoire.com/Unix/Sed.html#uh-4
Hope that helps
for file in Test[0-9]*.txt;
do
num=${file//[^0-9]/}
process "$file" > "Output${num}.txt"
done
