Neuroimaging: AFNI bash

I am new to this, and I want to run the following script in the shell, but I am getting errors. COLUMNS.nii is a NIfTI file containing a collection of masks with values ranging from 1 to 10, and I want to separate these masks into individual NIfTI files using this AFNI command in a for loop.
Any suggestions are welcome,
thanks
K.
for i in {1..10};
  do
3dcalc -a COLUMNS.nii -expr ‘equals(a, "${i}”)’ -prefix col_"${i}”.nii;
done

It seems that you may have edited your code using a word processing program, such as Microsoft Word, which doesn't put "normal" single- and double-quote characters into the file.
In your program the line
3dcalc -a COLUMNS.nii -expr ‘equals(a, "${i}”)’ -prefix col_"${i}”.nii;
has those "curly" single- and double-quote characters. Change this to
3dcalc -a COLUMNS.nii -expr 'equals(a, "${i}")' -prefix col_"${i}".nii;
When editing code I recommend that you use a programming-specific editor - there are many out there - rather than a word processing program.
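One further caveat, not stated in the answer above: inside single quotes the shell will not expand ${i}, so the expression would be passed to 3dcalc literally. Here is a minimal sketch of the loop with the quoting arranged so that $i does expand, assuming the same COLUMNS.nii input and the same equals() expression from the question:
#!/bin/bash
# Split COLUMNS.nii into one binary mask per label value 1..10.
for i in {1..10}; do
    # Double quotes let $i expand while still protecting the parentheses and comma from the shell.
    3dcalc -a COLUMNS.nii -expr "equals(a, $i)" -prefix "col_${i}.nii"
done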

Related

Using space-separated arguments from a field in a tab-separated file

I'm writing a shell script intended to edit audio files using the sox command. I've been running into a strange problem I never encountered in bash scripting before: When defining space separated effects in sox, the command will work when that effect is written directly, but not when it's stored in a variable. This means the following works fine and without any issues:
sox ./test.in.wav ./test.out.wav delay 5
Yet for some reason the following will not work:
IFS=$'\t' # set IFS to only have a tab character because file is tab-separated
while read -r file effects text; do
sox $file.in.wav $file.out.wav $effects
done <in.txt
...when in.txt is created with:
printf '%s\t%s\t%s\n' "test" "delay 5" "other text here" >in.txt
The error indicates this is causing it to see the output file as another input.
sox FAIL formats: can't open input file `./output.wav': No such file or directory
I tried everything I could think of: Using quotation marks (sox "$file.in.wav" "$file.out.wav" "$effects"), echoing the variable in-line (sox $file.in.wav $file.out.wav $(echo $effects)), even escaping the space inside the variable (effects="delay\ 5"). Nothing seems to work, everything produces the error. Why does one command work but not the other, what am I missing and how do I solve it?
IFS does not only change the behavior of read; it also changes the behavior of unquoted expansions.
In particular, the contents of unquoted expansions are split on characters found in IFS, before each element resulting from that split is expanded as a glob.
Thus, if you want the space between delay and 5 to be used for word splitting, you need to have a regular space, not just a tab, in IFS. If you move your IFS assignment to be part of the same simple command as the read, as in IFS=$'\t' read -r file effects text; do, that will stop it from changing behavior in the rest of the script.
However, it's not good practice to use unquoted expansions for word-splitting at all. Use an array instead. You can split your effects string into an array with:
IFS=' ' read -r -a effects_arr <<<"$effects"
...and then run sox "$file.in.wav" "$file.out.wav" "${effects_arr[@]}" to expand each item in the array as a separate word.
By contrast, if you need quotes/escapes/etc. to be allowed in effects, see Reading quoted/escaped arguments correctly from a string.
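Putting those pieces together, a minimal sketch of the corrected loop (assuming the tab-separated in.txt from the question) might look like:
#!/usr/bin/env bash
# Scope the tab-only IFS to the read, so the rest of the script is unaffected.
while IFS=$'\t' read -r file effects text; do
    # Split only the effects field on spaces, into an array of separate words.
    IFS=' ' read -r -a effects_arr <<<"$effects"
    sox "$file.in.wav" "$file.out.wav" "${effects_arr[@]}"
done <in.txt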

Can't seem to use more than one -c argument for tesseract

I'm just using tesseract through bash scripting. I've finally come up with all the settings that recognize my text for my images nearly perfectly; however, I can't seem to use all of the options together. My command is as follows:
$ tesseract infile.tif outputbase --psm 6 -c tosp_min_sane_kn_sp=0.0;tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-+&/\
I need the whitelist, because tesseract is picking up some lowercase characters, strange characters (such as yen sign), and other oddities. My images do not contain those characters, and since my document is quite simple I figured it would just be easier to whitelist the ones that do exist. Additionally, the image is in a "table" format (without any lines or borders), and tesseract only picks up the large spaces (which separate columns) and not individual spaces in between words within a column. Setting the tosp value to 0 seemed to fix that problem.
Now the issue is that tesseract won't process with both of those -c arguments at the same time, but the man page explicitly states that you can use multiple -c arguments!
I've also tried to work around in the following way:
my_config_file
tosp_min_sane_kn_sp 0.0
tessedit_char_whitelist ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-+&/\
$ tesseract infile.tif outputbase --psm 6 my_config_file
The config file is saved in the correct directory, but again only one of the options will work at a time. If both options are in the config file, it seems like it ignores the tosp_min_sane_kn_sp 0.0. If I remove one, then the other works.
I'm pulling out my hair here, and I'm about to just work around this issue by running the OCR twice and then merging the two files with an awk script. I really don't want to do that, however, because it's obviously less efficient, and I don't like the idea of using awk when the OCR output isn't guaranteed to be formatted 100% in the way my awk script would have to assume.
Please help!
EDIT:
I forgot to mention that I have indeed tried to pass multiple -c options. Instead of guessing at various field separators between variables, a semicolon made the most sense to me, because I understand that tesseract is written in C++, which uses semicolons to end statements. I know C++ isn't interpreted, but it just seemed to make sense. Now I'm digressing...
Additionally, I've tried the advice of putting the whitelist in quotation marks, but that has made no difference. I was really excited because that didn't even occur to me, but it doesn't seem that tesseract even recognizes quotations even if I run that one -c argument by itself.
You can't pass multiple arguments to a single -c option, especially not separated by semicolons. I don't have tesseract, but I'm pretty sure you need to pass a separate -c option for each config variable you want to set:
tesseract infile.tif outputbase --psm 6 -c tosp_min_sane_kn_sp=0.0 -c 'tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-+&/\'
(I also enclosed the second variable setting in single-quotes, so the shell doesn't try to interpret the backslash. Without the quotes, it'd escape the newline, so the next line would be treated as a continuation of this one.)
Explanation of the original problem: When the shell sees a semicolon (and it isn't in quotes or escaped), the shell treats it as a command separator. So it treated the line as two completely separate commands (with the next line combined, because of the backslash):
tesseract infile.tif outputbase --psm 6 -c tosp_min_sane_kn_sp=0.0
tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-+&/ <whatever's on the next line of the file>
The first runs tesseract with one -c option, and the second one creates a shell variable named tessedit_char_whitelist. And even if you quoted or escaped it, so the semicolon got passed to tesseract, I suspect it wouldn't treat it as a separator the way you want it to.
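As a side note (not part of the original answer), a quick way to check how the shell splits a command line is to substitute a command that simply prints each argument it receives on its own line; with the quoting above, both -c settings arrive as separate, intact arguments:
# Hypothetical stand-in for tesseract: print each argument on its own line.
printf '%s\n' infile.tif outputbase --psm 6 \
    -c tosp_min_sane_kn_sp=0.0 \
    -c 'tessedit_char_whitelist=ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-+&/\'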

Printf splits a string at spaces using Bash [duplicate]

This question already has answers here:
Why a variable assignment replaces tabs with spaces
I'm having some troubles with the printf function in bash.
I wrote a little script to which I pass a name and two letters (such as "sh", "py", "ht"), and it creates a file in the current working directory named "name.extension".
For instance, if I execute seed test py a file named test.py is created in the current working dir with the shebang #!/usr/bin/python3.
So far, so good, nothing fancy: I'm learning shell scripting and I thought this could be a simple exercise to test the knowledge gained so far.
The problem is when I want to create an HTML file. This is the function that I use:
creaHtml(){
    head='<!--DOCTYPE html-->\n<html>\n\t<head>\n\t\t<meta charset=\"UTF-8\">\n\t</head>\n\t<body>\n\t</body>\n</html>'
    percorso=$CARTELLA_CORRENTE/$NOME_FILE.html
    printf $head>>$percorso
    chmod 755 $percorso
}
If I run, for instance, seed test ht the correct function (creaHtml) is called, test.html is created but if I try to look into it I only see:
<!--DOCTYPE
And nothing else.
This is the trace for that function:
[sviluppo:~/bin]$ seed test ht
+ creaHtml
+ head='<!--DOCTYPE html-->\n<html>\n\t<head>\n\t\t<meta charset=\"UTF-8\">\n\t</head>\n\t<body>\n\t</body>\n</html>'
+ percorso=/home/sviluppo/bin/test.html
+ printf '<!--DOCTYPE' 'html-->\n<html>\n\t<head>\n\t\t<meta' 'charset=\"UTF-8\">\n\t</head>\n\t<body>\n\t</body>\n</html>'
+ chmod 755 /home/sviluppo/bin/test.html
+ set +x
However, if I try to run printf '<!--DOCTYPE html-->\n<html>\n\t<head>\n\t\t<meta charset=\"UTF-8\">\n\t</head>\n\t<body>\n\t</body>\n</html>' from the terminal, I see the correct output: the "skeleton" of an HTML file neatly displayed with indentation and everything. What am I missing here?
Try echo -e instead of printf. printf is for printing formatted strings. Since you didn't protect $head with quotes, bash split the string into multiple words when forming the command. The first word (everything before the first whitespace) became the format string; the rest became extra arguments for conversions you didn't specify, so they weren't printed.
echo -e "$head" > "$percorso"
The -e evaluates your \n into newlines. I changed your >> to > since it looks like you want this to be the whole file, rather than append to any existing file you might have.
You have to be careful with quotes in bash. One thing can become many things. This actually makes it more powerful, but it can be confusing for people learning. Notice that I also put the file name "$percorso" in double quotes too. This evaluates the variable and makes sure that it ends up as one thing. If you use single quotes, it will be one word, but not evaluated. Unlike Python, there is a big difference between single and double quotes.
If you want to use printf for compatibility, as @chepner pointed out, just be sure to quote it:
printf "$head" > "$percorso"
Actually that is much simpler anyway.
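As an aside (not part of either suggestion above), a quoted here-document sidesteps the escaping and word-splitting questions entirely. This is a sketch assuming the same $CARTELLA_CORRENTE and $NOME_FILE variables from the question, and using the standard <!DOCTYPE html> declaration:
creaHtml(){
    percorso=$CARTELLA_CORRENTE/$NOME_FILE.html
    # The quoted delimiter ('EOF') prevents any expansion inside the document,
    # so the skeleton is written out literally, newlines and indentation included.
    cat > "$percorso" <<'EOF'
<!DOCTYPE html>
<html>
  <head>
    <meta charset="UTF-8">
  </head>
  <body>
  </body>
</html>
EOF
    chmod 755 "$percorso"
}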

Substitution of substring doesn't work in bash (tried sed, ${a/b/c/})

Before writing this, of course, I read many other similar cases. For example, I used #!/bin/bash instead of #!/bin/sh.
I have a very simple script that reads lines from a template file and wants to replace some keywords with real data. For example, the string <NAME> will be replaced with a real name; here I want to replace it with the word Giuseppe. I tried 2 solutions but they don't work.
#!/bin/bash
#read the template and change variable information
while read LINE
do
sed 'LINE/<NAME>/Giuseppe' #error: sed: -e expression #1, char 2: extra characters after command
${LINE/<NAME>/Giuseppe} #error: WORD(*) command not found
done < template_mail.txt
(*) WORD is the first word found in the line
I am sorry if the question is too basic, but I cannot see the error and the error message is not helping.
EDIT1:
The input file should not be changed; I want to use it for every mail. Every time I read it, I will substitute a different name according to the receiver.
EDIT2:
Thanks to your answers I am closer to the solution. My example was a simplified case, but I also want to change other data. I want to make multiple substitutions in the same string, but Bash only lets me make one substitution. In every programming language I have used I was able to substitute within a string, but Bash makes this very difficult for me. The following lines don't work:
CUSTOM_MAIL=$(sed 's/<NAME>/Giuseppe/' template_mail.txt) # from file it's ok
CUSTOM_MAIL=$(sed 's/<VALUE>/30/' CUSTOM_MAIL) # from variable doesn't work
I want to modify CUSTOM_MAIL a few times in order to include a few real informations.
CUSTOM_MAIL=$(sed 's/<VALUE1>/value1/' template_mail.txt)
${CUSTOM_MAIL/'<VALUE2>'/'value2'}
${CUSTOM_MAIL/'<VALUE3>'/'value3'}
${CUSTOM_MAIL/'<VALUE4>'/'value4'}
What's the way?
No need to do the loop manually. The sed command itself runs the expression on each line of the provided file:
sed 's/<NAME>/Giuseppe/' template_mail.txt > output_file.txt
You might need the g modifier if <NAME> appears more than once on a line: s/<NAME>/Giuseppe/g
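Regarding the edit about chaining several substitutions (a sketch, not part of the answer above): sed accepts multiple -e expressions in one pass, and further replacements can be applied to the variable itself with bash's ${var//pattern/replacement} expansion, assigning the result back each time. The placeholder names below are the ones from the question:
# All substitutions in a single sed pass over the template:
CUSTOM_MAIL=$(sed -e 's/<NAME>/Giuseppe/g' -e 's/<VALUE>/30/g' template_mail.txt)

# Further replacements on the variable itself; the result must be assigned
# back, since the expansion does not modify CUSTOM_MAIL in place:
CUSTOM_MAIL=${CUSTOM_MAIL//'<VALUE2>'/value2}
CUSTOM_MAIL=${CUSTOM_MAIL//'<VALUE3>'/value3}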

Bash: Trying to append to a variable name in the output of a function

This is my very first post on Stack Overflow, and I should probably point out that I am EXTREMELY new to a lot of programming. I'm currently a postgraduate student doing projects involving a lot of coding in various programs, everything from LaTeX to bash, MATLAB and so on.
If you could explain your answers explicitly, that would be much appreciated, as I'm trying to learn as I go. I apologise if there is an answer elsewhere that does what I'm trying to do, but I have spent a couple of days looking now.
So to the problem I'm trying to solve: I'm currently using a selection of bioinformatics tools to analyse a range of genomes, and I'm trying to somewhat automate the process.
I have a few sequences with names that look like this for instance (all contained in folders of their own currently as paired files):
SOL2511_S5_L001_R1_001.fastq
SOL2511_S5_L001_R2_001.fastq
SOL2510_S4_L001_R1_001.fastq
SOL2510_S4_L001_R2_001.fastq
...and so on...
I basically wish to automate the process by turning these in to variables and passing these variables to each of the programs I use in turn. So for example my idea thus far was to assign them as wildcards, using the R1 and R2 (which appears in all the file names, as they represent each strand of DNA) as follows:
#!/bin/bash
seq1=*R1_001*
seq2=*R2_001*
On a rudimentary level this works, as it returns the correct files, so now I pass these variables to my first function which trims the DNA sequences down by a specified amount, like so:
# seqtk is the program suite, trimfq is a function within it,
# and the options -b -e specify how many bases to trim from the beginning and end of
# the DNA sequence respectively.
seqtk trimfq -b 10 -e 20 $seq1 >
seqtk trimfq -b 10 -e 20 $seq2 >
So now my problem is I wish to be able to append something like "_trim" to the output file which appears after the >, but I can't find anything that seems like it will work online.
Alternatively, I've been hunting for a script that will take the name of the folder that the files are in, and create a variable for the folder name which I can then give to the functions in question so that all the output files are named correctly for use later on.
Many thanks in advance for any help, and I apologise that this isn't really much of a minimum working example to go on, as I'm only just getting going on all this stuff!
Joe
EDIT
So I modified @ghoti's for loop (it does the job wonderfully, I might add; rep for you :D) and now I append trim_, as the loop as it was before ended up giving me a .fastq.trim extension, which will cause errors later.
Is there any way I can append _trim to the end of the filename, but before the extension?
Explicit is usually better than implicit when matching filenames. Your wildcards may match more than you expect, especially if you have versions of the files with "_trim" appended to the end!
I would be more precise with the wildcards, and use for loops to process the files instead of relying on seqtk to handle multiple files. That way, you can do your own processing on the filenames.
Here's an example:
#!/bin/bash
# Define an array of sequences
sequences=(R1_001 R2_001)
# Step through the array...
for seq in "${sequences[@]}"; do
    # Step through the files in this sequence...
    for file in SOL*_${seq}.fastq; do
        seqtk trimfq -b 10 -e 20 "$file" > "${file}.trim"
    done
done
I don't know how your folders are set up, so I haven't addressed that in this script. But the basic idea is that if you want the script to be able to manipulate individual filenames, you need something like a for loop to handle that manipulation on a per-filename basis.
Does this help?
UPDATE:
To put _trim before the extension, replace the seqtk line with the following:
seqtk trimfq -b 10 -e 20 "$file" > "${file%.fastq}_trim.fastq"
This uses something documented in the Bash man page under Parameter Expansion if you want to read up on it. Basically, the ${file%.fastq} takes the $file variable and strips off a suffix. Then we add your extra text, along with the suffix.
You could also strip an extension using basename(1), but there's no need to call something external when you can use something built in to the shell.
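For illustration (a hypothetical interactive snippet, using one of the filenames from the question), the expansion produces:
file=SOL2511_S5_L001_R1_001.fastq
echo "${file%.fastq}_trim.fastq"
# prints: SOL2511_S5_L001_R1_001_trim.fastq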
Instead of setting variables with the filenames, you could pipe the output of ls to the command you want to run with these filenames, like this:
ls *R{1,2}_001* | xargs -I# sh -c 'seqtk trimfq -b 10 -e 20 "$1" > "${1}_trim"' -- #
xargs -I# takes each line of output from the previous command and substitutes it for # so it can be passed to seqtk.
