Automation using Bash

We have 168 subdirectories in total, all named doc_*_t. Each subdirectory contains a file std_doc_*_t.lst and a directory PA_doc_*_t_output.
I need to check in every subdirectory whether report.rpt lists any unseen modules. For this I wrote a small script:
#!/bin/bash
for d in *_t; do
    echo "$d"
    grep -h "Unseen Module" "$d"/PA_*_output/report.rpt
    echo ""
done
Output of above script is:
doc_txa1_t
Number of Unseen Modules = 4
smro 1 Unseen Module ( 1 )
mapo_HDN 1 Unseen Module ( 1 )
trms7_cpo 1 Unseen Module ( 1 )
overlay 1 Unseen Module ( 1 )
Unseen Modules:
Definition : Unseen Module (line 1)
Definition : Unseen Module (line 1)
doc_fuse_t
Number of Unseen Modules = 1
Unresolved Modules:
TEF07HD18_PH_OVERLAY 4 Unseen Module ( 1 )
Unseen Modules:
Definition : Unseen Module (line 1)
Now I need to look for these unseen modules (smro, mapo_HDN, trms7_cpo, overlay, TEF07HD18_PH_OVERLAY) under /proj/history/Unseen_Modules. For each module that is available there, its path must be added to the std_doc_*_t.lst file that exists in every subdirectory, like this:
/proj/history/Unseen_Modules/TEF07HD18_PH_OVERLAY
/proj/history/Unseen_Modules/smro
/proj/history/Unseen_Modules/mapo_HDN
/proj/history/Unseen_Modules/trms7_cpo
/proj/history/Unseen_Modules/overlay
Can anyone help me extend the script above so that, after grepping for "Unseen Module", it searches for those modules under /proj/history/Unseen_Modules/?
I tried this:
#!/bin/bash
path=/proj/history/Unseen_Modules   # no spaces around "=" in an assignment
for d in *_t; do
    echo "$d"
    grep -h "Unseen Module" "$d"/PA_*_output/report.rpt
done
I am stuck on how to search for those modules in $path.
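One possible way to approach this is sketched below. It is only a sketch under several assumptions: the per-module lines in report.rpt have the form "<name> <count> Unseen Module ( <n> )" as in the output shown above, each module appears as an entry directly under the history directory, and the .lst file holds one path per line. The function names unseen_modules and add_unseen_paths are made up for this illustration.

```bash
#!/usr/bin/env bash
# Sketch only: function names and the report-line format are assumptions.

# Print the unique module names found in the report file "$1".
# Only lines shaped like "<name> <count> Unseen Module ( <n> )" are used,
# so "Number of Unseen Modules = 4" and "Definition : ..." lines are ignored.
unseen_modules() {
    awk '$3 == "Unseen" && $4 == "Module" && $5 == "(" { print $1 }' "$1" | sort -u
}

# For each module in report "$1" that exists under history directory "$2",
# append its path to list file "$3" unless that exact line is already there.
add_unseen_paths() {
    local report=$1 history=$2 list=$3 module
    unseen_modules "$report" | while IFS= read -r module; do
        [ -e "$history/$module" ] || continue                  # not in history
        grep -qxF -- "$history/$module" "$list" 2>/dev/null && continue
        printf '%s\n' "$history/$module" >> "$list"
    done
}

# Hypothetical driver, run from the directory that holds the *_t folders.
# It assumes each glob matches exactly one file:
# for d in *_t; do
#     add_unseen_paths "$d"/PA_*_output/report.rpt \
#         /proj/history/Unseen_Modules "$d"/std_doc_*_t.lst
# done
```

Running the driver twice should not duplicate entries, because each path is checked against the list file before it is appended.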

Related

Assigning a number to a module

I need to be able to assign a number to a module; my current code is below. The gap is where the number needs to be assigned.
assignmodules(){
    # Assign first module
    x=1
    modules=$(ls ./modules)
    for module in $modules; do
        echo "$module assigned to slot $x"
        x=$((x+1))
    done
}
A few things:
The module title has to be printed in a different function, so I can't print it within the for loop.
Module needs to be run this way:
case $choice in
1) module1
2) module2
3) module3
etc.
It needs to be printed as follows (if I find a good solution that doesn't quite do this, I'll probably still use it):
[1] Module1 [2] Module2
[3] Module3 [4] Module4
etc.
I tried this in the blank:
[{x}]module=$module
(I don't think this is exactly what I tried; I believe it was slightly different, but I can't remember.)
I wanted it to do what is described above, but I don't think it will.

If you're using Bash, just use an array with some globbing and parameter expansion.
assignmodules() {
    set -- ./modules/*
    modules=(0 "${@#./modules/}")
    unset 'modules[0]'
    declare -p modules # Optionally show result
}
Elsewhere, you can print the module list using
for i in "${!modules[@]}"; do
    echo "[$i] ${modules[i]}"
done
Or, two per line:
for i in "${!modules[@]}"; do
    echo -n "[$i] ${modules[i]} "
    (( i % 2 == 0 )) && echo
done
(( i % 2 )) && echo
It's also recommended to add shopt -s nullglob at the beginning of the script, especially if the modules directory can sometimes be empty.
Lastly, if you want to ask the user for a choice, look at the select command. You may not need to display the list manually. Run help select.
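As a rough illustration of the select suggestion, the sketch below wraps it in a small function. The name choose_module and the prompt text are made up for this example; select numbers the entries itself, prints the menu on standard error, and leaves the chosen entry empty when the reply is not a valid number.

```bash
#!/usr/bin/env bash
# Sketch: choose_module is a hypothetical helper, not part of the answer above.

# Show a numbered menu of the given names and print the chosen name on stdout.
# Invalid replies are reported and the prompt is repeated.
# Returns non-zero if the input ends before a valid choice is made.
choose_module() {
    local PS3='Module number: ' choice
    select choice in "$@"; do
        if [[ -n $choice ]]; then
            printf '%s\n' "$choice"
            return 0
        fi
        echo "Invalid choice: $REPLY" >&2
    done
    return 1
}
```

With the modules array from the answer, a call could look like module=$(choose_module "${modules[@]}"), after which "$module" holds the selected name.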

Just put your modules in an array. Numerically indexed arrays (the default kind) have numbered indices by nature (note that these numbers start at 0, not 1).
assignmodules() {
    declare -ga modules # explicitly declare global array (optional)
    modules=( modules/* ) # put all modules directory entries in an array
    modules=( "${modules[@]#*/}" ) # strip modules/ prefix off each entry
}
assignmodules # calling that function leaves modules assigned
# to print your list of modules in the specific format requested might look like:
print_args=( ) # generate format string argument array
for module_idx in "${!modules[@]}"; do # iterate over indices in array
    module=${modules[$module_idx]} # retrieve corresponding module name
    print_args+=( "[$module_idx]" "$module" ) # append to list of stuff to print
done
# format string is repeated until all arguments are consumed
# so this way we get two columns as you requested
printf '%s %s\t%s %s\n' "${print_args[@]}" # actually print our list
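The question also asks for a case statement that maps each number to a module. With the names in an array, a lookup by position can stand in for that case statement. The following is a minimal sketch; module_for_choice is a hypothetical helper, and it numbers the names from 1 as in the requested menu.

```bash
#!/usr/bin/env bash
# Sketch: module_for_choice is a hypothetical helper.

# Print the name at 1-based position "$1" among the remaining arguments.
# Returns 1 when "$1" is not a number between 1 and the number of names.
module_for_choice() {
    local choice=$1
    shift
    local -a names=("$@")
    [[ $choice =~ ^[0-9]+$ ]] || return 1
    # 10# forces base 10 so that a reply such as "08" is not read as octal.
    (( 10#$choice >= 1 && 10#$choice <= ${#names[@]} )) || return 1
    printf '%s\n' "${names[10#$choice - 1]}"
}
```

A caller could then run the selected module with something like module=$(module_for_choice "$choice" "${modules[@]}") && "$module", assuming each module name is also the name of a function or command.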

baseDir issue with nextflow

This might be a very basic question, but I have just started with Nextflow and I am struggling with the simplest example.
I will first explain what I have done and then the problem.
Aim: I want to build a workflow for my bioinformatics analyses like the one here (https://www.nextflow.io/example4.html).
Background: I have installed all the required packages, and they all work from the console without any error.
My run: I used the same script as in the example, only replacing the directory names. Here is how I have arranged the directories:
location of script
~/raman/nflow/script.nf
location of Fastq files
~/raman/nflow/Data/T4_1.fq.gz
~/raman/nflow/Data/T4_2.fq.gz
Location of transcriptomic file
~/raman/nflow/Genome/trans.fa
The script
#!/usr/bin/env nextflow
/*
 * The following pipeline parameters specify the reference genomes
 * and read pairs and can be provided as command line options
 */
params.reads = "$baseDir/Data/T4_{1,2}.fq.gz"
params.transcriptome = "$baseDir/HumanGenome/SalmonIndex/gencode.v42.transcripts.fa"
params.outdir = "results"
workflow {
    read_pairs_ch = channel.fromFilePairs( params.reads, checkIfExists: true )
    INDEX(params.transcriptome)
    FASTQC(read_pairs_ch)
    QUANT(INDEX.out, read_pairs_ch)
}
process INDEX {
    tag "$transcriptome.simpleName"
    input:
    path transcriptome
    output:
    path 'index'
    script:
    """
    salmon index --threads $task.cpus -t $transcriptome -i index
    """
}
process FASTQC {
    tag "FASTQC on $sample_id"
    publishDir params.outdir
    input:
    tuple val(sample_id), path(reads)
    output:
    path "fastqc_${sample_id}_logs"
    script:
    """
    fastqc "$sample_id" "$reads"
    """
}
process QUANT {
    tag "$pair_id"
    publishDir params.outdir
    input:
    path index
    tuple val(pair_id), path(reads)
    output:
    path pair_id
    script:
    """
    salmon quant --threads $task.cpus --libType=U -i $index -1 ${reads[0]} -2 ${reads[1]} -o $pair_id
    """
}
Output:
(base) ntr@ser:~/raman/nflow$ nextflow script.nf
N E X T F L O W ~ version 22.10.1
Launching `script.nf` [modest_meninsky] DSL2 - revision: 032a643b56
executor > local (2)
executor > local (2)
[- ] process > INDEX (gencode) -
[28/02cde5] process > FASTQC (FASTQC on T4) [100%] 1 of 1, failed: 1 ✘
[- ] process > QUANT -
Error executing process > 'FASTQC (FASTQC on T4)'
Caused by:
Missing output file(s) `fastqc_T4_logs` expected by process `FASTQC (FASTQC on T4)`
Command executed:
fastqc "T4" "T4_1.fq.gz T4_2.fq.gz"
Command exit status:
0
Command output:
(empty)
Command error:
Skipping 'T4' which didn't exist, or couldn't be read
Skipping 'T4_1.fq.gz T4_2.fq.gz' which didn't exist, or couldn't be read
Work dir:
/home/ruby/raman/nflow/work/28/02cde5184f4accf9a05bc2ded29c50
Tip: view the complete command output by changing to the process work dir and entering the command `cat .command.out`
I believe I am misunderstanding baseDir. I am assuming that baseDir is the directory containing my script.nf. I am not sure what is going wrong or how to fix it.
Could anyone please help or guide me?
Thank you.

Caused by:
Missing output file(s) `fastqc_T4_logs` expected by process `FASTQC (FASTQC on T4)`
Nextflow complains when it can't find the declared output files. This can occur even if the command completes successfully, i.e. with exit status 0. The problem here is that fastqc simply skips files that don't exist or can't be read (e.g. permissions problems), but it does produce these warnings:
Skipping 'T4' which didn't exist, or couldn't be read
Skipping 'T4_1.fq.gz T4_2.fq.gz' which didn't exist, or couldn't be read
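The solution is to pass fastqc only files that exist. The first argument, "$sample_id", expands to T4, which is the sample ID and not a file. Note also that the fromFilePairs factory method emits a list of files as the second element of each tuple, so quoting "$reads" hands fastqc both names as one argument, 'T4_1.fq.gz T4_2.fq.gz', and no file with that name exists. Keep in mind that the declared output, fastqc_${sample_id}_logs, still has to be created by the script (fastqc's -o option writes into an existing directory), or the same error will come back. For the input arguments, all you need is: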
script:
"""
fastqc ${reads}
"""

Script usage for output

I have 230 directories (*_t) in which I need to grep for "Unseen Issues" in the report.rt file.
I tried this:
grep -r "Unseen Issues" *_t/A_*/ar.rt
I got the following, where pf_t, pu_t, pv_t and pz_t are directories:
pf_t/A_output/ar.rt:Number of Unseen Issues = 3
pf_t/A_output/ar.rt:adsd1p2r 50 Unseen Issues ( 1 )
pf_t/A_output/ar.rt:edsd1p2r 50 Unseen Issues ( 1 )
pf_t/A_output/ar.rt:wdsd1p2r 50 Unseen Issues ( 1 )
pu_t/A_output/ar.rt:Number of Unseen Issues = 0
pv_t/A_output/ar.rt:Number of Unseen Issues = 0
pz_t/A_output/ar.rt:Number of Unseen Issues = 0
But I need the output in the following form:
pf_t
Number of Unseen Issues = 3
adsd1p2r 50 Unseen Issues ( 1 )
edsd1p2r 50 Unseen Issues ( 1 )
wdsd1p2r 50 Unseen Issues ( 1 )
pu_t
Number of Unseen Issues = 0
pv_t
Number of Unseen Issues = 0
pz_t
Number of Unseen Issues = 0
Can anyone please help me with a small script that produces the output above using the grep command?
Any scripting language is fine.

I would recommend using AWK if the intermediate text file test1.txt is important. Otherwise a shell loop is simple:
for d in *_t; do
    echo "$d"
    grep -h "Unseen Issues" "$d"/A_*/ar.rt
    echo ""
done
I added a test case and ran the script from the command line:
$ find * -type f
pf_t/A_output/ar.rt
pu_t/A_output/ar.rt
pv_t/A_output/ar.rt
pz_t/A_output/ar.rt
$ for d in *_t; do echo "$d"; grep -h "Unseen Issues" $d/A_*/ar.rt; echo ""; done
pf_t
Number of Unseen Issues = 3
adsd1p2r 50 Unseen Issues ( 1 )
edsd1p2r 50 Unseen Issues ( 1 )
wdsd1p2r 50 Unseen Issues ( 1 )
pu_t
Number of Unseen Issues = 0
pv_t
Number of Unseen Issues = 0
pz_t
Number of Unseen Issues = 0
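For the AWK route mentioned in this answer, a minimal sketch could post-process the original grep output instead of looping over directories. The helper name group_by_dir is made up for this example, and it assumes every input line has the form "dir/sub/file:matched text" with no colon inside the path.

```bash
#!/usr/bin/env bash
# Sketch: group_by_dir is a hypothetical helper.

# Read "dir/rest/of/path:matched text" lines from stdin and print each
# top-level directory once, followed by the matched text of its lines.
group_by_dir() {
    awk '{
        slash = index($0, "/")           # end of the top-level directory name
        colon = index($0, ":")           # end of the file path
        dir = substr($0, 1, slash - 1)
        if (dir != prev) {               # first line of a new directory
            print dir
            prev = dir
        }
        print substr($0, colon + 1)      # matched text without the path
    }'
}
```

It would be used as grep -H "Unseen Issues" *_t/A_*/ar.rt | group_by_dir, where -H keeps the file name prefix even if only one file matches the glob.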

batch processing : File name comparison error

I have written a program (Cifti_subject_fmri) that checks whether file names match in two folders and, when they do, executes a set of instructions:
#!/bin/bash -- fix_mni_paths
source activate ciftify_v1.0.0
export SUBJECTS_DIR=/scratch/m/mchakrav/dev/functional_data
export HCP_DATA=/scratch/m/mchakrav/dev/tCDS_ciftify
## make the $SUBJECTS_DIR if it does not already exist
mkdir -p ${HCP_DATA}
SUBJECTS=`cd $SUBJECTS_DIR; ls -1d *` ## list of my subjects
HCP=`cd $HCP_DATA; ls -1d *` ## List of HCP Subjects
cd $HCP_DATA
## submit the files to the queue
for i in $SUBJECTS;do
for j in $HCP ; do
if [[ $i == $j ]];then
parallel "echo ciftify_subject_fmri $i/filtered_func_data.nii.gz $j fMRI " ::: $SUBJECTS |qbatch --walltime '05:00:00' --ppj 8 -c 4 -j 4 -N ciftify_subject_fmri -
fi
done
done
When I run this code on the cluster, I get the following error:
./Cifti_subject_fmri: [[AS1: command not found
The ciftify_subject_fmri command is part of the ciftify toolbox. To run, it needs the following arguments:
ciftify_subject_fmri <func.nii.gz> <Subject> <NameOffMRI>
I have 33 subjects (AS1 to AS33), each with its own func.nii.gz file located in the SUBJECTS directory. The results need to be written to the HCP directory, and fMRI is the name of the file format.
Could someone kindly let me know why I am getting an error in the loop?
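Bash prints a message such as "[[AS1: command not found" when [[ is immediately followed by the expanded variable, as in [[$i == $j ]]; the word [[AS1 is then looked up as a command. The script as posted does have the space, so the copy that ran on the cluster may differ from it. The sketch below shows one way to find the names present in both directories without a nested loop; common_subjects is a hypothetical helper, and it assumes each subject is an entry directly under both directories.

```bash
#!/usr/bin/env bash
# Sketch: common_subjects is a hypothetical helper; the layout is assumed.

# Print the names of entries that exist in both directories, one per line.
common_subjects() {
    local subjects_dir=$1 hcp_dir=$2 path name
    for path in "$subjects_dir"/*; do
        name=${path##*/}                 # strip the leading directory part
        # The spaces after [[ and before ]] are required: "[[$name" would be
        # parsed as a command name such as "[[AS1".
        if [[ -e $path && -e $hcp_dir/$name ]]; then
            printf '%s\n' "$name"
        fi
    done
}
```

Each name printed by common_subjects "$SUBJECTS_DIR" "$HCP_DATA" could then be turned into one ciftify_subject_fmri command line and piped to qbatch, as the original script does.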

To run "pylint" tool on multiple Python files on Windows Machine

I want to run the "pylint" tool on multiple Python files in one directory.
I want one consolidated report for all the Python files.
I am able to run it on one Python file at a time, but I want to run it on a bunch of files.
Please help with the command for this.

I'm not a Windows user, but isn't "pylint directory/*.py" enough?
If the directory is a package (on the PYTHONPATH), you can also run "pylint directory".

Someone wrote a Python wrapper to handle this.
The code:
#!/usr/bin/env python3
'''
Module that runs pylint on all python scripts found in a directory tree.
'''
import os
import re
import subprocess
import sys
total = 0.0
count = 0
def check(module):
    '''
    Apply pylint to the file specified if it is a *.py file.
    '''
    global total, count
    if module.endswith(".py"):
        print("CHECKING", module)
        # Run pylint on the file and capture its report.
        result = subprocess.run(["pylint", module],
                                capture_output=True, text=True, check=False)
        for line in result.stdout.splitlines():
            if re.match(r"E....:.", line):
                print(line)
            if "Your code has been rated at" in line:
                print(line)
                # The first number on the line is the score, e.g. 9.50 or -1.25.
                score = re.findall(r"-?\d+\.\d\d", line)[0]
                total += float(score)
                count += 1
if __name__ == "__main__":
    try:
        print(sys.argv)
        BASE_DIRECTORY = sys.argv[1]
    except IndexError:
        print("no directory specified, defaulting to current working directory")
        BASE_DIRECTORY = os.getcwd()
    print("looking for *.py scripts in subdirectories of", BASE_DIRECTORY)
    for root, dirs, files in os.walk(BASE_DIRECTORY):
        for name in files:
            filepath = os.path.join(root, name)
            check(filepath)
    print("==" * 50)
    print("%d modules found" % count)
    if count:  # avoid dividing by zero when no file was rated
        print("AVERAGE SCORE = %.02f" % (total / count))
