Storing a line in a variable - shell

Hi, I have the following batch script, in which I submit each file as a separate job:
for file in ../Positive/*.txt_rn; do
bsub <<EOF
#BSUB -L /bin/bash
#BSUB -W 150:00
#BSUB -M 10000
#BSUB -n 3
#BSUB -e /somefolder/errors/%J.err
#BSUB -o /somefolder/errors/%J.out
while read line; do
name=`cat \$line | awk '{print $1":"$2"-"$3}'`
four=`cat \$line | awk '{print $4}' | cut -d\: -f4`
fasta=\$name".fa"
op=\$name".rs"
echo \$name | xargs samtools faidx /somefolder/rn4/Rattus_norvegicus/UCSC/rn4/Sequence/WholeGenomeFasta/genome.fa > \$fasta
Process -F \$fasta -M "list_"\$four".txt" -p 0.003 | awk '(\$5 >= 0.67)' > \$op
if [ -s "\$op" ]
then
cat "\$line" >> ../Positive_Strand/$file".cons"
fi
rm \$lne
rm \$op
rm \$fasta
done < $file
EOF
done
I am somehow unable to store the column values from the line (which is in the $line variable) in the $name and $four variables, and hence I am unable to carry out the later steps. Any suggestions for improving the code would also be welcome.

If you change EOF to 'EOF' then you will more properly disable shell interpretation. Your problem is that your back-ticks (`) are not escaped.
I've fixed your indentation and cleaned up some of your code. Note that the syntax highlighting here doesn't understand cat <<'EOF'. If you paste that into vim with highlighting enabled, you'll see that block is all the same color since it's just a string.
bsub_helper() {
cat <<'EOF'
#BSUB -L /bin/bash
#BSUB -W 150:00
#BSUB -M 10000
#BSUB -n 3
#BSUB -e /somefolder/errors/%J.err
#BSUB -o /somefolder/errors/%J.out
while read line; do
name=`cat $line | awk '{print $1":"$2"-"$3}'`
four=`cat $line | awk '{print $4}' | cut -d: -f4`
fasta="$name.fa"
op="$name.rs"
genome="/somefolder/rn4/Rattus_norvegicus/UCSC/rn4/Sequence/WholeGenomeFasta/genome.fa"
echo $name | xargs samtools faidx "$genome" > "$fasta"
Process -F "$fasta" -M "list_$four.txt" -p 0.003 | awk '($5 >= 0.67)' > "$op"
if [ -s "$op" ]
then
cat "$line" >> "../Positive_Strand/$file.cons"
fi
rm "$lne" "$op" "$fasta"
EOF
echo " done < \"$1\""
}
for file in ../Positive/*.txt_rn; do
bsub_helper "$file" |bsub
done
I created a helper function because I needed to get the input in two commands. I am assuming that $file is the only variable in that block that you want interpreted. I also surrounded that variable (among others) with quotes so that the code can support file names with spaces in them. The final line of the helper has nested double quotes for this reason.
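The effect of quoting the heredoc delimiter can be seen in a small sketch. The function names emit_unquoted and emit_quoted and the value of file are made up for illustration, and nothing is submitted to bsub here:

```shell
#!/usr/bin/env bash
# Hypothetical demo: how the heredoc delimiter controls expansion.
file="sample.txt"

emit_unquoted() {
  # Unquoted delimiter: the current shell expands $file immediately;
  # \$line is escaped, so it reaches the output as a literal $line.
  cat <<EOF
echo "$file" "\$line"
EOF
}

emit_quoted() {
  # Quoted delimiter: nothing is expanded; the text passes through verbatim.
  cat <<'EOF'
echo "$file" "$line"
EOF
}

emit_unquoted   # prints: echo "sample.txt" "$line"
emit_quoted     # prints: echo "$file" "$line"
```

With 'EOF', any value that has to come from the submitting shell (such as the input file name) must be emitted outside the quoted block, which is what the final echo in the helper does. A $file written inside the quoted block is left for the job's own shell to expand.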
I left your echo $name | xargs … line alone because it's so odd. Without quotes around $name, xargs will take each whitespace-separated entry as its own file. With quotes, xargs will only supply one (likely invalid) file name to samtools.
If $name is a single file, try:
samtools faidx "$genome" "$name" > "$fasta"
If $name is multiple files and none of them have spaces, try:
samtools faidx "$genome" $name > "$fasta"
The only reason to use xargs here would be if you have too much content for one command line, but if you're running echo $name | xargs then you'll run into the same problem.
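The quoting difference described above can be checked with a tiny sketch. count_args is a hypothetical helper and the region strings are made up; it only reports how many arguments a command such as samtools would receive:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of arguments it was called with.
count_args() {
  echo "$#"
}

name="chr1:100-200 chr2:300-400"   # made-up sample value

count_args $name      # unquoted: split on whitespace, prints 2
count_args "$name"    # quoted: passed as one argument, prints 1
```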

Related

Shell: Add string to the end of each line, which match the pattern. Filenames are given in another file

I'm still new to the shell and need some help.
I have a file stapel_old.
Also I have in the same directory files like english_old_sync, math_old_sync and vocabulary_old_sync.
The content of stapel_old is:
english
math
vocabulary
The content of e.g. english is:
basic_grammar.md
spelling.md
orthography.md
I want to manipulate all files which are given in stapel_old like in this example:
take the first line of stapel_old 'english', (after that math, and so on)
convert in this case english to english_old_sync, (or after that what is given in second line, e.g. math to math_old_sync)
search in english_old_sync line by line for the pattern '.md'
And append to each line after .md :::#a1
The result should be e.g. of english_old_sync:
basic_grammar.md:::#a1
spelling.md:::#a1
orthography.md:::#a1
of math_old_sync:
geometry.md:::#a1
fractions.md:::#a1
and so on. stapel_old should stay unchanged.
How can I realize that?
I tried with sed -n, while loop (while read -r line), and I'm feeling it's somehow the right way - but I still get errors and not the expected result after 4 hours inspecting and reading.
Thank you!
EDIT
Here is the working code (The files are stored in folder 'olddata'):
clear
echo -e "$(tput setaf 1)$(tput setab 7)Learning directories:$(tput sgr 0)\n"
# put here directories which should not become flashcards, command: | grep -v 'name_of_directory_which_not_to_learn1' | grep -v 'directory2'
ls ../ | grep -v 00_gliederungsverweise | grep -v 0_weiter | grep -v bibliothek | grep -v notizen | grep -v Obsidian | grep -v z_nicht_uni | tee olddata/stapel_old
# count folders
echo -ne "\nHow much different folders: " && wc -l olddata/stapel_old | cut -d' ' -f1 | tee -a olddata/stapel_old
echo -e "Are this learning directories correct? [j ODER y]--> yes; [Other]-->no\n"
read lernvz_korrekt
if [ "$lernvz_korrekt" = j ] || [ "$lernvz_korrekt" = y ];
then
read -n 1 -s -r -p "Learning directories correct. Press any key to continue..."
else
read -n 1 -s -r -p "Learning directories not correct, please change in line 4. Press any key to continue..."
exit
fi
echo -e "\n_____________________________\n$(tput setaf 6)$(tput setab 5)Found cards:$(tput sgr 0)$(tput setaf 6)\n"
#GET && WRITE FOLDER NAMES into olddata/stapel_old
anzahl_zeilen=$(cat olddata/stapel_old |& tail -1)
#GET NAMES of .md files of every stapel and write All to 'stapelname'_old_sync
i=0
name="var_$i"
for (( num=1; num <= $anzahl_zeilen; num++ ))
do
i="$((i + 1))"
name="var_$i"
name=$(cat olddata/stapel_old | sed -n "$num"p)
find ../$name/ -name '*.md' | grep -v trash | grep -v Obsidian | rev | cut -d'/' -f1 | rev | tee olddata/$name"_old_sync"
done
(tput sgr 0)
I tried to add:
input="olddata/stapel_old"
while IFS= read -r line
do
sed -n "$line"p olddata/stapel_old
done < "$input"
The code to change only the english_old_sync is:
lines=$(wc -l olddata/english_old_sync | cut -d' ' -f1)
for ((num=1; num <= $lines; num++))
do
content=$(sed -n "$num"p olddata/english_old_sync)
sed -i "s/"$content"/""$content":::#a1/g"" olddata/english_old_sync
done
So now this needs to be an inner for-loop inside an outer for-loop that holds the variable for english, right?
stapel_old should stay unchanged.
You could try a while + read loop and embed sed inside the loop.
#!/usr/bin/env bash
while IFS= read -r files; do
echo cp -v "$files" "${files}_old_sync" &&
echo sed '/^.*\.md$/s/$/:::#a1/' "${files}_old_sync"
done < olddata/stapel_old
convert in this case english to english_old_sync, (or after that what is given in second line, e.g. math to math_old_sync)
cp copies the file under a new name. If the goal is to rename the original files listed in stapel_old, change cp to mv.
The -n and -i flags of sed were omitted; include them if needed.
The script also assumes that there are no empty or blank lines in stapel_old. If there are, add an additional test right after the line containing do.
[[ -n $files ]] || continue
It also assumes that the entries in stapel_old are existing files. Just in case, add an additional test.
[[ -e $files ]] || { printf >&2 '%s no such file or directory.\n' "$files"; continue; }
Or an if statement.
if [[ ! -e $files ]]; then
printf >&2 '%s no such file or directory\n' "$files"
continue
fi
See also help test
See also help continue
Combining them all together should be something like:
#!/usr/bin/env bash
while IFS= read -r files; do
[[ -n $files ]] || continue
[[ -e $files ]] || {
printf >&2 '%s no such file or directory.\n' "$files"
continue
}
echo cp -v "$files" "${files}_old_sync" &&
echo sed '/^.*\.md$/s/$/:::#a1/' "${files}_old_sync"
done < olddata/stapel_old
Remove the echos once you're satisfied with the output, so that the script actually copies/renames and edits the files.
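As a rough sketch of the same idea without the echo dry run and without sed -i: the functions append_tag and tag_all below are hypothetical names, and the sketch assumes the *_old_sync files already exist in the given directory, as the question describes. It writes through a temporary file so that it does not depend on GNU sed.

```shell
#!/usr/bin/env bash
# Append ":::#a1" to every line ending in ".md"; other lines pass through unchanged.
append_tag() {
  sed '/\.md$/s/$/:::#a1/'
}

# Hypothetical helper: for each name listed in $1, rewrite "$2/<name>_old_sync".
# The list file itself is only read, never modified.
tag_all() {
  local list=$1 dir=$2 name target
  while IFS= read -r name; do
    [[ -n $name ]] || continue          # skip blank lines
    target="$dir/${name}_old_sync"
    [[ -e $target ]] || continue        # skip entries without a matching file
    append_tag < "$target" > "$target.tmp" && mv "$target.tmp" "$target"
  done < "$list"
}

# Quick check of the sed expression on made-up input:
printf '%s\n' basic_grammar.md spelling.md notes.txt | append_tag
```

It would be called as tag_all olddata/stapel_old olddata. Note that running it twice appends the marker twice, because lines ending in :::#a1 no longer end in .md only after the first run; a second run leaves them alone, but newly added .md lines would still be tagged.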

Bash script to stdout stuck with redirect

My bash script is the following:
#!/bin/bash
if [ ! -f "$1" ]; then
exit
fi
while read line;do
str1="[GAC]*T"
num=$"(echo $line | tr -d -c 'T' | wc -m)"
for((i=0;i<$num;i++))do
echo $line | sed "s/$str1/&\n/" | head -n1 -q
str1="${str1}[GAC]*T"
done
str1="[GAC]*T"
done < "$1
While it works normally as it should (take the filename input and print it line by line until the letter T and next letter T and so on) it prints to the terminal.
Input:
GATTT
ATCGT
Output:
GAT
GATT
GATTT
AT
ATCGT
When I'm using the script with | tee outputfile the outputfile is correct but when using the script with > outputfile the terminal hangs / is stuck and does not finish. Moreover it works with bash -x scriptname inputfile > outputfile but is stuck with bash scriptname inputfile > outputfile.
I made modifications to your original script, please try:
if [ ! -f "$1" ]; then
exit
fi
while IFS='' read -r line || [[ -n "$line" ]];do
str1="[GAC]*T"
num=$(echo $line | tr -d -c 'T' | wc -m)
for((i=0;i<$num;i++));do
echo $line | sed "s/$str1/&\n/" | head -n1 -q
str1="${str1}[GAC]*T"
done
str1="[GAC]*T"
done < "$1"
For input:
GATTT
ATCGT
This script outputs:
GAT
GATT
GATTT
AT
ATCGT
Modifications made to your original script were:
Line while read line; do changed to while IFS='' read -r line || [[ -n "$line" ]]; do. Why I did this is explained here: Read a file line by line assigning the value to a variable
Line num=$"(echo $line | tr -d -c 'T' | wc -m)" changed to num=$(echo $line | tr -d -c 'T' | wc -m)
Line for((i=0;i<$num;i++))do changed to for((i=0;i<$num;i++));do
Line done < "$1 changed to done < "$1"
Now you can do: ./scriptname inputfile > outputfile
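The effect of the extra || [[ -n "$line" ]] test is easiest to see on input whose last line has no trailing newline. The two function names in this sketch are made up for illustration:

```shell
#!/usr/bin/env bash
# Count lines with a plain read loop: a final line without a newline is dropped.
count_lines_plain() {
  local n=0 line
  while IFS= read -r line; do
    n=$((n + 1))
  done
  echo "$n"
}

# Same loop with the extra test: read fails at end of input but has still
# filled $line, so the non-empty check lets the last line through.
count_lines_safe() {
  local n=0 line
  while IFS= read -r line || [[ -n $line ]]; do
    n=$((n + 1))
  done
  echo "$n"
}

printf 'GATTT\nATCGT' | count_lines_plain   # prints 1
printf 'GATTT\nATCGT' | count_lines_safe    # prints 2
```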
Try:
sed -r 's/([^T]*T+)/\1\n/g' gatc.txt > outputfile
instead of your script.
It takes some optional non-Ts, followed by at least one T and inserts a newline after the T.
cat gatc.txt
GATGATTGATTTATATCGT
sed -r 's/([^T]*T+)/\1\n/g' gatc.txt
GAT
GATT
GATTT
AT
AT
CGT
For multiple lines, to delete empty lines in the end:
echo "GATTT
ATCGT" | sed -r 's/([^T]*T+)/\1\n/g;' | sed '/^$/d'
GATTT
AT
CGT
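If the goal is exactly the output shown in the question (every prefix of the line that ends in a T), a pure-bash sketch without sed or head is also possible. The function name prefixes_to_T is made up here, and this is only one way to do it:

```shell
#!/usr/bin/env bash
# Print every prefix of the string $1 that ends in the letter T.
prefixes_to_T() {
  local s=$1 i
  for (( i = 0; i < ${#s}; i++ )); do
    # ${s:i:1} is the character at position i; print the prefix up to and including it.
    if [[ ${s:i:1} == T ]]; then
      printf '%s\n' "${s:0:i+1}"
    fi
  done
}

prefixes_to_T GATTT   # prints GAT, GATT, GATTT on separate lines
prefixes_to_T ATCGT   # prints AT, ATCGT on separate lines
```

To process a file, the function can be called from a while IFS= read -r line loop, once per line.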

bash scripting to add users

I created a bash script to read information such as username, group etc. from a text file and create users based on it in Linux. The code seems to function properly and creates the users as desired. But the user information in the last line of the text file always gets misinterpreted. Even if I delete that line, the new last line gets misinterpreted, i.e. the text is read wrongly.
#!/bin/bash
userfile="users.txt"
IFS=$'\n'
if [ ! -f "$userfile" ]
then
echo "File does not exist. Specify a valid file and try again. "
exit
fi
groups=(`cut -f 4 "$userfile" | sed 's/ //'`)
fullnames=(`cut -f 1 "$userfile" | sed 's/,//' | sed 's/"//g'`)
username1=(`cut -f 1 "$userfile" |sed 's/,//' | sed 's/"//' | tr [A-Z] [a-z] | awk '{print substr($2,1,1) substr($3,1,1) substr($1,1,1)}'`)
username2=(`cut -f 4 "$userfile" | tr [A-Z] [a-z] | awk '{print substr($1,1,1)}'`)
i=0
n=${#username1[@]}
for (( q=0; q<n; q++ ))
do
usernames[$q]=${username1[$q]}"${username2[$q]}"
done
declare -a usernames
x=0
created=0
for user in ${usernames[*]}
do
adduser -c ${fullnames[$x]} -p 123456789 -f 15 -m -d /home/${groups[$x]}/$user -K LOGIN_RETRIES=3 -K PASS_MAX_DAYS=30 -K PASS_WARN_AGE=3 -N -s /bin/bash $user 2> /dev/null
usermod -g ${groups[$x]} $user
chage -d 0 $user
let created=$created+1
x=$x+1
echo -e "User $user created "
done
echo "$created Users created"
[screenshot of the text file]
#!/bin/bash
userfile="./users.txt"; # <-- Config
while read line; do
# FULL NAME
# Capture all between quotes as full name
fullname=$(printf '%s' "${line}" | sed 's/^"\(.*\)".*/\1/')
# Remove spaces and punctuations???:
fullname=$(printf '%s' "${fullname}" | tr -d '[:punct:][:blank:]')
# Right-side names:
partb=$(printf '%s' "${line}" | sed "s/^\".*\"//g")
# CODE 1, capture second field
code1=$(printf '%s' "${partb}" | cut -f 2 )
# CODE 2, capture third field
code2=$(printf '%s' "${partb}" | cut -f 3 )
# GROUP, capture fourth field
group=$(printf '%s' "${partb}" | cut -f 4 )
# Print only for report (printf, because bash's echo does not expand \n without -e)
printf 'fullname: %s\n code 1: %s\n code 2: %s\n group: %s\n\n' "${fullname}" "${code1}" "${code2}" "${group}"
done <${userfile}
These may be the fields that you want; they are now available in variables that you can manipulate: $fullname, $code1, $code2 and $group.
The failure you observed may also be due to a misplaced quotation mark in the text file or to the line breaks; in the attached screenshot I can see one missing quote.
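A shorter way to split such a line is to let read do the work. This is only a sketch: it assumes each line has the form "Last, First Middle"<TAB>code1<TAB>code2<TAB>group, which is a guess based on the cut -f calls in the question, and parse_user_line and the sample values are made up.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split one tab-separated line into its four fields.
parse_user_line() {
  local line=$1 fullname code1 code2 group
  # Split on tabs only, so spaces inside the name are preserved.
  IFS=$'\t' read -r fullname code1 code2 group <<< "$line"
  fullname=${fullname//\"/}    # drop the double quotes
  fullname=${fullname//,/}     # drop the comma
  printf '%s|%s|%s|%s\n' "$fullname" "$code1" "$code2" "$group"
}

parse_user_line $'"Doe, John A"\tx1\ty2\tstaff'   # prints: Doe John A|x1|y2|staff
```

One caveat: a tab counts as IFS whitespace, so consecutive tabs are collapsed and an empty field in the middle of a line would shift the later fields to the left.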

How to pass a variable string to a txt file at the beginning of the text?

I have a problem.
I have a general program, gene.sh,
that for every file (e.g. geneX.csv) creates a directory named after the gene (example: geneX/geneX.csv). This program then runs another program from inside gene.sh, but that program needs a variable and I don't know how to pass it.
This is the program gene.sh:
#!/bin/bash
# Create a directory for each *.xlsx and *.csv file
for fname in *.xlsx *csv
do
dname=${fname%.*}
[[ -d $dname ]] || mkdir "$dname"
mv "$fname" "$dname"
done
# For each gene, go inside the directory and run getChromosomicPositions.sh to get the positions, and getHaplotypeStrings.sh to get the variants
for geni in */; do
cd $geni
z=$(tail -n 1 *.csv | tr ';' "\n" | wc -l)
cd ..
cp getChromosomicPositions.sh $geni
cp getHaplotypeStrings.sh $geni
cd $geni
export z
./getChromosomicPositions.sh *.csv
export z
./getHaplotypeStrings.sh *.csv
cd ..
done
This is the program getChromosomicPositions.sh:
rm chrPosRs.txt
grep '^Haplotype\ ID' $1 | cut -d ";" -f 4-61 | tr ";" "\n" | awk '{print "select chrom,chromStart,chromEnd,name from snp147 where name=\""$1"\";"}' > listOfQuery.txt
while read l; do
echo $l > query.txt
mysql -h genome-mysql.cse.ucsc.edu -u genome -A -D hg38 --skip-column-names < query.txt > queryResult.txt
if [[ "$(cat queryResult.txt)" == "" ]];
then
cat query.txt |
while read line; do
echo $line | awk '$6 ~/rs/ {print $6}' > temp.txt;
if [[ "$(cat temp.txt)" != "" ]];
then cat temp.txt | awk -F'name="' '{print $2}' | sed -e 's/";//g' > temp.txt;
./getHGSVposHG19.sh temp.txt # <--- here is the problem
else
echo $line | awk '{num=sub(/.*:g\./,"");num+=sub(/\".*/,"");if(num==2){print};num=""}' > temp2.txt
fi
done
cat query.txt >> varianti.txt
echo "Missing Data" >> chrPosRs.txt
else
cat queryResult.txt >> chrPosRs.txt
fi
done < listOfQuery.txt
rm query*
Here is the problem:
I need to automatically insert the value of the variable $geni from gene.sh at the beginning of the file temp.txt.
How can I do that?
Why not pass "$geni" as, say, the first argument when invoking your script, and treat the rest of the arguments as your expected .csv files?
./getChromosomicPositions.sh "$geni" *.csv
Alternatively, you can set it as environment variable for the script, so that it can be used there (or just export it).
geni="$geni" ./getChromosomicPositions.sh *.csv
In any case, once you have it available in the second script, you can do the following.
If passed as the first argument:
echo "${1}:$(cat temp.txt | awk -F'name="' '{print $2}' | sed -e 's/";//g')"
Or, if passed as an environment variable:
echo "${geni}:$(cat temp.txt | awk -F'name="' '{print $2}' | sed -e 's/";//g')"
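The two ways of handing the value to the second script can be compared in a small self-contained sketch. child.sh and the value GeneX are stand-ins made up for illustration; they are not the real getChromosomicPositions.sh:

```shell
#!/usr/bin/env bash
# Create a throwaway child script in a temporary directory.
workdir=$(mktemp -d)
cat > "$workdir/child.sh" <<'EOF'
#!/usr/bin/env bash
# $1 is the first argument; $geni is only set if it is in the environment.
echo "arg=$1 env=${geni:-unset}"
EOF

geni="GeneX"   # plain shell variable, not exported

by_arg=$(bash "$workdir/child.sh" "$geni")        # passed as the first argument
by_env=$(geni="$geni" bash "$workdir/child.sh")   # passed in the environment of this one command

echo "$by_arg"   # prints: arg=GeneX env=unset
echo "$by_env"   # prints: arg= env=GeneX
rm -rf "$workdir"
```

The first line of output also shows that a variable which is merely set in the parent, without export, is not visible in the child.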

A script to find all the users who are executing a specific program

I've written a bash script (searchuser) which should display all the users who are executing a specific program or script (at least a bash script). But searching for scripts fails, because the command the OS is executing is something like bash scriptname.
The script works by parsing the output of the ps command: it searches for all occurrences of the specified program name, extracts the user and the program name, verifies that the program name is the one we are searching for, and if so displays the relevant information (in this case the user name and the program name; it might be better to also output the PID, but that is quite simple). The verification rejects all lines whose program name merely contains the name we are searching for; if we're searching for gedit we don't want to find sgedit or gedits.
Other issues I have are:
I would like to avoid the use of a tmp file.
I would like not to be tied to GNU extensions.
The script has to be executed as:
root# searchuser programname <Enter>
The script searchuser is the following:
#!/bin/bash
i=0
search=$1
tmp=`mktemp`
ps -aux | tr -s ' ' | grep "$search" > $tmp
while read fileline
do
user=`echo "$fileline" | cut -f1 -d' '`
prg=`echo "$fileline" | cut -f11 -d' '`
prg=`basename "$prg"`
if [ "$prg" = "$search" ]; then
echo "$user - $prg"
i=`expr $i + 1`
fi
done < $tmp
if [ $i = 0 ]; then
echo "No users are executing $search"
fi
rm $tmp
exit $i
Do you have any suggestions for solving these issues?
One approach might look like this:
IFS=$'\n' read -r -d '' -a pids < <(pgrep -x -- "$1"; printf '\0')
if (( ! ${#pids[@]} )); then
echo "No users are executing $1"
fi
for pid in "${pids[@]}"; do
# build a more accurate command line than the one ps emits
args=( )
while IFS= read -r -d '' arg; do
args+=( "$arg" )
done </proc/"$pid"/cmdline
(( ${#args[@]} )) || continue # exited while we were running
printf -v cmdline_str '%q ' "${args[@]}"
user=$(stat --format=%U /proc/"$pid") || continue # exited while we were running
printf '%q - %s\n' "$user" "${cmdline_str% }"
done
Unlike the output from ps, which doesn't distinguish between ./command "some argument" and ./command "some" "argument", this will emit output which correctly shows the arguments run by each user, with quoting which will re-run the given command correctly.
What about:
ps -e -o user,comm | egrep "^[^ ]+ +$1$" | cut -d' ' -f1 | sort -u
* Addendum *
This statement:
ps -e -o user,pid,comm | egrep "^\s*\S+\s+\S+\s*$1$" | while read a b; do echo $a; done | sort | uniq -c
or this one:
ps -e -o user,pid,comm | egrep "^\s*\S+\s+\S+\s*sleep$" | xargs -L1 echo | cut -d ' ' -f1 | sort | uniq -c
shows the number of process instances by user.
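For the two wishes in the question (no temporary file, no GNU extensions), the exact-match filter can also be sketched with awk on a pipe. users_running is a hypothetical name and the sample input is made up; the sketch assumes the command name contains no spaces, and that ps -e -o user= -o comm= is available, where the empty headers suppress the title line:

```shell
#!/bin/sh
# Read "user command" pairs on stdin and print each user running exactly $1.
users_running() {
  # $2 == prog is an exact string comparison, so sgedit or gedits do not match gedit.
  awk -v prog="$1" '$2 == prog { print $1 }' | sort -u
}

# Typical use:
#   ps -e -o user= -o comm= | users_running gedit
# Demonstration on made-up input:
printf '%s\n' 'alice gedit' 'bob sgedit' 'alice gedit' 'carol gedit' | users_running gedit
```

The demonstration prints alice and carol, one per line. As with the other ps-based answers, a script started as bash scriptname shows up under the command name bash, so it would not be found by the script's own name this way.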
