Bash how to copy the next value from csv - bash

I have this CSV file. How can I copy the first row together with the next two z_hosts values (server.2=cl1z2 and server.3=cl1z3)? I want to repeat this task with the other values (e.g. third row and server.2=cl2z2 server.3=cl2z3).
z_name;z_hosts;z_clientPort;z_leaderPort;z_electionport;tickTime;initLimit;syncLimit;snapRetainCount;purgeInterval
cl1z1;server.1=cl1z1;2180;2890;3890;200;5;2;3;24
cl1z2;server.2=cl1z2;2181;2891;3891;200;5;2;3;24
cl1z3;server.3=cl1z3;2182;2892;3892;200;5;2;3;24
cl2z1;server.1=cl2z1;2183;2893;3893;200;5;2;3;24
cl2z2;server.2=cl2z2;2184;2894;3894;200;5;2;3;24
cl2z3;server.3=cl2z3;2185;2895;3895;200;5;2;3;24
Thanks for Help.

You can try this:
result=""
while IFS= read -r line; do
  if echo "$line" | grep -q 'z_name'; then
    # header line: keep it unchanged
    result+="$line\n"
  else
    # data line: keep only the second field (z_hosts)
    result+="$(echo "$line" | cut -d';' -f2)\n"
  fi
done < your_csv_file
echo -e "$result"
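If the goal is instead to print each cluster's first row followed by the z_hosts values of its next two rows, a loop that reads three rows per iteration may be closer. This is only a sketch under two assumptions not stated in the question: every cluster has exactly three consecutive rows, and the desired output is one space-separated line per cluster. The function name group_rows is made up for illustration.

```bash
# Sample data copied from the question.
csv=$(mktemp)
cat > "$csv" <<'EOF'
z_name;z_hosts;z_clientPort;z_leaderPort;z_electionport;tickTime;initLimit;syncLimit;snapRetainCount;purgeInterval
cl1z1;server.1=cl1z1;2180;2890;3890;200;5;2;3;24
cl1z2;server.2=cl1z2;2181;2891;3891;200;5;2;3;24
cl1z3;server.3=cl1z3;2182;2892;3892;200;5;2;3;24
cl2z1;server.1=cl2z1;2183;2893;3893;200;5;2;3;24
cl2z2;server.2=cl2z2;2184;2894;3894;200;5;2;3;24
cl2z3;server.3=cl2z3;2185;2895;3895;200;5;2;3;24
EOF

# group_rows (hypothetical name): skip the header, then consume three rows
# per iteration. The first row is kept whole; from the second and third rows
# only field 2 (z_hosts) is kept.
group_rows() {
  tail -n +2 "$1" |
    while IFS= read -r first &&
          IFS=';' read -r skip host2 rest &&
          IFS=';' read -r skip host3 rest; do
      printf '%s %s %s\n' "$first" "$host2" "$host3"
    done
}

group_rows "$csv"
# cl1z1;server.1=cl1z1;2180;2890;3890;200;5;2;3;24 server.2=cl1z2 server.3=cl1z3
# cl2z1;server.1=cl2z1;2183;2893;3893;200;5;2;3;24 server.2=cl2z2 server.3=cl2z3
rm -f "$csv"
```

A trailing group with fewer than three rows is silently dropped by this loop.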

Related

How can I make cut take less time to process?

I have a text file with a whole bunch of lines (1000 exactly), and each line has 4 bits of text, separated by a ;.
Here is the while loop I'm using to go through each line:
while IFS= read -r line; do
let liner++
if [[ liner -eq "1" ]]; then
continue
fi
name=$(echo "${line}" | cut -d';' -f1)
fullname=$(echo "${line}" | cut -d';' -f2)
id=$(echo "${line}" | cut -d';' -f3)
test=$(echo "${line}" | cut -d';' -f4)
echo "${GREEN}$(($liner-1))) ${name} ${ORANGE}v${test} ${RED}(${id})${NC}"
stuff+=("${fullname}")
done < list.txt
It takes about 5 seconds before it finishes running and I believe it's from all those cut (name, fullname, id, test) variables. What would be the best solution to speed this up?
Awk undoubtedly provides a better solution, but if you don't want to learn Awk right now, you could speed your function up a lot by just using read to split the lines into fields:
liner=0
stuff=()
while IFS=\; read -r name fullname id test; do
echo "$GREEN$((++liner))) $name ${ORANGE}v$test $RED($id)$NC"
stuff+=("$fullname")
done < <(tail -n+2 list.txt)
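For comparison, the awk route mentioned above might look roughly like this. It is a sketch, not a drop-in replacement: the colour variables are left out, the sample rows are invented, and the function name format_list is hypothetical. The field order (name;fullname;id;test) is taken from the question.

```bash
# Invented sample in the question's layout, including a header row.
list=$(mktemp)
printf '%s\n' 'name;fullname;id;test' \
              'foo;Foo Tool;101;1.2' \
              'bar;Bar Tool;102;0.9' > "$list"

# format_list (hypothetical name): a single awk process handles every line,
# so no per-line subshells or cut calls are needed. NR > 1 skips the header.
format_list() {
  awk -F';' 'NR > 1 { printf "%d) %s v%s (%s)\n", NR - 1, $1, $4, $3 }' "$1"
}

format_list "$list"
# 1) foo v1.2 (101)
# 2) bar v0.9 (102)
rm -f "$list"
```

Unlike the read-based loop, this does not fill the stuff array; collecting the fullname column into a shell array would need a separate step.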

Bash while loop: Preventing third-party commands from reading stdin

Assume an input table (intable.csv) that contains ID numbers in its second column, and a fresh output table (outtable.csv) into which the input file, extended by one column, is to be written line by line.
echo -ne "foo,NC_045043\nbar,NC_045193\nbaz,n.a.\nqux,NC_045054\n" > intable.csv
echo -n "" > outtable.csv
Further assume that one or more third-party commands (here: esearch, efetch; both part of Entrez Direct) are employed to retrieve additional information for each ID number. This additional info is to form the third column of the output table.
while IFS="" read -r line || [[ -n "$line" ]]
do
echo -n "$line" >> outtable.csv
NCNUM=$(echo "$line" | awk -F"," '{print $2}')
if [[ $NCNUM == NC_* ]]
then
echo "$NCNUM"
RECORD=$(esearch -db nucleotide -query "$NCNUM" | efetch -format gb)
echo "$RECORD" | grep "^LOCUS" | awk '{print ","$3}' | \
tr -d "\n" >> outtable.csv
else
echo ",n.a." >> outtable.csv
fi
done < intable.csv
Why does the while loop iterate only over the first input table entry under the above code, whereas it iterates over all input table entries if the code lines starting with RECORD and echo "$RECORD" are commented out? How can I correct this behavior?
This would happen if esearch reads from standard input. It will inherit the input redirection from the while loop, so it will consume the rest of the input file.
The solution is to redirect its standard input elsewhere, e.g. from /dev/null.
while IFS="" read -r line || [[ -n "$line" ]]
do
echo -n "$line" >> outtable.csv
NCNUM=$(echo "$line" | awk -F"," '{print $2}')
if [[ $NCNUM == NC_* ]]
then
echo "$NCNUM"
RECORD=$(esearch -db nucleotide -query "$NCNUM" </dev/null | efetch -format gb)
echo "$RECORD" | grep "^LOCUS" | awk '{print ","$3}' | \
tr -d "\n" >> outtable.csv
else
echo ",n.a." >> outtable.csv
fi
done < intable.csv
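The effect can be reproduced without Entrez Direct. In the sketch below, greedy is a made-up stand-in for any command that reads standard input (as esearch is assumed to do here), and count_lines is a hypothetical helper that reports how many times the loop body ran.

```bash
input=$(mktemp)
printf '%s\n' one two three > "$input"

# Hypothetical stand-in for a command that consumes its standard input.
greedy() { cat > /dev/null; }

# count_lines MODE FILE: loop over FILE and print the number of iterations.
# With MODE=fixed, greedy's stdin is redirected from /dev/null.
count_lines() {
  n=0
  while IFS= read -r line; do
    n=$((n + 1))
    if [ "$1" = fixed ]; then
      greedy < /dev/null
    else
      greedy            # inherits the loop's stdin and swallows the rest
    fi
  done < "$2"
  echo "$n"
}

count_lines broken "$input"   # prints 1
count_lines fixed "$input"    # prints 3
rm -f "$input"
```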

exiting an IF statement after initial match bash scripting

I have a script which iterates through a file and finds matches in another file. How do I get the process to stop once I've found a match?
For example:
I take the first line in name.txt, and then try to find a match for it in file.txt.
name.txt:
7,7,FRESH,98,135,
65,10,OLD,56,45,
file.txt:
7,7,Dave,S
8,10,Frank,S
31,7,Gregg
45,5,Jake,S
Script:
while read line
do
name_id=`echo $line | cut -f1,2 -d ','`
identifierOne=`echo $name_id | cut -f1 -d ','`
identifierTwo=`echo $name_id | cut -f2 -d ','`
while IFS= read line
do
CHECK=`echo $line | cut -f4 -d','`
if [ "$CHECK" = "S" ]
then
symbolName=`echo $line | cut -f3 -d ','`
numberOne=`echo $line | awk -F',' '{print $1}'`
numberTwo=`echo $line | cut -f2 -d ','`
if [ "$numberOne" == "$identifierOne" ] && [ "$numberTwo" == "$identifierTwo" ]
then
echo "WE HAVE A MATCH with $symbolName"
break
fi
fi
done < /tmp/file.txt
done < /tmp/name.txt
My question is - how do I stop the script from iterating through file.txt once it has found an initial match, and then set that matched record into a variable, stop the if statement, then do some other stuff within the loop using that variable. I tried using break; but that exits the loop, which is not what I want.
You can tell grep several things:
Stop searching after the first match (option -m 1).
Read the search keys from a file (option -f file).
Treat the output of a command as a file with <(cmd). This is not really grep; it is bash's process substitution.
Combining these will give you
grep -m1 -f <(cut -d"," -f1-2 name.txt) file.txt
Close, but not what you want. The substrings given by cut -d"," -f1-2 name.txt will match everywhere in the line, and you want to match the first two fields. Matching at the start of the line is done with ^, so we use sed to make strings like ^field1,field2 :
grep -m1 -f <(sed 's/\([^,]*,[^,]*,\).*/^\1/' name.txt) file.txt
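To keep the outer loop running and have the matched record in a variable, as the question asks, one option is to call grep once per line of name.txt. The following is a sketch under assumptions: the identifiers contain no regex metacharacters, the check for S in the fourth field is left out, and the function name find_matches is made up. The sample data is copied from the question.

```bash
dir=$(mktemp -d)
printf '%s\n' '7,7,FRESH,98,135,' '65,10,OLD,56,45,' > "$dir/name.txt"
printf '%s\n' '7,7,Dave,S' '8,10,Frank,S' '31,7,Gregg' '45,5,Jake,S' > "$dir/file.txt"

# find_matches NAMEFILE DATAFILE (hypothetical name)
find_matches() {
  while IFS=, read -r id1 id2 rest; do
    # -m1 makes grep stop at the first line that starts with "id1,id2,".
    # grep reads DATAFILE, not stdin, so it does not disturb the loop input.
    if match=$(grep -m1 "^$id1,$id2," "$2"); then
      echo "WE HAVE A MATCH with $match"
      # further work with "$match" would go here; the outer loop continues
    fi
  done < "$1"
}

find_matches "$dir/name.txt" "$dir/file.txt"
# WE HAVE A MATCH with 7,7,Dave,S
rm -rf "$dir"
```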

shell script - select CSV lines by float value

I'm stuck with a strange behavior while reading a CSV file and selecting its lines with a specific column float value.
Here's an extract from the input file.
ben#truc:$ head summary.fasta.csv
scf7180000753635;170043549;XP_001849446.1;27.72;184;2e-13;74.7
scf7180000753636;340728919;XP_003402759.1;25.78;322;8e-19;93.6
scf7180000753642;328716306;XP_003245892.1;33.51;191;7e-27;119
scf7180000753642;512919417;XP_004929373.1;43.18;132;1e-23;108
scf7180000753642;512914080;XP_004928052.1;40.16;127;5e-21;94.7
scf7180000753664;328696819;XP_003240139.1;37.99;179;2e-23;107
scf7180000753664;328696819;XP_003240139.1;26.67;30;2e-23;25.4
scf7180000753664;328703138;XP_003242103.1;31.65;218;1e-20;99.4
scf7180000753669;383855900;XP_003703448.1;68.92;74;2e-23;102
scf7180000753669;380030611;XP_003698937.1;72.06;68;3e-22;99.8
Here's my shell script code:
#!/bin/sh
echo "extracting the values"
# prepare output files
echo "" > "40_sequence_identity.csv"
echo "" > "60_sequence_identity.csv"
echo "" > "80_sequence_identity.csv"
while read -r line; do
#debug: check if line is correctly read
echo $line
#attribute each CSV column value to a variable
query=`echo $line | cut -d ';' -f1`
gi=`echo $line | cut -d ';' -f2`
refseq=`echo $line | cut -d ';' -f3`
seq_identity=`echo $line | cut -d ';' -f4`
align_length=`echo $line | cut -d ';' -f5`
evalue=`echo $line | cut -d ';' -f6`
score=`echo $line | -d ';' -f7`
#debug: check if cut command is OK
echo "seqidentity:"$seq_identity
# test float value of column 4; if it reaches a threshold, write the line to the matching file
if [ $( echo "$seq_identity >= 40" | bc ) ]; then
echo "$line" >> "40_sequence_identity.csv"
fi
if [ $( echo "$seq_identity >= 60" | bc ) ]; then
echo "$line" >> "60_sequence_identity.csv"
fi
if [ $( echo "$seq_identity >= 80" | bc ) ]; then
echo "$line" >> "80_sequence_identity.csv"
fi
done < "summary.fasta.csv"
echo "DONE!"
And here's the strange outputs.
extracting the values
scf7180000753635;170043549;XP_001849446.1;27.72;184;2e-13;74.7
./create_project_directories.sh: 1: ./create_project_directories.sh: -d: not found
seqidentity:27.72
scf7180000753636;340728919;XP_003402759.1;25.78;322;8e-19;93.6
./create_project_directories.sh: 1: ./create_project_directories.sh: -d: not found
seqidentity:25.78
scf7180000753642;328716306;XP_003245892.1;33.51;191;7e-27;119
./create_project_directories.sh: 1: ./create_project_directories.sh: -d: not found
seqidentity:33.51
scf7180000753642;512919417;XP_004929373.1;43.18;132;1e-23;108
./create_project_directories.sh: 1: ./create_project_directories.sh: -d: not found
seqidentity:43.18
scf7180000753642;512914080;XP_004928052.1;40.16;127;5e-21;94.7
./create_project_directories.sh: 1: ./create_project_directories.sh: -d: not found
seqidentity:40.16
scf7180000753664;328696819;XP_003240139.1;37.99;179;2e-23;107
./create_project_directories.sh: 1: ./create_project_directories.sh: -d: not found
seqidentity:37.99
scf7180000753664;328696819;XP_003240139.1;26.67;30;2e-23;25.4
./create_project_directories.sh: 1: ./create_project_directories.sh: -d: not found
seqidentity:26.67
First, the 3 output files (blast_summary_superior_40_sequence_identity.csv ...) contain all the lines, as if the tests didn't work.
Second, the file parsing seems OK, but this strange message, -d: not found, comes out of nowhere. It appears before the echo displaying the value of $seq_identity, though, and is probably related to the cut command.
Any idea why I get this output?
When I manually execute the commands in the console, this works.
But not when executing the whole script.
Thanks for your help.
You are getting the error -d: not found because the command on line 17 of the script is incomplete; the cut command name is missing after the pipe:
score=`echo $line | -d ';' -f7`
It should be:
score=$(echo "$line" | cut -d ';' -f7)
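As for the first symptom (every line lands in all three files), a likely cause is the form of the test: [ STRING ] is true for any non-empty string, and bc prints either 0 or 1, both of which are non-empty. With bc, the test would have to compare that output to 1, for example [ "$(echo "$seq_identity >= 40" | bc)" -eq 1 ]. The sketch below shows the same idea with awk doing the float comparison instead of bc; float_ge is a hypothetical helper name, not something from the original script.

```bash
# [ STRING ] only checks that the string is non-empty, so "0" counts as true.
if [ 0 ]; then echo "always true"; fi

# float_ge A B (hypothetical helper): succeed only if A >= B, compared as numbers.
float_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 >= b + 0) }'
}

# Values taken from column 4 of the sample input.
for v in 27.72 43.18 68.92; do
  if float_ge "$v" 40; then
    echo "$v >= 40"
  fi
done
# always true
# 43.18 >= 40
# 68.92 >= 40
```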

shell scripting pipeline to a variable

I have the following:
FILENAME=$1
cat $FILENAME | while read LINE
do
response="$LINE" | cut -c1-14
request="$LINE" | cut -c15-31
difference=($response - $request)/1000
echo "$difference"
done
When I run this script it returns blank lines. What am I doing wrong?
Might be simpler in awk:
awk '{print ($1 - $2)/1000}' "$1"
I'm assuming that the first 14 chars and the next 17 chars are the first two blank-separated fields.
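If that assumption does not hold and the two values really are fixed-width columns (characters 1-14 and 15-31, as in the cut calls), awk's substr can slice them by position instead. A minimal sketch with an invented input line; diff_fixed_width is a hypothetical name:

```bash
# diff_fixed_width (hypothetical name): take characters 1-14 and the next 17
# characters of each line, subtract them, and divide by 1000.
diff_fixed_width() {
  awk '{ print (substr($0, 1, 14) - substr($0, 15, 17)) / 1000 }' "$@"
}

# Invented record: 5000 zero-padded to 14 characters, 2000 zero-padded to 17.
printf '%014d%017d\n' 5000 2000 | diff_fixed_width
# 3
```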
You need to change it to:
response=`echo $LINE | cut -c1-14`
request=`echo $LINE | cut -c15-31`
difference=`expr $response - $request`
val=`expr $difference / 1000`
You are basically doing everything wrong ;)
This should be better:
FILENAME="$1"
cat "$FILENAME" | while read LINE
do
response=$(echo "$LINE" | cut -c1-14) # or cut -c1-14 <<< "$LINE"
request=$(echo "$LINE" | cut -c15-31)
difference=$(( ($response - $request) / 1000 ))
echo "$difference"
done
