Appending text to a specific line in a file - bash
So I have a file that contains some lines of text separated by ','. I want to create a script that counts how many parts a line has, and if the line contains 16 parts I want to add a new one. So far it's working great. The only thing that is not working is appending the ',' at the end. See my example below:
Original file:
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
Expected result:
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
This is my code:
while read p; do
    if [[ $p == "HEA"* ]]
    then
        IFS=',' read -ra ADDR <<< "$p"
        echo ${#ADDR[@]}
        arrayCount=${#ADDR[@]}
        if [ "${arrayCount}" -eq 16 ];
        then
            sed -i "/$p/ s/\$/,xx/g" $f
        fi
    fi
done <$f
Result:
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
,xx
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
,xx
What am I doing wrong? I'm sure it's something small but I can't find it.
It can be done using awk:
awk -F, 'NF==16{$0 = $0 FS "xx"} 1' file
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a
b,b,b,b,b,b
a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,a,xx
-F, sets the input field separator to a comma
NF==16 is the condition: execute the block inside { and } only when the number of fields is 16
$0 = $0 FS "xx" appends the field separator plus xx to the end of the line
1 is an always-true condition whose default action is to print the (possibly modified) line
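Note that, unlike sed -i, this awk one-liner writes to standard output. To update the file in place you could redirect to a temporary file and move it back; a sketch, assuming the file name is held in $f as in the question:

awk -F, 'NF==16{$0 = $0 FS "xx"} 1' "$f" > "$f.tmp" && mv "$f.tmp" "$f"

(GNU awk 4.1+ can also do the rewrite itself with gawk -i inplace.)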
For doing it with sed, the answer lies in the following:
Use the ${line_number}s/..../..../ form: to target a specific line, you need to find out the line number first.
Use the special character & to denote the matched string.
The sed statement should look like the following:
sed -i "${line_number}s/.*/&xx/"
I would rather leave it to you to play around with it, but if you would prefer, I can give you a full working sample.
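For example, a minimal sketch of that approach (assuming the file name is in $f as in the question; it appends ,xx to match the expected output above and drops the HEA filter for brevity):

line_number=0
while IFS= read -r p; do
    line_number=$((line_number + 1))
    IFS=',' read -ra ADDR <<< "$p"
    if [ "${#ADDR[@]}" -eq 16 ]; then
        # & stands for the whole matched line, so this appends ,xx to it
        sed -i "${line_number}s/.*/&,xx/" "$f"
    fi
done < "$f"

Because sed -i replaces the file with a new copy, the loop keeps reading from its original file descriptor and the line numbering is unaffected by the edits.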
Related
Unix bash - using cut to regex lines in a file, match regex result with another similar line
I have a text file: file.txt, with several thousand lines. It contains a lot of junk lines which I am not interested in, so I use the cut command to regex for the lines I am interested in first. For each entry I am interested in, it will be listed twice in the text file: once in a "definition" section, another in a "value" section. I want to retrieve the first value from the "definition" section, and then for each entry found there find its corresponding "value" section entry. The first entry starts with ' gl_ ', while the 2nd entry would look like ' "gl_ ', starting with a '"'.
This is the code I have so far for looping through the text document, which then retrieves the values I am interested in and appends them to a .csv file:

while read -r line
do
if [[ $line == gl_* ]] ; then
(param=$(cut -d'\' -f 1 $line) | def=$(cut -d'\' -f 2 $line) | type=$(cut -d'\' -f 4 $line) | prompt=$(cut -d'\' -f 8 $line))
while read -r glline
do
if [[ $glline == '"'$param* ]] ; then
val=$(cut -d'\' -f 3 $glline) | "$project";"$param";"$val";"$def";"$type";"$prompt" >> /filepath/file.csv
done < file.txt
done < file.txt

This seems to throw some syntax errors related to unexpected tokens near the first 'done' statement.
Example of text that needs to be parsed, and paired:

gl_one\User Defined\1\String\1\\1\Some Text
gl_two\User Defined\1\String\1\\1\Some Text also
gl_three\User Defined\1\Time\1\\1\Datetime now
some\junk
"gl_one\1\Value1
some\junk
"gl_two\1\Value2
"gl_three\1\Value3

So effectively, the while loop reads each line until it hits the first line that starts with 'gl_', which then stores that value (i.e. gl_one) as a variable 'param'. It then starts the nested while loop that looks for the line that starts with a ' " ' in front of the gl_, and is equivalent to the 'param' value. In other words, the script should couple the lines gl_one and "gl_one, gl_two and "gl_two, gl_three and "gl_three.
The text file is large, and these are settings that have been defined this way. I need to collect the values for each gl_ parameter, to save them together in a .csv file with their corresponding "gl_ values.
Wanted regex output stored in variables would be something like this:
first while loop: $param = gl_one, $def = User Defined, $type = String, $prompt = Some Text
second while loop: $val = Value1
Then it stores these variables to the file.csv, with semi-colon separators.
Currently, I have an error for the first 'done' statement, which seems to indicate an issue with the quotation marks. Apart from this, I am looking for general ideas and comments on the script. I.e., I am not entirely sure I am looking for the quotation mark parameters "gl_ correctly, or if the semi-colons as .csv separators are added correctly.
Edit: Overall, the script runs now, but extremely slowly due to the inner while loop. Is there any faster way to match the two lines together and add them to the .csv file? Any ideas and comments?
This will generate a file containing the data you want:

cat file.txt | grep gl_ | sed -E "s/\"//" | sort | sed '$!N;s/\n/\\/' | awk -F'\' '{print $1"; "$5"; "$7"; "$NF}' > /filepath/file.csv

It uses grep to extract all lines containing 'gl_', then sed to remove the leading '"' from the lines that contain one [I have assumed there are no further '"' in the line].
The lines are sorted.
sed then joins each pair of lines, replacing the newline between them with a backslash.
awk then prints the required columns according to your requirements.
The output is routed to the file.
LANG=C sort -t\\ -sd -k1,1 <file.txt |\
sed '
  /^gl_/{             # if definition
    N;                # append next line to buffer
    s/\n"gl_[^\\]*//; # if value, strip first column
    t;                # and start next loop
  }
  D;                  # otherwise, delete the line
' |\
awk -F\\ -v p="$project" -v OFS=\; '{print p,$1,$10,$2,$4,$8 }' \
>>/filepath/file.csv

sort lines so gl_... appears immediately before "gl_... (LANG fixes LC_CTYPE) - assumes the definition appears before the value
sed to help ensure matching definition and value (may still fail if a value is duplicated or missing), and to tidy up for awk
awk to pull out the relevant fields
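Since the edit to the question asks for something faster than the nested while loop, a single-pass awk approach is another option. This is only a sketch under a few assumptions not stated in the answers above: every definition line starts with gl_, every value line starts with "gl_, each definition appears before its value line, and $project is set in the calling shell:

awk -F'\\' -v proj="$project" -v OFS=';' '
  /^gl_/  { def[$1] = $2; typ[$1] = $4; prm[$1] = $8; next }    # remember definition fields
  /^"gl_/ { p = substr($1, 2)                                   # strip the leading quote
            if (p in def) print proj, p, $3, def[p], typ[p], prm[p] }
' file.txt >> /filepath/file.csv

It reads the file once, storing each definition in arrays keyed by the gl_ name, and prints a semicolon-separated row (project;param;val;def;type;prompt) as soon as the matching "gl_ value line is seen.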
Trying to create a script that counts the length of all the reads in a fastq file but getting no return
I am trying to count the length of each read in a fastq file from Illumina sequencing and output this to a tsv or any sort of file, so I can then later also look at this and count the number of reads per file. So I need to cycle down the file and extract each line that has a read on it (every 4th line), then get its length and store this as an output.

num=2
for file in *.fastq
do
    echo "counting $file"
    function file_length(){
        wc -l $file | awk '{print$FNR}'
    }
    for line in $file_length
    do
        awk 'NR==$num' $file | chrlen > ${file}read_length.tsv
        num=$((num + 4))
    done
done

Currently all I get is the "counting $file" output and no other output, but also no errors.
Your script contains a lot of errors in both syntax and algorithm. Please try shellcheck to see what the problems are. The biggest issue will be the $file_length part. You may want to call a function file_length() here, but it is just an undefined variable which is evaluated as null in the for loop.
If you just want to count the length of the 4th line of *.fastq files, please try something like:

for file in *.fastq; do
    awk 'NR==4 {print length}' "$file" > "${file}_length.tsv"
done

Or if you want to put the results together in a single tsv file, try:

tsvfile="read_length.tsv"
for file in *.fastq; do
    echo -n -e "$file\t" >> "$tsvfile"
    awk 'NR==4 {print length}' "$file" >> "$tsvfile"
done

Hope this helps.
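If the goal is in fact the length of every read (in a standard fastq file the sequence lines are lines 2, 6, 10, ...), a sketch along the same lines may be closer to the original intent; the NR % 4 == 2 test and the _read_lengths.tsv name are my own choices, not part of the answer above:

for file in *.fastq; do
    # the sequence is the 2nd line of each 4-line fastq record
    awk -v OFS='\t' 'NR % 4 == 2 {print FILENAME, NR, length($0)}' "$file" \
        > "${file}_read_lengths.tsv"
done

The number of lines in each output file then also gives the number of reads per input file.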
Replace some lines in fasta file with appended text using while loop and if/else statement
I am working with a fasta file and need to add line-specific text to each of the headers. So for example if my file is:

>TER1
AGCATGCTAGCTAGTCGACTCGATCGCATGCTC
>TER2
AGCATGCTAGCTAGACGACTCGATCGCATGCTC
>URC1
AGCATGCTAGCTAGTCGACTCGATCGCATGCTC
>URC2
AGCATGCTACCTAGTCGACTCGATCGCATGCTC
>UCR3
AGCATGCTAGCTAGTCGACTCGATGGCATGCTC

I want a while loop that will read through each line; for those with a > at the start, I want to append |population: plus the first three characters after the >. So line one would be:

>TER1|population:TER

etc. I can't figure out how to make this work. Here is my best attempt so far.

filename="testfasta.fa"
while read -r line
do
    if [[ "$line" == ">"* ]]; then
        id=$(cut -c2-4<<<"$line")
        printf $line"|population:"$id"\n" >>outfile
    else
        printf $line"\n">>outfile
    fi
done <"$filename"

This produces a file with the original headers and following line, each on a single line. Can someone tell me where I'm going wrong? My if and else loop aren't working at all! Thanks!
You could use a while loop if you really want, but sed would be simpler:

sed -e 's/^>\(...\).*/&|population:\1/' "$filename"

That is, for lines starting with > (pattern ^>), capture the next 3 characters (with \(...\)) and match the rest of the line (.*), then replace with the line as it was (&), the fixed string |population:, and finally the captured 3 characters (\1).
This will produce for your input:

>TER1|population:TER
AGCATGCTAGCTAGTCGACTCGATCGCATGCTC
>TER2|population:TER
AGCATGCTAGCTAGACGACTCGATCGCATGCTC
>URC1|population:URC
AGCATGCTAGCTAGTCGACTCGATCGCATGCTC
>URC2|population:URC
AGCATGCTACCTAGTCGACTCGATCGCATGCTC
>UCR3|population:UCR
AGCATGCTAGCTAGTCGACTCGATGGCATGCTC

Or you can use this awk, also producing the same output:

awk '{sub(/^>.*/, $0 "|population:" substr($0, 2, 3))}1' "$filename"
You can do this quickly in awk:

awk '$1~/^>/{$1=$1"|population:"substr($1,2,3)}{}1' infile.txt > outfile.txt

$ awk '$1~/^>/{$1=$1"|population:"substr($1,2,3)}{}1' testfile
>TER1|population:TER
AGCATGCTAGCTAGTCGACTCGATCGCATGCTC
>TER2|population:TER
AGCATGCTAGCTAGACGACTCGATCGCATGCTC
>URC1|population:URC
AGCATGCTAGCTAGTCGACTCGATCGCATGCTC
>URC2|population:URC
AGCATGCTACCTAGTCGACTCGATCGCATGCTC
>UCR3|population:UCR
AGCATGCTAGCTAGTCGACTCGATGGCATGCTC

Here awk will:
Test if the record starts with a >. The $1 looks at the first field, but $0 for the entire record would work just as well in this case. The ~ performs a regex test, and ^> means "starts with >", making the test: ($1~/^>/)
If so, set the first field to the output you are looking for (using substr() to get the bits of the string you want): {$1=$1"|population:"substr($1,2,3)}
Finally, print out the entire record (with the changes if applicable): {}1, which is shorthand for {print $0}, or... print the entire record.
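For reference, the original while loop from the question can also be made to work with careful quoting and a fixed printf format string. This is just a sketch, not taken from either answer above; it keeps the testfasta.fa and outfile names from the question:

filename="testfasta.fa"
while IFS= read -r line; do
    if [[ "$line" == ">"* ]]; then
        id=${line:1:3}                     # first three characters after '>'
        printf '%s|population:%s\n' "$line" "$id"
    else
        printf '%s\n' "$line"
    fi
done < "$filename" > outfile

The main fixes are quoting "$line" and "$id" and passing them as arguments to printf instead of letting them be expanded inside the format string.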
Extracting value from a flat file using shell script
I'm trying to extract the value present between brackets in the last row of a flat file, e.g. " last_line (4) ". This is the last line and I want to extract 4 and store it in a variable. I have extracted the last row using the tail command but now I am unable to extract the value between the brackets. Kindly help.
Using awk:

$ cat input
first line
2nd line
last line (4) with some data
$ awk -F'[()]' 'END{print $2}' input
4
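To actually store it in a variable, as the question asks, command substitution around that one-liner works (a sketch):

value=$(awk -F'[()]' 'END{print $2}' input)
echo "$value"    # prints 4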
l=$(tail -n1 filename); tmp=${l##*(}; tmp=${tmp%)*}; printf "tmp: %s\n" $tmp

Output

tmp: 4

Written in script format, you are using substring removal to trim everything up to the last ( and everything from the last ) onward from the last line, leaving only 4:

l=$(tail -n1 filename)    ## get the last line
tmp=${l##*(}              ## trim to ( from left
tmp=${tmp%)*}             ## trim to ) from right
printf "tmp: %s\n" $tmp
sed: sed -n '${s/.*(//;s/).*//;p}' file
You can use this script. In this script I saved the last line in a tmp file and removed it at the end. The number between the brackets () ends up in the variable WORD.

#!/bin/ksh
if test "${DEBUG}" = "Y"
then
    set -vx
fi
tail -1 input >> tmp
WORD=`sed -n 's/.*(//;s/).*//;p' tmp`
echo $WORD
rm tmp
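The temporary file is not strictly necessary; the same value can be captured by piping tail straight into sed (a sketch):

WORD=$(tail -1 input | sed 's/.*(//;s/).*//')
echo "$WORD"    # prints 4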
Use awk to parse source code
I'm looking to create documentation from source code that I have. I've been looking around and something like awk seems like it will work, but I've had no luck so far. The information is split in two files, file1.c and file2.c.
Note: I've set up an automatic build environment for the program. This detects changes in the source and builds it. I would like to generate a text file containing a list of any variables which have been modified since the last successful build. The script I'm looking for would be a post-build step, and would run after compilation.
In file1.c I have a list of function calls (all the same function) that have a string name to identify them, such as:

newFunction("THIS_IS_THE_STRING_I_WANT", otherVariables, 0, &iAlsoNeedThis);
newFunction("I_WANT_THIS_STRING_TOO", otherVariable, 0, &iAnotherOneINeed);
etc...

The fourth parameter in the function call contains the value of the string name in file2. For example:

iAlsoNeedThis = 25;
iAnotherOneINeed = 42;
etc...

I'm looking to output the list to a txt file in the following format:

THIS_IS_THE_STRING_I_WANT = 25
I_WANT_THIS_STRING_TOO = 42

Is there any way of doing this? Thanks
Here is a start:

NR==FNR {                   # Only true when we are reading the first file
    split($1,s,"\"")        # Get the string in quotes from the first field
    gsub(/[^a-zA-Z]/,"",$4) # Remove the non-alpha chars from the fourth field
    m[$4]=s[2]              # Create array
    next
}
$1 in m {                   # Match field four from file1 with field one from file2
    sub(/;/,"")             # Get rid of the ;
    print m[$1],$2,$3       # Print output
}

Saving this as script.awk and running it with your example produces:

$ awk -f script.awk file1 file2
THIS_IS_THE_STRING_I_WANT = 25
I_WANT_THIS_STRING_TOO = 42

Edit: The modifications you require affect the first line of the script:

NR==FNR && $3=="0," && /start here/,/end here/ {
You can do it in the shell like so.

#!/bin/sh
eval $(sed 's/[^a-zA-Z0-9=]//g' file2)
while read -r line; do
    case $line in
    (newFunction*)
        set -- $line
        string=${1#*\"}
        string=${string%%\"*}
        while test $# -gt 1; do shift; done
        x=${1#&}
        x=${x%);}
        eval x=\$$x
        printf '%s = %s\n' $string $x
    esac
done < file1.c

Assumptions:
newFunction is at the start of the line.
Nothing follows the );.
Whitespace exactly as in your samples.

Output

THIS_IS_THE_STRING_I_WANT = 25
I_WANT_THIS_STRING_TOO = 42
You can source file2.c so the variables will be defined in bash. Then you will just have to print $iAlsoNeedThis to get the value from iAlsoNeedThis = 25;. It can be done with . file2.c.
Then, what you can do is:

while read line; do
    name=$(echo $line | cut -d"\"" -f2);
    value=$(echo $line | cut -d"&" -f2 | cut -d")" -f1);
    echo $name = ${!value};
done < file1.c

to get the THIS_IS_THE_STRING_I_WANT, I_WANT_THIS_STRING_TOO text.