I have recently finished running a featureCounts script on a FASTA file using a simple annotation file (SAF), which resulted in a table with a row for each feature; the columns show its location, length, and the number of reads from all samples. I would like to calculate the FPKM values for all features in each and every one of the samples. Is there a script or program available which does that?
You can use the countToFPKM package (https://github.com/AAlhendi1707/countToFPKM) to calculate exactly this.
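If you would rather compute it yourself, the definition is FPKM = reads * 10^9 / (library size * feature length in bp). Below is a minimal Python/pandas sketch under a few assumptions: counts.tsv is a hypothetical file name, the table uses the usual featureCounts layout (Geneid, Chr, Start, End, Strand, Length, then one column per sample), and the per-sample column sum is taken as the library size (some pipelines use total mapped reads instead):

import pandas as pd

# featureCounts output: a comment line first, then a header row with
# Geneid, Chr, Start, End, Strand, Length and one column per sample.
counts = pd.read_csv("counts.tsv", sep="\t", comment="#")

lengths_kb = counts["Length"] / 1_000        # feature length in kilobases
sample_cols = counts.columns[6:]             # everything after the annotation columns

fpkm = counts[["Geneid"]].copy()
for sample in sample_cols:
    per_million = counts[sample].sum() / 1_000_000   # library size in millions of reads
    fpkm[sample] = counts[sample] / per_million / lengths_kb

fpkm.to_csv("fpkm.tsv", sep="\t", index=False)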
I would like to restructure my long-format SPSS file so I can clean it and get a better overview. However, I run into some problems.
How can I create a new variable counting the completion moments/waves/follow-up moments? I only have a completion date available in my dataset. Please open my image for a fuller explanation.
Preferably a numbering that continues counting if a year is missing.
If I understand right, you want the new variable to be an index of the year of involvement for each patient, as opposed to an index of the data row per patient. To do this, we can calculate for each entry the difference in years between that entry and the patient's first entry:
(this assumes your dates are in date format)
compute year=xdate.year(OpenInvulMomenten).
aggregate /outfile=* mode=addvariables /break=PatientIdPseudo /firstYear=min(year).
compute newvar=1+year-firstYear.
exe.
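If it helps to see the same calculation outside SPSS, here is the equivalent logic as a Python/pandas sketch with made-up example data (the variable names mirror the syntax above):

import pandas as pd

# Made-up long-format data: one row per completed wave per patient.
df = pd.DataFrame({
    "PatientIdPseudo": [1, 1, 1, 2, 2],
    "OpenInvulMomenten": pd.to_datetime(
        ["2015-03-01", "2016-04-11", "2018-02-20", "2015-06-05", "2016-07-01"]),
})

year = df["OpenInvulMomenten"].dt.year
first_year = year.groupby(df["PatientIdPseudo"]).transform("min")
df["wave"] = 1 + year - first_year   # patient 1 gets 1, 2, 4: the missing 2017 leaves a gap
print(df)

As requested, the numbering keeps counting across a missing year because it is based on the year difference, not on the row position.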
I have an Excel file which is in a non-system-generated report format.
I wish to calculate and generate another new output from it.
Given the report format as below:
1) When loading this Excel file in the query, how can I create a new column that copies the first found value (1#51) down into the next records while they are empty, and then, once a new value (1#261) is detected, copies that one down into the subsequent null records, and so on until the end?
2) The final aim is to generate a new output that automatically matches/calculates the money to be assigned to the different references, as shown below:
References A ~ E share the 3 bank refs (28269, 28542 & RMP). I was thinking of reading the same data source a few times: the first time to read columns A ~ O (QueryRef), and the second time to read columns A, Q ~ V (QueryBank).
After this I have no idea how I can allocate the $$ from QueryBank to QueryRef based on the Sum of Total AR.
E.g.,
The total amount of BankRef 28269, $57,044.67, is sufficient to cover Ref#A $10,947.12.
BankRef 28269 is still sufficient to cover Ref#B $27,647.60.
BankRef 28269 then has only $18,449.95 left, hence the balance of 28269 is allocated to Ref#C.
The remaining balance of Ref#C will need BankRef 28542 to cover it, i.e. $1,812.29.
Ref#D will then be allocated the remaining balance of BankRef 28542, i.e. $4,595.32.
Ref#D still has $13,350.03 unallocated, hence this will use BankRef#RMP.
Ref#E only needs $597.66, and BankRef#RMP is sufficient to cover this.
I am not sure whether my case study above can be solved using Power Query, as I am still a newbie at Power Query. Or is this too complicated to handle, so that we would need to write a program to auto-match this kind of scenario?
Attached are the sample source file and output:
https://www.dropbox.com/sh/dyecwcdz2qg549y/AACzezsXBwAf8eHUNxxLD1eWa?dl=0
Any advice/opinion/guidance is very much appreciated.
Answering question one:
Power Query has a feature called Fill Down (and Fill Up): Transform > Fill > Down in the ribbon, or Table.FillDown in M.
For a selected column it copies the last non-empty value into the empty rows below it, until the next non-empty value is found, and so on.
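For question 2, the matching you describe is a simple greedy allocation: walk through the references in order and keep drawing from the current bank ref until it is used up, then move on to the next one. It could be built in Power Query (e.g. with List.Accumulate), but the logic is easier to see as a short sketch; here it is in Python, with the figures taken or derived from the worked example above (the RMP total is not given, so a value that just covers the remainder is assumed):

# Greedy allocation sketch. Ref totals and the first two bank totals follow from
# the worked example; the RMP total is assumed. For real money you would
# probably use decimal.Decimal instead of floats.
refs = [("A", 10947.12), ("B", 27647.60), ("C", 20262.24),
        ("D", 17945.35), ("E", 597.66)]                               # (Ref, Total AR)
banks = [("28269", 57044.67), ("28542", 6407.61), ("RMP", 13947.69)]  # (BankRef, Amt)

allocations = []                      # rows of (Ref, BankRef, amount allocated)
bank_iter = iter(banks)
bank_ref, bank_left = next(bank_iter)

for ref, needed in refs:
    while needed > 0.005:             # keep drawing until this Ref is covered (to the cent)
        if bank_left <= 0.005:        # current bank ref exhausted: move to the next one
            bank_ref, bank_left = next(bank_iter)
        take = min(needed, bank_left)
        allocations.append((ref, bank_ref, round(take, 2)))
        needed -= take
        bank_left -= take

for row in allocations:
    print(row)

The result is one row per (Ref, BankRef) pair with the amount drawn, which matches the allocation walked through in the example.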
Requirement: I need to sort an input file based on Date.
The date is in YYYYMMDD format starting at the 56th position in the flat file.
Now I am trying to write a sort card which writes all the records that have a date (YYYYMMDD) within the past 7 days.
Example: if my job runs on 20181007, it should fetch all the records that have a date between 20181001 and 20181007.
Thanks in advance.
In terms of DFSORT, you can use the following filter, which expresses the cutoff as a value relative to the current date. For instance:
OUTFIL INCLUDE=(56,8,CH,GE,DATE1-7)
Dates can be defined in various formats. I assume that, since you are referring to a flat file, the date is in character format and not zoned decimal or some other representation.
For DFSORT, here is a reference to the INCLUDE statement.
Similar constructs exist for other sort products. Without specifics about the product you're using, this is unfortunately a generic answer.
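For what it is worth, the selection logic itself is easy to restate off-host. The Python sketch below only illustrates what the INCLUDE condition does (the file names are made up and it assumes one record per line); it is not a substitute for the sort card:

from datetime import date, timedelta

# Keep records whose YYYYMMDD date, starting at (1-based) position 56,
# is on or after today minus 7 days, mirroring GE,DATE1-7.
cutoff = (date.today() - timedelta(days=7)).strftime("%Y%m%d")

with open("input.dat") as src, open("filtered.dat", "w") as dst:
    for record in src:
        record_date = record[55:63]      # positions 56-63 as a zero-based slice
        if record_date >= cutoff:        # string comparison works for YYYYMMDD
            dst.write(record)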
I am calculating the sum of all sales orders (by multiplying the quantity and price of each sales order, assuming one sales order has only one item, and using the SUM function) in a SQL query, and I am spooling the output to a CSV file by using spool C:\scripts\output.csv.
The numeric output I get is truncated/rounded; e.g. the SQL output 122393446 shows up in the CSV as 122400000.
I tried to google and search on stackoverflow, but I could not get any hints about what can be done to prevent this.
Any clues?
Thanks
I think it is an .xls issue.
Save as .xls.
Then format the column as a number with 2 decimals, for example.
Initially I thought it might have something to do with the width of the numeric format, which is normally 10 (NUMWIDTH) in SQL*Plus, but your result is only 9 digits wide, so that cannot be the problem. Please check whether your query uses a numeric type that doesn't have the required precision and thus makes inexact calculations.
I have a transaction log file in CSV format that I want to use to run statistics. The log has the following fields:
date: Time/date stamp
salesperson: The username of the person who closed the sale
promo: sum total of items in the sale that were promotions.
amount: grand total of the sale
I'd like to get the following statistics:
salesperson: The username of the salesperson being analyzed.
minAmount: The smallest grand total of this salesperson's transaction.
avgAmount: The mean grand total..
maxAmount: The largest grand total..
minPromo: The smallest promo amount by the salesperson.
avgPromo: The mean promo amount...
I'm tempted to build a database structure, import this file, write SQL, and pull out the stats. I don't need anything more from this data than these stats. Is there an easier way? I'm hoping some bash script could make this easy.
TxtSushi does this:
tssql -table trans transactions.csv \
'select
salesperson,
min(as_real(amount)) as minAmount,
avg(as_real(amount)) as avgAmount,
max(as_real(amount)) as maxAmount,
min(as_real(promo)) as minPromo,
avg(as_real(promo)) as avgPromo
from trans
group by salesperson'
I have a bunch of example scripts showing how to use it.
Edit: fixed syntax
Could also bang out an awk script to do it. It's only CSV with a few variables.
You can loop through the lines in the CSV and use bash script variables to hold your min/max amounts. For the average, just keep a running total and then divide by the total number of lines (not counting a possible header).
Here are some useful snippets for working with CSV files in bash.
If your data might be quoted (e.g. because a field contains a comma), processing with bash, sed, etc. becomes much more complex.
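As noted, quoting is exactly where plain bash or awk splitting on commas becomes painful. The same per-salesperson statistics are straightforward with Python's csv module, which handles quoted fields; here is a sketch assuming a header row with the column names from the question:

import csv
from collections import defaultdict

# Accumulate amount/promo values per salesperson; csv.DictReader copes with
# quoted fields that naive splitting on commas would mangle.
amounts = defaultdict(list)
promos = defaultdict(list)

with open("transactions.csv", newline="") as fh:
    for row in csv.DictReader(fh):
        amounts[row["salesperson"]].append(float(row["amount"]))
        promos[row["salesperson"]].append(float(row["promo"]))

print("salesperson,minAmount,avgAmount,maxAmount,minPromo,avgPromo")
for person in sorted(amounts):
    a, p = amounts[person], promos[person]
    print(f"{person},{min(a)},{sum(a)/len(a)},{max(a)},{min(p)},{sum(p)/len(p)}")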