For loop with two variables using zip, still too many values to unpack - python-2.x

I have two lists, one of which is a set of row numbers and the other of which is a set of .csv file paths. I need to create a loop which will create new dataframes for each file path with the row number I specify.
I have a script which creates the new dataframes; however, I am stuck on the loop.
I have tried using zip for each variable, but I still get "too many values to unpack".
MYLIST = [42, 50, 52, 59, 60, 62]
ID = '/Users/uni/Desktop/corrindex+id/rt35'
for X, Y in zip(ID, MYLIST):
    df = pd.read_csv(X,
                     index_col=False,
                     header=None,
                     nrows=max(MYLIST) + 1,
                     engine='python').loc[Y]
ValueError: too many values to unpack
This is the error I'm receiving, and I'm at a loss as to why.
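For what it's worth, zip(ID, MYLIST) as written pairs each character of the single path string with a row number, because ID is a string rather than a list of paths. A minimal sketch of the intended loop, assuming ID is a list of file paths (the file names below are made up for illustration):
import pandas as pd

MYLIST = [42, 50, 52, 59, 60, 62]
# one path per row number; these file names are hypothetical
ID = ['/Users/uni/Desktop/corrindex+id/rt35/a.csv',
      '/Users/uni/Desktop/corrindex+id/rt35/b.csv']

frames = []
for X, Y in zip(ID, MYLIST):
    # read enough rows to cover the largest requested index, then keep row Y
    row = pd.read_csv(X, index_col=False, header=None,
                      nrows=max(MYLIST) + 1, engine='python').loc[Y]
    frames.append(row)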

Related

"Different row counts implied by arguments" in attempt to plot BAM file data

I'm attempting to use this tutorial to manipulate and plot ATAC-sequencing data. I have all the libraries listed in that tutorial installed and loaded, except that where it uses biocLite(BSgenome.Hsapiens.UCSC.hg19) for the human genome, I'm using biocLite(TxDb.Mmusculus.UCSC.mm10.knownGene) for the mouse genome.
Here I have loaded in my BAM file
sorted_AL1.1BAM <-"Sorted_1_S1_L001_R1_001.fastq.gz.subread.bam"
And created an object called TSSs, which contains transcription start site regions from the mouse genome. Ultimately, I want to plot the average signal in my read data across mouse transcription start sites.
TSSs <- resize(genes(TxDb.Mmusculus.UCSC.mm10.knownGene), fix = "start", 1)
The problem occurs with the following code:
nucFree <- regionPlot(bamFile = sorted_AL1.1BAM, testRanges = TSSs, style = "point",
format = "bam", paired = TRUE, minFragmentLength = 0, maxFragmentLength = 100,
forceFragment = 50)
The error is as follows:
Reading Bam header information.....Done
Filtering regions which extend outside of genome boundaries.....Done
Filtered 24528 of 24528 regions
Splitting regions by Watson and Crick strand..Error in DataFrame(..., check.names = FALSE) :
different row counts implied by arguments
I assume my BAM file contains empty values that need to be changed to NAs. My issue is that I'm not sure how to visualize and manipulate BAM files in R in order to do this. Any help would be appreciated.
I tried the following:
data.frame(sorted_AL1.1BAM)
sorted_AL1.1BAM[sorted_AL1.1BAM == ''] <- NA
I expected this to resolve the issue of different row counts, but I get the same error message.

Lua - Create a nested table using for loop

I'm very new to Lua, so I'm happy to read material if it will help with tables.
I've decoded a json object and would like to build a table properly using its data, rather than writing 64 lines of the below:
a = {}
a[decode.var1[1].aId] = {decode.var2[1].bId, decode.var3[1].cId}
a[decode.var1[2].aId] = {decode.var2[2].bId, decode.var3[2].cId}
a[decode.var1[3].aId] = {decode.var2[3].bId, decode.var3[3].cId}
...etc
Because the numbers are consecutive 1-64, I presume I should be able to build it using a for loop.
Unfortunately despite going through table building ideas I cannot seem to find a way to do it, or find anything on creating nested tables using a loop.
Any help or direction would be appreciated.
Lua for-loops are, at least in my opinion, pretty easy to understand:
for i = 1, 10 do
  print(i)
end
This loop inclusively prints the positive integers 1 through 10.
Lua for-loops also take an optional third argument--which defaults to 1--that indicates the step of the loop:
for i = 1, 10, 2 do
  print(i)
end
This loop prints the numbers 1 through 10 but skips every other number, that is, it has a step of 2; therefore, it will print 1 3 5 7 9.
In the case of your example, if I understand it correctly, it seems that you know the minimum and maximum bounds of your for loops, which are 1 and 64, respectively. You could write a loop to decode the values and put them in a table like so:
local a = {}
for i = 1, 64 do
  a[decode.var1[i].aId] = {decode.var2[i].bId, decode.var3[i].cId}
end
What you can do is generate a new table with all the contents of the decoded JSON using a for loop.
For example,
function jsonParse(jsonObj)
  local tbl = {}
  for i = 1, 64 do
    tbl[jsonObj.var1[i].aId] = {jsonObj.var2[i].bId, jsonObj.var3[i].cId}
  end
  return tbl
end
To deal with nested cases, you can recursively call that method as follows
function jsonParse(jsonObj)
  local tbl = {}
  for i = 1, 64 do
    tbl[jsonObj.var1[i].aId] = {jsonObj.var2[i].bId, jsonObj.var3[i].cId}
    if type(jsonObj.var2[i].bId) == "table" then
      tbl[jsonObj.var1[i].aId][1] = jsonParse(jsonObj.var2[i].bId)
    end
  end
  return tbl
end
By the way, I can't see why you are trying to build a table from a table that already holds the data you want. I assume the field names here are just placeholders, so you may have to adapt the code to the actual structure of the decoded variable you have.

IDL: reading multiple DICOM images and saving them in a .dat file

I'm writing a program in IDL to read DICOM images, then store them in a big matrix and finally save them in .dat file. The DICOMs are under the name IM0,IM1,IM2,..IM21777. I wrote the code below but I am getting an error. I am using IDL version 6.4.
files = file_search('E:\SE7\IM*)
n_files = n_elements(files)
full_data = fltarr(256,256,n_files)
for i=0L, n_files-1 do begin
full_data[*,*,i] = read_dicom('E:\SE7\IM')
endfor
path = 'E:\'
open, 1, path + "full_data.dat'
writeu, 1, full_data
close, 1
I am not sure how to loop over the DICOM name i.e. IM0, IM1,IM2 etc
After I store them in the big matrix (i.e. full_data = [256,256,2178]), I would like to make the 3D matrix 4D. Is that possible? I would like it to have the dimensions [256, 256, 22, 99], i.e. splitting the 2178 slices into 22 × 99.
I'm not sure what error you are getting, but you are missing a quotation mark in the first line. It should be:
files = file_search('E:\SE7\IM*')
To loop over the DICOM name, you can string concatenate the loop index using + and STRTRIM() as follows:
for i=0L, n_files-1 do begin
  full_data[*,*,i] = read_dicom('E:\SE7\IM'+STRTRIM(i,2))
endfor
Finally, to reshape your (256,256,2178) matrix into a (256,256,22,99) matrix without changing any values, use REFORM:
final_data = REFORM(full_data, 256, 256, 22, 99)
Depending on the way you want the dimensions arranged, you may need additional operations. This post is a great primer on how to manipulate arrays and their dimensionality: Coyote Dimensional Juggling Tutorial.

In a CSV file, how can a Python coder remove all but an X number of duplicates across rows?

Here is an example CSV file for this problem:
Jack,6
Sam,10
Milo,9
Jacqueline,7
Sam,5
Sam,8
Sam,10
Let's take the context to be the names and scores of a quiz these people took. We can see that Sam has taken this quiz 4 times, but I want to keep only X of the same person's results (and they also need to be the most recent entries). Let's assume we wanted no more than 3 of the same person's results.
I realised it probably wouldn't be possible to keep no more than 3 of each person's results without some extra information. Here is the updated CSV file:
Jack,6,1793
Sam,10,2079
Milo,9,2132
Jacqueline,7,2590
Sam,5,2881
Sam,8,3001
Sam,10,3013
The third column is essentially the number of seconds from the "Epoch", which is a reference point for time. With this, I thought I could simply sort the file from lowest to highest on the epoch column and use set() to remove all but a certain number of duplicates in the name column, removing the corresponding scores as well.
In theory, this should leave me with the 3 most recent results per person but in practice, I have no idea how I could adapt the set() function to do this unless there is some alternative way. So my question is, what possible methods are there to achieve this?
You could use a defaultdict of lists, and each time you add an entry check the length of the list: if it has more than three items, pop the first one off (or do the check after cycling through the file). This assumes the file is in time sequence.
from collections import defaultdict

# looping over a csv file gives one row at a time
# so we will emulate that
raw_data = [
    ('Jack', '6'),
    ('Sam', '10'),
    ('Milo', '9'),
    ('Jacqueline', '7'),
    ('Sam', '5'),
    ('Sam', '8'),
    ('Sam', '10'),
]

# this will hold our information, and works by providing an empty
# list for any missing key
student_data = defaultdict(list)

for row in raw_data:  # note 1
    # separate the row into its component items, and convert
    # score from str to int
    name, score = row
    score = int(score)
    # get the current list for the student, or a brand-new list
    student = student_data[name]
    student.append(score)
    # after adding the score to the end, remove the first scores
    # until we have no more than three items in the list
    if len(student) > 3:
        student.pop(0)

# print the items for debugging
for item in student_data.items():
    print(item)
which results in:
('Milo', [9])
('Jack', [6])
('Sam', [5, 8, 10])
('Jacqueline', [7])
Note 1: to use an actual csv file you want code like this:
import csv

raw_file = open('some_file.csv')
csv_file = csv.reader(raw_file)
for row in csv_file:
    ...
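Putting the note and the loop above together, a complete version might look like this (a sketch assuming the file is called some_file.csv and holds the same two name,score columns as the sample data, in time order):
import csv
from collections import defaultdict

student_data = defaultdict(list)

with open('some_file.csv') as raw_file:
    for name, score in csv.reader(raw_file):
        student = student_data[name]
        student.append(int(score))
        # keep only the three most recently seen scores
        if len(student) > 3:
            student.pop(0)

for item in student_data.items():
    print(item)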
To handle the timestamps, and as an alternative, you could use itertools.groupby:
from itertools import groupby, islice
from operator import itemgetter

raw_data = [
    ('Jack', '6', '1793'),
    ('Sam', '10', '2079'),
    ('Milo', '9', '2132'),
    ('Jacqueline', '7', '2590'),
    ('Sam', '5', '2881'),
    ('Sam', '8', '3001'),
    ('Sam', '10', '3013'),
]

# Sort by name in natural order, then by timestamp from highest to lowest
sorted_data = sorted(raw_data, key=lambda x: (x[0], -int(x[2])))

# Group by user
grouped = groupby(sorted_data, key=itemgetter(0))

# And keep only the three most recent values for each user
most_recent = [(k, [v for _, v, _ in islice(grp, 3)]) for k, grp in grouped]
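With the sample raw_data above, most_recent should come out as follows (the scores stay as strings here, since the code never converts them):
[('Jack', ['6']),
 ('Jacqueline', ['7']),
 ('Milo', ['9']),
 ('Sam', ['10', '8', '5'])]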

How do I skip certain columns when parsing a text file with Ruby?

I have to parse a tab-delimited text file with Ruby to extract some data from it. For some unknown reason some columns aren't used and are just essentially spacers; I'd like to ignore these columns since I don't need their output (however I can't just ignore all empty columns since some legitimate columns have empty values). I know the indices of these columns already (e.g. columns 6, 14, 24 and 38).
While I could just add a conditional while I'm parsing the file and say parse this unless it's one of those columns, this doesn't seem very "Rubyish" - is there a better and more elegant way to handle this? RegExps, perhaps? I thought of doing something like [6, 14, 24, 38].each { |x| columns.delete_at(x) } to remove the unused columns, but this will force me to redetermine the indices of the columns which I actually need. What I'd really like to do is just loop through the whole thing, checking the index of the current column and ignoring it if it's one of the "bad" ones. However, it seems very ugly to have code like unless x == 6 || x == 14 || x == 24 || x == 38.
No need for a massive conditional like that.
bad_cols = [6, 14, 24, 38]
columns.each_with_index do |val, idx|
  next if bad_cols.include? idx
  # process the data
end
