How to create dataframe from ordered dictionary? - ordereddict

I have an ordered dictionary which has 4 keys and multiple values. I tried to create the dataframe like this
df = pd.DataFrame(items, index=[0])
print('\ndf is ',df)
But this triggers ValueError, as the multiple values from the dictionary don't match.
The ordered dictionary is below:
OrderedDict([('Product', 'DASXZSDASXZS'), ('Region', ['A', 'B', 'C']), ('Items', ['1', '2', '3']), ('Order', ['123', '456', '789'])])
I want the dataframe format to be like:
Product Region Items Order
DASXZSDASXZS A 1 123
DASXZSDASXZS B 2 456
...
How can I achieve this format for the dataframe?

Not enough rep to comment. Why do you try to specify index=[0]?
Simply doing
df = pd.DataFrame(items)
works; if you want to change the index, you can set it later with df.set_index(...)

#viktor_dmitry your comment to #Battleman links to external data, here's a solution.
In https://www.codepile.net/pile/GY336DYN you have a list of OrderedDict entries, in the example above you just had 1 OrderedDict. Each needs to be treated as a separate DataFrame construction. From the resulting list you use concat to get a final DataFrame:
ods = [OrderedDict([('MaterialNumber', '2XV9450-1AR24'), ('ForCountry'...]),
OrderedDict([('MaterialNumber', ...),
...]
new_df = pd.concat([pd.DataFrame(od) for od in ods])
# new_df has 4 columns and many rows
Note also that 1 of your example items is invalid, you'd need to filter this out, the rest appear to be fine:
ods[21]
OrderedDict([('MaterialNumber', '4MC9672')]) # lacks the rest of the columns!

Related

Simpler alternative to simultaneously Sort and Filter by column in Google Spreadsheets

I have a spreadsheet (here's a copy) with the following (headered) columns:
A: Indices for a list of groceries;
B: Names for the groceries to be indexed by column A;
C: Check column with "x" for inactive items in column B, empty otherwise;
D: Sorting indices that I want to apply to column B;
Currently, I am getting the sorted AND filtered result with this formula:
=SORT(FILTER(B2:B; C2:C = ""); FILTER(D2:D; C2:C = ""); TRUE)
The problem is that I need to apply the filter two times: one for the items and one for the indices, otherwise I get a mismatch between elements for the Sort function.
I feel that this doesn't scale well since it creates duplication.
Is there a way to get the same results with a simpler formula or another arrangement of columns?
=SORT(FILTER({Itens!B2:B\Itens!G2:G}; Itens!D2:D=""))
=SORT(FILTER({Itens!B2:B\Itens!G2:G}; Itens!D2:D="");2;1)
or maybe: =SORT(FILTER(Itens!B2:B; Itens!D2:D="");2;1)

pymysql-How to return the results of a query in the form of a list of tuples

Lets say I have this code:
sql_query="select actor.actor_id from actor where actor='%s'"
cursor.execute=(sql_query,(actorID))
result=cursor.fetchall()
return(result)
What should I do to my code so the results are in the form of a list of tuples?Also I want the first tuple to be the name of the columns of my query.
For example: [(“Name”, “Id”,),(“Jim”,7,),(“Tom”,13,)]
Here is a sample example:
cur.execute('''SELECT * FROM patient_login''')
results = cur.fetchall()
nested_tuple_list = []
for result in results:
nested_tuple_list.append(result)
print(nested_tuple_list)
As you can see, we enter our select statement with cur.execute(). We fetch all of the results and store them in the variable results. We run a for loop, and each result will be a tuple of a result in our DB. We then append them to the end of the list. When we print the results, here is the output:
[(4, 'sikudabo', 'monkey1'), (83, 'sikudabo2', 'monkey2')]
We end up with a list of tuples.
Here is another answer that simple grabs the column names and stores them in a nested tuple:
cur.execute('''DESCRIBE patient_login''')
results = cur.fetchall()
nested_tuple_list = []
nested_tuple_list_2 = []
for result in results:
result = ((result[0]))
nested_tuple_list.append(result)
nested_tuple_list = tuple(nested_tuple_list)
nested_tuple_list_2.append(nested_tuple_list)
print(nested_tuple_list_2)
The describe command in SQL will Describe the table by telling you which columns exist within the table, and various characteristics of the table such as primary key, datatype ect. This command will return a tuple of the described data. Here we search each result and grab the first index in the results for each of the nested tuples. The first index corresponds with the column name in the for loop. We can append this to the first empty list. After we append all of the column names, we change the list to a tuple. We then append that tuple to the list and have all of the column names in the list within a tuple. Here is the output:
[('ID', 'username', 'password')]
If you want each element in the list to be a tuple in it of itself, here is the code:
nested_tuple_list = tuple(nested_tuple_list)
nested_tuple_list_2 = [(x,) for x in nested_tuple_list]
print(nested_tuple_list_2)
We can do a list comprehension with x representing each column, and here is our output for each of my columns in the DataFrame:
[('ID',), ('username',), ('password',)]

Pig latin join by field

I have a Pig latin related problem:
I have this data below (in one row):
A = LOAD 'records' AS (f1:chararray, f2:chararray,f3:chararray, f4:chararray,f5:chararray, f6:chararray);
DUMP A;
(FITKA,FINVA,FINVU,FEEVA,FETKA,FINVA)
Now I have another dataset:
B = LOAD 'values' AS (f1:chararray, f2:chararray);
Dump B;
(FINVA,0.454535)
(FITKA,0.124411)
(FEEVA,0.123133)
And I would like to get those two dataset joined. I would get corresponding value from dataset B and place that value beside the value from dataset A. So expected output is below:
FITKA 0.123133, FINVA 0.454535 and so on ..
(They can also be like: FITKA, 0.123133, FINVA, 0.454535 and so on .. )
And then I would be able to multiply values (0.123133 x 0.454535 .. and so on) because they are on the same row now and this is what I want.
Of course I can join column by column but then values appear "end of row" and then I can clean it by using another foreach generate. But, I want some simpler solution without too many joins which may cause performance issues.
Dataset A is text (Sentence in one way..).
So what are my options to achieve this?
Any help would be nice.
A sentence can be represented as a tuple and contains a bag of tuples (word, count).
Therefore, I suggest you change the way you store your data to the following format:
sentence:tuple(words:bag{wordcount:tuple(word, count)})

Sorting a pandas DataFrame by the order of a list

So I have a pandas DataFrame, df, with columns that represent taxonomical classification (i.e. Kingdom, Phylum, Class etc...) I also have a list of taxonomic labels that correspond to the order I would like the DataFrame to be ordered by.
The list looks something like this:
class_list=['Gammaproteobacteria', 'Bacteroidetes', 'Negativicutes', 'Clostridia', 'Bacilli', 'Actinobacteria', 'Betaproteobacteria', 'delta/epsilon subdivisions', 'Synergistia', 'Mollicutes', 'Nitrospira', 'Spirochaetia', 'Thermotogae', 'Aquificae', 'Fimbriimonas', 'Gemmatimonadetes', 'Dehalococcoidia', 'Oscillatoriophycideae', 'Chlamydiae', 'Nostocales', 'Thermodesulfobacteria', 'Erysipelotrichia', 'Chlorobi', 'Deinococci']
This list would correspond to the Dataframe column df['Class']. I would like to sort all the rows for the whole dataframe based on the order of the list as df['Class'] is in a different order currently. What would be the best way to do this?
You could make the Class column your index column
df = df.set_index('Class')
and then use df.loc to reindex the DataFrame with class_list:
df.loc[class_list]
Minimal example:
>>> df = pd.DataFrame({'Class': ['Gammaproteobacteria', 'Bacteroidetes', 'Negativicutes'], 'Number': [3, 5, 6]})
>>> df
Class Number
0 Gammaproteobacteria 3
1 Bacteroidetes 5
2 Negativicutes 6
>>> df = df.set_index('Class')
>>> df.loc[['Bacteroidetes', 'Negativicutes', 'Gammaproteobacteria']]
Number
Bacteroidetes 5
Negativicutes 6
Gammaproteobacteria 3
Alex's solution doesn't work if your original dataframe does not contain all of the elements in the ordered list i.e.: if your input data at some point in time does not contain "Negativicutes", this script will fail. One way to get past this is to append your df's in a list and concatenate them at the end. For example:
ordered_classes = ['Bacteroidetes', 'Negativicutes', 'Gammaproteobacteria']
df_list = []
for i in ordered_classes:
df_list.append(df[df['Class']==i])
ordered_df = pd.concat(df_list)

Queries on ActiveRecord Association collection object

I have a set of rows which I've fetched from a table. Let's say the object Rating. After fetching this object, I have say 100 entries from the database.
The Rating object might look like this:
table_ratings
t.integer :num
So what I now want to do is perform some calculations on these 100 rows without performing any other queries. I can do this, running an additional query:
r = Rating.all
good = r.where('num = 2') # this runs an additional query
"#{good}/#{r.length}"
This is a very rough idea of what I want to do, and I have other more complex output to build. Let's imagine I have over 10 different calculations I need to perform on these rows, and so aliasing these in a sql query might not be the best idea.
What would be an efficient way to replace the r.where query above? Is there a Ruby method or a gem which would give me a similar query api into ActiveRecord Association collection objects?
Rating.all returns an array of all Rating objects. From there, shift your focus to selecting and mapping from the array. eg.:
#perfect_ratings = r.select{|x| x.a == 100}
See:
http://www.ruby-doc.org/core-1.9.3/Array.html
ADDITIONAL COMMENTS:
Going over the list of methods available for array, I find myself using the following frequently:
To check a variable against multiple values:
%w[dog cat llama].include? #pet_type # returns true if #pet_type == 'cat'
To create another array(map and collect are aliases):
%w[dog cat llama].map(|pet| pet.capitalize) # ["Dog", "Cat", "Llama"]
To sort and drop duplicates:
%w[dog cat llama dog].sort.uniq # ["cat", "dog", "llama"]
<< to add an element, + to add arrays, flatten to flatten embedded arrays into a single level array, count or length or size for number of elements, and join are the others I tend to use a lot.
Finally, here is an example of join:
%w[dog cat llama].join(' or ') # "dog or cat or llama"

Resources