I am writing code to extract the fill factor, and while doing so I realized that some of the data points were way off. Values in the 4.25E-16 or -3.46422E-17 range were being imported as something else entirely. Can anyone help me figure out the problem and how to fix it?
With the data imported as strings:
import = ReadList["example.dat", String]
{"4.25E-16", "-3.46422E-17"}
convert[num_String] := ToExpression[StringReplace[num, "E" -> "*10^"]]
SetAttributes[convert, Listable]
numbers = convert[import]
{4.25*10^-16, -3.46422*10^-17}
Import will do the conversion automatically:
Import["example.dat"]
{{4.25*10^-16}, {-3.46422*10^-17}}
while ReadList on its own will not: by default it reads Expressions, so "4.25E-16" is parsed as 4.25 E - 16 (with E the exponential constant, about 2.71828), which is exactly where the stray values come from:
ReadList["example.dat"]
{-4.4473, -26.4167}
I have a list of Dots of variable (random) length, and I want to apply a transform (a shift in this case) to each of these objects independently but at the same time.
list = [Dot(), Dot() ...] # Variable length
I am using the Manim library (https://github.com/3b1b/manim) by 3blue1brown.
As a note, other related posts don't solve my problem, since they only work with a fixed number of objects (dots).
The following code from this reddit post, used as an example, solves the problem:
import numpy as np
from manimlib.imports import *  # 3b1b/manim; provides Scene, Dot, ApplyMethod

class DotsMoving(Scene):
    def construct(self):
        dots = [Dot() for i in range(5)]
        directions = [np.random.randn(3) for dot in dots]
        self.add(*dots)  # not strictly necessary
        animations = [ApplyMethod(dot.shift, direction)
                      for dot, direction in zip(dots, directions)]
        self.play(*animations)  # * unpacks the list of animations
Special thanks to u/Xorlium.
Don't use list as a variable name; it shadows the Python built-in. Use a VGroup to contain the objects:
list_dots = VGroup(*[Dot() for _ in range(5)]) # 5 dots vgroup
# this is the same as:
# list_dots = VGroup(Dot(),Dot(),Dot(),Dot(),Dot())
# See 'list comprehension python' in google
list_dots.arrange(RIGHT)
list_dots.set_color(RED)
list_dots.shift(UP)
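To get what the question asks for (independent but simultaneous shifts) while keeping the dots in a VGroup, a minimal sketch combining the two answers might look like this (untested; it assumes the same ApplyMethod-based API used above, and DotsMovingVGroup is just an illustrative name):
import numpy as np
from manimlib.imports import *  # 3b1b/manim

class DotsMovingVGroup(Scene):
    def construct(self):
        dots = VGroup(*[Dot() for _ in range(5)])
        dots.arrange(RIGHT)
        self.add(dots)
        # one independent random shift per dot, all played at the same time
        animations = [ApplyMethod(dot.shift, np.random.randn(3)) for dot in dots]
        self.play(*animations)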
Please find below a minimal example of how I iterate through time in xarray.
import pandas as pd
import xarray as xr

ds = xr.Dataset({'time': pd.date_range(start='1/1/2018', periods=8)})
for ii, date in enumerate(ds.time):
    nd = date.data
nd is a numpy.ndarray, but of size 1, with an empty shape (shape == ()) and zero dimensions (ndim == 0).
I can access the element through nd[()] (it took me a while, thanks Clive), but I wonder whether this is something we should expect or whether it is a bug.
If there is a better way to enumerate through my dates, please let me know or point me to where I can find it.
nd being a 0-d array is a feature; it's explained here: https://stackoverflow.com/a/49621796/3064736.
There is a small bug, due to a recent pandas change, such that nd.item() returns an int rather than a date on the most recent versions of xarray & pandas. That's being tracked here: https://github.com/pydata/xarray/pull/4292.
Generally we would want nd = date.item().
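For instance, a minimal sketch of that pattern (the pd.Timestamp wrapper is my own addition, to cope with the int-vs-date issue mentioned above):
import pandas as pd
import xarray as xr

ds = xr.Dataset({'time': pd.date_range(start='1/1/2018', periods=8)})
for ii, date in enumerate(ds.time):
    value = date.item()              # scalar; may be an int (ns since epoch) on recent versions
    print(ii, pd.Timestamp(value))   # pd.Timestamp accepts both the int and a datetime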
I have a pretty simple example, but I am just learning and can't find a solution to the following:
Given two sequences, the first being
<emp>10</emp>
<emp>42</emp>
<emp>100</emp>
and the second being
<emp>10</emp>
<emp>42</emp>
What I want to do is compare the sequences and return the part of the first sequence that is not in the second, which would be <emp>100</emp> in this case.
I was thinking about an "except" operation, but can't figure out how to make it work.
Help greatly appreciated.
The except expression operates on node identity, not node value. What I think you want is a value comparison over your sequences. For example:
let $seq1 :=
(<emp>10</emp>,
<emp>42</emp>,
<emp>100</emp>)
let $seq2 :=
(<emp>10</emp>,
<emp>42</emp>)
return $seq1[not(. = $seq2)]
=>
<emp>100</emp>
I'm trying to use Spark DataFrames instead of RDDs, since they appear to be higher-level than RDDs and tend to produce more readable code.
On a 14-node Google Dataproc cluster, I have about 6 million names that are translated to IDs by two different systems: sa and sb. Each Row contains name, id_sa and id_sb. My goal is to produce a mapping from id_sa to id_sb such that, for each id_sa, the corresponding id_sb is the most frequent ID among all names attached to that id_sa.
Let's try to clarify with an example. If I have the following rows:
[Row(name='n1', id_sa='a1', id_sb='b1'),
Row(name='n2', id_sa='a1', id_sb='b2'),
Row(name='n3', id_sa='a1', id_sb='b2'),
Row(name='n4', id_sa='a2', id_sb='b2')]
My goal is to produce a mapping from a1 to b2. Indeed, the names associated with a1 are n1, n2 and n3, which map respectively to b1, b2 and b2, so b2 is the most frequent mapping among the names associated with a1. In the same way, a2 will be mapped to b2. It's OK to assume that there will always be a winner: no need to break ties.
I was hoping that I could use groupBy(df.id_sa) on my DataFrame, but I don't know what to do next. Ideally, I'd like an aggregation that produces, in the end, the following rows:
[Row(id_sa='a1', max_id_sb='b2'),
 Row(id_sa='a2', max_id_sb='b2')]
But maybe I'm trying to use the wrong tool and I should just go back to using RDDs.
Using join (it will result in more than one row per group in case of ties):
import pyspark.sql.functions as F
from pyspark.sql.functions import count, col
cnts = df.groupBy("id_sa", "id_sb").agg(count("*").alias("cnt")).alias("cnts")
maxs = cnts.groupBy("id_sa").agg(F.max("cnt").alias("mx")).alias("maxs")
cnts.join(
    maxs,
    (col("cnt") == col("mx")) & (col("cnts.id_sa") == col("maxs.id_sa"))
).select(col("cnts.id_sa"), col("cnts.id_sb"))
Using window functions (will drop ties):
from pyspark.sql.functions import row_number
from pyspark.sql.window import Window
w = Window.partitionBy("id_sa").orderBy(col("cnt").desc())
(cnts
 .withColumn("rn", row_number().over(w))
 .where(col("rn") == 1)
 .select("id_sa", "id_sb"))
Using struct ordering:
from pyspark.sql.functions import struct
(cnts
 .groupBy("id_sa")
 .agg(F.max(struct(col("cnt"), col("id_sb"))).alias("max"))
 .select(col("id_sa"), col("max.id_sb")))
See also How to select the first row of each group?
I think what you might be looking for are window functions:
http://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=window#pyspark.sql.Window
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
Here is an example in Scala (I don't have a Spark Shell with Hive available right now, so I was not able to test the code, but I think it should work):
case class MyRow(name: String, id_sa: String, id_sb: String)
val myDF = sc.parallelize(Array(
  MyRow("n1", "a1", "b1"),
  MyRow("n2", "a1", "b2"),
  MyRow("n3", "a1", "b2"),
  MyRow("n4", "a2", "b2")
)).toDF("name", "id_sa", "id_sb")
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.first

val windowSpec = Window.partitionBy(myDF("id_sa")).orderBy(myDF("id_sb").desc)
myDF.withColumn("max_id_sb", first(myDF("id_sb")).over(windowSpec)).filter("id_sb = max_id_sb")
There are probably more efficient ways to achieve the same results with Window functions, but I hope this points you in the right direction.
Here is my problem.
I'm using sympy and a complex matrix P (all elements of P are complex valued).
I want to extract the real/imaginary parts of the first row.
So, I use the following sequence:
import sympy as sp

a, b, c, d = sp.symbols('a b c d')  # ordinary (complex-valued) symbols
P = sp.Matrix([[a + sp.I*b, c - sp.I*d], [c - sp.I*d, a + sp.I*b]])
Row = P.row(0)
Row.as_mutable()
Re_row = sp.re(Row)
Im_row = sp.im(Row)
But the code returns the following error:
"AttributeError: ImmutableMatrix has no attribute as_coefficient."
The error occurs during the operations sp.re(Row) and sp.im(Row).
SymPy tells me that Row is an immutable matrix, even though I specified that I want a mutable one.
So I'm at a dead end and don't have a solution.
Could someone please help me?
Thank you very much!
Most SymPy functions won't work if you just pass a Matrix to them directly. You need to use the methods of the Matrix, or, if there is no such method (as is the case here), use applyfunc:
In [34]: Row.applyfunc(re)
Out[34]: [re(a) - im(b) re(c) + im(d)]
In [35]: Row.applyfunc(im)
Out[35]: [re(b) + im(a) -re(d) + im(c)]
(I've defined a, b, c, and d as just ordinary symbols here; if you declare them as real, the answer will come out much simpler.)
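As a follow-up, a minimal sketch of the real-symbol variant mentioned above (my own example, not part of the original answer):
import sympy as sp

a, b, c, d = sp.symbols('a b c d', real=True)
P = sp.Matrix([[a + sp.I*b, c - sp.I*d], [c - sp.I*d, a + sp.I*b]])
row = P.row(0)

re_row = row.applyfunc(sp.re)   # Matrix([[a, c]])
im_row = row.applyfunc(sp.im)   # Matrix([[b, -d]])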