How can I use more than one additional regressor with DeepAREstimator in gluon-ts? - gluon

When creating training or test data in gluon-ts we can specify an additional real-valued regressor in the DeepAREstimator by specifying a feat_dynamic_real. Is there support for multiple real-valued regressors?
There is a one_dim_target flag in gluonts.dataset.common.ListDataset which is used to create the training/test data objects. This seems like it could be needed to support multiple additional regressors, however I couldn't find a good example on the intended usage.
Here is the set up for creating training data with one additional regressor:
training_data = ListDataset(
[{"start": df.index[0], "target": df.values, "feat_dynamic_real": df['randomColumn'].values}],
freq = "5min", one_dim_target=False
)
and the Estimator:
from gluonts.model.deepar import DeepAREstimator
from gluonts.trainer import Trainer
estimator = DeepAREstimator(freq="5min", prediction_length=12, trainer=Trainer(epochs=10))
predictor = estimator.train(training_data=training_data)
I'm looking for the syntax/configuration needed for multiple regressors.

Yes, there is support for that. First off, Gluon TS refers to regressors as features and to the signal that we are trying to predict as the target. Thus, the one_dim_target flag you mention relates to the dimension of the output, not the input.
Below is the code I use to associate a multi-dimensional feature (input) with each target signal (I use a one-dimensional target):
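from gluonts.dataset.common import ListDataset
from gluonts.dataset.field_names import FieldName

# One dataset entry per (target, start, features) triple.
train_ds = ListDataset([{FieldName.TARGET: target,
                         FieldName.START: start,
                         FieldName.FEAT_DYNAMIC_REAL: fdr}
                        for (target, start, fdr) in zip(
                            target,
                            custom_ds_metadata['start'],
                            feat_dynamic_real)],
                       freq="5min")  # assumed: the frequency used in the question; set your own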
In the zip call above:
target: a NumPy array containing the one-dimensional target signal, i.e., the shape of target is (1, number of time steps)
custom_ds_metadata['start']: a pandas date variable indicating the beginning of the data
feat_dynamic_real: a 2-dimensional NumPy array containing two feature signals, i.e., feat_dynamic_real has shape (number of features, number of time steps)
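To make the expected layout concrete, here is a minimal sketch of one dataset entry carrying two regressors. The helper name build_entry and the regressor names are hypothetical. It uses plain lists so that it runs without gluon-ts; the dictionary keys are the string values behind FieldName.START, FieldName.TARGET and FieldName.FEAT_DYNAMIC_REAL.

```python
def build_entry(start, target, regressors):
    """Build one dataset entry with several dynamic real-valued regressors.

    `regressors` maps a (hypothetical) regressor name to its values. Every
    regressor must cover the same time steps as `target`.
    """
    rows = []
    for name, values in regressors.items():
        if len(values) != len(target):
            raise ValueError(
                f"regressor {name!r} has {len(values)} steps, expected {len(target)}"
            )
        rows.append(list(values))
    # rows has shape (number of features, number of time steps)
    return {"start": start, "target": list(target), "feat_dynamic_real": rows}


entry = build_entry(
    start="2020-01-01 00:00",
    target=[10.0, 11.0, 12.5, 13.0],
    regressors={
        "temperature": [1.0, 1.5, 2.0, 2.5],
        "holiday": [0.0, 0.0, 1.0, 1.0],
    },
)
print(len(entry["feat_dynamic_real"]), len(entry["feat_dynamic_real"][0]))
```

Entries like this go into the list passed to ListDataset, as in the answer above (convert the lists to NumPy arrays first if your version needs that). In the gluon-ts versions I am aware of, DeepAREstimator ignores this field unless it is constructed with use_feat_dynamic_real=True, so check that flag against your installed version.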

Related

'Duplicate' NGram values in topic list created using bertopic

I've set the CountVectorizer to examine unigrams, bigrams, and trigrams (ngram_range=(1, 3)). This seems very useful. However, I'm seeing "duplicate" terms, e.g.:
The terms "justice," "India," "gate," and "along" appear to overlap significantly. I'm utilising these vocabularies to choose documents for further processing, and it appears that we have one phrase "pushing out" other terms that could otherwise surface. In fact, I'm conducting a broad search across all of these terms to pick target documents for additional processing, so I'm not sure what I'm "missing" otherwise. Is this something I'm thinking about correctly? In this case, would it be a "good thing" if "india gate" and "justice khanna" were combined into a single term?
Also, how can I combine these into a single term in BERTopic so that these overlaps don't occur?
In BERTopic, there is the diversity parameter that allows you to fine-tune the topic representations. The underlying algorithm for this is called MaximalMarginalRelevance. It is a value between 0 and 1 that indicates how diverse keywords in a single topic should be compared to one another. A value of 1 indicates high diversity and 0 indicates little diversity. It works as follows:
from bertopic import BERTopic
from sklearn.datasets import fetch_20newsgroups
# Get documents
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']
# Train BERTopic and apply MMR
topic_model = BERTopic(diversity=0.4)
topics, probs = topic_model.fit_transform(docs)
Do note that in the upcoming version the diversity parameter is removed and replaced by a representation model, as follows:
from bertopic.representation import MaximalMarginalRelevance
from bertopic import BERTopic
# Create your representation model
representation_model = MaximalMarginalRelevance(diversity=0.3)
# Use the representation model in BERTopic on top of the default pipeline
topic_model = BERTopic(representation_model=representation_model)
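To illustrate what the diversity value trades off, here is a minimal pure-Python sketch of maximal marginal relevance. It is not BERTopic's implementation: the function names and the two-dimensional toy embeddings are made up for illustration. Each step picks the candidate with the best score (1 - diversity) * relevance - diversity * redundancy, where redundancy is the highest similarity to any keyword already chosen.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr(topic_vec, word_vecs, top_n, diversity):
    """Select top_n keywords, balancing relevance against redundancy."""
    words = list(word_vecs)
    relevance = {w: cosine(word_vecs[w], topic_vec) for w in words}
    # Start from the single most relevant keyword.
    selected = [max(words, key=relevance.get)]
    candidates = [w for w in words if w not in selected]
    while candidates and len(selected) < top_n:
        def score(w):
            redundancy = max(cosine(word_vecs[w], word_vecs[s]) for s in selected)
            return (1 - diversity) * relevance[w] - diversity * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy embeddings (hypothetical): "india gate" and "gate" point almost the same way.
topic_vec = (1.0, 1.0)
word_vecs = {
    "india gate": (1.0, 0.9),
    "gate": (1.0, 0.8),
    "justice": (0.2, 1.0),
}

print(mmr(topic_vec, word_vecs, top_n=2, diversity=0.0))
print(mmr(topic_vec, word_vecs, top_n=2, diversity=0.7))
```

With diversity at 0 the near-duplicate "gate" is kept because it is the second most relevant term. With a higher diversity the overlap penalty pushes it out in favour of "justice".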

Setting the power spectral density from a file

How does one set the power spectral density (PSD) from file and is it possible to use a different PSD for generating the data and for likelihood evaluation?
Question asked by Vivien Raymond by email.
Setting the PSD from file
To set the PSD from a file, first initialise a list of interferometers (here we just use Hanford):
>>> ifos = bilby.gw.detector.InterferometerList(['H1'])
Every element of the list is initialised with a default PSD using the advanced LIGO noise curve. To check this:
>>> ifos[0].power_spectral_density
PowerSpectralDensity(psd_file='/home/user1/miniconda3/lib/python3.6/site-packages/bilby-0.3.5-py3.6.egg/bilby/gw/noise_curves/aLIGO_ZERO_DET_high_P_psd.txt', asd_file='None')
Note that no data has been generated yet. To overwrite the PSD, simply create a new PowerSpectralDensity object and assign it (if you have multiple detectors, you'll need to do this for every element of the list):
ifos[0].power_spectral_density = bilby.gw.detector.PowerSpectralDensity(psd_file=PATH_TO_FILE)
Next, generate an instance of the strain data from the PSD:
ifos.set_strain_data_from_power_spectral_densities(
sampling_frequency=4096, duration=4,
start_time=-3)
You can check what the data looks like by doing
ifos[0].plot_data()
Note, you can also inject signals using the ifos.inject_signal method.
Using a different PSD for likelihood evaluation
Each ifo in the ifos list contains both the data and a PSD (or equivalent ASD). For inference, we pass that list into the likelihood object (bilby.gw.GravitationalWaveTransient) as the first argument, and the PSD of each element of the list is used in calculating the likelihood.
So, if you want to use a different PSD for the likelihood evaluation, first generate the data (as above). Then assign the PSD you want to use for sampling to each element of ifos and pass that object into the likelihood instead. This won't overwrite the data (provided you don't call set_strain_data_from_power_spectral_densities again, of course).
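The two-step recipe can be sketched as a small function. The function name and the two file arguments are hypothetical; the bilby calls are the ones shown earlier in this answer, which was written for v0.3.5, so check them against your installed version.

```python
def make_ifos_with_separate_psds(data_psd_file, likelihood_psd_file,
                                 sampling_frequency=4096, duration=4,
                                 start_time=-3):
    """Generate noise from one PSD, then attach another PSD for the likelihood."""
    import bilby  # imported lazily so this sketch can be read without bilby installed

    ifos = bilby.gw.detector.InterferometerList(["H1"])

    # Step 1: set the PSD that colours the simulated noise and generate the data.
    for ifo in ifos:
        ifo.power_spectral_density = bilby.gw.detector.PowerSpectralDensity(
            psd_file=data_psd_file)
    ifos.set_strain_data_from_power_spectral_densities(
        sampling_frequency=sampling_frequency, duration=duration,
        start_time=start_time)

    # Step 2: swap in the PSD that the likelihood should use; the data is untouched.
    for ifo in ifos:
        ifo.power_spectral_density = bilby.gw.detector.PowerSpectralDensity(
            psd_file=likelihood_psd_file)
    return ifos
```

The returned list is what you would pass to the likelihood as its first argument.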

How to inject a zero-noise compact binary coalescence signal

Is it possible to inject a signal by itself with no coloured Gaussian noise?
Question asked by Arunava Mukherjee via email
Yes. There are two easy ways to do this.
1) Use the existing helper functions
When generating an interferometer object, bilby provides several helper routines named bilby.gw.detector.get_interferometer_with.... In this case, you'll want to use this function (I've truncated the docstring):
bilby.gw.detector.get_interferometer_with_fake_noise_and_injection(
name, injection_parameters, injection_polarizations=None,
waveform_generator=None, sampling_frequency=4096, duration=4,
start_time=None, outdir='outdir', label=None, plot=True, save=True,
zero_noise=False)
Docstring:
Helper function to obtain an Interferometer instance with appropriate
power spectral density and data, given an center_time.
Note: by default this generates an Interferometer with a power spectral
density based on advanced LIGO.
Parameters
----------
name: str
Detector name, e.g., 'H1'.
...
zero_noise: bool
If true, set noise to zero.
So you just pass the flag in and it will create an interferometer with just the injection signal (you'll then need to make one for each interferometer you want in the list of interferometers passed in to the likelihood).
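A minimal sketch of building one zero-noise interferometer per detector, assuming the helper signature quoted above (as of v0.3.5). The wrapper name make_zero_noise_ifos is hypothetical, and injection_parameters and waveform_generator are whatever you would normally pass to the helper.

```python
def make_zero_noise_ifos(names, injection_parameters, waveform_generator, **kwargs):
    """Return one zero-noise interferometer, with the injection, per detector name."""
    import bilby  # imported lazily so this sketch can be read without bilby installed

    helper = bilby.gw.detector.get_interferometer_with_fake_noise_and_injection
    return [
        helper(
            name,
            injection_parameters=injection_parameters,
            waveform_generator=waveform_generator,
            zero_noise=True,  # the flag described in the docstring above
            **kwargs,  # e.g. sampling_frequency, duration, start_time
        )
        for name in names
    ]
```

Calling it with names=["H1", "L1"] gives the list of interferometers to hand to the likelihood.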
2) Use the low level set strain data methods
Alternatively, you may instead wish to use the low level methods themselves. As a general rule of thumb, you can always look at the source code for the generic helper functions to figure out how this should be done. Here, we create an H1 interferometer, set the strain data with zero noise, and inject a signal:
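from bilby.gw.detector import get_empty_interferometer, PowerSpectralDensity

# sampling_frequency, duration, start_time, injection_parameters and
# waveform_generator are assumed to be defined as for the helper function above.
interferometer = get_empty_interferometer("H1")
interferometer.power_spectral_density = PowerSpectralDensity.from_aligo()
interferometer.set_strain_data_from_zero_noise(
    sampling_frequency=sampling_frequency, duration=duration,
    start_time=start_time)
injection_polarizations = interferometer.inject_signal(
    parameters=injection_parameters,
    waveform_generator=waveform_generator)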
Information correct as of v.0.3.5

What problem does a reinitializable iterator solve?

From the tf.data documentation:
A reinitializable iterator can be initialized from multiple different
Dataset objects. For example, you might have a training input pipeline
that uses random perturbations to the input images to improve
generalization, and a validation input pipeline that evaluates
predictions on unmodified data. These pipelines will typically use
different Dataset objects that have the same structure (i.e. the same
types and compatible shapes for each component).
The following example was given:
# Define training and validation datasets with the same structure.
training_dataset = tf.data.Dataset.range(100).map(
lambda x: x + tf.random_uniform([], -10, 10, tf.int64))
validation_dataset = tf.data.Dataset.range(50)
# A reinitializable iterator is defined by its structure. We could use the
# `output_types` and `output_shapes` properties of either `training_dataset`
# or `validation_dataset` here, because they are compatible.
iterator = tf.data.Iterator.from_structure(training_dataset.output_types,
training_dataset.output_shapes)
next_element = iterator.get_next()
training_init_op = iterator.make_initializer(training_dataset)
validation_init_op = iterator.make_initializer(validation_dataset)
# Run 20 epochs in which the training dataset is traversed, followed by the
# validation dataset.
for _ in range(20):
    # Initialize an iterator over the training dataset.
    sess.run(training_init_op)
    for _ in range(100):
        sess.run(next_element)
    # Initialize an iterator over the validation dataset.
    sess.run(validation_init_op)
    for _ in range(50):
        sess.run(next_element)
It is unclear what the benefit of this complexity is.
Why not simply create 2 different iterators?
The original motivation for reinitializable iterators was as follows:
The user's input data is in two or more tf.data.Dataset objects with the same structure but different pipeline definitions.
For example, you might have a training data pipeline with augmentations in a Dataset.map(), and an evaluation data pipeline that produced raw examples, but they would both produce batches with the same structure (in terms of the number of tensors, their element types, shapes, etc.).
The user would define a single training graph that took input from a tf.data.Iterator, created using Iterator.from_structure().
The user could then switch between the different input data sources by reinitializing the iterator from one of the datasets.
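The idea can be mimicked in plain Python. This is only an analogy, not the tf.data API: ReinitializableIterator and consume are hypothetical stand-ins. The point is that the downstream code is written once against a single get_next() handle, while the data source behind that handle is swapped by reinitializing.

```python
class ReinitializableIterator:
    """Plain-Python stand-in: one consumer handle, swappable data source."""

    def __init__(self):
        self._it = None

    def initialize(self, dataset):
        # Point the same handle at a new data source.
        self._it = iter(dataset)

    def get_next(self):
        if self._it is None:
            raise RuntimeError("iterator has not been initialized")
        return next(self._it)


def consume(iterator, steps):
    # The downstream "graph": defined once, unaware of which dataset feeds it.
    return [iterator.get_next() * 2 for _ in range(steps)]


training_dataset = [x + 100 for x in range(5)]  # stand-in for an augmented pipeline
validation_dataset = list(range(3))             # stand-in for raw examples

iterator = ReinitializableIterator()
iterator.initialize(training_dataset)
train_out = consume(iterator, 5)
iterator.initialize(validation_dataset)
val_out = consume(iterator, 3)
print(train_out, val_out)
```

With two separate iterators in graph mode, the downstream computation would have to be built twice, once per get_next() tensor, or the inputs fed in some other way. Sharing one handle avoids that duplication.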
In hindsight, reinitializable iterators have turned out to be quite hard to use for their intended purpose. In TensorFlow 2.0 (or 1.x with eager execution enabled), it is much easier to create iterators over different datasets using idiomatic Python for loops and high-level training APIs:
tf.enable_eager_execution()
model = ... # A `tf.keras.Model`, or some other class exposing `fit()` and `evaluate()` methods.
train_data = ... # A `tf.data.Dataset`.
eval_data = ... # A `tf.data.Dataset`.
for i in range(NUM_EPOCHS):
    model.fit(train_data, ...)
    # Evaluate every 5 epochs.
    if i % 5 == 0:
        model.evaluate(eval_data, ...)

Paraview rotate fields

I am using Paraview 5.0.1. If any solution requires updating, I can try.
I want to programmatically obtain field plots (and corresponding PlotOverLine) of displacements and stresses in rotated coordinate systems.
What are appropriate/convenient/possible ways of doing this?
So far, I have created one Calculator filter for each component of displacements and stresses.
For instance, I used Calculators in 2D with results
(displacement.iHat)*cos(0.7853981625)+(displacement.jHat)*sin(0.7853981625)
(stress_3-stress_0)*sin(45.0*3.14159265/180)*cos(45.0*3.14159265/180)+stress_1*((cos(45.0*3.14159265/180))^2-(sin(45.0*3.14159265/180))^2)
It works fine, but it is quite cumbersome, in several aspects:
Creating them (one filter per component).
Plotting several of them in a single XY plot
Exporting them (one export per component).
Is there a simple way to do this?
PS: The Transform filter does not accomplish this. It rotates the view, not the fields.
Two solutions:
Ugly, inefficient solution:
Use Transform and check "Transform All Input vectors".
Add a Calculator and add a dummy array.
Use Transform the other way around, without checking "Transform All Input vectors".
Correct solution:
Compute the transformation yourself in a Programmable Filter:
# Programmable Filter script: copy the input and attach a transformed vector array.
input = self.GetUnstructuredGridInput()
output = self.GetUnstructuredGridOutput()
output.ShallowCopy(input)
data = input.GetPointData().GetArray("YourArray")
vec = vtk.vtkDoubleArray()
vec.SetNumberOfComponents(3)
vec.SetName("TransformedVectors")
numPoints = input.GetNumberOfPoints()
for i in range(numPoints):
    # GetTuple returns an immutable tuple, so transform() must return the new values.
    rotated = transform(data.GetTuple(i))  # implement the transform in Python
    vec.InsertNextTuple(rotated)
output.GetPointData().AddArray(vec)
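A minimal sketch of what the transform could look like for a rotation of the coordinate system about the z axis. The function names are hypothetical. The stress ordering assumes, as the Calculator expression in the question suggests, that stress_0, stress_1 and stress_3 are the xx, xy and yy components.

```python
import math


def rotate_vector_z(vector, angle_deg):
    """Components of a 3-vector expressed in axes rotated by angle_deg about z."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    x, y, z = vector
    return (x * c + y * s, -x * s + y * c, z)


def rotate_stress_2d(sxx, sxy, syy, angle_deg):
    """In-plane stress components (xx, xy, yy) in axes rotated by angle_deg."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    sxx_r = sxx * c * c + syy * s * s + 2.0 * sxy * s * c
    syy_r = sxx * s * s + syy * c * c - 2.0 * sxy * s * c
    # Same expression as the shear Calculator formula in the question.
    sxy_r = (syy - sxx) * s * c + sxy * (c * c - s * s)
    return (sxx_r, sxy_r, syy_r)


print(rotate_vector_z((1.0, 0.0, 0.0), 45.0))
print(rotate_stress_2d(1.0, 0.0, 0.0, 45.0))
```

Inside the Programmable Filter, transform could then be defined as lambda t: rotate_vector_z(t, 45.0). For stresses you would read the relevant components from the stress array and write the rotated values to a new array in the same way.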
