I have this dataframe
structure(list(plate = c("A", "A", "A", "A", "A", "B", "B", "B",
"B", "B", "C", "C", "C", "C", "C"), marker = c("IL-1", "IL-2",
"IL-3", "IL-4", "IL-5", "IL-1", "IL-2", "IL-3", "IL-4", "IL-5",
"IL-1", "IL-2", "IL-3", "IL-4", "IL-5"), sample = c(1, 2, 3,
4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15), result = c(1.94000836127381,
0.426353706529969, 2.07418521661429, 1.58200029160696, 0.812685661255674,
0.546681932009987, 0.199532997122114, 0.100208840148698, 0.720956738045624,
0.444814277410285, 2.25080298569014, 1.61429066532657, 1.1066027850052,
0.927880542016121, 4.1487948134003), LOD = c(0.810456546400942,
0.614177278086376, 0.98739611371029, 0.315142822914328, 0.221497734151459,
0.0191136249820546, 0.364139946842526, 0.983763479804491, 0.982034953153209,
0.851687364910033, 0.893324689832074, 0.978609354294382, 0.62613140416969,
0.0310439168600307, 0.729966088361143)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -15L))
As you can see, I have different LOD values for each marker in each plate. So I calculate the mean LOD for each marker using
lod <- dummy_2 |>
  group_by(marker) |>
  summarise(lod = mean(LOD))
which results in the following mean LOD per marker for all plates
structure(list(marker = c("IL-1", "IL-2", "IL-3", "IL-4", "IL-5"
), lod = c(0.57429828707169, 0.652308859741095, 0.865763665894824,
0.442740564309189, 0.601050395807545)), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -5L))
So far so good. Now I want to check whether the result for each marker is above or below my mean LOD. If it is above the mean LOD, the result must not be changed; if it is below the mean LOD, the result must be replaced by LOD/2.
I tried to use a for loop and mutate in combination with ifelse, but that did not work. I also looked at the across function, but that did not work either. My latest try was...
marker <- unique(dummy_2$marker)
for (i in marker) {
  dummy_2 <- mutate(result = ifelse(i %in% dummy_2$result < dummy_2$LOD, (i %in% lod$LOD)/2), dummy_2$result)
}
Is a for loop the right way to go, or is there a better solution?
Any help would be appreciated.
Already found a solution by creating a new dataframe with my mean values and joining it to my dataset using left_join. I'd still like to know whether this could be done with a for loop, but for now, problem solved.
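For anyone after the same thing, a minimal sketch of that join-based approach (reusing the dummy_2 and lod objects from above) could look like this:

library(dplyr)

dummy_2 <- dummy_2 |>
  left_join(lod, by = "marker") |>                          # attach each marker's mean LOD as column `lod`
  mutate(result = ifelse(result < lod, lod / 2, result))    # below the mean LOD -> replace with half of it

This avoids the loop entirely: every row gets its marker's mean LOD via the join, and only results below that mean are replaced.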
I'm new to Kafka Streams and I'm really going crazy. I have a stream of counter values represented by <counter-name, counter-value, timestamp>. I want to calculate the average value for each day, like this:
counterValues topic content:
“cpu”, 10, “2022-06-03 17:00”
“cpu”, 20, “2022-06-03 18:00”
“cpu”, 30, “2022-06-04 10:00”
“memory”, 40, “2022-06-04 10:00”
and I want to obtain this output:
“cpu”, “2022-06-03”, 15
“cpu”, “2022-06-04”, 30
“memory”, “2022-06-04”, 40
This is a snippet of my code that doesn't work (it seems to calculate a count)…
Duration windowSize = Duration.ofDays(1);
TimeWindows tumblingWindow = TimeWindows.of(windowSize);

counterValueStream
    .groupByKey().windowedBy(tumblingWindow)
    .aggregate(StatisticValue::new, (k, counterValue, statisticValue) -> {
        statisticValue.setSamplesNumber(statisticValue.getSamplesNumber() + 1);
        statisticValue.setSum(statisticValue.getSum() + counterValue.getValue());
        return statisticValue;
    }, Materialized.with(Serdes.String(), statisticValueSerde))
    .toStream().map((Windowed<String> key, StatisticValue sv) -> {
        double avgNoFormat = sv.getSum() / (double) sv.getSamplesNumber();
        double formattedAvg = Double.parseDouble(String.format("%.2f", avgNoFormat));
        return new KeyValue<>(key.key(), formattedAvg);
    }).to("average", Produced.with(Serdes.String(), Serdes.Double()));
But the aggregation result is:
“cpu”, 1, “2022-06-03 17:00”
“cpu”, 1, “2022-06-03 18:00”
“cpu”, 1, “2022-06-04 10:00”
“memory”, 1, “2022-06-04 10:00”
Note that I use a TimestampExtractor that uses the counter's timestamp instead of the Kafka record timestamp. What am I doing wrong?
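For reference, a minimal sketch of such an extractor (CounterValue and its getTimestamp() accessor are hypothetical stand-ins for my actual value type) might look like this:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.streams.processor.TimestampExtractor;

public class CounterTimestampExtractor implements TimestampExtractor {
    @Override
    public long extract(ConsumerRecord<Object, Object> record, long partitionTime) {
        Object value = record.value();
        // Use the timestamp carried inside the counter value itself, if available.
        if (value instanceof CounterValue) {
            return ((CounterValue) value).getTimestamp();
        }
        // Otherwise fall back to the record's own timestamp.
        return record.timestamp();
    }
}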
I'm trying to fit a model using fitDataset(). I can train using the "normal" approach, with a for loop and getting random batches of data (20000 data points).
I'd like to use fitDataset() so I can use the entire dataset and not rely on the "randomness" of my getBatch function.
I'm getting closer using the API docs and the example on tfjs-data, but I'm stuck on what's probably a dumb data manipulation...
So here's how I'm doing it:
const [trainX, trainY] = await bigData
const model = await cnnLSTM // gru performing well
const BATCH_SIZE = 32
const dataSet = flattenDataset(trainX.slice(200), trainY.slice(200))

model.compile({
  loss: 'categoricalCrossentropy',
  optimizer: tf.train.adam(0.001),
  metrics: ['accuracy']
})

await model.fitDataset(dataSet.train.batch(32), {
  epochs: C.trainSteps,
  validationData: dataSet.validation,
  callbacks: {
    onBatchEnd: async (batch, logs) => (await tf.nextFrame()),
    onEpochEnd: (epoch, logs) => {
      let i = epoch + 1
      lossValues.push({'epoch': i, 'loss': logs.loss, 'val_loss': logs.val_loss, 'set': 'train'})
      accuracyValues.push({'epoch': i, 'accuracy': logs.acc, 'val_accuracy': logs.val_acc, 'set': 'train'})
      // await md `${await plotLosses(train.lossValues)} ${await plotAccuracy(train.accuracyValues)}`
    }
  }
})
Here's my interpretation of the dataset creation:
flattenDataset = (features, labels, split = 0.35) => {
  return tf.tidy(() => {
    let slice = features.length - Math.floor(features.length * split)
    const featuresTrain = features.slice(0, slice)
    const featuresVal = features.slice(slice)
    const labelsTrain = labels.slice(0, slice)
    const labelsVal = labels.slice(slice)
    const data = {
      train: tf.data.array(featuresTrain, labelsTrain),
      validation: tf.data.array(featuresVal, labelsVal)
    }
    return data
  })
}
I'm getting an error:
Error: Dataset iterator for fitDataset() is expected to generate an Array of length 2: `[xs, ys]`, but instead generates Tensor
[[0.4106583, 0.5408, 0.4885066, 0.9021732, 0.1278526],
[0.3711334, 0.5141, 0.4848816, 0.9021571, 0.2688071],
[0.4336613, 0.5747, 0.4822159, 0.9021728, 0.3694479],
...,
[0.4123166, 0.4553, 0.478438 , 0.9020132, 0.8797594],
[0.3963479, 0.3714, 0.4871198, 0.901996 , 0.7170534],
[0.4832076, 0.3557, 0.4892016, 0.9019232, 0.9999322]],Tensor
[[0.3711334, 0.5141, 0.4848816, 0.9021571, 0.2688071],
[0.4336613, 0.5747, 0.4822159, 0.9021728, 0.3694479],
[0.4140858, 0.5985, 0.4789927, 0.9022084, 0.1912155],
...,
The input data is 6 timesteps with 5 dimensions, and the labels are just one-hot encoded classes: [0, 0, 1], [0, 1, 0] and [1, 0, 0]. I guess flattenDataset() is not sending the data in the correct way.
Does data.train need to output [6 timesteps with 5 dims, label] for each data point? I got this error when I tried that:
Error: The feature data generated by the dataset lacks the required input key 'conv1d_Conv1D5_input'.
Could really use some pro insight...
--------------------
Edit #1:
I feel I'm close to an answer.
const X = tf.data.array(trainX.slice(0, 100))//.map(x => x)
const Y = tf.data.array(trainY.slice(0, 100))//.map(x => x)
const zip = tf.data.zip([X, Y])
const dataSet = {
  train: zip
}
dataSet.train.forEach(x => console.log(x))
With this i get on the console:
[Array(6), Array(3)]
[Array(6), Array(3)]
[Array(6), Array(3)]
...
[Array(6), Array(3)]
[Array(6), Array(3)]
But fitDataset() is giving me: Error: The feature data generated by the dataset lacks the required input key 'conv1d_Conv1D5_input'.
My model looks like this:
const model = tf.sequential()
model.add(tf.layers.conv1d({
  inputShape: [6, 5],
  kernelSize: 3,
  filters: 64,
  strides: 1,
  padding: 'same',
  activation: 'elu',
  kernelInitializer: 'varianceScaling',
}))
model.add(tf.layers.maxPooling1d({poolSize: 2}))
model.add(tf.layers.conv1d({
  kernelSize: 1,
  filters: 64,
  strides: 1,
  padding: 'same',
  activation: 'elu'
}))
model.add(tf.layers.maxPooling1d({poolSize: 2}))
model.add(tf.layers.lstm({
  units: 18,
  activation: 'elu'
}))
model.add(tf.layers.dense({units: 3, activation: 'softmax'}))
model.compile({
  loss: 'categoricalCrossentropy',
  optimizer: tf.train.adam(0.001),
  metrics: ['accuracy']
})
return model
What is wrong here?
What model.fitDataset() expects is a Dataset in which each element is a tuple of two items, [feature, label].
So in your case, you need to create a featureDataset and a labelDataset, then merge them with tf.data.zip to create the trainDataset. Same for the validation dataset.
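As a rough sketch of the shape of that idea, reusing trainX and trainY from the question (the variable names are only illustrative, and the exact batching may still need adjusting for nested arrays, as the follow-up below shows):

const featureDataset = tf.data.array(trainX)
const labelDataset = tf.data.array(trainY)
// zip yields elements of the form [xs, ys], which is what fitDataset() expects
const trainDataset = tf.data.zip([featureDataset, labelDataset]).batch(32)
// build a validationDataset the same way from the held-out slices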
Solved it
So after a lot of trial and error, I found a way to make it work.
I had an input shape of [6, 5], meaning an array of 6 arrays with 5 floats each.
[[[0.3467378, 0.3737, 0.4781905, 0.90665, 0.68142351],
[0.44003019602788285, 0.3106, 0.4864576, 0.90193448, 0.5841830879700972],
[0.30672944860847245, 0.3404, 0.490295674, 0.90720676, 0.8331748581920732],
[0.37475716007758336, 0.265, 0.4847249, 0.902056932, 0.6611207914113887],
[0.5639427928616854, 0.2423002, 0.483168235, 0.9020202294447865, 0.82823],
[0.41581425627336555, 0.4086, 0.4721923, 0.902094287, 0.914699]], ... 20k more]
What I did was flatten the array so it became an array of 5-element arrays, then applied .batch(6) to it.
const BATCH_SIZE = 20 // batch size fed to the NN
const X = tf.data.array([].concat(...trainX)).batch(6).batch(BATCH_SIZE)
const Y = tf.data.array(trainY).batch(BATCH_SIZE)
const zip = tf.data.zip([X, Y])
const dataSet = {
  train: zip
}
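For completeness, feeding this into training could then look something like the following (the epoch count is only a placeholder):

await model.fitDataset(dataSet.train, {
  epochs: 50, // placeholder value
  callbacks: {
    onEpochEnd: async (epoch, logs) => console.log(epoch, logs.loss)
  }
})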
Hope it can help others with complex data!
Silly question that has me stumped. I want to give a different debounceTime based on data in the stream. I have:
const fakeData = [{number: 1}, {number: 2}, {number: 3}];
const stream$ = Rx.Observable.from(fakeData);
const delayedStream$ = stream$.concatMap(x => Rx.Observable.of(x).delay(300));
delayedStream$
  .concatMap(x => x.number >= 2
    ? Rx.Observable.of(x).debounceTime(500)
    : Rx.Observable.of(x).debounceTime(1000)
  )
  .subscribe(x => console.log(x));
// expected output: 3
// actual output: 1 ... 2 ... 3 |
http://jsbin.com/dafaxoraca/edit?js,console
The above code simply returns x without a debounce. But if I replace debounceTime with delay, the delay works as expected. I'm obviously missing something fundamental between the two operators. I've gone through the docs and am not getting it.
Thanks for your help!
I can't test this with your actual use case, but you're not using debounceTime correctly.
The debounceTime operator applies the debounce only to the Observable it is chained on and its data. Since you're chaining concatMap with inner Observables that use debounceTime, the concatenation always waits until each inner Observable completes, so this always emits all three values.
You can use debounce() instead, which takes a function returning an Observable; that lets you control the duration with delay on emitted values instead of a hardcoded time.
const fakeData = [{number: 1}, {number: 2}, {number: 3}];
const stream$ = Rx.Observable.from(fakeData);
const delayedStream$ = stream$.concatMap(x => Rx.Observable.of(x).delay(300));
delayedStream$
  .debounce(val => Rx.Observable.of(true).delay(val.number >= 2 ? 500 : 1000))
  .subscribe(x => console.log(x.number));
See live demo: http://jsbin.com/tifajodogi/1/edit?js,console
This emits just: 3
Update: Since RxJS 5.5+ the same technique can be restructured like so:
import { from, of } from 'rxjs';
import { concatMap, delay, debounce } from 'rxjs/operators';

const fakeData = [{number: 1}, {number: 2}, {number: 3}];
const stream$ = from(fakeData);
const delayedStream$ = stream$.pipe(concatMap(x => of(x).pipe(delay(300))));

delayedStream$.pipe(
  debounce(val => of(true).pipe(delay(val.number >= 2 ? 500 : 1000)))
).subscribe(x => console.log(x.number));
For instance, how can I construct a list consisting of all the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9?
You can construct it with val xs = $list{int}(0, 1, 2, 3, 4, 5, 6, 7, 8, 9).
Make sure you specify the correct memory allocation functions by passing -DATS_MEMALLOC_LIBC to the compiler when using this code.
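For example, if the code lived in a file called digits.dats (just an illustrative name), a native build might look like:

$ patscc -DATS_MEMALLOC_LIBC -o digits digits.dats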
If you compile to JavaScript via atscc2js, then you need to construct
the list in the following manner:
val ds =
0::1::2::3::4::5::6::7::8::9::nil{int}()
// end of [val]
It also works for targeting C.
There are also combinators for this sort of thing. For instance,
val ds = (10).list_map(TYPE{int})(lam(i) => i)
val ds = list_tabulate_cloref<int>(10, lam i => i)
You can find the special literals in the Emacs mode:
https://github.com/githwxi/ATS-Postiats/blob/653af81715cf6bfbd1d2cd5ece1e88c8c3912b4a/utils/emacs/ats2-mode.el#L287
(defvar ats-special-keywords
'("$arrpsz" "$arrptrsize" "$delay" "$ldelay" "$effmask" "$effmask_ntm" "$effmask_exn" "$effmask_ref"
"$effmask_wrt" "$effmask_all" "$extern" "$extkind" "$extype" "$extype_struct" "$extval" "$lst"
"$lst_t" "$lst_vt" "$list" "$list_t" "$list_vt" "$rec" "$rec_t" "$rec_vt"
"$record" "$record_t" "$record_vt" "$tup" "$tup_t" "$tup_vt" "$tuple" "$tuple_t"
"$tuple_vt" "$raise" "$showtype" "$myfilename" "$mylocation" "$myfunction" "#assert" "#define"
"#elif" "#elifdef" "#elifndef" "#else" "#endif" "#error" "#if" "#ifdef"
"#ifndef" "#print" "#then" "#undef" "#include" "#staload" "#dynload" "#require"))
If you find some keyword, you can easily see how it is used by grepping the doc/EXAMPLE/ directory:
$ git clone https://github.com/githwxi/ATS-Postiats.git
$ cd ATS-Postiats/doc/EXAMPLE
$ grep -r "\$list" . | head
./MISC/word-chain.dats: $list{word}("", "")
./MISC/word-chain.dats: $list{word}("", "")
./MISC/mysendmailist.dats:$list{string}
./MISC/monad_list.dats: $list{a}("this", "that", "a")
./MISC/monad_list.dats: $list{a}("frog", "elephant", "thing")
./MISC/monad_list.dats: $list{a}("walked", "treaded", "grows")
./MISC/monad_list.dats: $list{a}("slowly", "quickly")
./ATSLF/CoYonedaLemma.dats:val myintlist0 = g0ofg1($list{int0}(I(1), I(0), I(1), I(0), I(0)))
./ATSLF/YonedaLemma.dats: $list{bool}(True, False, True, False, False)
./ATS-QA-LIST/qa-list-2014-12-07.dats:$list{double}(0.111111, 0.222222, 0.333333)
For a list0-value, you can do
val xs = g0ofg1($list{T}(x1, ..., xn))
where T is the type for the elements in xs. For instance,
val some_int_list = g0ofg1($list{int}(0, 9, 8, 7, 3, 4))