I have a list of 23 utilities, each of which can be either enabled or disabled. I've numbered them 0-22.
Some of these utilities depend on others, meaning they cannot be enabled unless one or more dependency utilities are enabled first. I've put the indices of each utility's dependencies in a list, one list per utility; for example, if utilities 0-1 had no dependencies, but utility 2 depended on utilities 0 and 9, the full dependency list would look something like:
[ [], [], [0, 9], ... ]
What I want to do is devise an algorithm (pseudocode is fine; the implementation does not matter) for generating a list of all possible 23-bit bitvectors. Each bit in each bitvector, indexed 0-22, corresponds to a single utility, and each bitvector represents one possible combination of the statuses of all 23 utilities. The list should skip any combination in which the dependency requirements given by the dependency list (described above) would not be satisfied. For example (assume right-to-left numbering):
[
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 ],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0 ],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1 ],
// skipped: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0 ] would not be included (utility 2 is enabled, but its dependencies 0 and 9 are not; see the earlier dependency list example)
...
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
]
First step: get rid of all circular dependencies. If A depends on B, which depends on C, which depends on A, then all three will be on or off together. So we can transfer the group's dependencies to A, then fill B and C in at the last step. This is a matter of identifying all strongly connected components in a graph, which Kosaraju's algorithm does efficiently.
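To make this concrete, here is a minimal Scala sketch of the SCC step (all names are mine; `deps(u)` is assumed to hold the dependency list described in the question, with an edge from each utility to each of its dependencies):

// Kosaraju: one DFS pass for finish order, then DFS over the reversed graph.
// Utilities that end up with the same component id form one circular group.
def stronglyConnected(n: Int, deps: Vector[List[Int]]): Array[Int] = {
  val rev = Array.fill(n)(List.empty[Int])
  for (u <- 0 until n; v <- deps(u)) rev(v) = u :: rev(v)  // reversed edges
  val seen = Array.fill(n)(false)
  var finish = List.empty[Int]                             // latest-finished first
  def dfs1(u: Int): Unit = {
    seen(u) = true
    deps(u).foreach(v => if (!seen(v)) dfs1(v))
    finish = u :: finish
  }
  (0 until n).foreach(u => if (!seen(u)) dfs1(u))
  val comp = Array.fill(n)(-1)                             // component id per utility
  def dfs2(u: Int, c: Int): Unit = {
    comp(u) = c
    rev(u).foreach(v => if (comp(v) < 0) dfs2(v, c))
  }
  var next = 0
  for (u <- finish if comp(u) < 0) { dfs2(u, next); next += 1 }
  comp
}

Keep one representative per component, move the group's outside dependencies onto it, and remember the group members for the fill-in step at the end.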
Second step: topologically sort the remaining utilities by dependency. This puts them into a list where each utility depends only on utilities that appear before it.
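The graph is now acyclic and has at most 23 nodes, so even a naive quadratic sort is instant. A sketch, assuming `deps` is the collapsed, cycle-free dependency list:

def topoOrder(n: Int, deps: Vector[List[Int]]): List[Int] = {
  var placed = Set.empty[Int]  // utilities already in the order
  var order = List.empty[Int]
  while (placed.size < n) {
    val u = (0 until n)
      .find(u => !placed(u) && deps(u).forall(placed))
      .getOrElse(sys.error("cycle left in dependency list"))
    placed += u
    order = u :: order
  }
  order.reverse                // each utility follows all of its dependencies
}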
And now we can recurse down that list. The first utility can be 0 or 1. Each subsequent utility must be 0 if any of its dependencies is unsatisfied; otherwise it can be 0 or 1. And then each utility that was eliminated as part of a circular dependency gets filled in with whatever value its kept representative has.
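A sketch of that recursion, assuming `order` and `deps` come from the two steps above (each result maps a representative utility to 0 or 1; expanding the circular groups back into full 23-bit vectors is omitted):

def combos(order: List[Int], deps: Vector[List[Int]]): List[Map[Int, Int]] = {
  def go(rest: List[Int], acc: Map[Int, Int]): List[Map[Int, Int]] = rest match {
    case Nil => List(acc)
    case u :: tail =>
      val off = go(tail, acc + (u -> 0))  // disabling is always allowed
      val on =                            // enabling needs every dependency on
        if (deps(u).forall(d => acc(d) == 1)) go(tail, acc + (u -> 1)) else Nil
      off ++ on
  }
  go(order, Map.empty)
}

Because invalid branches are never expanded, this enumerates only the valid combinations rather than generating and filtering all 2^23 of them.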
https://www.elastic.co/guide/en/elasticsearch/guide/current/_finding_multiple_exact_values.html
{
  "terms" : {
    "price" : [20, 30]
  }
}
What is the max number of values we can put in this search?
"price" : [20, 30, 40, 50, 10, 11, 12 .... to the max number of search values]
Thanks!
searain,
The max number of terms is 1024 by default. You can raise it (in case you are using the terms inside a bool query) by setting indices.query.bool.max_clause_count in the .yml config file.
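For example, in elasticsearch.yml (4096 is just an illustrative value):

# elasticsearch.yml
indices.query.bool.max_clause_count: 4096

The setting is static, so the nodes need a restart to pick it up.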
I have a 6-node Elasticsearch 5.5 cluster with path.data set to the following:
path.data: "/data/0/elasticsearch,/data/1/elasticsearch,/data/2/elasticsearch,/data/3/elasticsearch,/data/4/elasticsearch"
I have restarted all the nodes after the config change; however, only 1 node in the cluster actually writes to /data/0/elasticsearch and /data/1/elasticsearch. The other 5 nodes have only been writing to /data/0/elasticsearch.
To give you more insight into my cluster settings:
For my index => "number_of_shards":"5"
How do I configure the cluster to write evenly across all drives /data/0/ through /data/4/ and across all nodes?
Update 1:
Here is my health check report.
{
  "cluster_name": "elasticsearch",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 6,
  "number_of_data_nodes": 6,
  "active_primary_shards": 6,
  "active_shards": 12,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 0,
  "delayed_unassigned_shards": 0,
  "number_of_pending_tasks": 0,
  "number_of_in_flight_fetch": 0,
  "task_max_waiting_in_queue_millis": 0,
  "active_shards_percent_as_number": 100.0
}
Right now I'm processing a large amount of JSON data coming from a Mixpanel API. With a small dataset it's a breeze, and the code below runs just fine. However, a large dataset takes rather long to process, and we're starting to see timeouts because of it.
My Scala optimization skills are rather poor, so I'm hoping someone can show me a faster way to process the following with large datasets. Please do explain why, since it will help my own understanding of Scala.
val people = parse[mp.data.Segmentation](o)
val list = people.data.values
  .map(b =>
    b._2.map(p =>
      Map(
        "id" -> p._1,
        "activity" -> p._2.foldLeft(0)(_ + _._2)
      )
    )
  )
  .flatten
  .filter { behavior => behavior("activity") != 0 }
  .groupBy(o => o("id"))
  .map { case (k, v) =>
    Map("id" -> k, "activity" -> v.map(o => o("activity").asInstanceOf[Int]).sum)
  }
And the Segmentation class:
case class Segmentation(
  val legend_size: Int,
  val data: Data
)

case class Data(
  val series: List[String],
  val values: Map[String, Map[String, Map[String, Int]]]
)
Thanks for your help!
Edit: sample data as requested
{"legend_size": 4, "data": {"series": ["2013-12-17", "2013-12-18", "2013-12-19", "2013-12-20", "2013-12-21", "2013-12-22", "2013-12-23", "2013-12-24", "2013-12-25", "2013-12-26", "2013-12-27", "2013-12-28", "2013-12-29", "2013-12-30", "2013-12-31", "2014-01-01", "2014-01-02", "2014-01-03", "2014-01-04", "2014-01-05", "2014-01-06"], "values": {"afef4ac12a21d5c4ef679c6507fe65cd": {"id:twitter.com:194436690": {"2013-12-20": 0, "2013-12-29": 0, "2013-12-28": 0, "2013-12-23": 0, "2013-12-22": 0, "2013-12-21": 1, "2013-12-25": 0, "2013-12-27": 0, "2013-12-26": 0, "2013-12-24": 0, "2013-12-31": 0, "2014-01-06": 0, "2014-01-04": 0, "2014-01-05": 0, "2014-01-02": 0, "2014-01-03": 0, "2014-01-01": 0, "2013-12-30": 0, "2013-12-17": 0, "2013-12-18": 0, "2013-12-19": 0}, "id:twitter.com:330103796": {"2013-12-20": 0, "2013-12-29": 0, "2013-12-28": 0, "2013-12-23": 0, "2013-12-22": 0, "2013-12-21": 0, "2013-12-25": 0, "2013-12-27": 0, "2013-12-26": 1, "2013-12-24": 0, "2013-12-31": 0, "2014-01-06": 0, "2014-01-04": 0, "2014-01-05": 0, "2014-01-02": 0, "2014-01-03": 0, "2014-01-01": 0, "2013-12-30": 0, "2013-12-17": 0, "2013-12-18": 0, "2013-12-19": 0}, "id:twitter.com:216664121": {"2013-12-20": 0, "2013-12-29": 0, "2013-12-28": 0, "2013-12-23": 1, "2013-12-22": 0, "2013-12-21": 0, "2013-12-25": 0, "2013-12-27": 0, "2013-12-26": 0, "2013-12-24": 0, "2013-12-31": 0, "2014-01-06": 0, "2014-01-04": 0, "2014-01-05": 0, "2014-01-02": 0, "2014-01-03": 0, "2014-01-01": 0, "2013-12-30": 0, "2013-12-17": 0, "2013-12-18": 0, "2013-12-19": 0}, "id:twitter.com:414117608": {"2013-12-20": 0, "2013-12-29": 0, "2013-12-28": 1, "2013-12-23": 0, "2013-12-22": 0, "2013-12-21": 0, "2013-12-25": 0, "2013-12-27": 0, "2013-12-26": 0, "2013-12-24": 0, "2013-12-31": 0, "2014-01-06": 0, "2014-01-04": 0, "2014-01-05": 0, "2014-01-02": 0, "2014-01-03": 0, "2014-01-01": 0, "2013-12-30": 0, "2013-12-17": 0, "2013-12-18": 0, "2013-12-19": 0}}}}}
To answer Millhouse's question, the intention is to sum up each date to provide a number that describes the total volume of "activity" for each ID. The "ID" is formatted as id:twitter.com:923842.
I don't know the full extent of your processing, what pipelines you have going on, what stress your server is under, or what sort of threading profile you've set up to receive the information. However, assuming that you've correctly separated I/O from CPU-bound tasks and that what you've shown us is strictly CPU bound, try simply adding .par to the very first Map.
people.data.values.par.map(b =>
as a first pass to see if you can get some performance gains. I don't see any specific ordering required in the processing, which tells me it's ripe for parallelization.
Edit
After playing around with parallelization, I would add that modifying the TaskSupport is helpful in this case. You can set a parallel collection's tasksupport like so:
import scala.collection.parallel._

// Back the parallel collection with its own fork-join pool of 2 workers.
val pc = mutable.ParArray(1, 2, 3)
pc.tasksupport = new ForkJoinTaskSupport(
  new scala.concurrent.forkjoin.ForkJoinPool(2))
See http://www.scala-lang.org/api/2.10.3/index.html#scala.collection.parallel.TaskSupport
I have some suggestions that might help.
I would try to move the filter command as early in the program as possible. Since your data contains many dates with 0 activity, you would see improvements from doing this. The best solution might be to test for this while parsing the JSON data; if that is not possible, make it the first statement.
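As an illustration only, reusing the Segmentation shape from the question, an early-filtering version of the start of the pipeline might look like:

// Skip zero days before summing and drop inactive ids before any grouping,
// so every later stage touches smaller collections.
val perId = people.data.values.values.flatten  // drop the outer hash key
  .map { case (id, days) => (id, days.valuesIterator.filter(_ != 0).sum) }
  .filter(_._2 != 0)
  .toList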
The way I understand it, you would like to end up with a way to look up an aggregate of the sums for a given id. I would suggest representing this as a map from the id to the aggregate. Also, the Scala List class has a sum function.
I came up with this code:
// Adjusted to the Segmentation shape above: drop the outer key, then pair
// each id with the sum of its per-date counts.
val originalList_IdToAggregate = people.data.values.values.flatten
  .map { case (id, days) => (id, days.values.sum) }.toList
It might not match your project directly, but I think it is almost what you need.
If you need to make a map of this, you just append toMap to the end.
If this doesn't give you enough speed, you could create your own parser that aggregates and filters while parsing only this kind of JSON. Writing parsers is quite easy in Scala if you use the parser combinators. Just keep in mind to throw away what you don't need as early as possible and not to build too many deeply nested branches; done that way, this should be a fast solution with a low memory footprint.
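For a flavor of what such a parser could look like, here is a sketch with the standard parser combinators that handles only the innermost {"date": count, ...} objects; a real parser would also have to cover the outer layers:

import scala.util.parsing.combinator.JavaTokenParsers

// Folds a {"date": count, ...} object into a single sum while parsing,
// so the per-date maps are never materialized at all.
object ActivityParser extends JavaTokenParsers {
  def entry: Parser[Int] = stringLiteral ~> ":" ~> wholeNumber ^^ (_.toInt)
  def dayCounts: Parser[Int] = "{" ~> repsep(entry, ",") <~ "}" ^^ (_.sum)
}

For instance, ActivityParser.parseAll(ActivityParser.dayCounts, "{\"a\": 0, \"b\": 1}").get evaluates to 1.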
As for going parallel, it can be a good idea. I don't know enough about your application to tell you what the best way is, but it might be possible to hide the computational cost of processing the data under the cost of transporting it. Try to balance parsing and I/O over multiple threads and see if you can achieve this.
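One shape this could take with plain Futures; fetchPage and aggregatePage are hypothetical stand-ins for the Mixpanel call and the aggregation code above:

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

def fetchPage(i: Int): String = ???                     // blocking I/O call
def aggregatePage(json: String): Map[String, Int] = ??? // CPU-bound parse + sum

// Each page's CPU work starts as soon as its download finishes, so the
// computation for one page overlaps the I/O of the remaining pages.
val pages = (0 until 8).map(i => Future(fetchPage(i)).map(aggregatePage))

val total: Map[String, Int] =
  Await.result(Future.sequence(pages), 10.minutes)
    .flatten
    .groupBy(_._1)
    .map { case (id, xs) => id -> xs.map(_._2).sum }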
I have some problems solving a puzzle. I haven't found a solution for it anywhere, so I tried to write one in Prolog, but I think my solution won't be fast (I generate every candidate solution and discard the ones that aren't possible or correct).
This is my problem:
(I found the name of the puzzle; here is a link with all of its rules: http://en.wikipedia.org/wiki/Kuromasu.)
Now I have a different question: which method would be both fairly easy to write and fairly fast at solving it in Prolog? I thought about transforming my list of fields into an undirected graph, or maybe there is another method, such as searching my list vertically (head after head)?
In:
0, 0, 0, 5, 0, 0, 0
0, 5, 0, 0, 0, 0, 2
0, 0, 0, 0, 7, 0, 4
0, 0, 0, 0, 0, 0, 0
8, 0, 13, 0, 0, 0, 0
5, 0, 0, 0, 0, 6, 0
0, 0, 0, 8, 0, 0, 0
Result:
0, #, 0, 5, 0, 0, #
0, 5, 0, 0, 0, #, 2
0, #, 0, #, 7, 0, 4
#, 0, 0, 0, 0, 0, #
8, 0, 13, 0, 0, 0, 0
5, 0, 0, 0, #, 6, 0
#, 0, 0, 8, 0, 0, #
This type of puzzle is called Kuromasu. Here is a page that solves it with SWI-Prolog and finite domain constraints: http://jfoutelet.developpez.com/articles/kuromasu/