Related
For example, take this snippet from Huggingface:
encoded_input = tokenizer("Do not meddle in the affairs of wizards, for they are subtle and quick to anger.")
print(encoded_input)
{'input_ids': [101, 2079, 2025, 19960, 10362, 1999, 1996, 3821, 1997, 16657, 1010, 2005, 2027, 2024, 11259, 1998, 4248, 2000, 4963, 1012, 102],
'token_type_ids': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
'attention_mask': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]}
The input_ids vector already encodes the order of the tokens in the original sentence. Why does BERT need an extra positional-encoding vector to represent that order again?
The reason is the design of the neural architecture. BERT consists of self-attention and feedforward sub-layers, and neither of them is sequential.
The feedforward layers process each token independently of others.
Self-attention views the input states as an unordered set. Attention can be interpreted as soft probabilistic retrieval from a set of values according to some keys; the position embeddings are there so the keys can carry information about the tokens' relative order.
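To see concretely why order has to be injected, note that plain dot-product attention is permutation-equivariant: permuting the input tokens simply permutes the output rows, so no positional information survives. Here is a minimal NumPy sketch (identity query/key/value projections for brevity, not BERT's actual layers; the function name self_attention is just illustrative):

import numpy as np

def self_attention(x):
    # x: (seq_len, d) token embeddings; Q = K = V = x for simplicity
    scores = x @ x.T / np.sqrt(x.shape[1])
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ x

rng = np.random.default_rng(0)
tokens = rng.normal(size=(5, 8))   # five made-up token embeddings
perm = rng.permutation(5)

out = self_attention(tokens)
out_perm = self_attention(tokens[perm])

# Permuting the input just permutes the output: without position embeddings,
# the layer cannot tell which token came first.
print(np.allclose(out[perm], out_perm))  # True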
I installed the Guna theme, which brings a nice color scheme that I would like to keep; however, the theme comes with clock and weather widgets that I can't remove, even if I set the theme back to the default.
I have tried setting the layer that contains the clock texture to zero opacity, but the most that achieved was showing the clock in a different color.
The following lines are in the Guna theme's default settings, starting at line 166:
{
"class": "sidebar_container",
"layer0.texture": "Guna/assets/simple/sidebar/sidebar-bg-clock-nb.png",
"layer0.inner_margin": [15, 55, 15, 0],
//"layer0.inner_margin": [15, 70, 15, 0],
//"layer0.inner_margin": [15, 92, 15, 0],
"layer0.tint": "color(var(--background))",
"layer0.opacity": 1,
"content_margin": [0, 45, 0, 0],
//"content_margin": [0, 60, 0, 0],
//"content_margin": [0, 82, 0, 0],
},
{
"class": "sidebar_container",
"settings": ["gnwidgx"],
"layer0.texture": "Guna/assets/simple/sidebar/sidebar-bg-nb.png",
"layer0.inner_margin": [15, 55, 15, 0],
"layer0.tint": "color(var(--background))",
"layer0.opacity": 1,
"content_margin": [0, 0, 0, 0]
}
I tried setting the content margin to [0, 0] for the clock widget (similar to what I've seen done to hide file icons), like so:
{
"variables": {
},
"rules":
[
{
"class": "sidebar_container",
"content_margin": [0,0],
"layer0.opacity": 0
},
{
"class": "sidebar_container",
"content_margin": [0,0],
"layer1.opacity": 0
}
]
}
And the result is this: every sidebar element, including the files and folders, bugs out and leaves a trail as if I were shift+dragging everything in mspaint.
Thanks in advance!
Apparently Guna has a setting that adds the widgets (clock, weather, etc.) to all themes. Fortunately, it can be turned off. Select Preferences → Package Settings → Guna → Settings and add the following to the right pane:
"sidebar_widget_on_other_theme": false,
You can turn off the widgets in Guna itself with this setting:
"sidebar_widget": [],
Valid values for that array are empty (as above) or any combination of "clock", "weather", and "date".
Save the right pane when you're done, and the settings should be applied immediately. If not, you might need to restart Sublime.
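For reference, the right pane (your Guna user settings) would then contain something along these lines (Sublime settings files accept comments):

{
    // Keep Guna's widgets out of other themes.
    "sidebar_widget_on_other_theme": false,

    // Disable every widget in Guna itself; add "clock", "weather" or "date"
    // back to the array to re-enable individual widgets.
    "sidebar_widget": []
}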
I have a list of 23 utilities, each of which can be either enabled or disabled. I've numbered them 0-22.
Some of these utilities depend on others, meaning they cannot be enabled unless one or more dependency utilities are enabled first. I've put the indices of each utility's dependencies in a per-utility list; for example, if utilities 0 and 1 had no dependencies but utility 2 depended on utilities 0 and 9, the full dependency list would look something like:
[ [], [], [0, 9], ... ]
What I want to do is devise an algorithm (pseudocode is fine; the implementation doesn't matter) that generates a list of all possible 23-bit bitvectors, where each bit position 0-22 corresponds to a single utility and each bitvector represents one possible combination of the states of all 23 utilities, while skipping any combination that does not satisfy the dependency requirements given by the dependency list described above. For example (assume right-to-left bit numbering):
[
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 ],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 ],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0 ],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1 ],
// skipped: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0] would not be included (utility 2 is enabled, but 0 and/or 9 are not; see the dependency list example above)
...
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
]
First step: get rid of all circular dependencies. If A depends on B, which depends on C, which depends on A, then all three must be on or off together. So we can transfer B's and C's dependencies to A and fill in B and C at the last step. This is a matter of identifying the strongly connected components of the dependency graph, which Kosaraju's algorithm does efficiently.
Second step: topologically sort the remaining utilities by their dependencies. This puts them into a list where each utility depends only on utilities that come before it.
And now we can recurse down that list. The first utility can be 0 or 1. Each subsequent utility is forced to 0 if any of its dependencies is not satisfied; otherwise it can be 0 or 1. Finally, for the utilities eliminated as part of circular dependencies, fill in whatever value their kept representative has.
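Here is a minimal Python sketch of the last two steps (topological sort plus recursion, using Python 3.9's graphlib). It assumes the cycle-collapsing step has already been done, so the dependency graph is acyclic; valid_combinations and the small deps example are illustrative, not from the question:

from graphlib import TopologicalSorter

def valid_combinations(deps):
    # deps[i] is the list of utilities that utility i depends on.
    n = len(deps)
    # Topological order: each utility appears after all of its dependencies.
    order = list(TopologicalSorter({i: deps[i] for i in range(n)}).static_order())

    def recurse(pos, state):
        if pos == n:
            yield state[:]   # index i of the result corresponds to utility i
            return
        u = order[pos]
        # A utility can always stay disabled.
        state[u] = 0
        yield from recurse(pos + 1, state)
        # It can be enabled only if every one of its dependencies is enabled.
        if all(state[d] == 1 for d in deps[u]):
            state[u] = 1
            yield from recurse(pos + 1, state)
            state[u] = 0

    yield from recurse(0, [0] * n)

# Four utilities: 2 depends on 0 and 3, the others are independent.
deps = [[], [], [0, 3], []]
for combo in valid_combinations(deps):
    print(combo)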
I have tried to install Elasticsearch on AWS with 3 instances: 1 master node and 2 data nodes. I followed the steps described at https://www.elastic.co/blog/running-elasticsearch-on-aws.
Below are the elasticsearch.yml settings that I edited as needed.
node.master: true
node.data: true
node.ingest: true
discovery.zen.ping.unicast.hosts: [list of node IPs]
I have started the Elasticsearch service; it is running and the cluster status is green.
Following is the output of curl -XGET http://privateip:9200/_cluster/health?pretty
{
"cluster_name" : "EduGrowthElastic",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 3,
"active_primary_shards" : 0,
"active_shards" : 0,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
I am worried that the active shards are zero, along with every other shard-related field.
I would be very happy if anybody could help me with this. This is my first post; sorry for any mistakes.
If you just installed and started Elasticsearch without indexing any data, then those numbers are correct.
No data means no indices, which means no shards, and that is why all the shard counts are 0.
The only thing that matters at this point is that the status is green!
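If you want to see the counters move, a quick sanity check (the index name test below is just an example) is to create an index and hit the health endpoint again; active_primary_shards and active_shards should then be non-zero:

curl -XPUT http://privateip:9200/test
curl -XGET http://privateip:9200/_cluster/health?pretty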
I have a database of readings from weather sensors. One of the items measured is 'sky temperature'. I want to find the minimum sky temperature each day over a period of a month or two.
The first thing I tried was this:
r.db('Weather').table('TAO_SkyNet', {readMode:'outdated'})
.group(r.row('time').dayOfYear(),{index:'time'})
.min('sky')
I think that might work, except that it is a large database and the query times out after 300 seconds. Fair enough; I really don't want the data back to the beginning of time, and a few weeks will do. So I tried to restrict the records examined like this:
r.db('Weather').table('TAO_SkyNet', {readMode:'outdated'})
.between(r.time(2018,3,1,'Z'), r.now())
.group(r.row('time').dayOfYear(),{index:'time'})
.min('sky')
...and I get:
e: Expected type TABLE but found TABLE_SLICE:
SELECTION ON table(TAO_SkyNet) in:
r.db("Weather").table("TAO_SkyNet", {"readMode": "outdated"}).between(r.time(2018, 3, 1, "Z"), r.now()).group(r.row("time").dayOfYear(), {"index": "time"}).min("sky")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
So, I'm stuck here. How do I group on a subset of the table?
between returns a table slice, and group can use an index only when it operates on a full table:
table.between(lowerKey, upperKey[, options]) → table_slice
By the way, between itself operates over an index (the primary key by default).
So either remove {index: 'time'} from your group clause (if TAO_SkyNet uses time as its primary key):
r.db('Weather')
.table('TAO_SkyNet', {readMode: 'outdated'})
.between(r.time(2018, 3, 1, 'Z'), r.now())
.group(r.row('time').dayOfYear())
.min('sky')
or move the index option into the between clause (if time is a secondary index on TAO_SkyNet):
r.db('Weather')
.table('TAO_SkyNet', {readMode: 'outdated'})
.between(r.time(2018, 3, 1, 'Z'), r.now(), {index: 'time'})
.group(r.row('time').dayOfYear())
.min('sky')
Either way, the query should work fine.
Test dataset:
r.db('Weather').table('TAO_SkyNet').insert([
// day 1
{time: r.time(2018, 3, 1, 0, 0, 0, 'Z'), sky: 10},
{time: r.time(2018, 3, 1, 8, 0, 0, 'Z'), sky: 4}, // min
{time: r.time(2018, 3, 1, 16, 0, 0, 'Z'), sky: 7},
// day 2
{time: r.time(2018, 3, 2, 0, 0, 0, 'Z'), sky: 2}, // min
{time: r.time(2018, 3, 2, 8, 0, 0, 'Z'), sky: 4},
{time: r.time(2018, 3, 2, 16, 0, 0, 'Z'), sky: 9},
// day 3
{time: r.time(2018, 3, 3, 0, 0, 0, 'Z'), sky: 7},
{time: r.time(2018, 3, 3, 8, 0, 0, 'Z'), sky: 7},
{time: r.time(2018, 3, 3, 16, 0, 0, 'Z'), sky: 1} // min
]);
Query result:
[{
"group": 60,
"reduction": {"sky": 4, "time": Thu Mar 01 2018 08:00:00 GMT+00:00}
},
{
"group": 61,
"reduction": {"sky": 2, "time": Fri Mar 02 2018 00:00:00 GMT+00:00}
},
{
"group": 62,
"reduction": {"sky": 1, "time": Sat Mar 03 2018 16:00:00 GMT+00:00}
}]