RStudio seems to handle display of output slides inconsistently/poorly. How can I control the output so that the saved version of my slides matches what I see in RStudio?
This test document:
test
========================================================
author:
date:
autosize: true
Exponentials
========================================================
> "...King Shihram asked Sissa ben Dahir what reward he wanted... Sissa said that he would take this reward: the king should put one grain of wheat on the first square of a chessboard, two grains of wheat on the second square, four grains on the third square, eight grains on the fourth square, and so on...
> [The King] ordered his slaves to bring out the chessboard and they started putting on the wheat. Everything went well for a while, but the king was surprised to see that by the time they got halfway through the chessboard the 32nd square required more than four billion grains of wheat, or about 100,000 kilos of wheat...
> [T]o finish the chessboard you would need as much wheat as six times the weight of all the living things on Earth." - Story of [Ibn Khallikan](https://en.wikipedia.org/wiki/Ibn_Khallikan), _ca_. 1260 AD, [via](http://quatr.us/islam/literature/chesswheat.htm)
<!-- BREAKING UP QUOTE BLOCK -->
> "Humans don't understand exponential growth. If you fold a paper 50 times it goes to the moon and back." - Mark Zuckerberg [via](http://www.kazabyte.com/2011/12/we-dont-understand-exponential-functions.html)
Displays like this in RStudio:
But when I open it as a standalone HTML page, it's overlarge and the quote box is narrower:
I'd like my Rpres to be viewable without necessarily needing RStudio on the local machine. How can I reconcile the intermediate output I see in RStudio with the final product? That is, how can I be more sure of what the slides will look like on export while working in RStudio? What controls do I have at my disposal for manipulating the output of the slides?
I am in the process of designing an algorithm that will calculate regions in a candlestick chart where strong areas of support exist. An "area of support" in this case is defined as an area in the chart where the price of a stock rises by a large amount in a short period of time. (Please see the diagram below; the blue dots represent these strong areas of support.)
The data I am working with is a list of over 6000 TOHLC (timestamp, open price, high price, low price, close price) values. For example, the first entry in this list of data is:
[1555286400, 83.7, 84.63, 83.7, 84.27]
The way I have structured the algorithm to work is as follows:
1.) The list of 6000+ TOHLC values is split into sub-lists of 30 TOHLC values each (30 is a number that I chose arbitrarily). The lowest low price (LLP) is then obtained from each of these sub-lists. The purpose of this step is to find areas in the chart where prices dip.
2.) The next step is to determine how high the price rose from each of these lows. For this, I take the next 30 candlestick values after the low and determine the highest high price (HHP). Then, if HHP / LLP >= 1.03, the low price is accepted; otherwise it is discarded. Again, 1.03 is a value that I chose arbitrarily, by analysing the stock chart manually and determining how much the price rose on average from these lows.
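For concreteness, here is a minimal Python sketch of the two steps above (the function name find_support_levels and its defaults are just for illustration; it assumes the data is a list of [timestamp, open, high, low, close] rows):

def find_support_levels(tohlc, chunk_size=30, rise_threshold=1.03):
    """Return indices of chunk lows that are followed by a sufficiently large rise."""
    supports = []
    # Step 1: split into chunks of 30 and find the lowest low price (LLP) in each chunk.
    for start in range(0, len(tohlc), chunk_size):
        chunk = tohlc[start:start + chunk_size]
        low_offset = min(range(len(chunk)), key=lambda i: chunk[i][3])  # index 3 = low
        low_idx = start + low_offset
        llp = tohlc[low_idx][3]
        # Step 2: find the highest high price (HHP) over the next 30 candles after the low.
        lookahead = tohlc[low_idx + 1:low_idx + 1 + chunk_size]
        if not lookahead:
            continue
        hhp = max(row[2] for row in lookahead)  # index 2 = high
        if hhp / llp >= rise_threshold:
            supports.append(low_idx)
    return supports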
The blue dots in the chart above represent the areas of support accepted by the algorithm. It appears to be working well, in terms of what I am trying to achieve.
So the question I have is: does anyone have any improvements they can suggest for this algorithm, or point out any faults in it?
Thanks!
I may have misunderstood, but from your explanation it seems like you are doing your calculations in separate 30-element sub-lists and then combining the results.
So, what if the LLP is the 30th element of sub-list N and the HHP is the 1st element of sub-list N+1? If you have taken that into account, then it's fine.
If you haven't taken that into account, I would suggest a moving-window approach to reading the data: start from the 0th element of the 6000+ TOHLC values with a window size of 30 and slide it along 1 element at a time. This way, you won't miss any values.
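A rough sketch of that moving-window idea (my own illustrative Python, using the same [timestamp, open, high, low, close] row layout):

def sliding_support_levels(tohlc, window=30, rise_threshold=1.03):
    """Slide a 30-candle window one step at a time so no low is missed at chunk borders."""
    supports = set()  # overlapping windows will rediscover the same low
    for start in range(len(tohlc) - 2 * window + 1):
        rows = tohlc[start:start + window]
        low_offset = min(range(window), key=lambda i: rows[i][3])  # index 3 = low
        low_idx = start + low_offset
        llp = tohlc[low_idx][3]
        hhp = max(row[2] for row in tohlc[low_idx + 1:low_idx + 1 + window])  # index 2 = high
        if hhp / llp >= rise_threshold:
            supports.add(low_idx)
    return sorted(supports)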
Some of the selected blue dots sit at deeper dips than others. Why is that? I would separate them with another classifier; if you store them in an object, store the dip rate as well.
Floating-point numbers are not recommended in finance. If possible, I'd use a different approach, and perhaps a different classifier, using integers only. It may not bother you or your project right now, but it will surely begin to create false results once the numbers add up in the future.
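For example (assuming prices are quoted to two decimal places), you could store everything as integer cents, and the 1.03 test becomes a pure integer comparison:

# [1555286400, 83.7, 84.63, 83.7, 84.27] stored as integer cents
row = [1555286400, 8370, 8463, 8370, 8427]
# hhp / llp >= 1.03 is equivalent to 100 * hhp >= 103 * llp (no floats involved)
hhp, llp = 8463, 8370
accepted = 100 * hhp >= 103 * llp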
I am looking for any direction on how to implement the process below, you should not need to understand much at all about poker.
Below is a grid of possible two-card combinations.
Pocket pairs in blue, suited cards in yellow and off-suited in red.
Essentially there is a slider under the matrix which selects a percentage of the possible two-card combinations a player could be dealt. However, you can see that it moves in a sort of linear fashion towards the "better" cards.
These selections can also be parsed from strings, e.g. AA-88,AKo-AJo,KQo,AKs-AJs,KQs,QJs,JTs is 8.6% of the matrix.
I've looked around but cannot find questions about this specific selection process. I am not looking for "how to create this grid", but rather for how I would go about the selection process based on the sliding percentage. I am primarily a JavaScript developer, but snippets in any language are appreciated, if applicable.
My initial assumption is that there is some sort of weighting involved (i.e. favouring pairs over suited cards, and suited over off-suited cards), or could it just be predetermined and I'm overthinking this?
In my opinion there should be something along the lines of a "grouping" step AND a subsequent "weighting" process. It should also be customisable by the user, to provide an optimal experience.
For example, if you look at the below:
https://en.wikipedia.org/wiki/Texas_hold_%27em_starting_hands#Sklansky_hand_groups
These are/were standard hand rankings created back in the 1970s/1980s; since then, however, hand selection has become much more complicated. These kinds of groupings have changed a lot in 30 years, so poker players will want a custom user experience here.
But let's take a basic preflop scenario.
Combinations: pairs = 6, suited = 4, off-suited = 12
1 (AA:6, KK:6, QQ:6, JJ:6, AKs:4) = 28 combos
2 (AQs:4, TT:6, AK:16, AJs:4, KQs:4, 99:6) = 40
3 (ATs:4, AQ:16, KJs:4, 88:6, KTs:4, QJs:4) = 38
....
9 (87s:4, QT:12, Q8s:4, 44:6, A9:16, J8s:4, 76s:4, JT:16) = 66
Say, for instance, we only reraise the top 28/1326 of combinations (in theory there should be some deduction here, but for simplicity let's ignore that). We are then 3betting or reraising a very obvious and small percentage of hands; our holdings are apparent at around 2-4% of total hands. So a player may want to disguise their reraise or 3bet range with, say, 50% of the weakest hands from group 9, as a basic example.
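One way to implement the slider itself, whatever grouping and weighting you settle on, is to keep the hands in ranked order with their combo counts and accumulate combos until the selected percentage of the 1326 total is reached. A rough Python sketch (the ranking shown is illustrative, not a recommendation):

TOTAL_COMBOS = 1326  # 52 choose 2

# (hand, combos): pairs = 6, suited = 4, off-suited = 12
RANKED_HANDS = [("AA", 6), ("KK", 6), ("QQ", 6), ("JJ", 6), ("AKs", 4),
                ("AQs", 4), ("TT", 6), ("AKo", 12), ("AJs", 4), ("KQs", 4), ("99", 6)]

def select_range(percent):
    """Return the top hands whose cumulative combo count fits within percent of all combos."""
    budget = percent / 100.0 * TOTAL_COMBOS
    selected, used = [], 0
    for hand, combos in RANKED_HANDS:
        if used + combos > budget:
            break
        selected.append(hand)
        used += combos
    return selected, used

# select_range(2.12) -> (['AA', 'KK', 'QQ', 'JJ', 'AKs'], 28), i.e. the 28/1326 example above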
Different decision trees and game theory can be used in "range building", so a simple ordered list may not be suitable for what you're trying to achieve. It depends on your program's purpose.
That said, if you're just looking to build an ordered list, you could take the X% of hands that players open with (say the average is 27%) and run a hand-equity calculator simulation, tweaking the GitHub project below to get different hand rankings. https://github.com/andrewprock/pokerstove
There are also some lists at the bottom of this page:
http://www.propokertools.com/help/simulator_docs
Be lucky!
I was sick, so I missed my past two classes. I was wondering if someone could help me figure out how to solve this problem so that I can study it and try to understand it. I need pseudocode for this problem; I feel like I'm falling a little behind:
The Vernon Hills Mail-Order Company often sends multiple packages per order. For each customer order, output enough mailing labels to use on each of the boxes that will be mailed. The mailing labels contain the customer’s complete name and address, along with a box number in the form Box 9 of 9. For example, an order that requires three boxes produces three labels: Box 1 of 3, Box 2 of 3, and Box 3 of 3. Design an application that reads records that contain a customer’s title (for example, Mrs.), first name, last name, street address, city, state, zip code, and number of boxes. The application must read the records until eof is encountered and produce enough mailing labels for each order.
Write each separate step on a line of its own, and draw arrows between them to indicate that one step is followed by the next.
That will process one "order". Since an order may consist of multiple boxes, look for where you can loop in this part: draw a small arrow back up to the step where processing should restart for each individual box in an order.
At the end of this diagram you have processed a single "order", so now look for where the main loop should restart, and on what condition.
With this done you have a flow chart: a purely visual aid, which you can translate into pseudocode (or, for that matter, directly into any programming language that has the right commands). So all that's left is to translate the graphical arrows into appropriate pseudocode.
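For illustration, here is that two-loop structure in Python (the record fields and sample data are hypothetical; the real program would keep reading records until eof):

# A hypothetical batch of records; the real program reads these until eof.
orders = [
    {"title": "Mrs.", "first": "Ada", "last": "Lovelace", "street": "1 Main St",
     "city": "Vernon Hills", "state": "IL", "zip": "60061", "boxes": 3},
]

# Outer loop: one iteration per order (this stands in for "read until eof").
for order in orders:
    # Inner loop: one label per box in this order.
    for box in range(1, order["boxes"] + 1):
        print(order["title"], order["first"], order["last"])
        print(order["street"])
        print(order["city"] + ",", order["state"], order["zip"])
        print("Box", box, "of", order["boxes"])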
So I'm building a rock paper scissors bot, and I need people to be able to trust that the bot doesn't "cheat" and make its selection after the player chooses their throw.
Normally, for machine verification, this is done by hashing the choice (perhaps with a salt) and then revealing the choice and salt. But I want something that can be "instantly" verifiable by a human. If I just hash the choice, people would cry foul about a rigged hash.
So my idea is to have a "visual hashing algorithm" of sorts -- a hashing algorithm that humans can perform themselves trivially and easily, and verify.
What my idea is right now is to have three boxes, Rock, Paper, and Scissors, and then three other unlabeled boxes A, B, and C across from the RPS boxes. Then I connect Rock to one of them using tangled lines, Paper to another, and Scissors to another. The lines are tangled so that it would take time to "follow back" the line from box B to, say, Scissors.
When the computer picks its throw, it "highlights" the box corresponding to the throw - that is, if Scissors' tangled line leads to Box B, it highlights Box B, but it won't reveal that it was Scissors. The human is then given, say, 3 seconds to pick a throw. 3 seconds, hopefully, is not enough time for them to untangle the lines and trace back from Box B to Scissors.
Then, when the human picks their throw, the computer reveals Scissors, and also highlights the tangled line from Scissors to Box B, so that it is clear that Scissors led to Box B this entire time and the computer couldn't have cheated.
While this would work, I think it's a little ugly and inelegant. The human can easily verify that the computer didn't cheat, but at the same time it seems unusual or weird, and it introduces so many UI elements that the screen might seem cluttered or untrustworthy. The fewer UI elements and the smaller the graphical footprint, the better.
Are there any existing solutions that solve this issue? The requirements:
1.) The "hash" of the throw is presented, as well as the hashing algorithm, which takes time (at least 3 seconds) to "undo".
2.) When the throw is revealed, it should be easily, visually, and immediately identifiable that the hashing algorithm was performed validly and that the throw does indeed correspond to the hash.
3.) It uses as few UI elements as possible and has as small a graphical footprint as possible.
This is interesting. An idea in my head would be to display a 10x10 grid (of, say, 5 pixels per square) with a key:
Red: Rock; Blue: Scissors; Green: Paper
Then fill the grid randomly with 33 red, 33 blue, and 33 green cells, plus 1 extra cell in one of the 3 colours. A human would struggle to spot which colour has 34 cells in a short time period, but the count could be revealed on user input, along with optionally expanding the grid, highlighting the cells, etc.
A small UI footprint, and neater than your solution, but whether it's good enough...
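A quick sketch of how that grid could be generated (illustrative Python, not a vetted commitment scheme):

import random

COLOURS = {"rock": "red", "scissors": "blue", "paper": "green"}

def make_grid(throw):
    """Build a 10x10 grid: 33 cells of each colour, plus one extra of the chosen colour."""
    cells = ["red"] * 33 + ["blue"] * 33 + ["green"] * 33 + [COLOURS[throw]]
    random.shuffle(cells)
    return [cells[row * 10:(row + 1) * 10] for row in range(10)]

grid = make_grid("scissors")
# After the player commits, reveal the counts: blue shows 34, so the throw was scissors.
print({c: sum(row.count(c) for row in grid) for c in ("red", "blue", "green")})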
You have 3 seconds: which calculation is correct?
R: 317 * 27 = 8829
P: 297 * 16 = 5605
S: 239 * 38 = 9082
When I tell you my answer, you could quickly verify with a calculator.
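Generating such a puzzle is straightforward; for example (a hypothetical sketch where exactly one product is computed correctly, and its letter encodes the throw):

import random

def make_puzzle(throw):
    """Return three multiplication lines; only the line for the chosen throw is correct."""
    lines = []
    for letter in ("R", "P", "S"):
        a, b = random.randint(200, 400), random.randint(10, 40)
        result = a * b
        if letter != throw:
            result += random.randint(50, 900)  # deliberately wrong for the two decoys
        lines.append("%s: %d * %d = %d" % (letter, a, b, result))
    return lines

for line in make_puzzle("S"):
    print(line)
# After the reveal, anyone can check "S" on a calculator; it is the only correct product.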
What's the rationale behind the formula used in the hive_trend_mapper.py program of this Hadoop tutorial on calculating Wikipedia trends?
There are actually two components: a monthly trend and a daily trend. I'm going to focus on the daily trend, but similar questions apply to the monthly one.
In the daily trend, pageviews is an array of number of page views per day for this topic, one element per day, and total_pageviews is the sum of this array:
from math import log, sqrt

# (The wrapper name and imports here are assumed; the body is the tutorial's snippet.)
def daily_trend(pageviews, total_pageviews):
    # pageviews for most recent day
    y2 = pageviews[-1]
    # pageviews for previous day
    y1 = pageviews[-2]
    # Simple baseline trend algorithm
    slope = y2 - y1
    trend = slope * log(1.0 + int(total_pageviews))
    error = 1.0 / sqrt(int(total_pageviews))
    return trend, error
I know what it's doing superficially: it just looks at the change over the past day (slope), and scales this by the log of 1 + total_pageviews (log(1) == 0, so this scaling factor is non-negative). It can be seen as treating the month's total pageviews as a weight, but one that is tempered as it grows - this way, the total pageviews stop making a difference for things that are "popular enough", but at the same time big changes on insignificant topics don't get weighted as much.
But why do this? Why do we want to discount things that were initially unpopular? Shouldn't big deltas matter more for items that have a low constant popularity, and less for items that are already popular (for which the big deltas might fall well within a fraction of a standard deviation)? As a strawman, why not simply take y2-y1 and be done with it?
And what would the error be useful for? The tutorial doesn't really use it meaningfully again. Then again, it doesn't tell us how trend is used either - this is what's plotted in the end product, correct?
Where can I read up for a (preferably introductory) background on the theory here? Is there a name for this madness? Is this a textbook formula somewhere?
Thanks in advance for any answers (or discussion!).
As the in-line comment says, this is a simple "baseline trend algorithm", which basically means that before you compare the trends of two different pages, you have to establish a baseline. In many cases the mean value is used; it's straightforward if you plot the pageviews against the time axis. This method is widely used in monitoring water quality, air pollutants, etc., to detect any significant changes w.r.t. the baseline.
In the OP's case, the slope of pageviews is weighted by the log of total_pageviews. This effectively uses total_pageviews as a baseline correction for the slope. As Simon put it, this strikes a balance between two pages with very different total_pageviews.
For example, A has a slope of 500 over 1,000,000 total pageviews, while B has a slope of 1,000 over 1,000. The log basically means 1,000,000 is ONLY twice as important as 1,000 (rather than 1,000 times). If you only consider the slope, A is less popular than B; but with the weight, the measure of popularity of A comes out about the same as B's. I think this is quite intuitive: though A's daily increase is only 500 pageviews, that's because it's saturating, so you still have to give it enough credit.
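Plugging those numbers into the tutorial's formula bears this out (a quick check using the natural log, as in the code):

from math import log

# Page A: slope 500 on 1,000,000 total pageviews; page B: slope 1,000 on 1,000.
trend_a = 500 * log(1.0 + 1000000)   # ~6907.8
trend_b = 1000 * log(1.0 + 1000)     # ~6908.8
# The raw slopes differ by 2x, but the weighted trends come out nearly identical.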
As for the error, I believe it comes from the (relative) standard error, which has a factor of 1/sqrt(n), where n is the number of data points. In the code, the error is equal to (1/sqrt(n)) * (1/sqrt(mean)). It roughly translates to: the more data points, the more accurate the trend. I don't see it as an exact mathematical formula, just a rough trend-analysis heuristic; in any case the relative value is what matters in this context.
In summary, I believe it's just an empirical formula. More advanced treatments can be found in some biostatistics textbooks (this is very similar to monitoring the outbreak of a flu or the like).
The code implements a statistical technique (in this case the "baseline trend"); you should read up on that and everything will become clearer. Wikibooks has a good introduction.
The algorithm takes into account that new pages are by definition less popular than existing ones (because, for example, they are linked from relatively few other places) and suggests that those new pages will grow in popularity over time.
error is the error margin the system expects for its prognoses. The higher error is, the less likely it is that the trend will continue as expected.
The reason for moderating the measure by the volume of clicks is not to penalise popular pages, but to make sure that you can compare large and small changes with a single measure. If you just use y2 - y1, you will only ever see the click changes on large-volume pages. What this is trying to express is "significant" change: a 1,000-click change when you normally attract 100 clicks is really significant; a 1,000-click change when you attract 100,000 is less so. What this formula is trying to do is make both of these visible.
Try it out at a few different scales in Excel, you'll get a good view of how it operates.
Hope that helps.
Another way to look at it is this: suppose your page and my page are created on the same day, and your page gets about ten million total views while mine gets about one million up to some point. Then suppose the slope at that point is a million for me and half a million for you. If you just use the slope, I win; but your page already had more views per day at that point - yours was getting 5 million a day and mine 1 million, so my extra million still only makes 2 million for that day, while yours is 5.5 million. So maybe this scaling concept is trying to adjust the results to show that your page is also a good trend-setter: its slope is smaller, but it was already more popular. And since the scaling is only a log factor, it doesn't seem too problematic to me.
suppose your page and my page are made at same day, and ur page gets total views about ten million, and mine about 1 million till some point. then suppose the slope at some point is a million for me, and 0.5 million for you. if u just use slope, then i win, but ur page already had more views per day at that point, urs were having 5 million, and mine 1 million, so that a million on mine still makes it 2 million, and urs is 5.5 million for that day. so may be this scaling concept is to try to adjust the results to show that ur page is also good as a trend setter, and its slope is less but it already was more popular, but the scaling is only a log factor, so doesnt seem too problematic to me.