11th October 2016, 08:55 AM  #241 
Join Date: Aug 2010
11th October 2016, 09:01 AM  #242 
Join Date: Dec 2007
Location: Netherlands
11th October 2016, 09:07 AM  #243 
Join Date: Aug 2010
But the simulations start with percentages, right? That's the role of the polls, I presume (weighted, but that matters little).
I should think that if you have percentages which are fed into the simulations, then the simulations are unnecessary. A decent statistician should be able to tell you the odds without running simulations. I admit that I must be missing something, since Silver is a decent statistician and I am not. 
11th October 2016, 09:43 AM  #244 
Join Date: Jul 2004
The simulation algorithm is described here: http://fivethirtyeight.com/features/...tionforecast/
It's quite complex, and digs down past the numbers into things like demographics, instead of treating the states as independent blocks. This prevents the simulation from doing things like calling Pennsylvania for Trump, but Ohio, with similar demographics but slightly more conservative leanings, for Clinton. 
11th October 2016, 05:14 PM  #245 
Join Date: Nov 2014
11th October 2016, 11:01 PM  #246 
Join Date: Dec 2003
Location: Santa Barbara, CA
12th October 2016, 12:57 AM  #247 
Join Date: Aug 2006
Location: Pattaya, Thailand
Yep.
I can't believe the people arguing for the sanctity of St. Nate's data. I don't disagree that the data reflects something, but it's shown as EC votes, not Halpern's Hypothesis of EC Votes. The number changes if you hit NowCast, Polls Only, or Polls Plus. In none of them does it reflect what those options are predicting. It's just silly. All his percentile scores for the three different models address the probabilities. Since he is best at forecasting state results, and the rules (including ME2 and NE2) are known, you can still count the EC votes. I can almost do the figures in my head, frankly. It's just a strange number to have with merely a label that says EC Votes. 
12th October 2016, 05:25 AM  #248 
Join Date: Dec 2007
Location: Netherlands
I don't think anyone is arguing for the sanctity of Nate's data. It isn't his data anyway, he doesn't poll himself; he only does an analysis of others' data, so at most people could be arguing for the sanctity of Nate's modeling.
And also, it isn't silly. It's just that the math is a bit more complicated than you think. I have to backtrack on my own words again. For the expected number of EC votes, you can simply multiply the chance of winning in each state by its number of EC seats and add those up. It doesn't matter for that single number, the "expected value", that the races in the various states are not independent. But that single number doesn't give the whole picture. It doesn't say anything about the probability distribution among the possible outcomes. Right now, the expected value of Hillary's EV is around 330. If the distribution of outcomes is something "normal", like a Gaussian distribution or more likely a Poisson distribution, yes, then her chance of getting a majority in the EC is something like 90%. But the distribution could also be something different. In the extreme case, the results in all state races could be completely correlated: win all or lose all. An expected value of 330 EV could also be the result of 50% of simulations coming up with 130 EV, and the other 50% coming up with 530 EV. In which case Hillary's chance of winning the presidency would be only 50%.
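A minimal sketch of that point in Python. The 130/530 split is the hypothetical from the post above; the "narrow" distribution is an assumption made up here so that it has the same mean. Neither is 538 output.

```python
MAJORITY = 270  # electoral votes needed to win

# Each distribution maps an EV total to its probability (illustrative only).
all_or_nothing = {130: 0.5, 530: 0.5}        # states perfectly correlated
narrow = {300: 0.25, 330: 0.5, 360: 0.25}    # assumed tight spread, same mean

def expected_ev(dist):
    """Probability-weighted average of the EV totals."""
    return sum(ev * p for ev, p in dist.items())

def win_probability(dist):
    """Total probability of the outcomes that reach a majority."""
    return sum(p for ev, p in dist.items() if ev >= MAJORITY)

for name, dist in [("all-or-nothing", all_or_nothing), ("narrow", narrow)]:
    print(name, expected_ev(dist), win_probability(dist))
```

Both distributions have an expected value of 330 EV, yet the chance of a majority is 50% in one and 100% in the other, so the expected value alone cannot give the win probability.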
12th October 2016, 06:09 AM  #249 
Join Date: Mar 2004
Location: Qatar
538 has now updated their "polls-only" results to 86% for Clinton, and Arizona is now (slightly) blue. Arizona is still red in their "polls-plus" prediction.

12th October 2016, 06:27 AM  #250 
Join Date: Sep 2001
That's what I'd assume.
Consider the following: suppose they had predicted that for every state, Clinton had a 90% chance of winning (ignore the Maine, Nebraska and DC issues). Now, if that is exactly true, then we should expect her to LOSE 5 states. And the average number of electoral votes would be 53.8 for Trump and the rest for Clinton. The problem is, you don't know which 5 states she would end up losing. Or whether it would be 4, or 6. The chance that she wins them all is only 0.5%. It could be that the 5 states she loses would be the 5 largest in electoral votes. In that case, Trump would get a lot more than 50 votes. Alternatively, maybe it is the 5 smallest states, in which case he might get 20. Now, in this scenario, either of these outcomes is equally likely.
However, in the real case, where probabilities vary all over the place, the math is really hard. So in that situation, it's probably a lot easier to run 10 million simulations using the probabilities for each state and look at the outcomes that way. If I had the probabilities, I could do it easily.
I will say, however, that I think Silver underestimates the probabilities for the states, or at least has been recently. In fact, his claim that he correctly called 99/100 states in the last two elections would suggest that. Unless his probabilities are in the 99% range, he should be getting a lot more wrong than he is. As I pointed out above, if all the states have a 90% probability, the odds of getting them all right are only 0.5%. I pointed this out after the last election. Silver's model isn't working as well as he asserts, because if he was right, he'd be getting a lot more wrong. If that makes any sense.
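The arithmetic in this post can be checked in a few lines of Python. This assumes, as the post does, 50 independent states that each go to Clinton with probability 0.9, and it applies the 10% loss rate to all 538 electoral votes.

```python
p_win = 0.9      # assumed chance Clinton wins any one state
n_states = 50
total_ev = 538

p_sweep = p_win ** n_states                 # chance of winning all 50 states
expected_losses = n_states * (1 - p_win)    # average number of states lost
expected_trump_ev = total_ev * (1 - p_win)  # average EV going to Trump

print(f"sweep: {p_sweep:.4f}")
print(f"expected states lost: {expected_losses:.1f}")
print(f"expected Trump EV: {expected_trump_ev:.1f}")
```

The sweep probability comes out at roughly 0.005, which is the 0.5% quoted above. That figure relies on the independence assumption.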
12th October 2016, 06:28 AM  #251 
Join Date: Sep 2012
Location: UK
12th October 2016, 06:31 AM  #252 
Join Date: Aug 2010
The highlighted bit is the part that surprises me. Seems that the math could be settled more easily than running simulations.
But, of course, I know very little about statistics and Nate Silver knows a lot about statistics, and so I presume that my surprise is due to ignorance. 
12th October 2016, 07:40 AM  #253 
Join Date: Sep 2001
Probably not, because you basically have to consider every possibility. So "What is the probability of winning these 49 states and losing 1?" That's pretty easy, but now you have to do it with each of the states as the losing state. That means 50 calculations right there. And now, what about 48-2? Well, there are 1225 combinations there, right (50*49/2)? And then for 47-3, there are 19600 possibilities. And for 36-14, there are something like 1e12 possibilities.
You could either calculate them all, or you could just run a simulation of 10 million outcomes and see what is most likely. You can always calculate the average; that's easy. It's just the probability of winning the state * the electoral votes in the state, summed over the states. But to calculate the most likely individual outcomes? The sample space is way too large.
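A rough Python sketch of both halves of this post: counting the combinations, and running a simulation instead. The five (electoral votes, win probability) pairs are invented for illustration, and the states are treated as independent, which is the assumption this post makes and not necessarily what 538 does.

```python
import math
import random

# Number of ways to choose which states are lost, for a few splits.
for lost in (1, 2, 3, 14):
    print(f"{50 - lost}-{lost}:", math.comb(50, lost))

# Hypothetical (electoral votes, win probability) pairs, not real forecasts.
STATES = [(10, 0.90), (20, 0.80), (29, 0.50), (38, 0.30), (55, 0.95)]

def simulate_once(rng):
    """One run: each state is won independently with its own probability."""
    return sum(ev for ev, p in STATES if rng.random() < p)

def run(n_runs=20_000, seed=1):
    rng = random.Random(seed)
    return [simulate_once(rng) for _ in range(n_runs)]

totals = run()
exact_mean = sum(ev * p for ev, p in STATES)   # the "easy" average
sim_mean = sum(totals) / len(totals)           # the same average, simulated
majority = sum(ev for ev, _ in STATES) / 2
win_share = sum(t > majority for t in totals) / len(totals)

print("exact mean:", exact_mean, "simulated mean:", round(sim_mean, 2))
print("share of runs with a majority:", win_share)
```

The simulated mean should land close to the exact one, while the share of runs with a majority is the number that is tedious to get by enumerating outcomes by hand.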
12th October 2016, 07:51 AM  #254 
Join Date: Dec 2007
Location: Netherlands
No, it's not as easy as "either of these outcomes is equally likely". The outcomes in the various states are not independent; they're correlated. Someone mentioned Ohio and Pennsylvania above. Those states have similar demographics, with Ohio traditionally more conservative leaning. If Ohio votes more Democratic than the polls indicate (due to something that happens between now and Nov 8), then surely Pennsylvania will also be more Democratic. No way you're going to see on Nov 9 that Ohio voted Democratic and Pennsylvania voted Republican.
I'm not 100% sure on the need for simulations, but I think this is where they come in. Nate's model does not only take into account the various polls out there, with their trustworthiness and their bias, but his model also contains correlations between voting habits of the various states. And the easiest way to calculate is to run simulations that take those correlations into account. So when one simulation gives Ohio an outcome +2 for Clinton compared to the polls, it will simulate the Pennsylvania outcome also with a +2 slant for Clinton (or a bit more sophisticated than that). However, on election day the numbers will have stabilized and the margins of error of the pollsters are small, so the confidence of calling a state will have greatly increased. And another factor is the correlation. It means you can't treat the 50 states as independent probabilities. Exaggerating, that means: he's either spot-on or he's off in a handful of states at a time.
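A toy Python sketch of the "+2 slant" idea. Everything numeric here is an assumption for illustration: the poll margins, the size of the shared national error and the size of the state-level noise. It is not 538's actual model. Each run draws one shared shift that is applied to both states, plus a small independent error per state.

```python
import random

# Hypothetical Clinton leads in points; not real polling averages.
POLL_MARGIN = {"OH": 1.0, "PA": 6.0}
RUNS = 20_000

def split_rate(shared_sd, state_sd, seed=538):
    """Fraction of runs in which Clinton wins OH but loses PA."""
    rng = random.Random(seed)
    splits = 0
    for _ in range(RUNS):
        # One national error per run, shared by both states.
        national = rng.gauss(0.0, shared_sd) if shared_sd > 0 else 0.0
        oh = POLL_MARGIN["OH"] + national + rng.gauss(0.0, state_sd)
        pa = POLL_MARGIN["PA"] + national + rng.gauss(0.0, state_sd)
        if oh > 0 and pa < 0:
            splits += 1
    return splits / RUNS

# Same total error variance per state (16 + 1 = 17) in both setups.
correlated = split_rate(shared_sd=4.0, state_sd=1.0)
independent = split_rate(shared_sd=0.0, state_sd=17 ** 0.5)

print("OH Clinton / PA Trump, correlated errors: ", correlated)
print("OH Clinton / PA Trump, independent errors:", independent)
```

With the shared shift, the "Ohio blue, Pennsylvania red" outcome almost never shows up; with independent errors of the same overall size it appears in a noticeable share of runs.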
12th October 2016, 10:11 AM  #255 
Join Date: Jun 2011
The NYTimes has a good article on why the USC poll is such an outlier. A lot of it comes down to one 19-year-old black guy in Illinois.
http://www.nytimes.com/2016/10/13/up...ages.html?_r=0 
12th October 2016, 02:44 PM  #256 
Join Date: Nov 2014
NYT: 89%
538: 87%
Daily Kos: 96%
HuffPost: 91%
PredictWise: 89%
PEC: 97%
12th October 2016, 04:47 PM  #257 
Join Date: May 2009
Location: Central Vale of Humility (USA, sort of)
12th October 2016, 07:13 PM  #258 
Join Date: Nov 2014
12th October 2016, 07:26 PM  #259 
Join Date: Aug 2006
Location: Pattaya, Thailand
The snake graph is very important. If you look at RCP, they have her at 260, but haven't conceded her ME1 (2 EC votes) and MN (10). Every other aggregator has those in her column already. She has a very VERY conservative 272, and those states that they dream of in Trump Tower (NH, WI, MI, PA) are not turning back.
More important, though, is that the 272 doesn't include OH, NC, FL or NV. She doesn't need the favorite battleground states of the last two elections, although she will take three and possibly four. RCP is doing yeomanlike duty trying to make it look close for their conservative readership. 
12th October 2016, 08:40 PM  #260 
Join Date: Nov 2014
12th October 2016, 08:49 PM  #261 
Join Date: Mar 2007
Don't entirely agree with it. Iowa is a tossup, not leaning GOP, whereas Utah is pretty solidly GOP unless, and this is the rider, the Mormon Church abandons Trump for Johnson (I don't see them going to Hillary), in which case Johnson may take Utah. I'd also say that NC is leaning Dem currently.

12th October 2016, 08:57 PM  #262 
Join Date: Dec 2012
13th October 2016, 10:40 PM  #263 
Join Date: Nov 2014
Texas may be closer than Pennsylvania. New poll has Donald only +4 there.
http://www.wfaa.com/mb/news/local/te...rror/335896258 Romney won Texas by almost 16 points. 
13th October 2016, 10:43 PM  #264 
Join Date: Sep 2007
13th October 2016, 11:15 PM  #265 
Join Date: Sep 2012
Location: UK
13th October 2016, 11:18 PM  #266 
Join Date: Nov 2014
13th October 2016, 11:18 PM  #267 
Join Date: Oct 2006
Location: Belgium
13th October 2016, 11:55 PM  #268 
Join Date: Sep 2012
Location: UK
14th October 2016, 12:05 AM  #269 
Join Date: Sep 2011
You're assuming that the state outcomes are independent. They're not. This means that you can't simply multiply the probability that state A will go to Clinton and the probability that state B will go to Clinton to get the joint probability that states A and B both go to Clinton. For example, say there is an 80% chance that NM goes to Clinton and an 80% chance that CO goes to Clinton. If the states' outcomes were independent, then there would be a 64% chance that both states go to Clinton, a 4% chance that they both go to Trump, and a 32% chance that they split. But say, at the extreme, that these two states are perfectly correlated: whichever way one goes, so does the other. Then, there is an 80% chance that they both go to Clinton, a 20% chance that they both go to Trump, and no chance that they split. So, in this scenario, Nate either gets both states right or both states wrong; it's one extreme or the other. The state outcomes are, in fact, correlated (though not perfectly, of course). What this means is that across 50 state predictions, Nate will tend to get either more states right or more states wrong than his individual state probabilities suggest.
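The NM/CO numbers above, worked out in Python. The 80% figure is the hypothetical from this post, not a forecast.

```python
p = 0.8  # assumed chance that each of NM and CO goes to Clinton

# Independent states: multiply the individual probabilities.
independent = {
    "both Clinton": p * p,
    "both Trump": (1 - p) * (1 - p),
    "split": 2 * p * (1 - p),
}

# Perfectly correlated states: they always move together.
perfectly_correlated = {
    "both Clinton": p,
    "both Trump": 1 - p,
    "split": 0.0,
}

for label, dist in [("independent", independent),
                    ("perfectly correlated", perfectly_correlated)]:
    print(label, {k: round(v, 2) for k, v in dist.items()})
```

In both cases each state on its own still goes to Clinton 80% of the time; only the joint outcomes change.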
14th October 2016, 12:25 AM  #270 
Join Date: Dec 2003
Location: Santa Barbara, CA
I don't mean to but I don't think that is right.
Nate's core tool is a simulation. Just one single run. In that one simulation there are no statistics at all. Each candidate is allocated the deterministic (not statistical) EC votes that that simulation produced. Next, he does that a gazillion times. NOW, and only now, is when the statistical work is done. He has a distribution of EC votes per candidate and uses some statistic (mean, mode, etc. I don't know) to come up with his final probability. The absolute key here is that there is NO state-by-state averaging done.
14th October 2016, 12:37 AM  #271 
Join Date: Dec 2003
Location: Santa Barbara, CA
Again, that is not a valid approach using Monte Carlo tools (which is what Nate's simulations are). He does NOT predict statistics by state. Each simulation gives a deterministic number, NOT a probability. And if you take the totality of his simulations and compute probability statistics for each state, you CANNOT use the process you are talking about here to analyze the data.
No, that part is not right. That part is exactly right. No, that part is not right. You cannot take his state probabilities to do statistics because his core prediction is EC votes, NOT state probabilities for each candidate. You're right in the sense that if he was predicting what you have asserted, then you would be right. But the premise is wrong. The fact that he is not getting a lot more wrong should be a big clue as to why *your* statistical analysis is wrong. He is NOT predicting state probabilities.
14th October 2016, 12:40 AM  #272 
Join Date: Dec 2003
Location: Santa Barbara, CA
Those kinds of maps are so bogus. No wonder Fox uses them. They imply that votes are allocated by acreage, not by population. Now, if they would make the state sizes by number of EC votes then such a depiction would have merit. But then the map would be so completely distorted that you might have trouble reading it.

14th October 2016, 12:45 AM  #273 
Join Date: Dec 2003
Location: Santa Barbara, CA
jt512, please read my previous posts and you will hopefully see that the simulations would include the correlations you are talking about. Thus, you are correct that the individual states are not independent so the state-by-state analysis cannot be done as you state. So you are correct, but for the wrong reason.

14th October 2016, 12:48 AM  #274 
Join Date: Nov 2014
14th October 2016, 01:39 AM  #275 
Join Date: Sep 2011
14th October 2016, 06:24 AM  #276 
Join Date: Dec 2007
Location: Netherlands
Yes, that's right.
I think we largely agree on what Nate's doing. Only I would not say his core tool is the simulation. It's his model underlying that simulation. The process has the following steps.
First, there's the input data, of two kinds:
1) polling data from various polls, which is continuously added to by new polls
2) demographic data, which is static throughout the election process
Second, there's Nate's model which says how to interpret these data. The model says how to weigh each poll: some are better than others, some have an inherent bias to one of the parties. The model also interprets the demographic data into correlations between the outcomes in the various states. To take jt512's hypothetical example: if the demographics of CO and NM are exactly the same, the model translates this into those states voting identically. This is exaggerated; the model is certainly more subtle than that, and probably also takes polling data into account for establishing correlation between the various states.
Third, the model produces 54 stochasts, to put it in mathematical terms, one for each of the separate state elections. That is, you have probability distributions for each state of what percentage each party will get. To make a crude layman's analogy, you now have 54 dice with a "Clinton" side and a "Trump" side. Those dice are all differently weighted, and the correlations from the model say that some dice are connected. Those CO and NM dice, to carry on that example, have a 100% correlation, so they're effectively glued together. The OH and PA dice are more loosely tied together, so that whenever the OH die turns up "Clinton", the PA one will too.
Fourth, those stochasts are what he runs his simulation with. Each run of the simulation gives one discrete outcome, e.g., Trump wins OH with x% margin and Clinton wins PA with y% margin, and overall, Clinton has n EV and Trump m. He runs the simulation a gazillion times. Yes, that's simply a Monte Carlo run.
Fifth, all the numbers on the 538 page are the averages over those gazillion simulation runs. The probabilities per state that Clinton wins are simply the percentage of simulation runs she won, and the expected value of Clinton's EC vote is also the average number of EC votes over all of those simulations.
And this is where I struggle a bit with why Nate needs simulations at all. You can only run a simulation when you have a probability distribution to begin with. The probability distribution for, say, Ohio, already rolls out of his model and is plugged into the simulation algorithm. Basically, already at step (3) you can say "Clinton has a 65.1% chance of winning Ohio". I surmise it's in the correlations between the various states that his model is too difficult to simply be calculated and that he needs a Monte Carlo run. I admit to a bit of a bias against Monte Carlo runs, mainly because I see all too often people making simulations for trivial questions, like "what is the chance of throwing 7 with 2 dice?" which you can perfectly calculate with pencil and paper. But Nate is a professional statistician, he surely knows what he's doing.
Finally, to come back to my statement you objected to. Yes, that statement is true. Let's do that in proper mathematical terms, and define stochasts:
D_{OH} = number of Democratic electoral votes from Ohio
That's a stochast with outcome either 0 or 18. The chance that it is 18 is the 65.1% that comes out of the simulation. Define the respective stochasts for all state races. Then define the stochast:
D_{USA} = number of overall Democratic electoral votes
which is a stochast with discrete values between 0 and 538. Then it obviously holds that:
D_{USA} = SUM (i in states) D_{i}
And then basic probability theory says about their expected values:
E(D_{USA}) = SUM (i in states) E(D_{i})
It doesn't matter for the latter formula whether the various state stochasts are independent or not (they're not).
Dependence doesn't matter for the single number "expected value"; it does matter for the distribution.
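A small Python check of that formula, under stated assumptions: three states only, with the 65.1% for Ohio taken from this post and the probabilities for PA and FL made up for illustration. It builds the exact distribution of the total twice, once with independent dice and once with dice "glued together" (one shared uniform draw decides every state), and compares the means.

```python
from itertools import product

# state: (electoral votes, P(Clinton wins)); PA and FL values are hypothetical.
STATES = {"OH": (18, 0.651), "PA": (20, 0.80), "FL": (29, 0.55)}

def sum_of_expectations(states):
    """SUM over states of E(D_i) = EV_i * p_i."""
    return sum(ev * p for ev, p in states.values())

def independent_distribution(states):
    """Exact distribution of the total when states are independent."""
    dist = {}
    items = list(states.values())
    for wins in product([0, 1], repeat=len(items)):
        prob, total = 1.0, 0
        for won, (ev, p) in zip(wins, items):
            prob *= p if won else 1 - p
            total += ev * won
        dist[total] = dist.get(total, 0.0) + prob
    return dist

def glued_distribution(states):
    """Exact distribution when one shared uniform draw u decides every
    state: Clinton wins state i exactly when u < p_i."""
    dist = {}
    cuts = sorted({0.0, 1.0, *(p for _, p in states.values())})
    for lo, hi in zip(cuts, cuts[1:]):
        total = sum(ev for ev, p in states.values() if p > lo)
        dist[total] = dist.get(total, 0.0) + (hi - lo)
    return dist

def mean(dist):
    return sum(total * prob for total, prob in dist.items())

indep = independent_distribution(STATES)
glued = glued_distribution(STATES)
print("sum of expectations:", sum_of_expectations(STATES))
print("mean, independent:  ", mean(indep), "outcomes:", sorted(indep))
print("mean, glued:        ", mean(glued), "outcomes:", sorted(glued))
```

The three means agree, while the set of possible totals and their probabilities differ between the two couplings, which is the distinction drawn in the last line of the post.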
14th October 2016, 11:02 AM  #277 
Join Date: Sep 2007
Arizona (Data Orbital; 10/11-12)
Within margin of error but Clinton is ahead in a poll in Arizona.
14th October 2016, 11:22 AM  #278 
Join Date: Jan 2007
Location: UK
The highlighted part. I am not a mathematician, but my understanding of a Monte Carlo analysis is that one could play around with the distributions a bit, i.e. instead of using a single probability distribution for, say, Nevada, one has a distribution of the distributions, so the weighting could be altered over the runs. I suspect that would be pretty ugly to work out analytically (or even if you do just have the probability distribution for each of the 50 states).

14th October 2016, 01:56 PM  #279 
Join Date: Dec 2007
Location: Netherlands
Thank you. I am a mathematician by education, but I took only a few probability and statistics classes. I never quite liked the subject from a philosophical point of view; for me maths is about certainty (the Dutch word wiskunde also means that). There must be at least something like a shifting probability distribution to model the correlation between states' outcomes.

14th October 2016, 01:59 PM  #280 
Join Date: Jan 2007
Location: UK
