Alright, time for the big reveal. With comment section chatter over possible unbannings increasing, it’s time for the hard data to make its own argument. I had less fun with this test than with previous efforts, mostly because I have a long and difficult history with Bloodbraid Elf. That said, the testing was comparatively easy and painless. The combination of straightforward play and considerable experience paid off with this test.
In this series, I take a card from the Modern Banned and Restricted List and test it against a gauntlet of current Tier 1 decks. I am trying to evaluate its power in the current field and determine if it is plausible to unban. So far I’ve tested Stoneforge Mystic, Jace, the Mind Sculptor, and Preordain. Now it’s time for Bloodbraid Elf. I’ve been discussing its history, how it would fit into current decks, and finally the intangibles of playing the card. Today we’ll see the hard test data.
An All-Encompassing Disclaimer
These are the results from my experiment. It is entirely possible that repetition will yield different results. This project models the effect that the banned card would have on the metagame as it stood when the experiment began. This result does not seek to be definitive, but rather to provide a starting point for discussions on whether the card should actually be unbanned.
This test consists of 500 total matches: 250 with the control Jund list, and 250 with the test deck, Bloody Jund. That gives me 100 matches against each gauntlet deck (50 with each list), a nice, round number (n) for my analysis. Play/draw alternates, as does which deck is played: the first match with the control list is followed by the first match with the test deck. The purpose is to mitigate the effect that increasing experience and familiarity have on the match results. Sideboarding strategy is decided before testing begins and never changes, even when it is determined to be wrong. Otherwise, the results would be invalid. We also played game one as though we didn’t know what the matchup was.
Testing was done primarily over Skype with paper cards. We don’t use MTGO because timing out and misclicking can ruin the data. Accuracy is more important than win percentage. Also, Skype and proxies are free, while buying a deck for testing purposes on MTGO isn’t. We previously used free simulation programs, but they proved too time-consuming for my team’s tastes. When everything is manual, the clicks needed to play become maddening.
Note on Significance
When I refer to statistical significance, I really mean probability. Specifically, the probability that the differences between a set of results are the result of the trial and not normal variance. Statistical tests are used to evaluate whether normal variance is behind the result, or if the experiment caused a noticeable change in result. This is expressed in confidence levels determined by the p-value from the statistical test. In other words, statistical testing determines how confident researchers are that their results came from the test and not from chance.
If a test yields p > .1, the test is not significant, as we are less than 90% certain that the result isn’t variance. If p < .1, then the result is significant at the 90% level. This is considered weakly significant and insufficiently conclusive by most academic standards; however, it can be acceptable when the n-value of the data set is low. While you can get significant results with as few as 30 entries, it takes huge disparities to do so, so sometimes 90% confidence is all that is achievable. p < .05 corresponds to the 95% confidence level, and is considered a significant result. The data is almost certainly the result of the experiment. Should p < .01, the result is significant at the 99% level, which is as close to certainty as you can get. When looking at the results, just look at the p-value to see if the data is significant.
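For anyone who prefers code to prose, the thresholds above boil down to a simple lookup. This is only a sketch in Python; the function name `significance_label` and the wording of its labels are made up for illustration and do not come from any statistics package.

```python
def significance_label(p: float) -> str:
    """Map a p-value to the confidence labels described in this article."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < 0.01:
        return "significant at 99%"
    if p < 0.05:
        return "significant at 95%"
    if p < 0.1:
        return "weakly significant at 90%"
    return "not significant"


# Illustrative p-values only; none of these are results from the test.
for p in (0.2, 0.08, 0.03, 0.005):
    print(f"p = {p}: {significance_label(p)}")
```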
Alright, enough waiting. First, I will report the overall win percentages. Then I will post the results of the z-test to show whether the result is significant. I use the z-test because it’s the more common test. I do other tests to confirm my result, but I won’t report them. I’ll finish this section off with some interesting statistics I kept during the test.
- Total wins – 269
- Total win % – 53.8%
- Total control wins – 122
- Control win % – 48.8%
- Total test wins – 147
- Test win % – 58.8%
Overall, Jund had a favorable record against my gauntlet. The control deck was just under 50% against the field, while Bloody Jund shot up to ~60%. That is quite the result and the z-test result should not be surprising.
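For readers who want to check the arithmetic themselves, here is a minimal Python sketch of a two-proportion z-test run on the totals above. It assumes a pooled standard error and a one-sided alternative (the test deck wins more often than the control deck). That choice of variant is an assumption, not a record of the exact calculation behind the reported result, and `two_proportion_z` is a hypothetical helper name.

```python
from math import erfc, sqrt


def two_proportion_z(wins_a: int, n_a: int, wins_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, one-sided p) for the alternative that B's win rate exceeds A's."""
    rate_a, rate_b = wins_a / n_a, wins_b / n_b
    # Pooled win rate under the null hypothesis that both decks win equally often.
    pooled = (wins_a + wins_b) / (n_a + n_b)
    std_err = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Upper-tail area of the standard normal distribution.
    p_one_sided = 0.5 * erfc(z / sqrt(2))
    return z, p_one_sided


# Control Jund: 122 wins in 250 matches. Bloody Jund: 147 wins in 250 matches.
z, p = two_proportion_z(122, 250, 147, 250)
print(f"z = {z:.2f}, one-sided p = {p:.4f}")
```

Under those assumptions z comes out at roughly 2.24 and the one-sided p at roughly 0.012, which sits below .05 and just above .01.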
By the z-test, this result is significant at the 95% level, and very nearly at 99%. Including Bloodbraid Elf strongly affected the match results. This was not surprising to me, as I remember just how powerful the card has always been. Some other interesting results from the test:
- Average cascade length – 2.31 cards
- Longest cascade – 8 cards (all lands)
- Average cascade hit’s mana cost – 2.72
- Times cascading past other Bloodbraids – 98 (once past all three!)
- Average turn playing first Bloodbraid – 4.98
- Times losing to Blood Moon – 15
Anyway, enough justifying my obsessive note-taking; time to actually make sense of the results. This necessitates breaking the total data down by gauntlet deck, but I must restate that the n of each of these tests is small in comparison: 50 matches per list rather than 250. With so few matches, it takes a much larger gap in win percentage to reach significance.
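To see why a smaller n raises the bar, compare the same ten-point gap at two sample sizes. The sketch below assumes a pooled, one-sided two-proportion z-test, and `one_sided_p` is a hypothetical helper. The win counts are made-up figures chosen only to give 48% versus 58% at each size; they are not results from the test.

```python
from math import erfc, sqrt


def one_sided_p(control_wins: int, test_wins: int, n: int) -> float:
    """One-sided p-value that the test list out-wins the control list (n matches each)."""
    pooled = (control_wins + test_wins) / (2 * n)
    std_err = sqrt(pooled * (1 - pooled) * 2 / n)
    z = (test_wins - control_wins) / n / std_err
    return 0.5 * erfc(z / sqrt(2))


# Illustrative numbers only: 48% vs 58% win rates at two sample sizes.
p_large = one_sided_p(120, 145, n=250)
p_small = one_sided_p(24, 29, n=50)
print(f"n = 250 per list: p = {p_large:.3f}")
print(f"n = 50 per list:  p = {p_small:.3f}")
```

The same gap that clears the 95% bar at 250 matches per list falls short of even the 90% bar at 50.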
Quick aside: the metagame today looks a lot like the metagame back in 2012. Tron, Affinity, UWx Control, Storm, and creature toolbox (then Birthing Pod, now Collected Company) are all top decks. No, it’s not exactly the same as the last time Bloodbraid was loose, and Jund is not the same powerhouse either. However, it does indicate that the conditions that let Jund thrive back then are still present now, and leads me to speculate about history repeating itself.
The classic matchup of two old rivals. In some senses, Jund and Affinity are Modern. If you don’t know how this matchup normally works, it’s a removal-heavy deck against a small-creature deck. Jund wins through superior attrition while Affinity wins either through blitzkrieg or Etched Champion.
- Control Deck wins – 28, 56%
- Test Deck wins – 28, 56%
Dead even. Let’s check out my numbers anyway.
Absolutely not significant. The matchup is determined by factors not related to Bloodbraid Elf. Specifically, whether Jund clunks out and doesn’t kill enough robots to stifle Affinity. The deck is so fast that hand disruption is minimally effective, and if Affinity can stick Etched Champion with any kind of power boost (and maintain protection from colors), they’re in for an easy game. Otherwise it’s a Jund-favoring slog through removal. Extra card advantage and tempo on turn four don’t dramatically alter the odds of either scenario.
And now for the traditional predator to Jund. Gx Tron has always been a hard matchup for Jund, which struggles to keep pace. Thoughtseize is critical so you don’t just lose to Tron’s bombs, but they always have more, and it’s hard to profitably interact. The GB version is said to be better than GR because of Collective Brutality, but I have no opinion.
- Control Deck wins – 19, 38%
- Test Deck wins – 26, 52%
That’s a very large spike. The additional maindeck Thoughtseize was a factor, but only incrementally. It’s arguably the best maindeck card in the matchup, but there’s only one more copy so the benefit is small. There’s more to this.
The result is significant at the 90% level but not at 95%. There’s that problem of the small n, as previously mentioned. I would say that these results are probably significant, contingent on additional study.
Bloody Jund had the same problems as Jund against Tron: it just doesn’t measure up in raw power or speed. However, Bloodbraid allowed Jund to make up for that with card advantage and tempo. Even when Jund was behind, playing two spells made catch-up significantly easier. Tron has also cut down on Wurmcoil Engine, and that card was Jund’s worst nightmare. Not a lot killed the initial Wurm, and then you had to expend additional resources to kill the tokens.
This matchup is about Jund’s clock. You can have all the disruption in the world (and Jund does), but if you don’t end the game, Storm will eventually find Past in Flames and enough mana to win.
- Control Deck wins – 26, 52%
- Test Deck wins – 30, 60%
- Turn three Storm wins – 5 (3 against control, 2 against test)
That is an interesting jump, but it is not going to be significant. This doesn’t surprise me; there is a lot of variance associated with Storm. For example, I lost once to a turn two Blood Moon as the control deck and three times with the test deck. That’s just Storm variance, and neither Bloodbraid Elf nor my play had much effect on it.
As I said, not a significant result. There was so much going on with Storm that I never felt that my own play mattered as much. As long as I had some kind of clock and had disruption, I’d done all I could.
The Jund sideboarding guide said to do things this way. I asked about the Surgical Extractions and was told no. Apparently, Grafdigger’s Cage and Scavenging Ooze are enough. You’re free to disagree, but given Storm’s sideboarding toward Blood Moon and away from the graveyard, I see the point.
The deck that eventually supplanted traditional Jund. I thought this would be a worse matchup than it proved to be, all things considered. Jund has a higher density of relevant cards while Grixis has larger threats and more ways to find them.
- Control Deck wins – 24, 48%
- Test Deck wins – 30, 60%
This is almost a significant result. The decks are far more evenly matched than anyone figured. This indicates to me that the preference for Grixis over Jund comes from other matchups rather than from any head-to-head advantage over Jund.
The matchup was a weird kind of attrition: the most important spells are the discard and kill spells, which are nearly identical across decks. The blue cantrips made it more likely Grixis would see them, but that deck also had a harder time getting out threats. I also think that I played this matchup wrong, as it became clear during testing that Jund did better when it went wide around the bigger Grixis threats, making patience critical for Bloody Jund. I should have been sideboarding to take advantage of this revelation, but it was too late.
Grixis Death’s Shadow
+1 Liliana, the Last Hope
+1 Kolaghan’s Command
It’s rather fortuitous that I’m doing the gauntlet alphabetically, as it lets me save the most interesting result for last. Jeskai Tempo has arguably been the best deck over the past few months even if it’s slipping in our rankings. Its combination of removal and hard-to-fight threats is remarkably Jund-like, and I think it even plays like Jund.
- Control Deck wins – 25, 50%
- Test Deck wins – 33, 66%
That is a very large jump. The decks are fighting a war of attrition where tempo is a factor, which is just the kind of fight that Bloodbraid Elf wins. The Elf substantially impacted the matchup.
This result is significant at the 90% level and very nearly at 95%. One more test deck win or one more control deck loss was all it needed. I would say again that this individual result is probably significant, with a high likelihood of confirmation.
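How close this result sits to the 95% line is easy to check. A rough Python sketch, assuming a pooled, one-sided two-proportion z-test (an assumption about the variant) with a hypothetical helper named `one_sided_p`:

```python
from math import erfc, sqrt


def one_sided_p(control_wins: int, test_wins: int, n: int = 50) -> float:
    """One-sided p-value that the test deck's win rate beats the control deck's."""
    pooled = (control_wins + test_wins) / (2 * n)
    std_err = sqrt(pooled * (1 - pooled) * 2 / n)
    z = (test_wins - control_wins) / n / std_err
    return 0.5 * erfc(z / sqrt(2))


# The first entry is the recorded Jeskai result; the other two are hypotheticals.
scenarios = {
    "as played (25 vs 33)": (25, 33),
    "one more test win (25 vs 34)": (25, 34),
    "one more control loss (24 vs 33)": (24, 33),
}
for label, (control, test) in scenarios.items():
    print(f"{label}: p = {one_sided_p(control, test):.3f}")
```

Under those assumptions the played result lands just above .05, and either hypothetical drops below it.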
Bloodbraid Elf let Jund really break things open in this matchup. Jeskai is all about incremental advantage, which is why Dark Confidant is so important to the matchup if Jeskai doesn’t have a way to kill it. Bloodbraid accomplished the same job, but immediately, and with a tempo boost to boot. Jeskai Tempo doesn’t have a similar gamebreaker, and so fell behind against Bloody Jund far more often than against the normal version. This isn’t surprising: this is why Jund killed traditional control when it had Bloodbraid previously. This result was just a confirmation.
Not having mirror breakers like Ancestral Vision really hurt Jeskai.
What Does It Mean?
Jund was overall improved by the inclusion of Bloodbraid Elf, to the great surprise of nobody on my team. The most significant results were against Tron and Jeskai, respectively the worst matchup and a very even one. What this says about Bloodbraid Elf in the Modern metagame is the subject of next week’s article. See you then!