Abstract

We study rare events data: binary dependent variables with dozens to thousands of times fewer ones (events, such as wars, vetoes, cases of political activism, or epidemiological infections) than zeros (“nonevents”). In many literatures, these variables have proven difficult to explain and predict, a problem that seems to have at least two sources. First, popular statistical procedures, such as logistic regression, can sharply underestimate the probability of rare events. We recommend corrections that outperform existing methods and change the estimates of absolute and relative risks by as much as some estimated effects reported in the literature. Second, commonly used data collection strategies are grossly inefficient for rare events data. The fear of collecting data with too few events has led to data collections with huge numbers of observations but relatively few, and poorly measured, explanatory variables, such as international conflict data with more than a quarter-million dyads, only a few of which are at war. As it turns out, more efficient sampling designs exist for making valid inferences, such as sampling all available events (e.g., wars) and a tiny fraction of nonevents (peace). This enables scholars to save as much as 99% of their (nonfixed) data collection costs or to collect much more meaningful explanatory variables. We provide methods that link these two results, enabling both types of corrections to work simultaneously, and software that implements the methods developed.
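The sampling design described above, keeping all events and only a small random share of nonevents, is a case-control (choice-based) design, and it admits a standard correction: fit an ordinary logit on the subsample, then shift the intercept by ln[((1 − τ)/τ)(ȳ/(1 − ȳ))], where τ is the population fraction of events and ȳ is the sample fraction. The sketch below illustrates this prior correction on synthetic data; it is not the paper's software, and the simulated population, sampling ratio, and variable names are assumptions made for the example.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Synthetic "population" with rare events (roughly 1% ones).
n = 200_000
X = rng.normal(size=(n, 2))
true_beta = np.array([-5.0, 1.0, 0.5])            # intercept and two slopes
p = 1.0 / (1.0 + np.exp(-(sm.add_constant(X) @ true_beta)))
y = rng.binomial(1, p)

tau = y.mean()                                    # population event fraction

# Case-control sample: keep EVERY event, only 5 nonevents per event.
events = np.flatnonzero(y == 1)
nonevents = rng.choice(np.flatnonzero(y == 0),
                       size=5 * events.size, replace=False)
idx = np.concatenate([events, nonevents])
Xs, ys = sm.add_constant(X[idx]), y[idx]
ybar = ys.mean()                                  # sample event fraction

# Ordinary maximum-likelihood logit on the retained subsample.
fit = sm.Logit(ys, Xs).fit(disp=0)

# Prior correction: under this design the slope estimates are consistent;
# only the intercept is off, by the known offset below.
offset = np.log(((1 - tau) / tau) * (ybar / (1 - ybar)))
beta0_corrected = fit.params[0] - offset

print("uncorrected intercept:", fit.params[0])    # far from -5.0
print("corrected intercept:  ", beta0_corrected)  # close to -5.0
```

Because all events are retained and only nonevents are discarded, the sampling design distorts nothing but the intercept, which is what allows most nonevent observations, and their collection costs, to be dropped without biasing the slopes. The paper's second contribution, the small-sample correction for the underestimated probability of rare events, is a separate adjustment not shown here.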