If we only count no-hitters since 1998, we end up with a whole bunch of teams that haven’t been no-hit at all. Applying a separate statistical concept (the chi-squared test), we want every expected count to be greater than one so that we can draw a valid conclusion.
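To make the expected-count point concrete, here is a rough sketch in Python. The team labels, game totals, and no-hitter totals are placeholders I made up for illustration, not the figures from the post, and the helper names `expected_counts` and `all_expected_above` are hypothetical. It assumes every team-game carries the same risk, so each team's expected count is just its share of all games played.

```python
def expected_counts(total_events, games_by_team):
    """Expected events per team if every team-game carries the same risk."""
    total_games = sum(games_by_team.values())
    return {
        team: total_events * games / total_games
        for team, games in games_by_team.items()
    }


def all_expected_above(expected, threshold=1.0):
    """True when every expected count is strictly greater than the threshold."""
    return all(count > threshold for count in expected.values())


# Placeholder inputs (NOT real data): 30 teams with equal game totals.
games = {f"Team {i}": 2400 for i in range(1, 31)}

# A short window with few no-hitters leaves every expected count below one.
short_window = expected_counts(24, games)

# A longer window with more no-hitters clears the threshold.
long_window = expected_counts(240, games)

print(all_expected_above(short_window))
print(all_expected_above(long_window))
```

The shorter the window, the fewer no-hitters there are to spread across 30 teams, which is why a recent cutoff makes the expected counts too small to test.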

The 1927 Yankees and 1962 Mets do have an approximately equal chance of being no-hit. A no-hitter involves three things: 1) a dominant pitcher, 2) excellent defense, and 3) luck on batted balls being hit at fielders. The opposition’s offense is important, but as far as a no-hitter is concerned there’s no difference between Babe Ruth getting just slightly under the ball and flying out to the track and Roger Craig (first ’62 Met that came to mind) popping out to the catcher. The only thing we care about from the offense when determining how likely a no-hitter is to happen is batted ball tendencies. A premier power-hitting team like the Murderers’ Row Yankees would hit a ton of flyballs, which are more likely to turn into outs, and a team that makes a lot of contact and hits for a high average generally hits softer line drives that are more likely to be caught. The ’62 Mets got no-hit by Sandy Koufax because Koufax was unhittable that day. Koufax had the stuff to no-hit anyone. If the Yankees faced a pitcher who had unhittable stuff, they were just as likely as anyone to get no-hit. It seems counter-intuitive, I’ll admit, but I do think that all teams on any given day have just about the same probability of getting no-hit.

For your third point, the standard deviation for such a small proportion and large population (I essentially took a census) is so small (.00004145 in this case) that even small fluctuations stand out, up to a certain point. There’s a reason I didn’t get into why the Kansas City Royals and Chicago Cubs, of all teams, have been no-hit the least of the 30 MLB teams: some variation has to be expected. When I was calculating the p-value of the Rays’ result, part of me expected a much more significant value, but the big factor was how small the Rays’ sample size was compared to the population, which led to a p-value of .016 instead of something smaller.
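For anyone who wants to see the arithmetic behind that point, here is a minimal sketch. It is not necessarily the exact calculation from the post: it assumes the usual standard deviation of a sample proportion, sqrt(p(1-p)/n), and an exact one-sided binomial tail for the p-value. The function names and the numbers in the usage lines are placeholders, not the real league or Rays figures.

```python
import math


def proportion_sd(p, n):
    """Standard deviation of a sample proportion over n independent trials."""
    return math.sqrt(p * (1.0 - p) / n)


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed as 1 - P(X <= k - 1)."""
    lower = sum(
        math.comb(n, i) * p**i * (1.0 - p) ** (n - i)
        for i in range(k)
    )
    return 1.0 - lower


# Placeholder inputs (NOT the post's data): a rare event rate, a very large
# population of games, and one team with a much smaller number of games.
league_rate = 0.0005
league_games = 200_000
team_games = 2_000
team_events = 4

# The spread shrinks as the number of games grows, so the league-wide
# spread is far tighter than the spread for a single team.
print(proportion_sd(league_rate, league_games))
print(proportion_sd(league_rate, team_games))

# One-sided p-value for the single team under the league-wide rate.
print(binomial_upper_tail(team_events, team_games, league_rate))
```

The contrast between the two spreads is the reason a single team's result is less significant than the league-wide standard deviation alone would suggest: the team's own game count is what drives the p-value.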

However, don’t you think that this kind of analysis is really only usable if we assume that every team, on any given day, has a roughly equal chance of taking part in a no-hitter? That assumes, for example, that the 1927 Yankees have as good a chance of being on the bad end of a no-hitter as the 1962 Mets.

Also, given the nature of no-hitters (insofar as there has to be a victor and a victim), there will necessarily be a nearly equal rate of outliers on either side, don’t you think? Even though your sample size is obviously large enough, at the rates at which no-hitters actually occur, very small fluctuations can look a lot more damning than they really are.