On August 9th, the Republic of Belarus held the main voting day of its presidential election; early voting took place on August 4th-8th. Domestic citizen observers reported that the turnout data announced by the authorities had been falsified. We have analysed the turnout data received from citizen observers and from official sources. The statistical analysis shows that the officially announced early vote turnout (41.7%) was grossly inflated. According to our estimate, the real figure could not have exceeded 24%, which means that the authorities apparently inflated the early vote turnout by approximately 1.2 million votes.
A table received from the Belarusian observer community contains 5366 records with the numbers of ballots cast during the early vote in Belarus on August 4th, 5th, 6th, 7th and 8th, according to official protocols and/or observers. Each record corresponds to one voting station and one day, and contains a non-zero count from at least one observer and/or from the official protocol:
Because of significant hindrances to the monitoring of turnout at the voting precincts and the impossibility of ensuring continuous monitoring coverage during the whole period of the early vote, it is not easy to evaluate the turnout during the early presidential vote. In this note, we do this based on the available partial precinct data (not necessarily covering all the voting days in their entirety).
The idea is as follows: although we cannot trace the early vote turnout at individual precincts during the whole early vote period, we still can estimate the typical early vote for the whole period by summing typical early vote values for separate days. To eliminate the influence of possible falsifications, we perform some preliminary cleansing of the data, which is described below.
The analysis of the available data shows that, where both numbers are available, the turnout reported in the official protocol was in the vast majority of cases either equal to or higher than the turnout reported by observers.
To illustrate this point, below is the histogram of voting precincts in Belarus vs. number of persons who voted therein on August 6th according to the official protocol data and observers’ data (histogram bin = 10).
The distribution of voting precincts vs. the number of people who voted according to observers' data is a narrow bell-shaped curve. The distribution based on the electoral protocol data, on the other hand, has the same mode but also a long tail extending towards high turnout. Extensive experience in observing and analysing elections in Russia suggests that such a tail indicates a high probability of artificial inflation of the turnout, used as a means to falsify the voting results. A similar picture is observed for the remaining voting days as well.
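The binning behind such a histogram can be sketched in a few lines of Python. This is a minimal illustration, not the code used for the analysis: the function name `precinct_histogram` and the sample counts are hypothetical, and only the bin width of 10 comes from the text.

```python
from collections import Counter


def precinct_histogram(voter_counts, bin_width=10):
    """Count precincts per bin; each key is the lower edge of a bin."""
    bins = Counter((count // bin_width) * bin_width for count in voter_counts)
    return dict(sorted(bins.items()))


# Hypothetical per-precinct voter counts for a single day (not real data).
sample_counts = [12, 18, 25, 31, 37, 39, 140]
histogram = precinct_histogram(sample_counts)

for lower_edge, n_precincts in histogram.items():
    print(f"{lower_edge:>4}-{lower_edge + 9:<4} {'#' * n_precincts}")
```

In a sketch like this, an inflated protocol count shows up as an isolated bar far to the right of the main cluster, which is the tail described above.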
In order to exclude the influence of falsification on the evaluation, we can limit ourselves to those records in the dataset where the turnout from the official protocol is consistent with the one from observers. We shall consider the protocol data on the number of voters 'verified' if it differs by no more than 10% from the observer data (leaving room for possible counting errors by observers).
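The 10% criterion can be sketched as a simple filter. The text does not say which count the 10% is measured against; the sketch below assumes it is relative to the observer count. The function name `is_verified`, the record layout and the sample values are all hypothetical.

```python
def is_verified(protocol_count, observer_count, tolerance=0.10):
    """True if the protocol count is within `tolerance` of the observer count.

    Assumption: the relative difference is measured against the observer count.
    Records missing either count cannot be verified.
    """
    if protocol_count is None or observer_count is None or observer_count <= 0:
        return False
    return abs(protocol_count - observer_count) <= tolerance * observer_count


# Hypothetical records: (precinct id, day, protocol count, observer count).
records = [
    ("A", "Aug 6", 52, 50),    # differs by 4%: verified
    ("B", "Aug 6", 140, 48),   # protocol far above observers: rejected
    ("C", "Aug 6", None, 61),  # no protocol figure: cannot be verified
]

verified = [r for r in records if is_verified(r[2], r[3])]
print(verified)
```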
There are 1804 such records (approximately a third) in the available dataset:
Histograms of precincts vs. number of early voters for ‘verified’ precincts are shown below.
Using the set of verified precincts, we then calculate the median and mean number of voters per precinct for each of the dates, as well as the interquartile range (IQR), which provides a robust estimate of the standard deviation (IQR/1.35). The median and IQR are robust, that is, less sensitive to outliers (unusually high or low values) than the ordinary mean and standard deviation, and thus may provide more reliable estimates in a situation where falsified data may be present in the dataset.
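As an illustration of these per-day statistics, here is a minimal sketch using Python's standard `statistics` module. The helper name `robust_summary` and the sample counts are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption, since the text does not specify one.

```python
import statistics


def robust_summary(voter_counts):
    """Median, mean, IQR and the IQR-based sigma estimate (IQR / 1.35)."""
    q1, _, q3 = statistics.quantiles(voter_counts, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "median": statistics.median(voter_counts),
        "mean": statistics.mean(voter_counts),
        "iqr": iqr,
        "robust_sigma": iqr / 1.35,
    }


# Hypothetical verified counts for one day, with one suspiciously high value.
sample_counts = [40, 45, 50, 55, 60, 65, 70, 300]
summary = robust_summary(sample_counts)
print(summary)
```

With a single extreme value in the sample, the mean moves well above the median while the median and IQR barely change, which is the reason for preferring them here.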
Now, we can estimate the median and mean numbers of early voters over the whole period as the sums of the median and mean values, respectively, for the individual dates. As an estimate of the statistical error margin, one can use the square root of the sum of the squared estimated standard deviations for the individual dates. Because the variance in voter numbers includes a systematic component due to the variance in precinct size, the resulting error margin is most likely an upper estimate.
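The summation and the error margin described above can be sketched as follows. The function name `combine_days` and the daily values are hypothetical placeholders, not figures from the dataset.

```python
import math


def combine_days(daily_centers, daily_sigmas):
    """Sum per-day typical values; combine per-day sigmas in quadrature."""
    total = sum(daily_centers)
    error = math.sqrt(sum(sigma ** 2 for sigma in daily_sigmas))
    return total, error


# Hypothetical per-day medians and IQR-based sigmas for the five early-vote days.
daily_medians = [30, 40, 50, 60, 70]
daily_sigmas = [3, 4, 0, 0, 12]

total_voters, error_margin = combine_days(daily_medians, daily_sigmas)
print(f"{total_voters} +/- {error_margin:.1f} early voters per precinct")
```

Adding the sigmas in quadrature treats the daily values as independent. As noted above, they share a common component through precinct size, so the result should be read as a rough bound rather than a strict confidence interval.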
It must be noted, though, that the sample of voting precincts under consideration was probably biased towards urban (that is, larger) ones, with voter counts above the country-wide average of approximately 1150 persons.
With some rounding, the number of voters at the precincts monitored by observers and precincts similar to these may be estimated as follows:
As of end of August 7th:
As of end of August 8th:
For a precinct with the Belarus-wide average number of voters (1150 persons), this gives the early turnout values:
As of August 7th: 17% (median) to 20% (mean),
As of August 8th: 20% (median) to 24% (mean).
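The conversion from a per-precinct voter count to a turnout percentage is a single division by the average precinct size. A minimal sketch follows; the function name `turnout_percent` and the example count of 230 voters are hypothetical, and only the average of 1150 voters per precinct comes from the text.

```python
AVERAGE_PRECINCT_SIZE = 1150  # Belarus-wide average quoted in the text


def turnout_percent(early_voters, precinct_size=AVERAGE_PRECINCT_SIZE):
    """Early turnout as a percentage of registered voters in the precinct."""
    return 100.0 * early_voters / precinct_size


# Hypothetical cumulative count of early voters at one precinct.
example_turnout = turnout_percent(230)
print(f"{example_turnout:.0f}%")
```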
Again, as the precincts observed were obviously more populated than average, the numbers should be viewed as an upper estimate. Therefore, the officially reported early vote turnout figure (41.7%) was, by all accounts, grossly inflated (twice at the very least).
The absolute number of officially overreported early voters can be calculated as follows: (42% [official early vote turnout] − 24% [estimate of the mean early vote turnout as of August 8th]) × 6.8 million voters ≈ 1.2 million votes. Even taking the aforementioned caveats into account, this must be viewed as a lower estimate with a large safety margin.
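The same arithmetic, written out as a short check. The function name `overreported_votes` is hypothetical; the inputs are the rounded figures used in the text.

```python
def overreported_votes(official_turnout, estimated_turnout, electorate):
    """Difference between official and estimated turnout, in voters."""
    return (official_turnout - estimated_turnout) * electorate


# Rounded figures from the text: 42% official, 24% estimated, 6.8 million voters.
excess = overreported_votes(0.42, 0.24, 6_800_000)
print(f"about {excess / 1e6:.1f} million votes")
```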
Later, I received another set of data for 258 precincts where the observers were able to monitor the turnout throughout the whole early vote period. For these precincts, the numbers of early voters can therefore be evaluated directly from observers' data.
This new dataset gives the following estimates:
As of end of August 7th:
As of end of August 8th:
Thus, the direct estimate agrees with the indirect one within the stated error margin. This means that the early vote turnout can indeed be obtained with sufficient accuracy even from incomplete observation data.