I’ve previously described r/counting as a collaborative incremental game, and for me that sums up the essence of counting fairly well. A natural question to ask about the game is how many people have played it over the years.

We’ll start off by importing the relevant packages and loading some data. Since we’re only interested in who counted in each thread, we only load two columns from the database: the canonical username of each counter and the ID of the thread (submission) the count belongs to.

import re
import sqlite3
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.io as pio
import scipy.stats  # import the submodule explicitly so scipy.stats.gaussian_kde is available
import seaborn as sns
from IPython.display import Markdown
from rcounting import parsing

sns.set_theme()
pio.templates.default = "seaborn"
data_directory = Path("../data")

pd.options.plotting.backend = "plotly"
db = sqlite3.connect(data_directory / "counting.sqlite")

counts = pd.read_sql(
    "select counters.canonical_username as username, submission_id from comments "
    " join counters on comments.username=counters.username "
    "where comments.position > 0 and submission_id != 'uuikz' order by timestamp",
    db,
)
submissions = pd.read_sql("select * from submissions", db)


def format_title(row):
    return (
        f"[{row.title}](http://www.reddit.com/r/counting/comments/{row.submission_id})"
    )


submissions["link"] = submissions.apply(format_title, axis=1)
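The query relies on a `counters` table that maps each raw `username` to a `canonical_username`. Judging by the column name, this lets several usernames be treated as the same counter, and the sketch below assumes it works that way. It reproduces the join on a tiny in-memory database; the schema (only the columns the query touches), the rows, and the alt account `alice_alt` are all hypothetical.

```python
import sqlite3

import pandas as pd

# Hypothetical minimal schema with only the columns the query touches
toy_db = sqlite3.connect(":memory:")
toy_db.executescript(
    """
    create table counters (username text, canonical_username text);
    create table comments (
        username text, submission_id text, position integer, timestamp integer
    );

    -- 'alice_alt' is a made-up second account mapped to the same counter as 'alice'
    insert into counters values
        ('alice', 'alice'), ('alice_alt', 'alice'), ('bob', 'bob');

    insert into comments values
        ('alice', 'abc', 1, 1),
        ('bob', 'abc', 2, 2),
        ('alice_alt', 'abc', 3, 3),
        ('bob', 'abc', 0, 4);  -- dropped by the position > 0 filter
    """
)

# Same query as in the text, minus the filter on one specific submission_id
toy_counts = pd.read_sql(
    "select counters.canonical_username as username, submission_id from comments "
    " join counters on comments.username=counters.username "
    "where comments.position > 0 order by timestamp",
    toy_db,
)
print(toy_counts)
print(toy_counts["username"].nunique())  # 2: alice and alice_alt are one counter
```

Because the join happens in SQL, everything downstream (including `nunique`) only ever sees canonical usernames.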

Now finding the total number of counters is easy:

counts["username"].nunique()
15714

That’s more than I was expecting!

The number of counters in each thread

The counts in r/counting are split into threads of 1000 counts each, so in principle a single thread could have 1000 different counters. That has never happened, not least because most counts are made as part of a series of replies between just two users. Still, it’s interesting to see which threads had the most counters taking part:

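# Number of counts made by each counter in each thread (groups kept in order of first appearance)
levels = counts.groupby(['submission_id', 'username'], sort=False).size()
# Number of distinct counters per thread; keep the five largest
top = levels.groupby(level=0, sort=False).size().sort_values(ascending=False).head()
top_submissions = submissions.query("submission_id in @top.index").copy()
# The unnamed Series `top` ends up as column 0 of the combined frame
combined = pd.concat([top, top_submissions.set_index("submission_id")], axis=1)
Markdown(combined[["link", 0]].to_markdown(headers=["**Thread**", "**Number of counters**"], index=False))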

Some of these threads really had a lot of participants!
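To make the two-step `groupby` above concrete, here is the same computation on a made-up thread log. The first `groupby` gives the number of counts per (thread, counter) pair; grouping that result by its first index level and taking `.size()` then gives the number of distinct counters per thread. The thread IDs and usernames are invented.

```python
import pandas as pd

# Made-up counts: one row per count, in the order the counts were made
toy_counts = pd.DataFrame(
    {
        "submission_id": ["t1", "t1", "t1", "t1", "t2", "t2", "t2"],
        "username": ["alice", "bob", "alice", "carol", "dave", "erin", "dave"],
    }
)

# Counts made by each counter in each thread
toy_levels = toy_counts.groupby(["submission_id", "username"], sort=False).size()

# Distinct counters per thread: one entry in toy_levels per (thread, counter) pair
counters_per_thread = toy_levels.groupby(level=0, sort=False).size()
print(counters_per_thread.sort_values(ascending=False))
```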

On the opposite end of the scale, we can look at the threads with the fewest participants. Since you’re not allowed to reply to yourself, at least two people have to take part in each thread. We can easily count how many threads had exactly two counters:

# True for threads with exactly two distinct counters
perfect = levels.groupby(level=0, sort=False).size() == 2
perfect = perfect.loc[perfect].index
len(perfect)
161
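The rule that you can’t reply to yourself is what guarantees at least two counters per thread. One way to sanity-check that rule on data shaped like `counts` is to compare each count’s username with the previous count’s username within the same thread. This sketch assumes that consecutive rows within a thread are consecutive counts, which is roughly what loading with `order by timestamp` gives; the helper `self_replies` and the example data are hypothetical.

```python
import pandas as pd


def self_replies(counts: pd.DataFrame) -> pd.DataFrame:
    """Return the rows where a counter followed their own count in the same thread.

    Assumes rows are sorted in posting order within each thread.
    """
    # Shifting within each thread means counts from interleaved threads don't interfere
    previous = counts.groupby("submission_id")["username"].shift()
    return counts[counts["username"] == previous]


toy_counts = pd.DataFrame(
    {
        "submission_id": ["t1", "t1", "t1", "t2", "t2", "t2"],
        "username": ["alice", "bob", "alice", "carol", "carol", "dave"],
    }
)
print(self_replies(toy_counts))  # flags the second consecutive carol count in t2
```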

So it doesn’t happen often, but it has happened. The last five threads with only two counters are:

# Last five two-counter threads in `submissions`, in reverse order
perfect_500s = submissions.query("submission_id in @perfect").copy().tail().iloc[::-1]


def find_counters(submission_id):
    # Counters in order of their first count in the thread, since `levels` was grouped with sort=False
    return pd.Series(levels.loc[submission_id].index)


perfect_500s[["first_counter", "second_counter"]] = perfect_500s["submission_id"].apply(find_counters)
Markdown(perfect_500s[["link", "first_counter", "second_counter"]].to_markdown(headers=["**Thread**", "**First Counter**", "**Second Counter**"], index=False))
| Thread | First Counter | Second Counter |
|---|---|---|
| 5078k Counting Thread | Antichess | Countletics |
| 5079k Counting Thread | Antichess | Countletics |
| 5,104k Counting Thread | ClockButTakeOutTheL | Antichess |
| 5120k Counting Thread | Antichess | Countletics |
| 5121k Counting Thread | Antichess | Countletics |
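Because `levels` was built with `sort=False`, the usernames within each thread keep the order in which they first appear in `counts`, which is sorted by timestamp. So "First Counter" in the table above should be whichever of the two counted first. The toy example below checks that behaviour of `groupby(..., sort=False)`; the data is made up.

```python
import pandas as pd

toy_counts = pd.DataFrame(
    {
        "submission_id": ["t1", "t1", "t1", "t1"],
        "username": ["bob", "alice", "bob", "alice"],
    }
)
toy_levels = toy_counts.groupby(["submission_id", "username"], sort=False).size()

# With sort=False the usernames keep their order of first appearance, not alphabetical order
first_counter, second_counter = toy_levels.loc["t1"].index
print(first_counter, second_counter)  # bob alice
```

With the default `sort=True`, the same lookup would return the counters alphabetically instead.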

We can plot the distribution of the number of counters in each thread; this is shown in Figure 1.

# Number of distinct counters in each thread
counters = levels.groupby(level=0, sort=False).size()
fig = px.histogram(
    list(counters[counters <= 100]),  # threads with more than 100 counters are left out
    labels={"value": "Number of Counters"},
)
fig.update_layout(showlegend=False, yaxis_title_text='Occurrences')
fig.show()

Figure 1: The distribution of the number of counters participating in a thread
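Figure 1 only includes threads with at most 100 counters. As a sketch of the numbers behind such a histogram, the occurrences of each number of counters can be tabulated with `value_counts`, together with a count of how many threads the cutoff leaves out. `px.histogram` chooses its own bin widths, so the plot may group neighbouring values. The `toy_counters` Series is made up; in the analysis above the corresponding Series is `counters`.

```python
import pandas as pd

# Made-up number of distinct counters per thread
toy_counters = pd.Series([2, 3, 3, 18, 18, 18, 25, 150], index=[f"t{i}" for i in range(8)])

cutoff = 100
# How many threads have each number of counters (one entry per distinct value)
occurrences = toy_counters[toy_counters <= cutoff].value_counts().sort_index()
# Threads that the cutoff leaves out of the plot
excluded = int((toy_counters > cutoff).sum())
print(occurrences)
print(f"{excluded} thread(s) with more than {cutoff} counters not shown")
```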

Effective number of counters per thread

The total number of counters that participate in a thread is an inherently noisy quantity: one person making a single count raises the total by one, even if they make no other counts in the thread. A more robust measure is the effective number of counters taking part in a thread, which takes into account how skewed the distribution of counts among the participants is. If 10 people count 100 times each in a thread, then both the actual and the effective number of counters is 10. If instead two people count 496 times each and 8 people count once each, then the actual number of counters is still 10, but the effective number is 2.02, because two people made almost all the counts.
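The text doesn’t spell out the formula behind `effective_number_of_counters`. One standard measure that behaves as described is the inverse Simpson index (also known as the inverse Herfindahl index): if counter i made a fraction p_i of the counts in a thread, the effective number is 1 / Σ p_i². The sketch below assumes that definition; it is an illustration, not necessarily what `rcounting.analysis` implements, and `effective_number` is a hypothetical name.

```python
import numpy as np


def effective_number(counts_per_counter) -> float:
    """Inverse Simpson index: 1 / sum(p_i ** 2), where p_i is counter i's share of the counts.

    An assumed definition, used here for illustration.
    """
    counts = np.asarray(counts_per_counter, dtype=float)
    shares = counts / counts.sum()
    return 1.0 / np.sum(shares**2)


# Ten people counting 100 times each
print(round(effective_number([100] * 10), 2))
# Two people counting 496 times each, plus eight single counts
print(round(effective_number([496, 496] + [1] * 8), 2))
```

Under this definition the first example gives exactly 10 (up to floating-point rounding) and the second gives a value just above 2. Since the function accepts any array-like, it could also be applied per thread with `levels.groupby(level=0).apply(effective_number)`.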

We can find the thread with the highest effective number of counters:

from rcounting.analysis import effective_number_of_counters
effective_counters = levels.groupby(level=0, sort=False).apply(effective_number_of_counters)
submission_id = effective_counters.idxmax()
s = (f"The thread with the highest number of effective counters is "
     f"{submissions.query('submission_id == @submission_id')['link'].iat[0]}, "
     f"with {effective_counters.loc[submission_id]:.1f} counters.")
Markdown(s)

The thread with the highest number of effective counters is 336K Counting Thread, with 28.2 counters.

We can also compare summary statistics for the actual and the effective number of counters:

# Effective and actual number of counters for every thread, side by side
total_counters = levels.groupby(level=0, sort=False).size()
merged = pd.concat([effective_counters, total_counters], axis=1)
merged.columns = ['Effective counters', 'Actual counters']
table = merged.describe().transpose()[["mean", "50%", "max"]]
Markdown(table.to_markdown(floatfmt=".1f", headers=["**Mean**", "**Median**", "**Maximum**"]))
| | Mean | Median | Maximum |
|---|---|---|---|
| Effective counters | 4.5 | 3.5 | 28.2 |
| Actual counters | 20.4 | 18.0 | 189.0 |

For both the actual and the effective number of counters, the median is lower than the mean, which indicates that the distributions have long tails to the right. The two distributions are plotted in Figure 2, where you can clearly see how much more spread out the actual number of counters is than the effective number. The effective number is very sharply peaked at 2, with 25% of threads lying in the range 2-2.4.

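limits = [0, 50]
# Smooth both distributions with a Gaussian kernel density estimate
kde1 = scipy.stats.gaussian_kde(merged["Effective counters"])
kde2 = scipy.stats.gaussian_kde(merged["Actual counters"])
# Evaluate both densities on a common grid of 100 points between the limits
axis = np.linspace(*limits, 100, endpoint=False)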
data = pd.DataFrame(
    {
        "Number of counters": axis,
        "Effective counters": kde1(axis),
        "Actual counters": kde2(axis),
    }
)

fig = px.line(
    data_frame=data.melt(id_vars=["Number of counters"]),
    x="Number of counters",
    y="value",
    color="variable",
    labels={"value": "Probability density", "variable": "Model"},
)
fig.update_layout(legend=dict(yanchor="top", y=0.99, xanchor="right", x=0.99))
fig.update_yaxes(range=(0, 0.28))

fig.show()

Figure 2: The distributions of the number of effective and actual counters in each thread
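The smooth curves in Figure 2 come from `scipy.stats.gaussian_kde`. For a one-dimensional sample with the default settings, that estimator places a Gaussian with bandwidth h = σ · n^(-1/5) (Scott’s rule, where σ is the sample standard deviation) on every data point and averages them. The NumPy sketch below reimplements that idea to show what the curves represent; the helper `gaussian_kde_1d` and the sample values are hypothetical, and for real use `scipy.stats.gaussian_kde` is the better choice.

```python
import numpy as np


def gaussian_kde_1d(samples, grid):
    """Gaussian kernel density estimate with Scott's-rule bandwidth, evaluated on `grid`."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    n = samples.size
    # Scott's rule in one dimension: sample std (ddof=1) times n ** (-1/5)
    bandwidth = samples.std(ddof=1) * n ** (-1 / 5)
    # One Gaussian bump per sample, evaluated at every grid point
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    # Averaging the bumps keeps the total area equal to 1
    return bumps.mean(axis=1)


# Made-up effective-counter values, clustered near 2 with a tail to the right
toy_samples = np.array([2.0, 2.1, 2.2, 2.3, 2.5, 3.0, 3.5, 4.2, 6.0, 9.5])
grid = np.linspace(-10, 25, 701)
density = gaussian_kde_1d(toy_samples, grid)
print(grid[np.argmax(density)])  # location of the peak
```

Note that the bandwidth comes from the spread of the whole sample, so a long right tail also widens the kernel around the peak; this is one reason a KDE can blur a very sharp peak like the one at 2.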

We can also plot how the effective and actual number of counters have evolved throughout r/counting history; this is shown in Figure 3. The actual and effective numbers of counters track each other quite closely across threads. There seems to have been a gradual decline in the number of counters participating in each thread, punctuated by spikes of activity. I was expecting to see clear spikes at the 100k threads, since running isn’t allowed in those, but no such spikes are apparent in the data.

Figure 3: How the number of effective and actual counters has changed through r/counting history, a 10-thread rolling average
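The code for Figure 3 isn’t shown. A minimal sketch of a 10-thread rolling average, and of one way to compare the 100k threads with the rest, might look like the following. It assumes thread titles of the form "336K Counting Thread" or "5,104k Counting Thread", as in the tables above, and that rows are in thread order; the helper `thread_number`, the regular expression, and the data are hypothetical.

```python
import re
from typing import Optional

import pandas as pd

# Hypothetical title format, e.g. "336K Counting Thread" or "5,104k Counting Thread"
TITLE_PATTERN = re.compile(r"^([\d,]+)\s*k\b", re.IGNORECASE)


def thread_number(title: str) -> Optional[int]:
    """Return the thread's number in thousands, or None if the title doesn't match."""
    match = TITLE_PATTERN.match(title)
    return int(match.group(1).replace(",", "")) if match else None


# Made-up threads 95k to 105k with made-up effective counters
threads = pd.DataFrame(
    {
        "title": [f"{n}k Counting Thread" for n in range(95, 106)],
        "Effective counters": [2.1, 2.4, 3.0, 2.2, 2.6, 2.7, 2.3, 2.2, 2.8, 2.5, 2.0],
    }
)
threads["number"] = threads["title"].map(thread_number)
# 10-thread rolling average, as in the Figure 3 caption (the first 9 values are NaN)
threads["rolling"] = threads["Effective counters"].rolling(10).mean()
# Compare threads whose number is a multiple of 100 with the rest
is_100k = threads["number"] % 100 == 0
print(threads.groupby(is_100k)["Effective counters"].mean())
```

A rolling average smooths out single-thread spikes by design, so comparing the raw values for the 100k threads, as in the last line, is the more direct test for spikes at those threads.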

We can also plot the effective number of counters as a function of the actual number of counters. In general, the more actual counters a thread has, the more effective counters it has, but the relationship is fairly noisy.

# trendline="ols" requires the statsmodels package to be installed
fig = px.scatter(data_frame=merged, x="Actual counters", y="Effective counters", trendline="ols")
fig.update_traces(opacity=0.5)
fig.update_yaxes(range=(2, 25))
fig.update_xaxes(range=(0, 150))
fig.show()
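The `trendline="ols"` option fits an ordinary least-squares line. To put a number on how noisy the relationship is, one option (not part of the analysis above) is to fit the same kind of line with `numpy.polyfit` and report the Pearson correlation coefficient alongside it. The helper `ols_fit` and the data below are hypothetical; with the real data one would pass `merged["Actual counters"]` and `merged["Effective counters"]`.

```python
import numpy as np


def ols_fit(x, y):
    """Least-squares line y ≈ slope * x + intercept, plus the Pearson correlation r."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)
    r = np.corrcoef(x, y)[0, 1]
    return slope, intercept, r


# Made-up (actual, effective) pairs for a handful of threads
actual = [10, 15, 18, 20, 25, 40, 60]
effective = [2.5, 3.0, 4.5, 3.2, 5.0, 6.1, 9.0]
slope, intercept, r = ols_fit(actual, effective)
print(f"slope={slope:.3f}, intercept={intercept:.2f}, r={r:.2f}, r^2={r**2:.2f}")
```

An r² well below 1 would quantify the "fairly noisy" description: it is the fraction of the variance in the effective number that the straight line accounts for.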