The use of separators

We have access to the body of each comment, so it’s possible to do some analysis on those. One interesting thing to look at is whether a given count is comma separated, space separated, or uses no separator at all. A natural follow-up question is how the distribution between those three types has changed over time.

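Specifically, we’ll define the three types of count as follows. A count is comma separated if it contains a number written as groups of three digits joined by commas, such as 1,234,567. It is space separated if the groups are joined by spaces instead, such as 1 234 567. It has no separator if, on the first line of the comment, none of the characters between the first and the last digit is an apostrophe, comma, space, period, asterisk or slash.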

Code for importing packages and loading data
import re
import sqlite3
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.io as pio
import seaborn as sns
from rcounting import analysis, counters, parsing, side_threads
from rcounting import thread_navigation as tn
from rcounting.reddit_interface import reddit

pio.templates.default = "seaborn"
sns.set_theme()
from IPython.display import Markdown

data_directory = Path("../data")

db = sqlite3.connect(data_directory / "counting.sqlite")

counts = pd.read_sql(
    "select comments.body, comments.timestamp from comments join submissions "
    "on comments.submission_id = submissions.submission_id where comments.position > 0 "
    "order by submissions.timestamp, comments.position",
    db,
)
counts["date"] = pd.to_datetime(counts["timestamp"], unit="s")
counts.drop("timestamp", inplace=True, axis=1)

We started by making the necessary imports and loading all the data; with that out of the way, we can implement the rules defined above.

Code
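data = counts.set_index("date")

# Strip markdown links from the comment bodies before matching
data["body"] = data["body"].apply(parsing.strip_markdown_links)
# Comma separated: 1-3 digits followed by one or more ",ddd" groups
comma_regex = re.compile(r"\d{1,3}(?:,\d{3})+")
data["commas"] = data["body"].apply(lambda x: bool(comma_regex.search(x)))
# Space separated: the same pattern, with spaces between the groups
space_regex = re.compile(r"\d{1,3}(?: \d{3})+")
data["spaces"] = data["body"].apply(lambda x: bool(space_regex.search(x)))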


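def no_separators(body):
    """Check that no separator sits between the digits on the first line."""
    body = body.split("\n")[0]
    separators = re.escape("' , .*/")
    # Either the line holds a single digit, or nothing between the first and
    # the last digit is one of the separator characters
    regex = re.compile(
        rf"(?:^[^\d]*\d[^\d]*$)|(?:^[^\d]*\d[^{separators}]*\d[^\d]*$)"
    )
    return bool(regex.search(body))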


data["no separator"] = data["body"].apply(no_separators)
data.sort_index(inplace=True)
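To get a feel for how these rules behave, here is a small self-contained sketch that applies the same regular expressions to a handful of strings. The sample strings are made up for illustration and are not taken from the database; the check is repeated here so the snippet runs without the `rcounting` package.

```python
import re

comma_regex = re.compile(r"\d{1,3}(?:,\d{3})+")
space_regex = re.compile(r"\d{1,3}(?: \d{3})+")


def no_separators(body):
    # Same rule as above: only the first line of the comment is considered
    body = body.split("\n")[0]
    separators = re.escape("' , .*/")
    regex = rf"(?:^[^\d]*\d[^\d]*$)|(?:^[^\d]*\d[^{separators}]*\d[^\d]*$)"
    return bool(re.search(regex, body))


# Hypothetical comment bodies, one per separator style plus one odd case
samples = ["1,234,567", "1 234 567", "1234567", "1.234.567"]

# Map each sample to (commas, spaces, no separator)
results = {
    s: (bool(comma_regex.search(s)), bool(space_regex.search(s)), no_separators(s))
    for s in samples
}

for sample, flags in results.items():
    print(f"{sample!r}: commas={flags[0]}, spaces={flags[1]}, no separator={flags[2]}")
```

Note that the period-separated sample falls into none of the three categories, so the three percentages do not have to add up to 100.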

Once we have the data, we can get a 14-day rolling average, and resample the points to nice 6h intervals. The resampling makes plotting with pandas look nicer, since it can more easily deal with the x-axis.
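As a rough illustration of what these two steps do, here is a minimal sketch on made-up data: one boolean flag per hour for 30 days, alternating between `True` and `False`. The `toy` frame and its values are assumptions for the example only; the real data has one row per comment at irregular times.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one flag per hour for 30 days, alternating True/False
n_hours = 30 * 24
index = pd.Timestamp("2020-01-01") + pd.to_timedelta(np.arange(n_hours), unit="h")
toy = pd.DataFrame({"commas": np.arange(n_hours) % 2 == 0}, index=index)

# Time-based rolling window: each row becomes the percentage of True values
# among the rows from the preceding 14 days
smoothed = toy.rolling("14D").mean() * 100

# Collapse the hourly rows into regular 6-hour bins by averaging within each bin
resampled_toy = smoothed.resample("6h").mean()

print(len(toy), len(resampled_toy))
print(resampled_toy.tail(3))
```

The rolling step keeps one row per original observation, and the resampling step then reduces that to one row per six hours, which is what keeps the plotted x-axis regular.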

Code for plotting the separator data
resampled = (
    (data[["commas", "spaces", "no separator"]].rolling("14d").mean() * 100)
    .resample("6h")
    .mean()
    .melt(ignore_index=False)
    .reset_index()
)

labels = {
    "date": "Date",
    "variable": "Separator style",
    "value": "Percentage of counts",
}
fig = px.line(
    data_frame=resampled,
    x="date",
    y="value",
    color="variable",
    labels=labels,
    title="The separators used on r/counting over time"
)

fig.update_yaxes(range=[0, 100])
fig.show()

You can clearly see when the count crossed 100k: that’s when the ‘no separator’ line quickly drops from being the majority to being a clear minority of counts. That was followed by the era of commas, when the default format was simply to use commas as separators. Over the last few years, commas have declined significantly and have now been overtaken by spaces as the most popular separator, although there’s a lot of variation depending on who exactly is active. Counts without separators have bouts of activity, but are generally below the other two options. Pretty neat!