Most counts on r/counting are made by two counters collaborating; the signature of this is that the n^th count and the (n + 2)^th count almost always have the same author. Usually we only consider such collaborative counting a run when the two counters reply to each other at a fairly rapid clip, but I’ve ignored that here
Firstly, I’ve looked at the longest runs by total time, which I’ve defined as the longest periods of time when only two people counted in main. Unfortunately, that doesn’t give anything very interesting: these are all from early in the subreddit history, where hours would frequently pass between replies, and the runs I’ve found are generally less than ten counts long. An example is from the 24k counting thread, where just over 23 hours passed between two counts here here1. There was only one exception to this trend in the 2M era. Apparently we had a bit of a problem with spammers & farmers back then, and a really long chain of counts was deleted, and the count was continued from a valid point some hours later. The whole thing caused a bit of confusion.
1 Plus that was a late chain which somehow became official. I guess we were less strict back then.
Code
from pathlib import Pathimport pandas as pdimport numpy as npimport reimport sqlite3import matplotlib.pyplot as pltfrom rcounting import side_threads, counters, analysis, thread_navigation as tn, parsing, unitsfrom rcounting.reddit_interface import redditimport seaborn as snssns.set_theme()from IPython.display import Markdowndata_directory = Path("../data/")db = sqlite3.connect(data_directory /"counting.sqlite")df = pd.read_sql("select counters.canonical_username as username, timestamp, comment_id from comments join counters on comments.username=counters.username where comments.position > 0 order by timestamp", db)
Looking at the longest runs by total number of counts in the run is more interesting. so let’s do that. We take all our ordered counts and shift them by two; if the two authors are different, that means one streak has ended and a new one started. Taking the cumulative sum of all these changes means that each run is assigned the same number; exactly what we want. Table 1 shows the top ten runs in r/counting history, as well as their total length.2:
2 I haven’t checked through all of them, but it’s likely all the lengths are off by one. That’s because I’ve forced every comment to be part of only one run, but the first comment in each run should simultaneously be the last comment in the previous run.
The first seven runs are by u/davidjl123 and u/Countletics in the early 3Ms, and all of the top ten involve either david or countletics. The first one which doesn’t either of them is number 17 between nonsensy and colby6666, starting at 3,456,003 and continuing for 2000 counts.
Source Code
---title: "Longest runs"---Most counts on r/counting are made by two counters collaborating; the signature of this is that the n^th count and the (n + 2)^th count almost always have the same author. Usually we only consider such collaborative counting a run when the two counters reply to each other at a fairly rapid clip, but I've ignored that hereFirstly, I've looked at the longest runs by total time, which I've defined as the longest periods of time when only two people counted in main. Unfortunately, that doesn't give anything very interesting: these are all from early in the subreddit history, where hours would frequently pass between replies, and the runs I've found are generally less than ten counts long. An example is from the 24k counting thread, where just over 23 hours passed between two counts here [here](/comments/z63ql/_/c64fdp8/?context=3)^[Plus that was a late chain which somehow became official. I guess we were less strict back then. ]. There was [only one exception](/comments/8gj7vx_/dyherjj) to this trend in the 2M era. Apparently we had a bit of a problem with spammers & farmers back then, and a really long chain of counts was deleted, and the count was continued from a valid point some hours later. The whole thing [caused a bit of confusion](/comments/dyhzmay/?context=3).```{python}from pathlib import Pathimport pandas as pdimport numpy as npimport reimport sqlite3import matplotlib.pyplot as pltfrom rcounting import side_threads, counters, analysis, thread_navigation as tn, parsing, unitsfrom rcounting.reddit_interface import redditimport seaborn as snssns.set_theme()from IPython.display import Markdowndata_directory = Path("../data/")db = sqlite3.connect(data_directory /"counting.sqlite")df = pd.read_sql("select counters.canonical_username as username, timestamp, comment_id from comments join counters on comments.username=counters.username where comments.position > 0 order by timestamp", db)```Looking at the longest runs by total number of counts in the run is more interesting. so let's do that. We take all our ordered counts and shift them by two; if the two authors are different, that means one streak has ended and a new one started. Taking the cumulative sum of all these changes means that each run is assigned the same number; exactly what we want. @tbl-runs shows the top ten runs in r/counting history, as well as their total length.^[I haven't checked through all of them, but it's likely all the lengths are off by one. That's because I've forced every comment to be part of only one run, but the first comment in each run should simultaneously be the last comment in the previous run.]:```{python}#| label: tbl-runs#| tbl-cap: The 10 longest runs in r/counting history#| column: body-outsetcolumn = df['username']is_different_group = (column != column.shift(2))df['group'] = is_different_group.cumsum()groups = df.groupby('group')indices = (-groups.size().loc[groups.size() >1]).sort_values(kind='mergesort').indexold = groups.first().loc[indices]new = groups.last().loc[indices]new['username'] = groups.nth(1).loc[indices]['username']old['length'] = groups.size().loc[indices]combined = old.join(new, rsuffix='2', lsuffix='1')combined['dt'] = (combined['timestamp2'] - combined['timestamp1']) / units.HOURcombined.drop(['timestamp1', 'timestamp2'], axis=1, inplace=True)def link(comment): body, submission = pd.read_sql(f"select body, comment_id from comments where comment_id == '{comment}' order by timestamp desc limit 1", db).iloc[0]returnf"[{parsing.find_count_in_text(body):,}](/comments/{submission}/_/{comment})"final = combined.head(10).reset_index(drop=True).copy()final["old_link"] = final["comment_id1"].apply(link)final["new_link"] = final["comment_id2"].apply(link)final.index +=1def format_time(timedelta): hours, rem =divmod(timedelta.total_seconds(), 3600) minutes, seconds =divmod(rem, 60)returnf"{int(hours):0>2}:{int(minutes):0>2}"final['dt'] = pd.to_timedelta(final['dt'], unit='h').round('s').apply(format_time)Markdown(final[['username1', 'username2', 'old_link', 'new_link', 'length', 'dt']].to_markdown(headers=['**Rank**', '**1st counter**', '**2nd counter**', '**Start**', '**End**', '**Length**', '**Duration**']))```The first seven runs are by u/davidjl123 and u/Countletics in the early 3Ms, and all of the top ten involve either david or countletics. The first one which doesn't either of them is number 17 between nonsensy and colby6666, starting at [3,456,003](/comments/erw1rp/_/ff66e1r) and continuing for 2000 counts.