Classify Insiders as Informed

Code
Python
Classifying insiders as informed using the routine/opportunistic split of Cohen, Malloy & Pomorski (2012).
Published

April 20, 2026

There is considerable interest in when corporate insiders trade, because they are employees with access to material nonpublic (inside) information about their firm. Insiders are banned from trading on inside information, but they can legally trade on other material information, and if anyone knows when the stock is mispriced, it should be them. However, they also trade for reasons unrelated to the firm’s expected performance: taxes, diversification, liquidity, and grant-vesting cycles.

The question this post addresses is how to separate the informed insiders from the rest.

Cohen, Malloy & Pomorski (CMP, 2012)—Decoding Inside Information—propose a simple identifying assumption: trades whose timing is predictable are unlikely to be information-driven. The everyday reasons insiders trade—taxes, vesting cycles, bonus-funded buying, scheduled diversification—show up as repeating calendar-month patterns. If we can flag those ex ante and strip them out, what remains is the set of trades that might contain information, because we have no obvious non-informational story for their timing.

So, if an insider has traded in the same calendar month in each of the prior three years, call them “routine” (non-informed); otherwise, “opportunistic” (informed). CMP show that all the return predictability in the insider universe lives in the opportunistic bucket—the routine bucket is essentially flat.

This post gives you a short working classifier in Python. It follows CMP (2012) but is not a full replication: the paper is a complete academic implementation with a specific sample, transaction filters, and portfolio return tests. The authors themselves call the rule “a noisy proxy for actual routine trading” (p. 11).

The rule

At the start of each calendar year Y, look at the insider’s trades in years Y-3, Y-2, Y-1:

  • Routine — traded in all three prior years, and some calendar month is shared across all three.
  • Opportunistic — traded in all three prior years, but no calendar month is shared.
  • NonClassified — missing a trade in at least one of the three prior years.

Every year-Y trade inherits that label. The classification is at the insider level (pooled across firms), matching the paper’s language.
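
Before touching pandas, the three-way rule can be sketched on a plain dictionary. This is only an illustration: the function name label_from_history and the example histories are mine, not from the paper, and the input is assumed to map each calendar year to the set of months in which the insider traded.

```python
def label_from_history(months_by_year, y):
    """Return the insider-level label for year y.

    months_by_year: dict mapping calendar year -> set of months (1-12) with a trade.
    """
    # Months traded in each of the three prior years (empty set if no trade that year)
    lookback = [months_by_year.get(year, set()) for year in (y - 3, y - 2, y - 1)]
    if not all(lookback):
        # At least one prior year has no trade at all
        return 'NonClassified'
    # Months that appear in all three prior years
    shared = set.intersection(*lookback)
    return 'Routine' if shared else 'Opportunistic'


# Hypothetical histories, chosen only to exercise each branch
print(label_from_history({2015: {3, 5}, 2016: {3, 8}, 2017: {3, 11}}, 2018))  # Routine
print(label_from_history({2015: {6}, 2016: {8}, 2017: {11}}, 2018))           # Opportunistic
print(label_from_history({2015: {6}, 2017: {6}}, 2018))                       # NonClassified
```

The first history shares March across all three years, the second has three active years with no common month, and the third skips 2016.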

Step 1. Synthetic data

Four insiders, three firms, ten years. The data is set up so each insider lands in a different bucket, so we can check the classifier against a known answer.

I am an insider at Company A, buying 1,000 shares every March—same month, same size, every year. To complicate things, I am also an insider at Company B, where I trade every year but in scattered months. CMP’s baseline classifies at the insider level, pooling my trades across firms. So my Company A March pattern dominates, and I get labeled Routine globally. That label then sticks to my Company B trades too, even though they have no monthly pattern of their own. The trade-level variant (Table III) below addresses exactly this case, and the extensions deal with insiders whose history has gaps.

Beatrice (Company B, different month every year) → Opportunistic. Cecilia (Company C, only odd years) → NonClassified. Diana (one trade in 2015, then silent) → NonClassified too.

import pandas as pd

# Alexander (me) — March every year at Company A
rows  = [('Company A', 'Alexander', f'{y}-03-15',  1000) for y in range(2015, 2025)]
# Also an insider at Company B — scattered months
rows += [('Company B', 'Alexander', f'{y}-{m:02d}-10', 200)
         for y, m in zip(range(2015, 2025), [5, 8, 11, 2, 6, 10, 1, 4, 7, 12])]
# Beatrice — different month each year at Company B
rows += [('Company B', 'Beatrice', f'{y}-{m:02d}-10', 800)
         for y, m in zip(range(2015, 2025), [6, 8, 11, 2, 9, 4, 7, 10, 1, 5])]
# Cecilia — only odd years, never 3 consecutive
rows += [('Company C', 'Cecilia', f'{y}-06-20',  500) for y in range(2015, 2025, 2)]
# Diana — one trade in 2015 and never again (used in the extensions later)
rows += [('Company A', 'Diana', '2015-04-10', 300)]

df = pd.DataFrame(rows, columns=['Firm', 'Insider', 'Date', 'Shares'])
df['Date']  = pd.to_datetime(df['Date'])
df['Year']  = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df.head(12)
    Firm       Insider    Date        Shares  Year  Month
0   Company A  Alexander  2015-03-15  1000    2015  3
1   Company A  Alexander  2016-03-15  1000    2016  3
2   Company A  Alexander  2017-03-15  1000    2017  3
3   Company A  Alexander  2018-03-15  1000    2018  3
4   Company A  Alexander  2019-03-15  1000    2019  3
5   Company A  Alexander  2020-03-15  1000    2020  3
6   Company A  Alexander  2021-03-15  1000    2021  3
7   Company A  Alexander  2022-03-15  1000    2022  3
8   Company A  Alexander  2023-03-15  1000    2023  3
9   Company A  Alexander  2024-03-15  1000    2024  3
10  Company B  Alexander  2015-05-10  200     2015  5
11  Company B  Alexander  2016-08-10  200     2016  8

Step 2. Classify each insider-year

def classify(insider, y):
    # Pool this insider's trades across all firms in the 3 calendar years before y,
    # and aggregate into a Series {year: {set of calendar months traded}}.
    m = df[(df.Insider == insider) & df.Year.between(y-3, y-1)].groupby('Year').Month.agg(set)

    # Must have traded in each of the 3 prior years.
    if len(m) < 3:
        return 'NonClassified'

    # set.intersection(*m) returns months present in ALL three years. Non-empty ⇒ Routine.
    return 'Routine' if set.intersection(*m) else 'Opportunistic'


years = range(df.Year.min() + 3, df.Year.max() + 1)
cls = pd.DataFrame(
    [(i, y, classify(i, y)) for i in df.Insider.unique() for y in years],
    columns=['Insider', 'Year', 'Class'])

cls.pivot(index='Insider', columns='Year', values='Class')
Insider    2018           2019           2020           2021           2022           2023           2024
Alexander  Routine        Routine        Routine        Routine        Routine        Routine        Routine
Beatrice   Opportunistic  Opportunistic  Opportunistic  Opportunistic  Opportunistic  Opportunistic  Opportunistic
Cecilia    NonClassified  NonClassified  NonClassified  NonClassified  NonClassified  NonClassified  NonClassified
Diana      NonClassified  NonClassified  NonClassified  NonClassified  NonClassified  NonClassified  NonClassified

Exactly what the construction predicts. My global month history always contains March, so I am Routine. Beatrice has no repeat month, so Opportunistic. Cecilia and Diana never have three consecutive years of trading, so NonClassified.
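
The classify function re-filters the whole frame for every insider-year, which is fine for four insiders but may get slow on a real panel. One possible alternative is sketched here. It is not from the paper: classify_all is a name I made up, and it assumes the input has Insider, Year and Month columns like df. The idea is to shift every trade forward by one, two and three years, so that each target year can count how many of its lookback years are active and whether any month appears in all three.

```python
import pandas as pd

def classify_all(trades):
    """Label every (insider, year) that has at least one trade in its 3-year lookback."""
    # One row per distinct (insider, year, month) with a trade
    iym = trades[['Insider', 'Year', 'Month']].drop_duplicates()
    # A trade in year t belongs to the lookback of years t+1, t+2 and t+3
    lagged = pd.concat([iym.assign(Year=iym.Year + k, Lag=k) for k in (1, 2, 3)])
    # Number of lookback years (out of 3) with any trade
    n_years = lagged.groupby(['Insider', 'Year']).Lag.nunique()
    # For each month, number of lookback years containing it; keep the best month
    n_month = lagged.groupby(['Insider', 'Year', 'Month']).Lag.nunique()
    max_month = n_month.groupby(level=['Insider', 'Year']).max()

    out = pd.DataFrame({'n_years': n_years, 'max_month': max_month}).reset_index()
    out['Class'] = 'NonClassified'
    out.loc[out.n_years == 3, 'Class'] = 'Opportunistic'
    out.loc[out.max_month == 3, 'Class'] = 'Routine'   # some month present in all 3 years
    return out[['Insider', 'Year', 'Class']]


# Hypothetical mini-sample: A trades every March, B never repeats a month
demo = pd.DataFrame({
    'Insider': ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'Year':    [2015, 2015, 2016, 2017, 2015, 2016, 2017],
    'Month':   [3, 5, 3, 3, 6, 8, 11],
})
print(classify_all(demo).query('Year == 2018'))
```

The result also contains target years up to three years past the last trade, and it omits insider-years with an empty lookback, so in practice it would be left-merged onto the trades and the gaps filled with NonClassified, as in Step 3.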

Step 3. Label every trade

out = df.merge(cls, on=['Insider', 'Year'], how='left').fillna({'Class': 'NonClassified'})
out[out.Insider == 'Alexander'][['Firm', 'Date', 'Shares', 'Class']]
    Firm       Date        Shares  Class
0   Company A  2015-03-15  1000    NonClassified
1   Company A  2016-03-15  1000    NonClassified
2   Company A  2017-03-15  1000    NonClassified
3   Company A  2018-03-15  1000    Routine
4   Company A  2019-03-15  1000    Routine
5   Company A  2020-03-15  1000    Routine
6   Company A  2021-03-15  1000    Routine
7   Company A  2022-03-15  1000    Routine
8   Company A  2023-03-15  1000    Routine
9   Company A  2024-03-15  1000    Routine
10  Company B  2015-05-10  200     NonClassified
11  Company B  2016-08-10  200     NonClassified
12  Company B  2017-11-10  200     NonClassified
13  Company B  2018-02-10  200     Routine
14  Company B  2019-06-10  200     Routine
15  Company B  2020-10-10  200     Routine
16  Company B  2021-01-10  200     Routine
17  Company B  2022-04-10  200     Routine
18  Company B  2023-07-10  200     Routine
19  Company B  2024-12-10  200     Routine

That is it. The insider’s year-level label propagates to every trade they make that year, across all firms.

One consequence worth noticing: my Company B trades are all labeled Routine, even though they happen in non-March months. Under CMP’s baseline the label is assigned per insider, not per insider-firm, so my global March pattern dominates. Trades from 2015–2017 are NonClassified because they fall in my first three years, before the classifier has enough history to label me.

The paper’s Table III describes a trade-level variant that handles this cross-month case differently; it is covered in the next section.

Trade-level variant (Table III)

Same 3-year lookback. The label is decided per trade using its calendar month.

  • If the insider’s prior 3 years share month M, a year-Y trade in month M is Routine; trades in any other month are Opportunistic.
  • If the insider has 3 years of history but no shared month, all their trades are Opportunistic.
  • If they lack 3 prior years, NonClassified.

def classify_trade(insider, y, month):
    m = df[(df.Insider == insider) & df.Year.between(y-3, y-1)].groupby('Year').Month.agg(set)
    if len(m) < 3:
        return 'NonClassified'
    common = set.intersection(*m)
    if not common:
        return 'Opportunistic'
    return 'Routine' if month in common else 'Opportunistic'

trade_out = df.copy()
trade_out['Class'] = [
    classify_trade(r.Insider, r.Year, r.Month) if r.Year >= df.Year.min() + 3 else 'NonClassified'
    for r in df.itertuples()]

trade_out[trade_out.Insider == 'Alexander'][['Firm', 'Date', 'Shares', 'Class']]
    Firm       Date        Shares  Class
0   Company A  2015-03-15  1000    NonClassified
1   Company A  2016-03-15  1000    NonClassified
2   Company A  2017-03-15  1000    NonClassified
3   Company A  2018-03-15  1000    Routine
4   Company A  2019-03-15  1000    Routine
5   Company A  2020-03-15  1000    Routine
6   Company A  2021-03-15  1000    Routine
7   Company A  2022-03-15  1000    Routine
8   Company A  2023-03-15  1000    Routine
9   Company A  2024-03-15  1000    Routine
10  Company B  2015-05-10  200     NonClassified
11  Company B  2016-08-10  200     NonClassified
12  Company B  2017-11-10  200     NonClassified
13  Company B  2018-02-10  200     Opportunistic
14  Company B  2019-06-10  200     Opportunistic
15  Company B  2020-10-10  200     Opportunistic
16  Company B  2021-01-10  200     Opportunistic
17  Company B  2022-04-10  200     Opportunistic
18  Company B  2023-07-10  200     Opportunistic
19  Company B  2024-12-10  200     Opportunistic

Now my March trades at Company A stay Routine, but my Company B trades in other months become Opportunistic. A single insider produces both types of trade.

Extension 1 — relaxed eligibility

The paper’s strict rule tags Cecilia as NonClassified every year, because she never has three consecutive years of trading. But clearly she is not routine either. A natural relaxation is to default her to Opportunistic as long as she has any history in the 3-year lookback window.

def classify_relaxed(insider, y):
    m = df[(df.Insider == insider) & df.Year.between(y-3, y-1)].groupby('Year').Month.agg(set)
    if len(m) == 0:
        return 'NonClassified'                      # no history in the 3-year lookback
    if len(m) == 3 and set.intersection(*m):
        return 'Routine'                            # strict routine definition
    return 'Opportunistic'                          # any history in lookback, not routine

cls_relaxed = pd.DataFrame(
    [(i, y, classify_relaxed(i, y)) for i in df.Insider.unique() for y in years],
    columns=['Insider', 'Year', 'Class'])

cls_relaxed.pivot(index='Insider', columns='Year', values='Class')
Insider    2018           2019           2020           2021           2022           2023           2024
Alexander  Routine        Routine        Routine        Routine        Routine        Routine        Routine
Beatrice   Opportunistic  Opportunistic  Opportunistic  Opportunistic  Opportunistic  Opportunistic  Opportunistic
Cecilia    Opportunistic  Opportunistic  Opportunistic  Opportunistic  Opportunistic  Opportunistic  Opportunistic
Diana      Opportunistic  NonClassified  NonClassified  NonClassified  NonClassified  NonClassified  NonClassified

Cecilia is now Opportunistic across the board. Diana is Opportunistic in 2018 (her 2015 trade is still inside the lookback window), then drops to NonClassified from 2019 onward.

CMP (p. 12) test a version of this as a robustness check: pooling non-classified trades into the Opportunistic bucket leaves their results intact and “if anything” strengthens them slightly.

Caveat. The relaxation mixes two types into Opportunistic: active non-routine traders, and sporadic gappy traders. The paper’s strict filter keeps the two buckets comparable on activity level. This relaxation does not.
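
One way to keep that difference visible, sketched here as my own suggestion and not something CMP do, is to return the number of active lookback years next to the label. The function name classify_relaxed_with_activity and the dictionary input (year mapped to the set of months traded) are assumptions for illustration only.

```python
def classify_relaxed_with_activity(months_by_year, y):
    """Relaxed label plus the number of lookback years (0-3) with at least one trade."""
    lookback = [months_by_year.get(year, set()) for year in (y - 3, y - 2, y - 1)]
    active_years = sum(1 for months in lookback if months)
    if active_years == 0:
        return 'NonClassified', 0                  # no history in the 3-year lookback
    if active_years == 3 and set.intersection(*lookback):
        return 'Routine', 3                        # strict routine definition
    return 'Opportunistic', active_years           # 3 = active trader, 1 or 2 = gappy trader


# Hypothetical histories shaped like Beatrice, Cecilia and Diana
print(classify_relaxed_with_activity({2015: {6}, 2016: {8}, 2017: {11}}, 2018))  # ('Opportunistic', 3)
print(classify_relaxed_with_activity({2015: {6}, 2017: {6}}, 2018))              # ('Opportunistic', 2)
print(classify_relaxed_with_activity({2015: {4}}, 2018))                         # ('Opportunistic', 1)
```

With the count attached, the Opportunistic bucket can later be split into insiders with three active years and those with one or two, which is roughly the comparison the strict filter enforces by construction.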

Extension 2 — any past history implies Opportunistic

The most permissive version. If the insider has any past trade—even outside the 3-year lookback—and they do not satisfy the strict routine rule, classify them as Opportunistic. Only insiders with zero prior history remain NonClassified.

def classify_ever(insider, y):
    past = df[(df.Insider == insider) & (df.Year < y)]
    if past.empty:
        return 'NonClassified'
    m = past[past.Year.between(y-3, y-1)].groupby('Year').Month.agg(set)
    if len(m) == 3 and set.intersection(*m):
        return 'Routine'
    return 'Opportunistic'

cls_ever = pd.DataFrame(
    [(i, y, classify_ever(i, y)) for i in df.Insider.unique() for y in years],
    columns=['Insider', 'Year', 'Class'])

# Three-way comparison focused on Diana, where the rules disagree
combined = (cls.rename(columns={'Class': 'paper'})
              .merge(cls_relaxed.rename(columns={'Class': 'relaxed'}), on=['Insider', 'Year'])
              .merge(cls_ever.rename(columns={'Class': 'ever'}),       on=['Insider', 'Year']))
combined[combined.Insider == 'Diana']
    Insider  Year  paper          relaxed        ever
21  Diana    2018  NonClassified  Opportunistic  Opportunistic
22  Diana    2019  NonClassified  NonClassified  Opportunistic
23  Diana    2020  NonClassified  NonClassified  Opportunistic
24  Diana    2021  NonClassified  NonClassified  Opportunistic
25  Diana    2022  NonClassified  NonClassified  Opportunistic
26  Diana    2023  NonClassified  NonClassified  Opportunistic
27  Diana    2024  NonClassified  NonClassified  Opportunistic

Diana traded exactly once, in 2015, and disappeared. The three rules encode three different beliefs about what her silence means:

  • Paper baseline — we cannot see enough of her pattern to judge. Drop her.
  • Relaxed (lookback) — her pattern went cold. From 2019 onward we cannot judge her either.
  • Ever — she once acted non-routinely. That is what she is until she acts otherwise.

There is no “right” answer. It is a bet about the information content of activity.

Using it on real data

To apply this to a real dataset such as LSEG Workspace:

  1. Filter to open-market buys and sells only. Drop Exercise/Conversion of Awards, Payment of Exercise Price or Tax Liability, Gift, and private transactions. These are mechanical and will pollute both routine and opportunistic signals.
  2. Reconcile insider identities across firms. LSEG sometimes reports the same person with different spellings at different firms. Without reconciliation, a multi-firm insider like me gets split into separate personas, and each copy is classified on a partial history or has too little history to be classified at all.
  3. Make sure you have at least three years of history before the first classification year. CMP’s 1989 start required data back to 1986. If your data starts in 2000, your first classification year is 2003.
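
The three steps above might look roughly like this in pandas. Treat it as a sketch under loud assumptions: the column names (InsiderName, TransactionType, Date), the transaction labels in KEEP, and the helper normalize_name are placeholders I invented, not the LSEG Workspace schema, and a crude string key is no substitute for proper identity reconciliation.

```python
import pandas as pd

# Hypothetical raw extract; column names and labels are placeholders, not LSEG's schema
raw = pd.DataFrame({
    'InsiderName': ['Smith, John A.', 'SMITH JOHN A', 'Doe, Jane', 'Doe, Jane'],
    'TransactionType': ['Open Market Purchase', 'Open Market Sale', 'Gift', 'Open Market Sale'],
    'Date': pd.to_datetime(['2000-03-15', '2001-03-10', '2001-05-01', '2002-07-20']),
})

# Step 1: keep open-market buys and sells only (assumed label values)
KEEP = {'Open Market Purchase', 'Open Market Sale'}
trades = raw[raw.TransactionType.isin(KEEP)].copy()

# Step 2: crude identity key: uppercase, drop punctuation, collapse whitespace
def normalize_name(name):
    cleaned = ''.join(ch for ch in name.upper() if ch.isalnum() or ch.isspace())
    return ' '.join(cleaned.split())

trades['Insider'] = trades.InsiderName.map(normalize_name)
trades['Year'] = trades.Date.dt.year
trades['Month'] = trades.Date.dt.month

# Step 3: the first year with a full 3-year lookback
first_classifiable_year = trades.Year.min() + 3
print(trades[['Insider', 'Year', 'Month']])
print(first_classifiable_year)
```

Here the two spellings of the same person collapse to one key and the Gift row is dropped. Real name matching will usually need more than this, for example a vendor person identifier where one exists.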

Reference

Cohen, L., Malloy, C., and Pomorski, L. (2012). Decoding Inside Information. Journal of Finance 67(3): 1009–1043.