Classify Insiders as Informed

Code

Python

Classifying insiders as informed according to Cohen, Malloy & Pomorski (2012) routine/opportunistic insider split.

Published

April 20, 2026

There is great interest in knowing when corporate insiders trade, because they have access to material nonpublic (inside) information about their firm. Insiders are banned from trading on inside information, but they can legally trade on other material information, and if anyone knows when the stock is mispriced, it should be them. However, they also trade for reasons unrelated to the firm’s expected performance: taxes, diversification, liquidity, and grant-vesting cycles.

The question this post addresses is how to separate the informed insiders from the rest.

Cohen, Malloy & Pomorski (CMP, 2012)—Decoding Inside Information—propose a simple identifying assumption: trades whose timing is predictable are unlikely to be information-driven. The everyday reasons insiders trade—taxes, vesting cycles, bonus-funded buying, scheduled diversification—show up as repeating calendar-month patterns. If we can flag those ex ante and strip them out, what remains is the set of trades that might contain information, because we have no obvious non-informational story for their timing.

So, if an insider has traded in the same calendar month in each of the prior three years, call them “routine” (non-informed); if they traded in each of those years but never in a common month, call them “opportunistic” (informed). CMP show that all the return predictability in the insider universe lives in the opportunistic bucket, while the routine bucket is essentially flat.

This post gives you a short working classifier in Python. It follows CMP (2012) but is not a full replication: the paper is a complete academic implementation with a specific sample, transaction filters, and portfolio return tests. The authors themselves call the rule “a noisy proxy for actual routine trading” (p. 11).

The rule

At the start of each calendar year Y, look at the insider’s trades in years Y-3, Y-2, Y-1:

Routine — traded in all three prior years, and some calendar month is shared across all three.
Opportunistic — traded in all three prior years, but no calendar month is shared.
NonClassified — missing a trade in at least one of the three prior years.

Every year-Y trade inherits that label. The classification is at the insider level (pooled across firms), matching the paper’s language.
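Before touching any data, the rule can be sketched as a pure function on an insider's trading history. This is only an illustration under stated assumptions: the helper name `label_from_history` and the `{year: set of months}` input format are mine, not the paper's.

```python
def label_from_history(months_by_year, y):
    """Label year y from a dict {year: set of calendar months traded}.

    Hypothetical helper for illustration; the input format is an assumption.
    """
    # Months traded in each of the three prior years (empty set if no trades).
    prior = [months_by_year.get(yr, set()) for yr in (y - 3, y - 2, y - 1)]

    # A missing year means there is not enough history to judge.
    if not all(prior):
        return 'NonClassified'

    # A month shared by all three prior years marks the insider as Routine.
    return 'Routine' if set.intersection(*prior) else 'Opportunistic'


march_every_year = {2015: {3, 5}, 2016: {3, 8}, 2017: {3, 11}}
scattered_months = {2015: {6}, 2016: {8}, 2017: {11}}
gap_in_2016      = {2015: {6}, 2017: {6}}

print(label_from_history(march_every_year, 2018))  # Routine
print(label_from_history(scattered_months, 2018))  # Opportunistic
print(label_from_history(gap_in_2016, 2018))       # NonClassified
```

Note that the third case has a repeating month (June) and is still NonClassified: the three-consecutive-years requirement is checked first.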

Step 1. Synthetic data

Four insiders, three firms, ten years. The data is set up so that every bucket is represented, so we can check the classifier against a known answer.

Take me, Alexander. I am an insider at Company A, buying 1,000 shares every March: same month, same size, every year. To complicate things, I am also an insider at Company B, where I trade every year but in scattered months. CMP’s baseline classifies at the insider level, pooling my trades across firms. So my Company A March pattern dominates, and I get labeled Routine globally. That label then sticks to my Company B trades too, even though they have no monthly pattern of their own. The trade-level variant (Table III) below addresses exactly this case, and the extensions deal with insiders whose history has gaps.

Beatrice (Company B, different month every year) → Opportunistic. Cecilia (Company C, only odd years) → NonClassified. Diana (one trade in 2015, then silent) → NonClassified too.

import pandas as pd

# Alexander (me) — March every year at Company A
rows  = [('Company A', 'Alexander', f'{y}-03-15',  1000) for y in range(2015, 2025)]
# Also an insider at Company B — scattered months
rows += [('Company B', 'Alexander', f'{y}-{m:02d}-10', 200)
         for y, m in zip(range(2015, 2025), [5, 8, 11, 2, 6, 10, 1, 4, 7, 12])]
# Beatrice — different month each year at Company B
rows += [('Company B', 'Beatrice', f'{y}-{m:02d}-10', 800)
         for y, m in zip(range(2015, 2025), [6, 8, 11, 2, 9, 4, 7, 10, 1, 5])]
# Cecilia — only odd years, never 3 consecutive
rows += [('Company C', 'Cecilia', f'{y}-06-20',  500) for y in range(2015, 2025, 2)]
# Diana — one trade in 2015 and never again (used in the extensions later)
rows += [('Company A', 'Diana', '2015-04-10', 300)]

df = pd.DataFrame(rows, columns=['Firm', 'Insider', 'Date', 'Shares'])
df['Date']  = pd.to_datetime(df['Date'])
df['Year']  = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
df.head(12)

	Firm	Insider	Date	Shares	Year	Month
0	Company A	Alexander	2015-03-15	1000	2015	3
1	Company A	Alexander	2016-03-15	1000	2016	3
2	Company A	Alexander	2017-03-15	1000	2017	3
3	Company A	Alexander	2018-03-15	1000	2018	3
4	Company A	Alexander	2019-03-15	1000	2019	3
5	Company A	Alexander	2020-03-15	1000	2020	3
6	Company A	Alexander	2021-03-15	1000	2021	3
7	Company A	Alexander	2022-03-15	1000	2022	3
8	Company A	Alexander	2023-03-15	1000	2023	3
9	Company A	Alexander	2024-03-15	1000	2024	3
10	Company B	Alexander	2015-05-10	200	2015	5
11	Company B	Alexander	2016-08-10	200	2016	8

Step 2. Classify each insider-year

def classify(insider, y):
    # Pool this insider's trades across all firms in the 3 calendar years before y,
    # and aggregate into a Series {year: {set of calendar months traded}}.
    m = df[(df.Insider == insider) & df.Year.between(y-3, y-1)].groupby('Year').Month.agg(set)

    # Must have traded in each of the 3 prior years.
    if len(m) < 3:
        return 'NonClassified'

    # set.intersection(*m) returns months present in ALL three years. Non-empty ⇒ Routine.
    return 'Routine' if set.intersection(*m) else 'Opportunistic'


years = range(df.Year.min() + 3, df.Year.max() + 1)
cls = pd.DataFrame(
    [(i, y, classify(i, y)) for i in df.Insider.unique() for y in years],
    columns=['Insider', 'Year', 'Class'])

cls.pivot(index='Insider', columns='Year', values='Class')

Year	2018	2019	2020	2021	2022	2023	2024
Insider
Alexander	Routine	Routine	Routine	Routine	Routine	Routine	Routine
Beatrice	Opportunistic	Opportunistic	Opportunistic	Opportunistic	Opportunistic	Opportunistic	Opportunistic
Cecilia	NonClassified	NonClassified	NonClassified	NonClassified	NonClassified	NonClassified	NonClassified
Diana	NonClassified	NonClassified	NonClassified	NonClassified	NonClassified	NonClassified	NonClassified

Exactly what the construction predicts. My global month history always contains March, so I am Routine. Beatrice has no repeat month, so Opportunistic. Cecilia and Diana never have three consecutive years of trading, so NonClassified.
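The loop in Step 2 re-filters the whole frame once per insider-year, which is fine for a toy example but may become slow on a large insider file. A possible vectorized alternative is sketched below. It is my own construction, not CMP's code: the function name `classify_vectorized` and the "shift each year forward by 1, 2, and 3" trick are assumptions, and the sketch expects a frame with `Insider`, `Year`, and `Month` columns.

```python
import pandas as pd

def classify_vectorized(trades):
    # One row per distinct (insider, year, month) with at least one trade.
    iym = trades[['Insider', 'Year', 'Month']].drop_duplicates()

    # Push each observation forward by 1, 2 and 3 years, so that 'Year' becomes
    # the classification year it feeds into and 'lag' records which prior year it was.
    shifted = pd.concat([iym.assign(Year=iym.Year + k, lag=k) for k in (1, 2, 3)])

    # Number of prior years (out of 3) with any trade.
    n_lags = shifted.groupby(['Insider', 'Year']).lag.nunique()

    # A month is "common" if it appears at all three lags.
    per_month = shifted.groupby(['Insider', 'Year', 'Month']).lag.nunique()
    has_common = (per_month == 3).groupby(level=['Insider', 'Year']).any()

    out = pd.DataFrame({'n_lags': n_lags, 'common': has_common})
    out['Class'] = 'NonClassified'
    out.loc[out.n_lags == 3, 'Class'] = 'Opportunistic'
    out.loc[(out.n_lags == 3) & out.common, 'Class'] = 'Routine'
    return out['Class'].reset_index()


# Tiny stand-in data set: A trades every March, B in scattered months, C skips 2016.
trades = pd.DataFrame({
    'Insider': ['A'] * 3 + ['B'] * 3 + ['C'] * 2,
    'Year':    [2015, 2016, 2017, 2015, 2016, 2017, 2015, 2017],
    'Month':   [3, 3, 3, 6, 8, 11, 6, 6],
})
labels = classify_vectorized(trades)
print(labels[labels.Year == 2018])
```

The result only contains insider-years with at least one trade in the lookback window, and it also produces rows for up to three years past the end of the data. Both are harmless if the labels are attached to trades with a left merge and missing labels are filled with NonClassified.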

Step 3. Label every trade

out = df.merge(cls, on=['Insider', 'Year'], how='left').fillna({'Class': 'NonClassified'})
out[out.Insider == 'Alexander'][['Firm', 'Date', 'Shares', 'Class']]

	Firm	Date	Shares	Class
0	Company A	2015-03-15	1000	NonClassified
1	Company A	2016-03-15	1000	NonClassified
2	Company A	2017-03-15	1000	NonClassified
3	Company A	2018-03-15	1000	Routine
4	Company A	2019-03-15	1000	Routine
5	Company A	2020-03-15	1000	Routine
6	Company A	2021-03-15	1000	Routine
7	Company A	2022-03-15	1000	Routine
8	Company A	2023-03-15	1000	Routine
9	Company A	2024-03-15	1000	Routine
10	Company B	2015-05-10	200	NonClassified
11	Company B	2016-08-10	200	NonClassified
12	Company B	2017-11-10	200	NonClassified
13	Company B	2018-02-10	200	Routine
14	Company B	2019-06-10	200	Routine
15	Company B	2020-10-10	200	Routine
16	Company B	2021-01-10	200	Routine
17	Company B	2022-04-10	200	Routine
18	Company B	2023-07-10	200	Routine
19	Company B	2024-12-10	200	Routine

That is it. The insider’s year-level label propagates to every trade they make that year, across all firms.

One consequence worth noticing: my Company B trades are all labeled Routine, even though they happen in non-March months. Under CMP’s baseline the label is assigned per insider, not per insider-firm, so my global March pattern dominates. Trades from 2015–2017 are NonClassified because they fall in my first three years, before the classifier has enough history to label me.

The paper’s Table III describes a trade-level variant that handles the cross-month case differently. It is covered in the next section.

Trade-level variant (Table III)

Same 3-year lookback. The label is decided per trade using its calendar month.

If the insider’s prior 3 years share month M, a year-Y trade in month M is Routine; trades in any other month are Opportunistic.
If the insider has 3 years of history but no shared month, all their trades are Opportunistic.
If they lack 3 prior years, NonClassified.

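def classify_trade(insider, y, month):
    # Same pooled 3-year lookback as classify(): a Series {year: {set of months traded}}.
    m = df[(df.Insider == insider) & df.Year.between(y-3, y-1)].groupby('Year').Month.agg(set)

    # Must have traded in each of the 3 prior years.
    if len(m) < 3:
        return 'NonClassified'

    # Months present in ALL three prior years. If there are none, every trade is Opportunistic.
    common = set.intersection(*m)
    return 'Routine' if month in common else 'Opportunistic'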

trade_out = df.copy()
trade_out['Class'] = [
    classify_trade(r.Insider, r.Year, r.Month) if r.Year >= df.Year.min() + 3 else 'NonClassified'
    for r in df.itertuples()]

trade_out[trade_out.Insider == 'Alexander'][['Firm', 'Date', 'Shares', 'Class']]

	Firm	Date	Shares	Class
0	Company A	2015-03-15	1000	NonClassified
1	Company A	2016-03-15	1000	NonClassified
2	Company A	2017-03-15	1000	NonClassified
3	Company A	2018-03-15	1000	Routine
4	Company A	2019-03-15	1000	Routine
5	Company A	2020-03-15	1000	Routine
6	Company A	2021-03-15	1000	Routine
7	Company A	2022-03-15	1000	Routine
8	Company A	2023-03-15	1000	Routine
9	Company A	2024-03-15	1000	Routine
10	Company B	2015-05-10	200	NonClassified
11	Company B	2016-08-10	200	NonClassified
12	Company B	2017-11-10	200	NonClassified
13	Company B	2018-02-10	200	Opportunistic
14	Company B	2019-06-10	200	Opportunistic
15	Company B	2020-10-10	200	Opportunistic
16	Company B	2021-01-10	200	Opportunistic
17	Company B	2022-04-10	200	Opportunistic
18	Company B	2023-07-10	200	Opportunistic
19	Company B	2024-12-10	200	Opportunistic

Now my March trades at Company A stay Routine, but my Company B trades in other months become Opportunistic. A single insider produces both types of trade.

Extension 1 — relaxed eligibility

The paper’s strict rule tags Cecilia as NonClassified every year, because she never has three consecutive years of trading. But clearly she is not routine either. A natural relaxation is to default her to Opportunistic as long as she has any history in the 3-year lookback window.

def classify_relaxed(insider, y):
    m = df[(df.Insider == insider) & df.Year.between(y-3, y-1)].groupby('Year').Month.agg(set)
    if len(m) == 0:
        return 'NonClassified'                      # no history in the 3-year lookback
    if len(m) == 3 and set.intersection(*m):
        return 'Routine'                            # strict routine definition
    return 'Opportunistic'                          # any history in lookback, not routine

cls_relaxed = pd.DataFrame(
    [(i, y, classify_relaxed(i, y)) for i in df.Insider.unique() for y in years],
    columns=['Insider', 'Year', 'Class'])

cls_relaxed.pivot(index='Insider', columns='Year', values='Class')

Year	2018	2019	2020	2021	2022	2023	2024
Insider
Alexander	Routine	Routine	Routine	Routine	Routine	Routine	Routine
Beatrice	Opportunistic	Opportunistic	Opportunistic	Opportunistic	Opportunistic	Opportunistic	Opportunistic
Cecilia	Opportunistic	Opportunistic	Opportunistic	Opportunistic	Opportunistic	Opportunistic	Opportunistic
Diana	Opportunistic	NonClassified	NonClassified	NonClassified	NonClassified	NonClassified	NonClassified

Cecilia is now Opportunistic across the board. Diana is Opportunistic in 2018 (her 2015 trade is still inside the lookback window), then drops to NonClassified from 2019 onward.

CMP (p. 12) test a version of this as a robustness check: pooling non-classified trades into the Opportunistic bucket leaves their results intact and “if anything” strengthens them slightly.

Caveat. The relaxation mixes two types into Opportunistic: active non-routine traders, and sporadic gappy traders. The paper’s strict filter keeps the two buckets comparable on activity level. This relaxation does not.
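One way to see the mixing is to cross-tabulate the relaxed label against the paper label. The sketch below hard-codes the 2018 labels from the two pivot tables above instead of recomputing them, so it runs on its own; the frame name `labels_2018` is just for illustration.

```python
import pandas as pd

# 2018 labels copied from the paper-rule and relaxed-rule tables in this post.
labels_2018 = pd.DataFrame({
    'Insider': ['Alexander', 'Beatrice', 'Cecilia', 'Diana'],
    'paper':   ['Routine', 'Opportunistic', 'NonClassified', 'NonClassified'],
    'relaxed': ['Routine', 'Opportunistic', 'Opportunistic', 'Opportunistic'],
})

# Rows: relaxed label. Columns: where those insiders sat under the paper rule.
mix = pd.crosstab(labels_2018.relaxed, labels_2018.paper)
print(mix)
```

In this toy sample, two of the three relaxed-Opportunistic insiders come from the paper's NonClassified group. On real data the same table would show how much of the relaxed bucket consists of sporadic traders.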

Extension 2 — any past history implies Opportunistic

The most permissive version. If the insider has any past trade—even outside the 3-year lookback—and they do not satisfy the strict routine rule, classify them as Opportunistic. Only insiders with zero prior history remain NonClassified.

def classify_ever(insider, y):
    past = df[(df.Insider == insider) & (df.Year < y)]
    if past.empty:
        return 'NonClassified'
    m = past[past.Year.between(y-3, y-1)].groupby('Year').Month.agg(set)
    if len(m) == 3 and set.intersection(*m):
        return 'Routine'
    return 'Opportunistic'

cls_ever = pd.DataFrame(
    [(i, y, classify_ever(i, y)) for i in df.Insider.unique() for y in years],
    columns=['Insider', 'Year', 'Class'])

# Three-way comparison focused on Diana, where the rules disagree
combined = (cls.rename(columns={'Class': 'paper'})
              .merge(cls_relaxed.rename(columns={'Class': 'relaxed'}), on=['Insider', 'Year'])
              .merge(cls_ever.rename(columns={'Class': 'ever'}),       on=['Insider', 'Year']))
combined[combined.Insider == 'Diana']

	Insider	Year	paper	relaxed	ever
21	Diana	2018	NonClassified	Opportunistic	Opportunistic
22	Diana	2019	NonClassified	NonClassified	Opportunistic
23	Diana	2020	NonClassified	NonClassified	Opportunistic
24	Diana	2021	NonClassified	NonClassified	Opportunistic
25	Diana	2022	NonClassified	NonClassified	Opportunistic
26	Diana	2023	NonClassified	NonClassified	Opportunistic
27	Diana	2024	NonClassified	NonClassified	Opportunistic

Diana traded exactly once, in 2015, and disappeared. The three rules encode three different beliefs about what her silence means:

Paper baseline — we cannot see enough of her pattern to judge. Drop her.
Relaxed (lookback) — her pattern went cold. From 2019 onward we cannot judge her either.
Ever — she once acted non-routinely. That is what she is until she acts otherwise.

There is no “right” answer. It is a bet about the information content of activity.

Using it on real data

To apply this to a real dataset such as LSEG Workspace:

Filter to open-market buys and sells only. Drop Exercise/Conversion of Awards, Payment of Exercise Price or Tax Liability, Gift, and private transactions. These are mechanical and will pollute both routine and opportunistic signals.
Reconcile insider identities across firms. LSEG sometimes reports the same person with different spellings at different firms. Without reconciliation, a multi-firm insider like me gets split into separate personas and neither copy has enough history to be classified.
Make sure you have at least three years of history before the first classification year. CMP’s 1989 start required data back to 1986. If your data starts in 2000, your first classification year is 2003.
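The three preparation steps above might look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the column names (`InsiderName`, `TransactionType`, `Date`), the transaction-type labels in `KEEP`, and the sample rows are hypothetical and do not reflect LSEG's actual schema. The name normalizer is a crude heuristic that only handles case, punctuation, and word order; it will not match initials to full names.

```python
import re
import pandas as pd

# Hypothetical labels for open-market transactions; check your vendor's codes.
KEEP = {'Open Market Purchase', 'Open Market Sale'}

def normalize_name(name):
    # Lowercase, replace anything that is not a letter or whitespace with a space,
    # then sort tokens so "Smith, John A." and "john a smith" map to the same key.
    tokens = re.sub(r'[^a-z\s]', ' ', name.lower()).split()
    return ' '.join(sorted(tokens))

# Made-up rows standing in for a raw vendor extract.
raw = pd.DataFrame({
    'InsiderName': ['Smith, John A.', 'john a smith', 'Jane Doe', 'Jane Doe'],
    'TransactionType': ['Open Market Purchase', 'Open Market Sale',
                        'Gift', 'Open Market Purchase'],
    'Date': pd.to_datetime(['2000-03-15', '2001-03-10', '2001-05-01', '2002-07-20']),
})

# 1. Keep open-market buys and sells only.
clean = raw[raw.TransactionType.isin(KEEP)].copy()

# 2. Reconcile identities so that one person has one history across firms.
clean['Insider'] = clean.InsiderName.map(normalize_name)
clean['Year'] = clean.Date.dt.year
clean['Month'] = clean.Date.dt.month

# 3. The first year that can be classified needs three prior years of data.
first_class_year = clean.Year.min() + 3
print(clean[['Insider', 'Year', 'Month']])
print(first_class_year)
```

The resulting frame has the `Insider`, `Year`, and `Month` columns that the classifiers in this post rely on. In practice a stable person identifier from the data vendor, if one is available, is likely a safer key than any string normalization.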

Reference

Cohen, L., Malloy, C., and Pomorski, L. (2012). Decoding Inside Information. Journal of Finance 67(3): 1009–1043.