What is the Kirkpatrick Model? Four Levels of Training Evaluation Explained

Q: What is the Kirkpatrick model?

The Kirkpatrick model is a framework for evaluating the effectiveness of training programmes. It identifies four levels: Level 1 (Reaction, how participants responded to the training), Level 2 (Learning, what knowledge, skills or attitudes were acquired), Level 3 (Behaviour, whether participants changed their behaviour at work), and Level 4 (Results, whether the behaviour change produced the intended business or operational outcomes). The framework was developed by Donald Kirkpatrick, first published in 1959, and updated in Kirkpatrick and Kirkpatrick (2016).

Q: What is Kirkpatrick Level 3?

Kirkpatrick Level 3 is the Behaviour level: whether training participants changed their observable behaviour in their actual work as a result of training. Level 3 is the most practically important level for most L&D buyers because it directly answers the question that training is intended to address: did people behave differently as a result? Most training programmes are evaluated at Level 1 (satisfaction surveys) and Level 2 (quiz scores), rarely at Level 3.

Q: Why is Kirkpatrick Level 3 rarely measured?

Kirkpatrick Level 3 is rarely measured because it is harder to measure than Levels 1 and 2. It requires observational data collected in the actual workplace, follow-up after the training event, and a methodology for distinguishing training effects from other influences on behaviour. A Level 1 satisfaction survey is easy to administer at the end of a training event. Level 3 requires the organisation to look beyond the training event to the working behaviour that followed.

Q: What is the difference between Kirkpatrick Level 1 and Level 3?

Level 1 measures whether participants liked the training (satisfaction, confidence ratings). Level 3 measures whether participants actually behaved differently at work as a result. The two are often uncorrelated: participants who rated training highly at Level 1 do not necessarily show behaviour change at Level 3, and vice versa. Level 3 is a more reliable measure of training effectiveness for behaviour-change objectives.
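
One way to check this relationship in your own data is to correlate each participant's Level 1 rating with their Level 3 behaviour change. The sketch below is illustrative only: the function name, the 1 to 5 rating scale and the sample figures are assumptions made for the example, not Sidestream data or a prescribed method.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data for six participants (made up for illustration).
level1_satisfaction = [5, 4, 5, 3, 4, 2]  # post-course rating on a 1-5 scale
level3_change = [1, 0, 0, 1, 2, 1]        # change in observed speak-ups per meeting

r = pearson_r(level1_satisfaction, level3_change)
print(f"Level 1 vs Level 3 correlation: r = {r:.2f}")
```

A value of r close to zero would indicate that satisfaction scores tell you little about behaviour change. With a handful of participants the estimate is very noisy, so treat it as a prompt for further observation rather than a finding.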

Q: How does Sidestream use the Kirkpatrick model?

Sidestream's standard measurement is Kirkpatrick Level 3 (observed behaviour in real work) as the minimum, with Level 4 (a downstream operational or business metric) where the brief allows. This measurement commitment distinguishes Sidestream's design from training designed to produce positive satisfaction surveys. The Level 3 measures are calibrated to the behavioural target of each engagement: speak-up frequency, structured peer challenge rates, disclosure-response quality, decision-documentation quality, and adjacent observable behavioural indicators.

Q: Who invented the Kirkpatrick model?

Donald L. Kirkpatrick developed the four-level model in the late 1950s. It was first published in a series of articles in the Journal of the American Society of Training Directors (1959) and later in book form as Evaluating Training Programs: The Four Levels (1994). Donald's son James Kirkpatrick and Wendy Kayser Kirkpatrick co-authored an updated version, Kirkpatrick's Four Levels of Training Evaluation (2016), which clarified the model and introduced the concept of the New World Kirkpatrick Model.

The Four Kirkpatrick Levels: A Complete Breakdown

Donald L. Kirkpatrick first published his four-level model in 1959 in the Journal of the American Society of Training Directors. It was formalised in book form in 1994 as Evaluating Training Programs: The Four Levels and updated by James Kirkpatrick and Wendy Kayser Kirkpatrick in Kirkpatrick's Four Levels of Training Evaluation (2016).

Level 1: Reaction

Level 1 measures how training participants responded to the training. Was it engaging? Was it relevant? Would they recommend it? Level 1 is typically captured through post-training satisfaction surveys or "smile sheets."

Level 1 is the easiest level to measure and the most commonly used training metric. It is also the level with the weakest connection to actual training effectiveness. Participant satisfaction does not indicate whether learning occurred, and it provides no evidence of behaviour change. A participant can enjoy training that does not change their behaviour; equally, training that participants find uncomfortable can produce sustained behaviour change.

Level 1 data is useful for assessing how the training was delivered and received. It is not useful for assessing whether training achieved its purpose.

Level 2: Learning

Level 2 measures whether participants acquired the intended knowledge, skills or attitudes as a result of training. Level 2 is typically captured through post-training assessments, quizzes, or skills demonstrations immediately after the training event.

Level 2 is a more meaningful measure than Level 1 because it assesses acquisition rather than only reaction. A participant who passes a post-training quiz has demonstrated that they absorbed the content. However, Level 2 evidence does not indicate whether the participant will apply the learning in their actual work.

Level 2 is appropriate as the primary measurement standard when the objective is knowledge transfer: compliance facts, product information, process knowledge. It is insufficient as the primary standard when the objective is behaviour change.

Level 3: Behaviour

Level 3 measures whether training participants changed their observable behaviour in their actual work as a result of training. Level 3 is the most practically important level for most L&D buyers because it directly answers the question that training is intended to address: did people behave differently?

Level 3 measurement requires observational data collected in the actual workplace, typically 3 to 6 weeks after the training event when the behaviour change should be embedded. Methods include: observation of actual work behaviour, manager observation reports, peer observation, mystery-shopper approaches, and performance data that reflects the target behaviour.
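
The timing described above can be planned when the training is booked. The following is a minimal sketch, assuming the 3 to 6 week window stated in the text; the function name, parameter names and the example training date are hypothetical.

```python
from datetime import date, timedelta


def level3_observation_window(training_date, earliest_weeks=3, latest_weeks=6):
    """Return (start, end) dates for follow-up observation after training."""
    if earliest_weeks > latest_weeks:
        raise ValueError("earliest_weeks must not exceed latest_weeks")
    start = training_date + timedelta(weeks=earliest_weeks)
    end = training_date + timedelta(weeks=latest_weeks)
    return start, end


# Hypothetical training date, for illustration only.
start, end = level3_observation_window(date(2025, 3, 3))
print(f"Observe workplace behaviour between {start} and {end}")
```

Fixing the window in advance makes it easier to brief managers or peer observers before the event, so that follow-up is not left to chance.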

Level 3 is rarely measured. CIPD surveys consistently find that fewer than 10% of UK organisations routinely measure training at Level 3. The most common reason is difficulty: Level 3 requires effort, methodology, and follow-up that most training providers do not offer and most organisations do not build into their procurement specifications.

Level 4: Results

Level 4 measures whether the behaviour change produced the intended business or operational outcomes. Examples include: reduced incident rates (for safety training), improved patient outcomes (for clinical-behaviour training), higher sales conversion rates (for sales training), reduced harassment complaints (for harassment-prevention training).

Level 4 is the most meaningful level and the hardest to measure. Isolating the contribution of a specific training intervention from all other variables that affect the outcome requires sophisticated analysis. Most organisations do not attempt Level 4 measurement for individual training programmes. Where Level 4 measurement is built into a training engagement, it typically focuses on one or two specific downstream metrics where the connection to the training behaviour is strong and traceable.
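
One common way to approach the isolation problem, where a comparable untrained group exists, is a difference-in-differences comparison. This technique is not named in the Kirkpatrick model itself; it is offered here as an assumption about how such an analysis might start. The function name and all figures below are hypothetical.

```python
def difference_in_differences(trained_before, trained_after,
                              comparison_before, comparison_after):
    """Estimate the training effect as the trained group's change
    minus the comparison group's change over the same period."""
    trained_change = trained_after - trained_before
    comparison_change = comparison_after - comparison_before
    return trained_change - comparison_change


# Hypothetical incident rates per 1,000 shifts (made up for illustration).
effect = difference_in_differences(
    trained_before=12.0, trained_after=8.0,         # sites that received training
    comparison_before=11.0, comparison_after=10.0,  # comparable untrained sites
)
print(f"Estimated change attributable to training: {effect:+.1f} incidents per 1,000 shifts")
```

The estimate is only as good as the comparison group: it assumes both groups would have followed the same trend without the training, which needs to be argued for each engagement rather than taken for granted.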

Why Level 3 is the Standard That Matters

For L&D buyers whose purpose is behaviour change rather than knowledge transfer, Level 3 is the only measurement standard that directly assesses whether the training achieved its purpose. The case rests on three structural arguments.

Argument 1: Level 1 and Level 2 data is systematically inflated by the Dunning-Kruger effect. Post-training satisfaction scores and confidence assessments are vulnerable to the Dunning-Kruger pattern (see our guide to the Dunning-Kruger effect): exposure to a topic produces inflated confidence that does not reflect actual capability. Level 1 and Level 2 data therefore systematically overstates training effectiveness when the objective is behaviour change.

Argument 2: Level 1 and 2 data does not differentiate effective from ineffective training designs. Both a rigorous immersive behaviour-change programme and a poor-quality awareness e-learning can produce high Level 1 satisfaction scores. The measurement standard does not discriminate between the two. Level 3 measurement does: it tells you which design actually changed behaviour and which did not.

Argument 3: Level 3 data supports procurement defensibility. For organisations under regulatory scrutiny (FCA conduct-and-culture, CQC well-led, Office for Students, HMICFRS), Level 3 behavioural evidence is more defensible than Level 1 satisfaction data. The Worker Protection (Amendment of Equality Act 2010) Act 2023, in force since October 2024, requires employers to take reasonable steps to prevent sexual harassment, and evidence of demonstrable behavioural change supports that duty more strongly than certificate completion.

How Sidestream Uses the Kirkpatrick Model

Sidestream's standard measurement commitment is Kirkpatrick Level 3 as the minimum, with Level 4 where the brief allows. This is the commitment that distinguishes our design from training designed to produce positive satisfaction surveys.

For each engagement, the Level 3 measurement framework is calibrated to the specific behavioural target during the diagnostic phase. Specific measures we use:

Speak-up frequency post-engagement versus pre-engagement baseline
Structured peer challenge frequency in leadership meetings
Disclosure-response quality observation (for harassment-prevention work)
Coaching question frequency in one-to-one meetings (for coaching-skills work)
Decision-documentation quality (for decision-making work)
Validated psychological safety scale scores (Edmondson 7-item scale) combined with observed behavioural data
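
As an illustration of the first measure in the list above, a pre-engagement versus post-engagement comparison can be as simple as an average rate per observed meeting. This is a minimal sketch under stated assumptions: the function names, the per-meeting unit of observation and the counts are all hypothetical, not Sidestream's actual instrument.

```python
def rate_per_meeting(event_counts):
    """Average number of observed events per meeting."""
    if not event_counts:
        raise ValueError("need at least one observed meeting")
    return sum(event_counts) / len(event_counts)


def change_from_baseline(baseline_counts, followup_counts):
    """Return (baseline rate, follow-up rate, absolute change)."""
    baseline = rate_per_meeting(baseline_counts)
    followup = rate_per_meeting(followup_counts)
    return baseline, followup, followup - baseline


# Hypothetical speak-up counts per observed meeting (made up for illustration).
pre_engagement = [1, 0, 2, 1]      # four meetings before the engagement
post_engagement = [3, 2, 2, 4, 3]  # five meetings after the engagement

baseline, followup, change = change_from_baseline(pre_engagement, post_engagement)
print(f"Speak-ups per meeting: {baseline:.1f} before, {followup:.1f} after ({change:+.1f})")
```

Using a rate rather than a raw count keeps the comparison fair when the number of observed meetings differs before and after the engagement.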

The six-week embedding architecture is the structural commitment that enables Level 3 measurement. By building in structured follow-through and behavioural observation after the training event, the measurement infrastructure is designed into the engagement rather than bolted on as an afterthought.

The New World Kirkpatrick Model (2016 Update)

James Kirkpatrick and Wendy Kayser Kirkpatrick's 2016 update introduced the "New World Kirkpatrick Model," which inverts the traditional sequence. Rather than evaluating training after the fact from Level 1 up, the 2016 update argues that effective training should be designed backwards from Level 4: start with the intended business results (Level 4), identify the behaviour change required to produce them (Level 3), design the learning content to build that behaviour (Level 2), and plan for engagement (Level 1).

The backwards design principle is well-established in instructional design generally. The Kirkpatrick update makes it explicit: training that is not designed with Level 3 and 4 outcomes as the starting point is unlikely to produce Level 3 and 4 outcomes as the result.

Related Sidestream Guides

What is the Dunning-Kruger Effect?, which explains why Level 1 and 2 data is unreliable
Immersive Training vs E-Learning, which maps the two methods against the Kirkpatrick levels
Behaviour Change Training: The Complete UK Guide
Organisational Behaviour Training London
Behavioural Design Workshop London
Glossary: 100 Behaviour Change Terms

Frequently Asked Questions

What is the Kirkpatrick model?

A four-level framework for evaluating training: Level 1 (Reaction), Level 2 (Learning), Level 3 (Behaviour), Level 4 (Results). Developed by Donald Kirkpatrick in 1959, updated in 2016.

What is Kirkpatrick Level 3?

The Behaviour level: whether participants changed their observable behaviour in actual work as a result of training. The most practically important level for behaviour-change training objectives.

Why is Kirkpatrick Level 3 rarely measured?

It requires observational data collected after the training event in the actual workplace, which demands more methodology and effort than post-training surveys. Most organisations and training providers default to Level 1 and 2.

What is the difference between Kirkpatrick Level 1 and Level 3?

Level 1 measures satisfaction with the training event. Level 3 measures actual behaviour change in real work. The two are often uncorrelated: high Level 1 scores do not predict Level 3 outcomes.

Who invented the Kirkpatrick model?

Donald L. Kirkpatrick, first published in 1959. Updated by James Kirkpatrick and Wendy Kayser Kirkpatrick in 2016.

The Kirkpatrick Model in Summary

The Kirkpatrick model evaluates training across four levels: Reaction, Learning, Behaviour and Results. For any programme whose purpose is behaviour change, Level 3 is the standard that counts, because it asks whether people actually act differently at work. Most training is measured only at Levels 1 and 2, which tend to overstate effectiveness and tell you little about real-world change.

Level 1 measures whether participants liked the training; Level 2 measures what they learned; Level 3 measures changed behaviour at work; Level 4 measures business or operational results.
Level 1 and Level 2 are often uncorrelated with Level 3, and confidence ratings overstate competence.
Level 3 is rarely measured because it needs observation in the workplace and follow-up after the event, not a satisfaction survey.
The 2016 New World update designs backwards from Level 4 results, building each lower level to serve the outcome.