What is the Kirkpatrick Model?

The Four Kirkpatrick Levels: A Complete Breakdown

Donald L. Kirkpatrick first published his four-level model in 1959 in the Journal of the American Society of Training Directors. It was formalised in book form in 1994 as Evaluating Training Programs: The Four Levels and updated by James Kirkpatrick and Wendy Kayser Kirkpatrick in Kirkpatrick's Four Levels of Training Evaluation (2016).

Level 1: Reaction

Level 1 measures how participants responded to the training. Was it engaging? Was it relevant? Would they recommend it? Level 1 is typically captured through post-training satisfaction surveys, often called "smile sheets."

Level 1 is the easiest level to measure and the most commonly used training metric. It is also the level with the weakest connection to actual training effectiveness. Participant satisfaction does not indicate whether learning occurred, and it provides no evidence of behaviour change. Participants can enjoy training that changes nothing about how they work, and they can find training uncomfortable even when it produces sustained behaviour change.

Level 1 data is useful for assessing how the programme was delivered and received. It is not useful for assessing whether training achieved its purpose.

Level 2: Learning

Level 2 measures whether participants acquired the intended knowledge, skills or attitudes as a result of training. Level 2 is typically captured through post-training assessments, quizzes, or skills demonstrations immediately after the training event.

Level 2 is a more meaningful measure than Level 1 because it assesses acquisition rather than only reaction. A participant who passes a post-training quiz has demonstrated that they absorbed the content. However, Level 2 evidence does not indicate whether the participant will apply the learning in their actual work.

Level 2 is appropriate as the primary measurement standard when the objective is knowledge transfer: compliance facts, product information, process knowledge. It is insufficient as the primary standard when the objective is behaviour change.

Level 3: Behaviour

Level 3 measures whether participants changed their observable behaviour in their actual work as a result of training. Level 3 is the most practically important level for most L&D buyers because it directly answers the question behind most training investment: did people behave differently?

Level 3 measurement requires observational data collected in the actual workplace, typically 3 to 6 weeks after the training event when the behaviour change should be embedded. Methods include: observation of actual work behaviour, manager observation reports, peer observation, mystery-shopper approaches, and performance data that reflects the target behaviour.
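One way to turn observation reports into a Level 3 figure is to compare how often the target behaviour was seen before training with how often it was seen at follow-up. The sketch below is a minimal illustration under stated assumptions: the Observation record, the "baseline" and "follow_up" phase labels, and the sample numbers are all hypothetical, and the pre-training baseline is an assumption added here, not something the model itself prescribes.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    participant: str
    phase: str          # "baseline" (before training) or "follow_up" (weeks after)
    opportunities: int  # situations where the target behaviour was called for
    demonstrated: int   # situations where the observer recorded the behaviour


def behaviour_rate(observations, phase):
    """Share of observed opportunities in which the target behaviour occurred."""
    rows = [o for o in observations if o.phase == phase]
    opportunities = sum(o.opportunities for o in rows)
    if opportunities == 0:
        raise ValueError(f"no observed opportunities for phase {phase!r}")
    return sum(o.demonstrated for o in rows) / opportunities


def level3_change(observations):
    """Follow-up rate minus baseline rate, as a proportion."""
    return behaviour_rate(observations, "follow_up") - behaviour_rate(observations, "baseline")


# Illustrative numbers only, not real measurement data.
sample = [
    Observation("A", "baseline", opportunities=10, demonstrated=3),
    Observation("B", "baseline", opportunities=8, demonstrated=2),
    Observation("A", "follow_up", opportunities=10, demonstrated=7),
    Observation("B", "follow_up", opportunities=10, demonstrated=6),
]

print(f"baseline:  {behaviour_rate(sample, 'baseline'):.0%}")
print(f"follow-up: {behaviour_rate(sample, 'follow_up'):.0%}")
print(f"change:    {level3_change(sample):+.0%}")
```

Counting opportunities as well as demonstrations matters here: a raw tally of observed behaviours would rise simply because observers watched for longer at follow-up.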

Level 3 is rarely measured. CIPD surveys consistently find that fewer than 10% of UK organisations routinely measure training at Level 3. The most common reason is difficulty: Level 3 requires effort, methodology, and follow-up that most training providers do not offer and most organisations do not build into their procurement specifications.

Level 4: Results

Level 4 measures whether the behaviour change produced the intended business or operational outcomes. Examples include reduced incident rates (for safety training), improved patient outcomes (for clinical-behaviour training), higher sales conversion rates (for sales training), and reduced harassment complaints (for harassment-prevention training).

Level 4 is the most meaningful level and the hardest to measure. Isolating the contribution of a specific training intervention from all other variables that affect the outcome requires sophisticated analysis. Most organisations do not attempt Level 4 measurement for individual training programmes. Where Level 4 measurement is built into a training engagement, it typically focuses on one or two specific downstream metrics where the connection to the training behaviour is strong and traceable.
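One common, simplified way to separate a training effect from background trends is a difference-in-differences comparison: measure the change in a downstream metric for the trained group and subtract the change over the same period in a comparable untrained group. This is an illustrative technique chosen for this example, not a method the Kirkpatrick model specifies. The function name and the monthly incident counts below are hypothetical, and the estimate only holds if both groups would otherwise have followed similar trends.

```python
from statistics import mean


def difference_in_differences(trained_before, trained_after,
                              comparison_before, comparison_after):
    """Change in the trained group minus change in the comparison group."""
    trained_change = mean(trained_after) - mean(trained_before)
    comparison_change = mean(comparison_after) - mean(comparison_before)
    return trained_change - comparison_change


# Illustrative monthly incident counts only, not real data.
trained_before = [12, 14, 13]
trained_after = [8, 9, 7]
comparison_before = [11, 13, 12]
comparison_after = [10, 12, 11]

effect = difference_in_differences(
    trained_before, trained_after, comparison_before, comparison_after
)
print(f"estimated change attributable to training: {effect:+.1f} incidents per month")
```

A negative value means the metric fell further in the trained group than in the comparison group. With only a handful of months per group, the result is an indication rather than proof, which is one reason Level 4 work tends to focus on one or two metrics with a strong, traceable link to the trained behaviour.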

Why Level 3 is the Standard That Matters

For L&D buyers whose purpose is behaviour change rather than knowledge transfer, Level 3 is the only measurement standard that directly assesses whether the training achieved its purpose. The case rests on three structural arguments.

Argument 1: Levels 1 and 2 are systematically inflated by the Dunning-Kruger effect. Post-training satisfaction scores and confidence assessments are vulnerable to the Dunning-Kruger pattern (see our guide to the Dunning-Kruger effect): people with limited competence in a topic tend to overestimate their own ability, so brief exposure can produce confidence that outruns actual capability. Level 1 and Level 2 data therefore systematically overstate training effectiveness when the objective is behaviour change.

Argument 2: Level 1 and 2 data does not differentiate effective from ineffective training designs. Both a rigorous immersive behaviour-change programme and a poor-quality awareness e-learning can produce high Level 1 satisfaction scores. The measurement standard does not discriminate between the two. Level 3 measurement does: it tells you which design actually changed behaviour and which did not.

Argument 3: Level 3 data supports procurement defensibility. For organisations under regulatory scrutiny (FCA conduct-and-culture, CQC well-led, Office for Students, HMICFRS), Level 3 behavioural evidence is more defensible than Level 1 satisfaction data. The duty to take reasonable steps to prevent sexual harassment, introduced by the Worker Protection (Amendment of Equality Act 2010) Act 2023 and in force since October 2024, is read at the level of demonstrable behavioural change, not certificate completion.
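The claim in Argument 2 can be checked against an organisation's own records. If satisfaction scores told programmes apart, they would rise and fall with observed behaviour change across those programmes. The sketch below computes a plain Pearson correlation between the two. The pearson helper and the four programme figures are hypothetical and purely illustrative, and four data points are far too few to support a real conclusion.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / sqrt(spread_x * spread_y)


# One entry per programme. Illustrative values only, not real data.
satisfaction = [4.6, 4.5, 4.7, 4.4]          # mean Level 1 score out of 5
behaviour_change = [0.30, 0.05, 0.10, 0.35]  # Level 3 change in observed rate

print(f"Level 1 vs Level 3 correlation: {pearson(satisfaction, behaviour_change):.2f}")
```

A coefficient near zero, or a negative one, would suggest that satisfaction scores are not distinguishing the programmes that changed behaviour from those that did not.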

How Sidestream Uses the Kirkpatrick Model

Sidestream's standard measurement commitment is Kirkpatrick Level 3 as the minimum, Level 4 where the brief allows. This is the commitment that distinguishes our design from training designed to produce positive satisfaction surveys.

For each engagement, the Level 3 measurement framework is calibrated to the specific behavioural target during the diagnostic phase. Specific measures we use:

The six-week embedding architecture is the structural commitment that enables Level 3 measurement. Because structured follow-through and behavioural observation are built in after the training event, the measurement infrastructure is designed into the engagement rather than bolted on as an afterthought.

The New World Kirkpatrick Model (2016 Update)

James Kirkpatrick and Wendy Kayser Kirkpatrick's 2016 update set out the "New World Kirkpatrick Model," which inverts the traditional sequence. Rather than evaluating training after the fact from Level 1 up, the 2016 update argues that effective training should be designed backwards from Level 4: start with the intended business results (Level 4), identify the behaviour change required to produce them (Level 3), design the learning content to build that behaviour (Level 2), and plan for engagement (Level 1).

The backwards design principle is well-established in instructional design generally. The Kirkpatrick update makes it explicit: training that is not designed with Level 3 and 4 outcomes as the starting point is unlikely to produce Level 3 and 4 outcomes as the result.

Frequently Asked Questions

What is the Kirkpatrick model?

A four-level framework for evaluating training: Level 1 (Reaction), Level 2 (Learning), Level 3 (Behaviour), Level 4 (Results). Developed by Donald Kirkpatrick in 1959, updated in 2016.

What is Kirkpatrick Level 3?

The Behaviour level: whether participants changed their observable behaviour in actual work as a result of training. The most practically important level for behaviour-change training objectives.

Why is Kirkpatrick Level 3 rarely measured?

It requires observational data collected after the training event in the actual workplace, which demands more methodology and effort than post-training surveys. Most organisations and training providers default to Level 1 and 2.

What is the difference between Kirkpatrick Level 1 and Level 3?

Level 1 measures satisfaction with the training event. Level 3 measures actual behaviour change in real work. The two are often uncorrelated: high Level 1 scores do not predict Level 3 outcomes.

Who invented the Kirkpatrick model?

Donald L. Kirkpatrick, first published in 1959. Updated by James Kirkpatrick and Wendy Kayser Kirkpatrick in 2016.