Thoughts on managing variability: Powerful percentages

Numbers are powerful, statistics are powerful, but they must be used correctly and responsibly. Leaders need to use data to help take decisions and measure progress, but leaders also need to make sure that they know where limitations creep into data, particularly when it's processed into summary figures.

This links quite closely to this post by David Didau (@Learningspy) where he discusses availability bias - i.e. being biased because you're using the data that is available rather than thinking about it more deeply.

As part of this there is an important misuse of percentages that as a maths teacher I feel the need to highlight... basically when you turn raw numbers into percentages it can add weight to them, but sometimes this weight is undeserved...

Percentages can end up being discrete measures dressed up as continuous
Quick reminder of GCSE data types - Discrete data is in chunks, it can't take values between particular points. Classic examples might be shoe sizes where there is no measure between size 9 or size 10, or favourite flavours of crisps where there is no mid point between Cheese & Onion or Smoky Bacon.

Continuous data can have sub divisions inserted between them, for example a measure of height could be in metres, centimetres, millimetres and so on - it can keep on being divided.

The problem with percentages is that they look continuous - you can quote 27%, 34.5%, 93.2453%. However the data used to calculate the percentage actually imposes discrete limits to the possible outcome. A sample of 1 can only have a result of 0% or 100%, a sample of 2 can only result in 0%, 50% or 100%, 3 can only give 0%, 33.3%, 66.7% or 100%, and so on. Even with 200 data points you can only have 201 separate percentage value outputs - it's not really continuous unless you get to massive samples.

It LOOKS continuous and is talked about like a continuous measure, but it is actually often discrete and determined by the sample that you are working with.

Percentages as discrete data makes setting targets difficult for small groups
Picture a school that sets an overall target that at least 80% of students in a particular category (receipt of pupil premium, SEN needs, whatever else) are expected to meet or exceed expected progress.

In this hypothetical school there are three equivalent classes, let's call them A, B and C. In class A we can calculate that 50% of these students are making expected progress; in class B it's 100%, and in class C it's 0%. On face value Class A is 30% behind target, B is 20% ahead and C is 80% behind, but that's completely misleading...

Class A has two students in this category, one is making expected progress, the other isn't. As such it's impossible to meet the 80% target in this class - the only options are 0%, 50% or 100%. If the whole school target at 80% accepts that some students may not reach expected progress then by definition you have to accept that 50% might be on target for this specific class. You might argue that 80% is closer to 100% so that should be the target for this class, but that means that this teacher as to achieve 100% where the whole school is only aiming at 80%! The school has room for error but this class doesn't! To suggest that this teacher is underperforming because they haven't hit 100% is unfair. Here the percentage has completely confused the issue, when what's really important is whether these 2 individuals are learning as well as they can?

Class B and C might each have only one student in this category. But it doesn't mean that the teacher of class B is better than that of class C. In class B the student's category happens to have no significant impact on their learning in that subject, they progress alongside the rest of the class with no issues, with no specific extra input from the teacher. In class C the student is also a young carer and misses extended periods from school; when present they work well but there are gaps in their knowledge due to absences that even the best teacher will struggle to fill. To suggest that either teacher is more successful than the other on the basis of this data is completely misleading as the detailed status of individual students is far more significant.

What this is intended to illustrate is that taking a target for a large population of students and applying it to much smaller subsets can cause real issues. Maybe the 80% works at a whole school level, but surely it makes much more sense at a class level to talk about the individual students rather than reducing them to a misleading percentage?

Percentage amplifies small populations into large ones
Simply because percent means "per hundred" we start to picture large numbers. When we state that 67% of books reviewed have been marked in the last two weeks it conjures up images of 67 books out of 100. However that statistic could have been arrived at having only reviewed 3 books, 2 of which had been marked recently. The percentage give no indication of the true sample size, and therefore 67% could hide the fact that the next step better could be 100%!

If the following month the same measure is quoted as having jumped to 75% it looks like a big improvement, but it could simply be 9 out of 12 this time, compared to 8 out of 12 the previous month. Arithmetically the percentages are correct (given rounding), but the apparent step change from 67% to 75% is actually far less impressive when described as 8/12 vs 9/12. As a percentage it suggests a big move in the population; as a fraction it means only one more meeting the measure.

You can get a similar issue if a school is grading lessons/teaching and reports 72% good or better in one round of reviews, and then sees 84% in the next. (Many schools are still doing this type of grading and summary, I'm not going to debate the rights and wrongs here - there are other places for that). However the 72% is the result of 18 good or better out of 25 seen, the 84% is the result of 21 out of 25. So the 12% point jump is due to just 3 teachers flipping from one grade to the next.

Basically when your population is below 100 an individual piece of data is worth more than 1% and it's vital not to forget this. Quoting a small population as a percentage amplifies any apparent changes, and this effect increases as the population size shrinks. The smaller your population the bigger the amplification. So with a small population a positive change looks more positive as a percentage, and a negative change looks more negative as a percentage.

Being able to calculate a percentage doesn't mean you should
I guess to some extent I'm talking about an aspect of numeracy that gets overlooked. The view could be that if you know the arithmetic method for calculating a percentage then so long as you do that calculation correctly then the numbers are right. Logic follows that if the numbers are right then any decisions based on them must be right too. But this doesn't work.

The numbers might be correct but the decision may be flawed. Comparing this to a literacy example might help. I can write a sentence that is correct grammatically, but that does not mean the sentence must be true. The words can be spelled correctly, in the correct order and punctuation might be flawless. However the meaning of the sentence could be completely incorrect. (I appreciate that there might be some irony in that I may have made unwitting errors in this sentence about grammar - corrections welcome!)

For percentage calculations then the numbers may well be correct arithmetically but we always need to check the nature of the data that was used to generate these numbers and be aware of the limitations to the data. Taking decisions while ignoring these limitations significantly harms the quality of the decision.

Other sources of confusion
None of the above deals with variability or reliability in the measures used as part of your sample, but that's important too. If your survey of books could have given a slightly different result if you'd chosen different books, different students or different teachers then there is an inherent lack of repeatability to the data. If you're reporting a change between two tests then anything within test to test variation simply can't be assumed to be a real difference. Apparent movements of 50% or more could be statistically insignificant if the process used to collect the data is unreliable. Again the numbers may be arithmetically sound, but the statistical conclusion may not be.

Draw conclusions with caution
So what I'm really trying to say is that the next time someone starts talking about percentages try to look past the data and make sure that it makes sense to summarise it as a percentage. Make sure you understand what discrete limitations the population size has imposed, and try to get a feel for how sensitive the percentage figures are to small changes in the results.

By all means use percentages, but use them consciously with knowledge of their limitations.

As always - all thoughts/comments welcome...

Thoughts on managing variability

Saturday, 14 June 2014

Powerful percentages

No comments:

Post a Comment