Why do History Departments struggle? What can you do about it? Part 3 — Bad Data and letting the score take care of itself.

Kristian Shanks
17 min read · Jun 28, 2020


This is the third part of my occasional series exploring why History Departments sometimes struggle. I hope these are of use to prospective and incumbent Heads of Department as they navigate through the sometimes choppy waters of subject leadership. Part one — on the ‘revolving door’ of staffing is here, and part two — on the issue of resourcing, is here.

In many schools, you will find yourself, as Head of Department, confronted with lots and lots of data. Indeed, I think a number of years ago, until what might be termed the Counsell Revolution, managing data was perhaps seen as the pre-eminent role of the Head of Department. Being a Curriculum Leader who actually led the curriculum for your subject certainly was not really a thing outside of, perhaps, the independent sector, grammar schools and a few islands of practice around the country.

Instead, crucial to the role was analysing data. You, of course, knew this already, because no doubt one of the tasks you had to complete at interview was analysing some of the school's internal data about your department. You had to show in an hour that you could get your head round a big spreadsheet and identify the actions you would take — as if those numbers alone could provide insight into the problems or challenges the department was facing (also — I wish they'd only give me an hour to do my actual exam analysis, because it would save a load of work!). For many History people this is particularly troublesome — we're not, all of us, exactly renowned for our Maths skills.

Unfortunately, as a Head of History you will find that there is a big problem with data in many schools. There is a lot of bad data around that you’re expected to spend a lot of time analysing, and not all that much good data that actually helps you, unless you know where to look. You’ll find yourself presented with bad data by the Senior Leadership Team, and you’ll also find yourself presented with bad data from some of the teachers in your own department. You’ll find that much of the data you look at focuses on the quality of the outputs rather than the inputs. Let’s have a look at some of the issues with data that you may encounter.

ISSUE 1 — The data produced by your department is flawed.

One of the things you may encounter is that your department has a reputation, most likely for predicting too generously and then not delivering on those predictions. Now, the whole culture of predicted grades has got a bit out of hand — it seems in some schools that if you're not Mystic Meg then you're not doing your job properly. But clearly, if as a department you've predicted, I don't know, 90% 9–4 and your kids have only come out with 60% 9–4, then that's an issue.

However, there might be a number of reasons for this. Yes, colleagues may have been generous, but that is sometimes reflective of a management culture where ‘questions are asked’ if internal predictions are seen as too low. Indeed, I have been in schools where I have been encouraged to make sure predictions were high because OFSTED were due and we wanted to make sure they’d see an upward trend in our results (hence the right decision to remove internal data considerations from the new framework).

Another issue might be poor internal assessment. What are the mock exams like? Do they give you a good idea of how students would do in the real thing? Or are they made too gentle, which leads to students and teachers having an over-inflated view of performance? Are you or your teachers indicating what topics will come up in advance? Are students encountering challenging enough work that will give you a real sense of how everyone is doing? I’m not suggesting that you should go for ‘death by exam question’ — far from it — but you need ways of seeing whether students are able to sit and write in an extended and knowledgeable way in silent conditions. Mock exams themselves can be problematic if not handled with care, as I’ll discuss later on.

For me, when it comes to predictions, I err on the side of caution. I tend to think if we’ve slightly under-predicted and then exceed expectations I’m pretty happy with that. I think that also reflects how History works in terms of the mythical concept of ‘progress’. Students don’t progress through grades in History, or indeed any subject, on some linear flightpath. Students to me are building to their peak performance in those summer exams. They are far better prepared for the summer exams than any mock exam they could do — so they should do, on average, a bit better in those than in the mocks (if your mocks are challenging enough). Of course the trick is to ensure motivation is maintained through the inevitable setbacks some students will face.

ISSUE 2 — The data given to you about your department is flawed.

Early in the new academic year you’ll have to do your exam analysis. In some schools you’ll be confronted with stacks of numbers downloaded from SISRA or some other data-management tool. These are often not terribly helpful numbers and don’t reveal much that you probably didn’t already know. Let’s have a look at some of the things you’ll get given and why they aren’t great.

1. Residual scores — how well did students do in your subject compared to everyone else? These scores are easily skewed when looked at collectively — especially if you’ve got one open-bucket, soft-option filler qualification where everyone does very well, which will suppress your own department’s residual score. It is perhaps more useful to look at individual students and think about how they did in your subject versus their others.
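If you do want that per-student view, it doesn’t take long to build yourself. Here is a minimal sketch in Python using pandas; the students, subjects and grades are invented placeholders, and the export from your own data system will obviously look different.

```python
# Per-student residuals: how a student's History grade compares with the
# average of their other subjects. All names and grades below are invented.
import pandas as pd

# One row per student, one column per subject (GCSE 9-1 grades).
grades = pd.DataFrame({
    "Student": ["A", "B", "C"],
    "History": [6, 4, 8],
    "English": [5, 6, 7],
    "Maths":   [7, 5, 9],
    "Science": [6, 5, 8],
})

# Mean of each student's other subjects, then the gap to their History grade.
other_subjects = grades[["English", "Maths", "Science"]].mean(axis=1)
grades["HistoryResidual"] = grades["History"] - other_subjects

# Positive = the student did better in History than in their own other subjects.
print(grades[["Student", "History", "HistoryResidual"]])
```

That per-student column is far more actionable than a single whole-cohort residual figure, because it isn’t dragged around by how the rest of the school’s subjects happen to have performed.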

2. Gender/Ethnicity/PP breakdowns. Again — not very helpful, especially as you often get this in isolation from everyone else’s unless you go digging around for that information yourself. There’s also not a lot you can meaningfully do with that information. Take the issue of gender. Is the solution to boys doing worse than girls to teach boys differently, as a homogeneous block? While I’m improving boys’ results, should I just ignore trying to help girls? Or should I do things that will help boys but harm the girls? It’s a nonsense. This is a national problem, one you are unlikely to solve on your own, although you may have a fluke year from time to time. What these distinctions do is take your focus off the goal — which is to try to ensure every student does better than they might otherwise have done. If I implement some change that increases results for all, but maintains a boy-girl gap, then that to me is much more of a success than it is a failure. PP/Disadvantage is another especially problematic category. Some students who ‘should’ be PP aren’t, for various reasons. Not all PP students are the same. One-size-fits-all solutions aren’t going to work here. Additionally — arguably you’d be better off not knowing who the PP kids are, given the internal biases that we all hold. On the issue of teaching boys and disadvantaged pupils, Matt Pinkett and Mark Roberts’ excellent Boys Don’t Try is the seminal work here, and they slaughter various sacred cows far better than I could ever do. Buy it and read it, if you haven’t already.

3. HAP/MAP/LAP breakdowns. I think there’s a little bit more merit here than in the previous breakdowns, but not much. The problem is that the groupings are too broad (the definition of ‘MAP’ encompasses a huge proportion of the student population) and, depending on your context, you may find you have very few of one or more of those groups (at my school we have only a handful of LAP students taking History each year).

4. Comparisons with the national picture — this can be in terms of the SISRA attempt to create a ‘subject P8’ (the SPI score), or perhaps comparing your grades with the expected grades via FFT20 or something like that. In the former case, it should be treated as a guide rather than something concrete — and it often isn’t presented with the upper and lower confidence intervals that really ought to be there (indeed, failure to understand confidence intervals is a big issue across a lot of how school data is presented, I find). In the latter case, the FFT20 score is fine in some ways, but it is generated from English and Maths achievement in Y6 — not necessarily helpful for History, and certainly not for subjects like Art or PE, which can be really punished by this system. Unfortunately, this type of information is rarely looked at with the nuance it needs.
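To give a sense of what those missing confidence intervals would look like, here is a rough sketch that puts a 95% interval around a made-up set of per-student residuals using a plain normal approximation. To be clear, this is not how SISRA actually calculates its SPI score; it’s only an illustration of how wide the uncertainty is for a typical cohort size.

```python
# A rough illustration of why a headline 'subject score' needs an interval
# around it. The residuals are invented; the method is a plain
# normal-approximation 95% interval, not any data provider's actual calculation.
import math

residuals = [0.4, -0.2, 0.1, -0.5, 0.3, 0.0, -0.1, 0.6, -0.3, 0.2]  # one per student
n = len(residuals)
mean = sum(residuals) / n
sd = math.sqrt(sum((r - mean) ** 2 for r in residuals) / (n - 1))
half_width = 1.96 * sd / math.sqrt(n)  # ~95% interval, normal approximation

print(f"Subject residual: {mean:+.2f} "
      f"(95% CI roughly {mean - half_width:+.2f} to {mean + half_width:+.2f})")
# With a small cohort the interval is wide: the headline number alone tells you little.
```

Run that on a class-sized cohort and the interval can easily span a sizeable chunk of a grade, which is exactly why a bare point score shouldn’t be treated as something concrete.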

5. Exam board information — In some schools, you are left to your own devices when it comes to getting information from the exam boards. It is literally astonishing to me that I have not seen a single piece of training for middle leaders about using exam-board results data like the Edexcel ResultsPlus service or whatever. The information here tends to be much more helpful than what you get from your SLT, as it shows you how students actually did on actual exam questions — yet this is something you’re expected to sort out completely on your own. This information was great for me in 2018, coming in as a new HoD, as I could see straight away that the students had issues with the interpretations aspect of the Germany Paper 3 and how to approach those silly questions. I could also see that Paper 2 was a major problem. On Paper 2 the following year, although our results still weren’t flashy, I could at least see that we were around the national average on that paper — so it looked worse than it perhaps was in reality.

ISSUE 3 — The data about the process is either limited, flawed or non-existent.

So one of the issues I find is that a lot of the data we get is ‘end product’ data in some form — either actual end-of-GCSE data, or an attempt to guess the end-of-GCSE data from some internal inputs. However, what is more tricky for Heads of History to get at is good data about the inputs. There are a number of issues.

1. Attendance data. This is one of my absolute pet hates. Attendance strongly correlates with achievement (or perhaps lack of attendance strongly correlates with lack of achievement). Yet the measuring of it is atrocious in a lot of schools. While we can get figures on a child’s attendance percentage by session, it’s much harder to get the percentage attendance for a child to their actual lessons, broken down by subject. You can get a version of this information via SIMS, although not in a terribly helpful format (and lots of schools are moving away from SIMS, I think due to cost). But the accuracy of that data is a problem. How many times do you see a child marked present for your lesson when they are actually in Isolation, or ‘having a chat’ with the pastoral leader, or just wandering the corridors? You may, in fact, have marked them as ’N’ — only to see that mark retrospectively changed by somebody else who could account for their whereabouts — BUT THEY WERE NOT IN THE LESSON! Children are often marked present in the sense that they are physically in the building, but they are not in your lesson. So what, you may ask? Well, if you’re in a struggling department then you need this information to back up your case for why some students may have underachieved. It’s amazing to me how little effort is put into getting this right. Rant over.
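If you can get a lesson-by-lesson export of register marks out of your MIS, the per-subject figure is straightforward to calculate yourself. The sketch below is an illustration only: the column names and the set of codes treated as ‘present’ are assumptions, so check them against whatever your own system actually exports (and, of course, the export will only show the final mark, not whether it was changed after the event).

```python
# Per-subject lesson attendance from a flat export of register marks.
# Column names and the 'present' codes are assumptions -- check your own MIS export.
import pandas as pd

marks = pd.DataFrame({
    "Student": ["A", "A", "A", "B", "B", "B"],
    "Subject": ["History", "History", "Maths", "History", "History", "Maths"],
    "Mark":    ["/", "N", "/", "/", "/", "L"],
})

PRESENT = {"/", "\\", "L"}  # treat present and late as 'in the lesson'; everything else as absent

marks["InLesson"] = marks["Mark"].isin(PRESENT)
by_subject = (marks.groupby(["Student", "Subject"])["InLesson"]
                   .mean()
                   .mul(100)
                   .round(1))
print(by_subject)  # % of that subject's lessons each child was actually marked into
```

Even a rough version of this gives you something concrete to point at when you’re asked why a particular student underachieved.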

2. Effort grades. There are a number of issues with effort grades. First of all, there’s often little to no attempt to ensure that the data provided here is of any quality. There’s no attempt to check that we all know what a ‘2’ out of 5 actually looks like. There’s also often very little done with this data at a whole-school level (other than sharing it with parents), which often makes me wonder why we bothered collecting it in the first place — I think this is because Senior Leaders are often drowning in data themselves (sometimes a self-inflicted problem) and lack time. Sometimes you are asked to provide effort grades for lots of different things (Resilience! Independent learning! Homework! Behaviour!) that are actually quite hard to separate out in a granular way or make a meaningful comment on (I think just one ATL grade is preferable, personally). Often issues about low effort grades for individual students are batted back to departments rather than leading to whole-school solutions — you get 11 different teachers working independently to help Student X rather than everyone working together on some sort of coherent plan.

3. Current and predicted grades. OK, so this is another bugbear — we’re often asked to input a ‘current’ grade as well as a ‘predicted’ grade. This happens in, I think, every school I’ve worked in. The issue, to cut a long story short, is that current grades are bollocks. In no sense is a student in April of Year 10 ‘currently’ a 6 in anything when they might only have been taught 40% of the content! I think a big issue here is the difference between subjects. One Maths department I knew used to set kids the whole GCSE as a mock exam, say in Year 10, even when they hadn’t taught the whole course. This generated a ‘current’ grade, with the predicted grade then extrapolated from that. I think it’s a silly idea, but I can see a logic to it in Maths. Of course, in History this would be completely stupid. You wouldn’t give kids all three GCSE papers in Year 10, when they hadn’t been taught any of the content for Paper 3 yet! You only set assessments on what you’ve taught. But then this blurs the line between a current and a predicted grade quite heavily. Frankly, current grades should go in the bin. What we’re bothered about is what we think the student will get when they sit the exam for real.

Another issue is that we don’t like inexactitude. We want to predict the exact grade a student WILL get. In some schools, not only do you predict the exact grade, but you then have to ‘fine grade’ at an even more granular level. This to me is one of those things that is desirable but isn’t actually possible (I stole that line from someone, can’t remember who, but it’s a good one). I can’t reasonably say in the Spring of Year 12 that a student is going to get a ‘high B’ or a ‘C-’. So why ask? I actually think we need to be less exact. I can say about a student in the Spring of Year 12 that they are likely to be, maybe, an A or a B. At the end of Year 9 (we do a three-year KS4), I can say that I think a student will be somewhere between a 7 and a 9, and feel pretty secure about that. That to me provides more useful information than saying they are an 8a! And yes, it may feel like hedging in some cases — but all students go into the exam for my GCSE History course on 0 out of 168. The range of marks covered by a ‘7’ might be something like 14 marks. That’s not a lot — and we know some students are likely to have volatile performance for all sorts of reasons. So hedging in this context seems to present useful information.

4. Mock exam and internal assessment data. We’re often asked to do things like ‘Question Level Analysis’ on mock data. This sounds perfectly reasonable — however, for History it does present a problem. As a subject, only a very small amount of the domain of knowledge is actually assessed. Knowing that students struggled with a question on why Hitler became Chancellor in 1933 is useful — but given that I’ve probably used a recent exam paper, that question is unlikely to come up on the actual exam students are going to sit. So do I spend lots of time going back over that topic? What I need to do — and there’s a brilliant blog by Adam Boxer about this with regards to Science — is to try to extrapolate what larger problems are revealed by the student responses to particular questions. What else can I work out that they don’t know about Germany, if they don’t know very well why Hitler was appointed Chancellor? (There’s also another excellent blog by Matthew Benyohai on mock exams and assessment, also from a Science background, here.)
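One small, practical step in that direction is to roll the question-level marks up to topic level, so the unit of diagnosis becomes an area of the specification rather than one specific question. The sketch below does this for a handful of invented questions and marks; the question-to-topic mapping and the numbers are made up purely for illustration.

```python
# Roll question-level analysis up to topic level: % of available marks achieved
# per topic, rather than per question. Questions, topics and marks are invented.
import pandas as pd

qla = pd.DataFrame({
    "Question":  ["Q1", "Q2", "Q3", "Q4"],
    "Topic":     ["Weimar recovery", "Rise of Hitler", "Rise of Hitler", "Nazi control"],
    "MaxMark":   [4, 12, 8, 16],
    "ClassMean": [2.8, 4.1, 3.0, 9.5],  # mean mark achieved by the class on each question
})

by_topic = qla.groupby("Topic")[["ClassMean", "MaxMark"]].sum()
by_topic["PctOfMax"] = (by_topic["ClassMean"] / by_topic["MaxMark"] * 100).round(1)

# Weakest areas first: re-teach the topic, not the one question that happened to come up.
print(by_topic["PctOfMax"].sort_values())
```

It’s still only a proxy (a topic might look weak because one question on it was unusually hard), but it at least points you towards areas of content rather than individual questions that won’t reappear.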

Another issue can be the focus on ‘question type’ over content knowledge. The narrative account question is often considered highly problematic by those of us following the Edexcel course. In the first year of this course, the board set a question on the Indian Wars 1862–4 — a question that led to an average national score of about 2 out of 8. Now — is the conclusion that students couldn’t do ‘narrative account’ questions, or that they knew sod all about the Indian Wars 1862–4? My feeling here was the latter, given that this was a new addition to the topic for the new specifications and many centres, I think, basically didn’t teach it or assumed ‘it wouldn’t come up’.

Other problems, of course, include that mock exams aren’t the same as real exams. I think a great myth is that doing more mock exams means students will be better prepared for the real thing. Alex Quigley has blogged about this issue with regards to English. This idea of doing lots of mock exams is a PiXL thing in particular, I believe — they’ve given them the ghastly name of PPEs, as if there weren’t already enough acronyms in teaching. Unfortunately, some students don’t prepare properly unless it’s a real exam, and many students spend hours in exam halls with their heads on the desk, which seems like a total waste of teaching time. I’m increasingly of the view that there should only be one set of mocks in Year 11, and that these need to be condensed into a short time frame and come just after a set of holidays — perhaps October or February. You’re also missing so much teaching time — some students are doing three rounds of mocks, maybe three lots of two-week mock windows, from the summer of Year 10 to the Spring of Year 11. That’s a lot of time, given that any lessons that do take place in mock exam windows are often highly disrupted. It seems to me that the focus on doing lots of mock exams is often more about generating more ‘accurate’ data than actually facilitating school improvement.

What you can do, I think, is draw broader conclusions. We have a good idea in my department, from both internal and external data, that our students currently have a problem with the Medicine through Time section. They are not performing as well on that section as their peers nationally, and the work I see supports that suggestion. So as a department we’ve been working hard on trying to correct this. We also know the American West is a problem — but there I know we’re not doing that badly compared to the rest of the country. The issue is more the psychology of students at the lower end of attainment giving up because it just seems impossible.

5. Data and information about teacher effectiveness. Thankfully, we are moving away from graded lesson observations, which have been thoroughly shown by the evidence to be highly flawed (see David Didau’s 2014 blog here as one example) — although do watch out for ‘graded lessons’ by another name, where features of good teaching are listed and a ‘gold/silver/bronze’ or ‘exceeding/meeting/developing’ approach is taken to each of them. Here you actually have lots of meaningless, highly flawed grades rather than just one. Other people, as referenced above, have blogged well on the problems of measuring teacher effectiveness — it’s really hard to do, particularly over a short time frame. Past exam results are often used to make judgements about teachers, but again this is highly problematic where data about someone is taken in isolation. If someone’s classes have performed poorly, by comparison to other teachers of similar ability groupings of pupils, in the same subject, for a long period of time, then you can probably say there’s an issue (which might be fixable). Otherwise, I think you have to be very careful. Unfortunately, lesson observation cultures prioritise ‘the cult of the single lesson’ over measuring the quality of teaching over an extended period of time. That’s because it’s much easier to do the former than the latter.

ISSUE 4 — Data often means you lose sight of what will actually improve your department.

As Curriculum Leaders, we often find that data ends up sending us off in all sorts of different directions. It can be a real distraction. Tom Sherrington’s masterful blog on the perils of ‘macro-summative attainment tracking’ makes this point. Frankly, if there was a massive, irretrievable data loss and all of our pupil tracking info was gone, it wouldn’t make the blindest bit of difference to student achievement — there’s probably some potential that results would actually go up rather than down! Ultimately, you need to keep your eyes on the prize: what things are going to make the biggest difference to my students’ achievement in this subject?

The chances are that the answer to that question remains fairly constant, no matter what different bits of data throw up. Some things (Do students attend my lessons? How well can they read and write? Does the school have an effective policy for managing student behaviour?) are somewhat out of your control as a secondary school History teacher with maybe 2–3 hours a week of contact time with a GCSE class. Some things (What approaches do we use in the classroom, and are they generally effective? Are we teaching the correct content for this specification? Do my teachers know the content and assessment requirements they are working towards well enough? Does my KS3 curriculum prepare students for KS4? Do teachers in my department follow the school behaviour policy properly in issuing sanctions and rewards?) are more within your control. This is why some of the most effective schools in the country, like Michaela School or the Dixons schools in Bradford, are so focused on getting their systems, processes and culture right, day in and day out, at every level of the school, rather than fussing, at least at Head of Department level, over burdensome and fairly meaningless data analysis (or at least that is how it seems from the outside).

The best data is often the qualitative data collected through your daily interactions with students — working out which bits of the course they’ve got, and which bits they haven’t. Looking at a sample of student work and realising they are not yet secure on the key knowledge for the rebellions against William I in the late 1060s. Doing some retrieval practice and finding they’re all struggling to remember who the guy was who developed cattle ranching on the Plains. You don’t need a jazzy spreadsheet for that (and I love a jazzy spreadsheet).

Focus on those things in your department and make sure they’re done really well — they are constants. Those priorities don’t change regardless of whatever the data says. Frankly, you haven’t got the time in this job to faff with loads of this other stuff where the opportunity cost isn’t favourable. If you focus on those key things all the time, then, as the great NFL (American football for non-sports types) coaching legend Bill Walsh once said, ‘the score takes care of itself’.
