Back to the future on secondary examinations?

Bethan Marshall and Margaret Brown

On September 17th 2012, Michael Gove announced further details of the proposed changes to the examination system at age 16. Apparently in order to obtain the agreement of his Liberal Democrat coalition partners, these significantly modified his earlier statement of June 21st, which had required a return to an O-level type of examination. The new English Baccalaureate Certificates (EBCs) in five core subjects are now intended to be designed not just for high attainers but for the same proportion of the school population that currently enters for GCSE (90-95% in mathematics and English). The consultation closed in December and we await the results.

The ministerial statements offer no real argument for why 16 is the appropriate age for costly external examinations used for accountability, when all students will now remain in education or training until age 18. Instead they focus on making the examinations ‘more rigorous’ in order to raise standards and compete internationally. But can we be sure that the EBC system will raise standards?

The call for more rigour seems likely to be interpreted both in terms of the style of examination, discussed later in this article, and in terms of the difficulty of subject content. Certainly the content of the new mathematics EBC first proposed by the DfE was even more ambitious than that of 1950s O-levels, which fewer than 25% of the population then passed. This presumably reflects a belief that the way to raise standards is to make the examined curriculum tougher. There is some justification for this, in that countries with demanding curricula generally score higher in international comparisons, but this is partly because parents are forced to help students who struggle with the curriculum, either by supporting their children themselves or by paying for private evening classes or personal tutors. However, when the Mathematics AS and A-level syllabus was made harder for Curriculum 2000, by a committee made up mainly of university mathematicians and awarding body examiners, the result was a much higher failure rate at AS level, followed by a significant drop in participation in mathematics AS and A2 courses. This was eventually reversed by removing or postponing most of the additional content, but recovery took many years, during which the supply of future scientists, engineers, statisticians and economists was reduced. The country cannot afford to repeat this fall in post-16 participation by toughening the content of the age 16 examinations.

But were O-level examinations always harder than GCSEs? Some of the work on ‘standards over time’ has confined itself to comparing the difficulty of examination questions, but the overall demands of the examinations are also worth comparing. The English Language O-level set by the old London Board, for example, demanded three things: a comprehension answered by multiple choice, a précis, and an essay which could be argumentative, descriptive or narrative. In the current examination set by Edexcel, candidates must complete two controlled assessment tasks, one focusing on comprehension and the other on writing; then an examination lasting an hour and three quarters on either a non-fiction text or a text from other cultures, an opinion piece and another writing task; and finally three speaking and listening tasks, one spoken language study, and one writing task chosen from speeches, stories with a focus on dialogue, and scripts.

The comparison is equally stark for English literature. For the London Board O-level, students had to be familiar with three texts (a Shakespeare play, a novel and some poetry) and answered on all of them in a two-hour examination. Now they must study two novels, one from the canon and one from another culture, as well as an anthology of poetry and both a Shakespeare play and a contemporary play. There is now also an emphasis on how language is used to create effects, in addition to discussion of character and plot.

Given that most students enter for both language and literature, they are being asked to do considerably more than candidates did before O-level gave way to GCSE in the late 1980s. Indeed, candidates for the former JMB O-level in English Language and Literature could avoid examinations altogether and gain the qualification through 100% coursework.

The fact that a wider range of performance is now demanded does not, of course, prove either that the examination is now ‘harder’ or that students’ standards of written English would be judged to have improved over time. The only way to check standards over time is to give successive cohorts of students the same test. This is what makes international comparisons so attractive to politicians: PISA, for example, has maintained a uniform standard since 2003 by calibrating its questions and using a common core of questions. Again, though, there are difficulties with the PISA test in English, as much of the work is decontextualised and multiple choice, neither of which is rated highly as an assessment method by the English teaching community (Marshall, 2011).

Using common tests over time to measure standards may not be possible in subjects where the curriculum or expectations have changed significantly, but it is possible in many parts of mathematics. For example, a 30-year comparison by Jeremy Hodgen et al. (2009, 2010, 2011) suggests that mathematical standards have dropped slightly, yet the proportion of O-level/GCSE passes at grades A*-C has more than doubled over that period, from about 23% to 58%. This is consistent with evidence of about a two-grade slippage in grading: a level of attainment that would now be awarded a grade A would in the early 1980s have been awarded an O-level grade C. Similar results were obtained for some aspects of science by Shayer et al. (2007, 2009).

Of course, it would be quite possible to maintain the current distribution of grades on a more difficult examination by lowering the grade boundaries (the percentage of the total marks needed to achieve each grade). This touches on the vexed issue of how to balance criterion-referencing and norm-referencing when awarding results during the transition to a new assessment system, an issue highlighted by the debacle over disappointing GCSE English grades in June 2012. When the style of assessment changes, as it did between June 2011 and January 2012, awarding bodies are still required to specify in advance the broad criteria for achieving each grade (criterion-referencing), which they then translate into a marking scheme for each component. However, once the marks are aggregated, this may well initially produce a very different distribution of grades from the previous year’s. Ofqual reasonably claim that a change in standards across a single year is unlikely, and therefore that using criterion-referencing alone in this way might lead to unfairness between candidates in adjacent years. They prefer to achieve continuity across such a change by requiring a distribution of final grades very similar to that of the previous year (norm-referencing), which may require a post-hoc revision of the grade boundaries. (There was a further complication in 2012 because the first of the new-style English GCSE examinations was taken in January, when the limited entry is unrepresentative and changes rapidly, rather than in the main June entry period.)
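The effect of lowering grade boundaries to preserve last year’s distribution can be illustrated with a toy calculation. The Python sketch below is a hypothetical illustration, not Ofqual’s or any awarding body’s actual procedure (real awarding also draws on examiners’ judgements of scripts); the function names, marks and target proportions are all invented for the example. It sets each boundary so that roughly the same cumulative share of candidates reaches each grade as in a reference year. It then shows that if every candidate scores ten marks lower on a harder paper, norm-referenced boundaries fall accordingly and the grade distribution is unchanged, whereas boundaries held fixed award far fewer grades.

```python
from collections import Counter


def norm_referenced_boundaries(marks, target_at_or_above):
    """Hypothetical sketch: set each grade boundary so that roughly the target
    share of this year's candidates scores at or above it.

    marks: raw total marks for this year's cohort.
    target_at_or_above: grade -> cumulative proportion of candidates who
        achieved that grade or better in the reference (previous) year.
    Tied marks at a boundary can push the achieved share slightly above target.
    """
    ranked = sorted(marks, reverse=True)
    boundaries = {}
    for grade, share in target_at_or_above.items():
        k = round(share * len(ranked))  # candidates who should reach this grade
        # Boundary = mark of the k-th best candidate; if nobody should reach
        # the grade, put the boundary just above the top mark.
        boundaries[grade] = ranked[k - 1] if k > 0 else ranked[0] + 1
    return boundaries


def award(mark, boundaries, fail_grade="U"):
    """Award the grade with the highest boundary that the mark reaches."""
    reached = [(boundary, grade) for grade, boundary in boundaries.items() if mark >= boundary]
    return max(reached)[1] if reached else fail_grade


# Invented marks for a cohort of ten, and the same cohort on a paper that is
# harder by ten marks for everyone.
last_year = [72, 65, 58, 51, 47, 40, 33, 29, 22, 15]
harder_paper = [m - 10 for m in last_year]

# Hypothetical reference distribution: 20% at A, 60% at C or better.
targets = {"A": 0.2, "C": 0.6}
fixed = {"A": 70, "C": 50}  # criterion-style boundaries held constant

norm = norm_referenced_boundaries(harder_paper, targets)
print("norm-referenced boundaries:", norm)
print("norm-referenced grades:", Counter(award(m, norm) for m in harder_paper))
print("fixed-boundary grades:", Counter(award(m, fixed) for m in harder_paper))
```

Because norm-referenced boundaries depend only on candidates’ rank order, the same cohort receives the same distribution of grades however hard the paper is. For the same reason, grades awarded this way cannot by themselves reveal a genuine rise or fall in national standards.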

The O-level examination on which Michael Gove seems to be modelling the new EBC was, for part of its life, norm-referenced, so the grades students received were predetermined to the extent that they had to fit the grade distribution of earlier years. After the introduction of GCSE there was a move to purely criterion-referenced assessment, so that any given grade would indicate to employers or others what a candidate had mastered. However, research studies, in particular those by Mike Cresswell (1996), suggested that even with apparently similar examinations, examiners could not easily set grade boundaries representing consistent standards from year to year, given that the papers themselves varied slightly in content and difficulty. Thus the only way to make the awarding process fair was to check the initial criterion-referenced results against previous years’ grade distributions and to alter them if there were significant differences. It is possible that Gove wants to return to a purely norm-referenced form of assessment. If so, students’ grades will be awarded only on the basis of how well they do in comparison with others, and EBCs will therefore be unable to measure changes in national standards. (As noted above, it is not clear in any case that GCSEs have succeeded in doing this.)

As noted earlier, the EBCs are clearly intended to be ‘more rigorous’ than GCSE not only in their subject content but also in their style, in particular through the removal of the options of being examined in ‘bite-sized’ modules, in an easier tier of papers aimed at a limited range of grades, or through teacher-marked controlled assessment or coursework.

The proposed removal of modular examinations has some disadvantages for student learning. At present, students can retake modules again and again until they get the grade they want. This is not unlike an MOT approach to exams, whereby you pass once you have addressed everything that was wrong on previous attempts. It is also a more formative approach, since each time students learn from their errors and put them right. Gove, however, wants everybody to take one set of exams in the summer of Year 11. This again is more in keeping with the old O-level, where all the papers were sat together.

Reverting to terminal exams will mean that the school year is less cluttered by the perpetual taking of tests, yet such a system puts great faith in a student’s ability to show what they can do under examination conditions. This affects what is known as the validity of exams. Although reliability may be easier to achieve in a terminal examination (the London Board O-level in English Language, for example, used multiple-choice comprehension tests to improve reliability), it seems unlikely that such narrow styles of examining are more valid tests of ability in English. Arguments about the validity of the examination system are particularly prevalent among English teachers (see, for example, Marshall, 2000, 2001 and 2011). Paul Black and Dylan Wiliam have also written extensively on both the reliability and the validity of terminal examinations (see, for example, Black, 1998 and Wiliam, 2001).

O-levels may also have been viewed as benign forms of assessment because they predated national tests and league tables. Many of the top public schools barely noticed them. Now everyone, including inspectors, prospective parents and universities, looks at GCSE results in detail. When the new examinations are introduced, teachers will be poring over the examination criteria, making sure that their lessons are geared towards passing the exams and focusing on the C/D borderline. So it is not clear why Gove claims that EBCs will improve the quality of education by reducing the tendency for teachers to teach to the test (GBPHCCSFC, 2008).

Yet Michael Gove dismisses as defeatist any suggestion that the increased rigour of EBCs might lead to a higher failure rate. His statement expresses faith that the increasing excellence of academies, heads and teachers will simply achieve similar pass rates under a tougher curriculum and assessment regime. This optimistic assertion about standards seems to rest on little solid evidence.

In reality, if the content and style of the age 16 assessments suddenly become more demanding without a compensating lowering of grade boundaries, we may see a far greater catastrophe than the one created in summer 2012. That will be a difficult clock to turn back.

There are alternatives. The current IGCSE in English, unlike the present GCSE, still includes coursework as one of its components, and, as with the old O-level, there should be different syllabuses offering different ways of delivering the course. The National Association for the Teaching of English will certainly support such a move, partly because of the objections, already mentioned, to the validity of a terminal examination alone. The range of subject matter that can be assessed is far greater if some form of coursework or controlled assessment is allowed.

There are certainly several problems with the existing GCSE in maths (e.g. a lack of interesting and challenging questions, which in turn affects the curriculum), but a new ‘matched pair’ of maths GCSEs is currently being trialled which could improve the current offering. Although the EBC proposal of a single examination for all (or at least the top 80%) sounds very attractive, it is technically impossible within two papers to include enough questions to differentiate reliably both between grades A and A* and between grades F and G. And even if it were possible, current practice of Year 6 coaching for a single set of maths papers suggests that students likely to scrape a level 3 become disillusioned at never being able to answer most questions on practice papers, while those likely to achieve a level 5 easily can waste a year practising what for them are mostly very easy questions. The even wider spread of attainment in Year 11 is likely to exacerbate the problem. A more flexible model could, however, be provided (e.g. three papers of increasing difficulty, in which all students take the middle one and choose whether to take one or both of the others). It is technically possible to set questions which differentiate by outcome in maths as well as in English, but a much longer assessment period, probably using controlled assessments, would be needed to obtain reliability.

To return to an earlier point, with the leaving age soon to rise to eighteen, it is strange that Gove should place so much emphasis on redesigning the examination for sixteen-year-olds. We are almost alone in the West in having such assessment at this age. Surely it would be better to reconsider the Tomlinson proposals, which concentrated on the fourteen to nineteen range and did not focus on an intermediate 16+ qualification carrying such weight. Pupils would then take a range of subjects, including vocational, technical and academic courses, with a core of functional English and Maths, up to the age of eighteen, as is done in other countries. This would create a different kind of baccalaureate, more akin to the one taken in many European countries, and one which we believe to be more appropriate for the 21st century.

Black, P. (1998) Testing: Friend or Foe? Theory and Practice of Assessment and Testing. London: Falmer Press.
Cresswell, M.J. (1996) Defining, setting and maintaining standards in curriculum-embedded examinations: judgemental and statistical approaches. In H. Goldstein & T. Lewis (eds.) Assessment: Problems, Developments and Statistical Issues (pp. 57-84). Chichester: John Wiley and Sons.
Great Britain Parliament House of Commons Children, Schools and Families Committee (2008) Testing and assessment: Government and Ofsted responses to the Committee’s third report of session 2007-08. Fifth special report of session 2007-08. House of Commons papers 1003, 2007-08. London: HMSO.
Hodgen, J., Küchemann, D., Brown, M. & Coe, R. (2009) Children’s understandings of algebra 30 years on. Research in Mathematics Education, 11(2), 193-194.
Hodgen, J., Küchemann, D., Brown, M. & Coe, R. (2010) Multiplicative reasoning, ratio and decimals: a 30 year comparison of lower secondary students’ understandings. In M. F. Pinto & T. F. Kawasaki (eds.) Proceedings of the 34th Conference of the International Group for the Psychology of Mathematics Education (Vol. 3, pp. 89-96). Belo Horizonte, Brazil.
Hodgen, J., Brown, M., Küchemann, D. & Coe, R. (2011) Why have educational standards changed so little over time: the case of school mathematics in England. Paper presented at the British Educational Research Association (BERA) Annual Conference, Institute of Education, University of London.
Marshall, B. (2000) English Teachers – The Unofficial Guide: Researching the Philosophies of English Teachers. London: RoutledgeFalmer.
Marshall, B. (2001) Marking the essay: teachers’ subject philosophies as related to their assessment. English in Education, 35(3), 42-57.
Marshall, B. (2011) Testing English: Summative and Formative Assessment in English. London: Continuum.
Shayer, M., Ginsburg, D. & Coe, R. (2007) Thirty years on – a large anti-Flynn effect? The Piagetian test Volume & Heaviness norms 1975-2003. British Journal of Educational Psychology, 77(1), 25-41.
Shayer, M. & Ginsburg, D. (2009) Thirty years on – a large anti-Flynn effect? (II): 13- and 14-year-olds. Piagetian tests of formal operations norms 1976-2006/7. British Journal of Educational Psychology, 79, 409-418.
Wiliam, D. (2001) An overview of the relationship between assessment and the curriculum. In D. Scott (ed.) Curriculum and Assessment. Washington: Library of Congress.