Portuguese university students’ conceptions of assessment: taking responsibility for achievement

Twenty years after the Bologna Declaration, Portuguese universities claim to be implementing student-centred and student-involved assessment practices. Student conceptions of assessment matter when innovations and reforms in assessment practices are being implemented. This study is part of a larger research project entitled "Assessment in higher education: the potential of alternative methods" funded by the Portuguese Foundation for Science and Technology (Government Funding Agency) (PTDC/MHCCED/2703/2014). This paper surveys a large sample (N = 5549) of Portuguese students in five public universities with a Portuguese version of the Students Conceptions of Assessment (SCoA-VI) inventory, previously validated in Brazil. Confirmatory factor analysis recovered the eight SCoA factors reported in the Brazilian context. Differences in mean scores for the eight factors were trivial across institutional and student characteristics. Overall, students agreed that assessment supported their improvement and rejected the idea of ignoring it. Scale inter-correlations revealed interesting inverse relations between improvement and irrelevance functions.

Assessment for qualification, however, is often seen as less than ideal because of its lack of emphasis on learning (Black and Wiliam 1998; Dochy and McDowell 1997; Myers and Myers 2015; Webber 2012). Higher education institutions globally are seeking to prioritise positive effects and benefits for learning through reformed assessment processes (e.g. rubrics, transparency, feedback, authenticity) and the use of innovative assessment practices (e.g. portfolios, self and peer assessment, simulation) (Struyven and Devesa 2016; Struyven et al. 2005).
The conceptions students have of those functions, including the value they give to each function, matter to the quality of their learning outcomes (McMillan 2016). Awareness of student beliefs especially matters when institutional policies and practices are reformed, partly because students tend to resist innovations in the mechanisms used to judge, evaluate, or certify their achievements (Struyven and Devesa 2016). As universities seek to innovate in their assessment regimes, it is important that students believe such assessments contribute to improved outcomes. Rejection of evaluative practices as irrelevant, inaccurate, or invalid would probably undermine the constructive goal of using assessment for learning. Hence, this paper extends the growing body of research into student conceptions of assessment by surveying a large sample of Portuguese university students with a recognised research tool (i.e. Student Conceptions of Assessment (SCoA) inventory; Brown 2008). Beyond validating that tool in a new context, the study explicitly tests the impact of relevant student and institutional factors upon the stability of results and differences in mean scores.

Students' conceptions of assessment in higher education
According to Segers and Tillema (2011, p. 53), "studying conceptions of assessment is of utmost relevance at a time when innovation of assessment practices is on the educational agenda". Students' conceptions of assessment have been shown to relate significantly to academic performance, partly because of their contribution to self-regulation of learning (Brown 2011). Endorsement of assessment as a tool for improvement appears to be a constructive self-regulating belief leading to higher academic performance because self-regulation requires reflection upon achieved performance to identify learning priorities and successes. Gulikers et al. (2008) also found that students who perceived assessment as more authentic studied harder and developed more professional skills. In contrast, endorsement of the belief that assessment is irrelevant and can be ignored leads to maladaptive responses (e.g. attribution to an external locus of control) and rejection of the legitimacy of the evaluative process, its results, or feedback.
Students' conceptions of and beliefs about assessment may influence their motivation and the process of self-regulated learning (Pereira 2016; Zimmerman 2008) because of the overlap between such ideas and self-regulation of learning (Brown 2011). According to Novak and Johnson (2012), emotions are directly linked to cognition and there is a strong correlation between affect and learning. These conceptions lead to different reactions and feelings before, during, and after assessment (Boud 1995; McMillan 2016; Race 1995). Research also shows that students experience both positive and negative emotions in higher education contexts (Novak and Johnson 2012). In addition, recent studies on assessment feedback reveal that emotional reactions may determine how students act on the feedback received (Pitt and Norton 2017; Ryan and Henderson 2018). In contrast, an intensive longitudinal study of student emotional responses through a summative assessment found that only once scores were known (i.e. feedback was received) did student emotions have systematic relations with performance (Peterson et al. 2015).
Much of the empirical work on conceptions of assessment has focused on compulsory education students and how these conceptions affect students' study behaviours and outcomes (Brown 2011, 2013; Brown and Harris 2012; Brown and Hirschfeld 2008; Brown et al. 2009b; Chen and Brown 2018; Otunuku et al. 2013; Peterson and Irving 2008; Solomonidou and Michaelides 2017). Segers and Tillema (2011) found gaps in research regarding how conceptions of assessment influence students' learning and teachers' approaches to teaching in different assessment systems. Furthermore, according to Solomonidou and Michaelides (2017), although there is research evidence on students' conceptions regarding school improvement and the evaluation of teaching, students' conceptions of their own assessment remain under-researched. While some research with higher education students does exist (Brown 2013; Brown et al. 2014; Brown and Wang 2013, 2016; Fletcher et al. 2012; Landim et al. 2015; Matos et al. 2009, 2013; Wang and Brown 2014), much less has been conducted in Portugal.
In the Portuguese context, research on assessment in higher education is still scarce. Flores and colleagues (Flores et al. 2015; Pereira et al. 2017a, b) found that written tests were the most used assessment methods and were associated with surface approaches to learning. Furthermore, the ideas students most associated with assessment were linked to the kinds of assessment methods their teachers used and depended on the programmes in which students were enrolled (Pereira et al. 2017a, b). Students perceived assessment as more effective and fairer when learner-centred assessment methods were used rather than only traditional assessment methods, with implications for student learning processes and connections to their professional world (Flores et al. 2015; Pereira et al. 2017a). Learner-centred methods include, for instance, portfolios and project-based assessments, and "activities such as multiple drafts of written work that provides progressive feedback, oral presentations, student evaluations of each other's work and group and team projects that require interactions" (Webber 2012, p. 201).
In general, students most frequently associated neutral ideas (such as tests or examinations and grades) with assessment, but also positive ideas (e.g. learning), while negative ideas (e.g. unfairness and fear) were least frequent. However, the idea of conflict was identified by undergraduate students who were assessed most frequently through so-called learner-centred methods of assessment such as project-based work and portfolios (Flores et al. 2015). Differences among programmes were also found; for example, students enrolled in Social Sciences and Humanities programmes associated more positive ideas with assessment than students from other programmes (Flores et al. 2015). Similar results were found by Fernandes and colleagues, who conducted comparative studies of assessment in higher education in Portugal and Brazil (Fernandes 2014, 2015; Barreira et al. 2017). These studies revealed that university students perceived assessment as being essentially summative. However, traditional assessment methods (exams/tests) were not seen by students as the best way of knowing what students learn and what they are capable of doing. In addition, students considered that, although assessment practices were continuous throughout the semester, there was little use of student-involved self- and peer assessment (Fernandes 2015; Barreira et al. 2017). Although the assessment system in Portugal is consistent with research-based recommendations and recognises that formative assessment promotes students' educational success, in many Portuguese educational settings assessment is oriented more towards grading and ranking pupils' achievements than towards helping them learn (Fernandes 2009).
The same emphasis on traditional methods of assessment is reported in Brazil (De Oliveira and Flores 2017). Undergraduate students state that written tests are the most common assessment method and that grades are usually given without feedback from instructors. In a similar vein, Pereira et al. (2016b) showed that students perceive feedback as more relevant and effective when they are assessed through learner-centred rather than traditional methods. Another study comparing students' perceptions of assessment in Portugal and Sweden found similar views, although student-involved modes of assessment (i.e. self- and peer assessment) were less used by Swedish students (Pereira et al. 2017b). Likewise, although there is large variation in assessment practices between universities and disciplines within neighbouring Spain, the use of traditional examination practices is widespread (Panadero et al. 2018). Thus, it could be expected that specific contextual factors would impinge upon student conceptions in Portugal.
A review of Brazilian research into higher education student perspectives on assessment (Matos et al. 2009) showed that two dimensions (i.e. formality of assessment method and locus of control) were able to summarise the variation in student thinking. Ultimately, assessment was perceived as formal, test-like practices under the control of teachers rather than students. Even peer- and self-assessment practices were directed by teachers, not by students themselves. A large-scale survey of public and private university students in Minas Gerais state in Brazil (Matos et al. 2012) found that students had a wide range of conceptions, but their dominant conception was a negative emotional reaction (irrelevance) with respect to student accountability. In comparing Brazilian and New Zealand university students, Matos and Brown (2015) concluded that Brazilian students had a more negative conception of assessment, speculating that the largely summative use of assessment in higher education was responsible. Using a "draw-a-picture" technique with the categorical analysis taken from Brown and Wang (2013), Landim et al. (2015) found that 60% of Brazilian university students' pictures of assessment were dominated by just two themes (i.e. negative emotions and portrayal of assessment as imprecise or inaccurate). Hence, independent of data collection method (i.e. document analysis, factor-analysed survey, or free-response drawing), the same message appears: assessment is formal and negatively experienced.
Previous research among New Zealand secondary students with Brown's SCoA version 2 inventory reported invariance in responses to the factors according to student sex, year of study, and ethnic group (Hirschfeld and Brown 2009). Reanalysis of the same data found that differences in interest and self-efficacy in reading had no statistically significant effect on how conceptions of assessment related to achievement (Brown and Walton 2017). In contrast, jurisdictional differences in factor means and inter-correlations were observed when the responses of Brazilian university students were compared with those of New Zealand, Hong Kong, and China students on the sixth version of the SCoA inventory (Brown 2013). Furthermore, there were statistically and practically significant differences in SCoA factor means around attendance and effort given on a computer-based low-stakes test at one American university (Wise and Cotten 2009). This suggests that demographic and psychological factors within a jurisdiction may play little role in how students respond to the SCoA inventory, whereas contextually defined assessment practices may elicit different responses. Hence, an important goal for this Portuguese study was to establish the nature of Portuguese students' responses to the SCoA and to discover whether demographic factors influenced those responses.
This review reveals a relative lack of insight into university students' perceptions or conceptions of assessment globally, let alone in Portugal. As Brown et al. (2009a, p. 9) recognise, "examining students' attitudes towards assessment purposes may help us uncover important factors in what students do before, during, and after assessment".

The Portuguese context
Portuguese higher education includes both universities and polytechnics. Universities offer solid scientific training, gathering the efforts and competences of teaching and research units, while polytechnics focus on vocational and advanced technical training for the labour market. The Portuguese higher education system includes public higher education, composed of State institutions and foundations, and private higher education, consisting of institutions belonging to private entities and cooperatives (Eurydice). In Portugal, there are "classical" universities (e.g. University of Lisbon, University of Porto, and University of Coimbra) and so-called "new" universities founded in the early 1970s (e.g. University of Minho, University of Aveiro, University of Beira Interior, University of Algarve, University of Trás-os-Montes and Alto Douro, and University of Azores).
Access to higher education in Portugal is gained through national exams at the end of secondary education. In addition to completing a secondary education programme or holding a legally equivalent qualification, students must obtain, on the entrance examinations established for their intended programme, a classification equal to or higher than the required minimum (Decree-Law no. 90/2008). In competitive programmes, for example Medicine and a number of Engineering degrees, the required entry marks are very high (18 or more out of 20). According to the latest official statistics (Direcção Geral de Estatísticas da Educação e Ciência 2017), 361,943 students were enrolled in higher education across all study cycles in 2017, with 65% in universities and 35% in polytechnics. More than four-fifths (83.6%) of the university students were enrolled in public higher education and 16.4% in private higher education. In public higher education, students have to pay a tuition fee; the amount is fixed according to the nature and quality of the programmes, with a minimum value corresponding to 1.3 times the national minimum salary.

Portugal has adopted the Bologna Process (Bologna Declaration 1999), which has shaped the higher education landscape in the light of its principles and goals within the European Higher Education Area (EHEA). The resulting system provides degrees of comparable value in order to promote mobility, employability, quality assurance, and the development of life-long learning (European Commission 2006). Relevant to this study, Decree-Law no. 42/2005 stipulates that training be centred on the competencies to be developed by students and on the centrality of the student in his/her own process of training and learning. Students are expected to engage in project-led work, individual study, assessment, and other activities focused on artistic, sociocultural, and sports dimensions. The pace and priorities of reforms consistent with the Bologna Process vary from country to country (Pereira et al. 2016a).
In Portugal, both public and private universities are required to report on their implementation of the Bologna Process. Institutions have reported changes in the teaching and learning process based on student-centred learning that promotes autonomous work and student co-responsibility for learning (see reports on the implementation of the Bologna Process). Given that assessments are used to select students for entry to university, and that universities are expected to implement formative approaches to assessment, or at least to use diverse forms of assessment along with active teaching methodologies in the light of the Bologna framework, we expect students' conceptions of assessment to endorse the formative purposes without rejecting the summative uses of assessment. This appears to suggest that we expect students to illogically endorse inconsistent conceptions; instead, we suggest that student conceptions of assessment will be ecologically rational (Rieskamp and Reimer 2007). In other words, because assessments fulfil these two different functions with important consequences for students (i.e. improved learning outcomes and certification or selection), we expect students to endorse both purposes. Nevertheless, in light of assessment for learning and the Bologna Process, we anticipate that the formative purposes of assessment will be endorsed more strongly than the summative ones.

Method
The study

This study is part of a larger research project "Assessment in higher education: the potential of alternative methods" funded by the Portuguese Foundation for Science and Technology (Government Funding Agency) (PTDC/MHCCED/2703/2014) and co-financed by European Regional Development Funds (FEDER), Competitiveness and Internationalization Operational Program (POCI-01-0145-FEDER-007562). The broader project has a strong empirical component intended to identify and analyse assessment practices in five Portuguese public universities from the point of view of both university teachers and students. This paper, however, aims (1) to examine students' conceptions of assessment in the same five Portuguese public universities and (2) to evaluate student and institutional factors impinging upon how Portuguese students respond to the "Students Conceptions of Assessment version VI" inventory (Brown 2008).

Procedures
In accordance with procedures approved by the Ethics Committee of the University of Minho (Subcommittee on Ethics for Social and Human Sciences, ref. SECSH035/2016), programme coordinators and directors were asked for permission to collect data. The survey was disseminated to different programmes in each of the fields of knowledge included in the study (for instance, Medical and Health Sciences included Medicine and Nursing programmes); however, the number of students in those programmes is not publicly available. Teachers in the five universities were asked to schedule a time and place at which their students could be invited to complete the questionnaires. Data were collected in person and via email from February to July 2017. In total, 5549 students from five Portuguese public universities participated in the study (Table 1). Of these, 5407 completed the questionnaire in the classroom and 142 completed it through the link provided via email. Missing data analysis and estimation were carried out to ensure valid inferences could be made. All participants with more than 3 missing values on the SCoA items (i.e. > 10%; n = 24 deletions, 0.43%) were deleted. Expectation maximisation was used to impute missing values for the remaining cases. Little's MCAR test chi-square to degrees of freedom ratio was not statistically significant (χ²(4144) = 4980, χ²/df = 1.2, p = .27), suggesting the distribution of missing responses was random and imputation was valid. The final retained sample with no missing values was N = 5525.
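To make the screening rule concrete, a minimal R sketch follows. It is illustrative only, not the authors' code: the data frame scoa (one column per SCoA item) is a hypothetical name, and Little's MCAR test is taken from the naniar package rather than the software used in the study.

```r
# Hedged sketch of the missing-data screening; `scoa` is a hypothetical
# data frame holding one column per SCoA item response.
library(naniar)  # provides mcar_test(), an implementation of Little's test

# Count missing SCoA items per participant and drop anyone missing more
# than 3 items (i.e. > 10% of the inventory), as in the paper
n_miss <- rowSums(is.na(scoa))
scoa   <- scoa[n_miss <= 3, ]

# Little's MCAR test: a non-significant chi-square is consistent with
# responses missing completely at random, supporting imputation
mcar_test(scoa)

# The paper imputed the remaining gaps with expectation maximisation;
# an R alternative is to estimate later models with full-information
# maximum likelihood (e.g. lavaan's missing = "fiml") instead of imputing.
```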

Participants
Approximately two-thirds of the participants were women (n = 3488, 63.1%). Just over half of the participants were between 20 and 25 years old (55.7%). Participants were enrolled in academic programmes across different fields of knowledge: Medical and Health Sciences, Exact Sciences, Engineering and Technology, Social Sciences, and Humanities. About four-fifths of the participants were in the first 3 years of an undergraduate or integrated master's degree.

Instrument: Students Conceptions of Assessment inventory
Research with the Student Conceptions of Assessment (SCoA) inventory (Brown 2008) into the conceptions higher education students have about assessment has focused on how students conceive of the purpose, nature, and function of assessment (i.e. it is for improvement, it predicts the student's future, it improves classroom climate, and it is irrelevant). The SCoA aggregates items into four major inter-correlated factors, with subordinate factors for the improvement and irrelevance factors. The four inter-correlated constructs are (i) assessment improves learning and teaching (improvement); (ii) assessment relates to external factors (external); (iii) assessment has affective benefit (affect); and (iv) assessment is irrelevant (irrelevance). A Portuguese version of the SCoA-VI inventory was used in Brazil (Matos 2010), and this was the version used in this study. In accordance with Matos (2010), one item (sf4, "assessment tells my parents how much I've learnt") was deleted because most university students no longer have their grades reported to their parents. Instead of the positively packed rating scale used in the SCoA-VI, this study adopted Likert's five-point balanced agreement scale (i.e. strongly disagree, disagree, neither agree nor disagree, agree, and strongly agree), following a previous Portuguese study with nursing teachers (Gonçalves 2012). Note that, while completely conventional, the reduced number of options in the positive and negative ranges usually reduces variance in the data, potentially making modelling somewhat more complicated.
Studies with New Zealand secondary students and university students (Brown 2013; Brown et al. 2009c) suggest that the hierarchical statistical model fits both samples. However, studies with samples outside New Zealand indicate that no single statistical model exists to capture student beliefs (Brown 2013). Research in Brazil (Brown 2013) has suggested that the hierarchical elements of the SCoA do not work, but Matos and Brown (2015) found that the items aggregated, as per design, into the eight original factors of the SCoA in an inter-correlated model. Bifactor analysis of Brazilian and New Zealand university student SCoA responses (Matos et al. 2019) found full metric equivalence when a general SCoA factor was supplemented with three unique factors (i.e. assessment improves classroom climate, assessment improves teaching and learning, and assessment is irrelevant). These Brazil-New Zealand comparative studies suggest that there may be universal aspects of how students conceive of assessment and support the use of the inventory in Portugal.
This study surveys Portuguese university students to test the validity of the SCoA for use in a second Portuguese language context and to test for differences between groups of students based on membership in relevant educational categories (i.e. their university, year of study, field of knowledge, and cycle of study). Thus, this study contributes to a growing understanding of global and local aspects of student experience of higher education assessment.

Analysis
Because multiple previously published statistical models for the SCoA inventory existed, four models were tested with confirmatory factor analysis to identify the model that best corresponded to the data. The models were:
1. Brown's NZ SCoA-VI model (Brown et al. 2009c);
2. Matos' eight-factor inter-correlated model;
3. a bifactor model with a general factor and four main factors (Weekers et al. 2009); and
4. an inter-correlated four main factors model, in which all subordinate factors are removed so that items load directly on the higher-order factor.
Model fit was examined in line with conventional practices (Fan and Sivo 2007; Marsh et al. 2004). A model need not be rejected if:
- the χ² per degree of freedom ratio was statistically non-significant;
- gamma hat and the comparative fit index (CFI) were > .90;
- the root mean square error of approximation (RMSEA) was < .08, with the upper bound of its 90% confidence interval below .08; and
- the standardised root mean square residual (SRMR) was < .08.
When choosing between competing models, a difference in the Akaike information criterion (ΔAIC) greater than 10 indicates that the model with the smaller value is closer to the data (Burnham and Anderson 2004). By taking advantage of features of the data, it is possible to improve the fit of a model by inspecting the modification indices. Items with strong attraction to other factors violate the assumption of simple structure, in which items clearly belong to only one factor. Deleting those items can improve fit, but such actions have to be theoretically justifiable.
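Although the published analysis was run in Amos, the same selection logic can be sketched with the lavaan package (the package used to draw Fig. 1). The sketch below is illustrative only: the syntax is abbreviated and the factor and item names are placeholders, not the published SCoA item codes.

```r
# Illustrative lavaan sketch of model selection; item names are placeholders.
library(lavaan)

# Model 2: eight inter-correlated first-order factors (syntax abbreviated;
# cfa() lets all latent factors correlate by default)
model2 <- '
  StudentFuture  =~ sf1 + sf2 + sf3
  SchoolQuality  =~ sq1 + sq2 + sq3
  StudentImprove =~ si1 + si2 + si3 + si4
  TeacherImprove =~ ti1 + ti2 + ti3
  Bad            =~ bd1 + bd2 + bd3
  Ignore         =~ ig1 + ig2 + ig3
  Enjoyment      =~ pe1 + pe2 + pe3
  ClassEnviron   =~ ce1 + ce2 + ce3
'
fit2 <- cfa(model2, data = scoa, estimator = "ML")

# Evaluate the fit criteria listed above
fitMeasures(fit2, c("chisq", "df", "pvalue", "cfi", "rmsea",
                    "rmsea.ci.upper", "srmr", "aic"))

# Competing models would be fitted the same way and compared on AIC;
# a difference greater than 10 favours the model with the smaller value,
# e.g. AIC(fit2) - AIC(fit3)

# Items strongly attracted to other factors (large modification indices)
# violate simple structure and are candidates for justified deletion
modindices(fit2, sort. = TRUE, maximum.number = 10)
```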
Once a model was selected, the stability of the model across participant demographic groupings was examined using nested multigroup confirmatory factor analysis. Nested invariance testing examines the equivalence of regression weights (metric equivalence) and item intercepts (scalar invariance). When differences in the CFI are < .01, the addition of equivalence constraints is considered not to have changed the quality of fit. This approach first fixes regression weights to be equivalent before testing the equivalence of intercepts. All confirmatory factor analyses and invariance testing were carried out in Amos v25 (IBM 2017) using maximum likelihood estimation.
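Continuing the same hypothetical lavaan sketch, the nested sequence can be expressed by adding equality constraints group by group; cycle is an assumed name for the cycle of study grouping variable.

```r
# Sketch of nested multigroup invariance testing by cycle of study;
# reuses the hypothetical `model2` and `scoa` objects from above.
library(lavaan)

configural <- cfa(model2, data = scoa, group = "cycle")
metric     <- cfa(model2, data = scoa, group = "cycle",
                  group.equal = "loadings")
scalar     <- cfa(model2, data = scoa, group = "cycle",
                  group.equal = c("loadings", "intercepts"))

# Constraints are considered tenable when CFI drops by less than .01
fitMeasures(configural, "cfi") - fitMeasures(metric, "cfi")
fitMeasures(metric, "cfi")     - fitMeasures(scalar, "cfi")
```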
Comparison of mean scores usually requires that the groups have statistically equivalent item regressions and intercepts. To evaluate differences across demographic categories, Bartlett factor scores were calculated for each scale or factor in the model. Bartlett factor scores are latent scores derived from the weighted contribution of the items within each factor (DiStefano et al. 2009). The scores are centred on zero with a standard deviation of 1, meaning they behave like z-scores.
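In lavaan, Bartlett scores can be extracted directly from a fitted model; the snippet below is a sketch against the hypothetical fit from the earlier sketches, not the Amos workflow actually used.

```r
# Bartlett factor scores for each factor in the fitted model `fit2`
library(lavaan)

scores <- lavPredict(fit2, method = "Bartlett")
head(scores)  # one column per factor; scores behave like z-scores
```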

Model selection
The fit indices for the four tested models are shown in Table 2. According to the AIC values, model 2 with eight inter-correlated factors fitted best (ΔAIC = 592 over model 3). Because the chi-square per degree of freedom ratio for model 2 was > 17, support for this model was weak. However, because the chi-square test is very sensitive to large sample sizes (Wheaton et al. 1977), a random sample of 500 participants was drawn and tested against the same model (model 2A). This produced a statistically non-significant chi-square per degree of freedom ratio, indicating that the model did not need to be rejected.
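A sketch of that subsampling check, reusing the hypothetical objects from the Analysis sketches (the seed is arbitrary):

```r
# Refit the eight-factor model on a random subsample of 500 participants
# to sidestep the chi-square test's sensitivity to large samples
library(lavaan)

set.seed(2017)  # arbitrary seed, for reproducibility only
scoa_500 <- scoa[sample(nrow(scoa), 500), ]
fit2a    <- cfa(model2, data = scoa_500, estimator = "ML")
fitMeasures(fit2a, c("chisq", "df", "pvalue"))
```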
While 13 items had substantial sums of modification indices, just four items (i.e. two in classroom environment [ce6, ce5] and two in irrelevance [ig3, bd5]) had to be removed to achieve excellent model fit (model 6 in Table 2). Figure 1 shows the structure of the trimmed model 2 with eight inter-correlated factors and 28 items. Note that three factors have only two items each, which is normally considered unacceptable; however, in a multi-factor inventory, factors can be estimated with just two items (Bollen 1989).

Multiple-group confirmatory factor analysis
A series of multigroup confirmatory factor analyses (MG-CFA) were conducted with grouping variables being (a) field of knowledge, (b) year, (c) university, and (d) cycle of study. Model fit indices (Table 3) show that all four approaches yielded similar fit. However, the model based on cycle of study had the smallest AIC value (ΔAIC > 200) suggesting that it should be used to analyse factor scores in trimmed model 2.
Before comparing mean scores, invariance testing across the three cycle of study groups was conducted within the MG-CFA. Each cycle group separately fitted the data well, and fit improved when the three groups were combined in one model (Table 4). The metric and scalar equivalence tests each produced ΔCFI < .01, indicating equivalence of regression weights and intercepts across the three cycle groups. Thus, factor loadings and means can be compared between these groups.

Factor mean scores and correlations
Mean scores for the factors, computed as the average of all items belonging to each factor, were highest for Student Improvement, close to neutral for five scales, and lowest for Ignore (Table 5). Differences in mean scores were relatively trivial to small, except for Student Improvement, which had large positive effects relative to all other factors (mean Cohen's d = 1.83), and Ignore, which had large negative effects relative to all other factors (mean Cohen's d = 1.87). The small variation in mean scores among the remaining six factors suggests that, after endorsing assessment for improving student learning and rejecting the act of ignoring assessment, students took a relatively neutral stance towards assessment for accountability, improvement, and socio-affective purposes. Thus, the students' conceptions can most succinctly be summarised as: assessment helped them improve and was not ignored.

[Fig. 1 Trimmed model 2 based on Matos and Brown (2015) for Portuguese university students. Note: all values are standardised; diagram from Jamovi 0.9 using the lavaan package in R.]

Table 5 also shows the factor inter-correlation matrix for trimmed model 2. It shows that there are potentially two major dimensions within the eight factors. As previously reported, the two accountability factors (i.e. Student Future and School Quality), the two improvement factors (i.e. Student and Teacher), and the two emotional factors (i.e. Personal Enjoyment and Class Environment) are all positively inter-correlated with each other. In contrast, and consistent with previous studies, the two irrelevance factors (i.e. Bad and Ignore) are negatively correlated with the six other factors but positively correlated with each other. Interestingly, Bad had a stronger negative correlation with Teacher Improvement, while Ignore had a stronger negative correlation with Student Improvement, suggesting that assessment which informs teachers' teaching is not bad, while assessment which guides student improvement is not ignored.

MANOVA was conducted with the factor mean scores as dependent variables and the demographic variables (i.e. field of knowledge, year, cycle of study, university, sex) as independent variables, entered as both main effects and two-way interactions. Given the large number of comparisons made, alpha was set at p ≤ .01. Note that not all universities provided students in all fields of knowledge (e.g. university 3 had no students in exact sciences; university 5 had no students in social sciences or humanities); this meant the university by field of knowledge interaction was excluded from the MANOVA. Furthermore, only two of the five fields of knowledge had all three cycles of study, so this interaction was also removed. Likewise, cycle of study 2 had fewer than 10 students in year 3 onward and there were no cycle of study 1 or 2 students in year 6; hence, the cycle of study by year interaction was also excluded.
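As a sketch, the MANOVA could be specified in base R as follows, assuming the Bartlett factor scores extracted earlier and hypothetical names for the demographic columns; the three interactions with empty cells are subtracted from the formula.

```r
# Hedged sketch of the MANOVA on the eight Bartlett factor scores;
# demographic column names are assumptions, not the study's variable names
dat <- data.frame(scores,
                  field = scoa$field, year = scoa$year, cycle = scoa$cycle,
                  university = scoa$university, sex = scoa$sex)

fit_manova <- manova(
  cbind(StudentFuture, SchoolQuality, StudentImprove, TeacherImprove,
        Bad, Ignore, Enjoyment, ClassEnviron) ~
    (field + year + cycle + university + sex)^2 -   # all two-way interactions
    field:university - field:cycle - year:cycle,    # minus the excluded ones
  data = dat
)
summary(fit_manova)  # multivariate tests, judged against alpha = .01
```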
Four of the main effects met the threshold for consideration (i.e. field of knowledge, F(32) = 3.15, p < .001; sex, F(8) = 6.19, p < .001; year, F(40) = 2.30, p < .001; and cycle of study, F(16) = 3.60, p < .001), while five two-way interactions also met that standard (i.e. field of knowledge by year, F(128) = 1.45, p = .001; field of knowledge by sex, F(32) = 1.81, p = .003; university by sex, F(32) = 1.85, …) (Table 6). However, the total variance explained for each factor fell in the small range (f² < .11; Cohen 1992), suggesting that these background variables do little to explain how students conceive of assessment. Instead, intra-individual psychological or social factors are more likely to shape these attitude or belief scores. In light of these small effects, no further analysis was carried out to explore how student conceptions of assessment varied by demographic condition.

Discussion
This survey of Portuguese university students with a translated and adapted version of the Student Conceptions of Assessment (version VI) inventory successfully replicated the inter-correlated eight-factor model reported in Brazil (Matos and Brown 2015) after trimming four items in two factors. The eight conceptions related to the purpose of assessment are that assessment (1) predicts student future; (2) evaluates school quality; (3) supports student learning improvement; (4) supports improvement of teaching; (5) is bad; (6) is ignored; (7) is personally enjoyable; and (8) supports a positive class environment. This study thus replicates the previous Brazilian survey of university students and confirms that the items in the SCoA-VI, developed in New Zealand with secondary school students (Brown 2008), generalise as measures of Portuguese-speaking university students' conceptions of assessment.

The trimmed model was strongly invariant by cycle of study, supporting the comparison of factor means across groups. Factor mean score differences existed but were trivially small across institutional and student demographic characteristics. This contradicted the expectation that differences in experiences related to discipline, degree, university, length of study, or sex would lead to different conceptions of assessment. Overall, students tended to agree with the student improvement factor and reject the ignore concept; otherwise, their conceptions were largely stable across groups, consistent with New Zealand research (Brown and Walton 2017; Hirschfeld and Brown 2009). Such stability within a population is a desirable feature of any research instrument. Nonetheless, there might have been more substantial differences in SCoA scores if observational data around assessment practices (e.g. effort given, time on task, perseverance in light of failure) had been available. Given the nature of the study, finding common practices across so many different disciplines and universities was beyond its scope.

Elsewhere, it has been shown that academically successful students actively see assessment as a tool for improving their learning and performance (Brown et al. 2009c) and accept that assessment judges their learning (Brown and Hirschfeld 2008). The positive endorsement of student improvement in this study is entirely consistent with the notion that student responses, attitudes, and conceptions of assessment that support self-regulation of learning are appropriate and adaptive for successful learning outcomes (Brown 2011). The relatively negative to weak endorsement of the two affective purposes is consistent with research in New Zealand, which showed that endorsement of the classroom environment purpose had no statistically significant relationship to achievement (Brown et al. 2009c). Furthermore, reliance on peers for feedback has not been shown to contribute to greater performance among New Zealand university students. Combined with the rejection of the concept that assessment should be ignored, the results suggest that these students view assessment as primarily about paying attention to what they got wrong so that they can improve. These results indicate Portuguese university students are already trying to use whatever information they receive to improve their performance, which is what successful, self-regulating students do.
The slightly negative attitude towards the teacher improvement purpose was somewhat surprising relative to results in other jurisdictions (Brown 2011). Higher performance was seen among secondary school students who had confidence that their instructors used assessments to improve instruction (Brown et al. 2009c). However, it may be difficult in higher education for students to perceive that teachers are using assessment for improved teaching. Nonetheless, it would be tragic if the belief in assessment for improvement were left solely to the responsibility of the learner.
It behoves instructors, departments, and institutions to ensure that assessment within courses and programmes is designed to permit the possibility of gaining feedback that supports learning in subsequent assessments. This probably means not relying solely on percentage scores or letter grades to communicate information about what students need to do to improve their work (Brown and Hattie 2012); diagnostic insights are needed, which probably come in the form of commentary rather than scores (Lipnevich and Smith 2009). Further, it means that assessments have to be scaffolded in such a way that feedback from early assessments helps students learn what they need to do on later, summative or terminal tasks. Unfortunately, this approach is not universally implemented in higher education (Winstone and Boud 2019).
The negative and relatively strong correlation between Teacher Improvement and Bad suggests that, consistent with previous studies, students accept that assessments are not bad if they help teachers do a better job. Nonetheless, it may be difficult for higher education students to perceive that instructors actually use assessments to modify or improve their instruction. Given semester-long courses in which all assessment and course details have to be published in advance, it is very difficult for students to notice whether teachers make changes based on assessment results. The only tangible evidence of teacher use is likely to appear in the kind, speed, and quality of feedback provided. Furthermore, the use of national summative exams to grant students access to degrees and institutions is likely to lead students to a somewhat narrow view of assessment: how can teachers be using it for improvement when it is about my terminal performance, with little or no feedback beyond a score?
It is an unfortunate but understandable limitation of this study that a measure of academic performance could not be included, making this a matter for future investigations to test the claim that students who endorse student improvement and reject ignoring it have higher grades. Furthermore, future research should incorporate the SCoA with measures of self-regulation to ascertain how self-reported self-regulation relates to these eight conceptions of assessment. Embedding such research in or around actual assessment tasks would also be highly desirable (e.g. Peterson et al. 2015) to identify characteristics of assessments that support self-regulating conceptions of assessment.
In conclusion, it would appear that Portuguese university students have an overall conception of assessment that expresses personal responsibility for using insights from assessment to improve their learning, performance, and achievement. The success of the Bologna Process in terms of how assessment is experienced by students is not strongly evident in these five institutions. Nonetheless, the ambition of the Bologna Process is a positive one, and further efforts are needed by universities to mitigate an approach to assessment that is purely summative.