Socio-emotional development in high functioning children with Autism Spectrum Disorders using a humanoid robot


 The use of robots has already been shown to promote social interaction and the skills lacking in
 children with Autism Spectrum Disorders (ASD), who typically have difficulty recognizing facial expressions and emotions. The
 main goal of this research is to study the influence of a humanoid robot on the development of socio-emotional skills in children with ASD.
 The children’s performance in game scenarios designed to develop facial expression recognition skills is presented. Across the
 sessions, children who performed the game scenarios with the robot and the experimenter performed significantly better
 than the children who performed the game scenarios without the robot. The main conclusions of this research support that a
 humanoid robot is a useful tool for developing socio-emotional skills in interventions with children with ASD, given the engagement
 and positive learning outcomes observed.


Introduction
According to the current criteria in the DSM-5 (Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition), Autism Spectrum Disorders (ASD) are characterized by repetitive patterns of behaviour, restricted activities or interests, and impairments in social communication (American Psychiatric Association, 2013). The essential characteristics of ASD are markedly atypical development of social communication and a markedly restricted repertoire of activities and interests. The manifestations of these characteristics vary with the developmental level and the age of the individual (Filipe, 2012). According to the American Academy et al., 2010; Boucenna et al., 2014; Diehl et al., 2012; Ricks and Colton, 2010; Scassellati et al., 2012).
The following subsections present research on facial expressions displayed by robots and on the difficulty that individuals with ASD have in recognizing emotions. Little research has been devoted to the recognition of emotional expressions in games with children with ASD using robotic tools. The research presented in this article addresses exactly this gap.

Facial expressions displayed by robots
This section presents projects involving the use of robots to display emotional facial expressions. Only the projects using the robots FACE and Probo had children with ASD as their target group.
The humanoid robot FACE (Mazzei et al., 2011) was built to allow children with ASD to deal with expressive and emotional information. The expressions and movements of FACE were modelled to be in harmony with the feelings of the user. HEFES (Hybrid Engine for Facial Expressions Synthesis) is a system created by the same authors to generate and control facial expressions on both physical androids and 3D avatars (Mazzei et al., 2012). The system used in FACE was tested on a panel of 5 children with ASD and 15 typically developing children, who interacted with the robot individually under the therapist's supervision. The evaluated facial expressions were happiness, anger, sadness, disgust, fear, and surprise, defined as the basic emotions by Ekman (Ekman and Rosenberg, 1998). These emotions are referred to hereafter as basic emotions or basic facial expressions. The participants labelled each expression, and the therapist scored each label as correct or incorrect. The results showed that children with ASD and typically developing children were able to label happiness, anger, and sadness performed by FACE with good accuracy. However, fear, disgust, and surprise were not labelled correctly, especially by participants with ASD. FACE's recognition rates with children with ASD were: anger: 100%, disgust: 20%, fear: 0%, happiness: 100%, sadness: 100%, surprise: 40%, with an average over all emotions of 60%. The recognition rates with typically developing children were: anger: 93.3%, disgust: 20%, fear: 46.7%, happiness: 93.3%, sadness: 86.7%, surprise: 40%, with an average over all emotions of 61.1%. The authors explain these results by claiming that fear, disgust, and surprise are emotions that rely greatly on gestures to be conveyed, and that facial expressions on their own were not enough for efficient recognition.
Probo (Saldien et al., 2010) is an animal-like robot designed to act as a social interface. The authors used Probo as a platform to study human-robot interaction, and it was capable of performing facial expressions. These were represented as a vector in a two-dimensional emotional space (valence and arousal), based on Russell's circumplex model of affect (Russell, 1980). The recognition of the robot's facial expressions was evaluated by 23 typically developing children, giving identification rates of 96% for anger, 87% for disgust, 65% for fear, 100% for happiness, 87% for sadness, and 70% for surprise, with an average over all emotions of 84%. In the authors' opinion, better recognition of the robot's facial expressions contributes to its general social acceptance. In addition, the recognition of facial expressions is important for effective nonverbal communication between a human and a robot.
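As an illustration of the two-dimensional representation mentioned above, an expression can be treated as a point (valence, arousal) and read in polar form: the angle locates it around the circumplex and the length gives its intensity. The Python sketch below is not Probo's implementation; the function name and the example coordinates are assumptions chosen only for illustration.

```python
import math


def to_polar(valence, arousal):
    """Return (angle in degrees within [0, 360), intensity) of an emotion vector."""
    angle = math.degrees(math.atan2(arousal, valence)) % 360.0
    intensity = math.hypot(valence, arousal)
    return angle, intensity


# Hypothetical coordinates, not taken from the cited work:
# neutral valence combined with high arousal.
angle, intensity = to_polar(0.0, 1.0)
print(angle, intensity)
```

Under this reading, two expressions with the same angle but different lengths would be the same emotion shown at different intensities.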

Emotion recognition difficulty
Three groups of ten individuals each, matched for verbal mental age (children with ASD in the first, children with Down syndrome in the second, and typically developing children in the third), were tested on a delayed-matching task and on a sorting-by-preference task. In the first task, the participants had to match faces expressing an emotion that was presented briefly (750 msec). The second task involved rating the valence of an isolated stimulus, such as a facial expression of an emotion or an emotional situation in which no people were represented. The participants' mean scores showed considerably worse performance on both tasks by individuals with ASD than by either the typically developing participants or the participants with Down syndrome (Celani et al., 1999). Another crucial aspect is the examination of the roles of verbal and nonverbal sources of information in the ability of participants to recognize emotions (Loveland et al., 1997). In one study, children with low- and high-functioning ASD and typically developing children, matched by verbal and non-verbal mental age, were compared in an emotion recognition task. All participants watched video clips in which they had to identify the emotions expressed verbally, nonverbally, or both. The presented emotions were happy, angry, sad, surprised, or neutral. Verbal expressions of emotion were explicit, implicit, or neutral, whereas non-verbal expressions were animated (clearly conveying happiness, sadness, anger, or surprise) or flat (neutral face and voice). Results showed differences between the higher and lower functioning groups. The performance of low-functioning participants implied that they had problems understanding how a person in the video clips felt based on what the person said if the emotion was not clearly stated. The performance of high-functioning participants suggested that they used more non-verbal than verbal information to determine a speaker's emotion, except when the emotion was explicitly named (Loveland et al., 1997).
Results from Hobson (1986) showed that children with ASD, when compared to typically developing children with learning disabilities, were significantly impaired in choosing which drawings of gestures matched videotaped vocalisations and facial expressions characteristic of four emotional states.
The studies presented in this section summarize the research performed with individuals with ASD regarding facial emotion recognition, and they emphasize the difficulty this population commonly has in identifying emotions. Children with ASD presented difficulties when judging the valence of emotional expressions and situations, so in this kind of task the emotional information should be strong and marked so that they can perceive it as such (Baron-Cohen, 1991).

Research questions
In this study, the following research questions were addressed:
a. Can a humanoid robot contribute to developing visual recognition of facial expressions in children with ASD?
b. Can a humanoid robot capable of displaying facial expressions elicit facial expression imitation skills in children with ASD?
c. Can a humanoid robot help children with ASD to attribute mental states and to identify others' affective states?
For clarification, the traditional strategy applied to the children participating in this research was the TEACCH methodology. This methodology bases its intervention on:
- physical structure: organisation of the physical spaces with signalling and well-defined boundaries, decreasing distracting factors;
- creation of a one-on-one workspace and autonomous work inside the classroom;
- implementation of an individual work schedule with the different moments of the day;
- implementation of transition cards as a communication medium and promoter of the child's autonomy;
- definition of daily routines to promote the child's adequate behaviour through a stable and safe environment;
- introduction of small changes to break routines and to promote the child's ability to adapt to new situations;
- visual support to promote communication between the child and others using augmentative communication systems, such as PECS (Picture Exchange Communication System) or communication tables (Lima, 2012; Mesibov and Howley, 2003).

Methods
All topics regarding the experimental study are defined below, specifically the ethical concerns, the participants, the procedures undertaken, the characteristics of the robot, the setup used, and the evaluation tools.

Ethics statement
The procedures presented in this article were approved by the Ethics Committee of the University of Minho, Portugal, and by the Portuguese National Committee for Data Protection. A partnership protocol was established between the University of Minho and each of the schools, clinics, and associations where the experiments took place. This protocol identified the researcher involved in the experiments and the assigned professionals who supported the research. The experimenter made the commitment to make the results and conclusions of the research available through scientific reports. The schools, clinics, and associations made the commitment to collaborate in the experiments through the support of their professionals, the use of the intervention rooms, and the connection to the children's families. The parents of the children signed a consent form in which they were notified about the goals and the methods of the research. This consent also included a document with information about the risks and benefits arising from the research, as well as about their complete freedom to decide whether to participate and to withdraw their child from the research project at any time. The children's teachers were consulted and informed about the activities to be performed and gave suggestions intended to improve them.

Participants
The sample of 45 children was divided into three groups:
- G1: 15 children with ASD who performed the game scenarios with the robot, the pre-test, and the post-test;
- G2: 15 children with ASD who performed the game scenarios without the robot, the pre-test, and the post-test;
- G3: 15 children with ASD who only performed the pre- and the post-test (without intervention).
This study was carried out in eight primary schools and two clinics that conduct therapies with children with ASD. All children met the following inclusion criteria: aged five to ten years old, diagnosed with high-functioning ASD by a professional clinician, and with the authorisation of their parents. Children with intellectual problems were excluded from the sample. The experimenter was not granted direct access to the children's medical files, but the questionnaires filled in by the teachers or therapists confirmed the children's diagnosis. The experimenter did not know any of the children prior to the experiments. All children received weekly therapy from speech and occupational therapists, and some of them from psychologists. During the intervention period of this study, both teachers and therapists were asked not to perform activities focusing on facial expressions and emotion recognition.
The first criterion used to divide the participants into groups was their age. However, some of the children in the clinics were not able to attend sessions twice a week, so they were included in G3. The second criterion was gender. Statistical data show that on average there is a ratio of 1 girl to 6 boys with high-functioning ASD (Johnson et al., 2007). For this reason, it was only possible to include 9 girls in the sample. It was not possible to balance the number of girls in each group, due to their unavailability to attend sessions twice a week. Each school had participants in all three groups whenever possible (for example, it was not possible in schools where only two children met our sample criteria).

Procedures
A combined crossover multiple baseline design across participants has been used in some research using robots to interact with children with ASD (Huskens et al., 2013). In a multiple baseline design, the intervention conditions must start at different times across individuals, so that it can be verified that the changes are due to the intervention rather than to chance. If a significant change occurs in all participants after the intervention, it is possible to infer that the treatment was effective. In a crossover design, the first intervention is followed by a period of time, often called a washout period, to allow any effects to dissipate. Then a second intervention takes place over an equal period of time, followed by a second observation. However, a crossover design was not considered appropriate here, since the effect of the intervention is expected to persist, i.e., the children are expected to acquire the competence. If the second intervention were applied after a certain amount of time, it would not be possible to determine which intervention caused the effect. Therefore, a multiple baseline design across participants was used in this research to explore the effectiveness of an intervention using the robot, compared to an intervention without the robot, in promoting emotion recognition skills in children with ASD. Since the differences between the groups in the pre-test and in the first session of the Practice Phase are not significant (as can be seen in the Results section, Figures 6, 7, and 8), it can be assumed that the groups of participants were balanced in this specific skill.
To reach our goals, four different phases were defined: familiarisation, pre-test, practice, and post-test. The familiarisation phase took place in each school and clinic during a usual day of activities or an intervention session, and the experimenter had the opportunity to interact with the children in a group context for at least two hours. The goal was to allow the researcher and the children to become acquainted.

Performance task
The task performed in the pre- and post-test (performance task) had the final goal of evaluating the children's skill in labelling facial expressions, and in this study its suitability for use with children with ASD was tested. This task was performed without the robot and consisted of matching cards on which a man or a woman was showing one of five different emotions (happiness, sadness, anger, surprise, and fear). These cards were matched with PECS (Picture Exchange Communication System) cards representing the same emotions. The cards shown to the children are presented in Figure 1.
The cards were not labelled with the emotion they depicted. This was done because, even though the children included in the sample were diagnosed with high-functioning ASD, those aged six or younger do not have enough academic skills to read. In addition, it was not assessed whether the children knew what emotion each card represented, to prevent the children from memorizing the answers, which would jeopardize an impartial evaluation of their improvement before and after the experiment. The two sets of facial expressions were taken from the database of Kanade et al. (2000) and Lucey et al. (2010), which was released for the purpose of promoting research into automatically detecting individual facial expressions. The five PECS cards were presented at the same time on a board. Five empty spaces under the PECS cards were available, and the experimenter handed over the cards with the picture of the man or the woman and verbally prompted the child to match the card he/she had in his/her hand with the ones on the board, putting them together. Once the child managed to match a card correctly, the experimenter gave him/her another one, until there were none left. The order in which the cards were given to the child was always random.
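The flow of the performance task, and the way the number of attempts accumulates, can be sketched in Python as follows. This is only an illustrative model: the `choose` callback standing in for the child, the emotion names used as identifiers, and the `max_tries_per_card` safety limit are assumptions made for the example and are not part of the procedure described above.

```python
import random

EMOTIONS = ["happiness", "sadness", "anger", "surprise", "fear"]


def run_matching_task(choose, rng, sets=2, max_tries_per_card=20):
    """Hand out the face cards in random order and count placement attempts.

    `choose(emotion)` stands in for the child: it receives the emotion shown
    on the face card and returns the board slot (an emotion name) picked.
    The next card is only handed over once the current one is matched
    (or the hypothetical safety limit is reached).
    """
    cards = EMOTIONS * sets  # one set per actor (man, woman)
    rng.shuffle(cards)       # the order of the cards is always random
    attempts = 0
    for card in cards:
        for _ in range(max_tries_per_card):
            attempts += 1
            if choose(card) == card:
                break
    return attempts


# A child who always matches correctly needs one attempt per card.
print(run_matching_task(lambda emotion: emotion, random.Random(0)))
```

With two sets of five cards, ten attempts is the minimum, so any larger count reflects incorrect placements.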

Practice phase
In the practice phase, the children performed three different game scenarios:
- Recognize (to identify and label facial expressions and gestures matching emotions): the robot (in G1) or the human partner (in G2) displayed an emotional facial expression and the corresponding gesture, and the child had to choose the correct racket matching the emotion;
- Imitate Me (to reproduce a facial expression representing an emotion): the robot (in G1) or the human partner (in G2) displayed an emotional facial expression, and the child had to display the same facial expression;
- Storytelling (to evaluate the affective state of a character at the end of a story): the robot (in G1) or the human partner (in G2) told a social story implying an emotional state, and the child had to choose the correct racket matching that emotion.
A total of eight sessions were performed with each child, with the first and second sessions, and the seventh and eighth sessions, held on the same day. The experimenter interacted with each child twice a week for three weeks. When a child had to miss a session, it was rescheduled. Children who missed more than two sessions were excluded from the experiments. The game scenarios were performed as presented in the following list: - The distribution of the game scenarios in this way took into account the experience drawn from the literature and from two focus groups held with professionals who interact daily with children with ASD.

Focus groups
Two focus groups were formed: one consisted of five professionals who normally accompany children with ASD as carers, and the other consisted of four occupational and speech therapists. One of the goals of this study was to verify what kind of vocabulary should be used by the experimenter and by the robot in the instructions of the game scenarios. In addition, it was necessary to define the best position of the participants in the room (experimenter, child, and robot) and the procedure to start and finish sessions. Regarding the difficulty level of the game scenarios, the professionals considered that the Recognize activity would be the basic task, followed by the Imitate Me activity, since the latter involved the identification and then the imitation of facial expressions. Considering that the children had to identify the character's affective state in a story, the Storytelling activity was ranked as the hardest for children with ASD. The game scenarios were presented a similar number of times (four times for the Recognize and Imitate Me game scenarios, and three times for the Storytelling game scenario), with new factors introduced in each session to keep the child motivated. Each session with the children took between 5 and 15 minutes, according to the number of game scenarios the children had to perform. No more than one minute passed between one scenario and the next. The experiments were carried out by the first author. The design of the facial expressions and corresponding gestures representing emotions was tested with typically developing children and adults (Costa et al., 2013), and the game scenarios were tested in exploratory studies with a small sample of children with ASD (Costa et al., 2014a,b).

The robot
The robot used in the studies differs greatly from robots used in other designs because its face is covered with a polymeric material called Frubber, giving it the ability to display varied facial expressions (Figure 2). This humanoid robot developed by RoboKind (Hanson et al., 2009) possesses a walking body (with 31 degrees of freedom in total) that simulates the expressive capabilities of a human-inspired character face and gestural body. The robot is 60 cm tall, weighs less than 6 kg, is low power, and is battery operated. It has two high-definition (HD) 720p cameras embedded in its eyes with USB 2.0 interfaces, and it includes Wi-Fi, USB ports, and all associated power adapters.
The RoboKind software performs animation and motion control functions, and it includes an Application Programming Interface (API) for rapid integration of other components, distributed computation, and shared control. The robot includes the parameters relating facial expressions to servo-motors. Hereafter, the robot is referred to as ZECA (Zeno Engaging Children with Autism).

Robot's input and processing
To allow the automatic identification of the answers to the robot's prompts, the children could select one of five rackets placed in front of them and show it to the robot (Figure 3). The images displayed on the rackets were chosen considering the opinion of professionals working in special education (Costa et al., 2014b). The option chosen was images of unknown persons, so that it would be easier for the children to generalize to other human beings.
Each racket featured a picture of a face representing an emotion and its corresponding label. Additionally, each racket had a Quick Response (QR) code, which was used to automatically identify the emotion. This QR code was read by one of the robot's HD cameras. In case of failure, a keypad (Wizard of Oz) was used to guarantee timely feedback from the robot to the child.
The experiments started with the robot verbally prompting the child (for example, "What is the correct choice?"). The child answered by choosing the corresponding racket. When the child answered successfully, the robot gave him/her a reward based on the child's favourite type of reward, as identified by the teacher (movement, verbal, sound, or a combination of them). If the answer was incorrect, the robot shook its head and said, for example, "Ups. Pay attention. Let's try another one!".
As can be seen in Figure 3, each picture on the racket was labelled. The label was added to match the teaching strategy used by the professionals with the children participating in this study. Even though the children could identify the answer on the racket by its label, they first had to recognize the expression displayed either by the robot or by the human partner.
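The answer-handling logic described in this subsection (QR reading first, keypad as a fallback, then a reward or a corrective reaction) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the RoboKind API: `read_qr`, `read_keypad`, the emotion identifiers, and the returned action strings are all hypothetical names introduced for illustration.

```python
from typing import Callable, Optional

EMOTIONS = ("happy", "sad", "surprised", "afraid", "angry")


def resolve_answer(read_qr: Callable[[], Optional[str]],
                   read_keypad: Callable[[], str]) -> str:
    """Return the emotion on the racket shown by the child.

    The QR reading is tried first; if it yields nothing usable, the
    operator's keypad entry (Wizard of Oz) is used instead.
    """
    decoded = read_qr()
    if decoded in EMOTIONS:
        return decoded
    return read_keypad()


def feedback(prompted: str, answered: str, favourite_reward: str) -> str:
    """Choose the robot's reaction to the child's answer."""
    if answered == prompted:
        return f"reward:{favourite_reward}"
    return "shake_head_and_encourage"


# Hypothetical run: the camera fails to decode, so the keypad entry is used.
answer = resolve_answer(lambda: None, lambda: "sad")
print(answer, feedback("sad", answer, "sound"))
```

Keeping the fallback inside a single function means the feedback step does not need to know which input channel produced the answer.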

Experimental setup
The sessions took place in an individual context, encouraging triadic relationships between the child, the experimenter, and the robot (Figure 4). The arrangement of the elements in the room (robot, child, and experimenter) was organized according to a cooperative position (Pease and Pease, 2008). The robot in the centre of the room forms a triangle with the child and the experimenter, promoting a triadic interaction. Two people work together on the same task, providing an opportunity for eye contact and mirroring. With this arrangement, the child's space is not threatened and there is no forced eye contact, allowing the experimenter to encourage the child to participate and be engaged in the interaction. All sessions were videotaped, with two cameras placed strategically to record the interaction of the child with the robot and the experimenter.

Evaluation tools
The video analysis of the children's behaviours played an important role and is the main source of information. The videos produced were analysed using specialized software, The Observer XT from Noldus (Noldus, 1991), to quantify predetermined behaviours performed by the children. In the list below, state events stand for behaviours that take a period of time and therefore have a duration. Point events stand for behaviours that only take an instant in time, or whose duration is not important.
- Prompts (Point Events): prompt made either by the robot or the experimenter to request the answer happy, sad, surprised, afraid, or angry;
- Answers (Point Events):
  - Happy, Sad, Surprised, Afraid, Angry: answer given by the child;
  - Successful: right answer to the previous prompt;
  - Unsuccessful: wrong answer to the previous prompt;
  - Unanswered Prompt: there is no answer from the child, or the experimenter repeats the previous prompt;
- Duration (State Events): duration of execution of the performance task.
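To make the coding scheme concrete, the following Python sketch shows one way a chronological log of point events could be turned into the percentages of successful, unsuccessful, and unanswered prompts. It is not The Observer XT's data model: the `Prompt` and `Answer` classes, the function names, and the example log are assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Prompt:
    requested: str  # emotion requested by the robot or the experimenter


@dataclass(frozen=True)
class Answer:
    given: str  # emotion chosen by the child


def classify(events):
    """Label each prompt as successful, unsuccessful, or unanswered.

    A prompt counts as unanswered when the next event is another prompt
    (for example a repetition) or when the log ends.
    """
    outcomes = Counter()
    for i, event in enumerate(events):
        if not isinstance(event, Prompt):
            continue
        following = events[i + 1] if i + 1 < len(events) else None
        if isinstance(following, Answer):
            correct = following.given == event.requested
            outcomes["successful" if correct else "unsuccessful"] += 1
        else:
            outcomes["unanswered"] += 1
    return outcomes


def percentages(outcomes):
    """Convert outcome counts into percentages of all prompts."""
    total = sum(outcomes.values())
    if total == 0:
        return {}
    return {label: 100.0 * count / total for label, count in outcomes.items()}


# Hypothetical log: one correct answer, one repeated prompt, one wrong answer.
log = [Prompt("happy"), Answer("happy"),
       Prompt("sad"), Prompt("sad"), Answer("angry")]
print(percentages(classify(log)))
```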

Results
This section presents the results regarding the performance in the game scenarios Recognize, Imitate Me, and Storytelling, as well as the comparison between the pre- and post-test using the performance task. Since the data obtained do not follow a normal distribution, non-parametric tests were used for the statistical analysis. Whenever the comparison is between the two groups, Mann-Whitney U tests are used to compare independent data from each group (e.g. first session G1 vs. first session G2). The comparison of sessions within the same group involves dependent data, since the sessions were performed by the same children (e.g. G1: first session vs. last session), so Wilcoxon tests are used.
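The two tests named above can be sketched with the Python standard library as follows. This is an illustrative implementation, not the analysis pipeline used in the study: it uses the normal approximation with a tie correction and no continuity correction, which is only rough for small samples, it assumes both samples are non-empty, and the example scores are invented.

```python
import math
from collections import Counter


def midranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def _two_sided_p(statistic, mean, variance):
    """Two-sided p-value from the normal approximation."""
    if variance <= 0:
        return 1.0
    z = (statistic - mean) / math.sqrt(variance)
    return math.erfc(abs(z) / math.sqrt(2))


def mann_whitney_u(x, y):
    """U statistic and approximate p for two independent samples."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    n = n1 + n2
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    return u, _two_sided_p(u, n1 * n2 / 2, variance)


def wilcoxon_signed_rank(first, last):
    """W statistic and approximate p for paired samples (zero differences dropped)."""
    diffs = [b - a for a, b in zip(first, last) if b != a]
    n = len(diffs)
    magnitudes = [abs(d) for d in diffs]
    ranks = midranks(magnitudes)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w = min(w_plus, n * (n + 1) / 2 - w_plus)
    ties = sum(t ** 3 - t for t in Counter(magnitudes).values())
    variance = n * (n + 1) * (2 * n + 1) / 24 - ties / 48
    return w, _two_sided_p(w, n * (n + 1) / 4, variance)


# Invented success percentages, for illustration only.
g1_last, g2_last = [80, 85, 90, 75, 95], [60, 65, 70, 55, 85]
print(mann_whitney_u(g1_last, g2_last))                        # between groups
print(wilcoxon_signed_rank([50, 55, 60, 45, 70], g1_last))     # within one group
```

The between-group call treats the two lists as unrelated children, whereas the within-group call pairs each child's first and last session, which is the distinction drawn in the paragraph above.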
To ensure inter-rater reliability, 10% of the videos were re-coded by a second independent coder, resulting in a Cohen's kappa k = 0.72. This is acceptable, as having a Cohen's kappa value higher than 0.60 suggests a good agreement between the raters (Bakeman and Gottman, 1997).
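For reference, Cohen's kappa compares the observed agreement p_o between two coders with the agreement p_e expected by chance from each coder's label frequencies: kappa = (p_o - p_e) / (1 - p_e). A minimal Python sketch is given below; the function name and the example codes are hypothetical and are not the study's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long sequences of categorical codes."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same, non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same code by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per code.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[code] * count_b[code] for code in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical code throughout
    return (observed - expected) / (1 - expected)


# Hypothetical codes: S = successful, U = unsuccessful.
first_coder = ["S", "S", "U", "U"]
second_coder = ["S", "S", "U", "S"]
print(cohens_kappa(first_coder, second_coder))  # 0.5
```

In this example the coders agree on 3 of 4 items (p_o = 0.75) while chance agreement is p_e = 0.5, so kappa is 0.5.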
The percentages of successful answers, unsuccessful answers, and unanswered prompts for the children in G1 and in G2 are presented in Figure 5. When comparing the first session of each game scenario in both groups, no significant differences were found in the Recognize game scenario (p = .755), in the Imitate Me game scenario (p = .135), or in the Storytelling game scenario (p = .427). However, when comparing the last session of each game scenario in both groups, significant differences were found in the Imitate Me game scenario (p = .014) and in the Storytelling game scenario (p = .006). There were no significant differences in the last session of the Recognize game scenario (p = .660). Figure 6 compares the successful answers between the groups for each game scenario. Using a Wilcoxon test, the first and the last session of each group were compared in each game scenario (e.g. performance in Session 2 and Session 5 in the Recognize game scenario).
When comparing the first session to the last session in the Recognize game scenario, significant differences were found for G1 (p = .013) but not for G2 (p = .069). The same was found for the Imitate Me game scenario (G1: p = .001; G2: p = .063) and the Storytelling game scenario (G1: p = .001; G2: p = .868).
On average, the performance of G1 in the Recognize game scenario increased by 23 percentage points (from 50.5% to 73.4%), while the performance of G2 only increased by 9.2 percentage points (from 52.5% to 61.6%). In the Imitate Me game scenario, the performance of the children in G1 increased on average by 16.2 percentage points (from 67.5% to 83.7%) and by only 7.6 percentage points in G2 (from 54.3% to 62.0%). In the Storytelling game scenario, the performance of the children in G1 increased by 19.5 percentage points (from 62.7% to 82.3%), while it decreased by 0.4 percentage points in G2 (from 51.8% to 51.4%).
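The gains reported above are differences between two percentages, that is, percentage points, which should not be confused with the change relative to the starting value. The short Python sketch below illustrates the distinction using the G1 means for the Recognize game scenario; the function names are chosen for illustration. Differences recomputed from the rounded means can deviate by 0.1 from the reported gains, presumably because the reported values were computed before rounding.

```python
def point_change(first, last):
    """Change in percentage points between two percentages."""
    return last - first


def relative_change(first, last):
    """Change relative to the starting value, in percent."""
    return 100.0 * (last - first) / first


# Mean success rates (%) of G1 in the Recognize game scenario.
first, last = 50.5, 73.4
print(round(point_change(first, last), 1))     # 22.9
print(round(relative_change(first, last), 1))  # 45.3
```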
Besides comparing the first and the last session within each group and between groups, it is important to verify whether the number of successful answers (SA) exceeded the number of unsuccessful answers and unanswered prompts (UUP) across the sessions.
There was no significant difference when comparing the number of SA with the number of UUP in the first session of the Recognize game scenario performed by children in G1. In the same group, significant differences were found in the last session of this game scenario when comparing the number of SA with the number of UUP (p = .005). In both the first and the last session of the Imitate Me game scenario, the number of SA exceeded the number of UUP (first session: p = .010; last session: p = .001) for children in G1. In the Storytelling game scenario, significant differences were not found in the first session (p = .154), but they were found in the last session (p = .003).
The same analysis comparing the number of SA and UUP by children in G2 was performed and no significant differences were found.

Comparison of the pre-and post-test data between G1, G2, and G3
As mentioned in Section 3.2, a third group of children participated in this study, performing only the pre- and the post-test. Before and after the experimental procedure, which included the three game scenarios, the children of the three groups completed a performance task. Figures 7 and 8 present the average number of attempts and the time that children in every group took to complete the performance task twice.
Figure 7. Number of attempts in the performance task by children in G1, G2, and G3.
No significant differences were found between any of the three groups when comparing the number of attempts to match the two series of cards representing facial expressions in the pre- and in the post-test (i.e., number of attempts in the pre-test: G1 vs. G2 vs. G3, and number of attempts in the post-test: G1 vs. G2 vs. G3). In addition, significant differences were not found when comparing the dura- However, significant differences were found in the time that children took to complete the task from the pre- to the post-test: G1 pre-test: 99.3 seconds, post-test: 82.0 seconds, p = .017; G2 pre-test: 155.7 seconds, post-test: 115.56 seconds, p = .031; G3 pre-test: 140.47 seconds, post-test: 105.93 seconds, p = .026.

Discussion of the results
This study focuses on one behaviour that is seriously impaired in children with ASD: facial expression recognition.
Concerning the children's performance in the game scenarios (Table 1), it is understandable that, when comparing the first session of each game scenario between groups, there was no difference. The children were theoretically at the same level in the beginning of the procedure. The children were assigned to each group randomly, only taking into account their age.
Only 4 sessions took place between the first and the last session of the Recognize game scenario, and it is possible that the intervention did not last long enough to have the desired effect on the children, at least not a statistically significant one. However, there was an effect on some children, because the percentage of successful answers in the last session was higher for G1 than for G2. Children in G1 gave over 10% more successful answers in the Recognize game scenario than children in G2 in the last session. Although this result is encouraging, more sessions and/or a larger sample are needed to show significant results.
The first session of the Imitate Me game scenario introduced a new dynamic to both groups, because the children had to adapt their expectations to a new situation. When comparing the last session of both groups, a significant difference was found in the successful answers from the children (p = .014). This might imply that the robot had a role in promoting the acquisition of the skill of recognizing and imitating different facial expressions to convey emotions. This result may offset the limitation noted for the first scenario, where there were not enough sessions for a significant difference to become evident in the Recognize game scenario. Another possibility is that the children found it easier to imitate a robot because the facial expressions produced by the robot were standardized and always repeated in the same way, unlike the ones produced by the human partner.
In general, children in G1 kept improving along the sessions, and differences were found when comparing the first to the last session in each game scenario (Recognize: p < .013; Imitate Me: p = .001; Storytelling: p < .001). The results show a higher improvement in the Recognize game scenario than in the Imitate Me scenario because the first session of the Recognize game scenario was in fact the children's first session with the robot/human partner. This implies that, in that session, the children were more focused on the physical aspect of the robot itself or on the new game partner than on the prompts of the game scenario. In addition, the performance in the first session of the Imitate Me game scenario was already close to 70% in G1, which left the children little room for improvement.
As with any intervention appropriate for children with ASD, some improvement in performance is expected, and this was verified in G2. However, this increase was not significant, unlike the one observed with the use of the robot in G1. This might indicate that the robot was in fact a beneficial tool to promote the acquisition of the emotion recognition skill, especially because significant improvement was verified in G1 and not in G2 for all game scenarios.
The Storytelling game scenario shows the strongest results on the evolution of the children's performance in G1, both when comparing the first to the last session and when comparing the successful answers to the sum of the unsuccessful answers and the unanswered prompts. On average, the performance of the children in G1 increased by 19.53 percentage points (from 62.7% to 82.3%).
Moreover, children who performed this game scenario with the robot had 30% more successful answers than the children who performed the game scenario without the robot. The most pronounced difference was thus verified in the Storytelling game scenario, which the professionals who participated in the focus groups classified as the most difficult task for the children.
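The G1 Storytelling improvement is quoted together with its first-session and last-session values (62.7% and 82.3%), so it is a difference between two percentages rather than a relative gain. The short sketch below makes that distinction explicit using the rounded values from the text. The function names are hypothetical, and the rounded inputs give 19.6 points, which agrees with the reported 19.53 up to rounding of the session values.

```python
def percentage_point_change(first_pct, last_pct):
    """Absolute change between two percentages, in percentage points."""
    return last_pct - first_pct


def relative_change_pct(first_pct, last_pct):
    """Change expressed relative to the starting percentage, in percent."""
    return (last_pct - first_pct) / first_pct * 100.0


# Rounded G1 Storytelling values quoted in the text (first and last session).
first_session, last_session = 62.7, 82.3

points = percentage_point_change(first_session, last_session)
relative = relative_change_pct(first_session, last_session)

print(f"{points:.1f} percentage points")  # 19.6 percentage points
print(f"{relative:.1f}% relative gain")   # 31.3% relative gain
```

Stating which of the two measures is meant avoids ambiguity whenever a figure such as "30% more successful answers" is compared across groups.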
The results from the pre- and post-tests (Table 2) indicate that the children were faster to complete the task in the post-test in all groups, with significant differences. However, there was no difference in the number of attempts to complete the task in any group, which indicates that the children still had difficulty generalizing the knowledge acquired in the experimental procedure.
Numerous systematic reviews have presented criteria to assess the confidence of evidence for a study (Palmen et al., 2012; Ramdoss et al., 2011). Research has to meet five criteria to be considered conclusive: (1) the study used an experimental design (e.g., a group design with random assignment, an ABAB design or a multiple baseline design), (2) adequate inter-observer agreement and treatment integrity were reported, (3) operational definitions for dependent variables were provided, (4) sufficient details for replication were provided, and (5) the design of the study provided at least some control for alternative explanations for increases in the target behaviour (e.g., a multiple baseline design across participants in which the start of interventions was staggered and simultaneous interventions targeting the same behaviour were held constant) (Ramdoss et al., 2012).
The present study meets all of these criteria. First, an experimental design was used (i.e., a multiple baseline design across participants with random assignment to experimental groups). Second, inter-observer agreement and treatment integrity were adequate (i.e., a second rater coded 10% of the videos to ensure the quality of the obtained data, evaluated by an inter-rater reliability test). Third, an operational definition of a prompt was provided (i.e., the experimenter/robot asks a syntactically correct question that implies carrying out an action). Fourth, participant characteristics, setting, materials, procedures and data collection were described in detail to enable replication. Fifth, the design of the present study provided some control for alternative explanations by using a control group. As a result, this study measured children's behaviour during a robot intervention and found evidence that a robotic tool might promote facial expression recognition skills in children with ASD.

Summary of hypotheses and implications
This study targeted the analysis of the children's performance, focusing on facial expression recognition skills. As a comparative study, the goal was to verify if the robot had any measurable influence in game scenarios which aimed to encourage the identification and labelling of facial expressions. Regarding the research questions presented in the beginning of this section, the following implications were found:
a. Can a humanoid robot contribute to develop visual facial expression recognition in children with ASD?: The number of successful answers in G1 largely exceeded the sum of the unsuccessful answers and the unanswered prompts in the Recognize game scenario, while in G2 this was not verified. This is further supported by the significant differences found between the first and the last session of this game scenario in G1 but not in G2. However, the results comparing the pre-test to the post-test were not conclusive: no difference was found in the number of attempts to complete the task for any of the three groups, although there were statistical differences in the time the children took to accomplish the task for all groups. This might indicate that the children acquired the skill but had difficulty generalizing it. The expectations regarding this research question were partially fulfilled.
b. Can a humanoid robot with the capability of displaying facial expressions elicit facial expression imitation skills in children with ASD?: The results regarding the performance of the children in the Imitate Me game scenario indicate that children in G1 performed significantly better than children in G2. This was verified when comparing the number of successful answers to the sum of the unsuccessful answers and the unanswered prompts, when comparing the success between the first and the last session of the game scenario, and when comparing the groups. The expectations regarding this research question were fulfilled.
c. Can a humanoid robot help children with ASD to attribute mental states and to identify others' affective states?: Considering the Storytelling game scenario, the children in G1 performed better than the children in G2, so the expectations regarding this research question were fulfilled. After the procedure, the performance of the children in G1 was 30% higher than the children's performance in G2. Differences between the first and the last session of this game scenario were observed only in G1, which might indicate that the robot helped the children understand the perspective of the character in the story.

Conclusions and future work
This research developed a tool that attracts the children's attention, giving an excellent opportunity to develop social skills that are deeply impaired in children with ASD. Promoting social interaction skills in this target group is challenging, but the research presented in this paper indicates that this tool may facilitate the learning process. The main contributions of this research are:
- An original study with 45 children with ASD compared the use of a robotic tool with a traditional intervention aimed at promoting emotion recognition skills;
- The children's performance provides strong evidence that the robot is a valuable tool to encourage the acquisition of facial expression recognition skills by children with ASD. This knowledge was attained at three different levels: identifying and labelling facial expressions and the corresponding gestures, imitating facial expressions, and inferring the affective state of another person;
- A game scenario focusing on the imitation of emotional facial expressions provided strong evidence of the children's engagement in the interaction, validated by their non-verbal behaviours;
- Storytelling, a game scenario with the specific goal of identifying the affective state of the character at the end of a social story. Children who interacted with the robot showed an improvement in performance between the first and the last session of the Storytelling game scenario, which might indicate that the robot helped the children understand the perspective of the character in the story.
Nevertheless, some limitations can be identified in the studies presented in this article. The experimenter had to adapt to the individual differences between the children, mainly in their communication abilities (non-verbal vs. verbal) and in their attention span, which might have resulted in slight variations of the experimental procedure during the sessions. Regarding future work, the developed game scenarios with the robot could be used in a small-group context. The research presented in this article has already shown the potential of the robot to encourage the interaction between a child with ASD and an adult in an individual context, and it would be interesting to observe how children with ASD would split their attention and interact in small groups.
In addition, several experiments could be conducted to narrow down the beneficial factors of the robot and to test their impact. For example, an experiment could investigate the impact of the added gestures corresponding to the facial expressions, to verify whether their addition provides a differential effect. Similarly, in the Storytelling game scenario, the performance of the children could be evaluated with and without the storyteller using facial expressions. Still related to this game scenario, measuring the change in the children's emotional vocabulary before and after the intervention could be a useful procedure to evaluate the benefits in terms of generalisation.