A total of 17 studies were reviewed

A total of 17 studies were reviewed, all meeting the inclusion criteria. Of these, 82% were published in the USA, with one study each from Canada, Finland and France making up the remainder. All of the studies measured behavioural performance as a result of concurrent auditory feedback used as a tool to facilitate learning and retention. The behaviours measured were varied, with eight studies (47%) focusing on sport and leisure activities: shooting (Konttinen et al., 2004), rowing (Gauthier, 1985), football (Harrison & Pyles, 2013; Stokes et al., 2010), dance (Quinn et al., 2015; Quinn et al., 2017), golf (Fogel et al., 2010) and pole vaulting (Scott et al., 1997). Three of the studies (17%) included individuals with Intellectual Disabilities (ID; Holden & Corrigan, 1980; Wertalik & Kubina, 2017; Persicke et al., 2014), two (11%) included surgical residents (Levy et al., 2015; Levy et al., 2016), and one each involved air force personnel, university students, infants and behaviour therapists (Reynolds & Adams, 1953; Hubbard, 1951; Lee & Newell, 2013; Herron et al., 2018). The studies were published across twelve different journals (see TABLE 4 for a breakdown of where studies were published): five (29%) in the Journal of Applied Behaviour Analysis, two (11%) in the Journal of Experimental Psychology, two (11%) in journals focused solely on sport (Journal of Sports Sciences and Journal of Sport & Exercise Psychology), two in journals on ID (Journal of Autism & Developmental Disorders and American Journal of Mental Deficiency) and one in a journal focused on clinical orthopaedics (Clinical Orthopaedics and Related Research), with the rest in journals focusing on behavioural health, education, management and development.

3.1 Participants
A total of 450 participants were involved in the studies. The distribution of gender was reported for 82% of the studies; one study of 81 participants did not report the ratio of male to female participants (Hubbard, 1951). The majority of participants in studies reporting gender were male (84%). Only one study (5%) included equal numbers of male and female participants (Gauthier, 1985), while seven studies (41%) included only males in their sample (Reynolds & Adams, 1953; Wertalik & Kubina, 2017; Scott et al., 1997; Persicke et al., 2014; Harrison & Pyles, 2013; Stokes et al., 2010; Konttinen et al., 2004). Twelve papers (70%) reported the age of their participants: six of these (50%) reported exact ages, with a mean of 6.5 years; three reported the mean and range together (Stokes et al., 2010; Gauthier, 1985; Konttinen et al., 2004); two reported just the range (Harrison & Pyles, 2013; Quinn et al., 2017); and one reported an approximate age (Fogel et al., 2010). The overall participant age range was 0.19-25 years.
In the present review, 22 participants (4.9%) had an ID. Three of these participants (13%) had IEP goals associated with the skills needed for the study (Wertalik & Kubina, 2017); two of these were selected based on scores from the Adaptive Behavior Assessment System, 3rd Edition (ABAS-III; Harrison & Oakland, 2015) and one based on the Vineland Adaptive Scales, 2nd Edition (Vineland-II; Sparrow et al., 2005). One participant (Persicke et al., 2014) was diagnosed according to DSM-IV-TR criteria (American Psychiatric Association, 2000) and scored in the mild to moderate range on the Childhood Autism Rating Scale (CARS; Schopler et al., 1980). The remaining 18 participants (81%) with ID had a mean IQ score of 65.2 (SD = 7.2), assessed with the Wechsler Intelligence Scales (Wechsler, 1974).
The military was represented by 228 participants (51%) in two studies (Konttinen et al., 2004; Reynolds & Adams, 1953), and 98 (22%) of the participants were university students across three studies involving rowing, shooting and a discrimination task (Gauthier, 1985; Scott et al., 1997; Hubbard, 1951). Surgical and medical students were represented by 35 (7%) participants across two studies (Levy et al., 2015; Levy et al., 2016). A study using infants had 11 participants (Lee & Newell, 2013), eight participants (1.7%) from two studies were high school football players (Harrison & Pyles, 2013; Stokes et al., 2010), students of dance were represented by 10 participants across two studies (Quinn et al., 2015; Quinn et al., 2017) and golf was represented in one study with a single participant (Fogel et al., 2010).
3.2 Types of study design
TABLE 1 and TABLE 2 show that there were both Group Designs (GD; n = 8) and Single Case Designs (SCD; n = 9). The SCDs within this review included Multiple Baseline Designs (MBD; n = 6; 35%), with three of these across participants (Harrison & Pyles, 2013; Quinn et al., 2017; Stokes et al., 2010), two across behaviours (Quinn et al., 2015; Herron et al., 2018) and one across skill sets (Fogel et al., 2010). Further SCDs were an ABAB design (n = 1; Persicke et al., 2014), a Changing Criterion Design (CCD; n = 1; Scott et al., 1997) and an Adapted Alternating Treatments Design (AATD; n = 1; Wertalik & Kubina, 2017). Of the GD studies, there were Random Control Designs (RCT; n = 4; Konttinen et al., 2004; Hubbard, 1951; Reynolds & Adams, 1953; Gauthier, 1985), Cluster designs (n = 2; Holden & Corrigan, 1980; Levy et al., 2016) and Quasi Experimental Designs (QED; n = 2; Lee & Newell, 2013; Levy et al., 2015). One design randomised only some participants (Levy et al., 2015), one used repeated measures (Lee & Newell, 2013) and six (75%) studies used independent groups (Hubbard, 1951; Reynolds & Adams, 1953; Holden & Corrigan, 1980; Gauthier, 1985; Konttinen et al., 2004; Levy et al., 2016).

3.3 Types of interventions
The studies reviewed included a variety of interventions to facilitate learning across different types of participants, settings and subject areas. These included combining Auditory Feedback (AFb) with correction (n = 1; Persicke et al., 2014), prompts (n = 1; Scott et al., 1997), time on task (n = 2; Reynolds & Adams, 1953; Holden & Corrigan, 1980), proximity to a target (n = 3; Konttinen et al., 2004; Lee & Newell, 2013; Gauthier, 1985), correct responding (n = 2; Herron et al., 2018; Hubbard, 1951), verbal instruction and shaping with task analysis (n = 7; Harrison & Pyles, 2013; Wertalik & Kubina, 2017; Stokes et al., 2010; Quinn et al., 2015; Fogel et al., 2010; Levy et al., 2015; Levy et al., 2016) and task analysis using modelling with peer feedback (n = 1; Quinn et al., 2017). In addition, two of these studies also assessed AFb during extinction as a function of learning (Hubbard, 1951; Reynolds & Adams, 1953) and one study compared AFb with Video Modelling (VM), an already established intervention for people with ID (Wertalik & Kubina, 2017).
Duration of treatment was also examined across studies. This information was not available for two studies (Holden & Corrigan, 1980; Levy et al., 2015) and not determinable for two (Gauthier, 1985; Herron et al., 2018). Where it was reported, treatment consisted of weekly sessions over a range of 8-39 weeks (n = 3; Quinn et al., 2017; Stokes et al., 2010; Fogel et al., 2010), a number of days ranging from 12-78 days (n = 2; Wertalik & Kubina, 2017; Konttinen et al., 2004), sessions every other week for a period of 6 weeks (n = 1; Lee & Newell, 2013), one month per group (n = 1; Levy et al., 2016) or a number of sessions/trials ranging from 32-200 (n = 4; Harrison & Pyles, 2013; Hubbard, 1951; Scott et al., 1997; Reynolds & Adams, 1953).

3.4 Settings
Some of the interventions were conducted in locations central to the task being taught (n = 7; 41%), for example a football field (n = 2; Harrison & Pyles, 2013; Stokes et al., 2010), dance studio (n = 2; Quinn et al., 2015; Quinn et al., 2017), golf driving range (n = 1; Fogel et al., 2010), shooting range (n = 1; Konttinen et al., 2004) and track (n = 1; Scott et al., 1997). Other interventions were carried out in specialist university laboratories with purpose-built equipment (n = 7; Holden & Corrigan, 1980; Gauthier, 1985; Lee & Newell, 2013; Hubbard, 1951; Reynolds & Adams, 1953; Levy et al., 2015; Levy et al., 2016), in a school setting (n = 1; Wertalik & Kubina, 2017) and at home (n = 1; Herron et al., 2018).

3.5 Outcomes
All 17 studies reported treatment effects with the use of AFb except for one of the experiments in one of the group studies (Gauthier, 1985), which used electromyography (EMG) and AFb to signal correct engagement of biceps and triceps for ideal rowing. Among the other group studies, four reported no significant difference between control and experimental groups using AFb during acquisition, but post-training performance that was superior to that of control groups (Hubbard, 1951; Reynolds & Adams, 1953; Konttinen et al., 2004; Levy et al., 2015); one reported an increase in performance across trials (Holden & Corrigan, 1980); one reported a decrease in performance across trials as a function of age, depending on the type of AFb (Lee & Newell, 2013); and one reported promising results when measuring time to achieve task precision (Levy et al., 2016).
For 8 of the 9 SCD studies, the outcomes were measured by visual analysis of the data; the other measured celeration of behaviour change using the frequency of correct steps (n = 1; Wertalik & Kubina, 2017). The majority of SCD studies showed an increase in learning of the target skill with AFb, and there were promising results when it was compared to other evidence-based interventions, such as video modelling (VM; n = 1; Wertalik & Kubina, 2017). Most of the SCD studies (n = 8; 88%) measured performance either as the percentage of correct trials/skill steps to criteria (n = 7; Harrison & Pyles, 2013; Quinn et al., 2017; Stokes et al., 2010; Quinn et al., 2015; Fogel et al., 2010; Scott et al., 1997; Persicke et al., 2014) or as the percentage of opportunities to correct steps (n = 1; Herron et al., 2018).
The included studies were also examined for information relating to generalisation and maintenance. Nine (52%) included some measure: two included both (Fogel et al., 2010; Persicke et al., 2014), one included generalisation alone (Harrison & Pyles, 2013) and six included maintenance alone (Quinn et al., 2017; Stokes et al., 2010; Herron et al., 2018; Gauthier, 1985; Konttinen et al., 2004; Levy et al., 2016). The remaining eight did not include either (Wertalik & Kubina, 2017; Scott et al., 1997; Reynolds & Adams, 1953; Hubbard, 1951; Holden & Corrigan, 1980; Lee & Newell, 2013; Levy et al., 2015; Quinn et al., 2015).
Studies that included an assessment of social validity (n = 7; 41%) showed mostly positive outcomes. Three reported positive ratings (Levy et al., 2016; Herron et al., 2018; Fogel et al., 2010); in one, AFb was rated better than typical training (Quinn et al., 2015); in one, participant ratings varied and described it as too repetitive, whereas teachers rated it higher than baseline (Quinn et al., 2017); in one, teachers rated it positively (Wertalik & Kubina, 2017); and in one, ratings were mixed, with only one of four participants rating it higher (Stokes et al., 2010). The remaining studies (58%) did not include social validity measures (n = 10; Persicke et al., 2014; Scott et al., 1997; Reynolds & Adams, 1953; Hubbard, 1951; Holden & Corrigan, 1980; Harrison & Pyles, 2013; Gauthier, 1985; Konttinen et al., 2004; Lee & Newell, 2013; Levy et al., 2015).
3.6 Quality Appraisal
The criteria used to assess the methodological quality of the included group and single case studies can be found in Appendix XYZ. Each study was assessed for both primary and secondary quality indicators and a quality rating was assigned (based on XYZ Appendix).
From the 17 studies, one (6%) single case design was rated as strong (Herron et al., 2018); seven (41%) were rated as moderate (n = 2 group design; Konttinen et al., 2004; Lee & Newell, 2013; and n = 5 single case design; Harrison & Pyles, 2013; Quinn et al., 2017; Fogel et al., 2010); and nine (52%) received a weak rating (n = 6 group design; Hubbard, 1951; Reynolds & Adams, 1953; Holden & Corrigan, 1980; Gauthier, 1985; Levy et al., 2015; Levy et al., 2016; and n = 3 single case design; Wertalik & Kubina, 2017; Scott et al., 1997; Persicke et al., 2014).
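The shares quoted for the quality ratings can be recomputed from the counts. The short sketch below is illustrative only: the helper name `share` and the choice of rounding to the nearest whole number are assumptions, not part of the review's method.

```python
def share(count, total):
    """Return count/total as a percentage, rounded to the nearest whole number."""
    return round(100 * count / total)

# Counts of studies per quality rating, as reported in the text.
ratings = {"strong": 1, "moderate": 7, "weak": 9}
total = sum(ratings.values())  # 17 studies in all

for label, count in ratings.items():
    print(f"{label}: {count}/{total} = {share(count, total)}%")
```

With nearest-integer rounding, 9/17 (52.9%) comes out as 53%, so the 52% quoted for the weak rating appears to have been truncated rather than rounded.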
The primary quality indicators across both group and single case designs showed that only one group study had an unacceptable rating on the dependent variable (Levy et al., 2016); the rest, across both types of studies, were rated either acceptable or high. Six (35%) of all studies (one of which was a single case design; Persicke et al., 2014) had unacceptable ratings on participant characteristics (Levy et al., 2016; Levy et al., 2015; Gauthier, 1985; Reynolds & Adams, 1953; Hubbard, 1951). Secondary quality indicators were also reviewed across both sets of studies. The use of either blinding procedures or raters was reported in only three (17%) studies (Quinn et al., 2015; Lee & Newell, 2013; Herron et al., 2018). Procedural fidelity was not reported in any of the group design studies and was reported in only three (33%) single case studies (Wertalik & Kubina, 2017; Persicke et al., 2014; Herron et al., 2018). Generalisation and maintenance were reported for seven (77%) of the single case studies (Quinn et al., 2017; Stokes et al., 2010; Harrison & Pyles, 2013; Fogel et al., 2010; Scott et al., 1997; Herron et al., 2018; Persicke et al., 2014) and five (63%) of the group studies (Hubbard, 1951; Reynolds & Adams, 1953; Gauthier, 1985; Levy et al., 2016). Effect sizes were not reported in any of the single case studies; instead, they were derived from the graphs and calculated for seven (78%) of them. Among the group design studies, effect sizes were reported in only one (Konttinen et al., 2004). Social validity was reported in two (25%) of the group studies (Levy et al., 2015; Levy et al., 2016) and was not reported in the other six (75%; Hubbard, 1951; Reynolds & Adams, 1953; Holden & Corrigan, 1980; Gauthier, 1985; Konttinen et al., 2004; Lee & Newell, 2013), whereas six (67%) of the single case studies reported social validity (Quinn et al., 2017; Stokes et al., 2010; Quinn et al., 2015; Fogel et al., 2010; Wertalik & Kubina, 2017; Herron et al., 2018). None of the group designs reported interobserver agreement, whereas all of the single case designs did.
For single case designs, the primary quality indicator of baseline condition was reviewed and all had either a high or acceptable rating. Similarly, only two studies had an unacceptable rating on visual analysis (Wertalik & Kubina, 2017; Scott et al., 1997); these studies used an alternating treatments design and a changing criterion design, respectively. Experimental control was also reviewed: there were no unacceptable ratings, and none of the studies reported a Kappa statistic.

For group designs, primary quality indicators showed that none had unacceptable ratings on standardised measures and only one had an unacceptable rating on comparison condition (Levy et al., 2016). Five (62%) had high ratings on statistical tests, whereas three had unacceptable ratings (Gauthier, 1985; Levy et al., 2015; Levy et al., 2016). For secondary quality indicators, four (50%) had strong ratings for random assignment (Hubbard, 1951; Reynolds & Adams, 1953; Gauthier, 1985; Konttinen et al., 2004), six (75%) showed a measure of baseline equivalence (Hubbard, 1951; Reynolds & Adams, 1953; Holden & Corrigan, 1980; Gauthier, 1985; Konttinen et al., 2004; Lee & Newell, 2013) and sample attrition was not a source of bias in the group designs.

3.7 Effect sizes and statistical findings
Three group studies (n = 303; 67%) reported that there was no significant difference between test and control groups during acquisition using an auditory stimulus (Levy et al., 2015; Reynolds & Adams, 1953; Hubbard, 1951). Levy et al. (2015) used accuracy and fluency measures and reported that the difference in time to tie the first knot was significant (median 271 seconds, range 184-626 seconds) in favour of the control group (median 163 seconds, range 93-900 seconds; p = 0.017), with an effect size in the low range (d = 0.32 for a 95% confidence interval [CI]). For the drill behaviour, the results were not significant compared to the control group (test group M = 193 seconds, SD = 26; control group M = 146 seconds, SD = 63; p = 0.084); however, Cohen's d showed a strong effect in favour of the auditory stimulus group (see table 4; d = 0.98 for a 95% CI). The number of technique successes was nevertheless significant for both knot tying and drilling precision compared to control (p = 0.006 for median successes compared, with an odds ratio of 82.3, 29.1-232.8 for a 95% CI), and the number of mistakes was higher in the control group than in the test group (d = -1.06 for a 95% CI). Similarly, Levy et al. (2016) looked at fluency, accuracy and speed for ten surgical tasks in a test after a month-long rotation, to help design a modular curriculum for orthopaedic residents, and found that one participant was unsuccessful at one knot tying task and another did not perform the plunge board test. The average time to tie 10 successful right/left hand square knots, knot in a bucket and locking sliding knot ranged from 46.4 to 83.1 seconds (SD 10.5-22.8); the number successful out of the number attempted was 60/60, except for the locking sliding knot, which was 50/60.
For the cast saw task, the criterion was to cut through a cast with no injury; the time was 139.7 (SD = 58.2), with 6/30 complete cuts, 22/30 partial cuts and 1/30 eggplant cuts. Ten successful plunge board drillings averaged 123.5 seconds (SD = 24.7), with 25/60 successful, 21/60 dimples in the third board and 14/60 full penetration of the board. For the blind target task, the criterion was to drill eight holes evenly; the average time was 242.8 seconds (SD = 64.9), with 4/30 in the tunnel, 16/30 half an inch from the target, 8/30 five eighths from the target, 9/30 three quarters from the target and 11/30 one inch from the target. For sawing one limb of a tuning fork, the criterion was to saw off three slices with no corners or scuffs on the second limb; the average time was 145.1 seconds (SD = 40.1), with corner fractures in 2/18, scuffs on the second limb in 5/18 and cuts in the second limb in 2/18. For the tunnel with a cannulated reamer, the criterion was to complete the tunnel in one minute; six were completed, with no guide pins advanced and one removed (the guide pin was not supposed to be advanced or removed). For cutting a box graft with a chisel in two minutes, the criterion was 10 mm x 20 mm x 10 mm; mean width was 9.7 mm (SD = 0.5), mean length 20.3 mm (SD = 0.5) and mean depth 4.5 mm (SD = 1.26), all in six grafts. No effect size measurements were possible as there were no comparison groups.
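As a worked illustration of how a standardised effect size relates to the descriptive statistics, the sketch below computes Cohen's d from the drill-task means and standard deviations reported for Levy et al. (2015). It assumes equal group sizes, so that the pooled standard deviation is the root mean square of the two SDs; the group sizes are not given here, and the function name `cohens_d` is hypothetical. It is not necessarily the procedure used in the review.

```python
import math

def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d using a pooled SD that assumes equal group sizes (an assumption)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd

# Drill task, time in seconds: test group M = 193, SD = 26; control M = 146, SD = 63.
d = cohens_d(193, 26, 146, 63)
print(round(d, 2))
```

Under that equal-group-size assumption, the result rounds to 0.98, which agrees with the value quoted for the drill behaviour.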
The remaining two studies with similar findings during learning with auditory feedback examined the role of an auditory stimulus under conditions of acquisition and then extinction, in terms of correct responding (Hubbard, 1951) and proximity to target (Reynolds & Adams, 1953). They found no significant difference between the performance of control groups and test groups during learning (mean range 28.78-29.23 in the 0.1-2.0 second groups including control). However, both examined extinction as a function of learning and found, through analysis of variance, that auditory stimuli significantly enhanced performance of learning (the F for between-group differences was significant at the 5% level for all clicks compared to control, and when the .5 second click was added after the extinction trial for all groups it was significant at the 1% level), particularly when the sound was delivered at a 0.5 second interval (Reynolds & Adams, 1953) and when it was used during extinction trials (Hubbard, 1951). It was also found that, of group 1 (trained with light and tone, none during extinction; M = 15.00), group 2 (trained with light and tone, with tone during extinction; M = 33.19) and group 3 (trained with light only, with tone during extinction; M = 26.04), group 2 was superior to all groups (significant p