Education Application Testing Perspective to Empower Students' Higher Order Thinking Skills Related to The Concept of Adaptive Learning Media

This article argues for the importance of the testing step in designing an educational application, taking the development of adaptive learning media as a case study. The media contains a set of instruments built specifically to empower students' critical thinking skills. Three aspects are considered in testing this educational application: application validity at each stage of system development, measurement of the final system's feasibility against user needs, and system implementation by running the learning media on a test sample. Testing of the application product is carried out according to the system requirements and models. The characteristics of adaptive media and the diversity of menus in the application call for considerable improvisation during testing, such as determining the right test cases, choosing an appropriate test model and method, determining a suitable test environment, and considering several other aspects aimed at optimizing the test results in order to ensure the quality of the learning media product. The study interpreted the experts' validation assessments using a Likert scale against a defined assessment standard. The feasibility test data, collected from a sample of 20 students, were analyzed using the System Usability Scale (SUS) instrument. Effectiveness was tested using a pretest-posttest control group design with a sample of 98 students. Parametric or non-parametric analysis was then applied to the effectiveness data to assess the efficacy of the adaptive media product in improving students' higher order thinking skills (HOTS).
Based on the testing steps applied to the adaptive learning media application, the product was considered feasible and effective in empowering students' HOTS. The study concludes that the educational application testing carried out provides an objective and independent view of the adaptive learning media application, which is useful operationally for understanding its level of effectiveness before it is widely used in learning.


Indonesian Journal on Learning and Advanced Education http://journals.ums.ac.id/index.php/ijolae

Introduction
Educational application testing is an in-depth investigation carried out to obtain information about the quality of the learning media product being tested (Maulana et al., 2020). The increased visibility of educational applications as system elements and the "costs" arising from application failures have motivated good planning through careful and accurate testing. This makes testing educational applications an important stage in the development of learning media. Testing is necessary because application developers are not good enough programmers; may not be able to concentrate specifically on avoiding mistakes; sometimes forget to use structured programming in full; are sometimes bad at doing things; and must be able to distinguish between what other developers or users say and what they really think (Schwan et al., 2018).
Currently, learning aids are being developed in the form of information technology-based learning media (Afandi et al., 2018) (Seechaliao, 2017). Various advanced concepts and algorithms have been embedded in learning applications that follow today's technological developments. It is therefore important to adhere to correct testing rules in order to maintain the quality of the resulting media products.
Testing can be done by evaluating the application configuration, which consists of the requirements specification, the design description, and the resulting program (Kurniawan et al., 2022) (Purmadi & Surjono, 2016). The evaluation results are then compared with the expected test results. If errors are found, the application must be repaired and then tested again. Testing can therefore be regarded as a destructive rather than a constructive activity. However, the importance of testing learning media applications and its implications for quality cannot be overemphasized, because development involves a series of production activities in which the chance of human error can be very large. The development of educational applications should therefore be accompanied by quality assurance activities (Bedjou et al., 2015).
One of the products of this research is adaptive learning media. The term adaptive describes the media's intelligence in adjusting the presentation of material to the character of student learning. This article explains the forms of testing carried out during the research: validating the learning media, measuring application feasibility, and evaluating product effectiveness in empowering HOTS.

Method
This study includes two main aspects, namely an empirical study and the practice of testing steps in the development of adaptive learning media. The study presented in this article draws on several literature sources, both printed and electronic. The sources include primary and secondary sources that were studied empirically and descriptively. The testing steps were then practiced thoroughly in the development of adaptive learning media based on the identification of student learning characters. The learning media were developed using the Research and Development (R&D) method (Kusuma et al., 2017) with the Luther development model (Sulistyanto et al., 2019). Testing was carried out by following the steps of the R&D method.
Testing activities covered two stages of the R&D method: the development stage, consisting of validation and feasibility testing of the product draft, and the testing stage, called the application effectiveness test. In the feasibility test, the product draft was evaluated three times with nine students, who studied the product and assessed its performance so that it could be improved. The feasibility test yielded suggestions for revising the product draft before it was developed further.
Feasibility tests were also conducted with reference to standard software application design testing, using the System Usability Scale (SUS) model (Sauro, 2011) to determine the feasibility of using the product.
The analysis of the suggestions, inputs, and improvements from this activity became the basis for the final revision. The second and third feasibility tests were conducted in small groups of five students as a descriptive and meaningful test of the feasibility of the media product before field testing. The evaluation process was also carried out by teachers and students as respondents after using the media product. The effectiveness evaluation took the form of a pre-test and post-test measuring students' understanding, or cognitive learning outcomes. The effectiveness analysis employed a pretest-posttest control group design, as described in Table 1 below. The population for the effectiveness test was all students. A sample of 98 students was drawn using cluster random sampling to determine which students belonged to the experimental class and which to the control class. The form of the developed application product is shown by the system diagram in Figure 1 below.

Results and Discussion
According to Pressman (2010) and Sommerville (2011), the benchmark for judging a test to be good is that it has a high probability of finding errors. For this to happen, testers must understand how the application can fail. Each test conducted must have a different purpose. Furthermore, the test conducted should be the best type of test. Testing is possible in several ways. The test used was the one with the greatest probability of uncovering all categories of errors with the least amount of time and effort. The test was therefore complex.
Product draft validation was a process to assess whether the developed product draft was in accordance with the existing theoretical requirements. This validation was rational, as it was based on facts in the field. The interpretation of the validation assessment categories for learning media products in this study followed the scale of Wu & Leung (2017). The summary of expert validation on the concept of adaptive learning media syntax is shown in Table 3. The expert validation obtained an average interpretation value of 82.50%, which means the concept of the adaptive learning media model is very feasible. The syntax of the adaptive learning media is considered to have clear, systematic, and logical stages, and can be used to measure critical and creative thinking skills. Students were judged to be able to formulate answers in their own words, to be motivated to ask questions, and to be active in debate when applying the model. The model is considered capable of providing opportunities for students to take the initiative and to be responsive, innovative, communicative, and respectful of each other.
The model is also equipped with learning tools in the form of modules that the experts considered good. Students, as the targets of the model, can understand the material, are able to work together, and empower their critical thinking skills as the instructional impact of applying the model. The experts assessed that this adaptive learning model can also provide an accompanying impact, namely familiarizing students with solving problems actively and establishing good communication between students, so that in the end it can empower critical thinking skills. The experts advised that the support system be described in a little more detail.
The summary of validation by learning technology experts on the adaptive learning application can be seen in Table 4 below. Table 4 presents the assessment results of two learning technology experts, with an average interpretation index of 86.00%. Based on the rating scale category, the two learning technology experts considered the draft adaptive learning application product very feasible to be piloted. The suggestions and inputs given by learning technology experts 1 and 2 can be summarized as follows: 1) in general, the depth and breadth of the material presented need to be slightly improved; 2) in the assessment aspect, a more even distribution of material is needed; 3) the description of indicators needs to be clarified and the time allocation added to or adjusted; 4) the suitability of the questions to the make elaboration problem aspect should be slightly clarified.
Furthermore, a summary of the validation by application design and graphic experts is shown in Table 5 below. Table 5 presents the assessment results of two design and graphic software application experts, with an average interpretation index of 83.13%. Based on the rating scale category, both design and graphic experts rated the draft adaptive learning application very feasible to be developed. Suggestions for improvement from the design and graphic experts include: 1) arranging manuscripts, pictures, and illustrations so that they are easier to understand; 2) paying attention to free space in the content; 3) keeping letter shape and font size consistent on each page from beginning to end; 4) making the language of the sentences more flexible so that they are easy to understand; 5) optimizing the free space in each view with appropriate content.
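The interpretation indices reported above (82.50%, 86.00%, 83.13%) are percentages of the maximum attainable Likert score. The following is a minimal sketch of that calculation, not the authors' actual procedure: the function name, the five-point maximum, and the sample ratings are assumptions made for illustration only.

```python
def validation_percent(ratings, max_rating=5):
    """Return the ratings' total as a percentage of the maximum attainable total.

    max_rating=5 is an assumption; the source does not state the scale length.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > max_rating for r in ratings):
        raise ValueError("ratings must lie between 1 and max_rating")
    return sum(ratings) * 100 / (len(ratings) * max_rating)


# Hypothetical ratings from one expert on eight indicators (not the study's data)
expert_ratings = [4, 4, 5, 4, 3, 5, 4, 4]
print(validation_percent(expert_ratings))
```

The resulting percentage would then be read against the category scale the study adopted from Wu & Leung (2017).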
Data analysis of respondents' responses in the first feasibility test used a standard instrument for web application feasibility testing, namely the System Usability Scale (SUS) (Martins et al., 2015). The summary of the data analysis of the feasibility test results is shown in Table 6 below. Following the SUS calculation rules in Sauro (2011), the final average feasibility score is 77.625, as shown in Table 6. This average is greater than the SUS average value of 68, which indicates that there is no problem with the application. Based on Figure 2, the value of 77.625 falls in the good rating domain, on the class C scale, and within the acceptable interval. This shows that the adaptive learning media application is feasible and can be used in learning.
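The SUS calculation referred to above can be sketched as follows. This follows the standard SUS scoring rule (ten items rated 1 to 5; odd-numbered items contribute the rating minus 1, even-numbered items contribute 5 minus the rating, and the sum is multiplied by 2.5). The function names and the sample answers are illustrative assumptions, not the study's data.

```python
def sus_score(responses):
    """Return the SUS score (0-100) for one respondent's ten answers (1-5 each)."""
    if len(responses) != 10:
        raise ValueError("SUS needs exactly ten item responses")
    if any(r < 1 or r > 5 for r in responses):
        raise ValueError("each response must be between 1 and 5")
    total = 0
    for index, answer in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: contribution = answer - 1
            total += answer - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: contribution = 5 - answer
            total += 5 - answer
    return total * 2.5


def mean_sus(all_responses):
    """Average the per-respondent SUS scores."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)


# Hypothetical answers from three respondents (not the study's data)
sample = [
    [4, 2, 4, 2, 4, 2, 4, 2, 4, 2],
    [5, 1, 5, 1, 5, 1, 5, 1, 5, 1],
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3],
]
average = mean_sus(sample)
print(average, average > 68)  # prints: 75.0 True
```

Comparing the mean with 68 mirrors the benchmark comparison made in the text.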
The next stage is testing the effectiveness of the adaptive learning media. The first step is to test the balance between the experimental class and the control class, using the pre-test scores of the two groups. Because the prerequisite analysis found that the data were not normally distributed, the non-parametric Wilcoxon Signed Rank Test (paired test) and Mann-Whitney U Test (unpaired test) were used (Vong & Kaewurai, 2017). The results of the analysis are shown in Table 7 below. Based on the non-parametric analysis presented in Table 7, the comparison of pre-test scores between the experimental group and the control group gave a significance value of 0.358. As Sig = 0.358 > 0.05, it can be concluded that there is no significant difference between the experimental group and the control group. This means that the two groups did not differ in initial ability before the treatment was given to the experimental group. The results of the Wilcoxon Signed Rank Test are shown in Table 8 below. Based on the non-parametric analysis presented in Table 8, the comparison of pre-test and post-test scores in the experimental group gave a significance value of 0.000. Because Sig = 0.000 < 0.05, the difference between the pre-test and post-test mean values in the experimental group was significant. The comparison of pre-test and post-test scores in the control group also gave a value of 0.000. Because Sig = 0.000 < 0.05, the difference between the pre-test and post-test mean values in the control group was also significant. This suggests that in both the experimental group and the control group the pre-test and post-test scores differed, which means that students in both groups changed in ability from pre-test to post-test. Furthermore, the results of the test on the post-test scores between the two groups are shown in Table 4.35 below. Based on Table 4.35, Sig = 0.000 < 0.05, which means that the post-test scores of the two groups differ because of the treatment in the experimental group, namely learning using the application.
The analysis was also conducted on the gain, that is, the increase from the pre-test to the post-test results, as an indicator of the effectiveness of the adaptive media used in learning. The description of the gain score is shown in Table 10 below. In accordance with the interpretation category (Hake, 1999), the average gain score obtained in the control class is 55.11%, which belongs to the less effective category, while the score of 86.05% in the experimental class belongs to the effective category.
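The between-group comparisons reported above (pre-test balance and post-test difference) rely on the Mann-Whitney U test. The sketch below illustrates the idea in plain Python under stated assumptions: it uses the large-sample normal approximation without tie or continuity correction, so its p-values can differ slightly from those of statistical packages, and the scores are hypothetical rather than the study's data. In practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U, two-sided p) using the normal approximation (no corrections)."""
    n1, n2 = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    rank_sum_a = sum(ranks[:n1])
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    u = min(u_a, n1 * n2 - u_a)
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return u, p


# Hypothetical post-test scores (not the study's data)
experimental = [88, 92, 79, 95, 85, 90, 84, 91]
control = [70, 65, 78, 72, 60, 74, 68, 71]
u_stat, p_value = mann_whitney_u(experimental, control)
print(u_stat, p_value, p_value < 0.05)
```

A p-value below 0.05 would be read, as in the text, as a significant difference between the two groups.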
The description of the gain score per aspect of HOTS is shown in Table 11 below. The gain scores obtained for the HOTS aspects in the control class range from 36.91% to 71.68%, with an average of 58.37%, which falls in the sufficient category.
Meanwhile, for the experimental class, the gain scores range from 63.01% to 98.13%, with an average of 87.54%, which falls in the effective category.
The conclusion is that for the HOTS aspects the gain score in the experimental class is greater than in the control class, which indicates that the adaptive learning media is effective in empowering students' HOTS.
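The gain percentages discussed above can be illustrated with the normalized gain attributed to Hake (1999), g = (post - pre) / (max - pre), expressed as a percentage. The sketch below is an illustration only: the category cut-offs (40, 56, 76) are an assumption chosen to be consistent with the labels reported in the text, since the source does not state its thresholds, and it is not known whether the study averaged individual gains or computed the gain from class means.

```python
def gain_percent(pre, post, max_score=100):
    """Normalized gain as a percentage of the improvement still available."""
    if pre >= max_score:
        raise ValueError("pre-test score must be below the maximum score")
    return (post - pre) / (max_score - pre) * 100


def interpret_gain(percent):
    """Map a gain percentage to a category (cut-offs are assumed, see lead-in)."""
    if percent < 40:
        return "ineffective"
    if percent < 56:
        return "less effective"
    if percent < 76:
        return "sufficient"
    return "effective"


# Hypothetical student: pre-test 40, post-test 70 out of 100
print(gain_percent(40, 70))  # prints: 50.0

# Averages reported in the text, read against the assumed cut-offs
for value in (55.11, 58.37, 86.05, 87.54):
    print(value, interpret_gain(value))
```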
Educational application testing is the process of running and evaluating learning media software, manually or automatically, to assess whether the application meets its requirements (Khodadi & Abadeh, 2016) (Clune and Rood, 2011) (Yoshii & Nakajima, 2012) (Nakagawa and Maldonado, 2011). In short, testing is an activity to find and determine the difference between the expected results and the actual results.
Application testing follows the building of an application from abstract concepts of user needs. The purpose of testing is to "disassemble" the application that has been built. According to Chang et al. (2015), Jin and Xue (2011), and Kumamoto et al. (2010), testing is intended to find errors in the program that makes up the application and to evaluate its quality. The purpose of application testing according to Xie et al. (2015) is to assess whether the developed application meets user needs, to assess whether the application development stages are in accordance with the methodology used, and to document test results that report the suitability of the assessed application product against the existing requirements. The data collected during testing give a good indication of the overall reliability and quality of the application.
According to Pressman (2010), application program testing has several important objectives: (1) testing is carried out with the intention of finding errors; (2) a successful test is one that finds errors that have not been found before; and (3) a good test case is one that has a high probability of finding errors that have not been found before. Objectivity in testing can be achieved if several actors are involved during the test. According to Lamas et al. (2013), these include the customer (the team that contracts the application developer), the user (the group that will use the application), the application developer (the team that builds the application), and the application testing team (a special team assigned to test the functionality of the software application). In addition, testing should always be based on the following principles: tests should be traceable to customer needs, tests should be planned before testing begins, testing should start on a small scale and then move on to larger things, exhaustive testing is not possible, and testing should be carried out by a third party (Baoling et al., 2020) (Jiang and Lu, 2012) (Lemos et al., 2011).
The implementation of learning media application testing usually matches the development methodology used. Reza (2010) and Sommerville (2011) stated that testing is done after the programming stage, but test planning is conducted from the analysis stage onward. Overall, the stages in testing include determining what will be measured, determining how the test will be conducted, building test cases (sets of data or situations that will be used in testing), determining the expected results, running the test cases, and comparing the actual test results with the expected results.
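The stages listed above (build test cases, state the expected result, run the case, compare actual with expected) can be sketched as a small table-driven check. Everything here is hypothetical and for illustration only: `select_presentation` stands in for an adaptive rule that maps a learning style to a material format and is not the function used in the studied application.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """One test case: a name, the input data, and the expected result."""
    name: str
    learning_style: str
    expected: str


def select_presentation(learning_style):
    """Hypothetical adaptive rule under test: learning style -> material format."""
    formats = {
        "visual": "diagram",
        "auditory": "narration",
        "kinesthetic": "simulation",
    }
    return formats.get(learning_style.lower(), "text")


def run_cases(cases):
    """Run every case and report (name, passed, actual) for each."""
    results = []
    for case in cases:
        actual = select_presentation(case.learning_style)
        results.append((case.name, actual == case.expected, actual))
    return results


cases = [
    Case("visual learner", "visual", "diagram"),
    Case("case-insensitive input", "Auditory", "narration"),
    Case("unknown style falls back", "unknown", "text"),
]
for name, passed, actual in run_cases(cases):
    print(f"{name}: {'PASS' if passed else 'FAIL'} (actual={actual})")
```

Keeping the expected result next to the input makes the comparison step explicit and the resulting report easy to archive as test documentation.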
The analysis phase emphasizes the validation of user requirements to ensure that the requirements have been correctly defined. According to (…, 2010), engineered products can be assessed by: (1) knowing the specific functions that the product is designed to perform, so that tests are conducted to ensure that each function is fully operational and to find faults in each function; and (2) knowing the internal workings, so that tests ensure that the internal components work according to specifications. In this case there are therefore two types of test cases. First, given knowledge of the specific functions the product is designed to perform, testing can assess whether each function runs as expected. Second, given knowledge of how the product works internally, testing can show that the product works in detail according to its specifications.
There are two kinds of test case approaches, namely white-box and black-box. The white-box approach is a test to show how the product works in detail according to its specifications (Lei & Jiang, 2010) (Jiang, 2012) (Pressman, 2010). The logical paths of the software that makes up the application are tested by providing test cases that exercise specific sets of conditions and loops. This method yields test cases that ensure that all independent paths in a module have been exercised at least once, exercise all logical decisions on their true and false sides, execute all loops at their boundaries and within their operational bounds, and exercise internal data structures to ensure their validity. At first glance, it might seem that thorough white-box testing leads to a 100% correct program. The black-box approach is a testing approach to find out whether all software functions run well in accordance with the specified functional requirements (Jiang, 2012) (Pressman, 2010). These test cases aim to demonstrate that the functions of the software that makes up the application are operational. This technique focuses on the information domain of the application, deriving test cases by partitioning the program's input and output domains. The black-box method allows the application engineer to derive sets of input conditions that fully exercise all the functional requirements of a program. This test attempts to find errors in the following categories: incorrect or missing functions, interface errors, errors in data structures or external database access, performance errors, and initialization and termination errors.
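The difference between the two approaches can be illustrated on a small example. In this sketch, which is an assumption-laden illustration rather than part of the study, `classify_sus` is a hypothetical function under test that labels a SUS score against the benchmark of 68. The black-box cases are derived only from the specification by partitioning the input domain, while the white-box cases are derived from the code so that each decision is taken on both its true and false side.

```python
def classify_sus(score):
    """Hypothetical function under test: label a SUS score against 68."""
    if score < 0 or score > 100:
        return "invalid"
    if score >= 68:
        return "above average"
    return "below average"


# Black-box cases: input partitions (too low, valid below 68, valid from 68 up,
# too high) plus the boundary values between those partitions.
black_box_cases = [
    (-1, "invalid"),
    (0, "below average"),
    (67.9, "below average"),
    (68, "above average"),
    (100, "above average"),
    (100.1, "invalid"),
]

# White-box cases: chosen by reading the code so that every branch executes.
white_box_cases = [
    (-5, "invalid"),            # first decision true through score < 0
    (120, "invalid"),           # first decision true through score > 100
    (77.625, "above average"),  # first decision false, second true
    (50, "below average"),      # first decision false, second false
]


def failures(cases):
    """Return the cases whose actual result differs from the expected one."""
    return [
        (value, expected, classify_sus(value))
        for value, expected in cases
        if classify_sus(value) != expected
    ]


print(failures(black_box_cases), failures(white_box_cases))  # prints: [] []
```

On a function this small the two sets overlap heavily; on larger programs they tend to diverge, which is one reason the approaches are usually treated as complementary.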
Application testing is one element of a broader topic often referred to as verification and validation. Verification is a collection of activities that ensure that a software application correctly performs its function, while validation is a collection of activities that ensure that the application built meets customer needs. In other words, verification asks "Are we building the product right?" and validation asks "Are we building the right product?". Validation testing is carried out after all errors have been corrected. The indicator of a successful validation test is that the functions in the software are in accordance with what the user expects (Setyaningsih et al., 2022). If the application is made for a customer, an acceptance test can be performed, allowing the customer to validate all the requirements. This test is carried out so that customers can find more detailed errors and become familiar with the application that has been made. The forms of testing that can be used are alpha and beta testing. Alpha testing is done at the developer's site by the customer (Pressman, 2010). The application is used in a natural setting with the developer "looking" over the user's shoulder and recording all usage errors and issues. Beta testing, by contrast, is carried out at one or more customer sites by the end users of the application in a real environment. Developers are usually absent from these tests. The customer records all problems (real or imaginary) encountered during testing and reports them to the developer at certain time intervals (Pressman, 2010).
In the end, the application product is combined with other system elements and a series of validation tests is carried out. If the test fails or falls outside the scope of the system development cycle, the steps taken during design and testing can be improved. System testing is a series of different tests whose main objective is to exercise all elements of the system being developed. The types of system testing according to Pressman (2010) include recovery testing, security testing, and stress testing.
Several other aspects, seen from other perspectives, can serve as indicators of a good and optimal test implementation. As stated earlier, the essence of testing is finding software defects and evaluating software quality (Pressman, 2010) (Sommerville, 2011) (Gehring et al., 2017) (Wu, 2010). The first aspect is the justification of quality. It is certainly not easy to judge whether an application product is of good quality. The actual quality level of an application product is inseparable from the quality of the testing carried out. Because quality is not a specific concept but an abstract measure, the user can only know and judge that quality is essentially related to the level of a service or product, and that level is determined by the level of customer satisfaction. For this reason, quality standards need to be set. Several references can be used to measure the quality of application testing. The first is the quality of the test case itself: test cases can have defects as well, and such defects can affect the ability of the test to find "bugs". The next is the quality of the testing process, whose stability depends on the test environment. Next is the quality of the test results, which can be seen from the test report, and the quality of the test clients, namely the report readers. They can immediately feel the effect of the test, so the quality assessment can be made immediately.
The second aspect is accuracy in choosing the test method and model. A method or model that produces good tests on one application will not always be suitable for other applications. Selecting the right test method will certainly contribute to optimal test results. Considerations in selecting a method include time, available manpower, and the resources and equipment owned. The third aspect is variation in the implementation of the test.
Combining several test techniques will certainly increase the reliability of the application being tested, because it has passed more than one test case. Application reliability can also be achieved by testing software with methods that have been proven to perform well, such as the Bayesian method (Xu et al., 2013) (Cheng et al., 2010) or matrix transformation (Yang et al., 2015) (… et al., 2011).
The fourth aspect is basing the test on the application architecture. Architectural design provides an overview of the body of the application, which contains components and their relationships. A good understanding of the architecture of an application is extremely helpful in determining the appropriate test cases and test stages. Architecture-based testing also assists in deeper flaw detection and prevention.
The fifth aspect is that an application test does not always need to create new and special test cases. It is possible for the testing of one application simply to be hosted with that of other applications. This is possible […] (Wang & Mendori, 2016) regarding the importance of testing the validity and reliability of a product to determine its feasibility.
One of the quality assurance efforts in the development of this application was the feasibility trial stage, carried out three times with the aim of ensuring that the media development and the features available in the application are in accordance with the needs of the user. The first feasibility test captured many suggestions from potential users. After a number of improvements were made in accordance with the suggestions, the second trial followed. Potential users again submitted suggestions and inputs, but far fewer than in the first feasibility test. After improvements were made according to the suggestions from the second feasibility test, the third trial was carried out. The third trial left only suggestions on the appearance of the application design, to improve the adaptive learning application's acceptance by potential users. In addition, the instruments used in this application development research were found to be valid and reliable based on the validity and reliability tests. The research could therefore continue to the stage of testing the effectiveness of the developed media.
Furthermore, the effectiveness of the adaptive learning application was tested using a pre-test and post-test design with experimental and control groups (Kashani-Vahid et al., 2017). In the experimental group, the learning process was carried out using the adaptive learning application, while in the control group learning was carried out by giving modules and notes. The effectiveness test consisted of a pre-test and a post-test in the form of ten essay questions drawn from the subject matter. The questions contain the aspects of HOTS assessment from Facione, as cited in Seventika et al. (2018).
The analysis shows a highly significant difference in results: the experimental group using adaptive media generally improved its average post-test results more than the group using modules and notes. Analysis was also conducted on every aspect of critical thinking skills by comparing the gain scores of the experimental and control classes. The HOTS gain scores of the experimental group, with those of the control group in parentheses, were Interpretation 93.18 (65.90), Inference 76.74 (44.52), Explanation 96.33 (71.68), Analysis 97.87 (64.83), and Evaluation 63.01 (36.91). The overall results show that for the HOTS aspects the experimental group obtained better gain scores, in the high category, while the control class was in the medium category. The results are in accordance with the research of Nagao & Nagao (2019), Drissi & Amirat (2016), Bimba et al. (2017), and Tsortanidou et al. (2017), which found that HOTS can be empowered or improved by providing learning media that match the learning style and needs of students.

Conclusion
The main goal of application testing is to ensure the quality of the resulting learning media product. Many parameters influence the production of quality learning media applications, including the environment during testing, the selection of test cases and testing methods, and the approach used. Other aspects that contribute to obtaining optimal test results include justification of quality from many points of view, accuracy in determining the test method and model, variation in combining test techniques, basing the test on the application architecture, and the possibility of combining (hosting) tests with those of other applications. In this study, testing steps were applied carefully following strict testing rules. Based on the analysis of the test results, it can be concluded that adaptive learning media is able to empower students' HOTS with good assessment scores. Given these findings, it is expected that all educational application designers will prioritize testing techniques and procedures to ensure the production of quality application products.
