Using Data Science for Improving an Education System: The Case for Indonesia

How data science can have a significant role in realizing Indonesia's Education Roadmap for 2045

Data for Change

My elementary school building in Indonesia (source: Author)

Background

Learning is meant to be fun, so school should be a place of joy. This is how I felt during my elementary school years in the US. There was an exciting feeling in every session, from the thrill of doing class projects to the humor of seeing my teacher dress up as her scientist alter ego. I actively participated in these learning sessions with my classmates, and my teacher was the facilitator who ensured all voices were heard. When I moved to Indonesia, however, I saw a striking difference: learning was mostly about getting good grades. Class was basically memorizing textbooks and taking math quizzes. The main goal was to get high exam scores and a top 3 rank in class. The teacher was regarded as a parent rather than a facilitator. This transformed my perception of school from a place of joy to an obligation. However, I am now excited to see that change is about to come.

In 2019, President Joko Widodo appointed Nadiem Makarim, co-founder of the tech unicorn Go-Jek, as Minister of Education and Culture to transform the system. A few months later, his team released a roadmap for 2020–2045, which I found really exciting. The roadmap aims to better prepare youth for global challenges through an initiative called Merdeka Belajar (Learning Freedom), which is a set of systemic changes shown in the table below.

The direction from the Merdeka Belajar initiative (source: Author’s analysis)

This initiative will be piloted in ~600 schools split into two groups, 100 developed and 500 developing schools, so the approach can be differentiated according to readiness. The pilot is intended to expand to 10,000 schools by the end of 2025 and to 30,000 over the following ten years. Students are expected to develop their problem-solving, cognitive and social skills and then embody the following characteristics:

  • Noble
  • Independent
  • Creative
  • Collaborative
  • Critical Thinker
  • Global-Minded

While this seems like a great direction, my main question is how to ensure proper execution. This is where data science comes into play. The Ministry of Education is already developing digital platforms to promote stakeholder collaboration and increase learning effectiveness. These platforms will also generate vast amounts of data that can provide insight into the initiative’s implementation. To start the data science project, we first need to define the project charter, which is discussed in the next section.

Defining the Project Charter

Before we go into technical details, we must first determine the objectives, method of analysis and stakeholders, which together form the project charter. For Indonesia’s education roadmap, our objectives would be to:

  1. Test whether Merdeka Belajar is effective in developing students’ problem-solving, cognitive and social skills.
  2. Determine whether the direction is properly executed.
  3. Identify the main gap in execution.

The first objective requires a north star metric that best represents effectiveness. Following the roadmap’s minimum assessment component, the best metric would be students’ aptitude growth in literacy and numeracy, measured by directly comparing results before and after the Merdeka Belajar initiative. Results for pilot schools can be considered good if they achieve the target, exceed a fair benchmark, or differ significantly from those of non-pilot schools.
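
One way to check for a significant difference between pilot and non-pilot schools is a two-sample test on aptitude growth. The snippet below is a minimal sketch using Welch's t-statistic and only the Python standard library. The growth figures and variable names are made up for illustration and do not come from any real assessment.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t-statistic and its approximate degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical aptitude growth per student (post-test score minus pre-test score).
pilot_growth = [12, 15, 9, 14, 11, 13, 16, 10]
non_pilot_growth = [8, 7, 10, 6, 9, 11, 7, 8]

t_stat, dof = welch_t(pilot_growth, non_pilot_growth)
print(f"mean growth: pilot={mean(pilot_growth):.2f}, "
      f"non-pilot={mean(non_pilot_growth):.2f}")
print(f"Welch t={t_stat:.2f}, approx. degrees of freedom={dof:.1f}")
```

A large absolute t-statistic suggests the gap is unlikely to be noise, but a real analysis would also need a p-value, a properly sized sample and a check that the two groups are comparable.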

Another north star metric worth considering is an assessment of students’ cognitive, social awareness and problem-solving skills in real-life situations. The method can focus on a particular subject, using an online essay that captures these aspects. A good subject to use could be history. When I was in school, history was mainly about memorizing key dates and events. Unless the teacher encourages it, which my teacher did, students do not need to apply critical thinking and problem-solving skills in this subject. In this case, an online essay can be given to a control (non-pilot) group and a treatment (pilot) group. The question would be: "Learning from Indonesia’s colonization history, what kind of policies and institutions does the government need to protect itself from being exploited by foreign nations?" An acceptable answer would give real cases from the colonization period, identify the root cause and propose a set of policies to prevent similar events from recurring. An advanced machine learning algorithm that classifies keywords can be used for pre-assessment, followed by final assessment by experts. The essay assessment results will help determine the effectiveness of Merdeka Belajar.
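
To make the pre-assessment idea concrete, here is a deliberately simple sketch. It is a rule-based keyword check, not the advanced machine learning classifier described above, and the rubric, keywords and sample essay are all hypothetical. It only shows how an essay could be screened against the three criteria before experts do the final grading.

```python
import re

# Hypothetical rubric: each criterion maps to a few illustrative keywords.
RUBRIC = {
    "historical_case": {"voc", "cultuurstelsel", "colonial", "monopoly"},
    "root_cause": {"because", "cause", "caused", "due"},
    "policy_proposal": {"policy", "regulation", "institution", "law"},
}


def pre_assess(essay: str) -> dict:
    """Flag which rubric criteria an essay appears to touch on."""
    words = set(re.findall(r"[a-z]+", essay.lower()))
    return {criterion: bool(words & keywords)
            for criterion, keywords in RUBRIC.items()}


essay = ("The VOC trade monopoly weakened local rulers because they depended "
         "on a single buyer. A competition law and a strong trade institution "
         "would help.")
result = pre_assess(essay)
print(result)

# Essays that miss a criterion could be prioritized for expert review.
needs_expert_review = not all(result.values())
print("needs expert review first:", needs_expert_review)
```

Keyword matching cannot judge the quality of an argument, which is why the expert assessment remains the deciding step.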

For the next objectives, we would need a deep-dive analysis of the students from both groups. Merdeka Belajar can be used as the framework to determine which categories separate students based on the north star metric. By extracting the right data, we can see which areas have significant gaps that require immediate action.

The analysis can be translated into a decision tree, as shown below, to determine action items for the stakeholders: teachers, school principals, local authorities and the Ministry of Education.

Decision Tree for Indonesia’s stakeholders in education (source: Author’s analysis)

Applying this decision tree requires a complete DataFrame, which is discussed in the next section.

Sourcing the Data

Before going into analysis, we need to determine how the data can be extracted, transformed and loaded into a proper database. The Ministry of Education currently has a number of online platforms, such as Rumah Belajar (Study House) for virtual learning, Marketplace BOS as an e-budgeting and commerce platform for school principals, and Guru Berbagi (Teacher Sharing) as a workplace platform for teachers. These platforms will be useful for obtaining data on each school’s internal ecosystem, teacher performance, curriculum, pedagogy and grading system. There are also platforms for local authorities to update schools’ geographic information, which helps in understanding each school’s environment. Parents should also have a platform so that their level of involvement can be tracked. All these features will be needed to extract the key components that form the DataFrame. Note that the same data should be collected for non-pilot schools as well, so the two groups can be compared.

Main data outputs from key platform features (Source: Author’s analysis)

Creating the DataFrame

The most important element of the DataFrame is a set of unique IDs: school ID, teacher ID and student ID. These will enable us to relate the various datasets to one another and form a holistic source of data. The next step is transforming the data extracted from the platforms into defined, quantifiable metrics. I have come up with some, as stated below:

Key columns and their logic for the DataFrame used in our models (Source: Author’s analysis)

After these datasets are extracted, transformed and loaded into a data warehouse, we can create the DataFrame for in-depth analysis, as shown in the figure below. Note that the data should be aggregated to the same time period, which in this case would be semiannual.

Assembling the main DataFrame (Source: Author’s analysis)
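
As a rough illustration of how the unique IDs tie the datasets together, the sketch below joins three small tables with pandas. All table names, column names, period labels and values are hypothetical placeholders, since the actual platform outputs are not specified here.

```python
import pandas as pd

# Hypothetical extracts; every column name and value is illustrative only.
schools = pd.DataFrame({
    "school_id": ["S1", "S2"],
    "period": ["2021-H1", "2021-H1"],
    "is_pilot": [True, False],
})
teachers = pd.DataFrame({
    "teacher_id": ["T1", "T2"],
    "school_id": ["S1", "S2"],
    "period": ["2021-H1", "2021-H1"],
    "pedagogy_score": [0.8, 0.5],
})
students = pd.DataFrame({
    "student_id": ["A", "B", "C"],
    "school_id": ["S1", "S1", "S2"],
    "teacher_id": ["T1", "T1", "T2"],
    "period": ["2021-H1"] * 3,
    "aptitude_growth": [12.0, 9.0, 7.0],
})

# One row per student per semiannual period, enriched with teacher and
# school attributes. `validate` raises if an ID is unexpectedly duplicated.
main_df = (
    students
    .merge(teachers, on=["teacher_id", "school_id", "period"],
           how="left", validate="many_to_one")
    .merge(schools, on=["school_id", "period"],
           how="left", validate="many_to_one")
)
print(main_df)
```

Including the period in every join key keeps all tables at the same semiannual grain, so a teacher's score from one semester is never attached to a student's result from another.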

Applying Data Science Models

With the final DataFrame, we can generate key insights using simple analytics. We can create simple graphs or highlight tables to compare the north star results to target and/or non-pilot schools and identify the main gaps for each category. This data visualization will be helpful in giving an overview analysis for stakeholders. For in-depth analysis, we can use data science to identify key factors behind students’ aptitude growth or online essay scores.
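
A highlight table of this kind can be as simple as ranking categories by how far they fall short of a target. The sketch below assumes made-up average scores and a made-up target of 70 for five illustrative categories. None of these figures come from the roadmap.

```python
# Hypothetical average scores (0-100) for pilot schools, by category.
pilot_scores = {
    "ecosystem": 72,
    "teacher_performance": 65,
    "pedagogy": 58,
    "curriculum": 70,
    "grading_system": 61,
}
# Hypothetical target, assumed to be the same for every category.
targets = {category: 70 for category in pilot_scores}

# Shortfall = target minus score; the largest shortfall is the main gap.
ranked = sorted(
    ((category, targets[category] - score)
     for category, score in pilot_scores.items()),
    key=lambda item: item[1],
    reverse=True,
)
for category, shortfall in ranked:
    status = "below target" if shortfall > 0 else "on or above target"
    print(f"{category:<20} shortfall={shortfall:>3}  {status}")
```

The same ranking works against non-pilot averages instead of targets by swapping the `targets` dictionary for the non-pilot scores.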

The first data science model that comes to mind is k-means clustering. This algorithm identifies k centroids and then clusters students into k groups based on their similarities. Once the results are ready, we can see which clusters are the most dominant and describe each one based on the metrics that stand out the most.

Example of k-means clustering (Source: scikit-learn.org)

In this particular case, we can start by clustering students with the k-means algorithm to define the common characteristics among pilot school students based on the 6 dimensions. The results will show not only the percentage of students that meet the north star metrics, but also their similarities. For instance, the data might show that Cluster 1 has high scores not only on the north star metric but also on ecosystem and pedagogy, while Cluster 2 scores low on all three. The next step would be identifying the dominant cluster for each school and determining the type of support needed, following the decision tree.
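
The clustering step can be sketched in a few lines. The version below is a plain-Python implementation of the standard k-means loop (Lloyd's algorithm) so that it runs without extra libraries. In practice a library such as scikit-learn's `KMeans` would be the usual choice. The student scores, the three dimensions shown and the school assignments are hypothetical.

```python
import math
from collections import Counter


def kmeans(points, k, iterations=20):
    """Minimal Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical students: (north star, ecosystem, pedagogy), each scaled 0-1.
students = [
    (0.90, 0.80, 0.90), (0.20, 0.30, 0.20), (0.80, 0.90, 0.80),
    (0.30, 0.20, 0.30), (0.85, 0.85, 0.90), (0.25, 0.30, 0.20),
]
school_of = ["S1", "S2", "S1", "S2", "S1", "S2"]  # hypothetical school IDs

labels, centroids = kmeans(students, k=2)

# Dominant cluster per school: the cluster most of its students fall into.
dominant = {
    school: Counter(lab for lab, s in zip(labels, school_of)
                    if s == school).most_common(1)[0][0]
    for school in sorted(set(school_of))
}
print("labels:", labels)
print("dominant cluster per school:", dominant)
```

Seeding with the first k points keeps the example deterministic. Real data would call for a better initialization, scaled features and a check on how many clusters the data actually supports.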

Data science can also be used to predict students’ north star metric achievement after improvement actions are applied, using a linear regression model. This model can be designed to estimate the causal impact of the set of actions, that is, the north star uplift, to determine their effectiveness. It works by comparing the actual outcomes against a prediction of what would have happened without the interventions. This prediction is based on a control group, which can be students from non-pilot schools that share similar characteristics.

Example of linear regression (Source: stackoverflow.com)
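
The uplift calculation can be sketched as follows: fit a line on the control group, use it to predict what pilot students would have scored without the interventions, and subtract that from their actual scores. This is a minimal single-predictor example with invented numbers. It assumes the control and pilot students are comparable, which would need to be verified on real data.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (slope, intercept)."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return slope, mean_y - slope * mean_x


# Hypothetical control group (non-pilot): baseline score and later score.
control_before = [50, 60, 70, 80]
control_after = [54, 63, 72, 81]
slope, intercept = fit_line(control_before, control_after)

# Hypothetical pilot students, measured after the improvement actions.
pilot_before = [55, 65, 75]
pilot_after = [64, 72, 82]

# Counterfactual: what the control-group trend predicts for pilot students.
predicted = [slope * score + intercept for score in pilot_before]
uplift = [actual - expected for actual, expected in zip(pilot_after, predicted)]
avg_uplift = sum(uplift) / len(uplift)

print(f"fitted line: after = {slope:.2f} * before + {intercept:.2f}")
print(f"average north star uplift: {avg_uplift:.2f}")
```

With more predictors (for example teacher or ecosystem metrics), the same idea extends to multiple regression, where the uplift is still the gap between actual and predicted outcomes.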

Closing Remarks

Current technology enables us to collect and process vast amounts of data and put it to great use, including improving an education system. With multiple digital platforms in place, I am excited to see how data science can help Indonesia’s Ministry of Education ensure seamless execution of the 2045 roadmap. The models and techniques discussed in this article are just some examples; there are many more ways to explore and analyze progress in education development.

However, the main challenge is actually interpreting and communicating the data insights so that the right decisions can be made. There will always be cases where stakeholders deny the findings because of strong beliefs in current actions. Some may also focus on achieving metrics by taking shortcuts, such as holding intensive lessons before aptitude tests or reporting lessons that do not reflect reality. Another challenge is the potential for public protest over having pilot and non-pilot schools, as many stakeholders may have negative views about being the subject of an experiment, or about being left out of it.

Nevertheless, with a strong and brilliant leader like Nadiem Makarim, I strongly believe that this roadmap can be achieved and that data science will play a key part in it. From personal experience, I sincerely hope that Indonesia’s next generation, in every province, city, town and village, will see school as a fun place to learn and never a mere obligation.

