3. Teacher-Centric Descriptive Analysis#
Show code cell source
import pandas as pd
import numpy as np
from bokeh.io import output_notebook, show
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, HoverTool, LinearAxis, Range1d, Span, Label
from bokeh.layouts import gridplot, column
from bokeh.palettes import Spectral8
data = pd.read_csv('/Users/rachel/Library/CloudStorage/Dropbox-TLPSummerInterns/TLP Summer Intern Folder/Zhou/CODE - MPS_data_july_2023/mps_student_activity.tsv', sep='\t', on_bad_lines='skip')
data_detail = pd.read_csv('/Users/rachel/Library/CloudStorage/Dropbox-TLPSummerInterns/TLP Summer Intern Folder/Zhou/CODE - MPS_data_july_2023/mps_student_activity_detail.tsv', sep='\t',on_bad_lines='skip')
Show code cell source
data_sorted = data_detail.sort_values(by='school_year')
years = data_sorted['school_year'].unique()
def recruitment_analysis(df: pd.DataFrame) -> pd.DataFrame:
"""function to calculate recruitment metrics based on truly new teachers."""
years = df['school_year'].unique()
all_teachers = set()
new_teachers = {}
total_teachers = {}
for year in sorted(years):
current_year_teachers = set(df[df['school_year'] == year]['teacher_user_id'].unique())
new_this_year = current_year_teachers - all_teachers
new_teachers[year] = len(new_this_year)
all_teachers = all_teachers.union(current_year_teachers)
total_teachers[year] = len(all_teachers)
new_teachers_series = pd.Series(new_teachers)
percentage_increase = new_teachers_series.pct_change() * 100
recruitment_df = pd.DataFrame({
'New Teachers': new_teachers_series,
'Percentage Increase (%)': percentage_increase.round(1),
'Cumulative Teachers': pd.Series(total_teachers)
})
return recruitment_df
def retention_analysis(df: pd.DataFrame) -> pd.DataFrame:
"""Function to calculate retention metrics."""
retained_teachers_dict = {}
retention_rate_dict = {}
for i, year in enumerate(years):
current_year_teachers = set(df[df['school_year'] == year]['teacher_user_id'].unique())
if i == len(years) - 1: # For the last year, there's no "next year" for it
retained_teachers_dict[year] = np.nan
retention_rate_dict[year] = np.nan
else:
next_year_teachers = set(df[df['school_year'] == years[i+1]]['teacher_user_id'].unique())
retained_this_year = current_year_teachers.intersection(next_year_teachers)
retained_teachers_dict[year] = len(retained_this_year)
retention_rate_dict[year] = len(retained_this_year) / len(current_year_teachers) * 100
retention_df = pd.DataFrame({
'Retained Teachers': pd.Series(retained_teachers_dict),
'Retention Rate (%)': pd.Series(retention_rate_dict).round(1)
}) # Rounding to one decimal place
return retention_df
def compute_tenure_distribution(df: pd.DataFrame) -> pd.Series:
"""Function to compute the distribution of maximum tenure for teachers."""
teacher_years = df.groupby('teacher_user_id')['school_year'].unique()
tenure_max = teacher_years.apply(len)
distribution = tenure_max.value_counts().sort_index()
return distribution
3.1 Teacher Recruitment and Retention Overview#
Yearly Recruitment Analysis
New Teachers: Represents the number of teachers that were genuinely new in each school year, i.e., teachers who had not appeared in the dataset before that specific year.
Percentage Increase (%): Shows the percentage change in the number of new teachers from the previous year to the current year.
Culmulative Teachers: Represents the total number of unique teachers that have been recorded up to and including the current school year.
recruitment_df = recruitment_analysis(data_sorted)
recruitment_df
| New Teachers | Percentage Increase (%) | Cumulative Teachers | |
|---|---|---|---|
| 2014-15 | 2 | NaN | 2 |
| 2015-16 | 12 | 500.0 | 14 |
| 2016-17 | 25 | 108.3 | 39 |
| 2017-18 | 27 | 8.0 | 66 |
| 2018-19 | 26 | -3.7 | 92 |
| 2019-20 | 29 | 11.5 | 121 |
| 2020-21 | 46 | 58.6 | 167 |
| 2021-22 | 72 | 56.5 | 239 |
| 2022-23 | 50 | -30.6 | 289 |
Insights
Consistent Increase in Recruitment : From 2015-16 to 2021-22, there’s a consistent increase in the recruitment of new teachers each year, with particularly notable jumps in 2016-17 (108.3% increase) and 2020-21 (58.6% increase).
Slight Dip in 2022-23 : In 2022-23, there’s a decrease in new teacher recruitment by 30.6%, which might suggest a saturation point, a shift in school policies, or external factors influencing recruitment.
Yearly Retention Analysis
Retained Teachers: Indicates the number of teachers from the current year who continued into the next year.
Retention Rate (%): The percentage of teachers from the current year who continued into the next year.
retention_df = retention_analysis(data_sorted)
retention_df
| Retained Teachers | Retention Rate (%) | |
|---|---|---|
| 2014-15 | 1.0 | 50.0 |
| 2015-16 | 6.0 | 46.2 |
| 2016-17 | 13.0 | 40.6 |
| 2017-18 | 18.0 | 43.9 |
| 2018-19 | 23.0 | 45.1 |
| 2019-20 | 24.0 | 42.9 |
| 2020-21 | 43.0 | 56.6 |
| 2021-22 | 45.0 | 36.9 |
| 2022-23 | NaN | NaN |
Insights
The significant drop in the retention rate in 2021-22 is a point of concern and might need targeted interventions or deeper analysis to understand the root causes.
Despite the fluctuations in the retention rate, the absolute number of retained teachers has seen a consistent increase until 2021-22. This indicates that the overall size of the teaching cohort is growing, even if the percentage of retained teachers varies.
3.2 Teaching Experience Analysis#
The visualization presents a bar plot showcasing the distribution of the maximum tenure (in years) of teachers on the platform. Each bar represents the number of teachers who have a specific tenure length, ranging from 1 to 7 years.
Show code cell source
output_notebook()
# Computing the tenure distribution data again
tenure_distribution = compute_tenure_distribution(data_sorted)
# Create a new plot with a title and axis labels
p = figure(title="Distribution of Maximum Tenure for Teachers",
x_axis_label='Maximum Tenure (Years)',
y_axis_label='Number of Teachers',
tools="pan,box_zoom,reset,save",
width=600, height=400)
# Data source for the plot
source = ColumnDataSource(data=dict(
tenure=tenure_distribution.index,
count=tenure_distribution.values
))
# Add a hover tool for better interactivity
hover = HoverTool()
hover.tooltips = [("Maximum Tenure", "@tenure"), ("Number of Teachers", "@count")]
p.add_tools(hover)
# Add a bar renderer with legend
p.vbar(x='tenure', top='count', width=0.5, source=source, legend_label="Teachers", color=Spectral8[0])
# Show the plot
show(p)
Insights
Predominance of Short Tenures: The majority of teachers (170) have a tenure of just one school year on the platform. This could indicate a high turnover rate or a large influx of new teachers.
Declining Long Tenures: As the tenure length increases, the number of teachers with that specific tenure length decreases. Only one teacher has the longest association of 7 years. This trend signifies that fewer teachers have sustained long-term associations with the platform.
Room for Improvement: The steep drop from 1-year to longer tenures suggests potential areas for improvement in teacher retention strategies. A deeper dive into the reasons for shorter tenures could provide insights into areas of enhancement.
3.3 Platform Usage and Course Exploration#
Educator Trends and Platform Adoption#
The solid line representing the “Number of Teachers” provides insight into the growth of the platform over the years. A steadily rising line would suggest a growing user base of teachers on the platform.
The dotted line, representing the average number of students per teacher, provides insight into the teacher-student ratio on the platform.
The red dashed line indicates the onset of the COVID-19 pandemic.
Show code cell source
# Grouping data and calculating metrics
teacher_yearly_students = data_sorted.groupby(['school_year', 'teacher_user_id']).agg(
unique_student_ids=('student_user_id', 'nunique')
).reset_index()
updated_yearly_summary = teacher_yearly_students.groupby('school_year').agg(
num_teachers=('teacher_user_id', 'nunique'),
avg_students_per_teacher=('unique_student_ids', 'mean')
).reset_index()
covid_onset_year = "2019-20"
covid_onset_position = updated_yearly_summary['school_year'].tolist().index(covid_onset_year) + 0.5
p = figure(title="Yearly Trends: Number of Teachers vs. Average Students per Teacher",
x_axis_label='School Year', y_axis_label='Number of Teachers',
width=650, height=450, tools="", toolbar_location=None,
x_range=updated_yearly_summary['school_year'].tolist())
# Add the primary line and markers (Number of Teachers)
p.line(updated_yearly_summary['school_year'], updated_yearly_summary['num_teachers'],
legend_label="Number of Teachers", line_width=2, line_color=Spectral8[0], line_dash="solid")
p.circle(updated_yearly_summary['school_year'], updated_yearly_summary['num_teachers'], color=Spectral8[0], size=8)
# Add the secondary line and markers (Average Students per Teacher) with secondary y-axis
p.extra_y_ranges = {"avg_students": Range1d(start=0, end=updated_yearly_summary['avg_students_per_teacher'].max() + 10)}
p.add_layout(LinearAxis(y_range_name="avg_students", axis_label="Average Students Associated with Each Teacher"), 'right')
p.line(updated_yearly_summary['school_year'], updated_yearly_summary['avg_students_per_teacher'],
legend_label="Avg. Students per Teacher", line_width=2, line_color=Spectral8[1], line_dash="dotted", y_range_name="avg_students")
p.square(updated_yearly_summary['school_year'], updated_yearly_summary['avg_students_per_teacher'], color=Spectral8[1], size=8, y_range_name="avg_students")
# Adding a vertical line to indicate the onset of the COVID-19 pandemic at the corrected position
vline = Span(location=covid_onset_position, dimension='height', line_color='red', line_width=1.5, line_dash='dashed')
p.renderers.extend([vline])
label = Label(x=covid_onset_position, y=0, y_units='screen', text="Onset of COVID-19", text_font_size="10pt", text_baseline="bottom", text_align="left", text_color="red")
p.add_layout(label)
# Other customizations
p.xaxis.major_label_orientation = "vertical"
p.legend.location = "top_left"
p.grid.grid_line_alpha = 0.3
# Show the plot
show(p)
Insights
By comparing trends before and after the onset of COVID-19, there is an upward trend in teacher invovled in this platform but decrease in the average number of students per teacher.
There are several potential reasons for this scenario:
Increasing Teacher with Smaller Classes: The rise in the number of teachers but decrease in the average students per teacher could suggest that more teachers are joining the platform, but each teacher is dealing with fewer students. This could be due to the integration of the platform in smaller or specialized classrooms.
Spread of Usage: Previously, a smaller number of teachers might have been using the platform extensively with a larger number of students. As more teachers have adopted the platform, usage has spread out, reducing the average number of students per teacher even as the total number of students remains constant or grows.
Platform Usage Pattern: It could also suggest a shift in how the platform is used. Instead of being used as the primary teaching tool for large classes, teachers might be using it for smaller, specialized groups, supplementary lessons, or remedial classes.
Comprehensive Course and Script Coverage#
The visualization provides insights into the distribution of scripts covered by teachers for three distinct courses: CS Fundamentals (K-5), CS Discoveries (6-10), CS Principles (AP), across different school years.Each plot corresponds to a course, with the x-axis representing the school year and the y-axis denoting the number of scripts covered by the teachers.
In each plot:
Circles represent the count of teachers that covered a specific number of scripts in a particular year. The size of a circle indicates the number of teachers; a larger circle means more teachers covered that specific number of scripts. Colors help distinguish between different school years, offering a visual cue to yearly trends in script coverage.
Show code cell source
def generate_course_coverage_plots(data: pd.DataFrame) -> None:
"""Generate a grid of plots showcasing script coverage by school year for each course."""
course_names = ['csf', 'csd', 'csp']
course_full_names = {
'csf': 'CS Fundamentals (K-5)',
'csp': 'CS Principles (AP)',
'csd': 'CS Discoveries (6-10)',
}
plots = []
global_max_counts = data.groupby(['school_year', 'teacher_user_id', 'course_name']).agg({'script_id': 'nunique'}).reset_index().groupby(['school_year', 'script_id']).size().max()
# Get unique school years from the whole dataframe and create a color map based on all school years
all_school_years = sorted(data['school_year'].unique())
color_map = {year: Spectral8[i % len(Spectral8)] for i, year in enumerate(all_school_years)}
# Create a separate plot for each course
for course in course_names:
df_course = data[data['course_name'] == course]
df_course = df_course.dropna(subset=['script_id'])
df_grouped = df_course.groupby(['school_year','teacher_user_id']).agg({'script_id': 'nunique'}).reset_index()
df_grouped = df_grouped.sort_values(by='school_year')
counts = df_grouped.groupby(['school_year','script_id']).size().reset_index(name='counts')
counts['school_year'] = pd.Categorical(counts['school_year'], categories=sorted(counts['school_year'].unique()), ordered=True)
# Map colors to 'school_year' based on the color map
counts['color'] = counts['school_year'].map(color_map)
# Normalize 'counts' for size of circles in plot based on the global maximum
counts['size'] = counts['counts'] / global_max_counts * 30
source = ColumnDataSource(counts)
p = figure(x_range=sorted(counts['school_year'].unique()),
y_range=(counts['script_id'].min(), counts['script_id'].max()),
width=450, height=500,
title=f'Script coverage by school year for {course_full_names[course]}',
x_axis_label='School Year', y_axis_label='Number of scripts covered by the teacher',
tools='pan,wheel_zoom,xbox_select,reset')
hover = HoverTool(tooltips=[
("school_year", "@school_year"),
("number of scripts covered", "@script_id"),
("teacher counts", "@counts")
])
p.add_tools(hover)
# Add a circle glyph with 'school_year' and 'script_id' as coordinates, size as size and color mapped to 'counts'
p.circle(x='school_year', y='script_id', size='size',
color='color',
source=source)
plots.append(p)
grid = gridplot([plots])
show(grid)
generate_course_coverage_plots(data)
Insights
Consistent Popularity of CS Fundamentals: The ‘CS Fundamentals (K-5)’ course seems to enjoy consistent popularity among teachers, with many covering a broad range of scripts across different years. This could indicate that it’s a foundational course widely adopted in early education settings.
Variations in Advanced Courses: Both ‘CS Principles (AP)’ and ‘CS Discoveries (6-10)’ display significant adoption, but there are observable variations year over year. Some years show a higher number of teachers covering more scripts, indicating possible fluctuations in the curriculum’s relevance or external factors influencing its adoption.
Depth of Course Coverage: The y-axis placement of circles, especially those towards the top, indicates that many teachers delve deep into the courses, covering a majority of the available scripts. This suggests a comprehensive engagement with the platform’s content.
In summary, while the platform’s courses are widely adopted and explored in depth by many teachers, there are variations in how different courses are embraced year over year. These insights can guide platform administrators or educators in refining course content, understanding teacher preferences, and making informed decisions about curriculum development or platform enhancements.