3. Teacher-Centric Descriptive Analysis#

Hide code cell source
import pandas as pd
import numpy as np
from bokeh.io import output_notebook, show
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource, HoverTool, LinearAxis, Range1d, Span, Label
from bokeh.layouts import gridplot, column
from bokeh.palettes import Spectral8

data = pd.read_csv('/Users/rachel/Library/CloudStorage/Dropbox-TLPSummerInterns/TLP Summer Intern Folder/Zhou/CODE - MPS_data_july_2023/mps_student_activity.tsv', sep='\t', on_bad_lines='skip')
data_detail = pd.read_csv('/Users/rachel/Library/CloudStorage/Dropbox-TLPSummerInterns/TLP Summer Intern Folder/Zhou/CODE - MPS_data_july_2023/mps_student_activity_detail.tsv', sep='\t',on_bad_lines='skip')
Hide code cell source
data_sorted = data_detail.sort_values(by='school_year')
years = data_sorted['school_year'].unique()

def recruitment_analysis(df: pd.DataFrame) -> pd.DataFrame:
    """function to calculate recruitment metrics based on truly new teachers."""
    years = df['school_year'].unique()
    all_teachers = set()
    new_teachers = {}
    total_teachers = {}
    
    for year in sorted(years):
        current_year_teachers = set(df[df['school_year'] == year]['teacher_user_id'].unique())
        new_this_year = current_year_teachers - all_teachers
        new_teachers[year] = len(new_this_year)
        all_teachers = all_teachers.union(current_year_teachers)
        total_teachers[year] = len(all_teachers)
    
    new_teachers_series = pd.Series(new_teachers)
    percentage_increase = new_teachers_series.pct_change() * 100
    
    recruitment_df = pd.DataFrame({
        'New Teachers': new_teachers_series,
        'Percentage Increase (%)': percentage_increase.round(1),
        'Cumulative Teachers': pd.Series(total_teachers)
    })
    
    return recruitment_df


def retention_analysis(df: pd.DataFrame) -> pd.DataFrame:
    """Function to calculate retention metrics."""
    retained_teachers_dict = {}
    retention_rate_dict = {}
    
    for i, year in enumerate(years):
        current_year_teachers = set(df[df['school_year'] == year]['teacher_user_id'].unique())
        if i == len(years) - 1:  # For the last year, there's no "next year" for it
            retained_teachers_dict[year] = np.nan
            retention_rate_dict[year] = np.nan
        else:
            next_year_teachers = set(df[df['school_year'] == years[i+1]]['teacher_user_id'].unique())
            retained_this_year = current_year_teachers.intersection(next_year_teachers)
            retained_teachers_dict[year] = len(retained_this_year)
            retention_rate_dict[year] = len(retained_this_year) / len(current_year_teachers) * 100
    
    retention_df = pd.DataFrame({
        'Retained Teachers': pd.Series(retained_teachers_dict),
        'Retention Rate (%)': pd.Series(retention_rate_dict).round(1)
    })  # Rounding to one decimal place
    
    return retention_df

def compute_tenure_distribution(df: pd.DataFrame) -> pd.Series:
    """Function to compute the distribution of maximum tenure for teachers."""
    teacher_years = df.groupby('teacher_user_id')['school_year'].unique()
    
    tenure_max = teacher_years.apply(len)
    
    distribution = tenure_max.value_counts().sort_index()
    
    return distribution

3.1 Teacher Recruitment and Retention Overview#

Yearly Recruitment Analysis

New Teachers: Represents the number of teachers that were genuinely new in each school year, i.e., teachers who had not appeared in the dataset before that specific year.

Percentage Increase (%): Shows the percentage change in the number of new teachers from the previous year to the current year.

Culmulative Teachers: Represents the total number of unique teachers that have been recorded up to and including the current school year.

recruitment_df = recruitment_analysis(data_sorted)
recruitment_df
New Teachers Percentage Increase (%) Cumulative Teachers
2014-15 2 NaN 2
2015-16 12 500.0 14
2016-17 25 108.3 39
2017-18 27 8.0 66
2018-19 26 -3.7 92
2019-20 29 11.5 121
2020-21 46 58.6 167
2021-22 72 56.5 239
2022-23 50 -30.6 289

Insights

  1. Consistent Increase in Recruitment : From 2015-16 to 2021-22, there’s a consistent increase in the recruitment of new teachers each year, with particularly notable jumps in 2016-17 (108.3% increase) and 2020-21 (58.6% increase).

  2. Slight Dip in 2022-23 : In 2022-23, there’s a decrease in new teacher recruitment by 30.6%, which might suggest a saturation point, a shift in school policies, or external factors influencing recruitment.

Yearly Retention Analysis

Retained Teachers: Indicates the number of teachers from the current year who continued into the next year.

Retention Rate (%): The percentage of teachers from the current year who continued into the next year.

retention_df = retention_analysis(data_sorted)
retention_df
Retained Teachers Retention Rate (%)
2014-15 1.0 50.0
2015-16 6.0 46.2
2016-17 13.0 40.6
2017-18 18.0 43.9
2018-19 23.0 45.1
2019-20 24.0 42.9
2020-21 43.0 56.6
2021-22 45.0 36.9
2022-23 NaN NaN

Insights

The significant drop in the retention rate in 2021-22 is a point of concern and might need targeted interventions or deeper analysis to understand the root causes.

Despite the fluctuations in the retention rate, the absolute number of retained teachers has seen a consistent increase until 2021-22. This indicates that the overall size of the teaching cohort is growing, even if the percentage of retained teachers varies.

3.2 Teaching Experience Analysis#

The visualization presents a bar plot showcasing the distribution of the maximum tenure (in years) of teachers on the platform. Each bar represents the number of teachers who have a specific tenure length, ranging from 1 to 7 years.

Hide code cell source
output_notebook()

# Computing the tenure distribution data again
tenure_distribution = compute_tenure_distribution(data_sorted)

# Create a new plot with a title and axis labels
p = figure(title="Distribution of Maximum Tenure for Teachers", 
           x_axis_label='Maximum Tenure (Years)', 
           y_axis_label='Number of Teachers', 
           tools="pan,box_zoom,reset,save",
           width=600, height=400)

# Data source for the plot
source = ColumnDataSource(data=dict(
    tenure=tenure_distribution.index,
    count=tenure_distribution.values
))

# Add a hover tool for better interactivity
hover = HoverTool()
hover.tooltips = [("Maximum Tenure", "@tenure"), ("Number of Teachers", "@count")]
p.add_tools(hover)

# Add a bar renderer with legend
p.vbar(x='tenure', top='count', width=0.5, source=source, legend_label="Teachers", color=Spectral8[0])

# Show the plot
show(p)
Loading BokehJS ...

Insights

  1. Predominance of Short Tenures: The majority of teachers (170) have a tenure of just one school year on the platform. This could indicate a high turnover rate or a large influx of new teachers.

  2. Declining Long Tenures: As the tenure length increases, the number of teachers with that specific tenure length decreases. Only one teacher has the longest association of 7 years. This trend signifies that fewer teachers have sustained long-term associations with the platform.

  3. Room for Improvement: The steep drop from 1-year to longer tenures suggests potential areas for improvement in teacher retention strategies. A deeper dive into the reasons for shorter tenures could provide insights into areas of enhancement.

3.3 Platform Usage and Course Exploration#

Comprehensive Course and Script Coverage#

The visualization provides insights into the distribution of scripts covered by teachers for three distinct courses: CS Fundamentals (K-5), CS Discoveries (6-10), CS Principles (AP), across different school years.Each plot corresponds to a course, with the x-axis representing the school year and the y-axis denoting the number of scripts covered by the teachers.

In each plot:

Circles represent the count of teachers that covered a specific number of scripts in a particular year. The size of a circle indicates the number of teachers; a larger circle means more teachers covered that specific number of scripts. Colors help distinguish between different school years, offering a visual cue to yearly trends in script coverage.

Hide code cell source
def generate_course_coverage_plots(data: pd.DataFrame) -> None:
    """Generate a grid of plots showcasing script coverage by school year for each course."""
    
    course_names = ['csf', 'csd', 'csp']
    course_full_names = {
        'csf': 'CS Fundamentals (K-5)',
        'csp': 'CS Principles (AP)',
        'csd': 'CS Discoveries (6-10)',
    }

    plots = []

    global_max_counts = data.groupby(['school_year', 'teacher_user_id', 'course_name']).agg({'script_id': 'nunique'}).reset_index().groupby(['school_year', 'script_id']).size().max()

    # Get unique school years from the whole dataframe and create a color map based on all school years
    all_school_years = sorted(data['school_year'].unique())
    color_map = {year: Spectral8[i % len(Spectral8)] for i, year in enumerate(all_school_years)}

    # Create a separate plot for each course
    for course in course_names:

        df_course = data[data['course_name'] == course]
        df_course = df_course.dropna(subset=['script_id'])
        df_grouped = df_course.groupby(['school_year','teacher_user_id']).agg({'script_id': 'nunique'}).reset_index()
        df_grouped = df_grouped.sort_values(by='school_year')
        counts = df_grouped.groupby(['school_year','script_id']).size().reset_index(name='counts')
        counts['school_year'] = pd.Categorical(counts['school_year'], categories=sorted(counts['school_year'].unique()), ordered=True)

        # Map colors to 'school_year' based on the color map
        counts['color'] = counts['school_year'].map(color_map)

        # Normalize 'counts' for size of circles in plot based on the global maximum
        counts['size'] = counts['counts'] / global_max_counts * 30
        source = ColumnDataSource(counts)

        p = figure(x_range=sorted(counts['school_year'].unique()), 
                y_range=(counts['script_id'].min(), counts['script_id'].max()), 
                width=450, height=500,
                title=f'Script coverage by school year for {course_full_names[course]}',
                x_axis_label='School Year', y_axis_label='Number of scripts covered by the teacher',
                tools='pan,wheel_zoom,xbox_select,reset')
        hover = HoverTool(tooltips=[
            ("school_year", "@school_year"),
            ("number of scripts covered", "@script_id"),
            ("teacher counts", "@counts")
            ])
        p.add_tools(hover)

        # Add a circle glyph with 'school_year' and 'script_id' as coordinates, size as size and color mapped to 'counts'
        p.circle(x='school_year', y='script_id', size='size', 
                color='color',
                source=source)

        plots.append(p)

    grid = gridplot([plots])
    show(grid)
    
generate_course_coverage_plots(data)

Insights

  1. Consistent Popularity of CS Fundamentals: The ‘CS Fundamentals (K-5)’ course seems to enjoy consistent popularity among teachers, with many covering a broad range of scripts across different years. This could indicate that it’s a foundational course widely adopted in early education settings.

  2. Variations in Advanced Courses: Both ‘CS Principles (AP)’ and ‘CS Discoveries (6-10)’ display significant adoption, but there are observable variations year over year. Some years show a higher number of teachers covering more scripts, indicating possible fluctuations in the curriculum’s relevance or external factors influencing its adoption.

  3. Depth of Course Coverage: The y-axis placement of circles, especially those towards the top, indicates that many teachers delve deep into the courses, covering a majority of the available scripts. This suggests a comprehensive engagement with the platform’s content.

In summary, while the platform’s courses are widely adopted and explored in depth by many teachers, there are variations in how different courses are embraced year over year. These insights can guide platform administrators or educators in refining course content, understanding teacher preferences, and making informed decisions about curriculum development or platform enhancements.