3. Teacher-Centric Descriptive Analysis

3. Teacher-Centric Descriptive Analysis#

Show code cell source Hide code cell source

data_sorted = data_detail.sort_values(by='school_year')
years = data_sorted['school_year'].unique()

def recruitment_analysis(df: pd.DataFrame) -> pd.DataFrame:
    """function to calculate recruitment metrics based on truly new teachers."""
    years = df['school_year'].unique()
    all_teachers = set()
    new_teachers = {}
    total_teachers = {}
    
    for year in sorted(years):
        current_year_teachers = set(df[df['school_year'] == year]['teacher_user_id'].unique())
        new_this_year = current_year_teachers - all_teachers
        new_teachers[year] = len(new_this_year)
        all_teachers = all_teachers.union(current_year_teachers)
        total_teachers[year] = len(all_teachers)
    
    new_teachers_series = pd.Series(new_teachers)
    percentage_increase = new_teachers_series.pct_change() * 100
    
    recruitment_df = pd.DataFrame({
        'New Teachers': new_teachers_series,
        'Percentage Increase (%)': percentage_increase.round(1),
        'Cumulative Teachers': pd.Series(total_teachers)
    })
    
    return recruitment_df


def retention_analysis(df: pd.DataFrame) -> pd.DataFrame:
    """Function to calculate retention metrics."""
    retained_teachers_dict = {}
    retention_rate_dict = {}
    
    for i, year in enumerate(years):
        current_year_teachers = set(df[df['school_year'] == year]['teacher_user_id'].unique())
        if i == len(years) - 1:  # For the last year, there's no "next year" for it
            retained_teachers_dict[year] = np.nan
            retention_rate_dict[year] = np.nan
        else:
            next_year_teachers = set(df[df['school_year'] == years[i+1]]['teacher_user_id'].unique())
            retained_this_year = current_year_teachers.intersection(next_year_teachers)
            retained_teachers_dict[year] = len(retained_this_year)
            retention_rate_dict[year] = len(retained_this_year) / len(current_year_teachers) * 100
    
    retention_df = pd.DataFrame({
        'Retained Teachers': pd.Series(retained_teachers_dict),
        'Retention Rate (%)': pd.Series(retention_rate_dict).round(1)
    })  # Rounding to one decimal place
    
    return retention_df

def compute_tenure_distribution(df: pd.DataFrame) -> pd.Series:
    """Function to compute the distribution of maximum tenure for teachers."""
    teacher_years = df.groupby('teacher_user_id')['school_year'].unique()
    
    tenure_max = teacher_years.apply(len)
    
    distribution = tenure_max.value_counts().sort_index()
    
    return distribution

3.1 Teacher Recruitment and Retention Overview#

Yearly Recruitment Analysis

New Teachers: Represents the number of teachers that were genuinely new in each school year, i.e., teachers who had not appeared in the dataset before that specific year.

Percentage Increase (%): Shows the percentage change in the number of new teachers from the previous year to the current year.

Culmulative Teachers: Represents the total number of unique teachers that have been recorded up to and including the current school year.

recruitment_df = recruitment_analysis(data_sorted)
recruitment_df

	New Teachers	Percentage Increase (%)	Cumulative Teachers
2014-15	2	NaN	2
2015-16	12	500.0	14
2016-17	25	108.3	39
2017-18	27	8.0	66
2018-19	26	-3.7	92
2019-20	29	11.5	121
2020-21	46	58.6	167
2021-22	72	56.5	239
2022-23	50	-30.6	289

Insights

Consistent Increase in Recruitment : From 2015-16 to 2021-22, there’s a consistent increase in the recruitment of new teachers each year, with particularly notable jumps in 2016-17 (108.3% increase) and 2020-21 (58.6% increase).
Slight Dip in 2022-23 : In 2022-23, there’s a decrease in new teacher recruitment by 30.6%, which might suggest a saturation point, a shift in school policies, or external factors influencing recruitment.

Yearly Retention Analysis

Retained Teachers: Indicates the number of teachers from the current year who continued into the next year.

Retention Rate (%): The percentage of teachers from the current year who continued into the next year.

retention_df = retention_analysis(data_sorted)
retention_df

	Retained Teachers	Retention Rate (%)
2014-15	1.0	50.0
2015-16	6.0	46.2
2016-17	13.0	40.6
2017-18	18.0	43.9
2018-19	23.0	45.1
2019-20	24.0	42.9
2020-21	43.0	56.6
2021-22	45.0	36.9
2022-23	NaN	NaN

Insights

The significant drop in the retention rate in 2021-22 is a point of concern and might need targeted interventions or deeper analysis to understand the root causes.

Despite the fluctuations in the retention rate, the absolute number of retained teachers has seen a consistent increase until 2021-22. This indicates that the overall size of the teaching cohort is growing, even if the percentage of retained teachers varies.

3.2 Teaching Experience Analysis#

The visualization presents a bar plot showcasing the distribution of the maximum tenure (in years) of teachers on the platform. Each bar represents the number of teachers who have a specific tenure length, ranging from 1 to 7 years.

Loading BokehJS ...

Insights

Predominance of Short Tenures: The majority of teachers (170) have a tenure of just one school year on the platform. This could indicate a high turnover rate or a large influx of new teachers.
Declining Long Tenures: As the tenure length increases, the number of teachers with that specific tenure length decreases. Only one teacher has the longest association of 7 years. This trend signifies that fewer teachers have sustained long-term associations with the platform.
Room for Improvement: The steep drop from 1-year to longer tenures suggests potential areas for improvement in teacher retention strategies. A deeper dive into the reasons for shorter tenures could provide insights into areas of enhancement.

3.3 Platform Usage and Course Exploration#

Educator Trends and Platform Adoption#

The solid line representing the “Number of Teachers” provides insight into the growth of the platform over the years. A steadily rising line would suggest a growing user base of teachers on the platform.

The dotted line, representing the average number of students per teacher, provides insight into the teacher-student ratio on the platform.

The red dashed line indicates the onset of the COVID-19 pandemic.

Show code cell source Hide code cell source

# Grouping data and calculating metrics
teacher_yearly_students = data_sorted.groupby(['school_year', 'teacher_user_id']).agg(
    unique_student_ids=('student_user_id', 'nunique')
).reset_index()

updated_yearly_summary = teacher_yearly_students.groupby('school_year').agg(
    num_teachers=('teacher_user_id', 'nunique'),
    avg_students_per_teacher=('unique_student_ids', 'mean')
).reset_index()

covid_onset_year = "2019-20"
covid_onset_position = updated_yearly_summary['school_year'].tolist().index(covid_onset_year) + 0.5

p = figure(title="Yearly Trends: Number of Teachers vs. Average Students per Teacher", 
           x_axis_label='School Year', y_axis_label='Number of Teachers', 
           width=650, height=450, tools="", toolbar_location=None, 
           x_range=updated_yearly_summary['school_year'].tolist())

# Add the primary line and markers (Number of Teachers)
p.line(updated_yearly_summary['school_year'], updated_yearly_summary['num_teachers'], 
       legend_label="Number of Teachers", line_width=2, line_color=Spectral8[0], line_dash="solid")
p.circle(updated_yearly_summary['school_year'], updated_yearly_summary['num_teachers'], color=Spectral8[0], size=8)

# Add the secondary line and markers (Average Students per Teacher) with secondary y-axis
p.extra_y_ranges = {"avg_students": Range1d(start=0, end=updated_yearly_summary['avg_students_per_teacher'].max() + 10)}
p.add_layout(LinearAxis(y_range_name="avg_students", axis_label="Average Students Associated with Each Teacher"), 'right')
p.line(updated_yearly_summary['school_year'], updated_yearly_summary['avg_students_per_teacher'], 
       legend_label="Avg. Students per Teacher", line_width=2, line_color=Spectral8[1], line_dash="dotted", y_range_name="avg_students")
p.square(updated_yearly_summary['school_year'], updated_yearly_summary['avg_students_per_teacher'], color=Spectral8[1], size=8, y_range_name="avg_students")

# Adding a vertical line to indicate the onset of the COVID-19 pandemic at the corrected position
vline = Span(location=covid_onset_position, dimension='height', line_color='red', line_width=1.5, line_dash='dashed')
p.renderers.extend([vline])
label = Label(x=covid_onset_position, y=0, y_units='screen', text="Onset of COVID-19", text_font_size="10pt", text_baseline="bottom", text_align="left", text_color="red")
p.add_layout(label)

# Other customizations
p.xaxis.major_label_orientation = "vertical"
p.legend.location = "top_left"
p.grid.grid_line_alpha = 0.3

# Show the plot
show(p)

Insights

By comparing trends before and after the onset of COVID-19, there is an upward trend in teacher invovled in this platform but decrease in the average number of students per teacher.

There are several potential reasons for this scenario:

Increasing Teacher with Smaller Classes: The rise in the number of teachers but decrease in the average students per teacher could suggest that more teachers are joining the platform, but each teacher is dealing with fewer students. This could be due to the integration of the platform in smaller or specialized classrooms.
Spread of Usage: Previously, a smaller number of teachers might have been using the platform extensively with a larger number of students. As more teachers have adopted the platform, usage has spread out, reducing the average number of students per teacher even as the total number of students remains constant or grows.
Platform Usage Pattern: It could also suggest a shift in how the platform is used. Instead of being used as the primary teaching tool for large classes, teachers might be using it for smaller, specialized groups, supplementary lessons, or remedial classes.

Comprehensive Course and Script Coverage#

The visualization provides insights into the distribution of scripts covered by teachers for three distinct courses: CS Fundamentals (K-5), CS Discoveries (6-10), CS Principles (AP), across different school years.Each plot corresponds to a course, with the x-axis representing the school year and the y-axis denoting the number of scripts covered by the teachers.

In each plot:

Circles represent the count of teachers that covered a specific number of scripts in a particular year. The size of a circle indicates the number of teachers; a larger circle means more teachers covered that specific number of scripts. Colors help distinguish between different school years, offering a visual cue to yearly trends in script coverage.

Show code cell source Hide code cell source

def generate_course_coverage_plots(data: pd.DataFrame) -> None:
    """Generate a grid of plots showcasing script coverage by school year for each course."""
    
    course_names = ['csf', 'csd', 'csp']
    course_full_names = {
        'csf': 'CS Fundamentals (K-5)',
        'csp': 'CS Principles (AP)',
        'csd': 'CS Discoveries (6-10)',
    }

    plots = []

    global_max_counts = data.groupby(['school_year', 'teacher_user_id', 'course_name']).agg({'script_id': 'nunique'}).reset_index().groupby(['school_year', 'script_id']).size().max()

    # Get unique school years from the whole dataframe and create a color map based on all school years
    all_school_years = sorted(data['school_year'].unique())
    color_map = {year: Spectral8[i % len(Spectral8)] for i, year in enumerate(all_school_years)}

    # Create a separate plot for each course
    for course in course_names:

        df_course = data[data['course_name'] == course]
        df_course = df_course.dropna(subset=['script_id'])
        df_grouped = df_course.groupby(['school_year','teacher_user_id']).agg({'script_id': 'nunique'}).reset_index()
        df_grouped = df_grouped.sort_values(by='school_year')
        counts = df_grouped.groupby(['school_year','script_id']).size().reset_index(name='counts')
        counts['school_year'] = pd.Categorical(counts['school_year'], categories=sorted(counts['school_year'].unique()), ordered=True)

        # Map colors to 'school_year' based on the color map
        counts['color'] = counts['school_year'].map(color_map)

        # Normalize 'counts' for size of circles in plot based on the global maximum
        counts['size'] = counts['counts'] / global_max_counts * 30
        source = ColumnDataSource(counts)

        p = figure(x_range=sorted(counts['school_year'].unique()), 
                y_range=(counts['script_id'].min(), counts['script_id'].max()), 
                width=450, height=500,
                title=f'Script coverage by school year for {course_full_names[course]}',
                x_axis_label='School Year', y_axis_label='Number of scripts covered by the teacher',
                tools='pan,wheel_zoom,xbox_select,reset')
        hover = HoverTool(tooltips=[
            ("school_year", "@school_year"),
            ("number of scripts covered", "@script_id"),
            ("teacher counts", "@counts")
            ])
        p.add_tools(hover)

        # Add a circle glyph with 'school_year' and 'script_id' as coordinates, size as size and color mapped to 'counts'
        p.circle(x='school_year', y='script_id', size='size', 
                color='color',
                source=source)

        plots.append(p)

    grid = gridplot([plots])
    show(grid)
    
generate_course_coverage_plots(data)

Insights

Consistent Popularity of CS Fundamentals: The ‘CS Fundamentals (K-5)’ course seems to enjoy consistent popularity among teachers, with many covering a broad range of scripts across different years. This could indicate that it’s a foundational course widely adopted in early education settings.
Variations in Advanced Courses: Both ‘CS Principles (AP)’ and ‘CS Discoveries (6-10)’ display significant adoption, but there are observable variations year over year. Some years show a higher number of teachers covering more scripts, indicating possible fluctuations in the curriculum’s relevance or external factors influencing its adoption.
Depth of Course Coverage: The y-axis placement of circles, especially those towards the top, indicates that many teachers delve deep into the courses, covering a majority of the available scripts. This suggests a comprehensive engagement with the platform’s content.

In summary, while the platform’s courses are widely adopted and explored in depth by many teachers, there are variations in how different courses are embraced year over year. These insights can guide platform administrators or educators in refining course content, understanding teacher preferences, and making informed decisions about curriculum development or platform enhancements.

3. Teacher-Centric Descriptive Analysis

Contents

3. Teacher-Centric Descriptive Analysis#

3.1 Teacher Recruitment and Retention Overview#

3.2 Teaching Experience Analysis#

3.3 Platform Usage and Course Exploration#

Educator Trends and Platform Adoption#

Comprehensive Course and Script Coverage#