Visualizing Course Status - A Unified Perspective

This project integrates CSV data from 4 files, creating a unified MySQL table. It refines the table structure, analyzes course status, and organizes insights. Using limited R graphics functions, it crafts a visual to offer a comprehensive summary of active and retired courses.

View the Project on GitHub Kfkyyian1/coursesummary

Using R to combine CSV files into a table & uploading to MySQL

image

After loading in the necessary package, a function named “uploadCSV” was used to read in all the CSV files and combining it into a table as instructed in the question.

The database details and folder path were first established. Then the list.files command was used to list out all the files that are in CSV format within the specified folder. This function is used instead of listing out all 4 file names, so that it’ll be more sustainable in the future when there’s new CSV files being added to the folder.

Moving on, all the CSV files were then combined into a dataframe called combined_data, breakdown of the command below:

The connection to the MySQL database was opened, so that a table called “course_status” created with the data from combined_data can be imported into MySQL. Lastly, the connection was closed.

Data Cleaning & Sanity Check in MySQL

  1. Check if the rows and columns match the raw data
    image
    image

  2. Remove the additional column
    image
    image

  3. Check type of data
    image
    image

  4. Clean up column names
    image
    image

Create a new table

Table containing:
(i) the number of courses ever offered,
(ii) percentage of active courses offered,
(iii) percentage of retired courses for each program and school
image
After refreshing MySQL, the table is created as shown below.
image

Visualise data in R

  1. Read in course_summary table into R
    image
    image

  2. Data Sanity Check
    • The number of rows and columns match the table in MySQL, indicating all the data has been fetched. (93 rows, 5 columns)
    • The structure of the data was checked using the str command. The data type observed is correct and ready for plotting.
      image
  3. Visualisation Total Number of Active   Retired Courses_CORRECT

The senior management team is often constrained by time, and only requires a quick overview of the data. Therefore, the visualisation is simplified to only showcase the total number of active and retired courses within each school. Notably, the School of Science & Technology exhibits the number of retired courses, surpassing active courses. In contrast, other schools demonstrate a higher number of active courses in comparison to their retired courses.

With the visualisation above, the senior management team can analyze and possibly take some of the next steps below and make informed decisions:

image