Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
7-2025
Abstract
This study investigates ChatGPT-4o's ability to answer multi-modal assessment exercises in computer science (CS) courses. While the use of large language models (LLMs) to answer text-based exercises is extensively researched, their ability to answer exercises involving artifacts of other modalities remains underexplored. To close this gap, we evaluate ChatGPT-4o's answers to 120 multi-modal CS exercises in programming, software design, human-computer interaction, statistical analysis, process analysis, and simulation. The multi-modal artifacts in these exercises include class diagrams, sequence diagrams, user interface images, analytical charts, workflow diagrams, and object-flow diagrams. Our comparisons against the expected answers show that ChatGPT-4o performs well on exercises with class and sequence diagrams, possibly due to the greater availability of such data for training. The potential for misuse by students suggests that these exercises are better suited to closed-book exams or scaffolding activities. ChatGPT-4o also answers multi-modal exercises designed to assess students at the lower levels of Bloom's taxonomy better than those at the higher levels. This discrepancy is possibly due to ChatGPT-4o's limited understanding of the underlying design concepts and its limited ability to generate new multi-modal artifacts, making exercises that require higher-order cognitive thinking suitable for take-home assignments. We hope the insights from this study provide a foundation for developing effective multi-modal assessments.
Keywords
academic assessments, multi-modal exercises, prompt engineering
Discipline
Artificial Intelligence and Robotics | Computer Sciences | Higher Education
Research Areas
Data Science and Engineering
Publication
ITiCSE 2025: Proceedings of the 30th ACM Conference on Innovation and Technology in Computer Science Education, Nijmegen, June 27 - July 2
Volume
1
First Page
58
Last Page
64
ISBN
9798400715679
Identifier
10.1145/3724363.3729056
Publisher
ACM
City or Country
New York
Citation
OUH, Eng Lieh; TAN, Kar Way; LO, Siaw Ling; and GAN, Benjamin.
Evaluating ChatGPT to answer multi-modal exercises in computer science education. (2025). ITiCSE 2025: Proceedings of the 30th ACM Conference on Innovation and Technology in Computer Science Education, Nijmegen, June 27 - July 2. 1, 58-64.
Available at: https://ink.library.smu.edu.sg/sis_research/10262
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1145/3724363.3729056