One of the more streamlined use cases of LLMs is the assessment of a student's answer to an exam question: (1) determining how valid the answer is and (2) providing a concise justification for the assigned validity, either in terms of the provided study materials or of the knowledge base the large language model itself possesses.
Our presentation will focus on implementing a RAG (Retrieval-Augmented Generation) based chatbot that assigns a justified grade to exam answers, drawing on selected study materials. We will address key technical details, such as the underlying model components and alternative prompting techniques. The practical part of the session will demonstrate representative short-essay answers at different difficulty levels and across subject domains. Based on these practical cases, we would also like to actively discuss the capabilities and shortcomings of the presented approach.
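To make the grading flow concrete, the sketch below outlines one possible shape of such a pipeline: retrieve relevant passages from the study materials, then build a grading prompt that asks the model for a grade and a justification grounded in those passages. All names and the naive word-overlap retrieval are illustrative assumptions, not the workshop's actual implementation.

```python
# Illustrative sketch of a RAG-style grading flow (hypothetical names;
# a real system would use embedding-based retrieval and an LLM call).

from dataclasses import dataclass

@dataclass
class Passage:
    source: str
    text: str

def retrieve(question: str, passages: list[Passage], k: int = 2) -> list[Passage]:
    """Rank study-material passages by naive word overlap with the question."""
    q_words = set(question.lower().split())
    scored = sorted(
        passages,
        key=lambda p: len(q_words & set(p.text.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_grading_prompt(question: str, answer: str, context: list[Passage]) -> str:
    """Assemble a prompt asking for a grade plus a justification grounded in the retrieved material."""
    sources = "\n\n".join(f"[{p.source}] {p.text}" for p in context)
    return (
        "You are grading a short essay answer.\n"
        f"Study material:\n{sources}\n\n"
        f"Question: {question}\n"
        f"Student answer: {answer}\n\n"
        "Return a grade from 0 to 10 and a one-paragraph justification that "
        "cites the study material."
    )

if __name__ == "__main__":
    materials = [
        Passage("chap1", "Photosynthesis converts light energy into chemical energy stored in glucose."),
        Passage("chap2", "Cellular respiration releases energy by breaking down glucose."),
    ]
    question = "What does photosynthesis produce?"
    answer = "It produces glucose using light energy."
    prompt = build_grading_prompt(question, answer, retrieve(question, materials))
    print(prompt)  # In practice, this prompt would be sent to the chosen LLM.
```

The point of the sketch is the separation of concerns: retrieval keeps the grading grounded in the selected study materials, while the prompt explicitly requests both a grade and a citation-backed justification.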
The workshop will be led by Andrea Palmini and Tunc Yilmaz.