Comment by somethingsome

Hi! I can't really share the company, but I do love the space and happy to discuss what I've been reading.

So the idea of Knowledge Tracing originated, from my understanding with a paper in 1994: http://act-r.psy.cmu.edu/wordpress/wp-content/uploads/2012/1... this sort of introduced the idea that you could model and understand a students learning as it progresses through a set of materials.

The concept of Knowledge Components was started, I believe, at Carnegie Mellon and University of Pittsburgh with the Learn Lab: https://learnlab.org/learnlab-research/ - in 2012 they authored a paper defining KLI (Knowledge Learning Instruction framework): https://pact.cs.cmu.edu/pubs/KLI-KoedingerCorbettPerfetti201... which provided the groundwork for the concept of Knowledge Components.

This sort of kicked things off with regards to really studying these things on a finer-grained level. They have a Wiki which covers some key concepts: https://learnlab.org/wiki/index.php?title=Main_Page like the Knowledge Component: https://learnlab.org/wiki/index.php?title=Knowledge_componen...

Going forward a few years you have a Stanford paper, Deep Knowledge Tracing (DKT): https://stanford.edu/~cpiech/bio/papers/deepKnowledgeTracing... which delves into utilizing RNN(recurrent neural networks) to aide in the task of modelling student knowledge over time.

Jumping really far forward to 2024 we have another paper from Carnegie Mellon & University of Pittsburgh: Automated Generation and Tagging of Knowledge Components from Multiple-Choice Questions: https://arxiv.org/pdf/2405.20526 and A very similar paper that I really enjoyed from Switzerland: Using Large Multimodal Models to Extract Knowledge Components for Knowledge Tracing from Multimedia Question Information https://arxiv.org/pdf/2409.20167

Overall the concept I've been sort of gathering is that, if you can break down the skills involved in smaller and smaller tasks, you can make much more intelligent decisions about what is best for the student.

The other thing I've been gathering is that Skills Taxonomies are only useful in as much as they help you make decisions about students. If you build a very rigid Taxonomy that is unable to accommodate change, you can't really adapt easily to new course material or to make dynamic decisions about students. So the idea of a rigid Taxonomy is quickly becoming outdated. Large language models are being used to generate fine-grained skills (Knowledge Components) from existing course material to help model a students development based on performance in a way that can be easily updated when materials change.

I have worked through and replicated some of the findings in these later papers using local models, for example using the open Gemma 2 27b models from Google to generate Knowledge components and using Sentence Embedding models and K-means clustering to gather them together and create groups of related Knowledge Components. It's been a really fun project and I've been learning quite a bit.