Real Talk About GenAI Applications in Education

For our last webinar, we were joined by Kristen DiCerbo, Chief Learning Officer of Khan Academy, to explore the opportunities and challenges involved in using and developing education-specific applications of GenAI.

Key topics included:

  • Common misconceptions about GenAI tools like Khanmigo, including the critical differences between foundational models and applications built on them, and how these differences impact evidence about AI's effect on learning

  • Khan Academy's approach to preventing direct answers through creative prompt design and managing the risks and trade-offs of deploying AI tools directly to students

  • Biggest surprises in how students and teachers interact with Khanmigo and the most effective partnerships and feedback loops for improving features

  • Evaluation approaches for AI model outputs using rubrics and human agreement training, and methods for understanding AI's impact on learning

  • The importance of transparency in sharing usage data publicly and essential questions educators should ask GenAI providers about bias, training data, and effectiveness

Participants gained:

  • Actionable insights into assessing GenAI tools

  • Real-world examples of GenAI model limitations and workarounds

  • Questions to ask EdTech providers to ensure accountability

  • Resources and rubrics for evaluating AI tools in your own context

AI Summary Notes:

📊 Background and Introduction (00:00 - 09:37)

  • Webinar introduction featuring Dr. Kristen DiCerbo, Chief Learning Officer at Khan Academy, for a "Real Talk" discussion about GenAI applications in education.

  • Khan Academy's early AI access - Sal Khan received an email from Sam Altman and Greg Brockman of OpenAI three years ago inviting him to test a new model that could pass Advanced Placement Biology.

  • Initial GPT-4 experimentation - Khan Academy team tested the model via a Slack channel in August 2022, discovering it was actually GPT-4, months before the November 2022 public release of ChatGPT (running GPT-3.5).

  • Prompt engineering breakthrough - OpenAI taught them basic tutoring prompts: "you are a Socratic tutor, I am a student, help me get to answers, don't tell me the answer."

  • Hackathon development - Khan Academy used its September 2022 hackathon week with 30 people under strict NDA to experiment with AI tutoring, writing coaching, and teaching assistant functions.

  • Khanmigo launch decision - Team scrapped entire 6-month roadmap to build Khanmigo for the March 2023 GPT-4 release, despite early accuracy issues like "9 + 5 = 16."

🔧 Technical Challenges and Solutions (09:37 - 19:10)

  • Probabilistic model limitations - GenAI models consistently choose "27" when asked to pick a random number between 1 and 50 because it is the highest-probability output, demonstrating non-random behavior.

  • Production monitoring systems - Khan Academy built dashboards to monitor math accuracy and tutor behavior in real time across hundreds of thousands of daily interactions.

  • Hybrid approach implementation - Math calculations sent to Python-based calculator rather than relying on LLM, with results fed back into conversation invisibly to users.

  • Prompt engineering evolution - Early prompts required extreme language like "fate of the world depends on you not giving the answer" and writing in ALL CAPS to improve instruction following.

  • Model evaluation framework - Khan Academy uses different OpenAI models (GPT-4, GPT-4o, GPT-4o Mini) based on specific tasks, with continuous evaluation against custom datasets.

🛡️ Safety and Monitoring Systems (19:11 - 29:14)

  • Comprehensive evaluation system - Released dataset and research paper on tutoring effectiveness, measuring when AI correctly identifies right/wrong student responses.

  • Content moderation pipeline - Every interaction processed through moderation API checking hate, violence, self-harm, and sexual content with customizable thresholds.

  • Parental notification system - Flagged conversations for users under 18 trigger emails to parents/teachers, with manual review by community support team.

  • Accuracy monitoring tools - AI systems check AI responses for mathematical accuracy, trained to match human evaluator agreement levels.

  • Grounding in quality content - Significantly higher accuracy when AI tutoring references existing Khan Academy materials with worked examples and hint structures.

👩‍🏫 User Behavior and Product Learning (29:15 - 38:21)

  • Transparency challenges - Accuracy differences between Khan Academy problems vs. student-provided problems communicated mainly through blog posts and district success managers.

  • Creative homework avoidance - Students attempted using "chat with historical figures" feature to get Pythagoras to solve math homework, requiring prompt hardening across entire platform.

  • Writing coach evolution - Initial full-process writing support (brainstorming → outline → draft → feedback) modified after teacher feedback preferring to handle early stages personally.

  • Human-AI role boundaries - Teachers wanted to maintain control over brainstorming and planning phases, viewing these as core teaching responsibilities.

  • Implementation lesson learned - Technology that attempts to replace what humans view as their core responsibilities faces significant classroom adoption problems.

🔍 Research and Evidence Base (38:22 - 47:27)

  • Transparency philosophy - Khan Academy shares both successes and failures despite negative feedback, viewing failures as iteration opportunities.

  • Historical efficacy data - Large-scale studies with 500,000-600,000 students show learning gains at 18 hours/year of usage, though only about 5% of students reach this threshold.

  • Theory of action framework - Access to Khanmigo → high-quality tutoring interactions → increased cognitive engagement → more skills to proficient → better external assessment gains.

  • Research-based tutoring moves - AI tutoring built on decades of human tutor research including when to probe, summarize, correct, and provide different support levels.

  • ICAP framework implementation - Uses Micki Chi's ICAP framework: a passive → active → constructive → interactive learning progression for measuring cognitive engagement.

🔮 Future Outlook and Challenges (47:28 - 57:41)

  • Student engagement reality - Sees both amazing tutoring interactions and frequent "IDK" or "bro IDK" responses, highlighting engagement challenges.

  • Teacher implementation success - Newark science teacher created physical question prompt cards to help students ask better questions to AI, combining low-tech with high-tech approaches.

  • Global expansion limitations - Student version limited by legal/financial infrastructure for international billing, though teacher tools free in 40 countries via Microsoft partnership.

  • Cost structure reality - Every AI interaction has actual compute cost unlike traditional software, requiring sustainable business models.

  • Biggest concern - Unpredictable development pace makes long-term educational planning extremely difficult in slow-moving education sector.

  • Greatest optimism - Near-term potential for multimodal interactions with voice and visual capabilities creating more natural, engaging tutoring experiences.

  • Amanda Bickerstaff

    Amanda is the Founder and CEO of AI for Education. A former high school science teacher and EdTech executive with over 20 years of experience in the education sector, she has a deep understanding of the challenges and opportunities that AI can offer. She is a frequent consultant, speaker, and writer on the topic of AI in education, leading workshops and professional learning across both K12 and Higher Ed. Amanda is committed to helping schools and teachers maximize their potential through the ethical and equitable adoption of AI.

    Dr. Kristen DiCerbo

    Dr. DiCerbo is the Chief Learning Officer at Khan Academy, where she leads the content, assessment, design, product management, and community support teams. Time magazine named her one of the top 100 people influencing the future of AI in 2024. Dr. DiCerbo’s career has focused on embedding insights from education research into digital learning experiences. Prior to her role at Khan Academy, she was Vice-President of Learning Research and Design at Pearson, served as a research scientist supporting the Cisco Networking Academies, and worked as a school psychologist. Kristen has a Ph.D. in Educational Psychology from Arizona State University.

  • 00:00
    Amanda Bickerstaff
    Hi, everyone. We're so excited to have you. It'll take a little bit of time to get everybody in because we have a pretty big group, but I cannot tell you how excited I am to have this time with Kristen. We named it Real Talk and it will definitely be Real Talk today, as always. Whether you've been to a webinar with us before or this is your first time, say hello in the chat and tell us where you're from. I know we'll be a pretty international group, and then we will get started. I'm really excited, and so is the entire AI for Education team, to be able to talk to one of the real leaders in gen AI and ed tech. And I mean that in a very meaningful way.


    00:40

    Amanda Bickerstaff
    I think that the work of Khan Academy, especially with the first focus on Khanmigo and trying to figure out what the applications could be to support student learning, has been pretty cutting edge and on the forefront. I think what we're going to talk about is that what they thought going in wasn't maybe what we've all learned now, a couple years into generative AI. But I have had such a pleasure over the last couple years of getting to know Dr. Kristen DiCerbo, who is the Chief Learning Officer of Khan Academy. We were at an AI show a couple months ago, and we were sitting on a bus going to a women in education event, and we just started talking. I was like, how did this happen and what did you learn?


    01:20

    Amanda Bickerstaff
    And I was like, would you be willing to talk about that to a bigger audience? One of the things that I think is so important at this time is to be able to share not just the good, but the learnings. And I think that's what we try to do all the time: the things that work or don't work. And so I'm just so excited to have Kristen here. We're going to, as always, get involved. Say hello. Thank you, already people are saying hello, and you'll see folks from everywhere, from Berlin to Canada to Tampa. I'm in Tampa as well today. But say hello. We will use the chat for you all to communicate.


    01:52

    Amanda Bickerstaff
    But if you have a question for Kristen or me, please put that in the Q and A and we'd love to have that conversation. And then if you have a great resource or, you know, a story, maybe Kristen will learn some stuff from you as well, if you've used Khanmigo before. But we just want to make sure that this always feels like a community. This is what we really want this to be. So I'm going to come off share and I'm going to let Kristen introduce herself, and then we're going to start with, like, how did Khanmigo even come to be? So Kristen, I'll hand it over to you.


    02:21

    Kristen DiCerbo
    Well first, thanks so much for having me, and good to see everyone. I see a couple folks from Phoenix. I live in Phoenix and am in Phoenix today. So yay to all of us who are living through the 115 degree heat today. So I'm Kristen. I am the Chief Learning Officer at Khan Academy, and that title means different things in different places and different types of organizations. But we're a learning organization, and so as the Chief Learning Officer I am overseeing much of what you see on Khan Academy. So I lead our content, design, product management, community support and assessment teams at Khan Academy. But my background is in educational psychology. So I have a PhD in educational psychology and initially started as a school psychologist here in Phoenix, and kind of really got excited about what education technology could do and its potential.


    03:12

    Kristen DiCerbo
    This is back in 2002, 2003, and I have spent about the last 20 years since designing and building and researching education technology. And so I bring that lens to the work that I do. But also, because I've been in this space for quite some time, I have learned enough about the technology side to be a little dangerous.


    03:40

    Amanda Bickerstaff
    I love a little danger. I mean, I would say that you know a lot more than that. One of the things that really struck me, first of all, is your background in educational psychology, which I think is so interesting, because I think this is a place where we're a little underbaked right now, really thinking about what this is going to mean for learning. And one of the things that we actually talk about in our keynote, Kristen, is that there's kind of this dichotomy that happens when ChatGPT was released. So we have a slide that says ChatGPT 3.5 is released in November 2022, and then in January, New York City, with maybe some of our friends here, and where I taught, bans ChatGPT pretty quickly.


    04:22

    Amanda Bickerstaff
    And then two months later, when GPT-4 is released, the next day you release Khanmigo, the math tutor. And so we always think it's such an interesting way to frame this kind of context of, we know this is going to have an impact, we don't know what it's going to be. But also that ed tech was thinking this might be a real opportunity. So can you talk to us about the origins? You've been thinking about it a while, but I know we were all really surprised at just how good ChatGPT became from its earlier versions to 3.5. So can you talk about what happened first and what inspired you all to actually start building Khanmigo?


    04:59

    Kristen DiCerbo
    Absolutely. So here's the origin story, just about three years ago now. Sal Khan, Sal's my boss at Khan Academy, obviously his name is on the product, got an email from Sam Altman and Greg Brockman, who, as most of you know, were the leaders of OpenAI at that time, and they said, hey, we've just trained this new model. Would you be interested in seeing it? And Sal said to me, come to this meeting with me for a second and let's see what they're up to. And it turns out, in the background, Bill Gates had told them not to come back to him until their new model could pass Advanced Placement Biology. Well, we have a whole lot of Advanced Placement Biology questions that they were interested in using for evaluation, to see how it could do.


    05:47

    Kristen DiCerbo
    So Sal and I get on this Zoom call with Sam and Greg, and they started showing how it could answer these AP Biology questions. And it was doing a pretty good job, not just getting them correct, but explaining the answers and why they were correct or not. And Sal and I were pretty blown away. And they said, well, what we'll do is we'll give you access to chat with this new model over a Slack channel. So it would hook up to a Slack channel and then we would ping it and ask it to do things. So this was just in August of 2022. It turns out what we were playing with was actually GPT-4. 3.5 was already built; they just hadn't put the chat interface on it, which they would do the coming November.


    06:32

    Kristen DiCerbo
    So we pretty quickly, like within a couple of hours, said to them, this is great, but we don't want a tool that's going to answer questions. We want a tool that's going to help students get to the answers themselves. So we're not sure about this. And they said, oh, let us teach you how to tell the model how to act. And this was basically our first lesson in prompt engineering. So they just wrote three sentences, something like: you are a Socratic tutor, I am a student, help me get to the answers, don't tell me the answer. And just like that, obviously with lots of problems and errors and all kinds of things, it started acting more like a tutor. And that was kind of our, oh wow, aha moment.


    07:18

    Kristen DiCerbo
    So from there, there were a couple other big things that kind of moved us forward. We have a tradition at Khan Academy of hackathons, which is a week where not just engineers but the entire org works on projects that we think will, you know, advance Khan Academy in whatever way. Well, we happened to have a hackathon scheduled the third week in September three years ago. And at the time we were under this super strict NDA with OpenAI, as you can imagine, so Sal and I begged them for 30 more people to be included in the NDA, and swore word would not get out, all of that.


    07:58

    Kristen DiCerbo
    And so we used this hackathon week to experiment with all kinds of different things we might do with the AI, including being a math tutor, doing writing coaching, being a teaching assistant in all kinds of ways, writing lesson plans, all of those things we see now we were kind of playing with as ideas. And having that whole week of just focusing on what might be gave us the confidence to say, okay, when they release GPT-4 in March, we're going to release an AI-powered tutor for students and a few activities to help support teachers as well. And then we took our roadmap. So we usually have a roadmap of all the things we think we're going to build in the next six months.


    08:44

    Kristen DiCerbo
    We took everything we thought we were going to build, threw it away, and built Khanmigo over that next six months. So that was a crazy time. But there were lots of things those initial first months. There was a now pretty legendary case where it was insisting that nine plus five was 16, and it would explain itself, it was trying to explain how it got there, and we were like, oh my gosh, we cannot release this as it is. So we worked through a lot of different things even before that initial release, in terms of safety and security, to try to get to a place where we felt comfortable with the risk we were taking releasing it that early.


    09:37

    Amanda Bickerstaff
    That's really interesting. I mean, I think the first thing that really stands out to me is this idea of, we don't want it to give you answers, we want it to help you, but it kind of still does. I feel like we haven't really nailed that still. It wants to be sycophantic, it wants to give you the answer, it wants to help, you know, without anthropomorphizing it too much. But that's how it's been designed. So I'd love to understand how you got around that challenge and how well you think it's doing now. But I do think I just want to kind of underscore.


    10:08

    Amanda Bickerstaff
    I don't know if everyone caught this, but, like, you know, generative AI models are so fascinating because they are probabilistic engines. And so if you went to ChatGPT right now, or Claude, or Gemini, and said, pick a number from 1 to 50, it will be 27. This is wild. This is something that we found out pretty recently. Apparently that's the highest probability answer. So there are times where you get these correct answers, like those AP Bio exam questions it got right, and it looks like it's going to get it right all the time, right? Because that's really hard. But then when you asked it nine plus five, it was determined the answer was 16. Like, what was that? I mean, what was that moment for you all?


    10:52

    Amanda Bickerstaff
    What did you learn from that moment of, okay, it can do an AP Bio question correctly and explain it, but it can't do simple calculations?


    11:01

    Kristen DiCerbo
    Yeah. So there's a couple things that we have learned here. One is, because these are probabilistic, you can't just ask it to do something a couple times and then, if it does it right, be satisfied that it's always going to do it. So the first thing we learned was that we have to put in-production evaluation tools in place. We have, for instance, created dashboards that monitor math accuracy and tutor behavior in real time. And so every morning I can go on and see, how's our math accuracy doing? Have there been changes? Is it moving forward? Because we have hundreds of thousands of interactions every day, and a human just can't monitor those.


    11:46

    Kristen DiCerbo
    So back then, I had no idea that we'd end up spending so much time and effort on understanding how to monitor the output of these tools. The other thing we've learned is that generative AI is not always the right tool in the toolbox. So for math, we have ended up in a place where, when we detect that math is being done, we actually don't ask the generative large language model to do the math. We send it out to a Python-based calculator that does the math and then feeds that answer back into the conversation. The student doesn't see any of that; they just see the conversation.


    12:28

    Kristen DiCerbo
    But we're having the check done on the side, so that we're not relying on the model, which isn't a calculator and isn't meant to be, to do that math, and we feed the result back in. So it's the idea of not asking this one tool to always do everything, because it's not meant to do everything all the time.
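
To make the mechanics concrete, here is a minimal sketch of that hybrid routing in Python. The regex-based detection, message format, and function names are illustrative assumptions, not Khan Academy's actual implementation; the point is that the arithmetic is done by deterministic code and the verified result is injected into the model's context where the student never sees it.

```python
import ast
import operator
import re

# Stand-in for the "Python-based calculator": safely evaluates plain
# arithmetic instead of asking the language model to do it.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calculate(expr: str) -> float:
    def walk(node: ast.AST) -> float:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp):
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp):
            return _OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"not plain arithmetic: {expr!r}")
    return walk(ast.parse(expr, mode="eval").body)

def build_messages(student_message: str) -> list[dict]:
    """Route detected arithmetic to the calculator and feed the result back
    into the conversation as hidden context; the student only sees the chat."""
    messages = [{"role": "user", "content": student_message}]
    found = re.search(r"\d[\d\s.+\-*/()]*", student_message)  # naive detection
    if found:
        try:
            value = calculate(found.group().strip())
        except (ValueError, KeyError, SyntaxError):
            return messages  # not actually arithmetic; let the model handle it
        messages.insert(0, {
            "role": "system",
            "content": f"A verified calculator evaluated {found.group().strip()} "
                       f"= {value}. Use this result; never redo the arithmetic.",
        })
    return messages
```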


    12:49

    Amanda Bickerstaff
    But I think this is where it gets hard to understand, right? And I'm sure at the beginning you were like, oh my gosh, ChatGPT is going to be able to do it all. And then you're like, oh no, it can't do math. And we actually have a math guy, Chris, I don't know if you've seen it yet, but we did it with SAP, where our math guy was like, don't use generative AI models to support math instruction. Use actual fit-for-purpose math tools that do what you're talking about, where, like Khanmigo going to the Python calculator, they do the calculations and filter the result back.


    13:23

    Amanda Bickerstaff
    But I think what can be interesting, though, is that if you didn't know that about Khanmigo, because you don't see it in the interface, you might just think, oh, this is a generative tool that can do math. And I think this is why these conversations are so important, because of the transparency of understanding what's really happening under the hood. I'll give an example: there were a lot of benchmark metrics about generative models being very good at the Math Olympiad, and they were doing very well. But then when they did the benchmark where it was asked to explain. Oh, did we lose you?


    13:58

    Kristen DiCerbo
    No, I'm here, I'm going to go off video for just a second. The joys of, you know.


    14:06

    Amanda Bickerstaff
    So with the Math Olympiad, when the generative AI was just asked to give the answers, it was getting very high success rates. But when it was asked to explain those highly complex answers and the computational work that was done, it failed pretty significantly. And so I think that's why it gets really fascinating for us, because we want to know, Kristen, we want the real talk of what's worked or not worked. So I don't know, do you still have the kiddo in the room?


    14:36

    Kristen DiCerbo
    I'm good, I'm good. I hear you.


    14:38

    Amanda Bickerstaff
    Okay, great. I want to talk a little bit about what you found in terms of these really early-stage understandings about what it meant to build on a generative model like GPT-4, where you learned things that were very positive, and where you found some challenges.


    14:58

    Kristen DiCerbo
    Yeah, so there are just some things about giving it instructions. When we first started working on prompts, we were writing it all as one big prompt. And we started out having to do some basic things, like write a lesson hook, for example, which you can get done pretty well without a huge prompt. Then we started thinking about building our writing coach, and we wanted it to be able to give feedback on the introduction, feedback on evidence for the argument, feedback on tone and style. When you start trying to cram a whole bunch of things into a prompt, the model starts ignoring parts of the prompt, and not the same parts each time; it ignores different parts of the prompt randomly.


    15:51

    Kristen DiCerbo
    So what we found was, oh, now we have to start breaking the prompt apart and chaining things together so that they work. So that was one big lesson about prompting and how it follows or doesn't follow instructions. Also, over time, the models have changed in their ability to follow instructions. So that steerability, as the model makers call it, has really changed. Early on, we would be doing things like writing in all capital letters, DO NOT GIVE THE ANSWER. Sal at one point wrote, the fate of the world depends on you not giving the answer. And that helped improve the accuracy of it not giving the answer. Which is, like, insane, but that was how it was responding.
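
A minimal sketch of that prompt chaining, assuming the OpenAI Python SDK; the feedback dimensions and prompt wording here are illustrative, not Khan Academy's actual prompts. Each narrow call carries a single instruction, so there is no long prompt for the model to randomly drop parts of.

```python
from openai import OpenAI

client = OpenAI()

# One focused prompt per feedback dimension instead of one giant prompt.
FEEDBACK_PROMPTS = {
    "introduction": "You are a writing coach. Comment only on whether the "
                    "introduction grabs the reader's attention.",
    "evidence": "You are a writing coach. Comment only on whether each piece "
                "of evidence is tied into the argument.",
    "tone": "You are a writing coach. Comment only on tone and style.",
}

def coach_essay(essay: str) -> dict[str, str]:
    """Chain one model call per dimension and collect the feedback."""
    feedback = {}
    for dimension, system_prompt in FEEDBACK_PROMPTS.items():
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # a smaller model can handle a narrow subtask
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": essay},
            ],
        )
        feedback[dimension] = response.choices[0].message.content
    return feedback
```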


    16:37

    Amanda Bickerstaff
    "It's important for my grandma" was one. With coding, "the user has no hands" was one, where it has to write out all of the code. So for everybody who may not know, maybe let's take one step back and talk a little bit about the technology, because I think some people don't know what a system prompt is. When OpenAI is developing a model, like the GPT-5 that's coming out, it will have a natural-language, written-out prompt that gives it directions: be polite, give a full answer, don't say "I don't know." These are real, they change quite a bit, and they have to be updated. If you were around for, what was it, the sycophancy apocalypse, where ChatGPT's system prompt was changed and then it just told everybody they were the smartest, prettiest human that ever existed until it was fixed.


    17:32

    Amanda Bickerstaff
    It's actually very impactful. And in the early days, these emotional pleas, these kind of strange, creative ways to try to get the bots to pay attention to the system prompt were really common, like Sal's fate-of-the-world line. And these are still important today. But I think the point you're making, Kristen, is that it's become easier to direct, to at least use normal, not crazy, rhetoric to drive what the bot does, but it can still be radically inconsistent. Right. And so, have you found, you're still using OpenAI models, is that correct? Are you using 4o? Are you starting to use different models now?


    18:18

    Kristen DiCerbo
    We are still in the OpenAI family of models, but we use different ones depending on the tasks we're doing. So GPT-4, 4o, 4o mini, depending on which pieces they respond to best. The reason that we are still with the OpenAI family is because we have a set of evaluations. We've actually released a dataset and a paper along with it. Dan, if you want to post that link in the chat for folks that are interested in getting into the technical details. But essentially we run evaluations to see things like: how good is each of these different models at telling a student they're right when they're right and wrong when they're wrong in a tutoring session? Because to your point about them being positive, sometimes we found that the tutor absolutely will say, great job. No, it's not a great job.


    19:13

    Kristen DiCerbo
    They're not on the right track at all. So doing just that little piece of tutoring, telling them they're right when they're right and wrong when they're wrong. We released a dataset of, you know, 180 student conversations that we run through different models, and then we can rank them. And what we find is that these OpenAI models consistently, on the specific tasks we're asking them to do, perform better than the other models. So we are still on that family of models. And, I mean, you said they're radically inconsistent; we don't find that to be the case. With prompting and with the controls we have, we can get some pretty consistent results.
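
Here is a hypothetical harness in the spirit of what Kristen describes: a human-labeled dataset of tutoring exchanges, with each candidate model scored on whether it tells the student they're right when they're right and wrong when they're wrong. The dataset fields, judge prompt, and model names are assumptions for illustration, not the released dataset's schema.

```python
from openai import OpenAI

client = OpenAI()

JUDGE_INSTRUCTIONS = (
    "You are a math tutor. Given the problem and the student's answer, "
    "reply with exactly one word: CORRECT or INCORRECT."
)

def accuracy(model: str, dataset: list[dict]) -> float:
    """Fraction of cases where the model's verdict matches the human label.
    Each case: {'problem': str, 'student_answer': str, 'label': str}."""
    hits = 0
    for case in dataset:
        verdict = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": JUDGE_INSTRUCTIONS},
                {"role": "user",
                 "content": f"{case['problem']}\nStudent answered: {case['student_answer']}"},
            ],
        ).choices[0].message.content.strip().upper()
        hits += verdict.startswith(case["label"])
    return hits / len(dataset)

# Rank candidate models on the same task-specific dataset:
# for m in ("gpt-4o", "gpt-4o-mini"):
#     print(m, accuracy(m, eval_cases))
```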


    20:01

    Kristen DiCerbo
    But I think this is a point that gets lost too: folks, when they're talking about generative AI, are sometimes talking about just interacting with ChatGPT and Claude and Gemini. But when we're talking about tools that are specifically made for education, the point is that not every teacher in the world has to learn all these intricacies of exactly how to prompt, if you're using education tools that have already done some of that for you. But then you need to know how to evaluate those tools, which is an important piece. I think it's important we think about the differences between the foundational models and education-specific tools that are built on those foundational models.


    20:45

    Amanda Bickerstaff
    Absolutely. And I think that my point about the inconsistency is much more about the foundation models. And so just to tease out what we mean by this: I think a lot of people, maybe not our audience as much, are not aware that almost every single generative AI application runs on a set of maybe five to six families of models. That could be OpenAI's models, the Google models, Anthropic's models, Mistral, the models coming out of China and India, the open-source models from Meta. And so when you have a Khanmigo, or a MagicSchool, or a Canva AI, or a Gamma that does presentations, they're using those models underneath.


    21:35

    Amanda Bickerstaff
    And sometimes multiple; sometimes it won't just be one family, it'll be Claude Sonnet for this and then Gemini Flash 2.5 for that. And I think what's really fascinating, though, is that while a foundational model gives you more creativity and openness and control of your destiny, meaning that with your prompting technique it's pretty unlimited what you can do right now, for most educators it does require a pretty significant amount of AI literacy, which is what we really focus on at AI for Education: understanding how to get to what Kristen and Sal and the whole team got to through trial and error. But then something like Khanmigo lifts all that work out for you. It will have a lot of positives, I think, right, Kristen?


    22:19

    Amanda Bickerstaff
    But then maybe it's a little bit more contained and constrained. It's a trade-off: it's easier to use and you don't have to build that AI literacy, but you are going to give up some control over how much you can steer the conversation. So do you want to talk a little bit about that distinction?


    22:40

    Kristen DiCerbo
    Yep, that's absolutely right. So, for example, when we first launched our lesson planning tool, we had a very specific lesson plan format, a five-part lesson plan, which is common in many places but is not, for instance, what some districts want folks to use. We had done a ton of work on this lesson plan, and there's a blog post that I shared a link to about how we literally used a rubric from teacher prep programs for what makes a good lesson plan. And we worked on our lesson plan until it met the proficient status on that rubric consistently before we released it. So we were sure this was a good lesson plan.


    23:26

    Kristen DiCerbo
    But if you didn't want a five-part lesson plan, then you were out of luck. We now have five different types of lesson plans; we've gradually expanded. But that's just an example of the kind of thing where we have done a lot of work to make it good and responsive and do what we know is educationally right. To do that, we've had to constrain the system and the parameters a bit, and it might not be what someone is specifically looking for. So there are definitely some trade-offs there.


    24:01

    Amanda Bickerstaff
    Absolutely. And so, you know, I think another piece that can be hard to recognize in an application like Khanmigo is that it carries some of the same bias that we have. So let's say pedagogical bias, so to speak, like the fact that the five-part lesson plan was the one you chose first, for example. Or bias, I mean, I'm just going to say it, like the Grok stuff happening right now, if you're not familiar with what's happening. These tools are trained on human data and they can be led by human developers. And right now there's definitely a pretty big case of how biased outputs can be part and parcel of generative AI responses. So you have bias, and you have hallucinations.


    24:45

    Amanda Bickerstaff
    If you're not familiar with hallucinations, those are the inaccuracies that happen because the model is probabilistic. That's the nine plus five equals 16. And so how do you track or measure those in your responses, both in the tutors and the teacher tools? Because I know no one can fix it fully. So how are you thinking about it, measuring it, and mitigating it?


    25:10

    Kristen DiCerbo
    So, as a tutor, there are some specific things that we know good tutors do, and we base a lot of that on decades of research on human tutors and what that looks like. So we are first building on getting the AI to use what we know from research. Then we have this idea of red teaming, which is where you basically have a small group of people who try to break the AI and get it to say things you don't want it to say. And we have a moderation system: every interaction goes through a moderation API that gives a value on hate, violence, self-harm, and sexual content involving minors.


    26:04

    Kristen DiCerbo
    Then we set some thresholds for each of those, and you can imagine there's some art and science there. Say, for violence, we want to flag conversations about guns, but if you're writing an essay about World War II weapons, then we actually do want you to be able to mention guns. So there's some trickiness in there. All of that then sends up a flag, and if a student is under 18, that sends an email to their parent or teacher, depending on whether they're a school or an individual user, flagging that conversation and allowing the parent or the teacher to make a judgment about whether that's a conversation that should be happening.


    26:43

    Kristen DiCerbo
    All of those flagged moderations are also reviewed by our community support team every morning, who work through what has gotten through. And then we do a random sample of non-flagged conversations to see if there are things we're missing. So all of those things are in place to monitor what kids are talking to the AI about, and to give parents and teachers visibility and transparency into what the kids are saying to the AI and what the AI is saying in return.
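
A minimal sketch of such a moderation gate, using OpenAI's moderation endpoint as one real option for this kind of pipeline (whether Khan Academy uses that particular endpoint is not stated). The threshold values and the notification/review hooks are illustrative assumptions.

```python
from openai import OpenAI

client = OpenAI()

# Illustrative per-category thresholds -- the "art and science" part.
# Khan Academy's actual categories and cutoffs are not public.
THRESHOLDS = {
    "hate": 0.4,
    "self-harm": 0.2,
    "sexual/minors": 0.05,
    "violence": 0.7,  # looser: a WWII essay may legitimately discuss weapons
}

def flagged_categories(text: str) -> list[str]:
    """Score one chat message and return the categories over threshold."""
    result = client.moderations.create(
        model="omni-moderation-latest", input=text
    ).results[0]
    scores = result.category_scores.model_dump(by_alias=True)
    return [cat for cat, limit in THRESHOLDS.items() if scores.get(cat, 0.0) > limit]

def notify_guardian(flags: list[str], text: str) -> None:
    print(f"email to parent/teacher: conversation flagged for {flags}")  # stub

def queue_for_human_review(flags: list[str], text: str) -> None:
    print(f"queued for community support morning review: {flags}")  # stub

def moderate_turn(text: str, student_age: int) -> None:
    flags = flagged_categories(text)
    if flags:
        queue_for_human_review(flags, text)
        if student_age < 18:
            notify_guardian(flags, text)
```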


    27:18

    Kristen DiCerbo
    Then, on top of that, in terms of hallucinations, that's where we have tools like the math accuracy tool, where we go through all those conversations and have another AI actually grading and scoring and checking for accuracy, which can then tell us whether or not we're seeing spikes in our accuracy. So there are some machines checking the machines going on here. But before we do that, we do human labeling, and then we train the AI to get to the levels of agreement with the humans that humans have with each other. And then we have the AI label our conversations.
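
One common way to operationalize "agreement with the humans that humans have with each other" is a chance-corrected statistic such as Cohen's kappa; the talk doesn't name the metric, so this toy sketch is an assumption about the approach, not Khan Academy's code.

```python
from sklearn.metrics import cohen_kappa_score

# Toy labels: two humans and the AI grader each judge the same tutor replies.
human_a   = ["accurate", "accurate", "inaccurate", "accurate", "inaccurate"]
human_b   = ["accurate", "inaccurate", "inaccurate", "accurate", "inaccurate"]
ai_grader = ["accurate", "accurate", "inaccurate", "accurate", "inaccurate"]

human_baseline = cohen_kappa_score(human_a, human_b)  # human-human agreement
ai_agreement = cohen_kappa_score(human_a, ai_grader)  # AI-human agreement

# Only hand labeling over to the AI once it agrees with humans roughly as
# well as humans agree with each other.
ready_to_scale = ai_agreement >= human_baseline
print(f"human-human={human_baseline:.2f}  ai-human={ai_agreement:.2f}  ready={ready_to_scale}")
```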


    28:08

    Amanda Bickerstaff
    So do you have any data you're willing to share, even if it's anecdotal, or at least in the ballpark, of how often those things happen, where inaccuracies pop up, or these kinds of flagged conversations? And has it changed as the technology has improved, or as your approach has?


    28:28

    Kristen DiCerbo
    Yes, and yes. So one of the things that has become very clear is that it is more accurate when it is grounded in our human-generated, high-quality instructional materials. When we are doing math tutoring, it can happen two ways. You can get tutored on problems that are on Khan Academy, and when you're being tutored on a Khan Academy problem, we feed in, as part of the prompt to the model, the question and also the answer, and the entire hint structure, which for procedural math problems is actually a worked example of the problem being completed, and then the rationales for the right and wrong answers. That all gets fed into the prompt in the conversation. The other kind of tutoring is a student bringing their own problem.


    29:19

    Kristen DiCerbo
    In which case we don't have the human-generated answer. We still send it out to the calculator, but we don't have the human hint structure, the rationales for right and wrong answers, et cetera. We find a significant difference in the error rates: it is much more accurate where it's referencing existing content as part of its responses. So that's certainly something we see: grounding it in good human-generated content makes a difference. Probably not a news flash to anyone. We have also seen the error rates come down significantly over time just on the foundational models, so we are seeing improvement over time there.
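
A sketch of what that grounding might look like when the prompt is assembled; the field names and wording are hypothetical, but the ingredients match what Kristen lists: the question, the answer, the hint structure (a worked example), and the rationales for each choice.

```python
def grounded_tutor_context(problem: dict) -> str:
    """Assemble tutoring context from a Khan Academy-style item (hypothetical schema)."""
    lines = [
        "You are a math tutor. Guide the student step by step; never state the final answer.",
        f"Problem: {problem['question']}",
        f"Correct answer (do not reveal): {problem['answer']}",
        "Worked example (hint structure):",
        *(f"  {i}. {hint}" for i, hint in enumerate(problem["hints"], 1)),
        "Why each choice is right or wrong:",
        *(f"  - {choice}: {why}" for choice, why in problem["rationales"].items()),
    ]
    return "\n".join(lines)

# A student-supplied problem arrives with none of the hints or rationales,
# which is exactly where the observed error rates were higher.
```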


    30:12

    Amanda Bickerstaff
    Do you let people know that, Kristen? Do you let people know that if they're using their own problem set, the accuracy could be lower? I would love to know how you do that. And for those of you that are building gen AI ed tech, I hope this is something we can take back to any gen AI tools. But I'd love to understand how you make people more aware of that difference, between the problems where we have enough data for a probabilistic model to give you a good answer, versus those that come in with much less data behind them.


    30:47

    Kristen DiCerbo
    Yeah. So we do it largely outside of the product. We, for instance, have a series of blog posts around how we build things and what those look like, so you can find it there. For our school programs, we have Khan Academy staff called district success managers who work with the school districts that are our partners; they work closely with the districts on implementation and their implementation plans, and they share all that information with the schools as they work with them.


    31:19

    Amanda Bickerstaff
    Right. I mean, I think it's probably a place that we could all get better. It's a weird thing, adding friction. You know, if you've ever built a technology, you want low friction, right? You want people just to use it, and use it a lot, because it means stickiness, it means more data for you to improve with. But we've talked a lot, Kristen, including when we were doing our prep work, about the idea of good friction: friction that actually creates a space in which you have to stop and think, and then use the tool better. So those are the types of things we would love to see more and more integrated into gen AI ed tech tools.


    31:52

    Amanda Bickerstaff
    Really, just gen AI tools in general. And, you know, the tiny "ChatGPT can make mistakes" note at the bottom isn't cutting it, everybody. It's just not cutting it. So that's on the side of accuracy and hallucinations. But what about students trying to use Khanmigo not to answer math questions but to, like, talk about their boyfriend, or using the writing bot to talk about something that has nothing to do with education? Does that happen?


    32:25

    Kristen DiCerbo
    Well, the more common thing has been them trying to use other tools to do their math homework. So for a while we have had a chat-with-historical-figures feature, and they were trying to use talk-to-Pythagoras to get him to do their math homework. So we had to do a whole bunch of prompt hardening, basically making the prompts more resistant to answering math problems across the whole site. You know, they were trying to write stories where they would get their math homework answered. The creative effort going into getting their math homework done is insane.


    33:09

    Amanda Bickerstaff
    The kiddos that are going to use generative AI to be academically dishonest, or to cheat, some of them are incredibly wily, and they were doing it before. It is really interesting, though. I thought it was really interesting when we talked, Kristen, on the bus, about how students are using this in ways that aren't expected. But teachers too: what teachers actually wanted the writing bot to do is also surprising. Can you talk a little bit about the writing tutor you have? That was newer, it's only a year old, is that correct? And what did you find out from that? Because I always think this is so interesting; user behavior with something that's never existed before is such a fascinating thing to learn about.


    33:56

    Kristen DiCerbo
    Yeah. So the writing tutor, the writing coach, has been an interesting journey. We released initially just a place where students could put in an essay and get feedback on it. That was kind of what I was talking about earlier. And we had these different types of feedback it would give: you know, your introduction could grab the reader's attention more; here you introduce evidence, but you don't tie it into your argument. Those kinds of things. But we wanted to support the full writing experience. So we created the writing coach, where it would start off with your assignment and you could talk to Khanmigo about the assignment itself. You could go through brainstorming topics and brainstorming your ideas, then to an outline, and then to a draft where you would get that feedback.


    34:43

    Kristen DiCerbo
    Interestingly, a good number of teachers told us, you know what? I actually don't want those early stages. I do like the feedback they're getting; they get a chance to revise before I as a teacher see it. That's great. But I don't want those early things. And I was like, what? Those are really important for writing instruction. What it comes down to is the teachers wanted those early stages to happen with them, in the classroom. And I think this is a big lesson, and I've learned it over and over in ed tech, but sometimes you need to learn the same lesson again.


    35:19

    Kristen DiCerbo
    If you try to have technology do a thing that the humans view as a key part of their role and responsibilities, then you're going to run into real problems with implementation in classrooms, because the humans, the teachers and the students, are like, we don't want that. I was at a conference once, sitting next to a teacher, and someone was presenting on an engagement detector, a technology that would flag for teachers when students weren't engaged. And the teacher next to me just kind of mutters, I have an engagement detector. It's called my eyes.


    35:52

    Amanda Bickerstaff
    Yeah, right. Like, me, just being in the classroom.


    35:56

    Kristen DiCerbo
    That's the thing: I look across and I see what's going on. So, yeah, I think there are a lot of questions about what the AI does in the classroom, what the AI doesn't do in the classroom, and what that means.


    36:10

    Amanda Bickerstaff
    And I think this is a lesson that I wish we would take on: I don't think we do enough in ed tech in general of really identifying what the problem actually is. For most students, the problem usually isn't brainstorming. You know, it might be outlining. Or it's going to be part of the bell curve, right, where some kids will need this. But the idea of a writing coach starts to take away the human judgment component, right? And it's one-size-fits-all. It says you're going to get help at all these stages. And even if that wasn't what you intended to do, it flagged for users very quickly.


    36:54

    Amanda Bickerstaff
    I'm sure if you had split it up and said, this is the brainstorming part, for a kid that has trouble with brainstorming, you would have had fewer issues, because then it could be targeted and it's not a one-size-fits-all approach. And then the work of the classroom gets the kid to where they need to be, and the tool becomes the extra layer, the extra step, the movement towards fluency and capacity and the ability to actually do the thing. And I think that becomes a really interesting part of generative AI, because we kind of think of it as all or nothing, like a zero or a one. You either don't use it, or you use it for everything.


    37:32

    Amanda Bickerstaff
    And what we've seen from all of the research on impact so far is that it's when people use these tools in meaningful and directed ways that you see value. And I think this is a great example: you thought you were doing this great thing, but then it forced us all to understand this moment in time, and where we need to think about going all in versus the spaces in which we extend learning or support individuals. And I think that's really fascinating. I'm sure you're all enjoying this, but this is why I wanted to have this conversation, because I feel like so often we do not talk about the things that we learn.


    38:09

    Amanda Bickerstaff
    And in ed tech, we try so hard to be good and look good, but there are almost no gen AI ed tech tools that have any evidence of learning outcomes. Not just learning outcomes; there's very little evidence of, say, a lesson planning tool making more meaningful lesson plans that impact student outcomes. And I know that you all do this. One of the reasons I wanted to have you here is that you do a really good job of lifting up the research and the evidence that you have, which we'd love to see other organizations do, even if it's, honestly, we don't know. So can you talk about why you're comfortable sharing the things that don't work, since it's such a rare occurrence?


    38:53

    Kristen DiCerbo
    Yeah. So you just put down a whole bunch of things.


    38:56

    Amanda Bickerstaff
    I know, sorry, that was a lot. Everybody cool?


    39:02

    Kristen DiCerbo
    So a couple of thoughts. First, there's not a lot of reward for sharing what doesn't work, from a provider or developer perspective. Basically, when we share something that doesn't work, what we get is the folks that are really negative about AI pointing to it as an example and saying, see, we told you it wouldn't work, as opposed to what we think, which is: hey, when we found this out about the essay coach, we went back and iterated and said, okay, now you can choose which parts of the coach you want to use and which parts you don't. And now we're iterating and improving. So there's, you know, that kind of back and forth about whether to share it or not.


    39:48

    Kristen DiCerbo
    Another example of this: apart from the AI, we have historically released our efficacy studies. So we do large-scale studies, 500,000, 600,000 students, about whether working on Khan Academy leads to greater-than-expected learning gains. And we see that the students who use us for 18 hours a year, so that's about two hours a month, about 30 minutes a week, see those gains. So we absolutely talk about that. We know practice when you're learning a new skill is so important; that's the learning science. But we know that the majority of students don't get to that level of usage. And we have been transparent in saying that amongst those 500,000 students, about 5% of them actually get to that number. And again, we get a lot of pushback.


    40:45

    Kristen DiCerbo
    Well, see, it doesn't work. Well, the issue is the implementation, the engagement, all of the reasons why students don't get there. But what we do know is practice works. So let's talk about that. When we talk about the efficacy of our AI tools, we're not just inventing a tool for no reason. Our tutor is made and designed to work with that practice system. Students are getting more support during that practice, they're getting better feedback during that practice. So by improving the practice that we already know works, and helping students get more practice because they're getting that support, that's our hypothesis for why we think the AI will improve learning.


    41:28

    Kristen DiCerbo
    So first, I think it's important when you're thinking about all of these AI tools: what is the hypothesis, what's the theory of action, for why we think it will work? And so, you know, we're building on all of that existing evidence to build this tool, not just inventing something out of whole cloth.


    41:46

    Amanda Bickerstaff
    I mean, I find this to be really interesting in general. If you aren't familiar with my background, I was the CEO of an ed tech in Australia before I started AI for Education, and it was my first time really building technology for schools. And I always found it so interesting, even then, this lack of just talking about what works or doesn't work. We actually launched a wellbeing tool, wellbeing for learning. And it was great for me because I got to do all the coaching calls. So for every one of 41 schools of different types, I got to coach on their data on students' wellbeing and readiness to learn. And then what we did is we published what we learned alongside the tool.


    42:34

    Amanda Bickerstaff
    And that was like, you know what? There are some good things, other things we don't know. And you know what? It was really well received. Maybe right now it's very fraught with AI, because there are things like the cognitive debt study that was done by MIT, which said if you offload your thinking to an AI, you do very little cognitive work and you don't have a lot of ownership. Everyone was like, oh, AI will rot your brain. But the same thing would happen if I asked Kristen to do my paper.


    43:04

    Kristen DiCerbo
    Exactly.


    43:05

    Amanda Bickerstaff
    Or an essay mill.


    43:06

    Kristen DiCerbo
    And then.


    43:06

    Amanda Bickerstaff
    But what people don't see is, first of all, it's a very small study, and when the students wrote the outlines and then used AI to help them write, their cognitive load was quite high, their ownership was high, and those losses were actually regained. And I think this is where it is so important for us to have a nuanced conversation. There is not an organization in the world, from McDonald's to the smallest district and school in whatever part of the world, that has this figured out, including Khan Academy or OpenAI.


    43:41

    Amanda Bickerstaff
    And if we can't have these really honest conversations, then we're going to be in a pretty difficult position, because we're going to trust these tools to do things that, A, humans want to continue to do, kids or teachers want to continue to do, or B, that could actually be negatively impactful. So I just want to say thank you, Kristen, for your and Khan Academy's willingness. If you watch our stuff, we don't often work with organizations in this way, because we want to be tech agnostic. But I do think that when there is a good example of transparency, an evidence base, and a willingness to share, it's really important to support it. So anyway, that's a very long piece, but we have a couple questions, Kristen, if that's okay.


    44:27

    Kristen DiCerbo
    Absolutely.


    44:28

    Amanda Bickerstaff
    And then we'll let everybody go. And so I think that a couple questions are pretty technical, but I think I might focus on kind of the bigger ones, like where, like when you thought about the evidence base for. Let's just pick the math tutor, because that's definitely the one that's most baked. Like, what evidence base did you use? What was theory of action? And then. And then did you have to change it? Right. And I think that would be really interesting for people to know.


    44:54

    Kristen DiCerbo
    Yeah. So I will walk you through our simplified theory of action first, because then each piece has its evidence base. The idea is: access to Khanmigo leads to high-quality tutoring interactions, leads to increased cognitive engagement, leads to more of what we call skills to proficient. On Khan Academy, you move from attempted to familiar to proficient to mastered as you move through things. And we know if you get more skills to proficient, you will see greater gains on external assessments. So that's the basic chain. So if you take that, the first piece is high-quality tutoring interactions. All right, what does that look like? What does the evidence base look like? That's where we built on all of that understanding of what human tutors do. And there's literally research about specific moves.


    45:44

    Kristen DiCerbo
    So when do you probe for more information? When does the tutor summarize? When does the tutor lead into a new solution path? When do they not? How much support do they provide? There's research about all of that, which we then attempted to make Khanmigo follow. It's still a work in progress, because the trick there is that different students need different moves at different times. So we're working on that. But that's the research it's built on. Then we said, how do we evaluate whether it's doing those moves well? And that's the piece I was talking about: actually labeling all of those transcripts. Is it doing the summarizing? Is it doing the probing for more information? Is it doing procedural correction? All of those things. So that's the first step.


    46:31

    Kristen DiCerbo
    Because if it's not giving a high quality tutoring interaction, there's no reason to think that it's going to lead to learning. The next step is that cognitive engagement. There's a researcher named Micki Chi at Arizona State University, and she and her partner in the research, Ruth Wylie, put out what they call the ICAP framework. ICAP stands for interactive, constructive, active, passive, and if you read it backwards, as you move from passive to active to constructive to interactive learning, the learning increases. And to be clear, constructive doesn't mean constructivism. It means the student actually engaging and constructing meaning in the answer. So we said, how do we ensure that students are having those kinds of interactions? That's the research base we used on the cognitive engagement.
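
As a rough illustration of how the ICAP ordering can be encoded when scoring student turns, here is a toy sketch. The ordering of the levels follows the framework as described above; the keyword cues in the classifier are invented placeholders, nothing like a real coding scheme, which would use trained raters or an evaluated model.

```python
from enum import IntEnum

class ICAP(IntEnum):
    """ICAP levels, ordered so higher values mean deeper engagement."""
    PASSIVE = 0       # e.g., just receiving the tutor's message
    ACTIVE = 1        # e.g., selecting or copying an answer
    CONSTRUCTIVE = 2  # e.g., generating an explanation in their own words
    INTERACTIVE = 3   # e.g., building on the tutor's ideas in dialogue

def icap_level(student_turn):
    """Toy keyword heuristic for illustration only; real coding would use
    trained human raters or a carefully evaluated model."""
    text = student_turn.lower().strip()
    if text in {"idk", "bro idk", "ok"}:
        return ICAP.PASSIVE
    if "because" in text or "that means" in text:
        return ICAP.CONSTRUCTIVE
    if text.endswith("?"):
        return ICAP.INTERACTIVE
    return ICAP.ACTIVE

print(icap_level("idk").name)                              # PASSIVE
print(icap_level("it's 12 because 3 times 4 is 12").name)  # CONSTRUCTIVE
```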


    47:21

    Kristen DiCerbo
    And then we started looking again at those transcripts to see if we could find evidence of those different levels. And this is another place where we're being pretty clear: we certainly see some amazing interactions, the kind where I would like to hold it up and say, this is what we can get to. And we see a lot of "IDK." My favorite this week was "bro IDK." Or "bra."


    47:46

    Amanda Bickerstaff
    It should just be bra.


    47:48

    Kristen DiCerbo
    Bra. So again, if students aren't engaging with the tutor, we're not going to get learning outcomes. That's again part of our theory of action. We're now continuing to think about how we build those better interactions and what they look like. So laying out that theory of action points us both to the evidence base to build on and to what our evaluation is.


    48:14

    Amanda Bickerstaff
    Well, someone asked about attitudes, so just to TL;DR a little bit: kids who can do self-directed learning, or have resilience, or will continue through interactions, even incorrect ones, with a bot like Khanmigo, which isn't going to have the human component, are probably going to be your best use case right now. And we've heard this a couple of times, Kristen: a kid that just really needs the extra support, but can sit and engage and construct like that. This is a great opportunity, because a lot of the time they may not have access to a tutor at home, or they might not have access to additional support.


    48:54

    Amanda Bickerstaff
    But you're talking about the group of kids that are maybe already disengaged, or maybe don't find value in a bot interaction, which, you know, is okay, because it's weird and sometimes fails and can be strange, and that's just where we are right now. But I do think that if you're thinking about ways in which some scaffolding could help, that would be the teacher helping the young person move through the bot interaction to model best use cases. We could start to bring in the people side of what we can do. And again, there's no one-size-fits-all. But maybe you can choose those kids, maybe that's 25%, that are in a really good spot in terms of their ability to really engage. Right.


    49:37

    Amanda Bickerstaff
    With a bot like this, that frees you up potentially to build those skill sets more directly with the students where it's not there yet. You know, they're not quite ready, it's not their preferred way, they have some other initial needs. So I think that's a really fascinating component.


    49:54

    Kristen DiCerbo
    I have one good story I want to tell you. There's a teacher in Newark, New Jersey, actually a science teacher, and she felt like her kids weren't asking good questions of Khanmigo. They just didn't know what good questions to ask in a science class. So first they took time to work out what good science questions are. They sourced a whole bunch of questions, and they picked their favorite top five question stems that would be good more generically. She then printed them out on three-by-five cards that they keep on their desks. And I just love the juxtaposition of this old school paper with the best.


    50:31

    Amanda Bickerstaff
    Technology, and then the Post-it.


    50:33

    Kristen DiCerbo
    Right.


    50:34

    Amanda Bickerstaff
    If that is not an anecdote for education, there isn't one. Like the sentence starter that's just on your desk if you're struggling. That's sincere. I love that. And you know what, it's real. One of our instruction and assessment courses started today, Kristen, and I was talking about how much paper we have that we just never look at again. Like those Post-its and computer vision: it's kind of a silly use case, but take a photo of the Post-its and get feedback. It's such a funny thing. It doesn't have to be either/or. It can be both, I think, is the best approach.


    51:11

    Amanda Bickerstaff
    Okay, I have one tactical question from the group, because we have a very international crowd. They would like to know when you are expanding Khanmigo into other places across Europe and potentially the world.


    51:21

    Kristen DiCerbo
    So that's, that's.


    51:23

    Amanda Bickerstaff
    I'm sure you'll have a great answer for that right now, but if you might.


    51:26

    Kristen DiCerbo
    No. So our teacher tools are free for teachers in about 40 countries. The reason we are able to do that is because these AI interactions cost money. We are grateful for a partnership with Microsoft, where Microsoft has covered the cost of those interactions with our teacher tools, so we don't have the bill every month of teachers from 40 countries working through that. The other option, of course, is that we charge for Khanmigo, and we do that in the U.S. The issue is we're a nonprofit and we've never charged for anything, so we don't have entities set up in all those countries to collect tax money, pay the taxes, and work through it all.


    52:11

    Kristen DiCerbo
    And we have a very small finance and legal team that makes sure we're meeting all of the legal requirements for AI in every country we want to go into. So I apologize that it's taking so long for the student version of this to be out in other countries. But it literally is that we have to charge for it so that doing this doesn't bankrupt Khan Academy.


    52:36

    Amanda Bickerstaff
    Yeah, we talked about foundation models. Every single time a kid responds, there's going to be an actual cost, which is very different from other technologies. If you were using a tutor hosted on an application that wasn't generative AI, you'd have to pay hosting fees, and they could be higher based on how many people use it. But this is different, because it's a volume thing. Every single time you hit go, there's a cost: an environmental cost, an actual cost. And I think that's part of why there was a question about the student version. It's going to be a lot of high touch. The tools themselves are affordable-ish because they're structured; there's very little massive back-and-forth, like 95 prompts.
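
To make the volume-cost point concrete, here is a back-of-the-envelope sketch. Every token count and per-token price below is an assumed placeholder, not a real OpenAI or Khan Academy number.

```python
# Back-of-the-envelope per-interaction LLM cost. All numbers are
# illustrative assumptions, not real prices or real usage figures.
PRICE_PER_1K_INPUT_TOKENS = 0.005    # assumed USD
PRICE_PER_1K_OUTPUT_TOKENS = 0.015   # assumed USD

def cost_per_turn(input_tokens, output_tokens):
    """Cost of one student turn: context sent in, plus the model's reply."""
    return (input_tokens / 1000) * PRICE_PER_1K_INPUT_TOKENS \
         + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT_TOKENS

turn = cost_per_turn(2_000, 300)     # assume ~2k tokens of context sent, ~300 back
monthly = turn * 50_000 * 20 * 30    # assume 50k students x 20 turns/day x 30 days
print(f"~${turn:.4f} per turn, ~${monthly:,.0f} per month")
```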


    53:21

    Amanda Bickerstaff
    It's all structured. But with the student interaction, it can be quite intensive in the amount of compute and API calls. So that's something to consider. And that's why a lot of tools right now are asking you to pay very early, even if they don't have good evidence, because it would be too expensive for them to run the way traditional applications run. It's something to consider. Okay, last question, and we're going to end here. And I'm so excited, because I get to see Kristen in a couple weeks in Arizona for a big piece that we're not going to announce yet, but we're going to do a pretty amazing approach to thinking through what GenAI is going to really mean for education as we move forward. But there was a question I want to ask you, because we like balance.


    54:02

    Amanda Bickerstaff
    What are you most optimistic about in terms of GenAI in education? And then maybe what you're most concerned about. And maybe start with the concern first, so we can end with the positive.


    54:12

    Kristen DiCerbo
    That sounds like a good strategy. I think the thing that keeps me up at night, I'll say, is just that it's almost impossible to know where this is going. Three years ago, would I ever have predicted that we'd be in this place now, with where things are? No, I had no idea that it would be this roller coaster that we're on. And the same holds true for the next five years. A lot of times people will ask me, what do you think is going to happen? What's your vision for the future? And it's so difficult to know. And that makes it really hard in a place like education, where we do not move quickly, and for good reason, because we don't want our education system swaying with the fads of the time.


    55:00

    Kristen DiCerbo
    But it also makes it really difficult to plan ahead and figure out what's coming in the future. That's certainly what worries me. I think, oh gosh, where are we going to end up here? Is it going to be the better version that we are working towards, or is it going to be the dystopia where kids have offloaded their entire cognitive engagement to these tools and we haven't found a way around it? So that certainly is my concern. What I'm optimistic about is actually the short term: the ability to bring in more of the visual work, so that the AI, the student, and the vision capability can all work together, whether you're drawing something or working on a math problem.


    55:52

    Kristen DiCerbo
    That we get to the point where it's not just text back and forth, but a voice interaction, almost like with a tutor, and it can see the things that students are acting on. I think the engagement piece of that has the potential to be much more like how students naturally do their work, instead of having to use the keyboard and calculator and all that. So I think that has a lot of potential, and we're not that far from it.


    56:21

    Amanda Bickerstaff
    I agree. And I think, also, one of the things that's funny about a chatbot is that most kids don't want to answer 100 questions. They want you to help them. So to be able to actually get real-time feedback that's more directed, like a teacher would give, I totally agree. But I just want to say thank you. First of all, everyone, thanks for hanging out and asking great questions and being engaged. I think that.


    56:45

    Kristen DiCerbo
    Sorry we couldn't get to all the questions.


    56:46

    Amanda Bickerstaff
    I know we didn't get to all of them. But you know what?


    56:48

    Kristen DiCerbo
    But here we are maybe in six.


    56:50

    Amanda Bickerstaff
    Maybe at the next big epoch, you know, we can have another conversation. But I just want to say thank you to Kristen as well. I think I've already said it, but the thing we need now is honesty, transparency, and ethics. And if we can't do that, then we're doing a disservice to our young people and our teachers and our families. So if you're watching this and you are making decisions about buying, buy tools from providers that are willing to share evidence, to talk transparently about what they're doing and how, and to give space for real understanding of the impact on young people. So thanks, everybody. Thank you so much, Kristen. We appreciate it. We'll send the recording tomorrow. And I hope everyone has a great morning, night, day.


    57:36

    Amanda Bickerstaff
    I mean, everyone's in different places. We just appreciate you for joining. And have a wonderful day. Thanks, everyone.
