AI in Education
The leap into a new era of machine intelligence carries risks and challenges, but also plenty of promise
In Neal Stephenson’s 1995 science fiction novel, The Diamond Age, readers meet Nell, a young girl who comes into possession of a highly advanced book, The Young Lady’s Illustrated Primer. The book is not the usual static collection of texts and images but a deeply immersive tool that can converse with the reader, answer questions, and personalize its content, all in service of educating and motivating a young girl to be a strong, independent individual.
Such a device, even after the introduction of the Internet and tablet computers, has remained in the realm of science fiction—until now. Artificial intelligence, or AI, took a giant leap forward with the introduction in November 2022 of ChatGPT, an AI technology capable of producing remarkably creative responses and sophisticated analysis through human-like dialogue. It has triggered a wave of innovation, some of which suggests we might be on the brink of an era of interactive, super-intelligent tools not unlike the book Stephenson dreamed up for Nell.
Sundar Pichai, Google’s CEO, calls artificial intelligence “more profound than fire or electricity or anything we have done in the past.” Reid Hoffman, the founder of LinkedIn and current partner at Greylock Partners, says, “The power to make positive change in the world is about to get the biggest boost it’s ever had.” And Bill Gates has said that “this new wave of AI is as fundamental as the creation of the microprocessor, the personal computer, the Internet, and the mobile phone.”
Over the last year, developers have released a dizzying array of AI tools that can generate text, images, music, and video with no need for complicated coding but simply in response to instructions given in natural language. These technologies are rapidly improving, and developers are introducing capabilities that would have been considered science fiction just a few years ago. AI is also raising pressing ethical questions around bias, appropriate use, and plagiarism.
In the realm of education, this technology will influence how students learn, how teachers work, and ultimately how we structure our education system. Some educators and leaders look forward to these changes with great enthusiasm. Sal Kahn, founder of Khan Academy, went so far as to say in a TED talk that AI has the potential to effect “probably the biggest positive transformation that education has ever seen.” But others warn that AI will enable the spread of misinformation, facilitate cheating in school and college, kill whatever vestiges of individual privacy remain, and cause massive job loss. The challenge is to harness the positive potential while avoiding or mitigating the harm.
What Is Generative AI?
Artificial intelligence is a branch of computer science that focuses on creating software capable of mimicking behaviors and processes we would consider “intelligent” if exhibited by humans, including reasoning, learning, problem-solving, and exercising creativity. AI systems can be applied to an extensive range of tasks, including language translation, image recognition, navigating autonomous vehicles, detecting and treating cancer, and, in the case of generative AI, producing content and knowledge rather than simply searching for and retrieving it.
“Foundation models” in generative AI are systems trained on a large dataset to learn a broad base of knowledge that can then be adapted to a range of different, more specific purposes. This learning method is self-supervised, meaning the model learns by finding patterns and relationships in the data it is trained on.
Large Language Models (LLMs) are foundation models that have been trained on a vast amount of text data. For example, the training data for OpenAI’s GPT model consisted of web content, books, Wikipedia articles, news articles, social media posts, code snippets, and more. OpenAI’s GPT-3 models underwent training on a staggering 300 billion “tokens” or word pieces, using more than 175 billion parameters to shape the model’s behavior—nearly 100 times more data than the company’s GPT-2 model had.
By doing this analysis across billions of sentences, LLM models develop a statistical understanding of language: how words and phrases are usually combined, what topics are typically discussed together, and what tone or style is appropriate in different contexts. That allows it to generate human-like text and perform a wide range of tasks, such as writing articles, answering questions, or analyzing unstructured data.
LLMs include OpenAI’s GPT-4, Google’s PaLM, and Meta’s LLaMA. These LLMs serve as “foundations” for AI applications. ChatGPT is built on GPT-3.5 and GPT-4, while Bard uses Google’s Pathways Language Model 2 (PaLM 2) as its foundation.
Some of the best-known applications are:
ChatGPT 3.5. The free version of ChatGPT released by OpenAI in November 2022. It was trained on data only up to 2021, and while it is very fast, it is prone to inaccuracies.
ChatGPT 4.0. The newest version of ChatGPT, which is more powerful and accurate than ChatGPT 3.5 but also slower, and it requires a paid account. It also has extended capabilities through plug-ins that give it the ability to interface with content from websites, perform more sophisticated mathematical functions, and access other services. A new Code Interpreter feature gives ChatGPT the ability to analyze data, create charts, solve math problems, edit files, and even develop hypotheses to explain data trends.
Microsoft Bing Chat. An iteration of Microsoft’s Bing search engine that is enhanced with OpenAI’s ChatGPT technology. It can browse websites and offers source citations with its results.
Google Bard. Google’s AI generates text, translates languages, writes different kinds of creative content, and writes and debugs code in more than 20 different programming languages. The tone and style of Bard’s replies can be finetuned to be simple, long, short, professional, or casual. Bard also leverages Google Lens to analyze images uploaded with prompts.
Anthropic Claude 2. A chatbot that can generate text, summarize content, and perform other tasks, Claude 2 can analyze texts of roughly 75,000 words—about the length of The Great Gatsby—and generate responses of more than 3,000 words. The model was built using a set of principles that serve as a sort of “constitution” for AI systems, with the aim of making them more helpful, honest, and harmless.
These AI systems have been improving at a remarkable pace, including in how well they perform on assessments of human knowledge. OpenAI’s GPT-3.5, which was released in March 2022, only managed to score in the 10th percentile on the bar exam, but GPT-4.0, introduced a year later, made a significant leap, scoring in the 90th percentile. What makes these feats especially impressive is that OpenAI did not specifically train the system to take these exams; the AI was able to come up with the correct answers on its own. Similarly, Google’s medical AI model substantially improved its performance on a U.S. Medical Licensing Examination practice test, with its accuracy rate jumping to 85 percent in March 2021 from 33 percent in December 2020.
These two examples prompt one to ask: if AI continues to improve so rapidly, what will these systems be able to achieve in the next few years? What’s more, new studies challenge the assumption that AI-generated responses are stale or sterile. In the case of Google’s AI model, physicians preferred the AI’s long-form answers to those written by their fellow doctors, and nonmedical study participants rated the AI answers as more helpful. Another study found that participants preferred a medical chatbot’s responses over those of a physician and rated them significantly higher, not just for quality but also for empathy. What will happen when “empathetic” AI is used in education?
Other studies have looked at the reasoning capabilities of these models. Microsoft researchers suggest that newer systems “exhibit more general intelligence than previous AI models” and are coming “strikingly close to human-level performance.” While some observers question those conclusions, the AI systems display an increasing ability to generate coherent and contextually appropriate responses, make connections between different pieces of information, and engage in reasoning processes such as inference, deduction, and analogy.
Despite their prodigious capabilities, these systems are not without flaws. At times, they churn out information that might sound convincing but is irrelevant, illogical, or entirely false—an anomaly known as “hallucination.” The execution of certain mathematical operations presents another area of difficulty for AI. And while these systems can generate well-crafted and realistic text, understanding why the model made specific decisions or predictions can be challenging.
The Importance of Well-Designed Prompts
Using generative AI systems such as ChatGPT, Bard, and Claude 2 is relatively simple. One has only to type in a request or a task (called a prompt), and the AI generates a response. Properly constructed prompts are essential for getting useful results from generative AI tools. You can ask generative AI to analyze text, find patterns in data, compare opposing arguments, and summarize an article in different ways (see sidebar for examples of AI prompts).
One challenge is that, after using search engines for years, people have been preconditioned to phrase questions in a certain way. A search engine is something like a helpful librarian who takes a specific question and points you to the most relevant sources for possible answers. The search engine (or librarian) doesn’t create anything new but efficiently retrieves what’s already there.
Generative AI is more akin to a competent intern. You give a generative AI tool instructions through prompts, as you would to an intern, asking it to complete a task and produce a product. The AI interprets your instructions, thinks about the best way to carry them out, and produces something original or performs a task to fulfill your directive. The results aren’t pre-made or stored somewhere—they’re produced on the fly, based on the information the intern (generative AI) has been trained on. The output often depends on the precision and clarity of the instructions (prompts) you provide. A vague or poorly defined prompt might lead the AI to produce less relevant results. The more context and direction you give it, the better the result will be. What’s more, the capabilities of these AI systems are being enhanced through the introduction of versatile plug-ins that equip them to browse websites, analyze data files, or access other services. Think of this as giving your intern access to a group of experts to help accomplish your tasks.
One strategy in using a generative AI tool is first to tell it what kind of expert or persona you want it to “be.” Ask it to be an expert management consultant, a skilled teacher, a writing tutor, or a copy editor, and then give it a task.
Prompts can also be constructed to get these AI systems to perform complex and multi-step operations. For example, let’s say a teacher wants to create an adaptive tutoring program—for any subject, any grade, in any language—that customizes the examples for students based on their interests. She wants each lesson to culminate in a short-response or multiple-choice quiz. If the student answers the questions correctly, the AI tutor should move on to the next lesson. If the student responds incorrectly, the AI should explain the concept again, but using simpler language.
Previously, designing this kind of interactive system would have required a relatively sophisticated and expensive software program. With ChatGPT, however, just giving those instructions in a prompt delivers a serviceable tutoring system. It isn’t perfect, but remember that it was built virtually for free, with just a few lines of English language as a command. And nothing in the education market today has the capability to generate almost limitless examples to connect the lesson concept to students’ interests.
Chained prompts can also help focus AI systems. For example, an educator can prompt a generative AI system first to read a practice guide from the What Works Clearinghouse and summarize its recommendations. Then, in a follow-up prompt, the teacher can ask the AI to develop a set of classroom activities based on what it just read. By curating the source material and using the right prompts, the educator can anchor the generated responses in evidence and high-quality research.
However, much like fledgling interns learning the ropes in a new environment, AI does commit occasional errors. Such fallibility, while inevitable, underlines the critical importance of maintaining rigorous oversight of AI’s output. Monitoring not only acts as a crucial checkpoint for accuracy but also becomes a vital source of real-time feedback for the system. It’s through this iterative refinement process that an AI system, over time, can significantly minimize its error rate and increase its efficacy.
Uses of AI in Education
In May 2023, the U.S. Department of Education released a report titled Artificial Intelligence and the Future of Teaching and Learning: Insights and Recommendations. The department had conducted listening sessions in 2022 with more than 700 people, including educators and parents, to gauge their views on AI. The report noted that “constituents believe that action is required now in order to get ahead of the expected increase of AI in education technology—and they want to roll up their sleeves and start working together.” People expressed anxiety about “future potential risks” with AI but also felt that “AI may enable achieving educational priorities in better ways, at scale, and with lower costs.”
AI could serve—or is already serving—in several teaching-and-learning roles:
Instructional assistants. AI’s ability to conduct human-like conversations opens up possibilities for adaptive tutoring or instructional assistants that can help explain difficult concepts to students. AI-based feedback systems can offer constructive critiques on student writing, which can help students fine-tune their writing skills. Some research also suggests certain kinds of prompts can help children generate more fruitful questions about learning. AI models might also support customized learning for students with disabilities and provide translation for English language learners.
Teaching assistants. AI might tackle some of the administrative tasks that keep teachers from investing more time with their peers or students. Early uses include automated routine tasks such as drafting lesson plans, creating differentiated materials, designing worksheets, developing quizzes, and exploring ways of explaining complicated academic materials. AI can also provide educators with recommendations to meet student needs and help teachers reflect, plan, and improve their practice.
Parent assistants. Parents can use AI to generate letters requesting individualized education plan (IEP) services or to ask that a child be evaluated for gifted and talented programs. For parents choosing a school for their child, AI could serve as an administrative assistant, mapping out school options within driving distance of home, generating application timelines, compiling contact information, and the like. Generative AI can even create bedtime stories with evolving plots tailored to a child’s interests.
Administrator assistants. Using generative AI, school administrators can draft various communications, including materials for parents, newsletters, and other community-engagement documents. AI systems can also help with the difficult tasks of organizing class or bus schedules, and they can analyze complex data to identify patterns or needs. ChatGPT can perform sophisticated sentiment analysis that could be useful for measuring school-climate and other survey data.
Though the potential is great, most teachers have yet to use these tools. A Morning Consult and EdChoice poll found that while 60 percent say they’ve heard about ChatGPT, only 14 percent have used it in their free time, and just 13 percent have used it at school. It’s likely that most teachers and students will engage with generative AI not through the platforms themselves but rather through AI capabilities embedded in software. Instructional providers such as Khan Academy, Varsity Tutors, and DuoLingo are experimenting with GPT-4-powered tutors that are trained on datasets specific to these organizations to provide individualized learning support that has additional guardrails to help protect students and enhance the experience for teachers.
Google’s Project Tailwind is experimenting with an AI notebook that can analyze student notes and then develop study questions or provide tutoring support through a chat interface. These features could soon be available on Google Classroom, potentially reaching over half of all U.S. classrooms. Brisk Teaching is one of the first companies to build a portfolio of AI services designed specifically for teachers—differentiating content, drafting lesson plans, providing student feedback, and serving as an AI assistant to streamline workflow among different apps and tools.
Providers of curriculum and instruction materials might also include AI assistants for instant help and tutoring tailored to the companies’ products. One example is the edX Xpert, a ChatGPT-based learning assistant on the edX platform. It offers immediate, customized academic and customer support for online learners worldwide.
Regardless of the ways AI is used in classrooms, the fundamental task of policymakers and education leaders is to ensure that the technology is serving sound instructional practice. As Vicki Phillips, CEO of the National Center on Education and the Economy, wrote, “We should not only think about how technology can assist teachers and learners in improving what they’re doing now, but what it means for ensuring that new ways of teaching and learning flourish alongside the applications of AI.”
What's Your Reaction?