Automated Mentoring with ChatGPT – O’Reilly

Ethan and Lilach Mollick’s paper “Assigning AI: Seven Approaches for Students with Prompts” explores seven ways to use AI in the classroom. (While this paper is eminently readable, there is a non-academic version on Ethan Mollick’s Substack.) The paper describes seven roles that an AI bot like ChatGPT could play in the educational process: mentor, tutor, coach, student, teammate, simulator, and tool. For each role, it includes a detailed example of a prompt that can be used to implement that role, along with an example of a ChatGPT session using the prompt, risks of using the prompt, guidelines for teachers, instructions for students, and instructions to help teachers create their own prompts.

The mentor role is particularly important to the work we do at O’Reilly to train people in new technical skills. Programming (like any other skill) is not just about learning the syntax and semantics of a programming language; it’s about learning to solve problems effectively. That requires a mentor; Tim O’Reilly always said that our books should be like “someone smart and experienced looking over your shoulder and making recommendations.” So I decided to try out the mentor prompt with some short programs I’ve written. Here’s what I learned – not specifically about programming, but about ChatGPT and automated mentoring. I won’t reproduce the session (it was quite long). And I’ll say this now, and again at the end: what ChatGPT can currently do has its limits, but it will certainly get better, and probably quickly.

First attempt: Ruby and prime numbers

I first tried a Ruby program I wrote about 10 years ago: a simple prime number sieve. Maybe I’m obsessed with prime numbers, but I chose this program because it’s relatively short and because I hadn’t touched it in years, so I was somewhat unfamiliar with how it worked. I started by pasting in the full prompt from the paper (it’s long), answering ChatGPT’s preliminary questions about what I wanted to accomplish and my background, and pasting in the Ruby script.

ChatGPT responded with some pretty basic advice about following common Ruby naming conventions and avoiding inline comments. (Rubyists used to believe that code should document itself. Unfortunately.) It also pointed out a puts() method call within the main loop of the program. That’s interesting – that puts() was there for debugging, and I had obviously forgotten to take it out. It also raised a useful point about security: while a prime number sieve poses few security issues, reading command line arguments directly from ARGV rather than using an option-parsing library could leave the program open to attack.

It also gave me a new version of the program with these changes made. Rewriting the program wasn’t appropriate: a mentor should comment and give advice, but shouldn’t rewrite your work. That should be left to the learner. However, it isn’t a serious problem. To prevent the rewrite, all you need to do is add “Do not rewrite the program” to the prompt.

Second attempt: Python and data in spreadsheets

My next experiment was with a short Python program that used the Pandas library to analyze survey data stored in an Excel spreadsheet. This program had a few problems – as we’ll see.

ChatGPT’s Python mentoring wasn’t much different from its Ruby mentoring: it suggested a few stylistic changes, like using snake-case variable names, using f-strings (I don’t know why I didn’t; they’re one of my favorite features), encapsulating more of the program’s logic in functions, and adding some exception checks to catch possible errors in the Excel input file. It also objected to my use of “No Answer” to fill empty cells. (Pandas usually converts empty cells to NaN, “not a number”, which is frustratingly difficult to deal with.) Useful feedback, though hardly earth-shattering. It would be hard to argue against any of this advice, but at the same time, there is nothing I would consider particularly insightful. If I were a student, I would soon become frustrated if two or three programs got similar responses.
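To make the parenthetical about NaN concrete, here is a minimal sketch of why empty cells are awkward and what filling them with a sentinel string does. The data is invented for illustration; it is not taken from the program discussed here.

```python
import numpy as np
import pandas as pd

# Hypothetical survey column; np.nan stands in for an empty spreadsheet cell.
answers = pd.Series(["Yes", np.nan, "No", np.nan])

# NaN never compares equal to anything, including itself,
# so an equality test cannot find the empty cells.
found_by_equality = (answers == np.nan).sum()

# isna() is the supported way to detect missing values.
found_by_isna = answers.isna().sum()

# Filling with a sentinel string turns the empty cells into an ordinary category.
filled = answers.fillna("No Answer")

print(found_by_equality, found_by_isna)
print(filled.tolist())
```

The trade-off is that a sentinel such as “No Answer” behaves like any other answer in later counts and percentages, so it has to be filtered out deliberately.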

Of course, if my Python were really that good, a few superficial comments on programming style might have been all I needed – but my program wasn’t that good. So I decided to push ChatGPT a little harder. First, I told it that I suspected the program could be simplified by using the dataframe.groupby() function in the Pandas library. (I rarely use groupby(), for no good reason.) ChatGPT agreed – and while it’s nice when a supercomputer agrees with you, this is hardly a radical suggestion. It’s a suggestion I would have expected from a mentor who has used Python and Pandas to work with data. I had to make the suggestion myself.
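As a rough illustration of the kind of simplification groupby() offers, the sketch below computes the same per-group percentages twice: once with a manual loop and once with groupby(). The DataFrame and its column names (role, uses_ai) are hypothetical and do not come from the program discussed here.

```python
import pandas as pd

# Hypothetical survey data; the column names are invented for illustration.
responses = pd.DataFrame({
    "role": ["developer", "developer", "manager", "manager", "developer"],
    "uses_ai": ["Yes", "No", "Yes", "Yes", "Yes"],
})

# Manual approach: loop over each distinct role and count answers by hand.
manual = {}
for role in responses["role"].unique():
    subset = responses[responses["role"] == role]
    manual[role] = (subset["uses_ai"] == "Yes").sum() / len(subset) * 100

# groupby approach: one expression computes the same percentages per role.
grouped = (
    responses.groupby("role")["uses_ai"]
    .apply(lambda answers: (answers == "Yes").mean() * 100)
)

print(manual)
print(grouped)
```

Both versions give the share of “Yes” answers per role; the groupby() version removes the explicit loop and the intermediate dictionary.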

ChatGPT obligingly rewrote the code – again, I probably should have told it not to. The resulting code looked reasonable, but it introduced a not-so-subtle change to the program’s behavior: it filtered out the “No Answer” rows after computing the percentages, rather than before. It’s important to watch for changes like this when asking ChatGPT for help with programming. They are common and may look harmless, but they can change the output. (A rigorous test suite would have helped.) This was an important lesson: you really can’t assume that everything ChatGPT does is correct. Even if the code is syntactically correct and runs without error messages, ChatGPT can introduce changes that lead to errors. Testing has always been important (and underused); with ChatGPT, it’s even more so.
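The effect of that reordering is easy to reproduce. The sketch below uses invented data, not the survey from the program above, and computes percentages both ways so the difference is visible.

```python
import pandas as pd

# Hypothetical answers to one survey question; empty cells have already
# been filled with "No Answer".
answers = pd.Series(["Yes", "Yes", "No", "No Answer", "No Answer"])

# Filter first, then compute percentages: shares of actual respondents.
answered = answers[answers != "No Answer"]
filter_first = answered.value_counts(normalize=True) * 100

# Compute percentages first, then filter: shares of all rows,
# so the remaining percentages no longer add up to 100.
all_rows = answers.value_counts(normalize=True) * 100
filter_last = all_rows.drop("No Answer")

print(filter_first.round(1).to_dict())
print(filter_last.round(1).to_dict())
```

With this data, “Yes” is about 66.7% when the filter runs first but 40% when it runs last. Both versions run without errors, which is why a test that pins down the expected percentages is the more reliable safeguard.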

Now for the next test. I accidentally left out the last few lines of my program, which created a series of plots using Python’s Matplotlib library. While this omission did not affect the data analysis (the results were printed on the terminal), several lines of code arranged the data in a way that was convenient for the graphing functions. These lines were now a kind of “dead code”: code that is executed but has no effect on the result. Here, too, I would have expected a human mentor to be all over this. I would have expected them to say, “Look at the graph_data data structure. Where is this data used? If it isn’t used, why is it there?” I didn’t get any help like that. A mentor who doesn’t point out problems in the code isn’t much of a mentor.

So in my next prompt, I asked for suggestions on how to clean up the dead code. ChatGPT praised me for my insight and agreed that removing dead code was a good idea. But again, I don’t want a mentor to praise me for having good ideas. I want a mentor to notice what I should have noticed but didn’t. I want a mentor to teach me to watch for common programming mistakes, and to understand that if you’re not careful, source code inevitably degrades over time – even as it’s improved and restructured.

ChatGPT also rewrote my program again. This last rewrite was wrong – this version didn’t work. (It might have done better if I had used Code Interpreter, although Code Interpreter does not guarantee correctness.) That both is and isn’t a problem. It’s another reminder that, if correctness is a criterion, you must carefully review and test everything ChatGPT generates. But – in the context of mentoring – I should have written a prompt that suppressed code generation; rewriting your program isn’t the mentor’s job. Also, I don’t think it’s a big problem if a mentor occasionally gives you bad advice. We are all human (at least most of us). That’s part of the learning experience. And it’s important for us to find applications for AI where errors are tolerable.

So what is the result?

  • ChatGPT is good at giving basic advice. But anyone who is serious about learning will soon want advice that goes beyond the basics.
  • ChatGPT can recognize when the user makes good suggestions that go beyond simple generalities, but it is unable to make those suggestions itself. This happened twice: when I had to ask it about groupby(), and when I asked it about cleaning up the dead code.
  • Ideally, a mentor should not generate code. This can be easily fixed. However, if you want ChatGPT to generate code that implements its suggestions, you must carefully check for bugs, some of which may be subtle changes in program behavior.

Not there yet

Mentoring is an important application for language models, not least because it sidesteps one of their biggest problems: their tendency to make mistakes. A mentor who occasionally makes a bad suggestion isn’t really a problem; following the suggestion and discovering that it’s a dead end is itself an important learning experience. You shouldn’t believe everything you hear, even if it comes from a reliable source. And a mentor really has no business generating code, incorrect or otherwise.

I’m more concerned about ChatGPT’s difficulty in providing truly insightful advice, the kind of advice you really want from a mentor. It can offer advice if you ask it about specific problems – but that’s not enough. A mentor must help a student explore problems; a student who is already aware of the problem is well on the way to solving it and may not need the mentor at all.

ChatGPT and other language models are bound to improve, and their ability to act as a mentor will be important for people building new types of learning experiences. But they haven’t arrived yet. If you’re looking for a mentor, you’re on your own for now.