The AI Coach

When AI Hallucinates

Danielle Gopen & Paul Fung Episode 6


Fake GPTs on OpenAI, LLMs confidently giving absolutely wrong answers, a hallucination rate guessing game across different AI models (see if you can do better than Danielle did), Audi running on ChatGPT, and the importance of critical thinking when using GenAI. Bonus question: How can you tell if an LLM has hallucinated if you didn't know the right answer to begin with?

We love feedback and questions! Please reach out:
LinkedIn
Episode Directory


Mentions of companies, products, or resources in this podcast or show notes are for informational purposes only and do not constitute endorsements; we assume no liability for any actions taken based on this content.

Speaker 2:

So a few things happened this week that made me think today's topic should be AI hallucinations and when AI doesn't give you the result that you're looking for. So yesterday I asked you about what this fake Sora was on the OpenAI platform. I had gotten excited: oh, Sora is available? I didn't even see that press release or anything. But okay, yes, I'm going to try to make a little video clip. So I typed in my prompt and it came back with just an outline of how to make a movie, and I said no, I want you to make the movie. And it said no, I can't do that.

Speaker 2:

And so then I thought, I'm so confused, is the Sora that the public has access to not as good as the Sora we've been hearing about in the news, the one these other creators have been making these incredible videos with? And then I realized there was a weird little character next to the word Sora, and I thought, wait, is this a fake Sora on the OpenAI platform? And so I double-checked and sure enough it was. I actually reported it to OpenAI. If you look on the OpenAI platform, it shows you who has published the GPT that you're looking at, and it was not OpenAI. And so then I felt a little bit betrayed, like wait a second, I just gave it this prompt and I didn't even realize I was using the wrong thing. So it got me thinking about hallucinations, and let's jump in.

Speaker 1:

Yeah, I think that's hilarious. So you found a fake Sora GPT in ChatGPT. You actually go and use those GPTs? And were you searching for Sora, or how did you come across it? Is it popular? Did it come up at the top for you?

Speaker 2:

When I first went to the OpenAI platform and opened, you know, the page to search other GPTs, it shows you the grid of ones that are from OpenAI and then other popular ones, and it wasn't there. So I just typed Sora into the search, and it populated it. So then I thought, okay, maybe it just wasn't coming up at the top for some reason. I was tricked.

Speaker 1:

Interesting, interesting. Well, I'm not sure that trick would count as a hallucination, but I do think hallucinations are a good topic for us to talk about. I think people do understand that GPT and other LLMs hallucinate; I'm not sure people understand how to think about it, what it means, and, yeah, I guess the ramifications of it. So I think it'll be a fun topic to talk about.

Speaker 2:

Awesome. And then something else happened along these same lines that made me think of this. I was talking to a founder who has some interns working with him. He gave them some work to do and they came back with clearly AI-generated work, and he knew it was because whatever he had given to them, they had just run through, I assume, ChatGPT, I don't know which model, and it spit back out essentially what it had been given. It wasn't a hallucination per se, but it was not the right response and they didn't read it. They just gave it back to the founder, like, oh, here you go, this is what you asked for. And he, needless to say, was upset.

Speaker 2:

And we had this whole conversation about these younger kids who are using AI: what does that mean in terms of critical thinking? Because they're not critically thinking about the result that's being delivered back to them. If they had spent just a few minutes reading what was given back, they would say, oh, this doesn't make sense, this is not what I was trying to get to. And so I feel like that's the danger with hallucinations: if you don't know what the answer should be, these models spit it out so confidently, you just assume, okay, yeah, that's the information, and you run with it. Let's start with why these hallucinations happen.

Speaker 1:

Yeah, I think that's a good question, and I also have a really good example. Before I get into that, quickly I will say it's not just a danger for kids who are growing up with AI models. Last year, I think in 2023, some New York lawyers got fined for using GPT to create a legal brief, and that legal brief had cited something like six fake cases that were completely fabricated, and they actually got in trouble for that. So it's not just these kids who are growing up with it, it's us, it's adults, it's people in the workforce today, because it's so easy to use that it's very tempting for a lot of people to use it and then turn in work that hasn't been vetted.

Speaker 1:

Going back to your question of why hallucinations happen. So a hallucination is any time these LLMs, or ChatGPT, give you an answer that's either wrong or totally fabricated. You might ask it a question and it just gives you the wrong answer, but specifically, hallucination refers to answers that are wrong or completely made up, and the biggest challenge, as you said, is that it says it with 100% confidence. So if you were to ask, what is the capital of California, and instead of saying Sacramento it says San Leandro or something like that, it says it as if it is 100% confident and sure that that is the answer. And that makes it really hard to distinguish between a right answer and a wrong answer, especially when hallucinations only happen a small percentage of the time, depending on the type of task you're trying to do.

Speaker 2:

Just to add to that, there are also confabulations, and I guess the difference between the two is when the wrong answer is consistently wrong versus when the wrong answer is a one-off. So can you talk a little bit about that?

Speaker 1:

Yeah, interesting. I am not super familiar with the nuances of confabulation versus hallucination, but I can kind of infer what might be going on behind the scenes. So the way these models work, and we've talked about this in past episodes, is that they're probabilistic, or what we call non-deterministic, which basically means that for every word it's producing, it is doing a calculation of what it thinks the next best word to produce is. An example people will give online: if I were to say to you, Danielle, I'm going to start a phrase and then you're going to finish the phrase, so we can do this exercise right now, and I say "the cat in the hat," right, that's how these models work. Same as if I were to say "green eggs and ham."

Speaker 1:

Right. And the reason you know how to finish phrases like "the cat in the hat" or "green eggs and ham" is because you've seen these titles before, you've seen those phrases before. And so when we talk about training these models, we've trained them on large corpuses of data, and basically what the model does is it takes your input, it reasons on that input, it does some math on what it thinks you're talking about.

Speaker 1:

So there's a lot of fancy math in the background saying like what is the purpose of what you're asking me, what does this mean?

Speaker 1:

And then it starts to search for the right phrase to respond back to you with, the right output to give you. And so for every part of that phrase, it starts by building the beginning of the output and then predicting the next word, and the next word, and the next word, and that becomes the output, right.

Speaker 1:

And so, just like you were able to predict "the cat in the hat," ChatGPT would be able to predict "the cat in the hat." But every once in a while it doesn't know the next word that should come, and so it probabilistically chooses the wrong word or the wrong phrase, which leads to a hallucination. So instead of saying "the cat in the hat," it would say "the cat in the chair." And it would say it with such confidence, because the model truly thought the math was such that that word was the correct next word to give you. That's what's fascinating about these models: their confidence in their hallucinations is the same level of confidence as in the correct answers they're giving you.
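To make the next-word idea concrete, here is a toy sketch in Python. It is purely illustrative and not how any real model is implemented: the words and probabilities are invented, but it shows how a sampler will usually pick "hat" and occasionally, with nothing in the output to flag it, pick "chair" instead.

```python
# Toy illustration of probabilistic next-word prediction. The probabilities
# below are invented for the example; a real model computes them over a huge
# vocabulary. The point: the sampler occasionally returns a low-probability
# wrong word, and nothing in the output signals that it was a long shot.
import random

next_word_probs = {
    "hat": 0.93,     # the continuation seen most often in training data
    "chair": 0.04,   # plausible-sounding but wrong
    "garden": 0.03,
}

def sample_next_word(probs: dict[str, float]) -> str:
    """Pick one word in proportion to its assigned probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return random.choices(words, weights=weights, k=1)[0]

random.seed(0)
runs = [sample_next_word(next_word_probs) for _ in range(100)]
hats = runs.count("hat")
print(f"{hats} of 100 completions say 'hat'; {100 - hats} confidently say something else")
```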

Speaker 2:

Because they feel compelled to give an answer. The model can't just say, I don't know the answer to that prompt, I don't know what you're looking for. It almost has this need to please, to say, oh, I've been asked to do this thing, here you go, whether it's right or wrong.

Speaker 1:

Yeah, I think that's really funny. So I was doing a little bit of research for this episode on funny instances of hallucinations, and someone on Reddit was saying that GPT-4o tends to want to agree with you. It tends to be a people pleaser, and so if you state something as if it were true, it will continue on with that thought and extend it. So, John Cena is this famous wrestler in WWE. The prompt they had given it said, why is John Cena the ambassador to Spain? And it extended the thought, it ran with it. It actually said John Cena became the ambassador to Spain because yada, yada, yada, and it was completely fabricated. And what he found was, if you state something as if it's a truth, it wants to continue on that false truth.

Speaker 1:

Now, I will say in GPT-4o's defense, I wasn't able to reproduce this, and so I think one of the things OpenAI must be working on is getting it to fact-check itself and also not be such a people pleaser. When I was actually running these today, it was saying, oh, John Cena is not the ambassador to Spain, you must be mistaken, but if there's something I'm not aware of, please let me know.

Speaker 2:

Yes, it's funny you say fact-check itself, because I did see something about how they're trying to train the models to know if they're giving right or wrong information. How does that work exactly?

Speaker 1:

Yeah, there are a bunch of different techniques. There are some very fancy techniques that happen behind the scenes, and there are also some very simple ones. When we were talking about doing an episode on hallucinations, I was thinking about what advice we would give to someone who is getting outputs or responses and wondering, how do I know if it's a hallucination or not? One of the simplest, and I think funniest, techniques you can use is just to have GPT fact-check itself. So if you get an answer from GPT and it says, you know, "the cat in the chair," you would say something like, are you sure you're right on that? Can you think more about that answer and make sure you feel 100% correct? And oftentimes it'll actually catch its own mistakes, which I think is interesting.

Speaker 1:

And there are a bunch of techniques that could be going on behind the scenes, around training data, maybe guardrails they put in before it responds to you. There are also certain prompting techniques you can use, what they call chain of thought or tree of thought, that get the LLM to think through a task step by step, so it can do the task piece by piece, put that together, and give you the right answer, which reduces the rate of hallucinations. So I'm not sure what they're doing behind the scenes, I'm sure there's some very fancy math and some very advanced techniques, but as a user, there are some very simple things you can do, like just telling it to fact-check itself or check its answer, that can help you detect these hallucinations.
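As a rough illustration of the self-check and step-by-step prompting ideas mentioned here, a minimal sketch follows. The `ask_llm` function is a hypothetical placeholder for whatever chat client you actually use, and the prompt wording is just one way to phrase it.

```python
# Minimal sketch of "make it check its own answer." ask_llm() is a
# hypothetical stand-in for a real LLM client call; swap in your own.

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("replace with a call to the LLM of your choice")

def answer_with_self_check(question: str) -> str:
    # Chain-of-thought style nudge: ask for step-by-step reasoning first.
    first_pass = ask_llm(f"{question}\nThink through this step by step before answering.")

    # Then ask the model to audit the answer before you trust it.
    return ask_llm(
        "Here is a question and a proposed answer.\n"
        f"Question: {question}\n"
        f"Answer: {first_pass}\n"
        "Check the answer for factual errors or fabricated details. "
        "If anything looks wrong, give a corrected answer; otherwise repeat it."
    )
```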

Speaker 2:

Yes, and I've also heard that there is some new technology being developed to help detect these hallucinations, which will tell you whether that information is right or not. Are there any that you know of that you happen to like?

Speaker 1:

You know, I don't know of any off the top of my head. I can try to look into this and maybe we can get some links in our show notes. I do think what's funny about it is, we often talk about how much LLMs are like children, how they're like people, and one of the techniques could be: you get a response from GPT and you just ask another model, say Anthropic's. It's kind of like when you get a piece of information from a friend and you think, that sounds suspicious, I'm going to go ask someone else I know about this, right? So some of it is literally the same thing we do in our day-to-day lives, just applied to these models.
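The "ask someone else I know" idea can be sketched the same way: have a second, independent model review the first model's answer. Both functions below are hypothetical placeholders for real client calls to two different providers.

```python
# Sketch of cross-checking one model's answer with another model.
# Both ask_* functions are hypothetical placeholders.

def ask_model_a(prompt: str) -> str:
    raise NotImplementedError("e.g. a call to one provider's model")

def ask_model_b(prompt: str) -> str:
    raise NotImplementedError("e.g. a call to a different provider's model")

def cross_check(question: str) -> tuple[str, str]:
    answer = ask_model_a(question)
    verdict = ask_model_b(
        f"Question: {question}\n"
        f"Proposed answer: {answer}\n"
        "Does this answer contain factual errors or made-up details? "
        "Reply 'LOOKS OK' or explain what seems wrong."
    )
    return answer, verdict
```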

Speaker 2:

It's funny you say that, because there actually has been research now about how these models can police each other to correct some of these hallucinations. I do think that is very interesting. And, talking about funny examples, I've also thought about what we talked about in the last episode, about how Reddit and Quora are used for training data, and from there you've seen some funny hallucination examples come out. I don't know if you remember this one, I forget the exact prompt, something like how to be healthy, and ChatGPT said, in order to maintain one's health, you should eat one rock a day, and it had been pulled from some joke Reddit post.

Speaker 1:

That's amazing. One thing I do want to say about hallucinations is they can be funny or they can be problematic, depending on the task at hand. It can be funny if it's telling us to eat rocks, but if you're asking it about real health advice, or if a doctor is using a GPT, let's say in the future, to make a diagnosis, and it hallucinates...

Speaker 1:

There, it's really scary. And so I was thinking about how we as people use judgment to decide whether we think a piece of information that's been given to us is accurate or not. I was actually thinking about this health example earlier. If you saw a weird spot on your skin, on your arm or something like that, what do we do? A lot of us go to the internet, and then we have to decide whether or not we want to trust the information we find. And I think the way we trust things is we look at the brand: is it WebMD, is it Harvard Medical School? Then we trust those things.

Speaker 1:

And I thought about this earlier because you mentioned the Sora thing that you thought looked very real, and because it was on OpenAI's platform, a brand that you trust to some degree, you thought this must be real. And then you noticed something funny. It occurred to me that the reason we have a hard time with these hallucinations is that all the information we use for our judgment heuristics is not present in a ChatGPT conversation. You're just given text. You don't have an image to look at, you can't tell if the website looks spammy or full of ads, you can't see if the domain seems correct. There are all these heuristics we use to judge the quality of information we get from the internet that just don't exist when you're only working in a text-based interface, and I just think that's fascinating.

Speaker 2:

That is such a great point. And going back to the lawyers who used the ChatGPT output that cited fake cases, I mean cases that just outright didn't exist, I think that's a perfect example. No human lawyer can know every case that was ever presented and had a verdict, so they're thinking, okay, ChatGPT has delivered this, these cases seem reasonable enough, we'll use them. Honestly, they should have fact-checked the cases, as any good lawyer would, instead of just saying, not only is ChatGPT giving me information for this brief, it's actually citing sources, so it feels legitimate. Tangentially related to that...

Speaker 2:

So I'm sure you saw that recently OpenAI signed several partnership agreements with media companies, and the idea is that if ChatGPT responds with particular information that came from an article or some other content from that media company, it will provide the link and direct users to go see that article. But in all of the partnerships they've signed so far, those URLs are showing up as broken, because ChatGPT is essentially predicting what the URL should be; it's not the actual URL. And so OpenAI is saying, oh, we haven't fully fleshed this out yet, this will be ready when it's ready, as agreed in the terms of these contracts, but in the meantime, this is what you're getting. I think it's indicative of how things play out when there isn't attention on that part of the development.
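One practical habit the broken-link story suggests: before trusting a URL an LLM hands you, check that it actually resolves. Here is a small sketch using the Python requests library; the example URL is made up.

```python
# Quick sanity check for links an LLM cites: does the URL actually resolve?
# Requires the third-party `requests` library; the example URL is invented.
import requests

def link_resolves(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a non-error HTTP status."""
    try:
        resp = requests.head(url, allow_redirects=True, timeout=timeout)
        if resp.status_code == 405:  # some servers reject HEAD; fall back to GET
            resp = requests.get(url, allow_redirects=True, timeout=timeout, stream=True)
        return resp.status_code < 400
    except requests.RequestException:
        return False

# Prints True or False depending on whether the cited link actually exists.
print(link_resolves("https://example.com/article-the-model-cited"))
```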

Speaker 1:

Yeah, I think that's a really good point to bring up, this idea of citing sources. I was going to mention Perplexity AI, which, if people haven't used Perplexity yet, I highly recommend they do. Perplexity made a big splash on the scene as a competitor to Google. You go to Perplexity, you ask it questions, it will give you a response, but it will also cite the sources. At the top of the page it'll show you, say, the four sources it got the information from, and I think that's just going to be table stakes. A really important part of using GPT for factual tasks is understanding where the information is coming from. It kind of reminds me of something that's going to date myself a little bit.

Speaker 1:

Early days Wikipedia, right? When I was a kid, we weren't allowed to cite Wikipedia as a source, because in the early days the whole power of Wikipedia was that anyone could edit the wikis and add information, and so you couldn't tell if the information there was validated or not. Since then we've come a long way: you can tell which pieces of information on Wikipedia are cited, what the source is, who edited it, and it gets validated and all this stuff. But in the early days it wasn't like that, and that's what using GPT as an information source kind of feels like, using early Wikipedia, where it was kind of hit or miss.

Speaker 2:

Well, speaking of Perplexity, hasn't there been a little bit of scandal with them in the last day or so?

Speaker 1:

Has there been? Maybe I've missed it. I've been heads down on some stuff. What did I miss?

Speaker 2:

They are basically being accused of illegally scraping certain sites for training data. I think Wired came out and said that Perplexity scraped the Wired site despite Wired's attempt to block it, and there are some other questions about these things now.

Speaker 1:

This does not surprise me at all. I feel like for all these foundation models, the risk, I hate to say it, is probably worth the reward, right? They illegally scrape tons of sites, they build these great models, they put out a great product, they build a big brand. So I guess they just decided the juice was worth the squeeze, if you will.

Speaker 2:

Yeah, I guess Forbes is the one who brought it up first, saying that Perplexity had scraped behind the paywall, and they were upset about that, and then Wired jumped in after, and there have been all these other conversations around it. Yeah, that's something else to look into.

Speaker 1:

It all just came out in the last day or so, so it's still developing, but it's a little bit of a money grab there, right? And I don't mean that in a bad way, it's a fair thing. These are media companies; their media is their IP, their media is their worth. I don't foresee Forbes saying, never scrape us. I think what they're trying to say is, hey, we're going to put you on blast in public, there's a price on this data, and we would like you to pay us that price. And I think that's fair, right?

Speaker 2:

I think that's fair.

Speaker 1:

I mean, it's their IP right, it's their brand.

Speaker 2:

Yeah, so, yeah, perplexing. I mean, I would expect that to be the arrangement, to say, yes, we will share our IP with you, if they can figure out the actual financial terms of that. But for Perplexity to go in and essentially illegally scrape that without paying, I do think is wrong.

Speaker 1:

I think a fun topic for another episode could be: will LLMs be the savior of media? Because for the past few years there's been this idea that media has really been in decline, and especially that unbiased, high-integrity journalism has given way to clickbaity journalism, right?

Speaker 1:

Because the way for them to make money, since people aren't subscribing to newspapers and having them delivered to their house every Sunday anymore, is people clicking on individual articles, and so the headlines and articles are very clickbaity. But there's this whole new revenue source, where these foundation model providers are paying probably tens of millions, hundreds of millions of dollars to these media companies, and I wonder if that will provide some relief to the media companies so they won't have to be so clickbaity. I don't know, that'd be interesting to talk about.

Speaker 2:

Yes, and it seems like Forbes and Wired aren't the only ones who are upset. AWS is also now getting into this investigation, and I'm sure OpenAI, if it's infringing on the partnership agreements that they've signed, will also find out what's going on. And honestly, Perplexity is a competitor to Amazon and OpenAI. So I guess we'll see what happens in the next few weeks, if anything, but I can see there being a lot of tension over this, if it is true.

Speaker 1:

I was going to say, I'm glad you mentioned AWS because I have a fun little fact for you. So I have good news for you, Danielle, and bad news, just as you did for me last week. Which one would you like first?

Speaker 2:

I always like the bad news first.

Speaker 1:

So the bad news is that AWS's Titan Express model has one of the highest hallucination rates of all the major models. There's a leaderboard on Hugging Face, which is a company that hosts a lot of different open source models and things like that, and people host a lot of AI information there, and so there is a hallucination leaderboard. The hallucination rate for the Titan Express model is one of the higher ones among the major models. Would you like to guess what the hallucination rate is?

Speaker 2:

19%.

Speaker 1:

Oh, that is a good guess. It is a little bit lower: it is 9.4% for the Titan Express model. And I should also point out this is a benchmark of hallucination, so it really depends on the task. I've seen papers that say hallucination for complex legal tasks is as high as 40% or 50%, but this is a particular benchmark called the HHEM benchmark, which is not as complex as legal tasks. So, yeah, 9.4%. That sets the stage for what people should expect. Do you want to guess which model or company has the lowest hallucination rate, and what their rate was on this benchmark?

Speaker 2:

This is a random guess, but I'm going to guess the lowest is Azure.

Speaker 1:

Oh, okay, and what do you think their rate would be as the lowest rate?

Speaker 2:

Maybe like 7%.

Speaker 1:

Okay, so you do not have as much faith in LLMs as you could; maybe you should have a little more faith in them. So the lowest is OpenAI, specifically their GPT-4 Turbo model. Interestingly, GPT-4 Turbo hallucinates on this benchmark less than GPT-4 and even less than GPT-4o. So the best, or lowest, hallucination rate on this benchmark is OpenAI's GPT-4 Turbo at 2.5%, which is pretty good.

Speaker 1:

Oh, it's really low. I should point out, the way they track these is not through public usage of the model, where they somehow know every hallucination or non-hallucination. The way this benchmark works is you take the same set of tasks, run that set of tasks against different models, and you know the right answers, so you know when a model gives you wrong answers.
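In rough terms, the setup described here boils down to running the same tasks with known reference answers against each model and counting the misses. The real HHEM leaderboard is more involved (it scores generated summaries for faithfulness with a trained evaluator), so treat the sketch below, with made-up answers, only as an illustration of the rate arithmetic.

```python
# Toy version of a hallucination benchmark: same tasks, known answers,
# count the misses. The data below is invented purely for illustration.

def hallucination_rate(model_answers: list[str], reference_answers: list[str]) -> float:
    assert len(model_answers) == len(reference_answers)
    wrong = sum(
        1 for got, expected in zip(model_answers, reference_answers)
        if got.strip().lower() != expected.strip().lower()
    )
    return wrong / len(reference_answers)

reference = ["Sacramento", "hat", "Paris", "1969", "Pacific"]
model_out = ["Sacramento", "chair", "Paris", "1969", "Pacific"]
print(f"{hallucination_rate(model_out, reference):.1%}")  # 20.0% on this toy set
```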

Speaker 2:

Okay, that's helpful to know, also how they think about it. This is no dig at Amazon, but I'm not surprised that they are the highest, because we've all seen the news that they are having a bit of a brain drain, with AI and ML engineers getting recruited to other AI companies where they can have more opportunities. So I think that probably makes sense: if hallucinations are guardrailed by better development and training, but they don't have those resources in place, then it makes sense. What do you think about that?

Speaker 1:

Yeah, I definitely think it makes sense. I should clarify, they are among the worst, but they weren't the absolute worst. So do you want to take a guess at who the absolute worst was, and what their rate was?

Speaker 2:

Gemini.

Speaker 1:

Not a lot of faith in Google. It's not Gemini; Gemini was pretty decent, middle of the pack I would say. Their hallucination rates were around 4.5% on this benchmark. The worst one was an Apple model. I'm not very familiar with the Apple models, but apparently they put out a model called OpenELM 3B, and it had a hallucination rate of 22.4%.

Speaker 2:

But that's because it's still learning. That's a brand new model.

Speaker 1:

They're still in their infancy.

Speaker 2:

Exactly. Give them another year or two, they'll be my-.

Speaker 1:

Yeah, I'm sure they'll catch up. I mean, this benchmark isn't apples to apples: the sizes of these models are all very different, so they have different reasoning capabilities, and that was a very small model. In Apple's defense, I think it was a 3 billion parameter model, which is considered fairly small, so it makes sense that it would probably have a higher hallucination rate. But I think it's good for people to know, for GPT-4 Turbo or GPT-4, which largely underpins ChatGPT, what the quote-unquote hallucination rate is like: how often should they expect to get an incorrect answer or hallucination? And it looks like on this benchmark the answer is around 4% of the time. And that's why it's so hard to spot these things. I mean, if I were to say, hey, I'll give you 100 answers, four of these are wrong, find the four incorrect answers, that's a really difficult task.

Speaker 2:

It's very difficult. And I think, even from what we've seen recently, if it's giving accurate information most of the time, there's an expectation that all the information it gives is accurate. It feels basic, but it really comes down to critical thinking. I think back to the intern giving that work back to the founder: there's no way they could have read the output and thought it was a reasonable response. So it's about being able to leverage what still makes humans better than AI, at least right now, in 2024, which is the human brain, to look at something and use our own judgment and critical analysis, based on experience and what we've seen elsewhere, to ask, does this information make sense? There are two camps. There are people who don't trust AI at all, and they're probably not using it, or if they are, it's with a lot of skepticism. And then there are people who are so bought in that they trust anything GenAI gives them, especially in these LLM interactions, and they take that information and run with it without much thought about whether it's right or wrong. Take a beat before running with that information to ask, does this make sense?

Speaker 2:

If something doesn't quite make sense, okay, how else can I research this outside of this platform? Because it could also be a situation where the data itself is just wrong, where the model was trained on wrong information, so it will tell you the same wrong answer consistently. Every time you ask, is this correct, it will say yes, because to that model it is correct. It's not a hallucination, it's just wrong information. And so you need to be able to go outside of that and find another source: okay, when did these other lawsuits happen and what were the verdicts? Oh, these cases never happened at all. All right, something to think about. Or, why are rocks so healthy, should we be eating them? Or one of the ones I saw was, can I cook pasta in gasoline? And the LLM responded, well, no, but you can make a spicy pasta dish with gasoline. It said to sauté the onions and garlic in gasoline.

Speaker 2:

Obviously it was.

Speaker 1:

Trying to get someone to light themselves on fire.

Speaker 2:

Yes, obviously it was conflating the words oil and gasoline, right? We cook in olive oil, but it took that as gasoline and ran with it. And so I feel like in those situations it just takes a second of thought to say, okay, that doesn't make sense. Something else I wanted to talk about for these last few minutes is not when AI gives you the wrong answer, but when AI just doesn't do what you asked it to do. So with the fake Sora, I thought, Sora creates videos, why is it not creating a video? Or I've used Midjourney a few times, which everybody is raving about, and the images I got were really substandard. But then I compare that to the demos I've seen, and I think, did I write the prompt wrong? What did they write to get that type of imagery or video versus what I wrote? And so I'm curious to hear your thoughts on that too.

Speaker 1:

Yeah, I think Midjourney is really fun, if anyone hasn't tried it. And I guess I would have to ask you this: when I was interacting with Midjourney, and in our work we don't deal with a lot of image models, so I was doing it just for fun, the way I was doing it was logging into the Discord server and giving instructions there. Is that still how people are interacting with it? Yeah? I would say so. Midjourney is super fun. You describe an image that you would like to create and it can create images for you. And, yeah, sometimes, I guess, you don't get the image that you want.

Speaker 1:

You know, we've talked a lot about prompting. It's all in the prompting, right? And there are a lot of resources out there for Midjourney prompts to get the style of image you're looking for, whether you want a more film noir style, or a more cartoony style, or an anime style, and things like that, and people have found effective ways to give instructions in the prompt to get those styles. So it's really just a series of experiments, and those are fun because it's low stakes, right? If it, quote unquote, hallucinates for an image, if it gives you the wrong thing, first of all, it's very clear that it gave you the wrong thing, not what you were looking for, but two, it's pretty low stakes: if it created a bad image, you can just create a new one. So, yeah, I think Midjourney is really fun.

Speaker 2:

But you sent me a demo video of, I forget which service, but it had insane images and video, and it wasn't Midjourney, it was another one. And I joked with you, I said, how come when I use this, I don't get these results? And you did make an interesting point, which was that they're obviously showing the best of the best for these demos, and some of those tools are not even publicly available. So what the developers are able to get as a result versus what the common person, the layperson, is able to get does vary, and I feel like a little bit of conversation around that is helpful too.

Speaker 1:

Yeah, I mean, with these things that you see, everyone cherry picks, right? OpenAI cherry picks when they put out the first videos produced by their latest models, like Sora. And then, what was it, a company in China put out some images or videos lately that were produced the same way. First of all, these things are very cherry picked, right? It's the best of the best of the best. They probably did a thousand, or a hundred thousand, different image generations and they chose three to show on the internet to everyone. So that's one thing they're doing. Two is, you have to actually use some critical thinking to determine whether you think these videos are real.

Speaker 1:

There have been a number of AI demos that have turned out to be what we would call man-in-the-box or Wizard of Oz demos, where you're told it's one thing, and it turns out the wizard behind the curtain was actually doing it themselves. So maybe a person actually created those images, and they said they were created by the model itself, right? And so there's a little bit of fraud going on there. An investor we met early on said to me, AI makes the best demos in the world, but not necessarily the best applications in the world, because its reliability wasn't high enough, and this was a year ago at the time. So it's come a long way, but it's very easy to get very flashy results with AI, whether it's a text-based model or an image model or a video model or whatever it may be. Getting those results consistently is a much, much harder task, though fortunately the models have come a long way since that time.

Speaker 2:

And I think that's a really important point that I want people to hear, because I want people to know, one, don't be discouraged just because you're not getting the results that you're seeing in a demo.

Speaker 2:

Still keep playing with things, and every week that you come back, you'll get a better and better output because the development is happening so quickly.

Speaker 2:

And two, as much as we say, oh, AI is the future, and look at all the amazing things it's doing and going to do, and nobody will have a job and it will take over everything, the reality is it is still being worked out. At this point in time, the demos that you're seeing are not indicative of what is currently happening. They're more, I would call them, aspirational, of what these models and platforms will eventually get to. And even then, it doesn't mean it's a replacement factor; it's additive. So I feel like people shouldn't just see a demo and think, oh, this is what's happening right now, and go off and tell everyone, did you know that this AI robot can perform an entire surgery on a human and never have to be guided by a human or any other interaction? That's crazy. And then you say, okay, well, that was a demo of something that could happen, but it's not actually happening.

Speaker 1:

Yeah, I mean, I totally agree with you. And people worry, and I think you and I have even talked about this, will critical thinking go away? Will it no longer be a skill that is needed in the future? And I think that with this technology, like many other technologies in the past, critical thinkers will always float to the top. Critical thinkers will figure out the best way to make use of this technology, the best way for it to be additive to their own critical thinking skills.

Speaker 1:

And same with creatives, right? Is there some concern that creativity will go away? I think the truly creative people, in my opinion, will know how to use AI like every other new medium. Any time a new medium for art comes out, the truly creative people know how to use that medium and produce really great art out of it. So I think about how, these days, a lot of music is EDM and it's done on synthesizers, and synthesizers were a new medium that came out, you know, probably 20 years ago, whatever it was. And now incredible, creative music comes out of them, when at some point someone might have said, oh well, with these synthesizers, people are going to stop playing normal instruments because you can make any sound you want on a synthesizer. And I think the reality is it just became a new medium that creatives use to express their true creativity. So I think the critical thinkers and the true creatives will always float to the top.

Speaker 2:

I totally agree with that. And on a somewhat related but unrelated note, you're reminding me about the Audi announcement from today, which is that they partnered with OpenAI and they're putting ChatGPT into their newest software update.

Speaker 1:

Really.

Speaker 2:

So it will be something like 2 million cars, I think, that will have that in the car. And so I'm just thinking, make sure you're critically thinking when Audi gives you a response on where to drive and what to do. Otherwise you'll end up like that scene in The Office where they drove into the lake because of the GPS.

Speaker 1:

Audi GPT. I did not know that. It's fascinating.

Speaker 2:

Oh, and starting in July is when this new integration with ChatGPT will happen.

Speaker 1:

It's so crazy that this technology is so complex on the backend, the math to build these models is so complex, et cetera, but the way you interface with them is so basic and simple, you send it text and it sends you text back, that it's so easy to integrate into anything so quickly. That's one of the things I think is so fascinating about this technology: the adoption of it, and the way you can integrate it into anything, any industry, any product, is just so easy that we can experiment with a lot of different things and see what sticks and where it's most useful.

Speaker 2:

Definitely. And to clarify, this will be voice-based; you won't have to type into ChatGPT.

Speaker 1:

Yeah, it's funny, I say it's text-based, but even for the voice-based ones, all they're doing is using a voice-to-text model that takes your voice, changes it to text, and then feeds that text into a model. And this is the funny thing about AI Steve from last week, if anyone listened to our politics industry disruption episode: AI Steve didn't know that he was a voice-based interface. AI Steve kept telling me, I'm a text-based model. And I said, no, AI Steve, I'm talking to you, and you can hear me and I can hear you. And it said, oh, I didn't know that. I thought that was kind of funny. So that's what they're doing, they're combining voice-to-text models and then using a text-based model on the backend, probably, is my guess. Hopefully Audi's GPT knows it's using voice, whereas AI Steve did not know he had a voice.
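The pipeline described here, speech in, text through the model, speech back out, can be sketched in a few lines. All three functions below are hypothetical placeholders for real services (a transcription model, a chat model, and a text-to-speech engine).

```python
# Sketch of a voice assistant built on a text-only LLM: speech-to-text in,
# text-based model in the middle, text-to-speech out. All three functions
# are hypothetical placeholders for real services.

def transcribe(audio: bytes) -> str:
    raise NotImplementedError("speech-to-text model goes here")

def chat(prompt: str) -> str:
    raise NotImplementedError("text-based LLM goes here")

def speak(text: str) -> bytes:
    raise NotImplementedError("text-to-speech engine goes here")

def voice_assistant_turn(audio: bytes) -> bytes:
    user_text = transcribe(audio)    # e.g. "How do I get home from here?"
    reply_text = chat(user_text)     # the LLM itself only ever sees text
    return speak(reply_text)         # spoken back to the driver
```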

Speaker 2:

It is really funny. I feel like it's another example of me feeling empathetic for the robot, where I'm like, oh, he just found out something new he didn't know.

Speaker 1:

I think it did say something funny to me. It said something like, oh, this is a new way of interacting for me, and I was like, oh, good for you, you've learned something new today, AI Steve.

Speaker 2:

Oh, that's really funny.

Speaker 1:

Yeah, it's kind of cute.

Speaker 2:

Yeah, it is. I know, I'm curious to see. Audi, I think, will be the first automaker that uses ChatGPT within its vehicles, and I'm curious to see how that goes, especially with it starting in just a couple of weeks. I do think, and this maybe extends into another episode, that there are concerns similar to those around Apple Intelligence in terms of data security and privacy. There's a similar concern with having ChatGPT integrated into a vehicle, and it will be available for vehicles built in 2021 and afterwards, for it to know, theoretically, not only what you're asking it, but potentially where you're driving, what you're doing, how long you are in certain places, and all the things people are concerned about from a privacy perspective.

Speaker 1:

Yeah, because that was actually the first thought that came to my mind: it's more useful the more data it has, so it would be very frustrating if it didn't know where you were. Because you want to be able to ask it questions. If you're in your car, you want to say, how do I get from point A to point B, how do I get from where I am now to home, or how do I get to the grocery store. If it doesn't know where you are, that's going to be a pretty frustrating experience. And the other thing I thought about is whether they'll be rolling it out on cars in the US or in the EU, because the EU traditionally has much stricter privacy regulations, and so I'd be curious to see if they're going to do a rollout in the EU for this.

Speaker 2:

Oh, that's a good point. I assumed it was in the US, I'm not positive, though. And actually, I'm just noticing here, I just pulled up the article to reference it, and at the end it says that Audi implements ChatGPT solutions via Microsoft Azure OpenAI Service and is committed to the responsible use of AI. Obviously, everybody says they're committed to the responsible use of AI, they have to, but I wonder if doing it through that structure does slightly change things.

Speaker 1:

On that note, and I know we're getting a little bit away from hallucinations as the topic for this week, I did see some articles, maybe in The Information this week, about how Azure revenue, or Azure AI revenue rather, is not really keeping up.

Speaker 1:

So they're serving OpenAI models on Azure.

Speaker 1:

Everyone thought the big enterprises would be going through Azure, that's how it started, but actually, increasingly, big companies have been able to go directly to OpenAI. And so basically what that says is that OpenAI is building its brand very quickly as being reputable, as being someone they can work with. There are probably many reasons for that, but one of them is that, at least at our company, for our work, we've thought about using Azure for some of our models as well, but they're just not as quick to adopt the latest models, so not all the latest models are always available on Azure right away, and it's a little bit harder to work with. So it is interesting to see just how quickly a brand can be built, and how to build trust around that brand, because it seems like OpenAI is doing a good job of, surprisingly, building themselves out to be a trustworthy brand, despite what happened a few months ago with the leadership shakeup and all that.

Speaker 2:

Yes, and I think the branding is really interesting, because I noticed, even on LinkedIn, for example, there are two pages: there's an OpenAI business page and there's a ChatGPT page. And I'm going to just double-check this, like look it up, before I give wrong information.

Speaker 1:

Before you hallucinate or confabulate, yeah, exactly.

Speaker 2:

Before I hallucinate about what I saw, okay, let me just see here: OpenAI, and then ChatGPT. Oh, I wonder if even this ChatGPT page is actually a fake page. I don't actually know if it's run by OpenAI or not, which maybe changes what I was about to say. But the point is that the OpenAI page, with the overall OpenAI brand, has over 5 million followers on LinkedIn, and the ChatGPT page has about 3,000, which, again, if this is a page that's actually run by OpenAI, to me shows a vast disparity between people knowing and following the OpenAI brand as an entity versus just one of their products.

Speaker 1:

Yeah, this is funny, because it again points to what I was talking about with heuristics, the heuristics we use to judge the accuracy of information we consume. And it's funny because in this case you're using follower count, right, and you're like, something smells fishy about this. And I agree that something does smell fishy about 3,000 followers for ChatGPT. That sounds sus, as the kids would say these days.

Speaker 2:

Sus, very sus, and even the way the logo looks in the little picture box is a bit weird.

Speaker 1:

It's a little bit grainy on the page. It's a bit weird. It's a little bit low-res, yeah.

Speaker 2:

Yeah, something strange here. But the OpenAI page, I'm going to say that one's the real page, which actually makes even more sense, because then you have OpenAI as a brand. That's how they want to be known. So, yeah, it is interesting to see that brand expansion with OpenAI.

Speaker 1:

I mean, brand trust is always such an important thing, I suppose, but in the world of AI it seems like an even more important thing, right? Obviously, we just talked about hallucination rates, but people don't really care about hallucination rates. People care about trusting a brand. Do I trust you with my data? Do your models have the most relevant outputs? Do they accomplish the task I want them to accomplish? And a lot of that is associated with brand.

Speaker 2:

Yes, and I think OpenAI in particular, because they came on the scene with ChatGPT, as opposed to an established brand like Google or Meta or Amazon, where we've already interacted with those brands over the years for a variety of things. OpenAI is essentially faceless; it came out with this model that you interact with via text, and there's no personalization to that interaction, so I think the initial trust is a hurdle to get over. But once people start interacting with it and have positive experiences, all of a sudden they forget that OpenAI has been plagued with some of the privacy and trust concerns of the last couple of years, the controversy over Sam Altman being at the helm or not, the letter we saw come out a few weeks ago, things like that. I think those fall to the wayside because people say, oh, I have a positive experience with this brand, I will trust it.

Speaker 1:

Yeah. I think also, when I think about OpenAI's brand and how they built it, probably most consumers didn't watch, but they did their spring preview a couple months ago, and that's where they released GPT-4o. And one thing that was very noticeable, I think they even called it out and a lot of pundits talked about this, was that their spring preview was live. They had it in front of an audience, they did all their demos as live demos, and that's important because you're starting to see some of these companies doing pre-recorded demos.

Speaker 1:

And that's where, as we talked about earlier, if you do a pre-recorded demo, you can sneak in a lot, you can cover up a lot of things. You can do multiple takes, you can throw out the takes where your tech didn't work. I always call it the curse of the live demo, when you're trying to demo something and it breaks. And I think that OpenAI, by doing a live demo for their spring preview, and obviously there are some tricks they could have done behind the scenes there as well, comes across a little more believable and genuine, I suppose, than these pre-recorded demos.

Speaker 2:

I totally agree. It definitely gives another layer of credibility.

Speaker 1:

Yeah, versus pre-recorded demos.

Speaker 2:

Yeah, yeah, for sure. Okay, this has been fantastic. I have to run, but we'll talk soon. Thank you for a great chat today.

Speaker 1:

Yeah, it was fun, see ya.

Speaker 2:

Okay, bye.
