christopherhord


Using AI-Generated Content: Beware the Dangers

In Uncategorized on December 10, 2023 at 2:51 pm

It’s easy to dismiss those who worry about the dangers of AI as alarmists, although in a world where weaponized drones are piloted by AI, those worries are less far-fetched than they might seem. However, anyone who wants to use AI to help generate content for blogs, articles, reports, and the like faces dangers that are more conventional, but very real.

Thinking of using AI to generate reports or other documents for business use? Think carefully. In June 2023, a New York judge fined attorneys Steven Schwartz and Peter LoDuca and their law firm, Levidow, Levidow & Oberman, for “acts of conscious avoidance and false and misleading statements to the court.” Their offense? In a bid to save time, the lawyers had used ChatGPT to help prepare a brief in a case they were working on, and ChatGPT supplied them with six fictitious cases as citations.

Want to use AI-generated images or copy on your blog, your company website, or in marketing and promotional materials? Not so fast. AI has a nasty habit of occasionally belching up entire passages from works it has scanned. Since the most popular chatbots have all been trained on sources that include copyrighted material, you can end up plagiarizing copyrighted work without realizing it. Microsoft, OpenAI, Google and GitHub are just a few of the prominent companies that have been sued over their use of copyrighted material.

Currently, several coalitions of prominent novelists, non-fiction writers, and artists have filed lawsuits against various AI companies over the use of their work to train models. If you publish material generated by any of these AIs, you could be reproducing copyrighted material yourself, and you could be legally liable for it.

The main AI giants are working feverishly to create tools that will not present users with copyrighted material, but other problems remain: AIs still produce objectionable material, inaccurate material, and material that uses the appropriate jargon but is ultimately meaningless.

To illustrate, I recently created a thesis statement for this article and ran it through Google Bard and ChatGPT to get their take on the topic. The thesis statement was as follows:

“There are many concerns when using a chatbot to create a document using large language model AI. This technology sometimes delivers vague generalities, incorrect results, inaccuracies, plagiarism, and disturbing results. Write an article about this topic, including your sources.”
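For anyone who would rather run this kind of prompt through an API than a chat window, here is a minimal sketch, assuming OpenAI’s official Python client (the model name is just a placeholder):

    # Minimal sketch: send the thesis-statement prompt to a chat model via
    # OpenAI's Python client (pip install openai). Assumes OPENAI_API_KEY is
    # set in the environment; the model name below is a placeholder.
    from openai import OpenAI

    PROMPT = (
        "There are many concerns when using a chatbot to create a document using "
        "large language model AI. This technology sometimes delivers vague "
        "generalities, incorrect results, inaccuracies, plagiarism, and disturbing "
        "results. Write an article about this topic, including your sources."
    )

    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # placeholder; any chat-capable model works here
        messages=[{"role": "user", "content": PROMPT}],
    )
    print(response.choices[0].message.content)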

Admittedly, the results were moderately impressive, although there was one glaring, major flaw. I had tried out Bard and ChatGPT earlier in their development, found them interesting but not terribly useful, and had not been back in a while. Both products have made strides forward since then. For instance, since I asked them to write an article, both knew enough to create catchy headlines and subheads, and to divide up the content so that each section fit under its appropriate subhead. This was something they could not previously do.

But the problem came with my request that the chatbots show their sources. Each chatbot made a casual gesture toward doing so, but the attempts were feeble, and possibly misleading.

For instance, here are some sources that Google Bard included at the end of its article:

  1. Bender, Emily M., and Timnit Gebru. “On the dangers of stochastic parrots: Can language models be too big?” arXiv preprint arXiv:2101.08236 (2021).
  2. Brundage, Miles, et al. “The malicious use of artificial intelligence: Forecasting, prevention, and mitigation.” arXiv preprint arXiv:1802.07228 (2018).
  3. Zhang, Xiangyang, et al. “Detecting plagiarism in large language models.” arXiv preprint arXiv:2201.01809 (2022).

On the surface, that looks pretty promising. However, looks are deceiving. The most obvious problem is that the numbers before each citation suggest they relate to something in the article, much like footnotes. They do not: there are no corresponding numbers anywhere in the article. And since the article contains no direct quotations or specific factual statements, only generalities, it is impossible to tell what any of the sources relate to. Moreover, because large language models tend simply to associate words they have seen used together in other works, there is no guarantee that the sources were ever used in the article at all. They may simply have been cited several times in other pieces the AI scanned.
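One way to start checking is to look the cited arXiv IDs up directly. Below is a minimal sketch, assuming Python and arXiv’s public export API (which returns an Atom feed); it prints whatever title arXiv reports for each ID so you can compare it against the citation:

    # Minimal sketch: look up each arXiv ID an AI cited and print the title
    # arXiv itself reports, so the citation can be checked by hand.
    # Uses only the standard library and arXiv's public export API.
    import urllib.request
    import xml.etree.ElementTree as ET

    ATOM = "{http://www.w3.org/2005/Atom}"

    def arxiv_title(arxiv_id):
        """Return the title arXiv reports for this ID, or None if none is found."""
        url = "http://export.arxiv.org/api/query?id_list=" + arxiv_id
        with urllib.request.urlopen(url) as resp:
            feed = ET.fromstring(resp.read())
        entry = feed.find(ATOM + "entry")
        if entry is None:
            return None
        title = (entry.findtext(ATOM + "title") or "").strip()
        # The API sometimes answers a bad ID with an entry titled "Error".
        return None if not title or title.lower() == "error" else title

    for cited_id in ["2101.08236", "1802.07228", "2201.01809"]:
        print(cited_id, "->", arxiv_title(cited_id) or "no matching paper found")

A spot check like this cannot tell you whether a source was actually used, but it can at least tell you whether the cited paper exists and what it is actually about.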

The results for ChatGPT were even worse. It cited only two sources, one of which was a paper from ChatGPT’s own maker, OpenAI, and it provided no information to help determine how the two citations were used, or whether they were used at all.

So beware. Even asking the AI to show its sources does not give you reliable information, or even information that helps you double-check the AI’s output against those sources. This is an extraordinarily dangerous situation for anyone creating text with chatbots that they plan to use elsewhere. Until AI makers make their sources more transparent, it is probably best to avoid reaching for AI-generated content as a quick fix.