The Chinese Whispers Problem

DALL-E 3, in the version integrated in ChatGPT Plus, seems to have a Chinese Whispers problem. In a test by Oliver Bendel, the prompt (prompt A) read: “Two female swimmers competing in lake, photorealistic”. ChatGPT, the interface to DALL-E 3, turned it into four prompts (prompts B1 to B4). Prompt B4 read: “Photo-realistic image of two female swimmers, one with tattoos on her arms and the other with a swim cap, fiercely competing in a lake with lily pads and reeds at the edges. Birds fly overhead, adding to the natural ambiance.” DALL-E 3, in turn, made an image out of this prompt that had little to do with either prompt B4 or prompt A. The picture does not show two women, but two men, or a woman and a man with a beard. They are not swimming in a race but arguing, standing in a pond or a small lake, furiously waving their arms and going at each other. Water lilies sprawl in front of them, birds flutter above them. Certainly an interesting picture, but produced with such arbitrariness that one wishes for the good old prompt engineering to return (the picture in this post shows a detail). Replacing prompt engineering is exactly what the interface is meant to do, but the result is an effect familiar from the game of Chinese Whispers.

Moral Issues with Image Generators

The article “Image Synthesis from an Ethical Perspective” by Prof. Dr. Oliver Bendel was submitted on 18 April and accepted on 8 September 2023. It was published on 27 September 2023. From the abstract: “Generative AI has gained a lot of attention in society, business, and science. This trend has increased since 2018, and the big breakthrough came in 2022. In particular, AI-based text and image generators are now widely used. This raises a variety of ethical issues. The present paper first gives an introduction to generative AI and then to applied ethics in this context. Three specific image generators are presented: DALL-E 2, Stable Diffusion, and Midjourney. The author goes into technical details and basic principles, and compares their similarities and differences. This is followed by an ethical discussion. The paper addresses not only risks, but opportunities for generative AI. A summary with an outlook rounds off the article.” The article was published in the long-established and renowned journal AI & Society and can be downloaded here.

Maybe Not Safe

Ideogram seemed to start as a rather free and permissive image generator in August 2023. Since then, a noticeable number of images have been censored. What matters is not the prompt but the image itself. If the platform detects during generation that the image might be problematic, the image is not finished but replaced by a tile showing a cat holding a sign in its paws that says “MAYBE NOT SAFE”. One prompt read: “The sculpture Galatea, resembling the beautiful Aphrodite, creates itself, photo, film”. So Pygmalion’s sculpture was meant to empower itself. The four images, two of which showed breasts, were seen by the user and also by the platform itself, which apparently led to the images being replaced by the warning tiles before they were completed. On the other hand, photorealistic images of women in revealing poses remain unproblematic, as long as the women are wearing bikinis or hotpants. As with other American platforms, the problem here seems to be the visibility of nipples, whether human or sculptural. In another experiment, the nipples were visible in one of the four pictures until they disappeared under the cat’s fur. In another picture, Ideogram itself had covered the sculpture’s nipples, one with her hand, the other with a piece of clay or stone jewellery. This Galatea was spared the fate of her sister.

ChatGPT Can See, Hear, and Speak

OpenAI reported in its blog on 25 September 2023: “We are beginning to roll out new voice and image capabilities in ChatGPT. They offer a new, more intuitive type of interface by allowing you to have a voice conversation or show ChatGPT what you’re talking about.” (OpenAI Blog, 25 September 2023) The company gives some examples of using ChatGPT in everyday life: “Snap a picture of a landmark while traveling and have a live conversation about what’s interesting about it. When you’re home, snap pictures of your fridge and pantry to figure out what’s for dinner (and ask follow up questions for a step by step recipe). After dinner, help your child with a math problem by taking a photo, circling the problem set, and having it share hints with both of you.” (OpenAI Blog, 25 September 2023) But the application can not only see, it can also hear and speak: “You can now use voice to engage in a back-and-forth conversation with your assistant. Speak with it on the go, request a bedtime story for your family, or settle a dinner table debate.” (OpenAI Blog, 25 September 2023) More information via openai.com/blog/chatgpt-can-now-see-hear-and-speak.

CONVERSATIONS 2023 in Oslo

CONVERSATIONS 2023, a two-day workshop on chatbot research, applications, and design, will take place at the University of Oslo, Norway. According to the CfP, contributions concerning applications of large language models such as the GPT family are warmly welcome, as are contributions on applications that combine information retrieval approaches with large language model approaches. Building on the results of the six previous CONVERSATIONS workshops, the following topics are of particular interest: 1. Chatbot users and implications, 2. Chatbot user experience, design, and evaluation, 3. Chatbot frameworks and platforms, 4. Chatbots for collaboration, 5. Democratizing chatbots – chatbots for all, 6. Ethics and safety implications of chatbots and large language models, 7. Leveraging advances in AI technology and large language models. More information via 2023.conversations.ws.

Introducing Visual ChatGPT

Researchers at Microsoft are working on a new application based on ChatGPT and solutions like Stable Diffusion. Visual ChatGPT is designed to allow users to generate images using text input and then edit individual elements. In their paper “Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models”, Chenfei Wu and his co-authors write: “We build a system called Visual ChatGPT, incorporating different Visual Foundation Models, to enable the user to interact with ChatGPT by 1) sending and receiving not only languages but also images 2) providing complex visual questions or visual editing instructions that require the collaboration of multiple AI models with multi-steps” and, not to forget, “3) providing feedback and asking for corrected results” (Wu et al. 2023). For example, a user has a suitable prompt create an image of a landscape with blue sky, hills, meadows, flowers, and trees. With another prompt, the user then instructs Visual ChatGPT to make the hills higher and the sky more dusky and cloudy. The user can also ask the program what color the flowers are and recolor them with a further prompt. A final prompt makes the trees in the foreground appear greener. The paper can be downloaded from arxiv.org.