GPT-4 was launched by OpenAI on March 14, 2023. “GPT-4 is a large multimodal model (accepting image and text inputs, emitting text outputs) that, while less capable than humans in many real-world scenarios, exhibits human-level performance on various professional and academic benchmarks.” (Website OpenAI) On its website, the company explains the multimodal options in more detail: “GPT-4 can accept a prompt of text and images, which – parallel to the text-only setting – lets the user specify any vision or language task. Specifically, it generates text outputs (natural language, code, etc.) given inputs consisting of interspersed text and images.” (Website OpenAI) The example OpenAI gives is impressive: an image with multiple panels is uploaded, together with the prompt “What is funny about this image? Describe it panel by panel”. This is exactly what GPT-4 does, before coming to the conclusion: “The humor in this image comes from the absurdity of plugging a large, outdated VGA connector into a small, modern smartphone charging port.” (Website OpenAI) The technical report is available via cdn.openai.com/papers/gpt-4.pdf.
Introducing Visual ChatGPT
Researchers at Microsoft are working on a new application based on ChatGPT and solutions like Stable Diffusion. Visual ChatGPT is designed to allow users to generate images from text input and then edit individual elements. In their paper “Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models”, Chenfei Wu and his co-authors write: “We build a system called Visual ChatGPT, incorporating different Visual Foundation Models, to enable the user to interact with ChatGPT by 1) sending and receiving not only languages but also images 2) providing complex visual questions or visual editing instructions that require the collaboration of multiple AI models with multi-steps” – and, not least: “3) providing feedback and asking for corrected results” (Wu et al. 2023). For example, an initial prompt creates an image of a landscape with blue sky, hills, meadows, flowers, and trees. A second prompt then instructs Visual ChatGPT to make the hills higher and the sky more dusky and cloudy. One can also ask the program what color the flowers are and recolor them with another prompt. A final prompt makes the trees in the foreground appear greener. The paper can be downloaded from arxiv.org.
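The interaction pattern described in the paper – a language model deciding, step by step, which visual foundation model to invoke while a shared image state is carried along – can be pictured with a minimal dispatcher. All function names and the routing rule below are invented stand-ins for illustration, not the actual Visual ChatGPT implementation:

```python
# Minimal sketch of a prompt manager that routes user instructions to
# different "visual foundation models" (here: trivial stand-in functions)
# while keeping a shared image state, in the spirit of Visual ChatGPT.

def text_to_image(state, instruction):
    # Stand-in for a text-to-image generator such as Stable Diffusion.
    state["image"] = f"image({instruction})"
    return state["image"]

def edit_image(state, instruction):
    # Stand-in for an instruction-driven image editor.
    state["image"] = f"edit({state['image']}, {instruction})"
    return state["image"]

def answer_question(state, instruction):
    # Stand-in for a visual question-answering model.
    return f"answer to '{instruction}' about {state['image']}"

def dispatch(state, instruction):
    """Pick a tool from the instruction - a crude proxy for the
    prompt-based tool selection that ChatGPT performs in the paper."""
    if state.get("image") is None:
        return text_to_image(state, instruction)
    if instruction.strip().endswith("?"):
        return answer_question(state, instruction)
    return edit_image(state, instruction)

state = {"image": None}
dispatch(state, "a landscape with blue sky, hills, meadows, flowers, trees")
dispatch(state, "make the hills higher and the sky more cloudy")
print(state["image"])
```

The point of the sketch is only the control flow: each new prompt either creates, edits, or queries the current image, so multi-step editing sessions fall out of a single dispatch loop.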
Little Teacher
Alpha Mini is a social robot characterized by its small size (and thus easy transportability) and extensive natural language and motor skills. It can be used in school lessons, both as a teacher or tutor and as a tool for learning to program. On March 8, 2023, a new project started at the School of Business FHNW in which Alpha Mini plays the leading role. The initiator is Prof. Dr. Oliver Bendel, who has been researching conversational agents and social robots for a quarter of a century. Andrin Allemann is contributing to the project as part of his final thesis. Alpha Mini will be integrated into a learning environment and will be able to interact and communicate with other components such as a display. It is intended to convey simple learning material with the help of pictures and texts and to motivate the children through gestural and facial feedback. The result is a little teacher with great possibilities. In principle, it should comply with the new Swiss federal law on data protection (neues Datenschutzgesetz, nDSG). The project will run until August 2023, after which the results will be published.
The World’s First AI-Driven Localized Radio Content
Futuri launches RadioGPT, the first AI-driven localized radio content. “RadioGPT™ uses TopicPulse technology, which scans Facebook, Twitter, Instagram, and 250k+ other sources of news and information, to identify which topics are trending in a local market. Then, using GPT-3 technology, RadioGPT™ creates a script for on-air use, and AI voices turn that script into compelling audio.” (Press release, February 23, 2023) It is not a new radio station, but an offer and a tool for existing radio stations. “Stations can select from a variety of AI voices for single-, duo-, or trio-hosted shows, or train the AI with their existing personalities’ voices. Programming is available for individual dayparts, or Futuri’s RadioGPT™ can power the entire station. RadioGPT™ is available for all formats in a white-labeled fashion.” (Press release, February 23, 2023) More information via futurimedia.com/futuri-launches-radiogpt/.
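The pipeline in the press release has three stages: scan sources for trending local topics, generate a script, and synthesize the audio. As a minimal sketch – with function names, the scoring rule, and all data invented for illustration, not Futuri's actual technology:

```python
from collections import Counter

# Hypothetical three-stage sketch of an AI radio pipeline:
# 1) find trending topics in scanned local posts (stand-in for TopicPulse),
# 2) draft an on-air script (stand-in for the GPT-based step),
# 3) hand the script to a stubbed voice synthesizer.

def trending_topics(posts, top_n=2):
    """Rank topics by how often they appear across scanned posts."""
    counts = Counter(topic for post in posts for topic in post["topics"])
    return [topic for topic, _ in counts.most_common(top_n)]

def write_script(topics, market):
    """Turn the trending topics into a short on-air script."""
    return f"Good morning {market}! Today we talk about {', '.join(topics)}."

def synthesize(script, voice="duo"):
    """Stand-in for text-to-speech with a selectable AI voice."""
    return {"voice": voice, "audio": f"<audio of: {script}>"}

posts = [
    {"topics": ["roadworks", "festival"]},
    {"topics": ["festival"]},
    {"topics": ["weather", "festival", "roadworks"]},
]
script = write_script(trending_topics(posts), "Cleveland")
print(synthesize(script)["audio"])
```

The voice parameter mirrors the press release's single-, duo-, or trio-hosted options; in the real product, each stage is of course far more elaborate.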
The Enhancement of Bixby
The Korean company Samsung Electronics announced new updates to its voice assistant Bixby that are designed to improve the user experience, performance, and capabilities of the intelligent assistant and platform. One of the most interesting innovations concerns the voice of the users. According to Samsung, they “can personalize their Bixby Text Call voice”. “Using the new Bixby Custom Voice Creator, users can record different sentences for Bixby to analyze and create an AI generated copy of their voice and tone. Currently available in Korean, this generated voice is planned to be compatible with other Samsung apps beyond phone calls” (Samsung, 22 February 2023). As early as 2017, Oliver Bendel wrote with respect to Adobe VoCo: “Today, just a few minutes of samples are enough to be able to imitate a speaker convincingly in all kinds of statements.” In his article “The synthetization of human voices”, published in AI & Society, he also addressed the ethical implications. Now there seems to be an established market for such applications, and they are being rolled out more widely.
Students Get Excited about Social Robots
From February 16 to 18, 2023, the elective module “Soziale Roboter aus technischer, wirtschaftlicher und ethischer Sicht” (“Social Robots from a Technical, Economic and Ethical Perspective”) took place at the Brugg-Windisch campus of the School of Business FHNW. The approximately 30 students came from the Basel, Olten, and Brugg-Windisch campuses and from the International Management, Business Informatics, and Business Administration degree programs. Prof. Dr. Oliver Bendel first taught them the basics of robotics and social robotics. In addition, there were excursions into service robotics, supporting the thesis that this field is increasingly influenced by social robotics: transport robots and serving robots are getting eyes and mouths, and security robots are getting natural language capabilities. Ethical aspects were also discussed, for which empirical foundations had previously been laid. Also present were Pepper, NAO, Alpha Mini, Cozmo, and Hugvie, and for a short time little EMO. Guest lectures were given by Marc Heimann (on the CARE-MOMO) and Lea Peier (on bar robots in Switzerland). The students were highly motivated and, in group work at the end, designed their own social robots with different tasks. This was the third implementation of the elective module and the first at the Brugg-Windisch site. In November 2023, the fourth will take place at the Olten site.
Bard Comes into the World
Sundar Pichai, the CEO of Google and Alphabet, announced the company’s answer to ChatGPT in a blog post dated February 6, 2023. According to him, Bard is an experimental conversational AI service powered by LaMDA. It has been opened to trusted testers and will be made available to the public in the coming weeks. “Bard seeks to combine the breadth of the world’s knowledge with the power, intelligence and creativity of our large language models. It draws on information from the web to provide fresh, high-quality responses. Bard can be an outlet for creativity, and a launchpad for curiosity, helping you to explain new discoveries from NASA’s James Webb Space Telescope to a 9-year-old, or learn more about the best strikers in football right now, and then get drills to build your skills.” (Sundar Pichai 2023) In recent weeks, Google had come under heavy pressure from OpenAI’s ChatGPT. It was clear that the company had to present a comparable application based on LaMDA as soon as possible. In addition, Baidu plans to launch its Ernie Bot, which will add yet another competing product. More information via blog.google/technology/ai/bard-google-ai-search-updates/.
The Latest Findings in Social Robotics
The proceedings of ICSR 2022 were published in early 2023. Included is the paper “The CARE-MOMO Project” by Oliver Bendel and Marc Heimann. From the abstract: “In the CARE-MOMO project, a morality module (MOMO) with a morality menu (MOME) was developed at the School of Business FHNW in the context of machine ethics. This makes it possible to transfer one’s own moral and social convictions to a machine, in this case the care robot with the name Lio. The current model has extensive capabilities, including motor, sensory, and linguistic. However, it cannot yet be personalized in the moral and social sense. The CARE-MOMO aims to eliminate this state of affairs and to give care recipients the possibility to adapt the robot’s ‘behaviour’ to their ideas and requirements. This is done in a very simple way, using sliders to activate and deactivate functions. There are three different categories that appear with the sliders. The CARE-MOMO was realized as a prototype, which demonstrates the functionality and aids the company in making concrete decisions for the product. In other words, it can adopt the morality module in whole or in part and further improve it after testing it in facilities.” The book (part II of the proceedings) can be downloaded or ordered via link.springer.com/book/10.1007/978-3-031-24670-8.
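The slider mechanism of the morality menu – boolean on/off options grouped into a small number of categories – can be pictured as a simple data structure. The category and option names below are invented for illustration; the actual CARE-MOMO categories are described in the paper:

```python
# Minimal sketch of a morality menu (MOME): named categories, each holding
# boolean "sliders" that activate or deactivate robot functions.
# Category and option names here are hypothetical examples only.

class MoralityMenu:
    def __init__(self, categories):
        # categories: {category_name: {option_name: bool}}
        self.categories = categories

    def set_slider(self, category, option, value):
        """Flip one slider, i.e. activate or deactivate one function."""
        self.categories[category][option] = bool(value)

    def active_functions(self):
        """List the functions the care recipient has switched on."""
        return sorted(
            option
            for options in self.categories.values()
            for option, active in options.items()
            if active
        )

menu = MoralityMenu({
    "privacy": {"record_video": False, "report_falls": True},
    "interaction": {"use_informal_address": False},
})
menu.set_slider("interaction", "use_informal_address", True)
print(menu.active_functions())
```

The appeal of the design is exactly this simplicity: the care recipient's moral and social preferences reduce to a set of toggles the robot consults before acting.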
How People React to Hugs from Robots
As part of the AAAI 2023 Spring Symposia in San Francisco, the symposium “Socially Responsible AI for Well-being” is organized by Takashi Kido (Teikyo University, Japan) and Keiki Takadama (The University of Electro-Communications, Japan). The paper “Increasing Well-being and Health through Robotic Hugs” by Oliver Bendel, Andrea Puljic, Robin Heiz, Furkan Tömen, and Ivan De Paola was accepted. Among other things, they show how people in Switzerland react to robotic hugs. The talk will take place between March 26 and 29, 2023 at Hyatt Regency, San Francisco Airport. The symposium website states: “For our happiness, AI is not enough to be productive in exponential growth or economic/financial supremacies but should be socially responsible from the viewpoint of fairness, transparency, accountability, reliability, safety, privacy, and security. For example, AI diagnosis system should provide responsible results (e.g., a high-accuracy of diagnostics result with an understandable explanation) but the results should be socially accepted (e.g., data for AI (machine learning) should not be biased (i.e., the amount of data for learning should be equal among races and/or locations). Like this example, a decision of AI affects our well-being, which suggests the importance of discussing ‘What is socially responsible?’ in several potential situations of well-being in the coming AI age.” (Website AAAI) According to the organizers, the first perspective is “(Individually) Responsible AI”, which aims to clarify what kinds of mechanisms or issues should be taken into consideration to design Responsible AI for well-being. The second perspective is “Socially Responsible AI”, which aims to clarify what kinds of mechanisms or issues should be taken into consideration to implement social aspects in Responsible AI for well-being. More information via www.aaai.org/Symposia/Spring/sss23.php#ss09.
How Customers React to Bar Robots
As part of the AAAI 2023 Spring Symposia in San Francisco, the symposium “Socially Responsible AI for Well-being” is organized by Takashi Kido (Teikyo University, Japan) and Keiki Takadama (The University of Electro-Communications, Japan). The paper “How Can Bar Robots Enhance the Well-being of Guests?” by Oliver Bendel and Lea K. Peier was accepted. Among other things, they show how customers in Switzerland react to bar robots. The talk will take place between March 26 and 29, 2023 at Hyatt Regency, San Francisco Airport. The aims and the two perspectives of the symposium are described in the previous item. More information via www.aaai.org/Symposia/Spring/sss23.php#ss09.