Blob Opera is an AI experiment by David Li in collaboration with Google Arts and Culture. According to the website, it pays tribute to and explores the original musical instrument, namely the voice. “We developed a machine learning model trained on the voices of four opera singers in order to create an engaging experiment for everyone, regardless of musical skills. Tenor, Christian Joel, bass Frederick Tong, mezzo‑soprano Joanna Gamble and soprano Olivia Doutney recorded 16 hours of singing. In the experiment you don’t hear their voices, but the machine learning model’s understanding of what opera singing sounds like, based on what it learnt from them.” (Blob Opera) You can drag the blobs up and down to change pitch – or forwards and backwards for different vowel sounds. The blobs are a pleasure not only to hear but also to watch: while singing, they look around, open and close their mouths, and every now and then even show their tongues.
There is great media interest in the new book “Maschinenliebe” (ed. Oliver Bendel), which was published in October 2020. Several review copies were sent out. The title means “Machine Love”, “Machines for Love” or “Machines of Love”. Three contributions are in English. One of them – “Speaking with Harmony: Finding the right thing to do or say … while in bed (or anywhere else)” – is by Kino Coursey (Realbotix). From the abstract: “Doing or saying the right thing in response to circumstances is a constant problem, especially for embodied personal companions like Realbotix’s Harmony. In this paper we will describe the Harmony system, how it finds the right thing to say or do, and how recent advances in neural network-based natural language processing and generation will be integrated into next-generation systems. These advances will allow the transition from pattern-oriented responses to dynamic narrative-oriented response generation. Future systems will be able adapt to their situation much more flexibly, and allow a wider range of role-playing and interaction.” More information via www.springer.com/de/book/9783658298630.
Diffbot, a Stanford startup, is building an AI-based spider that reads as many pages as possible on the entire public web and extracts as many facts from those pages as it can. “Like GPT-3, Diffbot’s system learns by vacuuming up vast amounts of human-written text found online. But instead of using that data to train a language model, Diffbot turns what it reads into a series of three-part factoids that relate one thing to another: subject, verb, object.” (MIT Technology Review, 4 September 2020) Knowledge graphs – which is what this is all about – have been around for a long time. However, they have mostly been created manually, or only for specific domains. Some years ago, Google started using knowledge graphs too. Instead of giving us a list of links to pages about Spider-Man, the service gives us a set of facts about him drawn from its knowledge graph. But it only does this for its most popular search terms. According to MIT Technology Review, the startup wants to do it for everything. “By fully automating the construction process, Diffbot has been able to build what may be the largest knowledge graph ever.” (MIT Technology Review, 4 September 2020) Diffbot’s AI-based spider reads the web as we read it and sees the same facts that we see. Even if it does not truly understand what it sees, the results are likely to amaze us.
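The subject-verb-object structure mentioned in the quote can be illustrated in a few lines of Python. This is a toy sketch of a triple store, not Diffbot's actual system; the class name and the example facts are assumptions for illustration.

```python
# Minimal sketch of a knowledge graph built from (subject, verb, object)
# triples, as described in the article. Purely illustrative; it does not
# reflect Diffbot's actual implementation.
from collections import defaultdict

class KnowledgeGraph:
    def __init__(self):
        # index triples by subject for fast lookup
        self._by_subject = defaultdict(list)

    def add(self, subject, verb, obj):
        self._by_subject[subject].append((verb, obj))

    def facts_about(self, subject):
        """Return all (verb, object) pairs known for a subject."""
        return list(self._by_subject.get(subject, []))

kg = KnowledgeGraph()
kg.add("Spider-Man", "created_by", "Stan Lee")
kg.add("Spider-Man", "first_appeared_in", "Amazing Fantasy #15")
print(kg.facts_about("Spider-Man"))
```

Indexing by subject is what makes the Spider-Man-style lookup cheap: answering “what do we know about X?” is a single dictionary access rather than a scan over all triples.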
Which moves go with which song? Should I do the Floss, the Dougie or the Robot? Or should I create a new style? But which one? An AI system could help answer these questions in the future. At least the announcement of a social media platform raises this hope: “Facebook AI researchers have developed a system that enables a machine to generate a dance for any input music. It’s not just imitating human dance movements; it’s creating completely original, highly creative routines. That’s because it uses finely tuned search procedures to stay synchronized and surprising, the two main criteria of a creative dance. Human evaluators say that the AI’s dances are more creative and inspiring than meaningful baselines.” (Website FB) The AI system could inspire dancers when they get stuck and help them to constantly improve. More information via about.fb.com/news/2020/08/ai-dancing-facebook-research/.
Imitating the agile locomotion skills of animals has been a longstanding challenge in robotics. Manually designed controllers have been able to reproduce many complex behaviors, but building such controllers is time-consuming and difficult. According to Xue Bin Peng (Google Research and University of California, Berkeley) and his co-authors, reinforcement learning provides an interesting alternative for automating the manual effort involved in the development of controllers. In their work, they present “an imitation learning system that enables legged robots to learn agile locomotion skills by imitating real-world animals” (Xue Bin Peng et al. 2020). They show “that by leveraging reference motion data, a single learning-based approach is able to automatically synthesize controllers for a diverse repertoire [of] behaviors for legged robots” (Xue Bin Peng et al. 2020). By incorporating sample efficient domain adaptation techniques into the training process, their system “is able to learn adaptive policies in simulation that can then be quickly adapted for real-world deployment” (Xue Bin Peng et al. 2020). For demonstration purposes, the scientists trained “a quadruped robot to perform a variety of agile behaviors ranging from different locomotion gaits to dynamic hops and turns” (Xue Bin Peng et al. 2020).
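The core of such a system is a reward that pays the policy for tracking the reference motion. The exponential tracking form below is common in motion-imitation work (for example in Peng et al.'s earlier DeepMimic project); the weight, the pose representation and the function name are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of a motion-imitation reward: the policy is rewarded for
# matching the reference animal's joint angles at each timestep. Weight and
# exponential form are illustrative assumptions, not the paper's exact terms.
import math

def imitation_reward(robot_pose, reference_pose, weight=2.0):
    """Reward in (0, 1]; 1.0 means the robot exactly matches the reference."""
    error = sum((q - q_ref) ** 2 for q, q_ref in zip(robot_pose, reference_pose))
    return math.exp(-weight * error)

print(imitation_reward([0.1, 0.2], [0.1, 0.2]))  # perfect match -> prints 1.0
print(imitation_reward([0.5, 0.2], [0.1, 0.2]))  # mismatch -> lower reward
```

A reinforcement learning algorithm then maximizes this reward in simulation, which is what “leveraging reference motion data” amounts to in practice: the reference clip defines the target pose at each timestep.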
Google is currently working on Meena, a chatbot that is intended to conduct conversations on arbitrary topics and be used in many contexts. In their paper “Towards a Human-like Open-Domain Chatbot”, the developers present the 2.6-billion-parameter end-to-end trained neural conversational model. They show that Meena “can conduct conversations that are more sensible and specific than existing state-of-the-art chatbots”. “Such improvements are reflected through a new human evaluation metric that we propose for open-domain chatbots, called Sensibleness and Specificity Average (SSA), which captures basic, but important attributes for human conversation. Remarkably, we demonstrate that perplexity, an automatic metric that is readily available to any neural conversational models, highly correlates with SSA.” (Google AI Blog) The company draws a comparison with OpenAI’s GPT-2, a model used in “Talk to Transformer” and Harmony, among others, which has 1.5 billion parameters and was trained on the text content of 8 million web pages.
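Perplexity, the automatic metric mentioned in the quote, is simply the exponential of the average negative log-likelihood a model assigns to the observed tokens; lower is better. A minimal sketch, with made-up probabilities for illustration:

```python
# Perplexity from per-token probabilities: exp of the average negative
# log-likelihood. The probability values below are made-up for illustration.
import math

def perplexity(token_probs):
    """token_probs: probability the model assigned to each observed token."""
    avg_neg_log_likelihood = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_likelihood)

print(perplexity([0.25, 0.25, 0.25, 0.25]))  # uniform over 4 options -> perplexity of about 4
```

Intuitively, a perplexity of 4 means the model is, on average, as uncertain as if it were choosing uniformly among four continuations, which is why the metric is “readily available to any neural conversational model”: it falls directly out of the training loss.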
“Alphabet X, the company’s early research and development division, has unveiled the Everyday Robot project, whose aim is to develop a ‘general-purpose learning robot.’ The idea is to equip robots with cameras and complex machine-learning software, letting them observe the world around them and learn from it without needing to be taught every potential situation they may encounter.” (MIT Technology Review, 23 November 2019) This was reported by MIT Technology Review on 23 November 2019 in the article “Alphabet X’s ‘Everyday Robot’ project is making machines that learn as they go”. The approach of Alphabet X seems to be well thought-out and target-oriented. In a way, it is oriented towards human learning. One could also teach robots human language in this way. With the help of microphones, cameras and machine learning, they would gradually understand us better and better. For example, they could observe how we point to and comment on a person. Or they could perceive that we point to an object and say a certain term – and after some time conclude that this is the name of the object. However, such frameworks pose ethical and legal challenges. You cannot simply designate cities as test areas; the result would be comprehensive surveillance in public spaces. Specially established test areas, on the other hand, would probably not offer the same benefits as “natural environments”. Many questions still need to be answered.
Artificial intelligence is spreading into more and more application areas. American scientists have now developed a system that can continue texts: “Talk to Transformer”. The user enters a few sentences – and the AI system adds further passages. “The system is based on a method called DeepQA, which is based on the observation of patterns in the data. This method has its limitations, however, and the system is only effective for data on the order of 2 million words, according to a recent news article. For instance, researchers say that the system cannot cope with the large amounts of data from an academic paper. Researchers have also been unable to use this method to augment texts from academic sources. As a result, DeepQA will have limited application, according to the researchers. The scientists also note that there are more applications available in the field of text augmentation, such as automatic transcription, the ability to translate text from one language to another and to translate text into other languages.” The sentences in quotation marks are not from the author of this blog. They were written by the AI system itself. You can try it via talktotransformer.com.
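The kind of autoregressive continuation the service performs can be illustrated with a toy model. The sketch below extends a prompt one word at a time from bigram statistics; the actual service is backed by OpenAI's GPT-2 (see the Meena item above), not by anything this simple, and the training sentence is a made-up example.

```python
# Toy illustration of autoregressive text continuation: extend a prompt one
# word at a time. "Talk to Transformer" does this with GPT-2's neural
# network; this sketch uses simple bigram statistics instead.
import random

def train_bigrams(corpus):
    """Map each word to the list of words observed directly after it."""
    words = corpus.split()
    model = {}
    for w1, w2 in zip(words, words[1:]):
        model.setdefault(w1, []).append(w2)
    return model

def continue_text(model, prompt, length=8, seed=0):
    rng = random.Random(seed)  # fixed seed for reproducible output
    words = prompt.split()
    for _ in range(length):
        options = model.get(words[-1])
        if not options:  # no known continuation: stop early
            break
        words.append(rng.choice(options))
    return " ".join(words)

model = train_bigrams("the blob sings the aria and the blob sings again")
print(continue_text(model, "the blob"))
```

The principle is the same in GPT-2: the model repeatedly predicts a distribution over the next token given everything generated so far and samples from it; only the predictor is a large neural network instead of a lookup table.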
In October 2019 Springer VS published the “Handbuch Maschinenethik” (“Handbook Machine Ethics”) with German and English contributions. The editor is Oliver Bendel (Zurich, Switzerland). One of the articles was written by Bertram F. Malle (Brown University, Rhode Island) and Matthias Scheutz (Tufts University, Massachusetts). From the abstract: “We describe a theoretical framework and recent research on one key aspect of robot ethics: the development and implementation of a robot’s moral competence. As autonomous machines take on increasingly social roles in human communities, these machines need to have some level of moral competence to ensure safety, acceptance, and justified trust. We review the extensive and complex elements of human moral competence and ask how analogous competences could be implemented in a robot. We propose that moral competence consists of five elements, two constituents (moral norms and moral vocabulary) and three activities (moral judgment, moral action, and moral communication). A robot’s computational representations of social and moral norms is a prerequisite for all three moral activities. However, merely programming in advance the vast network of human norms is impossible, so new computational learning algorithms are needed that allow robots to acquire and update the context-specific and graded norms relevant to their domain of deployment. Moral vocabulary is needed primarily for moral communication, which expresses moral judgments of others’ violations and explains one’s own moral violations – to justify them, apologize, or declare intentions to do better. Current robots have at best rudimentary moral competence, but with improved learning and reasoning they may begin to show the kinds of capacities that humans will expect of future social robots.” (Abstract “Handbuch Maschinenethik”). The book is available via www.springer.com.
Some months ago, researchers at the University of Massachusetts showed the climate toll of machine learning, especially deep learning. Training Google’s BERT, with its 340 million parameters, emitted nearly as much carbon as a round-trip flight between the East and West coasts. According to Technology Review, the trend could also accelerate the concentration of AI research into the hands of a few big tech companies. “Under-resourced labs in academia or countries with fewer resources simply don’t have the means to use or develop such computationally expensive models.” (Technology Review, 4 October 2019) In response, some researchers are focused on shrinking the size of existing models without losing their capabilities. The magazine wrote enthusiastically: “Honey, I shrunk the AI” (Technology Review, 4 October 2019) The advantages concern not only the environment and access to state-of-the-art AI. According to Technology Review, tiny models will help bring the latest AI advancements to consumer devices. “They avoid the need to send consumer data to the cloud, which improves both speed and privacy. For natural-language models specifically, more powerful text prediction and language generation could improve myriad applications like autocomplete on your phone and voice assistants like Alexa and Google Assistant.” (Technology Review, 4 October 2019)
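One common technique for shrinking such models is knowledge distillation: a small “student” network is trained to match the softened output distribution of the large “teacher” (this is, for instance, how DistilBERT was derived from BERT). A minimal sketch of the distillation loss, with made-up logits and temperature; it is not the recipe of any specific paper.

```python
# Minimal sketch of a knowledge-distillation loss: cross-entropy between the
# teacher's and the student's temperature-softened output distributions.
# Logits and temperature are made-up values for illustration.
import math

def softmax(logits, temperature=1.0):
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Smaller when the student's distribution matches the teacher's."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(p * math.log(q) for p, q in zip(p_teacher, p_student))

teacher = [3.0, 1.0, 0.2]
print(distillation_loss(teacher, [2.9, 1.1, 0.3]))  # student close to teacher: small loss
print(distillation_loss(teacher, [0.0, 3.0, 0.0]))  # student far off: larger loss
```

The temperature flattens the teacher’s distribution so the student also learns from the relative probabilities of the “wrong” classes, which is a large part of why a much smaller model can retain most of the teacher’s capability.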