The DALL-E Mini software from a group of open source developers isn’t perfect, but it sometimes comes up with images that closely match people’s text descriptions.
If you’ve been browsing your social media feeds lately, chances are you’ve seen illustrations accompanied by captions. They are now popular.
The images you see were probably powered by a text-to-image program called DALL-E. Before the illustrations are posted, people insert words, which are then converted into images via artificial intelligence models.
For example, a Twitter user posted a tweet that read, “To be or not to be, rabbi with avocado, marble sculpture.” The attached photo, which is quite elegant, shows a marble statue of a bearded man in a robe and bowler hat, holding an avocado.
The AI models come from Google’s Imagen software and from OpenAI, a start-up backed by Microsoft that developed DALL-E 2. On its website, OpenAI calls DALL-E 2 “a new AI system that can create realistic images and art from a natural language description.”
But most of what happens in this area comes from a relatively small group of people who share their images and, in some cases, draw heavy engagement. That’s because Google and OpenAI haven’t made the technology widely available to the public.
Many of OpenAI’s early adopters are friends and relatives of employees. To apply for access, you have to join a waiting list and indicate whether you are a professional artist, developer, academic researcher, journalist, or online creator.
“We are working hard to speed up access, but it will probably take some time to reach everyone; as of June 15, we’ve invited 10,217 people to try DALL-E,” OpenAI’s Joanne Jang wrote on a help page on the company’s website.
One system that is publicly available is DALL-E Mini. It’s based on open source code from a loosely organized team of developers and is often overloaded with demand. Attempts to use it may be greeted with a dialog that reads “Too much traffic, please try again.”
It’s a bit reminiscent of Google’s Gmail service, which in 2004 lured people with 1 gigabyte of free email storage. Early adopters could get in only by invitation at first, leaving millions to wait. Now Gmail is one of the most popular email services in the world.
Creating images from text may never be as ubiquitous as email. But the technology is definitely having a moment, and part of its appeal is in its exclusivity.
Private research lab Midjourney requires people to fill out a form if they want to experiment with its image-generating bot in a channel on the Discord chat app. Only a select group of people use Imagen and post pictures from it.
The text-to-image services are sophisticated: they identify the key parts of a user’s prompt and then guess how best to illustrate those terms. Google trained its Imagen model with hundreds of its internal AI chips on 460 million internal image-text pairs, in addition to external data.
The interfaces are simple. There is generally a text box, a button to start the generation process, and an area below to display images. To identify the source, Google and OpenAI add watermarks to the lower right corner of images from DALL-E 2 and Imagen.
The companies and groups building the software are rightly concerned that everyone will storm the gates at once. Handling web requests to query these AI models can get expensive. More importantly, the models are not perfect and do not always produce results that accurately represent the world.
Engineers trained the models on extensive collections of words and images from the Internet, including photos people posted on Flickr.
OpenAI, based in San Francisco, recognizes the potential for harm that could come from a model that learned how to create images by essentially scouring the web. To address the risk, employees have removed violent content from training data and have filters in place that prevent DALL-E 2 from generating images when users submit prompts that may violate company policies against nudity, violence, conspiracy or political content.
“There is an ongoing process to improve the security of these systems,” said Prafulla Dhariwal, an OpenAI research scientist.
Bias in the results is also important to understand and represents a broader concern for AI. Boris Dayma, a Texas developer, and others who have worked on DALL-E Mini detailed the problem in an explanation of their software.
“Professions with higher levels of education (such as engineers, doctors or scientists) or a lot of physical labor (such as construction) are usually represented by white males,” they wrote. “Nurses, secretaries or assistants, on the other hand, are typically women, often white as well.”
Google described similar shortcomings of its Imagen model in an academic paper.
Despite the risks, OpenAI is excited about the kinds of things the technology can make possible. Dhariwal said it could open up creative possibilities for individuals and help with commercial applications such as interior design or dressing up a website.
The results should continue to improve over time. Launched in April, DALL-E 2 spits out more realistic visuals than the first version OpenAI announced last year, and the company’s text-generation model, GPT, has gotten more sophisticated with each generation.
“You can expect this to happen for many of these systems,” Dhariwal said.