As of today, Google’s AI-powered search experiment, the Search Generative Experience (SGE), is multimodal. On the heels of OpenAI’s release of DALL-E 3 and Microsoft’s Bing Image Creator, Google’s SGE now has its own AI image-generating tool.
The tool is powered by Google’s Imagen text-to-image diffusion models. Users with access to SGE can describe an image they want, and within seconds, SGE will give them four options to choose from. From there, users can edit the description to tweak the result. In Google’s example, the original request is for a whimsical image of a capybara wearing a chef’s hat and cooking bacon. Users can then edit the description to make the capybara cook hash browns instead.
In the AI arms race, or Thunderdome — or whatever you want to call tech giants competing for AI market dominance — multimodality is coveted strategic territory. Multimodality refers to an AI model’s ability to understand and process different types of media, including image and audio.
An AI chatbot conversing with users is one thing, but “seeing,” “hearing,” and producing creative outputs is a whole new level of AI sophistication. OpenAI recently released DALL-E 3, the latest version of its image-generating tool. Microsoft, which is an OpenAI investor, now uses DALL-E 3 for Bing Image Creator. And now, Google is bringing its own version to SGE.
Widespread access to AI image generating tools is not without major concerns, including the spread of misinformation/disinformation and copyright violations. SGE has been trained to block harmful or misleading content that violates Google’s generative AI policy, and it won’t generate any images containing photorealistic human faces. Plus, notable public figures (i.e., celebrities) will be blocked from the image-generation results, preventing potential deepfakes. As an added precaution, the tool is for users who are 18 and older.
Images created by SGE will have metadata and embedded watermarking to indicate they’re AI-generated. Additionally, Google’s Imagen models were trained on publicly available content. Mashable asked Google whether user data from text prompts and generated images is used to train the model. We also asked if there’s an opt-out option. We will update this story when we get a response.
Also new to SGE is the ability to draft written content. This is the same feature available in Bard, Google’s AI chatbot, but it can now be accessed directly within SGE’s search function, saving you the time of switching back and forth between windows.
You can now draft messages directly within SGE.
Image generation and message drafting within SGE are being introduced in English today to users in the U.S.