What the tech?! AI Images: Discovering Dall-E with Benjamin

Make Things, Make Sense Podcast
Season 1 Episode 18

Introduction

What are AI Image Generation tools? How can they impact your business and take it to the next level?

In this episode, Alex is joined by Growth Gurus’ Head of Web, Benjamin, to discuss what AI Image Generation tools are and how they can make a difference in your day-to-day business activities.

Tune in to discover what this new technology is, how it works, what developments to expect from it and how marketers can implement it to enhance their work.

What are AI Image Generation tools?

In a nutshell, it is the reverse of a very common machine learning practice, where the user provides an image and the AI describes what it sees. Make the system work in reverse, and it produces images instead of descriptions!

How these tools work

The first Dall-E used a predictive process to produce renditions that matched the text prompt. However, it was this predictive process that greatly limited realism, and something still felt a little ‘off’. It relied on the GPT-3 approach (Generative Pre-trained Transformer 3), compressing images into a series of tokens (word-like units) and learning to predict what comes next.

This produced some amazing results and showed great promise in 2021, but there was still a disconnect between word-matching and producing aesthetically pleasing results. In short, the first version of Dall-E was largely limited to cartoonish-looking images against simple backgrounds.
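To make the ‘predict what comes next’ idea concrete, here is a deliberately tiny Python sketch. It is an illustration under stated assumptions, not how Dall-E is implemented: the ‘image’ is a short sequence of made-up tokens, and simple bigram counts stand in for the GPT-style Transformer. Generation still works the same way in spirit: one token at a time, each choice based on what came before.

```python
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Count how often each token follows each other token (a toy stand-in for a Transformer)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts, start, length):
    """Greedy autoregressive generation: always append the most likely next token."""
    out = [start]
    while len(out) < length:
        followers = counts.get(out[-1])  # .get avoids creating empty entries
        if not followers:
            break  # no known continuation for this token
        out.append(followers.most_common(1)[0][0])
    return out


# Hypothetical token sequences: a prompt token first, then made-up "image" tokens.
training = [
    ["START", "cat", "sky", "sky", "grass"],
    ["START", "cat", "sky", "grass", "grass"],
    ["START", "dog", "sky", "grass", "grass"],
]
model = train_bigram(training)
print(generate(model, "START", 5))  # ['START', 'cat', 'sky', 'grass', 'grass']
```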

It took only around a year for OpenAI to announce Dall-E 2, and that is what shocked the entire world: we were presented with an iteration of the Dall-E method that seemed almost impossible! Gone were the cartoonish, blurry or simple images; enter photorealism, speed and editing power! The key point here is how little time it took for such significant progress.

Dall-E 2 can not only absorb very detailed (and sometimes strange) text prompts, but also produce images with much better efficiency, quality and accuracy than ever before! It does so with a high level of aesthetics and context, turning complex prompts into visual representations using OpenAI’s CLIP model (Contrastive Language-Image Pre-training). CLIP was originally designed to look at images and summarize their content; OpenAI inverted this process to create unCLIP, a model that starts with the description and creates an image!

Dall-E 2 generates images at 1024 x 1024 pixels, a substantial increase in resolution over the previous model’s 256 x 256.
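To put that jump into numbers: quadrupling each side multiplies the total pixel count by sixteen. A quick check in Python:

```python
def pixel_count(width: int, height: int) -> int:
    """Total number of pixels in a width x height image."""
    return width * height


dalle1 = pixel_count(256, 256)    # 65,536 pixels
dalle2 = pixel_count(1024, 1024)  # 1,048,576 pixels

print(f"Dall-E 1: {dalle1:,} px, Dall-E 2: {dalle2:,} px")
print(f"Increase: {dalle2 // dalle1}x the pixels, {1024 // 256}x per side")
```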

CLIP, unlike GPT, is contrastive rather than predictive: it learns HOW RELATED any given prompt or caption is to an image, which allows it to link textual and visual representations of the same abstract concept.
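The contrastive idea can be sketched in plain Python, assuming we already have embedding vectors for a small batch of images and their captions. The hand-written 3-number vectors below are made up; real CLIP embeddings come from trained image and text encoders and are far larger. Every image is scored against every caption with cosine similarity, and a symmetric cross-entropy loss rewards each true pair (image k with caption k) for scoring higher than the mismatched ones. The temperature value is an illustrative assumption.

```python
import math


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_matrix(image_embs, text_embs):
    """Cosine similarity between every image (rows) and every caption (columns)."""
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(i, t)) for t in txts] for i in imgs]


def cross_entropy_rows(logits):
    """Mean cross-entropy where the correct class for row k is column k."""
    total = 0.0
    for k, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[k]
    return total / len(logits)


def clip_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric contrastive loss over the image->text and text->image directions."""
    sims = cosine_matrix(image_embs, text_embs)
    logits = [[s / temperature for s in row] for row in sims]
    transposed = [list(col) for col in zip(*logits)]
    return (cross_entropy_rows(logits) + cross_entropy_rows(transposed)) / 2


# Made-up embeddings: image k is meant to match caption k.
images = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
captions = [[0.9, 0.2, 0.1], [0.1, 0.8, 0.1], [0.0, 0.1, 0.9]]

sims = cosine_matrix(images, captions)
best = [row.index(max(row)) for row in sims]
print("best caption per image:", best)
print("loss (matched):", round(clip_loss(images, captions), 4))
print("loss (shuffled):", round(clip_loss(images, captions[::-1]), 4))
```

The matched pairs give a lower loss than the shuffled ones. Training pushes real image and text encoders toward exactly that behaviour, which is what lets CLIP judge how related a caption is to an image.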

The image creation process

You might be surprised to hear that every image Dall-E creates starts off as a ‘canvas’ of noise. It samples random Gaussian noise and then uses another model, called GLIDE, to gradually strip away the noise and fill in detail until the image matches the text prompt!
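The ‘start from noise and gradually remove it’ loop can be sketched in a few lines, under heavy assumptions. The ‘image’ here is just a short list of numbers, and instead of a trained, text-conditioned network like the GLIDE-based model mentioned above, a hypothetical oracle always predicts the clean target, so only the shape of the sampling loop is illustrated. The update is a deterministic DDIM-style step with a simple linear noise schedule chosen for illustration.

```python
import math
import random


def make_schedule(steps, final_signal=1e-4):
    """alpha_bar[t]: how much of the clean image survives at step t (1.0 = fully clean)."""
    return [1.0 - (t / steps) * (1.0 - final_signal) for t in range(steps + 1)]


def oracle_predict_clean(x_t, t, target):
    """Hypothetical stand-in for a trained, prompt-conditioned denoiser."""
    return list(target)


def sample(target, steps=50, seed=0):
    rng = random.Random(seed)
    alpha_bar = make_schedule(steps)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from pure Gaussian noise
    history = [x]
    for t in range(steps, 0, -1):
        x0_hat = oracle_predict_clean(x, t, target)
        a_t, a_prev = alpha_bar[t], alpha_bar[t - 1]
        # Noise the "model" believes is present in x at step t.
        eps_hat = [(xi - math.sqrt(a_t) * ci) / math.sqrt(1.0 - a_t)
                   for xi, ci in zip(x, x0_hat)]
        # Step toward the clean estimate, keeping less of that noise each time.
        x = [math.sqrt(a_prev) * ci + math.sqrt(1.0 - a_prev) * ei
             for ci, ei in zip(x0_hat, eps_hat)]
        history.append(x)
    return x, history


def distance(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


target = [0.2, -0.5, 0.9, 0.0]  # made-up "pixels" the prompt should lead to
final, history = sample(target)
print("start distance:", round(distance(history[0], target), 3))
print("final distance:", round(distance(final, target), 6))
```

In a real system the oracle is replaced by a neural network that only estimates the clean image from the noisy one and the prompt, which is why many steps are needed and why different random starting canvases lead to different images.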

Dall-E 1 was never released publicly; the closest alternative is Craiyon (previously known as Dall-E Mini), which I highly recommend trying! It really shows how powerful these AI tools can be and is guaranteed to leave you awestruck. After that, I recommend checking out the official Dall-E site and the examples posted there; if you have not seen them already, you are guaranteed to be in amazed disbelief!

On the topic of ownership and copyright: “Starting today, users get full usage rights to commercialize the images they create with DALL-E 2, including the right to reprint, sell, and merchandise.” (OpenAI, July 20th 2022, https://openai.com/blog/dall-e-now-available-in-beta/)

Available Options

Google’s Imagen: a text-to-image generator able to understand even more complex language!

Imagen outperforms Dall-E 2 and a few other generators: on DrawBench, a benchmark for comparing text-to-image models, human raters strongly preferred Imagen over the other methods!

Currently Dall-E 2 and Imagen are not openly available due to concerns about misuse.

Nvidia’s AI: StyleGen was able to generate images from text, and Nvidia also had a sketch-to-image model that enabled photorealistic image generation, especially when paired with segmentation maps, which let the user specify regions of the sketch. This was back in 2021, and it mostly focused on landscapes.

Their new StyleGen 3 can now generate faces in real time as the user is sketching! The user does not even need to be gifted at drawing or possess any particular skills to produce good-quality output.

StyleGen also allows you to edit portraits of people: you can change their pose, style or background, or apply shape-shifting and warping effects.

Not only that, StyleGen can also take existing images and perform something called Domain Transfer. This means you can take an image of a beach and beach-goers and, using a text prompt, turn the sand into snow and the sea into Fanta, and it will do so at an unprecedented rate.

StyleGen is quite exciting because Nvidia has a track record of turning research into products. For example, Audio2Face, which dates back to 2017 research and uses AI to generate full facial motion and lip sync from just an audio track, has been available within the NVIDIA Omniverse ecosystem since March of this year (2022).

What does the future of design look like with these tools? They render realistic-looking images from increasingly complex prompts at ever-higher quality, so much so that photorealism is already within our grasp, and the sky is the limit when it comes to creativity.

Businesses and business owners are wondering: can design be done faster, cheaper and automatically?

Designers are also wondering whether this puts their jobs at risk: is their profession still viable? What are the long-term implications of these AI tools?

These AI tools can produce high-quality stock photography at hyper speed, without complicated briefs, discussions or “human” limitations. The interest and concerns on both sides are warranted, and there are a few opinions on what the near future might look like.

Use Cases

1. Using Dall-E as a viable substitute for stock photography

George Baily, Product Manager of FintechOS: “As an individual wanting a series of interesting illustrations in a particular style or having ten ideas for a new logo or a quick banner ad for my social media campaign, I don’t want to have an emotional discussion with my designer about their creativity. It doesn’t need to be perfect, it just needs to be. These images can be created at a hyper-speed, scale and quantity that humans are just not built to process.”

Our take: AI tools will likely replace some production work in certain organizations, depending on the kind of design being done. However, businesses will very likely still need to pair the tools with experts to get the most out of them. Someone without a design background will be limited by their understanding of design methodologies, concepts and best practices.

OpenAI has stated that its customers have come forward with plans to use the Dall-E tool “for commercial projects, like illustrations for children’s books, art for newsletters, concept art and characters for video games, moodboards for consulting, and storyboards for movies” (https://openai.com/blog/dall-e-now-available-in-beta/).

It could do wonders for people who cannot afford to hire a graphic designer, or for those who want to prototype their product ideas with quickly generated images.

However, the tool still has limitations: its output is limited to raster (pixel-based) images, such as JPG or PNG files, rather than vector formats, which is not ideal for assets like logos and icons.
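To see why raster output is awkward for logos and icons, here is a small sketch with a circle standing in for a logo shape (the sizes are arbitrary assumptions, and real vector formats such as SVG describe shapes with paths rather than a single circle). A vector description, here just a centre and radius, can be redrawn sharply at any size, whereas an enlarged raster image can only repeat its existing pixels, which produces blocky edges.

```python
def render_circle(size, radius_frac=0.4):
    """Rasterize a circle described as vector data (centre + radius) at any size."""
    c, r = size / 2, size * radius_frac
    return [[1 if (x + 0.5 - c) ** 2 + (y + 0.5 - c) ** 2 <= r * r else 0
             for x in range(size)] for y in range(size)]


def upscale_nearest(img, factor):
    """Enlarge a raster image by repeating each pixel factor x factor times."""
    return [[img[y // factor][x // factor] for x in range(len(img[0]) * factor)]
            for y in range(len(img) * factor)]


small = render_circle(16)             # a low-resolution raster "logo"
enlarged = upscale_nearest(small, 8)  # 128 x 128, but only 16 x 16 worth of detail
rerendered = render_circle(128)       # the vector shape drawn directly at 128 x 128

diff = sum(a != b for ra, rb in zip(enlarged, rerendered) for a, b in zip(ra, rb))
print(f"{diff} of {128 * 128} pixels differ between the enlarged raster and the re-rendered vector")
```

Every differing pixel sits along the edge of the shape, which is exactly where an enlarged raster logo looks jagged.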

Just as you can find many amazing results from Dall-E, you can just as easily stumble upon failed images that expose the tool’s limitations.

2. Graphic designers will use Dall-E as a new tool for inspiration or to optimize their workflows.

Some are looking forward to automating as much of their tedious work as possible, so they can focus on bigger, more important things.

Others see it as a great inspirational tool for mock-ups and mood boards, and are already using it as such.

It is more likely that what we know today as the design process will remain in the hands of design specialists and experts, who will use AI tools like Dall-E and StyleGen’s Sketch-to-Image as superpower tools.

This is super exciting for the many designers who share this perspective, but they also recognize the limitations: the image tools mainly focus on photorealistic and creative renditions of scenery and objects rather than structural layout elements, such as landing page designs or website wireframes. The Dall-E system, for example, has trouble with borders and counting.

Fields like strategic brand design, as of today, do not seem to be impacted by these tools – yet.

The biggest current strength of Dall-E and other AI image rendering tools seems to be abstract and photorealistic artwork, especially conceptual pieces.

What are a few exciting ways marketers can start to implement the use of these tools?

8 ways marketers could use Dall-E 2:

  1. Unique, compelling images for blog posts, ebooks, videos, and podcast episode listings.
  2. Unique, compelling images for website pages and landing pages.
  3. Visuals for digital or print brand collateral used internally and externally.
  4. Visuals that help describe complex information, products, or services across all digital assets.
  5. Eye-catching visuals that stand out in advertising creative.
  6. Mockups to brainstorm branding, campaign ideas, video scripts, or commercials.
  7. Mockups or final versions of logos.
  8. Mockups to inspire and guide human designers on more complex visual generation projects.
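For marketers who want to try this programmatically, here is a minimal sketch. It assumes you have access to OpenAI’s Images API through the official `openai` Python package (version 1.x) with an `OPENAI_API_KEY` environment variable set; the prompt templates and the `build_prompt` helper are hypothetical illustrations, not an OpenAI feature. Only the helper runs without network access, and the API call is guarded so it happens only when the script is run directly with a key available.

```python
import os

# Hypothetical prompt templates for a few of the use cases listed above.
TEMPLATES = {
    "blog_header": "A {style} illustration for a blog post about {topic}, wide composition, no text",
    "ad_visual": "An eye-catching {style} image advertising {topic}, bold colors, no text",
    "moodboard": "A {style} moodboard exploring visual ideas for {topic}",
}


def build_prompt(use_case: str, topic: str, style: str = "digital art") -> str:
    """Fill in a template; raises KeyError for an unknown use case."""
    return TEMPLATES[use_case].format(topic=topic, style=style)


def generate_image(prompt: str, size: str = "1024x1024") -> str:
    """Request one Dall-E 2 image and return its URL (needs network access and an API key)."""
    from openai import OpenAI  # imported here so build_prompt works without the package

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.images.generate(model="dall-e-2", prompt=prompt, n=1, size=size)
    return response.data[0].url


if __name__ == "__main__":
    prompt = build_prompt("blog_header", "AI image generation for small businesses")
    print(prompt)
    if os.environ.get("OPENAI_API_KEY"):
        print(generate_image(prompt))
```

For the `dall-e-2` model, `size` accepts `256x256`, `512x512` or `1024x1024`; smaller sizes are handy for cheap first drafts of mockups before committing to a full-resolution version.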
