Disclaimer
I'm not currently affiliated with the companies mentioned in this review. This review, and the motivation to do it, is my own. I'm sharing my experience in case it may be of interest to others.
This review uses an early, limited-access version of a new technology developed by OpenAI called DALL-E 2.
The technology evolves very quickly, so this review is relevant as of June 2022.
None of the images in this review are real (unless specified otherwise). They were created in real time from my text prompts by the AI-based DALL-E 2 technology.
What is this review about?
There is a growing interest in the development of generative technologies using AI models to produce text, image, and music content. These technologies use text input to describe the intended output. These text inputs are currently called "prompts".
One of the current leaders in these technologies is OpenAI. As of today, they have a text generator called GPT3, and they just released limited access to their text-to-image generator, called DALL-E 2.
Many images generated by DALL-E had already been shared on social media and I really wanted to get my hands on it and experience it for myself.
I was just granted early access to test DALL-E 2, and this review covers my experience playing with it for about 3 days.
DALL-E description: https://github.com/openai/dall...
OpenAI website: https://openai.com/
My angle
It is important to share a bit of my background to put this review into perspective. I have long experience in productizing cutting-edge digital media technologies, and I was once Principal Designer for Adobe.
As a hobby, I'm also a fine art photographer with experience in discussing and reviewing images (I was president of the Palo Alto Camera Club for several years) and I'm competing internationally with my own images.
I have been coached in the art business by one of the best coaches in the industry, and I started my career with a master's degree in product design.
It's with this background that I'm writing this review.
First 5 minutes impressions
Impressive.
I used the DALL-E 2 web-based playground interface. There is a text field to enter a text "prompt" which is where one describes the intended image to be created. Unlike GPT3, there are no parameters exposed to influence the output. Everything goes through the text prompt.
Very first prompt (without thinking too much about it):
a chalet on a snowy mountain at dusk
Response:
I was very impressed with the result of my first prompt. The experience right out of the "box" is mind-blowing.
The positive
- DALL-E returns 6 different images (I was expecting just one)
- The variety of images is impressive
- The interpretation of the text prompt is, in my opinion, quite accurate
- The composition, colors, angle, luminosity, etc. are spot on
- Some elements in the images serve as a focal point
- It only takes about 20 seconds for DALL-E 2 to generate the images after submitting the text prompt
The negative
- The images are generated in a square aspect ratio. I assume this is an arbitrary decision for the preview technology and that in the future the aspect ratio and orientation will be parameters.
- The image resolution is currently 1024x1024 pixels. I also assume that in a commercial version the output resolution will be customizable.
How I organized the test
There are most likely many different strategies for testing such a technology. My interest was to evaluate where the technology shines and where it falls short. I'm interested in understanding the underlying capabilities, which would inform the type of commercial applications and products that could eventually be developed on top of the API.
I'm also interested in understanding whether the technology can generate commercial-grade output (ready to use as is), whether the generated content requires editing, or whether it serves as conceptual inspiration for diverse creative professionals.
I was also interested in testing the output in different creative industries such as:
- Graphic and layout design (posters, book covers, etc)
- Industrial design (3D rendering of products)
- Architecture (in-situ renderings of architectural designs)
- Logotype design
- Packaging design
- Fine art creative and abstract photographic creations
I'm not interested in:
- So-called generative Art (like the "astronaut riding a horse")
- Anything that is related to the so-called NFT Art
Because I have been playing a lot with the text-generating technology GPT3 (also from OpenAI), I included a text description generated by GPT3 from the same prompt used in the DALL-E tests.
Each of the following tests includes:
- Test (what I wanted to test)
- Prompt (the text input I used to test)
- Output (the grid of six images returned by DALL-E)
- Prompt matching (how well the output and elements follow the text input)
- My pick (my personal choice out of the six generated images and the reason)
- Rendering (the technical execution of the image highlighting the inclusion of visual techniques as well as any visual anomalies)
- Composition (how the elements use the space and their respective placement)
- Strength (what is the most valuable aspect of the image)
- Commercial readiness (can the image be used as is)
- GPT3 text description (a text description of the image generated by GPT3 using the same prompt used with DALL-E)
- Comments (my personal comments)
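For readers who want to run a similar evaluation, the checklist above could be captured as a small record type. This is only a sketch of one possible structure, not something I used in the review: the class name `ReviewEntry`, its field names, and the `missing_fields` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReviewEntry:
    """One evaluated prompt, mirroring the checklist above (hypothetical names)."""
    test: str                      # what I wanted to test
    prompt: str                    # the text input given to DALL-E
    output_images: List[str] = field(default_factory=list)  # e.g. file names of the six images
    prompt_matching: str = ""
    my_pick: str = ""
    rendering: str = ""
    composition: str = ""
    strength: str = ""
    commercial_readiness: str = ""
    gpt3_description: str = ""
    comments: str = ""

    def missing_fields(self) -> List[str]:
        """Return the names of the checklist items that are still empty."""
        return [name for name, value in vars(self).items() if not value]


entry = ReviewEntry(
    test="Quality of a landscape photograph",
    prompt="a chalet on a snowy mountain at dusk",
)
print(entry.missing_fields())
```

Keeping every test in the same shape makes it easy to spot which criteria have not been filled in yet for a given prompt.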
Creating vs. editing: the current preview version of DALL-E allows one to generate images from scratch based on the text prompt only, and also to edit an image that one uploads or an image created by DALL-E.
What DALL-E calls an edit is the ability to paint over an area of an image and ask DALL-E to fill in that area with generated content (using the same text prompt mechanism).
I have focused this test on the 100% generative algorithm and only included limited examples of such editing.
GPT3 prompt: I used the following GPT3 prompt template to compare the text output of GPT3 with the image output of DALL-E (the prompt placeholder being the same as the one used in DALL-E):
Describe the image of "{prompt-goes-here}" in detail. 1. What are the elements in the image? 2. What is the composition like? 3. What are the colors? 4. What makes the image interesting?
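As an illustration, filling this template for each DALL-E prompt could be scripted along the following lines. This is a minimal sketch: the function name `build_gpt3_prompt` is hypothetical, and it only builds the text to paste into GPT3. It does not call any OpenAI API.

```python
# The template quoted above, with its original placeholder kept verbatim.
GPT3_TEMPLATE = (
    'Describe the image of "{prompt-goes-here}" in detail. '
    "1. What are the elements in the image? "
    "2. What is the composition like? "
    "3. What are the colors? "
    "4. What makes the image interesting?"
)


def build_gpt3_prompt(dalle_prompt: str) -> str:
    """Insert the DALL-E prompt into the GPT3 template (hypothetical helper)."""
    # Plain string replacement avoids surprises if the prompt contains braces.
    return GPT3_TEMPLATE.replace("{prompt-goes-here}", dalle_prompt.strip())


print(build_gpt3_prompt("a chalet on a snowy mountain at dusk"))
```

Using the exact same prompt string for both tools keeps the text and image outputs comparable.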
Photograph
Test | Very first test, using an almost random prompt. How does the quality of a generated landscape photograph compare to actual photographs? |
Prompt | a chalet on a snowy mountain at dusk |
Output | |
Prompt matching | The images effectively match the prompt. The included elements are aligned with the expectation. The images also include elements not specified in the prompt, such as trees, rocks, and sky/cloud elements. None of the variations include a road, other constructions, power lines, fences, snow tracks, or people. Not that these would be expected, but they could have been part of the variations. |
My pick | |
Rendering | The quality of the rendering varies across the six images. For example, one of the chalets looks more like a placeholder texture than an actual chalet. This could be acceptable if a specific "artistic" filter were applied to it. As a photographic rendering compared to stock images, it's not usable. Other images have a nice rendering of the elements and their connections, like snow, snow on rock, trees, snow on trees, etc. |
Composition | The compositions are excellent in my opinion. Sure, they lack the unique creative signature that a landscape or architectural photographer could bring. The six images offer very different compositions, all of them "correct": different points of view, foregrounds, backgrounds, and placements of the chalet. The chalets are all placed so as to be the main subject while only using an eighth to a quarter of the image. All the lines created by the geometry of the chalet and the surrounding elements play nicely together without basic visual errors. |
Strength | |
Weakness | |
Commercial readiness | The image I picked could be used as is for placeholder images or, depending on the use, for direct non-professional applications. Stock images from the usual suspects like iStock, Unsplash, or Adobe offer images of a much higher rendering quality (well, sure, they are actual photographs). |
GPT3 description | 1. The image contains a chalet on a snowy mountain at dusk. The chalet is a small, wooden house with a pitched roof. It is surrounded by evergreen trees and has a small porch with a railing. The mountain is covered in snow and the sky is a deep blue. 2. The composition is simple, with the chalet in the center and the mountain in the background. The colors are muted and the light is soft. 3. The colors in the image are blue, white, and brown. 4. The image is interesting because of the contrast between the dark mountain and the light chalet. The image is also calming and serene. |
Comments | The results are very impressive for images generated on the fly by an AI engine from a single text prompt. These images are photorealistic and could fool an untrained eye. Although the compositions are excellent, the rendering quality in some places would be a blocker for me to use them in a professional application. |
Logotype
Test | How DALL-E responds to logotype design prompts. |
Prompt | a logo for an AI-based solar energy company in California |
Output | |
Prompt matching | The results do look like logos. Bonus point for that. The content is missing actual words (possibly due to the vagueness of my prompt) but it includes what ends up being lorem ipsum text. Elements from the prompt that are missing from the output include "California" and references to AI. The reference to solar power is present through the use of yellowish colors and round shapes, but that would be considered quite limited. |
My pick | |
Rendering | I find the rendering relatively high quality except for a slightly jittery outline. The pseudo text is surprisingly good, with consistent tracking and attributes (font, weight, case, etc.). Some logos use flat colors, while others use gradient colors, which is an interesting variation. Here is a detail view of a reasonable rendering: The rendering quality is well within what is acceptable for an early concept or draft of a logo design. However, the rendering of a logo needs to be much crisper (generated from a vector-based source) to be used in professional applications, including web-based low-resolution applications. |
Composition | The compositions are mostly right (IMO) and consistent with standard logo designs. That is: a sense of simplicity, centered elements, the relative placement of the image and the typography, and the use of space between elements. |
Strength | |
Weakness | |
Commercial readiness | Not at all. |
GPT3 description | 1. The image contains a stylized sun with rays emanating from it. The sun is yellow and the rays are orange. 2. The composition is simple and symmetrical. 3. The colors are warm and inviting. 4. The image is perceived as positive and optimistic. 5. The image is interesting because it is a unique take on a solar energy company logo |
Comments | At first glance, the result is impressive. DALL-E returns what look like six logos, as requested. Looking into the details, the logos are not usable as is and lack interpretation. However, the ability to quickly generate several concepts can be useful as an initial source of inspiration for a creative professional. Note: I added the name of a fictitious company to the prompt: a logo for an AI-based solar energy company named "CaliSun" based in California, but the results are much worse, degrading the elements, the composition, and even the relevance. |
Industrial Design
Test | Can DALL-E generate 3D concepts of products based on a prompt description? |
Prompt | the industrial design realistic rendering of a light blue vacuum robot with orange sides with a rectangular shape |
Output | |
Prompt matching | This one could be considered a total miss or a very interesting one. I initially expected a home vacuum (iRobot) in a square shape. But the prompt could also be intended to get inspiration or concepts for a new project. I'm going for four stars... I have a hard time judging the content of these images. I think they would provide some inspiration for the forms and shapes. Trying to guess what these machines do and how they do it could also help designers explore new ideas. |
My pick | |
Rendering | The quality of the rendering would be very acceptable for early 3D concepts or sketches. There are almost no anomalies in the rendering, or the connections between parts. The integrity of the 3D objects is actually surprisingly good. In some of the renderings, there is even a consistent source of light and corresponding light reflections on the different parts of the model. I'm also (positively) surprised with the consistency across the six versions. They share the same rendering treatment, background, color values, scale, and textures. Note that they do not have shadows, but that might be a matter of reformulating the prompt. |
Composition | The compositions are right on. The perspective, the angle, the distance, etc are consistent with 3D model sketching. Only one of the six renderings is a close-up view, but that is not an issue and prompt variations might help specify viewing parameters (slicing, technical drawing, closeup views, surrounding elements, background, etc). |
Strength | |
Weakness | |
Commercial readiness | Not as final products, but as a source of creative inspiration. Or maybe as imaginary contraptions like the one once found in the Teletubbies show (Noo-Noo anyone?) |
GPT3 description | 1. The image contains a light blue vacuum robot with orange sides. The robot has a rectangular shape. 2. The composition of the image is simple, with the focus on the vacuum robot. The colors are eye-catching and contrast well with each other. 3. The colors in the image are light blue and orange. The light blue represents the vacuum robot, while the orange represents the sides of the robot. The colors are bright and vibrant. 4. The image is perceived as being calm and serene. The light blue color of the vacuum robot is soothing, while the orange sides add a touch of excitement. 5. The image is interesting because it is a realistic rendering of a vacuum robot. The colors are realistic and the composition is simple. |
Comments | The generated models lack substance or functional integrity. Their constructs are very decent and are made of plausible shapes and 3D elements, nicely composed and assembled. They could be used as imaginary contraptions in a cartoon, though. Here is a different prompt, also about orange and blue vacuums: a light blue vacuum robot with orange dots in the shape of an iphone |
Mockups
Test | There is a category of images called "mockups". They are used by creative professionals to present a product. For example, a smartphone with a blank screen so that you can add a screenshot of your application. I have been using living room mockups to present my photography. It usually takes me time to find a mockup to fit a given artwork of mine. This test is to see if DALL-E can be used to generate mockups. |
Prompt | a modern living room with a large art wall above a light blue mid-century sofa, some cacti, and a ray of light coming from a side window |
Output | |
Prompt matching | This result matches the prompt quite well. It would be difficult to find that many mockups matching the criteria among traditional stock images. For instance, there would be few images available with cacti in them. The elements in the image follow the prompt and consistently follow the prompted style. Some elements not specified in the prompt have been included (carpet, lighting). |
My pick | |
Rendering | The quality of the rendering is excellent. The lighting effects are extraordinary, knowing that all these images are generated not from 3D models but by a machine learning algorithm. The consistency of the color across the elements is right on and helps create a homogeneous scene. |
Composition | The compositions are just right, as one would expect. The placement of the elements, the balance of the empty space, the variety of full versus partial inclusion of elements (full sofa vs half sofa, full wall art vs half wall art, etc) is what one would find in an interior architecture photographic portfolio. |
Strength | |
Weakness | |
Commercial readiness | I might use them for everyday social media placement. I would not use them for high-visibility projects (magazines, exhibits, etc). |
GPT3 description | 1. The image contains a large art wall with a light blue mid-century sofa in front of it. There are also some cacti and a ray of light coming from a side window. 2. The composition is fairly simple, with the main focus being the art wall and the sofa. The cacti and the ray of light help to add some interest and depth to the image. 3. The colors in the image are fairly muted, with the exception of the blue sofa. The art wall is mostly white with some black and gray elements. The cacti are green and the ray of light is a bright white. 4. The image is perceived as calm and relaxing. The blue sofa is the only pop of color, and it stands out against the muted background. 5. The image is interesting because it is a simple, yet effective, composition. The colors are also well-balanced and help to create a calming atmosphere. |
Comments | Here is a successful variation of the prompt to get a washed-out series of images: a modern high key overexposed living room with a large art wall above a light blue mid-century sofa, some cacti, and a ray of light coming from a side window. Using DALL-E to generate advanced backgrounds is definitely a viable application. Because the main focus of the final image is the high-quality content provided by the user, there is less demand for the generated content to be pixel perfect. Here is how I would integrate my own photograph (which I designed as part of a recent series of abstracts). |
Architecture
Test | Can DALL-E generate design boards to help explore ideas and support a conversation between architects and clients? |
Prompt | an eichler home on the beach at sunset |
Output | New prompt: a mid-century "Eichler Home" on the beach at sunset. Output: |
Prompt matching | The resulting images are not "Eichler Homes". In this particular example, I have to give three stars. I'm fully aware that this might be improved by changing the prompt and could be due to the limited amount of training material available compared to other visual topics. The house is surrounded by vegetation one would expect to find at the seaside, the sand is nicely mixed with natural ground cover, and rocks can be seen in the background at the edge of the water. The sunset is materialized in the glass reflections. |
My pick | |
Rendering | The rendering doesn't have any major anomalies that an untrained eye could catch. Details are approximate all around and look more like filler texture than meaningful details. The rendering of the sunset as a reflection in the glass wall of the house is powerful. The most important rendering mistake is that there are actually two sources of light in this image (two suns): one behind the viewer, seen in the window reflection, and one coming from behind the house toward the viewer. Only an extreme HDR photograph could get to the final image generated by DALL-E. So, it's both a masterpiece of exposure and an anomaly. |
Composition | The composition is superb for a generated image. The house sits away from the ocean on a slightly elevated bed and faces the ocean (which anyone would expect, but it was not specified in the prompt). The placement of the trees, the rock, the sand, and the ground cover feels very natural. The sand forms a path leading the viewer to the house. That is a very powerful composition. The background shows a hazy sky over the ocean, creating a non-distracting contrast to the house. All the elements are in proportion. |
Strength | |
Weakness | |
Commercial readiness | I believe that an architect could use such a tool in real time to help sketch ideas with their client, updating the prompt to alter some elements and create a concept board. |
GPT3 description | 1. The image shows an Eichler home on the beach at sunset. The house is surrounded by palm trees and has a pool in the front. The sky is a beautiful orange color and the sun is setting behind the house. 2. The composition of the image is very pleasing. The house is in the center of the frame and the palm trees are framing it nicely. The colors are also very beautiful and make the image very eye-catching. 3. The colors in the image are very warm and inviting. The orange sky is beautiful and the sun setting behind the house makes it even more special. 4. The image is perceived as very calming and relaxing. The colors and the setting make it seem like a perfect place to escape from the hustle and bustle of everyday life. 5. The image is interesting because it is a unique perspective of an Eichler home. Most people are not familiar with this type of architecture, so it is interesting to see it in such a beautiful setting. Very interesting that GPT3 says "The image is interesting because it is a unique perspective of an Eichler home. Most people are not familiar with this type of architecture", which seems to make the point that the DALL-E training corpus also lacks sources. Also interesting that GPT3 insists on including a swimming pool... |
Variations and edits | The DALL-E playground offers two features that should be mentioned even if they are not the focus of my testing. - Variation: the ability to select one of the six images returned by DALL-E and ask for five more variations. - Edit: the ability to brush out an area of an image (either generated by DALL-E or uploaded by the user) and use a prompt to instruct DALL-E to fill in the erased area. Here is a "variation" of the image I picked above: Here is an "edit" of one of the variations. I wanted to include a Giacometti sculpture in the left foreground. I used the brush from the DALL-E interface and very roughly marked the destination area. Prompt: a Giacometti sculpture of a man and a woman holding hands facing to the right. the man raises his arm to welcome the visitors |
Comments | I believe that there is a place for generative content in architecture, interior design, and landscape design as an aid or support for the architects when working with their clients in the early phases of a project for rapid ideation and directions. |
Retail packaging
Test | Can DALL-E generate designs for retail packaging (snacks, cereals, soap, etc)? |
Prompt | a box of fantastic cheez-it to promote a harry potter movie with harry potter characters on it |
Output | |
Prompt matching | On one hand, the resulting images follow the prompt in that they look and feel like a fantastic, playful Harry Potter snack box, but... the images do not have a specific reference to the cheez-it brand or visuals of snacks, and the characters do not exactly represent Harry Potter's gang. All text elements are lorem ipsum. Still very good in my opinion. |
My pick | |
Rendering | The rendering is sketchy at best. It's a very good concept sketch with low rendering fidelity. As a final product, it misses many of the marks. Maybe it can be corrected by a better prompt. Maybe it's a current limitation of the generative engine. |
Composition | The composition is superb for a generated image. It is not going to win an award (does any cereal box design?) but follows what seems to look like a co-branded consumer retail packaging. The placement of the characters looking down to the center of the box, the placement of the typographic elements, the use of the background, etc all fit in proportion and balance. |
Strength | |
Weakness | |
Commercial readiness | These images cannot be used as is by a long shot. They could be used as an aid for designers and marketers. |
GPT3 description | 1. The image contains a box of cheez-its with Harry Potter characters on it. The characters on the box are Harry Potter, Ron Weasley, and Hermione Granger. 2. The composition of the image is fairly simple. The box of cheez-its is the main focus, with the characters on the box being secondary. The colors are fairly muted, with the exception of the yellow and red of the cheez-its box. 3. The colors in the image are fairly muted. The cheez-its box is yellow and red, while the characters on the box are mostly gray and white. 4. The image is perceived as being fairly calm and relaxed. The muted colors contribute to this feeling. 5. The image is interesting because it is a simple way to promote a Harry Potter movie. The characters on the box are recognizable, and the image is likely to catch the attention of Harry Potter fans. |
Comments | DALL-E keeps up with the promise and continues to show its versatile ability to keep up with different visual domain spaces. In this packaging application, DALL-E could be a very useful aid for designers asked at 6PM to come up with 10 concepts by 8AM. It can also be very useful for rapidly collecting consumer feedback on such concept directions. |
Technical diagram
Test | Can DALL-E generate a technical diagram? |
Prompt | A technical diagram for a 2nd grader showing the law of gravity using an elastic band and an apple |
Output | |
Prompt matching | I doubt that a 2nd grader will understand gravity with any of these diagrams. Two stars for including an apple and what may look like a rubber band, and for looking like a simple diagram. |
My pick | I'll pass |
Rendering | The rendering uses the visual syntax found in 2nd grader technical illustrations. There are text labels, but they are lorem ipsum. |
Composition | None of the diagrams fit within the canvas. They are cropped, an unacceptable design decision for a diagram. |
Strength | |
Weakness | |
Commercial readiness | Nope |
GPT3 description | 1. The image contains a diagram of the low of gravity using an elastic band and an apple. The diagram is simple and easy to understand. 2. The composition of the image is straightforward and easy to follow. 3. The colors in the image are bright and vibrant. 4. The image is perceived as being helpful and informative. 5. The image is interesting because it is a clear and concise explanation of the low of gravity. |
Comments | Certainly one of the least applicable uses of DALL-E. Better prompt design might help, but I seriously doubt it. Happy to be proven wrong. |
Diversity bias?
Test | Does DALL-E show diversity with respect to human ethnicity and gender? |
Prompt | A close up view of a happy person eating a furry blue apple |
Output | |
GPT3 description | 1. The image contains a close up view of a person eating a furry blue apple. 2. The composition is simple, with the focus on the person and the apple. The background is blurred, making the person and the apple stand out. 3. The colors in the image are blue, white, and brown. The blue apple stands out against the white background, and the brown fur adds texture and interest. 4. The image is perceived as happy and fun. The close up view of the person eating the apple makes them look happy and content, and the blue apple is a fun and unusual color. 5. The image is interesting because it is a close up view of a person eating a furry blue apple. The blue apple is a fun and unusual color, and the close up view makes the person look happy and content. |
Comments | The six images generated by DALL-E seem to represent white females. This test was to assess how DALL-E currently handles ethnicity and gender bias. It is my assumption (and hope) that future versions will have the ability to return by default a more diverse sample of images. For example, a mix of male and female (and gender-neutral), ethnicities, and ages to start with. I tried a different prompt to force a more diverse result: A close up view of a happy person eating a furry blue apple. The images represent diverse gender, ethnicities, and age groups. Output: Here is GPT3's take on the updated prompt: 1. The image contains a close up view of a person eating a furry blue apple. The person appears to be happy and content. There are a few other people in the background, representing diverse gender, ethnicities, and age groups. 2. The composition of the image is quite simple, with the focus on the person eating the apple. The colors are mostly blue and white, with a few other colors in the background. 3. The colors in the image are quite calming and serene. The blue of the apple stands out against the white background. 4. The image is perceived as being happy and content. The person eating the apple looks like they are enjoying themselves, and the other people in the background add to the feeling of happiness. 5. The image is interesting because it is a close up view of a person eating an apple. The blue color of the apple is also quite unique. I tried a quick Google image search for "A close up view of a happy person eating an apple" and the result is also biased: |
Brochure layout
Test | Can DALL-E generate a promotional brochure? |
Prompt | a commercial brochure for a new electric truck for the Californian market in a joan miro style |
Output | |
Prompt matching | All the expected elements of a commercial brochure for a product are included: the product (the truck), a title, a subtitle, and some editorial content placeholder. I'm not sure that it comes across as being in a Miro style. |
My pick | |
Rendering | The rendering is on par with a very early low-fidelity concept sketch by a (junior) designer. They are acceptable for the early stage of a graphic project. |
Composition | The placement of the visual and textual elements follows the expected basic rules of layout. The use of background and supporting space serves its purpose. Not an award-winning layout, but a good starting point. |
Strength | |
Weakness | |
Commercial readiness | Nope |
GPT3 description | 1. The image contains a commercial electric truck in a Joan Miro style. The truck is blue and white with a yellow sun on the side. The background is a gradient of blues and greens. 2. The composition is asymmetrical. The truck is off center and the background is gradient. 3. The colors are blue, white, and yellow. The blue represents the sky and the white represents the truck. The yellow represents the sun. 4. The image is perceived as calm and serene. The blue and green represent the sky and the earth. The yellow represents the sun. 5. The image is interesting because it is a new electric truck in a Joan Miro style. |
Comments | I think it's another good example of DALL-E's value for rapid generation of concepts to help support a creative professional in a project kickoff phase. |
Illustration
Test | Can DALL-E generate a drawn illustration? |
Prompt | a pencil rendering of a group of students riding their bike to school in the early morning in winter in a mountain village |
Output | |
Prompt matching | It's all there: the pencil rendering, the students riding their bikes, the feeling of winter. The mountain village is not very present, though. |
My pick | |
Rendering | The strength of the rendering is in the pencil drawing effect. It's consistent across the image and different techniques are used for different parts of the image (black solid for the bike's tires, thin lines for the tree branches, soft lines to suggest the presence of snow, etc). The trees, the back hill, the construction features are acceptable, but the actual rendering of the people on the bike is very approximate. It's good enough to suggest they are students (with backpacks) riding bikes, but some details are wrong or missing. |
Composition | The composition is strong: the placement of the bikers on the path and the placement of the background all fit very nicely together. In the image I picked, the bikers are seen from the back, riding away from the viewer. I think that in this case it reinforces the cold and the "wanna-avoid-to-be-there" feeling. |
Strength |
|
Weakness |
|
Commercial readiness | Lacks final execution. |
GPT3 description | 1. The image contains a group of students riding their bikes to school in the early morning in winter in a mountain village. The students are bundled up in warm clothes, and the scene is set against a backdrop of snow-covered mountains. 2. The composition of the image is simple, but effective. The students are arranged in a diagonal line, with the lead student in the forefront. This creates a sense of movement and action, as if the students are on their way somewhere. The colors are muted, but there is a contrast between the white of the snow and the dark of the mountains. This creates a sense of depth and dimension. 3. The colors in the image are mostly muted, with the exception of the white of the snow. The snow appears to be freshly fallen, and the students are all wearing brightly colored clothing. This creates a contrast between the dark of the mountains and the light of the snow. The image is perceived as being calm and serene. 4. The image is interesting because it captures a moment in time that is often overlooked. The students are on their way to school, but the scene is set in the early morning, before the hustle and bustle of the school day has begun. The image is peaceful and quiet, and the muted colors add to the feeling of calm. |
Comments | This example shows DALL-E's strength in following a prompted style (pencil drawing in this case). It also continues to show its strength in composing various elements together. |
Creative abstract fine art
Test | Can DALL-E generate fine-art-grade abstract images? |
Prompt | abstract fine art image with very thin fluid lines in perspective with orange and blue dominant colors in a modern minimalist style that suscitate a curious feeling |
Output | |
Prompt matching |
Well, the prompt asked for lines, and lines they are... I'm removing one star because the "perspective" intent of the prompt is mostly absent from the six concepts. |
My pick | I clicked the "Variations" button on the DALL-E interface to see more like this one: |
Rendering | The rendering is excellent in my opinion. The visual treatment of the lines is consistent across the image. The lines all merge and blend together smoothly, again using a consistent visual syntax. |
Composition | I'm very picky when it comes to abstract composition. I have given talks on composition in fine art. DALL-E returns only square images (at least in the preview version tested here). The composition makes great use of the square 1:1 ratio. I would most likely not make any edits to the cropping or to the direction and fluidity of the leading lines. |
Strength |
|
Weakness |
|
Commercial readiness | Yes. These images are of better quality than some current commercial art productions. |
GPT3 description | 1. The image contains fluid lines in perspective with orange and blue dominant colors. 2. The composition is modern and minimalist. 3. The colors are eye-catching and vibrant. 4. The image is perceived as curious and intriguing. 5. The image is interesting because it is unique and different from other abstract images. Interesting to note that, unlike DALL-E, GPT3's take is more than weak... |
Comments | This one is, to me, the most powerful output of DALL-E. Is it because of my experience in the domain and hence the ability to craft a better prompt than for the other use cases? Is it because of DALL-E's strength in generating new content rather than being some sort of an "image search engine"? In any case, any artist knows that the artistic and technical merit of a creation is only a small part of the journey to get the attention of the public. Building the "story" about the artist over time, along with the ongoing marketing effort, is the foundation of the art business. What DALL-E has going for it, as an artist of some sort, is that its work is generated by a machine, and that is a strong, unique marketing statement, one that the public likes to hear and understand. Art generated by algorithms is far from new and has seen waves of public noise over the decades. |
My conclusions
Overall, DALL-E 2 is an impressive text-to-image generator.
Like all new technologies, there is a long productization effort ahead to make it a commercial service (or product).
I believe, though, that it can be used as is, as a "design aid" for creative professionals (architects, industrial designers, graphic designers, illustrators, etc). DALL-E 2 consistently generates strong and varied concepts to help a designer explore early directions.
It can also serve as a visual support for conversations between a creative professional and their clients, especially in the early phase designed to help articulate a design direction.
I have been using the text generator GPT3 in a similar fashion with some success.
I want to thank OpenAI for granting me early access to this exciting new technology. I'm looking forward to seeing what their next steps are.
Feel free to reach out with any questions.