It's a great day to talk about creative AI (again)
3D artist Axseru entertains us with some tech erudition
Welcome baaack, friends, to another episode of the creative AI saga! Some of you will forgive me if this series carries on in English, as I had already told you it would, but fear not, because something different is coming soon. As often happens, the arrival of spring clogs my calendar with more or less professional commitments, but if this newsletter, defying every bookmaker's predictions, has reached its first birthday (and in rather good health, too), I don't see why I should give up right now. In the meantime I will have to do some living, because if all I did was write I would have nothing to talk about.
This second episode is the result of an interview kindly granted by Alessandro Ferraro, known as Axseru, a guy many of you already know and with whom I have enjoyed a good friendship for many years now. Axseru is shy and modest, so much so that when I asked him for an introduction to his professional profile he laughed in my face: I can assure you that what he does is astonishing, but two lines would not be enough to do justice to his work. In the meantime, I am linking his IG, which is very rarely updated but where you can still appreciate a few of his exercises in style.
PS: to understand the context of this saga, I recommend taking a look at the introduction to its first article.
When we asked Axseru about his experience with creative AI, he shared his screen with us and started scrolling through a fascinating wall of colorful images. As he kept scrolling, we were so delighted by what we were looking at that we almost forgot he was not answering our question. “Midjourney generates images from a text prompt you input”, Axseru says. “This bot or any similar AI-based image generator is a good tool to speed up the initial brainstorming process, where a 3D artist usually does moodboarding to gather visual references to kick-start a new creation: it is like Google Images for images that do not exist.” After a sneak peek at the Midjourney Quick Start Guide, which describes Midjourney as an independent research lab exploring new mediums of thought and expanding the imaginative powers of the human species, we can say that Axseru understood the purpose of the tool.
While exploring his Midjourney board, he showed us the different attempts he has made along the way. He begins by conceptualizing his ideas, then deconstructs the machine learning model to identify the reference that best aligns with his vision. “As of today, the product features three distinct models (V1, V2, V3) catering to diverse styles, while beta versions are occasionally launched.” Despite playing around with them extensively, he hasn't gained a fundamental understanding of the differences between the three. "It looks like one of them might be cartoonish, the other photorealistic, and so on with different styles. However, I am not entirely certain about this interpretation. If I had to guess, I would say that the three versions are simply a more refined training of one another." Here, he shows us the results of his experiment with the same prompt on three different models. The input was ‘png of a building’ and he was hoping to obtain a cutout image of a building but, as you can see, it did not quite work.
"I have come to understand how Midjourney requires me to communicate with it. The AI is syntax-aware and can yield different results from only slightly different prompts. By beginning with a general sentence and adding specific details thereafter, one can control the representation being generated. For instance, specifying that the subject wears glasses and not earrings in the prompt's first sentence will likely result in glasses being considered the main aspect of the representation." As he rolls on into the stream of consciousness about his Midjourney experience, some interesting aspects are revealed: “I’m currently using these futuristic drone images to get some sparks for a 3D project I’m working on. It is amazing to see how, upon closer examination, these images are everything but specific.” We typically associate computer-related items with rigid scientific concepts, but this tells us something different. “You cannot tell whether they could exist as a real drone or not, but you still recognize it, don't you? This feels like an oneiric dimension to me, where details are blurred but you get back a lot from the whole”. This behavior, typical of a machine learning model generating images, is indeed in Axseru's case a great benefit since it leaves him room to be creative.
Although Midjourney is the only creative AI Axseru is actively using, he mentioned another powerful tool called Kaedim. This 2D-image-to-3D-content converter is a recent and promising innovation that doesn't require any 3D modeling knowledge. According to his technical judgment, the models are well built from a topological point of view and would fit quite well in a low-poly video game. "I love how this AI gives me something that I can work on easily. It would be a perfect starting model, allowing me to skip the initial modeling phase." Kaedim has a pretty hefty downside, though: the light plan is $599/month. There’s an intermediate one that goes up to 100 generations and 30 iterations per month, for $1799/month. The enterprise plan lists only custom pricing. Both Kaedim and Midjourney work on Discord through bots, though the latter is much more affordable. Midjourney has three pricing tiers, and Axseru chose the $30/month intermediate plan. For an extra $20/month, users can keep their generated images private and viewable only on their browser profile.
The interview went on with a brief detour through some popular software that uses ML technology. “Photoshop now includes AI features in many of its tools to assist with advanced image editing and cutting. Machine learning models have been trained to fill or cut out images with great precision. Additionally, Blender is set to release an AI-based texture plug-in that generates materials based on mesh type. Many software programs also utilize powerful AI denoising features to quickly preview 3D models during the modeling process.” He was surprised to find out that the iOS 16 Photos app could generate fast cutouts of images by tapping on an identified element. This feature has made the tool widely popular, and its usage has grown rapidly.
“I would really love to have two main AI-driven tools in my arsenal: an effective rigging tool, which could save me a ton of tedious work, and, since I deal with photogrammetry, a denoiser to clean up surfaces.” He did not mention this, but he flies drones to scan buildings. "Although auto rigging has been used in the scene for some time now, it is currently limited to human-like models. There is currently no rigger able to recognize the specific model being created and generate internal bones automatically. This would be a game changer for the 3D-modelling scene." As for the 3D denoising tool, so far there is nothing like it out there. "This concept can be likened to 2D upscaling software as it has the ability to exponentially increase the resolution of an initial image. It can also be interpreted as a virtual hole filler for 3D scanned models, where the AI identifies and utilizes the appropriate type of material to fill any gaps present."
When we bring up the long-standing debate over the threat this technology poses to the industry, he replies confidently: “These are great resources to consider in the first place: many people will lose their current jobs, of course, and that is just because AI is simplifying the process. This results in fewer employees having to do the same amount of work. Anyway, I do not consider today’s AI a no-skill-requiring tool: if you are aware of how it interprets prompts you will be led to more accurate results.” Effective communication continues to be an essential element, much as it was in the era when only humans were involved, and it is not a skill acquired without effort. “Although AIs may appear as powerful tools nowadays, they still require proper application and a careful approach, much like handling a sharp knife. Eventually, they could turn into something more similar to a crane, with many more means, but much harder to operate.” AI has both advantages and disadvantages. While AI-enabled devices may lead to more job displacement, individuals who acquire AI-related skills will be at the forefront of the emerging job market.
From what we have noticed so far, Axseru perceives himself as the primary agent in the model-human dichotomy. He envisions a final goal and employs Midjourney as a deconstructive tool. “As you can see here, I sometimes start off with a simple drawing and then search for a better version of it. More often, I just need some visual cue about how the final modeled object could look.” When he showed us his current project, he explained that he had created a hybrid drone that could both walk and fly. As someone passionate about sea creatures, he designed it to look like a crab. He created several different prototypes, selecting the ones that best fit his mental image to work on. He described the process as feeling like organic conditioning.
In conclusion, we can infer that Axseru has a positive attitude towards the introduction of this technology into his workflow. He expressed great enthusiasm for creative AI's ability to boost imagination. “As long as I keep up with these advances, my job will only be improved by them. As a 3D artist, technology is core and it will always be.” Despite this, he leaves us with a thought-provoking question: who should be credited as the author of AI-generated images? Is it the software house that creates the algorithm, the algorithm itself, or the artist who wrote the prompt used to generate the image? "While I have access to these artsy images, I don't feel they belong to me morally since I didn't invest the time to create them, and I don't even remember many of them. If I had to attribute them to someone, I might credit the algorithm that generated them. However, it's worth considering whether algorithms can truly claim authorship or if it's more appropriate to credit the programmers who designed them." Is it, though? Answers to questions like these are the focus of AI researchers. For example, should Midjourney, the software, be treated as a person with rights over the images it produces? Is the concept of intellectual property due for a reevaluation? And how should we address the use of copyrighted images for the software's training? Axseru is unable to answer these questions, and neither can we. Still, the best way to clarify this scenario is to focus on how people feel about it.