
A new form of artificial intelligence that learns and imagines like humans


Meta, the company run by Mark Zuckerberg that controls Facebook and Instagram, says it has a new artificial intelligence model that mimics the way humans reason and will produce much more realistic images than anything we have seen so far.

The new model is called I-JEPA (Image Joint Embedding Predictive Architecture), and according to its developers it will completely change the way images are analyzed and created. Meta has announced that it will give researchers access to the components of I-JEPA so they can develop their own products with it.

An AI with common sense

I-JEPA is based on the ideas of Yann LeCun, Meta's chief AI scientist and, according to some, one of the fathers of artificial intelligence. LeCun advocates bringing artificial intelligence closer to the way humans think. For this, it is key to teach the AI common sense or, in other words, models of how the world works.

Yann LeCun. (REUTERS – Gonzalo Fuentes)

“Human and non-human animals seem capable of learning enormous amounts of prior knowledge about how the world works through observation and through an incomprehensibly small number of interactions in an unsupervised and task-independent way,” LeCun says. “It can be hypothesized that this accumulated knowledge can form the basis of what is usually called common sense.”

Meta researchers think that common sense can be seen as a collection of models of the world that can give guidance on what is probable, what is plausible, and what is impossible. That is useful not only for coping with unknown situations and predicting future outcomes, but also for filling in missing information.

A map of LeCun's six-module architecture. (Meta)

To achieve this, LeCun proposes an architecture based on six modules:

- The configurator module, in charge of the executive control of the other modules.
- The perception module, which receives signals from sensors that help it understand what is happening outside.
- The world model module, which estimates what information is missing from the data provided by perception and predicts plausible future states of the world.
- The cost module, which seeks to minimize cost in the long term and, according to LeCun, is where the basic impulses of behavior and intrinsic motivations reside.
- The actor module, which optimizes the sequence of actions and performs the first action in that sequence.
- The short-term memory module, responsible for keeping track of the current and expected state of the world, as well as the associated costs.
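To make the division of labor concrete, the six modules can be sketched as cooperating objects in a perceive-plan-act loop. This is only an illustrative schematic of the proposal described above: every class, method, and action name here is invented for the example, not Meta's code.

```python
# Illustrative sketch of LeCun's six-module architecture.
# All names and the toy cost function are hypothetical.

class Perception:
    def observe(self, sensors):
        # Turn raw sensor signals into an estimate of the current world state.
        return dict(sensors)

class WorldModel:
    def predict(self, state, action):
        # Estimate a plausible future state given a candidate action.
        return {**state, "last_action": action}

class Cost:
    def evaluate(self, state):
        # Intrinsic cost: lower is better; this is what drives behavior.
        return 0.0 if state.get("last_action") == "cautious" else 1.0

class ShortTermMemory:
    def __init__(self):
        self.track = []
    def store(self, state, cost):
        # Keep track of expected states of the world and their costs.
        self.track.append((state, cost))

class Actor:
    def plan(self, state, world_model, cost):
        # Choose the action whose predicted outcome minimizes cost.
        candidates = ["cautious", "reckless"]
        return min(candidates,
                   key=lambda a: cost.evaluate(world_model.predict(state, a)))

class Configurator:
    """Executive control: wires the other five modules into one step."""
    def __init__(self):
        self.perception = Perception()
        self.world_model = WorldModel()
        self.cost = Cost()
        self.memory = ShortTermMemory()
        self.actor = Actor()

    def step(self, sensors):
        state = self.perception.observe(sensors)
        action = self.actor.plan(state, self.world_model, self.cost)
        predicted = self.world_model.predict(state, action)
        self.memory.store(predicted, self.cost.evaluate(predicted))
        return action

agent = Configurator()
action = agent.step({"obstacle": True})  # picks the lower-cost action
```

The key design point is that the actor never acts blindly: it queries the world model for predicted outcomes and lets the cost module rank them, which is the loop the architecture is built around.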

How this applies to image generation

Modern generative artificial intelligence systems create images from text with a technique called 'diffusion'. These AIs, like Midjourney or Stable Diffusion, are trained on hundreds of millions of images, each accompanied by a text description. The model progressively degrades each image into a cloud of pixel noise and then learns to reverse the process, converting that noise back into the original image.
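The "degrade into noise" half of that process can be shown in a few lines. This is only a toy illustration of the forward noising step, not how Stable Diffusion is actually implemented; the parameter names (`num_steps`, `beta`) are invented for the example, and real systems use a learned neural network for the reverse step.

```python
# Toy illustration of the forward (noising) half of a diffusion model:
# an image is gradually mixed with Gaussian noise until little signal remains.
# A trained model learns the reverse mapping: noise -> image.
import numpy as np

def forward_noising(image, num_steps=10, beta=0.1, seed=0):
    """Repeatedly blend the image with fresh Gaussian noise."""
    rng = np.random.default_rng(seed)
    x = image.astype(float)
    for _ in range(num_steps):
        noise = rng.standard_normal(x.shape)
        # Keep sqrt(1-beta) of the signal, add sqrt(beta) of noise each step.
        x = np.sqrt(1 - beta) * x + np.sqrt(beta) * noise
    return x

image = np.ones((4, 4))            # a trivial 4x4 "image"
noised = forward_noising(image)    # same shape, mostly noise now
```

After enough steps the output is dominated by noise, which is exactly why the reverse direction has to be learned rather than computed directly.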

I-JEPA, however, applies these ideas to give the system 'common sense' and avoid the usual errors of current generative AIs, such as deformed hands and extra fingers. "I-JEPA learns by creating an internal model of the outside world, which compares abstract representations of images (instead of comparing the pixels themselves)," the company explains in an article published on Tuesday.

The system, they say, predicts the representation of parts of an input, which can come from an image or text, from the representation of other parts of the same input. The idea is to fill in the missing information in an abstract representation, similar to how we humans understand information.
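The difference from pixel-level prediction can be made concrete with a minimal sketch: predict the *embedding* of a hidden patch from the embeddings of visible patches, and measure the error in embedding space. This is a toy stand-in for the idea described above, not Meta's I-JEPA code; the random linear "encoder" and "predictor" represent what would be learned neural networks.

```python
# Minimal sketch of JEPA-style learning: predict the embedding of a masked
# patch from the embeddings of the visible patches, comparing abstract
# representations rather than raw pixels. All weights here are random
# stand-ins for learned networks.
import numpy as np

rng = np.random.default_rng(0)

W_enc = rng.standard_normal((8, 4))     # "encoder": patch -> 4-d embedding
W_pred = rng.standard_normal((4, 4))    # "predictor" in embedding space

def encoder(patches):
    return patches @ W_enc

patches = rng.standard_normal((5, 8))   # five image patches, 8 values each

context = encoder(patches[:4])          # embeddings of the visible patches
target = encoder(patches[4])            # embedding of the masked patch

# Predict the masked embedding from the mean context embedding.
prediction = context.mean(axis=0) @ W_pred

# Training would minimize this distance in embedding space, not pixel space.
loss = float(np.mean((prediction - target) ** 2))
```

Because the loss lives in the abstract representation, the model is free to ignore unpredictable pixel-level detail and focus on the semantic content of the missing region, which is the efficiency argument the article goes on to make.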

The benefits of this method are, according to studies carried out by Meta, higher efficiency in image generation and in the use of computational resources, and a lower incidence of the biases associated with this type of technology. Even so, the researchers warn that this is only the beginning. "We look forward to working to extend the JEPA approach to other domains, such as image-text paired data and video data," they write. "In the future, JEPA models could have interesting applications in tasks such as video understanding."

By Peter Hughes

Peter Hughes is an industrial designer with a passion for creativity and innovation. Since 2015, he has dedicated his expertise to shaping the world through his designs. Prior to his current role, Peter served as a teaching assistant at the NY Institute of Information Technology, sharing his knowledge and guiding aspiring minds. He also holds the position of Editor-in-Chief at PlayStation Game Station LLC, fueling his love for gaming and the digital world. Beyond his professional pursuits, Peter embraces life as an explorer, immersing himself in new experiences, social media, travel, and music.