NEW Multi-Modal AI by APPLE


Views: 2,672

Days ago

Apple published new machine learning (ML) models on its GitHub repo: 4M-21 (Massively Multimodal Masked Modeling).
All rights with the authors:
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
arxiv.org/pdf/...
Video from Apple and EPFL (Lausanne):
storage.google...
#appleai
#apple
#multimodalai

Comments: 11

@MeinDeutschkurs 1 month ago

Great! I watched both your video and the video by EPFL. I hope the community will create a dataset that is not based on synthetic data, to increase the quality. I was impressed by the video-frame demo. I hope that some day audio and video/animation will be included. That’s so exciting!

@tomw4688 2 months ago

Great catch! Thanks for reviewing this.

@mshonle 2 months ago

It’s about time that we went back to encoder/decoder architectures again!

@user-zd8ub3ww3h 2 months ago

This is a very good introduction to Any-to-Any.

@thesimplicitylifestyle 2 months ago

Very useful! Thank you! 😎🤖

@fontenbleau 2 months ago

I don't understand why they keep releasing such minuscule, useless models all year long. In my experience, decent models start at 30 billion parameters (yes, I have 128 GB of RAM). Only that size provides more than basic functionality (a glimpse of intelligence, especially when uncensored) in squeezed, quantized versions.

@code4AI 2 months ago

Now, I could explain to you that current phones have on-board compute limitations, or that research projects start at a lower complexity to document a proof of concept, but would you understand it?

@fontenbleau 2 months ago

@code4AI It's hard to tell their real motives; Apple is the most closed tech group. Yes, phones today are incapable, just like robots; there is no good chip anywhere. I understand perfectly; that's just my opinion. Apple will never release big models publicly, since they are a valuable asset. Llama 7B is good, but only as a dictionary/translator; anything smaller is even more primitive. For spyware like Recall, such a small model is perfect.

@falklumo 1 month ago

You seem to be confused. This work is not about an LLM, so your parameter-count intuition does not apply. It is better compared with Stable Diffusion, which DOES an OK job on 8 GB GPUs.

@fontenbleau 1 month ago

@falklumo That's a weird reply. Why are you referring to Stable Diffusion, an image generator, at all? It's kind of long to explain, but Apple's first stylus handwriting recognition (a grandfather of current AI) was horrible; they bought a patent license to use a better one, made by others, in the Newton device.