How can we make Transformers, large language models, and GPT more shareable? How can we make them grow?
Header image: "language models and transformers", via Stable Diffusion (https://huggingface.co/spaces/stabilityai/stable-diffusion)
Today I was reading the excellent list of Transformer tutorials here: https://github.com/NielsRogge/Transformers-Tutorials and soon realized how many separate applications and separate datasets these models have been trained on. Most of these neural networks are offered pre-trained and can be used and fine-tuned for your custom application. That is super great for someone looking to solve one specific problem.
For example, you can get a model that turns an image of a receipt into a JSON table, one that extracts text from a document, one that estimates depth from a single image, and so on.
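A couple of these tasks can be tried in just a few lines with the Hugging Face pipeline API. This is a minimal sketch, assuming the `transformers` library, an OCR backend such as pytesseract for the document pipeline, and these example checkpoints:

```python
# Minimal sketch: each task below is served by its own, separately pre-trained model.
# Checkpoint names are examples; the document pipeline also needs an OCR backend (pytesseract).
from transformers import pipeline

# Document understanding: ask questions about a scanned receipt or form.
doc_qa = pipeline("document-question-answering",
                  model="impira/layoutlm-document-qa")

# Monocular depth estimation from a single RGB image.
depth = pipeline("depth-estimation", model="Intel/dpt-large")

# Each pipeline has its own weights, preprocessing, and output format;
# nothing learned by one is reused by the other.
answer = doc_qa(image="receipt.png", question="What is the total amount?")
depth_map = depth("photo.jpg")
```

Notice that the two pipelines share nothing beyond the library that loads them: different weights, different preprocessing, different outputs.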
All of these are separate tasks, each trained in isolation.
One thing you realize pretty quickly when training neural networks is that training on more data, more tasks, and more abilities always gives you a better model: one that performs much better in the single application you care about, even if you are not looking for a do-it-all model.
So for me this is really a big problem!
A set of problems:
- models are trained in isolation
- use a lot of computing resources to train
- are only effective in their own domain
- use many custom input and output encodings, symbols, and techniques (see the sketch after this list)
- are not able to share knowledge from model to model
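To make the encoding mismatch concrete, here is a small sketch (the checkpoints are just illustrative examples) of how two task-specific models expect entirely different inputs and produce entirely different outputs:

```python
# Sketch of incompatible input encodings across task-specific models
# (checkpoint names are illustrative examples).
from transformers import AutoTokenizer, AutoImageProcessor

# A text model expects sub-word token ids produced by its own tokenizer.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text_inputs = tokenizer("total amount: 12.50", return_tensors="pt")
# -> {'input_ids': ..., 'attention_mask': ...}

# A vision model expects a resized, normalized pixel tensor from its own processor.
processor = AutoImageProcessor.from_pretrained("Intel/dpt-large")
# image_inputs = processor(images=pil_image, return_tensors="pt")
# -> {'pixel_values': ...}

# The two representations share no vocabulary, no scale, and no output head,
# so whatever one model has learned cannot simply be reused by the other.
```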
None of the knowledge these models have learned is shared!
A set of questions:
How can we share these models and what they learn?
How can we build a unified model?
How can we make sure all data uses the same formats and is compatible? Should we learn those transformations also?
About the author
I have more than 20 years of experience in neural networks in both hardware and software (a rare combination). About me: Medium, webpage, Scholar, LinkedIn.
If you found this article useful, please consider a donation to support more tutorials and blogs. Any contribution can make a difference!