Predicting the Order of Upcoming Tokens Improves Language Modeling (arxiv.org) | 5 points by wavelander 8 hours ago
NitpickLawyer 7 hours ago Are any of these methods doable on pre-trained models? Like freeze the model and only train these add-ons? Having to redo the training runs with these optimisations doesn't sound too practical, in the grand scheme of things.
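For concreteness, here is a minimal sketch of the "freeze the base model, train only an add-on head" setup the comment asks about, assuming a HuggingFace-style causal LM (GPT-2 as a stand-in). The aux_head and the cross-entropy target below are illustrative placeholders, not the paper's actual token-order objective.

    # Sketch: freeze a pre-trained LM and train only a new auxiliary head.
    # Assumptions: HuggingFace transformers + PyTorch; the head and loss are
    # illustrative, not the paper's exact objective.
    import torch
    import torch.nn as nn
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # any pre-trained causal LM would do
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    base = AutoModelForCausalLM.from_pretrained(model_name)

    # Freeze every parameter of the pre-trained backbone.
    for p in base.parameters():
        p.requires_grad = False

    hidden_size = base.config.hidden_size
    vocab_size = base.config.vocab_size

    # Add-on head: a single linear layer over the frozen hidden states.
    aux_head = nn.Linear(hidden_size, vocab_size)
    optimizer = torch.optim.AdamW(aux_head.parameters(), lr=1e-4)

    text = "Predicting the order of upcoming tokens"
    inputs = tokenizer(text, return_tensors="pt")

    with torch.no_grad():  # no gradients flow through the frozen backbone
        hidden = base(**inputs, output_hidden_states=True).hidden_states[-1]

    # Illustrative auxiliary target: plain next-token ids (the real objective
    # would be whatever ranking/ordering target the paper defines).
    logits = aux_head(hidden[:, :-1, :])
    targets = inputs["input_ids"][:, 1:]
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, vocab_size), targets.reshape(-1)
    )
    loss.backward()
    optimizer.step()

Whether training only such a head recovers the paper's reported gains is exactly the open question in the comment; the paper trains the objective jointly from scratch, so this frozen-backbone variant is untested as far as the thread goes.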