
#022 The Limits of Embeddings, Out-of-Domain Data, Long Context, Finetuning (and How We're Fixing It)
How AI Is Built
Maximize Performance by Combining Techniques
Traditional machine learning models such as XGBoost, support vector machines, and logistic regression, trained on top of frozen embeddings, can perform surprisingly well on classification tasks. Compared with finetuning the embedding model itself, this approach needs less training time and is often more robust, especially for out-of-domain or cross-lingual classification. A classifier trained on labeled data in a single language can then be applied to text in other languages. Concatenating the outputs of several embedding models also gives the classifier more diverse features, which can further improve accuracy. Simple models, applied carefully on top of embeddings, go a long way.
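To make the idea concrete, here is a minimal sketch of a logistic regression trained on concatenated features from two frozen encoders. Everything in it is an assumption for illustration: `hashed_embedding` is a hypothetical stand-in for a real embedding model (it hashes character n-grams, so it has none of the cross-lingual properties discussed above), the two "models" differ only in salt and n-gram size, and the toy texts and labels are made up. The classifier is written in plain Python to keep the example dependency-free; in practice one would more likely use a library implementation such as scikit-learn's `LogisticRegression` or XGBoost.

```python
import hashlib
import math


def hashed_embedding(text, dim, salt, ngram):
    """Hypothetical stand-in for a frozen embedding model (not a real encoder).

    Hashes character n-grams into a fixed-size, L2-normalised vector.
    """
    vec = [0.0] * dim
    padded = f" {text.lower()} "
    for i in range(len(padded) - ngram + 1):
        gram = padded[i:i + ngram]
        digest = hashlib.md5((salt + gram).encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


# Two "frozen models" (assumed); neither is ever updated during training.
FROZEN_ENCODERS = [
    lambda text: hashed_embedding(text, 64, "model_a", 2),
    lambda text: hashed_embedding(text, 64, "model_b", 3),
]


def featurize(text):
    """Concatenate the outputs of all frozen encoders into one feature vector."""
    features = []
    for encode in FROZEN_ENCODERS:
        features.extend(encode(text))
    return features


def predict_proba(weights, bias, features):
    """Probability of the positive class under the logistic model."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, bias, X, y):
    """Mean binary cross-entropy over a dataset."""
    eps = 1e-12
    total = 0.0
    for features, label in zip(X, y):
        p = min(max(predict_proba(weights, bias, features), eps), 1.0 - eps)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(X)


def train_logistic_regression(X, y, epochs=200, lr=0.5):
    """Full-batch gradient descent; only the classifier weights are learned."""
    weights, bias = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for features, label in zip(X, y):
            err = predict_proba(weights, bias, features) - label
            grad_b += err
            for j, x in enumerate(features):
                grad_w[j] += err * x
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(X)
    return weights, bias


# Made-up toy data: 1 = positive review, 0 = negative review.
TRAIN_TEXTS = [
    "great product, works perfectly",
    "really happy with this purchase",
    "excellent quality and fast shipping",
    "terrible, broke after one day",
    "very disappointed, waste of money",
    "awful support and poor quality",
]
TRAIN_LABELS = [1, 1, 1, 0, 0, 0]

X_train = [featurize(text) for text in TRAIN_TEXTS]
weights, bias = train_logistic_regression(X_train, TRAIN_LABELS)

print("training loss:", log_loss(weights, bias, X_train, TRAIN_LABELS))
print("p(positive):", predict_proba(weights, bias, featurize("happy with the quality")))
```

Swapping in real embedding models would only change `FROZEN_ENCODERS`; the classifier and the concatenation step stay the same, which is what keeps training cheap relative to updating the encoders themselves.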