
Jerry Liu on the Future of AI: LlamaIndex, LLMs, RAG, Prompting and more! What's AI Podcast Episode 25
What's AI Podcast by Louis-François Bouchard
Fine-tuning Embeddings for Data Domain Adaptation
An existing embedding produced by a black-box model such as OpenAI's Ada can be fine-tuned with a transform to better represent a specific data domain. This involves adding an adapter model on top of the base embedding model, which can be trained on either the document side or the query side. LlamaIndex offers this fine-tuning capability and encourages users to explore it. Other core use cases include fine-tuning weaker models like Llama 2 to output structured data and distilling prompts and instructions from more powerful models like GPT-4. In the data pipeline for RAG, the chunking strategy plays a crucial role, and a sub-optimal strategy can cause the whole pipeline to fail. Factors to consider include the quality of the file parser and the choice of chunk size and chunking strategy.
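To make the adapter idea concrete, here is a minimal conceptual sketch, not LlamaIndex's actual API: a small linear adapter is trained on top of frozen, precomputed query embeddings so that transformed queries land closer to the embeddings of their relevant documents. All names (LinearAdapter, train_adapter, the embedding dimension, the temperature) are illustrative assumptions.

```python
# Conceptual sketch (not LlamaIndex's actual implementation): fine-tune a
# linear adapter on the query side while the document index stays unchanged.
import torch
import torch.nn as nn
import torch.nn.functional as F

EMBED_DIM = 1536  # assumed dimensionality, e.g. OpenAI Ada embeddings


class LinearAdapter(nn.Module):
    """A single linear transform applied to frozen base embeddings."""

    def __init__(self, dim: int):
        super().__init__()
        # Initialize near the identity so training starts from the
        # untouched base embedding space.
        self.linear = nn.Linear(dim, dim, bias=False)
        nn.init.eye_(self.linear.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x)


def train_adapter(query_embs: torch.Tensor,
                  doc_embs: torch.Tensor,
                  epochs: int = 10,
                  lr: float = 1e-3) -> LinearAdapter:
    """query_embs[i] and doc_embs[i] form a positive (query, document) pair;
    both are precomputed by the frozen black-box embedding model."""
    adapter = LinearAdapter(query_embs.shape[1])
    optimizer = torch.optim.Adam(adapter.parameters(), lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        q = F.normalize(adapter(query_embs), dim=-1)
        d = F.normalize(doc_embs, dim=-1)
        # Contrastive objective: each query should match its own document
        # more closely than the other documents in the batch.
        logits = q @ d.T / 0.05  # temperature-scaled cosine similarities
        labels = torch.arange(len(q))
        loss = F.cross_entropy(logits, labels)
        loss.backward()
        optimizer.step()
    return adapter


if __name__ == "__main__":
    # Toy tensors standing in for precomputed (query, document) embedding pairs.
    queries = torch.randn(32, EMBED_DIM)
    docs = torch.randn(32, EMBED_DIM)
    adapter = train_adapter(queries, docs)
    # At retrieval time, only query embeddings pass through the adapter;
    # the document index built from base embeddings is left as-is.
    transformed_query = adapter(queries[:1])
```

Fine-tuning on the query side is attractive precisely because the document embeddings, and therefore the vector index, do not need to be recomputed.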
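To illustrate the chunk-size and overlap knobs mentioned above, here is a simplified sketch of a fixed-size chunker. It is not how LlamaIndex's node parsers work internally; production splitters typically respect sentence boundaries and count tokens rather than characters, and the function name and defaults here are assumptions for illustration.

```python
# Conceptual sketch of a fixed-size chunker with overlap, the two parameters
# most often tuned when a RAG pipeline underperforms.
from typing import List


def chunk_text(text: str, chunk_size: int = 1024, chunk_overlap: int = 128) -> List[str]:
    """Split text into overlapping character windows."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - chunk_overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        # Stop once the final window reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    document = "A long document extracted by a file parser. " * 50
    for i, c in enumerate(chunk_text(document, chunk_size=200, chunk_overlap=40)):
        print(i, len(c))
```

Chunks that are too large dilute the retrieved context with irrelevant text, while chunks that are too small lose the surrounding context a query needs, which is why a poor chunking choice can make an otherwise sound pipeline fail.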