AI-powered
podcast player
Listen to all your favourite podcasts with AI-powered features
Fine-tuning Embeddings for Data Domain Adaptation
An existing embedding, generated by a black box like OpenAI ADAP, can be fine-tuned with a transform to better model specific data. This process involves adding an adapter model on top of the base model, which can be fine-tuned on the document side or the query side. Lomidex offers the capability to fine-tune and encourages users to explore this approach. Additionally, fine-tuning weaker models like llama2 to output structured data and distilling prompts and instructions from more powerful models like JupyT4 are core use cases. Regarding the data pipeline for a rag, the chunking strategy plays a crucial role, and sub-optimal strategies can lead to a failing pipeline. Factors to consider include the quality of the file parser and selecting the appropriate chunking size and strategy.