What is RAG? - Ramshankar Yadhunath

RAG stands for Retrieval Augmented Generation. It is an architectural pattern used to develop generative AI applications.

1. RAG is one type of architecture that can be used to create generative AI applications, and it can deal with both unimodal and multimodal data.
2. RAG is based on two fundamental building blocks: semantic text similarity and large language models (LLMs).
3. There are 3 steps to the process:
   - Indexing: an embedding model is used to convert the available data into records in a vector store.
   - Retrieval: a query is converted to an embedding and compared against all indexed embeddings, which allows the top N most similar records to be retrieved. There are many similarity methods, though cosine similarity is usually a safe option to start with.
   - Generation: an LLM call is made by passing the retrieved records, collectively called the {context}, together with a {prompt}, which is basically an instruction to the LLM on how the answer needs to be formulated.
4. Just as with most pipeline architectures in data platforms, RAG can be built with composable blocks in each stage. The embedding model can be pre-trained, fine-tuned, or newly trained (for example, if the data is domain specific); you can choose any of these. There are many LLMs too, and one can be chosen based on your needs.
5. Once a RAG application is developed, there are several techniques available for evaluating it. The easiest are automated similarity scores such as ROUGE or BERTScore; the more complicated and expensive, but more accurate, option is human evaluation.
6. When new data is added to the available data, new embeddings need to be created for it. Whether the existing data also needs to be re-indexed depends on how frequently it changes and on its volume.
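
The three steps above (indexing, retrieval, generation) can be sketched in a few lines of Python. This is only a minimal illustration under loud assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, the "vector store" is a plain list, and the helper names (`build_index`, `retrieve`, `build_llm_input`) are hypothetical. No actual LLM is called; the sketch stops at assembling the text that would be sent to one.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(documents):
    """Indexing: turn each document into a (text, embedding) record."""
    return [(doc, embed(doc)) for doc in documents]


def retrieve(query, index, top_n=2):
    """Retrieval: embed the query and return the top N most similar documents."""
    query_embedding = embed(query)
    ranked = sorted(
        index,
        key=lambda record: cosine_similarity(query_embedding, record[1]),
        reverse=True,
    )
    return [doc for doc, _ in ranked[:top_n]]


def build_llm_input(query, records, prompt):
    """Generation: combine the instruction {prompt} with the retrieved {context}."""
    context = "\n".join(f"- {record}" for record in records)
    return f"{prompt}\n\nContext:\n{context}\n\nQuestion: {query}"


documents = [
    "Cosine similarity compares the angle between two embedding vectors.",
    "The vector store holds the embedding for each indexed record.",
    "Bananas are rich in potassium.",
]

index = build_index(documents)
query = "How do we compare embedding vectors?"
top_records = retrieve(query, index, top_n=2)
llm_input = build_llm_input(
    query, top_records, prompt="Answer the question using only the context below."
)
print(llm_input)  # this string is what would be passed to an LLM of your choice
```

Because each stage is its own function, any one of them can be swapped out (a real embedding model for `embed`, a vector database for the list, a different similarity measure) without touching the others, which is the composability mentioned in point 4.
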
### References

<iframe width="700" height="350" src="https://www.youtube.com/embed/sVcwVQRHIc8" title="Learn RAG From Scratch – Python AI Tutorial from a LangChain Engineer" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>