RAG Pipeline
Training Flow
```mermaid
flowchart LR
    A[Upload PDF/MD] --> B[PyPDFLoader / TextLoader]
    B --> C[RecursiveCharacterTextSplitter\nchunk_size=1000 chunk_overlap=200]
    C --> D[OpenAIEmbeddings\ntext-embedding-3-small]
    D --> E[FAISS.from_documents]
    E --> F[faiss_index saved to disk\nmedia/agents/ID/faiss_index/]
    F --> G[Agent.is_trained = True]
```
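The training flow can be sketched without LangChain or network access to show the ordering that matters: embed the chunks, persist the index under the per-agent directory, and only then mark the agent as trained. This is a minimal sketch under stated assumptions: the `Agent` dataclass, `train_agent`, and the toy embedding are hypothetical stand-ins (only `is_trained` and the `media/agents/<id>/faiss_index/` layout come from this page), and the JSON file replaces the `index.faiss` / `index.pkl` pair that LangChain's `FAISS.save_local` would write.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Agent:
    """Hypothetical stand-in for the project's Agent model."""
    id: int
    is_trained: bool = False


def train_agent(
    agent: Agent,
    chunks: list[str],
    embed: Callable[[str], list[float]],
    media_root: Path,
) -> Path:
    """Embed chunks, persist them under media/agents/<id>/faiss_index/, then flag the agent."""
    index_dir = media_root / "agents" / str(agent.id) / "faiss_index"
    index_dir.mkdir(parents=True, exist_ok=True)
    records = [{"text": chunk, "vector": embed(chunk)} for chunk in chunks]
    # JSON is an illustrative substitute for the real FAISS files on disk.
    (index_dir / "index.json").write_text(json.dumps(records))
    # Flip the flag only after the index is safely written.
    agent.is_trained = True
    return index_dir


if __name__ == "__main__":
    def toy_embed(text: str) -> list[float]:
        # Hypothetical 2-dim embedding; the real pipeline calls text-embedding-3-small.
        return [float(len(text)), float(text.count(" "))]

    with tempfile.TemporaryDirectory() as tmp:
        agent = Agent(id=7)
        path = train_agent(agent, ["first chunk", "second chunk"], toy_embed, Path(tmp))
        print(path.relative_to(tmp), agent.is_trained)
```

Setting `is_trained` last means a failure while embedding or writing leaves the agent untrained, so the chat flow never tries to load a half-built index.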
Retrieval Flow (Chat)
```mermaid
flowchart LR
    A[Student message] --> B[FAISS.load_local]
    B --> C[similarity_search\nk=4 documents]
    C --> D[RAG context injected\ninto the system prompt]
    D --> E[LangGraph StateGraph]
    E --> F[ChatOpenAI streaming]
    F --> G[SSE response\nto the student]
```
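The retrieval step (top-k search followed by prompt injection) can be illustrated with a brute-force search over an in-memory list. This sketch assumes the query has already been embedded with the same model, ranks by squared Euclidean distance because LangChain's FAISS wrapper builds an exact flat L2 index by default, and uses a hypothetical prompt template in `build_system_prompt`; `search_top_k` is an illustrative name, not the LangChain method. Since OpenAI documents its embeddings as normalized to length 1, L2 ranking and cosine ranking agree for these vectors.

```python
def l2_squared(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance, the metric of an exact flat L2 index."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def search_top_k(
    query_vec: list[float],
    index: list[tuple[str, list[float]]],
    k: int = 4,
) -> list[str]:
    """Return the texts of the k stored chunks closest to query_vec."""
    ranked = sorted(index, key=lambda item: l2_squared(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


def build_system_prompt(base_prompt: str, chunks: list[str]) -> str:
    """Append retrieved chunks to the agent's base prompt (hypothetical template)."""
    context = "\n\n---\n\n".join(chunks)
    return f"{base_prompt}\n\nRelevant course material:\n{context}"


if __name__ == "__main__":
    index = [
        ("Chloroplasts host photosynthesis.", [0.9, 0.1]),
        ("Mitochondria produce ATP.", [0.1, 0.9]),
        ("Leaves contain chloroplasts.", [0.8, 0.2]),
    ]
    context = search_top_k([1.0, 0.0], index, k=2)
    print(build_system_prompt("You are a biology tutor.", context))
```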
Splitter Settings
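```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # maximum characters per chunk
    chunk_overlap=200,  # characters shared by consecutive chunks
)
```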
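With `chunk_size=1000` and `chunk_overlap=200`, each new chunk advances by roughly 800 characters. The sketch below uses a plain fixed-size sliding window as an approximation: the real `RecursiveCharacterTextSplitter` prefers to cut at separators such as paragraph and line breaks, so its chunks are often shorter and the actual count can differ. `window_starts` is a hypothetical helper for estimating how many chunks (and therefore embedding calls) a document will produce.

```python
def window_starts(text_len: int, chunk_size: int = 1000, overlap: int = 200) -> list[int]:
    """Start offsets of a fixed-size sliding window (rough model of the splitter)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if text_len <= chunk_size:
        return [0]
    step = chunk_size - overlap  # 800 new characters per chunk with the defaults above
    # Stop once a window would start inside the final overlap region,
    # so the last window [start, start + chunk_size) reaches the end of the text.
    return list(range(0, text_len - overlap, step))


if __name__ == "__main__":
    print(len(window_starts(5000)))  # → 6
```

The count follows $\lceil (N - \text{overlap}) / (\text{chunk\_size} - \text{overlap}) \rceil$ for $N > \text{chunk\_size}$; for a 5,000-character text that is $\lceil 4800 / 800 \rceil = 6$.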
Supported file formats and the loader used for each:

| Format | Loader |
|---|---|
| PDF | PyPDFLoader |
| Markdown | TextLoader |
| TXT | TextLoader |
| CSV | CSVLoader |
| XLSX | UnstructuredExcelLoader |
| DOCX | Docx2txtLoader |
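The table maps naturally onto an extension-based dispatch. In a real implementation the dictionary values would be the loader classes imported from `langchain_community.document_loaders`; this sketch stores their names as strings so it runs without LangChain installed. `LOADERS` and `loader_for` are hypothetical names, and the extensions are assumed to be the usual ones for each format.

```python
from pathlib import Path

# File extension -> loader class name, mirroring the format table.
LOADERS = {
    ".pdf": "PyPDFLoader",
    ".md": "TextLoader",
    ".txt": "TextLoader",
    ".csv": "CSVLoader",
    ".xlsx": "UnstructuredExcelLoader",
    ".docx": "Docx2txtLoader",
}


def loader_for(filename: str) -> str:
    """Pick the loader for an uploaded file, matching the extension case-insensitively."""
    suffix = Path(filename).suffix.lower()
    try:
        return LOADERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file format: {suffix or filename}") from None


if __name__ == "__main__":
    print(loader_for("Lesson1.PDF"))  # → PyPDFLoader
```

Rejecting unknown extensions up front keeps unsupported uploads out of the training flow instead of failing later inside a loader.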
Embeddings
- Model: text-embedding-3-small (OpenAI)
- Dimension: 1536
- Storage: FAISS on disk (media/agents/{id}/faiss_index/)
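The 1536-dimension figure gives a quick way to estimate index size on disk. Assuming a flat, uncompressed float32 index (4 bytes per value), each chunk costs 1536 × 4 = 6,144 bytes of vector data; the pickled docstore holding the chunk texts and metadata comes on top of that. `raw_vector_bytes` is a hypothetical helper for this arithmetic.

```python
EMBEDDING_DIM = 1536   # text-embedding-3-small
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_chunks: int, dim: int = EMBEDDING_DIM) -> int:
    """Vector payload of a flat float32 index; excludes the stored chunk texts."""
    return num_chunks * dim * BYTES_PER_FLOAT32


if __name__ == "__main__":
    size = raw_vector_bytes(10_000)
    print(size, f"{size / 2**20:.1f} MiB")  # → 61440000 58.6 MiB
```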