Overview

The Files API handles document upload, processing, and retrieval. It supports PDF, DOCX, PPTX, images, and the other formats listed under Supported File Types.
Base Path: /api/files

Endpoints

Upload File

Upload a document for processing.
POST /api/files/upload
Content-Type: multipart/form-data
Form Fields:
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| file | file | Yes | The file to upload |
| company_id | UUID | No | Associated company |
| document_type | string | No | Type (cim, pitch_deck, financial, contract) |
| description | string | No | File description |
Example Request:
curl -X POST \
  -H "Authorization: Bearer TOKEN" \
  -F "file=@document.pdf" \
  -F "company_id=550e8400-e29b-41d4-a716-446655440000" \
  -F "document_type=cim" \
  "http://localhost:8000/api/files/upload"
Response (201 Created):
{
  "id": "990e8400-e29b-41d4-a716-446655440444",
  "filename": "document.pdf",
  "size": 2048576,
  "content_type": "application/pdf",
  "document_type": "cim",
  "company_id": "550e8400-e29b-41d4-a716-446655440000",
  "processing_status": "pending",
  "uploaded_at": "2024-01-22T10:30:00Z",
  "url": "/api/files/990e8400-e29b-41d4-a716-446655440444"
}

Get File

Get file metadata and status.
GET /api/files/{file_id}
Example Request:
curl -H "Authorization: Bearer TOKEN" \
  "http://localhost:8000/api/files/990e8400-e29b-41d4-a716-446655440444"
Response:
{
  "id": "990e8400-e29b-41d4-a716-446655440444",
  "filename": "acme_cim_2024.pdf",
  "size": 2048576,
  "content_type": "application/pdf",
  "document_type": "cim",
  "company_id": "550e8400-e29b-41d4-a716-446655440000",
  "company_name": "Acme Corp",
  "processing_status": "completed",
  "progress": 100,
  "extracted_data": {
    "document_type": "cim",
    "company": "Acme Corp",
    "summary": "Technology company...",
    "financials": {
      "revenue": 15000000,
      "ebitda": 4500000
    }
  },
  "uploaded_at": "2024-01-22T10:30:00Z",
  "processed_at": "2024-01-22T10:32:45Z",
  "download_url": "/api/files/990e8400-e29b-41d4-a716-446655440444/download"
}

Download File

Download the original file.
GET /api/files/{file_id}/download
Example Request:
curl -H "Authorization: Bearer TOKEN" \
  -O "http://localhost:8000/api/files/990e8400-e29b-41d4-a716-446655440444/download"
Response: Binary file content

Get Processing Status

Check document processing status.
GET /api/files/{file_id}/status
Response:
{
  "status": "processing",
  "progress": 65,
  "current_step": "ai_analysis",
  "estimated_completion": "2024-01-22T10:35:00Z"
}
Status Values:
  • pending: Queued for processing
  • processing: Currently being processed
  • completed: Processing finished
  • failed: Processing error occurred
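
The status values above suggest a simple polling loop: keep asking until the status is `completed` or `failed`. The following is a minimal sketch rather than an official client. `wait_for_processing`, `fetch_status`, and the default interval and timeout are hypothetical names and values chosen for illustration; `fetch_status` stands in for whatever function performs the GET request and returns the parsed JSON body.

```python
import time
from typing import Callable

# Terminal states taken from the Status Values list.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_processing(
    fetch_status: Callable[[], dict],
    interval_s: float = 2.0,
    timeout_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the status is terminal or roughly timeout_s has been waited.

    fetch_status is a hypothetical callable returning the parsed JSON body
    of GET /api/files/{file_id}/status. The sleep function is injectable
    so the loop can be exercised without real delays.
    """
    waited = 0.0
    while True:
        body = fetch_status()
        if body.get("status") in TERMINAL_STATUSES:
            return body
        if waited >= timeout_s:
            raise TimeoutError(
                f"file still {body.get('status')!r} after {waited:.0f}s"
            )
        sleep(interval_s)
        waited += interval_s
```

With the `requests` library, `fetch_status` could be something like `lambda: requests.get(status_url, headers=headers).json()`. A caller would normally inspect the returned `status` to distinguish `completed` from `failed`.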

List Files

Get a paginated list of files.
GET /api/files
Query Parameters:
| Parameter | Type | Description |
| --- | --- | --- |
| page | integer | Page number |
| page_size | integer | Items per page |
| company_id | UUID | Filter by company |
| document_type | string | Filter by type |
| status | string | Filter by processing status |
Example Request:
curl -H "Authorization: Bearer TOKEN" \
  "http://localhost:8000/api/files?company_id=550e8400-e29b-41d4-a716-446655440000&status=completed"
Response:
{
  "data": [
    {
      "id": "990e8400-e29b-41d4-a716-446655440444",
      "filename": "acme_cim_2024.pdf",
      "document_type": "cim",
      "processing_status": "completed",
      "uploaded_at": "2024-01-22T10:30:00Z"
    }
  ],
  "total": 1,
  "page": 1,
  "page_size": 20
}
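
To walk every page of the list, a client can keep requesting pages until it has seen `total` items. The sketch below assumes pages are 1-based (as the sample response suggests) and that `total` counts all matching files across pages; neither is stated explicitly above. `iter_files` and `fetch_page` are hypothetical names, with `fetch_page` standing in for a function that calls GET /api/files with `page` and `page_size` and returns the parsed JSON body.

```python
from typing import Callable, Iterator


def iter_files(
    fetch_page: Callable[[int, int], dict],
    page_size: int = 20,
) -> Iterator[dict]:
    """Yield every file record, requesting pages lazily.

    Assumes 1-based page numbers and that "total" is the number of
    matching files across all pages.
    """
    page = 1
    seen = 0
    while True:
        body = fetch_page(page, page_size)
        items = body.get("data", [])
        yield from items
        seen += len(items)
        # Stop on an empty page too, in case "total" is missing or stale.
        if not items or seen >= body.get("total", 0):
            return
        page += 1
```

With `requests`, `fetch_page` could wrap `requests.get(url, headers=headers, params={"page": page, "page_size": page_size}).json()`, adding any filters such as `company_id` to `params`.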

Delete File

Delete a file.
DELETE /api/files/{file_id}
Example Request:
curl -X DELETE \
  -H "Authorization: Bearer TOKEN" \
  "http://localhost:8000/api/files/990e8400-e29b-41d4-a716-446655440444"
Response (204 No Content)

Processing Flow

1. Upload → File saved to storage
2. Pending → Queued for processing
3. Processing → Docling extraction + OCR + AI analysis
4. Completed → Extracted data available

Supported File Types

| Type | Extensions | Max Size |
| --- | --- | --- |
| PDF | .pdf | 100 MB |
| Word | .doc, .docx | 50 MB |
| PowerPoint | .ppt, .pptx | 50 MB |
| Excel | .xls, .xlsx | 50 MB |
| Images | .jpg, .png, .tiff | 20 MB |
| HTML | .html, .htm | 10 MB |
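
A client can check these limits before uploading and avoid a round trip that would be rejected. The following is a sketch under stated assumptions: it treats "MB" as 1024 × 1024 bytes (the table does not say whether MB or MiB is meant), matches on the file extension only, and returns the error codes listed under Error Handling. `check_upload` and `MAX_SIZE_BYTES` are hypothetical names; the server remains the authority on what it accepts.

```python
from pathlib import Path
from typing import Optional

MB = 1024 * 1024  # assumption: the table's "MB" means 1024 * 1024 bytes

# Per-extension limits copied from the Supported File Types table.
MAX_SIZE_BYTES = {
    ".pdf": 100 * MB,
    ".doc": 50 * MB, ".docx": 50 * MB,
    ".ppt": 50 * MB, ".pptx": 50 * MB,
    ".xls": 50 * MB, ".xlsx": 50 * MB,
    ".jpg": 20 * MB, ".png": 20 * MB, ".tiff": 20 * MB,
    ".html": 10 * MB, ".htm": 10 * MB,
}


def check_upload(filename: str, size_bytes: int) -> Optional[str]:
    """Return an error code from the Error Handling table, or None if OK."""
    ext = Path(filename).suffix.lower()
    limit = MAX_SIZE_BYTES.get(ext)
    if limit is None:
        return "unsupported_format"
    if size_bytes > limit:
        return "file_too_large"
    return None
```

For a local file, `check_upload(path.name, path.stat().st_size)` would run the check before the upload request is built.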

Extracted Data Structure

For CIMs

{
  "document_type": "cim",
  "company": "Acme Corp",
  "industry": "Technology",
  "summary": "Leading SaaS provider...",
  "financials": {
    "revenue": 15000000,
    "ebitda": 4500000,
    "ebitda_margin": 0.30,
    "revenue_growth": 0.25
  },
  "market": {
    "size": 500000000,
    "growth_rate": 0.15
  },
  "team": [
    {
      "name": "John Doe",
      "role": "CEO",
      "experience": "15 years in SaaS"
    }
  ],
  "deal": {
    "valuation": 75000000,
    "structure": "Equity sale"
  }
}

For Financial Statements

{
  "document_type": "financial_statement",
  "period": "2024 Q4",
  "revenue": 5000000,
  "expenses": 3500000,
  "net_income": 1500000,
  "balance_sheet": {
    "assets": 20000000,
    "liabilities": 8000000,
    "equity": 12000000
  },
  "tables": [...]
}
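
Because this data is extracted automatically, a consumer may want to sanity-check it before relying on it. The sketch below assumes the fields relate the way the sample values suggest (`ebitda_margin = ebitda / revenue`, `net_income = revenue - expenses`, `assets = liabilities + equity`). These relationships are not documented guarantees of the API, and `find_inconsistencies` and its tolerance are hypothetical.

```python
import math
from typing import List


def find_inconsistencies(extracted: dict, rel_tol: float = 0.01) -> List[str]:
    """Return human-readable problems found in an extracted_data payload.

    The arithmetic relationships checked here are assumptions inferred
    from the sample payloads, not rules stated by the API.
    """
    problems: List[str] = []
    doc_type = extracted.get("document_type")

    if doc_type == "cim":
        fin = extracted.get("financials", {})
        if {"revenue", "ebitda", "ebitda_margin"} <= fin.keys() and fin["revenue"]:
            margin = fin["ebitda"] / fin["revenue"]
            if not math.isclose(margin, fin["ebitda_margin"], rel_tol=rel_tol):
                problems.append("ebitda_margin != ebitda / revenue")

    elif doc_type == "financial_statement":
        if {"revenue", "expenses", "net_income"} <= extracted.keys():
            net = extracted["revenue"] - extracted["expenses"]
            if not math.isclose(net, extracted["net_income"], rel_tol=rel_tol):
                problems.append("net_income != revenue - expenses")
        sheet = extracted.get("balance_sheet", {})
        if {"assets", "liabilities", "equity"} <= sheet.keys():
            total = sheet["liabilities"] + sheet["equity"]
            if not math.isclose(total, sheet["assets"], rel_tol=rel_tol):
                problems.append("assets != liabilities + equity")

    return problems
```

Both sample payloads above pass these checks; a non-empty result is a signal to review the source document, not proof that the extraction is wrong.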

Examples

Upload with Frontend

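// `token` is passed in explicitly; how it is obtained depends on your auth setup.
async function uploadFile(file: File, companyId: string, token: string) {
  const formData = new FormData()
  formData.append('file', file)
  formData.append('company_id', companyId)
  formData.append('document_type', 'cim')

  // Do not set Content-Type manually: fetch adds the multipart boundary itself.
  const response = await fetch('http://localhost:8000/api/files/upload', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${token}`
    },
    body: formData
  })

  // fetch only rejects on network errors, so check the HTTP status explicitly.
  if (!response.ok) {
    throw new Error(`Upload failed with status ${response.status}`)
  }

  return response.json()
}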

Monitor Processing

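// Polls the status endpoint every 2 seconds until processing reaches a terminal state.
async function monitorProcessing(fileId: string, token: string) {
  while (true) {
    const response = await fetch(
      `http://localhost:8000/api/files/${fileId}/status`,
      {
        headers: { 'Authorization': `Bearer ${token}` }
      }
    )

    if (!response.ok) {
      throw new Error(`Status check failed with status ${response.status}`)
    }

    const { status, progress } = await response.json()

    console.log(`Status: ${status}, Progress: ${progress}%`)

    if (status === 'completed' || status === 'failed') {
      return status
    }

    // Awaiting the delay here keeps requests from overlapping when the server is slow.
    await new Promise((resolve) => setTimeout(resolve, 2000))
  }
}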

Python Upload

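import requests

token = 'TOKEN'  # placeholder: replace with a valid bearer token

data = {
    'company_id': '550e8400-e29b-41d4-a716-446655440000',
    'document_type': 'cim'
}

# Open the file in a context manager so the handle is closed after the upload.
with open('document.pdf', 'rb') as f:
    response = requests.post(
        'http://localhost:8000/api/files/upload',
        headers={'Authorization': f'Bearer {token}'},
        files={'file': f},
        data=data
    )

# Raise on 4xx/5xx responses instead of failing later on a missing 'id' key.
response.raise_for_status()

file_info = response.json()
print(f"File ID: {file_info['id']}")
print(f"Status: {file_info['processing_status']}")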

Error Handling

| Code | Error | Description |
| --- | --- | --- |
| 400 | file_too_large | File exceeds maximum size |
| 400 | unsupported_format | File type not supported |
| 404 | file_not_found | File ID doesn’t exist |
| 422 | processing_failed | Document processing error |
| 507 | storage_full | Insufficient storage space |
Error Response:
{
  "detail": "File too large. Maximum size is 100MB for PDF files.",
  "error_code": "file_too_large",
  "max_size_mb": 100
}
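
One way to surface these errors in client code is to convert the response body into a typed exception. The sketch below assumes every error body has the `detail` and `error_code` fields shown above and treats any other keys (such as `max_size_mb`) as optional extras. `FilesApiError` and `parse_error` are hypothetical names, not part of the API.

```python
import json
from typing import Optional


class FilesApiError(Exception):
    """Hypothetical exception carrying the fields of an error response."""

    def __init__(self, status_code: int, error_code: str, detail: str,
                 extra: Optional[dict] = None):
        super().__init__(f"{status_code} {error_code}: {detail}")
        self.status_code = status_code
        self.error_code = error_code
        self.detail = detail
        self.extra = extra or {}


def parse_error(status_code: int, body: str) -> FilesApiError:
    """Build a FilesApiError from an HTTP status and raw response body.

    Falls back to error_code "unknown" and the raw body as detail when
    the body is not a JSON object (for example, a proxy's HTML page).
    """
    try:
        payload = json.loads(body)
    except ValueError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    detail = payload.pop("detail", body)
    error_code = payload.pop("error_code", "unknown")
    return FilesApiError(status_code, error_code, detail, extra=payload)
```

With `requests`, a caller could write `if not response.ok: raise parse_error(response.status_code, response.text)` and then branch on `error_code` in an `except FilesApiError` handler.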