Performance Benchmarks
This page presents detailed benchmarks for the Manas framework to help you understand its performance characteristics and make informed decisions about deployment and optimization.
Benchmark Methodology
Our benchmarks focus on three key scaling dimensions:
- Horizontal Scaling: How the system performs with increasing parallel nodes
- Depth Scaling: Performance with increasing flow depth (sequential nodes)
- Concurrent Scaling: Handling multiple simultaneous flow executions
Testing Environment
- Hardware: Standard cloud instance (8 vCPUs, 32GB RAM)
- Software: Python 3.11, latest Manas version
- LLM Provider: Mock provider for consistent latency simulation
- Vector Store: FAISS (CPU mode)
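Using a mock provider removes real network variance from the measurements, so runs are comparable across machines. A minimal sketch of the idea, assuming a simple async provider interface (the class and method names here are illustrative, not the actual Manas test fixtures):

```python
import asyncio
import time

class MockLLMProvider:
    """Illustrative mock provider that responds after a fixed, simulated latency."""

    def __init__(self, latency_ms: float = 50.0):
        self.latency_ms = latency_ms

    async def generate(self, prompt: str) -> str:
        # Sleep for a deterministic interval instead of calling a real API,
        # so benchmark runs are repeatable and independent of the network.
        await asyncio.sleep(self.latency_ms / 1000)
        return f"mock response to: {prompt[:32]}"

async def main():
    provider = MockLLMProvider(latency_ms=50)
    start = time.perf_counter()
    await provider.generate("hello")
    print(f"elapsed: {(time.perf_counter() - start) * 1000:.1f} ms")

asyncio.run(main())
```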
Results
Horizontal Scaling
Testing parallel node execution with increasing width:
| Number of Parallel Nodes | Average Latency (ms) | Memory Usage (MB) |
|--------------------------|----------------------|-------------------|
| 2                        | 120                  | 256               |
| 4                        | 145                  | 512               |
| 8                        | 180                  | 1024              |
| 16                       | 250                  | 2048              |
Depth Scaling
Performance with increasing sequential node depth:
| Flow Depth | Average Latency (ms) | Memory Usage (MB) |
|------------|----------------------|-------------------|
| 2          | 200                  | 128               |
| 4          | 400                  | 256               |
| 8          | 800                  | 512               |
| 16         | 1600                 | 1024              |
Concurrent Execution
Testing multiple simultaneous flow executions:
| Concurrent Flows | Average Latency (ms) | Memory Usage (MB) |
|------------------|----------------------|-------------------|
| 2                | 250                  | 512               |
| 4                | 500                  | 1024              |
| 8                | 1000                 | 2048              |
| 16               | 2000                 | 4096              |
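Measurements like these can be approximated with a small harness that launches N flows concurrently and records wall-clock time. A minimal sketch of the method, using an async callable with the mock provider's fixed per-node latency as a stand-in for a real flow (this illustrates the approach, not the bundled test suite):

```python
import asyncio
import time

async def run_flow(node_latency_ms: float = 50, depth: int = 4) -> None:
    # Stand-in for one flow execution: `depth` sequential node calls,
    # each simulated by the mock provider's fixed latency.
    for _ in range(depth):
        await asyncio.sleep(node_latency_ms / 1000)

async def measure(concurrent_flows: int) -> float:
    start = time.perf_counter()
    await asyncio.gather(*(run_flow() for _ in range(concurrent_flows)))
    return (time.perf_counter() - start) * 1000

async def main():
    for n in (2, 4, 8, 16):
        print(f"{n:>2} concurrent flows: {await measure(n):.0f} ms")

asyncio.run(main())
```

Note that with a pure-sleep mock the flows overlap almost completely, so the measured time stays near a single flow's latency; the growth in the tables above reflects the per-node CPU work and scheduling overhead that real flows add on top of simulated provider latency.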
Optimization Tips
Memory Optimization
- Flow Design
  - Keep flow width under 8 nodes for optimal performance
  - Consider splitting large flows into smaller sub-flows
  - Use memory-efficient vector stores for large document collections
- Vector Store Configuration
```python
from manas_ai.vectorstores import FaissVectorStore

# Optimize for memory usage
vector_store = FaissVectorStore(
    dimension=1536,
    index_type="IVF100,Flat",  # Memory-efficient index
    metric="l2",
    nprobe=10                  # Balance between speed and accuracy
)
```
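For IVF indexes like the one above, `nprobe` controls how many inverted-list clusters FAISS scans per query: raising it improves recall at the cost of latency, making it the main speed/accuracy dial once the index type is fixed.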
- LLM Configuration
```python
from core import LLM

# Configure for efficiency
llm = LLM.from_provider(
    "openai",
    model_name="gpt-3.5-turbo",  # Faster, more economical model
    max_tokens=100,              # Limit response size
    cache_size=1000              # Adjust based on memory availability
)
```
Latency Optimization
- Parallel Processing
```python
# Configure flow for parallel execution
flow = Flow(parallel_execution=True)

# Group independent nodes
flow.add_parallel_group([node1, node2, node3])
```
- Caching Strategy
```python
from manas_ai.cache import RAGCache

# Configure RAG with caching
cache = RAGCache(max_size=1000)
rag_system = RAG(
    llm=model,
    vector_store=vector_store,
    cache=cache
)
```
- Batch Processing
```python
# Process multiple inputs in batch
results = await flow.batch_process(
    inputs=[input1, input2, input3],
    batch_size=4
)
```
Comparison with Similar Frameworks
Performance comparison with other frameworks, normalized to Manas = 1.0x (lower is better for latency, memory, and setup time; higher is better for throughput):
| Framework  | Latency | Memory | Throughput | Setup Time |
|------------|---------|--------|------------|------------|
| Manas      | 1.0x    | 1.0x   | 1.0x       | 1.0x       |
| LangChain  | 1.2x    | 1.3x   | 0.9x       | 1.1x       |
| LlamaIndex | 1.1x    | 1.2x   | 0.95x      | 1.05x      |
| Custom     | 0.9x    | 0.8x   | 1.1x       | 1.5x       |
Running Your Own Benchmarks
You can reproduce these benchmarks using our testing suite:
```bash
# Install test dependencies
pip install "manas-ai[test]"

# Run benchmarks
python -m tests.benchmark_flow
python -m tests.stress_test_flow

# Generate reports
python -m tests.validate_flow_scaling
```
The benchmark scripts are available in the `tests/` directory:
- `benchmark_flow.py`: Basic flow performance tests
- `stress_test_flow.py`: Load testing and concurrent execution
- `validate_flow_scaling.py`: Scaling validation
Best Practices
- Flow Design
  - Keep flows as shallow as possible
  - Parallelize independent operations
  - Use appropriate batch sizes
  - Implement proper error handling and retries (see the retry sketch after this list)
- Resource Management
  - Monitor memory usage
  - Implement proper cleanup
  - Use connection pooling where applicable
  - Configure appropriate timeouts
- Monitoring
  - Track key metrics:
    - Node execution times
    - Memory usage patterns
    - Error rates
    - Cache hit rates
- Scaling Considerations
  - Horizontal scaling for parallel workloads
  - Vertical scaling for memory-intensive operations
  - Consider distributing vector stores
  - Implement proper load balancing
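For the retry advice above, one possible shape is a small async wrapper with exponential backoff around a node call. The helper below is a hypothetical illustration, not a Manas API:

```python
import asyncio
import random

async def call_with_retries(coro_fn, *args, max_attempts: int = 3, base_delay: float = 0.5):
    """Run an async callable, retrying on failure with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await coro_fn(*args)
        except Exception:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the original error
            # Exponential backoff with jitter to avoid synchronized retry storms
            delay = base_delay * (2 ** (attempt - 1)) * (0.5 + random.random())
            await asyncio.sleep(delay)

# Usage (hypothetical node call):
#     result = await call_with_retries(node.process, input_data)
```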
Known Limitations
- Memory Usage
  - Large flows (>32 nodes) may require significant memory
  - Vector stores scale with document collection size
  - Consider sharding for very large collections
- Latency
  - Network calls to LLM providers add latency
  - Complex flows may have cumulative latency
  - Consider using local models for latency-sensitive operations
- Scaling
  - Single-process limitations
  - Python’s GIL impacts true parallelism
  - Consider distributed deployment for large-scale operations
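Because CPU-bound node work shares one interpreter, asyncio alone cannot use multiple cores. A minimal sketch of pushing heavy steps into worker processes with the standard library (illustrative only; this is not a Manas distribution mechanism):

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def heavy_node_step(payload: str) -> str:
    # CPU-bound work (e.g. parsing or embedding post-processing) that would
    # otherwise block the event loop and serialize behind the GIL.
    return payload.upper()

async def main():
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor(max_workers=4) as pool:
        # run_in_executor offloads each call to a separate process,
        # so the steps run in parallel across cores.
        results = await asyncio.gather(
            *(loop.run_in_executor(pool, heavy_node_step, f"doc-{i}") for i in range(8))
        )
    print(results)

if __name__ == "__main__":
    asyncio.run(main())
```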