Auto-scaling is a cloud computing feature that automatically adjusts computational resources (such as CPU, memory, or storage) based on current or anticipated demand. It helps maintain performance during traffic spikes and reduces costs during low-demand periods by scaling resources up or down dynamically.
Baseline Configuration: Users define initial resource levels based on typical workloads.
Scaling Policies:
  Reactive Scaling: Adjusts resources in response to real-time metrics like CPU usage or network traffic.
  Predictive Scaling: Uses historical data and machine learning to anticipate future demand.
  Scheduled Scaling: Pre-provisions resources for known high-demand periods (e.g., Black Friday sales).
Resource Allocation: Adds new instances or increases capacity when demand rises, and terminates unused instances during low-demand periods to save costs.
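The reactive policy described above can be sketched in a few lines of Python. This is only an illustration of a proportional (target-tracking style) rule, not any provider's actual algorithm; the function name, the 60% CPU target, and the instance limits are all assumptions chosen for the example.

```python
import math


def desired_instances(
    current: int,
    cpu_percent: float,
    target_percent: float = 60.0,  # assumed target utilization
    min_instances: int = 2,        # assumed baseline floor
    max_instances: int = 10,       # assumed cost ceiling
) -> int:
    """Return an instance count that should bring average CPU near the target."""
    if current < 1:
        raise ValueError("current must be at least 1")
    # Proportional rule: if load is 1.5x the target, run 1.5x the instances.
    raw = math.ceil(current * cpu_percent / target_percent)
    # Never drop below the baseline or exceed the configured ceiling.
    return max(min_instances, min(max_instances, raw))


print(desired_instances(current=4, cpu_percent=90.0))  # 6 (scale out)
print(desired_instances(current=4, cpu_percent=15.0))  # 2 (scale in, held at baseline)
```

The minimum plays the role of the baseline configuration, while the maximum caps spending if a metric misbehaves.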
Cost Efficiency: Reduces expenses by provisioning only the resources needed at any given time.
Performance Optimization: Maintains consistent application performance even during traffic spikes.
High Availability: Helps keep the service available by dynamically allocating resources during unexpected surges.
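The cost argument can be made concrete with a little arithmetic. The sketch below compares provisioning for peak demand all day against paying only for the instances needed each hour. The demand figures and the price of 10 cents per instance-hour are made up for illustration, and the model is idealized: it ignores launch delays and minimum billing periods.

```python
def fixed_cost_cents(demand_per_hour: list[int], price_cents: int) -> int:
    """Cost when capacity is sized for the peak hour and never changes."""
    return max(demand_per_hour) * len(demand_per_hour) * price_cents


def autoscaled_cost_cents(demand_per_hour: list[int], price_cents: int) -> int:
    """Cost when capacity matches demand exactly in every hour (idealized)."""
    return sum(demand_per_hour) * price_cents


# Hypothetical day: 18 quiet hours needing 2 instances, 6 busy hours needing 8.
demand = [2] * 18 + [8] * 6
print(fixed_cost_cents(demand, price_cents=10))       # 1920
print(autoscaled_cost_cents(demand, price_cents=10))  # 840
```

In this invented scenario the auto-scaled setup costs less than half of the fixed one; real savings depend on how spiky the workload is.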
Configuration Complexity: Requires careful setup of scaling policies to avoid under-provisioning or over-provisioning.
Latency During Scaling Events: Some delay may occur when launching new instances during sudden spikes in demand.
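One common way to tame both challenges is a cooldown: after a scaling action, further changes are ignored for a while so that new instances have time to start and the metrics can settle. The following is a minimal sketch of that idea; the class name, the 300-second default, and the starting count of 2 are assumptions for illustration rather than values from any specific platform.

```python
from dataclasses import dataclass


@dataclass
class CooldownScaler:
    cooldown_seconds: float = 300.0        # assumed cooldown window
    instances: int = 2                     # assumed starting capacity
    last_change_at: float = float("-inf")  # no scaling action yet

    def apply(self, desired: int, now: float) -> int:
        """Move to `desired` only if the cooldown since the last change has elapsed."""
        cooled_down = now - self.last_change_at >= self.cooldown_seconds
        if desired != self.instances and cooled_down:
            self.instances = desired
            self.last_change_at = now
        return self.instances


scaler = CooldownScaler()
print(scaler.apply(desired=4, now=0.0))    # 4 (first change is allowed)
print(scaler.apply(desired=2, now=100.0))  # 4 (still cooling down, request ignored)
print(scaler.apply(desired=2, now=300.0))  # 2 (cooldown elapsed)
```

A longer cooldown reduces flapping between sizes but also slows the response to a genuine second spike, which is part of why policy tuning takes care.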
An e-commerce website uses auto-scaling during holiday sales events. When traffic surges due to promotions, additional server instances are automatically deployed to handle the load, ensuring fast checkout times and preventing downtime.