Cloud Hosting Glossary

Struggling to tell your APIs from your CDNs? Read our comprehensive cloud computing glossary covering the most common terms.


Auto-scaling

Auto-scaling is a cloud computing feature that automatically adjusts computational resources, such as CPU, memory, storage, or the number of running instances, based on real-time demand. It helps maintain performance during traffic spikes and reduces costs during low-demand periods by adding or removing capacity dynamically.

How Auto-Scaling Works

Baseline Configuration: Users define initial resource levels based on typical workloads.

Scaling Policies: Rules that determine when resources are adjusted. Three types are common:

Reactive Scaling: Adjusts resources in response to real-time metrics like CPU usage or network traffic.

Predictive Scaling: Uses historical data and machine learning to anticipate future demand.

Scheduled Scaling: Pre-provisions resources for known high-demand periods (e.g., Black Friday sales).

Resource Allocation: Adds new instances or increases capacity when demand rises, and terminates unused instances during low-demand periods to save costs.
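The reactive policy described above can be sketched in a few lines of Python. This is a minimal illustration, not any provider's implementation: it assumes a proportional rule (similar in spirit to the one documented for the Kubernetes Horizontal Pod Autoscaler), and the function name, parameters, and default limits are hypothetical.

```python
import math

def desired_instances(current: int, metric: float, target: float,
                      min_instances: int = 1, max_instances: int = 10) -> int:
    """Return the instance count expected to bring the average metric back to target.

    current: instances running now
    metric:  observed average metric across instances (e.g. CPU percent)
    target:  the metric value the policy tries to hold
    """
    if current < 1 or target <= 0:
        raise ValueError("current must be >= 1 and target must be > 0")
    # Proportional rule: if the metric is 1.5x the target, run 1.5x the instances.
    raw = math.ceil(current * metric / target)
    # Clamp to the configured floor (baseline) and ceiling.
    return max(min_instances, min(max_instances, raw))
```

For example, with 4 instances averaging 90% CPU against a 60% target, the function asks for 6 instances. When the metric falls well below the target, the result shrinks toward `min_instances`, which plays the role of the baseline configuration.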

Benefits of Auto-Scaling

Cost Efficiency: Reduces expenses by provisioning only the resources needed at any given time.

Performance Optimization: Maintains consistent application performance even during traffic spikes.

High Availability: Helps keep the service available by allocating additional resources during unexpected surges.
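To make the cost benefit concrete, the sketch below compares a fleet sized permanently for peak demand with one resized every hour. All numbers are hypothetical (demand profile, per-instance capacity, and price), and the model is idealized: it ignores scaling delays and per-second billing details.

```python
import math

PRICE_PER_INSTANCE_HOUR = 0.10  # hypothetical price, for illustration only
CAPACITY_PER_INSTANCE = 100     # hypothetical requests/second one instance can serve

def autoscaled_instance_hours(hourly_demand, min_instances=1):
    """Instance-hours used when the fleet is resized each hour to fit demand."""
    return sum(max(min_instances, math.ceil(d / CAPACITY_PER_INSTANCE))
               for d in hourly_demand)

def static_instance_hours(hourly_demand):
    """Instance-hours used when the fleet is sized for the peak hour all day."""
    peak = math.ceil(max(hourly_demand) / CAPACITY_PER_INSTANCE)
    return peak * len(hourly_demand)

# One day: 20 quiet hours at 200 req/s, then a 4-hour spike at 1000 req/s.
demand = [200] * 20 + [1000] * 4
auto_hours = autoscaled_instance_hours(demand)
static_hours = static_instance_hours(demand)
print(f"static:      {static_hours} instance-hours, "
      f"${static_hours * PRICE_PER_INSTANCE_HOUR:.2f}")
print(f"auto-scaled: {auto_hours} instance-hours, "
      f"${auto_hours * PRICE_PER_INSTANCE_HOUR:.2f}")
```

Under these assumptions the static fleet uses 240 instance-hours and the auto-scaled fleet uses 80. The gap narrows as demand becomes flatter, since both approaches then need roughly the same capacity all day.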

Challenges of Auto-Scaling

Configuration Complexity: Requires careful setup of scaling policies to avoid under-provisioning or over-provisioning.

Latency During Scaling Events: New instances take time to launch and become ready, so capacity can lag behind a sudden spike in demand.
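One common way to handle both challenges is a cooldown period: after a scaling action, further changes are ignored until the previous one has had time to take effect. The sketch below is an assumption-laden illustration; the class name, method, and time units are hypothetical and not taken from any specific cloud provider.

```python
class CooldownScaler:
    """Applies scaling requests, but at most once per cooldown window."""

    def __init__(self, cooldown_seconds: float):
        self.cooldown_seconds = cooldown_seconds
        self.last_change = None  # time of the last applied scaling action

    def apply(self, current: int, desired: int, now: float) -> int:
        """Return the instance count to run at time `now` (seconds)."""
        if desired == current:
            return current
        in_cooldown = (self.last_change is not None
                       and now - self.last_change < self.cooldown_seconds)
        if in_cooldown:
            # The previous change may still be taking effect; hold steady.
            return current
        self.last_change = now
        return desired
```

With a 300-second cooldown, a scale-out at time 0 is applied, a scale-in request at time 100 is ignored, and the same request at time 300 goes through. A longer cooldown reduces oscillation but slows the reaction to real changes in demand, which is part of the configuration trade-off.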

Real-World Example

An e-commerce website uses auto-scaling during holiday sales events. When traffic surges due to promotions, additional server instances are automatically deployed to handle the load, which helps keep checkout fast and reduces the risk of downtime.
