Sizing up any processing workload, whether a Data Warehousing workload (aggregations, mostly I/O-intensive) or a Data Science workload (primarily CPU-intensive, though this depends on the algorithms), is a matter of diligent analysis that weighs multiple factors. For example:
- Whether the processing or algorithm is CPU-intensive or I/O-intensive
- Data volumes
- Aggregations
- Indexing
- Code quality & efficiency
- Programming language or tool of choice
- Data store being used and where it falls on the CAP Theorem
- Choice of Elastic Servers and cluster size, based on the above factors
- Continuous, automated monitoring of cost and its underlying drivers
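The first factor in the list, whether a workload is CPU-intensive or I/O-intensive, can be checked empirically before choosing server types. The Python sketch below is one simple heuristic, not a method prescribed in this post: it compares process CPU time with wall-clock time for a single run. The function names, the stand-in workloads, and the 0.5 threshold are hypothetical choices for illustration.

```python
import time
from typing import Callable


def cpu_to_wall_ratio(workload: Callable[[], object]) -> float:
    """Run `workload` once and return process CPU time divided by wall-clock time."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    workload()
    cpu_elapsed = time.process_time() - cpu_start
    wall_elapsed = time.perf_counter() - wall_start
    # Guard against a zero-length measurement on very fast workloads.
    return cpu_elapsed / wall_elapsed if wall_elapsed > 0 else 0.0


def classify(ratio: float, threshold: float = 0.5) -> str:
    """Label a run using a hypothetical cut-off on the CPU/wall ratio."""
    # High ratio: the process was on the CPU most of the time.
    # Low ratio: it spent most of the time waiting (disk, network, sleep).
    return "CPU-bound" if ratio >= threshold else "I/O-bound"


def crunch() -> int:
    # Stand-in for a compute-heavy step such as a model-scoring loop.
    return sum(i * i for i in range(2_000_000))


def wait() -> None:
    # Stand-in for a step that blocks on storage or network I/O.
    time.sleep(0.5)


for name, fn in (("crunch", crunch), ("wait", wait)):
    ratio = cpu_to_wall_ratio(fn)
    print(f"{name}: ratio={ratio:.2f} -> {classify(ratio)}")
```

A ratio near 1.0 suggests the run kept a core busy, while a ratio near 0 suggests it mostly waited on disk or network. Because `time.process_time()` sums CPU time across all threads of the process, a multithreaded workload can report a ratio above 1.0, so a real sizing exercise would profile representative runs rather than rely on a single measurement.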
Performance and scalability efficiencies can, to some extent, be built into the architecture at the outset of instance/server provisioning, based on the nature of the application, experience, design principles, and a thorough understanding of the Elastic Cloud’s pricing. However, this initial understanding alone will not sustain those efficiencies indefinitely. An embedded, ongoing study of the application's performance profile needs to be part of the DevOps discipline. Alongside it, the cloud provider’s toolset for orchestration, DevOps, and cost management needs to be brought to bear. For instance, on AWS, tools and services such as CloudFormation, Cost Explorer, and Trusted Advisor can be used to manage performance, scalability, and the associated cost. In addition, a healthy ecosystem of product vendors is emerging to help monitor and manage these levers.
Overall, managing the Elastic Cloud's cost is a perennial process that requires a disciplined approach to operating, monitoring, and balancing cost against the triad of requirements (performance, scalability, and the desired business results). How we select the right elastic servers, split the workload, or auto-scale the infrastructure is a topic that merits its own future post.
Elastic Cloud cost comes down to a set of representative levers to be managed: some affect cost in a big way (compute), others move the needle a little (storage), and the remaining cost vectors (load balancing, data transfer, monitoring, Elastic IP addresses, etc.) will most likely fall at the lower end of the scale.
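To see where a particular bill sits on that scale, AWS Cost Explorer exposes cost data through an API. The sketch below assumes Python with boto3 and AWS credentials allowed to call `ce:GetCostAndUsage`; it groups unblended cost by service and ranks services by their share of the total. The helper names are hypothetical, and AWS service names do not necessarily map one-to-one to the levers above (compute-related charges, for example, can be split across more than one service entry).

```python
from datetime import date
from typing import Dict, List, Tuple


def fetch_monthly_cost_by_service(start: date, end: date) -> dict:
    """Query AWS Cost Explorer for unblended cost per service (end date is exclusive).

    Assumes boto3 is installed and credentials allow ce:GetCostAndUsage.
    """
    import boto3  # deferred so rank_cost_levers() can be used without boto3

    client = boto3.client("ce")
    return client.get_cost_and_usage(
        TimePeriod={"Start": start.isoformat(), "End": end.isoformat()},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )


def rank_cost_levers(response: dict, metric: str = "UnblendedCost") -> List[Tuple[str, float, float]]:
    """Return (service, total_amount, share_of_total), largest cost first."""
    totals: Dict[str, float] = {}
    # Cost Explorer returns one entry per period, each with one group per service.
    for period in response.get("ResultsByTime", []):
        for group in period.get("Groups", []):
            service = group["Keys"][0]
            # Amounts come back as strings, e.g. "123.45".
            amount = float(group["Metrics"][metric]["Amount"])
            totals[service] = totals.get(service, 0.0) + amount
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [
        (service, amount, amount / grand_total if grand_total else 0.0)
        for service, amount in ranked
    ]
```

For example, `rank_cost_levers(fetch_monthly_cost_by_service(date(2024, 1, 1), date(2024, 4, 1)))` would cover January through March, since the Cost Explorer end date is exclusive. The ranking step is kept separate from the API call so it can be tested, or reused in an automated monitoring job, without AWS access.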
How is your analytic application performing and scaling? Are you deriving the purported cost benefits of Elastic Cloud?
Elevondata (www.elevondata.com) is a leading-edge data management advisory and data lake solutions company. Rohit Tandon can be reached at rtandon@elevondata.com.