Using Machine Learning with TensorFlow and AWS Lambda
The market for cloud solutions is growing rapidly worldwide. Understandably, this growth can be attributed to the numerous advantages cloud computing offers to organizations of all sizes.
Take scalability, for example. There is no longer any need to purchase surplus equipment to handle peak loads: in the cloud, you can spin up new servers as needed and release them immediately after use. Moreover, renting hardware becomes significantly cheaper when it is shared among a large number of subscribers.
Another undeniable advantage is that you no longer need a large team of technical specialists. There is no need to monitor the health of the equipment or to purchase and replace faulty parts: these responsibilities are handled by the cloud provider, and the user can focus on development and business tasks.
Currently, more than 90% of large organizations are using cloud solutions. The greatest growth is observed in the Internet of Things (IoT), Content as a Service (CaaS), and machine learning fields. SimbirSoft also has such projects under its belt. For example, we are participating in the development of an optical character recognition and fact extraction system for a large American company. Let's look at it in more detail.
The customer has a vast electronic repository containing approximately 700 million scanned newspaper pages. Periodically, there is a need to process all this information and extract various facts about specific events from it. For example, to find all wedding announcements and create a knowledge base about who, when, and with whom marriage was registered.
The architecture of a typical system for solving such a task can be represented as a pipeline (using the Pipe and Filters pattern). At each stage, the information is analyzed, transformed, filtered, and passed to the next stage.
Requests to the system arrive unevenly, with periods of downtime and peak loads. Therefore, the system should automatically scale, adjusting to such conditions. Cloud solutions are perfectly suited for this purpose. Additionally, the customer is already operating in AWS (Amazon Web Services), a commercial public cloud supported and developed by Amazon.
Let's examine the architecture of an element in our AWS pipeline:
- The task flow processing code is written in Java.
- TensorFlow, a machine learning system, is used for text analysis.
- Data preparation for TensorFlow is implemented in Python.
- To meet high scalability demands, data preparation runs on AWS Lambda.
- Computing power in the cloud is provided by the Amazon EC2 service.
- AWS Auto Scaling is used for automatic scaling.
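The data-preparation step can be sketched as a minimal AWS Lambda handler in Python. The event shape and the normalization rules below are hypothetical illustrations, not the project's actual contract:

```python
import json
import re

def normalize_text(raw: str) -> str:
    """Collapse whitespace and strip control characters from OCR output."""
    text = re.sub(r"[\x00-\x1f]", " ", raw)
    return re.sub(r"\s+", " ", text).strip()

def lambda_handler(event, context):
    """Prepare a scanned-page record for the TensorFlow stage.

    Assumes the upstream stage sends {"page_id": ..., "text": ...};
    this shape is illustrative, not the real pipeline contract.
    """
    text = normalize_text(event.get("text", ""))
    return {
        "statusCode": 200,
        "body": json.dumps({"page_id": event.get("page_id"), "text": text}),
    }
```

Keeping the handler a pure function of its input makes it trivial to unit-test locally before deploying.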
We primarily use ready-made solutions, and the architecture appears straightforward at first glance. However, there are crucial cloud-specific details that require attention.
The primary concern is cost. Most cloud services are paid, though their individual rates may be relatively low. For instance, the cheapest Linux server costs around 40 cents per day. Yet automatic scaling can increase the system's cost significantly: during peak loads, our cluster expanded to several hundred machines. Hence, it's crucial to set limits on the cluster size to stay within budget.
Before deploying the system into production, we always make a rough cost estimate, and this task is challenging. Amazon's pricing structure is quite intricate, with certain subtleties. The AWS Pricing Calculator helps simplify the process to some extent. Once the estimation is done, we track its accuracy using Cloudability.
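As a rough sketch of what such an estimate involves, here is the Lambda portion of a monthly bill computed in Python. The default prices are illustrative only; always check the current AWS price list:

```python
import math

def lambda_monthly_cost(requests: int,
                        avg_duration_ms: float,
                        memory_mb: int,
                        price_per_million_requests: float = 0.20,
                        price_per_gb_second: float = 0.0000166667) -> float:
    """Estimate the monthly cost of a Lambda function.

    Prices are illustrative defaults; billing here rounds each
    invocation up to the nearest 100 ms, as described in the text.
    """
    billed_ms = math.ceil(avg_duration_ms / 100) * 100
    gb_seconds = requests * (billed_ms / 1000) * (memory_mb / 1024)
    request_cost = requests / 1_000_000 * price_per_million_requests
    compute_cost = gb_seconds * price_per_gb_second
    return request_cost + compute_cost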
If you decide to work with the cloud independently, enable AWS Billing Alerts on your account. This helps avoid unpleasant surprises at the end of the month when you receive the bill for services. And be sure to set up two-factor authentication, since there have been cases of account breaches where attackers launched expensive computations at the owner's expense.
You can decrease operational costs by enhancing the efficiency of computing resource usage. We monitor CPU and GPU utilization on the running server, aiming for it to stay above 90%. Selecting the appropriate server type (or EC2 instance) is crucial for achieving this. AWS offers hundreds of different instance types, varying in cost, RAM size, number and type of processors, GPU availability, and more.
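When comparing candidate instance types, it helps to look at cost per processed request rather than raw hourly price. The figures below are made up for illustration; they are not actual AWS prices or benchmark results:

```python
def cost_per_request(hourly_price: float, requests_per_hour: float) -> float:
    """Cost of processing one request on a given instance type."""
    return hourly_price / requests_per_hour

# Hypothetical load-test results: (hourly price in USD, measured throughput).
candidates = {
    "cpu-small": (0.10, 1_200),
    "cpu-large": (0.40, 5_500),
    "gpu":       (0.90, 15_000),
}

best = min(candidates, key=lambda name: cost_per_request(*candidates[name]))
```

With these made-up numbers, the most expensive instance turns out to be the cheapest per request, which is exactly why decisions should rest on measurements rather than sticker price.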
In our case, it was crucial to make optimal use of TensorFlow:
- choose between TFServing and Embedded TensorFlow;
- make the choice between CPU or GPU model;
- determine the amount of RAM required for computations;
- calculate the optimal number of threads;
- configure batch processing of requests.
Effective load testing helps make the right decision based on accurate measurements rather than intuition.
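Batch processing, the last tuning point above, can be sketched as a simple micro-batcher that groups incoming requests before handing them to the model. This is a pure-Python illustration; TensorFlow Serving has its own built-in batching:

```python
from typing import Callable, Iterable, List

def process_in_batches(items: Iterable[str],
                       batch_size: int,
                       run_model: Callable[[List[str]], List[str]]) -> List[str]:
    """Group items into fixed-size batches and run the model once per batch.

    Batching amortizes per-call overhead, which matters on GPUs where one
    large matrix multiply is far cheaper than many small ones.
    """
    results: List[str] = []
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            results.extend(run_model(batch))
            batch = []
    if batch:  # flush the final, possibly partial, batch
        results.extend(run_model(batch))
    return results
```

The right batch size is itself a load-testing question: larger batches raise throughput but also increase per-request latency.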
The system's scalability can be improved significantly with Lambda functions. We use them for pre-processing data before sending it to TensorFlow. Lambda function code runs without the need to provision and manage servers, and you only pay for the actual computation time. However, there are some nuances:
- charges are based on the number of requests to the functions and their duration, and the per-unit price depends on the amount of allocated memory;
- increasing the memory size also proportionally increases the allocated computational resources (up to a certain limit); since the function then runs faster, this trick allows achieving higher performance at roughly the same cost;
- the processing time is rounded up to 100 ms (and sometimes it significantly affects the cost);
- there is a limit on the number of requests to the function per unit of time, and exceeding this limit leads to throttling and rejections;
- the number of rejections can be significantly reduced by properly selecting Concurrency and Reserved Concurrency parameters;
- if a Lambda function is not used for some time, its code is removed from the container, and the subsequent start will take longer than usual (cold start).
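Throttling from the list above is usually handled on the caller side with retries and exponential backoff. Here is a minimal sketch; `ThrottledError` and the injected `invoke` callable are hypothetical stand-ins for the AWS SDK's throttling exception and invoke call:

```python
import time

class ThrottledError(Exception):
    """Stand-in for the SDK's throttling exception."""

def invoke_with_backoff(invoke, payload, max_attempts=5, base_delay=0.1):
    """Retry a throttled Lambda invocation with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return invoke(payload)
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            time.sleep(base_delay * 2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...
```

Backoff smooths out short throttling bursts; sustained rejections are a sign that Reserved Concurrency needs adjusting instead.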
In this project, we successfully dealt with all technical complexities, and ultimately, the use of cloud services allowed us to focus on business tasks and spend less time on infrastructure. During the project, we once again confirmed the importance of load testing and cost estimation. Thanks to them, we promptly identified and reworked the most expensive part of the system. As a result, we successfully met both the deadlines and the budget. The discrepancy between the estimated and actual results was less than 5%.
You can learn more about our processes here. For more cases and useful materials for business, visit our LinkedIn and Medium accounts.