Next-generation applications such as autonomous driving, augmented/virtual reality, smart technologies, interactive games, and Industry 4.0 produce massive volumes of data that must be analyzed in a timely fashion. Stream processing frameworks are often used to address this need.
Stream processing applications are often structured as a dataflow graph. Vertices can be sources that generate streams of data tuples, or operators that execute a function over incoming data streams. Sinks are a special type of vertex that consume the processed data; they are terminal and represent the end of the flow. Traditionally, all application components are placed in the cloud to take advantage of powerful datacenters. Unfortunately, this approach is poorly suited to next-generation applications, since sending data over wide-area links to the cloud results in high bandwidth usage and high application latency.
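The source-operator-sink structure described above can be sketched as follows. This is a minimal illustrative model, not the API of any of the frameworks discussed; the class names and generator-based composition are assumptions made for exposition.

```python
# Minimal sketch of a stream-processing dataflow graph (illustrative only;
# Source/Operator/Sink are assumed names, not a real framework's API).

class Source:
    """Vertex that generates a stream of data tuples."""
    def __init__(self, tuples):
        self.tuples = tuples

    def stream(self):
        yield from self.tuples


class Operator:
    """Vertex that executes a function over an incoming data stream."""
    def __init__(self, fn):
        self.fn = fn

    def process(self, stream):
        for t in stream:
            out = self.fn(t)
            if out is not None:   # returning None drops the tuple
                yield out


class Sink:
    """Terminal vertex: consumes the processed data, ending the flow."""
    def __init__(self):
        self.received = []

    def consume(self, stream):
        self.received.extend(stream)


# Wire up a tiny graph: source -> doubling operator -> sink.
source = Source([1, 2, 3, 4])
double = Operator(lambda x: x * 2)
sink = Sink()
sink.consume(double.process(source.stream()))
# sink.received == [2, 4, 6, 8]
```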
Edge computing expands cloud computing with a hierarchy of computational resources located along the path between the edge and the cloud. We argue that efficient use of edge computing for stream processing requires support for seamless reconfiguration and deployment of application operators without disrupting application execution. In particular, throughput should remain stable, and latency should not spike during the reconfiguration. Seamless reconfiguration is required to enable efficient resource sharing between applications running on edge datacenters. The smaller size of an edge datacenter leads to higher costs for storage and computation relative to the cloud (e.g., AWS Wavelength is 40% more expensive than EC2). It is therefore impractical (and may not even be possible due to limited resources) to run all applications continuously on the edge.
Figure 1 illustrates the benefits of dynamic reconfiguration for an application running on a hierarchy of edge datacenters with three levels: cloud, region, and city. The application provides analytics about traffic on city roads by performing object detection on video frames produced by a network of motion-activated street cameras. The application consists of a sequence of three operators: [F], a frame filter operator that removes frames that do not significantly change compared to the previous frame; [O], an object detector operator; and finally, [A], an aggregation operator that computes statistics.
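The three-operator pipeline above can be sketched as composed stream transformations. The frame representation, the brightness-based change test, and the stand-in detector are all assumptions made purely for illustration; a real deployment would use image data and a trained object detector.

```python
# Hypothetical sketch of the Figure 1 pipeline: [F] filters near-duplicate
# frames, [O] stands in for an object detector, [A] aggregates statistics.
# Frames are modeled as dicts with a scalar "brightness" (an assumption).

def frame_filter(frames, threshold=10):
    """[F]: drop frames that do not change significantly vs. the previous frame."""
    prev = None
    for f in frames:
        if prev is None or abs(f["brightness"] - prev["brightness"]) >= threshold:
            yield f
        prev = f

def object_detector(frames):
    """[O]: stand-in for a real detector; tags each frame with a vehicle count."""
    for f in frames:
        yield {"camera": f["camera"], "vehicles": f["brightness"] // 20}

def aggregator(detections):
    """[A]: compute per-camera vehicle totals across the whole stream."""
    totals = {}
    for d in detections:
        totals[d["camera"]] = totals.get(d["camera"], 0) + d["vehicles"]
    return totals

frames = [
    {"camera": "c1", "brightness": 40},
    {"camera": "c1", "brightness": 42},   # changed by < 10: filtered out by [F]
    {"camera": "c1", "brightness": 80},
    {"camera": "c2", "brightness": 60},
]
stats = aggregator(object_detector(frame_filter(frames)))
# stats == {"c1": 6, "c2": 3}
```

Note that [F] only needs its local input stream, which is why it can be replicated near the cameras, while [A] must see detections from every region and therefore stays in the cloud.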
Initially, all operators are deployed on the cloud as the number of active cameras is small, and the network cost of transmitting the raw images to the cloud is low (Figure 1a). As more cameras become active in Cities 1 and 2, located in Region 1, network traffic grows and it becomes more cost-effective to filter frames closer to the source by deploying operator [F1] in Region 1 (Figure 1b). The original [F] operator continues to run in the cloud, where it processes traffic from Region 2. Similarly, operators [O] and [A] are not replicated and continue to run only on the cloud. Operator [O] is CPU intensive and benefits from the cheaper cloud cycles. Operator [A] aggregates data across regions and has to run on the cloud, where it has a global view. As traffic continues to grow in City 1, the application adapts by creating a new replica of the filter operator, [F2], in this city (Figure 1c). This process is repeated with the creation of [F3] as more cameras become active in City 2. Finally, the [F1] operator in Region 1 is removed as it is no longer necessary (Figure 1d).
State-of-the-art frameworks, such as Apache Flink, Apache Spark, and Apache Storm, do not support seamless application reconfiguration. These frameworks were designed to run on the cloud, where application reconfiguration is rare. It is therefore not surprising that reconfiguration in these frameworks is implemented as a high-latency, system-wide stop-the-world event that involves expensive coordination, often in the form of a global barrier. Global coordination is required because these frameworks use early-binding routing, where upstream and downstream operators have direct socket connections. Our experiments show that even for a small deployment of a few nodes, the stoppage time (the interval during which the application stops processing data) is measured in tens of seconds and grows with the size of the deployment. Such an approach is incompatible with edge applications, which often have real-time requirements and where reconfiguration is a common occurrence.
VantEdge Labs developed a stream processing framework for edge networks called Shepherd. Shepherd enables seamless application reconfiguration with minimal stoppage time. Shepherd's architecture uses a network of software routers to transfer data tuples between operators. Shepherd implements a late-binding approach to routing, where an operator does not need to know the location of the next operator that will consume the tuples it produces. Instead, tuples are shepherded to their destination. This approach allows for flexible reconfiguration with minimal stoppage time and without requiring global coordination. As an optimization, Shepherd uses direct network connections (bypassing the router) to transfer messages between operators deployed in the same datacenter; however, this is done without compromising reconfigurability.
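The contrast between early and late binding can be sketched in a few lines. This is not Shepherd's actual API; the `Router` class, its routing table, and the `register`/`send` names are assumptions used to illustrate the idea that a producer addresses a logical operator name, so relocating the operator only updates the router, not the producer.

```python
# Sketch of late-binding routing (hypothetical, not Shepherd's real interface).
# Producers send tuples to a logical operator name; the router resolves the
# operator's current physical instance at delivery time.

class Router:
    def __init__(self):
        self.table = {}  # logical operator name -> current instance (a callable)

    def register(self, name, instance):
        """Reconfiguration step: (re)bind a logical operator to a new instance."""
        self.table[name] = instance

    def send(self, name, tup):
        """Shepherd a tuple to wherever the named operator currently runs."""
        self.table[name](tup)


router = Router()
results = []

# Initial deployment: the "filter" operator runs in the cloud.
router.register("filter", lambda t: results.append(("cloud", t)))
router.send("filter", "frame-1")

# Seamless reconfiguration: rebind "filter" to an edge replica. The producer's
# send() call is unchanged -- no direct socket connection to tear down and
# re-establish, hence no global coordination barrier.
router.register("filter", lambda t: results.append(("edge", t)))
router.send("filter", "frame-2")
# results == [("cloud", "frame-1"), ("edge", "frame-2")]
```

Under early binding, by contrast, the producer would hold a direct connection to the cloud instance, and moving the operator would require coordinating with every upstream peer before traffic could resume.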
We evaluated Shepherd on a hierarchical edge network composed of several Amazon datacenters geographically distributed across North America. Shepherd reduces stoppage time by up to 97.5% compared to Apache Storm. Moreover, in contrast to Apache Storm, Shepherd’s stoppage time does not increase with the number of datacenters in the network, network latency, or application size.