Interested in how DevOps, IT Modernization and Agile practices can positively impact customer experience?
CAP Theorem

Five years ago, Amazon found that every 100ms of latency cost them 1% of sales. Google discovered that a half-second increase in search latency dropped traffic by 20%. The need to scale up, down and out keeps growing, and so do the challenges of running huge distributed systems. So, when designing such applications, it’s important to keep in mind the three core requirements described by Brewer’s CAP theorem:

1. Consistency
2. Availability
3. Partition Tolerance

The CAP theorem was first proposed by Eric Brewer of the University of California, Berkeley, in 2000, and then proven by Seth Gilbert and Nancy Lynch of MIT in 2002. Read more here.

Defining The Three Core Requirements

Consistency (C) requires that all reads initiated after a successful write return the same, latest value at any given logical time.

Availability (A) requires that every node not in a failed state always be able to execute queries. Say we have “n” servers serving our application; to ensure better availability we would add an additional “x” servers.

Partition Tolerance (P) requires that a system be able to re-route communication when there are temporary breaks or failures in the network, the goal being to maintain synchronization among the involved nodes.

Brewer’s theorem states that a distributed system cannot provide all three requirements simultaneously; one of them will always be compromised.

Tradeoffs Between Requirements
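As a toy illustration of the tradeoff (not from the original post; all names and numbers here are invented), consider a replicated store answering a read while a network partition hides some replicas. Choosing consistency means refusing to answer; choosing availability means risking a stale answer:

```python
# Toy sketch of the C-vs-A choice during a network partition.
REPLICAS = 3
QUORUM = REPLICAS // 2 + 1  # majority needed for a consistent read

def read(reachable_replicas: int, prefer_consistency: bool) -> str:
    if reachable_replicas >= QUORUM:
        return "latest value"              # enough replicas reachable: no tradeoff
    if prefer_consistency:
        raise RuntimeError("unavailable")  # a CP system sacrifices availability
    return "possibly stale value"          # an AP system sacrifices consistency

print(read(reachable_replicas=1, prefer_consistency=False))  # -> possibly stale value
```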
Docker Introduction

“Build once, configure once and run anywhere….” That’s Docker summarized in one line. When choosing from the great number of available application-development technologies, a key concern is the ability to deploy applications in any environment without overhead. Deploying services or applications across multiple environments can lead to conflicting communications between services, so one must address issues such as quick migration, scaling and performance.

Shortcomings of Virtual Machines (VMs)

Up to now, virtual machines have been the go-to method for packaging and distributing applications across various environments. They can be used to create isolated environments, to package them using tools such as Vagrant, and to ship them where needed. However, VMs have their shortcomings:

1. A VM’s size can grow very large when it has to carry all of the required dependencies and packages.
2. They are resource intensive, consuming a great deal of CPU and memory. A complex scenario, like scaling an application across multiple providers, can introduce further complications, such as running out of disk space.
3. From a developer’s perspective, their tools for building and testing applications are limited.
4. They impose significant performance overhead, especially when performing IO operations.

Why Docker Instead of Linux Containers (LXCs)?

LXCs are lightweight and allow one to run multiple isolated instances on the same host. They share a single kernel, but each can have a defined limit on the resources it may consume. LXCs allow the secure running of isolated instances without interference among those instances. Docker is not a replacement for LXCs, but rather an added layer that makes them easier to use for more complex operations. Docker distinguishes itself from LXCs in several ways:
Amazon recently introduced new types of storage-optimized instances. This new generation of instances is available within the I2 and HI1 families. All provide high storage capacity and better IO performance than other instance families in AWS. Flux7 Labs decided to benchmark these new instances to better understand the tradeoffs between them that our customers face.

The I2 and HI1 families provide fast IO thanks to their SSD-backed instance stores. The instances are optimized for applications with specific storage and random I/O requirements. The HS1 type provides cost-optimized storage of 48TB and is optimized for sequential IO. The following table provides a quick view of the various instances in the I2, HI1 and HS1 families.

Faced with so many options, it’s often difficult to decide which instance type to choose. Performance is important, obviously, but cost is also a key consideration. So we decided to run a few basic microbenchmarks on each instance type in order to calculate performance benefit-per-dollar. We chose to use two types of benchmarks: CPU benchmarking using CoreMark software.

CPU Benchmarking Through CoreMark

CoreMark is an industry-standard microbenchmark within the EEMBC suite that’s used for testing CPU performance. It has some limitations, as do all benchmarks, so the only way to get an accurate depiction of performance is to run the target application with a representative dataset. Still, CoreMark is quite good at providing simple CPU performance comparisons. You can find more details on CoreMark here, and the software can be downloaded at the CoreMark website. Its strength is its simplicity of use, as it provides a single number by which different CPUs can easily be compared. We ran the CoreMark benchmark 10 times and discarded the fastest and slowest two results in order to remove major outliers.
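In code, that scoring method boils down to a trimmed mean divided by the hourly price. Here is a minimal sketch; the run results and the price are made-up placeholders, and we read “fastest and slowest two” as two from each end:

```python
# Trim outliers from repeated CoreMark runs and compute score per dollar-hour.
runs = [14210.0, 14250.3, 14262.1, 14271.8, 14280.4,
        14288.9, 14295.2, 14301.6, 14330.7, 14402.5]  # iterations/sec, 10 runs

def trimmed_mean(scores, trim=2):
    """Average the results after dropping the `trim` lowest and highest."""
    kept = sorted(scores)[trim:-trim]
    return sum(kept) / len(kept)

hourly_price = 0.85  # hypothetical on-demand $/hr for the instance under test
score = trimmed_mean(runs)
print(f"CoreMark: {score:.0f} iter/s, {score / hourly_price:.0f} iter/s per $/hr")
```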
While advising a client with a strong interest in ARM servers, we decided to evaluate the computational overhead of various big data technologies, which led to some interesting discoveries. Since we in the field are all trying to figure out how big data technology will evolve, Flux7 Labs thought we’d share some of what we’ve learned.

The Question

The question we tried to answer was, “Is Hadoop a good candidate for microserver workloads?” Big Data workloads, with their heavy reliance on memory and network, are often touted as perfect candidates for moving to microservers. Among Big Data workloads, Hadoop has become the poster child, and microserver vendors are keenly interested in seeing Hadoop ported over to their platforms. Our theory was that Hadoop is a very high-overhead application because it was designed to utilize excess CPU capacity to conserve disk and network bandwidth. That would make vanilla Hadoop, without tuning and rearchitecting, unsuitable for microservers, with their limited CPU horsepower and excess disk and network capacity. To test our theory, we conducted a brief experiment comparing the application’s performance when run on Hadoop and when run natively.

The Experiment

Our experiment was based on a project previously coded by our CTO, Ali Hussain, for an assignment in the Coursera course, “Introduction to Data Science” (https://www.coursera.org/course/datasci). Ali’s project was a compute-heavy application for performing sentiment analysis on a Twitter stream. You can find his code at https://github.com/Flux7Labs/twitter-fun. The code classifies tweets as positive, negative or neutral based on their content. Most of the work is performed by parsing text and looking up the words in a dictionary in order to classify them. We rewrote the original Python program in Java.
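The core of that dictionary-lookup approach fits in a few lines. Here is a minimal Python sketch of the idea; the word scores are invented for illustration, and the real code is in the twitter-fun repo linked above:

```python
# Classify a tweet by summing per-word sentiment scores from a dictionary.
SENTIMENTS = {"good": 2, "great": 3, "love": 3, "bad": -2, "awful": -3}

def classify(tweet: str) -> str:
    score = sum(SENTIMENTS.get(word, 0) for word in tweet.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify("What a great day"))  # -> positive
```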
The Amazon I2 Instance Type

Amazon has announced immediate availability of the I2 instance type, the next generation of Amazon EC2 High I/O instances and the best solution for transactional systems and high-performance NoSQL databases such as Cassandra and MongoDB. I2 instances feature the latest generation of Intel Ivy Bridge processors, the Intel Xeon E5-2670 v2. Each virtual CPU (vCPU) is a hardware hyperthread of an Intel Xeon E5-2670 v2 (Ivy Bridge) processor. Taken together, the type’s features, price and availability point to performance-oriented usage and open up new use cases.

Price and Availability

I2 instances are available in four sizes, as listed in the table below.

| Instance Type | vCPUs | ECU Rating | Memory (GiB) | Instance Storage, SSD (GB) | Linux On-Demand Price* ($/hr) |
|---|---|---|---|---|---|
| i2.xlarge | 4 | 14 | 30.5 | 1 x 800 | $0.85 |
| i2.2xlarge | 8 | 27 | 61 | 2 x 800 | $1.71 |
| i2.4xlarge | 16 | 53 | 122 | 4 x 800 | $3.41 |
| i2.8xlarge | 32 | 104 | 244 | 8 x 800 | $6.82 |

Support for Enhanced Networking

I2 instances are the best solution for setups that require high packet rates because they support Enhanced Networking, which provides very high packet-per-second performance. The enhanced networking capabilities are implemented using Single Root I/O Virtualization (SR-IOV), a device-virtualization method that delivers higher I/O performance and lower CPU utilization than traditional implementations. To take full advantage of Enhanced Networking, simply launch the appropriate AMI on a supported instance type inside a VPC; the feature comes free of cost. I2 instances currently support only Hardware Virtual Machine (HVM) Amazon Machine Images (AMIs). That’s because the performance overhead of paravirtualization makes little sense given the high-throughput workloads these instances target.
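For concreteness, here is a hedged boto3 sketch of launching an I2 instance the way the paragraph above describes, with an HVM AMI inside a VPC subnet. The AMI and subnet IDs are placeholders, and boto3 itself postdates this post:

```python
# Launch an i2.xlarge with an HVM AMI in a VPC (required for enhanced networking).
import boto3

ec2 = boto3.client("ec2")
ec2.run_instances(
    ImageId="ami-xxxxxxxx",      # placeholder: must be an HVM AMI
    InstanceType="i2.xlarge",
    SubnetId="subnet-xxxxxxxx",  # placeholder: a subnet inside your VPC
    MinCount=1,
    MaxCount=1,
)
```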
And a glossary treat for the weekend. Check out how well you know DevOps terms!

Chef – Chef, a configuration management tool by Opscode, is an automation platform that turns infrastructure into simple code. It helps to automate the configuration, deployment and scaling of servers and applications, irrespective of whether the server or application is in the cloud, on site or a combination of both. Chef runs in two modes: client/server and standalone. It is written in Ruby and Erlang.

Puppet – Puppet is IT automation software by Puppet Labs that aids management of an infrastructure throughout its lifecycle. It provides automation and management all the way from initial provisioning and configuration through orchestration and reporting.
One of our primary goals at Flux7 Labs is to help our clients reduce their AWS costs. In fact, our product VyScale is based entirely on cost optimization using Spot instances. We inform our clients when it makes economic sense for them to buy instance reservations, because reservations covering a client’s expected minimum usage can be beneficial. Reserved instances function exactly like on-demand instances, except that you pay an upfront fee to gain cheaper hourly rates.

Reserved Instance Pricing

There are several levels of reservations; higher ones let you pay more up front in order to get a lower hourly cost. The table below shows the rates for on-demand and the various types of reserved m1.large instances.

| Pricing (m1.large) | Upfront ($) | Hourly (cents) |
|---|---|---|
| On-demand | 0 | 24 |
| Light Utilization | 243 | 13.6 |
| Medium Utilization | 554 | 8.4 |
| Heavy Utilization | 676 | 5.6 |

At some levels of utilization it makes sense to purchase reservations rather than to rely solely upon on-demand instances. By factoring in upfront costs, we can determine which levels warrant purchasing reservations. What’s surprising is that those levels are fairly low, especially in the case of light reservations: even at 30% utilization, light-reservation costs start to break even with those of on-demand instances. The following graph shows equivalent hourly costs for reservations at various utilization levels.

To better understand these numbers in terms of total cost, as opposed to incremental cost, the figure below shows total annual expenditures at various levels. At 100% utilization of an instance, reserved instances can reduce costs to almost half of Amazon’s standard on-demand pricing, and that’s without any of the bulk discounts made available to high-volume AWS customers. At lower utilization levels, reservations can cost almost twice as much as they do at higher levels.
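The break-even claim is easy to check. Here is a minimal sketch of the math, assuming a one-year term (8,760 hours) and a simplified model in which every tier is billed only for hours actually used:

```python
# Find the utilization u at which each reservation tier matches on-demand cost:
# upfront + hourly*u*H == on_demand*u*H  =>  u = upfront / (H * (on_demand - hourly))
HOURS_PER_YEAR = 24 * 365
ON_DEMAND = 0.24  # $/hr, m1.large

RESERVATIONS = {  # tier: (upfront $, hourly $)
    "light": (243, 0.136),
    "medium": (554, 0.084),
    "heavy": (676, 0.056),
}

for tier, (upfront, hourly) in RESERVATIONS.items():
    breakeven = upfront / (HOURS_PER_YEAR * (ON_DEMAND - hourly))
    print(f"{tier}: breaks even at {breakeven:.0%} utilization")
# light comes out near 27%, consistent with the ~30% figure above
```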
In our previous post here, we detailed why Ganglia is a good tool for monitoring clusters. However, when monitoring a Hadoop cluster you often need more information about CPU, disk, memory, and nodal network statistics than the generic Ganglia config can provide. For those who need more finely tuned monitoring, Hadoop supports a framework for recording internal statistics and then posting them to an external sink, either a file or Ganglia. In fact, Hadoop now supports an implementation of the Metrics2 Framework for Ganglia. In this post we’ll discuss the Hadoop Metrics2 Framework’s design and how it enables Ganglia metrics.

Features

The Hadoop Metrics2 Framework provides multiple metrics output plugins that can be used in parallel. It allows dynamic reconfiguration of metrics plugins without having to restart the server, and it exports metrics via Java Management Extensions (JMX).

Design Overview

The Hadoop Metrics2 Framework consists of three major components:

1. The metric sources, which generate metrics.
2. The metric sinks, which consume the metrics produced by the sources.
3. The metric system, which periodically polls the metric sources and passes the metric records to the sinks.

Implementing and Configuring Components

A metric source class must implement the following interface:

org.apache.hadoop.metrics2.MetricsSource

A metric sink must implement this interface:

org.apache.hadoop.metrics2.MetricsSink

The basic syntax to configure metric system components is:

<prefix>.(source|sink).<instance>.<option>

Here’s a sample job tracker configuration for sinking to a file:

jobtracker.sink.file.class=org.apache.hadoop.metrics2.sink.FileSink
jobtracker.sink.file.filename=jobtracker-metrics.out

Filtering Metrics

Metrics can be filtered on source, context, records, tags, and the metrics themselves. Here is a filtering example:

test.sink.file0.class=org.apache.hadoop.metrics2.sink.FileSink
test.sink.file0.context=foo

This routes only the metrics belonging to the context “foo” to this sink.

Hadoop–Ganglia Integration

Metrics are collected from the following (a sample Ganglia sink configuration appears after the list):

1. JobTracker
2. TaskTracker
3. NameNode
4. DataNode
5.
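To ship these daemons’ metrics to Ganglia, a minimal hadoop-metrics2.properties sketch might look like the following. GangliaSink31 ships with Hadoop, while the gmond host, port and period here are placeholders:

*.sink.ganglia.class=org.apache.hadoop.metrics2.sink.ganglia.GangliaSink31
*.sink.ganglia.period=10
namenode.sink.ganglia.servers=gmond.example.com:8649
datanode.sink.ganglia.servers=gmond.example.com:8649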
I’ve been doing a lot of analysis of latency and throughput recently as part of benchmarking work on databases, and I thought I’d share some insights into how the two are related. For an overview of what these terms mean, check out Aater’s post describing the differences between them here. In this post we’ll talk about how they relate to each other. The basic relationship between latency and throughput within a stable system—one that is not ramping up or down—is defined by a simple equation called Little’s Law:

occupancy = latency x throughput

Latency in this equation means the unloaded latency of the system plus the queuing time that Aater wrote about in the post referenced above. Occupancy is the number of requestors in the system.

The Water Pipe Revisited

To understand the latency/throughput relationship, let’s go back to Aater’s example of the water pipe. Occupancy is the amount of water in the pipe through which we are measuring latency and throughput. The throughput is the rate at which water leaves the pipe, and the latency is the time it takes water to get from one end of the pipe to the other. If we cut the pipe in two, we halve the latency, but throughput remains constant; that’s because by halving the pipe we’ve also halved the total amount of water in it. On the other hand, if we add another pipe, the latency is unaffected and the throughput is doubled, because the amount of water in the system has doubled. Increasing the water pressure also changes the system: if we double the speed at which the water travels, we halve the latency and double the throughput, while the occupancy remains constant. The following figures give a graphical view of the equation.
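Plugging numbers into Little’s Law makes the relationship concrete. Here is a quick sketch with made-up figures for a database serving requests:

```python
# Little's Law: occupancy = latency * throughput (in a stable system).
latency = 0.005      # seconds per request: unloaded latency plus queuing time
throughput = 2000.0  # requests completed per second
occupancy = latency * throughput
print(f"requests in flight: {occupancy:.0f}")  # -> 10
```

Doubling the “pipe speed” halves `latency` and doubles `throughput`, leaving `occupancy` at the same 10 requests in flight, just as the water-pipe analogy predicts.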
Recently at Flux7 Labs we developed an end-to-end Internet of Things project that received sensor data in order to provide reports to service-provider end users. Our client asked us to support multiple service providers for his new business venture. We knew that rearchitecting the application to incorporate major changes would prove both time-consuming and expensive for our client. It also would have required a far more complicated, rigid and difficult-to-maintain codebase. We had been exploring the potential of the container technology Docker for setting up Flux7 Labs’ internal development environments and, based on our findings, believed we could use it to avoid a major application rewrite. So we decided to use Docker containers to provide quick, easy and inexpensive multi-tenancy by creating isolated environments running multiple instances of the app tier, one per provider.

What is Docker?

Docker provides a user-friendly layer on top of Linux Containers (LXCs). LXCs provide operating-system-level virtualization by limiting a process’s resources. Going beyond the chroot command, which merely changes the accessible directories for a given process, Docker effectively isolates a group of processes from the rest of the system’s files and processes without the expense of running another operating system.

In the Beginning

The “single provider” version of our app had three components:

1. Cassandra for data persistence, which we later use for generating each gateway’s report.
2. A Twisted TCP server listening at PORT 6000 for data ingestion from a provider’s multiple gateways.
3. A Flask app at PORT 80 serving as the admin panel for setting customizations and for viewing reports.

In the past, we’d launched the single-provider version of the application with the Cassandra KEYSPACE hard-coded inside both code bases.
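To show the shape of that approach, here is a hedged sketch using the modern Docker SDK for Python (which postdates this project); the image name, provider names and ports are all invented:

```python
# Launch one isolated app-tier container per service provider.
import docker

client = docker.from_env()
PROVIDERS = {"acme": 6001, "globex": 6002}  # hypothetical provider -> host port

for provider, host_port in PROVIDERS.items():
    client.containers.run(
        "flux7/app-tier",                              # hypothetical image
        name=f"app-{provider}",
        detach=True,
        environment={"CASSANDRA_KEYSPACE": provider},  # per-tenant keyspace
        ports={"6000/tcp": host_port},                 # per-tenant ingestion port
    )
```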
Press Release

January 24, 2014—Austin, TX—Flux7, a solutions company offering cloud-optimization products, cloud consulting and implementation, and specializing in DevOps and AWS development services for small businesses through Fortune 500 corporations, today announces that its VyScale spot strategy solution—the only one of its kind on the cloud market—recently earned honorable mention in a competition sponsored by Amazon Web Services [AWS]. Judges of the second annual contest shortlisted droves of applicants to reveal the top six, naming Flux7’s VyScale among them last November at AWS re:Invent 2013, considered the largest gathering of developers and technical leaders from the AWS community.

“We received many impressive applications, making this year’s competition even harder to judge than last year’s. There were many sophisticated and novel uses of spot instances, and VyScale was among them,” wrote Stephen A. Elliott, Senior Product Manager, AWS Elastic Compute Cloud [EC2], to Aater Suleman, CEO, Flux7. “Given how tight the competition was, we congratulate you on your honorable mention!”

Suleman first learned of the award at the AWS re:Invent developers conference and was quite happy with the result. He noted that, overall, VyScale delivers more than 60 percent in cost reductions without the need for human intervention, installations or access to sensitive data. VyScale, now in closed beta with paying customers, seamlessly combines the cost savings of spot instances with the reliability of reserved and on-demand instances by using smart placement, instance selection and AI-based prediction.

“VyScale is an application that’s the result of our team’s extensive research and expertise in cloud consulting,” Suleman said. “It’s the only spot-strategy-as-a-service product on the market that delivers key strategies for the AWS spot marketplace and solutions for scaling and disaster recovery.
On January 11, Aater and I attended Data Day Texas 2014 here in Austin. Sponsored by Geek Austin, it was such a great event that I thought I’d share some highlights. Data Day Texas holds special significance for Flux7 Labs because it was at Data Day 2013 that we made our first presentation, when Aater gave a talk on the role of microservers in big data, which you can find here. What I like most about this conference is that it strikes a perfect balance between technical depth and high-level theory, one that appeals to everyone. Another great thing about it is that, in addition to presentations on a wide array of topics, you get to meet and talk with a broad spectrum of cutting-edge leaders and innovators in the field. It was also great getting to meet and exchange ideas with many of our Austin network friends, people we know both personally and by reputation, but with whom we don’t often get time to hang out in our everyday professional lives. It’s inspiring and rejuvenating to step back from our own work and hear what others are thinking about and working on.

This year’s keynote was delivered by Paco Nathan (@pacoid). Entitled “Data 2014: The Big Picture”, his talk covered a variety of topics, from how playing Minecraft has taught his daughter enough to be a Linux sysadmin to how our schools are failing to educate students in the particular fields of math that are most relevant to the 21st century. For instance, current engineering curricula place primary emphasis on calculus when in fact it’s graph theory and linear algebra that are most needed today. He also described the big data processing pipeline from beginning to end, explaining current trends toward functional programming.