In a recent post, we benchmarked ‘r3’ instances using CoreMark. We have also done disk bandwidth benchmarking on other AWS instance families: the ‘i’, ‘c3’ and ‘m’ families. Today, we are sharing the same disk benchmarking for ‘r3’ instance types using FIO. ‘R3’ instances are the new generation of memory-optimized instances offered by AWS. You can get more details about this instance family here. We used FIO to benchmark the instance store disks, which in this case are SSD drives. The benchmarking methodology is described here. We ensured that the load was spread evenly across all the drives.

The following table shows the results of the disk I/O benchmarking. We also compared the disk I/O bandwidth for specific I/O operation types in the second table. All units are in KBytes/sec. You can see that when it comes to random I/O, r3.2xlarge and r3.4xlarge offer nearly identical bandwidth.

We hope these benchmarking results help you make the right choices. Let’s continue the conversation. We’d like to hear about some of your findings. Send me your inquiries at email@example.com.
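For readers who want to run something similar themselves, here is a minimal sketch of the kind of FIO run we drive from a small script. The device names, block size, queue depth and runtime are assumptions chosen for illustration, not the exact parameters of our benchmark (those are covered in the methodology post linked above).

```python
import json
import subprocess

# Hypothetical instance-store devices; on an r3 instance the ephemeral SSDs
# usually appear as /dev/xvdb, /dev/xvdc, ... -- adjust to your setup.
# WARNING: running fio write patterns against a raw device destroys its data.
DEVICES = ["/dev/xvdb", "/dev/xvdc"]

def run_fio(device, pattern, runtime_sec=60):
    """Run one fio job against a raw device and return bandwidth in KiB/s."""
    name = f"{pattern}-{device.rsplit('/', 1)[-1]}"
    cmd = [
        "fio",
        f"--name={name}",
        f"--filename={device}",
        f"--rw={pattern}",        # read, write, randread or randwrite
        "--bs=4k",
        "--direct=1",             # bypass the page cache
        "--ioengine=libaio",
        "--iodepth=32",
        f"--runtime={runtime_sec}",
        "--time_based",
        "--output-format=json",
    ]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    job = json.loads(out)["jobs"][0]
    side = "read" if "read" in pattern else "write"
    return job[side]["bw"]        # fio reports bandwidth in KiB/s

if __name__ == "__main__":
    for dev in DEVICES:
        for pattern in ("read", "write", "randread", "randwrite"):
            print(dev, pattern, run_fio(dev, pattern), "KiB/s")
```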
A lot of the conversation on DevOps is focused on choosing the right tools to accomplish our goals. The popularity of a tool is more than an indication of interest: many of the popular DevOps tools are open source, and the long-term viability of an open source project depends on the number of users it has and on how easy it is to support. While emphasizing collaboration and integration, DevOps also looks to automation tools that can leverage an increasingly programmable and dynamic infrastructure from a lifecycle perspective. Version control and automated code deployment are two of the most impactful common tools, but there are many more, including configuration management, containerization and IaaS. In this post, we explore trends for some of the more common DevOps tools over the past four years, starting in 2010. The following table is a summary of the DevOps tools we will be talking about in this post.

Source: Google Trends
Location: Worldwide
Category: Computers & Electronics
Date Range: January 2010 – June 2014

Configuration Management
As you can see, Chef and Puppet have shown a fairly steady climb during the past four years compared to other configuration management tools, with Chef having the upper hand between the two. Ansible, a newcomer to DevOps, is starting to gain popularity. Juju, while a very interesting tool, hasn’t really gained any. It has an event-based model that provides node bring-up, discovery, configuration and maintenance, along with a GUI display of the environment. However, it has many PaaS-like restrictions in terms of flexibility, which is probably why it has never picked up.

Containerization Tools
Check out the explosion of Docker compared to everyone else! In March 2013, Docker hit the market and was introduced to a bevy of developers.
‘R3’ instances are the memory-optimized instance family offered by AWS. You can get more details about this instance family here. In our previous benchmarking posts, we benchmarked the ‘i2’, ‘m3’ and ‘c3’ instance families. In this post, we are sharing the results of CPU benchmarking for ‘r3’ instances. As in previous posts, we used CoreMark for CPU benchmarking.

The following table shows the CoreMark score obtained for each of the ‘r3’ instance types, followed by the same data as a bar chart. We also calculated the CoreMark score per dollar of hourly rent and found that there is not much difference across the ‘r3’ instance family.

At Flux7, there’s a lot more benchmarking that we do. To learn more about our benchmarking activities, visit our website here.
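As a quick illustration of the price-performance metric, here is a trivial sketch of how the score-per-dollar figure is derived. The scores and prices are deliberately left as placeholders; plug in your own measured CoreMark results and the current on-demand prices for your region.

```python
def coremark_per_dollar(score, hourly_price_usd):
    """CoreMark iterations/sec obtained per dollar of hourly rent."""
    return score / hourly_price_usd

# Placeholders only -- fill in measured scores and current on-demand prices.
measurements = {
    "r3.large":   {"score": None, "price": None},
    "r3.xlarge":  {"score": None, "price": None},
    "r3.2xlarge": {"score": None, "price": None},
    "r3.4xlarge": {"score": None, "price": None},
    "r3.8xlarge": {"score": None, "price": None},
}

for itype, m in measurements.items():
    if m["score"] and m["price"]:
        ratio = coremark_per_dollar(m["score"], m["price"])
        print(f"{itype}: {ratio:,.0f} CoreMark per $/hour")
```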
Ansible is an IT automation tool that provides continuous deployment capabilities and zero-downtime rolling updates. Its simplicity, agentless architecture and scalability are what make it stand out. In this post, let’s discuss some common terms used in Ansible to get a better grasp of the tool.

Playbooks
Written in a simple, human-readable text format, playbooks form the basis of Ansible’s configuration, deployment and orchestration language. Some interesting things playbooks can do:
- Describe policies for remote systems.
- Manage configurations and deployments of remote systems.
- Perform zero-downtime rolling updates.
- Interact with monitoring servers and load balancers.
A playbook is a compilation of one or more plays. Each play in turn maps a set of tasks to a group of hosts, and tasks are typically calls to Ansible modules.

Modules
Ansible provides a module library, along with support for user-defined modules, to control system resources. All modules accept arguments, most of them in key=value form, and data returned from modules is typically in JSON format. Modules are idempotent, meaning a change is made only when the system is not already in the desired state. The Ansible module library covers a wide variety of areas, including cloud, files, database, network, notification, packaging, system and web infrastructure modules.

Roles
Roles help improve the organization of playbooks. They are little more than automation built around include directives, packaged as redistributable units that can be shared among playbooks.

Patterns
Patterns control host management: they decide which hosts to communicate with and which configuration to apply to them. Patterns can target all hosts, a specific host, hosts by name, specific groups, and so on.

Facts
Facts are automatically discovered information about remote nodes.
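To make these terms concrete, here is a minimal sketch that writes out a one-play playbook and runs it with ansible-playbook. The "webservers" group, the hosts.ini inventory file and the task itself are assumptions chosen for illustration.

```python
import subprocess

# A one-play playbook written out from Python just to keep the example
# self-contained. The play targets the "webservers" pattern and calls the
# idempotent "file" module, so repeated runs make no further changes.
playbook = """\
---
- hosts: webservers              # pattern: which hosts this play applies to
  tasks:
    - name: ensure a scratch directory exists
      file:                      # module call with key/value arguments
        path: /tmp/flux7-demo
        state: directory
"""

with open("site.yml", "w") as f:
    f.write(playbook)

# Run the play against an inventory that defines the "webservers" group.
subprocess.run(["ansible-playbook", "-i", "hosts.ini", "site.yml"], check=True)
```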
Today, we explore the performance of wikibench on ‘c3’ instances. The methodology is similar to the one we used on ‘m3’ instances in this post here. In testing ‘c3’ instances, we used a paravirtual image for all our experiments, with Ubuntu 12.04 LTS (64-bit) as the OS. Another major difference from the ‘m3’ setup is that we used the SSD instance-store drive for MySQL DB storage; if you recall, we had used EBS storage for the ‘m3’ instances.

In logging our performance comparisons, we used the following metrics.

Missed Deadlines
Due to delayed response times from the server at lower sampling rates, wikibench is not able to replay all the requests in the trace file, which results in missed deadlines. Here is a table of missed deadlines for each of the instance types.

Request Timeouts
We did not record any request timeouts, except for ‘c3.large’ when the request rate was 46239 requests per hour. This is a significant improvement over the ‘m3’ instances.

Average Response Times
As we did for the ‘m3’ instances, we calculated an average response time by first averaging over one-minute periods and then averaging the resulting values. Here is a table of average response times, followed by a chart of the inflection points for each of the instance types. We see that the inflection points for ‘c3.2xlarge,’ ‘c3.4xlarge’ and ‘c3.8xlarge’ are the same.

CPU Usage
This table shows a breakdown of CPU usage averaged over the total number of CPUs in each instance type. The average is taken over the ~20-hour period that includes requests replayed from all the trace samples.
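For reference, the two-step averaging described above (per-minute averages first, then the average of those) can be sketched as follows; the timestamp/latency pairs are made up purely for illustration.

```python
from collections import defaultdict

def average_response_time(samples):
    """samples: iterable of (unix_timestamp_sec, response_time_ms) pairs.

    Average within each one-minute bucket first, then average the
    per-minute averages, mirroring the procedure described above.
    """
    buckets = defaultdict(list)
    for ts, rt in samples:
        buckets[int(ts // 60)].append(rt)
    minute_averages = [sum(v) / len(v) for v in buckets.values()]
    return sum(minute_averages) / len(minute_averages)

# Toy data: three one-minute buckets averaging 100, 150 and 90 ms.
data = [(0, 120), (30, 80), (61, 200), (90, 100), (130, 90)]
print(average_response_time(data))   # -> 113.33...
```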
Next week, the Flux7 team is attending the first annual DockerCon, a two-day Docker-centric event organized by Docker Inc. By far the most exciting part for us will be presenting our developer productivity workflow with our client Auto.com, a Chicago-based online portal that delivers information and services for finding new and used vehicles, and a subsidiary of Classified Ventures. But as we have done before when attending conferences, I thought I’d share which other sessions we're excited about. Starting with day one, the following are the sessions that got our attention.

For the first set of sessions, my priority is the one about Docker on Cloud Foundry. Docker, in many ways, is positioning itself as a solution that offers a PaaS with the flexibility of IaaS, and AWS already includes Docker in its Beanstalk offering. Cloud Foundry, an open source cross-platform PaaS, is already showing a lot of promise, so I am very interested in learning how Cloud Foundry is working with Docker. In addition, with its own provisioning mechanism, we could potentially use Cloud Foundry to provide an end-to-end code management pipeline from developer workflow through production deployment.

At the same time, another of the talks on the agenda is “Thoughts on Interoperable Containers.” The topic is definitely valuable, and I expect to walk away from it with insights into the future and long-term potential of containers. For now, though, I’m going to keep this session as a backup.

In the next session, I’m planning to attend a discussion led by Facebook entitled “Tupperware: Containerized Deployment at Facebook.” In a developer workflow, you have to work towards setting up an end-to-end pipeline.
Cloud computing is surrounded by a lot of hype, and it has become increasingly important to organizations in recent years. The reason is that cloud computing is leading to a rethinking of the Internet’s capabilities: it is poised to completely delocalize computing power and technology for developers and other users. From easily accessing files and other data to keeping networks and mobile devices in sync, there is a general consensus that cloud computing will become even more important in the future as it continues to gain traction, and recent trends show a steady increase in interest. To get a good basic understanding of the cloud, you must watch this cool video by Stephen Fry.

Scenario 1: Using the cloud to analyze 6 billion records per day of stock market data to create contingency plans.
Remember the famous flash crash of 2010? It was a crash of the U.S. stock market and, at the time, the biggest one-day point decline of the Dow Jones Industrial Average. In less than seven minutes, the Dow Jones was down by roughly 1000 points. The crash was attributed to a technical glitch. Such glitches do no one any good, and they certainly need immediate, reliable solutions. The U.S. Securities and Exchange Commission turned to Tradeworx, a financial technology company that builds infrastructure for running stock markets, to create an analytics platform. The idea was to collect approximately five to six years of data, analyze it, and develop contingency plans for any similar future event. The data we are talking about here is close to one terabyte per day.
In a recent post, we discussed the basics of the wikibench tool and how it is used for application benchmarking. Today, we’ll discuss how we used wikibench to benchmark an m3.xlarge instance. The operating system used for this benchmarking was 64-bit Ubuntu 12.04 (Precise).

We ran wikibench using different sampling rates (reduction in per mille). We used a single trace file for our experiments; typically, a single trace file downloaded from the wikibench website consists of request traces for one hour. We found empirically that m3.large can handle approximately 24,000 requests per hour. Based on this, we sampled the trace file into a number of samples, increasing the request rate from roughly 2K to roughly 45K requests per hour. The following table gives the sampling factor (reduction in per mille) and the number of requests in each sampled trace file. The table also gives the number of missed requests. (Requests are missed when the web server cannot handle the request rate at which the traces are replayed.) Here is a graph of the number of missed requests against the sampling factor.

While we ran the experiments, we also captured CPU load, memory, and disk I/O activity using the collectd tool.
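As an aside, sampling a one-hour trace down to a target request rate is conceptually simple. The sketch below keeps roughly N requests out of every 1,000; note that it assumes one request per line and is not wikibench's own TraceBench tool, whose exact semantics may differ.

```python
import random

def sample_trace(in_path, out_path, keep_per_mille):
    """Write a sampled copy of a trace file, keeping roughly
    keep_per_mille out of every 1,000 requests (one request per line
    is assumed; kept lines are passed through unmodified)."""
    kept = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if random.randint(1, 1000) <= keep_per_mille:
                dst.write(line)
                kept += 1
    return kept

# e.g. keep ~50 out of every 1,000 requests from a one-hour trace
print(sample_trace("trace.txt", "trace_sample_50.txt", 50), "requests kept")
```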
It’s well known that the Agile movement proposes alternatives to traditional project management. You are likely aware that Agile approaches are typically used in software development to help businesses respond to unpredictability, and that this approach greatly reduces both costs and time to market. Because teams can develop software at the same time they’re gathering requirements, requirements gathering is less likely to impede progress. In the end, it can help companies build the right product, instead of committing to market a piece of software that hasn’t even been written yet. In this post, we take an early exploration into Agile programming methodologies by introducing a set of common terms for this approach. Let’s begin with a strict definition of Agile programming.

Agile Programming: Agile refers to iterative and incremental development, wherein a solution is reached through collaborative, self-organized teams. The key is to use short, time-boxed iterations, each of which usually corresponds to a software release. Iterations are fixed length and involve all activities, including requirements analysis, coding and testing, usually at the same time.

XP: Extreme programming focuses on technical practices intended to improve the quality of software through frequent releases, with user acceptance testing at the end of each release. Releases are iterated at short intervals of less than four weeks. The core values of extreme programming are communication, simplicity, feedback and courage. The common roles in this methodology include customer, programmer, coach and tracker.

Pair Programming: An XP practice in which two programmers work together: one acts as the driver writing the code, while the second acts as the observer who reviews the code as it’s written.

User Story: A short description of what the business system does, told from the perspective of the user.
In many of our previous posts, we have talked about micro-benchmarking AWS instances, including benchmarking instances for CPU, disk I/O and network performance. In this post, we discuss our methodology for macro-benchmarking. While micro-benchmarks give you deep insight into low-level performance metrics in isolation, we need application benchmarks to understand the performance of an instance at the application level, especially when CPU, disk I/O and network I/O are all working together in an application.

We used wikibench for application benchmarking. The reasons we chose wikibench are:
- It uses a ‘real’ application: the MediaWiki application, backed by Apache and MySQL. The website is populated with real data from Wikipedia, and the benchmark replays traces of real Wikipedia users.
- Many of our customers host websites, so it is a workload relevant to their requirements.
- It is a distributed application and scales easily.

About Wikibench
To run a benchmark, we need three things:
- A MediaWiki installation with a Wikipedia dump. Instructions on how to download and install MediaWiki are here.
- Wikipedia access traces, which can be downloaded here.
- The wikibench software, which can be downloaded here.
A README.txt comes along with the wikibench package and provides good documentation for using wikibench. Wikibench has three primary parts: wikiloader, tracebench and wikijector. We will now go into more detail on each.

Wikiloader
Wikiloader is used to load the Wikipedia dumps into the MySQL database of a MediaWiki installation. A typical usage of the dumper is the following:
The dumper supports the following parameters:
There is also another tool that loads the dump, which can be obtained from the MediaWiki site. The instructions can be found here.
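To give a feel for what replaying a trace means in practice, here is a toy replayer. It is emphatically not wikijector itself: the (timestamp, path) line format, the path-style URLs and the target host are assumptions made purely for illustration.

```python
import threading
import time
import urllib.request

def fetch(base_url, path):
    """Issue one request and print its latency (or the failure)."""
    try:
        t0 = time.time()
        urllib.request.urlopen(base_url + path, timeout=10)
        print(f"{path} {1000 * (time.time() - t0):.0f} ms")
    except Exception as exc:
        print(f"{path} failed: {exc}")

def replay(trace_lines, base_url):
    """Replay requests at (roughly) their recorded offsets from the start
    of the trace. Each line is assumed to be '<timestamp> <path>'."""
    start = time.time()
    base_ts = None
    for line in trace_lines:
        ts_str, path = line.split()[:2]
        ts = float(ts_str)
        base_ts = ts if base_ts is None else base_ts
        delay = (ts - base_ts) - (time.time() - start)
        if delay > 0:
            time.sleep(delay)
        threading.Thread(target=fetch, args=(base_url, path)).start()

# Two hand-written trace lines against a local test server.
replay(["0.0 /wiki/Main_Page", "1.5 /wiki/Cloud_computing"],
       "http://localhost")
```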
As developers build their applications, they are faced with a dilemma: use an IaaS, which allows more freedom and lower cost but carries high management overhead, or use a PaaS, which simplifies the setup while restricting freedom and increasing costs. The open source PaaS Cloud Foundry offers the best of both worlds and is starting to get traction in the enterprise. Given that, in this blog we will introduce you to Cloud Foundry by way of a glossary that helps define what it can do. We lead with a definition of the topic at hand.

Cloud Foundry: An open Platform-as-a-Service (PaaS) which provides a wide range of cloud, framework and application services.

Application Manifest: A YAML file that describes an application, including the number of instances to create, the amount of memory to allocate, and the services the application will use. Cloud Foundry allows a minimal manifest and even allows application deployment without a manifest at all. Manifest benefits include:
- Consistency
- Reproducibility
- Cross-cloud portability

Database Migration: Changing a database schema due to application development or maintenance. DB migration is done in three ways:
- Migrate once – execute the SQL command directly in the database.
- Migrate occasionally – create a schema migration command or script and execute it while deploying a single instance of the application, then re-deploy the application using the original command.
- Migrate frequently – run an idempotent script that partially automates migrations on the first instance of the application.

Blue-Green Deployment: A release technique that runs two identical production environments, called Blue and Green. At any given time, one of the two environments is live and the other is idle.
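To make blue-green deployment concrete, here is a rough sketch of how the route swap can be driven with the cf CLI. The application names, domain and hostname are assumptions, and error handling and smoke tests between steps are omitted.

```python
import subprocess

def cf(*args):
    """Run a cf CLI command, echoing it first."""
    print("+ cf", " ".join(args))
    subprocess.run(["cf", *args], check=True)

DOMAIN, HOSTNAME = "example.com", "myapp"   # the live production route

def blue_green_deploy(new_app, old_app):
    # 1. Push the new version under its own name; it gets a temporary route.
    cf("push", new_app)
    # 2. Map the production route to the new version -- both now serve traffic.
    cf("map-route", new_app, DOMAIN, "-n", HOSTNAME)
    # 3. Unmap the old version; it goes idle but remains available for rollback.
    cf("unmap-route", old_app, DOMAIN, "-n", HOSTNAME)

blue_green_deploy("myapp-green", "myapp-blue")
```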
In our last Docker tutorial series post, we shared the 15 commands that got us onboard with Docker. That set of commands covered the steps to create images manually: creating images, plus committing, searching, pulling and pushing them. But why opt for the long, tedious way of creating images when it can all be automated? So, let’s automate! Docker offers us an automation solution in the Dockerfile. In this post, we will discuss what a Dockerfile is, what it is capable of doing, and some basic Dockerfile syntax.

Commands for Ease of Automation
A Dockerfile is a script that houses the instructions needed to create an image. The image can then be built from the instructions in the Dockerfile using the docker build command. It simplifies deployment of an image by easing the entire image and container creation process. Dockerfiles support commands in the following syntax:
INSTRUCTION argument
Instructions are not case sensitive; however, they are capitalized by convention. Every Dockerfile must begin with the “FROM” command, which indicates the base image from which the new image will be created and from which all subsequent instructions follow. The “FROM” command can be used any number of times, indicating the creation of multiple images. The syntax is as follows:
FROM <image name>
FROM ubuntu tells us that the new image will be created from the base Ubuntu image. Following the “FROM” command, the Dockerfile offers several other commands that ease automation. The order of the commands in the Dockerfile is the order in which they are executed. Let’s walk through some interesting Dockerfile commands.
1. MAINTAINER: Set an author field for the image using this instruction.
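Before diving into the individual instructions, here is a small end-to-end sketch: it writes a three-line Dockerfile and builds an image from it. The package installed, the image tag and the author line are placeholder choices for illustration.

```python
import subprocess

# A minimal Dockerfile, written out from Python so the example is self-contained.
dockerfile = """\
FROM ubuntu
MAINTAINER Jane Doe <jane@example.com>
RUN apt-get update && apt-get install -y nginx
"""

with open("Dockerfile", "w") as f:
    f.write(dockerfile)

# Build an image from the instructions above; "." is the build context.
subprocess.run(["docker", "build", "-t", "demo/nginx", "."], check=True)
```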