Throughout the history of our blog, we have shared many posts about benchmarking, such as explaining how to set up and use sysbench for MySQL benchmarking. You can search for “benchmarking” in the upper right-hand corner to find all of them. Today, we are continuing to add to the library! In this post, we are sharing our experiences using sysbench for MySQL benchmarking. To start, let’s look at the setup we used.

Setup

Machine: AWS m3.large instance (64-bit, paravirtual)
Storage: 32 GB SSD instance store
OS: Ubuntu 14.04 LTS (3.13.0-24-generic)
MySQL Version: 5.5.35
Sysbench Version: 0.4.12

We used four different table sizes for our benchmarking, ranging from 50,000 to 50,000,000 rows, with each table 10 times larger than the previous one. Initially, the benchmark was run without applying any optimization, using the default “my.cnf.” We then applied several MySQL optimizations based on best practices recommended in the MySQL documentation and ran the benchmark again.

Optimizations

We applied the following optimizations to the MySQL configuration file “my.cnf” (/etc/mysql/my.cnf). A short description of each system variable is given below.

Caches and Limits

max_heap_table_size → The maximum size for in-memory temporary tables is the minimum of the tmp_table_size and max_heap_table_size values. The default value of max_heap_table_size is 16M; we set it to 32M so that it equals tmp_table_size.
query_cache_size → We increased query_cache_size so that query results are cached to some extent.
thread_cache_size → For benchmarking purposes we do not strictly need this variable, as there will only be one connection; we included it to make sure it does not affect performance.
open_files_limit → Increase the open files limit.
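Putting the variables above together, a my.cnf fragment along these lines captures the changes. Note that the specific values for query_cache_size, thread_cache_size, and open_files_limit are illustrative assumptions; the post names the variables but only fixes the 32M temp-table values.

```ini
[mysqld]
# In-memory temp table ceiling is min(tmp_table_size, max_heap_table_size),
# so keep the two equal (32M per the discussion above)
tmp_table_size      = 32M
max_heap_table_size = 32M

# Cache query results to some extent (size is an assumption)
query_cache_size    = 64M

# Keep threads cached between connections (value is an assumption)
thread_cache_size   = 8

# Raise the open files limit (value is an assumption)
open_files_limit    = 65535
```

After editing /etc/mysql/my.cnf, restart MySQL for the settings to take effect.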
Figuring out [MAIN CHALLENGE] isn’t easy. But once you figure out the basics, you’ve opened the doors to tremendous opportunities for growth and learning. That’s why we’ve covered the fundamentals in [LINK TO EBOOK]. This blog post offers a quick overview by examining the five basic ways to get started.

1. First Way to Get Better: Start by embracing the lowest-hanging fruit.
2. Second Way to Get Better: Then, find a way to move up a notch.
3. Third Way to Get Better: Ask your peers and managers for advice.
4. Fourth Way to Get Better: Do research using Google Trends and other helpful tools to reveal patterns in behavior.
5. Fifth Way to Get Better: Now, tie this back to a life lesson that everyone takes for granted but doesn’t consider much in everyday life.
A couple of weeks ago, we attended DockerCon, the inaugural Docker-centric conference for developers, and anyone else with an interest in the open platform for building, shipping, and running distributed applications, whether on laptops, in data center VMs, or in the cloud. We were there not only as a founding System Integration partner but also as a presenter.
Current industry surveys indicate that more than 60% of businesses use cloud computing for their IT operations. Among the various cloud service providers, Amazon Web Services (AWS) is a pioneer and continues to be a leader in the cloud market.
We published a post a few months ago titled “Must-know Facts About AWS ELB” in which we explored some of the peculiarities of Amazon’s Elastic Load Balancer. We thought we’d go a bit deeper into the details of what the ELB is to better understand its limitations and appreciate the engineering behind it. So, what are the requirements for the ELB?
Last week we explored how business goals should inform every good DevOps strategy. This week we’ll discuss how to use those goals to validate your DevOps architecture. From our experience at Flux7, the best way to do this is to define the workflows of key users.
Introduction to OpenStack

The OpenStack project is an open-source cloud-computing platform for private, public, and hybrid clouds that’s simple to implement, massively scalable, and feature-rich. OpenStack provides an Infrastructure as a Service (IaaS) solution through a set of interrelated services.

OpenStack was started in 2010 as a joint venture between Rackspace Hosting and the National Aeronautics and Space Administration (NASA). Today more than 200 companies have joined the project, including AMD, Canonical, Cisco, Dell, EMC, Ericsson, Groupe Bull, HP, IBM, Inktank, Intel, NEC, Red Hat, SUSE Linux, VMware, and Yahoo! The project is now managed by the OpenStack Foundation, a non-profit corporate entity established in September 2012 to promote OpenStack software and its community. The community collaborates around a six-month, time-based release cycle with frequent development milestones.

OpenStack Releases:

The following is a conceptual architecture diagram showing the relationships between OpenStack services:

Overview of OpenStack Services

Keystone
The OpenStack Identity Service (Keystone) provides authentication and authorization for users and also manages service catalogs. It’s equivalent to AWS Identity and Access Management (IAM).

Glance
The OpenStack Image Service (Glance) stores and manages virtual machine images in different formats. These images are used by the Compute service to provision instances. It’s comparable to an AWS AMI (Amazon Machine Image).

Cinder
The OpenStack Block Storage Service (Cinder) provides persistent block storage to guest virtual machines for expanded storage, better performance, and integration with enterprise storage platforms. It’s similar to AWS EBS (Elastic Block Store).

Neutron
The OpenStack Network Service (Neutron) provides network connectivity as a service for interface devices managed by other OpenStack services, such as Compute. It enables users to create and attach interfaces to networks. It corresponds to AWS networking.
Nova
The OpenStack Compute Service (Nova) provisions instances on user demand. It supports most virtualization technologies.
An organization moving to the cloud truly understands the cloud’s benefits only when setting up good DevOps methodologies and cloud automation to meet its needs. The process is replete with tool choices at every stage, and the overall goal is to understand and meet the organization’s needs.
While advising a client with a strong interest in ARM servers, we decided to evaluate the computational overhead of various big data technologies, which led to some interesting discoveries. Since we in the field are all trying to figure out how big data technology will evolve, Flux7 Labs thought we’d share some of what we’ve learned.

The Question

The question we tried to answer was, “Is Hadoop a good candidate for microserver workloads?” Big Data workloads, with their high reliance on memory and network, are often touted as perfect candidates for moving to microservers. Among Big Data workloads, Hadoop has become the poster child, and microserver vendors are keenly interested in seeing Hadoop ported over to their platforms.

Our theory was that Hadoop is a very high-overhead application because it was designed to utilize excess CPU capacity to conserve disk and network bandwidth. This would make vanilla Hadoop, without tuning and rearchitecting, unsuitable for microservers due to their limited CPU horsepower and excess disk and network capacity. To test our theory, we conducted a brief experiment comparing application performance when running under Hadoop and natively.

The Experiment

Our experiment was based on a project previously coded by our CTO, Ali Hussain, for an assignment in the Coursera course “Introduction to Data Science” (https://www.coursera.org/course/datasci). Ali’s project was a compute-heavy application for performing sentiment analysis on a Twitter stream. You can find his code at https://github.com/Flux7Labs/twitter-fun. The code classifies tweets as positive, negative, or neutral based on their content. Most of the work is performed by parsing text and looking up the words in a dictionary in order to classify them. We rewrote the original Python program in Java.
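The core of that workload, parsing text and scoring words against a sentiment dictionary, can be sketched roughly as follows. This is a minimal illustration, not Ali’s actual code; the dictionary entries and scores here are assumptions (real projects typically use a lexicon such as AFINN with thousands of scored words).

```python
import re

# Illustrative word scores; a real lexicon would be far larger.
SENTIMENT = {
    "love": 3, "great": 3, "good": 2, "happy": 2,
    "bad": -2, "sad": -2, "hate": -3, "terrible": -3,
}

def classify(tweet: str) -> str:
    """Classify a tweet as positive, negative, or neutral.

    Most of the work is tokenizing the text and looking each word
    up in the dictionary, mirroring the workload described above.
    """
    words = re.findall(r"[a-z']+", tweet.lower())
    score = sum(SENTIMENT.get(word, 0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify("I love this great coffee"))  # positive
print(classify("what a terrible, sad day"))  # negative
```

Because each tweet is scored independently, this kind of dictionary lookup parallelizes naturally, which is exactly why it maps so easily onto both a native implementation and a Hadoop job.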
Based on incoming application traffic, Elastic Load Balancing automatically distributes traffic across EC2 instances. Elastic Load Balancing detects unhealthy instances and routes traffic to healthy instances until the unhealthy instances are restored.

Elastic Load Balancing is used for:
1. Better fault tolerance
2. DNS failover
3. Auto Scaling
4. Easy creation of an entry point for a VPC (Virtual Private Cloud)

Elastic Load Balancing Features:
1. Incoming traffic can be distributed across Amazon EC2 instances in a single Availability Zone or multiple zones.
2. Helps with the creation of security groups when used in a Virtual Private Cloud (VPC).
3. Detects the health of Amazon EC2 instances. For example, when an unhealthy instance is detected, Elastic Load Balancing makes sure not to route traffic to it.
4. Supports sticking user sessions to specific EC2 instances.
5. Supports both Internet Protocol versions 4 and 6 (IPv4 and IPv6).
6. Supports SSL termination.
7. Elastic Load Balancing metrics, such as request count and latency, are reported by Amazon CloudWatch.

Cost: The cost depends on the usage of an Elastic Load Balancer. The charge is on an hourly basis plus a fee for each GB transferred through the Elastic Load Balancer.
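Because the pricing model is simply hourly plus per-GB, a back-of-the-envelope estimate is easy to script. The rates below are illustrative placeholders, not actual AWS prices; always check the current AWS pricing page before budgeting.

```python
# Back-of-the-envelope ELB cost estimate: hourly charge + per-GB charge.
# HOURLY_RATE and PER_GB_RATE are assumed placeholder values,
# not real AWS prices.

HOURLY_RATE = 0.025  # USD per ELB-hour (assumption)
PER_GB_RATE = 0.008  # USD per GB processed (assumption)

def monthly_elb_cost(hours: float, gb_transferred: float) -> float:
    """Estimate the monthly cost of one Elastic Load Balancer."""
    return hours * HOURLY_RATE + gb_transferred * PER_GB_RATE

# A balancer running all month (730 hours) pushing 500 GB:
print(round(monthly_elb_cost(730, 500), 2))  # 22.25
```

The takeaway is that an idle balancer still accrues the hourly charge, so the fixed hourly component usually dominates for low-traffic applications.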
“Big Data” is a term that has been buzzing around a lot for the last few years, and when you hear this buzz, you’ll hear “Hadoop” as well. In the last 2-3 years, many big players in the industry have come up with their own distributions of Apache Hadoop, be it Intel, Microsoft, IBM, or EMC. Also, some startups focusing only on Hadoop, such as Cloudera and Hortonworks, have become big players in this area. Each Hadoop distributor claims its distribution is the best one out there. Each distribution has some unique features that may be useful to one set of users and not to another. It can become non-trivial to choose the distributor matching your requirements from among so many, especially when you are spending money on purchasing a distribution and support.

Update: The free white paper comparing the Hadoop distributions is ready for download! Click here or check the resources section on the sidebar to download the white paper for free.

User Bases: There are multiple user bases that may need to deploy Hadoop. Some of them are listed below:
1. Higher management in a company willing to move to Big Data solutions using Hadoop.
2. A developer building a tool in the Hadoop ecosystem.
3. A newbie learning Hadoop and looking for a temporary/non-serious Hadoop deployment.

Keeping these things in mind, we have completed a thorough study of the following distributions, which will be covered in a 6-part series:
1. Intel Distribution for Apache Hadoop
2. Cloudera Distribution Including Apache Hadoop
3. Hortonworks Data Platform
4. MapR

Through this series, we’ll share our experience with each of these distributors and provide subjective as well as objective results of the feature/performance comparisons we did. This will help you shortlist distributors based on your requirements.