Learning about Google's Cloud Services for Big Data
Cloud Services? Big Data? Google? What are we talking about?
Last evening, I finally put the pieces together... at a Google Tech Users Meetup where Patrick Chanezon, Google API evangelist, spoke.
As you know, web companies like Google offer more and more services (sometimes known as applications) right in your browser. Think Search, Mail, Docs, Calendars, etc. In order to offer these applications to large audiences, large (edit: make that immense) data storage capability is needed. Several years ago, Google, along with other companies like Amazon and Microsoft, began building the infrastructure needed to handle big data, so that storage and response capacity can grow and shrink as needed. This distributed capacity to store and serve data has become known as cloud services, and is now offered by many large companies, including Amazon, Google, and Microsoft.
There are several levels of service in cloud computing today, with corresponding acronyms: SAAS, PAAS, and IAAS. Let's look at how they work.
Software as a Service (SAAS) describes the customer-facing, top layer of cloud offerings: complete applications, ready for customers to use. Everything from web-based email services (Gmail, Yahoo Mail, etc.) to Google Docs, Salesforce and MailChimp, not forgetting Google Maps, Analytics and AdWords, plus whole websites like this one in Drupal Gardens, is part of this layer of cloud-based software. More and more, we work and play in our browsers, with the assistance of SAAS. The mobile apps we see sprouting up on many platforms (iOS, Google Android, etc.) are also SAAS.
SAAS depends on a layer of cloud infrastructure, either PAAS or IAAS. So, what is PAAS?
Platform as a Service (PAAS) refers to a service that is preconfigured to harness the power of cloud computing: it is set up for you, but you are also restricted by that set-up configuration. It is sometimes called cloud development in a box.
Google App Engine sits in this layer, along with Microsoft Azure, AWS Elastic Beanstalk, and Acquia Managed Cloud (based on Amazon Web Services). Google App Engine only accepts Java, Go and Python programming, for instance, and is set up to help developers build applications for the web and for Android. Similarly, Acquia Managed Cloud is optimized for hosting Drupal websites and applications. That brings us to IAAS.
Infrastructure as a Service (IAAS) is cloud computing's foundation. This is what we think of when we hear that Google has huge data centers with thousands of servers configured to work together. When you hear that enterprise data centers are closing down, it's because the cost of maintaining corporate/government data centers is becoming higher than the cost of purchasing cloud services, which have other advantages as well. Database limits are largely solved by IAAS.
Both PAAS and IAAS are often referred to as elastic and scalable, because they can meet changing needs, growing when a site/service has high demand, then shrinking when the capacity is no longer needed. Traditional servers are not elastic, and scalability is a major issue that can bring down sites. A recent example of a site that relied on cloud services to stay "up" is the British royal wedding site.
IAAS products offer data storage that you, the purchaser, configure to your own specifications. Amazon S3, Google Storage, Rackspace Cloud, and Joyent are some of the best-known examples.
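To make "elastic" a little more concrete, here is a minimal Python sketch of the idea of growing and shrinking with demand. It is only an illustration: the function name, the per-server capacity of 100 requests per second, and the traffic numbers are all made up for this example and do not describe how any particular cloud provider actually scales.

```python
import math


def servers_needed(requests_per_second, capacity_per_server=100, min_servers=1):
    """Return how many servers a hypothetical elastic service would run.

    capacity_per_server and min_servers are assumptions for this sketch.
    """
    return max(min_servers, math.ceil(requests_per_second / capacity_per_server))


# Made-up traffic samples: a quiet period, a big spike, then quiet again.
traffic = [50, 120, 900, 4000, 650, 80]

for load in traffic:
    print(f"{load} requests/sec -> {servers_needed(load)} server(s)")
```

A traditional, fixed set of servers would have to be sized for the spike all the time, or fall over when it arrives. The elastic version only pays for the larger number while the demand lasts.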
Google Code Labs
Some Google Code Labs projects perfectly complement the Google App Engine service: Storage for Developers, Prediction API, BigQuery, and Fusion Tables. They are pre-beta offerings that are interesting to explore, and some are very close to being offered commercially.
Storage for Developers is for storing massive amounts of data on Google's cloud, in any format and of any type, with individual objects of 100 GB or more.
Prediction API is a simplified version of machine learning. First you upload large CSV files to Google Storage. Next you train the system to identify relevant features in the data. Then you can use it to make predictions based on new data sets.
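The upload, train, predict sequence can be sketched in a few lines of plain Python. To be clear, this is not the Prediction API and makes no calls to Google: the tiny CSV, the `train` and `predict` functions, and the word-counting approach are all stand-ins invented for this illustration, assuming training rows of the form "label,text".

```python
import csv
import io
from collections import Counter, defaultdict

# Made-up training data: the first column is the label, the second is the text.
TRAINING_CSV = """english,the cat sat on the mat
english,where is the station
french,le chat est sur le tapis
french,ou est la gare
"""


def train(csv_text):
    """Count how often each word appears under each label."""
    word_counts = defaultdict(Counter)
    for label, text in csv.reader(io.StringIO(csv_text)):
        word_counts[label].update(text.lower().split())
    return word_counts


def predict(model, text):
    """Pick the label whose training words overlap most with the new text."""
    words = text.lower().split()
    scores = {
        label: sum(counts[word] for word in words)
        for label, counts in model.items()
    }
    return max(scores, key=scores.get)


model = train(TRAINING_CSV)
print(predict(model, "the station is near"))
print(predict(model, "le chat est la"))
```

A real machine learning service does something far more sophisticated in the training step, but the shape of the workflow is the same: labeled data goes in, a model comes out, and the model is then asked about data it has not seen.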
BigQuery allows you to analyze massive amounts of data in seconds, through simple, SQL-like data queries.
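To give a feel for what a simple, SQL-like query looks like, here is a small sketch that uses Python's built-in `sqlite3` module as a local stand-in. It does not use BigQuery at all, and BigQuery's own query dialect may differ. The `page_views` table and its numbers are invented for the example.

```python
import sqlite3

# An in-memory database stands in for the hosted data set.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE page_views (country TEXT, views INTEGER)")
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [("US", 120), ("FR", 45), ("US", 80), ("UK", 70), ("FR", 15)],
)

# Total views per country, largest first.
query = """
SELECT country, SUM(views) AS total_views
FROM page_views
GROUP BY country
ORDER BY total_views DESC
"""

rows = conn.execute(query).fetchall()
for country, total in rows:
    print(country, total)
```

The appeal of a service like BigQuery is that a query of about this complexity is meant to run over massive data sets rather than five rows, without you managing the servers behind it.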
Fusion Tables gives you a spreadsheet-like data interface, access control, a built-in set of visualizations (charts) that can be built using your data, and integration with Google Maps through geographical queries. The results are amazing to see.
If innovation isn't Google's middle name, it surely must be its first!