
April 9, 2011

3 simple steps to adopt cloud computing

Cloud computing is now synonymous with flexible provisioning and scale. Read on to find out whether you are taking full advantage of it.

The As Is deployment – lowest adoption cost, reasonable benefits:

Move the server application “as is” to a cloud server. This is essentially a co-located server, at Amazon for example. The provisioning and maintenance of the application is still a self-driven task.

The win is in dynamic, on-demand provisioning, and the ROI is easy to compute. Say your application needs to be available all year round but caters to seasonal demand, and that it costs $400 a month to host it at peak capacity. You would end up paying 12 * 400 = $4800 per annum to keep it up, even though it would be underutilized most of the time. Cloud computing has made changing your compute capacity as easy as setting a reminder in your Outlook calendar. With Amazon or Google, you can log into the admin console and provision additional resources only on the dates you need them. At the end of the month you get billed for the resources you actually consumed.
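A back-of-the-envelope calculation makes the savings concrete. This is a minimal sketch: only the $400 peak figure comes from the text above; the $100 baseline cost and the three-month peak season are made-up numbers.

```python
# Hypothetical figures: $400/month at peak capacity (from the text),
# $100/month at baseline capacity, and a 3-month peak season.
PEAK_COST = 400    # $/month, sized for peak demand
BASE_COST = 100    # $/month, sized for off-peak demand
PEAK_MONTHS = 3

always_peak = 12 * PEAK_COST
on_demand = PEAK_MONTHS * PEAK_COST + (12 - PEAK_MONTHS) * BASE_COST

print(f"Fixed provisioning:     ${always_peak}/year")            # $4800
print(f"On-demand provisioning: ${on_demand}/year")               # $2100
print(f"Savings:                ${always_peak - on_demand}/year") # $2700
```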

The Managed RDBMS deployment – reasonably low adoption cost, reasonable benefits:

Even after moving “as is”, a lot of work has to be done to keep the application available – i.e. a replication strategy and policy to keep the database available. This is still a lot of effort and money. The alternative is a managed RDBMS, where the provider (Amazon or Google) manages the database and worries about keeping the data from being lost. The ROI is much harder to compute here, as the time saved on database administration has to be offset against opportunity costs. Note that some code restructuring (not a lot) is needed to get this going. An example is the Amazon MySQL RDS. At the time of writing, Google is yet to announce the availability of a hosted SQL service.
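The restructuring is often little more than repointing the database connection. A minimal sketch, assuming the PyMySQL driver; the RDS endpoint, credentials and table below are placeholders invented for illustration:

```python
import pymysql  # assuming the PyMySQL driver; any MySQL client works

# Before: self-managed MySQL on your own server.
# conn = pymysql.connect(host="localhost", user="app",
#                        password="secret", database="orders")

# After: Amazon RDS manages the server. Only the endpoint changes;
# the hostname below is a made-up example, not a real instance.
conn = pymysql.connect(
    host="mydb.abc123xyz.us-east-1.rds.amazonaws.com",
    user="app",
    password="secret",
    database="orders",
)

with conn.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM orders")
    print(cur.fetchone())
```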

The Application Rewrite – highest adoption cost, highest benefits (arguably):

If your goal is an application that scales very well, you should consider a complete rewrite to take advantage of the provider’s storage APIs. A hosted RDBMS is still a single machine (or a cluster) running a database server – with bottlenecks, be it memory, CPU, network or disk.

Cloud computing offers storage APIs for accessing and managing data that are unlike traditional file or RDBMS storage. Because of the underlying architectural differences, a cloud datastore offers better scalability – http://labs.google.com/papers/bigtable.html.
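To give a feel for what such a storage API looks like, here is a minimal sketch using the Google App Engine Python datastore, which is built on Bigtable. The Greeting model and its fields are invented for illustration, not taken from the post:

```python
from google.appengine.ext import db

# Entities live in Bigtable rather than in rows of an RDBMS;
# there is no database server to provision or replicate.
class Greeting(db.Model):
    author = db.StringProperty()
    content = db.TextProperty()
    date = db.DateTimeProperty(auto_now_add=True)

# Writes go through the datastore API, not SQL INSERTs.
Greeting(author="alice", content="hello, cloud").put()

# Queries are expressed against the model; the platform manages
# the indexes, which scale with the data.
recent = Greeting.all().order("-date").fetch(10)
for g in recent:
    print("%s: %s" % (g.author, g.content))
```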

April 8, 2011

Performance Engineering – SSD file systems

Solid state devices threaten to challenge and change existing computing paradigms.

While traditional disk access times are on the order of a few milliseconds, SSD access times are under 100 microseconds for both reads and writes. That is a speedup of roughly a factor of 100 – and it is significant.

Operating system components have evolved over the last four decades at a much slower pace. For an enterprise platform like IBM AIX or HP-UX, it takes a few years (my guess is a minimum of six) to push a new technology through. The cycle is as follows: a new hardware technology is invented; OS vendors take a few years to adopt and evangelize it; enterprise customers take even longer to test, adopt and deploy.

SSDs promise to deliver better performance by lowering IO latency and increasing throughput. File systems have evolved toward the same goal: specialized caches have been invented to speed up performance – for example the directory name lookup cache, page cache, inode cache, large directory cache, buffer cache and so on.
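To illustrate what these caches do, here is a toy LRU lookup cache in the spirit of the directory name lookup cache (DNLC). The interface is invented for illustration and not taken from any real kernel:

```python
from collections import OrderedDict

class NameLookupCache:
    """Toy LRU cache mapping path names to inode numbers,
    in the spirit of a directory name lookup cache (DNLC)."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.entries = OrderedDict()  # path -> inode number

    def lookup(self, path):
        if path not in self.entries:
            return None  # miss: caller must walk the directory on disk
        self.entries.move_to_end(path)  # mark as recently used
        return self.entries[path]

    def insert(self, path, inode):
        self.entries[path] = inode
        self.entries.move_to_end(path)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
```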

Quite a lot of focus goes into being clever with reads and writes of application data, and engineers go to great lengths to squeeze the last bit of performance out of the system. Sadly, performance is not a main consideration during implementation (functionality is) and is often applied as an afterthought.

The result is hacks rather than elegant solutions to performance issues.

Coming back to SSDs: a large portion of the file system implementation has to be reexamined, and parts of it thrown away completely. This is especially true when complete file systems are laid out on SSDs. We need to look at how file systems can take advantage of them.

April 7, 2011

Balance File System Caching and Flushing policies

Most file-systems have intelligent caching policies, designed to increase throughput and decrease IO latency, thereby providing faster service times to users and applications. Based on the nature of the workload, appropriate caching policies can be set to achieve maximum cache-hit rates.
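To make “the hit rate depends on the workload” concrete, here is a small simulation comparing LRU and FIFO eviction on a skewed access pattern. The workload shape and cache sizes are made-up numbers for illustration:

```python
import random
from collections import OrderedDict, deque

def lru_hit_rate(accesses, capacity):
    cache, hits = OrderedDict(), 0
    for page in accesses:
        if page in cache:
            hits += 1
            cache.move_to_end(page)            # refresh recency on a hit
        else:
            cache[page] = True
            if len(cache) > capacity:
                cache.popitem(last=False)      # evict least recently used
    return hits / len(accesses)

def fifo_hit_rate(accesses, capacity):
    cache, order, hits = set(), deque(), 0
    for page in accesses:
        if page in cache:
            hits += 1                          # hits do not refresh FIFO order
        else:
            cache.add(page)
            order.append(page)
            if len(cache) > capacity:
                cache.discard(order.popleft()) # evict oldest insertion
    return hits / len(accesses)

# Made-up skewed workload: 80% of accesses go to a hot set of 50
# pages; the rest are scattered over 5000 cold pages.
random.seed(1)
accesses = [random.randrange(50) if random.random() < 0.8
            else 50 + random.randrange(5000)
            for _ in range(100_000)]

for capacity in (64, 256, 1024):
    print(capacity,
          round(lru_hit_rate(accesses, capacity), 3),
          round(fifo_hit_rate(accesses, capacity), 3))
```

Once the hot set fits in the cache, LRU approaches the 80% ceiling set by the workload, while FIFO lags because cold pages churn hot ones out.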

However, caching works only as long as there is enough memory to hold revisited pages. Once the number of cached pages exceeds the configured memory limit, the file-system reclaims space by flushing older pages to storage. If flushing is not done frequently enough, a lot of data may suddenly be dumped on the disk, creating a storage bottleneck.

Depending on storage bandwidth, flushing a large amount of file-system data can lead to very long service times for users and applications. For all-round good performance, one therefore also needs to look at how frequently, and how much, data is flushed to storage. Just as caching policies should take the nature of the workload into account, flushing policies should take into account the nature of the storage and its IO bandwidth.
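One common way to balance the two sides is to bound both the size and the age of the dirty set, so that flushes stay small and frequent instead of arriving as one large burst. A toy sketch; the thresholds are arbitrary illustrative values, not from the post:

```python
import time

class WriteBackCache:
    """Toy write-back cache: dirty pages are flushed either when the
    dirty set grows too large or when pages have sat dirty too long."""

    def __init__(self, storage, max_dirty=128, max_age_secs=5.0):
        self.storage = storage          # callable: writes a batch of pages
        self.max_dirty = max_dirty      # flush threshold by volume
        self.max_age = max_age_secs     # flush threshold by staleness
        self.dirty = {}                 # page -> (data, time dirtied)

    def write(self, page, data):
        self.dirty[page] = (data, time.monotonic())
        if len(self.dirty) >= self.max_dirty:
            self.flush()                # small, frequent flushes ...

    def tick(self):
        """Called periodically, e.g. by a flusher daemon."""
        now = time.monotonic()
        if any(now - t >= self.max_age for _, t in self.dirty.values()):
            self.flush()

    def flush(self):
        # ... instead of one huge dump that saturates the disk.
        batch = {page: data for page, (data, _) in self.dirty.items()}
        self.storage(batch)
        self.dirty.clear()
```

Tuning max_dirty against the storage’s sustained write bandwidth keeps each burst below what the disk can absorb.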

Good sustained file-system performance is possible only when both caching and flushing policies are set optimally.