Twister4Azure

Twister4Azure is a distributed decentralized iterative MapReduce runtime for Windows Azure Cloud.
Introduction
Core Characteristics
- Distributed decentralized iterative MapReduce runtime for Windows Azure Cloud
- Extension of MRRoles4Azure
- Optimized for iterative MapReduce applications
- Uses decentralized control model (no master node)
Key Features
- Supports caching of loop-invarient data
- Introduces new merge step (map->reduce->merge)
- Implements cache-aware task scheduling
- Allows dynamic scaling of computing instances
Azure Services Integration
- Azure Queues: Map and reduce task scheduling
- Azure Tables: Metadata & monitoring data storage
- Azure Blob Storage: Input, output and intermediate data storage
- Windows Azure Compute: Worker roles for computations
Advantages
- Takes advantage of cloud service provider guarantees:
- Scalability
- High availability
- Distributed nature
- Avoids:
- Single point of failures
- Bandwidth bottlenecks
- Management overheads
Handling Cloud Challenges
- Addresses higher latencies through coarser grained tasks
- Overcomes availability issues through:
- Retry mechanisms
- System design not dependent on immediate data availability

Publications
Conferences/Workshops
Thilina Gunarathne, Bingjing Zhang, Tak-Lon Wu and Judy Qiu. Scalable Parallel Computing on Clouds Using Twister4Azure Iterative MapReduce, Future Generation Computer Systems(FGCS), Available online 22 June 2012, ISSN 0167-739X, 10.1016/j.future.2012.05.027.(6/2012)
​Gunarathne, T., Wu, T.-L., Choi, J. Y., Bae, S.-H. and Qiu, J. (2011), Cloud computing paradigms for pleasingly parallel biomedical applications. Concurrency and Computation: Practice and Experience. doi: 10.1002/cpe.1780
