Cloud Computing CS

Cloud Computing CS 15-319 Pregel Lecture 10, Feb 15, 2012 Majd F. Sakr, Suhail Rehman and Mohammad Hammoud

Today
- Last session: Apache Mahout, Guest Lecture
- Today's session: Pregel
- Announcement: Project Phases I-A and I-B are due today

Objectives
Discussion on Programming Models:
- MapReduce
- Pregel, Dryad and GraphLab
- Why parallelism?
- Parallel computer architectures
- Traditional models of parallel programming
- Examples of parallel processing
- Message Passing Interface (MPI)
[Slide annotation: Last 3 Sessions]

Pregel

Pregel
In this part, the following concepts of Pregel will be described:
- Motivation for Pregel
- The Pregel Computation Model
- The Pregel API
- Execution of a Pregel Program
- Fault Tolerance in Pregel

Motivation for Pregel
How to implement algorithms to process large graphs?
- Create a custom distributed infrastructure for each new algorithm: difficult!
- Rely on existing distributed computing platforms such as MapReduce: inefficient and cumbersome!
- Use a single-computer graph algorithm library like BGL, LEDA, NetworkX, etc.: large graphs are too large to fit on a single machine!
- Use a parallel graph processing system like Parallel BGL or CGMGraph: not suited for large-scale distributed systems!

Pregel
Pregel is a framework developed by Google. It provides:
- High scalability
- Fault tolerance
- Flexibility in expressing arbitrary graph algorithms
Pregel is inspired by Valiant's Bulk Synchronous Parallel (BSP) model.

Bulk Synchronous Parallel Model
[Figure: in each iteration, CPU 1, CPU 2 and CPU 3 compute independently; a barrier ends every iteration before the next one starts.]

Entities and Supersteps
- The computation is described in terms of vertices, edges and a sequence of iterations called supersteps
- You give Pregel a directed graph consisting of vertices and edges
  - Each vertex is associated with a modifiable user-defined value
  - Each edge is associated with a source vertex, a value and a destination vertex
- During a superstep S:
  - A user-defined function F is executed at each vertex V
  - F can read messages sent to V in superstep S-1 and send messages to other vertices that will be received at superstep S+1
  - F can modify the state of V and its outgoing edges
  - F can change the topology of the graph

Algorithm Termination
- Algorithm termination is based on every vertex voting to halt
- In superstep 0, every vertex is active
- All active vertices participate in the computation of any given superstep
- A vertex deactivates itself by voting to halt and enters an inactive state
- A vertex can return to the active state if it receives an external message
- The program terminates when all vertices are simultaneously inactive and there are no messages in transit
[Figure: vertex state machine. Active moves to Inactive on "vote to halt"; Inactive moves back to Active on "message received".]
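The state machine and termination rule above can be sketched in a few lines of C++. This is only an illustration: `VertexStatus`, `vote_to_halt`, `deliver_message`, `consume_messages` and `should_terminate` are hypothetical names, not part of the Pregel API.

```cpp
#include <cassert>
#include <vector>

// Hypothetical names; this only mirrors the vertex state machine from the slide.
enum class VertexState { Active, Inactive };

struct VertexStatus {
    VertexState state = VertexState::Active;  // every vertex is active in superstep 0
    int pending_messages = 0;                 // messages waiting to be read
};

// Voting to halt moves a vertex to the inactive state.
inline void vote_to_halt(VertexStatus& v) { v.state = VertexState::Inactive; }

// Receiving a message reactivates the vertex.
inline void deliver_message(VertexStatus& v) {
    ++v.pending_messages;
    v.state = VertexState::Active;
}

// Called once the vertex has read its inbox in the current superstep.
inline void consume_messages(VertexStatus& v) {
    assert(v.pending_messages >= 0);
    v.pending_messages = 0;
}

// The program terminates when every vertex is inactive and nothing is in transit.
inline bool should_terminate(const std::vector<VertexStatus>& vertices) {
    for (const VertexStatus& v : vertices) {
        if (v.state == VertexState::Active || v.pending_messages > 0) return false;
    }
    return true;
}
```

A vertex that has voted to halt stays out of the computation until `deliver_message` is called on it, which matches the "message received" transition.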

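Finding the Max Value in a Graph
[Figure: a four-vertex graph with initial values 3, 6, 2, 1. The values after each following superstep are 6, 6, 2, 6; then 6, 6, 6, 6; then 6, 6, 6, 6. Blue arrows are messages; blue vertices have voted to halt.]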
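The max-value example can be reproduced with a small single-process simulation. This is a sketch under assumptions: the edge set (A <-> B, B -> D, C <-> D) is not preserved in this transcript and was chosen so that the values evolve as in the figure, and `SimVertex`, `run_max_value` and `example_graph` are hypothetical names rather than Pregel API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Single-process simulation; all names here are hypothetical.
struct SimVertex {
    double value;
    std::vector<std::size_t> out_edges;  // indices of destination vertices
    bool active;                         // every vertex starts active
    SimVertex(double v, std::vector<std::size_t> edges)
        : value(v), out_edges(std::move(edges)), active(true) {}
};

// Runs supersteps until all vertices are inactive and no messages are in transit.
// Returns the number of supersteps executed.
inline int run_max_value(std::vector<SimVertex>& graph) {
    std::vector<std::vector<double>> inbox(graph.size());
    int superstep = 0;
    for (;; ++superstep) {
        std::vector<std::vector<double>> outbox(graph.size());
        bool any_ran = false;
        for (std::size_t i = 0; i < graph.size(); ++i) {
            SimVertex& v = graph[i];
            if (!inbox[i].empty()) v.active = true;  // a message reactivates a vertex
            if (!v.active) continue;
            any_ran = true;
            double best = v.value;
            for (double m : inbox[i]) best = std::max(best, m);
            if (superstep == 0 || best > v.value) {
                v.value = best;
                for (std::size_t dst : v.out_edges) {
                    assert(dst < graph.size());
                    outbox[dst].push_back(best);  // visible only in the next superstep
                }
            } else {
                v.active = false;  // vote to halt
            }
        }
        if (!any_ran) break;
        inbox.swap(outbox);  // barrier between supersteps
    }
    return superstep;
}

// Assumed topology: A <-> B, B -> D, C <-> D, with initial values 3, 6, 2, 1.
inline std::vector<SimVertex> example_graph() {
    return {
        {3.0, {1}},     // A -> B
        {6.0, {0, 3}},  // B -> A, D
        {2.0, {3}},     // C -> D
        {1.0, {2}},     // D -> C
    };
}
```

Calling `run_max_value` on `example_graph()` leaves every vertex holding 6 after four supersteps (0 through 3), with the intermediate values 6, 6, 2, 6 after superstep 1.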

The Pregel API in C++
A Pregel program is written by subclassing the Vertex class:

    // The template parameters define the types for vertices, edges and messages.
    template <typename VertexValue, typename EdgeValue, typename MessageValue>
    class Vertex {
     public:
      // Override Compute to define the computation at each superstep.
      virtual void Compute(MessageIterator* msgs) = 0;

      const string& vertex_id() const;
      int64 superstep() const;

      // Get the value of the current vertex.
      const VertexValue& GetValue();
      // Modify the value of the vertex.
      VertexValue* MutableValue();
      OutEdgeIterator GetOutEdgeIterator();

      // Pass messages to other vertices.
      void SendMessageTo(const string& dest_vertex, const MessageValue& message);
      void VoteToHalt();
    };

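Pregel Code for Finding the Max Value

    class MaxFindVertex : public Vertex<double, void, double> {
     public:
      virtual void Compute(MessageIterator* msgs) {
        // Start from this vertex's current value.
        double currmax = GetValue();
        // Fold in every value received from the previous superstep.
        for (; !msgs->done(); msgs->next()) {
          if (msgs->value() > currmax) currmax = msgs->value();
        }
        if (superstep() == 0 || currmax > GetValue()) {
          // First superstep, or a larger value was learned: store and propagate it.
          *MutableValue() = currmax;
          SendMessageToAllNeighbors(currmax);
        } else {
          // Nothing new to report: stay inactive until a message arrives.
          VoteToHalt();
        }
      }
    };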

Message Passing, Combiners, and Aggregators
- Messages can be passed from any vertex to any other vertex in the graph
  - Any number of messages may be passed
  - Message order is not guaranteed
  - Messages will not be duplicated
- Combiners can be used to reduce the number of messages passed between supersteps
- Aggregators are available for reduction operations such as sum, min, max, etc.
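To make the combiner and aggregator ideas concrete, here is a minimal sketch. It is not the Pregel Combiner or Aggregator interface: `Message`, `combine_max` and `aggregate_sum` are hypothetical names, and the code only shows the reduction each concept performs.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical message type: (destination vertex id, payload).
using Message = std::pair<std::string, double>;

// Max combiner sketch: keeps only the largest payload per destination, so at most
// one message per destination vertex has to be sent.
inline std::map<std::string, double> combine_max(const std::vector<Message>& outgoing) {
    std::map<std::string, double> combined;
    for (const Message& m : outgoing) {
        auto it = combined.find(m.first);
        if (it == combined.end()) {
            combined.emplace(m.first, m.second);
        } else {
            it->second = std::max(it->second, m.second);
        }
    }
    return combined;
}

// Sum aggregator sketch: reduces one contribution per vertex to a single global value.
inline double aggregate_sum(const std::vector<double>& contributions) {
    double total = 0.0;
    for (double c : contributions) total += c;
    return total;
}
```

Combining is safe here because the max-value computation only needs the largest incoming value. In general a combiner should be used only when the reduction is commutative and associative, since message order is not guaranteed.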

Topology Mutations, Input and Output
- The graph structure can be modified during any superstep
  - Vertices and edges can be added or deleted
  - Conflicts are handled using partial ordering of operations
  - User-defined handlers are also available to manage conflicts
- Flexible input and output formats
  - Text file
  - Relational database
  - Bigtable entries
- Interpretation of input is a pre-processing step separate from graph computation
- Custom formats can be created by subclassing the Reader and Writer classes

Graph Partitioning
- The input graph is divided into partitions consisting of vertices and outgoing edges
- The default partitioning function is hash(id) mod N, where N is the number of partitions
- It can be customized
[Figure: a graph with vertices numbered 1 to 12, divided into partitions.]
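The default rule, hash(id) mod N, and one possible customization can be sketched as follows. `std::hash` stands in for whatever hash function the real system uses, and `partition_of` and `partition_by_site` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Default-style partitioner: hash(id) mod N.
// std::hash is an assumption; the real hash function is not given in the slides.
inline std::size_t partition_of(const std::string& vertex_id, std::size_t num_partitions) {
    assert(num_partitions > 0);
    return std::hash<std::string>{}(vertex_id) % num_partitions;
}

// Example of a custom partitioner (hypothetical): vertices whose ids are URLs are
// grouped by the part before the first '/', so pages of one site share a partition.
inline std::size_t partition_by_site(const std::string& url, std::size_t num_partitions) {
    assert(num_partitions > 0);
    const std::string site = url.substr(0, url.find('/'));
    return std::hash<std::string>{}(site) % num_partitions;
}
```

Because the partition depends only on the vertex id, any worker can work out which partition owns a vertex without a lookup table.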

Execution of a Pregel Program
Steps of program execution:
1. Copies of the program are distributed across all workers
   1.1 One copy is designated as a master
2. The master partitions the graph and assigns workers their respective partition(s) along with portions of the input
3. The master coordinates the execution of supersteps and delivers messages among vertices
4. The master calculates the number of inactive vertices after each superstep and signals workers to terminate if all vertices are inactive and no messages are in transit
5. Each worker may be instructed to save its portion of the graph

Fault Tolerance in Pregel
- Fault tolerance is achieved through checkpointing
- At the start of every superstep the master may instruct the workers to save the state of their partitions to stable storage
- The master uses ping messages to detect worker failures
- If a worker fails, the master reassigns the corresponding vertices and input to another available worker and restarts the superstep
- The available worker reloads the partition state of the failed worker from the most recent available checkpoint
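The checkpoint and recovery steps above can be illustrated with an in-memory sketch. Everything here is an assumption made for illustration: `PartitionState`, `Checkpoint`, `save_checkpoint` and `recover_partition` are hypothetical names, and a `std::map` stands in for stable storage.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical in-memory stand-ins for partition state and stable storage.
struct PartitionState {
    std::vector<double> vertex_values;
};

struct Checkpoint {
    int superstep = 0;
    std::map<int, PartitionState> partitions;  // partition id -> saved state
};

// Saves a copy of every live partition at the start of the given superstep.
inline Checkpoint save_checkpoint(int superstep, const std::map<int, PartitionState>& live) {
    Checkpoint cp;
    cp.superstep = superstep;
    cp.partitions = live;
    return cp;
}

// Reloads the failed worker's partition from the checkpoint (as the worker taking
// over would) and returns the superstep from which execution has to restart.
inline int recover_partition(const Checkpoint& cp, int partition_id,
                             std::map<int, PartitionState>& live) {
    auto it = cp.partitions.find(partition_id);
    assert(it != cp.partitions.end());
    live[partition_id] = it->second;
    return cp.superstep;
}
```

The returned superstep is the one at which the checkpoint was taken, so any work done after that point on the reloaded partition is recomputed.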

Next Class: Dryad and GraphLab