coffee ==> ideas: Hadoop Speculative Execution

Monday, November 14, 2011

Hadoop Speculative Execution

One thing that seems easy to grasp in Hadoop is the concept of speculative execution. Sounds trivial - if I have a task and am not so sure of the environment, please go ahead dear yellow elephant, schedule some more of the task, and wait which one will finish first. One task here or one task there does not matter, and dammit, I want my results NOW.

But how does the little yellow fellow know what is slow and when to schedule? When running large tasks looks like it runs some speculative tasks anyway, and many get the idea that there is some magic number of tasks launched for each job, but this would be way too expensive if the number of mappers/reducers is large. Therefore Hadoop goes the semi-smart way:

The current algorithm works the following way:
A speculative task is kicked off for mappers and reducers that have completion rate under certain pecentage of the completion rate of the majority of running tasks. For example if you have 100 mappers, 90 of which are at 80% completion, and 10 are at 20%, then hadoop will start 10 addittional tasks for the slower ones.

in versions of hadoop over 0.20.2 there are 3 new fields in the jobconf

mapreduce.job.speculative.speculativecap
mapreduce.job.speculative.slowtaskthreshold
mapreduce.job.speculative.slownodethreshold

Hadoop launches a speculative task for a regular task if completion < slowtaskthrehold*mean(

completion of all other tasks) and the number of speculative tasks launched< speculativecap.

In older versions of Hadoop these threshold values are fixed and cannot be modified.

4 comments:

mareddyonlineJuly 19, 2014 at 12:35 PM
Thank for sharing this great Hadoop tutorials Blog post.I get a lot of great information here and this is what I am searching for. Thank you for your sharing. I have bookmark this page for my future reference.
Hadoop Training in hyderabad
ReplyDelete
Replies
FlowraNovember 29, 2014 at 3:21 AM
thanks for simple explanation for speculative execution strategy and as I am so interested in this topic please share more posts about it and its new algorithms
thanks so much
ReplyDelete
Replies
UnknownAugust 4, 2015 at 5:04 AM
Actually, you have explained the technology to the fullest. Thanks for sharing the information you have got. It helped me a lot. I experimented your thoughts in my training program.

Big Data Training Chennai
Big Data Training
Big Data Course in Chennai
ReplyDelete
Replies
AnonymousMay 27, 2020 at 1:22 AM
Informative post indeed, I’ve being in and out reading posts regularly and I see alot of engaging people sharing things and majority of the shared information is very valuable and so, here’s my fine read.

Big Data Hadoop Training In Chennai | Big Data Hadoop Training In anna nagar | Big Data Hadoop Training In omr | Big Data Hadoop Training In porur | Big Data Hadoop Training In tambaram | Big Data Hadoop Training In velachery
ReplyDelete
Replies

Add comment

coffee ==> ideas

Monday, November 14, 2011

Hadoop Speculative Execution

4 comments:

Followers

Blog Archive

About Me