Monday, November 14, 2011

Hadoop Speculative Execution

One thing that seems easy to grasp in Hadoop is the concept of speculative execution. Sounds trivial - if I have a task and am not so sure of the environment, please go ahead dear yellow elephant, schedule some more of the task, and wait which one will finish first. One task here or one task there does not matter, and dammit, I want my results NOW.

But how does the little yellow fellow know what is slow and when to schedule? When running large tasks looks like it runs some speculative tasks anyway, and many get the idea that there is some magic number of tasks launched for each job, but this would be way too expensive if the number of mappers/reducers is large. Therefore Hadoop goes the semi-smart way:

The current algorithm works the following way:
A speculative task is kicked off for mappers and reducers that have completion rate under certain pecentage of the completion rate of the majority of running tasks. For example if you have 100 mappers, 90 of which are at 80% completion, and 10 are at 20%, then hadoop will start 10 addittional tasks for the slower ones.

in versions of hadoop over 0.20.2 there are 3 new fields in the jobconf
  • mapreduce.job.speculative.speculativecap
  • mapreduce.job.speculative.slowtaskthreshold
  • mapreduce.job.speculative.slownodethreshold
Hadoop launches a speculative task for a regular task if completion < slowtaskthrehold*mean(
completion of all other tasks) and the number of speculative tasks launched< speculativecap.

In older versions of Hadoop these threshold values are fixed and cannot be modified.


  1. Thank for sharing this great Hadoop tutorials Blog post.I get a lot of great information here and this is what I am searching for. Thank you for your sharing. I have bookmark this page for my future reference.
    Hadoop Training in hyderabad

  2. thanks for simple explanation for speculative execution strategy and as I am so interested in this topic please share more posts about it and its new algorithms
    thanks so much

  3. Actually, you have explained the technology to the fullest. Thanks for sharing the information you have got. It helped me a lot. I experimented your thoughts in my training program.

    Big Data Training Chennai
    Big Data Training
    Big Data Course in Chennai

  4. Thanks for sharing your informative article on Hive ODBC Driver. Your article is very descriptive and assists me to learn whole concept in detail. Hadoop Training in Chennai

  5. I was just wondering how I missed this article so far, this is a great piece of content I have ever seen in the entire Internet. Thanks for sharing this worth able information in here and do keep blogging like this.

    Hadoop Training Chennai | Big Data Training in Chennai | Big Data Training Chennai