Give Treeherder the ability to schedule TaskCluster jobs

(**Mentors** – jmaher, armenzg)

(Google Summer of Code – 2016 Proposal)

Personal Details

Name: Kalpesh Krishna

Email: [email protected]

IRC Nick: martianwars

Telephone: (+91) 9822650831

**Other Contact Methods:** IRC / Facebook / Hangouts / Skype (martiansideofthemoon)

Country of Residence: India

Timezone: Indian Standard Time - UTC (+5:30h)

Primary Language: English

**Profiles:** Bugzilla, Github, Mozillians

Project Details

Introduction

Over the last ten years, Mozilla has been using Buildbot as its primary continuous integration tool to schedule builds and tests for every push that is sent to Treeherder for evaluation. As Mozilla slowly moves towards TaskCluster as its primary continuous integration system, an important step is to make Treeherder, the primary UI used for handling builds and tests, compatible with TaskCluster. The current Treeherder interface allows us to schedule Buildbot jobs, a major contribution by Alice Duarte Scarpa. One can also run Buildbot jobs through TaskCluster using the BuildbotBridge project.

This proposal aims to integrate Treeherder's job scheduling user interface with TaskCluster jobs. While doing this, it hopes to improve the overall integration of Treeherder and pulse actions / mozci_tools with TaskCluster.

This project primarily comprises four components, which I describe below.

1(i). Generate Data Source with all possible tasks

The first task here is to get a list of all possible TaskCluster jobs that could have been scheduled for a particular push. Once a user has this list, he's free to choose the TaskCluster jobs he wishes to schedule.

Most TaskCluster tasks can be described (like this one) with a set of “routes” and a “build command”. The routes define some access points for the data. For example, tasks have one or more “index” routes. These “index” routes are a part of the Task Index. The Task Index has a rich set of APIs to fetch artifacts and data. The Task Index maps route names to the latest tasks using that name.

TaskCluster has a special decision task called the “Gecko Decision Task”. This task is automatically triggered each time a commit is pushed. It parses the commit message for try syntax (or schedules almost everything when the repository does not use try syntax). Based on the try syntax and an in-tree script, it creates a graph.json file, which is picked up by the worker and turned into TaskCluster tasks; these are relayed back to Treeherder using the Treeherder APIs via a POST request.

For this project, we would need the decision task to produce, alongside graph.json, the complete set of jobs that could be scheduled.

The first task will be to make this job create an artifact, all_tasks.json, containing a JSON-styled, TaskCluster-compatible graph of all tasks that could have been scheduled. This file would be stored alongside the job's other artifacts and, since these artifacts are public, Treeherder could access it via an AJAX call.

This file would typically depend only on the repository (mozilla-inbound or mozilla-central) and could change over time.
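
To make this concrete, here is a minimal sketch of what emitting the artifact could look like, assuming the decision task already builds the full set of possible tasks in memory; the names full_task_graph and ARTIFACTS_DIR are hypothetical and only illustrate the idea.

```python
# Hypothetical sketch only: assuming the decision task holds the full set of
# possible tasks in memory (here `full_task_graph`, a dict of task label ->
# task definition), emitting the proposed all_tasks.json artifact could be
# as simple as dumping it next to graph.json.
import json
import os

ARTIFACTS_DIR = os.environ.get("ARTIFACTS_DIR", "artifacts")  # assumed location

def write_all_tasks(full_task_graph):
    """Write every schedulable task, not just those selected for this push."""
    path = os.path.join(ARTIFACTS_DIR, "all_tasks.json")
    with open(path, "w") as fh:
        json.dump(full_task_graph, fh, indent=2, sort_keys=True)
    return path
```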

An alternate approach is to work with the Task Index APIs. The Gecko-Decision-Task (like this one) has one Task Index route called

`index.gecko.v2.mozilla-inbound.latest.firefox.decision`

This route maps to the latest Gecko-Decision-Task of mozilla-inbound. Hence one can store the artifact under this route and fetch the file via an AJAX request from an Angular controller in Treeherder.
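
As a rough illustration, fetching the latest artifact through this route could look something like the following Python sketch; the index endpoint layout reflects the 2016-era service and the artifact path public/all_tasks.json is an assumption.

```python
# A minimal sketch (using the requests library) of fetching the latest
# all_tasks.json through the route above. The index namespace is the task
# route without its leading "index." prefix; the artifact path is assumed.
import requests

INDEX_ROOT = "https://index.taskcluster.net/v1/task"
NAMESPACE = "gecko.v2.mozilla-inbound.latest.firefox.decision"
ARTIFACT = "public/all_tasks.json"  # assumed artifact name

def fetch_all_tasks():
    url = "{}/{}/artifacts/{}".format(INDEX_ROOT, NAMESPACE, ARTIFACT)
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()
```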

Possible Changes :-

1(ii) Make use of all_tasks.json in mozci_tools

Before we actually move to Treeherder, I want to make use of the all_tasks.json artifact in the mozci_tools project. Steps (i) and (ii) are tracked in the bug https://bugzilla.mozilla.org/show_bug.cgi?id=1232005. mozci_tools integrates well with BuildAPI and can already schedule Buildbot jobs. Following this part of mozci_tools along similar lines, the first step would be to download the file. After reading it, the data can be passed to a set of scripts which call the TaskCluster APIs and schedule jobs. TaskCluster has a rich set of APIs that can be used to schedule tasks as part of the run: using createTask, or defineTask and scheduleTask together, we can add additional jobs to the TaskCluster schedule.

The corresponding Github bug for this is here.

I strongly feel that implementing this in the mozci_tools repository will considerably simplify the task when I go on to implement it in Treeherder.
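
A hedged sketch of what the mozci_tools side might look like, using the official taskcluster Python client; the layout of all_tasks.json and the credentials handling here are assumptions, not the final design.

```python
# A hedged sketch of turning selected entries from all_tasks.json into real
# tasks. createTask and slugId come from the official taskcluster Python
# client; the assumed artifact layout (label -> {"task": <definition>}) and
# the credentials plumbing are illustrative only.
import taskcluster

def schedule_tasks(all_tasks, wanted_labels, credentials):
    queue = taskcluster.Queue({"credentials": credentials})
    for label in wanted_labels:
        task_def = all_tasks[label]["task"]   # assumed artifact layout
        task_id = taskcluster.slugId()        # fresh unique task id
        # createTask makes the task pending immediately; defineTask followed
        # by scheduleTask could be used instead to stage the task first.
        queue.createTask(task_id, task_def)
```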

2. Make Treeherder compatible with all_tasks.json

This part of the project closely follows the methodology Alice used in this contribution for Buildbot job scheduling.

The first step is to GET the file created above, all_tasks.json.

We can do this with a simple GET request to the Task Index, or alternatively via the Task Index API findArtifactFromTask. Since this file is common to all pushes in a repository, it can be cached and refreshed every hour from the same source (which always yields the “latest” version of the file). The second approach will serve as a backup in case the first doesn't work well.
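
A rough sketch of the caching idea, assuming Treeherder's Django backend and a one-hour expiry; the cache key and the repository-to-route mapping are assumptions.

```python
# Rough sketch only: keep all_tasks.json in Treeherder's cache for an hour
# and refetch it from the Task Index when it expires. The key name and the
# namespace template are illustrative.
import requests
from django.core.cache import cache

ONE_HOUR = 60 * 60

def get_all_tasks(repository):
    key = "all-tasks-json-{}".format(repository)
    data = cache.get(key)
    if data is None:
        url = ("https://index.taskcluster.net/v1/task/"
               "gecko.v2.{}.latest.firefox.decision"
               "/artifacts/public/all_tasks.json").format(repository)
        data = requests.get(url, timeout=30).json()
        cache.set(key, data, ONE_HOUR)
    return data
```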

Treeherder provides a set of RESTful APIs which give us information about performance data, job details and resultsets. Among these APIs lies the runnable jobs API, which gives a list of all possible runnable jobs in the project. An example query for mozilla-inbound is here. Currently, this API supports only Buildbot jobs.

Now that all_tasks.json has been cached, Treeherder can call this API to fetch the file along with the corresponding Buildbot jobs. Some integration work would be necessary in this API since there are a few fundamental differences between TaskCluster and Buildbot. Once the file is fetched, it will be parsed and converted into job symbols which will be displayed along with the Buildbot jobs on clicking “Add New Jobs”. The user will not be shown a distinction between the two types of jobs; the distinction would be made internally using the “build_system_type” parameter (can be seen here). On clicking a job, one of two things can be done, as described below.
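
Purely as an illustration, merging TaskCluster entries into the runnable jobs list might look like the following; every field except build_system_type is an assumption about the artifact and API shape.

```python
# Illustrative only: one possible way to convert TaskCluster entries into
# runnable job records, tagging them internally with build_system_type while
# the UI shows both kinds of jobs side by side.
def to_runnable_jobs(all_tasks):
    runnable = []
    for label, entry in all_tasks.items():
        th = entry["task"].get("extra", {}).get("treeherder", {})
        runnable.append({
            "ref_data_name": label,
            "job_type_name": label,
            "job_type_symbol": th.get("symbol", "?"),
            "platform": th.get("machine", {}).get("platform", "unknown"),
            "build_system_type": "taskcluster",  # Buildbot jobs carry "buildbot"
        })
    return runnable
```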

3(a) Directly call the TaskCluster API from Treeherder

This is the first approach, where Treeherder would build some form of local queue as jobs are clicked. While forming the queue, the dependencies should be added before the jobs that need them. As described above, using APIs such as createTask, or defineTask and scheduleTask together, we can add additional jobs to the TaskCluster schedule. Using some unique identifier for the push, this information would be relayed back to Treeherder using the Treeherder APIs via a POST request, thus completing the cycle.

The unique identifier could be the revision hash. Since Treeherder is currently transitioning away from revision hashes towards the “top revision”, care would have to be taken to send both parameters in our requests.

Another thing to take into account here is TaskCluster authentication in Treeherder. This is ongoing work and might have to be accounted for when sending requests to Treeherder.

3(b) Teach pulse_actions to listen to TaskCluster job scheduling requests

(Recommended by the mentors in the initial proposal; the schedule below follows this approach)

This approach is the method Alice used for Buildbot job scheduling.

Pulse actions is a communication medium between Treeherder requests and the continuous integration system. It has one worker, which listens to messages from Treeherder and schedules jobs accordingly. When a job is clicked on Treeherder, a pulse message is sent to pulse_actions. pulse_actions relays this message to TaskCluster / BuildbotBridge and job scheduling takes place as in this workflow.
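
For illustration only, the listening side could be a plain pulse consumer along these lines (pulse_actions wraps this kind of plumbing); the exchange, queue and routing-key names here are made up, not the real Treeherder ones.

```python
# A very rough sketch of a pulse consumer using kombu against
# pulse.mozilla.org. Exchange, queue and routing key are illustrative.
from kombu import Connection, Exchange, Queue

exchange = Exchange("exchange/treeherder/v1/job-actions", type="topic")
queue = Queue("queue/<user>/taskcluster-actions", exchange=exchange,
              routing_key="#", durable=True)

def on_message(body, message):
    # body would describe which job was requested and for which push;
    # here we would translate it into the equivalent TaskCluster API calls.
    print(body)
    message.ack()

with Connection("amqps://<user>:<password>@pulse.mozilla.org:5671//") as conn:
    with conn.Consumer(queue, callbacks=[on_message]):
        while True:
            conn.drain_events()
```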

TaskCluster currently doesn't fully support pulse_actions messages, though it does work with pulse messages of its own. This is ongoing work and might be in a better state by the time GSoC starts.

In case the work doesn't complete and we decide to use this approach, this bug would become a part of this project and some changes might be needed in the TaskCluster part of the project.

The main bug here would involve following an approach parallel to Alice's contribution to pulse_actions, and might involve some form of integration with the treeherder_runnable.py file (just as we integrated with the runnable_jobs API in Treeherder). Care would have to be taken to account for the authentication issues involved in TaskCluster jobs.

Once this is done, the last step would be to enable pulse_actions to listen to the modified Treeherder requests and translate them into the equivalent TaskCluster API calls.

An important change here would be to account for dependencies: a selected job should internally schedule all of its dependencies as well, as sketched below.
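
A small sketch of that dependency expansion, assuming all_tasks.json records each task's dependencies under a field I am calling requires here.

```python
# Given the selected labels, walk the task graph and collect every
# (transitive) dependency so it can be scheduled before the job that needs
# it. The "requires" field name is an assumption about the graph format.
def expand_with_dependencies(all_tasks, selected_labels):
    ordered, seen = [], set()

    def visit(label):
        if label in seen:
            return
        seen.add(label)
        for dep in all_tasks[label].get("requires", []):  # assumed field
            visit(dep)                                     # dependencies first
        ordered.append(label)

    for label in selected_labels:
        visit(label)
    return ordered  # schedule in this order: dependencies before dependents
```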

Schedule of Deliverables

My summer vacation starts on the 1st of May and I will be able to dedicate 5-7 hours of work daily. After the 21st of July, I might be forced to cut down to 3-4 hours daily due to the start of my third academic year, though I will still be able to give the regular 5-7 hours on weekends. Hence I intend to do most of GSoC during the period 1st May to 1st August instead of the usual period of 23rd May to 23rd August. This gives me roughly 13 weeks. As both my mentors reside in the Eastern Standard Time zone, I'll try to focus my work towards the late hours in India.

I have a fairly good knowledge of Git / Github and have been using Mercurial / Bugzilla for a good amount of time. I've worked on the Treeherder frontend in the past, so I'm familiar with that part of the codebase. However, I need to get up to date with the Treeherder APIs and the other repositories involved here, which I will do in the early part of my schedule. I have tried to ensure that each week has enough coding as well as enough learning.

On the last day of every week, I intend to work on the documentation, adding any significant changes I've made. I plan to start a basic blog tracking my GSoC progress. This blog will contain links to my contributions, a brief overview of what I did, and my plan for the next week.

I might be away for a couple of days in the first week of May. Other than that, I will not have any other commitment during the GSoC period.

(Before Week 1, “Week 0”)

  • Start working on a “good first bug” in the mozci_tools project with Armen and Joel, the mentors involved in this project. Even though I might have spoken to the mentors, I've never worked with them on any issues. I think it is important to learn a bit about the workflow they expect me to follow and the code styles I need to learn. Besides, my first set of work is in mozci_tools and I will familiarize myself with the project through this effort.

  • Talk to the mentors about the updates in TaskCluster relevant to the project, specifically the big graph update, in-tree scripts update and the pulse_actions support and the relevant documents / codebase I need to familiarize myself with so that I can update my proposal accordingly.

  • Decide whether to go for the 3(a) or 3(b) (details under Project details) approach while solving the issues of pulse_actions.

  • I wish to learn the try syntax and experiment with its usage on mozilla-inbound, which will be needed in the initial part of the project.

May 1st – May 4th (BuildBot Model Familiarization)

  • Before I start working on the project, it is important that I study how mozci_tools and BuildAPI work together to get a clearer higher level picture along with a few details. I'll first try to understand how exactly job scheduling takes place through mozci_tools. I'll also try to understand the major technical advantages of TaskCluster over BuildBot here. I'll make sure I maintain some form of documentation for this step as it will be applicable throughout the project.

  • I plan to design a similar technical structure which I will follow while integrating TaskCluster jobs.

May 5th – May 9th (Gecko Decision Task, TaskCluster familiarization)

  • Discuss with the mentors and other knowledgeable community members the implementation details of the first part of the project, which involves generating the all_tasks.json file.

  • Start reading the TaskCluster docs and the TaskCluster worker code to get an idea of how TaskCluster generates graph.json and the most important parameters involved here.

  • I want to start learning how the Gecko Decision Task job works and the modifications that need to be made to it.

  • Fix a few bugs in the TaskCluster repositories somewhat related to the Gecko Decision Task.

May 10th – May 21st (Generate the all_tasks.json file, Implementation in mozci_tools)

  • Actually start the project work and produce the all_tasks.json file in the TaskCluster Index. This artifact would be added to the Task Index as discussed in the project details.

  • Now that I have a reasonable knowledge about mozci_tools, I will be able to implement TaskCluster job scheduling in mozci_tools using the defineTask and scheduleTask TaskCluster APIs.

  • I hope to complete this by the end of the 3rd week, along with the documentation for the added API. I've allotted some extra time here since there is a chance the in-tree script changes take longer than expected.

May 21st – May 28th (Treeherder Familiarization)

  • This week will primarily help me get familiar with the backend of Treeherder. Since I have the UI setup, this part shouldn't take more than a week. I plan to read through the following parts of Treeherder and fix basic issues in the same :-

    • The web APIs provided by Treeherder here and how they're used by the other Automation Team tools.

    • The POST requests used to send data from TaskCluster to Treeherder.

    • ThResultSetModel, ThResultSetStore, RunnableJobs factories / models which would be relevant when I try to integrate TaskCluster jobs to the scheduling list.

    • Study the new “Add New Jobs” feature which is where my primary UI work would be. I would be going through Alice's patch in great detail.

  • I might need to talk with the Treeherder sheriffs to understand the various methods to send pulse messages and how Treeherder works with pulse_actions.

  • I intend to revisit my Buildbot vs TaskCluster documentation here and see how it is applicable in Treeherder.

May 29th – June 14th (Make Treeherder scheduling compatible with TaskCluster)

  • I'll start off by fetching the all_tasks.json file now stored in the Task Index and setting up a basic parser to read through the JSON file.

  • I'll continue with updating the runnable_jobs API. A few UI changes might be proposed, since including TaskCluster job scheduling means accounting for dependencies. This might require some work in AngularJS. Since I'm comfortable with it, I intend to finish the UI changes in this period itself.

  • Document any major changes in the Treeherder docs.

June 14th – June 21st (Pulse Actions familiarization)

  • Using my Buildbot scheduling documentation, I'll first study the pulse_actions framework with BuildAPI. My major focus here is to understand how exactly the workers listen to Treeherder pulse messages and relay the messages to Buildbot. This would typically involve the entire pulse_actions codebase, so I might fix a few bugs in the same or jump directly to Week 10's work.

June 21st – June 28th (Improve Pulse Actions support for TaskCluster)

  • This week I would try to see where pulse_actions lacks TaskCluster support and fill in the gaps. This week is really optional, as there is a lot of ongoing work in this area. I will jump to the next part in case pulse_actions is already ready to be integrated with TaskCluster.

June 28th – July 15th (Integrate Pulse Actions with updated Treeherder pulse messages)

  • This would typically be the last and most challenging part of the project. Now that I've updated the Treeherder APIs and user interface, I need to update pulse_actions to listen to the modified pulse messages. I've mentioned how I intend to go about this in the Project details.

  • I'll have to send these messages to the TaskCluster API via pulse_actions. Care needs to be taken here with respect to the authentication differences between TaskCluster and Buildbot.

  • The last part would involve getting the appropriate set of jobs in Treeherder from TaskCluster using the POST requests described earlier.

  • This week will involve a lot of debugging and I expect it to be an intensive week where I typically put in 7-9 hours daily.

July 15th – August 1st (Buffer Week, Last Parts of Documentation)

  • My college starts during this period, so it is not very heavy, as I won't be able to devote more than 4 hours a day.

  • I typically expect the Treeherder familiarization period and the last part of the project to be heavier than the others. In case I'm unable to stay on schedule, I intend to use this week to complete the remaining pull requests.

  • I intend to finish the documentation changes in pulse_actions in this period.

Possible extensions to this project

1. Since Treeherder is a little slow at times, I want to develop a Firefox Addon to perform some Treeherder Tasks, especially tracking your latest push to Treeherder and scheduling jobs to it using the Treeherder web APIs. I would like to support this with a Github bot which can help developers do the same via commands in Pull Requests, similar to how the bors-servo bot works. I've always wanted to develop a Github bot, and this would be an excellent place to try this out!

2. Look for gaps in the TaskCluster – Treeherder integration and discuss possible solutions. In case we find some worthwhile solutions, I'll happily participate in / mentor the third Quarter of Contribution on the same.

3. Improve the overall TaskCluster – mozci_tools integration based on the remaining issues here – https://github.com/mozilla/mozilla_ci_tools/milestones/TaskCluster%20support.

Further work in Treeherder :-

1. Speed up Treeherder's UI using React JS (work so far)

2. Try to integrate Servo's tests with Treeherder

Open Source Development Experience

I've been contributing to Mozilla for the last seven months and have worked with the Engineering Productivity Team (#ateam) for the last four months. I completed my Quarter of Contribution with James Graham in December 2015. The review blog post is here.

My Bugzilla profile :- https://bugzilla.mozilla.org/[email protected]

My Github profile :- https://github.com/martiansideofthemoon

My Mozillans profile :- https://mozillians.org/en-US/u/martianwars/

My open source contributions to Mozilla include :-

  • Web Platform Tests Results Viewer – (QoC Project) I worked on the development of a structured log viewer which can ingest mozlog files (via URLs / files) and carry out comparisons between the results of various test runs using a filtering system. The project was developed using Lovefield and AngularJS, and was integrated with Treeherder in the first week of March. I now maintain this repository with James Graham and have successfully mentored a few newcomers.

  • Treeherder – I've written a few patches with William Lachance to reduce the time taken to render jobs. I'm working on switching the job rendering to ReactJS. I've also written the patch to integrate the WPT Results Viewer with the Treeherder UI.

  • Gaia – I worked primarily on the SMS App with Oleg Azasypkin and Julien Wajsberg. I've written a Marionette test suite, added documentation to a certain part of the app, and fixed a few issues in its functionality. My contributions are here.

  • Firefox for Android – I've done some work for Fennec with Nick Alexander, primarily improving parts of UI and fixing redundant code. My contributions are here.

  • Other Mozilla Repositories – I've written a few basic patches in each of the following projects :-

  • Git Cheat Sheet – Wrote a cheat sheet for Git commands in Hindi as an extension of the English version.

  • Other Projects –

    • Pyraminx Utility Kit – A complete tool set to find optimal solutions for a Pyraminx state and generate algorithms for the same. This includes an image processing module in Android to read Pyraminx states.

    • Python Experience – I've written a few scripts to automate some basic tasks using Selenium, Mechanize and Beautiful Soup. I also have some experience in Django.

    • Android Experience – I've written a couple of apps, and developed the app for my college cultural festival Mood Indigo with a friend.

Work / Internship Experience

  • Pickup – Pickup was an automated carpooling taxi service in India. I wrote a set of RESTful APIs with an ER-model database in Laravel, using algorithms that relied on the Google Geocoding API to match similar travel routes.

  • Web and Coding Club, IIT Bombay – I lead a team at IIT Bombay which strives to help newcomers get started with programming and to develop a healthy coding culture in the institute. We host a number of workshops and institute-wide competitions to cultivate a hobbyist coding culture.

Academic Experience

I'm a sophomore undergraduate student of Electrical Engineering at the Indian Institute of Technology Bombay.

I wouldn't say electrical engineering was my passion when I chose to pursue it. I have always loved to tinker with old electronic items in my house but I never thought I would pursue it as a career. I have always loved science and technology, especially physics and mathematics, and electrical engineering seemed like a good bet as it would introduce me to a wide spectrum of engineering. I have been blessed to study in a good institute and I've fallen in love with my subject. Though I program a lot, I try my level best to stay focussed on my course work and I hope to pursue a career that has an apt combination of the two.

Besides, I've received some basic training in Astronomy and love to read about the exciting astronomical discoveries that happen every day.

A link to my complete academic resume is here.

Why Me

I think this project is apt for me as it builds on a project I've loved contributing to (Treeherder) over the last few months. I am familiar with the UI parts of the codebase, and this project seems to be an interesting addition which I would love to work on. I have a good rapport with some of the Treeherder engineers, especially William Lachance, and I think collaborating with them would be a lot easier for this reason. Through the Quarter of Contribution, I've also gotten to know Joel and Armen, who would mentor this project.

This idea excites me as I would get a chance to learn about the various Continuous Integration systems existing in Mozilla. I do hope to give my level best like I did for the QoC and I hope to establish myself as a permanent “ateamer” through this project, and who knows, might even mentor a GSoC / QoC myself someday!

Why Mozilla

I was introduced to Mozilla by Manish Goregaokar last September and I've been contributing to Mozilla ever since. What I've really liked about Mozilla is the effort everyone takes to help newcomers, mentor projects and welcome contributors. I've never felt scared looking at the enormous code base, as each of my mentors has tried to give me appropriate hints and help me climb the learning curve. I particularly like Mozilla's vision, work and dedication towards a better internet, and being a part of this team is a dream come true in itself.