Getting Selected In GSoC

D-Day

For me, this was 23rd April, 12:30 AM. For a month I had been ‘not very patiently’ waiting for my GSoC result. I had high expectations, and I wanted to get selected so badly. I was really nervous the whole day, and to cope with it I was playing Dota 2 with my friends. Then one of my friends called me up and asked me to check the results. Trembling, I visited the GSoC dashboard…
and voilà, I was selected! I had made it, finally, and on my first attempt too. It is the happiest moment of my life so far. I am the first student from my college ever to be selected for Google Summer of Code. I have only one person to thank for my selection in GSoC, who motivated me through everything.

Back to the project

As mentioned in the previous post, I will be working on a computational genomics project under the Canadian Centre for Computational Genomics.

My project is titled “Improving SegAnnDB Webapp”

What is SegAnnDB?

SegAnnDB stands for Segmentation Annotation Database. It is a web app used in copy number analysis to identify gains, losses, and amplifications in DNA segments. Here is a working instance of SegAnnDB.

Copy number variations

As per Wikipedia -

Copy-number variations (CNVs) are a form of structural variation that manifest as deletions or duplications in the genome. For example, the chromosome that normally has sections in order as A-B-C-D might instead have sections A-B-C-C-D (a duplication of “C”) or A-B-D (a deletion of “C”). Cells with CNVs have abnormal or, for certain genes, normal variations in their copy number.

CNV Image
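
To make the A-B-C-D example above concrete, here is a tiny Python sketch. It is only a toy illustration: the helper names `duplicate`, `delete` and `copy_number` are made up for this post and have nothing to do with how SegAnnDB represents genomic data.

```python
# Toy model of the Wikipedia example: a chromosome as an ordered list of sections.
# All function names here are hypothetical and exist only for illustration.

def duplicate(sections, index):
    """Return a new list in which the section at `index` appears one extra time."""
    return sections[:index + 1] + [sections[index]] + sections[index + 1:]


def delete(sections, index):
    """Return a new list with the section at `index` removed."""
    return sections[:index] + sections[index + 1:]


def copy_number(sections, name):
    """Count how many copies of the section `name` are present."""
    return sections.count(name)


normal = ["A", "B", "C", "D"]
with_duplication = duplicate(normal, 2)  # duplication of "C"
with_deletion = delete(normal, 2)        # deletion of "C"

print("-".join(normal), "copies of C:", copy_number(normal, "C"))
print("-".join(with_duplication), "copies of C:", copy_number(with_duplication, "C"))
print("-".join(with_deletion), "copies of C:", copy_number(with_deletion, "C"))
```

The copy number of "C" goes from 1 to 2 after the duplication and to 0 after the deletion, which is the kind of change that copy number analysis tries to detect.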

The study of copy number variation is closely tied to tumor research. Although copy number variations are common in humans, many studies have found that copy number variations in genes are associated with diseases such as tumors, cancers, and Alzheimer's. Progress in the field of CNV research will help greatly in demystifying the causes of, and cures for, these diseases.

SegAnnDB focuses on helping researchers analyse the copy number alterations in a chromosome.

Project Aim -

There is already a working version of SegAnnDB in which we can do basic but very accurate genomic segmentation. It visualizes a profile by plotting the log ratio against position along the chromosome. It is one of the most accurate systems for annotations.

1. Add appropriate unit tests and a regression testing suite using the Selenium testing framework.

2. Render plots based on the chromosome region the user wants to see/annotate.

3. Permission System - A user can grant permissions such as read and write to other users.

4. Social Annotations - A user will be able to share their annotations with other users.

5. Faster Deletion of Profiles - Aims to optimize the deletion algorithm from O(ND) to O(1).

6. Safe Deletion of Log Files - Have a cron job which will periodically delete all the unused log files.

7. Docker container - Package SegAnnDB as a Docker container.
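
To give a rough idea of what the permission system in item 3 could look like, here is a minimal Python sketch. It rests entirely on my own assumptions: the class name `ProfilePermissions`, its methods, and the in-memory dictionary are hypothetical and are not taken from the SegAnnDB codebase, where the grants would have to be stored in the database instead.

```python
# Hypothetical sketch of per-profile read/write permissions.
# Nothing here is SegAnnDB's actual API; it only illustrates the idea.

READ = "read"
WRITE = "write"
VALID_PERMISSIONS = {READ, WRITE}


class ProfilePermissions:
    """Tracks which users may read or write a single profile."""

    def __init__(self, owner):
        self.owner = owner
        self._grants = {}  # maps user name -> set of granted permissions

    def grant(self, granting_user, user, permission):
        """Let `user` have `permission`; only the owner may grant."""
        if granting_user != self.owner:
            raise PermissionError("only the owner can grant permissions")
        if permission not in VALID_PERMISSIONS:
            raise ValueError("unknown permission: %r" % (permission,))
        self._grants.setdefault(user, set()).add(permission)

    def revoke(self, granting_user, user, permission):
        """Take `permission` away from `user`; only the owner may revoke."""
        if granting_user != self.owner:
            raise PermissionError("only the owner can revoke permissions")
        self._grants.get(user, set()).discard(permission)

    def can(self, user, permission):
        """The owner can do everything; others need an explicit grant."""
        if user == self.owner:
            return True
        return permission in self._grants.get(user, set())


perms = ProfilePermissions(owner="alice")
perms.grant("alice", "bob", READ)
print(perms.can("bob", READ))   # bob was granted read access
print(perms.can("bob", WRITE))  # bob was never granted write access
```

In this sketch read and write are independent, so granting write does not automatically grant read. Whether write should imply read is a design decision I have not made yet.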

Timeline -

The project timeline is as follows -

Project Milestones and Deliverables

Community Bonding Period (April 22 - May 22): Become more familiar with the codebase and how it works. Work out what the test cases should cover, and find the most efficient way of coding the other parts as well. Towards the last week, start coding the unit tests using the Selenium test framework.
May 23 - May 30 (1 Week): Using the Selenium WebDriver framework, start writing code for the regression testing suite. I will decide which tests need to be written during the community bonding period.
May 31 - June 13 (2 Weeks): Work on replacing the large PNGs with functionality to view subregions of a chromosome.
June 14 - June 20 (1 Week): Permission System - Implement a permission system so that users can grant read/write permissions on profiles.
Midterm Evaluations: Submit the midterm evaluation by June 22. Then continue coding.
June 22 - June 28 (1 Week): Wrap up the remaining work on the permission system.
June 29 - July 12 (2 Weeks): Work on Social Annotations.
July 13 - July 26 (2 Weeks): Faster deletion of profiles. Optimize O(ND) to O(1).
July 27 - August 2 (1 Week): Cron job for safe deletion of the Berkeley DB log files.
August 3 - August 9 (1 Week): Create the Docker image. Package SegAnnDB as a standalone Docker container and upload it to Docker Hub.
Remaining days: Reserved as a buffer period in case something takes longer than expected or unforeseen difficulties arise. If everything runs as per the timeline, this period will be used for more code cleanup, better testing and more documentation.

Technologies and Frameworks Involved -

Languages - Python and JavaScript, with Berkeley DB as the database.

1. Pyramid framework - SegAnnDB is a web app, built using the Pyramid web application framework.

2. Selenium testing - To build the testing suite, I will use the Python bindings of Selenium.

3. D3.js - A JavaScript library for visualizing data on the client side. It handles displaying annotations and uploading them to the server.

4. PrunedDP and SegAnnot - SegAnnDB uses machine learning to predict accurate breakpoints beforehand; the user can then verify them and add their own.

I hope I have explained the project well. If you still have questions, please feel free to comment here or email me at x.abhishek.flyhigh@gmail.com
