Data Science Projects

Below is a list of my data science projects that are available online. For more information, don’t hesitate to contact me. Thank you!

  • HIV patient response to antiviral treatment (hivprogression dataset). I predicted the response of HIV patients to antiviral treatment based on a limited number of measured parameters. To improve the model’s performance, I extracted new features by analyzing the nucleotide sequences of two viral genes (PR and RT) from each patient. The dataset was downloaded from Kaggle and analyzed using R and Statistica 13 (Dell). The final model was built using a neural network algorithm and had an error rate of about 25% on the test set.
  • 2016 University Rankings (posted on LinkedIn). I explored the “Times Higher Education World University Ranking” report published in 2016 and compared the overall rankings and research scores of the top three universities in Chicago. Plots were generated using Tableau 9.2.4.
  • Hotspotter (http://52.205.142.233:3838/hotspotter/). I built this Shiny app to facilitate the identification of hotspot mutation regions in human cancer genes. Hotspotter uses curated data from The Cancer Genome Atlas (TCGA, https://cancergenome.nih.gov/). It is hosted on an Amazon EC2 server, can be accessed over the Internet, and can execute real-time analyses on genomic data from 25 different types of cancer, 8,451 patients, 18,010 genes, and a total of 1,092,589 mutations affecting protein-coding sequences.
  • Scraping PubMed records for a targeting campaign. This post suggests a business-oriented way of using the data included in PubMed records. It presents a hypothetical case study that follows the workflow of a data mining problem under the CRISP-DM model and focuses on the business understanding, data understanding, and data preparation steps.
  • Analysis of patient trajectories from the MIMIC-III dataset (still in progress…)
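
To give a rough idea of what identifying hotspot mutation regions (as in the Hotspotter project above) can involve, here is a minimal sliding-window sketch. It is not Hotspotter's actual code, which is a Shiny app written in R; it is written in Python purely for illustration. The function names, the window size, the count threshold, and the toy mutation positions are all assumptions made for this example.

```python
from collections import Counter


def find_hotspot_windows(positions, protein_length, window=3, min_count=3):
    """Return (start, end, count) for every window of `window` residues
    that contains at least `min_count` mutations.

    `positions` holds 1-based amino-acid positions, one entry per observed
    mutation (hypothetical input format, not TCGA's native format).
    """
    counts = Counter(positions)
    hits = []
    for start in range(1, protein_length - window + 2):
        end = start + window - 1
        total = sum(counts[p] for p in range(start, end + 1))
        if total >= min_count:
            hits.append((start, end, total))
    return hits


def merge_windows(windows):
    """Merge overlapping windows into contiguous (start, end) regions."""
    merged = []
    for start, end, _ in windows:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous region: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Toy, made-up mutation positions for a hypothetical 189-residue protein.
toy_positions = [12, 12, 13, 13, 14, 61, 61, 61, 146]
regions = merge_windows(find_hotspot_windows(toy_positions, 189))
print(regions)
```

A fixed window with a raw count threshold is the simplest possible choice. A real analysis would more likely test each window against a background mutation rate.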