I am a Research Scholar at the Memorial Sloan Kettering Cancer Centre in New York, working on the BioPAX pathway exchange model and Pathway Commons.

Post Docs

University of Glasgow

Interfacing proteomic and genomic data

Post Doc in the RASOR project. I was tasked with implementing technologies to interface proteomic and genomic data. The project focuses on improving data handling, storage and distribution through an integrated LIMS, as a foundation for an integrated relational database. On the face of it, this is classic data integration of two heterogeneous data systems; however, because the sources are proteomics and genomics data, few data elements overlap, so actual integration would be minimal. The data are semantically different: not only are they difficult to integrate physically, but doing so would add little value to the data themselves. Since the real reason for integration is to query the data as a unit, what matters most to users is having the data in a form that supports queries across both sources. Semantic integration promises to provide exactly this capability. I am using RDF, RDF Schema (RDFS) and OWL to integrate genomic, proteomic and transcriptomic data.
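To illustrate the idea of querying across sources instead of physically merging them, here is a minimal sketch in plain Python. It is an assumption-laden toy, not the RASOR implementation: a real system would use an RDF store and SPARQL rather than this hand-written matcher. Each source stays as its own list of subject-predicate-object triples, the only link between them is a shared identifier, and a conjunctive query joins over both. All identifiers (`ex:geneA`, `ex:encodes`, `ex:observedInSample`, ...) and the helper names `match` and `query` are hypothetical.

```python
# A toy triple-pattern matcher illustrating cross-source querying.
# All identifiers are hypothetical examples, not real RASOR data.

GENOMIC = [
    ("ex:geneA", "rdf:type", "ex:Gene"),
    ("ex:geneA", "ex:encodes", "ex:protA"),
    ("ex:geneB", "ex:encodes", "ex:protB"),
]
PROTEOMIC = [
    ("ex:protA", "rdf:type", "ex:Protein"),
    ("ex:protA", "ex:observedInSample", "ex:sample1"),
    ("ex:protB", "ex:observedInSample", "ex:sample2"),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None.

    Terms starting with '?' are variables; all other terms must match exactly.
    """
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if term in new and new[term] != value:
                return None  # variable already bound to a different value
            new[term] = value
        elif term != value:
            return None
    return new


def query(graph, patterns):
    """Return every variable binding that satisfies all patterns (a simple join)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# The two sources are not restructured; the query simply runs over both.
graph = GENOMIC + PROTEOMIC
# Which genes encode a protein that was observed in sample1?
results = query(graph, [("?gene", "ex:encodes", "?protein"),
                        ("?protein", "ex:observedInSample", "ex:sample1")])
print(results)  # [{'?gene': 'ex:geneA', '?protein': 'ex:protA'}]
```

In SPARQL the same question would be a two-pattern basic graph pattern; the point of the sketch is that neither data set had to be reshaped to answer it, only linked through the shared protein identifier.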


Integration of available taxonomic hierarchies from online databases to facilitate easy information retrieval from TreeBASE.

The problem my PhD studies addressed was the use of different hierarchies in information retrieval. Differing opinions on taxonomic placement have meant that different taxonomic resources, such as NCBI and ITIS, use different taxonomic hierarchies. Hierarchical search is a very useful way to retrieve data, for example all Insecta data or all data in the genus Drosophila, but in practice it is difficult because the taxon Insecta can contain a very different set of data in one taxonomic resource than in another. Enabling hierarchical queries in systems such as TreeBASE is therefore a challenge that requires delivering and maintaining taxonomic hierarchies that encompass current taxonomic opinion.

TCl-Db is a data warehouse of taxonomic names and classification hierarchies that can be layered over systems such as TreeBASE to enable hierarchical queries.

TCl-DB TreeBASE wrapper can be found here.
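The retrieval problem can be made concrete with a small sketch under stated assumptions. Two hypothetical classifications, labelled `A` and `B` (standing in for resources such as NCBI and ITIS, but not reproducing their actual placements), disagree about whether Collembola sits inside Insecta, so a hierarchical query for Insecta returns different studies depending on which hierarchy expands it. The study records and the function names `descendants` and `hierarchical_query` are invented for illustration; this is not the TCl-Db implementation.

```python
# Hypothetical parent -> children edges from two classifications.
# These are illustrative placements, not actual NCBI or ITIS data.
SOURCE_A = {
    "Hexapoda": ["Insecta"],
    "Insecta": ["Diptera", "Collembola"],
    "Diptera": ["Drosophila"],
}
SOURCE_B = {
    "Hexapoda": ["Insecta", "Collembola"],
    "Insecta": ["Diptera"],
    "Diptera": ["Drosophila"],
}

# Hypothetical study records, each tagged with a single taxon name.
STUDIES = [("S1", "Drosophila"), ("S2", "Collembola"), ("S3", "Homo")]


def descendants(tree, taxon):
    """Return `taxon` plus every taxon below it in one hierarchy (iterative DFS)."""
    seen, stack = set(), [taxon]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(tree.get(current, []))
    return seen


def hierarchical_query(taxon, hierarchies, studies):
    """For each hierarchy, list the studies whose taxon falls under `taxon`."""
    results = {}
    for source, tree in hierarchies.items():
        names = descendants(tree, taxon)
        results[source] = sorted(sid for sid, name in studies if name in names)
    return results


print(hierarchical_query("Insecta", {"A": SOURCE_A, "B": SOURCE_B}, STUDIES))
# {'A': ['S1', 'S2'], 'B': ['S1']}
```

Returning one result list per hierarchy, rather than silently choosing one, keeps the taxonomic disagreement visible to the user; taking the union of the lists is another reasonable design when recall matters more than precision.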


E-MAIL: anwarn @ mskcc . org
E-MAIL: anwar @ cbio . mskcc . org
SKYPE: anwarnadia


  • Francisella tularensis novicida proteomic and transcriptomic data integration and annotation based on semantic web technologies
    BMC Bioinformatics 2009, 10(Suppl 10):S3 (1 October 2009)
    Nadia Anwar and Ela Hunt

  • Improved data retrieval from TreeBASE via taxonomic and linguistic data enrichment
    BMC Evolutionary Biology 2009, 9:93 (8 May 2009)
    Nadia Anwar and Ela Hunt

  • Semantic Data Integration for Francisella tularensis novicida Proteomic and Genomic Data
    Semantic Web Applications and Tools for Life Sciences (SWAT4LS), November 2008, Edinburgh, Scotland.
    Nadia Anwar, Ela Hunt, Walter Kolch and Andrew Pitt

  • Taxonomic Support in Systematics. Doctoral thesis, University of Glasgow (2008).
    N. Anwar

  • Taxonomy database as an enabling technology for the Tree of Life
    Workshop on Database Issues in Biological Databases (DBiBD), January 2005, Edinburgh, Scotland.
    N. Anwar
    DBiBD Proceedings

  • 12th International Conference on Intelligent Systems for Molecular Biology &
    3rd European Conference on Computational Biology (ISMB/ECCB2004)
    August 2004, Glasgow, Scotland.
    Poster - Taxonomy, Biology's first ontology, and the Tree of Life, Biology's grandest endeavour.
    N. Anwar

One table or two?
I have been figuring out the RDF capability in Oracle and I'm now stuck. I created a table, rdf_dev, then created a model (as per the manual) and loaded one data set. I am stuck on what to do with the second data set. Do I create a new table (this seems nonsensical, since it will have the same structure and be in the same model), or do I create a separate table for each data set in the model? I have a feeling that one table for one model is not going to scale very well. But how is querying across these tables going to work? Do the queries run over the model or over the tables? Also, how do the inference rules run: across the model or across the table? More reading...