The physical, functional, and LD annotation served by the SCAN database either comes directly from public resources, including HapMap (release 23a) and NCBI (dbSNP 129), or was created by us from data downloaded from these resources. For example, the multi-locus measure of disequilibrium, MD, which is used to summarize some of the reported LD relationships among SNPs and to characterize coverage of genes by product, was calculated using TUNA (Nicolae, 2006) with data from HapMap. Information on the relationship between SNPs and expression transcript levels comes from a series of publications describing studies that characterized eQTLs in cell lines from HapMap CEU and YRI samples, for which transcript levels had been assayed using the Affymetrix Human Exon 1.0 ST Array. All papers relevant to the data served on SCAN are available here. We expect to add eQTL information from other studies in which expression has been assayed in additional human tissues, and we will include information from the relevant papers as results from these studies are migrated into SCAN.
We are making annotation files available for several of the commonly used high-throughput genotyping products; they are described, and will be downloadable, here. At this time we are not creating custom annotation files, but as new annotation files are created they will be described and made available for download at this same site.
SCAN will be maintained and kept up to date by the Pharmacogenetics of Anticancer Agents Research Group based at the University of Chicago, which is funded by NIHGMS grant U0161393. Related data (e.g., eQTL data in other tissues) will be incorporated into SCAN when available.
To encourage the development of bioinformatics applications that use or integrate the data served in SCAN, we are developing an application programming interface (API) based on the Simple Object Access Protocol (SOAP). SOAP enables the exchange of structured information, encoded in the ubiquitous Extensible Markup Language (XML), with other databases and applications. This architecture allows application developers to use real-time data from SCAN, namely expression quantitative trait loci (eQTLs) and linkage disequilibrium (multi-locus or pairwise), without having to build their own gene expression or LD calculation engine. Initially, we will expose as methods the queries currently available on the SCAN website. In particular, the following methods will be available:
For each query, you can select from the following formats: HTML, comma-delimited, or tab-delimited.
Take the tab-delimited output file as an example. Fields are, as expected, separated by tabs. The expression data, which can contain multiple entries, are colon-delimited within their field. The comma-delimited file follows the same layout, with commas in place of tabs.
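As a rough illustration of the two-level delimiting described above, here is a minimal Python sketch that splits each row on the field delimiter and then splits one field on colons. The function name, the choice of the last column as the expression field, and the sample rows are all assumptions made for illustration; they are not taken from actual SCAN output, so check the column layout of your own query results before relying on it.

```python
import csv
import io


def parse_scan_output(text, delimiter="\t", expression_field=-1):
    """Parse delimited query output into rows of fields.

    Assumption (hypothetical layout): the field at index
    `expression_field` holds the colon-delimited expression data.
    That field is returned as a list; all others stay as strings.
    """
    rows = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        # Second-level split: one field may carry several entries.
        fields[expression_field] = fields[expression_field].split(":")
        rows.append(fields)
    return rows


# Made-up sample rows, NOT real SCAN output.
sample = "rsA\tGENE1\texprX:exprY\nrsB\tGENE2\texprZ\n"

for row in parse_scan_output(sample):
    print(row)
```

For comma-delimited output, the same function can be called with `delimiter=","`.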
See Download page.
The results of the expression QTL and methylation QTL analyses are available through SCAN queries. The expression data are available for download at http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE9703; the methylation data can be downloaded from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE39672.