Newest Data
Arrivals
Last update: February 19, 2018
Recently added and updated files
    Recent research in C. elegans and the rodent has identified correlations between gene expression and connectivity. Here we extend this type of approach to examine complex patterns of gene expression in the rodent brain in the context of regional brain connectivity and differences in cellular populations. Using multiple large-scale data sets obtained from public sources, we identified two novel patterns of mouse brain gene expression showing a strong degree of anti-correlation, and relate this to multiple data modalities including macroscale connectivity. We found that these signatures are associated with differences in expression of neuronal and oligodendrocyte markers, suggesting they reflect regional differences in cellular populations. We also find that the expression level of these genes is correlated with connectivity degree, with regions expressing the neuron-enriched pattern having more incoming and outgoing connections with other regions. Our results exemplify what is possible when increasingly detailed large-scale cell- and gene-level data sets are integrated with connectivity data.

    Published on: 16 February 2018

    Permanent URL: http://hdl.handle.net/11272/10549

    MOTIVATION: We examine the effect of replication on the detection of apparently differentially expressed genes in gene expression microarray experiments. Our analysis is based on a random sampling approach using real data sets from 16 published studies. We consider both the ability to find genes that meet particular statistical criteria as well as the stability of the results in the face of changing levels of replication. RESULTS: While dependent on the data source, our findings suggest that stable results are typically not obtained until at least five biological replicates have been used. Conversely, for most studies, 10-15 replicates yield results that are quite stable, and there is less improvement in stability as the number of replicates is further increased. Our methods will be of use in evaluating existing data sets and in helping to design new studies.

    Published on: 16 February 2018

    Permanent URL: http://hdl.handle.net/11272/10564
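
    The random sampling idea in the entry above can be illustrated with a small sketch. It is not the authors' code: the choice of Welch's t statistic, the top-n gene list, the Jaccard overlap as a stability measure, and all function names are assumptions made for illustration.

```python
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for two lists of replicate measurements."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se if se > 0 else 0.0


def top_genes(data, k, n_top, rng):
    """Subsample k replicates per condition and return the n_top genes by |t|.

    `data` maps gene name -> (control_values, treated_values); every gene is
    assumed to have the same number of replicates in each condition.
    """
    first_ctrl, first_trt = next(iter(data.values()))
    # The same replicate indices are used for every gene, as if whole arrays
    # (not individual measurements) had been dropped from the experiment.
    ctrl_idx = rng.sample(range(len(first_ctrl)), k)
    trt_idx = rng.sample(range(len(first_trt)), k)
    scores = {}
    for gene, (ctrl, trt) in data.items():
        scores[gene] = abs(
            welch_t([ctrl[i] for i in ctrl_idx], [trt[i] for i in trt_idx])
        )
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_top])


def stability(data, k, n_top, n_trials=50, seed=0):
    """Mean Jaccard overlap between gene lists from pairs of random subsamples."""
    rng = random.Random(seed)
    overlaps = []
    for _ in range(n_trials):
        a = top_genes(data, k, n_top, rng)
        b = top_genes(data, k, n_top, rng)
        overlaps.append(len(a & b) / len(a | b))
    return statistics.mean(overlaps)
```

    Calling `stability` for increasing values of `k` gives a rough picture of how many replicates are needed before the selected gene list stops changing.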

    Expression profiling of post-mortem human brain tissue has been widely used to study molecular changes associated with neuropsychiatric diseases as well as normal processes such as aging. Changes in expression associated with factors such as age, gender or postmortem interval are often more pronounced than changes associated with disease. Therefore, in addition to being of interest in their own right, careful consideration of these effects is important in the interpretation of disease studies. We performed a large meta-analysis of genome-wide expression studies of normal human cortex to more fully catalogue the effects of age, gender, postmortem interval and brain pH, yielding a "meta-signature" of gene expression changes for each factor. We validated our results by showing a significant overlap with independent gene lists extracted from the literature. Importantly, meta-analysis identifies genes which are not significant in any individual study. Finally, we show that many schizophrenia candidate genes appear in the meta-signatures, reinforcing the idea that studies must be carefully controlled for interactions between these factors and disease. In addition to the inherent value of the meta-signatures, our results provide critical information for future studies of disease effects in the human brain.

    Published on: 14 February 2018

    Permanent URL: http://hdl.handle.net/11272/10572

    An important goal in neuroscience is to understand gene expression patterns in the brain. The recent availability of comprehensive and detailed expression atlases for mouse and human creates opportunities to discover global patterns and perform cross-species comparisons. Recently we reported that the major source of variation in gene transcript expression in the adult normal mouse brain can be parsimoniously explained as reflecting regional variation in glia to neuron ratios, and is correlated with degree of connectivity and location in the brain along the anterior-posterior axis. Here we extend this investigation to two gene expression assays of adult normal human brains that consisted of over 300 brain region samples, and compare brain-wide expression patterns with those of the mouse. We performed principal components analysis (PCA) on the regional gene expression of the adult human brain to identify the expression pattern that has the largest variance. As in the mouse, we observed that the first principal component is composed of two anti-correlated patterns enriched in oligodendrocyte and neuron markers respectively. However, we also observed interesting discordant patterns between the two species. For example, a few mouse neuron markers show expression patterns that are more correlated with the human oligodendrocyte-enriched pattern and vice versa. In conclusion, our work provides insights into human brain function and evolution by probing global relationships between regional cell type marker expression patterns in the human and mouse brain.

    Published on: 14 February 2018

    Permanent URL: http://hdl.handle.net/11272/10578
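
    As a rough illustration of the PCA step described in the entry above, the sketch below extracts the first principal component of a small regions-by-genes matrix using power iteration on the covariance matrix. The function name, the toy input layout, and the use of power iteration are assumptions for illustration; the abstract does not say how the PCA was computed. Genes with loadings of opposite sign on the first component belong to anti-correlated patterns.

```python
def first_principal_component(rows, n_iter=200):
    """Return (loadings, scores) for PC1 of a samples-by-features matrix.

    `rows` is a list of equal-length lists: one row per brain region sample,
    one column per gene. Loadings are per gene, scores are per sample.
    """
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]

    # Sample covariance matrix between genes (p x p).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]

    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalize. A non-uniform start vector makes it unlikely that we begin
    # exactly orthogonal to the leading eigenvector.
    v = [1.0 / (j + 1) for j in range(p)]
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break  # no variance in the data; keep the current vector
        v = [x / norm for x in w]

    scores = [sum(c[j] * v[j] for j in range(p)) for c in centered]
    return v, scores
```

    For real atlases with thousands of genes, a library SVD routine would be the practical choice; the pure-Python version here only makes the computation explicit.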

    The ability to computationally extract mentions of neuroanatomical regions from the literature would assist linking to other entities within and outside of an article. Examples include extracting reports of connectivity or region-specific gene expression. To facilitate text mining of neuroscience literature we have created a corpus of manually annotated brain region mentions. The corpus contains 1,377 abstracts with 18,242 brain region annotations. Interannotator agreement was evaluated for a subset of the documents, and was 90.7% and 96.7% for strict and lenient matching respectively. We observed a large vocabulary of over 6,000 unique brain region terms and 17,000 words. For automatic extraction of brain region mentions we evaluated simple dictionary methods and complex natural language processing techniques. The dictionary methods based on neuroanatomical lexicons recalled 36% of the mentions with 57% precision. The best performance was achieved using a conditional random field (CRF) with a rich feature set. Features were based on morphological, lexical, syntactic and contextual information. The CRF recalled 76% of mentions at 81% precision; when partial matches are counted, recall and precision increase to 86% and 92% respectively. We suspect a large amount of error is due to coordinating conjunctions, previously unseen words and brain regions of less commonly studied organisms. We found context windows, lemmatization and abbreviation expansion to be the most informative techniques. The corpus is freely available at http://www.chibi.ubc.ca/WhiteText/.

    Published on: 14 February 2018

    Permanent URL: http://hdl.handle.net/11272/10574
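
    The difference between strict and lenient (partial) matching in the entry above can be sketched as follows. The span representation as `(start, end)` character offsets with an exclusive end, and the rule that any overlap counts as a lenient match, are assumptions for illustration; the corpus paper may define partial matches differently.

```python
def spans_overlap(a, b):
    """True if two (start, end) spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def precision_recall(gold, predicted, lenient=False):
    """Precision and recall of predicted mention spans against gold spans.

    Strict matching requires identical boundaries; lenient matching accepts
    any overlap between a predicted span and a gold span.
    """
    if lenient:
        match = spans_overlap
    else:
        def match(a, b):
            return a == b

    # Predicted spans that hit some gold span (for precision) and gold spans
    # that were hit by some prediction (for recall) are counted separately.
    matched_pred = sum(any(match(p, g) for g in gold) for p in predicted)
    matched_gold = sum(any(match(g, p) for p in predicted) for g in gold)

    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    return precision, recall
```

    A prediction that clips one word off a multi-word region name fails the strict test but passes the lenient one, which is why both figures rise when partial matches are counted.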

    BACKGROUND: Differential coexpression is a change in coexpression between genes that may reflect 'rewiring' of transcriptional networks. It has previously been hypothesized that such changes might be occurring over time in the lifespan of an organism. While both coexpression and differential expression of genes have been previously studied in life stage change or aging, differential coexpression has not. Generalizing differential coexpression analysis to many time points presents a methodological challenge. Here we introduce a method for analyzing changes in coexpression across multiple ordered groups (e.g., over time) and extensively test its validity and usefulness. RESULTS: Our method is based on the use of the Haar basis set to efficiently represent changes in coexpression at multiple time scales, and thus represents a principled and generalizable extension of the idea of differential coexpression to life stage data. We used published microarray studies categorized by age to test the methodology. We validated the methodology by testing our ability to reconstruct Gene Ontology (GO) categories using our measure of differential coexpression and compared this result to using coexpression alone. Our method allows significant improvement in characterizing these groups of genes. Further, we examine the statistical properties of our measure of differential coexpression and establish that the results are significant both statistically and by an improvement in semantic similarity. In addition, we found that our method finds more significant changes in gene relationships compared to several other methods of expressing temporal relationships between genes, such as coexpression over time. CONCLUSION: Differential coexpression over age generates significant and biologically relevant information about the genes producing it. Our Haar basis methodology for determining age-related differential coexpression performs better than other tested methods. The Haar basis set also lends itself to ready interpretation in terms of both evolutionary and physiological mechanisms of aging and can be seen as a natural generalization of two-category differential coexpression. CONTACT: paul@bioinformatics.ubc.ca.

    Published on: 14 February 2018

    Permanent URL: http://hdl.handle.net/11272/10573
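
    To make the Haar basis idea in the entry above concrete, the sketch below computes the correlation of a gene pair within each ordered age group and then decomposes that sequence of correlations into Haar coefficients. This is an illustrative reading of the abstract, not the published method: the unnormalized averaging form of the transform, the requirement that the number of groups be a power of two, and all function names are assumptions.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length lists of expression values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def coexpression_profile(gene_x_by_group, gene_y_by_group):
    """Correlation of two genes within each ordered group (e.g. age bin)."""
    return [pearson(x, y) for x, y in zip(gene_x_by_group, gene_y_by_group)]


def haar_coefficients(values):
    """Unnormalized Haar decomposition of a sequence.

    Returns [overall mean, coarsest difference, ..., finest differences].
    The coarsest difference contrasts the first half of the sequence with the
    second half; finer coefficients capture changes at shorter time scales.
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("number of groups must be a power of two")
    current = list(values)
    details = []
    while len(current) > 1:
        pairs = list(zip(current[0::2], current[1::2]))
        # Coarser levels are prepended so the output runs coarse to fine.
        details = [(a - b) / 2 for a, b in pairs] + details
        current = [(a + b) / 2 for a, b in pairs]
    return current + details
```

    With only two groups the transform reduces to a mean and a single difference, which corresponds to the two-category differential coexpression that the abstract describes the Haar approach as generalizing.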

    MOTIVATION: The Gene Ontology (GO) is heavily used in systems biology, but the potential for redundancy, confounds with other data sources and problems with stability over time have been little explored. RESULTS: We report that GO annotations are stable over short periods, with 3% of genes not being most semantically similar to themselves between monthly GO editions. However, we find that genes can alter their 'functional identity' over time, with 20% of genes not matching to themselves (by semantic similarity) after 2 years. We further find that annotation bias in GO, in which some genes are more characterized than others, has declined in yeast, but generally increased in humans. Finally, we discovered that many entries in protein interaction databases are owing to the same published reports that are used for GO annotations, with 66% of assessed GO groups exhibiting this confound. We provide a case study to illustrate how this information can be used in analyses of gene sets and networks. AVAILABILITY: Data available at http://chibi.ubc.ca/assessGO.

    Published on: 14 February 2018

    Permanent URL: http://hdl.handle.net/11272/10571
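
    The "not most semantically similar to themselves" measure in the entry above can be sketched with a simple stand-in. The abstract does not name the semantic similarity measure used, so the Jaccard index over annotation sets, the dictionary layout, and the function names below are all assumptions for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of annotation term IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def fraction_not_self_matching(old, new):
    """Fraction of genes whose old annotations best match a different gene.

    `old` and `new` map gene -> set of term IDs for two annotation editions.
    A gene counts as failing to match itself when some other gene's new
    annotation set is strictly more similar to its old set than its own new
    set is. Ties are counted as a self-match.
    """
    shared = [g for g in old if g in new]
    misses = 0
    for gene in shared:
        own = jaccard(old[gene], new[gene])
        best_other = max(
            (jaccard(old[gene], new[other]) for other in new if other != gene),
            default=0.0,
        )
        if best_other > own:
            misses += 1
    return misses / len(shared) if shared else 0.0
```

    Running this over pairs of editions separated by increasing time spans would trace how quickly genes drift away from their earlier annotation profile.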

    We describe the WhiteText project, and its progress towards automatically extracting statements of neuroanatomical connectivity from text. We review progress to date on the three main steps of the project: recognition of brain region mentions, standardization of brain region mentions to neuroanatomical nomenclature, and connectivity statement extraction. We further describe a new version of our manually curated corpus that adds 2,111 connectivity statements from 1,828 additional abstracts. Cross-validation classification within the new corpus replicates results on our original corpus, recalling 67% of connectivity statements at 51% precision. The resulting merged corpus provides 5,208 connectivity statements that can be used to seed species-specific connectivity matrices and to better train automated techniques. Finally, we present a new web application that allows fast interactive browsing of the over 70,000 sentences indexed by the system, as a tool for accessing the data and assisting in further curation. Software and data are freely available at http://www.chibi.ubc.ca/WhiteText/.

    Published on: 14 February 2018

    Permanent URL: http://hdl.handle.net/11272/10570

    BACKGROUND: We performed a statistical analysis of a previously published set of gene expression microarray data from six different brain regions in two mouse strains. In the previous analysis, 24 genes showing expression differences between the strains and about 240 genes with regional differences in expression were identified. Like many gene expression studies, that analysis relied primarily on ad hoc 'fold change' and 'absent/present' criteria to select genes. To determine whether statistically motivated methods would give a more sensitive and selective analysis of gene expression patterns in the brain, we decided to use analysis of variance (ANOVA) and feature selection methods designed to select genes showing strain- or region-dependent patterns of expression. RESULTS: Our analysis revealed many additional genes that might be involved in behavioral differences between the two mouse strains and functional differences between the six brain regions. Using conservative statistical criteria, we identified at least 63 genes showing strain variation and approximately 600 genes showing regional variation. Unlike ad hoc methods, ours have the additional benefit of ranking the genes by statistical score, permitting further analysis to focus on the most significant. Comparison of our results to the previous studies and to published reports on individual genes shows that we achieved high sensitivity while preserving selectivity. CONCLUSIONS: Our results indicate that molecular differences between the strains and regions studied are larger than indicated previously. We conclude that for large complex datasets, ANOVA and feature selection, alone or in combination, are more powerful than methods based on fold-change thresholds and other ad hoc selection criteria.

    Published on: 14 February 2018

    Permanent URL: http://hdl.handle.net/11272/10568
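
    The ranking of genes by statistical score mentioned in the entry above can be illustrated with a one-way ANOVA F statistic computed per gene. This is a simplified sketch: the study involved two factors (strain and region), whereas the code below handles a single factor, and the function names and input layout are assumptions.

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over lists of measurements.

    `groups` holds one list of expression values per factor level
    (for example, one list per brain region).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation of group means around the grand mean, and of values around
    # their own group mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    if ms_within == 0.0:
        # No within-group variation: undefined when the group means also
        # agree (reported as 0.0), infinite otherwise.
        return float("inf") if ms_between > 0 else 0.0
    return ms_between / ms_within


def rank_genes(expression):
    """Sort gene names by decreasing F statistic.

    `expression` maps gene name -> list of per-group value lists.
    """
    scores = {gene: one_way_anova_f(groups) for gene, groups in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

    Unlike a fold-change threshold, the F statistic accounts for within-group variability, so a modest but consistent difference can outrank a large but noisy one.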

    Methods are presented for detecting differential expression using statistical hypothesis testing methods including analysis of variance (ANOVA). Practicalities of experimental design, power, and sample size are discussed. Methods for multiple testing correction and their application are described. Instructions for running typical analyses are given in the R programming environment. R code and the sample data set used to generate the examples are available at http://microarray.cpmc.columbia.edu/pavlidis/pub/aovmethods/.

    Published on: 14 February 2018

    Permanent URL: http://hdl.handle.net/11272/10566
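
    As an illustration of multiple testing correction, the sketch below computes Benjamini-Hochberg adjusted p-values. The abstract does not say which correction methods the chapter covers, and its own examples are in R, so both the choice of this procedure and the Python function below are assumptions for illustration.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    Each p-value is scaled by m / rank (rank 1 = smallest), then a running
    minimum taken from the largest p-value downward enforces monotonicity.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

    Genes whose adjusted p-value falls below a chosen false discovery rate, such as 0.05, would then be reported as differentially expressed.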