Newest Data Arrivals
Last update: May 8, 2015
Recently added and updated files
    In 2016, the Canadian federal funding agencies introduced the Tri-Agency Statement of Principles on Digital Data Management, which advocates developing data management plans (DMPs) and making data available for future research. A data management plan addresses research data types and formats, metadata standards, ethics and legal compliance, data storage and reuse, assignment of data management responsibilities, and resource requirements. Anticipating that DMPs will increasingly be required in grant applications, librarians at the University of British Columbia (UBC) surveyed researchers about their research data management (RDM) practices and needs in three phases, each targeting different disciplines: 1) the Sciences and Engineering (fall 2015), 2) the Social Sciences and Humanities (fall 2016), and 3) the Health Sciences (spring 2017). The surveys illuminate disciplinary differences in RDM and will inform the University's development of infrastructure and services to support researchers in RDM. This report describes findings from the third survey, which targeted UBC researchers in the Health Sciences.

    Published on: 08 June 2017

    Permanent URL: http://hdl.handle.net/11272/10491

    First published in 1867, the Canada Year Book (CYB) charts key trends and indicators in the nation's economy, population, society and environment.

    Published on: 06 June 2017

    Permanent URL: http://hdl.handle.net/11272/MLNYP

    The Discharge Abstract Database (DAD) contains data on separations from acute inpatient institutions and selected day surgery, chronic, rehabilitation and psychiatric institutions. Data is collected on separations with a discharge date between April 1 and March 31 of the given fiscal year. Note that this 10% sample does not include Quebec or British Columbia.

    Published on: 06 June 2017

    Permanent URL: http://hdl.handle.net/11272/10498

    Orthorectified aerial imagery of the UBC Okanagan campus, Kelowna, BC, 2013. Ortho pixel size: 10 cm.

    Published on: 02 June 2017

    Permanent URL: http://hdl.handle.net/11272/PAOHL

    Introduction

    Multi-Language Conversational Telephone Speech 2011 -- Turkish was developed by the Linguistic Data Consortium (LDC) and is comprised of approximately 18 hours of telephone speech in Turkish. The data were collected primarily to support research and technology evaluation in automatic language identification, and portions of these telephone calls were used in the NIST 2011 Language Recognition Evaluation (LRE). LRE 2011 focused on language pair discrimination for 24 languages/dialects, some of which could be considered mutually intelligible or closely related. LDC has released the following as part of the Multi-Language Conversational Telephone Speech 2011 series:

        Slavic Group (LDC2016S11)
        Turkish (LDC2017S09)

    Data

    Participants were recruited by native speakers who contacted acquaintances in their social network. Those native speakers made one call, up to 15 minutes, to each acquaintance. The data was collected using LDC's telephone collection infrastructure, comprised of three computer telephony systems. Human auditors labeled calls for callee gender, dialect type and noise. Demographic information about the participants was not collected.

    All audio data are presented in FLAC-compressed MS-WAV (RIFF) file format (*.flac); when uncompressed, each file is 2 channels, recorded at 8000 samples/second with samples stored as 16-bit signed integers, representing a lossless conversion from the original mu-law sample data as captured digitally from the public telephone network.

    The following table summarizes the total number of calls, total number of hours of recorded audio, and the total size of compressed data:

        group    lng  #calls  #hours  #MB
        turkish  tur  87      18.6    975

    Published on: 18 May 2017

    Permanent URL: http://hdl.handle.net/11272/ACOKK
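
    The audio parameters quoted for the Turkish telephone speech corpus above (2 channels, 8000 samples/second, 16-bit samples, 18.6 hours, 975 MB compressed) are enough to estimate the uncompressed size of the collection. The following is a rough sketch only: it assumes that "MB" means 10^6 bytes and it ignores WAV header overhead, neither of which the description states.

```python
# Figures below are taken from the corpus description.
SAMPLE_RATE_HZ = 8000      # samples per second, per channel
CHANNELS = 2               # each file is 2 channels
BYTES_PER_SAMPLE = 2       # 16-bit signed integers
TOTAL_HOURS = 18.6         # total recorded audio
COMPRESSED_MB = 975        # total size of the FLAC-compressed data

# Assumption (not stated in the description): 1 MB = 1,000,000 bytes.
BYTES_PER_MB = 1_000_000


def uncompressed_bytes(hours: float) -> float:
    """Estimate raw PCM size for the given duration, ignoring file headers."""
    seconds = hours * 3600
    return seconds * SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE


raw_mb = uncompressed_bytes(TOTAL_HOURS) / BYTES_PER_MB
ratio = COMPRESSED_MB / raw_mb

print(f"Estimated uncompressed size: {raw_mb:.0f} MB")
print(f"Compressed / uncompressed:   {ratio:.2f}")
```

    Under these assumptions the audio comes to roughly 2.1 GB uncompressed, so the 975 MB release is a little under half the raw size.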

    Introduction

    Phrase Detectives Corpus was developed by the School of Computer Science and Electronic Engineering at the University of Essex and consists of approximately 19,012 words across 40 documents anaphorically-annotated by the Phrase Detectives game, an online interactive "game-with-a-purpose" (GWAP) designed to collect data about English anaphoric coreference. GWAPs for creating language resources are growing. In general, they employ non-monetary incentives, such as entertainment, to motivate participation and can be successful for large-scale persistent annotation efforts.

    Data

    The documents in the corpus are taken from Wikipedia articles and from narrative text in Project Gutenberg. Wikipedia articles and annotation files are presented as XML and Project Gutenberg source files are presented as plain text. All text is encoded as UTF-8. Annotations are comprised of a gold standard version created by multiple experts, as well as a set created by a large non-expert crowd (via the Phrase Detectives game). The data was annotated according to a prevalent linguistically-oriented approach for anaphora used in several tasks, including OntoNotes Release 5.0 (LDC2013T19), SemEval-2010 Task 1 OntoNotes English: Coreference Resolution in Multiple Languages (LDC2011T01) and The ARRAU Corpus of Anaphoric Information (LDC2013T22).

    Published on: 18 May 2017

    Permanent URL: http://hdl.handle.net/11272/QCTAL

    Introduction

    The EventStatus Corpus was developed by researchers at Texas A&M University, Stanford University and The University of Utah. It consists of approximately 3,000 English and 1,500 Spanish news articles about civil unrest events annotated with temporal tags. This corpus was designed to support the study of the temporal and aspectual properties of major events, that is, whether an event has already happened, is currently happening or may happen in the future. Since it focuses on a single domain (civil unrest events), it may be appropriate for tasks such as event extraction and temporal question answering.

    Data

    The relevant news articles were sourced from English Gigaword Fifth Edition (LDC2011T07) and Spanish Gigaword Third Edition (LDC2011T12). The civil unrest events include protests, demonstrations, marches and strikes. The data was annotated as PAST, ON-GOING or FUTURE and, within each of those categories, as PLANNED, ALERT or POSSIBLE. In addition to the annotated articles, the file lists used in experiments for tuning and testing are included. Ten-fold cross-validation was performed, and the specific 10-fold splits of the test set are included as well. All text is presented as plain text and encoded in UTF-8.

    Published on: 18 May 2017

    Permanent URL: http://hdl.handle.net/11272/UICPL
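
    The EventStatus entry above mentions 10-fold cross-validation. The corpus ships its own fold lists, and those should be used to reproduce the original experiments. The sketch below only illustrates, in general terms, how a 10-fold partition of a file list works; the file names, the fold-assignment scheme and the seed are all hypothetical and are not taken from the corpus.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def k_fold_splits(
    items: Sequence[str], k: int = 10, seed: int = 0
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (train, test) pairs; each item lands in exactly one test fold."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatable folds
    # Deal the shuffled items into k folds, round-robin.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


# Hypothetical file list, for illustration only.
articles = [f"article_{n:04d}.txt" for n in range(95)]
splits = list(k_fold_splits(articles, k=10))

for fold_no, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_no}: {len(train)} train, {len(test)} test")
```

    Each article is held out exactly once across the ten folds, so every article contributes to the test results once and to training nine times.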

    CanMap® Address Points are unique and discrete representations of civic address assignments across Canada. It is the ultimate in answering the question of “where” and an anchor for a single source of accuracy in your mission-critical data. When building your location intelligence solution, this component can represent the single most important geometry feature providing high precision to your application.

    Benefits:

        - Enhance your corporate location intelligence capabilities
        - Provide highest precision coordinates for geocoding
        - Optimize current automated workflows and reduce costly secondary manual processes
        - Provide clear identification of actual addresses within a defined zone of interest
        - Gain new competitiveness by powering your analytics with rooftop precision data features
        - Enhance your end-user experience by getting them curbside to their exact destination

    With urban and rural coverage across Canada, the point feature is the ultimate in high-confidence, high accuracy geographic representations demarcating the physical location of an address.

    Published on: 09 May 2017

    Permanent URL: http://hdl.handle.net/11272/10487

    CanMap® Address Points are unique and discrete representations of civic address assignments across Canada. It is the ultimate in answering the question of “where” and an anchor for a single source of accuracy in your mission-critical data. When building your location intelligence solution, this component can represent the single most important geometry feature providing high precision to your application.

    Benefits:

        - Enhance your corporate location intelligence capabilities
        - Provide highest precision coordinates for geocoding
        - Optimize current automated workflows and reduce costly secondary manual processes
        - Provide clear identification of actual addresses within a defined zone of interest
        - Gain new competitiveness by powering your analytics with rooftop precision data features
        - Enhance your end-user experience by getting them curbside to their exact destination

    With urban and rural coverage across Canada, the point feature is the ultimate in high-confidence, high accuracy geographic representations demarcating the physical location of an address.

    Published on: 09 May 2017

    Permanent URL: http://hdl.handle.net/11272/10486

    The Labour Force Survey (LFS) provides estimates of employment and unemployment which are among the most timely and important measures of performance of the Canadian economy. With the release of the survey results only 10 days after the completion of data collection, the LFS estimates are the first of the major monthly economic data series to be released. The Canadian Labour Force Survey was developed following the Second World War to satisfy a need for reliable and timely data on the labour market. Information was urgently required on the massive labour market changes involved in the transition from a war to a peace-time economy.

    The main objective of the LFS is to divide the working-age population into three mutually exclusive classifications - employed, unemployed, and not in the labour force - and to provide descriptive and explanatory data on each of these. LFS data are used to produce the well-known unemployment rate as well as other standard labour market indicators such as the employment rate and the participation rate. The LFS also provides employment estimates by industry, occupation, public and private sector, hours worked and much more, all cross-classifiable by a variety of demographic characteristics. Estimates are produced for Canada, the provinces, the territories and a large number of sub-provincial regions. For employees, wage rates, union status, job permanency and workplace size are also produced. For a full listing and description of LFS variables, see the Guide to the Labour Force Survey (71-543-G).

    These data are used by different levels of government for evaluation and planning of employment programs in Canada. Regional unemployment rates are used by Employment and Social Development Canada to determine eligibility, level and duration of insurance benefits for persons living within a particular employment insurance region. The data are also used by labour market analysts, economists, consultants, planners, forecasters and academics in both the private and public sector.

    Published on: 05 May 2017

    Permanent URL: http://hdl.handle.net/11272/10439
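
    The Labour Force Survey entry above names three mutually exclusive classifications and three headline indicators but does not spell out how they relate. The sketch below uses the commonly cited definitions (unemployment rate as a share of the labour force; employment and participation rates as shares of the working-age population); treat these as an assumption and consult the Guide to the Labour Force Survey for the authoritative wording. The class name and the counts are made up for illustration and are not LFS estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabourForceCounts:
    """Counts for the three mutually exclusive LFS classifications."""
    employed: float
    unemployed: float
    not_in_labour_force: float

    @property
    def labour_force(self) -> float:
        # The labour force is everyone employed or unemployed.
        return self.employed + self.unemployed

    @property
    def working_age_population(self) -> float:
        return self.labour_force + self.not_in_labour_force

    def unemployment_rate(self) -> float:
        """Unemployed as a percentage of the labour force."""
        return 100 * self.unemployed / self.labour_force

    def employment_rate(self) -> float:
        """Employed as a percentage of the working-age population."""
        return 100 * self.employed / self.working_age_population

    def participation_rate(self) -> float:
        """Labour force as a percentage of the working-age population."""
        return 100 * self.labour_force / self.working_age_population


# Illustrative counts in thousands of persons (not real survey data).
example = LabourForceCounts(employed=18_000, unemployed=1_200,
                            not_in_labour_force=10_800)

print(f"Unemployment rate:  {example.unemployment_rate():.2f}%")
print(f"Employment rate:    {example.employment_rate():.2f}%")
print(f"Participation rate: {example.participation_rate():.2f}%")
```

    Note that the unemployment rate uses a different denominator from the other two indicators, which is why the three figures do not sum to 100.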