Newest Data
Arrivals
Last update: September 24, 2018
Recently added and updated files
    The primary objective of the Canadian Income Survey (CIS) is to provide information on the income and income sources of Canadians, along with their individual and household characteristics. The data collected in the CIS are combined with Labour Force Survey (LFS, record number 3701) and tax data. The survey gathers information on labour market activity, school attendance, disability, support payments, child care expenses, inter-household transfers, personal income, and characteristics and costs of housing. This content is supplemented with information on individual and household characteristics (e.g. age, educational attainment, main job characteristics, family type), as well as geographic details (e.g. province, census metropolitan area (CMA)) from the LFS. Tax data for income and income sources are also combined with the survey data. Results from the survey are made available not only to various levels of government, but also to individuals and organizations. All levels of government can use CIS data to shape policies and programs related to the economic well-being of Canadians. Statistical organizations such as the Organisation for Economic Co-operation and Development (OECD) use the results for international benchmarking and comparison studies.

    Published on: 21 September 2018

    Permanent URL: http://hdl.handle.net/11272/10646

    The primary objective of the Canadian Income Survey (CIS) is to provide information on the income and income sources of Canadians, along with their individual and household characteristics. The data collected in the CIS are combined with Labour Force Survey (LFS, record number 3701) and tax data. The survey gathers information on labour market activity, school attendance, disability, support payments, child care expenses, inter-household transfers, personal income, and characteristics and costs of housing. This content is supplemented with information on individual and household characteristics (e.g. age, educational attainment, main job characteristics, family type), as well as geographic details (e.g. province, census metropolitan area (CMA)) from the LFS. Tax data for income and income sources are also combined with the survey data. Results from the survey are made available not only to various levels of government, but also to individuals and organizations. All levels of government can use CIS data to shape policies and programs related to the economic well-being of Canadians. Statistical organizations such as the Organisation for Economic Co-operation and Development (OECD) use the results for international benchmarking and comparison studies.

    Published on: 21 September 2018

    Permanent URL: http://hdl.handle.net/11272/10623

    The most complete and up-to-date Canadian postal code database for accurate customer profiling, direct marketing optimization, precise territory management, and serviceability insights. Postal Code Suite offers six-digit postal code polygons so you can visualize postal code boundaries on a map. Each zip package includes Multiple Enhanced Postal Codes, Forward Sortation Areas, Local Delivery Units and topographic layers for the province/territory, as well as Canada-wide layers for area codes, capital cities, provincial boundaries, municipal boundaries, topographic boundaries, time zones and water layers.

    Published on: 18 September 2018

    Permanent URL: http://hdl.handle.net/11272/10644

    The most complete and up-to-date Canadian postal code database for accurate customer profiling, direct marketing optimization, precise territory management, and serviceability insights. Postal Code Suite offers six-digit postal code polygons so you can visualize postal code boundaries on a map. This package comes as a Canada-wide Esri File Geodatabase, including Multiple Enhanced Postal Codes, Forward Sortation Areas, Local Delivery Units and topographic layers for the province/territory, as well as Canada-wide layers for area codes, capital cities, provincial boundaries, municipal boundaries, topographic boundaries, time zones and water layers.

    Published on: 18 September 2018

    Permanent URL: http://hdl.handle.net/11272/10645

    This series of cross-tabulations presents a portrait of Canada based on the various census topics. The tables range in complexity and are available for various levels of geography.

    Published on: 12 September 2018

    Permanent URL: http://hdl.handle.net/11272/10518

    Topic-based Tabulations are complex tables on related themes: Aboriginal Peoples, Age and Sex, Education, Ethnic Origin and Visible Minorities, Families and Households, Housing and Shelter Costs, Immigration and Citizenship, Income and Earnings, Labour, Language, Marital Status, Mobility and Migration, and Place of Work and Commuting to Work. The tables cover varying levels of geography: Canada, province and territory, federal electoral district (FED) (2003 Representation Order), census metropolitan area/census agglomeration (CMA/CA), census division/census subdivision (CD/CSD), census tract (CT), forward sortation area (FSA), and dissemination area (DA).

    Published on: 12 September 2018

    Permanent URL: http://hdl.handle.net/11272/TKDEW

    The Tuition and Living Accommodation Costs (TLAC) survey collects data for full-time students at Canadian degree-granting institutions that are publicly funded. This annual survey was developed to provide an overview of tuition and additional compulsory fees, and living accommodation costs for an academic year. The TLAC survey data are used to: provide stakeholders, the public and students with annual tuition costs and changes in tuition fees from the previous year; contribute to a better understanding of the costs to obtain a degree; contribute to education policy development; contribute to the Consumer Price Index; facilitate interprovincial comparisons; and facilitate comparisons between institutions. Reference period: Academic year (September 1 to April 30). Collection period: April through June.

    Published on: 11 September 2018

    Permanent URL: http://hdl.handle.net/11272/LN2IO

    LFS data are used to produce the well-known unemployment rate as well as other standard labour market indicators such as the employment rate and the participation rate. The LFS also provides employment estimates by industry, occupation, public and private sector, hours worked and much more, all cross-classifiable by a variety of demographic characteristics. Estimates are produced for Canada, the provinces, the territories and a large number of sub-provincial regions. For employees, data on wage rates, union status, job permanency and establishment size are also produced. These data are used by different levels of government for evaluation and planning of employment programs in Canada. Regional unemployment rates are used by Employment and Social Development Canada to determine eligibility, level and duration of insurance benefits for persons living within a particular employment insurance region. The data are also used by labour market analysts, economists, consultants, planners, forecasters and academics in both the private and public sector.

    Published on: 11 September 2018

    Permanent URL: http://hdl.handle.net/11272/10575
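
    The three headline indicators named in the LFS entry above follow standard definitions: the unemployment rate is the unemployed as a share of the labour force, while the employment rate and the participation rate are shares of the working-age population. A minimal Python sketch under those definitions; the function name and all counts are hypothetical and are not taken from any LFS file:

```python
def labour_market_rates(employed, unemployed, not_in_labour_force):
    """Return (unemployment, employment, participation) rates in percent.

    Inputs are person counts for the working-age population, split into
    employed, unemployed, and not in the labour force.
    """
    labour_force = employed + unemployed
    population = labour_force + not_in_labour_force
    if labour_force <= 0 or population <= 0:
        raise ValueError("counts must describe a non-empty population")
    unemployment_rate = 100.0 * unemployed / labour_force
    employment_rate = 100.0 * employed / population
    participation_rate = 100.0 * labour_force / population
    return unemployment_rate, employment_rate, participation_rate


# Hypothetical counts, in thousands of persons (illustration only)
u, e, p = labour_market_rates(employed=940, unemployed=60, not_in_labour_force=500)
print(f"unemployment {u:.1f}%, employment {e:.1f}%, participation {p:.1f}%")
```

    Note that the unemployment rate uses the labour force as its denominator, so it cannot be derived from the employment rate alone.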

    2011 NIST Language Recognition Evaluation Test Set contains selected training data and the evaluation test set for the 2011 NIST Language Recognition Evaluation. It consists of approximately 204 hours of conversational telephone speech and broadcast audio collected by the Linguistic Data Consortium (LDC) in the following 24 languages and dialects: Arabic (Iraqi), Arabic (Levantine), Arabic (Maghrebi), Arabic (Standard), Bengali, Czech, Dari, English (American), English (Indian), Farsi, Hindi, Lao, Mandarin, Punjabi, Pashto, Polish, Russian, Slovak, Spanish, Tamil, Thai, Turkish, Ukrainian and Urdu. The goal of the NIST (National Institute of Standards and Technology) Language Recognition Evaluation (LRE) is to establish the baseline of current performance capability for language recognition of conversational telephone speech and to lay the groundwork for further research efforts in the field. NIST conducted language recognition evaluations in 1996, 2003, 2005, 2007, and 2009. The 2011 evaluation emphasized the language pair condition and involved both conversational telephone speech (CTS) and broadcast narrow-band speech (BNBS). Further information regarding this evaluation can be found in the evaluation plan, which is also included in the documentation for this release. LDC released the prior LREs as: 2003 NIST Language Recognition Evaluation (LDC2006S31); 2005 NIST Language Recognition Evaluation (LDC2008S05); 2007 NIST Language Recognition Evaluation Test Set (LDC2009S04); 2007 NIST Language Recognition Evaluation Supplemental Training Set (LDC2009S05); and 2009 NIST Language Recognition Evaluation Test Set (LDC2014S06). This release includes training data for nine language varieties that had not been represented in prior LRE cycles – Arabic (Iraqi), Arabic (Levantine), Arabic (Maghrebi), Arabic (Standard), Czech, Lao, Punjabi, Polish and Slovak – contained in 893 audited segments of roughly 30 seconds duration and in 400 full-length CTS recordings. The evaluation test set comprises a total of 29,511 audio files, all manually audited at LDC for language and divided equally into three different test conditions according to the nominal amount of speech content per segment. Data was collected by LDC between 2009 and 2011. The CTS data was obtained using a “claque” collection model in which speakers (claques) called friends or relatives in their social network for a 10-minute conversation in the claque’s native language, such that each call would involve a unique callee. Participants were free to speak on topics of their own choosing. All calls were routed through a telephone collection system at LDC which stored the raw mu-law sample stream into separate audio files for each call side. Auditing and selection were applied to the callee side of every call and to the caller (claque) side in at most one call made by each claque. Contiguous regions containing between 25 and 35 seconds of speech were identified by signal analysis and extracted for manual audit. In some cases, shorter segments were also selected for audit. Broadcast audio was recorded via capture of satellite-receiver MPEG streams or analog audio receivers digitizing at 16 kHz. Platforms for data capture were located at LDC and in Tunisia and India. Recordings were analyzed to extract contiguous segments of narrow-band speech of at least 33 seconds duration; longer segments were trimmed to a maximum length of 35 seconds for audit. All audited segments for training and test are presented as 8 kHz, 16-bit PCM, single-channel audio files with NIST SPHERE headers. The full-length CTS data is the same, except that it consists of two channels.

    Published on: 11 September 2018

    Permanent URL: http://hdl.handle.net/11272/ZSKCP
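
    The audio format stated in the entry above (8 kHz, 16-bit PCM, one channel for audited segments and two for full-length CTS recordings) fixes the size of the sample data, which can help when planning storage. A rough Python sketch of that arithmetic; the function name is hypothetical, the SPHERE header is not counted, and the 30-second and 10-minute durations are the nominal values from the description rather than measured file lengths:

```python
def pcm_payload_bytes(seconds, sample_rate_hz=8000, bits_per_sample=16, channels=1):
    """Bytes of raw PCM samples for a recording, excluding any file header."""
    bytes_per_sample = bits_per_sample // 8
    return seconds * sample_rate_hz * channels * bytes_per_sample


# Nominal 30-second audited segment, single channel
segment_bytes = pcm_payload_bytes(30)

# Nominal 10-minute full-length CTS recording, two channels
full_call_bytes = pcm_payload_bytes(10 * 60, channels=2)

# The 29,511 test files are divided equally across three test conditions
files_per_condition, remainder = divmod(29511, 3)

print(segment_bytes, full_call_bytes, files_per_condition, remainder)
```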

    BOLT English SMS/Chat was developed by the Linguistic Data Consortium (LDC) and consists of naturally-occurring Short Message Service (SMS) and Chat (CHT) data collected through data donations and live collection involving native speakers of English. The corpus contains 18,429 conversations totaling 3,674,802 words across 375,967 messages. The BOLT (Broad Operational Language Translation) program developed machine translation and information retrieval for less formal genres, focusing particularly on user-generated content. LDC supported the BOLT program by collecting informal data sources – discussion forums, text messaging and chat – in Chinese, Egyptian Arabic and English. The collected data was translated and annotated for various tasks including word alignment, treebanking, propbanking and co-reference. The data in this release was collected using two methods: new collection via LDC’s collection platform and donation of SMS or chat archives from BOLT collection participants. All collected data was reviewed manually to exclude any messages or conversations that were not in the target language or that had sensitive content, such as personal identifying information (PII). All data is presented in UTF-8 XML.

    Published on: 28 August 2018

    Permanent URL: http://hdl.handle.net/11272/DPX6K
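
    The corpus totals quoted in the BOLT entry above give a quick sense of how short the messages are. A small Python sketch that derives the averages from those three published figures only; the variable and function names are illustrative, and nothing here reads the corpus files themselves:

```python
# Totals as stated in the corpus description
CONVERSATIONS = 18_429
MESSAGES = 375_967
WORDS = 3_674_802


def corpus_averages(conversations, messages, words):
    """Return (words per message, messages per conversation)."""
    if conversations <= 0 or messages <= 0:
        raise ValueError("conversation and message counts must be positive")
    return words / messages, messages / conversations


words_per_message, messages_per_conversation = corpus_averages(
    CONVERSATIONS, MESSAGES, WORDS
)
print(f"{words_per_message:.2f} words per message")
print(f"{messages_per_conversation:.2f} messages per conversation")
```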