Newest Data
Arrivals
Last update: January 20, 2019
Recently added and updated files
    This data collection includes the Supporting Information for the paper: Forest and pasture catchment dissolved organic matter in Southern Amazonia. It contains measurements of precipitation, temperature, oxidation-reduction potential, discharge, baseflow, dissolved organic carbon (DOC), DOC components, slope ratio, and fluorescence indices (FI, HIX, BIX).

    Published on: 15 January 2019

    Permanent URL: http://hdl.handle.net/11272/10707

    The Social Policy Simulation Database and Model (SPSD/M) is a tool designed to assist those interested in analyzing the financial interactions of governments and individuals in Canada. It can help one to assess the cost implications or income redistributive effects of changes in the personal taxation and cash transfer system. As the name implies, SPSD/M consists of two integrated parts: a database (SPSD), and a model (SPSM). The SPSD is a non-confidential, statistically representative database of individuals in their family context, with enough information on each individual to compute taxes paid to and cash transfers received from government. The SPSM is a static accounting model which processes each individual and family on the SPSD, calculates taxes and transfers using legislated or proposed programs and algorithms, and reports on the results. A sophisticated software environment gives the user a high degree of control over the inputs and outputs to the model and can allow the user to modify existing programs or test proposals for entirely new programs. The model comes with full documentation including an on-line help facility.
Users and Applications
The SPSD/M has been used in hundreds of sites across Canada. These sites have diverse research interests in the area of income tax-transfer and commodity tax systems in Canada as well as varied experience in micro-simulation. Our growing client base includes federal departments, provincial governments, universities, interest groups, corporate divisions, and private consultants.
The diverse applications of the SPSD/M can be seen in the following examples of studies and published research reports: Costing out proposals for amendments to the Income Tax Act affecting the tax treatment of seniors and the disabled; Estimating the fiscal viability of major personal tax reform options, including three flat tax scenarios; The comparison of low income (poverty) measures and their effect on the estimates of the number of poor; An Analysis of the Distributional Impact of the Goods and Services Tax; Married and Unmarried Couples: The Tax Question; Taxes and Transfers in Rural Canada; Equivalencies in Canadian Public Policy; When the Baby Boom Grows Old: Impact on Canada's Public Sector.
Some potential uses of the model are illustrated by the following list of questions which may be answered using the SPSM: How large an increase in the federal Child Tax Benefit could be financed by allocating an additional $500 million to the program? Which province would have the most advantageous tax structure for an individual with $45,000 earned income, 2 children and $15,000 of investment income? What is the after-tax value of the major federal child support programs on a per child basis, and how are these benefits distributed across family types and income groups? How many individuals otherwise paying no tax would have to pay tax under various minimum tax systems, and what would additional government revenues be? How much money would be needed to raise all low income families and persons to Statistics Canada's low income cut-offs in 2014? How much would average household "consumable" income rise if a province eliminated its gasoline taxes? How much would federal government revenue rise if the GST rate were increased?
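
The static accounting approach described above (apply tax and transfer rules to each weighted record, then add up the results) can be illustrated with a toy sketch. Everything in it is hypothetical and is not taken from the SPSD/M: the three records, the survey weights, the flat tax rule, the per-child benefit and the function names are invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Person:
    weight: float    # hypothetical survey weight: people this record represents
    income: float    # hypothetical annual income in dollars
    children: int    # number of dependent children


def flat_tax(p, rate=0.15, exemption=10_000.0):
    """Hypothetical flat tax on income above a basic exemption."""
    return max(0.0, p.income - exemption) * rate


def child_benefit(p, per_child=1_000.0):
    """Hypothetical cash transfer paid per child."""
    return p.children * per_child


def simulate(records, tax_fn, benefit_fn):
    """Static accounting: apply the rules to every record and sum weighted totals."""
    total_tax = sum(r.weight * tax_fn(r) for r in records)
    total_benefit = sum(r.weight * benefit_fn(r) for r in records)
    return {"tax": total_tax, "benefit": total_benefit,
            "net": total_tax - total_benefit}


def affordable_increase(records, extra_budget):
    """Per-child benefit increase that a given extra budget could finance."""
    weighted_children = sum(r.weight * r.children for r in records)
    return extra_budget / weighted_children


# Three invented records standing in for a representative database.
records = [
    Person(weight=1000, income=20_000, children=2),
    Person(weight=500, income=60_000, children=0),
    Person(weight=2000, income=8_000, children=1),
]

baseline = simulate(records, flat_tax, child_benefit)
print(baseline)
print(affordable_increase(records, 400_000))
```

Because the model is static, behaviour does not respond to the policy change: a proposal is costed by swapping in a different rule function and re-running the same records.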

    Published on: 11 January 2019

    Permanent URL: http://hdl.handle.net/11272/10704

    This survey will give a detailed and up-to-date picture of not only what people are eating and what vitamins and minerals they take, but the impact this has on health and well-being. It will also evaluate changes in food consumption, nutrition and health since this survey was last done in 2004. The objectives of the Canadian Community Health Survey - Nutrition are: (1) To collect detailed data on the consumption of foods and dietary supplements among a representative sample of Canadians at national and provincial levels; (2) To estimate the distribution of usual dietary intake in terms of nutrients from foods, food groups, dietary supplements and eating patterns; (3) To gather anthropometric (physical) measurements for accurate body weight and height assessment to interpret dietary intake; (4) To support the interpretation and analysis of dietary intake data by collecting data on selected health conditions and socio-economic and demographic characteristics; and (5) To evaluate changes in dietary intake from the 2004 CCHS - Nutrition. The data collected from the survey will be used by Statistics Canada, Health Canada, the Public Health Agency of Canada, provincial and territorial ministries of health, federal and provincial health planners across the country, industry, researchers and health professionals. Results from our surveys are used extensively for policy-making and program development that affect Canadian communities.

    Published on: 11 January 2019

    Permanent URL: http://hdl.handle.net/11272/10703

    In 1991, the National Task Force on Health Information cited a number of issues and problems with the health information system. To respond to these issues, the Canadian Institute for Health Information (CIHI), Statistics Canada and Health Canada joined forces to create a Health Information Roadmap. From this mandate, the Canadian Community Health Survey (CCHS) was conceived. The CCHS is a cross-sectional survey that collects information related to health status, health care utilization and health determinants for the Canadian population. The survey is offered in both official languages. It relies upon a large sample of respondents and is designed to provide reliable estimates at the health region level every 2 years. The CCHS has the following objectives: support health surveillance programs by providing health data at the national, provincial and intra-provincial levels; provide a single data source for health research on small populations and rare characteristics; release information in a timely manner and make it easily accessible to a diverse community of users; and create a flexible survey instrument that includes a rapid response option to address emerging issues related to the health of the population. The CCHS produces an annual microdata file and a file combining two years of data. The CCHS collection years can also be combined by users to examine populations or rare characteristics. The primary use of the CCHS data is for health surveillance and population health research. Federal and provincial departments of health and human resources, social service agencies, and other types of government agencies use the information collected from respondents to monitor, plan, implement and evaluate programs to improve the health of Canadians. Researchers from various fields use the information to conduct research to improve health. Non-profit health organizations and the media use the CCHS results to raise awareness about health, an issue of concern to all Canadians.
The survey began collecting data in 2001 and was repeated every two years until 2005. Starting in 2007, data for the CCHS were collected annually instead of every two years. While a sample of approximately 130,000 respondents was interviewed during each of the reference periods of 2001, 2003 and 2005, the sample size was changed to 65,000 respondents each year starting in 2007. In 2012, CCHS began work on a major redesign project that was completed and implemented for the 2015 cycle. The objectives of the redesign were to review the sampling methodology, adopt a new sample frame, modernize the content and review the target population. Consultations were held with federal, provincial and territorial share partners, health region authorities and academics. As a result of the redesign, the 2015 CCHS has a new collection strategy, draws its sample from two different frames and has undergone major content revisions. Given all these changes, caution should be taken when comparing data from previous cycles to data released for the 2015 cycle onwards.

    Published on: 10 January 2019

    Permanent URL: http://hdl.handle.net/11272/10702

    Labour Force Survey (LFS) data are used to produce the well-known unemployment rate as well as other standard labour market indicators such as the employment rate and the participation rate. The LFS also provides employment estimates by industry, occupation, public and private sector, hours worked and much more, all cross-classifiable by a variety of demographic characteristics. Estimates are produced for Canada, the provinces, the territories and a large number of sub-provincial regions. For employees, data on wage rates, union status, job permanency and establishment size are also produced. These data are used by different levels of government for evaluation and planning of employment programs in Canada. Regional unemployment rates are used by Employment and Social Development Canada to determine eligibility, level and duration of insurance benefits for persons living within a particular employment insurance region. The data are also used by labour market analysts, economists, consultants, planners, forecasters and academics in both the private and public sector.

    Published on: 08 January 2019

    Permanent URL: http://hdl.handle.net/11272/10575

    The input-output multipliers are derived from the supply and use tables. They are used to assess the effects on the economy of an exogenous change in final demand for the output of a given industry. They provide a measure of the interdependence between an industry and the rest of the economy. The national and provincial multipliers show the direct, indirect, and induced effects on gross output, the detailed components of GDP, jobs, and imports. Like the supply and use tables, the multipliers are presented at four levels of aggregation: Detail level (236 industries), Link-1997 level (187 industries), Link-1961 level (111 industries) and Summary level (35 industries).
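
As a rough illustration of how a change in final demand propagates through interdependent industries, the sketch below uses the textbook Leontief inverse, (I - A)^-1, on an invented two-industry coefficient matrix. This is an assumption made for illustration: the coefficients are hypothetical, the function names are invented, and the sketch is not the procedure used to derive the published multipliers from the supply and use tables. It covers direct and indirect effects only; induced effects would require extending the model to household income and spending.

```python
def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination (partial pivoting)."""
    n = len(matrix)
    # Build the augmented matrix [matrix | identity].
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def leontief_inverse(coefficients):
    """Return (I - A)^-1 for a matrix A of input coefficients."""
    n = len(coefficients)
    i_minus_a = [[(1.0 if i == j else 0.0) - coefficients[i][j] for j in range(n)]
                 for i in range(n)]
    return invert(i_minus_a)


def output_multipliers(coefficients):
    """Column sums of the Leontief inverse: total gross output generated
    per dollar of extra final demand for each industry's output."""
    inv = leontief_inverse(coefficients)
    n = len(inv)
    return [sum(inv[i][j] for i in range(n)) for j in range(n)]


# Hypothetical coefficients: A[i][j] is the input from industry i
# needed to produce one dollar of output in industry j.
A = [[0.2, 0.3],
     [0.4, 0.1]]

print(output_multipliers(A))
```

A multiplier above 1 reflects the extra output that supplier industries must produce to support the initial increase in final demand.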

    Published on: 08 January 2019

    Permanent URL: http://hdl.handle.net/11272/10701

    Trees in the one-hectare second growth SI/MAB plot were recensused in 2001. Data were also collected for shrubs and small trees, coarse woody debris (CWD), and small mammals, and are presented here. For results on canopy cover, dwarf mistletoe, and slugs, see the 2001 student report. The SI/MAB plot was established in 1997 following the Smithsonian Institution / Man and Biosphere Program (now Monitoring and Assessment of Biodiversity) or SI/MAB protocol. It is located on the north side of Grappler Inlet near Bamfield, British Columbia on property owned and administered by the Bamfield Marine Sciences Centre. The centre of the plot has GPS coordinates 48°50’19.00”N, 125°08’02.00”W (48.83861111, -125.13388889). The SI/MAB plot is divided into 25 quadrats (20m x 20m). In 2001 quadrats were renumbered for north/south discrepancies. Trees with a DBH (diameter at breast height) ≥ 4 cm were recensused for species, DBH and status (physical condition). For the 2001 census, dead fallen trees were counted as coarse woody debris instead of trees. Shrubs and small trees (maximum DBH of 4cm) were identified and tagged following the Ecological Monitoring and Assessment Network (EMAN) protocol for Shrub and Small Tree Sampling. In 1999, 11 5x5m quadrats were randomly chosen in corners of randomly chosen 20mx20m quadrats in the SI/MAB plot. In 2001, the 11 quadrats were resampled, and 4 more quadrats were sampled. A species list of shrubs and small trees is presented here. Coarse woody debris was measured along three 90m transects within the SI/MAB plot (labeled site A = wet; site B = steep slope; site C = flat, stem exclusion). Methodology is described in the Vegetation Resources Inventory: Ground Sampling Procedures (2007) with decay classes 1-5 defined on page 196. The methodology was modified such that each 90m transect consisted of an equilateral triangle with sides of 30m each (vs. two transects at 90 degrees to each other). 
Triangles were used to ensure that any orientation biases were accounted for, as pieces may have a dominant direction of fall. Data for coarse woody debris and volume calculations are presented here. Three days of live trapping of small mammals (i.e. deer mice) were carried out along two edge transects running parallel to, but 3 m from, the south and east edges of the plot to minimize disturbance. Data are presented here. The majority of the data were collected by students in the Coastal Biodiversity and Conservation course taught by Dr. Tom Berman July 23-Aug 31, 2001 with Teaching Assistant Dana Haggarty. Shrub and small tree data from 1999 were collected by students in the Coastal Biodiversity and Conservation course taught by Dr. Tom Berman and Dr. Andre Martel June 7-July 16, 1999.
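
The plot geometry and coordinates quoted in the description can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the figures stated above; the function and variable names are invented for illustration.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees.
    South and West hemispheres are returned as negative values."""
    value = degrees + minutes / 60 + seconds / 3600
    return -value if hemisphere in ("S", "W") else value


# Plot centre as given in the description: 48°50'19.00"N, 125°08'02.00"W
latitude = dms_to_decimal(48, 50, 19.00, "N")
longitude = dms_to_decimal(125, 8, 2.00, "W")
print(round(latitude, 8), round(longitude, 8))

# 25 quadrats of 20 m x 20 m should add up to one hectare (10,000 square metres).
plot_area_m2 = 25 * 20 * 20
print(plot_area_m2 / 10_000, "ha")

# Each coarse woody debris transect is an equilateral triangle with 30 m sides.
transect_length_m = 3 * 30
print(transect_length_m, "m")
```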

    Published on: 04 December 2018

    Permanent URL: http://hdl.handle.net/11272/10651

    Avatar Education Portuguese was developed by the University of Pernambuco and consists of approximately 80 minutes of Brazilian Portuguese microphone speech with phonetic and orthographic transcriptions. The data was developed for Avatar Education, an animated virtual assistant designed to enhance communication and interaction in educational contexts, such as online learning.
Data
The corpus contains 1,400 utterances (700 male and 700 female) of read and spontaneous speech spoken by two professional speakers. Utterances were transcribed at the word level (without time alignments) and at the phoneme level (with time alignment labels). The audio data was recorded at 16kHz (mono, 16-bit) using Pro Tools recording software and stored in flac compressed wav format. The acoustic environment was controlled for background conditions that occur in application environments.

    Published on: 03 December 2018

    Permanent URL: http://hdl.handle.net/11272/FEDUJ

    BOLT Egyptian Arabic Treebank – Discussion Forum was developed by the Linguistic Data Consortium (LDC) and consists of Egyptian Arabic web discussion forum data with part-of-speech annotation, morphology, gloss and syntactic tree annotation. The DARPA BOLT (Broad Operational Language Translation) program developed machine translation and information retrieval for less formal genres, focusing particularly on user-generated content. LDC supported the BOLT program by collecting informal data sources – discussion forums, text messaging and chat – in Chinese, Egyptian Arabic and English. The collected data was translated and annotated for various tasks including word alignment, treebanking, propbanking and co-reference. The unannotated Egyptian Arabic source data is released as BOLT Arabic Discussion Forums (LDC2018T10). The annotations in this release follow Penn Arabic Treebank (PATB) annotation guidelines. The PATB project consists of two distinct phases: (a) part-of-speech tagging which divides the text into lexical tokens and gives relevant information about each token such as lexical category, inflectional features and a gloss; and (b) Arabic treebanking, which characterizes the constituent structures of word sequences, provides categories for each non-terminal node and identifies null elements, co-reference, traces and so on. There are two kinds of morphological analysis synchronized in the corpus. LDC Standard Morphological Analyzer (SAMA) Version 3.1 (LDC2010L01) was used for Modern Standard Arabic tokens, and CALIMA (Columbia Arabic Language and dIalect Morphological Analyzer) was used for Egyptian-Arabic tokens.
Data
This release contains 440,448 tokens before clitics were split and 508,548 tree tokens after clitics were split for treebank annotation. The source material is web discussion forums collected by LDC from various sources. Data is presented in a variety of UTF-8 encoded text formats, specifically plain text, XML, tdf and Penn Treebank.
See the included documentation for more information about the specific formats.

    Published on: 03 December 2018

    Permanent URL: http://hdl.handle.net/11272/M6MDP

    AISHELL-1 was developed by Beijing Shell Shell Technology Co., Ltd. It contains approximately 520 hours of Chinese Mandarin speech from 400 speakers recorded simultaneously on three different devices with associated transcripts. The goal of the collection was to support speech recognition system development in 11 domains, including smart homes, autonomous driving, entertainment, finance and science and technology. Participants read 500 sentences covering the domains; sentences were chosen for their speech and phonetic characteristics. Speakers were recruited from different accent areas across China, including North, South and Yue-Gui-Min regions. There were 214 female speakers and 186 male speakers, constituting 53% and 47% of the database, respectively. Additional demographic information about the participants is included in this release.
Data
Speech was recorded in a quiet indoor environment on a high fidelity microphone and two mobile phones (Android and iOS). All speech is presented as 16-bit flac compressed wav files; the microphone speech sample rate is 44.1kHz and the phone speech sample rate is 16kHz. Each speech file ranges from approximately 1 second to 14 seconds in length. Transcripts are stored as UTF-8 encoded plain text files and are not time-aligned.

    Published on: 03 December 2018

    Permanent URL: http://hdl.handle.net/11272/MTXY7