Funded Projects

Development and Evaluation of Improved Strategies for Genomic Selection Via Simulations and Empirical Testing

Award #: 2017-67007-26175
Patrick Schnable
Lizhi Wang, Guiping Hu
February 15, 2017
February 14, 2020

The overall goal of the proposed project is to increase the efficiency of crop breeding programs by developing and deploying improved genomic selection strategies that rely on improvements in the selection and mating steps.

As a consequence of growing populations, changing diets, and the challenges of climate change, agricultural systems must produce more with less. More means greater demand for agricultural products such as food, feed, energy and fiber. Less means reduced agricultural inputs (at least on a per output basis) such as water, fertilizer, pesticides and a reduced environmental footprint, all on less land. A key tool for making agricultural systems more productive, sustainable and resilient is genetic improvement via breeding.

Selection based on favorable traits (phenotypic selection) was central to the domestication of crops and has been used successfully in plant breeding for thousands of years. Recurrent selection is a special case of phenotypic selection that refers to a type of population improvement which includes the following key steps: phenotyping (evaluation), selection, and mating.

  • Phenotyping involves measuring trait values of a population, such as grain and biomass yield, flowering time and color, etc. This step is often time-consuming and labor-intensive, and many phenotypes cannot be accurately scored without replicated tests.
  • Selection involves the identification of those individuals within the population that exhibit the most favorable trait values.
  • Mating occurs following selection; selected lines are typically randomly or arbitrarily mated to create a new population for phenotyping and selection. Little research has been conducted on the effects of other mating strategies.

The phenotyping step can be costly, time-consuming, logistically challenging, and/or destructive because it often involves a large number of individuals and many factors that can affect the measurements. Genomic selection (GS), which relies on efficient, high-throughput genotyping technologies (to determine the genetic composition of plants), enables breeders to predict the phenotype of plants based on their genotypes using advanced statistical models and a population of plants of known phenotypes used to "train" the prediction model. This new approach has transformed breeding, because it eliminates or reduces the need for phenotyping.

Even moderately accurate genomic prediction can result in substantial cost savings and improve the rate of genetic gain in breeding programs. Previous literature has focused on three areas: the first is to improve the accuracy of phenotype prediction, upon which selection is based; the second is to define other quantitative indicators in lieu of the predicted phenotype to reflect the fitness of the lines from the genetics perspective and their likelihood to produce superior progeny; the third is the selection of an optimal training population that maximizes the accuracy of the prediction model when only a limited number of plants can be phenotyped. Although genomic prediction has transformed breeding, the selection and mating steps have not received equal attention and typically remain the same as in traditional phenotypic selection. Optimizing these two steps is a focus of the proposed project.

To design optimal selection and mating strategies and to understand their interactions with other aspects of breeding programs, we will make use of simulations within a framework of operations research, which deals with the application of advanced analytical methods to help make better decisions. Significantly, we will also develop user-friendly tools to allow breeders to optimize these parameters for their own breeding scenarios. The new selection and mating strategies that we propose will be designed using advanced mathematical programming and optimization techniques, fully tailorable to reflect user preferences with respect to the trade-offs among cost, time, and probability of success.
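
As an illustration only, the genomic prediction and selection steps described above can be sketched with ridge regression on marker genotypes, a common baseline for genomic prediction. This is not necessarily the model or the selection strategy used in the project. The function names (ridge_fit, predict, select_top), the 0/1/2 marker coding, the penalty value, and the simulated data are all assumptions made for this sketch.

```python
import random


def ridge_fit(X, y, lam):
    """Fit ridge regression on centered data.

    X: list of genotype rows (assumed 0/1/2 marker coding), y: phenotypes,
    lam: ridge penalty (> 0). Returns (y_mean, column_means, marker_effects).
    """
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Normal equations: (Xc'Xc + lam * I) b = Xc'yc
    A = [[sum(Xc[k][i] * Xc[k][j] for k in range(n)) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    r = [sum(Xc[k][i] * yc[k] for k in range(n)) for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda row: abs(A[row][col]))
        A[col], A[pivot] = A[pivot], A[col]
        r[col], r[pivot] = r[pivot], r[col]
        for row in range(col + 1, p):
            f = A[row][col] / A[col][col]
            for j in range(col, p):
                A[row][j] -= f * A[col][j]
            r[row] -= f * r[col]
    # Back substitution.
    effects = [0.0] * p
    for i in range(p - 1, -1, -1):
        tail = sum(A[i][j] * effects[j] for j in range(i + 1, p))
        effects[i] = (r[i] - tail) / A[i][i]
    return y_mean, col_means, effects


def predict(model, X):
    """Predict phenotypes for genotype rows X from a fitted model."""
    y_mean, col_means, effects = model
    return [y_mean + sum((x - m) * b for x, m, b in zip(row, col_means, effects))
            for row in X]


def select_top(scores, k):
    """Return indices of the k highest-scoring candidates (truncation selection)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    # Simulated data (hypothetical): 60 phenotyped training lines,
    # 20 genotyped-only candidates, 10 markers.
    random.seed(0)
    n_markers = 10
    true_effects = [random.gauss(0, 1) for _ in range(n_markers)]

    def genotype():
        return [random.choice([0, 1, 2]) for _ in range(n_markers)]

    def phenotype(g):
        return sum(a * b for a, b in zip(g, true_effects)) + random.gauss(0, 1)

    train = [genotype() for _ in range(60)]
    model = ridge_fit(train, [phenotype(g) for g in train], lam=1.0)
    candidates = [genotype() for _ in range(20)]
    print("Selected candidates:", select_top(predict(model, candidates), 5))
```

In this sketch the candidates are never phenotyped: they are ranked purely on predicted values, which is the cost saving the abstract refers to. Selection here is simple truncation on the predicted phenotype; the project's stated aim is to replace exactly this kind of default selection and mating rule with optimized strategies.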

Low-cost nitrate sensors to populate genotype-informed yield prediction models for next generation breeders

Award #: 2017-67013-26463
Patrick Schnable
Sotirios Archontoulis, Mike Castellano, Liang Dong
April 1, 2017
March 31, 2019

Our civilization depends on continuously increasing levels of agricultural productivity, which itself depends on (among other things) the interplay of crop varieties and the environments in which these varieties are grown. Hence, to increase agricultural productivity and yield stability, it is necessary to develop improved crop varieties that deliver ever more yield, even under the variable weather conditions induced by global climate change, all the while minimizing the use of inputs such as fertilizers that are limiting, expensive or have undesirable ecological impacts. By coupling a network of innovative, low-cost nitrate sensors across multiple environments within the heart of the corn belt and advanced cropping systems modeling (APSIM, the most widely used modeling platform), the proposed research will enhance our understanding of and ability to predict yield and Genotype x Environment interactions. The integration of nitrate (N) dynamics into this model is expected to greatly increase the accuracy of its predictions. Because we will also integrate genotypes into this model, the proposed research outlines a new and innovative approach for breeding crops that exhibit increased yields and yield stability. It will be possible to readily translate this approach to other crops. By generating data on nitrate concentrations in soil and in planta at unprecedented spatial and temporal resolution at multiple sites with different soil characteristics and weather, the proposed research will also improve our understanding of N cycles in both the soil and plant. Although essential to plant growth and high yields, when over-applied, N can result in a variety of serious negative externalities, some of which are currently the subject of high-impact litigation in Iowa. Project outcomes have the potential to provide guidance to farmers about how to apply sufficient but not excessive amounts of N fertilizer, resulting in both economic benefits to farmers and positive environmental externalities.

Our focus on creating a new approach to breeding for yield stability meets the USDA sustainability goals to "satisfy human food and fiber needs" and "sustain the economic viability of farm operations". Our focus on nitrogen meets the USDA sustainability goals to "enhance environmental quality" and to "make the most efficient use of nonrenewable resources...and integrate, where appropriate, natural biological cycles and controls". More specifically, this proposal addresses the NIFA-Commodity Board co-funded priority for "development and application of tools to predict phenotype from genotype" and "the development of high-throughput phenotyping equipment and methods".

Genetic networks regulating structure and function of the maize shoot apical meristem

Award #: 1238142
Michael Scanlon
Patrick Schnable, Marja Timmermans, Jianming Yu, Xiaoyu Zhang
February 1, 2013
January 31, 2018

The shoot apical meristem (SAM) is responsible for development of all aboveground organs in the plant. SAM structure and function correlate with agronomically important adult traits in the maize plant, and are also affected by planting density and shade stresses induced by agricultural environments. The ultimate goal of this project is to increase understanding of the regulatory networks controlling SAM structure and function and the responses of these networks to environmental stresses. The specific objectives are to: 1) describe the SAM allometric space in maize and its relatives using nanoscale computer tomographic scanning to provide 3-dimensional images of the phenotypic diversity of SAM structure and identify adult plant traits correlated with SAM structure; 2) identify differentially expressed genes in SAM size/shape outliers and mutants with abnormal SAM structures and generate a co-expression network of key genes implicated in SAM structure and function; 3) perform quantitative genetic analyses to identify specific variations within genes that correlate with variations in SAM structure/function and adult plant traits, and test functions of 40 key genes using reverse genetic approaches; 4) analyze the shade avoidance response and its effects on SAM structure and function; and 5) investigate epigenetic changes of SAM functional domains in response to shade avoidance using novel protocols that distinguish the stem cell organizing regions from the organogenic domains in the maize SAM.

These studies will provide the framework for scientific training and the public release of original data. Undergraduates at Truman State University, a small liberal arts institution, will be trained in morphological and LM-RNAseq analyses of maize mutants. REU students and undergraduates enrolled in Plant Physiology courses at Cornell University will participate in physiological experiments. This project will generate extensive transcriptomic data and vector constructs for tissue-specific epigenetic analyses which will be available to the scientific research community. Molecular markers and phenotypic data for diverse maize lines will be supplied to Panzea. Genetic mapping associations, physiological shade-avoidance response data, and transcriptomic and phenotypic data will be curated at MaizeGDB, and seed stocks for maize shoot mutants and SAM size variants will be released through the Maize Genetics Cooperation Stock Center.

Hierarchical Modeling and Parallelized Bayesian Inference for the Analysis of RNAseq Data

Award #: 1R01GM109458-01
Daniel Nettleton
Peng Liu, Jarad Niemi, Patrick Schnable
September 1, 2013
May 31, 2017 -- Extended thru May 31, 2018

This proposal focuses on the development of hierarchical models and parallelized Bayesian inference for the analysis of RNA sequencing (RNAseq) data. Special emphasis is placed on gene expression profiling of parental inbred lines and their hybrid offspring for the discovery of key genes underlying heterosis, the genetic phenomenon otherwise known as hybrid vigor. The project will be led by a collaborative team of researchers with expertise in the analysis of high-dimensional gene expression data, Bayesian inference, bioinformatics, biology, computational methods, genetics, genomics, and statistics. The proposed research provides new tools for the analysis of high-dimension and low-sample-size count data generated by RNAseq technology. Hierarchical modeling allows for flexible information sharing across dimensions to extract as much information as possible from data. Parallel methods for Bayesian inference harness the power of modern computing to produce comprehensive results in a timely manner. Specific methods will be developed for (i) the identification of genes that exhibit expression heterosis, (ii) the detection of expressed and non-expressed genes, and (iii) the discovery of differential allele usage in hybrids. These methods will provide a deeper understanding of the molecular mechanisms of heterosis and lead to the discovery of key genes whose expression patterns provide hybrids with advantages over their parents. This information can be used to efficiently predict which of thousands of possible crosses will result in top performing hybrids. In addition to the specific methods mentioned above, hierarchical generalized linear models for the simultaneous analysis of tens of thousands of response variables will be developed. 
This work will permit the analysis of RNAseq data from complex designs with multiple sources of variability and will greatly extend the range of applicability for the funded research to encompass a variety of challenges in high-dimensional data analysis.

Public Health Relevance: The proposed work will provide medical researchers with advanced tools for studying the functions of genes in complex biological systems. The enhanced understanding of gene functions obtained with the developed tools can deepen understanding of diseases and lead to new treatments for the improvement of public health.

Development of a PhenoNet - an Integrated Robotic Network for Field-based Studies of Genotype x Environment Interactions

Award #: 1625364
Lie Tang
Patrick Schnable, Srikant Srinivasan
September 15, 2016
August 31, 2019

An award is made to Iowa State University to develop and deploy PhenoNet - an integrated robotic network for field-based studies of genotype crossed with environment (GxE) interactions. The core component of PhenoNet is a set of PhenoBots: lightweight robots that are able to autonomously navigate between crop rows using GPS and local range sensors while employing advanced sensing technologies to phenotype crop plants. The PhenoBots can measure indicators such as stalk size, plant height, leaf angle and tassel/inflorescence properties over time. The robots will be optimized for maize research and can be easily adapted for other row crops. The network (PhenoNet) is a universal platform which enables comprehensive field-based research on genotype and environment interactions. The broader impacts of this project are threefold. First, PhenoNet will have an important impact on society as understanding genotype x environment interactions will help address the need for sufficient food, feed, and fiber for the planet's growing population, which is vital in an ever-changing environment. PhenoNet will bring "big data" more deeply into agriculture by cementing connections between plant scientists and engineers in their efforts to reach this goal. Second, this project is synergistic with the NSF-NRT project, "Predictive Phenomics of Plants", recently awarded to Iowa State University. The research and engineering outlined in this Major Research Instrumentation project will provide an outstanding opportunity for students from engineering disciplines, computer science, statistics, and agronomy to collaborate and engage in state-of-the-art interdisciplinary research. This project will also advance the training of current engineers and plant scientists who are experienced with networking, robotics and agronomy.
Third, this project will reach out to underrepresented groups by targeting minority-serving institutions for student recruitment and will work with the Society of Women Engineers and other similar groups in seeking women participants to help meet the NSF-NRT award's efforts to broaden participation.

The PhenoBots are an important advancement in the fields of agriculture and technology because they more efficiently characterize tall plants over time to their maturity. Previous technologies and platforms are either incapable of this or are greatly hindered by various constraints. The design improvements of the PhenoBots enable the robots to be more robust, stable, lightweight, integrated and economical. This creates a pathway for transformative research as it enables in situ, non-invasive monitoring of the traits of tall crops, like maize, over time. PhenoNet will consist of a network of four PhenoBots, which will be deployed by plant scientists in Iowa, Kansas, Minnesota, Nebraska, and Wisconsin. The data generated from high-throughput phenotyping will address whether it is possible to predict the phenotype of a given genotype in a specified environment.

Root Genetics in the Field to Understand Drought Adaptation and Carbon Sequestration

John McKay
Parker Antin, Randy A. Bartels, Thomas Borch, Pedro Andrade Sanchez, Francesca Cotrufo, Andrew French, Michael Ottman, Sangmi Palickara, Keith Paustian, Patrick Schnable, Chris Topp, Chris Turner, Matthew Wallenstein, Jianming Yu
July 3, 2017
July 2, 2020

Critical Need: Plants capture atmospheric carbon dioxide (CO2) using photosynthesis, and transfer the carbon to the soil through their roots. Soil organic matter, which is primarily composed of carbon, is a key determinant of soil's overall quality. Even though crop productivity has increased significantly over the past century, soil quality and levels of topsoil have declined during this period. Low levels of soil organic matter affect a plant's productivity, leading to increased fertilizer and water use. Automated tools and methods to accelerate the process of measuring root and soil characteristics and the creation of advanced algorithms for analyzing data can accelerate the development of field crops with deeper and more extensive root systems. Crops with these root systems could increase the amount of carbon stored in soils, leading to improved soil structure, fertilizer use efficiency, water productivity, and crop yield, as well as reduced topsoil erosion. If deployed at scale, these improved crops could passively sequester significant quantities of CO2 from the atmosphere that otherwise cannot be economically captured.

Project Innovation + Advantages: Colorado State University (CSU) will develop a high-throughput ground-based robotic platform that will characterize a plant's root system and the surrounding soil chemistry to better understand how plants cycle carbon and nitrogen in soil. CSU's robotic platform will use a suite of sensor technologies to investigate crop genetic-environment interaction and generate data to improve models of chemical cycling of soil carbon and nitrogen in agricultural environments. The platform will collect information on root structure and depth, and deploy a novel spectroscopic technology to quantify levels of carbon and other key elements in the soil. The technology proposed by the Colorado State team aims to speed the application of genetic and genomic tools for the discovery and deployment of root traits that control plant growth and soil carbon cycling. Crops will be studied at two field sites in Colorado and Arizona with diverse advantages and challenges to crop productivity, and the data collected will be used to develop a sophisticated carbon flux model. The sensing platform will allow characterization of the root systems in the ground and lead to improved quantification of soil health. The collected data will be managed and analyzed through the CyVerse "big data" computational analytics platform, enabling public access to data connecting aboveground plant traits with belowground soil carbon accumulation.

Potential Impact: If successful, developments made under the ROOTS program will produce crops that will greatly increase carbon uptake in soil, helping to remove CO2 from the atmosphere, decrease nitrous oxide (N2O) emissions, and improve agricultural productivity.

  • Security: America's soils are a strategic asset critical to national food and energy security. Improving the quality of soil in America's cropland will enable increased and more efficient production of feedstocks for food, feed, and fuel.
  • Environment: Increased organic matter in soil will help reduce fertilizer use, increase water productivity, reduce emissions of nitrous oxide, and passively sequester carbon dioxide from the atmosphere.
  • Economy: Healthy soil is foundational to the American economy and global trade. Increasing crop productivity will make American farmers more competitive and contribute to U.S. leadership in an emerging bio-economy.

A Scalable Framework for Visual Exploration and Hypotheses Extraction of Phenomics Data using Topological Analytics

Collaborative Research - Iowa State Award #: 1661475; Washington State University Award #: 1661348
Anantharaman Kalyanaraman
Bala Krishnamoorthy, Zhiwu Zhang, Bei W. Phillips, Patrick Schnable
August 1, 2017
July 31, 2020

Understanding how gene by environment interactions result in specific phenotypes is a core goal of modern biology and has real-world impacts on such things as crop management. Developing and managing successful crop practices is a goal that is fundamentally tied to our national food security. By applying novel computational visual analytical methods, this project seeks to identify and unravel the complex web of interactions linking genotypes, environments and phenotypes. These methods will first need to be designed and developed into usable software applications that can handle large volumes of crop phenomics data. High-throughput sensing technologies collect large volumes of field data for many plant traits, such as flowering time, related to crop development and production. The maize cultivars used here come from multiple genotypes that have been grown under a variety of environmental conditions, in order to give the widest range of conditions for understanding the interactions. The resulting data sets are growing quickly, both in size and complexity, but the analytical tools needed to extract knowledge and catalyze scientific discoveries have significantly lagged behind. The methodologies to be developed in this project represent a systematic attempt at bridging this rapidly widening divide. The project is inherently interdisciplinary, involving close research partnerships among computer scientists, plant scientists, and mathematicians. 
The research outcomes will be tightly integrated with education using a multipronged approach that includes, among others, postdoctoral and student training (graduates and undergraduates), curriculum development for a new campus-wide interdisciplinary undergraduate degree in Data Analytics, conference tutorials for training phenomics data practitioners, and contribution to the recruitment and retention of underrepresented minorities (particularly women) in STEM fields through the Pacific Northwest Louis Stokes Alliance for Minority Participation.

This project will lead to the design and development of a new, scalable, visual analytics platform suitable for hypothesis extraction and refinement from complex phenomics data sets. Focus on hypothesis extraction is critical in the context of phenomics data sets because much of the high-throughput sensing data being generated in crop fields are generated in the absence of specifically formulated hypotheses. Extracting plausible hypotheses from the data represents an important but tedious task. To this end, this project will apply and develop new capabilities using emerging advanced algorithmic principles, particularly from the branch of mathematics called algebraic topology that studies shapes and structure of complex data. The research objectives are three-fold. First, the project will employ and extend emerging algorithmic techniques from algebraic topology to decode the structure of large, complex phenomics data. Second, an interactive visual analytic platform will be developed to facilitate knowledge discovery using the extracted topological structures. Lastly, the quality and validity of a new visual analytic platform designed by this team will be tested using real-world maize data sets as well as simulated inputs as testbeds. The developed framework will encode functions for scientists to delineate hypotheses of three kinds: i) genetic characterization of single complex traits; ii) genetic characterization of multiple traits that share potentially pleiotropic effects; and iii) decoding and detailed characterization of genotype-by-environmental interactions, in particular, through a collaborative pilot study of maize flowering and growth traits. The expected significance of the proposed work is that biologists will be able to extract different types of testable hypotheses from plant phenomics data sets by employing a new class of visual analytic tools, and thus obtain a deeper understanding of the interactions among genotypes, environments and phenotypes. 
The project is potentially transformative in two ways: i) it will introduce advanced mathematical and computational principles into mainstream phenomic data analysis; and ii) it will usher in a new era where biologists spearhead data-driven hypothesis extraction and discovery with the aid of interactive, informative, and intuitive tools. The project will have a direct impact on the state of software in phenomics for fundamental data-driven discovery. To facilitate broader community adoption, the project will integrate the tools into the CyVerse Institute and into a community phenomics software outlet. It will also lead to the development of automated scientific workflows. Project website:

NRT-DESE: P3 - Predictive Phenomics of Plants

Award #: 1545453
Julie Dickerson
Patrick Schnable, Theodore Heindel, Carolyn Lawrence-Dill
September 1, 2015
August 31, 2020

New methods to increase crop productivity are required to meet anticipated demands for food, feed, fiber, and fuel. Using modern sensors and data analysis techniques, it is now feasible to develop methods to predict plant growth and productivity based on information about their genome and environment. However, doing so requires expertise in plant sciences as well as computational sciences and engineering. This National Science Foundation Research Traineeship (NRT) award to Iowa State University will bring together students with diverse backgrounds, including plant sciences, statistics, and engineering, and provide them with data-enabled science and engineering training. The collaborative spirit required for students to thrive in this unique intellectual environment will be strengthened through the establishment of a community of practice to support collective learning. This traineeship anticipates preparing forty-eight (48) master's and doctoral students, including twenty-eight (28) funded doctoral students, with the understanding and tools to design and construct crops with desired traits that can thrive in a changing environment.

Understanding how particular genetic traits result in given plant characteristics under specific environmental conditions is a core goal of modern biology that will facilitate the efficient development of crops with commercially useful characteristics. Plant characteristics are influenced by genetics and a wide range of environmental factors, including, for example, rainfall, temperature and soil types. Developing methods to effectively integrate these diverse inputs that take advantage of existing biological, statistical, and engineering knowledge will be a key area in this research and training program that will bring together faculty from eight departments. Trainees will engage in cutting-edge research and development areas involving direct data collection and analysis from living plants, including sensor development, high throughput robotic technology, and biological feature extraction through image analysis. This traineeship will use the T-training model to provide students with training across a broad range of disciplines while developing a deep technical expertise in one area. This expertise, in combination with soft skills development, will enable the trainees to work across organizational and cultural boundaries as well as scientific disciplines. To develop understanding of how to share knowledge with diverse groups, the program will provide students with training beyond traditional coursework and research through activities that will develop advanced communication and entrepreneurship skills. Additionally, internship opportunities in industry, national labs, and other settings will equip trainees to choose among the diverse career paths available to scientists and engineers.

The NSF Research Traineeship (NRT) Program is designed to encourage the development and implementation of bold, new, potentially transformative, and scalable models for STEM graduate education training. The Traineeship Track is dedicated to effective training of STEM graduate students in high priority interdisciplinary research areas, through the comprehensive traineeship model that is innovative, evidence-based, and aligned with changing workforce and research needs.

Parallel Algorithms and Software for High-Throughput Sequence Assembly

Award #: 1162472
Srinivas Aluru
Karin Dorman, Patrick Schnable
May 1, 2012
October 31, 2013 -- Extended thru April 30, 2018

High-throughput next-generation DNA sequencing technologies (NGS) are causing a major revolution in life sciences research by allowing rapid and cost-effective sampling of genomes and transcriptomes (expressed genomic sequences). Assembly of genomes and transcriptomes from billions of such randomly sampled sequences is an important problem in computational biology. While significant strides have been made, much work remains in addressing the diverse and rapidly emerging platforms, improving assembly quality, and scaling to both large-scale data sizes and large genomes.

This project will harness the power of high performance computing to develop effective solutions for sequence assembly. It will lead to the development of scalable, efficient parallel algorithms and a parallel integrated software framework for genome and transcriptome assembly. The project seeks to advance the state of the art by targeting important unsolved problems such as hybrid assembly of sequences from multiple NGS platforms, making fundamental algorithmic advances to improve assembly quality, and conducting an in-depth effort at parallel algorithms development for the entire gamut of problems that arise in connection with assembly. It will be carried out by an interdisciplinary team of investigators, in partnership with leading NGS manufacturers and academicians involved in large plant genome sequencing projects.

The project will lead to the release of a scalable parallel software package for sequence assembly that will be made available to the scientific community. Postdoctoral researchers and graduate students will be trained in computer-science-driven interdisciplinary research and in writing efficient high-performance computing software. The project will influence curriculum development and will lead to educational materials in bioinformatics for next-generation sequencing.