Automated Processing of Imaging Data through Multi-tiered Classification of Biological Structures Illustrated Using Caenorhabditis elegans
Mei Zhan, Matthew M. Crane, Eugeni V. Entchev, Antonio Caballero, Diana Andrea Fernandes de Abreu, QueeLim Ch’ng, Hang Lu

Abstract

Increasingly, the capacity for high-throughput experimentation provided by new imaging modalities, contrast techniques, microscopy tools, microfluidics and computer controlled systems shifts the experimental bottleneck from the level of physical manipulation and raw data collection to automated recognition and data processing. Yet, despite their broad importance, image analysis solutions to address these needs have been narrowly tailored. Here, we present a generalizable formulation for autonomous identification of specific biological structures that is applicable for many problems. The process flow architecture we present here utilizes standard image processing techniques and the multi-tiered application of classification models such as support vector machines (SVM). These low-level functions are readily available in a large array of image processing software packages and programming languages. Our framework is thus both easy to implement at the modular level and provides specific high-level architecture to guide the solution of more complicated image-processing problems. We demonstrate the utility of the classification routine by developing two specific classifiers as a toolset for automation and cell identification in the model organism Caenorhabditis elegans. To serve a common need for automated high-resolution imaging and behavior applications in the C. elegans research community, we contribute a ready-to-use classifier for the identification of the head of the animal under bright field imaging. Furthermore, we extend our framework to address the pervasive problem of cell-specific identification under fluorescent imaging, which is critical for biological investigation in multicellular organisms or tissues. Using these examples as a guide, we envision the broad utility of the framework for diverse problems across different length scales and imaging methods.

Author Summary

Automated image processing is increasingly necessary to extract relevant data in an objective, consistent and time-efficient manner. While image processing tools have been developed for general problems that affect large communities of biologists, the diversity of biological research questions and experimental techniques has left many problems unaddressed. Moreover, there is no clear way in which non-computer scientists can immediately apply a large body of computer vision and image processing techniques to address their specific problems or adapt existing tools to their needs. Here, we address this need by demonstrating an adaptable framework for image processing that is capable of accommodating a large range of biological problems with both high accuracy and computational efficiency. We further demonstrate the utilization of this framework for disparate problems by solving two specific image processing challenges in the model organism Caenorhabditis elegans. In addition to contributions to the C. elegans community, the solutions developed here provide both useful concepts and adaptable image-processing modules for other biological problems.

Introduction

On the clinical side, new and augmented imaging modalities and contrast techniques have increased the types of information that can be garnered from biological samples [1]. Similarly, many tools have recently been developed to enable new and accelerated forms of biological experimentation in both single cells and multicellular model organisms [2–10]. Increasingly, the capacity for high-throughput experimentation provided by new optical tools, microfluidics and computer controlled systems has eased the experimental bottleneck at the level of physical manipulation and raw data collection. Still, the power of many of these toolsets lies in facilitating the automation of experimental processes. The ability to perform real-time information extraction from images during the course of an experiment is therefore a crucial computational step to harnessing the potential of many of these physical systems (Fig 1A). Even when offline data analysis is sufficient, the capability of these systems to generate large, high-content datasets places a large burden on the speed of the downstream analysis. Automated image processing and the use of supervised learning techniques have the potential to bridge this gap between raw data availability and the limitations of manual analysis in terms of speed, objectivity and sensitivity to subtle changes [11]. In this area, many computer vision techniques, including some general object detection strategies, have been developed to address the detection and recognition of faces, vehicles, animals and household objects from standard camera images [12–17]. While this body of literature solves complex recognition problems within the domain of everyday objects and images, it is not clear how or whether these approaches generalize to the imaging modalities and object detection problems that arise in biological image processing. While these techniques have garnered some important but limited adoption in biological applications [18–28], there is not a systematic methodology by which these computational approaches can be applied to solving common problems in mining biological images [29]. Thus, the development or adaptation of these tools for specific problems has thus far been relatively opaque to many potential end-users and requires a high degree of expertise and intuition.

Specifically, extraction of meaningful information from biological images usually involves the identification of particular structures and calculation of their metrics, rather than the usage of global image metrics. Depending on the specimen and the experimental platform, this may range in scale from molecular or subcellular structures to individual cells or tissue structures within a heterogeneous specimen, or entire organisms. While toolsets have already been developed to address some common needs in biology [19–22, 24, 25, 30–32] and while powerful algorithmic tools exist for pattern and feature discrimination and decision-making [33–35], there are still many unaddressed needs in biological image processing.

As opposed to finished, ready-to-use toolsets, which address a limited problem definition by design, the workflow we propose has the power to simultaneously address the need for accuracy, problem-specificity, and generalizability; end-users have the opportunity to choose platforms and customize as needed. We demonstrate the power of this approach for solving disparate biological image processing problems by developing two widely relevant toolsets for the multicellular model organism, Caenorhabditis elegans. To address the problems of extracting region-, tissue- and cell-specific information within a multicellular context, we developed an image processing algorithm to distinguish the head of the worm under bright-field imaging and a set of tools for specific cell identification under fluorescence imaging. These developments demonstrate the flexibility of our framework to accommodate different imaging modalities and disparate biological structures. The resulting toolsets contribute directly to addressing two fundamental needs for automated studies in the worm and provide specific concepts and modules that may be applied to a broader range of biological problems.

Results

To identify biological structures of interest, images are first preprocessed to condition the data and generate candidates for the structure of interest. In general, candidates can either be individual pixels or discrete segmented regions generated via a thresholding algorithm applied during preprocessing. To accommodate different image acquisition setups and acquisition parameters, we propose the use of an image calibration factor, C, in preprocessing and in all subsequent feature calculation steps. This calibration factor characterizes the relationship between the digitized and real-world length scales for a specific experimental setup and can be used to normalize feature and parameter scaling in all image processing steps (Materials and Methods, S1 Table).

The candidate particles are quantitatively described by two distinct sets of descriptive features. These features may be derived from intuitive metrics designed to mimic human recognition or abstractions that capture additional information [33, 36]; they are mathematical descriptors that help delineate the structures of interest from other candidates and will form the basis for classification. Separation of features into two distinct layers of classification in our proposed scheme serves three purposes. First, it permits conceptual separation of intrinsic and extrinsic or relational properties of a biological structure. Second, it permits the inclusion of higher level descriptions of the relationships between structures identified from the first layer of classification. Finally, it allows computationally expensive features to only be associated with the second layer, which reduces the number of times these features must be calculated as low probability candidates have already been removed. Accordingly, the first layer of classification uses computationally inexpensive, intrinsic features of the candidates to generate a smaller set of candidates. The second layer addresses additional complexity, and uses computationally more expensive features or extrinsic features describing the relationship between candidates, but only on a smaller number of candidates. This two-tier scheme allows significant reduction in computational time. At each layer of classification, a trained classifier is used to make a decision about the candidate’s identity based on the features calculated. In this work, we chose to use support vector machines for all classification steps because of their insensitivity to the specific conditioning of feature sets, which makes them more robust [34, 37]. We note that when constraints of the feature sets are well known, other models including Bayesian discriminators and heuristic thresholds can also be used. In general, the workflow architecture presented in Fig 1B permits the identification of generic biological structures and balances the capability for complexity with computational speed. We describe here two distinct applications using this two-tier classification methodology.
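
To make the workflow of Fig 1B concrete, a minimal sketch of how the two tiers could be organized in MATLAB with the LIBSVM interface (used later in the Materials and Methods) is shown below. The helper functions preprocessImage, computeIntrinsicFeatures and computeContextFeatures are hypothetical placeholders for the preprocessing and feature-calculation steps described in the text; this illustrates the architecture, not the authors' implementation.

```matlab
% Sketch of the two-tier classification workflow (helper names hypothetical).
function hits = twoTierClassify(img, C, svm1, svm2)
    % Preprocess: local thresholding and preliminary filtering (see Methods)
    bw         = preprocessImage(img, C);                 % hypothetical helper
    candidates = regionprops(bw, 'Area', 'Perimeter', 'Centroid', 'PixelIdxList');

    % Layer 1: cheap, intrinsic shape features computed for all candidates
    f1    = computeIntrinsicFeatures(candidates, C);      % hypothetical helper
    lbl1  = svmpredict(zeros(numel(candidates), 1), f1, svm1);
    candidates = candidates(lbl1 == 1);                   % keep likely structures

    % Layer 2: expensive regional/relational features, only on the survivors
    f2    = computeContextFeatures(img, candidates, C);   % hypothetical helper
    lbl2  = svmpredict(zeros(numel(candidates), 1), f2, svm2);
    hits  = candidates(lbl2 == 1);                        % final identifications
end
```

The key design point is that the second feature computation only ever sees the small candidate set that survives layer 1, which is where the computational savings reported later come from.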

Bright-Field Head Identification

Thus, image processing for orientation along the anterior-posterior axis of the worm is crucial to enabling the full potential of many of the toolsets for high-resolution imaging and physical, chemical and optical manipulation of the worm. To address this need, many ad hoc tactics, such as the presence of fluorescent markers [5, 24, 38, 39] or the assumption of forward locomotion in freely moving worms [22, 25, 32, 40–43], are often used to delineate between the head and tail and orient the anterior-posterior axis. However, reliance on exogenously introduced fluorescent markers can necessitate time-consuming treatment of the worms under study and can spatially interfere with other fluorescent readouts of interest. While the assumption of forward locomotion does not require additional treatments, it is only useful in experimental contexts where worms are freely mobile. Therefore, these tactics lack general applicability to many high resolution imaging experiments, where worms may lack appropriate fluorescent markers or are physically restrained or chemically immobilized. Additionally, not relying on fluorescent markers avoids unnecessary photo-bleaching of the sample before data acquisition and affords robustness against age and condition-specific autofluorescence in the worm body [44].

Bright-field is a commonly available imaging modality and is often used for locating and positioning specimens prior to fluorescent imaging. In order to approach this problem with minimal reliance on specific experimental conditions, we note several consistent morphological differences between the head and the tail of the worm that are observable in bright-field imaging. While the shape of the head and the tail differs somewhat, these differences are difficult to detect due to low contrast and may be physically obscured by some experimental platforms [38]. Instead, the head of the worm is more clearly distinguished by the presence of the pharynx, which has a stereotypical morphology that includes a biological structure for masticating food called the grinder [45]. As shown in Fig 2A, the grinder is a dark, uniquely shaped, high-contrast structure under bright field imaging. The grinder can also be easily resolved by most digital cameras at imaging magnifications above 20X and maintains its shape and integrity for several days of early adulthood [46, 47]. This stereotypical feature of the head, which is relatively consistent in the worm post-developmentally, can thus serve as the target biological structure for our two-layer classification scheme.

The bright-field images used here were acquired in a microfluidic imaging device (Materials and Methods), although similar images acquired on agar pads would also suffice (S1 Fig). Following our architecture in Fig 1B from left to right, application of the scheme involves three major steps: preprocessing of raw images to generate candidates for the structure of interest, selection and calculation of features to describe these candidates at both layers of classification, and optimization and training of the two classifiers based on these feature sets.

We employ the Niblack local thresholding procedure in both this and our subsequent cell identification application to robustly segment particles despite the potential variability in local lighting, texture and background tissue intensity that arises across different imaging setups (Materials and Methods). Following initial thresholding, preliminary filtering of the binary particles is then applied to remove segmented regions that are either too small (less than 37.5 μm²) or too large (greater than 100 μm²) to reduce downstream computation (BW1 in Fig 2B). The remaining particles are processed through our two-layer classification scheme to detect the presence of the pharyngeal grinder.
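
As an illustration, the sketch below removes out-of-range particles from a thresholded binary image. The area bounds come from the text; BW0 is assumed to be the thresholded image (as named among the intermediate outputs in S1 Fig), and converting pixel areas to square microns through the calibration factor C (microns per pixel) is an assumption about how the published values map onto pixel units.

```matlab
% Preliminary size filtering of candidate particles (sketch).
minArea_um2 = 37.5;                          % lower bound quoted in the text
maxArea_um2 = 100;                           % upper bound quoted in the text

cc       = bwconncomp(BW0, 8);               % 8-connected candidate particles
stats    = regionprops(cc, 'Area');          % pixel-unit areas
area_um2 = [stats.Area] * C^2;               % convert to square microns via C

keep = area_um2 >= minArea_um2 & area_um2 <= maxArea_um2;
BW1  = ismember(labelmatrix(cc), find(keep)); % filtered candidate set
```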

In the first layer of classification, intrinsic and computationally inexpensive metrics of the particles are computed and used as features (Fig 2C and S2 Fig) in classification of the grinder shape. These features represent a combination of simple, intuitive geometric features, such as area and perimeter, in addition to higher level measures of the object geometry and invariant moments suitable for shape description and identification [36]. Training and application of a classifier with this feature set eliminates candidates on the basis of intrinsic shape (BW2 in Fig 2C). However, the resulting false positives in Fig 2D show that the information within these shape metrics is insufficient to distinguish the grinder with high specificity.

Specifically, we note that the grinder resides inside the terminal bulb of the pharynx, which is characterized by a distinct circular region of muscular tissue (Fig 2A). Based on this observation, we define second layer features based on distributions of particle properties within a circular region around the centroid of the grinder candidate particle (S3 Fig). Noting that the pharyngeal tissue is characterized by textural ranges in the radial direction and relative uniformity in the angular direction, we build feature sets describing both the radial and angular distributions of the surrounding particles (S3 Fig).

To allow for supervised training of both the layer 1 and layer 2 classifiers, we annotated a selection of images (n = 1,430) by manually identifying particles that represent the pharyngeal grinder. The classifiers can then be trained to associate properties of the feature sets with the manually specified identity of candidate particles. However, in addition to informative feature selection and the curation of a representative training set, the performance of SVM classification models is subject to several parameters associated with the model itself and its kernel function [34, 48]. Thus, to ensure good performance of the final SVM model, we first optimize model parameters based on fivefold cross-validation on the training set (Fig 3A and 3B, Materials and Methods).

In our application, for the first layer of classification, the goal is to eliminate the large majority of background particles while retaining as many grinder particles in the candidate pool as possible for refined classification in the second layer. In other words, we aim to minimize false negatives while tolerating a moderate number of false positives. Therefore, we optimize the SVM parameters via the minimization of an adjusted error rate that penalizes false negatives more than false positives (Fig 3B). We show that with an appropriate parameter selection, the first layer of classification can eliminate over 90% of background particles while retaining almost 99% of the true grinder particles for further analysis downstream (Fig 3B).
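
One way to implement such an asymmetric objective is sketched below. The penalty factor w_fn applied to false negatives is a hypothetical choice, since the paper's exact weighting is defined in its Materials and Methods; during parameter optimization this metric would replace plain accuracy when comparing candidate (C_SVM, γ) pairs across cross-validation folds.

```matlab
% Adjusted error rate that penalizes false negatives more than false
% positives (sketch; w_fn is an illustrative penalty factor).
function err = adjustedError(yTrue, yPred, w_fn)
    fn  = sum(yTrue == 1 & yPred ~= 1);    % grinder particles missed
    fp  = sum(yTrue ~= 1 & yPred == 1);    % background particles retained
    err = (w_fn * fn + fp) / (w_fn * sum(yTrue == 1) + sum(yTrue ~= 1));
end
```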

To visualize feature and classifier performance, we use Fisher’s linear discriminant analysis to linearly project the 14 layer 1 features of the training set onto two dimensions that show maximum separation between grinder and background particles (Fig 3C). A high degree of overlap between the distributions of the grinder and background particles and high error rates associated with the trained SVM in this visualization suggest that shape-intrinsic features are insufficient to fully describe the grinder structure. Nevertheless, the first layer of classification enriches the true grinder structure candidates in the training set from roughly 6.2% of the original particle set to 40% of the particle set entering into the second layer of classification (Fig 3C). This enriched set of candidate particles is used to optimize and train the second layer of classification in a similar manner (Fig 3D). With appropriate parameter selection, we show that the second layer of classification is capable of identifying the grinder with sensitivity and specificity above 95% (Fig 3E). We train the final layer 2 classifier with the reduced training set and these optimized parameters to yield high classification performance in combination with layer 1 (Fig 3F).

Changes in experimental conditions, the genetic background of the worms under study, or changes to the imaging system can cause significant variation in the features and thus degrade the classifier performance due to overfitting that fails to take into account experimental variation (Fig 3). To account for this potential variability, we include worms imaged at different ages and food conditions in the training set of images. To validate the utility and efficacy of the resulting classification scheme in a real-life laboratory setting, we analyze its performance on new data sets that were not used in training the classifier. First, in spite of morphological changes due to experimental conditions (Fig 4A), we show the resulting classification scheme operates with consistently high performance in distinguishing the head and the tail of the worm in the new data sets (Fig 4B). Second, while the training set only includes wildtype worms imaged under different conditions, the morphology and texture of the worm is also subject to genetic alteration (Fig 4C). To see whether our classification scheme can accommodate some of this genetic variability, we validate the classification scheme against a mutant strain (dpy-4(-)) with large morphological changes in the body of the worm (Fig 4C). Finally, changes in the imaging system can alter the digital resolution of biological structures of interest (Fig 4E). We show that the inclusion of a calibration factor adjusting for the pixel-to-micron conversion of the imaging system is sufficient for maintaining classifier operation across a twofold change in the resolution of the imaging system (Fig 4F). Thus, this calibrated classification scheme can be easily adapted to systems with different camera pixel formats via the calculation of a new calibration factor.

Identification of Fluorescently Labeled Cells

Although fluorescent staining or tagging techniques can be used to target structures or molecules of interest, they often cannot offer perfect specificity. Furthermore, biological specimens can also include autofluorescent elements that confound the analysis of fluorescent images. Thus, sifting relevant information from fluorescent images can pose nontrivial image processing problems where background fluorescent objects can have similar intensities or spatial locations.

Existing toolsets permit fluorescent labeling of different genetic outputs of subsets of cells and tissues. However, fluorescent tags also often label multiple cells, cellular processes or tissue structures that must be distinguished to address specific biological questions. Moreover, C. elegans exhibits significant gut autofluorescence that varies in intensity and can obscure the identification of fluorescent targets throughout the length of the worm [44]. Here, we demonstrate the use of our framework to address these common challenges in fluorescent image processing, using neuron identification in the worm as a broadly useful example.

Fig 5B shows a corresponding set of bright field and fluorescent images illustrating the positioning of the neuron pair within the head region of the worm. In addition to the cell bodies of interest, the raw fluorescent image also shows cellular processes and autofluorescent granules in the gut of the worm that can confound cell-specific image analysis. Similar to our approach for pharyngeal grinder detection in Fig 2B, we begin building our cell identification toolset via preprocessing of the raw images by maximum intensity projection, Niblack thresholding and preliminary filtering of the resulting candidate particles (Fig 5C, Materials and Methods). In the selection of features for both layers of classification, we note that the layer 1 feature set we developed for the detection of the pharyngeal grinder can be generally applied to the description of particle shape within other contexts (S2 Fig). Using this feature set, we optimize and train a layer 1 SVM classifier using a manually annotated training set (n = 218) (S4A Fig, Materials and Methods) and show that it is sufficient for identifying cellular regions with relatively high sensitivity and specificity (Fig 5D and S4A Fig).

To make a final identification of a true cell pair, we apply a second layer of classification based on the relational properties of potential pairs of particles that pass layer 1 classification (Fig 6B and S5 Fig). To construct our layer 2 classifier, we optimize and train an SVM model based on these pairwise relational features (S4B Fig). We note that while the relational features we utilize are computationally simple, embedding relational features in the second layer of classification dramatically reduces the size of the paired candidate set. For example, for detection of cell pairs amongst n particles, there are C(n, 2) = n(n − 1)/2 possible candidate pairs that require feature calculation. Validating the resulting cell pair classifier against new test images, we find robust single cell-pair detection in the majority of cases (Fig 6C, left). However, in a minority of cases, multiple candidate pairs are identified as potential neuron pairs in each image (Fig 6C, right). This is a common scenario, as many promoters used in transgene markers are not necessarily specific to a single class of cells. In this case, the probability estimates from the SVM classifier [37, 49], along with the selection of the most likely candidate in images with multiple positive classification results, are used to eliminate these false positives.
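
A sketch of how this second layer could be assembled is shown below: candidate pairs are enumerated from the layer 1 survivors, classified with probability estimates enabled, and the most probable positive pair is retained. The helper computePairFeatures and the variables candidates, keep1 and svm2 are hypothetical stand-ins for the features and models described in the text; svmpredict and the model Label field follow the LIBSVM MATLAB interface, and svm2 is assumed to have been trained with probability estimates ('-b 1').

```matlab
% Layer 2 pair classification with probability-based conflict resolution (sketch).
idx   = find(keep1);                        % layer 1 survivors
pairs = nchoosek(idx, 2);                   % all n*(n-1)/2 candidate pairs

fPair = computePairFeatures(candidates, pairs, C);        % hypothetical helper
[lbl, ~, prob] = svmpredict(zeros(size(pairs, 1), 1), fPair, svm2, '-b 1');

pos = find(lbl == 1);                       % pairs classified as positive
if ~isempty(pos)
    probPos   = prob(pos, svm2.Label == 1); % probability of the positive class
    [~, best] = max(probPos);               % keep only the most likely pair
    bestPair  = pairs(pos(best), :);
end
```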

This boosts the specificity of the classifier without compromising the high sensitivity (Fig 6D). This additional step incorporates the real-world constraint that, at most, one cell pair exists in each valid image and resolves any conflicts that may arise in direct classification.

In this case, the specificity offered by the ins-6 promoter is insufficient for full cell-specific readouts, requiring the identification of different cells from the raw fluorescent image. Taking advantage of our modular two-layer architecture, we reuse the preprocessing and first layer classification tools that we have already constructed to identify a small number of cell-shaped objects shown in Fig 7B. To detect the tetrad of cells with specificity for the ASI and ASJ neurons, we construct a relational feature set based on combinations of neuron pairs (S6 Fig). As shown in Fig 7C, accounting for both correct cell pair identification and the correct distinction between the ASI and ASJ pairs, there are 6·C(n, 4) possible tetrad sets that require feature calculation. Our two-layer architecture is therefore essential for the construction of such relational feature sets with larger numbers of targets. Without layer 1 classification, description of such complex sets quickly becomes intractable: even 10 candidate particles generates 1,260 different possible tetrad sets for feature calculation. To construct a new problem-specific layer 2 classifier based on relationships within these tetrad candidates, we optimize and train an SVM model based on a manually annotated training set (n = 324) (S4C Fig).
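
The combinatorics quoted above can be checked directly: choosing four particles, splitting them into two pairs (three ways) and assigning which pair is ASI and which is ASJ (two ways) gives six labeled tetrads per four-particle subset.

```matlab
% Number of labeled tetrad candidates among n particles:
% choose 4 particles, partition into two pairs (3 ways), label ASI/ASJ (x2).
nTetrads = @(n) 6 * nchoosek(n, 4);

nTetrads(10)   % = 1260, matching the figure quoted in the text
nTetrads(4)    % = 6, once layer 1 has reduced the pool to a handful of cells
```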

Subsequent validation of our two-layer classifier against new test images shows that the two-layer classification scheme operates with higher specificity but lower sensitivity in comparison to our single cell-pair classification problem (Fig 7D). Further analysis of the classifier performance within the test set of images shows that this lower sensitivity is mainly due to the greater degrees of freedom for variability associated with this particular image processing problem. As shown in Fig 7E, while the second-layer classifier accommodates some deviation from the stereotypical arrangement of the neurons shown in Fig 7A (positive identification on the left), there is a tradeoff between maintaining specificity and sensitivity (rejecting larger deviations, as illustrated by the negative identification on the right). If stringency is important, i.e., maintaining specificity and reducing the misidentification rate, users would have to tolerate a small number of false negatives. Users would need to determine an acceptable rejection rate for each specific problem to tune the classifier.

Discussion

Using our pipeline, we have developed two specific solutions addressing common image processing problems for the C. elegans community. Our contribution of a ready-to-use head-versus-tail classification scheme under bright-field imaging enables automated high-resolution imaging and stimulus application in a large range of biological experiments in the worm. Our neuronal cell pair identification application forms the basis for approaching the general problem of cell-specific information extraction within a multicellular context such as the worm. Together, these specific tools permit automated visual dissection of the multicellular worm at different resolutions that range from the targeting of rough anatomical regions to cell-specificity.

The detection of the pharyngeal grinder demonstrates a general class of problems where discrete structures are distinguished by both their intrinsic shape and the characteristics of their local environment. The entire framework, including the feature sets, developed and documented for this problem can be applied to the recognition of other discrete structures including subcellular organelles such as nuclei, specific cell types and tissue structures. The detection of single and multiple cell pairs extends the analysis to stereotypical formations of objects. The feature sets documented here for analyzing paired objects are directly applicable to the analysis of many symmetrical structures that arise in biology, such as in the nervous system. Moreover, with some modification, similar features can be applied to the analysis of different patterns that may arise in specific biological processes such as development. Finally, the preprocessing modules developed for these two applications demonstrate the ability to segment out objects of different intensities from both bright-field and fluorescent imaging and are applicable to many other problem sets.

In the detection of the pharyngeal grinder, the two-layer construction eliminated the need to compute a large set of regional descriptors by associating them with the second layer of classification and therefore a smaller candidate set. In comparison with direct calculation of all features in a single layer of classification, the two-layer architecture employed in this work reduced average total computational time by a factor of two (S7 Fig). In cell identification, reserving relational properties for the second layer of classification dramatically reduced the number of pairs or sets for which relationships must be described. In this case, the computational time savings associated with the two-layer architecture increase with the complexity of the second-layer relationships and can result in large, roughly sixfold speed improvements in the case of two cell pair identification (S8 Fig). In addition to these computational benefits, our two applications also demonstrate that the segregation of intrinsic and secondary or extrinsic properties of a structure onto two layers of classification reserves many problem-specific features for the second layer and renders the first layer feature set generalizable. In addition, we have demonstrated that by incorporating a calibration factor to normalize feature calculation, these classifiers can be adapted to different optical systems and sensor configurations with only the modification of the calibration factor itself (Fig 4E and 4F and S1 Table).

This approach imposes user-defined structure onto the data-extraction problem and promotes familiarization with the conditions of and fundamental limitations on the information content of imaging datasets [29, 51]. Moreover, having a small set of manually annotated images allows for the assessment of the reliability of the final analysis [29]. Thus, the user exercises control over the higher level problem structure, including the formulation of the overall classification question, the choice of the type of candidates and the features used. However, to constrain the construction of the solution, we present a specific workflow with integrated computational techniques that bypass much of the manual guesswork. Annotation and calculation of quantitative descriptors about particle or pixel candidates captures multivariate information about different structures. The use of this multivariate information with a classification model such as SVM obviates the need for manually assessing rectilinear thresholds for classification. Moreover, the performance of our classifiers demonstrates that the potentially nonlinear, multidimensional classification provided by SVM proves more powerful than rectilinear thresholding of individual features or dimensionality reduction techniques (Fig 3C and 3F). Overall, our proposed methodology provides a pipeline that streamlines and formalizes the image processing steps after the annotation of a training set.

Finally, while the utility of our framework requires feature selection and training for each particular application, the modularity and architecture of our framework permit aspects of the specific tools we have developed here to be reused. In general, the construction of our classification scheme affords layer 1 classifiers more general applicability. For example, we have demonstrated the generalizability of our layer 1 feature set for binary particle classification with retraining for the identification of different shapes. The layer 1 classifier constructed in our cell identification scheme can also be reused for the classification of different downstream cellular arrangements. Even for the second layer of classification, where feature sets are problem-specific, we have provided examples of both regional and relational feature set constructions that can form the basis of feature sets for other problems.

Conclusion

For instance, we consider our scheme to be a generalization of the previously reported application of SVMs towards the understanding of synaptic morphology in C. elegans [24]. In this application, individual pixels within the image form the pool of candidates for potential synaptic pixels in the first layer of classification. The second layer of classification then refines this decision on the basis of relational characteristics between candidates. Here, we formalize this classification approach and demonstrate that it can be adapted towards the detection of disparate structures imaged under different imaging modalities. The image processing approach we present here has inherent structural advantages in terms of conceptual division, modularization and computational efficiency and demonstrates the application of a powerful supervised learning model to streamline biological image processing. We thus envision that our methodology can form the basis for detection algorithms for structures ranging from the molecular to the tissue or organismal level under different experimental methodologies.

Materials and Methods

Worm Maintenance and Culture

Briefly, populations of worms were allowed to reach reproductive maturity and lay eggs on NGM agar media overnight. Age-synchronized worms were then obtained by washing free-moving worms off of the agar plate, allowing the remaining eggs to hatch for one hour and then washing the resulting L1 stage larvae off of the plate. Age-synchronized L1 worms were then transferred onto new NGM plates seeded with OP50 E. coli bacteria as a standard food source and grown until the desired age for imaging. To avoid overcrowding and food depletion, adult worms were transferred onto new plates daily. For starvation experiments, worms were transferred onto fresh NGM plates lacking a bacterial food source the day before imaging.

Microfluidics and Image Acquisition

For automated imaging, worms are washed off of NGM plates using S Basal buffer and introduced via pressure injection into the microfluidic device. Sequential activation of pressure sources driving liquid delivery and on-chip pneumatic valves is then used to drive individual worms within the device for imaging.

Relevant specifications and calibration metrics for these setups can be found in S1 Table. Although not strictly necessary, for generalizability in cases where the center of focus is adjusted to specific fluorescent targets and does not capture the pharynx well, a sparse three-plane z-stack with a 15 μm step size is used for bright-field image acquisition. To fully capture neuronal cells, a dense z-stack was collected through the body of the worm. For fluorescence imaging of the single neuron pair in QL296, a 0.4 μm step size was used over a 60 μm thick volume. For fluorescence imaging of multiple neuron pairs in QL617, a 1 μm step size was used over a 100 μm thick volume.

Image Analysis and Computational Tools

We use custom MATLAB code to perform all image preprocessing and feature extraction steps and to enable the construction and testing of our classification schemes. In preprocessing, the three-dimensional information in the acquired z-stacks was either maximum- or minimum-projected onto a single two-dimensional image for further processing. For bright-field images, a minimum projection with respect to z was utilized to accentuate the appearance of dark objects throughout the stack. Conversely, for fluorescence images, a maximum projection was utilized to accentuate the appearance of bright objects throughout the stack. In order to generate binary particles for classification, we use a local thresholding algorithm that uses information about the mean and variability of pixel intensities within a local region around a pixel:

T(x_i, y_j) = μ_local(x_i, y_j) + k · σ_local(x_i, y_j),

where μ_local and σ_local are the mean and standard deviation of all pixel values that fall within a square region of width 2R + 1 centered around the pixel of interest (x_i, y_j), and k is a parameter specifying the stringency of the threshold. μ_local and σ_local can be derived using standard image filtering with a binary square filter h(x_i, y_j) of width 2R + 1. Using local mean and standard deviation information in the binary decision affords robustness against local background intensity and texture changes. The width of the local region, R, can be roughly selected on the basis of the size scale of the structure of interest. In accordance with the size scales of the pharyngeal structure and individual neurons, we use R = 15 μm for detection of the pharyngeal grinder and R = 5 μm for fluorescent cell segmentation.
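
The local thresholding step can be implemented with standard image filtering; a minimal sketch is shown below. It uses the textbook Niblack form T = μ_local + k·σ_local; the comparison direction for keeping structures (and how dark bright-field objects would be handled) is an assumption, and R and k are taken to be already converted into pixel units via the calibration factor.

```matlab
% Niblack-style local thresholding (sketch). I is a double grayscale image,
% R the local window half-width in pixels, k the threshold stringency.
function BW = localThreshold(I, R, k)
    w        = 2*R + 1;
    h        = ones(w) / w^2;                        % normalized square filter
    muLocal  = imfilter(I, h, 'replicate');          % local mean
    mu2Local = imfilter(I.^2, h, 'replicate');       % local mean of squares
    sigLocal = sqrt(max(mu2Local - muLocal.^2, 0));  % local standard deviation
    T        = muLocal + k * sigLocal;               % Niblack threshold surface
    BW       = I > T;                                % keep bright structures
                                                     % (assumed convention; for
                                                     % dark structures the
                                                     % inequality would flip
                                                     % around the local mean)
end
```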

We use k = 0.75 for our bright field application and k = 0.85 for our fluorescence application. Individual candidate particles in the resulting binary image are defined as groups of nonzero pixels that are connected to each other via any adjacent or diagonal pixel (8-connected). We note that changes in k can alter the size of segmented particles and the connectivity of segmented particles. Particularly in bright field, where the contrast mechanism lacks specificity, decreases in k can cause particles to merge via small bridges of dark texture. In order to build in some robustness against changes in k and background texture in these scenarios, we perform a modified morphological opening operation after thresholding to remove small bridges that may arise between otherwise distinct particles. To do this, we perform a morphological erosion with a small circular structuring element followed by a morphological dilation with a smaller structuring element [53].
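
A minimal sketch of this modified opening is shown below; the disk radii are illustrative assumptions and would in practice be chosen relative to the calibration factor.

```matlab
% Modified opening: erode with a small disk to break thin bridges between
% particles, then dilate with a slightly smaller disk so merged particles
% are not fully restored to their bridged state (radii are assumptions).
seErode  = strel('disk', 3);
seDilate = strel('disk', 2);
BWopened = imdilate(imerode(BW, seErode), seDilate);
```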

The first layer, which delineates structures of interest from other structures on the basis of their intrinsic geometric properties, is generally applicable to particle classification problems and is used for both the bright field and fluorescent structure detection outlined here. Details and equations for the calculation of the 14 features for layer 1 classification can be found in S2 Fig. Secondary characteristics of biological structures describe the context in which structures exist and their relationship to other structures. Due to the large variability in the secondary characteristics of biological structures, a generic set of features is not necessarily attainable or desirable due to concerns for computational efficiency. Rather, secondary features can be derived via a mathematical description of empirical observations of important structural properties. In the case of pharyngeal grinder detection, the secondary features are regional, forming a description of the image context in which the grinder structure resides. The form of the features is based on an empirical understanding of this structural context, and full details and equations for the calculation of the 34 features in layer 2 of the bright field classifier can be found in S3 Fig. In the case of cell pair detection, the secondary features are mostly relational, describing how particles from layer 1 of classification may or may not exist as pairs on the basis of both positioning and intensity. Second layer features for cell pair detection can be found in S5 Fig. We briefly note that we scale all calculated features using a calibration factor, C, derived from specifications of both the optics and sensors that form the imaging system (i.e., the micron-per-pixel scale of the acquired images). The use of this calibration factor renders the trained classifier relatively invariant to small changes in the imaging setup via conversion of all features into real units. Calibration factors for all imaging systems and configurations used here can be found in S1 Table.
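
For illustration, the sketch below shows one common way such a calibration factor could be computed and applied. The exact expression used by the authors is given in their S1 Table, so the sensor pixel size, magnification and binning values here are assumptions; BW1 is assumed to be a binary candidate image from the preprocessing step.

```matlab
% Calibration-based feature scaling (sketch with assumed optics values).
pixelPitch_um = 6.5;                            % camera sensor pixel size (assumed)
magnification = 40;                             % total optical magnification (assumed)
binning       = 1;                              % camera binning factor (assumed)
C = pixelPitch_um * binning / magnification;    % microns per image pixel

% Lengths scale by C and areas by C^2, so features computed in pixel units
% become comparable across different imaging setups:
stats        = regionprops(BW1, 'Area', 'Perimeter');
area_um2     = [stats.Area] * C^2;
perimeter_um = [stats.Perimeter] * C;
```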

To implement discrete classification steps using support vector machines, we use the LIBSVM library, which is freely available for multiple platforms including MATLAB [37]. For general performance, we use a Gaussian radial basis function kernel for all of our trained classifiers [48]. To ensure performance of the SVM model for our datasets, we optimize the penalty or margin parameter, C_SVM, and the kernel parameter, γ, for each training set using the fivefold cross-validation performance of the classifier as the output metric. For efficient parameter optimization, we start with a rough exponential grid search (Fig 3B and 3D and S4 Fig) and refine parameter selection with a finer grid search based on these results. To adjust for the relative proportions of positive and negative candidates in unbalanced training sets (Fig 3C), we also adjust the relative weight, W, of the classes according to their representation in the training set while training [37]. Additionally, we perform a small grid search for the optimal weighting factor to fully optimize the adjusted error metric used for classifier evaluation. Probability estimates for single and multiple neuron pair identification are derived according to the native LIBSVM algorithm [37]. For visualization of the high-dimensionality feature sets (Fig 3C and 3F), we apply Fisher’s linear discriminant analysis [54]. The two projection directions are chosen to be the first two eigenvectors of S_W^{-1} S_B, where S_B is a measure of interclass separation and S_W is a measure of intraclass scatter.
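
A sketch of this optimization using the LIBSVM MATLAB interface is shown below. The grid ranges, the class weight value and the use of LIBSVM's built-in cross-validation accuracy as the selection metric are simplifying assumptions; in the paper, the adjusted error rate described in the Results would be computed from per-fold predictions instead. trainLabels and trainFeatures are assumed to hold the annotated training set.

```matlab
% Exponential grid search over RBF-SVM parameters with LIBSVM (sketch).
bestAcc = -inf;
for log2c = -5:2:15
    for log2g = -15:2:3
        opts = sprintf('-t 2 -c %g -g %g -w1 %g -v 5 -q', ...
                       2^log2c, 2^log2g, 5);   % RBF kernel, weighted positive class
        acc = svmtrain(trainLabels, trainFeatures, opts);  % returns CV accuracy
        if acc > bestAcc
            bestAcc = acc; bestC = 2^log2c; bestG = 2^log2g;
        end
    end
end

% Retrain on the full training set with probability estimates enabled.
finalOpts = sprintf('-t 2 -c %g -g %g -w1 %g -b 1 -q', bestC, bestG, 5);
model = svmtrain(trainLabels, trainFeatures, finalOpts);
```

A finer grid around (bestC, bestG) would then refine the selection, mirroring the coarse-to-fine search described above.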

Supporting Information

S1 Fig. a, b and c show three representative images of day 2, well-fed adult worms acquired using standard agar pad imaging techniques. The intermediate outputs of grinder detection (MP, BW0, BW1, BW2, BW3) show the minimum-projected image, the binary image after thresholding, the initial particle candidate set, the candidate set after the first layer of classification and the final particle set after the second layer of classification, respectively. The same process developed for head versus tail analysis on the microfluidic chip robustly identifies the grinder structure in these conventionally acquired images. (TIF)

S2 Fig. a) Table of 14 features for binary shape description including low-level geometric descriptors, more complex derived measures of geometry and invariant moments. b) Diagram of binary particle indicating variables used for feature definition. c) Illustration and example of defining and calculating the perimeter of an irregular particle based on pixel connectivity. d) Illustration and example of the convex hull of a binary particle. (TIF)

S3 Fig. a) Diagram of the region of interest around a grinder particle showing changes in texture and particle density along radial partitions. b) Diagram of the region of interest around a grinder particle distinguishing individual particles using different colors and showing particle distributions along angular partitions. c) Table of 34 features used to describe regional characteristics of the grinder particle for the second layer of classification. (TIF)

S4 Fig. Optimized parameters for the first layer classifier (a), the second layer single pair classifier (b) and the second layer two pair classifier (c) show considerable variability, reinforcing the need for case-specific parameter optimization. (TIF)

S5 Fig. a) Maximum intensity projection (MP) and binary image (BW2) showing candidate particles after the first layer of classification with relevant axes and regions labeled. b) Identification of possible pairs for feature calculation and schematic of an example feature set for one pair. c) Table of the four relational features used to describe cell pair patterns. (TIF)

S6 Fig. a) Maximum intensity projection (MP) and binary image showing candidate particles after layer 1 classification (BW2) with relevant axes and regions labeled. b) Enumeration of the possible neuron pairs and the possible sets of neuron pairs with correct distinction between the ASI and ASJ pairs. c) Schematic showing the frame of reference (XC, YC) for the calculation of the relative location of each neuron and the intensities of the neurons within two particular sets. d) Table showing that 6 properties are calculated for each neuron pair, resulting in a total of 12 relational features to identify the tetrad of neurons. (TIF)

S7 Fig. Computational time savings associated with two-layer classification architecture for head versus tail detection. a) Schematic comparisons of the two-layer, serial classification architecture employed in this work and an equivalent single-layer, parallel classification architecture used for time comparisons. b) Comparison of process-specific and total time requirements for the two-layer and equivalent one-layer architectures. Reducing second-layer feature calculations using the two-layer scheme results in over a twofold reduction in total classification time. All times are based on performance on MATLAB 2013b running on a quad-core processor at 3.50 GHz. (TIF)

S8 Fig. Computational time savings associated with two-layer classification architecture for cell identification. a) Comparison of process-specific and total time requirements for the two-layer and equivalent one-layer architectures when applied to single neuron pair detection. b) Comparison of process-specific and total time requirements for the two-layer and equivalent one-layer architectures when applied to the identification of two distinct neuron pairs. All times are based on performance on MATLAB 2013b running on a quad-core processor at 3.50 GHz. (TIF)

S1 Table. Calculation of the calibration metric for common changes in the imaging system and acquisition parameters. Setups used for this study are highlighted.

Acknowledgments

The authors would like to gratefully acknowledge Brad Parker and Jeffrey Andrews for machining hardware necessary for this work, and Dhaval S. Patel for critical commentary on the manuscript.

Author Contributions

Performed the experiments: MZ. Analyzed the data: MZ. Contributed reagents/materials/analysis tools: EVE AC DAFdA QC. Wrote the paper: MZ MMC QC HL. Designed the software used in analysis: MZ MMC.
