John Arne Nesheim

Contact information

Please contact John Arne Nesheim for more information

Nucleotyping

The Nucleotyping project is based on the concept of genome instability in cancer, in line with today's understanding and theories of cancer development. Nucleotyping classifies a tumor based on qualitative and quantitative analyses of DNA and chromatin structure in the tumor cell nuclei using advanced high-resolution digital image analysis.

Illustration describing the Nucleotyping project

Background

The organization of DNA in the cell nucleus is an extremely complex three-dimensional architecture comprising a number of different structures, from the simple sugar-phosphate backbone to the highly condensed mitotic chromosomes.

Chromatin is the complex combination of DNA, RNA, and protein that makes up chromosomes. It is divided between heterochromatin (condensed) and euchromatin (extended) forms. The major components of chromatin are DNA and histone proteins, although many other chromosomal proteins are present and have distinct roles. The functions of chromatin are to package DNA into a smaller volume to fit in the cell (about 1.8 meters of DNA into a nucleus with a diameter of 6 µm), to enable mitosis and meiosis, and to serve as a mechanism for controlling gene expression and DNA replication. Changes in chromatin structure are brought about by DNA-binding proteins and by chemical modifications such as methylation and acetylation.

Normal differentiation relies on a stringent transcription control system with a continual process of activation and deactivation of the different genes. It is generally recognised that chromatin structure regulates DNA function within the cell, and the overwhelming evidence for rearrangements of the genetic material in tumours, combined with the present knowledge of chromatin structure and function, supports the view that cancer is also a disease of DNA organisation.

Chromatin may be studied on many different levels. Our approach is to study chromatin as it is visualized in interphase nuclei under a microscope (chromatin literally means coloured material). Our methods are based on computer-assisted image analysis, where images of cell and tissue samples are transferred from the microscope (light, laser and electron microscopy) to the computer. The digital images of chromatin permit us to study the structure and organization of DNA bit by bit by use of advanced texture analysis. We refer to these methods as Nucleotyping.

History

Nucleotyping is an ambitious project, both with regard to software and systems development and in its scientific expectations. The project has a long history, but gained renewed focus from the fall of 2004 after the establishment of the Institute for Cancer Genetics and Informatics. Since then, multiple applications have been created, and a system for production of data has been set up. By “production of data” we mean the entire process from scanning of sections with whole slide scanners and light microscopes to calculation of properties for each individual nucleus, and viewing of these on the corresponding whole slide images. There are also several sub-projects, such as automatic segmentation and new texture methods. Several large clinical materials have been collected and prepared in the Cytometry lab, and in the period 2009-2011 all these materials will be analyzed with our new methods and tools.

The methodology of Nucleotyping was first developed by applying high resolution image analysis on transmission electron microscopy (TEM) images. At the same time, several new textural features were developed in a collaboration project with the image analysis group at the Department of Informatics at the University of Oslo. When these features were used to analyse the chromatin structure of tumour nuclei, remarkable results were obtained, as we were able to automatically distinguish between nuclei from normal tissue, regenerating tissue, benign tissue and malignant tissue. It also became clear that this method had strong diagnostic and prognostic potential in cancer.

However, TEM was a very time consuming and expensive method, and hence had little potential for practical use in routine diagnostics. At that time we were also developing a new system for DNA ploidy measurements based on image cytometry of nuclear suspensions. The nuclei preparation was based on established methods for monolayer production and the Feulgen staining technique. The methods of Nucleotyping were eventually transferred to the same platform, and we were able to conduct some limited retrospective studies in advanced prostate cancer with very good results (1). At that time the method was based on manually segmenting each of a hundred single nuclei. It was still a very time consuming method with a lot of manual work and far from optimal computing time. However, the concept was proven, and it was decided to invest the necessary time and resources in developing the methodology and a system that could run large trials and eventually be put to work in routine diagnostics.

For the next five years the focus was on the development of a high resolution image system for automatic cytometry, with emphasis on diagnostic DNA ploidy. This system is now well established in the routine diagnostics at our hospital, where we analyse samples from around 1000 patients each year for routine diagnosis and prognosis of cancer.

The Nucleotyping system that we are now entering the final phase of developing can make use of, but is not restricted to, this cytometry platform. The platform is now automatic, but based on monolayer (single nuclei) specimens. We are now in the process of finalising a platform that can handle routine histological sections, which will certainly also be easier to introduce into the clinical routine.

During the last 10 years we have also carried out a large amount of basic research in texture analysis, developing a number of new features and methods that have improved the use of Nucleotyping.

Overview

The Nucleotyping project is based on the concept of genome instability in cancer, in line with today's understanding and theories of cancer development. Nucleotyping classifies a tumor based on qualitative and quantitative analyses of DNA and chromatin structure in the tumor cell nuclei using advanced high-resolution digital image analysis.

Nucleotyping provides, among other things, an objective assessment of nuclear atypia, one of the most important features in histopathological diagnosis and prognosis. In parallel with the qualitative assessment of chromatin structure, the total amount of chromatin or DNA (depending on the applied staining) is also measured, thus giving the DNA ploidy as well as more traditional nuclear morphometry. The method is also sensitive to larger chromosomal aberrations. Of even greater importance is the method's ability to map and quantify functional changes in DNA organization. Such changes are, to a large extent, sub-visual and are therefore not detected by traditional microscopy. Nucleotyping might be described as interphase cytogenetics, or an interphase version of karyotyping, where organisational and functional domains of DNA are mapped and described.

The project has two main goals

High resolution image analysis

Nuclear images are digitized into a two-dimensional physical image which is divided into small regions called pixels (picture elements). The most common sampling scheme is a rectangular sampling grid, so the image is divided into horizontal lines of adjacent pixels. The Nyquist-Shannon sampling theorem requires that we acquire at least two such pixels for each period of the smallest periodic structure in the original image. Each pixel has an integer location or address (line or row number and sample or column number), and a quantized integer pixel value representing the integrated intensity of light measured over the area of the pixel in the object (see figure).

This measured intensity may in light microscopy be proportional to the amount of light transmitted through a specimen, or to the intensity of fluorescence in the sample. In imaging of monolayers, the 2-D (x,y)-image is a matrix of projections of square sections through the 3-D specimen. In imaging of thin slices, the same is true, but since the specimen is relatively thin, the averaging effect in the z-direction is much smaller. If we want a 3-D representation of a cell, we may digitize a complete set of thin slices and co-register the digital images. Alternatively, we may use confocal imaging, which facilitates imaging of optical planes without physically slicing the cell. In both cases we put the pixel values into a 3-D matrix of image voxels (volume elements).

The wavelength of the measured light offers an additional data dimension. The most common application of this is the three-channel RGB color image obtained by using three sets of sensors that are sensitive to red, green and blue light. Using a number of color filters and detectors with limited spectral sensitivity, one may capture a multispectral set of digital images, each at a specific wavelength and bandwidth, e.g. matching the spectral properties of certain stains used. Increasing the number of spectral bands, hyperspectral imaging may record a full spectrum for each pixel in the image. Time also offers an additional dimension for the image matrix: successive digital images taken at equidistant time intervals may be very useful for studying dynamic processes, and such image sequences may be regarded as a matrix of pixels having a temporal dimension. A digital image may therefore be up to five-dimensional (three spatial dimensions, a spectral and a temporal dimension).

In this project, we limit our discussion to monochrome 2-D images. This means that we assume that the images have only one value per pixel. The pixel value represents the average intensity of the pixel area in the image, on a scale from 0 to e.g. 255 or 4095, rounded off to the nearest integer. As the pixel values go from 0, corresponding to black, to (2^b − 1), corresponding to full intensity, we call this the gray value, and we need b bits to store each pixel value: 8 bits for 0–255, 12 bits for 0–4095. The number of gray levels is often referred to as the gray scale resolution.

The optical resolution is defined as the minimum distance between two point signals of equal intensity that can be perceived as two separate signals. A point source is imaged as an Airy disk with a surrounding ring pattern. The radius to the first dark ring (minimum) of the Airy pattern in the lateral direction (x and y) defines the 2-D point spread function (PSF). The Rayleigh criterion states that two point signals are resolved when the first minimum of one Airy disk is aligned with the central maximum of the second Airy disk. The theoretical resolution for a light microscope may be calculated as r_Airy = 0.61λ/NA laterally and z_min = 2λn/NA² axially, where λ is the wavelength, n is the refractive index of the immersion medium and NA is the numerical aperture. Since NA enters squared in the axial expression, a high numerical aperture is more important than a short wavelength for high axial resolution.
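
As a worked illustration of these formulas, the short Python sketch below computes the lateral Airy radius, the axial resolution and the largest pixel size permitted by the Nyquist criterion. The wavelength, immersion refractive index and numerical aperture are assumed example values, not a description of any particular instrument used in the project.

# Illustrative calculation of optical resolution and Nyquist-limited pixel size.
# All numbers are assumed example values.

wavelength_um = 0.546          # green light, in micrometres
n_immersion = 1.515            # refractive index of the immersion oil
NA = 1.40                      # numerical aperture of the objective

# Rayleigh criterion, lateral: r_Airy = 0.61 * lambda / NA
r_airy_um = 0.61 * wavelength_um / NA

# Axial resolution: z_min = 2 * lambda * n / NA^2
z_min_um = 2.0 * wavelength_um * n_immersion / NA ** 2

# Nyquist-Shannon: at least two pixels per period of the smallest resolvable
# structure, i.e. the pixel size in the specimen plane should not exceed r_Airy / 2.
max_pixel_um = r_airy_um / 2.0

print(f"Lateral resolution (Airy radius): {r_airy_um:.3f} um")
print(f"Axial resolution:                 {z_min_um:.3f} um")
print(f"Max pixel size (Nyquist):         {max_pixel_um:.3f} um")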

The lateral resolution, in µm in the focal plane, will depend on both the magnification and the numerical aperture. The sampling of the image must be performed in accordance with this, using the Nyquist sampling theorem. Thus, images from different experiments may have very different pixel sizes. Typical pixel sizes reported in the papers reviewed here are in the range 0.05 to 0.5 µm, but all too often this information is lacking. Documenting the actual pixel size in a particular study is essential for the evaluation and intercomparison of the obtained results.

The effects of uneven illumination and artifacts in the focal plane are easily removed from digital images using background images in a shading correction. Removal of random image noise is usually handled by digital filtering, using either linear convolution filters (e.g. Gaussian), nonlinear filters (e.g. median), or hybrid filters. Before venturing into an irreversible filtering of the images, one should investigate the amount of noise and whether the noise is additive or multiplicative, e.g. by inspecting the standard deviation versus the mean value within small windows throughout the image.
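
The inspection of noise character mentioned above can be illustrated with a small numpy sketch: for additive noise the local standard deviation stays roughly constant as the local mean varies, whereas for multiplicative noise it grows with the mean. The window size and the synthetic test image below are arbitrary assumptions used only to make the example self-contained.

import numpy as np

def local_mean_vs_std(image, win=7):
    """Collect (mean, std) pairs from non-overlapping win x win windows."""
    h, w = image.shape
    stats = []
    for i in range(0, h - win + 1, win):
        for j in range(0, w - win + 1, win):
            block = image[i:i + win, j:j + win]
            stats.append((block.mean(), block.std()))
    return np.array(stats)

# Synthetic example: a smooth intensity gradient with additive Gaussian noise.
rng = np.random.default_rng(0)
gradient = np.tile(np.linspace(50, 200, 256), (256, 1))
noisy = gradient + rng.normal(0, 5, gradient.shape)

stats = local_mean_vs_std(noisy, win=7)
# A correlation near zero suggests additive noise; a clearly positive
# correlation between mean and std suggests a multiplicative component.
corr = np.corrcoef(stats[:, 0], stats[:, 1])[0, 1]
print(f"correlation(mean, std) = {corr:.2f}")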

The windows used for noise filtering, and for the estimation of local image features (e.g. first-order statistics like local standard deviation and local entropy), are usually odd-sized squares (3×3, 5×5, ...). Clearly, the results will depend both on the window size and on the actual pixel size. This should also be kept in mind when comparing results from an analysis of runs of adjacent pixels having similar gray levels. The morphometric features that measure the size and shape of the nucleus or chromatin structures must be calibrated by the actual pixel size. The densitometric features that measure the distribution of optical density or gray level of the pixels are less influenced by the geometrical resolution, but much more influenced by the choice of gray scale resolution. The more complex and exciting textural features, which give information about the combined aspect, the spatial arrangement of the pixel gray levels, may clearly be influenced by both the geometrical resolution and the gray scale resolution.
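
To make the dependence on window size and gray scale resolution concrete, the sketch below computes two of the first-order local features mentioned above, local standard deviation and local entropy, in an odd-sized sliding window. The window size and the number of gray levels are assumed parameters, not the values used in our production system.

import numpy as np

def local_features(image, win=5, levels=16):
    """Local standard deviation and local entropy in a sliding win x win window.

    The image is first requantized to `levels` gray levels; pixels closer to
    the border than half the window are left as zero. `win` must be odd.
    """
    assert win % 2 == 1
    q = np.floor(image / image.max() * (levels - 1)).astype(int)
    half = win // 2
    h, w = q.shape
    std_map = np.zeros((h, w))
    ent_map = np.zeros((h, w))
    for i in range(half, h - half):
        for j in range(half, w - half):
            block = q[i - half:i + half + 1, j - half:j + half + 1]
            std_map[i, j] = block.std()
            hist = np.bincount(block.ravel(), minlength=levels)
            p = hist[hist > 0] / hist.sum()
            ent_map[i, j] = -(p * np.log2(p)).sum()
    return std_map, ent_map

# Example usage on a random test image:
rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(64, 64)).astype(float)
std_map, ent_map = local_features(img, win=5, levels=16)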

Texture analysis

Visual textures are spatially extended patterns of more or less accurate repetitions of some basic elements, called texels. In digital images each texel usually contains several pixels, and its characteristics and placement can be regular, quasi-regular or random. Natural textures, e.g. biomedical textures, are generally more random, but may well contain strong elements of a deterministic or quasi-regular nature. Some natural textures from the Brodatz album are sufficiently similar to one another to constitute a challenge for texture analysis methods, and also sufficiently similar to nuclear chromatin texture to be relevant in the present context.

Texture methods can be broadly grouped into statistical methods and structural methods. Pixel-based statistical approaches to texture analysis are considered to be generally applicable and work well for the natural textures present in images. On the statistical level a texture can be defined by a set of statistics extracted from a large ensemble of local image properties. These statistics may range from simple first-order statistics to second-order or higher-order statistics, depending on the number of pixels which define the local information. First-order statistics are based on (local) information about gray levels in single pixels, second-order statistics are based on (local) information about gray levels in pairs of pixels, while higher-order statistics are based on local information about gray levels in three or more pixels. Statistical methods extract such (local) information from the pixels of the image and describe the distribution of this information in a statistical way. The mean value and standard deviation of the gray level distribution (which is based on gray levels in single pixels) are examples of first-order statistics, gray level co-occurrence matrix features are examples of second-order statistics, while gray level run length features are examples of higher-order statistics. If one has enough second-order statistical information it is possible to re-synthesize textures having the same visual properties.
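
As a concrete illustration of a second-order statistic, the sketch below builds a gray level co-occurrence matrix for one pixel-pair displacement and derives two classical features from it. The displacement, the number of gray levels and the choice of features are arbitrary assumptions for the example, not a description of the feature set used in this project.

import numpy as np

def glcm(image, dx=1, dy=0, levels=4, symmetric=True):
    """Gray level co-occurrence matrix for the displacement (dx, dy).

    The image is assumed to be already quantized to integers in [0, levels).
    """
    h, w = image.shape
    m = np.zeros((levels, levels))
    for i in range(max(0, -dy), h - max(0, dy)):
        for j in range(max(0, -dx), w - max(0, dx)):
            m[image[i, j], image[i + dy, j + dx]] += 1
    if symmetric:
        m = m + m.T
    return m / m.sum()                         # normalize to a joint probability

# Tiny example image with four gray levels.
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 2, 2, 2],
                [2, 2, 3, 3]])

p = glcm(img, dx=1, dy=0, levels=4)
i, j = np.indices(p.shape)
contrast = np.sum(p * (i - j) ** 2)            # penalizes large gray level jumps
energy = np.sum(p ** 2)                        # high for uniform, ordered textures
print(f"contrast = {contrast:.3f}, energy = {energy:.3f}")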

We have developed our own image analysis systems from the ground up. They contain an optional combination of around 35 different properties, most of which are texture properties. These are mainly derived from the Gray Level Co-occurrence Matrix, the Gray Level Run Length Matrix, or our own Gray Level Entropy Matrix. Most of these can be used with 7 different gray tone resolutions and 13 different window sizes, so that we theoretically have over one thousand property variations which can be used in millions of combinations. The challenge is to find a combination of a small number of properties which scores well in the final validation set. Finding the right property combination for a cancer type is a comprehensive task which often initiates new development. A learning set analysis might require many months, whereas the validation set may be concluded in hours or a few days.
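
The size of this search space can be illustrated with a short calculation under the counts given above (around 35 base properties, 7 gray tone resolutions, 13 window sizes); the exact numbers are approximate and only meant to show why exhaustive evaluation of even small property combinations is impractical.

from math import comb

n_features = 35        # approximate number of base properties
n_gray = 7             # gray tone resolutions
n_windows = 13         # window sizes

variants = n_features * n_gray * n_windows
print(f"property variations: {variants}")      # a few thousand

# Distinct small combinations that could in principle be evaluated:
for k in (2, 3, 4):
    print(f"combinations of {k} properties: {comb(variants, k):,}")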

Materials

Monolayers

A suspension of isolated whole nuclei is centrifuged onto a specially prepared slide such that the nuclei lie in a single layer (monolayer). The same preparation method is also used for ICM DNA ploidy analyses. The benefit of this method is that one can measure intact nuclei, which can be identified automatically with the aid of simple image analysis methods (thresholding). By using a DNA-specific staining method such as the Feulgen-Schiff method, one can also measure the amount of absorbed light in the whole nucleus to obtain a precise measurement of the amount of DNA. The disadvantage of this method is that one loses histological information and the possibility of relating the analyses to other nuclear markers.
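
A minimal sketch of this idea is shown below, assuming a transmission image in which Feulgen-stained nuclei appear dark against a bright background: a simple global threshold on optical density separates nuclei from background, and the integrated optical density over each nucleus gives a relative measure of DNA amount. The threshold value and the use of scipy.ndimage for connected-component labelling are illustrative assumptions, not the procedure actually used in the lab.

import numpy as np
from scipy import ndimage

def nucleus_iod(image, background, od_threshold=0.1):
    """Relative DNA amount per nucleus via integrated optical density (IOD).

    `image` and `background` are transmission images of the same shape;
    optical density is -log10(I / I0). The fixed OD threshold is a
    placeholder for whatever segmentation the production system applies.
    """
    od = -np.log10(np.clip(image / background, 1e-6, 1.0))
    mask = od > od_threshold                   # simple global threshold
    labels, n = ndimage.label(mask)            # connected components = candidate nuclei
    iods = ndimage.sum(od, labels, index=range(1, n + 1))
    return iods                                # one IOD value per nucleus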

Tissue sections

Here, paraffin blocks are sectioned with a microtome in the same way as for routine pathology sections, before the DNA is stained in the same way as for monolayers, with the Feulgen-Schiff technique. The benefit of this method is that one retains the histology and can correlate the analyses with, in principle, all other markers used within histopathology. The preparation is simpler and less resource demanding than for monolayers. In terms of image analysis, however, there are great challenges, especially with segmenting (identifying) individual nuclei, and the method requires much more advanced image analysis and substantially more computing power. This method is also unsuitable for DNA ploidy, as the nuclei are cut and the measured fraction of each nucleus is unknown.

TMA sections

This method has the same benefits and disadvantages as described under tissue sections. The difference is that one uses a circular tissue core with a small diameter (0.6-1.1 mm) and therefore has drastically less histological representation and far fewer nuclei. The advantage is that up to 500 patient specimens can be placed on one slide, allowing high-throughput analyses of patients and of different markers.

3-D nuclei from sections

With this method, one combines some of the benefits of tissue sections and monolayers by making the sections thick enough (12-20 µm) that they contain whole nuclei. In the microscope, images of optical sections of the nucleus are obtained, and the thickness of the optical sections varies with the numerical aperture of the microscope lens and the wavelength of the light. One must therefore take many optical sections of the same nucleus and thereafter combine them into a reconstructed 3-dimensional nucleus. This poses the greatest need for advanced image analysis and computing capacity, but has the advantage of permitting high-throughput, high-resolution nuclear analyses (Nucleotyping) in three dimensions.
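
In terms of data handling, this amounts to stacking the co-registered optical sections into a voxel matrix with anisotropic voxel size. The sketch below only shows this bookkeeping step with placeholder data and assumed spacings; registration, deconvolution and segmentation in 3-D are outside its scope.

import numpy as np

# Placeholder for a list of co-registered 2-D optical sections of one nucleus,
# acquired at equidistant focal planes.
rng = np.random.default_rng(2)
sections = [rng.random((128, 128)) for _ in range(20)]

volume = np.stack(sections, axis=0)       # (z, y, x) voxel matrix

# Voxel size is typically anisotropic: the z-step between optical sections is
# larger than the in-plane pixel size. These spacings are assumed examples.
voxel_size_um = (0.5, 0.1, 0.1)           # (z, y, x) spacing in micrometres
print(volume.shape, voxel_size_um)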

Experimental design

Nucleotyping involves finding the combination of properties of the nuclei which best describes the nucleus' status in relation to a given measurement or hypothesis. As previously described, we focus on properties which can measure genomic instability and work from the hypothesis that genome instability is a driving force in cancer. The nuclear DNA and chromatin structure of course varies with the nucleus' specific function in each tissue and is naturally different in different tissue types. One must therefore first train the computer on specimens with known functions or conditions, in order to find the properties which best describe the function or condition in question. For example, when we test a group of properties on their ability to predict a disease course (prognosis), we look for a combination of optimal properties in a part of the material where the clinical outcome for each patient involved is known. This is the learning set. Here we are allowed to adapt the properties (following strict and well-defined methods) until we can differentiate between specimens from patients with e.g. good and poor prognoses. Thereafter, we must verify the results by applying the same property combination blindly on an independent data set. This is the validation set. Each cancer type and/or preparation method requires its own learning and validation sets.
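
The learning/validation procedure described above can be sketched in a few lines: properties are selected and tuned only on the learning set, and the chosen combination is then applied once, blindly, to the validation set. The data, the toy nearest-mean classifier and the greedy selection below are placeholders chosen to keep the example self-contained; they do not describe the project's actual statistical methods.

import numpy as np

rng = np.random.default_rng(3)

# Placeholder data: one row per patient, one column per candidate property,
# and a binary outcome (e.g. good versus poor prognosis).
X = rng.random((400, 50))
y = rng.integers(0, 2, size=400)

# Split once into a learning set and an untouched validation set.
idx = rng.permutation(len(y))
learn, valid = idx[:250], idx[250:]

def score(features, train_idx, test_idx):
    """Accuracy of a toy nearest-mean classifier on the given property subset."""
    Xtr, ytr = X[np.ix_(train_idx, features)], y[train_idx]
    Xte, yte = X[np.ix_(test_idx, features)], y[test_idx]
    centroids = np.array([Xtr[ytr == c].mean(axis=0) for c in (0, 1)])
    pred = np.argmin(((Xte[:, None, :] - centroids) ** 2).sum(axis=2), axis=1)
    return (pred == yte).mean()

# Greedy forward selection of a small property combination, learning set only.
selected = []
for _ in range(3):
    best = max((f for f in range(X.shape[1]) if f not in selected),
               key=lambda f: score(selected + [f], learn, learn))
    selected.append(best)

# The chosen combination is applied exactly once to the validation set. With
# random placeholder data the validation accuracy stays near chance, which is
# precisely why the blind validation step is needed.
print("selected properties:", selected)
print("validation accuracy:", score(selected, learn, valid))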

This text was last modified: 12.09.2017


Media

Slideshow: Nucleotyping - basic overview; Nucleotyping - detailed process; TEM zoomer [8150]; TEM zoomer [8203]; TEM zoomer [8356]
Chief Editor: Prof. Håvard E. Danielsen
Copyright Oslo University Hospital. Visiting address: The Norwegian Radium Hospital, Ullernchausséen 64, Oslo. Tel: 22 78 23 20