Data Management for Multimedia Retrieval

To date, however, there has been little information available on providing a complete set of services for multimedia databases, including their management, mining, and integration on the Web for electronic enterprises. Managing and Mining Multimedia Databases fills that gap. Focusing on managing and mining multimedia databases for electronic commerce and business, it explores database management system techniques for text, image, audio, and video databases.

It addresses the issues and challenges of mining multimedia databases to extract information, and discusses the directions and challenges related to integrating multimedia databases for the Web, particularly for e-business.

This book provides a comprehensive overview of multimedia data management and mining technologies, from the underlying concepts, architectures, and data models for multimedia database systems to the technologies that support multimedia data management on the Web, privacy issues, and emerging standards, prototypes, and products.

Designed for technical managers, executives, and technologists, it offers the opportunity to learn about both multimedia data management and multimedia data mining within a single book.

Effective electronic commerce requires integrating resources and extracting the critical information from across Web sites.

From recent efforts to develop tools for interoperability and warehousing across scattered information on the Web emerged the new discipline of Web data management, and this book, Web Data Management and Electronic Commerce. The first of its kind, it combines data management and mining, object technology, electronic commerce, Java, and the Internet into a complete overview of the concepts and developments in this new field.

It details security technologies, multimedia data management techniques, and real-time processing, and discusses the emerging standards of Java Database Connectivity, XML, metadata, and middleware. A simple Web site isn't good enough anymore. To remain competitive, you need Internet capabilities that allow you and your customers to buy, sell, and advertise. Even if you are unfamiliar with e-commerce, this self-contained volume provides the background you need to understand it through appendices that explain data management, Internet, security, and object technology.

Approachable enough for the beginner and complete enough for the expert, Web Data Management and Electronic Commerce helps you to manage information effectively and efficiently.

Multimedia Database in Perspective: During the last decade, multimedia has emerged as a major research and development area.

Pushed by advanced technology like huge-capacity storage devices, fast networks, and powerful workstations, new applications have arisen. Many definitions of multimedia systems exist, one of them being computer systems that support interactive use of at least one of the following information sources: graphics, image, voice, sound, and video.

These systems have caused a boom in the world of entertainment, but great opportunities for novel products and services are also available in other business areas. The size of multimedia data is often huge, and the storage of huge amounts of data is a task normally allocated to database management systems. Although some modern database management systems offer facilities to support the development of multimedia applications, many problems related to multimedia support are still not well understood.

This book reports on research efforts to solve some of these problems. An introductory knowledge of databases, and also of operating systems and network technology, is assumed. The book is very suitable as material for courses at the senior or graduate level, but also for upgrading the skills of computer scientists working on database management systems, multimedia systems, or applications.

The book consists of four parts. Part I is called "Requirements for a Multimedia Database" and comprises chapters one to three. Chapter one presents an outline of the book.

There are already various operational systems for this task, but they are usually built as special-purpose systems and lack the general capability exhibited in a database management system (DBMS) suitable for a wide variety of applications.

This paper argues that DBMSs should be extended to manage multimedia data as well as the standard structured data, exploiting all the established techniques and providing the new data types to all applications. The paper examines the characteristics of multimedia data and outlines some of the current research projects in this area.

It recognizes the significant and successful applications of the database technology and information retrieval techniques developed in the last two decades and proposes to capitalize on these advances to develop a DBMS for handling multimedia data.

The paper also sketches some directions where future research may be headed to solve the complex issues in multimedia data processing. Keywords: Multimedia databases, Image management, Theses.

Multimedia Database Systems: This volume is a compendium of recent research and development work pertaining to the problems and issues in the design and development of multimedia database systems. The design of indexing and organization techniques and the development of efficient and.

In fact, RDF makes no assumption about a particular application domain, nor does it define the semantics of any particular application domain. The definition of the mechanism is domain neutral, yet the mechanism is suitable for describing information about any domain.

Properties: A property is a specific aspect, characteristic, attribute, or relation used to describe a resource. Each property has a specific meaning and defines its permitted values, the types of resources it can describe, and its relationship with other properties. The three individual parts of a statement are called the subject, the predicate, and the object of the statement, respectively. We can see that this resource can be described using various page-related, content-based metadata, such as the title of the page and the keywords in the page, as well as ASU-related semantic metadata, such as the president of ASU and its campuses.

The RDF model intrinsically supports binary relations: a statement specifies a relation between two Web resources. Higher-arity relations have to be represented using multiple binary relations. Some metadata, such as the property names used to describe resources, are generally application dependent, and this can cause difficulties when RDF descriptions need to be shared across application domains.

For example, the property called location in one application domain may be called address in another. Although the semantics of both property names are the same, syntactically they are different. At the other extreme, a single property name may denote different things in different application domains. In order to prevent such conflicts and ambiguities, the terminology used by each application domain can be identified using namespaces.

A namespace can be thought of as a context or a setting that gives a specific meaning to what might otherwise be a general term. It is frequently necessary to refer to a collection of resources: for example, to the list of courses taught in the Computer Science Department, or to state that a paper is written by several authors.

To represent such groups, RDF provides containers to hold lists of resources or literals. RDF defines three types of container objects to facilitate different groupings: a bag is an unordered list of resources or literals, a sequence is an ordered list of resources or literals, and an alternative is a list of resources or literals that represent alternatives for the single value of a property.
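
To make the mechanics concrete, here is a minimal sketch using the Python rdflib library; the namespace URI, resource, property names, and values are hypothetical stand-ins rather than examples from the text. It builds simple (subject, predicate, object) statements and groups several literals in an ordered rdf:Seq container.

```python
from rdflib import Graph, Namespace, URIRef, Literal, BNode
from rdflib.namespace import RDF

g = Graph()

# A namespace gives application-specific terms such as "president" a
# fully qualified, unambiguous identity (hypothetical namespace URI).
EX = Namespace("http://example.org/terms/")

asu = URIRef("http://www.asu.edu/")  # the resource being described

# Each statement is a (subject, predicate, object) triple.
g.add((asu, EX.president, Literal("Jane Doe")))   # hypothetical value
g.add((asu, EX.location, Literal("Tempe, AZ")))   # hypothetical value

# An rdf:Seq container holds an ordered list of resources or literals,
# e.g., the list of courses taught in a department.
courses = BNode()
g.add((courses, RDF.type, RDF.Seq))
g.add((courses, URIRef(str(RDF) + "_1"), Literal("CSE 510")))  # hypothetical
g.add((courses, URIRef(str(RDF) + "_2"), Literal("CSE 515")))  # hypothetical
g.add((asu, EX.courses, courses))

print(g.serialize(format="turtle"))
```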

To make statements about other statements (i.e., to express higher-order statements), one has to model the original statement itself as a resource. In other words, higher-order statements treat RDF statements as uniquely identifiable resources. This process is called reification, and the statement is called a reified statement. Naturally, the relational data model is suitable to describe the metadata associated with the media objects. The object-oriented data model is suitable for describing the application semantics of the objects properly.

The content of a complex-media object such as a multimedia presentation can be considered semi-structured or self-describing as different presentations may be structured differently and, essentially, the relevant structure is prescribed by the author of the presentation in the presentation itself.

Lastly, each media object can be interpreted at a semantic level, and this interpretation can be encoded using RDF. On the other hand, as we will see, despite their diversity and expressive powers, the foregoing models, even when used together, may not be sufficient for describing media objects.

Thus, new models, such as fuzzy, probabilistic, vector-based, sequence-based, graph-based, or spatiotemporal models, may be needed to handle them properly. Colors, textures, and shapes are commonly used to describe images. Time and motion are used in video databases. Terms (also referred to as keywords) are often used in text retrieval. For example, some colors are perceived more strongly than others by the human eye [Kaiser and Boynton, ].

The human eye is also more sensitive to contrast than to colors in the image [Kaiser and Boynton, ]. In addition, the query workload needs to be taken into account; we will consider feature selection in Section 4. Comparing objects for retrieval further requires similarity or distance measures defined over the selected features. Although these measures are in many cases application and data model specific, there are certain properties of these measures that transcend the data model and media type. The first three properties of metric distances ensure consistency in retrieval.

The last property, on the other hand, is commonly exploited to prune the search space and reduce the number of objects to be considered for matching during retrieval (Section 7). Therefore, we encourage you to pay close attention to whether the measures we discuss are metrics or not.

It is, however, important to note that the diversity of features and feature models does not necessarily imply an equally large diversity of feature representations. Intuitively, a feature vector describes the composition of a given multimedia data object in terms of its quantifiable properties.

Histograms, for example, are good candidates for being represented in the form of vectors. We discuss the vector model in detail in Section 3. In fact, as we see in Section 2. We revisit graph and tree models in Section 3. Many times, however, the underlying regularity may be imprecise. In such a case, fuzzy or probabilistic models may be more suitable.

We discuss fuzzy models for multimedia in Section 3. In the rest of this section, we introduce and discuss many commonly used content features, including colors, textures, and shapes, and structural features, such as spatial and temporal models. We revisit the common representations and discuss them in more detail in Chapter 3. In fact, this is not entirely correct. However low level a feature is, it still needs a model within which it can be represented, interpreted, and described.

This model is critical: because of the finite nature of computational devices, each feature instance is usually allocated a fixed, and usually small, number of bits. This means that there is an upper bound on the number of different feature instances one can represent. Thus, it is important to choose a feature model that can help represent the space of possible and relevant feature instances as precisely as possible.

Because basic knowledge about commonly used low-level media features can help in understanding the data structures and algorithms that multimedia databases use to leverage them, in this section we provide an overview of the most common low-level features, such as color, texture, and shape. Higher level features, such as spatial and temporal models, are also discussed. For the applications that involve human vision, the color model needs to represent the colors that the human eye can perceive.

The human eye, more specifically the retina, relies on so-called rods and cones to perceive light signals. Rods help with night vision, where the light intensity is very low. They are able to differentiate between fine variations in the intensity of the light (i.e., between shades of gray).

The cones, on the other hand, come into play when the light intensity is high. The three types of cones, R, G, and B, each perceive a different color: red, green, and blue, respectively.

RGB Model

Most recording systems (cameras) and display systems (monitors) use a similar additive mechanism for representing color information. In this model, commonly referred to as the RGB model, each color instance is represented as a point in a three-dimensional space, where the dimensions correspond to the possible intensities of the red, green, and blue light channels.

As shown in Figure 2, the diagonal line segment connecting the origin of the RGB color cube to the white corner has different intensities of light with equal contributions from the red, green, and blue channels and thus corresponds to different shades of gray.

The RGB model is commonly implemented using data structures that allocate the same number of bits to each color channel. For example, a 3-byte representation of color, which can represent 2^24 (approximately 16.7 million) different color instances, would allocate 1 byte to each color channel and would thus distinguish 256 intensities (including 0) of pure red, green, and blue.

An image would then be represented as a two-dimensional matrix, where each cell in the matrix contains a 24-bit color instance. These cells are commonly referred to as pixels. When the space available for representing (storing or communicating) images of this size is not as large, the number of bits allocated for each pixel needs to be brought down.
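
As a small illustration of this representation, the following NumPy sketch stores an image as a two-dimensional matrix of 3-byte pixels and shows the storage cost that motivates the bit reductions discussed next; the image dimensions are arbitrary.

```python
import numpy as np

# A 24-bit RGB image: a two-dimensional grid of pixels, each holding
# one byte (0..255) per color channel.
height, width = 480, 640  # arbitrary example dimensions
image = np.zeros((height, width, 3), dtype=np.uint8)

image[100, 200] = (255, 0, 0)      # a pure red pixel
image[100, 201] = (128, 128, 128)  # a mid-gray pixel (equal R, G, B)

# Three bytes per pixel; the total before any reduction:
print(image.nbytes)  # 480 * 640 * 3 = 921,600 bytes
```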

This can be achieved in different ways. One solution is to reduce the precision of the color channels, for example from 8 bits down to 4 bits per channel, which yields 2^12 = 4,096 distinct colors. Although this might be a sufficient number of distinct colors to paint an image, because the color cube is partitioned regularly under the foregoing scheme, this might actually be wasteful.

For example, consider an image of the sea taken on a bright day. This picture would be rich in shades of blue, whereas many colors such as red, brown, and orange would not necessarily appear in the image.

Thus, a good portion of the 4,096 different colors we have might not be of use, while all the different shades of blue that we would need might be clustered under a single color instance, thus resulting in an overall unpleasant and dull picture. An alternative scheme to reduce the number of bits needed to represent color instances is to use a color table.

A color table is essentially a lookup table that maps from a less precise color index to a more precise color instance. Let us assume that we can process all the pixels in an image to identify the best 4,096 distinct 24-bit colors (mostly shades of blue in the preceding example) needed to paint the picture. We can put these colors into an array (i.e., the color table). Whenever this picture is to be displayed, the display software or hardware can use the lookup table to convert the color indexes to the actual 24-bit RGB color instances.
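
The table lookup just described can be sketched as follows; this assumes the 4,096-color palette has already been chosen by some process (such as the splitting scheme described next), and the function names are illustrative.

```python
import numpy as np

def index_image(image, palette):
    """Map each 24-bit RGB pixel to the index of the nearest palette
    color; 'palette' is a (k, 3) array of chosen RGB color instances."""
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(np.int32)
    # Squared Euclidean distance from every pixel to every palette entry.
    dists = ((pixels[:, None, :] - palette[None, :, :].astype(np.int32)) ** 2).sum(axis=2)
    return dists.argmin(axis=1).reshape(h, w).astype(np.uint16)  # 12-bit indexes fit

def reconstruct(indexes, palette):
    """Display-time lookup: color index -> full 24-bit color instance."""
    return palette[indexes]
```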

Through the foregoing process, the color indexes are built one bit at a time by splitting the color instances into increasingly finer color clusters. The process is continued until the length of the color index matches the application requirements.

For instance, in the previous example, the min-cut partitioning will be repeated to a depth of 12 (i.e., until 12-bit color indexes, corresponding to 2^12 = 4,096 color clusters, are obtained). A third possible scheme one can use for reducing the number of bits needed to encode the color instances is to rely on the properties of human perception. As we mentioned earlier, the eye is not equally sensitive to all color channels.

Some colors are more critical in helping differentiate objects than others. We discuss this next (and return to it in Section 4). Therefore, a color model that represents grayscale or luminance as an explicit component, rather than as a combination of R, G, and B, can be more effective in creating reduced representations without negatively affecting perception.

Given the luminance component, Y, and two of the existing RGB channels, say R and B, we can create a new color space YRB that can represent the same colors as the RGB, except that when we need to reduce the size of the bit representation, we can favor cuts in the number of bits of the R and B color components and preserve the Y luminance component intact to make sure that the user is able to perceive contrast well.

In contrast, the U and V components reflect the chrominance of the corresponding color instance precisely.

Further studies showed that the human eye does not prefer either U (blue minus luminance) or V (red minus luminance) strongly over the other. On the other hand, the eye is shown to be less sensitive to differences in the purple-green color range than to differences in the orange-blue color range.

Thus, if these purple-green and orange-blue components can be used instead of the UV components, this can give a further opportunity for reducing the bit representation, without much affecting the human perception of the overall color instance.
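
The following sketch shows one consistent set of such conversions: the commonly used BT.601 weights for deriving luminance Y from RGB, and the standard rotation (by about 33 degrees) of the U and V axes into the orange-blue (I) and purple-green (Q) axes. The exact coefficients differ across standards, so these are representative values, not ones prescribed by the text.

```python
import numpy as np

def rgb_to_yuv(r, g, b):
    # BT.601 luminance weights: green contributes most, reflecting the
    # eye's higher sensitivity to it (coefficients are an assumption here).
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)  # "blue minus luminance" chrominance
    v = 0.877 * (r - y)  # "red minus luminance" chrominance
    return y, u, v

def yuv_to_yiq(y, u, v, angle_deg=33.0):
    # I and Q are the U and V axes rotated so that they align with the
    # orange-blue (I) and purple-green (Q) color directions.
    t = np.deg2rad(angle_deg)
    i = v * np.cos(t) - u * np.sin(t)
    q = v * np.sin(t) + u * np.cos(t)
    return y, i, q
```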

[Figure: the relationship between the UV and IQ chrominance components.]

More specifically, the same amount of change in a given stimulus is perceived more strongly if the original value is lower. Furthermore, the CIE space covers all the chromaticities visible to the human eye, whereas the RGB color space cannot do so. In fact, it has been shown that no set of three light sources can cover the entire spectrum of chromaticities described by CIE and perceived by the human eye.

We ignore the distinction and the relevant details.


In other words, the CIELAB model normalizes the luminosities and chromaticities of the color space with respect to the color instance that humans perceive as white. The second thing to note is that L is a normalized version of luminosity.

It takes values between 0 and 100: 0 corresponds to black, and 100 corresponds to the color that is perceived as white by humans. A similar color space, in which the spectrum of grays from black to white is represented as a vertical axis and the amount of color (i.e., the saturation) is represented as the distance from this axis, is the HSV (hue, saturation, value) color space. This color model is commonly visualized as a cylinder, cone, or hexagonal cone (hexcone) (Figure 2).

Color-Based Image Representation Using Histograms

As we have seen, in almost all models, color instances are represented as combinations of three components.

This, in a sense, reflects the structure of the human retina, where color is perceived through three types of cones sensitive to different color components. An image, then, can be seen as a two-dimensional matrix of color instances (also called pixels), where each pixel is represented as a triple. Matching two images based on their color content for similarity-based retrieval, then, corresponds to comparing the triples contained in the corresponding arrays.

One way to achieve this is to compare the two arrays (without loss of generality, assuming that they are of the same size) by comparing the pixel pairs at the same array location in both images and aggregating their similarities or dissimilarities, based on the underlying color model, into a single score. This approach, however, has two disadvantages. First, comparing the images pixel by pixel is costly. A second disadvantage is that pixel-by-pixel matching of the images would be good for looking for almost-exact matches, but any image that has a slightly different composition (including images that are slightly shifted or rotated) would be identified as a mismatch.

An alternative representation that both provides significant savings in matching cost and reduces the sensitivity of the retrieval algorithms to rotations, shifts, and many other deformations is the color histogram. Given a bag (or multiset), B, of values from a domain, D, and a natural number, n, a histogram partitions the values in domain D into n partitions and then, for each partition, records the number of values in B that fall into the corresponding range.

A color histogram does the same thing with the color instances in a given image: given n partitions (or bins) of the color space, the color histogram counts, for each partition, the number of pixels of the image whose color instances fall in that partition. (We discuss histograms further in Section 3.) Here, we note that a color histogram is a compact and nonspatial representation of the color information.

In other words, the pixels are associated with the color partitions without any regard to their localities; thus all the location information is lost in the process. In a sense, the color histogram is especially useful in cases where the overall color distribution of the given image is more important for retrieval than the spatial localities of the colors.
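
A minimal NumPy sketch of such a color histogram follows; the regular partitioning of the RGB cube into four ranges per channel and the L1 comparison are illustrative choices, not prescriptions from the text.

```python
import numpy as np

def color_histogram(image, bins_per_channel=4):
    """Count pixels per (r, g, b) partition of the color space; all
    locality information is discarded in the process."""
    q = (image.astype(np.uint32) * bins_per_channel) // 256  # channel value -> bin
    flat = (q[..., 0] * bins_per_channel + q[..., 1]) * bins_per_channel + q[..., 2]
    hist = np.bincount(flat.ravel(), minlength=bins_per_channel ** 3)
    return hist / hist.sum()  # normalize so differently sized images compare

def l1_distance(h1, h2):
    """One simple way to compare two normalized histograms."""
    return float(np.abs(h1 - h2).sum())
```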

As a low-level feature, texture is fundamentally different from color, which is simply the description of the luminosity and chromaticity of the light corresponding to a single point, or pixel, in an image. The first major difference between color and texture is that, whereas it is possible to talk about the color of a single pixel, it is not possible to refer to the texture of a single pixel. Texture is a collective feature of a set of neighboring pixels in the image.

Second, whereas there are standard ways to describe color, there is no widely accepted standard way to describe texture. Indeed, any locally dominant visual characteristic (even color) can qualify as a texture feature. Moreover, being dominant does not imply being constant. In fact, a determining characteristic of most textures is that they are nothing but patterns of change in the visual characteristics (such as the colors) of neighboring pixels; thus, describing a given texture (or the pattern) requires describing how these even lower-level features change and evolve in the two-dimensional space of pixels that is the image.

As such, textures are best described by models that capture the rate and type of change.

Random Fields

A random field is a stochastic (random) process whose generated values are mapped onto positions in an underlying space (see Section 3). In other words, we are given a space, and each point in the space takes a value based on an underlying probability distribution. Moreover, the values of adjacent or even nearby points affect each other (Figure 2).

We can see that this provides a natural way of defining texture: we can model the image as the stochastic space, the pixels as the points in this space, and the pixel color values as the values the points in the space take (Figure 2). Thus, given an image, its texture can be modeled as a random field [Chellappa, ; Cross and Jain, ; Elfadel and Picard, ; Hassner and Sklansky, ; Kashyap and Chellappa, ; Kashyap et al.].

Essentially, random field-based models treat the image texture as an instance (or realization) of a random field. Conversely, modeling a given texture (or a set of texture samples) involves finding the parameters of the random process that is most likely to output the given samples (see Section 9).

Fractals

As we further discuss in Section 7, fractals are structures that show self-similarity, repeating the same basic patterns at different scales. As such, fractals are commonly used in the modeling (analysis and synthesis) of natural structures, such as snowflakes, branches of trees, leaves, skin, and coastlines, which usually show such self-similarity (Figure 2).

A number of works describe image textures (especially natural ones, such as the surface of polished marble) using fractals. Under this texture model, analyzing an image texture involves determining the parameters of a fractal or iterated function system that will generate the texture by iterating a basic pattern at different scales [Chaudhuri and Sarkar, ; Dubuisson and Dubes, ; Kaplan, ; Keller et al.].

Wavelets

A wavelet is a special type of fractal, consisting of a mother wavelet function and its scaled and translated copies, called daughter wavelets. (We discuss wavelets further in Section 4.) Unlike a general-purpose fractal, wavelets (or, more accurately, two-dimensional discrete wavelets) can be used to break any image into multiple subimages, each corresponding to a different frequency (i.e., a different rate of change).
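
As an illustration, this sketch uses the PyWavelets package (an assumption; the text does not name a library) to perform one level of two-dimensional discrete wavelet decomposition with the Haar mother wavelet and to derive a crude energy-based texture signature from the detail subimages.

```python
import numpy as np
import pywt  # PyWavelets

# One level of a 2-D discrete wavelet transform splits an image into a
# low-frequency approximation and three detail subimages that capture
# horizontal, vertical, and diagonal rates of change.
image = np.random.rand(256, 256)  # stand-in for a grayscale texture patch
cA, (cH, cV, cD) = pywt.dwt2(image, "haar")

# The energy in each detail band is a simple texture signature.
signature = [float((band ** 2).sum()) for band in (cH, cV, cD)]
print(signature)
```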

Consequently, wavelet-based techniques are suitable for studying the frequency behavior of images (e.g., the rates at which pixel values change at different scales).

Texture Histograms

Whereas texture has diverse models, each focusing on different aspects and characteristics of the pixel structure forming the image, if we know the specific textures we are interested in, we can construct a texture histogram by creating an array of the specific textures of interest and counting and recording the amount, confidence, or area of each of these textures in the given image.

Because most textures can be viewed as edges in the image, an alternative to this approach is to use edge histograms [Cao and Cai, ; Park et al.]. An edge histogram represents the frequency and the directionality of the brightness (or luminosity) changes in the image.

Once the rate and direction of change are detected for each pixel, noise is eliminated by removing those pixels that have changes below a threshold or that do not have pixels showing similar changes nearby. Then, the edges are thinned by maintaining only those pixels that have large change rates in their immediate neighborhood along the corresponding gradient.

After these phases are completed, we are left with those pixels that correspond to significant brightness changes in the image.

At this point, the number of edge pixels can be used to quantify the edginess or smoothness of the texture. The sizes of clusters of edge points, on the other hand, can be used to quantify the granularity of the texture.

Once the image pixels and the magnitudes and directions of their gradients are computed, we can create a two-dimensional edge histogram, where one dimension corresponds to the degree of change and the other corresponds to the direction of the change. [Figure: convolution-based edge detection on a given image; the center of the edge detection operator (a small matrix) is aligned, one by one, with each suitable pixel in the image.] In particular, we can count and record the number of edge pixels corresponding to each histogram value range.
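
A sketch of such a two-dimensional edge histogram follows, using Sobel operators (one common convolution-based gradient estimator) to obtain the magnitude and direction of change at each pixel; the bin counts and the noise threshold are illustrative parameters.

```python
import numpy as np
from scipy import ndimage

def edge_histogram(gray, mag_bins=4, dir_bins=8, threshold=50.0):
    """2-D edge histogram: one axis is the degree of change (gradient
    magnitude), the other is the direction of the change."""
    gx = ndimage.sobel(gray.astype(float), axis=1)  # horizontal change
    gy = ndimage.sobel(gray.astype(float), axis=0)  # vertical change
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)  # direction, in (-pi, pi]

    keep = mag > threshold    # crude noise suppression
    hi = max(float(mag.max()), threshold) + 1e-9
    hist, _, _ = np.histogram2d(
        mag[keep], ang[keep],
        bins=(mag_bins, dir_bins),
        range=((threshold, hi), (-np.pi, np.pi)),
    )
    return hist / max(hist.sum(), 1.0)
```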

This histogram can then be used to represent the overall directionality of the texture. Note that we can further extend this two-dimensional histogram to three dimensions by also recording how far apart the edge pixels are from each other along the direction of change (i.e., along the gradient). This would help capture the periodicity of the texture, that is, how often the basic elements of the texture repeat themselves. Shape, like texture, is not a property of a single pixel; instead, it is a property of a set of neighboring pixels that helps differentiate this set of pixels from the other pixels in the image.

Color and texture, for example, are commonly used to help segment out shapes from their background in a given image. Consider the three sample images in Figure 2.

Thus, in all three cases, color and texture can be used to segment out the dominant shapes from the rest of the image. The sample image in Figure 2, on the other hand, contains a human shape that is composed of regions with different colors and textures. Therefore, a naive color- and texture-based segmentation process would not identify the human shape, but would instead identify regions that are consistently red, white, brown, and so forth.

Extracting the human shape as a consistent atomic unit requires external knowledge that can help link the individual components, despite their apparent differences, into a single human shape. Therefore, the human shape may be considered as a high-level feature. There are various approaches to the extraction of shapes from a given image.

We discuss a few of the prominent schemes next.

Segmentation

Segmentation methods identify and cluster together those neighboring image pixels that are visually similar to each other (Figure 2). This can be done using clustering (such as K-means) and partitioning (such as min-cut) algorithms, discussed later in Chapter 8 [Marroquin and Girosi, ; Tolliver and Miller, ; Zhang and Wang, ]; a sketch of the clustering-based approach follows this paragraph. A commonly used alternative is to grow homogeneous regions incrementally, from seed pixels selected randomly or based on some criteria, such as having a color well represented in the corresponding histogram [Adams and Bischof, ; Ikonomakis et al.].
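
As a sketch of the clustering-based approach mentioned above (with K-means standing in for the cited algorithms), the following clusters pixels purely by color; a real segmenter would also enforce spatial connectivity, which this simplification ignores.

```python
import numpy as np
from sklearn.cluster import KMeans

def color_segments(image, k=4):
    """Cluster pixels by color; each label marks a candidate region
    (spatially disconnected pixels may still share a label)."""
    h, w, _ = image.shape
    pixels = image.reshape(-1, 3).astype(float)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(pixels)
    return labels.reshape(h, w)
```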

Edge Detection and Linking

Edge-linking-based methods rely on the observation that the boundaries of shapes are generally delineated from the rest of the image by edges. These edges can be detected using the edge detection techniques introduced earlier in Section 2.

Naturally, edges can be found at many places in an image, not all of which correspond to region boundaries. Thus, to differentiate the edges that correspond to region boundaries from the other edges in the image, we need to link neighboring edge pixels to each other and check whether they form a closed region [Grinaker, ; Montanari, ; Rosenfeld et al.].

As in edge-detection-based schemes, the watershed transformation identifies the gradients (i.e., rates of change) of the image pixels. However, instead of identifying edges by suppressing those pixels that have smaller gradients (less change) than their neighbors and linking them to each other, the watershed algorithm treats the gradient image (i.e., the image whose pixel values are the gradient magnitudes) as a topographic surface on which water falling on each pixel flows down toward a local minimum.

The watershed lines are then treated as the boundaries of the neighboring regions, and all pixels that shed to the same watershed lines are treated as a region [Beucher, ; Beucher and Lantuejoul, ; Beucher and Meyer, ; Nguyen et al.].

Describing the Boundaries of the Shapes

Once the boundaries of the regions are identified, the next step is to describe their boundary curves in a way that can be stored, indexed, queried, and matched against others for retrieval [Freeman, , ; Saghri and Freeman, ].

The simplest mechanism for storing the shape of a region is to encode it using a string, commonly referred to as the chain code. In the chain code model for shape boundaries, each possible direction between two neighboring edge pixels is given a unique code (Figure 2).

Starting from some specific pixel (such as the leftmost pixel of the boundary), the pixels on the boundary are visited one by one, and the directions in which one travels while visiting the edge pixels are noted in the form of a string (Figure 2).
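
A minimal sketch of this encoding, assuming the boundary has already been extracted as an ordered list of neighboring pixel coordinates; the particular assignment of codes to directions is illustrative.

```python
# 8-direction chain codes: index i encodes a step of (dx, dy).
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1),
              (-1, 0), (-1, -1), (0, -1), (1, -1)]
CODE = {step: i for i, step in enumerate(DIRECTIONS)}

def chain_code(boundary):
    """Encode an ordered list of neighboring boundary pixels (x, y) as
    a string of direction codes, starting from the first pixel."""
    codes = []
    for (x0, y0), (x1, y1) in zip(boundary, boundary[1:]):
        codes.append(CODE[(x1 - x0, y1 - y0)])
    return "".join(str(c) for c in codes)

# Example: a tiny square boundary traversed back to its start.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
print(chain_code(square))  # "0246"
```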

Note that the chain code is sensitive to the starting pixel, scaling, and rotation, but is not sensitive to translation or spatial shifts in the image. In general, the length of a chain code description of the boundary of a shape is equal to the number of pixels on the boundary. It is, however, possible to reduce the size of the representation by storing piecewise linear approximations of the boundary segments, rather than storing a code for each pair of neighboring pixels.

Note that finding the best set of line segments to represent the boundary of a shape requires the application of curve segmentation algorithms, such as the one presented by Katzir et al. When the piecewise linear representation is not precise or compact enough, higher-degree polynomial representations or B-splines can be used instead of line segments. Alternatively, the shape boundary can be represented in the form of a time series signal (Figure 2), which can then be compressed using standard signal transforms. This compressibility property makes the representation attractive for low-bandwidth data exchange scenarios, such as object-based video compression in MPEG-4 [Koenen, ; MPEG4].

Shape Histograms

As with color and texture histograms, shape histograms are constructed by counting certain quantifiable properties of the shapes and recording them in a histogram vector. For example, if the only relevant features are the 8 directional codes shown in Figure 2, a shape histogram can simply record how many times each direction code occurs along the boundary of the shape.

A number of other important shape properties are defined in terms of the moments of an object: the (p, q)th moment of a region can be defined as m(p, q) = sum, over all pixels (x, y), of x^p y^q I(x, y), where I(x, y) is 1 for pixels inside the region and 0 otherwise. Given this definition, the orientation (i.e., the angle of the major axis) of the object can be derived from its second-order moments. Like most shape detection and indexing algorithms, the Hough transform also starts with an edge detection step.

Consider, for example, the edge detection process described in Section 2. Let us, for now, also assume that the shapes we are looking for are line segments. Any such line can be written as y = mx + a; consequently, an edge pixel (x0, y0) constrains the lines that may pass through it to those satisfying a = -x0 m + y0. This second formulation is interesting, because it provides an equation that relates the possible values of a to the possible values of m.

Moreover, this equation is also an equation of a line, albeit not in the (x, y) space but in the (m, a) space. Although this equation alone is not sufficient for us to determine the specific m and a values for the line segment that contains our edge pixel, if we consider that all the pixels on the same line in the image will have the same m and a values, then we can combine the constraints contributed by different edge pixels.

These pixels give us a set of equations of the form a = -x_i m + y_i, one for each edge pixel (x_i, y_i). The preceding strategy, however, has a significant problem. Although it would work in the ideal case, where the x and y values on the line are identified precisely, in the real world of images, where the edge pixel detection process is highly noisy, it is possible that there will be small variations and shifts in the pixel positions.

Consequently, the given set of equations may not have a common solution. Moreover, if the edge pixels do not all come from a single line but from two or more distinct line segments in the image, then even if the edge pixels are identified precisely, the set of equations will not have a solution. Thus, instead of trying to simultaneously solve the foregoing set of equations for a single pair of m and a, the Hough transform scheme keeps a two-dimensional accumulator matrix that accumulates votes for the possible m and a values.

More precisely, one dimension of the accumulator matrix corresponds to the possible values of m and the other corresponds to the possible values of a. In other words, as in histograms, each array position of the accumulator corresponds to a range of m and a values. All entries in the accumulator are initially set to 0. We then consider each edge pixel's equation one by one: we identify the accumulator entries whose (m, a) ranges satisfy the equation and increment the corresponding accumulator values by 1.

The intuition is that, if there is a more or less consistent line segment in the image, then maybe not all, but most of its pixels will be aligned and they will all vote for the same m and a pair. Consequently, the corresponding accumulator entry will accumulate a large number of votes. Thus, after we process the votes implied by all edge pixels in the image, we can look at the accumulator matrix and identify the m and a pairs where the accumulated votes are the highest.

These will be the m and a values that are most likely to correspond to the line segments in the image. Note that a disadvantage of this scheme is that, for vertical line segments, the slope m is infinite, and it is hard to design a bounded accumulator for the unbounded (m, a) space. (A common remedy, not detailed here, is to parameterize lines in polar form, x cos(theta) + y sin(theta) = rho, where both parameters are bounded.)
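
The following sketch implements this vote accumulation for lines in the (m, a) parameterization used above. The parameter ranges and bin counts are illustrative assumptions, and, per the caveat just noted, bounding the m range means near-vertical lines fall outside the accumulator.

```python
import numpy as np

def hough_lines(edge_pixels, m_range=(-5.0, 5.0), a_range=(-500.0, 500.0),
                m_bins=100, a_bins=100):
    """Vote accumulator over the (m, a) space of lines y = m*x + a.
    Each edge pixel (x, y) votes along the line a = -x*m + y."""
    acc = np.zeros((m_bins, a_bins), dtype=np.int32)
    ms = np.linspace(m_range[0], m_range[1], m_bins)
    for x, y in edge_pixels:
        a_vals = -x * ms + y  # one candidate intercept per slope bin
        a_idx = ((a_vals - a_range[0]) / (a_range[1] - a_range[0]) * a_bins).astype(int)
        ok = (a_idx >= 0) & (a_idx < a_bins)
        acc[np.nonzero(ok)[0], a_idx[ok]] += 1
    return acc, ms

# Pixels lying on y = 2x + 10 concentrate their votes near (m, a) = (2, 10).
pts = [(x, 2 * x + 10) for x in range(50)]
acc, ms = hough_lines(pts)
m_i, a_i = np.unravel_index(acc.argmax(), acc.shape)
print(round(ms[m_i], 2))  # approximately 2
```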

If we are looking for shapes other than lines, we need to use the equations that define those shapes as the bases for the transformations. A circle, for example, can be described by the equation (x - a)^2 + (y - b)^2 = r^2. This equation, however, may be costly to use because it has three unknowns, a, b, and r (the center coordinates and the radius), and is nonlinear.

Fortunately, because the edge detection process described in Section 2 also provides the direction of the gradient at each edge pixel, and because the center of a circle must lie along this gradient direction, this knowledge can be used to eliminate one unknown. The resulting formulation eliminates r and relates the possible b and a values in the form of a line in the (a, b) space. Thus, a vote accumulator similar to the one used for the lines of the image can be used to detect the centers of circles in the image.

Once the centers are identified, the radii can be computed by reassessing the pixels that voted for these centers. Finally, note that the Hough transform can be used as a shape histogram in two different ways. One approach is to use the accumulators to identify the positions of the lines, circles, and other shapes in the image and to create histograms that report the numbers and other properties of these shapes; alternatively, the vote accumulator matrix itself can be treated as a histogram of the shape parameters present in the image.
