author | nicholas-leonard <nick@nikopia.org> | 2014-10-16 06:56:25 +0400 |
---|---|---|
committer | nicholas-leonard <nick@nikopia.org> | 2014-10-16 06:56:25 +0400 |
commit | 5967a156a7d39b0d551716c46a66c01a7d787e5d (patch) | |
tree | f29d6ce3037d599169db0f65d03a1928b5ccf618 /README.md | |
parent | 0994bad0134266d3d557eb2b5a2132932d82ddb6 (diff) |
doc anchors and cleanup
Diffstat (limited to 'README.md')
-rw-r--r-- | README.md | 124 |
1 files changed, 93 insertions, 31 deletions
@@ -1,16 +1,22 @@
 # image Package Reference Manual #
-Unless speficied otherwise, this package deals with images of size
-`nChannel x height x width`.
+__image__ is the [Torch7 distribution](http://torch.ch/) package for processing
+images. It contains a wide variety of functions divided into the following categories:
  * [Saving and loading](#image.saveload) images as JPEG, PNG, PPM and PGM;
  * [Simple transformations](#image.simpletrans) like translation, scaling and rotation;
  * [Parameterized transformations](#image.paramtrans) like convolutions and warping;
  * [Graphical user interfaces](#image.grapicalinter) like display and window;
  * [Color Space Conversions](#image.colorspace) from and to RGB, YUV, Lab, and HSL;
- * [Constant Tensors](#image.constanttensor) like Lenna, Fabio and Gaussian and Laplacian kernels;
+ * [Tensor Constructors](#image.tensorconst) for creating Lenna, Fabio and Gaussian and Laplacian kernels;
+
+Note that unless speficied otherwise, this package deals with images of size
+`nChannel x height x width`.
 <a name="image.saveload"/>
 ## Saving and Loading ##
+This sections includes functions for saving and loading different types
+of images to and from disk.
+<a name="image.load"/>
 ### [res] image.load(filename, [depth, tensortype]) ###
 Loads an image located at path `filename` having `depth` channels (1 or 3) into a [Tensor](https://github.com/torch/torch7/blob/master/doc/tensor.md#tensor)
@@ -27,6 +33,7 @@ The returned `res` Tensor has size `nChannel x height x width` where `nChannel`
 1 (greyscale) or 3 (usually [RGB](https://en.wikipedia.org/wiki/RGB_color_model) or [YUV](https://en.wikipedia.org/wiki/YUV).
+<a name="image.save"/>
 ### image.save(filename, tensor) ###
 Saves Tensor `tensor` to disk at path `filename`. The format to which the image is saved is extrapolated from the `filename`'s extension suffix.
@@ -34,17 +41,22 @@ The `tensor` should be of size `nChannel x height x width`.
 <a name="image.simpletrans"/>
 ## Simple Transformations ##
+This section includes simple but very common image transformations
+like cropping, translation, scaling and rotation.
+<a name="image.crop"/>
 ### [res] image.crop([dst,] src, x1, y1, [x2, y2]) ###
 Crops image `src` at coordinate `(x1, y1)` up to coordinate `(x2, y2)`. If `dst` is provided, it is used to store the output image. Otherwise, returns a new `res` Tensor.
+<a name="image.translate"/>
 ### [res] image.translate([dst,] src, x, y) ###
 Translates image `src` by `x` pixels horizontally and `y` pixels vertically. If `dst` is provided, it is used to store the output image. Otherwise, returns a new `res` Tensor.
+<a name="image.scale"/>
 ### [res] image.scale(src, width, height, [mode]) ###
 Rescale the height and width of image `src` to have width `width` and height `height`. Variable `mode` specifies
@@ -64,14 +76,17 @@ width of the output, respectively.
 Rescale the height and width of image `src` to fit the dimensions of Tensor `dst`.
+<a name="image.rotate"/>
 ### [res] image.rotate([dst,], src, theta) ###
 Rotates image `src` by `theta` radians. If `dst` is specified it is used to store the results of the rotation.
+<a name="image.hflip"/>
 ### [res] image.hflip([dst,] src) ###
 Flips image `src` horizontally (left<->right). If `dst` is provided, it is used to store the output image. Otherwise, returns a new `res` Tensor.
+<a name="image.vflip"/>
 ### [res] image.vflip([dst,], src) ###
 Flips image `src` vertically (upsize<->down). If `dst` is provided, it is used to store the output image. Otherwise, returns a new `res` Tensor.
@@ -96,6 +111,7 @@ When `saturate=true`, the result of the compression is passed through
 When provided, Tensor `tensorOut` is used to store results.
 Note that arguments should be provided as key-value pairs (in a table).
+<a name="image.gaussianpyramid"/>
 ### [res] image.gaussianpyramid([dst,] src, scales) ###
 Constructs a [Gaussian pyramid](https://en.wikipedia.org/wiki/Gaussian_pyramid) of scales `scales` from a 2D or 3D `src` image or size
@@ -110,7 +126,11 @@ Internally, this function makes use of functions [image.gaussian](#image.gaussia
 <a name="image.paramtrans"/>
 ## Parameterized transformations ##
+This section includes functions for performing transformations on
+images requiring parameter Tensors like a warp `field` or a convolution
+`kernel`.
+<a name="image.warp"/>
 ### [res] image.warp([dst,]src,field,[mode,offset,clamp]) ###
 Warps image `src` (of size`KxHxW`) according to flow field `field`. The latter has size `2xHxW` where the
@@ -124,6 +144,7 @@ Permitted values are strings *clamp* (the default) or *pad*.
 If `dst` is specified, it is used to store the result of the warp. Otherwise, returns a new `res` Tensor.
+<a name="image.convolve"/>
 ### [res] image.convolve([dst,] src, kernel, [mode]) ###
 Convolves Tensor `kernel` over image `src`. Valid string values for argument `mode` are :
@@ -135,24 +156,7 @@ Note that this function internally uses
 If `dst` is provided, it is used to store the output image. Otherwise, returns a new `res` Tensor.
-<a name="image.toDisplayTensor"/>
-### [res] image.toDisplayTensor(input, padding, nrow, scaleeach, min, max, symmetric, saturate) ###
-Returns a single `res` Tensor that contains a grid of all in the images in `input`.
-The latter can either be a table of image Tensors of size `height x width` (greyscale) or
-`nChannel x height x width` (color),
-or a single Tensor of size `batchSize x nChannel x height x width` or `nChannel x height x width`
-where `nChannel=[3,1]`, `batchSize x height x width` or `height x width`.
-
-Unless `input` is a table and `scaleeach=false` (the default), all detected images
-are compressed with successive calls to [image.minmax](#image.minmax):
-```lua
-image.minmax{tensor=input[i], min=min, max=max, symm=symmetric, saturate=saturate}
-```
-`padding` specifies the number of padding pixels between images. The default is 0.
-`nrow` specifies the number of images per row. The default is 6.
-
-Note that arguments can also be specified as key-value arguments (in a table).
-
+<a name="image.lcn"/>
 ### [res] image.lcn(src, [kernel]) ###
 Local contrast normalization (LCN) on a given `src` image using kernel `kernel`. If `kernel` is not given, then a default `9x9` Gaussian is used
@@ -162,10 +166,19 @@ To prevent border effects, the image is first global contrast normalized
 (GCN) by substracting the global mean and dividing by the global standard deviation.
+Then the image is locally contrast normalized using the following equation:
 ```lua
 res = (src - lm(src)) / sqrt( lm(src) - lm(src*src) )
 ```
+where `lm(x)` is the local mean of each pixel in the image (i.e.
+`image.convolve(x,kernel)`) and `sqrt(x)` is the element-wise
+square root of `x`. In other words, LCN performs
+local substractive and divisive normalization.
+Note that this implementation is different than the LCN Layer defined on page 3 of
+[What is the Best Multi-Stage Architecture for Object Recognition?](http://yann.lecun.com/exdb/publis/pdf/jarrett-iccv-09.pdf).
+
+<a name="image.erode"/>
 ### [res] image.erode(src, [kernel, pad]) ###
 Performs a [morphological erosion](https://en.wikipedia.org/wiki/Erosion_(morphology)) on binary (zeros and ones) image `src` using odd
@@ -174,6 +187,7 @@ The default is a kernel consisting of ones of size `3x3`. Number `pad` is the va
 the convolution. The default is 1.
+<a name="image.dilate"/>
 ### [res] image.dilate(src, [kernel, pad]) ###
 Performs a [morphological dilation](https://en.wikipedia.org/wiki/Dilation_(morphology)) on binary (zeros and ones) image `src` using odd
@@ -184,9 +198,34 @@ the convolution. The default is 0.
 <a name="image.grapicalinter"/>
 ## Graphical User Interfaces ##
-The following functions require package [qtlua](https://github.com/torch/qtlua).
+The following functions, except for [image.toDisplayTensor](#image.toDisplayTensor),
+require package [qtlua](https://github.com/torch/qtlua) and can only be
+accessed via the `qlua` Lua interpreter (as opposed to the
+[th](https://github.com/torch/trepl) or luajit interpreter).
+
+<a name="image.toDisplayTensor"/>
+### [res] image.toDisplayTensor(input, [...]) ###
+Optional arguments `[...]` expand to `padding`, `nrow`, `scaleeach`, `min`, `max`, `symmetric`, `saturate`.
+Returns a single `res` Tensor that contains a grid of all in the images in `input`.
+The latter can either be a table of image Tensors of size `height x width` (greyscale) or
+`nChannel x height x width` (color),
+or a single Tensor of size `batchSize x nChannel x height x width` or `nChannel x height x width`
+where `nChannel=[3,1]`, `batchSize x height x width` or `height x width`.
+
+Unless `input` is a table and `scaleeach=false` (the default), all detected images
+are compressed with successive calls to [image.minmax](#image.minmax):
+```lua
+image.minmax{tensor=input[i], min=min, max=max, symm=symmetric, saturate=saturate}
+```
+`padding` specifies the number of padding pixels between images. The default is 0.
+`nrow` specifies the number of images per row. The default is 6.
+
+Note that arguments can also be specified as key-value arguments (in a table).
-### [res] image.display(input, zoom, min, max, legend, w, ox, oy, scaleeach, gui, offscreen, padding, symm, nrow) ###
+<a name="image.display"/>
+### [res] image.display(input, [...]) ###
+Optional arguments `[...]` expand to `zoom`, `min`, `max`, `legend`, `win`,
+`x`, `y`, `scaleeach`, `gui`, `offscreen`, `padding`, `symm`, `nrow`.
 Displays `input` image(s) with optional saturation and zooming. The `input`, which is either a Tensor of size `HxW`, `KxHxW` or `Kx3xHxW`, or list, is first prepared for display by passing it through [image.toDisplayTensor](#image.toDisplayTensor):
@@ -200,67 +239,86 @@ The resulting `input` will be displayed using [qtlua](https://github.com/torch/q
 The displayed image will be zoomed by a factor of `zoom`. The default is 1. If `gui=true` (the default), the graphical user inteface (GUI) is an interactive window that provides the user with the ability to zoom in or out.
-This can be turned off for a faster display.
+This can be turned off for a faster display. `legend` is a legend to be displayed,
+which has a default value of `image.display`. `win` is an optional qt window descriptor.
+If `x` and `y` are given, they are used to offset the image. Both default to 0.
+When `offscreen=true`, rendering (to generate images) is performed offscreen.
-
-### [window, painter] image.window(resize, mousepress, mousedoublepress) ###
-Creates a window context for images.
+<a name="image.window"/>
+### [window, painter] image.window([...]) ###
+Creates a window context for images.
+Optional arguments `[...]` expand to `hook_resize`, `hook_mousepress`, `hook_mousedoublepress`.
+These have a default value of `nil`, but may correspond to commensurate qt objects.
 <a name="image.colorspace"/>
 ## Color Space Conversions ##
+This section includes functions for performing conversions between
+different color spaces.
+<a name="image.rgb2lab"/>
 ### [res] image.rgb2lab([dst,] src) ###
 Converts a `src` RGB image to [Lab](https://en.wikipedia.org/wiki/Lab_color_space). If `dst` is provided, it is used to store the output image. Otherwise, returns a new `res` Tensor.
+<a name="image.rgb2yuv"/>
 ### [res] image.rgb2yuv([dst,] src) ###
 Converts a RGB image to YUV. If `dst` is provided, it is used to store the output image. Otherwise, returns a new `res` Tensor.
+<a name="image.yuv2rgb"/>
 ### [res] image.yuv2rgb([dst,] src) ###
 Converts a YUV image to RGB. If `dst` is provided, it is used to store the output image. Otherwise, returns a new `res` Tensor.
+<a name="image.rgb2y"/>
 ### [res] image.rgb2y([dst,] src) ###
 Converts a RGB image to Y (discard U and V). If `dst` is provided, it is used to store the output image. Otherwise, returns a new `res` Tensor.
+<a name="image.rgb2hsl"/>
 ### [res] image.rgb2hsl([dst,] src) ###
 Converts a RGB image to [HSL](https://en.wikipedia.org/wiki/HSL_and_HSV). If `dst` is provided, it is used to store the output image. Otherwise, returns a new `res` Tensor.
+<a name="image.hsl2rgb"/>
 ### [res] image.hsl2rgb([dst,] src) ###
 Converts a HSL image to RGB. If `dst` is provided, it is used to store the output image. Otherwise, returns a new `res` Tensor.
+<a name="image.rgb2hsv"/>
 ### [res] image.rgb2hsv([dst,] src) ###
 Converts a RGB image to [HSV](https://en.wikipedia.org/wiki/HSL_and_HSV). If `dst` is provided, it is used to store the output image. Otherwise, returns a new `res` Tensor.
+<a name="image.hsv2rgb"/>
 ### [res] image.hsv2rgb([dst,] src) ###
 Converts a HSV image to RGB. If `dst` is provided, it is used to store the output image. Otherwise, returns a new `res` Tensor.
+<a name="image.rgb2nrgb"/>
 ### [res] image.rgb2nrgb([dst,] src) ###
 Converts an RGB image to normalized-RGB.
-
-## Constant Tensors ##
-The following functions construct Tensor constants like Gaussian or
+<a name="image.tensorconst"/>
+## Tensor Constructors ##
+The following functions construct Tensors like Gaussian or
 Laplacian kernels, or images like Lenna and Fabio.
+<a name="image.lena"/>
 ### [res] image.lena() ###
 Returns the classic `Lenna.jpg` image as a `3 x 512 x 512` Tensor.
+<a name="image.fabio"/>
 ### [res] image.fabio() ###
 Returns the `fabio.jpg` image as a `257 x 271` Tensor.
+<a name="image.gaussian"/>
 ### [res] image.gaussian([size, sigma, amplitude, normalize, [...]]) ###
 Returns a 2D [Gaussian](https://en.wikipedia.org/wiki/Gaussian_function) kernel of size `height x width`. When used as a Gaussian smoothing operator in a 2D
@@ -285,6 +343,7 @@ of it.
 Note that arguments can also be specified as key-value arguments (in a table).
+<a name="image.gaussian1D"/>
 ### [res] image.gaussian1D([size, sigma, amplitude, normalize, mean]) ###
 Returns a 1D Gaussian kernel of size `size`, mean `mean` and standard deviation `sigma`.
@@ -299,6 +358,7 @@ while a standard deviation of 0.25 is a quarter of it.
 Note that arguments can also be specified as key-value arguments (in a table).
+<a name="image.laplacian"/>
 ### [res] image.laplacian([size, sigma, amplitude, normalize, [...]]) ###
 Returns a 2D [Laplacian](https://en.wikipedia.org/wiki/Blob_detection#The_Laplacian_of_Gaussian) kernel of size `height x width`.
@@ -322,18 +382,20 @@ where the top-left corner is the origin. In other works, a mean of 0.5 is the
 center of the kernel size, while a standard deviation of 0.25 is a quarter of it.
+<a name="image.colormap"/>
 ### [res] image.colormap(nColor) ###
 Creates an optimally-spaced RGB color mapping of `nColor` colors. Note that the mapping is obtained by generating the colors around the HSV wheel, varying the Hue component. The returned `res` Tensor has size `nColor x 3`.
+<a name="image.jetColormap"/>
 ### [res] image.jetColormap(nColor) ###
 Creates a jet (blue to red) RGB color mapping of `nColor` colors. The returned `res` Tensor has size `nColor x 3`.
 ## Dependencies:
-Torch7 (www.torch.ch)
+[Torch7](www.torch.ch)
 ## Install: ``` |