Segmenting
1. EasyOCR analyses the blobs to locate the characters and their bounding box, using one of two segmentation modes:
   □ keep objects mode: one blob corresponds to one character.
   □ repaste objects mode: the blobs are grouped into characters of a nominal size. This is useful when characters are broken or made up of several parts. When a blob is too large to be considered a single character, it can be split automatically using CutLargeChars.
Character segmentation by blob grouping
2. Filters remove very large and very small unwanted features.
3. EasyOCR processes the character image to normalize the size into a bounding box, extracts relevant features, and stores them in the font file. The patterns in a font are stored as arrays of pixels defined by PatternWidth and PatternHeight (by default 5 pixels wide and 9 pixels high).
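The size normalization in step 3 can be illustrated with a minimal sketch. It is not EasyOCR's actual feature extraction, which is not described here: it assumes a binary character image given as a list of rows of 0/1 values and uses plain nearest-neighbour resampling as a stand-in. The function name normalize_pattern and its arguments are hypothetical; only the default 5 x 9 pattern size comes from the text above.

```python
def normalize_pattern(char_pixels, pattern_width=5, pattern_height=9):
    """Resample a binary character image onto a fixed-size pattern grid.

    char_pixels is a list of rows, each a list of 0/1 values.
    Nearest-neighbour sampling is an assumption made for illustration only.
    """
    src_h = len(char_pixels)
    src_w = len(char_pixels[0])
    pattern = []
    for y in range(pattern_height):
        # Map the centre of each target cell back into the source image.
        src_y = min(src_h - 1, int((y + 0.5) * src_h / pattern_height))
        row = []
        for x in range(pattern_width):
            src_x = min(src_w - 1, int((x + 0.5) * src_w / pattern_width))
            row.append(char_pixels[src_y][src_x])
        pattern.append(row)
    return pattern


# A 10 x 18 image containing a two-pixel-wide vertical bar in columns 4 and 5.
bar = [[1 if x in (4, 5) else 0 for x in range(10)] for _ in range(18)]
pattern = normalize_pattern(bar)

for row in pattern:
    print("".join("#" if p else "." for p in row))
```

Because every character is reduced to the same grid size, patterns learned from characters of different sizes can be compared cell by cell.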
Segmentation parameters
Segmentation parameters must be the same during learning and recognition. Good segmentation improves recognition.
● The Threshold parameter helps separate the text from the background. For black characters on a white background, a value that is too high thickens the characters and may cause them to merge; a value that is too low makes parts of the characters disappear. If the lighting conditions vary widely, automatic thresholding is a good choice.
Too high threshold value (left), Threshold adjustment (middle), Too low threshold value (right)
● NoiseArea: Blobs with an area smaller than this value are discarded. Make sure small character features are preserved (e.g., the dot over the letter "i").
● MaxCharWidth, MaxCharHeight: Maximum character size. If a blob does not fit in a rectangle with these dimensions, it is discarded or split into several parts using vertical cutting lines. If several blobs fit in a rectangle with these dimensions, they are grouped together.
● MinCharWidth, MinCharHeight: Minimum character size. If a blob or a group of blobs fits in a rectangle with these dimensions, it is discarded.
● CharSpacing: The width of the smallest gap between adjacent letters. If the gap between two characters is wider than this value, they are treated as different characters. This prevents thin characters from being incorrectly grouped together. If CharSpacing is larger than MaxCharWidth, it has no effect.
● RemoveBorder: Blobs near image/ROI edges cannot normally be exploited for character recognition. By default, they are discarded.
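To make the interaction between these parameters more concrete, here is a small self-contained sketch of the grouping logic. It is an illustration under assumptions, not the EasyOCR implementation: thresholding and blob extraction are assumed to have been done already, each blob is reduced to a bounding box plus a pixel area, and oversized blobs are simply discarded rather than split. The names Box, merge and segment are hypothetical; the keyword arguments only mirror the parameter names listed above.

```python
from dataclasses import dataclass


@dataclass
class Box:
    x: int     # left edge
    y: int     # top edge
    w: int     # width
    h: int     # height
    area: int  # number of foreground pixels in the blob


def merge(a, b):
    """Return the bounding box that encloses both a and b."""
    x0, y0 = min(a.x, b.x), min(a.y, b.y)
    x1 = max(a.x + a.w, b.x + b.w)
    y1 = max(a.y + a.h, b.y + b.h)
    return Box(x0, y0, x1 - x0, y1 - y0, a.area + b.area)


def segment(blobs, noise_area, max_w, max_h, min_w, min_h, char_spacing):
    # NoiseArea: blobs with an area smaller than this value are discarded.
    kept = [b for b in blobs if b.area >= noise_area]
    # MaxCharWidth/MaxCharHeight: oversized blobs are dropped in this sketch
    # (splitting along vertical cutting lines is not shown).
    kept = [b for b in kept if b.w <= max_w and b.h <= max_h]

    chars = []
    for blob in sorted(kept, key=lambda b: b.x):
        if chars:
            last = chars[-1]
            gap = blob.x - (last.x + last.w)  # negative when boxes overlap
            merged = merge(last, blob)
            # Group only if the gap does not exceed CharSpacing and the
            # result still fits in the maximum character rectangle.
            if gap <= char_spacing and merged.w <= max_w and merged.h <= max_h:
                chars[-1] = merged
                continue
        chars.append(blob)

    # MinCharWidth/MinCharHeight: discard anything that fits in the
    # minimum rectangle.
    return [c for c in chars if not (c.w <= min_w and c.h <= min_h)]


blobs = [
    Box(10, 5, 2, 10, 20),  # stem of an "i"
    Box(10, 0, 2, 3, 6),    # dot of the "i"
    Box(30, 8, 1, 1, 1),    # isolated noise pixel
    Box(20, 0, 2, 15, 30),  # an "l"
]
chars = segment(blobs, noise_area=3, max_w=8, max_h=16,
                min_w=2, min_h=2, char_spacing=3)
for c in chars:
    print(c)
```

In this example the dot and the stem of the "i" are grouped into one character because they overlap horizontally and the merged box fits in the maximum rectangle, while the "l" stays separate because the gap to the "i" exceeds the spacing value. The noise pixel is removed by the area filter, which is why the area threshold is kept below the area of the dot.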