The IOcrSettingManager.GetSettingNames method returns the names of the values as described in this table and in the same order.
The following table describes the settings supported by the LEADTOOLS OCR Module - LEAD Engine:
Recognition
Recognition.Adaption.AdaptedDataFilePath
Boolean type. Not used in this version of LEADTOOLS.
The default orientation for the generated document if a page is blank or entirely graphics. Enum type with a default value set to None.
Possible values are:
Value | Description |
---|---|
(0) None | Do not change the orientation. |
(1) Portrait | Try to change to portrait orientation (make the width less than height) if the page is blank or entirely graphics. |
(2) Landscape | Try to change to landscape orientation (make the width greater than height) if the page is blank or entirely graphics. |
Recognition.AutoRecognizeManager.FormatSpeedOptimized
Boolean type. true to optimize the recognition speed based on the format of the final document; otherwise, false. The default value is true. As an example, the OCR engine does not recognize font attributes such as italic or bold if the final document format is Text.
Recognition.AutoSecondPass
Boolean type. Automatically perform a second image processing cleanup pass on the internal black and white image if the first pass did not provide satisfactory results. The default value is true.
Recognition.CharacterFilter.DiscardNoiseLikeCharacters
Boolean type. true to ignore recognized characters that have features similar to noise; otherwise, false. The default value is false.
Recognition.CharacterFilter.DiscardNoisyZones
Boolean type. true to discard all the results in the zone if the engine determines that all the characters recognized are noise; otherwise, false. The default value is false.
Recognition.CharacterFilter.MinimumPixelHeight
Integer (0 to Int32.MaxValue). The minimum height of a recognized character, in pixels. The default value is 6.
Recognition.CharacterFilter.MinimumPixelSizeExcludeCharacters
String (No maximum. Can be null). Characters to exclude from the minimum pixel width and height rule. The default value is ".".
Recognition.CharacterFilter.MinimumPixelWidth
Integer (0 to Int32.MaxValue). The minimum width of a recognized character, in pixels. The default value is 6.
Recognition.CharacterFilter.PostprocessMICR
Boolean type. true to post-process any MICR zones by discarding all of the characters, numbers and symbols that do not belong to the MICR character set and by performing basic data validity checks; otherwise, false. The default value is true.
Recognition.DetectColors
Boolean type. Automatically detect the foreground and background colors of each character. The default value is true. If this value is true, then the engine tries to automatically detect the colors of the zones when IOcrPage.AutoZone is called and sets the values in OcrZone.ForeColor and OcrZone.BackColor.
Recognition.DetectExactCharacterBounds
Boolean type. true to detect the exact bounding rectangle for each recognized character; otherwise, false. The default value is false.
Recognition.Fonts.DetectFontStyles
Enum type. Enable or disable the detection of specific font properties. These flags affect the final generated document if the format supports fonts (such as the PDF or DOCX formats). The default value is Bold | Italic | Underline | SansSerif | Serif | Proportional | Superscript | Subscript | Strikeout (0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80 | 0x100 = 0x1FF).
Values can be OR'ed together. Possible values are:
Value | Description |
---|---|
(0) None | Do not detect any font styles. |
(0x01) Bold | Detect bold font styles. |
(0x02) Italic | Detect italic font styles. |
(0x04) Underline | Detect underline font styles. |
(0x08) SansSerif | Detect Sans-Serif font styles (for example, Arial). |
(0x10) Serif | Detect Serif font styles (for example, Times New Roman). |
(0x20) Proportional | Detect proportional font styles (for example, Times New Roman or Arial) or fixed space font styles (for example, Courier New). |
(0x40) Superscript | Detect superscript font styles. |
(0x80) Subscript | Detect subscript font styles. |
(0x100) Strikeout | Detect strikeout font styles. |
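The default value quoted for this setting is the bitwise OR of all nine style flags. The following is a minimal Python sketch of that arithmetic. The `FontStyles` class is a hypothetical stand-in defined here for illustration only; it is not part of the LEADTOOLS API, and the flag-clearing step is just an example of combining the documented values.

```python
from enum import IntFlag


class FontStyles(IntFlag):
    """Hypothetical mirror of the flag values listed in the table above."""
    NONE = 0x00
    BOLD = 0x01
    ITALIC = 0x02
    UNDERLINE = 0x04
    SANS_SERIF = 0x08
    SERIF = 0x10
    PROPORTIONAL = 0x20
    SUPERSCRIPT = 0x40
    SUBSCRIPT = 0x80
    STRIKEOUT = 0x100


# OR every flag together to reproduce the documented default.
default_styles = FontStyles.NONE
for style in FontStyles:
    default_styles |= style

# Clear two flags by masking with plain integers.
scripts_mask = int(FontStyles.SUPERSCRIPT | FontStyles.SUBSCRIPT)
without_scripts = int(default_styles) & ~scripts_mask

print(hex(int(default_styles)))  # 0x1ff
print(hex(without_scripts))      # 0x13f
```

The resulting integer is what would be supplied as the enum value for the setting, assuming the flags are passed in their numeric form.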
Recognition.Fonts.EnableCapsCaps
Boolean type. true to enable Caps/Caps (CamelCase) font recognition enhancements; otherwise, false. The default value is false.
Recognition.Fonts.RecognizeFontAttributes
Boolean type. true to enable font attribute recognition; otherwise, false. The default value is true.
Setting this value to false can improve the speed of the IOcrPage.Recognize method if the font attributes of the recognized characters are not required, for instance, if recognition is used to obtain raw text only and not to create a formatted output document.
Recognition.MaximumPageConventionalMemorySize
Integer (0 to Int32.MaxValue). The appropriate setting for Recognition.MaximumPageConventionalMemorySize depends on the system hardware configuration, the number of cores, and the application types being used. Change this setting if out-of-memory errors occur when running your application. The IOcrEngine supports loading RasterImage objects directly from disk files, streams or URLs (as, for example, the various methods in the IOcrPageCollection and IOcrAutoRecognizeManager classes do). The loaded RasterImage holds the original image and is useful only when saving graphics zones or image-over-text overlays. If the image is large and was created using conventional memory, then a large amount of physical memory is used to hold it and is not available for other purposes such as auto-zoning or recognition. This is more noticeable in multi-threaded applications, where loading several large images in conventional memory can cause out-of-memory errors, even when performing operations that would normally succeed.
Use "MaximumPageConventionalMemorySize" to set the maximum size of the image in memory allowed before the IOcrEngine automatically switches to use the disk memory feature of RasterImage (RasterMemoryFlags). The "MaximumPageConventionalMemorySize" value is in KBytes and its default depends on the processor being used. For x86 processors, the value is 42187 (42 MBytes). For x64 processors, the value is calculated dynamically (1.7 GBytes for each 8 cores, not exceeding the physical memory size). These values allow a typical OCR image (8.5 by 11 inches at 300 DPI and 32 bits per pixel) to be loaded in conventional memory. Anything significantly larger than that gets switched to use disk memory mode.
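To see why the x86 default accommodates a typical page, the uncompressed size of an image can be estimated from its physical size, resolution and bit depth. The following Python sketch does that arithmetic. It assumes 1 KByte = 1024 bytes and ignores any row padding or per-image overhead, so treat the results as rough estimates; the function name is made up for this example.

```python
X86_DEFAULT_LIMIT_KB = 42187  # documented default for x86 processors


def image_size_kbytes(width_in, height_in, dpi, bits_per_pixel):
    """Estimate the uncompressed in-memory size of an image in KBytes.

    Assumes tightly packed rows (no padding) and 1 KByte = 1024 bytes.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bytes = width_px * height_px * bits_per_pixel / 8
    return total_bytes / 1024


# The typical OCR page from the text: 8.5 x 11 inches, 300 DPI, 32 bpp.
letter_kb = image_size_kbytes(8.5, 11, 300, 32)

# A much larger scan: 11 x 17 inches, 600 DPI, 32 bpp.
tabloid_kb = image_size_kbytes(11, 17, 600, 32)

for name, size_kb in (("letter", letter_kb), ("tabloid", tabloid_kb)):
    mode = "conventional" if size_kb <= X86_DEFAULT_LIMIT_KB else "disk"
    print(f"{name}: {size_kb:.0f} KB -> {mode} memory")
```

Under these assumptions the letter-size page needs roughly 32,871 KBytes and stays below the 42187 KByte limit, while the larger scan exceeds it and would be a candidate for disk memory mode.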
Several factors affect the performance of a particular setting and must be weighed, including the following:
Recognition.ModifyProcessingImage
Boolean type. true to modify the processing image after recognition; otherwise, false. The default value is true. It is best to set the value of this setting to true if IOcrPage.Recognize is called only once per page.
IOcrAutoRecognizeManager temporarily sets the value of this setting to true while performing a recognition job.
Recognition.Preprocess.BlackWhiteImageConversionMethod
Enum type. This setting influences how a non-black and white image, stored in the Engine, is converted to a black and white one. The default value is Default (0).
Possible values are:
Value | Description |
---|---|
(0) Default | This affects grayscale or 24-bit color images. A black and white image is created in the Engine's memory. Image binarization applies an automatic adaptive thresholding algorithm. |
(1) Dynamic | This affects grayscale or 24-bit color images. A black and white image is created in the Engine's memory. Each pixel is compared to a dynamically-calculated threshold. If the pixel intensity is higher than the threshold, it is set to white; otherwise, it is set to black. |
(2) User | This affects grayscale or 24-bit color images. A black and white image is created in the Engine's memory. Thresholding with a user-defined threshold value is performed. Set the threshold with Recognition.Preprocess.BlackWhiteImageConversionThreshold. |
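As a rough illustration of the User method, the Python sketch below applies one fixed threshold to a small grayscale image. It is not the engine's implementation: it assumes the same "higher than the threshold becomes white" convention described for the Dynamic method, and it uses 185, the documented default of Recognition.Preprocess.BlackWhiteImageConversionThreshold. The function name is hypothetical.

```python
def to_black_and_white(gray_rows, threshold=185):
    """Binarize a grayscale image given as rows of 0-255 intensities.

    Pixels brighter than the threshold become white (255); all others
    become black (0). This convention is an assumption for illustration.
    """
    return [
        [255 if pixel > threshold else 0 for pixel in row]
        for row in gray_rows
    ]


sample = [
    [0, 120, 185],
    [186, 200, 255],
]
binary = to_black_and_white(sample)
print(binary)  # [[0, 0, 0], [255, 255, 255]]
```

Lowering the threshold turns more mid-gray pixels white, which can help with dark backgrounds but may erase faint text.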
Recognition.Preprocess.DownSampleLargeImage
Boolean type. true to downsample large images prior to recognition; otherwise, false. The default value is false.
Set the value to true to prevent the OCR engine from creating processing images (the image used for recognition) larger than 4000 by 4000 pixels (in order to preserve memory and resources). This value is ignored if the value of the MobileImagePreprocess setting is true.
Recognition.Preprocess.BlackWhiteImageConversionThreshold
Integer (0 to 255). The threshold to use when converting colored images to bitonal (black and white) in preparation for recognizing the text on the image. Conversion separates the text intensities from the background intensities. The default value is 185.
This is the equivalent of calling IntensityDetectCommand on the image with InColor equal to the detected foreground (text) color, OutColor equal to the detected background color, Channel set to Master, HighThreshold equal to 255, and LowThreshold equal to the value of this setting.
Recognition.Preprocess.ModifyOriginalImageOptions
Enum type. Specifies how the original image is modified when calling the IOcrPage.AutoPreprocess method. The default value is Deskew | Rotate | Invert (0x01 | 0x02 | 0x04 = 0x07).
Values can be OR'ed together. Possible values are:
Value | Description |
---|---|
(0) None | Never modify the original image. |
(0x01) Deskew | Apply any angle found while deskewing (IOcrPage.GetDeskewAngle) to the original image. |
(0x02) Rotate | Apply the angle (always a right angle) found while performing auto-orientation (IOcrPage.GetRotateAngle) to the original image (auto-orient). |
(0x04) Invert | Apply the inversion value (IOcrPage.IsInverted) to the original image. |
Recognition.Preprocess.MobileImagePreprocess
Boolean type. true to enable mobile image processing mode; otherwise, false. The default value is false.
By design, the OCR engine tries to upscale low resolution (DPI) images. However, most cameras in mobile devices take pictures with a low resolution (for example, 72 DPI) but a large pixel size. If the OCR engine upscales such images, a lot more memory than necessary is consumed. If you are using the OCR engine to process images from a mobile camera, set the mobile image processing mode to true.
Recognition.Preprocess.RemoveInvertedTextRegionsFromProcessImage
Boolean type. true to automatically detect and recognize inverted regions; otherwise, false. The default value is false.
Set this value to true to support OCRing an image that contains both black/white and white/black areas.
Recognition.Preprocess.UseZoningEngine
Boolean type. true to use the zoning engine to exclude graphics areas from preprocessing calculations such as deskew and auto-rotate; otherwise, false. The default value is true.
Recognition.Preprocess.MinimumAutoRotateConfidence
Integer (0 to 100). Used by the IOcrPage.AutoPreprocess method to determine the minimum confidence percentage threshold to use when orienting pages. The default value is 26.
Recognition.RecognitionModuleTradeoff
Enum type. Recognition module tradeoff between speed and accuracy. The default value is Balanced (1).
Possible values are:
Value | Description |
---|---|
(0) Accurate | Accuracy is more important than speed. |
(1) Balanced | Accuracy and speed are equally important. |
(2) Fast | Speed is more important than accuracy. |
Recognition.RecognitionModuleType
Enum type. Specifies the recognition module to use to process machine-printed text, handwritten text, or a combination of both.
Possible values are:
Value | Description |
---|---|
(0) OCR | The engine considers all input images to have machine-printed text only. |
(1) ICR | The engine considers all input images to have handwritten text only. |
(2) Mixed | The engine considers all input images to have a mix of machine-printed text and handwritten text. The engine automatically detects handwritten zones and machine-printed zones, even if these zones have not been pre-specified by users. |
Recognition.ShareOriginalImage
Boolean type. true to share the image reference used to create the OCR page; otherwise, false. The default value is false.
Setting this value to true affects the IOcrPageCollection.AddPage(RasterImage, OcrProgressCallback) and IOcrPageCollection.InsertPage(int, RasterImage, OcrProgressCallback) methods. When the value is false (the default), these methods make a copy of the image and use the copy to create the page. Calling IOcrPage.GetRasterImage(OcrPageType.Original) on such a page returns a null reference.
When the value is true, these methods use the same image reference to create the page. Calling IOcrPage.GetRasterImage(OcrPageType.Original) on such a page returns the original image reference.
IOcrAutoRecognizeManager temporarily sets the value of this setting to true while performing a recognition job.
Recognition.Threading.MaximumThreads
Integer (0 to Int32.MaxValue). Gets or sets the maximum number of threads to use in recognition. The LEADTOOLS OCR Module - LEAD Engine provides support for recognizing document zones in separate threads. This can improve the performance of the IOcrPage.Recognize method.
The default value is 0, which instructs LEADTOOLS to use the system thread pool. The number of threads is calculated automatically. Setting the value to a number greater than 1 also turns multi-threading on, but the number of threads specified is ignored and calculated automatically instead. To turn multi-threading off inside the IOcrPage.Recognize method and use a single thread, set the value to 1.
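The three cases described for this setting can be summarized in a small decision function. The Python sketch below only restates the documented behavior; the function name and the returned labels are made up for this example and do not correspond to anything in the LEADTOOLS API.

```python
def recognition_threading_mode(maximum_threads):
    """Describe how a MaximumThreads value is interpreted.

    0  -> multi-threaded, thread count chosen automatically (default)
    1  -> single-threaded
    >1 -> multi-threaded, the requested count is ignored
    """
    if maximum_threads < 0:
        raise ValueError("MaximumThreads must be 0 or greater")
    if maximum_threads == 1:
        return "single-threaded"
    return "multi-threaded (automatic thread count)"


for value in (0, 1, 8):
    print(value, "->", recognition_threading_mode(value))
```

In other words, the only value that changes behavior relative to the default is 1.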
Recognition.Zoning.CropZoneImage
Boolean type. true to crop each zone from the original image and recognize it by itself; otherwise, false. The default value is true.
Setting this value to true can improve the performance of the IOcrPage.Recognize method.
Recognition.Zoning.DetectVerticalZones
Enum type. Vertical zone detection mode. This works with Latin and Asian languages.
Possible values are:
Value | Description |
---|---|
(0) Auto | Automatic - currently this means on for Asian languages such as Japanese and off for Latin languages such as English. |
(1) On | On - currently this means on for both Asian languages such as Japanese and Latin languages such as English. |
(2) Off | Off - currently this means off for both Asian languages such as Japanese and Latin languages such as English. |
Recognition.Zoning.DetectZoneRotationAngle
Boolean type. true to try to detect a separate rotation angle for each zone; otherwise, false. The default value is false.
Setting this value to false can increase the speed of the recognition engine.
Recognition.Zoning.DisableMultiThreading
Boolean type. true to disable multi-threading when performing auto-zoning; otherwise, multi-threading is enabled. The default value is false.
Multi-threading enhances the performance of the auto-zoning algorithm. However, it can be undesirable if the OCR engine is hosted on a server.
Recognition.Zoning.EnableDoubleZoning
Boolean type. true to perform a second, internal auto-zoning procedure on each text zone in order to generate more homogeneous zones for recognition; otherwise, false. The default value is true. Setting this value can improve the performance of the IOcrPage.Recognize method.
Recognition.Zoning.Options
Enum type. These flags affect the way the IOcrPage.AutoZone method works. The default value is Detect Text | Detect Graphics | Detect Table | Detect Accurate Zones | Table Cells as Zones | Use Advanced Table Detection | Use Text Extractor | Favor Graphics (0x01 | 0x02 | 0x04 | 0x10 | 0x40 | 0x80 | 0x100 | 0x400).
Values can be OR'ed together. Possible values are:
Value | Description |
---|---|
(0) None | No options. The engine will not detect any zones and hence no recognition is performed. |
(0x01) Detect Text | Search for text zones inside the page image. |
(0x02) Detect Graphics | Search for graphic zones inside the page image. |
(0x04) Detect Table | Search for table zones inside the page image. |
(0x08) Allow Overlap | Allow zones to overlap; otherwise, detected zones will not overlap. |
(0x10) Detect Accurate Zones | Detect smaller and more accurate zones (like page paragraphs). Unless this flag is set the auto-zone algorithm tries to detect the largest possible zones. |
(0x20) Recognize One Cell Table | Even if a table has only one cell, detect it as a table. Must be OR'ed with Detect Table. |
(0x40) Table Cells as Zones | Treat each cell detected inside a table as its own zone. If this option is set, the zone types are OcrZoneType.Text instead of OcrZoneType.Table. Must be OR'ed with Detect Table. |
(0x80) Use Advanced Table Detection | Use advanced table detection to obtain the most accurate results when the document contains tables. This option recursively and aggressively parses the document to locate the positions of tables and cells. Using this option generates the most accurate representation of the original document and its tables in the final output. This option must be OR'ed with Detect Table. |
(0x100) Use Text Extractor | Improves text zone recognition. Extracts text by separating graphics and tables from text areas. |
(0x200) Detect Checkbox | Search for checkbox zones inside the page image. |
(0x400) Favor Graphics | Favor converting a blob with very low accuracy into a contiguous graphics zone instead of text. This option is on by default and results in a better visual representation of the output documents. The OCR engine turns this option off when performing Auto Recognize with the Text or PDF with Image over Text options set. In this mode, the engine assumes that the result should contain all text parsed regardless of quality. |
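Three of the options above are documented as having to be OR'ed with Detect Table. The Python sketch below shows one way to check a combination for that dependency before applying it. The `ZoningOptions` class and the helper function are hypothetical and defined here only for illustration; the numeric values are taken from the table.

```python
from enum import IntFlag


class ZoningOptions(IntFlag):
    """Hypothetical mirror of the flag values listed in the table above."""
    NONE = 0x000
    DETECT_TEXT = 0x001
    DETECT_GRAPHICS = 0x002
    DETECT_TABLE = 0x004
    ALLOW_OVERLAP = 0x008
    DETECT_ACCURATE_ZONES = 0x010
    RECOGNIZE_ONE_CELL_TABLE = 0x020
    TABLE_CELLS_AS_ZONES = 0x040
    USE_ADVANCED_TABLE_DETECTION = 0x080
    USE_TEXT_EXTRACTOR = 0x100
    DETECT_CHECKBOX = 0x200
    FAVOR_GRAPHICS = 0x400


# Options the table says must be combined with Detect Table.
REQUIRES_DETECT_TABLE = (
    ZoningOptions.RECOGNIZE_ONE_CELL_TABLE,
    ZoningOptions.TABLE_CELLS_AS_ZONES,
    ZoningOptions.USE_ADVANCED_TABLE_DETECTION,
)

# The documented default combination.
DEFAULT_OPTIONS = (
    ZoningOptions.DETECT_TEXT
    | ZoningOptions.DETECT_GRAPHICS
    | ZoningOptions.DETECT_TABLE
    | ZoningOptions.DETECT_ACCURATE_ZONES
    | ZoningOptions.TABLE_CELLS_AS_ZONES
    | ZoningOptions.USE_ADVANCED_TABLE_DETECTION
    | ZoningOptions.USE_TEXT_EXTRACTOR
    | ZoningOptions.FAVOR_GRAPHICS
)


def flags_missing_detect_table(options):
    """Return names of table-dependent flags set without Detect Table."""
    if options & ZoningOptions.DETECT_TABLE:
        return []
    return [flag.name for flag in REQUIRES_DETECT_TABLE if options & flag]


print(hex(int(DEFAULT_OPTIONS)))  # 0x5d7
print(flags_missing_detect_table(
    ZoningOptions.DETECT_TEXT | ZoningOptions.TABLE_CELLS_AS_ZONES
))  # ['TABLE_CELLS_AS_ZONES']
```

The default combination passes the check because it already includes Detect Table.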
Recognition.Words.DiscardLowConfidenceWords
Boolean type. true to discard words with a low rating; otherwise, false. The default value is true.
This setting controls the output. If true, the engine checks the confidence of each word and prevents any having a low rating (lower than the LowWordConfidence value) from being included when saving the recognition results to any of the document formats supported by the LEADTOOLS toolkits.
Recognition.Words.DiscardLowConfidenceZones
Boolean type. true to discard zones with low ratings; otherwise, false. The default value is false. This setting controls the output. If true, the engine checks all of the words/characters in a zone. If it determines that the overall confidence and type of characters constitute noise, then the recognition results for the entire zone are discarded.
Recognition.Words.LowWordConfidence
Integer (0 to 100). Discard any word with a confidence value less than this value. The default value is 50. This setting only takes effect when DiscardLowConfidenceWords is set to true.
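The interaction between DiscardLowConfidenceWords and LowWordConfidence can be pictured as a simple filter over recognized words. The Python sketch below is an illustration only: the function, its parameters and the sample words are invented for this example, and the engine's actual output stage may differ.

```python
def filter_words(words, discard_low_confidence_words=True,
                 low_word_confidence=50):
    """Drop words whose confidence is below the threshold.

    `words` is a list of (text, confidence) pairs with confidence in
    the range 0-100. Nothing is dropped when discarding is disabled.
    """
    if not discard_low_confidence_words:
        return list(words)
    return [
        (text, confidence)
        for text, confidence in words
        if confidence >= low_word_confidence
    ]


recognized = [("invoice", 92), ("t0tal", 50), ("~~", 12)]

kept = filter_words(recognized)
print(kept)  # [('invoice', 92), ('t0tal', 50)]

everything = filter_words(recognized, discard_low_confidence_words=False)
print(len(everything))  # 3
```

A word whose confidence equals the threshold is kept, since only values less than LowWordConfidence are discarded.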
SpellChecker
SpellChecker.EnableCache
Boolean type. true to enable caching of frequent words; otherwise, false. The default value is true.
SpellChecker.MaximumDictionaries
Integer (0 to 255). Gets or sets the maximum number of spell checkers to use at the same time. The default value is the number of available dictionaries found in the system.