Prepare

PREP-#Number#

Nr.

Parameter

Example

2000

JobMode_Prepare

21

Sets the JobMode to be used in the Prepare step

2001

 

PrepAddConstantString

<CurrentDate><SplittingChar>

This value represents a string of characters which is prepended to a filename during the Prepare step. The example value, which uses variables in turn, indicates that the current date with some configurable separator character is used.

It is possible to use the current filenames via [INDEXFILE] and [DATAFILE], e.g.:

<CurrentDate><SplittingChar>[DATAFILE]<SplittingChar>

2002

 

SplittingChar

#

A separator character that can be used in various contexts (like a comma or a semicolon in a CSV file).

2003

PrepXMLInputElementList

imagefile;SapBarcodeId;Creator

 

There are different syntactical notations for this parameter. The first notation is suited for parsing from XML files without opening and closing tags, whereas the second one is suited for right this. Both variants can also be combined. These notations are explained below.

 

Besides, an extended parameter notation is supported. See <PrepXMLDocTypeElement>.

 

Variant 1:

Use a list of semicolon-separated tags to search for. The order of the tags determines the order in the output file. A non-existing tag element raises an error unless the tag is marked as optional with a preceding asterisk.

 

To read a tag’s attribute value, refer to this value by separating tag name (TN) and attribute name (AN) via a dot, e.g. “TN.AN”. To read a tag's text content (TC), refer to this text by separating tag name (TN) and attribute name (AN) via a dot and append the attribute value (AV) in curly brackets , e.g. "TN.AN{AV}".

If the tag name is preceded by a namespace (NS), the separator “:” must be replaced by the placeholder [DOPPELPUNKT], otherwise the parser would incorrectly assume syntax variant 2. So, in this case instead of “NS:TN.AN” use the appropriate notation:

NS[DOPPELPUNKT]TN.AN

 

If multiple occurrences of an element are possible, signal this with a pipe symbol |. The values in their output will then be separated by <PrepXMLInputElementSeparator>. An optional field with multiple occurrences is marked by the pipe followed by an asterisk |*.

  Example of an optional field with multiple occurrences  marked by the pipe followed by an asterisk |*
  Example of an optional field with multiple occurrences
 

Variant 2:

Use a query syntax to navigate to the relevant tags. The values of these tags are written into a CSV file separated by the character <SplittingChar>. The following queries illustrated below the figure are possible:

Example of the query syntax to navigate to the relevant tags

 
  • [tree]node

    This syntax implies that the element node is unique and tree is a parent element of node. Nested elements are separated by a double colon :: .

    For example, applied to the example above, the first hit of the following query… [documentlist::document::check]pdf

    … refers to the element pdf, which is a child of check, which in turn is a child of document, which in turn is a child of documentlist.

    This syntax can only be part of a query which is prepended to the following type of query.

 

  • element:attribute

    Returns the value of the attribute of the element. It must be ensured that the element is unique, e.g. by using [tree]. According to the above example:

    document:range returns bank

 
  • element:attribute()

    Returns a list of all values of the attribute of the element, which are separated by <PrepXMLInputElementSeparator>. By specifying [tree] the search can be restricted. According to the above example:

    file:name() returns check.txt,check.doc

 

  • element.attribute1{value1}:attribute2

    Returns the value of the attribute attribute2 of the element, which has another attribute attribute1 with the value value1. If there are any dot characters in value1, these must be indicated by[PUNKT]. By specifying [tree] uniqueness can be ensured. According to the above XML source e.g.

    • target at txt files only: file.type{txt}:name returns check.txt

    • counter of list1: documentlist.name{list1}:documentcount returns 10

 

  • An asterisk in the ending of a query marks it as optional, which returns an empty string if there is no search result. Otherwise processing is halted with a corresponding error message if the query is not successful.

2004

 

PrepXMLInputElementSeparator

,

If an element was specified via <PrepXMLInputElementList>, which starts with the pipe symbol “|”, multiple occurrences are returned separated by the character given here.

2005

 

PrepFilenameDataSeparator

_

If this separator character is given, the filename of the input file is parsed and according values are returned. If the filename is “Company1_27.6.2007_092138732894.txt”, for example, and the separator is an underscore “_”, three values are extracted: “Company1”, “27.6.2007” and “092138732894”.

2006

 

IndexFileName

Stammdaten.csv

In JobMode 23 the filename given here is interpreted as the relevant master data which belongs to this job.

2007

 

AdditionalPrepareConfigFile

<ColdConfigDirectory>/check.bpochecker.ini

In JobMode 90 an additional configuration file is needed, which contains settings that are only relevant for this JobMode.

2008

PrepBarcodeFormat

ID<SpoolType><Generation>[YEAR][FILE][DOCNR]

 

With this option a unique barcode can be generated and added to the data. The format is freely definable via the following variables:

 

[YEAR]

The current year, e.g. 2017

 

[MONTH]

The current month as a number, e.g. 12 for December (01-12)

 

[DAY]

The day of the month as a number, e.g. 31 (01-31)

 

[HOUR]

The hour of the processing time (00-24)

 

[MIN]

The minute of the processing time (00-59)

 

[SEC]

The second of the processing time (00-59)

 

[FILE]

The filename without its extension

 

[DOCNR]

A (sequential) document number in the current generation

 

[DOCNR:8]

A document number of 8 digits padded with leading zeroes

 

[SUBDOCNR]

A document number in a subfolder of multiple batches

 

[SUBDOCNR:10]

Like SUBDOCNR with ten digits padded with leading zeroes

 

<PID>

The current process ID of the process

 

<CurrentTimestamp>

The number of seconds since 1970

2009

 

PrepEvalCode

push(@item, $item[3] . $item[5] );

This option manipulates or adds single data values based on Perl code. The value must be a valid Perl command operating on the variable “@item”. This can be useful if one data field shall contain a combination of multiple other fields or if its content must be cut.

Examples:

  • Input:31.12.2007#RECH#1234567#

  • PrepEvalCode:push(@item, $item[1] . $item[2] );

  • Result:31.12.2007#RECH#1234567##RECH1234567#

  • Input:31.12.2007#RECH-ABC#1234567#

  • PrepEvalCode:push(@item, substr($item[1], -3) . $item[2] );

  • Result:31.12.2007#RECH-ABC#1234567##ABC1234567#

  •  

  • Input:31.12.2007#RECH-ABC#1234567#

  • PrepEvalCode:$item[1] = substr( $item[1], 0 , 4 );

  • Result:31.12.2007#RECH#1234567#

2010

 

PrepIndexFilePrefix

Index_

An indexing file is identified based on this prefix.

2011

 

PrepPageCountBinary

<ColdBinDirectory>/pdfPageCount [FILE]

With this parameter an external program can be executed, for example to determine the number of pages. The first row of output is added as a data value.

If the internal tool “TiffPageCount” is used, awk must additionally be applied to filter the filename from this tool’s output as illustrated below:

<ColdBinDirectory>/TiffPageCount [FILE] | 
awk -field-separator=\ '{print $NF;}'

2012

 

PrepRenameDataFileColumn

0

This value specifies the column number which indicates the new filename of a content file. With the value 0 the mechanism is deactivated. If the value is 3, and in the indexing file the third value is “invoice.pdf” a file is searched for that ends with “invoice.pdf”, which is then renamed into “invoice.pdf”. This option is useful in JobMode 31, which is based on constant prefixes.

2013

 

PrepConvertIndexFiles

No

If this parameter is set to “Yes”, the character set conversion tool “iconv” is run, which must be located in the system path. This is useful if the incoming indexing files must be converted e.g. from UTF-8 to ISO8859-1.

2014

 

PrepConvertFrom

UTF-8

If <PrepConvertIndexFiles> is set to “Yes”, this parameter defines the source encoding of the incoming data. Supported encodings can be listed with “iconv –l”.

2015

 

PrepConvertTo

ISO8859-1

If <PrepConvertIndexFiles> is set to “Yes”, this parameter defines the target encoding of the conversion output. Supported encodings can be listed with “iconv –l”.

2016

 

PrepareBinary

<ColdBinDirectory>/specialPrepare.pl [FILE] [OUTFILE]

The command given with this parameter is applied to each indexing file in JobMode 45. The following variables can be used, also with multiple occurrences:

[FILE]

the complete filename, e.g. RECH-0932489.idx

[FILEEXT]

only the file extension, e.g. idx

[FILENAMEONLY]

only the filename without extension, e.g. RECH-0932489

[PATH]

the path to the file, e.g. /cache/cold/docs/Spool.0001/

[OUTFILE]

the output file, commonly like <Spool>/<SpoolType.Generation>

2017

 

RetryFlagFilename

retry.flag

In JobMode 80 it can happen that the step Prepare must be started many times. A file with the name given here is generated in the spool directory to indicate that another start was undertaken after a first attempt.

2018

 

ErrorFlagFilename

ML001.LI1.<Generation>.error

In the Prepare step in JobMode 80 errors are signaled via a file. The filename can be specified with this parameter.

2019

 

EndFlagFilename

ML001.LI1.<Generation>.end

In the Prepare step in JobMode 80 errors are signaled via a file. The finalization of writing this error file is indicated via another file with the name that can be specified with this parameter.

2020

 

RecordCheckLine

0

This parameter indicates which row represents a checksum that reflects the number of input records. With 0 the mechanism is deactivated. A negative value can be used to count backwards from the ending, so “-1” indicates the last row. The exact structure of the checksum is defined by the parameter <RecordCheckRegEx>.

2021

 

RecordCheckRegEx

([0-9]+);1\.0

This parameter defines the structure of the checksum based on a regular expression. The number of records must be put in brackets. The given example value filters any integer number which precedes the constant character string “;1.0”, e.g. the value “140” below:

  • [...]

  • ABCD#1234#Blah

  • EEEA#5678#Huhu

  • <140;1.0>

2022

 

PrepFileNameColumn

0

If <PrepSearchSubdirectories> is active, this column is interpreted as a path to extend the names of delivered files by a subdirectory prefix. This is useful to check already in the Prepare step if an associated content file is present in a subdirectory. The value 0 deactivates the mechanism.

If this parameter is used in the according Prepare JobMode, and its value is not 0, then the parameter <DataFileExtension> is ignored, as the content filenames are already fully specified via this parameter.

(This parameter is similar to the parameter <FileField > in the Splitting step.)

2023

 

OutputDirectory

<ColdDataDirectory>/error/<SpoolType>

If pre-processing requires signaling feedback to another process via a file, this directory is used for such files.

2024

 

PrepFileExtFilter

jpg

If a spool directory contains multiple file types, but only one type is supposed to be processed, this parameter can be used as a filter. Many file types can be given separated by a blank.

2025

 

PrepMultiDataFileExtensions

pdf html

This parameter is similar to <DataFileExtension> but used in JobModes where the content file extensions can change, e.g. by an uncompress operation where different file types can be present in a zipped file. The JobMode identifies all content files based on this, and usually there are also indexing files present with the same filename but a different file type extension.

In some JobModes it is possible to use multiple file type extensions separated by blanks (e.g. PrepMultiDataFileExtensions = TIF PDF tif pdf).

2026

 

PrepSubdirIdxFileExtension

Idx

This parameter is similar to <IndexFileExtension> but used in JobModes where the indexing file extensions can change, e.g. by an uncompress operation where different file types can be present in a zipped file. The JobMode identifies all indexing files based on this.

2027

 

PrepProtocolHelperFile

<ColdLoggingDirectory>/ProtocolHelpFile.log

In the Protocol JobMode 10 this file is used to signal the success status of archived documents. The file contains a mapping of a unique index value to an indexing file and its directory. Thus, the index values require a unique ID which is given by <PrepUniqueIndexForProtocol>.

2028

 

PrepUniqueIndexForProtocol

7

Via this parameter the column is specified which contains the unique index value that is written into the file <PrepProtocolHelperFile> to ensure that there is a clear reference to a belonging indexing file in a log. The value 1 indicates the first record in the indexing file.

2029

 

 

PrepXMLTopElement

document

Several documents can be referenced in an XML structure. Via this parameter a surrounding parent element can be specified to treat those documents separately. Further elements to be parsed are given by <PrepXMLInputElementList>, but the parent element is <PrepXMLTopElement>. If this value is empty, it is assumed that there is only one document referenced in the XML.

Example :

<spool>
     <document>
           <File>1.pdf</File>
     </document>
     <document>
           <File>2.pdf</File>
     </document>
 </spool>

In this example the appropriate setting would be "PrepXMLTopElement = document".

2030

 

PrepSkipLinesFromStart

0

Indicates how many rows in the beginning of the input file are ignored in the Prepare step. These rows are ignored in all input files, which can be useful if there is some kind of header data which is otherwise irrelevant for pre-processing in this step.

2031

 

PrepSkipLinesFromEnd

0

Indicates how many rows in the ending of the input file are ignored in the Prepare step. These rows are ignored in all input files. This can be useful if there is some kind of status information in the ending, such as the number of records in the file.

2032

PrepRemoveDuplicateInputLines

No

 

If this option is set to “Yes”, the incoming files are analyzed and duplicate rows are removed. It is also possible to send an e-mail notification in this case via the next parameter.

 

Be aware that the structure of the file changes in an unpredictable manner and you cannot be sure that a certain row still refers to the same document afterward!

2033

 

PrepDuplicateInputLinesEmail

batch.processing.manager@t-systems.com

This e-mail address is used for notifications if <PrepRemoveDuplicateInputLines> is set to “Yes”. Multiple recipients can be given with a blank as separator.

2034

 

PrepDuplicateInputLinesSubject

Duplicate elements removed in <SpoolType.Generation>

The subject of the notification e-mail in case of a duplicate removal can be set here.

2035

 

PrepFixedSplittingLengthCheck

200

This parameter defines the expected length of a row in an input file.

2036

 

PrepFixedSplitting

1-10,39-58,89-118,139-200

With this parameter ranges within one row can be specified which are used to extract data. This is useful if no separator characters are present but fixed positions are known instead.

The first character in a row is at position 1. If a range from 39 to 58 is targeted at, use “39-58”. Multiple ranges can be separated by a comma. If only one character shall be extracted, it is necessary to define this as a range as well, e.g. “59-59”. To define all characters starting at a certain position and reaching to the end of the line (where the length does not matter) use a dollar sign, e.g. “60-$”.

2037

 

PrepFixedSplittingFilterCol

2

If this value is greater than 0, the referenced single column is treated as a filter column. Within this column a valid value must be given according to <PrepFixedSplittingValidCols>, otherwise this row is ignored.

2038

 

PrepFixedSplittingValidCols

RECHDAT,RECHNR,RECHBETR

With this parameter the allowed values of the filter column <PrepFixedSplittingFilterCol> are specified. Multiple values separated by a comma can be given.

2039

 

PrepFixedSplittingUseLastCol

3

If this value is greater than 0 it is interpreted as a column where empty values are filled in a specific manner: If a delivered value in this column is empty, the previously found value for this column is used. For example, if a customer ID is not repeated on purpose, but it is still meant to be valid in further records, the latest valid customer ID can be referenced by this.

2040

 

PrepFixedSplittingModifyFilterCol

Yes

If this option is set to “Yes”, the value in the filter column <PrepFixedSplittingFilterCol> is preprocessed before writing it to the CSV file and all blanks and dots are removed. This is useful if this column is used as <SplittingDocTypeCol>.

2041

 

PrepXMLInputElementFileList

Attachment:cryptname();Attachment:orgname()

With this parameter a list of files can be extracted from an XML file in the indexing file. The parameter value must be formatted like <PrepXMLInputElementList>. If two elements are given, the first one determines the real, existing filename and the second one defines the newly desired filename. For example, if “cryptname” is to be renamed into “orgname”:

PrepXMLInputElementFileList=Attachment:cryptname();Attachment:orgname()

If no renaming is desired, only a single element must be given here.

2042

 

PrepAddFileIfExists

Mitzeichnung.pdf

If this file exists, it is added to the list of files. If it is missing, this is NOT interpreted as an error.

2043

 

PrepXMLInputElementListGlobal

Document.Id

This parameter can be used like <PrepXMLInputElementList>, but the values are resolved globally, so even if some top element is defined via <PrepXMLTopElement> it is possible to resolve values from outside of this top element scope.

The given example “Document.Id” represents a query notation according to variant 2 of <PrepXMLInputElementList>:

2044

 

PrepSearchSubdirectories

No

If this option is set to “Yes”, the JobMode also searches in subdirectories (only used in JobMode 21 and 29). Also see <PrepFileNameColumn>.

2045

 

PrepChecksumTest

MD5:8

Executes a checksum test (MD5 or Adler32) in JobMode 21. If this parameter is empty, no such test is run. There must be at most one belonging content file. Its name must either be like the one of the indexing file but with the data file extension <DataFileExtension> or it must be defined in column <PrepFileNameColumn>.

Examples:

  • In Col8 is an MD5 checksum of the content file: PrepChecksumTest=MD5:8

  • In Col5 is an Adler32 checksum of the content file: PrepChecksumTest=Adler32:5

2046

 

PrepFileSizeTest

0

Executes a file size check in JobMode 21. With the value “0” no such check is run. There must be at most one belonging content file. Its name must either be like the one of the indexing file but with the data file extension <DataFileExtension> or it must be defined in column <PrepFileNameColumn>.

Example:

  • In the indexing file the size is given in Col5: PrepFileSizeTest = 5

2047

 

PrepAddAllNonIdxFiles

No

If this parameter is set to “Yes”, in JobMode 21 all files which are located in the same directory as the indexing file are treated as content files [DATAFILE]. Consequently, via PrepAddConstantString=[DATAFILE] it is simply possible to add all these content files as documents. This is commonly used in combination with PrepSearchSubdirectories=Yes.

2048

PrepXMLSubSpecialCharsPath

<ColdSpoolBaseDirectory>/<SpoolType>/

<SpoolType.Generation>/specialChar.txt

 

This parameter specifies a name (with the path) of a file which contains a replacement mapping. In this file pairs consisting of some original character string and a new replacement string are given (original=new). If an index value contains one of the listed original character strings, these are substituted by the new replacement string.

 

In the following example typical HTML special character encodings are replaced by their decoded characters:

 

  • &amp;=&

  • &quot;="

  • &auml;=ä

  • &Auml;=Ä

  •  

  • &ouml;=ö

  • &Ouml;=Ö

  • &uuml;=ü

  • &Uuml;=Ü

  • &lt;=<

  • &gt;=>

  • 2049

    PrepXMLDocTypeElement

    Yes

    If this feature is used, a special syntax is expected for the parameter <PrepXMLInputElementList> where the document type must be appended, for example:

    • PrepXMLInputElementList.Rechnungen = field.name{Dokumenttyp}:value ;field.name{Scan-Datum}:value ;field.name{Rechnungsnummer}:value

    • PrepXMLInputElementList.Eingangsrechnungen = <PrepXMLInputElementList.Rechnungen>

    • PrepXMLInputElementList.Ausgangsrechnungen = <PrepXMLInputElementList.Rechnungen>

    • PrepXMLInputElementList.Bestellungen = field.name{Dokumenttyp}:value ;field.name{Scan-Datum}:value ;field.name{Bestellnummer}:value

    • PrepXMLInputElementList.Kundenauftrag = field.name{Dokumenttyp}:value ;field.name{Scan-Datum}:value ;field.name{Auftragsnummer}:value

    • PrepXMLInputElementList.Verladeschein_SAP = field.name{Dokumenttyp}:value ;field.name{Scan-Datum}:value ;field.name{Lieferschein-Nr}:value; ;field.name{SAP-Nr}:value

    2054

    PrepRetryExternalCommandAllowed

    True

    In JobMode 96, the NONCRITICAL_ERROR exit code of the external command is forwarded to the Scheduler if this parameter is set to "True". Otherwise it is intercepted and replaced by the CRITICAL_ERROR exit code. The default is "True". This means that the external command can cause the Scheduler to retry if the parameter is "True".

    2055

    PrepFixCSVLineEnding

    No

    JobMode 29 ends each line of the created CSV file with a separator. This means that there is one value too many in each line. If this field is set to “Yes”, the line end is corrected. The default is “No” because of downward compatibility.

    2057

    PrepEmptyResultIsError

    No

    This parameter controls whether an empty result list should be considered as an error or not. Thereby the result list consists of the exported revision IDs during the legal export. This parameter is relevant for the legal export JobMode 951.

    2058

    PrepWriteEmptyResultReport

    Yes

    If this parameter is set, a <Docs>/NO-HITS/note.txt file is created in case of an empty export result list. This parameter is relevant for the legal export JobMode 952 .

    2060

    PrepFileNameMask

    ISREGEX;^Continental_[\d]{10}_[\d]{8}\.xml

    If this value is set, only files whose names match the mask are accepted as index files. The syntax is the same as for “FileNameMask”.

    Table 94: Prepare configuration parameters