Understanding the RED FLAGS engine
Previously we showed you a bird's-eye view of the Red Flags architecture. Now let's take a detailed look at the engine itself.
Control & data flow
Let's see how the engine works.
- First, the main method is RedflagsEngineApp.main(). It handles the special command line arguments that don't need the engine itself. If none of them is specified, it starts the Spring platform.
- Spring then takes the lead and initializes all components. The engine's first stage is RedflagsEngineBoot. It depends on RedflagsEngineConfig, which is filled with the configuration parameters.
- RedflagsEngineBoot calls ScopeProvider to select and instantiate the appropriate implementation of AbstractScope. Scopes are basically NoticeID iterators; you can find them in the hu.petabyte.redflags.engine.scope package.
  - After that it creates and configures a TedInterface to be used by the engine later. TedInterface handles downloading from TED and also manages the file cache.
  - Then it calls GearLoader to load the appropriate gears (Notice processors).
  - Lastly, RedflagsEngineBoot starts a new RedflagsEngineSession.
- RedflagsEngineSession receives the scope, the gears, and the thread count, and first initializes the gears.
  - Then it creates a thread pool, starts iterating over the notice IDs (using the scope) and creates a GearTrain thread for each of them.
  - Lastly, it waits for all threads to finish.
- GearTrain objects are processor chains containing the gears. A chain receives an empty Notice object, gives it to the first gear, takes its result, gives it to the next gear, and so on.
  - Notices are not collected, but there's an option for that in RedflagsEngineSession.
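The session part of this flow can be sketched in plain Java. This is only a minimal sketch under simplifying assumptions, not the engine's code: SessionSketch and runSession are made-up names, notices are modeled as plain strings, and gears as simple functions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the session stage, not the real RedflagsEngineSession.
public class SessionSketch {

    // Starts one task per notice ID, waits for all of them, returns the number of finished tasks.
    static int runSession(Iterable<String> scope, List<UnaryOperator<String>> gears, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Void>> pending = new ArrayList<>();
        try {
            for (String noticeId : scope) {
                // One "gear train" per notice: the notice travels through the gears in order.
                Callable<Void> train = () -> {
                    String notice = noticeId;
                    for (UnaryOperator<String> gear : gears) {
                        notice = gear.apply(notice);
                    }
                    return null;
                };
                pending.add(pool.submit(train));
            }
            for (Future<Void> task : pending) {
                task.get(); // block until this train has finished
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a gear train failed", e);
        } finally {
            pool.shutdown();
        }
        return pending.size();
    }

    public static void main(String[] args) {
        List<UnaryOperator<String>> gears = List.of(
                notice -> notice + " parsed",
                notice -> notice + " flagged");
        int finished = runSession(List.of("1-2015", "2-2015", "3-2015"), gears, 2);
        System.out.println(finished + " notices processed");
    }
}
```

The notice IDs in main are invented; in the engine they come from the selected scope.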
Gear trains
GearLoader reads the gear list from the redflags.engine.gears property. The default list is the following:
redflags.engine.gears:
  # parse metadata
  - metadataParser
  #- cancelledNoticeFilter
  # skip non-Hungarian notices
  - countryFilter
  - originalLanguageFilter
  # download all tabs, and related notice tabs
  - archiver
  - docFamilyFetcher
  - docFamilyArchiver
  # skip further processing of old notices and tenders with non-classic dir.
  - publicationDateFilter
  - docFamilyDateFilter
  - directiveCorrector
  - directiveFilter
  # magic I. - parse document based on template
  - templateBasedDocumentParser
  - rawValueParser
  - frameworkAgreementParser
  - estimatedValueParser
  - renewableParser
  - countOfInvOpsParser
  - awardCriteriaParser
  - openingDateParser
  # magic II. - indicators
  - fwAgOneParticipantIndicator
  - fwAgFewParticipantsIndicator
  - fwAgLongDurationIndicator
  - fwAgHighEstimatedValueIndicator
  - contrDescCartellingIndicator
  - highEstimatedValueIndicator
  - totalQuantityHiDeltaIndicator
  - renewalOfContractIndicator
  - durationLongOrIndefiniteIndicator
  - persSitMissingCondIndicator
  - finAbMissingMinCondIndicator
  - finAbEquityCondIndicator
  - finAbRevenueCondExceedEstimValIndicator
  - finAbRevenueCondManyYearsIndicator
  - techCapMissingMinCondIndicator
  - techCapSingleContractRefCondIndicator
  - techCapExpertsExpCondManyYearsIndicator
  - techCapGeoCondIndicator
  - techCapRefCondExceedEstimValIndicator
  - techCapRefCondManyYearsIndicator
  - techCapEURefCondIndicator
  - procTypeAcceleratedIndicator
  - procTypeNegotiatedNoJustificationIndicator
  - countOfInvOpsLowIndicator
  - countOfInvOpsNoCondIndicator
  - awCritLacksIndicator
  - awCritPaymentDeadlineCondIndicator
  - awCritMethodMissingIndicator
  - deadlineIsTightIndicator
  - openingDateDiffersFromDeadlineIndicator
  - offerGuaranteeIsHighIndicator
  - finalValFarFromEstimValIndicator
  - procWithoutContractNoticeIndicator
  - numberOfOffersLowIndicator
  - unsuccessfulProcWithRiskIndicator
  - unsuccessfulProcWithoutInfo1Indicator
  - unsuccessfulProcWithoutInfo2Indicator
  - decisionDateDiffersFromOpeningDateIndicator
  - highFinalValueIndicator
  - contractingOrgInKMDBIndicator
  - winnerOrgInKMDBIndicator
  # output
  - H1FlagExporter
  - H2FlagExporter
  - mySQLExporter
  - stopGear
Gears are passed to a GearTrain together with a notice. GearTrain objects are Callable<Void> instances: their call method goes through the gears and passes the notice object to each gear's process method. The first gear receives a Notice object that contains only the notice ID.
Gear trains have a filter feature: when a gear returns null, the cycle breaks and the processing of that notice stops at that point.
See the stopGear at the end of the list. StopGear returns null, so it stops the processing. You should always put it at the end of your own list: if you specify a gear list that is shorter than the default one, it overwrites only the first part of the default list, and the rest of the default list stays in use.
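To see why the trailing stopGear matters, here is a minimal sketch of the override behavior. It assumes the custom list replaces the default list index by index. GearListOverlay, its methods, the shortened default list and myExporter are all made up for the illustration, and this is not the engine's own merging code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: models an index-by-index override of the default gear list.
public class GearListOverlay {

    // Custom entries replace default entries at the same index; the default tail survives.
    static List<String> overlay(List<String> defaults, List<String> custom) {
        List<String> merged = new ArrayList<>(defaults);
        for (int i = 0; i < custom.size(); i++) {
            if (i < merged.size()) {
                merged.set(i, custom.get(i));
            } else {
                merged.add(custom.get(i));
            }
        }
        return merged;
    }

    // Gears that actually get to process a notice: everything up to and including the first stopGear.
    static List<String> effective(List<String> merged) {
        int stop = merged.indexOf("stopGear");
        return stop < 0 ? merged : merged.subList(0, stop + 1);
    }

    public static void main(String[] args) {
        // Shortened stand-in for the default list.
        List<String> defaults = List.of("metadataParser", "countryFilter", "archiver", "mySQLExporter", "stopGear");
        // Hypothetical custom list, shorter than the default one.
        List<String> custom = List.of("metadataParser", "myExporter", "stopGear");

        List<String> merged = overlay(defaults, custom);
        System.out.println(merged);            // the default tail is still in the list
        System.out.println(effective(merged)); // but the custom stopGear cuts it off
    }
}
```

Without the stopGear entry in the custom list, mySQLExporter from the default tail would still run in this model.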
When a gear throws an exception, GearTrain logs it, but the cycle continues with the next gear in line.
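Both rules (null stops the train, an exception does not) fit into a few lines. The following is a rough sketch, not the real GearTrain: notices are plain strings, the Gear interface and the log list are stand-ins, and it assumes that after a failure the next gear receives the last successfully produced notice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;

// Hypothetical sketch of a gear train; the real GearTrain passes Notice objects around.
public class GearTrainSketch implements Callable<Void> {

    // Stand-in for a gear: returns the notice for the next gear, or null to stop the train.
    interface Gear {
        String process(String notice) throws Exception;
    }

    private final List<Gear> gears;
    private final String noticeId;
    final List<String> log = new ArrayList<>(); // stands in for the real logger

    GearTrainSketch(List<Gear> gears, String noticeId) {
        this.gears = gears;
        this.noticeId = noticeId;
    }

    @Override
    public Void call() {
        String notice = noticeId; // the first gear only gets the ID
        for (Gear gear : gears) {
            try {
                String result = gear.process(notice);
                if (result == null) {
                    log.add("stopped");
                    break; // filter feature: a null result ends the processing
                }
                notice = result;
            } catch (Exception e) {
                // Log and move on. Assumption: the next gear sees the last good notice.
                log.add("gear failed: " + e.getMessage());
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Gear> gears = List.of(
                notice -> notice + " parsed",
                notice -> { throw new IllegalStateException("boom"); },
                notice -> null,                 // behaves like stopGear
                notice -> notice + " exported"  // never reached
        );
        GearTrainSketch train = new GearTrainSketch(gears, "1-2015");
        train.call();
        System.out.println(train.log);
    }
}
```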
GearTrains are called by RedflagsEngineSession on separate threads. The size of the thread pool can be configured with the threads property.
Gears in general
As I said before, gears are notice processors. They take an input notice and should return an output notice or null. Every gear extends AbstractGear. There is only one method that has to be implemented:
protected abstract Notice processImpl(Notice notice) throws Exception;
This method is called by the public final method process after a null-check of the input notice. The process method is called by the GearTrain.
Additionally, there are two methods for gear initialization and finalization which can be overridden: beforeSession() and afterSession(). They are called by RedflagsEngineSession before and after calling the GearTrain. When a beforeSession method throws an exception, the session ends at that point.
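Here is a small sketch of this contract, with the caveat that it is a guess at the shape, not a copy of the real classes. SketchGear stands in for AbstractGear, notices are plain strings, and two details are assumptions: that the null-check simply passes null through, and that afterSession is skipped when initialization fails.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins; the real classes are AbstractGear and RedflagsEngineSession.
public class GearLifecycleSketch {

    abstract static class SketchGear {
        // Template method: guards against null, then delegates to the subclass.
        // Assumption: a null input is passed through as null.
        public final String process(String notice) throws Exception {
            return notice == null ? null : processImpl(notice);
        }

        protected abstract String processImpl(String notice) throws Exception;

        public void beforeSession() throws Exception { /* optional init hook */ }

        public void afterSession() throws Exception { /* optional cleanup hook */ }
    }

    // Records its lifecycle calls and can be told to fail during initialization.
    static class RecordingGear extends SketchGear {
        final List<String> events = new ArrayList<>();
        private final boolean failOnInit;

        RecordingGear(boolean failOnInit) {
            this.failOnInit = failOnInit;
        }

        @Override
        public void beforeSession() throws Exception {
            if (failOnInit) {
                throw new IllegalStateException("init failed");
            }
            events.add("before");
        }

        @Override
        protected String processImpl(String notice) {
            events.add("process " + notice);
            return notice;
        }

        @Override
        public void afterSession() {
            events.add("after");
        }
    }

    // Returns false when initialization or finalization fails.
    static boolean runSession(SketchGear gear, List<String> noticeIds) {
        try {
            gear.beforeSession();
        } catch (Exception e) {
            return false; // the session ends here, nothing is processed
        }
        for (String id : noticeIds) {
            try {
                gear.process(id);
            } catch (Exception e) {
                // a failing notice does not end the session
            }
        }
        try {
            gear.afterSession();
        } catch (Exception e) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        RecordingGear ok = new RecordingGear(false);
        System.out.println(runSession(ok, List.of("1-2015", "2-2015")) + " " + ok.events);
        RecordingGear broken = new RecordingGear(true);
        System.out.println(runSession(broken, List.of("1-2015")) + " " + broken.events);
    }
}
```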
Using the @PostConstruct annotation or InitializingBean for initialization is not recommended, because that way the gear is initialized at application startup, whether it is used or not. With the beforeSession method, the initialization is performed only when the gear is present in the gear list.
Gears must have the @Component annotation so that GearLoader can instantiate them using Spring. By default, every Spring @Component is a singleton. If you need a separate instance of a gear for every notice (for example for thread-safety), add the @Scope("prototype") annotation to your gear class.
By default, the bean name of your gear is the class name with the first letter converted to lower case. You can override this by passing a name to the @Component annotation: @Component("CustomName"). The bean name is what you use in the gear list.
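The default naming rule is easy to try out. As far as I know, Spring's default bean name generator follows the JavaBeans decapitalization rule, which the JDK exposes as Introspector.decapitalize. The sketch below uses that JDK method directly; BeanNameSketch and URLGear are made-up names.

```java
import java.beans.Introspector;

// Illustrates the default naming rule; GearLoader itself is not involved here.
public class BeanNameSketch {

    // JavaBeans decapitalization of a simple class name.
    static String defaultBeanName(String simpleClassName) {
        return Introspector.decapitalize(simpleClassName);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"MyGear", "MySQLExporter", "URLGear"}) {
            System.out.println(name + " -> " + defaultBeanName(name));
        }
    }
}
```

Note the corner case: a name that starts with two capital letters is left unchanged, so the hypothetical URLGear stays URLGear, while MySQLExporter becomes mySQLExporter.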
Property values can be injected into gears, so they can be configured out of the box. There are two ways to do that:
- A) Add the @Value("${property.name}") annotation to a field in your gear class.
- B) Add the @ConfigurationProperties(prefix = "mygear") annotation to your gear class. After that, any property with the mygear prefix will be mapped to the fields of your gear class.
Also, you can use other gears in a gear, this way:
@Autowired
private MyOtherGear otherGear;
But make sure you call beforeSession and afterSession of the other gear if needed.
Here's an example gear:
@Component
@ConfigurationProperties(prefix = "mygear")
// @Scope("prototype") // in case you need separate instances
public class MyGear extends AbstractGear { // "myGear" in the gear list

    private String strOption = "default value"; // mygear.strOption
    private int intOption = 42; // mygear.intOption
    private Pattern myPattern = Pattern.compile(".*"); // mygear.myPattern
    // private @Autowired OtherGear g; // in case you need another gear

    // Spring Boot normally binds @ConfigurationProperties through setters,
    // so add a setter for each option above (omitted here for brevity).

    @Override
    protected Notice processImpl(Notice notice) throws Exception {
        // do something with notice and fields
        return notice;
    }
}
We can categorize gears by their main tasks:
- There are downloader or archiver gears which fetch data from TED and store them in files.
- There are parser gears which turn the raw input files into an object model.
- There are filter gears which can throw out notices that are not needed.
- There are indicator gears which examine notices and generate flags.
- There are exporter gears which produce output from the notices.
Let's discuss their features.
Downloader gears
Their task is to fetch the HTML documents of a notice, either by downloading them from TED and placing them in the cache, or by reading them directly from the cache if they have already been downloaded. They should be somewhere near the beginning of the list.
The mechanism mentioned above is implemented by TedInterface.
List of Red Flags' download gears
Parser gears
Parser gears process the downloaded data and fill in the POJOs. Like downloader gears, parser gears don't have an abstract superclass to extend.
Some built-in parsers use downloader gears to fetch HTML content, and some of them
List of Red Flags' parser gears
Filter gears
A filter gear examines the current notice and decides whether it can reach the next processor or not. By convention, these gears should extend the AbstractFilter class and implement its only method: boolean accept(Notice). Example:
@Component
public class MyFilter extends AbstractFilter {

    @Override
    public boolean accept(Notice notice) throws Exception {
        return condition(notice); // condition(...) stands for your own check
    }
}
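How a boolean accept method ties in with the "return null to stop" rule of gear trains can be sketched like this. It is only an assumption about how the bridge might look: SketchFilter, SketchNotice and HungarianOnly are stand-ins, not the real AbstractFilter or Notice, and the checked exceptions of the real signatures are left out.

```java
// Hypothetical stand-ins; the real classes are AbstractGear, AbstractFilter and Notice.
public class FilterSketch {

    static final class SketchNotice {
        final String id;
        final String country;

        SketchNotice(String id, String country) {
            this.id = id;
            this.country = country;
        }
    }

    abstract static class SketchFilter {
        // The only method a concrete filter has to write.
        public abstract boolean accept(SketchNotice notice);

        // Assumed bridge to the gear contract: keep the notice, or return null to stop the train.
        public final SketchNotice process(SketchNotice notice) {
            return accept(notice) ? notice : null;
        }
    }

    // Example filter: lets only Hungarian notices through.
    static class HungarianOnly extends SketchFilter {
        @Override
        public boolean accept(SketchNotice notice) {
            return "HU".equals(notice.country);
        }
    }

    public static void main(String[] args) {
        SketchFilter filter = new HungarianOnly();
        System.out.println(filter.process(new SketchNotice("1-2015", "HU")) != null);
        System.out.println(filter.process(new SketchNotice("2-2015", "PL")) != null);
    }
}
```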
List of Red Flags' filter gears
Indicator gears
The task of an indicator is to examine something in a notice and generate zero or one flag. Indicator classes should extend AbstractIndicator and implement its method IndicatorResult flag(Notice).
All indicators have properties called category and weight. They can be used later in other gears or in the webapp, for example to display flags differently. These values are initialized in beforeSession from the package.ClassName.category and package.ClassName.weight properties. The default category is null, the default weight is 1.0.
An IndicatorResult object contains the following things:
- type (IndicatorResultType)
- flagCategory
- weight
- description
The flag type can be one of these enum values: MISSING_DATA, IRRELEVANT_DATA, NO_FLAG, FLAG. NO_FLAG and FLAG are self-explanatory, I think. MISSING_DATA can be used when some information needed by the indicator is missing from the notice. IRRELEVANT_DATA can be used to indicate that the notice is not of the type the indicator wants to process. Only results of type FLAG are displayed on the website.
The description is handled specially by the webapp. First of all, it must be a message label identifier, e.g. "myindicator.desc". But you can also pass actual information as parameters to be displayed in the webapp. Here's how:
- Create a message label in your messages.properties file like this: myindicator.desc="Some value ({v}) in the notice is higher than the threshold ({t})."
- Then you can pass this as your flag description: "myindicator.desc|v=139|t=100"
- Finally, the webapp will display: Some value (139) in the notice is higher than the threshold (100).
- So you enclose parameter names in curly braces in your label, then pass parameter=value pairs separated by | in your description.
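The substitution described in these steps can be tried with a few lines of Java. This is a rough sketch of the idea, not the webapp's actual rendering code: FlagDescriptionSketch and render are made-up names, and the label table is a plain map instead of a messages.properties file.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical renderer for "labelId|name=value|..." description strings.
public class FlagDescriptionSketch {

    // Looks up the label, then replaces each {name} placeholder with the passed value.
    static String render(String description, Map<String, String> labels) {
        String[] parts = description.split("\\|");
        String text = labels.getOrDefault(parts[0], parts[0]);
        for (int i = 1; i < parts.length; i++) {
            int eq = parts[i].indexOf('=');
            if (eq < 0) {
                continue; // not a name=value pair, ignore it
            }
            String name = parts[i].substring(0, eq);
            String value = parts[i].substring(eq + 1);
            text = text.replace("{" + name + "}", value);
        }
        return text;
    }

    public static void main(String[] args) {
        Map<String, String> labels = new HashMap<>();
        labels.put("myindicator.desc",
                "Some value ({v}) in the notice is higher than the threshold ({t}).");
        System.out.println(render("myindicator.desc|v=139|t=100", labels));
    }
}
```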
Fortunately, AbstractIndicator has helper methods to generate description strings, and also complete IndicatorResult objects:
| Method | What it does |
|---|---|
| label(String label, String... args) | generates a description string where the label identifier will be flag.IndicatorId.specifiedLabel and the parameters will be the param=value pairs given in args |
| irrelevantData() and missingData() | generate IndicatorResult objects with the corresponding flag types |
| returnFlag() | generates a result with label flag.IndicatorId.info and type FLAG; category and weight are copied from the current indicator class |
| returnFlag(String label, String... args) | lets you replace info with another label and add parameters |
| returnFlag(double weight, String label, String... args) | lets you override the indicator's weight too |
IndicatorId is generated from the name of the direct parent package and the simple class name, e.g. hu.petabyte.redflags.engine.gear.indicator.pl.AwCritLacksIndicator will have pl.AwCritLacksIndicator as its ID. This way indicators of different languages/countries can be separated in the language files too.
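To make the naming scheme concrete, here is a small sketch that derives an IndicatorId from a fully qualified class name and builds a description string the way label(...) is described. IndicatorIdSketch and both methods are hypothetical re-implementations for illustration, not the helpers of AbstractIndicator.

```java
// Hypothetical re-implementation of the naming scheme, for illustration only.
public class IndicatorIdSketch {

    // "a.b.pl.AwCritLacksIndicator" -> "pl.AwCritLacksIndicator"
    static String indicatorId(String fullyQualifiedClassName) {
        String[] parts = fullyQualifiedClassName.split("\\.");
        if (parts.length < 2) {
            return fullyQualifiedClassName; // no parent package to prepend
        }
        return parts[parts.length - 2] + "." + parts[parts.length - 1];
    }

    // Builds "flag.<IndicatorId>.<label>" followed by |param=value pairs.
    static String label(String indicatorId, String label, String... args) {
        StringBuilder sb = new StringBuilder("flag.")
                .append(indicatorId).append('.').append(label);
        for (String arg : args) {
            sb.append('|').append(arg);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String id = indicatorId("hu.petabyte.redflags.engine.gear.indicator.pl.AwCritLacksIndicator");
        System.out.println(id);
        System.out.println(label(id, "info"));
        System.out.println(label(id, "info2", "p1=v1", "p2=v2"));
    }
}
```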
If you want to return NO_FLAG, you can simply return null from your flag(Notice) method.
Let's see an example indicator:
package mypackage.lang;

import hu.petabyte.redflags.engine.gear.indicator.AbstractIndicator;
import hu.petabyte.redflags.engine.model.IndicatorResult;
import hu.petabyte.redflags.engine.model.Notice;
import org.springframework.stereotype.Component;

@Component
public class MyIndicator extends AbstractIndicator {

    // AbstractIndicator's method is flag(Notice); flagImpl belongs to the
    // AbstractTD3CIndicator and AbstractTD7Indicator subclasses.
    @Override
    protected IndicatorResult flag(Notice notice) {
        // dataExistsInNotice, condition1 and condition2 stand for your own checks
        if (!dataExistsInNotice(notice)) {
            return missingData(); // MISSING_DATA
        } else if (condition1(notice)) {
            return returnFlag(); // FLAG, "flag.lang.MyIndicator.info"
        } else if (condition2(notice)) {
            return returnFlag("info2"); // FLAG, "flag.lang.MyIndicator.info2"
            // or with parameters:
            // return returnFlag("info2", "p1=v1", "p2=v2");
        }
        return null; // NO_FLAG
    }
}
There are two additional abstract classes, AbstractTD3CIndicator and AbstractTD7Indicator. They override the flag method: it tests the document type and calls the abstract flagImpl method (same signature as flag) if the notice is a contract notice (TD-3 or TD-C) or a contract award notice (TD-7), respectively.
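The delegation from flag to flagImpl can be pictured roughly as follows. Everything in this sketch is a stand-in (SketchNotice, SketchIndicator, ContractNoticeIndicatorSketch, AlwaysFlag), the result is reduced to its type, and returning IRRELEVANT_DATA for other document types is my assumption, not something confirmed by the engine's code.

```java
// Hypothetical stand-ins for AbstractIndicator and AbstractTD3CIndicator.
public class DocumentTypeIndicatorSketch {

    enum ResultType { MISSING_DATA, IRRELEVANT_DATA, NO_FLAG, FLAG }

    static final class SketchNotice {
        final String documentType;

        SketchNotice(String documentType) {
            this.documentType = documentType;
        }
    }

    abstract static class SketchIndicator {
        protected abstract ResultType flag(SketchNotice notice);
    }

    // Only lets contract notices (TD-3 or TD-C) reach flagImpl.
    abstract static class ContractNoticeIndicatorSketch extends SketchIndicator {
        @Override
        protected final ResultType flag(SketchNotice notice) {
            String type = notice.documentType;
            if ("TD-3".equals(type) || "TD-C".equals(type)) {
                return flagImpl(notice);
            }
            return ResultType.IRRELEVANT_DATA; // assumption: other types are reported as irrelevant
        }

        protected abstract ResultType flagImpl(SketchNotice notice);
    }

    // Trivial indicator that flags every notice it is allowed to see.
    static class AlwaysFlag extends ContractNoticeIndicatorSketch {
        @Override
        protected ResultType flagImpl(SketchNotice notice) {
            return ResultType.FLAG;
        }
    }

    static ResultType evaluate(String documentType) {
        return new AlwaysFlag().flag(new SketchNotice(documentType));
    }

    public static void main(String[] args) {
        System.out.println("TD-3: " + evaluate("TD-3"));
        System.out.println("TD-7: " + evaluate("TD-7"));
    }
}
```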
List of Red Flags' indicator gears
Exporter gears
Exporter gears are usually at the end of the list. Their task is to produce output from the parsed and generated data, such as CSV files or relational databases. By convention, all exporter gears should extend AbstractExporter and implement its abstract method: void export(Notice). Example:
@Component
// no @Scope("prototype") here: you probably need only one instance of this
public class MyExporter extends AbstractExporter {

    @Override
    public void export(Notice notice) {
        // write out notice
    }

    @Override
    public void beforeSession() throws Exception {
        // initialize output
    }

    @Override
    public void afterSession() throws Exception {
        // finalize output
    }
}
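To show how the three methods of an exporter work together, here is a self-contained sketch of a tiny CSV exporter. It does not extend the real AbstractExporter: the class name, the two columns and the in-memory StringWriter target are all made up, and export takes plain values instead of a Notice.

```java
import java.io.StringWriter;

// Hypothetical CSV exporter; the columns and the in-memory target are made up.
public class CsvExporterSketch {

    private final StringWriter out = new StringWriter();
    private boolean open;

    // Initialize the output once, before any notice is exported.
    public void beforeSession() {
        out.write("noticeId,flagCount\n");
        open = true;
    }

    // A shared instance is called from several gear train threads, hence synchronized.
    public synchronized void export(String noticeId, int flagCount) {
        if (!open) {
            throw new IllegalStateException("beforeSession() has not been called");
        }
        out.write(noticeId + "," + flagCount + "\n");
    }

    // Finalize the output once, after the last notice.
    public void afterSession() {
        open = false;
    }

    public String output() {
        return out.toString();
    }

    public static void main(String[] args) {
        CsvExporterSketch exporter = new CsvExporterSketch();
        exporter.beforeSession();
        exporter.export("1-2015", 2);
        exporter.export("2-2015", 0);
        exporter.afterSession();
        System.out.print(exporter.output());
    }
}
```

With a single shared instance, the output is opened and closed once per session, which is the reason to prefer it over one instance per notice for exporters like this.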