
Entity name mapping

Background & Objectives

  • The client is a leading financial technology company based in Singapore
  • The client analyzes market data to generate signals that it shares with hedge funds
  • The requirement was to develop a data mart that could intelligently rationalize data across multiple sources, i.e. when the same attributes arrive from different feeds, pick one based on rules

Challenges

  • An existing relational data store uses MySQL to hold vendor-provided data in the vendor-provided schema; it needed to be migrated to Aurora
  • Rationalization of data from different sources must be based both on matching of the data and on business rules
  • Processing time needs to be as low as possible, since the data mart is refreshed daily
  • An incremental update mechanism is required: because of M&A activity, data entities may change, and performing a full historical dump each time is infeasible

Solution Framework

Step 1: Transformation Engine
  • Developed an entity matching algorithm that rationalizes data across multiple sources based on business rules and entity closeness
  • Textual entity attributes (e.g. names) were used to define similarity
  • Closeness of entities was also defined as a metric over sub-entity matches
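The matching idea above — combining textual similarity with sub-entity overlap into a single closeness score — can be sketched roughly as follows. This is a minimal illustration, not the client's actual algorithm: the field names, weights, and threshold are hypothetical, and the text metric here is stdlib `difflib` rather than whatever similarity measure was actually used.

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Character-level similarity of two normalized names, in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def sub_entity_overlap(subs_a: set, subs_b: set) -> float:
    """Jaccard overlap of sub-entity identifiers, in [0, 1]."""
    if not subs_a or not subs_b:
        return 0.0
    return len(subs_a & subs_b) / len(subs_a | subs_b)

def entity_closeness(a: dict, b: dict, w_name: float = 0.6, w_subs: float = 0.4) -> float:
    """Weighted blend of name similarity and sub-entity overlap (weights are illustrative)."""
    return (w_name * name_similarity(a["name"], b["name"])
            + w_subs * sub_entity_overlap(a["subs"], b["subs"]))

# Two feeds describing what may be the same entity, with slightly different names
a = {"name": "Acme Holdings Pte Ltd", "subs": {"ACME-SG", "ACME-HK"}}
b = {"name": "ACME Holdings Pte. Ltd.", "subs": {"ACME-SG", "ACME-US"}}
score = entity_closeness(a, b)
```

A business rule would then compare `score` against a tuned threshold (and possibly provider-priority rules) to decide whether the two records collapse into one rationalized entity.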
Step 2: Data Modelling Engine
  • Developed and defined a de-normalized structure for the data
  • Data is aggregated at the entity and relationship level into a presentation format suited to faster querying
  • Attributes and attribute aggregates are stored as properties of the entity
  • Relationship attributes are stored in the relation table rather than in a snowflake schema
  • Entities are managed at the ETL level, i.e. if an entity is deleted or remapped, the change is propagated to all lookups by the ETL logic
  • Designed and implemented milestoning to enable roll-backs of individual data providers and checkpoints for the aggregated data model
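The milestoning idea above — checkpointed loads per provider that can be rolled back without a full historical dump — can be sketched as follows. This is an assumed design, not the client's implementation: the table layout, `batch_id` checkpoint column, and `is_current` flag are hypothetical, and SQLite stands in for the MySQL/Aurora store.

```python
import sqlite3

# Each row carries the load checkpoint (batch_id) and a currency flag,
# so one provider's bad load can be rolled back independently.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE entity (
        entity_id  TEXT,
        provider   TEXT,
        name       TEXT,
        batch_id   INTEGER,
        is_current INTEGER DEFAULT 1
    )""")

def load_batch(provider, rows, batch_id):
    """Supersede the provider's current rows, then insert the new batch."""
    conn.execute(
        "UPDATE entity SET is_current = 0 WHERE provider = ? AND is_current = 1",
        (provider,))
    conn.executemany(
        "INSERT INTO entity (entity_id, provider, name, batch_id) VALUES (?, ?, ?, ?)",
        [(eid, provider, name, batch_id) for eid, name in rows])

def rollback_batch(provider, batch_id):
    """Drop a bad batch and restore the provider's previous checkpoint."""
    conn.execute("DELETE FROM entity WHERE provider = ? AND batch_id = ?",
                 (provider, batch_id))
    conn.execute("""UPDATE entity SET is_current = 1
                    WHERE provider = ? AND batch_id =
                          (SELECT MAX(batch_id) FROM entity WHERE provider = ?)""",
                 (provider, provider))

load_batch("vendor_a", [("E1", "Acme"), ("E2", "Globex")], batch_id=1)
load_batch("vendor_a", [("E1", "Acme Corp")], batch_id=2)   # a bad load
rollback_batch("vendor_a", batch_id=2)
current = conn.execute(
    "SELECT entity_id, name FROM entity WHERE is_current = 1 ORDER BY entity_id"
).fetchall()
```

After the rollback, only the batch-1 rows are current again; downstream aggregation would read only `is_current = 1` rows, giving the checkpoint semantics described above.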

Technologies used
