Data Warehouse and Entity Transformation Engine

No Inference Labs is building a de-normalized data mart for a fintech client. The data mart intelligently matches and reconciles the appropriate data from several different sources.

Outcome

• Reduced DB transaction processing time by ~80%, to 4 hours/day

• Developed an entity matching engine with 75% accuracy

Background & Objective

• The client is a leading financial technology company based in Singapore

• The client works on market data to generate signals that are shared with hedge funds

• The requirement was to develop a data mart for the client that could intelligently rationalize data between multiple sources, i.e. if the same attribute arrives from different feeds, pick one value based on rules
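A minimal sketch of what such a rule could look like is shown below, assuming a simple per-attribute source-precedence scheme; the feed names and priorities are hypothetical, and the client's actual rule set is richer than this.

```python
# Illustrative only: pick one value per attribute when several feeds supply it.
# The feed names and per-attribute priorities below are assumptions, not client rules.
SOURCE_PRIORITY = {
    "market_cap": ["feed_a", "feed_b", "feed_c"],
    "ticker":     ["feed_b", "feed_a", "feed_c"],
}

def rationalize(attribute, values_by_feed):
    """Return the single value to keep for an attribute reported by multiple feeds."""
    for feed in SOURCE_PRIORITY.get(attribute, []):
        if values_by_feed.get(feed) is not None:
            return values_by_feed[feed]
    # No rule matched: fall back to any available value.
    return next(iter(values_by_feed.values()), None)

# feed_a outranks feed_b for market_cap, so its value is kept.
print(rationalize("market_cap", {"feed_b": 102.0, "feed_a": 101.5}))  # 101.5
```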

Challenges

There is an existing relational data store that uses MySQL to hold vendor-provided data in the vendor-provided schema, which needed to be moved to Aurora. Rationalization of data from the different sources needs to rest both on the matching of data and on business rules.

Solution Framework

Step 1: Transformation Engine

• Developed an entity matching algorithm that rationalized data between multiple sources based on business rules and entity closeness

• Entity attributes such as text fields were used to define similarity

• The closeness of entities was also defined as a metric over sub-entity matches
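A minimal sketch of such a closeness score is given below: it combines text similarity on a name attribute with the fraction of matching sub-entities. The attribute names, weights, and threshold are illustrative assumptions, not the production values.

```python
from difflib import SequenceMatcher

def text_similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def closeness(entity_a: dict, entity_b: dict, w_text: float = 0.6, w_sub: float = 0.4) -> float:
    """Closeness of two entities: weighted mix of name similarity and sub-entity overlap.

    Assumes entities look like {"name": str, "sub_entities": set[str]} -- illustrative only.
    """
    name_score = text_similarity(entity_a["name"], entity_b["name"])
    subs_a, subs_b = entity_a["sub_entities"], entity_b["sub_entities"]
    sub_score = len(subs_a & subs_b) / max(len(subs_a | subs_b), 1)
    return w_text * name_score + w_sub * sub_score

a = {"name": "Acme Holdings Pte Ltd", "sub_entities": {"ACME-SG", "ACME-HK"}}
b = {"name": "ACME Holdings Pte. Ltd.", "sub_entities": {"ACME-SG"}}
print(round(closeness(a, b), 2))  # ~0.77 for these toy records
```

In practice the matcher would combine several such attribute scores with the business rules above before deciding that two records refer to the same entity.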

Step 2: Data Modelling Engine

• Developed and defined a de-normalized structure for the data

• Aggregation of data is done at the Entity and Relationship level into a presentation format suitable for faster querying

• Attributes and attribute aggregates are stored on the Entity as properties

• Relationship attributes are stored in the relation table rather than in a snowflaked schema

• Management of entities is done at the ETL level – i.e. if an entity is deleted or remapped, the change is propagated to all lookups using the ETL logic

• Designed and implemented milestoning to enable roll-backs of data from individual providers and checkpoints for the aggregated data model
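A minimal sketch of how such a de-normalized, milestoned entity row and the ETL-level operations around it could look is given below. The column names, the provider/milestone fields, and the helper functions are illustrative assumptions rather than the actual schema.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class EntityRow:
    """One de-normalized entity row: attributes and aggregates live on the entity itself."""
    entity_id: str
    attributes: dict          # e.g. {"name": ..., "sector": ...}
    aggregates: dict          # pre-computed aggregates for fast presentation queries
    provider: str             # feed that supplied this version of the row
    milestone: int            # checkpoint this version belongs to
    valid_from: datetime
    valid_to: datetime | None = None   # None means currently valid

def rollback_provider(rows: list[EntityRow], provider: str, to_milestone: int) -> list[EntityRow]:
    """Roll back one provider's data by dropping its rows newer than a given checkpoint."""
    return [r for r in rows if not (r.provider == provider and r.milestone > to_milestone)]

def delete_entity(rows: list[EntityRow], lookups: dict[str, set], entity_id: str) -> list[EntityRow]:
    """ETL-level management: deleting an entity is propagated to every lookup table."""
    for lookup in lookups.values():
        lookup.discard(entity_id)
    return [r for r in rows if r.entity_id != entity_id]
```

Because every row carries its provider and milestone, a roll-back of one feed or a checkpoint of the aggregated model reduces to filtering on those columns.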
