Inference has built a Data warehouse and Entity Transformation Engine for a fintech client, a leading Financial technology company based out of Singapore. They work on market data to generate signals to share with Hedge Funds.
Requirement was to develop a data mart for the client which could intelligently rationalize data between multiple sources i.e. if the same attributes arrive from different feeds pick one based on rules.
Challenge spotted: There is an existing relational data store that uses MySQL to store vendor-provided data in the vendor provided schema which needed to be moved to Aurora Rationalization of data from different sources needs to be both on the matching of data and on business rules
Step 1: Transformation Engine:
- Developed an entity matching algorithm which rationalized data between multiple sources based on business rules and entity closeness
- Entity attributes like text were used to define similarity
- The closeness of entities was also defined as a metric of sub-entity matches·
Step 2: Data Modelling Engine
- Developed and defined a de-normalized structure for the data·
- Aggregation of data is done at the Entity and Relationship level to a presentation format suitable for quicker query.
- Attributes and attribute aggregates are part of the Entity as property
- Relation attributes are stored at relation table vs snowflake
- Management of Entities is done at ETL level – i.e. if an entity is deleted or remapped it will be propagated onto all lookups using the ETL logic
- Designed and implemented milestones to enable roll-backs from individual data providers and enable checkpoints for the aggregated data model
- Reduced DB transactional speed by approximately 80% to 4 hours/day.
- Developed entity matching engine with 75% accuracy.