Entity name mapping

Background & Objectives

  • The client is a leading financial technology company based in Singapore
  • The client processes market data to generate signals that it shares with hedge funds
  • The requirement was to develop a data mart that could intelligently rationalize data across multiple sources, i.e. when the same attribute arrives from different feeds, pick one value based on rules
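
The "pick one based on rules" requirement can be illustrated with a minimal sketch. It assumes the simplest possible rule, a fixed feed priority; the feed names, the priority order, and the `rationalize` function are hypothetical and are not taken from the client's actual rule set.

```python
from typing import Any

# Hypothetical feed names and priority order (lower number wins).
# The real business rules are not described in this case study.
FEED_PRIORITY = {"feed_a": 0, "feed_b": 1, "feed_c": 2}


def rationalize(records: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Merge one entity's attributes from several feeds.

    For each attribute, keep the value from the highest-priority feed
    that supplies a non-empty value. Unknown feeds rank last.
    """
    merged: dict[str, Any] = {}
    ordered_feeds = sorted(
        records, key=lambda feed: FEED_PRIORITY.get(feed, len(FEED_PRIORITY))
    )
    for feed in ordered_feeds:
        for attr, value in records[feed].items():
            if value is not None and attr not in merged:
                merged[attr] = value
    return merged


if __name__ == "__main__":
    print(rationalize({
        "feed_b": {"name": "Acme Ltd", "country": "SG"},
        "feed_a": {"name": "Acme Limited", "country": None},
    }))
```

Here the name comes from `feed_a` because it ranks higher, while the country falls back to `feed_b` because `feed_a` has no value for it.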

Challenges

  • An existing relational data store used MySQL to hold vendor-provided data in the vendor-provided schema, and it needed to be moved to Aurora
  • Rationalization of data from different sources needs to be based on both data matching and business rules
  • Processing time needs to be as low as possible, since the data mart is refreshed daily
  • An incremental update mechanism is required: data entities may change due to M&A, and re-running a full historical dump each time is infeasible

Solution Framework

Step 1: Transformation Engine

  • Developed an entity matching algorithm that rationalizes data across multiple sources based on business rules and entity closeness
  • Entity attributes such as text were used to define similarity
  • Closeness of entities was also defined by a metric based on sub-entity matches
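
One way to combine text similarity with sub-entity matches is sketched below. This is an illustration under stated assumptions, not the algorithm that was delivered: the case study does not name its similarity measures, so the sketch uses the standard library's `difflib.SequenceMatcher` for text and Jaccard overlap for sub-entities. The weight, the threshold, and the `name` and `subs` record fields are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional


def text_similarity(a: str, b: str) -> float:
    """Similarity of two names in [0, 1], ignoring case and outer spaces."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def sub_entity_overlap(a: set, b: set) -> float:
    """Jaccard overlap of two sub-entity sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def closeness(e1: dict, e2: dict, text_weight: float = 0.6) -> float:
    """Weighted blend of name similarity and sub-entity overlap.

    text_weight is an assumed tuning parameter, not a value from the source.
    """
    text_score = text_similarity(e1["name"], e2["name"])
    sub_score = sub_entity_overlap(set(e1["subs"]), set(e2["subs"]))
    return text_weight * text_score + (1 - text_weight) * sub_score


def best_match(entity: dict, candidates: list, threshold: float = 0.75) -> Optional[dict]:
    """Return the closest candidate, or None if none reaches the threshold."""
    best, best_score = None, threshold
    for candidate in candidates:
        score = closeness(entity, candidate)
        if score >= best_score:
            best, best_score = candidate, score
    return best
```

In practice the business rules mentioned above would sit on top of such a score, for example by restricting which candidates are compared at all.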

Step 2: Data Modelling Engine

  • Developed and defined a denormalized structure for the data
  • Data is aggregated at the Entity and Relationship level into a presentation format suited to faster queries
  • Attributes and attribute aggregates are stored on the Entity as properties
  • Relation attributes are stored on the relation table rather than in a snowflake schema
  • Entities are managed at the ETL level, i.e. if an entity is deleted or remapped, the change is propagated to all lookups by the ETL logic
  • Designed and implemented milestoning to enable rollbacks for individual data providers and checkpoints for the aggregated data model
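
The milestoning idea can be sketched in memory as follows. This is a simplified illustration, assuming each row records the load batch in which it became current (`batch_in`) and the batch in which it was superseded (`batch_out`). The `Row` and `MilestonedStore` names, the field names, and the integer batch numbering are hypothetical, and a real implementation would live in the database rather than in Python objects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    provider: str
    entity_id: str
    value: str
    batch_in: int                    # batch in which this version became current
    batch_out: Optional[int] = None  # batch that superseded it; None = still current


class MilestonedStore:
    def __init__(self) -> None:
        self.rows = []

    def upsert(self, provider: str, entity_id: str, value: str, batch: int) -> None:
        """Close the current version (if the value changed) and add a new one."""
        for row in self.rows:
            same_key = (row.provider, row.entity_id) == (provider, entity_id)
            if same_key and row.batch_out is None:
                if row.value == value:
                    return  # unchanged, so no new version is needed
                row.batch_out = batch
        self.rows.append(Row(provider, entity_id, value, batch))

    def current(self, provider: str, entity_id: str) -> Optional[str]:
        """Return the current value for a provider's entity, if any."""
        for row in self.rows:
            same_key = (row.provider, row.entity_id) == (provider, entity_id)
            if same_key and row.batch_out is None:
                return row.value
        return None

    def rollback(self, provider: str, to_batch: int) -> None:
        """Restore one provider to its state as of the checkpoint `to_batch`."""
        kept = []
        for row in self.rows:
            if row.provider == provider:
                if row.batch_in > to_batch:
                    continue  # loaded after the checkpoint: drop it
                if row.batch_out is not None and row.batch_out > to_batch:
                    row.batch_out = None  # superseded after the checkpoint: reopen
            kept.append(row)
        self.rows = kept
```

Because every row carries its provider, rolling back one vendor's load leaves the other vendors' rows untouched, which is the property the last bullet describes.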