
Entity name mapping

Background & Objectives

  • The client is a leading financial technology company based in Singapore
  • The client analyzes market data to generate signals that it shares with hedge funds
  • The requirement was to develop a data mart that could intelligently rationalize data across multiple sources, i.e. when the same attributes arrive from different feeds, pick one based on rules

Challenges

  • An existing relational data store uses MySQL to hold vendor-provided data in the vendor-provided schema; it needed to be migrated to Aurora
  • Rationalization of data from different sources must be based both on matching of the data and on business rules
  • Processing time needs to be as low as possible, since the data mart is refreshed daily
  • An incremental update mechanism is required: because of M&A activity, data entities may change, and performing a full historical dump each time is infeasible

Solution Framework

Step 1: Transformation Engine
  • Developed an entity matching algorithm that rationalizes data across multiple sources based on business rules and entity closeness
  • Textual entity attributes (e.g. names) were used to define similarity
  • Closeness of entities was also defined as a metric over sub-entity matches
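The matching idea above — combining textual similarity with sub-entity overlap into a single closeness score — can be sketched roughly as follows. This is a minimal illustration, not the client's actual algorithm: the field names, weights, and threshold are hypothetical, and the text metric here is stdlib `difflib` rather than whatever similarity measure was actually used.

```python
from difflib import SequenceMatcher

def name_similarity(a: str, b: str) -> float:
    """Character-level similarity of two normalized names, in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def sub_entity_overlap(subs_a: set, subs_b: set) -> float:
    """Jaccard overlap of sub-entity identifiers, in [0, 1]."""
    if not subs_a or not subs_b:
        return 0.0
    return len(subs_a & subs_b) / len(subs_a | subs_b)

def entity_closeness(a: dict, b: dict, w_name: float = 0.6, w_subs: float = 0.4) -> float:
    """Weighted blend of name similarity and sub-entity overlap (weights are illustrative)."""
    return (w_name * name_similarity(a["name"], b["name"])
            + w_subs * sub_entity_overlap(a["subs"], b["subs"]))

# Two feeds describing what may be the same entity, with slightly different names
a = {"name": "Acme Holdings Pte Ltd", "subs": {"ACME-SG", "ACME-HK"}}
b = {"name": "ACME Holdings Pte. Ltd.", "subs": {"ACME-SG", "ACME-US"}}
score = entity_closeness(a, b)
```

A business rule would then compare `score` against a tuned threshold (and possibly provider-priority rules) to decide whether the two records collapse into one rationalized entity.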
Step 2: Data Modelling Engine
  • Developed and defined a de-normalized structure for the data
  • Data is aggregated at the entity and relationship level into a presentation format suited to faster querying
  • Attributes and attribute aggregates are stored as properties of the entity
  • Relationship attributes are stored in the relation table rather than in a snowflake schema
  • Entities are managed at the ETL level, i.e. if an entity is deleted or remapped, the change is propagated to all lookups by the ETL logic
  • Designed and implemented milestoning to enable roll-backs of individual data providers and checkpoints for the aggregated data model
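The milestoning idea above — checkpointed loads per provider that can be rolled back without a full historical dump — can be sketched as follows. This is an assumed design, not the client's implementation: the table layout, `batch_id` checkpoint column, and `is_current` flag are hypothetical, and SQLite stands in for the MySQL/Aurora store.

```python
import sqlite3

# Each row carries the load checkpoint (batch_id) and a currency flag,
# so one provider's bad load can be rolled back independently.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE entity (
        entity_id  TEXT,
        provider   TEXT,
        name       TEXT,
        batch_id   INTEGER,
        is_current INTEGER DEFAULT 1
    )""")

def load_batch(provider, rows, batch_id):
    """Supersede the provider's current rows, then insert the new batch."""
    conn.execute(
        "UPDATE entity SET is_current = 0 WHERE provider = ? AND is_current = 1",
        (provider,))
    conn.executemany(
        "INSERT INTO entity (entity_id, provider, name, batch_id) VALUES (?, ?, ?, ?)",
        [(eid, provider, name, batch_id) for eid, name in rows])

def rollback_batch(provider, batch_id):
    """Drop a bad batch and restore the provider's previous checkpoint."""
    conn.execute("DELETE FROM entity WHERE provider = ? AND batch_id = ?",
                 (provider, batch_id))
    conn.execute("""UPDATE entity SET is_current = 1
                    WHERE provider = ? AND batch_id =
                          (SELECT MAX(batch_id) FROM entity WHERE provider = ?)""",
                 (provider, provider))

load_batch("vendor_a", [("E1", "Acme"), ("E2", "Globex")], batch_id=1)
load_batch("vendor_a", [("E1", "Acme Corp")], batch_id=2)   # a bad load
rollback_batch("vendor_a", batch_id=2)
current = conn.execute(
    "SELECT entity_id, name FROM entity WHERE is_current = 1 ORDER BY entity_id"
).fetchall()
```

After the rollback, only the batch-1 rows are current again; downstream aggregation would read only `is_current = 1` rows, giving the checkpoint semantics described above.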

Technologies used
