PropMix launches Prospektr – the first value and income investor platform for real estate

NEW YORK, NY / January 19, 2021 / PropMix, a real estate analytics and artificial intelligence company with a large customer base among lenders, appraisers, and realtors, has announced Prospektr, a platform that seamlessly connects real estate value and income investors with investor-friendly real estate agents and mortgage lenders.

Prospektr is powered by nationwide data, including mortgages, distressed properties, and sale and rent values on every residential property. Investors can easily find and analyze opportunities based on financial goals such as cap rates, cash flows, and return on investment. Prospektr also provides easy access to financial forecasting models and local trends to help investors make informed decisions.


“Prospektr democratizes the real estate investment market and makes it easy for anyone to learn, research, analyze, and make decisions quickly with deep analytics, insights, and recommendations at their fingertips,” said Umesh Harigopal, CEO of PropMix and a 15-year veteran in US real estate investing. “The Prospektr network effect will grow exponentially as we continue to optimize portfolio analytics and financial recommendations using artificial intelligence and reduce friction across the value chain.”

The platform combines large-scale data computing and machine learning with on-the-ground investment research experience to evolve rapidly with new data, investor-friendly insights, and features. It offers a unique model that gives investors free access to the platform to help them with research and decision making.

About PropMix

PropMix LLC is a real estate data, insights, and solutions company with deep experience in commercializing artificial intelligence. PropMix’s platform and solutions are widely used by mortgage lenders, appraisers, realtors, and investors. Built on industry open standards, PropMix empowers users to engage with data, make decisions using insights, and build the real estate technology of the future. PropMix was founded in 2016 and is headquartered in New York.

Using HomeRun to gather information and interior photos for an appraisal

HomeRun is a mobile-friendly solution that enables an appraiser to gather property information and interior photos from a homeowner conveniently, reliably, and securely to help complete an appraisal.

Step 1:

  • An appraiser initiates an appraisal order by entering the subject property information and homeowner details.
  • HomeRun matches and retrieves public record data for the property so that the appraiser can validate it and make any corrections required.
  • Once the appraiser submits the information, a confirmation email is sent to the appraiser and the homeowner is notified by email.

Step 2:

  • The homeowner clicks the link in the email they received to provide the necessary information.
  • They can verify the validity of the request using the lender name and loan application number provided in the email.
  • The homeowner answers a few questions about the property’s ownership, occupancy, land details, homeowners’ association membership and fees, and any other use of the property.
  • Next, the homeowner uploads interior photos. HomeRun uses the property’s geocodes from public records to verify that the photos were taken at the property and within the last seven days.
  • Attestation: As a final step, the homeowner certifies that the information they provided is correct. Once it is submitted, HomeRun notifies the appraiser.
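
The photo check in Step 2 can be pictured with a small sketch. HomeRun's actual implementation is not described in this post, so everything below is an illustration under assumptions: the function names, the 100-meter tolerance, and the choice of the haversine formula are hypothetical, and the photo's coordinates and capture time are assumed to have been read from its metadata already.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def photo_is_acceptable(photo_lat, photo_lon, taken_at, prop_lat, prop_lon, now,
                        max_distance_m=100.0, max_age=timedelta(days=7)):
    """True when the photo was taken near the property and within the allowed age.

    max_distance_m is a hypothetical tolerance; the seven-day window comes
    from the description above.
    """
    near = haversine_m(photo_lat, photo_lon, prop_lat, prop_lon) <= max_distance_m
    # Photos with a capture time in the future are rejected as well.
    recent = timedelta(0) <= now - taken_at <= max_age
    return near and recent
```

A caller would pass the geocode from public records as `prop_lat` and `prop_lon`, and the current time as `now`, which also keeps the check easy to test.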

Step 3:

  • The appraiser can click on a link in the confirmation email they received to check the status or access the information provided by the homeowner.
  • They can download the data and photos provided. 
  • All changes to the appraisal order are stored in an auditable log that supports the appraiser’s compliance requirements.

Additional Details

The homeowner receives detailed instructions for uploading photos of specific rooms in the home. Photos must be taken with a mobile phone with location tags turned on. Each photo can be at most 5 MB in size. Each applicable room needs at least two photos, taken from two different angles, that together cover all four walls, the ceiling, and the floor.
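
As a rough illustration of the upload requirements above, a pre-submission check might look like the following. This is a sketch, not HomeRun's code: the function name and input shape are hypothetical, and 5 MB is assumed to mean 5 × 1024 × 1024 bytes. Coverage of walls, ceiling, and floor cannot be checked from file sizes and is left out.

```python
MAX_PHOTO_BYTES = 5 * 1024 * 1024  # 5 MB limit; binary megabytes assumed
MIN_PHOTOS_PER_ROOM = 2            # at least two photos per applicable room


def check_room_photos(photos_by_room):
    """Return a list of problems; an empty list means every room passes.

    photos_by_room maps a room name to a list of photo sizes in bytes.
    """
    problems = []
    for room, sizes in photos_by_room.items():
        if len(sizes) < MIN_PHOTOS_PER_ROOM:
            problems.append(
                f"{room}: needs at least {MIN_PHOTOS_PER_ROOM} photos, got {len(sizes)}"
            )
        for index, size in enumerate(sizes, start=1):
            if size > MAX_PHOTO_BYTES:
                problems.append(f"{room}: photo {index} exceeds 5 MB")
    return problems
```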

The homeowner can update the appraisal order in multiple steps, and the information they enter is auto-saved at each step. While the homeowner is still entering information, the appraisal order is in “pending” status and the appraiser can view the data and photos saved so far. When the homeowner completes the attestation and submits the information, the status of the order changes to “completed” and the homeowner can no longer edit the information.
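
The order lifecycle described above amounts to a two-state workflow. The sketch below models it under assumptions: the class and method names are hypothetical and only the “pending” and “completed” statuses come from the text.

```python
class AppraisalOrder:
    """Minimal model of an order that is editable until attestation."""

    def __init__(self):
        self.status = "pending"
        self.data = {}

    def save(self, field, value):
        """Auto-save one field; refused once the order is completed."""
        if self.status == "completed":
            raise PermissionError("order is completed and can no longer be edited")
        self.data[field] = value

    def view(self):
        """What the appraiser can see at any time: a copy of the saved data."""
        return dict(self.data)

    def attest_and_submit(self):
        """The homeowner certifies the information; the order becomes read-only."""
        self.status = "completed"
```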

EB5 – Technology vs Real Estate Investments

Recently, some emerging technology companies have turned to EB5 capital raises to execute their growth strategy. Many investors do not realize that technology companies offer a much better job creation opportunity and have a much higher overall potential for return.

Technology has traditionally been a powerful business enabler. However, Artificial Intelligence, Blockchain, and Immersive Experience are disrupting industries exponentially by transforming the industry architecture itself. This fundamental shift is fueled by automation, machine cognition, and new C2E (Consumer to Everything) engagement models.

In this article, we compare how real estate and technology investments stack up for an EB5 investor.

Criteria: Return on Incremental Investment

Real Estate Project: Real estate projects have a target project development cost. Without the full budget amount, the project cannot be completed and revenue cannot be realized.

Technology Company: Technology companies, on the other hand, can put any amount of investment to use immediately and generate a return from it. This provides an opportunity for every investor to be assured of their return.

Criteria: Risk of Green Card Approval

Investment in EB5 is expected to take a huge dip soon after the November 21, 2019 deadline. You would not expect the same investors to put down 90% more for the same green card soon after the deadline, but in due time we expect EB5 candidates to return to investing.

Real Estate Project: Real estate projects run the risk of not filling their full capital raise before the deadline and therefore not reaching the finish line for job creation. This may put the green card approval at risk.

Technology Company: Technology projects, on the other hand, can continue to provide returns and hire with incremental investments, and so weather the dip in EB5 investments. Adjusting course based on available capital is part of the technology business model.

Criteria: Valuation Potential

Companies are valued based on their revenue, income, assets, customer pipeline, and market potential.

Real Estate Project: Real estate projects are valued with traditional valuation models based mostly on income and assets. The multiplier and the goodwill are largely limited by the location.

Technology Company: Technology companies hold patentable intellectual capital that has long-term growth potential. Selecting a tech company with the right specialization or secret sauce can usually result in exponential valuation; the multiplier on revenue would be 10 to 15 times.

Criteria: Convert Your Loan to Equity

Your EB5 investment is usually loaned to the Job Creating Entity (JCE), and the returns take the form of interest payments at a set annual rate. Investors do not get a share of the business. That is different with technology companies.

Real Estate Project: Because of their limited valuation opportunity, most real estate investments may not offer the option to convert your investment into a share in the business.

Technology Company: Technology companies enjoy a much higher valuation after the initial growth years, and since the initial investors were the enablers, it is common practice to offer a conversion to shares in the company at a discounted price. For example, if your capital of $500K is returned after 5 years, in addition to all the interest payments, you have the choice to buy shares of that company that will immediately be valued at $625K or higher, depending on the discount offered to you.

Criteria: Potential Market Size and Geographical Reach

This is important for the growth and resiliency of a business. A wider geographical reach means less dependency on local market shifts.

Real Estate Project: Real estate is by definition a local business and is limited to the market available in its area. Even real estate at tourist attractions is dependent on the local tourism industry.

Technology Company: Technology companies cater to the global marketplace across regions and countries. The intellectual capital they create is usually applicable across the world. This is especially beneficial when economies take different turns in various parts of the world.

Criteria: Job Creation Potential (especially when the Green Card processing dates are not current)

Real Estate Project: Real estate projects hire at a high rate during the construction phase, and once construction is completed, operating the real estate does not require as many employees.

Technology Company: Technology companies grow continuously as the number of customers increases, and the employees hired are retained and developed for the long term. More importantly, the skills that a tech company develops in its employees are longer lasting and very attractive for the USA economy.
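
The conversion example under “Convert your loan to equity” implies a specific discount. The short sketch below works through the arithmetic, assuming the discount is applied to the share price. The article does not state the rate; 20% is only what its $500K and $625K figures imply, and the function names are hypothetical.

```python
def share_value_at_discount(capital, discount):
    """Market value of the shares that `capital` buys at a fractional `discount`."""
    return capital / (1 - discount)


def implied_discount(capital, share_value):
    """Fractional discount implied by paying `capital` for shares worth `share_value`."""
    return 1 - capital / share_value


if __name__ == "__main__":
    # $500K buying shares valued at $625K corresponds to a 20% discount.
    print(f"{implied_discount(500_000, 625_000):.0%}")
    print(f"${share_value_at_discount(500_000, 0.20):,.0f}")
```

A deeper discount raises the immediate share value, which is why the article says “$625K or higher”.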


As we see above, technology companies are undeniably a better EB5 investment opportunity. But real estate may still be attractive to investors who like the comfort of knowing that their money is invested in a physical asset, despite its limitations.

However, in this new world where data is the oil, Artificial Intelligence is augmenting our collective intellect, and our physical and digital worlds are merging, tech is the place to be for every EB5 investor! Learn more at

PropMix introduces Data-in-a-Box – a unique data service ideal for high-performance analytics and machine learning

MANHASSET HILLS, N.Y., Sep. 2, 2019 – PropMix, a real estate data and insights company, introduced a brand new way to interact with data with its Data-in-a-Box offering. Data-in-a-Box is a cloud facility that provides easy and immediate access to very large amounts of property data to power analytics and deep learning platforms in the real estate industry. This data-as-a-service will especially help lenders, appraisers, and other real estate technology providers reduce their internal data operations and leverage the economies of scale that PropMix is bringing to the industry.

PropMix has been diligently assembling the dream database for the real estate industry over the past several years and curating it with its artificial intelligence techniques. Its data quality improvement techniques include many patent-pending capabilities for extracting information from natural language and from real estate photos. The data lake now includes data on over 151 million properties, tax and assessment records, deed and mortgage records, foreclosures, and more.

Many large companies in the mortgage and appraisal management market have proprietary analytical and machine learning models that need access to large amounts of data. These companies may already have a model that is proven in a local market with limited data and be ready to scale it up for use across the country. With Data-in-a-Box, PropMix is enabling the growth strategies of such companies by offering cloud access to its proprietary data set. “Our customers can now focus on building value on top of the data instead of spending their time and money on gathering and standardizing data,” said Daniel Mancino, Vice President of Data Solutions and Sales at PropMix.

The real estate industry is undergoing a transformation, with billions of dollars invested in PropTech each year. “With the Data-in-a-Box offering, our goal is to accelerate innovation in the real estate industry by creating an environment where a company of any size can focus on creating its best machine learning and analytical models with seamless access to nationwide curated data,” said Umesh Harigopal, CEO of PropMix. “We are excited to invite all industry participants to leverage our high-quality data to accelerate the ongoing transformation of the industry driven by AI and Blockchain.”

About PropMix

PropMix LLC, an Innovation Incubator Inc portfolio company, offers a ground-breaking Real Estate Smart App Development Platform that enables the real estate ecosystem to easily consume and monetize data and insights and build Smart Solutions. PropMix’s platform and solutions are widely used by mortgage lenders, appraisers, realtors, and investors. Built on industry open standards for global scale, PropMix empowers users to engage with data, make decisions using insights, and build the real estate technology of the future. Headquartered in New York, we also have a presence in Boston, MA; Leesburg, VA; and Freehold, NJ in the USA, and in Trivandrum, Kerala, India.

Media Contact: Sakeer Hassan, 7329799507

PropMix announces discounted pricing for Valuation Expo and Appraisal Buzz members

MANHASSET HILLS, N.Y., March 18, 2019 – PropMix, a real estate data and insights company, has announced, in conjunction with Valuation Expo, a unique offer to try out its Market Conditions Advisor (MCA) product for appraisers.

All participants and delegates at the Valuation Expo in Charleston, SC from March 19 to 2, 2019 will be eligible for a 20% discount on all contracts signed before March 30, 2019, for a duration of up to 12 months from the date of signing. “With this offer, we are providing significant value to independent appraisers as well as large AMCs to experience and adopt the seamless data and insights access platform for appraisers,” said Daniel Mancino, Vice President of Data Solutions and Sales at PropMix.

MCA was first released in February 2018, and a large number of appraisers have been leveraging its single-point access to data and insights nationwide. PropMix recently integrated its image recognition technology into MCA to automate and simplify certain mundane tasks for the appraiser.


PropMix brings Image Recognition to the Real Estate Appraisal Industry

MANHASSET HILLS, N.Y., February 20, 2019 – Appraisal Vision is PropMix’s new addition to the suite of products and services it has been building for the real estate appraisal industry. Appraisal Vision is an image recognition solution built on a deep learning engine that has been trained on terabytes of real estate image data over the past couple of years. It extracts information from home photos, which is used to enrich data and to improve and validate home valuations.

Appraisal Vision can power many solutions, such as fraud detection and appraisal validation, and can automate simple tasks for the appraiser, such as ordering and labeling photos in an appraisal form. “We will continue to integrate Appraisal Vision into many applications under the Market Conditions Advisor brand,” said Umesh Harigopal, CEO of PropMix. “Our goal is to reduce appraisers’ mundane tasks and help them focus on high-value activities.”

The core technology that powers Appraisal Vision is a complex chain of cascading neural networks, each of them a convolutional neural network. PropMix’s heuristic algorithms combine results from multiple deep learning engines to arrive at the final predictions for a real estate photograph. The neural networks have been trained over the last 2 years on about 22 terabytes of image data. This has resulted in accuracy levels of about 93%, and accuracy continues to improve.


Bradie – The Bar is Now Set Higher for Digital Engagement in Real Estate

Redefining digital in real estate

We are excited to announce Bradie – Broker and Agent Digital Engagement platform – our suite of capabilities for the real estate marketplace. We are redefining the online real estate experience completely using AI and Machine Learning techniques. It integrates our various existing products that have been used by hundreds of agents and brokers.

We believe that the real estate agent has the opportunity to build lifelong relationships with homeowners as their trusted advisors on the largest investment of their lives. Bradie is our journey to help agents nurture that relationship and provide value to the homeowners. Visit

Bradie brings to market brand new ways of interacting with real estate information using computer vision. Home buyers using an IDX portal can now stop staring at loads of data about the homes they have shortlisted and instead see them side-by-side using pictures of each room and focus on what makes the homes different from each other.



Stale, weeks-old Home Value Reports are a thing of the past. Our comparable market analysis engine – iCMALive – provides an engaging platform that updates each homeowner’s personalized value analysis live as the market changes in their neighborhood: a new home on the market, a new sale, or a price change. The agent can screen all such updates in real time, and Bradie will communicate with the customer on the agent’s behalf, building a trustworthy relationship with customers. Visit to get more details.

You have to see a demo of Bradie today to get a peek into a whole lot more groundbreaking features. Contact us or write to us at

Real estate data mining for your next business need using Public Records

Here are a few examples of how our public record data mining is leveraged

Comprehensive nationwide real estate public record data helps tackle business needs in many industries beyond real estate. We mine terabytes of real estate data to find those needles or patterns in the haystack. This post covers a few real estate and non-real estate use cases we are actively supporting.

As previously announced, our public record data provides a comprehensive set of property attributes such as owner occupancy, last sale information, and more detailed tax assessment information along with full property details. It also provides property identification, seller/buyer information, tax exemption details, building information, and legal description of the property.

The valuation models and comparable similarity scoring are now based on authoritative property details and current market conditions from listing data.

Marketers in any industry

PropMix real estate data is very well suited to finding a target customer base for many businesses. Here are a few examples:

  1. A skylight company recently needed information on all homes that have a skylight so that they could offer upgrades or servicing options
  2. A flooring company is able to provide an automated estimate of carpeting or hardwood flooring costs using our building area information
  3. An insurance company is able to target customers who have lived for more than 10 years in a home to consider modifying their insurance coverages

Real Estate Investors

Investors are interested in finding undervalued homes in good rental markets to buy and convert them to income generating rental properties.

  1. We identify tenant occupied properties in each neighborhood in the country and find areas where rental demand is increasing
  2. We then find owner occupied homes in these areas that can potentially be converted to investment properties.

Our data can also power a full investment pro forma including the total cost of ownership and return on your investment.

Mortgage Industry

Appraisers and lenders need information to accurately assess collateral risk before underwriting a loan – purchase, refinance, and/or home equity.

  1. Appraisers improve the accuracy of their valuations using extensive assessor and recorder property data and comparable sales from public records – including new home sales and/or owner sales not in the Multiple Listing Services.
  2. Underwriters or lender reviewers can run their appraisal review and AVMs using:
    1. Transaction history on a property
    2. Comprehensive report of the property details

As we continue to solve additional business problems we will provide updates on this blog on new and creative ways in which our customers are mining our data.

PropMix launched Market Conditions Advisor – a recommendation & analytics platform for Appraisers

MANHASSET HILLS, N.Y., February 6, 2018 – PropMix, a real estate data and insights company, has announced the general availability of its Market Conditions Advisor (MCA) product for appraisers. MCA provides a single user interface to research and analyze property records from across the USA, including current and past sales information, to help appraisers generate the analytics required for their GSE forms.

MCA comes packed with numerous features that have been developed using feedback from appraisers in the field. The product includes automated comparable recommendations powered by its customizable similarity scoring mechanism, and it allows appraisers to find comparables using a number of methods that merge listing data and public records. All the research and analytics can be easily exported to many formats that can be directly consumed by the most common appraisal forms software. “With the MCA integrations we are building, appraisers can now access all the data they need from within their favorite forms software without having to switch to the MLS portals,” said Daniel Mancino, Vice President of Data Solutions and Sales at PropMix. “A single access point makes it even more beneficial in cities where multiple MLSs serve the same location.”

MCA is powered by PropMix’s data and insights platform, built using decades of experience in AI and machine learning. MCA will grow in the coming months, both geographically as more Multiple Listing Service (MLS) relationships are added and functionally as more machine learning insights are introduced into the product. The application also provides images of the properties as well as the listing history. “We are developing MCA as a brand for the appraisal industry, and this is the beginning of our pursuit to complement and augment the appraiser’s capabilities with real-world decision making powered by the PropMix cognitive engine for real estate,” said Sakeer Hassan, CMO for PropMix.


Improve the Quality of Your Real Estate Data

Part 2 – How to improve Real Estate Data Quality?


In Part 1 of this series we broadly covered why data quality is important in real estate, why real estate data quality has become a hard problem to solve, and presented a few examples of how to measure the quality of your real estate data. In this second and final part we will present a few ideas on how you could begin the practice of improving real estate data quality.

Data Quality Best Practices

As you would expect, data quality is a common problem in many other industries, irrespective of how old or new the industry is. As a result, many best practices for managing and improving data quality already exist and can be easily adopted in real estate. Here are a few important areas to focus on.

Data Quality Assessment

Before we can start improving quality, we need a solid understanding of the current state of the data. As we presented in the last section of Part 1, knowing how to measure the quality of your data is a first step. These data quality metrics are very specific to the industry we are in, and we have provided a few good starting points.

Beyond knowing your current state, a good data quality assessment practice assesses the data periodically, both to measure improvements and to detect data quality leaks caused by new data trickling into your platform. It is also a great way to show senior management the strides your organization is making.

The design of the quality metrics needs to be directly traceable to your company’s business objectives, which differ depending on where in the real estate market you play – lead generation, mortgage origination, appraisals, brokerage, etc. Such traceability is important for getting management buy-in to invest in data quality.

Data Governance

A data governance board, with participants from both the business and IT, must be established to secure a strong organizational commitment to data quality and to continuously support the people, processes, and technologies that maintain it. Business participants are those who are close to the consumption and production of data; IT participants are the data architects and modelers. The objectives of the governance board are to:

  • Establish data policies and standards
  • Define and measure data quality metrics
  • Discover data-related issues and provide resolution paths
  • Establish proactive measures to reduce data quality leakage

Data Stewards

One of the most important roles within a data governance board, and in the overall data management practice, is the Data Steward. Data stewards are the ultimate owners of specific sections of the data, usually called subject areas, and they represent the business users and producers of that data. The buck stops with the data steward for all data quality issues, and the steward takes a leadership role in resolving data accuracy, consistency, or integrity issues.


Data stewards are often the liaisons between the business and the IT department that manages the data for the business. In this role, they work with the business and IT to define relevant quality metrics, have them interpreted and implemented appropriately by the IT department, and ultimately showcase the quality improvements that improve business outcomes.

Create a Data Quality “Firewall”

Most data within an organization is broadly traceable to 2 types of sources: applications where users enter data, and data feeds that are processed to load data into data stores. The idea of a data quality firewall is to catch and reject any data that violates data quality rules at the time of its entry into a data store. All data ingestion points have to pass through this one virtual firewall to be validated before the data is processed and stored.


The keyword above is “virtual”, because it is impractical to create a single system to act as a data quality firewall given the various subject areas of data and the departmental data ingestion points across the organization. The idea is not to create a choke point but a proactive mechanism that catches data quality issues for follow-up and resolution before they flow downstream into transactional or analytical systems.
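
To make the firewall idea concrete, here is a minimal sketch of one ingestion point. It is an illustration under assumptions, not a description of any particular product: the function names and the two sample rules are hypothetical, and only the field names `ListPrice` and `Address` are taken from the field list later in this post.

```python
def make_firewall(rules):
    """Build a validator from (name, predicate) pairs; each predicate takes a record dict."""
    def validate(record):
        # Names of every rule the record violates; empty means the record is clean.
        return [name for name, passes in rules if not passes(record)]
    return validate


def ingest(records, validate):
    """Split incoming records into accepted ones and rejected ones with their reasons."""
    accepted, rejected = [], []
    for record in records:
        failures = validate(record)
        if failures:
            rejected.append((record, failures))  # kept for follow-up and resolution
        else:
            accepted.append(record)
    return accepted, rejected


# Hypothetical sample rules for a listing feed.
LISTING_RULES = [
    ("ListPrice is a positive number",
     lambda r: isinstance(r.get("ListPrice"), (int, float)) and r["ListPrice"] > 0),
    ("Address is present", lambda r: bool(r.get("Address"))),
]
```

Because the validator is just a function, each department's ingestion point can share the same rule set without routing its data through one physical system, which is the sense in which the firewall is virtual.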


Data Standardization vs. Data Quality – What’s the difference

Does compliance with a data standard mean high data quality? In other words, if your data is certified at the Platinum level against the RESO 1.5 data dictionary, would you also consider it to be of high quality? It turns out the answer is not that straightforward.


There are typically 2 different views of data quality: conformance to a standard specification, or usability of the data for a specific purpose. By the first definition, the data quality would be very high if a data set is certified by RESO. On the other hand, as we discussed in Part 1, an agent could inadvertently enter erroneous listing data or purposefully tweak the listing for improved marketability. This can result in inconsistency between a public record and a listing record for the same property, leaving the user of the data to assign trustworthiness to the data sources before consumption. Since business objectives are driven by data use rather than by conformance to a standard, we prefer the second definition of data quality, which is measured by usability.


Consider another example of standard vs. quality: Assignment of a PropertySubType value of Condominium or Townhouse or Single Family Residence is standards compliant but an erroneous assignment of this field can cause the property to be missed from appearing in IDX searches. In addition, it can also cause valuation issues if not combined and cleansed against other data sources.


Having said that, certain standards specifications include elements of data use as well, in which case conformance to the standard and usability begin to mean the same thing. But given the various uses of a particular data set, it is unfair to expect a standards organization to completely define all the usability specs for the data; doing so would result in an unwieldy standard, which may reduce its adoption.


Here are some typical data quality concerns to consider:

  • Completeness: Are we missing any values of critical fields?
  • Validity: Is the data in a field valid? Does the whole record match my rules?
  • Uniqueness: How much of our data is duplicated?
  • Consistency: Is information consistent within a single record, across multiple records, and across multiple data sets?
  • Accuracy: Does the data represent reality?
  • Temporal Consistency & Accuracy: Does a snapshot in time represent reality at that time, and are all data sets consistent with that snapshot?


As you can see, a data standard such as RESO would not be able to answer the above for all the real estate ecosystem players. We could define detailed rules for each of the concerns above and such rules will look different in a mortgage company and a sales lead generation company.
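
Two of the concerns listed above, completeness and uniqueness, are straightforward to quantify. The sketch below shows one possible way to do so; the function names are hypothetical, and treating `None` and the empty string as "missing" is an assumption that each company would adapt to its own data.

```python
def completeness(records, field):
    """Fraction of records that carry a non-empty value for `field`."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def duplicate_rate(records, key_fields):
    """Fraction of records whose key repeats the key of an earlier record."""
    if not records:
        return 0.0
    seen, duplicates = set(), 0
    for r in records:
        key = tuple(r.get(f) for f in key_fields)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates / len(records)
```

A mortgage company and a lead generation company could run the same two functions but choose different fields and different acceptable thresholds.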

Practical data quality for real estate

Now let us bring all this down to a few specific takeaways for improving the quality of data in your company. We define these in a few steps to begin with. Stay tuned to our blog for future posts on this topic, where we will continue to provide specific rules and heuristics you could implement.


Many of the activities below must be driven by an appointed data steward for each major data set you are dealing with – assessment, listings, deeds, mortgages, permits, etc.

Identify critical fields

The first step in your data quality journey is to identify the most critical fields for your particular application. Out of the 639 fields contained in the RESO 1.6 data dictionary, you would want to identify the fields that are required for your computations. Some fields are commonly required for any application. They were listed in Part 1 of the article and are repeated here for quick reference:


Parcel Number       ListingContractDate         AssociationName
Address             StandardStatus              AssociationFee
PropertyType        OriginalListPrice           Subdivision
PropertySubType     ListPrice                   School Districts
Lot Size            CloseDate
Zoning              ClosePrice                  TotalActualRent
NumberOfBuildings   DaysOnMarket
BedroomsTotal       ListAgent Information
BathroomsTotal      ListBroker Information
LivingArea          SellingAgent Information
Tax Year            SellingBroker Information
Tax Value           Public Remarks
Tax Amount
Land Value
Improvement Value

Define Data Quality Rules

The next step is to define a set of rules that consider two dimensions to begin with:


Data Quality Concerns: Completeness, Validity, Uniqueness, Consistency, Accuracy, and Temporal Consistency & Accuracy.

Extent of measurement: Single record, multiple history records of the same property, multiple history records of the same listing, multiple data sets (public records and listings)


You would end up with rules for each field, for each type of record, for a data set, and rules that cut across multiple data sets. These rules would validate the field, a record, a set of records, or the whole data set. Execution of these rules would result in either errors or warnings about the quality of your data.
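One possible way to organize such rules is sketched below for the single-record scope only. The `Rule` class, the rule names, and the choice of which rule is an error versus a warning are all hypothetical. A multi-record or data-set rule would take a list of records instead of a single one.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    concern: str                   # e.g. "Completeness", "Validity", "Consistency"
    severity: str                  # "error" or "warning"
    check: Callable[[dict], bool]  # returns True when the record passes

# Hypothetical single-record rules. Dates are assumed to be ISO-8601
# strings (YYYY-MM-DD), which compare correctly as plain strings.
RULES = [
    Rule("list_price_present", "Completeness", "error",
         lambda r: r.get("ListPrice") is not None),
    Rule("bedrooms_non_negative", "Validity", "error",
         lambda r: r.get("BedroomsTotal") is None or r["BedroomsTotal"] >= 0),
    Rule("close_not_before_listing", "Consistency", "warning",
         lambda r: not (r.get("CloseDate") and r.get("ListingContractDate"))
         or r["CloseDate"] >= r["ListingContractDate"]),
]

def run_rules(record, rules=RULES):
    """Return (severity, rule name) for every rule the record fails."""
    return [(rule.severity, rule.name) for rule in rules if not rule.check(record)]

record = {"ListPrice": None, "BedroomsTotal": 3,
          "ListingContractDate": "2021-03-01", "CloseDate": "2021-02-01"}
print(run_rules(record))
```

Tagging each rule with its concern and severity makes it straightforward to report errors and warnings separately later on.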

Discovery with Data Profiling

Data profiling helps you run a statistical analysis on the data to discover hitherto unknown problems.

For example, we usually expect PropertySubType values to be always one of the known ones. But as new data gets processed, we might discover that certain PropertySubType mappings are absent in our standardization routines and as a result non-standard PropertySubTypes may be getting added to our DB.


To catch such issues, a data profiling capability will provide detailed stats on field populations, null counts, blank counts, and also field value distributions. For the PropertySubType values, the field value distribution will reveal to us that there is a new PropertySubType value with over 100,000 entries. This will mean that we should remap these values as required.
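A minimal profiler along these lines might look like the following. It is a sketch under assumptions: the `KNOWN_SUBTYPES` set is a stand-in for whatever standardized value list you maintain, and the sample records are invented.

```python
from collections import Counter

# Assumed set of standardized values; a real list would come from your
# own data dictionary mappings.
KNOWN_SUBTYPES = {"Condominium", "Townhouse", "Single Family Residence"}

def is_blank(value):
    """True for strings that are empty or whitespace only."""
    return isinstance(value, str) and value.strip() == ""

def profile_field(records, field):
    """Null count, blank count, and value distribution for one field."""
    values = [r.get(field) for r in records]
    nulls = sum(v is None for v in values)
    blanks = sum(is_blank(v) for v in values)
    distribution = Counter(v for v in values if v is not None and not is_blank(v))
    return {"total": len(values), "nulls": nulls, "blanks": blanks,
            "distribution": distribution}

def unknown_values(distribution, known):
    """Values present in the data but missing from the known set, with counts."""
    return {v: n for v, n in distribution.items() if v not in known}

records = [
    {"PropertySubType": "Condominium"},
    {"PropertySubType": "Condo"},
    {"PropertySubType": "Condo"},
    {"PropertySubType": ""},
    {"PropertySubType": None},
]
stats = profile_field(records, "PropertySubType")
print(stats["nulls"], stats["blanks"])
print(unknown_values(stats["distribution"], KNOWN_SUBTYPES))
```

Here the non-standard value `Condo` shows up in the distribution with its count, which is the signal that a mapping is missing from the standardization routine.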


Running a data profiler periodically will help identify issues that creep into the data. Note that a data quality firewall only blocks “unclean” data when the corresponding cleansing or quality rules have been modeled within that firewall. For previously unknown issues that get loaded via daily incremental data ingestions, we need to discover the issues first and then model prevention rules into the firewalls.


Establish Data Quality Metrics

Having defined the rules, it is time to measure your data quality against them. Common quality metrics are:

  • Number of records that failed a particular quality rule
  • Field population thresholds and where we fall short
  • Field value distributions
  • Number of records with invalid data for each field
  • Number of records that failed a record level quality rule
  • Number of multi-record quality rule failures
  • Number of data-set level quality rule failures


For each of the above, it is important to understand the trends, so you need to run the data profiler at regular intervals (weekly or monthly) to know how your data quality is trending: improving, getting worse, or revealing issues that did not exist before.
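Two of these metrics, field population thresholds and run-over-run trends, can be sketched as follows. The threshold values, the rule names, and the failure counts are all hypothetical numbers chosen for the example.

```python
def population_rate(records, field):
    """Fraction of records with a non-empty value for the field."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)

def threshold_shortfalls(records, thresholds):
    """Fields whose population rate is below the required threshold."""
    shortfalls = {}
    for field, required in thresholds.items():
        rate = population_rate(records, field)
        if rate < required:
            shortfalls[field] = (rate, required)
    return shortfalls

def trend(previous, current):
    """Change in each metric between two profiling runs (current - previous)."""
    return {name: current[name] - previous[name]
            for name in current if name in previous}

# Hypothetical thresholds and sample records.
thresholds = {"ListPrice": 0.95, "LivingArea": 0.50}
records = [
    {"ListPrice": 300000, "LivingArea": 1200},
    {"ListPrice": None, "LivingArea": 900},
]
print(threshold_shortfalls(records, thresholds))

# Hypothetical rule-failure counts from two weekly runs.
last_week = {"list_price_present": 120, "bedrooms_non_negative": 4}
this_week = {"list_price_present": 95, "bedrooms_non_negative": 9}
print(trend(last_week, this_week))
```

A negative change in a failure count means that rule is improving. A positive change is a cue to look for a new issue in recent ingestions.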

Enforce the rules at the data ingestion points

This is the first, proactive step in improving and maintaining high data quality.


Having defined the rules for measuring data quality, it is now important to maintain higher-quality data by enforcing these rules at the time data is created in the organization. Get the data steward to become the evangelist for the rules they have defined and to work with each data origination point to implement the validation rules.
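At an ingestion point, enforcement can be as simple as splitting incoming records into accepted and rejected sets. The sketch below assumes rules are plain functions keyed by name. The two rules and the sample records are hypothetical.

```python
def firewall(incoming, rules):
    """Split incoming records into accepted and rejected, based on rules.

    `rules` maps a rule name to a function that returns True when the
    record passes. Rejected records carry the names of the rules they failed.
    """
    accepted, rejected = [], []
    for record in incoming:
        failed = [name for name, check in rules.items() if not check(record)]
        if failed:
            rejected.append((record, failed))
        else:
            accepted.append(record)
    return accepted, rejected

# Hypothetical rules for this sketch.
rules = {
    "has_parcel_number": lambda r: bool(r.get("ParcelNumber")),
    "positive_list_price": lambda r: isinstance(r.get("ListPrice"), (int, float))
                                     and r["ListPrice"] > 0,
}

incoming = [
    {"ParcelNumber": "12-345", "ListPrice": 410000},
    {"ParcelNumber": "", "ListPrice": 0},
]
accepted, rejected = firewall(incoming, rules)
print(len(accepted), rejected[0][1])
```

Keeping the failed rule names alongside each rejected record gives the origination point something concrete to fix.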

Define Heuristics for Quality Improvement

The reactive posture to data quality improvement is more of a data cleansing process, and it is a required element of a data quality practice. Most of the time, you are not in control of the data origination points, and if rule enforcement at the origination point is too restrictive, you might not have enough data for your applications. Hence the need for a reactive measure to clean up the data you have received.


There are broadly 2 alternatives – either perform the cleanup and then put it through a highly restrictive data quality firewall or have a lenient firewall with a downstream cleaning process. The choice depends very much on your application and its ability to deal with imperfect data.


Any data quality improvement mechanism is dependent on a set of heuristics that the data steward and the data architects work together to define. For example, you could reclassify a rental listing correctly by looking at the listing price and comparing it to local median sale price and to the median rental price. A strong partnership between a data steward and the data architect is necessary to define and develop these cleansing heuristics.
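The rental example can be sketched as below. This is one possible heuristic, not a prescribed method: comparing distances on a log scale is our assumption, as are the `kind` and `reclassified` fields and the median values.

```python
import math

def likely_listing_kind(list_price, median_sale_price, median_rent):
    """Guess 'sale' or 'rental' by which local median the price is closer to.

    Distance is measured on a log scale so the comparison is about ratios
    rather than absolute dollar differences. All inputs must be positive.
    """
    to_sale = abs(math.log(list_price / median_sale_price))
    to_rent = abs(math.log(list_price / median_rent))
    return "rental" if to_rent < to_sale else "sale"

def reclassify(listing, median_sale_price, median_rent):
    """Return a corrected copy of the listing if its kind disagrees with the guess."""
    guess = likely_listing_kind(listing["ListPrice"], median_sale_price, median_rent)
    if guess != listing["kind"]:
        return {**listing, "kind": guess, "reclassified": True}
    return listing

# A listing priced at 2,100 but labeled as a sale, in an area where the
# (hypothetical) median sale price is 350,000 and median rent is 1,900.
listing = {"ListPrice": 2100, "kind": "sale"}
print(reclassify(listing, median_sale_price=350000, median_rent=1900))
```

Returning a copy rather than modifying the record in place keeps the original value available for lineage tracking.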


It is also recommended that you maintain a list of all active and retired heuristics used for cleansing. Another need alongside data cleansing is the ability to track the data lineage where you would keep track of the source of the cleansed data and the heuristics that caused the data to be modified.
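A heuristic registry and a lineage log could be kept together, roughly as follows. The structure, the heuristic IDs, and the `mls_feed` source name are hypothetical. A production system would likely persist these entries in a database rather than in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical registry of cleansing heuristics, both active and retired.
HEURISTICS = {
    "H001": {"description": "Reclassify rentals by price vs. local medians", "active": True},
    "H002": {"description": "Trim whitespace in PropertySubType", "active": False},
}

@dataclass
class LineageLog:
    entries: list = field(default_factory=list)

    def record(self, record_id, source, field_name, old, new, heuristic_id):
        """Append one lineage entry describing a cleansing change."""
        if heuristic_id not in HEURISTICS:
            raise KeyError(f"Unknown heuristic: {heuristic_id}")
        self.entries.append({
            "record_id": record_id,
            "source": source,
            "field": field_name,
            "old": old,
            "new": new,
            "heuristic": heuristic_id,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def history(self, record_id):
        """All changes applied to one record, in the order they were made."""
        return [e for e in self.entries if e["record_id"] == record_id]

log = LineageLog()
log.record("A2", "mls_feed", "kind", "sale", "rental", "H001")
print(log.history("A2")[0]["heuristic"])
```

Because retired heuristics stay in the registry, older lineage entries remain explainable after a heuristic is switched off.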


Data quality is a cyclical process: establish rules, implement them to measure quality, profile the data, clean up the data as required, and finally go back and tweak the rules to run the cycle once more. The target metrics can start small and tighten over time.



We hope this article provided an overview and some key takeaways to implement a good data quality practice within your real estate technology platform. We will continue this conversation with more blog posts to provide you with:

  • Practical data quality rules and metrics
  • Data cleansing heuristics to implement
  • Machine learning techniques in real estate data cleansing


We are planning to release our Data QA Tool specialized for real estate data free to the community. Please sign up here to be notified when the tool is released.
