My requirements when developing this REAXML parser script were that it be fast and efficient, that it not repeat imports when no changes are detected, and that it work with WP-Property to import the files and manage the property and rental listings.
After countless hours searching for a REA XML parser to no avail, I set out to find a developer to build one for me.
The script had to do the following:
- Combine the individual REAXML files into one or more import files.
- Process the results and store them in three separate REAXML files (current, sold/leased and withdrawn) to increase speed.
- When the property status changes to sold, leased or withdrawn, delete the property entry from the current REAXML output file.
- Just added: geocode the property address during import and store the result in an extra field, so my new property plugin does not have to geocode the entry again on import.
Problem #1 : REAXML data arrives as multiple XML files
The problem with the REAXML import format is that the property entries arrive in separate files, each of which can contain one or more XML elements that are either new or updated.
REAXML File Structure Example:
and they go on forever…
Problem #2 : Importing one huge XML file will slow down your site
Processing a single, ever-growing file every day can slow the server down, especially on smaller shared servers, and it becomes even more of a problem when your clients want the import schedule to run more frequently.
Problem #3 : Repeated entries in several files
I noticed that a record would sometimes come through several times with minor alterations each time, as if the agent couldn’t make up their mind on the details of a particular property entry. It’s normal human nature, like pressing send on an email and then remembering one more thing. The earlier copies needed to be discarded and only the latest one considered for parsing, again for speed.
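To make the "keep only the latest copy" idea concrete, here is a minimal Python sketch. It is not the script described in this post. It assumes each listing element carries a `modTime` attribute and a `uniqueID` child element, and that `modTime` looks like `2012-03-01-09:05:00`; the function name `latest_listings` and the date format constant are made up for illustration, so adjust them to whatever your feed actually sends.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumption: modTime is formatted like "2012-03-01-09:05:00".
MOD_TIME_FORMAT = "%Y-%m-%d-%H:%M:%S"


def latest_listings(xml_documents):
    """Merge several REAXML documents (given as strings) and return
    {unique_id: listing_element}, keeping only the newest copy of each listing."""
    latest = {}  # unique_id -> (mod_time, element)
    for document in xml_documents:
        root = ET.fromstring(document)
        for listing in root:
            unique_id = listing.findtext("uniqueID")
            raw_mod_time = listing.get("modTime")
            if unique_id is None or raw_mod_time is None:
                continue  # skip entries we cannot identify or date
            mod_time = datetime.strptime(raw_mod_time, MOD_TIME_FORMAT)
            kept = latest.get(unique_id)
            # Replace the stored copy only if this one is at least as new.
            if kept is None or mod_time >= kept[0]:
                latest[unique_id] = (mod_time, listing)
    return {uid: element for uid, (_, element) in latest.items()}
```

Because the comparison is done on the modified time rather than on file order, the result should be the same whichever order the files are read in.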
Solution was to merge the entries into three separate files: current, sold/leased and withdrawn
Why split the files into Current, sold/leased and withdrawn?
The Australian real estate market is unlike the US market: Australian agents and companies don’t have access to entire regions of property and tend to have only a handful of listings per agent. Putting the “current” properties into their own file therefore keeps that file small and makes processing much faster.
When the first few files are imported it’s not much of a problem, but after a couple of years, with several hundred records being processed several times per day, this could end up slowing or even killing your server. Your phone will be ringing with customer complaints, which is not good.
With the split files, current.xml may be only 150 KB while sold.xml is over 1.5 MB. It is much quicker to process the small current.xml file repeatedly and the sold.xml file less frequently.
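The split itself can be sketched in a few lines of Python. Again, this is an illustration and not the script from this post: it assumes each listing element has a `status` attribute with values such as `current`, `sold`, `leased` or `withdrawn`, that the output root element is `propertyList`, and that the third file is called `withdrawn.xml` (only current.xml and sold.xml are named above). The names `STATUS_TO_FILE` and `split_by_status` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: status values and the withdrawn.xml file name are illustrative.
STATUS_TO_FILE = {
    "current": "current.xml",
    "sold": "sold.xml",
    "leased": "sold.xml",  # sold and leased share one file
    "withdrawn": "withdrawn.xml",
}


def split_by_status(listings):
    """Group listing elements into one output tree per target file."""
    outputs = {
        name: ET.Element("propertyList")
        for name in sorted(set(STATUS_TO_FILE.values()))
    }
    for listing in listings:
        status = (listing.get("status") or "").lower()
        target = STATUS_TO_FILE.get(status)
        if target is None:
            continue  # unknown status: left out of every file in this sketch
        outputs[target].append(listing)
    return outputs
```

Each returned root can then be written out with `ET.ElementTree(root).write(name, encoding="utf-8", xml_declaration=True)`. If the three files are rebuilt from the latest copy of every listing on each run, a property whose status changes to sold, leased or withdrawn simply stops appearing in current.xml.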
Why not import directly into the WordPress SQL database?
Importing directly into WordPress was a no-go because at the time WordPress was at version 2.6 and did not have the custom post type capabilities it has today. I also did not have my own property plugin, nor the expertise at that time to create a plugin like WP-Property, which has a nifty supermap feature that we ended up using to manage the listings on our WordPress theme.
Another reason is that I did not want WordPress to keep re-importing data and images that hadn’t changed, as this would blow out the size of the website very quickly.
How did you handle the unique ID problem on import?
Each REA XML entry has a unique ID, but I didn’t want to re-import existing entries every time the import schedule ran if nothing had changed. That would be crazy.
What I did was use the unique ID combined with the modified date to determine whether the entry was new or changed. This significantly improved import speed: if, say, 10 records are in the current file and only one has changed, the importer skips nine and imports only that one entry.
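The check can be sketched in Python as follows. This is a simplified illustration under my own assumptions, not the actual script: the listings are reduced to `(unique_id, mod_time)` pairs, the record of what has already been imported is a plain dictionary, and the name `select_changed` is hypothetical. In practice that dictionary would need to be saved between runs, for example in a small file or database table.

```python
def select_changed(listings, seen):
    """Return the unique IDs that are new or whose modified time differs
    from the last import, and record them in `seen`.

    listings: iterable of (unique_id, mod_time) pairs
    seen:     dict mapping unique_id -> mod_time of the last imported copy
    """
    changed = []
    for unique_id, mod_time in listings:
        if seen.get(unique_id) != mod_time:
            changed.append(unique_id)
            seen[unique_id] = mod_time  # remember what we just imported
    return changed
```

With 10 known records and one new modified time, only that one ID comes back, and running the same input a second time returns an empty list.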
How stable is your script?
I’ve been using the script on several real estate websites over the past few years and only added the geocode function last year while developing our own property plugin. Once the cron job is set up, the script has run without fail and without any need for monitoring.
Solution was to create a REA XML parser that was fast, efficient and did not import already existing property records if no changes were made
Having been so focused on building real estate websites for clients, I never thought of offering this easy-to-use script, which can be installed quickly and easily on your clients’ websites. Also included with the script are the import settings, so you can quickly set up WP-Property for your client or use WP All Import for a more custom setup.
Let me know in the comments below if this REA XML script is of interest to you so I can provide some documentation on its ease of use and setup.