Have you ever had to parse the structure of an email that contains HTML, text, and images? Easier said than done 😅
As a developer, you need to examine the email content, extract images or text, and do something with them—like upload the images to the web server or save them to a database. IMAP functions available in standard PHP libraries offer elegant and efficient processing of email messages, but if only things were simple! For a deeper look into what’s possible, check out PHP’s official IMAP extension documentation for detailed guidance on available functions and usage examples.
Remember those wonderful times when email servers provided consistent, predictable structure? When parsing the structure of email was boring? You probably do not remember such times, not because you are too young, but because such times never existed! 😀
The reality sets in when we realize most email servers use a complex structure to represent the email message content. Developers seeking examples on how to retrieve the email content will find solutions that assume the structure is predictable and will not change. Unfortunately, the underlying structure of an email changes when it contains replies or attachments.
In this article, we introduce our approach to identifying the structure of an email and extracting the content we are interested in. We feel our approach is worth sharing because it works independently of how the mail server represents the email. As you examine the code, you may find it simple and elegant.
Email Message Structure
We use PHP to connect to the mail server, retrieve the latest messages we need to process, and call PHP’s standard IMAP functions to get the structure of each message. This strategy works in principle, but we faced a major obstacle: the structure is inconsistent from one message to the next. Even after extended research on this matter, we made little progress. As a result, we built an in-house solution that recognizes the structure of an email message and processes it to collect the target elements.
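To make the inconsistency concrete, here is a minimal sketch that builds two objects shaped like the result of imap_fetchstructure and lists the section number each leaf part would have. The helpers make_part and list_sections are hypothetical names used only for this illustration, and only the fields needed here are filled in; real structures carry more properties, and nested message/rfc822 parts follow additional numbering rules not covered here.

```php
<?php
// Hypothetical helper: builds an object shaped like an imap_fetchstructure() result.
function make_part(int $type, string $subtype, int $encoding = 0, array $parts = []): stdClass
{
    $part = new stdClass();
    $part->type = $type;          // 0 = text, 1 = multipart, 5 = image
    $part->encoding = $encoding;  // 3 = base64, 4 = quoted-printable
    $part->ifsubtype = 1;
    $part->subtype = $subtype;
    if ($parts !== []) {
        $part->parts = $parts;
    }
    return $part;
}

// Hypothetical helper: lists each leaf part with its dotted section number.
function list_sections(stdClass $part, string $prefix = ''): array
{
    if (empty($part->parts)) {
        // A message without sub-parts is addressed as section "1".
        return [['section' => ($prefix === '' ? '1' : $prefix), 'subtype' => $part->subtype]];
    }
    $sections = [];
    foreach ($part->parts as $index => $child) {
        $number = ($prefix === '' ? '' : $prefix . '.') . ($index + 1);
        $sections = array_merge($sections, list_sections($child, $number));
    }
    return $sections;
}

// A plain multipart/alternative message: the HTML part is section "2".
$simple = make_part(1, 'ALTERNATIVE', 0, [
    make_part(0, 'PLAIN'),
    make_part(0, 'HTML', 4),
]);

// The same body plus an attached image: the HTML part moves to section "1.2".
$withAttachment = make_part(1, 'MIXED', 0, [
    make_part(1, 'ALTERNATIVE', 0, [
        make_part(0, 'PLAIN'),
        make_part(0, 'HTML', 4),
    ]),
    make_part(5, 'PNG', 3),
]);

foreach ([$simple, $withAttachment] as $message) {
    foreach (list_sections($message) as $leaf) {
        echo $leaf['section'], ' => ', $leaf['subtype'], PHP_EOL;
    }
    echo '---', PHP_EOL;
}
```

Code that hard-codes section "2" for the HTML body works on the first message and silently fetches the image on the second, which is the kind of failure described above.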
Two-Pronged Process
Our solution uses two functions that operate on the email message structure:
- extract_body_part_path – accepts the structure of the email message and returns a hash with information about the paths within the structure where data is stored.
- extract_body_part_path_exception – walks the email structure using the paths returned by extract_body_part_path and handles structures that do not fit the expected layout.
Process Structure Recursively
To extract the body part of an email from the available structure, we use a recursive approach. Locating the body is a common challenge when parsing email structure: the content is typically stored in the HTML part, but its exact location can vary depending on server settings and response formats.
To solve this, we invoke a recursive iteration that builds a hash structure. Each value of this hash is then processed further to locate the data stored in the “encoding” item of the structure.
After identifying the correct path and filtering for numerical values, we store the path elements in an array, aptly named $keys. The function returns a hash with two keys:
- Path – containing the values of $keys
- Encoding – the encoding type corresponding to the current email component
This method makes parsing email structure more reliable, even across inconsistent or dynamically nested formats.
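The recursion described above can be sketched roughly as follows. This is not the original extract_body_part_path; it is a hypothetical stand-in named find_html_part_paths, written on the assumption that the goal is to collect every text/html part together with its encoding. It keeps the $keys array from the description and returns entries with path and encoding keys.

```php
<?php
// Sketch only: a hypothetical stand-in for the article's extract_body_part_path.
// Walks an imap_fetchstructure()-style object and collects every text/html part.
function find_html_part_paths(stdClass $part, array $keys = []): array
{
    $found = [];

    // type 0 is "text"; the subtype distinguishes PLAIN from HTML.
    $isHtml = ($part->type ?? null) === 0
        && !empty($part->ifsubtype)
        && strtoupper($part->subtype) === 'HTML';

    if ($isHtml) {
        $found[] = [
            // A message without sub-parts is addressed as section "1".
            'path' => $keys === [] ? '1' : implode('.', $keys),
            'encoding' => $part->encoding ?? 0,
        ];
    }

    // Recurse into children, extending the numeric path (1-based) at each level.
    foreach ($part->parts ?? [] as $index => $child) {
        $childKeys = array_merge($keys, [$index + 1]);
        $found = array_merge($found, find_html_part_paths($child, $childKeys));
    }

    return $found;
}

// Example input shaped like an imap_fetchstructure() result (fields trimmed).
$message = (object) ['type' => 1, 'ifsubtype' => 1, 'subtype' => 'MIXED', 'parts' => [
    (object) ['type' => 1, 'ifsubtype' => 1, 'subtype' => 'ALTERNATIVE', 'parts' => [
        (object) ['type' => 0, 'ifsubtype' => 1, 'subtype' => 'PLAIN', 'encoding' => 0],
        (object) ['type' => 0, 'ifsubtype' => 1, 'subtype' => 'HTML', 'encoding' => 4],
    ]],
    (object) ['type' => 5, 'ifsubtype' => 1, 'subtype' => 'PNG', 'encoding' => 3],
]];

$paths = find_html_part_paths($message);
// $paths holds a single entry: path "1.2" with encoding 4 (quoted-printable).
```

Because the path is built from the position of each part during the walk, the function does not need to know in advance how deeply the HTML part is nested.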
Process Path Exception
The second part of the system is the extract_body_part_path_exception function, which goes through the initial email message structure. It uses the structure paths built by extract_body_part_path and deals with unfamiliar structures. Suffice it to say that it provides a complete list of the paths where the HTML content is stored.
Using IMAP Functions to Extract Email Content
The process of extracting the HTML type content follows these steps:
1. Fetch IMAP email message structure – use the standard imap_fetchstructure function.
2. Extract structure path – use extract_body_part_path on the structure to build information about locations of the HTML type content.
3. Build complete structure path – use extract_body_part_path_exception on the structure and the paths collected at step 2.
4. Extract HTML content – use imap_fetchbody to extract the specific path from the email message.
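The final step can be sketched as below, assuming the earlier steps have already produced a list of entries with path and encoding keys. The names decode_part_body and fetch_html_bodies are hypothetical; imap_fetchbody is the standard IMAP function, and the decoding uses PHP’s core base64_decode and quoted_printable_decode. Character set conversion is deliberately left out of this sketch.

```php
<?php
// Hypothetical helper: decodes a raw body part using the numeric encoding
// code found in the structure (3 = base64, 4 = quoted-printable).
function decode_part_body(string $raw, int $encoding): string
{
    switch ($encoding) {
        case 3:
            $decoded = base64_decode($raw);
            return $decoded === false ? $raw : $decoded;
        case 4:
            return quoted_printable_decode($raw);
        default:
            // 7bit, 8bit, binary, and unknown encodings are returned unchanged.
            return $raw;
    }
}

// Hypothetical helper: fetches and decodes each HTML part of one message.
// $imap is an open connection from imap_open(); $parts is a list of
// ['path' => '1.2', 'encoding' => 4] entries built from the structure.
function fetch_html_bodies($imap, int $messageNumber, array $parts): array
{
    $bodies = [];
    foreach ($parts as $part) {
        $raw = imap_fetchbody($imap, $messageNumber, $part['path']);
        if ($raw === false) {
            continue; // skip sections the server could not return
        }
        $bodies[] = decode_part_body($raw, $part['encoding']);
    }
    return $bodies;
}

// The decoder can be tried without a mail server:
echo decode_part_body(base64_encode('<p>Hello</p>'), 3), PHP_EOL;
echo decode_part_body('<p>caf=C3=A9</p>', 4), PHP_EOL;
```

Keeping the decoding separate from the fetching makes the part that depends on a live mailbox as small as possible, so the rest can be exercised with sample data.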
Conclusions
The structure representing an email retrieved from the mail server has several depth levels and changes based on the email message content and history. This is a problem for automatically processing its content, but we solved it by identifying all paths using a recursive approach.