Gmail Document Loader
Overview
The Gmail Document Loader extracts and processes email content from your Gmail account so you can bring email data into your AnswerAI workflows. It is suited to customer support analysis, email processing automation, and building knowledge bases from email conversations.
Key Benefits
- Access emails from specific Gmail labels/folders
- Process email threads with complete conversation history
- Extract email metadata (sender, subject, date, etc.)
- Handle both HTML and plain text email content
- Configurable message limits and filtering options
Prerequisites
Before using the Gmail Document Loader, ensure you have:
- Google OAuth Configured - Follow the Google OAuth Setup Guide
- Required Scopes - Your OAuth application must include:
  - https://www.googleapis.com/auth/gmail.readonly
  - https://www.googleapis.com/auth/gmail.modify (only if you need to mark emails as read)
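The scope requirement above can be expressed as a small helper. This is a minimal sketch: the function name `required_scopes` and its `mark_as_read` flag are hypothetical and are not part of AnswerAI; only the two scope URLs come from this guide.

```python
from typing import List

GMAIL_READONLY = "https://www.googleapis.com/auth/gmail.readonly"
GMAIL_MODIFY = "https://www.googleapis.com/auth/gmail.modify"


def required_scopes(mark_as_read: bool = False) -> List[str]:
    """Return the OAuth scopes to request for a given use (hypothetical helper)."""
    scopes = [GMAIL_READONLY]
    if mark_as_read:
        # gmail.modify is only needed when the workflow changes message state.
        scopes.append(GMAIL_MODIFY)
    return scopes
```

Requesting only `gmail.readonly` when you do not need to mark emails as read keeps the granted access as narrow as possible.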
How to Use
Step 1: Add Gmail Document Loader
- Locate the Node
  - Navigate to the Document Loaders section in the node library
  - Find and drag the "Gmail" node onto your canvas
- Connect Credential
  - In the node configuration, click "Connect Credential"
  - Select your existing Google OAuth credential or create a new one
  - If creating new, you'll be redirected to Google for authorization
Step 2: Select Gmail Labels
- Label Selection Interface
  - Once your credential is connected, the label picker will load
  - You'll see all available Gmail labels including:
    - System labels (Inbox, Sent, Drafts, etc.)
    - Custom labels you've created
    - Labels with visual indicators showing label type
- Refresh Labels
  - Use the "Refresh Labels" button to reload available labels
  - This is helpful if you've recently created new labels in Gmail
- Choose Labels
  - Select one or multiple labels to process
  - Each selected label will be processed separately
  - Popular choices include "Inbox", "Sent", or custom project labels
Step 3: Configure Processing Options
Max Messages
- Purpose: Limits the number of messages retrieved per label
- Default: 100 messages
- Recommendation: Start with a smaller number (10-50) for testing
Include Threads
- Purpose: Includes entire conversation threads for each email
- Default: false (disabled)
- When to use: Enable for customer support scenarios where context matters
- Note: This significantly increases the amount of data processed
Text Splitter (Optional)
- Purpose: Breaks large email content into smaller, manageable chunks
- When to use:
- Processing long email conversations
- Improving vector search performance
- Working with large email volumes
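To make the defaults above concrete, here is a minimal sketch of the Step 3 options as a Python dataclass. The class `GmailLoaderOptions` and its field names are hypothetical and do not reflect AnswerAI's internal types; the defaults (100 messages per label, threads disabled) are the ones documented above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GmailLoaderOptions:
    """Hypothetical container mirroring the options in Step 3."""
    labels: List[str]
    max_messages: int = 100         # documented default, applied per label
    include_threads: bool = False   # documented default
    use_text_splitter: bool = False

    def __post_init__(self) -> None:
        if not self.labels:
            raise ValueError("select at least one label")
        if self.max_messages < 1:
            raise ValueError("max_messages must be a positive integer")

    def worst_case_message_count(self) -> int:
        # Max Messages is a per-label limit, so the ceiling scales with labels.
        return self.max_messages * len(self.labels)
```

Because the limit applies per label, selecting two labels with a limit of 25 can retrieve up to 50 messages, which is worth keeping in mind when testing with small numbers.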
Step 4: Advanced Configuration
Additional Metadata
- Add custom metadata fields to your email documents
- Useful for categorization and filtering
- Example:
{
  "department": "support",
  "priority": "high",
  "processed_date": "2024-01-15"
}
Omit Metadata Keys
- Purpose: Exclude specific metadata fields from the final documents
- Options:
  - Comma-separated list: threadId,labelId
  - Use * to omit all metadata except Additional Metadata
- Available metadata fields:
  - source: Gmail URL reference
  - messageId: Unique Gmail message ID
  - threadId: Gmail thread ID
  - labelId: Label ID where message was found
  - labelName: Human-readable label name
  - subject: Email subject line
  - from: Sender information
  - to: Recipient information
  - date: Email timestamp
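How Omit Metadata Keys and Additional Metadata interact can be sketched as follows. This is an illustration, not the loader's actual code: the function `build_metadata` is hypothetical, and it assumes omission is applied first and Additional Metadata is merged afterwards, which is consistent with `*` omitting everything except Additional Metadata.

```python
from typing import Dict, Optional


def build_metadata(email_metadata: Dict[str, str],
                   omit_keys: str = "",
                   additional: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Apply Omit Metadata Keys, then merge Additional Metadata (assumed order)."""
    if omit_keys.strip() == "*":
        # "*" drops every built-in field.
        kept: Dict[str, str] = {}
    else:
        omitted = {key.strip() for key in omit_keys.split(",") if key.strip()}
        kept = {k: v for k, v in email_metadata.items() if k not in omitted}
    # Additional Metadata is merged last, so it survives "*".
    kept.update(additional or {})
    return kept
```

For example, `build_metadata(meta, "threadId,labelId")` keeps every field except those two, while `build_metadata(meta, "*", {"department": "support"})` returns only the custom field.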
Understanding Email Processing
Content Formatting
The Gmail loader formats email content in a readable structure:
Subject: [Email Subject]
From: [Sender Name] <sender@email.com>
To: [Recipient Name] <recipient@email.com>
Date: [Email Date]
[Email Body Content]
--- THREAD --- (if Include Threads is enabled)
From: [Previous Sender]
Date: [Previous Date]
[Previous Email Content]
Thread Processing
When "Include Threads" is enabled:
- The primary email content appears first
- Thread messages are appended with clear separators
- Each thread message includes sender and date information
- Chronological order is maintained
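The layout described above can be sketched in a few lines of Python. The `Email` class and `format_email` function are hypothetical; the exact whitespace, and whether the separator appears once or before every thread message, are assumptions rather than documented behavior.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Email:
    """Hypothetical, simplified view of one message."""
    sender: str
    date: str
    body: str
    subject: str = ""
    to: str = ""


def format_email(primary: Email, thread: Optional[List[Email]] = None) -> str:
    """Render the primary email first, then any thread messages."""
    lines = [
        f"Subject: {primary.subject}",
        f"From: {primary.sender}",
        f"To: {primary.to}",
        f"Date: {primary.date}",
        primary.body,
    ]
    # Thread messages follow the primary email, each behind a separator,
    # in the order given (the caller is expected to pass them chronologically).
    for message in thread or []:
        lines += ["--- THREAD ---", f"From: {message.sender}",
                  f"Date: {message.date}", message.body]
    return "\n".join(lines)
```

With "Include Threads" disabled, this corresponds to calling `format_email(primary)` with no thread, so no separator appears in the output.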
Use Cases
Customer Support Analysis
Configuration:
Labels: ['Support', 'Customer Service']
Include Threads: true
Max Messages: 200
Text Splitter: Enabled (for long conversations)
Purpose: Analyze support conversations for common issues and response patterns
Knowledge Base Creation
Configuration:
Labels: ['Important', 'Project Communications']
Include Threads: false
Max Messages: 500
Additional Metadata: { 'category': 'knowledge_base' }
Purpose: Extract important information for documentation
Email Analytics
Configuration:
Labels: ['Inbox']
Include Threads: false
Max Messages: 1000
Omit Metadata: '*'
Purpose: Focus on email content analysis without metadata noise
Tips and Best Practices
Performance Optimization
- Start Small
  - Begin with 10-50 messages for testing
  - Gradually increase based on processing time and needs
- Use Specific Labels
  - Avoid processing entire "Inbox" unless necessary
  - Create specific labels for the content you need
- Text Splitter Usage
  - Enable for email threads longer than 2000 characters
  - Use recursive character splitter for best results
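To show what recursive character splitting does to a long thread, here is a simplified sketch. It is not the Text Splitter node's implementation: it has no chunk overlap, the separator list is an assumption, and the default `chunk_size` of 2000 is simply borrowed from the threshold mentioned above.

```python
from typing import List, Sequence

SEPARATORS = ("\n\n", "\n", " ")  # coarse to fine; an assumed ordering


def split_recursive(text: str, chunk_size: int = 2000,
                    separators: Sequence[str] = SEPARATORS) -> List[str]:
    """Split text into chunks of at most chunk_size characters."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separator left: fall back to hard cuts.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry it with a finer separator.
            chunks.extend(split_recursive(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks
```

The splitter prefers paragraph breaks, then line breaks, then spaces, so chunks tend to end at natural boundaries in the email text instead of mid-sentence.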
Data Management
- Metadata Strategy
  - Keep relevant metadata for search and filtering
  - Use "Omit Metadata Keys" to reduce noise
  - Add custom metadata for better organization
- Regular Updates
  - Re-run the loader periodically for new emails
  - Consider date-based filtering in future versions
Security Considerations
- Credential Management
  - Use separate credentials for different email accounts
  - Regularly review and rotate OAuth tokens
  - Monitor access logs in Google Console
- Data Privacy
  - Be mindful of sensitive information in emails
  - Consider email content filtering before processing
  - Comply with data protection regulations
Troubleshooting
Common Issues
- "Google access token not found"
  - Solution: Reconnect your Google OAuth credential
  - Check: Ensure the credential is properly selected in the node
- "Labels not loading"
  - Solution: Click "Refresh Labels" button
  - Check: Verify your OAuth credential has the correct scopes
  - Try: Re-authorize the credential if refresh fails
- "No emails found"
  - Check: Selected labels contain messages
  - Verify: Your Gmail account has access to the selected labels
  - Consider: Increasing the max messages limit
- "Rate limit exceeded"
  - Solution: Reduce the number of messages being processed
  - Wait: Gmail API has rate limits; try again later
  - Optimize: Use more specific labels to reduce API calls
- "Token expired" errors
  - Solution: AnswerAI automatically refreshes tokens
  - Manual fix: Re-authorize the credential if automatic refresh fails
  - Check: Ensure your OAuth application is still active
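For the "Rate limit exceeded" case, the advice to wait and try again is commonly implemented as exponential backoff. The sketch below is only relevant if you script calls yourself; `RateLimitError` and `with_backoff` are hypothetical names, and nothing here describes how AnswerAI retries internally.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical stand-in for whatever error signals a rate limit."""


def with_backoff(call: Callable[[], T], max_attempts: int = 5,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry call() on RateLimitError, doubling the wait each time."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            # Wait base_delay * 1, 2, 4, ... plus random jitter.
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the retry logic can be exercised in tests without actually waiting.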
Performance Issues
- Slow Processing
  - Reduce max messages limit
  - Disable thread processing if not needed
  - Use text splitter for very long emails
- Memory Issues
  - Process emails in smaller batches
  - Increase text splitter chunk size
  - Monitor system resources during processing
Integration Examples
Basic Email Ingestion
- Gmail Document Loader → Vector Store
- Use for: Simple email search and retrieval
Email Analysis Pipeline
- Gmail Document Loader → Text Splitter → Vector Store → Retrieval QA
- Use for: Answering questions about email content
Support Ticket Analysis
- Gmail Document Loader (Support label, threads enabled) → Chat Model
- Use for: Analyzing support patterns and generating insights
API Reference
The Gmail integration uses the Gmail API v1 with the following endpoints:
- users.labels.list - Retrieve available labels
- users.messages.list - Get message lists per label
- users.messages.get - Retrieve individual message content
- users.threads.get - Get conversation threads (when enabled)
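As an illustration of how `users.messages.list` is typically paged, the sketch below collects message IDs for one label. It assumes `service` is a Gmail API v1 client from the google-api-python-client library (for example, built with `build("gmail", "v1", credentials=...)`); the function `list_message_ids` is hypothetical and is not AnswerAI's implementation.

```python
from typing import Any, List


def list_message_ids(service: Any, label_id: str,
                     max_messages: int = 100) -> List[str]:
    """Collect up to max_messages message IDs for one label."""
    ids: List[str] = []
    page_token = None
    while len(ids) < max_messages:
        response = service.users().messages().list(
            userId="me",                  # "me" means the authorized account
            labelIds=[label_id],
            maxResults=min(max_messages - len(ids), 500),
            pageToken=page_token,
        ).execute()
        ids.extend(message["id"] for message in response.get("messages", []))
        page_token = response.get("nextPageToken")
        if not page_token:
            break  # no further pages for this label
    return ids[:max_messages]
```

Each returned ID can then be passed to `users.messages.get` for the message content, or the message's thread ID to `users.threads.get` when thread processing is enabled.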
Next Steps
After setting up Gmail document loading:
- Configure Vector Storage - Store processed emails for search
- Set up Retrieval - Enable question-answering over email content
- Add Analysis Tools - Use chat models to analyze email patterns
- Automate Processing - Schedule regular email ingestion