From Direct API Calls to Queue-Based HubSpot Data Pipelines

Written by ebenezer melkamu | Jun 25, 2026 12:06:56 PM


HubSpot integrations often begin with a straightforward synchronization model: an application receives a request, calls the HubSpot APIs, stores the returned CRM data, and responds when the process is complete.

This architecture is simple, understandable, and often completely sufficient for an MVP or a small HubSpot portal.

The challenge appears when the integration begins processing larger portals, more CRM objects, scheduled synchronization jobs, and multiple requests at the same time. At that point, synchronization is no longer just an API request. It becomes a long-running data operation that requires its own execution model.

This article explores how a direct HubSpot synchronization process can evolve into a queue-based architecture built around background workers, trackable jobs, retries, and controlled API consumption.

The Starting Point: Direct Synchronization

In a direct synchronization architecture, the application performs the complete operation inside the original request.

A user, scheduled job, or internal service triggers the synchronization. The backend then:

Validates the request.
Calls the required HubSpot APIs.
Retrieves CRM records.
Transforms and stores the data.
Returns a response when the synchronization finishes.

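The flow can be represented as:

Client, scheduler, or internal service
↓
API validates the request
↓
HubSpot API calls
↓
Transform and store CRM records
↓
Response returned after the synchronization finishes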
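Here is a minimal Python sketch of the direct model. It uses hypothetical `fetch_all_contacts` and `store` callables in place of real HubSpot and database clients. It shows that the response exists only after every record has been fetched, transformed, and stored.

```python
from typing import Callable, Iterable


def handle_sync_request(
    fetch_all_contacts: Callable[[], Iterable[dict]],
    store: Callable[[dict], None],
) -> dict:
    """Direct model: the caller waits until every record is fetched and stored."""
    processed = 0
    for raw in fetch_all_contacts():  # may hide many paginated HubSpot API calls
        record = {  # transformation step
            "hubspot_id": raw["id"],
            "email": raw.get("properties", {}).get("email"),
        }
        store(record)  # database write
        processed += 1
    # Only now can the HTTP response be returned to the caller.
    return {"status": "completed", "records_processed": processed}


# Usage with in-memory stand-ins for HubSpot and the database:
db: list[dict] = []
contacts = [{"id": "1", "properties": {"email": "a@example.com"}}]
print(handle_sync_request(lambda: contacts, db.append))
# {'status': 'completed', 'records_processed': 1}
```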

This model has several advantages during the early stages of a project.

It requires minimal infrastructure, keeps the execution flow in one service, and is relatively easy to debug. For a small portal containing a limited number of contacts, companies, or deals, the synchronization may complete quickly enough that additional architecture would provide little value.

The direct model becomes problematic when synchronization time and system usage begin to increase.

Where Direct Synchronization Starts to Break

A HubSpot portal can contain large volumes of contacts, companies, deals, tickets, engagements, and custom objects. Retrieving this data often requires pagination, transformation, association handling, and multiple API requests.

As the workload grows, the original request remains open for longer periods of time. This creates several operational risks.

Long-running requests

A full synchronization may take several minutes. Keeping an HTTP request open for the entire operation increases the possibility of timeouts and leaves the user waiting without meaningful visibility into progress.

HubSpot API limits

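HubSpot enforces API rate limits, such as a cap on requests within a short rolling window and, depending on the app type and subscription, daily request limits. If multiple synchronizations run simultaneously, the integration may receive HTTP 429 responses, experience delayed requests, or see operations fail.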

The system therefore needs a way to control how quickly work is processed instead of allowing every incoming request to execute immediately.

Weak recovery from partial failures

A synchronization may successfully process contacts and companies before failing while retrieving deals.

In a simple direct implementation, the entire process may need to be restarted. This repeats work that has already completed and makes it difficult to recover only the failed portion.

Competition for application resources

When the same backend service handles user-facing requests and long-running synchronization jobs, heavy data processing can reduce the responsiveness of the application.

The API is being asked to perform two different responsibilities:

Respond quickly to users.
Execute resource-intensive background workloads.

These responsibilities have different performance and reliability requirements.

Concurrent and duplicate work

A user action, scheduled job, and webhook event could trigger synchronization at approximately the same time.

Without coordination, several processes may retrieve the same data, consume unnecessary API capacity, and attempt to update the same records concurrently.
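One way to add that coordination is to allow only one active run per portal. Any later trigger receives the existing run identifier instead of starting a new run. The sketch below keeps this state in memory, so it only works within a single process. `RunRegistry` and `portal_id` are hypothetical names. Across several API or worker processes, the same rule would need shared storage, for example a uniqueness constraint on active runs in the database.

```python
import threading
import uuid


class RunRegistry:
    """Allows at most one active (pending or processing) run per portal."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active: dict[str, str] = {}  # portal_id -> run_id

    def try_start(self, portal_id: str) -> tuple[str, bool]:
        """Return (run_id, created). Reuses the active run if one exists."""
        with self._lock:
            existing = self._active.get(portal_id)
            if existing is not None:
                return existing, False
            run_id = f"run_{uuid.uuid4().hex[:8]}"
            self._active[portal_id] = run_id
            return run_id, True

    def finish(self, portal_id: str) -> None:
        """Release the portal once its run completes or fails."""
        with self._lock:
            self._active.pop(portal_id, None)


registry = RunRegistry()
first, created_first = registry.try_start("portal-42")    # user action
second, created_second = registry.try_start("portal-42")  # webhook at the same time
print(created_first, created_second, first == second)     # True False True
```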

Limited operational visibility

A direct request provides little structure for answering questions such as:

Which synchronization jobs are currently running?
Which jobs failed?
How long did each synchronization take?
Which CRM object caused the failure?
How many records were processed?
Can the failed operation be retried safely?

These are not only debugging questions. They become important product and support requirements as the integration matures.

Treating Synchronization as a Job

The architectural shift begins by separating the request to synchronize from the work required to perform the synchronization.

Instead of fetching HubSpot data immediately, the API creates a synchronization job and places it into a queue. A background worker later retrieves and processes that job.

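The request flow becomes:

Client, scheduler, or internal service
↓
API creates a synchronization run and enqueues a job
↓
Queue
↓
Background worker
↓
HubSpot APIs
↓
Database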

The API can now respond immediately with a run identifier:

{
  "run_id": "run_123",
  "status": "pending"
}

The client does not need to keep the original request open. It can use the run identifier to retrieve the job status separately.

For example:

GET /sync-runs/run_123

{
  "run_id": "run_123",
  "status": "processing",
  "current_object": "companies",
  "records_processed": 12500
}

Synchronization has now become a trackable system operation rather than an invisible side effect of an API request.
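The submit-and-poll contract can be sketched in a few lines of Python. A dict stands in for a sync-runs table and `queue.Queue` stands in for a real message broker. `submit_sync` and `get_sync_run` are hypothetical handler names, and the HTTP framework is omitted.

```python
import queue
import uuid

job_queue: "queue.Queue[dict]" = queue.Queue()
runs: dict[str, dict] = {}  # stand-in for a sync_runs table


def submit_sync(objects: list[str]) -> dict:
    """POST /sync-runs: record the run, enqueue the job, return immediately."""
    run_id = f"run_{uuid.uuid4().hex[:8]}"
    runs[run_id] = {"run_id": run_id, "status": "pending", "records_processed": 0}
    job_queue.put({"run_id": run_id, "job_type": "hubspot_crm_sync", "objects": objects})
    return {"run_id": run_id, "status": "pending"}


def get_sync_run(run_id: str) -> dict:
    """GET /sync-runs/<run_id>: read whatever state the worker has written."""
    return runs[run_id]


response = submit_sync(["contacts", "companies", "deals"])
print(response["status"], job_queue.qsize())  # pending 1
```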

The Core Components of the Architecture

A queue-based HubSpot synchronization system can be divided into four main responsibilities.

The API service accepts and validates work

The API service receives synchronization requests from users, scheduled jobs, or other systems.

A request might specify which HubSpot objects should be synchronized:

{
  "objects": ["contacts", "companies", "deals"]
}

The API validates the request, creates a synchronization run, submits a job to the queue, and returns immediately.

It should not perform the full HubSpot data retrieval inside the request.

This keeps the API responsive and makes it possible to scale the request layer independently from the processing layer.
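Here is a sketch of that validation step. It assumes the integration supports the four object types handled by the synchronization services described in this article. `SUPPORTED_OBJECTS`, `ValidationError`, and `validate_sync_request` are illustrative names.

```python
SUPPORTED_OBJECTS = {"contacts", "companies", "deals", "tickets"}


class ValidationError(ValueError):
    """Raised for requests that should be rejected with a 4xx response."""


def validate_sync_request(body: dict) -> list[str]:
    """Return the de-duplicated object list, or raise before any job is queued."""
    objects = body.get("objects")
    if not isinstance(objects, list) or not objects:
        raise ValidationError("'objects' must be a non-empty list")
    if not all(isinstance(o, str) for o in objects):
        raise ValidationError("'objects' must contain strings")
    unknown = sorted(set(objects) - SUPPORTED_OBJECTS)
    if unknown:
        raise ValidationError(f"unsupported objects: {unknown}")
    return list(dict.fromkeys(objects))  # drop duplicates, keep requested order


print(validate_sync_request({"objects": ["contacts", "deals", "contacts"]}))
# ['contacts', 'deals']
```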

The queue controls when work is processed

The queue acts as a buffer between incoming synchronization requests and available workers.

A queued job might contain:

{
  "run_id": "run_123",
  "job_type": "hubspot_crm_sync",
  "status": "pending",
  "objects": ["contacts", "companies", "deals"]
}

Instead of allowing every request to begin at once, jobs wait until processing capacity is available.

This creates a natural place to control concurrency, protect HubSpot API limits, and prevent the backend from being overwhelmed by sudden increases in demand.
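Concurrency control at the queue can be as simple as a fixed number of workers draining it. In this Python sketch, `MAX_CONCURRENT_JOBS` is an assumed value. The threads exit once the queue is empty so that the example terminates. A long-running worker would block on `get()` instead.

```python
import queue
import threading
import time
from typing import Callable

MAX_CONCURRENT_JOBS = 2  # assumed limit; tune to HubSpot and database capacity


def run_worker_pool(jobs: "queue.Queue[dict]", handle: Callable[[dict], None],
                    worker_count: int = MAX_CONCURRENT_JOBS) -> None:
    """Drain the queue with a fixed number of workers; other jobs wait their turn."""

    def worker() -> None:
        while True:
            try:
                job = jobs.get_nowait()
            except queue.Empty:
                return  # queue drained
            try:
                handle(job)
            finally:
                jobs.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# Usage: five queued jobs, but never more than MAX_CONCURRENT_JOBS running at once.
processed: list[str] = []
active = peak = 0
counter_lock = threading.Lock()


def handle(job: dict) -> None:
    global active, peak
    with counter_lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.01)  # simulated synchronization work
    with counter_lock:
        active -= 1
        processed.append(job["run_id"])


jobs: "queue.Queue[dict]" = queue.Queue()
for i in range(5):
    jobs.put({"run_id": f"run_{i}"})
run_worker_pool(jobs, handle)
```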

Workers execute synchronization jobs

Workers continuously look for pending jobs.

When a worker receives a job, it:

Marks the run as processing.
Executes the required synchronization services.
Records progress and processing metrics.
Marks the run as completed or failed.
Stores error information when execution fails.
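The steps above map onto a small job handler. This sketch assumes that each synchronization service is a callable returning the number of records it processed. It uses a plain dict as the run record. A real worker would persist each update so that the status endpoint can read it.

```python
from datetime import datetime, timezone
from typing import Callable


def process_job(job: dict, runs: dict, services: dict[str, Callable[[], int]],
                worker_id: str) -> None:
    """Run one job, recording state transitions and progress on the run record."""
    run = runs[job["run_id"]]
    run.update(status="processing", worker_id=worker_id,
               started_at=datetime.now(timezone.utc))
    try:
        for obj in job["objects"]:
            run["current_object"] = obj
            run["records_processed"] += services[obj]()
        run["status"] = "completed"
    except Exception as exc:  # keep the error on the run instead of losing it
        run.update(status="failed", error_message=str(exc))
    finally:
        run["completed_at"] = datetime.now(timezone.utc)


runs = {"run_123": {"run_id": "run_123", "status": "pending", "records_processed": 0}}


def failing_deals() -> int:
    raise RuntimeError("HubSpot returned 502 for deals")


process_job(
    {"run_id": "run_123", "objects": ["contacts", "companies", "deals"]},
    runs,
    {"contacts": lambda: 100, "companies": lambda: 40, "deals": failing_deals},
    worker_id="worker-1",
)
print(runs["run_123"]["status"], runs["run_123"]["records_processed"])  # failed 140
```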

Workers operate independently from the API service. The original request can finish while the synchronization continues safely in the background.

Additional workers can also be introduced as synchronization volume grows, provided concurrency remains within the limits imposed by HubSpot and the destination system.

Synchronization services contain the CRM logic

The worker should coordinate execution, but it should not contain every detail of how individual HubSpot objects are retrieved and stored.

That logic can be separated into services such as:

sync_contacts() 
sync_companies() 
sync_deals() 
sync_tickets()

A contact synchronization service may internally perform:

sync_contacts()
↓
fetch_contact_pages_from_hubspot()
↓
transform_contact_records()
↓
upsert_contacts()
↓
save_sync_progress()
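Here is the same pipeline as a Python sketch. It follows the cursor-based pagination of HubSpot's CRM v3 list endpoints (`results` plus `paging.next.after`). It also saves the cursor after every page, so a retried job can resume from the last completed page instead of restarting the whole object. `fetch_page`, `upsert_contacts`, and the `progress` dict are hypothetical stand-ins.

```python
from typing import Callable, Optional


def sync_contacts(
    fetch_page: Callable[[Optional[str]], dict],
    upsert_contacts: Callable[[list[dict]], None],
    progress: dict,
) -> int:
    """fetch -> transform -> upsert -> save progress, one page at a time."""
    after: Optional[str] = progress.get("contacts_after")  # resume point, if any
    count = 0
    while True:
        page = fetch_page(after)  # fetch_contact_pages_from_hubspot()
        records = [  # transform_contact_records()
            {"hubspot_id": r["id"], "email": r.get("properties", {}).get("email")}
            for r in page.get("results", [])
        ]
        upsert_contacts(records)  # upsert_contacts()
        count += len(records)
        after = page.get("paging", {}).get("next", {}).get("after")
        progress["contacts_after"] = after  # save_sync_progress()
        if after is None:  # last page reached
            return count


# Usage: the second page times out once, and the retry resumes from the saved cursor.
pages = {
    None: {"results": [{"id": "1"}], "paging": {"next": {"after": "1"}}},
    "1": {"results": [{"id": "2"}]},
}
calls: list[Optional[str]] = []


def flaky_fetch(after: Optional[str]) -> dict:
    calls.append(after)
    if after == "1" and calls.count("1") == 1:
        raise TimeoutError("simulated timeout on page 2")
    return pages[after]


stored: dict[str, dict] = {}
progress: dict = {}
try:
    sync_contacts(flaky_fetch, lambda rs: stored.update({r["hubspot_id"]: r for r in rs}), progress)
except TimeoutError:
    print(progress)  # {'contacts_after': '1'}
print(sync_contacts(flaky_fetch, lambda rs: stored.update({r["hubspot_id"]: r for r in rs}), progress))  # 1
```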

This separation keeps the worker generic and makes the system easier to extend when another HubSpot object is introduced.

It also makes each synchronization service easier to test independently.
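One way to keep the worker generic is to use a registry that maps HubSpot object names to service functions. This sketch uses placeholder services. `register`, `SYNC_SERVICES`, and `run_objects` are illustrative names and are not part of any library.

```python
from typing import Callable

SyncService = Callable[[], int]  # a service returns the number of records it synced
SYNC_SERVICES: dict[str, SyncService] = {}


def register(object_type: str) -> Callable[[SyncService], SyncService]:
    """Decorator that adds a service to the registry under its HubSpot object name."""
    def decorator(fn: SyncService) -> SyncService:
        SYNC_SERVICES[object_type] = fn
        return fn
    return decorator


@register("contacts")
def sync_contacts() -> int:
    return 0  # placeholder for fetch, transform, upsert, save progress


@register("tickets")
def sync_tickets() -> int:
    return 0  # placeholder


def run_objects(objects: list[str]) -> dict[str, int]:
    """Generic worker step: look up and call each service, with no object-specific code."""
    return {obj: SYNC_SERVICES[obj]() for obj in objects}
```

With this arrangement, supporting another object means registering one more function. Each service can also be called directly in a unit test without starting a worker.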

Building Reliability Into the Processing Model

Moving work into a queue creates the foundation for reliability, but the queue alone does not solve every failure scenario.

The system also needs clear processing rules.

Track explicit job states

Each synchronization run should have an explicit status, such as:

pending
processing 
completed 
failed 
retrying 
partially_completed

The run can also store:

Start and completion timestamps
Objects requested
Current object being processed
Number of records processed
Retry count
Error message
Error category
Worker identifier

This information improves debugging and provides the foundation for an operational dashboard.
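The statuses and fields above could be modeled as shown here. The transition table is an assumption about which moves are legal. For example, it treats `completed` and `failed` as terminal, so a manual retry would create a new run. It should be adjusted to the product's actual rules.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class RunStatus(str, Enum):
    PENDING = "pending"
    PROCESSING = "processing"
    RETRYING = "retrying"
    COMPLETED = "completed"
    PARTIALLY_COMPLETED = "partially_completed"
    FAILED = "failed"


# Assumed transition rules; terminal states have no outgoing transitions.
ALLOWED = {
    RunStatus.PENDING: {RunStatus.PROCESSING},
    RunStatus.PROCESSING: {RunStatus.COMPLETED, RunStatus.PARTIALLY_COMPLETED,
                           RunStatus.FAILED, RunStatus.RETRYING},
    RunStatus.RETRYING: {RunStatus.PROCESSING, RunStatus.FAILED},
    RunStatus.COMPLETED: set(),
    RunStatus.PARTIALLY_COMPLETED: set(),
    RunStatus.FAILED: set(),
}


@dataclass
class SyncRun:
    run_id: str
    objects: list[str]
    status: RunStatus = RunStatus.PENDING
    started_at: Optional[datetime] = None
    completed_at: Optional[datetime] = None
    current_object: Optional[str] = None
    records_processed: int = 0
    retry_count: int = 0
    error_message: Optional[str] = None
    error_category: Optional[str] = None
    worker_id: Optional[str] = None

    def transition(self, new_status: RunStatus) -> None:
        """Reject moves the table does not allow, e.g. completed -> processing."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(f"illegal transition {self.status.value} -> {new_status.value}")
        self.status = new_status
```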

Design retries carefully

Temporary network failures, rate-limit responses, and service interruptions should not always cause a permanent job failure.

The worker can retry recoverable failures after a fixed delay or with exponential backoff.

However, retries must be limited. A permanently invalid request should not remain in an endless retry loop.

The system should distinguish between:

Temporary failures that may succeed later
Validation failures that require correction
Authentication failures that require intervention
Data-specific failures that should be isolated and recorded
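Here is a sketch of how those categories might drive retry decisions. The HTTP status mapping, `MAX_ATTEMPTS`, and the backoff parameters are assumptions. The delay uses exponential backoff with full jitter, so many jobs failing at once do not all retry at the same moment. Data-specific failures are usually detected per record rather than from a status code, so the mapping below never returns that category.

```python
import random
from enum import Enum
from typing import Optional


class ErrorCategory(Enum):
    TEMPORARY = "temporary"            # timeouts, 429, 5xx: may succeed later
    VALIDATION = "validation"          # bad request: needs correction
    AUTHENTICATION = "authentication"  # expired or revoked token: needs intervention
    DATA = "data"                      # one bad record: isolate and record it


def classify_http_status(status: int) -> ErrorCategory:
    """Assumed mapping from HTTP status codes to the categories above."""
    if status == 429 or status >= 500:
        return ErrorCategory.TEMPORARY
    if status in (401, 403):
        return ErrorCategory.AUTHENTICATION
    return ErrorCategory.VALIDATION


MAX_ATTEMPTS = 5


def next_retry_delay(category: ErrorCategory, attempt: int,
                     base: float = 2.0, cap: float = 300.0) -> Optional[float]:
    """Seconds to wait before the next attempt, or None if the job should stop retrying."""
    if category is not ErrorCategory.TEMPORARY or attempt >= MAX_ATTEMPTS:
        return None  # mark failed (or dead-letter) instead of looping forever
    # Full jitter: a random delay between 0 and base * 2**attempt, capped.
    return random.uniform(0, min(cap, base * 2 ** attempt))
```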

Make database operations idempotent

A job may be executed more than once because of a retry, worker restart, or message redelivery.

Database writes should therefore be designed so that repeating an operation does not create duplicate records.

For CRM synchronization, this often means using the HubSpot object ID as a stable external identifier and applying an upsert operation:

Insert the record when it does not exist. 
Update the record when it already exists.

Idempotency is one of the most important requirements for a reliable background-processing system.
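This self-contained illustration uses SQLite's `INSERT ... ON CONFLICT ... DO UPDATE` upsert, which is available since SQLite 3.24. PostgreSQL supports the same clause. The table layout is hypothetical. What matters is the primary key on `hubspot_id`, which makes a repeated batch update rows instead of inserting duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE contacts (
           hubspot_id TEXT PRIMARY KEY,  -- stable external identifier
           email      TEXT,
           updated_at TEXT
       )"""
)

UPSERT_SQL = """
INSERT INTO contacts (hubspot_id, email, updated_at)
VALUES (:hubspot_id, :email, :updated_at)
ON CONFLICT(hubspot_id) DO UPDATE SET
    email = excluded.email,
    updated_at = excluded.updated_at
"""


def upsert_contacts(records: list[dict]) -> None:
    """Safe to repeat: a redelivered batch updates rows instead of duplicating them."""
    with conn:  # one transaction per batch
        conn.executemany(UPSERT_SQL, records)


batch = [{"hubspot_id": "101", "email": "a@example.com", "updated_at": "2026-06-01T00:00:00Z"}]
upsert_contacts(batch)
upsert_contacts(batch)  # simulated retry of the same job
print(conn.execute("SELECT COUNT(*) FROM contacts").fetchone()[0])  # 1
```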

Respect API consumption centrally

Workers should share a coordinated approach to HubSpot API usage.

Possible controls include:

Limiting the number of concurrent workers
Applying request delays
Respecting rate-limit response headers
Retrying throttled requests after the appropriate delay
Separating high-priority and low-priority jobs

Without centralized control, horizontal scaling can accidentally increase API pressure instead of improving performance.
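A shared token bucket is one way to centralize request pacing across worker threads. This sketch covers a single process only. `capacity` and `refill_per_sec` are assumed values that should sit below the app's HubSpot limits. Workers in separate processes or machines would need a shared store instead. The hypothetical `pause` method lets any worker that receives a 429 response slow every worker down together.

```python
import threading
import time


class SharedRateLimiter:
    """Token bucket shared by every worker thread in one process."""

    def __init__(self, capacity: int, refill_per_sec: float) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.updated = time.monotonic()
        self.lock = threading.Lock()

    def acquire(self) -> None:
        """Block until one HubSpot request may be sent."""
        while True:
            with self.lock:
                now = time.monotonic()
                self.tokens = min(self.capacity,
                                  self.tokens + (now - self.updated) * self.refill_per_sec)
                self.updated = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                wait = (1 - self.tokens) / self.refill_per_sec
            time.sleep(wait)  # sleep outside the lock so other threads can check too

    def pause(self, seconds: float) -> None:
        """After a 429, push the bucket below zero so all workers back off together."""
        with self.lock:
            self.tokens = min(self.tokens, -seconds * self.refill_per_sec)
            self.updated = time.monotonic()


# Usage: a burst of 5 requests goes out at once; the next 5 are paced at 50 per second.
limiter = SharedRateLimiter(capacity=5, refill_per_sec=50.0)
start = time.monotonic()
for _ in range(10):
    limiter.acquire()  # call before every HubSpot request
elapsed = time.monotonic() - start  # roughly 0.1 s
```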

What the Queue-Based Model Improves

The main benefit is not simply that synchronization happens in the background. The larger improvement is that the operation becomes controlled, observable, and recoverable.

Faster user-facing responses

The API returns a run identifier immediately instead of requiring the user to wait for the full synchronization.

Better failure recovery

Failed jobs can be retried without asking the user to restart the entire process manually.

Controlled HubSpot API usage

The queue and worker pool provide a place to manage concurrency and request rates.

Independent scalability

The API and worker services can scale according to different workloads.

Improved visibility

Every synchronization run can expose progress, duration, processed record counts, and failure information.

Clearer system responsibilities

The API accepts work, the queue stores work, workers execute work, and synchronization services contain the HubSpot-specific business logic.

This separation makes the system easier to understand, test, and maintain.

The Trade-offs

A queue-based architecture is not automatically the correct solution for every HubSpot integration.

It introduces additional infrastructure and operational responsibility:

Queue provisioning
Worker deployment
Job-state persistence
Retry rules
Idempotency handling
Monitoring and alerting
Cleanup of old job records
Dead-letter handling for repeatedly failing jobs

For a small portal with a short synchronization time, this complexity may not be justified.

A direct synchronization model remains reasonable when:

Data volume is small
Synchronization completes quickly
Requests are infrequent
Occasional manual retries are acceptable
The integration is still validating its core product assumptions

A queue-based model becomes more valuable when:

Synchronizations are long-running
Several users or systems can trigger jobs
Reliability is a product requirement
Failed operations must be recoverable
API limits need active management
Operational visibility is required
The system needs to scale beyond a single backend process

Architecture should evolve in response to real constraints, not only because a more advanced pattern exists.

Final Thoughts

A direct HubSpot synchronization process is often the correct place to begin. It allows a team to validate the integration quickly without introducing infrastructure that may not yet be necessary.

As portals become larger and synchronization becomes more frequent, the limitations of request-bound processing become more visible.

Introducing a queue changes synchronization from a long-running API request into a durable, trackable background job. Workers can process jobs at a controlled rate, retries can recover temporary failures, and each run can expose meaningful operational information.

The most important architectural change is not the queue itself. It is the separation of responsibilities:

The API accepts work. 
The queue stores work. 
The worker coordinates work. 
Synchronization services execute HubSpot-specific logic.
The database records both CRM data and operational state.

That separation creates a stronger foundation for reliable HubSpot integrations, analytics pipelines, and larger-scale CRM data platforms.
