60x Faster: How I Optimized FIFO SQS Throughput in Go
From 2 minutes to 2 seconds for 924 messages — two targeted changes I made to our messaging layer, no infrastructure swap required.
I was working on a campaign processing pipeline that was sending 924 messages to an AWS SQS FIFO queue. It took 2 minutes. That's ~7.7 messages/second — painful for a latency-sensitive flow where delays directly affect user experience.
After investigating the bottleneck, I shipped two targeted changes. The result: 2 seconds. Same queue, same infrastructure, 60x throughput.
What I Found
Profiling the pipeline, I traced the slowness to our messaging library: it was calling the SQS API once per message. For FIFO queues without content-based deduplication enabled, each call also required a MessageDeduplicationId — and the library was handling that inconsistently, which forced it down a slower, more conservative path.
The fix came down to two things:
- Explicit, deterministic deduplication IDs
- Batch sends instead of one call per message
Optimization 1: Deterministic Deduplication IDs
For FIFO queues, if you control the MessageDeduplicationId yourself, you get idempotency guarantees and can turn off content-based deduplication on the queue side (which has latency costs). I implemented the ID generation as a SHA-256 hash of the message content:
package sqs
import (
"crypto/sha256"
"encoding/hex"
"encoding/json"
)
// GenerateDeduplicationID returns a stable, content-derived dedup ID.
// json.Marshal output is deterministic for structs with exported fields.
func GenerateDeduplicationID(msg any) (string, error) {
b, err := json.Marshal(msg)
if err != nil {
return "", err
}
h := sha256.Sum256(b)
return hex.EncodeToString(h[:]), nil
}

A few things worth noting here:
- json.Marshal on a struct with exported fields is deterministic: fields are encoded in declaration order, so the same value always produces the same bytes. Maps are also safe, because encoding/json sorts map keys before writing them. The real caveat is that renaming or reordering struct fields changes the output, and therefore the ID. A quick determinism check is sketched below.
- The hash changes if any field value changes, which is exactly the behavior you want: same content → same ID → deduplicated by SQS.
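A quick way to sanity-check the determinism claim (a standalone snippet with a hypothetical message type, standard library only):

package main

import (
	"encoding/json"
	"fmt"
)

type order struct {
	ID     string
	Amount int
}

func main() {
	// Struct fields marshal in declaration order: identical values,
	// identical bytes.
	b1, _ := json.Marshal(order{ID: "a1", Amount: 5})
	b2, _ := json.Marshal(order{ID: "a1", Amount: 5})
	fmt.Println(string(b1) == string(b2)) // true

	// encoding/json sorts map keys, so maps marshal deterministically too.
	m1, _ := json.Marshal(map[string]any{"b": 2, "a": 1})
	m2, _ := json.Marshal(map[string]any{"a": 1, "b": 2})
	fmt.Println(string(m1) == string(m2)) // true: both are {"a":1,"b":2}
}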
Optimization 2: Batch Sending
SQS allows up to 10 messages per SendMessageBatch call. I grouped messages into batches of 10 before sending, which reduced 924 HTTP round-trips down to 93 — a 90% drop in API calls.
package sqs
import (
"context"
"encoding/json"
"fmt"
"strconv"
"github.com/aws/aws-sdk-go-v2/aws"
"github.com/aws/aws-sdk-go-v2/service/sqs"
"github.com/aws/aws-sdk-go-v2/service/sqs/types"
)
const maxBatchSize = 10
type BatchSender struct {
client *sqs.Client
queueURL string
}
func (s *BatchSender) SendInBatches(
ctx context.Context,
messages []any,
groupID string,
) error {
for i := 0; i < len(messages); i += maxBatchSize {
end := min(i+maxBatchSize, len(messages))
if err := s.sendBatch(ctx, messages[i:end], groupID); err != nil {
return err
}
}
return nil
}
func (s *BatchSender) sendBatch(
ctx context.Context,
batch []any,
groupID string,
) error {
entries := make([]types.SendMessageBatchRequestEntry, 0, len(batch))
for idx, msg := range batch {
body, err := json.Marshal(msg)
if err != nil {
return fmt.Errorf("marshal message %d: %w", idx, err)
}
dedupID, err := GenerateDeduplicationID(msg)
if err != nil {
return fmt.Errorf("dedup ID for message %d: %w", idx, err)
}
entries = append(entries, types.SendMessageBatchRequestEntry{
Id: aws.String(strconv.Itoa(idx)),
MessageBody: aws.String(string(body)),
MessageDeduplicationId: aws.String(dedupID),
MessageGroupId: aws.String(groupID),
})
}
result, err := s.client.SendMessageBatch(ctx, &sqs.SendMessageBatchInput{
QueueUrl: aws.String(s.queueURL),
Entries: entries,
})
if err != nil {
return fmt.Errorf("SendMessageBatch: %w", err)
}
if len(result.Failed) > 0 {
// In production: retry failed entries with exponential backoff.
// Failed.SenderFault == false means the error is on AWS's side — retriable.
return fmt.Errorf("%d messages failed in batch", len(result.Failed))
}
return nil
}

Results
| Metric | Before | After |
|---|---|---|
| Total time (924 msgs) | 120 s | 2 s |
| Throughput | 7.7 msgs/s | 462 msgs/s |
| Time per message | ~130 ms | ~2.2 ms |
| SQS API calls | 924 | 93 |
| Cost reduction (API calls) | — | ~90% |
60x improvement. 98.3% less time. ~90% fewer API calls.
Going Further: Concurrent Batches
If ordering within a MessageGroupId doesn't matter, you can dispatch multiple batches in parallel using a semaphore to cap concurrency:
// chunk splits msgs into slices of at most size elements.
func chunk(msgs []any, size int) [][]any {
	var out [][]any
	for i := 0; i < len(msgs); i += size {
		out = append(out, msgs[i:min(i+size, len(msgs))])
	}
	return out
}

// Requires "sync" added to the imports shown earlier.
func (s *BatchSender) SendInBatchesConcurrent(
ctx context.Context,
messages []any,
groupID string,
concurrency int,
) error {
batches := chunk(messages, maxBatchSize)
sem := make(chan struct{}, concurrency)
errc := make(chan error, len(batches))
var wg sync.WaitGroup
for _, b := range batches {
wg.Add(1)
sem <- struct{}{}
go func(batch []any) {
defer wg.Done()
defer func() { <-sem }()
if err := s.sendBatch(ctx, batch, groupID); err != nil {
errc <- err
}
}(b)
}
wg.Wait()
close(errc)
for err := range errc {
if err != nil {
return err
}
}
return nil
}

Note: concurrent sends across different MessageGroupId values are safe. Concurrent sends within the same MessageGroupId break FIFO ordering — only do this if you truly don't need ordering within the group.
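If you need ordering within each group but still want parallelism, a middle ground is to run groups in parallel while each group sends its batches sequentially. Here is a sketch built on SendInBatches above; the function name and the map-keyed input shape are my assumptions, not part of the original pipeline:

// SendGroupsConcurrently parallelizes across groups while keeping each
// group's batches sequential, so per-group FIFO order survives.
func (s *BatchSender) SendGroupsConcurrently(
	ctx context.Context,
	groups map[string][]any, // MessageGroupId -> messages in send order
	concurrency int,
) error {
	sem := make(chan struct{}, concurrency)
	errc := make(chan error, len(groups))
	var wg sync.WaitGroup
	for groupID, msgs := range groups {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot; blocks at the concurrency cap
		go func(id string, ms []any) {
			defer wg.Done()
			defer func() { <-sem }()
			// Sequential within the group keeps intra-group ordering intact.
			if err := s.SendInBatches(ctx, ms, id); err != nil {
				errc <- err
			}
		}(groupID, msgs)
	}
	wg.Wait()
	close(errc)
	for err := range errc {
		if err != nil {
			return err
		}
	}
	return nil
}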
SQS Batch Limits to Keep in Mind
- Max 10 messages per SendMessageBatch call
- Max 256 KB total payload per batch
- Each individual message can be up to 256 KB
- Partial failures are normal — check result.Failed and retry with backoff (one possible retry loop is sketched below)
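Here is one shape that retry loop could take. This is a sketch rather than the pipeline's production code: the helper name, attempt cap, and backoff schedule are illustrative, and it assumes the BatchSender and entry types from earlier, plus "time" in the imports.

// sendEntriesWithRetry re-sends only the entries SQS reports in Failed,
// backing off exponentially between attempts.
func (s *BatchSender) sendEntriesWithRetry(
	ctx context.Context,
	entries []types.SendMessageBatchRequestEntry,
) error {
	backoff := 100 * time.Millisecond
	for attempt := 0; attempt < 5 && len(entries) > 0; attempt++ {
		out, err := s.client.SendMessageBatch(ctx, &sqs.SendMessageBatchInput{
			QueueUrl: aws.String(s.queueURL),
			Entries:  entries,
		})
		if err != nil {
			return fmt.Errorf("SendMessageBatch: %w", err)
		}
		if len(out.Failed) == 0 {
			return nil
		}
		failed := make(map[string]bool, len(out.Failed))
		for _, f := range out.Failed {
			// A sender fault means the request itself is bad; retrying won't help.
			if f.SenderFault {
				return fmt.Errorf("sender fault for entry %s: %s",
					aws.ToString(f.Id), aws.ToString(f.Message))
			}
			failed[aws.ToString(f.Id)] = true
		}
		var retry []types.SendMessageBatchRequestEntry
		for _, e := range entries {
			if failed[aws.ToString(e.Id)] {
				retry = append(retry, e)
			}
		}
		entries = retry
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(backoff):
		}
		backoff *= 2
	}
	if len(entries) > 0 {
		return fmt.Errorf("%d entries still failing after retries", len(entries))
	}
	return nil
}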
Key Takeaways
The 90% reduction in API calls is what drives the speedup — not any clever algorithm. When I profiled the pipeline, the problem was obvious in retrospect: we were making one network round-trip per message, and network latency compounds. Most messaging bottlenecks I've investigated aren't throughput problems — they're latency-per-call problems that batching fixes for free.
If your pipeline is hitting SQS one message at a time, batch sends are the first thing to reach for. The code is straightforward, the AWS SDK supports it natively, and the gains — as I found here — can be dramatic.
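To make "straightforward" concrete, here is roughly how the pieces wire together end to end. A sketch: the queue URL, message type, and group ID are placeholders, and it assumes this code lives in the same package as BatchSender (otherwise, add an exported constructor).

package sqs

import (
	"context"

	"github.com/aws/aws-sdk-go-v2/config"
	awssqs "github.com/aws/aws-sdk-go-v2/service/sqs"
)

type campaignMessage struct {
	CampaignID string
	Recipient  string
}

func sendCampaign(ctx context.Context, recipients []string) error {
	cfg, err := config.LoadDefaultConfig(ctx)
	if err != nil {
		return err
	}
	sender := &BatchSender{
		client:   awssqs.NewFromConfig(cfg),
		queueURL: "https://sqs.us-east-1.amazonaws.com/123456789012/campaigns.fifo", // placeholder
	}
	msgs := make([]any, 0, len(recipients))
	for _, r := range recipients {
		msgs = append(msgs, campaignMessage{CampaignID: "c-123", Recipient: r})
	}
	// One MessageGroupId per campaign keeps per-campaign FIFO ordering.
	return sender.SendInBatches(ctx, msgs, "campaign-c-123")
}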
Why This Matters Beyond One Company
Messaging latency is a hidden tax on every event-driven architecture. Across fintech, e-commerce, healthcare technology, and logistics, a large share of cloud-native applications lean on AWS SQS or an equivalent message queue as the backbone of their operational pipelines. When those pipelines run inefficiently, the cost is paid in two currencies: engineering time spent debugging throughput bottlenecks, and direct AWS spend on API calls that could be batched for free.
The two changes documented here — deterministic deduplication IDs and SendMessageBatch grouping — are not optimizations that require deep infrastructure access or specialized tooling. They are available today to any team using the standard AWS SDK, in Go or any other language. The implementation reduces API calls by 90%, cuts pipeline latency by 98%, and does so without touching queue configuration, IAM policies, or service topology.
For organizations running campaign, notification, or event-processing workloads at scale, applying this pattern has a direct impact on both operational cost and user-facing latency — two areas where the gap between what's possible and what most teams are doing remains large. Publishing this case and the complete implementation is an attempt to close that gap.