JSON Large File Parse Error: Memory Crashes and Streaming Fixes

Quick answer

💡JSON.parse is synchronous and loads the entire file into a single JavaScript string before parsing a single token. On Node.js, V8's string size limit and heap size cause crashes for files beyond 50-200MB. Switch to a streaming parser like JSONStream or clarinet, or use jq with the --stream flag to process large files without holding them in memory.

Error symptoms

  • FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
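  • RangeError: Invalid string length, or Error: Cannot create a string longer than 0x1fffffe8 characters (the exact limit varies by Node.js version), when reading a large file into a single string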
  • Process killed silently by the Linux OOM killer with no error message
  • ENOMEM error returned by the operating system before parsing begins
  • Node.js process exits with code 137 (SIGKILL from kernel OOM)
  • Python MemoryError in json.load() on machines with limited RAM

Common causes

  • Calling JSON.parse on the result of fs.readFileSync for files larger than 50MB
  • V8's string size limit: roughly 512MB on 64-bit Node.js, and roughly 256MB before v12
  • Loading an entire database export or API response snapshot into one string
  • Exporting large MongoDB collections as one JSON array: the 16MB BSON limit applies per document, so the export file itself can grow to many gigabytes
  • Python json.load() reading the full file into memory on servers with 512MB RAM
  • CI pipelines with restricted memory hitting the limit when processing test fixtures

When it happens

  • Processing database export files that contain millions of records in a single JSON array
  • Ingesting third-party data feeds delivered as large flat JSON files
  • Running data migrations that read a JSON snapshot from disk before transforming it
  • Building ETL pipelines in Node.js that concatenate API pages into one file
  • Parsing log archives converted from NDJSON to a single JSON array format

Examples and fixes

Replace a single JSON.parse call with a stream pipeline that emits one array item at a time.

Streaming a large JSON array with JSONStream in Node.js

❌ Wrong

const fs = require('fs');

// Crashes for files > 50MB on most servers
const raw = fs.readFileSync('/data/records.json', 'utf8');
const records = JSON.parse(raw);

for (const record of records) {
  processRecord(record);
}

console.log('Done');

✅ Fixed

const fs = require('fs');
const JSONStream = require('JSONStream');
const { pipeline } = require('stream/promises');

async function processLargeFile(filePath) {
  const source = fs.createReadStream(filePath, { encoding: 'utf8' });
  const parser = JSONStream.parse('*');

  parser.on('data', (record) => {
    processRecord(record);
  });

  await pipeline(source, parser);
  console.log('Done');
}

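// Handle rejections so a stream or parse error does not go unreported
processLargeFile('/data/records.json').catch((err) => {
  console.error('Processing failed:', err);
  process.exitCode = 1;
});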

The synchronous version allocates the full file as a string before parsing begins, which exhausts heap memory on large files. The streaming version uses Node.js readable streams piped through JSONStream.parse('*'), which emits each top-level array element as a parsed object. Memory usage stays roughly constant regardless of file size because only one record lives in memory at a time. The pipeline helper from stream/promises propagates backpressure and handles cleanup automatically when an error occurs mid-stream.

Use jq with the compact output flag to write one JSON object per line, then split into manageable chunks.

Splitting a large JSON array with jq for batch processing

❌ Wrong

# This loads the entire file into memory before outputting
jq '.' large_export.json > processed.json

# Then try to split the result
split -b 100m processed.json chunk_

# Byte splits create invalid JSON fragments
cat chunk_aa | jq '.' # fails with a parse error: the chunk ends mid-object

✅ Fixed

# Emit one JSON object per line. Plain jq -c '.[]' parses the whole file first;
# the --stream form emits each top-level element as soon as it has been read
jq -cn --stream 'fromstream(1|truncate_stream(inputs))' large_export.json > records.ndjson

# Split by line count into valid batches of 1000 records each
split -l 1000 records.ndjson batch_

# Each batch file is valid NDJSON, processable independently
for f in batch_*; do
  jq '.id' "$f"
done

Splitting a JSON file by bytes produces fragments that start or end mid-object, so almost every chunk is unparseable. The fixed approach has jq write each array element as a single compact JSON line, creating newline-delimited JSON. Because plain jq -c '.[]' builds the whole document in memory before emitting anything, the example uses the --stream form with fromstream and truncate_stream, which reassembles one top-level element at a time. The split -l 1000 command then divides the output into files of 1000 complete records each (the last file may hold fewer). Every batch file is independently valid and can be parsed, transformed, or loaded into a database without holding the original large file in memory.

Why JSON.parse fails on large files

JSON.parse in JavaScript is an all-or-nothing operation. Before it examines a single token, the entire JSON text must exist as a single string in V8's heap memory. For a 200MB file, Node.js must allocate at least 200MB for the decoded string (V8 stores strings as one byte per character for Latin-1 text and two bytes per character otherwise, so a file containing other characters needs roughly twice that), then additional memory for the parsed object graph, which often expands to two to four times the original text size due to JavaScript object overhead. The total allocation for a 200MB file can easily reach 600-800MB.

V8's default heap limit has historically been around 1.5GB on 64-bit systems (recent Node.js versions derive it from available memory), but many cloud environments, Docker containers, and CI runners restrict available memory to 512MB or less. When the heap limit is reached during allocation, V8 prints FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory and aborts the process immediately. This is not a catchable JavaScript error: it terminates the Node.js process before any error handler can run.

On Linux servers, the OOM (out-of-memory) killer can intervene even earlier. When the operating system runs out of physical memory and swap space, the kernel sends SIGKILL to a process chosen by its OOM score, usually the one consuming the most memory. The process disappears without a stack trace or error message. To the parent shell or supervisor this appears as exit code 137 (128 plus signal number 9).
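A supervising script can tell a kernel kill apart from a V8 abort by looking at how the child ended. The following is a minimal sketch, assuming the job is launched with child_process.spawn; the names classifyExit and runWorker are hypothetical, and the mapping of SIGABRT to heap exhaustion is a typical pattern on Linux rather than a guarantee.

```javascript
const { spawn } = require('child_process');

// Classify how a child process ended. Node reports a signal-terminated child
// as code === null plus the signal name; a shell reports 128 + signal number.
function classifyExit(code, signal) {
  if (signal === 'SIGKILL' || code === 137) {
    return 'killed (SIGKILL): possibly the kernel OOM killer, check dmesg';
  }
  if (signal === 'SIGABRT' || code === 134) {
    return 'aborted: consistent with a V8 fatal error such as heap exhaustion';
  }
  if (code === 0) return 'ok';
  return `failed with code ${code} signal ${signal}`;
}

// Hypothetical usage: run a worker script and report why it stopped.
function runWorker(scriptPath) {
  const child = spawn(process.execPath, [scriptPath], { stdio: 'inherit' });
  child.on('exit', (code, signal) => console.log(classifyExit(code, signal)));
  return child;
}
```

A SIGKILL result only narrows the cause down; the kernel log remains the authoritative confirmation.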

Python's json.load() reads the entire file into memory before returning, producing the same behavior. A 300MB JSON file may require 1GB of RAM once Python constructs the dictionary and list objects from the parsed data. On a server with 1GB total RAM, this leaves no room for the operating system, web server, or other processes running alongside.

MongoDB exports created with mongoexport --jsonArray use a single JSON array. A collection with 10 million documents at 500 bytes each produces a 5GB file. Attempting to parse this with JSON.parse or json.loads will fail on virtually any standard server.

The core issue is architectural: JSON was designed as a data interchange format for reasonably sized payloads, not for files that exceed available RAM. The solution is always to switch to a streaming approach that processes the data in pieces rather than requiring the entire file to be resident in memory simultaneously. Tools like JSONStream, clarinet, ijson for Python, and jq for the command line all implement this pattern.
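To make the idea of processing the data in pieces concrete, here is a hand-rolled sketch of the technique: it scans text chunks, tracks nesting depth and string state, and parses one top-level array element at a time. It is an illustration only, not how any of the named libraries is implemented; the function name streamArrayElements is hypothetical, it assumes the document is a single top-level array, and it favors clarity over speed.

```javascript
// Yield each element of a top-level JSON array from an iterable of string chunks.
// Only the text of the element currently being assembled is kept in memory.
async function* streamArrayElements(chunks) {
  let depth = 0;        // 1 means directly inside the top-level array
  let inString = false; // inside a JSON string literal
  let escaped = false;  // previous character was a backslash inside a string
  let current = '';     // text of the element being assembled

  for await (const chunk of chunks) {
    for (const ch of chunk) {
      if (inString) {
        current += ch;
        if (escaped) escaped = false;
        else if (ch === '\\') escaped = true;
        else if (ch === '"') inString = false;
        continue;
      }
      if (ch === '"') {
        inString = true;
        current += ch;
      } else if (ch === '[' || ch === '{') {
        depth += 1;
        if (depth > 1) current += ch; // the outermost '[' belongs to no element
      } else if (ch === ']' || ch === '}') {
        depth -= 1;
        if (depth >= 1) {
          current += ch;
        } else {
          // The top-level array closed: flush the final element, if any
          if (current.trim() !== '') yield JSON.parse(current);
          current = '';
        }
      } else if (ch === ',' && depth === 1) {
        yield JSON.parse(current); // separator between top-level elements
        current = '';
      } else if (depth >= 1) {
        current += ch;
      }
    }
  }
}
```

With a real file, the chunks could come from fs.createReadStream(path, { encoding: 'utf8' }), which is async iterable. For production use, a maintained streaming parser is the safer choice.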

Diagnosing the memory failure precisely

The first diagnostic step is confirming that memory, not a JSON syntax error, is the actual cause. A syntax error produces a SyntaxError with a character position. A memory error produces a FATAL ERROR message referencing heap allocation or causes the process to exit silently. These two failure modes look similar from the outside but require different fixes.

Check the file size first. Run ls -lh filename.json or wc -c filename.json before attempting to parse it. Any file above 50MB warrants a streaming approach on constrained servers. Files above 200MB should always use streaming regardless of available memory, because the object graph after parsing will consume far more RAM than the raw text.
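The same size check can be done in code before choosing a parsing path. This is a rough sketch: the function name chooseParseStrategy is hypothetical, the 50MB default mirrors the guideline above, and comparing bytes on disk with V8's character-based string limit is only an approximation.

```javascript
const fs = require('fs');
const { constants } = require('buffer');

const STREAM_THRESHOLD_BYTES = 50 * 1024 * 1024; // the 50MB guideline from the text

// Decide how to read a JSON file based on its size on disk.
function chooseParseStrategy(filePath, thresholdBytes = STREAM_THRESHOLD_BYTES) {
  const { size } = fs.statSync(filePath);
  if (size >= constants.MAX_STRING_LENGTH) {
    // Approximation: bytes on disk versus characters in a string
    return { size, strategy: 'stream', reason: 'likely exceeds the V8 string length limit' };
  }
  if (size > thresholdBytes) {
    return { size, strategy: 'stream', reason: 'above the configured threshold' };
  }
  return { size, strategy: 'parse', reason: 'small enough for JSON.parse' };
}
```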

For Node.js processes that crash silently, add the --max-old-space-size flag temporarily as a diagnostic tool. Running node --max-old-space-size=4096 script.js gives the V8 heap 4GB of space. If the script now completes, the crash was definitely a memory issue and not a logic error. This flag is a diagnostic step, not a permanent fix — deploying with 4GB heap allocation per Node process is not practical in most environments.
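When using the flag this way, it helps to confirm that the process actually picked it up. A small sketch using the built-in v8 module (heapLimitMB is a hypothetical helper name):

```javascript
const v8 = require('v8');

// Report the heap ceiling V8 is actually using, in megabytes.
function heapLimitMB() {
  return Math.round(v8.getHeapStatistics().heap_size_limit / (1024 * 1024));
}

console.log(`V8 heap limit: ${heapLimitMB()} MB`);
```

Running this with and without --max-old-space-size should print different values; the reported figure is usually a little above the requested old-space size because it includes other heap spaces.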

On Linux, check dmesg immediately after the crash. The kernel logs OOM killer activity with a message such as Out of memory: Kill process [pid] ([name]) score [n] or sacrifice child (newer kernels log Out of memory: Killed process [pid] ([name]) instead). This confirms the operating system killed the process rather than Node.js throwing a catchable error.

For Python, wrap json.load() in a try block catching MemoryError. You can also read resource.getrusage(resource.RUSAGE_SELF).ru_maxrss before and after the load call to measure peak memory consumption; the value is reported in kilobytes on Linux and in bytes on macOS. The difference shows how much the process's peak memory grew while building the parsed object graph.

For jq diagnostics, run jq -n --stream 'reduce inputs as $event (0; . + 1)' large.json to read the file in streaming mode and count the emitted events without building an in-memory representation. If this command completes but jq '.' large.json does not, the file itself is valid JSON and the issue is purely memory allocation during parse. You can then use jq -cn --stream 'fromstream(1|truncate_stream(inputs))' large.json to emit top-level array items one per line and pipe them into further processing without loading the whole structure.

Streaming fixes for Node.js, Python, and CLI

In Node.js, the most widely used streaming JSON parser is JSONStream from npm. Install it with npm install JSONStream. It builds on the jsonparse tokenizer and exposes a simpler API for common patterns. The most frequent pattern is JSONStream.parse('*'), which emits each top-level array element as a parsed JavaScript object. Connect it to a readable stream from fs.createReadStream and use the pipeline function from stream/promises to handle backpressure and error propagation automatically.

For more complex structures, JSONStream accepts dot-notation paths. JSONStream.parse('data.records.*') would emit each element from a nested array at data.records. Per the JSONStream README, the .. recursive descent operator matches at any depth, so JSONStream.parse('docs..value') emits every value property nested anywhere under docs, which is useful for deeply nested structures.

The clarinet parser is a lower-level alternative with more fine-grained control. It implements a SAX-style event model where you register handlers for onopenobject, onkey, onvalue, oncloseobject, and similar events. This is harder to use correctly but handles malformed UTF-8 and partial reads more gracefully than JSONStream in some edge cases.

In Python, the ijson library provides streaming JSON parsing with a similar pattern. Install with pip install ijson. Use ijson.items(file_handle, 'item') to iterate over top-level array items. For nested paths, use dot notation: ijson.items(f, 'data.records.item') iterates over records inside the nested structure. Python's json.load() is fine for files under 200MB on servers with adequate RAM, but ijson should be the default choice for any file whose size is unknown at development time.

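At the command line, jq with the --stream flag processes JSON in streaming mode, emitting path-value events instead of building an in-memory tree. The output format differs from normal jq: each leaf value is emitted as [path, value], with an extra [path] event when an array or object closes. This allows you to filter enormous files. For files that fit in memory, jq -c '.[]' large.json is the simpler way to write compact newline-delimited JSON; for larger ones, jq -cn --stream 'fromstream(1|truncate_stream(inputs))' large.json produces the same output one element at a time. Either way, pipe the result to split or process it line by line with a shell loop.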

For temporary relief while transitioning to streaming, increase the Node.js heap with the NODE_OPTIONS environment variable: export NODE_OPTIONS=--max-old-space-size=4096. This is not a permanent solution but buys time to implement proper streaming without blocking a deployment.

Edge cases that complicate large file parsing

Not all large JSON files are simple flat arrays. Some exports produce deeply nested structures where a single top-level object contains arrays that contain objects that contain further arrays. JSONStream handles these with path expressions, but the path must match the actual structure precisely. A common mistake is using JSONStream.parse('*') on a file where the data is nested at data.items.* rather than at the top level. The pattern then matches the top-level keys instead of the individual records, so the per-record handler never sees a record and the script can complete without processing anything useful.

Newline-delimited JSON (NDJSON or JSON Lines) is a different format entirely. Files with .ndjson or .jsonl extensions contain one complete JSON value per line. These should not be treated as a single large JSON array. Instead, read the file line by line with readline in Node.js or iterate over lines in Python, parsing each line independently with JSON.parse or json.loads. This is far simpler than streaming a monolithic JSON file and is worth converting to if you control the export format.
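A minimal sketch of the readline approach in Node.js follows. The function name processNdjson and the onRecord callback are hypothetical; blank lines are skipped as a convenience, which is an assumption about the input rather than part of the format.

```javascript
const fs = require('fs');
const readline = require('readline');

// Read an NDJSON file line by line, parsing each line on its own.
// Returns the number of records handed to onRecord.
async function processNdjson(filePath, onRecord) {
  const lines = readline.createInterface({
    input: fs.createReadStream(filePath, { encoding: 'utf8' }),
    crlfDelay: Infinity, // treat \r\n as a single line break
  });
  let lineNumber = 0;
  let count = 0;
  for await (const line of lines) {
    lineNumber += 1;
    if (line.trim() === '') continue; // tolerate blank lines
    let record;
    try {
      record = JSON.parse(line);
    } catch (err) {
      throw new Error(`Invalid JSON on line ${lineNumber}: ${err.message}`);
    }
    await onRecord(record); // works for sync and async callbacks
    count += 1;
  }
  return count;
}
```

Because each line is parsed independently, a malformed record can be reported with its line number instead of failing the whole file at an opaque character offset.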

MongoDB exports using mongoexport --jsonArray produce a single JSON array, while the default mongoexport without that flag produces NDJSON. If you control the export process, omitting --jsonArray gives you a streamable format out of the box without needing a streaming parser.

Character encoding edge cases become more likely with large files from varied sources. A single invalid UTF-8 byte sequence anywhere in a 500MB file can cause a strict decoder, such as Python's default text mode, to raise an error after significant time has already been spent loading the file; Node.js instead silently replaces the bytes with U+FFFD, which corrupts the data without any error. Running iconv -f utf-8 -t utf-8 -c large.json > cleaned.json before parsing removes invalid byte sequences. The -c flag silently drops characters that cannot be converted, which may be acceptable depending on the data.
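If you would rather detect bad bytes than drop them, the file can be scanned in chunks before parsing. This sketch uses the built-in TextDecoder in fatal mode; the function name isValidUtf8 is hypothetical, and it assumes a Node.js build with ICU support, which standard builds include.

```javascript
const fs = require('fs');

// Scan a file for invalid UTF-8 without holding it in memory.
// Resolves to true when every byte sequence decodes cleanly.
async function isValidUtf8(filePath) {
  const decoder = new TextDecoder('utf-8', { fatal: true });
  try {
    for await (const chunk of fs.createReadStream(filePath)) {
      // stream: true carries a partial multi-byte sequence over to the next chunk
      decoder.decode(chunk, { stream: true });
    }
    decoder.decode(); // flush: fails if the file ends mid-sequence
    return true;
  } catch (err) {
    if (err instanceof TypeError) return false; // thrown by the fatal decoder
    throw err; // I/O errors propagate to the caller
  }
}
```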

Files transferred over networks can be truncated if the connection drops mid-transfer. A truncated JSON array ends with a partial object rather than the closing bracket, causing a parse error near the end of the file. Check file size against the expected Content-Length from the server, or verify with jq 'length' (which will fail on truncated files) before processing. For files fetched from object storage, re-download and compare checksums before parsing.
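A cheap first check for truncation reads only the end of the file. The sketch below is a heuristic, not a validator: it compares the size with an expected byte count when one is known, then confirms the last non-whitespace byte closes an array or object. The function name looksComplete and the 64-byte tail window are arbitrary choices for illustration.

```javascript
const fs = require('fs');

// Heuristic truncation check that never reads more than the last 64 bytes.
function looksComplete(filePath, expectedBytes) {
  const { size } = fs.statSync(filePath);
  if (expectedBytes !== undefined && size !== expectedBytes) return false;
  const tailLength = Math.min(size, 64);
  if (tailLength === 0) return false;

  const tail = Buffer.alloc(tailLength);
  const fd = fs.openSync(filePath, 'r');
  try {
    fs.readSync(fd, tail, 0, tailLength, size - tailLength);
  } finally {
    fs.closeSync(fd);
  }

  // Skip trailing JSON whitespace: space, tab, line feed, carriage return
  let i = tailLength - 1;
  while (i >= 0 && [0x20, 0x09, 0x0a, 0x0d].includes(tail[i])) i -= 1;
  return i >= 0 && (tail[i] === 0x5d || tail[i] === 0x7d); // ']' or '}'
}
```

A file that passes can still be malformed earlier on, so this complements a checksum comparison rather than replacing it.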

Common mistakes when handling large JSON files

The most frequent mistake is accumulating parsed items in memory during streaming, which defeats the entire purpose. A streaming parser emits items one at a time, but if the code pushes every emitted item into an array, the array grows to the same size as the original parsed object. The fix is to process each item immediately — write it to a database, transform and write to an output stream, or aggregate statistics — and then discard the reference.
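As an illustration of aggregating instead of accumulating, the sketch below folds records into a fixed-size summary. The function name summarize and the amount field are hypothetical; records can be any sync or async iterable, such as the output of a streaming parser.

```javascript
// Fold a stream of records into a small summary object instead of an array.
async function summarize(records) {
  const summary = { count: 0, total: 0, max: -Infinity };
  for await (const record of records) {
    summary.count += 1;
    summary.total += record.amount;
    if (record.amount > summary.max) summary.max = record.amount;
    // `record` is unreachable after this iteration and can be garbage collected
  }
  return summary;
}
```

Memory use here depends on the size of the summary, not on the number of records read.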

Another common error is using JSON.parse inside an async function and assuming that async makes it non-blocking. JSON.parse is synchronous C++ code. Running it inside an async function or wrapping it in a Promise does not make it streaming or non-blocking. It still occupies the event loop for the entire duration of parsing, freezing all other requests in a Node.js server. An 80MB JSON.parse call can block the event loop for several seconds.
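The ordering can be observed directly. In this small sketch (function names are hypothetical), a zero-delay timer is scheduled before the parse starts, yet it cannot fire until the synchronous JSON.parse inside the async function has returned.

```javascript
// The async keyword only wraps the return value in a Promise;
// JSON.parse still runs synchronously on the calling thread.
async function parseInAsyncFunction(text) {
  return JSON.parse(text);
}

// Resolves with the order in which the three events were observed.
function demonstrateBlocking(text) {
  return new Promise((resolve) => {
    const order = [];
    setTimeout(() => {
      order.push('timer');
      resolve(order);
    }, 0);
    parseInAsyncFunction(text).then(() => order.push('parse resolved'));
    order.push('call returned'); // reached only after the parse has finished
  });
}

demonstrateBlocking('[1, 2, 3]').then((order) => console.log(order.join(' -> ')));
// prints "call returned -> parse resolved -> timer"
```

With a large input the gap before the timer fires grows with the parse time, which is exactly the delay every other request on the server experiences.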

Using node --max-old-space-size as the primary fix rather than a temporary diagnostic measure leads to servers running with dangerously high heap allocations. If a service processes user-uploaded JSON files of varying sizes, a single 2GB upload could consume all server memory. The correct fix is to enforce file size limits at the upload layer and implement streaming for files above a reasonable threshold.

Python developers sometimes use json.loads(open(path).read()), which reads the entire file into a string first, then parses it, using peak memory equal to the string plus the parsed object graph. Using json.load(open(path)) is no better: internally it calls read() on the file handle and passes the resulting string to json.loads, so the memory profile is the same. Only an incremental parser such as ijson.items() achieves true streaming behavior.

For jq, piping a large file through jq '.' to pretty-print it before redirecting to another file is costly because jq without --stream builds the full in-memory tree. Using jq -c '.' only shrinks the output; it parses the same tree and uses about the same memory. For files where that is too much, use jq --stream with appropriate filters. Validate your JSON first using /tools/json-validator on a sample before running jq on the full file to catch syntax errors early.

Production patterns for large JSON data

Design data pipelines to prefer NDJSON over single-array JSON files from the start. NDJSON requires no streaming parser — every language and tool can read files line by line. If you receive large JSON files from external partners, add a conversion step at the ingestion boundary that transforms the file to NDJSON before any downstream processing happens. This one architectural decision eliminates the entire class of large-file parse errors.

Enforce file size limits at the API or upload boundary before any parsing occurs. In Node.js with Express, use the limit option on express.json() middleware: app.use(express.json({ limit: '10mb' })). For user-uploaded files, validate the Content-Length header before reading the body. Returning a 413 Payload Too Large response early is far better than letting the server run out of memory.
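Outside Express, the same idea can be sketched with plain streams. The helper below is hypothetical (readBodyWithLimit is not a Node.js API): it rejects early on a declared Content-Length and also counts received bytes, since the header can be absent or inaccurate. It assumes req is a readable stream of Buffers with a headers object, such as http.IncomingMessage.

```javascript
// Collect a request body, rejecting as soon as it exceeds maxBytes.
async function readBodyWithLimit(req, maxBytes) {
  const tooLarge = () =>
    Object.assign(new Error('Payload Too Large'), { statusCode: 413 });

  // Fast path: trust a declared length only to reject early
  const declared = Number(req.headers['content-length']);
  if (Number.isFinite(declared) && declared > maxBytes) throw tooLarge();

  const chunks = [];
  let received = 0;
  for await (const chunk of req) {
    received += chunk.length;
    if (received > maxBytes) throw tooLarge(); // header missing or wrong
    chunks.push(chunk);
  }
  return Buffer.concat(chunks);
}
```

A caller would catch the error and respond with err.statusCode, so the 413 is sent before any JSON parsing is attempted.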

When you must process very large files in Node.js, use the pipeline function from stream/promises rather than manually managing pipe() and error handlers. Pipeline automatically destroys streams on error and resolves the promise only after all data has been consumed. This prevents common resource leak bugs where a stream is not closed after an error mid-way through processing.
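As a sketch of this pattern using only the standard library, the example below transforms an NDJSON file with async generator stages inside pipeline. The function names and the transform callback are hypothetical, and it assumes NDJSON input so that no third-party parser is needed.

```javascript
const fs = require('fs');
const { pipeline } = require('stream/promises');

// Stage 1: split incoming text chunks into complete lines.
async function* splitLines(source) {
  let tail = '';
  for await (const chunk of source) {
    const parts = (tail + chunk).split('\n');
    tail = parts.pop(); // the last piece may be an incomplete line
    yield* parts;
  }
  if (tail !== '') yield tail;
}

// Stage 2: parse each line, apply the caller's transform, re-serialize.
function mapRecords(transform) {
  return async function* (lines) {
    for await (const line of lines) {
      if (line.trim() === '') continue;
      yield JSON.stringify(transform(JSON.parse(line))) + '\n';
    }
  };
}

// Resolves once the output file has been fully written; on any error,
// pipeline destroys every stream in the chain and rejects.
async function transformNdjson(inputPath, outputPath, transform) {
  await pipeline(
    fs.createReadStream(inputPath, { encoding: 'utf8' }),
    splitLines,
    mapRecords(transform),
    fs.createWriteStream(outputPath)
  );
}
```

Generator stages keep each step small and testable while still getting the cleanup behavior described above.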

Add memory monitoring to long-running data processing jobs. Use process.memoryUsage().heapUsed to log heap consumption periodically, or integrate with a metrics system like Prometheus. Setting alerts when heap usage exceeds 70% of the limit allows you to catch growing file sizes before they cause production incidents.
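A minimal in-process monitor might look like the following. The names heapUsageRatio and startHeapMonitor are hypothetical, the 70% default mirrors the threshold mentioned above, and onWarn stands in for whatever alerting or metrics hook is in use.

```javascript
const v8 = require('v8');

// Fraction of the V8 heap limit currently in use (between 0 and 1).
function heapUsageRatio() {
  const { heapUsed } = process.memoryUsage();
  return heapUsed / v8.getHeapStatistics().heap_size_limit;
}

// Call onWarn whenever usage is at or above warnRatio. Returns a stop function.
function startHeapMonitor({ intervalMs = 5000, warnRatio = 0.7, onWarn = console.warn } = {}) {
  const timer = setInterval(() => {
    const ratio = heapUsageRatio();
    if (ratio >= warnRatio) {
      onWarn(`Heap at ${(ratio * 100).toFixed(1)}% of limit`);
    }
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for monitoring
  return () => clearInterval(timer);
}
```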

For Python batch jobs, consider using ijson with a generator pattern rather than loading all items into a list. Yield each processed item from a generator function, then consume it with a for loop that writes to a database or output file. This keeps memory usage flat regardless of input size. You can validate a sample of the JSON file against /tools/json-validator before starting a long batch job to catch structural issues before investing processing time.

Document the maximum safe file size for every data pipeline component in your system. Different components have different limits — the streaming parser may handle 10GB, but the database bulk insert may have a 100MB batch limit. Making these limits explicit in runbooks and monitoring prevents the common failure mode where a file grows gradually over months until it suddenly exceeds the limit of one component.

Quick fix checklist

  • Check file size with ls -lh before attempting JSON.parse
  • Switch to JSONStream.parse('*') for Node.js files above 50MB
  • Use ijson.items() in Python instead of json.load() for large files
  • Run jq -c '.[]' to convert JSON arrays to NDJSON for line-by-line processing, or jq -cn --stream 'fromstream(1|truncate_stream(inputs))' when the file does not fit in memory
  • Add --max-old-space-size only as a temporary diagnostic, not a production fix
  • Enforce Content-Length limits at the API boundary before reading the body
  • Check dmesg after silent crashes to confirm OOM killer involvement
  • Validate a sample of the file with /tools/json-validator before starting batch processing

Frequently asked questions

What is the maximum file size JSON.parse can handle in Node.js?

There is no hard fixed limit, but practical failure typically occurs between 50MB and 512MB depending on available heap memory. V8's string size limit is approximately 512MB on 64-bit systems. More importantly, the parsed object graph expands to two to four times the raw JSON size, so a 100MB file may require 300-400MB of heap. Files above 50MB should use streaming parsers instead of JSON.parse.

Does --max-old-space-size permanently fix large file parse errors?

No. Increasing the V8 heap size with --max-old-space-size gives temporary relief but does not fix the underlying architecture problem. If file sizes grow over time, the crash will return at a higher threshold. On multi-tenant servers processing user-uploaded files, a large heap allocation per process can cause the entire machine to run out of memory. Use streaming parsers as the proper long-term fix.

How does JSONStream.parse('*') work internally?

JSONStream builds on the jsonparse tokenizer, which emits events for each JSON token rather than building a complete tree. The asterisk pattern tells JSONStream to collect tokens until a complete top-level array element is assembled, then emit it as a JavaScript object. Only one assembled object lives in memory at a time. The underlying parser processes the input in small chunks read from the Node.js readable stream.

Can Python's json.load() stream large files without ijson?

No. Python's built-in json.load() reads the entire file into memory before returning, even though it accepts a file handle. The file handle argument only determines where the text comes from, not how it is processed. For streaming behavior in Python, you must use ijson, which wraps C-level yajl or pure Python backends that process the file incrementally. Install ijson with pip install ijson and use ijson.items(f, 'item') for top-level array elements.

What does jq --stream do differently from normal jq?

Normal jq builds a complete in-memory JSON tree before applying filters, consuming memory proportional to file size. The --stream flag switches to a streaming mode that emits path-value pairs as it reads tokens, never building the full tree. Output format changes to arrays like [[0,"key"],value]. This mode lets jq process files larger than available RAM, though filters become more complex to write because they operate on path-value pairs rather than the full structure.

How do I convert a large JSON array file to NDJSON?

Use jq -c '.[]' large.json to emit each array element as a compact single-line JSON object, which produces valid NDJSON. Redirect the output to a new file: jq -c '.[]' large.json > output.ndjson. This form parses the whole input into memory before writing anything; when the file is too large for that, use jq -cn --stream 'fromstream(1|truncate_stream(inputs))' large.json > output.ndjson, which reads in streaming mode and writes one complete JSON object per line. The resulting file can be processed line by line by any tool without requiring a streaming JSON parser.

Why does MongoDB BSON export cause large file parse errors?

MongoDB limits individual BSON documents to 16MB but imposes no limit on total export file size. The mongoexport --jsonArray flag wraps all documents in a single JSON array, which can grow to many gigabytes for large collections. Parsing this array with JSON.parse fails due to memory limits. Use mongoexport without --jsonArray to get NDJSON output, or use mongodump for binary BSON format which has better tooling for large datasets.

How do I validate a large JSON file without loading it into memory?

Use jq 'empty' large.json, which parses the file and produces no output. If the file is valid JSON, the command exits with code 0. If the file has a syntax error, jq reports the error message with its line and column. This still parses the whole document in memory, so for files too large for it, jq --stream 'empty' large.json validates in true streaming mode. You can also paste a representative sample into /tools/json-validator to check the structure before processing the full file.

Can I increase Node.js memory per request rather than globally?

Node.js heap size is set at process startup and applies to the entire process, not individual requests. You cannot allocate a larger heap for a single request handler. The correct pattern is to enforce request body size limits that prevent memory exhaustion on a per-request basis, and to use streaming parsers so that no single request requires holding an entire large file in heap memory simultaneously.

Last updated: 2026-05-06.