You can parse the JSON file once to find the positions of each level-1 separator, i.e. a comma that is part of the top-level object, and then divide the file into sections indicated by these positions. For example:
```
{"a": [1, 2, 3], "b": "Hello, World!", "c": {"d": 4, "e": 5}}
        ^      ^            ^        ^             ^
        |      |            |        |             |
     level-2   |         quoted      |          level-2
               |                     |
            level-1               level-1
```
Here we want to find the level-1 commas, i.e. the ones that separate the members of the top-level object. We can use a generator that scans the JSON character stream and keeps track of descending into and stepping out of nested objects and arrays, as well as of whether it is currently inside a string. When it encounters a level-1 comma that is not quoted, it yields the corresponding position:
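```python
def find_sep_pos(stream, *, sep=','):
    level = 0          # current nesting depth
    quoted = False     # True while inside a JSON string
    backslash = False  # True if the previous character was an escaping backslash
    for pos, char in enumerate(stream):
        if backslash:
            backslash = False  # this character is escaped, e.g. \" inside a string
        elif quoted:
            # Inside a string, brackets and commas are plain text.
            if char == '\\':
                backslash = True
            elif char == '"':
                quoted = False
        elif char == '"':
            quoted = True
        elif char in '{[':
            level += 1
        elif char in ']}':
            level -= 1
        elif char == sep and level == 1:
            yield pos
```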
Used on the example data above, this gives `list(find_sep_pos(example)) == [15, 37]`.
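To see what these two positions mean, here is a small in-memory sketch that slices the example string at positions 15 and 37 and parses each slice on its own. The names `bounds`, `sections` and `parsed` are only illustrative, and the positions are hard-coded from the result quoted above rather than computed:

```python
import json

example = '{"a": [1, 2, 3], "b": "Hello, World!", "c": {"d": 4, "e": 5}}'
sep_pos = [15, 37]  # the level-1 comma positions quoted above

# Treat the opening bracket, the separators and the closing bracket as boundaries.
bounds = [0, *sep_pos, len(example) - 1]

# Each section is the text strictly between two neighbouring boundaries.
sections = [example[start + 1:end] for start, end in zip(bounds, bounds[1:])]

# Wrapping a section in braces again turns it into a valid JSON document.
parsed = [json.loads('{' + section + '}') for section in sections]
print(parsed)
```

Each element of `parsed` is a dictionary holding exactly one member of the top-level object. For a large file you would not hold the whole string in memory like this, which is why the file is processed as a character stream instead.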
Then we can divide the file into sections that correspond to the separator positions and load each section individually via `json.loads`:
```python
import itertools as it
import json

with open('example.json') as fh:
    # Iterating over `fh` yields lines, so we chain them in order to get characters.
    sep_pos = tuple(find_sep_pos(it.chain.from_iterable(fh)))
    fh.seek(0)  # reset to the beginning of the file
    stream = it.chain.from_iterable(fh)
    opening_bracket = next(stream)
    closing_bracket = {'{': '}', '[': ']'}[opening_bracket]
    offset = 1  # the bracket we just consumed adds an offset of 1
    for pos in sep_pos:
        json_str = (
            opening_bracket
            + ''.join(it.islice(stream, pos - offset))
            + closing_bracket
        )
        obj = json.loads(json_str)  # this is your object
        next(stream)  # step over the separator
        offset = pos + 1  # adjust where we are in the stream right now
        print(obj)
    # The last object still remains in the stream, so we load it here.
    obj = json.loads(opening_bracket + ''.join(stream))
    print(obj)
```
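If you want to reuse this, the two passes can be wrapped in a generator. The following is only a sketch under a few assumptions: `iter_sections` is a hypothetical name, the file object must be seekable and opened in text mode, and its very first character must be the opening bracket. The scanner is repeated so that the snippet runs on its own; this copy ignores brackets and commas that occur inside strings.

```python
import io
import itertools as it
import json


def find_sep_pos(stream, *, sep=','):
    """Yield the positions of unquoted level-1 separators in a character stream."""
    level = 0
    quoted = False
    backslash = False
    for pos, char in enumerate(stream):
        if backslash:
            backslash = False
        elif quoted:
            if char == '\\':
                backslash = True
            elif char == '"':
                quoted = False
        elif char == '"':
            quoted = True
        elif char in '{[':
            level += 1
        elif char in ']}':
            level -= 1
        elif char == sep and level == 1:
            yield pos


def iter_sections(fh):
    """Yield every top-level member of the JSON document in `fh`, one at a time."""
    # First pass: collect the separator positions.
    sep_pos = tuple(find_sep_pos(it.chain.from_iterable(fh)))
    # Second pass: cut the character stream at those positions.
    fh.seek(0)
    stream = it.chain.from_iterable(fh)
    opening = next(stream)
    closing = {'{': '}', '[': ']'}[opening]
    offset = 1
    for pos in sep_pos:
        yield json.loads(opening + ''.join(it.islice(stream, pos - offset)) + closing)
        next(stream)      # step over the separator
        offset = pos + 1
    # The remainder already ends with the closing bracket.
    yield json.loads(opening + ''.join(stream))


# `io.StringIO` stands in for a real file here.
demo = io.StringIO('{"a": [1, 2, 3], "b": "Hello, {World}!", "c": {"d": 4, "e": 5}}\n')
sections = list(iter_sections(demo))
print(sections)
```

Only one section is held in memory as a string at any time, but the tuple of separator positions grows with the number of top-level members.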