How to merge JSON files in Python

JSON (JavaScript Object Notation) is a text-based data format used for storing and exchanging data. Its simple structure of key:value pairs and arrays makes JSON an easy and versatile format for web services and analytical tasks.

When working with data, you will often come across useful JSON data in separate files that should be combined. For example, website analytics may be split into separate user, sales, and traffic data files. Combining relevant files into one JSON object or array simplifies the process of analysis.

In this article, we will cover how to merge multiple JSON files into one using Python, first with the built-in json module and then with the pandas library.

Let’s get started!

Why merge JSON files?

There are several reasons to merge JSON files:

  • Combine Data Sources – Merge user data from one JSON file with sales transaction data from a separate JSON log to enable analysis across domains.
  • Aggregate for Analysis – Libraries like pandas and visualization tools like Tableau often expect a single data file as input. Merging JSON sources simplifies analytics.
  • Simplify Processing – Rather than handling multiple JSON data streams, combining relevant information into one file simplifies scripts.

By merging JSON sources into a single combined data structure, you can avoid manual steps to tie separate files together in analysis and visualization tasks.

Now let’s see how we can merge multiple JSON files using Python 3.

Merging all JSON objects into a single file

This is a very simple process. All we have to do is:

  1. Use Python's built-in json module to decode and encode JSON data.
  2. Use the json.load() function to load each file and add its data to a list.
  3. Lastly, write the final list to a new file using the json.dump() function.
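
import json

json_files = ["file1.json", "file2.json", "file3.json"]
merged_data = []

for filename in json_files:
    with open(filename, encoding="utf-8") as file:
        data = json.load(file)
    # A file holding an array contributes its items one by one;
    # any other value (such as a single object) is added whole.
    if isinstance(data, list):
        merged_data.extend(data)
    else:
        merged_data.append(data)

# Write the combined list to a new file.
with open("merged.json", "w", encoding="utf-8") as file:
    json.dump(merged_data, file)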

Code Explanation:

  1. Loop through each file and load it using json.load().
  2. Accumulate all the loaded data into a single list.
  3. Dump the merged list out to a new JSON file using json.dump().
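
To see the steps above end to end, here is a minimal sketch that creates two small sample files in a temporary directory and merges them. The file names, the sample records, and the merge_json_lists helper are hypothetical and exist only for illustration; the sketch also assumes a file may hold either a JSON array or a single object.

```python
import json
import tempfile
from pathlib import Path


def merge_json_lists(paths):
    """Load each JSON file and collect its contents into one list."""
    merged = []
    for path in paths:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
        if isinstance(data, list):
            merged.extend(data)   # array: add each element
        else:
            merged.append(data)   # single object: add it as one element
    return merged


# Hypothetical sample data written to a temporary directory.
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "users_a.json").write_text(
    json.dumps([{"id": 1}, {"id": 2}]), encoding="utf-8"
)
(tmp_dir / "users_b.json").write_text(json.dumps({"id": 3}), encoding="utf-8")

merged_list = merge_json_lists([tmp_dir / "users_a.json", tmp_dir / "users_b.json"])
print(merged_list)  # [{'id': 1}, {'id': 2}, {'id': 3}]
```

Wrapping the loop in a function keeps the merge logic separate from the choice of input files, so the same helper can be reused with any list of paths.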

Merging dictionaries by key in JSON files

This is useful if the JSON files contain dictionaries with overlapping keys. The steps are:

  1. Use the json.load() function to load each file into a dictionary.
  2. Next, update the main dictionary with items from each file. When a key appears in more than one file, the value from the file loaded last replaces the earlier one.
  3. Lastly, write the final dictionary to a new file using the json.dump() function.
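
import json

json_files = ["file1.json", "file2.json", "file3.json"]
merged_data = {}

for filename in json_files:
    with open(filename, encoding="utf-8") as file:
        data = json.load(file)
    # update() adds new keys and overwrites keys that already exist,
    # so values from later files take precedence.
    merged_data.update(data)

# Write the combined dictionary to a new file.
with open("merged.json", "w", encoding="utf-8") as file:
    json.dump(merged_data, file)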

Code Explanation:

  1. Loop through each file name in json_files.
  2. Open the file and load its JSON data using the json.load() function.
  3. Update the merged_data dict with the data loaded from this file using merged_data.update(). This combines the data from each file into the merged_data dict.
  4. Open a file named "merged.json" and write the content of merged_data to it using the json.dump() function.
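
Because dict.update() overwrites existing keys, the value from the last file wins whenever two files share a key. If you need to keep every value instead, one option is to collect the colliding values in a list. The sketch below works on in-memory dictionaries that stand in for loaded files; merge_keep_all and the sample data are hypothetical names used only for illustration.

```python
def merge_keep_all(dicts):
    """Merge dictionaries, keeping every value for keys that repeat."""
    merged = {}
    for d in dicts:
        for key, value in d.items():
            # setdefault creates the list the first time a key is seen.
            merged.setdefault(key, []).append(value)
    return merged


# Stand-ins for the dictionaries returned by json.load().
file1 = {"name": "Alice", "city": "Paris"}
file2 = {"name": "Bob", "age": 30}

overwritten = {}
for d in (file1, file2):
    overwritten.update(d)  # later values replace earlier ones

print(overwritten)
# {'name': 'Bob', 'city': 'Paris', 'age': 30}

kept = merge_keep_all([file1, file2])
print(kept)
# {'name': ['Alice', 'Bob'], 'city': ['Paris'], 'age': [30]}
```

Which behavior is right depends on the data: overwriting suits configuration-style files where later sources should take priority, while collecting values suits cases where every record matters.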

Merge JSON files in Python using pandas

Beyond the standard library, Python libraries like pandas offer additional features for merging and manipulating JSON data.

pandas is a Python library for data analysis with built-in functions for handling JSON data, including reading JSON files into DataFrames that can then be merged.

Here is an example of how to use pandas to merge JSON files in Python.

Concatenating DataFrames in pandas

Using the pandas library, we can follow these steps:

  • Use pd.read_json to read each JSON file into a separate DataFrame.
  • Pass orient="records" so that each object in the JSON array is treated as a row.
  • Lastly, concatenate the DataFrames using pd.concat.

import pandas as pd

files = ["file1.json", "file2.json", "file3.json"]
dataframes = []

# Read each file into its own DataFrame.
for filename in files:
    df = pd.read_json(filename, orient="records")
    dataframes.append(df)

# Stack the DataFrames vertically and renumber the rows.
combined_df = pd.concat(dataframes, ignore_index=True)

print(combined_df)

Code Explanation:

  1. files is a list of the JSON filenames to load.
  2. An empty list, dataframes, is created to store the DataFrame loaded from each file.
  3. A for loop iterates through each filename:
    • pd.read_json loads the JSON file into a DataFrame; orient="records" indicates that the file holds an array of objects, each of which becomes a row.
    • The DataFrame for each file is appended to the dataframes list.
  4. pd.concat concatenates the list of DataFrames into a single DataFrame; ignore_index=True assigns a fresh 0-based index instead of keeping each file's original index.
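
As a quick illustration of the concatenation step, the sketch below builds two DataFrames from in-memory JSON strings wrapped in io.StringIO, so it runs without any files on disk. The sample records are made up, and the second source deliberately lacks one column to show how pd.concat fills gaps.

```python
import io

import pandas as pd

# Made-up JSON arrays standing in for the contents of two files.
json_a = '[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]'
json_b = '[{"id": 3}]'

frames = [
    pd.read_json(io.StringIO(text), orient="records")
    for text in (json_a, json_b)
]

# Columns are aligned by name; values missing from a source become NaN.
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

The result has three rows and two columns, with a missing name for the record that came from the second source.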

Merging dictionaries by key in pandas

This is useful when we have JSON files with overlapping keys. The steps we can follow are:

  • Use json.load to read each JSON file into a dictionary.
  • Create a single empty dictionary to store data from each file.
  • Update that dictionary with the data from each file, merging keys as needed.
  • Convert the merged dictionary into a DataFrame with pd.DataFrame.
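
import json

import pandas as pd

files = ["file1.json", "file2.json", "file3.json"]
merged_data = {}

# Load each file and fold its keys into one dictionary.
for filename in files:
    with open(filename, encoding="utf-8") as f:
        data = json.load(f)
    merged_data.update(data)

# Each key becomes a column; this assumes the values are equal-length lists.
combined_df = pd.DataFrame(merged_data)

print(combined_df)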

Code Explanation:

  1. merged_data is initialized to an empty dict to accumulate the loaded JSON data.
  2. Loop through each filename in files and load its data using the json.load() function.
  3. Update the merged_data dict with the loaded dict; when a key repeats, the newer value replaces the older one.
  4. Lastly, convert the merged dict into a single DataFrame using pd.DataFrame().
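
One thing to watch for in this approach is the shape of the merged dictionary. pd.DataFrame treats each key as a column, so it expects list-like values of equal length; a dictionary of plain scalar values raises a ValueError. The sketch below uses made-up in-memory dictionaries in place of loaded files to show both cases, with pd.Series as one possible fallback for scalar data.

```python
import pandas as pd

# Made-up stand-ins for dictionaries loaded from two JSON files.
columns_a = {"id": [1, 2], "name": ["Alice", "Bob"]}
columns_b = {"age": [30, 25]}

merged_columns = {}
for d in (columns_a, columns_b):
    merged_columns.update(d)

# Each key becomes a column because all lists have the same length.
table = pd.DataFrame(merged_columns)
print(table)

# A dictionary of scalars has no row index, so DataFrame() rejects it.
scalars = {"total_users": 2, "total_sales": 5}
try:
    pd.DataFrame(scalars)
    scalar_error = False
except ValueError:
    scalar_error = True

# A Series maps each key to its value instead.
summary = pd.Series(scalars)
print(summary)
```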

Conclusion:

Here, we have learned different techniques for merging multiple JSON files in Python. We used the built-in json module to decode and encode JSON data from files, and we saw how the pandas library can combine JSON data into a single DataFrame.
