Parsing Any Large, Complex JSON Structure into an RDBMS in a Generic Way
- Soumen Das
- Jan 6
- 1 min read
In our day-to-day work, we often need to parse various JSON files and load them into RDBMS tables. The challenge lies in the complexity: nested arrays, file size, and the need for a generic implementation path. Here, we will discuss the steps to handle all of those challenges and parse any large, complex JSON file into RDBMS table(s) using a generic methodology (coded in Python).
Split the large JSON file into small JSON files of roughly 4 MB each.
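A minimal sketch of this splitting step, assuming the top level of the JSON file is an array of records and that the file fits in memory (a streaming parser such as `ijson` would be needed for truly huge inputs; the function name `split_json` is illustrative, not from the original code base):

```python
import json
import os


def split_json(src_path, out_dir, chunk_bytes=4 * 1024 * 1024):
    """Split a JSON file whose top level is an array into smaller JSON
    files of roughly `chunk_bytes` each; returns the chunk paths."""
    os.makedirs(out_dir, exist_ok=True)
    with open(src_path) as f:
        records = json.load(f)  # assumption: top-level JSON array

    paths, chunk, size, part = [], [], 0, 0

    def flush():
        nonlocal chunk, size, part
        path = os.path.join(out_dir, f"part_{part:04d}.json")
        with open(path, "w") as out:
            json.dump(chunk, out)
        paths.append(path)
        chunk, size, part = [], 0, part + 1

    for rec in records:
        rec_size = len(json.dumps(rec))
        # Start a new chunk once adding this record would exceed the limit.
        if chunk and size + rec_size > chunk_bytes:
            flush()
        chunk.append(rec)
        size += rec_size
    if chunk:
        flush()
    return paths
```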
Parse each small JSON file generically into CSVs using Python, writing into an enumerated folder structure. Treat each array inside the JSON file as a separate CSV file named after its node.
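One way to sketch this generic parse: walk the parsed JSON recursively, collect every array of objects under the name of its node, and write one CSV per node into an enumerated chunk folder. The helper names and the scalars-only flattening are assumptions for illustration:

```python
import csv
import os


def collect_arrays(obj, tables, node="root"):
    """Recursively gather every JSON array of objects into `tables`,
    keyed by the node name; nested dicts/lists are dropped from rows
    (a sketch -- only scalar columns are kept)."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            collect_arrays(value, tables, node=key)
    elif isinstance(obj, list):
        for item in obj:
            if isinstance(item, dict):
                tables.setdefault(node, []).append(
                    {k: v for k, v in item.items()
                     if not isinstance(v, (dict, list))})
                collect_arrays(item, tables, node=node)


def write_tables(tables, folder):
    """Write each collected node as <folder>/<node>.csv, with the
    header taken as the union of keys across the node's rows."""
    os.makedirs(folder, exist_ok=True)
    for name, rows in tables.items():
        fields = sorted({k for row in rows for k in row})
        with open(os.path.join(folder, f"{name}.csv"), "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fields)
            writer.writeheader()
            writer.writerows(rows)
```

Calling `write_tables` with an enumerated folder per chunk (e.g. `out/0001/`, `out/0002/`) yields the folder structure the next step merges.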
Merge the same-named CSV files from the different enumerated folders into a single CSV file in a separate location using the pandas `concat` function on the dataframes.
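The merge step can be sketched as follows, assuming the layout produced above (`<chunks_root>/<NNNN>/<node>.csv`; the function name and folder pattern are illustrative):

```python
import glob
import os

import pandas as pd


def merge_csvs(chunks_root, merged_dir):
    """Concatenate same-named CSVs from the enumerated chunk folders
    into one CSV per node name under `merged_dir`."""
    os.makedirs(merged_dir, exist_ok=True)
    # Group chunk CSV paths by their file name (i.e. the node name).
    by_name = {}
    for path in glob.glob(os.path.join(chunks_root, "*", "*.csv")):
        by_name.setdefault(os.path.basename(path), []).append(path)
    for name, paths in by_name.items():
        frames = [pd.read_csv(p) for p in sorted(paths)]
        pd.concat(frames, ignore_index=True).to_csv(
            os.path.join(merged_dir, name), index=False)
```

`ignore_index=True` renumbers rows across chunks so the merged frame has a clean index before writing.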
Upload the merged CSVs to Blob Storage using Python.
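A sketch of the upload step. The uploader is injected as a callable so the folder-walking logic stays testable; the commented-out Azure wiring assumes the `azure-storage-blob` package and a hypothetical container name:

```python
import os


def upload_folder(local_dir, upload_fn):
    """Upload every CSV in `local_dir` via `upload_fn(name, data)`;
    returns the names uploaded. `upload_fn` wraps the real storage
    client in production use."""
    uploaded = []
    for name in sorted(os.listdir(local_dir)):
        if name.endswith(".csv"):
            with open(os.path.join(local_dir, name), "rb") as f:
                upload_fn(name, f.read())
            uploaded.append(name)
    return uploaded


# Real use might look like this (assumes azure-storage-blob is
# installed; "json-csvs" is a hypothetical container name):
#
# from azure.storage.blob import BlobServiceClient
# service = BlobServiceClient.from_connection_string(conn_str)
# container = service.get_container_client("json-csvs")
# upload_folder("merged", lambda name, data:
#               container.upload_blob(name, data, overwrite=True))
```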
Populate the RDBMS tables from the Blob Storage CSVs using an ADF Copy activity or some other means.
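For the "some other means" path, a CSV can also be loaded directly with any DB-API driver. This sketch uses `sqlite3` purely as a stand-in for the target RDBMS; the function name is illustrative, and in practice column types would come from the target schema rather than defaulting to text:

```python
import csv
import sqlite3


def load_csv_to_table(csv_path, conn, table):
    """Create the table from the CSV header if needed, then bulk-insert
    all rows via executemany (sqlite3 stands in for the real RDBMS)."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        cols = ", ".join(f'"{c}"' for c in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
        conn.executemany(
            f'INSERT INTO "{table}" ({cols}) VALUES ({placeholders})',
            reader)
    conn.commit()
```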
For security reasons, however, we cannot share the actual code base here.