Semi-structured data is data that lacks a rigid structure and that does not conform directly to a data model, but that has tags, metadata, or elements that describe the data.

Examples of semi-structured data are JSON or XML files.
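As a small illustration of tags describing the data, the sketch below parses a made-up XML fragment with Python's standard library. The element names (`people`, `person`, `email`) and the email address are hypothetical. The point is only that each record carries its own field names, and that two records need not have the same fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: the tags name each field, and the second
# record has an extra <email> element that the first one lacks.
xml_text = """
<people>
  <person><id>1</id><name>Mary X</name></person>
  <person><id>2</id><name>John D</name><email>john@example.com</email></person>
</people>
"""

root = ET.fromstring(xml_text)

# Build one dict per <person>, using the tag names as field names.
records = [{child.tag: child.text for child in person} for person in root]

for record in records:
    print(sorted(record))
```

Nothing outside the document declares which fields exist; the field names are discovered from the tags while reading the data.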

Semi-structured data often contains enough information that it can be converted into structured data relatively easily.

JSON data embedded inside a string is an example of semi-structured data. The string contains all the information required to understand the structure of the data, but for the moment it is still just a string; it hasn’t been structured yet.

|          | data                          |
| -------- | ----------------------------- |
| Record 1 | '{"id": 1, "name": "Mary X"}' |
| Record 2 | '{"id": 2, "name": "John D"}' |

Because the data describes its own structure, converting it into structured data is often relatively straightforward. This conversion is commonly done during the transformation stage of an ETL (extract, transform, load) or ELT (extract, load, transform) process.
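A minimal sketch of such a transformation step in Python is shown below, assuming each record is a string holding valid JSON (double-quoted keys and values) with `id` and `name` fields. The names `raw_records` and `to_structured` are hypothetical and chosen only for illustration.

```python
import json

# Hypothetical raw column: each entry is still just a string,
# assumed here to hold valid JSON.
raw_records = [
    '{"id": 1, "name": "Mary X"}',
    '{"id": 2, "name": "John D"}',
]


def to_structured(raw: str) -> dict:
    """Parse one JSON string and keep only the expected, typed columns."""
    parsed = json.loads(raw)
    return {"id": int(parsed["id"]), "name": str(parsed["name"])}


# After this step every row has the same fields with known types.
rows = [to_structured(raw) for raw in raw_records]

for row in rows:
    print(row["id"], row["name"])
```

A real pipeline would also have to decide what to do with strings that fail to parse or that lack a field. This sketch simply lets the error propagate.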