Regulatory study data (e.g. SDTM/SEND tabulations and ADaM analysis datasets) are currently exchanged in SAS XPORT (XPT) format, the legacy transport format used by the FDA. Each dataset is submitted as a separate .xpt file (for example, dm.xpt, adsl.xpt) with an accompanying define.xml to describe metadata. See a previous blog article "Submit the Clinical Trial Datasets to FDA: Using the right .xpt file format" and FDA's "STUDY DATA TECHNICAL CONFORMANCE GUIDE Technical Specifications Document".
In April 2025, FDA issued a Federal Register notice, "Electronic Study Data Submission; Data Standards; Clinical Data Interchange Standards Consortium Dataset-JavaScript Object Notation; Request for Comments", stating that it is exploring CDISC’s Dataset-JSON (v1.1), a JSON-based schema, as a new exchange standard for study data, with the long-term potential to replace SAS XPT v5. The FDA is requesting public comment on adopting Dataset-JSON for future submissions. This report compares the JSON and XPT formats in the context of clinical data exchange and FDA submissions, covering their overviews, advantages/disadvantages, official regulatory stance, and practical sponsor considerations.
JSON Format (CDISC Dataset-JSON)
JSON (JavaScript Object Notation) is a text-based, human-readable data format widely used in web and health IT. For example, HL7’s FHIR standard commonly uses JSON for healthcare data exchange. CDISC’s Dataset-JSON (v1.1) is a JSON schema specifically designed to represent tabular clinical study data. It is part of the CDISC Operational Data Model (ODM) v2.0 framework and is open-source and machine-readable. By design, each Dataset-JSON dataset can include column values in JSON and can reference a CDISC define.xml document for full metadata, linking data values to variable definitions. This format supports both file-based and API-based exchange of data. In practice, a set of JSON “dataset” files (one per domain) can be packaged with a define.xml or delivered via web services. The format is schema-driven and extensible, meaning it can accommodate richer metadata and longer field names than legacy formats. FDA notes that Dataset-JSON is simple to implement, very stable, and “widely supported” across software platforms. Its use of JSON (Unicode text) makes it easy to parse with standard programming libraries (JavaScript, Python, R, etc.), and it aligns with modern data standards and the FDA’s Data Modernization goals.
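To make the "easy to parse with standard programming libraries" point concrete, here is a minimal Python sketch that reads a small Dataset-JSON-style document using only the standard library. The sample is a simplified illustration: the attribute names (columns, rows, records, itemOID, dataType) are modeled on the Dataset-JSON v1.1 layout as an assumption and should be checked against the CDISC specification, and load_records is a hypothetical helper, not part of any CDISC or FDA tool.

```python
import json

# Simplified, illustrative document. Attribute names are assumptions modeled
# on the Dataset-JSON v1.1 layout; verify them against the CDISC specification.
SAMPLE = """
{
  "datasetJSONVersion": "1.1.0",
  "name": "DM",
  "label": "Demographics",
  "records": 2,
  "columns": [
    {"itemOID": "IT.DM.USUBJID", "name": "USUBJID",
     "label": "Unique Subject Identifier", "dataType": "string"},
    {"itemOID": "IT.DM.AGE", "name": "AGE",
     "label": "Age", "dataType": "integer"}
  ],
  "rows": [
    ["STUDY1-001", 34],
    ["STUDY1-002", 58]
  ]
}
"""


def load_records(text):
    """Return the rows of a Dataset-JSON-style document as a list of dicts."""
    doc = json.loads(text)
    names = [col["name"] for col in doc["columns"]]
    rows = doc["rows"]
    # Cross-check the declared record count against the actual number of rows.
    if doc.get("records") is not None and doc["records"] != len(rows):
        raise ValueError("declared record count does not match number of rows")
    return [dict(zip(names, row)) for row in rows]


records = load_records(SAMPLE)
for record in records:
    print(record["USUBJID"], record["AGE"])
```

The column definitions travel with the data, so a reader can rebuild named records without consulting a separate file, while define.xml remains the place for full metadata.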
SAS XPORT (XPT) Format
The SAS XPORT (XPT) Transport Format v5 is the longstanding standard for FDA study data submission. XPT is a binary file format defined in the 1980s (SAS Technical Report TS-140) that encodes one dataset per file. In FDA submissions, each SDTM or ADaM dataset is delivered as an .xpt file (e.g. dm.xpt for demographics) along with a define.xml describing its variables. FDA’s guidance and catalogs explicitly support XPT v5: for example, a technical guide lists DM.xpt and ADSL.xpt as required files. The format is natively supported by SAS software (via PROC COPY with the XPORT LIBNAME engine) and by some third-party tools, ensuring that sponsors with SAS infrastructures can readily produce and consume it. However, XPT is not human-readable (it is binary) and has inherent limitations: under the v5 spec, variable names are limited to 8 characters, variable labels to 40 characters, and character values to 200 bytes, and there is no direct way to embed metadata (hence the separate define.xml). Because XPT v5 is a fixed, legacy format, it cannot represent nested or hierarchical data and requires separate metadata files. Despite these drawbacks, XPT is currently the required FDA exchange format for standardized study data; submissions that do not use FDA-approved formats (listed in the Data Standards Catalog) risk rejection.
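The XPT v5 size limits can be checked before export. The sketch below assumes the commonly cited v5 limits of 8-character variable names, 40-character labels, and 200-byte character values. The function name check_xpt_v5_limits and the variable AETERMLONG are hypothetical, chosen only for illustration.

```python
MAX_NAME_LEN = 8      # assumed XPT v5 limit on variable names
MAX_LABEL_LEN = 40    # assumed XPT v5 limit on variable labels
MAX_CHAR_BYTES = 200  # assumed XPT v5 limit on character values


def check_xpt_v5_limits(variables, rows):
    """Return a list of problems; an empty list means no limit was exceeded.

    variables: list of (name, label) tuples
    rows: list of dicts mapping variable name to value
    """
    problems = []
    for name, label in variables:
        if len(name) > MAX_NAME_LEN:
            problems.append(f"name too long: {name}")
        if len(label) > MAX_LABEL_LEN:
            problems.append(f"label too long for {name}")
    for index, row in enumerate(rows):
        for name, value in row.items():
            # Only character values are subject to the byte-length limit.
            if isinstance(value, str) and len(value.encode("utf-8")) > MAX_CHAR_BYTES:
                problems.append(
                    f"row {index}: value of {name} exceeds {MAX_CHAR_BYTES} bytes"
                )
    return problems


# Hypothetical example: one variable name and one value break the limits.
variables = [
    ("USUBJID", "Unique Subject Identifier"),
    ("AETERMLONG", "Reported Term for the Adverse Event"),
]
rows = [{"USUBJID": "STUDY1-001", "AETERMLONG": "x" * 250}]
issues = check_xpt_v5_limits(variables, rows)
for issue in issues:
    print(issue)
```

A check like this shows why long variable names or long free-text values have to be renamed, truncated, or split before they fit into an XPT v5 file.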
Advantages and Disadvantages
-
JSON advantages: JSON is a modern, widely-used exchange format. Dataset-JSON supports linking to define-XML and can include rich metadata within or alongside the data. It is text-based and open, so it can be parsed by virtually any software (not just SAS), and it naturally integrates with web and API workflows. FDA’s 2022 assessment found that JSON offers “smaller file sizes, additional metadata, and simpler processing” compared to legacy formats. Because it is extensible, JSON removes XPT’s old limitations on field lengths and formats, enabling future evolution of data standards. In the PhUSE pilot, sponsors noted potential for improved efficiency, hardware cost savings, and alignment with digital data ecosystems.
-
JSON disadvantages: Dataset-JSON is not yet standard for FDA submissions, so adopting it today would require regulatory discussions or waivers. Industry tooling is nascent: sponsors must develop or acquire new processes (for example, SAS can export JSON but may need custom mapping to CDISC JSON schema). The FDA notice explicitly solicits comments on “integration challenges with existing tools and systems,” reflecting concern that current CDMS/SDTM pipelines are geared to XPT. Managing two formats during a transition also adds complexity. Because JSON is text, very large numeric datasets might be bulkier uncompressed (though gzip can mitigate this). Finally, until FDA grants formal acceptance (which would require a new guidance), sponsors using JSON would be taking a risk.
-
XPT advantages: XPT is a proven, FDA-sanctioned format. All major clinical data tools (especially SAS) can readily produce XPT. Regulatory reviewers and submission systems are already built for it, so sponsors face no surprise validation issues. Using XPT ensures immediate compliance with FDA standards (as affirmed in guidance and the Data Standards Catalog). The process of creating .xpt files is well understood (e.g. using SAS PROC COPY with the XPORT LIBNAME engine), and many legacy datasets and analysis programs assume XPT input. XPT’s fixed format and single-table-per-file approach are simple and do not require on-the-fly schema negotiation. Long-term archiving of XPT files is routine (with define.xml), so sponsors have established practices for retention.
-
XPT disadvantages: XPT is technologically outdated. Its fixed schema (8-char names, etc.) and binary nature limit flexibility. It cannot easily accommodate new metadata or complex data types. Interoperability outside the SAS world is limited (one must use conversion tools). The format does not support streaming or API-based exchange, only static files. Because define.xml is separate, there is a risk of mismatches between data and metadata if not carefully managed. From an innovation standpoint, XPT is a single-version format (v5) with no path for evolving, so it is not aligned with modern data architectures (e.g. FHIR or big-data standards). Sponsors must also maintain SAS environments or rely on third-party readers, which may be a constraint for non-SAS shops.
Note that SAS provides a procedure (PROC JSON) for converting SAS data sets to JSON. Producing JSON output from SAS should therefore not be a major obstacle if data sets in JSON format are eventually required for submission, although the output still has to be mapped to the Dataset-JSON schema.
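The file-size concern raised under the JSON disadvantages can be checked empirically. The sketch below builds a synthetic, numeric-heavy dataset in a simplified Dataset-JSON-style structure (the structure and the build_dataset helper are illustrative assumptions), serializes it, and compares raw and gzip-compressed sizes. Actual ratios depend on the data, so no particular figure is implied.

```python
import gzip
import json


def build_dataset(n_rows):
    """Build a synthetic lab-style dataset (illustrative structure only)."""
    return {
        "name": "LB",
        "columns": [{"name": "USUBJID"}, {"name": "LBSEQ"}, {"name": "LBSTRESN"}],
        "rows": [
            [f"STUDY1-{i % 100:03d}", i, round(4.0 + (i % 50) * 0.1, 1)]
            for i in range(n_rows)
        ],
    }


# Compact separators avoid spending bytes on whitespace.
raw = json.dumps(build_dataset(10_000), separators=(",", ":")).encode("utf-8")
packed = gzip.compress(raw)

# Decompressing must return exactly the same content.
restored = json.loads(gzip.decompress(packed))

print(f"uncompressed: {len(raw)} bytes, gzip: {len(packed)} bytes")
```

Because repeated identifiers and similar numeric values compress well, gzip typically narrows the gap between text and binary representations, at the cost of a decompression step on the receiving side.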
FDA Policy and Future Adoption (Federal Register Context)
According to the recent Federal Register notice, the FDA is not yet changing requirements but is actively evaluating JSON as an option. The notice explains that CDER and CBER have already conducted a pilot (with CDISC and PhUSE) showing that Dataset-JSON “has the potential to serve as a transport file for study data”. Based on a 2022 assessment, the FDA found JSON to be the most promising modern format to potentially replace XPT v5. FDA explicitly states it is considering Dataset-JSON “with the long-term potential to replace SAS XPORT Format (XPT)” for eStudy data. The Agency is requesting comments on the benefits and risks of adopting JSON and on integration challenges with current tools.
Importantly, the notice does not immediately authorize use of JSON in submissions. Until any regulatory change is finalized, sponsors must continue using FDA-supported formats (i.e. XPT v5 files with define.xml) for study data. FDA will consider the public feedback before deciding. The notice indicates that if FDA does adopt Dataset-JSON, it will update its guidance documents (specifically the “Standardized Study Data” guidance implementing Section 745A(a)) to specify JSON as a permitted format. In summary: FDA’s official preference today remains XPT (v5), but a future shift to JSON is on the table pending the rulemaking process and guidance revisions.
Practical Considerations for Sponsors
-
Regulatory compliance: Sponsors should follow FDA’s current standards. Until JSON is explicitly allowed, electronic study data must use formats in FDA’s Data Standards Catalog (currently XPT v5 for tabulation/analysis data). Any use of JSON for a submission would require prior FDA agreement (e.g. a pilot protocol or waiver). Sponsors should monitor the comment process (comments due June 9, 2025) and watch for any updated guidance.
-
Data preparation: Most sponsors build SDTM/ADaM in SAS or similar tools. Producing XPT files is straightforward in that environment (for example, PROC COPY with the XPORT LIBNAME engine). Moving to Dataset-JSON would require developing new export routines or converters. SAS 9.4 can write JSON, but additional CDISC JSON schema mapping may be needed. Conversely, new entrants or CDS/non-SAS shops may find JSON easier since many analytics platforms (R, Python, etc.) parse JSON naturally. Either way, sponsors may need to invest in tool upgrades or staff training if and when JSON becomes accepted.
-
Long-term archiving and interoperability: JSON’s plain-text nature may benefit long-term data access (no proprietary format, easily versioned). On the other hand, XPT has a long track record for archiving and reusability within regulated drug development. Sponsors should plan how to store metadata (define.xml or embedded JSON schema) for whichever format they use.
-
Transition planning: FDA’s pilot (reported by CDER/CBER and industry) suggests promising results. Sponsors may consider participating in further testing or industry surveys to shape the outcome. They should factor potential future regulatory changes into their IT roadmaps. For example, new statistical or data warehouse systems could be chosen with JSON capabilities in mind. Stakeholders (data managers, statisticians, IT) should communicate so that any shift in format will be smooth (e.g. ensuring traceability between old and new-format data).
-
Resource impact: In the near term, maintaining support for XPT remains essential (the FDA is not dropping it yet). In the long term, shifting to JSON may lower costs (e.g. fewer hardware needs, quicker data processing as noted in industry pilots) but will require upfront effort. Sponsors should balance these factors and perhaps begin exploratory work (e.g. trial converting legacy XPT files to JSON) to assess any challenges ahead.
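The exploratory work suggested above (trial conversion of legacy data, with traceability between old and new formats) could begin with a round-trip check like the following sketch. It assumes the legacy records have already been read from an XPT file into a list of dictionaries, for instance with pandas.read_sas(path, format="xport"); that step is omitted here. The helper names and the simplified output structure are illustrative assumptions, not the official Dataset-JSON schema.

```python
import json


def to_dataset_json(name, label, columns, records):
    """Build a simplified Dataset-JSON-style document.

    columns: list of (variable name, label, data type) tuples
    records: list of dicts, one per row, keyed by variable name
    """
    return {
        "datasetJSONVersion": "1.1.0",
        "name": name,
        "label": label,
        "records": len(records),
        "columns": [
            {"itemOID": f"IT.{name}.{var}", "name": var, "label": lab, "dataType": typ}
            for var, lab, typ in columns
        ],
        # Missing values become None, which json.dumps writes as null.
        "rows": [[rec.get(var) for var, _, _ in columns] for rec in records],
    }


def from_dataset_json(doc):
    """Rebuild the list of row dicts from the simplified document."""
    names = [col["name"] for col in doc["columns"]]
    return [dict(zip(names, row)) for row in doc["rows"]]


# Stand-in for records read from a legacy XPT file (hypothetical values).
legacy = [
    {"USUBJID": "STUDY1-001", "AGE": 34, "SEX": "F"},
    {"USUBJID": "STUDY1-002", "AGE": None, "SEX": "M"},
]
columns = [
    ("USUBJID", "Unique Subject Identifier", "string"),
    ("AGE", "Age", "integer"),
    ("SEX", "Sex", "string"),
]

text = json.dumps(to_dataset_json("DM", "Demographics", columns, legacy), indent=2)
round_tripped = from_dataset_json(json.loads(text))

# Traceability check: the converted data must match the legacy records.
print("round trip matches:", round_tripped == legacy)
```

Running a comparison like this over every legacy dataset gives an early view of where conversion loses information, such as numeric precision, missing-value handling, or date representations.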
Trends and Upcoming Changes
The Federal Register notice signals a trend toward modernizing study data formats. JSON’s ubiquity in web and healthcare (e.g. FHIR) and its alignment with FDA’s Data Modernization Action Plan are strong drivers. CDISC’s release of Dataset-JSON v1.1 (Dec 2024) and ongoing PhUSE work show industry momentum. If JSON is adopted, expect a multi-year transition: FDA will announce any implementation timeline in future Federal Register updates (similar to how new CDISC versions are phased in). Internationally, regulators (like Health Canada or PMDA) may also follow FDA’s lead. In practice, sponsors should prepare for eventual co-existence of formats: for some time, both XPT and JSON may be permitted (with effective dates).
In summary, the immediate trend is that the FDA is open to modern data standards: it found JSON superior to alternatives (SAS XPT v8 or XML) in 2022. However, any concrete requirement change awaits the rulemaking process. Sponsors should stay informed, consider testing JSON internally, and be ready to meet whichever format the FDA ultimately endorses.
References:
- FDA, "Electronic Study Data Submission; Data Standards; Clinical Data Interchange Standards Consortium Dataset-JavaScript Object Notation; Request for Comments", Federal Register notice, April 9, 2025
- CDISC Dataset-JSON v1.1
- FDA STUDY DATA TECHNICAL CONFORMANCE GUIDE Technical Specifications Document
- SAS Documentation: JSON Procedure