Clinical Statistical Reporting in a Multilingual World

Project Scope

Discrepancies have been found between statistical analysis results produced in different programming languages, even in fully qualified statistical computing environments. They arise from subtle differences in the fundamental approaches each language implements, so results can differ while each remains correct in its own right. These differences cause unease for sponsor companies submitting to a regulatory agency, because it is uncertain whether the agency will view them as problematic. Understanding the agency's expectations will contribute significantly to the broader adoption of multiple programming languages in producing data submission packages for regulatory review.

The Clinical Statistical Reporting in a Multilingual World project seeks to clearly define this problem and to provide a framework for assessing the fundamental differences across languages for a particular statistical analysis. With such a framework, the risk of misinterpreting numerical differences that arise solely from the choice of programming language can be mitigated, instilling confidence in both the sponsor company and the agency during the review period. This will be accomplished by:

Identifying common statistical analyses performed for submissions, to narrow the scope of the search for discrepancies (e.g., continuous summaries, frequency counts, hazard models, bioequivalence testing, steady-state assessments, bioavailability testing, ANOVA)
Providing the documentation needed to achieve equivalent results between separate statistical analysis software packages/languages (where possible)
Evaluating and documenting differences in results between popular statistical analysis implementations as use cases
Providing sample code for the use cases through a publicly accessible code repository, for both review and reuse
Promoting the notion that the ‘right’ implementation of a particular statistical analysis should be based on sound statistical reasoning, and not limited by the capabilities or default settings of a specific programming language or statistical analysis software package

Project Leads	Email
Michael Rimler	michael.s.rimler@gsk.com
Mike Stackhouse	mike.stackhouse@atorusresearch.com
Lauren White (PHUSE Project Coordinator)	lauren@phuse.global

Objectives and Deliverables	Timelines
GitHub Repository documenting identified differences between statistical analysis implementations (based on R and SAS use cases as a starting point)	Q2 2021
Expand repository to provide comparable syntax across languages (based on R and SAS use cases as a starting point)	Q3 2021
Expand GitHub repository to incorporate Python and/or Julia	Q3 2021
White Paper providing framework for addressing language discrepancies in statistical analysis implementations, including specific use cases as examples	Q4 2021

CURRENT STATUS Q1 2021
Sub-teams are investigating potential discrepancies between SAS and R as groundwork for the comparison document.

Project Members	Organisation
Aiming Yang	Merck Sharp & Dohme
Amrit Singh	Bayer
Andy Miskell	Eli Lilly
Andy Nicholls	GSK
Brian Varney	Experis
Chung-kai Sun	Janssen Research & Development
Clara Beck	Chrestos
Doug Thompson	GSK
Harshal Khanolkar	Novo Nordisk
Joseph Rickert	RStudio
Karnika Dalal	Bayer
Kyle Lee	FDA
Ke Wang	Novartis
Michael Kane	Yale University
Matthew Kumar	Bayer
Mia Qi	Janssen Research & Development
Min-Hua Jen	Eli Lilly
Soren Lophaven	Omicron
Steve Walker	Experis
