Limb-Leaf Designs with Adaptive Exploration of the Dose

Contemporary Clinical Trials 64 (2018) 210–218
Contents lists available at ScienceDirect
Contemporary Clinical Trials
journal homepage: www.elsevier.com/locate/conclintrial

Limb-Leaf designs for adaptive exploration of the dose-response curve
John Spivack a,*, Bin Cheng b, Bruce Levin b
a Department of Population Health Science and Policy, Icahn School of Medicine at Mount Sinai, New York, New York 10029, USA
b Department of Biostatistics, Mailman School of Public Health, Columbia University, New York, New York 10032, USA

ARTICLE INFO
Keywords: Adaptive design; Dose-response curve; Nonmonotonicity; Dose addition; Closed testing principle

ABSTRACT
We propose a two-stage strategy, called the Limb-Leaf method, to explore the dose-response curve using dose promotion and addition in the context of adaptive seamless Phase II/III trials. Strong control of the overall type 1 familywise error rate of the proposed method is enforced by the closed testing principle. The design constants are determined to minimize the risk-adjusted expected total sample size while maintaining a target power. In the case of a nonmonotonic dose response curve where more doses are required to adequately explore the curve, substantial savings in sample size are achieved compared with a traditional strategy which offers only selection and promotion from among initial first stage doses.

1. Introduction
The traditional process of drug development generally consists of four phases: Phase 1, to find which doses can be tolerated, particularly the maximum tolerated dose (MTD); Phase 2, to determine the biological activity and adverse event rates of the tolerated doses; Phase 3, to determine efficacy of a selected dose; and Phase 4, after regulatory approval of the drug, as a review of safety and other long-term results. Traditionally, Phase 3 is run and analyzed independently of Phase 2, i.e., the Phase 2 results are not used in the final determination of efficacy.
One method of reducing the large costs in time, money, and patient exposure of this process is to merge phases together and to eliminate the gaps and delays between them. In general, a seamless design combines the objectives of multiple phases of the development process into a single trial. In particular, it is often possible to meet the objectives of Phases 2 and 3 within one less costly, combined study.
An adaptive seamless design is one that: (1) combines the objectives of different stages, (2) allows modification of the trial based on emerging data, and (3) is inferentially seamless in the sense that the final analysis combines data from before and after any adaptation while maintaining control over the type 1 error rate. A landmark two-stage adaptive seamless design was proposed by Thall et al. [19] (henceforth, the TSE Design). There a first stage is used to select the best (or apparently best) of several candidate treatments and the second stage focuses only on the selected treatment. Both stages include a control arm and the data from both stages are pooled for the final comparison between control and selected arm. This design in its original formulation applies only to binary outcomes; however, the TSE template is easily modified to other outcome distributions, for instance normal outcomes as described by Jennison and Turnbull [7,8]. An important generalization that includes multiple stages and the use of a test based on the score statistic was proposed by Stallard and Todd [16], and Todd and Stallard [20]. This method accommodates a general endpoint, which, for instance, could be normal, binary, or a time to an event.
Another route for the development of adaptive seamless designs has been through the adaptive P-value combination tests used by Bauer and Köhne [2]. This approach allows information from earlier stages to be combined with that of later stages, and treatment selection at adaptive interim analyses to be based on all previous information from inside and outside the trial. Midtrial modifications are possible without inflating the familywise type 1 error rate. The key ideas are: the construction of P-values with conditionally (sub)uniform distributions given the previous stages of the experiment, the pooling of evidence across stages using prespecified combination rules, and the use of a closed testing procedure to control the familywise error rate for the multiple hypotheses under study. The method of adaptive combination tests is very general and [14] shows how it includes group sequential tests, the two stage TSE design, and further generalizations as special cases.
2. A proposal for adaptive exploration
Experiments can benefit from a more structured exploration strategy when the dose-response curve is not assumed to be monotonic, particularly so in the case where it is still assumed to be unimodal. This possibility is especially plausible when dealing with a combined efficacy/toxicity or benefit/cost endpoint, or when dealing with a combination therapy such that no single ordering of the doses may be possible. We present the following example of a real-world setting in which such an exploration strategy might be applied.

* Corresponding author. E-mail address: [email protected] (J. Spivack).
http://dx.doi.org/10.1016/j.cct.2017.10.002 Received 12 June 2017; Received in revised form 27 September 2017; Accepted 4 October 2017 Available online 06 October 2017 1551-7144/ © 2017 Elsevier Inc. All rights reserved.

2.1. An example based on the QALS trial
The U.S. Food and Drug Administration has approved only one drug, Riluzole, for the treatment of the devastating neurodegenerative disease Amyotrophic Lateral Sclerosis (ALS). Benefits in patient function and survival are small and safety and tolerability of Riluzole especially in regard to liver toxicity are major considerations. There is hope that the progression of ALS may one day be slowed or stopped by new medications or combinations of drugs.
Methods of action of Riluzole or other possible therapies are not fully elucidated; they are complex and may involve multiple mechanisms as reported, for instance, by Hubert et al. [5], Noh et al. [15], and Beal et al. [3]. In such settings the expectation that an investigational drug would have a monotonic dose response curve with increasing patient benefit up to a well-defined maximum tolerated dose is especially problematic. Indeed an incorrect assumption of monotonicity in the dose response is a favored explanation for the failure of a recent major ALS study as reported by Ludolph and Jesse [12].
The QALS trial published by Kaufmann et al. [9] was undertaken to investigate high doses of Coenzyme Q10 as a possible therapy for ALS. It was not assumed that the dose response would be monotonic and toxicity was carefully monitored. The study was designed in two stages, a selection stage followed by a futility test, also known as a non-superiority test (see, e.g., [10]). Two doses of CoQ10 (1800 and 2700 mg per day) together with a placebo arm began the first stage of the study. After an interim analysis the apparently better performing dose was selected to continue and additional recruitment to that arm and the placebo arm took place in the second stage. The final test statistics involved data pooled over both stages under appropriate control for selection bias and type 1 error.
The outcome of the first stage of the trial was that the higher dose did better than the lower dose on the outcome measure (ALSFRSr, the ALS functional rating scale, revised) and had high tolerability. After continuation into the second stage the test statistic associated with this higher dose was nominally sufficient to avoid a declaration of futility. The investigators nevertheless did not consider the evidence promising enough to give it full endorsement. Further details are given in Kaufmann et al. [9].
Notwithstanding the importance of certain key differences of the QALS trial, especially its aim to test a futility hypothesis rather than a superiority hypothesis, it is easy to imagine that in this or a similar study, a conventional superiority hypothesis might instead be the goal. In such a case, for example, it might have been considered worthwhile to allow further exploration in the second stage around the apparently better first stage dose. Specifically, had there been an option to add higher doses beyond 2700 mg per day or to explore both above and below this dose level, this freedom and flexibility might have been attractive to investigators. It is perhaps possible that an efficacious dose might have been among those added, and good enough to earn a full endorsement. We would like to make such further options available to investigators by design in especially difficult disease areas like ALS.

2.2. Stagewise adaptive exploration
In exploring the dose-response relationship, particularly a nonmonotonic one, it is important to distinguish between $d^*$, a dose with the maximum possible effect, and $\hat{d}^*$, its estimated value. The corresponding effects of these doses, $\theta_{d^*}$ and $\theta_{\hat{d}^*}$, say, could be different in a meaningful way, with $\theta_{\hat{d}^*} < \theta_{d^*}$. Similarly, if $d^*$ denotes a dose with a given desired effect (not necessarily maximum), $|\theta_{\hat{d}^*} - \theta_{d^*}|$ could be undesirably large. On the other hand, at least heuristically, the more closely $\hat{d}^*$ approximates $d^*$ the greater the chance that the study will reject the global null hypothesis, $H_0: \theta_d \le 0$ for all doses $d$, and more importantly, the closer the final recommended dose will be to that which gives patients the desired (or maximum) benefit. These considerations provide strong motivation to better explore the dose-response curve in adaptive seamless designs.
One possibility for better exploration of a non-monotonic dose-response curve in a single-stage selection strategy like the TSE Design would be to include a large number of closely spaced doses in the first stage. However, there are reasons to expect performance to suffer with this approach. First, the true $d^*$ may be hard to identify because it will have many competitors, some with nonzero effects. Also, a large number of first stage patients will have to be randomized to areas of the dose response curve that are not relevant to the final recommendation. Second, under the global null hypothesis of no treatment effect at any dose, many patients will have been treated before an early stopping decision can be made. We note that under either hypothesis, treating an excessive number of patients with an ineffective treatment or with ineffective doses of an otherwise worthwhile treatment is ethically undesirable. Some of these issues are mentioned by Thall et al. [19], but they play less of a role in the case where there are only a few possible doses to consider with broad spacing between them.
In this paper we introduce a two-stage selection procedure called the “Limb-Leaf” design in which second stage doses not only are promoted from a modest number of first stage candidates but also may be added in response to first stage results. We aim to improve the estimation of $d^*$ and to use resources more efficiently, this being particularly so under the global null hypothesis, where the probability of early stopping should be large. Such promotion and addition decisions can be based on all the available information, including efficacy and toxicity, whether it comes from within the study or from an outside source.
Although the Limb-Leaf design we present here achieves pre-specified performance characteristics in the case of a non-monotonic dose-response curve, it actually comprises a more general approach to structured exploration of response functions.
3. Method
Below we assume without essential loss of generality that the data are normal with variance $\sigma^2$ known. Extensions to other data types are mentioned with further detail given in the Appendix.
3.1. The TSE method
The TSE design has two stages; the first stage assigns subjects to all candidate doses plus the control, and the second stage studies only the best performing dose from the first stage against the control. There is an option to stop for futility using a cutoff value after the first stage, and the final decision is made by whether the combined measure of effect of the selected treatment exceeds a second cutoff value.
A version of the TSE design using normal outcomes is described as follows. Let the test doses in the experiment be denoted as $d_1, \ldots, d_I$, with effects relative to control dose $d_0$ of $\theta_1, \ldots, \theta_I$. We assume that at either stage, the outcomes at a given dose $d_j$ are independent and identically distributed (i.i.d.) as a normal random variable with mean $\mu_j$ and variance $\sigma^2$, $j = 0, 1, \ldots, I$. The design proceeds in two stages:

Stage 1. Randomize $(I + 1)n_1$ patients equally to $d_0, d_1, \ldots, d_I$. Let $T_1 = \max_{1 \le i \le I} T_{1,i}$, where for each $i$, $T_{1,i} = (\bar{X}_{1,i} - \bar{X}_{1,0})/\sqrt{2\sigma^2}$, $\bar{X}_{1,0}$ is the sample mean for the control $d_0$, and $\bar{X}_{1,i}$ is the sample mean for dose $d_i$, $i = 1, \ldots, I$, at stage 1. If $T_1 > y_1$, then continue by selecting the treatment $d_{i^*}$ having the greatest observed effect, $T_{1,i^*}$, into a second stage. If $T_1 \le y_1$ then stop and accept $H_0$ of no effect on any dose.
Stage 2. Randomize $2n_2$ additional patients equally to $d_{i^*}$ and $d_0$. Let

$$T_2 = \frac{n_1}{n_1 + n_2} \cdot \frac{\bar{X}_{1,i^*} - \bar{X}_{1,0}}{\sqrt{2\sigma^2}} + \frac{n_2}{n_1 + n_2} \cdot \frac{\bar{X}_{2,i^*} - \bar{X}_{2,0}}{\sqrt{2\sigma^2}}.$$

If $T_2 > y_2$ then reject $H_{0,i^*}: \theta_{i^*} \le 0$ and conclude that $\theta_{i^*} > 0$; if $T_2 \le y_2$ then do not reject $H_{0,i^*}$.

Notably, the design can stop early for futility but allows a new treatment to be judged superior to the control only after a second stage, based upon data from 2(n1 + n2) patients. The design constants n1,n2,y1, and y2 are determined by minimizing the risk-adjusted expected total sample size and maintaining a target power under a least favorable dose-response scenario. More details will be given in Section 5.
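The two-stage TSE rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stage statistics are taken to be differences in sample means divided by $\sqrt{2\sigma^2}$, and the function names, argument layout, and sample means are hypothetical, not part of the original design specification.

```python
from math import sqrt


def tse_stage1(xbar, sigma, y1):
    """First-stage TSE rule for normal outcomes (illustrative sketch).

    xbar[0] is the control mean and xbar[i] the mean at dose d_i (i >= 1).
    Returns (selected dose index, or None if stopped for futility, and T_1).
    """
    scale = sqrt(2.0 * sigma ** 2)
    t1 = [(x - xbar[0]) / scale for x in xbar[1:]]
    best = max(range(len(t1)), key=t1.__getitem__)
    selected = best + 1 if t1[best] > y1 else None
    return selected, t1[best]


def tse_stage2(diff1, diff2, n1, n2, sigma, y2):
    """Pooled statistic T_2 and the final decision for the selected dose.

    diff1 and diff2 are the stage-1 and stage-2 differences in sample means
    (selected dose minus control). Returns (reject H_0 for that dose?, T_2).
    """
    scale = sqrt(2.0 * sigma ** 2)
    t2 = (n1 * diff1 + n2 * diff2) / ((n1 + n2) * scale)
    return t2 > y2, t2


# Hypothetical sample means, used only to exercise the functions.
selected, t1 = tse_stage1([0.0, 0.2, 1.0, 0.4], sigma=1.0, y1=0.575)
reject, t2 = tse_stage2(1.0, 0.8, n1=20, n2=5, sigma=1.0, y2=0.525)
```

Because $T_2$ weights each stage by its share of the total sample size, it is simply the pooled difference in sample means divided by $\sqrt{2\sigma^2}$, so no separate pooling step is needed.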

3.2. The Limb-Leaf design
3.2.1. Limb-Leaf structure and locatable effect
Let $D = \{d_1, \ldots, d_I\}$ be a prespecified collection of test doses to be investigated as in a TSE design. A Limb-Leaf structure for $D$ is a relabelling of the doses in $D$ as

$$D = \{L_1, l_{1,1}, \ldots, l_{1,m_1}; \ldots; L_K, l_{K,1}, \ldots, l_{K,m_K}\},$$

where for each “limb” dose $L_k$, $k = 1, \ldots, K$, there is an associated neighborhood also including $m_k$ leaf doses, $l_{k,1}, \ldots, l_{k,m_k}$. Let $\Theta_D$, called the dose-response configuration, be the collection of effects associated with Limb-Leaf system $D$. That is,

$$\Theta_D = \{\theta_{L_1}, \theta_{l_{1,1}}, \ldots, \theta_{l_{1,m_1}}; \ldots; \theta_{L_K}, \theta_{l_{K,1}}, \ldots, \theta_{l_{K,m_K}}\}.$$

Generally, the effects $\theta_{l_{k,1}}, \ldots, \theta_{l_{k,m_k}}$ on the respective leaves $l_{k,1}, \ldots, l_{k,m_k}$ would be assumed to be somewhat similar but not identical to that of their associated limb dose $L_k$.
In this paper, we assume that the dose-response configuration under investigation has a “locatable” effect, whose definition is as follows.
Definition 1. A dose-response configuration $\Theta_D$ is defined to have a locatable effect with respect to the Limb-Leaf System $D = \{L_1, l_{1,1}, \ldots, l_{1,m_1}; \ldots; L_K, l_{K,1}, \ldots, l_{K,m_K}\}$ and the response levels $\delta = (\delta_1, \delta_2, \delta_3)$, with $\delta_1 < \delta_2 < \delta_3$, where $\delta_3$ is considered to be the smallest desired level of effect, if the following two conditions hold:
1. The effects on each limb, $\theta_{L_k}$, for $k = 1, \ldots, K$, are either less than or equal to $\delta_1$, or greater than or equal to $\delta_2$, with at least one $k$ such that $\theta_{L_k} \ge \delta_2$.
2. For any limb $L_k$ with $\theta_{L_k} \ge \delta_2$, each effect in its neighborhood is either greater than or equal to $\delta_3$ or less than or equal to $\delta_2$, with at least one effect greater than or equal to $\delta_3$.
Intuitively, a dose-response configuration $\Theta_D$ with a locatable effect means that the shape of the dose response curve should permit exploration in stages, the first on a coarse level and the second for finer level adjustments, in order to successfully identify a dose of desired effect level $\delta_3$ or better.
Proposition 1. Any given dose-response configuration $\Theta_D$ has a locatable effect with respect to the Limb-Leaf System $D = \{L_1, l_{1,1}, \ldots, l_{1,m_1}; \ldots; L_K, l_{K,1}, \ldots, l_{K,m_K}\}$ and some vector of response levels $\delta = (\delta_1, \delta_2, \delta_3)$, with $\delta_1 < \delta_2 < \delta_3$.
The proof of this result is given in the Appendix. It follows that for prespecified $D$, $\Theta_D$, and $\delta$, failure to meet the definition of locatability may be considered as a misspecification of $\delta$, which is a useful perspective in examining the performance of the Limb-Leaf approach under violations of assumptions. This does not, however, mean that selection decisions for $D$ and $\delta$ are unimportant; a selection of $D$ and $\delta$ that are not only formally correct but appropriate to the underlying dose-response relationship is key for a design to have good performance characteristics. Guidance on how $D$ and $\delta$ may be chosen and sources of information for this decision are offered in Sections 5 and 6.
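The two conditions of Definition 1 are easy to check mechanically for a candidate configuration. The sketch below assumes that a limb's neighborhood consists of the limb itself together with its leaves, and it represents a configuration as a list of (limb effect, leaf effects) pairs; both the representation and the function name are illustrative choices rather than notation from the paper.

```python
def has_locatable_effect(config, delta):
    """Check Definition 1 for a dose-response configuration.

    config: list of (limb_effect, [leaf_effects]) pairs, one per limb.
    delta:  (delta1, delta2, delta3) with delta1 < delta2 < delta3.
    Assumption: a neighborhood is the limb together with its leaves.
    """
    d1, d2, d3 = delta
    if not d1 < d2 < d3:
        raise ValueError("delta must satisfy delta1 < delta2 < delta3")

    # Condition 1: no limb effect strictly between delta1 and delta2,
    # and at least one limb effect at or above delta2.
    if any(d1 < limb < d2 for limb, _ in config):
        return False
    if not any(limb >= d2 for limb, _ in config):
        return False

    # Condition 2: within the neighborhood of every limb at or above delta2,
    # no effect lies strictly between delta2 and delta3, and at least one
    # effect is at or above delta3.
    for limb, leaves in config:
        if limb >= d2:
            hood = [limb] + list(leaves)
            if any(d2 < e < d3 for e in hood):
                return False
            if not any(e >= d3 for e in hood):
                return False
    return True


# Hypothetical single-limb configurations with delta = (0.0, 0.2, 1.0).
delta = (0.0, 0.2, 1.0)
limb_case = has_locatable_effect([(1.0, [0.2, 0.2])], delta)
leaf_case = has_locatable_effect([(0.2, [0.2, 1.0])], delta)
bad_case = has_locatable_effect([(0.5, [0.5, 0.5])], delta)
```

In the last example the effects of 0.5 fall strictly between $\delta_2$ and $\delta_3$, so the configuration is not locatable for this particular $\delta$ even though, by Proposition 1, some other choice of $\delta$ would make it so.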

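3.2.2. The plan of a Limb-Leaf design
A Limb-Leaf method proceeds as follows:

Step 1. Prespecify a vector $c = (c_1, c_2)$, $c_1 < c_2$, representing different levels of a test statistic, and a vector of weights $(w_1, w_2)$ such that $w_1, w_2 \ge 0$, with $w_1^2 + w_2^2 = 1$.

Step 2. Randomize $n_{1L}$ patients to each limb and control. Let $\hat{\theta}_{1,L_k} = \bar{X}_{1,L_k} - \bar{X}_{1,L_0}$ denote the first stage estimate and

$$Z_{1,L_k} = \frac{\hat{\theta}_{1,L_k}}{\sqrt{2\sigma^2 / n_{1L}}}$$

denote the first stage test statistic of the effect of dose $L_k$, $k = 1, \ldots, K$. The limb with the greatest first stage statistic, denoted by $L_{k^*}$, will have an estimate denoted by $\hat{\theta}_{1,L_{k^*}}$ and a test statistic denoted by $Z_{1,L_{k^*}}$. Note that $k^*$ is a random variable.

Step 3. There are 3 possibilities.

(i) If $\hat{\theta}_{1,L_{k^*}} \le c_1$, then the study stops for futility.

(ii) If $c_1 < \hat{\theta}_{1,L_{k^*}} \le c_2$ the experiment continues to Stage 2 with $L_{k^*}$, its leaves $l_{k^*,1}, \ldots, l_{k^*,m_{k^*}}$, and the control dose $L_0$. Randomize $n_{2L}$ patients to each of $L_{k^*}$ and control $L_0$, and $n_{2l}$ patients to each of $l_{k^*,1}, \ldots, l_{k^*,m_{k^*}}$. Let

$$\hat{\theta}_{2,L_{k^*}} = \bar{X}_{2,L_{k^*}} - \bar{X}_{2,L_0}, \qquad Z_{2,L_{k^*}} = \frac{\hat{\theta}_{2,L_{k^*}}}{\sqrt{2\sigma^2 / n_{2L}}};$$

$$\hat{\theta}_{2,l_{k^*,l}} = \bar{X}_{2,l_{k^*,l}} - \bar{X}_{2,L_0} \quad \text{and} \quad Z_{2,l_{k^*,l}} = \frac{\hat{\theta}_{2,l_{k^*,l}}}{\sqrt{\sigma^2 / n_{2l} + \sigma^2 / n_{2L}}}, \quad l = 1, \ldots, m_{k^*};$$

and

$$\hat{\theta}^{\,\mathrm{pooled}}_{L_{k^*}} = \frac{n_{1L}\, \hat{\theta}_{1,L_{k^*}} + n_{2L}\, \hat{\theta}_{2,L_{k^*}}}{n_{1L} + n_{2L}}.$$

If $\hat{\theta}^{\,\mathrm{pooled}}_{L_{k^*}} = \max\{\hat{\theta}^{\,\mathrm{pooled}}_{L_{k^*}}, \hat{\theta}_{2,l_{k^*,l}},\ l = 1, \ldots, m_{k^*}\}$, then we reject $H_{0,L_{k^*}}: \theta_{L_{k^*}} \le 0$ and estimate $\hat{d}^*$ as $L_{k^*}$ if and only if

$$w_1 \Phi^{-1}\{F_K(Z_{1,L_{k^*}})\} + w_2 \Phi^{-1}\Big( \min_{0 \le k \le m_{k^*}} \big\{ G_{\mathrm{Limb}+k}[\max(Z_{2,L_{k^*}}, Z_{2,l_{k^*},(k)})] \big\} \Big) > z_\alpha.$$

Here $Z_{2,l_{k^*},(k)}$ is the $k$th smallest order statistic of $\{Z_{2,l_{k^*,1}}, \ldots, Z_{2,l_{k^*,m_{k^*}}}\}$ and it is understood that for $k = 0$ the maximum in the second term reduces to that over $Z_{2,L_{k^*}}$ alone. The function $F_K$ is the cdf of $\max\{Z_{1,L_1}, \ldots, Z_{1,L_K}\}$ under $H_0: \theta_{L_1} = \cdots = \theta_{L_K} = 0$; $G_{\mathrm{Limb}+k}$ is the cdf of $\max\{Z_{2,L_{k^*}}, Z_{2,l_{k^*,1}}, \ldots, Z_{2,l_{k^*,k}}\}$ under $H_0: \theta_{L_{k^*}} = \theta_{l_{k^*,1}} = \cdots = \theta_{l_{k^*,k}} = 0$, conditional on the present adaptation decision (ii); and $z_\alpha$ is the $(1 - \alpha) \times 100$-th percentile of the standard normal distribution.

If, for some $l^*$, $\hat{\theta}_{2,l_{k^*,l^*}} = \max\{\hat{\theta}^{\,\mathrm{pooled}}_{L_{k^*}}, \hat{\theta}_{2,l_{k^*,l}},\ l = 1, \ldots, m_{k^*}\}$, then we reject $H_{0,l_{k^*,l^*}}: \theta_{l_{k^*,l^*}} \le 0$ and claim $\hat{d}^* = l_{k^*,l^*}$ if and only if

(a) $w_1 \Phi^{-1}\{F_K(Z_{1,L_{k^*}})\} + w_2 \Phi^{-1}\{G_{\mathrm{Limb}+m_{k^*}}[\max(Z_{2,L_{k^*}}, Z_{2,l_{k^*,l^*}})]\} > z_\alpha$,
(b) $w_1 \Phi^{-1}\{\min_{1 \le k \le K-1}[F_k(Z_{1,L_{(k)}})]\} + w_2 \Phi^{-1}\{G_{m_{k^*}}(Z_{2,l_{k^*,l^*}})\} > z_\alpha$, and
(c) $\Phi^{-1}\{G_{m_{k^*}}(Z_{2,l_{k^*,l^*}})\} > z_\alpha$,

where $Z_{1,L_{(k)}}$ is the $k$th smallest order statistic of $\{Z_{1,L_1}, \ldots, Z_{1,L_K}\}$ and $G_{m_{k^*}}$ is the cdf of $\max\{Z_{2,l_{k^*,1}}, \ldots, Z_{2,l_{k^*,m_{k^*}}}\}$ under $H_0: \theta_{L_{k^*}} = \theta_{l_{k^*,1}} = \cdots = \theta_{l_{k^*,m_{k^*}}} = 0$, conditional on the present adaptation decision (ii).

(iii) If $\hat{\theta}_{1,L_{k^*}} > c_2$ then $L_{k^*}$ and the control dose only should proceed into the second stage. Randomize $n'_{2L}$ patients to $L_{k^*}$ and to the control. Let

$$Z_{2,L_{k^*}} = \frac{\bar{X}_{2,L_{k^*}} - \bar{X}_{2,L_0}}{\sqrt{2\sigma^2 / n'_{2L}}}.$$

We reject $H_{0,L_{k^*}}: \theta_{L_{k^*}} \le 0$ and claim $\hat{d}^* = L_{k^*}$ if and only if

$$w_1 \Phi^{-1}\{F_K(Z_{1,L_{k^*}})\} + w_2 Z_{2,L_{k^*}} > z_\alpha.$$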
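The first-stage part of this plan can be illustrated with a short Python sketch. It makes two assumptions that are not spelled out in the text: all first-stage arms have the same size, and the null cdf $F_K$ of the maximum of $K$ limb statistics sharing one control arm is evaluated by the one-dimensional integral $\int \phi(u)\,\Phi(\sqrt{2}z + u)^K\,du$, which follows from conditioning on the standardized control mean. Function names, the trapezoid rule, and the sample means are illustrative.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def cdf_max_vs_shared_control(z, k, grid=2001, span=8.0):
    """Null cdf of the maximum of k statistics that share one control arm.

    Assumes equal arm sizes, so that each statistic is (A_j - A_0) / sqrt(2)
    with A_0, A_1, ..., A_k i.i.d. standard normal. Conditioning on A_0 = u
    gives P(max <= z) = integral of phi(u) * Phi(sqrt(2) z + u)^k du,
    evaluated here with a trapezoid rule on [-span, span].
    """
    h = 2.0 * span / (grid - 1)
    total = 0.0
    for j in range(grid):
        u = -span + j * h
        weight = 0.5 if j in (0, grid - 1) else 1.0
        total += weight * STD.pdf(u) * STD.cdf(sqrt(2.0) * z + u) ** k
    return total * h


def limb_leaf_stage1(xbar_limbs, xbar_control, n1L, sigma, c1, c2):
    """Select the best limb and choose the adaptation branch of Step 3."""
    estimates = [x - xbar_control for x in xbar_limbs]
    k_star = max(range(len(estimates)), key=estimates.__getitem__)
    theta_hat = estimates[k_star]
    z1 = theta_hat / sqrt(2.0 * sigma ** 2 / n1L)
    if theta_hat <= c1:
        branch = "stop for futility"      # case (i)
    elif theta_hat <= c2:
        branch = "limb and leaves"        # case (ii)
    else:
        branch = "limb only"              # case (iii)
    return k_star, theta_hat, z1, branch


# Hypothetical first-stage means for two limbs and a control.
k_star, theta_hat, z1, branch = limb_leaf_stage1(
    [0.3, 0.9], 0.1, n1L=13, sigma=1.0, c1=-0.475, c2=1.275)
first_stage_term = STD.inv_cdf(cdf_max_vs_shared_control(z1, k=2))
```

The returned `k_star` is zero-based. The conditional second-stage cdfs $G$ depend on the adaptation actually taken and on unequal arm sizes, so they are deliberately left out of this sketch.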

We note that the selection of the final second stage dose using $\hat{\theta}^{\,\mathrm{pooled}}_{L_{k^*}}$ is consistent with the seamless quality of the design: information across both stages is used for this decision. Hypothesis testing, however, may not be done directly in terms of $\hat{\theta}^{\,\mathrm{pooled}}_{L_{k^*}}$. Stagewise combination rules, as described in the next section, are needed to make valid inference.
The Limb-Leaf design is not limited to normal outcomes with known variance. Extensions to practical cases such as those of a large sample with consistently estimated nuisance parameters, and normally distributed data with unknown variance are presented in the Appendix. The design extends similarly to cases such as that of testing a location-shift hypothesis for continuous distributions using rank sum tests, and that of testing a difference in proportions for binary data using exact tests. These will be presented in later work.

3.2.3. Closed testing procedure and justification of the Limb-Leaf method
The search strategy outlined above entails selection over multiple dose levels, which calls for control of the familywise error rate (FWER) with respect to the collection of hypotheses $\{H_{0,d}: \theta_d \le 0,\ d \in D\}$, where FWER is the probability of rejecting any true null hypothesis. Control of the FWER for a given procedure means that FWER $\le \alpha$ where the significance level $\alpha$ is prespecified. Furthermore, as argued by Tamhane et al. [18] among others, the appropriate form of familywise type 1 error control should be strong control, such that the FWER $\le \alpha$ regardless of which hypotheses or how many hypotheses from the collection $\{H_{0,d}: \theta_d \le 0,\ d \in D\}$ are true. In other words we require that given a Limb-Leaf System $D$, FWER$(\Theta_D) \le \alpha$ regardless of the shape of the underlying dose-response configuration $\Theta_D$. We emphasize that in a design that allows addition of doses as well as their promotion, control of the FWER only under the global null (weak control) does not imply the needed strong FWER control and is insufficient.
The closed testing procedure of Marcus et al. [13] guarantees strong control of the FWER with respect to a prespecified family of hypotheses which is closed under intersection. For a given set of hypotheses $\{H_{0,j}: j = 1, \ldots, J\}$ the construction is as follows: For each subset $S$ of $\{1, \ldots, J\}$ define the intersection hypothesis $H_{0,S} = \cap_{j \in S} H_{0,j}$ with corresponding level $\alpha$ test $\phi_S$ assumed to exist for each $S$. Then the closed testing principle requires that any hypothesis $H_{0,j}$ be rejected if and only if $H_{0,S}$ is rejected by $\phi_S$ for every set $S$ that contains $j$. The proof of strong control of the FWER is immediate: Let $S^*$ be the set of the indices of all true null hypotheses and assume $S^*$ is non-empty (or else there is nothing to prove). For a familywise error to be committed, $H_{0,S^*}$ must be rejected at level $\alpha$, which occurs with probability no greater than $\alpha$.
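The closed testing principle itself is independent of the particular local tests, which the following Python sketch makes explicit. The function `closed_testing` is a generic illustration; the Bonferroni local test and the p-values used to exercise it are stand-ins chosen for simplicity and are not the combination tests of the Limb-Leaf design.

```python
from itertools import combinations


def closed_testing(hypotheses, local_test):
    """Return the elementary hypotheses rejected by closed testing.

    hypotheses: iterable of hashable labels for the elementary hypotheses.
    local_test: function taking a frozenset S of labels and returning True
                when the level-alpha test of the intersection H_{0,S} rejects.
    An elementary hypothesis is rejected only if every intersection
    hypothesis containing it is rejected by its local test.
    """
    hypotheses = list(hypotheses)
    decisions = {}
    for size in range(1, len(hypotheses) + 1):
        for combo in combinations(hypotheses, size):
            subset = frozenset(combo)
            decisions[subset] = bool(local_test(subset))
    return {
        h for h in hypotheses
        if all(rej for subset, rej in decisions.items() if h in subset)
    }


# Illustration with a Bonferroni local test (an assumption made here for
# simplicity) and hypothetical p-values for one limb and two leaves.
p_values = {"L1": 0.01, "l11": 0.04, "l12": 0.30}
alpha = 0.05


def bonferroni(subset):
    return min(p_values[h] for h in subset) <= alpha / len(subset)


rejected = closed_testing(p_values, bonferroni)
```

With these inputs only the limb hypothesis is rejected: the leaf with p-value 0.04 passes its own test but fails the intersection with the other leaf, which is exactly the protection that strong FWER control requires.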
The Limb-Leaf procedure uses the following tests for hypotheses $H_{0,D}$. If $D$ contains any limb, then reject the null if and only if

$$Z_D = w_1 \Phi^{-1}\{F_D(\max_{d \in D}\{Z_{1,d}\})\} + w_2 \Phi^{-1}\{G_D(\max_{d \in D}\{Z_{2,d}\})\} \ge z_\alpha.$$

Otherwise, reject the null if and only if

$$Z_D = \Phi^{-1}\{G_D(\max_{d \in D}\{Z_{2,d}\})\} \ge z_\alpha.$$

Here $Z_{i,d}$ is the test statistic for $H_{0,d}: \theta_d \le 0$ based on the $i$th stage data from dose $d$ and control if dose $d$ appears at Stage $i$, $i = 1, 2$. $F_D$ is the cdf of $\max_{d \in D}\{Z_{1,d}\}$ under $\theta_d = 0$, $d \in D$, and $G_D$ is the cdf of $\max_{d \in D}\{Z_{2,d}\}$ under $\theta_d = 0$, $d \in D$, conditional on the first stage data and thus on adaptation decisions based on them. If no dose $d \in D$ appears in the second stage, we set $Z_D \equiv -\infty$, as their corresponding doses have been deemed irrelevant after review of first stage data. It is emphasized that $w_1$ and $w_2$, $w_1^2 + w_2^2 = 1$, must be prespecified at the design stage in order for the test to be valid, and that the second form given is a special case of the first with $w_1 = 0$.
The following result provides a theoretical justification of the Limb-Leaf procedure. The proof is given in the Appendix.
Theorem 1. For any dose-response configuration $\Theta_D$, the Limb-Leaf method specified in Section 3.2.2 yields strong control of the FWER at level $\alpha$.
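The weighted inverse-normal combination used in these tests can be written compactly in Python. In the sketch below the inputs `u1` and `u2` stand for the already-computed cdf values $F_D(\cdot)$ and $G_D(\cdot)$; how those cdfs are obtained is outside the scope of the example, and the function names and numerical inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def combination_statistic(u1, u2, w1):
    """Weighted inverse-normal combination of two stagewise cdf values.

    u1 and u2 must lie strictly between 0 and 1 (NormalDist.inv_cdf raises
    StatisticsError otherwise). w2 is implied by w1^2 + w2^2 = 1.
    """
    if not 0.0 <= w1 <= 1.0:
        raise ValueError("w1 must lie in [0, 1]")
    w2 = sqrt(1.0 - w1 ** 2)
    return w1 * STD.inv_cdf(u1) + w2 * STD.inv_cdf(u2)


def rejects(u1, u2, w1, alpha):
    """Reject the intersection hypothesis when Z_D >= z_alpha."""
    return combination_statistic(u1, u2, w1) >= STD.inv_cdf(1.0 - alpha)


# Hypothetical stagewise cdf values with equal weights on the two stages.
strong_evidence = rejects(0.90, 0.99, w1=sqrt(0.5), alpha=0.025)
weak_evidence = rejects(0.50, 0.90, w1=sqrt(0.5), alpha=0.025)
```

Setting `w1=0.0` recovers the second form of the test, which uses second-stage data only.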

Further flexibility in the Limb-Leaf design may be allowed. Selection of a dose other than the best performing at either stage does not undermine the validity of the familywise type 1 error rate control but may incur a power penalty that can be partly offset by a more complicated evaluation of all relevant intersection hypotheses. Dose selections can incorporate other information such as toxicity, cost or external data. Interim re-calculation of sample sizes, for instance to increase conditional power, would also be possible without undermining the validity of the trial. Finally, another potentially important type of flexibility would be in the definition of leaf doses; their exact dose levels or other aspects of their formulations could be finalized at the interim analysis without compromising the conditional sub-uniform distribution of their associated P-values and the procedure's validity.
The validity of the Limb-Leaf design under other types of data is addressed by Theorem 2, also proved in the Appendix.
Theorem 2. The Limb-Leaf design extends to practical cases such as: 1) Normally distributed data with unknown variance $\sigma^2$, and 2) Large sample inference with consistently estimated nuisance parameters.

4. Power and optimization

For a given Limb-Leaf System D and a vector of effect thresholds δ, we must prespecify the parameters n1L,n2L,n2L′,n2l, c = (c1,c2), and weight w1. To do this we minimize a risk-adjusted expected sample size under constraints on the power to identify and confirm an effect under specified alternatives.

4.1. Unfavorable configurations

Similar to Thall et al. [19], we choose certain unfavorable configurations (test configurations) in which we want to enforce adequate power. These are instances of locatable effects with respect to $\delta$ that are unfavorable to identification and confirmation of an effect on the correct target. One such configuration, the unfavorable limb effect configuration, $\Theta_{\mathrm{Limb}}$, would satisfy the conditions: $\theta_{L_k} = \theta_{l_{k,1}} = \cdots = \theta_{l_{k,m_k}} = \delta_1$ for $k \ne k^*$, $\theta_{L_{k^*}} = \delta_3$, and $\theta_{l_{k^*,1}} = \cdots = \theta_{l_{k^*,m_{k^*}}} = \delta_2$ for some limb dose $L_{k^*}$. Analogously, we define an unfavorable leaf effect configuration $\Theta_{\mathrm{Leaf}}$ as satisfying the conditions $\theta_{L_k} = \theta_{l_{k,1}} = \cdots = \theta_{l_{k,m_k}} = \delta_1$ for $k \ne k^*$, $\theta_{L_{k^*}} = \delta_2$, $\theta_{l_{k^*,l^*}} = \delta_3$, and $\theta_{l_{k^*,l}} = \delta_2$ for $l \ne l^*$, for some limb dose $L_{k^*}$ and leaf dose $l_{k^*,l^*}$.

The intuition of these choices is clear but a full characterization of the configurations that minimize the power would depend on assumptions concerning the construction of stagewise tests, combination rules, and interim dose selection rules. We do not address a full solution in this paper. Thus we call $\Theta_{\mathrm{Limb}}$ and $\Theta_{\mathrm{Leaf}}$ “unfavorable” rather than “least favorable” configurations.

4.2. Power and risk adjusted expected sample size

Let the risk-adjusted expected sample size, a form of Bayes' risk, be defined as
Eπ (N ) = π0 E0 (N ) + πΘLimb EΘLimb (N ) + πΘLeaf EΘLeaf (N ),
where E0 (N ), EΘLimb (N ), and EΘLeaf (N ) denote the expected total sample sizes under the global null, unfavorable limb, and unfavorable leaf configurations, respectively, and π0, πΘLimb, and πΘLeaf denote their associated prior probabilities. Conservative values for early to mid-stage drug development would be π0 = 0.8, πΘLimb = 0.1, and πΘLeaf = 0.1, for instance.
The design constants n1L,n2L,n2L′,n2l, c = (c1,c2), and w1 are selected to minimize Eπ(N), subject to power constraints
PΘLimb (Confirm the treatment effect on limb Lk*) ≥ 1 − β,
and
PΘLeaf (Confirm the treatment effect on leaf lk*,l*) ≥ 1 − β.


In general a nonlinear optimization procedure is required to find a possibly non-unique optimizer, or an approximate minimum can be found by a grid search over potential parameter values.
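A grid search of this kind can be sketched in Python as follows. The functions `expected_sizes_fn` and `power_fn` are placeholders that a user would have to supply (for example by simulation); the toy versions shown at the end are purely hypothetical and exist only so that the example runs.

```python
from itertools import product


def risk_adjusted_size(expected_sizes, priors):
    """Prior-weighted expected total sample size, E_pi(N).

    expected_sizes and priors are dicts keyed by configuration name.
    """
    if abs(sum(priors.values()) - 1.0) > 1e-9:
        raise ValueError("prior probabilities must sum to 1")
    return sum(priors[name] * expected_sizes[name] for name in priors)


def grid_search(grid, expected_sizes_fn, power_fn, priors, target_power):
    """Exhaustive search for the design minimizing E_pi(N) under power limits.

    grid: dict mapping each design constant to its candidate values.
    expected_sizes_fn(design) -> dict of expected sizes per configuration.
    power_fn(design) -> iterable of powers, one per power constraint.
    Returns (risk, design) for the best feasible design, or None.
    """
    names = list(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        design = dict(zip(names, values))
        if min(power_fn(design)) < target_power:
            continue  # violates at least one power constraint
        risk = risk_adjusted_size(expected_sizes_fn(design), priors)
        if best is None or risk < best[0]:
            best = (risk, design)
    return best


# Toy, purely hypothetical stand-ins for the size and power calculations.
priors = {"null": 0.8, "limb": 0.1, "leaf": 0.1}
toy_sizes = lambda d: {name: 2.0 * (d["n1"] + d["n2"]) for name in priors}
toy_power = lambda d: [min(1.0, (d["n1"] + d["n2"]) / 50.0)]
best = grid_search({"n1": [10, 20, 30], "n2": [10, 20, 30]},
                   toy_sizes, toy_power, priors, target_power=0.9)
```

Ties are resolved in favor of the first design encountered, which reflects the remark above that the optimizer need not be unique.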
5. Simulation studies
We consider two schemas for comparison of a Limb-Leaf approach with a traditional TSE-style design. Both schemas are simple cases that could be used in practice. We compare performance characteristics over ranges of the δ1,δ2,δ3 parameters. In addition, robustness of the Limb-Leaf type design is considered in the following subsection. Without loss of generality we assume that the outcome of a patient on dose d is distributed as N(θd,1), and the outcome of a control patient is distributed as N(0,1). All optimizations were conducted using computationally intensive grid-search algorithms. More efficient computational methods under development will be described in Section 6.

Table 1. TSE design constants under Schema A.

δ1     δ2     δ3     n1     n2     y1       y2       Eπ(NTSE)
0.0    0.2    1.0    20     5      0.575    0.525    82.7
0.2    0.4    1.0    20     10     0.500    0.500    125.0
0.4    0.6    1.0    40     5      0.625    0.125    162.1
0.6    0.8    1.0    125    5      0.700    0.300    502.0

Table 2. Limb-Leaf design constants under Schema A.

δ1     δ2     δ3     n1L    n2L    n2L′   n2l    c1        c2       w1       Eπ(NLL)
0.0    0.2    1.0    13     33     27     15     −0.475    1.275    0.025    132.6
0.2    0.4    1.0    12     34     35     5      −0.350    1.500    0.150    138.8
0.4    0.6    1.0    14     34     41     30     −0.150    1.700    0.100    136.8
0.6    0.8    1.0    35     107    154    24     0.275     1.925    0.275    225.0

5.1. Schema A: A single limb
One of the simplest possible Limb-Leaf schemas (henceforth, “Schema A”) is as follows. There are 3 testing doses of interest, organized as: limb L1 and leaves l1,1,l1,2. Stage 1 compares L1 to control L0 where each arm would have sample size n1L. Depending on the results of this first stage comparison, we might: i) terminate the study, ii) continue the study with further recruitment to L0 and L1 of n2L′ subjects each in the second stage, or iii) continue with recruitment of n2L subjects each to L0 and L1 as well as n2l subjects each to leaves l1,1 and l1,2. In case iii) we will allow selection of the optimal dose by comparison of overall sample means; although the inclusion of the first stage subjects in the sample mean of L1 is not completely unbiased, in practice this information may improve the probability of correct selection of the best dose. Evaluation of the results would be by the closed testing procedure utilizing pre-specified combination tests as described in Section 3. This could correspond, for instance, to a situation where investigators want the option to introduce interim modifications of their initial test dose L1 in case first stage results lead them to believe that either an increment or a decrement to the dose level might be necessary to optimize performance.
The corresponding TSE design would include L0 and L1,l1,1,l1,2 as first stage doses, each with sample size n1. If the first stage result of the best performing test dose $\hat{d}^*$ exceeds the preset threshold y1, $\hat{d}^*$ would continue recruitment in the second stage along with control dose L0 with n2 subjects per arm. The efficacy of $\hat{d}^*$ relative to L0 would then be determined by whether the overall sample mean of subjects assigned to $\hat{d}^*$ exceeds the pre-specified y2.
The design constants to be set for the Limb-Leaf design are then n1L,n2L,n2L′,n2l, c = (c1,c2), as well as the combination rule weights w1 and w2. Their values will be determined to minimize Eπ(N) defined in Section 4.2 with π0 = 0.8, πΘLimb = 0.1, and πΘLeaf = 0.1. Specifically, for given δ = (δ1,δ2,δ3), we let ΘLimb be given by {θL1 = δ3, θl1,1 = δ2, θl1,2 = δ2}, and ΘLeaf be given by {θL1 = δ2, θl1,1 = δ2, θl1,2 = δ3}. Numerical optimization of the criterion Eπ(N) subject to constraints of 0.9 power in both ΘLimb and ΘLeaf configurations determines the design constants.
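For Schema A the expected total sample size has a closed form, because the adaptation depends only on the first-stage estimate for the single limb. The Python sketch below assumes unit variance, futility stopping when that estimate is at most c1, the limb-plus-leaves continuation when it lies in (c1, c2], and the limb-only continuation when it exceeds c2; the design constants in the example are hypothetical and are not taken from the tables.

```python
from math import sqrt
from statistics import NormalDist


def schema_a_expected_n(theta_limb, n1L, n2L, n2L_prime, n2l, c1, c2,
                        sigma=1.0):
    """Expected total sample size for Schema A at a given limb effect.

    The first-stage estimate is N(theta_limb, 2 sigma^2 / n1L). Stage 1 uses
    2 * n1L patients; the limb-plus-leaves branch adds 2 * n2L + 2 * n2l
    (two leaves), and the limb-only branch adds 2 * n2L_prime.
    """
    estimate = NormalDist(mu=theta_limb, sigma=sigma * sqrt(2.0 / n1L))
    p_stop = estimate.cdf(c1)
    p_limb_only = 1.0 - estimate.cdf(c2)
    p_leaves = 1.0 - p_stop - p_limb_only
    return (2 * n1L
            + p_leaves * (2 * n2L + 2 * n2l)
            + p_limb_only * (2 * n2L_prime))


def schema_a_risk(delta2, delta3, **design):
    """Risk-adjusted expected size with priors 0.8, 0.1, 0.1.

    In Schema A the limb effect is 0 under the global null, delta3 under the
    unfavorable limb configuration and delta2 under the unfavorable leaf
    configuration.
    """
    return (0.8 * schema_a_expected_n(0.0, **design)
            + 0.1 * schema_a_expected_n(delta3, **design)
            + 0.1 * schema_a_expected_n(delta2, **design))


# Hypothetical design constants, chosen only to exercise the functions.
design = dict(n1L=10, n2L=20, n2L_prime=15, n2l=10, c1=0.0, c2=1.0)
e0 = schema_a_expected_n(0.0, **design)
risk = schema_a_risk(delta2=0.2, delta3=1.0, **design)
```

The power constraints, which require the stagewise combination tests, are not covered by this sketch; it addresses only the objective function of the optimization.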
The corresponding TSE-type design will require the design constants n1,n2,y1, and y2. Since the TSE-type design does not recognize a distinction between limb and leaf doses, both ΘLimb and ΘLeaf may be expressed as ΘTSE, given by {θL1 = δ3, θl1,1 = δ2, θl1,2 = δ2}. The previous criterion, Eπ(N), then reduces to 0.8E0 (N ) + 0.2EΘTSE (N ). The power constraints similarly reduce to the single restriction of 0.9 power in the ΘTSE configuration.
Below we present two tables of optimized parameter values and performance characteristics for the two designs over ranges of δ = (δ1,δ2,δ3) for Schema A (Tables 1 and 2). Where this simple case assigns the same values of ΘLimb,ΘLeaf, and ΘTSE for several values of δ, the optimization is unaffected by which one is chosen and the results are combined.
5.2. Schema B: Two limbs
A slightly more complex Limb-Leaf schema could include 6 doses of interest: limbs L1 and L2, each with two associated leaves, l1,1, l1,2 and l2,1, l2,2, respectively. Stage 1 includes L1, L2, and control L0, where each arm would have sample size n1L. Depending on the results of this first stage comparison, we might: i) terminate the study, ii) continue the study with further recruitment to L0 and Lk* (the best performing limb in the first stage) of n2L′ subjects each in the second stage, or iii) continue with recruitment of n2L subjects each to L0 and Lk* as well as n2l subjects each to leaves lk*,1 and lk*,2. In case iii) we again allow final selection of the optimal dose by comparison of overall sample means, possibly across both stages. The results would be evaluated by the closed testing procedure using pre-specified combination tests as described in Section 3. This could correspond, for instance, to a situation where investigators have a prior belief that one of two chosen doses may provide sufficient efficacy, but also want the option to explore further around the dose level that appears promising in order to fine-tune towards an optimal dose.
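The three interim outcomes and their sample size consequences can be sketched as follows. The mapping of outcomes to thresholds is partly an assumption: the proof of Theorem 1 in Appendix A indicates that leaves are added when c1 < θ̂ ≤ c2 and that the limb continues alone when θ̂ > c2, while termination at θ̂ ≤ c1 is inferred here rather than quoted. The function names are hypothetical, and the constants in the usage lines are taken from the Schema B Limb-Leaf design constants for δ = (0.0, 0.6, 1.0) (Table 4).

```python
def interim_decision(theta_hat, c1, c2):
    """Classify the first-stage estimate of the best limb (assumed rule)."""
    if theta_hat <= c1:
        return "terminate"          # outcome i), inferred
    if theta_hat <= c2:
        return "limb_plus_leaves"   # outcome iii): c1 < theta_hat <= c2
    return "limb_only"              # outcome ii): theta_hat > c2


def total_sample_size(decision, n1L, n2L, n2L_prime, n2l, n_limbs=2, n_leaves=2):
    """Total subjects recruited over both stages under Schema B."""
    stage1 = (n_limbs + 1) * n1L                 # all limbs plus control L0
    if decision == "terminate":
        return stage1
    if decision == "limb_only":
        return stage1 + 2 * n2L_prime            # L0 and the selected limb
    return stage1 + 2 * n2L + n_leaves * n2l     # L0, limb, and its leaves


# Constants for delta = (0.0, 0.6, 1.0) under Schema B.
constants = dict(n1L=28, n2L=36, n2L_prime=48, n2l=31)
sizes = {
    d: total_sample_size(d, **constants)
    for d in ("terminate", "limb_only", "limb_plus_leaves")
}
```

Under these constants the three branches recruit 84, 180, and 218 subjects respectively, which brackets the reported risk-adjusted expectation of 178.7 for that row.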
The corresponding TSE design includes L0, L1, l1,1, l1,2, L2, l2,1, and l2,2 as first stage doses, each with sample size n1. If the first stage result of the best performing test dose d̂* exceeds the preset threshold y1, d̂* would continue recruitment in the second stage along with control dose L0, with n2 subjects per arm. The efficacy of d̂* relative to L0 would then be determined by whether the overall sample mean of subjects assigned to d̂* exceeds the pre-specified y2.
The design constants to be set for the Limb-Leaf design are, as before, n1L, n2L, n2L′, n2l, c = (c1, c2), and the weights w1 and w2. We minimize Eπ(N), defined in Section 4.2, with π0 = 0.8, πΘLimb = 0.1, and πΘLeaf = 0.1. In this case, for given δ = (δ1, δ2, δ3), we let ΘLimb be given by
{θL1 = δ1, θl1,1 = δ1, θl1,2 = δ1, θL2 = δ3, θl2,1 = δ2, θl2,2 = δ2},
and ΘLeaf be given by
{θL1 = δ1, θl1,1 = δ1, θl1,2 = δ1, θL2 = δ2, θl2,1 = δ2, θl2,2 = δ3}.
Numerical optimization of the criterion Eπ(N) subject to constraints of 0.9 power in both the ΘLimb and ΘLeaf configurations determines the design constants.
The corresponding TSE-type design constants n1, n2, y1, and y2 are set as before. The previous criterion Eπ(N) then reduces to 0.8E0(N) + 0.2EΘTSE(N), where ΘTSE is either ΘLimb or ΘLeaf given in the previous paragraph. The power constraints similarly reduce to the single restriction of 0.9 power in the ΘTSE configuration. Numerical optimization subject to the power constraint then determines values of n1, n2, y1, and y2.
Below we present two tables of optimized parameter values and performance characteristics for the two designs over ranges of δ for Schema B (Tables 3 and 4).

Table 3 TSE design constants under Schema B.

δ1    δ2    δ3    n1    n2    y1      y2      Eπ(NTSE)
0.0   0.2   1.0   18    12    0.525   0.550   134.9
0.0   0.4   1.0   20    12    0.475   0.550   149.4
0.0   0.6   1.0   36    2     0.575   0.400   252.9
0.0   0.8   1.0   122   2     0.650   0.575   854.8
0.2   0.4   1.0   22    22    0.425   0.525   165.7
0.2   0.6   1.0   36    2     0.575   0.375   292.9
0.2   0.8   1.0   124   2     0.650   0.375   868.8
0.4   0.6   1.0   38    4     0.550   0.150   267.9
0.4   0.8   1.0   122   4     0.675   0.325   855.6
0.6   0.8   1.0   122   4     0.675   0.150   855.6

Table 4 Limb-Leaf design constants under Schema B.

δ1    δ2    δ3    n1L   n2L   n2L′   n2l   c1       c2      w1       Eπ(NLL)
0.0   0.2   1.0   85    52    42     22    −0.225   0.675   0.050    426.0
0.0   0.4   1.0   33    40    37     14    −0.050   1.100   0.050    216.3
0.0   0.6   1.0   28    36    48     31    0.125    1.350   0.125    178.7
0.0   0.8   1.0   37    112   156    65    0.350    1.750   0.275    266.5
0.2   0.4   1.0   97    40    59     37    0.125    0.825   0.200    367.3
0.2   0.6   1.0   44    51    55     44    0.200    1.300   0.1125   218.7
0.2   0.8   1.0   39    111   148    12    0.325    1.575   0.175    275.5
0.4   0.6   1.0   90    54    88     36    0.200    1.050   0.325    352.0
0.4   0.8   1.0   50    108   178    6     0.350    1.625   0.075    294.5
0.6   0.8   1.0   143   130   188    28    −0.450   1.300   0.275    1064.7

Table 6 Sample size comparison of Limb-Leaf design and TSE design under Schema B.

δ1    δ2    δ3    Eπ(NTSE)   Eπ(NLL)   Eπ(NLL)/Eπ(NTSE)
0.0   0.2   1.0   134.9      426.0     3.2
0.0   0.4   1.0   149.4      216.3     1.4
0.0   0.6   1.0   252.9      178.7     0.7
0.0   0.8   1.0   854.8      266.5     0.3
0.2   0.4   1.0   165.7      367.3     2.2
0.2   0.6   1.0   292.9      218.7     0.7
0.2   0.8   1.0   868.8      275.5     0.3
0.4   0.6   1.0   267.8      352.0     1.3
0.4   0.8   1.0   855.6      294.5     0.3
0.6   0.8   1.0   855.6      1064.7    1.2
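The reported TSE values can be bracketed with simple arithmetic. The sketch below assumes that the total sample size counts every arm, that all seven Schema B arms (L0 plus six test doses) recruit n1 subjects in stage 1, and that two arms recruit n2 subjects in stage 2 only if the trial continues. The continuation probabilities are not given in the text, so the code only bounds the criterion and backs out the implied risk-weighted continuation probability; the function names are hypothetical.

```python
def tse_expected_n(n_arms, n1, n2, p_continue):
    """Expected total sample size of a TSE design (assumed accounting)."""
    return n_arms * n1 + 2 * n2 * p_continue


def risk_adjusted(e_null, e_alt, pi_null=0.8):
    """Risk-adjusted criterion 0.8 * E0(N) + 0.2 * E_ThetaTSE(N)."""
    return pi_null * e_null + (1 - pi_null) * e_alt


# TSE row for delta = (0.0, 0.8, 1.0) under Schema B: n1 = 122, n2 = 2,
# with a reported risk-adjusted expected sample size of 854.8.
n_arms, n1, n2, reported = 7, 122, 2, 854.8

never = tse_expected_n(n_arms, n1, n2, 0.0)    # trial never continues
always = tse_expected_n(n_arms, n1, n2, 1.0)   # trial always continues
lower = risk_adjusted(never, never)
upper = risk_adjusted(always, always)

# Risk-weighted average continuation probability implied by the reported value.
implied_p = (reported - lower) / (upper - lower)
```

Under these assumptions the reported 854.8 lies between the bounds of 854 and 858, and the implied weighted continuation probability is about 0.2. That value would arise, for instance, if the trial almost never continued under the null and almost always continued under ΘTSE.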

5.3. Interpretation

Interpretation of the comparison between designs is aided by the following tables. The key figure is Eπ(NLL)/Eπ(NTSE), the ratio of risk-adjusted expected total sample size between the Limb-Leaf and TSE designs. Where this ratio is less than 1, the comparison favors the Limb-Leaf approach. As in Section 5.1, cells of Schema A which are redundant in terms of ΘLimb, ΘLeaf, and ΘTSE are combined (Tables 5 and 6).
We observe such a favorable ratio across both schemas for δ = (0.0, 0.6, 1.0), (0.0, 0.8, 1.0), (0.2, 0.6, 1.0), (0.2, 0.8, 1.0), and (0.4, 0.8, 1.0). These configurations correspond to situations where a Limb-Leaf strategy as presented in Section 3 would be appropriate: the initially chosen limb dose achieves 60% or more of the desired signal, and further fine-tuning of the dose level is then possible to maximize performance. A subtle departure from this pattern occurs in Schema B with δ = (0.6, 0.8, 1.0). If a first stage selection is necessary between two limbs, too small a separation between their response levels will require a large first stage sample size, disproportionately hurting the performance of the Limb-Leaf approach.
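The favorable cases under Schema B can be recovered directly from the tabulated expected sample sizes. The short sketch below recomputes the ratio Eπ(NLL)/Eπ(NTSE) from the values reported in Table 6 and lists the δ vectors for which it falls below 1; the variable names are illustrative only.

```python
# (delta1, delta2, delta3): (E_pi(N_TSE), E_pi(N_LL)) as reported for Schema B.
schema_b = {
    (0.0, 0.2, 1.0): (134.9, 426.0),
    (0.0, 0.4, 1.0): (149.4, 216.3),
    (0.0, 0.6, 1.0): (252.9, 178.7),
    (0.0, 0.8, 1.0): (854.8, 266.5),
    (0.2, 0.4, 1.0): (165.7, 367.3),
    (0.2, 0.6, 1.0): (292.9, 218.7),
    (0.2, 0.8, 1.0): (868.8, 275.5),
    (0.4, 0.6, 1.0): (267.8, 352.0),
    (0.4, 0.8, 1.0): (855.6, 294.5),
    (0.6, 0.8, 1.0): (855.6, 1064.7),
}

# Ratio below 1 means the Limb-Leaf design needs fewer subjects on average.
ratios = {delta: n_ll / n_tse for delta, (n_tse, n_ll) in schema_b.items()}
favorable = [delta for delta, ratio in ratios.items() if ratio < 1]
```

The resulting list contains exactly the five δ vectors named in the text, and (0.6, 0.8, 1.0) is excluded because its ratio exceeds 1.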
Table 5 Sample size comparison of Limb-Leaf design and TSE design under Schema A.

δ1    δ2    δ3    Eπ(NTSE)   Eπ(NLL)   Eπ(NLL)/Eπ(NTSE)
0.0   0.2   1.0   82.7       132.6     1.6
0.2   0.4   1.0   125.0      138.8     1.1
0.4   0.6   1.0   162.1      136.8     0.8
0.6   0.8   1.0   502.0      225.0     0.4

We acknowledge some limitations to the proper setting for the Limb-Leaf design, especially that prior knowledge or informed judgement is needed to guide the selection of limbs and leaves, and to support the selection of design constants appropriate for the locatable effect assumption.
In practice, we would expect to use a maximum of three limbs and no more than two leaves per limb. Sufficient basis should exist from preliminary clinical and pre-clinical studies, pharmacokinetic and pharmacodynamics considerations, and/or expert judgement from related research to guide the choices of the limb doses and to motivate the leaf doses as clinically meaningful modifications of their respective limbs.
Such considerations are equally important in the selection of the vector δ. As in any trial, δ3, the effect level one wishes to detect, should be plausible, and planners should avoid over-optimism. The component δ2 should be chosen from a balanced judgment of possibilities such that an effect of level δ2 indicates a region of promising activity for finding an effect of magnitude δ3 within it. On the other hand, δ1 should represent a baseline level of effect that would not itself identify an especially promising region. We advise that in planning to use a Limb-Leaf design one should explain these design choices in detail and provide justification and citation of the sources of information on which they are based. Guidance for making conservative choices that are robust to moderate misspecifications is given in Section 5.4.
5.4. Robustness against distortions for selected configurations
Recall that in Sections 5.1 and 5.2, for any given δ = (δ1, δ2, δ3), two unfavorable configurations, ΘLimb and ΘLeaf, are chosen as alternatives. We denote them here as ΘLimb(δ) and ΘLeaf(δ), respectively, to emphasize their dependence on δ. Under these alternatives the Limb-Leaf design constants are calculated to achieve 90% power. In this section, we study the robustness of the Limb-Leaf design against misspecifications of these two unfavorable alternatives. Specifically, for any given δ in Tables 2 and 4, we investigate how far we can modify each component into a possible δ′ such that the designs specified in Tables 2 and 4 maintain at least 85% or 80% power under the two modified configurations ΘLimb(δ′) and ΘLeaf(δ′). We focus on five favorable cases for both schemas with δ values of (0.0, 0.6, 1.0), (0.0, 0.8, 1.0), (0.2, 0.6, 1.0), (0.2, 0.8, 1.0), and (0.4, 0.8, 1.0). As in Section 5.1, under Schema A redundant cells are combined (Tables 7 and 8).
We remark that in all cases where power was maintained, the overall change in risk adjusted expected sample size was within 10% of the unperturbed value (results not shown).
The design is seen to be robust to variation in δ1′, and especially so for δ1′ less than δ1. Since the true underlying δ is unknown to the investigator, a conservative design specification would set a value for δ1 close to the upper limit of what is assumed possible. The design is sensitive, at least in some scenarios, to values of δ2′ greater than δ2; however, it is robust to values of δ2′ that are less than δ2, so a conservative specification would choose δ2 near the upper limit of its assumed plausible range. Instances where δ3′ is less than δ3 represent the greatest threat to the power of the design. For example, with the assumed value of δ = (0.0, 0.8, 1.0), a δ3′ value corresponding to a 2.5% reduction in δ3 is associated with a 5% reduction in power, while a 5% reduction in δ3 is associated with a 10% reduction in power. On the other hand, values of δ3′ greater than δ3 are not associated with negative impacts on power. For design purposes, an initial estimate of δ3 on the low end of its assumed range would be a conservative choice.

Table 7 Allowed ranges of δ′ = (δ1′, δ2′, δ3′) for the Limb-Leaf designs under Schema A.

Ranges maintaining 85% power
δ1    δ2    δ3    δ1′            δ2′            δ3′
0.2   0.6   1.0   (−∞, 0.60]     [0.29, 0.67]   [0.95, ∞)
0.4   0.8   1.0   (−∞, 0.80]     [0.60, 0.83]   [0.97, ∞)

Ranges maintaining 80% power
δ1    δ2    δ3    δ1′            δ2′            δ3′
0.2   0.6   1.0   (−∞, 0.60]     [0.21, 0.72]   [0.91, ∞)
0.4   0.8   1.0   (−∞, 0.80]     [0.55, 0.86]   [0.95, ∞)

Table 8 Allowed ranges of δ′ = (δ1′, δ2′, δ3′) for the Limb-Leaf designs under Schema B.

Ranges maintaining 85% power
δ1    δ2    δ3    δ1′              δ2′            δ3′
0.0   0.6   1.0   [−0.58, 0.18]    [0.47, 0.69]   [0.95, ∞)
0.0   0.8   1.0   [−0.50, 0.43]    [0.61, 0.83]   [0.98, ∞)
0.2   0.6   1.0   [−0.48, 0.33]    [0.50, 0.73]   [0.91, ∞)
0.4   0.8   1.0   [−0.49, 0.44]    [0.59, 0.83]   [0.98, ∞)
0.4   0.8   1.0   [−0.43, 0.51]    [0.65, 0.84]   [0.97, ∞)

Ranges maintaining 80% power
δ1    δ2    δ3    δ1′              δ2′            δ3′
0.0   0.6   1.0   [−0.68, 0.28]    [0.41, 0.73]   [0.90, ∞)
0.0   0.8   1.0   [−0.56, 0.52]    [0.56, 0.86]   [0.95, ∞)
0.2   0.6   1.0   [−0.57, 0.38]    [0.45, 0.77]   [0.88, ∞)
0.4   0.8   1.0   [−0.57, 0.52]    [0.55, 0.86]   [0.95, ∞)
0.4   0.8   1.0   [−0.53, 0.57]    [0.62, 0.86]   [0.95, ∞)
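A planner could use the tabulated ranges as a lookup when checking a suspected misspecification. The sketch below encodes the Schema A ranges that maintain 85% power (Table 7) and tests whether a candidate δ′ lies inside them componentwise. Following the table's construction, each component is meant to be varied on its own, so treating the three ranges jointly, as the helper does, is an assumption of this sketch. The names are hypothetical.

```python
import math

# Schema A ranges maintaining 85% power, keyed by the design value of delta.
# Each entry holds (low, high) bounds for delta1', delta2', delta3'.
RANGES_85_SCHEMA_A = {
    (0.2, 0.6, 1.0): ((-math.inf, 0.60), (0.29, 0.67), (0.95, math.inf)),
    (0.4, 0.8, 1.0): ((-math.inf, 0.80), (0.60, 0.83), (0.97, math.inf)),
}


def within_ranges(delta_prime, ranges):
    """True if every component of delta_prime lies in its tabulated range."""
    return all(low <= x <= high for x, (low, high) in zip(delta_prime, ranges))


design = (0.2, 0.6, 1.0)
ranges = RANGES_85_SCHEMA_A[design]
ok_small_shortfall = within_ranges((0.2, 0.6, 0.96), ranges)   # delta3' slightly low
ok_large_shortfall = within_ranges((0.2, 0.6, 0.94), ranges)   # delta3' too low
ok_high_delta2 = within_ranges((0.2, 0.70, 1.0), ranges)       # delta2' too high
```

The first check passes while the other two fail, mirroring the observation that shortfalls in δ3′ and overshoots in δ2′ are the perturbations to guard against.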
6. Discussion
The proposed Limb-Leaf procedure is a structured exploration strategy that builds on earlier adaptive designs such as those proposed in Bauer and Kieser [1] and Bauer and Köhne [2]. It is a generalization of the design by Thall et al. [19] targeted to improve performance under a possibly non-monotonic dose-response relationship. Such relationships have become more relevant in recent years as new therapies often entail increasingly complex mechanisms of action and combination therapies may not allow a clear ordering among their dosage levels. When δ = (δ1, δ2, δ3) is chosen appropriately, for example when δ2 is at least 60% of δ3, the saving in total sample size using a Limb-Leaf design is substantial compared with the TSE design. Limitations on the proper setting for the Limb-Leaf design and the prior knowledge and judgement needed to implement it were discussed in Section 5.3. Guidance on robust parameter choices was given in Section 5.4.
The Limb-Leaf design can be generalized to allow greater flexibility. For example, it can permit different numbers of leaves for different limbs or leaves shared by more than one limb. Furthermore, the precise location (or formulation) of leaves need not be specified in full detail ahead of the second stage; error is controlled and a valid design is achieved as long as we account for their existence. Theoretically, at least, this allows the Limb-Leaf design to consider a larger class of doses than could ever be tested in a TSE-style design, in fact a countably infinite class. In addition, other combination rules besides the weighted inverse normal rule used in this paper for stage-wise Z-values are possible. Fisher's combination rule, for instance, may be a robust choice if some doses are expected to perform especially poorly and one wants to limit their impact on tests of effect of unrelated leaf doses. Further study on combination rules for robustness and efficiency is warranted. Along with different combination rules, different rules for stochastic curtailment may be employed. The efficiency gain of group sequential continuation [14] as well as dropping poorly performing treatment arms at potentially earlier interim points is worth investigation, as is the comparison of relative benefit in a TSE-style approach.
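The two combination rules mentioned above can be compared on stage-wise one-sided p-values. The sketch below is generic rather than the paper's implementation: it assumes the weights satisfy w1² + w2² = 1 (the design tables report only w1), and it uses the closed-form survival function of the chi-square distribution with 4 degrees of freedom for Fisher's rule. Function names are hypothetical, and p-values must lie strictly between 0 and 1.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def inverse_normal_combination(p1, p2, w1):
    """Combined p-value from the weighted inverse normal rule.

    Assumes w2 = sqrt(1 - w1**2), so the combined statistic is standard
    normal under the null when the stage-wise p-values are independent
    and uniformly distributed.
    """
    w2 = math.sqrt(1.0 - w1 ** 2)
    z = w1 * _STD_NORMAL.inv_cdf(1.0 - p1) + w2 * _STD_NORMAL.inv_cdf(1.0 - p2)
    return 1.0 - _STD_NORMAL.cdf(z)


def fisher_combination(p1, p2):
    """Combined p-value from Fisher's product rule for two stages."""
    x = -2.0 * (math.log(p1) + math.log(p2))   # chi-square with 4 df under H0
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


# A weak first stage followed by a strong second stage.
p_inverse_normal = inverse_normal_combination(0.40, 0.01, w1=0.125)
p_fisher = fisher_combination(0.40, 0.01)
```

With a small w1, the inverse normal rule leans heavily on the second stage, whereas Fisher's rule treats the two stages symmetrically through the product of their p-values.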
To accomplish point and interval estimation in the context of Limb-Leaf designs, existing methods based on bias-adjusted estimators [17], test inversion to form confidence regions, and median unbiased point estimators are applicable [4,6,8,11]. The specific implementation of these methods and comparison of their performance will be treated in future research.
A final issue is the computational complexity of parameter optimization in a Limb-Leaf type design. An R program developed by Mr. X. Lu using simulated annealing for nonlinear optimization appears to offer substantially improved speed over a basic grid search method such that a design can be optimized in less than one hour on a Windows desktop computer with 3 GHz processing speed. A preliminary version of this software is available from the authors upon request. A full version is planned for public release as an R package.
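The authors' R program is not reproduced here. As a rough illustration of the technique only, the sketch below applies simulated annealing to a deliberately simple stand-in problem: finding the smallest per-arm sample size n for a one-sided two-arm z-test with unit variance, effect size 1, and α = 0.025, subject to 0.9 power enforced through a penalty. The objective, penalty weight, cooling schedule, and all names are assumptions chosen for illustration; the actual Limb-Leaf optimization involves several design constants and a simulated or numerically integrated criterion.

```python
import math
import random
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def power(n, delta=1.0, alpha=0.025):
    """Power of a one-sided two-arm z-test with n subjects per arm."""
    return _STD_NORMAL.cdf(delta * math.sqrt(n / 2) - _STD_NORMAL.inv_cdf(1 - alpha))


def cost(n, target=0.9, penalty=1e4):
    """Total sample size plus a penalty for falling short of the target power."""
    shortfall = max(0.0, target - power(n))
    return 2 * n + penalty * shortfall


def anneal(start=200, steps=5000, temp0=50.0, seed=0):
    """Simulated annealing over integer n; returns the best feasible n seen."""
    rng = random.Random(seed)
    current = best = start                 # the start value is feasible
    for step in range(steps):
        temp = temp0 * (1 - step / steps) + 1e-9   # linear cooling
        candidate = max(2, current + rng.randint(-5, 5))
        diff = cost(candidate) - cost(current)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if diff <= 0 or rng.random() < math.exp(-diff / temp):
            current = candidate
        if power(current) >= 0.9 and current < best:
            best = current
    return best


best_n = anneal()
```

For this toy problem the answer can be checked analytically: the smallest feasible value is n = 22 per arm, so any returned value is at least 22. Whether the annealer reaches that minimum depends on the schedule and the seed.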
Acknowledgments
We thank Mr. X. Lu for offering his expertise in statistical programming.
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Appendix A

Proof of Proposition 1. For simplicity assume that the effects of no two doses are identical. By renumbering the limbs as necessary, we may take the limb effects $\theta_{L_k}$, $k = 1, \dots, K$, as forming an increasing sequence. Choose $\delta_1$ such that $\theta_{L_{K-1}} < \delta_1 < \theta_{L_K}$. Then, the effects of all leaf doses associated with $L_K$ may be written as an increasing sequence $\theta_{l_{K,1}}, \dots, \theta_{l_{K,m_K}}$.
There are then two subcases. The first applies if $\theta_{l_{K,m_K}} > \theta_{L_K}$. Choose $\delta_2$ such that $\delta_1 < \delta_2 = \theta_{L_K}$. Let $i^* = \min\{i : \theta_{l_{K,i}} > \theta_{L_K}\}$, the lowest leaf dose whose effect exceeds that of $L_K$. We may then choose $\delta_3$ such that $\theta_{L_K} < \delta_3 < \theta_{l_{K,i^*}}$. The conditions for a locatable effect with respect to $\delta = (\delta_1, \delta_2, \delta_3)$ are then satisfied.
If $\theta_{l_{K,m_K}} < \theta_{L_K}$, then let $\gamma = \max\{\delta_1, \theta_{l_{K,m_K}}, \theta_{L_{K-1}}\}$. Choose $\delta_2$ and $\delta_3$ such that $\gamma < \delta_2 < \delta_3 < \theta_{L_K}$. The conditions for a locatable effect with respect to $\delta = (\delta_1, \delta_2, \delta_3)$ are similarly satisfied.
This construction may be modified to produce other solutions and to allow perfect ties between dose effects, although the notation becomes more complicated.


Proof of Theorem 1. We demonstrate that when $H_{0,\hat{d}^{*}}$ is rejected, so are all the intersection hypotheses that contain it. Given data, we use $z_{i,d}$ to denote the observed $Z_{i,d}$ and relabel doses as necessary for convenience of notation. Suppose $c_1 < \hat{\theta}_{1,L_{k^*}} \le c_2$ and $\hat{d}^{*} = L_{k^*}$. A composite hypothesis including $H_{0,L_{k^*}}$ may be written as $H_{0,D}$ where

\[
D = \{L_{k^*}\} \cup \{L_1, \dots, L_{K'}\} \cup \{l_{k^*,1}, \dots, l_{k^*,m'_{k^*}}\} \cup \{l_{1,1}, \dots, l_{1,m'_1}\} \cup \dots \cup \{l_{K',1}, \dots, l_{K',m'_{K'}}\}.
\]

Here $L_j \ne L_{k^*}$, $K' \le K - 1$, and $0 \le m'_j \le m_j$, $j = 1, \dots, K'$. For $K'$ or $m'_j$ equal to zero, the corresponding terms in braces are counted as empty. Let $k'' \in \{0, \dots, m_{k^*}\}$ denote the number of leaves in $D$ that were utilized (randomized to) after the interim decision. Then the relations

\[
\begin{aligned}
z_D &= w_1 \Phi^{-1}\{F_{1+K'}(z_{1,L_{k^*}})\} + w_2 \Phi^{-1}\{G_{\mathrm{Limb}+k''}[\max(z_{2,L_{k^*}}, z_{2,l_{k^*,l}},\ l = 1, \dots, k'')]\} \\
&\ge w_1 \Phi^{-1}\{F_{K}(z_{1,L_{k^*}})\} + w_2 \Phi^{-1}\Big(\min_{0 \le k \le m_{k^*}} \{G_{\mathrm{Limb}+k}[\max(z_{2,L_{k^*}}, z_{2,l_{k^*,(k)}})]\}\Big) > z_\alpha
\end{aligned}
\]

imply the rejection of $H_{0,D}$. It follows from their definitions that $F_{1+K'}(x) \ge F_K(x)\ \forall x$, and that $G_{\mathrm{Limb}+k''}(x) \ge G_{\mathrm{Limb}+m_{k^*}}(x)\ \forall x$, given any first stage data. The second inequality is assumed from the rejection of $H_{0,L_{k^*}}$.

Suppose $c_1 < \hat{\theta}_{1,L_{k^*}} \le c_2$ and $\hat{d}^{*} = l_{k^*,l^*}$ for some $k^*, l^*$. We consider $D$ in three cases.

Case 1. If $D$ contains $L_{k^*}$, using the same notations as above,

\[
\begin{aligned}
z_D &= w_1 \Phi^{-1}\{F_{1+K'}(z_{1,L_{k^*}})\} + w_2 \Phi^{-1}\{G_{\mathrm{Limb}+k''}[\max(z_{2,L_{k^*}}, z_{2,l_{k^*,l^*}})]\} \\
&\ge w_1 \Phi^{-1}\{F_{K}(z_{1,L_{k^*}})\} + w_2 \Phi^{-1}\{G_{\mathrm{Limb}+m_{k^*}}[\max(z_{2,L_{k^*}}, z_{2,l_{k^*,l^*}})]\} > z_\alpha,
\end{aligned}
\]

where the last inequality is a rejection condition for $H_{0,l_{k^*,l^*}}$.

Case 2. If $D$ does not contain $L_{k^*}$ but contains some other limb, we write

\[
D = \{L_1, \dots, L_{K'}\} \cup \{l_{k^*,1}, \dots, l_{k^*,m'_{k^*}}\} \cup \{l_{1,1}, \dots, l_{1,m'_1}\} \cup \dots \cup \{l_{K',1}, \dots, l_{K',m'_{K'}}\}
\]

using the previous notations. Then,

\[
\begin{aligned}
z_D &= w_1 \Phi^{-1}\{F_{K'}(\max\{z_{1,L_1}, \dots, z_{1,L_{K'}}\})\} + w_2 \Phi^{-1}\{G_{k''}(z_{2,l_{k^*,l^*}})\} \\
&\ge w_1 \Phi^{-1}\Big\{\min_{1 \le k \le K-1} [F_k(z_{1,L_{(k)}})]\Big\} + w_2 \Phi^{-1}\{G_{m_{k^*}}(z_{2,l_{k^*,l^*}})\} > z_\alpha.
\end{aligned}
\]

The first inequality holds since, given any first stage data, $G_{k''}(x) \ge G_{m_{k^*}}(x)\ \forall x$, from their definitions. The second inequality is a rejection condition for $H_{0,l_{k^*,l^*}}$.

Case 3. If $D$ does not contain any limbs, we write

\[
D = \{l_{k^*,1}, \dots, l_{k^*,m'_{k^*}}\} \cup \{l_{1,1}, \dots, l_{1,m'_1}\} \cup \dots \cup \{l_{K',1}, \dots, l_{K',m'_{K'}}\}.
\]

Then,

\[
z_D = \Phi^{-1}\{G_{k''}(z_{2,l_{k^*,l^*}})\} \ge \Phi^{-1}\{G_{m_{k^*}}(z_{2,l_{k^*,l^*}})\} > z_\alpha.
\]

The second inequality is a rejection condition for $H_{0,l_{k^*,l^*}}$.

Finally, suppose $\hat{\theta}_{1,L_{k^*}} > c_2$. We write

\[
D = \{L_{k^*}\} \cup \{L_1, \dots, L_{K'}\} \cup \{l_{k^*,1}, \dots, l_{k^*,m'_{k^*}}\} \cup \{l_{1,1}, \dots, l_{1,m'_1}\} \cup \dots \cup \{l_{K',1}, \dots, l_{K',m'_{K'}}\}.
\]

Since

\[
z_D = w_1 \Phi^{-1}\{F_{1+K'}(z_{1,L_{k^*}})\} + w_2 z_{2,L_{k^*}} \ge w_1 \Phi^{-1}\{F_{K}(z_{1,L_{k^*}})\} + w_2 z_{2,L_{k^*}} > z_\alpha,
\]

where the second inequality is assumed from the rejection of $H_{0,L_{k^*}}$, $H_{0,D}$ is rejected. This concludes the proof of the theorem.

Proof of Theorem 2.

1. For normally distributed outcomes with unknown $\sigma^2$, let

\[
Z_{1,L_k} = \frac{\bar{X}_{1,L_k} - \bar{X}_{1,L_0}}{\sqrt{2\hat{\sigma}^2_{1,L_k}/n_{1L}}}
\]

denote the first stage test statistic of the effect of dose $L_k$, $k = 1, \dots, K$, where $\hat{\sigma}^2_{1,L_k}$ is the pooled estimate of $\sigma^2$ in the control arm and arm $L_k$ from stage 1. Instead of being normal with known variance, each $Z_{1,L_k}$ now follows a t-distribution with $2(n_{1L} - 1)$ df under its associated null hypothesis of no treatment effect.

Similarly, let

\[
Z_{2,L_{k^*}} = \frac{\bar{X}_{2,L_{k^*}} - \bar{X}_{2,L_0}}{\sqrt{2\hat{\sigma}^2_{2,L_{k^*}}/n_{2L}}}
\quad \text{or} \quad
\frac{\bar{X}_{2,L_{k^*}} - \bar{X}_{2,L_0}}{\sqrt{2\hat{\sigma}^2_{2,L_{k^*}}/n_{2L'}}}
\]

as required denote the second stage test statistic for the effect of dose $L_{k^*}$, where $\hat{\sigma}^2_{2,L_{k^*}}$ is the pooled estimate of $\sigma^2$ in the control arm and arm $L_{k^*}$ in stage 2. $Z_{2,L_{k^*}}$ follows a t-distribution with $2(n_{2L} - 1)$ df under its associated null hypothesis. We may also define

\[
Z_{2,l_{k^*,l}} = \frac{\bar{X}_{2,l_{k^*,l}} - \bar{X}_{2,L_0}}{\sqrt{\hat{\sigma}^2_{2,l_{k^*,l}}\left(\frac{1}{n_{2l}} + \frac{1}{n_{2L}}\right)}}, \quad l = 1, \dots, m_{k^*},
\]

where $\hat{\sigma}^2_{2,l_{k^*,l}}$ is the pooled estimate of $\sigma^2$ in the control arm and arm $l_{k^*,l}$ in stage 2, such that under the associated null hypothesis it follows a t-distribution with $(n_{2L} + n_{2l} - 2)$ df.

For prespecified values of $n_{1L}$, $n_{2L}$, $n_{2L'}$, and $n_{2l}$, the functions $F_K$, $G_{m_{k^*}}$, $G_{\mathrm{Limb}+m_{k^*}}$ are then well defined under each adaptation; they can be calculated or approximated by Monte Carlo simulation from the reference case of unit variance since the parameter $\sigma^2$ has been removed by invariance. Subject only to these changes in interpretation, the design of Section 3.2.2 applies and the proof of Theorem 1 applies to guarantee control of the FWER at level $\alpha$.

The Satterthwaite approximation may be used as usual to accommodate unequal variances.

2. A general method to draw inferences about parameters of interest in large samples is based on the calculation of the score statistic and observed Fisher information. The method is valid for general endpoints under regularity conditions satisfied, for instance, when the response distribution belongs to an exponential family; nuisance parameters are accommodated through use of consistent estimators. Details are given by Stallard and Todd [16]. Theoretical justification is given by Whitehead [21].

Let $\theta_d$ be a measure of the superiority of treatment $d$ relative to control, with $\theta_d > 0$ indicating superiority, and let $S_{i,d}$ and $V_{i,d}$ denote the efficient score and observed Fisher information for $\theta_d$ at stages $i = 1, 2$ (evaluated at $\theta_d = 0$). It may be shown that the asymptotic joint distribution of the

\[
Z_{1,L_1} = \frac{S_{1,L_1}}{\sqrt{V_{1,L_1}}}, \ \dots, \ Z_{1,L_K} = \frac{S_{1,L_K}}{\sqrt{V_{1,L_K}}}
\]

is multivariate normal with mean proportional to the vector of assumed effect sizes and fixed covariance matrix. Similarly, given any adaptation decision applied in the second stage,

\[
Z_{2,L_{k^*}} = \frac{S_{2,L_{k^*}}}{\sqrt{V_{2,L_{k^*}}}}, \quad
Z_{2,l_{k^*,1}} = \frac{S_{2,l_{k^*,1}}}{\sqrt{V_{2,l_{k^*,1}}}}, \ \dots, \
Z_{2,l_{k^*,m_{k^*}}} = \frac{S_{2,l_{k^*,m_{k^*}}}}{\sqrt{V_{2,l_{k^*,m_{k^*}}}}}
\]

have an asymptotic distribution with mean proportional to the vector of assumed effect sizes and fixed covariance matrix. Stagewise effect estimates may be calculated as

\[
\hat{\theta}_{1,L_{k^*}} = \frac{Z_{1,L_{k^*}}}{\sqrt{V_{1,L_{k^*}}}}, \quad
\hat{\theta}_{2,L_{k^*}} = \frac{Z_{2,L_{k^*}}}{\sqrt{V_{2,L_{k^*}}}}, \quad \text{and} \quad
\hat{\theta}_{2,l_{k^*,l}} = \frac{Z_{2,l_{k^*,l}}}{\sqrt{V_{2,l_{k^*,l}}}}, \quad l = 1, \dots, m_{k^*}.
\]

It follows that the plan of a Limb-Leaf design of Section 3.2.2 can be implemented with the above new meanings for the symbols $\hat{\theta}_{1,L_{k^*}}$, $\hat{\theta}_{2,L_{k^*}}$, and $\hat{\theta}_{2,l_{k^*,l}}$, $l = 1, \dots, m_{k^*}$, as well as $Z_{1,L_1}, \dots, Z_{1,L_K}$ and $Z_{2,L_{k^*}}, Z_{2,l_{k^*,1}}, \dots, Z_{2,l_{k^*,m_{k^*}}}$. The functions $F_K$, $G_{m_{k^*}}$, $G_{\mathrm{Limb}+m_{k^*}}$ are then well defined. Subject only to these changes in interpretation, the design of Section 3.2.2 applies and the proof of Theorem 1 shows that the FWER is controlled approximately at level $\alpha$ in large samples.
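The first-stage statistic in part 1 can be computed as in the following minimal sketch. It assumes equal per-arm sample sizes, as in the first stage of the design, and uses the usual pooled variance of the two arms; the function name and the toy data are illustrative only.

```python
import math
from statistics import mean, variance


def stage_t_statistic(treatment, control):
    """Pooled-variance statistic for one dose versus control, equal arm sizes.

    Returns the statistic and its degrees of freedom, 2 * (n - 1).
    """
    n = len(treatment)
    if n != len(control) or n < 2:
        raise ValueError("arms must have the same size, at least 2")
    # With equal arm sizes the pooled variance is the average of the two
    # sample variances.
    pooled_var = (variance(treatment) + variance(control)) / 2
    stat = (mean(treatment) - mean(control)) / math.sqrt(2 * pooled_var / n)
    return stat, 2 * (n - 1)


stat, df = stage_t_statistic([1.0, 2.0, 3.0], [0.0, 1.0, 2.0])
```

For this toy data both arms have sample variance 1 and the mean difference is 1, so the statistic equals 1/sqrt(2/3) with 4 degrees of freedom.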

References
[1] P. Bauer, M. Kieser, Combining different phases in the development of medical treatments within a single trial, Stat. Med. 18 (1999) 1833–1848.
[2] P. Bauer, K. Köhne, Evaluation of experiments with adaptive interim analyses, Biometrics 50 (1994) 1029–1041.
[3] M.F. Beal, A.E. Lang, A.C. Ludolph, Neurodegenerative Diseases: Neurobiology, Pathogenesis and Therapeutics, Cambridge University Press, Cambridge, England, 2005.
[4] F. Bretz, H. Schmidli, F. König, A. Racine, W. Maurer, Confirmatory seamless phase II/III clinical trials with hypotheses selection at interim: general concepts, Biom. J. 48 (2006) 623–634.
[5] J.P. Hubert, J.C. Delumeau, J. Glowinski, J. Prémont, A. Doble, Antagonism by riluzole of entry of calcium evoked by NMDA and veratridine in rat cultured granule cells: evidence for a dual mechanism of action, Br. J. Pharmacol. 113 (1994) 261–267.
[6] C. Jennison, B. Turnbull, Group Sequential Methods with Applications to Clinical Trials, CRC Press, New York, New York, 2000.
[7] C. Jennison, B. Turnbull, Confirmatory seamless phase II/III clinical trials with hypotheses selection at interim: opportunities and limitations, Biom. J. 48 (2006) 650–655.
[8] C. Jennison, B. Turnbull, Adaptive seamless designs: selection and prospective testing of hypotheses, J. Biopharm. Stat. 17 (2007) 1135–1161.
[9] Kaufmann et al., Phase II trial of CoQ10 for ALS finds insufficient evidence to justify phase III, Ann. Neurol. 66 (2) (2009) 235–244.
[10] B. Levin, The futility study - progress over the last decade, Contemp. Clin. Trials 45 (A) (2015) 69–75.
[11] Q. Liu, M.A. Proschan, G.W. Pledger, A unified theory of two-stage adaptive designs, J. Am. Stat. Assoc. 97 (2002) 1034–1041.
[12] A.C. Ludolph, S. Jesse, Review: evidence-based drug treatment in amyotrophic lateral sclerosis and upcoming clinical trials, Ther. Adv. Neurol. Disord. 2 (2009) 319–326.
[13] R. Marcus, E. Peritz, K.R. Gabriel, On closed testing procedures with special reference to ordered analysis of variance, Biometrika 63 (1976).
[14] H.H. Müller, H. Schäfer, Adaptive group sequential designs for clinical trials: combining the advantages of adaptive and of classical group sequential approaches, Biometrics 57 (2001) 886–891.
[15] K.M. Noh, J.Y. Hwang, H.C. Shin, J.Y. Koh, A novel neuroprotective mechanism of riluzole: direct inhibition of protein kinase C, Neurobiol. Disord. 7 (2000) 375–383.
[16] N. Stallard, S. Todd, Sequential designs for phase III clinical trials incorporating treatment selection, Stat. Med. 22 (2003) 689–703.
[17] N. Stallard, S. Todd, Point estimates and confidence regions for sequential trials involving selection, J. Stat. Plan. Infer. 135 (2005) 402–419.
[18] A. Tamhane, Y. Hochberg, C. Dunnett, Multiple test procedures for dose finding, Biometrics 52 (1996) 21–37.
[19] P.F. Thall, R. Simon, S.S. Ellenberg, Two-stage selection and testing designs for comparative clinical trials, Biometrika 75 (1988) 303–310.
[20] S. Todd, N. Stallard, A new clinical design combining phases 2 and 3: sequential designs with treatment selection and a change of endpoint, Drug Inf. J. 39 (2005) 109–118.
[21] J. Whitehead, The Design and Analysis of Sequential Clinical Trials, Revised Second Edition, Wiley, Chichester, England, 1997.
