Weak aggregating algorithm for the distribution-free perishable inventory problem
In this article, we propose a novel approach to the distributionfree, multi-period problem that utilizes recent advances in the theory of prediction and learning with expert advice.
Weak Aggregating Algorithm (WAA)
假设专家建议构成集合 $\Theta$,专家 $\theta \in \Theta$ 可以是一个数,或是回归的系数。
根据历史信息 $\pi(D_n, \gamma_n^\theta)$ 来决定 $p_n(\mathrm{d} \theta)$,也就是专家 $\theta$ 的比例,最后得到
$$ \gamma_n = \displaystyle\int_{\Theta} \gamma_{n}^\theta \, p_n(\mathrm{d} \theta) $$
作为集体建议和下一次的决策。最后,可以证明 $\gamma_n \to_{\mathbb{P}} y^\ast$ 。
Dynamic Pricing
Dynamic Pricing Under a General Parametric Choice Model
OR, 2012. Josef Broder, Paat Rusmevichientong.
$T$ 个顾客以离散的时间到达商店,人们的 willingness-to-pay $\{V_t: t \geqslant 1\}$ 是一个以 $\mathbf{z} \in \mathscr{Z} \subset \mathbb{R}^n$ 为参数的 i.i.d. 的分布,其累计分布函数 $d(p; \mathbf{z}) = P(V_t \geqslant p)$ 。
当顾客的 willingness-to-pay 不大于商品的价值时,顾客就会购买商品。
期望利润:$r(p; \mathbf{z}) = p d(p; \mathbf{z})$,可行的价格 $\mathscr{P} = [p_\min, p_\max] \subset \mathbb{R}_+$ .
Assumption 1
-
$0<d_{\min } \leqslant d(p ; \mathbf{z}) \leqslant d_{\max }<1$ for all $p \in \mathscr{P}$ and $\mathbf{z} \in \mathscr{Z}$.
-
The revenue function $p \mapsto r(p ; \mathbf{z})$ has a unique maximizer $p^\ast(\mathbf{z}) \in \mathscr{P}$.
-
The function $\mathbf{z} \mapsto p^\ast(\mathbf{z})$ is L-Lipschitz, that is, $\left|p^\ast(\mathbf{z})-p^\ast(\overline{\mathbf{z}})\right| \leqslant L\|\mathbf{z}-\overline{\mathbf{z}}\|$ for all $\mathbf{z}, \overline{\mathbf{z}} \in \mathscr{Z}$.
-
The revenue function $p \mapsto r(p ; \mathbf{z})$ is twice differentiable with $\sup _{p \in \mathscr{P}, \mathbf{z} \in \mathcal{X}}\left|r^{\prime \prime}(p ; \mathbf{z})\right| \leqslant c_r$.
用MLE去估计参数
Chasing Demand: Learning and Earning in a Changing Environment
MOR, 2017. N. Bora Keskin, Assaf Zeevi.
假设需求与价格的关系为: $$ D_t=\alpha_t+\beta_t p_t+\epsilon_t \quad \text { for } t=1,2, \ldots, $$ $\theta = (\alpha, \beta) \in \Theta$ 是未知参数。这是一个线性关系。记 $$ r(p, \theta):=p(\alpha+\beta p) \qquad \varphi(\theta):=\arg \max \{r(p, \theta): p \in[l, u]\} $$ Constant-Budget
将 $0-T$ 内参数的变化以如下指标度量: $$ V_\theta(T):=\sup _{\left\{t_0, t_1, \ldots, t_K\right\} \in \mathscr{P}, K \geq 1}\left\{\sum_{k=1}^K\left\|\theta_{t_k}-\theta_{t_{k-1}}\right\|^2\right\} $$ 对于参数 $\boldsymbol{\theta}$,加以约束: $$ \mathscr{V}(T, B)=\left\{\boldsymbol{\theta}: V_\theta(T) \leq B\right\} $$ 其 regret 定义为: $$ \mathscr{R}^\pi(T, B)=\sup \left\{\Delta_\theta^\pi(T): \boldsymbol{\theta} \in \mathcal{V}(T, B)\right\} \qquad \Delta_\theta^\pi(T)=\mathbb{E}_\theta^\pi\left\{\sum_{t=1}^T\left(1-\frac{r\left(p_t, \theta_t\right)}{r^\ast\left(\theta_t\right)}\right)\right\} $$
证得 regret 界为: $$ c B^{1 / 3} T^{2 / 3} \leq \mathscr{R}^\pi(T, B) \leq C B^{1/3}T^{2/3} $$ 策略是一个滑动窗口的加权最小二乘。
Bursty Changes
假设参数的变动存在一个最小的幅度 $$ d_\theta:=\inf \left\{\left\|\theta_t-\theta_s\right\|: \theta_t \neq \theta_s, 1 \leq s<t \leq T\right\} \geq \delta $$ 这意味着至多存在 $\bar{C} = \lceil B / \delta^2 \rceil$,
在这种情况下,可以得到 $CT^{1/2} \log T$ 的界。
Pricing and Inventory
Online Network Revenue Management Using Thompson Sampling
OR, 2018. Kris Johnson Ferreira, David Simchi-Levi, He Wang.
文章研究的是一个网络优化问题,有 $N$ 种产品,$M$ 种资源,第 $i \in [N]$ 种产品消耗第 $j \in [M]$ 种资源的数量为 $a_{ij}$,$I_j$ 是第 $j$ 种资源的初始库存水平。
可行的价格是一个离散的集合 $\{p_1, p_2, \dots, p_K\}$,其中 $p_k$ 是一个 $N$ 维的向量。需求分布 $D(t) = (D_1(t), \dots, D_N(t))$ 是一个轻尾的参数分布 $F(x_1, \dots, x_N; p_k, \theta)$ 。记在真实的参数 $\theta$ 下的需求均值为 $d = \{d_{ik}\}_{i \in [N], k \in [K]}$ 。
文章指出了该收益管理问题与MAB的差异是模型里有库存约束。
In the MAB setting, if mean revenue associated with each price vector is known, the optimal strategy is to choose a price vector with the highest mean revenue. But in the presence of limited inventory, a mixed strategy that chooses multiple price vectors over the selling season may achieve significantly higher revenue than any single price strategy. Therefore, a good pricing strategy should converge not to a single price, but to a distribution of (possibly) multiple prices.
首先文章分析了每个时期库存约束固定是 $c_j := I_j / T$ 的简单情况,给出了算法 1。
Algorithm 1. Thompson Sampling with Fixed Inventory Constraints (TS-Fixed)
Repeat the following steps for all periods $t = 1, \dots, T$
- Sample Demand: Sample a random parameter $\theta(t) \in \Theta$ according to the posterior distribution of $\theta$ given history $\mathscr{H}_{t-1}$. Let the mean demand under $\theta(t)$ be $d(t)=\left\{d_{i k}(t)\right\}_{i \in[N], k \in[K]}$.
- Optimize Prices Given Sampled Demand: Solve the following linear program, denoted by LP $(d(t))$ 。
$$ \begin{aligned} \mathrm{LP}(d(t)): \quad \max _x & \sum_{k=1}^K\left(\sum_{i=1}^N p_{i k} d_{i k}(t)\right) x_k \\ \text { subject to } & \sum_{k=1}^K\left(\sum_{i=1}^N a_{i j} d_{i k}(t)\right) x_k \leq c_j, \; \forall j \in[M] \\ & \sum_{k=1}^K x_k \leq 1 \\ & x_k \geq 0, k \in[K] \end{aligned} $$
Let $x(t)=\left(x_1(t), \ldots, x_K(t)\right)$ be the optimal solution to $\operatorname{LP}(d(t))$. 3. Offer Price: Offer price vector $P(t)=p_k$ with probability $x_k(t)$, and choose $P(t)=p_{\infty}$ with probability $1-\sum_{k=1}^K x_k(t)$. 4. Update Estimate of Parameter: Observe demand $D(t)$. Update the history $\mathscr{H}_t=\mathscr{H}_{t-1} \cup\{P(t), D(t)\}$ and the posterior distribution of $\theta$ given $\mathscr{H}_t$.
接着,在算法1,把 $t$ 时刻的库存约束换成 $c_j(t)= I_j (t - 1) / (T - t + 1)$ 作为一个时期平均的剩余库存,就得到了文章的算法2。
文章接着定义了 regret 和 Bayesian regret: $$ \begin{aligned} & \operatorname{Regret}(T, \theta)=\mathrm{E}\left[\operatorname{Rev}^\ast(T) \mid \theta\right]-\mathrm{E}[\operatorname{Rev}(T) \mid \theta] \\ & \text{BayesRegret }(T)=\mathrm{E}[\operatorname{Regret}(T, \theta)] \end{aligned} $$ 假设 $D_i(t)$ 有上界 $\overline{d}_i$,定义 $$ p_{\max }:=\max _{k \in[K]} \sum_{i=1}^N p_{i k} \bar{d}_i, \quad p_{\max }^j:=\max _{i \in[N]: a_{i j} \neq 0, k \in[K]} \frac{p_{i k}}{a_{i j}}, \forall j \in[M] $$
Technical Note—Data-Based Dynamic Pricing and Inventory Control with Censored Demand and Limited Price Changes
OR, 2020. Boxiao Chen, Xiuli Chao, Yining Wang.
这篇文章的假设和理论方法来自「OR, 2012. Josef Broder, Paat Rusmevichientong.」
每个时期的需求依赖于价格 $p_t$,需求分布依赖参数 $\mathbf{z} \in \mathscr{Z} \subset \mathbb{R}^k$,其概率质量函数是 $D_t(p_t , \mathbf{z}) = f(\cdot; p_t, \mathbf{z})$,累计分布函数为 $F(\cdot; p_t, \mathbf{z})$,支撑集 $\{d^l, d^{l} + 1, \dots, d^h\}$。
每个时期只能观测到销售数据(censored demand),参数 $\mathbf{z}$ 需要通过在数据中学习。
需要满足价格变化次数不超过 $m$ 这个约束: $$ \sum_{t=1}^{T-1} \mathbf{1}\left(p_t \neq p_{t+1}\right) \leq m $$ Benchmark. 如果 $\mathbf{z}$ 已知,并且完全知道 $D_t (p_t, \mathbf{z})$,那么存在最优的价格和 order-up-to 库存水平 $(p^\ast, q^\ast)$,由此可以计算 regret。
通过 censored data、非 iid 数据得到的 MLE 估计量 $\hat{z}_i$ 有良好的收敛性: $$ \mathbb{P}\left\{\left|z-\hat{z}_i\right| \geq \epsilon\right\} \leq K_3 e^{-K_4 I_i \epsilon^2}+\frac{K_5}{I_i} \tag{Proposition-1} $$
Dynamic Pricing and Learning with Finite Inventories
OR, 2015. Arnoud V. den Boer, Bert Zwart.
Parametric demand learning with limited price explorations in a backlog stochastic inventory system.
IISE, 2019.
Data-Driven Dynamic Pricing and Ordering with Perishable Inventory in a Changing Environment
MS, 2022. N. Bora Keskin, Yuexing Li, Jing-Sheng Song.
【href】
Bayesian Dynamic Pricing
Bayesian Dynamic Pricing Policies: Learning and Earning Under a Binary Prior Distribution
MS, 2012.
Bayesian dithering for learning: Asymptotically optimal policies in dynamic pricing
In this paper, we present dithering as a unified approach to strike the balance between exploration and exploitation in the context of dynamic pricing.
阐述了参数方法 incomplete learning 的问题。
The Exploration-Exploitation Trade-off in the Newsvendor Problem
STOCHASTIC SYSTEMS, 2022. Omar Besbes, Juan M. Chaneton, Ciamac C. Moallemi.
Analysis without Regret
Bayesian MDP in inventory mangement
Stalking Information: Bayesian Inventory Management with Unobserved Lost Sales
MS, 1999. Martin A. Lariviere, Evan L. Porteus.
The censored newsvendor and the optimal acquisition of information.
OR, 2002. XIAOMEI DING, MARTIN L. PUTERMAN, ARNAB BISI.
This paper investigates the effect of demand censoring on the optimal policy in newsvendor inventory models with general parametric demand distributions and unknown parameter values.
Using a Bayesian Markov decision process approach we show that the optimal inventory level in the presence of censored demand is higher than would be determined using a Bayesian myopic policy.
假设需求分布 $X_n$ 是从参数分布族 $f(\cdot | \theta)$ 中产生的,参数 $\theta$ 未知。给定先验分布 $\pi_n(\theta)$,后验分布是: $$ \pi_{n+1}\left(\theta | x_n\right)=\frac{f\left(x_n | \theta\right) \pi_n(\theta)}{\int_{\Theta} f\left(x_n | \theta^{\prime}\right) \pi_n\left(\theta^{\prime}\right) d \theta^{\prime}} $$ 于是这时候对需求分布的估计就是: $$ m_n(x)=\int_{\Theta} f(x | \theta) \pi_n\left(\theta | x_{n-1}\right) d \theta $$ 文章把 Observable Lost Sales 的情况建模成一个 Bayesian MDP。
而对于 demand censoring 的情况也可以写成一个 BMDP。
A Note on “The Censored Newsvendor and the Optimal Acquisition of Information”
OR, 2009. Alain Bensoussan, Metin Çakanyıldırım, Suresh P. Sethi.
This paper revisits the finite-horizon model of a censored newsvendor by Ding et al. [Ding, X., M. L. Puterman, A. Bisi. 2002. The censored newsvendor and the optimal acquisition of information. Oper. Res. 50 517–527]. An important result claimed there without a proper proof is that the myopic order quantity is always less than or equal to the optimal order quantity. Lu et al. [Lu, X., J. S. Song, K. Zhu. 2008. Analysis of perishable inventory systems with censored demand data. Oper. Res. 56(4) 1034–1038.] supplied a correct proof of the result.
Technical Note—Analysis of Perishable-Inventory Systems with Censored Demand Data
OR, 2008. Xiangwen Lu, Jing-Sheng Song, Kaijie Zhu.
用 DP 进行建模。
Dynamic Inventory Management with Learning About the Demand Distribution and Substitution Probability
MSOM, 2008
A Multiperiod Newsvendor Problem with Partially Observed Demand
MOR, 2007. Alain Bensoussan, Metin Çakanyıldırım, Suresh P. Sethi.
没用贝叶斯方法,但也是DP。
Dynamic pricing and inventory management with demand learning: A bayesian approach
COR, 2020. Jue Liu, Zhan Pang, Linggang Qi.
Operational Statistics
A practical inventory control policy using operational statistics
ORL, 2005. Liwan H. Liyanage, J.George Shanthikumar.
The traditional approach of separating the parameter estimation and the maximization of the expected profit leads to a suboptimal inventory policy.
引入了 operational statistics 这一概念。
文章假设需求分布服从指数分布,但是参数未知。通常的做法是 estimation 然后做 optimization,即先通过需求样本估计指数分布的参数,再选取该参数下指数分布的某个分为点作为下一阶段的订货量。
文章证明了这种做法是次优的(sub-optimal)。直接把订货量当成需求样本的函数,代入到利润函数里,再计算得到订货量的这种方法,是最优的,文章称这种做法叫 operational statistics。
Inventory Policy with Parametric Demand: Operational Statistics, Linear Correction, and Regression
POM, 2012. Vivek Ramamurthy, J. George Shanthikumar, Zuo-Jun Max Shen.
对更多的参数族讨论了其 operational statistics。
The framework of parametric and nonparametric operational data analytics
Other
A two-armed bandit theory of market pricing
Theory
Learning to Optimize via Posterior Sampling
MOR, 2014. Daniel Russo, Benjamin Van Roy.
Thompson Sampling for the MNL-Bandit
Proceedings of Machine Learning Research, 2017. Shipra Agrawal, Vashist Avadhanula, Vineet Goyal, Assaf Zeevi.
$N$ 件商品,其价值分别为 $r_1, \dots, r_N$,记选品 $S$ 的选择模型为: $$ p_i(S)= \begin{cases}\displaystyle\frac{v_i}{v_0+\sum_{j \in S} v_j}, & \text { if } i \in S \cup\{0\} \\ 0, & \text { otherwise, }\end{cases} $$ 假设 $v_0 = 1$,静态的优化问题是 $$ \max_{|S| \leq K} \; R(S, \mathbf{v}) = \sum_{i \in S} r_i p_i(S)=\sum_{i \in S} \frac{r_i v_i}{1+\sum_{j \in S} v_j} $$ 在已知 $v_i \; (i=1, \dots, N)$ 时,存在最优的 assortment $S^\ast$,定义 regret $$ \operatorname{Reg}(T, \mathbf{v})=\mathbb{E}\left[\sum_{t=1}^T R\left(S^\ast, \mathbf{v}\right)-R\left(S_t, \mathbf{v}\right)\right] $$ 文章设计了一组共轭先验 (conjugate prior):
令 $\tilde{v}_{i,\ell}$ 是选品集保持为 $S_\ell$ 时,$i \in S_{\ell}$ 选取的次数(直到商品 $0$ 被选取,即顾客没有选择商品),则对于任意的 $\ell, i$,$\tilde{v}_{i, \ell}$ 都是均值为 $v_i$ 的几何分布,试验成功的几率是 $1/(1+ v_i)$ .