Commit
Showing 50 changed files with 1,361 additions and 190 deletions.
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
Weather = ('Rainy', 'Sunny') | ||
Activity = ('walk', 'shop', 'clean') | ||
|
||
obs = list(range(len(Activity)))  # observation sequence (indices into Activity) | ||
states_h = list(range(len(Weather)))  # hidden states (indices into Weather) | ||
|
||
# Initial probabilities (hidden states) | ||
start_p = [0.6, 0.4] | ||
# Transition probabilities (between hidden states) | ||
trans_p = [[0.7, 0.3], | ||
[0.4, 0.6]] | ||
# Emission probabilities (probability that a hidden state produces each observation) | ||
emit_p = [[0.1, 0.4, 0.5], | ||
[0.6, 0.3, 0.1]] | ||
|
||
|
||
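def viterbi(obs, states_h, start_p, trans_p, emit_p): | ||
    """Viterbi algorithm""" | ||
    # Build every row separately: `[[0.0] * n] * m` would alias one row m times | ||
    dp = [[0.0] * len(states_h) for _ in obs] | ||
    path = [[0] * len(obs) for _ in states_h] | ||
|
||
    # Initialization: start probability times emission of the first observation | ||
    for i in states_h: | ||
        dp[0][i] = start_p[i] * emit_p[i][obs[0]] | ||
        path[i][0] = i | ||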
|
||
|
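The hunk above stops right after the initialization step, so the recursion and backtracking are not visible here. As a rough illustration only (not the commit's own code), a complete Viterbi decoder over the same toy model could look like the sketch below. The name `viterbi_decode` and the backpointer table `back` are assumptions introduced for this example.

```python
# Toy model copied from the hunk above.
Weather = ('Rainy', 'Sunny')
Activity = ('walk', 'shop', 'clean')
start_p = [0.6, 0.4]
trans_p = [[0.7, 0.3],
           [0.4, 0.6]]
emit_p = [[0.1, 0.4, 0.5],
          [0.6, 0.3, 0.1]]


def viterbi_decode(obs, start_p, trans_p, emit_p):
    """Return (probability, state path) of the most likely hidden sequence.

    Hypothetical helper: a sketch of the standard algorithm, not the
    function from the commit.
    """
    n_states = len(start_p)
    # dp[t][s]: probability of the best path that ends in state s at time t
    dp = [[0.0] * n_states for _ in obs]
    # back[t][s]: predecessor of s on that best path
    back = [[0] * n_states for _ in obs]

    # Initialization
    for s in range(n_states):
        dp[0][s] = start_p[s] * emit_p[s][obs[0]]

    # Recursion: extend the best path into each state
    for t in range(1, len(obs)):
        for s in range(n_states):
            prev = max(range(n_states),
                       key=lambda p: dp[t - 1][p] * trans_p[p][s])
            dp[t][s] = dp[t - 1][prev] * trans_p[prev][s] * emit_p[s][obs[t]]
            back[t][s] = prev

    # Termination and backtracking
    last = max(range(n_states), key=lambda s: dp[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return dp[-1][last], path


prob, path = viterbi_decode([0, 1, 2], start_p, trans_p, emit_p)
print(prob, [Weather[s] for s in path])
```

For the observation sequence walk, shop, clean, this sketch decodes to Sunny, Rainy, Rainy with probability of about 0.01344. Storing one backpointer per time step and state avoids copying whole paths during the recursion.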
File renamed without changes.
File renamed without changes.
Binary file not shown.
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,40 @@ | ||
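NLP - Factoid Question Answering Evaluation | ||
=== | ||
|
||
Index | ||
--- | ||
<!-- TOC --> | ||
|
||
- [Task description](#task-description) | ||
- [Baseline model - BiDAF](#baseline-model---bidaf) | ||
|
||
<!-- /TOC --> | ||
|
||
## Task description | ||
- For each question q, several candidate answer passages a1, a2, …, an are given. The task is to design an algorithm that **extracts suitable words, phrases, or sentences** from the candidate passages to form a correct, complete, and concise text as the predicted answer apred, so that apred answers question q correctly, completely, and concisely. | ||
|
||
- **Example** | ||
``` | ||
Question: Which is the largest inland basin in China | ||
Answer: Tarim Basin | ||
Passages: | ||
1. The Tarim Basin in Xinjiang, China, is the largest inland basin in the world, about 1,500 km long from east to west and about 600 km across at its widest from north to south. The basin floor lies about 1,000 m above sea level, and the basin covers 530,000 square kilometers. | ||
2. China's largest fixed and semi-fixed desert; between the Tianshan and Kunlun mountains lies the Tarim Basin, which covers 530,000 square kilometers and is the largest inland basin in the world. The center of the basin is the Taklamakan Desert, which covers 337,000 square kilometers and is the world's second-largest shifting-sand desert. | ||
``` | ||
|
||
- **Data download** | ||
- [CIPS-SOGOU QA competition](http://task.www.sogou.com/cips-sogou_qa/) (small amount) | ||
- [Baidu WebQA V2.0](http://ai.baidu.com/broad/download) | ||
- [Baidu WebQA V1.0, preprocessed version](https://pan.baidu.com/s/1SADkZjF7kdH2Qk37LTdXKw) (password: kc2q) | ||
> [[Corpus] Baidu's Chinese question answering dataset WebQA](https://spaces.ac.cn/archives/4338) - 科学空间|Scientific Spaces | ||
|
||
## Baseline model - BiDAF | ||
> [1611.01603] [Bidirectional Attention Flow for Machine Comprehension](https://arxiv.org/abs/1611.01603) | ||
**5/6-layer model structure** | ||
1. Embedding layer (character + word) | ||
1. Encoder layer | ||
1. Attention interaction layer | ||
1. Decoder layer | ||
1. Output layer |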
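The layer list above names an attention interaction layer; in the cited BiDAF paper this corresponds to the attention flow layer. The following is a minimal plain-Python sketch of that step under stated assumptions: it uses a dot product in place of the paper's trainable similarity function, and the names `bidaf_attention`, `H` (context vectors), and `U` (question vectors) are hypothetical and not taken from this repository.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(h, u):
    """Stand-in similarity (assumption); the paper learns this function."""
    return sum(x * y for x, y in zip(h, u))


def bidaf_attention(H, U, similarity=dot):
    """Combine T context vectors H with J question vectors U.

    Returns T query-aware vectors of size 4 * d.
    """
    T, J, d = len(H), len(U), len(H[0])
    # S[t][j]: similarity between context word t and question word j
    S = [[similarity(h, u) for u in U] for h in H]

    # Context-to-question: each context word attends over the question words
    c2q = []
    for t in range(T):
        a = softmax(S[t])
        c2q.append([sum(a[j] * U[j][k] for j in range(J)) for k in range(d)])

    # Question-to-context: weight context words by their best question match,
    # giving a single summary vector shared by all positions
    b = softmax([max(row) for row in S])
    q2c = [sum(b[t] * H[t][k] for t in range(T)) for k in range(d)]

    # Concatenate [h; c2q; h * c2q; h * q2c] for every context position
    G = []
    for t in range(T):
        h, u = H[t], c2q[t]
        G.append(h + u
                 + [x * y for x, y in zip(h, u)]
                 + [x * y for x, y in zip(h, q2c)])
    return G


H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy context encodings
U = [[1.0, 0.0], [0.0, 1.0]]              # toy question encodings
G = bidaf_attention(H, U)
print(len(G), len(G[0]))
```

The output keeps one vector per context position, which is what lets a later layer score answer spans inside the passage.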
Binary file added
BIN
+128 KB
project/ref/[2009].The_BellKor_Solution_to_the_Netflix_Grand_Prize.pdf
Binary file not shown.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,41 @@ | ||
Glossary | ||
=== | ||
|
||
Index | ||
--- | ||
<!-- TOC --> | ||
|
||
- [Exponentially weighted average (exponentially decaying average)](#exponentially-weighted-average-exponentially-decaying-average) | ||
- [Bias correction](#bias-correction) | ||
|
||
<!-- /TOC --> | ||
|
||
## Exponentially weighted average (exponentially decaying average) | ||
> [What are exponentially weighted averages and bias correction? - Guo Yaohua](http://www.cnblogs.com/guoyaohua/p/8544835.html) - cnblogs | ||
- **Weighted average** | ||
- Suppose each `θi` has weight `ρi`; then the weighted average of the `θi` is: | ||
<div align="center"><a href="http://www.codecogs.com/eqnedit.php?latex=\fn_jvn&space;v=\sum_{i=1}^t\rho_i\theta_i,\quad&space;where\&space;\sum_{i=1}^t\rho_i=1"><img src="../assets/公式_20180903213109.png" height="" /></a></div> | ||
|
||
- **Exponentially weighted average** | ||
<div align="center"><a href="http://www.codecogs.com/eqnedit.php?latex=\fn_jvn&space;\large&space;v_t=\rho&space;v_{t-1}+(1-\rho)\theta_t"><img src="../assets/公式_20180903203229.png" height="" /></a></div> | ||
|
||
> Note that the weight of older records **decays exponentially**, which is why the exponentially weighted average is also called the **exponentially decaying average** | ||
- **Example**: let `ρ=0.9, v0=0` | ||
|
||
<div align="center"><a href="http://www.codecogs.com/eqnedit.php?latex=\fn_jvn&space;\begin{aligned}&space;v_t&=0.1\theta_t+0.9{\color{Red}v_{t-1}}\\&space;&=0.1\theta_t+0.1*0.9\theta_{t-1}+0.9^2{\color{Red}v_{t-2}}\\&space;&=0.1\theta_t+0.1*0.9\theta_{t-1}+0.1*0.9^2\theta_{t-2}+\cdots+0.1*0.9^{t-1}\theta_1&space;\end{aligned}"><img src="../assets/公式_20180903210625.png" height="" /></a></div> | ||
|
||
> Here `v_t` can be **approximately** regarded as the moving average of the most recent `1/(1-ρ)` values (for `ρ=0.9`, `0.1 * 0.9^9 ≈ 0.039`); the weights of older records are already close to 0. | ||
### Bias correction | ||
- The exponentially weighted average has a fairly large **error** in the early steps | ||
<div align="center"><a href="http://www.codecogs.com/eqnedit.php?latex=\fn_jvn&space;\sum_{i=1}^t0.1*0.9^{i-1}=0.1\cdot\frac{1-0.9^t}{1-0.9}=1-0.9^t"><img src="../assets/公式_20180903212935.png" height="" /></a></div> | ||
|
||
- Note that the weights sum to nearly 1 only as `t -> ∞`; when `t` is small, `v_t` is not a proper weighted average | ||
- **Example**: let `ρ=0.9, v0=0` | ||
<div align="center"><a href="http://www.codecogs.com/eqnedit.php?latex=\fn_jvn&space;\begin{aligned}&space;v_t&=0.1\theta_t+0.9{\color{Red}v_{t-1}}\\&space;&=0.1\theta_t+0.1*0.9\theta_{t-1}+0.9^2{\color{Red}v_{t-2}}\\&space;&=0.1\theta_t+0.1*0.9\theta_{t-1}+0.1*0.9^2\theta_{t-2}+\cdots+0.1*0.9^{t-1}\theta_1&space;\end{aligned}"><img src="../assets/公式_20180903210625.png" height="" /></a></div> | ||
|
||
- When `t` is small, the result differs considerably from the desired weighted average | ||
- **Introducing bias correction** | ||
<div align="center"><a href="http://www.codecogs.com/eqnedit.php?latex=\fn_jvn&space;\large&space;\frac{v_t}{1-\rho^t}"><img src="../assets/公式_20180903213410.png" height="" /></a></div> | ||
|
||
- Bias correction only has an effect in the **early** steps; **later**, as `t` grows, `1-ρ^t -> 1` and it no longer affects `v_t`, as expected |
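The recurrence and the bias-corrected form `v_t / (1 - ρ^t)` above can be checked numerically. A minimal sketch, assuming `v0 = 0` as in the examples; the function name `ewma` and its `bias_correction` flag are hypothetical names introduced here:

```python
def ewma(values, rho=0.9, bias_correction=True):
    """Exponentially weighted average of `values`, starting from v0 = 0.

    With bias_correction=True each v_t is divided by (1 - rho**t),
    which is the sum of the weights used up to step t.
    """
    v = 0.0
    out = []
    for t, theta in enumerate(values, start=1):
        v = rho * v + (1 - rho) * theta
        out.append(v / (1 - rho ** t) if bias_correction else v)
    return out


constant = [10.0] * 3
print(ewma(constant, bias_correction=False))  # early values are far below 10
print(ewma(constant))                         # corrected values stay near 10
```

On a constant input of 10, the uncorrected values start near 1, 1.9, and 2.71, because the weights only sum to `1 - 0.9^t`; dividing by that sum recovers values close to 10 from the first step.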
File renamed without changes.