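Angel Project: Progress for Weeks 5 and 6
Current progress:
Overall:
Details:
Tests of the current code and their results:
Input:
Actual output:
Problems encountered: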
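Q1: When printing the sampled paths, the same vertex was sometimes sampled several times in a row, or the run failed with an error (array index out of bounds).
S1: Debugging revealed two problems:
1. In Scala, Double.NaN is not equal to any number (including itself), so the check should use ".isNaN()" instead of an equality comparison.
2. When building the jumps between layers, I overlooked that a vertex's counterpart in the next higher layer may have an empty neighbor set. The logic was revised as follows:
- When building the multi-layer network, filter out elements whose neighbor set is empty directly in the RDD.
- When deciding on a cross-layer move, first check whether the vertex has a counterpart in the upper layer; if it does not, the cross-layer move can only go downward.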
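The two fixes in S1 can be sketched in plain Scala, without Spark. This is only an illustration under stated assumptions, not the project's code: the names `WalkChecks`, `isUsableWeight`, `pruneEmpty`, and `nextLevel` are hypothetical, each layer is modeled as a `Map` from vertex id to neighbor ids rather than an RDD, and the sketch additionally assumes a downward move is only taken when the vertex exists in the lower layer.

```scala
object WalkChecks {
  // One layer of the multi-layer network: vertex id -> neighbor ids (hypothetical model).
  type Layer = Map[Int, Array[Int]]

  // NaN never compares equal to anything, so `w == Double.NaN` is always false.
  // Use isNaN to detect it instead.
  def isUsableWeight(w: Double): Boolean = !w.isNaN && w > 0.0

  // Drop vertices whose neighbor set is empty, so a walk can never land on them.
  def pruneEmpty(layer: Layer): Layer =
    layer.filter { case (_, neighbors) => neighbors.nonEmpty }

  // A vertex can move up only if a higher layer exists and still contains it.
  def canMoveUp(layers: IndexedSeq[Layer], level: Int, v: Int): Boolean =
    level + 1 < layers.length && layers(level + 1).contains(v)

  // Assumption of this sketch: moving down also requires the vertex in the lower layer.
  def canMoveDown(layers: IndexedSeq[Layer], level: Int, v: Int): Boolean =
    level > 0 && layers(level - 1).contains(v)

  // Pick the layer for a cross-layer move. If the upper counterpart is missing,
  // fall back to moving down; if neither direction is possible, stay put.
  def nextLevel(layers: IndexedSeq[Layer], level: Int, v: Int, wantUp: Boolean): Int =
    if (wantUp && canMoveUp(layers, level, v)) level + 1
    else if (canMoveDown(layers, level, v)) level - 1
    else level
}
```

With an RDD of `(vertex, neighbors)` pairs, the pruning step would correspond to a `filter` on the neighbor array being non-empty.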
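Q2: When calling algorithms from Spark MLlib, the column types in the DataFrame did not match what the interface requires (for example, word2vec requires the elements of its input column to be of type Array[String]).
S2: Change the corresponding types in the overridden transformSchema function, for example: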
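// requires: import org.apache.spark.sql.types._
override def transformSchema(schema: StructType): StructType = {
  StructType(Seq(
    StructField("src", IntegerType, nullable = false),
    StructField("epochNum", IntegerType, nullable = false),
    // Word2Vec expects an array of strings, so the path is declared as ArrayType(StringType)
    StructField("path", ArrayType(StringType), nullable = false)
  ))
}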
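Declaring the schema only describes the output; the data in the "path" column also has to hold strings. A minimal sketch of one way to do that, assuming the walks are stored as integer vertex ids and that Spark (with spark-mllib) is on the classpath. The object name `PathsToWord2Vec`, the helper `withStringPaths`, the output column name "embedding", and the sample walks are all made up for illustration.

```scala
import org.apache.spark.ml.feature.Word2Vec
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{ArrayType, StringType}

object PathsToWord2Vec {
  // Cast "path" from array<int> to array<string>, the element type Word2Vec expects.
  def withStringPaths(walks: DataFrame): DataFrame =
    walks.withColumn("path", col("path").cast(ArrayType(StringType)))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("paths-to-word2vec")
      .getOrCreate()
    import spark.implicits._

    // Two made-up walks, using the column names from the schema above.
    val walks = Seq(
      (1, 0, Seq(1, 2, 3, 2)),
      (2, 0, Seq(2, 3, 1, 3))
    ).toDF("src", "epochNum", "path")

    val model = new Word2Vec()
      .setInputCol("path")
      .setOutputCol("embedding")
      .setVectorSize(8)
      .setMinCount(0) // keep every vertex, even rarely visited ones
      .fit(withStringPaths(walks))

    model.getVectors.show(false)
    spark.stop()
  }
}
```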
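Q3: How should each round of the algorithm be partitioned and encapsulated so that it can ultimately run in a distributed fashion?
S3: Study "DeepWalkPartition" and "DeepWalkPSModel" in the source code further.
Future work: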