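Convolutional neural networks (CNN):
CNNs capture stable local features and patterns in an image. They rely on weight sharing, which was designed to handle object translation: a kernel slides over the whole image to pick up local patterns, and multiple kernels are stacked to extract, from several angles, representations suited to the downstream task. These representations are learned by backpropagation, so an object can be recognized wherever it appears in the image.
As more convolutional layers are stacked, the network also obtains global style and feature representations.
However, a CNN cannot handle other effects of viewpoint change, such as rotation and scaling.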
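Main routes for improvement:
1. Dilated convolution: convolve over inputs spaced 1 (or k) apart to enlarge the receptive field
It injects holes into the standard convolution map to enlarge the receptive field. Compared with ordinary convolution, dilated convolution has one extra hyper-parameter, the dilation rate, which is the spacing between kernel elements (e.g. ordinary convolution has dilation rate 1).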
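A minimal sketch of how the dilation rate widens the receptive field, written for the 1-D case with no padding. The function name `dilated_conv1d` and the toy signal are illustrative assumptions, not taken from any library.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (no padding) 1-D cross-correlation with gaps between kernel taps."""
    # Number of input positions one output value can see.
    span = (len(kernel) - 1) * dilation + 1
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(w * signal[start + j * dilation]
                       for j, w in enumerate(kernel)))
    return out


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]
print(dilated_conv1d(signal, kernel, dilation=1))  # each output sees 3 inputs
print(dilated_conv1d(signal, kernel, dilation=2))  # each output sees 5 inputs
```

The kernel still has three weights in both calls; only the spacing between the inputs it reads changes.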
2. Spatial Pyramid Pooling
3. Parallel structure: Inception
4. Serial structure: Residual Connections / DenseNet
5. Faster R-CNN proposal strategy: score and evaluate regression boxes
6. Capsule Network: vector neurons; the relationship between parts and the whole
Reference: https://www.sohu.com/a/226611009_633698
http://08643.cn/p/83309cdb9326
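7. Gate mechanism:
A gate usually multiplies a neuron's output elementwise by a sigmoid or tanh activation, so learning the weights inside the sigmoid controls how much information is released to the next layer.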
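A small sketch of a sigmoid gate, assuming one scalar weight and bias per feature (a simplification chosen here for readability; real gates usually use a full weight matrix). The name `gated_output` is hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_output(features, gate_weights, gate_bias):
    """Scale each feature by a sigmoid gate computed from that same feature."""
    gates = [sigmoid(w * x + b)
             for x, w, b in zip(features, gate_weights, gate_bias)]
    # Elementwise product: a gate near 0 blocks the feature, near 1 passes it.
    outputs = [g * x for g, x in zip(gates, features)]
    return outputs, gates


outputs, gates = gated_output([2.0, -1.0], [3.0, 3.0], [0.0, 0.0])
print(gates, outputs)
```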
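Overall structural combination and ensembling:
e.g. a hierarchical structure, or multiple networks that are ensembled, run in parallel, or work at multiple scales
8. TCN: one-dimensional convolution that fuses information and captures dependencies along the time dimension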
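9. Knowledge distillation in NLP:
The model learns partly from soft labels (fractional probabilities that give a smooth transition between classes) and partly from hard 0-1 labels, which lets it pick up more features.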
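One common way to combine soft and hard targets is a weighted sum of two cross-entropy terms. The sketch below assumes a teacher/student setup with a softmax temperature; the `temperature` and `alpha` values and the function names are illustrative assumptions, not settings from these notes.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs):
    return -sum(t * math.log(p)
                for t, p in zip(target_probs, predicted_probs) if t > 0)


def distillation_loss(student_logits, teacher_logits, hard_label,
                      temperature=2.0, alpha=0.5):
    # Soft part: match the teacher's softened distribution.
    soft_targets = softmax(teacher_logits, temperature)
    soft_loss = cross_entropy(soft_targets, softmax(student_logits, temperature))
    # Hard part: ordinary cross-entropy against the 0-1 label.
    one_hot = [1.0 if i == hard_label else 0.0
               for i in range(len(student_logits))]
    hard_loss = cross_entropy(one_hot, softmax(student_logits))
    return alpha * soft_loss + (1 - alpha) * hard_loss


print(distillation_loss([5.0, 0.0, 0.0], [5.0, 0.0, 0.0], hard_label=0))
```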
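10. Attention:
Build an encoder and a decoder (an encoder-decoder model), measure the similarity between the decoder state and each encoder step, turn the similarities into a weight for every encoder step, and feed the weighted encoder output back into the decoder. (Typically used on time series as LSTM + Attention.)
Channel Attention: weights the channels.
Spatial Attention: every position on the w*h grid has a weight, and all spatial positions are weighted accordingly.
Multi-head attention: several parallel parameterized (linear) transformations let the model learn patterns from different angles.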
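The encoder-decoder weighting described above can be sketched with dot-product similarity followed by a softmax. Dot product is only one possible similarity measure and is an assumption here; the function name `attend` is hypothetical.

```python
import math


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return attention weights over encoder steps and the weighted context."""
    scores = [sum(d * e for d, e in zip(decoder_state, h))
              for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context


weights, context = attend([0.0, 3.0], [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
print(weights, context)
```

The encoder step most similar to the decoder state receives the largest weight, and the context vector is what gets passed back to the decoder.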
11. Dynamic network search:
(1) NAS: Neural Architecture Search
(2) Dynamic Routing
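Problems:
1. Coordinate frames and part-whole relationships matter.
A CNN only recognizes an image as blobs of pixels arranged in different patterns, and has no explicit internal representation of entities and their relationships.
2. Small differences can be dangerous: a slight modification that is invisible to the human eye can cause misclassification, i.e. noise undermines the stability of convolution.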
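Graph Convolution: convolution extended to graph-structured data
Three challenges for graph convolutional networks
1. Learning stable local patterns
2. Reducing the computational complexity of the kernel space
3. Selecting an equal number of neighboring nodes (e.g. via KNN or adaptive neighbor selection)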
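Possible directions for improvement:
Possible changes inside a GCN:
1. Different operations: forward/backward differences, differences between adjacent nodes
2. Similarity measures and different node selection
3. Residual and dense connections
4. Multi-scale GCN
5. Hierarchy, + LSTM, etc.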
Advancing GraphSAGE with a data-driven node sampling
【CVPR2020】Learning Dynamic Routing for Semantic Segmentation
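Different forward-propagation paths are selected depending on the scale of the input.
Multi-path propagation and skip connections: this kind of connectivity is very important in semantic segmentation.
Soft conditional gate (a gating network for path selection): adaptively selects feature-transformation paths according to the input image. The gate can also be modeled as a differentiable module, so the network structure can be optimized end to end under a given computational budget. The paper instantiates path selection as a soft conditional gate that chooses paths automatically according to object scale.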
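A rough sketch of soft conditional gating over candidate paths. It assumes the gate activation is max(0, tanh(x)), so that a gate can close exactly and its path can be skipped; that activation, the path names, and the toy path functions are assumptions for illustration and are not stated in these notes.

```python
import math


def soft_gate(logit):
    # Assumed activation: exactly 0 for non-positive logits, below 1 otherwise.
    return max(0.0, math.tanh(logit))


def route(feature, path_logits, path_fns):
    """Sum the gated outputs of all open paths and skip the closed ones."""
    out = 0.0
    active = []
    for name, fn in path_fns.items():
        gate = soft_gate(path_logits[name])
        if gate > 0.0:  # a closed gate costs no computation
            out += gate * fn(feature)
            active.append(name)
    return out, active


# Hypothetical stand-ins for up-sampling, keeping, and down-sampling paths.
paths = {"up": lambda x: 2.0 * x, "keep": lambda x: x, "down": lambda x: x / 2.0}
logits = {"up": -1.0, "keep": 0.5, "down": 2.0}
print(route(4.0, logits, paths))
```

In a real network the logits would be predicted from the input feature map, so different images open different paths.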
This paper studies a conceptually new method to alleviate the scale variance in semantic representation, named dynamic routing. The proposed framework generates data-dependent routes, adapting to the scale distribution of each image.
on the fly: while running
relax the network level routing space to support multi-level propagations, bridging substantial network capacity.
Large distribution variance of object scale (in a figure/picture) e.g., the tiny object instances and picture-filled background stuff.
bring difficulties to feature representations
delve (to investigate in depth): There are several works delving in searching effective architectures.
intend/aim/design to in a single xx, which lacks the adaptability to diverse scale distributions in the real-world environment.
customizable network
accommodate the scale variance of each image.
conceptually novel
the specific network varies with inputs.
Different from them, this work focuses on semantic representation and intends to alleviate scale variance as well as improve network efficiency.
skip connections
The overall approach, named xx, can be easily
instantiated for semantic segmentation.
be fixed for stability
dashed lines denote alternative paths for dynamic routing.
both to reduce the feature resolution and double the number of filters.
upsampling: increase the spatial resolution as well as halve the number of filters (halve: divide in two).
suffice for (be enough for) the resolution decline pipeline
To release the potential of xx:
To release the potential of dynamic routing, we provide the FC paths between adjacent layers with xx.
Different from Autoxx where only one specific path xx is selected in the inference stage, we further relax the routing space to support multi-path routes and skip-connections in each candidate.
Types of dynamic networks:
1.drop blocks
2.prune channels
3.skip convolutional blocks
4.dynamic routing
Cell Operation
Soft conditional Gate:
Adding two budget constraints, Node and Space, gives the basic framework of dynamic neural network routing.
【CVPR2020】SGAS: Sequential Greedy Architecture Search
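Neural architecture search methods usually evaluate candidates quickly on a small, representative dataset. The problem addressed here is the gap between evaluation on that small dataset during the search and the final evaluation on the large dataset; this gap needs to be narrowed.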
Protein-Protein interaction graphs
from feature engineering to architecture engineering
many efforts have been made to reduce ..
As a matter of fact (in fact)
leverage prior experience in order to quickly find good algorithm configurations
Stating a drawback: However, its high computational cost has prevented widespread adoption.
This paper proposes a novel greedy architecture search algorithm, SGAS, which addresses this discrepancy and searches very efficiently.
(1) high correlation between the validation accuracy during the search phase and the final evaluation accuracy
(2) discovers top performing architectures with much less search cost.
Describing related work:
One of the earliest successful architectures xx.
Other prominent networks include xx which revolutionized computer vision by xx by a large margin.
ResNet [19] and DenseNet [22] were further milestones in architecture design
These works were extended with xx where a new cell-based search...
leverage the redundancy in network space and only sample a subset of channels in super-net during search to reduce computation.
Method:
Directed acyclic graph (DAG)
Each directed edge (i,j) is associated with an operation o(i,j), which passes information (the signal) from node i to node j.
alpha(i,j) are the architectural parameters: a softmax mixture over all possible operations
relax the selection of operations to a continuous optimization problem.
Each intermediate node aggregates information flows from all of its predecessors.
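The continuous relaxation above can be sketched as a softmax-weighted mixture of candidate operations on each edge, with each node summing over its incoming edges. The three scalar operations below are toy stand-ins, assumed for illustration, for real candidates such as convolutions or pooling.

```python
import math


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Toy candidate operations on scalars (assumed for illustration).
CANDIDATE_OPS = {
    "identity": lambda x: x,
    "double": lambda x: 2.0 * x,
    "zero": lambda x: 0.0,
}


def mixed_op(x, alphas):
    """Softmax mixture over all candidate operations on one edge (i,j)."""
    weights = softmax([alphas[name] for name in CANDIDATE_OPS])
    return sum(w * op(x) for w, op in zip(weights, CANDIDATE_OPS.values()))


def node_output(inputs, edge_alphas):
    """An intermediate node sums the mixed outputs of all its predecessors."""
    return sum(mixed_op(x, a) for x, a in zip(inputs, edge_alphas))


uniform = {"identity": 0.0, "double": 0.0, "zero": 0.0}
print(mixed_op(3.0, uniform))
print(node_output([3.0, 1.0], [uniform, uniform]))
```

As one alpha grows much larger than the others, the mixture approaches that single operation, which is what makes a discrete choice recoverable after the search.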
Search-Evaluation Correlation:
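NAS has two stages: a search phase and an evaluation phase. In order to reduce computational overhead,
search and evaluation are run on lightweight models, and the model with the best result is then trained at full scale.
This assumes that the final model's performance is consistent with its performance in the search phase.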
This assumption usually does not hold due to the discrepancy in dataset, hyper-parameters and network architectures between the search and evaluation phases.
Refer to this issue as degenerate search-evaluation correlation.
Nc and Nd denote the numbers of concordant and discordant pairs.
The Kendall τ coefficient is a number in the range from -1 to 1, where -1 corresponds to a perfect negative correlation and 1 to a perfect positive correlation.
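For reference, the standard Kendall τ (tau-a, which ignores tie corrections) is (Nc - Nd) divided by the number of pairs, n(n-1)/2. A small sketch with made-up rankings of search-phase and evaluation-phase accuracies:

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


search_rank = [1, 2, 3, 4]  # hypothetical ranking during search
final_rank = [1, 3, 2, 4]   # hypothetical ranking after full training
print(kendall_tau(search_rank, final_rank))
```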
Sequential Greedy Architecture Search
Problems addressed: 1. reduce the discrepancy between the search and evaluation phases;
2. reduce the negative effect of weight sharing.
Propose to solve the bi-level optimization (Equation 1, 2) in a sequential greedy fashion to reduce the model discrepancy and the weight sharing progressively.
As a side benefit, the efficiency improves as parameters in A and W are pruned gradually in the optimization loop. The search procedure of the remaining A and W forms a new sub-problem which will be solved iteratively.
At the end of the search phase, a stand-alone network without weight sharing is obtained, as illustrated in Figure 2.
Edges are assessed on three aspects:
edge importance
selection certainty: measured by entropy
selection stability
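A sketch of an entropy-based certainty score for one edge. It assumes certainty is defined as one minus the entropy of the operation distribution normalized by its maximum, log(n); that normalization is an assumption made here so the score lies in [0, 1].

```python
import math


def selection_certainty(probs):
    """1 - normalized entropy: 0 for a uniform choice, 1 for a one-hot choice."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(probs))


print(selection_certainty([0.25, 0.25, 0.25, 0.25]))  # undecided edge
print(selection_certainty([0.97, 0.01, 0.01, 0.01]))  # nearly decided edge
```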
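Histogram intersection kernel: using hierarchical histograms, the multi-level histograms of two data distributions are compared to get the intersection counts, which are then weighted to give the final similarity between the distributions.
Reference: [https://blog.csdn.net/smartempire/article/details/23168945]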
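A simplified sketch of the multi-level comparison: build histograms at several bin counts, take the per-bin minimum, and combine the levels with weights. The bin counts, the weights, and the [0, 1] value range are illustrative assumptions, and this version does not subtract matches already counted at finer levels as a full pyramid match would.

```python
def histogram(values, num_bins, low=0.0, high=1.0):
    counts = [0] * num_bins
    width = (high - low) / num_bins
    for v in values:
        idx = min(int((v - low) / width), num_bins - 1)
        counts[idx] += 1
    return counts


def intersection(h1, h2):
    """Histogram intersection: how many samples fall in matching bins."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def multilevel_similarity(xs, ys, levels=(8, 4, 2), weights=(1.0, 0.5, 0.25)):
    # Finer levels get larger weights: a match in a small bin is stronger evidence.
    return sum(w * intersection(histogram(xs, bins), histogram(ys, bins))
               for bins, w in zip(levels, weights))


print(intersection([3, 1, 0], [1, 2, 4]))
print(multilevel_similarity([0.1, 0.6], [0.12, 0.9]))
```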
Two criteria:
Built from selection stability (histogram intersection) and selection certainty (entropy), combined with edge importance.
Criterion 1
high selection certainty
high edge importance
Criterion 2
Criterion 1 and 2 improve the Kendall τ correlation coefficients to 0.56 and 0.42 respectively.
Experiments omitted.
【ICCV2019】DeepGCN:
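Uses dilated convolution and residual/dense connections to make GCNs deeper.
Its ablation study shows that deeper networks have better representational power. Details: ReLU activation is applied before the addition; the kernel size is 16 or 9.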
Applications of graphs:
【NIPS2017】GraphSAGE: an inductive learning method suited to large-scale networks
It can quickly learn embeddings for newly added nodes.
GraphSAGE has enabled an inductive capability for inferring unseen nodes or graphs by aggregating subsampled local neighborhoods and by learning in a mini-batch gradient descent fashion.
It can also infer a batch of target nodes with diverse degrees in parallel.
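Main problems of most transductive approaches:
No weight sharing (DeepWalk, LINE, node2vec): the node embeddings are simply an N*d matrix, with no learned parameters shared between nodes.
The input dimension is fixed at |V|, and training depends on a fixed network structure over a fixed node set, which limits the handling of dynamic graphs: no embedding can be generated for a newly added node.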
Unlike these previous approaches, we leverage feature information in order to train a model
to produce embeddings for unseen nodes.
where feature vectors for graphs are derived from various graph kernels.
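GraphSAGE model:
Neighbor sampling: sample a fixed number of neighbors for each node.
Neighbor feature aggregation: aggregate the features of the sampled neighbors.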
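A sketch of fixed-size uniform neighbor sampling. It assumes that nodes with fewer neighbors than the sample size are sampled with replacement, which is one common convention rather than something specified in these notes; `sample_neighbors` is a hypothetical helper.

```python
import random


def sample_neighbors(adjacency, node, sample_size, rng):
    """Return exactly `sample_size` neighbors of `node` (empty if it has none)."""
    neighbors = adjacency[node]
    if not neighbors:
        return []
    if len(neighbors) >= sample_size:
        return rng.sample(neighbors, sample_size)  # without replacement
    # Too few neighbors: draw with replacement to keep the size fixed.
    return [rng.choice(neighbors) for _ in range(sample_size)]


adjacency = {0: [1, 2, 3, 4], 1: [0], 2: []}
rng = random.Random(0)
print(sample_neighbors(adjacency, 0, 2, rng))
print(sample_neighbors(adjacency, 1, 3, rng))
```

A fixed sample size keeps every node's neighborhood the same shape, so a mini-batch of nodes with very different degrees can be processed together.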
sample and aggregate: so the operations in a GCN also include sampling, i.e. choosing which nodes count as neighbors (for example, using KNN to update the neighbor matrix at each step), followed by aggregation and interaction. Each aggregator function aggregates information from a different number of hops.
The core idea is to learn how to aggregate feature information from a node's local neighborhood (e.g. the degrees or text attributes of nearby nodes).
GraphSAGE embedding generation: uses standard stochastic gradient descent and backpropagation techniques.
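Spectral clustering is an algorithm that evolved from graph theory and later became widely used for clustering. Main idea: treat all the data as points in space that can be connected by edges. An edge between two distant points gets a low weight and an edge between two close points gets a high weight. The graph formed by all the data points is then cut so that the total edge weight between different subgraphs is as low as possible while the total edge weight inside each subgraph is as high as possible, which achieves the clustering.
Reference: https://www.cnblogs.com/pinard/p/6221564.html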
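The cut objective can be illustrated without the eigenvector machinery of a full spectral clustering implementation. The sketch below assumes 1-D points and a Gaussian similarity for the edge weights (both assumptions made here), and only compares the cut weight of two candidate partitions.

```python
import math


def similarity(a, b, sigma=1.0):
    """Gaussian edge weight: close points get weights near 1, far ones near 0."""
    return math.exp(-((a - b) ** 2) / (2 * sigma ** 2))


def cut_weight(points, group):
    """Total edge weight crossing between `group` and the remaining points."""
    inside = set(group)
    outside = [i for i in range(len(points)) if i not in inside]
    return sum(similarity(points[i], points[j])
               for i in inside for j in outside)


points = [0.0, 0.1, 5.0, 5.1]
good_cut = cut_weight(points, [0, 1])  # separates the two natural clusters
bad_cut = cut_weight(points, [0, 2])   # splits both clusters
print(good_cut, bad_cut)
```

The partition that follows the natural clusters only cuts low-weight edges, which is the quantity spectral clustering tries to minimize.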
3.1 embedding generation
learned the parameters of K aggregator functions
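GraphSAGE embedding generation algorithm:
For each node, aggregate its neighborhood, i.e. apply the k-th aggregator.
Then concatenate the result with the node's current representation, multiply by the weight matrix Wk, and pass it through the activation function to get h(k,v).
h(k,v) = h(k,v) / ||h(k,v)||_2
This is done for every node and for every aggregator, which is equivalent to a k-layer GCN, i.e. a k-th order graph convolution.
Then a concat operation.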
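The steps above can be sketched in plain Python with a mean aggregator. For brevity this sketch aggregates over all neighbors instead of a sampled subset, uses ReLU as the activation, and takes made-up weights and features; these are assumptions for illustration only.

```python
import math


def mean_aggregate(vectors):
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def relu_linear(weight, x):
    """Apply W x followed by ReLU; `weight` is a list of rows."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in weight]


def l2_normalize(x):
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x] if norm > 0 else x


def graphsage_forward(features, adjacency, weights):
    h = dict(features)                      # h(0,v) is the input feature x_v
    for weight in weights:                  # one weight matrix Wk per depth k
        new_h = {}
        for v, neighbors in adjacency.items():
            h_neigh = mean_aggregate([h[u] for u in neighbors])
            combined = h[v] + h_neigh       # list concatenation acts as CONCAT
            new_h[v] = l2_normalize(relu_linear(weight, combined))
        h = new_h                           # all nodes update simultaneously
    return h


features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
adjacency = {0: [1, 2], 1: [0], 2: [0]}
W = [[0.5, 0.1, 0.2, 0.3], [0.1, 0.4, 0.3, 0.2]]  # maps 2+2 inputs to 2 outputs
print(graphsage_forward(features, adjacency, [W, W]))
```

With two weight matrices the loop runs twice, so each node's final vector depends on nodes up to two hops away.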
As this process iterates, nodes incrementally gain more and more information from further reaches of the graph.
First, each node v ∈ V aggregates the representations of the nodes in its immediate neighborhood,
into a single vector h(k-1,N(v)).
GraphSAGE then concatenates the node's current representation, h(k-1,v), with the aggregated neighborhood vector h(k-1,N(v)).
When introducing the method [which theory it draws on]:
provide theoretical context for our algorithm design to learn the topological structure of node neighborhoods.
provide insights into xxx
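A set of fixed-size neighbors has to be found or defined by the method itself and used as N(v), and the number of neighbors aggregated differs between layers.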
draw different uniform samples at each iteration, k
3.2 Learning the parameters of GraphSAGE
It works somewhat like clustering:
unrelated nodes -> highly distinct representations
similar nodes -> similar representations
The graph-based loss function encourages nearby nodes to have similar representations, while enforcing
that the representations of disparate nodes are highly distinct.
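A sketch of a loss with this behavior, in the negative-sampling form: a positive term pulls a node toward a nearby node, and a weighted negative term pushes it away from sampled distant nodes. The argument names and the use of a mean over negatives to approximate the expectation are assumptions made here.

```python
import math


def log_sigmoid(x):
    # Numerically stable log(sigmoid(x)).
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def graph_loss(z_u, z_pos, z_negs, neg_weight=1.0):
    """-log s(z_u . z_pos) - Q * mean over negatives of log s(-z_u . z_neg)."""
    pos_term = -log_sigmoid(dot(z_u, z_pos))
    neg_term = -neg_weight * sum(log_sigmoid(-dot(z_u, z_n))
                                 for z_n in z_negs) / len(z_negs)
    return pos_term + neg_term


# Nearby node is similar and the negative is opposite: low loss.
print(graph_loss([1.0, 0.0], [1.0, 0.0], [[-1.0, 0.0]]))
# Nearby node is opposite and the negative is similar: high loss.
print(graph_loss([1.0, 0.0], [-1.0, 0.0], [[1.0, 0.0]]))
```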
From node features to downstream learning tasks
on a specific downstream task
3.3 Aggregation functions:
operate over an unordered set of vectors
Invariance to the ordering of the input
Mean aggregator
LSTM aggregator
Pooling aggregator
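A short sketch of why the mean and pooling aggregators do not depend on neighbor order. The pooling version here applies a plain ReLU before the elementwise max as a stand-in for a learned layer; that stand-in is an assumption. An LSTM aggregator is not order-invariant by itself, so it is left out.

```python
def mean_aggregator(vectors):
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def relu(vector):
    return [max(0.0, x) for x in vector]


def max_pool_aggregator(vectors, transform=relu):
    """Transform every neighbor vector, then take the elementwise maximum."""
    transformed = [transform(v) for v in vectors]
    return [max(column) for column in zip(*transformed)]


neighbors = [[1.0, 2.0], [3.0, -4.0], [5.0, 0.0]]
shuffled = list(reversed(neighbors))
print(mean_aggregator(neighbors) == mean_aggregator(shuffled))
print(max_pool_aggregator(neighbors), max_pool_aggregator(shuffled))
```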
Proving three lemmas:
Theoretical analysis
We probe the expressive capabilities of GraphSAGE in order to provide insight into how GraphSAGE can learn about graph structure, even though it is inherently based on features.
Provide insight into how GraphSAGE can learn about graph structure
Theorem 1 states that for any graph there exists a parameter setting for Algorithm 1 such that it can
approximate clustering coefficients in that graph to an arbitrary precision.
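For context, the quantity being approximated can be computed directly. The sketch below uses the standard local clustering coefficient on an undirected graph stored as a dict of neighbor sets; the example graph is made up.

```python
from itertools import combinations


def clustering_coefficient(adjacency, node):
    """Fraction of pairs of `node`'s neighbors that are themselves connected."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adjacency[a])
    return links / (k * (k - 1) / 2)


# A triangle 0-1-2 with an extra node 3 attached to node 0.
adjacency = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(clustering_coefficient(adjacency, 0))
print(clustering_coefficient(adjacency, 1))
```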
The proof of Theorem 1 relies on some properties of the pooling aggregator, which also provides insight into why GraphSAGE-pool outperforms the GCN and mean-based aggregators.
Phrasing for related work:
Unlike these previous approaches,
we leverage feature information in order xx
attempt to classify entire xx
the focus of our work is xx
【ICLR】ADVANCING GRAPHSAGE WITH A DATA-DRIVEN NODE SAMPLING
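GraphSAGE can infer unseen nodes or graphs by aggregating subsampled local neighborhoods and learning by mini-batch gradient descent, and can infer a batch of target nodes with diverse degrees in parallel.
However, uniform sampling causes high variance in both the training and evaluation phases, leading to sub-optimum accuracy.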
We propose a new data-driven sampling approach to reason about the real-valued
importance of a neighborhood by a non-linear regressor, and to use the value as a
criterion for subsampling neighborhoods. The regressor is learned with value-based reinforcement learning.
The implied importance for each combination of vertex and neighborhood is inductively extracted from the negative classification loss output of GraphSAGE.
Introduction to graph networks
Machine learning on graph-structured network data has proliferated in a number of important applications.
To name a few, it shows great potential in chemical prediction problems (Gilmer et al. (2017)), protein function understanding, and particle physics experiments (Henrion et al. (2017); Choma et al. (2018)). Learning the representation of structural information about a graph discovers a mapping that embeds nodes (or sub-graphs) as points in a low-dimensional vector space. Graph neural network algorithms based on neighborhood aggregation address the problem by leveraging a node's attributes (Kipf & Welling (2016); Hamilton et al. (2017); Pham et al. (2017)). The GraphSAGE algorithm (Hamilton et al. (2017)) recursively subsamples, by uniform sampling, a fixed number of nodes from local neighborhoods over multiple hops, and learns a set of aggregator models that aggregate the hidden features of the subsampled nodes by backtracking toward the origin.