
Some issues with the code #27

Open
wjy3326 opened this issue Apr 27, 2022 · 6 comments

Comments

@wjy3326 commented Apr 27, 2022

def get_local_word2entity(entities):  # format: entity_id:entity
    """
    Given the entities information in one line of the dataset, construct a map from word to entity index
    E.g., given entities = 'id_1:Harry Potter;id_2:England', return a map = {'harry': index_of(id_1),
    'potter': index_of(id_1), 'england': index_of(id_2)}
    :param entities: entities information in one line of the dataset
    :return: a local map from word to entity index
    """
    local_map = {}

    for entity_pair in entities.split(';'):
        entity_id = entity_pair[:entity_pair.index(':')]
        entity_name = entity_pair[entity_pair.index(':') + 1:]

        # remove non-character words and transform words to lower case
        entity_name = PATTERN1.sub(' ', entity_name)
        entity_name = PATTERN2.sub(' ', entity_name).lower()
        # construct map: word -> entity_index
        for w in entity_name.split(' '):
            entity_index = entity2index[entity_id]
            local_map[w] = entity_index  # Problem here: if different entities contain the same word, this overwrites the earlier entity_index. What is the intent?

    return local_map
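For concreteness, a minimal runnable reproduction of the overwrite; `PATTERN1`, `PATTERN2`, and `entity2index` below are hypothetical stand-ins for the module-level globals used in the function:

```python
import re

# Hypothetical stand-ins for the module-level globals
PATTERN1 = re.compile('[^A-Za-z]')  # non-letter characters
PATTERN2 = re.compile('[ ]{2,}')    # runs of multiple spaces
entity2index = {'id_1': 1, 'id_2': 2}

def get_local_word2entity(entities):
    local_map = {}
    for entity_pair in entities.split(';'):
        entity_id = entity_pair[:entity_pair.index(':')]
        entity_name = entity_pair[entity_pair.index(':') + 1:]
        entity_name = PATTERN1.sub(' ', entity_name)
        entity_name = PATTERN2.sub(' ', entity_name).lower()
        for w in entity_name.split(' '):
            local_map[w] = entity2index[entity_id]
    return local_map

# Two entities sharing the word 'new': the second mapping overwrites the first
print(get_local_word2entity('id_1:New York;id_2:New Jersey'))
# {'new': 2, 'york': 1, 'jersey': 2}
```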
@hwwang55 (Owner) commented

Hello, the possibility you describe does exist, but in practice it almost never happens. Thanks!

@wjy3326 (Author) commented Apr 27, 2022

I think if the number of entities is large enough, this can happen quite often. After all, there are only a few thousand common words, but there are many entities. What is this function actually used for?

@hwwang55 (Owner) commented

This computation is done per news title, so it is very unlikely that, within a single title, the same word appears in multiple entities. The function determines which entity each word belongs to, for the subsequent convolution operation.
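To illustrate the intended use, a hypothetical sketch (not the repository's actual code): the local map assigns an entity index to each word of a title, with 0 for words that belong to no entity:

```python
def title_word_entities(title_words, local_map):
    # Map each (lowercased) title word to its entity index; 0 means "no entity"
    return [local_map.get(w, 0) for w in title_words]

local_map = {'harry': 1, 'potter': 1, 'england': 2}
print(title_word_entities(['harry', 'potter', 'visits', 'england'], local_map))
# [1, 1, 0, 2]
```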

@wjy3326 (Author) commented Apr 27, 2022

Understood, thanks.

@wjy3326 (Author) commented Apr 27, 2022

Two more questions:

1. In the part where word_embedding and entity_embedding are concatenated: since entity_embedding is for the whole entity (a multi-word phrase) and the concatenation is done per word, do all the words of one entity get the same entity_embedding? For example, with entity ids (0, 0, 0, 0, 3533, 3533, 3533, 0, 0), the three words with id 3533 all share the same entity_embedding, right?
2. Why does training use a sigmoid at the end rather than a softmax?
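A minimal sketch of the concatenation being asked about, with hypothetical table sizes and ids (not the repository's code): positions that share an entity id get identical entity-embedding halves:

```python
import numpy as np

rng = np.random.default_rng(0)
word_emb = rng.normal(size=(10000, 50))   # hypothetical word embedding table
entity_emb = rng.normal(size=(5000, 50))  # hypothetical entity embedding table

word_ids = [12, 7, 99, 3, 41]
entity_ids = [0, 3533, 3533, 3533, 0]     # repeated id -> identical entity rows

# Per-word concatenation: each position becomes [word_vec ; entity_vec]
x = np.concatenate([word_emb[word_ids], entity_emb[entity_ids]], axis=1)
print(x.shape)  # (5, 100)
# Positions 1, 2, 3 share entity id 3533, so their entity halves are equal
print(np.array_equal(x[1, 50:], x[2, 50:]))  # True
```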

@wjy3326 (Author) commented Apr 27, 2022

1. What threshold is set for the sigmoid? Is it 0.5?
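For context, a common convention (an assumption; the repository may differ) is to predict positive when the sigmoid output exceeds 0.5, which is equivalent to the raw logit being positive:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# With a 0.5 threshold, sigmoid(z) > 0.5 is the same condition as z > 0
print(sigmoid(0.0))         # 0.5
print(sigmoid(1.3) > 0.5)   # True
print(sigmoid(-0.2) > 0.5)  # False
```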
