Add Python Interfaces (#12)
* add python interface

* del python deps

* add python README.md

* add python README.md

* add python demo link into python readme

* support python3

* update readme

* update readme

* modify PYTHON_INCLUDE in Makefile

* update code annotation

* modify raw_input interface to support python3 && update Makefile

* modify topical word embeddings model

* update the path of news model

* update the path of news model

* update slda_infer code annotation
lianrzh authored Jul 21, 2017
1 parent 3485a09 commit a496187
Showing 20 changed files with 957 additions and 12 deletions.
33 changes: 27 additions & 6 deletions Makefile
@@ -6,6 +6,18 @@ ifndef DEPS_PATH
 DEPS_PATH = $(shell pwd)/third_party
 endif

+ifndef PYTHON_PATH
+PYTHON_PATH = $(shell python -c"import sys; print(sys.prefix)")
+endif
+
+ifndef PYTHON_VERSION
+PYTHON_VERSION = $(shell ls $(PYTHON_PATH)/include | grep python)
+endif
+
+ifndef PYTHON_INCLUDE
+PYTHON_INCLUDE = $(shell ls $(PYTHON_PATH)/include | grep python | sed "s:^:$(PYTHON_PATH)/include/:")
+endif

ifndef PROTOC
PROTOC = ${DEPS_PATH}/bin/protoc
endif
@@ -25,12 +37,13 @@ CXXFLAGS=-pipe \

 INCPATH=-I./include/ \
 -I./include/familia \
--I./third_party/include
+-I./third_party/include \
+-I$(PYTHON_INCLUDE)

-LDFLAGS_SO = -L$(DEPS_PATH)/lib -L./build/ -lfamilia -lprotobuf -lglog -lgflags
+LDFLAGS_SO = -L$(DEPS_PATH)/lib -L$(PYTHON_PATH)/lib -L./build/ -lfamilia -lprotobuf -lglog -lgflags

 .PHONY: all
-all: familia
+all: familia python/demo/familia.so
@echo $(SOURCES)
@echo $(OBJS)
$(CXX) $(CXXFLAGS) $(INCPATH) build/demo/inference_demo.o $(LDFLAGS_SO) -o inference_demo
@@ -50,13 +63,17 @@ clean:
 rm -rf word_distance_demo
 rm -rf topic_word_demo
 rm -rf show_topic_demo
-rm -rf build
+rm -rf build
+rm -rf python/cpp/*.o
+rm -rf python/demo/*.so
+rm -rf python/demo/*.pyc
find src -name "*.pb.[ch]*" -delete

# third party dependency
deps: ${GLOGS} ${GFLAGS} ${PROTOBUF}
@echo "dependency installed!"

.PHONY: familia
familia: build/libfamilia.a

OBJS = $(addprefix build/, vose_alias.o inference_engine.o model.o vocab.o document.o sampler.o config.o util.o semantic_matching.o tokenizer.o \
@@ -74,12 +91,16 @@ build/libfamilia.a: include/config.pb.h $(OBJS)
 build/%.o: src/%.cpp
 @mkdir -p $(@D)
 $(CXX) $(INCPATH) $(CXXFLAGS) -MM -MT build/$*.o $< >build/$*.d
-$(CXX) $(INCPATH) $(CXXFLAGS) -c $< -o $@
+$(CXX) $(INCPATH) $(CXXFLAGS) -c $< -o $@

 # build proto
-include/config.pb.h src/config.cpp : proto/config.proto
+include/config.pb.h src/config.cpp : proto/config.proto
 $(PROTOC) --cpp_out=./src --proto_path=./proto $<
 mv src/config.pb.h ./include/familia
 mv src/config.pb.cc ./src/config.cpp

+python/demo/familia.so : python/cpp/familia_wrapper.cpp familia
+$(CXX) $(INCPATH) $(CXXFLAGS) -c $< -o python/cpp/familia_wrapper.o
+$(CXX) $(INCPATH) $(CXXFLAGS) -shared python/cpp/familia_wrapper.o $(LDFLAGS_SO) -l$(PYTHON_VERSION) -o $@

 -include $(wildcard */*.d *.d)
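
The new `PYTHON_VERSION` and `PYTHON_INCLUDE` variables are derived by listing `$(PYTHON_PATH)/include` and filtering for `python`, which can yield more than one entry when several interpreter versions share a prefix. As a hedged alternative that is not part of this commit, the interpreter can report the same values itself through the standard `sysconfig` module. The helper name `python_build_info` is hypothetical, and the sketch assumes a POSIX-style Python build.

```python
import sys
import sysconfig


def python_build_info():
    """Return the values a Makefile needs to compile against this interpreter.

    Hypothetical helper for illustration; not part of the Familia repository.
    """
    # LDVERSION looks like "3.11" or "3.7m" on POSIX builds. It can be missing
    # (for example on Python 2), so fall back to "major.minor".
    ld_version = sysconfig.get_config_var("LDVERSION")
    if not ld_version:
        ld_version = "%d.%d" % sys.version_info[:2]
    return {
        "prefix": sys.prefix,                         # candidate for PYTHON_PATH
        "include": sysconfig.get_paths()["include"],  # candidate for PYTHON_INCLUDE
        "lib_name": "python" + ld_version,            # candidate for PYTHON_VERSION
    }


if __name__ == "__main__":
    info = python_build_info()
    print("PYTHON_PATH    = " + info["prefix"])
    print("PYTHON_INCLUDE = " + info["include"])
    print("PYTHON_VERSION = " + info["lib_name"])
```

Because every variable in the Makefile is guarded by `ifndef`, values obtained this way could be passed on the `make` command line without changing the Makefile.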
4 changes: 2 additions & 2 deletions model/download_model.sh
@@ -2,6 +2,6 @@
 # Download the topic model files

 if [ ! -d news ]; then
-wget http://familia.bj.bcebos.com/models/news.tar.gz
-tar -xzf news.tar.gz
+wget http://familia.bj.bcebos.com/models/news.v1.tar.gz
+tar -xzf news.v1.tar.gz
 fi
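
The script only downloads when the `news` directory is absent, so rerunning it is harmless. The same guard can be sketched in Python for callers that want to fetch the model from a demo script. This is an illustration, not code from the commit: `fetch_model` and its parameters are hypothetical, and it assumes the archive unpacks into a directory named `news`, as the shell guard implies.

```python
import os
import tarfile
from urllib.request import urlretrieve

# URL taken from the updated download_model.sh.
MODEL_URL = "http://familia.bj.bcebos.com/models/news.v1.tar.gz"


def fetch_model(url=MODEL_URL, model_dir="news", dest=".", retrieve=urlretrieve):
    """Download and unpack a model archive unless `model_dir` already exists.

    Returns True if the archive was fetched, False if the step was skipped.
    `retrieve` is injectable so the function can be exercised offline.
    """
    target = os.path.join(dest, model_dir)
    if os.path.isdir(target):
        return False  # same guard as `if [ ! -d news ]`
    archive = os.path.join(dest, os.path.basename(url))
    retrieve(url, archive)  # counterpart of `wget`
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)  # counterpart of `tar -xzf`
    return True
```

Calling `fetch_model(dest="model")` a second time returns `False` without touching the network, which mirrors the behavior of the shell script.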
23 changes: 23 additions & 0 deletions python/README.md
@@ -0,0 +1,23 @@
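# Familia Python Interface

## Building
In addition to the libraries required by the Familia C++ code, the build depends on Python. By default it uses the system Python (both Python 2 and Python 3 are supported) and is compatible with Linux and macOS.
By default, running the following script from the Familia directory fetches the dependencies automatically and compiles familia.so.

    $ sh build.sh # includes fetching and installing the third-party dependencies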

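## Python Interface
The original C++ code is wrapped in two Python classes (familia_wrapper.py): InferenceEngineWrapper and TopicalWordEmbeddingsWrapper.
InferenceEngineWrapper provides the interfaces related to topic models:

- lda_infer # LDA topic model inference
- slda_infer # SentenceLDA topic model inference
- cal_doc_distance # compute the distance between two long texts
- cal_query_doc_similarity # compute the relevance of a short text to a long text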
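
This README does not say which metric `cal_doc_distance` uses. As a rough illustration of what a distance between two inferred topic distributions can look like, the sketch below computes the Jensen-Shannon divergence and the Hellinger distance in plain Python. Both metrics are assumptions chosen because they are common for comparing probability distributions, and the function names are hypothetical, not the wrapper's API.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    total = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(total) / math.sqrt(2.0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (natural log), in [0, ln 2]."""
    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0
        # because y is the midpoint distribution.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    mid = [(a + b) / 2.0 for a, b in zip(p, q)]
    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


if __name__ == "__main__":
    # Made-up topic distributions over three topics for two documents.
    doc_a = [0.7, 0.2, 0.1]
    doc_b = [0.1, 0.3, 0.6]
    print(hellinger(doc_a, doc_b), jensen_shannon(doc_a, doc_b))
```

Both functions return 0 for identical distributions and reach their maximum when the two documents share no topics.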

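TopicalWordEmbeddingsWrapper provides the interfaces related to the TWE model:

- nearest_words # find the words most related to a target word
- nearest_words_around_topic # find the words most related to a target topic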
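
The two lookups above can be pictured as nearest-neighbor searches in an embedding space: one starts from a word vector, the other from a topic vector. The sketch below shows that idea with cosine similarity over a toy table. It is an assumption about the general technique, not the TWE implementation; the vectors are made up and `nearest` is a hypothetical helper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(target_vec, embeddings, k=2, exclude=()):
    """Return the k words whose vectors are most similar to target_vec."""
    scored = [
        (cosine(target_vec, vec), word)
        for word, vec in embeddings.items()
        if word not in exclude
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [word for _, word in scored[:k]]


if __name__ == "__main__":
    # Toy two-dimensional embeddings, invented for this example.
    words = {
        "football": [1.0, 0.1],
        "soccer": [0.9, 0.2],
        "stock": [0.0, 1.0],
    }
    # Word-to-word lookup: skip the query word itself.
    print(nearest(words["football"], words, k=1, exclude={"football"}))
    # Topic-to-word lookup: the query is a (made-up) topic vector.
    print(nearest([0.1, 1.0], words, k=1))
```

Sharing one vector space between words and topics is what lets the same search serve both queries in this sketch.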

For detailed usage, see the [Demo usage documentation](https://github.com/baidu/Familia/wiki/Python-Demo%E4%BD%BF%E7%94%A8%E6%96%87%E6%A1%A3).