
Commit

Created a VPE platform demo with only one fake pedestrian tracking algorithm module.

Added checkpoint support and optimized operations for outputting to Kafka from Spark Streaming.

Add the Spark master setting to the system configuration file. Add some comments and documentation. Move the command sent to the MessageHandlingApplication into the key field of a Kafka message.
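
A minimal sketch of the idea, assuming the new Kafka producer API and placeholder topic and command names (not necessarily the project's real ones):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class CommandSendingSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer");
        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props)) {
            // The command travels in the message key; the task data travels in the value.
            byte[] taskData = "rtsp://some-camera/stream".getBytes();
            producer.send(new ProducerRecord<>("message-handling", "TRACK_PEDESTRIANS", taskData));
        }
    }
}
```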

Add a fake pedestrian tracker to simulate a pedestrian tracking application, together with a fake metadata-saving application.

Extract property resolving into a separate class and apply it to all the applications.

Supplemented some comments.

Added the license and README.

Now able to send customized classes (i.e. Track) through Kafka.
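
One plausible way to do this — a sketch, not necessarily the project's actual mechanism — is plain Java serialization to a byte array that Kafka can carry (assuming Track implements Serializable):

```java
import java.io.*;

public final class SerializationHelper {
    // Serialize any Serializable object (e.g. a Track) into bytes for a Kafka value.
    public static byte[] serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        return bos.toByteArray();
    }

    // Restore the object on the consumer side.
    public static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }
}
```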

Add an attribute recognition application with a fake attribute recognizer to the system.

MessageHandlingApp can now control the execution flow.

Unified the designs of the sinks.

The attribute recognition application can now handle tasks from Kafka and HDFS/database in parallel, by joining two streams.
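
A hedged sketch of the join (class and key names are illustrative): both inputs are keyed by task ID, and `join` pairs records sharing a key within each batch:

```java
import org.apache.spark.streaming.api.java.JavaPairDStream;
import scala.Tuple2;

public class StreamJoinSketch {
    // Pair each task's command from Kafka with its data from HDFS/database.
    public static JavaPairDStream<String, Tuple2<byte[], byte[]>> joinByTaskId(
            JavaPairDStream<String, byte[]> tasksFromKafka,
            JavaPairDStream<String, byte[]> tasksFromStorage) {
        return tasksFromKafka.join(tasksFromStorage);
    }
}
```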

Amended several commits. Now the system can run locally and on YARN, but when running on YARN, it still cannot receive messages from Kafka.

Add comments to the Track class.

Add track ID to the Track class.

Some supplements to the attribute recognition.

Now able to receive messages from Kafka!

Now the property file does not need to be uploaded to HDFS. Edit it locally, and the system passes it to YARN automatically.

Solved a problem caused by the incompatibility of Spark Streaming checkpoints and Spark broadcasts.

Update README.md

Make parameters of SparkSubmit settable in the system.

However, they need further configuration in the YARN environment, and I have not figured out what configuration is needed.

Now all applications can run concurrently on a cluster.

Update README.md

Add extra configuration advice for running and monitoring multiple applications.

Add support for modifying scheduling strategies at startup.

Add log system.

Fix a bug caused by a wrongly generated serialization ID and checkpoint directory sharing. Unify logging methods. Solve an HDFS saving problem.

Now logs can be printed to the terminal that starts the application.

Use Kafka's createStream instead of createDirectStream for robustness and simplicity.
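
For reference, a minimal sketch of the receiver-based API (ZooKeeper address, group and topic are placeholders); unlike the direct API, offsets are tracked in ZooKeeper:

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class ReceiverSketch {
    public static JavaPairReceiverInputDStream<String, String> buildStream(JavaStreamingContext jssc) {
        Map<String, Integer> topicThreads = new HashMap<>();
        topicThreads.put("pedestrian-tracking-task", 1); // topic -> receiver thread count
        return KafkaUtils.createStream(jssc, "zk-host:2181", "vpe-group", topicThreads);
    }
}
```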

Reduce steps in the pedestrian tracking application.

Add support for storing images onto HDFS in JPEG format.
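
A sketch of the likely approach, assuming frames arrive as BufferedImages (the path is illustrative): `ImageIO` encodes the JPEG straight into an HDFS output stream.

```java
import java.awt.image.BufferedImage;
import java.io.IOException;
import javax.imageio.ImageIO;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsJpegWriter {
    public static void write(BufferedImage frame, String hdfsPath) throws IOException {
        FileSystem fs = FileSystem.get(new Configuration());
        try (FSDataOutputStream out = fs.create(new Path(hdfsPath))) {
            // Encode the frame as JPEG directly into the HDFS stream.
            ImageIO.write(frame, "jpeg", out);
        }
    }
}
```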

Use SparkLauncher instead to submit apps.
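
A sketch of submitting an app programmatically with `SparkLauncher` (JAR path and main class are placeholders):

```java
import org.apache.spark.launcher.SparkLauncher;

public class SubmitterSketch {
    public static void main(String[] args) throws Exception {
        Process spark = new SparkLauncher()
                .setAppResource("bin/vpe-platform.jar")                    // placeholder JAR
                .setMainClass("org.cripac.isee.vpe.ctrl.MainController")   // placeholder class
                .setMaster("yarn-cluster")
                .setConf(SparkLauncher.DRIVER_MEMORY, "2g")
                .launch();
        spark.waitFor(); // block until the submitted application exits
    }
}
```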

New modules should now register the topics they listen to with the TopicManager statically.
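
Illustrative only — the real TopicManager API may differ. The point is that registration happens in a static block, before any instance exists:

```java
import java.util.HashSet;
import java.util.Set;

// A stand-in for the platform's TopicManager.
class TopicManagerSketch {
    private static final Set<String> REGISTERED_TOPICS = new HashSet<>();

    static void registerTopic(String topic) {
        REGISTERED_TOPICS.add(topic);
    }
}

// A module registers its topics as soon as the class is loaded.
class PedestrianAttrRecogAppSketch {
    static final String JOB_TOPIC = "attr-recog-job"; // assumed topic name

    static {
        TopicManagerSketch.registerTopic(JOB_TOPIC);
    }
}
```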

Enable parallel Kafka receiver. Move configurations into files.

Make the metadata saving directory changeable. Fix bugs.

Add comments and in-code docs.

Create native tracker interface.

Add native file.

Change VideoData's variable type.

Add test for HDFS video decoder.

Updated the name of the project.

From VPE-Platform to LaS-VPE Platform.

Add submodule of video decoder.

Create TaskData class.

Add comments.

Use TaskData for graph-like task scheduling.

Format all the files. Add comments.

Add ReID module.

Optimize the way of adding new modules.

Add data feeding module for retrieving data from storage.

Reorganize packages of some classes.

Enable running multiple applications in one command.

Unify the routine for building parallel Kafka receiving streams.
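
A sketch of what such a unified routine might look like (method and parameter names are assumptions): spawn several receivers for the same topic and union them into one stream for higher ingest parallelism.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.spark.streaming.api.java.JavaPairDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class ParallelReceiverSketch {
    public static JavaPairDStream<String, String> build(
            JavaStreamingContext jssc, String zkQuorum, String group,
            String topic, int numReceivers) {
        Map<String, Integer> topicMap = new HashMap<>();
        topicMap.put(topic, 1);
        List<JavaPairDStream<String, String>> streams = new ArrayList<>();
        for (int i = 0; i < numReceivers; ++i) {
            streams.add(KafkaUtils.createStream(jssc, zkQuorum, group, topicMap));
        }
        // Union all partial streams into a single logical stream.
        JavaPairDStream<String, String> unioned = streams.get(0);
        for (int i = 1; i < streams.size(); ++i) {
            unioned = unioned.union(streams.get(i));
        }
        return unioned;
    }
}
```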

Fix bugs. Make it easy to switch between two Kafka receiving methods.

Enhance robustness of Kafka producer usage. Reduce memory cost.

Fix bugs.

Fix bugs. Remove large files in preparation for upgrading to Spark 2.0.

Update the ReID API. Add an external solver for ReID.

Optimize joining operation for ReID.

Combine metadata saving app and data feeding app to reduce containers.

Make BoundingBox class static.

Modify ReID interface.

Use JSON for saving tracks. Regularize the ReID and database connector interfaces.
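
A sketch of the idea using Gson (the library choice and field names here are assumptions, not the project's actual schema):

```java
import com.google.gson.Gson;

public class TrackletJsonSketch {
    static class Tracklet {
        String id;
        int startFrame;
        int[][] boundingBoxes; // per-frame {x, y, width, height}
    }

    public static void main(String[] args) {
        Gson gson = new Gson();
        Tracklet t = new Tracklet();
        t.id = "CAM01-0001";
        t.startFrame = 178;
        t.boundingBoxes = new int[][]{{134, 336, 60, 120}};
        String json = gson.toJson(t);                          // store this string
        Tracklet back = gson.fromJson(json, Tracklet.class);   // restore it later
        System.out.println(json + " -> " + back.id);
    }
}
```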

Reorganize native modules.

Upload Decoder.java and Decoder_Test.java

Add a frame-skipping function to the video decoder.

Correct usage of VideoDecoder in its test class.

Add Maven support.

Add a function for getting linked pedestrians to the graph database connector interface.

Add version to scripts.

Enable Maven to automatically handle native libraries.

Make Maven build to the bin directory. Correct scripts. Update README.

Solve a bug occurring when the lib directory does not exist during mvn package.

Disable library removal during Maven clean. Add support for Windows native libraries.

Suppress a copy error during Maven builds. Remove the dependency on hadoop-hdfs.

Enable packing dependencies into the JAR file.

Fix bugs.

Use a simple singleton instead of a broadcast.
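
A sketch of the common pattern behind this change: a lazily-initialized, per-JVM singleton replaces a broadcast variable, since broadcasts cannot be restored from a Spark Streaming checkpoint, while a singleton is simply re-created on each executor after recovery. The producer type here is only an example:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;

public class KafkaProducerSingleton {
    private static volatile KafkaProducer<String, byte[]> instance;

    public static KafkaProducer<String, byte[]> get(Properties props) {
        // Double-checked locking: create the producer once per executor JVM.
        if (instance == null) {
            synchronized (KafkaProducerSingleton.class) {
                if (instance == null) {
                    instance = new KafkaProducer<>(props);
                }
            }
        }
        return instance;
    }
}
```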

Remove Spark context settings in apps.

Remove useless variables and parameters.

Make SingletonManager manage classes by class names rather than class types.

Update the in-file license. Enable updating instances on creation of the manager.

Change to Maven standard directory layout. Create JUnit test for
VideoDecoder.

Adapt to new Video Decoder supporting CMake.

Change exception thrown by GraphDatabaseConnector.

Add Javadoc.

Add pedestrian attribute recognizer using external solver.

Create base class for ReID feature.

Rename Track to Tracklet.

Ignore VSCode files.

Add to the Javadoc of the classes using external solvers.

Add native folder. Add annotation.

Enhance robustness of socket receiving. Solve bugs in the external solvers.

Adapt to new version of Video Decoder.

Enable processing a whole dataset within one command.

Add ISEE Basic Pedestrian Tracker.

Add support for uploading extra configuration files to Spark.

Enable broadcasting configuration files to workers.

Now native modules are pushed to the cluster to be built.

Solve a bug where messages were too large to be sent.

Add test of JNI of ISEEBasicTracker.

Solved the configuration uploading problem.

Simplify node and execution plan implementation.

Redesigned TaskData class.

Fix bugs.

Tracking can work now!
kyu-sz committed Oct 31, 2016
1 parent 2e0dade commit 95d8425
Showing 399 changed files with 81,734 additions and 242 deletions.
15 changes: 0 additions & 15 deletions .classpath

This file was deleted.

21 changes: 21 additions & 0 deletions .gitignore
@@ -0,0 +1,21 @@
# Binary files #
*.so

# Temporary files #
checkpoint/
*.log
*.swp
*.lck

# Eclipse project files #
.project
.classpath
.settings/
/bin/

# VSCode files #
.vscode

# IDEA files #
*.iml
.idea
6 changes: 6 additions & 0 deletions .gitmodules
@@ -0,0 +1,6 @@
[submodule "ISEE-Basic-Pedestrian-Tracker"]
path = src/native/ISEE-Basic-Pedestrian-Tracker
url = https://github.com/kyu-sz/ISEE-Basic-Pedestrian-Tracker.git
[submodule "Video-Decoder"]
path = src/native/Video-Decoder
url = https://github.com/kyu-sz/Video-Decoder.git
19 changes: 0 additions & 19 deletions .project

This file was deleted.

11 changes: 0 additions & 11 deletions .settings/org.eclipse.jdt.core.prefs

This file was deleted.

676 changes: 676 additions & 0 deletions LICENSE

Large diffs are not rendered by default.

126 changes: 126 additions & 0 deletions README.md
@@ -0,0 +1,126 @@
# LaS-VPE Platform

[![AUR](https://img.shields.io/aur/license/yaourt.svg?maxAge=2592000)](LICENSE)

By Ken Yu, Yang Zhou, Da Li, Dangwei Li and Houjing Huang, under the guidance of Dr. Zhang Zhang and Prof. Kaiqi Huang.

LaS-VPE Platform is a large-scale distributed video parsing and evaluation platform under the Intelligent Scene Exploration and Evaluation (iSEE) research platform of the Center for Research on Intelligent Perception and Computing (CRIPAC), Institute of Automation, Chinese Academy of Sciences.

The platform is powered by Spark Streaming and Kafka.

The documentation is published on [Github Pages](https://kyu-sz.github.io/LaS-VPE-Platform).

## License

LaS-VPE Platform is released under the GPL License.

## Contents
1. [Requirements](#requirements)
2. [How to run](#how-to-run)
3. [How to monitor](#how-to-monitor)
4. [How to add a new module](#how-to-add-a-new-module)
5. [How to add a new native algorithm](#how-to-add-a-new-native-algorithm)
6. [How to deploy a new version](#how-to-deploy-a-new-version)

## Requirements

1. Use Maven to build the project:

```Shell
sudo apt-get install maven
```
2. Deploy Kafka (>=0.8), HDFS (>=2.2) and YARN (>=2.2) properly on your cluster.
To enable multiple applications to run concurrently, see [Job-Scheduling](https://spark.apache.org/docs/1.2.0/job-scheduling.html) and configure your environment.

## How to run

Clone the project to your cluster:

```Shell
# Make sure to clone with --recursive
git clone --recursive https://github.com/kyu-sz/LaS-VPE-Platform
```

Build and pack the system into a JAR:

```Shell
mvn compile && mvn package
```

Configure the environment and running properties in the files in [conf](conf).

In particular, modify [cluster-env.sh](conf/cluster-env.sh) in [conf](conf) to match your cluster's addresses.

Upload the whole project to your cluster:

```Shell
./sbin/upload.sh
```

If the platform depends on native libraries, deliver them to the worker nodes using [install.sh](sbin/install.sh) in [sbin](sbin) on your cluster. Note that this script requires the _HADOOP_HOME_ environment variable.

Invoke the scripts from the project home directory with commands like ```./sbin/run-*.sh```.

It is recommended to start [run-command-generating-app.sh](sbin/run-command-generating-app.sh) last; it is a debugging tool that simulates commands sent to the message handling application.

Welcome to read Ken Yu's Chinese [blog](http://blog.csdn.net/kyu_115s/article/details/51887223) on experiences gained during the development.

## How to monitor

To briefly monitor, some information is printed to the console that starts each module. However, to use this function, you must register the hostname of the machine that starts the module on every task node.

To fully monitor your Spark application, you might need to access the log files on the slave nodes. However, if your application runs on a cluster without a desktop environment, and you connect to the cluster remotely, you might not be able to access the web pages served from the slave nodes.

To solve this problem, first add the IP addresses of the slave nodes to /etc/hosts on the master node. Make sure the master node can access the pages on the slave nodes with terminal browsers like w3m or lynx. On Ubuntu, they can be installed with ```sudo apt-get install w3m``` or ```sudo apt-get install lynx```.

Then, configure your master node to be a proxy server using Squid. Tutorials can be found on websites like [Help-Ubuntu-Squid](https://help.ubuntu.com/community/Squid).

Finally, configure your browser to use the proxy provided by your master node. It will then be able to access pages on the slave nodes.

In Firefox, it is recommended to use the AutoProxy plugin to enable the proxy. Besides the obvious configuration, you need to first access *about:config*, then set *network.proxy.socks_remote_dns* to *true*.

## Basic concepts in the project

_Application_: Same as that in YARN.

_Stream_: A flow of DStreams. Each stream may take in more than one input Kafka topic, but produces at most one kind of output. An _Application_ may contain multiple streams.

_Node_: An execution of a _Stream_. A pack of input data and parameters is fed into the stream.

_ExecutionPlan_: A flow graph of _Nodes_, as sketched below.
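
A rough illustrative sketch of how these concepts might relate in code (the real classes differ):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a plan is a graph of nodes; each node executes one
// stream with a pack of serialized input data and parameters.
class ExecutionPlanSketch {
    static class Node {
        String streamName;                          // which Stream this node executes
        byte[] inputData;                           // serialized data and parameters
        List<Node> successors = new ArrayList<>();  // downstream nodes fed by this one
    }

    List<Node> roots = new ArrayList<>();           // entry points of the flow graph
}
```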

## How to add a new module

A new module may be based on algorithms written in other languages, so you first need to wrap them into Java using JNI.

See an application such as [PedestrianTrackingApp](src/main/java/org/cripac/isee/pedestrian/tracking/PedestrianTracker.java) for an example of how to write an application module. Write your own module, then add it to this project. Also register its class name with the [AppManager](src/main/java/org/cripac/isee/vpe/ctrl/AppManager.java) by adding a line in the static block, similar to the other lines.
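
A hypothetical mirror of that registration pattern (the map and method names are assumptions, not the project's real API):

```java
import java.util.HashMap;
import java.util.Map;

public class AppManagerSketch {
    private static final Map<String, String> APP_CLASS_NAMES = new HashMap<>();

    static {
        APP_CLASS_NAMES.put("pedestrian-tracking", "org.cripac.isee.vpe.alg.PedestrianTrackingApp");
        APP_CLASS_NAMES.put("my-new-module", "org.example.MyNewApp"); // the line a new module adds
    }

    public static String classNameOf(String appName) {
        return APP_CLASS_NAMES.get(appName);
    }
}
```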

You may also need to extend the [CommandGeneratingApp](src/main/java/org/cripac/isee/vpe/debug/CommandGeneratingApp.java), [MessageHandlingApp](src/main/java/org/cripac/isee/vpe/ctrl/MessageHandlingApp.java) and [DataManagingApp](src/main/java/org/cripac/isee/vpe/data/DataManagingApp.java) to support the module.

## How to add a new native algorithm

You may want to run algorithms written in other languages like C/C++ on this platform. There are already examples of this: see [Video-Decoder](Video-Decoder) and [ISEE-Basic-Pedestrian-Tracker](ISEE-Basic-Pedestrian-Tracker).

First of all, you should wrap your algorithm with JNI. It is recommended to implement this in a separate GitHub repository and import it as a submodule.

Then, add the corresponding Java class to the platform. Be careful to put it in a suitable package.
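
A minimal, hypothetical JNI wrapper (library and method names are illustrative):

```java
public class MyNativeTracker {
    static {
        System.loadLibrary("my_native_tracker"); // loads libmy_native_tracker.so from lib/linux
    }

    // Implemented in C/C++ and bound via JNI.
    public native byte[] track(byte[] frameData, int width, int height);
}
```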

Finally, build your algorithm project, and copy the resulting shared JNI library, together with those it depends on, into the [library directory](lib/linux).

To enable automatic building and cleaning together with Maven, it is recommended to use CMake to build your project. Then edit the [native library building script](sbin/build-native-libs.sh) and the [native library cleaning script](sbin/clean-native-libs.sh), following the examples in them.

If the new algorithm requires extra configuration files, remember to register them with the [ConfigFileManager](src/main/java/org/cripac/isee/vpe/ctrl/ConfigFileManager.java).

## How to deploy a new version

Pack the new or modified project into a new JAR.

If you have updated the version number, remember to check the [system property file](conf/system.properties), where the option named "vpe.platform.jar" specifies the name of the JAR file to upload; it should match the name of your newly built JAR file.

Upload the JAR to your cluster with your customized [uploading script](sbin/upload.sh).

After that, kill the particular old application and run the new one. There is no need to restart the other modules! Your module now runs together with the original modules.

However, for modified modules, if you have already run them once with checkpointing enabled, you should clean the old checkpoint directory or use a new one, so that the system creates new contexts rather than recovering old ones.

Sometimes you may also need to clean the Kafka and ZooKeeper logs, which are stored by default in the /tmp folder on each node.
1 change: 0 additions & 1 deletion bin/.gitignore

This file was deleted.

7 changes: 7 additions & 0 deletions conf/cluster-env.sh
@@ -0,0 +1,7 @@
#!/usr/bin/env bash
export DRIVER_USER="labadmin"
export DRIVER_NODE="rman-nod1"
export VPE_FOLDER="/home/labadmin/las-vpe-platform"

export HADOOP_HOME=${HADOOP_HOME}
export SLAVE_HADOOP_HOME=${HADOOP_HOME}
58 changes: 58 additions & 0 deletions conf/log4j.properties
@@ -0,0 +1,58 @@
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

# Set everything to be logged to the console
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %-5p %c{1}: %m%n

log4j.appender.RollingAppender=org.apache.log4j.DailyRollingFileAppender
log4j.appender.RollingAppender.File=/tmp/spark-streaming.log
log4j.appender.RollingAppender.DatePattern='.'yyyy-MM-dd
log4j.appender.RollingAppender.layout=org.apache.log4j.PatternLayout
log4j.appender.RollingAppender.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %-5p %c{1}: %m%n

# By default, everything goes to console and file
log4j.rootLogger=INFO, console, RollingAppender

# The noisier Spark logs go to file only
log4j.logger.spark.storage=INFO, RollingAppender
log4j.additivity.spark.storage=false
log4j.logger.spark.scheduler=INFO, RollingAppender
log4j.additivity.spark.scheduler=false
log4j.logger.spark.CacheTracker=INFO, RollingAppender
log4j.additivity.spark.CacheTracker=false
log4j.logger.spark.CacheTrackerActor=INFO, RollingAppender
log4j.additivity.spark.CacheTrackerActor=false
log4j.logger.spark.MapOutputTracker=INFO, RollingAppender
log4j.additivity.spark.MapOutputTracker=false
log4j.logger.spark.MapOutputTrackerActor=INFO, RollingAppender
log4j.additivity.spark.MapOutputTrackerActor=false

# Settings to quiet third party logs that are too verbose
log4j.logger.org.spark-project.jetty=WARN
log4j.logger.org.spark-project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=WARN
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=WARN
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR

# SPARK-9183: Settings to avoid annoying messages when looking up nonexistent UDFs in SparkSQL with Hive support
log4j.logger.org.apache.hadoop.hive.metastore.RetryingHMSHandler=FATAL
log4j.logger.org.apache.hadoop.hive.ql.exec.FunctionRegistry=ERROR
121 changes: 121 additions & 0 deletions conf/pedestrian-tracking/isee-basic/CAM01_0.conf
@@ -0,0 +1,121 @@
[CAM]
CAMarea=4F
CAMID=1001000
CAMname=CAM01
FPS=13
HomographyMatrix_11=0.00
HomographyMatrix_12=0.00
HomographyMatrix_13=0.00
HomographyMatrix_14=0.00
HomographyMatrix_21=0.00
HomographyMatrix_22=0.00
HomographyMatrix_23=0.00
HomographyMatrix_24=0.00
HomographyMatrix_31=1.23
HomographyMatrix_32=2.35
HomographyMatrix_33=8.76
HomographyMatrix_34=2.78
HomographyMatrix_41=3.74
HomographyMatrix_42=5.12
HomographyMatrix_43=6.89
HomographyMatrix_44=1.32
[F10]
flag=1
nTripwire=1
ROI1=1001001
ROI1_nPts=2
ROI1_point1.x=122
ROI1_point1.y=357
ROI1_point2.x=639
ROI1_point2.y=369
[F20]
flag=1
dBgThresh=0.70
dFactor=2.52
iRefDistance=10
iDownScale=2
iInitFrame=178
iLostFrame=10
dInitLearnRate=0.01
dInitMean=127.50
dInitStd=18.00
dInitWeight=0.05
dDisWithoutPredict=90.00
dDisWithPredict=80.00
dMinObjSize=10000.00
dMaxObjSize=368640.00
dMinStd=17.00
dUpdateLearnRate=0.001
iPredictWinLen=5
iStartFrame=10
dSuddenRatio=0.90
dObjSizeLowRate=4.00
dObjSizeUpRate=4.00
dMaxVelocity=1000.00
bBrectDis=1
bDesOut=1
bRegionDis=1
bRegionOut=1
bSceneOut=1
bTimeOut=1
bTrajDis=1
bTypeOut=1
dNearbyPosition=0.00
dMediumPosition=0.00
dFarawayPosition=0.00
dMediumTargetSize=0.00
dNearbyTargetSize=0.00
dFarawayTargetSize=0.00
nAlarmROI=1
nNoUse=0
ROI1=1001001
ROI1_alarmType=0
ROI1_alarmLevel=0
ROI1_iTripwireDirection=3
ROI1_iApproachingFrameCount=1
ROI1_iPassedFrameCount=1
ROI1_nPts=4
ROI1_dMinTargetSize=0.00
ROI1_strDescription=
ROI1_point1.x=134
ROI1_point1.y=336
ROI1_point2.x=25
ROI1_point2.y=667
ROI1_point3.x=740
ROI1_point3.y=656
ROI1_point4.x=622
ROI1_point4.y=332
[F30]
flag=1
nROI=1
ROI1=1001001
ROI1_nPts=4
ROI1_point1.x=134
ROI1_point1.y=336
ROI1_point2.x=25
ROI1_point2.y=667
ROI1_point3.x=740
ROI1_point3.y=656
ROI1_point4.x=622
ROI1_point4.y=332
[F40]
flag=1
BackImagePath=test_data/img_data/background1.jpg
DownSampleWidth=640
DownSampleHeight=320
IfShadow=0
WeightNum_y=4
WeightCoefficient=1.12
NormalizationCoefficient=9.00
nROI=1
ROI1=1001001
ROI1_nPts=4
ROI1_point1.x=134
ROI1_point1.y=336
ROI1_point2.x=25
ROI1_point2.y=667
ROI1_point3.x=740
ROI1_point3.y=656
ROI1_point4.x=622
ROI1_point4.y=332
[F50]
