# Today's Academic Horizons (2017.09.20)

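cond-mat.dis-nn - Disordered Systems and Neural Networks

cs.AI - Artificial Intelligence

cs.CL - Computation and Language

cs.CR - Cryptography and Security

cs.CV - Computer Vision and Pattern Recognition

cs.DC - Distributed, Parallel, and Cluster Computing

cs.DL - Digital Libraries

cs.DS - Data Structures and Algorithms

cs.ET - Emerging Technologies

cs.IR - Information Retrieval

cs.IT - Information Theory

cs.LG - Machine Learning

cs.MA - Multiagent Systems

cs.NE - Neural and Evolutionary Computing

cs.NI - Networking and Internet Architecture

cs.RO - Robotics

cs.SD - Sound Processing

cs.SE - Software Engineering

cs.SI - Social and Information Networks

cs.SY - Systems and Control

math.NA - Numerical Analysis

math.RA - Rings and Algebras

math.ST - Statistics Theory

physics.soc-ph - Physics and Society

q-bio.OT - Other Quantitative Biology

stat.AP - Applied Statistics

stat.ME - Statistical Methodology

stat.ML - (Statistical) Machine Learning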

• [cond-mat.dis-nn]Learning Disordered Topological Phases by Statistical Recovery of Symmetry

• [cs.AI]A Categorical Approach for Recognizing Emotional Effects of Music

• [cs.AI]AI Programmer: Autonomously Creating Software Programs Using Genetic Algorithms

• [cs.AI]Augmenting End-to-End Dialog Systems with Commonsense Knowledge

• [cs.AI]Markov Brains: A Technical Introduction

• [cs.AI]Memory Augmented Control Networks

• [cs.AI]Reinforcement Learning Based Conversational Search Assistant

• [cs.AI]Relational Marginal Problems: Theory and Estimation

• [cs.AI]The Uncertainty Bellman Equation and Exploration

• [cs.CL]"How May I Help You?": Modeling Twitter Customer Service Conversations Using Fine-Grained Dialogue Acts

• [cs.CL]AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline

• [cs.CL]Acquiring Background Knowledge to Improve Moral Value Prediction

• [cs.CL]Character Distributions of Classical Chinese Literary Texts: Zipf's Law, Genres, and Epochs

• [cs.CL]Combining Search with Structured Data to Create a More Engaging User Experience in Open Domain Dialogue

• [cs.CL]Creating and Characterizing a Diverse Corpus of Sarcasm in Dialogue

• [cs.CL]Data Innovation for International Development: An overview of natural language processing for qualitative data analysis

• [cs.CL]Flexible Computing Services for Comparisons and Analyses of Classical Chinese Poetry

• [cs.CL]Hierarchical Gated Recurrent Neural Tensor Network for Answer Triggering

• [cs.CL]Limitations of Cross-Lingual Learning from Image Search

• [cs.CL]Order-Preserving Abstractive Summarization for Spoken Content Based on Connectionist Temporal Classification

• [cs.CL]Role of Morphology Injection in Statistical Machine Translation

• [cs.CL]Sequence to Sequence Learning for Event Prediction

• [cs.CL]Toward a full-scale neural machine translation in production: the Booking.com use case

• [cs.CL]Unwritten Languages Demand Attention Too! Word Discovery with Encoder-Decoder Models

• [cs.CL]Word Vector Enrichment of Low Frequency Words in the Bag-of-Words Model for Short Text Multi-class Classification Problems

• [cs.CR]Adaptive Laplace Mechanism: Differential Privacy Preservation in Deep Learning

• [cs.CR]Mitigating Evasion Attacks to Deep Neural Networks via Region-based Classification

• [cs.CR]Settling Payments Fast and Private: Efficient Decentralized Routing for Path-Based Transactions

• [cs.CV]A Causal And-Or Graph Model for Visibility Fluent Reasoning in Human-Object Interactions

• [cs.CV]A Hierarchical Probabilistic Model for Facial Feature Detection

• [cs.CV]An Improved Fatigue Detection System Based on Behavioral Characteristics of Driver

• [cs.CV]Automatic Tool Landmark Detection for Stereo Vision in Robot-Assisted Retinal Surgery

• [cs.CV]Beyond SIFT using Binary features for Loop Closure Detection

• [cs.CV]Combinational neural network using Gabor filters for the classification of handwritten digits

• [cs.CV]Continuous Multimodal Emotion Recognition Approach for AVEC 2017

• [cs.CV]Coupled Ensembles of Neural Networks

• [cs.CV]DeepLung: 3D Deep Convolutional Nets for Automated Pulmonary Nodule Detection and Classification

• [cs.CV]Depression Scale Recognition from Audio, Visual and Text Analysis

• [cs.CV]Direct Pose Estimation with a Monocular Camera

• [cs.CV]Direction-Aware Semi-Dense SLAM

• [cs.CV]E$^2$BoWs: An End-to-End Bag-of-Words Model via Deep Convolutional Neural Network

• [cs.CV]Facial Feature Tracking under Varying Facial Expressions and Face Poses based on Restricted Boltzmann Machines

• [cs.CV]Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video

• [cs.CV]Joint Estimation of Camera Pose, Depth, Deblurring, and Super-Resolution from a Blurred Image Sequence

• [cs.CV]Joint Parsing of Cross-view Scenes with Spatio-temporal Semantic Parse Graphs

• [cs.CV]LS-VO: Learning Dense Optical Subspace for Robust Visual Odometry Estimation

• [cs.CV]Learning a Fully Convolutional Network for Object Recognition using very few Data

• [cs.CV]Long-Term Ensemble Learning of Visual Place Classifiers

• [cs.CV]Microscopy Cell Segmentation via Adversarial Neural Networks

• [cs.CV]Multi-Person Pose Estimation via Column Generation

• [cs.CV]Multi-Task Learning for Segmentation of Building Footprints with Deep Neural Networks

• [cs.CV]NIMA: Neural Image Assessment

• [cs.CV]Neural Affine Grayscale Image Denoising

• [cs.CV]Normal Integration: A Survey

• [cs.CV]Organizing Multimedia Data in Video Surveillance Systems Based on Face Verification with Convolutional Neural Networks

• [cs.CV]Rotation Adaptive Visual Object Tracking with Motion Consistency

• [cs.CV]Social Style Characterization from Egocentric Photo-streams

• [cs.CV]StairNet: Top-Down Semantic Aggregation for Accurate One Shot Detection

• [cs.CV]Target-adaptive CNN-based pansharpening

• [cs.CV]The Multiscale Bowler-Hat Transform for Blood Vessel Enhancement in Retinal Images

• [cs.CV]To Go or Not To Go? A Near Unsupervised Learning Approach For Robot Navigation

• [cs.CV]Variational Methods for Normal Integration

• [cs.CV]Vehicle Tracking in Wide Area Motion Imagery via Stochastic Progressive Association Across Multiple Frames (SPAAM)

• [cs.CV]Video Object Segmentation Without Temporal Information

• [cs.CV]Where to Focus: Deep Attention-based Spatially Recurrent Bilinear Networks for Fine-Grained Visual Recognition

• [cs.CV]Zero-Shot Learning to Manage a Large Number of Place-Specific Compressive Change Classifiers

• [cs.DC]Cost-Based Assessment of Partitioning Algorithms of Agent-Based Systems on Hybrid Cloud Environments

• [cs.DC]Generalized PMC model for the hybrid diagnosis of multiprocessor systems

• [cs.DC]Hybrid Fault diagnosis capability analysis of Hypercubes under the PMC model and MM* model

• [cs.DC]IBM Deep Learning Service

• [cs.DC]Use of Information, Memory and Randomization in Asynchronous Gathering

• [cs.DC]sPIN: High-performance streaming Processing in the Network

• [cs.DL]SKOS Concepts and Natural Language Concepts: an Analysis of Latent Relationships in KOSs

• [cs.DS]Learning Depth-Three Neural Networks in Polynomial Time

• [cs.ET]MOL-Eye: A New Metric for the Performance Evaluation of a Molecular Signal

• [cs.IR]Anticipating Information Needs Based on Check-in Activity

• [cs.IR]MERF: Morphology-based Entity and Relational Entity Extraction Framework for Arabic

• [cs.IR]Towards Building a Knowledge Base of Monetary Transactions from a News Collection

• [cs.IT]AG codes and AG quantum codes from cyclic extensions of the Suzuki and Ree curves

• [cs.IT]Bounds on Binary Locally Repairable Codes Tolerating Multiple Erasures

• [cs.IT]Challenges and potentials for visible light communications: State of the art

• [cs.IT]Codes over Affine Algebras with a Finite Commutative Chain coefficient Ring

• [cs.IT]Cooperative Network Synchronization: Asymptotic Analysis

• [cs.IT]Finite-Alphabet Precoding for Massive MU-MIMO with Low-resolution DACs

• [cs.IT]Indistinguishability and Energy Sensitivity of Asymptotically Gaussian Compressed Encryption

• [cs.IT]Millimeter Wave Channel Measurements and Implications for PHY Layer Design

• [cs.IT]Modeling Co-location in Multi-Operator mmWave Networks with Spectrum Sharing

• [cs.IT]Multivariable codes in principal ideal polynomial quotient rings with applications to additive modular bivariate codes over $\mathbb{F}_4$

• [cs.IT]Network Deployment for Maximal Energy Efficiency in Uplink with Zero-Forcing

• [cs.IT]On the Restricted Isometry of the Columnwise Khatri-Rao Product

• [cs.IT]Performance Analysis of FSO System with Spatial Diversity and Relays for M-QAM over Log-Normal Channel

• [cs.IT]Performance Evaluation of Spatial Complementary Code Keying Modulation in MIMO Systems

• [cs.IT]Performance analysis of dual-hop optical wireless communication systems over k-distribution turbulence channel with pointing error

• [cs.IT]Rapid Fading Due to Human Blockage in Pedestrian Crowds at 5G Millimeter-Wave Frequencies

• [cs.IT]Reliability of Multicast under Random Linear Network Coding

• [cs.IT]Secrecy Rate of Distributed Cooperative MIMO in the Presence of Multi-Antenna Eavesdropper

• [cs.IT]Stable Recovery of Structured Signals From Corrupted Sub-Gaussian Measurements

• [cs.IT]The Stochastic Geometry Analyses of Cellular Networks with α-Stable Self-Similarity

• [cs.LG]Autoencoder-Driven Weather Clustering for Source Estimation during Nuclear Events

• [cs.LG]Deep Automated Multi-task Learning

• [cs.LG]Deep Scattering: Rendering Atmospheric Clouds with Radiance-Predicting Neural Networks

• [cs.LG]FlashProfile: Interactive Synthesis of Syntactic Profiles

• [cs.LG]Grade Prediction with Temporal Course-wise Influence

• [cs.LG]Leveraging Distributional Semantics for Multi-Label Learning

• [cs.LG]Minimal Effort Back Propagation for Convolutional Neural Networks

• [cs.LG]Multi-Agent Distributed Lifelong Learning for Collective Knowledge Acquisition

• [cs.LG]Multi-Entity Dependence Learning with Rich Context via Conditional Variational Auto-encoder

• [cs.LG]Multi-Modal Multi-Task Deep Learning for Autonomous Driving

• [cs.LG]N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning

• [cs.LG]On Inductive Abilities of Latent Factor Models for Relational Learning

• [cs.LG]Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents

• [cs.LG]Why Pay More When You Can Pay Less: A Joint Learning Framework for Active Feature Acquisition and Classification

• [cs.MA]Guided Deep Reinforcement Learning for Swarm Systems

• [cs.NE]$ε$-Lexicase selection: a probabilistic and multi-objective analysis of lexicase selection in continuous domains

• [cs.NE]Dynamic Capacity Estimation in Hopfield Networks

• [cs.NE]Push and Pull Search for Solving Constrained Multi-objective Optimization Problems

• [cs.NI]Channel Access Method Classification For Cognitive Radio Applications

• [cs.RO]A novel Skill-based Programming Paradigm based on Autonomous Playing and Skill-centric Testing

• [cs.RO]AA-ICP: Iterative Closest Point with Anderson Acceleration

• [cs.RO]Bayesian Optimization Using Domain Knowledge on the ATRIAS Biped

• [cs.RO]Decentralized Collision-Free Control of Multiple Robots in 2D and 3D Spaces

• [cs.RO]Design, Development and Evaluation of a UAV to Study Air Quality in Qatar

• [cs.RO]Endo-VMFuseNet: Deep Visual-Magnetic Sensor Fusion Approach for Uncalibrated, Unsynchronized and Asymmetric Endoscopic Capsule Robot Localization Data

• [cs.RO]Learning Sampling Distributions for Robot Motion Planning

• [cs.RO]Recognizing Objects In-the-wild: Where Do We Stand?

• [cs.RO]Sensor-Based Reactive Symbolic Planning in Partially Known Environments

• [cs.RO]Sim-to-real Transfer of Visuo-motor Policies for Reaching in Clutter: Domain Randomization and Adaptation with Modular Networks

• [cs.RO]Topomap: Topological Mapping and Navigation Based on Visual SLAM Maps

• [cs.RO]Why did the Robot Cross the Road? - Learning from Multi-Modal Sensor Data for Autonomous Road Crossing

• [cs.SD]Nonnegative HMM for Babble Noise Derived from Speech HMM: Application to Speech Enhancement

• [cs.SD]Speech Dereverberation Using Nonnegative Convolutive Transfer Function and Spectro-temporal Modeling

• [cs.SE]Joining Jolie to Docker - Orchestration of Microservices on a Containers-as-a-Service Layer

• [cs.SI]Label propagation for clustering

• [cs.SI]Representation Learning on Graphs: Methods and Applications

• [cs.SI]The Geometric Block Model

• [cs.SI]Towards matching user mobility traces in large-scale datasets

• [cs.SY]A Generalized Framework for Kullback-Leibler Markov Aggregation

• [cs.SY]Gaussian Process Latent Force Models for Learning and Stochastic Control of Physical Systems

• [math.NA]Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations

• [math.NA]Variational Gaussian Approximation for Poisson Data

• [math.RA]MacWilliams' extension theorem for infinite rings

• [math.ST]A Sharp Lower Bound for Mixed-membership Estimation

• [math.ST]A generalization of the Log Lindley distribution -- its properties and applications

• [math.ST]An alternative to continuous univariate distributions supported on a bounded interval: The BMT distribution

• [math.ST]Nonparametric Shape-restricted Regression

• [math.ST]Rigorous Analysis for Efficient Statistically Accurate Algorithms for Solving Fokker-Planck Equations in Large Dimensions

• [math.ST]Semi-supervised learning

• [math.ST]Spectral Radii of Truncated Circular Unitary Matrices

• [physics.soc-ph]Mapping temporal-network percolation to weighted, static event graphs

• [q-bio.OT]An Algorithmic Information Calculus for Causal Discovery and Reprogramming Systems

• [stat.AP]An adsorbed gas estimation model for shale gas reservoirs via statistical learning

• [stat.AP]Applying Machine Learning Methods to Enhance the Distribution of Social Services in Mexico

• [stat.AP]Forecasting of commercial sales with large scale Gaussian Processes

• [stat.AP]Reassessing Accuracy Rates of Median Decisions

• [stat.ME]Bayesian analysis of three parameter singular and absolute continuous Marshall-Olkin bivariate Pareto distribution

• [stat.ME]Efficient Statistically Accurate Algorithms for the Fokker-Planck Equation in Large Dimensions

• [stat.ME]Estimating the Variance of Measurement Errors in Running Variables of Sharp Regression Discontinuity Designs

• [stat.ME]Method for Mode Mixing Separation in Empirical Mode Decomposition

• [stat.ME]Parameter Regimes in Partial Functional Panel Regression

• [stat.ME]Regularization and Variable Selection with Copula Prior

• [stat.ME]Robust estimation in single index models with asymmetric errors

• [stat.ME]Some variations on Random Survival Forest with application to Cancer Research

• [stat.ME]Statistical inference on random dot product graphs: a survey

• [stat.ML]Bayesian nonparametric Principal Component Analysis

• [stat.ML]Constrained Bayesian Optimization for Automatic Chemical Design

• [stat.ML]Learning Mixtures of Multi-Output Regression Models by Correlation Clustering for Multi-View Data

• [stat.ML]Multivariate Gaussian Network Structure Learning

• [stat.ML]Neonatal Seizure Detection using Convolutional Neural Networks

• [stat.ML]Relevant Ensemble of Trees

• [stat.ML]Subset Labeled LDA for Large-Scale Multi-Label Classification

• [stat.ML]The generalised random dot product graph

• [stat.ML]ZhuSuan: A Library for Bayesian Deep Learning

·····································

• [cond-mat.dis-nn]**Learning Disordered Topological Phases by Statistical Recovery of Symmetry**

*Nobuyuki Yoshioka, Yutaka Akagi, Hosho Katsura*

http://arxiv.org/abs/1709.05790v1

In this letter, we apply an artificial neural network in a supervised manner to map out the quantum phase diagram of a disordered topological superconductor in class DIII. Given disorder that keeps the discrete symmetries of the ensemble as a whole, translational symmetry, which is broken in each individual quasiparticle distribution, is recovered statistically by taking an ensemble average. Using this, we classify the phases with an artificial neural network that learned the quasiparticle distribution in the clean limit, and show that the result is fully consistent with calculations by the transfer matrix method or the noncommutative geometry approach. If all three phases, namely the $\mathbb{Z}_2$, trivial, and thermal metal phases, appear in the clean limit, the machine can classify them with high confidence over the entire phase diagram. If only the former two phases are present, we find that the machine remains confused in a certain region, leading us to conclude that an unknown phase has been detected, which is eventually identified as the thermal metal phase. In our method, only the first moment of the quasiparticle distribution is used as input, but application to a wider variety of systems is expected by the inclusion of higher moments.

• [cs.AI]**A Categorical Approach for Recognizing Emotional Effects of Music**

*Mohsen Sahraei Ardakani, Ehsan Arbabi*

http://arxiv.org/abs/1709.05684v1

Digital music libraries have recently been developed and can be easily accessed. Recent research has shown that the current organization and retrieval of music tracks based on album information are inefficient. Moreover, it demonstrated that people use emotion tags for music tracks in order to search and retrieve them. In this paper, we discuss separability of a set of emotional labels, proposed in the categorical emotion expression, using Fisher's separation theorem. We determine a set of adjectives to tag music parts: happy, sad, relaxing, exciting, epic and thriller. Temporal, frequency and energy features have been extracted from the music parts. The maximum separability within the extracted features occurs between relaxing and epic music parts. Finally, we have trained a classifier using Support Vector Machines to automatically recognize and generate emotional labels for a music part. Accuracy for recognizing each label has been calculated; the results show that epic music can be recognized more accurately (77.4%) than the other types of music.
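
As a rough illustration of what "separability" between two emotion labels can mean, the sketch below computes the classic Fisher discriminant ratio for a single feature. This is an assumption about the kind of measure involved, not the paper's exact procedure; the function name `fisher_ratio` and the feature values are made up for illustration.

```python
from statistics import mean, pvariance


def fisher_ratio(class_a, class_b):
    """Fisher discriminant ratio for one feature and two classes:
    squared difference of the class means divided by the sum of the
    (population) class variances. Larger values mean better separation."""
    spread = pvariance(class_a) + pvariance(class_b)
    if spread == 0:
        raise ValueError("both classes have zero variance")
    return (mean(class_a) - mean(class_b)) ** 2 / spread


# Hypothetical values of one feature (not data from the paper).
relaxing = [1.0, 2.0, 3.0]
epic = [7.0, 8.0, 9.0]
sad = [2.0, 3.0, 4.0]

print(fisher_ratio(relaxing, epic))  # well separated: large ratio
print(fisher_ratio(relaxing, sad))   # overlapping: small ratio
```

A pair of labels whose ratio is large for some feature is easy to tell apart using that feature alone, which matches the abstract's observation that some label pairs separate better than others.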

• [cs.AI]**AI Programmer: Autonomously Creating Software Programs Using Genetic Algorithms**

*Kory Becker, Justin Gottschlich*

http://arxiv.org/abs/1709.05703v1

In this paper, we present the first-of-its-kind machine learning (ML) system, called AI Programmer, that can automatically generate full software programs requiring only minimal human guidance. At its core, AI Programmer uses genetic algorithms (GA) coupled with a tightly constrained programming language that minimizes the overhead of its ML search space. Part of AI Programmer's novelty stems from (i) its unique system design, including an embedded, hand-crafted interpreter for efficiency and security and (ii) its augmentation of GAs to include instruction-gene randomization bindings and programming language-specific genome construction and elimination techniques. We provide a detailed examination of AI Programmer's system design, several examples detailing how the system works, and experimental data demonstrating its software generation capabilities and performance using only mainstream CPUs.

• [cs.AI]**Augmenting End-to-End Dialog Systems with Commonsense Knowledge**

*Tom Young, Erik Cambria, Iti Chaturvedi, Minlie Huang, Hao Zhou, Subham Biswas*

http://arxiv.org/abs/1709.05453v1

Building dialog agents that can converse naturally with humans is a challenging yet intriguing problem of artificial intelligence. In open-domain human-computer conversation, where the conversational agent is expected to respond to human responses in an interesting and engaging way, commonsense knowledge has to be integrated into the model effectively. In this paper, we investigate the impact of providing commonsense knowledge about the concepts covered in the dialog. Our model represents the first attempt to integrate a large commonsense knowledge base into end-to-end conversational models. In the retrieval-based scenario, we propose the Tri-LSTM model to jointly take into account message and commonsense for selecting an appropriate response. Our experiments suggest that the knowledge-augmented models are superior to their knowledge-free counterparts in automatic evaluation.

• [cs.AI]**Markov Brains: A Technical Introduction**

*Arend Hintze, Jeffrey A. Edlund, Randal S. Olson, David B. Knoester, Jory Schossau, Larissa Albantakis, Ali Tehrani-Saleh, Peter Kvam, Leigh Sheneman, Heather Goldsby, Clifford Bohm, Christoph Adami*

http://arxiv.org/abs/1709.05601v1

Markov Brains are a class of evolvable artificial neural networks (ANN). They differ from conventional ANNs in many aspects, but the key difference is that instead of a layered architecture, with each node performing the same function, Markov Brains are networks built from individual computational components. These computational components interact with each other, receive inputs from sensors, and control motor outputs. The function of the computational components, their connections to each other, as well as connections to sensors and motors are all subject to evolutionary optimization. Here we describe in detail how a Markov Brain works, what techniques can be used to study them, and how they can be evolved.

• [cs.AI]**Memory Augmented Control Networks**

*Arbaaz Khan, Clark Zhang, Nikolay Atanasov, Konstantinos Karydis, Vijay Kumar, Daniel D. Lee*

http://arxiv.org/abs/1709.05706v1

Planning problems in partially observable environments cannot be solved directly with convolutional networks and require some form of memory. But, even memory networks with sophisticated addressing schemes are unable to learn intelligent reasoning satisfactorily due to the complexity of simultaneously learning to access memory and plan. To mitigate these challenges we introduce the Memory Augmented Control Network (MACN). The proposed network architecture consists of three main parts. The first part uses convolutions to extract features and the second part uses a neural network-based planning module to pre-plan in the environment. The third part uses a network controller that learns to store those specific instances of past information that are necessary for planning. The performance of the network is evaluated in discrete grid world environments for path planning in the presence of simple and complex obstacles. We show that our network learns to plan and can generalize to new environments.

• [cs.AI]**Reinforcement Learning Based Conversational Search Assistant**

*Milan Aggarwal, Aarushi Arora, Shagun Sodhani, Balaji Krishnamurthy*

http://arxiv.org/abs/1709.05638v1

In this work, we develop an end-to-end Reinforcement Learning based architecture for a conversational search agent to assist users in searching on an e-commerce marketplace for digital assets. Our approach caters to a search task fundamentally different from the ones which have limited search modalities where the user can express his preferences objectively. The system interacts with the users to display search results for their queries, and gauges the user's intent and the context of the conversation to choose the next action and reply. To train the agent in the absence of true conversation data, a virtual user is constructed to model a human user using the query and session logs from a major stock photography and digital assets marketplace. The system provides an alternative that is more engaging than the traditional search while maintaining similar effectiveness. This work provides a mechanism to build and deploy a bootstrapped version of an effective conversational agent from readily available query log data. The system can then be used to acquire true conversational data and be fine-tuned further. The methodology discussed in this paper can be extended to e-commerce domains in general.

• [cs.AI]**Relational Marginal Problems: Theory and Estimation**

*Ondrej Kuzelka, Yuyi Wang, Jesse Davis, Steven Schockaert*

http://arxiv.org/abs/1709.05825v1

In the propositional setting, the marginal problem is to find a (maximum-entropy) distribution that has some given marginals. We study this problem in a relational setting and make the following contributions. First, we compare two different notions of relational marginals. Second, we show a duality between the resulting relational marginal problems and the maximum likelihood estimation of the parameters of relational models, which generalizes a well-known duality from the propositional setting. Third, by exploiting the relational marginal formulation, we present a statistically sound method to learn the parameters of relational models that will be applied in settings where the number of constants differs between the training and test data. Furthermore, based on a relational generalization of marginal polytopes, we characterize cases where the standard estimators based on a feature's number of true groundings need to be adjusted and we quantitatively characterize the consequences of these adjustments. Fourth, we prove bounds on expected errors of the estimated parameters, which allows us to lower-bound, among other things, the effective sample size of relational training data.

• [cs.AI]**The Uncertainty Bellman Equation and Exploration**

*Brendan O'Donoghue, Ian Osband, Remi Munos, Volodymyr Mnih*

http://arxiv.org/abs/1709.05380v1

We consider the exploration/exploitation problem in reinforcement learning. For exploitation, it is well known that the Bellman equation connects the value at any time-step to the expected value at subsequent time-steps. In this paper we consider a similar uncertainty Bellman equation (UBE), which connects the uncertainty at any time-step to the expected uncertainties at subsequent time-steps, thereby extending the potential exploratory benefit of a policy beyond individual time-steps. We prove that the unique fixed point of the UBE yields an upper bound on the variance of the estimated value of any fixed policy. This bound can be much tighter than traditional count-based bonuses that compound standard deviation rather than variance. Importantly, and unlike several existing approaches to optimism, this method scales naturally to large systems with complex generalization. Substituting our UBE-exploration strategy for $\epsilon$-greedy improves DQN performance on 51 out of 57 games in the Atari suite.
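
To make the analogy in the abstract concrete, the following is a schematic sketch only. The first line is the standard Bellman equation for a fixed policy $\pi$ in a finite-horizon, undiscounted setting; the second line has the same recursive shape but propagates uncertainty instead of value. The symbols $u^h$ (uncertainty at step $h$) and $\nu^h$ (a local uncertainty term at step $h$) are notation assumed here for illustration; the paper should be consulted for the exact definitions and conditions.

$$
\begin{aligned}
Q^h(s,a) &= \mathbb{E}\big[r^h(s,a)\big] + \mathbb{E}_{s',a'}\big[Q^{h+1}(s',a')\big] \\
u^h(s,a) &= \nu^h(s,a) + \mathbb{E}_{s',a'}\big[u^{h+1}(s',a')\big]
\end{aligned}
$$

In both lines the expectation is over the next state $s'$ and the next action $a' \sim \pi(\cdot \mid s')$. Read this way, uncertainty about the value of later state-action pairs flows back to earlier ones in the same way that reward does, which is what allows the exploratory benefit of a policy to extend beyond a single time-step.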

• [cs.CL]**"How May I Help You?": Modeling Twitter Customer Service Conversations Using Fine-Grained Dialogue Acts**

*Shereen Oraby, Pritam Gundecha, Jalal Mahmud, Mansurul Bhuiyan, Rama Akkiraju*

http://arxiv.org/abs/1709.05413v1

Given the increasing popularity of customer service dialogue on Twitter, analysis of conversation data is essential to understand trends in customer and agent behavior for the purpose of automating customer service interactions. In this work, we develop a novel taxonomy of fine-grained "dialogue acts" frequently observed in customer service, showcasing acts that are more suited to the domain than the more generic existing taxonomies. Using a sequential SVM-HMM model, we model conversation flow, predicting the dialogue act of a given turn in real-time. We characterize differences between customer and agent behavior in Twitter customer service conversations, and investigate the effect of testing our system on different customer service industries. Finally, we use a data-driven approach to predict important conversation outcomes: customer satisfaction, customer frustration, and overall problem resolution. We show that the type and location of certain dialogue acts in a conversation have a significant effect on the probability of desirable and undesirable outcomes, and present actionable rules based on our findings. The patterns and rules we derive can be used as guidelines for outcome-driven automated customer service platforms.

• [cs.CL]**AISHELL-1: An Open-Source Mandarin Speech Corpus and A Speech Recognition Baseline**

*Hui Bu, Jiayu Du, Xingyu Na, Bengu Wu, Hao Zheng*

http://arxiv.org/abs/1709.05522v1

An open-source Mandarin speech corpus called AISHELL-1 is released. It is by far the largest corpus suitable for conducting speech recognition research and building speech recognition systems for Mandarin. The recording procedure, including audio capturing devices and environments, is presented in detail. The preparation of the related resources, including transcriptions and lexicon, is described. The corpus is released with a Kaldi recipe. Experimental results imply that the quality of the audio recordings and transcriptions is promising.

• [cs.CL]**Acquiring Background Knowledge to Improve Moral Value Prediction**

*Ying Lin, Joe Hoover, Morteza Dehghani, Marlon Mooijman, Heng Ji*

http://arxiv.org/abs/1709.05467v1

In this paper, we address the problem of detecting expressions of moral values in tweets using content analysis. This is a particularly challenging problem because moral values are often only implicitly signaled in language, and tweets contain little contextual information due to length constraints. To address these obstacles, we present a novel approach to automatically acquire background knowledge from an external knowledge base to enrich input texts and thus improve moral value prediction. By combining basic text features with background knowledge, our overall context-aware framework achieves performance comparable to a single human annotator. To the best of our knowledge, this is the first attempt to incorporate background knowledge for the prediction of implicit psychological variables in the area of computational social science.

• [cs.CL]**Character Distributions of Classical Chinese Literary Texts: Zipf's Law, Genres, and Epochs**

*Chao-Lin Liu, Shuhua Zhang, Yuanli Geng, Huei-ling Lai, Hongsu Wang*

http://arxiv.org/abs/1709.05587v1

We collect 14 representative corpora for major periods in Chinese history in this study. These corpora include poetic works produced in several dynasties, novels of the Ming and Qing dynasties, and essays and news reports written in modern Chinese. The time span of these corpora ranges between 1046 BCE and 2007 CE. We analyze their character and word distributions from the viewpoint of Zipf's law, and look for factors that affect the deviations and similarities between their Zipfian curves. Genres and epochs both showed an influence in our analyses. Specifically, the character distributions for poetic works written between 618 CE and 1644 CE exhibit striking similarity. In addition, although texts of the same dynasty may tend to use the same set of characters, their character distributions still deviate from each other.
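
For readers unfamiliar with Zipfian curves, here is a minimal sketch of how a character rank-frequency curve can be computed and summarized by a log-log slope. It is not the authors' pipeline; the helper names `rank_frequency` and `zipf_slope` are hypothetical, and a plain least-squares fit is assumed as the summary statistic.

```python
import math
from collections import Counter


def rank_frequency(text):
    """Character counts sorted from most to least frequent, ignoring whitespace."""
    counts = Counter(ch for ch in text if not ch.isspace())
    return sorted(counts.values(), reverse=True)


def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 corresponds to the classic Zipf's law shape."""
    if len(frequencies) < 2:
        raise ValueError("need at least two ranks to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    denominator = sum((x - x_mean) ** 2 for x in xs)
    return numerator / denominator


# Toy input; a real study would load a full corpus instead.
sample = "床前明月光 疑是地上霜 举头望明月 低头思故乡"
freqs = rank_frequency(sample)
print(freqs[:5], zipf_slope(freqs))
```

Comparing the fitted slopes (or the full curves) across corpora is one simple way to look for the genre and epoch effects the abstract describes.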

• [cs.CL]**Combining Search with Structured Data to Create a More Engaging User Experience in Open Domain Dialogue**

*Kevin K. Bowden, Shereen Oraby, Jiaqi Wu, Amita Misra, Marilyn Walker*

http://arxiv.org/abs/1709.05411v1

The greatest challenges in building sophisticated open-domain conversational agents arise directly from the potential for ongoing mixed-initiative multi-turn dialogues, which do not follow a particular plan or pursue a particular fixed information need. In order to make coherent conversational contributions in this context, a conversational agent must be able to track the types and attributes of the entities under discussion in the conversation and know how they are related. In some cases, the agent can rely on structured information sources to help identify the relevant semantic relations and produce a turn, but in other cases, the only content available comes from search, and it may be unclear which semantic relations hold between the search results and the discourse context. A further constraint is that the system must produce its contribution to the ongoing conversation in real-time. This paper describes our experience building SlugBot for the 2017 Alexa Prize, and discusses how we leveraged search and structured data from different sources to help SlugBot produce dialogic turns and carry on conversations whose length over the semi-finals user evaluation period averaged 8:17 minutes.

• [cs.CL]**Creating and Characterizing a Diverse Corpus of Sarcasm in Dialogue**

*Shereen Oraby, Vrindavan Harrison, Lena Reed, Ernesto Hernandez, Ellen Riloff, Marilyn Walker*

http://arxiv.org/abs/1709.05404v1

The use of irony and sarcasm in social media allows us to study them at scale for the first time. However, their diversity has made it difficult to construct a high-quality corpus of sarcasm in dialogue. Here, we describe the process of creating a large-scale, highly-diverse corpus of online debate forums dialogue, and our novel methods for operationalizing classes of sarcasm in the form of rhetorical questions and hyperbole. We show that we can use lexico-syntactic cues to reliably retrieve sarcastic utterances with high accuracy. To demonstrate the properties and quality of our corpus, we conduct supervised learning experiments with simple features, and show that we achieve both higher precision and F than previous work on sarcasm in debate forums dialogue. We apply a weakly-supervised linguistic pattern learner and qualitatively analyze the linguistic differences in each class.

• [cs.CL]**Data Innovation for International Development: An overview of natural language processing for qualitative data analysis**

*Philipp Broniecki, Anna Hanchar, Slava J. Mikhaylov*

http://arxiv.org/abs/1709.05563v1

Availability, collection and access to quantitative data, as well as its limitations, often make qualitative data the resource upon which development programs heavily rely. Both traditional interview data and social media analysis can provide rich contextual information and are essential for research, appraisal, monitoring and evaluation. These data may be difficult to process and analyze both systematically and at scale. This, in turn, limits the ability of timely data driven decision-making which is essential in fast evolving complex social systems. In this paper, we discuss the potential of using natural language processing to systematize analysis of qualitative data, and to inform quick decision-making in the development context. We illustrate this with interview data generated in a format of micro-narratives for the UNDP Fragments of Impact project.

• [cs.CL]**Flexible Computing Services for Comparisons and Analyses of Classical Chinese Poetry**

*Chao-Lin Liu*

http://arxiv.org/abs/1709.05729v1

We collect nine corpora of representative Chinese poetry for the time span between 1046 BCE and 1644 CE for studying the history of Chinese words, collocations, and patterns. By flexibly integrating our own tools, we are able to provide new perspectives for approaching our goals. We illustrate the ideas with two examples. The first example shows a new way to compare word preferences of poets, and the second example demonstrates how we can utilize our corpora in historical studies of Chinese words. We show the viability of the tools for academic research, and we wish to make them helpful for enriching existing Chinese dictionaries as well.

• [cs.CL]**Hierarchical Gated Recurrent Neural Tensor Network for Answer Triggering**

*Wei Li, Yunfang Wu*

http://arxiv.org/abs/1709.05599v1

In this paper, we focus on the problem of answer triggering addressed by Yang et al. (2015), which is a critical component for a real-world question answering system. We employ a hierarchical gated recurrent neural tensor (HGRNT) model to capture both the context information and the deep interactions between the candidate answers and the question. Our result on F value achieves 42.6%, which surpasses the baseline by over 10%.

• [cs.CL]**Limitations of Cross-Lingual Learning from Image Search**

*Mareike Hartmann, Anders Soegaard*

http://arxiv.org/abs/1709.05914v1

Cross-lingual representation learning is an important step in making NLP scale to all the world's languages. Recent work on bilingual lexicon induction suggests that it is possible to learn cross-lingual representations of words based on similarities between images associated with these words. However, that work focused on the translation of selected nouns only. In our work, we investigate whether the meaning of other parts-of-speech, in particular adjectives and verbs, can be learned in the same way. We also experiment with combining the representations learned from visual data with embeddings learned from textual data. Our experiments across five language pairs indicate that previous work does not scale to the problem of learning cross-lingual representations beyond simple nouns.

• [cs.CL]**Order-Preserving Abstractive Summarization for Spoken Content Based on Connectionist Temporal Classification**

*Bo-Ru Lu, Frank Shyu, Yun-Nung Chen, Hung-Yi Lee, Lin-shan Lee*

http://arxiv.org/abs/1709.05475v1

Connectionist temporal classification (CTC) is a powerful approach for sequence-to-sequence learning, and has been popularly used in speech recognition. The central ideas of CTC include adding a label "blank" during training. With this mechanism, CTC eliminates the need for segment alignment, and hence has been applied to various sequence-to-sequence learning problems. In this work, we applied CTC to abstractive summarization for spoken content. The "blank" in this case implies the corresponding input data are less important or noisy; thus it can be ignored. This approach was shown to outperform the existing methods in terms of ROUGE scores over Chinese Gigaword and MATBN corpora. This approach also has the nice property that the ordering of words or characters in the input documents can be better preserved in the generated summaries.
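
The role of the "blank" label can be illustrated with the standard CTC collapsing step, which maps a per-position label path to an output sequence. This is a minimal sketch of that generic step only, not the paper's summarization model; the blank symbol `"_"` and the name `ctc_collapse` are assumptions made for illustration.

```python
from itertools import groupby

BLANK = "_"  # assumed blank symbol, chosen only for this example


def ctc_collapse(path, blank=BLANK):
    """Collapse a CTC label path into an output sequence.

    Step 1: merge runs of identical consecutive labels.
    Step 2: drop every blank label.
    Surviving labels keep their original left-to-right order.
    """
    merged = [label for label, _ in groupby(path)]
    return [label for label in merged if label != blank]
```

For example, `ctc_collapse(list("aa_ab__b"))` yields `['a', 'a', 'b', 'b']`: positions labelled blank contribute nothing, and the remaining labels appear in input order, which is the order-preserving property the abstract mentions.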

• [cs.CL]**Role of Morphology Injection in Statistical Machine Translation**

*Sreelekha S, Pushpak Bhattacharyya*

http://arxiv.org/abs/1709.05487v1

Phrase-based Statistical models are more commonly used as they perform optimally in terms of both translation quality and complexity of the system. Hindi and in general all Indian languages are morphologically richer than English. Hence, even though Phrase-based systems perform very well for the less divergent language pairs, for English to Indian language translation, we need more linguistic information (such as morphology, parse tree, parts of speech tags, etc.) on the source side. Factored models seem to be useful in this case, as Factored models consider a word as a vector of factors. These factors can contain any information about the surface word and use it while translating. Hence, the objective of this work is to handle morphological inflections in Hindi and Marathi using Factored translation models while translating from English. SMT approaches face the problem of data sparsity while translating into a morphologically rich language. It is very unlikely for a parallel corpus to contain all morphological forms of words. We propose a solution to generate these unseen morphological forms and inject them into original training corpora. In this paper, we study factored models and the problem of sparseness in the context of translation to morphologically rich languages. We propose a simple and effective solution which is based on enriching the input with various morphological forms of words. We observe that morphology injection improves the quality of translation in terms of both adequacy and fluency. We verify this with the experiments on two morphologically rich languages: Hindi and Marathi, while translating from English.

• [cs.CL]**Sequence to Sequence Learning for Event Prediction**

*Dai Quoc Nguyen, Dat Quoc Nguyen, Cuong Xuan Chu, Stefan Thater, Manfred Pinkal*

http://arxiv.org/abs/1709.06033v1

This paper presents an approach to the task of predicting an event description from a preceding sentence in a text. Our approach explores sequence-to-sequence learning using a bidirectional multi-layer recurrent neural network. Our approach substantially outperforms previous work in terms of the BLEU score on two datasets derived from WikiHow and DeScript respectively. Since the BLEU score is not easy to interpret as a measure of event prediction, we complement our study with a second evaluation that exploits the rich linguistic annotation of gold paraphrase sets of events.

• [cs.CL]**Toward a full-scale neural machine translation in production: the Booking.com use case**

*Pavel Levin, Nishikant Dhanuka, Talaat Khalil, Fedor Kovalev, Maxim Khalilov*

http://arxiv.org/abs/1709.05820v1

While some remarkable progress has been made in neural machine translation (NMT) research, there have not been many reports on its development and evaluation in practice. This paper tries to fill this gap by presenting some of our findings from building an in-house travel domain NMT system in a large scale E-commerce setting. The three major topics that we cover are optimization and training (including different optimization strategies and corpus sizes), handling real-world content and evaluating results.

• [cs.CL]**Unwritten Languages Demand Attention Too! Word Discovery with Encoder-Decoder Models**

*Marcely Zanon Boito, Alexandre Berard, Aline Villavicencio, Laurent Besacier*

http://arxiv.org/abs/1709.05631v1

Word discovery is the task of extracting words from unsegmented text. In this paper we examine to what extent neural networks can be applied to this task in a realistic unwritten language scenario, where only small corpora and limited annotations are available. We investigate two scenarios: one with no supervision and another with limited supervision with access to the most frequent words. Obtained results show that it is possible to retrieve at least 27% of the gold standard vocabulary by training an encoder-decoder neural machine translation system with only 5,157 sentences. This result is close to those obtained with a task-specific Bayesian nonparametric model. Moreover, our approach has the advantage of generating translation alignments, which could be used to create a bilingual lexicon. As a future perspective, this approach is also well suited to work directly from speech.

• [cs.CL]**Word Vector Enrichment of Low Frequency Words in the Bag-of-Words Model for Short Text Multi-class Classification Problems**

*Bradford Heap, Michael Bain, Wayne Wobcke, Alfred Krzywicki, Susanne Schmeidl*

http://arxiv.org/abs/1709.05778v1

The bag-of-words model is a standard representation of text for many linear classifier learners. In many problem domains, linear classifiers are preferred over more complex models due to their efficiency, robustness and interpretability, and the bag-of-words text representation can capture sufficient information for linear classifiers to make highly accurate predictions. However in settings where there is a large vocabulary, large variance in the frequency of terms in the training corpus, many classes and very short text (e.g., single sentences or document titles) the bag-of-words representation becomes extremely sparse, and this can reduce the accuracy of classifiers. A particular issue in such settings is that short texts tend to contain infrequently occurring or rare terms which lack class-conditional evidence. In this work we introduce a method for enriching the bag-of-words model by complementing such rare term information with related terms from both general and domain-specific Word Vector models. By reducing sparseness in the bag-of-words models, our enrichment approach achieves improved classification over several baseline classifiers in a variety of text classification problems. Our approach is also efficient because it requires no change to the linear classifier before or during training, since bag-of-words enrichment applies only to text being classified.

• [cs.CR]**Adaptive Laplace Mechanism: Differential Privacy Preservation in Deep Learning**

*NhatHai Phan, Xintao Wu, Han Hu, Dejing Dou*

http://arxiv.org/abs/1709.05750v1

In this paper, we focus on developing a novel mechanism to preserve differential privacy in deep neural networks, such that: (1) The privacy budget consumption is totally independent of the number of training steps; (2) It has the ability to adaptively inject noise into features based on the contribution of each to the output; and (3) It could be applied in a variety of different deep neural networks. To achieve this, we figure out a way to perturb affine transformations of neurons, and loss functions used in deep neural networks. In addition, our mechanism intentionally adds "more noise" into features which are "less relevant" to the model output, and vice-versa. Our theoretical analysis further derives the sensitivities and error bounds of our mechanism. Rigorous experiments conducted on MNIST and CIFAR-10 datasets show that our mechanism is highly effective and outperforms existing solutions.
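
For orientation, the classical Laplace mechanism adds noise with scale `sensitivity / epsilon` to a value. The sketch below shows that basic mechanism plus a purely hypothetical budget split in which more relevant features receive a larger share of epsilon and therefore less noise. The proportional rule in `split_budget` is an assumption made for illustration; the abstract does not specify the paper's actual allocation or its perturbation of affine transformations and loss functions.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential variables with
    mean `scale` follows a Laplace distribution with that scale.
    """
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Classical Laplace mechanism: noise scale is sensitivity / epsilon."""
    return value + laplace_noise(sensitivity / epsilon, rng)


def split_budget(relevances, total_epsilon):
    """Hypothetical rule: epsilon share proportional to relevance.

    A less relevant feature gets a smaller epsilon, hence a larger
    noise scale. The shares sum to total_epsilon.
    """
    total = sum(relevances)
    return [total_epsilon * r / total for r in relevances]
```

With `split_budget([1.0, 3.0], 1.0)` the two features receive epsilon 0.25 and 0.75, so for equal sensitivity the first (less relevant) feature is perturbed with three times the noise scale of the second.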

• [cs.CR]**Mitigating Evasion Attacks to Deep Neural Networks via Region-based Classification**

*Xiaoyu Cao, Neil Zhenqiang Gong*

http://arxiv.org/abs/1709.05583v1

Deep neural networks (DNNs) have transformed several artificial intelligence research areas including computer vision, speech recognition, and natural language processing. However, recent studies demonstrated that DNNs are vulnerable to adversarial manipulations at testing time. Specifically, suppose we have a testing example, whose label can be correctly predicted by a DNN classifier. An attacker can add a small carefully crafted noise to the testing example such that the DNN classifier predicts an incorrect label, where the crafted testing example is called adversarial example. Such attacks are called evasion attacks. Evasion attacks are one of the biggest challenges for deploying DNNs in safety and security critical applications such as self-driving cars. In this work, we develop new DNNs that are robust to state-of-the-art evasion attacks. Our key observation is that adversarial examples are close to the classification boundary. Therefore, we propose region-based classification to be robust to adversarial examples. Specifically, for a benign/adversarial testing example, we ensemble information in a hypercube centered at the example to predict its label. In contrast, traditional classifiers are point-based classification, i.e., given a testing example, the classifier predicts its label based on the testing example alone. Our evaluation results on MNIST and CIFAR-10 datasets demonstrate that our region-based classification can significantly mitigate evasion attacks without sacrificing classification accuracy on benign examples. Specifically, our region-based classification achieves the same classification accuracy on testing benign examples as point-based classification, but our region-based classification is significantly more robust than point-based classification to state-of-the-art evasion attacks.
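
The contrast between point-based and region-based prediction can be sketched as a majority vote over points sampled from a hypercube around the input. This is a simplified illustration under stated assumptions: uniform sampling, a fixed sample count, and the names `region_classify` and `point_classifier` are all hypothetical, and the abstract does not say how the authors choose the hypercube size.

```python
import random
from collections import Counter


def region_classify(point_classifier, x, radius, num_samples=1000, seed=0):
    """Predict a label by majority vote inside a hypercube centred at x.

    point_classifier: any function mapping a list of floats to a label.
    radius: half the side length of the hypercube (assumed parameter).
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(num_samples):
        # Perturb every coordinate independently and uniformly.
        sample = [xi + rng.uniform(-radius, radius) for xi in x]
        votes[point_classifier(sample)] += 1
    return votes.most_common(1)[0][0]
```

As a toy case, take a classifier that returns label 1 only in a thin slab `abs(p[0] - 0.5) < 0.01` and label 0 elsewhere. The point `[0.5]` gets label 1 from the point-based rule, but with `radius=0.3` nearly all sampled points fall outside the slab, so the region-based vote returns 0.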

• [cs.CR]**Settling Payments Fast and Private: Efficient Decentralized Routing for Path-Based Transactions**

*Stefanie Roos, Pedro Moreno-Sanchez, Aniket Kate, Ian Goldberg*

http://arxiv.org/abs/1709.05748v1

Path-based transaction (PBT) networks, which settle payments from one user to another via a path of intermediaries, are a growing area of research. They overcome the scalability and privacy issues in cryptocurrencies like Bitcoin and Ethereum by replacing expensive and slow on-chain blockchain operations with inexpensive and fast off-chain transfers. In the form of credit networks such as Ripple and Stellar, they also enable low-price real-time gross settlements across different currencies. For example, SilentWhispers is a recently proposed fully distributed credit network relying on path-based transactions for secure and in particular private payments without a public ledger. At the core of a decentralized PBT network is a routing algorithm that discovers transaction paths between payer and payee. During the last year, a number of routing algorithms have been proposed. However, the existing ad hoc efforts lack either efficiency or privacy. In this work, we first identify several efficiency concerns in SilentWhispers. Armed with this knowledge, we design and evaluate SpeedyMurmurs, a novel routing algorithm for decentralized PBT networks using efficient and flexible embedding-based path discovery and on-demand efficient stabilization to handle the dynamics of a PBT network. Our simulation study, based on real-world data from the currently deployed Ripple credit network, indicates that SpeedyMurmurs reduces the overhead of stabilization by up to two orders of magnitude and the overhead of routing a transaction by more than a factor of two. Furthermore, using SpeedyMurmurs maintains at least the same success ratio as decentralized landmark routing, while providing lower delays. Finally, SpeedyMurmurs achieves key privacy goals for routing in PBT networks.

• [cs.CV]**A Causal And-Or Graph Model for Visibility Fluent Reasoning in Human-Object Interactions**

*Lei Qin, Yuanlu Xu, Xiaobai Liu, Song-Chun Zhu*

http://arxiv.org/abs/1709.05437v1

Tracking humans that are interacting with other subjects or the environment remains unsolved in visual tracking, because the visibility of the humans of interest in videos is unknown and might vary over time. In particular, it is still difficult for state-of-the-art human trackers to recover complete human trajectories in crowded scenes with frequent human interactions. In this work, we consider the visibility status of a subject as a fluent variable, whose changes are mostly attributed to the subject's interactions with the surroundings, e.g., crossing behind another object, entering a building, or getting into a vehicle. We introduce a Causal And-Or Graph (C-AOG) to represent the cause-effect relations between an object's visibility fluents and its activities, and develop a probabilistic graph model to jointly reason about the visibility fluent change (e.g., from visible to invisible) and track humans in videos. We formulate the above joint task as an iterative search for a feasible causal graph structure that enables fast search algorithms, e.g., dynamic programming. We apply the proposed method on challenging video sequences to evaluate its capabilities of estimating visibility fluent changes of subjects and tracking subjects of interest over time. Results with comparisons demonstrated that our method clearly outperforms the alternative trackers and can recover complete trajectories of humans in complicated scenarios with frequent human interactions.

• [cs.CV]**A Hierarchical Probabilistic Model for Facial Feature Detection**

*Yue Wu, Ziheng Wang, Qiang Ji*

http://arxiv.org/abs/1709.05732v1

Facial feature detection from facial images has attracted great attention in the field of computer vision. It is a nontrivial task since the appearance and shape of the face tend to change under different conditions. In this paper, we propose a hierarchical probabilistic model that could infer the true locations of facial features given the image measurements even if the face is with significant facial expression and pose. The hierarchical model implicitly captures the lower level shape variations of facial components using the mixture model. Furthermore, in the higher level, it also learns the joint relationship among facial components, the facial expression, and the pose information through automatic structure learning and parameter estimation of the probabilistic model. Experimental results on benchmark databases demonstrate the effectiveness of the proposed hierarchical probabilistic model.

• [cs.CV]**An Improved Fatigue Detection System Based on Behavioral Characteristics of Driver**

*Rajat Gupta, Kanishk Aman, Nalin Shiva, Yadvendra Singh*

http://arxiv.org/abs/1709.05669v1

In recent years, road accidents have increased significantly. One of the major reported causes of these accidents is driver fatigue. Due to continuous and prolonged driving, the driver gets exhausted and drowsy, which may lead to an accident. Therefore, there is a need for a system to measure the fatigue level of the driver and alert the driver when he or she becomes drowsy, to avoid accidents. Thus, we propose a system which comprises a camera installed on the car dashboard. The camera detects the driver's face, observes the alterations in its facial features, and uses these features to estimate the fatigue level. Facial features include the eyes and mouth. Principal Component Analysis is then applied to reduce the features while minimizing the amount of information lost. The parameters thus obtained are processed through a Support Vector Classifier to classify the fatigue level. The classifier output is then sent to the alert unit.

• [cs.CV]**Automatic Tool Landmark Detection for Stereo Vision in Robot-Assisted Retinal Surgery**

*Thomas Probst, Kevis-Kokitsi Maninis, Ajad Chhatkuli, Mouloud Ourak, Emmanuel Vander Poorten, Luc Van Gool*

http://arxiv.org/abs/1709.05665v1

Computer vision and robotics are being increasingly applied in medical interventions. Especially in interventions where extreme precision is required they could make a difference. One such application is robot-assisted retinal microsurgery. In recent works, such interventions are conducted under a stereo-microscope, and with a robot-controlled surgical tool. The complementarity of computer vision and robotics has however not yet been fully exploited. In order to improve the robot control we are interested in 3D reconstruction of the anatomy and in automatic tool localization using a stereo microscope. In this paper, we solve this problem for the first time using a single pipeline, starting from uncalibrated cameras to reach metric 3D reconstruction and registration, in retinal microsurgery. The key ingredients of our method are: (a) surgical tool landmark detection, and (b) 3D reconstruction with the stereo microscope, using the detected landmarks. To address the former, we propose a novel deep learning method that detects and recognizes keypoints in high definition images at higher than real-time speed. We use the detected 2D keypoints along with their corresponding 3D coordinates obtained from the robot sensors to calibrate the stereo microscope using an affine projection model. We design an online 3D reconstruction pipeline that makes use of smoothness constraints and performs robot-to-camera registration. The entire pipeline is extensively validated on open-sky porcine eye sequences. Quantitative and qualitative results are presented for all steps.

• [cs.CV]**Beyond SIFT using Binary features for Loop Closure Detection**

*Lei Han, Guyue Zhou, Lan Xu, Lu Fang*

http://arxiv.org/abs/1709.05833v1

In this paper a binary feature based Loop Closure Detection (LCD) method is proposed, which for the first time achieves higher precision-recall (PR) performance compared with state-of-the-art SIFT feature based approaches. The proposed system originates from our previous work Multi-Index hashing for Loop closure Detection (MILD), which employs Multi-Index Hashing (MIH) (Greene et al., 1994) for Approximate Nearest Neighbor (ANN) search of binary features. As the accuracy of MILD is limited by repeating textures and inaccurate image similarity measurement, burstiness handling is introduced to solve this problem and achieves considerable accuracy improvement. Additionally, a comprehensive theoretical analysis on MIH used in MILD is conducted to further explore the potentials of hashing methods for ANN search of binary features from a probabilistic perspective. This analysis provides more freedom in choosing the best parameters in MIH for different application scenarios. Experiments on popular public datasets show that the proposed approach achieved the highest accuracy compared with state-of-the-art while running at 30Hz for databases containing thousands of images.
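
The idea behind multi-index hashing can be sketched as follows: split each binary code into `m` chunks and index every chunk in its own hash table. By the pigeonhole principle, two codes that differ in fewer than `m` bits must agree exactly on at least one chunk, so exact chunk lookups yield a complete candidate set that is then verified by Hamming distance. This is a generic textbook-style sketch, not the MILD implementation; the class name, the restriction to `radius < m`, and the integer code representation are assumptions.

```python
from collections import defaultdict


def split_code(code, bits, m):
    """Split a `bits`-bit integer into m equal-width chunks."""
    width = bits // m
    mask = (1 << width) - 1
    return [(code >> (i * width)) & mask for i in range(m)]


class MultiIndexHash:
    def __init__(self, bits, m):
        assert bits % m == 0, "code length must be divisible by m"
        self.bits, self.m = bits, m
        self.tables = [defaultdict(list) for _ in range(m)]
        self.codes = []

    def add(self, code):
        """Store a code and index each of its chunks; return its id."""
        idx = len(self.codes)
        self.codes.append(code)
        chunks = split_code(code, self.bits, self.m)
        for table, chunk in zip(self.tables, chunks):
            table[chunk].append(idx)
        return idx

    def query(self, code, radius):
        """Return ids of stored codes within `radius` bits (radius < m)."""
        assert radius < self.m, "exact-chunk lookup needs radius < m"
        candidates = set()
        chunks = split_code(code, self.bits, self.m)
        for table, chunk in zip(self.tables, chunks):
            candidates.update(table.get(chunk, ()))
        # Verify candidates with the full Hamming distance.
        return sorted(
            i for i in candidates
            if bin(self.codes[i] ^ code).count("1") <= radius
        )
```

Larger search radii would require probing neighbouring chunk values as well, which is where parameter choices such as the number of chunks start to matter.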

• [cs.CV]**Combinational neural network using Gabor filters for the classification of handwritten digits**

*N. Joshi*

http://arxiv.org/abs/1709.05867v1

A classification algorithm that combines the components of k-nearest neighbours and multilayer neural networks has been designed and tested. With this method the computational time required for training the dataset has been reduced substantially. Gabor filters were used for the feature extraction to ensure a better performance. This algorithm is tested with MNIST dataset and it will be integrated as a module in the object recognition software which is currently under development.

• [cs.CV]**Continuous Multimodal Emotion Recognition Approach for AVEC 2017**

*Narotam Singh, Nittin Singh, Abhinav Dhall*

http://arxiv.org/abs/1709.05861v1

This paper reports the analysis of audio and visual features in predicting the emotion dimensions under the seventh Audio/Visual Emotion Subchallenge (AVEC 2017). For visual features we used the HOG (Histogram of Gradients) features, Fisher encodings of SIFT (Scale-Invariant Feature Transform) features based on Gaussian mixture model (GMM) and some pretrained Convolutional Neural Network layers as features; all these were extracted for each video clip. For audio features we used the Bag-of-audio-words (BoAW) representation of the LLDs (low-level descriptors) generated by openXBOW provided by the organisers of the event. Then we trained a fully connected neural network regression model on the dataset for all these different modalities. We applied multimodal fusion on the output models to get the Concordance correlation coefficient on the Development set as well as the Test set.
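
The concordance correlation coefficient (CCC) used as the evaluation measure combines correlation with a penalty for differences in mean and scale. A minimal sketch of the usual definition follows, using population (divide-by-n) variances; the function name `concordance_cc` is an assumption, and the challenge's own scoring script may differ in details.

```python
def concordance_cc(x, y):
    """Concordance correlation coefficient of two equal-length sequences.

    ccc = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y)) ** 2)
    Undefined (division by zero) if both inputs are the same constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov / (var_x + var_y + (mean_x - mean_y) ** 2)
```

Unlike Pearson correlation, a constant offset lowers the score: predictions `[2, 3, 4]` against targets `[1, 2, 3]` are perfectly correlated but give a CCC of 4/7.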

• [cs.CV]**Coupled Ensembles of Neural Networks**

*Anuvabh Dutt, Denis Pellerin, Georges Quénot*

http://arxiv.org/abs/1709.06053v1

We investigate in this paper the architecture of deep convolutional networks. Building on existing state of the art models, we propose a reconfiguration of the model parameters into several parallel branches at the global network level, with each branch being a standalone CNN. We show that this arrangement is an efficient way to significantly reduce the number of parameters without losing performance or to significantly improve the performance with the same number of parameters. The use of branches brings an additional form of regularization. In addition to the split into parallel branches, we propose a tighter coupling of these branches by placing the "fuse (averaging) layer" before the Log-Likelihood and SoftMax layers during training. This gives another significant performance improvement, the tighter coupling favouring the learning of better representations, even at the level of the individual branches. We refer to this branched architecture as "coupled ensembles". The approach is very generic and can be applied with almost any DCNN architecture. With coupled ensembles of DenseNet-BC and parameter budget of 25M, we obtain error rates of 2.92%, 15.68% and 1.50% respectively on CIFAR-10, CIFAR-100 and SVHN tasks. For the same budget, DenseNet-BC has error rate of 3.46%, 17.18%, and 1.8% respectively. With ensembles of coupled ensembles, of DenseNet-BC networks, with 50M total parameters, we obtain error rates of 2.72%, 15.13% and 1.42% respectively on these tasks.
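
The placement of the averaging ("fuse") step can be made concrete with a small numeric sketch. One reading of the abstract is that the raw class scores of the branches are averaged before the SoftMax, rather than averaging the per-branch probabilities afterwards as in a conventional ensemble. The code below contrasts the two orderings on plain lists of scores; it is an interpretation for illustration only, and the function names are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_before_softmax(branch_scores):
    """Average raw class scores across branches, then apply softmax."""
    n = len(branch_scores)
    mean_scores = [sum(col) / n for col in zip(*branch_scores)]
    return softmax(mean_scores)


def fuse_after_softmax(branch_scores):
    """Apply softmax per branch, then average the probabilities."""
    n = len(branch_scores)
    probs = [softmax(scores) for scores in branch_scores]
    return [sum(col) / n for col in zip(*probs)]
```

The two orderings generally disagree: for branch scores `[[4.0, 0.0], [0.0, 1.0]]`, fusing before the softmax gives the first class a probability of about 0.82, while fusing after gives about 0.63. During training, the earlier fusion point means the loss gradient reaches every branch through the shared average.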

• [cs.CV]**DeepLung: 3D Deep Convolutional Nets for Automated Pulmonary Nodule Detection and Classification**

*Wentao Zhu, Chaochun Liu, Wei Fan, Xiaohui Xie*

http://arxiv.org/abs/1709.05538v1

In this work, we present a fully automated lung CT cancer diagnosis system, DeepLung. DeepLung contains two parts, nodule detection and classification. Considering the 3D nature of lung CT data, two 3D networks are designed for the nodule detection and classification respectively. Specifically, a 3D Faster R-CNN is designed for nodule detection with a U-net-like encoder-decoder structure to effectively learn nodule features. For nodule classification, gradient boosting machine (GBM) with 3D dual path network (DPN) features is proposed. The nodule classification subnetwork is validated on a public dataset from LIDC-IDRI, on which it achieves better performance than state-of-the-art approaches, and surpasses the average performance of four experienced doctors. For the DeepLung system, candidate nodules are detected first by the nodule detection subnetwork, and nodule diagnosis is conducted by the classification subnetwork. Extensive experimental results demonstrate the DeepLung is comparable to the experienced doctors both for the nodule-level and patient-level diagnosis on the LIDC-IDRI dataset.

• [cs.CV]**Depression Scale Recognition from Audio, Visual and Text Analysis**

*Shubham Dham, Anirudh Sharma, Abhinav Dhall*

http://arxiv.org/abs/1709.05865v1

Depression is a major mental health disorder that is rapidly affecting lives worldwide. Depression not only impacts emotional but also physical and psychological state of the person. Its symptoms include lack of interest in daily activities, feeling low, anxiety, frustration, loss of weight and even feeling of self-hatred. This report describes work done by us for Audio Visual Emotion Challenge (AVEC) 2017 during our second year BTech summer internship. With the increase in demand to detect depression automatically with the help of machine learning algorithms, we present our multimodal feature extraction and decision level fusion approach for the same. Features are extracted by processing on the provided Distress Analysis Interview Corpus-Wizard of Oz (DAIC-WOZ) database. Gaussian Mixture Model (GMM) clustering and Fisher vector approach were applied on the visual data; statistical descriptors on gaze, pose; low level audio features and head pose and text features were also extracted. Classification is done on fused as well as independent features using Support Vector Machine (SVM) and neural networks. The results obtained were able to cross the provided baseline on validation data set by 17% on audio features and 24.5% on video features.

• [cs.CV]**Direct Pose Estimation with a Monocular Camera**

*Darius Burschka, Elmar Mair*

http://arxiv.org/abs/1709.05815v1

We present a direct method to calculate a 6DoF pose change of a monocular camera for mobile navigation. The calculated pose is estimated up to a constant unknown scale parameter that is kept constant over the entire reconstruction process. This method allows a direct calculation of the metric position and rotation without any necessity to fuse the information in a probabilistic approach over longer frame sequence as it is the case in most currently used VSLAM approaches. The algorithm provides two novel aspects to the field of monocular navigation. It allows a direct pose estimation without any a-priori knowledge about the world directly from any two images and it provides a quality measure for the estimated motion parameters that allows to fuse the resulting information in Kalman Filters. We present the mathematical formulation of the approach together with experimental validation on real scene images.

• [cs.CV]**Direction-Aware Semi-Dense SLAM**

*Julian Straub, Randi Cabezas, John Leonard, John W. Fisher III*

http://arxiv.org/abs/1709.05774v1

To aid simultaneous localization and mapping (SLAM), future perception systems will incorporate forms of scene understanding. In a step towards fully integrated probabilistic geometric scene understanding, localization and mapping we propose the first direction-aware semi-dense SLAM system. It jointly infers the directional Stata Center World (SCW) segmentation and a surfel-based semi-dense map while performing real-time camera tracking. The joint SCW map model connects a scene-wide Bayesian nonparametric Dirichlet Process von-Mises-Fisher mixture model (DP-vMF) prior on surfel orientations with the local surfel locations via a conditional random field (CRF). Camera tracking leverages the SCW segmentation to improve efficiency via guided observation selection. Results demonstrate improved SLAM accuracy and tracking efficiency at state of the art performance.

• [cs.CV]**E$^2$BoWs: An End-to-End Bag-of-Words Model via Deep Convolutional Neural Network**

*Xiaobin Liu, Shiliang Zhang, Tiejun Huang, Qi Tian*

http://arxiv.org/abs/1709.05903v1

Traditional Bag-of-visual Words (BoWs) model is commonly generated with many steps including local feature extraction, codebook generation, and feature quantization, *etc.* Those steps are relatively independent with each other and are hard to be jointly optimized. Moreover, the dependency on hand-crafted local feature makes BoWs model not effective in conveying high-level semantics. These issues largely hinder the performance of BoWs model in large-scale image applications. To conquer these issues, we propose an End-to-End BoWs (E$^2$BoWs) model based on Deep Convolutional Neural Network (DCNN). Our model takes an image as input, then identifies and separates the semantic objects in it, and finally outputs the visual words with high semantic discriminative power. Specifically, our model firstly generates Semantic Feature Maps (SFMs) corresponding to different object categories through convolutional layers, then introduces Bag-of-Words Layers (BoWL) to generate visual words for each individual feature map. We also introduce a novel learning algorithm to reinforce the sparsity of the generated E$^2$BoWs model, which further ensures the time and memory efficiency. We evaluate the proposed E$^2$BoWs model on several image search datasets including *CIFAR-10*, *CIFAR-100*, *MIRFLICKR-25K* and *NUS-WIDE*. Experimental results show that our method achieves promising accuracy and efficiency compared with recent deep learning based retrieval works.

• [cs.CV]**Facial Feature Tracking under Varying Facial Expressions and Face Poses based on Restricted Boltzmann Machines**

*Yue Wu, Zuoguan Wang, Qiang Ji*

http://arxiv.org/abs/1709.05731v1

Facial feature tracking is an active area in computer vision due to its relevance to many applications. It is a nontrivial task, since faces may have varying facial expressions, poses or occlusions. In this paper, we address this problem by proposing a face shape prior model that is constructed based on the Restricted Boltzmann Machines (RBM) and their variants. Specifically, we first construct a model based on Deep Belief Networks to capture the face shape variations due to varying facial expressions for near-frontal view. To handle pose variations, the frontal face shape prior model is incorporated into a 3-way RBM model that could capture the relationship between frontal face shapes and non-frontal face shapes. Finally, we introduce methods to systematically combine the face shape prior models with image measurements of facial feature points. Experiments on benchmark databases show that with the proposed method, facial feature points can be tracked robustly and accurately even if faces have significant facial expressions and poses.

• [cs.CV]**Fast YOLO: A Fast You Only Look Once System for Real-time Embedded Object Detection in Video**

*Mohammad Javad Shafiee, Brendan Chywl, Francis Li, Alexander Wong*

http://arxiv.org/abs/1709.05943v1

Object detection is considered one of the most challenging problems in the field of computer vision, as it involves the combination of object classification and object localization within a scene. Recently, deep neural networks (DNNs) have been demonstrated to achieve superior object detection performance compared to other approaches, with YOLOv2 (an improved You Only Look Once model) being one of the state-of-the-art DNN-based object detection methods in terms of both speed and accuracy. Although YOLOv2 can achieve real-time performance on a powerful GPU, it remains very challenging to leverage this approach for real-time object detection in video on embedded computing devices with limited computational power and limited memory. In this paper, we propose a new framework called Fast YOLO, a fast You Only Look Once framework which accelerates YOLOv2 to be able to perform object detection in video on embedded devices in a real-time manner. First, we leverage the evolutionary deep intelligence framework to evolve the YOLOv2 network architecture and produce an optimized architecture (referred to as O-YOLOv2 here) that has 2.8X fewer parameters with just a ~2% IOU drop. To further reduce power consumption on embedded devices while maintaining performance, a motion-adaptive inference method is introduced into the proposed Fast YOLO framework to reduce the frequency of deep inference with O-YOLOv2 based on temporal motion characteristics. Experimental results show that the proposed Fast YOLO framework can reduce the number of deep inferences by an average of 38.13% and achieve an average speedup of ~3.3X for object detection in video compared to the original YOLOv2, leading Fast YOLO to run at an average of ~18FPS on an Nvidia Jetson TX1 embedded system.
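
The motion-adaptive idea can be illustrated with a minimal sketch. It is not the authors' method: a plain mean absolute frame difference stands in for their motion analysis, and `Frame`, `mean_abs_diff`, `motion_adaptive_detect`, and the `threshold` parameter are hypothetical names introduced only for this example. Deep inference runs only when the current frame differs enough from the last frame that was actually processed. Otherwise the previous detections are reused.

```python
from typing import Any, Callable, List, Sequence, Tuple

# Assumption: a frame is a flattened sequence of grayscale pixel values.
Frame = Sequence[float]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def motion_adaptive_detect(
    frames: Sequence[Frame],
    detector: Callable[[Frame], Any],
    threshold: float,
) -> Tuple[List[Any], int]:
    """Run `detector` only on frames that moved enough since the last inference.

    Returns the per-frame results and the number of detector calls.
    """
    results: List[Any] = []
    reference = None      # last frame on which the detector actually ran
    last_result = None    # detections reused for low-motion frames
    inferences = 0
    for frame in frames:
        if reference is None or mean_abs_diff(frame, reference) > threshold:
            last_result = detector(frame)
            reference = frame
            inferences += 1
        results.append(last_result)
    return results, inferences
```

With four frames of which only the third changes noticeably, the detector is called twice and its results are reused for the other two frames.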

• [cs.CV]**Joint Estimation of Camera Pose, Depth, Deblurring, and Super-Resolution from a Blurred Image Sequence**

*Haesol Park, Kyoung Mu Lee*

http://arxiv.org/abs/1709.05745v1

The conventional methods for estimating camera poses and scene structures from severely blurry or low resolution images often result in failure. The off-the-shelf deblurring or super-resolution methods may show visually pleasing results. However, applying each technique independently before matching is generally unprofitable because this naive series of procedures ignores the consistency between images. In this paper, we propose a pioneering unified framework that solves four problems simultaneously, namely, dense depth reconstruction, camera pose estimation, super-resolution, and deblurring. By reflecting a physical imaging process, we formulate a cost minimization problem and solve it using an alternating optimization technique. The experimental results on both synthetic and real videos show high-quality depth maps derived from severely degraded images that contrast the failures of naive multi-view stereo methods. Our proposed method also produces outstanding deblurred and super-resolved images unlike the independent application or combination of conventional video deblurring, super-resolution methods.

• [cs.CV]**Joint Parsing of Cross-view Scenes with Spatio-temporal Semantic Parse Graphs**

*Hang Qi, Yuanlu Xu, Tao Yuan, Tianfu Wu, Song-Chun Zhu*

http://arxiv.org/abs/1709.05436v1

Cross-view video understanding is an important yet under-explored area in computer vision. In this paper, we introduce a joint parsing method that takes view-centric proposals from pre-trained computer vision models and produces spatio-temporal parse graphs that represent a coherent scene-centric understanding of cross-view scenes. Our key observations are that overlapping fields of view embed rich appearance and geometry correlations and that knowledge segments corresponding to individual vision tasks are governed by consistency constraints available in commonsense knowledge. The proposed joint parsing framework models such correlations and constraints explicitly and generates semantic parse graphs about the scene. Quantitative experiments show that scene-centric predictions in the parse graph outperform view-centric predictions.

• [cs.CV]**LS-VO: Learning Dense Optical Subspace for Robust Visual Odometry Estimation**

*Gabriele Costante, Thomas A. Ciarfuglia*

http://arxiv.org/abs/1709.06019v1

This work proposes a novel deep network architecture to solve the camera Ego-Motion estimation problem. A motion estimation network generally learns features similar to Optical Flow (OF) fields starting from sequences of images. This OF can be described by a lower dimensional latent space. Previous research has shown how to find linear approximations of this space. We propose to use an Auto-Encoder network to find a non-linear representation of the OF manifold. In addition, we propose to learn the latent space jointly with the estimation task, so that the learned OF features become a more robust description of the OF input. We call this novel architecture LS-VO. The experiments show that LS-VO achieves a considerable increase in performance with respect to baselines, while the number of parameters of the estimation network increases only slightly.

• [cs.CV]**Learning a Fully Convolutional Network for Object Recognition using very few Data**

*Christoph Reinders, Hanno Ackermann, Michael Ying Yang, Bodo Rosenhahn*

http://arxiv.org/abs/1709.05910v1

In recent years, data-driven methods have shown great success in extracting information about the infrastructure in urban areas. These algorithms are usually trained on large datasets consisting of thousands or millions of labeled training examples. While large datasets have been published regarding cars, very little labeled data is available for cyclists, although appearance, point of view, and positioning of even relevant objects differ. Unfortunately, labeling data is costly and requires a huge amount of work. In this paper, we thus address the problem of learning with very few labels. The aim is to recognize particular traffic signs in crowdsourced data to collect information which is of interest to cyclists. We propose a system for object recognition that is trained with only 15 examples per class on average. To achieve this, we combine the advantages of convolutional neural networks and random forests to learn a patch-wise classifier. In the next step, we map the random forest to a neural network and transform the classifier to a fully convolutional network. Thereby, the processing of full images is significantly accelerated and bounding boxes can be predicted. Finally, we integrate data of the Global Positioning System (GPS) to localize the predictions on the map. In comparison to Faster R-CNN and other networks for object recognition or algorithms for transfer learning, we considerably reduce the required amount of labeled data. We demonstrate good performance on the recognition of traffic signs for cyclists as well as their localization in maps.

• [cs.CV]**Long-Term Ensemble Learning of Visual Place Classifiers**

*Xiaoxiao Fei, Kanji Tanaka, Yichu Fang, Akitaka Takayama*

http://arxiv.org/abs/1709.05470v1

This paper addresses the problem of cross-season visual place classification (VPC) from a novel perspective of long-term map learning. Our goal is to enable transfer learning efficiently from one season to the next, at a small constant cost, and without wasting the robot's available long-term-memory by memorizing very large amounts of training data. To realize a good tradeoff between generalization and specialization abilities, we employ an ensemble of convolutional neural network (DCN) classifiers and consider the task of scheduling (when and which classifiers to retrain), given a previous season's DCN classifiers as the sole prior knowledge. We present a unified framework for retraining scheduling and discuss practical implementation strategies. Furthermore, we address the task of partitioning a robot's workspace into places to define place classes in an unsupervised manner, rather than using uniform partitioning, so as to maximize VPC performance. Experiments using the publicly available NCLT dataset revealed that retraining scheduling of a DCN classifier ensemble is crucial and performance is significantly increased by using planned scheduling.

• [cs.CV]**Microscopy Cell Segmentation via Adversarial Neural Networks**

*Assaf Arbelle, Tammy Riklin Raviv*

http://arxiv.org/abs/1709.05860v1

We present a novel approach for the segmentation of microscopy images. This method utilizes recent developments in the field of Deep Artificial Neural Networks in general and specifically the advances in Generative Adversarial Neural Networks (GAN). We propose a pair of competing networks which are trained simultaneously and together define a min-max game resulting in an accurate segmentation of a given image. The system is an expansion of the well-known GAN model to conditional probabilities given an input image. This approach has two main strengths: it is weakly supervised, i.e., it can be easily trained on a limited amount of data, and it does not require a definition of a loss function for the optimization. Promising results are presented. The code is freely available at: https://github.com/arbellea/DeepCellSeg.git

• [cs.CV]**Multi-Person Pose Estimation via Column Generation**

*Shaofei Wang, Chong Zhang, Miguel A. Gonzalez-Ballester, Alexander Ihler, Julian Yarkony*

http://arxiv.org/abs/1709.05982v1

We study the problem of multi-person pose estimation in natural images. A pose estimate describes the spatial position and identity (head, foot, knee, etc.) of every non-occluded body part of a person. Pose estimation is difficult due to issues such as deformation and variation in body configurations and occlusion of parts, while multi-person settings add complications such as an unknown number of people, with unknown appearance and possible interactions in their poses and part locations. We give a novel integer program formulation of the multi-person pose estimation problem, in which variables correspond to assignments of parts in the image to poses in a two-tier, hierarchical way. This enables us to develop an efficient custom optimization procedure based on column generation, where columns are produced by exact optimization of very small scale integer programs. We demonstrate improved accuracy and speed for our method on the MPII multi-person pose estimation benchmark.

• [cs.CV]**Multi-Task Learning for Segmentation of Building Footprints with Deep Neural Networks**

*Benjamin Bischke, Patrick Helber, Joachim Folz, Damian Borth, Andreas Dengel*

http://arxiv.org/abs/1709.05932v1

The increased availability of high resolution satellite imagery makes it possible to sense very detailed structures on the surface of our planet. Access to such information opens up new directions in the analysis of remote sensing imagery. However, at the same time this raises a set of new challenges for existing pixel-based prediction methods, such as semantic segmentation approaches. While deep neural networks have achieved significant advances in the semantic segmentation of high resolution images in the past, most of the existing approaches tend to produce predictions with poor boundaries. In this paper, we address the problem of preserving semantic segmentation boundaries in high resolution satellite imagery by introducing a new cascaded multi-task loss. We evaluate our approach on the Inria Aerial Image Labeling Dataset, which contains large-scale and high resolution images. Our results show that we are able to outperform state-of-the-art methods by 8.3% without any additional post-processing step.

• [cs.CV]**NIMA: Neural Image Assessment**

*Hossein Talebi, Peyman Milanfar*

http://arxiv.org/abs/1709.05424v1

Automatically learned quality assessment for images has recently become a hot topic due to its usefulness in a wide variety of applications such as evaluating image capture pipelines, storage techniques and sharing media. Despite the subjective nature of this problem, most existing methods only predict the mean opinion score provided by datasets such as AVA [1] and TID2013 [2]. Our approach differs from others in that we predict the distribution of human opinion scores using a convolutional neural network. Our architecture also has the advantage of being significantly simpler than other methods with comparable performance. Our proposed approach relies on the success (and retraining) of proven, state-of-the-art deep object recognition networks. Our resulting network can be used to not only score images reliably and with high correlation to human perception, but also to assist with adaptation and optimization of photo editing/enhancement algorithms in a photographic pipeline. All this is done without need of a "golden" reference image, consequently allowing for single-image, semantic- and perceptually-aware, no-reference quality assessment.
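
To make the difference between predicting a single mean opinion score and predicting a score distribution concrete, here is a minimal sketch of how a predicted distribution can be summarized. It assumes ten score buckets labeled 1 to 10 (as in AVA-style ratings). The function name `score_stats` is hypothetical and this is not the authors' code.

```python
import math
from typing import Sequence, Tuple


def score_stats(probs: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard deviation of a distribution over scores 1..len(probs).

    `probs` may be unnormalized; it is rescaled to sum to 1 first.
    """
    total = sum(probs)
    if total <= 0:
        raise ValueError("distribution must have positive mass")
    p = [x / total for x in probs]
    # Bucket i (0-based) is assumed to correspond to score i + 1.
    mean = sum((i + 1) * pi for i, pi in enumerate(p))
    var = sum(pi * ((i + 1) - mean) ** 2 for i, pi in enumerate(p))
    return mean, math.sqrt(var)
```

Two images can share the same mean score while having very different spreads. The standard deviation captures how much raters disagree, which a mean-only predictor cannot express.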

• [cs.CV]**Neural Affine Grayscale Image Denoising**

*Sungmin Cha, Taesup Moon*

http://arxiv.org/abs/1709.05672v1

We propose a new grayscale image denoiser, dubbed Neural Affine Image Denoiser (Neural AIDE), which utilizes a neural network in a novel way. Unlike other neural network based image denoising methods, which typically apply simple supervised learning to learn a mapping from a noisy patch to a clean patch, we train a neural network to learn an *affine* mapping that gets applied to a noisy pixel, based on its context. Our formulation enables both supervised training of the network from the labeled training dataset and adaptive fine-tuning of the network parameters using the given noisy image subject to denoising. The key tool for devising Neural AIDE is an estimated loss function of the MSE of the affine mapping, based solely on the noisy data. As a result, our algorithm can outperform most of the recent state-of-the-art methods on the standard benchmark datasets. Moreover, our fine-tuning method can overcome one of the drawbacks of the patch-level supervised learning methods in image denoising; namely, a supervised trained model with a mismatched noise variance can be mostly corrected as long as we have the matched noise variance during the fine-tuning step.

• [cs.CV]**Normal Integration: A Survey**

*Yvain Quéau, Jean-Denis Durou, Jean-François Aujol*

http://arxiv.org/abs/1709.05940v1

The need for efficient normal integration methods is driven by several computer vision tasks such as shape-from-shading, photometric stereo, deflectometry, etc. In the first part of this survey, we select the most important properties that one may expect from a normal integration method, based on a thorough study of two pioneering works by Horn and Brooks [28] and by Frankot and Chellappa [19]. Apart from accuracy, an integration method should at least be fast and robust to a noisy normal field. In addition, it should be able to handle several types of boundary condition, including the case of a free boundary, and a reconstruction domain of any shape i.e., which is not necessarily rectangular. It is also much appreciated that a minimum number of parameters have to be tuned, or even no parameter at all. Finally, it should preserve the depth discontinuities. In the second part of this survey, we review most of the existing methods in view of this analysis, and conclude that none of them satisfies all of the required properties. This work is complemented by a companion paper entitled Variational Methods for Normal Integration, in which we focus on the problem of normal integration in the presence of depth discontinuities, a problem which occurs as soon as there are occlusions.

• [cs.CV]**Organizing Multimedia Data in Video Surveillance Systems Based on Face Verification with Convolutional Neural Networks**

*Anastasiia D. Sokolova, Angelina S. Kharchevnikova, Andrey V. Savchenko*

http://arxiv.org/abs/1709.05675v1

In this paper we propose a two-stage approach to organizing information in video surveillance systems. First, the faces are detected in each frame and a video stream is split into sequences of frames with the face region of one person. Second, these sequences (tracks) that contain identical faces are grouped using face verification algorithms and hierarchical agglomerative clustering. Gender and age are estimated for each cluster (person) in order to facilitate the usage of the organized video collection. Particular attention is paid to the aggregation of features extracted from each frame with deep convolutional neural networks. The experimental results of the proposed approach using the YTF and IJB-A datasets demonstrate that the most accurate and fastest solution is achieved by matching the normalized average of the feature vectors of all frames in a track.
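
The aggregation step reported as most effective, the normalized average of per-frame feature vectors, can be sketched as follows. The helper names (`l2_normalize`, `track_descriptor`, `track_distance`) are hypothetical, and the Euclidean distance between descriptors is an assumption. The abstract does not specify the matching metric.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def track_descriptor(frame_features: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-frame feature vectors of one track, then L2-normalize."""
    n = len(frame_features)
    dim = len(frame_features[0])
    mean = [sum(f[d] for f in frame_features) / n for d in range(dim)]
    return l2_normalize(mean)


def track_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two track descriptors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
```

Each track is thus reduced to a single vector, so comparing two tracks costs one distance computation regardless of how many frames they contain.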

• [cs.CV]**Rotation Adaptive Visual Object Tracking with Motion Consistency**

*Litu Rout, Sidhartha, Gorthi R. K. S. S. Manyam, Deepak Mishra*

http://arxiv.org/abs/1709.06057v1

Visual object tracking research has undergone significant improvement in the past few years. The emergence of the tracking-by-detection approach in the tracking paradigm has been quite successful in many ways. Recently, deep convolutional neural networks have been extensively used in most successful trackers. Yet, the standard approach has been based on correlation or feature selection with minimal consideration given to motion consistency. Thus, there is still a need to capture various physical constraints through motion consistency, which will improve accuracy, robustness and, more importantly, rotation adaptiveness. Therefore, one of the major aspects of this paper is to investigate the outcome of rotation adaptiveness in visual object tracking. Among other key contributions, the paper also includes various consistencies that turn out to be more effective than the current state of the art on numerous challenging sequences.

• [cs.CV]**Social Style Characterization from Egocentric Photo-streams**

*Maedeh Aghaei, Mariella Dimiccoli, Cristian Canton Ferrer, Petia Radeva*

http://arxiv.org/abs/1709.05775v1

This paper proposes a system for automatic social pattern characterization using a wearable photo-camera. The proposed pipeline consists of three major steps. First, detection of people with whom the camera wearer interacts and, second, categorization of the detected social interactions into formal and informal. These two steps act at event-level where each potential social event is modeled as a multi-dimensional time-series, whose dimensions correspond to a set of relevant features for each task, and a LSTM network is employed for time-series classification. In the last step, recurrences of the same person across the whole set of social interactions are clustered to achieve a comprehensive understanding of the diversity and frequency of the social relations of the user. Experiments over a dataset acquired by a user wearing a photo-camera during a month show promising results on the task of social pattern characterization from egocentric photo-streams.

• [cs.CV]**StairNet: Top-Down Semantic Aggregation for Accurate One Shot Detection**

*Sanghyun Woo, Soonmin Hwang, In So Kweon*

http://arxiv.org/abs/1709.05788v1

One-stage object detectors such as SSD or YOLO have already shown promising accuracy with a small memory footprint and fast speed. However, it is widely recognized that one-stage detectors have difficulty in detecting small objects, while they are competitive with two-stage methods on large objects. In this paper, we investigate how to alleviate this problem starting from the SSD framework. Due to their pyramidal design, the lower layer that is responsible for small objects lacks strong semantics (e.g., contextual information). We address this problem by introducing a feature combining module that spreads out the strong semantics in a top-down manner. Our final model, the StairNet detector, unifies the multi-scale representations and semantic distribution effectively. Experiments on the PASCAL VOC 2007 and PASCAL VOC 2012 datasets demonstrate that StairNet significantly improves on the weakness of SSD and outperforms the other state-of-the-art one-stage detectors.

• [cs.CV]**Target-adaptive CNN-based pansharpening**

*Giuseppe Scarpa, Sergio Vitale, Davide Cozzolino*

http://arxiv.org/abs/1709.06054v1

We recently proposed a convolutional neural network for remote sensing image pansharpening obtaining a significant performance gain over the state of the art. In this paper, we explore a number of architectural and training variations to this baseline, achieving further performance gains with a lightweight network which trains very fast. Leveraging on this latter property, we propose a target-adaptive usage modality which ensures a very good performance also in the presence of a mismatch w.r.t. the training set, and even across different sensors. The proposed method, published online as an off-the-shelf software tool, allows users to perform fast and high-quality CNN-based pansharpening of their own target images on general-purpose hardware.

• [cs.CV]**The Multiscale Bowler-Hat Transform for Blood Vessel Enhancement in Retinal Images**

*Çiğdem Sazak, Carl J. Nelson, Boguslaw Obara*

http://arxiv.org/abs/1709.05495v1

Enhancement, followed by segmentation, quantification and modelling, of blood vessels in retinal images plays an essential role in computer-aided retinopathy diagnosis. In this paper, we introduce a new vessel enhancement method, the bowler-hat transform, which is based on mathematical morphology. The proposed method combines different structuring elements to detect innate features of vessel-like structures. We evaluate the proposed method qualitatively and quantitatively, and compare it with the existing, state-of-the-art methods using both synthetic and real datasets. Our results show that the proposed method achieves high-quality vessel-like structure enhancement in both synthetic examples and in clinically relevant retinal images, and is shown to be able to detect fine vessels while remaining robust at junctions.

• [cs.CV]**To Go or Not To Go? A Near Unsupervised Learning Approach For Robot Navigation**

*Noriaki Hirose, Amir Sadeghian, Patrick Goebel, Silvio Savarese*

http://arxiv.org/abs/1709.05439v1

It is important for robots to be able to decide whether they can go through a space or not, as they navigate through a dynamic environment. This capability can help them avoid injury or serious damage, e.g., as a result of running into people and obstacles, getting stuck, or falling off an edge. To this end, we propose an unsupervised and a near-unsupervised method based on Generative Adversarial Networks (GAN) to classify scenarios as traversable or not based on visual data. Our method is inspired by the recent success of data-driven approaches on computer vision problems and anomaly detection, and reduces the need for vast amounts of negative examples at training time. Collecting negative data indicating that a robot should not go through a space is typically hard and dangerous because of collisions, whereas collecting positive data can be automated and done safely based on the robot's own traveling experience. We verify the generality and effectiveness of the proposed approach on a test dataset collected in a previously unseen environment with a mobile robot. Furthermore, we show that our method can be used to build costmaps (which we call "GoNoGo" costmaps) for robot path planning using visual data only.

• [cs.CV]**Variational Methods for Normal Integration**

*Yvain Quéau, Jean-Denis Durou, Jean-François Aujol*

http://arxiv.org/abs/1709.05965v1

The need for an efficient method of integration of a dense normal field is inspired by several computer vision tasks, such as shape-from-shading, photometric stereo, deflectometry, etc. Inspired by edge-preserving methods from image processing, we study in this paper several variational approaches for normal integration, with a focus on non-rectangular domains, free boundary and depth discontinuities. We first introduce a new discretization for quadratic integration, which is designed to ensure both fast recovery and the ability to handle non-rectangular domains with a free boundary. Yet, with this solver, discontinuous surfaces can be handled only if the scene is first segmented into pieces without discontinuity. Hence, we then discuss several discontinuity-preserving strategies. Those inspired, respectively, by the Mumford-Shah segmentation method and by anisotropic diffusion, are shown to be the most effective for recovering discontinuities.
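
For context, quadratic integration is commonly written as a least-squares fit of the depth gradient to a field derived from the normals. The notation below is an assumption introduced for illustration and is not taken from the abstract: $z$ is the unknown depth, $\mathbf{g}$ the target gradient field computed from the normals, $\Omega$ the reconstruction domain, and $\boldsymbol{\eta}$ the outward normal to its boundary $\partial\Omega$.

$$\min_{z} \iint_{\Omega} \left\| \nabla z(u,v) - \mathbf{g}(u,v) \right\|^2 \, \mathrm{d}u \, \mathrm{d}v$$

Under this formulation, a minimizer satisfies a Poisson equation inside the domain together with a natural (free) boundary condition:

$$\Delta z = \nabla \cdot \mathbf{g} \ \text{ in } \Omega, \qquad (\nabla z - \mathbf{g}) \cdot \boldsymbol{\eta} = 0 \ \text{ on } \partial\Omega$$

Because the squared penalty smooths across jumps in $z$, a formulation of this kind cannot by itself preserve depth discontinuities, which is consistent with the abstract's turn to discontinuity-preserving strategies.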

• [cs.CV]**Vehicle Tracking in Wide Area Motion Imagery via Stochastic Progressive Association Across Multiple Frames (SPAAM)**

*Ahmed Elliethy, Gaurav Sharma*

http://arxiv.org/abs/1709.06035v1

Vehicle tracking in Wide Area Motion Imagery (WAMI) relies on associating vehicle detections across multiple WAMI frames to form tracks corresponding to individual vehicles. The temporal window length, i.e., the number $M$ of sequential frames, over which associations are collectively estimated poses a trade-off between accuracy and computational complexity. A larger $M$ improves performance because the increased temporal context enables the use of motion models and allows occlusions and spurious detections to be handled better. The total number of hypothesized tracks, on the other hand, grows exponentially with increasing $M$, making larger values of $M$ computationally challenging to tackle. In this paper, we introduce SPAAM, an iterative approach that progressively grows $M$ with each iteration to improve estimated tracks by exploiting the enlarged temporal context while keeping computation manageable through two novel approaches for pruning association hypotheses. First, guided by a road network, accurately co-registered to the WAMI frames, we disregard unlikely associations that do not agree with the road network. Second, as $M$ is progressively enlarged at each iteration, the related increase in association hypotheses is limited by revisiting only the subset of association possibilities rendered open by stochastically determined dis-associations for the previous iteration. The stochastic dis-association at each iteration maintains each estimated association according to an estimated probability for confidence, obtained via a probabilistic model. Associations at each iteration are then estimated globally over the $M$ frames by (approximately) solving a binary integer programming problem for selecting a set of compatible tracks. Vehicle tracking results obtained over test WAMI datasets indicate that our proposed approach provides significant performance improvements over three alternatives.
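
The stochastic dis-association step can be sketched as a per-association coin flip. This is an illustration under stated assumptions rather than the authors' implementation: the function name `stochastic_disassociate`, the representation of an association as a hashable key, and the `confidence` mapping are all hypothetical, and the probabilistic model that would produce the confidences is not shown.

```python
import random
from typing import Dict, Hashable, List, Sequence, Tuple


def stochastic_disassociate(
    associations: Sequence[Hashable],
    confidence: Dict[Hashable, float],
    rng: random.Random,
) -> Tuple[List[Hashable], List[Hashable]]:
    """Split associations into those kept and those reopened for re-estimation.

    Each association is kept with probability equal to its confidence.
    The reopened ones are the only hypotheses revisited in the next iteration.
    """
    kept: List[Hashable] = []
    reopened: List[Hashable] = []
    for assoc in associations:
        # rng.random() is uniform on [0, 1), so confidence 1.0 always keeps
        # an association and confidence 0.0 always reopens it.
        if rng.random() < confidence[assoc]:
            kept.append(assoc)
        else:
            reopened.append(assoc)
    return kept, reopened
```

Passing a seeded `random.Random` instance makes a run reproducible. Confident associations are rarely reopened, which is what keeps the hypothesis set small as the window grows.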

• [cs.CV]**Video Object Segmentation Without Temporal Information**

*Kevis-Kokitsi Maninis, Sergi Caelles, Yuhua Chen, Jordi Pont-Tuset, Laura Leal-Taixé, Daniel Cremers, Luc Van Gool*

http://arxiv.org/abs/1709.06031v1

Video Object Segmentation, and video processing in general, has been historically dominated by methods that rely on the temporal consistency and redundancy in consecutive video frames. When the temporal smoothness is suddenly broken, such as when an object is occluded, or some frames are missing in a sequence, the result of these methods can deteriorate significantly or they may not even produce any result at all. This paper explores the orthogonal approach of processing each frame independently, i.e., disregarding the temporal information. In particular, it tackles the task of semi-supervised video object segmentation: the separation of an object from the background in a video, given its mask in the first frame. We present Semantic One-Shot Video Object Segmentation (OSVOS-S), based on a fully-convolutional neural network architecture that is able to successively transfer generic semantic information, learned on ImageNet, to the task of foreground segmentation, and finally to learning the appearance of a single annotated object of the test sequence (hence one shot). We show that instance level semantic information, when combined effectively, can dramatically improve the results of our previous method, OSVOS. We perform experiments on two recent video segmentation databases, which show that OSVOS-S is both the fastest and most accurate method in the state of the art.

• [cs.CV]**Where to Focus: Deep Attention-based Spatially Recurrent Bilinear Networks for Fine-Grained Visual Recognition**

*Lin Wu, Yang Wang*

http://arxiv.org/abs/1709.05769v1

Fine-grained visual recognition typically depends on modeling subtle differences between object parts. However, these parts often exhibit dramatic visual variations such as occlusions, viewpoints, and spatial transformations, making them hard to detect. In this paper, we present a novel attention-based model to automatically, selectively and accurately focus on critical object regions with higher importance against appearance variations. Given an image, two different Convolutional Neural Networks (CNNs) are constructed, where the outputs of the two CNNs are correlated through bilinear pooling to simultaneously focus on discriminative regions and extract relevant features. To capture spatial distributions among the local regions with visual attention, soft attention based spatial Long-Short Term Memory units (LSTMs) are incorporated to realize spatially recurrent yet visually selective processing over local input patterns. These intuitions lead to the following model: two-stream CNN layers, a bilinear pooling layer, and a spatial recurrent layer with location attention are jointly trained in an end-to-end fashion to serve as the part detector and feature extractor, whereby relevant features are localized and extracted attentively. We show the significance of our network on two well-known visual recognition tasks: fine-grained image classification and person re-identification.

• [cs.CV]**Zero-Shot Learning to Manage a Large Number of Place-Specific Compressive Change Classifiers**

*Tanaka Kanji*

http://arxiv.org/abs/1709.05397v1

With recent progress in large-scale map maintenance and long-term map learning, the task of change detection on a large-scale map from a visual image captured by a mobile robot has become a problem of increasing criticality. Previous approaches for change detection are typically based on image differencing and require the memorization of a prohibitively large number of mapped images in the above context. In contrast, this study follows the recent, efficient paradigm of change-classifier-learning and specifically employs a collection of place-specific change classifiers. Our change-classifier-learning algorithm is based on zero-shot learning (ZSL) and represents a place-specific change classifier by its training examples mined from an external knowledge base (EKB). The proposed algorithm exhibits several advantages. First, we are required to memorize only training examples (rather than the classifier itself), which can be further compressed in the form of bag-of-words (BoW). Secondly, we can incorporate the most recent map into the classifiers by straightforwardly adding or deleting a few training examples that correspond to these classifiers. Thirdly, we can share the BoW vocabulary with other related task scenarios (e.g., BoW-based self-localization), wherein the vocabulary is generally designed as a rich, continuously growing, and domain-adaptive knowledge base. In our contribution, the proposed algorithm is applied and evaluated on a practical long-term cross-season change detection system that consists of a large number of place-specific object-level change classifiers.

• [cs.DC]**Cost-Based Assessment of Partitioning Algorithms of Agent-Based Systems on Hybrid Cloud Environments**

*Chahrazed Labba, Narjès Bellamine Ben Saoud*

http://arxiv.org/abs/1709.05708v1

Distributing agent-based simulators on a hybrid cloud infrastructure raises many challenges. A researcher's main motivations for running simulations on hybrid clouds are to reach more scalable systems and to reduce monetary costs. Indeed, a hybrid cloud environment, despite providing scalability and effective control over proper data, requires an efficient deployment strategy combining both an efficient partitioning mechanism and cost savings. In this paper, we propose a cost deployment model dedicated to distributed agent-based simulation systems. This cost model, combining general performance partitioning criteria as well as monetary costs, is used to evaluate cluster and grid based partitioning algorithms on hybrid cloud environments. The first experimental results show that, for a given agent-based model, a good partitioning method used with a suitable hybrid cloud environment leads to an efficient and economic deployment.

• [cs.DC]**Generalized PMC model for the hybrid diagnosis of multiprocessor systems**

*Qiang Zhu*

http://arxiv.org/abs/1709.05586v1

Fault diagnosis is important to the design and maintenance of large multiprocessor systems. The PMC model is the most famous diagnosis model in the system level diagnosis of multiprocessor systems. Under the PMC model, only node faults are allowed, but in real circumstances link faults may also occur. Based on the PMC model, we therefore propose in this paper a diagnosis model called the generalized PMC (GPMC) model to adapt to real circumstances. The foundation of the GPMC model is established. To measure the fault diagnosis capability of multiprocessor systems under the GPMC model, two measuring parameters are introduced: $h$-edge restricted diagnosability and $h$-vertex restricted edge diagnosability. As an application, the $h$-edge restricted diagnosability and $h$-vertex restricted edge diagnosability of hypercubes are explored. Finally, we present a diagnosis algorithm to locate faulty processors and faulty links in a multiprocessor system. Simulation results show that the algorithm is quite efficient.

• [cs.DC]**Hybrid Fault diagnosis capability analysis of Hypercubes under the PMC model and MM model**

*Qiang Zhu, Lili Li, Sanyang Liu, Xing Zhang*

http://arxiv.org/abs/1709.05588v1

System level diagnosis is an important approach for the fault diagnosis of multiprocessor systems. In system level diagnosis, diagnosability is an important measure of the diagnosis capability of interconnection networks. But as a measure, diagnosability cannot reflect the diagnosis capability of multiprocessor systems with respect to link faults, which may occur in real circumstances. In this paper, we propose the definition of $h$-edge tolerable diagnosability to better measure the diagnosis capability of interconnection networks under hybrid fault circumstances. The $h$-edge tolerable diagnosability of a multiprocessor system $G$, denoted by $t_h^{e}(G)$, is the maximum number of faulty nodes that the system can guarantee to locate when the number of faulty edges does not exceed $h$. The PMC model and MM model are the two most widely studied diagnosis models for the system level diagnosis of multiprocessor systems. The hypercubes are the most well-known interconnection networks. In this paper, the $h$-edge tolerable diagnosability of the $n$-dimensional hypercube under the PMC model and the MM$^{*}$ model is determined as follows: $t_h^{e}(Q_n)= n-h$, where $1\leq h<n$, $n\geq3$.
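To make the closing formula concrete, here is a minimal Python sketch that simply evaluates $t_h^{e}(Q_n)=n-h$ within the range stated in the abstract. The function name is hypothetical and chosen only for illustration; the code does not implement a diagnosis algorithm.

```python
def hypercube_edge_tolerable_diagnosability(n: int, h: int) -> int:
    """Evaluate t_h^e(Q_n) = n - h for the n-dimensional hypercube.

    The abstract states the formula only for n >= 3 and 1 <= h < n,
    so anything outside that range is rejected rather than extrapolated.
    """
    if n < 3 or not (1 <= h < n):
        raise ValueError("formula is stated only for n >= 3 and 1 <= h < n")
    return n - h


if __name__ == "__main__":
    # Each additional tolerated faulty edge costs one locatable faulty node.
    for h in range(1, 5):
        print(f"Q_5, h={h}: t = {hypercube_edge_tolerable_diagnosability(5, h)}")
```

For example, a 5-dimensional hypercube with at most 2 faulty edges gives $t_2^{e}(Q_5)=3$.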

• [cs.DC]**IBM Deep Learning Service**

*Bishwaranjan Bhattacharjee, Scott Boag, Chandani Doshi, Parijat Dube, Ben Herta, Vatche Ishakian, K. R. Jayaram, Rania Khalaf, Avesh Krishna, Yu Bo Li, Vinod Muthusamy, Ruchir Puri, Yufei Ren, Florian Rosenberg, Seetharami R. Seelam, Yandong Wang, Jian Ming Zhang, Li Zhang*

http://arxiv.org/abs/1709.05871v1

Deep learning driven by large neural network models is overtaking traditional machine learning methods for understanding unstructured and perceptual data domains such as speech, text, and vision. At the same time, the "as-a-Service"-based business model on the cloud is fundamentally transforming the information technology industry. These two trends: deep learning, and "as-a-service" are colliding to give rise to a new business model for cognitive application delivery: deep learning as a service in the cloud. In this paper, we will discuss the details of the software architecture behind IBM's deep learning as a service (DLaaS). DLaaS provides developers the flexibility to use popular deep learning libraries such as Caffe, Torch and TensorFlow, in the cloud in a scalable and resilient manner with minimal effort. The platform uses a distribution and orchestration layer that facilitates learning from a large amount of data in a reasonable amount of time across compute nodes. A resource provisioning layer enables flexible job management on heterogeneous resources, such as graphics processing units (GPUs) and central processing units (CPUs), in an infrastructure as a service (IaaS) cloud.

• [cs.DC]**Use of Information, Memory and Randomization in Asynchronous Gathering**

*Andrzej Pelc*

http://arxiv.org/abs/1709.05869v1

We investigate initial information, unbounded memory and randomization in gathering mobile agents on a grid. We construct a state machine, such that it is possible to gather, with probability 1, all configurations of its copies. This machine has initial input, unbounded memory, and is randomized. We show that no machine having any two of these capabilities but not the third, can be used to gather, with high probability, all configurations. We construct deterministic Turing Machines that are used to gather all connected configurations, and we construct deterministic finite automata that are used to gather all contractible connected configurations.

• [cs.DC]**sPIN: High-performance streaming Processing in the Network**

*Torsten Hoefler, Salvatore Di Girolamo, Konstantin Taranov, Ryan E. Grant, Ron Brightwell*

http://arxiv.org/abs/1709.05483v1

Optimizing communication performance is imperative for large-scale computing because communication overheads limit the strong scalability of parallel applications. Today's network cards contain rather powerful processors optimized for data movement. However, these devices are limited to fixed functions, such as remote direct memory access. We develop sPIN, a portable programming model to offload simple packet processing functions to the network card. To demonstrate the potential of the model, we design a cycle-accurate simulation environment by combining the network simulator LogGOPSim and the CPU simulator gem5. We implement offloaded message matching, datatype processing, and collective communications and demonstrate transparent full-application speedups. Furthermore, we show how sPIN can be used to accelerate redundant in-memory filesystems and several other use cases. Our work investigates a portable packet-processing network acceleration model similar to compute acceleration with CUDA or OpenCL. We show how such network acceleration enables an eco-system that can significantly speed up applications and system services.

• [cs.DL]**SKOS Concepts and Natural Language Concepts: an Analysis of Latent Relationships in KOSs**

*Anna Mastora, Manolis Peponakis, Sarantos Kapidakis*

http://arxiv.org/abs/1709.05576v1

The vehicle to represent Knowledge Organization Systems (KOSs) in the environment of the Semantic Web and linked data is the Simple Knowledge Organization System (SKOS). SKOS provides a way to assign a URI to each concept, and this URI functions as a surrogate for the concept. This fact makes of main concern the need to clarify the URIs' ontological meaning. The aim of this study is to investigate the relation between the ontological substance of KOS concepts and concepts revealed through the grammatical and syntactic formalisms of natural language. For this purpose, we examined the dividableness of concepts in specific KOSs (i.e. a thesaurus, a subject headings system and a classification scheme) by applying Natural Language Processing (NLP) techniques (i.e. morphosyntactic analysis) to the lexical representations (i.e. RDF literals) of SKOS concepts. The results of the comparative analysis reveal that, despite the use of multi-word units, thesauri tend to represent concepts in a way that can hardly be further divided conceptually, while Subject Headings and Classification Schemes - to a certain extent - comprise terms that can be decomposed into more conceptual constituents. Consequently, SKOS concepts deriving from thesauri are more likely to represent atomic conceptual units and thus be more appropriate tools for inference and reasoning. Since identifiers represent the meaning of a concept, complex concepts are neither the most appropriate nor the most efficient way of modelling a KOS for the Semantic Web.

• [cs.DS]**Learning Depth-Three Neural Networks in Polynomial Time**

*Surbhi Goel, Adam Klivans*

http://arxiv.org/abs/1709.06010v1

We give a polynomial-time algorithm for learning neural networks with one hidden layer of sigmoids feeding into any smooth, monotone activation function (e.g. Sigmoid or ReLU). We make no assumptions on the structure of the network, and the algorithm succeeds with respect to any distribution on the unit ball in $n$ dimensions (hidden weight vectors also have unit norm). This is the first assumption-free, provably efficient algorithm for learning neural networks with more than one hidden layer. Our algorithm, Alphatron, is a simple, iterative update rule that combines isotonic regression with kernel methods. It outputs a hypothesis that yields efficient oracle access to interpretable features. It also suggests a new approach to Boolean function learning via smooth relaxations of hard thresholds, sidestepping traditional hardness results from computational learning theory. As applications, we obtain the first provably correct algorithms for common schemes in multiple-instance learning (in the difficult case where the examples within each bag are not identically distributed) as well as the first polynomial-time algorithm for learning intersections of a polynomial number of halfspaces with a margin.

• [cs.ET]**MOL-Eye: A New Metric for the Performance Evaluation of a Molecular Signal**

*Meric Turan, Mehmet Sukru Kuran, H. Birkan Yilmaz, Chan-Byoung Chae, Tuna Tugcu*

http://arxiv.org/abs/1709.05604v1

Inspired by the eye diagram in classical radio frequency (RF) based communications, the MOL-Eye diagram is proposed for the performance evaluation of a molecular signal within the context of molecular communication. Utilizing various features of this diagram, three new metrics for the performance evaluation of a molecular signal, namely the maximum eye height, standard deviation of received molecules, and counting SNR (CSNR) are introduced. The applicability of these performance metrics in this domain is verified by comparing the performance of binary concentration shift keying (BCSK) and BCSK with consecutive power adjustment (BCSK-CPA) modulation techniques in a vessel-like environment with laminar flow. The results show that, in addition to classical performance metrics such as bit-error rate and channel capacity, these performance metrics can also be used to show the advantage of an efficient modulation technique over a simpler one.

• [cs.IR]**Anticipating Information Needs Based on Check-in Activity**

*Jan R. Benetka, Krisztian Balog, Kjetil Nørvåg*

http://arxiv.org/abs/1709.05749v1

In this work we address the development of a smart personal assistant that is capable of anticipating a user's information needs based on a novel type of context: the person's activity inferred from her check-in records on a location-based social network. Our main contribution is a method that translates a check-in activity into an information need, which is in turn addressed with an appropriate information card. This task is challenging because of the large number of possible activities and related information needs, which need to be addressed in a mobile dashboard that is limited in size. Our approach considers each possible activity that might follow after the last (and already finished) activity, and selects the top information cards such that they maximize the likelihood of satisfying the user's information needs for all possible future scenarios. The proposed models also incorporate knowledge about the temporal dynamics of information needs. Using a combination of historical check-in data and manual assessments collected via crowdsourcing, we show experimentally the effectiveness of our approach.
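The card-selection idea can be illustrated with a deliberately simplified Python sketch: score each card by its expected relevance over the possible next activities and keep the top $k$. This is an assumption-laden toy, not the authors' model. The function name, the two probability tables, and the example activities and cards are all hypothetical, and the temporal dynamics mentioned in the abstract are ignored.

```python
def rank_cards(next_activity_prob, card_relevance, k):
    """Return the k cards with the highest expected relevance.

    next_activity_prob: {activity: P(activity follows the last check-in)}
    card_relevance:     {activity: {card: P(card satisfies the need)}}
    Both tables are hypothetical inputs for this sketch.
    """
    scores = {}
    for activity, p_activity in next_activity_prob.items():
        for card, p_relevant in card_relevance.get(activity, {}).items():
            # Accumulate the card's relevance weighted by how likely
            # the activity is to happen next.
            scores[card] = scores.get(card, 0.0) + p_activity * p_relevant
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda card: (-scores[card], card))[:k]


if __name__ == "__main__":
    next_activity_prob = {"dining": 0.75, "cinema": 0.25}
    card_relevance = {
        "dining": {"menu": 1.0, "directions": 0.5},
        "cinema": {"showtimes": 0.5, "directions": 0.5},
    }
    print(rank_cards(next_activity_prob, card_relevance, k=2))
```

A card that is moderately useful in several likely scenarios ("directions" here) can outrank a card that is highly useful in only one unlikely scenario, which is the behaviour a size-limited dashboard needs.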

• [cs.IR]**MERF: Morphology-based Entity and Relational Entity Extraction Framework for Arabic**

*Amin A. Jaber, Fadi A. Zaraket*

http://arxiv.org/abs/1709.05700v1

Rule-based techniques and tools to extract entities and relational entities from documents allow users to specify desired entities using natural language questions, finite state automata, regular expressions, structured query language statements, or proprietary scripts. These techniques and tools require expertise in linguistics and programming and lack support for Arabic morphological analysis, which is key to processing Arabic text. In this work, we present MERF, a morphology-based entity and relational entity extraction framework for Arabic text. MERF provides a user-friendly interface where the user, with basic knowledge of linguistic features and regular expressions, defines tag types and interactively associates them with regular expressions defined over Boolean formulae. Boolean formulae range over matches of Arabic morphological features and synonymity features. Users define relations with tuples of subexpression matches and can associate code actions with subexpressions. MERF computes feature matches and regular expression matches, and constructs entities and relational entities from user-defined relations. We evaluated our work with several case studies and compared it with existing application-specific techniques. The results show that MERF requires shorter development time and effort compared to existing techniques and produces reasonably accurate results within a reasonable overhead in run time.

• [cs.IR]**Towards Building a Knowledge Base of Monetary Transactions from a News Collection**

*Jan R. Benetka, Krisztian Balog, Kjetil Nørvåg*

http://arxiv.org/abs/1709.05743v1

We address the problem of extracting structured representations of economic events from a large corpus of news articles, using a combination of natural language processing and machine learning techniques. The developed techniques allow for semi-automatic population of a financial knowledge base, which, in turn, may be used to support a range of data mining and exploration tasks. The key challenge we face in this domain is that the same event is often reported multiple times, with varying correctness of details. We address this challenge by first collecting all information pertinent to a given event from the entire corpus, then considering all possible representations of the event, and finally, using a supervised learning method, to rank these representations by the associated confidence scores. A main innovative element of our approach is that it jointly extracts and stores all attributes of the event as a single representation (quintuple). Using a purpose-built test set we demonstrate that our supervised learning approach can achieve 25% improvement in F1-score over baseline methods that consider the earliest, the latest or the most frequent reporting of the event.

• [cs.IT]**AG codes and AG quantum codes from cyclic extensions of the Suzuki and Ree curves**

*Maria Montanucci, Marco Timpanella, Giovanni Zini*

http://arxiv.org/abs/1709.05979v1

We investigate several types of linear codes constructed from two families $\tilde{\mathcal S}_q$ and $\tilde{\mathcal R}_q$ of maximal curves over finite fields recently constructed by Skabelund as cyclic covers of the Suzuki and Ree curves. Plane models for such curves are provided, and the Weierstrass semigroup $H(P)$ at an $\mathbb{F}_{q}$-rational point $P$ is shown to be symmetric.

• [cs.IT]**Bounds on Binary Locally Repairable Codes Tolerating Multiple Erasures**

*Matthias Grezet, Ragnar Freij-Hollanti, Thomas Westerbäck, Oktay Olmez, Camilla Hollanti*

http://arxiv.org/abs/1709.05801v1

Recently, locally repairable codes (LRCs) have gained significant interest for their potential applications in distributed storage systems. However, most existing constructions are over fields whose size grows with the number of servers, which makes the systems computationally expensive and difficult to maintain. Here, we study linear locally repairable codes over the binary field, tolerating multiple local erasures. We derive bounds on the minimum distance of such codes, and give examples of LRCs achieving these bounds. Our main technical tools come from matroid theory, and as a byproduct of our proofs, we show that the lattice of cyclic flats of a simple binary matroid is atomic.

• [cs.IT]**Challenges and potentials for visible light communications: State of the art**

*Pranav Kumar Jha, Neha Mishra, D. Sriram Kumar*

http://arxiv.org/abs/1709.05489v1

Visible light communication is an emerging field within indoor optical wireless communication that uses white-light LEDs to transmit data and provide illumination simultaneously. LEDs can be modulated at very high speeds, which increases their efficiency and enables them to serve the dual purposes of data communication and illumination. Radio frequency communication has limitations that leave it short of the current demand for bandwidth, whereas visible light makes it possible to achieve higher data rates per user. In this paper, we discuss some challenges, potentials and possible future applications for this new technology. Visible light communication is primarily suited to indoor applications and is capable of multiuser access. We also design a very basic illumination pattern inside a room using uniform power distribution.

• [cs.IT]**Codes over Affine Algebras with a Finite Commutative Chain coefficient Ring**

*E. Martínez-Moro, A. Piñera-Nicolás, I. F. Rúa*

http://arxiv.org/abs/1709.05464v1

We consider codes defined over an affine algebra $\mathcal A=R[X_1,\dots,X_r]/\left\langle t_1(X_1),\dots,t_r(X_r)\right\rangle$, where $t_i(X_i)$ is a monic univariate polynomial over a finite commutative chain ring $R$. Namely, we study the $\mathcal A$-submodules of $\mathcal A^l$ ($l\in \mathbb{N}$). These codes generalize both the codes over finite quotients of polynomial rings and the multivariable codes over finite chain rings. Some codes over Frobenius local rings that are not chain rings are also of this type. A canonical generator matrix for these codes is introduced with the help of the Canonical Generating System. Duality of the codes is also considered.

• [cs.IT]**Cooperative Network Synchronization: Asymptotic Analysis**

*Yifeng Xiong, Nan Wu, Yuan Shen, Moe Z. Win*

http://arxiv.org/abs/1709.05476v1

Accurate clock synchronization is required for collaborative operations among nodes across wireless networks. Compared with traditional layer-by-layer methods, cooperative network synchronization techniques lead to significant improvement in performance, efficiency, and robustness. This paper develops a framework for the performance analysis of cooperative network synchronization. We introduce the concepts of cooperative dilution intensity (CDI) and relative CDI to characterize the interaction between agents, which can be interpreted as properties of a random walk over the network. Our approach enables us to derive closed-form asymptotic expressions of performance limits, relating them to the quality of observations as well as network topology.

• [cs.IT]**Finite-Alphabet Precoding for Massive MU-MIMO with Low-resolution DACs**

*Chang-Jen Wang, Chao-Kai Wen, Shi Jin, Shang-Ho Tsai*

http://arxiv.org/abs/1709.05755v1

Massive multiuser multiple-input multiple-output (MU-MIMO) systems are expected to be the core technology in fifth-generation wireless systems because they significantly improve spectral efficiency. However, the requirement for a large number of radio frequency (RF) chains results in high hardware costs and power consumption, which obstruct the commercial deployment of massive MIMO systems. A potential solution is to use low-resolution digital-to-analog converters (DAC)/analog-to-digital converters for each antenna and RF chain. However, using low-resolution DACs at the transmit side directly limits the degree of freedom of output signals and thus poses a challenge to the precoding design. In this study, we develop efficient and universal algorithms for a downlink massive MU-MIMO system with finite-alphabet precodings. Our algorithms are developed based on the alternating direction method of multipliers (ADMM) framework. The original ADMM does not converge in a nonlinear discrete optimization problem. The primary cause of this problem is that the alternating (update) directions in ADMM on one side are biased, and those on the other side are unbiased. By making the two updates consistent in an unbiased manner, we develop two algorithms: one demonstrates excellent performance and the other possesses a significantly low computational complexity. Compared with state-of-the-art techniques, the proposed precoding algorithms present significant advantages in performance and computational complexity.

• [cs.IT]**Indistinguishability and Energy Sensitivity of Asymptotically Gaussian Compressed Encryption**

*Nam Yul Yu*

http://arxiv.org/abs/1709.05744v1

The principle of compressed sensing (CS) can be applied in a cryptosystem by providing the notion of security. In information-theoretic sense, it is known that a CS-based cryptosystem can be perfectly secure if it employs a random Gaussian sensing matrix updated at each encryption and its plaintext has constant energy. In this paper, we propose a new CS-based cryptosystem that employs a secret bipolar keystream and a public unitary matrix, which can be suitable for practical implementation by generating and renewing the keystream in a fast and efficient manner. We demonstrate that the sensing matrix is asymptotically Gaussian for a sufficiently large plaintext length, which guarantees a reliable CS decryption for a legitimate recipient. By means of probability metrics, we also show that the new CS-based cryptosystem can have the indistinguishability against an adversary, as long as the keystream is updated at each encryption and each plaintext has constant energy. Finally, we investigate how much the security of the new CS-based cryptosystem is sensitive to energy variation of plaintexts.

• [cs.IT]**Millimeter Wave Channel Measurements and Implications for PHY Layer Design**

*Vasanthan Raghavan, Andrzej Partyka, Lida Akhoondzadehasl, Ali Tassoudji, Ozge Koymen, John Sanelli*

http://arxiv.org/abs/1709.05590v1

There has been an increasing interest in the millimeter wave (mmW) frequency regime in the design of next-generation wireless systems. The focus of this work is on understanding mmW channel properties that have an important bearing on the feasibility of mmW systems in practice and have a significant impact on physical (PHY) layer design. In this direction, simultaneous channel sounding measurements at 2.9, 29 and 61 GHz are performed at a number of transmit-receive location pairs in indoor office, shopping mall and outdoor environments. Based on these measurements, this paper first studies large-scale properties such as path loss and delay spread across different carrier frequencies in these scenarios. Towards the goal of understanding the feasibility of outdoor-to-indoor coverage, material measurements corresponding to mmW reflection and penetration are studied and significant notches in signal reception spread over a few GHz are reported. Finally, implications of these measurements on system design are discussed and multiple solutions are proposed to overcome these impairments.

• [cs.IT]**Modeling Co-location in Multi-Operator mmWave Networks with Spectrum Sharing**

*Rebal Jurdi, Abhishek K. Gupta, Jeffrey G. Andrews, Robert W. Heath Jr*

http://arxiv.org/abs/1709.05741v1

Competing cellular operators aggressively share infrastructure in many major US markets. If operators also were to share spectrum in next-generation mmWave networks, intra-cellular interference will become correlated with inter-cellular interference. We propose a mathematical framework to model a multi-operator mmWave cellular network with co-located base-stations. We then characterize the SINR distribution for an arbitrary network and derive its coverage probability. To understand how varying the spatial correlation between different networks affects coverage probability, we derive special results for the two-operator scenario, where we construct the operators' individual networks from a single network via probabilistic coupling. For external validation, we devise a method to quantify and estimate spatial correlation from actual base-station deployments. We compare our two-operator model against an actual macro-cell-dominated network and an actual DAS-node-dominated network of different scales. Using the actual deployment data to set the parameters of our model, we observe that coverage probabilities for the model and actual deployments not only compare very well to each other, but also match nearly perfectly for the case of the DAS-node-dominated deployment. Another interesting observation is that spectrum and infrastructure sharing has a lower rate coverage probability for lower thresholds, which would make it less suitable for low-rate applications.

• [cs.IT]**Multivariable codes in principal ideal polynomial quotient rings with applications to additive modular bivariate codes over $\mathbb{F}_4$**

*E. Martínez-Moro, A. Piñera-Nicolás, I. F. Rúa*

http://arxiv.org/abs/1709.05466v1

In this work, we study the structure of multivariable modular codes over finite chain rings when the ambient space is a principal ideal ring. We also provide some applications to additive modular codes over the finite field $\mathbb{F}_4$.

• [cs.IT]**Network Deployment for Maximal Energy Efficiency in Uplink with Zero-Forcing**

*Andrea Pizzo, Daniel Verenzuela, Luca Sanguinetti, Emil Björnson*

http://arxiv.org/abs/1709.06060v1

This work aims to design a cellular network for maximal energy efficiency (EE). In particular, we consider the uplink with multi-antenna base stations and assume that zero-forcing (ZF) combining is used for data detection with imperfect channel state information. Using stochastic geometry and a new lower bound on the average per-user spectral efficiency of the network, we optimize the pilot reuse factor, number of antennas and users per base station. Closed-form expressions are computed from which valuable insights into the interplay between the optimization variables, hardware characteristics, and propagation environment are obtained. Numerical results are used to validate the analysis and make comparisons with a network using maximum ratio (MR) combining. The results show that a Massive MIMO setup arises as the EE-optimal network configuration. In addition, ZF provides higher EE than MR while allowing a smaller pilot reuse factor and a denser network deployment.

• [cs.IT]**On the Restricted Isometry of the Columnwise Khatri-Rao Product**

*Saurabh Khanna, Chandra R Murthy*

http://arxiv.org/abs/1709.05789v1

The columnwise Khatri-Rao product of two matrices is an important matrix type, reprising its role as a structured sensing matrix in many fundamental linear inverse problems. Robust signal recovery in such inverse problems is often contingent on proving the restricted isometry property (RIP) of a certain system matrix expressible as a Khatri-Rao product of two matrices. In this work, we analyze the RIP of a generic columnwise Khatri-Rao product by deriving two upper bounds for its $k$-th order Restricted Isometry Constant ($k$-RIC) for different values of $k$. The first RIC bound is given in terms of the individual RICs of the input matrices participating in the Khatri-Rao product. The second RIC bound is probabilistic in nature, and is specified in terms of the input matrix dimensions. We show that the Khatri-Rao product of a pair of $m \times n$ sized random matrices comprising independent and identically distributed subgaussian entries satisfies $k$-RIP with arbitrarily high probability, provided $m$ exceeds $\mathcal{O}(\sqrt{k} \log^{3/2} n)$. This is a milder condition compared to $\mathcal{O}(k \log n)$ rows needed to guarantee $k$-RIP of the input subgaussian random matrices participating in the Khatri-Rao product. Our results confirm that the Khatri-Rao product exhibits stronger restricted isometry compared to its constituent matrices for the same RIP order. The proposed RIC bounds are potentially useful in obtaining improved performance guarantees in several sparse signal recovery and tensor decomposition problems.
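For readers unfamiliar with the operation, the following dependency-free Python sketch builds the columnwise Khatri-Rao product: column $j$ of the result is the Kronecker product of column $j$ of each input, so two inputs of sizes $m \times n$ and $p \times n$ give an $mp \times n$ matrix. The function name and the list-of-rows representation are choices made for this illustration; the sketch shows only the construction and says nothing about the RIC bounds.

```python
def khatri_rao(a, b):
    """Columnwise Khatri-Rao product of two matrices given as lists of rows.

    a is m x n, b is p x n; the result is (m * p) x n, and its column j
    equals the Kronecker product of a[:, j] and b[:, j].
    """
    if not a or not b or len(a[0]) != len(b[0]):
        raise ValueError("both matrices must have the same number of columns")
    n = len(a[0])
    # Row i * p + k of the result pairs row i of a with row k of b,
    # which matches the element ordering of a Kronecker product.
    return [
        [a_row[j] * b_row[j] for j in range(n)]
        for a_row in a
        for b_row in b
    ]


if __name__ == "__main__":
    a = [[1, 2],
         [3, 4]]
    b = [[5, 6],
         [7, 8]]
    for row in khatri_rao(a, b):
        print(row)
```

The row count grows as the product of the input row counts while the column count stays at $n$, which is the structural reason a Khatri-Rao sensing matrix can reach a given RIP order with smaller inputs.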

• [cs.IT]**Performance Analysis of FSO System with Spatial Diversity and Relays for M-QAM over Log-Normal Channel**

*Pranav Kumar Jha, Nitin Kachare, K Kalyani, D. Sriram Kumar*

http://arxiv.org/abs/1709.05488v1

The performance analysis of free space optical communication (FSO) systems using relays and spatial diversity at the transmitter end is presented in this paper. The impact of atmospheric turbulence and attenuation caused by different weather conditions and geometric losses has also been taken into account. The effect of turbulence is modeled by a log-normal probability density function. We present the exact closed-form expressions of bit-error rate (BER) for M-ary quadrature amplitude modulation (M-QAM). The FSO system link performance is compared for on-off keying, M-ary pulse amplitude modulation and M-QAM modulation techniques. For relay-based free space optical communication systems, M-QAM is shown to be superior to the other schemes when each system has the same spectral efficiency. A significant performance enhancement in terms of BER and SNR gains is shown for the multi-hop MISO FSO system.

• [cs.IT]**Performance Evaluation of Spatial Complementary Code Keying Modulation in MIMO Systems**

*A. H. Jafari, T. O'Farrell*

http://arxiv.org/abs/1709.05525v1

Spatial complementary code keying modulation (SCCKM) is proposed as a novel block coding modulation scheme. An input binary sequence is modulated based on the different lengths of complementary code keying (CCK) modulation and then spread across the transmit antennas (spatial domain) in a multiple input multiple output (MIMO) system exploiting orthogonal frequency division multiplexing (OFDM). At the receiver side, zero forcing equalization is applied to the OFDM modulated data to mitigate the effect of the multipath fast fading channel and then followed by maximum likelihood (ML) detection to retrieve the input sequence. The performance of SCCKM in different MIMO systems is compared to that of spatial modulation (SM) as a baseline scheme. Simulation results show that for the same spectral efficiency, SCCKM is able to substantially improve the bit error rate (BER).

• [cs.IT]**Performance analysis of dual-hop optical wireless communication systems over k-distribution turbulence channel with pointing error**

*Neha Mishra, D. Sriram Kumar, Pranav Kumar Jha*

http://arxiv.org/abs/1709.05490v1

In this paper, we investigate the performance of dual-hop free space optical (FSO) communication systems under the effect of strong atmospheric turbulence together with misalignment effects (pointing error). We consider a relay-assisted link using the decode-and-forward (DF) relaying protocol between source and destination, with the assumption that channel state information is available at both transmitting and receiving terminals. The atmospheric turbulence channels are modeled by the k-distribution with pointing error impairment. Exact closed-form expressions are derived for outage probability and bit error rate and illustrated through numerical plots. Further, BER results are compared for different modulation schemes.

• [cs.IT]**Rapid Fading Due to Human Blockage in Pedestrian Crowds at 5G Millimeter-Wave Frequencies**

*George R. MacCartney Jr., Theodore S. Rappaport, Sundeep Rangan*

http://arxiv.org/abs/1709.05883v1

Rapidly fading channels caused by pedestrians in dense urban environments will have a significant impact on millimeter-wave (mmWave) communications systems that employ electrically-steerable and narrow beamwidth antenna arrays. A peer-to-peer (P2P) measurement campaign was conducted with 7-degree, 15-degree, and 60-degree half-power beamwidth (HPBW) antenna pairs at 73.5 GHz and with 1 GHz of RF null-to-null bandwidth in a heavily populated open square scenario in Brooklyn, New York, to study blockage events caused by typical pedestrian traffic. Antenna beamwidths that range approximately an order of magnitude were selected to gain knowledge of fading events for antennas with different beamwidths since antenna patterns for mmWave systems will be electronically-adjustable. Two simple modeling approaches in the literature are introduced to characterize the blockage events by either a two-state Markov model or a four-state piecewise linear modeling approach. Transition probability rates are determined from the measurements and it is shown that average fade durations with a -5 dB threshold are 299.0 ms for 7-degree HPBW antennas and 260.2 ms for 60-degree HPBW antennas. The four-state piecewise linear modeling approach shows that signal strength decay and rise times are asymmetric for blockage events and that mean signal attenuations (average fade depths) are inversely proportional to antenna HPBW, where 7-degree and 60-degree HPBW antennas resulted in mean signal fades of 15.8 dB and 11.5 dB, respectively. The models presented herein are valuable for extending statistical channel models at mmWave to accurately simulate real-world pedestrian blockage events when designing fifth-generation (5G) wireless systems.
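
The two-state Markov idea mentioned in the abstract can be illustrated with a minimal discrete-time sketch. This is a simplification under stated assumptions and not the authors' fitted model: the function names, the per-step transition probabilities `p_block` and `p_unblock`, and the step size `step_ms` are all hypothetical, and the numbers in the usage lines are illustrative rather than taken from the measurements.

```python
def stationary_blocked_fraction(p_block: float, p_unblock: float) -> float:
    """Long-run fraction of time steps spent in the blocked state.

    p_block   -- assumed per-step probability of going unblocked -> blocked
    p_unblock -- assumed per-step probability of going blocked -> unblocked
    """
    return p_block / (p_block + p_unblock)


def mean_fade_duration_ms(p_unblock: float, step_ms: float) -> float:
    """Mean length of one blocked run.

    The dwell time in the blocked state is geometric with mean
    1 / p_unblock steps, so the mean duration is step_ms / p_unblock.
    """
    return step_ms / p_unblock


# Illustrative values only (not from the paper).
fraction = stationary_blocked_fraction(p_block=0.1, p_unblock=0.3)
duration = mean_fade_duration_ms(p_unblock=0.01, step_ms=1.0)
```

In this toy chain, a smaller `p_unblock` lengthens the average fade, which is the quantity the abstract reports per antenna beamwidth.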

• [cs.IT]**Reliability of Multicast under Random Linear Network Coding**

*Evgeny Tsimbalo, Andrea Tassi, Robert J. Piechocki*

http://arxiv.org/abs/1709.05477v1

We consider a lossy multicast network in which reliability is provided by means of Random Linear Network Coding (RLNC). Our goal is to characterise the performance of such a network in terms of the probability that a source message is delivered to all destination nodes. Previous studies considered coding over large finite fields, small numbers of destination nodes or specific, often impractical, channel conditions. In contrast, we focus on a general problem, considering arbitrary field size and number of destination nodes, as well as a realistic channel. We propose a lower bound on the probability of successful delivery, which is more accurate than the approximation commonly used in the literature. In addition, we present a novel analysis of the systematic version of RLNC and propose a simpler, but sufficiently close, lower bound. The accuracy of the proposed bounds is verified via extensive Monte Carlo simulations, where the impact of the network and code parameters is investigated. Specifically, we show that the mean square error of the bounds for a ten-user network can be as low as $9 \cdot 10^{-5}$ and $2 \cdot 10^{-6}$ for the non-systematic and systematic cases, respectively.
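
For orientation, the building block behind such analyses is the probability that a single receiver can decode. The sketch below computes the standard probability that a uniformly random matrix over GF(q) has full column rank. It is not the bound proposed in the paper, and the function name and parameters are hypothetical.

```python
def full_rank_probability(received: int, k: int, q: int) -> float:
    """Probability that `received` random coded packets decode k source packets.

    Assumes each coding vector is drawn uniformly from GF(q)^k, so the
    received-by-k coefficient matrix has rank k with probability
    prod_{i=0}^{k-1} (1 - q^(i - received)).
    """
    if received < k:
        return 0.0  # fewer packets than unknowns can never be decoded
    prob = 1.0
    for i in range(k):
        prob *= 1.0 - float(q) ** (i - received)
    return prob


# Small field, no redundancy: 6 of the 16 binary 2x2 matrices are invertible.
p_binary = full_rank_probability(received=2, k=2, q=2)
```

The example shows why small fields need extra coded packets: over GF(2), two packets decode two source packets only 37.5% of the time.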

• [cs.IT]**Secrecy Rate of Distributed Cooperative MIMO in the Presence of Multi-Antenna Eavesdropper**

*Zhong Zheng, Zygmunt Haas, Mario Kieburg*

http://arxiv.org/abs/1709.05383v1

We propose and study the secrecy cooperative MIMO architecture to enable and improve secure transmissions between clusters of mobile devices in the presence of a multi-antenna eavesdropper. The cooperative MIMO is formed by temporarily activating clusters of nearby trusted devices, with each cluster being centrally coordinated by its corresponding cluster head. We assume that the transmitters apply a practical eigen-direction precoding scheme and that the eavesdropper has multiple possible locations in the proximity of the legitimate devices. We first obtain the expression of the secrecy rate, where the required ergodic mutual information between the transmit cluster and the eavesdropper is characterized by closed-form approximations. The proposed approximations are accurate, and are especially useful for reducing the computational complexity of secrecy rate maximization. The secrecy rate maximization is then recast as a difference-of-convex program, which can be solved by an iterative outer approximation algorithm. Numerical results show that the achievable secrecy rate can be effectively improved by activating at least as many trusted devices as there are antennas at the eavesdropper. The secrecy rate is further improved by increasing the cluster size.

• [cs.IT]**Stable Recovery of Structured Signals From Corrupted Sub-Gaussian Measurements**

*Jinchi Chen, Yulong Liu*

http://arxiv.org/abs/1709.05827v1

This paper studies the problem of accurately recovering a structured signal from a small number of corrupted sub-Gaussian measurements. We consider three different procedures to reconstruct signal and corruption when different kinds of prior knowledge are available. In each case, we provide conditions (in terms of the number of measurements) for stable signal recovery from structured corruption with added unstructured noise. Our results theoretically demonstrate how to choose the regularization parameters in both partially and fully penalized recovery procedures and shed some light on the relationships among the three procedures. The key ingredient in our analysis is an extended matrix deviation inequality for isotropic sub-Gaussian matrices, which implies a tight lower bound for the restricted singular value of the extended sensing matrix. Numerical experiments are presented to verify our theoretical results.

• [cs.IT]**The Stochastic Geometry Analyses of Cellular Networks with α-Stable Self-Similarity**

*Rongpeng Li, Zhifeng Zhao, Yi Zhong, Chen Qi, Honggang Zhang*

http://arxiv.org/abs/1709.05733v1

Understanding the spatial deployment of base stations (BSs) is the first step to facilitate the performance analyses of cellular networks, as well as the design of efficient networking protocols. The Poisson point process (PPP) has been widely adopted to characterize the deployment of BSs and has established a reputation for giving tractable results in stochastic geometry analyses. However, given the total randomness assumption in PPP, its accuracy has recently been questioned. On the other hand, the actual deployment of BSs during their long evolution is highly correlated with heavy-tailed human activities. The α-stable distribution, one kind of heavy-tailed distribution, has demonstrated superior accuracy in statistically modeling the spatial density of BSs. In this paper, we start with the new findings on α-stable distributed BS deployment and investigate the intrinsic feature (i.e., the spatial self-similarity) embedded in the BSs. Based on these findings, we perform the coverage probability analyses and provide the relevant theoretical results. In particular, we show that for some special cases, our work reduces to the fundamental work by J. G. Andrews. We also examine the network performance under extensive simulation settings and validate that the simulation results are consistent with our theoretical derivations.

• [cs.LG]**Autoencoder-Driven Weather Clustering for Source Estimation during Nuclear Events**

*I. A. Klampanos, A. Davvetas, S. Andronopoulos, C. Pappas, A. Ikonomopoulos, V. Karkaletsis*

http://arxiv.org/abs/1709.05840v1

Emergency response applications for nuclear or radiological events can be significantly improved via deep feature learning due to the hidden complexity of the data and models involved. In this paper we present a novel methodology for rapid source estimation during radiological releases based on deep feature extraction and weather clustering. Atmospheric dispersions are then calculated based on identified predominant weather patterns and are matched against simulated incidents indicated by radiation readings on the ground. We evaluate the accuracy of our methods over multiple years of weather reanalysis data in the European region. We juxtapose these results with deep classification convolution networks and discuss advantages and disadvantages.

• [cs.LG]**Deep Automated Multi-task Learning**

*Davis Liang, Yan Shu*

http://arxiv.org/abs/1709.05554v1

Multi-task learning (MTL) has recently contributed to learning better representations in service of various NLP tasks. MTL aims at improving the performance of a primary task, by jointly training on a secondary task. This paper introduces automated tasks, which exploit the sequential nature of the input data, as secondary tasks in an MTL model. We explore next word prediction, next character prediction, and missing word completion as potential automated tasks. Our results show that training on a primary task in parallel with a secondary automated task improves both the convergence speed and accuracy for the primary task. We suggest two methods for augmenting an existing network with automated tasks and establish better performance in topic prediction, sentiment analysis, and hashtag recommendation. Finally, we show that the MTL models can perform well on datasets that are small and colloquial by nature.

• [cs.LG]**Deep Scattering: Rendering Atmospheric Clouds with Radiance-Predicting Neural Networks**

*Simon Kallweit, Thomas Müller, Brian McWilliams, Markus Gross, Jan Novák*

http://arxiv.org/abs/1709.05418v1

We present a technique for efficiently synthesizing images of atmospheric clouds using a combination of Monte Carlo integration and neural networks. The intricacies of Lorenz-Mie scattering and the high albedo of cloud-forming aerosols make rendering of clouds (e.g. the characteristic silver lining and the "whiteness" of the inner body) challenging for methods based solely on Monte Carlo integration or diffusion theory. We approach the problem differently. Instead of simulating all light transport during rendering, we pre-learn the spatial and directional distribution of radiant flux from tens of cloud exemplars. To render a new scene, we sample visible points of the cloud and, for each, extract a hierarchical 3D descriptor of the cloud geometry with respect to the shading location and the light source. The descriptor is input to a deep neural network that predicts the radiance function for each shading configuration. We make the key observation that progressively feeding the hierarchical descriptor into the network enhances the network's ability to learn faster and predict with high accuracy while using few coefficients. We also employ a block design with residual connections to further improve performance. A GPU implementation of our method interactively synthesizes, within seconds, images of clouds that are nearly indistinguishable from the reference solution. Our method thus represents a viable solution for applications such as cloud design and, thanks to its temporal stability, also for high-quality production of animated content.

• [cs.LG]**FlashProfile: Interactive Synthesis of Syntactic Profiles**

*Saswat Padhi, Prateek Jain, Daniel Perelman, Oleksandr Polozov, Sumit Gulwani, Todd Millstein*

http://arxiv.org/abs/1709.05725v1

We address the problem of learning comprehensive syntactic profiles for a set of strings. Real-world datasets, typically curated from multiple sources, often contain data in various formats. Thus any data processing task is preceded by the critical step of data format identification. However, manual inspection of data to identify various formats is infeasible in standard big-data scenarios. We present a technique for generating comprehensive syntactic profiles in terms of user-defined patterns that also allows for interactive refinement. We define a syntactic profile as a set of succinct patterns that describe the entire dataset. Our approach efficiently learns such profiles, and allows refinement by exposing a desired number of patterns. Our implementation, FlashProfile, shows a median profiling time of 0.7s over 142 tasks on 74 real datasets. We also show that access to the generated data profiles allows for more accurate synthesis of programs, using fewer examples in programming-by-example workflows.

• [cs.LG]**Grade Prediction with Temporal Course-wise Influence**

*Zhiyun Ren, Xia Ning, Huzefa Rangwala*

http://arxiv.org/abs/1709.05433v1

There is a critical need to develop new educational technology applications that analyze the data collected by universities to ensure that students graduate in a timely fashion (4 to 6 years) and are well prepared for jobs in their respective fields of study. In this paper, we present a novel approach for analyzing historical educational records from a large, public university to perform next-term grade prediction, i.e., to estimate the grades that a student will get in a course that he/she will enroll in the next term. Accurate next-term grade prediction holds the promise of better student degree planning, personalized advising and automated interventions to ensure that students stay on track in their chosen degree program and graduate on time. We present a factorization-based approach called Matrix Factorization with Temporal Course-wise Influence that incorporates course-wise influence effects and temporal effects for grade prediction. In this model, students and courses are represented in a latent "knowledge" space. The grade of a student in a course is modeled as the similarity of their latent representations in the "knowledge" space. Course-wise influence is considered as an additional factor in the grade prediction. Our experimental results show that the proposed method outperforms several baseline approaches and infers meaningful patterns between pairs of courses within academic programs.

• [cs.LG]**Leveraging Distributional Semantics for Multi-Label Learning**

*Rahul Wadbude, Vivek Gupta, Piyush Rai, Nagarajan Natarajan, Harish Karnick*

http://arxiv.org/abs/1709.05976v1

We present a novel and scalable label embedding framework for large-scale multi-label learning, called ExMLDS (Extreme Multi-Label Learning using Distributional Semantics). Our approach draws inspiration from ideas rooted in distributional semantics, specifically the Skip Gram Negative Sampling (SGNS) approach, widely used to learn word embeddings for natural language processing tasks. Learning such embeddings can be reduced to a certain matrix factorization. Our approach is novel in that it highlights interesting connections between label embedding methods used for multi-label learning and paragraph/document embedding methods commonly used for learning representations of text data. The framework can also be easily extended to incorporate auxiliary information such as label-label correlations; this is crucial especially when there are many missing labels in the training data. We demonstrate the effectiveness of our approach through an extensive set of experiments on a variety of benchmark datasets, and show that the proposed learning methods perform favorably compared to several baselines and state-of-the-art methods for large-scale multi-label learning.

• [cs.LG]**Minimal Effort Back Propagation for Convolutional Neural Networks**

*Bingzhen Wei, Xu Sun, Xuancheng Ren, Jingjing Xu*

http://arxiv.org/abs/1709.05804v1

As traditional neural networks consume a significant amount of computing resources during back propagation, Sun et al. (2017) propose a simple yet effective technique to alleviate this problem. In this technique, only a small subset of the full gradient is computed to update the model parameters. In this paper we extend this technique to the Convolutional Neural Network (CNN) to reduce calculation in back propagation, and the surprising results verify its validity in CNNs: only 5% of the gradients are passed back, yet the model still achieves the same effect as the traditional CNN, or even better. We also show that the top-$k$ selection of gradients leads to a sparse calculation in back propagation, which may bring significant computational benefits given the high computational complexity of the convolution operation in CNNs.
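
The top-$k$ selection step can be sketched in a few lines. This is a minimal illustration of magnitude-based sparsification on a flat gradient vector, assuming plain Python lists. The function name `top_k_sparsify` is hypothetical, and how the paper applies the selection inside convolution layers is not shown here.

```python
from typing import List


def top_k_sparsify(grad: List[float], k: int) -> List[float]:
    """Keep the k entries of largest magnitude and zero out the rest."""
    if k <= 0:
        return [0.0] * len(grad)
    # Indices sorted by absolute value, largest first; the first k are kept.
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


# Only the two largest-magnitude components are passed back.
sparse = top_k_sparsify([0.1, -2.0, 0.5, 3.0], k=2)
```

Because most entries become exactly zero, the downstream products in the backward pass only need to touch the surviving components.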

• [cs.LG]**Multi-Agent Distributed Lifelong Learning for Collective Knowledge Acquisition**

*Mohammad Rostami, Soheil Kolouri, Kyungnam Kim, Eric Eaton*

http://arxiv.org/abs/1709.05412v1

Lifelong machine learning methods acquire knowledge over a series of consecutive tasks, continually building upon their experience. Current lifelong learning algorithms rely upon a single learning agent that has centralized access to all data. In this paper, we extend the idea of lifelong learning from a single agent to a network of multiple agents that collectively learn a series of tasks. Each agent faces some (potentially unique) set of tasks; the key idea is that knowledge learned from these tasks may benefit other agents trying to learn different (but related) tasks. Our Collective Lifelong Learning Algorithm (CoLLA) provides an efficient way for a network of agents to share their learned knowledge in a distributed and decentralized manner, while preserving the privacy of the locally observed data. We provide theoretical guarantees for robust performance of the algorithm and empirically demonstrate that CoLLA outperforms existing approaches for distributed multi-task learning on a variety of data sets.

• [cs.LG]**Multi-Entity Dependence Learning with Rich Context via Conditional Variational Auto-encoder**

*Luming Tang, Yexiang Xue, Di Chen, Carla P. Gomes*

http://arxiv.org/abs/1709.05612v1

Multi-Entity Dependence Learning (MEDL) explores conditional correlations among multiple entities. The availability of rich contextual information requires a nimble learning scheme that tightly integrates with deep neural networks and has the ability to capture correlation structures among exponentially many outcomes. We propose MEDL_CVAE, which encodes a conditional multivariate distribution as a generating process. As a result, the variational lower bound of the joint likelihood can be optimized via a conditional variational auto-encoder and trained end-to-end on GPUs. Our MEDL_CVAE was motivated by two real-world applications in computational sustainability: one studies the spatial correlation among multiple bird species using the eBird data and the other models multi-dimensional landscape composition and human footprint in the Amazon rainforest with satellite images. We show that MEDL_CVAE captures rich dependency structures, scales better than previous methods, and further improves on the joint likelihood taking advantage of very large datasets that are beyond the capacity of previous methods.

• [cs.LG]**Multi-Modal Multi-Task Deep Learning for Autonomous Driving**

*Sauhaarda Chowdhuri, Tushar Pankaj, Karl Zipser*

http://arxiv.org/abs/1709.05581v1

Several deep learning approaches have been applied to the autonomous driving task, many employing end-to-end deep neural networks. Autonomous driving is complex, utilizing multiple behavioral modalities ranging from lane changing to turning and stopping. However, most existing approaches do not factor in the different behavioral modalities of the driving task into the training strategy. This paper describes a technique for using Multi-Modal Multi-Task Learning that considers multiple behavioral modalities as distinct modes of operation for an end-to-end autonomous deep neural network utilizing the insertion of modal information as secondary input data. Using labeled data from hours of driving our fleet of 1/10th scale model cars, we trained multiple neural networks to imitate the steering angle and driving speed of human control of a car. We show that in each case, our models trained with MTL can match or outperform multiple networks trained on individual tasks, while using a fraction of the parameters and having more distinct modes of operation than a network trained without MTL on the same multi-modal data. These results should encourage Multi-Modal MTL-style training with the insertion of Modal Information for tasks with related behaviors.

• [cs.LG]**N2N Learning: Network to Network Compression via Policy Gradient Reinforcement Learning**

*Anubhav Ashok, Nicholas Rhinehart, Fares Beainy, Kris M. Kitani*

http://arxiv.org/abs/1709.06030v1

While bigger and deeper neural network architectures continue to advance the state-of-the-art for many computer vision tasks, real-world adoption of these networks is impeded by hardware and speed constraints. Conventional model compression methods attempt to address this problem by modifying the architecture manually or using pre-defined heuristics. Since the space of all reduced architectures is very large, modifying the architecture of a deep neural network in this way is a difficult task. In this paper, we tackle this issue by introducing a principled method for learning reduced network architectures in a data-driven way using reinforcement learning. Our approach takes a larger 'teacher' network as input and outputs a compressed 'student' network derived from the 'teacher' network. In the first stage of our method, a recurrent policy network aggressively removes layers from the large 'teacher' model. In the second stage, another recurrent policy network carefully reduces the size of each remaining layer. The resulting network is then evaluated to obtain a reward, a score based on the accuracy and compression of the network. Our approach uses this reward signal with policy gradients to train the policies to find a locally optimal student network. Our experiments show that we can achieve compression rates of more than 10x for models such as ResNet-34 while maintaining similar performance to the input 'teacher' network. We also present a valuable transfer learning result which shows that policies which are pre-trained on smaller 'teacher' networks can be used to rapidly speed up training on larger 'teacher' networks.

• [cs.LG]**On Inductive Abilities of Latent Factor Models for Relational Learning**

*Théo Trouillon, Éric Gaussier, Christopher R. Dance, Guillaume Bouchard*

http://arxiv.org/abs/1709.05666v1

Latent factor models are increasingly popular for modeling multi-relational knowledge graphs. By their vectorial nature, it is not only hard to interpret why this class of models works so well, but also to understand where they fail and how they might be improved. We conduct an experimental survey of state-of-the-art models, not towards a purely comparative end, but as a means to get insight about their inductive abilities. To assess the strengths and weaknesses of each model, we create simple tasks that exhibit first, atomic properties of binary relations, and then, common inter-relational inference through synthetic genealogies. Based on these experimental results, we propose new research directions to improve on existing models.

• [cs.LG]**Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents**

*Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew Hausknecht, Michael Bowling*

http://arxiv.org/abs/1709.06009v1

The Arcade Learning Environment (ALE) is an evaluation platform that poses the challenge of building AI agents with general competency across dozens of Atari 2600 games. It supports a variety of different problem settings and it has been receiving increasing attention from the scientific community, leading to some high-profile success stories such as the much publicized Deep Q-Networks (DQN). In this article we take a big picture look at how the ALE is being used by the research community. We show how diverse the evaluation methodologies in the ALE have become with time, and highlight some key concerns when evaluating agents in the ALE. We use this discussion to present some methodological best practices and provide new benchmark results using these best practices. To further the progress in the field, we introduce a new version of the ALE that supports multiple game modes and provides a form of stochasticity we call sticky actions. We conclude this big picture look by revisiting challenges posed when the ALE was introduced, summarizing the state-of-the-art in various problems and highlighting problems that remain open.
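
The abstract does not spell out how sticky actions work. As commonly described, at each step the environment executes the previously executed action instead of the newly requested one with some fixed probability. The sketch below illustrates that idea only; the class name, the `stickiness` parameter, and the default `initial_action=0` are hypothetical choices and not the ALE API.

```python
import random


class StickyActionFilter:
    """Repeats the previously executed action with probability `stickiness`."""

    def __init__(self, stickiness: float, rng: random.Random, initial_action: int = 0):
        self.stickiness = stickiness
        self.rng = rng
        self.previous = initial_action  # assumed starting action (hypothetical)

    def apply(self, requested: int) -> int:
        """Return the action that is actually executed this step."""
        if self.rng.random() < self.stickiness:
            executed = self.previous  # the old action "sticks"
        else:
            executed = requested
        self.previous = executed
        return executed


# With stickiness 0 the agent's choice always goes through unchanged.
passthrough = StickyActionFilter(0.0, random.Random(0))
executed = [passthrough.apply(a) for a in (3, 1, 4)]
```

The point of this form of stochasticity is that an agent cannot rely on replaying a memorized action sequence, because the executed sequence differs from run to run.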

• [cs.LG]**Why Pay More When You Can Pay Less: A Joint Learning Framework for Active Feature Acquisition and Classification**

*Hajin Shim, Sung Ju Hwang, Eunho Yang*

http://arxiv.org/abs/1709.05964v1

We consider the problem of active feature acquisition, where we sequentially select the subset of features in order to achieve the maximum prediction performance in the most cost-effective way. In this work, we formulate this active feature acquisition problem as a reinforcement learning problem, and provide a novel framework for jointly learning both the RL agent and the classifier (environment). We also introduce a more systematic way of encoding subsets of features that can properly handle the innate challenge of missing entries in active feature acquisition problems; it uses an orderless LSTM-based set encoding mechanism that readily fits in the joint learning framework. We evaluate our model on a carefully designed synthetic dataset for active feature acquisition as well as several real datasets such as electronic health record (EHR) datasets, on which it outperforms all baselines in terms of prediction performance as well as feature acquisition cost.

• [cs.MA]**Guided Deep Reinforcement Learning for Swarm Systems**

*Maximilian Hüttenrauch, Adrian Šošić, Gerhard Neumann*

http://arxiv.org/abs/1709.06011v1

In this paper, we investigate how to learn to control a group of cooperative agents with limited sensing capabilities such as robot swarms. The agents have only very basic sensor capabilities, yet in a group they can accomplish sophisticated tasks, such as distributed assembly or search and rescue tasks. Learning a policy for a group of agents is difficult due to distributed partial observability of the state. Here, we follow a guided approach where a critic has central access to the global state during learning, which simplifies the policy evaluation problem from a reinforcement learning point of view. For example, we can get the positions of all robots of the swarm using a camera image of a scene. This camera image is only available to the critic and not to the control policies of the robots. We follow an actor-critic approach, where the actors base their decisions only on locally sensed information. In contrast, the critic is learned based on the true global state. Our algorithm uses deep reinforcement learning to approximate both the Q-function and the policy. The performance of the algorithm is evaluated on two tasks with simple simulated 2D agents: 1) finding and maintaining a certain distance to each other and 2) locating a target.

• [cs.NE]**$ε$-Lexicase selection: a probabilistic and multi-objective analysis of lexicase selection in continuous domains**

*William La Cava, Thomas Helmuth, Lee Spector, Jason H. Moore*

http://arxiv.org/abs/1709.05394v1

Lexicase selection is a parent selection method that considers training cases individually, rather than in aggregate, when performing parent selection. Whereas previous work has demonstrated the ability of lexicase selection to solve difficult problems, the central goal of this paper is to develop the theoretical underpinnings that explain its performance. To this end, we derive an analytical formula that gives the expected probabilities of selection under lexicase selection, given a population and its behavior. In addition, we expand upon the relation of lexicase selection to many-objective optimization methods to describe the behavior of lexicase, which is to select individuals on the boundaries of Pareto fronts in high-dimensional space. We show analytically why lexicase selection performs more poorly for certain sizes of population and training cases, and show why it has been shown to perform more poorly in continuous error spaces. To address this last concern, we introduce $\epsilon$-lexicase selection, which modifies the pass condition in lexicase selection to allow near-elite individuals to pass cases, thereby improving selection performance with continuous errors. We show that $\epsilon$-lexicase outperforms several diversity-maintenance strategies on a number of real-world and synthetic regression problems.
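
The relaxed pass condition can be sketched as follows. This is a minimal illustration assuming the per-case tolerances are supplied by the caller; how the paper chooses $\epsilon$ is not reproduced here, and the function and argument names are hypothetical.

```python
import random
from typing import List


def epsilon_lexicase_select(errors: List[List[float]],
                            epsilons: List[float],
                            rng: random.Random) -> int:
    """Select one parent index.

    errors[i][c] is the error of individual i on training case c.
    An individual passes case c if its error is within epsilons[c]
    of the best error among the remaining candidates.
    """
    candidates = list(range(len(errors)))
    cases = list(range(len(epsilons)))
    rng.shuffle(cases)  # cases are considered one at a time, in random order
    for case in cases:
        if len(candidates) == 1:
            break
        best = min(errors[i][case] for i in candidates)
        candidates = [i for i in candidates
                      if errors[i][case] <= best + epsilons[case]]
    return rng.choice(candidates)  # ties left at the end are broken at random


# Individual 1 is near-elite on case 0 and elite on case 1.
errors = [[0.0, 1.0], [0.05, 0.0]]
picks = {epsilon_lexicase_select(errors, [0.1, 0.1], random.Random(s))
         for s in range(20)}
```

With both tolerances set to zero this reduces to plain lexicase selection, under which individual 0 would win whenever case 0 is drawn first. With the tolerance of 0.1, individual 1 survives case 0 and is selected regardless of case order.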

• [cs.NE]**Dynamic Capacity Estimation in Hopfield Networks**

*Saarthak Sarup, Mingoo Seok*

http://arxiv.org/abs/1709.05340v1

Understanding the memory capacity of neural networks remains a challenging problem in implementing artificial intelligence systems. In this paper, we address the notion of capacity with respect to Hopfield networks and propose a dynamic approach to monitoring a network's capacity. We define our understanding of capacity as the maximum number of stored patterns which can be retrieved when probed by the stored patterns. Prior work in this area has presented static expressions dependent on neuron count $N$, forcing network designers to assume worst-case input characteristics for bias and correlation when setting the capacity of the network. Instead, our model operates simultaneously with the learning Hopfield network and concludes on a capacity estimate based on the patterns which were stored. By continuously updating the crosstalk associated with the stored patterns, our model guards the network from overwriting its memory traces and exceeding its capacity. We simulate our model using artificially generated random patterns, which can be set to a desired bias and correlation, and observe capacity estimates between 93% and 97% accurate. As a result, our model doubles the memory efficiency of Hopfield networks in comparison to the static and worst-case capacity estimate while minimizing the risk of lost patterns.
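
The crosstalk quantity referred to above can be illustrated with the textbook Hebbian formulation. This sketch assumes patterns with entries in {-1, +1} and weights $w_{ij} = \frac{1}{N}\sum_\nu \xi_i^\nu \xi_j^\nu$ including self-coupling; it is not the authors' dynamic estimator, and the function names are hypothetical.

```python
from typing import List


def crosstalk(patterns: List[List[int]], mu: int, i: int) -> float:
    """Crosstalk on bit i of stored pattern mu under Hebbian storage.

    C = -xi_i^mu * (1/N) * sum_{nu != mu} xi_i^nu * (xi^nu . xi^mu)
    Bit i of pattern mu is stable when C < 1.
    """
    n = len(patterns[mu])
    total = 0
    for nu, other in enumerate(patterns):
        if nu == mu:
            continue
        overlap = sum(a * b for a, b in zip(other, patterns[mu]))
        total += other[i] * overlap
    return -patterns[mu][i] * total / n


def all_patterns_stable(patterns: List[List[int]]) -> bool:
    """True if every bit of every stored pattern has crosstalk below 1."""
    return all(crosstalk(patterns, mu, i) < 1.0
               for mu in range(len(patterns))
               for i in range(len(patterns[mu])))


# Orthogonal patterns do not interfere with each other.
orthogonal = [[1, 1, 1, 1], [1, -1, 1, -1]]
stable = all_patterns_stable(orthogonal)
```

Tracking this quantity as patterns are added gives a data-dependent view of capacity, in contrast to a static bound that depends only on $N$.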

• [cs.NE]**Push and Pull Search for Solving Constrained Multi-objective Optimization Problems**

*Zhun Fan, Wenji Li, Xinye Cai, Hui Li, Caimin Wei, Qingfu Zhang, Kalyanmoy Deb, Erik D. Goodman*

http://arxiv.org/abs/1709.05915v1

This paper proposes a push and pull search (PPS) framework for solving constrained multi-objective optimization problems (CMOPs). More specifically, the proposed PPS divides the search process into two stages: the push stage and the pull stage. In the push stage, a multi-objective evolutionary algorithm (MOEA) is adopted to explore the search space without considering any constraints, which helps to cross infeasible regions very quickly and approach the unconstrained Pareto front. Furthermore, the landscape of CMOPs with constraints can be probed and estimated in the push stage, and this information can be used to set the parameters of the constraint-handling approaches applied in the pull stage. Then, a constrained multi-objective evolutionary algorithm (CMOEA) equipped with an improved epsilon constraint-handling method is applied to pull the infeasible individuals obtained in the push stage to the feasible and non-dominated regions. Compared with other CMOEAs, the proposed PPS method can cross infeasible regions more efficiently and converge to the feasible and non-dominated regions by applying push and pull search strategies at different stages. To evaluate the performance regarding convergence and diversity, a set of benchmark CMOPs is used to test the proposed PPS and compare it with five other CMOEAs: MOEA/D-CDP, MOEA/D-SR, C-MOEA/D, MOEA/D-Epsilon and MOEA/D-IEpsilon. The comprehensive experimental results demonstrate that the proposed PPS achieves significantly better or competitive performance compared with the other five CMOEAs on most of the benchmark set.

• [cs.NI]**Channel Access Method Classification For Cognitive Radio Applications**

*Mihir Laghate, Paulo Urriza, Danijela Cabric*

http://arxiv.org/abs/1709.05460v1

Motivated by improved detection and prediction of temporal holes, we propose a two stage algorithm to classify the channel access method used by a primary network. The first stage extends an existing fourth-order cumulant-based modulation classifier to distinguish between TDMA, OFDMA, and CDMA. The second stage proposes a novel collision detector using the sample variance of the same cumulant to detect contention-based channel access methods. Our proposed method is blind and independent of the received SNR. Simulations show that our classification of TDMA, OFDMA, and CDMA is robust to network load while detection of contention outperforms existing methods.

• [cs.RO]**A novel Skill-based Programming Paradigm based on Autonomous Playing and Skill-centric Testing**

*Simon Hangl, Andreas Mennel, Justus Piater*

http://arxiv.org/abs/1709.06049v1

We introduce a novel paradigm for robot programming with which we aim to make robot programming more accessible for inexperienced users. To do so, we incorporate two major components in a single framework: autonomous skill acquisition by robotic playing, and visual programming. Simple robot program skeletons solving a task for one specific situation, so-called basic behaviours, are provided by the user. The robot then learns how to solve the same task in many different situations by autonomous playing, which lowers the barrier for inexperienced robot programmers. Programmers can use a mix of visual programming and kinesthetic teaching to provide these simple program skeletons. The robot program can be implemented interactively by programming parts with visual programming and kinesthetic teaching. We further integrate work on experience-based skill-centric robot software testing, which enables the user to continuously test implemented skills without having to deal with the details of specific components.

• [cs.RO]**AA-ICP: Iterative Closest Point with Anderson Acceleration**

*A. L. Pavlov, G. V. Ovchinnikov, D. Yu. Derbyshev, D. Tsetserukou, I. V. Oseledets*

http://arxiv.org/abs/1709.05479v1

Iterative Closest Point (ICP) is a widely used method for performing scan-matching and registration. Although it is a simple and robust method, it is still computationally expensive and may be challenging to use in real-time applications with limited resources on mobile platforms. In this paper we propose a novel, effective method for accelerating ICP that does not require substantial modifications to existing code. The method is based on Anderson acceleration, an iterative procedure for finding a fixed point of a contractive mapping, which is often faster than the standard Picard iteration usually used in ICP implementations. We show that ICP, being a fixed-point problem, can be significantly accelerated by this method, enhanced by heuristics to improve overall robustness. We implement the proposed approach in the Point Cloud Library (PCL) and make it available online. Benchmarking on real-world data fully supports our claims.
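The contrast between Picard iteration and Anderson acceleration can be illustrated on a generic fixed-point problem. The sketch below is not the AA-ICP implementation and does not touch PCL: the function `fixed_point`, the memory depth of 1, and the toy linear map in the usage note are all assumptions chosen to keep the example short.

```python
import math


def fixed_point(g, x0, accelerate=True, tol=1e-10, max_iter=1000):
    """Iterate x <- g(x) until ||g(x) - x|| < tol.

    With accelerate=True each step mixes the two most recent iterates
    (Anderson acceleration with memory depth 1); otherwise this is plain
    Picard iteration. Returns (x, number_of_g_evaluations).
    """
    x = list(x0)
    prev_g = prev_f = None
    for k in range(1, max_iter + 1):
        gx = g(x)
        f = [gi - xi for gi, xi in zip(gx, x)]  # residual g(x) - x
        if math.sqrt(sum(fi * fi for fi in f)) < tol:
            return x, k
        alpha = 0.0
        if accelerate and prev_f is not None:
            df = [fi - pi for fi, pi in zip(f, prev_f)]
            denom = sum(d * d for d in df)
            if denom > 0.0:
                # alpha minimises ||(1 - alpha) * f + alpha * prev_f||
                alpha = sum(fi * d for fi, d in zip(f, df)) / denom
        if alpha != 0.0:
            new_x = [(1 - alpha) * gi + alpha * pg
                     for gi, pg in zip(gx, prev_g)]
        else:
            new_x = gx  # plain Picard step
        prev_g, prev_f = gx, f
        x = new_x
    return x, max_iter


def toy_map(x):
    """Hypothetical contraction with fixed point (10, 4)."""
    return [0.9 * x[0] + 1.0, 0.5 * x[1] + 2.0]
```

Calling `fixed_point(toy_map, [0.0, 0.0])` and `fixed_point(toy_map, [0.0, 0.0], accelerate=False)` lets the two evaluation counts be compared directly. In an ICP setting, `g` would be one full correspondence-and-alignment step applied to the current transform parameters, which is why the acceleration needs few changes to existing code.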

• [cs.RO]**Bayesian Optimization Using Domain Knowledge on the ATRIAS Biped**

*Akshara Rai, Rika Antonova, Seungmoon Song, William Martin, Hartmut Geyer, Christopher G. Atkeson*

http://arxiv.org/abs/1709.06047v1

Controllers in robotics often consist of expert-designed heuristics, which can be hard to tune in higher dimensions. It is typical to use simulation to learn these parameters, but controllers learned in simulation often don't transfer to hardware. This necessitates optimization directly on hardware. However, collecting data on hardware can be expensive. This has led to a recent interest in adapting data-efficient learning techniques to robotics. One popular method is Bayesian Optimization (BO), a sample-efficient black-box optimization scheme, but its performance typically degrades in higher dimensions. We aim to overcome this problem by incorporating domain knowledge to reduce dimensionality in a meaningful way, with a focus on bipedal locomotion. In previous work, we proposed a transformation based on knowledge of human walking that projected a 16-dimensional controller to a 1-dimensional space. In simulation, this showed enhanced sample efficiency when optimizing human-inspired neuromuscular walking controllers on a humanoid model. In this paper, we present a generalized feature transform applicable to non-humanoid robot morphologies and evaluate it on the ATRIAS bipedal robot -- in simulation and on hardware. We present three different walking controllers; two are evaluated on the real robot. Our results show that this feature transform captures important aspects of walking and accelerates learning on hardware and simulation, as compared to traditional BO.

• [cs.RO]**Decentralized Collision-Free Control of Multiple Robots in 2D and 3D Spaces**

*Xiaotian Yang*

http://arxiv.org/abs/1709.05843v1

Decentralized control of robots has attracted great research interest. However, some of this research relies on unrealistic assumptions and ignores collision avoidance. This report focuses on collision-free control for multiple robots in both complete coverage and search tasks in arbitrary, unknown 2D and 3D areas. All algorithms are decentralized, as robots have limited abilities, and they are proved mathematically. The report starts with the grid selection in the two tasks. Grid patterns simplify the representation of the area, and robots only need to move in straight lines between neighboring vertices. For 100% complete 2D coverage, the equilateral triangular grid is proposed. For complete coverage ignoring the boundary effect, the grid with the fewest vertices is calculated in every situation for both 2D and 3D areas. The second part addresses complete coverage in 2D and 3D areas. A decentralized collision-free algorithm using the selected grid is presented, driving robots to the sections that are furthest from the reference point. The area can be static or expanding, and the algorithm is simulated in MATLAB. Thirdly, three grid-based decentralized random algorithms with collision avoidance are provided to search for targets in 2D or 3D areas. The number of targets can be known or unknown. In the first algorithm, robots choose vacant neighbors randomly, with priority given to unvisited ones, while the second adds a repulsive force to disperse robots that are close together. In the third algorithm, a robot surrounded by visited vertices uses the breadth-first search algorithm to go to one of the nearest unvisited vertices via the grid. The second search algorithm is verified on Pioneer 3-DX robots. A general way to derive a formula for estimating the search time is demonstrated. The algorithms are compared with five other algorithms in MATLAB to show their effectiveness.

• [cs.RO]**Design, Development and Evaluation of a UAV to Study Air Quality in Qatar**

*Khalid Al-Hajjaji, Mouadh Ezzin, Husain Khamdan, Abdelhakim El Hassani, Nizar Zorba*

http://arxiv.org/abs/1709.05628v1

Measuring gases for air quality monitoring is a challenging task that requires long observation times and large numbers of sensors. The aim of this project is to develop a partially autonomous unmanned aerial vehicle (UAV) equipped with sensors to monitor and collect real-time air quality data in designated areas and send it to the ground base. The project is designed and implemented by a multidisciplinary team from electrical and computer engineering departments. The electrical engineering team is responsible for implementing the air quality sensors that detect real-time data and for transmitting that data from the plane to the ground. The computer engineering team is in charge of interfacing the sensors and providing a platform to view and visualize the air quality data and live video streaming. The proposed project contains several sensors to measure temperature, humidity, dust, CO, CO2 and O3. The collected data is transmitted to a server over a wireless internet connection, and the server stores the data and supplies it, in semi-real time, to any party with permission to access it through an Android phone or website. The developed UAV has undergone several field tests at Al Shamal airport in Qatar, with interesting results and proof-of-concept outcomes.

• [cs.RO]**Endo-VMFuseNet: Deep Visual-Magnetic Sensor Fusion Approach for Uncalibrated, Unsynchronized and Asymmetric Endoscopic Capsule Robot Localization Data**

*Mehmet Turan, Yasin Almalioglu, Hunter Gilbert, Alp Eren Sari, Ufuk Soylu, Metin Sitti*

http://arxiv.org/abs/1709.06041v1

In the last decade, researchers and medical device companies have made major advances towards transforming passive capsule endoscopes into active medical robots. One of the major challenges is to endow capsule robots with accurate perception of the environment inside the human body, which will provide necessary information and enable improved medical procedures. We extend the success of deep learning approach from various research fields to the problem of uncalibrated, asynchronous and asymmetric sensor fusion for endoscopic capsule robots. The results performed on real pig stomach datasets show that our method achieves submillimeter precision for both translational and rotational movements and contains various advantages over traditional sensor fusion techniques.

• [cs.RO]**Learning Sampling Distributions for Robot Motion Planning**

*Brian Ichter, James Harrison, Marco Pavone*

http://arxiv.org/abs/1709.05448v1

A defining feature of sampling-based motion planning is the reliance on an implicit representation of the state space, which is enabled by a set of probing samples. Traditionally, these samples are drawn either probabilistically or deterministically to uniformly cover the state space. Yet, the motion of many robotic systems is often restricted to "small" regions of the state space, due to e.g. differential constraints or collision-avoidance constraints. To accelerate the planning process, it is thus desirable to devise non-uniform sampling strategies that favor sampling in those regions where an optimal solution might lie. This paper proposes a methodology for non-uniform sampling, whereby a sampling distribution is learnt from demonstrations, and then used to bias sampling. The sampling distribution is computed through a conditional variational autoencoder, allowing sample generation from the latent space conditioned on the specific planning problem. This methodology is general, can be used in combination with any sampling-based planner, and can effectively exploit the underlying structure of a planning problem while maintaining the theoretical guarantees of sampling-based approaches. Specifically, on several planning problems, the proposed methodology is shown to effectively learn representations for the relevant regions of the state space, resulting in an order of magnitude improvement in terms of success rate and convergence to the optimal cost.

• [cs.RO]**Recognizing Objects In-the-wild: Where Do We Stand?**

*Mohammad Reza Loghmani, Barbara Caputo, Markus Vincze*

http://arxiv.org/abs/1709.05862v1

The ability to recognize objects is an essential skill for a robotic system acting in human-populated environments. Despite decades of effort from the robotic and vision research communities, robots are still missing good visual perceptual systems, preventing the use of autonomous agents for real-world applications. The progress is slowed down by the lack of a testbed able to accurately represent the world perceived by the robot in-the-wild. In order to fill this gap, we introduce a large-scale, multi-view object dataset collected with an RGB-D camera mounted on a mobile robot. The dataset embeds the challenges faced by a robot in a real-life application and provides a useful tool for validating object recognition algorithms. Besides describing the characteristics of the dataset, the paper evaluates the performance of a collection of well-established deep convolutional networks on the new dataset and analyzes the transferability of deep representations from Web images to robotic data. Despite the promising results obtained with such representations, the experiments demonstrate that object classification with real-life robotic data is far from being solved. Finally, we provide a comparative study to analyze and highlight the open challenges in robot vision, explaining the discrepancies in the performance.

• [cs.RO]**Sensor-Based Reactive Symbolic Planning in Partially Known Environments**

*Vasileios Vasilopoulos, William Vega-Brown, Omur Arslan, Nicholas Roy, Daniel E. Koditschek*

http://arxiv.org/abs/1709.05474v1

This paper considers the problem of completing assemblies of passive objects in nonconvex environments, cluttered with convex obstacles of unknown position, shape and size that satisfy a specific separation assumption. A differential drive robot equipped with a gripper and a LIDAR sensor, capable of perceiving its environment only locally, is used to position the passive objects in a desired configuration. The method combines the virtues of a deliberative planner generating high-level, symbolic commands, with the formal guarantees of convergence and obstacle avoidance of a reactive planner that requires little onboard computation and is used online. The validity of the proposed method is verified both with formal proofs and numerical simulations.

• [cs.RO]**Sim-to-real Transfer of Visuo-motor Policies for Reaching in Clutter: Domain Randomization and Adaptation with Modular Networks**

*Fangyi Zhang, Jürgen Leitner, Michael Milford, Peter Corke*

http://arxiv.org/abs/1709.05746v1

A modular method is proposed to learn and transfer visuo-motor policies from simulation to the real world in an efficient manner by combining domain randomization and adaptation. The feasibility of the approach is demonstrated in a table-top object reaching task where a 7 DoF arm is controlled in velocity mode to reach a blue cuboid in clutter through visual observations. The learned visuo-motor policies are robust to novel (not seen in training) objects in clutter and even a moving target, achieving a 93.3% success rate and 2.2 cm control accuracy.

• [cs.RO]**Topomap: Topological Mapping and Navigation Based on Visual SLAM Maps**

*Fabian Blöchliger, Marius Fehr, Marcin Dymczyk, Thomas Schneider, Roland Siegwart*

http://arxiv.org/abs/1709.05533v1

Visual robot navigation within large-scale, semistructured environments deals with various challenges such as computation-intensive path planning algorithms or insufficient knowledge about traversable spaces. Moreover, many state-of-the-art navigation approaches only operate locally instead of gaining a more conceptual understanding of the planning objective. This limits the complexity of tasks a robot can accomplish and makes it harder to deal with uncertainties that are present in the context of real-time robotics applications. In this work, we present Topomap, a framework which simplifies the navigation task by providing a map to the robot which is tailored for path planning use. This novel approach transforms a sparse feature-based map from a visual Simultaneous Localization And Mapping (SLAM) system into a three-dimensional topological map. This is done in two steps. First, we extract occupancy information directly from the noisy sparse point cloud. Then, we create a set of convex free-space clusters, which are the vertices of the topological map. We show that this representation improves the efficiency of global planning, and we provide a complete derivation of our algorithm. Planning experiments on real world datasets demonstrate that we achieve similar performance as RRT* with significantly lower computation times and storage requirements. Finally, we test our algorithm on a mobile robotic platform to prove its advantages.

• [cs.RO]**Why did the Robot Cross the Road? - Learning from Multi-Modal Sensor Data for Autonomous Road Crossing**

*Noha Radwan, Wera Winterhalter, Christian Dornhege, Wolfram Burgard*

http://arxiv.org/abs/1709.06039v1

We consider the problem of developing robots that navigate like pedestrians on sidewalks through city centers for performing various tasks including delivery and surveillance. One particular challenge for such robots is crossing streets without pedestrian traffic lights. To solve this task the robot has to decide based on its sensory input if the road is clear. In this work, we propose a novel multi-modal learning approach for the problem of autonomous street crossing. Our approach solely relies on laser and radar data and learns a classifier based on Random Forests to predict when it is safe to cross the road. We present extensive experimental evaluations using real-world data collected from multiple street crossing situations which demonstrate that our approach yields a safe and accurate street crossing behavior and generalizes well over different types of situations. A comparison to alternative methods demonstrates the advantages of our approach.

• [cs.SD]**Nonnegative HMM for Babble Noise Derived from Speech HMM: Application to Speech Enhancement**

*Nasser Mohammadiha, Arne Leijon*

http://arxiv.org/abs/1709.05559v1

Deriving a good model for multitalker babble noise can facilitate different speech processing algorithms, e.g. noise reduction, to reduce the so-called cocktail party difficulty. In the available systems, the fact that the babble waveform is generated as a sum of N different speech waveforms is not exploited explicitly. In this paper, first we develop a gamma hidden Markov model for power spectra of the speech signal, and then formulate it as a sparse nonnegative matrix factorization (NMF). Second, the sparse NMF is extended by relaxing the sparsity constraint, and a novel model for babble noise (gamma nonnegative HMM) is proposed in which the babble basis matrix is the same as the speech basis matrix, and only the activation factors (weights) of the basis vectors are different for the two signals over time. Finally, a noise reduction algorithm is proposed using the derived speech and babble models. All of the stationary model parameters are estimated using the expectation-maximization (EM) algorithm, whereas the time-varying parameters, i.e. the gain parameters of speech and babble signals, are estimated using a recursive EM algorithm. The objective and subjective listening evaluations show that the proposed babble model and the final noise reduction algorithm significantly outperform the conventional methods.

• [cs.SD]**Speech Dereverberation Using Nonnegative Convolutive Transfer Function and Spectro temporal Modeling**

*Nasser Mohammadiha, Simon Doclo*

http://arxiv.org/abs/1709.05557v1

This paper presents two single channel speech dereverberation methods to enhance the quality of speech signals that have been recorded in an enclosed space. For both methods, the room acoustics are modeled using a nonnegative approximation of the convolutive transfer function (NCTF), and to additionally exploit the spectral properties of the speech signal, such as the low rank nature of the speech spectrogram, the speech spectrogram is modeled using nonnegative matrix factorization (NMF). Two methods are described to combine the NCTF and NMF models. In the first method, referred to as the integrated method, a cost function is constructed by directly integrating the speech NMF model into the NCTF model, while in the second method, referred to as the weighted method, the NCTF and NMF based cost functions are weighted and summed. Efficient update rules are derived to solve both optimization problems. In addition, an extension of the integrated method is presented, which exploits the temporal dependencies of the speech signal. Several experiments are performed on reverberant speech signals with and without background noise, where the integrated method yields a considerably higher speech quality than the baseline NCTF method and a state of the art spectral enhancement method. Moreover, the experimental results indicate that the weighted method can even lead to a better performance in terms of instrumental quality measures, but that the optimal weighting parameter depends on the room acoustics and the utilized NMF model. Modeling the temporal dependencies in the integrated method was found to be useful only for highly reverberant conditions.

• [cs.SE]**Joining Jolie to Docker - Orchestration of Microservices on a Containers-as-a-Service Layer**

*Alberto Giaretta, Nicola Dragoni, Manuel Mazzara*

http://arxiv.org/abs/1709.05635v1

Cloud computing is steadily growing and, as IaaS vendors have started to offer pay-as-you-go billing policies, it is fundamental to achieve as much elasticity as possible, avoiding over-provisioning that would imply higher costs. In this paper, we briefly analyse the orchestration characteristics of PaaSSOA, a proposed architecture already implemented for Jolie microservices, and Kubernetes, one of the various orchestration plugins for Docker; then, we outline similarities and differences of the two approaches, with respect to their own domain of application. Furthermore, we investigate some ideas to achieve a federation of the two technologies, proposing an architectural composition of Jolie microservices on Docker Container-as-a-Service layer.

• [cs.SI]**Label propagation for clustering**

*Lovro Šubelj*

http://arxiv.org/abs/1709.05634v1

Label propagation is a heuristic method initially proposed for community detection in networks, while the method can be adopted also for other types of network clustering and partitioning. Among all the approaches and techniques described in this book, label propagation is neither the most accurate nor the most robust method. It is, however, without doubt one of the simplest and fastest clustering methods. Label propagation can be implemented with a few lines of programming code and applied to networks with hundreds of millions of nodes and edges on a standard computer, which is true only for a handful of other methods in the literature. In this chapter, we present the basic framework of label propagation, review different advances and extensions of the original method, and highlight its equivalences with other approaches. We show how label propagation can be used effectively for large-scale community detection, graph partitioning, identification of structurally equivalent nodes and other network structures. We conclude the chapter with a summary of the label propagation methods and suggestions for future research.
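The claim that label propagation fits in a few lines of code can be illustrated with a minimal sketch of the basic asynchronous variant. The function name `label_propagation`, the adjacency-dictionary input, and the tie-breaking rule (keep the current label when it is among the most frequent) are assumptions made for this example, not details taken from the chapter.

```python
import random
from collections import Counter


def label_propagation(adj, max_sweeps=100, seed=0):
    """Asynchronous label propagation on an undirected graph.

    adj maps each node to an iterable of its neighbours.
    Returns a dict mapping each node to its community label.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}      # every node starts in its own community
    nodes = list(adj)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)            # random update order in each sweep
        changed = False
        for v in nodes:
            if not adj[v]:
                continue              # an isolated node keeps its label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            if labels[v] in candidates:
                continue              # keep the current label on ties
            labels[v] = rng.choice(candidates)
            changed = True
        if not changed:
            break                     # no label changed in a full sweep
    return labels
```

Each sweep touches every edge once, so the cost per sweep is linear in the number of edges, which is what makes the method practical on very large networks.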

• [cs.SI]**Representation Learning on Graphs: Methods and Applications**

*William L. Hamilton, Rex Ying, Jure Leskovec*

http://arxiv.org/abs/1709.05584v1

Machine learning on graphs is an important and ubiquitous task with applications ranging from drug design to friendship recommendation in social networks. The primary challenge in this domain is finding a way to represent, or encode, graph structure so that it can be easily exploited by machine learning models. Traditionally, machine learning approaches relied on user-defined heuristics to extract features encoding structural information about a graph (e.g., degree statistics or kernel functions). However, recent years have seen a surge in approaches that automatically learn to encode graph structure into low-dimensional embeddings, using techniques based on deep learning and nonlinear dimensionality reduction. Here we provide a conceptual review of key advancements in this area of representation learning on graphs, including matrix factorization-based methods, random-walk based algorithms, and graph convolutional networks. We review methods to embed individual nodes as well as approaches to embed entire (sub)graphs. In doing so, we develop a unified framework to describe these recent approaches, and we highlight a number of important applications and directions for future work.

• [cs.SI]**The Geometric Block Model**

*Sainyam Galhotra, Arya Mazumdar, Soumyabrata Pal, Barna Saha*

http://arxiv.org/abs/1709.05510v1

To capture the inherent geometric features of many community detection problems, we propose to use a new random graph model of communities that we call a Geometric Block Model. The geometric block model generalizes the random geometric graphs in the same way that the well-studied stochastic block model generalizes the Erdos-Renyi random graphs. It is also a natural extension of random community models inspired by the recent theoretical and practical advancement in community detection. While being a topic of fundamental theoretical interest, our main contribution is to show that many practical community structures are better explained by the geometric block model. We also show that a simple triangle-counting algorithm to detect communities in the geometric block model is near-optimal. Indeed, even in the regime where the average degree of the graph grows only logarithmically with the number of vertices (sparse-graph), we show that this algorithm performs extremely well, both theoretically and practically. In contrast, the triangle-counting algorithm is far from being optimum for the stochastic block model. We simulate our results on both real and synthetic datasets to show superior performance of both the new model as well as our algorithm.
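A simplified illustration of triangle-based community detection: keep an edge only if its endpoints share enough common neighbours, then read communities off the connected components of what remains. This single-threshold rule, the function name `triangle_filter_communities`, and the `threshold` parameter are assumptions for illustration; the decision rule analysed in the paper may differ.

```python
def triangle_filter_communities(adj, threshold):
    """Group nodes by keeping only edges that lie in enough triangles.

    adj maps each node to a set of neighbours (undirected graph).
    An edge (u, v) is kept when u and v have at least `threshold`
    common neighbours. Returns a list of sets, one per community.
    """
    kept = {v: set() for v in adj}
    for u in adj:
        for v in adj[u]:
            common = len(set(adj[u]) & set(adj[v]))  # triangles through (u, v)
            if common >= threshold:
                kept[u].add(v)
                kept[v].add(u)

    # Connected components of the filtered graph via depth-first search.
    seen = set()
    communities = []
    for start in adj:
        if start in seen:
            continue
        component = set()
        stack = [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in kept[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        communities.append(component)
    return communities
```

The intuition is geometric: two nearby points in the same community tend to share many neighbours, so within-community edges close many triangles, while edges between communities close few.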

• [cs.SI]**Towards matching user mobility traces in large-scale datasets**

*Dániel Kondor, Behrooz Hashemian, Yves-Alexandre de Montjoye, Carlo Ratti*

http://arxiv.org/abs/1709.05772v1

The problem of unicity and reidentifiability of records in large-scale databases has been studied in different contexts and approaches, with focus on preserving privacy or matching records from different data sources. With an increasing number of service providers nowadays routinely collecting location traces of their users on unprecedented scales, there is a pronounced interest in the possibility of matching records and datasets based on spatial trajectories. Extending previous work on reidentifiability of spatial data and trajectory matching, we now present the first large-scale analysis of user matchability in real mobility datasets on realistic scales, i.e., among two datasets that each consist of several million people's mobility traces for a one-week interval. We extract the relevant statistical properties which influence the matching process and provide an estimate of the performance of matching and thus the matchability of users. We derive that for individuals with typical activity in the transportation system (those making 3-4 trips per day on average), a matching algorithm based on the co-occurrence of their activities is expected to achieve a 16.8% success rate based only on a one-week-long observation of their mobility traces. Extrapolating for longer time intervals, we expect a success rate of over 55% after four weeks of observation. We further evaluate different scenarios of data collection frequency, giving estimates of matchability over time in several realistic cases of mobility datasets.

• [cs.SY]**A Generalized Framework for Kullback-Leibler Markov Aggregation**

*Rana Ali Amjad, Clemens Blöchl, Bernhard C. Geiger*

http://arxiv.org/abs/1709.05907v1

This paper proposes an information-theoretic cost function for aggregating a Markov chain via a (possibly stochastic) mapping. The cost function is motivated by two objectives: 1) The process obtained by observing the Markov chain through the mapping should be close to a Markov chain, and 2) the aggregated Markov chain should retain as much of the temporal dependence structure of the original Markov chain as possible. We discuss properties of this parameterized cost function and show that it contains the cost functions previously proposed by Deng et al., Xu et al., and Geiger et al. as special cases. We moreover discuss these special cases providing a better understanding and highlighting potential shortcomings: For example, the cost function proposed by Geiger et al. is tightly connected to approximate probabilistic bisimulation, but leads to trivial solutions if optimized without regularization. We furthermore propose a simple heuristic to optimize our cost function for deterministic aggregations and illustrate its performance on a set of synthetic examples.

• [cs.SY]**Gaussian Process Latent Force Models for Learning and Stochastic Control of Physical Systems**

*Simo Särkkä, Mauricio A. Álvarez, Neil D. Lawrence*

http://arxiv.org/abs/1709.05409v1

This paper is concerned with estimation and stochastic control in physical systems which contain unknown input signals or forces. These unknown signals are modeled as Gaussian processes (GP) in the sense that GP models are used in machine learning. The resulting latent force models (LFMs) can be seen as hybrid models that contain a first-principles physical model part and a non-parametric GP model part. The aim of this paper is to collect and extend the statistical inference and learning methods for this kind of models, provide new theoretical results for the models, and to extend the methodology and theory to stochastic control of LFMs.

• [math.NA]**Machine learning approximation algorithms for high-dimensional fully nonlinear partial differential equations and second-order backward stochastic differential equations**

*Christian Beck, Weinan E, Arnulf Jentzen*

http://arxiv.org/abs/1709.05963v1

High-dimensional partial differential equations (PDE) appear in a number of models from the financial industry, such as in derivative pricing models, credit valuation adjustment (CVA) models, or portfolio optimization models. The PDEs in such applications are high-dimensional as the dimension corresponds to the number of financial assets in a portfolio. Moreover, such PDEs are often fully nonlinear due to the need to incorporate certain nonlinear phenomena in the model such as default risks, transaction costs, volatility uncertainty (Knightian uncertainty), or trading constraints. Such high-dimensional fully nonlinear PDEs are exceedingly difficult to solve as the computational effort for standard approximation methods grows exponentially with the dimension. In this work we propose a new method for solving high-dimensional fully nonlinear second-order PDEs. Our method can in particular be used to sample from high-dimensional nonlinear expectations. The method is based on (i) a connection between fully nonlinear second-order PDEs and second-order backward stochastic differential equations (2BSDEs), (ii) a merged formulation of the PDE and the 2BSDE problem, (iii) a temporal forward discretization of the 2BSDE and a spatial approximation via deep neural nets, and (iv) a stochastic gradient descent-type optimization procedure. Numerical results obtained using TensorFlow in Python illustrate the efficiency and the accuracy of the method in the cases of a $100$-dimensional Black-Scholes-Barenblatt equation, a $100$-dimensional Hamilton-Jacobi-Bellman equation, and a nonlinear expectation of a $100$-dimensional $G$-Brownian motion.

• [math.NA]**Variational Gaussian Approximation for Poisson Data**

*Simon Arridge, Kazufumi Ito, Bangti Jin, Chen Zhang*

http://arxiv.org/abs/1709.05885v1

The Poisson model is frequently employed to describe count data, but in a Bayesian context it leads to an analytically intractable posterior probability distribution. In this work, we analyze a variational Gaussian approximation to the posterior distribution arising from the Poisson model with a Gaussian prior. This is achieved by seeking an optimal Gaussian distribution minimizing the Kullback-Leibler divergence from the posterior distribution to the approximation, or equivalently maximizing the lower bound for the model evidence. We derive an explicit expression for the lower bound, and show the existence and uniqueness of the optimal Gaussian approximation. The lower bound functional can be viewed as a variant of classical Tikhonov regularization that penalizes also the covariance. Then we develop an efficient alternating direction maximization algorithm for solving the optimization problem, and analyze its convergence. We discuss strategies for reducing the computational complexity via low rank structure of the forward operator and the sparsity of the covariance. Further, as an application of the lower bound, we discuss hierarchical Bayesian modeling for selecting the hyperparameter in the prior distribution, and propose a monotonically convergent algorithm for determining the hyperparameter. We present extensive numerical experiments to illustrate the Gaussian approximation and the algorithms.

• [math.RA]**MacWilliams' extension theorem for infinite rings**

*Friedrich Martin Schneider, Jens Zumbrägel*

http://arxiv.org/abs/1709.06070v1

Finite Frobenius rings have been characterized as precisely those finite rings satisfying the MacWilliams extension property, by work of Wood. In the present note we offer a generalization of this remarkable result to the realm of Artinian rings. Namely, we prove that a left Artinian ring has the left MacWilliams property if and only if it is left pseudo-injective and its finitary left socle embeds into the semisimple quotient. Providing a topological perspective on the MacWilliams property, we also show that the finitary left socle of a left Artinian ring embeds into the semisimple quotient if and only if it admits a finitarily left torsion-free character, if and only if the Pontryagin dual of the regular left module is almost monothetic. In conclusion, an Artinian ring has the MacWilliams property if and only if it is finitarily Frobenius, i.e., it is quasi-Frobenius and its finitary socle embeds into the semisimple quotient.

• [math.ST]**A Sharp Lower Bound for Mixed-membership Estimation**

*Jiashun Jin, Zheng Tracy Ke*

http://arxiv.org/abs/1709.05603v1

Consider an undirected network with $n$ nodes and $K$ perceivable communities, where some nodes may have mixed memberships. We assume that for each node $1 \leq i \leq n$, there is a probability mass function $\pi_i$ defined over $\{1, 2, \ldots, K\}$ such that $$\pi_i(k) = \text{the weight of node $i$ on community $k$}, \qquad 1 \leq k \leq K.$$ The goal is to estimate $\{\pi_i, 1 \leq i \leq n\}$ (i.e., membership estimation). We model the network with the *degree-corrected mixed membership (DCMM)* model [Mixed-SCORE]. Since for many natural networks, the degrees have an approximate power-law tail, we allow *severe degree heterogeneity* in our model. For any membership estimation $\{\hat{\pi}_i, 1 \leq i \leq n\}$, since each $\pi_i$ is a probability mass function, it is natural to measure the errors by the average $\ell^1$-norm $$\frac{1}{n} \sum_{i = 1}^n \| \hat{\pi}_i - \pi_i\|_1.$$ We also consider a variant of the $\ell^1$-loss, where each $\|\hat{\pi}_i - \pi_i\|_1$ is re-weighted by the degree parameter $\theta_i$ in DCMM (to be introduced). We present a sharp lower bound. We also show that such a lower bound is achievable under a broad situation. More discussion in this vein is continued in our forthcoming manuscript. The results are very different from those on community detection. For community detection, the focus is on the special case where all $\pi_i$ are degenerate; the goal is clustering, so Hamming distance is the natural choice of loss function, and the rate can be exponentially fast. The setting here is broader and more difficult: it is more natural to use the $\ell^1$-loss, and the rate is only polynomially fast.
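The average $\ell^1$ loss described above is simple to compute. The sketch below is illustrative only: the function names are hypothetical, and minimizing over relabelings of the $K$ communities is an added assumption (community labels are generally identifiable only up to permutation), not something stated in the abstract.

```python
from itertools import permutations

def mean_l1_error(pi_hat, pi):
    """Average over nodes of the l1 distance between membership vectors."""
    n = len(pi)
    return sum(
        sum(abs(a - b) for a, b in zip(row_hat, row))
        for row_hat, row in zip(pi_hat, pi)
    ) / n

def mean_l1_error_up_to_labels(pi_hat, pi):
    """Same loss, minimized over permutations of the K community labels.

    Brute force over K! permutations, so only sensible for small K.
    """
    K = len(pi[0])
    best = float("inf")
    for perm in permutations(range(K)):
        permuted = [[row[k] for k in perm] for row in pi_hat]
        best = min(best, mean_l1_error(permuted, pi))
    return best

# Toy example: the estimate is perfect except that the two labels are swapped.
pi = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
pi_hat = [[0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
print(mean_l1_error(pi_hat, pi), mean_l1_error_up_to_labels(pi_hat, pi))
```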

• [math.ST]**A generalization of the Log Lindley distribution -- its properties and applications**

*S. Chakraborty, S. H. Ong, C. M. Ng*

http://arxiv.org/abs/1709.05613v1

An extension of the two-parameter Log-Lindley distribution of Gomez et al. (2014) with support in (0, 1) is proposed. Its important properties, such as the cumulative distribution function, moments, survival function, hazard rate function, Shannon entropy, stochastic ordering and convexity (concavity) conditions, are derived. An application to the distorted premium principle is outlined, and parameter estimation by the method of maximum likelihood is also presented. We also consider the use of a re-parameterized form of the proposed distribution in regression modeling for bounded responses, applying it to a real-life data set in comparison with beta regression and log-Lindley regression models.

• [math.ST]**An alternative to continuous univariate distributions supported on a bounded interval: The BMT distribution**

*Camilo Jose Torres-Jimenez, Alvaro Mauricio Montenegro-Diaz*

http://arxiv.org/abs/1709.05534v1

In this paper, we introduce the BMT distribution as a unimodal alternative to continuous univariate distributions supported on a bounded interval. The ideas behind the mathematical formulation of this new distribution come from computer-aided geometric design, specifically from Bézier curves. First, we review general properties of a distribution given by parametric equations and extend the definition of a Bézier distribution. Then, after proposing the BMT cumulative distribution function, we derive its probability density function and closed-form expressions for the quantile function, median, interquartile range, mode, and moments. The domain change from [0,1] to [c,d] is mentioned. Estimation of parameters is approached by the methods of maximum likelihood and maximum product of spacing. We test the numerical estimation procedures using some simulated data. The usefulness and flexibility of the new distribution are illustrated with three real data sets. The BMT distribution has significant potential to estimate domain parameters and to model data outside the scope of the beta or similar distributions.

• [math.ST]**Nonparametric Shape-restricted Regression**

*Adityanand Guntuboyina, Bodhisattva Sen*

http://arxiv.org/abs/1709.05707v1

We consider the problem of nonparametric regression under shape constraints. The main examples include isotonic regression (with respect to any partial order), unimodal/convex regression, additive shape-restricted regression, and the constrained single index model. We review some of the theoretical properties of the least squares estimator (LSE) in these problems, emphasizing the adaptive nature of the LSE. In particular, we study the risk behavior of the LSE and its pointwise limiting distribution theory, with special emphasis on isotonic regression. We survey various methods for constructing pointwise confidence intervals around these shape-restricted functions. We also briefly discuss the computation of the LSE and indicate some open research problems and future directions.
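For the simplest case mentioned here, isotonic regression on a totally ordered design, the LSE can be computed with the classical pool-adjacent-violators algorithm (PAVA). The sketch below is a minimal illustration of that standard algorithm; the function name and the optional weights argument are choices made for this example, and it does not cover partial orders or the other shape constraints.

```python
def pava(y, w=None):
    """Least squares fit of a nondecreasing sequence to y (optional weights w)."""
    if w is None:
        w = [1.0] * len(y)
    blocks = []  # each block is [weighted mean, total weight, number of points]
    for yi, wi in zip(y, w):
        blocks.append([yi, wi, 1])
        # Merge backwards while the last two block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    fit = []
    for mean, _, count in blocks:
        fit.extend([mean] * count)
    return fit

print(pava([1.0, 3.0, 2.0, 4.0]))  # [1.0, 2.5, 2.5, 4.0]
```

The violating pair (3, 2) is pooled into its average, which is the piecewise-constant behavior characteristic of the isotonic LSE.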

• [math.ST]**Rigorous Analysis for Efficient Statistically Accurate Algorithms for Solving Fokker-Planck Equations in Large Dimensions**

*Nan Chen, Andrew J. Majda, Xin T. Tong*

http://arxiv.org/abs/1709.05585v1

This article presents a rigorous analysis for efficient statistically accurate algorithms for solving the Fokker-Planck equations associated with high-dimensional nonlinear turbulent dynamical systems with conditional Gaussian structures. Despite the conditional Gaussianity, these nonlinear systems contain many strong non-Gaussian features such as intermittency and fat-tailed probability density functions (PDFs). The algorithms involve a hybrid strategy that requires only a small number of samples $L$ to capture both the transient and the equilibrium non-Gaussian PDFs with high accuracy. Here, a conditional Gaussian mixture in a high-dimensional subspace via an extremely efficient parametric method is combined with a judicious Gaussian kernel density estimation in the remaining low-dimensional subspace. Rigorous analysis shows that the mean integrated squared error in the recovered PDFs in the high-dimensional subspace is bounded by the inverse square root of the determinant of the conditional covariance, where the conditional covariance is completely determined by the underlying dynamics and is independent of $L$. This is fundamentally different from a direct application of kernel methods to solve the full PDF, where $L$ needs to increase exponentially with the dimension of the system and the bandwidth shrinks. A detailed comparison between different methods justifies that the efficient statistically accurate algorithms are able to overcome the curse of dimensionality. It is also shown with mathematical rigour that these algorithms are robust in long time provided that the system is controllable and stochastically stable. Particularly, dynamical systems with energy-conserving quadratic nonlinearity as in many geophysical and engineering turbulence are proved to have these properties.

• [math.ST]**Semi-supervised learning**

*Alejandro Cholaquidis, Ricardo Fraiman, Mariela Sued*

http://arxiv.org/abs/1709.05673v1

Semi-supervised learning deals with the problem of how, if possible, to take advantage of a huge amount of unclassified data to perform classification in situations where, typically, the labelled data are few. Even though this is not always possible (it depends on how useful it is to know the distribution of the unlabelled data in the inference of the labels), several algorithms have been proposed recently. A new algorithm is proposed that, under almost necessary conditions, asymptotically attains the performance of the best theoretical rule when the size of the unlabelled data tends to infinity. The set of necessary assumptions, although reasonable, shows that semi-parametric classification only works for very well conditioned problems.

• [math.ST]**Spectral Radii of Truncated Circular Unitary Matrices**

*Wenhao Gui, Yongcheng Qi*

http://arxiv.org/abs/1709.05441v1

Consider a truncated circular unitary matrix, which is a $p_n$ by $p_n$ submatrix of an $n$ by $n$ circular unitary matrix obtained by deleting the last $n-p_n$ columns and rows. Jiang and Qi (2017) proved that the maximum absolute value of the eigenvalues (known as the spectral radius) of the truncated matrix, after proper normalization, converges in distribution to the Gumbel distribution if $p_n/n$ is bounded away from $0$ and $1$. In this paper we investigate the limiting distribution of the spectral radius under one of the following four conditions: (1) $p_n\to\infty$ and $p_n/n\to 0$ as $n\to\infty$; (2) $(n-p_n)/n\to 0$ and $(n-p_n)/(\log n)^3\to\infty$ as $n\to\infty$; (3) $n-p_n\to\infty$ and $(n-p_n)/\log n\to 0$ as $n\to\infty$; and (4) $n-p_n=k\ge 1$ is a fixed integer. We prove that the spectral radius converges in distribution to the Gumbel distribution under the first three conditions and to a reversed Weibull distribution under the fourth condition.

• [physics.soc-ph]**Mapping temporal-network percolation to weighted, static event graphs**

*Mikko Kivelä, Jordan Cambe, Jari Saramäki, Márton Karsai*

http://arxiv.org/abs/1709.05647v1

Many processes of spreading and diffusion take place on temporal networks, and their outcomes are influenced by correlations in the times of contact. These correlations have a particularly strong influence on processes where the spreading agent has a limited lifetime at nodes: disease spreading (recovery time), diffusion of rumors (lifetime of information), and passenger routing (maximum acceptable time between transfers). Here, we introduce weighted event graphs as a powerful and fast framework for studying connectivity determined by time-respecting paths where the allowed waiting times between contacts have an upper limit. We study percolation on the weighted event graphs and in the underlying temporal networks, with simulated and real-world networks. We show that this type of temporal-network percolation is analogous to directed percolation, and that it can be characterized by multiple order parameters.
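One way to picture a weighted event graph is as follows: each contact event becomes a node, two events are linked when they share a network node and the later one follows the earlier one within a maximum waiting time, and the link weight is that time gap. The sketch below follows this reading of the abstract. The adjacency rule, the function names, and the use of weakly connected components as a connectivity summary are assumptions for illustration, not the authors' implementation.

```python
from collections import defaultdict

def weighted_event_graph(events, dt_max):
    """events: list of (u, v, t). Returns {(i, j): t_j - t_i} for linked events."""
    order = sorted(range(len(events)), key=lambda i: events[i][2])
    edges = {}
    for pos, i in enumerate(order):
        ui, vi, ti = events[i]
        for j in order[pos + 1:]:
            uj, vj, tj = events[j]
            if tj - ti > dt_max:
                break  # events are time-sorted, so later ones are farther away
            if tj > ti and {ui, vi} & {uj, vj}:
                edges[(i, j)] = tj - ti
    return edges

def component_sizes(n_events, edges, dt):
    """Sizes of weakly connected components using only links with weight <= dt."""
    parent = list(range(n_events))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), weight in edges.items():
        if weight <= dt:
            parent[find(i)] = find(j)
    sizes = defaultdict(int)
    for i in range(n_events):
        sizes[find(i)] += 1
    return sorted(sizes.values(), reverse=True)

events = [("a", "b", 1), ("b", "c", 3), ("c", "d", 10), ("d", "e", 12)]
graph = weighted_event_graph(events, dt_max=8)
for dt in (1, 5, 8):
    print(dt, component_sizes(len(events), graph, dt))
```

Because the weights store the waiting times, the graph is built once for the largest waiting time of interest and then thresholded for any smaller value, which is what makes sweeping over the waiting-time limit cheap.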

• [q-bio.OT]**An Algorithmic Information Calculus for Causal Discovery and Reprogramming Systems**

*Hector Zenil, Narsis A. Kiani, Francesco Marabita, Yue Deng, Szabolcs Elias, Angelika Schmidt, Gordon Ball, Jesper Tegnér*

http://arxiv.org/abs/1709.05429v1

We introduce a conceptual framework and an interventional calculus to reconstruct the dynamics of, steer, and manipulate systems based on their intrinsic algorithmic probability using the universal principles of the theory of computability and algorithmic information. By applying sequences of controlled interventions to systems and networks, we estimate how changes in their algorithmic information content are reflected in positive/negative shifts towards and away from randomness. The strong connection between approximations to algorithmic complexity (the size of the shortest generating mechanism) and causality induces a sequence of perturbations ranking the network elements by their steering capabilities. This new dimension unmasks a separation between causal and non-causal components, providing a suite of powerful parameter-free algorithms of wide applicability, ranging from optimal dimension reduction and maximal randomness analysis to system control. We introduce methods for reprogramming systems that do not require full knowledge of or access to the system's actual kinetic equations or any probability distributions. A causal interventional analysis of synthetic and regulatory biological networks reveals how the algorithmic reprogramming qualitatively reshapes the system's dynamic landscape. For example, during cellular differentiation we find a decrease in the number of elements corresponding to a transition away from randomness, and a combination of the system's intrinsic properties and its intrinsic capabilities to be algorithmically reprogrammed can reconstruct an epigenetic landscape. The interventional calculus is broadly applicable to predictive causal inference of systems such as networks and is of relevance to a variety of machine and causal learning techniques driving model-based approaches to better understand and manipulate complex systems.

• [stat.AP]**An adsorbed gas estimation model for shale gas reservoirs via statistical learning**

*Yuntian Chen, Su Jiang, Dongxiao Zhang, Chaoyang Liu*

http://arxiv.org/abs/1709.05619v1

Shale gas plays an important role in reducing pollution and adjusting the structure of world energy. Gas content estimation is particularly significant in shale gas resource evaluation. There exist various estimation methods, such as first principle methods and empirical models. However, resource evaluation presents many challenges, especially the insufficient accuracy of existing models and the high cost resulting from time-consuming adsorption experiments. In this research, a low-cost and high-accuracy model based on geological parameters is constructed through statistical learning methods to estimate adsorbed shale gas content

• [stat.AP]**Applying Machine Learning Methods to Enhance the Distribution of Social Services in Mexico**

*Kris Sankaran, Diego Garcia-Olano, Mobin Javed, Maria Fernanda Alcala-Durand, Adolfo De Unánue, Paul van der Boor, Eric Potash, Roberto Sánchez Avalos, Luis Iñaki Alberro Encinas, Rayid Ghani*

http://arxiv.org/abs/1709.05551v1

The Government of Mexico's social development agency, SEDESOL, is responsible for the administration of social services and has the mission of lifting Mexican families out of poverty. One key challenge they face is matching people who have social service needs with the services SEDESOL can provide accurately and efficiently. In this work we describe two specific applications implemented in collaboration with SEDESOL to enhance their distribution of social services. The first problem relates to systematic underreporting on applications for social services, which makes it difficult to identify where to prioritize outreach. Responding that five people reside in a home when only three do is a type of underreporting that could occur while a social worker conducts a home survey with a family to determine their eligibility for services. The second involves approximating multidimensional poverty profiles across households. That is, can we characterize different types of vulnerabilities -- for example, food insecurity and lack of health services -- faced by those in poverty? We detail the problem context, available data, our machine learning formulation, experimental results, and effective feature sets. As far as we are aware this is the first time government data of this scale has been used to combat poverty within Mexico. We found that survey data alone can suggest potential underreporting. Further, we found geographic features useful for housing and service related indicators and transactional data informative for other dimensions of poverty. The results from our machine learning system for estimating poverty profiles will directly help better match 7.4 million individuals to social programs.

• [stat.AP]**Forecasting of commercial sales with large scale Gaussian Processes**

*Rodrigo Rivera, Evgeny Burnaev*

http://arxiv.org/abs/1709.05548v1

This paper argues that there has not been enough discussion of applications of Gaussian processes in the fast-moving consumer goods industry. Yet this technique can be important because, for example, it can provide automatic feature relevance determination, and the posterior mean can unlock insights on the data. Significant challenges are the large size and high dimensionality of commercial data at a point of sale. The study reviews approaches to Gaussian process modeling for large data sets, evaluates their performance on commercial sales, and shows the value of this type of model as a decision-making tool for management.

• [stat.AP]**Reassessing Accuracy Rates of Median Decisions**

*Andrea Capotorti, Frank Lad, Giuseppe Sanfilippo*

http://arxiv.org/abs/1709.05637v1

We show how Bruno de Finetti's fundamental theorem of prevision has computable applications in statistical problems that involve only partial information. Specifically, we assess accuracy rates for median decision procedures used in the radiological diagnosis of asbestosis. Conditional exchangeability of individual radiologists' diagnoses is recognized as more appropriate than independence which is commonly presumed. The FTP yields coherent bounds on probabilities of interest when available information is insufficient to determine a complete distribution. Further assertions that are natural to the problem motivate a partial ordering of conditional probabilities, extending the computation from a linear to a quadratic programming problem.

• [stat.ME]**Bayesian analysis of three parameter singular and absolute continuous Marshall-Olkin bivariate Pareto distribution**

*Biplab Paul, Arabin Kumar Dey, Sanku Dey*

http://arxiv.org/abs/1709.05906v1

This paper provides a Bayesian analysis of the Marshall-Olkin bivariate Pareto distribution. We consider the three-parameter bivariate Pareto distribution, taking both the singular and the absolutely continuous versions of the probability density function into consideration. We consider two types of prior: a reference prior and a gamma prior. Bayes estimates of the parameters are calculated based on a slice-cum-Gibbs sampler and the Lindley approximation. A credible interval is also provided for all methods and all prior distributions.

• [stat.ME]**Efficient Statistically Accurate Algorithms for the Fokker-Planck Equation in Large Dimensions**

*Nan Chen, Andrew J. Majda*

http://arxiv.org/abs/1709.05562v1

Solving the Fokker-Planck equation for high-dimensional complex turbulent dynamical systems is an important and practical issue. However, most traditional methods suffer from the curse of dimensionality and have difficulties in capturing the fat tailed highly intermittent probability density functions (PDFs) of complex systems in turbulence, neuroscience and excitable media. In this article, efficient statistically accurate algorithms are developed for solving both the transient and the equilibrium solutions of Fokker-Planck equations associated with high-dimensional nonlinear turbulent dynamical systems with conditional Gaussian structures. The algorithms involve a hybrid strategy that requires only a small number of ensembles. Here, a conditional Gaussian mixture in a high-dimensional subspace via an extremely efficient parametric method is combined with a judicious non-parametric Gaussian kernel density estimation in the remaining low-dimensional subspace. Particularly, the parametric method provides closed analytical formulae for determining the conditional Gaussian distributions in the high-dimensional subspace and is therefore computationally efficient and accurate. The full non-Gaussian PDF of the system is then given by a Gaussian mixture. Different from the traditional particle methods, each conditional Gaussian distribution here covers a significant portion of the high-dimensional PDF. Therefore a small number of ensembles is sufficient to recover the full PDF, which overcomes the curse of dimensionality. Notably, the mixture distribution has a significant skill in capturing the transient behavior with fat tails of the high-dimensional non-Gaussian PDFs, and this facilitates the algorithms in accurately describing the intermittency and extreme events in complex turbulent systems.
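The last step described above, where the full PDF is written as a Gaussian mixture, can be pictured with a one-dimensional toy example. In the sketch below the component means and variances are made-up numbers standing in for the conditional Gaussian statistics that the paper obtains from closed-form formulae, and the equal weights $1/L$ are an assumption; it illustrates only the mixture evaluation, not the authors' algorithm.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of N(mean, var) at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def mixture_pdf(x, means, variances):
    """Equal-weight mixture of L Gaussians, one per ensemble member."""
    L = len(means)
    return sum(gaussian_pdf(x, m, v) for m, v in zip(means, variances)) / L

# Hypothetical conditional means and variances for L = 3 ensemble members.
means = [-2.0, 0.0, 3.0]
variances = [0.5, 1.0, 2.0]

# Check numerically that the mixture integrates to one on a wide grid.
h = 0.01
total = sum(mixture_pdf(-20.0 + k * h, means, variances) for k in range(4001)) * h
print(total)
```

Each component has a finite width set by its variance rather than a bandwidth that must shrink with the sample size, which is the intuition behind needing only a small ensemble.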

• [stat.ME]**Estimating the Variance of Measurement Errors in Running Variables of Sharp Regression Discontinuity Designs**

*Kota Mori*

http://arxiv.org/abs/1709.05863v1

Treatment effect estimation through regression discontinuity designs faces a severe challenge when the running variable is measured with errors, as the errors smooth out the discontinuity that the identification hinges on. Recent studies have shown that the variance of the measurement error term plays an important role in both bias correction and identification in such situations, but little has been studied about how one can estimate the unknown variance from data. This paper proposes two estimators for the variance of measurement errors. The proposed estimators do not rely on any external source other than the data of the running variable and treatment assignment.

• [stat.ME]**Method for Mode Mixing Separation in Empirical Mode Decomposition**

*Olav B. Fosso, Marta Molinas*

http://arxiv.org/abs/1709.05547v1

The Empirical Mode Decomposition (EMD) is a signal analysis method that separates multi-component signals into single oscillatory modes called intrinsic mode functions (IMFs), each of which can generally be associated with a physical meaning of the process from which the signal is obtained. When the phenomenon of mode mixing occurs as a result of the EMD sifting process, the IMFs can lose their physical meaning, hindering the interpretation of the results of the analysis. In the paper "One or Two frequencies? The Empirical Mode Decomposition Answers", Gabriel Rilling and Patrick Flandrin [3] presented a rigorous mathematical analysis that explains how EMD behaves in the case of a composite two-tone signal, and the amplitude and frequency ratios for which EMD will perform a good separation of tones. However, the authors did not propose a solution for separating the neighboring tones that will naturally remain mixed after an EMD. In this paper, based on the findings of Rilling and Flandrin, a method is presented that can separate neighbouring spectral components that would naturally remain within a single IMF. This method is based on reversing the conditions under which mode mixing occurs, as presented in the map by Rilling and Flandrin in the above-mentioned paper. Numerical experiments with signals containing closely spaced spectral components show the effective separation of modes that EMD can perform after this principle is applied. The results also verify the regimes presented in the theoretical analysis by Rilling and Flandrin.

• [stat.ME]**Parameter Regimes in Partial Functional Panel Regression**

*Dominik Liebl, Fabian Walders*

http://arxiv.org/abs/1709.05786v1

We propose a partial functional linear regression model for panel data with time varying parameters. The parameter vector of the multivariate model component is allowed to be completely time varying while the function-valued parameter of the functional model component is assumed to change over K unknown parameter regimes. We derive consistency for the suggested estimators and for our classification procedure used to detect the K unknown parameter regimes. In addition, we derive the convergence rates of our estimators under a double asymptotic where we differentiate among different asymptotic scenarios depending on the relative order of the panel dimensions n and T. The statistical model is motivated by our real data application, where we consider the so-called idiosyncratic volatility puzzle using high frequency data from the S&P500.

• [stat.ME]**Regularization and Variable Selection with Copula Prior**

*Rahul Sharma, Sourish Das*

http://arxiv.org/abs/1709.05514v1

We propose the copula prior for regularization and variable selection. Under certain choices of the copula, we show that the lasso, elastic net, or g-prior are special cases of the copula prior. Besides, we propose the "lasso with Gauss copula prior" and the "lasso with t-copula prior." The simulation study and real-world data show that the copula prior often outperforms the lasso and elastic net while having a comparable sparsity of representation. Also, the copula prior encourages a grouping effect: strongly correlated predictors tend to be in or out of the model collectively under the copula prior. The copula prior is particularly useful when the predictors are highly correlated and the number of predictors ($p$) is larger than the number of observations ($n$). The copula prior is a generic method, which can be used to define new priors.

• [stat.ME]**Robust estimation in single index models with asymmetric errors**

*Claudio Agostinelli, Ana M. Bianco, Graciela Boente*

http://arxiv.org/abs/1709.05422v1

We consider a robust estimation method for the parametric and nonparametric components of a single index model with asymmetric errors. The robust profile estimators are based on a stepwise procedure. Consistency results for the robust estimators and the asymptotic distribution of the single index parameter estimator are obtained. Besides, the empirical influence curve allows us to study the estimators' sensitivity to anomalous observations. Through a numerical study, the performance of the robust proposal is compared with that of its classical relatives under a log-Gamma model. The numerical experiment shows the good robustness properties of the proposed estimators and the advantages of considering robust estimators as well.

• [stat.ME]**Some variations on Random Survival Forest with application to Cancer Research**

*Arabin Kumar Dey, Anshul Juneja*

http://arxiv.org/abs/1709.05515v1

Random survival forests can be extremely time-consuming for large data sets. In this paper we propose a few computationally efficient algorithms for predicting the survival function. We explore the behavior of the algorithms on different cancer data sets. Our construction also handles right-censored data. We have also applied the same approach to the competing-risk survival function.

• [stat.ME]**Statistical inference on random dot product graphs: a survey**

*Avanti Athreya, Donniell E. Fishkind, Keith Levin, Vince Lyzinski, Youngser Park, Yichen Qin, Daniel L. Sussman, Minh Tang, Joshua T. Vogelstein, Carey E. Priebe*

http://arxiv.org/abs/1709.05454v1

The random dot product graph (RDPG) is an independent-edge random graph that is analytically tractable and, simultaneously, either encompasses or can successfully approximate a wide range of random graphs, from relatively simple stochastic block models to complex latent position graphs. In this survey paper, we describe a comprehensive paradigm for statistical inference on random dot product graphs, a paradigm centered on spectral embeddings of adjacency and Laplacian matrices. We examine the analogues, in graph inference, of several canonical tenets of classical Euclidean inference: in particular, we summarize a body of existing results on the consistency and asymptotic normality of the adjacency and Laplacian spectral embeddings, and the role these spectral embeddings can play in the construction of single- and multi-sample hypothesis tests for graph data. We investigate several real-world applications, including community detection and classification in large social networks and the determination of functional and biologically relevant network properties from an exploratory data analysis of the Drosophila connectome. We outline requisite background and current open problems in spectral graph inference.

• [stat.ML]**Bayesian nonparametric Principal Component Analysis**

*Clément Elvira, Pierre Chainais, Nicolas Dobigeon*

http://arxiv.org/abs/1709.05667v1

Principal component analysis (PCA) is a very popular tool for dimension reduction. The selection of the number of significant components is essential but often based on practical heuristics that depend on the application. Only a few works have proposed a probabilistic approach able to infer the number of significant components. To this end, this paper introduces a Bayesian nonparametric principal component analysis (BNP-PCA). The proposed model projects observations onto a random orthogonal basis which is assigned a prior distribution defined on the Stiefel manifold. The prior on factor scores involves an Indian buffet process to model the uncertainty related to the number of components. The parameters of interest as well as the nuisance parameters are finally inferred within a fully Bayesian framework via Monte Carlo sampling. A study of the (in)consistency of the marginal maximum a posteriori estimator of the latent dimension is carried out. A new estimator of the subspace dimension is proposed. Moreover, for the sake of statistical significance, a Kolmogorov-Smirnov test based on the posterior distribution of the principal components is used to refine this estimate. The behaviour of the algorithm is first studied on various synthetic examples. Finally, the proposed BNP dimension reduction approach is shown to be easily yet efficiently coupled with clustering or latent factor models within a unique framework.

• [stat.ML]**Constrained Bayesian Optimization for Automatic Chemical Design**

*Ryan-Rhys Griffiths*

http://arxiv.org/abs/1709.05501v1

Automatic Chemical Design leverages recent advances in deep generative modelling to provide a framework for performing continuous optimization of molecular properties. Although the provision of a continuous representation for prospective lead drug candidates has opened the door to hitherto inaccessible tools of mathematical optimization, some challenges remain for the design process. One known pathology is the model's tendency to decode invalid molecular structures. The goal of this thesis is to test the hypothesis that the origin of this pathology is rooted in the current formulation of Bayesian optimization. Recasting the optimization procedure as a constrained Bayesian optimization problem results in novel drug compounds produced by the model consistently ranking in the 100th percentile of the distribution over training set scores.

• [stat.ML]**Learning Mixtures of Multi-Output Regression Models by Correlation Clustering for Multi-View Data**

*Eric Lei, Kyle Miller, Artur Dubrawski*

http://arxiv.org/abs/1709.05602v1

In many datasets, different parts of the data may have their own patterns of correlation, a structure that can be modeled as a mixture of local linear correlation models. The task of finding these mixtures is known as correlation clustering. In this work, we propose a linear correlation clustering method for datasets whose features are pre-divided into two views. The method, called Canonical Least Squares (CLS) clustering, is inspired by multi-output regression and Canonical Correlation Analysis. CLS clusters can be interpreted as variations in the regression relationship between the two views. The method is useful for data mining and data interpretation. Its utility is demonstrated on a synthetic dataset and a stock market dataset.

• [stat.ML]**Multivariate Gaussian Network Structure Learning**

*Xingqi Du, Subhashis Ghosal*

http://arxiv.org/abs/1709.05552v1

We consider a graphical model where a multivariate normal vector is associated with each node of the underlying graph and estimate the graphical structure. We minimize a loss function obtained by regressing the vector at each node on those at the remaining ones under a group penalty. We show that the proposed estimator can be computed by a fast convex optimization algorithm. We show that as the sample size increases, the regression coefficients and the graphical structure are correctly estimated with probability tending to one. By extensive simulations, we show the superiority of the proposed method over comparable procedures. We apply the technique to two real datasets. The first is to identify gene and protein networks showing up in cancer cell lines, and the second is to reveal the connections among different industries in the US.

• [stat.ML]**Neonatal Seizure Detection using Convolutional Neural Networks**

*Alison O'Shea, Gordon Lightbody, Geraldine Boylan, Andriy Temko*

http://arxiv.org/abs/1709.05849v1

This study presents a novel end-to-end architecture that learns hierarchical representations from raw EEG data using fully convolutional deep neural networks for the task of neonatal seizure detection. The deep neural network acts as both feature extractor and classifier, allowing for end-to-end optimization of the seizure detector. The designed system is evaluated on a large dataset of continuous unedited multi-channel neonatal EEG totaling 835 hours and comprising 1389 seizures. The proposed deep architecture, with sample-level filters, achieves an accuracy that is comparable to the state-of-the-art SVM-based neonatal seizure detector, which operates on a set of carefully designed hand-crafted features. The fully convolutional architecture allows for the localization of EEG waveforms and patterns that result in high seizure probabilities for further clinical examination.

• [stat.ML]**Relevant Ensemble of Trees**

*Gitesh Dawer, Adrian Barbu*

http://arxiv.org/abs/1709.05545v1

Tree ensembles are flexible predictive models that can capture relevant variables and, to some extent, their interactions in a compact and interpretable manner. Most algorithms for obtaining tree ensembles are based on versions of boosting or Random Forest. Previous work showed that boosting algorithms exhibit a cyclic behavior of selecting the same tree again and again due to the way the loss is optimized. At the same time, Random Forest is not based on loss optimization and obtains a more complex and less interpretable model. In this paper we present a novel method for obtaining compact tree ensembles by growing a large pool of trees in parallel with many independent boosting threads and then selecting a small subset and updating their leaf weights by loss optimization. We allow the trees in the initial pool to have different depths, which further helps with generalization. Experiments on real datasets show that the obtained model usually has a smaller loss than boosting, which is also reflected in a lower misclassification error on the test set.

• [stat.ML]**Subset Labeled LDA for Large-Scale Multi-Label Classification**

*Yannis Papanikolaou, Grigorios Tsoumakas*

http://arxiv.org/abs/1709.05480v1

Labeled Latent Dirichlet Allocation (LLDA) is an extension of the standard unsupervised Latent Dirichlet Allocation (LDA) algorithm that addresses multi-label learning tasks. Previous work has shown it to perform on par with other state-of-the-art multi-label methods. Nonetheless, with increasing label set sizes, LLDA encounters scalability issues. In this work, we introduce Subset LLDA, a simple variant of the standard LLDA algorithm that not only scales effectively to problems with hundreds of thousands of labels but also improves over the LLDA state of the art. We conduct extensive experiments on eight data sets, with label set sizes ranging from hundreds to hundreds of thousands, comparing our proposed algorithm with the previously proposed LLDA algorithms (Prior-LDA, Dep-LDA), as well as the state of the art in extreme multi-label classification. The results show a steady advantage of our method over the other LLDA algorithms and competitive results compared to the extreme multi-label classification algorithms.

• [stat.ML]**The generalised random dot product graph**

*Patrick Rubin-Delanchy, Carey E. Priebe, Minh Tang*

http://arxiv.org/abs/1709.05506v1

This paper introduces a latent position network model, called the generalised random dot product graph, comprising as special cases the stochastic blockmodel, mixed membership stochastic blockmodel, and random dot product graph. In this model, nodes are represented as random vectors on $\mathbb{R}^d$, and the probability of an edge between nodes $i$ and $j$ is given by the bilinear form $X_i^T I_{p,q} X_j$, where $I_{p,q} = \mathrm{diag}(1,\ldots, 1, -1, \ldots, -1)$ with $p$ ones and $q$ minus ones, where $p+q=d$. As we show, this provides the only possible representation of nodes in $\mathbb{R}^d$ such that mixed membership is encoded as the corresponding convex combination of latent positions. The positions are identifiable only up to transformation in the indefinite orthogonal group $O(p,q)$, and we discuss some consequences for typical follow-on inference tasks, such as clustering and prediction.
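
The bilinear form in the abstract can be illustrated with a small NumPy sketch. It is only an illustration under stated assumptions, not the authors' code: the function names, the example latent positions, and the two-block probability matrix (0.1 within blocks, 0.5 between blocks) are all hypothetical and chosen for demonstration. The example uses $p = q = 1$, so that a stochastic blockmodel whose between-block probability exceeds its within-block probability is written as $X_i^T I_{p,q} X_j$.

```python
import numpy as np


def signature_matrix(p, q):
    """Return I_{p,q} = diag(1, ..., 1, -1, ..., -1) with p ones and q minus ones."""
    return np.diag(np.concatenate([np.ones(p), -np.ones(q)]))


def edge_probabilities(X, p, q):
    """Edge probability matrix P[i, j] = X_i^T I_{p,q} X_j for latent positions X (n x d)."""
    if X.shape[1] != p + q:
        raise ValueError("latent dimension d must equal p + q")
    P = X @ signature_matrix(p, q) @ X.T
    # The bilinear form is not automatically a probability, so check the range.
    if P.min() < -1e-12 or P.max() > 1 + 1e-12:
        raise ValueError("latent positions do not yield valid probabilities")
    return np.clip(P, 0.0, 1.0)


def sample_graph(P, rng):
    """Sample a symmetric, loop-free adjacency matrix with independent edges."""
    n = P.shape[0]
    upper = np.triu(rng.random((n, n)) < P, k=1)
    return (upper | upper.T).astype(int)


# Hypothetical example: four nodes in two blocks, with p = q = 1.
a, b = np.sqrt(0.3), np.sqrt(0.2)
X = np.array([[a, b], [a, b], [a, -b], [a, -b]])

P = edge_probabilities(X, p=1, q=1)  # 0.3 - 0.2 within blocks, 0.3 + 0.2 between
A = sample_graph(P, np.random.default_rng(0))
```

In this example the block probability matrix has one positive and one negative eigenvalue, so the $-1$ entry in $I_{p,q}$ is what allows the between-block probability to be the larger one.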

• [stat.ML]**ZhuSuan: A Library for Bayesian Deep Learning**

*Jiaxin Shi, Jianfei Chen, Jun Zhu, Shengyang Sun, Yucen Luo, Yihong Gu, Yuhao Zhou*

http://arxiv.org/abs/1709.05870v1

In this paper we introduce ZhuSuan, a Python probabilistic programming library for Bayesian deep learning, which combines the complementary advantages of Bayesian methods and deep learning. ZhuSuan is built upon TensorFlow. Unlike existing deep learning libraries, which are mainly designed for deterministic neural networks and supervised tasks, ZhuSuan is distinguished by its deep roots in Bayesian inference, and thus supports various kinds of probabilistic models, including both traditional hierarchical Bayesian models and recent deep generative models. We use running examples to illustrate probabilistic programming in ZhuSuan, including Bayesian logistic regression, variational auto-encoders, deep sigmoid belief networks and Bayesian recurrent neural networks.