# Today's Academic Horizons (2017.11.16)

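cs.AI - Artificial Intelligence

cs.CL - Computation and Language

cs.CR - Cryptography and Security

cs.CV - Computer Vision and Pattern Recognition

cs.CY - Computers and Society

cs.DC - Distributed, Parallel, and Cluster Computing

cs.DS - Data Structures and Algorithms

cs.HC - Human-Computer Interaction

cs.IR - Information Retrieval

cs.IT - Information Theory

cs.LG - Machine Learning

cs.MS - Mathematical Software

cs.NE - Neural and Evolutionary Computing

cs.RO - Robotics

cs.SE - Software Engineering

cs.SI - Social and Information Networks

cs.SY - Systems and Control

econ.EM - Econometrics

eess.AS - Audio and Speech Processing

eess.SP - Signal Processing

math.CO - Combinatorics

math.OC - Optimization and Control

math.PR - Probability

math.ST - Statistics Theory

q-bio.CB - Cell Behavior

q-bio.QM - Quantitative Methods

quant-ph - Quantum Physics

stat.AP - Applied Statistics

stat.CO - Statistical Computing

stat.ME - Statistical Methodology

stat.ML - (Statistical) Machine Learning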

• [cs.AI]An Empirical Study of the Effects of Spurious Transitions on Abstraction-based Heuristics

• [cs.AI]DataVizard: Recommending Visual Presentations for Structured Data

• [cs.AI]Efficiency Analysis of ASP Encodings for Sequential Pattern Mining Tasks

• [cs.AI]Learning Abduction under Partial Observability

• [cs.AI]Medical Diagnosis From Laboratory Tests by Combining Generative and Discriminative Learning

• [cs.AI]On the Synthesis of Guaranteed-Quality Plans for Robot Fleets in Logistics Scenarios via Optimization Modulo Theories

• [cs.AI]Prediction Under Uncertainty with Error-Encoding Networks

• [cs.AI]Self-Regulating Artificial General Intelligence

• [cs.AI]SkipFlow: Incorporating Neural Coherence Features for End-to-End Automatic Text Scoring

• [cs.AI]Tree Projections and Constraint Optimization Problems: Fixed-Parameter Tractability and Parallel Algorithms

• [cs.AI]Web Robot Detection in Academic Publishing

• [cs.CL]Classical Structured Prediction Losses for Sequence to Sequence Learning

• [cs.CL]Commonsense LocatedNear Relation Extraction

• [cs.CL]Controllable Abstractive Summarization

• [cs.CL]Convolutional Neural Network with Word Embeddings for Chinese Word Segmentation

• [cs.CL]Digitising Cultural Complexity: Representing Rich Cultural Data in a Big Data environment

• [cs.CL]Discovering conversational topics and emotions associated with Demonetization tweets in India

• [cs.CL]DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications

• [cs.CL]Efficient Representation for Natural Language Processing via Kernelized Hashcodes

• [cs.CL]Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering

• [cs.CL]False Positive and Cross-relation Signals in Distant Supervision Data

• [cs.CL]Fast Reading Comprehension with ConvNets

• [cs.CL]From Word Segmentation to POS Tagging for Vietnamese

• [cs.CL]Interpretable probabilistic embeddings: bridging the gap between topic models and neural networks

• [cs.CL]Learning Document Embeddings With CNNs

• [cs.CL]Learning an Executable Neural Semantic Parser

• [cs.CL]MojiTalk: Generating Emotional Responses at Scale

• [cs.CL]Natural Language Inference with External Knowledge

• [cs.CL]On Extending Neural Networks with Loss Ensembles for Text Classification

• [cs.CL]QuickEdit: Editing Text & Translations via Simple Delete Actions

• [cs.CL]Robust Multilingual Part-of-Speech Tagging via Adversarial Training

• [cs.CL]SQLNet: Generating Structured Queries From Natural Language Without Reinforcement Learning

• [cs.CL]Syntax-Directed Attention for Neural Machine Translation

• [cs.CL]Towards Human-level Machine Reading Comprehension: Reasoning and Inference with Multiple Strategies

• [cs.CL]Unified Pragmatic Models for Generating and Following Instructions

• [cs.CL]Unsupervised patient representations from clinical notes with interpretable classification decisions

• [cs.CL]Word, Subword or Character? An Empirical Study of Granularity in Chinese-English NMT

• [cs.CL]Zero-Shot Style Transfer in Text Using Recurrent Neural Networks

• [cs.CR]Cryptanalysis of Merkle-Hellman cipher using parallel genetic algorithm

• [cs.CR]CryptoDL: Deep Neural Networks over Encrypted Data

• [cs.CR]Stampery Blockchain Timestamping Architecture (BTA) - Version 6

• [cs.CV]3D Shape Classification Using Collaborative Representation based Projections

• [cs.CV]All-Transfer Learning for Deep Neural Networks and its Application to Sepsis Classification

• [cs.CV]An Automatic Diagnosis Method of Facial Acne Vulgaris Based on Convolutional Neural Network

• [cs.CV]An optimized shape descriptor based on structural properties of networks

• [cs.CV]Arbitrarily-Oriented Text Recognition

• [cs.CV]Automatic Target Recognition of Aircraft using Inverse Synthetic Aperture Radar

• [cs.CV]Capturing Localized Image Artifacts through a CNN-based Hyper-image Representation

• [cs.CV]Conditional Autoencoders with Adversarial Information Factorization

• [cs.CV]Conditional Random Field and Deep Feature Learning for Hyperspectral Image Segmentation

• [cs.CV]Convolutional neural networks pretrained on large face recognition datasets for emotion classification from video

• [cs.CV]Crowd counting via scale-adaptive convolutional neural network

• [cs.CV]D-PCN: Parallel Convolutional Neural Networks for Image Recognition in Reverse Adversarial Style

• [cs.CV]Denoising Imaging Polarimetry by Adapted BM3D Method

• [cs.CV]Dynamic Zoom-in Network for Fast Object Detection in Large Images

• [cs.CV]Evaluation of trackers for Pan-Tilt-Zoom Scenarios

• [cs.CV]Feature Enhancement Network: A Refined Scene Text Detector

• [cs.CV]Gender recognition and biometric identification using a large dataset of hand images

• [cs.CV]Grab, Pay and Eat: Semantic Food Detection for Smart Restaurants

• [cs.CV]Hand Gesture Recognition with Leap Motion

• [cs.CV]High-Order Attention Models for Visual Question Answering

• [cs.CV]Latent Constrained Correlation Filter

• [cs.CV]Modeling Human Categorization of Natural Images Using Deep Feature Representations

• [cs.CV]Robust Image Registration via Empirical Mode Decomposition

• [cs.CV]Robust Keyframe-based Dense SLAM with an RGB-D Camera

• [cs.CV]Saliency-based Sequential Image Attention with Multiset Prediction

• [cs.CV]Vertebral body segmentation with GrowCut: Initial experience, workflow and practical application

• [cs.CV]Visual Concepts and Compositional Voting

• [cs.CV]XGAN: Unsupervised Image-to-Image Translation for many-to-many Mappings

• [cs.CY]A Case Study of the 2016 Korean Cyber Command Compromise

• [cs.CY]Cloud Computing and Content Management Systems: A Case Study in Macedonian Education

• [cs.CY]Gerrymandering and Computational Redistricting

• [cs.CY]United Nations Digital Blue Helmets as a Starting Point for Cyber Peacekeeping

• [cs.CY]Using Phone Sensors and an Artificial Neural Network to Detect Gait Changes During Drinking Episodes in the Natural Environment

• [cs.DC]A Parallel Best-Response Algorithm with Exact Line Search for Nonconvex Sparsity-Regularized Rank Minimization

• [cs.DC]Accelerating HPC codes on Intel(R) Omni-Path Architecture networks: From particle physics to Machine Learning

• [cs.DC]Cheating by Duplication: Equilibrium Requires Global Knowledge

• [cs.DC]Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes

• [cs.DC]Solving the Resource Constrained Project Scheduling Problem Using the Parallel Tabu Search Designed for the CUDA Platform

• [cs.DS]Robust Online Speed Scaling With Deadline Uncertainty

• [cs.DS]Similarity-Aware Spectral Sparsification by Edge Filtering

• [cs.HC]Coordination Technology for Active Support Networks: Context, Needfinding, and Design

• [cs.HC]Social Sensing of Floods in the UK

• [cs.IR]A Hierarchical Contextual Attention-based GRU Network for Sequential Recommendation

• [cs.IR]A distributed system for SearchOnMath based on the Microsoft BizSpark program

• [cs.IR]Faithful to the Original: Fact Aware Neural Abstractive Summarization

• [cs.IR]Latent Dirichlet Allocation (LDA) and Topic modeling: models, applications, a survey

• [cs.IR]Neural Attentive Session-based Recommendation

• [cs.IR]Recommender Systems with Random Walks: A Survey

• [cs.IR]Targeted Advertising Based on Browsing History

• [cs.IT]A General Framework for Covariance Matrix Optimization in MIMO Systems

• [cs.IT]A Joint Encryption-Encoding Scheme Using QC-LDPC Codes Based on Finite Geometry

• [cs.IT]Capacity of UAV-Enabled Multicast Channel: Joint Trajectory Design and Power Allocation

• [cs.IT]Effect of enhanced dissipation by shear flows on transient relaxation and probability density function in two dimensions

• [cs.IT]Eigendecomposition-Based Partial FFT Demodulation for Differential OFDM in Underwater Acoustic Communications

• [cs.IT]Energy-Delay-Distortion Problem

• [cs.IT]Information Design for Strategic Coordination of Autonomous Devices with Non-Aligned Utilities

• [cs.IT]Persuasion with limited communication resources

• [cs.IT]Preserving Reliability to Heterogeneous Ultra-Dense Distributed Networks in Unlicensed Spectrum

• [cs.IT]Private Function Retrieval

• [cs.IT]Restoration by Compression

• [cs.IT]Robust Kullback-Leibler Divergence and Universal Hypothesis Testing for Continuous Distributions

• [cs.IT]Spatial Channel Covariance Estimation for the Hybrid MIMO Architecture: A Compressive Sensing Based Approach

• [cs.IT]Towards a Converse for the Nearest Lattice Point Problem

• [cs.IT]Truncated Polynomial Expansion Downlink Precoders and Uplink Detectors for Massive MIMO

• [cs.LG]"Found in Translation": Predicting Outcome of Complex Organic Chemistry Reactions using Neural Sequence-to-Sequence Models

• [cs.LG]A Sparse Graph-Structured Lasso Mixed Model for Genetic Association with Confounding Correction

• [cs.LG]A learning problem that is independent of the set theory ZFC axioms

• [cs.LG]A machine learning approach for efficient uncertainty quantification using multiscale methods

• [cs.LG]ADaPTION: Toolbox and Benchmark for Training Convolutional Neural Networks with Reduced Numerical Precision Weights and Activation

• [cs.LG]Adversarial Symmetric Variational Autoencoder

• [cs.LG]Attention-based Information Fusion using Multi-Encoder-Decoder Recurrent Neural Networks

• [cs.LG]CUR Decompositions, Similarity Matrices, and Subspace Clustering

• [cs.LG]Dynamic Principal Projection for Cost-Sensitive Online Multi-Label Classification

• [cs.LG]Fixing Weight Decay Regularization in Adam

• [cs.LG]Linking Sequences of Events with Sparse or No Common Occurrence across Data Sets

• [cs.LG]Machine Learning for the Geosciences: Challenges and Opportunities

• [cs.LG]Machine vs Machine: Defending Classifiers Against Learning-based Adversarial Attacks

• [cs.LG]Multiple-Source Adaptation for Regression Problems

• [cs.LG]Near-optimal sample complexity for convex tensor completion

• [cs.LG]On the ERM Principle with Networked Data

• [cs.LG]Parameter Estimation in Finite Mixture Models by Regularized Optimal Transport: A Unified Framework for Hard and Soft Clustering

• [cs.LG]Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness

• [cs.LG]Provably efficient neural network representation for image classification

• [cs.LG]Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice

• [cs.LG]Robust Matrix Elastic Net based Canonical Correlation Analysis: An Effective Algorithm for Multi-View Unsupervised Learning

• [cs.LG]Skyline Identification in Multi-Armed Bandits

• [cs.LG]Sobolev GAN

• [cs.LG]Sparsification of the Alignment Path Search Space in Dynamic Time Warping

• [cs.LG]Spatio-Temporal Data Mining: A Survey of Problems and Methods

• [cs.LG]Tensor Decompositions for Modeling Inverse Dynamics

• [cs.LG]Three Factors Influencing Minima in SGD

• [cs.LG]TripletGAN: Training Generative Model with Triplet Loss

• [cs.LG]Unified Spectral Clustering with Optimal Graph

• [cs.LG]Weightless: Lossy Weight Encoding For Deep Neural Network Compression

• [cs.LG]pyLEMMINGS: Large Margin Multiple Instance Classification and Ranking for Bioinformatics Applications

• [cs.MS]Domain-Specific Acceleration and Auto-Parallelization of Legacy Scientific Code in FORTRAN 77 using Source-to-Source Compilation

• [cs.NE]BP-STDP: Approximating Backpropagation using Spike Timing Dependent Plasticity

• [cs.NE]Concurrent Pump Scheduling and Storage Level Optimization Using Meta-Models and Evolutionary Algorithms

• [cs.NE]Deep Rewiring: Training very sparse deep networks

• [cs.NE]Learning Explanatory Rules from Noisy Data

• [cs.NE]Neural Networks Architecture Evaluation in a Quantum Computer

• [cs.NE]Reliability and Sharpness in Border Crossing Traffic Interval Prediction

• [cs.RO]Anytime Motion Planning on Large Dense Roadmaps with Expensive Edge Evaluations

• [cs.RO]Towards Planning and Control of Hybrid Systems with Limit Cycle using LQR Trees

• [cs.SE]Towards an interdisciplinary, socio-technical analysis of software ecosystem health

• [cs.SI]Generalized Neural Graph Embedding with Matrix Factorization

• [cs.SY]A Supervised Learning Concept for Reducing User Interaction in Passenger Cars

• [cs.SY]A unified decision making framework for supply and demand management in microgrid networks

• [econ.EM]Uniform Inference for Conditional Factor Models with Instrumental and Idiosyncratic Betas

• [eess.AS]Deep Networks tag the location of bird vocalisations on audio spectrograms

• [eess.AS]Multilingual Adaptation of RNN Based ASR Systems

• [eess.AS]Phonemic and Graphemic Multilingual CTC Based Speech Recognition

• [eess.SP]Person Recognition using Smartphones' Accelerometer Data

• [math.CO]Randomized Near Neighbor Graphs, Giant Components, and Applications in Data Science

• [math.OC]A Robust Variable Step Size Fractional Least Mean Square (RVSS-FLMS) Algorithm

• [math.PR]A Note on the Quasi-Stationary Distribution of the Shiryaev Martingale on the Positive Half-Line

• [math.PR]Joint Large Deviation principle for empirical measures of the d-regular random graphs

• [math.ST]Adaptive estimation and noise detection for an ergodic diffusion with observation noises

• [math.ST]Minimax estimation in linear models with unknown finite alphabet design

• [math.ST]On the boundary between qualitative and quantitative methods for causal inference

• [math.ST]Sparse High-Dimensional Linear Regression. Algorithmic Barriers and a Local Search Algorithm

• [math.ST]Strong consistency and optimality for generalized estimating equations with stochastic covariates

• [math.ST]The mixability of elliptical distributions with supermodular functions

• [math.ST]Thresholding Bandit for Dose-ranging: The Impact of Monotonicity

• [q-bio.CB]Using Game Theory for Real-Time Behavioral Dynamics in Microscopic Populations with Noisy Signaling

• [q-bio.QM]Parkinson's Disease Digital Biomarker Discovery with Optimized Transitions and Inferred Markov Emissions

• [quant-ph]Quantum transport senses community structure in networks

• [stat.AP]A Bayesian Model for Forecasting Hierarchically Structured Time Series

• [stat.AP]How to estimate time-varying Vector Autoregressive Models? A comparison of two methods

• [stat.AP]Machine Learning Meets Microeconomics: The Case of Decision Trees and Discrete Choice

• [stat.AP]On constraining projections of future climate using observations and simulations from multiple climate models

• [stat.AP]State space models for non-stationary intermittently coupled systems

• [stat.CO]Feature Selection based on the Local Lift Dependence Scale

• [stat.ME]Conditional Sample Generation Based on Distribution Element Trees

• [stat.ME]A Test for Isotropy on a Sphere using Spherical Harmonic Functions

• [stat.ME]Bayesian linear regression models with flexible error distributions

• [stat.ME]Causal Inference from Observational Studies with Clustered Interference

• [stat.ME]Change Detection in a Dynamic Stream of Attributed Networks

• [stat.ME]Checking validity of monotone domain mean estimators

• [stat.ME]Deterministic parallel analysis

• [stat.ME]Estimating prediction error for complex samples

• [stat.ME]Generalised empirical likelihood-based kernel density estimation

• [stat.ME]Graph-Based Two-Sample Tests for Discrete Data

• [stat.ME]K-groups: A Generalization of K-means Clustering

• [stat.ME]Optimal estimation in functional linear regression for sparse noise-contaminated data

• [stat.ME]Quickest Detection of Markov Networks

• [stat.ME]Sharpening randomization-based causal inference for $2^2$ factorial designs with binary outcomes

• [stat.ME]Simultaneous Registration and Clustering for Multi-dimensional Functional Data

• [stat.ME]The SPDE approach for Gaussian random fields with general smoothness

• [stat.ML]A Batch Learning Framework for Scalable Personalized Ranking

• [stat.ML]A Sequence-Based Mesh Classifier for the Prediction of Protein-Protein Interactions

• [stat.ML]ACtuAL: Actor-Critic Under Adversarial Learning

• [stat.ML]Alpha-Divergences in Variational Dropout

• [stat.ML]Analyzing and Improving Stein Variational Gradient Descent for High-dimensional Marginal Inference

• [stat.ML]Blind Source Separation Using Mixtures of Alpha-Stable Distributions

• [stat.ML]Data Augmentation Generative Adversarial Networks

• [stat.ML]Fast and reliable inference algorithm for hierarchical stochastic block models

• [stat.ML]Feature importance scores and lossless feature pruning using Banzhaf power indices

• [stat.ML]Improving Factor-Based Quantitative Investing by Forecasting Company Fundamentals

• [stat.ML]Invariances and Data Augmentation for Supervised Music Transcription

• [stat.ML]Joint Gaussian Processes for Biophysical Parameter Retrieval

• [stat.ML]Learning and Visualizing Localized Geometric Features Using 3D-CNN: An Application to Manufacturability Analysis of Drilled Holes

• [stat.ML]Model Criticism in Latent Space

• [stat.ML]Near-Optimal Discrete Optimization for Experimental Design: A Regret Minimization Approach

• [stat.ML]STARK: Structured Dictionary Learning Through Rank-one Tensor Recovery

• [stat.ML]Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train

• [stat.ML]Semi-Supervised Learning via New Deep Network Inversion

• [stat.ML]Sensor Selection and Random Field Reconstruction for Robust and Cost-effective Heterogeneous Weather Sensor Networks for the Developing World

• [stat.ML]Should You Derive, Or Let the Data Drive? An Optimization Framework for Hybrid First-Principles Data-Driven Modeling

• [stat.ML]Simple And Efficient Architecture Search for Convolutional Neural Networks

• [stat.ML]Sparse quadratic classification rules via linear dimension reduction

• [stat.ML]Statistically Optimal and Computationally Efficient Low Rank Tensor Completion from Noisy Entries

• [stat.ML]Stochastic Strictly Contractive Peaceman-Rachford Splitting Method

• [stat.ML]Straggler Mitigation in Distributed Optimization Through Data Encoding

• [stat.ML]The Multi-layer Information Bottleneck Problem

• [stat.ML]Variance Reduced methods for Non-convex Composition Optimization

• [stat.ML]WMRB: Learning to Rank in a Scalable Batch Training Approach

·····································

• [cs.AI]**An Empirical Study of the Effects of Spurious Transitions on Abstraction-based Heuristics**

*Mehdi Sadeqi, Robert C. Holte, Sandra Zilles*

http://arxiv.org/abs/1711.05105v1

The efficient solution of state space search problems is often attempted by guiding search algorithms with heuristics (estimates of the distance from any state to the goal). A popular way for creating heuristic functions is by using an abstract version of the state space. However, the quality of abstraction-based heuristic functions, and thus the speed of search, can suffer from spurious transitions, i.e., state transitions in the abstract state space for which no corresponding transitions in the reachable component of the original state space exist. Our first contribution is a quantitative study demonstrating that the harmful effects of spurious transitions on heuristic functions can be substantial, in terms of both the increase in the number of abstract states and the decrease in the heuristic values, which may slow down search. Our second contribution is an empirical study on the benefits of removing a certain kind of spurious transition, namely those that involve states with a pair of mutually exclusive (mutex) variable-value assignments. In the context of state space planning, a mutex pair is a pair of variable-value assignments that does not occur in any reachable state. Detecting mutex pairs is a problem that has been addressed frequently in the planning literature. Our study shows that there are cases in which mutex detection helps to eliminate harmful spurious transitions to a large extent and thus to speed up search substantially.
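The definition of a spurious transition can be made concrete on a toy problem. The sketch below is not taken from the paper: the two-variable domain, the operators, and every function name are hypothetical, and projection onto a subset of variables is assumed as the abstraction. It enumerates the abstract transitions and subtracts those that are the image of some reachable concrete transition.

```python
from itertools import product

# Hypothetical toy domain: two binary variables and two operators,
# each given as (precondition, effect) over variable names.
DOMAINS = {"x": (0, 1), "y": (0, 1)}
INIT = {"x": 0, "y": 0}
OPERATORS = [
    ({"x": 0, "y": 0}, {"x": 1}),
    # No operator ever sets y = 1, so (x=1, y=1) is a mutex pair
    # and this operator never fires in the reachable state space.
    ({"x": 1, "y": 1}, {"x": 0}),
]

def freeze(state):
    """Turn a dict state into a hashable, order-independent tuple."""
    return tuple(sorted(state.items()))

def successors(state, operators):
    """Yield every state produced by an operator applicable in `state`."""
    for pre, eff in operators:
        if all(state[var] == val for var, val in pre.items()):
            yield {**state, **eff}

def reachable_transitions(init, operators):
    """Collect all transitions between states reachable from `init`."""
    seen = {freeze(init)}
    frontier = [init]
    transitions = set()
    while frontier:
        state = frontier.pop()
        for nxt in successors(state, operators):
            transitions.add((freeze(state), freeze(nxt)))
            if freeze(nxt) not in seen:
                seen.add(freeze(nxt))
                frontier.append(nxt)
    return transitions

def project(assignment, keep):
    """Drop every variable that is not in `keep`."""
    return {var: val for var, val in assignment.items() if var in keep}

def abstract_transitions(operators, domains, keep):
    """Transitions induced by the projected operators on all abstract states."""
    abstract_ops = [(project(pre, keep), project(eff, keep)) for pre, eff in operators]
    names = sorted(keep)
    transitions = set()
    for values in product(*(domains[name] for name in names)):
        state = dict(zip(names, values))
        for nxt in successors(state, abstract_ops):
            if nxt != state:  # ignore self-loops
                transitions.add((freeze(state), freeze(nxt)))
    return transitions

def spurious_transitions(init, operators, domains, keep):
    """Abstract transitions with no reachable concrete counterpart."""
    concrete = {
        (freeze(project(dict(s), keep)), freeze(project(dict(t), keep)))
        for s, t in reachable_transitions(init, operators)
    }
    return abstract_transitions(operators, domains, keep) - concrete

SPURIOUS = spurious_transitions(INIT, OPERATORS, DOMAINS, keep={"x"})
print(SPURIOUS)
```

In this toy case the only spurious transition is `x=1 -> x=0`. It comes from the operator whose precondition contains the mutex pair, so removing that operator before abstracting would eliminate it.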

• [cs.AI]**DataVizard: Recommending Visual Presentations for Structured Data**

*Rema Ananthanarayanan, Pranay Kr. Lohia, Srikanta Bedathur*

http://arxiv.org/abs/1711.04971v1

Selecting the appropriate visual presentation of the data such that it preserves the semantics of the underlying data and at the same time provides an intuitive summary of the data is an important, often the final step of data analytics. Unfortunately, this is also a step involving significant human effort, starting from the selection of groups of columns in the structured results from analytics stages, to the selection of the right visualization by experimenting with various alternatives. In this paper, we describe our *DataVizard* system aimed at reducing this overhead by automatically recommending the most appropriate visual presentation for the structured result. Specifically, we consider the following two scenarios: first, when one needs to visualize the results of a structured query such as SQL; and second, when one has acquired a data table with an associated short description (e.g., tables from the Web). Using a corpus of real-world database queries (and their results) and a number of statistical tables crawled from the Web, we show that DataVizard is capable of recommending visual presentations with high accuracy. We also present the results of a user survey that we conducted in order to assess user views of the suitability of the presented charts vis-a-vis the plain text captions of the data.

• [cs.AI]**Efficiency Analysis of ASP Encodings for Sequential Pattern Mining Tasks**

*Thomas Guyet, Yves Moinard, René Quiniou, Torsten Schaub*

http://arxiv.org/abs/1711.05090v1

This article presents the use of Answer Set Programming (ASP) to mine sequential patterns. ASP is a high-level declarative logic programming paradigm for encoding combinatorial and optimization problems as well as knowledge representation and reasoning. Thus, ASP is a good candidate for implementing pattern mining with background knowledge, which has been a data mining issue for a long time. We propose encodings of the classical sequential pattern mining tasks within two representations of embeddings (fill-gaps vs skip-gaps) and for various kinds of patterns: frequent, constrained and condensed. We compare the computational performance of these encodings with each other to get a good insight into the efficiency of ASP encodings. The results show that the fill-gaps strategy is better on real problems due to lower memory consumption. Finally, compared to a constraint programming approach (CPSM), another declarative programming paradigm, our proposal showed comparable performance.
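For readers unfamiliar with the task itself, frequent sequential pattern mining can be sketched in a few lines of plain Python. This is only a brute-force baseline for illustration, not the ASP encodings from the paper; the function names and the toy database are hypothetical. A pattern's support is the number of sequences that contain it as a (possibly gapped) subsequence, and patterns are grown one item at a time while their support stays above a threshold.

```python
def occurs_in(pattern, sequence):
    """True if `pattern` is a subsequence of `sequence` (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator, so order is respected.
    return all(item in remaining for item in pattern)

def support(pattern, database):
    """Number of sequences in `database` that contain `pattern`."""
    return sum(occurs_in(pattern, seq) for seq in database)

def frequent_patterns(database, min_support, max_length):
    """Level-wise search: extend frequent patterns by one item at a time."""
    items = sorted({item for seq in database for item in seq})
    frequent = {}
    frontier = [()]
    for _ in range(max_length):
        next_frontier = []
        for prefix in frontier:
            for item in items:
                candidate = prefix + (item,)
                count = support(candidate, database)
                if count >= min_support:
                    frequent[candidate] = count
                    next_frontier.append(candidate)
        frontier = next_frontier
    return frequent

# Hypothetical toy database of four sequences.
DATABASE = [list("abc"), list("acb"), list("bc"), list("ac")]
RESULT = frequent_patterns(DATABASE, min_support=3, max_length=2)
print(RESULT)
```

Extending only frequent prefixes is safe because a pattern can never be more frequent than any of its prefixes.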

• [cs.AI]**Learning Abduction under Partial Observability**

*Brendan Juba, Zongyi Li, Evan Miller*

http://arxiv.org/abs/1711.04438v1

Juba recently proposed a formulation of learning abductive reasoning from examples, in which both the relative plausibility of various explanations, as well as which explanations are valid, are learned directly from data. The main shortcoming of this formulation of the task is that it assumes access to full-information (i.e., fully specified) examples; relatedly, it offers no role for declarative background knowledge, as such knowledge is rendered redundant in the abduction task by complete information. In this work, we extend the formulation to utilize such partially specified examples, along with declarative background knowledge about the missing data. We show that it is possible to use implicitly learned rules together with the explicitly given declarative knowledge to support hypotheses in the course of abduction. We also show how to use knowledge in the form of graphical causal models to refine the proposed hypotheses. Finally, we observe that when a small explanation exists, it is possible to obtain a much-improved guarantee in the challenging exception-tolerant setting. Such small, human-understandable explanations are of particular interest for potential applications of the task.

• [cs.AI]**Medical Diagnosis From Laboratory Tests by Combining Generative and Discriminative Learning**

*Shiyue Zhang, Pengtao Xie, Dong Wang, Eric P. Xing*

http://arxiv.org/abs/1711.04329v1

A primary goal of computational phenotype research is to conduct medical diagnosis. In hospitals, physicians rely on massive clinical data to make diagnosis decisions, among which laboratory tests are one of the most important resources. However, the longitudinal and incomplete nature of laboratory test data poses a significant challenge to its interpretation and usage, which may result in harmful decisions by both human physicians and automatic diagnosis systems. In this work, we take advantage of deep generative models to deal with the complex laboratory tests. Specifically, we propose an end-to-end architecture that involves a deep generative variational recurrent neural network (VRNN) to learn robust and generalizable features, and a discriminative neural network (NN) model to learn diagnosis decision making, and the two models are trained jointly. Our experiments are conducted on a dataset involving 46,252 patients, and the 50 most frequent tests are used to predict the 50 most common diagnoses. The results show that our model, VRNN+NN, significantly (p<0.001) outperforms other baseline models. Moreover, we demonstrate that the representations learned by the joint training are more informative than those learned by pure generative models. Finally, we find that our model offers a surprisingly good imputation for missing values.

• [cs.AI]**On the Synthesis of Guaranteed-Quality Plans for Robot Fleets in Logistics Scenarios via Optimization Modulo Theories**

*Francesco Leofante, Erika Ábrahám, Tim Niemueller, Gerhard Lakemeyer, Armando Tacchella*

http://arxiv.org/abs/1711.04259v1

In manufacturing, the increasing involvement of autonomous robots in production processes poses new challenges on the production management. In this paper we report on the usage of Optimization Modulo Theories (OMT) to solve certain multi-robot scheduling problems in this area. Whereas currently existing methods are heuristic, our approach guarantees optimality for the computed solution. We do not only present our final method but also its chronological development, and draw some general observations for the development of OMT-based approaches.

• [cs.AI]**Prediction Under Uncertainty with Error-Encoding Networks**

*Mikael Henaff, Junbo Zhao, Yann LeCun*

http://arxiv.org/abs/1711.04994v1

In this work we introduce a new framework for performing temporal predictions in the presence of uncertainty. It is based on a simple idea of disentangling components of the future state which are predictable from those which are inherently unpredictable, and encoding the unpredictable components into a low-dimensional latent variable which is fed into a forward model. Our method uses a supervised training objective which is fast and easy to train. We evaluate it in the context of video prediction on multiple datasets and show that it is able to consistently generate diverse predictions without the need for alternating minimization over a latent space or adversarial training.

• [cs.AI]**Self-Regulating Artificial General Intelligence**

*Joshua S. Gans*

http://arxiv.org/abs/1711.04309v1

Here we examine the paperclip apocalypse concern for artificial general intelligence (or AGI) whereby a superintelligent AI with a simple goal (i.e., producing paperclips) accumulates power so that all resources are devoted towards that simple goal and are unavailable for any other use. We provide conditions under which a paperclip apocalypse can arise but also show that, under certain architectures for recursive self-improvement of AIs, a paperclip AI may refrain from allowing power capabilities to be developed. The reason is that such developments pose the same control problem for the AI as they do for humans (over AIs) and hence threaten to deprive it of resources for its primary goal.

• [cs.AI]**SkipFlow: Incorporating Neural Coherence Features for End-to-End Automatic Text Scoring**

*Yi Tay, Minh C. Phan, Luu Anh Tuan, Siu Cheung Hui*

http://arxiv.org/abs/1711.04981v1

Deep learning has demonstrated tremendous potential for Automatic Text Scoring (ATS) tasks. In this paper, we describe a new neural architecture that enhances vanilla neural network models with auxiliary neural coherence features. Our method introduces a SkipFlow mechanism that models relationships between snapshots of the hidden representations of a long short-term memory (LSTM) network as it reads. Subsequently, the semantic relationships between multiple snapshots are used as auxiliary features for prediction. This has two main benefits. Firstly, essays are typically long sequences and therefore the memorization capability of the LSTM network may be insufficient. Implicit access to multiple snapshots can alleviate this problem by acting as a protection against vanishing gradients. The parameters of the SkipFlow mechanism also act as an auxiliary memory. Secondly, modeling relationships between multiple positions allows our model to learn features that represent and approximate textual coherence. In our model, we call these *neural coherence* features. Overall, we present a unified deep learning architecture that generates neural coherence features as it reads in an end-to-end fashion. Our approach demonstrates state-of-the-art performance on the benchmark ASAP dataset, outperforming not only feature engineering baselines but also other deep learning models.
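The idea of relating snapshots of the hidden state can be sketched as follows. The abstract does not say how the relationship between two snapshots is computed, so cosine similarity is used here purely as a stand-in assumption; the function names and the fixed snapshot interval `delta` are also hypothetical. The code takes a list of per-step hidden vectors and returns one feature per pair of snapshots that are `delta` steps apart.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def snapshot_relations(hidden_states, delta):
    """Relate each snapshot to the one `delta` steps later.

    Snapshots are taken at positions 0, delta, 2*delta, ...; the relation
    used here (cosine similarity) is an illustrative assumption.
    """
    positions = range(0, len(hidden_states) - delta, delta)
    return [cosine(hidden_states[t], hidden_states[t + delta]) for t in positions]

# Hypothetical hidden states that alternate between two directions.
HIDDEN = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(snapshot_relations(HIDDEN, delta=2))
print(snapshot_relations(HIDDEN, delta=1))
```

In a real model the resulting features would be concatenated with the final representation before the scoring layer, and the relation itself would typically be learned rather than fixed.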

• [cs.AI]**Tree Projections and Constraint Optimization Problems: Fixed-Parameter Tractability and Parallel Algorithms**

*Georg Gottlob, Gianluigi Greco, Francesco Scarcello*

http://arxiv.org/abs/1711.05216v1

Tree projections provide a unifying framework to deal with most structural decomposition methods of constraint satisfaction problems (CSPs). Within this framework, a CSP instance is decomposed into a number of sub-problems, called views, whose solutions are either already available or can be computed efficiently. The goal is to arrange portions of these views in a tree-like structure, called tree projection, which determines an efficiently solvable CSP instance equivalent to the original one. Deciding whether a tree projection exists is NP-hard. Solution methods have therefore been proposed in the literature that do not require a tree projection to be given, and that either correctly decide whether the given CSP instance is satisfiable, or return that a tree projection actually does not exist. These approaches had not been generalized so far on CSP extensions for optimization problems, where the goal is to compute a solution of maximum value/minimum cost. The paper fills the gap, by exhibiting a fixed-parameter polynomial-time algorithm that either disproves the existence of tree projections or computes an optimal solution, with the parameter being the size of the expression of the objective function to be optimized over all possible solutions (and not the size of the whole constraint formula, used in related works). Tractability results are also established for the problem of returning the best K solutions. Finally, parallel algorithms for such optimization problems are proposed and analyzed. Given that the classes of acyclic hypergraphs, hypergraphs of bounded treewidth, and hypergraphs of bounded generalized hypertree width are all covered as special cases of the tree projection framework, the results in this paper directly apply to these classes. These classes are extensively considered in the CSP setting, as well as in conjunctive database query evaluation and optimization.

• [cs.AI]**Web Robot Detection in Academic Publishing**

*Athanasios Lagopoulos, Grigorios Tsoumakas, Georgios Papadopoulos*

http://arxiv.org/abs/1711.05098v1

Recent industry reports confirm the rise of web robots, which now account for more than half of total web traffic. They not only threaten the security, privacy and efficiency of the web but also distort analytics and metrics, casting doubt on the veracity of the information being promoted. In the academic publishing domain, this can cause articles to be wrongly presented as prominent and influential. In this paper, we present our approach to detecting web robots in academic publishing websites. We use different supervised learning algorithms with a variety of characteristics derived from both the log files of the server and the content served by the website. Our approach relies on the assumption that human users will be interested in specific domains or articles, while web robots crawl a web library incoherently. We experiment with features adopted in previous studies, with the addition of novel semantic characteristics derived from a semantic analysis using the Latent Dirichlet Allocation (LDA) algorithm. Our real-world case study shows promising results, pinpointing the significance of semantic features in the web robot detection problem.

• [cs.CL]**Classical Structured Prediction Losses for Sequence to Sequence Learning**

*Sergey Edunov, Myle Ott, Michael Auli, David Grangier, Marc'Aurelio Ranzato*

http://arxiv.org/abs/1711.04956v1

There has been much recent work on training neural attention models at the sequence-level using either reinforcement learning-style methods or by optimizing the beam. In this paper, we survey a range of classical objective functions that have been widely used to train linear models for structured prediction and apply them to neural sequence to sequence models. Our experiments show that these losses can perform surprisingly well by slightly outperforming beam search optimization in a like for like setup. We also report new state of the art results on both IWSLT 2014 German-English translation as well as Gigaword abstractive summarization.

• [cs.CL]**Commonsense LocatedNear Relation Extraction**

*Frank F. Xu, Bill Y. Lin, Kenny Q. Zhu*

http://arxiv.org/abs/1711.04204v1

LocatedNear relation describes two typically co-located objects, which is a type of useful commonsense knowledge for computer vision, natural language understanding, machine comprehension, etc. We propose to automatically extract such relationship through a sentence-level classifier and aggregating the scores of entity pairs detected from a large number of sentences. To enable the research of these tasks, we release two benchmark datasets, one containing 5,000 sentences annotated with whether a mentioned entity pair has LocatedNear relation in the given sentence or not; the other containing 500 pairs of physical objects and whether they are commonly located nearby. We also propose some baseline methods for the tasks and compare the results with a state-of-the-art general-purpose relation classifier.

• [cs.CL]**Controllable Abstractive Summarization**

*Angela Fan, David Grangier, Michael Auli*

http://arxiv.org/abs/1711.05217v1

Current models for document summarization ignore user preferences such as the desired length, style or entities that the user has a preference for. We present a neural summarization model that enables users to specify such high level attributes in order to control the shape of the final summaries to better suit their needs. With user input, we show that our system can produce high quality summaries that are true to user preference. Without user input, we can set the control variables automatically and outperform comparable state of the art summarization systems despite the relative simplicity of our model.

• [cs.CL]**Convolutional Neural Network with Word Embeddings for Chinese Word Segmentation**

*Chunqi Wang, Bo Xu*

http://arxiv.org/abs/1711.04411v1

Character-based sequence labeling framework is flexible and efficient for Chinese word segmentation (CWS). Recently, many character-based neural models have been applied to CWS. While they obtain good performance, they have two obvious weaknesses. The first is that they heavily rely on manually designed bigram feature, i.e. they are not good at capturing n-gram features automatically. The second is that they make no use of full word information. For the first weakness, we propose a convolutional neural model, which is able to capture rich n-gram features without any feature engineering. For the second one, we propose an effective approach to integrate the proposed model with word embeddings. We evaluate the model on two benchmark datasets: PKU and MSR. Without any feature engineering, the model obtains competitive performance -- 95.7% on PKU and 97.3% on MSR. Armed with word embeddings, the model achieves state-of-the-art performance on both datasets -- 96.5% on PKU and 98.0% on MSR, without using any external labeled resource.
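
To make the character-based sequence labeling framing concrete, the sketch below converts a segmented sentence into one tag per character and back. The B/M/E/S tag set (begin, middle, end, single) is a common convention and an assumption here; the abstract does not say which tag scheme the paper uses, and the function names are hypothetical.

```python
def words_to_tags(words):
    """Map a list of words to one tag per character (B, M, E or S)."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # first char B, last char E, everything in between M
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild the segmentation from characters and their tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends here
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends mid-word
        words.append(current)
    return words


segmented = ["我", "喜欢", "自然语言"]
tags = words_to_tags(segmented)
restored = tags_to_words("".join(segmented), tags)
```

A tagger in this framing predicts `tags` from the raw character string; segmentation is then recovered with `tags_to_words`.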

• [cs.CL]**Digitising Cultural Complexity: Representing Rich Cultural Data in a Big Data environment**

*Jennifer Edmond, Georgina Nugent Folan*

http://arxiv.org/abs/1711.04452v1

One of the major terminological forces driving ICT integration in research today is that of "big data." While the phrase sounds inclusive and integrative, "big data" approaches are highly selective, excluding input that cannot be effectively structured, represented, or digitised. Data of this complex sort is precisely the kind that human activity produces, but the technological imperative to enhance signal through the reduction of noise does not accommodate this richness. Data and the computational approaches that facilitate "big data" have acquired a perceived objectivity that belies their curated, malleable, reactive, and performative nature. In an input environment where anything can "be data" once it is entered into the system as "data," data cleaning and processing, together with the metadata and information architectures that structure and facilitate our cultural archives acquire a capacity to delimit what data are. This engenders a process of simplification that has major implications for the potential for future innovation within research environments that depend on rich material yet are increasingly mediated by digital technologies. This paper presents the preliminary findings of the European-funded KPLEX (Knowledge Complexity) project which investigates the delimiting effect digital mediation and datafication has on rich, complex cultural data. The paper presents a systematic review of existing implicit definitions of data, elaborating on the implications of these definitions and highlighting the ways in which metadata and computational technologies can restrict the interpretative potential of data. It sheds light on the gap between analogue or augmented digital practices and fully computational ones, and the strategies researchers have developed to deal with this gap. The paper proposes a reconceptualisation of data as it is functionally employed within digitally-mediated research so as to incorporate and acknowledge the richness and complexity of our source materials.

• [cs.CL]**Discovering conversational topics and emotions associated with Demonetization tweets in India**

*Mitodru Niyogi, Asim K. Pal*

http://arxiv.org/abs/1711.04115v1

Social media platforms contain a great wealth of information, which gives us opportunities to explore hidden patterns or unknown correlations and to understand people's satisfaction with what they are discussing. As one showcase, in this paper, we summarize the data set of Twitter messages related to the recent demonetization of all Rs. 500 and Rs. 1000 notes in India and explore insights from Twitter's data. Our proposed system automatically extracts the popular latent topics in conversations regarding demonetization discussed on Twitter via a Latent Dirichlet Allocation (LDA) based topic model, and also identifies the correlated topics across different categories. Additionally, it discovers people's opinions expressed through their tweets related to the event under consideration via an emotion analyzer. The system also employs an intuitive and informative visualization to show the uncovered insights. Furthermore, we use an evaluation measure, Normalized Mutual Information (NMI), to select the best LDA models. The obtained LDA results show that the tool can be effectively used to extract discussion topics and summarize them for further manual analysis.
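
For readers unfamiliar with Normalized Mutual Information, here is a minimal sketch of how NMI between two label assignments can be computed. The normalization by the geometric mean of the two entropies is one common choice and an assumption here; the abstract does not specify the variant used or how exactly it drives model selection.

```python
import math
from collections import Counter


def entropy(counts, n):
    """Shannon entropy (natural log) of a label distribution."""
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings of the same items."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual_info = sum(
        (c / n) * math.log((c / n) / ((count_a[a] / n) * (count_b[b] / n)))
        for (a, b), c in joint.items()
    )
    h_a, h_b = entropy(count_a, n), entropy(count_b, n)
    if h_a == 0 or h_b == 0:
        # degenerate case: at least one labeling has a single cluster
        return 1.0 if h_a == h_b else 0.0
    return mutual_info / math.sqrt(h_a * h_b)
```

NMI is 1 when the two labelings agree up to a renaming of labels and 0 when they are statistically independent, so it is invariant to how topic indices happen to be numbered.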

• [cs.CL]**DuReader: a Chinese Machine Reading Comprehension Dataset from Real-world Applications**

*Wei He, Kai Liu, Yajuan Lyu, Shiqi Zhao, Xinyan Xiao, Yuan Liu, Yizhong Wang, Hua Wu, Qiaoqiao She, Xuan Liu, Haifeng Wang*

http://arxiv.org/abs/1711.05073v1

In this paper, we introduce DuReader, a new large-scale, open-domain Chinese machine reading comprehension (MRC) dataset, aiming to tackle real-world MRC problems. In comparison to prior datasets, DuReader has the following characteristics: (a) the questions and the documents are all extracted from real application data, and the answers are human generated; (b) it provides rich annotations for question types, especially yes-no and opinion questions, which account for a large proportion of real users' questions but have not been well studied before; (c) it provides multiple answers for each question. The first release of DuReader contains 200k questions, 1,000K documents, and 420k answers, which, to the best of our knowledge, is the largest Chinese MRC dataset so far. Experimental results show that there is a big gap between the state-of-the-art baseline systems and human performance, which indicates that DuReader is a challenging dataset that deserves future study. The dataset and the code of the baseline systems are now publicly available.

• [cs.CL]**Efficient Representation for Natural Language Processing via Kernelized Hashcodes**

*Sahil Garg, Aram Galstyan, Irina Rish, Guillermo Cecchi, Shuyang Gao*

http://arxiv.org/abs/1711.04044v1

Kernel similarity functions have been successfully applied in classification models such as Support Vector Machines, Gaussian Processes and k-Nearest Neighbors (kNN), but have been found to be computationally expensive for Natural Language Processing (NLP) tasks due to the cost of computing kernel similarities between discrete natural language structures. A well-known technique, Kernelized Locality Sensitive Hashing (KLSH), allows for an approximate computation of kNN graphs and significantly reduces the number of kernel computations; however, applying KLSH to other classifiers has not been explored. In this paper, we propose to use random subspaces of KLSH codes for constructing an efficient representation that preserves the fine-grained structure of the data and is suitable for general classification methods. Further, we propose an approach for optimizing the KLSH model for supervised classification problems, by maximizing a variational lower bound on the mutual information between the KLSH codes (feature vectors) and the class labels. We apply the proposed approach to the task of extracting information about bio-molecular interactions from the semantic parsing of scientific papers. Our empirical results on a variety of datasets demonstrate significant improvements over the state of the art.

• [cs.CL]**Evidence Aggregation for Answer Re-Ranking in Open-Domain Question Answering**

*Shuohang Wang, Mo Yu, Jing Jiang, Wei Zhang, Xiaoxiao Guo, Shiyu Chang, Zhiguo Wang, Tim Klinger, Gerald Tesauro, Murray Campbell*

http://arxiv.org/abs/1711.05116v1

A popular recent approach to answering open-domain questions is to first search for question-related passages and then apply reading comprehension models to extract answers. Existing methods usually extract answers from single passages independently. But some questions require a combination of evidence from across different sources to answer correctly. In this paper, we propose two models which make use of multiple passages to generate their answers. Both use an answer-reranking approach which reorders the answer candidates generated by an existing state-of-the-art QA model. We propose two methods, namely, strength-based re-ranking and coverage-based re-ranking, to make use of the aggregated evidence from different passages to better determine the answer. Our models have achieved state-of-the-art results on three public open-domain QA datasets: Quasar-T, SearchQA and the open-domain version of TriviaQA, with about 8 percentage points of improvement over the former two datasets.
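
The idea of aggregating evidence across passages can be illustrated with a toy re-ranker that prefers answers extracted from more passages. This is only a sketch of the general "strength" intuition; the counting rule, the tie-breaking, and the function name are assumptions, not the scoring the paper actually uses.

```python
from collections import Counter


def strength_rerank(candidates):
    """Rank answers by how many (answer, passage_id) extractions support them.

    `candidates` is a list of (answer, passage_id) pairs produced by some
    base QA model (hypothetical input format).
    """
    support = Counter(answer for answer, _passage_id in candidates)
    # more supporting passages first; alphabetical order breaks ties
    ranked = sorted(support.items(), key=lambda item: (-item[1], item[0]))
    return [answer for answer, _count in ranked]


extractions = [("Paris", 0), ("Lyon", 1), ("Paris", 2), ("Paris", 3), ("Lyon", 4)]
ranking = strength_rerank(extractions)
```

A single-passage reader would score each extraction independently; here the evidence for "Paris" from three passages is pooled before ranking.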

• [cs.CL]**False Positive and Cross-relation Signals in Distant Supervision Data**

*Anca Dumitrache, Lora Aroyo, Chris Welty*

http://arxiv.org/abs/1711.05186v1

Distant supervision (DS) is a well-established method for relation extraction from text, based on the assumption that when a knowledge-base contains a relation between a term pair, then sentences that contain that pair are likely to express the relation. In this paper, we use the results of a crowdsourcing relation extraction task to identify two problems with DS data quality: the widely varying degree of false positives across different relations, and the observed causal connection between relations that are not considered by the DS method. The crowdsourcing data aggregation is performed using ambiguity-aware CrowdTruth metrics, that are used to capture and interpret inter-annotator disagreement. We also present preliminary results of using the crowd to enhance DS training data for a relation classification model, without requiring the crowd to annotate the entire set.
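
The distant supervision assumption, and why it yields false positives, can be shown in a few lines. The knowledge base, sentences, and substring matching below are toy assumptions for illustration only.

```python
def distant_label(sentences, knowledge_base):
    """Label every sentence that mentions both terms of a known pair.

    `knowledge_base` maps (head, tail) term pairs to a relation name.
    Matching is naive substring search, which is enough for a toy example.
    """
    labeled = []
    for sentence in sentences:
        for (head, tail), relation in knowledge_base.items():
            if head in sentence and tail in sentence:
                labeled.append((sentence, head, tail, relation))
    return labeled


kb = {("Obama", "Hawaii"): "born_in"}
corpus = [
    "Obama was born in Hawaii.",        # expresses the relation
    "Obama visited Hawaii last week.",  # same pair, different meaning
    "Hawaii is an island state.",       # only one term, not labeled
]
training_data = distant_label(corpus, kb)
```

Both of the first two sentences receive the `born_in` label, although only the first one expresses it; the second is the kind of false positive the abstract refers to.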

• [cs.CL]**Fast Reading Comprehension with ConvNets**

*Felix Wu, Ni Lao, John Blitzer, Guandao Yang, Kilian Weinberger*

http://arxiv.org/abs/1711.04352v1

State-of-the-art deep reading comprehension models are dominated by recurrent neural nets. Their sequential nature is a natural fit for language, but it also precludes parallelization within an instance and often becomes the bottleneck for deploying such models in latency-critical scenarios. This is particularly problematic for longer texts. Here we present a convolutional architecture as an alternative to these recurrent architectures. Using simple dilated convolutional units in place of recurrent ones, we achieve results comparable to the state of the art on two question answering tasks, while at the same time achieving up to two orders of magnitude speedups for question answering.
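
One reason dilated convolutions are attractive for long texts is that the receptive field grows quickly with depth. The sketch below computes it for a stack of stride-1 one-dimensional convolutions; the kernel size and dilation schedule are illustrative assumptions, not the paper's configuration.

```python
def receptive_field(kernel_size, dilations):
    """Number of input positions seen by one output of a stride-1 conv stack."""
    field = 1
    for dilation in dilations:
        # each layer extends the field by (kernel_size - 1) * dilation positions
        field += (kernel_size - 1) * dilation
    return field


plain = receptive_field(3, [1, 1, 1, 1])    # four ordinary layers
dilated = receptive_field(3, [1, 2, 4, 8])  # four layers, doubling dilation
```

With the same four layers, the plain stack covers 9 tokens while the dilated stack covers 31, and every position can be computed in parallel.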

• [cs.CL]**From Word Segmentation to POS Tagging for Vietnamese**

*Dat Quoc Nguyen, Thanh Vu, Dai Quoc Nguyen, Mark Dras, Mark Johnson*

http://arxiv.org/abs/1711.04951v1

This paper presents an empirical comparison of two strategies for Vietnamese Part-of-Speech (POS) tagging from unsegmented text: (i) a pipeline strategy where we consider the output of a word segmenter as the input of a POS tagger, and (ii) a joint strategy where we predict a combined segmentation and POS tag for each syllable. We also make a comparison between state-of-the-art (SOTA) feature-based and neural network-based models. On the benchmark Vietnamese treebank (Nguyen et al., 2009), experimental results show that the pipeline strategy produces better scores of POS tagging from unsegmented text than the joint strategy, and the highest accuracy is obtained by using a feature-based model.
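
In the joint strategy, each syllable receives a single label that encodes both the word boundary and the POS tag. A minimal sketch of such an encoding follows; the `B-`/`I-` prefix format is a common convention assumed here, and the abstract does not specify the paper's exact label scheme.

```python
def joint_labels(tagged_words):
    """Turn (word, pos) pairs into per-syllable combined labels.

    Vietnamese words may contain several space-separated syllables.
    The first syllable of a word gets "B-<POS>", later ones "I-<POS>".
    """
    labels = []
    for word, pos in tagged_words:
        for index, syllable in enumerate(word.split()):
            prefix = "B" if index == 0 else "I"
            labels.append((syllable, f"{prefix}-{pos}"))
    return labels


sentence = [("sinh viên", "N"), ("học", "V")]
encoded = joint_labels(sentence)
```

A pipeline system would instead first predict only the boundaries and then tag the resulting words in a second step.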

• [cs.CL]**Interpretable probabilistic embeddings: bridging the gap between topic models and neural networks**

*Anna Potapenko, Artem Popov, Konstantin Vorontsov*

http://arxiv.org/abs/1711.04154v1

We consider probabilistic topic models and more recent word embedding techniques from a perspective of learning hidden semantic representations. Inspired by a striking similarity of the two approaches, we merge them and learn probabilistic embeddings with online EM-algorithm on word co-occurrence data. The resulting embeddings perform on par with Skip-Gram Negative Sampling (SGNS) on word similarity tasks and benefit in the interpretability of the components. Next, we learn probabilistic document embeddings that outperform paragraph2vec on a document similarity task and require less memory and time for training. Finally, we employ multimodal Additive Regularization of Topic Models (ARTM) to obtain a high sparsity and learn embeddings for other modalities, such as timestamps and categories. We observe further improvement of word similarity performance and meaningful inter-modality similarities.

• [cs.CL]**Learning Document Embeddings With CNNs**

*Chundi Liu, Shunan Zhao, Maksims Volkovs*

http://arxiv.org/abs/1711.04168v1

We propose a new model for unsupervised document embedding. Existing approaches either require complex inference or use recurrent neural networks that are difficult to parallelize. We take a different route and use recent advances in language modelling to develop a convolutional neural network embedding model. This allows us to train deeper architectures that are fully parallelizable. Stacking layers together increases the receptive field allowing each successive layer to model increasingly longer range semantic dependencies within the document. Empirically, we demonstrate superior results on two publicly available benchmarks.

• [cs.CL]**Learning an Executable Neural Semantic Parser**

*Jianpeng Cheng, Siva Reddy, Vijay Saraswat, Mirella Lapata*

http://arxiv.org/abs/1711.05066v1

This paper describes a neural semantic parser that maps natural language utterances onto logical forms which can be executed against a task-specific environment, such as a knowledge base or a database, to produce a response. The parser generates tree-structured logical forms with a transition-based approach which combines a generic tree-generation algorithm with domain-general operations defined by the logical language. The generation process is modeled by structured recurrent neural networks, which provide a rich encoding of the sentential context and generation history for making predictions. To tackle mismatches between natural language and logical form tokens, various attention mechanisms are explored. Finally, we consider different training settings for the neural semantic parser, including a fully supervised training where annotated logical forms are given, weakly-supervised training where denotations are provided, and distant supervision where only unlabeled sentences and a knowledge base are available. Experiments across a wide range of datasets demonstrate the effectiveness of our parser.

• [cs.CL]**MojiTalk: Generating Emotional Responses at Scale**

*Xianda Zhou, William Yang Wang*

http://arxiv.org/abs/1711.04090v1

Generating emotional language is a key step towards building empathetic natural language processing agents. However, a major challenge for this line of research is the lack of large-scale labeled training data, and previous studies are limited to only small sets of human annotated sentiment labels. Additionally, explicitly controlling the emotion and sentiment of generated text is also difficult. In this paper, we take a more radical approach: we exploit the idea of leveraging Twitter data that are naturally labeled with emojis. More specifically, we collect a large corpus of Twitter conversations that include emojis in the response, and assume the emojis convey the underlying emotions of the sentence. We then introduce a reinforced conditional variational encoder approach to train a deep generative model on these conversations, which allows us to use emojis to control the emotion of the generated text. Experimentally, we show in our quantitative and qualitative analyses that the proposed models can successfully generate high-quality abstractive conversation responses in accordance with designated emotions.

• [cs.CL]**Natural Language Inference with External Knowledge**

*Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Diana Inkpen*

http://arxiv.org/abs/1711.04289v1

Modeling informal inference in natural language is very challenging. With the recent availability of large annotated data, it has become feasible to train complex models such as neural networks to perform natural language inference (NLI), which have achieved state-of-the-art performance. Although there exist relatively large annotated data, can machines learn all knowledge needed to perform NLI from the data? If not, how can NLI models benefit from external knowledge and how to build NLI models to leverage it? In this paper, we aim to answer these questions by enriching the state-of-the-art neural natural language inference models with external knowledge. We demonstrate that the proposed models with external knowledge further improve the state of the art on the Stanford Natural Language Inference (SNLI) dataset.

• [cs.CL]**On Extending Neural Networks with Loss Ensembles for Text Classification**

*Hamideh Hajiabadi, Diego Molla-Aliod, Reza Monsefi*

http://arxiv.org/abs/1711.05170v1

Ensemble techniques are powerful approaches that combine several weak learners to build a stronger one. As a meta learning framework, ensemble techniques can easily be applied to many machine learning techniques. In this paper we propose a neural network extended with an ensemble loss function for text classification. The weight of each weak loss function is tuned within the training phase through the gradient propagation optimization method of the neural network. The approach is evaluated on several text classification datasets. We also evaluate its performance in various environments with several degrees of label noise. Experimental results indicate an improvement of the results and strong resilience against label noise in comparison with other methods.

• [cs.CL]**QuickEdit: Editing Text & Translations via Simple Delete Actions**

*David Grangier, Michael Auli*

http://arxiv.org/abs/1711.04805v1

We propose a framework for computer-assisted text editing. It applies to translation post-editing and to paraphrasing and relies on very simple interactions: a human editor modifies a sentence by marking tokens they would like the system to change. Our model then generates a new sentence which reformulates the initial sentence by avoiding the words from the marked tokens. Our approach builds upon neural sequence-to-sequence modeling and introduces a neural network which takes as input a sentence along with deleted token markers. Our model is trained on translation bi-text by simulating post-edits. Our results on post-editing for machine translation and paraphrasing evaluate the performance of our approach. We show +11.4 BLEU with limited post-editing effort on the WMT-14 English-German translation task (25.2 to 36.6), which represents +5.9 BLEU over the post-editing baseline (30.7 to 36.6).
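
As a quick consistency check on the figures quoted above, the reported gains follow directly from the score pairs given in parentheses. The helper name is hypothetical; the numbers are taken from the abstract.

```python
def bleu_gain(before, after):
    """Difference between two BLEU scores, rounded to one decimal place."""
    return round(after - before, 1)


gain_over_mt = bleu_gain(25.2, 36.6)        # versus the initial translation
gain_over_baseline = bleu_gain(30.7, 36.6)  # versus the post-editing baseline
```

Both values match the +11.4 and +5.9 BLEU stated in the abstract.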

• [cs.CL]**Robust Multilingual Part-of-Speech Tagging via Adversarial Training**

*Michihiro Yasunaga, Jungo Kasai, Dragomir Radev*

http://arxiv.org/abs/1711.04903v1

Adversarial training (AT) is a powerful regularization method for neural networks, aiming to achieve robustness to input perturbations. Yet, the specific effects of the robustness obtained by AT are still unclear in the context of natural language processing. In this paper, we propose and analyze a neural POS tagging model that exploits adversarial training (AT). In our experiments on the Penn Treebank WSJ corpus and the Universal Dependencies (UD) dataset (28 languages), we find that AT not only improves the overall tagging accuracy, but also 1) largely prevents overfitting in low resource languages and 2) boosts tagging accuracy for rare / unseen words. The proposed POS tagger achieves state-of-the-art performance on nearly all of the languages in UD v1.2. We also demonstrate that 3) the improved tagging performance by AT contributes to the downstream task of dependency parsing, and that 4) AT helps the model to learn cleaner word and internal representations. These positive results motivate further use of AT for natural language tasks.

• [cs.CL]**SQLNet: Generating Structured Queries From Natural Language Without Reinforcement Learning**

*Xiaojun Xu, Chang Liu, Dawn Song*

http://arxiv.org/abs/1711.04436v1

Synthesizing SQL queries from natural language is a long-standing open problem and has been attracting considerable interest recently. Toward solving the problem, the de facto approach is to employ a sequence-to-sequence-style model. Such an approach will necessarily require the SQL queries to be serialized. Since the same SQL query may have multiple equivalent serializations, training a sequence-to-sequence-style model is sensitive to the choice from one of them. This phenomenon is documented as the "order-matters" problem. Existing state-of-the-art approaches rely on reinforcement learning to reward the decoder when it generates any of the equivalent serializations. However, we observe that the improvement from reinforcement learning is limited. In this paper, we propose a novel approach, i.e., SQLNet, to fundamentally solve this problem by avoiding the sequence-to-sequence structure when the order does not matter. In particular, we employ a sketch-based approach where the sketch contains a dependency graph so that one prediction can be done by taking into consideration only the previous predictions that it depends on. In addition, we propose a sequence-to-set model as well as the column attention mechanism to synthesize the query based on the sketch. By combining all these novel techniques, we show that SQLNet can outperform the prior art by 9% to 13% on the WikiSQL task.
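
The "order-matters" problem can be seen in a toy comparison: two serializations of the same WHERE clause differ as sequences but are identical as sets. The tuple representation of conditions below is an assumption made for illustration, not SQLNet's internal format.

```python
def same_as_sequence(conditions_a, conditions_b):
    """Compare WHERE conditions the way a token-by-token loss would."""
    return list(conditions_a) == list(conditions_b)


def same_as_set(conditions_a, conditions_b):
    """Compare WHERE conditions while ignoring their order."""
    return set(conditions_a) == set(conditions_b)


# WHERE age > 30 AND city = 'Paris'
query_a = [("age", ">", 30), ("city", "=", "Paris")]
# WHERE city = 'Paris' AND age > 30
query_b = [("city", "=", "Paris"), ("age", ">", 30)]

penalized_by_sequence_model = not same_as_sequence(query_a, query_b)
equivalent_as_set = same_as_set(query_a, query_b)
```

A sequence-to-sequence decoder trained on `query_a` is penalized for producing `query_b`, even though both queries return the same rows; predicting the conditions as a set removes that arbitrary choice.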

• [cs.CL]**Syntax-Directed Attention for Neural Machine Translation**

*Kehai Chen, Rui Wang, Masao Utiyama, Eiichiro Sumita, Tiejun Zhao*

http://arxiv.org/abs/1711.04231v1

The attention mechanism, including global attention and local attention, plays a key role in neural machine translation (NMT). Global attention attends to all source words for word prediction. In comparison, local attention selectively looks at source words within a fixed window. However, alignment weights for the current target word often decrease to the left and right by linear distance centering on the aligned source position and neglect syntax-directed distance constraints. In this paper, we extend local attention with a syntax-distance constraint, to focus on source words syntactically related to the predicted target word, thus learning a more effective context vector for word prediction. Moreover, we further propose a double context NMT architecture, which consists of a global context vector and a syntax-directed context vector over the global attention, to further improve translation performance using the source representation. The experiments on the large-scale Chinese-to-English and English-to-German translation tasks show that the proposed approach achieves a substantial and significant improvement over the baseline system.

• [cs.CL]**Towards Human-level Machine Reading Comprehension: Reasoning and Inference with Multiple Strategies**

*Yichong Xu, Jingjing Liu, Jianfeng Gao, Yelong Shen, Xiaodong Liu*

http://arxiv.org/abs/1711.04964v1

This paper presents a new MRC model that is capable of three key comprehension skills: 1) handling rich variations in question types; 2) understanding potential answer choices; and 3) drawing inference through multiple sentences. The model is based on the proposed MUlti-Strategy Inference for Comprehension (MUSIC) architecture, which is able to dynamically apply different attention strategies to different types of questions on the fly. By incorporating a multi-step inference engine analogous to ReasoNet (Shen et al., 2017), MUSIC can also effectively perform multi-sentence inference in generating answers. Evaluation on the RACE dataset shows that the proposed method significantly outperforms previous state-of-the-art models by 7.5% in relative accuracy.

• [cs.CL]**Unified Pragmatic Models for Generating and Following Instructions**

*Daniel Fried, Jacob Andreas, Dan Klein*

http://arxiv.org/abs/1711.04987v1

We extend models for both following and generating natural language instructions by adding an explicit pragmatic layer. These pragmatics-enabled models explicitly reason about why speakers produce certain instructions, and about how listeners will react upon hearing them. Given learned base listener and speaker models, we build a pragmatic listener that uses the base speaker to reason counterfactually about alternative action descriptions, and a pragmatic speaker that uses the base listener to simulate the interpretation of candidate instruction sequences. Evaluation of language generation and interpretation in the SAIL navigation and SCONE instruction following datasets shows that the pragmatic inference procedure improves state-of-the-art listener models (at correctly interpreting human instructions) and speaker models (at producing instructions correctly interpretable by humans) in diverse settings.
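
The pragmatic listener described above can be sketched with toy probability tables: candidate actions are rescored by how likely the base speaker would have been to produce the observed instruction for each of them. The geometric interpolation with `weight`, the table layout, and all numbers are assumptions for illustration; the abstract does not give the paper's exact scoring rule.

```python
def pragmatic_listener(instruction, candidates, base_listener, base_speaker, weight=0.5):
    """Pick the action that best explains the instruction.

    base_listener[instruction][action] approximates P(action | instruction).
    base_speaker[action][instruction] approximates P(instruction | action).
    """
    def score(action):
        speaker_prob = base_speaker[action].get(instruction, 0.0)
        listener_prob = base_listener[instruction].get(action, 0.0)
        return (speaker_prob ** weight) * (listener_prob ** (1.0 - weight))

    return max(candidates, key=score)


# The base listener cannot decide between the two actions...
base_listener = {"go left": {"one_step": 0.5, "two_steps": 0.5}}
# ...but a speaker wanting two steps would more likely say "go left twice".
base_speaker = {
    "one_step": {"go left": 0.9, "go left twice": 0.1},
    "two_steps": {"go left": 0.2, "go left twice": 0.8},
}
choice = pragmatic_listener("go left", ["one_step", "two_steps"], base_listener, base_speaker)
```

The counterfactual reasoning ("had the speaker meant two steps, they would have said so") breaks the tie in favor of `one_step`.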

• [cs.CL]**Unsupervised patient representations from clinical notes with interpretable classification decisions**

*Madhumita Sushil, Simon Šuster, Kim Luyckx, Walter Daelemans*

http://arxiv.org/abs/1711.05198v1

We have two main contributions in this work: 1. We explore the usage of a stacked denoising autoencoder, and a paragraph vector model to learn task-independent dense patient representations directly from clinical notes. We evaluate these representations by using them as features in multiple supervised setups, and compare their performance with those of sparse representations. 2. To understand and interpret the representations, we explore the best encoded features within the patient representations obtained from the autoencoder model. Further, we calculate the significance of the input features of the trained classifiers when we use these pretrained representations as input.

• [cs.CL]**Word, Subword or Character? An Empirical Study of Granularity in Chinese-English NMT**

*Yining Wang, Long Zhou, Jiajun Zhang, Chengqing Zong*

http://arxiv.org/abs/1711.04457v1

Neural machine translation (NMT), a new approach to machine translation, has been shown to outperform conventional statistical machine translation (SMT) across a variety of language pairs. Translation is an open-vocabulary problem, but most existing NMT systems operate with a fixed vocabulary, which makes them unable to translate rare words. This problem can be alleviated by using different translation granularities, such as character, subword and hybrid word-character. Translation involving Chinese is one of the most difficult tasks in machine translation. However, to the best of our knowledge, there has not been any other work exploring which translation granularity is most suitable for Chinese in NMT. In this paper, we conduct an extensive comparison using Chinese-English NMT as a case study. Furthermore, we discuss the advantages and disadvantages of various translation granularities in detail. Our experiments show that the subword model performs best for Chinese-to-English translation when the vocabulary is not too large, while the hybrid word-character model is most suitable for English-to-Chinese translation. Moreover, experiments with different granularities show that the Hybrid_BPE method achieves the best result on the Chinese-to-English translation task.

• [cs.CL]**Zero-Shot Style Transfer in Text Using Recurrent Neural Networks**

*Keith Carlson, Allen Riddell, Daniel Rockmore*

http://arxiv.org/abs/1711.04731v1

Zero-shot translation is the task of translating between a language pair where no aligned data for the pair is provided during training. In this work we employ a model that creates paraphrases which are written in the style of another existing text. Since we provide the model with no paired examples from the source style to the target style during training, we call this task zero-shot style transfer. Herein, we identify a high-quality source of aligned, stylistically distinct text in Bible versions and use this data to train an encoder/decoder recurrent neural model. We also train a statistical machine translation system, Moses, for comparison. We find that the neural network outperforms Moses on the established BLEU and PINC metrics for evaluating paraphrase quality. This technique can be widely applied due to the broad definition of style which is used. For example, tasks like text simplification can easily be viewed as style transfer. The corpus itself is highly parallel with 33 distinct Bible Versions used, and human-aligned due to the presence of chapter and verse numbers within the text. This makes the data a rich source of study for other natural language tasks.

• [cs.CR]**Cryptanalysis of Merkle-Hellman cipher using parallel genetic algorithm**

*Nedjmeeddine Kantour, Sadek Bouroubi*

http://arxiv.org/abs/1711.04642v1

In 1976, Whitfield Diffie and Martin Hellman introduced public key cryptography, also known as asymmetric cryptography. Two years later, Ralph Merkle and Martin Hellman published an asymmetric cryptosystem called MH, based on a variant of the knapsack problem known as the subset-sum problem, which is proven to be NP-hard. Over the last four decades, metaheuristics have achieved remarkable progress in solving NP-hard optimization problems. However, the design of these methods raises several challenges, mainly adaptation and parameter setting. In this paper, we propose a Parallel Genetic Algorithm (PGA) adapted to explore effectively a search space of considerable size in order to break the MH cipher. An experimental study is included, showing the performance of the proposed attacking scheme and concluding with a comparison with the LLL algorithm attack.
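
To make the attacked problem concrete, here is a toy sketch of the textbook Merkle-Hellman knapsack scheme together with an exhaustive search over bit vectors. The exhaustive search stands in for the search space that a genetic algorithm would explore; it is not the paper's PGA. Key sizes, the seed and all function names are illustrative assumptions, and the parameters are far too small to be secure.

```python
import random
from itertools import product
from math import gcd


def keygen(n_bits=8, seed=0):
    """Generate a toy Merkle-Hellman key pair."""
    rng = random.Random(seed)
    # Private superincreasing sequence: each term exceeds the sum of earlier ones.
    w, total = [], 0
    for _ in range(n_bits):
        nxt = total + rng.randint(1, 10)
        w.append(nxt)
        total += nxt
    q = total + rng.randint(1, 10)  # modulus larger than sum(w)
    r = rng.randrange(2, q)
    while gcd(r, q) != 1:  # the multiplier must be invertible modulo q
        r = rng.randrange(2, q)
    public = [(r * wi) % q for wi in w]
    return public, (w, q, r)


def encrypt(bits, public):
    """Ciphertext is the subset sum of public weights selected by the bits."""
    return sum(b * k for b, k in zip(bits, public))


def decrypt(cipher, private):
    w, q, r = private
    target = (cipher * pow(r, -1, q)) % q  # modular inverse needs Python 3.8+
    bits = []
    for wi in reversed(w):  # greedy choice is correct for superincreasing w
        if wi <= target:
            bits.append(1)
            target -= wi
        else:
            bits.append(0)
    return bits[::-1]


def brute_force_attack(cipher, public):
    """Try all 2^n bit vectors: the search space a metaheuristic would sample."""
    for bits in product((0, 1), repeat=len(public)):
        if encrypt(bits, public) == cipher:
            return list(bits)
    return None
```

The attacker sees only `public` and the ciphertext, so recovering the bits means solving a subset-sum instance. The exhaustive loop grows as 2^n, which is the motivation for heuristic or lattice-based attacks at realistic sizes.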

• [cs.CR]**CryptoDL: Deep Neural Networks over Encrypted Data**

*Ehsan Hesamifard, Hassan Takabi, Mehdi Ghasemi*

http://arxiv.org/abs/1711.05189v1

Machine learning algorithms based on deep neural networks have achieved remarkable results and are being extensively used in different domains. However, these machine learning algorithms require access to raw data, which is often privacy sensitive. To address this issue, we develop new techniques to provide solutions for running deep neural networks over encrypted data. In this paper, we develop new techniques to adopt deep neural networks within the practical limitations of current homomorphic encryption schemes. More specifically, we focus on classification with the well-known convolutional neural networks (CNN). First, we design methods for approximating the activation functions commonly used in CNNs (i.e. ReLU, Sigmoid, and Tanh) with low-degree polynomials, which is essential for efficient homomorphic encryption schemes. Then, we train convolutional neural networks with the approximation polynomials instead of the original activation functions and analyze the performance of the models. Finally, we implement convolutional neural networks over encrypted data and measure the performance of the models. Our experimental results validate the soundness of our approach with several convolutional neural networks with varying numbers of layers and structures. When applied to the MNIST optical character recognition tasks, our approach achieves 99.52% accuracy, which significantly outperforms the state-of-the-art solutions and is very close to the accuracy of the best non-private version, 99.77%. Also, it can make close to 164000 predictions per hour. We also applied our approach to CIFAR-10, which is much more complex than MNIST, and were able to achieve 91.5% accuracy with approximation polynomials used as activation functions. These results show that CryptoDL provides efficient, accurate and scalable privacy-preserving predictions.
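
The reason for polynomial activations is that homomorphic encryption schemes evaluate additions and multiplications, not functions such as `exp`. As a rough illustration, the sketch below replaces the sigmoid with its degree-3 Taylor polynomial around zero, 1/2 + x/4 - x^3/48, and measures the error on an interval. The Taylor expansion is a stand-in chosen for simplicity; the abstract does not specify the paper's approximation method, and the interval and function names are assumptions.

```python
import math


def sigmoid(x):
    """Reference sigmoid; not directly computable under homomorphic encryption."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_poly3(x):
    """Degree-3 Taylor polynomial of the sigmoid at 0: only + and * are needed."""
    return 0.5 + x / 4.0 - x ** 3 / 48.0


def max_abs_error(lo, hi, steps=1000):
    """Largest absolute gap between the sigmoid and the polynomial on [lo, hi]."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(abs(sigmoid(x) - sigmoid_poly3(x)) for x in xs)
```

The approximation is tight near zero and degrades quickly for larger inputs, which suggests why retraining the network with the polynomial in place, as the abstract describes, matters more than simply swapping activations after training.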

• [cs.CR]**Stampery Blockchain Timestamping Architecture (BTA) - Version 6**

*Adán Sánchez de Pedro Crespo, Luis Iván Cuende García*

http://arxiv.org/abs/1711.04709v1

A method for timestamping, anchoring and certification of a virtually unlimited amount of data in one or more blockchains, focusing on scalability and cost-effectiveness while ensuring existence, integrity and ownership by using cryptographic proofs that are independently verifiable by anyone in the world without disclosure of the original data and without the intervention of the certifying party.

• [cs.CV]**3D Shape Classification Using Collaborative Representation based Projections**

*F. Fotopoulou, S. Oikonomou, A. Papathanasiou, G. Economou, S. Fotopoulos*

http://arxiv.org/abs/1711.04875v1

A novel 3D shape classification scheme, based on collaborative representation learning, is investigated in this work. A data-driven feature-extraction procedure, taking the form of a simple projection operator, is at the core of our methodology. Given a shape database, a graph encapsulating the structural relationships among all the available shapes is first constructed and then employed in defining low-dimensional sparse projections. The recently introduced method of CRPs (collaborative representation based projections), which is based on L2-Graph, is the first variant that is included towards this end. A second algorithm, which particularizes the CRPs to shape descriptors that are inherently nonnegative, is also introduced as a potential alternative. In both cases, the weights in the graph reflecting the database structure are calculated so as to approximate each shape as a sparse linear combination of the remaining dataset objects. By way of solving a generalized eigenanalysis problem, a linear matrix operator is designed that will act as the feature extractor. Two popular, inherently high-dimensional descriptors, namely ShapeDNA and Global Point Signature (GPS), are employed in our experiments with the SHREC10, SHREC11 and SHREC15 datasets, where shape recognition is cast as a multi-class classification problem that is tackled by means of an SVM (support vector machine) acting within the reduced-dimensional space of the crafted projections. The results are very promising and outperform state-of-the-art methods, providing evidence of the highly discriminative nature of the introduced 3D shape representations.

• [cs.CV]**All-Transfer Learning for Deep Neural Networks and its Application to Sepsis Classification**

*Yoshihide Sawada, Yoshikuni Sato, Toru Nakada, Kei Ujimoto, Nobuhiro Hayashi*

http://arxiv.org/abs/1711.04450v1

In this article, we propose a transfer learning method for deep neural networks (DNNs). Deep learning has been widely used in many applications. However, applying deep learning is problematic when a large amount of training data is not available. One of the conventional methods for solving this problem is transfer learning for DNNs. In the field of image recognition, state-of-the-art transfer learning methods for DNNs re-use parameters trained on source domain data except for the output layer. However, this method may result in poor classification performance when the amount of target domain data is very small. To address this problem, we propose a method called All-Transfer Deep Learning, which enables the transfer of all parameters of a DNN. With this method, we can compute the relationship between the source and target labels from the source domain knowledge. We applied our method to actual two-dimensional electrophoresis image (2-DE image) classification to determine whether an individual suffers from sepsis; this is the first attempt to apply a classification approach to 2-DE images for proteomics, which has attracted considerable attention as an extension beyond genomics. The results suggest that our proposed method outperforms conventional transfer learning methods for DNNs.

• [cs.CV]**An Automatic Diagnosis Method of Facial Acne Vulgaris Based on Convolutional Neural Network**

*Xiaolei Shen, Jiachi Zhang, Chenjun Yan, Hong Zhou*

http://arxiv.org/abs/1711.04481v1

In this paper, we present a new automatic diagnosis method for facial acne vulgaris based on a convolutional neural network. The method is proposed to overcome the shortcomings in the classification types of previous methods. The core of our method is to extract image features with a convolutional neural network and perform classification with a classifier. We design a binary skin/non-skin classifier to detect skin areas and a seven-class classifier to classify facial acne vulgaris and healthy skin. In the experiment, we compare the effectiveness of our convolutional neural network with that of the VGG16 neural network pre-trained on the ImageNet dataset. We use the ROC curve and the normal confusion matrix to evaluate the performance of the binary classifier and the seven-class classifier. The results of our experiment show that the pre-trained VGG16 neural network is more effective in extracting image features. The classifiers based on the pre-trained VGG16 neural network achieve skin detection and acne classification with good robustness.

• [cs.CV]**An optimized shape descriptor based on structural properties of networks**

*Gisele H. B. Miranda, Jeaneth Machicao, Odemir M. Bruno*

http://arxiv.org/abs/1711.05104v1

The structural analysis of shape boundaries leads to the characterization of objects as well as to the understanding of shape properties. The literature on graphs and networks has contributed to the structural characterization of shapes with different theoretical approaches. We performed a study on the relationship between the shape architecture and the topology of the network constructed over the shape boundary. For that, we used a method for network modeling proposed in 2009. First, together with curvature analysis, we evaluated the proposed approach for regular polygons. In this way, it was possible to investigate how the network measurements vary according to some specific shape properties. Second, we evaluated the performance of the proposed shape descriptor in classification tasks for three datasets, accounting for both real-world and synthetic shapes. We demonstrated that degree-related measurements are not the only ones capable of distinguishing classes of objects. Moreover, when using measurements that account for distinct properties of the network structure, the construction of the shape descriptor becomes more computationally efficient. Given that the network is dynamically constructed, the number of iterations can be reduced. The proposed approach accounts for a more robust set of structural measurements that improved the discriminant power of the shape descriptors.

• [cs.CV]**Arbitrarily-Oriented Text Recognition**

*Zhanzhan Cheng, Xuyang Liu, Fan Bai, Yi Niu, Shiliang Pu, Shuigeng Zhou*

http://arxiv.org/abs/1711.04226v1

Recognizing text from natural images is still a hot research topic in computer vision due to its various applications. Despite the enduring research of several decades on optical character recognition (OCR), recognizing texts from natural images is still a challenging task. This is because scene texts are often in irregular arrangements (curved, arbitrarily-oriented or seriously distorted), which have not yet been well addressed in the literature. Existing methods on text recognition mainly work with regular (horizontal and frontal) texts and cannot be trivially generalized to handle irregular texts. In this paper, we develop the arbitrary orientation network (AON) to capture the deep features of irregular texts (e.g. arbitrarily-oriented, perspective or curved), which are combined into an attention-based decoder to generate character sequence. The whole network can be trained end-to-end by using only images and word-level labels. Extensive experiments on various benchmarks, including the CUTE80, SVT-Perspective, IIIT5k, SVT and ICDAR datasets, show that the proposed AON-based method substantially outperforms the existing methods.

• [cs.CV]**Automatic Target Recognition of Aircraft using Inverse Synthetic Aperture Radar**

*Carlos Pena-Caballero, Elifaleth Cantu, Jesus Rodriguez, Adolfo Gonzales, Osvaldo Castellanos, Angel Cantu, Megan Strait, Jae Son, Dongchul Kim*

http://arxiv.org/abs/1711.04901v1

Along with the improvement of radar technologies, Automatic Target Recognition (ATR) using Synthetic Aperture Radar (SAR) and Inverse SAR (ISAR) has come to be an active research area. SAR/ISAR are radar techniques to generate a two-dimensional high-resolution image of a target. Unlike other similar experiments using Convolutional Neural Networks (CNN) to solve this problem, we utilize an unusual approach that leads to better performance and faster training times. Our CNN uses complex values generated by a simulation to train the network; additionally, we utilize a multi-radar approach to increase the accuracy of the training and testing processes, thus resulting in higher accuracies than the other papers working on SAR/ISAR ATR. We generated our dataset with 7 different aircraft models using a radar simulator we developed called RadarPixel; it is a Windows GUI program implemented in Matlab and Java. The simulator is capable of accurately replicating real SAR/ISAR configurations. Our objective is to utilize our multi-radar technique and determine the optimal number of radars needed to detect and classify targets.

• [cs.CV]**Capturing Localized Image Artifacts through a CNN-based Hyper-image Representation**

*Parag Shridhar Chandakkar, Baoxin Li*

http://arxiv.org/abs/1711.04945v1

Training deep CNNs to capture localized image artifacts on a relatively small dataset is a challenging task. With enough images at hand, one can hope that a deep CNN characterizes localized artifacts over the entire data and their effect on the output. However, on smaller datasets, such deep CNNs may overfit and shallow ones find it hard to capture local artifacts. Thus some image-based small-data applications first train their framework on a collection of patches (instead of the entire image) to better learn the representation of localized artifacts. Then the output is obtained by averaging the patch-level results. Such an approach ignores the spatial correlation among patches and how various patch locations affect the output. It also fails in cases where few patches mainly contribute to the image label. To combat these scenarios, we develop the notion of hyper-image representations. Our CNN has two stages. The first stage is trained on patches. The second stage utilizes the last layer representation developed in the first stage to form a hyper-image, which is used to train the second stage. We show that this approach is able to develop a better mapping between the image and its output. We analyze additional properties of our approach and show its effectiveness on one synthetic and two real-world vision tasks - no-reference image quality estimation and image tampering detection - by its performance improvement over existing strong baselines.

• [cs.CV]**Conditional Autoencoders with Adversarial Information Factorization**

*Antonia Creswell, Anil A Bharath, Biswa Sengupta*

http://arxiv.org/abs/1711.05175v1

Generative models, such as variational auto-encoders (VAE) and generative adversarial networks (GAN), have been immensely successful in approximating image statistics in computer vision. VAEs are useful for unsupervised feature learning, while GANs alleviate supervision by penalizing inaccurate samples using an adversarial game. In order to utilize benefits of these two approaches, we combine the VAE under an adversarial setup with auxiliary label information. We show that factorizing the latent space to separate the information needed for reconstruction (a continuous space) from the information needed for image attribute classification (a discrete space), enables the capability to edit specific attributes of an image.

• [cs.CV]**Conditional Random Field and Deep Feature Learning for Hyperspectral Image Segmentation**

*Fahim Irfan Alam, Jun Zhou, Alan Wee-Chung Liew, Xiuping Jia, Jocelyn Chanussot, Yongsheng Gao*

http://arxiv.org/abs/1711.04483v1

Image segmentation is considered to be one of the critical tasks in hyperspectral remote sensing image processing. Recently, the convolutional neural network (CNN) has established itself as a powerful model in segmentation and classification by demonstrating excellent performance. The use of a graphical model such as a conditional random field (CRF) contributes further to capturing contextual information and thus improving the segmentation performance. In this paper, we propose a method to segment hyperspectral images by considering both spectral and spatial information via a combined framework consisting of CNN and CRF. We use multiple spectral cubes to learn deep features using CNN, and then formulate deep CRF with CNN-based unary and pairwise potential functions to effectively extract the semantic correlations between patches consisting of three-dimensional data cubes. Effective piecewise training is applied in order to avoid the computationally expensive iterative CRF inference. Furthermore, we introduce a deep deconvolution network that improves the segmentation masks. We also introduce a new dataset and evaluate our proposed method on it, along with several widely adopted benchmark datasets, to assess the effectiveness of our method. By comparing our results with those from several state-of-the-art models, we show the promising potential of our method.

• [cs.CV]**Convolutional neural networks pretrained on large face recognition datasets for emotion classification from video**

*Boris Knyazev, Roman Shvetsov, Natalia Efremova, Artem Kuharenko*

http://arxiv.org/abs/1711.04598v1

In this paper we describe our entry for the emotion recognition challenge EmotiW 2017. We propose an ensemble of several models, which capture spatial and audio features from videos. Spatial features are captured by convolutional neural networks pretrained on large face recognition datasets. We show that the use of strong industry-level face recognition networks increases the accuracy of emotion recognition. Using our ensemble, we improve on the previous best result on the test set by about 1%, achieving a 60.03% classification accuracy without any use of visual temporal information.

• [cs.CV]**Crowd counting via scale-adaptive convolutional neural network**

*Lu Zhang, Miaojing Shi*

http://arxiv.org/abs/1711.04433v1

The task of crowd counting is to automatically estimate the number of pedestrians in crowd images. To cope with the scale and perspective changes that commonly exist in crowd images, state-of-the-art approaches employ multi-column CNN architectures to regress density maps of crowd images. Multiple columns have different receptive fields corresponding to pedestrians (heads) of different scales. We instead propose a scale-adaptive CNN (SaCNN) architecture with a backbone of fixed small receptive fields. We extract feature maps from multiple layers and adapt them to have the same output size; we combine them to produce the final density map. The number of people is computed by integrating the density map. We also introduce a relative count loss along with the density map loss to improve the network generalization on crowd scenes with few pedestrians, where most representative approaches perform poorly. We conduct extensive experiments on the ShanghaiTech, UCF_CC_50 and WorldExpo datasets as well as a new dataset SmartCity that we collect for crowd scenes with few people. The results demonstrate significant improvements of SaCNN over the state-of-the-art.
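
The statement that the count is obtained by integrating the density map can be illustrated with a small sketch. Here a ground-truth-style density map is built by placing one Gaussian blob per annotated head, with each blob normalized to unit mass inside the image, so summing the map recovers the head count. The kernel width, the per-image normalization and the function names are assumptions of this sketch, and no CNN is involved.

```python
import math


def density_map(height, width, heads, sigma=1.5):
    """Build a density map with one unit-mass Gaussian per (row, col) head."""
    dmap = [[0.0] * width for _ in range(height)]
    for hy, hx in heads:
        kernel = [
            [math.exp(-((y - hy) ** 2 + (x - hx) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)
        ]
        mass = sum(map(sum, kernel))  # normalize so each head contributes 1
        for y in range(height):
            for x in range(width):
                dmap[y][x] += kernel[y][x] / mass
    return dmap


def count_from_density(dmap):
    """Discrete integration: the crowd count is the sum over all pixels."""
    return sum(map(sum, dmap))
```

A regression network trained to output such maps therefore yields a count estimate as the sum of its output, with no explicit detection of individual people.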

• [cs.CV]**D-PCN: Parallel Convolutional Neural Networks for Image Recognition in Reverse Adversarial Style**

*Shiqi Yang, Gang Peng*

http://arxiv.org/abs/1711.04237v1

In this paper, we propose D-PCN, a recognition framework that uses a discriminator to strengthen the feature-extraction ability of convolutional neural networks. The framework contains two parallel convolutional neural networks and a discriminator, which is borrowed from Generative Adversarial Nets and improves the performance of the parallel networks. The two nets are placed side by side, and the discriminator takes the features from the parallel networks as input, aiming to guide the two nets to learn features of different details in a reverse adversarial style. The feature maps from the two nets are then aggregated, and an extra overall classifier outputs the final prediction from the fused features. We also introduce the training strategy of D-PCN, which ensures that the discriminator is utilized. We evaluate D-PCN with several CNN models, including NIN, ResNet, ResNeXt and DenseNet, using a single NVIDIA TITAN Xp, on two benchmark datasets: CIFAR-100 and downsampled ImageNet-1k. D-PCN enhances all models on CIFAR-100 and also clearly improves the performance of ResNet on downsampled ImageNet-1k. In particular, it yields state-of-the-art classification performance on CIFAR-100 compared to related works.

• [cs.CV]**Denoising Imaging Polarimetry by Adapted BM3D Method**

*Alexander B. Tibbs, Ilse M. Daly, Nicholas W. Roberts, David R. Bull*

http://arxiv.org/abs/1711.04853v1

Imaging polarimetry allows more information to be extracted from a scene than conventional intensity or colour imaging. However, a major challenge of imaging polarimetry is image degradation due to noise. This paper investigates the mitigation of noise through denoising algorithms and compares existing denoising algorithms with a new method, based on BM3D. This algorithm, PBM3D, gives visual quality superior to the state of the art across all images and noise standard deviations tested. We show that denoising polarization images using PBM3D allows the degree of polarization to be more accurately calculated by comparing it to spectroscopy methods.

• [cs.CV]**Dynamic Zoom-in Network for Fast Object Detection in Large Images**

*Mingfei Gao, Ruichi Yu, Ang Li, Vlad I. Morariu, Larry S. Davis*

http://arxiv.org/abs/1711.05187v1

We introduce a generic framework that reduces the computational cost of object detection while retaining accuracy for scenarios where objects with varied sizes appear in high resolution images. Detection progresses in a coarse-to-fine manner, first on a down-sampled version of the image and then on a sequence of higher resolution regions identified as likely to improve the detection accuracy. Built upon reinforcement learning, our approach consists of a model (R-net) that uses coarse detection results to predict the potential accuracy gain for analyzing a region at a higher resolution and another model (Q-net) that sequentially selects regions to zoom in. Experiments on the Caltech Pedestrians dataset show that our approach reduces the number of processed pixels by over 50% without a drop in detection accuracy. The merits of our approach become more significant on a high resolution test set collected from YFCC100M dataset where our approach maintains high detection performance while reducing the number of processed pixels by about 70% and the detection time by over 50%.
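
The pixel savings of a coarse-to-fine scheme come down to simple arithmetic, sketched below: the cost is the down-sampled pass plus whichever regions are re-examined at full resolution. The function name, the integer down-sampling factor and the assumption that zoomed regions do not overlap are illustrative. How regions are selected (the R-net and Q-net in the abstract) is not modeled here.

```python
def processed_fraction(height, width, downsample, regions):
    """Fraction of full-resolution pixels touched by coarse pass + zoom-ins.

    regions: list of (region_height, region_width) analysed at full
    resolution; assumed non-overlapping for this sketch.
    """
    full = height * width
    coarse = (height // downsample) * (width // downsample)
    zoomed = sum(rh * rw for rh, rw in regions)
    return (coarse + zoomed) / full


# Hypothetical numbers: a 1000x1000 image, a 2x down-sampled first pass,
# and a single 200x300 region revisited at full resolution.
example = processed_fraction(1000, 1000, 2, [(200, 300)])
```

In this hypothetical case the detector touches 31% of the pixels of the full image, so the savings depend on keeping the zoomed regions few and small.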

• [cs.CV]**Evaluation of trackers for Pan-Tilt-Zoom Scenarios**

*Yucao Tang, Guillaume-Alexandre Bilodeau*

http://arxiv.org/abs/1711.04260v1

Tracking with a Pan-Tilt-Zoom (PTZ) camera has been a research topic in computer vision for many years. Compared to tracking with a still camera, the images captured with a PTZ camera are highly dynamic in nature because the camera can perform large motions, resulting in quickly changing capture conditions. Furthermore, tracking with a PTZ camera involves camera control to position the camera on the target. For successful tracking and camera control, the tracker must be fast enough or must be able to accurately predict the next position of the target. Therefore, standard benchmarks do not allow the quality of a tracker to be properly assessed for the PTZ scenario. In this work, we use a virtual PTZ framework to evaluate different tracking algorithms and compare their performance. We also extend the framework to add target position prediction for the next frame, accounting for camera motion and processing delays. By doing this, we can assess whether prediction can make long-term tracking more robust, as it may help slower algorithms keep the target in the field of view of the camera. Results confirm that both speed and robustness are required for tracking under the PTZ scenario.

• [cs.CV]**Feature Enhancement Network: A Refined Scene Text Detector**

*Sheng Zhang, Yuliang Liu, Lianwen Jin, Canjie Luo*

http://arxiv.org/abs/1711.04249v1

In this paper, we propose a refined scene text detector with a *novel* Feature Enhancement Network (FEN) for Region Proposal and Text Detection Refinement. Retrospectively, both region proposal with *only* $3\times 3$ sliding-window feature and text detection refinement with *single scale* high level feature are insufficient, especially for smaller scene text. Therefore, we design a new FEN network with *task-specific*, *low* and *high* level semantic features fusion to improve the performance of text detection. Besides, since *unitary* position-sensitive RoI pooling in general object detection is unreasonable for variable text regions, an *adaptively weighted* position-sensitive RoI pooling layer is devised for further enhancing the detecting accuracy. To tackle the *sample-imbalance* problem during the refinement stage, we also propose an effective *positives mining* strategy for efficiently training our network. Experiments on ICDAR 2011 and 2013 robust text detection benchmarks demonstrate that our method can achieve state-of-the-art results, outperforming all reported methods in terms of F-measure.

• [cs.CV]**Gender recognition and biometric identification using a large dataset of hand images**

*Mahmoud Afifi*

http://arxiv.org/abs/1711.04322v1

The human hand possesses distinctive features which can reveal gender information. In addition, the hand is considered one of the primary biometric traits used to identify a person. In this work, we propose a large dataset of human hand images with detailed ground-truth information for gender recognition and biometric identification. The proposed dataset comprises 11,076 hand images (dorsal and palmar sides) from 190 subjects of different ages under the same lighting conditions. Using this dataset, a convolutional neural network (CNN) can be trained effectively for the gender recognition task. Based on this, we design a two-stream CNN to tackle the gender recognition problem. This trained model is then used as a feature extractor to feed a set of support vector machine classifiers for the biometric identification task. To the best of our knowledge, this is the first dataset containing images of the dorsal side of the hand captured by a regular digital camera, and consequently this is the first study attempting to use features extracted from the dorsal side of the hand for gender recognition and biometric identification.

• [cs.CV]**Grab, Pay and Eat: Semantic Food Detection for Smart Restaurants**

*Eduardo Aguilar, Beatriz Remeseiro, Marc Bolaños, Petia Radeva*

http://arxiv.org/abs/1711.05128v1

Growing awareness among people of their nutritional habits has drawn considerable attention to the field of automatic food analysis. In self-service restaurant environments, automatic food analysis is not only useful for extracting nutritional information from the foods selected by customers; it is also of high interest for speeding up service by removing the bottleneck produced at the cashiers in times of high demand. In this paper, we address the problem of automatic food tray analysis in canteen and restaurant environments, which consists of predicting the multiple foods placed on a tray image. We propose a new approach to food analysis based on convolutional neural networks, which we name Semantic Food Detection, that integrates food localization, recognition and segmentation in the same framework. We demonstrate that our method improves on the state of the art in food detection by a considerable margin on the public dataset UNIMIB2016, achieving about 90% in terms of F-measure, and thus provides a significant technological advance towards automatic billing in restaurant environments.

• [cs.CV]**Hand Gesture Recognition with Leap Motion**

*Youchen Du, Shenglan Liu, Lin Feng, Menghui Chen, Jie Wu*

http://arxiv.org/abs/1711.04293v1

The recent introduction of depth cameras such as the Leap Motion Controller allows researchers to exploit depth information to recognize hand gestures more robustly. This paper proposes a novel hand gesture recognition system built on the Leap Motion Controller. A series of features is extracted from Leap Motion tracking data; we feed these features, along with HOG features extracted from sensor images, into a multi-class SVM classifier to recognize the performed gesture. Dimension reduction and feature-weighted fusion are also discussed. Our results show that our model is much more accurate than previous work.

• [cs.CV]**High-Order Attention Models for Visual Question Answering**

*Idan Schwartz, Alexander G. Schwing, Tamir Hazan*

http://arxiv.org/abs/1711.04323v1

The quest for algorithms that enable cognitive abilities is an important part of machine learning. A common trait in many recently investigated cognitive-like tasks is that they take into account different data modalities, such as visual and textual input. In this paper we propose a novel and generally applicable form of attention mechanism that learns high-order correlations between various data modalities. We show that high-order correlations effectively direct the appropriate attention to the relevant elements in the different data modalities that are required to solve the joint task. We demonstrate the effectiveness of our high-order attention mechanism on the task of visual question answering (VQA), where we achieve state-of-the-art performance on the standard VQA dataset.

• [cs.CV]**Latent Constrained Correlation Filter**

*Baochang Zhang, Shangzhen Luan, Chen Chen, Jungong Han, Wei Wang, Alessandro Perina, Ling Shao*

http://arxiv.org/abs/1711.04192v1

Correlation filters are special classifiers designed for shift-invariant object recognition, which are robust to pattern distortions. The recent literature shows that combining a set of sub-filters, each trained on a single image or a small group of images, obtains the best performance. The idea is equivalent to estimating variable distribution based on the data sampling (bagging), which can be interpreted as finding solutions (variable distribution approximation) directly from sampled data space. However, this methodology fails to account for the variations existing in the data. In this paper, we introduce an intermediate step -- solution sampling -- after the data sampling step to form a subspace, in which an optimal solution can be estimated. More specifically, we propose a new method, named latent constrained correlation filters (LCCF), by mapping the correlation filters to a given latent subspace, and develop a new learning framework in the latent subspace that embeds distribution-related constraints into the original problem. To solve the optimization problem, we introduce a subspace based alternating direction method of multipliers (SADMM), which is proven to converge at the saddle point. Our approach is successfully applied to three different tasks, including eye localization, car detection and object tracking. Extensive experiments demonstrate that LCCF outperforms the state-of-the-art methods. The source code will be publicly available. https://github.com/bczhangbczhang/.

• [cs.CV]**Modeling Human Categorization of Natural Images Using Deep Feature Representations**

*Ruairidh M. Battleday, Joshua C. Peterson, Thomas L. Griffiths*

http://arxiv.org/abs/1711.04855v1

Over the last few decades, psychologists have developed sophisticated formal models of human categorization using simple artificial stimuli. In this paper, we use modern machine learning methods to extend this work into the realm of naturalistic stimuli, enabling human categorization to be studied over the complex visual domain in which it evolved and developed. We show that representations derived from a convolutional neural network can be used to model behavior over a database of >300,000 human natural image classifications, and find that a group of models based on these representations perform well, near the reliability of human judgments. Interestingly, this group includes both exemplar and prototype models, contrasting with the dominance of exemplar models in previous work. We are able to improve the performance of the remaining models by preprocessing neural network representations to more closely capture human similarity judgments.

• [cs.CV]**Robust Image Registration via Empirical Mode Decomposition**

*Reza Abbasi-Asl, Aboozar Ghaffari, Emad Fatemizadeh*

http://arxiv.org/abs/1711.04247v1

Spatially varying intensity noise is a common source of distortion in images. Bias field noise is one example of such distortion that is often present in the magnetic resonance (MR) images. In this paper, we first show that empirical mode decomposition (EMD) can considerably reduce the bias field noise in the MR images. Then, we propose two hierarchical multi-resolution EMD-based algorithms for robust registration of images in the presence of spatially varying noise. One algorithm (LR-EMD) is based on registering EMD feature-maps of both floating and reference images in various resolution levels. In the second algorithm (AFR-EMD), we first extract an average feature-map based on EMD from both floating and reference images. Then, we use a simple hierarchical multi-resolution algorithm based on downsampling to register the average feature-maps. Both algorithms achieve lower error rate and higher convergence percentage compared to the intensity-based hierarchical registration. Specifically, using mutual information as the similarity measure, AFR-EMD achieves 42% lower error rate in intensity and 52% lower error rate in transformation compared to intensity-based hierarchical registration. For LR-EMD, the error rate is 32% lower for the intensity and 41% lower for the transformation.

• [cs.CV]**Robust Keyframe-based Dense SLAM with an RGB-D Camera**

*Haomin Liu, Chen Li, Guojun Chen, Guofeng Zhang, Michael Kaess, Hujun Bao*

http://arxiv.org/abs/1711.05166v1

In this paper, we present RKD-SLAM, a robust keyframe-based dense SLAM approach for an RGB-D camera that can robustly handle fast motion and dense loop closure, and run without time limitation in a moderate size scene. It not only can be used to scan high-quality 3D models, but also can satisfy the demand of VR and AR applications. First, we combine color and depth information to construct a very fast keyframe-based tracking method on a CPU, which can work robustly in challenging cases (e.g., fast camera motion and complex loops). For reducing accumulation error, we also introduce a very efficient incremental bundle adjustment (BA) algorithm, which can greatly save unnecessary computation and perform local and global BA in a unified optimization framework. An efficient keyframe-based depth representation and fusion method is proposed to generate and timely update the dense 3D surface with online correction according to the refined camera poses of keyframes through BA. The experimental results and comparisons on a variety of challenging datasets and the TUM RGB-D benchmark demonstrate the effectiveness of the proposed system.

• [cs.CV]**Saliency-based Sequential Image Attention with Multiset Prediction**

*Sean Welleck, Jialin Mao, Kyunghyun Cho, Zheng Zhang*

http://arxiv.org/abs/1711.05165v1

Humans process visual scenes selectively and sequentially using attention. Central to models of human visual attention is the saliency map. We propose a hierarchical visual architecture that operates on a saliency map and uses a novel attention mechanism to sequentially focus on salient regions and take additional glimpses within those regions. The architecture is motivated by human visual attention, and is used for multi-label image classification on a novel multiset task, demonstrating that it achieves high precision and recall while localizing objects with its attention. Unlike conventional multi-label image classification models, the model supports multiset prediction due to a reinforcement-learning based training process that allows for arbitrary label permutation and multiple instances per label.

• [cs.CV]**Vertebral body segmentation with GrowCut: Initial experience, workflow and practical application**

*Jan Egger, Christopher Nimsky, Xiaojun Chen*

http://arxiv.org/abs/1711.04592v1

In this contribution, we used the GrowCut segmentation algorithm publicly available in 3D Slicer for three-dimensional segmentation of vertebral bodies. To the best of our knowledge, this is the first time that the GrowCut method has been studied for vertebral body segmentation. In brief, we found that the GrowCut segmentation times were consistently less than the manual segmentation times. Hence, GrowCut provides an alternative to a manual slice-by-slice segmentation process.

• [cs.CV]**Visual Concepts and Compositional Voting**

*Jianyu Wang, Zhishuai Zhang, Cihang Xie, Yuyin Zhou, Vittal Premachandran, Jun Zhu, Lingxi Xie, Alan Yuille*

http://arxiv.org/abs/1711.04451v1

It is very attractive to formulate vision in terms of pattern theory (Mumford, 2010), where patterns are defined hierarchically by compositions of elementary building blocks. But applying pattern theory to real world images is currently less successful than discriminative methods such as deep networks. Deep networks, however, are black-boxes which are hard to interpret and can easily be fooled by adding occluding objects. It is natural to wonder whether by better understanding deep networks we can extract building blocks which can be used to develop pattern theoretic models. This motivates us to study the internal representations of a deep network using vehicle images from the PASCAL3D+ dataset. We use clustering algorithms to study the population activities of the features and extract a set of visual concepts which we show are visually tight and correspond to semantic parts of vehicles. To analyze this we annotate these vehicles by their semantic parts to create a new dataset, VehicleSemanticParts, and evaluate visual concepts as unsupervised part detectors. We show that visual concepts perform fairly well but are outperformed by supervised discriminative methods such as Support Vector Machines (SVM). We next give a more detailed analysis of visual concepts and how they relate to semantic parts. Following this, we use the visual concepts as building blocks for a simple pattern theoretical model, which we call compositional voting. In this model several visual concepts combine to detect semantic parts. We show that this approach is significantly better than discriminative methods like SVM and deep networks trained specifically for semantic part detection. Finally, we return to studying occlusion by creating an annotated dataset with occlusion, called VehicleOcclusion, and show that compositional voting outperforms even deep networks when the amount of occlusion becomes large.

• [cs.CV]**XGAN: Unsupervised Image-to-Image Translation for many-to-many Mappings**

*Amélie Royer, Konstantinos Bousmalis, Stephan Gouws, Fred Bertsch, Inbar Moressi, Forrester Cole, Kevin Murphy*

http://arxiv.org/abs/1711.05139v1

Style transfer usually refers to the task of applying color and texture information from a specific style image to a given content image while preserving the structure of the latter. Here we tackle the more generic problem of semantic style transfer: given two unpaired collections of images, we aim to learn a mapping between the corpus-level style of each collection, while preserving semantic content shared across the two domains. We introduce XGAN ("Cross-GAN"), a dual adversarial autoencoder, which captures a shared representation of the common domain semantic content in an unsupervised way, while jointly learning the domain-to-domain image translations in both directions. We exploit ideas from the domain adaptation literature and define a semantic consistency loss which encourages the model to preserve semantics in the learned embedding space. We report promising qualitative results for the task of face-to-cartoon translation. The cartoon dataset we collected for this purpose will also be released as a new benchmark for semantic style transfer.

• [cs.CY]**A Case Study of the 2016 Korean Cyber Command Compromise**

*Kyong Jae Park, Sung Mi Park, Joshua I. James*

http://arxiv.org/abs/1711.04500v1

In October 2016 the South Korean cyber military unit was the victim of a successful cyber attack that allowed access to internal networks. As usual with large-scale attacks against South Korean entities, the hack was immediately attributed to North Korea. Also, as with other large-scale cyber security incidents, the same types of 'evidence' were used for attribution purposes. Disclosed methods of attribution provide weak evidence, and the procedure Korean organizations tend to use for information disclosure leads many to question any conclusions. We will analyze and discuss a number of issues with the current way that South Korean organizations disclose cyber attack information to the public. A timeline of events and disclosures will be constructed and analyzed in the context of appropriate measures for cyber warfare. Finally, we will examine the South Korean cyber military attack in terms of previously proposed cyber warfare response guidelines. Specifically, we ask whether any of the guidelines can be applied to this real-world case and, if so, whether South Korea is justified in declaring war based on the most recent cyber attack.

• [cs.CY]**Cloud Computing and Content Management Systems: A Case Study in Macedonian Education**

*Jove Jankulovski, Pece Mitrevski*

http://arxiv.org/abs/1711.04025v1

Technologies have become inseparable from our lives, the economy, and society as a whole. For example, clouds provide numerous computing resources that can facilitate our lives, whereas Content Management Systems (CMSs) can provide the right content for the right user. Thus, education must embrace these emerging technologies in order to prepare citizens for the 21st century. The research explored 'if' and 'how' Cloud Computing influences the application of CMSs, and 'if' and 'how' it fosters the usage of mobile technologies to access cloud resources. The analyses revealed that some of the respondents have sound experience in using clouds and in using CMSs. Nevertheless, it was evident that a significant number of respondents have limited or no experience with cloud computing concepts, cloud security and CMSs. Institutions of the system should update educational policies in order to enable education innovation, provide means and support, and continuously update/upgrade educational infrastructure.

• [cs.CY]**Gerrymandering and Computational Redistricting**

*Olivia Guest, Frank J. Kanayet, Bradley C. Love*

http://arxiv.org/abs/1711.04640v1

Partisan gerrymandering poses a threat to democracy. Moreover, the complexity of the districting task may exceed human capacities. One potential solution is using computational models to automate the districting process by optimising objective and open criteria, such as how spatially compact districts are. We formulated one such model that minimised pairwise distance between voters within a district. Using US Census Bureau data, we confirmed our prediction that the difference in compactness between the computed and actual districts would be greatest for states that are large and therefore difficult for humans to properly district given their limited capacities. The computed solutions highlighted differences in how humans and machines solve this task with machine solutions more fully optimised and displaying emergent properties not evident in human solutions. These results suggest a division of labour in which humans debate and formulate districting criteria whereas machines optimise the criteria to draw the district boundaries.
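
A compactness criterion of the kind described above can be sketched as a cost function over candidate plans. This is a minimal illustration, not the authors' model: the use of the mean (rather than the sum) of pairwise distances, the planar coordinates, and the names `mean_pairwise_distance` and `plan_cost` are all assumptions.

```python
import math
from itertools import combinations

def mean_pairwise_distance(points):
    """Average Euclidean distance over all pairs of voters in one district."""
    pairs = list(combinations(points, 2))
    if not pairs:
        return 0.0
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def plan_cost(voters, assignment):
    """Sum of per-district mean pairwise distances (lower is more compact)."""
    districts = {}
    for voter, district in zip(voters, assignment):
        districts.setdefault(district, []).append(voter)
    return sum(mean_pairwise_distance(members)
               for members in districts.values())

# Hypothetical voter locations: two tight clusters 10 units apart.
voters = [(0, 0), (0, 1), (10, 0), (10, 1)]
compact_plan = [0, 0, 1, 1]    # each cluster is its own district
sprawling_plan = [0, 1, 0, 1]  # each district spans both clusters
compact_cost = plan_cost(voters, compact_plan)
sprawling_cost = plan_cost(voters, sprawling_plan)
```

An optimiser would search over assignments to reduce this cost, subject to whatever population constraints the districting rules impose; that search is not shown here.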

• [cs.CY]**United Nations Digital Blue Helmets as a Starting Point for Cyber Peacekeeping**

*Nikolay Akatyev, Joshua I. James*

http://arxiv.org/abs/1711.04502v1

Prior works, such as the Tallinn manual on the international law applicable to cyber warfare, focus on the circumstances of cyber warfare. Many organizations are considering how to conduct cyber warfare, but few have discussed methods to reduce, or even prevent, cyber conflict. A recent series of publications started developing the framework of Cyber Peacekeeping (CPK) and its legal requirements. These works assessed the current state of organizations such as ITU IMPACT, NATO CCDCOE and the Shanghai Cooperation Organization, and found that they did not satisfy the requirements to effectively host CPK activities. An assessment of organizations currently working in areas related to CPK found that the United Nations (UN) has mandates and organizational structures that appear to partly overlap with the needs of CPK. However, the UN's current approach to Peacekeeping cannot be directly mapped to cyberspace. In this research we analyze the development of traditional Peacekeeping in the United Nations, and current initiatives in cyberspace. Specifically, we will compare the proposed CPK framework with the recent United Nations initiative named the 'Digital Blue Helmets' as well as with other UN projects that help to predict and mitigate conflicts. Our goal is to find practical recommendations for the implementation of the CPK framework in the United Nations, and to examine how responsibilities defined in the CPK framework overlap with those of the 'Digital Blue Helmets' and the Global Pulse program.

• [cs.CY]**Using Phone Sensors and an Artificial Neural Network to Detect Gait Changes During Drinking Episodes in the Natural Environment**

*Brian Suffoletto, Pedram Gharani, Tammy Chung, Hassan Karimi*

http://arxiv.org/abs/1711.03410v2

Phone sensors could be useful in assessing changes in gait that occur with alcohol consumption. This study determined (1) feasibility of collecting gait-related data during drinking occasions in the natural environment, and (2) how gait-related features measured by phone sensors relate to estimated blood alcohol concentration (eBAC). Ten young adult heavy drinkers were prompted to complete a 5-step gait task every hour from 8pm to 12am over four consecutive weekends. We collected 3-axis accelerometer, gyroscope, and magnetometer data from phone sensors, and computed 24 gait-related features using a sliding window technique. eBAC levels were calculated at each time point based on Ecological Momentary Assessment (EMA) of alcohol use. We used an artificial neural network model to analyze associations between sensor features and eBACs in training (70% of the data) and validation and test (30% of the data) datasets. We analyzed 128 data points where both eBAC and gait-related sensor data were captured, either when not drinking (n=60), while eBAC was ascending (n=55) or while eBAC was descending (n=13). 21 data points were captured at times when the eBAC was greater than the legal limit (0.08 mg/dl). Using a Bayesian regularized neural network, gait-related phone sensor features showed a high correlation with eBAC (Pearson's r > 0.9), and >95% of estimated eBAC would fall between -0.012 and +0.012 of actual eBAC. It is feasible to collect gait-related data from smartphone sensors during drinking occasions in the natural environment. Sensor-based features can be used to infer gait changes associated with elevated blood alcohol content.

• [cs.DC]**A Parallel Best-Response Algorithm with Exact Line Search for Nonconvex Sparsity-Regularized Rank Minimization**

*Yang Yang, Marius Pesavento*

http://arxiv.org/abs/1711.04489v1

In this paper, we propose a convergent parallel best-response algorithm with the exact line search for the nondifferentiable nonconvex sparsity-regularized rank minimization problem. On the one hand, it exhibits a faster convergence than subgradient algorithms and block coordinate descent algorithms. On the other hand, its convergence to a stationary point is guaranteed, while ADMM algorithms only converge for convex problems. Furthermore, the exact line search procedure in the proposed algorithm is performed efficiently in closed-form to avoid the meticulous choice of stepsizes, which is however a common bottleneck in subgradient algorithms and successive convex approximation algorithms. Finally, the proposed algorithm is numerically tested.

• [cs.DC]**Accelerating HPC codes on Intel(R) Omni-Path Architecture networks: From particle physics to Machine Learning**

*Peter Boyle, Michael Chuvelev, Guido Cossu, Christopher Kelly, Christoph Lehner, Lawrence Meadows*

http://arxiv.org/abs/1711.04883v1

We discuss practical methods to ensure near wirespeed performance from clusters with either one or two Intel(R) Omni-Path host fabric interfaces (HFI) per node, and Intel(R) Xeon Phi(TM) 72xx (Knights Landing) processors, and using the Linux operating system. The study evaluates the performance improvements achievable and the required programming approaches in two distinct example problems: firstly in Cartesian communicator halo exchange problems, appropriate for structured grid PDE solvers that arise in quantum chromodynamics simulations of particle physics, and secondly in gradient reduction appropriate to synchronous stochastic gradient descent for machine learning. As an example, we accelerate a published Baidu Research reduction code and obtain a factor of ten speedup over the original code using the techniques discussed in this paper. This displays how a factor of ten speedup in strongly scaled distributed machine learning could be achieved when synchronous stochastic gradient descent is massively parallelised with a fixed mini-batch size. We find a significant improvement in performance robustness when memory is obtained using carefully allocated 2MB "huge" virtual memory pages, implying that non-standard allocation routines should be used for communication buffers. These can be accessed via a LD_PRELOAD override in the manner suggested by libhugetlbfs. We make use of the Intel(R) MPI 2019 library "Technology Preview" and underlying software to enable thread concurrency throughout the communication software stack via multiple PSM2 endpoints per process and use of multiple independent MPI communicators. When using a single MPI process per node, we find that this greatly accelerates delivered bandwidth in many-core Intel(R) Xeon Phi processors.

• [cs.DC]**Cheating by Duplication: Equilibrium Requires Global Knowledge**

*Yehuda Afek, Shaked Rafaeli, Moshe Sulamy*

http://arxiv.org/abs/1711.04728v1

Distributed algorithms with rational agents have always assumed the size of the network is known to the participants before the algorithm starts. Here we address the following question: what global information must agents know a-priori about the network in order for equilibrium to be possible? We start this investigation by considering different distributed computing problems and showing how much each agent must a-priori know about $n$, the number of agents in the network, in order for distributed algorithms to be equilibria. We prove that when $n$ is not a-priori known, equilibrium for both knowledge sharing and coloring is impossible. We provide new algorithms for both problems when $n$ is a-priori known to all agents. We further show that when agents are given a range in which the actual value of $n$ may be, different distributed problems require different such ranges in order for equilibrium to be possible. By providing algorithms that are equilibria on the one hand and impossibility results on the other, we provide the tight range in which equilibrium is possible but beyond which there exists no equilibrium for the following common distributed problems: Leader Election, Knowledge Sharing, Coloring, Partition and Orientation.

• [cs.DC]**Extremely Large Minibatch SGD: Training ResNet-50 on ImageNet in 15 Minutes**

*Takuya Akiba, Shuji Suzuki, Keisuke Fukuda*

http://arxiv.org/abs/1711.04325v1

We demonstrate that training ResNet-50 on ImageNet for 90 epochs can be achieved in 15 minutes with 1024 Tesla P100 GPUs. This was made possible by using a large minibatch size of 32k. To maintain accuracy with this large minibatch size, we employed several techniques such as RMSprop warm-up, batch normalization without moving averages, and a slow-start learning rate schedule. This paper also describes the details of the hardware and software of the system used to achieve the above performance.

• [cs.DC]**Solving the Resource Constrained Project Scheduling Problem Using the Parallel Tabu Search Designed for the CUDA Platform**

*Libor Bukata, Premysl Sucha, Zdenek Hanzalek*

http://arxiv.org/abs/1711.04556v1

In the paper, a parallel Tabu Search algorithm for the Resource Constrained Project Scheduling Problem is proposed. To deal with this NP-hard combinatorial problem many optimizations have been performed. For example, a resource evaluation algorithm is selected by a heuristic and an effective Tabu List was designed. In addition to that, a capacity-indexed resource evaluation algorithm was proposed and the GPU (Graphics Processing Unit) version uses a homogeneous model to reduce the required communication bandwidth. According to the experiments, the GPU version outperforms the optimized parallel CPU version with respect to the computational time and the quality of solutions. In comparison with other existing heuristics, the proposed solution often gives better quality solutions.

• [cs.DS]**Robust Online Speed Scaling With Deadline Uncertainty**

*Goonwanth Reddy, Rahul Vaze*

http://arxiv.org/abs/1711.04978v1

A speed scaling problem is considered, where time is divided into slots, and jobs with payoff $v$ arrive at the beginning of the slot with associated deadlines $d$. Each job takes one slot to be processed, and multiple jobs can be processed by the server in each slot with energy cost $g(k)$ for processing $k$ jobs in one slot. The payoff is accrued by the algorithm only if the job is processed by its deadline. We consider a robust version of this speed scaling problem, where a job on its arrival reveals its payoff $v$, but its deadline is hidden from the online algorithm; the deadline could potentially be chosen adversarially and is known to the optimal offline algorithm. The objective is to derive a robust (to deadlines) and optimal online algorithm that achieves the best competitive ratio. We propose an algorithm (called min-LCR) and show that it is an optimal online algorithm for any convex energy cost function $g(\cdot)$. We do so without actually evaluating the optimal competitive ratio, and give a general proof that works for any convex $g$, which is rather novel. For the popular choice of energy cost function $g(k) = k^\alpha, \alpha \ge 2$, we give concrete bounds on the competitive ratio of the algorithm, which ranges between $2.618$ and $3$ depending on the value of $\alpha$. For the same problem with deadlines revealed to the online algorithm, the best known online algorithm has a competitive ratio of $2$, with a lower bound of $\sqrt{2}$. Thus, importantly, lack of deadline knowledge does not make the problem degenerate, and the effect of deadline information on the optimal competitive ratio is limited.
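
The objective described above (payoff collected minus energy spent) can be evaluated for a given schedule with a short sketch. This only scores a schedule; it is not the min-LCR algorithm, whose rules the abstract does not give. The data layout, the function name `net_profit`, and the convention that a job counts if its slot is no later than its deadline are assumptions made for illustration.

```python
def net_profit(jobs, schedule, alpha=2.0):
    """Payoff of jobs finished by their deadline minus energy cost.

    jobs:     dict job_id -> (arrival_slot, payoff, deadline_slot)
    schedule: dict slot -> list of job_ids processed in that slot
    Energy cost per slot is g(k) = k ** alpha for k jobs (convex for alpha >= 1).
    """
    total = 0.0
    done = set()
    for slot, job_ids in schedule.items():
        total -= len(job_ids) ** alpha  # pay g(k) for this slot
        for job_id in job_ids:
            arrival, payoff, deadline = jobs[job_id]
            # A job pays off once, and only inside its [arrival, deadline] window.
            if job_id not in done and arrival <= slot <= deadline:
                total += payoff
                done.add(job_id)
    return total

# Hypothetical instance: three jobs arriving in slot 0.
jobs = {"a": (0, 5.0, 0), "b": (0, 3.0, 1), "c": (0, 4.0, 1)}
batched = net_profit(jobs, {0: ["a", "b", "c"]})     # one slot, cost 3**2
spread = net_profit(jobs, {0: ["a"], 1: ["b", "c"]})  # cost 1**2 + 2**2
```

Because $g$ is convex, spreading jobs over slots is cheaper when deadlines allow it, which is exactly the information the online algorithm lacks in the robust setting.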

• [cs.DS]**Similarity-Aware Spectral Sparsification by Edge Filtering**

*Zhuo Feng*

http://arxiv.org/abs/1711.05135v1

In recent years, spectral graph sparsification techniques that can compute ultra-sparse graph proxies have been extensively studied for accelerating various numerical and graph-related applications. Prior nearly-linear-time spectral sparsification methods first extract a low-stretch spanning tree of the original graph to form the backbone of the sparsifier, and then recover small portions of spectrally-critical off-tree edges to the spanning tree to significantly improve the approximation quality. However, it is not clear how many off-tree edges should be recovered for achieving a desired spectral similarity level within the sparsifier. Motivated by recent graph signal processing techniques, this paper proposes a similarity-aware spectral graph sparsification framework that leverages an efficient off-tree edge filtering scheme to construct spectral sparsifiers with a guaranteed spectral similarity (relative condition number) level. An iterative graph densification framework and a generalized eigenvalue stability checking scheme are introduced to facilitate efficient and effective filtering of off-tree edges even for highly ill-conditioned problems. The proposed method has been validated using various kinds of graphs obtained from public domain sparse matrix collections relevant to VLSI CAD, finite element analysis, as well as social and data networks frequently studied in many machine learning and data mining applications.

• [cs.HC]**Coordination Technology for Active Support Networks: Context, Needfinding, and Design**

*Stanley J. Rosenschein, Todd Davies*

http://arxiv.org/abs/1711.04216v1

Coordination is a key problem for addressing goal-action gaps in many human endeavors. We define interpersonal coordination as a type of communicative action characterized by low interpersonal belief and goal conflict. Such situations are particularly well described as having collectively "intelligent", "common good" solutions, viz., ones that almost everyone would agree constitute social improvements. Coordination is useful across the spectrum of interpersonal communication -- from isolated individuals to organizational teams. Much attention has been paid to coordination in teams and organizations. In this paper we focus on the looser interpersonal structures we call active support networks (ASNs), and on technology that meets their needs. We describe two needfinding investigations focused on social support, which examined (a) four application areas for improving coordination in ASNs: (i) academic coaching, (ii) vocational training, (iii) early learning intervention, and (iv) volunteer coordination; and (b) existing technology relevant to ASNs. We find a thus-far unmet need for personal task management software that allows smooth integration with an individual's active support network. Based on identified needs, we then describe an open architecture for coordination that has been developed into working software. The design includes a set of capabilities we call "social prompting," as well as templates for accomplishing multi-task goals, and an engine that controls coordination in the network. The resulting tool is currently available and in continuing development. We explain its use in ASNs with an example. Follow-up studies are underway in which the technology is being applied in existing support networks.

• [cs.HC]**Social Sensing of Floods in the UK**

*Rudy Arthur, Chris A. Boulton, Humphrey Shotton, Hywel T. P. Williams*

http://arxiv.org/abs/1711.04695v1

"Social sensing" is a form of crowd-sourcing that involves systematic analysis of digital communications to detect real-world events. Here we consider the use of social sensing for observing natural hazards. In particular, we present a case study that uses data from a popular social media platform (Twitter) to detect and locate flood events in the UK. In order to improve data quality we apply a number of filters (timezone, simple text filters and a naive Bayes `relevance' filter) to the data. We then use place names in the user profile and message text to infer the location of the tweets. These two steps remove most of the irrelevant tweets and yield orders of magnitude more located tweets than we have by relying on geo-tagged data. We demonstrate that high resolution social sensing of floods is feasible and we can produce high-quality historical and real-time maps of floods using Twitter.

• [cs.IR]**A Hierarchical Contextual Attention-based GRU Network for Sequential Recommendation**

*Qiang Cui, Shu Wu, Yan Huang, Liang Wang*

http://arxiv.org/abs/1711.05114v1

Sequential recommendation is one of the fundamental tasks for Web applications. Previous methods are mostly based on Markov chains with a strong Markov assumption. Recently, recurrent neural networks (RNNs) have become increasingly popular and have demonstrated their effectiveness in many tasks. The last hidden state is usually applied as the sequence's representation to make recommendations. Benefiting from the natural characteristics of RNNs, the hidden state is, to some degree, a combination of long-term dependency and short-term interest. However, the monotonic temporal dependency of RNNs impairs the user's short-term interest. Consequently, the hidden state is not sufficient to reflect the user's final interest. In this work, to deal with this problem, we propose a Hierarchical Contextual Attention-based GRU (HCA-GRU) network. The first level of HCA-GRU is conducted on the input. We construct a contextual input by using several recent inputs based on the attention mechanism. This can model the complicated correlations among recent items and strengthen the hidden state. The second level is executed on the hidden state. We fuse the current hidden state and a contextual hidden state built by the attention mechanism, which yields a more suitable representation of the user's overall interest. Experiments on two real-world datasets show that HCA-GRU can effectively generate the personalized ranking list and achieve significant improvement.
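
The idea of building a contextual input from several recent inputs via attention can be sketched as a softmax-weighted average. This is a generic illustration, not the HCA-GRU formulation: the abstract does not state the scoring function, so the dot product against a query vector, and the names `softmax` and `contextual_input`, are assumptions.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def contextual_input(recent_inputs, query):
    """Attention-weighted average of recent input vectors.

    Each recent input is scored by its dot product with `query`
    (an assumed scoring rule), and the scores are normalised by softmax.
    """
    scores = [sum(x * q for x, q in zip(vec, query)) for vec in recent_inputs]
    weights = softmax(scores)
    dim = len(query)
    context = [sum(w * vec[i] for w, vec in zip(weights, recent_inputs))
               for i in range(dim)]
    return weights, context

# Hypothetical item embeddings for the three most recent inputs.
recent = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, context = contextual_input(recent, query=[1.0, 1.0])
```

The same pattern applies at the second level, with hidden states in place of inputs.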

• [cs.IR]**A distributed system for SearchOnMath based on the Microsoft BizSpark program**

*Ricardo M. Oliveira, Flavio B. Gonzaga, Valmir C. Barbosa, Geraldo B. Xexéo*

http://arxiv.org/abs/1711.04189v1

Mathematical information retrieval is a relatively new area, so the first search tools capable of retrieving mathematical formulas began to appear only a few years ago. The proposals made public so far mostly implement searches on internal university databases, small sets of scientific papers, or Wikipedia in English. As such, only modest computing power is required. In this context, SearchOnMath has emerged as a pioneering tool in that it indexes several different databases and is compatible with several mathematical representation languages. Given the significantly greater number of formulas it handles, a distributed system becomes necessary to support it. The present study is based on the Microsoft BizSpark program and has aimed, for 38 different distributed-system scenarios, to pinpoint the one affording the best response times when searching the SearchOnMath databases for a collection of 120 formulas.

• [cs.IR]**Faithful to the Original: Fact Aware Neural Abstractive Summarization**

*Ziqiang Cao, Furu Wei, Wenjie Li, Sujian Li*

http://arxiv.org/abs/1711.04434v1

Unlike extractive summarization, abstractive summarization has to fuse different parts of the source text, which tends to create fake facts. Our preliminary study reveals that nearly 30% of the outputs from a state-of-the-art neural summarization system suffer from this problem. While previous abstractive summarization approaches usually focus on improving informativeness, we argue that faithfulness is also a vital prerequisite for a practical abstractive summarization system. To avoid generating fake facts in a summary, we leverage open information extraction and dependency parse technologies to extract actual fact descriptions from the source text. A dual-attention sequence-to-sequence framework is then proposed to force the generation to be conditioned on both the source text and the extracted fact descriptions. Experiments on the Gigaword benchmark dataset demonstrate that our model can reduce fake summaries by 80%. Notably, the fact descriptions also bring significant improvement in informativeness since they often condense the meaning of the source text.

• [cs.IR]**Latent Dirichlet Allocation (LDA) and Topic modeling: models, applications, a survey**

*Hamed Jelodar, Yongli Wang, Chi Yuan, Xia Feng*

http://arxiv.org/abs/1711.04305v1

Topic modeling is one of the most powerful techniques in text mining for data mining, latent data discovery, and finding relationships among data and text documents. Researchers have published many articles on topic modeling and applied it in various fields such as software engineering, political science, and medical and linguistic science. Of the various methods for topic modeling, Latent Dirichlet allocation (LDA) is one of the most popular. Researchers have proposed various models based on LDA. Building on this previous work, this paper aims to be a useful introduction to LDA approaches in topic modeling. We investigated scholarly articles (from 2003 to 2016) closely related to LDA-based topic modeling to discover the research development, current trends and intellectual structure of topic modeling. We also summarize challenges and introduce well-known tools and datasets for topic modeling based on LDA.

• [cs.IR]**Neural Attentive Session-based Recommendation**

*Jing Li, Pengjie Ren, Zhumin Chen, Zhaochun Ren, Jun Ma*

http://arxiv.org/abs/1711.04725v1

In e-commerce scenarios where user profiles are invisible, session-based recommendation is proposed to generate recommendation results from short sessions. Previous work only considers the user's sequential behavior in the current session, whereas the user's main purpose in the current session is not emphasized. In this paper, we propose a novel neural network framework, i.e., Neural Attentive Recommendation Machine (NARM), to tackle this problem. Specifically, we explore a hybrid encoder with an attention mechanism to model the user's sequential behavior and capture the user's main purpose in the current session, which are later combined into a unified session representation. We then compute the recommendation scores for each candidate item with a bi-linear matching scheme based on this unified session representation. We train NARM by jointly learning the item and session representations as well as their matching. We carried out extensive experiments on two benchmark datasets. Our experimental results show that NARM outperforms state-of-the-art baselines on both datasets. Furthermore, we also find that NARM achieves a significant improvement on long sessions, which demonstrates its advantages in modeling the user's sequential behavior and main purpose simultaneously.

• [cs.IR]**Recommender Systems with Random Walks: A Survey**

*Laknath Semage*

http://arxiv.org/abs/1711.04101v1

Recommender engines have become an integral component in today's e-commerce systems. From recommending books in Amazon to finding friends in social networks such as Facebook, they have become omnipresent. Generally, recommender systems can be classified into two main categories: content based and collaborative filtering based models. Both these models build relationships between users and items to provide recommendations. Content based systems achieve this task by utilizing features extracted from the context available, whereas collaborative systems use shared interests between user-item subsets. There is another relatively unexplored approach for providing recommendations that utilizes a stochastic process named random walks. This study is a survey exploring use cases of random walks in recommender systems and an attempt at classifying them.
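The random-walk idea mentioned in the abstract can be illustrated with a small sketch. The code below is not taken from the survey: it is a minimal random walk with restart on a toy user-item graph, where the graph, the node names, the `restart` probability, and the function name `random_walk_with_restart` are all assumptions made for illustration.

```python
def random_walk_with_restart(graph, start, restart=0.15, iters=50):
    """Approximate the visiting probabilities of a walk that keeps restarting at `start`."""
    prob = {node: 0.0 for node in graph}
    prob[start] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in graph}
        for node, p in prob.items():
            neighbors = graph[node]
            if not neighbors:
                # Dangling node: send all of its mass back to the start node.
                nxt[start] += p
                continue
            # With probability `restart` jump back to the start node ...
            nxt[start] += restart * p
            # ... otherwise move to a uniformly chosen neighbor.
            share = (1.0 - restart) * p / len(neighbors)
            for m in neighbors:
                nxt[m] += share
        prob = nxt
    return prob


# Hypothetical toy data: an undirected bipartite graph of users and books.
graph = {
    "alice": ["book_a", "book_b"],
    "bob": ["book_b", "book_c"],
    "book_a": ["alice"],
    "book_b": ["alice", "bob"],
    "book_c": ["bob"],
}

scores = random_walk_with_restart(graph, "alice")
seen = set(graph["alice"])
candidates = {n: s for n, s in scores.items() if n.startswith("book_") and n not in seen}
recommended = max(candidates, key=candidates.get)
print(recommended)
```

Items that the user has not interacted with are ranked by how often the restarting walk reaches them, so items favored by users with overlapping tastes tend to score higher.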

• [cs.IR]**Targeted Advertising Based on Browsing History**

*Yong Zhang, Hongming Zhou, Nganmeng Tan, Saeed Bagheri, Meng Joo Er*

http://arxiv.org/abs/1711.04498v1

Audience interest, demography, purchase behavior and other possible classifications are extremely important factors to be carefully studied in a targeting campaign. This information can help advertisers and publishers deliver advertisements to the right audience group. However, it is not easy to collect such information, especially for the online audience with whom we have limited interaction and minimum deterministic knowledge. In this paper, we propose a predictive framework that can estimate online audience demographic attributes based on their browsing histories. Under the proposed framework, first, we retrieve the content of the websites visited by audience, and represent the content as website feature vectors; second, we aggregate the vectors of websites that audience have visited and arrive at feature vectors representing the users; finally, the support vector machine is exploited to predict the audience demographic attributes. The key to achieving good prediction performance is preparing representative features of the audience. Word Embedding, a widely used technique in natural language processing tasks, together with term frequency-inverse document frequency weighting scheme is used in the proposed method. This new representation approach is unsupervised and very easy to implement. The experimental results demonstrate that the new audience feature representation method is more powerful than existing baseline methods, leading to a great improvement in prediction accuracy.
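The first two steps of the framework (website vectors, then user vectors) can be sketched as follows. This is only an illustration under stated assumptions: the two-dimensional `embeddings`, the toy `sites`, the browsing `history`, and the helper names are hypothetical, the exact TF-IDF variant used by the authors is not given in the abstract, and the final SVM step is omitted.

```python
import math
from collections import Counter


def tfidf_weights(docs):
    """Return one {word: tf-idf} dict per tokenized document."""
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({w: (c / len(doc)) * math.log(n / df[w]) for w, c in tf.items()})
    return weights


def weighted_average(vectors, weights):
    """Weighted mean of equal-length vectors (zero vector if the weights sum to 0)."""
    dim = len(vectors[0])
    total = sum(weights)
    if total == 0:
        return [0.0] * dim
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total for i in range(dim)]


# Hypothetical pre-trained word embeddings (2-dimensional for readability).
embeddings = {
    "goal": [1.0, 0.0], "match": [0.8, 0.2],
    "recipe": [0.0, 1.0], "oven": [0.2, 0.8],
    "news": [0.5, 0.5],
}
# Hypothetical website contents, already tokenized.
sites = {
    "sports": ["goal", "match", "news"],
    "cooking": ["recipe", "oven", "news"],
}

# Step 1: each website becomes a TF-IDF-weighted average of its word embeddings.
site_vectors = {}
for name, w in zip(sites, tfidf_weights(list(sites.values()))):
    words = list(w)
    site_vectors[name] = weighted_average([embeddings[x] for x in words],
                                          [w[x] for x in words])

# Step 2: a user becomes the average of the vectors of the websites they visited.
history = ["sports", "sports", "cooking"]
user_vector = weighted_average([site_vectors[s] for s in history], [1.0] * len(history))
print(user_vector)
```

The resulting `user_vector` is the kind of feature that would then be passed to a classifier; the word "news" appears on both sites, so its inverse document frequency is zero and it does not influence either website vector.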

• [cs.IT]**A General Framework for Covariance Matrix Optimization in MIMO Systems**

*Chengwen Xing, Yindi Jing, Shuai Wang, Jiaheng Wang, Jianping An*

http://arxiv.org/abs/1711.04449v1

For multi-input multi-output (MIMO) systems, many transceiver design problems involve the optimization of the covariance matrices of the transmitted signals. Karush-Kuhn-Tucker (KKT) conditions based derivations are the most popular method, and many derivations and results have been reported for different scenarios of MIMO systems. We propose a unified framework for formulating the KKT conditions for general MIMO systems. Based on this framework, the optimal water-filling structure of the transmission covariance matrices is derived rigorously and is applicable to a wide range of MIMO systems. Our results show that for MIMO systems with various power constraint formulations and objective functions, both the derivation logics and water-filling structures for the optimal covariance matrix solutions are fundamentally the same. Thus, our unified framework and solution reveal the underlying relationships among the different water-filling structures of the covariance matrices. Furthermore, our results provide new solutions, previously unknown, to the covariance matrix optimization of many complicated MIMO systems with multiple users and imperfect channel state information (CSI).
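For readers unfamiliar with the term, the textbook scalar form of water-filling can be sketched as below. This is not the paper's matrix framework: it assumes the classic problem of maximizing $\sum_i \log(1+g_i p_i)$ over parallel channels with gains $g_i$ under a total power budget, whose KKT conditions give $p_i = \max(0, \mu - 1/g_i)$. The function name and the example gains are hypothetical.

```python
def water_filling(gains, total_power):
    """Classic water-filling: p_i = max(0, mu - 1/g_i) with sum(p_i) == total_power."""
    inv = sorted(1.0 / g for g in gains)  # channel "floor heights", best channel first
    mu = 0.0
    # Try to activate the k best channels, starting with all of them.
    for k in range(len(inv), 0, -1):
        mu = (total_power + sum(inv[:k])) / k  # water level if exactly k channels are used
        if mu > inv[k - 1]:                    # the weakest active channel gets positive power
            break
    powers = [max(0.0, mu - 1.0 / g) for g in gains]
    return powers, mu


powers, level = water_filling([2.0, 1.0, 0.1], total_power=1.0)
print(powers, level)
```

With these example gains the weakest channel sits above the water level and receives no power, which is the qualitative behavior that water-filling structures share.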

• [cs.IT]**A Joint Encryption-Encoding Scheme Using QC-LDPC Codes Based on Finite Geometry**

*Hossein Khayami, Taraneh Eghlidos, Mohammad Reza Aref*

http://arxiv.org/abs/1711.04611v1

Joint encryption-encoding schemes have been introduced to meet both reliability and security requirements in a single step. Using Low Density Parity Check (LDPC) codes in joint encryption-encoding schemes, as an alternative to classical linear codes, shortens the key size and improves error correction capability. In this article, we present a joint encryption-encoding scheme using Quasi Cyclic-Low Density Parity Check (QC-LDPC) codes based on finite geometry. We observe that our proposed scheme not only performs better in key size and transmission rate, but also remains secure against all known cryptanalyses of code-based secret key cryptosystems. We subsequently show that our scheme benefits from low computational complexity. In our proposed joint encryption-encoding scheme, by taking advantage of QC-LDPC codes based on finite geometries, the key size decreases to 1/5 of that of the best similar system so far. In addition, a wide range of desirable transmission rates is achievable with our proposed scheme. The wide variety of codes proposed here makes our cryptosystem applicable to a number of different communication and cryptographic standards.

• [cs.IT]**Capacity of UAV-Enabled Multicast Channel: Joint Trajectory Design and Power Allocation**

*Yundi Wu, Jie Xu, Ling Qiu, Rui Zhang*

http://arxiv.org/abs/1711.04387v1

This paper studies an unmanned aerial vehicle (UAV)-enabled multicast channel, in which a UAV serves as a mobile transmitter to deliver common information to a set of $K$ ground users. We aim to characterize the capacity of this channel over a finite UAV communication period, subject to its maximum speed constraint and an average transmit power constraint. To achieve the capacity, the UAV should use a sufficiently long code that spans its whole communication period. Accordingly, the multicast channel capacity is achieved via maximizing the minimum achievable time-averaged rates of the $K$ users, by jointly optimizing the UAV's trajectory and transmit power allocation over time. However, this problem is non-convex and difficult to solve optimally. To tackle this problem, we first consider a relaxed problem by ignoring the maximum UAV speed constraint, and obtain its globally optimal solution via the Lagrange dual method. The optimal solution reveals that the UAV should hover above a finite number of ground locations, with the optimal hovering duration and transmit power at each location. Next, based on such a multi-location-hovering solution, we present a successive hover-and-fly trajectory design and obtain the corresponding optimal transmit power allocation for the case with the maximum UAV speed constraint. Numerical results show that our proposed joint UAV trajectory and transmit power optimization significantly improves the achievable rate of the UAV-enabled multicast channel, and also greatly outperforms the conventional multicast channel with a fixed-location transmitter.

• [cs.IT]**Effect of enhanced dissipation by shear flows on transient relaxation and probability density function in two dimensions**

*Eun-jin Kim, Ismail Movahedi*

http://arxiv.org/abs/1711.04898v1

We report a non-perturbative study of the effects of shear flows on turbulence reduction in a decaying turbulence in two dimensions. By considering different initial power spectra and shear flows (zonal flows, combined zonal flows and streamers), we demonstrate how shear flows rapidly generate small scales, leading to a fast damping of turbulence amplitude. In particular, a double exponential decrease in turbulence amplitude is shown to occur due to an exponential increase in wavenumber. The scaling of the effective dissipation time scale $\tau_{e}$, previously taken to be a hybrid time scale $\tau_{e} \propto \tau_{\Omega}^{2/3} \tau_{\eta}$, is shown to depend on the type of shear flow as well as the initial power spectrum. Here, $\tau_{\Omega}$ and $\tau_{\eta}$ are shearing and molecular diffusion times, respectively. Furthermore, we present time-dependent Probability Density Functions (PDFs) and discuss the effect of enhanced dissipation on PDFs and a dynamical time scale $\tau(t)$, which represents the time scale over which a system passes through statistically different states.

• [cs.IT]**Eigendecomposition-Based Partial FFT Demodulation for Differential OFDM in Underwater Acoustic Communications**

*Jing Han, Lingling Zhang, Qunfei Zhang, Geert Leus*

http://arxiv.org/abs/1711.04892v1

Differential orthogonal frequency division multiplexing (OFDM) is practically attractive for underwater acoustic communications since it has the potential to obviate channel estimation. However, similar to coherent OFDM, it may suffer from severe inter-carrier interference over time-varying channels. To alleviate the induced performance degradation, we adopt the newly-emerging partial FFT demodulation technique in this paper and propose an eigendecomposition-based algorithm to compute the combining weights. Compared to existing adaptive methods, the new algorithm can avoid error propagation and eliminate the need for parameter tuning. Moreover, it guarantees global optimality under the narrowband Doppler assumption, with the optimal weight vector of partial FFT demodulation achieved by the eigenvector associated with the smallest eigenvalue of the pilot detection error matrix. Finally, the algorithm can also be extended straightforwardly to perform subband-wise computation to counteract wideband Doppler effects.

• [cs.IT]**Energy-Delay-Distortion Problem**

*Rahul Vaze, Shreyas Chaudhari, Akshat Choube, Nitin Aggarwal*

http://arxiv.org/abs/1711.05032v1

An energy-limited source trying to transmit multiple packets, possibly of different sizes, to a destination is considered. With limited energy, the source may not be able to transmit all bits of all packets. In addition, there is a delay cost associated with each packet. Thus, the source has to choose how many bits to transmit for each packet, and the order in which to transmit these bits, to minimize the cost of distortion (introduced by transmitting fewer bits) and queueing plus transmission delay across all packets. Assuming an exponential metric for distortion loss and a linear delay cost, we show that the optimization problem is jointly convex. Hence, the problem can be solved exactly using convex solvers. However, because of the complicated expression derived from the KKT conditions, no closed-form solution can be found even with the simplest cost function choice made in the paper, and the optimal order in which packets should be transmitted needs to be found via brute force. To facilitate a more structured solution, a discretized version of the problem is also considered, where time and energy are divided into discrete amounts. In any time slot (of fixed length), bits belonging to any one packet can be transmitted, while any discrete number of energy quanta can be used in any slot corresponding to any one packet, such that the total energy constraint is satisfied. The discretized problem is a special case of a multi-partitioning problem, where each packet's utility is super-modular, and the proposed greedy solution is shown to incur a cost that is at most $2$ times the optimal cost.

• [cs.IT]**Information Design for Strategic Coordination of Autonomous Devices with Non-Aligned Utilities**

*Maël Le Treust, Tristan Tomala*

http://arxiv.org/abs/1711.04492v1

In this paper, we investigate the coordination of autonomous devices with non-aligned utility functions. Both encoder and decoder are considered as players that choose the encoding and the decoding in order to maximize their long-run utility functions. The topology of the point-to-point network under investigation suggests that the decoder implements a strategy knowing in advance the strategy of the encoder. We characterize the encoding and decoding functions that form an equilibrium by using empirical coordination. The equilibrium solution is related to an auxiliary game in which both players choose some conditional distributions in order to maximize their expected utilities. This problem is closely related to the literature on "Information Design" in Game Theory. We also characterize the set of posterior distributions that are compatible with a rate-limited channel between the encoder and the decoder. Finally, we provide an example of non-aligned utility functions corresponding to parallel fading multiple access channels.

• [cs.IT]**Persuasion with limited communication resources**

*Maël Le Treust, Tristan Tomala*

http://arxiv.org/abs/1711.04474v1

We consider a Bayesian persuasion problem where the persuader communicates with the decision maker through an imperfect communication channel. The channel has a fixed and limited number of messages and is subject to exogenous noise. Imperfect communication entails a loss of payoff for the persuader. We show that if the persuasion problem consists of a large number of independent copies of the same base problem, then the persuader achieves a better payoff by linking the problems together. We measure the payoff gain in terms of the capacity of the communication channel.

• [cs.IT]**Preserving Reliability to Heterogeneous Ultra-Dense Distributed Networks in Unlicensed Spectrum**

*Qimei Cui, Yu Gu, Wei Ni, Xuefei Zhang, Xiaofeng Tao, Ping Zhang, Ren Ping Liu*

http://arxiv.org/abs/1711.04614v1

This article investigates the prominent dilemma between capacity and reliability in heterogeneous ultra-dense distributed networks, and advocates a new measure of effective capacity to quantify the maximum sustainable data rate of a link while preserving the quality-of-service (QoS) of the link in such networks. Recent breakthroughs are brought forth in developing the theory of the effective capacity in heterogeneous ultra-dense distributed networks. Potential applications of the effective capacity are demonstrated on the admission control, power control and resource allocation of such networks, with substantial gains revealed over existing technologies. This new measure is of particular interest to ultra-dense deployment of the emerging fifth-generation (5G) wireless networks in the unlicensed spectrum, leveraging the capacity gain brought by the use of the unlicensed band and the stringent reliability sustained by 5G in future heterogeneous network environments.

• [cs.IT]**Private Function Retrieval**

*Mahtab Mirmohseni, Mohammad Ali Maddah-Ali*

http://arxiv.org/abs/1711.04677v1

The widespread use of cloud computing services raises the question of how one can delegate processing tasks to untrusted distributed parties without breaching the privacy of one's data and algorithms. Motivated by the algorithm privacy concerns in a distributed computing system, in this paper, we introduce the Private Function Retrieval (PFR) problem, where a user wishes to efficiently retrieve a linear function of $K$ messages from $N$ non-communicating replicated servers while keeping the function hidden from each individual server. The goal is to find a scheme with minimum communication cost. To characterize the fundamental limits of the communication cost, we define the capacity of the PFR problem as the size of the message that can be privately retrieved (which is the size of one file) normalized by the number of downloaded information bits. We first show that for the PFR problem with $K$ messages, $N=2$ servers and a linear function with binary coefficients, the capacity is $C=\frac{1}{2}\Big(1-\frac{1}{2^{K}}\Big)^{-1}$. Interestingly, this is the capacity of retrieving one of $K$ messages from $N=2$ servers while keeping the index of the requested message hidden from each individual server, the problem known as private information retrieval (PIR). Then, we extend the proposed achievable scheme to the case of an arbitrary number of servers and an arbitrary $GF(q)$ field for the coefficients and obtain $R=\Big(1-\frac{1}{N}\Big)\Big(1+\frac{\frac{1}{N-1}}{\big(\frac{q^{K}-1}{q-1}\big)^{N-1}}\Big)$.
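As a quick numerical check of the two rate expressions, the sketch below evaluates them with exact fractions. It assumes they read $C=\frac{1}{2}\big(1-\frac{1}{2^{K}}\big)^{-1}$ and $R=\big(1-\frac{1}{N}\big)\big(1+\frac{1/(N-1)}{((q^{K}-1)/(q-1))^{N-1}}\big)$; the function names are hypothetical. Under that reading, $R$ reduces to $C$ when $N=2$ and $q=2$, and $C$ equals the geometric-sum expression $\big(1+\frac{1}{2}+\dots+\frac{1}{2^{K-1}}\big)^{-1}$.

```python
from fractions import Fraction


def pfr_capacity_two_servers(K):
    """C = (1/2) * (1 - 1/2**K)**(-1), for N = 2 servers and binary coefficients."""
    return Fraction(1, 2) / (1 - Fraction(1, 2 ** K))


def pfr_rate(K, N, q):
    """R = (1 - 1/N) * (1 + (1/(N-1)) / ((q**K - 1)/(q - 1))**(N-1))."""
    m = Fraction(q ** K - 1, q - 1)
    return (1 - Fraction(1, N)) * (1 + Fraction(1, N - 1) / m ** (N - 1))


def geometric_form(K, N=2):
    """(1 + 1/N + ... + 1/N**(K-1))**(-1), an equivalent way to write C when N = 2."""
    return 1 / sum(Fraction(1, N ** k) for k in range(K))


for K in (2, 3, 4):
    print(K, pfr_capacity_two_servers(K), pfr_rate(K, N=2, q=2), geometric_form(K))
```

For example, $K=2$ gives $C = 2/3$ and $K=3$ gives $C = 4/7$, and the value approaches $1/2$ as $K$ grows.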

• [cs.IT]**Restoration by Compression**

*Yehuda Dar, Michael Elad, Alfred M. Bruckstein*

http://arxiv.org/abs/1711.05147v1

In this paper we study the topic of signal restoration using complexity regularization, quantifying the compression bit-cost of the signal estimate. While complexity-regularized restoration is an established concept, solid practical methods were suggested only for the Gaussian denoising task, leaving more complicated restoration problems without a generally constructive approach. Here we present practical methods for complexity-regularized restoration of signals, accommodating deteriorations caused by a known linear degradation operator of an arbitrary form. Our iterative procedure, obtained using the Half Quadratic Splitting approach, addresses the restoration task as a sequence of simpler problems involving $ \ell_2$-regularized estimations and rate-distortion optimizations (considering the squared-error criterion). Further, we replace the rate-distortion optimizations with an arbitrary standardized compression technique and thereby restore the signal by leveraging underlying models designed for compression. Additionally, we propose a shift-invariant complexity regularizer, measuring the bit-cost of all the shifted forms of the estimate, extending our method to use averaging of decompressed outputs gathered from compression of shifted signals. On the theoretical side, we present an analysis of complexity-regularized restoration of a cyclo-stationary Gaussian signal from deterioration by a linear shift-invariant operator and an additive white Gaussian noise. The theory shows that optimal complexity-regularized restoration relies on an elementary restoration filter and compression spreading reconstruction quality unevenly based on the energy distribution of the degradation filter. Nicely, these ideas are realized also in the proposed practical methods. Finally, we present experiments showing good results for image deblurring and inpainting using the HEVC compression standard.

• [cs.IT]**Robust Kullback-Leibler Divergence and Universal Hypothesis Testing for Continuous Distributions**

*Pengfei Yang, Biao Chen*

http://arxiv.org/abs/1711.04238v1

Universal hypothesis testing refers to the problem of deciding whether samples come from a nominal distribution or an unknown distribution that is different from the nominal distribution. Hoeffding's test, whose test statistic is equivalent to the empirical Kullback-Leibler divergence (KLD), is known to be asymptotically optimal for distributions defined on finite alphabets. With continuous observations, however, the discontinuity of the KLD in the distribution functions results in significant complications for universal hypothesis testing. This paper introduces a robust version of the classical KLD, defined as the KLD from a distribution to the Lévy ball of a known distribution. This robust KLD is shown to be continuous in the underlying distribution function with respect to weak convergence. The continuity property enables the development of a universal hypothesis test for continuous observations that is shown to be asymptotically optimal for continuous distributions in the same sense as Hoeffding's test is for discrete distributions.

• [cs.IT]**Spatial Channel Covariance Estimation for the Hybrid MIMO Architecture: A Compressive Sensing Based Approach**

*Sungwoo Park, Robert W. Heath Jr*

http://arxiv.org/abs/1711.04207v1

Spatial channel covariance information can replace full knowledge of the entire channel matrix for designing analog precoders in hybrid multiple-input-multiple-output (MIMO) architecture. Spatial channel covariance estimation, however, is challenging for the hybrid MIMO architecture because the estimator operating at baseband can only obtain a lower dimensional pre-combined signal through fewer radio frequency (RF) chains than antennas. In this paper, we propose two approaches for covariance estimation based on compressive sensing techniques. One is to apply a time-varying sensing matrix, and the other is to exploit the prior knowledge that the covariance matrix is Hermitian. We present the rationale of the two ideas and validate the superiority of the proposed methods by theoretical analysis and numerical simulations. We conclude the paper by extending the proposed algorithms from narrowband massive MIMO systems with a single receive antenna to wideband systems with multiple receive antennas.

• [cs.IT]**Towards a Converse for the Nearest Lattice Point Problem**

*Vinay A. Vaishampayan*

http://arxiv.org/abs/1711.04714v1

We consider the problem of distributed computation of the nearest lattice point for a two dimensional lattice. An interactive model of communication is considered. The problem is to bound the communication complexity of the search for a nearest lattice point. Upper bounds have been developed in two recent works. Here we prove the optimality of a particular step in the derivation of the upper bound.

• [cs.IT]**Truncated Polynomial Expansion Downlink Precoders and Uplink Detectors for Massive MIMO**

*Andreas Benzin, Giuseppe Caire, Yonatan Shadmi, Antonia Tulino*

http://arxiv.org/abs/1711.04141v1

In TDD reciprocity-based massive MIMO it is essential to be able to compute the downlink precoding matrix over all OFDM resource blocks within a small fraction of the uplink-downlink slot duration. Early implementations of massive MIMO are limited to the simple Conjugate Beamforming (ConjBF) precoding method because of this computation latency limitation. However, it has been widely demonstrated by theoretical analysis and system simulation that Regularized Zero-Forcing (RZF) precoding is generally much more effective than ConjBF for a large but practical number of transmit antennas. In order to recover a significant fraction of the gap between ConjBF and RZF while still meeting the very strict computation latency constraints, truncated polynomial expansion (TPE) methods have been proposed. In this paper we present a novel TPE method that outperforms all previously proposed methods in the general non-symmetric case of users with arbitrary antenna correlation. In addition, the proposed method is significantly simpler and more flexible than previously proposed methods based on deterministic equivalents and free probability in large random matrix theory. We consider power allocation with our TPE approach, and show that classical system optimization problems such as min-sum power and max-min rate can be easily solved. Furthermore, we provide a detailed computation latency analysis specifically targeted to a highly parallel FPGA hardware architecture.

• [cs.LG]**"Found in Translation": Predicting Outcome of Complex Organic Chemistry Reactions using Neural Sequence-to-Sequence Models**

*Philippe Schwaller, Theophile Gaudin, David Lanyi, Costas Bekas, Teodoro Laino*

http://arxiv.org/abs/1711.04810v1

There is an intuitive analogy between an organic chemist's understanding of a compound and a language speaker's understanding of a word. Consequently, it is possible to introduce the basic concepts of linguistic analysis to the world of organic chemistry and analyze their potential impact. In this work, we cast the reaction prediction task as a translation problem by introducing a template-free sequence-to-sequence model, trained end-to-end and fully data-driven. We propose a novel way of tokenization, which is arbitrarily extensible with reaction information. With this approach, we demonstrate results superior to the state-of-the-art solution by a significant margin on the top-1 accuracy. Specifically, our approach achieves an accuracy of 80.1% without relying on auxiliary knowledge such as reaction templates. Also, 66.4% accuracy is reached on a larger and noisier dataset.

• [cs.LG]**A Sparse Graph-Structured Lasso Mixed Model for Genetic Association with Confounding Correction**

*Wenting Ye, Xiang Liu, Haohan Wang, Eric P. Xing*

http://arxiv.org/abs/1711.04162v1

While the linear mixed model (LMM) has shown competitive performance in correcting spurious associations raised by population stratification, family structures, and cryptic relatedness, more challenges remain to be addressed regarding the complex structure of genotypic and phenotypic data. For example, geneticists have discovered that some clusters of phenotypes are more co-expressed than others. Hence, a joint analysis that can utilize such relatedness information in a heterogeneous data set is crucial for genetic modeling. We propose the sparse graph-structured linear mixed model (sGLMM), which can incorporate the relatedness information from traits in a dataset with confounding correction. Our method is capable of uncovering the genetic associations of a large number of phenotypes together while considering the relatedness of these phenotypes. Through extensive simulation experiments, we show that the proposed model outperforms other existing approaches and can model correlation from both population structure and shared signals. Further, we validate the effectiveness of sGLMM on real-world genomic datasets from two different species, one plant and one human. In Arabidopsis thaliana data, sGLMM behaves better than all other baseline models for 63.4% of traits. We also discuss the potential causal genetic variation of human Alzheimer's disease discovered by our model and justify some of the most important genetic loci.

• [cs.LG]**A learning problem that is independent of the set theory ZFC axioms**

*Shai Ben-David, Pavel Hrubes, Shay Moran, Amir Shpilka, Amir Yehudayoff*

http://arxiv.org/abs/1711.05195v1

We consider the following statistical estimation problem: given a family F of real valued functions over some domain X and an i.i.d. sample drawn from an unknown distribution P over X, find h in F such that the expectation of h w.r.t. P is probably approximately equal to the supremum over expectations on members of F. This Expectation Maximization (EMX) problem captures many well studied learning problems; in fact, it is equivalent to Vapnik's general setting of learning. Surprisingly, we show that the EMX learnability, as well as the learning rates of some basic class F, depend on the cardinality of the continuum and are therefore independent of the set theory ZFC axioms (that are widely accepted as a formalization of the notion of a mathematical proof). We focus on the case where the functions in F are Boolean, which generalizes classification problems. We study the interaction between the statistical sample complexity of F and its combinatorial structure. We introduce a new version of sample compression schemes and show that it characterizes EMX learnability for a wide family of classes. However, we show that for the class of finite subsets of the real line, the existence of such compression schemes is independent of set theory. We conclude that the learnability of that class with respect to the family of probability distributions of countable support is independent of the set theory ZFC axioms. We also explore the existence of a "VC-dimension-like" parameter that captures learnability in this setting. Our results imply that there exists no "finitary" combinatorial parameter that characterizes EMX learnability in a way similar to the VC-dimension based characterization of binary valued classification problems.

• [cs.LG]**A machine learning approach for efficient uncertainty quantification using multiscale methods**

*Shing Chan, Ahmed H. Elsheikh*

http://arxiv.org/abs/1711.04315v1

Several multiscale methods account for sub-grid scale features using coarse scale basis functions. For example, in the Multiscale Finite Volume method the coarse scale basis functions are obtained by solving a set of local problems over dual-grid cells. We introduce a data-driven approach for the estimation of these coarse scale basis functions. Specifically, we employ a neural network predictor fitted using a set of solution samples from which it learns to generate subsequent basis functions at a lower computational cost than solving the local problems. The computational advantage of this approach is realized for uncertainty quantification tasks where a large number of realizations has to be evaluated. We attribute the ability to learn these basis functions to the modularity of the local problems and the redundancy of the permeability patches between samples. The proposed method is evaluated on elliptic problems yielding very promising results.

• [cs.LG]**ADaPTION: Toolbox and Benchmark for Training Convolutional Neural Networks with Reduced Numerical Precision Weights and Activation**

*Moritz B. Milde, Daniel Neil, Alessandro Aimar, Tobi Delbruck, Giacomo Indiveri*

http://arxiv.org/abs/1711.04713v1

Deep Neural Networks (DNNs) and Convolutional Neural Networks (CNNs) are useful for many practical tasks in machine learning. Synaptic weights, as well as neuron activation functions within the deep network, are typically stored in high-precision formats, e.g. 32-bit floating point. However, since storage capacity is limited and each memory access consumes power, storage capacity and memory access are two crucial factors in these networks. Here we present a method, together with the ADaPTION toolbox, that extends the popular deep learning library Caffe to support training of deep CNNs with reduced numerical precision of weights and activations using fixed-point notation. ADaPTION includes tools to measure the dynamic range of weights and activations. Using the ADaPTION tools, we quantized several CNNs including VGG16 down to 16-bit weights and activations with only a 0.8% drop in Top-1 accuracy. The quantization, especially of the activations, increases sparsity by up to 50%, particularly in early and intermediate layers, which we exploit to skip multiplications with zero, thus performing faster and computationally cheaper inference.
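To make the fixed-point idea concrete, here is a minimal sketch of signed fixed-point rounding with saturation, plus a simple way to pick the number of fractional bits from a measured dynamic range. It is not ADaPTION's API or Caffe code: the function names, the bit split, and the example values are assumptions chosen for illustration.

```python
import math


def choose_frac_bits(values, total_bits=16):
    """Pick the fractional bits so the largest magnitude fits (one bit is the sign)."""
    max_abs = max(abs(v) for v in values)
    int_bits = max(0, math.floor(math.log2(max_abs)) + 1) if max_abs > 0 else 0
    return total_bits - 1 - int_bits


def quantize_fixed_point(x, total_bits=16, frac_bits=8):
    """Round x to the nearest representable signed fixed-point value, saturating at the ends."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    q = max(lo, min(hi, round(x * scale)))  # integer code stored in hardware
    return q / scale                        # value that code represents


weights = [0.1, -3.2, 0.001]  # hypothetical layer weights
frac_bits = choose_frac_bits(weights)
quantized = [quantize_fixed_point(w, 16, frac_bits) for w in weights]
print(frac_bits, quantized)

# With fewer fractional bits, small values collapse to exactly zero.
print(quantize_fixed_point(0.001, 16, 8))
```

The last line hints at why quantization can raise sparsity: values below half of the smallest representable step round to zero, and multiplications with zero can then be skipped.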

• [cs.LG]**Adversarial Symmetric Variational Autoencoder**

*Yunchen Pu, Weiyao Wang, Ricardo Henao, Liqun Chen, Zhe Gan, Chunyuan Li, Lawrence Carin*

http://arxiv.org/abs/1711.04915v1

A new form of variational autoencoder (VAE) is developed, in which the joint distribution of data and codes is considered in two (symmetric) forms: ($i$) from observed data fed through the encoder to yield codes, and ($ii$) from latent codes drawn from a simple prior and propagated through the decoder to manifest data. Lower bounds are learned for the marginal log-likelihood fits of observed data and latent codes. When learning with the variational bound, one seeks to minimize the symmetric Kullback-Leibler divergence of joint density functions from ($i$) and ($ii$), while simultaneously seeking to maximize the two marginal log-likelihoods. To facilitate learning, a new form of adversarial training is developed. An extensive set of experiments is performed, in which we demonstrate state-of-the-art data reconstruction and generation on several image benchmark datasets.

• [cs.LG]**Attention-based Information Fusion using Multi-Encoder-Decoder Recurrent Neural Networks**

*Stephan Baier, Sigurd Spieckermann, Volker Tresp*

http://arxiv.org/abs/1711.04679v1

With the rising number of interconnected devices and sensors, modeling distributed sensor networks is of increasing interest. Recurrent neural networks (RNNs) are considered particularly well suited for modeling sensory and streaming data. When predicting future behavior, incorporating information from neighboring sensor stations is often beneficial. We propose a new RNN-based architecture for context-specific information fusion across multiple spatially distributed sensor stations. Here, latent representations of multiple local models, each modeling one sensor station, are joined and weighted according to their importance for the prediction. The particular importance is assessed depending on the current context using a separate attention function. We demonstrate the effectiveness of our model on three different real-world sensor network datasets.
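
The weighting step described above can be sketched as a softmax-weighted sum of per-station latent vectors. This is only an illustration under assumptions: the helper names `softmax` and `fuse` are hypothetical, and the attention scores are passed in directly rather than computed by a learned attention function as in the paper.

```python
import math


def softmax(scores):
    """Turn arbitrary real scores into positive weights that sum to one."""
    m = max(scores)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(latents, scores):
    """Combine one latent vector per sensor station into a single vector.

    latents: list of equal-length lists, one per station (assumed given).
    scores:  one attention score per station (assumed given).
    """
    weights = softmax(scores)
    dim = len(latents[0])
    return [sum(w * h[k] for w, h in zip(weights, latents)) for k in range(dim)]


if __name__ == "__main__":
    stations = [[1.0, 0.0], [0.0, 1.0]]          # two made-up latent vectors
    print(fuse(stations, [0.0, 0.0]))            # equal scores give a plain average
    print(fuse(stations, [5.0, 0.0]))            # a high score favors the first station
```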

• [cs.LG]**CUR Decompositions, Similarity Matrices, and Subspace Clustering**

*Akram Aldroubi, Keaton Hamm, Ahmet Bugra Koku, Ali Sekmen*

http://arxiv.org/abs/1711.04178v1

A general framework for solving the subspace clustering problem using the CUR decomposition is presented. The CUR decomposition provides a natural way to construct similarity matrices for data that come from a union of unknown subspaces $\mathscr{U}=\bigcup_{i=1}^{M} S_i$. The similarity matrices thus constructed give the exact clustering in the noise-free case. A simple adaptation of the technique also allows clustering of noisy data. Two known methods for subspace clustering can be derived from the CUR technique. Experiments on synthetic and real data are presented to test the method.

• [cs.LG]**Dynamic Principal Projection for Cost-Sensitive Online Multi-Label Classification**

*Hong-Min Chu, Kuan-Hao Huang, Hsuan-Tien Lin*

http://arxiv.org/abs/1711.05060v1

We study multi-label classification (MLC) with three important real-world issues: online updating, label space dimension reduction (LSDR), and cost-sensitivity. Current MLC algorithms have not been designed to address these three issues simultaneously. In this paper, we propose a novel algorithm, cost-sensitive dynamic principal projection (CS-DPP), that resolves all three issues. The foundation of CS-DPP is an online LSDR framework derived from a leading LSDR algorithm. In particular, CS-DPP is equipped with an efficient online dimension reducer motivated by matrix stochastic gradient, and establishes its theoretical backbone when coupled with a carefully designed online regression learner. In addition, CS-DPP embeds the cost information into label weights to achieve cost-sensitivity along with theoretical guarantees. Experimental results verify that CS-DPP achieves better practical performance than current MLC algorithms across different evaluation criteria, and demonstrate the importance of resolving the three issues simultaneously.

• [cs.LG]**Fixing Weight Decay Regularization in Adam**

*Ilya Loshchilov, Frank Hutter*

http://arxiv.org/abs/1711.05101v1

We note that common implementations of adaptive gradient algorithms, such as Adam, limit the potential benefit of weight decay regularization, because the weights do not decay multiplicatively (as would be expected for standard weight decay) but by an additive constant factor. We propose a simple way to resolve this issue by decoupling weight decay and the optimization steps taken w.r.t. the loss function. We provide empirical evidence that our proposed modification (i) decouples the optimal choice of weight decay factor from the setting of the learning rate for both standard SGD and Adam, and (ii) substantially improves Adam's generalization performance, allowing it to compete with SGD with momentum on image classification datasets (on which it was previously typically outperformed by the latter). We also demonstrate that longer optimization runs require smaller weight decay values for optimal results and introduce a normalized variant of weight decay to reduce this dependence. Finally, we propose a version of Adam with warm restarts (AdamWR) that has strong anytime performance while achieving state-of-the-art results on CIFAR-10 and ImageNet32x32. Our source code is available at https://github.com/loshchil/AdamW-and-SGDW
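
The difference between the two forms of weight decay can be sketched for a single scalar weight. This is a minimal illustration under assumptions, not the authors' released code: `adam_step` and `new_state` are hypothetical names, and scaling the decoupled decay by `lr * weight_decay` is a convention assumed here (the paper uses its own schedule multiplier).

```python
import math


def new_state():
    """Fresh optimizer state for one scalar weight."""
    return {"t": 0, "m": 0.0, "v": 0.0}


def adam_step(w, grad, state, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0, decoupled=True):
    """One Adam update for a scalar weight w.

    decoupled=False: L2 regularization; the decay term is added to the
    gradient and is therefore rescaled by Adam's adaptive denominator.
    decoupled=True: the decay acts on the weight directly, separately
    from the gradient-based step.
    """
    if not decoupled:
        grad = grad + weight_decay * w           # fold the decay into the gradient
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    m_hat = state["m"] / (1 - beta1 ** state["t"])   # bias-corrected first moment
    v_hat = state["v"] / (1 - beta2 ** state["t"])   # bias-corrected second moment
    w_new = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    if decoupled:
        w_new -= lr * weight_decay * w           # multiplicative shrink of the old weight
    return w_new


if __name__ == "__main__":
    # Zero loss gradient, so only the decay term moves the weight.
    for w0 in (1.0, 100.0):
        l2 = adam_step(w0, 0.0, new_state(), lr=0.1, weight_decay=0.1, decoupled=False)
        dec = adam_step(w0, 0.0, new_state(), lr=0.1, weight_decay=0.1, decoupled=True)
        print(w0, l2, dec)
```

With a zero loss gradient, the L2 variant moves both weights by roughly the same absolute amount (about `lr`), while the decoupled variant shrinks each weight by the same relative factor, which is the behavior expected of standard weight decay.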

• [cs.LG]**Linking Sequences of Events with Sparse or No Common Occurrence across Data Sets**

*Yunsung Kim*

http://arxiv.org/abs/1711.04248v1

Data of practical interest - such as personal records, transaction logs, and medical histories - are sequential collections of events relevant to a particular source entity. Recent studies have attempted to link sequences that represent a common entity across data sets to allow more comprehensive statistical analyses and to identify potential privacy failures. Yet, current approaches remain tailored to their specific domains of application, and they fail when co-referent sequences in different data sets contain sparse or no common events, which occurs frequently in practice. To address this, we formalize the general problem of "sequence linkage" and describe "LDA-Link," a generic solution that is applicable even when co-referent event sequences contain no common items at all. LDA-Link is built upon the "Split-Document" model, a new mixed-membership probabilistic model for the generation of event sequence collections. It detects the latent similarity of sequences and thus achieves robustness, particularly when co-referent sequences share sparse or no event overlap. We apply LDA-Link in the context of social media profile reconciliation where users make no common posts across platforms, and compare it to the state-of-the-art generic solution to sequence linkage.

• [cs.LG]**Machine Learning for the Geosciences: Challenges and Opportunities**

*Anuj Karpatne, Imme Ebert-Uphoff, Sai Ravela, Hassan Ali Babaie, Vipin Kumar*

http://arxiv.org/abs/1711.04708v1

Geosciences is a field of great societal relevance that requires solutions to several urgent problems facing humanity and the planet. As geosciences enters the era of big data, machine learning (ML), which has been widely successful in commercial domains, offers immense potential to contribute to problems in geosciences. However, problems in geosciences have several unique challenges that are seldom found in traditional applications, requiring novel problem formulations and methodologies in machine learning. This article introduces researchers in the ML community to the challenges offered by geoscience problems and the opportunities that exist for advancing both machine learning and geosciences. We first highlight typical sources of geoscience data and describe the properties that make it challenging to use traditional machine learning techniques. We then describe some of the common categories of geoscience problems where machine learning can play a role, and discuss some of the existing efforts and promising directions for methodological development in machine learning. We conclude by discussing some of the emerging research themes in machine learning that are applicable across all problems in the geosciences, and the importance of a deep collaboration between machine learning and geosciences for synergistic advancements in both disciplines.

• [cs.LG]**Machine vs Machine: Defending Classifiers Against Learning-based Adversarial Attacks**

*Jihun Hamm*

http://arxiv.org/abs/1711.04368v1

Recently, researchers have discovered that state-of-the-art object classifiers can be fooled easily by small perturbations in the input that are unnoticeable to human eyes. Several methods were proposed to craft adversarial examples, as well as methods of robustifying the classifier against such examples. An attacker with knowledge of the classifier parameters can generate strong adversarial patterns. Conversely, a classifier with knowledge of such patterns can be trained to be robust to them. The cat-and-mouse nature of the attacks and the defenses raises the question of whether an equilibrium exists in the dynamic. In this paper, we propose a game framework to formulate the interaction of attacks and defenses and present the natural notion of the best worst-case defense and attack. We propose simple algorithms, motivated by sensitivity penalization, to numerically find those solutions. In addition, we show the potential of learning-based attacks, and present the close relationship between the adversarial attack and the privacy attack problems. The results are demonstrated with MNIST and CIFAR-10 datasets.

• [cs.LG]**Multiple-Source Adaptation for Regression Problems**

*Judy Hoffman, Mehryar Mohri, Ningshan Zhang*

http://arxiv.org/abs/1711.05037v1

We present a detailed theoretical analysis of the problem of multiple-source adaptation in the general stochastic scenario, extending known results that assume a single target labeling function. Our results cover a more realistic scenario and show the existence of a single robust predictor accurate for *any* target mixture of the source distributions. Moreover, we present an efficient and practical optimization solution to determine the robust predictor in the important case of squared loss, by casting the problem as an instance of DC-programming. We report the results of experiments with both an artificial task and a sentiment analysis task. We find that our algorithm outperforms competing approaches by producing a single robust model that performs well on any target mixture distribution.

• [cs.LG]**Near-optimal sample complexity for convex tensor completion**

*Navid Ghadermarzy, Yaniv Plan, Özgür Yılmaz*

http://arxiv.org/abs/1711.04965v1

We analyze low rank tensor completion (TC) using noisy measurements of a subset of the tensor. Assuming a rank-$r$, order-$d$, $N \times N \times \cdots \times N$ tensor where $r=O(1)$, the best sampling complexity that was achieved is $O(N^{\frac{d}{2}})$, which is obtained by solving a tensor nuclear-norm minimization problem. However, this bound is significantly larger than the number of free variables in a low rank tensor which is $O(dN)$. In this paper, we show that by using an atomic-norm whose atoms are rank-$1$ sign tensors, one can obtain a sample complexity of $O(dN)$. Moreover, we generalize the matrix max-norm definition to tensors, which results in a max-quasi-norm (max-qnorm) whose unit ball has small Rademacher complexity. We prove that solving a constrained least squares estimation using either the convex atomic-norm or the nonconvex max-qnorm results in optimal sample complexity for the problem of low-rank tensor completion. Furthermore, we show that these bounds are nearly minimax rate-optimal. We also provide promising numerical results for max-qnorm constrained tensor completion, showing improved recovery results compared to matricization and alternating least squares.

• [cs.LG]**On the ERM Principle with Networked Data**

*Yuanhong Wang, Yuyi Wang, Xingwu Liu, Juhua Pu*

http://arxiv.org/abs/1711.04297v1

Networked data, in which every training example involves two objects and may share some common objects with others, is used in many machine learning tasks such as learning to rank and link prediction. A challenge of learning from networked examples is that target values are not known for some pairs of objects. In this case, neither the classical i.i.d. assumption nor techniques based on complete U-statistics can be used. Most existing theoretical results of this problem only deal with the classical empirical risk minimization (ERM) principle that always weights every example equally, but this strategy leads to unsatisfactory bounds. We consider general weighted ERM and show new universal risk bounds for this problem. These new bounds naturally define an optimization problem which leads to appropriate weights for networked examples. Though this optimization problem is not convex in general, we devise a new fully polynomial-time approximation scheme (FPTAS) to solve it.

• [cs.LG]**Parameter Estimation in Finite Mixture Models by Regularized Optimal Transport: A Unified Framework for Hard and Soft Clustering**

*Arnaud Dessein, Nicolas Papadakis, Charles-Alban Deledalle*

http://arxiv.org/abs/1711.04366v1

In this short paper, we formulate parameter estimation for finite mixture models in the context of discrete optimal transportation with convex regularization. The proposed framework unifies hard and soft clustering methods for general mixture models. It also generalizes the celebrated $k$-means and expectation-maximization algorithms in relation to associated Bregman divergences when applied to exponential family mixture models.

• [cs.LG]**Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness**

*Michael Kearns, Seth Neel, Aaron Roth, Zhiwei Steven Wu*

http://arxiv.org/abs/1711.05144v1

The most prevalent notions of fairness in machine learning are statistical definitions: they fix a small collection of pre-defined subgroups, and then ask for parity of some statistic of the classifier across these subgroups. Constraints of this form are susceptible to (intentional or inadvertent) fairness gerrymandering, in which a classifier appears to be fair on each individual subgroup, but badly violates the fairness constraint on one or more structured subgroups defined over the protected attributes. We propose instead to demand statistical notions of fairness across exponentially (or infinitely) many subgroups, defined by a structured class of functions over the protected attributes. This interpolates between statistical definitions of fairness and recently proposed individual notions of fairness, but it raises several computational challenges. It is no longer clear how to even audit a fixed classifier to see if it satisfies such a strong definition of fairness. We prove that the computational problem of auditing subgroup fairness for both equality of false positive rates and statistical parity is equivalent to the problem of weak agnostic learning, which means it is computationally hard in the worst case, even for simple structured subclasses. However, it also suggests that common heuristics for learning can be applied to successfully solve the auditing problem in practice. We then derive an algorithm that provably converges to the best fair distribution over classifiers in a given class, given access to oracles which can solve the agnostic learning and auditing problems. The algorithm is based on a formulation of subgroup fairness as fictitious play in a two-player zero-sum game between a Learner and an Auditor. We implement our algorithm using linear regression as a heuristic oracle, and show that we can effectively both audit and learn fair classifiers on real datasets.
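
A toy brute-force audit helps show what fairness gerrymandering looks like. The sketch below is an assumption-laden illustration, not the paper's algorithm: it restricts subgroups to conjunctions of binary protected attributes, scores each subgroup by its population share times its statistical-parity gap (one plausible formalization), and uses the hypothetical name `audit_statistical_parity`. Its cost grows exponentially with the number of attributes, which is why the paper relies on learning oracles instead of enumeration.

```python
from itertools import combinations, product


def audit_statistical_parity(rows, max_order=None):
    """Find the conjunction subgroup with the largest weighted parity gap.

    rows: list of (protected_attrs, prediction), where protected_attrs is a
          tuple of 0/1 values and prediction is 0 or 1 (assumed encoding).
    max_order: largest number of attributes fixed at once (default: all).
    Returns (violation, subgroup) with subgroup as {attr_index: value}.
    """
    n = len(rows)
    base_rate = sum(pred for _, pred in rows) / n
    num_attrs = len(rows[0][0])
    if max_order is None:
        max_order = num_attrs
    worst = (0.0, {})
    for k in range(1, max_order + 1):
        for idxs in combinations(range(num_attrs), k):
            for vals in product((0, 1), repeat=k):
                members = [pred for attrs, pred in rows
                           if all(attrs[i] == v for i, v in zip(idxs, vals))]
                if not members:
                    continue                      # empty subgroup, nothing to audit
                mass = len(members) / n           # share of the population
                gap = abs(base_rate - sum(members) / len(members))
                if mass * gap > worst[0]:
                    worst = (mass * gap, dict(zip(idxs, vals)))
    return worst


if __name__ == "__main__":
    # Made-up data: positives go exactly to the groups (0, 0) and (1, 1).
    data = [((a, b), 1 if a == b else 0)
            for a in (0, 1) for b in (0, 1) for _ in range(2)]
    print(audit_statistical_parity(data, max_order=1))   # marginal audit
    print(audit_statistical_parity(data))                # conjunction audit
```

In this constructed data set each single attribute looks perfectly balanced, yet the conjunction subgroups receive either all or none of the positive predictions.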

• [cs.LG]**Provably efficient neural network representation for image classification**

*Yichen Huang*

http://arxiv.org/abs/1711.04606v1

The state-of-the-art approaches for image classification are based on neural networks. Mathematically, the task of classifying images is equivalent to finding the function that maps an image to the label it is associated with. To rigorously establish the success of neural network methods, we should first prove that the function has an efficient neural network representation, and then design provably efficient training algorithms to find such a representation. Here, we achieve the first goal based on a set of assumptions about the patterns in the images. The validity of these assumptions is very intuitive in many image classification problems, including but not limited to, recognizing handwritten digits.

• [cs.LG]**Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice**

*Jeffrey Pennington, Samuel S. Schoenholz, Surya Ganguli*

http://arxiv.org/abs/1711.04735v1

It is well known that the initialization of weights in deep neural networks can have a dramatic impact on learning speed. For example, ensuring the mean squared singular value of a network's input-output Jacobian is $O(1)$ is essential for avoiding the exponential vanishing or explosion of gradients. The stronger condition that all singular values of the Jacobian concentrate near $1$ is a property known as dynamical isometry. For deep linear networks, dynamical isometry can be achieved through orthogonal weight initialization and has been shown to dramatically speed up learning; however, it has remained unclear how to extend these results to the nonlinear setting. We address this question by employing powerful tools from free probability theory to compute analytically the entire singular value distribution of a deep network's input-output Jacobian. We explore the dependence of the singular value distribution on the depth of the network, the weight initialization, and the choice of nonlinearity. Intriguingly, we find that ReLU networks are incapable of dynamical isometry. On the other hand, sigmoidal networks can achieve isometry, but only with orthogonal weight initialization. Moreover, we demonstrate empirically that deep nonlinear networks achieving dynamical isometry learn orders of magnitude faster than networks that do not. Indeed, we show that properly-initialized deep sigmoidal networks consistently outperform deep ReLU networks. Overall, our analysis reveals that controlling the entire distribution of Jacobian singular values is an important design consideration in deep learning.
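
The deep linear case mentioned above can be checked numerically with a toy example. This sketch makes strong simplifying assumptions: layers are 2x2, orthogonal weights are plain rotations, and the helper names (`rotation`, `linear_network_jacobian`, `singular_values_2x2`) are hypothetical. For a linear network the input-output Jacobian is simply the product of the weight matrices.

```python
import math


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def rotation(theta, gain=1.0):
    """A 2x2 rotation scaled by gain; orthogonal exactly when gain == 1."""
    c, s = math.cos(theta), math.sin(theta)
    return [[gain * c, -gain * s], [gain * s, gain * c]]


def linear_network_jacobian(weights):
    """Input-output Jacobian of a deep linear network: the product of its layers."""
    jac = [[1.0, 0.0], [0.0, 1.0]]
    for w in weights:
        jac = matmul(w, jac)
    return jac


def singular_values_2x2(m):
    """Singular values from the eigenvalues of M^T M (trace/determinant formula)."""
    (a, b), (c, d) = m
    trace = a * a + b * b + c * c + d * d        # trace of M^T M
    det = (a * d - b * c) ** 2                   # det(M^T M) = det(M)^2
    disc = math.sqrt(max(0.0, trace * trace / 4 - det))
    return (math.sqrt(trace / 2 + disc),
            math.sqrt(max(0.0, trace / 2 - disc)))


if __name__ == "__main__":
    depth = 50
    orthogonal = [rotation(0.1 * k) for k in range(depth)]
    shrunk = [rotation(0.1 * k, gain=0.9) for k in range(depth)]
    print(singular_values_2x2(linear_network_jacobian(orthogonal)))
    print(singular_values_2x2(linear_network_jacobian(shrunk)))
```

With orthogonal layers both singular values stay at 1 regardless of depth, whereas a per-layer gain of 0.9 shrinks them geometrically, which is the vanishing-gradient regime the abstract contrasts with dynamical isometry.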

• [cs.LG]**Robust Matrix Elastic Net based Canonical Correlation Analysis: An Effective Algorithm for Multi-View Unsupervised Learning**

*Peng-Bo Zhang, Zhi-Xin Yang*

http://arxiv.org/abs/1711.05068v1

This paper presents a robust matrix elastic net based canonical correlation analysis (RMEN-CCA) for multiple view unsupervised learning problems, which emphasizes the combination of CCA and the robust matrix elastic net (RMEN) used as coupled feature selection. The RMEN-CCA leverages the strength of the RMEN to distill naturally meaningful features without any prior assumption and to effectively measure correlations between different 'views'. We can further directly employ the kernel trick to extend the RMEN-CCA to the kernel scenario with theoretical guarantees, which takes advantage of the kernel trick for highly complicated nonlinear feature learning. Rather than simply incorporating existing regularization minimization terms into CCA, this paper provides a new learning paradigm for CCA and is the first to derive a coupled feature selection based CCA algorithm that guarantees convergence. More significantly, for CCA, the newly derived RMEN-CCA bridges the gap between measurement of relevance and coupled feature selection. Moreover, it is nontrivial to tackle the RMEN-CCA directly by previous optimization approaches derived from its sophisticated model architecture. Therefore, this paper further offers a bridge between a new optimization problem and an existing efficient iterative approach. As a consequence, the RMEN-CCA can overcome the limitation of CCA and address large-scale and streaming data problems. Experimental results on four popular competing datasets illustrate that the RMEN-CCA performs more effectively and efficiently than state-of-the-art approaches.

• [cs.LG]**Skyline Identification in Multi-Armed Bandits**

*Albert Cheu, Ravi Sundaram, Jonathan Ullman*

http://arxiv.org/abs/1711.04213v1

We introduce a variant of the classical PAC multi-armed bandit problem. There is an ordered set of $n$ arms $A[1],\dots,A[n]$, each with some stochastic reward drawn from some unknown bounded distribution. The goal is to identify the *skyline* of the set $A$, consisting of all arms $A[i]$ such that $A[i]$ has larger expected reward than all lower-numbered arms $A[1],\dots,A[i-1]$. We define a natural notion of an $\varepsilon$-approximate skyline and prove matching upper and lower bounds for identifying an $\varepsilon$-skyline. Specifically, we show that in order to identify an $\varepsilon$-skyline from among $n$ arms with probability $1-\delta$, $$ \Theta\bigg(\frac{n}{\varepsilon^2} \cdot \min\bigg\{ \log\bigg(\frac{1}{\varepsilon \delta}\bigg), \log\bigg(\frac{n}{\delta}\bigg) \bigg\} \bigg) $$ samples are necessary and sufficient. When $\varepsilon \gg 1/n$, our results improve over the naïve algorithm, which draws enough samples to approximate the expected reward of every arm. Our results show that the sample complexity of this problem lies strictly in between that of best arm identification (Even-Dar et al., COLT'02) and that of approximating the expected reward of every arm.
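
The skyline definition itself is easy to state in code when the expected rewards are known exactly. The sketch below assumes exactly that, which is not the bandit setting (there the means must be estimated from samples); the function name `skyline` and the example values are hypothetical.

```python
def skyline(means):
    """Return the 1-based indices i whose mean strictly exceeds all earlier means.

    means: expected rewards of arms A[1..n], assumed known for illustration.
    """
    result = []
    best_so_far = float("-inf")
    for i, mu in enumerate(means, start=1):
        if mu > best_so_far:          # strictly larger than every lower-numbered arm
            result.append(i)
            best_so_far = mu
    return result


if __name__ == "__main__":
    print(skyline([0.2, 0.5, 0.4, 0.5, 0.9]))   # arm 4 only ties arm 2, so it is excluded
```

The first arm always belongs to the skyline because it has no lower-numbered arms to beat.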

• [cs.LG]**Sobolev GAN**

*Youssef Mroueh, Chun-Liang Li, Tom Sercu, Anant Raj, Yu Cheng*

http://arxiv.org/abs/1711.04894v1

We propose a new Integral Probability Metric (IPM) between distributions: the Sobolev IPM. The Sobolev IPM compares the mean discrepancy of two distributions for functions (critic) restricted to a Sobolev ball defined with respect to a dominant measure $\mu$. We show that the Sobolev IPM compares two distributions in high dimensions based on weighted conditional Cumulative Distribution Functions (CDF) of each coordinate on a leave-one-out basis. The dominant measure $\mu$ plays a crucial role as it defines the support on which conditional CDFs are compared. Sobolev IPM can be seen as an extension of the one-dimensional Von-Mises Cramér statistics to high-dimensional distributions. We show how Sobolev IPM can be used to train Generative Adversarial Networks (GANs). We then exploit the intrinsic conditioning implied by Sobolev IPM in text generation. Finally we show that a variant of Sobolev GAN achieves competitive results in semi-supervised learning on CIFAR-10, thanks to the smoothness enforced on the critic by Sobolev GAN which relates to Laplacian regularization.

• [cs.LG]**Sparsification of the Alignment Path Search Space in Dynamic Time Warping**

*Saeid Soheily-Khah, Pierre-François Marteau*

http://arxiv.org/abs/1711.04453v1

Temporal data are naturally everywhere, especially in the digital era that sees the advent of big data and the internet of things. One major challenge that arises during temporal data analysis and mining is the comparison of time series or sequences, which requires determining a proper distance or (dis)similarity measure. In this context, Dynamic Time Warping (DTW) has enjoyed success in many domains, due to its 'temporal elasticity', a property particularly useful when matching temporal data. Unfortunately this dissimilarity measure suffers from a quadratic computational cost, which prohibits its use for large-scale applications. This work addresses the sparsification of the alignment path search space for DTW-like measures, essentially to lower their computational cost without losing on the quality of the measure. As a result of our sparsification approach, two new (dis)similarity measures, namely SP-DTW (Sparsified-Paths search space DTW) and its kernelization SP-$K_{rdtw}$ (Sparsified-Paths search space $K_{rdtw}$ kernel), are proposed for time series comparison. A wide range of public datasets is used to evaluate the efficiency (estimated in terms of speed-up ratio and classification accuracy) of the proposed (dis)similarity measures on the 1-Nearest Neighbor (1-NN) and the Support Vector Machine (SVM) classification algorithms. Our experiments show that the proposed measures provide a significant speed-up without losing on accuracy. Furthermore, at the cost of a slight reduction of the speed-up, they significantly outperform on the accuracy criterion the old but well-known Sakoe-Chiba approach, which reduces the DTW path search space using a symmetric corridor.
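
For reference, the quadratic DTW recurrence and the Sakoe-Chiba corridor baseline mentioned above can be sketched as follows. This is not SP-DTW: it only illustrates how restricting the search space reduces the number of cells visited. The function name `dtw`, the absolute-difference local cost, and the `window` parameter are assumptions made for the example.

```python
import math


def dtw(a, b, window=None):
    """DTW cost between numeric sequences a and b.

    window=None searches the full n x m grid (quadratic cost).
    An integer window keeps only cells with |i - j| <= window,
    i.e. a symmetric Sakoe-Chiba corridor around the diagonal.
    """
    n, m = len(a), len(b)
    if window is None:
        window = max(n, m)
    window = max(window, abs(n - m))     # the corridor must reach the final cell
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        lo = max(1, i - window)
        hi = min(m, i + window)
        for j in range(lo, hi + 1):
            d = abs(a[i - 1] - b[j - 1])             # local matching cost
            cost[i][j] = d + min(cost[i - 1][j],     # step in a only
                                 cost[i][j - 1],     # step in b only
                                 cost[i - 1][j - 1]) # step in both
    return cost[n][m]


if __name__ == "__main__":
    print(dtw([1, 2, 3], [1, 2, 2, 3]))          # a repeated sample is absorbed by warping
    print(dtw([0, 1, 2], [1, 2, 3]))             # full search space
    print(dtw([0, 1, 2], [1, 2, 3], window=0))   # diagonal only, no warping allowed
```

A narrower corridor visits fewer cells but can only raise the reported cost, since it removes candidate alignment paths.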

• [cs.LG]**Spatio-Temporal Data Mining: A Survey of Problems and Methods**

*Gowtham Atluri, Anuj Karpatne, Vipin Kumar*

http://arxiv.org/abs/1711.04710v1

Large volumes of spatio-temporal data are increasingly collected and studied in diverse domains, including climate science, social sciences, neuroscience, epidemiology, transportation, mobile health, and Earth sciences. Spatio-temporal data differs from relational data, for which computational approaches have been developed in the data mining community for multiple decades, in that both spatial and temporal attributes are available in addition to the actual measurements/attributes. The presence of these attributes introduces additional challenges that need to be dealt with. Approaches for mining spatio-temporal data have been studied for over a decade in the data mining community. In this article we present a broad survey of this relatively young field of spatio-temporal data mining. We discuss different types of spatio-temporal data and the relevant data mining questions that arise in the context of analyzing each of these datasets. Based on the nature of the data mining problem studied, we classify literature on spatio-temporal data mining into six major categories: clustering, predictive learning, change detection, frequent pattern mining, anomaly detection, and relationship mining. We discuss the various forms of spatio-temporal data mining problems in each of these categories.

• [cs.LG]**Tensor Decompositions for Modeling Inverse Dynamics**

*Stephan Baier, Volker Tresp*

http://arxiv.org/abs/1711.04683v1

Modeling inverse dynamics is crucial for accurate feedforward robot control. The model computes the joint torques necessary to perform a desired movement. The highly non-linear inverse function of the dynamical system can be approximated using regression techniques. As a regression method, we propose a tensor decomposition model that exploits the inherent three-way interaction of positions × velocities × accelerations. Most work in tensor factorization has addressed the decomposition of dense tensors. In this paper, we build upon the decomposition of sparse tensors, with only a small number of nonzero entries. The decomposition of sparse tensors has successfully been used in relational learning, e.g., the modeling of large knowledge graphs. Recently, the approach has been extended to multi-class classification with discrete input variables. Representing the data in high-dimensional sparse tensors enables the approximation of complex, highly non-linear functions. In this paper we show how the decomposition of sparse tensors can be applied to regression problems. Furthermore, we extend the method to continuous inputs, by learning a mapping from the continuous inputs to the latent representations of the tensor decomposition, using basis functions. We evaluate our proposed model on a dataset with trajectories from a seven-degrees-of-freedom SARCOS robot arm. Our experimental results show superior performance of the proposed functional tensor model, compared to challenging state-of-the-art methods.

• [cs.LG]**Three Factors Influencing Minima in SGD**

*Stanisław Jastrzębski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey*

http://arxiv.org/abs/1711.04623v1

We study the properties of the endpoint of stochastic gradient descent (SGD). By approximating SGD as a stochastic differential equation (SDE) we consider the Boltzmann-Gibbs equilibrium distribution of that SDE under the assumption of isotropic variance in loss gradients. Through this analysis, we find that three factors - learning rate, batch size and the variance of the loss gradients - control the trade-off between the depth and width of the minima found by SGD, with wider minima favoured by a higher ratio of learning rate to batch size. We have direct control over the learning rate and batch size, while the variance is determined by the choice of model architecture, model parameterization and dataset. In the equilibrium distribution only the ratio of learning rate to batch size appears, implying that the equilibrium distribution is invariant under a simultaneous rescaling of learning rate and batch size by the same amount. We then explore experimentally how learning rate and batch size affect SGD from two perspectives: the endpoint of SGD and the dynamics that lead up to it. For the endpoint, the experiments suggest the endpoint of SGD is invariant under simultaneous rescaling of batch size and learning rate, and also that a higher ratio leads to flatter minima; both findings are consistent with our theoretical analysis. We note experimentally that the dynamics also seem to be invariant under the same rescaling of learning rate and batch size, which we explore by showing that one can exchange batch size and learning rate for a cyclical learning rate schedule. Next, we illustrate how noise affects memorization, showing that high noise levels lead to better generalization. Finally, we find experimentally that the invariance under simultaneous rescaling of learning rate and batch size breaks down if the learning rate gets too large or the batch size gets too small.
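
The role of the ratio can be made explicit with a short derivation. This is a sketch under the abstract's stated assumptions (SDE approximation, isotropic gradient variance); the symbols $\eta$ (learning rate), $S$ (batch size), $\sigma^2$ (variance of the loss gradients), $L(\theta)$ (loss) and $W_t$ (standard Brownian motion) are notation chosen here for illustration and may differ from the paper's.

$$ d\theta = -\nabla L(\theta)\,dt + \sqrt{\frac{\eta}{S}}\,\sigma\,dW_t \quad\Longrightarrow\quad P(\theta) \propto \exp\bigg(-\frac{2L(\theta)}{(\eta/S)\,\sigma^2}\bigg) $$

This is a Langevin equation $d\theta = -\nabla L\,dt + \sqrt{2T}\,dW_t$ with temperature $T = \eta\sigma^2/(2S)$, whose stationary density is $\exp(-L/T)$ up to normalization. Replacing $(\eta, S)$ by $(c\eta, cS)$ leaves $\eta/S$, and hence $P(\theta)$, unchanged, while a larger $\eta/S$ raises $T$ and flattens the distribution, so that wide regions of low loss carry relatively more probability mass.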

• [cs.LG]**TripletGAN: Training Generative Model with Triplet Loss**

*Gongze Cao, Yezhou Yang, Jie Lei, Cheng Jin, Yang Liu, Mingli Song*

http://arxiv.org/abs/1711.05084v1

As an effective way of metric learning, triplet loss has been widely used in many deep learning tasks, including face recognition and person re-identification (person-ReID), leading to many state-of-the-art results. The main innovation of triplet loss is using a feature map to replace softmax in the classification task. Inspired by this concept, we propose here a new adversarial modeling method that substitutes the classification loss of the discriminator with triplet loss. A theoretical proof based on the IPM (integral probability metric) demonstrates that such a setting helps the generator converge to the given distribution under some conditions. Moreover, since triplet loss requires the generator to maximize distance within a class, we argue that tripletGAN also helps prevent mode collapse, through both theory and experiment.
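
For readers unfamiliar with the loss itself, here is a minimal sketch of the standard margin-based triplet loss on plain feature vectors. It is not the TripletGAN discriminator objective; the function names, the Euclidean distance, and the default margin of 1.0 are assumptions chosen for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on feature vectors.

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive)
               - euclidean(anchor, negative) + margin)


if __name__ == "__main__":
    anchor = [0.0, 0.0]
    print(triplet_loss(anchor, [0.0, 1.0], [3.0, 4.0]))   # negative is far away
    print(triplet_loss(anchor, [3.0, 4.0], [0.0, 1.0]))   # positive is farther than negative
```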

• [cs.LG]**Unified Spectral Clustering with Optimal Graph**

*Zhao Kang, Chong Peng, Qiang Cheng, Zenglin Xu*

http://arxiv.org/abs/1711.04258v1

Spectral clustering has found extensive use in many areas. Most traditional spectral clustering algorithms work in three separate steps: similarity graph construction; continuous labels learning; discretizing the learned labels by $k$-means clustering. Such common practice has two potential flaws, which may lead to severe information loss and performance degradation. First, a predefined similarity graph might not be optimal for subsequent clustering. It is well accepted that the similarity graph strongly affects the clustering results. To this end, we propose to automatically learn similarity information from data and simultaneously consider the constraint that the similarity matrix has exactly $c$ connected components if there are $c$ clusters. Second, the discrete solution may deviate from the spectral solution since the $k$-means method is well known to be sensitive to the initialization of cluster centers. In this work, we transform the candidate solution into a new one that better approximates the discrete one. Finally, those three subtasks are integrated into a unified framework, with each subtask iteratively boosted by using the results of the others towards an overall optimal solution. It is known that the performance of a kernel method is largely determined by the choice of kernels. To tackle this practical problem of how to select the most suitable kernel for a particular data set, we further extend our model to incorporate multiple kernel learning ability. Extensive experiments demonstrate the superiority of our proposed method as compared to existing clustering approaches.

• [cs.LG]**Weightless: Lossy Weight Encoding For Deep Neural Network Compression**

*Brandon Reagen, Udit Gupta, Robert Adolf, Michael M. Mitzenmacher, Alexander M. Rush, Gu-Yeon Wei, David Brooks*

http://arxiv.org/abs/1711.04686v1

The large memory requirements of deep neural networks limit their deployment and adoption on many devices. Model compression methods effectively reduce the memory requirements of these models, usually through applying transformations such as weight pruning or quantization. In this paper, we present a novel scheme for lossy weight encoding which complements conventional compression techniques. The encoding is based on the Bloomier filter, a probabilistic data structure that can save space at the cost of introducing random errors. Leveraging the ability of neural networks to tolerate these imperfections and by re-training around the errors, the proposed technique, Weightless, can compress DNN weights by up to 496x with the same model accuracy. This results in up to a 1.51x improvement over the state-of-the-art.

• [cs.LG]**pyLEMMINGS: Large Margin Multiple Instance Classification and Ranking for Bioinformatics Applications**

*Amina Asif, Wajid Arshad Abbasi, Farzeen Munir, Asa Ben-Hur, Fayyaz ul Amir Afsar Minhas*

http://arxiv.org/abs/1711.04913v1

Motivation: A major challenge in the development of machine learning based methods in computational biology is that data may not be accurately labeled due to the time and resources required for experimentally annotating properties of proteins and DNA sequences. Standard supervised learning algorithms assume accurate instance-level labeling of training data. Multiple instance learning is a paradigm for handling such labeling ambiguities. However, the widely used large-margin classification methods for multiple instance learning are heuristic in nature with high computational requirements. In this paper, we present stochastic sub-gradient optimization large margin algorithms for multiple instance classification and ranking, and provide them in a software suite called pyLEMMINGS. Results: We have tested pyLEMMINGS on a number of bioinformatics problems as well as benchmark datasets. pyLEMMINGS has successfully been able to identify functionally important segments of proteins: binding sites in Calmodulin binding proteins, prion forming regions, and amyloid cores. pyLEMMINGS achieves state-of-the-art performance in all these tasks, demonstrating the value of multiple instance learning. Furthermore, our method has shown more than 100-fold improvement in terms of running time as compared to heuristic solutions with improved accuracy over benchmark datasets. Availability and Implementation: pyLEMMINGS python package is available for download at: http://faculty.pieas.edu.pk/fayyaz/software.html#pylemmings.

• [cs.MS]**Domain-Specific Acceleration and Auto-Parallelization of Legacy Scientific Code in FORTRAN 77 using Source-to-Source Compilation**

*Wim Vanderbauwhede, Gavin Davidson*

http://arxiv.org/abs/1711.04471v1

Massively parallel accelerators such as GPGPUs, manycores and FPGAs represent a powerful and affordable tool for scientists who look to speed up simulations of complex systems. However, porting code to such devices requires a detailed understanding of heterogeneous programming tools and effective strategies for parallelization. In this paper we present a source-to-source compilation approach with whole-program analysis to automatically transform single-threaded FORTRAN 77 legacy code into OpenCL-accelerated programs with parallelized kernels. The main contributions of our work are: (1) whole-source refactoring to allow any subroutine in the code to be offloaded to an accelerator; (2) minimization of the data transfer between the host and the accelerator by eliminating redundant transfers; (3) pragmatic auto-parallelization of the code to be offloaded to the accelerator by identification of parallelizable maps and reductions. We have validated the code transformation performance of the compiler on the NIST FORTRAN 78 test suite and several real-world codes: the Large Eddy Simulator for Urban Flows, a high-resolution turbulent flow model; the shallow water component of the ocean model Gmodel; the Linear Baroclinic Model, an atmospheric climate model; and Flexpart-WRF, a particle dispersion simulator. The automatic parallelization component has been tested on a 2-D Shallow Water model (2DSW) and on the Large Eddy Simulator for Urban Flows (UFLES) and produces a complete OpenCL-enabled code base. The fully OpenCL-accelerated versions of the 2DSW and the UFLES are respectively 9x and 20x faster on GPU than the original code on CPU; in both cases this matches the performance of manually ported code.

• [cs.NE]**BP-STDP: Approximating Backpropagation using Spike Timing Dependent Plasticity**

*Amirhossein Tavanaei, Anthony S. Maida*

http://arxiv.org/abs/1711.04214v1

The problem of training spiking neural networks (SNNs) is a necessary precondition to understanding computations within the brain, a field still in its infancy. Previous work has shown that supervised learning in multi-layer SNNs enables bio-inspired networks to recognize patterns of stimuli through hierarchical feature acquisition. Although gradient descent has shown impressive performance in multi-layer (and deep) SNNs, it is generally not considered biologically plausible and is also computationally expensive. This paper proposes a novel supervised learning approach based on an event-based spike-timing-dependent plasticity (STDP) rule embedded in a network of integrate-and-fire (IF) neurons. The proposed temporally local learning rule follows the backpropagation weight change updates applied at each time step. This approach enjoys benefits of both accurate gradient descent and temporally local, efficient STDP. Thus, this method is able to address some open questions regarding accurate and efficient computations that occur in the brain. The experimental results on the XOR problem, the Iris data, and the MNIST dataset demonstrate that the proposed SNN performs as successfully as the traditional NNs. Our approach also compares favorably with the state-of-the-art multi-layer SNNs.

• [cs.NE]**Concurrent Pump Scheduling and Storage Level Optimization Using Meta-Models and Evolutionary Algorithms**

*Morad Behandish, Zheng Yi Wu*

http://arxiv.org/abs/1711.04988v1

In spite of the growing computational power offered by commodity hardware, fast pump scheduling of complex water distribution systems is still a challenge. In this paper, the Artificial Neural Network (ANN) meta-modeling technique has been employed with a Genetic Algorithm (GA) for simultaneously optimizing the pump operation and the tank levels at the ends of the cycle. The generalized GA+ANN algorithm has been tested on a real system in the UK. Compared to the existing operation, the daily cost is reduced by about 10-15%, while the number of pump switches is kept below 4 switches per day. In addition, tank levels are optimized to ensure a periodic behavior, which results in a predictable and stable performance over repeated cycles.

• [cs.NE]**Deep Rewiring: Training very sparse deep networks**

*Guillaume Bellec, David Kappel, Wolfgang Maass, Robert Legenstein*

http://arxiv.org/abs/1711.05136v1

Neuromorphic hardware tends to pose limits on the connectivity of deep networks that one can run on it. Generic hardware and software implementations of deep learning also run more efficiently on sparse networks. Several methods exist for pruning connections of a neural network after it has been trained without connectivity constraints. We present an algorithm, DEEP R, that enables us to directly train a sparsely connected neural network. DEEP R automatically rewires the network during supervised training so that connections are placed where they are most needed for the task, while their total number is strictly bounded at all times. We demonstrate that DEEP R can be used to train very sparse feedforward and recurrent neural networks on standard benchmark tasks with just a minor loss in performance. DEEP R is based on a rigorous theoretical foundation that views rewiring as stochastic sampling of network configurations from a posterior.

• [cs.NE]**Learning Explanatory Rules from Noisy Data**

*Richard Evans, Edward Grefenstette*

http://arxiv.org/abs/1711.04574v1

Artificial Neural Networks are powerful function approximators capable of modelling solutions to a wide variety of problems, both supervised and unsupervised. As their size and expressivity increases, so too does the variance of the model, yielding a nearly ubiquitous overfitting problem. Although mitigated by a variety of model regularisation methods, the common cure is to seek large amounts of training data---which is not necessarily easily obtained---that sufficiently approximates the data distribution of the domain we wish to test on. In contrast, logic programming methods such as Inductive Logic Programming offer an extremely data-efficient process by which models can be trained to reason on symbolic domains. However, these methods are unable to deal with the variety of domains neural networks can be applied to: they are not robust to noise in or mislabelling of inputs, and perhaps more importantly, cannot be applied to non-symbolic domains where the data is ambiguous, such as operating on raw pixels. In this paper, we propose a Differentiable Inductive Logic framework ($\partial$ILP), which can not only solve tasks which traditional ILP systems are suited for, but shows a robustness to noise and error in the training data which ILP cannot cope with. Furthermore, as it is trained by backpropagation against a likelihood objective, it can be hybridised by connecting it with neural networks over ambiguous data in order to be applied to domains which ILP cannot address, while providing data efficiency and generalisation beyond what neural networks on their own can achieve.

• [cs.NE]**Neural Networks Architecture Evaluation in a Quantum Computer**

*Adenilton José da Silva, Rodolfo Luan F. de Oliveira*

http://arxiv.org/abs/1711.04759v1

In this work, we propose a quantum algorithm to evaluate neural network architectures, named Quantum Neural Network Architecture Evaluation (QNNAE). The proposed algorithm is based on a quantum associative memory and the learning algorithm for artificial neural networks. Unlike conventional algorithms for evaluating neural network architectures, QNNAE does not depend on the initialization of weights. The proposed algorithm has a binary output and results in 0 with probability proportional to the performance of the network. Its computational cost is equal to the computational cost of training a neural network.

• [cs.NE]**Reliability and Sharpness in Border Crossing Traffic Interval Prediction**

*Lei Lin, John Handley, Adel Sadek*

http://arxiv.org/abs/1711.04848v1

Short-term traffic volume prediction models have been extensively studied in the past few decades. However, most previous studies focus only on single-value prediction. Considering the uncertain and chaotic nature of the transportation system, an accurate and reliable prediction interval with upper and lower bounds may be better than a single point value for transportation management. In this paper, we introduce a neural network model called Extreme Learning Machine (ELM) for interval prediction of short-term traffic volume and improve it with the heuristic particle swarm optimization algorithm (PSO). The hybrid PSO-ELM model can generate prediction intervals under different confidence levels and guarantee their quality by minimizing a multi-objective function which considers two criteria: reliability and interval sharpness. The PSO-ELM models are built on an hourly traffic dataset and compared with ARMA and Kalman Filter models. The results show that ARMA models are the worst for all confidence levels, and the PSO-ELM models are comparable with the Kalman Filter in terms of reliability and narrowness of the intervals, although the parameters of PSO-ELM are fixed once training is done, while the Kalman Filter is updated online. Additionally, only the PSO-ELMs are able to produce intervals with coverage probabilities higher than or equal to the confidence levels. The points that fall outside the prediction intervals given by the PSO-ELMs lie very close to the bounds.
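
The abstract names reliability and sharpness as the two criteria but does not give their formulas. A common way to measure them, assumed here purely for illustration, is the empirical coverage probability of the intervals and their mean width. The function names and the toy traffic counts below are hypothetical.

```python
def coverage_probability(lower, upper, observed):
    """Fraction of observations that fall inside their prediction interval
    (an assumed proxy for reliability)."""
    inside = sum(1 for lo, hi, y in zip(lower, upper, observed) if lo <= y <= hi)
    return inside / len(observed)


def mean_interval_width(lower, upper):
    """Average interval width (an assumed proxy for sharpness; smaller is sharper)."""
    return sum(hi - lo for lo, hi in zip(lower, upper)) / len(lower)


# Hypothetical hourly traffic volumes with predicted lower and upper bounds.
lower = [10, 20, 30, 40]
upper = [20, 30, 40, 50]
observed = [15, 31, 35, 45]

coverage = coverage_probability(lower, upper, observed)  # 3 of 4 points are covered
width = mean_interval_width(lower, upper)
```

The two measures pull in opposite directions: widening every interval raises coverage but hurts sharpness, which is why the abstract describes a multi-objective function.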

• [cs.RO]**Anytime Motion Planning on Large Dense Roadmaps with Expensive Edge Evaluations**

*Shushman Choudhury, Oren Salzman, Sanjiban Choudhury, Christopher M. Dellin, Siddhartha S. Srinivasa*

http://arxiv.org/abs/1711.04040v1

We propose an algorithmic framework for efficient anytime motion planning on large dense geometric roadmaps, in domains where collision checks and therefore edge evaluations are computationally expensive. A large dense roadmap (graph) can typically ensure the existence of high quality solutions for most motion-planning problems, but the size of the roadmap, particularly in high-dimensional spaces, makes existing search-based planning algorithms computationally expensive. We deal with the challenges of expensive search and collision checking in two ways. First, we frame the problem of anytime motion planning on roadmaps as searching for the shortest path over a sequence of subgraphs of the entire roadmap graph, generated by some densification strategy. This lets us achieve bounded sub-optimality with bounded worst-case planning effort. Second, for searching each subgraph, we develop an anytime planning algorithm which uses a belief model to compute the collision probability of unknown configurations and searches for paths that are Pareto-optimal in path length and collision probability. This algorithm is efficient with respect to collision checks as it searches for successively shorter paths. We theoretically analyze both our ideas and evaluate them individually on high-dimensional motion-planning problems. Finally, we apply both of these ideas together in our algorithmic framework for anytime motion planning, and show that it outperforms BIT* on high-dimensional hypercube problems.

• [cs.RO]**Towards Planning and Control of Hybrid Systems with Limit Cycle using LQR Trees**

*Ramkumar Natarajan, Siddharthan Rajasekaran, Jonathan D. Taylor*

http://arxiv.org/abs/1711.04063v1

We present a multi-query recovery policy for a hybrid system with goal limit cycle. The sample trajectories and the hybrid limit cycle of the dynamical system are stabilized using locally valid Time Varying LQR controller policies which probabilistically cover a bounded region of state space. The original LQR Tree algorithm builds such trees for non-linear static and non-hybrid systems like a pendulum or a cart-pole. We leverage the idea of LQR trees to plan with a continuous control set, unlike methods that rely on discretization like dynamic programming to plan for hybrid dynamical systems where it is hard to capture the exact event of discrete transition. We test the algorithm on a compass gait model by stabilizing a dynamic walking hybrid limit cycle with point foot contact from random initial conditions. We show results from the simulation where the system comes back to a stable behavior with initial position or velocity perturbation and noise.

• [cs.SE]**Towards an interdisciplinary, socio-technical analysis of software ecosystem health**

*Tom Mens, Bram Adams, Josianne Marsan*

http://arxiv.org/abs/1711.04532v1

This extended abstract presents the research goals and preliminary research results of the interdisciplinary research project SECOHealth, an ongoing collaboration between research teams of Polytechnique Montreal (Canada), the University of Mons (Belgium) and Laval University (Canada). SECOHealth aims to contribute to research and practice in software engineering by delivering a validated interdisciplinary scientific methodology and a catalog of guidelines and recommendation tools for improving software ecosystem health.

• [cs.SI]**Generalized Neural Graph Embedding with Matrix Factorization**

*Junliang Guo, Linli Xu, Xunpeng Huang, Enhong Chen*

http://arxiv.org/abs/1711.04094v1

Recent advances in language modeling such as word2vec motivate a number of graph embedding approaches by treating random walk sequences as sentences to encode structural proximity in a graph. However, most of the existing principles of neural graph embedding do not incorporate auxiliary information such as node content flexibly. In this paper we take a matrix factorization perspective of graph embedding which generalizes to structural embedding as well as content embedding in a natural way. For structure embedding, we validate that the matrix we construct and factorize preserves the high-order proximities of the graph. Label information can be further integrated into the matrix via the process of random walk sampling to enhance the quality of embedding. In addition, we generalize the Skip-Gram Negative Sampling model to integrate the content of the graph in a matrix factorization framework. As a consequence, graph embedding can be learned in a unified framework integrating graph structure and node content as well as label information simultaneously. We demonstrate the efficacy of the proposed model with the tasks of semi-supervised node classification and link prediction on a variety of real-world benchmark network datasets.

• [cs.SY]**A Supervised Learning Concept for Reducing User Interaction in Passenger Cars**

*Marius Stärk, Damian Backes, Christian Kehl*

http://arxiv.org/abs/1711.04518v1

In this article, an automation system for human-machine interfaces (HMIs) for setpoint adjustment using supervised learning is presented. We use the HMIs of multi-modal thermal conditioning systems in passenger cars as an example of a complex setpoint selection system. The goal is the reduction of interaction complexity up to full automation. The approach is not limited to climate control applications but can be extended to other setpoint-based HMIs.

• [cs.SY]**A unified decision making framework for supply and demand management in microgrid networks**

*Raghuram Bharadwaj Diddigi, Sai Koti Reddy Danda, Krishnasuri Narayanam, Shalabh Bhatnagar*

http://arxiv.org/abs/1711.05078v1

This paper considers two important problems, one on the supply side and one on the demand side, and studies both in a unified framework. On the supply side, we study the problem of energy sharing among microgrids with the goal of maximizing profit obtained from selling power while meeting customer demand. On the other hand, under a shortage of power, this problem becomes one of deciding the amount of power to be bought at dynamically varying prices. On the demand side, we consider the problem of optimally scheduling the time-adjustable demand, i.e., loads with flexible time windows in which they can be scheduled. While previous works have treated these two problems in isolation, we combine them and provide, for the first time in the literature, a unified Markov decision process (MDP) framework for these problems. We then apply the Q-learning algorithm, a popular model-free reinforcement learning technique, to obtain the optimal policy. Through simulations, we show that our model outperforms the traditional power sharing models.
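
As a rough illustration of the Q-learning step mentioned above, here is a minimal sketch of the standard tabular update rule. The state names, action names, reward, learning rate `alpha`, and discount `gamma` are all hypothetical and are not taken from the paper's microgrid MDP.

```python
def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[next_state].values())
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Hypothetical two-state table: a microgrid with a power surplus or a deficit.
q_table = {
    "surplus": {"sell": 0.0, "store": 0.0},
    "deficit": {"buy": 1.0, "wait": 0.0},
}

# Selling in the surplus state yields reward 2.0 and leads to the deficit state.
new_value = q_update(q_table, "surplus", "sell", reward=2.0, next_state="deficit")
```

Because the update uses only observed transitions and rewards, no model of prices or demand is needed, which is what "model-free" refers to in the abstract.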

• [econ.EM]**Uniform Inference for Conditional Factor Models with Instrumental and Idiosyncratic Betas**

*Yuan Liao, Xiye Yang*

http://arxiv.org/abs/1711.04392v1

It is well known in financial economics that factor betas depend on observed instruments such as firm-specific characteristics and macroeconomic variables, and a key object of interest is the effect of instruments on the factor betas. One of the key features of our model is that we specify the factor betas as functions of time-varying observed instruments that pick up long-run beta fluctuations, plus an orthogonal idiosyncratic component that captures high-frequency movements in beta. It is often the case that researchers do not know whether the idiosyncratic beta exists, or how strong it is, and thus uniformity is essential for inference. We find that the limiting distribution of the estimated instrument effect has a discontinuity when the strength of the idiosyncratic beta is near zero, which makes the usual inference invalid and its results misleading. In addition, the usual "plug-in" method using the estimated asymptotic variance is only valid pointwise. The central goal is to make inference about the effect of firms' instruments on the betas, and to conduct out-of-sample forecasts of integrated volatilities using estimated factors. Both procedures should be valid uniformly over a broad class of data generating processes for idiosyncratic betas with various signal strengths and degrees of time variation. We show that a cross-sectional bootstrap procedure is essential for the uniform inference, and our procedure also features a bias correction for the effect of estimating unknown factors.

• [eess.AS]**Deep Networks tag the location of bird vocalisations on audio spectrograms**

*Lefteris Fanioudakis, Ilyas Potamitis*

http://arxiv.org/abs/1711.04347v1

This work focuses on reliable detection and segmentation of bird vocalizations as recorded in the open field. Acoustic detection of avian sounds can be used for the automated monitoring of multiple bird taxa and for querying long-term recordings for species of interest. These tasks are tackled in this work by suggesting two approaches: A) First, DenseNets are applied to weakly labeled data to infer the attention map of the dataset (i.e. Salience and CAM). We push this idea further by directing attention maps to the YOLO v2 deep-network-based detection framework to localize bird vocalizations. B) A deep autoencoder, namely the U-net, maps the audio spectrogram of bird vocalizations to its corresponding binary mask, which encircles the spectral blobs of vocalizations while suppressing other audio sources. We focus solely on procedures requiring minimal human attendance, suitable for scanning massive volumes of data in order to analyze them, evaluate insights and hypotheses, and identify patterns of bird activity. Hopefully, this approach will be valuable to researchers, conservation practitioners, and decision makers who need to design policies on biodiversity issues.

• [eess.AS]**Multilingual Adaptation of RNN Based ASR Systems**

*Markus Müller, Sebastian Stüker, Alex Waibel*

http://arxiv.org/abs/1711.04569v1

A large amount of data is required for automatic speech recognition (ASR) systems to achieve good performance. While such data is readily available for languages like English, there exists a long tail of languages with only limited language resources. By using data from additional source languages, this problem can be mitigated. In this work, we focus on multilingual systems based on recurrent neural networks (RNNs), trained using the Connectionist Temporal Classification (CTC) loss function. Using a multilingual set of acoustic units to train systems jointly on multiple languages poses difficulties: while the same phones share the same symbols across languages, they are pronounced slightly differently because of, e.g., small shifts in tongue positions. To address this issue, we proposed Language Feature Vectors (LFVs) to train language-adaptive multilingual systems. In this work, we extended this approach by introducing a novel technique, which we call "modulation", to add LFVs. We evaluated our approach in multiple conditions, showing improvements in both full and low resource conditions as well as for grapheme and phone based systems.

• [eess.AS]**Phonemic and Graphemic Multilingual CTC Based Speech Recognition**

*Markus Müller, Sebastian Stüker, Alex Waibel*

http://arxiv.org/abs/1711.04564v1

Training automatic speech recognition (ASR) systems requires large amounts of data in the target language in order to achieve good performance. Whereas large training corpora are readily available for languages like English, there exists a long tail of languages that suffer from a lack of resources. One method to handle data sparsity is to use data from additional source languages and build a multilingual system. Recently, ASR systems based on recurrent neural networks (RNNs) trained with connectionist temporal classification (CTC) have gained substantial research interest. In this work, we extended our previous approach towards training CTC-based systems multilingually. Our systems feature a global phone set, based on the joint phone sets of each source language. We evaluated the use of different language combinations as well as the addition of Language Feature Vectors (LFVs). As a contrastive experiment, we built systems based on graphemes as well. Systems having a multilingual phone set are known to suffer in performance compared to their monolingual counterparts. With our proposed approach, we could reduce the gap between these mono- and multilingual setups, using either graphemes or phonemes.

• [eess.SP]**Person Recognition using Smartphones' Accelerometer Data**

*Thingom Bishal Singha, Rajsekhar Kumar Nath, A. V. Narsimhadhan*

http://arxiv.org/abs/1711.04689v1

Smartphones have become quite pervasive in various aspects of our daily lives. They have become important links to a host of important data and applications, which if compromised, can lead to disastrous results. Due to this, today's smartphones are equipped with multiple layers of authentication modules. However, there still lies the need for a viable and unobtrusive layer of security which can perform the task of user authentication using resources which are cost-efficient and widely available on smartphones. In this work, we propose a method to recognize users using data from a phone's embedded accelerometer sensors. Features encapsulating information from both time and frequency domains are extracted from walking data samples, and are used to build a Random Forest ensemble classification model. Based on the experimental results, the resultant model delivers an accuracy of 0.9679 and Area under Curve (AUC) of 0.9822.

• [math.CO]**Randomized Near Neighbor Graphs, Giant Components, and Applications in Data Science**

*George C. Linderman, Gal Mishne, Yuval Kluger, Stefan Steinerberger*

http://arxiv.org/abs/1711.04712v1

If we pick $n$ random points uniformly in $[0,1]^d$ and connect each point to its $k$ nearest neighbors, then it is well known that there exists a giant connected component with high probability. We prove that in $[0,1]^d$ it suffices to connect every point to $c_{d,1} \log{\log{n}}$ points chosen randomly among its $c_{d,2} \log{n}$ nearest neighbors to ensure a giant component of size $n - o(n)$ with high probability. This construction yields a much sparser random graph with $\sim n \log\log{n}$ instead of $\sim n \log{n}$ edges that has comparable connectivity properties. This result has nontrivial implications for problems in data science where an affinity matrix is constructed: instead of picking the $k$ nearest neighbors, one can often pick $k' \ll k$ random points out of the $k$ nearest neighbors without sacrificing efficiency. This can massively simplify and accelerate computation; we illustrate this with several numerical examples.
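
The construction described above can be sketched in a few lines. This is only an illustrative brute-force version under stated assumptions: the values `k=10` and `k_prime=3` are arbitrary stand-ins rather than the constants $c_{d,2} \log n$ and $c_{d,1} \log\log n$ from the theorem, and the function names are hypothetical.

```python
import math
import random


def random_near_neighbor_graph(points, k, k_prime, rng):
    """Connect each point to k_prime points drawn at random from its k nearest
    neighbors. Returns an undirected adjacency dict (brute-force O(n^2 log n))."""
    n = len(points)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # math.dist requires Python 3.8 or newer.
        others.sort(key=lambda j: math.dist(points[i], points[j]))
        for j in rng.sample(others[:k], k_prime):
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def largest_component_size(adjacency):
    """Size of the largest connected component, found by depth-first search."""
    seen = set()
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        best = max(best, size)
    return best


rng = random.Random(0)  # fixed seed so the sketch is reproducible
points = [(rng.random(), rng.random()) for _ in range(200)]
graph = random_near_neighbor_graph(points, k=10, k_prime=3, rng=rng)
giant = largest_component_size(graph)
edge_count = sum(len(neighbors) for neighbors in graph.values()) // 2
```

Comparing `giant` against `len(points)` for different `k_prime` gives a quick empirical feel for how few random neighbors are needed before most points join one component.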

• [math.OC]**A Robust Variable Step Size Fractional Least Mean Square (RVSS-FLMS) Algorithm**

*Shujaat Khan, Muhammad Usman, Imran Naseem, Roberto Togneri, Mohammed Bennamoun*

http://arxiv.org/abs/1711.04973v1

In this paper, we propose an adaptive framework for the variable step size of the fractional least mean square (FLMS) algorithm. The proposed algorithm, named the robust variable step size FLMS (RVSS-FLMS), dynamically updates the step size of the FLMS to achieve a high convergence rate with low steady-state error. For evaluation, the problem of system identification is considered. The experiments clearly show that the proposed approach achieves a better convergence rate than the FLMS and the adaptive step-size modified FLMS (AMFLMS).

• [math.PR]**A Note on the Quasi-Stationary Distribution of the Shiryaev Martingale on the Positive Half-Line**

*Aleksey S. Polunchenko, Servet Martinez, Jaime San Martin*

http://arxiv.org/abs/1711.05134v1

We obtain a closed-form formula for the quasi-stationary distribution of the classical Shiryaev martingale diffusion considered on the positive half-line $[A,+\infty)$ with $A>0$ fixed; the state space's left endpoint is assumed to be the killing boundary. The formula is obtained analytically as the solution of the appropriate singular Sturm-Liouville problem; the latter was first considered in Section 7.8.2 of Collet et al. (2013), but has heretofore remained unsolved.

• [math.PR]**Joint Large Deviation principle for empirical measures of the d-regular random graphs**

*U. Ibrahim, A. Lotsi, K. Doku-Amponsah*

http://arxiv.org/abs/1711.05028v1

For a $d$-regular random model, we assign $q$-state spins to vertices. From this model, we define the *empirical co-operate measure*, which enumerates the number of co-operations between a given couple of spins, and the *empirical spin measure*, which enumerates the number of sites having a given spin on the $d$-regular random graph model. For these empirical measures we obtain a large deviation principle (LDP) in the weak topology.

• [math.ST]**Adaptive estimation and noise detection for an ergodic diffusion with observation noises**

*Shogo H. Nakakita, Masayuki Uchida*

http://arxiv.org/abs/1711.04462v1

We study adaptive maximum likelihood-type estimation for an ergodic diffusion process where the observation is contaminated by noise. This methodology leads to the asymptotic independence of the estimators for the variance of the observation noise, the diffusion parameter and the drift parameter of the latent diffusion process. Moreover, it can lessen the computational burden compared to simultaneous maximum likelihood-type estimation. In addition to adaptive estimation, we propose a test for whether noise exists, and analyse real data as an example in which the data contain observation noise with statistical significance.

• [math.ST]**Minimax estimation in linear models with unknown finite alphabet design**

*Merle Behr, Axel Munk*

http://arxiv.org/abs/1711.04145v1

We provide minimax theory for joint estimation of $F$ and $\omega$ in linear models $Y = F \omega + Z$ where the parameter matrix $\omega$ and the design matrix $F$ are unknown but the latter takes values in a known finite set. This allows one to separate $F$ and $\omega$, a task which is not feasible in general. In the noiseless case, i.e., $Z = 0$, we obtain stable recovery of $F$ and $\omega$ from the linear model. Based on this, we show for a Gaussian error matrix $Z$ that the LSE attains minimax rates for the prediction error for $F \omega$. Notably, these are exponential in the dimension of one component of $Y$. The finite alphabet allows estimation of $F$ and $\omega$ themselves, and it is shown that the LSE achieves the minimax rate. As computation of the LSE is not feasible, an efficient algorithm is proposed. Simulations suggest that this approximates the LSE well.

• [math.ST]**On the boundary between qualitative and quantitative methods for causal inference**

*Yue Wang, Linbo Wang*

http://arxiv.org/abs/1711.04466v1

We consider how to quantify the causal effect from a random variable to a response variable. We show that with multiple Markov boundaries, conditional mutual information (CMI) will produce 0, while causal strength (CS) and part mutual information (PMI), which claim to behave better, are not well-defined, and have other problems. The reason is that the quantitative causal inference with multiple Markov boundaries is an ill-posed problem. We will give a criterion and some applicable algorithms to determine whether a distribution has non-unique Markov boundaries.

• [math.ST]**Sparse High-Dimensional Linear Regression. Algorithmic Barriers and a Local Search Algorithm**

*David Gamarnik, Ilias Zadik*

http://arxiv.org/abs/1711.04952v1

We consider a sparse high-dimensional regression model where the goal is to recover a $k$-sparse unknown vector $\beta^*$ from $n$ noisy linear observations of the form $Y = X\beta^* + W \in \mathbb{R}^n$, where $X \in \mathbb{R}^{n \times p}$ has i.i.d. $N(0,1)$ entries and $W \in \mathbb{R}^n$ has i.i.d. $N(0,\sigma^2)$ entries. Under certain assumptions on the parameters, an intriguing asymptotic gap appears between the minimum value of $n$, call it $n^*$, for which the recovery is information-theoretically possible, and the minimum value of $n$, call it $n_{alg}$, for which an efficient algorithm is known to provably recover $\beta^*$. In a recent paper it was conjectured that the gap is not artificial, in the sense that for sample sizes $n \in [n^*, n_{alg}]$ the problem is algorithmically hard. We support this conjecture in two ways. Firstly, we show that a well-known recovery mechanism called the Basis Pursuit Denoising Scheme provably fails to $\ell_2$-stably recover the vector when $n \in [n^*, c\, n_{alg}]$, for some sufficiently small constant $c > 0$. Secondly, we establish that $n_{alg}$, up to a multiplicative constant factor, is a phase transition point for the appearance of a certain Overlap Gap Property (OGP) over the space of $k$-sparse vectors. The presence of such an Overlap Gap Property phase transition, which originates in statistical physics, is known to provide evidence of algorithmic hardness. Finally, we show that if $n > C\, n_{alg}$ for some large enough constant $C > 0$, a very simple algorithm based on a local search improvement is able to correctly infer the support of the unknown vector $\beta^*$, adding it to the list of provably successful algorithms for the high-dimensional linear regression problem.

• [math.ST]**Strong consistency and optimality for generalized estimating equations with stochastic covariates**

*Laura Dumitrescu, Ioana Schiopu-Kratina*

http://arxiv.org/abs/1711.04990v1

In this article we study the existence and strong consistency of GEE estimators, when the generalized estimating functions are martingales with random coefficients. Furthermore, we characterize estimating functions which are asymptotically optimal.

• [math.ST]**The mixability of elliptical distributions with supermodular functions**

*Xiaoqian Zhang, Chuancun Yin*

http://arxiv.org/abs/1711.05085v1

The concepts of $\phi$-complete mixability and $\phi$-joint mixability were first introduced in Bignozzi and Puccetti (2015) as direct extensions of complete and joint mixability. Following Bignozzi and Puccetti (2015), we consider two cases of $\phi$ and investigate the $\phi$-joint mixability for elliptical distributions and logarithmic elliptical distributions. The results generalize the corresponding ones of joint mixability for elliptical distributions in the literature.

• [math.ST]**Thresholding Bandit for Dose-ranging: The Impact of Monotonicity**

*Aurélien Garivier, Pierre Ménard, Laurent Rossi*

http://arxiv.org/abs/1711.04454v1

We analyze the sample complexity of the thresholding bandit problem, with and without the assumption that the mean values of the arms are increasing. In each case, we provide a lower bound valid for any risk $\delta$ and any $\delta$-correct algorithm; in addition, we propose an algorithm whose sample complexity is of the same order of magnitude for small risks. This work is motivated by phase 1 clinical trials, a practically important setting where the arm means are increasing by nature, and where no satisfactory solution is available so far.

• [q-bio.CB]**Using Game Theory for Real-Time Behavioral Dynamics in Microscopic Populations with Noisy Signaling**

*Adam Noel, Yuting Fang, Nan Yang, Dimitrios Makrakis, Andrew W. Eckford*

http://arxiv.org/abs/1711.04870v1

This article introduces the application of game theory to understand noisy real-time signaling and the resulting behavioral dynamics in microscopic populations such as bacteria and other cells. It presents a bridge between the fields of molecular communication and microscopic game theory. Molecular communication uses conventional communication engineering theory and techniques to study and design systems that use chemical molecules as information carriers. Microscopic game theory models interactions within and between populations of cells and microorganisms. Integrating these two fields provides unique opportunities to understand and control microscopic populations that have imperfect signal propagation. Two case studies, namely bacteria resource sharing and tumor cell signaling, are presented as examples to demonstrate the potential of this approach.

• [q-bio.QM]**Parkinson's Disease Digital Biomarker Discovery with Optimized Transitions and Inferred Markov Emissions**

*Avinash Bukkittu, Baihan Lin, Trung Vu, Itsik Pe'er*

http://arxiv.org/abs/1711.04078v1

We search for digital biomarkers for Parkinson's Disease by observing approximate repetitive patterns matching hypothesized step and stride periodic cycles. These observations were modeled as a cycle of hidden states with randomness allowing deviation from a canonical pattern of transitions and emissions, under the hypothesis that the averaged features of hidden states would serve to informatively characterize classes of patients/controls. We propose a Hidden Semi-Markov Model (HSMM), a latent-state model, emitting 3D-acceleration vectors. Transitions and emissions are inferred from data. We fit separate models per unique device and training label. Hidden Markov Models (HMM) force geometric distributions of the duration spent at each state before transition to a new state. Instead, our HSMM allows us to specify the distribution of state duration. This modified version is more effective because we are more interested in each state's duration than in the sequence of distinct states, allowing inclusion of these durations in the feature vector.
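
The geometric-duration property of an HMM mentioned in this abstract follows directly from the Markov assumption. As a short worked derivation, write $a_{ii}$ for the self-transition probability of state $i$ (notation introduced here for illustration, not taken from the abstract). Staying in state $i$ for exactly $d$ steps requires $d-1$ self-transitions followed by one exit:

$$P(D_i = d) = a_{ii}^{\,d-1}\,(1 - a_{ii}), \quad d = 1, 2, \dots, \qquad E[D_i] = \frac{1}{1 - a_{ii}}.$$

An HSMM replaces this implied geometric law with an explicitly specified duration distribution for each state.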

• [quant-ph]**Quantum transport senses community structure in networks**

*Chenchao Zhao, Jun S. Song*

http://arxiv.org/abs/1711.04979v1

Quantum time evolution exhibits rich physics, attributable to the interplay between the density and phase of a wave function. However, unlike classical heat diffusion, the wave nature of quantum mechanics has not yet been extensively explored in modern data analysis. We propose that the Laplace transform of quantum transport (QT) can be used to construct a powerful ensemble of maps from a given complex network to a circle $S^1$, such that closely-related nodes on the network are grouped into sharply concentrated clusters on $S^1$. The resulting QT clustering (QTC) algorithm is shown to outperform the state-of-the-art spectral clustering method on synthetic and real data sets containing complex geometric patterns. The observed phenomenon of QTC can be interpreted as a collective behavior of the microscopic nodes that evolve as macroscopic cluster orbitals in an effective tight-binding model recapitulating the network.

• [stat.AP]**A Bayesian Model for Forecasting Hierarchically Structured Time Series**

*Julie Novak, Scott McGarvie, Beatriz Etchegaray Garcia*

http://arxiv.org/abs/1711.04738v1

An important task for any large-scale organization is to prepare forecasts of key performance metrics. Often these organizations are structured in a hierarchical manner and for operational reasons, projections of these metrics may have been obtained independently from one another at each level of the hierarchy by specialists focusing on certain areas within the business. There is no guarantee that when combined, these aggregates will be consistent with projections produced directly at other levels of the hierarchy. We propose a Bayesian hierarchical method that treats the initial forecasts as observed data which are then combined with prior information and historical predictive accuracy to infer a probability distribution of revised forecasts. When used to create point estimates, this method can reflect preferences for increased accuracy at specific levels in the hierarchy. We present simulated and real data studies to demonstrate when our approach results in improved inferences over alternative methods.

• [stat.AP]**How to estimate time-varying Vector Autoregressive Models? A comparison of two methods**

*Jonas M B Haslbeck, Laura F Bringmann, Lourens J Waldorp*

http://arxiv.org/abs/1711.05204v1

The ubiquity of mobile devices led to a surge in intensive longitudinal (or time series) data of individuals. This is an exciting development because personalized models both naturally tackle the issue of heterogeneities between people and increase the validity of models for applications. A popular model for time series is the Vector Autoregressive (VAR) model, in which each variable is modeled as a linear function of all variables at previous time points. A key assumption of this model is that the parameters of the true data generating model are constant (or stationary) across time. The most straightforward way to check for time-varying parameters is to fit a model that allows for time-varying parameters. In the present paper we compare two methods to estimate time-varying VAR models: the first method uses a spline-approach to allow for time-varying parameters, the second uses kernel-smoothing. We report the performance of both methods and their stationary counterparts in an extensive simulation study that reflects the situations typically encountered in practice. We compare the performance of stationary and time-varying models and discuss the theoretical characteristics of all methods in the light of the simulation results. In addition, we provide a step-by-step tutorial for both methods showing how to estimate a time-varying VAR model on an openly available individual time series dataset.
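
To make the kernel-smoothing idea concrete, here is a minimal sketch of a kernel-weighted least-squares estimate of a VAR(1) model at a single time point. It is an illustration under stated assumptions, not the authors' implementation: the function name `kernel_tv_var1`, the Gaussian kernel, the lag order of 1, and the time normalization to $[0, 1]$ are all choices made here.

```python
import numpy as np

def kernel_tv_var1(X, t_eval, bandwidth):
    """Local (kernel-weighted) least-squares VAR(1) fit at normalized time t_eval.

    X         : array of shape (T, p), one row per time point.
    t_eval    : evaluation time in [0, 1] (0 = start, 1 = end of the series).
    bandwidth : Gaussian kernel bandwidth on the normalized time scale.

    Returns (intercept, B) such that, near t_eval, X[t] ~ intercept + B @ X[t-1].
    """
    X = np.asarray(X, dtype=float)
    T, p = X.shape
    # Normalized time stamp of each response row X[1], ..., X[T-1].
    times = np.arange(1, T) / (T - 1)
    # Gaussian kernel weights: observations close to t_eval count more.
    w = np.exp(-0.5 * ((times - t_eval) / bandwidth) ** 2)
    # Design matrix: intercept column plus all variables at the previous time point.
    Z = np.hstack([np.ones((T - 1, 1)), X[:-1]])
    Y = X[1:]
    # Weighted least squares via row scaling by sqrt(weight).
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(sw * Z, sw * Y, rcond=None)  # shape (p + 1, p)
    return coef[0], coef[1:].T
```

Calling the function on a grid of `t_eval` values traces out coefficient paths over time. A very large bandwidth makes the weights nearly uniform, so the estimate approaches the ordinary stationary VAR(1) fit.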

• [stat.AP]**Machine Learning Meets Microeconomics: The Case of Decision Trees and Discrete Choice**

*Timothy Brathwaite, Akshay Vij, Joan L. Walker*

http://arxiv.org/abs/1711.04826v1

We provide a microeconomic framework for decision trees: a popular machine learning method. Specifically, we show how decision trees represent a non-compensatory decision protocol known as disjunctions-of-conjunctions and how this protocol generalizes many of the non-compensatory rules used in the discrete choice literature so far. Additionally, we show how existing decision tree variants address many economic concerns that choice modelers might have. Beyond theoretical interpretations, we contribute to the existing literature of two-stage, semi-compensatory modeling and to the existing decision tree literature. In particular, we formulate the first Bayesian model tree, thereby allowing for uncertainty in the estimated non-compensatory rules as well as for context-dependent preference heterogeneity in one's second-stage choice model. Using an application of bicycle mode choice in the San Francisco Bay Area, we estimate our Bayesian model tree, and we find that it is over 1,000 times more likely to be closer to the true data-generating process than a multinomial logit model (MNL). Qualitatively, our Bayesian model tree automatically finds the effect of bicycle infrastructure investment to be moderated by travel distance, socio-demographics and topography, and our model identifies diminishing returns from bike lane investments. These qualitative differences lead to Bayesian model tree forecasts that directly align with the observed bicycle mode shares in regions with abundant bicycle infrastructure such as Davis, CA and the Netherlands. In comparison, MNL's forecasts are overly optimistic.

• [stat.AP]**On constraining projections of future climate using observations and simulations from multiple climate models**

*Philip G. Sansom, David B. Stephenson, Thomas J. Bracegirdle*

http://arxiv.org/abs/1711.04139v1

Appropriate statistical frameworks are required to make credible inferences about the future state of the climate from multiple climate models. The spread of projections simulated by different models is often a substantial source of uncertainty. This uncertainty can be reduced by identifying "emergent relationships" between future projections and historical simulations. Estimation of emergent relationships is hampered by unbalanced experimental designs and varying sensitivity of models to input parameters and boundary conditions. The relationship between the climate models and the Earth system is uncertain and requires careful modeling. Observation uncertainty also plays an important role when emergent relationships are exploited to constrain projections of future climate in the Earth system. A new Bayesian framework is presented that can constrain projections of future climate using historical observations by exploiting robust estimates of emergent relationships while accounting for observation uncertainty. A detailed theoretical comparison with previous multi-model frameworks is provided. The proposed framework is applied to projecting surface temperature in the Arctic at the end of the 21st century. Projected temperatures in some regions are more than 2°C lower when constrained by historical observations. The uncertainty about the climate response is reduced by up to 30% where strong constraints exist.

• [stat.AP]**State space models for non-stationary intermittently coupled systems**

*Philip G. Sansom, Daniel B. Williamson, David B. Stephenson*

http://arxiv.org/abs/1711.04135v1

Many time series exhibit non-stationary behaviour that can be explained by intermittent coupling between the observed system and one or more unobserved drivers. We develop Bayesian state space methods for modelling changes to the mean level or temporal correlation structure due to intermittent coupling. Improved system diagnostics and prediction are achieved by incorporating expert knowledge for both the observed and driver processes. Time-varying autoregressive residual processes are developed to model changes in the temporal correlation structure. Efficient filtering and smoothing methods are proposed for the resulting class of models. We propose methods for evaluating the evidence of unobserved drivers, for quantifying their overall effect, and for quantifying their effects during individual events. Methods for evaluating the potential for skilful predictions within coupled periods, and in future coupled periods, are also proposed. The proposed methodology is applied to the study of winter variability in the dominant pattern of climate variation in the northern hemisphere, the North Atlantic Oscillation. Over 70% of the inter-annual variance in the winter mean level is attributable to an unobserved driver. Skilful predictions for the remainder of the winter season are possible as soon as 30 days after the beginning of the coupled period.

• [stat.CO]**Feature Selection based on the Local Lift Dependence Scale**

*Diego Marcondes, Adilson Simonis, Junior Barrera*

http://arxiv.org/abs/1711.04181v1

This paper uses a classical approach to feature selection: minimization of a cost function of an estimated joint distribution. However, the space in which such minimization is performed will be extended from the Boolean lattice generated by the power set of the features, to the Boolean lattice generated by the power set of the features' support, i.e., the values they can assume. In this approach we may not only select the features that are most related to a variable $Y$, but also select the values of the features that most influence the variable or that are most prone to have a specific value of $Y$. The *Local Lift Dependence Scale*, a scale for measuring variable dependence in multiple *resolutions*, is used to develop the cost functions, which are based on classical dependence measures such as the Mutual Information, Cross Entropy and Kullback-Leibler Divergence. The proposed approach is applied to a dataset consisting of student performances on a university entrance exam and on undergraduate courses. This approach is used to select the subjects of the entrance exam, and the performances on them, that are most related to the performance on the undergraduate courses.

• [stat.ME]**(Un)Conditional Sample Generation Based on Distribution Element Trees**

*Daniel W. Meyer*

http://arxiv.org/abs/1711.04632v1

Recently, distribution element trees (DETs) were introduced as an accurate and computationally efficient method for density estimation. In this work, we demonstrate that the DET formulation promotes an easy and inexpensive way to generate random samples similar to a smooth bootstrap. These samples can be generated unconditionally, but also, without further complications, conditionally utilizing available information about certain probability-space components.

• [stat.ME]**A Test for Isotropy on a Sphere using Spherical Harmonic Functions**

*Indranil Sahoo, Joseph Guinness, Brian J. Reich*

http://arxiv.org/abs/1711.04092v1

Analysis of geostatistical data is often based on the assumption that the spatial random field is isotropic. This assumption, if erroneous, can adversely affect model predictions and statistical inference. Nowadays many applications consider data over the entire globe and hence it is necessary to check the assumption of isotropy on a sphere. In this paper, a test for spatial isotropy on a sphere is proposed. The data are first projected onto the set of spherical harmonic functions. Under isotropy, the spherical harmonic coefficients are uncorrelated whereas they are correlated if the underlying fields are not isotropic. This motivates a test based on the sample correlation matrix of the spherical harmonic coefficients. In particular, we use the largest eigenvalue of the sample correlation matrix as the test statistic. Extensive simulations are conducted to assess the Type I errors of the test under different scenarios. We show how temporal correlation affects the test and provide a method for handling temporal correlation. We also gauge the power of the test as we move away from isotropy. The method is applied to the near-surface air temperature data which is part of the HadCM3 model output. Although we do not expect global temperature fields to be isotropic, we propose several anisotropic models with increasing complexity, each of which has an isotropic process as model component and we apply the test to the isotropic component in a sequence of such models as a method of determining how well the models capture the anisotropy in the fields.
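
The test statistic described above, the largest eigenvalue of the sample correlation matrix of the spherical harmonic coefficients, is simple to compute once the coefficients are available. The sketch below assumes the projection step has already been done and that `coefs` holds one row per replicate and one column per coefficient. The permutation-based reference distribution is a stand-in chosen here for illustration; it is not the calibration used in the paper, and it assumes replicates are exchangeable, so it ignores the temporal correlation the abstract discusses. The function names are hypothetical.

```python
import numpy as np

def largest_eig_stat(coefs):
    """Largest eigenvalue of the sample correlation matrix of the columns."""
    R = np.corrcoef(np.asarray(coefs, dtype=float), rowvar=False)
    # eigvalsh returns eigenvalues of a symmetric matrix in ascending order.
    return float(np.linalg.eigvalsh(R)[-1])

def permutation_pvalue(coefs, n_perm=200, seed=0):
    """Compare the observed statistic with column-wise permuted data.

    Shuffling each column independently breaks cross-coefficient correlation
    while keeping each column's marginal distribution (an assumption-laden
    substitute for the paper's own null calibration).
    """
    rng = np.random.default_rng(seed)
    coefs = np.asarray(coefs, dtype=float)
    observed = largest_eig_stat(coefs)
    exceed = 0
    for _ in range(n_perm):
        shuffled = np.column_stack([rng.permutation(col) for col in coefs.T])
        if largest_eig_stat(shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)
```

Under uncorrelated coefficients the statistic stays close to 1, while strong correlation pushes it toward the number of coefficients, which is what makes it a natural measure of departure from isotropy.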

• [stat.ME]**Bayesian linear regression models with flexible error distributions**

*Nívea B. da Silva, Marcos O. Prates, Flávio B. Gonçalves*

http://arxiv.org/abs/1711.04376v1

This work introduces a novel methodology based on finite mixtures of Student-t distributions to model the errors' distribution in linear regression models. The novelty lies in a particular hierarchical structure for the mixture distribution in which the first level models the number of modes, responsible for accommodating multimodality and skewness features, and the second level models tail behavior. Moreover, the latter is specified in a way that no degrees of freedom parameters are estimated and, therefore, the known statistical difficulties when dealing with those parameters are mitigated, and yet model flexibility is not compromised. Inference is performed via Markov chain Monte Carlo and simulation studies are conducted to evaluate the performance of the proposed methodology. Analyses of two real data sets are also presented.

• [stat.ME]**Causal Inference from Observational Studies with Clustered Interference**

*Brian G. Barkley, Michael G. Hudgens, John D. Clemens, Mohammad Ali, Michael E. Emch*

http://arxiv.org/abs/1711.04834v1

Inferring causal effects from an observational study is challenging because participants are not randomized to treatment. Observational studies in infectious disease research present the additional challenge that one participant's treatment may affect another participant's outcome, i.e., there may be interference. In this paper recent approaches to defining causal effects in the presence of interference are considered, and new causal estimands designed specifically for use with observational studies are proposed. Previously defined estimands target counterfactual scenarios in which individuals independently select treatment with equal probability. However, in settings where there is interference between individuals within clusters, it may be unlikely that treatment selection is independent between individuals in the same cluster. The proposed causal estimands instead describe counterfactual scenarios in which the treatment selection correlation structure is the same as in the observed data distribution, allowing for within-cluster dependence in the individual treatment selections. These estimands may be more relevant for policy-makers or public health officials who desire to quantify the effect of increasing the proportion of treated individuals in a population. Inverse probability-weighted estimators for these estimands are proposed. The large-sample properties of the estimators are derived, and a simulation study demonstrating the finite-sample performance of the estimators is presented. The proposed methods are illustrated by analyzing data from a study of cholera vaccination in over 100,000 individuals in Bangladesh.

• [stat.ME]**Change Detection in a Dynamic Stream of Attributed Networks**

*Mostafa Reisi Gahrooei, Kamran Paynabar*

http://arxiv.org/abs/1711.04441v1

While anomaly detection in static networks has been extensively studied, only recently have researchers focused on dynamic networks. This trend is mainly due to the capacity of dynamic networks to represent complex physical, biological, cyber, and social systems. This paper proposes a new methodology for modeling and monitoring of dynamic attributed networks for quick detection of temporal changes in network structures. In this methodology, the generalized linear model (GLM) is used to model static attributed networks. This model is then combined with a state transition equation to capture the dynamic behavior of the system. The extended Kalman filter (EKF) is used as an online, recursive inference procedure to predict and update network parameters over time. In order to detect changes in the underlying mechanism of edge formation, prediction residuals are monitored through an Exponentially Weighted Moving Average (EWMA) control chart. The proposed modeling and monitoring procedure is examined through simulations for attributed binary and weighted networks. The email communication data from the Enron corporation is used as a case study to show how the method can be applied in real-world problems.
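
The monitoring step can be illustrated in isolation. The sketch below applies a textbook EWMA control chart to a sequence of scalar prediction residuals; it does not reproduce the GLM or EKF stages of the paper. The function name `ewma_chart`, the smoothing constant `lam`, the limit width `L`, and the assumption of independent residuals with known in-control standard deviation `sigma` are all choices made for this example.

```python
import numpy as np

def ewma_chart(residuals, lam=0.2, L=3.0, sigma=1.0):
    """Run an EWMA control chart over scalar residuals.

    Returns the EWMA statistics and the 0-based index of the first
    out-of-control signal (None if the chart never signals).
    Assumes in-control residuals are independent with mean 0 and s.d. sigma.
    """
    z = 0.0                      # EWMA starts at the in-control mean
    stats = []
    first_alarm = None
    for t, r in enumerate(residuals, start=1):
        # Exponentially weighted update: recent residuals weigh more.
        z = lam * r + (1.0 - lam) * z
        stats.append(z)
        # Exact variance of the EWMA statistic after t observations.
        var = sigma**2 * lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        if first_alarm is None and abs(z) > L * np.sqrt(var):
            first_alarm = t - 1
    return np.array(stats), first_alarm
```

A smaller `lam` averages over a longer history and is more sensitive to small sustained shifts, while a larger `lam` reacts faster to abrupt changes.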

• [stat.ME]**Checking validity of monotone domain mean estimators**

*Cristian Oliva, Mary C. Meyer, Jean D. Opsomer*

http://arxiv.org/abs/1711.04749v1

Estimates of population characteristics such as domain means are often expected to follow monotonicity assumptions. Recently, a method to adaptively pool neighboring domains was proposed, which ensures that the resulting domain mean estimates follow monotone constraints. The method leads to asymptotically valid estimation and inference, and can lead to substantial improvements in efficiency, in comparison with unconstrained domain estimators. However, assuming incorrect shape constraints could lead to biased estimators. Here, we develop the Cone Information Criterion for Survey Data (CICs) as a diagnostic method to measure monotonicity departures on population domain means. We show that the criterion leads to a consistent methodology that makes an asymptotically correct decision choosing between unconstrained and constrained domain mean estimators.

• [stat.ME]**Deterministic parallel analysis**

*Edgar Dobriban, Art B. Owen*

http://arxiv.org/abs/1711.04155v1

Factor analysis is widely used in many application areas. The first step, choosing the number of factors, remains a serious challenge. One of the most popular methods is parallel analysis (PA), which compares the observed factor strengths to simulated ones under a noise-only model. This paper presents a deterministic version of PA (DPA), which is faster and more reproducible than PA. We show that DPA selects large factors and does not select small factors just like [Dobriban, 2017] shows for PA. Both PA and DPA are prone to a shadowing phenomenon in which a strong factor makes it hard to detect smaller but more interesting factors. We develop a deflated version of DPA (DDPA) that counters shadowing. By raising the decision threshold in DDPA, a new method (DDPA+) also improves estimation accuracy. We illustrate our methods on data from the Human Genome Diversity Project (HGDP). There PA and DPA select seemingly too many factors, while DDPA+ selects only a few. A Matlab implementation is available.

• [stat.ME]**Estimating prediction error for complex samples**

*Andrew Holbrook, Daniel Gillen*

http://arxiv.org/abs/1711.04877v1

Non-uniform random samples are commonly generated in multiple scientific fields ranging from economics to medicine. Complex sampling designs afford research with increased precision for estimating parameters of interest in less prevalent sub-populations. With a growing interest in using complex samples to generate prediction models for numerous outcomes it is necessary to account for the sampling design that gave rise to the data in order to assess the generalized predictive utility of a proposed prediction rule. Specifically, after learning a prediction rule based on a complex sample, it is of interest to estimate the rule's error rate when applied to unobserved members of the population. Efron proposed a general class of covariance-inflated prediction error estimators that assumed the available training data is representative of the target population for which the prediction rule is to be applied. We extend Efron's estimator to the complex sample context by incorporating Horvitz-Thompson sampling weights and show that it is consistent for the true generalization error rate when applied to the underlying superpopulation giving rise to the training sample. The resulting Horvitz-Thompson-Efron (HTE) estimator is equivalent to dAIC (a recent extension of AIC to survey sampling data) and is more widely applicable. The proposed methodology is assessed via empirical simulations and is applied to data predicting renal function that was obtained from the National Health and Nutrition Examination Survey (NHANES).

• [stat.ME]**Generalised empirical likelihood-based kernel density estimation**

*Vitaliy Oryshchenko, Richard J. Smith*

http://arxiv.org/abs/1711.04793v1

If additional information about the distribution of a random variable is available in the form of moment conditions, the weighted kernel density estimator constructed by replacing the uniform weights with the generalised empirical likelihood probabilities provides an improved approximation to the moment constraints. More importantly, a reduction in variance is achieved due to the systematic use of the extra information. The same approach can be used to estimate a density or distribution of certain functions of data and, possibly, of the unknown parameters, the latter being replaced by their generalised empirical likelihood estimates. A special case of interest is estimation of densities or distributions of (generalised) residuals in semi-parametric models defined by a finite number of moment restrictions. Such estimates are of great practical interest and can be used for diagnostic purposes, including testing parametric assumptions about the error distribution, goodness-of-fit, or overidentifying moment restrictions. We give conditions under which such estimators are consistent, and describe their asymptotic mean squared error properties. Analytic examples illustrate the situations where re-weighting provides a reduction in variance, and a simulation study is conducted to evaluate small sample performance of the proposed estimators.

• [stat.ME]**Graph-Based Two-Sample Tests for Discrete Data**

*Jingru Zhang, Hao Chen*

http://arxiv.org/abs/1711.04349v1

In the regime of two-sample comparison, tests based on a graph constructed on observations by utilizing similarity information among them are gaining attention due to their flexibility and good performance under various settings for high-dimensional data and non-Euclidean data. However, when there are repeated observations or ties in terms of the similarity graph, these graph-based tests could be problematic as they are sensitive to the choice of the similarity graph. We study two ways to fix the "tie" problem for the existing graph-based test statistics and a new max-type statistic. Analytic p-value approximations for these extended graph-based tests are also derived and shown to work well for finite samples, allowing the tests to be applied quickly to large datasets. The new tests are illustrated in the analysis of a phone-call network dataset. All proposed tests are implemented in the R package gTests.

• [stat.ME]**K-groups: A Generalization of K-means Clustering**

*Songzi Li, Maria L. Rizzo*

http://arxiv.org/abs/1711.04359v1

We propose a new class of distribution-based clustering algorithms, called k-groups, based on energy distance between samples. The energy distance clustering criterion assigns observations to clusters according to a multi-sample energy statistic that measures the distance between distributions. The energy distance determines a consistent test for equality of distributions, and it is based on a population distance that characterizes equality of distributions. The k-groups procedure therefore generalizes the k-means method, which separates clusters that have different means. We propose two k-groups algorithms: k-groups by first variation; and k-groups by second variation. The implementation of k-groups is partly based on Hartigan and Wong's algorithm for k-means. The algorithm is generalized from moving one point on each iteration (first variation) to moving $m$ $(m > 1)$ points. For univariate data, we prove that Hartigan and Wong's k-means algorithm is a special case of k-groups by first variation. The simulation results from univariate and multivariate cases show that our k-groups algorithms perform as well as Hartigan and Wong's k-means algorithm when clusters are well-separated and normally distributed. Moreover, both k-groups algorithms perform better than k-means when the data do not have a finite first moment or have strong skewness and heavy tails. For non-spherical clusters, both k-groups algorithms performed better than k-means in high dimension, and k-groups by first variation is consistent as dimension increases. In a case study on dermatology data with 34 features, both k-groups algorithms performed better than k-means.
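
The building block of this approach is the energy distance between two samples, $2E|X-Y| - E|X-X'| - E|Y-Y'|$. The sketch below computes its plug-in (V-statistic) estimate with Euclidean norms. It illustrates only the distance, not the k-groups first- or second-variation updates, and the function names are hypothetical.

```python
import numpy as np

def _mean_pairwise_distance(a, b):
    """Average Euclidean distance over all pairs (row of a, row of b)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1)).mean()

def energy_distance(x, y):
    """Plug-in estimate of the energy distance between samples x and y.

    x, y: arrays of shape (n, d) and (m, d); 1-D inputs are treated as d = 1.
    The value is 0 when the two samples are identical and grows as the
    empirical distributions move apart.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    if y.ndim == 1:
        y = y[:, None]
    return (2.0 * _mean_pairwise_distance(x, y)
            - _mean_pairwise_distance(x, x)
            - _mean_pairwise_distance(y, y))
```

Because the statistic compares whole distributions rather than means alone, two clusters with equal means but different spreads still have positive energy distance, which is the sense in which k-groups generalizes k-means.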

• [stat.ME]**Optimal estimation in functional linear regression for sparse noise-contaminated data**

*Behdad Mostafaiy, MohammadReza FaridRohani, Shojaeddin Chenouri*

http://arxiv.org/abs/1711.04854v1

In this paper, we propose a novel approach to fit a functional linear regression in which both the response and the predictor are functions of a common variable such as time. We consider the case that the response and the predictor processes are both sparsely sampled on random time points and are contaminated with random errors. In addition, the random times are allowed to be different for the measurements of the predictor and the response functions. The aforementioned situation often occurs in longitudinal data settings. To estimate the covariance and the cross-covariance functions we use a regularization method over a reproducing kernel Hilbert space. The estimate of the cross-covariance function is used to obtain an estimate of the regression coefficient function and also functional singular components. We derive the convergence rates of the proposed cross-covariance, the regression coefficient and the singular component function estimators. Furthermore, we show that, under some regularity conditions, the estimator of the coefficient function has a minimax optimal rate. We conduct a simulation study and demonstrate merits of the proposed method by comparing it to some other existing methods in the literature. We illustrate the method by an example of an application to a well known multicenter AIDS Cohort Study.

• [stat.ME]**Quickest Detection of Markov Networks**

*Javad Heydari, Ali Tajer, H. Vincent Poor*

http://arxiv.org/abs/1711.04268v1

Detecting correlation structures in large networks arises in many domains. Such detection problems are often studied independently of the underlying data acquisition process, rendering settings in which data acquisition policies (e.g., the sample-size) are pre-specified. Motivated by the advantages of data-adaptive sampling, this paper treats the inherently coupled problems of data acquisition and decision-making for correlation detection. Specifically, this paper considers a Markov network of agents generating random variables, and designs a quickest sequential strategy for discerning the correlation model governing the Markov network. By abstracting the Markov network as an undirected graph, designing the quickest sampling strategy becomes equivalent to sequentially and data-adaptively identifying and sampling a sequence of vertices in the graph. The existing data-adaptive approaches fail to solve this problem since the node selection actions are dependent. Hence, a novel sampling strategy is proposed to incorporate the correlation structures into the decision rules for minimizing the average delay in forming a confident decision. The proposed procedure involves searching over all possible future decisions which becomes computationally prohibitive for large networks. However, by leveraging the Markov properties it is shown that the search horizon can be reduced to the neighbors of each node in the dependency graph without compromising the performance. Performance analyses and numerical evaluations demonstrate the gains of the proposed sequential approach over the existing ones.

• [stat.ME]**Sharpening randomization-based causal inference for $2^2$ factorial designs with binary outcomes**

*Jiannan Lu*

http://arxiv.org/abs/1711.04432v1

In medical research, a scenario often entertained is a randomized controlled $2^2$ factorial design with a binary outcome. By utilizing the concept of potential outcomes, Dasgupta et al. (2015) proposed a randomization-based causal inference framework, allowing flexible and simultaneous estimations and inferences of the factorial effects. However, a fundamental challenge that the methodology of Dasgupta et al. (2015) faces is that the sampling variance of the randomization-based factorial effect estimator is unidentifiable, so the corresponding classic "Neymanian" variance estimator suffers from over-estimation. To address this issue, for randomized controlled $2^2$ factorial designs with binary outcomes, we derive the sharp lower bound of the sampling variance of the factorial effect estimator, which leads to a new variance estimator that sharpens the finite-population Neymanian causal inference. We demonstrate the advantages of the new variance estimator through a series of simulation studies, and apply our newly proposed methodology to two real-life datasets from randomized clinical trials, where we gain new insights.

• [stat.ME]**Simultaneous Registration and Clustering for Multi-dimensional Functional Data**

*Pengcheng Zeng, Jian Qing Shi, Won-Seok Kim*

http://arxiv.org/abs/1711.04761v1

Clustering of misaligned functional data has drawn much attention in the last decade. Most methods do the clustering after the functional data have been registered, and there has been little research using both functional and scalar variables. In this paper, we propose a simultaneous registration and clustering (SRC) model via two-level models, allowing the use of both types of variables and also allowing simultaneous registration and clustering. For the data collected from subjects in different unknown groups, a Gaussian process functional regression model with time warping is used as the first level model; an allocation model depending on scalar variables is used as the second level model providing further information over the groups. The former carries out registration and modeling for the multi-dimensional functional data (2D or 3D curves) at the same time. This methodology is implemented using an EM algorithm, and is examined on both simulated data and real data.

• [stat.ME]**The SPDE approach for Gaussian random fields with general smoothness**

*David Bolin, Kristin Kirchner*

http://arxiv.org/abs/1711.04333v1

A popular approach for modeling and inference in spatial statistics is to represent Gaussian random fields as solutions to stochastic partial differential equations (SPDEs) $L^{\beta}u = \mathcal{W}$, where $\mathcal{W}$ is Gaussian white noise, $L$ is a second-order differential operator, and $\beta>0$ is a parameter that determines the smoothness of $u$. However, this approach has been limited to the case $2\beta\in\mathbb{N}$, which excludes several important covariance models such as the exponential covariance on $\mathbb{R}^2$. We demonstrate how this restriction can be avoided by combining a finite element discretization in space with a rational approximation of the function $x^{-\beta}$ to approximate the solution $u$. For the resulting approximation, an explicit rate of strong convergence is derived and we show that the method has the same computational benefits as in the restricted case $2\beta\in\mathbb{N}$ when used for statistical inference and prediction. Several numerical experiments are performed to illustrate the accuracy of the method, and to show how it can be used for likelihood-based inference for all model parameters including $\beta$.

• [stat.ML]**A Batch Learning Framework for Scalable Personalized Ranking**

*Kuan Liu, Prem Natarajan*

http://arxiv.org/abs/1711.04019v1

In designing personalized ranking algorithms, it is desirable to encourage a high precision at the top of the ranked list. Existing methods either seek a smooth convex surrogate for a non-smooth ranking metric or directly modify updating procedures to encourage top accuracy. In this work we point out that these methods do not scale well to a large-scale setting, and this is partly due to the inaccurate pointwise or pairwise rank estimation. We propose a new framework for personalized ranking. It uses batch-based rank estimators and smooth rank-sensitive loss functions. This new batch learning framework leads to more stable and accurate rank approximations compared to previous work. Moreover, it enables explicit use of parallel computation to speed up training. We conduct empirical evaluation on three item recommendation tasks. Our method shows consistent accuracy improvements over state-of-the-art methods. Additionally, we observe time efficiency advantages when data scale increases.

• [stat.ML]**A Sequence-Based Mesh Classifier for the Prediction of Protein-Protein Interactions**

*Edgar D. Coelho, Igor N. Cruz, André Santiago, José Luis Oliveira, António Dourado, Joel P. Arrais*

http://arxiv.org/abs/1711.04294v1

The worldwide surge of multiresistant microbial strains has propelled the search for alternative treatment options. The study of Protein-Protein Interactions (PPIs) has been a cornerstone in the clarification of complex physiological and pathogenic processes, thus being a priority for the identification of vital components and mechanisms in pathogens. Despite the advances of laboratorial techniques, computational models allow the screening of protein interactions between entire proteomes in a fast and inexpensive manner. Here, we present a supervised machine learning model for the prediction of PPIs based on the protein sequence. We cluster amino acids regarding their physicochemical properties, and use the discrete cosine transform to represent protein sequences. A mesh of classifiers was constructed to create hyper-specialised classifiers dedicated to the most relevant pairs of molecular function annotations from Gene Ontology. Based on an exhaustive evaluation that includes datasets with different configurations, cross-validation and out-of-sampling validation, the obtained results outscore the state-of-the-art for sequence-based methods. For the final mesh model using SVM with RBF, a consistent average AUC of 0.84 was attained.

• [stat.ML]**ACtuAL: Actor-Critic Under Adversarial Learning**

*Anirudh Goyal, Nan Rosemary Ke, Alex Lamb, R Devon Hjelm, Chris Pal, Joelle Pineau, Yoshua Bengio*

http://arxiv.org/abs/1711.04755v1

Generative Adversarial Networks (GANs) are a powerful framework for deep generative modeling. Posed as a two-player minimax problem, GANs are typically trained end-to-end on real-valued data and can be used to train a generator of high-dimensional and realistic images. However, a major limitation of GANs is that training relies on passing gradients from the discriminator through the generator via back-propagation. This makes it fundamentally difficult to train GANs with discrete data, as generation in this case typically involves a non-differentiable function. These difficulties extend to the reinforcement learning setting when the action space is composed of discrete decisions. We address these issues by reframing the GAN framework so that the generator is no longer trained using gradients through the discriminator, but is instead trained using a learned critic in the actor-critic framework with a Temporal Difference (TD) objective. This is a natural fit for sequence modeling and we use it to achieve improvements on language modeling tasks over the standard Teacher-Forcing methods.

• [stat.ML]**Alpha-Divergences in Variational Dropout**

*Bogdan Mazoure, Riashat Islam*

http://arxiv.org/abs/1711.04345v1

We investigate the use of alternative divergences to Kullback-Leibler (KL) in variational inference (VI), based on the Variational Dropout \cite{kingma2015}. Stochastic gradient variational Bayes (SGVB) \cite{aevb} is a general framework for estimating the evidence lower bound (ELBO) in Variational Bayes. In this work, we extend the SGVB estimator using Alpha-Divergences, which are alternative divergences to VI's KL objective. The Gaussian dropout can be seen as a local reparametrization trick of the SGVB objective. We extend the Variational Dropout to use alpha divergences for variational inference. Our results compare $\alpha$-divergence variational dropout with standard variational dropout with correlated and uncorrelated weight noise. We show that the $\alpha$-divergence with $\alpha \rightarrow 1$ (or KL divergence) is still a good measure for use in variational inference, in spite of the efficient use of Alpha-divergences for Dropout VI \cite{Li17}. $\alpha \rightarrow 1$ can yield the lowest training error, and optimizes a good lower bound for the evidence lower bound (ELBO) among all values of the parameter $\alpha \in [0,\infty)$.

• [stat.ML]**Analyzing and Improving Stein Variational Gradient Descent for High-dimensional Marginal Inference**

*Jingwei Zhuo, Chang Liu, Ning Chen, Bo Zhang*

http://arxiv.org/abs/1711.04425v1

Stein variational gradient descent (SVGD) is a nonparametric inference method, which iteratively transports a set of randomly initialized particles to approximate a differentiable target distribution, along the direction that maximally decreases the KL divergence within a vector-valued reproducing kernel Hilbert space (RKHS). Compared to Monte Carlo methods, SVGD is particle-efficient because of the repulsive force induced by kernels. In this paper, we develop the first analysis about the high dimensional performance of SVGD and demonstrate that the repulsive force drops at least polynomially with increasing dimensions, which results in poor marginal approximation. To improve the marginal inference of SVGD, we propose Marginal SVGD (M-SVGD), which incorporates structural information described by a Markov random field (MRF) into kernels. M-SVGD inherits the particle efficiency of SVGD and can be used as a general purpose marginal inference tool for MRFs. Experimental results on grid based Markov random fields show the effectiveness of our methods.
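
To make the "repulsive force" in this abstract concrete, here is a minimal sketch of one standard SVGD update for one-dimensional particles. It is not the paper's M-SVGD. The RBF kernel $k(x_j, x_i) = \exp(-(x_j - x_i)^2 / h)$, the fixed bandwidth, the step size, and the name `svgd_step` are all assumptions chosen for illustration.

```python
import math


def svgd_step(particles, score, bandwidth=1.0, step_size=0.1):
    """One SVGD update for 1-D particles.

    `score(x)` is the derivative of the log target density at x.
    Each particle moves along the average of two terms:
      attract: kernel-weighted score, pulling particles toward high density;
      repulse: derivative of the kernel w.r.t. x_j, pushing particles apart.
    """
    n = len(particles)
    updated = []
    for xi in particles:
        phi = 0.0
        for xj in particles:
            k = math.exp(-((xj - xi) ** 2) / bandwidth)
            attract = k * score(xj)
            repulse = 2.0 * (xi - xj) / bandwidth * k
            phi += attract + repulse
        updated.append(xi + step_size * phi / n)
    return updated


def standard_normal_score(x):
    """Score of a standard normal target: d/dx log p(x) = -x."""
    return -x


particles = [3.0, 3.5, 4.0]
for _ in range(5):
    particles = svgd_step(particles, standard_normal_score)
print(particles)
```

With a single particle the repulsive term vanishes and the update reduces to plain gradient ascent on the log density. The kernel term is what keeps multiple particles from collapsing onto the mode.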

• [stat.ML]**Blind Source Separation Using Mixtures of Alpha-Stable Distributions**

*Nicolas Keriven, Antoine Deleforge, Antoine Liutkus*

http://arxiv.org/abs/1711.04460v1

We propose a new blind source separation algorithm based on mixtures of alpha-stable distributions. Complex symmetric alpha-stable distributions have recently been shown to better model audio signals in the time-frequency domain than classical Gaussian distributions thanks to their larger dynamic range. However, inference of these models is notoriously hard to perform because their probability density functions do not have a closed-form expression in general. Here, we introduce a novel method for estimating mixtures of alpha-stable distributions based on random moment matching. We apply this to the blind estimation of binary masks in individual frequency bands from multichannel convolutive audio mixes. We show that the proposed method yields better separation performance than Gaussian-based binary-masking methods.

• [stat.ML]**Data Augmentation Generative Adversarial Networks**

*Antreas Antoniou, Amos Storkey, Harrison Edwards*

http://arxiv.org/abs/1711.04340v1

Effective training of neural networks requires much data. In the low-data regime, parameters are underdetermined, and learnt networks generalise poorly. Data Augmentation \cite{krizhevsky2012imagenet} alleviates this by using existing data more effectively. However, standard data augmentation produces only limited plausible alternative data. Given there is potential to generate a much broader set of augmentations, we design and train a generative model to do data augmentation. The model, based on image conditional Generative Adversarial Networks, takes data from a source domain and learns to take any data item and generalise it to generate other within-class data items. As this generative process does not depend on the classes themselves, it can be applied to novel unseen classes of data. We show that a Data Augmentation Generative Adversarial Network (DAGAN) augments standard vanilla classifiers well. We also show a DAGAN can enhance few-shot learning systems such as Matching Networks. We demonstrate these approaches on Omniglot, on EMNIST having learnt the DAGAN on Omniglot, and VGG-Face data. In our experiments we can see over 13% increase in accuracy in the low-data regime experiments in Omniglot (from 69% to 82%), EMNIST (73.9% to 76%) and VGG-Face (4.5% to 12%); in Matching Networks for Omniglot we observe an increase of 0.5% (from 96.9% to 97.4%) and an increase of 1.8% in EMNIST (from 59.5% to 61.3%).

• [stat.ML]**Fast and reliable inference algorithm for hierarchical stochastic block models**

*Yongjin Park, Joel S. Bader*

http://arxiv.org/abs/1711.05150v1

Network clustering reveals the organization of a network or corresponding complex system with elements represented as vertices and interactions as edges in a (directed, weighted) graph. Although the notion of clustering can be somewhat loose, network clusters or groups are generally considered as nodes with enriched interactions and edges sharing common patterns. Statistical inference often treats groups as latent variables, with observed networks generated from latent group structure, termed a stochastic block model. Regardless of the definitions, statistical inference can be translated to modularity maximization, which is provably an NP-complete problem. Here we present scalable and reliable algorithms that recover hierarchical stochastic block models fast and accurately. Our algorithm scales almost linearly in the number of edges, and inferred models were more accurate than other scalable methods.

• [stat.ML]**Feature importance scores and lossless feature pruning using Banzhaf power indices**

*Bogdan Kulynych, Carmela Troncoso*

http://arxiv.org/abs/1711.04992v1

Understanding the influence of features in machine learning is crucial to interpreting models and selecting the best features for classification. In this work we propose the use of principles from coalitional game theory to reason about importance of features. In particular, we propose the use of the Banzhaf power index as a measure of influence of features on the outcome of a classifier. We show that features having Banzhaf power index of zero can be losslessly pruned without damage to classifier accuracy. Computing the power indices does not require having access to data samples. However, if samples are available, the indices can be empirically estimated. We compute Banzhaf power indices for a neural network classifier on real-life data, and compare the results with gradient-based feature saliency, and coefficients of a logistic regression model with $L_1$ regularization.
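
The idea of a zero-index feature being prunable can be illustrated with a small brute-force sketch. It uses one common formulation of the (raw) Banzhaf index for binary features: the fraction of settings of the other features in which flipping feature $i$ changes the classifier output. This may differ from the paper's exact definition, and `banzhaf_indices` and `toy_classifier` are hypothetical names.

```python
from itertools import product


def banzhaf_indices(classifier, n_features):
    """Brute-force Banzhaf-style influence of each binary feature.

    For feature i, count the assignments of the remaining features for which
    switching feature i from 0 to 1 changes the classifier's output, then
    divide by the number of such assignments (2 ** (n_features - 1)).
    """
    indices = []
    for i in range(n_features):
        swings = 0
        for rest in product([0, 1], repeat=n_features - 1):
            off = rest[:i] + (0,) + rest[i:]
            on = rest[:i] + (1,) + rest[i:]
            if classifier(on) != classifier(off):
                swings += 1
        indices.append(swings / 2 ** (n_features - 1))
    return indices


def toy_classifier(x):
    """Hypothetical classifier: logical AND of x[0] and x[1]; x[2] is ignored."""
    return int(x[0] + x[1] >= 2)


print(banzhaf_indices(toy_classifier, 3))  # [0.5, 0.5, 0.0]
```

The third feature never changes the output, so its index is zero and it could be dropped without affecting predictions. The enumeration is exponential in the number of features and only practical for toy cases.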

• [stat.ML]**Improving Factor-Based Quantitative Investing by Forecasting Company Fundamentals**

*John Alberg, Zachary C. Lipton*

http://arxiv.org/abs/1711.04837v1

On a periodic basis, publicly traded companies are required to report fundamentals: financial data such as revenue, operating income, debt, among others. These data points provide some insight into the financial health of a company. Academic research has identified some factors, i.e. computed features of the reported data, that are known through retrospective analysis to outperform the market average. Two popular factors are the book value normalized by market capitalization (book-to-market) and the operating income normalized by the enterprise value (EBIT/EV). In this paper, we first show through simulation that if we could (clairvoyantly) select stocks using factors calculated on future fundamentals (via oracle), then our portfolios would far outperform a standard factor approach. Motivated by this analysis, we train deep neural networks to forecast future fundamentals based on a trailing 5-year window. Quantitative analysis demonstrates a significant improvement in MSE over a naive strategy. Moreover, in retrospective analysis using an industry-grade stock portfolio simulator (backtester), we show an improvement in compounded annual return to 17.1% (MLP) vs 14.4% for a standard factor model.

• [stat.ML]**Invariances and Data Augmentation for Supervised Music Transcription**

*John Thickstun, Zaid Harchaoui, Dean Foster, Sham M. Kakade*

http://arxiv.org/abs/1711.04845v1

This paper explores a variety of models for frame-based music transcription, with an emphasis on the methods needed to reach state-of-the-art on human recordings. The translation-invariant network discussed in this paper, which combines a traditional filterbank with a convolutional neural network, was the top-performing model in the 2017 MIREX Multiple Fundamental Frequency Estimation evaluation. This class of models shares parameters in the log-frequency domain, which exploits the frequency invariance of music to reduce the number of model parameters and avoid overfitting to the training data. All models in this paper were trained with supervision by labeled data from the MusicNet dataset, augmented by random label-preserving pitch-shift transformations.

• [stat.ML]**Joint Gaussian Processes for Biophysical Parameter Retrieval**

*Daniel Heestermans Svendsen, Luca Martino, Manuel Campos-Taberner, Francisco Javier García-Haro, Gustau Camps-Valls*

http://arxiv.org/abs/1711.05197v1

Solving inverse problems is central to geosciences and remote sensing. Radiative transfer models (RTMs) represent mathematically the physical laws which govern the phenomena in remote sensing applications (forward models). The numerical inversion of the RTM equations is a challenging and computationally demanding problem, and for this reason, often the application of a nonlinear statistical regression is preferred. In general, regression models predict the biophysical parameter of interest from the corresponding received radiance. However, this approach does not employ the physical information encoded in the RTMs. An alternative strategy, which attempts to include the physical knowledge, consists in learning a regression model trained using data simulated by an RTM code. In this work, we introduce a nonlinear nonparametric regression model which combines the benefits of the two aforementioned approaches. The inversion is performed taking into account jointly both real observations and RTM-simulated data. The proposed Joint Gaussian Process (JGP) provides a solid framework for exploiting the regularities between the two types of data. The JGP automatically detects the relative quality of the simulated and real data, and combines them accordingly. This occurs by learning an additional hyper-parameter w.r.t. a standard GP model, and fitting parameters through maximizing the pseudo-likelihood of the real observations. The resulting scheme is both simple and robust, i.e., capable of adapting to different scenarios. The advantages of the JGP method compared to benchmark strategies are shown considering RTM-simulated and real observations in different experiments. Specifically, we consider leaf area index (LAI) retrieval from Landsat data combined with simulated data generated by the PROSAIL model.

• [stat.ML]**Learning and Visualizing Localized Geometric Features Using 3D-CNN: An Application to Manufacturability Analysis of Drilled Holes**

*Sambit Ghadai, Aditya Balu, Adarsh Krishnamurthy, Soumik Sarkar*

http://arxiv.org/abs/1711.04851v1

3D Convolutional Neural Networks (3D-CNN) have been used for object recognition based on the voxelized shape of an object. However, interpreting the decision making process of these 3D-CNNs is still an infeasible task. In this paper, we present a unique 3D-CNN based Gradient-weighted Class Activation Mapping method (3D-GradCAM) for visual explanations of the distinct local geometric features of interest within an object. To enable efficient learning of 3D geometries, we augment the voxel data with surface normals of the object boundary. We then train a 3D-CNN with this augmented data and identify the local features critical for decision-making using 3D GradCAM. An application of this feature identification framework is to recognize difficult-to-manufacture drilled hole features in a complex CAD geometry. The framework can be extended to identify difficult-to-manufacture features at multiple spatial scales leading to a real-time design for manufacturability decision support system.

• [stat.ML]**Model Criticism in Latent Space**

*Sohan Seth, Iain Murray, Christopher K. I. Williams*

http://arxiv.org/abs/1711.04674v1

Model criticism is usually carried out by assessing if replicated data generated under the fitted model looks similar to the observed data, see e.g. Gelman, Carlin, Stern, and Rubin (2004, p. 165). This paper presents a method for latent variable models by pulling back the data into the space of latent variables, and carrying out model criticism in that space. Making use of a model's structure enables a more direct assessment of the assumptions made in the prior and likelihood. We demonstrate the method with examples of model criticism in latent space applied to ANOVA, factor analysis, linear dynamical systems and Gaussian processes.

• [stat.ML]**Near-Optimal Discrete Optimization for Experimental Design: A Regret Minimization Approach**

*Zeyuan Allen-Zhu, Yuanzhi Li, Aarti Singh, Yining Wang*

http://arxiv.org/abs/1711.05174v1

The experimental design problem concerns the selection of k points from a potentially large design pool of p-dimensional vectors, so as to maximize the statistical efficiency regressed on the selected k design points. Statistical efficiency is measured by optimality criteria, including A(verage), D(eterminant), T(race), E(igen), V(ariance) and G-optimality. Except for the T-optimality, exact optimization is NP-hard. We propose a polynomial-time regret minimization framework to achieve a $(1+\varepsilon)$ approximation with only $O(p/\varepsilon^2)$ design points, for all the optimality criteria above. In contrast, to the best of our knowledge, before our work, no polynomial-time algorithm achieves $(1+\varepsilon)$ approximations for D/E/G-optimality, and the best poly-time algorithm achieving $(1+\varepsilon)$-approximation for A/V-optimality requires $k = \Omega(p^2/\varepsilon)$ design points.

• [stat.ML]**STARK: Structured Dictionary Learning Through Rank-one Tensor Recovery**

*Mohsen Ghassemi, Zahra Shakeri, Anand D. Sarwate, Waheed U. Bajwa*

http://arxiv.org/abs/1711.04887v1

In recent years, a class of dictionaries have been proposed for multidimensional (tensor) data representation that exploit the structure of tensor data by imposing a Kronecker structure on the dictionary underlying the data. In this work, a novel algorithm called "STARK" is provided to learn Kronecker structured dictionaries that can represent tensors of any order. By establishing that the Kronecker product of any number of matrices can be rearranged to form a rank-1 tensor, we show that Kronecker structure can be enforced on the dictionary by solving a rank-1 tensor recovery problem. Because rank-1 tensor recovery is a challenging nonconvex problem, we resort to solving a convex relaxation of this problem. Empirical experiments on synthetic and real data show promising results for our proposed algorithm.
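
The rearrangement claim can be checked by hand in the simplest case of two factors, where the "rank-1 tensor" is a rank-1 matrix: regrouping the entries of $A \otimes B$ block by block yields the outer product of the flattened $A$ and the flattened $B$. The sketch below covers only this two-matrix case, not the paper's general construction or its recovery algorithm, and the helper names `kron` and `rearrange` are hypothetical.

```python
def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    m, n, p, q = len(A), len(A[0]), len(B), len(B[0])
    return [[A[i][j] * B[k][l] for j in range(n) for l in range(q)]
            for i in range(m) for k in range(p)]


def rearrange(K, m, n, p, q):
    """Flatten each p-by-q block of K into one row, blocks in row-major order.

    Row (i, j) of the result holds block (i, j) of K, which equals A[i][j] * B.
    """
    return [[K[i * p + k][j * q + l] for k in range(p) for l in range(q)]
            for i in range(m) for j in range(n)]


A = [[1, 2], [3, 4]]
B = [[0, 5, 1], [2, -1, 3]]

K = kron(A, B)                      # 4 x 6 Kronecker-structured matrix
R = rearrange(K, 2, 2, 2, 3)        # 4 x 6 rearranged matrix

vec_A = [a for row in A for a in row]
vec_B = [b for row in B for b in row]
outer = [[a * b for b in vec_B] for a in vec_A]

print(R == outer)  # True: the rearrangement is a rank-1 outer product
```

Every row of `R` is a multiple of the flattened `B`, so `R` has rank one. With more than two factors the same regrouping gives a higher-order rank-1 tensor, as the abstract states.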

• [stat.ML]**Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train**

*Valeriu Codreanu, Damian Podareanu, Vikram Saletore*

http://arxiv.org/abs/1711.04291v1

For the past 5 years, the ILSVRC competition and the ImageNet dataset have attracted a lot of interest from the Computer Vision community, allowing for state-of-the-art accuracy to grow tremendously. This should be credited to the use of deep artificial neural network designs. As these became more complex, the storage, bandwidth, and compute requirements increased. This means that with a non-distributed approach, even when using the most high-density server available, the training process may take weeks, making it prohibitive. Furthermore, as datasets grow, the representation learning potential of deep networks grows as well by using more complex models. This synchronicity triggers a sharp increase in the computational requirements and motivates us to explore the scaling behaviour on petaflop scale supercomputers. In this paper we will describe the challenges and novel solutions needed in order to train ResNet-50 in this large scale environment. We demonstrate above 90% scaling efficiency and a training time of 28 minutes using up to 104K x86 cores. This is supported by software tools from Intel's ecosystem. Moreover, we show that with regular 90 - 120 epoch train runs we can achieve a top-1 accuracy as high as 77% for the unmodified ResNet-50 topology. We also introduce the novel Collapsed Ensemble (CE) technique that allows us to obtain a 77.5% top-1 accuracy, similar to that of a ResNet-152, while training an unmodified ResNet-50 topology for the same fixed training budget. All ResNet-50 models as well as the scripts needed to replicate them will be posted shortly.

• [stat.ML]**Semi-Supervised Learning via New Deep Network Inversion**

*Randall Balestriero, Vincent Roger, Herve G. Glotin, Richard G. Baraniuk*

http://arxiv.org/abs/1711.04313v1

We exploit a recently derived inversion scheme for arbitrary deep neural networks to develop a new semi-supervised learning framework that applies to a wide range of systems and problems. The approach outperforms current state-of-the-art methods on MNIST reaching $99.14\%$ of test set accuracy while using $5$ labeled examples per class. Experiments with one-dimensional signals highlight the generality of the method. Importantly, our approach is simple, efficient, and requires no change in the deep network architecture.

• [stat.ML]**Sensor Selection and Random Field Reconstruction for Robust and Cost-effective Heterogeneous Weather Sensor Networks for the Developing World**

*Pengfei Zhang, Ido Nevat, Gareth W. Peters, Wolfgang Fruhwirt, Yongchao Huang, Ivonne Anders, Michael Osborne*

http://arxiv.org/abs/1711.04308v1

We address the two fundamental problems of spatial field reconstruction and sensor selection in heterogeneous sensor networks: (i) how to efficiently perform spatial field reconstruction based on measurements obtained simultaneously from networks with both high and low quality sensors; and (ii) how to perform query based sensor set selection with predictive MSE performance guarantee. For the first problem, we developed a low complexity algorithm based on the spatial best linear unbiased estimator (S-BLUE). Next, building on the S-BLUE, we address the second problem, and develop an efficient algorithm for query based sensor set selection with performance guarantee. Our algorithm is based on the Cross Entropy method which solves the combinatorial optimization problem in an efficient manner.

• [stat.ML]**Should You Derive, Or Let the Data Drive? An Optimization Framework for Hybrid First-Principles Data-Driven Modeling**

*Remi R. Lam, Lior Horesh, Haim Avron, Karen E. Willcox*

http://arxiv.org/abs/1711.04374v1

Mathematical models are used extensively for diverse tasks including analysis, optimization, and decision making. Frequently, those models are principled but imperfect representations of reality. This is either due to incomplete physical description of the underlying phenomenon (simplified governing equations, defective boundary conditions, etc.), or due to numerical approximations (discretization, linearization, round-off error, etc.). Model misspecification can lead to erroneous model predictions, and respectively suboptimal decisions associated with the intended end-goal task. To mitigate this effect, one can amend the available model using limited data produced by experiments or higher fidelity models. A large body of research has focused on estimating explicit model parameters. This work takes a different perspective and targets the construction of a correction model operator with implicit attributes. We investigate the case where the end-goal is inversion and illustrate how appropriate choices of properties imposed upon the correction and corrected operator lead to improved end-goal insights.

• [stat.ML]**Simple And Efficient Architecture Search for Convolutional Neural Networks**

*Thomas Elsken, Jan-Hendrik Metzen, Frank Hutter*

http://arxiv.org/abs/1711.04528v1

Neural networks have recently had a lot of success for many tasks. However, neural network architectures that perform well are still typically designed manually by experts in a cumbersome trial-and-error process. We propose a new method to automatically search for well-performing CNN architectures based on a simple hill climbing procedure whose operators apply network morphisms, followed by short optimization runs by cosine annealing. Surprisingly, this simple method yields competitive results, despite only requiring resources in the same order of magnitude as training a single network. E.g., on CIFAR-10, our method designs and trains networks with an error rate below 6% in only 12 hours on a single GPU; training for one day reduces this error further, to almost 5%.
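
The "short optimization runs by cosine annealing" mentioned here refer to a learning-rate schedule that decays along half a cosine wave. The sketch below shows the usual form of that schedule. The learning-rate bounds, the run length, and the name `cosine_annealing_lr` are assumptions for illustration and are not taken from the paper.

```python
import math


def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Learning rate after `step` of `total_steps`, decaying from lr_max to lr_min.

    Follows lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * step / total_steps)).
    """
    cosine = math.cos(math.pi * step / total_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + cosine)


# Hypothetical short run of 10 steps, annealing from 0.05 down to 0.
schedule = [cosine_annealing_lr(t, 10, lr_max=0.05) for t in range(11)]
print(schedule)
```

In a hill-climbing search of the kind the abstract describes, each candidate network produced by a morphism would be trained for one such short run before the best candidate is kept.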

• [stat.ML]**Sparse quadratic classification rules via linear dimension reduction**

*Irina Gaynanova, Tianying Wang*

http://arxiv.org/abs/1711.04817v1

We consider the problem of high-dimensional classification between the two groups with unequal covariance matrices. Rather than estimating the full quadratic discriminant rule, we perform simultaneous variable selection and linear dimension reduction on original data, with the subsequent application of quadratic discriminant analysis on the reduced space. The projection vectors can be efficiently estimated by solving the convex optimization problem with sparsity-inducing penalty. The new rule performs comparably to linear discriminant analysis when the assumption of equal covariance matrices is satisfied, and improves the misclassification error rates when this assumption is violated. In contrast to quadratic discriminant analysis, the proposed framework doesn't require estimation of precision matrices and scales linearly with the number of measurements, making it especially attractive for the use on high-dimensional datasets. We support the methodology with theoretical guarantees on variable selection consistency, and empirical comparison with competing approaches.

• [stat.ML]**Statistically Optimal and Computationally Efficient Low Rank Tensor Completion from Noisy Entries**

*Dong Xia, Ming Yuan, Cun-Hui Zhang*

http://arxiv.org/abs/1711.04934v1

In this article, we develop methods for estimating a low rank tensor from noisy observations on a subset of its entries to achieve both statistical and computational efficiencies. There has been a lot of recent interest in this problem of noisy tensor completion. Much of the attention has been focused on the fundamental computational challenges often associated with problems involving higher order tensors, yet very little is known about their statistical performance. To fill in this void, in this article, we characterize the fundamental statistical limits of noisy tensor completion by establishing minimax optimal rates of convergence for estimating a $k$th order low rank tensor under the general $\ell_p$ ($1\le p\le 2$) norm, which suggest significant room for improvement over the existing approaches. Furthermore, we propose a polynomial-time computable estimating procedure based upon power iteration and a second-order spectral initialization that achieves the optimal rates of convergence. Our method is fairly easy to implement and numerical experiments are presented to further demonstrate the practical merits of our estimator.

• [stat.ML]**Stochastic Strictly Contractive Peaceman-Rachford Splitting Method**

*Sen Na, Mingyuan Ma, Shuming Ma, Guangju Peng*

http://arxiv.org/abs/1711.04955v1

In this paper, we propose two new Stochastic Strictly Contractive Peaceman-Rachford Splitting Methods (SCPRSM), called Stochastic SCPRSM (SS-PRSM) and Stochastic Conjugate Gradient SCPRSM (SCG-PRSM), for large-scale optimization problems. The two types of Stochastic PRSM algorithms respectively incorporate stochastic variance reduced gradient (SVRG) and the conjugate gradient method. Stochastic PRSM methods and most stochastic ADMM algorithms can only achieve an $O(1/\sqrt{t})$ convergence rate on general convex problems, while our SS-PRSM has an $O(1/t)$ convergence rate in the general convex case, which matches the convergence rate of the batch ADMM and SCPRSM algorithms. Besides, our methods have a faster convergence rate and lower memory cost. SCG-PRSM is the first to improve the performance by incorporating conjugate gradient and using the Armijo line search method. Experiments show that the proposed algorithms are faster than stochastic and batch ADMM algorithms. The numerical experiments show that SCG-PRSM achieves state-of-the-art performance on our benchmark datasets.
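
As background for the SVRG ingredient mentioned above, here is a minimal sketch of the variance-reduced gradient estimator applied to plain gradient steps on a toy least-squares problem. It does not implement the Peaceman-Rachford splitting itself; the function names, step size, epoch count, and data are all hypothetical.

```python
import random

def grad_i(w, x, y):
    """Gradient of the single-sample loss 0.5 * (x.w - y)^2 with respect to w."""
    r = sum(xj * wj for xj, wj in zip(x, w)) - y
    return [r * xj for xj in x]

def full_grad(w, X, Y):
    """Average of the per-sample gradients over the whole dataset."""
    n = len(X)
    g = [0.0] * len(w)
    for x, y in zip(X, Y):
        g = [a + b / n for a, b in zip(g, grad_i(w, x, y))]
    return g

def svrg(X, Y, step=0.1, epochs=30, seed=0):
    """SVRG: each epoch takes a snapshot, computes its full gradient once,
    then runs stochastic steps corrected by the snapshot gradients."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    w_snap = [0.0] * d
    for _ in range(epochs):
        mu = full_grad(w_snap, X, Y)          # full gradient at the snapshot
        w = list(w_snap)
        for _ in range(2 * n):                # inner loop length is an assumption
            i = rng.randrange(n)
            g_cur = grad_i(w, X[i], Y[i])
            g_snap = grad_i(w_snap, X[i], Y[i])
            # Variance-reduced estimate: g_cur - g_snap + mu
            w = [wj - step * (gc - gs + m)
                 for wj, gc, gs, m in zip(w, g_cur, g_snap, mu)]
        w_snap = w
    return w_snap

# Toy noiseless data generated from the weights (2, -3).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
Y = [2.0, -3.0, -1.0, 5.0]
w_hat = svrg(X, Y)
```

The estimate `g_cur - g_snap + mu` is unbiased for the full gradient, and its variance shrinks as the iterate and the snapshot both approach the solution, which is what allows a constant step size.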

• [stat.ML]**Straggler Mitigation in Distributed Optimization Through Data Encoding**

*Can Karakus, Yifan Sun, Suhas Diggavi, Wotao Yin*

http://arxiv.org/abs/1711.04969v1

Slow running or straggler tasks can significantly reduce computation speed in distributed computation. Recently, coding-theory-inspired approaches have been applied to mitigate the effect of straggling, through embedding redundancy in certain linear computational steps of the optimization algorithm, thus completing the computation without waiting for the stragglers. In this paper, we propose an alternate approach where we embed the redundancy directly in the data itself, and allow the computation to proceed completely oblivious to encoding. We propose several encoding schemes, and demonstrate that popular batch algorithms, such as gradient descent and L-BFGS, applied in a coding-oblivious manner, deterministically achieve sample path linear convergence to an approximate solution of the original problem, using an arbitrarily varying subset of the nodes at each iteration. Moreover, this approximation can be controlled by the amount of redundancy and the number of nodes used in each iteration. We provide experimental results demonstrating the advantage of the approach over uncoded and data replication strategies.

• [stat.ML]**The Multi-layer Information Bottleneck Problem**

*Qianqian Yang, Pablo Piantanida, Deniz Gündüz*

http://arxiv.org/abs/1711.05102v1

The multi-layer information bottleneck (IB) problem, where information is propagated (or successively refined) from layer to layer, is considered. Based on information forwarded by the preceding layer, each stage of the network is required to preserve a certain level of relevance with regard to a specific hidden variable, quantified by the mutual information. The hidden variables and the source can be arbitrarily correlated. The optimal trade-off between rates of relevance and compression (or complexity) is obtained through a single-letter characterization, referred to as the rate-relevance region. Conditions of successive refinability are given. Binary source with BSC hidden variables and binary source with BSC/BEC mixed hidden variables are both proved to be successively refinable. We further extend our result to Gaussian models. A counterexample of successive refinability is also provided.

• [stat.ML]**Variance Reduced methods for Non-convex Composition Optimization**

*Liu Liu, Ji Liu, Dacheng Tao*

http://arxiv.org/abs/1711.04416v1

This paper explores non-convex composition optimization in which both the inner and outer functions are finite sums with a large number of component functions. This problem arises in some important applications such as nonlinear embedding and reinforcement learning. Although existing approaches such as stochastic gradient descent (SGD) and stochastic variance reduced gradient (SVRG) descent can be applied to solve this problem, their query complexity tends to be high, especially when the number of inner component functions is large. In this paper, we apply the variance-reduction technique to derive two variance reduced algorithms that significantly improve the query complexity if the number of inner component functions is large. To the best of our knowledge, this is the first work that establishes the query complexity analysis for non-convex stochastic composition. Experiments validate the proposed algorithms and theoretical analysis.
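
For orientation, a composition problem with inner and outer finite sums is commonly written as follows. The symbols $F_i$, $G_j$, $n$, and $m$ are notation assumed here; the abstract itself does not fix them.

$$
\min_{x} \; f(x) = \frac{1}{n}\sum_{i=1}^{n} F_i\!\left(\frac{1}{m}\sum_{j=1}^{m} G_j(x)\right)
$$

Writing $G(x) = \frac{1}{m}\sum_{j=1}^{m} G_j(x)$ and $F(y) = \frac{1}{n}\sum_{i=1}^{n} F_i(y)$, the chain rule gives

$$
\nabla f(x) = \big(\partial G(x)\big)^{\top} \nabla F\big(G(x)\big),
$$

where $\partial G(x)$ is the Jacobian of $G$. Evaluating $G(x)$ and $\partial G(x)$ exactly touches all $m$ inner components, which is why the query complexity grows with the number of inner component functions.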

• [stat.ML]**WMRB: Learning to Rank in a Scalable Batch Training Approach**

*Kuan Liu, Prem Natarajan*

http://arxiv.org/abs/1711.04015v1

We propose a new learning to rank algorithm, named Weighted Margin-Rank Batch loss (WMRB), to extend the popular Weighted Approximate-Rank Pairwise loss (WARP). WMRB uses a new rank estimator and an efficient batch training algorithm. The approach allows more accurate item rank approximation and explicit utilization of parallel computation to accelerate training. In three item recommendation tasks, WMRB consistently outperforms WARP and other baselines. Moreover, WMRB shows clear time efficiency advantages as data scale increases.
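
The abstract does not state the loss itself. As a rough sketch of what a margin-based rank estimate computed over a sampled batch of items could look like, assume a hinge-margin sum scaled up to the full item set and a logarithmic rank weighting; the functional form, the function names, and the toy scores below are all assumptions of this sketch.

```python
import math

def margin_rank(scores, pos, sampled, num_items):
    """Estimate the rank of item `pos` from a batch of sampled items.

    Each sampled item contributes a hinge margin max(0, 1 - s_pos + s_j);
    the sum is rescaled by num_items / len(sampled) to cover the full set.
    """
    hinge_sum = sum(max(0.0, 1.0 - scores[pos] + scores[j]) for j in sampled)
    return (num_items / len(sampled)) * hinge_sum

def rank_loss(scores, pos, sampled, num_items):
    """Logarithmic weighting of the estimated rank (assumed form)."""
    return math.log(1.0 + margin_rank(scores, pos, sampled, num_items))

# Toy example: item 0 is the observed (positive) item, items 1-4 are candidates.
scores = [2.0, 0.5, 1.8, -1.0, 3.0]
estimated_rank = margin_rank(scores, pos=0, sampled=[1, 2, 3, 4], num_items=4)
loss = rank_loss(scores, pos=0, sampled=[1, 2, 3, 4], num_items=4)
```

Unlike a sequential sampling scheme that draws negatives one at a time until a violation is found, an estimate of this kind is a fixed-size sum over the batch, so it maps directly onto parallel hardware.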