# Today's Academic Horizons (2016.07.13)

Legend:

astro-ph.GA - Astrophysics of Galaxies

cs.AI - Artificial Intelligence

cs.CE - Computational Engineering, Finance, and Science

cs.CL - Computation and Language

cs.CV - Computer Vision and Pattern Recognition

cs.DB - Databases

cs.DC - Distributed, Parallel, and Cluster Computing

cs.DL - Digital Libraries

cs.GT - Computer Science and Game Theory

cs.HC - Human-Computer Interaction

cs.IR - Information Retrieval

cs.IT - Information Theory

cs.LG - Machine Learning

cs.NA - Numerical Analysis

cs.NE - Neural and Evolutionary Computing

cs.PL - Programming Languages

cs.SI - Social and Information Networks

math.OC - Optimization and Control

math.ST - Statistics Theory

physics.ao-ph - Atmospheric and Oceanic Physics

physics.soc-ph - Physics and Society

stat.AP - Applied Statistics

stat.CO - Statistical Computation

stat.ME - Statistical Methodology

stat.ML - (Statistical) Machine Learning

• [astro-ph.GA]Bayesian isochrone fitting and stellar ages

• [cs.AI]How to Allocate Resources For Features Acquisition?

• [cs.CE]Microwave Tomographic Imaging of Cerebrovascular Accidents by Using High-Performance Computing

• [cs.CE]The Vectorization of the Tersoff Multi-Body Potential: An Exercise in Performance Portability

• [cs.CL]Analysis of opinionated text for opinion mining

• [cs.CL]Charagram: Embedding Words and Sentences via Character n-grams

• [cs.CL]Exploring the Political Agenda of the European Parliament Using a Dynamic Topic Modeling Approach

• [cs.CL]Mapping distributional to model-theoretic semantic spaces: a baseline

• [cs.CL]Open Information Extraction

• [cs.CL]Syntactic Phylogenetic Trees

• [cs.CL]The benefits of word embeddings features for active learning in clinical information extraction

• [cs.CV]A Benchmark for License Plate Character Segmentation

• [cs.CV]Adversarial Training For Sketch Retrieval

• [cs.CV]Adversarial examples in the physical world

• [cs.CV]Annotation Methodologies for Vision and Language Dataset Creation

• [cs.CV]Combining multiple resolutions into hierarchical representations for kernel-based image classification

• [cs.CV]Deep Learning of Appearance Models for Online Object Tracking

• [cs.CV]Direct Sparse Odometry

• [cs.CV]Efficient Activity Detection in Untrimmed Video with Max-Subgraph Search

• [cs.CV]Hierarchical Deep Temporal Models for Group Activity Recognition

• [cs.CV]Hypergraph Modelling for Geometric Model Fitting

• [cs.CV]Inference of Haemoglobin Concentration From Stereo RGB

• [cs.CV]Learning to Sketch Human Facial Portraits using Personal Styles by Case-Based Reasoning

• [cs.CV]Memory Efficient Nonuniform Quantization for Deep Convolutional Neural Network

• [cs.CV]Salient Region Detection and Segmentation in Images using Dynamic Mode Decomposition

• [cs.CV]Towards an "In-the-Wild" Emotion Dataset Using a Game-based Framework

• [cs.CV]Transition Forests: Learning Discriminative Temporal Transitions for Action Recognition

• [cs.CV]Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks

• [cs.DB]Extending Weakly-Sticky Datalog+/-: Query-Answering Tractability and Optimizations

• [cs.DC]AccuracyTrader: Accuracy-aware Approximate Processing for Low Tail Latency and High Result Accuracy in Cloud Online Services

• [cs.DC]Design Patterns in Beeping Algorithms

• [cs.DC]Enhancing HPC Security with a User-Based Firewall

• [cs.DC]High-Level Programming Abstractions for Distributed Graph Processing

• [cs.DL]BioInfoBase : A Bioinformatics Resourceome

• [cs.GT]From Behavior to Sparse Graphical Games: Efficient Recovery of Equilibria

• [cs.HC]Augmenting Supervised Emotion Recognition with Rule-Based Decision Model

• [cs.HC]Multimodal Affect Recognition using Kinect

• [cs.IR]Hybrid Recommender System Based on Personal Behavior Mining

• [cs.IR]Randomised Relevance Model

• [cs.IT]Bounds on the Number of Measurements for Reliable Compressive Classification

• [cs.IT]Linear signal recovery from $b$-bit-quantized linear measurements: precise analysis of the trade-off between bit depth and number of measurements

• [cs.IT]Minimum Description Length Principle in Supervised Learning with Application to Lasso

• [cs.IT]New approach to Bayesian high-dimensional linear regression

• [cs.LG]Classifier Risk Estimation under Limited Labeling Resources

• [cs.LG]Dealing with Class Imbalance using Thresholding

• [cs.LG]Incremental Factorization Machines for Persistently Cold-starting Online Item Recommendation

• [cs.LG]Kernel-based methods for bandit convex optimization

• [cs.LG]Learning a metric for class-conditional KNN

• [cs.LG]Learning from Multiway Data: Simple and Efficient Tensor Regression

• [cs.LG]Online Learning Schemes for Power Allocation in Energy Harvesting Communications

• [cs.LG]Recurrent Memory Array Structures

• [cs.LG]Tight Lower Bounds for Multiplicative Weights Algorithmic Families

• [cs.LG]Uncovering Locally Discriminative Structure for Feature Analysis

• [cs.NA]A Unified Alternating Direction Method of Multipliers by Majorization Minimization

• [cs.NA]Inexact Block Coordinate Descent Methods For Symmetric Nonnegative Matrix Factorization

• [cs.NA]Proximal Quasi-Newton Methods for Convex Optimization

• [cs.NE]Classifying Variable-Length Audio Files with All-Convolutional Networks and Masked Global Pooling

• [cs.NE]Forward Table-Based Presynaptic Event-Triggered Spike-Timing-Dependent Plasticity

• [cs.NE]The BioDynaMo Project

• [cs.PL]sk_p: a neural program corrector for MOOCs

• [cs.SI]Are human interactivity times lognormal?

• [cs.SI]Learning from the News: Predicting Entity Popularity on Twitter

• [cs.SI]Privacy Leakage through Innocent Content Sharing in Online Social Networks

• [math.OC]An Improved Convergence Analysis of Cyclic Block Coordinate Descent-type Methods for Strongly Convex Minimization

• [math.OC]Beating level-set methods for 3D seismic data interpolation: a primal-dual alternating approach

• [math.ST]A data-driven method for improving the correlation estimation in serial ensemble Kalman filters

• [math.ST]Barycentric Subspace Analysis on Manifolds

• [math.ST]Beyond the Pearson correlation: heavy-tailed risks, weighted Gini correlations, and a Gini-type weighted insurance pricing model

• [math.ST]Conjugacy properties of time-evolving Dirichlet and gamma random measures

• [math.ST]Convergence of Multivariate Quantile Surfaces

• [math.ST]Integral form of the COM-Poisson normalization constant

• [math.ST]MAGIC: a general, powerful and tractable method for selective inference

• [math.ST]On the Unique Crossing Conjecture of Diaconis and Perlman on Convolutions of Gamma Random Variables

• [physics.ao-ph]Evaluating Effectiveness of DART Buoy Networks

• [physics.soc-ph]Influence of temporal aspects and age-correlations on the process of opinion formation based on Polish contact survey

• [physics.soc-ph]Towards Limited Scale-free Topology with Dynamic Peer Participation

• [stat.AP]A low-rank based estimation-testing procedure for matrix-covariate regression

• [stat.AP]Approximate Bayesian Computation for Lorenz Curves from Grouped Data

• [stat.AP]Stochastic differential equation mixed effects models for tumor growth and response to treatment

• [stat.CO]Parallel local approximation MCMC for expensive models

• [stat.ME]Approximate Bayesian computation (ABC) coupled with Bayesian model averaging method for estimating mean and standard deviation

• [stat.ME]Bayesian variable selection in high dimensional problems without assumptions on prior model probabilities

• [stat.ME]Convex Relaxation for Community Detection with Covariates

• [stat.ME]Discrete Choice Models for Nonmonotone Nonignorable Missing Data: Identification and Inference

• [stat.ME]Estimating the number of species to attain sufficient representation in a random sample

• [stat.ME]How to use empirical process for deriving asymptotic laws for functions of the sample

• [stat.ME]Non-Concave Penalization in Linear Mixed-Effects Models and Regularized Selection of Fixed Effects

• [stat.ME]Pseudo-Marginal Hamiltonian Monte Carlo

• [stat.ME]Robust estimation and inference for the local instrumental variable curve

• [stat.ME]Scalable Bayesian modeling, monitoring and analysis of dynamic network flow data

• [stat.ME]Shared Subspace Models for Multi-Group Covariance Estimation

• [stat.ML]Bayesian quantile additive regression trees

• [stat.ML]Magnetic Hamiltonian Monte Carlo

• [stat.ML]Proceedings of the 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016)

• [stat.ML]Retrospective Causal Inference with Machine Learning Ensembles: An Application to Anti-Recidivism Policies in Colombia

• [stat.ML]Sparse additive Gaussian process with soft interactions

·····································

• [astro-ph.GA]**Bayesian isochrone fitting and stellar ages**

*D. Valls-Gabaud*

http://arxiv.org/abs/1607.03000v1

Stellar evolution theory has been extraordinarily successful at explaining the different phases under which stars form, evolve and die. While the strongest constraints have traditionally come from binary stars, the advent of asteroseismology is bringing unique measures in well-characterised stars. For stellar populations in general, however, only photometric measures are usually available, and the comparison with the predictions of stellar evolution theory has mostly been qualitative. For instance, the geometrical shapes of isochrones have been used to infer ages of coeval populations, but without any proper statistical basis. In this chapter we provide a pedagogical review of a Bayesian formalism to make quantitative inferences on the properties of single, binary and small ensembles of stars, including unresolved populations. As an example, we show how stellar evolution theory can be used in a rigorous way as prior information to measure the ages of stars between the ZAMS and the Helium flash, and their uncertainties, using photometric data only.
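
A toy grid-based illustration of the Bayesian idea, inferring an age posterior from a single photometric measurement; the isochrone function, measurement values and flat prior below are invented for illustration, and the chapter's formalism is far more general:

```python
import numpy as np

# Hypothetical isochrone: predicted magnitude as a smooth function of age (Gyr).
def predicted_mag(age):
    return 5.0 + 0.3 * np.log(age)

ages = np.linspace(0.1, 13.0, 500)          # grid over stellar age
obs_mag, sigma = 5.6, 0.05                  # one photometric measurement
log_like = -0.5 * ((obs_mag - predicted_mag(ages)) / sigma) ** 2
prior = np.ones_like(ages)                  # flat prior over the grid
post = np.exp(log_like - log_like.max()) * prior
post /= post.sum() * (ages[1] - ages[0])    # normalise the posterior density
age_map = ages[post.argmax()]               # maximum a posteriori age
```

With an informative prior from stellar evolution theory in place of the flat one, the same machinery yields the age posteriors discussed in the chapter.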

• [cs.AI]**How to Allocate Resources For Features Acquisition?**

*Oran Richman, Shie Mannor*

http://arxiv.org/abs/1607.02763v1

We study classification problems where features are corrupted by noise and where the magnitude of the noise in each feature is influenced by the resources allocated to its acquisition. This is the case, for example, when multiple sensors share a common resource (power, bandwidth, attention, etc.). We develop a method for computing the optimal resource allocation for a variety of scenarios and derive theoretical bounds concerning the benefit that may arise by non-uniform allocation. We further demonstrate the effectiveness of the developed method in simulations.

• [cs.CE]**Microwave Tomographic Imaging of Cerebrovascular Accidents by Using High-Performance Computing**

*P. -H. Tournier, I. Aliferis, M. Bonazzoli, M. de Buhan, M. Darbas, V. Dolean, F. Hecht, P. Jolivet, I. El Kanfoud, C. Migliaccio, F. Nataf, C. Pichot, S. Semenov*

http://arxiv.org/abs/1607.02573v1

The motivation of this work is the detection of cerebrovascular accidents by microwave tomographic imaging. This requires the solution of an inverse problem relying on a minimization algorithm (for example, gradient-based), where successive iterations consist in repeated solutions of a direct problem. The reconstruction algorithm is extremely computationally intensive and makes use of efficient parallel algorithms and high-performance computing. The feasibility of this type of imaging is conditioned on one hand by an accurate reconstruction of the material properties of the propagation medium and on the other hand by a considerable reduction in simulation time. Fulfilling these two requirements will enable a very rapid and accurate diagnosis. From the mathematical and numerical point of view, this means solving Maxwell's equations in time-harmonic regime by appropriate domain decomposition methods, which are naturally adapted to parallel architectures.

• [cs.CE]**The Vectorization of the Tersoff Multi-Body Potential: An Exercise in Performance Portability**

*Markus Höhnerbach, Ahmed E. Ismail, Paolo Bientinesi*

http://arxiv.org/abs/1607.02904v1

Molecular dynamics simulations, an indispensable research tool in computational chemistry and materials science, consume a significant portion of the supercomputing cycles around the world. We focus on multi-body potentials and aim at achieving performance portability. Compared with well-studied pair potentials, multibody potentials deliver increased simulation accuracy but are too complex for effective compiler optimization. Because of this, achieving cross-platform performance remains an open question. By abstracting from target architecture and computing precision, we develop a vectorization scheme applicable to both CPUs and accelerators. We present results for the Tersoff potential within the molecular dynamics code LAMMPS on several architectures, demonstrating efficiency gains not only for computational kernels, but also for large-scale simulations. On a cluster of Intel Xeon Phi's, our optimized solver is between 3 and 5 times faster than the pure MPI reference.

• [cs.CL]**Analysis of opinionated text for opinion mining**

*K Paramesha, K C Ravishankar*

http://arxiv.org/abs/1607.02576v1

In sentiment analysis, the polarities of the opinions expressed about an object or feature are determined in order to assess whether the sentiment of a sentence or document is positive, negative, or neutral. The object or feature is typically a noun referring to a product or a component of a product, for example the "lens" of a camera, and the opinions about it are carried by adjectives, verbs, adverbs, and nouns themselves. Beyond such words, other meta-information and diverse effective features also play an important role in influencing the sentiment polarity and contribute significantly to system performance. In this paper, some of this associated meta-information in sentiment text is explored and investigated. The analysis results presented here suggest scope for further assessment and use of such meta-information as features in text categorization, document ranking, spam identification, and polarity classification.

• [cs.CL]**Charagram: Embedding Words and Sentences via Character n-grams**

*John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu*

http://arxiv.org/abs/1607.02789v1

We present Charagram embeddings, a simple approach for learning character-based compositional models to embed textual sequences. A word or sentence is represented using a character n-gram count vector, followed by a single nonlinear transformation to yield a low-dimensional embedding. We use three tasks for evaluation: word similarity, sentence similarity, and part-of-speech tagging. We demonstrate that Charagram embeddings outperform more complex architectures based on character-level recurrent and convolutional neural networks, achieving new state-of-the-art performance on several similarity tasks.
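
A minimal sketch of the Charagram idea: represent a string by a count vector over character n-grams, then apply one nonlinear transformation. The vocabulary, random weights, n-gram orders and the choice of tanh here are illustrative assumptions, not the paper's trained model:

```python
import numpy as np

def char_ngrams(text, n_values=(2, 3, 4)):
    """Collect character n-grams, padding the word boundary with '#'."""
    padded = "#" + text + "#"
    grams = []
    for n in n_values:
        grams += [padded[i:i + n] for i in range(len(padded) - n + 1)]
    return grams

def charagram_embed(text, vocab, W, b):
    """Character n-gram count vector followed by a single nonlinearity."""
    counts = np.zeros(len(vocab))
    for g in char_ngrams(text):
        if g in vocab:
            counts[vocab[g]] += 1
    return np.tanh(W @ counts + b)  # low-dimensional embedding

# Toy usage with a hypothetical 3-gram vocabulary and random weights.
vocab = {g: i for i, g in enumerate(["#ca", "cat", "at#", "#do", "dog", "og#"])}
rng = np.random.default_rng(0)
W, b = rng.normal(size=(4, len(vocab))), np.zeros(4)
emb = charagram_embed("cat", vocab, W, b)
print(emb.shape)  # (4,)
```

Because the representation is built from character n-grams rather than whole words, morphologically related or misspelled words share most of their active features.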

• [cs.CL]**Exploring the Political Agenda of the European Parliament Using a Dynamic Topic Modeling Approach**

*Derek Greene, James P. Cross*

http://arxiv.org/abs/1607.03055v1

This study analyzes the political agenda of the European Parliament (EP) plenary, how it has evolved over time, and the manner in which Members of the European Parliament (MEPs) have reacted to external and internal stimuli when making plenary speeches. To unveil the plenary agenda and detect latent themes in legislative speeches over time, MEP speech content is analyzed using a new dynamic topic modeling method based on two layers of Non-negative Matrix Factorization (NMF). This method is applied to a new corpus of all English language legislative speeches in the EP plenary from the period 1999-2014. Our findings suggest that two-layer NMF is a valuable alternative to existing dynamic topic modeling approaches found in the literature, and can unveil niche topics and associated vocabularies not captured by existing methods. Substantively, our findings suggest that the political agenda of the EP evolves significantly over time and reacts to exogenous events such as EU Treaty referenda and the emergence of the Euro-crisis. MEP contributions to the plenary agenda are also found to be impacted upon by voting behaviour and the committee structure of the Parliament.
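
A rough illustration of the two-layer NMF idea on synthetic data: fit a topic model within each time window, then factorize the stacked window-level topics to recover dynamic topics that persist across windows. The corpus, window count and topic numbers are invented, and scikit-learn's generic NMF stands in for the authors' implementation:

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
# Hypothetical data: 3 time windows of document-term matrices (docs x terms).
windows = [np.abs(rng.normal(size=(20, 50))) for _ in range(3)]

# Layer 1: fit a topic model (NMF) independently within each time window.
window_topics = []
for X in windows:
    model = NMF(n_components=4, init="nndsvda", random_state=0, max_iter=500)
    model.fit(X)
    window_topics.append(model.components_)   # (topics x terms)

# Layer 2: stack all window-level topics and factorize again, so that
# recurring window topics collapse into cross-window "dynamic" topics.
stacked = np.vstack(window_topics)            # (3*4 window topics x terms)
layer2 = NMF(n_components=4, init="nndsvda", random_state=0, max_iter=500)
weights = layer2.fit_transform(stacked)       # window topic -> dynamic topic
dynamic_topics = layer2.components_
print(dynamic_topics.shape)  # (4, 50)
```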

• [cs.CL]**Mapping distributional to model-theoretic semantic spaces: a baseline**

*Franck Dernoncourt*

http://arxiv.org/abs/1607.02802v1

Word embeddings have been shown to be useful across state-of-the-art systems in many natural language processing tasks, ranging from question answering systems to dependency parsing. (Herbelot and Vecchi, 2015) explored word embeddings and their utility for modeling language semantics. In particular, they presented an approach to automatically map a standard distributional semantic space onto a set-theoretic model using partial least squares regression. We show in this paper that a simple baseline achieves a +51% relative improvement compared to their model on one of the two datasets they used, and yields competitive results on the second dataset.

• [cs.CL]**Open Information Extraction**

*Duc-Thuan Vo, Ebrahim Bagheri*

http://arxiv.org/abs/1607.02784v1

Open Information Extraction (Open IE) systems aim to obtain relation tuples through highly scalable extraction that is portable across domains, by identifying a variety of relation phrases and their arguments in arbitrary sentences. The first generation of Open IE systems learned linear-chain models based on unlexicalized features such as part-of-speech (POS) or shallow tags to label the intermediate words between pairs of potential arguments and thereby identify extractable relations. The current, second generation of Open IE is able to extract instances of the most frequently observed relation types, such as Verb, Noun and Prep, Verb and Prep, and Infinitive, using deep linguistic analysis. These systems exploit simple yet principled ways in which verbs express relationships, such as verb-phrase-based extraction or clause-based extraction, and obtain significantly higher performance than first-generation systems. In this paper, we present an overview of both Open IE generations, including their strengths, weaknesses, and application areas.

• [cs.CL]**Syntactic Phylogenetic Trees**

*Kevin Shu, Sharjeel Aziz, Vy-Luan Huynh, David Warrick, Matilde Marcolli*

http://arxiv.org/abs/1607.02791v1

In this paper we identify several serious problems that arise in the use of syntactic data from the SSWL database for the purpose of computational phylogenetic reconstruction. We show that the most naive approach fails to produce reliable linguistic phylogenetic trees. We identify some of the sources of the observed problems and we discuss how they may be, at least partly, corrected by using additional information, such as prior subdivision into language families and subfamilies, and a better use of the information about ancient languages. We also describe how the use of phylogenetic algebraic geometry can help in estimating to what extent the probability distribution at the leaves of the phylogenetic tree obtained from the SSWL data can be considered reliable, by testing it on phylogenetic trees established by other forms of linguistic analysis. In simple examples, we find that, after restricting to smaller language subfamilies and considering only those SSWL parameters that are fully mapped for the whole subfamily, the SSWL data match extremely well reliable phylogenetic trees, according to the evaluation of phylogenetic invariants. This is a promising sign for the use of SSWL data for linguistic phylogenetics.

• [cs.CL]**The benefits of word embeddings features for active learning in clinical information extraction**

*Mahnoosh Kholghi, Lance De Vine, Laurianne Sitbon, Guido Zuccon, Anthony Nguyen*

http://arxiv.org/abs/1607.02810v1

Objective This study investigates the use of word embeddings and sequence features for sample representation in an active learning framework built to extract clinical concepts from clinical free text. The objective is to further reduce the manual annotation effort while achieving higher effectiveness compared to a set of baseline features. Materials and methods The comparative performance of unsupervised features and baseline hand-crafted features in an active learning framework were investigated. Unsupervised features were derived from skip-gram word embeddings and a sequence representation approach. Least confidence, information diversity, information density and diversity, and domain knowledge informativeness were used as selection criteria for the active learning framework. Two clinical datasets were used for evaluation: the i2b2/VA 2010 NLP challenge and the ShARe/CLEF 2013 eHealth Evaluation Lab. Results Our results demonstrated significant improvement from adding unsupervised word- and sequence-level features to the active learning framework, in terms of both effectiveness and annotation effort, across both datasets. Using unsupervised features along with baseline features for sample representation led to further savings of up to 10% and 6% of the token and concept annotation rates, respectively. Conclusion This study shows that the manual annotation of clinical free text for automated analysis can be accelerated by using unsupervised features for sample representation in an active learning framework. To the best of our knowledge, this is the first study to analyze the effect of unsupervised features on active learning performance in clinical information extraction.
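
Of the selection criteria named above, least confidence is the simplest: query the sample whose most probable label the current model is least sure about. A generic few-line sketch (not the authors' system):

```python
import numpy as np

def least_confidence(probs):
    """Return the index of the sample to annotate next.

    probs: (n_samples, n_classes) predicted class probabilities from the
    current model. The sample with the lowest top-class probability is the
    one the model is least confident about.
    """
    confidence = probs.max(axis=1)
    return int(confidence.argmin())

# Toy usage: the model is most uncertain about the second sample.
probs = np.array([[0.9, 0.1],
                  [0.5, 0.5],
                  [0.7, 0.3]])
print(least_confidence(probs))  # 1
```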

• [cs.CV]**A Benchmark for License Plate Character Segmentation**

*Gabriel Resende Gonçalves, Sirlene Pio Gomes da Silva, David Menotti, William Robson Schwartz*

http://arxiv.org/abs/1607.02937v1

Automatic License Plate Recognition (ALPR) has been the focus of much research in the past years. In general, ALPR is divided into the following problems: detection of on-track vehicles, license plate detection, segmentation of license plate characters, and optical character recognition (OCR). Even though commercial solutions are available for controlled acquisition conditions, e.g., the entrance of a parking lot, ALPR is still an open problem when dealing with data acquired from uncontrolled environments, such as roads and highways, when relying only on imaging sensors. Due to the multiple orientations and scales of the license plates captured by the camera, a very challenging task in ALPR is the License Plate Character Segmentation (LPCS) step, whose effectiveness must be (near) optimal to achieve a high recognition rate by the OCR. To tackle the LPCS problem, this work proposes a novel benchmark composed of a dataset designed to focus specifically on the character segmentation step of ALPR, together with an evaluation protocol. Furthermore, we propose the Jaccard-Centroid coefficient, a new evaluation measure more suitable than the Jaccard coefficient regarding the location of the bounding box within the ground-truth annotation. The dataset is composed of 2,000 Brazilian license plates comprising 14,000 alphanumeric symbols and their corresponding bounding-box annotations. We also present a new straightforward approach to perform LPCS efficiently. Finally, we provide an experimental evaluation of the dataset based on four LPCS approaches and demonstrate the importance of character segmentation for achieving accurate OCR.

• [cs.CV]**Adversarial Training For Sketch Retrieval**

*Antonia Creswell, Anil Anthony Bharath*

http://arxiv.org/abs/1607.02748v1

Generative Adversarial Networks (GAN) can learn excellent representations for unlabelled data which have been applied to image generation and scene classification. The representations have not yet - to the best of our knowledge - been applied to visual search. In this paper, we show that representations learned by GANs can be applied to visual search within heritage documents that contain Merchant Marks, sketch-like symbols that are similar to hieroglyphs. We introduce a novel GAN architecture with design features that makes it suitable for sketch understanding. The performance of this sketch-GAN is compared to a modified version of the original GAN architecture with respect to simple invariance properties. Experiments suggest that sketch-GANs learn representations that are suitable for retrieval and which also have increased stability to rotation, scale and translation.

• [cs.CV]**Adversarial examples in the physical world**

*Alexey Kurakin, Ian Goodfellow, Samy Bengio*

http://arxiv.org/abs/1607.02533v1

Most existing machine learning classifiers are highly vulnerable to adversarial examples. An adversarial example is a sample of input data which has been modified very slightly in a way that is intended to cause a machine learning classifier to misclassify it. In many cases, these modifications can be so subtle that a human observer does not even notice the modification at all, yet the classifier still makes a mistake. Adversarial examples pose security concerns because they could be used to perform an attack on machine learning systems, even if the adversary has no access to the underlying model. Up to now, all previous work has assumed a threat model in which the adversary can feed data directly into the machine learning classifier. This is not always the case for systems operating in the physical world, for example those which are using signals from cameras and other sensors as an input. This paper shows that even in such physical world scenarios, machine learning systems are vulnerable to adversarial examples. We demonstrate this by feeding adversarial images obtained from a cell-phone camera to an ImageNet Inception classifier and measuring the classification accuracy of the system. We find that a large fraction of adversarial examples are classified incorrectly even when perceived through the camera.

• [cs.CV]**Annotation Methodologies for Vision and Language Dataset Creation**

*Gitit Kehat, James Pustejovsky*

http://arxiv.org/abs/1607.02769v1

Annotated datasets are commonly used in the training and evaluation of tasks involving natural language and vision (image description generation, action recognition and visual question answering). However, many of the existing datasets reflect problems that emerge in the process of data selection and annotation. Here we point out some of the difficulties and problems one confronts when creating and validating annotated vision and language datasets.

• [cs.CV]**Combining multiple resolutions into hierarchical representations for kernel-based image classification**

*Yanwei Cui, Sébastien Lefevre, Laetitia Chapel, Anne Puissant*

http://arxiv.org/abs/1607.02654v1

The geographic object-based image analysis (GEOBIA) framework has gained increasing interest recently. Following this popular paradigm, we propose a novel multiscale classification approach operating on a hierarchical image representation built from two images at different resolutions. They capture the same scene with different sensors and are naturally fused together through the hierarchical representation, where coarser levels are built from a Low Spatial Resolution (LSR) or Medium Spatial Resolution (MSR) image while finer levels are generated from a High Spatial Resolution (HSR) or Very High Spatial Resolution (VHSR) image. Such a representation allows one to benefit from context information through the coarser levels and from the spatial arrangement of subregions through the finer levels. Two dedicated structured kernels are then used to perform machine learning directly on the constructed hierarchical representation. This strategy overcomes the limits of conventional GEOBIA classification procedures, which can handle only one or very few pre-selected scales. Experiments run on an urban classification task show that the proposed approach can significantly improve classification accuracy w.r.t. conventional approaches working at a single scale.

• [cs.CV]**Deep Learning of Appearance Models for Online Object Tracking**

*Mengyao Zhai, Mehrsan Javan Roshtkhari, Greg Mori*

http://arxiv.org/abs/1607.02568v1

This paper introduces a novel deep learning based approach for vision based single target tracking. We address this problem by proposing a network architecture which takes the input video frames and directly computes the tracking score for any candidate target location by estimating the probability distributions of the positive and negative examples. This is achieved by combining a deep convolutional neural network with a Bayesian loss layer in a unified framework. In order to deal with the limited number of positive training examples, the network is pre-trained offline for a generic image feature representation and then is fine-tuned in multiple steps. An online fine-tuning step is carried out at every frame to learn the appearance of the target. We adopt a two-stage iterative algorithm to adaptively update the network parameters and maintain a probability density for target/non-target regions. The tracker has been tested on the standard tracking benchmark and the results indicate that the proposed solution achieves state-of-the-art tracking results.

• [cs.CV]**Direct Sparse Odometry**

*Jakob Engel, Vladlen Koltun, Daniel Cremers*

http://arxiv.org/abs/1607.02565v1

We propose a novel direct sparse visual odometry formulation. It combines a fully direct probabilistic model (minimizing a photometric error) with consistent, joint optimization of all model parameters, including geometry -- represented as inverse depth in a reference frame -- and camera motion. This is achieved in real time by omitting the smoothness prior used in other direct methods and instead sampling pixels evenly throughout the images. Since our method does not depend on keypoint detectors or descriptors, it can naturally sample pixels from across all image regions that have intensity gradient, including edges or smooth intensity variations on mostly white walls. The proposed model integrates a full photometric calibration, accounting for exposure time, lens vignetting, and non-linear response functions. We thoroughly evaluate our method on three different datasets comprising several hours of video. The experiments show that the presented approach significantly outperforms state-of-the-art direct and indirect methods in a variety of real-world settings, both in terms of tracking accuracy and robustness.

• [cs.CV]**Efficient Activity Detection in Untrimmed Video with Max-Subgraph Search**

*Chao-Yeh Chen, Kristen Grauman*

http://arxiv.org/abs/1607.02815v1

We propose an efficient approach for activity detection in video that unifies activity categorization with space-time localization. The main idea is to pose activity detection as a maximum-weight connected subgraph problem. Offline, we learn a binary classifier for an activity category using positive video exemplars that are "trimmed" in time to the activity of interest. Then, given a novel \emph{untrimmed} video sequence, we decompose it into a 3D array of space-time nodes, which are weighted based on the extent to which their component features support the learned activity model. To perform detection, we then directly localize instances of the activity by solving for the maximum-weight connected subgraph in the test video's space-time graph. We show that this detection strategy permits an efficient branch-and-cut solution for the best-scoring---and possibly non-cubically shaped---portion of the video for a given activity classifier. The upshot is a fast method that can search a broader space of space-time region candidates than was previously practical, which we find often leads to more accurate detection. We demonstrate the proposed algorithm on four datasets, and we show its speed and accuracy advantages over multiple existing search strategies.

• [cs.CV]**Hierarchical Deep Temporal Models for Group Activity Recognition**

*Mostafa S. Ibrahim, Srikanth Muralidharan, Zhiwei Deng, Arash Vahdat, Greg Mori*

http://arxiv.org/abs/1607.02643v1

In this paper we present an approach for classifying the activity performed by a group of people in a video sequence. This problem of group activity recognition can be addressed by examining individual person actions and their relations. Temporal dynamics exist both at the level of individual person actions as well as at the level of group activity. Given a video sequence as input, methods can be developed to capture these dynamics at both person-level and group-level detail. We build a deep model to capture these dynamics based on LSTM (long short-term memory) models. In order to model both person-level and group-level dynamics, we present a 2-stage deep temporal model for the group activity recognition problem. In our approach, one LSTM model is designed to represent action dynamics of individual people in a video sequence and another LSTM model is designed to aggregate person-level information for group activity recognition. We collected a new dataset consisting of volleyball videos labeled with individual and group activities in order to evaluate our method. Experimental results on this new Volleyball Dataset and the standard benchmark Collective Activity Dataset demonstrate the efficacy of the proposed models.

• [cs.CV]**Hypergraph Modelling for Geometric Model Fitting**

*Guobao Xiao, Hanzi Wang, Taotao Lai, David Suter*

http://arxiv.org/abs/1607.02829v1

In this paper, we propose a novel hypergraph-based method (called HF) to fit and segment multi-structural data. The proposed HF formulates the geometric model fitting problem as a hypergraph partition problem based on a novel hypergraph model in which vertices represent data points and hyperedges denote model hypotheses. The hypergraph, with large and "data-determined" degrees of hyperedges, can express the complex relationships between model hypotheses and data points. In addition, we develop a robust hypergraph partition algorithm to detect sub-hypergraphs for model fitting. HF can simultaneously estimate, both effectively and efficiently, the number and the parameters of model instances in multi-structural data heavily corrupted with outliers. Experimental results show the advantages of the proposed method over previous methods on both synthetic data and real images.

• [cs.CV]**Inference of Haemoglobin Concentration From Stereo RGB**

*Geoffrey Jones, Neil T. Clancy, Simon Arridge, Daniel S. Elson, Danail Stoyanov*

http://arxiv.org/abs/1607.02936v1

Multispectral imaging (MSI) can provide information about tissue oxygenation, perfusion and potentially function during surgery. In this paper we present a novel, near real-time technique for intrinsic measurements of total haemoglobin (THb) and blood oxygenation (SO2) in tissue using only RGB images from a stereo laparoscope. The high degree of spectral overlap between channels makes inference of haemoglobin concentration challenging, non-linear and under-constrained. We decompose the problem into two constrained linear sub-problems and show that with Tikhonov regularisation the estimation improves significantly, giving robust estimation of the THb. We demonstrate that, by using co-registered stereo image data from two cameras, it is possible to obtain robust SO2 estimation as well. Our method is closed form, providing computational efficiency even with multiple cameras. The method we present requires only spectral response calibration of each camera, without modification of existing laparoscopic imaging hardware. We validate our technique on synthetic data from Monte Carlo simulation of light transport through soft tissue containing submerged blood vessels and, further, in vivo on a multispectral porcine data set.
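
The Tikhonov-regularised linear sub-problems mentioned in the abstract admit a closed-form solution. A minimal sketch of that idea (the variable names and toy data are illustrative, not the authors' actual imaging model):

```python
import numpy as np

def tikhonov_solve(A, y, lam):
    """Solve min_x ||A x - y||^2 + lam ||x||^2 in closed form:
    x = (A^T A + lam I)^{-1} A^T y."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

# Ill-conditioned toy system: regularisation stabilises the estimate.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 5))
A[:, 4] = A[:, 3] + 1e-6 * rng.normal(size=20)  # nearly collinear columns
x_true = np.array([1.0, -2.0, 0.5, 3.0, 3.0])
y = A @ x_true + 0.01 * rng.normal(size=20)
x_hat = tikhonov_solve(A, y, lam=0.1)
```

Regularisation matters here because near-collinear columns (like strongly overlapping spectral channels) make the unregularised normal equations numerically unstable.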

• [cs.CV]**Learning to Sketch Human Facial Portraits using Personal Styles by Case-Based Reasoning**

*Bingwen Jin, Songhua Xu, Weidong Geng*

http://arxiv.org/abs/1607.02715v1

This paper employs case-based reasoning (CBR) to capture the personal styles of individual artists and accordingly generate human facial portraits from photos. For each human artist to be mimicked, a series of cases is first built up from her/his exemplars of source facial photos and hand-drawn sketches; stylization of a facial photo is then cast as a style-transferring process of iterative refinement, in which best-fit cases are retrieved and applied in a sense of style optimization. Two models, a fitness evaluation model and a parameter estimation model, are learned from these cases for case retrieval and case adaptation respectively. The fitness evaluation model decides which case is best fitted to the sketching of current interest, and the parameter estimation model automates case adaptation. The resultant sketch is synthesized progressively through an iterative loop of retrieval and adaptation of candidate cases until the desired aesthetic style is achieved. To explore the effectiveness and advantages of the novel approach, we experimentally compare the sketch portraits generated by the proposed method with those of a state-of-the-art example-based facial sketch generation algorithm as well as a couple of commercial software packages. The comparisons reveal that our CBR-based synthesis method for facial portraits is superior to the peer methods in both capturing and reproducing artists' personal illustration styles.

• [cs.CV]**Memory Efficient Nonuniform Quantization for Deep Convolutional Neural Network**

*Fangxuan Sun, Jun Lin*

http://arxiv.org/abs/1607.02720v1

The convolutional neural network (CNN) is one of the most famous deep learning algorithms and has been applied widely due to its remarkable performance. Real-time hardware implementation of CNNs is in high demand given their excellent performance in computer vision, but the memory cost of a deep CNN is very large, which increases the area of a hardware implementation. In this paper, we apply several methods to the quantization of CNNs and use about 5 bits for the convolutional layers. The accuracy loss is less than 2% without fine-tuning. Our experiments are based on the VGG-16 and AlexNet models. For VGG-16, the total memory needed after uniform quantization is 16.85 MB per image, while the total memory needed after our quantization is only about 8.42 MB. Our quantization method thus saves 50.0% of the memory needed by VGG-16 and AlexNet compared with the state-of-the-art quantization method.
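
The abstract does not spell out the nonuniform quantization scheme; a common way to obtain a nonuniform codebook of 2^b levels is one-dimensional k-means over the weight values. A minimal sketch under that assumption (toy weights, not the authors' VGG-16 pipeline):

```python
import numpy as np

def kmeans_quantize(weights, bits, iters=20):
    """Nonuniform quantization: cluster the weight values into 2^bits
    centroids with 1-D k-means and map each weight to its nearest centroid."""
    k = 2 ** bits
    w = weights.ravel()
    centroids = np.linspace(w.min(), w.max(), k)  # uniform initialisation
    for _ in range(iters):
        idx = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
        for j in range(k):
            if np.any(idx == j):
                centroids[j] = w[idx == j].mean()
    idx = np.abs(w[:, None] - centroids[None, :]).argmin(axis=1)
    return centroids[idx].reshape(weights.shape), centroids

rng = np.random.default_rng(1)
w = rng.normal(scale=0.05, size=(64, 64))   # toy conv-layer weights
wq, codebook = kmeans_quantize(w, bits=5)
```

Because network weights cluster near zero, a data-driven codebook places more levels where the mass is, which is why nonuniform schemes typically lose less accuracy than uniform ones at the same bit width.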

• [cs.CV]**Salient Region Detection and Segmentation in Images using Dynamic Mode Decomposition**

*Sikha O K, Sachin Kumar S, K P Soman*

http://arxiv.org/abs/1607.03021v1

Visual saliency is the capability of a vision system to select distinctive parts of a scene and reduce the amount of visual data that needs to be processed. The present paper introduces (1) a novel approach to detect salient regions by considering color- and luminance-based saliency scores using Dynamic Mode Decomposition (DMD), and (2) a new interpretation that allows the DMD approach to be used in static image processing. This approach integrates two data analysis methods: (1) the Fourier transform and (2) Principal Component Analysis. The key idea of our work is to create a color-based saliency map, based on the observation that the salient part of an image usually has distinct colors compared to the remaining portion of the image. We exploit the power of different color spaces to model the complex and nonlinear behavior of the human visual system and generate a color-based saliency map. To further improve the final saliency map, we utilize luminance information, exploiting the fact that the human eye is more sensitive to brightness than to color. The experimental results show that our method based on DMD theory is effective in comparison with previous state-of-the-art saliency estimation approaches. The approach presented in this paper is evaluated using ROC curves, the F-measure rate, the precision-recall rate, the AUC score, etc.

• [cs.CV]**Towards an "In-the-Wild" Emotion Dataset Using a Game-based Framework**

*Wei Li, Farnaz Abtahi, Christina Tsangouri, Zhigang Zhu*

http://arxiv.org/abs/1607.02678v1

In order to create an "in-the-wild" dataset of facial emotions with a large number of balanced samples, this paper proposes a game-based data collection framework. The framework mainly includes three components: a game engine, a game interface, and a data collection and evaluation module. We use a deep learning approach to build an emotion classifier as the game engine. We then build an emotion web game that allows gamers to enjoy playing while the data collection module automatically obtains labelled emotion images. Using our game, we collected more than 15,000 images within a month of the test run and built an emotion dataset, "GaMo". To evaluate the dataset, we compared the performance of two deep learning models trained on GaMo and on CIFE. The results of our experiments show that, being large and balanced, GaMo can be used to build a more robust emotion detector than one trained on CIFE, the dataset used in the game engine to collect the face images.

• [cs.CV]**Transition Forests: Learning Discriminative Temporal Transitions for Action Recognition**

*Guillermo Garcia-Hernando, Tae-Kyun Kim*

http://arxiv.org/abs/1607.02737v1

A human action can be seen as transitions between one's body poses over time, where a transition depicts a temporal relation between two poses. Recognizing actions thus involves learning a classifier sensitive to these pose transitions from given high-dimensional frame representations. In this paper, we introduce transition forests, an ensemble of decision trees that learn transitions between pairs of two independent frames in a discriminative fashion. During training, node splitting is driven by alternating two different criteria: the standard classification entropy, which maximizes the discrimination power in individual frames, and a proposed criterion over pairwise frame transitions. Growing the trees tends to group frames that have similar associated transitions and share the same action label. Unlike conventional classification trees, where the best split is determined node-wise, transition forests try to find the best split of nodes jointly (within a layer) to incorporate distant node transitions. When inferring the class label of a new video, frames are independently passed down the trees (and are thus highly efficient to process); a prediction for a given time-frame is then made, in an efficient manner, based on the transitions between previously observed frames and the current one. We apply our method to varied action recognition datasets, showing its suitability over several baselines and state-of-the-art approaches.

• [cs.CV]**Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks**

*Tianfan Xue, Jiajun Wu, Katherine L. Bouman, William T. Freeman*

http://arxiv.org/abs/1607.02586v1

We study the problem of synthesizing a number of likely future frames from a single input image. In contrast to traditional methods, which have tackled this problem in a deterministic or non-parametric way, we propose a novel approach that models future frames in a probabilistic manner. Our probabilistic model makes it possible for us to sample and synthesize many possible future frames from a single input image. Future frame synthesis is challenging, as it involves low- and high-level image and motion understanding. We propose a novel network structure, namely a Cross Convolutional Network to aid in synthesizing future frames; this network structure encodes image and motion information as feature maps and convolutional kernels, respectively. In experiments, our model performs well on synthetic data, such as 2D shapes and animated game sprites, as well as on real-world videos. We also show that our model can be applied to tasks such as visual analogy-making, and present an analysis of the learned network representations.

• [cs.DB]**Extending Weakly-Sticky Datalog+/-: Query-Answering Tractability and Optimizations**

*Mostafa Milani, Leopoldo Bertossi*

http://arxiv.org/abs/1607.02682v1

Weakly-sticky (WS) Datalog+/- is an expressive member of the family of Datalog+/- programs that is based on the syntactic notions of stickiness and weak-acyclicity. Query answering over the WS programs has been investigated, but there is still much work to do on the design and implementation of practical query answering (QA) algorithms and their optimizations. Here, we study sticky and WS programs from the point of view of the behavior of the chase procedure, extending the stickiness property of the chase to that of generalized stickiness of the chase (gsch-property). With this property we specify the semantic class of GSCh programs, which includes sticky and WS programs, and other syntactic subclasses that we identify. In particular, we introduce joint-weakly-sticky (JWS) programs, that include WS programs. We also propose a bottom-up QA algorithm for a range of subclasses of GSCh. The algorithm runs in polynomial time (in data) for JWS programs. Unlike the WS class, JWS is closed under a general magic-sets rewriting procedure for the optimization of programs with existential rules. We apply the magic-sets rewriting in combination with the proposed QA algorithm for the optimization of QA over JWS programs.

• [cs.DC]**AccuracyTrader: Accuracy-aware Approximate Processing for Low Tail Latency and High Result Accuracy in Cloud Online Services**

*Rui Han, Siguang Huang, Fei Tang, Fugui Chang, Jianfeng Zhan*

http://arxiv.org/abs/1607.02734v1

Modern latency-critical online services such as search engines often process requests by consulting large input data spanning massive parallel components. Hence the tail latency of these components determines the service latency. To trade off result accuracy for tail latency reduction, existing techniques use the components responding before a specified deadline to produce approximate results. However, they may skip a large proportion of components when load gets heavier, thus incurring large accuracy losses. This paper presents AccuracyTrader that produces approximate results with small accuracy losses while maintaining low tail latency. AccuracyTrader aggregates information of input data on each component to create a small synopsis, thus enabling all components to produce initial results quickly using their synopses. AccuracyTrader also uses synopses to identify the parts of input data most related to an arbitrary request's result accuracy, and uses these parts first to improve the produced results in order to minimize accuracy losses. We evaluated AccuracyTrader using workloads in real services. The results show: (i) AccuracyTrader reduces tail latency by over 40 times with accuracy losses of less than 7% compared to existing exact processing techniques; (ii) at the same latency, AccuracyTrader reduces accuracy losses by over 13 times compared to existing approximate processing techniques.

• [cs.DC]**Design Patterns in Beeping Algorithms**

*Arnaud Casteigts, Yves Métivier, John Michael Robson, Akka Zemmari*

http://arxiv.org/abs/1607.02951v1

We consider networks of processes which interact with beeps. In the basic model defined by Cornejo and Kuhn, which we refer to as the $BL$ variant, processes can choose in each round either to beep or to listen. Those who beep are unable to detect simultaneous beeps. Those who listen can only distinguish between silence and the presence of at least one beep. Stronger variants exist where the nodes can also detect collision while they are beeping ($B_{cd}L$) or listening ($BL_{cd}$), or both ($B_{cd}L_{cd}$). This paper starts with a discussion on generic building blocks ({\em design patterns}) which seem to occur frequently in the design of beeping algorithms. They include {\em multi-slot phases}, {\em exclusive beeps}, {\em adaptive probability}, {\em internal} (resp. {\em peripheral}) collision detection, and {\em emulation} of collision detection when it is not available as a primitive. The paper then provides algorithms for a number of basic problems, including colouring, 2-hop colouring, degree computation, 2-hop MIS, and collision detection (in $BL$). Using the patterns, we formulate these algorithms in a rather concise and intuitive way. Their analyses are more technical. One of them relies on a Martingale technique with non-independent variables. Another of our analyses improves that of the MIS algorithm by Jeavons et al. by getting rid of a gigantic constant (the asymptotic order was already optimal). Finally, we study the relative power of several variants of beeping models. In particular, we explain how {\em every} Las Vegas algorithm with collision detection can be converted, through emulation, into a Monte Carlo algorithm without, at the cost of a logarithmic slowdown. We prove this is optimal up to a constant factor by giving a matching lower bound, and provide an example of use for solving the MIS problem in $BL$.

• [cs.DC]**Enhancing HPC Security with a User-Based Firewall**

*Andrew Prout, William Arcand, David Bestor, Bill Bergeron, Chansup Byun, Vijay Gadepally, Matthew Hubbell, Michael Houle, Michael Jones, Peter Michaleas, Lauren Milechin, Julie Mullen, Antonio Rosa, Siddharth Samsi, Albert Reuther, Jeremy Kepner*

http://arxiv.org/abs/1607.02982v1

HPC systems traditionally allow their users unrestricted use of their internal network. While this network is normally controlled enough to guarantee privacy without the need for encryption, it does not provide a method to authenticate peer connections. Protocols built upon this internal network must provide their own authentication. Many methods have been employed to perform this authentication. However, support for all of these methods requires the HPC application developer to include support and the user to configure and enable these services. The user-based firewall capability we have prototyped enables a set of rules governing connections across the HPC internal network to be put into place using Linux netfilter. By using an operating system-level capability, the system is not reliant on any developer or user actions to enable security. The rules we have chosen and implemented are crafted to not impact the vast majority of users and be completely invisible to them.

• [cs.DC]**High-Level Programming Abstractions for Distributed Graph Processing**

*Vasiliki Kalavri, Vladimir Vlassov, Seif Haridi*

http://arxiv.org/abs/1607.02646v1

Efficient processing of large-scale graphs in distributed environments has been an increasingly popular topic of research in recent years. Inter-connected data that can be modeled as graphs arise in application domains such as machine learning, recommendation, web search, and social network analysis. Writing distributed graph applications is inherently hard and requires programming models that can cover a diverse set of problem domains, including iterative refinement algorithms, graph transformations, graph aggregations, pattern matching, ego-network analysis, and graph traversals. Several high-level programming abstractions have been proposed and adopted by distributed graph processing systems and big data platforms. Even though significant work has been done to experimentally compare distributed graph processing frameworks, no qualitative study and comparison of graph programming abstractions has been conducted yet. In this survey, we review and analyze the most prevalent high-level programming models for distributed graph processing, in terms of their semantics and applicability. We identify the classes of graph applications that can be naturally expressed by each abstraction and we also give examples of applications that are hard or impossible to express. We review 34 distributed graph processing systems with respect to their programming abstractions, execution models, and communication mechanisms. Finally, we discuss trends and open research questions in the area of distributed graph processing.

• [cs.DL]**BioInfoBase : A Bioinformatics Resourceome**

*Saeid Kadkhodaei, Fatemeh Barantalab, Sima Taheri, Majid Foroughi, Farahnaz Golestan Hashemi, Mahmood Reza Shabanimofrad, Hossein Hosseinimonfared, Morvarid Akhavan Rezaei, Ali Ranjbarfard, Mahbod Sahebi, Parisa Azizi, Maryam Dadar, Rambod Abiri, Mohammad Fazel Harighi, Nahid Kalhori, Mohammad Reza Etemadi, Ali Baradaran, Mahmoud Danaee, Zahra Azhdari, Hamid Rajabi Memari, vajiheh Safavi, Naser Tajabadi, Faruku Bande*

http://arxiv.org/abs/1607.02974v1

Over the past decade there has been significant growth in bioinformatics databases, tools and resources. Although bioinformatics is becoming more specialized, the increasing number of bioinformatics resources has made it difficult for researchers to find the databases, tools or methods that best match their needs. Our coordinated effort has been planned to establish a reference website in bioinformatics as a public repository of tools, databases, directories and resources, annotated with contextual information and organized by functional relevance. Within the first phase of BioInfoBase development, 22 experts in different fields of molecular biology contributed and more than 2500 records were registered, a number which is increasing daily. For each record submitted to the website's database, almost all related data (40 features) have been extracted, ranging from the biological category and subcategory to the scientific article and developer information. Searching for query keyword(s) returns links containing the entered keyword(s) found within the different features of the records, with more weight on the title, abstract and application fields. The search results simply provide users with the most informative features of the records so that they can select the most suitable ones. The usefulness of the returned results is ranked by matching score based on the Term Frequency-Inverse Document Frequency (TF-IDF) method. This search engine therefore screens a comprehensive index of bioinformatics tools, databases and resources and provides the records (links) best suited to the researcher's needs. The BioInfoBase resource is available at www.bioinfobase.info.
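
The TF-IDF ranking the abstract relies on can be sketched in a few lines. This is a generic TF-IDF scorer, not BioInfoBase's actual implementation; the field weighting (title, abstract, application) mentioned above is simplified away:

```python
import math
from collections import Counter

def tfidf_rank(query, docs):
    """Score each document against the query with TF-IDF and
    return document indices sorted by descending score."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter()                       # document frequency per term
    for toks in tokenized:
        df.update(set(toks))
    def score(toks):
        tf = Counter(toks)
        return sum(tf[t] / len(toks) * math.log(n / df[t])
                   for t in query.lower().split() if df[t])
    scores = [score(toks) for toks in tokenized]
    return sorted(range(n), key=lambda i: -scores[i])

docs = ["protein structure alignment tool",
        "genome assembly pipeline",
        "protein protein interaction database"]
ranking = tfidf_rank("protein database", docs)  # most relevant record first
```

The IDF factor is what keeps ubiquitous words from dominating: a term appearing in every record contributes log(n/n) = 0 to every score.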

• [cs.GT]**From Behavior to Sparse Graphical Games: Efficient Recovery of Equilibria**

*Asish Ghoshal, Jean Honorio*

http://arxiv.org/abs/1607.02959v1

In this paper we study the problem of exact recovery of the pure-strategy Nash equilibria (PSNE) set of a graphical game from noisy observations of joint actions of the players alone. We consider sparse linear influence games --- a parametric class of graphical games with linear payoffs, and represented by directed graphs of n nodes (players) and in-degree of at most k. We present an $\ell_1$-regularized logistic regression based algorithm for recovering the PSNE set exactly, that is both computationally efficient --- i.e. runs in polynomial time --- and statistically efficient --- i.e. has logarithmic sample complexity. Specifically, we show that the sufficient number of samples required for exact PSNE recovery scales as $\mathcal{O}(\mathrm{poly}(k) \log n)$. We also validate our theoretical results using synthetic experiments.
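
The $\ell_1$-regularised logistic regression at the core of the algorithm can be sketched with proximal gradient descent (ISTA). This is a generic sketch on synthetic data, not the authors' PSNE-recovery code; all names are illustrative:

```python
import numpy as np

def l1_logistic(X, y, lam=0.05, lr=0.1, iters=500):
    """Proximal-gradient (ISTA) fit of L1-penalised logistic regression:
    min_w  mean(log(1 + exp(-y * Xw))) + lam * ||w||_1,  with y in {-1, +1}."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(iters):
        margins = y * (X @ w)
        grad = -(X * (y / (1 + np.exp(margins)))[:, None]).mean(axis=0)
        w = w - lr * grad
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # soft-threshold
    return w

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))          # observed joint actions (toy data)
w_true = np.zeros(10)
w_true[:2] = [2.0, -2.0]                # sparse influence: only 2 neighbours
y = np.sign(X @ w_true + 0.1 * rng.normal(size=200))
w_hat = l1_logistic(X, y)
```

The soft-thresholding step is what produces exact zeros, matching the sparse (in-degree at most k) structure the paper assumes.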

• [cs.HC]**Augmenting Supervised Emotion Recognition with Rule-Based Decision Model**

*Amol Patwardhan, Gerald Knapp*

http://arxiv.org/abs/1607.02660v1

The aim of this research is the development of a rule-based decision model for emotion recognition. This research also proposes using the rules to augment inter-corpus recognition accuracy in multimodal systems that use supervised learning techniques. The classifiers for such learning-based recognition systems are susceptible to overfitting and only perform well on intra-corpus data. To overcome this limitation, this research proposes using the rule-based model as an additional modality. The rules were developed using raw feature data from the visual channel, based on human annotator agreement and on existing studies that have attributed movements and postures to emotions. The outcomes of the rule evaluations were combined during the decision phase of the emotion recognition system. The results indicate that rule-based emotion recognition augments the recognition accuracy of learning-based systems and also provides better recognition rates across inter-corpus emotion test data.

• [cs.HC]**Multimodal Affect Recognition using Kinect**

*Amol Patwardhan, Gerald Knapp*

http://arxiv.org/abs/1607.02652v1

Affect (emotion) recognition has gained significant attention from researchers in the past decade. Emotion-aware computer systems and devices have many applications, ranging from interactive robots and intelligent online tutors to emotion-based navigation assistants. In this research, data from multiple modalities such as face, head, hand, body and speech were utilized for affect recognition. The research used a color and depth sensing device, the Kinect, for facial feature extraction and for tracking human body joints. Temporal features across multiple frames were used for affect recognition. Event-driven decision-level fusion was used to combine the results from each individual modality, using majority voting to recognize the emotions. The study also implemented affect recognition by matching the features to rule-based emotion templates per modality. Experiments showed that multimodal affect recognition rates using a combination of emotion templates and supervised learning were better than recognition rates based on supervised learning alone. Recognition rates obtained using temporal features were higher than recognition rates obtained using position-based features only.

• [cs.IR]**Hybrid Recommender System Based on Personal Behavior Mining**

*Zhiyuan Fang, Lingqi Zhang, Kun Chen*

http://arxiv.org/abs/1607.02754v1

Recommender systems are best known for their applications on e-commerce sites and are mostly static models. Classical personalized recommender algorithms include the item-based collaborative filtering method applied at Amazon and the matrix factorization based collaborative filtering algorithm from Netflix. In this article, we combine a traditional model with a behavior pattern extraction method. We use desensitized mobile transaction records provided by T-mall, Alibaba to build a hybrid dynamic recommender system. Sequential pattern mining, which aims to find frequent sequential patterns in a sequence database, is applied in this hybrid model to predict customers' payment behavior, thus contributing to the accuracy of the model.
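
Sequential pattern mining as described above amounts to counting, across a sequence database, how many sequences contain each ordered (not necessarily contiguous) subsequence. A minimal sketch (the toy action names are illustrative, not T-mall's schema):

```python
from itertools import combinations
from collections import Counter

def frequent_subsequences(sequences, length, min_support):
    """Support = number of sequences containing the ordered subsequence.
    Keep patterns whose support meets min_support."""
    support = Counter()
    for seq in sequences:
        # combinations() preserves order, so these are ordered subsequences;
        # the set() counts each pattern at most once per sequence.
        support.update(set(combinations(seq, length)))
    return {pat: c for pat, c in support.items() if c >= min_support}

# toy transaction logs: each inner list is one customer's ordered actions
logs = [["view", "cart", "pay"],
        ["view", "view", "cart", "pay"],
        ["view", "cart"],
        ["cart", "pay"]]
patterns = frequent_subsequences(logs, length=2, min_support=3)
```

Real miners (e.g. PrefixSpan-style algorithms) prune the candidate space instead of enumerating all subsequences, but the support-counting semantics are the same.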

• [cs.IR]**Randomised Relevance Model**

*Dominik Wurzer, Miles Osborne, Victor Lavrenko*

http://arxiv.org/abs/1607.02641v1

Relevance Models are well-known retrieval models and capable of producing competitive results. However, because they use query expansion they can be very slow. We address this slowness by incorporating two variants of locality sensitive hashing (LSH) into the query expansion process. Results on two document collections suggest that we can obtain large reductions in the amount of work, with a small reduction in effectiveness. Our approach is shown to be additive when pruning query terms.
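
One standard LSH variant for the cosine similarity involved in query expansion is random-hyperplane hashing (SimHash). A minimal sketch of the idea, not the paper's exact construction:

```python
import numpy as np

def simhash_signatures(vectors, n_bits, seed=0):
    """Random-hyperplane LSH: each vector gets an n_bits binary signature;
    vectors at small cosine distance tend to agree on most bits."""
    rng = np.random.default_rng(seed)
    planes = rng.normal(size=(n_bits, vectors.shape[1]))
    return (vectors @ planes.T > 0).astype(int)

rng = np.random.default_rng(3)
v = rng.normal(size=50)
near = v + 0.01 * rng.normal(size=50)   # almost identical direction
far = rng.normal(size=50)               # unrelated direction
sigs = simhash_signatures(np.stack([v, near, far]), n_bits=32)
ham_near = int((sigs[0] != sigs[1]).sum())
ham_far = int((sigs[0] != sigs[2]).sum())
```

Because similar vectors collide in the same hash buckets with high probability, candidate expansion terms can be retrieved by signature lookup instead of exhaustive comparison, which is where the speed-up comes from.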

• [cs.IT]**Bounds on the Number of Measurements for Reliable Compressive Classification**

*Hugo Reboredo, Francesco Renna, Robert Calderbank, Miguel R. D. Rodrigues*

http://arxiv.org/abs/1607.02801v1

This paper studies the classification of high-dimensional Gaussian signals from low-dimensional noisy, linear measurements. In particular, it provides upper bounds (sufficient conditions) on the number of measurements required to drive the probability of misclassification to zero in the low-noise regime, both for random measurements and designed ones. Such bounds reveal two important operational regimes that are a function of the characteristics of the source: i) when the number of classes is less than or equal to the dimension of the space spanned by signals in each class, reliable classification is possible in the low-noise regime by using a one-vs-all measurement design; ii) when the dimension of the spaces spanned by signals in each class is lower than the number of classes, reliable classification is guaranteed in the low-noise regime by using a simple random measurement design. Simulation results both with synthetic and real data show that our analysis is sharp, in the sense that it is able to gauge the number of measurements required to drive the misclassification probability to zero in the low-noise regime.

• [cs.IT]**Linear signal recovery from $b$-bit-quantized linear measurements: precise analysis of the trade-off between bit depth and number of measurements**

*Martin Slawski, Ping Li*

http://arxiv.org/abs/1607.02649v1

We consider the problem of recovering a high-dimensional structured signal from independent Gaussian linear measurements each of which is quantized to $b$ bits. Our interest is in linear approaches to signal recovery, where "linear" means that non-linearity resulting from quantization is ignored and the observations are treated as if they arose from a linear measurement model. Specifically, the focus is on a generalization of a method for one-bit observations due to Plan and Vershynin [\emph{IEEE~Trans. Inform. Theory, \textbf{59} (2013), 482--494}]. At the heart of the present paper is a precise characterization of the optimal trade-off between the number of measurements $m$ and the bit depth per measurement $b$ given a total budget of $B = m \cdot b$ bits when the goal is to minimize the $\ell_2$-error in estimating the signal. It turns out that the choice $b = 1$ is optimal for estimating the unit vector (direction) corresponding to the signal for any level of additive Gaussian noise before quantization as well as for a specific model of adversarial noise, while the choice $b = 2$ is optimal for estimating the direction and the norm (scale) of the signal. Moreover, Lloyd-Max quantization is shown to be an optimal quantization scheme w.r.t. $\ell_2$-estimation error. Our analysis is corroborated by numerical experiments showing nearly perfect agreement with our theoretical predictions. The paper is complemented by an empirical comparison to alternative methods of signal recovery taking the non-linearity resulting from quantization into account. The results of that comparison point to a regime change depending on the noise level: in a low-noise setting, linear signal recovery falls short of more sophisticated competitors while being competitive in moderate- and high-noise settings.
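
The "linear" recovery approach for $b = 1$ can be sketched directly: treat the sign measurements as if they were linear and back-project. A toy sketch in the spirit of the Plan-Vershynin estimator for the signal's direction (dimensions and data are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 100, 5000
x = np.zeros(n)
x[:5] = 1.0
x /= np.linalg.norm(x)                 # unit-norm direction to recover
A = rng.normal(size=(m, n))            # Gaussian measurement matrix
y = np.sign(A @ x)                     # 1-bit quantized measurements

# Linear recovery: ignore the non-linearity, back-project the signs,
# then renormalise; x_hat estimates the direction of x.
x_hat = A.T @ y / m
x_hat /= np.linalg.norm(x_hat)
corr = float(x_hat @ x)                # close to 1 when recovery succeeds
```

The key fact behind this is that for Gaussian A and unit-norm x, the expectation of A^T sign(Ax)/m is proportional to x, so back-projection is consistent for the direction even though all magnitude information is quantized away.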

• [cs.IT]**Minimum Description Length Principle in Supervised Learning with Application to Lasso**

*Masanori Kawakita, Jun'ichi Takeuchi*

http://arxiv.org/abs/1607.02914v1

The minimum description length (MDL) principle in supervised learning is studied. One of the most important theories for the MDL principle is Barron and Cover's theory (BC theory), which gives a mathematical justification of the MDL principle. The original BC theory, however, can be applied to supervised learning only approximately and limitedly. Though Barron et al. recently succeeded in removing a similar approximation in case of unsupervised learning, their idea cannot be essentially applied to supervised learning in general. To overcome this issue, an extension of BC theory to supervised learning is proposed. The derived risk bound has several advantages inherited from the original BC theory. First, the risk bound holds for finite sample size. Second, it requires remarkably few assumptions. Third, the risk bound has a form of redundancy of the two-stage code for the MDL procedure. Hence, the proposed extension gives a mathematical justification of the MDL principle to supervised learning like the original BC theory. As an important example of application, new risk and (probabilistic) regret bounds of lasso with random design are derived. The derived risk bound holds for any finite sample size $n$ and feature number $p$ even if $n\ll p$ without boundedness of features in contrast to the past work. Behavior of the regret bound is investigated by numerical simulations. We believe that this is the first extension of BC theory to general supervised learning with random design without approximation.

• [cs.IT]**New approach to Bayesian high-dimensional linear regression**

*Shirin Jalali, Arian Maleki*

http://arxiv.org/abs/1607.02613v1

Consider the problem of estimating parameters $X^n \in \mathbb{R}^n$, generated by a stationary process, from $m$ response variables $Y^m = AX^n + Z^m$, under the assumption that the distribution of $X^n$ is known. This is the most general version of the Bayesian linear regression problem. The lack of computationally feasible algorithms that can employ generic prior distributions and provide a good estimate of $X^n$ has limited the set of distributions researchers use to model the data. In this paper, a new scheme called Q-MAP is proposed. The new method has the following properties: (i) It has similarities to the popular MAP estimation under the noiseless setting. (ii) In the noiseless setting, it achieves the "asymptotically optimal performance" when $X^n$ has independent and identically distributed components. (iii) It scales favorably with the dimensions of the problem and therefore is applicable to high-dimensional setups. (iv) The solution of the Q-MAP optimization can be found via a proposed iterative algorithm which is provably robust to the error (noise) in the response variables.

• [cs.LG]**Classifier Risk Estimation under Limited Labeling Resources**

*Anurag Kumar, Bhiksha Raj*

http://arxiv.org/abs/1607.02665v1

In this paper we propose strategies for estimating the performance of a classifier when labels cannot be obtained for the whole test set. The number of test instances which can be labeled is very small compared to the whole test data size. The goal then is to obtain a precise estimate of classifier performance using as little labeling resource as possible. Specifically, we ask how to select a subset of the large test set for labeling such that the performance of a classifier estimated on this subset is as close as possible to the one on the whole test set. We propose strategies based on stratified sampling for selecting this subset. We show that these strategies can reduce the variance in the estimation of classifier accuracy by a significant amount compared to simple random sampling (over 65% in several cases). Hence, our proposed methods are much more precise than random sampling for accuracy estimation under restricted labeling resources. The reduction in the number of samples required (compared to random sampling) to estimate the classifier accuracy with only 1% error is as high as 60% in some cases.
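As a rough illustration of why stratified sampling helps here (a toy numpy sketch with synthetic data, not the authors' estimators; the confidence flag used as a stratification variable is an assumption of this example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical test set of N instances: the classifier is accurate (~0.9)
# where it is confident and poor (~0.1) elsewhere, so the confidence flag
# makes a natural stratification variable.
N = 10000
confident = rng.uniform(size=N) < 0.5
correct = (rng.uniform(size=N) < np.where(confident, 0.9, 0.1)).astype(float)
true_acc = correct.mean()

def simple_random(n):
    """Estimate accuracy from n uniformly sampled labels."""
    return correct[rng.choice(N, n, replace=False)].mean()

def stratified(n):
    """Estimate accuracy with proportional allocation over the two strata."""
    est = 0.0
    for stratum in (confident, ~confident):
        members = np.flatnonzero(stratum)
        k = round(n * len(members) / N)
        picked = rng.choice(members, k, replace=False)
        est += (len(members) / N) * correct[picked].mean()
    return est

srs = np.array([simple_random(100) for _ in range(1000)])
strat = np.array([stratified(100) for _ in range(1000)])
print(srs.var(), strat.var())  # stratified variance is markedly smaller
```

Because within-stratum accuracy is far less variable than overall accuracy, the stratified estimator has much lower variance at the same labeling budget, which is the effect the paper quantifies.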

• [cs.LG]**Dealing with Class Imbalance using Thresholding**

*Charmgil Hong, Rumi Ghosh, Soundar Srinivasan*

http://arxiv.org/abs/1607.02705v1

We propose thresholding as an approach to deal with class imbalance. We define thresholding as the process of determining a decision boundary in the presence of a tunable parameter; the threshold is the maximum value of this parameter at which the conditions of a certain decision are satisfied. We show that thresholding is applicable not only to linear classifiers but also to non-linear classifiers, and that it is the implicit assumption behind many approaches to class imbalance in linear classifiers. We then extend this paradigm beyond linear classification and show how non-linear classification can be handled under this umbrella framework of thresholding. The proposed method can be used for outlier detection in many real-life scenarios, such as manufacturing. In advanced manufacturing units, where the manufacturing process has matured over time, instances (or parts) of the product that need to be rejected (based on a strict regime of quality tests) become relatively rare and are defined as outliers. How can these rare parts or outliers be detected beforehand? How can the combinations of conditions leading to them be detected? These questions motivate our research. This paper addresses the prediction of outliers, and the conditions leading to them, by casting outlier detection as classification: the classes are good parts (those passing the quality tests) and bad parts (those failing the quality tests, which can be considered outliers). The rarity of outliers turns this into a class-imbalanced classification problem.
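A minimal sketch of the thresholding idea on an imbalanced toy problem (synthetic scores, not the paper's classifiers; the score distributions and the balanced-accuracy objective are assumptions of this example):

```python
import numpy as np

rng = np.random.default_rng(1)

# Imbalanced toy data: 2% "bad parts" (outliers). A hypothetical classifier
# emits a score that is higher on average for bad parts.
scores = np.concatenate([rng.normal(0.3, 0.1, 4900),   # good parts
                         rng.normal(0.6, 0.1, 100)])   # bad parts (rare)
labels = np.concatenate([np.zeros(4900), np.ones(100)])

def balanced_accuracy(threshold):
    pred = scores >= threshold
    tpr = pred[labels == 1].mean()       # recall on the rare class
    tnr = (~pred)[labels == 0].mean()    # specificity on the common class
    return (tpr + tnr) / 2

# Thresholding: tune the decision boundary (the tunable parameter) instead
# of resampling or reweighting the data.
grid = np.linspace(0.0, 1.0, 101)
best = grid[np.argmax([balanced_accuracy(t) for t in grid])]
print(best, balanced_accuracy(0.5), balanced_accuracy(best))
```

Sweeping the cut-off and picking the best value under an imbalance-aware metric is the simplest instance of treating the decision boundary itself as the object to optimize.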

• [cs.LG]**Incremental Factorization Machines for Persistently Cold-starting Online Item Recommendation**

*Takuya Kitazawa*

http://arxiv.org/abs/1607.02858v1

Real-world item recommenders commonly suffer from a persistent cold-start problem caused by dynamically changing users and items. To overcome this problem, several context-aware recommendation techniques have recently been proposed. In terms of both feasibility and performance, the factorization machine (FM) is one of the most promising methods, as a generalization of conventional matrix factorization techniques. However, because dynamic data calls for online algorithms, static FMs are still inadequate. This paper therefore proposes incremental FMs (iFMs), a general online factorization framework, and specializes iFMs into an online item recommender. The proposed framework can serve as a promising baseline for further development of production recommender systems. Evaluation is done empirically on both synthetic and real-world unstable datasets.
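For reference, the FM model underlying (i)FMs can be evaluated in O(kn) time via a standard algebraic identity; a small sketch (dimensions and weights below are illustrative, not from the paper, and the incremental update itself is not shown):

```python
import numpy as np

rng = np.random.default_rng(8)

# Factorization machine prediction with the O(kn) pairwise term:
# sum_{i<j} <v_i, v_j> x_i x_j
#   = 0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ]
n, k = 6, 3                          # features, latent factors
w0, w = 0.1, rng.normal(size=n)      # global bias and linear weights
V = rng.normal(scale=0.1, size=(n, k))

def fm_predict(x):
    s = V.T @ x                                          # shape (k,)
    pair = 0.5 * (s @ s - (x * x) @ (V * V).sum(axis=1))
    return w0 + w @ x + pair

x = rng.normal(size=n)
# brute-force pairwise sum, for verification of the identity
brute = w0 + w @ x + sum((V[i] @ V[j]) * x[i] * x[j]
                         for i in range(n) for j in range(i + 1, n))
print(fm_predict(x), brute)   # identical up to floating-point error
```

The O(kn) form is what makes per-event (incremental) gradient updates cheap enough for online use.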

• [cs.LG]**Kernel-based methods for bandit convex optimization**

*Sébastien Bubeck, Ronen Eldan, Yin Tat Lee*

http://arxiv.org/abs/1607.03084v1

We consider the adversarial convex bandit problem and we build the first $\mathrm{poly}(T)$-time algorithm with $\mathrm{poly}(n) \sqrt{T}$-regret for this problem. To do so we introduce three new ideas in the derivative-free optimization literature: (i) kernel methods, (ii) a generalization of Bernoulli convolutions, and (iii) a new annealing schedule for exponential weights (with increasing learning rate). The basic version of our algorithm achieves $\tilde{O}(n^{9.5} \sqrt{T})$-regret, and we show that a simple variant of this algorithm can be run in $\mathrm{poly}(n \log(T))$-time per step at the cost of an additional $\mathrm{poly}(n) T^{o(1)}$ factor in the regret. These results improve upon the $\tilde{O}(n^{11} \sqrt{T})$-regret and $\exp(\mathrm{poly}(T))$-time result of the first two authors, and the $\log(T)^{\mathrm{poly}(n)} \sqrt{T}$-regret and $\log(T)^{\mathrm{poly}(n)}$-time result of Hazan and Li. Furthermore we conjecture that another variant of the algorithm could achieve $\tilde{O}(n^{1.5} \sqrt{T})$-regret, and moreover that this regret is unimprovable (the current best lower bound being $\Omega(n \sqrt{T})$ and it is achieved with linear functions). For the simpler situation of zeroth order stochastic convex optimization this corresponds to the conjecture that the optimal query complexity is of order $n^3 / \epsilon^2$.

• [cs.LG]**Learning a metric for class-conditional KNN**

*Daniel Jiwoong Im, Graham W. Taylor*

http://arxiv.org/abs/1607.03050v1

Naive Bayes Nearest Neighbour (NBNN) is a simple and effective framework which addresses many of the pitfalls of K-Nearest Neighbour (KNN) classification. It has yielded competitive results on several computer vision benchmarks. Its central tenet is that during NN search a query should not be compared to every example in a database while ignoring class information. Instead, NN searches are performed within each class, generating a score per class. A key problem with NN techniques, including NBNN, is that they fail when the data representation does not capture perceptual (e.g.~class-based) similarity. NBNN circumvents this by using independent engineered descriptors (e.g.~SIFT). To extend its applicability outside of image-based domains, we propose to learn a metric which captures perceptual similarity. Similar to how Neighbourhood Components Analysis optimizes a differentiable form of KNN classification, we propose "Class Conditional" metric learning (CCML), which optimizes a soft form of the NBNN selection rule. Typical metric learning algorithms learn either a global or a local metric. Our proposed method, however, can be adjusted to a particular level of locality by tuning a single parameter. An empirical evaluation on classification and retrieval tasks demonstrates that our proposed method clearly outperforms existing learned distance metrics across a variety of image and non-image datasets.
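The NBNN selection rule in its simplest (single-descriptor, Euclidean) form can be sketched in a few lines; this toy example uses synthetic 2-D clouds and does not include the learned metric the paper proposes:

```python
import numpy as np

rng = np.random.default_rng(7)

# Class-conditional nearest neighbour: instead of one global NN search,
# score the query against each class separately and pick the class whose
# nearest member is closest.
X0 = rng.normal(loc=0.0, size=(50, 2))   # class 0 cloud around (0, 0)
X1 = rng.normal(loc=3.0, size=(50, 2))   # class 1 cloud around (3, 3)

def nbnn_predict(q, class_sets):
    # one NN distance per class, then argmin over classes
    dists = [np.min(np.linalg.norm(X - q, axis=1)) for X in class_sets]
    return int(np.argmin(dists))

print(nbnn_predict(np.array([0.2, -0.1]), [X0, X1]))  # expect class 0
print(nbnn_predict(np.array([3.1, 2.8]), [X0, X1]))   # expect class 1
```

CCML replaces the Euclidean distance here with a learned metric and softens the argmin so it can be optimized by gradient descent.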

• [cs.LG]**Learning from Multiway Data: Simple and Efficient Tensor Regression**

*Rose Yu, Yan Liu*

http://arxiv.org/abs/1607.02535v1

Tensor regression has been shown to be advantageous in learning tasks with multi-directional relatedness. Given massive multiway data, traditional methods are often too slow to operate on it or suffer from memory bottlenecks. In this paper, we introduce subsampled tensor projected gradient to solve the problem. Our algorithm is impressively simple and efficient. It is built upon the projected gradient method with fast tensor power iterations, leveraging randomized sketching for further acceleration. Theoretical analysis shows that our algorithm converges to the correct solution in a fixed number of iterations. The memory requirement grows linearly with the size of the problem. We demonstrate superior empirical performance on both multi-linear multi-task learning and spatio-temporal applications.

• [cs.LG]**Online Learning Schemes for Power Allocation in Energy Harvesting Communications**

*Pranav Sakulkar, Bhaskar Krishnamachari*

http://arxiv.org/abs/1607.02552v1

We consider the problem of power allocation over a time-varying channel with unknown distribution in energy harvesting communication systems. In this problem, the transmitter has to choose the transmit power based on the amount of stored energy in its battery with the goal of maximizing the average rate obtained over time. We model this problem as a Markov decision process (MDP) with the transmitter as the agent, the battery status as the state, the transmit power as the action and the rate obtained as the reward. The average reward maximization problem over the MDP can be solved by a linear program (LP) that uses the transition probabilities for the state-action pairs and their reward values to choose a power allocation policy. Since the rewards associated with the state-action pairs are unknown, we propose two online learning algorithms: UCLP and Epoch-UCLP that learn these rewards and adapt their policies along the way. The UCLP algorithm solves the LP at each step to decide its current policy using the upper confidence bounds on the rewards, while the Epoch-UCLP algorithm divides the time into epochs, solves the LP only at the beginning of the epochs and follows the obtained policy in that epoch. We prove that the reward losses or regrets incurred by both these algorithms are upper bounded by constants. Epoch-UCLP incurs a higher regret compared to UCLP, but reduces the computational requirements substantially. We also show that the presented algorithms work for online learning in cost minimization problems such as packet scheduling with a power-delay tradeoff, with minor changes.

• [cs.LG]**Recurrent Memory Array Structures**

*Kamil Rocki*

http://arxiv.org/abs/1607.03085v1

The following report introduces ideas augmenting the standard Long Short-Term Memory (LSTM) architecture with multiple memory cells per hidden unit in order to improve its generalization capabilities. It considers both deterministic and stochastic variants of memory operation. It is shown that the nondeterministic Array-LSTM approach improves state-of-the-art performance on character-level text prediction, achieving 1.402 BPC on the enwik8 dataset. Furthermore, this report establishes baseline neural-based results of 1.12 BPC and 1.19 BPC for the enwik9 and enwik10 datasets, respectively.

• [cs.LG]**Tight Lower Bounds for Multiplicative Weights Algorithmic Families**

*Nick Gravin, Yuval Peres, Balasubramanian Sivan*

http://arxiv.org/abs/1607.02834v1

We study the fundamental problem of prediction with expert advice and develop regret lower bounds for a large family of algorithms for this problem. We develop simple adversarial primitives that lend themselves to various combinations leading to sharp lower bounds for many algorithmic families. We use these primitives to show that the classic Multiplicative Weights Algorithm (MWA) has a regret of $\sqrt{\frac{T \ln k}{2}}$, thereby completely closing the gap between upper and lower bounds. We further show a regret lower bound of $\frac{2}{3}\sqrt{\frac{T\ln k}{2}}$ for a much more general family of algorithms than MWA, where the learning rate can be arbitrarily varied over time, or even picked from arbitrary distributions over time. We also use our primitives to construct adversaries in the geometric horizon setting for MWA to precisely characterize the regret at $\frac{0.391}{\sqrt{\delta}}$ for the case of $2$ experts and a lower bound of $\frac{1}{2}\sqrt{\frac{\ln k}{2\delta}}$ for the case of an arbitrary number of experts $k$.
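For context, the algorithm whose regret the paper pins down can be sketched in a few lines; the i.i.d. losses below are only an illustration, not the paper's worst-case adversarial constructions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Multiplicative Weights (Hedge) over k experts with losses in [0, 1];
# eta = sqrt(2 ln k / T) is the classic tuned learning rate.
T, k = 1000, 10
eta = np.sqrt(2 * np.log(k) / T)
weights = np.ones(k)
losses = rng.uniform(size=(T, k))    # illustrative random losses

alg_loss = 0.0
for t in range(T):
    p = weights / weights.sum()      # play experts proportionally to weight
    alg_loss += p @ losses[t]
    weights *= np.exp(-eta * losses[t])

regret = alg_loss - losses.sum(axis=0).min()   # vs. best expert in hindsight
bound = np.sqrt(T * np.log(k) / 2)             # the sqrt(T ln k / 2) rate
print(regret, bound)
```

The paper's result is that against a worst-case adversary this $\sqrt{T \ln k / 2}$ rate is exactly attained, so the bound above is tight for MWA.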

• [cs.LG]**Uncovering Locally Discriminative Structure for Feature Analysis**

*Sen Wang, Feiping Nie, Xiaojun Chang, Xue Li, Quan Z. Sheng, Lina Yao*

http://arxiv.org/abs/1607.02559v1

Manifold structure learning is often used to exploit geometric information among data in semi-supervised feature learning algorithms. In this paper, we find that local discriminative information is also of importance for semi-supervised feature learning. We propose a method that utilizes both the manifold structure of data and local discriminant information. Specifically, we define a local clique for each data point. k-Nearest Neighbors (kNN) is used to determine the structural information within each clique. We then apply a variant of the Fisher criterion to each clique for local discriminant evaluation and sum over all cliques to integrate them globally into the framework. In this way, local discriminant information is embedded. Labels are also utilized to minimize distances between data from the same class. In addition, we use the kernel method to extend our proposed model and facilitate feature learning in a high-dimensional space after feature mapping. Experimental results show that our method is superior to all other compared methods on a number of datasets.

• [cs.NA]**A Unified Alternating Direction Method of Multipliers by Majorization Minimization**

*Canyi Lu, Jiashi Feng, Shuicheng Yan, Zhouchen Lin*

http://arxiv.org/abs/1607.02584v1

With the rising popularity of compressed sensing, the Alternating Direction Method of Multipliers (ADMM) has become the most widely used solver for linearly constrained convex problems with separable objectives. In this work, we observe that many previous variants of ADMM update the primal variable by minimizing different majorant functions, with their convergence proofs given case by case. Inspired by the principle of majorization minimization, we present unified frameworks and convergence analyses for the Gauss-Seidel ADMMs and Jacobian ADMMs, which use different historical information for the current update. Our frameworks further generalize previous ADMMs to ones capable of solving problems with non-separable objectives by minimizing their separable majorant surrogates. We also show that the bound which measures the convergence speed of ADMMs depends on the tightness of the majorant function used. Several techniques are then introduced to improve the efficiency of ADMMs by tightening the majorant functions. In particular, we propose the Mixed Gauss-Seidel and Jacobian ADMM (M-ADMM), which alleviates the slow convergence of Jacobian ADMMs by absorbing the merits of the Gauss-Seidel ADMMs. M-ADMM can be further improved by backtracking, judicious variable partitioning, and full exploitation of the structure of the constraint. Beyond the theoretical guarantees, numerical experiments on both synthetic and real-world data further demonstrate the superiority of our new ADMMs in practice. Finally, we release a toolbox at https://github.com/canyilu/LibADMM that implements efficient ADMMs for many problems in compressed sensing.
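As a point of reference, the textbook (Gauss-Seidel) ADMM that the paper's frameworks generalize can be sketched on a lasso instance; the problem data here is synthetic, and this is the plain baseline rather than the paper's majorized or M-ADMM variants:

```python
import numpy as np

rng = np.random.default_rng(3)

# ADMM for the lasso: min_x 0.5*||Ax - b||^2 + lam*||x||_1,
# split as f(x) + g(z) subject to x - z = 0.
m, n, lam, rho = 50, 20, 0.1, 1.0
A = rng.normal(size=(m, n))
x_true = np.zeros(n)
x_true[:3] = [1.0, -2.0, 0.5]
b = A @ x_true + 0.01 * rng.normal(size=m)

AtA, Atb = A.T @ A, A.T @ b
L = np.linalg.cholesky(AtA + rho * np.eye(n))       # factor once, reuse each iteration
soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
for _ in range(200):
    x = np.linalg.solve(L.T, np.linalg.solve(L, Atb + rho * (z - u)))  # x-update
    z = soft(x + u, lam / rho)                                         # prox of l1
    u = u + x - z                                                      # dual update

print(np.round(z[:5], 2))
```

Each sub-step minimizes a (here exact) majorant of the augmented Lagrangian in one variable; the paper's contribution is a unified analysis when those majorants are loosened or tightened.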

• [cs.NA]**Inexact Block Coordinate Descent Methods For Symmetric Nonnegative Matrix Factorization**

*Qingjiang Shi, Haoran Sun, Songtao Lu, Mingyi Hong, Meisam Razaviyayn*

http://arxiv.org/abs/1607.03092v1

Symmetric nonnegative matrix factorization (SNMF) is equivalent to computing a symmetric nonnegative low-rank approximation of a data similarity matrix. It inherits the good data interpretability of the well-known nonnegative matrix factorization technique and has a better ability to cluster nonlinearly separable data. In this paper, we focus on the algorithmic aspects of the SNMF problem and propose simple inexact block coordinate descent methods to address it, leading to both serial and parallel algorithms. The proposed algorithms have guaranteed convergence to stationary points and can efficiently handle large-scale and/or sparse SNMF problems. Extensive simulations verify the effectiveness of the proposed algorithms compared to recent state-of-the-art algorithms.
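A minimal sketch of the underlying optimization problem, solved here with plain projected gradient descent on a synthetic exact-rank similarity matrix (a simple serial baseline for intuition, not the paper's inexact block coordinate descent schemes):

```python
import numpy as np

rng = np.random.default_rng(4)

# Symmetric NMF: min_{H >= 0} ||M - H H^T||_F^2 for a similarity matrix M.
n, r = 30, 3
H_true = np.abs(rng.normal(size=(n, r)))
M = H_true @ H_true.T                         # synthetic target of exact rank r

H = np.abs(rng.normal(size=(n, r)))           # random nonnegative start
err0 = np.linalg.norm(M - H @ H.T) / np.linalg.norm(M)   # initial relative error
step = 1.0 / (20.0 * np.linalg.norm(M, 2))               # conservative step size

for _ in range(3000):
    grad = 4.0 * (H @ (H.T @ H) - M @ H)      # gradient of the Frobenius objective
    H = np.maximum(H - step * grad, 0.0)      # gradient step, project onto H >= 0

rel_err = np.linalg.norm(M - H @ H.T) / np.linalg.norm(M)
print(err0, rel_err)
```

Block coordinate descent methods like the paper's replace the single full-gradient step with cheaper per-block subproblems, which is what enables the parallel variants.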

• [cs.NA]**Proximal Quasi-Newton Methods for Convex Optimization**

*Hiva Ghanbari, Katya Scheinberg*

http://arxiv.org/abs/1607.03081v1

In [19], a general, inexact, efficient proximal quasi-Newton algorithm for composite optimization problems has been proposed and a sublinear global convergence rate has been established. In this paper, we analyze the convergence properties of this method, both in the exact and inexact settings, in the case when the objective function is strongly convex. We also investigate a practical variant of this method by establishing a simple stopping criterion for the subproblem optimization. Furthermore, we consider an accelerated variant of the proximal quasi-Newton algorithm, based on FISTA [1]. A similar accelerated method has been considered in [7], where the convergence rate analysis relies on very strong, impractical assumptions. We present a modified analysis while relaxing these assumptions and perform a practical comparison of the accelerated proximal quasi-Newton algorithm and the regular one. Our analysis and computational results show that acceleration may not bring any benefit in the quasi-Newton setting.
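For context, the FISTA acceleration referenced above can be sketched on a toy composite problem (synthetic data; this is the first-order method itself, not the paper's proximal quasi-Newton variant):

```python
import numpy as np

rng = np.random.default_rng(9)

# FISTA (accelerated proximal gradient) on min_x 0.5*||Ax-b||^2 + lam*||x||_1.
m, n, lam = 40, 15, 0.05
A = rng.normal(size=(m, n))
b = rng.normal(size=m)
L = np.linalg.norm(A, 2) ** 2                  # Lipschitz constant of the smooth part
soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x, y, t = np.zeros(n), np.zeros(n), 1.0
for _ in range(500):
    x_new = soft(y - A.T @ (A @ y - b) / L, lam / L)   # proximal gradient step at y
    t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
    y = x_new + (t - 1.0) / t_new * (x_new - x)        # momentum extrapolation
    x, t = x_new, t_new

obj = 0.5 * np.linalg.norm(A @ x - b) ** 2 + lam * np.abs(x).sum()
print(obj)
```

The momentum extrapolation is the only difference from plain proximal gradient (ISTA); the paper studies whether grafting this step onto quasi-Newton proximal methods actually pays off.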

• [cs.NE]**Classifying Variable-Length Audio Files with All-Convolutional Networks and Masked Global Pooling**

*Lars Hertel, Huy Phan, Alfred Mertins*

http://arxiv.org/abs/1607.02857v1

We trained a deep all-convolutional neural network with masked global pooling to perform single-label classification for acoustic scene classification and multi-label classification for domestic audio tagging in the DCASE-2016 contest. Our network achieved an average accuracy of 84.5% on the four-fold cross-validation for acoustic scene recognition, compared to the provided baseline of 72.5%, and an average equal error rate of 0.17 for domestic audio tagging, compared to the baseline of 0.21. The network therefore improves the baselines by relative amounts of 17% and 19%, respectively. The network consists only of convolutional layers to extract features from the short-time Fourier transform and one global pooling layer to combine those features. In particular, apart from the fully-connected output layer, it contains neither fully-connected layers nor dropout layers.

• [cs.NE]**Forward Table-Based Presynaptic Event-Triggered Spike-Timing-Dependent Plasticity**

*Bruno U. Pedroni, Sadique Sheik, Siddharth Joshi, Georgios Detorakis, Somnath Paul, Charles Augustine, Emre Neftci, Gert Cauwenberghs*

http://arxiv.org/abs/1607.03070v1

Spike-timing-dependent plasticity (STDP) incurs both causal and acausal synaptic weight updates, for negative and positive time differences between pre-synaptic and post-synaptic spike events. For realizing such updates in neuromorphic hardware, current implementations either require forward and reverse lookup access to the synaptic connectivity table, or rely on memory-intensive architectures such as crossbar arrays. We present a novel method for realizing both causal and acausal weight updates using only forward lookup access of the synaptic connectivity table, permitting memory-efficient implementation. A simplified implementation in FPGA, using a single timer variable for each neuron, closely approximates exact STDP cumulative weight updates for neuron refractory periods greater than 10 ms, and reduces to exact STDP for refractory periods greater than the STDP time window. Compared to conventional crossbar implementation, the forward table-based implementation leads to substantial memory savings for sparsely connected networks supporting scalable neuromorphic systems with fully reconfigurable synaptic connectivity and plasticity.
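The pair-based STDP weight rule that the hardware realizes can be stated in a few lines; this sketch shows the causal/acausal update itself with illustrative constants, not the paper's forward-table memory layout:

```python
import numpy as np

# Pair-based STDP: causal pairs (pre before post) potentiate, acausal pairs
# depress, both with exponential time windows. Constants are illustrative.
A_plus, A_minus, tau = 0.01, 0.012, 20.0   # amplitudes and time constant (ms)

def stdp_dw(t_pre, t_post):
    """Weight change for a single pre/post spike pair (times in ms)."""
    dt = t_post - t_pre
    if dt >= 0:                              # causal: pre precedes post
        return A_plus * np.exp(-dt / tau)    # potentiation
    return -A_minus * np.exp(dt / tau)       # acausal: depression

# Cumulative update over all pairs of two short spike trains
pre_spikes, post_spikes = [10.0, 50.0], [15.0, 45.0]
dw = sum(stdp_dw(tp, tq) for tp in pre_spikes for tq in post_spikes)
print(dw)
```

The paper's contribution is realizing both branches of this rule with only forward lookups of the connectivity table, so the acausal case no longer needs a reverse index or a crossbar.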

• [cs.NE]**The BioDynaMo Project**

*Roman Bauer, Lukas Breitwieser, Alberto Di Meglio, Leonard Johard, Marcus Kaiser, Marco Manca, Manuel Mazzara, Max Talanov*

http://arxiv.org/abs/1607.02717v1

Computer simulations have become a very powerful tool for scientific research. Given the vast complexity that comes with many open scientific questions, a purely analytical or experimental approach is often not viable. For example, biological systems (such as the human brain) comprise an extremely complex organization and heterogeneous interactions across different spatial and temporal scales. In order to facilitate research on such problems, the BioDynaMo project (\url{https://biodynamo.web.cern.ch/}) aims to provide a general platform for computer simulations in biological research. Since these scientific investigations require extensive computer resources, the platform should be executable on hybrid cloud computing systems, allowing for the efficient use of state-of-the-art computing technology. This paper describes challenges during the early stages of the software development process. In particular, we describe issues regarding the implementation and the highly interdisciplinary as well as international nature of the collaboration. Moreover, we explain the methodologies, the approach, and the lessons learnt by the team during these first stages.

• [cs.PL]**sk_p: a neural program corrector for MOOCs**

*Yewen Pu, Karthik Narasimhan, Armando Solar-Lezama, Regina Barzilay*

http://arxiv.org/abs/1607.02902v1

We present a novel technique for automatic program correction in MOOCs, capable of fixing both syntactic and semantic errors without manual, problem-specific correction strategies. Given an incorrect student program, it generates candidate programs from a distribution of likely corrections, and checks each candidate for correctness against a test suite. The key observation is that in MOOCs many programs share similar code fragments, and the seq2seq neural network model, used in the natural-language processing task of machine translation, can be modified and trained to recover these fragments. Experiments show our scheme can correct 29% of all incorrect submissions and outperforms the state-of-the-art approach, which requires manual, problem-specific correction strategies.

• [cs.SI]**Are human interactivity times lognormal?**

*Norbert Blenn, Piet Van Mieghem*

http://arxiv.org/abs/1607.02952v1

In this paper, we analyze the interactivity time, defined as the duration between two consecutive tasks such as sending emails, adding friends and followers, and writing comments in online social networks (OSNs). The distributions of these times are heavy-tailed and often described by a power-law distribution. However, power-law distributions usually only fit the heavy tail of empirical data and ignore the information in the smaller value range. Here, we argue that the durations between writing emails or comments, adding friends and receiving followers are likely to follow a lognormal distribution. We discuss the similarities between power-law and lognormal distributions, show that binning of data can deform a lognormal into a power-law distribution, and propose an explanation for the appearance of lognormal interactivity times. The historical debate on the similarities between lognormal and power-law distributions is reviewed by illustrating the resemblance of the measurements in this paper to the historical problem of income and city size distributions.
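A small sketch of why the lognormal hypothesis is easy to test on raw (unbinned) durations: the log of a lognormal sample is normal, so its parameters have closed-form maximum-likelihood estimates (synthetic data with illustrative parameters, not the paper's OSN measurements):

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic "interactivity times": X ~ LogNormal(mu, sigma) means
# ln X ~ Normal(mu, sigma), so mu and sigma are just log-moments.
mu, sigma = 1.0, 1.5                      # illustrative parameters
times = rng.lognormal(mu, sigma, 100000)

log_t = np.log(times)
mu_hat, sigma_hat = log_t.mean(), log_t.std()

# On a log-log plot the lognormal density is a parabola; over a narrow
# range it looks nearly linear, which is why coarse binning can make a
# lognormal pass for a power law -- the deformation the paper describes.
print(mu_hat, sigma_hat)
```

Fitting on the raw log-values, rather than on binned tail counts, uses the information in the small-value range that pure power-law fits discard.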

• [cs.SI]**Learning from the News: Predicting Entity Popularity on Twitter**

*Pedro Saleiro, Carlos Soares*

http://arxiv.org/abs/1607.03057v1

In this work, we tackle the problem of predicting entity popularity on Twitter based on the news cycle. We apply a supervised learning approach and extract four types of features: (i) signal, (ii) textual, (iii) sentiment and (iv) semantic, which we use to predict whether the popularity of a given entity will be high or low in the following hours. We run several experiments on six different entities in a dataset of over 150M tweets and 5M news items and obtain F1 scores over 0.70. Error analysis indicates that news performs better at predicting entity popularity on Twitter when it is the primary information source for the event, as opposed to events such as live TV broadcasts, political debates or football matches.

• [cs.SI]**Privacy Leakage through Innocent Content Sharing in Online Social Networks**

*Maria Han Veiga, Carsten Eickhoff*

http://arxiv.org/abs/1607.02714v1

The increased popularity and ubiquitous availability of online social networks and globalised Internet access have affected the way in which people share content. The information that users willingly disclose on these platforms can be used for various purposes, from building consumer models for advertising, to inferring personal, potentially invasive, information. In this work, we use Twitter, Instagram and Foursquare data to convey the idea that the content shared by users, especially when aggregated across platforms, can potentially disclose more information than was originally intended. We perform two case studies: First, we perform user de-anonymization by mimicking the scenario of finding the identity of a user making anonymous posts within a group of users. Empirical evaluation on a sample of real-world social network profiles suggests that cross-platform aggregation introduces significant performance gains in user identification. In the second task, we show that it is possible to infer physical location visits of a user on the basis of shared Twitter and Instagram content. We present an informativeness scoring function which estimates the relevance and novelty of a shared piece of information with respect to an inference task. This measure is validated using an active learning framework which chooses the most informative content at each given point in time. Based on a large-scale data sample, we show that by doing this, we can attain an improved inference performance. In some cases this performance exceeds even the use of the user's full timeline.

• [math.OC]**An Improved Convergence Analysis of Cyclic Block Coordinate Descent-type Methods for Strongly Convex Minimization**

*Xingguo Li, Tuo Zhao, Raman Arora, Han Liu, Mingyi Hong*

http://arxiv.org/abs/1607.02793v1

The cyclic block coordinate descent-type (CBCD-type) methods have shown remarkable computational performance for solving strongly convex minimization problems. Typical applications include many popular statistical machine learning methods such as elastic-net regression, ridge penalized logistic regression, and sparse additive regression. Existing optimization literature has shown that the CBCD-type methods attain an iteration complexity of $O(p\cdot\log(1/\epsilon))$, where $\epsilon$ is a pre-specified accuracy of the objective value, and $p$ is the number of blocks. However, such iteration complexity explicitly depends on $p$, and therefore is at least $p$ times worse than that of gradient descent methods. To bridge this theoretical gap, we propose an improved convergence analysis for the CBCD-type methods. In particular, we first show that for a family of quadratic minimization problems, the iteration complexity of the CBCD-type methods matches that of the gradient descent methods in terms of the dependency on $p$ (up to a $\log^2 p$ factor). Thus our complexity bounds are sharper than the existing bounds by at least a factor of $p/\log^2 p$. We also provide a lower bound to confirm that our improved complexity bounds are tight (up to a $\log^2 p$ factor) if the largest and smallest eigenvalues of the Hessian matrix do not scale with $p$. Finally, we generalize our analysis to other strongly convex minimization problems beyond quadratic ones.
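A minimal sketch of cyclic coordinate descent on a strongly convex quadratic, the setting the paper analyzes first (blocks of size one, i.e. $p = n$; the synthetic problem below is illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)

# Cyclic coordinate descent on f(x) = 0.5 * x^T Q x - b^T x with Q positive
# definite; each sweep minimizes exactly over one coordinate at a time.
n = 20
B = rng.normal(size=(n, n))
Q = B @ B.T + n * np.eye(n)        # strongly convex quadratic
b = rng.normal(size=n)
x_star = np.linalg.solve(Q, b)     # exact minimizer, for comparison

x = np.zeros(n)
for sweep in range(100):
    for i in range(n):
        # exact minimization over coordinate i with the others fixed
        x[i] = (b[i] - Q[i] @ x + Q[i, i] * x[i]) / Q[i, i]

print(np.linalg.norm(x - x_star))
```

On quadratics this is exactly Gauss-Seidel for $Qx = b$; the paper's question is how the number of such sweeps needed scales with the number of blocks $p$ relative to full gradient descent.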

• [math.OC]**Beating level-set methods for 3D seismic data interpolation: a primal-dual alternating approach**

*Rajiv Kumar, Oscar López, Damek Davis, Aleksandr Y. Aravkin, Felix J. Herrmann*

http://arxiv.org/abs/1607.02624v1

Acquisition cost is a crucial bottleneck for seismic workflows, and low-rank formulations for data interpolation allow practitioners to "fill in" data volumes from critically subsampled data acquired in the field. The tremendous size of the seismic data volumes required for seismic processing remains a major challenge for these techniques. We propose a new approach to solving residual-constrained formulations for interpolation. We represent the data volume using matrix factors and build a block-coordinate algorithm with constrained convex subproblems that are solved with a primal-dual splitting scheme. The new approach is competitive with state-of-the-art level-set algorithms that interchange the roles of objectives and constraints. We use the new algorithm to successfully interpolate a large-scale 5D seismic data volume, generated from the geologically complex synthetic 3D Compass velocity model, where 80% of the data has been removed.

• [math.ST]**A data-driven method for improving the correlation estimation in serial ensemble Kalman filters**

*Michèle De La Chevrotière, John Harlim*

http://arxiv.org/abs/1607.02538v1

A data-driven method for improving the correlation estimation in serial ensemble Kalman filters is introduced. The method finds a linear map that transforms, at each assimilation cycle, the poorly estimated sample correlation into an improved correlation. This map is obtained from an offline training procedure without any tuning as the solution of a linear regression problem that uses appropriate sample correlation statistics obtained from historical data assimilation products. In an idealized OSSE with the Lorenz-96 model and for a range of cases of linear and nonlinear observation models, the proposed scheme improves the filter estimates, especially when the ensemble size is small relative to the dimension of the state space.

• [math.ST]**Barycentric Subspace Analysis on Manifolds**

*Xavier Pennec*

http://arxiv.org/abs/1607.02833v1

This paper investigates the generalization of Principal Component Analysis (PCA) to Riemannian manifolds. We first propose a new and more general type of family of subspaces in manifolds that we call barycentric subspaces. They are implicitly defined as the locus of points which are weighted means of $k+1$ reference points. As this definition relies on points and not on tangent vectors, it can also be extended to geodesic spaces which are not Riemannian. For instance, in stratified spaces, it naturally allows principal subspaces that span several strata, which is impossible in previous generalizations of PCA. We show that barycentric subspaces locally define a submanifold of dimension $k$ which generalizes geodesic subspaces. Second, we rephrase PCA in Euclidean spaces as an optimization on flags of linear subspaces (a hierarchy of properly embedded linear subspaces of increasing dimension). We show that Euclidean PCA minimizes the sum of the unexplained variance over all the subspaces of the flag, also called the Area-Under-the-Curve (AUC) criterion. Barycentric subspaces are naturally nested, allowing the construction of hierarchically nested subspaces. Optimizing the AUC criterion to optimally approximate data points with flags of affine spans in Riemannian manifolds leads to a particularly appealing generalization of PCA on manifolds, which we call Barycentric Subspace Analysis (BSA).

• [math.ST]**Beyond the Pearson correlation: heavy-tailed risks, weighted Gini correlations, and a Gini-type weighted insurance pricing model**

*Edward Furman, Ricardas Zitikis*

http://arxiv.org/abs/1607.02623v1

Gini-type correlation coefficients have become increasingly important in a variety of research areas, including economics, insurance and finance, where modelling with heavy-tailed distributions is of pivotal importance. In such situations, the classical Pearson correlation coefficient is naturally of little use. On the other hand, it has been observed that when light-tailed situations are of interest, and hence both the Gini-type and Pearson correlation coefficients are well-defined and finite, these coefficients are related and sometimes even coincide. In general, understanding how the above correlation coefficients are related has been an elusive task. In this paper we put forward arguments that establish such a connection via certain regression-type equations. This, in turn, allows us to introduce a Gini-type Weighted Insurance Pricing Model that works in heavy-tailed situations and thus provides a natural alternative to the classical Capital Asset Pricing Model. We illustrate our theoretical considerations using several bivariate distributions, such as elliptical ones and those with heavy-tailed Pareto margins.
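To make the contrast concrete, the empirical Gini correlation $\Gamma(X,Y) = \mathrm{cov}(X, F_Y(Y)) / \mathrm{cov}(X, F_X(X))$ can be computed from ranks; the heavy-tailed toy data below is illustrative, and the pricing model itself is not implemented here:

```python
import numpy as np

rng = np.random.default_rng(10)

# Empirical Gini correlation: replace Y by its empirical CDF values (ranks),
# so the coefficient stays finite even when Y is too heavy-tailed for the
# Pearson correlation to be informative.
def gini_corr(x, y):
    F_y = np.argsort(np.argsort(y)) / (len(y) - 1)   # empirical F_Y(Y)
    F_x = np.argsort(np.argsort(x)) / (len(x) - 1)   # empirical F_X(X)
    return np.cov(x, F_y)[0, 1] / np.cov(x, F_x)[0, 1]

x = rng.normal(size=5000)
y = x + 0.5 * rng.standard_cauchy(size=5000)   # heavy-tailed (Cauchy) noise
print(gini_corr(x, y))
```

Here the Cauchy noise has infinite variance, so the Pearson coefficient of $(X, Y)$ is not even defined in the limit, while the Gini correlation remains a stable, bounded measure of dependence.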

• [math.ST]**Conjugacy properties of time-evolving Dirichlet and gamma random measures**

*Omiros Papaspiliopoulos, Matteo Ruggiero, Dario Spanò*

http://arxiv.org/abs/1607.02896v1

We extend classic characterisations of posterior distributions under Dirichlet process and gamma random measures priors to a dynamic framework. We consider the problem of learning, from indirect observations, two families of time-dependent processes of interest in Bayesian nonparametrics: the first is a dependent Dirichlet process driven by a Fleming-Viot model, and the data are random samples from the process state at discrete times; the second is a collection of dependent gamma random measures driven by a Dawson-Watanabe model, and the data are collected according to a Poisson point process with intensity given by the process state at discrete times. Both driving processes are diffusions taking values in the space of discrete measures whose support varies with time, and are stationary and reversible with respect to Dirichlet and gamma priors respectively. A common methodology is developed to obtain in closed form the time-marginal posteriors given past and present data. These are shown to belong to classes of finite mixtures of Dirichlet processes and gamma random measures for the two models respectively, yielding conjugacy of these classes to the type of data we consider. We provide explicit results on the parameters of the mixture components and on the mixing weights, which are time-varying and drive the mixtures towards the respective priors in absence of further data. Explicit algorithms are provided to recursively compute the parameters of the mixtures. Our results are based on the projective properties of the signals and on certain duality properties of their projections.

• [math.ST]**Convergence of Multivariate Quantile Surfaces**

*Adil Ahidar-Coutrix, Philippe Berthet*

http://arxiv.org/abs/1607.02604v1

We define the quantile set of order $\alpha \in \left[ 1/2,1\right) $ associated to a law $P$ on $\mathbb{R}^{d}$ to be the collection of its directional quantiles seen from an observer $O\in \mathbb{R}^{d}$. Under minimal assumptions these star-shaped sets are closed surfaces, continuous in $(O,\alpha )$, and the collection of empirical quantile surfaces is uniformly consistent. Under mild assumptions -- no density or symmetry is required for $P$ -- our uniform central limit theorem reveals the correlations between quantile points, and a non-asymptotic Gaussian approximation provides joint confidence regions in the form of enlarged quantile surfaces. Our main result is a dimension-free rate $n^{-1/4} (\log n)^{1/2}(\log\log n) ^{1/4} $ of Bahadur-Kiefer embedding by the empirical process indexed by half-spaces. These limit theorems sharply generalize the univariate quantile convergences and fully characterize the joint behavior of Tukey half-spaces.
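As a toy illustration, one natural empirical version of a directional quantile (a hypothetical reading of the construction; the paper's exact definition may differ) projects the sample onto a unit direction as seen from the observer $O$ and takes the $\alpha$-quantile of those projections:

```python
import numpy as np

def directional_quantile(data, O, u, alpha):
    """Empirical directional quantile of order alpha seen from observer O:
    the alpha-quantile of the projections <X - O, u> of the sample onto
    the unit direction u. Sweeping u over the sphere traces out a
    star-shaped quantile surface around O."""
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)                       # normalize the direction
    proj = (np.asarray(data, dtype=float) - np.asarray(O, dtype=float)) @ u
    return np.quantile(proj, alpha)
```

With $\alpha = 1/2$ this reduces, direction by direction, to a median of projections, which is why the order is taken in $[1/2, 1)$.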

• [math.ST]**Integral form of the COM-Poisson normalization constant**

*Tibor K. Pogány*

http://arxiv.org/abs/1607.02727v1

In this brief note an integral expression is presented for the COM-Poisson renormalization constant $Z(\lambda, \nu)$ on the real axis.
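For reference, the constant in question is $Z(\lambda,\nu)=\sum_{j\ge 0}\lambda^j/(j!)^\nu$ for the Conway-Maxwell-Poisson distribution. A direct truncated summation (the naive baseline, not the paper's integral representation) is easy to sketch:

```python
import math

def com_poisson_Z(lam, nu, tol=1e-12, max_terms=10_000):
    """Normalization constant Z(lambda, nu) = sum_{j>=0} lambda^j / (j!)^nu
    of the COM-Poisson distribution, by truncated summation. Uses the
    term ratio lambda / (j+1)^nu to avoid overflowing factorials."""
    total, term, j = 0.0, 1.0, 0
    while j < max_terms:
        total += term
        j += 1
        term *= lam / (j ** nu)        # term_{j} = term_{j-1} * lambda / j^nu
        if term < tol * total:         # remaining tail is negligible
            break
    return total
```

Sanity checks: $\nu=1$ recovers the Poisson constant $e^{\lambda}$, and $\nu=0$ with $\lambda<1$ recovers the geometric constant $1/(1-\lambda)$.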

• [math.ST]**MAGIC: a general, powerful and tractable method for selective inference**

*Xiaoying Tian, Nan Bi, Jonathan Taylor*

http://arxiv.org/abs/1607.02630v1

Selective inference is a recent research topic that tries to perform valid inference after using the data to select a reasonable statistical model. We propose MAGIC, a new method for selective inference that is general, powerful and tractable. MAGIC is a method for selective inference after solving a convex optimization problem with a smooth loss and an $\ell_1$ penalty. Randomization is incorporated into the optimization problem to boost statistical power. Through reparametrization, MAGIC reduces the problem to a sampling problem with simple constraints. MAGIC applies to many $\ell_1$ penalized optimization problems including the Lasso, logistic Lasso and neighborhood selection in graphical models, all of which we consider in this paper.

• [math.ST]**On the Unique Crossing Conjecture of Diaconis and Perlman on Convolutions of Gamma Random Variables**

*Yaming Yu*

http://arxiv.org/abs/1607.02689v1

Diaconis and Perlman (1990) conjecture that the distribution functions of two weighted sums of iid gamma random variables cross exactly once if one weight vector majorizes the other. We disprove this conjecture when the shape parameter of the gamma variates is $\alpha <1$ and prove it when $\alpha\geq 1$.

• [physics.ao-ph]**Evaluating Effectiveness of DART Buoy Networks**

*Donald B. Percival, Donald W. Denbo, Edison Gica, Paul Y. Huang, Harold O. Mofjeld, Michael C. Spillane, Vasily V. Titov*

http://arxiv.org/abs/1607.02795v1

A performance measure for a DART tsunami buoy network has been developed. The measure is based on a statistical analysis of simulated forecasts of wave heights outside an impact site and how much the forecasts are degraded in accuracy when one or more buoys are inoperative. The analysis uses simulated tsunami height time series collected at each buoy from selected source segments in the Short-term Inundation Forecast for Tsunamis (SIFT) database and involves a set of 1000 forecasts for each buoy/segment pair at sites just offshore of selected impact communities. Random error-producing scatter in the time series is induced by uncertainties in the source location, addition of real oceanic noise, and imperfect tidal removal. Comparison with an error-free standard leads to root-mean-square errors (RMSEs) for DART buoys located near a subduction zone. The RMSEs indicate which buoy provides the best forecast (lowest RMSE) for sections of the zone, under a warning-time constraint for the forecasts of 3 hrs. The analysis also shows how the forecasts are degraded (larger minimum RMSE among the remaining buoys) when one or more buoys become inoperative. The RMSEs also provide a way to assess array augmentation or redesign such as moving buoys to more optimal locations. Examples are shown for buoys off the Aleutian Islands and off the West Coast of South America for impact sites at Hilo HI and along the U.S. West Coast (Crescent City CA and Port San Luis CA). A simple measure (coded green, yellow or red) of the current status of the network's ability to deliver accurate forecasts is proposed to flag the urgency of buoy repair.

• [physics.soc-ph]**Influence of temporal aspects and age-correlations on the process of opinion formation based on Polish contact survey**

*Andrzej Grabowski, Andrzej Jarynowski*

http://arxiv.org/abs/1607.02588v1

On the basis of experimental data concerning interactions between humans, an Ising-based model of opinion formation in a social network was investigated. The paper presents data on human social activity, i.e. the frequency and duration of interpersonal interactions as well as age correlations (homophily), in comparison to a baseline of homogeneous, static and uniform mixing. It is known from previous studies that the number of contacts and the average age of nearest neighbors are highly correlated with the age of an individual. Such real, assortative patterns usually speed up processes (like epidemic spread) on networks, but here they only play a role for small social temperature values (by reducing the `freezing by heating' effect). The real structure of contacts affects processes differently across studies; here it causes a stronger (via dynamics) and smoother (via durations) susceptibility to an external field. Moreover, our research shows that the cross interactions between contact frequency and duration impose a significant increase in the critical temperature.

• [physics.soc-ph]**Towards Limited Scale-free Topology with Dynamic Peer Participation**

*Xiaoyan Lu, Eyuphan Bulut, Boleslaw Szymanski*

http://arxiv.org/abs/1607.02733v1

Growth models have been proposed for constructing the scale-free overlay topology to improve the performance of unstructured peer-to-peer (P2P) networks. However, previous growth models are able to maintain the limited scale-free topology when nodes only join but do not leave the network; the case of nodes leaving the network while preserving a precise scaling parameter is not included in the solution. Thus, the full dynamic of node participation, inherent in P2P networks, is not considered in these models. In order to handle both nodes joining and leaving the network, we propose a robust growth model E-SRA, which is capable of producing the perfect limited scale-free overlay topology with user-defined scaling parameter and hard cut-offs. Scalability of our approach is ensured since no global information is required to add or remove a node. E-SRA is also tolerant to individual node failure caused by errors or attacks. Simulations have shown that E-SRA outperforms other growth models by producing topologies with high adherence to the desired scale-free property. Search algorithms, including flooding and normalized flooding, achieve higher efficiency over the topologies produced by E-SRA.

• [stat.AP]**A low-rank based estimation-testing procedure for matrix-covariate regression**

*Hung Hung, Zhi-Yu Jou*

http://arxiv.org/abs/1607.02957v1

Matrix-covariates are now frequently encountered in biomedical research. It is common to fit conventional statistical models after vectorizing the matrix-covariate. This strategy, however, results in a large number of parameters, while the available sample size is often too small for reliable analysis. To overcome the problem of high dimensionality in hypothesis testing, the variance component test has been proposed with promising detection power, but it does not readily provide estimates of effect size. In this work, we overcome the problem of high dimensionality by utilizing the inherent structure of the matrix-covariate. The advantage is that estimation and hypothesis testing can be conducted simultaneously as in the conventional case, while estimation efficiency and detection power can be largely improved, due to a parsimonious parameterization of the coefficients of the matrix-covariate. Our method is applied to test the significance of gene-gene interactions in the PSQI data, and to test whether electroencephalography is associated with alcoholic status in the EEG data, wherein sparse effects and low-rank effects of matrix-covariates are identified, respectively.

• [stat.AP]**Approximate Bayesian Computation for Lorenz Curves from Grouped Data**

*Genya Kobayashi, Kazuhiko Kakamu*

http://arxiv.org/abs/1607.02735v1

There are two main approaches to estimate the Gini coefficient from grouped income data in the parametric framework: one fits a hypothetical statistical distribution for income and the other fits a specific functional form to the Lorenz curve. This paper proposes a new Bayesian approach to estimate the Gini coefficient from the Lorenz curve based on grouped data. The proposed approach assumes a hypothetical income distribution and estimates the parameter by directly working on the likelihood function implied by the Lorenz curve of the income distribution from the grouped data. It inherits the advantages of the two main approaches, and can therefore estimate the Gini coefficient more accurately and provide a straightforward interpretation of the underlying income distribution. Since the likelihood function is implicitly defined, an approximate Bayesian computation approach based on the sequential Monte Carlo method is adopted. The usefulness of the proposed approach is illustrated through an extensive simulation study and real Japanese income data.
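The crude grouped-data baseline that such model-based approaches improve upon is the trapezoidal Gini estimate computed directly from the observed Lorenz-curve points; a minimal sketch (the function name is illustrative):

```python
def gini_from_lorenz(p, L):
    """Trapezoidal Gini coefficient from grouped Lorenz-curve points
    (p_i, L_i): cumulative population shares p and income shares L,
    with p[0] = L[0] = 0 and p[-1] = L[-1] = 1. Gini = 1 - 2 * area
    under the Lorenz curve, with the area taken piecewise-linearly."""
    g = 1.0
    for i in range(1, len(p)):
        g -= (p[i] - p[i - 1]) * (L[i] + L[i - 1])  # 2x trapezoid area
    return g
```

Because the piecewise-linear curve lies above the true convex Lorenz curve, this estimate is biased downward with few groups, which is one motivation for fitting a parametric income distribution instead.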

• [stat.AP]**Stochastic differential equation mixed effects models for tumor growth and response to treatment**

*Julie Lyng Forman, Umberto Picchini*

http://arxiv.org/abs/1607.02633v1

This paper aims at modeling the growth dynamics underlying repeated measurements of tumor volumes in mice obtained from a tumor xenograft study. We consider a two-compartment representation corresponding to the fractions of tumor cells that have been killed by the treatment and that have survived it, respectively. Growth and elimination dynamics are modeled with stochastic differential equations. This results in a new biologically plausible stochastic differential equation mixed effects model (SDEMEM) for response to treatment and regrowth. Inference for SDEMEMs is notoriously challenging due to the intractable likelihood function. Methods for exact and approximate Bayesian inference for the model parameters are discussed. As a case study we consider experimental data from two treatment groups and one control, each consisting of 7-8 mice. We were able to estimate the model parameters, using both an exact Bayesian method (the pseudo-marginal approach, using sequential Monte Carlo) and an approximate method using the synthetic likelihoods approach. We believe this is the first application of synthetic likelihoods to SDEMEMs. Consistently for both methods, our model is able to identify a specific treatment as the most effective in delaying tumor growth.

• [stat.CO]**Parallel local approximation MCMC for expensive models**

*Patrick Conrad, Andrew Davis, Youssef Marzouk, Natesh Pillai, Aaron Smith*

http://arxiv.org/abs/1607.02788v1

Performing Bayesian inference via Markov chain Monte Carlo (MCMC) can be exceedingly expensive when posterior evaluations invoke the evaluation of a computationally expensive model, such as a system of partial differential equations. In recent work [Conrad et al. JASA 2015, arXiv:1402.1694] we described a framework for constructing and refining local approximations of such models during an MCMC simulation. These posterior--adapted approximations harness regularity of the model to reduce the computational cost of inference while preserving asymptotic exactness of the Markov chain. Here we describe two extensions of that work. First, focusing on the Metropolis--adjusted Langevin algorithm, we describe how a proposal distribution can successfully employ gradients and other relevant information extracted from the approximation. Second, we prove that samplers running in parallel can collaboratively construct a shared posterior approximation while ensuring ergodicity of each associated chain, providing a novel opportunity for exploiting parallel computation in MCMC. We investigate the practical performance of our strategies using two challenging inference problems, the first in subsurface hydrology and the second in glaciology. Using local approximations constructed via parallel chains, we successfully reduce the run time needed to characterize the posterior distributions in these problems from days to hours and from months to days, respectively, dramatically improving the tractability of Bayesian inference.

• [stat.ME]**Approximate Bayesian computation (ABC) coupled with Bayesian model averaging method for estimating mean and standard deviation**

*Deukwoo Kwon, Isildinha M. Reis*

http://arxiv.org/abs/1607.03080v1

Background: We proposed approximate Bayesian computation with single distribution selection (ABC-SD) for estimating the mean and standard deviation from other reported summary statistics. The ABC-SD generates pseudo data from a single parametric distribution thought to be the true distribution of the underlying study data. This single distribution is either an educated guess, or it is selected via model selection using a posterior probability criterion for testing two or more candidate distributions. Further analysis indicated that when model selection is used, posterior model probabilities are sensitive to the prior distribution(s) for parameter(s) and dependent on the type of reported summary statistics. Method: We propose an ABC with Bayesian model averaging (ABC-BMA) methodology to estimate the mean and standard deviation based on various sets of other summary statistics reported in published studies. We conduct a Monte Carlo simulation study to compare the new proposed ABC-BMA method with our previous ABC-SD method. Results: In the estimation of the standard deviation, ABC-BMA has smaller average relative errors (AREs) than those of ABC-SD for normal, lognormal, beta, and exponential distributions. For the Weibull distribution, the ARE of ABC-BMA is larger than that of ABC-SD but <0.05 in small sample sizes and moves toward zero as the sample size increases. When the underlying distribution is highly skewed and the available summary statistics are only quartiles and sample size, ABC-BMA is recommended but should be used with caution. Comparison of mean estimation between ABC-BMA and ABC-SD shows similar patterns of results as for standard deviation estimation. Conclusion: ABC-BMA is easy to implement and performs even better than our previous ABC-SD method for estimation of the mean and standard deviation.

• [stat.ME]**Bayesian variable selection in high dimensional problems without assumptions on prior model probabilities**

*James O. Berger, Gonzalo Garcia-Donato, Miguel A. Martinez-Beneito, Victor Peña*

http://arxiv.org/abs/1607.02993v1

We consider the problem of variable selection in linear models when $p$, the number of potential regressors, may exceed (and perhaps substantially) the sample size $n$ (which is possibly small).

• [stat.ME]**Convex Relaxation for Community Detection with Covariates**

*Bowei Yan, Purnamrita Sarkar*

http://arxiv.org/abs/1607.02675v1

Community detection in networks is an important problem in many applied areas. In this paper, we investigate this in the presence of node covariates. Recently, an emerging body of theoretical work has been focused on leveraging information from both the edges in the network and the node covariates to infer community memberships. However, so far the role of the network and that of the covariates have not been examined closely. In essence, in most parameter regimes, one of the sources of information provides enough information to infer the hidden clusters, thereby making the other source redundant. To our knowledge, this is the first work which shows that when the network and the covariates carry "orthogonal" pieces of information about the cluster memberships, one can get asymptotically consistent clustering by using them both, while each of them fails individually.

• [stat.ME]**Discrete Choice Models for Nonmonotone Nonignorable Missing Data: Identification and Inference**

*Eric J. Tchetgen Tchetgen, Linbo Wang, BaoLuo Sun*

http://arxiv.org/abs/1607.02631v1

Nonmonotone missing data arise routinely in empirical studies of social and health sciences, and when ignored, can induce selection bias and loss of efficiency. In practice, it is common to account for nonresponse under a missing-at-random assumption which, although often convenient, is rarely appropriate when nonresponse is nonmonotone. Likelihood and Bayesian missing data methodologies often require specification of a parametric model for the full data law, thus \textit{a priori} ruling out any prospect for semiparametric inference. In this paper, we propose an all-purpose approach that delivers semiparametric inferences when missing data are nonmonotone and missing-not-at-random. The approach is based on a discrete choice model (DCM) of the nonresponse process. DCMs have a longstanding tradition in the social sciences, as a principled approach for generating a large class of multinomial models to describe discrete choice decision making under rational utility maximization. In this paper, DCMs are used for a somewhat different purpose, as a means to generate a large class of nonmonotone nonresponse mechanisms that are nonignorable. Sufficient conditions for nonparametric identification are given, and a general framework for semiparametric inference under an arbitrary DCM is proposed. Special consideration is given to the case of the logit discrete choice nonresponse model (LDCM). For inference under the LDCM, analogous to existing methods when missing data are monotone and at random, we describe generalizations of inverse-probability weighting, pattern-mixture estimation, doubly robust estimation and multiply robust estimation. As we show, the proposed LDCM estimators are particularly attractive in their ease of implementation and perform quite well in finite samples in simulation studies and a data application.

• [stat.ME]**Estimating the number of species to attain sufficient representation in a random sample**

*Chao Deng, Andrew D. Smith*

http://arxiv.org/abs/1607.02804v1

The statistical problem of using an initial sample to estimate the number of unique species in a larger sample has found important applications in fields far removed from ecology. Here we address the general problem of estimating the number of species that will be represented at least r times, for any r greater than or equal to 1, in a future sample. We derive a procedure to construct estimators that apply universally for a given population: once constructed, they can be evaluated as a simple function of r. Our approach is based on a relationship between the number of species represented at least r times and the higher derivatives of the number of unique species seen per unit of sampling. We further show that the estimators retain asymptotic behaviors that are essential for applications on large-scale data sets and for large r. We validate the practical performance of this approach by applying it to analyze Dickens' vocabulary, the topology of a Twitter social network, and DNA sequencing data.
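The classical special case r = 1 of this problem is handled by the Good-Toulmin estimator, which predicts the number of new species found when the sample is enlarged by a factor 1 + t; a minimal sketch of that baseline (not the paper's generalized estimator):

```python
def good_toulmin(counts_of_counts, t):
    """Good-Toulmin estimate of the number of NEW species discovered when
    the sample is enlarged by a factor (1 + t):
        sum_j (-1)^{j+1} * t^j * n_j,
    where counts_of_counts maps j to n_j, the number of species observed
    exactly j times in the initial sample. The alternating series is
    known to be unstable for t > 1 without smoothing."""
    return sum((-1) ** (j + 1) * t ** j * n
               for j, n in counts_of_counts.items())
```

For example, a sample with 10 singletons and 5 doubletons predicts 10 - 5 = 5 new species when the sample size is doubled (t = 1).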

• [stat.ME]**How to use empirical process for deriving asymptotic laws for functions of the sample**

*Gane Samb Lo*

http://arxiv.org/abs/1607.02745v1

The functional empirical process is a very powerful tool for deriving asymptotic laws for almost any kind of statistic, whenever we know how to express it as a function of the sample. Since this method seems to be applied more and more often, this paper is intended as a complete but short description and justification of the method, illustrated with a non-trivial example using bivariate data. It may also serve for citation without repeating the arguments.

• [stat.ME]**Non-Concave Penalization in Linear Mixed-Effects Models and Regularized Selection of Fixed Effects**

*Abhik Ghosh, Magne Thoresen*

http://arxiv.org/abs/1607.02883v1

Mixed-effects models are very popular for analyzing data with a hierarchical structure, e.g. repeated observations within subjects in a longitudinal design, or patients nested within centers in a multicenter design. Recently, however, due to medical advances, the number of fixed-effect covariates collected from each patient can be quite large, e.g. gene expression data, and not all of these variables are necessarily important for the outcome. It is therefore very important to choose the relevant covariates correctly to obtain optimal inference for the overall study. On the other hand, the relevant random effects will often be low-dimensional and pre-specified. In this paper, we consider regularized selection of important fixed-effect variables in linear mixed-effects models along with maximum penalized likelihood estimation of both fixed and random effect parameters based on general non-concave penalties. Asymptotic and variable selection consistency with oracle properties are proved for low-dimensional cases as well as for high-dimensionality of non-polynomial order of sample size (the number of parameters much larger than the sample size). We also provide a suitable computationally efficient algorithm for implementation. Additionally, all the theoretical results are proved for a general non-convex optimization problem that applies to several important situations well beyond the mixed-model set-up (such as finite mixtures of regressions), illustrating the wide range of applicability of our proposal.

• [stat.ME]**Pseudo-Marginal Hamiltonian Monte Carlo**

*Fredrik Lindsten, Arnaud Doucet*

http://arxiv.org/abs/1607.02516v1

Bayesian inference in the presence of an intractable likelihood function is computationally challenging. When following a Markov chain Monte Carlo (MCMC) approach to approximate the posterior distribution in this context, one typically either uses MCMC schemes which target the joint posterior of the parameters and some auxiliary latent variables, or pseudo-marginal Metropolis-Hastings (MH) schemes which mimic an MH algorithm targeting the marginal posterior of the parameters by approximating the intractable likelihood unbiasedly. In scenarios where the parameters and auxiliary variables are strongly correlated under the posterior and/or this posterior is multimodal, Gibbs sampling or Hamiltonian Monte Carlo (HMC) will perform poorly and the pseudo-marginal MH algorithm, like any other MH scheme, will be inefficient for high dimensional parameters. We propose here an original MCMC algorithm, termed pseudo-marginal HMC, which approximates the HMC algorithm targeting the marginal posterior of the parameters. We demonstrate through experiments that pseudo-marginal HMC can outperform significantly both standard HMC and pseudo-marginal MH schemes.
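The pseudo-marginal MH building block the abstract refers to is simple to sketch: plug a noisy but unbiased (on the natural scale) likelihood estimate into the MH ratio, and crucially reuse the current state's estimate rather than refreshing it, which keeps the marginal chain exact. A toy sketch with a Gaussian target (the noise model and function names here are illustrative assumptions, not from the paper):

```python
import math
import random

def pseudo_marginal_mh(loglik_hat, theta0, n_iters, step=0.5, seed=1):
    """Pseudo-marginal Metropolis-Hastings with a symmetric random-walk
    proposal. loglik_hat(theta, rng) returns a noisy log-likelihood whose
    exponential is unbiased; the current state's estimate is recycled."""
    rng = random.Random(seed)
    theta, ll = theta0, loglik_hat(theta0, rng)
    chain = [theta]
    for _ in range(n_iters):
        prop = theta + rng.gauss(0.0, step)   # symmetric proposal
        ll_prop = loglik_hat(prop, rng)       # fresh estimate at the proposal
        if math.log(rng.random()) < ll_prop - ll:
            theta, ll = prop, ll_prop         # accept: keep this estimate
        chain.append(theta)
    return chain

def noisy_loglik(theta, rng):
    """Toy N(0, 1) target with multiplicative lognormal noise; the
    correction -0.045 = -sigma^2/2 keeps E[exp(noise)] = 1 (unbiased)."""
    u = rng.gauss(0.0, 0.3)
    return -0.5 * theta * theta + u - 0.045
```

Despite the noisy acceptance ratio, the chain's stationary marginal is exactly the N(0, 1) target; the noise only degrades mixing.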

• [stat.ME]**Robust estimation and inference for the local instrumental variable curve**

*Edward H. Kennedy, Scott A. Lorch, Dylan S. Small*

http://arxiv.org/abs/1607.02566v1

Instrumental variables are commonly used to estimate effects of a treatment afflicted by unmeasured confounding, and in practice instrumental variables are often continuous (e.g., measures of distance, or treatment preference). However, available methods for continuous instrumental variables have important limitations: they either require restrictive parametric assumptions for identification, or else rely on modeling both the outcome and treatment process well (and require modeling effect modification by all adjustment covariates). In this work we develop robust semiparametric estimators of a "local" effect curve among compliers, i.e., the effect among those who would take treatment for instrument values above some threshold and not below. The proposed methods do not require parametric assumptions, incorporate information about the instrument mechanism, allow for flexible data-adaptive estimation of effect modification by covariate subsets, and are robust to misspecification of either the instrument or treatment/outcome processes (i.e., are doubly robust). We discuss asymptotic properties under weak conditions, and use the methods to study infant mortality effects of neonatal intensive care units with high versus low technical capacity, using travel time as an instrument.

• [stat.ME]**Scalable Bayesian modeling, monitoring and analysis of dynamic network flow data**

*Xi Chen, Kaoru Irie, David Banks, Robert Haslinger, Jewell Thomas, Mike West*

http://arxiv.org/abs/1607.02655v1

Traffic flow count data in networks arise in many applications, such as automobile or aviation transportation, certain directed social network contexts, and Internet studies. Using an example of Internet browser traffic flow through site-segments of an international news website, we present Bayesian analyses of two linked classes of models which, in tandem, allow fast, scalable and interpretable Bayesian inference. We first develop flexible state-space models for streaming count data, able to adaptively characterize and quantify network dynamics efficiently in real-time. We then use these models as emulators of more structured, time-varying gravity models that allow formal dissection of network dynamics. This yields interpretable inferences on traffic flow characteristics, and on dynamics in interactions among network nodes. Bayesian monitoring theory defines a strategy for sequential model assessment and adaptation in cases when network flow data deviates from model-based predictions. Exploratory and sequential monitoring analyses of evolving traffic on a network of web site-segments in e-commerce demonstrate the utility of this coupled Bayesian emulation approach to analysis of streaming network count data.

• [stat.ME]**Shared Subspace Models for Multi-Group Covariance Estimation**

*Alexander Franks, Peter Hoff*

http://arxiv.org/abs/1607.03045v1

We develop a model-based method for evaluating heterogeneity among several p x p covariance matrices in the large p, small n setting. This is done by assuming a spiked covariance model for each group and sharing information about the space spanned by the group-level eigenvectors. We use an empirical Bayes method to identify a low-dimensional subspace which explains variation across all groups and use an MCMC algorithm to estimate the posterior uncertainty of eigenvectors and eigenvalues on this subspace. The implementation and utility of our model is illustrated with analyses of high-dimensional multivariate gene expression and metabolomics data.

• [stat.ML]**Bayesian quantile additive regression trees**

*Bereket P. Kindo, Hao Wang, Timothy Hanson, Edsel A. Peña*

http://arxiv.org/abs/1607.02676v1

Ensembles of regression trees have become popular statistical tools for the estimation of the conditional mean given a set of predictors. However, quantile regression trees and their ensembles have not yet garnered much attention despite the increasing popularity of the linear quantile regression model. This work proposes a Bayesian quantile additive regression trees model that shows very good predictive performance, illustrated using simulation studies and real data applications. Further extension to tackle binary classification problems is also considered.
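For context, what distinguishes quantile regression (and its tree ensembles) from mean regression is the loss being minimized: the check, or pinball, loss for a target quantile tau, rather than squared error. A minimal sketch:

```python
def pinball_loss(y_true, y_pred, tau):
    """Average check (pinball) loss for quantile level tau in (0, 1):
    residuals above the prediction are weighted by tau, residuals below
    by (1 - tau). Minimizing it over constants yields the tau-quantile."""
    losses = [tau * (y - f) if y >= f else (tau - 1.0) * (y - f)
              for y, f in zip(y_true, y_pred)]
    return sum(losses) / len(losses)
```

At tau = 0.5 this is half the absolute error, so median regression is the special case; asymmetric tau values penalize over- and under-prediction unequally.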

• [stat.ML]**Magnetic Hamiltonian Monte Carlo**

*Nilesh Tripuraneni, Mark Rowland, Zoubin Ghahramani, Richard Turner*

http://arxiv.org/abs/1607.02738v1

Hamiltonian Monte Carlo (HMC) exploits Hamiltonian dynamics to construct efficient proposals for Markov chain Monte Carlo (MCMC). In this paper, we present a generalization of HMC which exploits \textit{non-canonical} Hamiltonian dynamics. We refer to this algorithm as magnetic HMC, since in 3 dimensions a subset of the dynamics map onto the mechanics of a charged particle coupled to a magnetic field. We establish a theoretical basis for the use of non-canonical Hamiltonian dynamics in MCMC, and construct a symplectic, leapfrog-like integrator allowing for the implementation of magnetic HMC. Finally, we exhibit several examples where these non-canonical dynamics can lead to improved mixing of magnetic HMC relative to ordinary HMC.
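For reference, the canonical baseline that magnetic HMC modifies is ordinary HMC with its leapfrog integrator, a volume-preserving, reversible update; a minimal sketch (the paper's contribution is a different, non-canonical integrator, not shown here):

```python
def leapfrog(grad_U, q, p, step, n_steps):
    """Leapfrog integrator for canonical Hamiltonian dynamics with
    H(q, p) = U(q) + p^2 / 2: half momentum step, alternating full
    position/momentum steps, closing half momentum step."""
    p = p - 0.5 * step * grad_U(q)       # initial half step in momentum
    for _ in range(n_steps - 1):
        q = q + step * p                 # full step in position
        p = p - step * grad_U(q)         # full step in momentum
    q = q + step * p
    p = p - 0.5 * step * grad_U(q)       # closing half step in momentum
    return q, p
```

For the harmonic potential U(q) = q^2 / 2 the trajectory approximately rotates in phase space and the energy is nearly conserved (error of order step squared), which is what makes HMC proposals accepted with high probability.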

• [stat.ML]**Proceedings of the 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016)**

*Been Kim, Dmitry M. Malioutov, Kush R. Varshney*

http://arxiv.org/abs/1607.02531v1

This is the Proceedings of the 2016 ICML Workshop on Human Interpretability in Machine Learning (WHI 2016), which was held in New York, NY, June 23, 2016. Invited speakers were Susan Athey, Rich Caruana, Jacob Feldman, Percy Liang, and Hanna Wallach.

• [stat.ML]**Retrospective Causal Inference with Machine Learning Ensembles: An Application to Anti-Recidivism Policies in Colombia**

*Cyrus Samii, Laura Paler, Sarah Zukerman Daly*

http://arxiv.org/abs/1607.03026v1

We present new methods to estimate causal effects retrospectively from micro data with the assistance of a machine learning ensemble. This approach overcomes two important limitations in conventional methods like regression modeling or matching: (i) ambiguity about the pertinent retrospective counterfactuals and (ii) potential misspecification, overfitting, and otherwise bias-prone or inefficient use of a large identifying covariate set in the estimation of causal effects. Our method targets the analysis toward a well defined ``retrospective intervention effect'' (RIE) based on hypothetical population interventions and applies a machine learning ensemble that allows data to guide us, in a controlled fashion, on how to use a large identifying covariate set. We illustrate with an analysis of policy options for reducing ex-combatant recidivism in Colombia.

• [stat.ML]**Sparse additive Gaussian process with soft interactions**

*Garret Vo, Debdeep Pati*

http://arxiv.org/abs/1607.02670v1

Additive nonparametric regression models provide an attractive tool for variable selection in high dimensions when the relationship between the response and predictors is complex. They offer greater flexibility than parametric non-linear regression models, and better interpretability and scalability than fully non-parametric regression models. However, achieving sparsity simultaneously in the number of nonparametric components and in the variables within each component poses a stiff computational challenge. In this article, we develop a novel Bayesian additive regression model using a combination of hard and soft shrinkage to separately control the number of additive components and the variables within each component. An efficient algorithm is developed to select the important variables and estimate the interaction network. Excellent performance is obtained in simulated and real data examples.