|Bharathi Asokarajan||Abhijit Bendale||Mohammadreza Esfandiari|
|Joshua Little||Ethan Rudd||Kelly Spendlove|
|Yazhong Wang||Benjamin Weinstein||Xin Yang|
Data Science and Analytics
Pixel oriented visualization – An aid to analyze large-scale text data
Classics scholars work with text data that is not just Big, but also Interesting and Complex. We are developing new pixel-based text visualization techniques that display the hierarchical structure of primary texts, together with their rich apparatus metadata, in an accessible and comparable fashion. As part of this, we are investigating new ways to support focus+context interactions across multiple scales of text. These visualization designs will help scholars engage effectively and efficiently with the long and deep provenance of knowledge that surrounds some of humanity's most important historical works. We anticipate that successful application of new interactive visualization techniques for text analysis to a complex domain like classics will provide a clear direction for application to scholarship and learning on text, language, and communication in a wide variety of domains.
I am a graduate student in the Data Science and Analytics program at the University of Oklahoma. My research area is data visualization, and I am currently working as a research assistant on The Digital Latin Library Project. Our visualization team is developing novel tools to help Classics scholars perform text analysis of ancient Latin manuscripts. A distinctive feature of these texts is that they are accompanied by a densely encoded critical apparatus, which includes annotations and commentaries on words recorded by numerous scribes. The analysis of these texts results in a critical edition, an attempt to recreate the original text by eliminating the transcription errors committed by scribes. Because of the scale of the text data and the need to analyze it at multiple levels of granularity (such as pages, lines, and words), we are using a variation of the pixel-based visualization technique with additional query features and focus+context techniques.
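As a rough, hypothetical sketch of the pixel-oriented idea (not the DLL team's actual tool), each word can be rendered as one pixel whose intensity encodes a metadata attribute, such as the number of apparatus annotations attached to it; the `pixel_matrix` helper and its parameters below are illustrative assumptions:

```python
import numpy as np

def pixel_matrix(words, values, width=10):
    """Map each word to one pixel; intensity = a metadata value (e.g.
    number of apparatus annotations). Rows wrap at `width`, mimicking
    lines of text."""
    n = len(words)
    rows = -(-n // width)  # ceiling division
    mat = np.zeros((rows, width))
    for i, v in enumerate(values):
        mat[i // width, i % width] = v
    return mat

# toy example: 23 words with random annotation counts
rng = np.random.default_rng(0)
words = [f"w{i}" for i in range(23)]
counts = rng.integers(0, 5, size=23)
m = pixel_matrix(words, counts, width=8)
print(m.shape)  # (3, 8)
```

Rendering the matrix with any image viewer then shows one line of text per row, with heavily annotated words standing out, and zooming from pages to lines to words gives the multiple granularities mentioned above.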
|Data Visualization||Human Computer Interaction||Text Analysis|
Towards Open World Recognition
With the advent of rich classification models and high computational power, visual recognition systems have found many operational applications. Recognition in the real world poses multiple challenges that are not apparent in controlled lab environments: the datasets are dynamic, and novel categories must be continuously detected and then added. At prediction time, a trained system has to deal with myriad unseen categories. Operational systems require minimal downtime, even to learn. To handle these operational issues, we present the problem of Open World Recognition and formally define it. We prove that thresholding sums of monotonically decreasing functions of distances in linearly transformed feature space can balance “open space risk” and empirical risk. Our theory extends existing algorithms for open world recognition.
I am a PhD student in Computer Science at the University of Colorado at Colorado Springs. In the past, I have worked at places like MIT, the MIT Media Lab, EPFL-Idiap in Switzerland, and SRI International. I regularly consult for a number of startups on machine learning and data science. My research lies at the intersection of machine learning and computer vision. I am interested in building systems that learn continuously from incoming data and are robust to operational issues that arise in the real world. The primary focus of my work is the problem of large-scale object recognition for image understanding. However, the tools and methods I have developed have been applied in multiple areas, such as Biometrics/Face Recognition, Social Computer Vision, and Predictive Analytics in Social and Programmatic Advertising.
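A minimal illustration of the distance-thresholding idea in the abstract above (a toy stand-in, not the paper's formal model) is a nearest-class-mean classifier that rejects inputs as unknown when they fall too far from every known class; the threshold value here is arbitrary:

```python
import numpy as np

class NearestMeanOpenSet:
    """Toy open-set classifier: nearest class mean with a rejection
    threshold on distance. Inputs far from every known class are
    labeled "unknown" rather than forced into a known category."""
    def __init__(self, threshold):
        self.threshold = threshold
        self.means = {}

    def fit(self, X, y):
        for label in np.unique(y):
            self.means[label] = X[y == label].mean(axis=0)
        return self

    def predict(self, x):
        dists = {c: np.linalg.norm(x - m) for c, m in self.means.items()}
        best = min(dists, key=dists.get)
        # reject when even the closest class mean is too far away
        return best if dists[best] <= self.threshold else "unknown"

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
y = np.array([0, 0, 1, 1])
clf = NearestMeanOpenSet(threshold=1.0).fit(X, y)
print(clf.predict(np.array([0.05, 0.0])))   # 0
print(clf.predict(np.array([20.0, 20.0])))  # unknown
```

Adding a new category amounts to adding one more class mean, which hints at why incremental, low-downtime learning is feasible in this setting.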
|Computer Vision||Machine Learning||Data Science|
BSP: An iterated local search heuristic for minimizing 0-1 loss
We consider the problem of minimizing the number of misclassified instances (the 0-1 loss), which is known to be NP-hard, plus a hinge-loss margin term. We implement an iterated local search algorithm and study it on 33 randomly selected real datasets from the UCI machine learning repository. We consider the effect of optimizing the error and margin terms separately and together, with a parameter to balance the two terms. We show the effect of search parameters on our algorithm, as well as the correlation between training and test error of hyperplanes as the search progresses. Our program averages 12.4% error, matching the 12.4% reached by the cross-validated liblinear program. With bootstrapping, our error reduces to 12.3%, and the same holds for liblinear. We provide a freely available beta version of our implementation.
I work on supervised learning methods, currently focusing on the 0-1 loss minimization problem.
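A toy version of iterated local search for the 0-1 loss (a simplified stand-in for the BSP algorithm, with made-up step sizes and iteration counts) might look like the following, where candidate hyperplanes are perturbed and kept whenever the misclassification count does not increase:

```python
import numpy as np

def zero_one_loss(w, X, y):
    """Number of misclassified points for hyperplane w (bias in w[-1])."""
    pred = np.sign(X @ w[:-1] + w[-1])
    return int(np.sum(pred != y))

def iterated_local_search(X, y, iters=200, step=0.5, seed=0):
    """Toy iterated local search: perturb the hyperplane, keep the move
    if the 0-1 loss does not increase. Illustrative only."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[1] + 1)
    best = zero_one_loss(w, X, y)
    for _ in range(iters):
        cand = w + step * rng.normal(size=w.shape)
        loss = zero_one_loss(cand, X, y)
        if loss <= best:
            w, best = cand, loss
    return w, best

# linearly separable toy data
X = np.array([[0.0, 1.0], [1.0, 2.0], [3.0, 0.0], [4.0, 1.0]])
y = np.array([-1, -1, 1, 1])
w, err = iterated_local_search(X, y)
print(err)
```

Accepting equal-loss moves lets the search drift across the flat plateaus that make the 0-1 loss hard for gradient-based methods; the real algorithm would add the margin term discussed in the abstract.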
|Machine Learning||Data Science|
Computer Science & Engineering
Large Scale Understanding of Activities in Public Spaces
Obesity is one of the greatest problems currently in the United States. In order to effectively encourage human activity and help fight obesity, it is important to understand how people's use of public spaces is affected by different factors, including the built environment, the current weather, and recent public health awareness campaigns. By exploiting large-scale online image archives—such as the Archive of Many Outdoor Scenes, a data set archiving imagery from 30000 public, outdoor webcams from over the last 8 years—it is possible to obtain millions of images regarding people's use of public space. However, webcams are usually low-resolution enough that existing methods of detecting and classifying people in images perform poorly. Data driven methods are required to learn how to effectively use image context to solve this problem, and therefore to turn large image archives into useful resources to understand human behavior.
I work on computer vision problems (usually) involving large sets of images, such as from the Archive of Many Outdoor Scenes.
|Computer Vision||Machine Learning|
The Extreme Value Machine
This paper describes a scalable, non-linear model called the Extreme Value Machine (EVM), an analog to the Support Vector Machine (SVM) derived from statistical Extreme Value Theory. The EVM is far more scalable than a kernelized SVM, exhibits comparable accuracy on closed set datasets (where all classes are known at test time), and avoids the need for a parameter grid search. This allows the EVM model to scale to large datasets that are computationally infeasible for non-linear SVMs. Moreover, unlike SVMs, our EVM model performs well in the open-set regime (when unknown classes are present at test time), achieving state-of-the-art results on open-set datasets.
I am a graduate student with research interests in a combination of computer vision, machine learning, and computer security, especially as they relate to increasing the speed and scalability of current algorithms and implementations.
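The extreme-value intuition behind the EVM can be sketched with a Weibull-shaped inclusion probability that decays with distance from a class; the shape and scale parameters below are made up for illustration (in the EVM they are fit per extreme vector from margin-distance data):

```python
import numpy as np

def weibull_inclusion(dist, shape, scale):
    """EVT-style inclusion probability: decays like a Weibull survival
    function as the distance from a class's representative grows."""
    return float(np.exp(-(dist / scale) ** shape))

# hypothetical parameters for one class
shape, scale = 2.0, 1.5
for d, label in [(0.2, "near"), (5.0, "far")]:
    p = weibull_inclusion(d, shape, scale)
    print(label, round(p, 4))
```

An open-set decision then accepts a sample only if its maximum inclusion probability over all known classes exceeds a threshold, which is how unknown classes can be rejected without a kernel or a grid search.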
|Computer Vision||Machine Learning||Computer Security|
Determining Periodicity In Data
In the last few years, high-throughput technologies have enabled the efficient and inexpensive collection of massive amounts of data. In many cases the data are high dimensional and generated by some nonlinear system. In such a situation one is interested in both the geometry of the data and the action of the unknown nonlinear system. One of the most fundamental problems in analyzing nonlinear systems is determining whether the system is periodic. However, the past few decades of dynamical systems theory have shown that nonlinear systems can exhibit extremely complex behavior with respect to both system variables and parameters. Such complex behavior, established in theoretical work, must be contrasted with the realities of applications: in modeling multiscale processes, for instance, measurements may be of limited precision, parameters are rarely known exactly, and nonlinearities are often not derived from first principles. This contrast suggests that extracting a robust characterization of the periodic behavior is of greater importance than a detailed understanding of the fine structure. For such a characterization, we propose an approach that incorporates Takens’ embedding theorem, persistent homology, and diffusion maps.
I am an NSF Fellow in mathematics at Rutgers University. My research concerns the analysis of data which is typically high dimensional and being generated by some nonlinear system. I focus on investigating both the geometry of the data and the action of the unknown nonlinear system using methods from topological data analysis, Conley index theory and spectral geometry.
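One ingredient of the proposed approach, Takens’ delay embedding, can be sketched in a few lines; the dimension and delay below are arbitrary choices for a toy sine signal:

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Takens delay embedding: map a scalar time series to points
    (x_t, x_{t+tau}, ..., x_{t+(dim-1)*tau}) in R^dim."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau : i * tau + n] for i in range(dim)])

# a periodic signal embeds to a closed loop, which persistent homology
# would detect as a single robust 1-dimensional cycle
t = np.linspace(0, 4 * np.pi, 200)
emb = delay_embed(np.sin(t), dim=2, tau=12)
print(emb.shape)
```

Persistent homology applied to the embedded point cloud then reports a long-lived 1-cycle exactly when the underlying system is (robustly) periodic, which is the characterization the abstract argues for.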
|Topological Data Analysis||Dynamical Systems||Spectral Geometry|
Cosmological experiments in condensed matter systems
We studied topological defects in hexagonal manganites to help understand the evolution of our Universe based on the Kibble-Zurek mechanism. In this work, we have to count the defect density at large scale and record the coordinates of each vortex core in optical images. This data analysis currently takes us several months, so we are seeking a more efficient way to do it; an automated approach would greatly benefit our future research.
I work on magnetic and ferroelectric properties of emergent materials. We also study topological defects formed by ferroelectric domains. In this work, we need to deal with large numbers of ferroelectric vortex-antivortex pairs. This data analysis takes a great deal of time, and we are seeking more efficient methods.
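One possible automation, assuming vortex cores appear as bright connected regions after thresholding, is a simple connected-component detector; this is a generic sketch, not tailored to the actual optical images:

```python
import numpy as np
from collections import deque

def count_cores(image, threshold):
    """Find vortex-core candidates as connected bright regions in a
    thresholded image; return each region's centroid (toy detector)."""
    mask = image > threshold
    seen = np.zeros_like(mask, dtype=bool)
    centroids = []
    for i in range(mask.shape[0]):
        for j in range(mask.shape[1]):
            if mask[i, j] and not seen[i, j]:
                # flood-fill one connected component (4-neighbourhood)
                queue, pixels = deque([(i, j)]), []
                seen[i, j] = True
                while queue:
                    a, b = queue.popleft()
                    pixels.append((a, b))
                    for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        u, v = a + da, b + db
                        if 0 <= u < mask.shape[0] and 0 <= v < mask.shape[1] \
                                and mask[u, v] and not seen[u, v]:
                            seen[u, v] = True
                            queue.append((u, v))
                centroids.append(tuple(np.mean(pixels, axis=0)))
    return centroids

img = np.zeros((6, 6))
img[1, 1] = img[1, 2] = 1.0   # one core
img[4, 4] = 1.0               # another core
print(len(count_cores(img, 0.5)))  # 2
```

Run over a batch of images, this yields both the defect density (count per area) and the core coordinates in one pass, replacing months of manual counting with minutes of computation, provided the thresholding assumption holds for the real data.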
Ecology and Evolution
A pipeline for combining crowd-sourced images and computer vision to monitor plant flowering
Using images gathered from the Flickr photo-sharing site to collect data on the timing of flowering plants in Mt. Rainier National Park.
My primary research focuses on the mechanisms that maintain biodiversity. Most of my work is on tropical hummingbirds and their food plants in a global biodiversity hotspot in northern Ecuador. I am committed to finding new ways to collect field data and to increasing the use of computer vision in biology.
|Community Ecology||Computer Vision||Citizen Science|
Spatial Regularization for Multitask Learning and Application in fMRI Data Analysis
fMRI data has an extremely complicated structure, so efficient and accurate models that incorporate spatial and spectral information are necessary for accurately detecting neuronal activity. In this paper, we use the General Linear Model to formulate the fMRI data, treating each voxel as a task, and propose a class of spatial Multi-task Learning models that incorporates the spatial information provided by each task's neighborhood. Results on simulated and real data show satisfactory performance from the spatial Multi-task Learning algorithms.
I am a graduate student at MTSU; my research area is applying machine learning algorithms to fMRI data.
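A toy version of spatially regularized multi-task estimation, with one regressor per voxel and a chain-graph Laplacian penalty tying neighbouring voxels' coefficients together (a deliberate simplification of the models in the abstract), could look like:

```python
import numpy as np

def spatial_glm(X, Y, lam):
    """Toy spatially regularized GLM: each voxel (column of Y) is a task
    with a single regressor X; coefficients of neighbouring voxels are
    coupled by a graph-Laplacian penalty lam * sum ||b_u - b_v||^2
    over a 1-D chain of voxels. Solved in closed form."""
    V = Y.shape[1]
    # Laplacian of a chain graph over voxels
    L = 2 * np.eye(V) - np.eye(V, k=1) - np.eye(V, k=-1)
    L[0, 0] = L[-1, -1] = 1
    xtx = float(X @ X)
    A = xtx * np.eye(V) + lam * L
    return np.linalg.solve(A, X @ Y)

rng = np.random.default_rng(1)
X = rng.normal(size=50)                  # one regressor, 50 time points
true = np.array([1.0, 1.0, 1.0, 0.0])    # smooth activation over 4 voxels
Y = np.outer(X, true) + 0.1 * rng.normal(size=(50, 4))
print(np.round(spatial_glm(X, Y, lam=5.0), 2))
```

Setting `lam=0` recovers independent per-voxel least squares; increasing it pulls neighbouring estimates together, which is the sense in which spatial information from each task's neighborhood enters the fit.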
Information Sciences and Technology
Clustering Distributions at Scale: A New Tool for Data Sciences
We introduce a fast and parallel tool for clustering large-scale discrete distributions under the optimal transport distance. The significant computational cost of optimal transport has left machine learning on such unstructured data almost untouched until today. Our proposed optimization method resolves the scalability bottleneck of previous methods and is therefore readily applicable to analyzing large distributional datasets without first specifying a parametric form that the data distribution must follow.
I am interested in the broader area of machine learning and vision. My PhD thesis focuses on large-scale non-parametric learning and its applications in visual perception and affective image modeling.
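In one dimension the optimal transport (W1) distance between equal-size samples has a closed form, which makes a toy clustering-of-distributions demo easy to write; this small k-medoids sketch is a stand-in for, not an implementation of, the scalable method described above:

```python
import numpy as np

def wasserstein_1d(a, b):
    """W1 distance between two equal-size empirical samples: sort both
    and average the absolute differences (exact in one dimension)."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

def cluster_two(samples, iters=5):
    """Toy 2-medoid clustering of distributions under W1, with a
    deterministic farthest-point initialisation."""
    m0 = 0
    m1 = int(np.argmax([wasserstein_1d(samples[m0], s) for s in samples]))
    medoids = [m0, m1]
    labels = [0] * len(samples)
    for _ in range(iters):
        # assign each distribution to its nearest medoid under W1
        labels = [int(np.argmin([wasserstein_1d(s, samples[m])
                                 for m in medoids])) for s in samples]
        # re-pick each medoid as the member minimising within-cluster cost
        for c in (0, 1):
            members = [i for i, l in enumerate(labels) if l == c]
            costs = [sum(wasserstein_1d(samples[i], samples[j])
                         for i in members) for j in members]
            medoids[c] = members[int(np.argmin(costs))]
    return labels

rng = np.random.default_rng(0)
dists = [rng.normal(0, 1, 200) for _ in range(3)] + \
        [rng.normal(4, 1, 200) for _ in range(3)]
print(cluster_two(dists))  # [0, 0, 0, 1, 1, 1]
```

The quadratic pairwise cost here is exactly the kind of bottleneck the abstract's method avoids: in higher dimensions each W1 evaluation itself becomes an expensive optimization, which is why scalable Wasserstein clustering is nontrivial.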
|Machine Learning||Multimedia||Computer Vision|