Growing Pains for Deep Learning

Advances in theory and computer hardware have allowed neural networks to become a core part of online services such as Microsoft's Bing, driving their image-search and speech-recognition systems. The companies offering such capabilities are looking to the technology to drive more advanced services in the future, as they scale up the neural networks to deal with more sophisticated problems.

It has taken time for neural networks, initially conceived 50 years ago, to become accepted parts of information technology applications. After a flurry of interest in the 1990s, supported in part by the development of highly specialized integrated circuits designed to overcome their poor performance on conventional computers, neural networks were outperformed by other algorithms, such as support vector machines in image processing and Gaussian models in speech recognition.

Older, simpler neural networks use at most three layers, split into an input layer, a middle 'hidden' layer, and an output layer. The neurons are highly interconnected across layers: each neuron feeds its output to each of the neurons in the following layer. The networks are trained by iteratively adjusting the weights each neuron applies to its input data, to try to minimize the error between the output of the entire network and the desired result.
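
As a concrete illustration, here is a minimal sketch of such a three-layer network in Python/NumPy, trained by the iterative weight adjustment described above. The layer sizes, learning rate, and toy data are illustrative assumptions, not details from the article:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 4 input features, binary target (illustrative only).
X = rng.normal(size=(100, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)

# One hidden layer between input and output, as in the classic design.
W1 = rng.normal(scale=0.5, size=(4, 8))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(8, 1))   # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(2000):
    # Forward pass: every neuron feeds every neuron in the next layer.
    h = sigmoid(X @ W1)       # hidden-layer activations
    out = sigmoid(h @ W2)     # network output

    # Backward pass: adjust the weights to shrink the output error.
    d_out = (out - y) * out * (1 - out)      # error signal at the output
    grad_W2 = h.T @ d_out
    d_h = (d_out @ W2.T) * h * (1 - h)       # error signal at the hidden layer
    grad_W1 = X.T @ d_h
    W2 -= lr * grad_W2 / len(X)
    W1 -= lr * grad_W1 / len(X)
```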

Although neuroscience suggested the human brain has a deeper architecture involving a number of hidden layers, the results from early experiments on these types of systems were worse than for shallow networks. In 2006, work on deep architectures received a significant boost from work by Geoffrey Hinton and Ruslan Salakhutdinov at the University of Toronto. They developed training techniques that were more effective for networks with multiple hidden layers. One of the techniques was 'pre-training,' which adjusts the output of each layer independently before moving on to optimizing the network's output as a whole. The approach made it possible for the upper layers to extract high-level features that could be used to classify data more efficiently, building on the simpler features found by the lower, hidden layers.
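
The idea can be sketched as greedy layer-wise pre-training: each layer is first trained on its own to produce a useful encoding of its input, and only then is the whole stack optimized jointly. The sketch below uses a simple autoencoder-style criterion for each layer rather than the restricted Boltzmann machines Hinton and Salakhutdinov actually used; the sizes and training loop are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def pretrain_layer(data, n_hidden, lr=0.1, steps=500):
    """Train one layer to reconstruct its own input (autoencoder-style)."""
    n_in = data.shape[1]
    W = rng.normal(scale=0.1, size=(n_in, n_hidden))
    for _ in range(steps):
        h = sigmoid(data @ W)        # encode
        recon = sigmoid(h @ W.T)     # decode with tied weights
        g = (recon - data) * recon * (1 - recon)
        # Gradient of the reconstruction loss through both paths of W.
        dW = data.T @ ((g @ W) * h * (1 - h)) + g.T @ h
        W -= lr * dW / len(data)
    return W, sigmoid(data @ W)      # weights and this layer's output

X = rng.random((200, 32))            # toy input data in [0, 1]
weights, activations = [], X
for size in (16, 8, 4):              # one pre-training pass per hidden layer
    W, activations = pretrain_layer(activations, size)
    weights.append(W)
# `weights` now initializes a deep network, which would then be fine-tuned
# end-to-end on the actual task (fine-tuning step omitted here).
```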

Even with improvements in training, scale presents a problem for deep learning. The need to fully interconnect neurons, particularly in the upper layers, requires immense compute power. The first layer for an image-processing application may need to analyze a million pixels. The number of connections in the multiple layers of a deep network will be orders of magnitude greater. "There are billions and even hundreds of billions of connections that have to be processed for every image," says Dan Cireșan, researcher at the Manno, Switzerland-based Dalle Molle Institute for Artificial Intelligence Research (IDSIA). Training such a large network requires quadrillions of floating-point operations, he adds.
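
The arithmetic behind those numbers is easy to reproduce with a back-of-the-envelope sketch. The layer sizes, example counts, and epochs below are illustrative assumptions, not figures from any specific network in the article:

```python
# Rough connection and training-cost arithmetic for fully connected layers.
# A 1,000 x 1,000-pixel image gives ~1e6 inputs to the first layer.
layers = [1_000_000, 10_000, 10_000, 1_000]   # input plus three dense layers

connections = sum(a * b for a, b in zip(layers, layers[1:]))
print(f"connections: {connections:.2e}")       # ~1.01e10, ten billion or so

# Each training example costs roughly one multiply and one add per connection
# on the forward pass, and about twice that again for backpropagation.
flops_per_example = 3 * 2 * connections
examples = 100_000
epochs = 10
total_flops = flops_per_example * examples * epochs
print(f"training cost: {total_flops:.2e} FLOPs")   # ~6e16: tens of quadrillions
```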

Researchers such as Cireșan found it was possible to use alternative computer architectures to massively speed up processing. Graphics processing units (GPUs) made by companies such as AMD and Nvidia can perform hundreds of floating-point operations in parallel. Previous attempts to speed up neural-network training revolved around clusters of workstations, which are slower but easier to program. In one experiment in which a deep neural network was trained to look for characteristic visual features of biological cell division, Cireșan says the training phase could have taken five months on a conventional CPU; "it took three days on a GPU."
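
The reason GPUs help so much is that almost all of the work in training reduces to large matrix multiplications, whose many independent multiply-adds can run in parallel. A minimal sketch of that observation, using NumPy on the CPU (the sizes are illustrative; the same expression on GPU tensors, via a library such as CuPy or PyTorch, is where the speed-up Cireșan describes comes from):

```python
import time
import numpy as np

rng = np.random.default_rng(2)

# One dense layer applied to a minibatch is a single matrix multiplication:
# (batch x inputs) @ (inputs x outputs).
batch, n_in, n_out = 256, 4096, 4096
X = rng.normal(size=(batch, n_in)).astype(np.float32)
W = rng.normal(size=(n_in, n_out)).astype(np.float32)

t0 = time.perf_counter()
for _ in range(10):
    Y = X @ W
elapsed = time.perf_counter() - t0

flops = 10 * 2 * batch * n_in * n_out
print(f"{flops / elapsed / 1e9:.1f} GFLOP/s on this CPU")
# On a GPU the identical product typically runs one to two orders of
# magnitude faster, since its thousands of cores evaluate the independent
# multiply-adds of the product simultaneously.
```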

Yann LeCun, director of artificial intelligence research at Facebook and founding director of New York University's Center for Data Science, says, "Before, neural networks were not breaking records for recognizing continuous speech; they were not big enough. When people replaced Gaussian models with deep neural nets, the error rates went way down."

Deep neural nets delivered an improvement of more than a third, cutting error rates on speech recognition with little background noise from 35% to under 25%, and subsequent optimizations have reduced them further.

There are limitations to this form of learning. London-based DeepMind, bought by Google in early 2014 for $400 million, used computer games to evaluate the performance of deep neural networks on different types of problems. Google researcher Volodymyr Mnih says the system cannot deal with situations such as traversing a maze, where rewards come only after a number of stages have been completed successfully. In these cases, the network has very little to learn from when its initial random maneuvers fail. The deep neural network fares much better at games such as Breakout and Virtual Pinball, where success may be delayed but the network can still learn from random responses.
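
The difficulty Mnih describes is easy to demonstrate: when reward arrives only after a long, specific sequence of steps, a randomly exploring agent almost never sees it, so there is no signal to learn from. A toy simulation of that effect (the corridor lengths and episode counts are illustrative assumptions):

```python
import random

random.seed(0)

def random_episode(corridor_length, max_steps=100):
    """Random walk along a corridor; reward only on reaching the far end."""
    pos = 0
    for _ in range(max_steps):
        pos += random.choice((-1, 1))   # random exploratory move
        pos = max(pos, 0)
        if pos == corridor_length:
            return 1.0                  # sparse reward: only at the goal
    return 0.0                          # most episodes: nothing to learn from

for length in (2, 5, 10, 20):
    hits = sum(random_episode(length) for _ in range(10_000))
    print(f"corridor {length:2d}: reward seen in {hits / 100:.2f}% of episodes")
# As the reward recedes, random play almost never reaches it, so a network
# trained on these episodes gets essentially no learning signal. In Breakout,
# by contrast, even random paddle movements occasionally return the ball and
# produce reward the network can learn from.
```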

When it comes to deploying deep networks in commercial applications, teams have turned to customized hardware in the form of field-programmable gate arrays (FPGAs). These devices implement custom electronic circuits using a combination of programmable logic lookup tables, hard-wired arithmetic logic units optimized for digital signal processing, and a matrix of memory cells that defines how all of these elements are connected.
