What does NASA do with big data? An analysis

In the time it takes to read this title, NASA may have collected up to 1.73 GB of data from about 100 currently active missions. As NASA continues to expand its work, its data collection rate is growing exponentially, which makes managing this data a difficult task. Yet the data is also very valuable and plays a significant role in scientific research. NASA is working to streamline the use of this data, integrate it into day-to-day work, and predict trends in the universe, in the hope of benefiting all of humanity through innovation and creativity.

In version 2.0 of its Open Government Plan, published in 2012, NASA discussed the value of big data in its work, though not in depth, and acknowledged that big data has very broad potential for exploration.

I assume readers are already familiar with the definition and role of big data, so the concept itself is not described again here. Let's go straight to the topic of today's discussion.

NASA Big Data Challenge

We might assume that NASA's big data challenge concerns only the Earth, but the reality is not so narrow. Most large data sets are described by substantial sets of metadata, and they pose serious challenges to current and future data management practices. Broadly, NASA's main task is to keep receiving information from spacecraft, and that information is produced much faster than it can currently be managed, stored, and analyzed. NASA operates two types of spacecraft: deep-space spacecraft and near-Earth orbiting satellites. Deep-space spacecraft send data back to Earth at rates on the order of megabytes per second. Near-Earth satellites work in a similar way but transmit on the order of gigabytes per second. NASA is using laser and other communications technologies to increase its download capacity for large-scale data by thousands of times. For now, however, NASA cannot handle that much data, and it clearly needs to prepare. NASA's current goal is to process 24 TB of data per day. Treated as a single workload, that daily throughput amounts to 2.4 times the holdings of the Library of Congress.
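To put the 24 TB per day target in perspective, it can be converted into an average sustained rate. The short Python sketch below does that arithmetic, and also backs out the Library of Congress size implied by the "2.4 times" comparison. Decimal units (1 TB = 10^12 bytes) are an assumption, and the variable names are only illustrative.

```python
# Back-of-the-envelope conversion of the daily processing target.
# Assumption: decimal units, 1 TB = 10**12 bytes and 1 MB = 10**6 bytes.
TB = 10**12
MB = 10**6
SECONDS_PER_DAY = 24 * 60 * 60

daily_target_bytes = 24 * TB

# Average rate needed to work through 24 TB in one day.
avg_rate_mb_per_s = daily_target_bytes / SECONDS_PER_DAY / MB

# Library of Congress size implied by "24 TB is 2.4 times the Library".
implied_library_tb = 24 / 2.4

print(f"Average sustained rate: {avg_rate_mb_per_s:.1f} MB/s")
print(f"Implied Library of Congress size: {implied_library_tb:.0f} TB")
```

The result is an average only. Real downlinks arrive in bursts, so peak rates would be higher than this figure.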

NASA focuses on collecting the most important information from this mass of data rather than storing everything, because transferring data from a spacecraft to a NASA data center is very costly. Once the data arrives, NASA faces a series of follow-up tasks: storing, managing, visualizing, and analyzing it in the data center. To give a sense of the scale involved, the global climate change data repository is expected to grow to 230 PB by the end of 2030. For comparison, all the letters delivered by the US Postal Service in one year amount to only about 5 PB.
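The idea of sending back only the most valuable data under a limited transfer budget can be sketched as a simple greedy selection. The Python example below is purely hypothetical: the observation names, sizes, priority scores, and the value-per-megabyte rule are assumptions made for illustration, not a description of how NASA actually schedules downlinks.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    name: str
    size_mb: float
    priority: float  # hypothetical score: higher means more valuable


def select_for_downlink(observations, budget_mb):
    """Greedily pick observations by priority per MB until the budget is used."""
    ranked = sorted(observations, key=lambda o: o.priority / o.size_mb, reverse=True)
    chosen = []
    remaining = budget_mb
    for obs in ranked:
        if obs.size_mb <= remaining:
            chosen.append(obs)
            remaining -= obs.size_mb
    return chosen


# Made-up queue of data products waiting on a spacecraft.
queue = [
    Observation("storm_image", 400.0, 9.0),
    Observation("routine_telemetry", 50.0, 2.0),
    Observation("calibration_frame", 300.0, 1.0),
    Observation("flare_spectrum", 120.0, 8.0),
]

selected = select_for_downlink(queue, budget_mb=600.0)
selected_names = [o.name for o in selected]
print(selected_names)
```

A greedy rule like this is easy to reason about but is not guaranteed to be optimal. A real system would also have to account for deadlines, contact windows, and data that loses value over time.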

In addition to spacecraft, NASA also needs to process data from online platforms, low-cost sensors, and mobile devices. An article published in Harvard Business Review in October 2012 described the situation as one in which every person has effectively become a walking data generator. Like many other organizations, NASA faces big data challenges that seem extremely difficult to solve.

The growing volume of data is not the only challenge NASA faces. As data size increases, the difficulty of transferring, indexing, and searching it grows exponentially as well. In addition, algorithms and devices keep getting more complex, technology refresh cycles are accelerating, and budgets are shrinking, all of which significantly affect NASA's big data processing efforts. Fortunately, the US government is paying close attention to big data challenges. In March 2012, the Obama administration announced the Big Data Research and Development Initiative, which focuses on improving the tools and technologies needed to access, organize, and extract information from huge volumes of digital data. The goal of the initiative is to change the way the government uses big data and to make that data more valuable in biomedical and environmental research, education, national security, and scientific discovery.

Existing programs

NASA is considering building new processing solutions designed to visualize, analyze, and interpret its highest-priority data. Within government, handling big data effectively requires both bottom-up and top-down approaches. Working from the perspective of its mission objectives (technology, science, human space exploration, aeronautics, and operations), NASA identified a variety of big data processing solutions and practical initiatives in version 2.0 of its Open Government Plan.

NASA offers world-leading examples of archiving, storing, managing, visualizing, analyzing, and putting big data to practical use:

Management and processing

The Mission Data Processing and Control System (MDPCS) illustrates how NASA processes and manages large-scale data. It was recently used by the Mars rover Curiosity. Working together with NASA's deep space communications framework, MDPCS delivers data from the Curiosity rover on Mars and processes the raw data in real time. Previously, the whole process took hours or even days of computation. The rover operations team also uses custom data visualizations built with the system while carrying out the mission.


NASA's Goddard Institute for Space Studies and the Global Modeling and Assimilation Office are the primary users of the NASA Center for Climate Simulation (NCCS), which specializes in big data storage for NASA. NCCS focuses mainly on weather and climate data. It currently holds 32 PB of data and has a total storage capacity of 37 PB. NCCS also offers an advanced visualization tool, a 17-foot by 16-foot visualization wall. The wall provides a high-resolution surface on which scientists can display animations, images, and video derived from NCCS data.

Archiving and distribution

The Atmospheric Science Data Center (ASDC) focuses on Earth science, while the Planetary Data System (PDS) focuses primarily on planetary science. ASDC's operations show clearly how NASA archives and processes big data. ASDC is located at NASA's Langley Research Center and is responsible for distributing, archiving, and processing NASA Earth science data. The atmospheric data it provides plays a vital role in understanding global climate change and the impact of human activity on it, and the center has now collected many years of climate data. PDS brings together scientific data from NASA's planetary missions, laboratory measurements, and astronomical observations on a website that currently offers more than 100 TB of space images, models, telemetry, and other information related to planetary missions over the past 30 years.


NASA's Pleiades supercomputer provides powerful analytical capabilities and supports work ranging from space weather and solar flares to space vehicle design. Pleiades has recently been used to process the massive amount of star data collected by NASA's Kepler spacecraft, which searches the galaxy for planets close to Earth's size. About 1,200 users across the United States rely on the system for complex, computation-heavy tasks. Pleiades was also used to run the Bolshoi cosmological simulation, a project that analyzed how galaxies and the large-scale structure of the universe evolved over billions of years.


The NASA Earth Exchange (NEX) is a virtual laboratory that combines data visualization, data systems, models and algorithms, supercomputing, and very large online data sets with collaboration technology and social networking. Before NEX was established, scientists spent a great deal of time and effort building high-end computing methods, which kept them from concentrating on actual scientific problems. Scientists can now use the supercomputer to visualize Earth science data sets, share and run modeling algorithms, and collaborate on existing or new projects. Recently, a research team in the United States used the NEX environment to process imagery and map global vegetation density at a resolution of 30 meters. The Pleiades supercomputer processed all 34 billion pixels in only a few hours, which makes it easy for the team to experiment with new methods and algorithms. NEX also gives the Earth science community a platform for knowledge sharing and collaboration that combines workflow management, Earth system modeling, NASA remote sensing data sources, and supercomputing to provide researchers with a comprehensive solution.

Commercial cloud computing services

The Mars Science Laboratory mission shows that NASA's current approach to modernizing large-scale data handling works, and that it makes good use of commercial cloud storage and cloud computing services. NASA migrated to Amazon Web Services and a new content management system in less than four months. In the past, the Mars Science Laboratory depended heavily on mission-critical applications that were distributed across about 10 data centers, where any failure could affect the ability to stream about 150 Gb of data per second to the public, scientists, and operators. The team has now developed a solution that can download telemetry and raw images directly from Curiosity. All images from Mars are delivered, uploaded, stored, and processed in the cloud as data streams. Using a highly available, scalable database, the data is classified and published to applications and users through a RESTful interface. This lets the content managers of the Mars website use powerful real-time imagery to provide relevant information. The solution helped NASA deliver up to 120 TB of dynamic content and 30 TB of static content overnight, enough to meet more than 8 million requests per minute. In addition, the team can take full advantage of the JPL Nebula and JPL Galaxy supercomputers. Together, the two machines can process about 200 Monte Carlo simulation tasks, at 20 GB per task, within 24 hours.

NASA's big data applications in real life

Bringing big data technology into NASA has benefited not only the US government but also the general public. One area where NASA's big data expertise is applied in real life is aviation safety. NASA collects data from aircraft to uncover safety risks, which helps commercial airlines improve their maintenance processes and avoid equipment failures. Using advanced algorithms, NASA can extract relevant information from large volumes of unstructured data in order to anticipate and avoid safety problems. With an open source algorithm called Multiple Kernel Anomaly Detection (MKAD), NASA can find commonalities between two data sources or data streams, analyze them within a single framework to learn patterns, and automatically detect links to earlier flight failure events.
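As a rough illustration of the multiple-kernel idea, the Python sketch below combines one kernel over continuous sensor readings with another over a discrete event sequence, then scores each flight by how dissimilar it is from the rest. This is not NASA's MKAD code. The flight records, the two kernel choices, the equal weighting, and the average-similarity score (used here in place of a trained anomaly detector) are all assumptions made for the example.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Similarity of two continuous feature vectors (Gaussian kernel)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def lcs_length(a, b):
    """Length of the longest common subsequence of two event sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sequence_kernel(a, b):
    """Normalized LCS similarity between two discrete sequences."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / math.sqrt(len(a) * len(b))


def combined_kernel(f1, f2, weight=0.5):
    """Weighted sum of the continuous and discrete kernels."""
    return (weight * rbf_kernel(f1["continuous"], f2["continuous"])
            + (1 - weight) * sequence_kernel(f1["discrete"], f2["discrete"]))


def anomaly_scores(flights, weight=0.5):
    """Score each flight as 1 minus its mean similarity to all other flights."""
    scores = []
    for i, fi in enumerate(flights):
        sims = [combined_kernel(fi, fj, weight)
                for j, fj in enumerate(flights) if j != i]
        scores.append(1.0 - sum(sims) / len(sims))
    return scores


# Made-up flight records: two sensor values plus a sequence of switch events.
flights = [
    {"continuous": [0.0, 0.1], "discrete": ["gear", "flaps", "land"]},
    {"continuous": [0.1, 0.0], "discrete": ["gear", "flaps", "land"]},
    {"continuous": [0.0, 0.0], "discrete": ["gear", "flaps", "land"]},
    {"continuous": [3.0, 3.0], "discrete": ["flaps", "land", "gear"]},
]

scores = anomaly_scores(flights)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
print(most_anomalous, [round(s, 3) for s in scores])
```

Combining the kernels lets one score reflect both unusual sensor values and an unusual order of events. A flight that looks normal in one view but odd in the other still receives a higher score.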

Big data brings opportunities

NASA has deservedly become a leader in big data applications, from real-time observation of global climate change to the study of solar plasma ejections to many large engineering and modernization tasks. NASA scientists are working on innovative ways to manage a changing environment, to help the government cope with the many challenges that come with it, and to change the way NASA itself does business. NASA has almost unlimited opportunities for development in big data exploration.

The Open Government Plan outlines NASA's specific initiatives in big data exploration. NASA has established the data.nasa.gov website as its data reference portal, which can be viewed as NASA's own simple data catalog. NASA is also using these capabilities to provide users with easier-to-use, high-quality tools and related data applications.

NASA scientists have set a goal of creating more room for collaboration around NASA's big data opportunities while strengthening partnerships with other organizations, so as to encourage the public to use these raw data sets and to support both the development of related applications and NASA's own missions. NASA also worked with the US Department of Energy's Office of Science and the National Science Foundation on the Big Data Challenge, a contest held on the TopCoder platform. Participants were asked to develop mobile applications that find new value in disparate data from government information domains, and to consider how to move beyond individual silos toward general-purpose solutions that can be shared across agencies. This opens a new direction for partnership with NASA and contributes fresh thinking that can help the government succeed in the future.

From all this we can glimpse how NASA approaches the effective handling of big data and the groundbreaking work that puts it to use. Certainly, the more effectively big data is managed, the more use can be made of it. With important organizations such as NASA adopting it widely, big data also has a bright future. A growing number of people have big data training and qualifications, and many of the world's top institutions are actively recruiting them. Building knowledge and earning certification in big data can therefore help in landing desirable jobs at well-known institutions around the world.

