Near Real-Time Data Analysis
We are entering an era of cheap data. Sensor technology has advanced to the point where it has become easy to collect large amounts of measurement data at high spatiotemporal resolution.
We now have gigabytes' worth of data on soil moisture, plant canopy processes, precipitation, wind speed, and temperature, but the volume is so overwhelming that we have a difficult time dealing with it. Because the cost of measurement data is dropping so quickly, researchers are being forced to move from the historical mindset of analyzing individual data points to one of turning gigabytes of data into knowledge.
One approach, suggested by my colleague Rick Gill, a BYU ecologist, is to collaborate with bioinformatics students. Because they are used to working with DNA data, these students know how to write computer programs that analyze large amounts of data in near real-time. Rick came up with the idea of tapping these students' expertise to analyze the considerable information he anticipates collecting in our Desert FMP Project, an experiment that will use TEROS 21 and SRS sensors to determine the roles of environmental and biological factors in rangeland fire recovery.
Rick and I predict that near real-time data analysis will give us several advantages. First, we need readily available information that tells us the sensors and systems at the remote site are working. Large gaps in data are common at sites that aren't visited often, and sensor failures are missed when data are collected but never analyzed. With our new approach, all data are stored in a database immediately, and the results are visualized as we go. We'll also be able to adjust what's being analyzed as we see what's happening: we can tell the bioinformatics students what we need as the results come in, and if we see important trends, we can assign them to analyze relevant new data right away. A sketch of the kind of automated check we have in mind follows below.
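To make that concrete, here is a minimal sketch in R (the language we plan to use for the real-time products) of an automated gap and failure check. Everything in it is illustrative: the column names, logging interval, and thresholds are hypothetical stand-ins, not our actual configuration.

```r
# Minimal sketch: scan the latest logger records for gaps and stuck sensors.
# "timestamp" and "soil_moisture" are hypothetical column names.

check_sensor_stream <- function(readings, expected_interval_min = 60) {
  readings <- readings[order(readings$timestamp), ]

  # Flag gaps: consecutive records spaced further apart than 1.5x the
  # expected logging interval.
  minutes_between <- diff(as.numeric(readings$timestamp)) / 60
  gaps <- minutes_between > expected_interval_min * 1.5

  # Flag a possibly failed sensor: zero variation over the last 24 records.
  recent <- tail(readings$soil_moisture, 24)
  stuck <- length(recent) == 24 && sd(recent, na.rm = TRUE) == 0

  list(n_gaps = sum(gaps), sensor_stuck = stuck)
}

# Example: hourly readings with one simulated six-hour outage.
ts <- seq(as.POSIXct("2013-06-01 00:00"), by = "1 hour", length.out = 48)
readings <- data.frame(timestamp = ts[-(10:15)],
                       soil_moisture = runif(42, 0.10, 0.35))
check_sensor_stream(readings)  # reports 1 gap, sensor_stuck = FALSE
```

In practice a script like this would run each time new records arrive, so a failure could trigger an alert or a flag on the visualization instead of going unnoticed until the next site visit.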
These techniques have the potential to help scientists from all disciplines become more efficient at collecting and analyzing large data streams. Although we've started the process, we have yet to determine its effectiveness. I will post more information as we see how well it works and as new developments arise.
Watch Dr. Gill’s data analysis webinar: Finding Insights in Big Data Sets
Download the “Researcher’s complete guide to soil moisture”
This is a general approach, presented in an appealing way. How about illustrating one easy, practical method for dealing with real-time data analysis? In other words, please give a clear, practical example. Thank you very much.
Hi Abdelwahed. Thanks for your question. It is somewhat hard to speak in specifics at this point, since this is a work in progress. However, we are collecting all data from the field site via Decagon’s Em50G, which automatically stores them on a server. From there, we’re in the process of writing scripts to take the data off the server, parse them, and store them in HydroServer (pictured in the post), a freely available database package. HydroServer offers the flexibility to manipulate the data with the popular programming language R, which we will use to produce the real-time data products. As I said, we are currently working on this, so we don’t have any results to show, but we will blog about it when we do.
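To give a flavor of what those scripts will look like, here is a rough sketch in R. It assumes the server exposes each download as a plain CSV file; the file name and column names are hypothetical placeholders, not the actual Em50G or HydroServer formats.

```r
# Rough sketch of the planned workflow, assuming records arrive as CSV.
# "em50g_latest.csv", "timestamp", and "port1" are hypothetical placeholders.

raw <- read.csv("em50g_latest.csv", stringsAsFactors = FALSE)
raw$timestamp <- as.POSIXct(raw$timestamp, format = "%Y-%m-%d %H:%M")

# Summarize one sensor port to daily means: the kind of product we want
# to publish continuously as data arrive from the field.
raw$day <- as.Date(raw$timestamp)
daily <- aggregate(port1 ~ day, data = raw, FUN = mean)

# Plot the running daily summary.
plot(daily$day, daily$port1, type = "l",
     xlab = "Date", ylab = "Daily mean, port 1")
```

Once the parsing is settled, the idea is that the same script would push the parsed values into HydroServer rather than keeping them in a local data frame, and the real-time R products would read from there.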