- GIS Group needs more people
- GEONIS needs a service agreement
- Smart Forest
- Sensor Networks
- Streaming Data webpage- who has?-- Wade, Sven, Phil, Margaret, :: add more ::
- Where are they now
- better search options
- METADATA organization
- too many results in some cases
- first all site searchable data
Inigo -- DRUPAL GROUP -----ason
- code commits for the master branches of DEIMS
- special ones for each group, like mcmurdo, ntl :: add more ::
- CSV v. EML etc.
- Training? - last year Taiwan too far? but this year, 3 sessions
- Development, training, migration and adoption
- what can we add
- next databits editor
- Tuesday -- 3D scientific visualization, Long's Peak Diamond West, 4:00-6:00
- Wednesday -- 1:30 to 3:30 Mountain West East, Ned Gardiner, Visualization Workshop
- Tuesday -- 8:45 AM Ned's Talks
- Tuesday -- 10 am, DEIMS talk - Phil
- Wednesday -- 4 pm, Migrating your site to DEIMS
- Tuesday - 1:30-3:30 an exploration in LTER Large Data -- "big data"? Haha.
- Wednesday - 8:00 to 9:45 - GCE Sensor Toolbox usage, opportunities to get deeper into it, needs assessment
-
- DataONE and the discovery problem (is this like SEO?) Semantic web features
-
- new data center ad hoc group
-
provide a well-documented set of core network information management services that support data preservation and re-use
-
lower the barrier to adoption
-
etc. :: add more ::
-
physical entity or more PI's
-
governance / operations committee
-
administrative functions are needed
-
Operations Committee:
- NISAC, IM EXEC, etc. committees will have overlap
-
Project management:
- The work that we are doing stays on track
- Management Processes - clearly defined. See flow chart?
- Data processing and discovery
-
Define a basic workflow
- input from the IMC
- Science colleagues
- Executive Core helps us to spend money in a way that makes sense to all involved
- Project selection/ prioritization process
-
Communications office collaboration?
-
IGERT grant for collaboration of broader sciences with IM's
-
NSF funding -- extension from Saran?
-
Can we move faster than that?
-
Theresa questions about meeting and funding :: could not hear well please fill in ::
- service framework, core and additional, creative activities
- strawman process for science input
- committee structure, interactive relationships
- technical infrastructure
- budget development
In the new structure, how will we define our relationships so that they are not unbalanced?
Peter Groffman came by
-
Margaret -- technical interactions for on and off campus
-
UVA, Wisconsin, etc. interested.
-
Rules on the distribution of the money
-
Accountability from the ESIP PI?
-
Could not do this from office -- could not use ESIP infrastructure.
-
If ESIP will facilitate the science, it would be good.
-
Many lunches?
-
NCEAS
-
planned not as an inheritance of current components
-
scientific programming support
-
continuity
-
science working groups -> synthesis working groups and education
-
full-time communications person
-
EcoLog - best-established listserv
body message "SUBSCRIBE ECOLOG-L"
-
Matt Jones
-
Director of computing
-
Collaborative versioning and training the importance of code as a scientific product
-
Inigo : how is the communications office going to help?
Start with brainstorming - Cor
Itemization of what has worked, and what we can continue and expand like Design working groups, IM working groups, EML mentor, GCE Working groups- MCM
Thematic data centers, centers of excellence around various data themes, streamflow chemistry, working groups that could address that specifically so that the datasets can be named and structured in a way that they could be more easily pulled together at one site and people map in somehow - Don
A synthesis theme that the scientists would like? Lay the foundation for better described data that scientists could actually use - Don, Corinna, Emery
Corals of the Future - Science Synthesis working group that used the Moorea Model and tried to apply it to other sites - main suggestion was making a consistent data product modelled after the LTER data on Coral. -- MCM
Scope of data products - how can we get to a data product that is more flexible? Develop base level products that you can do analysis and synthesis that fits with the needs -- MCM, Corinna, Emery, Don
Best practices for how to do the stream data (or any kind of example data) - Climate best practices, Stream best practices, etc. - use the recommendations. Not complete uniformity but at least consistency across the measurements - Emery
NCEAS - first practice is to say what data we have and what condition that it is in. Some PI's don't know what they have, but others are able to provide lovely data on a link. Hardest thing is to keep people from bottlenecking on data that already exists. Scope is narrow for some things like coral reefs; scope is broad for things like stream chemistry. -MCM
Scientists will always want much more than we could provide - general consensus
Put the data into a public repository - EML, LTER, DataONE, CUAHSI -- a preexisting schema may help. Funding of people who do this ran out. We need documentation of the labs, methods, significant digits, detection limits, etc. If that was out there, sites could fit into it better. Attributes could be standardized, but right now they are not. Alba, etc. spent a lot of time trying to clean those things up. A real data component to a science synthesis project -- Don, MCM, Corinna
NCEAS requires every science working group to be involved with their programmers and the liaison -- do they actually have a lot of products?
Will NSF Fund this? VEG-E (no), Clim-DB (yes)-- Corinna, Don, Emery
In theory....
-
we have a database of high value, scientists would approve it, how to get the data into the database and have people use it?
-
a funded representative helps to pull the data into the database
-
is there another way?
-
taking the idea of a center of excellence a little further and actually managing the data for all the sites -- not using the local IM at all but rather processing all the data for all the sites -- trade off the data into a new gear
-
direct communication between external IM at the excellence center and the field techs. It has potential to be more efficient. Have best practices sessions first.
-
Full range of variability that is at all the sites. Output final structure outputs. Don (or theoretical excellence person) designs the output to fit whatever is best.
-
What is the currency?
-
Data types:
- Streaming Data
- Organism Surveys
- Climate
- "Critter Counts"
- ITIS descriptions : taxonomy changes faster than ITIS?
-
sites might be not wanting to give up control
-
best practices - sites would develop as well as they could - a person can move between the sites to make a nice synthetic data set. Maybe updated every so often.
-
Could we have flagship products which are live and they update themselves
-
Scientists pass it over to the data scientists
3 ideas : we don't really know which would work, but it might cost some $ for these, and we don't want to ask the $ for this
- trading of units of work
- synthesis person on a specific topic
- center of excellence for a particular topic
What do we do if a site has something to offer but it doesn't fit the need of the other sites?
Can we send currency to other sites? DEIMS center? Web center of excellence? Web space for everyone?
Programming services -- Wade is very generous, people are using his product. PASTA-prog is helpful but it could be overwhelming.
What does being a network mean?
- same descriptors for everyone
- people worried about interest in data or stealing data
- NSF cares that data is accessible, available, and can be easily used
- NSF slow to recognize data management is important
- would need data managers to visit NSF
- working in a federated fashion?
Groups overviewed what they found (1-5). Theresa, Gastil, Inigo, Phil, :: add names ::
Afternoon -- Corinna, Don, Me, Gastil, Mark from NCEAS, Emery, :: add names ::
- how can we make environmental data management more efficient
- workflow development
- how can we efficiently use the tools which are already out there
- done vs. perfect
- More efficient vs. better data product
- Resistance of turning the data over to another data product to process
- Mark - scientists don't even know about a lot of the data models that are important; it's quite possible that researchers at a particular site don't even know that a given model is good
- LTER does not have a catalog of expertise
- Currency may be really lopsided - training is critical in those cases
- Bitcoin
-
what are good data products and what are useful data products?
-
xml, knb, PASTA, sites each doing things their own way is probably not as good-- synthesis working groups will not want to synthesize these groups
-
nut-net and drought-net -> standardized sampling is not really an option for us
-
can we ever get to the place where things just flow easily into a database?
-
adding more work to what we already do
-
mapping would be watched over
-
The larger issue is making the data at the site discoverable and integrable for the synthesis efforts
-
new data and legacy data -- may need to be prioritized -- should we get the data in or work on how we get it out.
Common schema and common syntaxes for expressing our managements and structures - now what?
- site contributes?
- site does not contribute?
- person working there
-
Corinna<- great question, HOW CAN WE MAKE INFORMATION MANAGEMENT MORE EFFICIENT?
-
who pulls the data together and processes them?
-
analysts?
-
information modelers coming in, scientific programmers
-
expert in some kind of data is the one who works on that kind of data on behalf of a subset of sites that have that type of data
-
HDF/NetCDF :)
-
Recommend this directory of specialty:
-
"Specialist Services" : each site has a person who is an expert in one type of data who promotes an explicit model with certain data types.
-
Training?
- MOU
- working with DataONE
- incentive to have own site would decrease over time
- ways to make information management more efficient
- was a little late
Theresa --> services group, lots of documents, added some to those, use the GEONIS as a template from which other services can involve.
John --> Training, graduate students, not only contribute data but use the data we are providing, the role of data quality and metadata quality. (Wade notes we need a way to evaluate quality)
Phil --> We talked more about NISAC. Who said we have to attend all three days of ASM -- 2 day IM meeting? Proposal team (small, not know how to make), review team (one person from each group)
Emery --> make work more efficient and streamlined, and think about working with groups outside of LTER - DataONE, EarthCube, datalink, ESIP, etc. Learn about what they are doing and think about how to frame the proposal in a way that might extend beyond the LTER. We also want to work with NCEAS and clarify what the relationships will be between our office and theirs.
Margaret --> present our decision to them like a faculty search would be. A position description, our decision process, and why that person is a good fit. Four volunteers from EB want to review our proposal. 2-3 weeks between iterations? We might have a draft by September.
Sites may want to participate in selection of the PI - we might create a document we can share with them. Phil and Theresa and Corinna would help to put it together. They would distribute it this week.
Template to the sites by the 4th. By the 17th, submit a mini-proposal from each site (will fill in the template)
-
NPP data - how to get
-
Data ontology we should explore
-
"increase terminological rigor in the sciences"
-
etymology of certain words important
-
there is some physical reality that is independent of the mind and we want the words to actually describe it - concepts or terms
-
Big Data = Volume, variety, velocity
-
Lehman and Tilman - so many definitions of the word "stability" -- how do we compare conclusions amongst many?
-
how do we capture the notion of forests
-
Simple semantics SKOS
-
we need a better concept of terms at the dataset level
-
URL : uniform resource locator, static, rendered, pointers to places. If you're on a page let's talk
-
could be called IRI's to represent international
-
URI's : abstract that notion of location to identification
-
URI's tell you about something at that end point - an identifier that is a "global identifier"
-
Identify some resource. Any place on the web you can find.
-
Persistence : identifiers that stick around for a long time
-
what if we use the label like "is part of", "preys upon", "generates"
IRI: passenger pigeon IRI: hasConservationStatus IRI: extinct
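A minimal sketch of the subject / predicate / object idea above, using plain Python tuples rather than a real RDF library. The example.org IRIs, the extra triples, and the `match` helper are made up for illustration and do not come from any actual ontology.

```python
# Hypothetical IRIs under example.org; not taken from a real ontology.
EX = "http://example.org/"

# Each statement is a (subject, predicate, object) triple of IRIs.
triples = {
    (EX + "PassengerPigeon", EX + "hasConservationStatus", EX + "Extinct"),
    (EX + "PassengerPigeon", EX + "isPartOf", EX + "Columbidae"),
    (EX + "Lynx", EX + "preysUpon", EX + "SnowshoeHare"),
}


def match(store, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    )


# "Which things have conservation status: extinct?"
extinct = match(triples, predicate=EX + "hasConservationStatus", obj=EX + "Extinct")
for s, p, o in extinct:
    print(s, p, o)
```

Because every part is a global identifier, two sites that use the same IRIs can pool their triples and run the same query over both.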
-
-
set theory:
- difference between instances and subclasses - important to know
- The wine is the class, the data about the wine is the subclass, the bottle is the instance
-
Ontologies based on web standards are really where we want to be, given the technical standards of the web.
-
what sort of structures are really useful to make something better than simple prototype ontologies
-
LET'S READ THE W3C STANDARDS TOGETHER!!!!
-
annotation : binding a concept to instances of that concept in your data
-
semantic annotation - the scientist provides description about what information means
-
ecological cycling concepts - how to build an ontology - download an ontology editor such as Protégé and use it to make a good system
- Mark Schildhauer
- Margaret O'Brien
-
can be used to make an ontology paper - puts in sources and stuff, you can annotate the column or the whole dataset with that concept, and you'll be saying that the data set adheres to the definition given
-
adopting the SKOS ontology
-
we included some connection information. Everyone should have an ORCID iD or something else similar.
-
Term: "knowledge modeler"
-
Test queries - improve discovery -- build up a test corpus of the matches to the queries
-
Precision : you are getting what you want;
-
Recall : you are getting back everything that is relevant (chasing recall alone brings back a lot of cruft)
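A small sketch of how a test corpus of queries could be scored, assuming we have a hand-made list of the datasets that should match each query. The dataset IDs and the `precision_recall` helper are hypothetical.

```python
def precision_recall(returned, relevant):
    """Precision = share of returned items that are relevant.
    Recall = share of relevant items that were returned."""
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical test query: four datasets should match, the search returned five.
relevant = {"ds1", "ds2", "ds3", "ds4"}
returned = ["ds1", "ds2", "ds9", "ds10", "ds11"]

p, r = precision_recall(returned, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.40 recall=0.50
```

Tracking both numbers per query shows whether a search change reduced the cruft (precision up) or found more of what exists (recall up).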
-
semantics - how to broaden that search?
-
Consensus can be avoided if you put in a "fait accompli" -- "uhm you did a lot of work but we can't do much better"
-
a nice, polished product that has utility for the scientists
-
ANPP means annual or aboveground net primary productivity
-
annotation properties: just "tags"? Not really deeply searchable
-
what is the binding - like if you bind to carbon, does it bind only to that level or does it also bind to the things above it?
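One way to picture the binding question, as a rough sketch only: the term hierarchy, the dataset annotations, and both helpers below are invented for illustration (loosely in the spirit of broader-term links), not taken from any ontology discussed in the session.

```python
# Hypothetical child -> parent ("broader term") links.
BROADER = {
    "lignin": "carbon compound",
    "cellulose": "carbon compound",
    "carbon compound": "chemical substance",
}

# Hypothetical annotations: each dataset is bound to exactly one term.
ANNOTATIONS = {
    "dataset_A": "lignin",
    "dataset_B": "cellulose",
    "dataset_C": "chemical substance",
}


def with_ancestors(term, broader=BROADER):
    """Return the term followed by every broader term above it."""
    chain = [term]
    while chain[-1] in broader and broader[chain[-1]] not in chain:
        chain.append(broader[chain[-1]])
    return chain


def search(term, propagate):
    """Find datasets for a term. With propagate=True, a binding also counts
    for all broader terms above it; with False, only the exact term counts."""
    found = []
    for dataset, bound in sorted(ANNOTATIONS.items()):
        terms = with_ancestors(bound) if propagate else [bound]
        if term in terms:
            found.append(dataset)
    return found


print(search("carbon compound", propagate=False))  # []
print(search("carbon compound", propagate=True))   # ['dataset_A', 'dataset_B']
```

If the binding only holds at the annotated level, a search on the broader term finds nothing. If it propagates upward, the broader search picks up everything annotated beneath it.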
-
measurements is the focus right now
-
import the subtrees of the ontologies that you need
-
example is "lignin" -- a behavior ontology
-
tagging is the formal semantics
-
two people from UGA/GCE
-
using DRUPAL but don't know much
-
interested from Oracle perspective
-
David Blankman - the EML evangelist - woods hole, FCE, etc.
-
good for managing personnel and for managing content
-
bibliographic content fits in nicely
-
do not do a lot of the editing in drupal itself
-
how does new data get into DRUPAL?
-
forms are a pain in the butt
Very lovely streams in McMurdo:
Here is a general link to the McMurdo streams site: MCM
-
Inigo has many options to do things with forms. A user doesn't see these forms. User goes to the new draft form and then moves from the display of the dataset to its form view. Dataset ID is part of the NIS system in PASTA.
-
Editor can paste stuff straight from word.
-
content or ancillary. Each tab has different aspects of the data. You can add things like data sources, personnel, etc. The custom things in github can be special categories.
-
i.e. in McMurdo long-term v. short-term.
-
very long maintenance field can be edited to hold history.
-
seems like a very good tool for linking people with data. in McMurdo the lead PI is usually the one listed.
-
selectable names in the database - this seems friendly.
-
methods : shows how we got the data in there
-
pull down of the core areas - actual LTER 5 core areas - can label the data set with different templates. Helps to create automated views and lists of things. CORE ---> THEMES ---> SITE SPECIFIC; linked to LTER controls made by John Porter.
-
put in the dates - 2015, etc. What happens if you put in a bad date format?
-
related information / relevant links
-
papers that result from the dataset - ones that are on your website- they are not referenced from an external source.
-
geo reference - can do all kinds of geo data -- string, shape file, different workflow puts these in so you don't have to worry that a stream is a point. renders the data set. Very good.
-
has its own CMS system built in, and includes an email re. commitments. When the moderation state moves to published it's okay.
-
good data gathering workflow -- this is a reasonable system for establishing that.
-
Most of Drupal is forms--
-
physical data, you upload it, it's BIG!
-
you can remove and replace the new data set, and upload it there. There's a ton of meta data you can add in.
-
you can store the delimiter
-
connectors to databases
-
your data database is separate from your drupal database
-
painfully rich metadata
-
discharge in L/s at MCM ? CFS at HJA. Hmmm
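A quick sketch of the unit mismatch noted above (L/s at one site, cubic feet per second at another). The function names are made up. The conversion factor follows from the definition of the foot as exactly 0.3048 m.

```python
# 1 cubic foot = (0.3048 m)^3 = 0.028316846592 m^3 = 28.316846592 L (exact).
LITERS_PER_CUBIC_FOOT = 28.316846592


def lps_to_cfs(liters_per_second):
    """Convert discharge from liters per second to cubic feet per second."""
    return liters_per_second / LITERS_PER_CUBIC_FOOT


def cfs_to_lps(cubic_feet_per_second):
    """Convert discharge from cubic feet per second to liters per second."""
    return cubic_feet_per_second * LITERS_PER_CUBIC_FOOT


# Put both sites on one unit before comparing discharge values.
print(round(cfs_to_lps(1.0), 3))    # 28.317
print(round(lps_to_cfs(100.0), 3))  # 3.531
```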
-
data explorer will let you query that special database using the native variables; builds a form that exposes the variable and allows users to filter on that variable...
-
date times must match, discharge rates must match
how does this work:
-
inigo hacked the core of the mysql driver to keep the date space time titles to fix this. wow. amazing!
-
here is a page with the data sets and summaries, plus links to data explorer
-
populates the name and variables
-
custom deims makes lovely images - inigo made a view in a block and put it into google images as a kml. really super awesome - renders a map with the polygons from the spatial data
-
d3 js used - nothing has to be written, reads right off the database. jquery/ajax for graphs. biblio module.
-
responsive web design has "priority columns set up"
-
invisi-mail, here's a form field obfuscator
-
The Book of DEIMS has all the interesting ways to install and use. Also it has "wizard". Clone from GitHub.
- Improve and well document how to use project.roles <- this is something that could need fixing.
- Improve personnel - metadata provider - front end
-
Climate data
-
area of global change science where the models are great and the communication is important
-
originally worked in biodiversity -- LTER is his foundation
-
information to knowledge
-
REACHING PEOPLE WHERE THEY ARE!
-
the Arctic - climate action plan - Ned's team supported this with his toolkit
-
working actively with NASA
-
the trajectory of the Arctic was a real concern. Really cared about sharing that message. color and "age" of ice
-
prioritizing outreach
-
data curation
-
data access (especially satellite)
-
sharing code
-
A UNIQUE AND VERY POWERFUL CONCEPT
-
really lovely data cover visualizer. Show people how remote sensing, multi-spectral systems work. Made a nice visualization that allows scientists to make maps that are all over the world. deconstruct how maps are made
-
biodiversity theme at the museum - bioviz
-
people who are non-scientists don't take in the information the way we do. "when people who are non-scientists see shifting colors on a map they say, 'oh, shifting colors on a map, okay'"
-
researchers out of yale - greenhouse gases read experiment
-
storytelling
-
data.gov
-
visualizations embedded within narratives can help people understand and build relationships
-
"watch the story" ==> simple visualizations can be VERY POWERFUL
-
FOX: This guy is my inspiration. I must be this Ned Gardiner guy.
What is big data?
-
share our ideas and experiences with storing and managing and delivering large data fields
-
services we might need at the network level?
-
big data set relates the use of advanced methods like modeling to extract value from Data
- large, complex, or both
-
Ecological Systems Theory --> very complicated lots of stuff
-
how large is large?
-
reference list of existing resources for storing large data files
-
is there a constraint to the file size we can use in synthesis processes?
-
is the whole more than the sum of its parts?
-
pasta can't handle our really large data sets -- what do we do?
-
transfer of data as cost versus storage as costs!
-
How to plan for giant data?
- talk to prospective PIs and Scientists
- RAID arrays etc. lose storage capacity; the capacity for storage doubles every year and a half to two years
- host institution can help to add space
- network capacity to transfer the files is limiting
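A back-of-envelope sketch for the planning points above, assuming the doubling figure from the notes and an idealized network link with no overhead. Both function names and all the numbers in the example are hypothetical.

```python
def capacity_after(years, start_tb, doubling_years):
    """Storage capacity after some years if it doubles every doubling_years."""
    return start_tb * 2 ** (years / doubling_years)


def transfer_hours(size_tb, megabits_per_second):
    """Hours to move size_tb (decimal terabytes) over a link, ignoring overhead."""
    bits = size_tb * 1e12 * 8
    seconds = bits / (megabits_per_second * 1e6)
    return seconds / 3600


# 10 TB today, doubling every 1.5 years, looked at 3 years out.
print(capacity_after(3, start_tb=10, doubling_years=1.5))        # 40.0
# Moving 1 TB over a 100 Mbit/s link takes most of a day.
print(round(transfer_hours(1, megabits_per_second=100), 1))      # 22.2
```

The second number is the reason transfer, not storage, tends to be the limit: disk grows on a doubling curve while the link to the host institution usually does not.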
-
I would suggest this: https://www.chameleoncloud.org/docs/user-guides/openstack-kvm-user-guide/
-
budgets and funding cycles
-
IM's must compete with scientists?
-
Oregon state shares costs
-
Pointer to pull down the large data
-
Expanding using NAS HDD
-
-
project forms?
-
part of project planning phase?
-
other?
-
Amazon S3
-
it's expensive if you are transferring
-
Inigo is using the cloud
-
downloads
-
do it in the cloud
-
torrent
-
tarballs
-
tapes / flash disks
-
mp3s
-
Map and image services as separate services
-
De-dupe: "cyclic redundancy checking - makes sure there's not 600 copies of the same data on the server - everyone else gets just a pointer to it." -- really great approach. The spatial data area is available to everyone, so people are not making multiple copies of the same data.
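A minimal sketch of the pointer idea, not the system that was presented. It swaps the cyclic redundancy check mentioned above for a SHA-256 digest from Python's `hashlib`, since a longer hash makes accidental collisions far less likely. The `deduplicate` helper and the file names are hypothetical.

```python
import hashlib


def deduplicate(files):
    """files maps a name to its bytes.
    Returns (store, pointers): one stored copy per distinct content,
    and a pointer (the content digest) for every file name."""
    store = {}     # digest -> content, kept once
    pointers = {}  # file name -> digest
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        store.setdefault(digest, content)
        pointers[name] = digest
    return store, pointers


# Hypothetical uploads: two users saved the same raster under different names.
files = {
    "alice/dem.tif": b"elevation raster bytes",
    "bob/dem_copy.tif": b"elevation raster bytes",
    "carol/streams.shp": b"stream network bytes",
}

store, pointers = deduplicate(files)
print(len(files), "files,", len(store), "stored copies")  # 3 files, 2 stored copies
```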
Did not get as many notes as I'd like because it was very interesting so I was sort of in my zone. Will be getting copies of notes from participants.
-
back transform space into image space
-
watched video
-
Theresa - fusion, watershed 1,
-
point cloud lidar
-
Andrews LiDar - big data set many folders
-
profile of stream or trees
-
Bob - VELMA, Vistas - overlay, new technology is on the way!
-
stream flow vs. soil moisture v. other drivers
-
Envision and models - Allison
-
technology - Chris
-
Quick overview, then talking about how to use with Coweeta
-
Automating Sensor Data Collection with the GCE
-
started at GCE in 2000 - more than 4000 downloads and used at 80 + sites in LTER and elsewhere
-
MATLAB is a MathWorks tool : it's costly, but not bad as far as commercial software goes. It does allow tinkering with the source code. It is scalable.
-
Data model that will work well with long-term data.
-
any number of numerical and text variables, structured metadata documentation along with it.
-
thoughts from fox : really it's a pretty great tool, when it's managed well for a specific source and site. with good programming knowledge and maintenance, and maybe a little bit of tinkering, I feel more inspired by this tool today than I do normally. Wade has a mastery of this tool that I did not see in other experiences with it.
-
all of its functions must use the attributes in its metadata to work with it out of the box.
-
Campbell logger files, Seabird Logger files, Hobo logger files, etc. * SQL data sources, CLIM-DB, Hydro DB, etc.
-
Data Turbine and other streaming data middleware.
-
"friends don't let friends type metadata" - is smart tool to get what it can get from the data headers
-
gives you managed data and fully documented data sets
-
there is a streamlined method to use the software without having to generate all the metadata and stuff, you can actually work with only the import and data stream parts -- I wonder if what the Andrews is doing is this? We seem to have a lot of metadata we don't use?
-
post-processing tools, can set rules on the drive values to carry through QC information as you work with it. The data and metadata can be exported in a variety of formats, or pushed into a relational database. Pushing directly into the Drupal system is a possible future feature. CUAHSI data model. HTML, KML, XML, SQL dbs, etc. Generate the web dashboard from the box without doing special stuff.
-
select and merge can be automated, as well as cleaning between certain days. Run any number of harvesters on a little schedule. Grow those files and go back in later and review the data
-
every operation is in the context of a dataset
-
there's a wiki site, svn repository -- I should ask if I can migrate it to github
-
user-support
-
listserv
-
training opportunities / workshops
-
ISO standards aren't really used in MATLAB structure, because it is not so tabular? :: I didn't really follow this, but I think I missed something ::
-
harvest workflows
-
web design is coupled tightly- file is read from station to website
-
trimming features are built in to trim down the size of text files that can become unusable.
-
you can do a lot in just the GUI
-
data set editor is the tool with the menu system
-
loads a naked campbell file and assigns nice names to it; doesn't know the units yet because they weren't in the campbell logger; comes in with basic data type information. Floating points, strings, etc. all described in there -- these help us use it in certain ways.
-
in the gui, you can change all the organization and structure of the attributes-- like wade showed how he changed "record" to ordinal from whatever it came in on
-
data table viewer in the database. the data is edited but once it's destroyed or changed you can't go back. So you have to save an intact original version first.
-
Big edits, like filtering, will generate for you a backup - but small operations, like making a new flag, will not.
-
the date time- generates date component columns - there is a manual way to do all these changes.
^^ note the above being said for repeatability ^^
-
change the data select
-
right now there is not "flag semantics" - so you must always throw all the flags -- I, Q, V, etc. master list of flag with some priority ordering. They would maybe do this if they got more funding though :)
-
documentation from this presentation would be useful in the future because there is a lot of documentation on the toolkit; the wiki helps pare that down too.
-
QC Framework - data model has built in storage for the flags ('flags','values')
-
Data structure must fit the right framework before it can even go into the typed data system with all the rules. The qc is the finer scale qc
-
there are tools for bulk flagging and shift correction
-
the toolbox generally thinks in a vectorized way
-
rule based QC can "cause as many problems as you solve"
-
there are nice provisions for going back and doing manual flagging. This is one of GCE's nice features. This is good for removing flags too!
-
you can import and copy flags.
-
flags can really be manipulated very well manually. it has a ton of tools for this. Wade also has a good QC strategy -- limit rules, set based rules, etc. Multi-column dependency checks.
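A rough Python sketch of the three rule types named above (limit, set-based, multi-column dependency) plus a flag priority list. This is not the toolbox's MATLAB code or API. The column names, thresholds, site list, and flag codes are all assumptions made for illustration.

```python
# Hypothetical flag codes, most severe first; not the toolbox's own master list.
FLAG_PRIORITY = ["I", "Q"]  # I = invalid, Q = questionable

VALID_SITES = {"MCM", "HJA", "GCE"}


def limit_rule(row):
    """Limit rule: air temperature outside a plausible range is invalid."""
    return "I" if not -90.0 <= row["air_temp_c"] <= 60.0 else ""


def set_rule(row):
    """Set-based rule: the site code must come from a known list."""
    return "Q" if row["site"] not in VALID_SITES else ""


def dependency_rule(row):
    """Multi-column check: precipitation well below freezing is questionable."""
    return "Q" if row["precip_mm"] > 0 and row["air_temp_c"] < -5.0 else ""


RULES = [limit_rule, set_rule, dependency_rule]


def flag_row(row):
    """Apply every rule and keep only the highest-priority flag raised."""
    raised = {rule(row) for rule in RULES} - {""}
    for flag in FLAG_PRIORITY:
        if flag in raised:
            return flag
    return ""


rows = [
    {"site": "HJA", "air_temp_c": 12.5, "precip_mm": 0.0},
    {"site": "HJA", "air_temp_c": 99.0, "precip_mm": 0.0},
    {"site": "XXX", "air_temp_c": 3.0, "precip_mm": 1.2},
    {"site": "MCM", "air_temp_c": -20.0, "precip_mm": 0.4},
]

flags = [flag_row(row) for row in rows]
print(flags)  # ['', 'I', 'Q', 'Q']
```

Keeping the rules as small separate functions makes it easy to switch one off when it "causes as many problems as it solves", and the priority list is the piece the notes say is currently missing.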
-
:: computer had to update, missed a few here :: but it is possible to interpolate missing date times
-
you can hide the missing values
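A small sketch of filling in missing date times for a logger record, assuming a fixed sampling interval. It is plain Python, not the toolbox's implementation. The `fill_missing_times` helper and the sample readings are hypothetical, and inserted rows get `None` as the value so they can be flagged, interpolated, or hidden later.

```python
from datetime import datetime, timedelta


def fill_missing_times(records, step):
    """records: list of (datetime, value) sorted by time.
    Returns a new list with (time, None) rows inserted wherever a
    timestamp on the regular step is missing."""
    if not records:
        return []
    filled = [records[0]]
    for when, value in records[1:]:
        expected = filled[-1][0] + step
        while expected < when:
            filled.append((expected, None))
            expected += step
        filled.append((when, value))
    return filled


# Hypothetical 15-minute record with the 00:30 and 00:45 readings missing.
records = [
    (datetime(2015, 9, 1, 0, 0), 4.1),
    (datetime(2015, 9, 1, 0, 15), 4.3),
    (datetime(2015, 9, 1, 1, 0), 4.0),
]

filled = fill_missing_times(records, timedelta(minutes=15))
for when, value in filled:
    print(when.strftime("%H:%M"), value)
```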
:: please fix if I got your name bad ::
- Claire- cedar creek
- Luquillo (didn't get name)
- CU Boulder NIWOT Ridge LTER
- NTL Sam Zipper - agricultural and urban systems
- UGA person! GCE.
- Dom - groundwater
groundwater and precip and trees
-
continuum of drought and the gradient in certain systems
-
oscillations and such; look at mm versus difference from historical averages
-
historical rain vs. continuous rain vs. sample consistency over time
-
sample when there is a lot of rain
-
stream flows and run off -- stream discharge, what's available to the forest, etc. evapotranspiration.
-
cedar creek inputs and penetration are so different from luquillo and from andrews and water access.
-
satellite
-
luq.lternet.edu/article/2015/7/10/effects.... :: link very cool graph with greenhouse and soil o2 ::